MF for collaborative filtering

What is collaborative filtering?

Recovering latent factors in a matrix
n users (rows), m movies (columns)
V[i,j] = user i's rating of movie j
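In symbols, "recovering latent factors" can be written as the approximation below. The slide only names V; the factor names W and X, the rank r, and the row vectors w_i and x_j are assumptions introduced here for illustration.

```latex
V \approx W X^{\top}, \qquad
V \in \mathbb{R}^{n \times m},\quad
W \in \mathbb{R}^{n \times r},\quad
X \in \mathbb{R}^{m \times r},\qquad
V[i,j] \approx w_i \cdot x_j = \sum_{k=1}^{r} W[i,k]\, X[j,k]
```

Here row w_i holds the latent factors of user i and row x_j those of movie j, with r much smaller than n and m.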

MF for image modeling

MF for images
1000 images (rows), 10,000 pixels (columns), so V is 1000 * 10,000
V[i,j] = pixel j in image i
2 prototypes: PC1, PC2

MF for modeling text

Recovering latent factors in a matrix
Doc-term matrix: n documents (rows), m terms (columns)
V[i,j] = TFIDF score of term j in doc i
(figure labels: terms b1, b2, ..., bm; latent coordinates x2, y2; entry vij)
Example documents (book titles):
  The Neatest Little Guide to Stock Market Investing
  Investing For Dummies, 4th Edition
  The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
  The Little Book of Value Investing
  Value Investing: From Graham to Buffett and Beyond
  Rich Dad's Guide to Investing: What the ...
Source: https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/
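A minimal sketch of the doc-term setup, assuming plain numpy, naive whitespace tokenization, and the IDF variant log(n / df). None of these choices come from the slides, and the truncated SVD used for the factorization is one standard option rather than necessarily the method the lecture has in mind. The three documents are titles from the slide, lowercased.

```python
import numpy as np

# Toy corpus: three titles from the slide, lowercased, split on whitespace.
docs = [
    "the neatest little guide to stock market investing",
    "investing for dummies 4th edition",
    "the little book of value investing",
]
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})
term_index = {t: j for j, t in enumerate(vocab)}

n, m = len(docs), len(vocab)

# Term-frequency counts: tf[i, j] = number of times term j occurs in doc i.
tf = np.zeros((n, m))
for i, doc in enumerate(tokenized):
    for t in doc:
        tf[i, term_index[t]] += 1.0

df = (tf > 0).sum(axis=0)      # number of docs containing each term
idf = np.log(n / df)           # assumed IDF variant; others add smoothing
V = tf * idf                   # V[i, j] = TF-IDF score of term j in doc i

# Rank-r factorization of V via truncated SVD.
r = 2
U, s, Vt = np.linalg.svd(V, full_matrices=False)
doc_factors = U[:, :r] * s[:r]     # n x r: latent coordinates per document
term_factors = Vt[:r, :]           # r x m: latent coordinates per term
V_approx = doc_factors @ term_factors
```

Because "investing" occurs in every title, its IDF is zero under this variant, so that column of V carries no information for separating the documents.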

Example titles (figure labels):
  Investing for real estate
  Rich Dad's Advisors: The ABCs of Real Estate Investment
  The Little Book of Common Sense Investing: ...
  Neatest Little Guide to Stock Market Investing

MF vs other learning tasks

k-means clustering
Each point is in one cluster
Each cluster is a weighted average of points (the centroids)
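The two k-means bullets can be read as a matrix factorization V ≈ Z C, where each row of Z is one-hot (each point is in one cluster) and each row of C is a centroid. The sketch below assumes synthetic 2-D data and a hand-picked initialization; the names Z, C and assign are illustrative and do not appear in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two well-separated blobs of 2-D points.
V = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
])
n, k = V.shape[0], 2

# Initialize with one point from each blob (avoids empty clusters here).
C = V[[0, n - 1]].copy()                      # k x d centroid matrix

for _ in range(10):
    # Squared distance from every point to every centroid: n x k.
    d2 = ((V[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)                # cluster index per point
    Z = np.eye(k)[assign]                     # n x k, one-hot rows
    counts = Z.sum(axis=0)                    # points per cluster
    C = (Z.T @ V) / counts[:, None]           # each centroid = mean of its points

V_approx = Z @ C                              # every point replaced by its centroid
```

Compared with general MF, the only difference is the constraint on Z: exactly one nonzero entry per row.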

Matrix factorization as SGD - why does this work?
Here's the key claim: [equation not preserved]

Checking the claim
Think of SGD for logistic regression:
  LR loss = compare y and ŷ = dot(w,x)
  MF is similar, but now we update both w (user weights) and x (movie weights)

What loss functions are possible?
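One possible choice is squared error on each observed rating. The sketch below assumes that loss, a fully observed synthetic matrix, and hand-picked hyperparameters (rank, learning rate, epoch count); none of these values come from the slides. It shows the "update both w and x" idea: each rating V[i,j] triggers one step on user row W[i] and one on movie row X[j].

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, r = 6, 5, 2

# Synthetic ratings generated from hidden rank-r factors (illustration only).
true_W = rng.normal(size=(n_users, r))
true_X = rng.normal(size=(n_movies, r))
V = true_W @ true_X.T

# Here every entry is observed; with real ratings this would be a sparse list.
observed = [(i, j) for i in range(n_users) for j in range(n_movies)]

W = 0.1 * rng.normal(size=(n_users, r))    # user weights
X = 0.1 * rng.normal(size=(n_movies, r))   # movie weights
lr = 0.05

def mse(W, X):
    return float(np.mean((V - W @ X.T) ** 2))

initial_error = mse(W, X)
for epoch in range(200):
    for idx in rng.permutation(len(observed)):
        i, j = observed[idx]
        err = V[i, j] - W[i] @ X[j]        # compare y and y_hat = dot(w, x)
        w_old = W[i].copy()                # use the pre-update w for x's step
        W[i] += lr * err * X[j]            # gradient step on user weights
        X[j] += lr * err * w_old           # gradient step on movie weights
final_error = mse(W, X)
```

Swapping in a different loss only changes how err is computed from V[i,j] and the prediction.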

Vectorizing logistic regression
Many ML methods can be rewritten using nothing but vector-matrix operations (vectorizing)
Why do this?
  Simpler (once you understand it well)
  Faster (given the right infrastructure - e.g., numpy, GPUs, ...)
  Can simplify optimization (more later)

Vectorized minibatch logistic regression
Computation we'd like to vectorize:
  For each x in the minibatch, compute p = logistic(dot(w,x))
  For each feature j: update w_j using the gradient for feature j

Computation we'd like to parallelize:
  For each x in the minibatch Xbatch, compute dot(x,w). As one matrix-vector product, with one example per row of Xbatch:
  $X_{batch}\, w = \begin{bmatrix} x_{1,1} & \cdots & x_{1,n} \\ \vdots & & \vdots \\ x_{B,1} & \cdots & x_{B,n} \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} x_1 \cdot w \\ \vdots \\ x_B \cdot w \end{bmatrix}$
  Then compute $1 / (e^{-x \cdot w} + 1)$ for each x, i.e. component-wise on the vector $X_{batch}\, w$
  In numpy, if M is a matrix, M+1 does the right thing; so do np.exp(M), M.dot(...), ...

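Computation we'd like to parallelize:
  For each x in the minibatch, compute p = logistic(dot(x,w))

import numpy as np
def logistic(X):
    return 1.0 / (np.exp(-X) + 1.0)  # component-wise 1 / (e^{-X} + 1)
p = logistic(Xb.dot(w))  # B rows, 1 column

Binary to softmax logistic regression
Binary case: $X_{batch}\, w = [\,x_1 \cdot w,\ \dots,\ x_B \cdot w\,]^{\top}$, one score per example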
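To make the binary case concrete before moving to softmax, here is a minimal sketch of a fully vectorized minibatch loop built from logistic and the product Xb.dot(w). The synthetic data, the learning rate, the iteration count, and the choice to store w as an n x 1 column are all assumptions for illustration.

```python
import numpy as np

def logistic(z):
    # Component-wise 1 / (1 + e^{-z}).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
B, n = 8, 3
Xb = rng.normal(size=(B, n))                      # minibatch, one example per row
true_w = np.array([[2.0], [-1.0], [0.5]])         # hypothetical generating weights
yb = (Xb @ true_w > 0).astype(float)              # B x 1 labels in {0, 1}

w = np.zeros((n, 1))
lr = 0.5

def loss(w):
    # Mean negative log-likelihood; eps guards against log(0).
    p = logistic(Xb @ w)
    eps = 1e-12
    return float(-np.mean(yb * np.log(p + eps) + (1 - yb) * np.log(1 - p + eps)))

loss_before = loss(w)
for _ in range(100):
    p = logistic(Xb @ w)               # B x 1: all predictions in one product
    grad = Xb.T @ (p - yb) / B         # n x 1: all per-feature updates at once
    w -= lr * grad
loss_after = loss(w)
```

The two loops from the slide ("for each x" and "for each feature j") have both disappeared into matrix products.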

Softmax case: W has one column per class, so $XW$ is a B x K matrix of scores, and
$\mathrm{prob}[i,k] = \exp\big((XW)[i,k]\big) \,/\, \sum_{k'} \exp\big((XW)[i,k']\big)$

Code from http://minpy.readthedocs.io/en/latest/get-started/logistic_regression, annotated:
  Matrix multiply, then exponentiate component-wise
  Sum across the columns of each row to get the denominator; keepdims=True means that this line will work correctly even though a and a_sum have different shapes
  prob will have B rows and K columns, and each row will sum to 1
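The annotated steps can be sketched in plain numpy as follows. This is a reconstruction from the annotations, not a copy of the linked tutorial's code; the function name predict and the shapes B, n, K are illustrative assumptions.

```python
import numpy as np

def predict(W, X):
    # Matrix multiply, then exponentiate component-wise: B x K.
    a = np.exp(X @ W)
    # Row sums as a B x 1 column; keepdims=True keeps the axis so that
    # the division below broadcasts across the K columns of each row.
    a_sum = a.sum(axis=1, keepdims=True)
    return a / a_sum

rng = np.random.default_rng(0)
B, n, K = 4, 3, 5
X = rng.normal(size=(B, n))     # B examples, n features
W = rng.normal(size=(n, K))     # one weight column per class
prob = predict(W, X)            # B x K, each row sums to 1
```

Without keepdims=True, a_sum would have shape (B,), and dividing a (B, K) array by it either fails to broadcast (when B differs from K) or silently divides along the wrong axis (when B equals K).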

dy: error on each example x in the batch and each class y
x.T dy: the gradient with respect to the weights
Python bug in the code shown: should be x.T (transpose)
The gradient step!
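A minimal sketch of the softmax gradient step with the transpose in place, assuming cross-entropy loss, one-hot labels, and synthetic data. The learning rate, iteration count, and the max-subtraction for numerical stability are additions for illustration and are not taken from the slides or the linked tutorial.

```python
import numpy as np

def softmax_rows(scores):
    # Subtracting each row's max leaves the result unchanged but avoids overflow.
    shifted = scores - scores.max(axis=1, keepdims=True)
    a = np.exp(shifted)
    return a / a.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
B, n, K = 30, 4, 3
X = rng.normal(size=(B, n))
labels = rng.integers(0, K, size=B)
Y = np.eye(K)[labels]                 # B x K one-hot targets

W = np.zeros((n, K))
lr = 0.1

def cross_entropy(W):
    prob = softmax_rows(X @ W)
    return float(-np.mean(np.log(prob[np.arange(B), labels])))

loss_before = cross_entropy(W)
for _ in range(50):
    prob = softmax_rows(X @ W)        # B x K predictions
    dy = prob - Y                     # error on each example and each class
    dW = X.T @ dy / B                 # n x K; X.T (not X) makes the shapes agree
    W -= lr * dW                      # the gradient step
loss_after = cross_entropy(W)
```

The shapes show why the transpose matters: X is B x n and dy is B x K, so only X.T @ dy yields an n x K matrix matching W.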