# Neural networks (1) - Emory University

Neural networks (1): traditional multi-layer perceptrons. Reference: http://neuralnetworksanddeeplearning.com/

For $K$-class classification, the network has $K$ nodes in the top (output) layer.

For a continuous outcome, there is a single node in the top layer. The hidden units $Z_m$ are created from linear combinations of the inputs, and the output $Y_k$ is modeled as a function of linear combinations of the $Z_m$. For regression, typically $K = 1$:

$Z_m = \sigma(\alpha_{0m} + \alpha_m^T X), \quad m = 1, \dots, M$

$T_k = \beta_{0k} + \beta_k^T Z, \quad k = 1, \dots, K$

$f_k(X) = g_k(T), \quad k = 1, \dots, K$

For $K$-class classification, the output function is the softmax,

$g_k(T) = \dfrac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}},$

or in more general terms $f_k(X) = g_k(T)$, where $g_k$ is the identity, $g_k(T) = T_k$, for regression and the softmax for classification.
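These formulas can be sketched as a forward pass in NumPy; the shapes, function names, and random weights below are illustrative assumptions, not from the slides:

```python
import numpy as np

def sigmoid(v):
    """Sigmoid activation sigma(v) = 1 / (1 + e^{-v})."""
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, alpha0, alpha, beta0, beta):
    """Forward pass of a single-hidden-layer network for K-class
    classification: Z = sigma(alpha0 + alpha x), T = beta0 + beta Z,
    f = softmax(T)."""
    z = sigmoid(alpha0 + alpha @ x)   # hidden units Z_m, shape (M,)
    t = beta0 + beta @ z              # linear outputs T_k, shape (K,)
    e = np.exp(t - t.max())           # softmax, stabilized by subtracting max(T)
    return e / e.sum()

# tiny example: p = 4 inputs, M = 3 hidden units, K = 2 classes
rng = np.random.default_rng(0)
probs = forward(rng.normal(size=4),
                rng.normal(size=3), rng.normal(size=(3, 4)),
                rng.normal(size=2), rng.normal(size=(2, 3)))
print(probs.sum())  # softmax outputs sum to 1
```

For regression the last two lines of `forward` would simply return `t` (the identity output function).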

An old activation function: the sigmoid, $\sigma(v) = 1/(1 + e^{-v})$. Other activation functions are used in practice, but we will continue using the sigmoid for this discussion.

A simple network built from linear threshold units (the bias plays the role of an intercept):

- $y_1 = \mathrm{sign}(x_1 + x_2 - 0.5)$
- $y_2 = \mathrm{sign}(x_1 + x_2 - 1.5)$
- $z_1 = +1$ if and only if both $y_1 = +1$ and $y_2 = -1$

so $z_1$ fires exactly for the points between the two parallel lines $x_1 + x_2 = 0.5$ and $x_1 + x_2 = 1.5$.

## Fitting Neural Networks
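A minimal sketch of this example with sign activations; the thresholds 0.5 and 1.5 are taken from the slide, and the way $z_1$ is wired from $y_1$ and $y_2$ is one possible choice:

```python
def sign(v):
    """Threshold activation: +1 if v > 0, else -1."""
    return 1 if v > 0 else -1

def z1(x1, x2):
    """Fires (+1) only for points strictly between the parallel lines
    x1 + x2 = 0.5 and x1 + x2 = 1.5."""
    y1 = sign(x1 + x2 - 0.5)   # above the lower line?
    y2 = sign(x1 + x2 - 1.5)   # above the upper line?
    return sign(y1 - y2 - 1)   # +1 iff y1 = +1 and y2 = -1

print(z1(0.6, 0.3), z1(1.0, 1.0), z1(0.1, 0.1))  # 1 -1 -1
```

The point (0.6, 0.3) lies between the lines, while (1.0, 1.0) is above both and (0.1, 0.1) is below both.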

The set of parameters (weights) is

$\theta = \{\alpha_{0m}, \alpha_m : m = 1, \dots, M\} \cup \{\beta_{0k}, \beta_k : k = 1, \dots, K\}.$

Objective function:

- Regression (typically $K = 1$): sum of squared errors, $R(\theta) = \sum_{k=1}^{K} \sum_{i=1}^{N} (y_{ik} - f_k(x_i))^2$
- Classification: cross-entropy (deviance), $R(\theta) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log f_k(x_i)$

Minimizing $R(\theta)$ is done by gradient descent, called back-propagation in this setting. The middle-layer values for each data point are $z_{mi} = \sigma(\alpha_{0m} + \alpha_m^T x_i)$, collected as $z_i = (z_{1i}, \dots, z_{Mi})$. We use the squared error loss for demonstration:

$R(\theta) = \sum_{i=1}^{N} R_i = \sum_{i=1}^{N} \sum_{k=1}^{K} (y_{ik} - f_k(x_i))^2$

Rules of derivatives used here:

- Sum rule and constant multiple rule: $(a f(x) + b g(x))' = a f'(x) + b g'(x)$
- Chain rule: $(f(g(x)))' = f'(g(x))\, g'(x)$

Note: we take derivatives with respect to the coefficients $\theta$.

Derivatives:

$\dfrac{\partial R_i}{\partial \beta_{km}} = -2(y_{ik} - f_k(x_i))\, g_k'(\beta_k^T z_i)\, z_{mi}$

$\dfrac{\partial R_i}{\partial \alpha_{ml}} = -\sum_{k=1}^{K} 2(y_{ik} - f_k(x_i))\, g_k'(\beta_k^T z_i)\, \beta_{km}\, \sigma'(\alpha_m^T x_i)\, x_{il}$

Descent along the gradient at iteration $r$:

$\beta_{km}^{(r+1)} = \beta_{km}^{(r)} - \gamma_r \sum_{i} \dfrac{\partial R_i}{\partial \beta_{km}^{(r)}}, \qquad \alpha_{ml}^{(r+1)} = \alpha_{ml}^{(r)} - \gamma_r \sum_{i} \dfrac{\partial R_i}{\partial \alpha_{ml}^{(r)}}$

where $\gamma_r$ is the learning rate and $i$ is the observation index.

Write the gradients as

$\dfrac{\partial R_i}{\partial \beta_{km}} = \delta_{ki}\, z_{mi}, \qquad \dfrac{\partial R_i}{\partial \alpha_{ml}} = s_{mi}\, x_{il}.$

By definition,

$\delta_{ki} = -2(y_{ik} - f_k(x_i))\, g_k'(\beta_k^T z_i), \qquad s_{mi} = \sigma'(\alpha_m^T x_i) \sum_{k=1}^{K} \beta_{km}\, \delta_{ki}.$

The second identity is the back-propagation equation: the hidden-layer errors $s_{mi}$ are obtained from the output-layer errors $\delta_{ki}$.

General workflow of back-propagation:

1. Forward pass: fix the weights and compute the fitted values $\hat f_k(x_i)$.
2. Backward pass: compute the errors $\delta_{ki}$, back-propagate them to compute $s_{mi}$, use both $\delta_{ki}$ and $s_{mi}$ to compute the gradients, and update the weights.
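The forward/backward workflow can be sketched in NumPy for the simplest case, regression with $K = 1$, identity output, and squared error; the averaging of gradients over observations and the toy data are my own choices for the sketch:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(X, y, alpha0, alpha, beta0, beta, lr):
    """One gradient-descent step for a single-hidden-layer network with
    squared error and identity output (regression, K = 1).
    alpha: (M, p) hidden weights; beta: (M,) output weights."""
    n = len(X)
    # forward pass: hidden values z_mi and fitted values f(x_i)
    Z = sigmoid(alpha0 + X @ alpha.T)           # (n, M)
    f = beta0 + Z @ beta                        # (n,)
    # backward pass: output errors delta_i, then hidden errors s_mi
    delta = -2.0 * (y - f)                      # dR_i / df
    S = Z * (1.0 - Z) * np.outer(delta, beta)   # sigma'(.) * beta_m * delta_i
    # averaged gradients and weight updates
    beta   = beta   - lr * (Z.T @ delta) / n
    beta0  = beta0  - lr * delta.mean()
    alpha  = alpha  - lr * (S.T @ X) / n
    alpha0 = alpha0 - lr * S.mean(axis=0)
    return alpha0, alpha, beta0, beta, np.mean((y - f) ** 2)

# usage: fit a small nonlinear target; the loss should decrease
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
a0 = rng.normal(scale=0.1, size=5)
a  = rng.normal(scale=0.1, size=(5, 2))     # small initial weights
b0, b = 0.0, rng.normal(scale=0.1, size=5)
first = None
for step in range(500):
    a0, a, b0, b, loss = backprop_step(X, y, a0, a, b0, b, lr=0.1)
    if step == 0:
        first = loss
print(first > loss)  # True: the training loss decreased
```

Note that the two gradient formulas in the code mirror $\delta_{ki} z_{mi}$ and $s_{mi} x_{il}$ with $g_k' = 1$ (identity output) and $\sigma' = \sigma(1 - \sigma)$.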

- Back-propagation can use parallel computing: each hidden unit passes and receives information only to and from units that share a connection.
- Online training: the fitting scheme allows the network to handle very large training sets, and to update the weights as new observations come in.
- Training a neural network is an art: the model is generally over-parametrized, and the optimization problem is non-convex and unstable.

A neural network model is a black box and hard to interpret directly.

Initialization: when the weight vectors are close to length zero, all $Z$ values are close to zero; there the sigmoid curve is close to linear, so the overall model is close to linear, i.e., a relatively simple model (this can be seen as a regularized solution). Start with very small weights and let the network learn the necessary nonlinear relations from the data. Starting with large weights often leads to poor solutions.

Overfitting: the model is highly flexible, involving many parameters, and may easily overfit the data.

- Early stopping: do not let the algorithm converge. Because the model starts out near linear (small initial weights), this acts as a regularized solution shrunk toward linearity.
- Explicit regularization (weight decay): minimize $R(\theta) + \lambda J(\theta)$ with $J(\theta) = \sum_{km} \beta_{km}^2 + \sum_{ml} \alpha_{ml}^2$; a related weight-elimination penalty tends to shrink smaller weights more. Cross-validation is used to estimate $\lambda$.
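Weight decay folds into each gradient step as an extra shrinkage term, since $\partial J / \partial w = 2w$ for the squared penalty; the helper below is a hypothetical illustration, not from the slides:

```python
import numpy as np

def decay_update(w, grad, lr, lam):
    """Gradient step on the penalized objective R(theta) + lam * J(theta),
    where J is the sum of squared weights, so dJ/dw = 2w (weight decay)."""
    return w - lr * (grad + 2.0 * lam * w)

# with a zero data gradient the update is pure shrinkage toward zero:
w = np.array([1.0, -2.0])
print(decay_update(w, np.zeros(2), lr=0.1, lam=0.5))  # [ 0.9 -1.8]
```

Each step multiplies the weights by $(1 - 2\lambda\,\mathrm{lr})$ before the data gradient is applied, which is why the technique is called "decay".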

Number of hidden units and layers:

- Too few: the network might not have enough flexibility to capture the nonlinearities in the data.
- Too many: overly flexible, but the extra weights can be shrunk toward zero if appropriate regularization is used.

## Examples

A radial function is in a sense the most difficult case for a neural net, as it is spherically symmetric and has no preferred directions.

## Going beyond a single hidden layer

An old benchmark problem: classification of handwritten numerals. Deeper networks for this task constrain the connectivity: each hidden unit looks only at a small local patch of the layer below (for example a 3x3 or 5x5 window). Without weight sharing, every unit has its own weights. With weight sharing, each of the units in a single 8x8 feature map shares the same set of nine (3x3) weights, but has its own bias parameter: the same operation is applied to different parts of the image.

A training epoch is one sweep through the entire training set.

- Too few epochs: underfitting.
- Too many epochs: potential overfitting.
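Online training by epochs can be sketched as nested loops: an outer loop over sweeps and an inner loop over shuffled observations. The `run_epochs` helper and the LMS-style update below are illustrative assumptions, not from the slides:

```python
import numpy as np

def run_epochs(X, y, theta, update, n_epochs):
    """One epoch = one full sweep through the training set; online
    training updates the parameters one observation at a time."""
    for _ in range(n_epochs):
        for i in np.random.permutation(len(X)):   # shuffle each sweep
            theta = update(theta, X[i], y[i])     # one online update
    return theta

# toy example: learn theta in y = theta * x with an online LMS-style rule
X = np.linspace(-1.0, 1.0, 20)
y = 2.0 * X
lms = lambda th, x, t: th + 0.5 * (t - th * x) * x
theta = run_epochs(X, y, 0.0, lms, n_epochs=50)
print(round(theta, 3))  # close to 2.0
```

Stopping after a few epochs rather than running to convergence is exactly the early-stopping regularization discussed above.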
