lec4. Backpropagation & Neural Networks
★ 1. Backpropagation
2. Neural Networks
1. Backpropagation
- This is what a computational graph looks like (an example).
1) Backpropagation: Upstream Gradient x Local Gradient
[Simple Example]
- $ f(x,y,z) = (x+y)z $
- $ x=-2, y=5, z=-4 $
- want to know $ \frac{ \partial f }{ \partial x}, \frac{ \partial f }{ \partial y}, \frac{ \partial f }{ \partial z} $
① Compute the local gradient at each node.
② Multiply the upstream gradient by the local gradient (chain rule); this is worked out below with the given values.
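Worked through with the given values, introducing an intermediate node $ q = x + y $:
- forward: $ q = x + y = 3, \quad f = qz = -12 $
- local gradients: $ \frac{\partial f}{\partial z} = q = 3, \quad \frac{\partial f}{\partial q} = z = -4, \quad \frac{\partial q}{\partial x} = \frac{\partial q}{\partial y} = 1 $
- chain rule: $ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial x} = -4, \quad \frac{\partial f}{\partial y} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial y} = -4 $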
[Another Example]
- $ f(W, x) = \frac{1}{1+{e}^{-(w_{0}x_{0}+w_{1}x_{1}+w_{2})}} $
- $ w_{0}=2, x_{0}=1, w_{1} = -3, x_{1} =-2, w_{2} = -3 $
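Note that the whole expression is the sigmoid $ \sigma(x) = \frac{1}{1+e^{-x}} $ applied to the score $ w_{0}x_{0}+w_{1}x_{1}+w_{2} $, so the entire sigmoid gate can be backpropagated in one step using its local gradient:
- $ \frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1+e^{-x})^2} = (1 - \sigma(x))\,\sigma(x) $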
- Patterns in backward flow
① add gate: gradient distributor — the upstream gradient is passed unchanged to both inputs
② max gate: gradient router — the input that was larger in the forward pass receives the full upstream gradient, the other input gets 0
③ mul gate: gradient switcher — (upstream gradient) × (the other input's forward-pass value); see the sketch below
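A minimal sketch of these three backward rules as plain Python functions (the names add_backward / max_backward / mul_backward and the arguments upstream, x, y are hypothetical, only meant to make the rules concrete):

def add_backward(upstream):
    # gradient distributor: both inputs receive the upstream gradient unchanged
    return upstream, upstream

def max_backward(upstream, x, y):
    # gradient router: the input that was larger in the forward pass gets the full
    # upstream gradient, the other input gets 0 (ties broken toward x here)
    return upstream * (x >= y), upstream * (y > x)

def mul_backward(upstream, x, y):
    # gradient switcher: each input's gradient is the upstream gradient times
    # the other input's forward-pass value
    return upstream * y, upstream * x

# e.g. mul gate with forward inputs x=3, y=-4 and upstream gradient 2:
# mul_backward(2.0, 3.0, -4.0) -> (-8.0, 6.0)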
2) Vectorized Operations
- Gradients for vectorized code
- Jacobian matrix: the matrix of partial derivatives of a vector-valued function of several variables, $ J_{ij} = \frac{\partial f_i}{\partial x_j} $ — every output dimension is differentiated with respect to every input dimension
- Vectorized Operations
* Questions raised in the lecture
① What is the size of the Jacobian matrix? (4096 × 4096)
: In practice the whole minibatch is processed at once; e.g., with a minibatch of 100, the Jacobian becomes a 409,600 × 409,600 matrix.
② What does it look like?
- $ f(x, W) = {|| W \cdot x ||}^2 = \sum_{i=1}^n {(W \cdot x)}_i^2 $
- $ x \in {R}^n, W \in {R}^{n \times n} $
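With the intermediate $ q = W \cdot x $, the gradients work out to $ \nabla_W f = 2\,q\,x^T $ and $ \nabla_x f = 2\,W^T q $. A minimal numpy sketch (the size n = 4 is an arbitrary illustrative choice):

import numpy as np

n = 4
W = np.random.randn(n, n)
x = np.random.randn(n)

q = W.dot(x)           # intermediate node q = W·x
f = np.sum(q ** 2)     # f(x, W) = ||W·x||^2

dq = 2.0 * q           # ∂f/∂q_i = 2 q_i
dW = np.outer(dq, x)   # ∇_W f = 2 q x^T, same shape as W (n x n)
dx = W.T.dot(dq)       # ∇_x f = 2 W^T q, same shape as x (n,)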
2. Neural Networks
(Before) Linear Score function: $ f = W_1x $
(Now) 2-layer Neural Network: $ f = W_2 \max (0, W_1x) $
3-layer Neural Network: $ f = W_3 \max (0, W_2 \max (0, W_1x)) $
...
n-layer Neural Network
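A minimal numpy sketch of the 2-layer score function $ f = W_2 \max(0, W_1x) $ (the sizes 3072 → 100 → 10 and the random weights are illustrative assumptions; in a real classifier W1 and W2 are learned):

import numpy as np

x = np.random.randn(3072)         # e.g. a flattened 32x32x3 image (assumed size)
W1 = np.random.randn(100, 3072)   # first-layer weights: 3072 -> 100 hidden units
W2 = np.random.randn(10, 100)     # second-layer weights: 100 -> 10 class scores

h = np.maximum(0, W1.dot(x))      # hidden layer: elementwise max(0, W1 x)
scores = W2.dot(h)                # f = W2 max(0, W1 x)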
1) Activation Functions
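The two activations used in this lecture's examples:
- Sigmoid: $ \sigma(x) = \frac{1}{1+e^{-x}} $
- ReLU: $ \max(0, x) $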
2) Neural Networks
# forward pass of a 3-layer neural network:
import numpy as np
f = lambda x: 1.0/(1.0 + np.exp(-x)) # activation function: sigmoid
# random placeholder weights/biases for a 3 -> 4 -> 4 -> 1 network (in practice these are learned)
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)
x = np.random.randn(3, 1) # random input vector of three numbers (3x1)
h1 = f(np.dot(W1, x) + b1) # calculate first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2) # calculate second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3 # output neuron (1x1)