2018-02-06

# Andrew Ng's DeepLearning.ai Deep Learning Course: Programming Assignment (2-1) Part 1

Tags:

• Speed up the convergence of gradient descent

• Increase the odds of gradient descent converging to a lower training (and generalization) error

## 1 – The Neural Network Model

• Zeros initialization -- setting initialization = "zeros" in the input argument.

• Random initialization -- setting initialization = "random" in the input argument. This initializes the weights to large random values.

• He initialization -- setting initialization = "he" in the input argument. This initializes the weights to random values scaled according to a paper by He et al., 2015.

Instructions: Please read through the code below quickly and run it. In the next part you will implement the three initialization methods that this model() calls.

## 2 – Zero Initialization

• Weight matrices (W[1], W[2], W[3], ..., W[L-1], W[L])

• Bias vectors (b[1], b[2], b[3], ..., b[L-1], b[L])

Exercise: Implement the following function to initialize all parameters to zeros. You'll see later that this does not work well since it fails to "break symmetry", but let's try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.
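A minimal sketch of this function, assuming the course's convention that W[l] has shape (layers_dims[l], layers_dims[l-1]) and b[l] has shape (layers_dims[l], 1) (the argument name `layers_dims` follows the assignment's convention):

```python
import numpy as np

def initialize_parameters_zeros(layers_dims):
    """Initialize every weight matrix and bias vector to zeros.

    layers_dims -- list of layer sizes, including the input layer.
    Returns a dict with keys "W1", "b1", ..., "WL", "bL".
    """
    parameters = {}
    L = len(layers_dims)  # number of layers, counting the input layer
    for l in range(1, L):
        # W[l]: (n_l, n_{l-1}); b[l]: (n_l, 1) -- all entries zero
        parameters["W" + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
```

For example, `initialize_parameters_zeros([3, 2, 1])` returns W1 of shape (2, 3), b1 of shape (2, 1), W2 of shape (1, 2), and b2 of shape (1, 1), all zeros.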

Test it with the model code from Part 1; training produces the following output:

Cost after iteration 0: 0.6931471805599453
Cost after iteration 1000: 0.6931471805599453
Cost after iteration 2000: 0.6931471805599453
Cost after iteration 3000: 0.6931471805599453
Cost after iteration 4000: 0.6931471805599453
Cost after iteration 5000: 0.6931471805599453
Cost after iteration 6000: 0.6931471805599453
Cost after iteration 7000: 0.6931471805599453
Cost after iteration 8000: 0.6931471805599453
Cost after iteration 9000: 0.6931471805599453
Cost after iteration 10000: 0.6931471805599455
Cost after iteration 11000: 0.6931471805599453
Cost after iteration 12000: 0.6931471805599453
Cost after iteration 13000: 0.6931471805599453
Cost after iteration 14000: 0.6931471805599453

On the train set:

Accuracy: 0.5

On the test set:

Accuracy: 0.5

• To break symmetry, the weights W[l] should be initialized randomly.

• It is fine to initialize b[l] to zeros: symmetry is still broken as long as the weight matrices are initialized randomly.
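A tiny demo (not part of the assignment) of why zero weights fail to break symmetry: with W1 = 0, every hidden unit computes the same activation for every input, so every unit receives the same gradient and they can never learn different features.

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(3, 5)        # 3 input features, 5 examples
W1 = np.zeros((4, 3))            # 4 hidden units, all-zero weights
b1 = np.zeros((4, 1))
A1 = np.maximum(0, W1 @ X + b1)  # ReLU activations

# Every row (hidden unit) outputs exactly the same values,
# regardless of the input -- the units are indistinguishable.
print(np.allclose(A1, A1[0]))    # True
```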

## 3 – Random Initialization

Exercise: Implement the following function to initialize your weights to large random values and your biases to zeros. Use np.random.randn(..,..) * 10 for weights and np.zeros((.., ..)) for biases. We are using a fixed np.random.seed(..) to make sure your "random" weights match ours, so don't worry if running your code several times always gives you the same initial values for the parameters.
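A sketch of what this looks like; the seed value 3 is an assumption about the course notebook, and the ×10 scaling is deliberately (too) large, as the exercise asks:

```python
import numpy as np

def initialize_parameters_random(layers_dims):
    """Initialize weights to large random values (scaled by 10), biases to zeros."""
    np.random.seed(3)  # fixed seed so results are reproducible (assumed value)
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        # Large random weights: standard normal draws scaled by 10
        parameters["W" + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * 10
        # Zero biases are fine -- the random W already breaks symmetry
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
```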

Cost after iteration 0: inf
Cost after iteration 1000: 0.6237287551108738
Cost after iteration 2000: 0.5981106708339466
Cost after iteration 3000: 0.5638353726276827
Cost after iteration 4000: 0.550152614449184
Cost after iteration 5000: 0.5444235275228304
Cost after iteration 6000: 0.5374184054630083
Cost after iteration 7000: 0.47357131493578297
Cost after iteration 8000: 0.39775634899580387
Cost after iteration 9000: 0.3934632865981078
Cost after iteration 10000: 0.39202525076484457
Cost after iteration 11000: 0.38921493051297673
Cost after iteration 12000: 0.38614221789840486
Cost after iteration 13000: 0.38497849983013926
Cost after iteration 14000: 0.38278397192120406

On the train set:
Accuracy: 0.83
On the test set:
Accuracy: 0.86

The cost starts out very high. This is because with large random-valued weights, the last activation (sigmoid) outputs results that are very close to 0 or 1 for some examples, and when it gets such an example wrong it incurs a very high loss for it. Indeed, when log(a[3]) = log(0), the loss goes to infinity.

Poor initialization can lead to vanishing/exploding gradients, which also slows down the optimization algorithm.

If you train this network longer you will see better results, but initializing with overly large random numbers slows down optimization.
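The "Cost after iteration 0: inf" line above comes exactly from this divergence; a one-line numpy check illustrates it:

```python
import numpy as np

# When a sigmoid output a3 saturates to exactly 0 for a positive example,
# the cross-entropy term -log(a3) diverges to infinity.
with np.errstate(divide="ignore"):  # suppress the divide-by-zero warning
    loss = -np.log(0.0)
print(loss)  # inf
```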

• Initializing the weight matrices with very large values does not work well.

• Initializing with smaller random values should work better.

## 4 – He Initialization

Exercise: Implement the following function to initialize your parameters with He initialization.

Hint: This function is similar to the previous initialize_parameters_random(...). The only difference is that instead of multiplying np.random.randn(..,..) by 10, you will multiply it by sqrt(2/(dimension of the previous layer)), which is what He initialization recommends for layers with a ReLU activation.
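Following that hint, a sketch of the He version (same structure as the random version above, with the scaling factor swapped; the seed value is again an assumption):

```python
import numpy as np

def initialize_parameters_he(layers_dims):
    """Initialize weights with He initialization, biases to zeros."""
    np.random.seed(3)  # fixed seed for reproducibility (assumed value)
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        # Scale by sqrt(2 / n_{l-1}), the He et al. (2015) recommendation
        # for layers with a ReLU activation
        parameters["W" + str(l)] = (np.random.randn(layers_dims[l], layers_dims[l - 1])
                                    * np.sqrt(2.0 / layers_dims[l - 1]))
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
```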

## 5 – Conclusions

| Model | Train accuracy | Problem/Comment |
| --- | --- | --- |
| 3-layer NN with zeros initialization | 50% | fails to break symmetry |
| 3-layer NN with large random initialization | 83% | too large weights |
| 3-layer NN with He initialization | 99% | recommended method |

(1) Different initializations lead to different results.
(2) Random initialization is used to break symmetry and ensure that different hidden units can learn different things.
(3) Do not initialize with values that are too large.
(4) He initialization works well for networks with ReLU activations.