2018-02-07

# Andrew Ng's Deep Learning Course, DeepLearning.ai Programming Assignment (2-1) Part 2

Let's first import the packages you are going to use.

Problem Statement: You have just been hired as an AI expert by the French Football Corporation. They would like you to recommend positions where France's goalkeeper should kick the ball so that the French team's players can hit it with their head.

## 1 - Non-regularized model

• regularization mode -- activated by setting the `lambd` input to a non-zero value. We use "`lambd`" instead of "`lambda`" because "`lambda`" is a reserved keyword in Python.

• dropout mode -- activated by setting `keep_prob` to a value less than 1

• L2 regularization -- functions: "`compute_cost_with_regularization()`" and "`backward_propagation_with_regularization()`"

• Dropout --      functions: "`forward_propagation_with_dropout()`" and "`backward_propagation_with_dropout()`"
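The mode switches above can be sketched as a small dispatch, roughly what happens inside the assignment's `model()` training loop. The placeholder functions and the names `pick_cost` / `pick_forward` are illustrative stand-ins so the sketch runs, not the course implementations:

```python
# Sketch of the mode dispatch in the training loop.
# The placeholder bodies stand in for the assignment's real helpers.

def compute_cost(A3, Y):                                         # placeholder
    return "plain cost"

def compute_cost_with_regularization(A3, Y, parameters, lambd):  # placeholder
    return "L2 cost"

def forward_propagation(X, parameters):                          # placeholder
    return "plain forward"

def forward_propagation_with_dropout(X, parameters, keep_prob):  # placeholder
    return "dropout forward"

def pick_cost(A3, Y, parameters, lambd):
    # L2 regularization mode: lambd is set to a non-zero value
    if lambd == 0:
        return compute_cost(A3, Y)
    return compute_cost_with_regularization(A3, Y, parameters, lambd)

def pick_forward(X, parameters, keep_prob):
    # dropout mode: keep_prob is set to a value less than 1
    if keep_prob == 1:
        return forward_propagation(X, parameters)
    return forward_propagation_with_dropout(X, parameters, keep_prob)
```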

Cost after iteration 0: 0.6557412523481002

Cost after iteration 10000: 0.1632998752572419

Cost after iteration 20000: 0.13851642423239133

On the training set:

Accuracy: 0.947867298578

On the test set:

Accuracy: 0.915

## 2 - L2 Regularization

Exercise: Implement `compute_cost_with_regularization()`, which computes the cost given by formula (2).

cost = 1.78648594516
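The exercise above can be sketched as the cross-entropy cost plus an L2 penalty of (lambd / 2m) times the sum of the squared weights over all three layers. This is a minimal sketch assuming a 3-layer network whose weights are stored as `W1`, `W2`, `W3` in a `parameters` dictionary, as in the assignment:

```python
import numpy as np

def compute_cost_with_regularization(A3, Y, parameters, lambd):
    """Cross-entropy cost plus the L2 penalty (lambd / (2m)) * sum of squared weights."""
    m = Y.shape[1]
    W1, W2, W3 = parameters["W1"], parameters["W2"], parameters["W3"]
    # Standard cross-entropy part of the cost
    cross_entropy_cost = -np.sum(Y * np.log(A3) + (1 - Y) * np.log(1 - A3)) / m
    # L2 regularization part: sum of squared entries of every weight matrix
    L2_regularization_cost = (lambd / (2 * m)) * (
        np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
    return cross_entropy_cost + L2_regularization_cost
```

With `lambd = 0` this reduces to the plain cross-entropy cost, which is why the non-regularized model above can share the same training loop.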

Exercise: Implement the changes needed in backward propagation to take into account regularization. The changes only concern dW1, dW2 and dW3. For each, you have to add the regularization term's gradient

dW1 = [[-0.25604646  0.12298827 -0.28297129]
 [-0.17706303  0.34536094 -0.4410571 ]]
dW2 = [[ 0.79276486  0.85133918]
 [-0.0957219  -0.01720463]
 [-0.13100772 -0.03750433]]
dW3 = [[-1.77691347 -0.11832879 -0.09397446]]
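The change described above adds the gradient of the L2 term, (lambd/m) * W, to each dW. A sketch of the full backward pass, assuming the assignment's ReLU hidden layers, sigmoid output, and 12-element `cache` layout:

```python
import numpy as np

def backward_propagation_with_regularization(X, Y, cache, lambd):
    """Backward pass where each dW gains the L2 gradient term (lambd / m) * W."""
    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache

    dZ3 = A3 - Y                                              # sigmoid output layer
    dW3 = (1. / m) * np.dot(dZ3, A2.T) + (lambd / m) * W3     # + L2 gradient
    db3 = (1. / m) * np.sum(dZ3, axis=1, keepdims=True)
    dA2 = np.dot(W3.T, dZ3)
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))                  # ReLU derivative
    dW2 = (1. / m) * np.dot(dZ2, A1.T) + (lambd / m) * W2     # + L2 gradient
    db2 = (1. / m) * np.sum(dZ2, axis=1, keepdims=True)
    dA1 = np.dot(W2.T, dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))                  # ReLU derivative
    dW1 = (1. / m) * np.dot(dZ1, X.T) + (lambd / m) * W1      # + L2 gradient
    db1 = (1. / m) * np.sum(dZ1, axis=1, keepdims=True)

    return {"dZ3": dZ3, "dW3": dW3, "db3": db3, "dA2": dA2,
            "dZ2": dZ2, "dW2": dW2, "db2": db2, "dA1": dA1,
            "dZ1": dZ1, "dW1": dW1, "db1": db1}
```

Note that only the `dW` terms change relative to the unregularized backward pass; the `db` terms are untouched because the bias vectors are not regularized.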

Use `compute_cost_with_regularization` instead of `compute_cost`, and `backward_propagation_with_regularization` instead of `backward_propagation`.

Cost after iteration 0: 0.6974484493131264

Cost after iteration 10000: 0.2684918873282239

Cost after iteration 20000: 0.2680916337127301

On the train set:

Accuracy: 0.938388625592

On the test set:

Accuracy: 0.93

What is L2-regularization actually doing?

L2-regularization relies on the assumption that a model with small weights is simpler than a model with large weights.

What L2-regularization affects:

• the cost computation: a regularization term is added to the cost

• the backpropagation function: there are extra terms in the gradients with respect to the weight matrices

• weights end up smaller ("weight decay"): weights are pushed to smaller values

## 3 - Dropout

A3 =[[ 0.36974721 0.00305176 0.04565099 0.49683389 0.36974721]]
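The A3 values above come from `forward_propagation_with_dropout()`. The masking-and-rescaling step applied to each hidden layer can be sketched as follows; the function name `dropout_forward` and the `rng` argument are illustrative (the assignment seeds `np.random` directly and works layer by layer inside the full forward pass):

```python
import numpy as np

def dropout_forward(A, keep_prob, rng=None):
    """Inverted dropout on one activation matrix: mask, shut down, then rescale."""
    rng = np.random.default_rng(rng)
    D = rng.random(A.shape) < keep_prob   # mask: keep each unit with prob. keep_prob
    A = np.multiply(A, D)                 # shut down the dropped neurons
    A = A / keep_prob                     # rescale to preserve the expected value
    return A, D                           # D is cached for the backward pass
```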

### 3.1 - Backward propagation with dropout

Exercise: Implement the backward propagation with dropout. As before, you are training a 3 layer network. Add dropout to the first and second hidden layers, using the masks D[1] and D[2] stored in the cache.

Instruction: Backpropagation with dropout is actually quite easy. You will have to carry out 2 Steps:

1. You had previously shut down some neurons during forward propagation, by applying a mask D[1] to A1. In backpropagation, you will have to shut down the same neurons, by reapplying the same mask D[1] to dA1.

2. During forward propagation, you had divided A1 by keep_prob. In backpropagation, you'll therefore have to divide dA1 by keep_prob again (the calculus interpretation is that if A[1] is scaled by keep_prob, then its derivative dA[1] is also scaled by the same keep_prob).
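The two steps above can be sketched as a small helper; the name `dropout_backward` is illustrative (the assignment applies these two lines inline to dA2 and dA1 inside the full backward pass):

```python
import numpy as np

def dropout_backward(dA, D, keep_prob):
    """Apply the two steps above to one gradient matrix."""
    dA = dA * D            # Step 1: shut down the same neurons as in the forward pass
    dA = dA / keep_prob    # Step 2: rescale by the same keep_prob
    return dA
```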

dA1 = [[ 0.36544439  0.         -0.00188233  0.         -0.17408748]

[ 0.65515713  0.         -0.00337459  0.         -0.        ]]

dA2 = [[ 0.58180856  0.         -0.00299679  0.         -0.27715731]

[ 0.          0.53159854 -0.          0.53159854 -0.34089673]

[ 0.          0.         -0.00292733  0.         -0.        ]]

Cost after iteration 0: 0.6543912405149825

Cost after iteration 10000: 0.061016986574905605

Cost after iteration 20000: 0.060582435798513114

On the train set:

Accuracy: 0.928909952607

On the test set:

Accuracy: 0.95

1. Dropout is a regularization technique.

2. Only use dropout during training. Don't use dropout at test time.

3. Apply dropout both during forward and backward propagation.

4. During training, divide each dropout layer by keep_prob to keep the same expected value for the activations. For example, if keep_prob is 0.5, then on average we shut down half of the nodes, so the output would be scaled by 0.5 since only the remaining half contribute to the solution. Dividing by 0.5 is equivalent to multiplying by 2, so the output now has the same expected value. You can check that this works even when keep_prob is a value other than 0.5.
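Point 4 is easy to verify numerically: with inverted dropout, the mean of the rescaled activations stays close to the mean of the original activations. A small check with keep_prob = 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.ones((1, 100000))                 # activations with expected value 1
keep_prob = 0.5
D = rng.random(A.shape) < keep_prob      # keep each unit with probability 0.5
A_drop = (A * D) / keep_prob             # shut down, then rescale by keep_prob
print(A.mean(), A_drop.mean())           # both means are close to 1
```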

## 4 - Conclusions

Here are the results of our three models:

| model | train accuracy | test accuracy |
| --- | --- | --- |
| 3-layer NN without regularization | 95% | 91.5% |
| 3-layer NN with L2-regularization | 94% | 93% |
| 3-layer NN with dropout | 93% | 95% |

Note that regularization hurts training set performance! This is because it limits the ability of the network to overfit to the training set. But since it ultimately gives better test accuracy, it is helping your system.

Congratulations for finishing this assignment! And also for revolutionizing French football.