Planar data classification with one hidden layer

1 – Packages

Let’s first import all the packages that you will need during this assignment. 
– numpy is the fundamental package for scientific computing with Python. 
– sklearn provides simple and efficient tools for data mining and data analysis. 
– matplotlib is a library for plotting graphs in Python. 
– testCases_v2 provides some test examples to assess the correctness of your functions.
– planar_utils provides various useful functions used in this assignment.

2 – Dataset

First, let’s get the dataset you will work on. The following code will load a “flower” 2-class dataset into variables X and Y.

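```python
import numpy as np
import matplotlib.pyplot as plt
# load_planar_dataset, plot_decision_boundary and sigmoid are helpers shipped with the assignment
from planar_utils import load_planar_dataset, plot_decision_boundary, sigmoid

X, Y = load_planar_dataset()
```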
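load_planar_dataset() comes from the course-provided planar_utils module. If you want to experiment without that module, the sketch below is a hypothetical stand-in, make_flower_dataset(). It draws a similar two-class "flower" from a rose curve and splits the angles between the two classes. It is an assumption modelled on the description in this section, not the course's own generator, so its points will not match the course data exactly.

```python
import numpy as np

def make_flower_dataset(m=400, petal_radius=4.0, noise=0.2, seed=1):
    """Hypothetical stand-in for planar_utils.load_planar_dataset().

    Returns X of shape (2, m) and Y of shape (1, m) with labels 0 (red) and 1 (blue).
    """
    rng = np.random.default_rng(seed)
    n = m // 2                                   # points per class
    X = np.zeros((m, 2))                         # one row per example while building
    Y = np.zeros((m, 1), dtype=int)
    for j in range(2):                           # class 0 and class 1
        ix = slice(n * j, n * (j + 1))
        # Each class sweeps half a turn of angles; Gaussian noise blurs the petals.
        t = np.linspace(j * 3.12, (j + 1) * 3.12, n) + rng.normal(0, noise, n)
        r = petal_radius * np.sin(4 * t) + rng.normal(0, noise, n)   # rose-curve radius
        X[ix] = np.column_stack([r * np.sin(t), r * np.cos(t)])
        Y[ix] = j
    return X.T, Y.T                              # transpose to (features, examples)

X_demo, Y_demo = make_flower_dataset()
print(X_demo.shape, Y_demo.shape)                # (2, 400) (1, 400)
```

Both returned arrays put one example per column. This is the same layout the rest of the assignment assumes for X and Y.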

Visualize the dataset using matplotlib. The data looks like a “flower” with some red (label y=0) and some blue (y=1) points. Your goal is to build a model to fit this data.

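```python
# np.squeeze removes the singleton dimension of Y so it can be used as the color array
plt.scatter(X[0, :], X[1, :], c=np.squeeze(Y), s=40, cmap=plt.cm.Spectral);
```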

Exercise: How many training examples do you have? In addition, what is the shape of the variables X and Y?

```python
shape_X = X.shape
shape_Y = Y.shape
m = X.shape[1]  # training set size

print("X's shape is " + str(shape_X))
print("Y's shape is " + str(shape_Y))
print("We have " + str(m) + " examples.")
```


3 – Simple Logistic Regression

Before building a full neural network, let's first see how logistic regression performs on this problem. You can use sklearn's built-in functions to do that. Run the code below to train a logistic regression classifier on the dataset.

```python
import sklearn.linear_model

# Train the logistic regression classifier (sklearn expects one row per example, hence X.T)
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X.T, np.ravel(Y))

# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, np.squeeze(Y))
plt.title("Logistic Regression")

# Print accuracy: the fraction of examples whose prediction matches the label
LR_predictions = clf.predict(X.T)
accuracy = np.mean(LR_predictions == np.ravel(Y)) * 100
print("Accuracy of logistic regression: %d %% (percentage of correctly labelled datapoints)" % accuracy)
```
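The assignment's accuracy expressions are often written with dot products: Y·p counts the examples that are correctly predicted as 1, and (1−Y)·(1−p) counts those correctly predicted as 0. The small sketch below uses hypothetical toy labels. It shows that this formula gives the same percentage as averaging element-wise equality, which is the form used in the code above.

```python
import numpy as np

def accuracy_dot(Y, p):
    """Percentage correct via dot products: Y.p counts true 1s, (1-Y).(1-p) counts true 0s."""
    Y, p = np.ravel(Y), np.ravel(p)
    correct = np.dot(Y, p) + np.dot(1 - Y, 1 - p)
    return 100.0 * correct / Y.size

def accuracy_mean(Y, p):
    """Percentage correct as the mean of element-wise equality."""
    return 100.0 * np.mean(np.ravel(Y) == np.ravel(p))

Y_toy = np.array([[1, 0, 1, 1, 0]])   # hypothetical labels, shape (1, m) like Y
p_toy = np.array([1, 0, 0, 1, 1])     # hypothetical predictions, shape (m,) like clf.predict
print(accuracy_dot(Y_toy, p_toy), accuracy_mean(Y_toy, p_toy))   # both give 60.0
```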


4 – Neural Network model

Logistic regression did not work well on the “flower dataset”. You are going to train a Neural Network with a single hidden layer.

Reminder: The general methodology to build a Neural Network is to:
1. Define the neural network structure (# of input units, # of hidden units, etc).
2. Initialize the model's parameters.
3. Loop:
– Implement forward propagation
– Compute loss
– Implement backward propagation to get the gradients
– Update parameters (gradient descent)

You often build helper functions to compute steps 1-3 and then merge them into one function we call nn_model(). Once you've built nn_model() and learnt the right parameters, you can make predictions on new data.
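For reference, the classifier built in this section can be written out for one example $x^{(i)}$ as shown below. This is the standard formulation of a network with a tanh hidden layer and a sigmoid output unit. It is stated here as a summary that matches the code in this section, not copied from the course material.

```latex
\begin{aligned}
z^{[1](i)} &= W^{[1]} x^{(i)} + b^{[1]} \\
a^{[1](i)} &= \tanh\left(z^{[1](i)}\right) \\
z^{[2](i)} &= W^{[2]} a^{[1](i)} + b^{[2]} \\
\hat{y}^{(i)} = a^{[2](i)} &= \sigma\left(z^{[2](i)}\right) \\
J &= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[2](i)}\right) \right)
\end{aligned}
```

Stacking the $m$ examples as columns turns $x^{(i)}$, $z^{[1](i)}$, and so on into the matrices X, Z1, A1, Z2 and A2 that the functions below pass around.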

4.1 – Defining the neural network structure

Exercise: Define three variables: 
– n_x: the size of the input layer 
– n_h: the size of the hidden layer (set this to 4) 
– n_y: the size of the output layer 

Hint: Use shapes of X and Y to find n_x and n_y. Also, hard code the hidden layer size to be 4.

```python
# GRADED FUNCTION: layer_sizes

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)

    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    n_x = X.shape[0]  # size of input layer
    n_h = 4
    n_y = Y.shape[0]  # size of output layer
    ### END CODE HERE ###
    return (n_x, n_h, n_y)
```
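The three layer sizes fix the shape of every parameter. The sketch below uses two hypothetical helpers, parameter_shapes() and count_parameters(), that spell out those shapes and total them. It applies them to the flower data, where X has 2 rows and Y has 1 row.

```python
def parameter_shapes(n_x, n_h, n_y):
    """Shapes of W1, b1, W2, b2 implied by the three layer sizes."""
    return {"W1": (n_h, n_x), "b1": (n_h, 1), "W2": (n_y, n_h), "b2": (n_y, 1)}

def count_parameters(n_x, n_h, n_y):
    """Total number of trainable values (rows * columns summed over all parameters)."""
    return sum(rows * cols for rows, cols in parameter_shapes(n_x, n_h, n_y).values())

# Flower data: n_x = 2 input features, n_h = 4 hidden units, n_y = 1 output unit.
print(parameter_shapes(2, 4, 1))   # {'W1': (4, 2), 'b1': (4, 1), 'W2': (1, 4), 'b2': (1, 1)}
print(count_parameters(2, 4, 1))   # 8 + 4 + 4 + 1 = 17
```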

4.2 – Initialize the model’s parameters

Exercise: Implement the function initialize_parameters().

– Make sure your parameters' sizes are right: W1 is (n_h, n_x), b1 is (n_h, 1), W2 is (n_y, n_h) and b2 is (n_y, 1).
– You will initialize the weight matrices with random values.
– Use np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
– You will initialize the bias vectors as zeros.
– Use np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.

```python
# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(2)  # we set up a seed so that your output matches ours although the initialization is random

    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h, n_x) * 0.01  # small random weights, as the instructions require
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###

    assert W1.shape == (n_h, n_x)
    assert b1.shape == (n_h, 1)
    assert W2.shape == (n_y, n_h)
    assert b2.shape == (n_y, 1)

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
```
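Why scale by 0.01? One common explanation is that small weights keep the hidden pre-activations near zero. There, tanh is almost linear and its slope, 1 − tanh(z)², is close to 1. Large weights push tanh into saturation, where the slope, and therefore the gradients, shrink. The sketch below compares the average slope under both scalings. It uses hypothetical random inputs in [−4, 4] and a hypothetical helper, mean_tanh_slope().

```python
import numpy as np

def mean_tanh_slope(W, X):
    """Average of tanh'(z) = 1 - tanh(z)**2 over all hidden pre-activations z = W @ X."""
    Z = W @ X
    return float(np.mean(1.0 - np.tanh(Z) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-4, 4, size=(2, 400))   # hypothetical inputs: 2 features, 400 examples
W = rng.standard_normal((4, 2))         # unscaled weights, like plain np.random.randn(4, 2)

slope_small = mean_tanh_slope(W * 0.01, X)   # pre-activations stay near 0
slope_large = mean_tanh_slope(W, X)          # many pre-activations land in the flat tails
print(slope_small, slope_large)
```

Scaling W by 0.01 scales every z by 0.01. Since the slope of tanh only decreases as |z| grows, the small-weight average is always the larger of the two.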


4.3 – The Loop

Question: Implement forward_propagation().

– Your classifier uses a tanh activation in the hidden layer and a sigmoid activation in the output layer.
– You can use the function sigmoid(). It is built-in (imported) in the notebook.
– You can use the function np.tanh(). It is part of the numpy library.
– The steps you have to implement are:
1. Retrieve each parameter from the dictionary "parameters" (which is the output of initialize_parameters()) by using parameters[".."].
2. Implement Forward Propagation. Compute Z[1], A[1], Z[2] and A[2] (A[2] is the vector of all your predictions on all the examples in the training set).
– Values needed in the backpropagation are stored in "cache". The cache will be given as an input to the backpropagation function.

```python
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)

    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    ### END CODE HERE ###

    # Implement Forward Propagation to calculate A2 (probabilities)
    ### START CODE HERE ### (≈ 4 lines of code)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)          # hidden layer: tanh activation
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)          # output layer: sigmoid, so A2 is a probability in (0, 1)
    ### END CODE HERE ###

    assert A2.shape == (1, X.shape[1])

    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache
```
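A quick way to check the dimensions is to trace one forward pass on made-up data. The standalone sketch below assumes hypothetical sizes n_x = 2, n_h = 4 and m = 5. It defines its own sigmoid() as a stand-in for the notebook helper and prints the shape of each intermediate result.

```python
import numpy as np

def sigmoid(z):
    # Hypothetical stand-in for the notebook's sigmoid() helper
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 2, 4, 1, 5              # hypothetical sizes
X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

Z1 = np.dot(W1, X) + b1     # (4, 2) @ (2, 5) + (4, 1) broadcast -> (4, 5)
A1 = np.tanh(Z1)            # (4, 5), values in (-1, 1)
Z2 = np.dot(W2, A1) + b2    # (1, 4) @ (4, 5) + (1, 1) broadcast -> (1, 5)
A2 = sigmoid(Z2)            # (1, 5), values in (0, 1)

for name, arr in [("Z1", Z1), ("A1", A1), ("Z2", Z2), ("A2", A2)]:
    print(name, arr.shape)
```

The sigmoid on the output is what keeps A2 strictly between 0 and 1. compute_cost() relies on this: np.log(A2) and np.log(1 - A2) are only finite for such values, and a tanh output could be negative.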


Exercise: Implement compute_cost() to compute the value of the cost J.

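– There are many ways to implement the cross-entropy loss. To help you, here is how the first term, the sum of y·log(a[2]), can be implemented: logprobs = np.multiply(np.log(A2), Y), then cost = - np.sum(logprobs).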

```python
# GRADED FUNCTION: compute_cost

def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost J

    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2

    Returns:
    cost -- cross-entropy cost J
    """
    m = Y.shape[1]  # number of examples

    # Compute the cross-entropy cost
    ### START CODE HERE ### (≈ 2 lines of code)
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
    cost = -np.sum(logprobs) / m
    ### END CODE HERE ###

    cost = float(np.squeeze(cost))  # makes sure cost is a plain float, e.g. turns [[17]] into 17.0
    assert isinstance(cost, float)

    return cost
```
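To see what the vectorized expression computes, the sketch below evaluates it on three hypothetical examples. It compares the result with the same cost written out term by term. For each example, only log(a) contributes when y = 1, and only log(1 − a) contributes when y = 0.

```python
import numpy as np

A2 = np.array([[0.8, 0.1, 0.6]])   # hypothetical predicted probabilities
Y = np.array([[1, 0, 1]])          # hypothetical true labels
m = Y.shape[1]

# Vectorized cross-entropy, as in compute_cost()
vectorized = -np.sum(np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)) / m

# The same cost written out example by example
by_hand = -(np.log(0.8) + np.log(1 - 0.1) + np.log(0.6)) / 3

print(float(vectorized), float(by_hand))
```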



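The backward_propagation() code below mentions six equations from a slide that is not reproduced here. The following reconstruction matches the code line for line. $\odot$ is element-wise multiplication, $1 - (A^{[1]})^2$ is the derivative of tanh written in terms of its output, and each sum runs over the $m$ examples (np.sum(..., axis=1, keepdims=True)).

```latex
\begin{aligned}
dZ^{[2]} &= A^{[2]} - Y \\
dW^{[2]} &= \frac{1}{m}\, dZ^{[2]} A^{[1]T} \\
db^{[2]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)} \\
dZ^{[1]} &= W^{[2]T} dZ^{[2]} \odot \left(1 - \left(A^{[1]}\right)^{2}\right) \\
dW^{[1]} &= \frac{1}{m}\, dZ^{[1]} X^{T} \\
db^{[1]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}
\end{aligned}
```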
```python
# GRADED FUNCTION: backward_propagation

def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation using the instructions above.

    Arguments:
    parameters -- python dictionary containing our parameters
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (2, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]

    # First, retrieve W1 and W2 from the dictionary "parameters".
    ### START CODE HERE ### (≈ 2 lines of code)
    W1 = parameters['W1']
    W2 = parameters['W2']
    ### END CODE HERE ###

    # Retrieve also A1 and A2 from dictionary "cache".
    ### START CODE HERE ### (≈ 2 lines of code)
    A1 = cache['A1']
    A2 = cache['A2']
    ### END CODE HERE ###

    # Backward propagation: calculate dW1, db1, dW2, db2.
    ### START CODE HERE ### (≈ 6 lines of code, one per backpropagation equation)
    dZ2 = A2 - Y
    dW2 = (1.0 / m) * np.dot(dZ2, A1.T)
    db2 = (1.0 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))  # tanh'(Z1) = 1 - A1**2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    ### END CODE HERE ###

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads
```
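A common way to check a backward pass is numerical gradient checking. You nudge each parameter by ±ε and compare the finite-difference slope of the cost with the analytic gradient. The sketch below re-implements a minimal forward pass, cost and backward pass so it runs on its own. The names forward, cost, backward and numerical_grad are hypothetical helpers, not part of the assignment. The sketch also uses unscaled random weights and random data, only so that the gradients are not tiny. A very small relative error (say, below 1e-6) is a good sign; a large one points to a bug.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    return A1, A2

def cost(params, X, Y):
    _, A2 = forward(params, X)
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / Y.shape[1]

def backward(params, X, Y):
    # Same six equations as backward_propagation()
    m = X.shape[1]
    A1, A2 = forward(params, X)
    dZ2 = A2 - Y
    dZ1 = (params["W2"].T @ dZ2) * (1 - A1 ** 2)
    return {"W2": dZ2 @ A1.T / m, "b2": dZ2.sum(axis=1, keepdims=True) / m,
            "W1": dZ1 @ X.T / m, "b1": dZ1.sum(axis=1, keepdims=True) / m}

def numerical_grad(params, X, Y, name, eps=1e-6):
    # Central differences (J(theta + eps) - J(theta - eps)) / (2 * eps), one entry at a time
    grad = np.zeros_like(params[name])
    for idx in np.ndindex(grad.shape):
        original = params[name][idx]
        params[name][idx] = original + eps
        plus = cost(params, X, Y)
        params[name][idx] = original - eps
        minus = cost(params, X, Y)
        params[name][idx] = original          # restore the parameter
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 6))                        # hypothetical data: 2 features, 6 examples
Y = (rng.random((1, 6)) > 0.5).astype(float)
params = {"W1": rng.standard_normal((4, 2)), "b1": rng.standard_normal((4, 1)),
          "W2": rng.standard_normal((1, 4)), "b2": rng.standard_normal((1, 1))}

analytic = backward(params, X, Y)
rel_errors = {}
for name in ["W1", "b1", "W2", "b2"]:
    num = numerical_grad(params, X, Y, name)
    diff = np.linalg.norm(num - analytic[name])
    rel_errors[name] = diff / (np.linalg.norm(num) + np.linalg.norm(analytic[name]))
    print(name, rel_errors[name])
```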


Question: Implement the update rule. Use gradient descent. You have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).

General gradient descent rule: $\theta = \theta - \alpha \frac{\partial J}{\partial \theta}$ where $\alpha$ is the learning rate and $\theta$ represents a parameter.

Illustration: The gradient descent algorithm with a good learning rate (converging) and a bad learning rate (diverging). Images courtesy of Adam Harley.
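The converging and diverging behaviour in the illustration can be reproduced on a one-dimensional toy problem. The sketch below minimizes a hypothetical objective J(θ) = θ², which is not the network's cost. Its gradient is 2θ, so each update multiplies θ by (1 − 2α). The hypothetical helper gradient_descent_1d() applies the general rule above.

```python
def gradient_descent_1d(alpha, theta=1.0, steps=50):
    """Minimize J(theta) = theta**2 with theta = theta - alpha * dJ/dtheta, where dJ/dtheta = 2*theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(gradient_descent_1d(alpha=0.1))   # factor 0.8 per step: shrinks toward the minimum at 0
print(gradient_descent_1d(alpha=1.1))   # factor -1.2 per step: oscillates with growing size
```

Any α between 0 and 1 gives a factor with |1 − 2α| < 1 and converges. Any α above 1 overshoots by more each step and diverges.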

```python
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate=1.2):
    """
    Updates parameters using the gradient descent update rule given above

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients

    Returns:
    parameters -- python dictionary containing your updated parameters
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    ### END CODE HERE ###

    # Retrieve each gradient from the dictionary "grads"
    ### START CODE HERE ### (≈ 4 lines of code)
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    ### END CODE HERE ###

    # Update rule for each parameter
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
```


4.4 – Integrate parts 4.1, 4.2 and 4.3 in nn_model()

Question: Build your neural network model in nn_model().

Instructions: The neural network model has to use the previous functions in the right order.

```python
# GRADED FUNCTION: nn_model

def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (2, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 1000 iterations

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]

    # Initialize parameters, then retrieve W1, b1, W2, b2.
    # Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters".
    ### START CODE HERE ### (≈ 5 lines of code)
    parameters = initialize_parameters(n_x, n_h, n_y)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    ### END CODE HERE ###

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        ### START CODE HERE ### (≈ 4 lines of code)
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)

        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)

        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)

        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads)
        ### END CODE HERE ###

        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    return parameters
```


4.5 – Predictions

Question: Use your model to predict by building predict(). Use forward propagation to predict results.

Reminder: predictions = $y_{prediction} = \mathbb{1}\{activation > 0.5\} = \begin{cases} 1 & \text{if } activation > 0.5 \\ 0 & \text{otherwise} \end{cases}$

As an example, if you would like to set the entries of a matrix X to 0 and 1 based on a threshold you would do: X_new = (X > threshold). This gives a boolean matrix; add .astype(int) if you need the integers 0 and 1.
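Applied to output activations, the threshold rule looks like the sketch below, which uses hypothetical values of A2. An activation of exactly 0.5 maps to 0, matching the strict "> 0.5" in the rule above.

```python
import numpy as np

A2 = np.array([[0.2, 0.5, 0.51, 0.9]])   # hypothetical output activations, shape (1, m)
predictions = (A2 > 0.5).astype(int)     # boolean mask -> integers 0/1, same shape as A2
print(predictions)                        # [[0 0 1 1]]
```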

```python
# GRADED FUNCTION: predict

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X

    Arguments:
    parameters -- python dictionary containing your parameters
    X -- input data of size (n_x, m)

    Returns:
    predictions -- vector of predictions of our model (red: 0 / blue: 1)
    """
    # Computes probabilities using forward propagation,
    # and classifies to 0/1 using 0.5 as the threshold.
    ### START CODE HERE ### (≈ 2 lines of code)
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5).astype(int)  # 1 where activation > 0.5, else 0; shape (1, m)
    ### END CODE HERE ###

    return predictions
```

```python
from testCases_v2 import predict_test_case

parameters, X_assess = predict_test_case()
predictions = predict(parameters, X_assess)
print("predictions mean = " + str(np.mean(predictions)))
```

```python
# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h=4, num_iterations=10000, print_cost=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, np.squeeze(Y))
plt.title("Decision Boundary for hidden layer size " + str(4))

# Print accuracy: percentage of examples whose prediction matches the label
predictions = predict(parameters, X)
print("Accuracy: %d%%" % (np.mean(predictions == Y) * 100))
```

Figure: Decision Boundary for hidden layer size 4 (image not reproduced).

Accuracy: 88%