3 – General Architecture of the learning algorithm

It’s time to design a simple algorithm to distinguish cat images from non-cat images.

You will build a Logistic Regression, using a Neural Network mindset. The following Figure explains why Logistic Regression is actually a very simple Neural Network!


Key steps
In this exercise, you will carry out the following steps: 
– Initialize the parameters of the model 
– Learn the parameters for the model by minimizing the cost 
– Use the learned parameters to make predictions (on the test set) 
– Analyse the results and conclude

4 – Building the parts of our algorithm

The main steps for building a Neural Network are: 
1. Define the model structure (such as number of input features) 
2. Initialize the model’s parameters 
3. Loop: 
– Calculate current loss (forward propagation) 
– Calculate current gradient (backward propagation) 
– Update parameters (gradient descent)

You often build steps 1–3 separately and then integrate them into one function, which we call model().

4.1 – Helper functions

Exercise: Using your code from “Python Basics”, implement sigmoid()
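One possible sketch of this helper, assuming NumPy is available and that the input may be either a scalar or an array. It is an illustration of the standard sigmoid formula 1 / (1 + e^(-z)), not necessarily the exact solution expected by the exercise.

```python
import numpy as np

def sigmoid(z):
    """Return the sigmoid of z, element-wise.

    z may be a Python scalar or a NumPy array of any shape.
    """
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# sigmoid(0) is exactly 0.5; large positive inputs approach 1.
print(sigmoid(np.array([0.0, 2.0])))
```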

4.2 – Initializing parameters

Exercise: Implement parameter initialization in the cell below. You have to initialize w as a vector of zeros. If you don’t know what numpy function to use, look up np.zeros() in the Numpy library’s documentation.
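A minimal sketch of zero initialization. The function name `initialize_with_zeros`, the column-vector shape `(dim, 1)` for w, and the scalar bias `b = 0.0` are assumptions made for illustration; the text above only requires w to be a vector of zeros.

```python
import numpy as np

def initialize_with_zeros(dim):
    """Create a (dim, 1) weight vector of zeros and a scalar bias of 0.

    dim is the number of input features (hypothetical parameter name).
    """
    w = np.zeros((dim, 1))  # column vector, one weight per feature
    b = 0.0                 # bias starts at zero as well
    return w, b

w, b = initialize_with_zeros(3)
print(w.shape, b)
```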

4.3 – Forward and Backward propagation

Now that your parameters are initialized, you can do the “forward” and “backward” propagation steps for learning the parameters.

Exercise: Implement a function propagate() that computes the cost function and its gradient.
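The sketch below shows one way propagate() could look. The cost formula is not spelled out in this section, so the code assumes the usual cross-entropy cost for logistic regression, with X of shape (number of features, number of examples) and Y of shape (1, number of examples). The dictionary keys "dw" and "db" are illustrative names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Compute the cost and its gradient for logistic regression.

    Assumed shapes: w (n, 1), b scalar, X (n, m), Y (1, m).
    """
    m = X.shape[1]

    # Forward propagation: activations and cross-entropy cost.
    A = sigmoid(w.T @ X + b)                                   # shape (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward propagation: gradients of the cost w.r.t. w and b.
    dZ = A - Y                                                 # shape (1, m)
    dw = (X @ dZ.T) / m                                        # shape (n, 1)
    db = float(np.sum(dZ) / m)

    return {"dw": dw, "db": db}, float(cost)
```

With w and b at zero, every activation is 0.5, so the cost equals log(2) regardless of the data. This is a convenient sanity check for an implementation.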

4.4 – Optimization

– You have initialized your parameters. 
– You are also able to compute a cost function and its gradient. 
– Now, you want to update the parameters using gradient descent.

Exercise: Write the optimization function. The goal is to learn w and b by minimizing the cost function J. For a parameter θ, the update rule is $\theta = \theta - \alpha \, d\theta$, where α is the learning rate and dθ is the gradient of J with respect to θ.
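As a rough sketch, the gradient descent loop could be written as follows. The signature `optimize(w, b, X, Y, num_iterations, learning_rate)`, the return order, and the choice to record the cost at every iteration are assumptions for illustration. A compact propagate() using the cross-entropy cost is included only so the block runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    # Compact forward/backward pass (assumes cross-entropy cost).
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    grads = {"dw": X @ (A - Y).T / m, "db": float(np.sum(A - Y) / m)}
    return grads, float(cost)

def optimize(w, b, X, Y, num_iterations, learning_rate):
    """Learn w and b by repeating the update theta = theta - alpha * dtheta."""
    costs = []
    grads = None  # stays None if num_iterations is 0
    for _ in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]   # update weights
        b = b - learning_rate * grads["db"]   # update bias
        costs.append(cost)
    return {"w": w, "b": b}, grads, costs

# Toy data: one feature, negative inputs labelled 0, positive inputs labelled 1.
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0.0, 0.0, 1.0, 1.0]])
params, grads, costs = optimize(np.zeros((1, 1)), 0.0, X, Y, 100, 0.1)
print(costs[0], costs[-1])  # the cost should go down over the iterations
```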



Exercise: The previous function will output the learned w and b. We can use w and b to predict the labels for a dataset X. Implement the predict() function. There are two steps to computing predictions:

1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$

2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this).
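The two steps above can be sketched in vectorized form as follows. The shapes (w as a column vector, X with one example per column) are assumptions carried over from the usual layout for this kind of exercise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """Predict 0/1 labels for the examples stored as columns of X."""
    # Step 1: compute the activations A = sigmoid(w^T X + b), shape (1, m).
    A = sigmoid(w.T @ X + b)
    # Step 2: threshold at 0.5 without a loop; the comparison yields booleans,
    # which are converted to 0.0 / 1.0. An activation of exactly 0.5 maps to 0.
    Y_prediction = (A > 0.5).astype(float)
    return Y_prediction

w = np.array([[1.0]])
X = np.array([[-1.0, 0.0, 2.0]])
print(predict(w, 0.0, X))
```

The comparison `A > 0.5` replaces the if/else loop: it is applied to every entry of A at once.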

What to remember: 
You’ve implemented several functions that: 

– Initialize (w,b) 
– Optimize the loss iteratively to learn parameters (w,b): 
    – computing the cost and its gradient 
    – updating the parameters using gradient descent 
– Use the learned (w,b) to predict the labels for a given set of examples

5 – Merge all functions into a model

You will now see how the overall model is structured by putting all the building blocks (the functions implemented in the previous parts) together, in the right order.

Exercise: Implement the model function. Use the following notation: 
– Y_prediction for your predictions on the test set 
– Y_prediction_train for your predictions on the train set 
– w, costs, grads for the outputs of optimize()
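To illustrate how the pieces might fit together, here is a compact, self-contained sketch. The helper names, the signatures, the default hyperparameters, and the returned dictionary are assumptions for illustration. Only Y_prediction, Y_prediction_train, w, costs, and grads follow the notation requested above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initialize_with_zeros(dim):
    return np.zeros((dim, 1)), 0.0

def propagate(w, b, X, Y):
    # Forward and backward pass (assumes cross-entropy cost).
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    grads = {"dw": X @ (A - Y).T / m, "db": float(np.sum(A - Y) / m)}
    return grads, float(cost)

def optimize(w, b, X, Y, num_iterations, learning_rate):
    costs, grads = [], None
    for _ in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
        costs.append(cost)
    return {"w": w, "b": b}, grads, costs

def predict(w, b, X):
    return (sigmoid(w.T @ X + b) > 0.5).astype(float)

def model(X_train, Y_train, X_test, Y_test,
          num_iterations=2000, learning_rate=0.5):
    """Initialize, optimize, then predict on both the train and test sets."""
    w, b = initialize_with_zeros(X_train.shape[0])
    params, grads, costs = optimize(w, b, X_train, Y_train,
                                    num_iterations, learning_rate)
    w, b = params["w"], params["b"]

    Y_prediction = predict(w, b, X_test)          # predictions on the test set
    Y_prediction_train = predict(w, b, X_train)   # predictions on the train set

    return {
        "costs": costs,
        "grads": grads,
        "Y_prediction": Y_prediction,
        "Y_prediction_train": Y_prediction_train,
        "w": w,
        "b": b,
        # Fraction of correctly predicted labels, as a percentage.
        "train_accuracy": 100.0 * np.mean(Y_prediction_train == Y_train),
        "test_accuracy": 100.0 * np.mean(Y_prediction == Y_test),
    }
```

A toy call such as `model(np.array([[-2., -1., 1., 2.]]), np.array([[0., 0., 1., 1.]]), np.array([[-3., 3.]]), np.array([[0., 1.]]), num_iterations=200, learning_rate=0.1)` runs the whole pipeline on four one-dimensional training examples. Real image data would first need to be flattened into columns.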