During forward propagation, the forward function for layer l needs to know which activation function the layer uses (sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know the activation function for layer l, since the gradient depends on it.

1. True

2. False
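
To make the dependence concrete, here is a minimal scalar sketch, not a reference implementation. The names `ACTIVATIONS`, `layer_forward` and `layer_backward` are hypothetical and chosen only for illustration. The forward step caches which activation was used so that the backward step can pick the matching derivative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Each entry pairs an activation g(z) with its derivative g'(z).
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z))),
    "tanh": (math.tanh, lambda z: 1.0 - math.tanh(z) ** 2),
    "relu": (lambda z: max(0.0, z), lambda z: 1.0 if z > 0 else 0.0),
}

def layer_forward(z, activation):
    """Forward step a = g(z); caches z and the activation name for later."""
    g, _ = ACTIVATIONS[activation]
    cache = (z, activation)
    return g(z), cache

def layer_backward(da, cache):
    """Backward step dz = da * g'(z); g' is chosen from the cached name."""
    z, activation = cache
    _, g_prime = ACTIVATIONS[activation]
    return da * g_prime(z)
```

The same upstream gradient `da` produces a different `dz` for each activation, which is why the backward function cannot work without knowing it.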

How can we prevent our model from overfitting?

Regularization

What does the analogy "AI is the new electricity" refer to?

1. Similar to electricity starting about 100 years ago, AI is transforming multiple industries

2. AI is powering personal devices in our homes and offices, similar to electricity

3. AI runs on computers and is thus powered by electricity, but it is letting computers do things not possible before

4. Through the "smart grid", AI is delivering a new wave of electricity

How does the k-NN Algorithm work?

A classifier that looks at the distances to the k nearest neighbors:

2. Initialise the value of k

3. To get the predicted class, iterate from 1 to the total number of training data points

3.1. Calculate the distance between test data and each row of training data (e.g. Euclidean Distance)

3.2. Add the distance and the index of the example to an ordered collection

4. Sort the calculated distances in ascending order based on distance values

5. Get the top k rows from the sorted array

6. Get the most frequent class of these rows

7. Return the predicted class

8. For regression, return the mean of the k labels; for classification, return the mode of the k labels
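
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the function name `knn_predict` and its parameters are hypothetical, Euclidean distance is used, and ties between classes are broken by whichever label was seen first.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k, regression=False):
    """Predict a label for `query` from its k nearest training rows."""
    # Distance from the query to every training row, paired with the row index.
    distances = [(math.dist(row, query), i) for i, row in enumerate(train_X)]
    # Sort ascending by distance and keep the labels of the k nearest rows.
    distances.sort()
    nearest_labels = [train_y[i] for _, i in distances[:k]]
    if regression:
        # Regression: mean of the k nearest labels.
        return sum(nearest_labels) / len(nearest_labels)
    # Classification: most frequent (mode) of the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]
```

A call such as `knn_predict(points, labels, (1.0, 1.0), k=3)` returns the majority label among the three closest points. Sorting every distance costs O(n log n) per query, which is fine for small datasets but is the part usually replaced by a spatial index on larger ones.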

What happens if we initialize all weights with small random numbers?

When an experienced deep learning engineer works on a new problem, they can usually use insight from previous problems to train a good model on the first try, without needing to iterate multiple times through different models.

1. False

2. True

What does a neuron compute?

1. A neuron computes the mean of all features before applying the output to an activation function

2. A neuron computes an activation function followed by a linear function (z = Wx + b)

3. A neuron computes a linear function (z = Wx + b) followed by an activation function

4. A neuron computes a function g that scales the input x linearly (Wx + b)
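
For reference, a single neuron in the usual formulation can be sketched as follows. The function name `neuron` is hypothetical, and sigmoid is an assumed choice of activation; any other activation could take its place.

```python
import math

def neuron(x, w, b):
    """One neuron: a linear step z = w . x + b, then a sigmoid activation."""
    # Linear part: weighted sum of the inputs plus the bias.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Nonlinear part: a = g(z), with g taken to be the sigmoid here.
    return 1.0 / (1.0 + math.exp(-z))
```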

Regularization techniques that help prevent overfitting:
• Weight Decay
• Early Stopping
• Bagging and Ensemble Methods
• Dropout
• Batch Normalization
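
Of the techniques listed, weight decay is the easiest to show in code. The sketch below assumes plain gradient descent with an L2 penalty; the function name and the default values for `lr` and `weight_decay` are hypothetical and picked only for illustration.

```python
def sgd_step_with_weight_decay(weights, grads, lr=0.1, weight_decay=0.01):
    """One gradient step with an L2 penalty on the weights.

    The penalty (weight_decay / 2) * ||w||^2 adds weight_decay * w to each
    gradient, so every step also pulls each weight slightly towards zero.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]
```

Even with zero data gradient, each weight shrinks by a factor of `1 - lr * weight_decay` per step, which is what discourages large weights.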

A demographic dataset with statistics on different cities' population, GDP per capita, and economic growth is an example of "unstructured" data because it contains data coming from different sources.

1. False

2. True

Which of the following are true?

1. X is a matrix in which each column is one training example

2. $a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer

3. $a^{[2](12)}$ denotes the activation vector of the 12th layer on the 2nd training example

4. $a^{[2]}$ denotes the activation vector of the 2nd layer

5. $a^{[2]}_4$ is the activation output of the 2nd layer for the 4th training example

6. $a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example
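
The indexing convention commonly used with this notation can be sketched with nested lists. It is an assumption here, not something the cards state: square brackets index the layer, round brackets index the training example, and the subscript indexes the unit. The sizes and values below are placeholders.

```python
# Assumed toy sizes: n_x input features, m training examples, n_2 units in layer 2.
n_x, m, n_2 = 3, 12, 4

# X has shape (n_x, m): one row per feature, one column per training example.
X = [[10 * i + j for j in range(m)] for i in range(n_x)]

def column(matrix, j):
    """Return column j of a row-major nested list."""
    return [row[j] for row in matrix]

x_12 = column(X, 11)      # x^(12): the 12th training example (0-based index 11)

# A2 stands in for the layer-2 activations, shape (n_2, m); values are placeholders.
A2 = [[0.01 * (i + 1) * (j + 1) for j in range(m)] for i in range(n_2)]

a2_12 = column(A2, 11)    # a^[2](12): layer-2 activation vector for example 12
a2_4_12 = A2[3][11]       # a^[2](12)_4: 4th unit of layer 2 on example 12
```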

The tanh activation usually works better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer.

1. True

2. False
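
The claim about output means can be checked numerically. The sketch below assumes a toy set of pre-activations that is symmetric around zero; real layer inputs will differ, so treat the numbers as illustrative only.

```python
import math
from statistics import mean

# Assumed toy input: pre-activations z spread symmetrically over [-3, 3].
zs = [i / 10 for i in range(-30, 31)]

# tanh is an odd function, so its outputs average out to roughly 0.
tanh_mean = mean([math.tanh(z) for z in zs])

# sigmoid outputs lie in (0, 1), so they average out to roughly 0.5.
sigmoid_mean = mean([1.0 / (1.0 + math.exp(-z)) for z in zs])
```

On this input, tanh passes on activations centered near zero, while sigmoid passes on activations centered near 0.5.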

Images for cat recognition are an example of "structured" data, because they are represented as structured arrays in a computer.

1. True

2. False

