Flashcards from the Pattern Recognition course at TU München, created and shared via the StudySmarter learning app.

Pattern Recognition

Classification losses

*Cross-entropy loss (log loss):* −(y log(p) + (1 − y) log(1 − p)). It measures the performance of a classification model whose output is a probability value between 0 and 1. An important aspect is that cross-entropy loss heavily penalizes predictions that are confident but wrong.

*Hinge loss (multi-class SVM loss):* max(0, 1 − yHat · y). In simple terms, the score of the correct category should be greater than the scores of the incorrect categories by some safety margin. Although not differentiable everywhere, it is a convex function, which makes it easy to work with the usual convex optimizers used in the machine-learning domain.
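A minimal sketch of both losses in plain Python (the function names `cross_entropy` and `hinge` are illustrative, not from the course):

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy for one sample; y in {0, 1}, p in (0, 1)."""
    p = min(max(p, eps), 1 - eps)          # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def hinge(y, y_hat):
    """Hinge loss; true label y in {-1, +1}, y_hat a raw score."""
    return max(0.0, 1 - y * y_hat)

# A confident wrong prediction is penalized far more than a mildly wrong one.
print(cross_entropy(1, 0.9))   # small loss: confident and correct
print(cross_entropy(1, 0.1))   # large loss: confident but wrong
print(hinge(1, 2.0))           # 0.0: correct side, outside the safety margin
print(hinge(1, 0.5))           # 0.5: correct side, but inside the margin
```

Note how the hinge loss is exactly zero once the margin is satisfied, while cross-entropy keeps shrinking but never reaches zero.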

What is dropout and why is it used?

Dropout is a regularization method that approximates training a large number of neural networks with different architectures in parallel. This method is used to reduce overfitting and improve generalization error in deep neural networks of all kinds.

Dropout has the effect of making the training process noisy, forcing nodes within a layer to probabilistically take on more or less responsibility for the inputs.

This conceptualization suggests that perhaps dropout breaks-up situations where network layers co-adapt to correct mistakes from prior layers, in turn making the model more robust.

Note: The weights of the network will be larger than normal because of dropout. Therefore, before finalizing the network, the weights are first scaled by the chosen dropout rate. The network can then be used as usual to make predictions.
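As a sketch, here is the common "inverted dropout" variant, which scales the surviving activations up during training instead of scaling the weights down afterwards (mathematically equivalent to the scheme described above; the function name `dropout` is illustrative):

```python
import random

def dropout(activations, rate, training=True):
    """Inverted dropout: zero each unit with probability `rate` during
    training and scale survivors by 1/(1-rate), so no weight rescaling
    is needed at prediction time."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(0)
h = [0.5, 1.2, -0.3, 0.8]
print(dropout(h, rate=0.5))                   # some units zeroed, rest scaled by 2
print(dropout(h, rate=0.5, training=False))   # unchanged at prediction time
```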

Why do we add padding to an image?

To fix the border problem. Padding lets the filter start outside the frame of the image, which gives the pixels on the border more opportunity to interact with the filter and more opportunity for features to be detected there, and in turn yields an output feature map with the same shape as the input image.
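A minimal sketch of zero padding on a 2-D image represented as nested lists (the helper name `zero_pad` is illustrative):

```python
def zero_pad(image, pad):
    """Surround a 2-D image (list of rows) with `pad` rows/columns of zeros."""
    w = len(image[0])
    padded = [[0] * (w + 2 * pad) for _ in range(pad)]   # top border
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)  # left/right borders
    padded += [[0] * (w + 2 * pad) for _ in range(pad)]   # bottom border
    return padded

img = [[1, 2],
       [3, 4]]
for row in zero_pad(img, 1):
    print(row)
# With pad = 1 and a 3x3 filter, the output map keeps the 2x2 input shape.
```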

Briefly explain the idea behind LSTM.

Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies.

The key to LSTMs is the cell state. The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It's very easy for information to just flow along it unchanged. The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates. Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation.

- A sigmoid layer called the "forget gate layer" decides what information to throw away from the cell state.
- A sigmoid layer called the "input gate layer" decides which values to update.
- A tanh layer creates a vector of new candidate values, C~t, that could be added to the state.
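The gate equations above can be sketched for a single-unit (scalar) LSTM step; the weight names (`wf`, `uf`, `bf`, …) are hypothetical placeholders for the learned parameters of each gate:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One step of a single-unit LSTM. W holds scalar weights for the
    forget (f), input (i), candidate (c) and output (o) computations."""
    f = sigmoid(W["wf"] * x + W["uf"] * h_prev + W["bf"])          # forget gate
    i = sigmoid(W["wi"] * x + W["ui"] * h_prev + W["bi"])          # input gate
    c_tilde = math.tanh(W["wc"] * x + W["uc"] * h_prev + W["bc"])  # candidate C~t
    c = f * c_prev + i * c_tilde   # cell state: the "conveyor belt"
    o = sigmoid(W["wo"] * x + W["uo"] * h_prev + W["bo"])          # output gate
    h = o * math.tanh(c)
    return h, c

W = {k: 0.5 for k in ("wf", "uf", "bf", "wi", "ui", "bi",
                      "wc", "uc", "bc", "wo", "uo", "bo")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, W=W)
print(h, c)
```

Each gate is a sigmoid in (0, 1) that multiplies pointwise into the cell state, which is exactly the "optionally let information through" mechanism described above.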

What is a convolution?

Convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.
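A minimal sketch of this filter application with nested loops (strictly speaking this is cross-correlation, which is what deep-learning libraries implement under the name "convolution"; the function name is illustrative):

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and sum
    elementwise products at each position, producing a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

# A simple vertical-edge detector responds strongly where pixel values jump.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge = [[-1, 1]]
print(convolve2d(img, edge))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The feature map marks the location (middle column) and strength of the detected edge, as described above.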

Briefly explain the method Backpropagation.

The method calculates the gradient of the error function with respect to the neural network's weights. The "backwards" part of the name stems from the fact that calculation of the gradient proceeds backwards through the network, with the gradient of the final layer of weights being calculated first and the gradient of the first layer of weights being calculated last. Partial computations of the gradient from one layer are reused in the computation of the gradient for the previous layer.

Δw=−α(∂E(X)/∂w)
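A worked scalar example of this update rule on a hypothetical two-weight network (h = sigmoid(w1·x), ŷ = w2·h, E = ½(ŷ − y)²); the variable names are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 1.0, 0.5          # one training sample
w1, w2 = 0.8, -0.4       # initial weights
alpha = 0.1              # learning rate

# Forward pass
h = sigmoid(w1 * x)
y_hat = w2 * h
e_before = 0.5 * (y_hat - y) ** 2

# Backward pass: the final layer's gradient is computed first ...
delta2 = y_hat - y                    # dE/d(y_hat)
grad_w2 = delta2 * h                  # dE/dw2
# ... and its partial result (delta2) is reused for the earlier layer.
delta1 = delta2 * w2 * h * (1 - h)    # dE/d(w1*x) via the chain rule
grad_w1 = delta1 * x                  # dE/dw1

# Weight update: delta_w = -alpha * dE/dw
w1 -= alpha * grad_w1
w2 -= alpha * grad_w2

h = sigmoid(w1 * x)
e_after = 0.5 * (w2 * h - y) ** 2
print(e_before, e_after)   # the error decreases after one update
```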

What is gradient descent?

Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model. Parameters refer to coefficients in Linear Regression and weights in neural networks.
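A minimal sketch of the iteration on a one-dimensional function (the function name `gradient_descent` is illustrative):

```python
def gradient_descent(grad, x0, alpha=0.1, steps=100):
    """Minimize a function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x -= alpha * grad(x)   # move in the direction of steepest descent
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)   # converges close to 3.0
```

In a model, `x` would be the parameter vector (coefficients or weights) and `grad` the gradient of the loss with respect to it.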

What is batch normalization? Why do we use it?

Batch normalization normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation.

- It reduces the amount by which the hidden unit values shift around and speeds up learning.
- It mitigates the vanishing gradient problem that can arise with the sigmoid or tanh activation function.
- Also, batch normalization allows each layer of a network to learn by itself a little bit more independently of other layers.
- It reduces overfitting because it has a slight regularization effect. It adds some noise to each hidden layer’s activations.
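A minimal sketch of the normalization step for one activation across a batch (the learnable scale/shift parameters `gamma` and `beta` are included for completeness; names are illustrative):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Subtract the batch mean, divide by the batch standard deviation,
    then scale and shift by the learnable parameters gamma and beta."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta
            for v in batch]

out = batch_norm([2.0, 4.0, 6.0, 8.0])
print(out)   # zero mean, unit variance (up to eps)
```

In a real network this runs per feature channel, and running averages of mean and variance are kept for use at prediction time.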

What is the vanishing gradient problem?

As more layers using certain activation functions are added to neural networks, the gradients of the loss function approach zero, making the network hard to train. When n hidden layers use an activation like the sigmoid function, n small derivatives are multiplied together. Thus, the gradient decreases exponentially as we propagate down to the initial layers.

A small gradient means that the weights and biases of the initial layers will not be updated effectively with each training session. Since these initial layers are often crucial to recognizing the core elements of the input data, it can lead to overall inaccuracy of the whole network.
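The exponential shrinkage is easy to see numerically: the sigmoid's derivative is at most 0.25, so a product of ten such derivatives is already tiny (a sketch; the function name is illustrative):

```python
import math

def sigmoid_deriv(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1 - s)   # peaks at 0.25 when z = 0

# Backpropagating through n sigmoid layers multiplies n such derivatives.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_deriv(0.0)   # best case: derivative = 0.25
print(grad)   # 0.25 ** 10, roughly 1e-6
```

Even in this best case, the gradient reaching the first layer is about a million times smaller than at the output, which is why the initial layers barely update.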

What is a pooling layer? Why do we use it?

A problem with the output feature maps is that they are sensitive to the location of the features in the input. One approach to address this sensitivity is to downsample the feature maps. This has the effect of making the resulting down-sampled feature maps more robust to changes in the position of the feature in the image. Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.

*Max Pooling:* Calculate the maximum value for each patch of the feature map.

*Average Pooling:* Calculate the average value for each patch of the feature map.
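A minimal sketch of both variants with non-overlapping 2x2 patches (the function name `pool2d` is illustrative):

```python
def pool2d(fmap, size=2, mode="max"):
    """Downsample a feature map by summarizing non-overlapping
    size x size patches with either max or average."""
    reduce = max if mode == "max" else (lambda p: sum(p) / len(p))
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            patch = [fmap[i + a][j + b]
                     for a in range(size) for b in range(size)]
            row.append(reduce(patch))
        out.append(row)
    return out

fmap = [[1, 2, 5, 6],
        [3, 4, 7, 8],
        [0, 0, 1, 1],
        [0, 4, 1, 1]]
print(pool2d(fmap, mode="max"))      # [[4, 8], [4, 1]]
print(pool2d(fmap, mode="average"))  # [[2.5, 6.5], [1.0, 1.0]]
```

The 4x4 map shrinks to 2x2: each output cell summarizes whether (max) or how strongly on average (average) a feature was present in its patch, regardless of its exact position inside the patch.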

What are the advantages and disadvantages of RELU?

__Advantages__

- Computationally efficient—allows the network to converge very quickly
- Non-linear—although it looks like a linear function, ReLU has a derivative function and allows for backpropagation
- Mitigates the vanishing gradient problem: the derivative is a constant 1 for all positive inputs, so it does not shrink the backpropagated gradient

__Disadvantages__

- The Dying ReLU problem—when inputs approach zero or are negative, the gradient of the function becomes zero, the network cannot perform backpropagation and cannot learn.
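Both properties are visible in a two-line sketch of the function and its gradient (names are illustrative):

```python
def relu(x):
    """Rectified linear unit: identity for positive inputs, zero otherwise."""
    return max(0.0, x)

def relu_grad(x):
    # Constant gradient of 1 for positive inputs (no shrinkage), but zero
    # for non-positive inputs: if a unit's inputs stay negative, no
    # gradient flows and the unit can "die".
    return 1.0 if x > 0 else 0.0

print(relu(2.5), relu(-1.0))            # 2.5 0.0
print(relu_grad(2.5), relu_grad(-1.0))  # 1.0 0.0
```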

What are the advantages of Softmax?

*Advantages*

- Able to handle multiple classes, unlike activation functions that score only one class: Softmax maps the output for each class to a value between 0 and 1 and divides by their sum, giving the probability that the input belongs to a specific class.
- Useful for output neurons—typically Softmax is used only for the output layer, for neural networks that need to classify inputs into multiple categories.
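A minimal sketch of the normalization (subtracting the maximum score first is a common numerical-stability trick that does not change the result; the function name is illustrative):

```python
import math

def softmax(scores):
    """Normalize raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)        # one probability per class, each in (0, 1)
print(sum(probs))   # the probabilities sum to 1
```

The largest raw score gets the largest probability, which is why Softmax is the standard output layer for multi-class classification.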
