
Pattern Recognition

What are the advantages of Softmax?

*Advantages*

- Able to handle multiple classes (most other activation functions handle only one class): it normalizes the output for each class to a value between 0 and 1 by dividing by the sum of the exponentiated outputs, giving the probability of the input belonging to a specific class.
- Useful for output neurons: Softmax is typically used only in the output layer, for neural networks that need to classify inputs into multiple categories (see the sketch below).
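
A minimal NumPy sketch (the max is subtracted purely for numerical stability; the values are illustrative):

```python
import numpy as np

def softmax(z):
    # Subtracting the max does not change the result but avoids overflow.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw outputs of the output layer
probs = softmax(scores)
print(probs)          # approx. [0.659 0.242 0.099]
print(probs.sum())    # 1.0 -- a valid probability distribution over classes
```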

Pattern Recognition

What is boosting? Briefly explain the idea behind it.

The basic idea behind boosting is combining many weak learners to form a single strong learner.

- First, the inputs are initialized with equal weights, and the first base learner is fit; this is generally a decision stump. This means that in the first stage there is a single weak learner, which fits a subsample of the data and makes predictions for all the data.
- Now we do the following until the maximum number of trees is reached:
- Update the weights of the inputs based on the previous run; weights are higher for wrongly predicted/classified inputs.
- Make another rule (a decision stump in this case) and fit it to a subsample of the data. Note that this time the rule is formed by keeping the wrongly classified inputs (the ones having higher weight) in mind.
- Finally, we predict/classify all inputs using this rule.

- After the iterations have been completed, we combine the weak rules to form a single strong rule, which will then be used as our model (see the sketch below).
- There is also a parameter known as the learning rate, which controls the magnitude by which each tree contributes to the model.

Note: By changing the depth you have simple and easy control over the bias/variance trade-off, knowing that boosting can reduce bias and also significantly reduce variance.
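
A minimal AdaBoost-style sketch in plain NumPy (1-D feature, decision stumps; all names and values are illustrative, not a reference implementation):

```python
import numpy as np

def fit_stump(X, y, w):
    """Find the threshold/polarity minimizing the weighted error (decision stump)."""
    best = (None, 1, np.inf)  # (threshold, polarity, weighted error)
    for thr in np.unique(X):
        for pol in (1, -1):
            pred = pol * np.sign(X - thr + 1e-12)
            err = w[pred != y].sum()
            if err < best[2]:
                best = (thr, pol, err)
    return best

def adaboost(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1 / n)                 # 1. equal initial weights
    ensemble = []
    for _ in range(n_rounds):             # 2. until the max number of stumps
        thr, pol, err = fit_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's say in the vote
        pred = pol * np.sign(X - thr + 1e-12)
        w *= np.exp(-alpha * y * pred)         # up-weight misclassified inputs
        w /= w.sum()
        ensemble.append((thr, pol, alpha))
    return ensemble

def predict(ensemble, X):
    # 3. combine the weak rules into one strong rule (weighted majority vote)
    total = sum(alpha * pol * np.sign(X - thr + 1e-12)
                for thr, pol, alpha in ensemble)
    return np.sign(total)

X = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([1, 1, -1, -1, 1, 1])
model = adaboost(X, y)
print(predict(model, X))   # typically matches y after enough rounds
```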

Pattern Recognition

Briefly explain the idea behind LSTM.

Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies.

The key to LSTMs is the cell state. The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged. The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates. Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation (see the sketch after the list below).

- A “forget gate layer” decides what information we’re going to throw away from the cell state.
- A sigmoid layer called the “input gate layer” decides which values we’ll update.
- A tanh layer creates a vector of new candidate values, C̃_t, that could be added to the state.
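
A minimal sketch of one LSTM cell step in plain NumPy (the weight names Wf, Uf, bf, etc. are illustrative and would normally be learned):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of an LSTM cell; p holds the (learned) parameters."""
    f = sigmoid(p['Wf'] @ x_t + p['Uf'] @ h_prev + p['bf'])       # forget gate
    i = sigmoid(p['Wi'] @ x_t + p['Ui'] @ h_prev + p['bi'])       # input gate
    c_tilde = np.tanh(p['Wc'] @ x_t + p['Uc'] @ h_prev + p['bc']) # candidates C̃_t
    c = f * c_prev + i * c_tilde        # update the conveyor-belt cell state
    o = sigmoid(p['Wo'] @ x_t + p['Uo'] @ h_prev + p['bo'])       # output gate
    h = o * np.tanh(c)                  # new hidden state
    return h, c

# Untrained random parameters, just to make the sketch runnable:
rng = np.random.default_rng(0)
p = {}
for g in 'fico':
    p['W' + g] = rng.normal(size=(4, 3))   # input-to-gate weights
    p['U' + g] = rng.normal(size=(4, 4))   # hidden-to-gate weights
    p['b' + g] = np.zeros(4)
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), p)
print(h, c)
```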

Pattern Recognition

Classification losses

*Cross-entropy loss, or log loss:* −(y log(p) + (1 − y) log(1 − p)) measures the performance of a classification model whose output is a probability value between 0 and 1. An important aspect is that cross-entropy loss heavily penalizes predictions that are confident but wrong.

*Hinge Loss/Multi-class SVM Loss:* max(0, 1 − yHat · y). In simple terms, the score of the correct category should be greater than the sum of the scores of all incorrect categories by some safety margin. Although not differentiable, it is a convex function, which makes it easy to work with the usual convex optimizers used in the machine learning domain.
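
Both losses are easy to state in code; a minimal NumPy sketch for the binary case (labels y ∈ {0, 1} for log loss, y ∈ {−1, +1} for hinge loss):

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """Binary log loss; y in {0, 1}, p = predicted probability of class 1."""
    p = np.clip(p, eps, 1 - eps)        # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def hinge(y, y_hat):
    """Binary hinge loss; y in {-1, +1}, y_hat = raw (unsquashed) score."""
    return np.maximum(0, 1 - y_hat * y)

# A confident but wrong prediction is punished heavily by cross-entropy:
print(cross_entropy(1, 0.01))   # approx. 4.6
print(cross_entropy(1, 0.99))   # approx. 0.01
print(hinge(1, 0.3))            # 0.7 -- inside the safety margin, still penalized
```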

Pattern Recognition

What is dropout and why is it used?

Dropout is a regularization method that approximates training a large number of neural networks with different architectures in parallel. This method is used to reduce overfitting and improve generalization error in deep neural networks of all kinds.

Dropout has the effect of making the training process noisy, forcing nodes within a layer to probabilistically take on more or less responsibility for the inputs.

This conceptualization suggests that perhaps dropout breaks up situations where network layers co-adapt to correct mistakes from prior layers, in turn making the model more robust.

Note: The weights of the network will be larger than normal because of dropout. Therefore, before finalizing the network, the weights are first scaled by the chosen dropout rate. The network can then be used as normal to make predictions.
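
A minimal sketch of a dropout forward pass in plain NumPy. Note: this sketch uses the common "inverted dropout" variant, which rescales the kept activations during training instead of scaling the weights afterwards as described above:

```python
import numpy as np

def dropout_forward(a, rate, training, rng=np.random.default_rng()):
    """Randomly zero out activations with probability `rate` during training."""
    if not training:
        return a                       # at test time, all nodes are used
    mask = rng.random(a.shape) >= rate
    return a * mask / (1 - rate)       # inverted dropout: rescale kept units

a = np.ones(10)
print(dropout_forward(a, rate=0.5, training=True))   # noisy: some units zeroed
print(dropout_forward(a, rate=0.5, training=False))  # deterministic at test time
```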

Pattern Recognition

Briefly explain the method Backpropagation.

The method calculates the gradient of the error function with respect to the neural network’s weights. The “backwards” part of the name stems from the fact that calculation of the gradient proceeds backwards through the network, with the gradient of the final layer of weights being calculated first and the gradient of the first layer of weights being calculated last. Partial computations of the gradient from one layer are reused in the computation of the gradient for the previous layer.

Δw = −α ∂E(X)/∂w
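
A minimal sketch for a two-layer sigmoid network with squared error (plain NumPy, illustrative shapes); note how the partial computation delta2 from the last layer is reused for the first layer:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([1.0])      # one input, one target
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
alpha = 0.1

# forward pass
h = sigmoid(W1 @ x)
y = sigmoid(W2 @ h)
E = 0.5 * np.sum((y - t) ** 2)

# backward pass: the gradient of the final layer is computed first ...
delta2 = (y - t) * y * (1 - y)       # dE/d(pre-activation of layer 2)
grad_W2 = np.outer(delta2, h)
# ... and its partial computation is reused for the previous layer
delta1 = (W2.T @ delta2) * h * (1 - h)
grad_W1 = np.outer(delta1, x)

# weight update: Δw = -α ∂E/∂w
W2 -= alpha * grad_W2
W1 -= alpha * grad_W1
```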

Pattern Recognition

What is gradient descent?

Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model. Parameters refer to coefficients in Linear Regression and weights in neural networks.
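
A minimal sketch, minimizing the illustrative function f(w) = (w − 3)² by repeatedly stepping against its gradient:

```python
def grad_f(w):
    return 2 * (w - 3)        # derivative of f(w) = (w - 3)**2

w, alpha = 0.0, 0.1           # initial parameter and learning rate
for step in range(50):
    w -= alpha * grad_f(w)    # move in the direction of steepest descent
print(w)                      # approx. 3.0, the minimizer of f
```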

Pattern Recognition

Why do we add padding to an image?

To fix the border problem: padding adds extra rows and columns (typically of zeros) around the image so that the filter can start outside the frame of the image. This gives the pixels on the border of the image more of an opportunity to interact with the filter, more of an opportunity for features to be detected, and, in turn, an output feature map that has the same shape as the input image.
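
A quick NumPy illustration of the shape argument, assuming "same" zero-padding p = (f − 1)/2 for an f×f filter (values are illustrative):

```python
import numpy as np

image = np.ones((6, 6))
f = 3                               # filter size
p = (f - 1) // 2                    # padding needed for a 'same'-size output
padded = np.pad(image, p)           # rows/columns of zeros around the border

out_no_pad = image.shape[0] - f + 1   # 4: the feature map shrinks
out_pad = padded.shape[0] - f + 1     # 6: same shape as the input
print(out_no_pad, out_pad)
```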

Pattern Recognition

What is the vanishing gradient problem?

As more layers using certain activation functions are added to neural networks, the gradients of the loss function approach zero, making the network hard to train. When n hidden layers use an activation like the sigmoid function, n small derivatives are multiplied together. Thus, the gradient decreases exponentially as we propagate down to the initial layers.

A small gradient means that the weights and biases of the initial layers will not be updated effectively with each training iteration. Since these initial layers are often crucial to recognizing the core elements of the input data, this can lead to overall inaccuracy of the whole network.
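
A small numeric illustration: the sigmoid's derivative is at most 0.25, so a product of n such factors shrinks exponentially with n (weight factors are ignored here for simplicity):

```python
import numpy as np

def sigmoid_grad(z):
    s = 1 / (1 + np.exp(-z))
    return s * (1 - s)            # maximum value is 0.25, at z = 0

g = 1.0
for layer in range(10):           # 10 hidden sigmoid layers
    g *= sigmoid_grad(0.0)        # best case: each factor is only 0.25
print(g)                          # 0.25**10 ≈ 9.5e-07 -- the gradient vanishes
```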

Pattern Recognition

What is a pooling layer? Why do we use it?

A problem with the output feature maps is that they are sensitive to the location of the features in the input. One approach to address this sensitivity is to downsample the feature maps. This has the effect of making the resulting down-sampled feature maps more robust to changes in the position of the feature in the image. Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.

*Average Pooling:* Calculate the average value for each patch of the feature map.

*Max Pooling:* Calculate the maximum value for each patch of the feature map.
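
A sketch of 2×2 max pooling (stride 2) in plain NumPy; the helper name is illustrative:

```python
import numpy as np

def max_pool_2x2(fmap):
    """Downsample a feature map by taking the max of each 2x2 patch."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]            # drop odd rows/columns
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 2, 0, 1],
                 [3, 4, 1, 0],
                 [0, 0, 2, 2],
                 [1, 0, 2, 5]], dtype=float)
print(max_pool_2x2(fmap))
# [[4. 1.]
#  [1. 5.]]  -- each value summarizes one 2x2 patch
```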

Pattern Recognition

What is a convolution?

Convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.
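
A minimal sketch of this repeated filter application in plain NumPy (strictly speaking this is cross-correlation, which is what deep learning frameworks implement as "convolution"; all names are illustrative):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image`; each output value is one activation."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    fmap = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            fmap[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return fmap   # the feature map: location and strength of the feature

# A vertical-edge filter applied to an image with an edge in the middle:
image = np.hstack([np.ones((4, 3)), np.zeros((4, 3))])
kernel = np.array([[1., 0., -1.]] * 3)
print(convolve2d(image, kernel))   # strong activations along the edge
```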

Pattern Recognition

Explain the idea behind a random forest.

Any individual, tightly fitted (i.e. overfitted) classifier has a low bias, but high variance (classifiers for resamples of the data will look very different each time). However, by pooling many such classifiers, the majority vote should give us a powerful combination of low bias (because the individual classifiers have a low bias) and low variance (because the individual differences will get averaged out).

Note: This is only true under the assumption that the individual classifiers can be drawn independently.
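
A small NumPy simulation of the variance argument: each simulated classifier votes correctly only 60% of the time, yet the majority vote over many independent classifiers is almost always right (illustrative numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classifiers, n_trials = 101, 1000

# Each low-bias, high-variance classifier is right 60% of the time:
single = rng.random(n_trials) < 0.6
votes = rng.random((n_trials, n_classifiers)) < 0.6
majority = votes.sum(axis=1) > n_classifiers / 2

print(single.mean())    # approx. 0.60 -- a single classifier
print(majority.mean())  # approx. 0.98 -- majority vote (independence assumed!)
```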
