Flashcards for the course Introduction to Deep Learning at the TU München.

Introduction to Deep Learning

What effects does dropout have on the model size and training time?

You need a bigger network, since at any training step only about half of the units are active, and the training time increases.
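A minimal numpy sketch of inverted dropout, the common variant where surviving activations are rescaled at training time so the test-time forward pass stays unchanged; the keep probability and array size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Inverted dropout: zero each unit with probability p,
    scale survivors by 1/(1-p) to keep the expected activation."""
    if not train:
        return x  # at test time the full network is used unchanged
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

a = np.ones(10000)
out = dropout(a, p=0.5)
# roughly half the units are zeroed; rescaling keeps the mean near 1
```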

Why does the Softmax formulation for binary predictions not work for multiple classes?

Because with multiple classes the outputs need to sum to 1, so the normalization has to run over all class scores.
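A small numpy sketch of the multi-class normalization; the example scores are made up:

```python
import numpy as np

def softmax(z):
    """Normalize scores over all classes so the outputs sum to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
# p is a valid probability distribution over the three classes
```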

bagging and ensemble methods as regularization?

- bagging: train k different models on k different datasets (sampled with replacement from the training data)

ensemble:

- train several models (e.g. three) and average their results
- use a different optimization algorithm or a different objective function for each model
- if the errors are uncorrelated -> the expected combined error decreases linearly with the ensemble size
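A numpy sketch of the last point, simulating models whose errors are uncorrelated and zero-mean; the error variance, ensemble size, and sample count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

k = 3  # ensemble size
# k models, each predicting a true value of 0 with uncorrelated unit-variance error
preds = rng.normal(0.0, 1.0, size=(k, 100_000))

single_mse = np.mean(preds[0] ** 2)              # one model alone
ensemble_mse = np.mean(preds.mean(axis=0) ** 2)  # average of k models
# with uncorrelated errors, averaging cuts the expected squared error by ~1/k
```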

early stopping as regularization?

- take the model with the lowest validation error (before overfitting sets in) -> treat training time as a hyperparameter to optimize
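A minimal sketch of the idea in plain Python; the patience parameter and the validation-error curve are illustrative assumptions:

```python
def early_stopping(val_errors, patience=2):
    """Return the epoch with the lowest validation error, stopping once
    `patience` consecutive epochs bring no improvement."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # overfitting has set in; stop training
    return best_epoch, best_err

# validation error falls, then rises again as overfitting sets in
best_epoch, best_err = early_stopping([0.9, 0.5, 0.4, 0.45, 0.6, 0.7])
```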

explain regularization by data augmentation

- classifier has to be invariant to a wider variety of transformations
- generate fake data simulating plausible transformations (crop augmentation, flip augmentation, random brightness and contrast changes)
- should be part of network design -> use same data augmentation when comparing two networks
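A minimal numpy sketch combining a random horizontal flip with a random brightness shift; the image size, flip probability, and shift range are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def augment(img):
    """Generate a plausible transformed copy of a [0, 1] grayscale image."""
    out = img.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    out = np.clip(out + rng.uniform(-0.2, 0.2), 0.0, 1.0)  # brightness shift
    return out

img = rng.random((32, 32))
aug = augment(img)  # same shape and value range, plausibly transformed
```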

what are the benefits of BN? what are the drawbacks?

- very deep nets are much easier to train -> more stable gradients
- much larger range of hyperparameters works similarly when using BN

- drawback: doesn’t work well for small batch sizes (about < 16)

batch normalization: what is a problem at test time? what is the solution?

- there is no way to compute a meaningful mean and variance from just one image
- hence, we compute the mean and variance as an exponentially weighted average across training mini-batches
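A numpy sketch of the exponentially weighted average for a single feature; the momentum value and the data distribution are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

momentum = 0.9
running_mean, running_var = 0.0, 1.0

# during training, accumulate running statistics over mini-batches
for _ in range(500):
    batch = rng.normal(5.0, 2.0, size=32)  # one mini-batch of activations
    running_mean = momentum * running_mean + (1 - momentum) * batch.mean()
    running_var = momentum * running_var + (1 - momentum) * batch.var()

# at test time a single value is normalized with the stored statistics
x = 5.0
x_hat = (x - running_mean) / np.sqrt(running_var + 1e-5)
```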

is it ok to treat dimensions separately for BN? what happens to the biases of the preceding layers?

- empirically shown that treating the dimensions separately still leads to faster convergence
- biases can be set to zero since they will be cancelled out by BN

how do you include batch normalization in a network?

- can be applied after fully connected or convolutional layers and before non-linear activation functions
- forcing all activations to be unit Gaussian before a tanh might not be good
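A minimal numpy sketch of a batch-norm layer as it would sit between a fully connected layer and the non-linearity; the learnable gamma/beta are what let the network move away from unit Gaussians if that is not ideal before a tanh (batch size and feature count are illustrative assumptions):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    gamma/beta are learnable, so the network can undo the unit-Gaussian
    normalization when that is not what the next layer wants."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(4)
x = rng.normal(3.0, 5.0, size=(64, 10))  # batch of 64 samples, 10 features
y = batchnorm_forward(x, gamma=np.ones(10), beta=np.zeros(10))
# with gamma=1, beta=0 each feature is ~unit Gaussian over the batch
```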

what is the goal of batch normalization? what is the solution?

- preventing activations from dying out
- unit Gaussian activations (in shown example)

weight initialization: what happens if all weights are set to zero?

- hidden units are all going to compute the same function
- gradients are going to be the same
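A tiny numpy sketch of the symmetry problem for a two-unit hidden layer; the input, target, and squared loss are illustrative assumptions:

```python
import numpy as np

x = np.array([1.0, 2.0])
W1 = np.zeros((2, 2))    # all weights zero
h = np.tanh(W1 @ x)      # both hidden units compute the same value
w2 = np.zeros(2)
y = w2 @ h

# backprop of a squared loss (y - t)^2 with target t = 1
t = 1.0
dy = 2 * (y - t)
dh = dy * w2                          # gradient w.r.t. hidden activations
dW1 = np.outer(dh * (1 - h ** 2), x)  # both rows identical:
# the units receive the same gradient, so they can never differentiate
```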

What are ensemble methods for regularization?

Train multiple models with different objective functions and optimization algorithms. Average the models' outputs.

