What is a qualitative variable?

What are discrete variables?

What is a continuous variable?

What is the major goal of supervised learning?

What is meant by Classification?

What is the problem with hyperparameters?

Which approaches are inherited from hyperparameter optimization?

What fundamental problem lies ahead when searching for the right learning rate?

What does iterative optimization algorithm mean?

What do iterations refer to?

What are the main problems of the Batch Gradient Descent?

What ensures strict convexity?

What is a qualitative variable?
Also called a categorical variable; it takes on values that are names or labels, e.g. gender.

What are discrete variables?
Can only take on a countable number of distinct values, e.g. the number of cars owned.

What is a continuous variable?
Can take on any value in a certain range, e.g. body height.

What is the major goal of supervised learning?
To find parameter values that allow a model to perform well on historical data and to make predictions on unknown data that has not been part of the training dataset.

What is meant by Classification?
• the process of modeling and predicting a discrete class label, which is often not measurable but observable
• with two classes to predict, usually coded as 1 and 0, it is called binary classification
• with more than two class labels to predict, it is called multi-class classification
• examples: predicting employee churn, email spam, financial fraud, student letter grades

What is the problem with hyperparameters?
There is no magic number that works everywhere.

The best hyperparameter setting for a learning algorithm depends on the specific learning task and on the specific dataset!

Which approaches are inherited from hyperparameter optimization?
Grid search and random search.
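
The two approaches can be contrasted in a minimal sketch. The function `validation_loss` is a hypothetical stand-in for training and scoring a real model, and the value ranges are assumptions chosen only for illustration:

```python
import itertools
import random

def validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for training a model and scoring it."""
    return (learning_rate - 0.1) ** 2 + ((batch_size - 64) / 100) ** 2

def grid_search(learning_rates, batch_sizes):
    """Evaluate every combination of the listed values."""
    candidates = itertools.product(learning_rates, batch_sizes)
    return min(candidates, key=lambda c: validation_loss(*c))

def random_search(n_trials, seed=0):
    """Evaluate randomly sampled settings instead of a fixed grid."""
    rng = random.Random(seed)
    candidates = [
        # Learning rate sampled log-uniformly between 0.001 and 1.
        (10 ** rng.uniform(-3, 0), rng.choice([16, 32, 64, 128]))
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda c: validation_loss(*c))

best_grid = grid_search([0.001, 0.01, 0.1, 1.0], [16, 32, 64, 128])
best_random = random_search(n_trials=20)
```

Grid search can only return values that were listed in advance, while random search can land on values between the grid points.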

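Einführung Data Science (Introduction to Data Science)

What fundamental problem lies ahead when searching for the right learning rate?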

• high values for the learning rate often lead to overshooting or oscillation around the minimum
• very low values often slow down the process of reaching the minimum and/or trap the algorithm in an undesirable local minimum
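
The effect of the learning rate can be illustrated with a small sketch on the function f(x) = x², an example chosen here for simplicity. Because this function has a single minimum, the sketch shows overshooting, oscillation and slow progress, but not trapping in a local minimum:

```python
def gradient_descent(learning_rate, start=5.0, steps=20):
    """Minimize f(x) = x**2 (gradient 2*x) and return the visited points."""
    x = start
    path = [x]
    for _ in range(steps):
        x = x - learning_rate * 2 * x
        path.append(x)
    return path

too_high = gradient_descent(1.1)     # every step overshoots, |x| grows
oscillating = gradient_descent(0.9)  # sign flips each step while |x| shrinks
too_low = gradient_descent(0.001)    # barely moves within 20 steps
moderate = gradient_descent(0.3)     # gets close to the minimum quickly
```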

What does iterative optimization algorithm mean?
To pass over the training set multiple times, in a way that can be controlled via a series of additional hyperparameters, in order to get the best result.

What do iterations refer to?

The number of batches needed to complete one epoch.

For a training set with 1000 instances and a batch size of 250, it takes 4 iterations to complete 1 epoch.
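
The arithmetic can be written as a short helper. The function name is hypothetical, and rounding up assumes that a final, smaller batch still counts as one iteration:

```python
import math

def iterations_per_epoch(n_instances, batch_size):
    """Number of batches needed to pass over the training set once."""
    # Assumption: a last, partially filled batch is still processed.
    return math.ceil(n_instances / batch_size)

example = iterations_per_epoch(1000, 250)  # the case from the text
```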

What are the main problems of the Batch Gradient Descent?
• batch gradient descent passes over the entire training set before taking only a single small step in the direction of steepest descent
• it is therefore a very costly operation, especially if the training set is large
• another problem with gradient descent algorithms in general, and with batch gradient descent in particular, is that they are susceptible to local minima
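
The cost of a single step can be seen in a minimal sketch. It assumes a one-parameter model y = w * x with mean squared error and a tiny made-up dataset, none of which come from the notes:

```python
def batch_gradient_step(w, xs, ys, learning_rate):
    """One batch gradient descent step for the model y = w * x with MSE loss."""
    n = len(xs)
    # The gradient uses every training instance, so the cost of one
    # single step grows with the size of the training set.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return w - learning_rate * grad

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # illustrative data generated with w = 2

w = 0.0
for _ in range(50):
    w = batch_gradient_step(w, xs, ys, learning_rate=0.05)
```

This loss is convex, so the sketch only illustrates the cost per step; the local minima problem would need a non-convex loss to show up.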

What ensures strict convexity?
That the function has at most one local optimum, and that this optimum is also the global one.
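
For reference, the standard definition can be stated as follows (the symbols f, x, y and λ are introduced here and do not appear in the notes). A function f is strictly convex if, for all x ≠ y in its domain and all λ strictly between 0 and 1:

```latex
f\bigl(\lambda x + (1 - \lambda) y\bigr) < \lambda f(x) + (1 - \lambda) f(y),
\qquad x \neq y,\; \lambda \in (0, 1)
```

If two different points were both minima, the point halfway between them would have a strictly smaller function value, which is a contradiction. So a minimum, if it exists, is unique.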

