Machine Learning and Forecasting at Maastricht University | Flashcards & Summaries

Study materials for Machine Learning and Forecasting at Maastricht University


Q:

Pros and cons of linear regression (via gradient descent)

A:

Pros:

1. The SSE is a convex function, so gradient descent converges to the global optimum provided the learning rate is chosen properly.
2. Due to its iterative nature, gradient descent is easily extended to big data or streaming data.

Cons:

1. Requires fine-tuning the learning rate.
2. Data must be normalized for gradient descent to converge quickly.
3. Slower than the analytical least-squares solution.
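
A minimal Python sketch of both approaches, with made-up data and a hand-picked learning rate (an illustration, not the course's reference implementation):

```python
import numpy as np

# Made-up data: y ≈ 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + 0.1 * rng.standard_normal(100)

# Gradient descent on SSE(w, b) = sum_i (w*x_i + b - y_i)^2.
w, b = 0.0, 0.0
learning_rate = 0.001          # needs tuning: too large and the iteration diverges
for _ in range(5000):
    residual = w * x + b - y
    w -= learning_rate * 2 * np.sum(residual * x)
    b -= learning_rate * 2 * np.sum(residual)

# Analytical least-squares solution for comparison.
X = np.column_stack([x, np.ones_like(x)])
w_ls, b_ls = np.linalg.lstsq(X, y, rcond=None)[0]

print(w, b)        # close to the analytical solution
print(w_ls, b_ls)  # ≈ 2, 1
```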

Q:


Decision Boundary

A:

P̂(y = 1|x) = P̂(y = 0|x) = 0.5


Note that σ(0) = 0.5. Hence the problem
simplifies to finding all x such that:
wx + b = 0
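
As a quick illustration with made-up logistic-regression weights, the predicted probability crosses 0.5 exactly where wx + b = 0:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up parameters of a fitted logistic regression with one feature.
w, b = 3.0, -1.5

x_boundary = -b / w                     # decision boundary: w*x + b = 0
print(x_boundary)                       # 0.5
print(sigmoid(w * x_boundary + b))      # 0.5, i.e. P(y=1|x) = P(y=0|x)

print(sigmoid(w * 0.4 + b) > 0.5)       # False -> predict y = 0
print(sigmoid(w * 0.6 + b) > 0.5)       # True  -> predict y = 1
```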

Q:


Binary Cross Entropy

A:

Let X be a binary random variable with true distribution P and estimated distribution P̂. Let:

p̂ = P̂(X = 1), p = P(X = 1)

The binary cross entropy between P̂ and P is:

E(p̂, p) = −p log(p̂) − (1 − p) log(1 − p̂)

Binary cross entropy can be interpreted as the 'distance' between P̂ and P.

When using binary cross entropy, our estimation problem becomes a convex optimization problem, which we can also nicely solve by gradient descent.
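
A small Python sketch of the formula, with made-up probabilities, just to show how the loss behaves:

```python
import numpy as np

def binary_cross_entropy(p_hat, p):
    """E(p_hat, p) = -p log(p_hat) - (1 - p) log(1 - p_hat)."""
    return -p * np.log(p_hat) - (1 - p) * np.log(1 - p_hat)

# With an observed label y in {0, 1}, p is 0 or 1 and the loss reduces
# to -log(p_hat) or -log(1 - p_hat).
print(binary_cross_entropy(0.9, 1))    # ~0.105: confident and correct -> small loss
print(binary_cross_entropy(0.1, 1))    # ~2.303: confident and wrong   -> large loss

# The loss is smallest when the estimate matches the true probability.
print(binary_cross_entropy(0.6, 0.6))  # ~0.673
print(binary_cross_entropy(0.3, 0.6))  # ~0.865, larger
```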

Q:


Bernoulli Distribution

A:

X ∼ Bern(ρ) means that X follows a Bernoulli distribution with success probability ρ ∈ [0, 1].


Example: Suppose X ∈ {Yes, No} represents a customer subscribing to a short-term
loan at a bank. It is known that X ∼ Bern(0.6). We have that:


P(X = Yes) = ρ = 0.6
P(X = No) = 1 − ρ = 0.4

Q:


Marginal Independence

A:

X and Y are marginally independent iff:


P(X,Y) = P(X)P(Y) for all values of X and Y


Y does not provide any information about X (and vice versa).
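
A quick numeric check of the definition, using a made-up joint distribution over two binary variables:

```python
import numpy as np

# Rows index X in {0, 1}; columns index Y in {0, 1}.
joint = np.array([[0.30, 0.30],
                  [0.20, 0.20]])   # P(X, Y)

p_x = joint.sum(axis=1)            # P(X) = [0.6, 0.4]
p_y = joint.sum(axis=0)            # P(Y) = [0.5, 0.5]

# Marginal independence: P(X, Y) = P(X) P(Y) for every combination of values.
print(np.allclose(joint, np.outer(p_x, p_y)))   # True
```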

Q:


Conditional Independence


A:

X and Y are conditionally independent given Z iff:


P(X,Y|Z) = P(X|Z)P(Y|Z) for all values of X, Y, and Z


Y does not provide any additional information about X (and vice versa) if Z is known.
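
A numeric illustration with a made-up three-variable distribution constructed to satisfy the definition:

```python
import numpy as np

p_z = np.array([0.4, 0.6])            # P(Z)
p_x_z = np.array([[0.9, 0.1],         # P(X | Z=0)
                  [0.2, 0.8]])        # P(X | Z=1)
p_y_z = np.array([[0.7, 0.3],         # P(Y | Z=0)
                  [0.5, 0.5]])        # P(Y | Z=1)

# Joint P(Z, X, Y) = P(Z) P(X|Z) P(Y|Z); axes are [z, x, y].
joint = p_z[:, None, None] * p_x_z[:, :, None] * p_y_z[:, None, :]

# Check: P(X, Y | Z=z) = P(X | Z=z) P(Y | Z=z) for each value z.
for z in (0, 1):
    p_xy_given_z = joint[z] / joint[z].sum()
    print(np.allclose(p_xy_given_z, np.outer(p_x_z[z], p_y_z[z])))   # True
```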

Q:


Maximum Likelihood Estimation Assumptions

A:


In MLE, we search for the set of parameters that most likely generated a given sample of data.

It is assumed that the sample is IID:


1. Independent: each data point is generated independently from other data points.
2. Identically Distributed: each data point is generated by the same distribution.


If these assumptions hold, then the MLE estimates converge to the true distribution parameters as the sample size goes to infinity.
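
A small sketch of MLE for a Bernoulli parameter, whose closed-form MLE is the sample mean; the true parameter and sample sizes below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rho = 0.6

# For an IID Bernoulli sample, the likelihood is prod_i rho^x_i (1 - rho)^(1 - x_i),
# which is maximised at rho_hat = mean of the sample.
for n in (10, 1_000, 100_000):
    sample = rng.binomial(1, true_rho, size=n)
    rho_hat = sample.mean()
    print(n, rho_hat)    # the estimate approaches 0.6 as n grows
```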

Q:

How many parameters must be estimated (without any independence assumptions)?

A:

Suppose we have m features and each feature takes k discrete values. To model the full joint distribution of a binary class and the m features, we need to estimate 2k^m − 1 parameters. When m = 10 and k = 4, that is over 2 million parameters!


Q:

How many parameters must be estimated with Naïve Bayes?

A:


Under the Naïve Bayes (conditional independence) assumption, we need to estimate only 2m(k − 1) + 1 parameters. Hence, for m = 10 and k = 4, only 61 parameters need to be estimated!
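
A quick check of both counts, this card and the previous one, for m = 10 features with k = 4 values each and a binary class:

```python
m, k = 10, 4

full_joint = 2 * k**m - 1            # no independence assumptions
naive_bayes = 2 * m * (k - 1) + 1    # Naive Bayes assumption

print(full_joint)    # 2097151 -> over 2 million
print(naive_bayes)   # 61
```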

Q:


Naïve Bayes Pros & Cons

A:

Pros:
• Simple to implement.
• Requires very few data points.
• Low CPU/memory complexity.
• Can deal with missing data.
• Good baseline model.
Cons:
• Conditional independence assumption is typically too ’Naïve’.
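
A minimal sketch of Naïve Bayes as a quick baseline on categorical features, assuming scikit-learn is available (the tiny dataset is made up):

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Made-up data: 6 samples, 3 categorical features with values in {0, 1, 2}.
X = np.array([[0, 1, 2],
              [1, 1, 0],
              [2, 0, 1],
              [0, 2, 2],
              [1, 0, 0],
              [2, 2, 1]])
y = np.array([1, 0, 0, 1, 0, 1])

model = CategoricalNB()   # one conditional distribution per feature and class
model.fit(X, y)

print(model.predict([[0, 1, 2]]))         # predicted class
print(model.predict_proba([[0, 1, 2]]))   # class probabilities under the independence assumption
```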

Q:


How to Choose ζ?

A:

It only makes sense to compare the performance of classifiers at the same threshold ζ.

Usually, ζ is unknown and must be chosen by a domain expert based on the risk appetite.

Q:


Is there a way to evaluate a classifier independent of ζ?

A:
  1. Receiver Operating Characteristic Curve
  2. Precision-Recall Curve
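
A minimal sketch, assuming scikit-learn and made-up labels and scores, showing that both curves sweep over all thresholds rather than fixing a single ζ:

```python
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve, roc_auc_score

# Made-up true labels and predicted probabilities from some classifier.
y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)                  # ROC curve points
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)

print(roc_auc_score(y_true, y_score))   # threshold-free summary: area under the ROC curve
```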
