Gauss-Markov assumptions and unbiased, consistent and efficient estimators


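Unbiased: the expected value of the estimator equals the true parameter, E(β̂) = β

Consistent: β̂ converges to β as n grows (for an unbiased estimator it is enough that its variance shrinks to 0 as n increases)

Efficient: no other linear unbiased estimator has a smaller variance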
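Unbiasedness and consistency can be illustrated with a small Monte Carlo simulation. The sketch below is only an illustration under assumed settings: a true slope of 2, standard normal regressor and errors, and 500 replications. The function names `ols_slope` and `simulate_slopes` are made up for this example.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope for one regressor plus intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(n, beta=2.0, reps=500, seed=0):
    """Draw `reps` samples of size n from y = 1 + beta*x + e and return the estimated slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1.0 + beta * xi + rng.gauss(0, 1) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

small = simulate_slopes(n=20)    # small samples
large = simulate_slopes(n=500)   # large samples

# Unbiased: the average estimate is close to the true slope for both sample sizes.
# Consistent: the spread of the estimates shrinks as n grows.
print("n=20 :", statistics.fmean(small), statistics.pvariance(small))
print("n=500:", statistics.fmean(large), statistics.pvariance(large))
```

Both averages should sit near 2, while the variance for n=500 should be much smaller than for n=20.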

Gauss-Markov assumptions

1) Linearity (in the parameters)

2) No perfect multicollinearity among the predictors

3) Homoskedasticity (constant error variance)

4) No autocorrelation of the errors

5) Expected value of the error term = 0

5) Expected value of the error vector, given X, is 0 (E[ε | X] = 0)

Assumption: Other factors, which are not explicitly accounted for in the
model but are contained in ε, are not correlated with X (exogeneity)

• Endogeneity is present when an independent variable is correlated with the
error term, i.e. their covariance is nonzero

–> A likely cause is omitted variable bias
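Omitted variable bias can be made visible with simulated data. The following is a minimal sketch under assumed values: a true coefficient of 2 on x, and an omitted factor z that affects y and is correlated with x. The helper names `ols_slope` and `residualize` are hypothetical. The coefficient of the full regression is obtained by first removing z from both x and y (Frisch-Waugh-Lovell partialling out).

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope for one regressor plus intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residualize(a, z):
    """Residuals of a after regressing it on z (with intercept)."""
    b = ols_slope(z, a)
    ma, mz = statistics.fmean(a), statistics.fmean(z)
    return [ai - ma - b * (zi - mz) for ai, zi in zip(a, z)]

rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]            # factor that will be omitted
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]       # regressor correlated with z
y = [2.0 * xi + 1.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# z is left in the error term: x is endogenous and the slope is biased upward.
slope_short = ols_slope(x, y)

# z is controlled for: the slope on x is close to the true value 2.
slope_long = ols_slope(residualize(x, z), residualize(y, z))

print("z omitted :", slope_short)
print("z included:", slope_long)
```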

What are the main estimators in the fixed-effects model?

1) First-differences (most elegant way)
2) Within
3) Between (uses only the variation across individuals, so it does not remove the individual effect)
4) Least squares dummy variable (disadvantage: many dummy variables to keep track of)
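To compare the estimators listed above, the sketch below simulates a small panel in which the individual effect is correlated with the regressor. All settings (400 individuals, 3 periods, true slope 1.5) and all variable names are assumptions chosen for illustration. The least squares dummy variable estimator is not computed separately, because its slope is numerically identical to the within slope.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope for one regressor plus intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(2)
N, T, beta = 400, 3, 1.5

# One (xs, ys) pair per individual; lam is the individual-specific effect.
panel = []
for _ in range(N):
    lam = rng.gauss(0, 1)
    xs = [0.7 * lam + rng.gauss(0, 1) for _ in range(T)]   # x correlated with lam
    ys = [beta * x + lam + rng.gauss(0, 0.5) for x in xs]
    panel.append((xs, ys))

x_w, y_w, x_d, y_d, x_b, y_b = [], [], [], [], [], []
for xs, ys in panel:
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # Within: subtract each individual's own time mean (removes lam).
    x_w += [x - mx for x in xs]
    y_w += [y - my for y in ys]
    # First differences: subtract the previous period (removes lam).
    x_d += [xs[t] - xs[t - 1] for t in range(1, T)]
    y_d += [ys[t] - ys[t - 1] for t in range(1, T)]
    # Between: keep only the individual means (lam stays in the data).
    x_b.append(mx)
    y_b.append(my)

beta_within = ols_slope(x_w, y_w)
beta_fd = ols_slope(x_d, y_d)
beta_between = ols_slope(x_b, y_b)

print("within :", beta_within)
print("FD     :", beta_fd)
print("between:", beta_between)
```

In this setup the within and first-difference slopes should land near 1.5, while the between slope is pulled upward by the individual effect it does not remove.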

What is the random effects assumption?

The random effects assumption (in a random effects model) is that the
individual-specific effects are uncorrelated with the independent variables
(cov(λ_i, x_jit) = 0, but λ_i might be correlated).

1) Linearity + reformulations

If linearity does not hold –> reformulate the model

1) Polynomial regression (if the data show a curve)

2) Log transformation (if there are outliers)

3) Nonlinear with a constant (e.g. exp(..)), if the data show a curve but no negative turn

4) Piecewise regression
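As an illustration of a reformulation, the sketch below fits an exponential relationship by taking logs, which turns it into a model that is linear in the parameters. The data-generating process (y = 3·exp(0.5·x) with small multiplicative noise) and the helper name `ols_fit` are assumptions made for this example only.

```python
import math
import random
import statistics

def ols_fit(x, y):
    """Return (intercept, slope) of a simple OLS regression."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(3)
x = [0.1 * i for i in range(50)]
# Curved data without a downward turn: y = 3 * exp(0.5 * x) * noise
y = [3.0 * math.exp(0.5 * xi) * math.exp(rng.gauss(0, 0.05)) for xi in x]

# log(y) = log(3) + 0.5 * x + error is linear in the parameters.
a, b = ols_fit(x, [math.log(yi) for yi in y])
scale = math.exp(a)   # back-transform the intercept

print("estimated growth rate:", b)
print("estimated scale      :", scale)
```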

Numerical Prediction

• Given a collection of data with known numeric outputs, create a function that outputs a
predicted value from a new set of inputs
• E.g. given gestation time of an animal, predict its maximum life span

Association Rule Analysis

• Identify relationships in data from co-occurring terms or items
• E.g., analyze grocery store purchases to identify items most commonly purchased together
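The grocery example can be sketched by counting how often pairs of items appear in the same basket. The baskets below are invented toy data, and `support` and `confidence` are hypothetical helper names for the two usual rule measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (toy data for illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))

def support(a, b):
    """Share of baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)

def confidence(a, b):
    """Share of baskets containing a that also contain b (rule a -> b)."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(pair_counts.most_common(3))
print(support("beer", "diapers"))      # 3 of 5 baskets
print(confidence("beer", "diapers"))   # every beer basket also has diapers
print(confidence("diapers", "beer"))   # 3 of 4 diaper baskets have beer
```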

Classification

• From data with known labels, create a classifier that determines which label to apply to a
new observation
• E.g. Identify new loan applicants as low, medium, or high risk based on existing applicant
behavior
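A very simple classifier for the loan example is a nearest-centroid rule: average the known applicants per label and assign a new applicant to the closest average. This is only one of many possible classifiers, and the features (debt-to-income ratio, number of late payments), the training rows, and the function names are all invented for illustration.

```python
import math

# Hypothetical labeled applicants: (debt-to-income ratio, late payments) -> risk label.
training = [
    ((0.10, 0), "low"),
    ((0.15, 1), "low"),
    ((0.35, 2), "medium"),
    ((0.40, 3), "medium"),
    ((0.70, 6), "high"),
    ((0.80, 8), "high"),
]

def centroids(data):
    """Average feature vector for each label."""
    sums, counts = {}, {}
    for features, label in data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}

def classify(features, cents):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(cents, key=lambda label: math.dist(features, cents[label]))

cents = centroids(training)
print(classify((0.20, 1), cents))
print(classify((0.38, 2), cents))
print(classify((0.60, 5), cents))
```

The features are left unscaled here, so the late-payment count dominates the distance; a real application would normally standardize the inputs first.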

What are the steps from data to information?
1. Data consolidation
2. Selection and Processing
3. Predictive Analytics
4. Interpretation and Evaluation

What is the fixed effect assumption?

The fixed effect assumption is that the individual-specific effect is
correlated with the independent variables (cov(λ_i, x_jit) ≠ 0).

Clustering

• Identify “natural” groupings in data
• Unsupervised learning, no predefined groups
• E.g. Identify clusters of “similar” customers
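One common way to find such groupings is k-means, shown here as a minimal sketch and not as the only clustering method. The customer points are invented, the number of clusters is assumed to be known (k = 2), and the first k points are used as starting centroids to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; starts from the first k points and stops when centroids no longer move."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster where it was
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (monthly spend, store visits), no labels given.
customers = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = kmeans(customers, k=2)
print(labels)
print(centers)
```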

For an algorithm to be useful in a wide range of real-world
applications it must:

– Permit numeric attributes
– Allow missing values
– Be robust in the presence of noise

• Basic algorithm needs to be extended to fulfill these requirements

