Your peers in the course Business Analytics at the TU München create and share summaries, flashcards, study plans and other learning materials with the intelligent StudySmarter learning app.

Business Analytics

Partial least squares

PLS uses weights that reflect the covariance structure between the inputs and the response –> more difficult

5) The expected value of the residual vector, given 𝑋, is 0: 𝐸(𝜀 | 𝑋) = 0

Assumption: other factors that are not explicitly accounted for in the model but are contained in 𝜀 are not correlated with 𝑋 (exogeneity)

• Endogeneity occurs when an independent variable is correlated with the error term, i.e. their covariance is nonzero

–> Probably omitted variable bias

Bootstrap

– sampling several times with replacement from the training set to form bootstrap sets

– some observations appear more than once

– each bootstrap data set contains 𝑛 observations, sampled with replacement from the original data set

• Then the model is estimated on a bootstrap data set, and predictions are made for the original training set.

• This process is repeated many times and the resulting statistics are averaged.
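
The bootstrap procedure above can be sketched in a few lines of Python. This is a minimal sketch, not the course's reference implementation: the linear model, the function name bootstrap_mse, the number of rounds and the toy data are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_mse(x, y, n_rounds=200, seed=0):
    """Average MSE on the original set of linear models fitted on bootstrap sets."""
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), x])      # design matrix with intercept
    errors = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)      # n draws WITH replacement
        beta, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        # predictions are made for the original training set
        errors.append(np.mean((y - A @ beta) ** 2))
    return float(np.mean(errors))             # average over all rounds

# Hypothetical toy data: y = 1 + 2x plus Gaussian noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 50)
print(bootstrap_mse(x, y))
```

Because each bootstrap set is drawn with replacement, some rows of the training set appear several times and others not at all in a given round.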

Leave-one-out cross-validation

𝑛-fold cross-validation, where 𝑛 is the number of instances in the data set

• Use all but one instance for training

• Each iteration is evaluated by predicting the omitted instance

• Advantages

– Maximum use of the data for training

– Deterministic (no random sampling of test sets)

• Disadvantages

– High computational cost

– Non-stratified sample!
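
As an illustration of the scheme above, the following sketch runs leave-one-out cross-validation for a simple linear model. The model choice, the function name loocv_mse and the toy data are assumptions; any learner could take the place of the least-squares fit.

```python
import numpy as np

def loocv_mse(x, y):
    """Leave-one-out estimate of the mean squared prediction error."""
    n = len(y)
    A = np.column_stack([np.ones(n), x])      # design matrix with intercept
    sq_errors = []
    for i in range(n):                        # one iteration per instance
        train = np.arange(n) != i             # all instances except i
        beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        sq_errors.append((y[i] - A[i] @ beta) ** 2)  # predict omitted instance
    return float(np.mean(sq_errors))

# Hypothetical toy data
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 40)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 40)
print(loocv_mse(x, y))
```

The loop fits the model 𝑛 times, which is where the high computational cost comes from, and no random numbers are involved in choosing the test sets.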

Solutions for multicollinearity

• Subset selection

– best subset, backward, forward, or stepwise selection of features (already discussed in the context of linear regression)

• Using derived inputs

– Principal component regression

– Partial least squares

• Coefficient shrinkage (regularization)

– Ridge regression

– Lasso (least absolute shrinkage and selection operator)
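
To make the shrinkage idea concrete, here is a small sketch of ridge regression on two almost collinear features, using the standard closed form (XᵀX + λI)⁻¹Xᵀy on centered data. The function name ridge_fit, the penalty λ = 1 and the toy data are assumptions chosen for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge coefficients (X^T X + lam*I)^-1 X^T y for centered X and y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical data with two nearly identical (collinear) features
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)   # almost a copy of x1
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)                       # center so no intercept is needed
y = 3.0 * x1 + rng.normal(scale=0.5, size=100)
y = y - y.mean()

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # unregularized fit
beta_ridge = ridge_fit(X, y, lam=1.0)             # shrunken coefficients
print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
```

With collinear inputs the unregularized coefficients can become large with opposite signs; the ridge penalty pulls the coefficient vector towards zero, so its norm is never larger than that of the least-squares solution.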

Model selection

Widespread methods for model selection are:

• Akaike Information Criterion (AIC)

– 𝐴𝐼𝐶 = 2𝑘 − 2 ln(𝐿), already discussed in the context of logistic regression

– 𝑘 is the number of parameters, ln(𝐿) the log-likelihood

• Minimum description length (Rissanen, 1978)

– discussed later in this class

• Resampling methods

– Cross-validation, jackknife, bootstrap, etc.
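
The AIC formula above can be evaluated as follows for a linear model. This sketch assumes Gaussian errors, so that ln(𝐿) has a closed form at the maximum-likelihood estimates, and it counts the noise variance as one of the 𝑘 parameters. The function name gaussian_aic and the toy data are hypothetical.

```python
import numpy as np

def gaussian_aic(X, y):
    """AIC = 2k - 2 ln(L) for a least-squares fit with Gaussian errors."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    sigma2 = rss / n                              # ML estimate of the variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = A.shape[1] + 1                            # coefficients + variance
    return 2 * k - 2 * log_lik

# Hypothetical data: y depends on x1 only, x2 is an irrelevant feature
rng = np.random.default_rng(0)
x1 = rng.normal(size=80)
x2 = rng.normal(size=80)
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=80)

print("x1 only:", gaussian_aic(x1, y))
print("x1 + x2:", gaussian_aic(np.column_stack([x1, x2]), y))
```

The model with the lower AIC is preferred. An extra parameter costs 2 units, so it only pays off if it raises the log-likelihood by more than 1.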

Model selection and model assessment

Model selection: estimating the performance of different models in order to choose the best one (the one with the minimum test error)

Model assessment: having chosen a model, estimating its prediction error on new data

Generalization errors

Components of generalization error

• Bias is error from erroneous assumptions in the learning algorithm, i.e. inaccurate assumptions or simplifications made by the model.

• Variance is error from sensitivity to small fluctuations in the training set. High variance causes overfitting.

Underfitting: model is too “simple” to represent all relevant characteristics

• High bias and low variance

• High training error and high test error

Overfitting: model is too “complex” and fits irrelevant characteristics/noise

• Low bias and high variance

• Low training error and high test error
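
The contrast between underfitting and overfitting can be explored with polynomial fits of increasing degree. The sketch below is only an illustration under assumed settings: the sine-shaped target, the noise level, the sample sizes and the degrees 1, 3 and 12 are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical data: noisy samples of sin(pi * x) on [-1, 1]."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.2, n)
    return x, y

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

results = {}
for degree in (1, 3, 12):                     # too simple, moderate, too complex
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_test, y_test))
    print(degree, results[degree])            # (training error, test error)
```

The training error can only go down as the degree grows, because each lower-degree polynomial is a special case of the higher-degree one. The test error typically behaves differently: it tends to be high for the straight line (underfitting) and to rise again for the degree-12 fit (overfitting).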

Supervised learning

ŷ = 𝑓(𝒙)

Supervised learning is inferring a function from labeled training data

Training: given a training set of labeled examples, estimate the prediction function 𝑓 by minimizing the prediction error on the training set

Testing: apply 𝑓 to a never-before-seen test example 𝒙 and output the predicted value ŷ = 𝑓(𝒙)
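
A minimal sketch of the two phases, assuming 𝑓 is a straight line fitted by least squares. The four labeled examples and the test input 5.0 are made-up values.

```python
import numpy as np

# Training: labeled examples (x_i, y_i), hypothetical values
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.1, 4.9, 7.2, 8.8])

# Estimate f by minimizing the squared prediction error on the training set
A = np.column_stack([np.ones_like(x_train), x_train])
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def f(x):
    """Learned prediction function: intercept + slope * x."""
    return w[0] + w[1] * x

# Testing: apply f to an example that was not in the training set
y_hat = f(5.0)
print(y_hat)
```

Here the fitted line has intercept 1.15 and slope 1.94, so the prediction for the unseen input 5.0 is 10.85.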

For an algorithm to be useful in a wide range of real-world applications it must:

– Permit numeric attributes

– Allow missing values

– Be robust in the presence of noise

• The basic algorithm needs to be extended to fulfill these requirements

Comparing error rates

• Simple approach: choose the model with the lowest error rate

– But the estimated error rate is just an estimate (a random quantity)

• Student’s paired 𝑡-test tells us whether the means of two samples are significantly different

• Construct a 𝑡-test statistic

– Needs the variance as well as the point estimates
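
The paired 𝑡-statistic can be computed from per-fold error rates as in the sketch below. The function name paired_t_statistic and the error rates of the two models over five folds are hypothetical values chosen for illustration.

```python
import math
import statistics

def paired_t_statistic(errors_a, errors_b):
    """t = mean(d) / (sd(d) / sqrt(k)) for the paired differences d = a - b."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    k = len(diffs)
    mean_diff = statistics.mean(diffs)        # point estimate
    sd_diff = statistics.stdev(diffs)         # sample standard deviation (k - 1)
    return mean_diff / (sd_diff / math.sqrt(k))

# Hypothetical error rates of two models on the same five folds
errors_a = [0.10, 0.12, 0.11, 0.13, 0.09]
errors_b = [0.14, 0.15, 0.13, 0.16, 0.12]

t = paired_t_statistic(errors_a, errors_b)
print(round(t, 3))   # about -9.487, with k - 1 = 4 degrees of freedom
```

The statistic is compared with the critical value of Student's 𝑡 distribution with 𝑘 − 1 degrees of freedom (2.776 for 4 degrees of freedom at the two-sided 5% level). In this made-up example |𝑡| is far above that value, so the difference between the two models would be judged significant.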

1) Linearity and reformulations

If the linearity assumption does not hold –> reformulate the model

1) polynomial regression (if there is a curve in the data)

2) log transformation if there are outliers

3) nonlinear function with a constant (e.g. exp(..)) if there is a curve but no negative turn

4) piecewise
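
As one example of such a reformulation, a log transformation turns an exponential relationship into a linear one that ordinary least squares can fit. The sketch assumes data of the form y = a·exp(b·x); the values a = 2.0 and b = 0.7 and the noise-free data are hypothetical and only serve to show that the transformation recovers the parameters.

```python
import numpy as np

# Hypothetical curved data: y = a * exp(b * x), without noise
a_true, b_true = 2.0, 0.7
x = np.linspace(0.0, 4.0, 20)
y = a_true * np.exp(b_true * x)

# Reformulation: ln(y) = ln(a) + b * x is linear in x
slope, intercept = np.polyfit(x, np.log(y), 1)

a_hat = np.exp(intercept)   # back-transform the intercept
b_hat = slope
print(a_hat, b_hat)
```

A straight line fitted to the original (x, y) pairs would miss the curvature, while the line fitted to (x, ln y) describes the data exactly in this noise-free case.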
