How are the true positive rate and the true negative rate calculated?

TP rate = sensitivity = 1 - FNR = 1 - (Type II errors/P) = TP/P

TN rate = specificity = 1 - FPR = 1 - (Type I errors/N) = TN/N

Type I error: false positive or FP

Type II error: false negative or FN
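
A minimal sketch of the two formulas, assuming P = TP + FN (all actual positives) and N = TN + FP (all actual negatives). The function name `rates` and the counts are made up for illustration.

```python
def rates(tp, fn, tn, fp):
    """Return (TP rate, TN rate) from confusion-matrix counts."""
    p = tp + fn          # P: all actual positives
    n = tn + fp          # N: all actual negatives
    tpr = tp / p         # sensitivity = TP/P
    tnr = tn / n         # specificity = TN/N
    return tpr, tnr


# Hypothetical counts: 50 actual positives, 50 actual negatives.
tpr, tnr = rates(tp=40, fn=10, tn=45, fp=5)
fnr = 10 / 50            # Type II errors / P
fpr = 5 / 50             # Type I errors / N
print(tpr, tnr)
```

With these counts, `tpr` matches `1 - fnr` and `tnr` matches `1 - fpr` (up to floating-point rounding), which is the identity the formulas above state.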

Which activities belong to the Data Flow dimension?

- data collection
- data storage
- data accessing

Which activities belong to the Data Curation dimension?

- data cleaning
- data presentation
- data evaluation

Which activities belong to the Data Analytics dimension?

- statistical analysis
- modeling & simulations
- visual techniques

What are the two data types?

quantitative: data are measurable values

qualitative: data provide information about the quality of a good or a service

Describe the subtypes of quantitative data

- categorical nominal: no inherent order
- categorical ordinal: has an inherent order
- categorical binary: divided into two categories
- discrete: data attribute = a countable value, typically a whole number
- continuous: data attribute = any value within a range

What are the five approaches to transform a dataset?

- logarithm transformation
- power law transformation
- reciprocal transformation
- radial transformation
- discrete Fourier transform
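
A rough sketch of four of these transformations on a small made-up sample, using only the standard library. The exponent 0.5 for the power law and the helper name `dft` are assumptions chosen for illustration; the radial transformation is left out because the notes do not define it.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, v in enumerate(values))
        for k in range(n)
    ]


x = [1.0, 2.0, 4.0, 8.0]  # made-up, strictly positive sample values

log_x = [math.log(v) for v in x]      # logarithm transformation
power_x = [v ** 0.5 for v in x]       # power law transformation (exponent is arbitrary)
reciprocal_x = [1.0 / v for v in x]   # reciprocal transformation
dft_x = dft(x)                        # discrete Fourier transform (complex values)
```

The logarithm and reciprocal transformations require non-zero (for the logarithm, positive) inputs, which is why the sample values are strictly positive.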

How are the target variable and the output of the regression model related to each other?

y' = y + epsilon

The predicted output differs slightly from the target because the relationship between the independent variables and the target variable is not exactly linear. The error term epsilon accounts for this difference.

Compare Feature Engineering and Deep Learning

In FE we have to define the features ourselves; DL finds the features on its own.

In FE, domain knowledge is needed to choose relevant features.

Compare batch and stream processing.

Batch = collects data and transmits it as a block; for example, data from a retail store

Stream = provides data immediately; for example, sensor data in industry
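
One way to picture the difference, as a sketch only: the batch function needs the complete block before it can start, while the stream function updates its result as each value arrives. The function names and the sensor readings are hypothetical.

```python
def process_batch(records):
    """Batch: the whole block is available before processing starts."""
    return sum(records) / len(records)


def process_stream(source):
    """Stream: yield an updated running mean after every new value."""
    count, total = 0, 0.0
    for value in source:
        count += 1
        total += value
        yield total / count


readings = [20.0, 22.0, 21.0, 25.0]   # made-up sensor readings

batch_mean = process_batch(readings)                   # one result for the block
running_means = list(process_stream(iter(readings)))   # one result per arrival
```

The last running mean equals the batch mean; the stream version simply makes intermediate results available earlier.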

What is AR used for? And what is ARIMA used for?

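AR = for data where the current value depends linearly on previous values

ARIMA = for data that AR alone does not model well, such as non-stationary series with a trend; uses additional MA (moving average) and integrated (differencing) terms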
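
A simplified sketch of two building blocks, not a full ARIMA implementation: fitting a first-order AR model x_t = c + phi * x_(t-1) by least squares, and first-order differencing, which is the usual way the "integrated" term is realized. The helper names `fit_ar1` and `difference` and the noise-free series are assumptions for illustration.

```python
def difference(series):
    """First-order differencing: x_t - x_(t-1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares estimate of c and phi in x_t = c + phi * x_(t-1)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next)
              for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


# Noise-free series generated by x_t = 1 + 0.5 * x_(t-1), starting at 0.
series = [0.0]
for _ in range(10):
    series.append(1.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)                      # recovers roughly c = 1, phi = 0.5
trend_removed = difference([1, 3, 6, 10])     # differencing a trending series
```

Because the example series has no noise, the fit recovers the generating coefficients almost exactly; real data would only approximate them.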

What is the "least-squares method"?

Find the values of w0 and w1 that minimize the sum of the squared errors between the predicted and the target values.
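
A minimal sketch, assuming the model is the straight line y' = w0 + w1 * x (the notes name w0 and w1 but do not spell out the model). It uses the standard closed-form solution for one input variable; the function names and data points are made up.

```python
def least_squares(xs, ys):
    """Closed-form w0, w1 minimizing the sum of squared errors for y' = w0 + w1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    w0 = mean_y - w1 * mean_x
    return w0, w1


def sum_squared_error(xs, ys, w0, w1):
    """Sum of squared differences between predictions and targets."""
    return sum((w0 + w1 * x - y) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]     # made-up points lying exactly on y = 1 + 2x

w0, w1 = least_squares(xs, ys)
best_error = sum_squared_error(xs, ys, w0, w1)
worse_error = sum_squared_error(xs, ys, w0 + 0.5, w1)  # any other choice does worse
```

Here the points lie exactly on a line, so the minimal error is zero; with noisy data the minimum would be positive, corresponding to the error term epsilon.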

