Quantitative Variable

takes numerical values for which arithmetic operations such as adding and averaging make sense

- interval or ratio scale
Cases

objects described by a set of data (customers, companies, subjects in a study, …)

Residuals

- the difference between an observed value of the response variable and the value predicted by the regression line: residual = observed y − predicted y = y − ŷ

- the mean of the least-squares residuals is always zero
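
As a minimal sketch of both points, the snippet below fits a least-squares line by hand and computes the residuals. The data values and the helper name `fit_line` are illustrative assumptions, not part of these notes.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)

# residual = observed y - predicted y
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
mean_residual = mean(residuals)

print(residuals)
print(mean_residual)  # zero up to floating-point rounding
```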

Intercept

the predicted value of y when x = 0

b0 = ȳ − b1 · x̄
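
A small worked example of the intercept formula. The numbers (mean of x = 4, mean of y = 10, slope b1 = 1.5) and the function name `intercept` are assumptions chosen only to make the arithmetic easy to follow.

```python
def intercept(x_bar, y_bar, b1):
    """Intercept of the least-squares line: b0 = y_bar - b1 * x_bar."""
    return y_bar - b1 * x_bar

# Assumed summary values, for illustration only.
x_bar, y_bar, b1 = 4.0, 10.0, 1.5

b0 = intercept(x_bar, y_bar, b1)      # 10 - 1.5 * 4
prediction_at_zero = b0 + b1 * 0.0    # the intercept is the prediction at x = 0
prediction_at_mean = b0 + b1 * x_bar  # plugging in x_bar gives back y_bar

print(b0, prediction_at_zero, prediction_at_mean)
```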

Inter-rater reliability / Cohen’s Kappa (κ or Ψ)

- provides a measure of agreement between two observers coding on a nominal scale
- observed level of agreement relative to the level of agreement that would be expected by chance
- To what extent are the judgements similar?
Kappa (κ) = (Ao − Ae) / (N − Ae)
Ao = number of agreements observed
Ae = number of agreements expected by chance
N = total number of coded cases
κ ≥ 0.70 is considered ‘good’ (0.40 ≤ κ < 0.70 ‘reasonable’, κ < 0.40 ‘bad’)
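
The formula can be sketched in a few lines. This assumes Ao and Ae are counts of agreements and that Ae is obtained from each observer's category totals (row total × column total / N, summed over categories). The function name `cohens_kappa` and the ratings are made up for illustration.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers, using agreement counts."""
    n = len(ratings_a)
    # Ao: number of cases where both observers chose the same category.
    a_o = sum(a == b for a, b in zip(ratings_a, ratings_b))
    # Ae: number of agreements expected by chance,
    # from each observer's totals per category.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    a_e = sum(count_a[c] * count_b[c] / n for c in count_a)
    return (a_o - a_e) / (n - a_e)

# Made-up codings of 10 cases by two observers.
observer_1 = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "no"]
observer_2 = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]

kappa = cohens_kappa(observer_1, observer_2)
print(kappa)  # Ao = 8, Ae = 5, N = 10, so (8 - 5) / (10 - 5)
```

With these ratings the observers agree on 8 of 10 cases, but 5 agreements would be expected by chance, so the value falls in the ‘reasonable’ band.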

Outliers in regression

- outliers in the y direction of a scatterplot have large regression residuals, but other outliers need not have large residuals

- points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line
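
To illustrate both bullets, the sketch below fits a line with and without a single point that lies far out in the x direction. The data and the helper `fit_line` are illustrative assumptions. The added point pulls the slope strongly towards itself, yet its own residual is not the largest one.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

# Made-up data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
b0_without, b1_without = fit_line(xs, ys)

# Add one point that is an outlier in the x direction.
xs_out = xs + [15.0]
ys_out = ys + [5.0]
b0_with, b1_with = fit_line(xs_out, ys_out)

residuals = [y - (b0_with + b1_with * x) for x, y in zip(xs_out, ys_out)]
outlier_residual = residuals[-1]
largest_other = max(abs(e) for e in residuals[:-1])

print(b1_without, b1_with)              # slope drops sharply
print(outlier_residual, largest_other)  # outlier's residual is not the largest
```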

1. There is a close connection between correlation and the slope of the least-squares line.
• along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y
• if the correlation is 0, the slope is 0
2. The least-squares regression line always passes through the point (x̄, ȳ).
3. The distinction between explanatory and response variables is essential in regression.
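
Facts 1 and 2 can be checked numerically. The sketch assumes the standard relation slope b1 = r · s_y / s_x (which is what the first bullet expresses) and uses made-up data; all variable names are illustrative.

```python
from statistics import mean, stdev

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)

# Correlation as the average product of z-scores (dividing by n - 1).
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

b1 = r * s_y / s_x       # slope from the correlation
b0 = y_bar - b1 * x_bar  # intercept

# Fact 1: moving x by one standard deviation moves predicted y by r * s_y.
step = (b0 + b1 * (x_bar + s_x)) - (b0 + b1 * x_bar)

# Fact 2: the line passes through (x_bar, y_bar).
pred_at_mean = b0 + b1 * x_bar

print(b1, step, r * s_y, pred_at_mean, y_bar)
```

Swapping the roles of x and y (fact 3) would give a different line, because the slope then becomes r · s_x / s_y.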
Correlation

r
- measures the direction and strength of the linear relationship between two quantitative variables

- positive when the association is positive, negative when the association is negative

- both variables have to be quantitative

- because r is computed from z-scores (standardized values), it does not change when the units of measurement change (a linear transformation with a positive scale factor)

- always a number between −1 and 1
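
A minimal sketch of r computed from z-scores, showing that a change of units leaves it unchanged. The heights and weights are invented, and the function name `correlation` is an assumption for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up measurements in metric units.
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]
weights_kg = [55.0, 62.0, 66.0, 70.0, 80.0]
r_metric = correlation(heights_cm, weights_kg)

# The same measurements converted to inches and pounds.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_imperial = correlation(heights_in, weights_lb)

print(r_metric, r_imperial)  # equal up to floating-point rounding
```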

Histogram

• Stemplots are not suitable for large data sets
• displays only the count or percent of the observations that fall into each class
• large sets of data are usually presented in a frequency table
Categorical Variable (qualitative)

places a case into one of several groups or categories (e.g. gender)
- nominal, ordinal

Inferential statistics
conclusions about a population based on a limited number of elements (= sample) from that population

Quartiles
- describe spread by giving several percentiles
• the median is the 50th percentile
• the first (lower) quartile Q1 is the median of the lower half of the data (the 25th percentile)
• the third (upper) quartile Q3 is the median of the upper half of the data (the 75th percentile)
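
The "median of each half" rule can be sketched as follows. One assumption is made loudly here: when the number of observations is odd, the overall median is left out of both halves. Other conventions exist and can give slightly different quartiles. The function name `quartiles` and the data are illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: for an odd number of observations the overall
    median is excluded from both halves.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)

# Made-up data, for illustration only.
q1, med, q3 = quartiles([7, 1, 3, 9, 5, 11, 13])
print(q1, med, q3)
```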
