Data Mining and KD

ROC - explanation

Plot of the true positive rate (TPR) against the false positive rate (FPR), with one point per decision threshold
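A minimal sketch of how the ROC points could be computed, assuming binary labels (1 = positive) and a classifier that outputs a score per item. The function name `roc_points` and the example scores are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, for binary labels (1 = positive)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Sweep the threshold from the highest score down to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A good classifier produces points close to the top-left corner (high TPR at low FPR).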

PR - breakeven point

- The point where the precision-recall curve crosses the main diagonal, i.e. where precision = recall
- Important classification criterion
- High breakeven point = good classifier
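A rough sketch of one way to find the breakeven point, assuming a ranked list of scores without ties. Precision is TP/k and recall is TP/P, so the two are equal when the number of predicted positives k equals the number of actual positives P. The function name and data are illustrative only.

```python
def breakeven_point(scores, labels):
    """Precision (= recall) when the top-k items are predicted positive, k = number of positives."""
    # Rank the true labels by descending classifier score.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    k = sum(labels)                # number of actual positives
    true_positives = sum(ranked[:k])
    precision = true_positives / k
    recall = true_positives / sum(labels)
    assert precision == recall     # holds by construction at this cutoff
    return precision


# Hypothetical scores and labels (1 = positive).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
print(breakeven_point(scores, labels))
```

A classifier that ranks every positive above every negative reaches a breakeven point of 1.0.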

Linear discriminant analysis - how to calculate?

- You want to find the discriminant function w * x + b (its sign decides the class)
- w = mean1 - mean2
- b = - w * (mean1 + mean2) / 2
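A small sketch of this calculation for two classes, assuming b is meant as -w * (mean1 + mean2) / 2 so that the boundary w * x + b = 0 passes through the midpoint between the two class means. Function names and data points are made up for illustration.

```python
def mean(points):
    """Mean per feature of a list of equally long tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def fit_discriminant(class1, class2):
    m1, m2 = mean(class1), mean(class2)
    w = [a - b for a, b in zip(m1, m2)]               # w = mean1 - mean2
    midpoint = [(a + b) / 2 for a, b in zip(m1, m2)]  # (mean1 + mean2) / 2
    b = -sum(wi * ci for wi, ci in zip(w, midpoint))  # b = -w * midpoint
    return w, b


def classify(x, w, b):
    """Class 1 if w * x + b > 0, otherwise class 2."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 2


# Hypothetical training data.
class1 = [(2.0, 2.0), (4.0, 4.0)]   # mean (3, 3)
class2 = [(0.0, 0.0), (2.0, 0.0)]   # mean (1, 0)
w, b = fit_discriminant(class1, class2)
print(w, b)                       # [2.0, 3.0] -8.5
print(classify((3.0, 3.0), w, b)) # 1
print(classify((1.0, 0.0), w, b)) # 2
```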

Median filter, when?

- Series (sequential) data, e.g. time series
- When we have outliers
- Remove noise
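A minimal sketch of a median filter on a series, assuming a sliding window that simply shrinks at the two ends of the series (other boundary treatments are possible). The window size and data are illustrative.

```python
from statistics import median


def median_filter(series, window=3):
    """Replace each value by the median of its neighbourhood."""
    half = window // 2
    # The slice is clipped at the boundaries, so edge windows are shorter.
    return [median(series[max(0, i - half): i + half + 1]) for i in range(len(series))]


noisy = [1, 2, 100, 3, 4, 5]   # 100 is an outlier
print(median_filter(noisy))    # [1.5, 2, 3, 4, 4, 4.5]
```

The outlier 100 disappears completely, whereas a moving average over the same window would smear it across its neighbours.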

Edit distance

- The **minimum** number of **edit operations** needed to turn one sequence into the other
- **Operations**: insert, delete, or change a sequence element

We denote Lij(x, y) as the edit distance between the first i elements of x and the first j elements of y
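The definition of Lij suggests a dynamic-programming table. The sketch below assumes every operation (insert, delete, change) costs 1; the function name is illustrative.

```python
def edit_distance(x, y):
    """Minimum number of insert/delete/change operations turning x into y."""
    # L[i][j] = edit distance between the first i elements of x
    # and the first j elements of y.
    L = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        L[i][0] = i                      # delete all i elements
    for j in range(len(y) + 1):
        L[0][j] = j                      # insert all j elements
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            change = 0 if x[i - 1] == y[j - 1] else 1
            L[i][j] = min(L[i - 1][j] + 1,           # delete
                          L[i][j - 1] + 1,           # insert
                          L[i - 1][j - 1] + change)  # change (or keep)
    return L[len(x)][len(y)]


print(edit_distance("kitten", "sitting"))  # 3
```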

Fuzzy clustering, when?

- Good results, even if the clusters are overlapping and the data are noisy
- Sensitive to outliers: an outlier that is equidistant to all cluster centers gets the same memberships as a point in the middle between the clusters, although intuitively we expect outliers to have low membership
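The outlier problem can be illustrated with the standard fuzzy c-means membership formula, u_i = 1 / SUM_j (d_i / d_j)^(2/(m-1)). This is a sketch assuming Euclidean distances, fuzzifier m = 2, and two made-up cluster centers.

```python
import math


def memberships(x, centers, m=2.0):
    """Fuzzy c-means memberships of point x in each cluster (they sum to 1)."""
    dists = [math.dist(x, c) for c in centers]
    if 0.0 in dists:
        # A point lying exactly on a center belongs fully to that cluster.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]


centers = [(-1.0, 0.0), (1.0, 0.0)]           # hypothetical cluster centers
print(memberships((0.0, 0.0), centers))       # middle point: [0.5, 0.5]
print(memberships((0.0, 100.0), centers))     # far outlier:  [0.5, 0.5]
```

Because the memberships must sum to 1, the distant outlier receives the same memberships as the middle point.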

ID3, when to use?

- Decision tree learner for discrete (categorical) features; splits are chosen by information gain
- Its extension C4.5 accepts real-valued and missing features
- C4.5 also uses a pruning mechanism to reduce tree size
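ID3 picks, at each node, the feature with the highest information gain. The following sketch shows that criterion only (not the full tree construction); the feature names and data are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for f, y in zip(feature_values, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data set.
outlook = ["sunny", "sunny", "rain", "rain"]
wind = ["weak", "strong", "weak", "strong"]
play = ["no", "no", "yes", "yes"]
print(information_gain(outlook, play))  # outlook separates the classes perfectly
print(information_gain(wind, play))     # wind carries no information
```

Here ID3 would split on `outlook` first, since it has the larger gain.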

Principal component analysis - when?

Hypercube standardization is appropriate for

Mean and variance standardization is appropriate for

How to calculate the principal axis (PCA)

- Calculate the mean per feature
- Calculate the covariance matrix C
- Each entry: 1/n SUM( (x - x_mean)*(y - y_mean) )

- Set up det(C - lambda*I) = 0
- Solve for the eigenvalues lambda
- For each lambda, go back to your covariance matrix and write out C*v = lambda*v
- First row: c11*v1 + c12*v2 + ... = lambda*v1
- Second row: c21*v1 + c22*v2 + ... = lambda*v2
- etc

- Get the relationship between v1, v2 etc for each of the different lambdas
- Normalise them: divide by sqrt(v1^2 + v2^2 + ...)
- You now have your principal axes
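The steps above can be sketched for the two-feature case, where the characteristic equation is a quadratic that can be solved by hand. This is an illustration under that assumption (for more features one would normally use a numerical eigenvalue solver); the function name and data are made up.

```python
import math


def principal_axes_2d(points):
    """Return [(eigenvalue, unit eigenvector), ...] sorted by descending eigenvalue."""
    n = len(points)
    # Step 1: mean per feature.
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Step 2: covariance matrix entries, using the 1/n convention.
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    if cxy == 0:
        # Uncorrelated features: the coordinate axes are the principal axes.
        pairs = [(cxx, (1.0, 0.0)), (cyy, (0.0, 1.0))]
        return sorted(pairs, key=lambda p: p[0], reverse=True)
    # Step 3: det(C - lambda*I) = 0 gives
    # lambda = (cxx + cyy)/2 +/- sqrt(((cxx - cyy)/2)^2 + cxy^2).
    half_trace = (cxx + cyy) / 2
    root = math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    axes = []
    for lam in (half_trace + root, half_trace - root):
        # Step 4: first row of (C - lambda*I) v = 0 is
        # (cxx - lambda)*v1 + cxy*v2 = 0, solved by v = (cxy, lambda - cxx).
        v1, v2 = cxy, lam - cxx
        # Step 5: normalise to unit length.
        length = math.hypot(v1, v2)
        axes.append((lam, (v1 / length, v2 / length)))
    return axes


# Hypothetical data lying on the line y = x.
for lam, axis in principal_axes_2d([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]):
    print(lam, axis)
```

For this data the first principal axis points along the diagonal (1, 1)/sqrt(2), and the second eigenvalue is zero because there is no variance perpendicular to it.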

When does Naive Bayesian Classifier not work?

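- When classes are based on correlation between features (naive Bayes assumes the features are independent within each class)
- Variance difference
- Example, one class: Height -> + Weight -> –
- Example, other class: Height -> + Weight -> +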
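A small sketch of the failure case, assuming a Gaussian naive Bayes model (one mean and variance per feature and class) and made-up data in which the two classes differ only in the sign of the height/weight correlation. Values are deviations from the average.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit(points):
    """Per-feature (mean, variance) for one class; correlations are ignored."""
    n = len(points)
    stats = []
    for d in range(len(points[0])):
        mu = sum(p[d] for p in points) / n
        var = sum((p[d] - mu) ** 2 for p in points) / n
        stats.append((mu, var))
    return stats


def likelihood(x, stats):
    """Naive Bayes likelihood: product of independent per-feature densities."""
    result = 1.0
    for xi, (mu, var) in zip(x, stats):
        result *= gaussian_pdf(xi, mu, var)
    return result


# Hypothetical (height, weight) deviations.
same_sign = [(-1, -1), (1, 1), (-2, -2), (2, 2)]      # height up, weight up
opposite_sign = [(-1, 1), (1, -1), (-2, 2), (2, -2)]  # height up, weight down
stats_a, stats_b = fit(same_sign), fit(opposite_sign)
print(stats_a == stats_b)  # True: identical per-feature statistics
x = (2, 2)                 # clearly looks like the first class
print(likelihood(x, stats_a), likelihood(x, stats_b))  # equal likelihoods
```

Both classes have the same per-feature means and variances, so the classifier cannot tell them apart even though the point (2, 2) obviously follows the first pattern.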

