Data And Web Mining at Instituto Politécnico De Coimbra | Flashcards & Summaries

# Study materials for Data and web mining at the Instituto Politécnico de Coimbra

Access free flashcards, summaries, practice exercises and past exams for your Data and web mining course at the Instituto Politécnico de Coimbra.

## Sample flashcards for your Data and web mining course at the Instituto Politécnico de Coimbra, created by fellow students on StudySmarter

Q:

How to calculate the principal axis (PCA)

A:
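• Calculate the mean of each feature
• Calculate the covariance matrix A, where each entry is
• cov(x, y) = 1/n SUM( (x - x_mean)*(y - y_mean) )
• Solve det(A - Lambda*I) = 0
• This gives the eigenvalues Lambda
• For each Lambda, go back to the covariance matrix and set A*v = Lambda*v, with v = (v1, v2, ...)
• First row of A times v = Lambda*v1
• Second row of A times v = Lambda*v2
• etc.
• This gives a relationship between v1, v2, etc. for each Lambda
• Normalise each solution: divide it by Sqrt(a^2 + b^2 + ...), where a, b, ... are its components
• The normalised vectors are the principal axes; the one with the largest Lambda is the first principal axis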
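
The steps above can be sketched for the two-feature case, where the characteristic equation is a quadratic and can be solved by hand. This is a minimal illustration, not a general PCA routine: the function name `principal_axes_2d` and the sample points are assumptions made for the example, and it uses the 1/n covariance from the card.

```python
import math

def principal_axes_2d(points):
    """Return [(eigenvalue, unit_vector), ...] sorted by decreasing eigenvalue."""
    n = len(points)
    # Step 1: mean of each feature.
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Step 2: entries of the 2x2 covariance matrix (1/n convention).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Step 3: det(A - lambda*I) = lambda^2 - trace*lambda + det = 0.
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    disc = math.sqrt(max(trace ** 2 - 4 * det, 0.0))
    eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]
    axes = []
    for i, lam in enumerate(eigenvalues):
        if abs(sxy) > 1e-12:
            # First row of (A - lambda*I) v = 0: (sxx - lam)*a + sxy*b = 0.
            a, b = sxy, lam - sxx
        else:
            # Diagonal covariance: the principal axes are the coordinate axes.
            unit = [(1.0, 0.0), (0.0, 1.0)]
            a, b = unit[i] if sxx >= syy else unit[1 - i]
        # Step 4: normalise by sqrt(a^2 + b^2).
        norm = math.sqrt(a * a + b * b)
        axes.append((lam, (a / norm, b / norm)))
    return axes

# Hypothetical sample: three points on the line y = x.
for eigenvalue, axis in principal_axes_2d([(1, 1), (2, 2), (3, 3)]):
    print(eigenvalue, axis)
```

For points that lie exactly on a line, all of the variance falls on the first axis and the second eigenvalue is zero.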
Q:

ROC - explanation

A:

A plot of the true positive rate (TPR) against the false positive rate (FPR), with one point per classification threshold
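
As a rough sketch of how such a curve is obtained, the snippet below sweeps a threshold over classifier scores and records one (FPR, TPR) point per threshold. The function names `roc_points` and `auc` and the labels and scores are made up for the illustration; it assumes binary labels (1 = positive, 0 = negative) with at least one of each.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier output.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```

A curve that hugs the top-left corner (high TPR at low FPR) indicates a better classifier; the diagonal corresponds to random guessing.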

Q:

ID3, when to use?

A:
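• ID3 fits datasets with categorical features and no missing values; it does not prune the tree
• The properties below belong to its successor C4.5, which extends ID3:
• Accepts real-valued and missing features
• Uses a pruning mechanism to reduce tree size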

Q:

Principal component analysis - when?

A:

When we want to visualize high-dimensional data

When we want to work with fewer dimensions

Q:

Give examples of different types of partitioning clustering

A:

k-means

k-medoids

CLARANS
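
To make the partitioning idea concrete, here is a minimal k-means sketch. It assumes the caller supplies the initial centroids and a fixed number of iterations; the function name `k_means` and the sample points are hypothetical, and real implementations choose the starting centroids and the stopping rule more carefully.

```python
def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 8)])
print(centroids)
print(clusters)
```

k-medoids follows the same assign-and-update loop but restricts each cluster centre to be one of the actual data points.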

Q:

Give examples of different types of GRID-based clustering

A:

STING

WaveCluster

CLIQUE

Q:

What is the curse of dimensionality?

A:

As the number of features/dimensions grows, the amount of data we need in order to generalize accurately grows exponentially
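
One common way to picture this growth, offered here as an illustration rather than a formal definition, is to split every feature into a fixed number of bins and count the grid cells that would each need at least one sample. The function name `cells_needed` and the choice of 10 bins per feature are assumptions for the example.

```python
def cells_needed(bins_per_feature, n_features):
    """Number of grid cells when every feature is split into the same number of bins."""
    return bins_per_feature ** n_features

# With 10 bins per feature, covering every cell needs 10^d samples.
for n_features in (1, 2, 3, 10):
    print(n_features, cells_needed(10, n_features))
```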

Q:

What are the reasons for overfitting?

A:
• data contains noise
• not enough data
• model is too complex
Q:

Requirements for a good clustering model

A:

-> Scalability: the model can be used with more data than the sample data

-> Ability to handle different types of attributes (binary, numerical, categorical)

-> Interpretability and usability

Q:

Discuss ensemble learning

A:

Ensemble learning is a machine learning paradigm in which multiple models (often called “weak learners”) are trained to solve the same problem and then combined to get better results. The main hypothesis is that when weak models are combined correctly, we can obtain more accurate and/or more robust models.
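
A small calculation can illustrate the hypothesis. The sketch below computes how often a majority vote is correct, under the strong assumption that the models make independent errors and each has the same accuracy; the function name `majority_vote_accuracy` and the 70% figure are chosen only for the example.

```python
from math import comb

def majority_vote_accuracy(n_models, p_correct):
    """Probability that more than half of n independent models are correct.

    Assumes an odd number of models and a binary problem, so ties cannot occur.
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p_correct ** k * (1 - p_correct) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )

# Three independent models, each right 70% of the time.
print(majority_vote_accuracy(1, 0.7))
print(majority_vote_accuracy(3, 0.7))
```

In practice the models are trained on the same or overlapping data, so their errors are correlated and the gain is smaller than this idealised figure suggests.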
Q:

Pros of ensemble learning

A:

Better accuracy

More consistent predictions

Reduces bias and/or variance

Q:

What are the reasons for underfitting?

A:
• data is not clean
• model has high bias
• small amount of data
• model is too simple
