Data And Web Mining at Instituto Politécnico De Coimbra | Flashcards & Summaries

Study materials for Data and web mining at Instituto Politécnico de Coimbra



Sample flashcards for the Data and web mining course at Instituto Politécnico de Coimbra, created by fellow students on StudySmarter.

Q:


How to calculate the principal axes (PCA)?


A:
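  • Calculate the mean of each feature
  • Calculate the covariance matrix A
    • cov(x, y) = 1/n SUM( (x - x_mean)*(y - y_mean) )
  • Set up the characteristic equation det(A - lambda*I) = 0
  • Solve it for the eigenvalues lambda
  • For each lambda, go back to the covariance matrix and write out A*v = lambda*v for the unknown eigenvector v = (v1, v2, ...)
    • First row of A times v = lambda*v1
    • Second row of A times v = lambda*v2
    • etc.
  • Solve these equations for the relationship between v1, v2, etc.; repeat for every lambda
  • Normalise each eigenvector: divide it by sqrt(v1^2 + v2^2 + ...)
  • The normalised eigenvectors are the principal axes; the one with the largest lambda is the first principal axis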
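
The hand calculation above can be sketched in code for the two-feature case, where the characteristic equation is a quadratic with a closed-form solution. This is a minimal illustration, not a general PCA routine: the function name `principal_axes_2d` and the sample points are made up for the example, and the second axis is taken as the perpendicular of the first, which is valid because the covariance matrix is symmetric.

```python
import math

def principal_axes_2d(points):
    """Return [(eigenvalue, unit eigenvector), ...], largest eigenvalue first."""
    n = len(points)
    # Step 1: mean per feature
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Step 2: covariance matrix [[sxx, sxy], [sxy, syy]] with the 1/n convention
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Step 3: det(A - lambda*I) = 0 is a quadratic in lambda for a 2x2 matrix
    trace = sxx + syy
    disc = math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)
    lam1 = (trace + disc) / 2
    lam2 = (trace - disc) / 2
    # Step 4: first row of (A - lambda*I) v = 0 reads (sxx - lam1)*vx + sxy*vy = 0
    if abs(sxy) > 1e-12:
        vx, vy = sxy, lam1 - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0   # uncorrelated features: axes are the coordinate axes
    else:
        vx, vy = 0.0, 1.0
    # Step 5: normalise to unit length
    norm = math.hypot(vx, vy)
    v1 = (vx / norm, vy / norm)
    v2 = (-v1[1], v1[0])    # perpendicular to v1 (symmetric matrix)
    return [(lam1, v1), (lam2, v2)]

# Made-up points lying on the line y = x
axes = principal_axes_2d([(1, 1), (2, 2), (3, 3)])
```

For these points the first axis points along the diagonal, roughly (0.707, 0.707), and the second eigenvalue is zero because the data has no spread perpendicular to that line.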
Q:


ROC - explanation


A:

A plot of the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis, with one point for each classification threshold
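
As a rough illustration of how the points of an ROC curve arise, the sketch below sweeps a decision threshold over classifier scores and records the FPR and TPR at each step. The function name `roc_points`, the labels and the scores are all invented for the example; it assumes labels are 0 or 1 and that both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per threshold, from the strictest threshold down."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # everything scoring at or above the threshold is predicted positive
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Made-up ground truth (1 = positive) and classifier scores
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
```

The curve starts at (0, 0) and ends at (1, 1); a classifier that ranks positives above negatives more often bends the curve towards the top-left corner.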

Q:


ID3, when to use?


A:
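  • ID3 itself builds a classification tree from categorical features, choosing at each node the split with the highest information gain; it has no pruning and no handling of missing values
  • Its extension C4.5 accepts real-valued and missing features
  • C4.5 also uses a pruning mechanism to reduce tree size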
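
For context, ID3 picks the attribute to split on by information gain, that is, the reduction in label entropy after the split. The sketch below shows that calculation for categorical features only; the function names and the tiny weather-style data set are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on one categorical feature (rows are dicts)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # weighted average entropy of the subsets produced by the split
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Made-up data: "outlook" separates the classes perfectly, "windy" not at all
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
best_feature = max(rows[0], key=lambda f: information_gain(rows, labels, f))
```

Here `best_feature` is "outlook", so an ID3-style tree would split on it first.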


Q:


Principal component analysis - when?


A:


When we want to visualize high-dimensional data

When we want to work with fewer dimensions (dimensionality reduction)


Q:

Give examples of different types of partitioning clustering

A:

k-means

k-medoids

CLARANS
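
To make the idea of partitioning clustering concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on 2-D points. It assumes the caller supplies the starting centroids and runs a fixed number of iterations instead of testing for convergence; the function names and sample points are invented for the example.

```python
def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # assignment step: each point joins its nearest centroid
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two made-up, well-separated groups of points
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
```

k-medoids follows the same assign-and-update pattern but restricts each cluster centre to be one of the actual data points.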

Q:

Give examples of different types of grid-based clustering

A:

STING

WaveCluster

CLIQUE

Q:

What is the curse of dimensionality?

A:

As the number of features/dimensions grows, the amount of data we need in order to generalize accurately grows exponentially
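
One simple way to see the exponential growth: if every feature is split into a fixed number of bins, the number of cells that the data would have to populate is that number raised to the power of the dimension. The sketch below just counts those cells; the function name and the choice of 10 bins per axis are arbitrary assumptions for illustration.

```python
def cells_to_cover(bins_per_axis, dimensions):
    """Number of grid cells when each of `dimensions` features is cut into `bins_per_axis` bins."""
    return bins_per_axis ** dimensions

# With 10 bins per feature, the grid grows by a factor of 10 per added dimension
for d in (1, 2, 3, 10):
    print(d, cells_to_cover(10, d))
```

Placing even a single sample in every cell therefore needs ten times more data for each extra feature under this binning.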

Q:

What are the reasons for overfitting?

A:
  • data contains noise
  • not enough data
  • model is too complex
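
The three reasons can be seen together in a small sketch: six noisy samples of the relation y = x are fitted once with a polynomial that passes through every training point (too complex for so little data, so it memorises the noise) and once with a straight line. The noise values, the test point and the function names are made up for illustration.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the degree len(xs)-1 polynomial that passes through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Least-squares slope and intercept of a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

train_x = [0, 1, 2, 3, 4, 5]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]            # invented noise values
train_y = [x + e for x, e in zip(train_x, noise)]    # underlying relation: y = x

slope, intercept = fit_line(train_x, train_y)
test_x = 0.5                                          # unseen point; true value is 0.5
complex_error = abs(lagrange_predict(train_x, train_y, test_x) - test_x)
simple_error = abs(slope * test_x + intercept - test_x)
```

The interpolating polynomial has zero error on the training points but a much larger error than the line at the unseen point, which is the signature of overfitting.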
Q:

Requirements for a good clustering model

A:

  • Scalability: the model still works on much more data than the sample it was developed on
  • Ability to handle different types of attributes (binary, numerical, categorical)
  • Interpretability and usability

Q:

Discuss ensemble learning

A:


Ensemble learning is a machine learning paradigm in which multiple models (often called “weak learners”) are trained to solve the same problem and combined to get better results. The main hypothesis is that when weak models are correctly combined, we can obtain more accurate and/or more robust models.
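
A back-of-the-envelope way to see why combining weak models can help is majority voting. The sketch below computes the accuracy of a majority vote under the strong, idealised assumption that the classifiers make their errors independently and all share the same individual accuracy; the function name and the numbers are illustrative only.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that more than half of n independent classifiers are right.

    Assumes each classifier is correct with probability p, errors are
    independent, and n is odd so that ties cannot occur.
    """
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))

single = 0.7
ensemble_of_3 = majority_vote_accuracy(single, 3)   # 0.7**3 + 3 * 0.7**2 * 0.3
```

Three such classifiers reach about 0.784 accuracy together, compared with 0.7 alone. Real models are rarely independent, so the gain in practice is usually smaller.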
Q:

Pros of ensemble learning

A:

Better accuracy

More consistent predictions

Reduces bias

Q:

What are the reasons for underfitting?

A:
  • data is not clean
  • model has high bias
  • small amount of data
  • model is too simple
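
The "model is too simple" case can be sketched by fitting a straight line to data that follows a parabola. The data points and the function name are made up for illustration; the point is that the error stays high even on the training data itself.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]            # quadratic relation that a line cannot capture

slope, intercept = fit_line(xs, ys)
training_mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

The best line here is flat (slope 0, intercept 2) and leaves a training mean squared error of 2.8, whereas a quadratic model would fit these points exactly.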