Data Mining at the University of Mainz | Flashcards & Summaries

Study materials for Data Mining at the University of Mainz

Sample flashcards for the Data Mining course at the University of Mainz, created by fellow students on StudySmarter.

Q:

What is a clustering?

A:

A set of clusters.

It is the output of cluster analysis.

Q:

What are possible requirements for clustering in data mining?

A:
  • Scalability: should be feasible on large datasets
  • Different types of attributes: boolean, real, ...
  • Dynamically changing data: for example, the distribution may drift over time
  • Clusters of arbitrary shape
  • Minimal requirements for domain knowledge to determine input parameters: we possibly don't know anything about the domain
  • Handles noise and outliers well
  • Insensitive to the order of input records
  • High dimensionality
  • User-specified constraints
  • Interpretability and usability
Q:

What types of clusterings are there?

A:
  • Exclusive vs. overlapping: each instance belongs to exactly one cluster vs. possibly several
  • Categorical vs. probabilistic: each instance either belongs to a cluster or not vs. each instance has a membership probability for every cluster
  • Hierarchical vs. flat: the clusters form a hierarchy (like a tree) vs. no hierarchy
  • Online vs. batch: a stream of data has to be handled online, one newly received instance at a time, vs. the algorithm has access to all instances at once
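
The difference between a categorical and a probabilistic assignment can be illustrated with a minimal sketch. The cluster centers, the Gaussian-shaped weighting, and the function names are assumptions made for illustration only; they do not come from the course material.

```python
import math

def soft_assignment(x, centers, width=1.0):
    """Probabilistic clustering: one membership probability per cluster.

    Assumption: weights fall off like a Gaussian around each center.
    """
    weights = [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]

def hard_assignment(x, centers):
    """Categorical, exclusive clustering: index of the single nearest center."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

centers = [0.0, 4.0]                    # hypothetical cluster centers
probs = soft_assignment(1.0, centers)   # one probability per cluster, summing to 1
label = hard_assignment(1.0, centers)   # exactly one cluster index
```

The soft version keeps the information that the instance is close to the first center but not exclusively owned by it; the hard version discards that information.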


Q:

Why shouldn't you confuse clusters and classes in labeled data?

A:

There may be several clusters for one class.

Q:

How can k be evaluated?

A:

Cross-validation.

A big advantage over non-probabilistic clustering: the likelihood can be used to compare clusterings.
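
As a rough sketch of how cross-validation and likelihood can be combined, the code below fits a one-dimensional Gaussian mixture with EM on training folds and sums the log-likelihood of the held-out folds for several values of k. The choice of a Gaussian mixture, the reading of k as the number of mixture components, the toy data, the deterministic initialisation, and all function names are assumptions for illustration, not the course's method.

```python
import math

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm(data, k, iters=50):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    srt = sorted(data)
    # Deterministic initialisation: spread the means over the sorted data.
    means = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p)
            resp.append([pi / s for pi in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-3)  # floor to avoid degenerate components
    return weights, means, variances

def log_likelihood(data, model):
    weights, means, variances = model
    return sum(
        math.log(sum(w * gauss_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def cv_log_likelihood(data, k, folds=4):
    """Sum of held-out log-likelihoods over all folds."""
    total = 0.0
    for f in range(folds):
        train = [x for i, x in enumerate(data) if i % folds != f]
        held_out = [x for i, x in enumerate(data) if i % folds == f]
        total += log_likelihood(held_out, fit_gmm(train, k))
    return total

# Toy data (assumption): two well-separated groups around 0 and 10.
data = [i * 0.25 for i in range(-8, 9)] + [10 + i * 0.25 for i in range(-8, 9)]
scores = {k: cv_log_likelihood(data, k) for k in (1, 2, 3)}
```

A higher held-out log-likelihood indicates a better fit to unseen data, so the values in `scores` can be compared directly across k. A distance-based method such as k-means has no comparable likelihood to evaluate on held-out data.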

Q:

How can nominal attributes be handled?

A:

Store a discrete probability table for each attribute.

If the attributes are correlated, a joint table is needed, and it grows exponentially in the number of attributes.
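
The growth can be made concrete by counting table entries. This is a minimal sketch; the attribute counts and function names are hypothetical, and the numbers are per cluster.

```python
from math import prod

def independent_table_size(cardinalities):
    """Entries needed when each nominal attribute has its own probability table."""
    return sum(cardinalities)

def joint_table_size(cardinalities):
    """Entries needed for one joint table over all (correlated) attributes."""
    return prod(cardinalities)

cards = [3] * 10  # hypothetical: ten attributes with three values each
independent_entries = independent_table_size(cards)  # 3 + 3 + ... = 30
joint_entries = joint_table_size(cards)              # 3 ** 10 = 59049
```

With independent attributes the table size grows linearly in the number of attributes; with a joint table each additional attribute multiplies the size by its number of values.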

Q:

How are canonization and graph isomorphism related?

A:

If we can solve canonization, we can solve isomorphism by testing whether the canonical forms of the two graphs are the same.
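
The reduction can be sketched with a brute-force canonical form for small undirected graphs: try every relabeling of the vertices and keep the lexicographically smallest edge list. The graph encoding and the function names are assumptions for illustration, and the approach takes factorial time, so it is only usable for tiny graphs.

```python
from itertools import permutations

def canonical_form(n, edges):
    """Smallest sorted edge tuple over all relabelings of vertices 0..n-1."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(
            tuple(sorted((perm[u], perm[v]))) for u, v in edges
        ))
        if best is None or relabeled < best:
            best = relabeled
    return best

def isomorphic(n1, edges1, n2, edges2):
    """Two graphs are isomorphic iff their canonical forms are equal."""
    return n1 == n2 and canonical_form(n1, edges1) == canonical_form(n2, edges2)

path_a = [(0, 1), (1, 2)]
path_b = [(0, 2), (2, 1)]            # the same path with different vertex labels
triangle = [(0, 1), (1, 2), (0, 2)]

same = isomorphic(3, path_a, 3, path_b)         # True
different = isomorphic(3, path_a, 3, triangle)  # False
```

Once canonical forms are available, the isomorphism test itself is a plain equality check; all the difficulty sits in computing the canonical form.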

Q:

What is Data Mining?

A:
  • Knowledge Discovery in Databases (KDD) (Fayyad 96): “KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.”
  • Data Mining: the data analysis step within the KDD process
Q:

What is Machine Learning?

A:

Improve at a task T with respect to a performance measure P, based on experience E.

Example (checkers): T is playing checkers, P is the number of games won, and E is playing against oneself.

Q:

What is descriptive vs. predictive pattern mining?

A:

Descriptive: e.g. clustering

Predictive: e.g. classification

Q:

Theoretical formulation of pattern mining

A:

A language of patterns L

A database D

An interestingness predicate q(p, D), which is 1 if the pattern p ∈ L is interesting with respect to D and 0 otherwise
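
One possible instance of this formulation is frequent itemset mining, sketched below. Here L is the set of all non-empty itemsets over three items, D is a list of transactions, and q declares a pattern interesting when it occurs in at least `min_support` transactions. The items, the threshold, and the function name `mine` are assumptions chosen for illustration.

```python
from itertools import combinations

def q(pattern, database, min_support=2):
    """Interestingness predicate: 1 if the pattern is frequent in D, else 0."""
    support = sum(1 for transaction in database if pattern <= transaction)
    return 1 if support >= min_support else 0

def mine(language, database):
    """Return every pattern p in L with q(p, D) = 1."""
    return [p for p in language if q(p, database) == 1]

items = ["a", "b", "c"]
# L: all non-empty itemsets over the items (hypothetical pattern language)
L = [frozenset(c)
     for r in range(1, len(items) + 1)
     for c in combinations(items, r)]
# D: a small transaction database (hypothetical)
D = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]

result = mine(L, D)
```

Enumerating all of L is only feasible for tiny languages; the sketch is meant to show the roles of L, D, and q, not an efficient search.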

Q:

What is graph mining?

A:

Pattern mining on graphs.

Given a graph database D, find all subgraphs (patterns) that occur with frequency >= f.
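
A heavily simplified sketch of this task follows. It restricts the patterns to single edges identified by their two vertex labels, and it interprets frequency as the number of graphs in D that contain the pattern. Both choices, along with the toy database and the function name, are assumptions; real graph mining also searches larger subgraphs.

```python
def frequent_edge_patterns(graph_db, f):
    """Return all single-edge patterns contained in at least f graphs.

    Each graph is a pair (labels, edges): labels maps a vertex id to its label,
    edges is a list of undirected (u, v) pairs.
    """
    counts = {}
    for labels, edges in graph_db:
        # Count each pattern once per graph, however often it occurs inside it.
        seen = {tuple(sorted((labels[u], labels[v]))) for u, v in edges}
        for pattern in seen:
            counts[pattern] = counts.get(pattern, 0) + 1
    return {pattern for pattern, count in counts.items() if count >= f}

# Hypothetical database of three small labeled graphs.
D = [
    ({0: "C", 1: "O", 2: "H"}, [(0, 1), (0, 2)]),
    ({0: "C", 1: "O"}, [(0, 1)]),
    ({0: "C", 1: "H", 2: "N"}, [(0, 1), (0, 2)]),
]

result = frequent_edge_patterns(D, f=2)
```

Extending the patterns beyond single edges requires a subgraph containment test for every candidate, which is where most of the cost of graph mining comes from.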
