Data Mining at the University of Mainz | Flashcards & Summaries

# Study materials for Data Mining at the University of Mainz

Access free flashcards, summaries, practice exercises, and past exams for your Data Mining course at the University of Mainz.


## Sample flashcards for your Data Mining course at the University of Mainz, created by fellow students on StudySmarter

Q:

What is a clustering?

A:

A clustering is a set of clusters: the output of a cluster analysis.

Q:

What are possible requirements of Clustering in Data Mining?

A:
• Scalability: must remain feasible on large datasets
• Different types of attributes: Boolean, real-valued, etc.
• Dynamically changing data: e.g. the data distribution may drift over time
• Clusters of arbitrary shape
• Minimal domain knowledge needed to set input parameters: we may know nothing about the domain
• Robust handling of noise and outliers
• Insensitivity to the order of input records
• High dimensionality
• User-specified constraints
• Interpretability and usability
Q:

What types of clusterings are there?

A:
• Exclusive vs. overlapping: each instance belongs to exactly one cluster vs. possibly several
• Categorical vs. probabilistic: each instance either belongs to a cluster or does not vs. each instance has a membership probability for every cluster
• Hierarchical vs. flat: the clusters form a hierarchy (like a tree) vs. they do not
• Online vs. batch: a stream of data has to be handled as each new instance arrives vs. the algorithm has access to all instances at once
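
The categorical vs. probabilistic distinction can be illustrated with a minimal sketch. It assumes one-dimensional data, hypothetical cluster centres, and equal-weight unit-variance Gaussian clusters for the probabilistic case; none of these choices come from the flashcards.

```python
import math

def hard_assignment(x, centers):
    """Categorical: index of the single nearest cluster centre."""
    return min(range(len(centers)), key=lambda c: abs(x - centers[c]))

def soft_assignment(x, centers):
    """Probabilistic: one membership probability per cluster.

    Assumption: equal-weight, unit-variance Gaussian clusters.
    """
    weights = [math.exp(-0.5 * (x - mu) ** 2) for mu in centers]
    total = sum(weights)
    return [w / total for w in weights]

centers = [0.0, 3.0]  # hypothetical 1-D cluster centres
print(hard_assignment(1.0, centers))   # index of the closest centre
print(soft_assignment(1.0, centers))   # probabilities that sum to 1
```

The hard assignment discards how close the instance is to the other cluster, while the soft assignment keeps that information as a probability.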

Q:

Why shouldn't you confuse clusters and classes in labeled data?

A:

There may be several clusters for one class.

Q:

How to evaluate k?

A:

Cross-validation.

A big advantage of probabilistic over non-probabilistic clustering: the likelihood can be used to compare clusterings, for example the likelihood of held-out data for different values of k.
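
One way to read this card is to pick k by the cross-validated log-likelihood of held-out data. The sketch below assumes a one-dimensional Gaussian mixture fitted by EM on synthetic data; the function names, the initialisation, and the data are illustrative assumptions, not part of the course material.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def fit_mixture(data, k, iterations=50):
    """Fit a 1-D Gaussian mixture with k components by EM."""
    ordered = sorted(data)
    # Deterministic initialisation: spread the means over the data quantiles.
    means = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            row = [w * gaussian_pdf(x, m, v)
                   for w, m, v in zip(weights, means, variances)]
            total = sum(row)
            resp.append([r / total for r in row])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids degenerate components
    return weights, means, variances

def log_likelihood(data, model):
    weights, means, variances = model
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def cross_validated_log_likelihood(data, k, folds=5):
    """Average held-out log-likelihood per instance over all folds."""
    total = 0.0
    for f in range(folds):
        test = data[f::folds]
        train = [x for i, x in enumerate(data) if i % folds != f]
        total += log_likelihood(test, fit_mixture(train, k))
    return total / len(data)

random.seed(0)
# Synthetic data with two well-separated groups (an assumption for the demo).
data = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(8, 1) for _ in range(100)]
random.shuffle(data)
scores = {k: cross_validated_log_likelihood(data, k) for k in (1, 2, 3)}
for k, score in scores.items():
    print(k, round(score, 3))
```

On data like this, the held-out likelihood should rise sharply from k = 1 to k = 2 and change little beyond that, which is the signal used to choose k.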

Q:

How to handle nominal attributes?

A:

Store a discrete probability table for each nominal attribute.

If the attributes are correlated, a joint table over all of them is needed, and it grows exponentially in the number of attributes.
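
The growth can be made concrete by counting table entries. This is a small sketch assuming every attribute has the same number of values; the helper names are hypothetical.

```python
from math import prod

def independent_table_size(values_per_attribute):
    """Entries per cluster when each attribute has its own table."""
    return sum(values_per_attribute)

def joint_table_size(values_per_attribute):
    """Entries per cluster for one joint table over all attributes."""
    return prod(values_per_attribute)

for n in (2, 5, 10, 20):
    sizes = [3] * n  # assumption: n attributes with 3 values each
    print(n, independent_table_size(sizes), joint_table_size(sizes))
```

With separate tables the size grows linearly (3n entries), whereas the joint table needs 3^n entries.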

Q:

How are canonization and graph isomorphism related?

A:

If we can solve canonization, we can solve isomorphism: compute the canonical form of both graphs and test whether the two forms are identical.
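
The reduction can be sketched with a brute-force canonical form. The code below assumes simple undirected graphs on vertices 0..n-1 and tries every relabelling, so it is only usable for very small graphs; it illustrates the idea and is not an efficient canonization algorithm.

```python
from itertools import permutations

def canonical_form(n, edges):
    """Lexicographically smallest sorted edge list over all vertex relabellings."""
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(
            tuple(sorted((perm[u], perm[v]))) for u, v in edges
        ))
        if best is None or relabelled < best:
            best = relabelled
    return best

def are_isomorphic(n1, edges1, n2, edges2):
    """Isomorphism test via comparison of canonical forms."""
    return n1 == n2 and canonical_form(n1, edges1) == canonical_form(n2, edges2)

path_a = [(0, 1), (1, 2), (2, 3)]
path_b = [(2, 0), (0, 3), (3, 1)]  # the same path with different labels
star = [(0, 1), (0, 2), (0, 3)]
print(are_isomorphic(4, path_a, 4, path_b))
print(are_isomorphic(4, path_a, 4, star))
```

Two isomorphic graphs generate the same set of relabelled edge lists, so their minima coincide; non-isomorphic graphs never share a relabelling.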

Q:

What is Data Mining?

A:
• Knowledge Discovery in Databases (KDD) (Fayyad 96): “KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.”
• Data Mining: the data analysis step within the KDD process
Q:

What is Machine Learning?

A:

Improve on a task T with respect to a performance measure P, based on experience E.

Example: T = playing checkers, P = games won, E = playing against oneself.

Q:

What is descriptive or predictive pattern mining?

A:

Descriptive: e.g. clustering

Predictive: e.g. classification

Q:

Theoretical formulation of pattern mining

A:

A language of patterns L

A database D

An interestingness predicate q(p, D), which is 1 if the pattern p \in L is interesting with respect to D and 0 otherwise
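
The three ingredients can be sketched in code. The example assumes itemsets as the pattern language and a minimum-support threshold as the interestingness predicate; both are illustrative choices, since the card leaves L and q abstract.

```python
from itertools import combinations

def pattern_language(items, max_size):
    """L: all non-empty itemsets over `items` with at most max_size items."""
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(items), size):
            yield frozenset(combo)

def make_frequency_predicate(min_support):
    """q(p, D) = 1 iff p is contained in at least min_support records of D."""
    def q(pattern, database):
        support = sum(1 for record in database if pattern <= record)
        return 1 if support >= min_support else 0
    return q

def mine(language, database, q):
    """All patterns p in L with q(p, D) = 1."""
    return [p for p in language if q(p, database) == 1]

# Hypothetical toy database D of four records.
database = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
q = make_frequency_predicate(min_support=2)
result = mine(pattern_language("abc", 2), database, q)
print([sorted(p) for p in result])
```

Swapping in a different `q` (or a different generator for L) changes the mining task without touching `mine`, which is the point of the abstract formulation.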

Q:

What is graph mining?

A:

Pattern mining on graphs.

Given a graph database D, find all subgraphs (patterns) that occur with frequency >= f.
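
The frequency test can be sketched for tiny graphs as follows. The code assumes unlabeled undirected graphs, takes frequency to mean the number of database graphs that contain the pattern, and checks a hand-picked list of candidate patterns by brute force; real graph miners also have to enumerate the candidates, which this sketch leaves out.

```python
from itertools import permutations

def graph(n, edge_list):
    """Represent a graph as (number of vertices, set of undirected edges)."""
    return n, {frozenset(e) for e in edge_list}

def occurs_in(pattern, host):
    """True if `pattern` is a subgraph of `host` (brute force)."""
    p_n, p_edges = pattern
    h_n, h_edges = host
    # Try every injective mapping of pattern vertices to host vertices.
    for mapping in permutations(range(h_n), p_n):
        if all(frozenset(mapping[v] for v in edge) in h_edges for edge in p_edges):
            return True
    return False

def frequent_patterns(candidates, database, f):
    """Names of candidate patterns contained in at least f database graphs."""
    result = []
    for name, pattern in candidates.items():
        frequency = sum(occurs_in(pattern, g) for g in database)
        if frequency >= f:
            result.append(name)
    return result

# Hypothetical toy database and candidate patterns.
database = [
    graph(3, [(0, 1), (1, 2), (0, 2)]),  # triangle
    graph(4, [(0, 1), (1, 2), (2, 3)]),  # path on 4 vertices
    graph(4, [(0, 1), (0, 2), (0, 3)]),  # star
]
candidates = {
    "edge": graph(2, [(0, 1)]),
    "path3": graph(3, [(0, 1), (1, 2)]),
    "triangle": graph(3, [(0, 1), (1, 2), (0, 2)]),
}
print(frequent_patterns(candidates, database, f=2))
```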
