Study materials for Data Mining at the University of Mainz


  • 250412 flashcards
  • 3829 students
  • 132 study materials

Sample flashcards for the Data Mining course at the University of Mainz, created by fellow students on StudySmarter.

Q:

What types of clusterings are there?

A:
  • exclusive vs. overlapping: each instance belongs to exactly one cluster vs. possibly several
  • categorical vs. probabilistic: each instance either belongs to a cluster or not vs. each instance has a membership probability for every cluster
  • hierarchical vs. flat: the clusters form a hierarchy (like a tree) vs. a single level of clusters
  • online vs. batch: instances arrive as a stream and must be handled one at a time vs. the algorithm has access to all instances at once


Q:

What is the difference between coverage and subsumption?

A:

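Coverage relates a pattern to the data: it is what is checked during the database scan (does the pattern match an instance?). Subsumption relates patterns to each other (is one pattern more general than another?).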
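
To make the distinction concrete, here is a minimal sketch for the itemset case. It assumes that patterns are itemsets and that "more general" means "is a subset of"; the function names `covers` and `subsumes` and the toy database are illustrative only.

```python
def covers(pattern, transaction):
    """Coverage: does the pattern (an itemset) match this database instance?"""
    return set(pattern) <= set(transaction)


def subsumes(general, specific):
    """Subsumption: is `general` at least as general as `specific`?

    Assumption for itemsets: a subset is more general than its supersets.
    """
    return set(general) <= set(specific)


# Hypothetical toy database of transactions.
database = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}]

# Coverage is evaluated against the data (a scan over all instances).
support_ac = sum(covers({"a", "c"}, t) for t in database)

# Subsumption is evaluated between patterns, without touching the data.
ac_subsumes_abc = subsumes({"a", "c"}, {"a", "b", "c"})
abc_subsumes_ac = subsumes({"a", "b", "c"}, {"a", "c"})
```

Note that `covers` needs the database while `subsumes` only needs two patterns, which is the point of the card.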

Q:

How does candidate formation work for itemsets?

A:

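Join frequent itemsets that overlap, e.g. ab, bc -> abc.

However, if ac is infrequent, then abc is infrequent as well.

Thus candidates are eliminated by checking whether all of their subsets are frequent.

(Afterwards, the remaining candidates are checked against the database.)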
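
The join, elimination, and selection steps above can be sketched as follows. This is an illustrative version, not the course's reference implementation: the function names are made up, and the join simply unions any two frequent k-itemsets that differ in one item (classic Apriori usually joins on a shared prefix instead).

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Form (k+1)-item candidates from frequent k-itemsets, then prune."""
    frequent_k = {frozenset(s) for s in frequent_k}
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    # Join step: union two frequent k-itemsets that differ in exactly one item.
    joined = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1}
    # Elimination step: keep a candidate only if every k-subset is frequent.
    return {
        c for c in joined
        if all(frozenset(s) in frequent_k for s in combinations(c, k))
    }


def count_support(candidates, database):
    """Selection step: scan the database to count each surviving candidate."""
    return {c: sum(c <= t for t in database) for c in candidates}


# ab and bc join to abc, but ac is not frequent, so abc is eliminated.
pruned = generate_candidates([{"a", "b"}, {"b", "c"}])

# With ac frequent as well, abc survives and is counted against the data.
kept = generate_candidates([{"a", "b"}, {"b", "c"}, {"a", "c"}])
support = count_support(kept, [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c", "d"}])
```

The elimination step needs no database access, which is why it is done before the more expensive counting scan.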

Q:

How to handle nominal attributes?

A:

Store a discrete probability table for each attribute (one probability per value).

If the attributes are correlated, a joint table is needed, and it grows exponentially with the number of attributes.

Q:

How are canonization and graph isomorphism related?

A:

If we can solve canonization, we can solve isomorphism: two graphs are isomorphic exactly when their canonical forms are identical.
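
As a rough illustration of this reduction, the sketch below computes a canonical form by brute force and compares the results. It assumes simple undirected graphs on vertices 0..n-1 and picks the lexicographically smallest relabeled edge list as the canonical form; this choice and the function names are illustrative, and the approach is exponential in n, so it only works for tiny graphs.

```python
from itertools import permutations


def canonical_form(n, edges):
    """Brute-force canonical form of an undirected graph on vertices 0..n-1.

    Tries every vertex relabeling and keeps the lexicographically smallest
    sorted edge list.
    """
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(
            tuple(sorted((perm[u], perm[v]))) for u, v in edges
        ))
        if best is None or relabeled < best:
            best = relabeled
    return best


def isomorphic(n1, edges1, n2, edges2):
    """Isomorphism test via canonization: compare the canonical forms."""
    return n1 == n2 and canonical_form(n1, edges1) == canonical_form(n2, edges2)


# The same 3-vertex path under two different labelings.
same_path = isomorphic(3, [(0, 1), (1, 2)], 3, [(1, 0), (0, 2)])

# A 4-vertex path and a 4-vertex star both have 3 edges but differ in shape.
path_vs_star = isomorphic(4, [(0, 1), (1, 2), (2, 3)], 4, [(0, 1), (0, 2), (0, 3)])
```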

Q:

What is Data Mining?

A:
  • Knowledge Discovery in Databases (KDD) (Fayyad 96): “KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.”
  • Data Mining: the data analysis step within the KDD process
Q:

What is descriptive vs. predictive pattern mining?

A:

descriptive: e.g. clustering (describes structure in the data)

predictive: e.g. classification (predicts an unknown target value)

Q:

What are possible requirements of Clustering in Data Mining?

A:
  • scalability: should be feasible on large datasets
  • different types of attributes: boolean, real-valued, ...
  • dynamically changing data (e.g. distribution drift)
  • clusters of arbitrary shape
  • minimal requirements on domain knowledge to determine input parameters: we may not know anything about the domain
  • handles noise and outliers well
  • insensitive to the order of input records
  • high dimensionality
  • user-specified constraints
  • interpretability and usability
Q:

How to evaluate k (the number of clusters)?

A:

Cross-validation.

A big advantage over non-probabilistic clustering: the likelihood (e.g. of held-out data) can be used to compare clusterings.
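
A minimal sketch of this idea, assuming a one-dimensional Gaussian mixture fitted by EM as the probabilistic clustering. The toy data, the deterministic initialization, the variance floor, and all function names are illustrative assumptions, not part of the card.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k, iters=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture with k components by EM (toy version)."""
    xs = sorted(data)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, min_var)
    # Deterministic start: spread the initial means over the sorted data.
    means = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances


def log_likelihood(points, model):
    weights, means, variances = model
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in points
    )


def cv_log_likelihood(data, k, folds=3):
    """Average held-out log-likelihood per point over interleaved folds."""
    total = 0.0
    for f in range(folds):
        held_out = data[f::folds]
        train = [x for i, x in enumerate(data) if i % folds != f]
        total += log_likelihood(held_out, fit_gmm_1d(train, k))
    return total / len(data)


# Hypothetical data with two clear groups, around 0 and around 10.
data = [-0.5, 9.6, -0.2, 9.9, 0.1, 10.0, 0.3, 10.2, 0.4, 10.4, 0.0, 9.8]
scores = {k: cv_log_likelihood(data, k) for k in (1, 2)}
best_k = max(scores, key=scores.get)
```

Because every k is scored by the same quantity on data the model did not see, the scores are directly comparable; other values of k can be compared in the same way.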

Q:

Theoretical formulation of pattern mining

A:

A language of patterns L

A database D

An interestingness predicate q(p, D), which is 1 if the pattern p \in L is interesting with respect to D and 0 otherwise
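
With these three ingredients, the mining task is commonly stated as computing the set of all interesting patterns. The name Th for this set is an assumption taken from the usual "theory of D" notation, not from the card itself:

```latex
\mathrm{Th}(L, D, q) = \{\, p \in L \mid q(p, D) = 1 \,\}
```

For frequent itemset mining, for example, L is the set of all itemsets and q(p, D) = 1 exactly when the support of p in D reaches a chosen minimum support.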

Q:

What is graph mining?

A:

Pattern mining on graphs.

Given a graph database D, find all subgraphs (patterns) that occur with frequency >= f.

Q:

In what form do we want our data for Apriori on itemsets?

A:

As a table: columns are attributes, rows are instances.

Preprocessed: discretized, features selected, cleaned, sampled.
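
One common way to feed such a table to an itemset miner is to turn each row into a set of "attribute=value" items. The sketch below assumes this encoding; the column names, the single threshold used for discretization, and the function names are made up for illustration.

```python
def discretize(value, threshold):
    """Toy discretization: map a numeric value to one of two bins."""
    return "high" if value >= threshold else "low"


def table_to_transactions(rows, columns):
    """Turn each table row into a set of 'attribute=value' items."""
    return [
        {f"{col}={val}" for col, val in zip(columns, row)}
        for row in rows
    ]


# Hypothetical raw table: one row per instance, one column per attribute.
columns = ["outlook", "temperature"]
raw_rows = [("sunny", 30.5), ("rainy", 12.0), ("sunny", 18.0)]

# Preprocessing: discretize the numeric column before building itemsets.
rows = [(outlook, discretize(temp, 20.0)) for outlook, temp in raw_rows]
transactions = table_to_transactions(rows, columns)
```

Each transaction is now a plain set of items, so the usual support counting on itemsets applies unchanged.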
