Learning From User-generated Data at Johannes Kepler Universität Linz | Flashcards & Summaries

Q:

What are the differences between the SVD, the "Thin Variant" of the SVD, and the "Truncated" SVD?


A:

Full SVD:

The n × m rating matrix R (n users, m items) is factorized as R = U Σ Vᵀ, where U is (n × n), Σ is (n × m), and V is (m × m).


Thin SVD:

We can exploit the fact that R is (usually) not square, so its rank is at most
k = min(n, m)

As a result, U, Σ, and V have smaller dimensions than in the full SVD:
U -- (n × k) matrix of left singular vectors (corresponds to users);
Σ -- (k × k) square diagonal matrix containing the singular values;
V -- (m × k) matrix of right singular vectors (corresponds to items);

This is more efficient (it requires less memory) and also simplifies the presentation.


Truncated SVD:

Consider only the f largest singular triplets (a singular value with its left and right singular vectors), f < k ⇒ approximation of R and dimensionality reduction
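
The shapes involved can be checked with a minimal NumPy sketch that also computes the full decomposition for comparison. The toy matrix R and the choice f = 2 are assumptions made for illustration, not course data.

```python
import numpy as np

# Hypothetical toy rating matrix R: n = 4 users (rows), m = 3 items (columns).
R = np.array([
    [5.0, 3.0, 1.0],
    [4.0, 2.0, 1.0],
    [1.0, 1.0, 5.0],
    [1.0, 2.0, 4.0],
])
n, m = R.shape
k = min(n, m)

# Full SVD: U is (n x n) and V^T is (m x m); only k singular values exist.
U_full, s_full, Vt_full = np.linalg.svd(R, full_matrices=True)

# Thin SVD: U is (n x k), Sigma is (k x k), V is (m x k).
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_thin = U @ np.diag(s) @ Vt  # reconstructs R up to rounding error

# Truncated SVD: keep only the f largest singular triplets.
f = 2  # assumed number of factors
R_f = U[:, :f] @ np.diag(s[:f]) @ Vt[:f, :]  # rank-f approximation of R

print("full U:", U_full.shape, "thin U:", U.shape, "thin V^T:", Vt.shape)
print("thin SVD reconstructs R:", np.allclose(R, R_thin))
print("approximation error (Frobenius):", np.linalg.norm(R - R_f))
```

The thin variant loses nothing: it only drops the columns of U that would be multiplied by zero rows of Σ. The truncated variant does lose information, and the Frobenius error of the rank-f approximation comes from the singular values that were dropped.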


Q:

In real-world applications, which method scales better: user-based or item-based CF?

A:

Item-based CF

Q:

Why are recommender systems used?

A:
  1. Annotation in Context
  2. Find Good Items
  3. Find All Good Items
  4. Recommend Sequence
  5. Just Browsing
  6. Find Credible Recommender
  7. Improve Profile
  8. Express Self
  9. Help Others
  10. Influence Others
Q:

What's the difference between user-based and item-based CF?

A:

In user-based CF, the goal is to find similar users based on the items they have rated. Predictions are weighted combinations of the most similar users' ratings.

In item-based CF, we try to find similar items based on the ratings users gave them. Predictions are weighted combinations of the target user's ratings on the most similar items.
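
The two prediction schemes can be sketched side by side. This is a minimal illustration that assumes cosine similarity over co-rated entries, 0 as the marker for "not rated", and k = 2 neighbours; the toy matrix and all function names are hypothetical.

```python
import numpy as np

# Hypothetical toy ratings: rows = users, columns = items, 0 = not rated.
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 4.0, 1.0],
    [1.0, 1.0, 2.0, 5.0],
    [5.0, 5.0, 5.0, 0.0],
])

def cosine(a, b):
    """Cosine similarity restricted to positions rated in both vectors."""
    both = (a > 0) & (b > 0)
    if not both.any():
        return 0.0
    return float(a[both] @ b[both] / (np.linalg.norm(a[both]) * np.linalg.norm(b[both])))

def weighted_average(pairs, k):
    """Similarity-weighted mean of the ratings of the k most similar neighbours."""
    top = sorted(pairs, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    return sum(sim * r for sim, r in top) / total if total > 0 else float("nan")

def predict_user_based(R, u, i, k=2):
    """Neighbours are other users who rated item i; compare rows of R."""
    pairs = [(cosine(R[u], R[v]), R[v, i])
             for v in range(R.shape[0]) if v != u and R[v, i] > 0]
    return weighted_average(pairs, k)

def predict_item_based(R, u, i, k=2):
    """Neighbours are other items that user u rated; compare columns of R."""
    pairs = [(cosine(R[:, i], R[:, j]), R[u, j])
             for j in range(R.shape[1]) if j != i and R[u, j] > 0]
    return weighted_average(pairs, k)

print("user-based:", predict_user_based(R, u=0, i=2))
print("item-based:", predict_item_based(R, u=0, i=2))
```

The only structural difference is which axis of R the similarity is computed along: rows for user-based CF, columns for item-based CF.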

Q:

What is the main problem of CF?

A:

Missing Ratings

Data sparsity can lead to unreliable similarity values (or even make the calculation impossible).

Q:

What are some solutions to deal with missing ratings in CF? 

A:
  • default voting
  • augmentation
    • use transitive interactions between users and items
    • construct graph-based model
    • calculate association strength between users and items
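
Default voting can be illustrated with a short sketch: when two users are compared, items rated by only one of them receive an assumed default rating for the other, so the similarity is computed over the union of rated items instead of the (possibly empty) intersection. The default value of 2.0, the 0-means-unrated convention, and the function name are assumptions made for this example.

```python
import numpy as np

def cosine_default_voting(a, b, default=2.0):
    """Cosine similarity over the union of rated items (0 = not rated).

    An item rated by only one of the two users gets the assumed default
    vote for the other user.
    """
    union = (a > 0) | (b > 0)
    if not union.any():
        return 0.0
    a_filled = np.where(a > 0, a, default)[union]
    b_filled = np.where(b > 0, b, default)[union]
    return float(a_filled @ b_filled
                 / (np.linalg.norm(a_filled) * np.linalg.norm(b_filled)))

# Two hypothetical users without a single co-rated item.
u = np.array([5.0, 0.0, 4.0, 0.0])
v = np.array([0.0, 1.0, 0.0, 5.0])
sim = cosine_default_voting(u, v)
print("similarity with default voting:", sim)
```

Without the default vote, these two users share no rated item and a similarity over co-rated items could not be computed at all.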
Q:

What are the core issues of memory-based CF?

A:
  • Very memory-demanding
  • "lazy learning"; use of nearest neighbor approaches
  • data sparsity, “cold-start”, curse of dimensionality
Q:

Which performance measures are there for retrieval?

A:
  • Recall and Precision 
  • F-measure 
  • Precision at k documents (also Precision@k or P@k)
  • Average Precision (AP) 
  • R-precision 
  • Reciprocal Rank (RR) 
  • Mean Average Precision (MAP) 
  • Discounted Cumulative Gain (DCG) 
  • Rank Correlation 
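
Several of these measures can be written in a few lines each. The sketch below assumes binary relevance for precision, AP, and RR, normalizes AP by the total number of relevant items, and uses the log2 discount for DCG; other variants of these definitions exist. The ranked list and relevance set are made up for illustration.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0

def average_precision(ranked, relevant):
    """Sum of P@k at every rank k holding a relevant item, divided by |relevant|."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def dcg(gains):
    """Discounted cumulative gain: sum of gain_i / log2(i + 1), ranks from 1."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

# Hypothetical result list and ground truth ("f" is relevant but never retrieved).
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
gains = [1 if item in relevant else 0 for item in ranked]

print("P@3:", precision_at_k(ranked, relevant, 3))
print("AP:", average_precision(ranked, relevant))
print("RR:", reciprocal_rank(ranked, relevant))
print("DCG:", dcg(gains))
```

Mean Average Precision (MAP) is then the mean of `average_precision` over a set of queries or users.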
Q:

What are aspects of user-centric evaluation that should be considered?

A:
  • Similarity
    • items should match/be similar to the seed user’s taste
  • Diversity
    • recommended items should not be too similar/boring
  • Novelty / Familiarity 
    • has the user already seen the item
  • Serendipity
    • a user wants to discover something exciting, unexpected; hard to measure
  • Explainability
    • recommender system should explain why an item was recommended
Q:

What are latent factor models?

A:

A latent variable model is a statistical model that relates a set of observable variables to a set of latent variables.


These models try to explain ratings by characterizing both users and items in an f-dimensional space of factors derived from the rating patterns.


Users and items are not treated as plain identifiers: each user and each item is described by a vector of characteristics (factors), and these serve as features.
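
One common way to fit such a model is sketched below, assuming each rating is approximated by the dot product of an f-dimensional user vector and an f-dimensional item vector, trained with stochastic gradient descent on the known ratings only. The rating triples, f = 2, the learning rate, and the regularization strength are illustrative assumptions, not values from the course.

```python
import numpy as np

# Hypothetical observed ratings as (user, item, rating); all other entries are unknown.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]
n_users, n_items, f = 3, 3, 2

rng = np.random.default_rng(0)
P = 0.1 * rng.standard_normal((n_users, f))  # user factor vectors
Q = 0.1 * rng.standard_normal((n_items, f))  # item factor vectors

def rmse(P, Q):
    """Root mean squared error over the observed ratings only."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])))

rmse_before = rmse(P, Q)

lr, reg = 0.05, 0.02  # assumed learning rate and L2 regularization strength
for _ in range(500):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]   # prediction error on one known rating
        p_u = P[u].copy()       # keep the old user vector for the item update
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * p_u - reg * Q[i])

rmse_after = rmse(P, Q)
print("RMSE before:", rmse_before, "after:", rmse_after)
print("predicted rating for user 0, item 2:", P[0] @ Q[2])
```

Because the loop only visits known ratings, unknown entries never have to be filled in; a prediction for any unseen user-item pair is simply the dot product of the two learned vectors.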

Q:

What are some of the problems with user-generated content (UGC)?

A:
  • Double-counting: same UGC is sometimes accessible on a variety of sites (duplicate detection)

  • Inactive accounts: not all registered users are active

  • Counting unique users: same user has multiple accounts 

  • Distinction between UGC and other content: e.g., clips from TV shows uploaded by regular (non-professional) users

Q:

What are the problems of SVD?

A:

SVD only makes sense on a fully known user rating matrix (i.e., one with no missing values): if the gaps are filled in first, the model learns the filled-in values, which we do not want.

Decomposing full (dense) matrices is computationally expensive.
