Big Data at Universität Duisburg-Essen | Flashcards & Summaries


Study materials for Big Data at Universität Duisburg-Essen



Sample flashcards for the Big Data course at Universität Duisburg-Essen, created by fellow students on StudySmarter.

Q:

Which sectors generate Big Data? (6 points)

A:

- Social media

- User tracking & Engagement

- Learning / Education

- E-commerce

- Financial Services

- Web / Real-Time Search

Q:

Definition of big data

A:

• Big Data consists of datasets that are extensive primarily in volume, velocity, and/or variety, and that therefore require a scalable architecture for efficient storage, manipulation, and analysis

Q:

What are the 5 V's?

A:

- Volume (data at rest)

- Velocity (data in motion)

- Variety (data in many forms, e.g. structured, unstructured, text, etc.)

- Veracity (trustworthiness)

- Value (turn data into value)

Q:

How to handle small, medium and big data?

A:

- Small data (fits in main memory): a plain file such as a CSV file

- Medium data (fits on one machine but not in memory): a database system that manipulates the data on disk, e.g. via SQL queries

- Big data (does not fit on one machine): a distributed system, e.g. a distributed file system or NoSQL databases
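The first two tiers above can be illustrated with a minimal sketch using only Python's standard library. The sample data, the `purchases` table, and both function names are hypothetical and not part of the course material. The big-data tier is left out because it needs more than one machine.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; in practice this would be a file on disk.
CSV_TEXT = "customer,amount\nalice,10\nbob,5\nalice,7\n"


def total_in_memory(csv_text):
    """Small data: load every row into memory and aggregate in Python."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + int(row["amount"])
    return totals


def total_with_sql(csv_text, db_path=":memory:"):
    """Medium data: hand the aggregation to a database engine via SQL.

    With a file path instead of ":memory:", SQLite keeps the table on disk,
    so the data does not have to fit in main memory.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE purchases (customer TEXT, amount INTEGER)")
        reader = csv.DictReader(io.StringIO(csv_text))
        conn.executemany(
            "INSERT INTO purchases VALUES (?, ?)",
            ((r["customer"], int(r["amount"])) for r in reader),
        )
        cur = conn.execute(
            "SELECT customer, SUM(amount) FROM purchases GROUP BY customer"
        )
        return dict(cur.fetchall())
    finally:
        conn.close()
```

Both functions return the same totals per customer. The difference is where the work happens: in the Python process, or inside the database engine.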

Q:

What are the 2 approaches to scaling a storage system?

A:

Vertical scaling

- Enlarge a single machine

- Limited in space

- Expensive

Horizontal scaling

- Use many commodity machines and form computer clusters or grids

- Cluster maintenance is needed
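One common way to spread data over many machines is hash partitioning (sharding). The sketch below is only an illustration of horizontal scaling under that assumption. The function names and the key format are hypothetical, and the flashcards do not prescribe this scheme.

```python
import hashlib


def shard_for(key, num_machines):
    """Map a key to one of num_machines using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_machines


def distribute(keys, num_machines):
    """Return one list of keys per machine."""
    shards = [[] for _ in range(num_machines)]
    for key in keys:
        shards[shard_for(key, num_machines)].append(key)
    return shards


# Hypothetical keys: the same data set spread over 2 and then 8 machines.
keys = [f"user-{i}" for i in range(1000)]
on_two = distribute(keys, 2)
on_eight = distribute(keys, 8)
```

With more machines, each one holds a smaller share of the keys. Note that with plain modulo placement, changing the number of machines reassigns most keys, which is one reason cluster maintenance is needed.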

Q:

Different storage types in NoSQL (4 items)

A:

• Key-Value Store 

• Column Store 

• Document Store 

• Graph Store 
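As a rough illustration of the first two types, the sketch below holds the same hypothetical records once as key-value pairs and once in a simple column-oriented layout. It uses plain Python structures only. Real key-value and column stores differ in many details, and the names `kv_store`, `column_store`, `kv_get`, and `column_values` are made up for this example.

```python
import json

# Hypothetical records used for both layouts.
records = [
    {"id": "u1", "name": "Ada", "city": "Essen"},
    {"id": "u2", "name": "Bob", "city": "Duisburg"},
]

# Key-value layout: the store only understands the key; the value is opaque bytes.
kv_store = {r["id"]: json.dumps(r).encode("utf-8") for r in records}

# Column-oriented layout: one list per column, aligned by position.
column_store = {
    "id": [r["id"] for r in records],
    "name": [r["name"] for r in records],
    "city": [r["city"] for r in records],
}


def kv_get(key):
    """Fetch one whole record by key and decode it."""
    return json.loads(kv_store[key])


def column_values(column):
    """Read a single column across all records without touching the others."""
    return column_store[column]
```

The key-value form is convenient for fetching one whole record by its key, while the column form makes it cheap to read one attribute across all records.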

Q:

What is a document store in NoSQL? (4 points)

A:

• Stores documents in the form of XML or JSON

• Each document is assigned a unique key (used to retrieve the document)

• Holds semi-structured data records that need not share a homogeneous structure

• A field in a data record can hold more than one value (arrays)
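The four points above can be sketched as a tiny in-memory document store. This is a toy model under stated assumptions, not the API of any real product. The class `DocumentStore` and its methods are hypothetical, and JSON is used as the document format.

```python
import json
import uuid


class DocumentStore:
    """Toy document store: JSON documents retrieved by a generated unique key."""

    def __init__(self):
        self._docs = {}  # unique key -> JSON text

    def insert(self, document):
        """Serialize the document to JSON and return its new unique key."""
        key = uuid.uuid4().hex
        self._docs[key] = json.dumps(document)
        return key

    def get(self, key):
        """Retrieve and decode the document stored under key."""
        return json.loads(self._docs[key])


store = DocumentStore()
# The two documents have different fields (no homogeneous structure),
# and "tags" holds more than one value (an array).
k1 = store.insert({"title": "Big Data", "tags": ["hadoop", "nosql"]})
k2 = store.insert({"name": "Ada", "enrolled": True})
```

No schema is declared anywhere: each document carries its own structure, and the key returned by `insert` is the only handle needed to retrieve it.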

Q:

What is a graph store in NoSQL? (3 points)

A:

• Uses graphs to store and represent relationships between entities

• Composed of nodes and edges 

• Each node and each edge can contain properties (Property-Graphs) 
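A property graph as described above can be sketched with two plain containers, one for nodes and one for edges. The class `PropertyGraph`, the node ids, and the `ENROLLED_IN` label are hypothetical. Real graph stores add indexing and a query language on top of this idea.

```python
class PropertyGraph:
    """Toy property graph: nodes and edges both carry key-value properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (source id, label, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((source, label, target, props))

    def neighbors(self, node_id, label):
        """Follow outgoing edges with the given label from node_id."""
        return [t for s, l, t, _ in self.edges if s == node_id and l == label]


g = PropertyGraph()
g.add_node("ada", kind="student")
g.add_node("bigdata", kind="course", credits=6)
g.add_edge("ada", "ENROLLED_IN", "bigdata", since=2023)
```

Here the relationship itself is stored as data: the edge from `ada` to `bigdata` has its own `since` property, independent of the properties on either node.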

Q:

Big Data Analytics - Definition

A:

The process of examining large and varied data sets to uncover

- hidden patterns

- unknown correlations

- market trends

- customer preferences

that can help make business decisions

Q:

Why do we use Hadoop? (2 points)

A:

- Scalability

- Availability / reliability / fault tolerance

Q:

How does Hadoop handle scalability, and what is a node in Hadoop?

A:

- Hadoop is a distributed system

- Node: an individual server within a cluster that stores and processes data

- Adding more nodes increases the cluster's storage and processing capacity
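The effect of adding nodes can be shown with a toy model that spreads data blocks over the nodes of a cluster and keeps more than one copy of each block. This is a simplified round-robin sketch, not Hadoop's actual placement policy. The function names, node names, and the replication factor of 2 are assumptions made for the example.

```python
def place_blocks(num_blocks, nodes, replication=2):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {node: [] for node in nodes}
    for block in range(num_blocks):
        for r in range(replication):
            node = nodes[(block + r) % len(nodes)]
            placement[node].append(block)
    return placement


def available_blocks(placement, failed_node):
    """Blocks still readable when one node is down."""
    return {
        block
        for node, blocks in placement.items()
        if node != failed_node
        for block in blocks
    }


# Hypothetical clusters: the same 12 blocks on 3 nodes and on 6 nodes.
small = place_blocks(12, ["n1", "n2", "n3"])
large = place_blocks(12, ["n1", "n2", "n3", "n4", "n5", "n6"])
```

In this model, doubling the number of nodes halves the blocks each node has to store, and because every block lives on two different nodes, losing a single node still leaves all blocks available.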

Q:

Who is generating and consuming big data? (old model vs. new model)

A:

Old model: A few companies generate data and all others consume it

New model: All of us generate data and all of us consume it
