YARN+Spark at ETHZ - ETH Zurich | Flashcards & Summaries

Study materials for YARN+Spark at ETHZ - ETH Zurich

Access free flashcards, summaries, practice exercises, and past exams for your YARN+Spark course at ETHZ - ETH Zurich.

Q:

What is the right way to think about schedulers if there is no preemption? For example, if a cluster has 1'000 GB and a user with a 50% share submits 15 jobs that each need 100 GB and run for one hour, how does this play out under a Capacity scheduler and under a FAIR scheduler?

A:

Under a Capacity scheduler, the user's queue is capped at 50% of the cluster (500 GB), so only 5 jobs can run at a time and the 15 jobs take 3 hours in total.

Under a FAIR scheduler, the user can start 10 jobs at the same time if nobody else is using the cluster, and since there is no preemption the jobs that have started run to completion. After one hour the remaining 5 jobs are guaranteed to be scheduled, because the queue is entitled to 50% of the cluster, so everything finishes within 2 hours.
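
As a sanity check on the arithmetic in this answer, here is a minimal Python sketch using the toy numbers from the question; it only counts waves of jobs and is not a model of any real YARN scheduler API.

```python
import math

cluster_gb = 1000      # total cluster memory
queue_share = 0.5      # the user's configured share (50%)
jobs = 15              # identical jobs submitted at once
job_gb = 100           # memory needed by each job
job_hours = 1          # runtime of each job

def makespan(concurrent_gb):
    """Hours until all jobs finish if at most concurrent_gb can be used at once."""
    jobs_per_wave = concurrent_gb // job_gb
    return math.ceil(jobs / jobs_per_wave) * job_hours

# Capacity scheduler: the queue is capped at its share even when the rest is idle,
# so only 5 jobs run per wave and three waves are needed.
capacity_hours = makespan(int(cluster_gb * queue_share))

# FAIR scheduler: with nobody else on the cluster the queue can borrow all of it;
# the first wave of 10 jobs runs to completion (no preemption), and the remaining
# 5 jobs fit inside the guaranteed 50% share afterwards.
fair_hours = makespan(cluster_gb)

print(f"Capacity scheduler: {capacity_hours} h, FAIR scheduler: {fair_hours} h")
```

This prints 3 h for the Capacity scheduler and 2 h for the FAIR scheduler, matching the reasoning above.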

Q:

Which are the 4 properties Dominant Resource Fairness (DRF) satisfies?

A:
  • sharing incentive (each user should be better off sharing the cluster than using a dedicated partition of it)
  • strategy-proofness (a user can't get a better allocation by lying about its demands)
  • Pareto efficiency (it should not be possible to increase the allocation of one user without decreasing the allocation of at least one other user)
  • envy-freeness (no user should prefer the allocation of another user)
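
The properties above characterize the DRF allocation itself: DRF repeatedly hands the next task to the user with the smallest dominant share, i.e. the largest of that user's per-resource shares. Below is a minimal sketch of this progressive-filling loop; the capacities and per-task demands are toy values chosen for illustration, not taken from the course material.

```python
# DRF progressive filling: repeatedly give one more task to the user with the
# smallest dominant share, until no further task fits. Toy values only.
capacity = {"cpu": 9.0, "mem": 18.0}
demand = {                                   # per-task demand of each user
    "A": {"cpu": 1.0, "mem": 4.0},
    "B": {"cpu": 3.0, "mem": 1.0},
}
alloc = {u: {r: 0.0 for r in capacity} for u in demand}
used = {r: 0.0 for r in capacity}

def dominant_share(user):
    # A user's dominant share is its largest share across all resource types.
    return max(alloc[user][r] / capacity[r] for r in capacity)

while True:
    # Among users whose next task still fits, pick the smallest dominant share.
    fitting = [u for u in demand
               if all(used[r] + demand[u][r] <= capacity[r] for r in capacity)]
    if not fitting:
        break
    user = min(fitting, key=dominant_share)
    for r in capacity:
        alloc[user][r] += demand[user][r]
        used[r] += demand[user][r]

for u in alloc:
    print(u, alloc[u], f"dominant share = {dominant_share(u):.2f}")
```

With these numbers both users end up with a dominant share of 2/3: user A is memory-dominated and user B is CPU-dominated.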

Q:

List 5 main shortcomings of MapReduce v1 that are addressed by the YARN design.

A:
  1. Scalability issues: MapReduce v1 has limited scalability, while YARN can scale to 10,000 nodes.
  2. Rigidity issues: MapReduce v1 only supports MapReduce-specific jobs. There is a need, however, for scheduling non-MapReduce workloads; for instance, we would like to be able to share the cluster with MPI, graph processing, and arbitrary user code.
  3. Resource utilization issues: in MapReduce v1, the reducers wait for the mappers to finish (and vice versa), leaving large fractions of time during which either the reducers or the mappers are idle. Ideally, all resources should be in use at any given time.
  4. Flexibility issues: mapper and reducer slots are fixed at configuration time and cannot be reconfigured.
  5. Version lock-in: MapReduce v1 cannot run different versions of the MapReduce framework on the same cluster, while YARN makes this possible.
Q:

What is steady fair share?

A:

The amount of resources a department (queue) should get in general, based on its configured weight and independent of whether it currently has running applications.
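
A toy illustration may help here: the steady fair share divides the cluster by weight over all configured queues, while the instantaneous fair share divides it only over queues that currently have applications. The department names and weights below are made up for illustration.

```python
# Hypothetical queues and weights; not the actual FairScheduler code.
cluster_gb = 1000
weights = {"physics": 2, "biology": 1, "chemistry": 1}   # configured queue weights
active = {"physics", "biology"}                           # queues with running/pending apps

steady = {q: cluster_gb * w / sum(weights.values()) for q, w in weights.items()}
active_total = sum(w for q, w in weights.items() if q in active)
instantaneous = {q: (cluster_gb * w / active_total if q in active else 0)
                 for q, w in weights.items()}

print("steady fair share:       ", steady)         # physics 500, biology 250, chemistry 250
print("instantaneous fair share:", instantaneous)  # physics ~667, biology ~333, chemistry 0
```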

Q:

Whose responsibility is tracking status and progress of running applications?

A:

ApplicationMaster (the per-application ApplicationMaster tracks the status and monitors the progress of its tasks; the ResourceManager only arbitrates resources among applications)

Q:

The ResourceManager does not have a global view of all usage of cluster resources. Therefore, it tries to make better scheduling decisions based on probabilistic prediction.

A:

True

Q:

How many instances of Container are in a cluster in YARN?

A:

Many per node

Q:

Does the ResourceManager have to provide fault tolerance for resources across the cluster?

A:

Yes

Q:

Whose responsibility is asking for resources needed for an application?

A:

ApplicationMaster (it negotiates the containers its application needs from the ResourceManager's scheduler)

Q:

How many instances of NodeManager are in a cluster in YARN?

A:

One per node

Q:

The ResourceManager has the ability to request resources back from a running application.

A:

True

Q:

How many instances of an ApplicationMaster are in a cluster in YARN?

A:

One per application, hence many per cluster (but usually not one on every node).
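
To tie the last few cards together, here is a toy model of who does what in YARN: one ResourceManager per cluster arbitrates resources, one NodeManager per node launches containers, and one ApplicationMaster per application requests containers and tracks the progress of its own tasks. The classes below are illustrative only and do not reflect Hadoop's actual API.

```python
class NodeManager:
    """One per node: launches and monitors containers on that node."""
    def launch_container(self, work):
        work()

class ResourceManager:
    """One per cluster: arbitrates resources; it does not track per-task progress."""
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def submit_application(self, app):
        # Launch one ApplicationMaster per application in a container somewhere.
        am = ApplicationMaster(app, self)
        self.node_managers[0].launch_container(am.run)
        return am

    def allocate(self, num_containers):
        # Hand out containers on the cluster's nodes (simple round-robin here).
        return [self.node_managers[i % len(self.node_managers)]
                for i in range(num_containers)]

class ApplicationMaster:
    """One per application: asks the RM for resources and tracks task progress."""
    def __init__(self, app, rm):
        self.app, self.rm, self.finished = app, rm, 0

    def run(self):
        for nm in self.rm.allocate(num_containers=len(self.app["tasks"])):
            nm.launch_container(lambda: None)   # run one task in the container
            self.finished += 1                  # the AM, not the RM, tracks progress
        print(f"{self.app['name']}: {self.finished}/{len(self.app['tasks'])} tasks done")

cluster = ResourceManager([NodeManager() for _ in range(3)])
cluster.submit_application({"name": "wordcount", "tasks": [1, 2, 3, 4]})
```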
