Example Use Case - Spark Tutorial Notebooks
Introduction
This use case runs a few notebooks used at CERN for training in Apache Spark.
They exercise a wide range of Spark APIs, including reading data from files.
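As an illustration, a cell of the kind these notebooks contain might look like the sketch below. The file path and column names are hypothetical and not taken from the actual training material; the point is that running an action triggers Spark jobs, which the extension picks up and displays beneath the cell.

```python
# A minimal sketch of a tutorial-style notebook cell (illustrative data only).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-tutorial").getOrCreate()

# Reading a file and running an action triggers Spark jobs,
# which the monitoring extension displays below the cell.
df = spark.read.csv("data/events.csv", header=True, inferSchema=True)
df.groupBy("type").count().show()
```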
Notebooks
Environment
- These notebooks were run with a local Apache Spark installation, using 1 executor and 4 cores, inside a Docker container based on Scientific Linux CERN 6. A comparable local configuration is sketched below.
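The exact configuration used inside the CERN Docker image is not shown here; the following is a hedged sketch of how a matching local session (one executor, four cores) could be created.

```python
# Sketch of a local Spark session comparable to the test environment:
# a single local executor with 4 cores.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[4]")          # one local executor, 4 cores
         .appName("spark-tutorial")
         .getOrCreate())
```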
Monitoring the Notebook
- The extension shows all the jobs that have been run from a cell.
- The stages of each job are shown in an expanded view and can be collapsed individually.
- An aggregated view of resource usage is provided as a graph of the number of active tasks against the number of available executor cores. This gives insight into whether a job is blocked on I/O or waiting for other results, and shows how well tasks are parallelized across the cores of the cluster (see the sketch after this list).
- An event timeline shows the overall picture of what is happening in the cluster, split into jobs, stages, and tasks.
- The timeline groups tasks by the executor they run on.
- It also shows the time each task spends in its various phases. Viewed together, this gives insight into whether the workload is I/O bound or CPU bound. This breakdown can be toggled with a checkbox.
- Clicking an item on the timeline opens a pop-up with its details. For jobs and stages, the pop-up shows the corresponding Spark Web UI page; for tasks, a custom pop-up displays task-level details.
- For more advanced details, the extension provides access to the Spark Web UI through a server proxy, which advanced users can use for in-depth analysis.
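To illustrate what the active-tasks graph reveals about parallelization, the hypothetical cell below runs the same computation with one partition and then with four. The dataset and workload are made up for the example; only the effect on concurrent tasks matters.

```python
# Hypothetical example: how partition count shows up in the active-tasks graph.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").getOrCreate()
sc = spark.sparkContext

# With a single partition only one task can run at a time, so the graph
# shows 1 active task against 4 available cores (poor parallelism).
rdd = sc.parallelize(range(1_000_000), numSlices=1)
rdd.map(lambda x: x * x).sum()

# Repartitioning to 4 lets each core run a task concurrently, and the
# active-task line rises to match the number of executor cores.
rdd.repartition(4).map(lambda x: x * x).sum()
```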