Example Use Case - Spark Tutorial Notebooks
Introduction
This use case runs a few notebooks used at CERN for training in Apache Spark.
They exercise a wide range of Spark APIs, including reading data from files.
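As an illustration, a cell of the kind these notebooks contain might look like the sketch below. The file path and column names are hypothetical and not taken from the actual training material; the point is that running an action triggers Spark jobs, which the extension picks up and displays beneath the cell.

```python
# A minimal sketch of a tutorial-style notebook cell (illustrative data only).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-tutorial").getOrCreate()

# Reading a file and running an action triggers Spark jobs,
# which the monitoring extension displays below the cell.
df = spark.read.csv("data/events.csv", header=True, inferSchema=True)
df.groupBy("type").count().show()
```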
Notebooks
Environment
- These notebooks were run with a local Apache Spark installation, using 1 executor and 4 cores, inside a Docker container based on Scientific Linux CERN 6. A comparable local configuration is sketched below.
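The exact configuration used inside the CERN Docker image is not shown here; the following is a hedged sketch of how a matching local session (one executor, four cores) could be created.

```python
# Sketch of a local Spark session comparable to the test environment:
# a single local executor with 4 cores.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[4]")          # one local executor, 4 cores
         .appName("spark-tutorial")
         .getOrCreate())
```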
Monitoring the Notebook
- The extension shows all the jobs that have been run from a cell.
- The stages of each job are shown in an expanded view and can be collapsed individually.
- An aggregated view of resource usage is provided as a graph of the number of active tasks against the number of available executor cores. This gives insight into whether a job is blocked on I/O or waiting for other results, and shows how well tasks are parallelized across the cores of the cluster (see the sketch after this list).
- An event timeline shows the overall picture of what is happening in the cluster, split into jobs, stages, and tasks.
- The timeline groups tasks by the executor they run on.
- It also shows the time each task spends in its various phases. Viewed together, this gives insight into whether the workload is I/O bound or CPU bound. This breakdown can be toggled with a checkbox.
- Clicking an item on the timeline opens a pop-up with its details. For jobs and stages, the pop-up shows the corresponding Spark Web UI page; for tasks, a custom pop-up displays task-level details.
- For more advanced details, the extension provides access to the Spark Web UI through a server proxy, which advanced users can use for in-depth analysis.
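To illustrate what the active-tasks graph reveals about parallelization, the hypothetical cell below runs the same computation with one partition and then with four. The dataset and workload are made up for the example; only the effect on concurrent tasks matters.

```python
# Hypothetical example: how partition count shows up in the active-tasks graph.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").getOrCreate()
sc = spark.sparkContext

# With a single partition only one task can run at a time, so the graph
# shows 1 active task against 4 available cores (poor parallelism).
rdd = sc.parallelize(range(1_000_000), numSlices=1)
rdd.map(lambda x: x * x).sum()

# Repartitioning to 4 lets each core run a task concurrently, and the
# active-task line rises to match the number of executor cores.
rdd.repartition(4).map(lambda x: x * x).sum()
```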