SparkMonitor - How the extension works
The SparkMonitor extension for Jupyter Notebook has four components:
- Notebook frontend extension written in JavaScript.
- IPython kernel extension written in Python.
- Notebook web server extension written in Python.
- An implementation of the SparkListener interface written in Scala.
The Frontend Extension
- Receives data from the IPython kernel through Jupyter’s comm API mechanism for widgets.
- Jupyter frontend extensions are requirejs modules that are loaded when the browser page loads.
- Contains the logic for displaying the progress bars, graphs and timeline.
- Keeps track of running cells in a queue by tracking execution requests and kernel busy/idle events.
- Creates and renders the display if a job start event is received while a cell is running.
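The cell-tracking logic described above can be modelled in a few lines. The real frontend is a JavaScript requirejs module, so the Python below is only an illustrative sketch of the bookkeeping. The class name `CellTracker`, its method names and the cell identifiers are hypothetical, and it assumes one idle event per finished cell.

```python
from collections import deque


class CellTracker:
    """Toy model of the frontend's bookkeeping for running cells (hypothetical)."""

    def __init__(self):
        self.pending = deque()  # submitted cells, oldest (currently running) first
        self.displays = {}      # cell id -> job ids displayed under that cell

    def on_execute_request(self, cell_id):
        # A cell was submitted; the kernel executes requests in order.
        self.pending.append(cell_id)

    def on_kernel_idle(self):
        # Assumption: one idle event marks the end of the cell at the head of the queue.
        if self.pending:
            self.pending.popleft()

    def on_job_start(self, job_id):
        # Attach the job to the running cell; ignore it if no cell is running.
        if not self.pending:
            return None
        cell_id = self.pending[0]
        self.displays.setdefault(cell_id, []).append(job_id)
        return cell_id


tracker = CellTracker()
tracker.on_execute_request("cell-1")
tracker.on_execute_request("cell-2")
first = tracker.on_job_start(0)   # job 0 starts while cell-1 is running
tracker.on_kernel_idle()          # cell-1 finishes
second = tracker.on_job_start(1)  # job 1 starts while cell-2 is running
```

Keeping the cells in a queue lets a job start event be attributed to the cell at the head even when several cells were submitted at once.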
IPython Kernel Extension
- The kernel extension is an importable Python module called
- It is configured to load when the IPython kernel process starts.
- The extension acts as a bridge between the frontend and the SparkListener callback interface.
- To communicate with the SparkListener, the extension opens a socket and waits for connections.
- The port of the socket is exported as an environment variable. When a Spark application starts, the custom SparkListener connects to this port and forwards data.
- To communicate with the frontend, the extension uses the IPython Comm API provided by Jupyter.
- The extension also adds a SparkConf instance named conf to the user's namespace. This object is configured with the Spark properties that make Spark load the custom SparkListener and that add the necessary JAR file paths to the Java class path.
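The socket handshake described above might look roughly like the following. This is a minimal sketch using only the standard library, not the extension's actual code. The function name `start_listener_socket` and the environment variable name `EXAMPLE_KERNEL_PORT` are hypothetical placeholders.

```python
import os
import socket
import threading


def start_listener_socket(env_var="EXAMPLE_KERNEL_PORT", on_data=print):
    """Open a TCP socket on a free port and publish the port via the environment."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))    # port 0 asks the OS for any free port
    server.listen(1)
    port = server.getsockname()[1]
    os.environ[env_var] = str(port)  # processes started later inherit this value

    def serve():
        conn, _ = server.accept()    # blocks until the listener connects
        with conn:
            while True:
                chunk = conn.recv(4096)
                if not chunk:        # empty read: the peer closed the connection
                    break
                # Simplification: a chunk may split a message or a multibyte character.
                on_data(chunk.decode("utf-8", errors="replace"))
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return port, thread
```

Binding to port 0 avoids hard-coding a port, which is why the chosen port has to be communicated to the listener through the environment.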
Scala SparkListener
- Written in Scala.
- The listener receives notifications of Apache Spark application lifecycle events as callbacks.
- The custom implementation used in this extension connects to a socket opened by the IPython kernel extension.
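To illustrate the forwarding step, here is a Python stand-in for what the Scala listener does when a callback fires. The wire format (one JSON object per line) and the field names `msgtype`, `jobId`, `numTasks` and `status` are assumptions made for this sketch. A connected socket pair stands in for the real connection to the kernel extension.

```python
import json
import socket


def forward_event(sock, event_type, **fields):
    """Serialize one lifecycle event as a JSON line and send it over the socket."""
    message = json.dumps({"msgtype": event_type, **fields}) + "\n"
    sock.sendall(message.encode("utf-8"))


def read_events(sock):
    """Yield decoded events until the sender closes the connection."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            yield json.loads(line)


# A connected socket pair stands in for the listener-to-kernel connection.
listener_side, kernel_side = socket.socketpair()
forward_event(listener_side, "jobStart", jobId=0, numTasks=8)
forward_event(listener_side, "jobEnd", jobId=0, status="SUCCEEDED")
listener_side.close()

events = list(read_events(kernel_side))
kernel_side.close()
```

Delimiting messages with a newline lets the receiver split the byte stream back into individual events regardless of how TCP chunks the data.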
The Notebook Webserver Extension - A Spark Web UI proxy
- Written in Python.
- This module proxies the Spark UI, which typically runs on 127.0.0.1:4040, to the user through Jupyter’s web server.
- The Jupyter Notebook server is built on Tornado, a Python web server.
- Jupyter web server extensions are custom request handlers that subclass the IPythonHandler class. They provide custom endpoints with additional content.
- This module provides the Spark UI as an endpoint at
- In the frontend extension, the Spark UI can also be accessed as an IFrame dialog through the monitoring display.
- For the Spark UI web application to work as expected, the server extension replaces all relative URLs in the requested page, adding the endpoint's base URL to each.
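The URL rewriting step can be sketched with a regular expression. This is a simplified illustration, not the extension's implementation: it assumes the links to fix are root-relative `href` and `src` attributes such as `/jobs/`, and the function name `rewrite_relative_urls` and the base URL `/example-endpoint` are hypothetical.

```python
import re

# Matches href="..." or src="..." with either quote style; group 3 is the URL.
_ATTR = re.compile(r'(href|src)=(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_relative_urls(html, base_url):
    """Prefix root-relative URLs in href/src attributes with base_url."""
    base = base_url.rstrip("/")

    def fix(match):
        attr, quote, url = match.groups()
        # Only URLs starting with a single "/" are rewritten; absolute,
        # protocol-relative (//host) and all other URLs are left unchanged.
        if url.startswith("/") and not url.startswith("//"):
            url = base + url
        return f"{attr}={quote}{url}{quote}"

    return _ATTR.sub(fix, html)


page = '<a href="/jobs/">Jobs</a><img src="/static/logo.png">'
rewritten = rewrite_relative_urls(page, "/example-endpoint")
```

Without this rewriting, a link such as `/jobs/` would resolve against the Jupyter server's root rather than the proxied Spark UI, and the page's links and assets would break.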