Final Report |
Installation |
How it Works |
Use Cases |
Code |
License
SparkMonitor - How the extension works
Jupyter Notebook is a web based application that follows a client-server architecture. It consists of a JavaScript browser client that renders the notebook interface and a web server process on the back end. The computation of the cells are outsourced to a separate kernel process running on the server. To extend the notebook, it is required to implement a separate extension component for each part.
The SparkMonitor extension for Jupyter Notebook has 4 components.
- Notebook Frontend extension written in JavaScript.
- IPython Kernel extension written in Python.
- Notebook web server extension written in Python.
- An implementation of SparkListener interface written in Scala.
The Frontend Extension
- Written in JavaScript.
- Receives data from the IPython kernel through Jupyter’s comm API mechanism for widgets.
- Jupyter frontend extensions are requirejs modules that are loaded when the browser page loads.
- Contains the logic for displaying the progress bars, graphs and timeline.
- Keeps track of cells running using a queue by tracking execution requests and kernel busy/idle events.
- Creates and renders the display if a job start event is received while a cell is running.
IPython Kernel Extension
- The kernel extension is an importable Python module called
sparkmonitor.kernelextension
- It is configured to load when the IPython kernel process starts.
- The extension acts as a bridge between the frontend and the SparkListener callback interface.
- To communicate with the SparkListener the extension opens a socket and waits for connections.
- The port of the socket is exported as an environment variable. When a Spark application starts, the custom SparkListener connects to this port and forwards data.
- To communicate with the frontend the extension uses the IPython Comm API provided by Jupyter.
- The extension also adds to the users namespace a SparkConf instance named as
conf
. This object is configured with the Spark properties that makes Spark load the custom SparkListener as well as adds the necessary JAR file paths to the Java class path.
- Written in Scala.
- The listener receives notifications of Apache Spark application lifecycle events as callbacks.
- The custom implementation used in this extension connects to a socket opened by the IPython kernel extension.
- All the data is forwarded to the kernel through this socket which forwards it to the frontend JavaScript.
The Notebook Webserver Extension - A Spark Web UI proxy
- Written in Python.
- This module proxies the Spark UI running typically on 127.0.0.1:4040 to the user through Jupyter’s web server.
- Jupyter notebook is based on the Tornado web server back end. Tornado is a Python webserver.
- Jupyter webserver extensions are custom request handlers sub-classing the
IPythonHandler
class. They provide custom endpoints with additional content.
- This module provides the Spark UI as an endpoint at
notebook_base_url/sparkmonitor
.
- In the front end extension, the Spark UI can also be accessed as an IFrame dialog through the monitoring display.
- For the Spark UI web application to work as expected, the server extension replaces all relative URLs in the requested page, adding the endpoints base URL to each.