SparkMonitor

A DistROOT Example

Introduction

One of the main goals of this project was to make it easier for the scientific community to leverage distributed computing for scientific analysis, in particular by combining Apache Spark and Jupyter Notebooks. ROOT is a popular C++-based library used for a variety of scientific analysis tasks. This example for the SparkMonitor extension uses the DistROOT module to process ROOT TTree objects on a distributed cluster with Apache Spark. The Spark job is divided into two phases: a map phase, which extracts data from the TTree and uses it to fill histograms, and a reduce phase, which merges all the histograms into a final list.
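The map and reduce phases described above can be illustrated with a minimal pure-Python sketch. This is not the DistROOT API: plain lists stand in for ranges of TTree entries, a list of bin counts stands in for a ROOT histogram, and the names `fill_histogram`, `merge_histograms`, and `partitions` are hypothetical, chosen only for illustration.

```python
from functools import reduce

# Assumed binning for the illustration: 4 equal-width bins over [0, 4).
N_BINS, LOW, HIGH = 4, 0.0, 4.0


def fill_histogram(values):
    """Map step (hypothetical): fill one histogram from one partition of values."""
    counts = [0] * N_BINS
    width = (HIGH - LOW) / N_BINS
    for v in values:
        if LOW <= v < HIGH:  # values outside the range are ignored
            counts[int((v - LOW) / width)] += 1
    return counts


def merge_histograms(a, b):
    """Reduce step (hypothetical): add two histograms bin by bin."""
    return [x + y for x, y in zip(a, b)]


# Each inner list stands in for the entries one worker would read from the TTree.
partitions = [[0.5, 1.5, 1.7], [2.2, 3.9], [0.1, 2.5, 2.6]]

partial = map(fill_histogram, partitions)   # one histogram per partition
merged = reduce(merge_histograms, partial)  # a single merged histogram
print(merged)  # [2, 2, 3, 1]
```

Because the merge is associative and commutative, the partial histograms can be combined in any order, which is what allows the work to be spread over a cluster. With PySpark the same two functions could be passed to an RDD, for example `sc.parallelize(partitions).map(fill_histogram).reduce(merge_histograms)`, assuming an existing SparkContext named `sc`.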

Environment

Notebook

Monitoring
