DarkMatter in Cyberspace

Use Jupyter as Spark Notebook


PySpark in Jupyter

First, download Spark 2.2 and extract it. Then set up Jupyter and install minrk/findspark:

conda create -n pysparkenv python=3.5
. activate pysparkenv
conda install jupyter ipython
conda install -c conda-forge findspark

In the browser, create a new Python 3 notebook, and run:

import findspark
findspark.init("/home/leo/apps/spark-2.2.0-bin-hadoop2.7/", edit_profile=True)
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)
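If findspark.init(...) fails, it may help to confirm that the path really points at an extracted Spark distribution. The following is a minimal sketch, not part of findspark: the helper name looks_like_spark_home is hypothetical, and the check assumes the usual layout of a Spark download (bin/spark-submit and python/pyspark).

```python
from pathlib import Path


def looks_like_spark_home(path):
    """Heuristic: an extracted Spark distribution ships bin/spark-submit and python/pyspark."""
    root = Path(path).expanduser()
    has_submit = (root / "bin" / "spark-submit").is_file()
    has_pyspark = (root / "python" / "pyspark").is_dir()
    return has_submit and has_pyspark


# Path taken from the notebook cell above; adjust it to your own install.
spark_home = "/home/leo/apps/spark-2.2.0-bin-hadoop2.7/"
if looks_like_spark_home(spark_home):
    print(spark_home, "looks like a Spark home")
else:
    print(spark_home, "does not look like a Spark home")
```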

Note: Running findspark.init with the option edit_profile=True creates a startup script at ~/.ipython/profile_default/startup/findspark.py. The next time you create a new notebook in the browser, there is no need to run findspark.init(...) again.
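To see whether that startup script is already in place, a small check like the one below can be run from any Python prompt. It is only a sketch: the helper findspark_startup_script is a hypothetical name, and it assumes the default IPython directory (~/.ipython) and the default profile mentioned in the note.

```python
from pathlib import Path


def findspark_startup_script(ipython_dir=None):
    """Return the expected location of the startup script written by edit_profile=True.

    Assumes the default profile; pass ipython_dir if IPython lives somewhere else.
    """
    base = Path(ipython_dir) if ipython_dir else Path.home() / ".ipython"
    return base / "profile_default" / "startup" / "findspark.py"


script = findspark_startup_script()
if script.is_file():
    print("Startup script found:", script)
else:
    print("No startup script at", script)
```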

Scala in Jupyter

As of 2017-12-11, Apache Toree only supports Scala 2.10 and Spark 1.6.3. You can't use PySpark with it.

Installation

Install the Toree kernel following its Quick Start. The installation reports Permission denied: '/usr/local/share/jupyter'.

According to "Install to non-/usr/local/share location" and the description of --ToreeInstall.prefix in jupyter toree install --help-all, the prefix should be $MINICONDA_HOME/envs/py35-anaconda-keras. Here py35-anaconda-keras is the name of the virtual environment that contains Anaconda and Jupyter Notebook.
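When the environment is activated, the prefix can also be read from Python itself, since sys.prefix points at the root of the active environment. The sketch below prints the option to pass; the helper kernels_dir is a hypothetical name, and the share/jupyter/kernels layout is taken from the kernel path used later in this post.

```python
import sys
from pathlib import Path


def kernels_dir(prefix):
    """Directory where kernels installed under this prefix end up."""
    return Path(prefix) / "share" / "jupyter" / "kernels"


# Run this with the Python of the environment that holds Jupyter.
print("--ToreeInstall.prefix=" + sys.prefix)
print("Kernels will be installed under:", kernels_dir(sys.prefix))
```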

So install with:

. activate py35-anaconda-keras
pip install toree
jupyter toree install --spark_home=/home/leo/apps/spark-2.2.0-bin-hadoop2.7/ --ToreeInstall.prefix=/home/leo/apps/miniconda3/envs/py35-anaconda-keras/

The server starts, but the Toree kernel fails to start.

According to "Apache Toree and Spark Scala Not Working in Jupyter", Toree currently supports only Scala 2.10 or lower, while the prebuilt Spark 2.2 uses Scala 2.11. So I installed Spark 1.6.3, which is built with Scala 2.10, and reinstalled the Toree kernel using the following commands:

. activate py35-anaconda-keras
jupyter kernelspec list
rm -rf /home/leo/apps/miniconda3/envs/py35-anaconda-keras/share/jupyter/kernels/apache_toree_scala
jupyter toree install --spark_home=/home/leo/apps/spark-1.6.3-bin-hadoop2.6/ --ToreeInstall.prefix=/home/leo/apps/miniconda3/envs/py35-anaconda-keras/
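After reinstalling, it can be worth checking which Spark installation the kernel actually points at. The sketch below reads the kernel's kernel.json and returns the SPARK_HOME entry from its env section. That Toree records the Spark path under this key is an assumption here, and kernel_spark_home is a hypothetical helper name; if the key is missing, the function simply returns None.

```python
import json
from pathlib import Path


def kernel_spark_home(kernel_dir):
    """Return the SPARK_HOME recorded in kernel.json, or None if it cannot be found."""
    spec_file = Path(kernel_dir) / "kernel.json"
    if not spec_file.is_file():
        return None
    spec = json.loads(spec_file.read_text())
    # "env" is the standard kernelspec field for environment variables.
    return spec.get("env", {}).get("SPARK_HOME")


# Kernel directory from the commands above; adjust it to your own environment.
kernel_dir = (
    "/home/leo/apps/miniconda3/envs/py35-anaconda-keras"
    "/share/jupyter/kernels/apache_toree_scala"
)
print("Toree kernel SPARK_HOME:", kernel_spark_home(kernel_dir))
```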


Published

Dec 11, 2017

Last Updated

Dec 24, 2017
