PySpark in Jupyter
First, download Spark 2.2 and extract it. Then set up Jupyter and install minrk/findspark:
conda create -n pysparkenv python=3.5
. activate pysparkenv
conda install jupyter ipython
conda install -c conda-forge findspark
In the browser, create a new Python 3 notebook, and run:
import findspark
findspark.init("/home/leo/apps/spark-2.2.0-bin-hadoop2.7/", edit_profile=True)
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)
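If findspark.init fails or pyspark cannot be imported, the path is the first thing to check. Below is a minimal sketch, not part of findspark: check_spark_home is a hypothetical helper that only checks for the two pieces of the Spark distribution that need to be importable, the bundled pyspark package and the py4j zip.

```python
import glob
import os


def check_spark_home(spark_home):
    """Return a list of problems found in a Spark directory (empty if none)."""
    problems = []
    # The pyspark package ships inside the distribution under python/pyspark.
    if not os.path.isdir(os.path.join(spark_home, "python", "pyspark")):
        problems.append("missing python/pyspark")
    # py4j is bundled as a zip archive under python/lib.
    if not glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
        problems.append("missing python/lib/py4j-*.zip")
    return problems


if __name__ == "__main__":
    # Path taken from the notebook cell above; adjust to your own install.
    found = check_spark_home("/home/leo/apps/spark-2.2.0-bin-hadoop2.7/")
    print(found or "layout looks fine")
```

An empty list means the directory looks like an extracted Spark release. It says nothing about whether Java is installed or whether the Spark version matches.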
Note: Running findspark.init with the option edit_profile=True creates a startup script at ~/.ipython/profile_default/startup/findspark.py. The next time you create a new notebook in the browser, there is no need to run findspark.init(...) again.
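To see whether that startup script is in place, and what it runs, something like the following can be used. It is only a sketch: findspark_startup_path is a hypothetical helper, and it assumes the default IPython directory and profile named in the note.

```python
import os


def findspark_startup_path(ipython_dir="~/.ipython", profile="default"):
    """Build the path where the note above says the startup script is written."""
    return os.path.join(
        os.path.expanduser(ipython_dir),
        "profile_" + profile,
        "startup",
        "findspark.py",
    )


path = findspark_startup_path()
if os.path.isfile(path):
    # Show what will run automatically at the start of every IPython session.
    with open(path) as f:
        print(f.read())
else:
    print("no startup script at", path)
```

Deleting that file undoes the automatic setup, which is useful when switching to a different Spark directory.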
Scala in Jupyter
For now (2017.12.11), Apache Toree only supports Scala 2.10 and Spark 1.6.3. You can't use pyspark.
Installation
Install the Toree kernel according to its Quick Start.
It reports Permission denied: '/usr/local/share/jupyter'.
According to "Install to non-/usr/local/share location" and the description of --ToreeInstall.prefix in jupyter toree install --help-all, the prefix should be $MINICONDA_HOME/envs/py35-anaconda-keras.
Here py35-anaconda-keras is the name of the virtual environment that contains Anaconda and Jupyter Notebook.
So install with:
. activate py35-anaconda-keras
pip install toree
jupyter toree install --spark_home=/home/leo/apps/spark-2.2.0-bin-hadoop2.7/ --ToreeInstall.prefix=/home/leo/apps/miniconda3/envs/py35-anaconda-keras/
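The prefix does not have to be typed by hand. With the environment activated, Python's sys.prefix is the environment's root directory, and kernelspecs installed with a prefix land under share/jupyter/kernels inside it. A small sketch (kernels_dir is a hypothetical helper name):

```python
import os
import sys


def kernels_dir(prefix=sys.prefix):
    """Directory that holds kernelspecs installed under the given prefix."""
    return os.path.join(prefix, "share", "jupyter", "kernels")


# With the environment activated, sys.prefix is that environment's root directory.
print("prefix:     ", sys.prefix)
print("kernels dir:", kernels_dir())
```

Run inside the py35-anaconda-keras environment, the first line printed is the value to pass as --ToreeInstall.prefix.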
The server starts, but the Toree kernel fails to start.
According to "Apache Toree and Spark Scala Not Working in Jupyter", Toree currently supports only Scala 2.10 or lower, while the prebuilt Spark 2.2 is built with Scala 2.11. So I installed Spark 1.6.3 and reinstalled the Toree kernel using the following commands:
. activate py35-anaconda-keras
jupyter kernelspec list
rm -rf /home/leo/apps/miniconda3/envs/py35-anaconda-keras/share/jupyter/kernels/apache_toree_scala
jupyter toree install --spark_home=/home/leo/apps/spark-1.6.3-bin-hadoop2.6/ --ToreeInstall.prefix=/home/leo/apps/miniconda3/envs/py35-anaconda-keras/
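After reinstalling, it is worth confirming which Spark directory the kernel actually points to. The sketch below reads each kernel.json under a kernels directory. toree_spark_homes is a hypothetical helper, and it assumes that Toree records SPARK_HOME in the "env" section of its kernel.json; if your file looks different, open it directly instead.

```python
import json
import os


def toree_spark_homes(kernels_dir):
    """Map each kernelspec name under kernels_dir to the SPARK_HOME in its kernel.json."""
    result = {}
    for name in sorted(os.listdir(kernels_dir)):
        spec_file = os.path.join(kernels_dir, name, "kernel.json")
        if not os.path.isfile(spec_file):
            continue
        with open(spec_file) as f:
            spec = json.load(f)
        # "env" is the standard kernelspec key for environment variables;
        # that Toree stores SPARK_HOME there is an assumption of this sketch.
        result[name] = spec.get("env", {}).get("SPARK_HOME")
    return result


# Kernels directory taken from the rm command above; adjust to your own prefix.
kdir = "/home/leo/apps/miniconda3/envs/py35-anaconda-keras/share/jupyter/kernels"
if os.path.isdir(kdir):
    print(toree_spark_homes(kdir))
```

A kernel that shows None has no SPARK_HOME entry in its kernel.json, which is normal for the plain Python kernel.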