Download Spark 2.2.0 (spark-2.2.0-bin-hadoop2.7.tgz) and extract it to d:\apps.

Install the Oracle JDK rather than OpenJDK. Then set the following environment variables (user environment variables can be edited with RapidEE):

SPARK_HOME=D:\apps\spark-2.2.0-bin-hadoop2.7
HADOOP_HOME=%SPARK_HOME%\hadoop
JAVA_HOME=c:\Program Files\Java\jdk1.8.0_151
Path=%SPARK_HOME%\python\pyspark;C:\Users\lee_c\Miniconda3\Scripts;C:\Users\lee_c\Miniconda3;%SPARK_HOME%\bin;%JAVA_HOME%\bin;%USERPROFILE%\AppData\Local\Microsoft\WindowsApps;
PYSPARK_PYTHON=C:\Users\lee_c\Miniconda3\python.exe
PYTHONPATH=%SPARK_HOME%\python\pyspark;C:\Users\lee_c\Miniconda3

Check the values of the environment variables with the set command in a Windows console.
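
The same check can be done from Python, which is convenient because it shows what the interpreter used by PySpark actually sees. The following is a minimal sketch, not part of the original setup; the helper name missing_vars is hypothetical, and the list of names is taken from the variables above.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ["SPARK_HOME", "HADOOP_HOME", "JAVA_HOME", "PYSPARK_PYTHON"]


def missing_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    # Print each variable as the current process sees it.
    for name in REQUIRED_VARS:
        print(f"{name}={os.environ.get(name, '<not set>')}")
    missing = missing_vars()
    if missing:
        print("Missing:", ", ".join(missing))
```

Run it from a newly opened console, since consoles opened before the variables were edited keep the old values.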

Using winutils.exe to set file system permissions

mkdir %SPARK_HOME%\hadoop\bin
cd /d %SPARK_HOME%\hadoop\bin
wget https://github.com/steveloughran/winutils/raw/master/hadoop-2.7.1/bin/winutils.exe
winutils.exe chmod 777 D:\tmp\hive
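
Before starting Spark, it can help to confirm that winutils.exe ended up where HADOOP_HOME points. The following is a small sketch under the assumption that the file lives in the bin folder below HADOOP_HOME, as in the commands above; the function name winutils_path is hypothetical.

```python
import os


def winutils_path(hadoop_home):
    """Return the path to winutils.exe under hadoop_home/bin, or None if absent."""
    candidate = os.path.join(hadoop_home, "bin", "winutils.exe")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME", "")
    found = winutils_path(home) if home else None
    print(found or "winutils.exe not found; check HADOOP_HOME")
```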

Verify

$ spark-submit d:\apps\spark-2.2.0-bin-hadoop2.7\examples\src\main\python\wordcount.py d:\apps\spark-2.2.0-bin-hadoop2.7\examples\src\main\python\wordcount.py
$ pyspark
>>> sc
<SparkContext master=local[*] appName=PySparkShell>
>>> spark
<pyspark.sql.session.SparkSession object at 0x0000022917AF87F0>

Note: on Windows, the path separator must be a backslash. To use forward slashes instead, write the path as a file URI: file:///d:/apps/....
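
The file URI form can be produced with Python's standard library instead of by hand. The sketch below uses pathlib.PureWindowsPath.as_uri(), which works on any operating system; the helper name to_file_uri and the example path are made up for illustration.

```python
from pathlib import PureWindowsPath


def to_file_uri(windows_path):
    """Convert an absolute Windows path to a file:/// URI with forward slashes."""
    # as_uri() raises ValueError if the path is not absolute.
    return PureWindowsPath(windows_path).as_uri()


# Hypothetical example path, not from the original setup.
print(to_file_uri(r"d:\apps\data\input.txt"))  # file:///d:/apps/data/input.txt
```

Characters that are not allowed in a URI, such as spaces, are percent-encoded by as_uri().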
