DarkMatter in Cyberspace
  • Home
  • Categories
  • Tags
  • Archives

Run pyspark on Windows


Download Spark 2.2.0 (spark-2.2.0-bin-hadoop2.7.tgz) and extract to d:\apps.

Install Oracle JDK instead of OpenJDK. Set environment variables (modify user environment variables with rapidee):

SPARK_HOME=D:\apps\spark-2.2.0-bin-hadoop2.7
HADOOP_HOME=%SPARK_HOME%\hadoop
JAVA_HOME=c:\Program Files\Java\jdk1.8.0_151
Path=%SPARK_HOME%\python\pyspark;C:\Users\lee_c\Miniconda3\Scripts;C:\Users\lee_c\Miniconda3;%SPARK_HOME%\bin;%JAVA_HOME%\bin;C:\Users\lee_c\Miniconda3;%USERPROFILE%\AppData\Local\Microsoft\WindowsApps;
PYSPARK_PYTHON=C:\Users\lee_c\Miniconda3\python.exe
PYTHONPATH=%SPARK_HOME%\python\pyspark;C:\Users\lee_c\Miniconda3

Check values of environment variables with set in Windows console.

Using winutils.exe to set file system permission

mkdir -p %SPARK_HOME%/hadoop/bin
cd %SPARK_HOME%/hadoop/bin
wget https://github.com/steveloughran/winutils/raw/master/hadoop-2.7.1/bin/winutils.exe
winutils.exe chmod 777 D:\tmp\hive

Verify

$ spark-submit d:\apps\spark-2.2.0-bin-hadoop2.7\examples\src\main\python\wordcount.py d:\apps\spark-2.2.0-bin-hadoop2.7\examples\src\main\python\wordcount.py
$ pyspark
>>> sc
<SparkContext master=local[*] appName=PySparkShell>
>>> spark
<pyspark.sql.session.SparkSession object at 0x0000022917AF87F0>

Note: on Windows the path seperator must be backslash. If using slash, write it like this: file:///d:/apps/....



Published

Dec 20, 2017

Last Updated

Dec 20, 2017

Category

Tech

Tags

  • pyspark 5
  • spark 21
  • windows 67

Contact

  • Powered by Pelican. Theme: Elegant by Talha Mansoor