DarkMatter in Cyberspace

Using scikit-learn & pandas on Spark Cluster


Using the 'Create a Temporary Local Repository' method described in Creating and Using a Parcel Repository for Cloudera Manager, we can copy the Anaconda parcel files to the CDH manager host and install Anaconda on every node in the Spark cluster. Python scripts can then be run with PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python spark-submit pyspark_script.py, as described in Using the Anaconda parcel.
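The same invocation can be wrapped in a small Python launcher, which is handy when the job is started from another script or a scheduler. This is only a sketch: the helper names build_submit and submit are hypothetical, and it assumes spark-submit is on the PATH and that the Anaconda parcel is activated at the path given above.

```python
import os
import subprocess

# Interpreter path taken from the post; assumes the Anaconda parcel is
# distributed and activated on every node of the cluster.
ANACONDA_PYTHON = "/opt/cloudera/parcels/Anaconda/bin/python"


def build_submit(script, python=ANACONDA_PYTHON, base_env=None):
    """Return (env, argv) for running `script` through spark-submit.

    `build_submit` is a hypothetical helper, not part of Spark or CDH.
    """
    # Copy the environment so the caller's os.environ is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # PYSPARK_PYTHON tells PySpark which interpreter to use.
    env["PYSPARK_PYTHON"] = python
    return env, ["spark-submit", script]


def submit(script):
    """Run spark-submit and return its exit code (needs spark-submit on PATH)."""
    env, argv = build_submit(script)
    return subprocess.call(argv, env=env)
```

Calling submit("pyspark_script.py") is then equivalent to the one-line shell command, with the interpreter path kept in one place.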

Install Anaconda with Local Parcel

Download three files (Anaconda-4.1.1-el6.parcel, Anaconda-4.1.1-el6.parcel.sha and manifest.json) from the Anaconda CDH Parcel Repo to local disk, as described in Using the Anaconda parcel, then copy them to the CDH manager host.
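Before copying the files, it may be worth checking that the parcel was not corrupted during download. The sketch below assumes the .sha file holds the hex SHA-1 digest of the parcel as its first whitespace-separated token; that layout is an assumption, not something stated in this post, and the function names are hypothetical.

```python
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Compute the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parcel_matches(parcel_path, sha_path):
    """Return True if the parcel's SHA-1 equals the digest in the .sha file.

    Assumption: the .sha file contains the hex digest as its first token.
    """
    with open(sha_path) as fh:
        fields = fh.read().split()
    if not fields:
        return False  # an empty .sha file cannot confirm anything
    return sha1_of(parcel_path) == fields[0].lower()
```

For example, parcel_matches("Anaconda-4.1.1-el6.parcel", "Anaconda-4.1.1-el6.parcel.sha") should return True for an intact download, provided the assumption about the .sha layout holds.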



Published

May 7, 2017

Last Updated

May 7, 2017

Category

Tech

Tags

  • anaconda
  • cdh
  • pandas
  • python
  • sklearn
  • spark
