Dask is a parallel and distributed counterpart to NumPy, pandas and scikit-learn.
File System
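Dask can read from the local file system and from HDFS. For example:
import json
import dask.bag as db
import dask.dataframe as dd
# Local file system: read all CSV files matching the glob into one DataFrame
df = dd.read_csv('data/2000-*-*.csv')
# HDFS: each line of each matching file is parsed as a JSON record
b = db.read_text('hdfs://path/to/*.json').map(json.loads)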
Cluster Setup
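A Dask cluster can be started over SSH: the scheduler and the workers are launched on remote hosts through SSH connections.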
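One way to do this from Python is dask.distributed.SSHCluster, which treats the first host as the scheduler and the remaining hosts as workers. The sketch below assumes dask[distributed] and asyncssh are installed locally, that Dask is installed on every host, and that passwordless SSH access is configured. The helper names ssh_cluster_kwargs and start_ssh_cluster, the default values and the minimum-host check are assumptions made for illustration, not part of Dask.

```python
def ssh_cluster_kwargs(hosts, nthreads=2, scheduler_port=8786):
    """Build keyword arguments for dask.distributed.SSHCluster.

    The first host runs the scheduler; every following host runs a worker.
    """
    # Our own sanity check (not enforced by Dask): a cluster without
    # workers cannot execute any tasks.
    if len(hosts) < 2:
        raise ValueError("need one scheduler host and at least one worker host")
    return {
        "hosts": list(hosts),
        "worker_options": {"nthreads": nthreads},
        "scheduler_options": {"port": scheduler_port},
    }


def start_ssh_cluster(hosts):
    """Start the scheduler and workers over SSH and return (cluster, client)."""
    # Imported lazily so the helper above works without Dask installed.
    from dask.distributed import Client, SSHCluster

    cluster = SSHCluster(**ssh_cluster_kwargs(hosts))
    return cluster, Client(cluster)


# Example call with hypothetical host names:
# cluster, client = start_ssh_cluster(["scheduler-host", "worker-1", "worker-2"])
```

Once a Client is connected, subsequent compute() calls run on the cluster rather than locally. The dask-ssh command-line tool offers a similar setup from the shell.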