DarkMatter in Cyberspace

Dask Notes


Dask is a distributed version of NumPy, pandas and scikit-learn.

File System

Dask can read data from the local file system and from HDFS. For example:

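import json

import dask.bag as db
import dask.dataframe as dd

# Local file system: the glob pattern matches many CSV files at once
df = dd.read_csv('data/2000-*-*.csv')

# HDFS: read line-delimited JSON and parse each line into a dict
b = db.read_text('hdfs://path/to/*.json').map(json.loads)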
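
To try the glob-based read locally, here is a minimal sketch. The helper names `write_sample_csvs` and `read_with_dask`, the column names, and the temporary directory are made up for illustration. Dask is imported inside the function, so the file-writing part runs even where Dask is not installed.

```python
import csv
import glob
import os
import tempfile


def write_sample_csvs(directory):
    """Write three tiny CSV files named like 2000-01-0N.csv (hypothetical data)."""
    paths = []
    for day in (1, 2, 3):
        path = os.path.join(directory, f"2000-01-{day:02d}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "value"])
            writer.writerow([day, day * 10])
        paths.append(path)
    return paths


def read_with_dask(pattern):
    """Read every file matching the glob pattern into one pandas DataFrame."""
    # Imported lazily so write_sample_csvs works without Dask installed.
    import dask.dataframe as dd

    df = dd.read_csv(pattern)  # lazy; each matched file yields at least one partition
    return df.compute()        # run the computation and collect the result


# Create the sample files and build the same kind of pattern used above.
sample_dir = tempfile.mkdtemp()
sample_paths = write_sample_csvs(sample_dir)
pattern = os.path.join(sample_dir, "2000-*-*.csv")
matched = sorted(glob.glob(pattern))
```

Calling `read_with_dask(pattern)` should then return the rows of all three files combined, assuming Dask and pandas are installed.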

Ref:

  • Remote Data
  • DataFrames: Read and Write Data
  • Create and Store Dask DataFrames

Cluster Setup

A Dask cluster can be started on remote machines over SSH.
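
One way to do this from Python is `SSHCluster` in `dask.distributed`, which treats the first host in its list as the scheduler and the remaining hosts as workers. The sketch below is only an outline: the helper names `build_host_list` and `start_ssh_cluster` are hypothetical, and it assumes passwordless SSH to every host, a matching Python environment on each machine, and the `asyncssh` package.

```python
def build_host_list(scheduler_host, worker_hosts):
    """Return hosts in the order SSHCluster expects: scheduler first, then workers."""
    if not worker_hosts:
        # Design choice of this sketch: refuse to start a cluster with no workers.
        raise ValueError("at least one worker host is required")
    return [scheduler_host, *worker_hosts]


def start_ssh_cluster(scheduler_host, worker_hosts):
    """Start a scheduler and workers over SSH and return (cluster, client)."""
    # Imported lazily so build_host_list can be used without Dask installed.
    from dask.distributed import Client, SSHCluster

    cluster = SSHCluster(build_host_list(scheduler_host, worker_hosts))
    client = Client(cluster)  # connect this Python session to the new scheduler
    return cluster, client
```

For example, `start_ssh_cluster("node0", ["node1", "node2"])` would run the scheduler on `node0` and one worker on each of the other two machines; the host names are placeholders. Close the client and the cluster when finished.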



Published

Jul 22, 2020

Last Updated

Jul 22, 2020
