DarkMatter in Cyberspace

Data Science on the Command Line


Tools

xsv installed with asdf and cargo.

feedgnuplot installed with apt.

display, from ImageMagick, installed with apt.

header and Rio are in the tools folder of the data-science-at-the-command-line repository. Delete the --vanilla option in the Rio script to use a customized R environment setup (--vanilla tells R to skip startup files such as ~/.Rprofile).
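A minimal sketch of that edit, assuming the script lives at tools/Rio in the cloned repository and that --vanilla appears there as a literal argument on its Rscript line. The helper name strip_vanilla is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every " --vanilla" token from a script,
# keeping the untouched original next to it as <script>.bak.
strip_vanilla() {
  local script=$1
  # keep the original as a backup in case the edit has to be reverted
  cp -- "$script" "$script.bak"
  # write back into the existing file so its permissions are kept
  sed 's/ --vanilla//g' "$script.bak" > "$script"
}
```

From the repository root, something like `strip_vanilla tools/Rio` followed by `export PATH="$PWD/tools:$PATH"` should make both header and the edited Rio available in the current shell.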

Examples

git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
cd data-science-at-the-command-line/data/ch07

< data/tips.csv Rio -ge 'g + geom_histogram(aes(bill))' | display
# press `q` to close the image window

< data/immigration.csv xsv select Period,Denmark,Netherlands,Norway,Sweden |
  Rio -d',' -re 'reshape2::melt(df, id="Period", variable.name="Country", value.name="Count")' |
  tee immigration-long.csv | head | xsv table
# `tee` saves the full long-format result to immigration-long.csv while passing it on to `head`
# `-d','` is unnecessary here: the input is already comma-separated
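For intuition about what reshape2::melt does in this pipeline, here is a rough awk stand-in for the wide-to-long step, followed by the same tee/head pattern. It assumes a plain CSV with no quoted fields, uses made-up numbers, and the name melt_csv is hypothetical. Unlike melt, it emits rows period by period rather than country by country:

```shell
#!/usr/bin/env bash
# Hypothetical wide-to-long converter: the first column is the id column,
# every other column becomes (id, column name, value) rows.
# Assumes no quoted fields or embedded commas.
melt_csv() {
  awk -F',' -v OFS=',' '
    NR == 1 {
      # remember the measure column names and print the long-format header
      for (i = 2; i <= NF; i++) name[i] = $i
      print $1, "Country", "Count"
      next
    }
    { for (i = 2; i <= NF; i++) print $1, name[i], $i }'
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# made-up sample data, not the real immigration.csv values
printf 'Period,Denmark,Norway\n1820,1,3\n1830,5,7\n' |
  melt_csv | tee "$tmp/long.csv" | head -n 3
# head shows 3 lines; long.csv still receives all 5 (header + 4 rows)
```

One caveat: once head exits, tee may be killed by SIGPIPE on large inputs, so the saved file could be cut short. For big data it is safer to write the file first and run head on it afterwards.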

< data/tips.csv xsv select size | header -d |
  feedgnuplot --terminal 'dumb 80,25' --histogram 0 --with boxes --unset grid --exit
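To check the numbers behind the bars, a plain count per value should line up with the histogram for integer data such as party size, assuming feedgnuplot's default bin width of 1 (roughly one bar per integer). value_counts is a hypothetical helper and the sample values are made up:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<value> <count>" for each distinct numeric
# value read from stdin, in ascending order of value.
value_counts() {
  sort -n | uniq -c | awk '{ print $2, $1 }'
}

# made-up party sizes, not the real tips.csv values
printf '%s\n' 2 3 2 4 2 3 1 | value_counts
```

With the real data this would be `< data/tips.csv xsv select size | header -d | value_counts`.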

seq 5 | awk '{print 2*$1, $1*$1}' |
  feedgnuplot --lines --points --legend 0 "data 0" --title "Test plot" \
  --y2 1 --unset grid --terminal 'dumb 80,40' --exit

# plot sin(x)
seq -15 15 | awk '{print $1, sin($1)}' | feedgnuplot --domain --lines --points \
  --unset grid --terminal 'dumb 120,30' --exit --legend 0 'sin(x)'
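Stepping x by 1 gives only 31 points, so the sine looks jagged. A minimal sketch of finer sampling with awk alone; the step of 0.1 is an arbitrary choice and sin_table is a hypothetical name:

```shell
#!/usr/bin/env bash
# Print "x sin(x)" for x = -15.0, -14.9, ..., 15.0 (301 points).
# Looping over integers and dividing by 10 avoids accumulated
# floating-point error that could drop the final x = 15 point.
sin_table() {
  awk 'BEGIN { for (i = -150; i <= 150; i++) { x = i / 10; print x, sin(x) } }'
}

sin_table | head -n 3
```

Its output can replace the `seq -15 15 | awk ...` stage in front of the same `feedgnuplot --domain --lines ...` command.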

Note the difference between --domain and --dataid:

--domain means the first column is used as the x values, instead of the row number.

--dataid, on the other hand, means the 1st, 3rd, 5th, ... columns are the IDs of the values in the 2nd, 4th, 6th, ... columns, respectively. So several curves can share the same columns, told apart by their IDs. For example, with --dataid, the dataset below:

1 1.0
1 2.0
2 1.5
2 2.5
1 3.0

will be plotted as two curves:

1 1.0
1 2.0
1 3.0

and

2 1.5
2 2.5
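The same regrouping can be reproduced with awk, which is a handy way to check how interleaved data will be split before plotting it. This sketch assumes one ID/value pair per line, as in the example; split_by_id is a hypothetical helper, and curves come out in order of first appearance, separated by blank lines:

```shell
#!/usr/bin/env bash
# Hypothetical helper: group "ID value" rows by ID (first column),
# keeping IDs in order of first appearance and rows in input order.
split_by_id() {
  awk '
    !($1 in curve) { order[++n] = $1 }   # remember first appearance of each ID
    { curve[$1] = curve[$1] $0 "\n" }    # append the whole row to its curve
    END {
      for (i = 1; i <= n; i++) {
        if (i > 1) print ""              # blank line between curves
        printf "%s", curve[order[i]]
      }
    }'
}

printf '1 1.0\n1 2.0\n2 1.5\n2 2.5\n1 3.0\n' | split_by_id
```

The plot itself should come from piping the same rows into `feedgnuplot --dataid --lines --points --terminal 'dumb 80,25' --exit`.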


Published

Feb 6, 2020

Last Updated

Feb 8, 2020

Category

Tech

Tags

  • data
  • shell
  • terminal
  • visualization
