DarkMatter in Cyberspace
  • Home
  • Categories
  • Tags
  • Archives

Manipulating CSV Files in Command Line


csvkit

Here we manipulate csv files with csvkit. It's developed with Python. So install it with sudo pip install csvkit;

Convert xls, xlsx file into csv file: in2csv input.xlsx > data.csv;

List column titles of a csv file: csvcut -n data.csv;

Print statistic information for each columns in a csv file: csvstat data.csv;

Pretty print the first 3 columns, first 5 rows of data: csvcut -c 1-3 agenda.csv | head -6 | csvlook.

Print the first, second, 11th, 16th column of first 9 records (another one for headline): csvcut -c 1,2,11,16 data.csv|head;

If there are newline characters in records, add "-l" option to make it clear: csvcut -l -c 1,2,11,16 data.csv|head;

Convert to json file with csvjson:, head fairData150104.csv|csvjson -i 4, where "-i" specify how many indent spaces, default is no indent. Or select some collomns to show: csvcut -c 1,2,11,16 fairData150104.csv|head|csvjson -i 4

Select some columns and save to a new file: csvcut -c <col-index> input.csv > output.csv. For example, save the 2nd and 3rd columns of file 276107.csv to file windspeed.csv: csvcut -c 2,3 276107.csv > windspeed.csv.

csvtool

Install: sudo aptitude install csvtool;

  • Pretty print the first 3 columns, first 5 rows of data: csvtool col 1-3 agenda.csv | head -6 | csvtool readable -.

  • csvtool sub data.csv:

    Take a square subset of the CSV, top left at row , column , which is deep and wide. and count from 1, or from 0 if -z option is given.

    Example: csvtool sub 1 16 1 1 data.csv

  • csvtool take data.csv: take first rows of the csv, like "head" in shell;

  • csvtool drop data.csv: drop the first rows and return the rest (if any).

See csvtool -h for details.

To print the nth record easily, add the following function into ~/.bash_alias file:

function ctls(){
  if test $# -eq 3; then
    csvtool drop $2 $1 | csvtool take $3 -
  else
    echo ctls: csvtool list specified record
    echo Synopsis: ctls '<filename> <startNo> <count>'
  fi
}

Now print the 3rd record of data.csv with ctls data.csv 3 1.

The "-" in "csvtool take" means "get input from stdin instead of a file".

Select some columns and save to a new file: csvtool col <col-index> input.csv > output.csv. For example, save the 1,2,3 and 6th columns of input file to output file: csvtool col 1-3,6 input.csv > output.csv.

Join some files column-wise:

$ cat aa.csv
col1,col2
11,21
12,22

$ cat bb.csv
col3,col4
31,41
32,42

$ csvtool paste aa.csv bb.csv 
col1,col2,col3,col4
11,21,31,41
12,22,32,42

Put them together

  • Count the number of records which the 7th field is empty: csvgrep -c 7 -r "^$" data.csv | csvtool height -;

  • Count the number of records which the 7th field is NOT empty: csvgrep -c 7 -r "^.+$" data.csv | csvtool height -;

  • Pretty print all rows whose 3rd column is Bob in file agenda.csv: csvgrep -c 3 -m 'Bob' agenda.csv| csvcut -c 1-3 | csvtool readable -

  • Print data on screen without line wrap: less -S <filename>;



Published

Jan 5, 2015

Last Updated

Sep 3, 2018

Category

Tech

Tags

  • command line 14
  • csv 4
  • linux 158

Contact

  • Powered by Pelican. Theme: Elegant by Talha Mansoor