Easily parse a CSV file at the command line with Ruby
Share this page | Read it later using CloudBreak Wallabag
If you’ve ever needed to handle a large CSV file at the command line, Ruby is a good option and would be more reliable at CSV parsing than awk. I found this necessary when trying to manipulate a CSV with more than 65k lines, which my available spreadsheet program (Apple’s Numbers) didn’t like.
This is the base command:
ruby -rcsv -ne 'puts $_.parse_csv'
Broken down:
-rcsvloads (“requires”) the built-in CSV library-eprovides a script to execute (rather than a filename)-nassumes awhile getsloop around the script provided with-e, makingrubyact a little likesedorawk
The $_ variable is the current line, and each line is manipulated separately in the stream. Once parsed, it’s a Ruby Array, so you can use any Enumerable or Array methods. For example, to get the very last column:
ruby -rcsv -ne 'puts $_.parse_csv.last'
# or
ruby -rcsv -ne 'puts $_.parse_csv[-1]'
You can pipe it like you’d expect in Unix as well. As an example, here’s how to get the maximum value for the last column of a big CSV file.
cat big.csv | ruby -rcsv -ne 'puts $_.parse_csv.last' | sort -n | tail -n1