CSV & Excel Parsing through Ruby

Ruby is versatile. If you mostly build web applications with Rails and only deal with Ruby in that capacity, it's good to remember that it is also an admirable server-side scripting language.

The CSV Library
Comma-Separated Values (CSV) is a plain-text format in which commas act as delimiters between fields. It's used almost everywhere. The dealio with CSV files and spreadsheets like Excel is that they are just files. You don't require the files themselves, though: you require Ruby's CSV library and then use it to read the file. Here's the Ruby documentation on [CSV](http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html).

The first thing:
require 'csv'

The next thing:
customers = CSV.read('Violations-2012.csv')

Here I am assigning the contents of the CSV file Violations-2012.csv to a variable called customers. CSV.read spat out a single array, which contained a multitude of other arrays (one per row). The point is that once the data lives in a Ruby object, you have access to a bunch of Ruby methods, such as each, map, and select.
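To make the array-of-arrays shape concrete, here is a minimal sketch. The sample data and column names are made up for illustration, and a Tempfile stands in for a real file such as Violations-2012.csv so the snippet runs on its own.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data standing in for a real CSV file on disk.
sample = Tempfile.new(['violations', '.csv'])
sample.write("id,type,fine\n1,Parking,50\n2,Speeding,120\n")
sample.close

# Without options, CSV.read returns an Array of Arrays.
# The header line is simply the first row.
rows = CSV.read(sample.path)

puts rows.class          # Array
puts rows.first.inspect  # ["id", "type", "fine"]
puts rows.length         # 3

# Ordinary Array methods now apply. Every field is a String
# unless you convert it yourself.
data_rows = rows.drop(1)
total_fine = data_rows.sum { |row| row[2].to_i }
puts total_fine          # 170

sample.unlink
```

Note that the header row has to be skipped by hand here, which is the main annoyance of working with the plain array form.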

Optionally, you can iterate through a CSV file to read its rows.
CSV.foreach('customers.csv') { |row| puts row.inspect }

In that same snippet you can also pass headers: true to CSV.foreach, right after the CSV filename. Each row is then yielded as a CSV::Row object instead of a plain array, so you can look up fields by header name and the header line is no longer treated as data. If you pass headers: true to CSV.read instead, you get back a CSV::Table object rather than an array of arrays.
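A small sketch of the headers: true behavior, again with invented sample data (the name and city columns are assumptions for the example, not part of the original file):

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample file; replace with your own CSV path.
sample = Tempfile.new(['customers', '.csv'])
sample.write("name,city\nAda,London\nGrace,New York\n")
sample.close

# With headers: true, foreach yields CSV::Row objects keyed by header name.
names = []
CSV.foreach(sample.path, headers: true) do |row|
  names << row['name']
end
puts names.inspect   # ["Ada", "Grace"]

# With headers: true, CSV.read returns a CSV::Table.
table = CSV.read(sample.path, headers: true)
puts table.class     # CSV::Table

cities = table['city']    # a String index returns the whole column
first_row = table[0].to_h # an Integer index returns a CSV::Row
puts cities.inspect       # ["London", "New York"]
puts first_row.inspect

sample.unlink
```

CSV.foreach reads one row at a time, so it tends to be the gentler choice for large files, while CSV.read loads everything into memory at once.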

How does Ruby read the data?
Every field in the file is read as a String. By default, each row of the CSV is represented as an Array of those strings, and CSV.read returns an Array of rows. With headers: true, each row becomes a CSV::Row and the file as a whole is recognized by Ruby as a CSV::Table object.

Methods you will be commonly using:
CSV.read, CSV.parse, CSV.foreach
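CSV.read and CSV.foreach work on files, while CSV.parse works on a String you already have in memory. A quick sketch of CSV.parse, using a made-up inventory string (the sku and qty columns are assumptions for the example):

```ruby
require 'csv'

# Hypothetical CSV text, e.g. from an upload or an HTTP response body.
text = "sku,qty\nA100,3\nB200,5\n"

# Plain parse: an Array of Arrays, all fields are Strings.
parsed = CSV.parse(text)
puts parsed.inspect  # [["sku", "qty"], ["A100", "3"], ["B200", "5"]]

# headers: true gives a CSV::Table; converters: :numeric turns
# numeric-looking fields into Integers or Floats.
table = CSV.parse(text, headers: true, converters: :numeric)
quantities = table.map { |row| row['qty'] }
puts quantities.inspect  # [3, 5]
```

This is also a handy form for experimenting in IRB, since no file is needed.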

I would play around with these methods in IRB first, and then add the code I needed to my application.

Ruby Gems for Some Extra Juice
  • The [Spreadsheet gem](https://rubygems.org/gems/spreadsheet/versions/1.1.3) is a good resource. It's designed to work only with Microsoft Excel files.
  • [FasterCSV](https://rubygems.org/gems/fastercsv), "is intended as a complete replacement to the CSV standard library. It is significantly faster and smaller while still being pure Ruby code. It also strives for a better interface." This gem has the most downloads in the CSV category. Note that FasterCSV became the standard CSV library as of Ruby 1.9, so you only need the gem on Ruby 1.8.
  • [Smarter CSV](https://rubygems.org/gems/smarter_csv), "Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"

Tutorials
  • The SitePoint tutorials were really the only helpful resource I could find about the CSV library in Ruby. Here's Part1 and Part2 of the series.
  • This blog post by Udemy is an easy read.