Using Ruby CSV to extract one column - ruby

I've been trying to work with getting a single column out of a csv file.
I've gone through the documentation, http://www.ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
but still don't really understand how to use it.
If I use CSV.table, the response is incredibly slow compared to CSV.read. I admit the dataset I'm loading is quite large, which is exactly the reason I only want to get a single column from it.
My request is simply currently looks like this
#dataTable = CSV.table('path_to_csv.csv')
and when I debug I get a response of
#<CSV::Table mode:col_or_row row_count:2104 >
The documentation says I should be able to use by_col(), but when I try to output
<%= debug #dataTable.by_col('col_name or index') %>
It gives me "undefined method 'col' error"
Can somebody explain to me how I'm supposed to use CSV? and if there is a way to get columns faster using 'read' instead of 'table'?
I'm using Ruby 1.92, which says that it is using fasterCSV, so I don't need to use the FasterCSV gem.

To pluck a column out of a csv I'd probably do something like the following:
col_data = []
CSV.foreach(FILENAME) {|row| col_data << row[COL_INDEX]}
That should be substantially faster than any operations on CSV.Table

You can get the values from single column of the csv files using the following snippet.
#dataTable = CSV.table('path_to_csv.csv')
#dataTable[:columnname]

Related

Ruby and Excel Data Extraction

I am learning Ruby and trying to manipulate Excel data.
my goal:
To be able to extract email addresses from an excel file and place them in a text file one per line and add a comma to the end.
my ideas:
i think my answer lies in the use of spreadsheet and File.new.
What I am looking for is direction. I would like to hear any tips or rather hints to accomplish my goal. thanks
Please do not post exact code only looking for direction would like to figure it out myself...
thanks, karen
UPDATE::
So, regex seems to be able to find all matching strings and store them into an array. I´m having some trouble setting that up but should be able to figure it out....but for right now to get started I will extract only the column labeled "E Mail"..... the question I have now is:
`parse_csv = CSV.parse(read_csv, :headers => true)`
The default value for :skip_blanks is set to false.. I need to set it to true but nowhere can I find the correct syntax for doing so... I was assumming something like
`parse_csv = CSV.parse(read_csv, :headers => true :skip_blanks => true)`
But no.....
save your excel file as csv (comma separated value) and work with Ruby's libraries
besides spreadsheet (which can read and write), you can read Excel and other file types with with RemoteTable.
gem install remote_table
and
require 'remote_table'
t = RemoteTable.new('/path/to/file.xlsx', headers: :first_row)
when you write the CSV, as #aug2uag says, you can use ruby's standard library (no gem install required):
require 'csv'
puts [name, email].to_csv
Personally, I'd keep it as simple as possible and use a CSV.
Here is some pseudocode of how that would work:
read in your file line by line
extract your fields using regex, or cell count (depending on how consistent the email address location is), and insert into an arry
iterate through the array and write the values in the fashion you wish (to console, or file)
The code in the comment you had is a great start, however, puts will only write to console, not file. You will also need to figure out how you are going to know you are getting the email address.
Hope this helps.

How do I create a copy of some columns of a CSV file in Ruby with different data in one column?

I have a CSV file called "A.csv". I need to generate a new CSV file called "B.csv" with data from "A.csv".
I will be using a subset of columns from "A.csv" and will have to update one column's values to new values in "B.csv". Ultimately, I will use this data from B.csv to validate against a database.
How do I create a new CSV file?
How do I copy the required columns' data from A.csv to "B.csv"?
How do I append values for a particular column?
I am new to Ruby, but I am able to read CSV to get an array or hash.
As mikeb pointed out, there are the docs - http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html - Or you can follow along with the examples below (all are tested and working):
To create a new file:
In this file we'll have two rows, a header row and data row, very simple CSV:
require "csv"
CSV.open("file.csv", "wb") do |csv|
csv << ["animal", "count", "price"]
csv << ["fox", "1", "$90.00"]
end
result, a file called "file.csv" with the following:
animal,count,price
fox,1,$90.00
How to append data to a CSV
Almost the same formula as above only instead of using "wb" mode, we'll use "a+" mode. For more information on these see this stack overflow answer: What are the Ruby File.open modes and options?
CSV.open("file.csv", "a+") do |csv|
csv << ["cow", "3","2500"]
end
Now when we open our file.csv we have:
animal,count,price
fox,1,$90.00
cow,3,2500
Read from our CSV file
Now you know how to copy and to write to a file, to read a CSV and therefore grab the data for manipulation you just do:
CSV.foreach("file.csv") do |row|
puts row #first row would be ["animal", "count", "price"] - etc.
end
Of course, this is like one of like a hundred different ways you can pull info from a CSV using this gem. For more info, I suggest visiting the docs now that you have a primer: http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
Have you seen Ruby's CSV class? It seems pretty comprehensive. Check it out here:
http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
You will probably want to use CSV::parse to help Ruby understand your CSV as the table of data that it is and enable easy access to values by header.
Unfortunately, the available documentation on the CSV::parse method doesn't make it very clear how to actually use it for this purpose.
I had a similar task and was helped much more by How to Read & Parse CSV Files With Ruby on rubyguides.com than by the CSV class documentation or by the answers pointing to it from here.
I recommend reading that page in its entirety. The crucial part is about transforming a given CSV into a CSV::Table object using:
table = CSV.parse(File.read("cats.csv"), headers: true)
Now there's documentation on the CSV::Table class, but again you might be helped more by the clear examples on the rubyguides.com page. One thing I'll highlight is that when you tell .parse to expect headers, the resulting table will treat the first row of data as row [0].
You will probably be especially interested in the .by_col method available for your new Table object. This will allow you to iterate through different column index positions in the input and/or output and either copy from one to the other or add a new value to the output. If I get it working, I'll come back and post an example.

Read Dates as Strings with Spreadsheet Ruby Gem

I have been looking for a way to read out an Excel spreadsheet with keeping the dates that are in them being kept as a string. Unfortunately I can't see if this is possible or not, has anyone managed to do this or know how?
You may want to have a look at the Row class of the spreadsheet gem:
http://spreadsheet.rubyforge.org/Spreadsheet/Row.html
There's a lot that you can get there, but the Row#formatted method is probably what you want:
row = sheet.to_a[row_index] # Get row object
value = row.formatted[column_index]
The formatted method takes all the Excel formatting data for you and gives you an array of Ruby-classed objects
I think you can try row.at(col_index) method..
You can refer to this page

Ruby: parse csv data to pdf

I am using prawn gem to generate pdf document
how one can parse data from csv to pdf
i have used the code some thing like this
Prawn::Document.generate("user.pdf", :page_layout => :landscape) do
exa_url = "D:/userReport.csv"
csv_data = open(exa_url).read.lines.to_a
headers = CSV.parse(csv_data[6]).first
body = CSV.parse(csv_data[7..-1].join)
table body, :headers => headers, :font_size =>10, :position=>:centere
end
it me gives an error,
Is there any other approach, or other advice to fix this
headers = CSV.parse(csv_data[6]).first doesn't mention pos, but your error message does. So you are looking at some error inside CSV.parse (I guess).
Most likely your csv_data Array doesn't contain what you think it does or is otherwise invalid.
In other words, it sounds like you're dealing solely with an issue of parsing CSV data. Try reducing your code to a simpler case and investigate your CSV data.
Good luck.
Sounds like you have an issue with opening your CSV file. I'd recommend doing some irb detective work and making sure csv_data = open(exa_url).read.lines.to_a is working. Sounds like you're trying to interact with csv_data and it's nil.

Ruby parse comma separated text file

I need some help with a Ruby script I can call from the console. The script needs to parse a simple .txt file with comma separated values.
value 1, value2, value3, etc...
The values needs to be added to the database.
Any suggestions?
array = File.read("csv_file.txt").split(",").map(&:strip)
You will get the values in the array and use it to store to database. If you want more functions, you can make use of FasterCSV gem.
Ruby 1.9.2 has a very good CSV library which is useful for this stuff: http://www.ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
On earlier versions of Ruby you could use http://fastercsv.rubyforge.org/ (which essentially became CSV in 1.9.2)
You could do it manually by reading the file into a string and using .split(',') but I'd go with one of the libraries above.
Quick and dirty solution:
result = []
File.open("<path-to-file>","r") do |handle|
handle.each_line do |line|
result << line.split(",").strip
end
end # closes automatically when EOF reached
result.flatten!
result # => big array of values
Now you can iterate the result array and save the values to the database.
This simple file iteration doesn't take care for order or special fields, because it wasn't mentioned in the question.
Something easy to get you started:
IO.readlines("csv_file.txt", '').each do |line|
values = line.split(",").collect(&:strip)
# do something with the values?
end
Hope this helps.

Resources