I am learning Ruby and trying to manipulate Excel data.
My goal:
To extract email addresses from an Excel file and write them to a text file, one per line, with a comma at the end of each.
My ideas:
I think my answer lies in the use of the spreadsheet gem and File.new.
What I am looking for is direction. I would like to hear any tips or hints for accomplishing my goal.
Please do not post exact code; I am only looking for direction and would like to figure it out myself.
Thanks, Karen
UPDATE:
So, a regex seems to be able to find all matching strings and store them in an array. I'm having some trouble setting that up but should be able to figure it out. For now, to get started, I will extract only the column labeled "E Mail". The question I have now is about this line:
`parse_csv = CSV.parse(read_csv, :headers => true)`
The default value for :skip_blanks is false. I need to set it to true, but I cannot find the correct syntax for doing so anywhere. I was assuming something like
`parse_csv = CSV.parse(read_csv, :headers => true :skip_blanks => true)`
But that does not work.
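For what it is worth, the attempt above appears to fail only because the two options are not separated by a comma. A minimal sketch, assuming `read_csv` holds the CSV text as a string (the sample data here is made up for illustration):

```ruby
require 'csv'

# Made-up sample standing in for the asker's read_csv string.
read_csv = "Name,E Mail\nAnn,ann@example.com\n\nBob,bob@example.com\n"

# Options are ordinary hash arguments, so they are separated by commas.
# skip_blanks drops the empty line in the middle of the sample.
parse_csv = CSV.parse(read_csv, :headers => true, :skip_blanks => true)

# With headers enabled, indexing the table by header name returns that column.
emails = parse_csv["E Mail"]
puts emails.inspect
```

The newer `headers: true, skip_blanks: true` spelling is equivalent.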
Save your Excel file as CSV (comma-separated values) and work with Ruby's libraries.
Besides spreadsheet (which can read and write), you can read Excel and other file types with RemoteTable.
gem install remote_table
and
require 'remote_table'
t = RemoteTable.new('/path/to/file.xlsx', headers: :first_row)
When you write the CSV, as @aug2uag says, you can use Ruby's standard library (no gem install required):
require 'csv'
puts [name, email].to_csv
Personally, I'd keep it as simple as possible and use a CSV.
Here is some pseudocode of how that would work:
read in your file line by line
extract your fields using regex, or cell count (depending on how consistent the email address location is), and insert them into an array
iterate through the array and write the values in the fashion you wish (to console, or file)
The code in the comment you had is a great start; however, puts only writes to the console, not to a file. You will also need to decide how you will recognize which values are email addresses.
Hope this helps.
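For readers who do want to see the pseudocode above spelled out, here is one possible sketch. The helper names, the sample data, the output file name `emails.txt`, and the deliberately loose email pattern are all assumptions made for illustration, not part of the original question:

```ruby
require 'csv'

# Loose, illustrative pattern; it is not a complete email validator.
EMAIL_PATTERN = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

# Scan every cell of the CSV text and collect anything that looks like an email.
def extract_emails(csv_text)
  CSV.parse(csv_text).flatten.compact.flat_map { |cell| cell.scan(EMAIL_PATTERN) }
end

# Write one address per line, each followed by a comma.
def write_emails(emails, path)
  File.open(path, "w") do |file|
    emails.each { |email| file.puts("#{email},") }
  end
end

# Made-up sample data standing in for the exported spreadsheet.
sample = "Name,E Mail\nAnn,ann@example.com\nBob,bob@example.com\n"
write_emails(extract_emails(sample), "emails.txt")
```

Scanning every cell avoids relying on the email column always being in the same position, at the cost of picking up any address-like text elsewhere in the sheet.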
I have a CSV file called "A.csv". I need to generate a new CSV file called "B.csv" with data from "A.csv".
I will be using a subset of columns from "A.csv" and will have to update one column's values to new values in "B.csv". Ultimately, I will use this data from B.csv to validate against a database.
How do I create a new CSV file?
How do I copy the required columns' data from A.csv to "B.csv"?
How do I append values for a particular column?
I am new to Ruby, but I am able to read CSV to get an array or hash.
As mikeb pointed out, there are the docs - http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html - Or you can follow along with the examples below (all are tested and working):
To create a new file:
In this file we'll have two rows, a header row and data row, very simple CSV:
require "csv"
CSV.open("file.csv", "wb") do |csv|
  csv << ["animal", "count", "price"]
  csv << ["fox", "1", "$90.00"]
end
The result is a file called "file.csv" with the following:
animal,count,price
fox,1,$90.00
How to append data to a CSV
Almost the same formula as above, only instead of "wb" mode we'll use "a+" mode. For more information on these modes, see this Stack Overflow answer: What are the Ruby File.open modes and options?
CSV.open("file.csv", "a+") do |csv|
  csv << ["cow", "3", "2500"]
end
Now when we open our file.csv we have:
animal,count,price
fox,1,$90.00
cow,3,2500
Read from our CSV file
Now you know how to create a file and append to it. To read a CSV and grab its data for manipulation, you just do:
CSV.foreach("file.csv") do |row|
  puts row # the first row would be ["animal", "count", "price"], etc.
end
Of course, this is just one of many ways to pull data from a CSV using this library (CSV ships with Ruby's standard library). For more info, I suggest visiting the docs now that you have a primer: http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
Have you seen Ruby's CSV class? It seems pretty comprehensive. Check it out here:
http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
You will probably want to use CSV::parse to help Ruby understand your CSV as the table of data that it is and enable easy access to values by header.
Unfortunately, the available documentation on the CSV::parse method doesn't make it very clear how to actually use it for this purpose.
I had a similar task and was helped much more by How to Read & Parse CSV Files With Ruby on rubyguides.com than by the CSV class documentation or by the answers pointing to it from here.
I recommend reading that page in its entirety. The crucial part is about transforming a given CSV into a CSV::Table object using:
table = CSV.parse(File.read("cats.csv"), headers: true)
Now there's documentation on the CSV::Table class, but again you might be helped more by the clear examples on the rubyguides.com page. One thing I'll highlight is that when you tell .parse to expect headers, the resulting table will treat the first row of data as row [0].
You will probably be especially interested in the .by_col method available for your new Table object. This will allow you to iterate through different column index positions in the input and/or output and either copy from one to the other or add a new value to the output. If I get it working, I'll come back and post an example.
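As a rough illustration of the approach described in this answer, the sketch below parses a small stand-in for A.csv into a CSV::Table, copies a subset of columns into B.csv, and changes one column's values on the way. The column names, sample rows, and the upcasing rule are invented for the example:

```ruby
require 'csv'

# Made-up stand-in for A.csv so the sketch runs on its own.
File.write("A.csv", "id,name,status,notes\n1,fox,old,x\n2,cow,old,y\n")

table = CSV.parse(File.read("A.csv"), headers: true)

# With headers enabled, table[0] is the first data row, not the header row.
first = table[0]

# by_col switches to column access, so an integer index selects a column.
names = table.by_col[1]

CSV.open("B.csv", "w") do |csv|
  csv << ["id", "status"]                    # header row for the chosen columns
  table.each do |row|
    csv << [row["id"], row["status"].upcase] # copy one column, update another
  end
end
```

Iterating row by row and building each output row explicitly keeps the column selection and the value update in one place.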
I've been trying to work with getting a single column out of a csv file.
I've gone through the documentation, http://www.ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
but still don't really understand how to use it.
If I use CSV.table, the response is incredibly slow compared to CSV.read. I admit the dataset I'm loading is quite large, which is exactly the reason I only want to get a single column from it.
My request currently looks like this
@dataTable = CSV.table('path_to_csv.csv')
and when I debug I get a response of
#<CSV::Table mode:col_or_row row_count:2104 >
The documentation says I should be able to use by_col(), but when I try to output
<%= debug @dataTable.by_col('col_name or index') %>
it gives me an "undefined method 'col'" error.
Can somebody explain to me how I'm supposed to use CSV, and whether there is a way to get columns faster using 'read' instead of 'table'?
I'm using Ruby 1.9.2, which says that it is using FasterCSV, so I don't need to use the FasterCSV gem.
To pluck a column out of a csv I'd probably do something like the following:
col_data = []
CSV.foreach(FILENAME) {|row| col_data << row[COL_INDEX]}
That should be substantially faster than any operations on CSV::Table.
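If the column is easier to identify by its header than by its index, the same row-by-row approach works with the `headers: true` option. A small sketch, with a made-up file name and data:

```ruby
require 'csv'

# Made-up sample file so the sketch is self-contained.
File.write("animals.csv", "animal,count,price\nfox,1,$90.00\ncow,3,2500\n")

col_data = []
# With headers: true each row is a CSV::Row, so fields can be looked up by name
# and the header line itself is not yielded as data.
CSV.foreach("animals.csv", headers: true) { |row| col_data << row["count"] }

puts col_data.inspect
```

Values come back as strings unless a converter is requested.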
You can get the values of a single column of the CSV file using the following snippet.
@dataTable = CSV.table('path_to_csv.csv')
@dataTable[:columnname]
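One detail that may explain trouble with column lookups: CSV.table converts header names to symbols and applies numeric converters, so columns are addressed with symbols rather than strings. A minimal sketch with invented data:

```ruby
require 'csv'

# Made-up sample file so the sketch is self-contained.
File.write("animals.csv", "animal,count,price\nfox,1,$90.00\ncow,3,2500\n")

table = CSV.table("animals.csv")

headers = table.headers      # header names become symbols
counts  = table[:count]      # numeric-looking fields are converted to numbers
animals = table.by_col[0]    # by_col allows selecting a column by position

puts headers.inspect, counts.inspect, animals.inspect
```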
I have the following CSV file:
Date,Av,Sec,128,440,1024,Mixed,,rule,sn,version
6/30/2010,3.40,343,352.0,1245.8,3471.1,650.7,Mbps,on,s-2.8.6-38,4.9.1-229,,vs. 342,-0.26%,-0.91%,1.51%,-0.97%
6/24/2010,3.40,342,352.9,1257.2,3419.5,657.1,Mbps,on,s-2.8.6-38,4.9.1-229,,vs. 341,0.23%,0.50%,-1.34%,0.67%
6/17/2010,3.40,341,352.1,1251.0,3466.1,652.7,Mbps,on,s-2.8.6-38,4.9.1-229,,vs. 340,7.77%,5.32%,9.04%,1.71%
6/14/2010,3.40,340,326.7,1187.8,3178.7,641.7,Mbps,on,s-2.8.6-38,4.9.1-229,,vs. 339,-0.88%,-0.34%,-0.95%,0.05%
6/11/2010,3.40,339,329.6,1191.9,3209.2,641.4,Mbps,on,s-2.8.6-38,4.9.1-229,,vs. 338,0.58%,0.51%,-1.83%,0.99%
6/11/2010,3.40,338,327.7,1185.8,3269.1,635.1,Mbps,on,s-2.8.6-38,4.9.1-229,,vs. 335,-0.40%,-0.44%,1.46%,-1.96%
6/11/2010,3.40,335,329.0,1191.0,3221.9,647.8,Mbps,on,s-2.8.6-38,4.9.1-229,,vs. 333,-6.83%,-4.70%,-7.04%,-0.32%
6/11/2010,3.40,333,353.1,1249.8,3465.8,649.9,Mbps,on,s-2.8.6-38,4.9.1-229,,vs. 332,2.53%,2.02%,1.71%,2.14%
and I want to parse columns 4, 5, 6 and 7 and have four arrays, on which I can do operations like create a line graph against time, etc.
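You need the Ruby CSV library, which ships with Ruby. Example (CSV::Reader existed only in Ruby 1.8; from 1.9 on, CSV.foreach does the same job):
require 'csv'
require 'pp'
# CSV.foreach opens the file and yields each row as an array of strings.
CSV.foreach('bar.csv') do |row|
  # Indexes are zero-based, so 4..7 covers the fifth through eighth fields.
  pp row[4..7]
end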
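To end up with four separate arrays rather than one slice per row, the slices can be transposed. This sketch assumes "columns 4, 5, 6 and 7" is counted from 1 (the 128, 440, 1024 and Mixed columns, zero-based indexes 3..6) and uses a trimmed copy of the first two data rows; the variable names are made up:

```ruby
require 'csv'

# Trimmed sample of the data from the question (first two rows, first seven fields).
data = <<~CSV_TEXT
  Date,Av,Sec,128,440,1024,Mixed
  6/30/2010,3.40,343,352.0,1245.8,3471.1,650.7
  6/24/2010,3.40,342,352.9,1257.2,3419.5,657.1
CSV_TEXT

rows = CSV.parse(data)[1..-1]          # drop the header row
dates = rows.map { |row| row[0] }      # x-axis values for a time plot

# Take fields 3..6 of every row as floats, then flip rows into columns.
col_128, col_440, col_1024, col_mixed =
  rows.map { |row| row[3..6].map(&:to_f) }.transpose

puts col_128.inspect, col_mixed.inspect
```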
Why reinvent the wheel?
Use the fastercsv gem or the built-in csv library.
You are spoiled for choice in parsing CSVs with Ruby, as there are options included in the standard lib, as well as easy home-brewed methods or Open Source libs.
You can start with the examples on "How to parse CSV data with Ruby", and that should point you in the direction for digging deeper.
You can use the smarter_csv Ruby gem with a :key_mapping to ignore unwanted input columns.
See:
https://github.com/tilo/smarter_csv
I have an array like "author","post title","date","time","post category", etc.
I scrape the details from a forum and I want to
save the data using ruby
update the data using ruby
update the data using a text editor, or perhaps one of the OpenOffice programs? Calc would be the best.
I guess some kind of SQL database would be a solution, but I need a quick solution (something that I can do by myself :-)
Any suggestions?
Thank you
YAML is your friend here.
require "yaml"
yaml = ["author","post title","date","time","post category"].to_yaml
File.open("filename", "w") do |f|
  f.write(yaml)
end
this will give you
---
- author
- post title
- date
- time
- post category
Going the other way, you get
require "yaml"
YAML.load(File.read("filename")) # => ["author","post title","date","time","post category"]
YAML is easily human-readable, so you can edit it with any text editor (not a word processor like OpenOffice Writer). You are not limited to serializing arrays and strings: YAML works out of the box for most Ruby objects, even for objects of user-defined classes. This is a good introduction to the YAML syntax: http://yaml.kwiki.org/?YamlInFiveMinutes.
If you want to use a spreadsheet, CSV is the way to go. You can use the stdlib csv API like this (CSV::Writer existed only in Ruby 1.8; from 1.9 on, CSV.open replaces it):
require 'csv'
my2DArray = [[1,2],["foo","bar"]]
# CSV.open opens the file itself and quotes fields where needed.
CSV.open('data.csv', 'w') do |csv|
  my2DArray.each do |row|
    csv << row
  end
end
You can then open the resulting file in Calc or in most statistics applications.
The same API can be used to re-import the result in Ruby if you need.
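A short sketch of that round trip, using the same sample array. One thing to keep in mind is that CSV stores only text, so numbers come back as strings unless a converter is requested:

```ruby
require 'csv'

my_2d_array = [[1, 2], ["foo", "bar"]]

# Write the rows out.
CSV.open("data.csv", "w") do |csv|
  my_2d_array.each { |row| csv << row }
end

# Read everything back as an array of arrays of strings.
reloaded = CSV.read("data.csv")

# Ask for numeric conversion to get integers and floats back.
converted = CSV.read("data.csv", converters: :numeric)

puts reloaded.inspect, converted.inspect
```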
You could serialize it to JSON and save it to a file. This would allow you to edit it using a simple text editor.
If you want to edit it in something like Calc, you could consider generating a CSV (comma-separated values) file and importing it.
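The JSON route could look roughly like the sketch below. The record fields mirror the array in the question, while the values and the file name `posts.json` are invented:

```ruby
require 'json'

# Made-up scraped records using the field names from the question.
posts = [
  { "author" => "alice", "post title" => "Hello", "date" => "2010-06-30",
    "time" => "12:00", "post category" => "general" }
]

# pretty_generate adds indentation, which makes hand-editing easier.
File.write("posts.json", JSON.pretty_generate(posts))

# Load the (possibly hand-edited) file back into Ruby objects.
loaded = JSON.parse(File.read("posts.json"))
puts loaded.first["author"]
```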
If I understand correctly, you have a two-dimensional array. You could output it in csv format like so:
array.each do |row|
  puts row.join(",")
end
Then you can import it into Calc to edit it, or just use a text editor.
If your data might contain commas, you should have a look at the csv module instead:
http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
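To make the comma problem concrete, this small sketch compares a plain join with the csv library's quoting. The sample row is made up:

```ruby
require 'csv'

row = ["Smith, John", "plain"]

# A plain join loses the field boundary: the result looks like three fields.
naive = row.join(",")

# Array#to_csv (added by the csv library) quotes fields that contain commas.
safe = row.to_csv

puts naive
puts safe

# Parsing the quoted line recovers the original two fields.
round_trip = CSV.parse_line(safe)
```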