Ruby and Excel Data Extraction - ruby

I am learning Ruby and trying to manipulate Excel data.
my goal:
To be able to extract email addresses from an excel file and place them in a text file one per line and add a comma to the end.
my ideas:
i think my answer lies in the use of spreadsheet and File.new.
What I am looking for is direction. I would like to hear any tips or rather hints to accomplish my goal. thanks
Please do not post exact code only looking for direction would like to figure it out myself...
thanks, karen
UPDATE::
So, regex seems to be able to find all matching strings and store them into an array. I´m having some trouble setting that up but should be able to figure it out....but for right now to get started I will extract only the column labeled "E Mail"..... the question I have now is:
`parse_csv = CSV.parse(read_csv, :headers => true)`
The default value for :skip_blanks is set to false.. I need to set it to true but nowhere can I find the correct syntax for doing so... I was assumming something like
`parse_csv = CSV.parse(read_csv, :headers => true :skip_blanks => true)`
But no.....

save your excel file as csv (comma separated value) and work with Ruby's libraries

besides spreadsheet (which can read and write), you can read Excel and other file types with with RemoteTable.
gem install remote_table
and
require 'remote_table'
t = RemoteTable.new('/path/to/file.xlsx', headers: :first_row)
when you write the CSV, as #aug2uag says, you can use ruby's standard library (no gem install required):
require 'csv'
puts [name, email].to_csv

Personally, I'd keep it as simple as possible and use a CSV.
Here is some pseudocode of how that would work:
read in your file line by line
extract your fields using regex, or cell count (depending on how consistent the email address location is), and insert into an arry
iterate through the array and write the values in the fashion you wish (to console, or file)
The code in the comment you had is a great start, however, puts will only write to console, not file. You will also need to figure out how you are going to know you are getting the email address.
Hope this helps.

Related

How to make a CSV file into a hash

In ruby, if I have a CSV file that looks like this:
make,model,color,doors
dodge,charger,black,4
ford,focus,blue,5
nissan,350z,black,2
mazda,miata,white,2
honda,civid,brown,4
corvette,stingray,red,2
ford,fiesta,blue,5
How would I be able to turn this CSV data into a hash and be able to manipulate it as a regular hash. I tried looking for an answer in StackOverflow but could not seem to find one. I tried to find other example online as well but they did not exactly work.
You need to just try
require 'csv'
CSV.open(filename, headers: :first_row).map(&:to_h)
Also you can get reference form below link
Convert CSV file into array of hashes

Write to a ruby file using ruby

Alright, so what I have is a ruby file that takes an input, and writes it to another ruby file. I do not want to write it as a text file, because I am trying to insert this item into a Hash that can later be accessed in another run of the program, which can only be achieved by writing the info to a text file or another ruby file. In this case I want to write it into another ruby file.Here's the first file:
test_text=gets.chomp
to_write_to=File.open("rubylib.rb", "a")
test_text="hobby => #{test_test},"
to_write_to.puts test_text
This inserts the given info at the BOTTOM of the page. The other file is this: (rubylib.rb)
user_info={
"name" => "bob",,
"favorite_color" => "red"
}
I have a threefold question:
1) Is it possible to add test_text to the hash BEFORE the closing bracket?
2) using this method, will the rubylib.rb file, when run, parse the added text as code, or something else?
3)is there a better way to do this?
What I am trying to do is actually physically write the new data to the Hash so that it is still there the next time the file is run, to store data about the user. Because if I add it the normal way, it will be lost the next time the file is run. Is there a way to store data between runs of a ruby file without writing to a text file?
I've done the best I can to give you the info you need and explain the situation as best I can. If you need clarification or more info, please leave a comment and I'll try and get back to you by commenting on that.
Thanks for the help
You should use YAML for this.
Here's how you could create a .yml file with the data you used in your example:
require "yaml"
user_info = { "name" => "bob", "favorite_color" => "red" }
File.write("user_info.yml", user_info.to_yaml)
This creates a file that looks like this:
---
name: bob
favorite_color: red
On a subsequent execution of your program, you can load the .yml file and you'll get back the same Hash that you started with:
user_info = YAML.load_file("user_info.yml")
# => { "name" => "bob", "favorite_color" => "red" }
And you can add new items to the Hash and save it again:
user_info["hobby"] = "fishing"
File.write("user_info.yml", user_info.to_yaml)
Now the file has these contents:
---
name: bob
favorite_color: red
hobby: fishing
Use a database, even SQLite, and it'll let you store data for multiple sessions without any sort of encoding. Writing to a file as you are is really not scalable or practical. You'll slam into some real problems quickly with it.
I'd recommend looking at Sequel and its associated documentation for how to easily work with databases. That's a much more scalable approach and will save you a lot of headaches as you grow your code.

Extract JSON values from remote api with Ruby

I'm trying to grab some data from last.fm and use it in a simple sinatra app. I've worked out how to open the document but having issues extracting the data in ruby here is the first list of the API data I'd like to grab the name:
{"similarartists":{"artist":[{"name":"Sonny & Cher"}]}
This is just an extract of the return, I'm using this in my rb file:
require 'json'
require 'open-uri'
data = JSON.parse(open("http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=editors&api_key=xxx&format=json").read)
puts data["similarartists"]["artist"]["name"]
It doesn't seem to be working I get can't convert String into Integer (TypeError) on ruby 1.9.3 but the name in the JSON isn't an integer? If I just put the following:
puts data["similarartists"]["artist"]
It returns the whole thing, but I want to grab inside of that and get the name.
"name"=>"Interpol"
I don't understand why it would complain about integers when the name is a string? Hope someone can help me!
Based on the comments thread, the issue is a misunderstanding of the structure of the data returned from the API call.
The exact issue was the structure had an array of artists under the artist key so to get at the name you need to do:
data['similarartists']['artist'][0]['name']
Note though that you should only do that if you are sure there will only be one artist. The nature of the return data suggests that won't always be the case so you might be better off pulling all names depending on your use doing something like:
data['similarartists']['artist'].map {|a| a['name']}.join(',')
That will join all of the artist names together comma separated.
In the future, you can track this issue down by looking at the full structure of the return data and making sure you see the correct structure. The docs on the API may indicate some help here too.
You also might check if someone has made a gem for accessing the API. Often a gem will up-level some of this raw output and give you a nice object to work with. I suggest searching GitHub for a last.fm gem.
The problem is that you are trying to access an Array with the index "name", Ruby tries to convert this to an Integer and fails which results in the Error message you are seeing.
If you test the class of data["similarartists"]["artist"].class you will see that it returns Array. So basically what is happening is that the JSON.parse() called created as the value of data["similarartists"]["artist"] an Array of Hashes. To access all of the artist names you can simply iterate through this array:
require 'json'
require 'open-uri'
data = JSON.parse(open("http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=editors&api_key=29da5a0e01ca2d1524cac596d5462d67&format=jso\
n").read)
# iterate through the Array of returned artists and print their names
data["similarartists"]["artist"].each do |artist|
puts artist["name"]
end
# output
# Interpol
# White Lies
# The Cinematics
# Smith & Burrows
# The National
# Julian Plenti
# She Wants Revenge
# etc ...
If you only want the first entry for Interpol you can just use index [0]:
puts data["similarartists"]["artist"][0]["name"]

Read Dates as Strings with Spreadsheet Ruby Gem

I have been looking for a way to read out an Excel spreadsheet with keeping the dates that are in them being kept as a string. Unfortunately I can't see if this is possible or not, has anyone managed to do this or know how?
You may want to have a look at the Row class of the spreadsheet gem:
http://spreadsheet.rubyforge.org/Spreadsheet/Row.html
There's a lot that you can get there, but the Row#formatted method is probably what you want:
row = sheet.to_a[row_index] # Get row object
value = row.formatted[column_index]
The formatted method takes all the Excel formatting data for you and gives you an array of Ruby-classed objects
I think you can try row.at(col_index) method..
You can refer to this page

Using Ruby CSV to extract one column

I've been trying to work with getting a single column out of a csv file.
I've gone through the documentation, http://www.ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
but still don't really understand how to use it.
If I use CSV.table, the response is incredibly slow compared to CSV.read. I admit the dataset I'm loading is quite large, which is exactly the reason I only want to get a single column from it.
My request is simply currently looks like this
#dataTable = CSV.table('path_to_csv.csv')
and when I debug I get a response of
#<CSV::Table mode:col_or_row row_count:2104 >
The documentation says I should be able to use by_col(), but when I try to output
<%= debug #dataTable.by_col('col_name or index') %>
It gives me "undefined method 'col' error"
Can somebody explain to me how I'm supposed to use CSV? and if there is a way to get columns faster using 'read' instead of 'table'?
I'm using Ruby 1.92, which says that it is using fasterCSV, so I don't need to use the FasterCSV gem.
To pluck a column out of a csv I'd probably do something like the following:
col_data = []
CSV.foreach(FILENAME) {|row| col_data << row[COL_INDEX]}
That should be substantially faster than any operations on CSV.Table
You can get the values from single column of the csv files using the following snippet.
#dataTable = CSV.table('path_to_csv.csv')
#dataTable[:columnname]

Resources