Write Hash to CSV file and then read the values back to form a hash - ruby

I am banging my head trying to resolve an issue I am experiencing with one of my latest projects. Here is the scenario:
I am making a call to the GoToWebinar API to fetch upcoming webinars. Everything is working fine and the webinars are fetched as an array of hashes, just like this:
[
{
"webinarKey":5303085652037254656,
"subject":"Test+Webinar+One",
"description":"Test+Webinar+One+Description",
"times":[{"startTime":"2011-04-26T17:00:00Z","endTime":"2011-04-26T18:00:00Z"}]
},
{
"webinarKey":9068582024170238208,
"name":"Test+Webinar+Two",
"description":"Test Webinar Two Description",
"times":[{"startTime":"2011-04-26T17:00:00Z","endTime":"2011-04-26T18:00:00Z"}]
}
]
I have created a rake task which we are going to run once a day to populate a CSV file with this data; the CSV file is then read in the controller action to populate the views.
Here is my code to populate the CSV file:
g = GoToWebinar::API.new()
@all_webinars = g.get_upcoming_webinars
CSV.open("#{Rails.root.to_s}/public/upcoming_webinars.csv", "wb") do |csv|
  @all_webinars.each do |webinar|
    webinar.to_a.each { |elem| csv << elem }
  end
end
I need some help figuring out a way to save the information received in this hash form to the CSV file so that the order is preserved, and also a way to read the information back from the CSV file so that it populates the hash in the controller action in the very same way.

You want to use the keys of the hash (since they are constant) as the headers for your CSV file, then push each webinar's values on as a row.
g = GoToWebinar::API.new()
@all_webinars = g.get_upcoming_webinars
headers = @all_webinars.first.keys
# write_headers: true makes CSV emit the header row before the data rows
CSV.open("#{Rails.root.to_s}/public/upcoming_webinars.csv", "wb", headers: headers, write_headers: true) do |csv|
  @all_webinars.each do |webinar|
    csv << webinar.values
  end
end
You are going to want to make sure, however, that any data inside the hash values is flattened. That hash inside of an array for times needs to be dealt with (perhaps just remove times and have a startTime and endTime key in the hash).
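For example, here is a rough sketch of both sides, assuming every webinar hash has the same keys and exactly one entry under times (get_upcoming_webinars and the file path are taken from the code above):
require 'csv'

# Writing: flatten the nested "times" array into startTime/endTime columns.
g = GoToWebinar::API.new
webinars = g.get_upcoming_webinars
rows = webinars.map do |w|
  flat = w.reject { |k, _| k == "times" }
  flat["startTime"] = w["times"].first["startTime"]
  flat["endTime"]   = w["times"].first["endTime"]
  flat
end

headers = rows.first.keys
CSV.open("#{Rails.root}/public/upcoming_webinars.csv", "wb") do |csv|
  csv << headers
  rows.each { |row| csv << row.values_at(*headers) }
end

# Reading back (e.g. in the controller action): every value comes back as a
# string, keyed and ordered exactly as it was written.
upcoming_webinars = CSV.read("#{Rails.root}/public/upcoming_webinars.csv", headers: true).map(&:to_hash)
Since the header row records the key order, each row read back with headers: true turns into a hash with the same keys in the same order; only the types change, because CSV stores everything as strings.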

From what I have learnt from all the examples and the work done to accomplish this, I think the best way to achieve this type of functionality is to create a rake task that populates the database with the information, and then use the saved information to populate the views.

Related

What's the correct way to read an array from a csv field in Ruby?

I am trying to save some class objects to a CSV file, and everything mostly works fine: I can save and read back from the CSV file. There is only a 'minor' problem with an attribute that is an Array of Strings.
When I save it to the file it appears like this: "[""Dan Brown""]"
CSV.open('documents.csv', "w") do |csv|
  csv << %w[ISBN Titre Auteurs Type Disponibilité]
  @docs.each { |doc|
    csv << [doc.isbn, doc.titre, doc.auteurs, doc.type, doc.empruntable ? "Disponible" : "Emprunté"]
  }
end
And when I try to extract the data from the file I end up with something like this: ["[\"Dan Brown\"]"].
table = CSV.parse(File.read("documents.csv"), headers: true)
table.each do |row|
  doc = Document.new(row['Titre'], row['ISBN'], row['Type'])
  doc.auteurs << row['Auteurs'] # This is the array where there is a 'problem'
  if row['Disponibilité'] == "Disponible"
    doc.empruntable = true
  else
    doc.empruntable = false
  end
  @docs.push(doc) # This is an array where I save my objects
end
I tried many things to solve this but without any luck. I would be thankful if you can help me find a solution.
Since a CSV file, by its nature, contains only strings in its fields, not arrays or other data types, the CSV class applies the to_s method to the objects to turn them into strings before putting them into the CSV.
When you later read them back, you just get this: the string representation of what once had been your array. The only one who knows that 'Auteurs' should end up as an array of strings is the application, i.e. you.
Hence on reading the CSV, after having extracted the auteurs string, you need to convert it manually back into an Array, because there is no automatic "inverse method" to reverse the to_s.
A cheap but dangerous way to do it is to use eval, which indeed would reconstruct your array. However, you need to be sure that nobody had a chance to fiddle manually with the CSV data, because an eval allows arbitrary code to sneak in.
A safer way would be to either write your own conversion function to and from String representation, or use a format such as YAML or JSON for representing the Array as String, instead of using to_s.
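For example, here is a minimal sketch of the JSON route, reusing the writing and reading code from the question:
require 'csv'
require 'json'

# Writing: store the authors array as a JSON string in the Auteurs column.
CSV.open('documents.csv', 'w') do |csv|
  csv << %w[ISBN Titre Auteurs Type Disponibilité]
  @docs.each do |doc|
    csv << [doc.isbn, doc.titre, doc.auteurs.to_json, doc.type,
            doc.empruntable ? "Disponible" : "Emprunté"]
  end
end

# Reading: JSON.parse turns the stored string back into an array of strings.
CSV.parse(File.read('documents.csv'), headers: true).each do |row|
  doc = Document.new(row['Titre'], row['ISBN'], row['Type'])
  doc.auteurs.concat(JSON.parse(row['Auteurs'])) # e.g. ["Dan Brown"]
  doc.empruntable = (row['Disponibilité'] == "Disponible")
  @docs.push(doc)
end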

How to parse two elements from a list to make a new one

I have this input repeated in 1850 files:
[
{
"id"=>66939,
"login"=>"XXX",
"url"=>"https://website.com/XX/users/XXX"
},
...
{}
]
And I wanted to make a list in such a way that, by looking up the login, I can retrieve the ID using syntax like:
users_list[XXX]
This is my desired output:
{"XXX"=>"66570", "XXX"=>"66570", "XXX"=>"66570", "XXX"=>"66570", ... }
My code is:
i2 = 1
while i2 != users_list_raw.parsed.count
  temp_user = users_list_raw.parsed[i2]
  temp_user_login = temp_user['login']
  temp_user_id = temp_user['id']
  user = {
    temp_user_login => temp_user_id
  }
  users_list << user
  i2 += 1
end
My output is:
[{"XXX":66570},{"XXX":66569},{"XXX":66568},{"XXX":66567},{"XXX":66566}, ... {}]
but this is not what I want.
What's wrong with my code?
Use hash[key] = value to add an entry to a hash. So I guess in your case: users_list[temp_user_login] = temp_user_id
But I'm unsure why you'd want to do that. I think you could look up the id of a user from the login with a statement like:
login = "XXX"
user = users_list.select { |user| user["login"] == login }.first
id = user["id"]
and maybe put that in a function get_id(login) which takes the login as its parameter?
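For instance, a minimal sketch of such a helper (passing the list in explicitly so the example is self-contained; the method name is just a suggestion):
# Returns the id for the given login, or nil if no matching user exists.
def get_id(users_list, login)
  user = users_list.find { |u| u["login"] == login }
  user && user["id"]
end

get_id(users_list_raw.parsed, "XXX") # => 66939 with the sample data above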
Also, you might want to look into databases if you're going to manipulate large amounts of data like this. ORMs (Object Relational Mappers) such as DataMapper and Active Record (which comes bundled with Rails) are available in Ruby; they allow you to "model" the data and create Ruby objects from data stored in a database, without writing SQL queries manually.
If your goal is to look up users_list[XXX], then a Hash would work well. We can construct that quite simply:
users_list = users_list_raw.parsed.each.with_object({}) do |user, list|
  list[user['login']] = user['id']
end
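Lookups are then plain hash access:
users_list["XXX"] # => 66939 with the sample data shown above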
Any time you find yourself writing a while loop in Ruby, there might be a more idiomatic solution.
If you want to keep track of a mapping from keys to values, the best data structure is a hash. Be aware that assigning via the bracket operator ([]=) will replace any existing value stored under the same key.
login_to_id = {}
Dir.glob("*.txt") { |filename|        # Use Dir.glob to find all files that you want to process
  data = eval(File.read(filename))    # Your data seems to be Ruby-encoded hashes/arrays. eval is unsafe; I hope you know what you are doing.
  data.each { |hash|
    login_to_id[hash["login"]] = hash["id"]
  }
}
puts login_to_id["XXX"] # => 66939

Why can't I just delete the header row from a CSV file?

I'm having the hardest time deleting a header row for a CSV file.
This is some code that I wrote to generate a list of account ids from a third-party email service that determines if an account is valid or not.
This is the CSV generator code:
CSV.foreach('complete_failed_list.csv', {headers: false}) do |failed_csv_list|
  CSV.open("failed_id_list.csv", "a", {headers: false}) do |csv|
    csv << [failed_csv_list[0]]
  end
end
This creates a CSV file with the following format:
id
1
2
3
I don't want the id header but, even though I specify headers: false, there still is a header! I want to get rid of it but I can't manually delete it because I'm on a Mac and I have to save the file as a Numbers ".numbers" file instead of ".csv".
Setting the CSV headers option only determines whether the returned rows come back as arrays or as hash-like CSV::Row objects keyed by the first row in the CSV. By setting { headers: true } you essentially skip the header row, but you then access the values using a key instead of an index. See below.
CSV.foreach('complete_failed_list.csv', {headers: true}) do |failed_csv_list|
  CSV.open("failed_id_list.csv", "a", {headers: false}) do |csv|
    csv << [failed_csv_list['id']]
  end
end
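To illustrate with the input file: with headers: true the header row itself is never yielded, and each yielded row is a CSV::Row keyed by the header name:
CSV.foreach('complete_failed_list.csv', headers: true) do |row|
  row.class  # => CSV::Row
  row['id']  # => "1" for the first data row (the "id" header row is consumed, not yielded)
end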
More information on what options are accepted can be found here. I played around with the return_headers option as well but it didn't seem to matter in terms of what foreach iterates over.
Specifying the headers option tells Ruby whether or not you expect your data to have headers, and if you set headers: true you can actually access columns in your data by the header's name. See "A Guide to the Ruby CSV Library, Part II" for some more details.
However, you are still iterating over every row in the data and so you are dumping the header row into your target file. From the CSV.foreach documentation:
Each row of the file will be passed to the provided block in turn.
Something I have often done is to set a flag before iterating, then skip over the first row:
skip_headers = true
CSV.foreach('complete_failed_list.csv') do |failed_csv_list|
  if skip_headers
    skip_headers = false # Make sure to change the flag or you will skip every row
    next                 # Move to the next step in the iteration
  end
  CSV.open("failed_id_list.csv", "a") do |csv|
    csv << [failed_csv_list[0]]
  end
end
I have also tried to find a more "Ruby" way to do this; it seems simple enough, but I have not found anything better than this basic approach.

Concept for recipe-based parsing of webpages needed

I'm working on a web-scraping solution that grabs totally different webpages and lets the user define rules/scripts in order to extract information from the page.
I started scraping from a single domain and built a parser based on Nokogiri.
Basically everything works fine.
I could now add a Ruby class each time somebody wants to add a webpage with a different layout/style.
Instead, I thought about an approach where the user specifies the elements that hold the content using XPath, and this is stored as a sort of recipe for that webpage.
Example: the user wants to scrape a table structure, extracting the rows as hashes (column name => cell content).
I was thinking about writing a Ruby function once for extracting this generic table information:
# Extracts a table's rows as an array of hashes (column_name => cell content).
# html        - the HTML document as a string
# xpath_table - XPath expression for the HTML table that holds the data to be extracted
def basic_table(html, xpath_table)
  xpath_headers = "#{xpath_table}/thead/tr/th"
  html_doc = Nokogiri::HTML(html)
  row_headers = html_doc.xpath(xpath_headers)
  row_headers = row_headers.map do |column|
    column.inner_text
  end
  row_contents = Array.new
  table_rows = html_doc.xpath("#{xpath_table}/tbody/tr")
  table_rows.each do |table_row|
    cells = table_row.xpath('td')
    cells = cells.map do |cell|
      cell.inner_text
    end
    row_content_hash = Hash.new
    cells.each_with_index do |cell_string, column_index|
      row_content_hash[row_headers[column_index]] = cell_string
    end
    row_contents << row_content_hash
  end
  return row_contents
end
The user could now specify a website-recipe-file like this:
<basic_table xpath='//div[@id="grid"]/table[@id="displayGrid"]' />
The function basic_table is referenced here, so that by parsing the website-recipe file I would know that I can use basic_table to extract the content of the table referenced by the XPath.
This way the user can specify simple recipe-scripts and only has to dive into writing actual code if he needs a new way of extracting information.
The code would not change every time a new webpage needs to be parsed.
Whenever the structure of a webpage changes only the recipe-script would need to be changed.
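As a rough sketch of what I have in mind (the <recipe> wrapper, the file layout and the method name are just illustrative assumptions):
require 'nokogiri'

# Reads a recipe file such as:
#   <recipe>
#     <basic_table xpath='//div[@id="grid"]/table[@id="displayGrid"]' />
#   </recipe>
# and applies each rule to the downloaded page, dispatching on the element name.
def apply_recipe(html, recipe_path)
  recipe = Nokogiri::XML(File.read(recipe_path))
  recipe.root.element_children.flat_map do |rule|
    case rule.name
    when "basic_table"
      basic_table(html, rule["xpath"])
    else
      raise "Unknown extraction rule: #{rule.name}"
    end
  end
end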
I was thinking that someone might be able to tell me how he would approach this. Rules/rule engines pop into my mind, but I'm not sure if that really is the solution to my problem.
Somehow I have the feeling that I don't want to "invent" my own solution to handle this problem.
Does anybody have a suggestion?
J.

Sinatra can't convert Symbol into Integer when making MongoDB query

This is a sort of followup to my other MongoDB question about the torrent indexer.
I'm making an open source torrent indexer (like a mini TPB, in essence), and currently offer both SQLite and MongoDB as backends.
However, I'm having trouble with the MongoDB part of it. In Sinatra, I get a "can't convert Symbol into Integer" error when trying to upload a torrent, or search for one.
When uploading, one needs to tag the torrent, and that is where it fails. The code for adding tags is as follows:
def add_tag(tag)
  if $sqlite
    unless tag_exists? tag
      $db.execute("insert into #{$tag_table} values ( ? )", tag)
    end
    id = $db.execute("select oid from #{$tag_table} where tag = ?", tag)
    return id[0]
  elsif $mongo
    unless tag_exists? tag
      $tag.insert({:tag => tag})
    end
    return $tag.find({:tag => tag})[:_id] # this is the line it presumably crashes on
  end
end
It reaches line 105 (noted above), and then fails. What's going on? Also, as an FYI this might turn into a few other questions as solutions come in.
Thanks!
EDIT
So instead of returning the tag result with [:_id], I changed the block inside the elsif to:
id = $tag.find({:tag => tag})
puts id.inspect
return id
and still get an error. You can see a demo at http://torrent.hypeno.de and the source at http://github.com/tekknolagi/indexer/
Given that you are doing an insert(), the easiest way to get the id is:
id = $tag.insert({:tag => tag})
id will be a BSON::ObjectId, so you can use appropriate methods depending on the return value you want:
return id # BSON::ObjectId('5017cace1d5710170b000001')
return id.to_s # "5017cace1d5710170b000001"
In your original question you are trying to use the Collection.find() method. This returns a Mongo::Cursor, but you are trying to reference the cursor as a document. You need to iterate over the cursor using each or next, e.g.:
cursor = $tag.find({:tag => tag})
return cursor.next['_id']
If you want a single document, you should be using Collection.find_one().
For example, you can find and return the _id using:
return $tag.find_one({:tag => tag})['_id']
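Putting that together, the Mongo branch of add_tag from the question could be written roughly like this (a sketch reusing the question's globals and tag_exists? helper):
def add_tag(tag)
  if $sqlite
    unless tag_exists? tag
      $db.execute("insert into #{$tag_table} values ( ? )", tag)
    end
    id = $db.execute("select oid from #{$tag_table} where tag = ?", tag)
    return id[0]
  elsif $mongo
    # find_one returns the matching document (or nil), not a cursor,
    # so an existing tag's id can be returned directly...
    existing = $tag.find_one({:tag => tag})
    return existing['_id'] if existing
    # ...and insert returns the new BSON::ObjectId for a fresh tag.
    $tag.insert({:tag => tag})
  end
end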
I think the problem here is [:_id]. I don't know much about Mongo, but $tag.find({:tag => tag}) is probably returning an array-like object, and passing a symbol to the [] array operator is not defined.
