What's the correct way to read an array from a csv field in Ruby? - ruby

I am trying to save some class objects to a csv file, everything works fine. I can save and read back from the csv file, there is only a 'minor' problem with an attribute that is an Array of Strings.
When I save it to the file it appears like this: "[""Dan Brown""]"
CSV.open('documents.csv', "w") do |csv|
  csv << %w[ISBN Titre Auteurs Type Disponibilité]
  @docs.each { |doc|
    csv << [doc.isbn, doc.titre, doc.auteurs, doc.type, doc.empruntable ? "Disponible" : "Emprunté"]
  }
end
And when I try to extract the data from the file I end up with something like this: ["[\"Dan Brown\"]"].
table = CSV.parse(File.read("documents.csv"), headers: true)
table.each do |row|
  doc = Document.new(row['Titre'], row['ISBN'], row['Type'])
  doc.auteurs << row['Auteurs'] # This is the array where there is a 'problem'
  if row['Disponibilité'] == "Disponible"
    doc.empruntable = true
  else
    doc.empruntable = false
  end
  @docs.push(doc) # this is an array where I save my objects
end
I tried many things to solve this but without any luck. I would be thankful if you can help me find a solution.

Since a CSV file, by its nature, contains only strings in its fields, not arrays or other data types, the CSV class calls to_s on each object to turn it into a string before writing it to the CSV.
When you later read the field back, you get exactly that: the string representation of what once was your array. The only one who knows that 'Auteurs' should end up as an array of strings is the application, i.e. you.
Hence, when reading the CSV, after extracting the Auteurs string you need to convert it back to an Array yourself, because there is no automatic "inverse method" that reverses to_s.
A cheap but dangerous way to do it is to use eval, which would indeed reconstruct your array. However, you need to be sure that nobody had a chance to tamper with the CSV data, because eval executes whatever code the string contains.
A safer way is either to write your own conversion functions to and from the String representation, or to use a format such as YAML or JSON to represent the Array as a String instead of relying on to_s.
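As a minimal sketch of the JSON option, assuming the standard csv and json libraries: the rows below are made-up stand-ins for the Document objects in the question, and CSV.generate writes to a string instead of documents.csv only to keep the example self-contained.

```ruby
require 'csv'
require 'json'

# Hypothetical data standing in for the Document objects from the question.
rows = [
  { isbn: '123', titre: 'Inferno', auteurs: ['Dan Brown'] },
  { isbn: '456', titre: 'Good Omens', auteurs: ['Terry Pratchett', 'Neil Gaiman'] }
]

# Write: encode the array as a JSON string instead of relying on Array#to_s.
csv_text = CSV.generate do |csv|
  csv << %w[ISBN Titre Auteurs]
  rows.each { |r| csv << [r[:isbn], r[:titre], JSON.generate(r[:auteurs])] }
end

# Read: decode the JSON string back into an Array of Strings.
auteurs_per_row = CSV.parse(csv_text, headers: true).map do |row|
  JSON.parse(row['Auteurs'])
end

p auteurs_per_row
```

In the reading loop from the question, this would mean replacing the << on the raw field with something like doc.auteurs.concat(JSON.parse(row['Auteurs'])).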

Related

Specific Values in Json Parse

I am having difficulty getting to specific values when I parse a JSON file in Ruby. My JSON is based off of this link https://www.mcdonalds.com/services/mcd/us/restaurantLocator?latitude=40.7217861&longitude=-74.00944709999999&radius=8045&maxResults=100&country=us&language=en-us
No matter what I try I cannot pull the values I want, which is the "addressLine1" field. I get the following error:
`[]': no implicit conversion of String into Integer (TypeError)
Code
require 'json'
file = File.read('MCD.json')
data_hash = JSON.parse(file)
print data_hash.keys
print "\n"
print data_hash['features']['addressLine1']
data_hash['features'] is an array, so indexing it with a String raises the TypeError. Depending on what you actually need, you can either iterate over it or call:
data_hash['features'].first['properties']['addressLine1']
Note 'properties' there, since addressLine1 is not a direct descendant of 'features' elements.
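If every address is needed rather than only the first, iterating could look like the sketch below. The JSON here is a made-up, trimmed-down sample that only mimics the features / properties / addressLine1 nesting described above; the real response has many more fields.

```ruby
require 'json'

# Hypothetical sample with the same nesting as the real response.
json = <<~JSON
  {"features": [
    {"properties": {"addressLine1": "160 Broadway"}},
    {"properties": {"addressLine1": "262 Canal St"}}
  ]}
JSON

data_hash = JSON.parse(json)

# 'features' is an Array, so map over it; dig returns nil if a key is missing.
addresses = data_hash['features'].map { |feature| feature.dig('properties', 'addressLine1') }

puts addresses
```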

Concept for recipe-based parsing of webpages needed

I'm working on a web-scraping solution that grabs totally different webpages and lets the user define rules/scripts in order to extract information from the page.
I started scraping from a single domain and built a parser based on Nokogiri.
Basically everything works fine.
I could now add a ruby class each time somebody wants to add a webpage with a different layout/style.
Instead I thought about using an approach where the user specifies elements where content is stored using xpath and storing this as a sort of recipe for this webpage.
Example: The user wants to scrape a table-structure extracting the rows using a hash (column-name => cell-content)
I was thinking about writing a ruby function for extraction of this generic table information once:
require 'nokogiri'

# Extracts a table's rows as an array of hashes (column_name => cell content).
# html        - the HTML file as a string
# xpath_table - XPath of the HTML table that holds the data to be extracted
def basic_table(html, xpath_table)
  html_doc = Nokogiri::HTML(html)

  # Column names come from the header cells.
  row_headers = html_doc.xpath("#{xpath_table}/thead/tr/th").map(&:inner_text)

  # Double quotes are needed here so that xpath_table is interpolated.
  html_doc.xpath("#{xpath_table}/tbody/tr").map do |table_row|
    cells = table_row.xpath('td').map(&:inner_text)
    # Pair each header with the cell in the same column.
    row_headers.zip(cells).to_h
  end
end
The user could now specify a website-recipe-file like this:
<basic_table xpath='//div[@id="grid"]/table[@id="displayGrid"]' />
The function basic_table is referenced here, so that by parsing the website-recipe-file I would know that I can use the function basic_table to extract the content from the table referenced by the xPath.
This way the user can specify simple recipe-scripts and only has to dive into writing actual code if he needs a new way of extracting information.
The code would not change every time a new webpage needs to be parsed.
Whenever the structure of a webpage changes only the recipe-script would need to be changed.
I was thinking that someone might be able to tell me how he would approach this. Rules/rule engines pop into my mind, but I'm not sure if that really is the solution to my problem.
Somehow I have the feeling that I don't want to "invent" my own solution to handle this problem.
Does anybody have a suggestion?
J.
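One way the recipe idea could be sketched, without a rule engine, is a small registry that maps recipe names to extraction functions. Everything here is an assumption for illustration: the names extractors and run_recipe are hypothetical, the recipe file is assumed to be already parsed into name/xpath pairs, and the basic_table entry is a stub rather than the Nokogiri version above.

```ruby
# Registry of extraction functions that a recipe may refer to by name.
# Only names listed here can be called, so a recipe cannot run arbitrary code.
def extractors
  {
    # Stub standing in for the Nokogiri-based basic_table from the question.
    'basic_table' => ->(html, xpath) { "basic_table on #{xpath} (#{html.length} chars)" }
  }
end

# Applies each recipe step to the page and collects the results.
def run_recipe(html, steps)
  steps.map do |step|
    extractor = extractors.fetch(step[:name]) do |name|
      raise ArgumentError, "unknown extractor: #{name}"
    end
    extractor.call(html, step[:xpath])
  end
end

# Hypothetical recipe, as it might look after parsing the recipe file.
steps = [{ name: 'basic_table', xpath: '//div[@id="grid"]/table[@id="displayGrid"]' }]
results = run_recipe('<html></html>', steps)
puts results
```

Supporting a new extraction style then means adding one entry to the registry, while a new or changed webpage only needs a new recipe.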

Parse a string with multiple XML-like tags using Ruby

I have a string which looks like the following:
string = " <SET-TOPIC>INITIATE</SET-TOPIC>
<SETPROFILE>
<PROFILE-KEY>predicates_live</PROFILE-KEY>
<PROFILE-VALUE>yes</PROFILE-VALUE>
</SETPROFILE>
<think>
<set><name>first_time_initiate</name>yes</set>
</think>
<SETPROFILE>
<PROFILE-KEY>first_time_initiate</PROFILE-KEY>
<PROFILE-VALUE>YES</PROFILE-VALUE>
</SETPROFILE>"
My objective is to read out each top-level tag that is in caps from the parse. I use a case statement to evaluate the top-level key, such as <SETPROFILE> (there can be lots of different values), and then run a method that does different things with the contents of the tag.
What this means is I need to be able to know very easily:
top_level_keys = ['SET-TOPIC', 'SET-PROFILE', 'SET-PROFILE']
and, when I pass in a key, to know its full value:
parsed[0].value = {:PROFILE-KEY => predicates_live, :PROFILE-VALUE => yes}
parsed[0].key = ['SET-TOPIC']
I currently parse the whole string as follows:
doc = Nokogiri::XML::DocumentFragment.parse(string)
parsed = doc.search('*').each_with_object({}){ |n, h|
h[n.name] = n.text
}
As a result, I only parse and know of the second tag. The values from the first tag do not show up in the parsed variable.
I have control over what the tags are, if that helps.
But I need to be able to parse and know the contents of both tags as a result of the parse, because I need to apply a method to each instance of the node.
Note: the string also contains just regular text, both before, in between, and after the XML-like tags.
It depends on what you are trying to achieve. The problem is that you are overwriting the value of an existing hash key each time the same tag name appears again. The easiest way to collect all values is to store them in an array:
parsed = doc.search('*').each_with_object({}) do |n, h|
  # h[n.name] = n.text :: removed because it overwrites earlier values
  (h[n.name] ||= []) << n.text
end
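The difference between the two versions can be seen without Nokogiri. In this sketch the (name, text) pairs are made up and simply stand in for the nodes that doc.search('*') would yield.

```ruby
# Hypothetical (name, text) pairs standing in for the parsed nodes.
pairs = [
  ['SET-TOPIC', 'INITIATE'],
  ['PROFILE-KEY', 'predicates_live'],
  ['PROFILE-KEY', 'first_time_initiate']
]

# Plain assignment keeps only the last value seen for each name.
overwritten = pairs.each_with_object({}) { |(name, text), h| h[name] = text }

# Appending to an array per name keeps every occurrence, in document order.
collected = pairs.each_with_object({}) { |(name, text), h| (h[name] ||= []) << text }

p overwritten
p collected
```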

Ruby String to access an object attribute

I have a text file (objects.txt) which contains objects and their attributes.
The content of the file is something like:
Object.attribute = "data"
In a different file, I load objects.txt, and if I type:
puts object.attribute
it prints out data.
The issue comes when I am trying to access the object and/or the attribute with a string. What I am doing is:
var = "object" + "." + "access"
puts var
It prints out object.access and not the content of it "data".
I have already tried instance_variable_get and it works, but then I have to modify objects.txt and prepend an @ to the name to make it an instance variable. I cannot do this because I am not the owner of the objects.txt file.
As a workaround I can parse the object.txt file and get the data that I need but I don't want to do this, as I want take advantage of what is already there.
Any suggestions?
Yes, puts is correctly spitting out "object.access" because you are creating that string exactly.
In order to evaluate a string as if it were ruby code, you need to use eval()
eg:
var = "object" + "." + "access"
puts eval(var)
=> "data"
Be aware that doing this is quite dangerous if you are evaluating anything that potentially comes from another user.
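If only the attribute name arrives as a string and the object itself is already held in a variable, public_send is a narrower alternative to eval, since it can only call a public method by name. This is a sketch under that assumption; the Struct below is a made-up stand-in for whatever objects.txt defines.

```ruby
# Hypothetical stand-in for the object defined in objects.txt.
object = Struct.new(:attribute).new('data')

attribute_name = 'attribute'

# Call the reader only if the object actually has a public method of that name.
value = object.respond_to?(attribute_name) ? object.public_send(attribute_name) : nil

puts value
```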

Write Hash to CSV file and then read the values back to form a hash

I am banging my head trying to resolve an issue I am experiencing with one of my latest projects. Here is the scenario:
I am making a call to the GoToWebinar API to fetch upcoming webinars. Everything is working fine and the webinars are fetched as an array of hashes, like this:
[
{
"webinarKey":5303085652037254656,
"subject":"Test+Webinar+One",
"description":"Test+Webinar+One+Description",
"times":[{"startTime":"2011-04-26T17:00:00Z","endTime":"2011-04-26T18:00:00Z"}]
},
{
"webinarKey":9068582024170238208,
"name":"Test+Webinar+Two",
"description":"Test Webinar Two Description",
"times":[{"startTime":"2011-04-26T17:00:00Z","endTime":"2011-04-26T18:00:00Z"}]
}
]
I have created a rake task which we are going to run once a day to populate the CSV file with this hash and then the CSV file is read in the controller action to populate the views.
Here is my code to populate the CSV file :
g = GoToWebinar::API.new()
@all_webinars = g.get_upcoming_webinars
CSV.open("#{Rails.root.to_s}/public/upcoming_webinars.csv", "wb") do |csv|
  @all_webinars.each do |webinar|
    webinar.to_a.each {|elem| csv << elem}
  end
end
I need some help figuring out how to save the hashes I receive to the CSV file in such a way that the order is preserved, and how to read the information back from the CSV file so that the controller action can rebuild the hashes in the very same form.
You want to use the keys of the hash (since they are constant) as the headers for your CSV file. Then push each webinar's values on as one row.
g = GoToWebinar::API.new()
@all_webinars = g.get_upcoming_webinars
# @all_webinars is an array of hashes, so take the keys from its first element.
headers = @all_webinars.first.keys
CSV.open("#{Rails.root.to_s}/public/upcoming_webinars.csv", "wb", headers: headers, write_headers: true) do |csv|
  @all_webinars.each do |webinar|
    # One row per webinar, with values in the same order as the headers.
    csv << webinar.values_at(*headers)
  end
end
You are going to want to make sure, however, that any data inside the hash values is flattened. That hash inside of an array for times needs to be dealt with (perhaps just remove times and have a startTime and endTime key in the hash).
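A round trip with flattened data might look like the following sketch. It assumes each webinar has exactly one entry in times, uses a shortened made-up record, and writes to a string with CSV.generate rather than to the file under Rails.root so that it runs on its own.

```ruby
require 'csv'

# Hypothetical record shaped like the API response in the question.
webinars = [
  { 'webinarKey' => 5303085652037254656, 'subject' => 'Test Webinar One',
    'times' => [{ 'startTime' => '2011-04-26T17:00:00Z', 'endTime' => '2011-04-26T18:00:00Z' }] }
]

# Flatten: replace the nested times array with startTime and endTime keys.
flat = webinars.map do |webinar|
  slot = webinar['times'].first
  webinar.reject { |key, _| key == 'times' }
         .merge('startTime' => slot['startTime'], 'endTime' => slot['endTime'])
end

# Write: header row first, then one row per webinar in header order.
headers = flat.first.keys
csv_text = CSV.generate do |csv|
  csv << headers
  flat.each { |row| csv << row.values_at(*headers) }
end

# Read: each row becomes a hash keyed by the headers, in the same order.
restored = CSV.parse(csv_text, headers: true).map(&:to_h)

p restored
```

Note that every value comes back as a String, so webinarKey would need an explicit Integer(...) conversion if the numeric type matters.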
From what I have learnt from all the examples and the work done to accomplish this, I think the best way to implement this type of functionality is to create a rake task that populates the database with the information, and then use the saved information to populate the views.

Resources