Weird JSON parsing issues with Ruby - ruby

I'm downloading content from a webpage that seems to be in JSON. It is a large file with the following format:
"address1":"123 Street","address2":"Apt 1","city":"City","state":"ST","zip":"xxxxx","country":"US"
There are about 1000 of these entries, each enclosed in brackets. When I download the page using RestClient.get (open-uri was throwing an HTTP 500 error for some reason), the data is in the following format:
\"address1\":\"123 Street\",\"address2\":\"Apt 1\",\"city\":\"City\",\"state\":\"ST\",\"zip\":\"xxxxx\",\"country\":\"US\"
When I then use the json class
parsed = JSON.parse(data_out)
it completely scrambles both the order of entries within the data structure, and also the order of the objects within each entry, for example:
"address1"=>"123 Street", "city"=>"City", "country"=>"US", "address2"=>"Apt 1"
If instead I use
data_j=data_out.to_json
then I get:
\\\"address1\\\":\\\"123 Street\\\",\\\"address2\\\":\\\"Apt 1\\\",\\\"city\\\":\\\"City\\\",\\\"state\\\":\\\"ST\\\",\\\"zip\\\":\\\"xxxxx\\\",\\\"country\\\":\\\"US\\\"
Further, only using the json class seems to allow me to select the entries I want:
parsed[1]["address1"]
=> "123 Street"
data_j[1]["address1"]
TypeError: can't convert String into Integer
from (irb):17:in `[]'
from (irb):17
from :0
Any idea what's going on? I guess since the json commands are working I can use them, but it is disconcerting that it's scrambling the entries and the order of the objects.

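Although the data appears ordered in string form, a JSON object is an unordered set of name/value pairs, so the parsed result does not have to keep the keys in the order they were written. The line: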
parsed = JSON.parse(data_out)
which you use is the correct way to convert the string form into something usable in Ruby. I cannot see the full structure from your example, so I don't know whether the top level is an array or an id-keyed hash. I suspect the latter, since you say it becomes unordered when you view it from Ruby. So, if you know which part of the address you are interested in, you might write code like this:
# Writes all the cities
parsed.each do |id, data|
  puts data["city"]
end
If the outer structure is an array, you'd do this:
# Writes all the cities
parsed.each do |data|
  puts data["city"]
end
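Here is a minimal sketch that copes with either shape, using made-up two-entry payloads. The SAMPLE_ARRAY and SAMPLE_HASH strings and the cities helper are illustrative names, not part of the question. The last line also shows why to_json does not help here: called on a String, it encodes that string as JSON text instead of decoding it, so the result is still a String.

```ruby
require 'json'

# Hypothetical payloads standing in for the downloaded page.
SAMPLE_ARRAY = '[{"address1":"123 Street","city":"City"},{"address1":"9 Road","city":"Town"}]'
SAMPLE_HASH  = '{"a1":{"address1":"123 Street","city":"City"},"b2":{"address1":"9 Road","city":"Town"}}'

# Returns the city of every entry, whether the top level is an Array or a Hash.
def cities(json_text)
  parsed  = JSON.parse(json_text)
  entries = parsed.is_a?(Hash) ? parsed.values : parsed
  entries.map { |entry| entry["city"] }
end

puts cities(SAMPLE_ARRAY).inspect
puts cities(SAMPLE_HASH).inspect

# String#to_json encodes rather than decodes, so this is still a String.
puts SAMPLE_ARRAY.to_json.class
```

If the order of the entries matters to you, sort them explicitly (for example with sort_by) instead of relying on the order the parser happens to return.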

Related

Ruby extract single key value from first block in json

I'm parsing a very large json output from an application API and end up with a ruby array similar to the sanitized version below:
{"log_entries"=>
[{"id=>"SDF888B2B2KAZZ0AGGB200",
"type"=>"warning",
"summary"=>"Things happened",
"created"=>"2017-07-11T18:40:31Z",
"person"=>
{"id"=>"44bAN8",
"name"=>"Harry"}
"system"=>"local",
"service"=>"syslog"
{"id=>"HMB001NBALLB81MMLLABLK",
"type"=>"info",
"summary"=>"Notice",
"created"=>"2017-06-02T11:23:21Z",
"person"=>
{"id"=>"372z1j",
"name"=>"Sally"}
"system"=>"local",
"service"=>"syslog"}]},
"other"=>200,
"set"=>0,
"more"=>false,
"total"=nil}
I just need to be able to print the value of the "created" key only in the first block. Meaning, when the program exits, I need it to print "2017-07-11T18:40:31Z." I've googled a lot but wasn't successful in finding anything. I've tried something like:
puts ["log_entries"]["id"]["created"]
My expectation was to print all of them to start somewhere and even that yields an error. Forgive me, I don't use ruby much.
Since log_entries is an array you can just access the first element and get its created value.
Assuming the variable result holds the whole hash (the JSON you parse from the API):
puts result['log_entries'][0]['created']
will print out the first date. You might want to guard against the case where log_entries is empty, so wrap it in an if:
if result['log_entries'].any?
  puts result['log_entries'][0]['created']
end
Your JSON is not in a valid format, but assuming the real data is well formed, the following should work:
result["log_entries"].collect{|entry| entry["created"]}
=> ["2017-07-11T18:40:31Z", "2017-06-02T11:23:21Z"]
The code above collects every created date and gives you an array.
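As a rough sketch of the guarded lookup, the snippet below uses Hash#dig (available from Ruby 2.3) so that a missing key or an empty array yields nil instead of an error. SAMPLE_LOG_JSON is a trimmed, well-formed stand-in for the API response, and first_created is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical, well-formed stand-in for the API response in the question.
SAMPLE_LOG_JSON = <<~JSON
  {"log_entries": [
    {"id": "SDF888B2B2KAZZ0AGGB200", "created": "2017-07-11T18:40:31Z"},
    {"id": "HMB001NBALLB81MMLLABLK", "created": "2017-06-02T11:23:21Z"}
  ], "more": false}
JSON

# Returns the "created" value of the first log entry, or nil if there is none.
def first_created(result)
  result.dig("log_entries", 0, "created")
end

result = JSON.parse(SAMPLE_LOG_JSON)
puts first_created(result)
puts first_created({ "log_entries" => [] }).inspect
```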

How to check whether a value exists in a Ruby structure?

I used to have a series of independent arrays (e.g. name(), id(), description() ). I used to be able to check whether a value existed in a specific array by doing name.include?("Mark")
Now that I have moved to a MUCH MORE elegant way to manage these independent arrays (background here: How do I convert an Array with a JSON string into a JSON object (ruby)), I am trying to figure out how to do the same.
In short, I put all the independent arrays in a single structure so that I can reference the content as object().name, object().id, object().description.
However, I can't work out how to check whether the object array has the value "Mark" among its names.
I have tried object.name.include?("Mark") but it doesn't quite like it.
I have also tried to use has_value? but that doesn't seem to work either (likely because it used to be a hash before I imported it into the structure but is no longer a hash - see here: How do I convert an Array with a JSON string into a JSON object (ruby))
Thoughts? How can I check whether object.name contains a certain string?
Thanks.
If you want to find all customers called Mark you can write the following:
customers_named_mark = array_of_customers.select{|c| c.name == 'Mark' }
This will return a potentially empty array.
If you want to find the first customer named Mark, write
customer_named_mark = array_of_customers.detect{|c| c.name == 'Mark' }
This will return the first matching item or nil.
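If all you need is a yes/no answer, like the old name.include?("Mark"), a sketch along these lines may be enough. The Customer struct and the CUSTOMERS list are hypothetical stand-ins for your objects, and name_exists? is an illustrative helper.

```ruby
# Hypothetical Customer struct standing in for the objects in the question.
Customer = Struct.new(:id, :name, :description)

CUSTOMERS = [
  Customer.new(1, "Mark", "first"),
  Customer.new(2, "Anna", "second"),
  Customer.new(3, "Mark", "third")
]

# True if any customer has exactly this name.
def name_exists?(customers, name)
  customers.any? { |c| c.name == name }
end

puts name_exists?(CUSTOMERS, "Mark")
# The same check written as include? over a list of names.
puts CUSTOMERS.map(&:name).include?("Zoe")
# Ids of every customer called Mark.
puts CUSTOMERS.select { |c| c.name == "Mark" }.map(&:id).inspect
```

any? stops at the first match, so it avoids building an intermediate array of names.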

Can we store multiple objects in file?

I am already familiar with How can I save an object to a file?
But what if we have to store multiple objects (say hashes) in a file?
I tried appending YAML.dump(hash) to a file from various locations in my code, but the difficult part is reading it back. As a YAML dump can span many lines, do I have to parse the file myself? That would only complicate the code. Is there a better way to achieve this?
PS: The same issue will persist with Marshal.dump, so I prefer YAML as it's more human readable.
YAML.dump creates a single YAML document. If you have several YAML documents together in a file then you have a YAML stream. So when you appended the results from several calls to YAML.dump together, you created a stream.
If you try reading this back using YAML.load you will only get the first document. To get all the documents back you can use YAML.load_stream, which will give you an array with an entry for each of the documents.
An example:
require 'yaml'
# Write two hashes as two separate YAML documents in the same file.
File.open('data.yml', 'w') do |f|
  YAML.dump({:foo => 'bar'}, f)
  YAML.dump({:baz => 'qux'}, f)
end
After this data.yml will look like this, containing two separate documents:
---
:foo: bar
---
:baz: qux
You can now read it back like this:
all_docs = YAML.load_stream(File.read('data.yml'))
Which will give you an array like [{:foo=>"bar"}, {:baz=>"qux"}].
If you don’t want to load all the documents into an array in one go you can pass a block to load_stream and handle each document as it is parsed:
YAML.load_stream(File.read('data.yml')) do |doc|
  # handle the doc here
end
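Since the question is about appending from various places in the code, here is a small sketch of that round trip. The append_doc and read_docs helpers are illustrative names, and the file lives in a temporary directory only so the example cleans up after itself.

```ruby
require 'yaml'
require 'tmpdir'

# Appends one object to the file as its own YAML document.
def append_doc(path, object)
  File.open(path, 'a') { |f| YAML.dump(object, f) }
end

# Reads every document in the file back into an array.
def read_docs(path)
  YAML.load_stream(File.read(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'data.yml') # hypothetical file name
  append_doc(path, { 'foo' => 'bar' })
  append_doc(path, { 'baz' => 'qux' })
  puts read_docs(path).inspect
end
```

Opening the file in 'a' mode for each call means separate parts of a program can add documents without knowing about each other.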
You could save multiple objects by writing a delimiter between them (something to mark that one object is finished and the next one begins). You could then process the file in two steps:
read the file, splitting it around each delimiter
use YAML to restore the hashes from each chunk
However, this would be a bit cumbersome, and there is a much simpler solution. Let's say you have three hashes to save:
student = { first_name: "John"}
restaurant = { location: "21 Jump Street" }
order = { main_dish: "Happy Meal" }
You can simply put them in an array and then dump them:
objects = [student, restaurant, order]
dump = YAML.dump(objects)
You can restore your objects easily:
saved_objects = YAML.load(dump)
saved_student = saved_objects[0]
Depending on how your objects relate to each other, you may prefer to use a Hash instead of an array (so that you can refer to them by name instead of depending on the order).
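A sketch of the Hash variant, with string keys chosen purely for illustration (dump_named is a hypothetical helper):

```ruby
require 'yaml'

# Serializes several named hashes as one YAML document.
def dump_named(objects)
  YAML.dump(objects)
end

named = {
  'student'    => { 'first_name' => 'John' },
  'restaurant' => { 'location' => '21 Jump Street' }
}

restored = YAML.load(dump_named(named))
puts restored['student']['first_name']
puts restored['restaurant']['location']
```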

unwrapping an object returned from twitter api

While reading some data from the Twitter api, I inserted the data into the file like this
results.each do |f|
  running_count += 1
  myfile.puts "#{f.user_mentions}"
  ...
The results (2 sample lines below) look like this in the file:
[#<Twitter::Entity::UserMention:0x007fda754035803485 @attrs={:screen_name=>"mr_blah_blah", :name=>"mr blah blah", :id=>2142450461, :id_str=>"2141354324324", :indices=>[3, 15]}>]
[#<Twitter::Entity::UserMention:0x007f490580928 @attrs={:screen_name=>"andrew_jackson", :name=>"Andy Jackson", :id=>1607sdfds, :id_str=>"16345435", :indices=>[3, 14]}>]
Since the only information I'm actually interested in is the :screen_name, I was wondering if there's a way to insert only the screen names into the file. Since each line is in array brackets and the screen name is inside @attrs, I tried this:
myfile.puts "#{f.user_mentions[0]@attrs{"screen_name"}}"
This didn't work, and I didn't expect it to, as I'm not really sure whether that's technically an array. Can you suggest how it would be done?
You need to access the @attrs instance variable in the Twitter UserMention object. If you want to puts the screen name from the first object, based on your current output, I would write:
myfile.puts "#{f.user_mentions[0].attrs[:screen_name]}"
Also, posting the code that shows how results is produced would help you get a definite answer quickly. Cheers!
Assuming that results is an array of Twitter::Entity::UserMention:
results.each do |r|
  myfile.puts r.screen_name
end
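If results is instead a list of tweets, as the loop in the question suggests, each tweet can carry zero, one or several mentions. The sketch below handles that case. FakeMention and FakeTweet are stand-ins defined here so the example runs without the gem; they model only the user_mentions and screen_name methods that appear in this thread, and mentioned_screen_names is a hypothetical helper.

```ruby
# Stand-ins for the gem's objects; only the two methods used below are modeled.
FakeMention = Struct.new(:screen_name)
FakeTweet   = Struct.new(:user_mentions)

# Returns every mentioned screen name across all tweets, in order.
def mentioned_screen_names(tweets)
  tweets.flat_map { |tweet| tweet.user_mentions.map(&:screen_name) }
end

tweets = [
  FakeTweet.new([FakeMention.new("mr_blah_blah")]),
  FakeTweet.new([]), # a tweet with no mentions
  FakeTweet.new([FakeMention.new("andrew_jackson")])
]

mentioned_screen_names(tweets).each { |name| puts name }
```

With the real objects you would pass results in place of tweets and write each name with myfile.puts.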

Extract JSON values from remote api with Ruby

I'm trying to grab some data from last.fm and use it in a simple Sinatra app. I've worked out how to open the document, but I'm having issues extracting the data in Ruby. Here is the first part of the API data; I'd like to grab the name:
{"similarartists":{"artist":[{"name":"Sonny & Cher"}]}
This is just an extract of the response. I'm using this in my .rb file:
require 'json'
require 'open-uri'
data = JSON.parse(open("http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=editors&api_key=xxx&format=json").read)
puts data["similarartists"]["artist"]["name"]
It doesn't seem to be working: I get "can't convert String into Integer (TypeError)" on Ruby 1.9.3, but the name in the JSON isn't an integer. If I just put the following:
puts data["similarartists"]["artist"]
It returns the whole thing, but I want to grab inside of that and get the name.
"name"=>"Interpol"
I don't understand why it would complain about integers when the name is a string? Hope someone can help me!
Based on the comments thread, the issue is a misunderstanding of the structure of the data returned from the API call.
The structure has an array of artists under the artist key, so to get at the first name you need to do:
data['similarartists']['artist'][0]['name']
Note, though, that you should only do that if you are sure there will be only one artist. The shape of the returned data suggests that won't always be the case, so depending on your use case you might be better off pulling all the names, with something like:
data['similarartists']['artist'].map {|a| a['name']}.join(',')
That will join all of the artist names into one comma-separated string.
In the future, you can track down this kind of issue by inspecting the full structure of the returned data and checking that it matches what your code expects. The API docs may help here too.
You might also check whether someone has made a gem for accessing the API. A gem will often wrap this raw output and give you a nicer object to work with. I suggest searching GitHub for a last.fm gem.
The problem is that you are trying to access an Array with the index "name". Ruby tries to convert this to an Integer and fails, which results in the error message you are seeing.
If you check data["similarartists"]["artist"].class, you will see that it returns Array. So what is happening is that JSON.parse() created an Array of Hashes as the value of data["similarartists"]["artist"]. To access all of the artist names, you can simply iterate through this array:
require 'json'
require 'open-uri'
data = JSON.parse(open("http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=editors&api_key=29da5a0e01ca2d1524cac596d5462d67&format=json").read)
# iterate through the Array of returned artists and print their names
data["similarartists"]["artist"].each do |artist|
  puts artist["name"]
end
# output
# Interpol
# White Lies
# The Cinematics
# Smith & Burrows
# The National
# Julian Plenti
# She Wants Revenge
# etc ...
If you only want the first entry for Interpol you can just use index [0]:
puts data["similarartists"]["artist"][0]["name"]
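To try this without calling the API, here is a sketch against a hard-coded sample. SAMPLE_SIMILAR is a hypothetical, heavily trimmed response, and artist_names is an illustrative helper that returns an empty array when the expected keys are missing. It relies on Hash#dig, which needs Ruby 2.3 or later.

```ruby
require 'json'

# Hypothetical trimmed response; the real API returns many more fields.
SAMPLE_SIMILAR = '{"similarartists":{"artist":[{"name":"Interpol"},{"name":"White Lies"}]}}'

# Returns all artist names, or an empty array if the expected keys are missing.
def artist_names(data)
  artists = data.dig("similarartists", "artist") || []
  artists.map { |artist| artist["name"] }
end

data = JSON.parse(SAMPLE_SIMILAR)
# The value under "artist" is an Array, which is why it needs an integer index.
puts data["similarartists"]["artist"].class
puts artist_names(data).join(", ")
```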
