Extract JSON values from remote api with Ruby - ruby

I'm trying to grab some data from last.fm and use it in a simple Sinatra app. I've worked out how to open the document, but I'm having issues extracting the data in Ruby. Here is the first part of the API data; I'd like to grab the name:
{"similarartists":{"artist":[{"name":"Sonny & Cher"}]}}
This is just an extract of the return, I'm using this in my rb file:
require 'json'
require 'open-uri'
data = JSON.parse(open("http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=editors&api_key=xxx&format=json").read)
puts data["similarartists"]["artist"]["name"]
It doesn't seem to be working: I get can't convert String into Integer (TypeError) on Ruby 1.9.3, but the name in the JSON isn't an integer. If I just put the following:
puts data["similarartists"]["artist"]
It returns the whole thing, but I want to grab inside of that and get the name.
"name"=>"Interpol"
I don't understand why it would complain about integers when the name is a string. Hope someone can help me!

Based on the comments thread, the issue is a misunderstanding of the structure of the data returned from the API call.
Specifically, the structure has an array of artists under the artist key, so to get at the name you need to do:
data['similarartists']['artist'][0]['name']
Note, though, that this returns only the first artist, so you should only do that if you are sure one artist is all you need. The nature of the return data suggests there will usually be several, so depending on your use you might be better off pulling all the names with something like:
data['similarartists']['artist'].map {|a| a['name']}.join(',')
That will join all of the artist names together comma separated.
In the future, you can track this kind of issue down by looking at the full structure of the return data and making sure it matches the structure you expect. The API docs may help here too.
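A quick way to see the shape of the parsed data is to ask each level for its class. The sketch below uses a small hard-coded JSON string (an assumption standing in for the real API response) so that it runs without a network call or an API key:

```ruby
require 'json'

# Hypothetical sample shaped like the last.fm response in the question
sample = '{"similarartists":{"artist":[{"name":"Interpol"},{"name":"White Lies"}]}}'
data = JSON.parse(sample)

# Walk down one level at a time and check what each value is
top_class     = data.class                                 # Hash
artists_class = data["similarartists"]["artist"].class     # Array
first_class   = data["similarartists"]["artist"][0].class  # Hash

puts top_class, artists_class, first_class

# Once the Array is visible, collecting the names is straightforward
names = data["similarartists"]["artist"].map { |a| a["name"] }
puts names.join(",")
```

Seeing Array in the middle of the chain is the hint that an index or an iteration is needed before asking for "name".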
You also might check if someone has made a gem for accessing the API. Often a gem will up-level some of this raw output and give you a nice object to work with. I suggest searching GitHub for a last.fm gem.

The problem is that you are trying to access an Array with the index "name". Ruby tries to convert this to an Integer and fails, which results in the error message you are seeing.
If you check data["similarartists"]["artist"].class you will see that it returns Array. So what is happening is that the JSON.parse() call created an Array of Hashes as the value of data["similarartists"]["artist"]. To access all of the artist names you can simply iterate through this array:
require 'json'
require 'open-uri'
data = JSON.parse(open("http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=editors&api_key=29da5a0e01ca2d1524cac596d5462d67&format=json").read)
# iterate through the Array of returned artists and print their names
data["similarartists"]["artist"].each do |artist|
  puts artist["name"]
end
# output
# Interpol
# White Lies
# The Cinematics
# Smith & Burrows
# The National
# Julian Plenti
# She Wants Revenge
# etc ...
If you only want the first entry (Interpol), you can just use index [0]:
puts data["similarartists"]["artist"][0]["name"]
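The error itself can be reproduced without calling the API. This is a minimal sketch with a made-up two-element array; the exact wording of the message depends on the Ruby version (1.9.3 says "can't convert String into Integer", newer versions say "no implicit conversion of String into Integer"):

```ruby
# Hypothetical stand-in for data["similarartists"]["artist"]
artists = [{ "name" => "Interpol" }, { "name" => "White Lies" }]

error_class = nil
begin
  # Array#[] expects an Integer index, so a String key raises TypeError
  artists["name"]
rescue TypeError => e
  error_class = e.class
  puts "#{e.class}: #{e.message}"
end

# Indexing the Array first, then the Hash, works as intended
first_name = artists[0]["name"]
puts first_name
```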

Related

Ruby extract single key value from first block in json

I'm parsing a very large JSON output from an application API and end up with a Ruby hash similar to the sanitized version below:
{"log_entries"=>
[{"id=>"SDF888B2B2KAZZ0AGGB200",
"type"=>"warning",
"summary"=>"Things happened",
"created"=>"2017-07-11T18:40:31Z",
"person"=>
{"id"=>"44bAN8",
"name"=>"Harry"}
"system"=>"local",
"service"=>"syslog"
{"id=>"HMB001NBALLB81MMLLABLK",
"type"=>"info",
"summary"=>"Notice",
"created"=>"2017-06-02T11:23:21Z",
"person"=>
{"id"=>"372z1j",
"name"=>"Sally"}
"system"=>"local",
"service"=>"syslog"}]},
"other"=>200,
"set"=>0,
"more"=>false,
"total"=nil}
I just need to be able to print the value of the "created" key only in the first block. Meaning, when the program exits, I need it to print "2017-07-11T18:40:31Z." I've googled a lot but wasn't successful in finding anything. I've tried something like:
puts ["log_entries"]["id"]["created"]
My expectation was to print all of them to start somewhere and even that yields an error. Forgive me, I don't use ruby much.
Since log_entries is an array you can just access the first element and get its created value.
Assuming the variable result holds the whole hash (the JSON you parse from the API):
puts result['log_entries'][0]['created']
will print out the first date. Now you might want to guard that for cases where log_entries is empty, so wrap it in an if:
if result['log_entries'].any?
  puts result['log_entries'][0]['created']
end
Your JSON is not in a valid format. But assuming you have the right format, the following should work:
result["log_entries"].collect{|entry| entry["created"]}
=> ["2017-07-11T18:40:31Z", "2017-06-02T11:23:21Z"]
The above code will collect all the created dates and give you an array.
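If the "log_entries" key might be missing altogether, calling .any? on nil would raise NoMethodError. One alternative, assuming Ruby 2.3 or later, is Hash#dig, which returns nil as soon as any step is missing. The hash below is a hypothetical cleaned-up version of the data in the question:

```ruby
# Hypothetical, well-formed version of the sanitized data from the question
result = {
  "log_entries" => [
    { "id" => "SDF888B2B2KAZZ0AGGB200", "created" => "2017-07-11T18:40:31Z" },
    { "id" => "HMB001NBALLB81MMLLABLK", "created" => "2017-06-02T11:23:21Z" }
  ],
  "total" => nil
}

# dig walks Hash key -> Array index -> Hash key, stopping at the first nil
first_created = result.dig("log_entries", 0, "created")
puts first_created if first_created

# Both an empty list and a missing key simply give nil instead of raising
empty_created   = { "log_entries" => [] }.dig("log_entries", 0, "created")
missing_created = {}.dig("log_entries", 0, "created")
```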

Bizarre field switch between cli and file output in Ruby

I'm having a strange issue with a Ruby script I'm working with. In this script I parse an iTunes Library XML file and form objects for Artists, Albums and Tracks. In my Album class, I have two numeric fields, YEAR and TRACK_COUNT.
My script parses the two fields correctly. For example, here is its output:
#<Album:0x007f59b1472a18 @compilation=false, @title="Straight Out Of Hell", @year=2013, @track_count=13, @trackList=[], @coverList=[]>
When I output this same object to a file, it gets mangled into this (here in JSON format):
{"compilation":false,"title":"Straight Out Of Hell","year":13,"track_count":13,"trackList":[],"coverList":[]}]
As you can see, the YEAR field gets overwritten with the value of the TRACK_COUNT field. I'm going crazy with this, as I don't make any change to this field between these outputs!
UPDATE
As asked by @Amadan...
http://pastebin.com/1FUuvaCr Biblioteca.xml (EXCERPT)
http://pastebin.com/F8wgu6bz Track.rb
http://pastebin.com/3qhd4TRU Song.rb
http://pastebin.com/RNf5S7AZ dependencies.rb
http://pastebin.com/haXPpJgN Cover.rb
http://pastebin.com/1JYtT1nn Artist.rb
http://pastebin.com/qsgLsAJa Album.rb
http://pastebin.com/eiUAMfwR app.rb (MAIN SCRIPT)
This is happening because your source file is not as clean as you believe it to be. In some albums in the source XML, "Track Count" and "Year" are appearing on the same line, without a recognized line break between them. So you might have a line like this:
<key>Track Count</key><integer>12</integer><key>Year</key><integer>2006</integer>
When your if-else-if ladder asks if "track count" appears in the line, it does, so you're grabbing the first <integer>something</integer> match on the line. This works fine. But when you try to extract the year out of this line, you're again asking for the first <integer> on the line, which is the Track Count.
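The effect is easy to reproduce on the sample line above. The sketch below is only an illustration of the failure and of one way to pair each key with its own value on a single line; the regular expressions are assumptions written for this example, and a real XML parser is still the better tool:

```ruby
line = '<key>Track Count</key><integer>12</integer><key>Year</key><integer>2006</integer>'

# Taking the first <integer> on the line always yields the Track Count,
# even when the code believes it is reading the Year
first_integer = line[/<integer>(\d+)<\/integer>/, 1]
puts first_integer

# Pairing every <key> with the <integer> that directly follows it
# keeps the two values apart
pairs = line.scan(/<key>([^<]+)<\/key><integer>(\d+)<\/integer>/).to_h
puts pairs["Track Count"]
puts pairs["Year"]
```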
The bigger problem is that you're attempting to parse an XML file line-by-line, and that's not how they're meant to be read. Install the nokogiri gem, require it, and parse the file's contents (Nokogiri::XML expects the XML itself, as a string or IO object, not a file name):
require 'nokogiri'
data = Nokogiri::XML(File.read('Biblioteca.xml'))
Now you can get to any information contained in the document. The official tutorials on using Nokogiri are here: http://www.nokogiri.org/tutorials/
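Use this method to parse your file:
def parse(filename)
  # Nokogiri::XML expects XML content (a string or IO), so read the file first
  xml = Nokogiri::XML(File.read(filename))
  # keep only the keys whose text is exactly four digits
  songs = xml.css('dict key').select { |key| key.text =~ /^[0-9]{4}$/ }
  songs.map do |song|
    info = {}
    # each <key> inside the song's <dict> is followed by its value element
    song.next_element.css('key').each do |attribute|
      info[attribute.text] = attribute.next_element.text
    end
    info
  end
end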
This will create a list of song hashes. Here are some examples for how to use it:
# load the two songs in your example file
songs = parse('Biblioteca.xml')
# Get the year of the first song (values come back as strings)
songs[0]['Year'] #=> "2006"
# Get the Track Count of the second song's album
songs[1]['Track Count'] #=> "12"
# Get the Name of the second song
songs[1]['Name'] #=> 'Baby Come On'
# Get the Album name of the second song
songs[1]['Album'] #=> 'When Your Heart Stops Beating'
From here, you can easily put info into your song objects. Let me know if you have any more questions.
I've found a library for iTunes' dodgy plist XML standard: Nokogiri-plist. Working fine now :D

unwrapping an object returned from twitter api

While reading some data from the Twitter api, I inserted the data into the file like this
results.each do |f|
  running_count += 1
  myfile.puts "#{f.user_mentions}"
...
The results (2 sample lines below) look like this in the file
[#<Twitter::Entity::UserMention:0x007fda754035803485 @attrs={:screen_name=>"mr_blah_blah", :name=>"mr blah blah", :id=>2142450461, :id_str=>"2141354324324", :indices=>[3, 15]}>]
[#<Twitter::Entity::UserMention:0x007f490580928 @attrs={:screen_name=>"andrew_jackson", :name=>"Andy Jackson", :id=>1607sdfds, :id_str=>"16345435", :indices=>[3, 14]}>]
Since the only information I'm actually interested in is the :screen_name, I was wondering if there's a way that I could insert only the screen names into the file. Since each line is in array brackets and I'm looking for the screen name inside the @attrs, I did this
myfile.puts "#{f.user_mentions[0]@attrs{"screen_name"}}"
This didn't work, and I didn't expect it to, as I'm not really sure whether that's technically an array, etc. Can you suggest how it would be done?
You need to access the @attrs instance variable in the Twitter UserMention object. If you want to puts the screen name from the first object, based on your current output, I would write
myfile.puts "#{f.user_mentions[0].attrs[:screen_name]}"
Also, posting the code that shows how results is built would help you get a definite answer quickly. Cheers!
Assuming that results is an array of Twitter::Entity::UserMention:
results.each do |r|
  myfile.puts r.screen_name
end
end
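In the question, each item in results appears to be a tweet whose user_mentions is itself an array, which may be empty or hold several mentions. The sketch below shows one way to write every screen name under that assumption. Mention and Tweet are hypothetical Struct stand-ins for the gem's objects (so the example runs without the twitter gem), and a StringIO stands in for myfile:

```ruby
require 'stringio'

# Hypothetical stand-ins for the Twitter gem's objects
Mention = Struct.new(:screen_name)
Tweet   = Struct.new(:user_mentions)

results = [
  Tweet.new([Mention.new("mr_blah_blah")]),
  Tweet.new([Mention.new("andrew_jackson"), Mention.new("someone_else")]),
  Tweet.new([]) # a tweet with no mentions writes nothing
]

myfile = StringIO.new # stand-in for the real File object

results.each do |tweet|
  # iterate over every mention instead of assuming index 0 exists
  tweet.user_mentions.each do |mention|
    myfile.puts mention.screen_name
  end
end

written = myfile.string
puts written
```

Iterating avoids a NoMethodError on tweets without mentions, which user_mentions[0] would return as nil.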

Weird JSON parsing issues with Ruby

I'm downloading content from a webpage that seems to be in JSON. It is a large file with the following format:
"address1":"123 Street","address2":"Apt 1","city":"City","state":"ST","zip":"xxxxx","country":"US"
There are about 1000 of these entries, where each entry is contained within brackets. When I download the page using RestClient.get (open-uri for some reason was throwing an HTTP 500 error), the data is in the following format:
\"address1\":\"123 Street\",\"address2\":\"Apt 1\",\"city\":\"City\",\"state\":\"ST\",\"zip\":\"xxxxx\",\"country\":\"US\"
When I then use the json class
parsed = JSON.parse(data_out)
it completely scrambles both the order of entries within the data structure, and also the order of the objects within each entry, for example:
"address1"=>"123 Street", "city"=>"City", "country"=>"US", "address2"=>"Apt 1"
If instead I use
data_j=data_out.to_json
then I get:
\\\"address1\\\":\\\"123 Street\\\",\\\"address2\\\":\\\"Apt 1\\\",\\\"city\\\":\\\"City\\\",\\\"state\\\":\\\"ST\\\",\\\"zip\\\":\\\"xxxxx\\\",\\\"country\\\":\\\"US\\\"
Further, only using the json class seems to allow me to select the entries I want:
parsed[1]["address1"]
=> "123 Street"
data_j[1]["address1"]
TypeError: can't convert String into Integer
from (irb):17:in `[]'
from (irb):17
from :0
Any idea what's going on? I guess since the json commands are working I can use them, but it is disconcerting that it's scrambling the entries and the order of the objects.
Although the data appears ordered in string form, it represents an unordered dataset. The line:
parsed = JSON.parse(data_out)
which you use is the correct way to convert the string form into something usable in Ruby. I cannot see the full structure from your example, so I don't know whether the top level is an array or id-based hash. I suspect the latter since you say it becomes unordered when you view from Ruby. Therefore, if you knew which part of the address you were interested in you might have code like this:
# Writes all the cities
parsed.each do |id,data|
  puts data["city"]
end
If the outer structure is an array, you'd do this:
# Writes all the cities
parsed.each do |data|
  puts data["city"]
end
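Both cases can be handled by one small helper, and the same sketch shows why to_json is the wrong direction here: called on a String, it encodes that string again rather than decoding it. The sample data and the helper name entries_of are made up for illustration:

```ruby
require 'json'

# Hypothetical sample data in both shapes the answer considers
array_json = '[{"address1":"123 Street","city":"City"},{"address1":"9 Road","city":"Town"}]'
hash_json  = '{"a1":{"address1":"123 Street","city":"City"},"b2":{"address1":"9 Road","city":"Town"}}'

# Return the list of entries whether the top level is an Array or a Hash
def entries_of(parsed)
  parsed.is_a?(Hash) ? parsed.values : parsed
end

array_cities = entries_of(JSON.parse(array_json)).map { |entry| entry["city"] }
hash_cities  = entries_of(JSON.parse(hash_json)).map { |entry| entry["city"] }
puts array_cities.inspect
puts hash_cities.inspect

# to_json on a String produces another String with the quotes escaped,
# so there is no Hash or Array to index into afterwards
reencoded = array_json.to_json
puts reencoded.class
```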

Ruby and Excel Data Extraction

I am learning Ruby and trying to manipulate Excel data.
My goal:
To be able to extract email addresses from an Excel file and place them in a text file, one per line, with a comma added to the end.
My ideas:
I think my answer lies in the use of spreadsheet and File.new.
What I am looking for is direction. I would like to hear any tips, or rather hints, to accomplish my goal. Thanks.
Please do not post exact code; I'm only looking for direction and would like to figure it out myself...
thanks, karen
UPDATE::
So, regex seems to be able to find all matching strings and store them in an array. I'm having some trouble setting that up but should be able to figure it out. For right now, to get started, I will extract only the column labeled "E Mail". The question I have now is:
`parse_csv = CSV.parse(read_csv, :headers => true)`
The default value for :skip_blanks is set to false. I need to set it to true, but nowhere can I find the correct syntax for doing so. I was assuming something like
`parse_csv = CSV.parse(read_csv, :headers => true :skip_blanks => true)`
But no.....
Save your Excel file as CSV (comma-separated values) and work with Ruby's libraries.
Besides spreadsheet (which can read and write), you can read Excel and other file types with RemoteTable.
gem install remote_table
and
require 'remote_table'
t = RemoteTable.new('/path/to/file.xlsx', headers: :first_row)
When you write the CSV, as @aug2uag says, you can use Ruby's standard library (no gem install required):
require 'csv'
puts [name, email].to_csv
Personally, I'd keep it as simple as possible and use a CSV.
Here is some pseudocode of how that would work:
read in your file line by line
extract your fields using regex, or cell count (depending on how consistent the email address location is), and insert into an array
iterate through the array and write the values in the fashion you wish (to console, or file)
The code in the comment you had is a great start, however, puts will only write to console, not file. You will also need to figure out how you are going to know you are getting the email address.
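For reference, here is one possible shape of those steps using only the csv library. It is a sketch under assumptions: the "E Mail" column name comes from the question, while the sample rows and addresses are made up. Note that the options passed to CSV.parse are separated by commas, which is what the :skip_blanks attempt in the question was missing:

```ruby
require 'csv'

# Hypothetical CSV export of the spreadsheet, including one blank line
csv_text = <<~CSV
  Name,E Mail
  Karen,karen@example.com

  Sam,sam@example.com
CSV

# Both options go in the same argument list, separated by a comma
table = CSV.parse(csv_text, headers: true, skip_blanks: true)

# Pull the one column of interest and drop rows where it is empty
emails = table.map { |row| row["E Mail"] }.compact

# One address per line, each followed by a comma
output = emails.map { |email| "#{email}," }.join("\n")
puts output
# To write a file instead of the console: File.write("emails.txt", output)
```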
Hope this helps.

Resources