Working with nested hashes in Rails 3 - ruby

I'm working with the Koala gem and the Facebook Graph API, and I want to break down the results I get for a user's feed into separate variables for inserting into a MySQL database, probably using ActiveRecord. Here is the code I have so far:
@token = Service.where(:provider => 'facebook', :user_id => session[:user_id]).first.token
@graph = Koala::Facebook::GraphAPI.new(@token)
@feeds = params[:page] ? @graph.get_page(params[:page]) : @graph.get_connections("me", "home")
And here is what @feeds looks like:
[{"id"=>"1519989351_1799856285747", "from"=>{"name"=>"April Daggett Swayne", "id"=>"1519989351"},
"picture"=>"http://photos-d.ak.fbcdn.net/hphotos-ak-ash4/270060_1799856805760_1519989351_31482916_3866652_s.jpg",
"link"=>"http://www.facebook.com/photo.php?fbid=1799856805760&set=a.1493877356465.2064294.1519989351&type=1", "name"=>"Mobile Uploads",
"icon"=>"http://static.ak.fbcdn.net/rsrc.php/v1/yx/r/og8V99JVf8G.gif", "type"=>"photo", "object_id"=>"1799856805760", "application"=>{"name"=>"Facebook for Android",
"id"=>"350685531728"}, "created_time"=>"2011-07-03T03:14:04+0000", "updated_time"=>"2011-07-03T03:14:04+0000"}, {"id"=>"2733058_10100271380562998", "from"=>{"name"=>"Joshua Ramirez",
"id"=>"2733058"}, "message"=>"Just posted a photo",
"picture"=>"http://platform.ak.fbcdn.net/www/app_full_proxy.php?app=124024574287414&v=1&size=z&cksum=228788edbab39cb34861aecd197ff458&src=http%3A%2F%2Fimages.instagram.com%2Fmedia%2F2011%2F07%2F02%2F2ad9768378cf405fad404b63bf5e2053_7.jpg",
"link"=>"http://instagr.am/p/G1tp8/", "name"=>"jtrainexpress's photo", "caption"=>"instagr.am",
"icon"=>"http://photos-e.ak.fbcdn.net/photos-ak-snc1/v27562/10/124024574287414/app_2_124024574287414_6936.gif", "actions"=>[{"name"=>"Comment",
"link"=>"http://www.facebook.com/2733058/posts/10100271380562998"}, {"name"=>"Like", "link"=>"http://www.facebook.com/2733058/posts/10100271380562998"}], "type"=>"link",
"application"=>{"name"=>"Instagram", "id"=>"124024574287414"}, "created_time"=>"2011-07-03T02:07:37+0000", "updated_time"=>"2011-07-03T02:07:37+0000"},
{"id"=>"588368718_10150230423643719", "from"=>{"name"=>"Eric Bailey", "id"=>"588368718"}, "link"=>"http://www.facebook.com/pages/Martis-Camp/105474549513998", "name"=>"Martis Camp",
"caption"=>"Eric checked in at Martis Camp.", "description"=>"Rockin the pool", "icon"=>"http://www.facebook.com/images/icons/place.png", "actions"=>[{"name"=>"Comment",
"link"=>"http://www.facebook.com/588368718/posts/10150230423643719"}, {"name"=>"Like", "link"=>"http://www.facebook.com/588368718/posts/10150230423643719"}],
"place"=>{"id"=>"105474549513998", "name"=>"Martis Camp", "location"=>{"city"=>"Truckee", "state"=>"CA", "country"=>"United States", "latitude"=>39.282813917575,
"longitude"=>-120.16736760768}}, "type"=>"checkin", "application"=>{"name"=>"Facebook for iPhone", "id"=>"6628568379"}, "created_time"=>"2011-07-03T01:58:32+0000",
"updated_time"=>"2011-07-03T01:58:32+0000", "likes"=>{"data"=>[{"name"=>"Mike Janes", "id"=>"725535294"}], "count"=>1}}]
I have looked around for clues on this, and haven't found it yet (but I'm still working on my stackoverflow-foo). Any help would be greatly appreciated.

That isn't a Ruby Hash; that's a fragment of a JSON string. First you need to decode it into a Ruby data structure:
# If your JSON string is in json...
h = ActiveSupport::JSON.decode(json) # Or your favorite JSON decoder.
Now you'll have a Hash in h so you can access it like any other Hash:
array = h['data']
puts array[0]['id']
# prints out 1111111111_0000000000000
puts array[0]['from']['name']
# prints Jane Doe
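Since the stated goal was to store each feed entry in a MySQL table through ActiveRecord, here is a minimal sketch of that last step, assuming the feed is already an array of hashes (as Koala returns); the Post model and its column names are made up for illustration:
# Hypothetical model: class Post < ActiveRecord::Base
# with columns fb_id, author_name, author_fb_id, message, link, post_type, created_time
@feeds.each do |entry|
  Post.create(
    :fb_id        => entry['id'],
    :author_name  => entry['from']['name'],
    :author_fb_id => entry['from']['id'],
    :message      => entry['message'],      # nil for entries without a message
    :link         => entry['link'],
    :post_type    => entry['type'],
    :created_time => entry['created_time']
  )
end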

Related

How to pull data from tags based on other tags

I have the following example document:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<n1:Form109495CTransmittalUpstream xmlns="urn:us:gov:treasury:irs:ext:aca:air:7.0" xmlns:irs="urn:us:gov:treasury:irs:common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage IRS-Form1094-1095CTransmitterUpstreamMessage.xsd" xmlns:n1="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage">
<Form1095CUpstreamDetail RecordType="String" lineNum="1">
<RecordId>1</RecordId>
<CorrectedInd>0</CorrectedInd>
<irs:TaxYr>2015</irs:TaxYr>
<EmployeeInfoGrp>
<OtherCompletePersonName>
<PersonFirstNm>JOHN</PersonFirstNm>
<PersonMiddleNm>B</PersonMiddleNm>
<PersonLastNm>Doe</PersonLastNm>
</OtherCompletePersonName>
<PersonNameControlTxt/>
<irs:TINRequestTypeCd>INDIVIDUAL_TIN</irs:TINRequestTypeCd>
<irs:SSN>123456790</irs:SSN>
</Form1095CUpstreamDetail>
<Form1095CUpstreamDetail RecordType="String" lineNum="1">
<RecordId>2</RecordId>
<CorrectedInd>0</CorrectedInd>
<irs:TaxYr>2015</irs:TaxYr>
<EmployeeInfoGrp>
<OtherCompletePersonName>
<PersonFirstNm>JANE</PersonFirstNm>
<PersonMiddleNm>B</PersonMiddleNm>
<PersonLastNm>DOE</PersonLastNm>
</OtherCompletePersonName>
<PersonNameControlTxt/>
<irs:TINRequestTypeCd>INDIVIDUAL_TIN</irs:TINRequestTypeCd>
<irs:SSN>222222222</irs:SSN>
</EmployeeInfoGrp>
</Form1095CUpstreamDetail>
</n1:Form109495CTransmittalUpstream>
Using Nokogiri I want to extract the values of <PersonFirstNm>, <PersonLastNm> and <irs:SSN> for each <Form1095CUpstreamDetail>, based on the <RecordId>.
I tried removing namespaces as well. I posted a small snippet, but I have tried many iterations of working through the XML with no success. This is my first time using XML, so I realize I am likely missing something easy.
When I set my XPath:
require 'nokogiri'
submission_doc = Nokogiri::XML(open('1094C_Request.xml'))
submissions = submission_doc.remove_namespaces!
nodes = submissions.xpath('//Form1095CUpstreamDetail')
I do not seem to have any association between the RecordId and the tags mentioned above, and I am stuck on where to go next.
The fields are not listed as children for the RecordId, so I can't think of how to approach obtaining their values. I am including the full document as an example to make sure I am not excluding anything.
I have an array of values, and I would like to pull the three tags mentioned above if the RecordId is contained within the array of numbers.
Nokogiri makes it pretty easy to do what you want (assuming the XML is syntactically correct). I'd do something like:
require 'nokogiri'
require 'pp'
doc = Nokogiri::XML(<<EOT)
<n1:Form109495CTransmittalUpstream xmlns="urn:us:gov:treasury:irs:ext:aca:air:7.0" xmlns:irs="urn:us:gov:treasury:irs:common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage IRS-Form1094-1095CTransmitterUpstreamMessage.xsd" xmlns:n1="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage">
<Form1095CUpstreamDetail RecordType="String" lineNum="1">
<RecordId>1</RecordId>
<PersonFirstNm>JOHN</PersonFirstNm>
<PersonLastNm>Doe</PersonLastNm>
<irs:SSN>123456790</irs:SSN>
</Form1095CUpstreamDetail>
<Form1095CUpstreamDetail RecordType="String" lineNum="1">
<RecordId>2</RecordId>
<PersonFirstNm>JANE</PersonFirstNm>
<PersonLastNm>DOE</PersonLastNm>
<irs:SSN>222222222</irs:SSN>
</Form1095CUpstreamDetail>
</n1:Form109495CTransmittalUpstream>
EOT
info = doc.search('Form1095CUpstreamDetail').map{ |form|
  {
    record_id: form.at('RecordId').text,
    person_first_nm: form.at('PersonFirstNm').text,
    person_last_nm: form.at('PersonLastNm').text,
    ssn: form.at('irs|SSN').text
  }
}
pp info
# >> [{:record_id=>"1",
# >> :person_first_nm=>"JOHN",
# >> :person_last_nm=>"Doe",
# >> :ssn=>"123456790"},
# >> {:record_id=>"2",
# >> :person_first_nm=>"JANE",
# >> :person_last_nm=>"DOE",
# >> :ssn=>"222222222"}]
While it's possible to do this with XPath, Nokogiri's implementation of CSS selectors tends to result in more easily read selectors, which translates into easier maintenance, which is a very good thing.
You'll see the use of | in 'irs|SSN', which is Nokogiri's way of specifying a namespace in a CSS selector. This is documented in "Namespaces".
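The asker also mentioned wanting only the records whose RecordId appears in an array of values. Assuming the info array built above, a small sketch of that filtering step (the wanted_ids list is hypothetical):
wanted_ids = ["1"]  # hypothetical list of RecordId values to keep
matching = info.select { |record| wanted_ids.include?(record[:record_id]) }
# => [{:record_id=>"1", :person_first_nm=>"JOHN", :person_last_nm=>"Doe", :ssn=>"123456790"}]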
First of all, the XML validator reports an error:
The default (no prefix) Namespace URI for XPath queries is always '' and it cannot be redefined to 'urn:us:gov:treasury:irs:ext:aca:air:7.0'.
so you must set this default xmlns to "".
You can use this code:
require 'nokogiri'
doc = Nokogiri::XML(open('1094C_Request.xml'))
doc.namespaces['xmlns'] = ''
details = doc.xpath("//:Form1095CUpstreamDetail")
elem_a = ["PersonFirstNm", "PersonLastNm", "irs:SSN"]
output = details.each_with_object({}) do |element, exp|
  exp[element.xpath("./:RecordId").text] = elem_a.each_with_object({}) do |elem_n, exp_h|
    exp_h[elem_n] = element.xpath(".//#{elem_n.include?(':') ? elem_n : ":#{elem_n}"}").text
  end
end
p output
# {
# "1" => {"PersonFirstNm" => "JOHN", "PersonLastNm" => "Doe", "irs:SSN" => "123456790"},
# "2" => {"PersonFirstNm" => "JANE", "PersonLastNm" => "DOE", "irs:SSN" => "222222222"}
# }
I hope this helps

Building Data Structure out of JSON output

Hi, I'm working on writing a script that pulls data out of a ticketing system. Once it pulls the data, it analyzes the content, and if the content matches specific criteria it needs to build a data structure file that will be dumped on the same server.
I was able to parse the data in JSON format; the content is listed below:
[{"id"=>10423,
"type"=>"Ticket",
"lastUpdated"=>"2014-11-04T10:58:47Z",
"shortSubject"=>"FOO STATUS UPDATE",
"shortDetail"=>"Reply to this message if all systems are functional..",
"displayClient"=>"No Client",
"updateFlagType"=>0,
"prettyLastUpdated"=>"54 minutes ago",
"latestNote"=>
{"id"=>16850,
"type"=>"TechNote",
"mobileListText"=>"<b>t. trust: </b> All Systems are OK",
"noteColor"=>"clear",
"noteClass"=>"bubble right"}},
{"id"=>10422,
"type"=>"Ticket",
"lastUpdated"=>"2014-11-04T10:54:07Z",
"shortSubject"=>"FOO STATUS UPDATE",
"shortDetail"=>"Reply to this message if all systems are functional..",
"displayClient"=>"No Client",
"updateFlagType"=>0,
"prettyLastUpdated"=>"58 minutes ago",
"latestNote"=>nil},
{"id"=>10421,
"type"=>"Ticket",
"lastUpdated"=>"2014-11-04T10:53:17Z",
"shortSubject"=>"FOO STATUS UPDATE",
"shortDetail"=>"Reply to this message if all systems are functional..",
"displayClient"=>"No Client",
"updateFlagType"=>0,
"prettyLastUpdated"=>"59 minutes ago",
"latestNote"=>nil}]
In the data above you can see that each ticket has an id, lastUpdated, shortSubject, shortDetail and latestNote. The value of latestNote will be nil if no one has replied to the ticket, but if someone does reply then mobileListText will have content.
So what I need to do is this: once the script gets this data, it looks for a subject that matches "FOO STATUS UPDATE". If that matches, it checks whether the content of shortDetail matches "Reply to this message if all systems are functional..". If it does, it looks at latestNote. If latestNote is nil, it creates a log entry specifying the date and time the script ran, the id of the ticket in this state, and a message saying the ticket has not been replied to. But if the latest note has the value "mobileListText"=>"t. trust: All Systems are OK", then it creates the following data structure:
{"LastUpdate":1415130257,"Service":[{"time":"11-04-2014 10:58:47 GMT","region:":"","id":"","description":"All Systems are OK","service":""},{"time":"11-04-2014 10:54:07 GMT","region:":"","id":"","description":"All Systems are OK","service":""},{"time":"11-04-2014 10:53:17 GMT","region:":"","id":"","description":"All Systems are OK","service":""}]}
I'm able to do part of this. However, based on the data above, only one ticket has "All Systems are OK", meaning that only one of the tickets has been replied to, so it should only write something like this:
{"LastUpdate":1415130257,"Service":[{"time":"11-04-2014 10:58:47 GMT","region:":"","id":"","description":"All Systems are OK","service":""}]}
But instead it repeats this one replied ticket several times.
This is my code so far:
require 'rubygems'
require 'json'
require 'net/http'
require 'highline/import'
require 'pp'
require 'logger'
@usersol = 'foo'
@passol = 'foo123'
@urlsol = "http://dev-webhelpdesk.foo.corp:8081/helpdesk/WebObjects/Helpdesk.woa/ra/Tickets?list=group&page=1&limit=#{@limit}&username=#{@usersol}&password=#{@passol}"
@limit = '25'
@log = Logger.new('message_solar.log')
def ticket_data # looks for ticket data in solarwinds
  resp = Net::HTTP.get_response(URI.parse(@urlsol))
  url_output = resp.body
  JSON.parse(url_output)
end
# CRONJOB THAT STARTS IT ALL
# echo "Reply to this message if all systems are functional.." | mail -r noc@foo.com -s "FOO STATUS UPDATE:" noc-team@FOO.com >> /dev/null
# Looking for all the tickets with the following content
# ticket id, ticket subject and content
def search_allok(allok)
  description = []
  allok.each do |systems|
    output1 = systems.has_key? 'id'
    if output1
      systems.values_at('shortSubject').each do |subject|
        output2 = subject.match(%r(FOO STATUS UPDATE))
        if output2
          latestnote = systems.values_at('latestNote')
          latestnote.each do |content|
            if content
              final = content.values_at('mobileListText')
              final_ok = final[0].sub!(/^\<b\>.*\<\/b\>\s/, "")
              systems_ok = final_ok.match(%r(All Systems are OK))
              if systems_ok
                ids = systems['id']
                notify = {"LastUpdate" => Time.now.to_i, "Service" => []}
                allok.each do |lastup|
                  reference = lastup.has_key? 'id'
                  if reference
                    timeid = lastup.values_at('lastUpdated')
                    timeid.each do |lines|
                      final = lines.split(/[-, T, Z]/)
                      notify["Service"] << { "time" => "#{final[1]}-#{final[2]}-#{final[0]} #{final[3]} GMT", "region:" => "", "id" => "#{ids}", "description" => "#{systems_ok}", "service" => '' }
                    end
                  end
                end
                File.open("notify.json", "w") do |fileformatted|
                  fileformatted.puts(JSON.dump(notify))
                end
              else
                time = Time.now
                @log.info("#{time} - Ticket ID #{systems['id']} has not been updated")
              end
            else
              @log.info("#{time} - Ticket ID #{systems['id']} has not been replied to")
            end
          end
        end
      end
    end
  end
end
# If the content is there then it needs to create
# the data structure, including the LastUpdate
# (the time when the script ran), the lastUpdated for each ticket
# and the description "All Systems are OK"
# I added the method below to the one above; I was thinking of doing it separately, but I ran into issues passing the information needed from above to below
def datastructure(format_file) #creates JSON file lastupdated of each ticket in the queue
  notify = {"LastUpdate" => Time.now.to_i, "Service" => []}
  format_file.each do |lastup|
    reference = lastup.has_key? 'id'
    if reference
      timeid = lastup.values_at('lastUpdated')
      timeid.each do |lines|
        final = lines.split(/[-, T, Z]/)
        notify["Service"] << { "time" => "#{final[1]}-#{final[2]}-#{final[0]} #{final[3]} GMT", "region:" => "", "id" => "", "description" => region, "service" => '' }
      end
    end
  end
  File.open("notify.json", "w") do |fileformatted|
    fileformatted.puts(JSON.dump(notify))
  end
end
#ticket_data
#datastructure(ticket_data)
search_allok(ticket_data)
The code you've written is basically doing a roundabout version of what is achieved by using Ruby's map and select methods. See this article: Ruby Explained: Map, Select, and Other Enumerable Methods
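For example, here is a condensed sketch of the same selection logic built around select and map; it assumes tickets is the array returned by ticket_data and uses the field names from the question's data, so treat it as a starting point rather than a drop-in replacement:
# A sketch, assuming `tickets` is the parsed array from ticket_data.
tickets = ticket_data
ok_tickets = tickets.select do |t|
  t['shortSubject'] == 'FOO STATUS UPDATE' &&
    t['shortDetail'] == 'Reply to this message if all systems are functional..' &&
    t['latestNote'] &&
    t['latestNote']['mobileListText'].to_s.include?('All Systems are OK')
end
services = ok_tickets.map do |t|
  date, time = t['lastUpdated'].split(/[TZ]/)   # "2014-11-04T10:58:47Z" -> ["2014-11-04", "10:58:47"]
  y, m, d = date.split('-')
  { 'time'        => "#{m}-#{d}-#{y} #{time} GMT",
    'region:'     => '',
    'id'          => t['id'].to_s,
    'description' => 'All Systems are OK',
    'service'     => '' }
end
notify = { 'LastUpdate' => Time.now.to_i, 'Service' => services }
File.write('notify.json', JSON.dump(notify))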

Generating KML files with Ruby

I'm using the ruby_kml gem right now to try to generate KML from some data in my model.
I also tried georuby.
With both of them, the XML they generate seems to come back escaped, like this:
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<kml xmlns=\"http://earth.google.com/kml/2.1\">\n <Folder>\n <name>San Francisco</name>\n <LineStyle>\n <color>#0D7215</color>\n </LineStyle>\n <Placemark>\n <name>21 Google Bus</name>\n <description>\n <![CDATA[Click to add description.]]>\n </description>\n <LineString>\n <coordinates>37.784282779035216,-122.42228507995605 37.784144999999995,-122.42225699999999,37.784084,-122.42274499999999,37.785472,-122.423023,37.785391,-122.423564,37.785364,-122.423839,37.785418,-122.424714,37.785410999999996,-122.42497999999999,37.785391,-122.42522,37.784839,-122.42956,37.784631,-122.431297,37.782576,-122.43086799999999,37.776969,-122.42975399999999,37.776759999999996,-122.431384,37.776368,-122.431305 37.776368,-122.431305,37.777699999999996,-122.431575,37.778746999999996,-122.42335399999999,37.773609,-122.42231199999999,37.773013999999996,-122.42222799999999,37.772974999999995,-122.42222799999999,37.772915,-122.42226799999999,37.772774,-122.422446,37.772636999999996,-122.422585,37.772562,-122.42263399999999,37.772521999999995,-122.422643,37.771588,-122.42253799999999,37.771631,-122.421759</coordinates>\n </LineString>\n </Placemark>\n <LineStyle>\n <color>#0071CA</color>\n </LineStyle>\n <Placemark>\n <name>45 Inverter</name>\n <description>\n <![CDATA[Click to add description.]]>\n </description>\n <LineString>\n <coordinates>37.792490234462946,-122.40863800048828 37.792516,-122.408429,37.793068,-122.408541,37.792957,-122.409357,37.792051,-122.409189,37.788289999999996,-122.40841499999999,37.785495,-122.407866,37.785713,-122.406229,37.785713,-122.40591599999999,37.785699,-122.40576999999999,37.785658,-122.40568999999999,37.783249999999995,-122.40270699999999,37.778850999999996,-122.40827499999999,37.779104,-122.408577</coordinates>\n </LineString>\n </Placemark>\n <LineStyle>\n <color>#AD0101</color>\n </LineStyle>\n <Placemark>\n <name>82 X Wing</name>\n <description>\n <![CDATA[Click to add description.]]>\n </description>\n <LineString>\n <coordinates></coordinates>\n </LineString>\n </Placemark>\n <LineStyle>\n <color>#AD0101</color>\n </LineStyle>\n <Placemark>\n <name>93 X Wing</name>\n <description>\n <![CDATA[Click to add description.]]>\n </description>\n <LineString>\n <coordinates></coordinates>\n </LineString>\n </Placemark>\n </Folder>\n</kml>\n"
I'm not sure why it's coming out escaped, since it definitely is not valid XML.
georuby does the same.
Does anyone know why it's coming out escaped and also how to unescape it?
Here's the code I'm using:
map = self
kml = KMLFile.new
folder = KML::Folder.new(:name => map[:name])
map.lines.each do |line|
  folder.features << KML::LineStyle.new(
    :color => line.color
  )
  folder.features << KML::Placemark.new(
    :name => line.name,
    :geometry => KML::LineString.new(:coordinates => line.coordinates),
    :description => line.description
  )
end
kml.objects << folder
kml.render
Thanks!!!
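For what it's worth, a likely explanation (an assumption, since it isn't confirmed above): the backslashes and \n are just Ruby's inspect formatting of the string returned by kml.render, for example when a console echoes the return value. The string itself contains plain XML, which you can check by printing it or writing it to a file:
xml = kml.render
puts xml                                        # prints the KML without any escaping
File.open('map.kml', 'w') { |f| f.write(xml) }  # the file contains plain XML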

How to parse a more complicated JSON object in Ruby on Sinatra

I'm a Java guy, new to Ruby. I've been playing with it just to see what it can do, and I'm running into an issue that I can't solve.
I decided to try out Sinatra, again, just to see what it can do, and decided to play with the ESPN API and see if I can pull the venue of a team via the API.
I'm able to make the call and get the data back, but I am having trouble parsing it:
{"sports"=>[{"name"=>"baseball", "id"=>1, "uid"=>"s:1", "leagues"=>[{"name"=>"Major League Baseball", "abbreviation"=>"mlb", "id"=>10, "uid"=>"s:1~l:10", "groupId"=>9, "shortName"=>"MLB", "teams"=>[{"id"=>17, "uid"=>"s:1~l:10~t:17", "location"=>"Cincinnati", "name"=>"Reds", "abbreviation"=>"CIN", "color"=>"D60042", "venues"=>[{"id"=>83, "name"=>"Great American Ball Park", "city"=>"Cincinnati", "state"=>"Ohio", "country"=>"", "capacity"=>0}], "links"=>{"api"=>{"teams"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17"}, "news"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news"}, "notes"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news/notes"}}, "web"=>{"teams"=>{"href"=>"http://espn.go.com/mlb/team/_/name/cin/cincinnati-reds?ex_cid=espnapi_public"}}, "mobile"=>{"teams"=>{"href"=>"http://m.espn.go.com/mlb/clubhouse?teamId=17&ex_cid=espnapi_public"}}}}]}]}], "resultsOffset"=>0, "resultsLimit"=>50, "resultsCount"=>1, "timestamp"=>"2013-08-04T14:47:13Z", "status"=>"success"}
I want to pull the venues part of the object, specifically the name value. Every time I try to parse it I end up getting an error along the lines of "cannot change from nil to string", and I've also gotten an integer-to-string error.
Here's what I have so far:
get '/venue/:team' do
  id = ids[params[:team]]
  url = 'http://api.espn.com/v1/sports/baseball/mlb/teams/' + id + '?enable=venues&apikey=' + $key
  resp = Net::HTTP.get_response(URI.parse(url))
  data = resp.body
  parsed = JSON.parse(resp.body)
  @venueData = parsed["sports"]
  "Looking for the venue of the #{params[:team]}, which has id " + id + ", and here's the data returned: " + @venueData.to_s
end
When I do parsed["sports"] I get:
[{"name"=>"baseball", "id"=>1, "uid"=>"s:1", "leagues"=>[{"name"=>"Major League Baseball", "abbreviation"=>"mlb", "id"=>10, "uid"=>"s:1~l:10", "groupId"=>9, "shortName"=>"MLB", "teams"=>[{"id"=>17, "uid"=>"s:1~l:10~t:17", "location"=>"Cincinnati", "name"=>"Reds", "abbreviation"=>"CIN", "color"=>"D60042", "venues"=>[{"id"=>83, "name"=>"Great American Ball Park", "city"=>"Cincinnati", "state"=>"Ohio", "country"=>"", "capacity"=>0}], "links"=>{"api"=>{"teams"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17"}, "news"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news"}, "notes"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news/notes"}}, "web"=>{"teams"=>{"href"=>"http://espn.go.com/mlb/team/_/name/cin/cincinnati-reds?ex_cid=espnapi_public"}}, "mobile"=>{"teams"=>{"href"=>"http://m.espn.go.com/mlb/clubhouse?teamId=17&ex_cid=espnapi_public"}}}}]}]}]
But nothing else parses. Please help!
Like I said, I'm not trying to do anything fancy, just figure out Ruby a little for fun, but I have been stuck on this issue for days now. Any help would be appreciated!
EDIT:
JSON straight from the API:
{"sports" :[{"name" :"baseball","id" :1,"uid" :"s:1","leagues" :[{"name" :"Major League Baseball","abbreviation" :"mlb","id" :10,"uid" :"s:1~l:10","groupId" :9,"shortName" :"MLB","teams" :[{"id" :17,"uid" :"s:1~l:10~t:17","location" :"Cincinnati","name" :"Reds","abbreviation" :"CIN","color" :"D60042","venues" :[{"id" :83,"name" :"Great American Ball Park","city" :"Cincinnati","state" :"Ohio","country" :"","capacity" :0}],"links" :{"api" :{"teams" :{"href" :"http://api.espn.com/v1/sports/baseball/mlb/teams/17"},"news" :{"href" :"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news"},"notes" :{"href" :"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news/notes"}},"web" :{"teams" :{"href" :"http://espn.go.com/mlb/team/_/name/cin/cincinnati-reds?ex_cid=espnapi_public"}},"mobile" :{"teams" :{"href" :"http://m.espn.go.com/mlb/clubhouse?teamId=17&ex_cid=espnapi_public"}}}}]}]}],"resultsOffset" :0,"resultsLimit" :50,"resultsCount" :1,"timestamp" :"2013-08-05T19:44:32Z","status" :"success"}
The result of data.inspect:
"{\"sports\" :[{\"name\" :\"baseball\",\"id\" :1,\"uid\" :\"s:1\",\"leagues\" :[{\"name\" :\"Major League Baseball\",\"abbreviation\" :\"mlb\",\"id\" :10,\"uid\" :\"s:1~l:10\",\"groupId\" :9,\"shortName\" :\"MLB\",\"teams\" :[{\"id\" :17,\"uid\" :\"s:1~l:10~t:17\",\"location\" :\"Cincinnati\",\"name\" :\"Reds\",\"abbreviation\" :\"CIN\",\"color\" :\"D60042\",\"venues\" :[{\"id\" :83,\"name\" :\"Great American Ball Park\",\"city\" :\"Cincinnati\",\"state\" :\"Ohio\",\"country\" :\"\",\"capacity\" :0}],\"links\" :{\"api\" :{\"teams\" :{\"href\" :\"http://api.espn.com/v1/sports/baseball/mlb/teams/17\"},\"news\" :{\"href\" :\"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news\"},\"notes\" :{\"href\" :\"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news/notes\"}},\"web\" :{\"teams\" :{\"href\" :\"http://espn.go.com/mlb/team/_/name/cin/cincinnati-reds?ex_cid=espnapi_public\"}},\"mobile\" :{\"teams\" :{\"href\" :\"http://m.espn.go.com/mlb/clubhouse?teamId=17&ex_cid=espnapi_public\"}}}}]}]}],\"resultsOffset\" :0,\"resultsLimit\" :50,\"resultsCount\" :1,\"timestamp\" :\"2013-08-05T19:44:24Z\",\"status\" :\"success\"}"
parsed["sports"] does not exist, parse your input and inspect it/ dump it
With the data you've provided in the question, you can get to the venues information like this:
require 'json'
json = JSON.parse data
json["sports"].first["leagues"].first["teams"].first["venues"]
# => [{"id"=>83, "name"=>"Great American Ball Park", "city"=>"Cincinnati", "state"=>"Ohio", "country"=>"", "capacity"=>0}]
By replacing each of the first calls with an iterator, you can search through without knowing where the data is:
json["sports"].each{|h|
h["leagues"].each{|h|
h["teams"].each{|h|
venues = h["venues"].map{|h| h["name"]}.join(", ")
puts %Q!name: #{h["location"]} #{h["name"]} venues: #{venues}!
}
}
}
This outputs:
name: Cincinnati Reds venues: Great American Ball Park
Depending on how stable the response data is you may be able to cut out several of the iterators:
json["sports"].first["leagues"]
.first["teams"]
.each{|h|
venues = h["venues"].map{|h| h["name"] }.join(", ")
puts %Q!name: #{h["location"]} #{h["name"]} venues: #{venues}!
}
and you'll most likely want to save the data, so something like each_with_object is helpful:
team_and_venues = json["sports"].first["leagues"]
  .first["teams"]
  .each_with_object([]){|h,xs|
    venues = h["venues"].map{|h| h["name"]}.join(", ")
    xs << %Q!name: #{h["location"]} #{h["name"]} venues: #{venues}!
  }
# => ["name: Cincinnati Reds venues: Great American Ball Park"]
team_and_venues
# => ["name: Cincinnati Reds venues: Great American Ball Park"]
Notice that when an iterator declares variables, even if there is a variable with the same name outside the block, the scope of the block is respected and the block's variables remain local.
That's some pretty ugly code if you ask me, but it's a place to start.
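If you're on Ruby 2.3 or later and the structure is stable, dig can collapse the chain of first calls into a single lookup; a small sketch using the parsed json from above:
venue_name = json.dig("sports", 0, "leagues", 0, "teams", 0, "venues", 0, "name")
# => "Great American Ball Park"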

How do I parse Google image URLs using Ruby and Nokogiri?

I'm trying to make an array of all the image files on a Google images webpage.
I want a regular expression to pull everything after "imgurl=" and ending before "&amp" as seen in this HTML:
<img height="124" width="124" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRLy5inpSdHxWuE7z3QSZw35JwN3upbBaLr11LR25noTKbSMn9-qrySSg"><br><cite title="trendytree.com">trendytree.com</cite><br>Silent Night Chapel <b>20031</b><br>400 × 400 - 58k - jpg</td>
I feel like I can do this with a regex, but I can't find a way to search my parsed document using a regex, and I'm not finding any solutions.
str = '<img height="124" width="124" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRLy5inpSdHxWuE7z3QSZw35JwN3upbBaLr11LR25noTKbSMn9-qrySSg"><br><cite title="trendytree.com">trendytree.com</cite><br>Silent Night Chapel <b>20031</b><br>400 × 400 - 58k - jpg</td>'
str.split('imgurl=')[1].split('&amp')[0]
#=> "http://www.trendytree.com/old-world- christmas/images/20031chapel20031-silent-night-chapel.jpg"
Is that what you're looking for?
The problem with using a regex is that you assume too much knowledge about the order of parameters in the URL. If the order changes, or the & disappears, the regex won't work.
Instead, parse the URL, then split the values out:
# encoding: UTF-8
require 'nokogiri'
require 'cgi'
require 'uri'
doc = Nokogiri::HTML.parse('<img height="124" width="124" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRLy5inpSdHxWuE7z3QSZw35JwN3upbBaLr11LR25noTKbSMn9-qrySSg"><br><cite title="trendytree.com">trendytree.com</cite><br>Silent Night Chapel <b>20031</b><br>400 × 400 - 58k - jpg</td>')
img_url = doc.search('a').each do |a|
  query_params = CGI::parse(URI(a['href']).query)
  puts query_params['imgurl']
end
Which outputs:
http://www.trendytree.com/old-world-christmas/images/20031chapel20031-silent-night-chapel.jpg
Both URI and CGI are used because URI's decode_www_form raises an exception when trying to decode the query.
I've also been known to decode the query string into a hash using something like:
Hash[URI(a['href']).query.split('&').map{ |p| p.split('=') }]
That will return:
{"imgurl"=>
"http://www.trendytree.com/old-world-christmas/images/20031chapel20031-silent-night-chapel.jpg",
"imgrefurl"=>
"http://www.trendytree.com/old-world-christmas/silent-night-chapel-20031-christmas-ornament-old-world-christmas.html",
"usg"=>"__YJdf3xc4ydSfLQa9tYnAzavKHYQ",
"h"=>"400",
"w"=>"400",
"sz"=>"58",
"hl"=>"en",
"start"=>"19",
"zoom"=>"1",
"tbnid"=>"ajDcsGGs0tgE9M:",
"tbnh"=>"124",
"tbnw"=>"124",
"ei"=>"qagfUbXmHKfv0QHI3oG4CQ",
"itbs"=>"1",
"sa"=>"X",
"ved"=>"0CE4QrQMwEg"}
To get all the img urls you want, do:
require 'nokogiri'
require 'open-uri'
# get all links
url = 'some-google-images-url'
links = Nokogiri::HTML( open(url) ).css('a')
# get regex match or nil on desired img
img_urls = links.map {|a| a['href'][/imgurl=(.*?)&/, 1] }
# get rid of nils
img_urls.compact
The regex you want is /imgurl=(.*?)&/ because you want a non-greedy match between imgurl= and &; otherwise the greedy .* would take everything up to the last & in the string.
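To make the greedy versus non-greedy difference concrete, a quick sketch with a made-up href value:
href = '/imgres?imgurl=http://example.com/a.jpg&imgrefurl=http://example.com/page.html&h=400'
href[/imgurl=(.*?)&/, 1] # => "http://example.com/a.jpg"
href[/imgurl=(.*)&/, 1]  # => "http://example.com/a.jpg&imgrefurl=http://example.com/page.html"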
