How to parse an RMeetup response - Ruby

I'm using the rmeetup gem, which queries version 2 of the Meetup API, and I don't understand how to extract the "members" value from a response. Here is where I get stuck (using irb for this example):
>> require 'rmeetup'
=> true
>> client = RMeetup::Client.new do |config| config.api_key = "LALAMYKEYNOTYOURS" end
=> #<RMeetup::Client:0x007fbda4b58060 @configuration=#<RMeetup::Configuration:0x007fbda4b63fa0 @api_key="LALAMYKEYNOTYOURS">>
>> results = client.fetch(:groups, {:group_urlname => 'San-Francisco-Riak-Meetup'})
=> [#<RMeetup::Type::Group:0x007fbda4b80088 @group={"utc_offset"=>-25200000, "country"=>"US", "visibility"=>"public", "city"=>"San Francisco", "timezone"=>"US/Pacific", "created"=>1278976613000, "topics"=>[{"urlkey"=>"opensource", "name"=>"Open Source", "id"=>563}, {"urlkey"=>"web", "name"=>"Web Technology", "id"=>10209}, {"urlkey"=>"big-data", "name"=>"Big Data", "id"=>18062}, {"urlkey"=>"database-development", "name"=>"Database Development", "id"=>21506}, {"urlkey"=>"erlang-programming", "name"=>"Erlang Programming", "id"=>46514}, {"urlkey"=>"nosql", "name"=>"NoSQL", "id"=>58162}, {"urlkey"=>"riak", "name"=>"Riak", "id"=>112355}, {"urlkey"=>"distributed-systems", "name"=>"Distributed Systems", "id"=>113032}], "link"=>"http://www.meetup.com/San-Francisco-Riak-Meetup/", "rating"=>4.57, "description"=>"<p>A monthly meetup for those in the Bay Area to talk Riak, distributed systems, and app. development.</p>", "lon"=>-122.4000015258789, "group_photo"=>{"highres_link"=>"http://photos4.meetupstatic.com/photos/event/e/6/9/e/highres_16559038.jpeg", "photo_id"=>16559038, "photo_link"=>"http://photos4.meetupstatic.com/photos/event/e/6/9/e/600_16559038.jpeg", "thumb_link"=>"http://photos2.meetupstatic.com/photos/event/e/6/9/e/thumb_16559038.jpeg"}, "join_mode"=>"open", "organizer"=>{"member_id"=>140545442, "name"=>"Basho"}, "members"=>696, "name"=>"San Francisco Riak Meetup", "id"=>1674527, "state"=>"CA", "urlname"=>"San-Francisco-Riak-Meetup", "category"=>{"name"=>"tech", "id"=>34, "shortname"=>"tech"}, "lat"=>37.790000915527344, "who"=>"Riaktors"}>]
>> results.each do |k| puts k["members"] end
This is likely my misunderstanding of how to query the @group hash inside this result. I haven't found anything that clarifies it, despite similar questions on SO and other sites.

I figured this out today. This is an example of confusing a method with a hash. The proper syntax is:
results.each do |k| puts k.members end
That works because members behaves like a method on RMeetup::Type::Group, at least from the looks of it. That's not how it's documented, but it works.
The original syntax of k["members"] would work if k were a hash:
irb(main):027:0> k = {"members" => 1000}
=> {"members"=>1000}
irb(main):028:0> k["members"]
=> 1000
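That hash-backed method access is a common Ruby pattern: a wrapper class delegates unknown method calls to an internal hash. This is only a guess at what RMeetup does internally, but a minimal sketch of the pattern (a hypothetical class, not RMeetup's actual source) looks like this:
class HashBackedGroup
  def initialize(group)
    @group = group # the raw response hash
  end

  # Unknown methods are looked up as string keys in the wrapped hash,
  # so group.members returns @group["members"].
  def method_missing(name, *args)
    key = name.to_s
    @group.key?(key) ? @group[key] : super
  end

  def respond_to_missing?(name, include_private = false)
    @group.key?(name.to_s) || super
  end
end

group = HashBackedGroup.new("members" => 696)
puts group.members #=> 696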

Related

How to parse HTML contents of a page using Nokogiri

require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = 'https://www.trumba.com/calendars/smithsonian-events.xml'
doc = Nokogiri::XML(open(url))
I am trying to fetch the basic set of information like:
event_name
categories
sponsor
venue
event_location
cost
For example, for event_name I have this xpath:
"/html/body/div[2]/div[2]/div[1]/h3/a/span"
And use it like:
puts doc.xpath "/html/body/div[2]/div[2]/div[1]/h3/a/span"
This returns nil for event_name.
If I save the URL contents locally then above XPath works.
Along with this, I need the above-mentioned information as well. I checked the other XPaths too, but the results turn out to be blank.
Here's how I'd go about doing this:
require 'nokogiri'
doc = Nokogiri::XML(open('/Users/gferguson/smithsonian-events.xml'))
namespaces = doc.collect_namespaces
entries = doc.search('entry').map { |entry|
  entry_title = entry.at('title').text
  entry_time_start, entry_time_end = ['startTime', 'endTime'].map { |p|
    entry.at('gd|when', namespaces)[p]
  }
  entry_notes = entry.at('gc|notes', namespaces).text

  {
    title: entry_title,
    start_time: entry_time_start,
    end_time: entry_time_end,
    notes: entry_notes
  }
}
Which, when run, results in entries being an array of hashes:
require 'awesome_print'
ap entries[0, 3]
# >> [
# >> [0] {
# >> :title => "Conservation Clinics",
# >> :start_time => "2016-11-09T14:00:00Z",
# >> :end_time => "2016-11-09T17:00:00Z",
# >> :notes => "Have questions about the condition of a painting, frame, drawing,\n print, or object that you own? Our conservators are available by\n appointment to consult with you about the preservation of your art.\n \n To request an appointment or to learn more,\n e-mail DWRCLunder@si.edu and specify CLINIC in the subject line."
# >> },
# >> [1] {
# >> :title => "Castle Highlights Tour",
# >> :start_time => "2016-11-09T14:00:00Z",
# >> :end_time => "2016-11-09T14:45:00Z",
# >> :notes => "Did you know that the Castle is the Smithsonian’s first and oldest building? Join us as one of our dynamic volunteer docents takes you on a tour to explore the highlights of the Smithsonian Castle. Come learn about the founding and early history of the Smithsonian; its original benefactor, James Smithson; and the incredible history and architecture of the Castle. Here is your opportunity to discover the treasured stories revealed within James Smithson's crypt, the Gre...
# >> },
# >> [2] {
# >> :title => "Exhibition Interpreters/Navigators (throughout the day)",
# >> :start_time => "2016-11-09T15:00:00Z",
# >> :end_time => "2016-11-09T15:00:00Z",
# >> :notes => "Museum volunteer interpreters welcome visitors, answer questions, and help visitors navigate exhibitions. Interpreters may be stationed in several of the following exhibitions at various times throughout the day, subject to volunteer interpreter availability. <ul> \t<li><em>The David H. Koch Hall of Human Origins: What Does it Mean to be Human?</em></li> \t<li><em>The Sant Ocean Hall</em></li> </ul>"
# >> }
# >> ]
I didn't try to gather the specific information you asked for, because event_name doesn't exist in the feed, and what you're doing is very generic and easily done once you understand a few rules.
XML is generally very repetitive because it represents tables of data. The "cells" of the table might vary, but there's repetition you can use to help you. In this code,
doc.search('entry')
finds all the <entry> nodes, and the map iterates over them. Then it's easy to look inside each one to find the information needed.
The XML uses namespaces to help avoid tag-name collisions. At first those seem really hard, but Nokogiri provides the collect_namespaces method, which returns a hash of all the namespaces in the document. If you're looking for a namespaced tag, pass that hash as the second parameter.
Nokogiri allows us to use XPath and CSS for selectors. I almost always go with CSS for readability. ns|tag is the format that tells Nokogiri to use a namespaced tag in a CSS selector. Again, pass it the hash of namespaces in the document and Nokogiri will do the rest.
If you're familiar with working with Nokogiri you'll see the above code is very similar to normal code used to pull the content of <td> cells inside <tr> rows in an HTML <table>.
You should be able to modify that code to gather the data you need without risking namespace collisions.
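For comparison, here's the same search-then-drill-down pattern applied to an HTML table (a sketch using made-up markup):
require 'nokogiri'

html = <<~HTML
  <table>
    <tr><td>Conservation Clinics</td><td>2016-11-09</td></tr>
    <tr><td>Castle Highlights Tour</td><td>2016-11-09</td></tr>
  </table>
HTML

doc = Nokogiri::HTML(html)
rows = doc.search('tr').map { |tr| tr.search('td').map(&:text) }
# => [["Conservation Clinics", "2016-11-09"], ["Castle Highlights Tour", "2016-11-09"]]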
The provided link serves XML, not HTML, so your XPath expressions need to be written against the XML structure.
The key thing is that the document uses namespaces. As I understand it, all XPath expressions have to keep that in mind and specify the namespaces too.
To simplify the XPath expressions, you can use the remove_namespaces! method:
require 'nokogiri'
require 'open-uri'
url = 'https://www.trumba.com/calendars/smithsonian-events.xml'
doc = Nokogiri::XML(open(url)); nil # nil is used to avoid huge output
doc.remove_namespaces!; nil
event = doc.xpath('//feed/entry[1]') # it will give you the first event
event.xpath('./title').text # => "Conservation Clinics"
event.xpath('./categories').text # => "Demonstrations,Lectures & Discussions"
Most likely you would like to have an array of all the event hashes.
You can do it like this:
doc.xpath('//feed/entry').reduce([]) do |memo, event|
  event_hash = {
    title: event.xpath('./title').text,
    categories: event.xpath('./categories').text
    # all other attributes you need ...
  }
  memo << event_hash
end
It will give you an array like:
[
{:title=>"Conservation Clinics", :categories=>"Demonstrations,Lectures & Discussions"},
{:title=>"Castle Highlights Tour", :categories=>"Gallery Talks & Tours"},
...
]
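As an aside, since each entry produces exactly one hash, map is arguably more idiomatic than reduce here; a sketch of the equivalent:
events = doc.xpath('//feed/entry').map do |event|
  {
    title: event.xpath('./title').text,
    categories: event.xpath('./categories').text
  }
end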

Structuring Nokogiri output without HTML tags

I've gotten Ruby to navigate to a web site, iterate through a list of campaigns, and scrape the pages for specific data. The problem I have now is getting that data out of the structure Nokogiri gives me and outputting it in a readable form.
require 'watir'
require 'nokogiri'

campaign_list = Array.new
campaign_list.push(1042360, 1042386, 1042365, 992307)

browser = Watir::Browser.new :chrome
browser.goto '<redacted>'
browser.text_field(:id => 'email').set '<redacted>'
browser.text_field(:id => 'password').set '<redacted>'
browser.send_keys :enter

file = File.new('hourlysales.csv', 'w')
data = {}

campaign_list.each do |campaign|
  browser.goto "<redacted>"
  if browser.text.include? "Application Error"
    puts "Error loading page, I recommend restarting script"
    # Possibly automatic restart of script
  else
    hourly_data = Nokogiri::HTML.parse(browser.html).text
    # file.write data
    puts hourly_data
  end
end
This is the output I get:
{"views":[[17,145],[18,165],[19,99],[20,71],[21,31],[22,26],[23,10],[0,15],[1,1], [2,18],[3,19],[4,35],[5,47],[6,44],[7,67],[8,179],[9,141],[10,112],[11,95],[12,46],[13,82],[14,79],[15,70],[16,103]],"orders":[[17,10],[18,9],[19,5],[20,1],[21,1],[22,0],[23,0],[0,1],[1,0],[2,1],[3,0],[4,1],[5,2],[6,1],[7,5],[8,11],[9,6],[10,5],[11,3],[12,1],[13,2],[14,4],[15,6],[16,7]],"conversion_rates":[0.06870229007633588,0.05442176870748299,0.050505050505050504,0.014084507042253521,0.03225806451612903,0.0,0.0,0.06666666666666667,0.0,0.05555555555555555,0.0,0.02857142857142857,0.0425531914893617,0.022727272727272728,0.07462686567164178,0.06134969325153374,0.0425531914893617,0.044642857142857144,0.031578947368421054,0.021739130434782608,0.024390243902439025,0.05063291139240506,0.08571428571428572,0.06741573033707865]}
The arrays stand for {"views": [[hour, views], [hour, views], ...]}, and the same for "orders". I don't need the conversion rates.
I also need to add up the values for each key, so after doing this for five pages I'll have one key for each hour of the day and the total number of views for that hour. I tried a couple of each loops, but couldn't make any progress.
I appreciate any help you guys can give me.
It looks like the output (which from your code I assume is the content of hourly_data) is JSON. In that case, it's easy to parse and add up the numbers. Something like this:
require "json" # at the top of your script
# ...
def sum_hours_values(data, hours_values=nil)
  # Start with an empty hash that automatically initializes missing keys to `0`
  hours_values ||= Hash.new { |hsh, hour| hsh[hour] = 0 }

  # Iterate through the [hour, value] arrays, adding `value` to the running
  # count for that `hour`, and return `hours_values`
  data.each_with_object(hours_values) do |(hour, value), hsh|
    hsh[hour] += value
  end
end
# ... Watir/Nokogiri stuff here...
# Initialize these so they persist outside the loop
hours_views, hours_orders = nil

campaign_list.each do |campaign|
  browser.goto "<redacted>"
  if browser.text.include? "Application Error"
    # ...
  else
    # ...
    hourly_data_parsed = JSON.parse(hourly_data)
    hours_views = sum_hours_values(hourly_data_parsed["views"], hours_views)
    hours_orders = sum_hours_values(hourly_data_parsed["orders"], hours_orders)
  end
end
puts "Views by hour:"
puts hours_views.sort.map {|hour_views| "%2i\t%4i" % hour_views }
puts "Orders by hour:"
puts hours_orders.sort.map {|hour_orders| "%2i\t%4i" % hour_orders }
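The trickiest part of that code is the block signature |(hour, value), hsh|, which destructures each [hour, value] pair before adding it to the running totals. A quick irb check with made-up numbers:
pairs = [[17, 145], [18, 165], [17, 10]]
totals = pairs.each_with_object(Hash.new { |hsh, hour| hsh[hour] = 0 }) do |(hour, value), hsh|
  hsh[hour] += value
end
totals #=> {17=>155, 18=>165}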
P.S. There's a really nice recursive version of sum_hours_values I didn't include since the iterative version is clearer to most Ruby programmers. If you're into recursion I leave it as an exercise for you. ;)

MongoDB not inserting Ruby Time.new consistently on Heroku

I built a small app to grab tweets from political candidates for the upcoming election, using Ruby, TweetStream, MongoDB, and Heroku.
The time is being inserted into the database inconsistently: sometimes it works, sometimes it doesn't. Is this my code, Heroku, or MongoDB (MongoHQ)? I have a support question in.
Working
{
_id: ObjectId("52556b5bd2d9530002000002"),
time: ISODate("2013-10-09T14:42:35.044Z"),
user: "Blondetigressnc",
userid: 1342776674,
tweet: "RT @GovBrewer: Mr. President @BarackObama, reopen America’s National Parks or let the states do it. #GrandCanyon #Lead http://t.co/kkPKt9B7…",
statusid: "387951226866110464"
}
Not working
{
_id: ObjectId("52556c2454d4ad0002000016"),
user: "PeterMcC66",
userid: 1729065984,
tweet: "@GovBrewer @Blondetigressnc @BarackObama Time to impeach surely?",
statusid: "387952072223506432"
}
Seems random. See anything wrong or stupid in my code?
require 'rubygems'
require 'tweetstream'
require 'mongo'
# user ids
users = 'list of Twitter user ids here'

# connect to stream
TweetStream.configure do |config|
  config.consumer_key = ENV['T_KEY']
  config.consumer_secret = ENV['T_SECRET']
  config.oauth_token = ENV['T_TOKEN']
  config.oauth_token_secret = ENV['T_TOKEN_SECRET']
  config.auth_method = :oauth
end

# connection to database
if ENV['MONGOHQ_URL']
  uri = URI.parse(ENV['MONGOHQ_URL'])
  conn = Mongo::Connection.from_uri(ENV['MONGOHQ_URL'])
  DB = conn.db(uri.path.gsub(/^\//, ''))
else
  DB = Mongo::Connection.new.db("tweetsDB")
end

# creation of collections
tweets = DB.create_collection("tweets")
deleted = DB.create_collection("deleted-tweets")

@client = TweetStream::Client.new

@client.on_delete do |status_id, user_id|
  puts "#{status_id}"
  timenow = Time.new
  id = status_id.to_s
  deleted.insert({ :time => timenow, :user_id => user_id, :statusid => id })
end

@client.follow(users) do |status|
  puts "[#{status.user.screen_name}] #{status.text}"
  timenow = Time.new
  id = status.id
  tweets.insert({ :time => timenow, :user => status.user.screen_name, :userid => status.user.id, :tweet => status.text, :statusid => id.to_s })
end
The issue is that you need to use a UTC time, not your local timezone. This is not a MongoDB or Ruby-driver issue; it's a constraint of the BSON spec and the ISODate BSON type.
http://docs.mongodb.org/manual/reference/bson-types/#date
http://bsonspec.org/#/specification
A general word of advice, and just good practice anyway: use UTC on the back end of whatever you're building, regardless of which datastore you're using (this is not a MongoDB-specific thing). It's especially true if this data is something you want to query on directly.
If you need to convert to a local timezone, it's best to handle that when you display or output the data, rather than trying to manage it elsewhere. Some of the most fantastic bugs I've ever seen were related to inconsistent handling of timezones in the persistence layer of an application.
Keep the stored times consistent on the back end, deal with local timezone conversion in your application, and life will be much easier for you.
Here is an example of how to work with times in MongoDB using Ruby:
require 'time' # required for ISO-8601
require 'mongo'
include Mongo
client = MongoClient.new
coll = client['example_database']['example_collection']
coll.insert({ 'updated_at' => Time.now.utc })
doc = coll.find_one()
doc['updated_at'].is_a?(Time) #=> true
doc['updated_at'].to_s #=> "2013-10-07 22:43:52 UTC"
doc['updated_at'].iso8601 #=> "2013-10-07T22:43:52Z"
doc['updated_at'].strftime("updated at %m/%d/%Y") #=> "updated at 10/07/2013"
I keep a gist of this available here:
https://gist.github.com/brandonblack/6876374
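If you do need a local time for display, Ruby's Time#getlocal handles the conversion at the edge; a sketch, reusing the doc fetched above:
local = doc['updated_at'].getlocal          # convert the stored UTC time to the local zone
puts local.strftime("updated at %H:%M %Z")  # e.g. "updated at 15:43 PDT"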

How to use the Xively API library in Ruby?

I'm trying to upload some data to Xively from Ruby. I installed all the gems and this test code runs OK, but nothing changes in the Xively graph of my device.
This small piece of code was isolated from a fragment of a bigger program that works fine and posts data to my server through an interface written in PHP, but now I want to use Xively to log the data.
I removed my personal data from this code: the API key, feed number, and feed name.
#!/usr/bin/ruby
require 'rubygems'
require 'json'
require 'xively-rb'

# Create the Xively client instance
API_KEY = "MY_API_KEY_WAS_HERE"
client = Xively::Client.new(API_KEY)

# On an endless loop
while true
  # n is a random float between 0 and 1
  n = rand()

  # Create a datapoint and send it to Xively
  puts "Creating datapoint " + Time.now.to_s + ", " + n.to_s + " and sending it to xively"
  datapoint = Xively::Datapoint.new(:at => Time.now, :value => n)
  client.post('/api/v2/feeds/[number]/datastreams/[name]', :body => {:datapoints => [datapoint]}.to_json)
end
It would be nice to get an example of how to use that library; I didn't find any concise example.
(It's possible there are some silly errors in the code; if so, that's OK because I'm learning Ruby at the moment. If they aren't critical, just point them out briefly so we don't go off topic; I'll be happy to research and learn later.)
I'm really looking forward to an answer, so thanks in advance.
I received a working solution from a classmate. It was in a post about Cosm, the beta of what is now Xively (and was previously Pachube).
We had been looking for something like this for about two weeks:
afulki.net more-on-ruby-and-cosm
#!/usr/bin/ruby
require 'rubygems'
require 'json'
require 'xively-rb'

class XivelyConnector
  API_KEY = 'MY_API_KEY_HARD-CODED_HERE'

  def initialize(xively_feed_id)
    @feed_id = xively_feed_id
    @xively_response = Xively::Client.get("/v2/feeds/#{@feed_id}.json", :headers => {"X-ApiKey" => API_KEY})
  end

  def post_polucion(sensor, polucion_en_mgxm3)
    return unless has_sensor? sensor
    post_path = "/v2/feeds/#{@feed_id}/datastreams/#{sensor}/datapoints"
    datapoint = Xively::Datapoint.new(:at => Time.now, :value => polucion_en_mgxm3.to_s)
    response = Xively::Client.post(post_path,
                                   :headers => {"X-ApiKey" => API_KEY},
                                   :body => {:datapoints => [datapoint]}.to_json)
  end

  def has_sensor?(sensor)
    @xively_response["datastreams"].index { |ds| ds["id"] == sensor }
  end
end
Using that class:
#!/usr/bin/ruby
require 'rubygems'
require 'json'
require 'xively-rb'
require_relative 'XivelyConnector'
xively_connector = XivelyConnector.new(MY_FEED_ID_HERE)

while true
  n = rand()
  xively_connector.post_polucion 'Sensor-Asdf', n
  sleep 1
end
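One note on that loop: the sleep 1 keeps it from flooding the API with requests. If posts still don't show up on the graph, inspecting the HTTP response can help; assuming xively-rb's client is HTTParty-based (so responses expose code and body; I haven't verified this against the gem), a check might look like:
response = xively_connector.post_polucion 'Sensor-Asdf', rand()
if response.respond_to?(:code)
  puts "HTTP #{response.code}" # e.g. 200 on success
  puts response.body
end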

Feedzirra cannot parse atom feeds

The idea of having a single parser for any kind of feed is great, and I was hoping it would work for me.
I have been trying to get Feedzirra to parse Atom feeds, specifically:
http://pindancing.blogspot.com/feeds/posts/default
http://adam.heroku.com/feed
Those are just two that I tried. The problem is that Feedzirra cannot parse the entry URL; it always comes out nil:
feed = Feedzirra::Feed.fetch_and_parse(search.rss_feed_url)
p feed.entries.first.title
p feed.entries.first.url #=> returns nil
Is there anything I need to do to get it working? Thanks for your help.
Hate to say "works for me", but, well, works for me:
require 'feedzirra'
urls = %w{
http://adam.heroku.com/feed
http://pindancing.blogspot.com/feeds/posts/default
}
urls.each do |url|
  feed = Feedzirra::Feed.fetch_and_parse(url)
  puts feed.entries.first.title
  puts feed.entries.first.url
end
# => Memcached, a Database?
# => http://adam.heroku.com/past/2010/7/19/memcached_a_database/
# => The answer to "Will you mentor me?" is
# => http://pindancing.blogspot.com/2010/12/answer-to-will-you-mentor-me-is.html
It'd help to see the rest of your code, particularly the actual parameter you're using in the fetch_and_parse method.
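One way to narrow down a difference like this is to check which parser classes Feedzirra picked for your feed and what the first entry actually responds to (a debugging sketch; the exact class names vary by Feedzirra version):
require 'feedzirra'

feed = Feedzirra::Feed.fetch_and_parse('http://adam.heroku.com/feed')
p feed.class              # which feed parser was chosen
entry = feed.entries.first
p entry.class             # which entry parser was chosen
p entry.respond_to?(:url) # does this entry type expose #url at all?
p entry.url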
