Ruby: fetching an RSS feed won't return the latest content

I have a problem parsing an RSS feed.
When I do this:
feed = getFeed("http://example.com/rss")
the content is not updated when the feed changes.
If I do it like this:
feed = getFeed("http://example.com/rss?" + Random.rand(20).to_s)
it works most of the time, but not always.
getFeed() is implemented like this:
def getFeed(url)
  rss_content = ""
  open(url) do |f|
    rss_content = f.read
  end
  return rss_content
end
I'm using this in Sinatra with Ruby 1.9.3, if that makes a difference.
In my opinion it is getting cached somewhere, but I have no idea where.
Edit:
Okay, after half a day running on the server it now works without a problem.

This:
feed = getFeed("http://example.com/rss?" + Random.rand(20).to_s)
implies the problem is with caching, but Ruby, OpenURI and Sinatra shouldn't be caching anything. Perhaps your code is running behind a caching device or app that is handling outgoing requests as well as incoming?
This isn't the fix, but your code can be streamlined greatly:
def getFeed(url)
  open(url).read
end
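If the stale content is coming from an intermediate cache keyed on the full URL, a per-request timestamp is a more dependable cache-buster than Random.rand(20), which can only produce 20 distinct URLs and will therefore repeat. A minimal sketch, using a hypothetical get_feed_uncached helper and an arbitrary "_" parameter name:

require 'open-uri'

# Sketch: append a unique timestamp so every request uses a URL an
# intermediate cache has never seen before.
def get_feed_uncached(url)
  separator = url.include?('?') ? '&' : '?'
  open("#{url}#{separator}_=#{Time.now.to_i}").read
end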

Related

Using Excel data to display on a Dashing Dashboard?

I'm trying to get an example of the following code from GitHub working; it looks to be a dead topic for my Linux/Ubuntu install. I have been trying to scrape data from my company intranet using "mechanize" (see my other Stack Overflow question for details). Since I'm not smart enough to figure a way around my login issue, I thought I would try feeding data from an Excel sheet as a workaround until I can figure out the mechanize route. Once again, I'm not smart enough to get the provided code to work on Linux, because I'm getting the following error:
`kqueue=': kqueue is not supported on this platform (EventMachine::Unsupported)
If I'm understanding the information provided in the original source correctly, the problem is that kqueue isn't supported on Linux. The OP states that inotify is an alternative, but I've had no luck finding a similar example that uses it to display Excel data in a widget.
Here is the code shown on GitHub; I would like help converting it to work on Linux:
require 'roo'

EM.kqueue = EM.kqueue?

file_path = "#{Dir.pwd}/spreadsheet.xls"

def fetch_spreadsheet_data(path)
  s = Roo::Excel.new(path)
  send_event('valuation', { current: s.cell(1, 2) })
end

module Handler
  def file_modified
    fetch_spreadsheet_data(path)
  end
end

fetch_spreadsheet_data(file_path)

EM.next_tick do
  EM.watch_file(file_path, Handler)
end
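A hedged aside: EventMachine raises EventMachine::Unsupported when kqueue support is assigned on a platform that lacks it, so a common guard seen in EventMachine examples is to enable kqueue only where it is supported. On Linux, EM.watch_file should then fall back to inotify, assuming EventMachine was built with inotify support:

# Only enable kqueue where the platform actually supports it; on Linux,
# EventMachine's file watching uses inotify instead.
EM.kqueue = true if EM.kqueue?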
Okay, so I was able to get this working and to display my data on a Dashing Dashboard widget by doing the following:
First: I uploaded my spreadsheet.xls to the root directory of my dashboard.
Second: I replaced the /jobs/sample.rb code with:
#!/usr/bin/env ruby
require 'roo'

SCHEDULER.every '2s' do
  file_path = "#{Dir.pwd}/spreadsheet.xls"

  def fetch_spreadsheet_data(path)
    s = Roo::Excel.new(path)
    send_event('valuation', { current: s.cell('B', 49) })
  end

  # Handler is left over from the EventMachine version; the scheduler
  # re-reads the file every 2 seconds, so it is not actually used here.
  module Handler
    def file_modified
      fetch_spreadsheet_data(path)
    end
  end

  fetch_spreadsheet_data(file_path)
end
Third: Make sure the /widgets/number widget is in your dashboard (it is part of the sample install).
Fourth: Add the following code to your /dashboards/sample.erb file (also part of the sample install):
<li data-row="1" data-col="1" data-sizex="1" data-sizey="1">
  <div data-id="valuation" data-view="Number" data-title="Current Valuation" data-prefix="$"></div>
</li>
I used this source to help me better understand how Roo works. I tested my widget by changing values and re-uploading spreadsheet.xls to the server, and saw instant changes on my dashboard.
Hope this helps someone. I'm still looking for help automating this process by scraping the data; reference this if you can help.
Thanks for sharing this code sample. I did not manage to make it work in my environment (Raspberry Pi/Raspbian), but after some effort I managed to come up with something that works -- at least for me ;)
I had never worked with Ruby before this week, so this code may be a bit crappy. Please accept my apologies.
-- Christophe
require 'roo'
require 'rubygems'
require 'rb-inotify'

# Implement INotify::Notifier.watch as described here:
# https://www.go4expert.com/articles/track-file-changes-ruby-inotify-t30264/

file_path = "#{Dir.pwd}/datasheet.csv"

def fetch_spreadsheet_data(path)
  s = Roo::CSV.new(path)
  send_event('csvdata', { value: s.cell(1, 1) })
end

SCHEDULER.every '5s' do
  notifier = INotify::Notifier.new
  notifier.watch(file_path, :modify) do |event|
    event.flags.each do |flag|
      # re-read the spreadsheet whenever the file is modified
      case flag.to_s
      when 'modify' then fetch_spreadsheet_data(file_path)
      end
    end
  end
  # wait for events from inotify and process them (this call blocks)
  notifier.process
end
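One possible refinement, as an untested sketch: the job above creates a fresh notifier, and therefore a fresh watch, on every 5-second tick, and notifier.process blocks the job thread until an event arrives. A variant that registers the watch once and only drains pending events inside the scheduler:

require 'roo'
require 'rb-inotify'

file_path = "#{Dir.pwd}/datasheet.csv"

def fetch_spreadsheet_data(path)
  s = Roo::CSV.new(path)
  send_event('csvdata', { value: s.cell(1, 1) })
end

# Register the watch once, outside the scheduler.
notifier = INotify::Notifier.new
notifier.watch(file_path, :modify) do |event|
  fetch_spreadsheet_data(file_path)
end

SCHEDULER.every '5s' do
  # Drain any queued inotify events without blocking the job thread.
  notifier.process if IO.select([notifier.to_io], [], [], 0)
end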

Setting an HTTP Timeout in Ruby 1.9.3

I'm using Ruby 1.9.3 and need to GET a URL. I have this working with Net::HTTP; however, if the site is down, Net::HTTP ends up hanging.
While searching the internet, I've seen many people who faced similar problems, all with hacky solutions; however, many of those posts are quite old.
Requirements:
I'd prefer using Net::HTTP to installing a new gem.
I need both the Body and the Response Code. (e.g. 200)
I do not want to require open-uri, since that makes global changes and raises some security issues.
I need to GET a URL within X seconds, or return error.
Using Ruby 1.9.3, how can I GET a URL while setting a timeout?
To clarify, my existing code looks like:
Net::HTTP.get_response(URI.parse(url))
Trying to add:
Net::HTTP.open_timeout(1000)
Results in:
NoMethodError: undefined method `open_timeout' for Net::HTTP:Class
You can set the open_timeout attribute on a Net::HTTP instance before making the connection (note that the value is in seconds):
uri = URI.parse(url)
http = Net::HTTP.new(uri.hostname, uri.port)
http.open_timeout = 1000 # seconds, not milliseconds
response = http.request_get(uri.request_uri)
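For the full set of requirements (body plus status code, failing within X seconds), a sketch along these lines may help; get_with_timeout is a made-up helper name, and on 1.9.3 a timeout surfaces as Timeout::Error (newer Rubies raise Net::OpenTimeout / Net::ReadTimeout):

require 'net/http'
require 'uri'

# Sketch: GET a URL, returning [status_code, body], or raise if the
# site fails to respond within the given number of seconds.
def get_with_timeout(url, seconds = 10)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.hostname, uri.port)
  http.open_timeout = seconds # max seconds to establish the connection
  http.read_timeout = seconds # max seconds to wait for each read
  response = http.request_get(uri.request_uri)
  [response.code, response.body] # e.g. ["200", "<html>..."]
end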
I tried all the solutions here and in the other questions about this problem, but I only got everything working with the following code. (The open-uri library is a wrapper for Net::HTTP.)
I needed a GET that had to wait longer than the default timeout and then read the response. The code is also simpler.
require 'open-uri'

open(url, :read_timeout => 5 * 60) do |response|
  if response.read[/Return: Ok/i]
    log "sending ok"
  else
    raise "error sending, no confirmation received"
  end
end

Getting Data from Public Google Calendars with Ruby

I want to download and parse all the event data from a website's public Google calendar. What would be the best way to do so? I'm considering just downloading the .ics file, or getting XML data and parsing it myself. I've looked into the Google API, but it seems unnecessarily complex if all I want to do is read the data. I'm a beginner at working with APIs and programming in general, so I'm having trouble navigating all that documentation; it doesn't provide very many helpful examples.
How about something like this:
require 'ri_cal'
require 'open-uri'

components = nil
open("https://www.google.com/calendar/ical/ocs.events%40gmail.com/public/basic.ics") do |cal|
  components = RiCal.parse(cal)
end

components.each do |calendar|
  calendar.events.each do |event|
    puts "#{event.summary} starts at: #{event.dtstart} and ends at #{event.dtend}"
  end
end
You will need to install the ri_cal gem.
UPDATE: Using iCalendar
require 'icalendar'
require 'open-uri'

calendars = nil
open("https://www.google.com/calendar/ical/ocs.events%40gmail.com/public/basic.ics") do |cal|
  # calendars = RiCal.parse(cal)
  calendars = Icalendar.parse(cal)
end

calendars.each do |calendar|
  calendar.events.each do |event|
    puts "#{event.summary} starts at: #{event.dtstart} and ends at #{event.dtend}"
  end
end

Ruby and Timeout.timeout performance issue

I'm not sure how to solve a big performance issue in my application. I'm using open-uri to request the most popular videos from YouTube, and when I ran perftools (https://github.com/tmm1/perftools.rb), it showed that the biggest performance issue is Timeout.timeout. Can anyone suggest how I might solve the problem?
I'm using Ruby 1.8.7.
Edit:
This is the output from my profiler
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B4bANr--YcONZDRlMmFhZjQtYzIyOS00YjZjLWFlMGUtMTQyNzU5ZmYzZTU4&hl=en_US
Timeout is wrapping the function that is actually doing the work to ensure that if the server fails to respond within a certain time, the code will raise an error and stop execution.
I suspect that what you are seeing is that the server is taking some time to respond. You should look at caching the response in some way.
For instance, using memcached (pseudocode):

require 'dalli'
require 'open-uri'

DALLI = Dalli::Client.new

class PopularVideos
  def self.get
    result = []
    unless result = DALLI.get("videos_#{Date.today.to_s}")
      doc = open("http://youtube/url")
      result = parse_videos(doc) # parse the doc somehow
      DALLI.set("videos_#{Date.today.to_s}", result)
    end
    result
  end
end

PopularVideos.get # calls your expensive parsing script once
PopularVideos.get # gets the result from memcached for the rest of the day

Download an image from a URL?

I am trying to use Net::HTTP's get to download an image of a Google chart from a URL I created.
This was my first attempt:
failures_url = [title, type, data, size, colors, labels].join("&")
require 'net/http'

Net::HTTP.start("http://chart.googleapis.com") { |http|
  resp = http.get("/chart?#{failures_url")
  open("pie.png", "wb") { |file|
    file.write(resp.body)
  }
}
Which produced only an empty PNG file.
For my second attempt I used the value stored inside failures_url directly in the http.get() call.
require 'net/http'

Net::HTTP.start("http://chart.googleapis.com") { |http|
  resp = http.get("/chart?chtt=Builds+in+the+last+12+months&cht=bvg&chd=t:296,1058,1217,1615,1200,611,2055,1663,1746,1950,2044,2781,1553&chs=800x375&chco=4466AA&chxl=0:|Jul-2010|Aug-2010|Sep-2010|Oct-2010|Nov-2010|Dec-2010|Jan-2011|Feb-2011|Mar-2011|Apr-2011|May-2011|Jun-2011|Jul-2011|2:|Months|3:|Builds&chxt=x,y,x,y&chg=0,6.6666666666666666666666666666667,5,5,0,0&chxp=3,50|2,50&chbh=23,5,30&chxr=1,0,3000&chds=0,3000")
  open("pie.png", "wb") { |file|
    file.write(resp.body)
  }
}
And, for some reason, this version works even though the first attempt had the same data inside the http.get() call. Does anyone know why this is?
SOLUTION:
After trying to figure out why this was happening, I found "How do I download a binary file over HTTP?".
One of the comments there mentions removing http:// from the Net::HTTP.start(...) call, since it won't succeed otherwise. Sure enough, after I did this:
failures_url = [title, type, data, size, colors, labels].join("&")

require 'net/http'

Net::HTTP.start("chart.googleapis.com") { |http|
  resp = http.get("/chart?#{failures_url}")
  open("pie.png", "wb") { |file|
    file.write(resp.body)
  }
}
it worked.
I'd go after the file using Ruby's OpenURI:
require "open-uri"
File.open('pie.png', 'wb') do |fo|
fo.write open("http://chart.googleapis.com/chart?#{failures_url}").read
end
The reason I prefer OpenURI is that it handles redirects automatically, so WHEN Google makes a change to their back-end and tries to redirect the URL, the code will handle it magically. It also handles timeouts and retries more gracefully, if I remember right.
If you must have lower-level control, then I'd look at one of the many other HTTP clients for Ruby. Net::HTTP is fine for creating new services or when a client doesn't exist, but I'd use OpenURI or something besides Net::HTTP until the need presents itself.
The URL:
http://chart.googleapis.com/chart?chtt=Builds+in+the+last+12+months&cht=bvg&chd=t:296,1058,1217,1615,1200,611,2055,1663,1746,1950,2044,2781,1553&chs=800x375&chco=4466AA&chxl=0:|Jul-2010|Aug-2010|Sep-2010|Oct-2010|Nov-2010|Dec-2010|Jan-2011|Feb-2011|Mar-2011|Apr-2011|May-2011|Jun-2011|Jul-2011|2:|Months|3:|Builds&chxt=x,y,x,y&chg=0,6.6666666666666666666666666666667,5,5,0,0&chxp=3,50|2,50&chbh=23,5,30&chxr=1,0,3000&chds=0,3000
makes URI upset. I suspect it is seeing characters that should be encoded in URLs.
For documentation purposes, here is what URI says when trying to parse that URL as-is:
URI::InvalidURIError: bad URI(is not URI?)
If I encode the URI first, I get a successful parse. Testing further with OpenURI shows it is able to retrieve the document at that point, returning 23701 bytes.
I think that is the appropriate fix for the problem, if some of those characters are truly not acceptable to URI and fall outside the RFC.
Just for information, the Addressable::URI gem is a great replacement for the built-in URI.
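For instance, a minimal sketch assuming the addressable gem is installed:

require 'addressable/uri'

# Addressable accepts characters, such as the "|" separators in the chart
# URL above, that stdlib URI rejects with InvalidURIError.
uri = Addressable::URI.parse("http://chart.googleapis.com/chart?chxl=0:|Jul-2010|Aug-2010")
puts uri.query # => "chxl=0:|Jul-2010|Aug-2010"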
resp = http.get("/chart?#{failures_url")
If you copied your original code, then you're missing a closing curly bracket in your path string.
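That is, the interpolation should close before the quote:

resp = http.get("/chart?#{failures_url}")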
Your original version did not include the parameter name for each parameter, just the data. For example, for the title, you cannot just submit "Builds+in+the+last+12+months"; it must instead be "chtt=Builds+in+the+last+12+months".
Try this:
failures_url = ["title="+title, "type="+type, "data="+data, "size="+size, "colors="+colors, "labels="+labels].join("&")
