I can't seem to get this RSS feed to work properly. I've tried Nokogiri and now RSS::Parser, and neither works:
a = 'https://phys.org/rss-feed/biology-news/biology-other/'
URI.open(a) do |rss|
feed = RSS::Parser.parse(rss)
puts "Title: #{feed.channel.title}"
feed.items.each do |item|
puts "Item: #{item.title}"
end
end
The code is taken directly out of the docs: https://github.com/ruby/rss
The feed is valid, so I'm confused as to why there's a 400 error code.
What am I doing wrong? Anybody have insight as to how to get this RSS parsed?
Here is the error:
/Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:364:in `open_http': 400 Bad request (OpenURI::HTTPError)
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:741:in `buffer_open'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:212:in `block in open_loop'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:210:in `catch'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:210:in `open_loop'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:151:in `open_uri'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/open_uri_redirections-0.2.1/lib/open-uri/redirections_patch.rb:55:in `open_uri'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:721:in `open'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:29:in `open'
from /users/user3/app.rb:1856:in `<main>'
The web server requires a User-Agent header on the request. Without one, it responds with the 400 error.
require 'uri'
require 'open-uri'
require 'rss'
uri = URI.parse("https://phys.org/rss-feed/biology-news/biology-other/")
uri.open("User-Agent" => "Ruby/#{RUBY_VERSION}") do |rss|
  feed = RSS::Parser.parse(rss)
  puts "Title: #{feed.channel.title}"
  feed.items.each do |item|
    puts "Item: #{item.title}"
  end
end
This code works for me.
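If the server still rejects the request, it can help to see exactly which status came back. The sketch below is one way to do that. `feed_headers` and `read_feed` are hypothetical helper names, not part of open-uri, and the default User-Agent string is simply the one used above.

```ruby
require 'open-uri'

# Hypothetical helper: the header hash handed to URI#open.
def feed_headers(agent = "Ruby/#{RUBY_VERSION}")
  { 'User-Agent' => agent }
end

# Hypothetical helper: returns the response body as a String, or nil
# after printing the HTTP status when the server answers with an error.
def read_feed(url, headers = feed_headers)
  URI.parse(url).open(headers) { |io| io.read }
rescue OpenURI::HTTPError => e
  # e.io.status is a two-element array such as ["400", "Bad request"].
  warn "Request failed: #{e.io.status.join(' ')}"
  nil
end
```

The returned String can be passed to RSS::Parser.parse when it is not nil.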
Thanks for your time. I'm somewhat new to OOP and Ruby, and after synthesizing solutions from a few different Stack Overflow answers I've got myself turned around.
My goal is to write a script that parses a CSV of URLs using the Nokogiri library. After trying and failing to use open-uri and the open-uri-redirections plugin to follow redirects, I settled on Net::HTTP and that got me moving...until I ran into URLs that have a 302 redirect specifically.
Here's the method I'm using to engage the URL:
require 'Nokogiri'
require 'Net/http'
require 'csv'
def fetch(uri_str, limit = 10)
# You should choose better exception.
raise ArgumentError, 'HTTP redirect too deep' if limit == 0
url = URI.parse(uri_str)
#puts "The value of uri_str is: #{ uri_str}"
#puts "The value of URI.parse(uri_str) is #{ url }"
req = Net::HTTP::Get.new(url.path, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
# puts "THE URL IS #{url.scheme + ":" + url.host + url.path}" # just a reporter so I can see if it's mangled
response = Net::HTTP.start(url.host, url.port, :use_ssl => url.scheme == 'https') { |http| http.request(req) }
case response
when Net::HTTPSuccess then response
when Net::HTTPRedirection then fetch(response['location'], limit - 1)
else
#puts "Problem clause!"
response.error!
end
end
Further down in my script I take an ARGV with the URL csv filename, do CSV.read, encode the URL to a string, then use Nokogiri::HTML.parse to turn it all into something I can use xpath selectors to examine and then write to an output CSV.
Works beautifully...so long as I encounter a 200 response, which unfortunately is not every website. When I run into a 302 I'm getting this:
C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1570:in `addr_port': undefined method `+' for nil:NilClass (NoMethodError)
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1503:in `begin_transport'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1442:in `transport_request'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1416:in `request'
from httpcsv.rb:14:in `block in fetch'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:877:in `start'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:608:in `start'
from httpcsv.rb:14:in `fetch'
from httpcsv.rb:17:in `fetch'
from httpcsv.rb:42:in `block in <main>'
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:866:in `each'
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:866:in `each'
from httpcsv.rb:38:in `<main>'
I know I'm missing something right in front of me but I can't tell what I should puts to see if it is nil. Any help is appreciated, thanks in advance.
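One possible cause, which the post itself does not confirm, is that the server sends a relative Location header (for example /login). URI.parse on such a value yields a URI with no host, so Net::HTTP.start has nothing to connect to. A minimal sketch of resolving the redirect target against the URL that was just requested follows. `next_location` is a hypothetical helper name and the example.com URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper: turns a Location header value into an absolute URL,
# using the URL of the request that produced the redirect as the base.
def next_location(current_url, location)
  URI.join(current_url, location).to_s
end

puts next_location('http://example.com/a/page', '/login')
puts next_location('http://example.com/a/page', 'https://other.example/x')
```

Inside fetch, the recursive call would then become fetch(next_location(uri_str, response['location']), limit - 1). Separately, url.path is an empty string for a URL such as http://example.com and drops any query string, so url.request_uri may be a safer argument to Net::HTTP::Get.new.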
My goal is to read a CSV file, get each ID from that file's records, insert each ID into the Meetup API URL, and then create a new CSV file with certain values from the JSON response.
Here's what I have so far:
require "net/https"
require "uri"
require 'csv'
require 'json'
membersCSV = CSV.foreach('id-members-meetup.csv') do |row|
id = row[1]
uri = URI.parse("https://api.meetup.com/2/members?order=name&member_id=" + id + "&format=json&key=MY_KEY")
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
CSV.open("ghmeetup.csv", "w", {:col_sep => ";"}) do |csv|
JSON.parse(response.body)["other_services"].each do |single|
csv << [single["twitter"]["identifier"], single["facebook"]["identifier"], single["linkedin"]["identifier"]]
end
end
end
And this is the error I get:
/Library/Ruby/Gems/2.0.0/gems/json-1.8.2/lib/json/common.rb:155:in `parse': 757: (JSON::ParserError) '<html>
<head><title>400 The plain HTTP request was sent to HTTPS port</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<center>The plain HTTP request was sent to HTTPS port</center>
<hr><center>cloudflare-nginx</center>
</body>
</html>
'
from /Library/Ruby/Gems/2.0.0/gems/json-1.8.2/lib/json/common.rb:155:in `parse'
from ghmeetup.rb:13:in `block (2 levels) in <main>'
from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/csv.rb:1266:in `open'
from ghmeetup.rb:12:in `block in <main>'
from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/csv.rb:1716:in `each'
from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/csv.rb:1120:in `block in foreach'
from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/csv.rb:1266:in `open'
from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/csv.rb:1119:in `foreach'
from ghmeetup.rb:6:in `<main>'
What do you think?
EDIT
require "uri"
require 'csv'
require 'json'
require 'net/http'
ghCSV = CSV.foreach('id-gh-meetup.csv') do |row|
id = row[1]
key="KEY"
uri = URI.parse("https://api.meetup.com/2/members?order=name&member_id=#{id}&format=json&key=#{key}")
Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
request = Net::HTTP::Get.new uri
response = http.request request
parseResponse = JSON.parse(response.body)['results'][0]
p "working"
CSV.open("ghmeetup.csv", "w") do |csv|
p "working 2"
parseResponse.each do |single|
p "working 3"
csv << single
end
end
end
end
So it works if I keep only JSON.parse(response.body) but when I add ['results'][0] in parseResponse I get this error:
ghmeetup.rb:15:in `block (2 levels) in <main>': undefined method `[]' for nil:NilClass (NoMethodError)
This is the JSON structure, I want to target [results][0].other_services.twitter.identifier
{
  results: [
    {
      other_services: {
        twitter: {
          identifier: "@HugoAmsellem"
Any idea?
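The NoMethodError means that ['results'] returned nil, so the parsed body had no results key for that request (an error payload is one possibility, though the post does not show the body). A defensive sketch using Hash#dig, which needs Ruby 2.3 or later and returns nil instead of raising when a level is missing, is shown below. The sample JSON strings are made up to mirror the outline above, and `twitter_identifier` is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical helper: pulls results[0].other_services.twitter.identifier
# out of a response body, or returns nil if any level is absent.
def twitter_identifier(body)
  JSON.parse(body).dig('results', 0, 'other_services', 'twitter', 'identifier')
end

ok  = '{"results":[{"other_services":{"twitter":{"identifier":"@example"}}}]}'
bad = '{"problem":"something went wrong"}'

p twitter_identifier(ok)   # "@example"
p twitter_identifier(bad)  # nil
```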
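HTTPS is enabled for an HTTP connection by #use_ssl=. Your first version never sets it, so Net::HTTP sends a plain HTTP request to the HTTPS port, which is exactly what the 400 response reports.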
This code gets a successful response on my system using Ruby 2.2.0:
require 'net/http' # Not HTTPS
key="..." # Get your personal API key from Meetup
uri = URI.parse("https://api.meetup.com/2/members?order=name&member_id=1&format=json&key=#{key}")
Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  request = Net::HTTP::Get.new uri
  response = http.request request
  p response.body
end
In previous versions of Ruby you would need to require 'net/https' to use HTTPS. This is no longer true.
Can you try the code above on your system?
If it works, great. If it doesn't work, then you can simplify your question code, such as omitting the CSV, the loop, the JSON, etc.
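To make the scheme check explicit, the connection can be configured from the URI itself. This is a minimal sketch using only the standard library. `build_http` is a hypothetical helper name, and nothing is sent over the network until a request is made on the returned object.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns an unopened Net::HTTP object whose TLS
# setting follows the URI scheme.
def build_http(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http
end

uri  = URI.parse('https://api.meetup.com/2/members?member_id=1')
http = build_http(uri)
puts http.use_ssl?  # true for an https URL
puts http.port      # 443, the default HTTPS port
```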
I'm trying to send a GET to http://www.hello.com/sup?a=b in Ruby 1.9.3-p194 (I can't update the version due to legacy code):
uri = URI.parse("http://www.hello.com/sup?a=b")
uri.query = "a=b"
req = Net::HTTP::Get.new(uri)
response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
case response
when Net::HTTPSuccess then response
else
puts "Error"
end
I'm actually using ruby 1.9.3-p194 but I'm getting this error:
/Users/hithere/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:1860:in `initialize': undefined method `empty?' for #<URI::HTTP:0x007f938d9051c8> (NoMethodError)
from /Users/hithere/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:2093:in `initialize'
from send_to_hg_given_place_id.rb:101:in `new'
from send_to_hg_given_place_id.rb:101:in `block in fetch'
from /Users/hithere/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/timeout.rb:68:in `timeout'
from send_to_hg_given_place_id.rb:100:in `fetch'
from send_to_hg_given_place_id.rb:141:in `block in <main>'
from send_to_hg_given_place_id.rb:133:in `each'
from send_to_hg_given_place_id.rb:133:in `<main>'
For some reason it is trying to use http.rb 1.9.1, and 1.9.1 requires the parameter in #new to be a String instead of URI. I'd like to either fix it so it uses 1.9.3, or obtain a solution that works for 1.9.1 http.rb
See the examples in the Net::HTTP documentation. On this Ruby version you need to pass a String to Net::HTTP::Get.new.
Here is an example from the docs (note uri.request_uri):
uri = URI('http://example.com/some_path?query=string')
Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri.request_uri
  response = http.request request # Net::HTTPResponse object
end
You can append GET parameters to the URL. Just pay attention to the URL encoding. See for example this SO question.
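For illustration, here is how the pieces of a URI relate, and one way to encode parameters safely. This is a sketch using URI.encode_www_form from the standard library. The second parameter name and its value are made up.

```ruby
require 'uri'

uri = URI.parse('http://www.hello.com/sup')
# encode_www_form escapes keys and values, so spaces and '&' are safe.
uri.query = URI.encode_www_form('a' => 'b', 'q' => 'fish & chips')

puts uri.path         # /sup
puts uri.request_uri  # /sup?a=b&q=fish+%26+chips
```

Passing uri.request_uri, which is a String, to Net::HTTP::Get.new avoids the `empty?` error on versions that do not accept a URI object.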
Instead of
uri = URI.parse(http://www.hello.com/sup?a=b)
it should be
uri = URI.parse("http://www.hello.com/sup?a=b")
I'm trying to learn REST in Ruby using the Twitter API.
According to https://dev.twitter.com/docs/api/1/get/trends I have to send a GET request to http://api.twitter.com/1/trends.json.
My Ruby code is:
require 'rubygems'
require 'rest-client'
require 'json'
url = 'http://api.twitter.com/1/trends.json'
response = RestClient.get(url)
puts response.body
But I'm getting the following error:
/home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in `return!': 404 Resource Not Found (RestClient::ResourceNotFound)
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in `process_result'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:178:in `block in transmit'
from /home/danik/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:745:in `start'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:172:in `transmit'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:64:in `execute'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:33:in `execute'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient.rb:68:in `get'
from TwitterTrends.rb:5:in `<main>'
What is wrong?
You are getting that error because the resource you are trying to fetch, http://api.twitter.com/1/trends.json, does not exist, as the trends docs explain:
This method is deprecated and has been replaced by GET trends/:woeid.
Please update your applications with the new endpoint.
You want to fetch a URL like https://api.twitter.com/1/trends/1.json instead. So, in your code, try this:
require 'rubygems'
require 'rest-client'
require 'json'
url = 'https://api.twitter.com/1/trends/1.json'
response = RestClient.get(url)
puts response.body
And you should get a response.
I'm trying to write my first Ruby program, but have a problem. The code has to download 32 MP3 files over HTTP. It actually downloads a few, then times out.
I tried setting a timeout period, but it makes no difference. Running the code under Windows, Cygwin and Mac OS X has the same result.
This is the code:
require 'rubygems'
require 'open-uri'
require 'nokogiri'
require 'set'
require 'net/http'
require 'uri'
puts "\n Up and running!\n\n"
links_set = {}
pages = ['http://www.vimeo.com/siai/videos/sort:oldest',
'http://www.vimeo.com/siai/videos/page:2/sort:oldest',
'http://www.vimeo.com/siai/videos/page:3/sort:oldest']
pages.each do |page|
doc = Nokogiri::HTML(open(page))
doc.search('//*[@href]').each do |m|
video_id = m[:href]
if video_id.match(/^\/(\d+)$/i)
links_set[video_id[/\d+/]] = m.children[0].to_s.split(" at ")[0].split(" -- ")[0]
end
end
end
links = links_set.to_a
p links
cookie = ''
file_name = ''
open("http://www.tubeminator.com") {|f|
cookie = f.meta['set-cookie'].split(';')[0]
}
links.each do |link|
open("http://www.tubeminator.com/ajax.php?function=downloadvideo&url=http%3A%2F%2Fwww.vimeo.com%2F" + link[0],
"Cookie" => cookie) {|f|
puts f.read
}
open("http://www.tubeminator.com/ajax.php?function=convertvideo&start=0&duration=1120&size=0&format=mp3&vq=high&aq=high",
"Cookie" => cookie) {|f|
file_name = f.read
}
puts file_name
Net::HTTP.start("www.tubeminator.com") { |http|
#http.read_timeout = 3600 # 1 hour
resp = http.get("/download-video-" + file_name)
open(link[1] + ".mp3", "wb") { |file|
file.write(resp.body)
}
}
end
puts "\n Yay!!"
And this is the exception:
/Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/protocol.rb:140:in `rescue in rbuf_fill': Timeout::Error (Timeout::Error)
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/protocol.rb:134:in `rbuf_fill'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/protocol.rb:116:in `readuntil'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/protocol.rb:126:in `readline'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:2138:in `read_status_line'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:2127:in `read_new'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:1120:in `transport_request'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:1106:in `request'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:312:in `block in open_http'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:564:in `start'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:306:in `open_http'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:767:in `buffer_open'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:203:in `block in open_loop'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:201:in `catch'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:201:in `open_loop'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:146:in `open_uri'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:669:in `open'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:33:in `open'
from test.rb:38:in `block in <main>'
from test.rb:37:in `each'
from test.rb:37:in `<main>'
I'd also appreciate your comments on the rest of the code.
For Ruby 1.8 I used this to solve my timeout issues. It extends the Net::HTTP class in my code and re-initializes it with the default parameters plus my own read_timeout, which I think should keep things sane.
require 'net/http'
# Lengthen timeout in Net::HTTP
module Net
  class HTTP
    alias old_initialize initialize
    def initialize(*args)
      old_initialize(*args)
      @read_timeout = 5*60 # 5 minutes
    end
  end
end
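If patching the class globally is undesirable, the same setting can be applied per connection through the read_timeout accessor. This is a minimal sketch: the host is the one from the question, the 5-minute value mirrors the patch above, the open_timeout value is arbitrary, and nothing is sent until a request is made.

```ruby
require 'net/http'

http = Net::HTTP.new('www.tubeminator.com')
http.open_timeout = 30      # seconds to wait for the connection to open
http.read_timeout = 5 * 60  # seconds to wait for each read

puts http.read_timeout  # 300
```

This only affects requests made through this `http` object, not calls that go through open-uri.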
Your timeout isn't in the code you set the timeout for. It's here, where you use open-uri:
open("http://www.tubeminator.com/ajax.php?function=downloadvideo&url=http%3A%2F%2Fwww.vimeo.com%2F" + link[0],
You can set a read timeout for open-uri like so:
#!/usr/bin/ruby1.9
require 'open-uri'
open('http://stackoverflow.com', 'r', :read_timeout=>0.01) do |http|
  http.read
end
# => /usr/lib/ruby/1.9.0/net/protocol.rb:135:in `sysread': \
# => execution expired (Timeout::Error)
# => ...
# => from /tmp/foo.rb:5:in `<main>'
:read_timeout is new for Ruby 1.9 (it's not in Ruby 1.8). 0 or nil means "no timeout."
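Because the script fetches many files in a row, a single slow response aborts the whole run. One option is to retry a timed-out step a few times before giving up. This is a sketch under stated assumptions: `with_retries` is a hypothetical helper, and the retry count and pause are arbitrary. On current Rubies, Net::OpenTimeout and Net::ReadTimeout are subclasses of Timeout::Error, so one rescue clause covers them.

```ruby
require 'timeout'

# Hypothetical helper: runs the block, retrying when it raises
# Timeout::Error, and re-raises the error after `attempts` failures.
def with_retries(attempts = 3, pause = 0)
  tries = 0
  begin
    tries += 1
    yield
  rescue Timeout::Error
    raise if tries >= attempts
    sleep pause
    retry
  end
end
```

Each open(...) call inside the download loop could then be wrapped in a with_retries { ... } block.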