Only OpenURI succeeds at Reddit API request - ruby

I’m making requests to the Reddit API. First, I set a subreddit top URL:
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
All of these correctly get the contents:
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
Open3.capture2('/usr/bin/curl', '--user-agent', 'My agent', reddit_url.to_s)[0]
URI.open(reddit_url, 'User-Agent' => 'My agent').read
But then I try it with a URL for a specific post:
reddit_url = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
And both Net::HTTP and Open3/curl fail, getting only empty strings. URI.open continues to work, as does opening the URL in a web browser.
Why doesn’t the second request work with two of the solutions? And why does it work with URI.open, when that’s supposed to be “an easy-to-use wrapper for Net::HTTP”? What does it do differently, and how can I replicate it with Net::HTTP and curl?

Working with your example, and focussing on Net::HTTP for simplicity, the first example doesn't work as written for me (Net::HTTP.get only accepts a headers hash as its second argument from Ruby 3.0 onward):
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
# => TypeError: no implicit conversion of URI::HTTPS into String
Instead I used this as my starting point:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007fc3ea8e7320>
puts result.body.size
# => 167,394
With that working we can try the second URL. Interestingly, I get different results depending on whether I re-use the initial connection or make a new one:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
reddit_url_two = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007f931a143390>
puts result.body.size
# => 174,615
http_two = Net::HTTP.new(reddit_url_two.host, reddit_url_two.port)
http_two.use_ssl = true
result_two = http_two.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_two
# => #<Net::HTTPMovedPermanently:0x00007f931a148818>
puts result_two.body.size
# => 0
result_reusing_connection = http.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_reusing_connection
# => #<Net::HTTPOK:0x00007f931a0fb3b0>
puts result_reusing_connection.body.size
# => 141,575
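So I suspect you're sometimes getting a 301 redirect, and that's causing the confusion. The second URL uses the host reddit.com rather than www.reddit.com, and the request to that host comes back as a 301 with an empty body. URI.open follows redirects automatically, whereas Net::HTTP does not, and curl only does so when given -L (--location). There's another question and answer here for how to follow redirects.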
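A minimal sketch of following redirects by hand with Net::HTTP might look like the following. The helper name fetch_following_redirects and the default limit of 5 hops are assumptions made for illustration; they are not part of Net::HTTP.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET a URL and follow up to `limit` redirects.
def fetch_following_redirects(url, headers = {}, limit = 5)
  raise ArgumentError, 'too many redirects' if limit <= 0

  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, headers)
  end

  if response.is_a?(Net::HTTPRedirection)
    # The Location header may be relative, so resolve it against the current URL.
    next_url = URI.join(uri.to_s, response['location']).to_s
    fetch_following_redirects(next_url, headers, limit - 1)
  else
    response
  end
end

# Usage (performs real network requests):
# res = fetch_following_redirects(
#   'https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json',
#   'User-Agent' => 'My agent'
# )
# puts res.body.size
```

With curl, the equivalent is to add the -L (--location) flag to the command line.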

Related

stub_request with Bearer Authorisation header not working in webmock 2

The following code works with WebMock 1.20.4 but not with 2.0.1:
stub_request(:get, "http://www.myapi.com/my-endpoint")
.with(headers: {'Authorization' => "Bearer fake_oauth_token"})
.to_return(:body => mock_response)
This is the code I am stubbing.
def get_stuff(oauth_token)
faraday = Faraday.new(:url => "http://www.myapi.com/my-endpoint", :ssl => {verify: false})
response = faraday.get do |req|
req.options[:timeout] = 10
req.headers['Authorization'] = "Bearer #{oauth_token}"
end
if response.status == 200
response.body
else
{error: "failed"}.to_json
end
end
Using assert_requested :get, "http://www.myapi.com/my-endpoint", :headers => {'Authorization' => "Bearer fake_oauth_token"}, :times => 1 and removing the headers from stub_request I get the following output from the assert.
Failure/Error: assert_requested :get, "#{Conf.graphql[:host]}?query=#{graphql_user_details_query}", :headers => headers, :times => 1
The request GET http://www.myapi.com/my-endpoint with headers {'Authorization'=>'Bearer fake_oauth_token'} was expected to execute 1 time but it executed 0 times
The following requests were made:
GET http://www.myapi.com/my-endpoint with headers {'Accept-Encoding'=>'gzip, compressed', 'Authorization'=>'Basic QmVhcmVyIGZha2Vfb2F1dGhfdG9rZW4=', 'User-Agent'=>'Faraday v0.9.2'} was made 1 time
Is there a way to make the stub_request code work with webmock 2?
UPDATE: This issue was fixed in WebMock 2.0.2
The following is now out of date.
WebMock 2.0 was overwriting the Bearer Authorization header with a Basic Authorization header. I have reported the issue on the WebMock GitHub page ( https://github.com/bblimke/webmock/issues/617 ). Until the issue is resolved, we are monkey patching to comment out the lines that cause the issue.
We created a file WebMockHttpClient.rb that we require in our spec_helper. This comments out the lines that overwrite the Bearer Authorization header.
require 'em-http-request'
module EventMachine
class WebMockHttpClient
def build_request_signature
headers, body = @req.headers, @req.body
@conn.middleware.select { |m| m.respond_to?(:request) }.each do |m|
headers, body = m.request(self, headers, body)
end
method = @req.method
uri = @req.uri.clone
query = @req.query
uri.query = encode_query(@req.uri, query).slice(/\?(.*)/, 1)
body = form_encode_body(body) if body.is_a?(Hash)
headers = @req.headers
# if headers['authorization']
# headers['Authorization'] = WebMock::Util::Headers.basic_auth_header(headers.delete('authorization'))
# end
WebMock::RequestSignature.new(
method.downcase.to_sym,
uri.to_s,
:body => body || (@req.file && File.read(@req.file)),
:headers => headers
)
end
end
end

Ruby_send the result of scraping through email

With Ruby, my app:
checks if the page status is 200
Parses the PDF files if so
sends via email the result of scraping
Having tested all the parts of the code, everything works fine except one thing: the mail that is sent doesn't contain the result of my scraping.
What is the issue? Is it related to the variable @monscrape, which may not be recognised in the final part of the code?
My code:
require 'open-uri'
require "net/http"
require 'rubygems'
require 'pdf/reader'
require 'mail'
options = { :address => "smtp.gmail.com",
:port => 587,
:domain => 'gmail.com',
:user_name => 'mail@gmail.com',
:password => 'pwd',
:authentication => 'plain',
:enable_starttls_auto => true
}
lien= "http://www.example.com"
url = URI.parse(lien)
req = Net::HTTP.new(url.host, url.port)
res = req.request_head(url.path)
if res.code == "200"
io = open('http://www.example.com')
reader = PDF::Reader.new(io)
reader.pages.each do |page|
res = page.text
@monscrape = res.scan(/text[\s\S]*text/)
end
Mail.defaults do
delivery_method :smtp, options
end
Mail.deliver do
to 'mail@hotmail.com'
from 'Author <mail@gmail.com>'
subject 'testing sendmail'
html_part do
content_type 'text/html; charset=UTF-8'
body '<h1>Please find below the scrape <%= @monscrape %></h1>'
end
end
else
puts "the link doesn't work"
end
The problem is that the Mail.deliver block is evaluated using instance_eval, so self inside the block is the Mail object. Therefore instance variables (such as @monscrape) from the surrounding scope are not visible inside the Mail block.
So @monscrape will always be nil inside the Mail.deliver block.
One solution is to use a local (non-instance) variable instead:
monscrape = "test"
Mail.deliver do
...
body "<h1>Please find below the scrape #{monscrape}</h1>"
...
end
Also note that Mail does not support ERB(!), so you cannot use something like <%= monscrape %> in the body. You have to treat it like a normal string and use string interpolation, which requires double quotes " rather than single quotes '.
See further discussion and options here:
Why can't the Mail block see my variable?
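The effect can be reproduced without the mail gem. The sketch below uses a hypothetical FakeMail class (not part of Mail) that evaluates its block with instance_eval, in the way described above, to show why an instance variable reads as nil inside the block while a local variable is still captured by the closure.

```ruby
# FakeMail is a stand-in written for this illustration; it is not the Mail gem.
class FakeMail
  attr_reader :body_text

  def self.deliver(&block)
    mail = new
    mail.instance_eval(&block) # self inside the block becomes `mail`
    mail
  end

  def body(text)
    @body_text = text
  end
end

@monscrape = 'from instance variable'
monscrape  = 'from local variable'

# @monscrape is looked up on the FakeMail instance, where it was never set.
with_ivar = FakeMail.deliver { body "scrape: #{@monscrape.inspect}" }
# The local variable is part of the block's closure, so it stays visible.
with_local = FakeMail.deliver { body "scrape: #{monscrape}" }

puts with_ivar.body_text  # prints: scrape: nil
puts with_local.body_text # prints: scrape: from local variable
```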
You can't use
res = req.request_head(url.path)
when url.path returns "". request_head expects a path of at least "/". That implies you need to fix up the URL being passed so it at least has the root path "/".
url = URI.parse('http://www.example.com')
url.path # => ""
req.request_head(url.path)
*** ArgumentError Exception: HTTP request path is empty
vs.
url = URI.parse('http://www.example.com/')
url.path # => "/"
req.request_head(url.path)
#<Net::HTTPOK 200 OK readbody=true>
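If the trailing slash can't be guaranteed in the input, one option is to ask the URI for its request_uri instead of its path, since that falls back to "/" when the path is empty. A small sketch, where the helper name head_path is made up for this example:

```ruby
require 'uri'

# Hypothetical helper: return a path that is safe to pass to request_head.
def head_path(url_string)
  # request_uri returns "/" for an empty path (and appends any query string).
  URI.parse(url_string).request_uri
end

head_path('http://www.example.com')   # => "/"
head_path('http://www.example.com/')  # => "/"
```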
The second problem is you're trying to read something as PDF that isn't a PDF file. Example.com returns HTML, which is text. You can't use:
io = open('http://www.example.com')
reader = PDF::Reader.new(io)
Trying to do so fails with "PDF does not contain EOF marker".
It's really important that you understand what types of objects/resources are being returned by a site when you request a URL. You can't declare them willy-nilly and expect code to accept it without errors.
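One inexpensive way to check what came back before handing it to a PDF parser is to look at the first bytes: PDF files start with the marker %PDF-. The sketch below assumes the whole response body has already been read into a string; the helper name looks_like_pdf? is an invention for this example.

```ruby
# Hypothetical helper: true when the data starts with the PDF header marker.
def looks_like_pdf?(data)
  data.to_s.b.start_with?('%PDF-')
end

looks_like_pdf?("%PDF-1.4\n...")           # => true
looks_like_pdf?('<!doctype html><html>')   # => false

# Possible usage with open-uri and pdf-reader (not run here):
# data = URI.open('http://www.example.com/').read
# reader = PDF::Reader.new(StringIO.new(data)) if looks_like_pdf?(data)
```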

Multiple calls to the same endpoint with different results in webmock?

I have some code that looks like this:
while response.droplet.status != env["user_droplet_desired_state"] do
sleep 2
response = ocean.droplet.show env["droplet_id"]
say ".", nil, false
end
The idea being you can set the app to wait until the server is in a certain state (eg. restart it, then watch it until it's active again)
However, I'm using webmock in the tests, and I can't figure out a way to give a different response the second time.
For example, code like this:
stub_request(:get, "https://api.digitalocean.com/v2/droplets/6918990?per_page=200").
with(:headers => {'Accept'=>'*/*', 'Accept-Encoding'=>'gzip;q=1.0,deflate;q=0.6,identity;q=0.3', 'Authorization'=>'Bearer foo', 'Content-Type'=>'application/json', 'User-Agent'=>'Faraday v0.9.2'}).
to_return(:status => 200, :body => fixture('show_droplet_inactive'), :headers => {})
stub_request(:get, "https://api.digitalocean.com/v2/droplets/6918990?per_page=200").
with(:headers => {'Accept'=>'*/*', 'Accept-Encoding'=>'gzip;q=1.0,deflate;q=0.6,identity;q=0.3', 'Authorization'=>'Bearer foo', 'Content-Type'=>'application/json', 'User-Agent'=>'Faraday v0.9.2'}).
to_return(:status => 200, :body => fixture('show_droplet'), :headers => {})
With the idea being "First time mark as in-active so the loop goes through one-time, then mark as active afterwards"
The documentation says that stubs are just done as "Last one found will work":
Always the last declared stub matching the request will be applied
i.e:
stub_request(:get, "www.example.com").to_return(:body => "abc")
stub_request(:get, "www.example.com").to_return(:body => "def")
Net::HTTP.get('www.example.com', '/') # ====> "def"
Is it possible to model multiple calls to the same endpoint with different results in webmock?
If you pass multiple arguments to #to_return, it will respond each time with the next response, and then just keep returning the last one over and over. For example:
require 'webmock/rspec'
require 'uri'
describe "something" do
it "happens" do
stub_request(:get, 'example.com/blah').
to_return({status: 200, body: 'ohai'}, {status: 200, body: 'there'})
puts Net::HTTP.get(URI('http://example.com/blah'))
puts Net::HTTP.get(URI('http://example.com/blah'))
puts Net::HTTP.get(URI('http://example.com/blah'))
puts Net::HTTP.get(URI('http://example.com/blah'))
end
end
When run as rspec <file>, this will print:
ohai
there
there
there

login vk.com net::http.post_form

I want to log in to vk.com or m.vk.com with Ruby, but my code doesn't work.
require 'net/http'
email = "qweqweqwe@gmail.com"
pass = "qeqqweqwe"
userUri = URI('m.vk.com/index.html')
Net::HTTP.get(userUri)
res = Net::HTTP.post_form(userUri, 'email' => email, 'pass' => pass)
puts res.body
First of all, you need to change userUri to the following:
userUri = URI('https://login.vk.com/?act=login')
Which is where the vk site expects your login parameters.
I'm not very familiar with vk, but you probably need a way to handle the session cookie, both receiving it and providing it in future requests. Can you elaborate on what you're doing after login?
Here is the net/http info for cookie handling:
# Headers
res['Set-Cookie'] # => String
res.get_fields('set-cookie') # => Array
res.to_hash['set-cookie'] # => Array
puts "Headers: #{res.to_hash.inspect}"
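As a rough sketch of the "providing it for future requests" half, the Set-Cookie values can be reduced to their name=value pairs and sent back in a Cookie header. The helper name cookie_header_from and the cookie names below are assumptions for illustration, and this ignores cookie attributes such as Path, Domain, and expiry.

```ruby
require 'net/http'

# Hypothetical helper: build a Cookie request header from a response's Set-Cookie fields.
def cookie_header_from(response)
  set_cookies = response.get_fields('set-cookie') || []
  # Keep only the leading "name=value" part of each cookie, dropping its attributes.
  set_cookies.map { |cookie| cookie.split(';').first.strip }.join('; ')
end

# A response built by hand so the example does not need the network.
res = Net::HTTPOK.new('1.1', '200', 'OK')
res.add_field('Set-Cookie', 'session=abc123; Path=/; HttpOnly')
res.add_field('Set-Cookie', 'lang=en; Path=/')

cookie_header_from(res) # => "session=abc123; lang=en"

# The header can then be attached to a follow-up request object:
next_request = Net::HTTP::Get.new('/feed', 'Cookie' => cookie_header_from(res))
```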
This kind of task is exactly what Mechanize is for. Mechanize handles redirects and cookies automatically. You can do something like this:
require 'mechanize'
agent = Mechanize.new
url = "http://m.vk.com/login/"
page = agent.get(url)
form = page.forms[0]
form['email'] = "qweqweqwe@gmail.com"
form['pass'] = "qeqqweqwe"
form.submit
puts agent.page.body

Ruby HTTP get with params

How can I send HTTP GET request with parameters via ruby?
I have tried a lot of examples but all of those failed.
I know this post is old but for the sake of those brought here by google, there is an easier way to encode your parameters in a URL safe manner. I'm not sure why I haven't seen this elsewhere as the method is documented on the Net::HTTP page. I have seen the method described by Arsen7 as the accepted answer on several other questions also.
Mentioned in the Net::HTTP documentation is URI.encode_www_form(params):
# Lets say we have a path and params that look like this:
path = "/search"
params = {q: "answer"}
# Example 1: Replacing the #path_with_params method from Arsen7
def path_with_params(path, params)
encoded_params = URI.encode_www_form(params)
[path, encoded_params].join("?")
end
# Example 2: A shortcut for the entire example by Arsen7
uri = URI.parse("http://localhost.com" + path)
uri.query = URI.encode_www_form(params)
response = Net::HTTP.get_response(uri)
Which example you choose is very much dependent on your use case. In my current project I am using a method similar to the one recommended by Arsen7 along with the simpler #path_with_params method and without the block format.
# Simplified example implementation without response
# decoding or error handling.
require "net/http"
require "uri"
class Connection
VERB_MAP = {
:get => Net::HTTP::Get,
:post => Net::HTTP::Post,
:put => Net::HTTP::Put,
:delete => Net::HTTP::Delete
}
API_ENDPOINT = "http://dev.random.com"
attr_reader :http
def initialize(endpoint = API_ENDPOINT)
uri = URI.parse(endpoint)
@http = Net::HTTP.new(uri.host, uri.port)
end
def request(method, path, params)
case method
when :get
full_path = path_with_params(path, params)
request = VERB_MAP[method].new(full_path)
else
request = VERB_MAP[method].new(path)
request.set_form_data(params)
end
http.request(request)
end
private
def path_with_params(path, params)
encoded_params = URI.encode_www_form(params)
[path, encoded_params].join("?")
end
end
con = Connection.new
con.request(:post, "/account", {:email => "test@test.com"})
# => #<Net::HTTPCreated 201 Created readbody=true>
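To make the encoding itself concrete, here is a short sketch of what URI.encode_www_form produces for a few inputs. The helper name uri_with_params and the localhost URL are placeholders chosen for this example.

```ruby
require 'uri'

# Spaces become "+", reserved characters are percent-encoded,
# and array values repeat the key.
URI.encode_www_form(q: 'ruby http')             # => "q=ruby+http"
URI.encode_www_form(name: 'John&Sons')          # => "name=John%26Sons"
URI.encode_www_form(q: 'ruby', tag: %w[a b])    # => "q=ruby&tag=a&tag=b"

# Hypothetical helper: attach encoded params to a base URL.
def uri_with_params(base, params)
  uri = URI.parse(base)
  uri.query = URI.encode_www_form(params) unless params.empty?
  uri
end

uri_with_params('http://localhost:3000/users', { id: 1, name: 'John&Sons' }).to_s
# => "http://localhost:3000/users?id=1&name=John%26Sons"
```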
I assume that you understand the examples on the Net::HTTP documentation page but you do not know how to pass parameters to the GET request.
You just append the parameters to the requested address, in exactly the same way you type such address in the browser:
require 'net/http'
res = Net::HTTP.start('localhost', 3000) do |http|
http.get('/users?id=1')
end
puts res.body
If you need some generic way to build the parameters string from a hash, you may create a helper like this:
require 'cgi'
def path_with_params(page, params)
return page if params.empty?
page + "?" + params.map {|k,v| CGI.escape(k.to_s)+'='+CGI.escape(v.to_s) }.join("&")
end
path_with_params("/users", :id => 1, :name => "John&Sons")
# => "/users?id=1&name=John%26Sons" (hashes keep insertion order on Ruby 1.9+)

Resources