This works (on ubuntu):
curl -v -g 'https://example.com/api/path/to/service?json={"select":["cats","dogs","cows"],"from":"20170115","to":"20170117"}'
And this works too (in ruby):
require 'rest-client'
resource = RestClient::Resource.new(
'https://example.com/api/path/to/service?json={"select":["cats","dogs","cows"],"from":"20170115","to":"20170117"}')
resp = resource.get
However, I would like to split request host and body (in ruby).
I tried this:
require 'rest-client'
resource = RestClient::Resource.new(
'https://example.com/api/path/to/service')
resp = resource.get(:data => 'json={"select":["cats","dogs","cows"],"from":"20170115","to":"20170117"}', :content_type => :json, :accept => :json)
but the server returns:
# => 400 BadRequest | application/json 49 bytes
No such encoding: "utf8"
and also with this variation:
resp = resource.get(:json => '{"select":["cats","dogs","cows"],"from":"20170115","to":"20170117"}', :content_type => :json, :accept => :json)
and the equivalent with :payload =>, but I got the same error.
I have searched Stack Overflow thin, but nothing I tried seems to work. I have a feeling it is something to do with the way Ruby escapes the double quotes before sending, but that is just a guess.
Any ideas?
All help would be greatly appreciated. Even a link to the right stackoverflow answer :-)
Thanks.
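A minimal sketch of one way to keep the host and the query separate, using only the standard library. It assumes the server reads the JSON from a query-string parameter named json, as the working curl call suggests; service_uri is a hypothetical helper name, not part of any library.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: attach a Ruby hash to the URL as a percent-encoded
# "json" query parameter instead of hand-writing the JSON into the URL.
def service_uri(base_url, query)
  uri = URI(base_url)
  # encode_www_form escapes the braces, quotes and brackets in the JSON.
  uri.query = URI.encode_www_form(json: JSON.generate(query))
  uri
end

uri = service_uri('https://example.com/api/path/to/service',
                  { select: %w[cats dogs cows], from: '20170115', to: '20170117' })
puts uri

# rest-client's README documents a :params key in the headers hash, which
# should build the same kind of query string:
#   RestClient.get('https://example.com/api/path/to/service',
#                  params: { json: JSON.generate(query) }, accept: :json)
```

The resulting URI can be passed to RestClient::Resource.new(uri.to_s) in place of the hand-written string.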
I’m making requests to the Reddit API. First, I set a subreddit top URL:
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
All of these correctly get the contents:
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
Open3.capture2('/usr/bin/curl', '--user-agent', 'My agent', reddit_url.to_s)[0]
URI.open(reddit_url, 'User-Agent' => 'My agent').read
But then I try it with a URL for a specific post:
reddit_url = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
And both Net::HTTP and Open3/curl fail, getting only empty strings. URI.open continues to work, as does opening the URL in a web browser.
Why doesn’t the second request work with two of the solutions? And why does it work with URI.open, when that’s supposed to be “an easy-to-use wrapper for Net::HTTP”? What does it do differently, and how can it be replicated with Net::HTTP and curl?
Working with your example, and focussing on Net::HTTP for simplicity, the first example doesn't work as written:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
# => Type Error - no implicit conversion of URI::HTTPS into String
Instead I used this as my starting point:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007fc3ea8e7320>
puts result.body.size
# => 167,394
With that working we can try the second URL. Interestingly, I get different results depending on whether I re-use the initial connection or make a new one:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
reddit_url_two = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007f931a143390>
puts result.body.size
# => 174,615
http_two = Net::HTTP.new(reddit_url_two.host, reddit_url_two.port)
http_two.use_ssl = true
result_two = http_two.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_two
# => #<Net::HTTPMovedPermanently:0x00007f931a148818>
puts result_two.body.size
# => 0
result_reusing_connection = http.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_reusing_connection
# => #<Net::HTTPOK:0x00007f931a0fb3b0>
puts result_reusing_connection.body.size
# => 141,575
So I suspect you're getting a 301 redirect sometimes and that's causing the confusion. There's another question and answer here for how to follow redirects.
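As a rough sketch of following such a redirect with Net::HTTP: the helper names fetch_following_redirects and resolve_redirect are made up for illustration, and the limit of 5 hops is an arbitrary assumption. My guess is that reddit.com redirects to www.reddit.com, but I have not confirmed where the 301 points.

```ruby
require 'net/http'
require 'uri'

# Resolve a Location header (absolute or relative) against the URL that
# produced the redirect.
def resolve_redirect(current_uri, location)
  URI.join(current_uri.to_s, location)
end

# Hypothetical helper: GET a URL and follow up to `limit` redirects.
def fetch_following_redirects(uri, headers = {}, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, headers)
  end

  if response.is_a?(Net::HTTPRedirection)
    # Redirect responses (including 301) carry the target in Location.
    next_uri = resolve_redirect(uri, response['location'])
    fetch_following_redirects(next_uri, headers, limit - 1)
  else
    response
  end
end

# Usage (performs real network requests):
#   url = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
#   puts fetch_following_redirects(url, 'User-Agent' => 'My agent').body.size
```

With curl, the equivalent is the --location (-L) flag.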
With Ruby, my app:
checks if the page status is 200
parses the PDF files if so
sends the result of the scraping via email
Having tested all the parts of the code, everything works fine except one thing: the mail that is sent doesn't contain the result of my scraping.
What is the issue? Is it related to the variable @monscrape, which may not be recognised in the final part of the code?
My code:
require 'open-uri'
require "net/http"
require 'rubygems'
require 'pdf/reader'
require 'mail'
options = { :address => "smtp.gmail.com",
:port => 587,
:domain => 'gmail.com',
:user_name => 'mail@gmail.com',
:password => 'pwd',
:authentication => 'plain',
:enable_starttls_auto => true
}
lien= "http://www.example.com"
url = URI.parse(lien)
req = Net::HTTP.new(url.host, url.port)
res = req.request_head(url.path)
if res.code == "200"
io = open('http://www.example.com')
reader = PDF::Reader.new(io)
reader.pages.each do |page|
res = page.text
@monscrape = res.scan(/text[\s\S]*text/)
end
Mail.defaults do
delivery_method :smtp, options
end
Mail.deliver do
to 'mail@hotmail.com'
from 'Author <mail@gmail.com>'
subject 'testing sendmail'
html_part do
content_type 'text/html; charset=UTF-8'
body '<h1>Please find below the scrape <%= @monscrape %></h1>'
end
end
else
puts "the link doenst work"
end
The problem is that the Mail.deliver block is evaluated using instance_eval. Therefore none of your instance variables (@variables) will be visible inside the Mail block.
So @monscrape will always be nil inside the Mail.deliver block.
One solution is to use a local (non-instance) variable instead:
monscrape = "test"
Mail.deliver do
...
body "<h1>Please find below the scrape #{monscrape}</h1>"
...
end
Also note that Mail does not support ERB(!), so you cannot use something like <%= monscrape %> in the body. You have to treat it like a normal string, using string interpolation with double quotes (") rather than single quotes (').
See further discussion and options here:
Why can't the Mail block see my variable?
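The effect can be reproduced without the mail gem. In this sketch FakeMessage, build_message and Scraper are hypothetical stand-ins (they are not part of Mail); the only assumption is that the DSL runs its block with instance_eval, as described above.

```ruby
# Stand-in for a DSL object such as a mail message (hypothetical class).
class FakeMessage
  attr_reader :text

  def body(value)
    @text = value
  end
end

# Runs the block with `self` switched to the message, like a DSL would.
def build_message(&block)
  message = FakeMessage.new
  message.instance_eval(&block)
  message
end

class Scraper
  def initialize
    @monscrape = 'scraped text'
  end

  # @monscrape is looked up on the FakeMessage here, where it is unset (nil).
  def via_instance_variable
    build_message { body "scrape: #{@monscrape.inspect}" }
  end

  # A local variable is captured by the block's closure, so it stays visible.
  def via_local
    monscrape = @monscrape
    build_message { body "scrape: #{monscrape.inspect}" }
  end
end

scraper = Scraper.new
puts scraper.via_instance_variable.text
puts scraper.via_local.text
```

Copying the instance variable into a local just before the block is usually the smallest change that works.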
You can't use
res = req.request_head(url.path)
when url.path returns "". request_head expects a path of at least "/". That implies you need to fix up the URL being passed so it at least has the root path "/".
url = URI.parse('http://www.example.com')
url.path # => ""
req.request_head(url.path)
*** ArgumentError Exception: HTTP request path is empty
vs.
url = URI.parse('http://www.example.com/')
url.path # => "/"
req.request_head(url.path)
#<Net::HTTPOK 200 OK readbody=true>
The second problem is you're trying to read something as PDF that isn't a PDF file. Example.com returns HTML, which is text. You can't use:
io = open('http://www.example.com')
reader = PDF::Reader.new(io)
Trying to do so fails with "PDF does not contain EOF marker".
It's really important that you understand what types of objects/resources are being returned by a site when you request a URL. You can't declare them willy-nilly and expect code to accept it without errors.
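Both checks can be sketched with the standard library alone. request_path and pdf? are hypothetical helper names; the PDF check relies only on the fact that PDF files start with the bytes %PDF-.

```ruby
require 'uri'

# Hypothetical helper: the path to hand to request_head, never empty.
def request_path(url)
  path = URI.parse(url).path
  path.empty? ? '/' : path
end

# Hypothetical helper: true when the downloaded bytes start with the
# PDF header, false for HTML or anything else.
def pdf?(bytes)
  bytes.start_with?('%PDF-')
end

puts request_path('http://www.example.com')
puts pdf?('<!doctype html><html></html>')
```

Calling pdf? on the downloaded content before PDF::Reader.new lets the script report a clear "not a PDF" message instead of a parser error.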
I'm trying to use webmock with rspec to stub out requests to Aws but I can't seem to get the regex to work for SQS polling. If I run rspec, webmock generates a 'correct' stub for me to use in a before(:each) block, in my spec_helper.rb like this:
You can stub this request with the following snippet:
stub_request(:post, "https://sqs.us-west-2.amazonaws.com/123456789012/backlog").
with(:body => "Action=ReceiveMessage&AttributeName.1=All&MaxNumberOfMessages=1&MessageAttributeName.1=All&QueueUrl=https%3A%2F%2Fsqs.us-west-2.amazonaws.com%2F123456789012%2Fbacklog&Version=2012-11-05&VisibilityTimeout=0&WaitTimeSeconds=20",
:headers => {'Accept'=>'*/*', 'Accept-Encoding'=>'', 'Authorization'=>'AWS4-HMAC-SHA256 Credential=MY_ACCESS_KEY/20150726/us-west-2/sqs/aws4_request, SignedHeaders=content-type;host;user-agent;x-amz-content-sha256;x-amz-date, Signature=large_alpha-numeric-signature', 'Content-Length'=>'224', 'Content-Type'=>'application/x-www-form-urlencoded; charset=utf-8', 'Host'=>'sqs.us-west-2.amazonaws.com', 'User-Agent'=>'aws-sdk-ruby2/2.1.7 ruby/2.2.2 x86_64-darwin14', 'X-Amz-Content-Sha256'=>'69336339ae76cf370477d4dsaf667as0b5dd8d25762c7c78sad8a', 'X-Amz-Date'=>'20150726T143009Z'}).
to_return(:status => 200, :body => "", :headers => {})
So in my spec_helper.rb I have
RSpec.configure do |config|
config.before(:each) do
stub_request(:post, "https://sqs.us-west-2.amazonaws.com/123456789012/backlog").
with(:body => "Action=ReceiveMessage&AttributeName.1=All&MaxNumberOfMessages=1&MessageAttributeName.1=All&QueueUrl=https%3A%2F%2Fsqs.us-west-2.amazonaws.com%2F123456789012%2Fbacklog&Version=2012-11-05&VisibilityTimeout=0&WaitTimeSeconds=20",
:headers => {'Accept'=>'*/*',
'Accept-Encoding'=>'',
'Authorization'=>"AWS4-HMAC-SHA256 Credential=MY_ACCESS_KEY/20150726/us-west-2/sqs/aws4_request, SignedHeaders=content-type;host;user-agent;x-amz-content-sha256;x-amz-date, Signature=" + /"^[a-zA-Z0-9]*$"/,
'Content-Length'=>'224',
'Content-Type'=>'application/x-www-form-urlencoded; charset=utf-8',
'Host'=>'sqs.us-west-2.amazonaws.com',
'User-Agent'=>'aws-sdk-ruby2/2.1.7 ruby/2.2.2 x86_64-darwin14',
'X-Amz-Content-Sha256'=>'694236339ae76cf370477d4dsaf667as0b5dd8d25762c7c78sad8a',
'X-Amz-Date'=>""+ /"^[a-zA-Z0-9]*$"/}).
to_return(:status => 200, :body => "", :headers => {})
end
The areas I'm trying to use regex against are the Signature and the X-Amz-Date because they're the only two that seem to change between different attempts to run the rspec.
The problem is the regex seems to not be working because even though I've added it into the spec_helper.rb, every time I run the suite, I get back the recommended stub from webmock instead of a passing or failing test. It should be passing at this point, from what I understand from the webmock docs and several tutorials.
How should I change this to get webmock to work for my test suite against Aws SQS polling?
I've been bashing my head against my desk for a few days now so any help is much appreciated.
Signature is likely generated using a byproduct of Time.now and I'm guessing you don't actually want to test for that. Instead simply do:
stub_request(:post, "https://sqs.us-west-2.amazonaws.com/123456789012/backlog").and_return(:status => 200, :body => "", :headers => {})
If you want to be even less specific about the URL (like omitting that ID), you can even use a regex match:
stub_request(:post, /amazonaws.com/).and_return(:status => 200, :body => "", :headers => {})
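If you would rather keep some header checks, note that the attempt in the question concatenates a String with a Regexp, which Ruby rejects before WebMock ever sees it. WebMock's README shows header values being matched against a whole Regexp, so the sketch below builds one pattern per changing header. The two patterns are my own guesses at the format (a hex signature and a compact UTC timestamp), not something taken from the AWS SDK.

```ruby
# One Regexp for the entire Authorization value; the date in the
# credential scope and the signature are allowed to vary.
AUTHORIZATION_PATTERN = %r{\AAWS4-HMAC-SHA256 Credential=MY_ACCESS_KEY/\d{8}/us-west-2/sqs/aws4_request, SignedHeaders=[a-z0-9;-]+, Signature=[0-9a-f]+\z}

# X-Amz-Date looks like 20150726T143009Z.
AMZ_DATE_PATTERN = /\A\d{8}T\d{6}Z\z/

# Shows what the original spec_helper attempt does: String + Regexp raises.
def string_plus_regexp_error
  'Signature=' + /^[a-zA-Z0-9]*$/
rescue TypeError => e
  e
end

puts string_plus_regexp_error.class

# Intended use inside the stub (requires the webmock gem):
#   stub_request(:post, "https://sqs.us-west-2.amazonaws.com/123456789012/backlog").
#     with(:headers => {'Authorization' => AUTHORIZATION_PATTERN,
#                       'X-Amz-Date' => AMZ_DATE_PATTERN}).
#     to_return(:status => 200, :body => "", :headers => {})
```

Only the headers listed in with(...) are checked, so the remaining generated headers can simply be left out.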
I had the same problem and have a solution only for 'X-Amz-Date'.
Since it is a date in a special format, wrap your mock and the method call in a Timecop.freeze block.
Timecop.freeze do
stub_request(:get, "https://s3.eu-central-1.amazonaws.com/test_bucket/test").
with(headers: {'Accept'=>'*/*', 'Accept-Encoding'=>'', 'Authorization'=>'AWS4-HMAC-SHA256 Credential=access_key/20190814/eu-central-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=9e6a6a209a0cb05346a058c12ef706ebff185ae3b72f3e542e1becbc97e8ea7a', 'Content-Length'=>'0', 'Content-Type'=>'', 'Host'=>'s3.eu-central-1.amazonaws.com', 'User-Agent'=>'aws-sdk-ruby2/2.9.44 ruby/2.6.3 x86_64-darwin18 resources', 'X-Amz-Content-Sha256'=>'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'X-Amz-Date'=>"#{time.utc.iso8601(0).gsub(/[^\p{Alnum}]/, '')}"}).
to_return(status: 200, body: "", headers: {})
AwsService.bucket.object("test").get
end
The regexp leaves only alphanumerical characters and Timecop takes care of the time.
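To see what that expression produces, here it is on its own with a fixed time instead of the frozen clock. amz_date is a hypothetical helper name, and getutc is used instead of utc only so the argument is not modified.

```ruby
require 'time' # adds Time#iso8601

# Hypothetical helper: format a Time the way the X-Amz-Date header
# appears in the stub, by stripping the punctuation from ISO 8601.
def amz_date(time)
  time.getutc.iso8601(0).gsub(/[^\p{Alnum}]/, '')
end

puts amz_date(Time.utc(2019, 8, 14, 12, 30, 9))
```

The same string should come out of strftime('%Y%m%dT%H%M%SZ') on a UTC time, which some may find easier to read.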
While I don't think it is very RESTful to have to include a payload in a DELETE request, I ran into an instance where I am testing a service that requires a payload for DELETE. Might there be a way to accomplish this using Ruby's Rest Client? Unfortunately, I am having a hard time with this one.
@json_request = '{"user_id": 5, "meta_data": "foo"}'
resource = RestClient::Resource.new "http://www.foo.com/some/process"
@response_update = resource.delete(@json_request, :content_type => :json, :accept => :json)
Output:
ArgumentError:
wrong number of arguments (2 for 0..1)
Try this
RestClient::Request.execute(:method => 'delete', :url => "http://www.foo.com", :payload => json_data)
Currently it's not possible with that gem. You can see a PR addressing that. Maybe you could fork it and pull those changes into your own fork of the rest-client gem.
The pull request https://github.com/rest-client/rest-client/pull/98
As a more modern update, from the README:
RestClient::Request.execute(method: :delete, url: 'http://example.com/resource',
payload: 'foo', headers: {myheader: 'bar'})
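If rest-client is not a hard requirement, the same request can be sketched with the standard library, since Net::HTTP request objects accept a body regardless of the verb. delete_request_with_body is a hypothetical helper, and the snippet only builds the request; whether the server honours a DELETE body is up to the service.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a DELETE request that
# carries a JSON payload.
def delete_request_with_body(url, payload)
  request = Net::HTTP::Delete.new(URI(url))
  request['Content-Type'] = 'application/json'
  request['Accept'] = 'application/json'
  request.body = JSON.generate(payload)
  request
end

request = delete_request_with_body('http://www.foo.com/some/process',
                                   { user_id: 5, meta_data: 'foo' })
puts request.method
puts request.body

# Sending it (performs a real network request):
#   uri = URI('http://www.foo.com/some/process')
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```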
I need to send a POST request as an XML string but I get odd results. The code:
require 'rest_client'
response = RestClient.post "http://127.0.0.1:2000", "<tag1>text</tag1>", :content_type => "text/xml"
I expect to receive "<tag1>text</tag1>" as the parameter on the request server. Instead, I get "tag1"=>"text". It converts the XML to a hash. Why is that? Any way around this?
Try this:
response = RestClient.post "http://127.0.0.1:2000",
"<tag1>text</tag1>",
{:accept => :xml, :content_type => :xml}
I think you just needed to specify the ":accept" to let it know you wanted to receive it in the XML format. Assuming it's your own server, you can debug on the server and see the request format used is probably html.
Hope that helps.
Instead of using RestClient, use Ruby's built-in OpenURI for GET requests, or something like Net::HTTP or the incredibly powerful Typhoeus:
uri = URI('http://www.example.com/search.cgi')
res = Net::HTTP.post_form(uri, 'q' => 'ruby', 'max' => '50')
In Typhoeus, you'd use:
res = Typhoeus::Request.post(
'http://localhost:3000/posts',
:params => {
:title => 'test post',
:content => 'this is my test'
}
)
Your resulting page, if it's in XML will be easy to parse using Nokogiri:
doc = Nokogiri::XML(res.body)
At that point you'll have a fully parsed DOM, ready to be searched, using Nokogiri's search methods, such as search and at, or any of their related methods.
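Note that post_form sends form fields. To send the XML string itself as the request body with Net::HTTP, something along these lines should work; xml_post_request is a hypothetical helper, and it only builds the request so that the body and header can be inspected before anything is sent.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a POST whose body is the
# raw XML string, labelled as text/xml.
def xml_post_request(url, xml)
  request = Net::HTTP::Post.new(URI(url))
  request['Content-Type'] = 'text/xml'
  request.body = xml
  request
end

request = xml_post_request('http://127.0.0.1:2000/', '<tag1>text</tag1>')
puts request.body

# Sending it (performs a real network request):
#   uri = URI('http://127.0.0.1:2000/')
#   res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```

How the receiving framework exposes that body (raw string or parsed parameters) is decided on the server side, so the raw request body is the place to look there.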