Ruby on Rails - get file from URL - ruby

I'm using the Amazon Ads API, which gives me a URL as a response. Opening that URL in a browser gives me the file I need. The problem is that I don't know how to download the file from that URL in Ruby. Can anyone help?
Thanks

require 'open-uri'
contents = URI.open("https://hello.mdominiak.com").read
Documentation:
https://ruby-doc.org/stdlib-3.1.2/libdoc/open-uri/rdoc/OpenURI.html
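To write the response to a local file rather than holding it in a string, a minimal sketch (the destination path and helper name are examples, not part of the API):

```ruby
require 'open-uri'

# Read the remote resource and write it to a local file in binary mode.
def save_url(url, dest)
  URI.open(url) do |remote|
    File.binwrite(dest, remote.read)
  end
end

# save_url('https://hello.mdominiak.com', 'downloaded.file')
```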

Related

ruby Nokogiri requests 403 Forbidden

Hi, I use the Nokogiri gem to scrape the gem details from ruby-toolbox:
Nokogiri::HTML(open("https://www.ruby-toolbox.com/categories/by_name"))
but I get the error: "403 Forbidden"
Can anyone tell me why I am getting this error?
Thanks in advance
Try changing your user agent:
Nokogiri::HTML(open("https://www.ruby-toolbox.com/categories/by_name", 'User-Agent' => 'firefox'))
www.ruby-toolbox.com doesn't seem to accept 'ruby' as an agent.
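Note that on Ruby 3+, Kernel#open no longer accepts URLs, so the modern equivalent goes through URI.open. A sketch (the helper name and agent string are arbitrary):

```ruby
require 'open-uri'

# Fetch a page with a custom User-Agent header; on Ruby 3+ use URI.open,
# since passing a URL to Kernel#open was removed.
def fetch_with_agent(url, agent = 'Mozilla/5.0')
  URI.open(url, 'User-Agent' => agent).read
end

# Nokogiri::HTML(fetch_with_agent('https://www.ruby-toolbox.com/categories/by_name'))
```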
As mentioned, the user agent has to be changed. In addition, you may need to disable SSL certificate verification, since it can throw an error as well (skipping verification is insecure, so prefer fixing your certificate store if you can).
require 'nokogiri'
require 'open-uri'
require 'openssl'
url = 'https://www.ruby-toolbox.com/categories/by_name'
content = open(url, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE, 'User-Agent' => 'opera')
doc = Nokogiri::HTML(content)
doc.xpath('//div[@id="teaser"]//h2/text()').to_s
# "All Categories by name"
This seems to be an OpenURI issue. Try this:
Nokogiri::HTML(open("https://www.ruby-toolbox.com/categories/by_name", 'User-Agent' => 'ruby'))
I spent about an hour trying solutions for a 403 Forbidden, including tinkering with the User-Agent argument to Nokogiri::HTML(open("www.something.com", 'User-Agent' => 'Safari')), looking into proxies, and other things.
But the whole time there was nothing wrong with my code: the website my automated browsing had been visiting had subtly changed its URL, and the URL it previously visited was now forbidden.
I hope this saves someone else some time.

Screenshot of the URL section of the browser

I want to capture screenshot of the browser URL section.
browser.screenshot.save('tdbank.png')
This saves the rendered page inside the browser window, but I want to capture the URL bar of the browser itself. Any suggestions?
Sometimes the URL shows http, sometimes https. I want to capture this in a screenshot and archive it. I know I could get it through
url = browser.url
and then do some comparison, but I need this for legal purposes and it has to be done by taking a screenshot.
Thanks in advance.
If you're on Windows, you could use the win32screenshot gem. For example:
require 'watir-webdriver'
require 'win32/screenshot'
b = Watir::Browser.new # using firefox as default browser
b.goto('http://www.example.org')
Win32::Screenshot::Take.of(:window, :title => /Firefox/).write("image.bmp")
b.close

Difficulty Accessing Section of Website using Ruby Mechanize

I am trying to access the calendar data on an Airbnb listing and so far have been unsuccessful. I am using the Mechanize gem in Ruby, and when I request the URL that returns the table, I get the following error:
require 'mechanize'
agent = Mechanize.new
page1=agent.get("https://www.airbnb.com/rooms/726348")
page2=agent.get("https://www.airbnb.com/rooms/calendar_tab_inner2/73944?cal_month=11&cal_year=2013&currency=USD")
Mechanize::ResponseCodeError: 400 => Net::HTTPBadRequest for https://www.airbnb.com/rooms/calendar_tab_inner2/726348?cal_month=11&cal_year=2013&currency=USD -- unhandled response
I have also tried clicking the tab that generates the table with the following code, but doing so simply returns the HTML of the original URL.
agent = Mechanize.new
page1=agent.get("https://www.airbnb.com/rooms/726348")
page2=agent.click(page1.link_with(:href => '#calendar'))
Any help would greatly appreciated. Thanks!
I see the problem: you need to set the request headers so the endpoint treats it as an AJAX call:
page = agent.get url, nil, nil, {'X-Requested-With' => 'XMLHttpRequest'}
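For reference, the same X-Requested-With header can be illustrated with the stdlib Net::HTTP (a sketch; the URL is the one from the question, and the request is built but not sent here):

```ruby
require 'net/http'
require 'uri'

# Build a GET request that mimics an AJAX call; endpoints that back
# dynamic tabs often refuse requests without this header.
uri = URI('https://www.airbnb.com/rooms/calendar_tab_inner2/726348?cal_month=11&cal_year=2013&currency=USD')
req = Net::HTTP::Get.new(uri)
req['X-Requested-With'] = 'XMLHttpRequest'

# To actually perform the request:
# res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
```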

Ruby - open-uri doesn't download a file itself, but just the HTML code of the website

I am trying to use this snippet:
open("data.csv", "wb") do |file|
  file << open("https://website.com/data.php", http_basic_authentication: ["username", "password"]).read
end
But instead of the desired CSV file, I just get the HTML of the website downloaded. What's the problem?
When I access the URL without being logged in, the site displays its login form (not the HTTP authentication dialog).
How can I solve this?
Thanks
I think you should try out net/http: http://dzone.com/snippets/how-download-files-ruby
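The linked snippet is terse, so here is a minimal net/http sketch of downloading a file with HTTP basic auth (the URL, credentials, file name, and helper name are placeholders taken from the question):

```ruby
require 'net/http'
require 'uri'

# Download a URL to a local file, sending HTTP basic auth credentials.
# Note this only helps if the server really uses HTTP auth; a login *form*
# (as described in the question) needs a session cookie instead.
def download_csv(url, dest, user, pass)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    req = Net::HTTP::Get.new(uri)
    req.basic_auth(user, pass)
    res = http.request(req)
    File.binwrite(dest, res.body)
    res
  end
end

# download_csv('https://website.com/data.php', 'data.csv', 'username', 'password')
```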
It's probably because your PHP script returns a response with a MIME type different from text/plain or, better, text/csv.
please see this related previous response
How to use the CSV MIME-type?
In the PHP script:
header('Content-Type: text/csv');
header('Content-disposition: attachment;filename=data.csv');

Using WSDL With Ruby

I'm getting this error:
WSDL::XMLSchema::Parser::UnknownElementError
unknown element: {}HTML
at 'new'
when I consume webservices using Ruby. Here is the code snippet:
require 'soap/wsdlDriver'
wsdl = url
driver = SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
driver.options["protocol.http.basic_auth"] << [url, user_name, password]
The url points to well-formed XML.
Any solutions?
Can you share the wsdl file? Maybe that would help us answering it better.
In any case, I'd suggest generating the driver classes first using wsdl2ruby, and then loading them in your Ruby file (through require). Examples (from the man pages):
# For server side:
$ wsdl2ruby.rb --wsdl myapp.wsdl --type server
# For client side:
$ wsdl2ruby.rb --wsdl myapp.wsdl --type client
If you load the URL in a web browser, does it get redirected to a different location?
In my experience, one reason the error "unknown element: {}HTML" comes up is that the WSDL parser is trying to parse the HTML body of an HTTP redirect and failing. Deal with the redirect yourself (either in code or manually) and give the WSDL driver the final URL.
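If a redirect is the culprit, one way to find the final URL before handing it to SOAP::WSDLDriverFactory is to follow the redirects yourself with the stdlib Net::HTTP. A sketch, assuming the service issues ordinary 3xx responses (the helper name is made up):

```ruby
require 'net/http'
require 'uri'

# Follow HTTP redirects until a non-redirect response, returning the final URL.
def resolve_redirects(url, limit = 5)
  raise 'too many redirects' if limit.zero?
  res = Net::HTTP.get_response(URI(url))
  if res.is_a?(Net::HTTPRedirection)
    resolve_redirects(res['location'], limit - 1)
  else
    url
  end
end

# wsdl = resolve_redirects(url)
# driver = SOAP::WSDLDriverFactory.new(wsdl).create_rpc_driver
```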
