Images don't download correctly in Ruby

I'm trying to make a simple program that downloads a file (image) from the Internet and stores it on my computer using Ruby. I get it to download something but the images look really weird. I run Windows 10 and Ruby 2.2.3. This is my code:
require "open-uri"
require "openssl"
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
File.open("test.jpg", "w+") do |f|
  open("https://upload.wikimedia.org/wikipedia/commons/c/c9/Moon.jpg","r") do |file|
    f.puts file.read
  end
end
These two lines:
require "openssl"
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
are to solve a problem where I get this error if I try to download a file via https:
C:/Ruby22-x64/lib/ruby/2.2.0/net/http.rb:923:in `connect': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)
Better solutions for this are very welcome!
For example, when I tried to download https://upload.wikimedia.org/wikipedia/commons/c/c9/Moon.jpg, the downloaded file looked like this: test.jpg
It only seems to happen with images; downloaded HTML files are identical to the originals. I know images can look different depending on the file type, but the URL ends with .jpg, and when you download it with, for example, Chrome, it is stored as a .jpg.
All suggestions are appreciated!
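One likely cause, offered as a sketch rather than a confirmed diagnosis: on Windows, a file opened in text mode ("w+") has every LF byte rewritten as CRLF, and puts may append a trailing newline. Either change corrupts JPEG data while leaving HTML readable. Opening both ends in binary mode and using write avoids this. The helper names save_binary and download_image are hypothetical, and URI.open assumes Ruby 2.5 or later (on 2.2.3, plain open from open-uri plays the same role).

```ruby
require "open-uri"

# Write raw bytes to disk in binary mode, so no newline translation happens.
def save_binary(path, bytes)
  File.open(path, "wb") { |f| f.write(bytes) }
end

# Hypothetical helper: fetch a URL and store the response body unchanged.
def download_image(url, path)
  URI.open(url, "rb") { |remote| save_binary(path, remote.read) }
end
```

With these in place, download_image("https://upload.wikimedia.org/wikipedia/commons/c/c9/Moon.jpg", "test.jpg") would replace the File.open block in the question.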

Related

How to handle ssl / force ssl_version when reading documents with Nokogiri?

I have some code that loads a web document using nokogiri:
require 'nokogiri'
require 'open-uri'
require 'openssl'
require 'net/https'
define_method(:loadWebDoc) { |url|
  web_doc = nil
  begin
    file = open(url)
    web_doc = Nokogiri::HTML(file)
  rescue OpenURI::HTTPError => ex
    raise ex
  end
  web_doc
}
#process some urls with threads...
It's always worked well, until I started using it in threads. My script calls loadWebDoc many times successfully, but after about 30 seconds of processing documents, I get an error like this:
/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/net/protocol.rb:44:in `connect_nonblock': SSL_connect SYSCALL returned=5 errno=0 state=SSLv3 read server session ticket A (OpenSSL::SSL::SSLError)
A similar question on Stack Overflow suggests using TLSv1, but it uses the stock HTTP library rather than Nokogiri.
I've tried several variations of something like:
file = open(url, :ssl_version => OpenSSL::SSL::SSLContext::TLSv1)
but this just gives me errors like
uninitialized constant OpenSSL::SSL::SSLContext::TLSv1 (NameError)
How can I force Nokogiri to do the same thing? It looks like I need to configure the ssl version and cipher(s) but I'm not sure how with Nokogiri and I'm likely using the wrong constant.
It looks like the 'connect_nonblock' error is raised because the server can't handle that many connections, especially when they come from multiple threads. Try adding a delay between attempts, or give each connection more time to open:
open(url, open_timeout: 100)
https://ruby-doc.org/stdlib-2.4.0/libdoc/socket/rdoc/Socket.html#method-i-connect_nonblock
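The "delay between attempts" idea could be sketched as a small retry wrapper around the fetch. This is an illustration under assumptions, not something from the thread: with_retries, its attempts and delay parameters, and the linear back-off are all hypothetical choices.

```ruby
require "openssl"

# Run the block, retrying on SSL errors with a growing pause between attempts.
# `attempts` and `delay` are illustrative defaults, not tuned values.
def with_retries(attempts: 3, delay: 1.0)
  tries = 0
  begin
    tries += 1
    yield
  rescue OpenSSL::SSL::SSLError
    raise if tries >= attempts   # give up after the last allowed attempt
    sleep(delay * tries)         # wait a little longer each time
    retry
  end
end
```

Each thread would then call something like with_retries { loadWebDoc(url) } instead of calling loadWebDoc directly.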

Connecting to cloudant NoSQL database from ruby

I have trouble connecting to my cloudant NoSQL database hosted on bluemix with couchrest_model library.
I have similar code written in ruby which works just fine from my computer (running locally, no rails or sinatra):
require 'couchrest'
url = "https://blah-blah#url with credentials.com"
database_name = "testdb"
db = CouchRest.database!(url+"/"+database_name)
db.save_doc('_id':"dog",:name => 'MonthyPython', :date => Date.today)
doc = db.get('dog')
The code above successfully writes data to my database. However, when I tried to do a similar thing with the newest 'couchrest_model' gem, I got this error:
/Users/userpruser/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/net/http.rb:933:in `connect_nonblock': SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: unknown protocol (OpenSSL::SSL::SSLError)
I have looked at several pages, but with no luck. So what is the correct way to make it work with just Ruby (no Rails) and/or Ruby + Sinatra? I found this recipe http://recipes.sinatrarb.com/p/models/couchdb but I have no idea how to set the environment variables or how to put it all together.
Thanks for any help!
Did you try explicitly setting the port to 443 and the protocol to 'https'? See https://github.com/couchrest/couchrest_model#configuration
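To set the protocol and port explicitly, it can help to split the full database URL into its parts first. The sketch below uses only Ruby's standard URI library. The helper name connection_parts and the hash keys are illustrative assumptions; check the README linked above for the exact keys the gem expects.

```ruby
require "uri"

# Split a credentialed database URL into separate connection settings.
# The key names are illustrative, not taken from couchrest_model itself.
def connection_parts(url)
  uri = URI.parse(url)
  {
    protocol: uri.scheme,
    host:     uri.host,
    port:     uri.port,      # URI fills in 443 for https URLs without a port
    username: uri.user,
    password: uri.password
  }
end
```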
It looks like installing
gem install sinatra-config-file
and then requiring
require 'sinatra/config_file'
solved my problem. Thanks to you all!

Read JSON file in git repo without checkout

I have the following method:
def getEndpointContent(url)
  return JSON.parse(open(url).read)
end
I want to use this to return the contents of a json file located in a git repo without checking out the repository.
However, if I pass in, for example, https://github.com/MyRep/myFile.json for the url parameter, I get the following error:
`connect': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)
Is what I'm trying to do possible, and if so, how?
You won't be able to access the file using that URL.
GitHub provides raw file access using a different domain, and you haven't included your user or organization name. Also remember that a Git repository isn't simply a directory; you'll also have to provide a branch name or commit hash or something similar to tell GitHub which version of the file you want to see.
Something like this should work:
https://raw.githubusercontent.com/MyUser/MyRepo/master/myFile.json
You can find the raw link for a file by browsing to it in the GitHub UI and clicking the "Raw" link in the file's header.
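As a small sketch of the URL shape shown above, the raw link can be assembled from its four parts. The helper name raw_github_url is hypothetical, and the values in the usage note are the placeholders from this answer, not a real repository.

```ruby
# Build a raw.githubusercontent.com URL from owner, repository,
# branch (or commit hash), and the file path inside the repository.
def raw_github_url(user, repo, ref, path)
  clean_path = path.sub(%r{\A/+}, "")   # tolerate a leading slash
  "https://raw.githubusercontent.com/#{user}/#{repo}/#{ref}/#{clean_path}"
end
```

For example, raw_github_url("MyUser", "MyRepo", "master", "myFile.json") produces the URL given above, which can then be passed to getEndpointContent.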
Ruby doesn't trust the GitHub SSL certificate, probably because it's too new for your version of Ruby and/or your OS.
Try the following from the command line (assuming Linux or OS X):
wget http://curl.haxx.se/ca/cacert.pem
Now in your Ruby code:
ENV['SSL_CERT_FILE'] = "/path/to/your/download/cacert.pem" # where you downloaded the file to
require 'open-uri' # ensure this comes after the line above
require 'json'
def getEndpointContent(url)
  return JSON.parse(open(url).read)
end
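If verification still fails, a quick check of which CA bundle OpenSSL will pick up can narrow things down. This is a diagnostic sketch, not part of the fix. The helper names are hypothetical, and it assumes the default lookup order of SSL_CERT_FILE first, then the path compiled into OpenSSL.

```ruby
require "openssl"

# Path of the CA bundle OpenSSL falls back to when no ca_file is set:
# the SSL_CERT_FILE environment variable if present, else the built-in default.
def ca_bundle_in_use(env = ENV)
  env["SSL_CERT_FILE"] || OpenSSL::X509::DEFAULT_CERT_FILE
end

# True when that bundle exists and can be read by the current process.
def ca_bundle_readable?(env = ENV)
  File.readable?(ca_bundle_in_use(env))
end
```

Printing ca_bundle_in_use and ca_bundle_readable? before the first request shows whether the downloaded cacert.pem is actually the file being consulted.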

How to download files through https in Ruby

I have a webapp where files were uploaded. You can login to the site with a valid account and then download those files. I am currently automating the whole framework using Ruby, Capybara and Selenium Webdriver, but I cannot automate the process of downloading files.
So far I have tried Selenium (which didn't work) and the Ruby library open-uri:
def downloadFile(path)
  open('testing.docx', 'wb') do |file|
    file << open(path).read
  end
  download = open(path)
  IO.copy_stream(download, File.expand_path("resources\\downloads"))
end
Where path is the href of the link to the file, but at first I got the following error:
openssl::ssl::sslerror: ssl_connect returned=1 errno=0 state=sslv3 read server certificate b: certificate verify failed
In order to avoid it I used the following code:
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
But in the end, I could not download the file.
At this point I think I should load a certificate, or maybe retrieve a login token from the cookies or elsewhere, but I could not figure out where exactly.
Is there a way to download files from a page which requires login?
If you use Selenium, you should download files through the browser by clicking the corresponding links and buttons.
Setting up browser downloads is described here:
https://watirwebdriver.com/browser-downloads/
Try changing the URI scheme from "https" to "http", like this:
path = path.sub("https","http")
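The question's own idea of reusing the login cookies could look roughly like the sketch below. It is untested against the asker's site and rests on assumptions: that the browser session's cookies are available as an array of hashes with :name and :value keys (the shape Selenium's manage.all_cookies is assumed to return), and that the server accepts them outside the browser. The helper names are hypothetical.

```ruby
require "net/http"
require "uri"

# Turn browser cookies (hashes with :name and :value) into a Cookie header value.
def cookie_header(cookies)
  cookies.map { |c| "#{c[:name]}=#{c[:value]}" }.join("; ")
end

# Build a GET request for the file that carries the browser session's cookies.
def authenticated_request(url, cookies)
  request = Net::HTTP::Get.new(URI.parse(url))
  request["Cookie"] = cookie_header(cookies)
  request
end
```

The request would then be sent with Net::HTTP over SSL and the response body written to disk in binary mode ('wb'), as the first open call in the question already does.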

Why does accessing a SSL site with Mechanize on Windows fail, but on Mac work?

This is the code I'm using to connect to the SSL site.
require 'mechanize'
a = Mechanize.new
page = a.get 'https://site.com'
I'm using Ruby 1.9.3 and Mechanize 2.1pre1 + dependencies. On Mac the above code works and returns the page. On Windows 7 running the same versions it gives me the following error:
OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=SSLv3
read server certificate B: certificate verify failed
Reverting to Mechanize 2.0.1 seems to solve this problem, but I then get plagued with the too many connections reset by peer problem. Thus that is not a solution.
I've tried doing a.verify_mode = false, but that does not do anything. I have read that you can turn off SSL verification by using:
open(uri,:ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE)
How can I turn it off in Mechanize? Why am I only getting this error on Windows?
Your version of OpenSSL (the library Net::HTTP uses to establish secure connections) cannot find the certificate chain on your computer.
Unfortunately, OpenSSL has never been able to use the certificate store installed with Windows to validate remote servers, so verification fails.
From your example, you can do:
a.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
That skips verification, but it is far from ideal because of the obvious security issues.
I recommend downloading a certificate bundle (like the ones from curl):
http://curl.haxx.se/ca
And modify your code to something like this:
require "rbconfig"
require "mechanize"
a = Mechanize.new
# conditionally set certificate under Windows
# http://blog.emptyway.com/2009/11/03/proper-way-to-detect-windows-platform-in-ruby/
if RbConfig::CONFIG["host_os"] =~ /mingw|mswin/
  # http://curl.haxx.se/ca
  ca_path = File.expand_path "~/Tools/bin/curl-ca-bundle.crt"
  a.agent.http.ca_file = ca_path
end
page = a.get "https://github.com/"
That seems to work with Ruby 1.9.3-p0 (i386-mingw32), Windows 7 x64, and Mechanize 2.1.pre.1.
Hope that helps.
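The same idea, keeping verification on and pointing at a downloaded bundle, can be sketched with the standard library's Net::HTTP for code that does not go through Mechanize. The helper names windows? and verified_https_client are hypothetical, and the bundle path in the usage note is only the example location used above.

```ruby
require "net/http"
require "openssl"
require "rbconfig"

# Detect Windows the same way as the Mechanize snippet above.
def windows?(host_os = RbConfig::CONFIG["host_os"])
  !!(host_os =~ /mingw|mswin/)
end

# Build an HTTPS client that verifies the server certificate,
# optionally against an explicit CA bundle file.
def verified_https_client(host, ca_file: nil)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.ca_file = ca_file if ca_file
  http
end
```

On Windows one might call verified_https_client("github.com", ca_file: File.expand_path("~/Tools/bin/curl-ca-bundle.crt")) and leave ca_file unset elsewhere.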
Luis' answer looks fine but more generally:
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
You can simply do the following:
agent = Mechanize.new
agent.verify_mode = OpenSSL::SSL::VERIFY_NONE
This worked with the latest version, 2.8.
