Verifying a remote image is actually an image file in Ruby?

I'm trying to figure out how I can verify that what I'm feeding into CarrierWave is actually an image. The source I'm getting my image URLs from isn't giving me back all live URLs; some of the images no longer exist. Unfortunately, the server doesn't return the right status codes either: I was using some code to check whether the remote file exists, and the dead URLs were passing that check. So, just to be on the safe side, I'd like a way to verify that I'm getting back a valid image file before I go ahead and download it.
Here is the remote-file-checking code I was using, just for reference, but I'd prefer something that can actually identify that the files are images.
require 'open-uri'
require 'net/http'
def remote_file_exists?(url)
  url = URI.parse(url)
  Net::HTTP.start(url.host, url.port) do |http|
    return http.head(url.request_uri).code == "200"
  end
end

I would check whether the service returns the proper MIME type in the Content-Type HTTP header.
For example, the Content-Type of the Stack Overflow homepage is text/html; charset=utf-8, and the Content-Type of your Gravatar image is image/png.
To check the Content-Type header for an image in Ruby using Net::HTTP, you would use the following:
def remote_file_exists?(url)
  url = URI.parse(url)
  Net::HTTP.start(url.host, url.port) do |http|
    return http.head(url.request_uri)['Content-Type'].start_with?('image')
  end
end
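Note that this trusts whatever Content-Type the server reports. If that header can't be relied on, a stronger check is to sniff the file's magic bytes; here is a minimal sketch assuming the fastimage gem, which downloads only as many bytes as it needs to identify the format:
require 'fastimage'

def remote_image?(url)
  # FastImage.type inspects the first bytes of the response and returns
  # a symbol such as :png or :jpeg, or nil if the file is not an image.
  !FastImage.type(url).nil?
end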

Rick Button's answer worked for me, but I needed to add SSL support:
def self.remote_image_exists?(url)
  url = URI.parse(url)
  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = (url.scheme == "https")
  http.start do |http|
    return http.head(url.request_uri)['Content-Type'].start_with?('image')
  end
end

I ended up using HTTParty for this. The Net::HTTP approach from Rick Button's answer kept timing out.
require 'httparty'

def remote_file_exists?(url)
  response = HTTParty.get(url)
  response.code == 200 && response.headers['Content-Type'].start_with?('image')
end
https://github.com/jnunemaker/httparty
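One caveat: HTTParty.get pulls down the entire body just to look at the headers. If bandwidth is a concern, a HEAD request should give the same answer, assuming the server handles HEAD correctly; a sketch:
require 'httparty'

def remote_image_exists?(url)
  # HEAD returns headers only, so the image body is never downloaded.
  response = HTTParty.head(url)
  response.code == 200 && response.headers['Content-Type'].to_s.start_with?('image')
end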

Related

Multipart POST Ruby HTTPS

I am trying to do a multipart POST with parameters in Ruby, securely, over HTTPS. All the examples I have seen use only HTTP and pass no parameters besides the file, and I can't seem to modify them to work with HTTPS and additional parameters (or find documentation showing a good example). How can I do a multipart POST over HTTPS in Ruby with parameters? I have tried modifying the code from Nick Sieger, as shown below, but to no avail. Where do I add the parameters that I need to pass in JSON format in addition to the file?
# push file to rest service
url = URI.parse('https://some.url.test/rs/test')
File.open(tm.created_file_name) do |txt|
  req = Net::HTTP::Post::Multipart.new url.path,
    'file' => UploadIO.new(txt, 'text/plain', tm.created_file_name)
  n = Net::HTTP.new(url.host, url.port)
  n.use_ssl = true
  p req.body_stream
  res = n.start do |http|
    response = http.request(req)
    p response.body
  end
end
I figured out how to do a multipart form POST over HTTPS with parameters. Here is the code:
require 'rest-client'
require 'json'

url = 'https://some.url/rs/FileUploadForm'
res = RestClient.post url, { :multipart => true,
                             :tieBreakerOptions => 1,
                             :myFileName => 'file.txt',
                             :myFile => File.new('data/file.txt', 'r') }
response = JSON.parse(res)
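For completeness, the original Net::HTTP::Post::Multipart approach can also carry extra form fields: any additional key/value pairs in the params hash of the multipart-post gem are sent alongside the file. A sketch reusing the field names from above, assumed for illustration:
require 'net/http/post/multipart'

url = URI.parse('https://some.url.test/rs/test')
File.open('data/file.txt') do |txt|
  # Extra entries in the hash become ordinary multipart form fields.
  req = Net::HTTP::Post::Multipart.new url.path,
    'file' => UploadIO.new(txt, 'text/plain', 'file.txt'),
    'tieBreakerOptions' => '1',
    'myFileName' => 'file.txt'
  n = Net::HTTP.new(url.host, url.port)
  n.use_ssl = true
  res = n.start { |http| http.request(req) }
  puts res.body
end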

How to do basic authentication over HTTPS in Ruby?

After a lot of searching, I've found some solutions that seem to work, but not for me...
For example, I have this script:
require 'net/http'
require "net/https"
@http = Net::HTTP.new('www.xxxxxxx.net', 443)
@http.use_ssl = true
@http.verify_mode = OpenSSL::SSL::VERIFY_NONE
@http.start() { |http|
  req = Net::HTTP::Get.new('/gb/PastSetupsXLS.asp?SR=31,6')
  req.basic_auth 'my_user', 'my_password'
  response = http.request(req)
  print response.body
}
When I run it, it gives me a page that requests authentication, but if I type the following URL into the browser, I get into the website without problems:
https://my_user:my_password#www.xxxxxxx.net/gb/PastSetupsXLS.asp?SR=31,6
I have also tried with open-uri:
require 'open-uri'

module OpenSSL
  module SSL
    remove_const :VERIFY_PEER
  end
end
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE

def download(full_url, to_here)
  writeOut = open(to_here, "wb")
  writeOut.write(open(full_url, :http_basic_authentication => ["my_user", "my_password"]).read)
  writeOut.close
end

download('https://www.xxxxxxx.net/gb/PastSetupsXLS.asp?SR=31,6', "target_file.html")
But the result is the same: the site is asking for user authentication.
Any tips on what I'm doing wrong? Do I need to encode the password in Base64?
I wrote a piece of code based on examples given in the Net::HTTP docs and tested it on my local WAMP server - it works fine. Here's what I have:
require 'net/http'
require 'openssl'
uri = URI('https://localhost/')
Net::HTTP.start(uri.host, uri.port,
                :use_ssl => uri.scheme == 'https',
                :verify_mode => OpenSSL::SSL::VERIFY_NONE) do |http|
  request = Net::HTTP::Get.new uri.request_uri
  request.basic_auth 'matt', 'secret'
  response = http.request request # Net::HTTPResponse object
  puts response
  puts response.body
end
And my .htaccess file looks like this:
AuthName "Authorization required"
AuthUserFile c:/wamp/www/ssl/.htpasswd
AuthType basic
Require valid-user
My .htpasswd is just a one-liner, generated with htpasswd -c .htpasswd matt for the password "secret". When I run my code I get "200 OK" and the contents of index.html. If I remove the request.basic_auth line, I get a 401 error.
UPDATE:
As indicated by #stereoscott in the comments, the :verify_mode value I used in the example (OpenSSL::SSL::VERIFY_NONE) is not safe for production.
All available options are listed in the OpenSSL::SSL::SSLContext docs: VERIFY_NONE, VERIFY_PEER, VERIFY_CLIENT_ONCE, and VERIFY_FAIL_IF_NO_PEER_CERT, of which (according to the OpenSSL docs) only the first two are used in client mode.
So VERIFY_PEER should be used in production. It is the default, by the way, so you can skip it entirely.
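In other words, the snippet above can simply drop the :verify_mode option for production use; a minimal sketch:
require 'net/http'

uri = URI('https://localhost/')
# With no :verify_mode given, Net::HTTP defaults to OpenSSL::SSL::VERIFY_PEER.
Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
  request = Net::HTTP::Get.new uri.request_uri
  request.basic_auth 'matt', 'secret'
  puts http.request(request).body
end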
The following is what ended up working for me:
require "uri"
require "net/http"
url = URI("https://localhost/")
https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true
request = Net::HTTP::Get.new(url)
request["Authorization"] = "Basic " + Base64::encode64("my_user:my_password")
response = https.request(request)
puts response.read_body
I came up with this by building a new HTTP Request in Postman, specifying the URL, choosing an Authorization Type of "Basic Auth," and inputting the credentials.
Clicking the Code icon (</>) and selecting "Ruby - Net::HTTP" will then generate a code snippet, giving you the output above.
Postman took care of encoding the credentials, but this answer helped me to set these values dynamically. You can also likely omit the "cookie" key from the request.
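If you'd rather not build the header yourself, Net::HTTP can handle the encoding for you via basic_auth, as in the earlier answer; a sketch with the same placeholder credentials:
require "net/http"

url = URI("https://localhost/")
https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true

request = Net::HTTP::Get.new(url)
# basic_auth builds the Base64-encoded Authorization header internally.
request.basic_auth "my_user", "my_password"

puts https.request(request).read_body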

How do I use Ruby to get JSON back from the Instagram API?

I am doing my best to get JSON back from the Instagram API. Here is the code I am trying in my rake task within Rails.
require 'net/http'
url = "https://api.instagram.com/v1/tags/snow/media/recent?access_token=522219.f59def8.95be7b2656ec42c08bff8a159a43d06f"
resp = Net::HTTP.get_response(URI.parse(url))
puts resp.body
All I end up with in the terminal is "rake aborted!
end of file reached"
If you look at the Instagram docs (http://instagram.com/developer/endpoints/tags/) and paste the following URL into your browser, you will get JSON back, so I'm sure I am doing something wrong.
https://api.instagram.com/v1/tags/snow/media/recent?access_token=522219.f59def8.95be7b2656ec42c08bff8a159a43d06f
It has to do with the HTTPS URL; you need to modify your code to use SSL:
require "net/https"
require "uri"
uri = URI.parse("https://api.instagram.com/v1/tags/snow/media/recent?access_token=522219.f59def8.95be7b2656ec42c08bff8a159a43d06f")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
puts response.body
Alternatively, you could use something like https://github.com/jnunemaker/httparty to consume third-party services.
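For example, a minimal HTTParty sketch for the same endpoint; the gem handles SSL and decodes the JSON body for you:
require 'httparty'

url = "https://api.instagram.com/v1/tags/snow/media/recent?access_token=522219.f59def8.95be7b2656ec42c08bff8a159a43d06f"
response = HTTParty.get(url)
# parsed_response is the JSON body already decoded into a Ruby Hash.
puts response.parsed_response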
Looks like you'd need to configure net/http to use SSL because you're using https.
Alternatively, with Rails, this will parse the JSON on the fly too:
ActiveSupport::JSON.decode(open(URI.encode(url)).read)
It returns a hash to play with.

User-Agent in HTTP requests, Ruby

I'm pretty new to Ruby. I've tried looking over the online documentation, but I haven't found anything that quite works. I'd like to include a User-Agent in the following HTTP requests, both get_response() and get(). Can someone point me in the right direction?
# Preliminary check that Proggit is up
check = Net::HTTP.get_response(URI.parse(proggit_url))
if check.code != "200"
  puts "Error contacting Proggit"
  return
end

# Attempt to get the json
response = Net::HTTP.get(URI.parse(proggit_url))
if response.nil?
  puts "Bad response when fetching Proggit json"
  return
end
Amir F is correct that you may enjoy using another HTTP client like RestClient or Faraday, but if you want to stick with the standard Ruby library, you can set your User-Agent like this:
url = URI.parse(proggit_url)
req = Net::HTTP::Get.new(url.request_uri)
req.add_field('User-Agent', 'My User Agent Dawg')
res = Net::HTTP.start(url.host, url.port) { |http| http.request(req) }
res.body
Net::HTTP is very low-level; I would recommend using the rest-client gem. It also follows redirects automatically and will be easier for you to work with, e.g.:
require 'rest_client'

response = RestClient.get proggit_url
if response.code != 200
  # do something
end
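If the goal is still to send a custom User-Agent, rest-client accepts a headers hash as the second argument; a quick sketch:
require 'rest_client'

# Any hash passed after the URL is sent as request headers.
response = RestClient.get proggit_url, 'User-Agent' => 'My User Agent Dawg'
puts response.code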

Use Ruby to get the content length of URLs

I am trying to write a Ruby script that gets some details about files on a website using net/http. My code looks like this:
require 'open-uri'
require 'net/http'
url = URI.parse asset
res = Net::HTTP.start(url.host, url.port) { |http|
  http.get(asset)
}
headers = res.to_hash
p headers
I would like to get two pieces of information from this request: the total length of the content inflated, and (as appropriate) the length of the content deflated.
Sometimes, the headers will include a content-length parameter, which appears to be the gzipped length of the content. I can also approximate the inflated size of the content using res.body.length, but this has not been foolproof by any stretch of the imagination. The documentation on net/http says that gzip headers are removed from the list automatically (to help me, gee thanks) so I cannot seem to get a reliable handle on this information.
Any help is appreciated (including other gems if they will do this more easily).
Got it! The "magic" behavior here only occurs if you don't specify your own Accept-Encoding header. Amended code follows:
require 'open-uri'
require 'net/http'
require 'date'
require 'zlib'
require 'stringio'

headers = { "accept-encoding" => "gzip;q=1.0,deflate;q=0.6,identity;q=0.3" }
url = URI.parse asset
res = Net::HTTP.start(url.host, url.port) { |http|
  http.get(asset, headers)
}
headers = res.to_hash

gzipped = headers['content-encoding'] && headers['content-encoding'][0] == "gzip"
content = gzipped ? Zlib::GzipReader.new(StringIO.new(res.body)).read : res.body

full_length = content.length
compressed_length = (headers["content-length"] && headers["content-length"][0] || res.body.length)
You can try using sockets to send a HEAD request to the server, which is faster (no content is transferred), and not send "Accept-Encoding: gzip", so your response will not be gzipped.
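You don't need raw sockets for that; here is a minimal sketch of the same idea with Net::HTTP, assuming asset holds the URL as above:
require 'net/http'

url = URI.parse(asset)
res = Net::HTTP.start(url.host, url.port) { |http|
  # HEAD fetches headers only, so no body is transferred; since no
  # Accept-Encoding is sent, a Content-Length header (when present)
  # should describe the unencoded file.
  http.head(url.request_uri)
}
p res['content-length']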
