HTTPS in Ruby (connecting to https://docs.google.com/)

I am new to the world of Ruby, so sorry if I say something stupid.
I am trying to automate downloading some videos shared by the members of a website via https://docs.google.com/, and I am using Ruby to do it.
The link for each video has the following format:
https://docs.google.com/uc?export=download&confirm=no_antivirus&id=XXXXXXXXXXXXXXXXXXXXXXXX
I noticed that the "Download Anyway" button redirects to the following link:
https://docs.google.com/uc?export=download&confirm=**FspT**&id=XXXXXXXXXXXXXXXXXXXXXXXX
So the value of the confirm GET parameter changes, and once we click on that button we have our download link.
I tried to do it with the following code:
require 'net/http'
require 'uri'
require 'openssl'

# GoogleDocs
def get_down_link_googledocs(url)
  # Get the download page over HTTPS
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  body = http.get(uri.request_uri).body
  # Extract the confirm value (scan returns an array, never nil,
  # so test for emptiness instead of nil)
  confirm = body.scan(/confirm=[a-zA-Z0-9_-]*/)
  # If the link isn't correct
  return nil if confirm.empty?
  # Build the redirection link
  confirm = confirm[0]
  url = url.gsub("https://docs.google.com", "")
  if url.include? "confirm=no_antivirus"
    request_uri = url.gsub("confirm=no_antivirus", confirm)
  else
    request_uri = url + "&" + confirm
  end
  request = Net::HTTP::Get.new(request_uri)
  data = http.request(request)
  puts data.body
end
But it prints the same webpage again, just with another confirm value.
What am I missing here?

Related

Cannot make HTTP Delete request with Ruby's net/http library

I've been trying to make an API call to my server to delete a user record held on a dev database. When I use Fiddler to call the URL with the DELETE operation, I can delete the user record immediately. When I call that same URL, again with the DELETE operation, from my script below, I get this error:
{"Message":"The requested resource does not support http method 'DELETE'."}
I have changed the URL in my script below. The URL I am using is definitely correct. I suspect there is a logical error in my code that I haven't caught. My script:
require 'net/http'
require 'json'
require 'pp'
require 'uri'

def deleteUserRole
  # prepare request
  url = "http://my.database.5002143.access" # dev
  uri = URI.parse(url)
  request = Net::HTTP::Delete.new(uri.path)
  http = Net::HTTP.new(uri.host, uri.port)
  # send the request
  response = http.request(request)
  puts "response: \n"
  puts response.body
  puts "response code: " + response.code + "\n \n"
  # parse response
  result = JSON.parse(response.body)
  status = result["Success"]
  if status == true
    puts "passed"
  else
    puts "failed"
  end
end

deleteUserRole
It turns out that I was typing in the wrong command. I needed to change this line:
request = Net::HTTP::Delete.new(uri.path)
to this line:
request = Net::HTTP::Delete.new(uri)
By typing uri.path I was excluding part of the URL from the API call. When I was debugging, I would type puts uri and that would show me the full URL, so I was certain the URL was right. The URL was right, but I was not including the full URL in my DELETE call.
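To see concretely what uri.path drops, here is a quick sketch (the example URL is illustrative, not from the question):

require 'uri'

uri = URI.parse('http://example.com/api/users?id=42')
uri.path        # => "/api/users"          (scheme, host and query are gone)
uri.request_uri # => "/api/users?id=42"    (path plus query string)
uri.to_s        # => "http://example.com/api/users?id=42"

Passing the whole uri object to Net::HTTP::Delete.new lets net/http derive the request target itself, which is why the one-line change above fixes the call.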
If you omit the query parameters when making the DELETE request, it won't work.
You can do it like this:
uri = URI.parse('http://localhost/test')
http = Net::HTTP.new(uri.host, uri.port)
# build the query string from the body hash
attribute_url = '?'
attribute_url << body.map { |k, v| "#{k}=#{v}" }.join('&')
request = Net::HTTP::Delete.new(uri.request_uri + attribute_url)
response = http.request(request)
where body is a hash in which you define the query params; while sending the request, the code above joins them into the URL.
Example: body = { :resname => 'res', :bucket_name => 'bucket', :uploaded_by => 'upload' }
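Note that the snippet above does not escape keys or values. The standard library's URI.encode_www_form does the same join with proper escaping; a minimal sketch with the same example hash:

require 'net/http'
require 'uri'

body = { :resname => 'res', :bucket_name => 'bucket', :uploaded_by => 'upload' }
uri = URI.parse('http://localhost/test')
uri.query = URI.encode_www_form(body) # "resname=res&bucket_name=bucket&uploaded_by=upload", escaped
request = Net::HTTP::Delete.new(uri.request_uri)
response = Net::HTTP.new(uri.host, uri.port).request(request)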

Issue while fetching data from nested json

I am trying to fetch data from a nested JSON document and can't work out the issue here. Please ignore the fields that I am passing to the ChildArticle class; I can sort that out.
URL for JSON - http://api.nytimes.com/svc/mostpopular/v2/mostshared/all-sections/email/30.json?api-key=31fa4521f6572a0c05ad6822ae109b72:2:72729901
Below is my code:
require 'net/http'
require 'json'
require 'date'

url = 'http://api.nytimes.com'
# Define the HTTP object
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
# If the API being scraped uses https, then set use_ssl to true.
http.use_ssl = false
# Define the request url and make a GET request to it
request = '/svc/mostpopular/v2/mostshared/all-sections/email/30.json?api-key=31fa4521f6572a0c05ad6822ae109b72:2:72729901'
response = http.send_request('GET', request)
# Parse the response body
forecast = JSON.parse(response.body)
forecast["results"]["result"].each do |item|
  date = Date.parse(item["published_date"].to_s)
  if (@start <= date) && (@end >= date)
    article = News::ChildArticle.new(author: item["author"], title: item["title"], summary: item["abstract"],
                                     images: item["images"], source: item["url"], date: item["published_date"],
                                     guid: item["guid"], link: item["link"], section: item["section"],
                                     item_type: item["item_type"], updated_date: item["updated_date"],
                                     created_date: item["created_date"],
                                     material_type_facet: item["material_type_facet"])
    @articles.concat([article])
  end
end
I get the error below:
`[]': no implicit conversion of String into Integer (TypeError)` at `forecast["results"]["result"].each do |item|`
Looks like forecast['results'] is simply an array, not a hash.
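That is exactly what the TypeError says: the code indexes an Array with a String. You can reproduce it in isolation:

results = [{ "title" => "..." }] # an array, like forecast["results"]
results["result"]                # TypeError: no implicit conversion of String into Integer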
Take a look at this slightly modified script. Give it a run in your terminal, and check out its output.
require 'net/http'
require 'json'

url = 'http://api.nytimes.com'
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = false
request = '/svc/mostpopular/v2/mostshared/all-sections/email/30.json?api-key=31fa4521f6572a0c05ad6822ae109b72:2:72729901'
response = http.send_request('GET', request)
forecast = JSON.parse(response.body)
forecast["results"].each.with_index do |item, i|
  puts "Item #{i}:"
  puts '--'
  item.each do |k, v|
    puts "#{k}: #{v}"
  end
  puts '----'
end
Also, you may want to inspect the JSON structure that the API returns from that URL. If you go to that URL, open your JavaScript console, and paste in
JSON.parse(document.body.textContent)
you can inspect the JSON structure very easily.
Another option would be downloading the response to a JSON file, and inspecting it in your editor. You'll need a JSON prettifier though.
File.open('response.json', 'w') do |f|
  f.write(response.body)
end
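If you want the file to be readable without an external prettifier, the standard library can format it on the way out; a minimal sketch:

require 'json'

File.open('response.json', 'w') do |f|
  f.write(JSON.pretty_generate(JSON.parse(response.body)))
end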

InvalidURIError making request to Facebook Graph API with Ruby

I'm simply trying to get a response from the API that includes certain fields that I'm specifying in my uri string but I keep receiving an InvalidURIError. I've come here as a last resort, having spent hours trying to debug this.
I've already tried using the URI.encode() method on it as well, but only get the same error.
Here's my code:
url = params[:url]
uri = URI('https://graph.facebook.com/v2.3/?id=' + url + '&fields=share,og_object{id,url,engagement}&access_token=' + CONFIG['fb_access_token'])
req = Net::HTTP::Post.new(uri.path)
req.set_form_data('fields' => 'og_object[engagement]', 'access_token' => CONFIG['fb_access_token'])
res = Net::HTTP.new(uri.host, uri.port)
res.verify_mode = OpenSSL::SSL::VERIFY_NONE
res.use_ssl = true
response = nil
res.start do |http|
  response = http.request(req)
end
output = ""
output << "#{response.body} <br />"
return output
And the error I'm receiving:
URI::InvalidURIError - bad URI(is not URI?): https://graph.facebook.com/v2.3/?id=http://www.wikipedia.org&fields=share,og_object{id,url,engagement}&access_token=960606020650536|eJC0PoCARFaqKZWZHdwN5ogkhfs
I'm just exhausted at this point so if I left out any important information just let me know and I'll respond with it as soon as I can. Thank you!
The problem is that you're dumping strings straight into your URI without escaping them first (the { and } in the fields parameter, for instance, are not valid URI characters).
Since you're using Sinatra you can use Rack::Utils.build_query to construct your URI's query component with the values correctly escaped:
uri = URI('https://graph.facebook.com/v2.3/')
uri.query = Rack::Utils.build_query(
  id: url,
  fields: 'share,og_object{id,url,engagement}',
  access_token: CONFIG['fb_access_token']
)
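With the query assembled, the request itself can stay simple. A sketch of the remaining steps, assuming the same uri as above and reading (rather than posting) from the Graph API:

res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  http.request(Net::HTTP::Get.new(uri.request_uri))
end
puts res.body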

Net::HTTP get a PDF file and save with paperclip

I want to download a PDF file from a web server. I use Ruby's Net::HTTP class.
def open_file(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  request = Net::HTTP::Get.new(uri.path)
  request.basic_auth(self.class.user, self.class.password)
  http.request(request)
end
It works; I retrieve my PDF file as a string like: %PDF-1.3\n%\ ...
I have a method that returns the result:
def file(times = 0)
  result = open_file(self.file_url)
  # response codes are strings, and the retry counter must be passed
  # along or it resets to 0 on every recursive call
  if result.code == '404' && times <= 5
    sleep(1)
    file(times + 1)
  else
    result.body
  end
end
(It's a recursive method, because it's possible the file doesn't exist on the server yet.)
But when I try to save this file with Paperclip, I get an error: Paperclip::AdapterRegistry::NoHandlerError (No handler found for "%PDF-1.3\n% ...
I tried manipulating the file with StringIO... without success :(.
Anyone have an idea?
Assuming the PDF object you're getting is okay (I'm not 100% sure it is), then you could do this:
file = StringIO.new(attachment) # mimic a real uploaded file
file.class.class_eval { attr_accessor :original_filename, :content_type } # add attrs that Paperclip needs
file.original_filename = "your_report.pdf"
file.content_type = "application/pdf"
then save the file with Paperclip.
(from "Save a Prawn PDF as a Paperclip attachment?")

How to implement cookie support in ruby net/http?

I'd like to add cookie support to a Ruby class that uses net/http to browse the web. The cookies have to be stored in a file so they survive after the script has ended. Of course I can read the specs and write some kind of handler, use some cookie.txt format, and so on, but that seems to mean reinventing the wheel. Is there a better way to accomplish this task? Maybe some kind of cookie jar class to take care of the cookies?
The accepted answer will not work if your server returns and expects multiple cookies. This could happen, for example, if the server returns a set of FedAuth[n] cookies. If this affects you, you might want to look into using something along the lines of the following instead:
http = Net::HTTP.new('example.com', 443) # host only; the scheme doesn't belong here
http.use_ssl = true
path1 = '/index.html'
path2 = '/index2.html'

# make a request to get the server's cookies
response = http.get(path1)
if response.code == '200'
  all_cookies = response.get_fields('set-cookie')
  cookies_array = Array.new
  all_cookies.each do |cookie|
    cookies_array.push(cookie.split('; ')[0])
  end
  cookies = cookies_array.join('; ')
  # now make a request using the cookies
  response = http.get(path2, { 'Cookie' => cookies })
end
Taken from DZone Snippets
http = Net::HTTP.new('profil.wp.pl', 443)
http.use_ssl = true
path = '/login.html'

# GET request -> so the host can set its cookies
# (the two-value "resp, data" form is from old net/http versions;
# on modern Ruby use resp = http.get(path) and read resp.body)
resp, data = http.get(path, nil)
cookie = resp.response['set-cookie'].split('; ')[0]

# POST request -> logging in
data = 'serwis=wp.pl&url=profil.html&tryLogin=1&countTest=1&logowaniessl=1&login_username=blah&login_password=blah'
headers = {
  'Cookie' => cookie,
  'Referer' => 'http://profil.wp.pl/login.html',
  'Content-Type' => 'application/x-www-form-urlencoded'
}
resp, data = http.post(path, data, headers)

# Output on the screen -> we should get either a 302 redirect (after a successful login) or an error page
puts 'Code = ' + resp.code
puts 'Message = ' + resp.message
resp.each { |key, val| puts key + ' = ' + val }
puts data
update
# To save the cookies, you can use PStore
require 'pstore'
cookies = PStore.new("cookies.pstore")

# Save the cookie
cookies.transaction do
  cookies[:some_identifier] = cookie
end

# Retrieve the cookie back
cookies.transaction do
  cookie = cookies[:some_identifier]
end
The accepted answer does not work. You need to access the internal representation of the response header, where the multiple set-cookie values are stored separately, then remove everything after the first semicolon from each of those strings and join them together. Here is code that works:
r = http.get(path)
cookie = { 'Cookie' => r.to_hash['set-cookie'].collect { |ea| ea[/^.*?;/] }.join }
r = http.get(next_path, cookie)
Use http-cookie, which implements RFC-compliant parsing and rendering, plus a jar.
A crude example that happens to follow a redirect post-login:
require 'uri'
require 'net/http'
require 'http/cookie' # gem 'http-cookie'

uri = URI('...')
jar = HTTP::CookieJar.new

Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  req = Net::HTTP::Post.new uri
  req.form_data = { ... }
  res = http.request req

  res.get_fields('Set-Cookie').each do |value|
    jar.parse(value, req.uri)
  end

  fail unless res.code == '302'

  req = Net::HTTP::Get.new(uri + res['Location'])
  req['Cookie'] = HTTP::Cookie.cookie_value(jar.cookies(uri))
  res = http.request req
end
Why do this? Because the answers above are insufficient and flat out don't work in many RFC-compliant scenarios (it happened to me), so relying on a library that implements exactly what's needed is far more robust if you want to handle more than one particular case.
I've used Curb and Mechanize for a similar project.
Just enable cookie support and save the cookies to a temporary cookie jar...
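With Mechanize, for example, persisting the jar between runs takes a couple of lines. A sketch, where cookies.yml is an illustrative file name and save/load are the jar methods from newer Mechanize versions (backed by http-cookie):

require 'mechanize'

agent = Mechanize.new
agent.cookie_jar.load('cookies.yml') if File.exist?('cookies.yml') # restore a previous session
agent.get('https://example.com/login')
agent.cookie_jar.save('cookies.yml') # persist for the next run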
If you're using net/http or packages without cookie support built in, you will need to write your own cookie handling.
You can send and receive cookies using headers.
You can store the header in any persistence framework, whether that is some sort of database or files.
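A minimal sketch of the file-based variant, reusing the header-splitting pattern from the answers above (cookies.txt is an illustrative file name):

# after a response: keep one set-cookie value per line
File.write('cookies.txt', response.get_fields('set-cookie').join("\n"))

# before the next request: rebuild the Cookie header
cookie = File.readlines('cookies.txt', chomp: true).map { |line| line.split('; ').first }.join('; ')
request['Cookie'] = cookie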
