Net::HTTP get a PDF file and save with paperclip - ruby

I want to download a PDF file from a web server, using Ruby's Net::HTTP class.
def open_file(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  request = Net::HTTP::Get.new(uri.path)
  request.basic_auth(self.class.user, self.class.password)
  http.request(request)
end
It works: I retrieve my PDF file as a string like %PDF-1.3\n%\ ...
I have a method that returns the result:
def file(times = 0)
  result = open_file(self.file_url)
  # Net::HTTP returns the status code as a String, so compare against "404";
  # the attempt count is passed along so the retry limit actually applies.
  if result.code == "404" && times <= 5
    sleep(1)
    file(times + 1)
  else
    result.body
  end
end
(It's a recursive method because the file may not exist on the server yet.)
But when I try to save this file with Paperclip, I get an error: Paperclip::AdapterRegistry::NoHandlerError (No handler found for "%PDF-1.3\n% ...
I tried manipulating the file with StringIO, without success :(.
Does anyone have an idea?

Assuming the PDF object you're getting is okay (I'm not 100% sure it is), then you could do this:
file = StringIO.new(attachment) # mimic a real upload file
file.class.class_eval { attr_accessor :original_filename, :content_type } # add attrs that paperclip needs
file.original_filename = "your_report.pdf"
file.content_type = "application/pdf"
then save the file with Paperclip.
(from "Save a Prawn PDF as a Paperclip attachment?")
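One caveat with that snippet (an editorial note, not part of the quoted answer): `file.class.class_eval` reopens StringIO itself, so every StringIO in the process gains those accessors. A subclass keeps the patch contained; the filename and the model call below are illustrative, not from the original:

```ruby
require 'stringio'

# StringIO subclass carrying the two attributes Paperclip inspects;
# avoids monkey-patching StringIO globally via class_eval.
class UploadedStringIO < StringIO
  attr_accessor :original_filename, :content_type
end

pdf_body = "%PDF-1.3\n..." # e.g. the body returned by open_file
file = UploadedStringIO.new(pdf_body)
file.original_filename = "your_report.pdf"
file.content_type = "application/pdf"

# document.update(attachment: file) # hypothetical Paperclip-backed model
```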

Related

Upload file in chunks with progress bar

I want to upload a file in chunks while updating a progress bar after each chunk, in Ruby, preferably without any gems or plugins.
I have this POST:
uri = URI.parse("http://some/url")
http = Net::HTTP.new(uri.host, uri.port)
req = Net::HTTP::Post.new(uri.path)
req['some'] = 'header'
req.body_stream = File.open('some.file', 'rb')
req.content_length = File.size('some.file')
res = http.request req
It uploads the file in one single piece at this line:
res = http.request req
I want to update a progress bar on the side.
The reverse, downloading with a progress bar in pure Ruby, is easy, and you can find references like this:
uri = URI('http://example.com/large_file')
Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri
  http.request request do |response|
    open 'large_file', 'wb' do |io|
      response.read_body do |chunk|
        io.write chunk
      end
    end
  end
end
Is there a way to do something similar as above, but for uploads in Ruby?
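There is no read_body equivalent for uploads, but Net::HTTP pulls body_stream in chunks, so one approach is to wrap the stream in an object that counts bytes as they are read. This is a sketch under that assumption; the ProgressIO class and the endpoint URL are mine, not from the question:

```ruby
require 'net/http'
require 'uri'
require 'stringio'

# Wraps an IO; every time Net::HTTP (via IO.copy_stream) reads a
# chunk from the body stream, the progress callback fires.
class ProgressIO
  def initialize(io, total, &on_progress)
    @io = io
    @total = total
    @sent = 0
    @on_progress = on_progress
  end

  def read(length = nil, outbuf = nil)
    chunk = outbuf ? @io.read(length, outbuf) : @io.read(length)
    if chunk
      @sent += chunk.bytesize
      @on_progress.call(@sent, @total)
    end
    chunk
  end

  def eof?
    @io.eof?
  end
end

# Demo with an in-memory "file"; a real upload would wrap
# File.open('some.file', 'rb') instead of the StringIO.
body = StringIO.new('x' * 10_000)
uri = URI.parse('http://some/url')
req = Net::HTTP::Post.new(uri.path)
req['some'] = 'header'
req.content_length = body.size
req.body_stream = ProgressIO.new(body, body.size) do |sent, total|
  print "\rUploading: %3d%%" % (sent * 100 / total)
end
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
```

The progress callback only fires while the request is actually being sent (the commented-out line), so the granularity depends on the chunk size Net::HTTP uses when copying the stream.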

Issue while fetching data from nested json

I am trying to fetch data from a nested JSON, and I'm not able to understand the issue here. Please ignore the fields I am passing to the ChildArticle class; I can sort that out.
URL for JSON - http://api.nytimes.com/svc/mostpopular/v2/mostshared/all-sections/email/30.json?api-key=31fa4521f6572a0c05ad6822ae109b72:2:72729901
Below is my code:
url = 'http://api.nytimes.com'
# Define the HTTP object
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
# If the api being scraped uses https, then set use_ssl to true.
http.use_ssl = false
# Define the request_url and make a GET request to the given url
request = '/svc/mostpopular/v2/mostshared/all-sections/email/30.json?api-key=31fa4521f6572a0c05ad6822ae109b72:2:72729901'
response = http.send_request('GET', request)
# Parse the response body
forecast = JSON.parse(response.body)
forecast["results"]["result"].each do |item|
  date = Date.parse(item["published_date"].to_s)
  if (@start <= date) && (@end >= date)
    article = News::ChildArticle.new(author: item["author"], title: item["title"], summary: item["abstract"],
                                     images: item["images"], source: item["url"], date: item["published_date"],
                                     guid: item["guid"], link: item["link"], section: item["section"],
                                     item_type: item["item_type"], updated_date: item["updated_date"],
                                     created_date: item["created_date"],
                                     material_type_facet: item["material_type_facet"])
    @articles.concat([article])
  end
end
I get the error below:
`[]': no implicit conversion of String into Integer (TypeError)` at `forecast["results"]["result"].each do |item|`
Looks like forecast['results'] is simply an array, not a hash.
Take a look at this slightly modified script. Give it a run in your terminal, and check out its output.
require 'net/http'
require 'json'
url = 'http://api.nytimes.com'
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = false
request = '/svc/mostpopular/v2/mostshared/all-sections/email/30.json?api-key=31fa4521f6572a0c05ad6822ae109b72:2:72729901'
response = http.send_request('GET', request)
forecast = JSON.parse(response.body)
forecast["results"].each.with_index do |item, i|
  puts "Item #{i}:"
  puts '--'
  item.each do |k, v|
    puts "#{k}: #{v}"
  end
  puts '----'
end
Also, you may want to inspect the JSON structure of the API return from that URL. If you go to that URL, open your JavaScript console, and paste in
JSON.parse(document.body.textContent)
you can inspect the JSON structure very easily.
Another option would be downloading the response to a JSON file, and inspecting it in your editor. You'll need a JSON prettifier though.
File.open('response.json', 'w') do |f|
  f.write(response.body)
end
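Since "results" is an array of hashes, the original date-filtering loop only needs its iteration fixed. A minimal, self-contained illustration (the sample JSON and date range here are made up):

```ruby
require 'json'
require 'date'

# "results" is an array, so iterate it directly instead of
# indexing it with ["result"].
body = <<~JSON
  {"status":"OK","results":[
    {"title":"First","published_date":"2015-04-10"},
    {"title":"Second","published_date":"2015-04-12"}
  ]}
JSON

forecast = JSON.parse(body)
from = Date.parse('2015-04-11')
to   = Date.parse('2015-04-30')

picked = forecast['results'].select do |item|
  date = Date.parse(item['published_date'])
  date >= from && date <= to
end

picked.each { |item| puts item['title'] } # prints "Second"
```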

HTTPS in ruby (connecting to https://docs.google.com/)

I am new to the world of Ruby, so sorry if I say something stupid.
I am trying to automate downloading some videos shared by the members of a website via https://docs.google.com/, and I am using Ruby to do it.
The link of each video has the following format :
https://docs.google.com/uc?export=download&confirm=no_antivirus&id=XXXXXXXXXXXXXXXXXXXXXXXX
I noticed that the "Download Anyway" button redirects to the following link :
https://docs.google.com/uc?export=download&confirm=**FspT**&id=XXXXXXXXXXXXXXXXXXXXXXXX
So the value of the confirm GET parameter changes. Once we click on that button we have our download link.
I tried to do it using the following code :
# GoogleDocs
def get_down_link_googledocs(url)
  # Get the download webpage using https
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  data = http.get(uri.request_uri)
  body = data.body
  # Get the confirm value
  confirm_regx = /confirm=[a-zA-Z0-9_-]*/
  confirm = body.scan(confirm_regx)
  # If the link isn't correct (note: scan returns an empty array, never nil)
  return nil if confirm.empty?
  # Redirection link
  confirm = String.new(confirm[0])
  url = url.gsub("https://docs.google.com", "")
  if url.include? "confirm=no_antivirus"
    request_uri = url.gsub("confirm=no_antivirus", confirm)
  else
    request_uri = url + "&" + confirm
  end
  request = Net::HTTP::Get.new(request_uri)
  data = http.request(request)
  puts data.body
end
But it prints the same webpage with another confirm value.
What am I missing here?
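One likely culprit (a hypothesis, not something confirmed in the question): Google ties the confirm token to a session cookie set on the first response, so the follow-up request needs to replay those cookies, or it just receives a fresh warning page with a new token. A helper along these lines could build the Cookie header; the helper name is mine:

```ruby
# Collect the cookie name=value pairs from a Net::HTTPResponse's
# Set-Cookie headers so they can be replayed on the next request.
def cookie_header(response)
  cookies = response.get_fields('set-cookie')
  return nil if cookies.nil? || cookies.empty?
  cookies.map { |c| c.split(';', 2).first }.join('; ')
end

# Usage inside get_down_link_googledocs, after the first http.get:
#   header = cookie_header(data)
#   request = Net::HTTP::Get.new(request_uri)
#   request['Cookie'] = header if header
#   data = http.request(request)
```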

ruby net/http `read_body': Net::HTTPOK#read_body called twice (IOError)

I'm getting "read_body called twice (IOError)" using the net/http library. I'm trying to download files and use HTTP sessions efficiently, and I'm looking for some help or advice to fix my issues. From my debug message, when I log the response it shows readbody=true. Is that why read_body is called twice when I try to write the large file in chunks?
D, [2015-04-12T21:17:46.954928 #24741] DEBUG -- : #<Net::HTTPOK 200 OK readbody=true>
I, [2015-04-12T21:17:46.955060 #24741] INFO -- : file found at http://hidden:8080/job/project/1/maven-repository/repository/org/project/service/1/service-1.zip.md5
/usr/lib/ruby/2.2.0/net/http/response.rb:195:in `read_body': Net::HTTPOK#read_body called twice (IOError)
from ./deploy_application.rb:36:in `block in get_file'
from ./deploy_application.rb:35:in `open'
from ./deploy_application.rb:35:in `get_file'
from ./deploy_application.rb:59:in `block in <main>'
from ./deploy_application.rb:58:in `each'
from ./deploy_application.rb:58:in `<main>'
require 'net/http'
require 'logger'

STAMP = Time.now.utc.to_i
@log = Logger.new(STDOUT)

# project, build, service: remove variables above
project = "project"
build = "1"
service = "service"
version = "1"

BASE_URI = URI("http://hidden:8080/job/#{project}/#{build}/maven-repository/repository/org/#{service}/#{version}/")
# file pattern for application is zip / jar. Hopefully the lib in the zipfile is acceptable.
# example for module download /#{service}/#{version}.zip /#{service}/#{version}.zip.md5 /#{service}/#{version}.jar /#{service}/#{version}.jar.md5

def clean_exit(code)
  # remove temp files on exit
end

def get_file(file)
  puts BASE_URI
  uri = URI.join(BASE_URI, file)
  @log.debug(uri)
  request = Net::HTTP::Get.new uri #.request_uri
  @log.debug(request)
  response = @http.request request
  @log.debug(response)
  case response
  when Net::HTTPOK
    size = 0
    progress = 0
    total = response.header["Content-Length"].to_i
    @log.info("file found at #{uri}")
    # need to handle file open error
    Dir.mkdir "/tmp/#{STAMP}"
    File.open "/tmp/#{STAMP}/#{file}", 'wb' do |io|
      response.read_body do |chunk|
        size += chunk.size
        new_progress = (size * 100) / total
        unless new_progress == progress
          @log.info("\rDownloading %s (%3d%%) " % [file, new_progress])
        end
        progress = new_progress
        io.write chunk
      end
    end
  when 404
    @log.error("maven repository file #{uri} not found")
    exit 4
  when 500...600
    @log.error("error getting #{uri}, server returned #{response.code}")
    exit 5
  else
    @log.error("unknown http response code #{response.code}")
  end
end

@http = Net::HTTP.new(BASE_URI.host, BASE_URI.port)
files = [ "#{service}-#{version}.zip.md5", "#{service}-#{version}.jar", "#{service}-#{version}.jar.md5" ].each do |file| # "#{service}-#{version}.zip",
  get_file(file)
end
Edit: Revised answer!
Net::HTTP#request, when called without a block, will pre-emptively read the body. The documentation isn't clear about this, but it hints at it by suggesting that the body is not read if a block is passed.
If you want to make the request without reading the body, you'll need to pass a block to the request call, and then read the body from within that. That is, you want something like this:
@http.request request do |response|
  # ...
  response.read_body do |chunk|
    # ...
  end
end
This is made clear in the implementation; Response#reading_body will first yield the unread response to a block if given (from #transport_request, which is called from #request), then read the body unconditionally. The block parameter to #request gives you that chance to intercept the response before the body is read.
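Putting that together, get_file can be restructured so the streaming happens inside the request block. This is a sketch of the shape rather than the poster's exact script; the helper takes the connection, URI, and destination IO as parameters instead of using the globals above:

```ruby
require 'net/http'
require 'uri'

# Stream a GET response to dest_io in chunks and return the bytes
# written. The body is only unread inside the block passed to
# #request, so read_body must be called there.
def stream_download(http, uri, dest_io)
  bytes = 0
  request = Net::HTTP::Get.new(uri)
  http.request(request) do |response|
    raise "unexpected #{response.code}" unless response.is_a?(Net::HTTPOK)
    response.read_body do |chunk|
      bytes += chunk.bytesize
      dest_io.write(chunk)
    end
  end
  bytes
end
```

The progress computation from the original script would slot into the read_body block, using the response's Content-Length header as the total.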

Ruby URL Validation

I wrote this script to parse a text file of URLs and return the HTTP response code for each, but I can't get it to work. I'm able to import and parse the file, but I'm unable to get the return code. Thanks in advance!
require 'net/http'
# Open URL from file
File.open("sample_input_file", "r") do |infile|
  while (URI = infile.gets)
  end
end
# Get HTTP response code
http = Net::HTTP.new
response = http.request_head(URI)
# Print result
if response.code != "200"
  puts URI + "Error"
else
  puts "Ok"
end
.gets returns a string; you need to actually make a URI by calling, for example, URI.parse:
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/
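Applying that, the script could look like this sketch: it keeps the poster's "sample_input_file" name, parses each line with URI.parse, and issues one HEAD request per URL instead of checking only after the loop:

```ruby
require 'net/http'
require 'uri'

# Read one URL per line and map each URL to its HTTP response code.
def check_urls(path)
  results = {}
  File.foreach(path) do |line|
    url = line.strip
    next if url.empty?
    uri = URI.parse(url)
    response = Net::HTTP.start(uri.host, uri.port) do |http|
      http.request_head(uri.request_uri)
    end
    results[url] = response.code
  end
  results
end

# check_urls('sample_input_file').each do |url, code|
#   puts code == '200' ? "#{url} Ok" : "#{url} Error (#{code})"
# end
```

Note that response.code is a String ("200"), which is why the original comparison against "200" was quoted.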
