PDF to PNG in Ruby using grim

I ran the code below to convert a PDF to PNG using grim:
require 'grim'

pdf = Grim.reap(File.join(File.dirname(__FILE__), "pdf.pdf"))
count = pdf.count
pdf[3].save('like.png')
text = pdf[3].text
pdf.each do |page|
  puts page.text
end
It shows the following error:
No such file or directory - gs -dNODISPLAY -q -sFile=./pdf.pdf C:/Ruby200/lib/ruby/gems/2.0.0/gems/grim-1.3.0/lib/pdf_info.ps (Errno::ENOENT)
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/grim-1.3.0/lib/grim/image_magick_processor.rb:21:in `count'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/grim-1.3.0/lib/grim/pdf.rb:35:in `count'
from pdfpng.rb:18:in `<main>'

It looks like the Ghostscript dependency is missing, since Ghostscript is what provides the gs binary. Alternatively, Ghostscript is installed but its directory is not in your PATH.
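One way to tell those two cases apart is to search PATH for the binary from Ruby before calling grim. The following is only a sketch: find_executable is a hypothetical helper, not part of grim, and the gswin64c/gswin32c names are an assumption about how Windows builds of Ghostscript name their console executable.

```ruby
# Hypothetical helper: return the full path of the first matching executable
# found in the PATH-style string `path`, or nil if none is found.
def find_executable(names, path = ENV.fetch("PATH", ""))
  # Windows resolves commands through PATHEXT (.EXE, .BAT, ...); elsewhere it is unset.
  extensions = [""] + ENV.fetch("PATHEXT", "").split(";")
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    names.each do |name|
      extensions.each do |ext|
        candidate = File.join(dir, name + ext)
        return candidate if File.file?(candidate) && File.executable?(candidate)
      end
    end
  end
  nil
end

# "gs" is the command from the error message; the gswin* names are assumptions.
gs_path = find_executable(%w[gs gswin64c gswin32c])
puts(gs_path ? "Ghostscript found at #{gs_path}" : "Ghostscript not found on PATH")
```

If this prints "not found" even though Ghostscript is installed, adding its bin directory to PATH (and opening a new console) is the likely fix.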

Related

Ruby - How to add EOF marker into a PDF file or otherwise bypass PDF::Reader::MalformedPDFError: PDF does not contain EOF marker

I'm using the Mechanize ruby gem to click a button on the web to download a PDF file and save it to the local file system.
URL = "www.my-site.com"
agent = Mechanize.new
agent.pluggable_parser.pdf = Mechanize::File # FYI I have also tried Mechanize::FileSaver and Mechanize::Download here
page = agent.get(URL)
form = page.forms.first
button = page.form.button_with(:value => "Some Button Text")
local_file = "path/to/file.pdf"
response = agent.submit(form, button)
response.save_as(local_file)
But when I try to read this PDF file using the PDF::Reader gem, I get an error "PDF does not contain EOF marker".
# This also happens with PDF::Reader.new(response.body) and PDF::Reader.new(response.body_io),
# depending on which of the pluggable_parser configurations mentioned above is used.
reader = PDF::Reader.new(local_file)
#> PDF::Reader::MalformedPDFError: PDF does not contain EOF marker
I'm able to save the PDF locally and view it and it looks fine, but the PDF::Reader gem is complaining about it missing an EOF marker.
So my question is: is there a way I could add an EOF marker into the PDF or something to get around this error so I can parse the PDF?
Thanks.
Related (unanswered) question: PDF does not contain EOF marker (PDF::Reader::MalformedPDFError) with pdf-reader
Related Docs:
http://mechanize.rubyforge.org/Mechanize/File.html
http://mechanize.rubyforge.org/Mechanize/Download.html
http://mechanize.rubyforge.org/Mechanize/FileSaver.html
https://github.com/yob/pdf-reader
EDIT:
I found the EOF marker somewhere in the middle of the downloaded file contents, followed by some HTML-looking stuff that I can't seem to figure out how to get rid of. I want to isolate the PDF content and then parse that, but still running into issues. Here is the full script I am using:
https://gist.github.com/s2t2/c6766846d024edd696586b2bc7fee0bf
The issue seems to be with the website you're accessing: http://employmentsummary.abaquestionnaire.org
They append HTML data to the end of the response.
However, you could truncate the response by searching for the first %%EOF marker and removing all the data after it.
i.e.:
pdf_data = response.body
eof_index = pdf_data.index("%%EOF")
if eof_index.nil?
  # handle error: no EOF marker found
else
  pdf_data = pdf_data[0, eof_index + "%%EOF".length]
  # save/send pdf_data
end
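A variation on the same idea, as a sketch rather than a definitive fix: PDFs that were saved incrementally can contain more than one %%EOF, so it may be safer to cut after the last marker instead of the first. strip_after_eof is a hypothetical helper name and the sample string is made up for illustration. This assumes the trailing HTML does not itself contain the text %%EOF.

```ruby
# Hypothetical helper: keep everything up to and including the last "%%EOF"
# marker and drop whatever follows it. Returns nil if there is no marker.
def strip_after_eof(data)
  marker = "%%EOF".b
  bytes = data.b                 # binary copy, so indexes are byte offsets
  last = bytes.rindex(marker)
  return nil if last.nil?
  bytes[0, last + marker.bytesize]
end

# Made-up stand-in for a response body with HTML appended after the PDF.
raw = "%PDF-1.4\n...objects...\n%%EOF\n<html><body>trailing page</body></html>"
clean = strip_after_eof(raw)
puts clean.bytesize
```

The result can then be written with File.binwrite(local_file, clean) before handing the file to PDF::Reader.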

Saving a GIF image with RMagick failure

I'm using ruby 2.1.0, rmagick 2.15.4, ImageMagick 6.7.7-10
I'd like to load a JPEG file and then save in the GIF format.
x = Magick::Image.read("a.jpg").first
puts "Start write..."
x.format = "GIF"
x.write("a.gif")
puts "Done."
Gives me this:
Start write...
cli.rb:104:in `exit': no implicit conversion from nil to integer (TypeError)
The stack trace includes foreman and thor gems, but no step in my code.
The filesystem has a.gif defined, but the filesize is zero.
UPDATE
I think I have a problem with ImageMagick itself. Here's what happens on the command line:
$ convert -debug a.jpg a.gif
convert.im6: unrecognized event type `a.jpg' #error/convert.c/ConvertImageCommand/1135.
It looks like this question was answered here:
JPG to PNG using RMagick
Just changing the file name will not convert the file format.
Yes it will. Something else is going on here.
"This makes it easy to convert an image file to another format. Simply write the image file using a name that has either a prefix or a suffix corresponding to the format you want."
https://rmagick.github.io/imusage.html

Restore jpeg file from its encoded64 code in Rails/Ruby

Given a Base64-encoded JPEG image in uploaded_io, how can we restore the JPEG file from it?
The encoded uploaded_io is generated by canvas.toDataURL("image/jpeg"). Here is what uploaded_io looks like:
uploaded_io = "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2....."
In ruby/rails 4, a base64 encoded file could be decoded with:
require 'base64'
decoded = Base64.decode64(uploaded_io.sub(/.+,/, '')) #removed file header 'data:image/jpeg;base64,' as suggested
We added the gem mini_magick (v3.5.0) and installed the image magick library on our computer. Did the following:
image = MiniMagick::Image.new(decoded)
However, the image is not a JPEG image file and does not respond well to .type and .size. There is no need to manipulate the image file, and we are not sure whether mini_magick/ImageMagick are really needed here.
One issue that stands out is that you're decoding the image and then removing the header, which will cause problems.
image = MiniMagick::Image(decoded.sub(/.+,/, ''))
I did a simple test encoding / decoding an image using Ruby Base64 and everything worked as expected.
irb example:
require 'base64'
e = Base64.encode64(File.binread('/path/to/jpeg')) # read the image as binary
d = Base64.decode64(e)
File.open("test.jpg", "wb") { |f| f.write(d) } # write in binary mode
test.jpg should be a valid file. Confirm by executing file test.jpg.
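For the data URL case from the question, a minimal sketch using only the standard library could look like the following. decode_data_url is a hypothetical helper, and the three-byte payload is just the JPEG start marker used as a placeholder, not a complete image. The output path under Dir.tmpdir is also an assumption for illustration.

```ruby
require 'base64'
require 'tmpdir'

# Hypothetical helper: split a base64 data URL into its MIME type and decoded bytes.
def decode_data_url(data_url)
  match = data_url.match(/\Adata:([\w\/.+-]+);base64,(.*)\z/m)
  raise ArgumentError, "not a base64 data URL" unless match
  [match[1], Base64.decode64(match[2])]
end

# Placeholder payload: the first three bytes of a JPEG file, not a full image.
sample_bytes = "\xFF\xD8\xFF".b
data_url = "data:image/jpeg;base64," + Base64.strict_encode64(sample_bytes)

mime_type, bytes = decode_data_url(data_url)
restored_path = File.join(Dir.tmpdir, "restored.jpg")
File.binwrite(restored_path, bytes) # binary write, no newline translation
puts "#{mime_type}: wrote #{bytes.bytesize} bytes to #{restored_path}"
```

With a real canvas.toDataURL("image/jpeg") string in place of data_url, the written file should open as a JPEG without involving mini_magick at all.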

Uploading Images through Sinatra

I'm using the example code from this page:
http://www.wooptoot.com/file-upload-with-sinatra
When I try to upload an image file (png or jpg), it uploads successfully and I can see the file in the proper directory, but it gets corrupted in the process. I cannot open the image. Doing a diff with the original files, I see several newlines that are missing in the uploaded version.
I'm running Ruby 1.9.3p392 on Windows.
Edit:
I tried a test outside the context of Sinatra
File.open('57-new.jpg', "wb") do |f|
  f.write(File.open('57.jpg', 'rb').read)
end
That works. The only difference is the addition of the binary flags. When using Sinatra I can set the binary flag on the write operation, but I'm not sure how I can set it on the read since I seem to be passed a file object by the request.
File.open('uploads/' + params['myfile'][:filename], "wb") do |f|
  f.write(params['myfile'][:tempfile].read)
end
Okay, so it looks like all I needed was the binary flag when opening the new file.
File.open('uploads/' + params['myfile'][:filename], "wb") do |f|
  f.write(params['myfile'][:tempfile].read)
end
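On the question of setting the binary flag on the read side: the object passed in is an already open file, and an open IO can be switched with binmode. The sketch below assumes the upload behaves like a Tempfile; save_upload is a hypothetical helper, not something Sinatra provides.

```ruby
require 'tempfile'

# Hypothetical helper: copy an uploaded temp file to dest_path with both the
# read side and the write side in binary mode.
def save_upload(tempfile, dest_path)
  tempfile.binmode # switch the already-open file to binary reads
  tempfile.rewind  # start from the beginning in case it was read before
  File.open(dest_path, "wb") { |out| out.write(tempfile.read) }
end
```

In the route this would be called as save_upload(params['myfile'][:tempfile], 'uploads/' + params['myfile'][:filename]).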

How can I directly use pandoc to generate docx files within a Sinatra app?

I have a Sinatra app which needs to provide downloadable reports in Microsoft Word format. My approach to creating the reports is to generate the content using ERB, and then convert the resulting HTML into docx. Pandoc seems to be the best tool for accomplishing this, but my implementation involves generating some temporary files which feels kludgy.
Is there a more direct way to generate the docx file and send it to the user?
I know that PandocRuby exists, but I couldn't quite get it working for my purposes. Here is an example of my current implementation:
# setting up the docx mime type
configure do
  mime_type :docx, 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
end
# route to generate the report
get '/report/:name' do
  content_type :docx
  input = erb :report, :layout=>false # get the HTML content for the input file
  now = Time.now.to_i.to_s # create a unique file name
  input_path = File.join('tmp', now+'.txt')
  f = File.new(input_path, "w+")
  f.write(input.to_s) # write the HTML to the input file
  f.close()
  output_path = File.join('tmp', now+'.docx') # create a unique output file
  system "pandoc -f html -t docx -o #{output_path} #{input_path}" # convert the input file to docx
  send_file output_path
end
A recent update to pandoc-ruby added support for piping binary output to standard output. Does that solve your problem?
I don't have any experience with Sinatra, and I have not tried to use pandoc-ruby to pipe binary output, but something like
puts PandocRuby.convert(input, :from => :html, :to => :docx)
might do the trick.
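Without pandoc-ruby, the same piping can be sketched with Open3 from the standard library. pandoc only writes docx to standard output when forced with "-o -". pipe_through and PANDOC_DOCX are hypothetical names, and this assumes pandoc is on the PATH; I have not run it inside Sinatra.

```ruby
require 'open3'

# Hypothetical helper: feed `input` to a command's stdin and return its
# stdout as a binary string. Raises if the command exits with an error.
def pipe_through(command, input)
  output, status = Open3.capture2(*command, stdin_data: input, binmode: true)
  raise "#{command.first} exited with status #{status.exitstatus}" unless status.success?
  output
end

# "-o -" forces pandoc to write the binary docx to stdout.
PANDOC_DOCX = %w[pandoc -f html -t docx -o -].freeze

# In the route this could replace the temporary files:
#   content_type :docx
#   pipe_through(PANDOC_DOCX, erb(:report, :layout => false))
```

Returning the string from the route lets Sinatra send it as the response body, so nothing has to be written to tmp.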
