I'm trying to access http://www.orimi.com/pdf-test.pdf to test if "PDF Test File" exists.
This is my code:
it 'pdf test' do
  visit 'http://www.orimi.com/pdf-test.pdf'
  puts page.title
  sleep 5
  convert_pdf_to_page
  expect(page).to have_content 'PDF Test File'
end
def convert_pdf_to_page
  temp_pdf = Tempfile.new('pdf')
  temp_pdf << page.source.force_encoding('UTF-8')
  reader = PDF::Reader.new(temp_pdf)
  pdf_text = reader.pages.map(&:text)
  temp_pdf.close
  page.driver.response.instance_variable_set('@body', pdf_text)
end
But I got:
PDF::Reader::MalformedPDFError: PDF does not contain EOF marker
I searched and found that the problem may be the PDF file itself. I checked the temp_pdf variable and it contains just HTML with an empty body.
Is there something wrong in my code?
PDF is a tricky format, and different readers react differently to unexpected content in PDF files. Some crash; others make assumptions in order not to crash.
I'd guess this is what happens here. When you open the file in the browser or a PDF reader it works, but PDF::Reader can't handle whatever is non-standard there.
Try using a different gem; Origami seems to be well regarded. I tried it with your file, and it seems to work:
> require 'origami'
> pdf = Origami::PDF.read '/tmp/pdf-test.pdf'
> pdf.grep(/Not existing/).any?
=> false
> pdf.grep(/PDF Test File/).any?
=> true
For reference (how I came up with this answer):
I googled the error PDF::Reader::MalformedPDFError: PDF does not contain EOF marker and found this thread, which suggests that it's a fairly common problem with "working" PDFs. One of the last messages suggests Origami, which (after checking) seems to be able to handle the PDF in question.
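Given the question's observation that temp_pdf contained only HTML, it may be worth checking that the downloaded bytes are a PDF at all before blaming the parser. The sketch below relies only on the fact that a PDF starts with a %PDF- header and carries a %%EOF marker; looks_like_pdf? is a hypothetical helper, not part of PDF::Reader or Origami.

```ruby
# Hypothetical helper: a cheap sanity check before handing bytes to a PDF parser.
# It does not validate the PDF, it only rules out obviously wrong content
# such as an HTML page returned by the browser driver.
def looks_like_pdf?(bytes)
  # Work on a binary copy so stray non-UTF-8 bytes cannot raise encoding errors.
  data = bytes.dup.force_encoding(Encoding::BINARY)
  data.start_with?("%PDF-") && data.include?("%%EOF")
end

html = "<html><head></head><body></body></html>"
puts looks_like_pdf?(html) # prints false
```

If the check fails on page.source, the browser driver is probably returning its own viewer markup instead of the file, and fetching the URL with a plain HTTP client may be the simpler route.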
Related
I've been asked to write some tests to confirm that certain text is contained within a PDF file. I've come across the PDF::Reader gem, which extracts the text from the file well enough, except that the spacing in the output is unreliable. For example, a piece of text that should read Date of first registration of the product comes out of PDF::Reader as Date offirstregistrationoftheproduct, so my assertion fails because of the spacing.
My code:
expected_text = 'Date of first registration of the product'
file = File.open(my_pdf, "rb")
PDF::Reader.open(file) do |reader|
  reader.pages.each do |page|
    expect(page).to have_text expected_text
  end
end
The result is an RSpec expectation not met error.
Is there a way I can get this text properly formatted so that my assertion can read it?
The page object of Reader is not text. If you want to get text from a pdf, you may use page.text. Using a regex may solve your problem.
Try something like below.
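expected_text = 'Date of first registration of the product'
file = File.open(my_pdf, "rb")
PDF::Reader.open(file) do |reader|
  reader.pages.each do |page|
    # page.text returns the extracted text as a String;
    # match returns MatchData or nil, so use the match matcher rather than "be true"
    expect(page.text).to match(/#{Regexp.escape(expected_text)}/)
  end
end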
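Because the extracted text loses some of its spaces, a literal regex may still fail. One possible workaround, sketched here under the assumption that ignoring whitespace entirely is acceptable for the assertion, is to strip whitespace from both strings before comparing. Both helper names are hypothetical.

```ruby
# Hypothetical helpers: compare two strings while ignoring all whitespace,
# so "offirstregistration" and "of first registration" are treated alike.
def strip_whitespace(text)
  text.gsub(/\s+/, "")
end

def includes_ignoring_whitespace?(haystack, needle)
  strip_whitespace(haystack).include?(strip_whitespace(needle))
end

extracted = "Date offirstregistrationoftheproduct" # what the question reports seeing
expected  = "Date of first registration of the product"
puts includes_ignoring_whitespace?(extracted, expected) # prints true
```

In a spec this could be used as expect(includes_ignoring_whitespace?(page.text, expected_text)).to be true. The trade-off is that it would also accept text whose words are split in unexpected places.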
I have an input file and a batch file. When the batch file is executed with the system command,
a corresponding output file is generated.
Now I want a particular piece of text (positions 350 to 357) from that output file to be displayed in my LineEdit widget.
Here is that part of my code:
system("C:/ORG_Class0178.bat")
Now the outfile will be generated
File.open("C:/ORG_Class0178_out.txt", 'r').each do |line|
  var = line[350..357]
  puts var
  # To test whether the file is being read.
  @responseLineEdit = Qt::LineEdit.new(self)
  @responseLineEdit.setFont Qt::Font.new("Times New Roman", 12)
  @responseLineEdit.resize 100,20
  @responseLineEdit.move 210,395
  @responseLineEdit.setText("#{var}")
end
When I test whether the file is being read using the puts statement, I get exactly the required output in the editor. However, the same text is not displayed in the LineEdit. Suggestions are welcome.
EDIT: A weird observation here. It works fine when I read the input file and display it, but it does not work with the output file. The puts statement does print the answer in the editor, confirming that the output file contains the required text. I am confused by this.
There is nothing wrong with the code fragments shown.
Note that var is a local variable. Are the second and third code fragments in the same context? If they are in the same method, and var is not touched in-between, it will work.
If the fragments belong to different methods of the same class, then an instance variable (@var) will solve the problem.
If none of that helps, use Pry to chase the problem. Follow the link to find the prerequisites and how to use it. Place binding.pry in your code, and your program will stop at that line. Then inspect what your variables are doing.
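As an illustration of the instance-variable suggestion, here is a minimal sketch with no Qt involved. ResponseReader and its method names are hypothetical stand-ins for the widget class; the point is only that a value assigned to @field in one method is still visible in another, whereas a local variable assigned inside a block would not be.

```ruby
# Hypothetical stand-in for the widget class: one method reads the value,
# another method uses it later.
class ResponseReader
  def read_field(lines)
    lines.each do |line|
      # An instance variable survives after the block and the method return.
      @field = line[350..357]
    end
  end

  def display_text
    # This is where the real code would call setText on the LineEdit.
    @field.to_s
  end
end

reader = ResponseReader.new
reader.read_field([("x" * 350) + "ABCDEFGH" + "rest of line"])
puts reader.display_text # prints ABCDEFGH
```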
Try 'rb' instead of 'r':
File.open("C:/ORG_Class0178_out.txt", 'rb').each do |line|
  var = line[350..357]
  puts var
end
In my application, the user must upload a text document, the contents of which are then parsed by the receiving controller action. I've gotten the document to upload successfully, but I'm having trouble reading its contents.
There are several threads on this issue. I've tried more or less everything recommended on these threads, and I'm still unable to resolve the problem.
Here is my code:
file_data = params[:file]
contents = ""
if file_data.respond_to?(:read)
  contents = file_data.read
else
  if file_data.respond_to?(:path)
    File.open(file_data, 'r').each_line do |line|
      elts = line.split
      #
      #
    end
  end
end
So here are my problems:
file_data doesn't 'respond_to?' either :read or :path. According to some other threads on the topic, if the uploaded file is less than a certain size, it's interpreted as a string and will respond to :read. Otherwise, it should respond to :path. But in my code, it responds to neither.
If I try to take out the if statements and straight away attempt File.open(file_data, 'r'), I get an error saying that the file wasn't found.
Can someone please help me find out what's wrong?
PS, I'm really sorry that this is a redundant question, but I found the other threads unhelpful.
Are you actually storing the file? Because if you are not, of course it can't be found.
First, find out what you're actually getting for file_data by adding debug output of file_data.inspect. It may be something you don't expect, especially if the form isn't set up correctly (i.e. :multipart => true).
Rails should wrap the uploaded file in a special object that provides a uniform interface, so something as simple as this should work:
file_data.read.each_line do |line|
  elts = line.split
  #
  #
end
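The two suggestions above (inspect what arrives, then read it through the uniform interface) can be combined in a small sketch. parse_upload is a hypothetical helper, and StringIO stands in for the uploaded-file object, on the assumption that whatever Rails passes responds to read when the form is multipart.

```ruby
require "stringio"

# Hypothetical helper: reads an uploaded file and splits each line into fields.
# Raises a descriptive error when the parameter is not file-like, which is
# what typically happens when the form was not submitted as multipart.
def parse_upload(upload)
  unless upload.respond_to?(:read)
    raise ArgumentError, "expected an uploaded file, got #{upload.inspect}"
  end
  upload.read.each_line.map { |line| line.split }
end

fake_upload = StringIO.new("alpha beta\ngamma delta epsilon\n")
p parse_upload(fake_upload) # prints [["alpha", "beta"], ["gamma", "delta", "epsilon"]]
```

In a controller this would be called as parse_upload(params[:file]); the error message then shows exactly what was received instead of the file.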
I'm having issues tidying up malformed XML code I'm getting back from the SEC's edgar database.
For some reason they have horribly formed XML. Tags that contain any sort of string aren't closed, and a tag can actually contain other XML or HTML documents inside it. Normally I'd hand this off to Tidy, but that isn't being maintained.
I've tried using Nokogiri::XML::SAX::Parser, but that seems to choke because the tags aren't closed. It seems to work all right until it hits the first ending tag, and then it doesn't fire on any more of them. But it is spitting out the right characters.
class Filing < Nokogiri::XML::SAX::Document
  def start_element name, attrs = []
    puts "starting: #{name}"
  end
  def characters str
    puts "chars: #{str}"
  end
  def end_element name
    puts "ending: #{name}"
  end
end
It seems like this would be the best option because I can simply have it ignore the other xml or html doc. Also it would make the most sense because some of these documents can get quite large so storing the whole dom in memory would probably not work.
Here are some example files: 1 2 3
I'm starting to think I'll just have to write my own custom parser
Nokogiri's normal DOM mode is able to automatically fix-up the XML so it is syntactically correct, or a reasonable facsimile of that. It sometimes gets confused and will shift closing tags around, but you can preprocess the file to give it a nudge in the right direction if need be.
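As a rough idea of what such a preprocessing nudge might look like, the sketch below closes tags that carry a value on the same line but are never closed. It assumes a layout like <TYPE>10-K with one value tag per line, which may not hold for every filing; close_value_tags is a hypothetical helper and uses no Nokogiri calls.

```ruby
# Hypothetical preprocessor: turns "<TYPE>10-K" into "<TYPE>10-K</TYPE>".
# Lines that are only an opening tag, a closing tag, or plain text are left alone.
def close_value_tags(text)
  text.each_line.map do |line|
    m = line.chomp.match(/\A(\s*)<([A-Za-z][\w-]*)>([^<]+)\z/)
    if m
      "#{m[1]}<#{m[2]}>#{m[3]}</#{m[2]}>\n"
    else
      line
    end
  end.join
end

sample = "<DOCUMENT>\n<TYPE>10-K\n<SEQUENCE>1\n</DOCUMENT>\n"
puts close_value_tags(sample)
```

The result can then be passed to Nokogiri::XML as a string. Values containing & or < would still need escaping, which this sketch does not attempt.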
I saved the XML #1 out to a document and loaded it:
require 'nokogiri'
doc = ''
File.open('./test.xml') do |fi|
  doc = Nokogiri::XML(fi)
end
puts doc.to_xml
After parsing, you can check the Nokogiri::XML::Document instance's errors method to see what errors were generated, for perverse pleasure.
doc.errors
If using Nokogiri's DOM model isn't good enough, have you considered using XMLLint to preprocess and clean the data, emitting clean XML so the SAX will work? Its --recover option might be of use.
xmllint --recover test.xml
It will output errors on stderr, and the code on stdout, so you can pipe it easily to another file.
As for writing your own parser... why? You have other options available to you, and reinventing a nicely implemented wheel is not a good use of time.
So, I wrote a simple Ruby class, and put it in my rails /lib directory. This class has the following method:
def Image.make_specific_image(paths, newfilename)
  puts "making specific image"
  @new_image = File.open(newfilename, "w")
  puts @new_image.inspect
  @@blank.each(">") do |line|
    puts line + "~~~~~"
    @new_image.puts line
    if line =~ /<g/
      paths.each do |p|
        puts "adding a path"
        puts p
        @new_image.puts p
      end
    end
  end
end
Which creates a new file, and copies a hardcoded string (@@blank) to this file, adding custom content at a certain location (after a g tag is found).
If I run this code from ruby, everything is just peachy.
HOWEVER, if I run this code from rails, the file gets CREATED, but is then empty. I've inspected each line of the code: nothing I'm trying to write to the file is nil, but the file is empty nonetheless.
I'm really stumped here. Is it a permissions thing? If so, why on EARTH would Rails have the permissions necessary to MAKE a file, but then not WRITE to the file it made?
Does File I/O somehow work differently in rails?
Specifically, I have a model method that calls:
Image.make_specific_image(paths, creature.id.to_s + ".svg")
which successfully makes a file of the type "47.svg" that is empty.
Have you tried calling close on the file after you're done writing it? (You could also use the block-based File.open syntax, which will automatically close once the block is complete). I'm guessing the problem is that the writes aren't getting flushed to disk.
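A minimal sketch of the block-based form mentioned above, assuming the content is already available as an array of lines. write_lines is a hypothetical helper, not the asker's method; the block form of File.open closes the file when the block exits, which flushes buffered writes to disk.

```ruby
require "tmpdir"

# Hypothetical helper: writes each line to the file at path.
# The file is closed (and therefore flushed) when the block finishes,
# even if an exception is raised inside it.
def write_lines(path, lines)
  File.open(path, "w") do |file|
    lines.each { |line| file.puts line }
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "47.svg")
  write_lines(path, ["<svg>", "<g>", "</g>", "</svg>"])
  puts File.size(path) > 0 # prints true
end
```

With the original method, calling @new_image.close after the loop should have the same effect.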
So.
Apparently File I/O DOES work in Rails...just very, very slowly. In Ruby, as soon as I go to look at the file, it's there, it works, everything is spiffy.
Before, after seeing blank files from Rails, I would get frustrated, delete the file, change some code, and try again (so as not to fill the directory with spam: each file is generated on creature creation, so I would soon end up with a lot of files like "47.svg" and "48.svg", etc.).
....So. I took my lunch break, came back to see if I could tell if the permissions of the rails generated file were different from the ruby generated file...and noticed that the RAILS file is no longer blank.
Seems to take about five minutes for rails to finally write to the file, even AFTER it claims it's done processing that whole call. Ruby takes a few seconds. Not really sure WHY they are so different, but at least now I know it's not a permissions thing.
Edit: Actually, only some files take that long; others are instant...