Ruby Cucumber PDF reader - ruby

I'm running tests to render and check a PDF. I've got it working, but the PDFs are date-stamped in the filename. I'm looking for a way to always open the file generated today. I've tried the approach below but had no joy, as the PDF reader doesn't accept it as a valid filename. Here is my code so you can see what I'm trying to do:
today =
Given /^I open the saved PDF and confirm the VRM is "(.*?)"$/ do |vrm|
filename = 'C:\Users\user\Downloads\vehicle_summary_VRM_#{today}.pdf' do |reader|
reader.pages.each do |page|
expect( have_content vrm
puts page.text
I get the following exception: input must be an IO-like object or a filename (ArgumentError)
Any ideas?

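String interpolation (#{today}) only happens inside double-quoted strings, so change the single quotes in:
filename = 'C:\Users\user\Downloads\vehicle_summary_VRM_#{today}.pdf'
to double quotes. Backslashes are escape characters inside double quotes (\u starts a Unicode escape), so they must be doubled or replaced with forward slashes:
filename = "C:\\Users\\user\\Downloads\\vehicle_summary_VRM_#{today}.pdf"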
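A minimal sketch of the difference between the two quote styles. The date format is an assumption, since the question does not show how today is built, and forward slashes are used here to sidestep backslash escaping altogether:

```ruby
require 'date'

# Assumed format: the question does not show how `today` is computed.
today = Date.today.strftime('%Y-%m-%d')

# Single quotes: no interpolation, the text #{today} is kept literally.
single_quoted = 'vehicle_summary_VRM_#{today}.pdf'

# Double quotes: #{today} is replaced by the value of the variable.
double_quoted = "vehicle_summary_VRM_#{today}.pdf"

# Forward slashes are accepted by Ruby on Windows and need no escaping.
filename = File.join('C:/Users/user/Downloads', double_quoted)

puts single_quoted
puts filename
```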


Replace text String from the shell disabling any regular expression

I need to replace a large set of broken HTML links in a file. For that, I need a find/replace with regular expressions disabled, i.e. the kind of basic find/replace you would do in Notepad.
I came across a Ruby script which should do exactly that:
ruby -p -i -e "gsub('Home', 'NEWLINK')" test.txt
However, test.txt is not changed and no output is returned. (I don't know much about Ruby, so I might just be missing something obvious.)
Is there any other tool which does what I need?
Edit: I'd expect that the following test.txt file:
Home changed to:
Instead of a regular expression, consider using an HTML parser, which actually understands HTML and won't leave you with a broken document.
# link_parser.rb
require 'bundler/inline'
gemfile do
  source ''
  gem 'nokogiri'
end
fn = ARGV[0]
if fn && File.exist?(fn)
  puts "Processing #{fn}..."
  doc = Nokogiri::HTML(File.read(fn))
  links = doc.css('a[href="index.php?option=com_content&view=article&id=130&catid=111&Itemid=324"]')
  if links.any?
    # Nokogiri exposes attributes through [] rather than accessor methods
    links.each do |link|
      link['href'] = "NEWLINK"
    end
    # write the modified document back to the same file
    File.write(fn, doc.to_html)
    puts "#{links.length} links replaced"
  else
    puts "No links found"
  end
else
  puts "File not found."
end
ruby link_parser.rb path/to/file.html
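If a purely literal find/replace is still what's wanted, Ruby can do that as well: String#gsub matches a String pattern (as opposed to a Regexp) character for character, and the block form inserts the replacement without scanning it for backslash sequences. A minimal sketch, where replace_literal and the sample strings are illustrative names, not anything from the question:

```ruby
# Replace every literal occurrence of `old` with `new`.
# A String pattern is matched character for character, so ".", "?" and "&"
# have no special meaning; the block form keeps `new` from being scanned
# for backreferences such as \1.
def replace_literal(text, old, new)
  text.gsub(old) { new }
end

html = '<a href="index.php?id=130&catid=111">Home</a> ' \
       '<a href="indexXphp?id=130&catid=111">Other</a>'
result = replace_literal(html, 'index.php?id=130&catid=111', 'NEWLINK')
puts result
```

To apply this to a file in place, something along the lines of File.write(path, replace_literal(File.read(path), old, new)) should do.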

Ruby - How to add EOF marker into a PDF file or otherwise bypass PDF::Reader::MalformedPDFError: PDF does not contain EOF marker

I'm using the Mechanize ruby gem to click a button on the web to download a PDF file and save it to the local file system.
URL = ""
agent =
agent.pluggable_parser.pdf = Mechanize::File # FYI I have also tried Mechanize::FileSaver and Mechanize::Download here
page = agent.get(URL)
form = page.forms.first
button = page.form.button_with(:value => "Some Button Text")
local_file = "path/to/file.pdf"
response = agent.submit(form, button)
But when I try to read this PDF file using the PDF::Reader gem, I get an error "PDF does not contain EOF marker".
reader = # this also happens if I try to use and depending on the different pluggable_parser configurations mentioned above
#> PDF::Reader::MalformedPDFError: PDF does not contain EOF marker
I'm able to save the PDF locally and view it and it looks fine, but the PDF::Reader gem is complaining about it missing an EOF marker.
So my question is: is there a way I could add an EOF marker into the PDF or something to get around this error so I can parse the PDF?
Related (unanswered) question: PDF does not contain EOF marker (PDF::Reader::MalformedPDFError) with pdf-reader
I found the EOF marker somewhere in the middle of the downloaded file contents, followed by some HTML-looking content that I can't figure out how to get rid of. I want to isolate the PDF content and then parse that, but I'm still running into issues. Here is the full script I am using:
The issue seems to be with the website you're accessing:
They add HTML data at the end of the response.
However, you could truncate the response by searching for the first occurrence of the substring %EOF and removing all the data after it.
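pdf_data = result.body
# keep everything up to and including the first "%EOF"; drop the trailing HTML
pdf_data = pdf_data.slice(0, pdf_data.index("%EOF").to_i + 4)
if pdf_data.length <= 4
  # handle error: no EOF marker was found
else
  # save/send pdf_data
end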
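The same idea as a small helper, with the missing-marker case made explicit. This is only a sketch: truncate_after_eof is a hypothetical name, the sample body is made up, and it assumes the first %%EOF really is the end of the PDF, which may not hold for files saved with incremental updates.

```ruby
# Returns the data up to and including the first EOF marker,
# or nil when no marker is present.
def truncate_after_eof(data)
  marker = '%%EOF'
  index = data.index(marker)
  return nil if index.nil?
  data[0, index + marker.length]
end

# Made-up response body: a PDF followed by stray HTML.
body = "%PDF-1.4\n...pdf objects...\n%%EOF\n<html>trailing junk</html>"
pdf_data = truncate_after_eof(body)
puts pdf_data.bytesize
```

Returning nil instead of a short string makes the error branch harder to overlook than a length check.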

Open URLs from CSV

I am using Ruby 2.1.0p0 on Mac OS.
I'm parsing a CSV file and grabbing all the URLs, then using Nokogiri and OpenURI to scrape them which is where I'm getting stuck.
When I try to use an each loop to run through the URLs array, I get this error:
initialize': No such file or directory @ rb_sysopen - URL (Errno::ENOENT)
When I manually create an array, and then run through it I get no error. I've tried to_s, URI::encode, and everything I could think of and find on Stack Overflow.
I can copy and paste the URL from the CSV, or from the terminal after using puts on the array, and it opens in my browser with no problem. But when I try to open it with Nokogiri, it fails.
Here's my code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'uri'
require 'csv'
events =
CSV.foreach('productfeed.csv') do |row|
events.push URI::encode(row[0]).to_s
events.each do |event|
page = Nokogiri::HTML(open("#{event}"))
#eventually, going to find info on the page, and scrape it, but not there yet.
#something to show I didn't get an error
puts "open = success"
Please help! I am completely out of ideas.
It looks like you're processing the header row, where one of the values is literally "URL". That's not a valid URI, so open-uri won't touch it.
There's a headers option for the CSV module that will make use of the headers automatically. Try turning that on and referring to row["URL"].
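A minimal sketch of the headers option. The feed contents below are made up for illustration; only the "URL" column name is taken from the error message in the question.

```ruby
require 'csv'

# Hypothetical feed contents standing in for productfeed.csv.
feed = <<~FEED
  URL,Name
  http://example.com/first,First event
  http://example.com/second,Second event
FEED

# headers: true consumes the first row as column names instead of data,
# so the literal string "URL" never ends up in the list.
events = CSV.parse(feed, headers: true).map { |row| row['URL'] }

events.each { |event| puts "would open #{event}" }
```

For a file on disk, CSV.foreach('productfeed.csv', headers: true) yields rows that can be indexed the same way.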
I tried doing the same thing and found it to work better using a text file.
Here is what I did.
#import os, webbrowser and time modules
import os
import webbrowser
import time
#open text file as "dataFile" and verify there is data in said file
path = '/home/user/Desktop/urls.txt'
dataFile = open(path, 'r')
if os.path.getsize(path) > 0:
    print("Data file opened successfully")
else:
    print("!!!!NO DATA IN FILE!!!!")
#read file line by line, remove any spaces/newlines, and open link in the default browser
for lines in dataFile:
    url = lines.strip()
    if url:
        print("Opening " + url)
        webbrowser.open(url)
#close file and exit
print("Closing Data File")
dataFile.close()
#wait two seconds before printing "Data file closed".
#this is purely for visual effect.
time.sleep(2)
print("Data file closed")
#after opener has run, user is prompted to press enter key to exit.
input("\n\nURL Opener has run. Press the enter key to exit.")
Hope this helps!

ruby rss module not reading full path

I am downloading an rss file posted as xml, and saving it with the rss extension.
I then use the rss module to read it as an rss file. The issue I have is the following:
If I create the file (page.rss) with an implicit path and I use just
that filename to process it with the rss parsing function, everything
is fine (downloaded_file = 'page.rss')
If I explicitly enter the full path into the script (downloaded_file = "E:/Libraries/Documents/Android dev/page.rss"), everything also works fine.
But if I "calculate" the value of the absolute path with: downloaded_file = File.join(Dir.pwd, 'page.rss') the rss function fails. The value of the variable is apparently the same ("E:/Libraries/Documents/Android dev/page.rss") but there must be an invisible difference. I would like to be able to use the 'calculated' absolute path. I am sure there is a subtle difference in the way this string is interpreted by the rss function. How can I elucidate it?
Thanks for any suggestion.
Here is my script:
require 'rss'
require 'open-uri'
url = ''
downloaded_file = File.join(Dir.pwd, 'page.rss') # FAILS
puts "Path = #{downloaded_file}"#=> "E:/Libraries/Documents/Android dev/page.rss"
downloaded_file = 'page.rss' # WORKS
#downloaded_file = "E:/Libraries/Documents/Android dev/page.rss" # WORKS
puts "Used path/filename: #{downloaded_file}", 'wb') do |file| # Download url content into rss file
file << open(url).read
rss = RSS::Parser.parse(downloaded_file, false) # Read rss from downloaded_file
puts "Title: #{}"
Okay, so your downloaded_file string has been marked as tainted, and RSS::Parser won't open a tainted filename string (see rss/parser.rb, around line 105, for details). The solution is either to untaint the downloaded_file string before you call parse, e.g.:
RSS::Parser.parse(downloaded_file.untaint, false)
or to just open the file for the parser, e.g.:
RSS::Parser.parse(File.open(downloaded_file), false)
I'd never run into this issue before, so thanks! I'd heard of object tainting, but I never really had any reason to look into it. There is a bit more information about it here: What are tainted objects, and when should we untaint them?
Dir.pwd is going to change depending on where you call the script from. Unless you are calling the script from E:/Libraries/Documents/Android dev, the filepath will be off.
It's better to build your filepath from the location of your script itself. To do so you can add:
ROOT = File.expand_path('..', __FILE__)
downloaded_file = File.join(ROOT, 'page.rss')
# or just downloaded_file = File.expand_path('../page.rss', __FILE__)
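A small sketch of why the two approaches can diverge. It changes into a temporary directory to simulate launching the script from somewhere else; the temporary directory is purely illustrative.

```ruby
require 'tmpdir'

# Directory containing this script, resolved once to an absolute path.
script_dir = File.expand_path(File.dirname(__FILE__))
script_based = File.join(script_dir, 'page.rss')

pwd_based = nil
Dir.mktmpdir do |tmp|
  # Simulate running the script from a different working directory.
  Dir.chdir(tmp) do
    pwd_based = File.join(Dir.pwd, 'page.rss')
  end
end

puts "script-based: #{script_based}"
puts "pwd-based:    #{pwd_based}"
```

The script-based path stays the same wherever the script is launched from, while the Dir.pwd-based one follows the working directory.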

Ruby System Call Executing Before Script Finishes

I have a Ruby script that produces a LaTeX document using an ERB template. After the .tex file has been generated, I'd like to make a system call to compile the document with pdflatex. Here are the bones of the script:
class Book
# initialize the class, query a database to get attributes, create the book, etc.
my_book =
tex_file ="/path/to/raw/tex/template")
template =
f ="/path/to/tex/output.tex")
f.puts template.result
system "pdflatex /path/to/tex/output.tex"
The system line puts me in interactive tex input mode, as if the document were empty. If I remove the call, the document is generated as normal. How can I ensure that the system call isn't made until after the document is generated? In the meantime I'm just using a bash script that calls the ruby script and then pdflatex to get around the issue.
Opening the file this way creates a new stream whose contents won't be flushed to disk until it is closed, which happens when the script ends or when you close it manually.
This should work:
f = File.open("/path/to/tex/output.tex", 'w')
f.puts template.result
f.close
system "pdflatex /path/to/tex/output.tex"
Or a more friendly way:
File.open("/path/to/tex/output.tex", 'w') do |f|
  f.puts template.result
end
system "pdflatex /path/to/tex/output.tex"
File.open with a block will open the stream, make it accessible via the block variable (f in this example), and auto-close the stream after the block has executed. The 'w' mode will open or create the file; if the file already exists, it is truncated and its previous content is erased.
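A self-contained sketch of the block form, writing to a temporary directory instead of the real output path. The write_file helper and the sample content are hypothetical; the point is only that the file is complete on disk as soon as the block returns.

```ruby
require 'tmpdir'

# Write `content` to `path`; the block form closes (and flushes) the file
# before File.open returns.
def write_file(path, content)
  File.open(path, 'w') { |f| f.puts content }
end

dir = Dir.mktmpdir
path = File.join(dir, 'output.tex')
write_file(path, '\documentclass{article}')

# At this point an external command such as pdflatex would see the full file.
written = File.read(path)
puts written
```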
