Help with Ruby & PrinceXML - ruby

I'm trying to write a very simple markdown-like converter in ruby, then pass the output to PrinceXML (which is awesome). Prince basically converts html to pdf.
Here's my code:
#!/usr/bin/ruby
# USAGE: command source-file.txt target-file.pdf
# read argument 1 as input
text = File.read(ARGV[0])
# wrap paragraphs in paragraph tags
text = text.gsub(/^(.+)/, '<p>\1</p>')
# create a new temp file for processing
htmlFile = File.new('/tmp/sample.html', "w+")
# place the transformed text in the new file
htmlFile.puts text
# run prince
system 'prince /tmp/sample.html #{ARGV[1]}'
But this dumps an empty file to /tmp/sample.html. When I exclude calling prince, the conversion happens just fine.
What am I doing wrong?

It's possible that the file output is being buffered, and not written to disk, because of how you are creating the output file. Try this instead:
# create a new temp file for processing
File.open('/tmp/sample.html', "w+") do |htmlFile|
# place the transformed text in the new file
htmlFile.puts text
end
# run prince
system 'prince /tmp/sample.html #{ARGV[1]}'
This is idiomatic Ruby; We pass a block to File.new and it will automatically be closed when the block exits. As a by-product of closing the file, any buffered output will be flushed to disk, where your code in your system call can find it.

From the fine manual:
prince doc.html -o out.pdf
Convert doc.html to out.pdf.
I think your system call should look like this:
system "prince /tmp/sample.html -o #{ARGV[1]}"
Also note the switch to double quotes so that #{} interpolation will work. Without the double quotes, the shell will see this command:
prince /tmp/sample.html #{ARGV[1]}
and then it will ignore everything after # as a comment. I'm not sure why you end up with an empty /tmp/sample.html, I'd expect a PDF in /tmp/sample.pdf based on my reading of the documentation.

Related

How to replace the first few bytes of a file in Ruby without opening the whole file?

I have a 30MB XML file that contains some gibberish in the beginning, and so typically I have to remove that in order for Nokogiri to be able to parse the XML document properly.
Here's what I currently have:
contents = File.open(file_path).read
if contents[0..123].include? 'authenticate_response'
fixed_contents = File.open(file_path).read[123..-1]
File.open(file_path, 'w') { |f| f.write(fixed_contents) }
end
However, this actually causes the ruby script to open up the large XML file twice. Once to read the first 123 characters, and another time to read everything but the first 123 characters.
To solve the first issue, I was able to accomplish this:
contents = File.open(file_path).read(123)
However, now I need to remove these characters from the file without reading the entire file. How can I "trim" the beginning of this file without having to open the entire thing in memory?
You can open the file once, then read and check the "garbage" and finally pass the opened file directly to nokogiri for parsing. That way, you only need read the file once and don't need to write it at all.
File.open(file_path) do |xml_file|
if xml_file.read(123).include? 'authenticate_response'
# header found, nothing to do
else
# no header found. We rewind and let nokogiri parse the whole file
xml_file.rewind
end
xml = Nokogiri::XML.parse(xml_file)
# Now to whatever you want with the parsed XML document
end
Please refer to the documentation of IO#read, IO#rewind and Nokigiri::XML::Document.parse for details about those methods.

Changing information in a CSV file

I'm trying to write a ruby script that will read through a CSV file and prepend information to certain cells (for instance adding a path to a file). I am able to open and mutate the text just fine, but am having issues writing back to the CSV without overriding everything. This is a sample of what I have so far:
CSV.foreach(path) { |row|
text = row[0].to_s
new_text = "test:#{text}"
}
I would like to add something within that block that would then write new_textback to the same reference cell(row) in the file. The only way I have to found to write to a file is
CSV.open(path, "wb") { |row|
row << new_text
}
But I think that is bad practice since you are reopening the file within the file block already. Is there a better way I could do this?
EX: I have a CSV file that looks something like:
file,destination
test.txt,A101
and need it to be:
file,destination
path/test.txt,id:A101
Hope that makes sense. Thanks in advance!
Depending on the size if the file, you might consider loading the contents of the file into a local variable and then manipulating that, overwriting the original file.
lines = CSV.read(path)
File.open(path, "wb") do |file|
lines.each do |line|
text = line[0].to_s
line[0] = "test:#{text}" # Replace this with your editing logic
file.write CSV.generate_line(line)
end
end
Alternately, if the file is big, you could write each modified line to a new file along the way and then replace the old file with the new one at the end.
Given that you don't appear to be doing anything that draws on CSV capabilities, I'd recommend using Ruby's "in-place" option variable $-i.
Some of the stats software I use wants just the data, and can't deal with a header line. Here's a script I wrote a while back to (appear to) strip the first line out of one or more data files specified on the command-line.
#! /usr/bin/env ruby -w
#
# User supplies the name of one or more files to be "stripped"
# on the command-line.
#
# This script ignores the first line of each file.
# Subsequent lines of the file are copied to the new version.
#
# The operation saves each original input file with a suffix of
# ".orig" and then operates in-place on the specified files.
$-i = ".orig" # specify backup suffix
oldfilename = ""
ARGF.each do |line|
if ARGF.filename == oldfilename # If it's an old file
puts line # copy lines through.
else # If it's a new file remember it
oldfilename = ARGF.filename # but don't copy the first line.
end
end
Obviously you'd want to change the puts line pass-through to whatever edit operations you want to perform.
I like this solution because even if you screw it up, you've preserved your original file as its original name with .orig (or whatever suffix you choose) appended.

Copy yaml formatting (indent) from one file to another

A translator completely messed up a yaml file by copying everything into word (don't ask).
I have already cleaned up the file using regexes, but the indent (spacing) is now missing; everything starts at the first character:
es:
default_blocks:
thank_you_html: "thank you text"
instead of
en:
default_blocks:
thank_you_html: "thank you text"
Do you have a good idea on how to automatically copy the format/structure/indent from the correct file (say en.yml) to the corrupt one (say es.yml)? (I'm using textmate 2.0 as editor)
Thanks!
Assuming the original and the translation contain exactly the same strings per line (except for the indentation problem), a quick&dirty script scanning the leading whitespace may solve this:
#!/usr/bin/env ruby
# encoding: UTF-8
indented = File.readlines(ARGV[0]).map do |l|
l.scan(/^\s+/)[0]
end.zip(File.readlines(ARGV[1])).map { |e| e.join }.join
File.open(ARGV[1], "w") { |io| io.write(indented) }
Save it, make it executable and call
./script_name.rb en.yml es.yml
Wouldn't mess with Textmate if this is not a regular task, but you could easily transform this to a command and either prompt for the two files via a dialog or select both in the file browser, open one of them in the current tab and differentiate them via environment variables ($TM_FILEPATH, $TM_SELECTED_FILES)

Why won't gsub! change my files?

I am trying to do a simple find/replace on all text files in a directory, modifying any instance of [RAVEN_START: by inserting a string (in this case 'raven was here') before the line.
Here is the entire ruby program:
#!/usr/bin/env ruby
require 'rubygems'
require 'fileutils' #for FileUtils.mv('your file', 'new location')
class RavenParser
rawDir = Dir.glob("*.txt")
count = 0
rawDir.each do |ravFile|
#we have selected every text file, so now we have to search through the file
#and make the needed changes.
rav = File.open(ravFile, "r+") do |modRav|
#Now we've opened the file, and we need to do the operations.
if modRav
lines = File.open(modRav).readlines
lines.each { |line|
if line.match /\[RAVEN_START:.*\]/
line.gsub!(/\[RAVEN_START:/, 'raven was here '+line)
count = count + 1
end
}
printf("Total Changed: %d\n",count)
else
printf("No txt files found. \n")
end
end
#end of file replacing instructions.
end
# S
end
The program runs and compiles fine, but when I open up the text file, there has been no change to any of the text within the file. count increments properly (that is, it is equal to the number of instances of [RAVEN_START: across all the files), but the actual substitution is failing to take place (or at least not saving the changes).
Is my syntax on the gsub! incorrect? Am I doing something else wrong?
You're reading the data, updating it, and then neglecting to write it back to the file. You need something like:
# And save the modified lines.
File.open(modRav, 'w') { |f| f.puts lines.join("\n") }
immediately before or after this:
printf("Total Changed: %d\n",count)
As DMG notes below, just overwriting the file isn't properly paranoid as you could be interrupted in the middle of the write and lose data. If you want to be paranoid (which all of us should be because they really are out to get us), then you want to write to a temporary file and then do an atomic rename to replace the original file the new one. A rename generally only works when you stay within a single file system as there is no guarantee that the OS's temp directory (which Tempfile uses by default) will be on the same file system as modRav so File.rename might not even be an option with a Tempfile unless precautions are taken. But the Tempfile constructor takes a tmpdir parameter so we're saved:
modRavDir = File.dirname(File.realpath(modRav))
tmp = Tempfile.new(modRav, modRavDir)
tmp.write(lines.join("\n"))
tmp.close
File.rename(tmp.path, modRav)
You might want to stick that in a separate method (safe_save(modRav, lines) perhaps) to avoid further cluttering your block.
There is no gsub! in the post (except the title and question). I would actually recommend not using gsub!, but rather use the result of gsub -- avoiding mutability can help reduce a number of subtle bugs.
The line read from the file stream into a String is a copy and modifying it will not affect the contents of the file. (The general approach is to read a line, process the line, and write the line. Or do it all at once: read all lines, process all lines, write all processed lines. In either case, nothing is being written back to the file in the code in the post ;-)
Happy coding.
You're not using gsub!, you're using gsub. gsub! and gsub different methods, one does replacement on the object itself and the other does replacement then returns the result, respectively.
Change this
line.gsub(/\[RAVEN_START:/, 'raven was here '+line)
to this :
line.gsub!(/\[RAVEN_START:/, 'raven was here '+line)
or this:
line = line.gsub(/\[RAVEN_START:/, 'raven was here '+line)
See String#gsub for more info

Ruby Premature EOF?

I'm trying to write one file into another one in Ruby, but the output seems to stop prematurely.
Input file - large CSS file with base64 embedded fonts
Output file - basic html file.
#write some HTML before the CSS (works)
...
#write the external CSS (doesn't work, output finished prematurely)
while !ext_css_file.eof()
out_file.puts(ext_css_file.read())
end
...
#write some HTML after the CSS (works)
The resulting file is basically a valid HTML file, with a truncated CSS (in the middle of an embedded font)
When doing a puts on the result of read(), I get the same result: The CSS file is read only up to this last string: "RMSHhoPCAGt/mELDBESFBQSggGfAgESKCUAAAAAAAwAlgABAAAAAAABAAUADAABAAAAAAAC"
It is difficult to provide a detailed solution without more insight into what the CSS file actually contains. Based on your code above, I would try something like this instead:
#write some HTML before the CSS (works)
...
#write the external CSS (doesn't work, output finished prematurely)
out_file.puts(ext_css_file.read())
...
#write some HTML after the CSS (works)
I don't think you need the .eof check because the read method reads and returns the entire file contents, or an empty string or nil if at the end of file. See here: http://apidock.com/ruby/IO/read
I would tend to read and write the same type of data. For instance if I were writing data into the new file using puts, I would read data using readlines. If I were writing binary data using write, I would read the data using read. I would be consistent with either strings or bytes and not mix the two.
Try something like this...
File.open('writable_file_path', 'w') do |f|
# f.puts "some html"
f.puts IO.readlines('css_file_path')
# f.puts "some more html"
end

Resources