Ruby Premature EOF? - ruby

I'm trying to write one file into another one in Ruby, but the output seems to stop prematurely.
Input file - large CSS file with base64 embedded fonts
Output file - basic html file.
#write some HTML before the CSS (works)
...
#write the external CSS (doesn't work, output finished prematurely)
while !ext_css_file.eof()
  out_file.puts(ext_css_file.read())
end
...
#write some HTML after the CSS (works)
The resulting file is basically a valid HTML file, with a truncated CSS (in the middle of an embedded font)
When doing a puts on the result of read(), I get the same result: The CSS file is read only up to this last string: "RMSHhoPCAGt/mELDBESFBQSggGfAgESKCUAAAAAAAwAlgABAAAAAAABAAUADAABAAAAAAAC"

It is difficult to provide a detailed solution without more insight into what the CSS file actually contains. Based on your code above, I would try something like this instead:
#write some HTML before the CSS (works)
...
#write the external CSS in a single call (no eof loop)
out_file.puts(ext_css_file.read())
...
#write some HTML after the CSS (works)
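I don't think you need the .eof check: called without a length argument, read returns everything from the current position to the end of the file, and returns an empty string if it is already at end of file (with a length argument it returns nil at end of file instead). See here: http://apidock.com/ruby/IO/read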
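To make the behaviour of read concrete, here is a minimal sketch. It uses a StringIO with a made-up CSS string as a stand-in for ext_css_file, which is an assumption for the demo; a real File behaves the same way for these calls.

```ruby
require 'stringio'

# Stand-in for ext_css_file; the CSS text is invented for this demo.
io = StringIO.new("body { color: red; }\n")

whole   = io.read      # no length: everything up to end of file
at_eof  = io.read      # already at end of file: empty string
limited = io.read(10)  # length given while at end of file: nil

p whole
p at_eof
p limited
```

Since a single read already consumes the whole file, the while loop around it runs at most once and adds nothing.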
I would read and write the same kind of data. For instance, if I were writing lines into the new file using puts, I would read them using readlines. If I were writing binary data using write, I would read the data using read. I would be consistent with either strings or bytes and not mix the two.
Try something like this...
File.open('writable_file_path', 'w') do |f|
  # f.puts "some html"
  f.puts IO.readlines('css_file_path')
  # f.puts "some more html"
end
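If you would rather stay on the "bytes in, bytes out" side of that advice, a sketch could look like the following. The helper name embed_css, the HTML fragments and the paths are all assumptions for illustration, and it is only a guess that binary mode matters for your particular file.

```ruby
# Hypothetical helper: writes the CSS file's bytes between two HTML fragments.
# The method name, the fragments and both paths are made up for this sketch.
def embed_css(css_path, out_path)
  File.open(out_path, 'wb') do |out|       # "b": bytes pass through without newline conversion
    out.write("<html><head><style>\n")
    out.write(File.binread(css_path))      # read raw bytes, write raw bytes
    out.write("\n</style></head><body></body></html>\n")
  end
end
```

Called as embed_css('css_file_path', 'writable_file_path'), it reads the CSS once and never goes through line-based methods.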

Related

How to replace the first few bytes of a file in Ruby without opening the whole file?

I have a 30MB XML file that contains some gibberish in the beginning, and so typically I have to remove that in order for Nokogiri to be able to parse the XML document properly.
Here's what I currently have:
contents = File.open(file_path).read
if contents[0..123].include? 'authenticate_response'
  fixed_contents = File.open(file_path).read[123..-1]
  File.open(file_path, 'w') { |f| f.write(fixed_contents) }
end
However, this actually causes the ruby script to open up the large XML file twice. Once to read the first 123 characters, and another time to read everything but the first 123 characters.
To solve the first issue, I was able to accomplish this:
contents = File.open(file_path).read(123)
However, now I need to remove these characters from the file without reading the entire file. How can I "trim" the beginning of this file without having to open the entire thing in memory?
You can open the file once, read and check the "garbage", and then pass the opened file directly to Nokogiri for parsing. That way, you only need to read the file once and don't need to write it at all.
File.open(file_path) do |xml_file|
  if xml_file.read(123).include? 'authenticate_response'
    # header found; the read has already moved past it, so there is nothing to do
  else
    # no header found. We rewind and let nokogiri parse the whole file
    xml_file.rewind
  end
  xml = Nokogiri::XML.parse(xml_file)
  # Now do whatever you want with the parsed XML document
end
Please refer to the documentation of IO#read, IO#rewind and Nokogiri::XML::Document.parse for details about those methods.
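The read-then-rewind step can be tried without Nokogiri. The sketch below uses StringIO objects with invented contents in place of the real XML file, and skip_header is a hypothetical helper name; the marker and the length of 123 come from the question.

```ruby
require 'stringio'

# Hypothetical helper: leaves `io` positioned just past the junk header if one
# is present, or at the very start if it is not.
def skip_header(io, marker: 'authenticate_response', length: 123)
  head = io.read(length).to_s            # read returns nil on an empty stream
  io.rewind unless head.include?(marker)
  io
end

# Made-up inputs: a 123-character junk header followed by XML, and plain XML.
junk = ('authenticate_response' + ' ' * 123)[0, 123]
with_header    = StringIO.new(junk + '<root/>')
without_header = StringIO.new('<root/>')

puts skip_header(with_header).read
puts skip_header(without_header).read
```

Whatever skip_header returns can then be handed to the parser in place of xml_file.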

Append new lines to a csv from json.parse

I'm more of a sysadmin (Chef) than a Ruby guy, so this may be a five-minute fix.
I am working on a task where I write a Ruby script that pulls JSON data from multiple files, parses it, and writes the desired fields to a single .csv file. Basically, it pulls metadata about AWS accounts and puts it in an accountant-friendly format.
I got a lot of help from another Stack Overflow question ("json.parse help") on how to solve the problem for a single file.
My issue is that I am trying to pull the same data from multiple JSON files in an array. I can get it to loop through each file with the code below.
require 'csv'
require "json"
delim_file = CSV.open("delimited_test.csv", "w")
aws_account_list = %w(example example2)
aws_account_list.each do |account|
  json_file = File.read(account.to_s + "_aws.json")
  parsed_json = JSON.parse(json_file)
  delim_file = CSV.open("delimited_test.csv", "w")
  # This next line could be a problem if you ran this code multiple times
  delim_file << ["EbsOptimized", "PrivateDnsName", "KeyName", "AvailabilityZone", "OwnerId"]
  parsed_json['Reservations'].each do |inner_json|
    inner_json['Instances'].each do |instance_json|
      delim_file << [[instance_json['EbsOptimized'].to_s, instance_json['PrivateDnsName'], instance_json['KeyName'], instance_json['Placement']['AvailabilityZone'], inner_json['OwnerId']],[]]
    end
    delim_file.close
  end
end
However, every time I run it, it overwrites the same single row in the .csv file. I have tried adding a \n string to the end of the array, and converting the array to a string with hashes and adding a \n, but all that does is add a line to the same row that it overwrites.
How would I go about writing it so that it reads each JSON file and then appends each file's metadata to a new row? This looks like a simple case of writing the right loop, but I can't figure it out.
You declared your file like this:
delim_file = CSV.open("delimited_test.csv", "w")
To fix your issue, all you have to do is change "w" to "a":
delim_file = CSV.open("delimited_test.csv", "a")
See the docs for IO.new for a description of the available file modes. In short, w creates an empty file at that path, truncating any existing file, and writes to that. a only creates the file if it doesn't exist, and appends otherwise. Because you currently use w, the file is overwritten every time it is opened, which in your code happens once per account. With a, each write is appended to what's already there.
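As a small illustration of the two modes, here is a sketch that uses a throwaway file in a temporary directory. The helper name mode_demo and the file name are made up for the demo and are not part of your script.

```ruby
require 'tmpdir'

# Returns the file contents after two "w" opens, and again after one "a" open.
def mode_demo(path)
  File.open(path, 'w') { |f| f.puts 'first' }
  File.open(path, 'w') { |f| f.puts 'second' }  # truncates: "first" is gone
  after_w = File.read(path)
  File.open(path, 'a') { |f| f.puts 'third' }   # appends after "second"
  [after_w, File.read(path)]
end

Dir.mktmpdir do |dir|
  after_w, after_a = mode_demo(File.join(dir, 'modes.txt'))
  p after_w
  p after_a
end
```

CSV.open takes the same mode strings, so the same difference applies to delim_file.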
You need to open the file in append mode. Use:
delim_file = CSV.open("delimited_test.csv", "a")
'a' Write-only, starts at end of file if file exists, otherwise creates a new file for writing.
'a+' Read-write, starts at end of file if file exists, otherwise creates a new file for reading and writing.
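An alternative to append mode is to open the CSV once, outside the loop, so the header is written a single time and re-running the script starts from a clean file. This is only a sketch: write_instances_csv, CSV_HEADER and the parameter names are hypothetical, and the JSON layout is assumed to match the one in the question.

```ruby
require 'csv'
require 'json'

CSV_HEADER = %w[EbsOptimized PrivateDnsName KeyName AvailabilityZone OwnerId].freeze

# Hypothetical helper: json_paths and csv_path are illustrative parameters.
def write_instances_csv(json_paths, csv_path)
  CSV.open(csv_path, 'w') do |csv|           # opened once, closed when the block exits
    csv << CSV_HEADER                        # header written a single time
    json_paths.each do |json_path|
      parsed = JSON.parse(File.read(json_path))
      parsed['Reservations'].each do |reservation|
        reservation['Instances'].each do |instance|
          csv << [
            instance['EbsOptimized'].to_s,
            instance['PrivateDnsName'],
            instance['KeyName'],
            instance['Placement']['AvailabilityZone'],
            reservation['OwnerId']
          ]
        end
      end
    end
  end
end
```

With the account list from the question, the call would be something like write_instances_csv(%w(example example2).map { |a| a + "_aws.json" }, "delimited_test.csv"). Each instance becomes one flat row, rather than the nested array the original code pushes.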

Only "puts" one line to a text document

The code I'm working with at the moment is supposed to spit back every line of information in one text document that contains the word "DEBUG" and then paste it in a new text document titled "debug.txt".
For whatever reason it is only printing the final line into the new text document and I have no clue why. However, another function is to spit back every line to the command terminal, and it does that successfully, it just won't write them all to the file.
log_file = File.open("main_file.rb")
File.readlines(log_file).each do |line|
  if line.include? "DEBUG"
    puts line
    File.open("debug.txt", "w") do |out|
      out.puts line
    end
  end
end
You're overwriting the file every time you find a DEBUG line in main_file: each File.open("debug.txt", "w") truncates it, so only the last matching line survives. You have your blocks backwards. The File.open('debug.txt') should be outside of the File.readlines.
Like this:
log_file = File.open("main_file.rb")
File.open("debug.txt", "w") do |out|
  File.readlines(log_file).each do |line|
    if line.include? "DEBUG"
      puts line
      out.puts line
    end
  end
end
You could also open the file in append mode by passing 'a' instead of 'w' in your File.open('debug.txt') call, but this would needlessly reopen the file every time you find a line that contains DEBUG. It would be better to open the debug file once for writing and use that file handle from there on, as I show above.
Write it like this:
File.open("debug.txt", "w") do |out|
  File.foreach("main_file.rb") do |line|
    if line['DEBUG']
      puts line
      out.puts line
    end
  end
end
You need to:
1. Open the output file.
2. Iterate over the lines in the input file.
3. For each line, check to see if it contains the string you want.
4. If so, write it.
5. Loop until the input file is completely read.
6. Close the output file.
Notice I don't open and close the output file as separate steps. Ruby's use of blocks is really handy: when you pass a block to open, Ruby closes the file when the block exits, avoiding the problem of open files hanging around to clutter memory or consume available file handles.
Use foreach to read the file. It reads a single line at a time and is extremely fast. It's also scalable, which means it'll work for a one-line file or a 10-million line file equally well. Using readlines, as in your code, results in Ruby loading the entire file into memory, splitting it into separate lines, then iterating over them. That can cause real problems if your input file exceeds available RAM.
line['DEBUG'] is shorthand for "do a substring match for this text". See String#[] for more information.
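A quick sketch of what that shorthand returns, using a made-up log line rather than anything from your file:

```ruby
line = "2024-01-01 12:00:00 DEBUG cache miss\n"  # invented sample line

p line['DEBUG']            # the matched substring, which is truthy
p line['ERROR']            # nil, so an `if` on it is skipped
p line.include?('DEBUG')   # the explicit boolean equivalent
```

Because the result is either a string or nil, it drops straight into an if without any comparison.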

Why won't gsub! change my files?

I am trying to do a simple find/replace on all text files in a directory, modifying any instance of [RAVEN_START: by inserting a string (in this case 'raven was here') before the line.
Here is the entire ruby program:
#!/usr/bin/env ruby
require 'rubygems'
require 'fileutils' #for FileUtils.mv('your file', 'new location')
class RavenParser
  rawDir = Dir.glob("*.txt")
  count = 0
  rawDir.each do |ravFile|
    #we have selected every text file, so now we have to search through the file
    #and make the needed changes.
    rav = File.open(ravFile, "r+") do |modRav|
      #Now we've opened the file, and we need to do the operations.
      if modRav
        lines = File.open(modRav).readlines
        lines.each { |line|
          if line.match /\[RAVEN_START:.*\]/
            line.gsub!(/\[RAVEN_START:/, 'raven was here '+line)
            count = count + 1
          end
        }
        printf("Total Changed: %d\n",count)
      else
        printf("No txt files found. \n")
      end
    end
    #end of file replacing instructions.
  end
  # S
end
The program runs and compiles fine, but when I open up the text file, there has been no change to any of the text within the file. count increments properly (that is, it is equal to the number of instances of [RAVEN_START: across all the files), but the actual substitution is failing to take place (or at least not saving the changes).
Is my syntax on the gsub! incorrect? Am I doing something else wrong?
You're reading the data, updating it, and then neglecting to write it back to the file. You need something like:
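# And save the modified lines. readlines keeps each line's trailing newline,
# so a plain join is enough; joining with "\n" would double every line break.
File.open(modRav, 'w') { |f| f.write lines.join }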
immediately before or after this:
printf("Total Changed: %d\n",count)
As DMG notes below, just overwriting the file isn't properly paranoid, as you could be interrupted in the middle of the write and lose data. If you want to be paranoid (which all of us should be because they really are out to get us), then you want to write to a temporary file and then do an atomic rename to replace the original file with the new one. A rename generally only works when you stay within a single file system, and there is no guarantee that the OS's temp directory (which Tempfile uses by default) will be on the same file system as modRav, so File.rename might not even be an option with a Tempfile unless precautions are taken. But the Tempfile constructor takes a tmpdir parameter, so we're saved:
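require 'tempfile'
modRavDir = File.dirname(File.realpath(modRav.path))
# Tempfile.new expects a String (or Array) basename, not a File object
tmp = Tempfile.new(File.basename(modRav.path), modRavDir)
tmp.write(lines.join)  # lines from readlines already end in newlines
tmp.close
File.rename(tmp.path, modRav.path)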
You might want to stick that in a separate method (safe_save(modRav, lines) perhaps) to avoid further cluttering your block.
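A minimal sketch of such a method follows. It assumes path is a String path to an existing file and lines is an array of newline-terminated strings, as readlines returns; the parameter names are illustrative.

```ruby
require 'tempfile'

# Hypothetical helper, named after the suggestion above.
def safe_save(path, lines)
  dir = File.dirname(File.realpath(path))
  tmp = Tempfile.new(File.basename(path), dir)  # same directory, so same file system
  tmp.write(lines.join)                         # lines already carry their newlines
  tmp.close
  File.rename(tmp.path, path)                   # swaps the new file in for the original
end
```

Inside the block you would call it as safe_save(ravFile, lines). One thing to watch: Tempfile creates its file with restrictive permissions, so you may want to restore the original file's mode afterwards.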
There is no gsub! in the post (except the title and question). I would actually recommend not using gsub!, but rather use the result of gsub -- avoiding mutability can help reduce a number of subtle bugs.
The line read from the file stream into a String is a copy and modifying it will not affect the contents of the file. (The general approach is to read a line, process the line, and write the line. Or do it all at once: read all lines, process all lines, write all processed lines. In either case, nothing is being written back to the file in the code in the post ;-)
Happy coding.
You're not using gsub!, you're using gsub. gsub! and gsub are different methods: gsub! does the replacement on the object itself, while gsub does the replacement on a copy and returns the result.
Change this
line.gsub(/\[RAVEN_START:/, 'raven was here '+line)
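to this:
line.gsub!(/\[RAVEN_START:/, 'raven was here '+line)
or this (note that inside an each block the assignment only rebinds the block variable, so collect the results with map if you need the changed lines afterwards):
line = line.gsub(/\[RAVEN_START:/, 'raven was here '+line)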
See String#gsub for more info
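Here is a small sketch of the difference, using an invented sample line and a simplified replacement string rather than the one from the question:

```ruby
original = "[RAVEN_START: 42]\n"   # made-up sample line

# gsub returns a new string and leaves `original` untouched.
copy = original.gsub(/\[RAVEN_START:/, 'raven was here [RAVEN_START:')

# gsub! changes the receiver in place and returns it.
mutated = original.dup
result  = mutated.gsub!(/\[RAVEN_START:/, 'raven was here [RAVEN_START:')

# gsub! returns nil when nothing was substituted.
no_match = "plain text\n".dup.gsub!(/\[RAVEN_START:/, 'x')

p original
p copy
p mutated
p no_match
```

The nil return is worth remembering if you ever chain a call onto the result of gsub!.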

Help with Ruby & PrinceXML

I'm trying to write a very simple markdown-like converter in ruby, then pass the output to PrinceXML (which is awesome). Prince basically converts html to pdf.
Here's my code:
#!/usr/bin/ruby
# USAGE: command source-file.txt target-file.pdf
# read argument 1 as input
text = File.read(ARGV[0])
# wrap paragraphs in paragraph tags
text = text.gsub(/^(.+)/, '<p>\1</p>')
# create a new temp file for processing
htmlFile = File.new('/tmp/sample.html', "w+")
# place the transformed text in the new file
htmlFile.puts text
# run prince
system 'prince /tmp/sample.html #{ARGV[1]}'
But this dumps an empty file to /tmp/sample.html. When I exclude calling prince, the conversion happens just fine.
What am I doing wrong?
It's possible that the file output is being buffered, and not written to disk, because of how you are creating the output file. Try this instead:
# create a new temp file for processing
File.open('/tmp/sample.html', "w+") do |htmlFile|
  # place the transformed text in the new file
  htmlFile.puts text
end
# run prince
system 'prince /tmp/sample.html #{ARGV[1]}'
This is idiomatic Ruby: we pass a block to File.open, and the file is automatically closed when the block exits. As a by-product of closing the file, any buffered output is flushed to disk, where the command in your system call can find it.
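The two ways of making sure the data reaches the file can be sketched as follows. The helper names and the paths passed to them are assumptions for illustration only.

```ruby
# Block form: the file is closed, and so flushed, when the block exits.
def write_with_block(path, text)
  File.open(path, 'w') { |f| f.puts text }
  File.read(path)
end

# No block: close (or flush) explicitly before anything else reads the file.
def write_with_explicit_close(path, text)
  html_file = File.new(path, 'w')
  html_file.puts text
  html_file.close
  File.read(path)
end
```

Either way, the content is on disk by the time the method returns, so an external program started afterwards will see it.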
From the fine manual:
prince doc.html -o out.pdf
Convert doc.html to out.pdf.
I think your system call should look like this:
system "prince /tmp/sample.html -o #{ARGV[1]}"
Also note the switch to double quotes so that #{} interpolation will work. Without the double quotes, the shell will see this command:
prince /tmp/sample.html #{ARGV[1]}
and then it will ignore everything after # as a comment. I'm not sure why you end up with an empty /tmp/sample.html; based on my reading of the documentation, I'd expect a PDF in /tmp/sample.pdf.
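To see the quoting difference without running prince at all, here is a small sketch. The variable target stands in for ARGV[1], and prince_args is a hypothetical helper; the only prince option used is the -o shown in the manual excerpt above.

```ruby
target = 'out.pdf'   # stand-in for ARGV[1]

single = 'prince /tmp/sample.html -o #{target}'   # single quotes: no interpolation
double = "prince /tmp/sample.html -o #{target}"   # double quotes: interpolated

puts single
puts double

# Hypothetical helper: builds an argument list for the multi-argument form of
# system, which runs the program directly instead of going through a shell.
def prince_args(html_path, pdf_path)
  ['prince', html_path, '-o', pdf_path]
end

p prince_args('/tmp/sample.html', target)
```

With prince installed, system(*prince_args('/tmp/sample.html', ARGV[1])) would run it, and the output name would not need any shell quoting.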
