Zlib inflate error - ruby

I am trying to save compressed strings to a file and load them later for use in the game. I kept getting "in 'finish': buffer error" errors when loading the data back up for use. I came up with this:
require "zlib"
def deflate(string)
  zipper = Zlib::Deflate.new
  data = zipper.deflate(string, Zlib::FINISH)
end
def inflate(string)
  zstream = Zlib::Inflate.new
  buf = zstream.inflate(string)
  zstream.finish
  zstream.close
  buf
end
setting = ["nothing","nada","nope"]
taggedskills = ["nothing","nada","nope","nuhuh"]
File.open('testzip.txt','wb') do |w|
  w.write(deflate("hello world")+"\n")
  w.write(deflate("goodbye world")+"\n")
  w.write(deflate("etc")+"\n")
  w.write(deflate("etc")+"\n")
  w.write(deflate("Setting: name "+setting[0]+" set"+(setting[1].class == String ? "str" : "num")+" "+setting[1].to_s)+"\n")
  w.write(deflate("Taggedskill: "+taggedskills[0]+" "+taggedskills[1]+" "+taggedskills[2]+" "+taggedskills[3])+"\n")
  w.write(deflate("etc")+"\n")
end
File.open('testzip.txt','rb') do |file|
  file.each do |line|
    p inflate(line)
  end
end
It throws the error at the "Taggedskill:" line. I don't know why, but changing the prefix to "Skilltag:", "Skillt:", etc. still produces a buffer error, whereas prefixes like "Setting:" or "Thing:" work fine, and changing the setting line to "Taggedskill:" also works fine. What is going on here?

In testzip.txt, you are storing newline-separated binary blobs. However, the compressed data may itself contain newline bytes, so when you open testzip.txt and split it by line, you may end up splitting one binary blob that inflate would understand into two binary blobs that it does not understand.
Try running wc -l testzip.txt after you get the error. You'll see that the file contains one more line than the number of lines you wrote.
What you need to do is compress the whole file at once, not line by line.
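A minimal sketch of the whole-file approach, assuming the records are plain text that never contains a newline itself. The helper names save_lines and load_lines and the file name testzip.bin are made up for illustration:

```ruby
require "zlib"
require "tmpdir"

# Join all records, compress them as one blob, and write it in binary mode.
def save_lines(path, lines)
  File.binwrite(path, Zlib::Deflate.deflate(lines.join("\n")))
end

# Read the whole blob back, inflate it once, then split into records.
def load_lines(path)
  text = Zlib::Inflate.inflate(File.binread(path))
  # The inflated bytes carry no text encoding; UTF-8 is an assumption here.
  text.force_encoding(Encoding::UTF_8).split("\n")
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "testzip.bin")
  save_lines(path, ["hello world", "Taggedskill: nothing nada nope nuhuh", "etc"])
  p load_lines(path)
end
```

Because the file is read back as a single blob, newline bytes inside the compressed data no longer matter. If each record has to be compressed individually, prefixing every blob with its byte length instead of a newline is another option.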

Related

Assign File.readlines[n] to variable

I'm reading a text file which has instructions on each line. I want to assign the text on each line to its own variable. When I do this, the value returned is nil, but when I output the value of readlines[n] it is correct.
e.g.
# Using the variable (incorrect result)
puts current_zone_size
>
e.g.
# Using readlines after variable assignment (incorrect result)
current_zone_size = instructions.readlines[0]
instructions.readlines[0]
>
e.g.
# Using readlines (correct result)
instructions.readlines[0]
> 8 10
This is my code:
instructions = File.open("operator-input.txt", "r")
current_zone_size = instructions.readlines[0]
rover_init_location_orientation = instructions.readlines[1]
rover_movements = instructions.readlines[2]
This is the text in the file being read:
8 10
1 2 E
MMLMRMMRRMML
Edit:
Is the file being closed? Is this the reason I can't assign values from File.readlines[n] to variables if I'm not doing the variable assignment from within a block?
Also, the file will only ever have three lines which is why I'm not using a loop to read the lines.
IO#readlines reads all the lines in the file. It should not come as a surprise that, in order to read all the lines in the file, it has to read the entire file.
So, where is the file pointer after you read the entire file? It is at the end of the file.
What happens if you call IO#readlines the second time, when the file pointer is still at the end of the file? It will start reading at the position of the file pointer, which means it will read an empty file.
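The effect can be observed directly. The following is a small sketch, assuming a temporary file with the same three lines as in the question; the helper name read_twice is made up for illustration:

```ruby
require "tempfile"

# Call readlines twice on the same handle and return both results.
def read_twice(path)
  File.open(path, "r") do |f|
    first = f.readlines    # reads everything; the file pointer is now at the end
    second = f.readlines   # nothing is left to read
    [first, second]
  end
end

Tempfile.create("operator-input") do |tmp|
  tmp.write("8 10\n1 2 E\nMMLMRMMRRMML\n")
  tmp.flush
  first, second = read_twice(tmp.path)
  p first.length  # 3
  p second        # []
  p second[1]     # nil
end
```

Indexing into the empty array returned by the second call is what produces the nil values.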
Therefore, if you want to do it the way you are doing it, you need to reset the file pointer to the beginning of the file every time you call IO#readlines:
instructions = File.open('operator-input.txt', 'r')
current_zone_size = instructions.readlines[0]
instructions.pos = 0
rover_init_location_orientation = instructions.readlines[1]
instructions.pos = 0
rover_movements = instructions.readlines[2]
Note also that you are leaking resources: you never close the file, so it will be closed at the earliest by Ruby when the instructions variable goes out of scope and the File instance gets garbage-collected, and at the latest by the OS when your Ruby process exits, which may be a long time later. So, your code should rather be:
instructions = File.open('operator-input.txt', 'r')
current_zone_size = instructions.readlines[0]
instructions.pos = 0
rover_init_location_orientation = instructions.readlines[1]
instructions.pos = 0
rover_movements = instructions.readlines[2]
instructions.close
In general, it is much better to use the block form of File::open, which closes the file handle automatically for you at the end of the block, and also ensures that this happens even in the case of complex control flow, errors, or exceptions:
File.open('operator-input.txt', 'r') do |instructions|
  current_zone_size = instructions.readlines[0]
  instructions.pos = 0
  rover_init_location_orientation = instructions.readlines[1]
  instructions.pos = 0
  rover_movements = instructions.readlines[2]
end
Note, however, that what you want to do is horribly inefficient: you read the entire file, then take the first line, throw the rest away. Then you read the entire file again, take the second line, throw the rest away. Then you read the entire file again, take the third line, throw the rest away.
It makes much more sense to read the entire file once and then take the lines you need. Something like this:
File.open('operator-input.txt', 'r') do |instructions|
  current_zone_size, rover_init_location_orientation, rover_movements =
    instructions.readlines
end
However, in the case where all you do is open the file, read all lines, then immediately close it again, you should rather use the IO::readlines method instead of IO#readlines, since it does all three things for you in one call:
current_zone_size, rover_init_location_orientation, rover_movements =
File.readlines('operator-input.txt')
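One detail worth keeping in mind is that each element returned by readlines still ends with its newline. A small sketch, assuming Ruby 2.4 or later (where readlines accepts chomp: true) and a temporary file standing in for operator-input.txt; the helper name read_instructions is made up for illustration:

```ruby
require "tempfile"

# Read all lines in one call, dropping the trailing newline of each.
def read_instructions(path)
  File.readlines(path, chomp: true)
end

Tempfile.create("operator-input") do |tmp|
  tmp.write("8 10\n1 2 E\nMMLMRMMRRMML\n")
  tmp.flush
  current_zone_size, rover_init_location_orientation, rover_movements =
    read_instructions(tmp.path)
  p current_zone_size                # "8 10"
  p rover_init_location_orientation  # "1 2 E"
  p rover_movements                  # "MMLMRMMRRMML"
end
```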
I ended up reading all the lines at once; now I'm able to set each variable outside of a block. Like this:
instructions = File.readlines "operator-input.txt"
current_zone_size = instructions[0]
rover_init_location_orientation = instructions[1]
rover_movements = instructions[2]
e.g.
puts current_zone_size
> 8 10

Ruby GPGME - How to encrypt large files

I'm having difficulty encrypting large files (bigger than available memory) using GPGME in Ruby.
#!/usr/bin/ruby
require 'gpgme'
def gpgfile(localfile)
  crypto = GPGME::Crypto.new
  filebasename = File.basename(localfile)
  filecripted = crypto.encrypt File.read(localfile), :recipients => "info@address.com", :always_trust => true
  File.open("#{localfile}.gpg", 'w') { |file| file.write(filecripted) }
end
gpgfile("/home/largefile.data")
In this case I got an error of memory allocation:
"read: failed to allocate memory (NoMemoryError)"
Can someone explain how to read the source file chunk by chunk (100 MB at a time, for example) and write the chunks out as they are encrypted?
The most obvious problem is that you're reading the entire file into memory with File.read(localfile). The Crypto#encrypt method will take an IO object as its input, so instead of File.read(localfile) (which returns the contents of the file as a string) you can pass it a File object. Likewise, you can give an IO object as the :output option, letting you write the output directly to a file instead of in memory:
def gpgfile(localfile)
  infile = File.open(localfile, 'r')
  outfile = File.open("#{localfile}.gpg", 'w')
  crypto = GPGME::Crypto.new
  crypto.encrypt(infile, recipients: "info@address.com",
                         output: outfile,
                         always_trust: true)
ensure
  # Either handle may still be nil if its File.open call raised
  infile.close if infile
  outfile.close if outfile
end
I've never used ruby-gpgme, so I'm not 100% sure this will solve your problem since it depends a bit on what ruby-gpgme does behind the scenes, but from the docs and the source I've peeked at it seems like a sanely-built gem so I'm guessing this will do the trick.

How do I get the entirety of an uncompressed gzip file using Zlib?

I am trying to uncompress an 823,000-line file, but I'm only receiving 26,000 lines of it. I'm new to I/O and, for some reason, not grasping why this is the case. Here is my code:
Zlib::GzipReader.open( file_path ) do |gz|
  puts gz.readlines.count
end
Any direction would be appreciated.
Thanks in advance.
Ok, so I managed to fix this.
It turns out the server log file I was using had about 29 streams of data in it. Zlib::GzipReader only read the first one. In order to fix it, I had to loop through until all 29 streams had been read:
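uncompressed = ''
File.open( file_path, 'rb' ) do |file|
  zio = file
  loop do
    io = Zlib::GzipReader.new( zio )
    uncompressed += io.read # where I'm writing my file
    unused = io.unused # bytes read past the end of the current stream, or nil
    io.finish # close the reader without closing the underlying file
    break if unused.nil?
    zio.pos -= unused.length
  end
end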
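The same loop can be tried on in-memory data. This is a sketch only: the helper name gunzip_all is made up, and the two-stream input is built with Zlib.gzip purely so the example is self-contained:

```ruby
require "zlib"
require "stringio"

# Decompress every gzip stream found in io, not just the first one.
def gunzip_all(io)
  out = +""
  until io.eof?
    gz = Zlib::GzipReader.new(io)
    out << gz.read
    unused = gz.unused  # bytes the reader buffered beyond this stream, or nil
    gz.finish           # release the reader but keep the underlying io open
    io.pos -= unused.length if unused
  end
  out
end

# Two gzip streams back to back, as produced by concatenating .gz files.
data = Zlib.gzip("first\n") + Zlib.gzip("second\n")
p gunzip_all(StringIO.new(data))  # "first\nsecond\n"
```

Looping until the underlying io reports eof?, rather than stopping when unused is nil, is a design choice of this sketch and not something the answer above states.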

ruby gedcom parser EOF exception

I need to parse GEDCOM 5.5 files for an analysis project.
The first Ruby parser I found causes a "stack level too deep" error, so I tried to find alternatives. I found this project: https://github.com/jslade/gedcom-ruby
There are some samples included, but I can't get them to work either.
Here is the parser itself: https://github.com/jslade/gedcom-ruby/blob/master/lib/gedcom.rb
If I try the sample like this:
ruby ./samples/count.rb ./samples/royal.ged
I get the following error:
D:/rails_projects/gedom_test/lib/gedcom.rb:185:in `readchar': end of file reached (EOFError)
I added a "puts" to every method to understand the flow better; this is the output up to the point where the exception is raised:
Parsing './samples/royal.ged'...
INIT
BEFORE
CHECK_PROC_OR_BLOCK
BEFORE
CHECK_PROC_OR_BLOCK
PARSE
PARSE_FILE
PARSE_IO
DETECT_RS
The exact line that causes the trouble is
while ch = io.readchar
in the detect_rs method:
# valid gedcom may use either of \r or \r\n as the record separator.
# just in case, also detects simple \n as the separator as well
# detects the rs for this string by scanning ahead to the first occurence
# of either \r or \n, and checking the character after it
def detect_rs io
  puts "DETECT_RS"
  rs = "\x0d"
  mark = io.pos
  begin
    while ch = io.readchar
      case ch
      when 0x0d
        ch2 = io.readchar
        if ch2 == 0x0a
          rs = "\x0d\x0a"
        end
        break
      when 0x0a
        rs = "\x0a"
        break
      end
    end
  ensure
    io.pos = mark
  end
  rs
end
I hope someone can help me with this.
The readchar method of Ruby's IO class will raise an EOFError when it encounters the end of the file. http://www.ruby-doc.org/core-2.1.1/IO.html#method-i-readchar
The gedcom-ruby gem hasn't been touched in years, but a fork of it was made a couple of years ago to fix this very problem.
Basically it changes:
while ch = io.readchar
to
while !io.eof && ch = io.readchar
You can get the fork of the gem here: https://github.com/trentlarson/gedcom-ruby
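For illustration, here is a sketch of how an EOF-safe separator detection could look. It is not the code of the gem or of the fork; it assumes Ruby 1.9 or later, where readchar returns a one-character String rather than an Integer, so the comparisons are made against "\r" and "\n":

```ruby
require "stringio"

# Hypothetical rewrite of detect_rs: scan ahead to the first \r or \n,
# never call readchar at end of file, and restore the original position.
def detect_rs(io)
  rs = "\r"
  mark = io.pos
  begin
    until io.eof?
      case io.readchar
      when "\r"
        rs = "\r\n" if !io.eof? && io.readchar == "\n"
        break
      when "\n"
        rs = "\n"
        break
      end
    end
  ensure
    io.pos = mark
  end
  rs
end

p detect_rs(StringIO.new("0 HEAD\r\n1 SOUR demo\r\n"))  # "\r\n"
p detect_rs(StringIO.new("0 HEAD"))                     # "\r"
```

The second call shows the case that raised EOFError in the question: no separator is found, the loop ends at end of file, and the default is returned.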

Is there a way to remove the BOM from a UTF-8 encoded file?

Is there a way to remove the BOM from a UTF-8 encoded file?
I know that all of my JSON files are encoded in UTF-8, but the data entry person who edited the JSON files saved it as UTF-8 with the BOM.
When I run my Ruby scripts to parse the JSON, it is failing with an error.
I don't want to manually open 58+ JSON files and convert to UTF-8 without the BOM.
With ruby >= 1.9.2 you can use the mode r:bom|utf-8
This should work (I haven't tested it in combination with JSON):
json = nil # define the variable outside the block to keep the data
File.open('file.txt', "r:bom|utf-8"){|file|
  json = JSON.parse(file.read)
}
It doesn't matter whether the file contains a BOM or not.
Andrew remarked that File#rewind can't be used together with the BOM mode, because it goes back to position 0, in front of the BOM.
If you need a rewind function, you must remember the position after opening the file and replace rewind with pos=:
# Prepare test file
File.open('file.txt', "w:utf-8"){|f|
  f << "\xEF\xBB\xBF" # add BOM
  f << 'some content'
}
# Read file and skip BOM if available
File.open('file.txt', "r:bom|utf-8"){|f|
  pos = f.pos
  p content = f.read # read and write file content
  f.pos = pos # f.rewind goes to pos 0
  p content = f.read # (re)read and write file content
}
So, the solution was to do a search and replace on the BOM via gsub!
I forced the encoding of the string to UTF-8 and also forced the regex pattern to be encoded in UTF-8.
I was able to derive a solution by looking at http://self.d-struct.org/195/howto-remove-byte-order-mark-with-ruby-and-iconv and http://blog.grayproductions.net/articles/ruby_19s_string
def read_json_file(file_name, index)
  # Read the raw contents and tag them as UTF-8; the block form closes the file
  content = File.open("#{file_name}\\game.json", "r") { |file| file.read.force_encoding("UTF-8") }
  # Remove the UTF-8 byte order mark
  content.gsub!("\xEF\xBB\xBF".force_encoding("UTF-8"), '')
  json = JSON.parse(content)
  print json
end
You can also specify the encoding with the File.read and CSV.read methods, without specifying the read mode:
File.read(path, :encoding => 'bom|utf-8')
CSV.read(path, :encoding => 'bom|utf-8')
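For the situation in the question (many JSON files saved with a BOM), the File.read form can be wrapped in small helpers. This is a sketch under assumptions: the helper names read_without_bom and strip_bom! are made up, and the temporary game.json file only stands in for the real data:

```ruby
require "json"
require "tmpdir"

# Read a UTF-8 file, skipping a leading BOM if one is present.
def read_without_bom(path)
  File.read(path, encoding: "bom|utf-8")
end

# Rewrite the file in place as UTF-8 without a BOM.
def strip_bom!(path)
  File.write(path, read_without_bom(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "game.json")
  File.binwrite(path, "\xEF\xBB\xBF".b + '{"name":"demo"}')
  strip_bom!(path)
  p File.binread(path).start_with?("\xEF\xBB\xBF".b)  # false
  p JSON.parse(read_without_bom(path))
end
```

Calling strip_bom! on each path from something like Dir.glob would convert a whole directory, while read_without_bom alone is enough if the files should stay untouched.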
The "bom|UTF-8" encoding works well if you only read the file once, but fails if you ever call File#rewind, as I was doing in my code. To address this, I did the following:
def ignore_bom
  return unless @file.pos == 0
  first = @file.getc
  # Put the character back unless it is the BOM (ungetc needs the character to push back)
  @file.ungetc(first) if first && first != "\xEF\xBB\xBF".force_encoding("UTF-8")
end
This seems to work well. I'm not sure whether there are other similar characters to look out for, but they could easily be built into this method, which can be called any time you rewind or open the file.
Server side cleanup of utf-8 bom bytes that worked for me:
csv_text.gsub!("\xEF\xBB\xBF".force_encoding(Encoding::BINARY), '')

Resources