How do I get the entirety of an uncompressed gzip file using Zlib? - ruby

I am trying to uncompress an 823,000-line file, but I'm only getting 26,000 lines of it. I'm new to I/O and, for some reason, can't grasp why this is the case. Here is my code:
Zlib::GzipReader.open( file_path ) do |gz|
  puts gz.readlines.count
end
Any direction would be appreciated.
Thanks in advance.

Ok, so I managed to fix this.
It turns out the server log file I was using had about 29 streams of data in it. Zlib::GzipReader only read the first one. In order to fix it, I had to loop through until all 29 streams had been read:
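uncompressed = ''
File.open( file_path, 'rb' ) do |file|
  loop do
    gz = Zlib::GzipReader.new( file )
    uncompressed << gz.read # (this is where I write to my file)
    unused = gz.unused # bytes already read that belong to the next stream
    gz.finish # release the reader without closing the underlying file
    break if unused.nil?
    file.pos -= unused.length # rewind to the start of the next stream
  end
end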
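To reproduce the behaviour without a multi-stream log file, here is a minimal in-memory sketch. The helper names gzip_member and read_all_members are made up for illustration; the loop is the same unused/pos technique as above, applied to a StringIO instead of a File.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: compress +text+ into one complete gzip stream in memory.
def gzip_member(text)
  buffer = StringIO.new(''.b)
  gz = Zlib::GzipWriter.new(buffer)
  gz.write(text)
  gz.close # writes the gzip footer and closes the StringIO
  buffer.string
end

# Hypothetical helper: read every gzip stream from a seekable IO.
def read_all_members(io)
  output = +''
  loop do
    gz = Zlib::GzipReader.new(io)
    output << gz.read
    unused = gz.unused # leftover input that belongs to the next stream
    gz.finish          # detach the reader without closing +io+
    break if unused.nil?
    io.pos -= unused.length
  end
  output
end

# Two gzip streams concatenated, like the log file in the question.
multi = gzip_member("first\n") + gzip_member("second\n")

first_only = Zlib::GzipReader.new(StringIO.new(multi)).read
everything = read_all_members(StringIO.new(multi))
```

Here first_only holds just the first stream, which matches the truncated line count in the question, while everything holds both. Ruby 2.7 and later also appear to ship Zlib::GzipReader.zcat, which is documented as reading all gzip streams in an IO; check the documentation for your Ruby version before relying on it.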

Related

IO.copy_stream performance in ruby

I am trying to continuously read a file in Ruby (the file grows over time and needs to be processed in a separate process). Currently I am achieving this with the following bit of code:
r,w = IO.pipe
pid = Process.spawn('ffmpeg'+ffmpeg_args, {STDIN => r, STDERR => STDOUT})
Process.detach pid
while true do
  IO.copy_stream(open(@options['filename']), w)
  sleep 1
end
However - while working - I can't imagine that this is the most performant way of doing it. An alternative would be to use the following variation:
step = 1024*4
copied = 0
pid = Process.spawn('ffmpeg'+ffmpeg_args, {STDIN => r, STDERR => STDOUT})
Process.detach pid
while true do
  IO.copy_stream(open(@options['filename']), w, step, copied)
  copied += step
  sleep 1
end
which would only continuously copy parts of the file (the issue here being what happens if the step ever overreaches the end of the file). Other approaches, such as a simple file read, led to ffmpeg failing when there was no new data. With this solution the frames are dropped if no new data is available (which is what I need).
Is there a better (more performant) way to achieve something like that?
EDIT:
Using the method proposed by @RaVeN I am now using the following code:
open(@options['filename'], 'rb') do |stream|
  stream.seek(0, IO::SEEK_END)
  queue = INotify::Notifier.new
  queue.watch(@options['filename'], :modify) do
    w.write stream.read
  end
  queue.run
end
However, ffmpeg now complains about invalid data. Is there another method than read?
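One way to avoid reopening the file and recopying it from the start on every pass is to keep a single read handle open and let IO.copy_stream continue from that handle's current position. The sketch below only illustrates that idea with a temporary file and an in-memory sink; copy_appended is a made-up helper name, and the sketch does not attempt to explain the ffmpeg "invalid data" error.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: copy whatever was appended to +source+ since the last
# call into +sink+. Without an explicit offset, IO.copy_stream starts at the
# handle's current position and leaves it at end-of-file.
def copy_appended(source, sink)
  IO.copy_stream(source, sink)
end

growing = Tempfile.new('growing') # stands in for the growing input file
growing.binmode
sink = StringIO.new(''.b)         # stands in for the pipe to the child process
passes = []

File.open(growing.path, 'rb') do |source|
  growing.write('frame-1')
  growing.flush
  passes << copy_appended(source, sink) # copies the first 7 bytes

  growing.write('frame-2')
  growing.flush
  passes << copy_appended(source, sink) # copies only the new 7 bytes

  passes << copy_appended(source, sink) # nothing new, copies 0 bytes
end

growing.close!
```

In a real loop the sleep 1 (or an inotify callback) would sit between passes, and sink would be the write end of the pipe.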

Ruby GPGME - How to encrypt large files

I'm having difficulty encrypting large files (bigger than available memory) using GPGME in Ruby.
#!/usr/bin/ruby
require 'gpgme'
def gpgfile(localfile)
  crypto = GPGME::Crypto.new
  filebasename = File.basename(localfile)
  filecripted = crypto.encrypt File.read(localfile), :recipients => "info@address.com", :always_trust => true
  File.open("#{localfile}.gpg", 'w') { |file| file.write(filecripted) }
end
gpgfile("/home/largefile.data")
In this case I got an error of memory allocation:
"read: failed to allocate memory (NoMemoryError)"
Can someone explain how to read the source file chunk by chunk (100 MB at a time, for example) and write the chunks out after encrypting them?
The most obvious problem is that you're reading the entire file into memory with File.read(localfile). The Crypto#encrypt method will take an IO object as its input, so instead of File.read(localfile) (which returns the contents of the file as a string) you can pass it a File object. Likewise, you can give an IO object as the :output option, letting you write the output directly to a file instead of in memory:
def gpgfile(localfile)
  infile = File.open(localfile, 'r')
  outfile = File.open("#{localfile}.gpg", 'w')
  crypto = GPGME::Crypto.new
  crypto.encrypt(infile, recipients: "info@address.com",
                         output: outfile,
                         always_trust: true)
ensure
  infile.close
  outfile.close
end
I've never used ruby-gpgme, so I'm not 100% sure this will solve your problem since it depends a bit on what ruby-gpgme does behind the scenes, but from the docs and the source I've peeked at it seems like a sanely-built gem so I'm guessing this will do the trick.

Ruby Simple Read/Write File (Copy File)

I am practicing Ruby, and I am trying to copy the contents of file "from" to file "to". Can you tell me what I did wrong?
Thanks!
from = "1.txt"
to = "2.txt"
data = open(from).read
out = open(to, 'w')
out.write(data)
out.close
data.close
Maybe I am missing the point, but I think writing it like so is more 'ruby'
from = "1.txt"
to = "2.txt"
contents = File.open(from, 'r').read
File.open(to, 'w').write(contents)
Personally, however, I like to use the operating system's shell for file operations like this. Here is an example on Linux:
from = "1.txt"
to = "2.txt"
system("cp #{from} #{to}")
And for Windows, I believe you would use:
from = "1.txt"
to = "2.txt"
system("copy #{from} #{to}")
Finally, if you need the output of the command for logging or some other reason, I would use backticks.
#A nice one liner
`cp 1.txt 2.txt`
Here is the documentation for the system and backtick methods:
http://ruby-doc.org/core-1.9.3/Kernel.html
You can't perform data.close — data.class would show you that you have a String, and .close is not a valid String method. By opening from the way you chose to, you lost the File reference after using it with your read. One way to fix that would be:
from = "1.txt"
to = "2.txt"
infile = open(from) # Retain the File reference
data = infile.read # Use it to do the read
out = open(to, 'w')
out.write(data)
out.close
infile.close # And finally, close it
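As a complement to the answers above, here is a small sketch that uses the block form of File.open so that both files are closed automatically, even if an error occurs. The helper name copy_file and the temporary directory are only for illustration; FileUtils.cp from the standard library does the same job in one call.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: copy +from+ to +to+; both handles are closed when the
# blocks exit, so no explicit close calls are needed.
def copy_file(from, to)
  File.open(from, 'rb') do |infile|
    File.open(to, 'wb') do |outfile|
      IO.copy_stream(infile, outfile) # streams the data, returns the byte count
    end
  end
end

dir = Dir.mktmpdir
from = File.join(dir, '1.txt')
to = File.join(dir, '2.txt')
other = File.join(dir, '3.txt')

File.write(from, "hello\nworld\n")
bytes_copied = copy_file(from, to)

# Standard-library one-liner that avoids shelling out to cp or copy.
FileUtils.cp(from, other)
```

Unlike system("cp ...") or backticks, neither variant depends on which shell commands exist on the host operating system.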

Create in-memory only gzip

I'm trying to gzip a file in ruby without having to write it to disk first. Currently I only know how to make it work by using Zlib::GzipWriter, but I'm really hoping that I can avoid that and keep it in-memory only.
I've tried this, with no success:
def self.make_gzip(data)
  gz = Zlib::GzipWriter.new(StringIO.new)
  gz << data
  string = gz.close.string
  StringIO.new(string, 'rb').read
end
Here is what happens when I test it out:
# Files
normal = File.new('chunk0.nbt')
gzipped = File.new('chunk0.nbt.gz')
# Try to create gzip in program
make_gzip normal
=> "\u001F\x8B\b\u0000\x8AJhS\u0000\u0003S\xB6q\xCB\xCCI\xB52\xA8000OK1L\xB2441J5\xB5\xB0\u0003\u0000\u0000\xB9\x91\xDD\u0018\u0000\u0000\u0000"
# Read from a gzip created with the gzip command
reader = Zlib::GzipReader.open gzipped
reader.read
"\u001F\x8B\b\u0000\u0000\u0000\u0000\u0000\u0000\u0000\xED]\xDBn\xDC\xC8\u0011%\x97N\xB82<\x9E\x89\xFF!\xFF!\xC9\xD6dFp\x80\u0005\xB2y\r\"\xEC\n\x89\xB0\xC6\xDAX+A./\xF94\xBF\u0006\xF1\x83>`\u0005\xCC\u000F\xC4\xF0\u000F.............(for 10,000 columns)
You're actually gzipping normal.to_s (which is something like "#<File:0x007f53c9b55b48>") in the following code:
# Files
normal = File.new('chunk0.nbt')
# Try to create gzip in program
make_gzip normal
You should read the content of the file, and make_gzip on the content:
make_gzip normal.read
As I commented, make_gzip can be simplified:
def self.make_gzip(data)
  gz = Zlib::GzipWriter.new(StringIO.new)
  gz << data
  gz.close.string
end
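A quick round trip is an easy way to confirm that the in-memory gzip really contains the data and not the to_s of a File object. In this sketch make_gzip is written as a plain method rather than a class method, read_gzip is a made-up helper, and the sample string merely stands in for the contents of chunk0.nbt.

```ruby
require 'zlib'
require 'stringio'

# Same approach as the answer: gzip +data+ without touching the disk.
def make_gzip(data)
  gz = Zlib::GzipWriter.new(StringIO.new(''.b))
  gz << data
  gz.close.string # close returns the underlying StringIO
end

# Hypothetical helper: decompress a gzip string in memory.
def read_gzip(bytes)
  Zlib::GzipReader.new(StringIO.new(bytes)).read
end

original = 'placeholder for the bytes of chunk0.nbt'
gzipped = make_gzip(original)
restored = read_gzip(gzipped)
```

With a real file, pass its contents, for example make_gzip(File.binread('chunk0.nbt')). Recent Ruby versions also provide Zlib.gzip and Zlib.gunzip as shortcuts for strings; check that your Ruby has them before relying on them.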

Zlib inflate error

I am trying to save compressed strings to a file and load them later for use in the game. I kept getting "in 'finish': buffer error" errors when loading the data back up for use. I came up with this:
require "zlib"
def deflate(string)
  zipper = Zlib::Deflate.new
  data = zipper.deflate(string, Zlib::FINISH)
end
def inflate(string)
  zstream = Zlib::Inflate.new
  buf = zstream.inflate(string)
  zstream.finish
  zstream.close
  buf
end
setting = ["nothing","nada","nope"]
taggedskills = ["nothing","nada","nope","nuhuh"]
File.open('testzip.txt','wb') do |w|
  w.write(deflate("hello world")+"\n")
  w.write(deflate("goodbye world")+"\n")
  w.write(deflate("etc")+"\n")
  w.write(deflate("etc")+"\n")
  w.write(deflate("Setting: name "+setting[0]+" set"+(setting[1].class == String ? "str" : "num")+" "+setting[1].to_s)+"\n")
  w.write(deflate("Taggedskill: "+taggedskills[0]+" "+taggedskills[1]+" "+taggedskills[2]+" "+taggedskills[3])+"\n")
  w.write(deflate("etc")+"\n")
end
File.open('testzip.txt','rb') do |file|
  file.each do |line|
    p inflate(line)
  end
end
It was throwing errors at the "Taggedskill:" point. I don't know what it is, but trying to change it to "Skilltag:", "Skillt:", etc. continues to throw a buffer error, while things like "Setting:" or "Thing:" work fine, while changing the setting line to "Taggedskill:" continues to work fine. What is going on here?
In testzip.txt, you are storing newline-separated binary blobs. However, a binary blob may itself contain newline bytes, so when you open testzip.txt and split it by line, you may end up splitting one blob that inflate would understand into two blobs that it does not understand.
Try running wc -l testzip.txt after you get the error. You'll see that the file contains one more line than the number of lines you put in.
What you need to do is compress the whole file at once, not line by line.
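A minimal sketch of that suggestion follows: join the text lines, compress once, and split again after inflating. The second pair of helpers shows a different technique that is not from the answer, namely prefixing each compressed blob with its length so that no separator byte is needed. All helper names are made up for illustration.

```ruby
require 'zlib'
require 'stringio'

# Suggested approach: compress all lines together as one blob.
def pack_lines(lines)
  Zlib::Deflate.deflate(lines.join("\n"))
end

def unpack_lines(blob)
  Zlib::Inflate.inflate(blob).split("\n")
end

# Alternative (an assumption, not from the answer): keep one blob per record,
# but store a 4-byte big-endian length in front of each blob.
def write_record(io, string)
  blob = Zlib::Deflate.deflate(string)
  io.write([blob.bytesize].pack('N'))
  io.write(blob)
end

def read_records(io)
  records = []
  while (header = io.read(4)) # read returns nil at end of input
    length = header.unpack('N').first
    records << Zlib::Inflate.inflate(io.read(length))
  end
  records
end

lines = ['hello world', 'goodbye world', 'Taggedskill: nothing nada nope nuhuh']

roundtrip = unpack_lines(pack_lines(lines))

store = StringIO.new(''.b) # stands in for a file opened in binary mode
lines.each { |line| write_record(store, line) }
store.rewind
records = read_records(store)
```

Either way the reader never has to guess where a compressed blob ends, so a newline byte inside the compressed data is harmless.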

Resources