Create in-memory only gzip - ruby

I'm trying to gzip a file in ruby without having to write it to disk first. Currently I only know how to make it work by using Zlib::GzipWriter, but I'm really hoping that I can avoid that and keep it in-memory only.
I've tried this, with no success:
def self.make_gzip(data)
gz = Zlib::GzipWriter.new(StringIO.new)
gz << data
string = gz.close.string
StringIO.new(string, 'rb').read
end
Here is what happens when I test it out:
# Files
normal = File.new('chunk0.nbt')
gzipped = File.new('chunk0.nbt.gz')
# Try to create gzip in program
make_gzip normal
=> "\u001F\x8B\b\u0000\x8AJhS\u0000\u0003S\xB6q\xCB\xCCI\xB52\xA8000OK1L\xB2441J5\xB5\xB0\u0003\u0000\u0000\xB9\x91\xDD\u0018\u0000\u0000\u0000"
# Read from a gzip created with the gzip command
reader = Zlib::GzipReader.open gzipped
reader.read
"\u001F\x8B\b\u0000\u0000\u0000\u0000\u0000\u0000\u0000\xED]\xDBn\xDC\xC8\u0011%\x97N\xB82<\x9E\x89\xFF!\xFF!\xC9\xD6dFp\x80\u0005\xB2y\r\"\xEC\n\x89\xB0\xC6\xDAX+A./\xF94\xBF\u0006\xF1\x83>`\u0005\xCC\u000F\xC4\xF0\u000F.............(for 10,000 columns)

You're actually gzipping normal.to_s(which is something like "#<File:0x007f53c9b55b48>") in the following code.
# Files
normal = File.new('chunk0.nbt')
# Try to create gzip in program
make_gzip normal
You should read the content of the file, and make_gzip on the content:
make_gzip normal.read
As I commented, the make_gzip can be updated:
def self.make_gzip(data)
gz = Zlib::GzipWriter.new(StringIO.new)
gz << data
gz.close.string
end

Related

How do I write data binary to gcs with ruby efficiently?

I want to upload data binary directly to GCP storage, without writing the file to disk. Below is the code snippet I have created to get to the state that I am going to be at.
require 'google/cloud/storage'
bucket_name = '-----'
data = File.open('image_block.jpg', 'rb') {|file| file.read }
storage = Google::Cloud::Storage.new("project_id": "maybe-i-will-tell-u")
bucket = storage.bucket bucket_name, skip_lookup: true
Now I want to directly put this data into a file on gcs, without having to write a file to disk.
Is there an efficient way we can do that?
I tried the following code
to_send = StringIO.new(data).read
bucket.create_file to_send, "image_inder_11111.jpg"
but this throws an error saying
/google/cloud/storage/bucket.rb:2898:in `file?': path name contains null byte (ArgumentError)
from /home/inder/.gem/gems/google-cloud-storage-1.36.1/lib/google/cloud/storage/bucket.rb:2898:in `ensure_io_or_file_exists!'
from /home/inder/.gem/gems/google-cloud-storage-1.36.1/lib/google/cloud/storage/bucket.rb:1566:in `create_file'
from champa.rb:14:in `<main>'
As suggested by #stefan, It should be to_send = StringIO.new(data), i.e. without .read (which would return a string again)

Ruby GPGME - How to encrypt large files

I'm having difficulty to Encrypt large files (bigger than available memory) using GPGME in Ruby.
#!/usr/bin/ruby
require 'gpgme'
def gpgfile(localfile)
crypto = GPGME::Crypto.new
filebasename = File.basename(localfile)
filecripted = crypto.encrypt File.read(localfile), :recipients => "info#address.com", :always_trust => true
File.open("#{localfile}.gpg", 'w') { |file| file.write(filecripted) }
end
gpgpfile("/home/largefile.data")
In this case I got an error of memory allocation:
"read: failed to allocate memory (NoMemoryError)"
Someone can explain me how to read the source file chunk by chunk (of 100Mb for example) and write them passing by the crypting?
The most obvious problem is that you're reading the entire file into memory with File.read(localfile). The Crypto#encrypt method will take an IO object as its input, so instead of File.read(localfile) (which returns the contents of the file as a string) you can pass it a File object. Likewise, you can give an IO object as the :output option, letting you write the output directly to a file instead of in memory:
def gpgfile(localfile)
infile = File.open(localfile, 'r')
outfile = File.open("#{localfile}.gpg", 'w')
crypto = GPGME::Crypto.new
crypto.encrypt(infile, recipients: "info#address.com",
output: outfile,
always_trust: true)
ensure
infile.close
outfile.close
end
I've never used ruby-gpgme, so I'm not 100% sure this will solve your problem since it depends a bit on what ruby-gpgme does behind the scenes, but from the docs and the source I've peeked at it seems like a sanely-built gem so I'm guessing this will do the trick.

How do I get the entirety of an uncompressed gzip file using Zlib?

I am trying to uncompress a 823,000 line file, but I'm only receiving 26,000 lines of the file. I'm new to I/O and for some reason, not grasping why this is the case. Here is my code:
Zlib::GzipReader.open( file_path ) do |gz|
puts gz.readlines.count
end
Any direction would be appreciated.
Thanks in advance.
Ok, so I managed to fix this.
It turns out the server log file I was using had about 29 streams of data in it. Zlib::GzipReader only read the first one. In order to fix it, I had to loop through until all 29 streams had been read:
File.open( file_path ) do |file|
zio = file
loop do
io = Zlib::GzipReader.new( zio )
uncompressed += io.read
unused = io.unused # where I'm writing my file
break if unused.nil?
zio.pos -= unused.length
end
end

Ruby Simple Read/Write File (Copy File)

I am practicing Ruby, and I am trying to copy contents from file "from" to file "to". can you tell me where I did it wrong?
thanks !
from = "1.txt"
to = "2.txt"
data = open(from).read
out = open(to, 'w')
out.write(data)
out.close
data.close
Maybe I am missing the point, but I think writing it like so is more 'ruby'
from = "1.txt"
to = "2.txt"
contents = File.open(from, 'r').read
File.open(to, 'w').write(contents)
Personally, however, I like to use the Operating systems terminal to do File operations like so. Here is an example on linux.
from = "1.txt"
to = "2.txt"
system("cp #{from} #{to}")
And for Windows I believe you would use..
from = "1.txt"
to = "2.txt"
system("copy #{from} #{to}")
Finally, if you were needing the output of the command for some sort of logging or other reason, I would use backticks.
#A nice one liner
`cp 1.txt 2.txt`
Here is the system and backtick methods documentation.
http://ruby-doc.org/core-1.9.3/Kernel.html
You can't perform data.close — data.class would show you that you have a String, and .close is not a valid String method. By opening from the way you chose to, you lost the File reference after using it with your read. One way to fix that would be:
from = "1.txt"
to = "2.txt"
infile = open(from) # Retain the File reference
data = infile.read # Use it to do the read
out = open(to, 'w')
out.write(data)
out.close
infile.close # And finally, close it

Is there a way to remove the BOM from a UTF-8 encoded file?

Is there a way to remove the BOM from a UTF-8 encoded file?
I know that all of my JSON files are encoded in UTF-8, but the data entry person who edited the JSON files saved it as UTF-8 with the BOM.
When I run my Ruby scripts to parse the JSON, it is failing with an error.
I don't want to manually open 58+ JSON files and convert to UTF-8 without the BOM.
With ruby >= 1.9.2 you can use the mode r:bom|utf-8
This should work (I haven't test it in combination with json):
json = nil #define the variable outside the block to keep the data
File.open('file.txt', "r:bom|utf-8"){|file|
json = JSON.parse(file.read)
}
It doesn't matter, if the BOM is available in the file or not.
Andrew remarked, that File#rewind can't be used with BOM.
If you need a rewind-function you must remember the position and replace rewind with pos=:
#Prepare test file
File.open('file.txt', "w:utf-8"){|f|
f << "\xEF\xBB\xBF" #add BOM
f << 'some content'
}
#Read file and skip BOM if available
File.open('file.txt', "r:bom|utf-8"){|f|
pos =f.pos
p content = f.read #read and write file content
f.pos = pos #f.rewind goes to pos 0
p content = f.read #(re)read and write file content
}
So, the solution was to do a search and replace on the BOM via gsub!
I forced the encoding of the string to UTF-8 and also forced the regex pattern to be encoded in UTF-8.
I was able to derive a solution by looking at http://self.d-struct.org/195/howto-remove-byte-order-mark-with-ruby-and-iconv and http://blog.grayproductions.net/articles/ruby_19s_string
def read_json_file(file_name, index)
content = ''
file = File.open("#{file_name}\\game.json", "r")
content = file.read.force_encoding("UTF-8")
content.gsub!("\xEF\xBB\xBF".force_encoding("UTF-8"), '')
json = JSON.parse(content)
print json
end
You can also specify encoding with the File.read and CSV.read methods, but you don't specify the read mode.
File.read(path, :encoding => 'bom|utf-8')
CSV.read(path, :encoding => 'bom|utf-8')
the "bom|UTF-8" encoding works well if you only read the file once, but fails if you ever call File#rewind, as I was doing in my code. To address this, I did the following:
def ignore_bom
#file.ungetc if #file.pos==0 && #file.getc != "\xEF\xBB\xBF".force_encoding("UTF-8")
end
which seems to work well. Not sure if there are other similar type characters to look out for, but they could easily be built into this method that can be called any time you rewind or open.
Server side cleanup of utf-8 bom bytes that worked for me:
csv_text.gsub!("\xEF\xBB\xBF".force_encoding(Encoding::BINARY), '')

Resources