Uncompress a .gz file and read it chunk by chunk - ruby

I have a number of quite large .gz files that I want to read. I don't want to read each file all at once, because that could use too much RAM; instead I want to read it chunk by chunk. How can I do that? The documentation describes the usual approach, which reads the whole file at once:
require 'zlib'

Zlib::GzipReader.open('hoge.gz') do |gz|
  print gz.read
end

File.open('hoge.gz') do |f|
  gz = Zlib::GzipReader.new(f)
  print gz.read
  gz.close
end
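Both documented forms call gz.read, which decompresses the entire file into one String. If the .gz holds line-oriented text, a minimal sketch (assuming newline-delimited content; each_gz_line is a hypothetical name) is to let Zlib::GzipReader#each_line hand you one line at a time instead:

```ruby
require 'zlib'

# Hypothetical helper: yields each decompressed line of a gzipped text file,
# so the whole decompressed file is never held in memory at once.
# Returns an Enumerator when called without a block.
def each_gz_line(path)
  return enum_for(__method__, path) unless block_given?

  Zlib::GzipReader.open(path) do |gz|
    gz.each_line { |line| yield line }
  end
end
```

Usage: each_gz_line('hoge.gz') { |line| print line }.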

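I couldn't find an example of this anywhere, so I worked it out from the documentation:
require 'zlib'

# Decompresses the file at path and handles it in chunks of at most chunk_size
# bytes, so the whole decompressed file is never held in memory at once.
def read_gz_by_chunk(path, chunk_size = 256)
  File.open(path, 'rb') do |infile|
    rgz = Zlib::GzipReader.new(infile)
    until rgz.eof?
      data = rgz.readpartial(chunk_size)
      # do stuff
      print data
    end
    rgz.finish # closes the gzip stream; File.open's block closes infile
  end
end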
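A variant of the same idea, sketched under the assumption that you want an Enumerator you can pass around (gz_chunks is a hypothetical name): Zlib::GzipReader#read(length) returns at most length decompressed bytes and nil once the stream is exhausted, so the loop needs no separate eof? check.

```ruby
require 'zlib'

# Hypothetical helper: returns an Enumerator over decompressed chunks of at
# most chunk_size bytes. The file is opened when iteration starts and closed
# when iteration ends.
def gz_chunks(path, chunk_size = 1024 * 1024)
  Enumerator.new do |yielder|
    Zlib::GzipReader.open(path) do |gz|
      # read(n) returns up to n bytes, or nil at the end of the stream
      while (chunk = gz.read(chunk_size))
        yielder << chunk
      end
    end
  end
end
```

Usage: gz_chunks('file_name.gz').each { |chunk| process(chunk) }, where process stands for your own code.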

Related

Ruby: how to read an mp4 file into chunks

I want to be able to read an mp4 file in chunks of 1 MB.
I've tried opening the file with the following APIs:
video_file = File.open(video_filename, 'rb')
video_file = IO.binread(video_filename)
The problem is that after the second line video_file is a String, so I cannot call read on it to get chunks:
chunk = video_file.read(4*1024*1024)
What is the right interface or tool in Ruby to open this file and read it N bytes at a time?
I suppose I would do:
chnk_size = 4 * 1024 * 1024
# fn is the path to the video file; the block form closes it afterwards
File.open(fn, 'rb') do |f|
  until f.eof?
    chnk = f.read(chnk_size)
    # process the chnk
  end
end
Try something like this:
FILENAME = "d:\\tmp\\file.bin"
MEGABYTE = 1024 * 1024

class File
  # Yields successive chunks of up to chunk_size bytes until end of file
  def each_chunk(chunk_size = MEGABYTE)
    yield read(chunk_size) until eof?
  end
end

File.open(FILENAME, "rb") do |f|
  f.each_chunk { |chunk| puts chunk }
end
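If you'd rather not reopen File, here is a minimal sketch of the same loop as a standalone helper (each_chunk here is hypothetical, not a built-in IO method) that works on any IO and returns an Enumerator when called without a block:

```ruby
MEGABYTE = 1024 * 1024

# Hypothetical helper: yields successive chunks of up to chunk_size bytes
# from io; returns an Enumerator if no block is given.
def each_chunk(io, chunk_size = MEGABYTE)
  return enum_for(__method__, io, chunk_size) unless block_given?

  # IO#read(n) returns nil at end of file, which ends the loop
  while (chunk = io.read(chunk_size))
    yield chunk
  end
end
```

Usage: File.open(video_filename, 'rb') { |f| each_chunk(f) { |chunk| ... } }. Taking the IO as an argument also lets the same helper work on a StringIO or a socket.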

Reading a GZip compressed AVRO file in Ruby

I have a bunch of AVRO files which I've compressed externally using GZip.
I'm trying to read them in Ruby without decompressing them, but can't get it to work.
Solved it:
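require 'avro'
require 'zlib'
require 'stringio'

def open_avro(file)
  if file =~ /avro$/
    Avro::DataFile.open(file)
  elsif file =~ /avro\.gz$/
    # Decompress into memory rather than setting the global $/ = "" and calling
    # gets: an empty separator puts gets into paragraph mode, which can stop at
    # the first "\n\n" inside the binary data.
    data = Zlib::GzipReader.open(file) { |gz| gz.read }
    # DatumReader.new takes schemas, not an IO; with no arguments the writer's
    # schema is read from the file header by DataFile::Reader.
    Avro::DataFile::Reader.new(StringIO.new(data), Avro::IO::DatumReader.new)
  end
end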

Compress a complete directory in Ruby with zlib?

This is the code I'm trying.
require 'zlib'
Dir.glob('*.*').each do |file|
Zlib::GzipWriter.open('output.gz') do |gz|
gz.mtime = File.mtime(file)
gz.orig_name = File.basename(file)
gz.write IO.binread(file)
end
end
I've tried different variations of this. There doesn't seem to be a how-to for "multiple files" online. I keep ending up with the first file name in output.gz, and I think it may have the content of the last file from the directory (not sure). But that's beside the point. I just want to put each file as a separate entity in a compressed file. The more cross-platform compatible it is, the better.
This answer is taken from http://old.thoughtsincomputation.com/posts/tar-and-a-few-feathers-in-ruby, whose author took it from the RubyGems library.
require 'rubygems'
require 'rubygems/package'
require 'stringio'
require 'zlib'
require 'fileutils'
module Util
module Tar
# Creates a tar file in memory recursively
# from the given path.
#
# Returns a StringIO whose underlying String
# is the contents of the tar file.
def tar(path)
tarfile = StringIO.new("")
Gem::Package::TarWriter.new(tarfile) do |tar|
Dir[File.join(path, "**/*")].each do |file|
mode = File.stat(file).mode
relative_file = file.sub /^#{Regexp::escape path}\/?/, ''
if File.directory?(file)
tar.mkdir relative_file, mode
else
tar.add_file relative_file, mode do |tf|
File.open(file, "rb") { |f| tf.write f.read }
end
end
end
end
tarfile.rewind
tarfile
end
# gzips the underlying string in the given StringIO,
# returning a new StringIO representing the
# compressed file.
def gzip(tarfile)
gz = StringIO.new("")
z = Zlib::GzipWriter.new(gz)
z.write tarfile.string
z.close # this is necessary!
# z was closed to write the gzip footer, so
# now we need a new StringIO
StringIO.new gz.string
end
# un-gzips the given IO, returning the
# decompressed version as a StringIO
def ungzip(tarfile)
z = Zlib::GzipReader.new(tarfile)
unzipped = StringIO.new(z.read)
z.close
unzipped
end
# untars the given IO into the specified
# directory
def untar(io, destination)
Gem::Package::TarReader.new io do |tar|
tar.each do |tarfile|
destination_file = File.join destination, tarfile.full_name
if tarfile.directory?
FileUtils.mkdir_p destination_file
else
destination_directory = File.dirname(destination_file)
FileUtils.mkdir_p destination_directory unless File.directory?(destination_directory)
File.open destination_file, "wb" do |f|
f.print tarfile.read
end
end
end
end
end
end
end
### Usage Example: ###
#
# include Util::Tar
#
# io = tar("./Desktop") # io is a TAR of files
# gz = gzip(io) # gz is a TGZ
#
# io = ungzip(gz) # io is a TAR
# untar(io, "./untarred") # files are untarred
#
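The module above builds the whole archive in a StringIO before compressing it. A lower-memory variant, sketched under these assumptions: the hypothetical helper tar_gz_directory writes straight through a Zlib::GzipWriter into the output file, and it uses Gem::Package::TarWriter#add_file_simple, which needs the file size up front. As far as I can tell from the RubyGems source, add_file seeks back to fill in the header afterwards, which a GzipWriter cannot do.

```ruby
require 'rubygems/package'
require 'zlib'

# Hypothetical helper: streams every file and directory under src_dir into a
# .tar.gz at dest_path without building the archive in memory first.
def tar_gz_directory(src_dir, dest_path)
  Zlib::GzipWriter.open(dest_path) do |gz|
    Gem::Package::TarWriter.new(gz) do |tar|
      Dir.glob(File.join(src_dir, '**', '*')).sort.each do |path|
        relative = path.delete_prefix(File.join(src_dir, ''))
        mode = File.stat(path).mode
        if File.directory?(path)
          tar.mkdir(relative, mode)
        else
          # add_file_simple writes the header first, so no seeking is needed
          tar.add_file_simple(relative, mode, File.size(path)) do |entry|
            File.open(path, 'rb') do |f|
              while (buf = f.read(16 * 1024))
                entry.write(buf)
              end
            end
          end
        end
      end
    end
  end
end
```

Usage: tar_gz_directory('./Desktop', 'desktop.tar.gz'). Like the original tar method, Dir.glob skips dotfiles by default.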
First off, that will keep overwriting output.gz, leaving it containing only the last file compressed.
Second, the gzip format does not hold multiple files. It only holds one. You need to use the .tar.gz or .zip format. .zip is more "cross platform compatible". Take a look at rubyzip.
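To stay with plain gzip, one option (sketched here; gzip_each_file is a hypothetical name) is to write one .gz per input file instead of reopening output.gz on every iteration. Each file is streamed into the GzipWriter in chunks rather than read whole with IO.binread.

```ruby
require 'zlib'

# Hypothetical helper: compresses each regular file matching pattern into its
# own "<name>.gz" next to it, recording the original name and mtime.
def gzip_each_file(pattern = '*.*')
  Dir.glob(pattern).each do |file|
    # skip directories and files that are already compressed
    next unless File.file?(file) && !file.end_with?('.gz')

    Zlib::GzipWriter.open("#{file}.gz") do |gz|
      gz.mtime = File.mtime(file)          # header fields must be set
      gz.orig_name = File.basename(file)   # before the first write
      File.open(file, 'rb') do |f|
        while (chunk = f.read(64 * 1024))
          gz.write(chunk)
        end
      end
    end
  end
end
```

This gives one archive per file; a single archive with several entries still needs .tar.gz or .zip, as described above.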

How can I copy the contents of one file to another using Ruby's file methods?

I want to copy the contents of one file to another using Ruby's file methods.
How can I do it using a simple Ruby program using file methods?
There is a very handy method for this, IO.copy_stream (see the output of ri IO.copy_stream).
Example usage:
File.open('src.txt', 'w') do |f|
  f.puts 'Some text'
end
IO.copy_stream('src.txt', 'dest.txt')
For those who are interested, here's a variation of the IO.copy_stream and File.open-with-a-block answers (written against Ruby 2.2.x, three years too late):
require 'tempfile'

copy = Tempfile.new
# file is the path of the source file
File.open(file, 'rb') do |input_stream|
  File.open(copy.path, 'wb') do |output_stream|
    IO.copy_stream(input_stream, output_stream)
  end
end
As a precaution, I would recommend using a buffer unless you can guarantee that the whole file always fits into memory:
File.open("source", "rb") do |input|
File.open("target", "wb") do |output|
while buff = input.read(4096)
output.write(buff)
end
end
end
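A small refinement of the buffered loop, offered as a sketch (copy_with_buffer is a hypothetical name): IO#read(length, outbuf) reads into an existing String, so one buffer is reused for every chunk instead of allocating a new String per read.

```ruby
# Hypothetical helper: copies source to target in fixed-size chunks, reusing a
# single String buffer. Returns the number of bytes copied.
def copy_with_buffer(source, target, chunk_size = 64 * 1024)
  copied = 0
  buffer = String.new(capacity: chunk_size)
  File.open(source, 'rb') do |input|
    File.open(target, 'wb') do |output|
      # read fills buffer in place and returns nil at end of file
      while input.read(chunk_size, buffer)
        output.write(buffer)
        copied += buffer.bytesize
      end
    end
  end
  copied
end
```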
Here is my implementation:
class File
def self.copy(source, target)
File.open(source, 'rb') do |infile|
File.open(target, 'wb') do |outfile2|
while buffer = infile.read(4096)
outfile2 << buffer
end
end
end
end
end
Usage:
File.copy sourcepath, targetpath
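For comparison, Ruby's standard library already ships a file copy in FileUtils, so reopening File is not strictly necessary. A minimal sketch wrapping it (stdlib_copy is a hypothetical name; preserve: true additionally copies the timestamps and permission bits):

```ruby
require 'fileutils'

# Hypothetical wrapper around FileUtils.cp; preserve: true also copies the
# source's access/modification times and mode to the target.
def stdlib_copy(source, target)
  FileUtils.cp(source, target, preserve: true)
end
```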
Here is a simple way of doing it using Ruby's file methods:
source_file, destination_file = ARGV
input = File.open(source_file)
data_to_copy = input.read # gather the data using the read method
puts "The source file is #{data_to_copy.bytesize} bytes long"
output = File.open(destination_file, 'w')
output.write(data_to_copy) # write the data using the write method
puts "File has been copied"
output.close
input.close
You can also use File.exist? to check whether the file exists; it returns true if it does. (The older spelling File.exists? was removed in Ruby 3.2.)
Here's a concise way to do it:
# Open first file, read it, store it, then close it
input = File.open(ARGV[0]) {|f| f.read() }
# Open second file, write to it, then close it
output = File.open(ARGV[1], 'w') {|f| f.write(input) }
An example of running it:
$ ruby this_script.rb from_file.txt to_file.txt
This runs this_script.rb with two command-line arguments. The first, from_file.txt, is the file the text is copied from; the second, to_file.txt, is the file it is copied to.
You can also use File.binread and File.binwrite if you want to hold on to the file contents for a while. (Other answers use IO.copy_stream, which copies directly without keeping the contents in a variable.)
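If the files are not plain text (images, for example), File.read and File.write may corrupt them, because they work in text mode, which can convert line endings (on Windows, for instance). binread and binwrite use binary mode instead: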
require 'tempfile'
temp_image = Tempfile.new('image.jpg')
actual_img = IO.binread('image.jpg')
IO.binwrite(temp_image.path, actual_img)
Source: the IO.binread and IO.binwrite documentation.

Ruby: How to replace text in a file?

The following code is a line in an xml file:
<appId>455360226</appId>
How can I replace the number between the 2 tags with another number using ruby?
There is no way to modify a file's content in place in one step when the file size would change (at least none that I know of).
You have to read the file and write the modified text to another file:
replace="100"
infile = "xmlfile_in"
outfile = "xmlfile_out"
File.open(outfile, 'w') do |out|
out << File.read(infile).gsub(/<appId>\d+<\/appId>/, "<appId>#{replace}</appId>")
end
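The "write to another file" approach also works without loading the whole file, which matters for large inputs. A minimal sketch, assuming each <appId> element sits on a single line (replace_app_id and the ".tmp" suffix are hypothetical): it streams lines into a temporary file next to the original, then renames it over the original.

```ruby
# Hypothetical helper: rewrites <appId>...</appId> line by line, so only one
# line is held in memory at a time. Assumes each element is on one line.
def replace_app_id(path, new_id)
  tmp_path = "#{path}.tmp" # hypothetical temporary name in the same directory
  File.open(tmp_path, 'w') do |out|
    File.foreach(path) do |line|
      out.write(line.sub(/<appId>\d+<\/appId>/, "<appId>#{new_id}</appId>"))
    end
  end
  # on POSIX systems rename replaces the original in a single step
  File.rename(tmp_path, path)
end
```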
Or you can read the file content into memory and afterwards overwrite the file with the modified content:
replace="100"
filename = "xmlfile_in"
outdata = File.read(filename).gsub(/<appId>\d+<\/appId>/, "<appId>#{replace}</appId>")
File.open(filename, 'w') do |out|
out << outdata
end
(I hope it works; the code is not tested.)
You can do it in one line like this:
IO.write(filepath, File.open(filepath) { |f| f.read.gsub(/<appId>\d+<\/appId>/, "<appId>42</appId>") })
IO.write truncates the given file by default, so if you read the text first, perform the regex String.gsub and return the resulting string using File.open in block mode, it will replace the file's content in one fell swoop.
I like the way this reads, but it can of course be written over multiple lines too:
IO.write(filepath, File.open(filepath) do |f|
  f.read.gsub(/<appId>\d+<\/appId>/, "<appId>42</appId>")
end
)
replace = "100"
# Prints the modified content to stdout; redirect it to a new file to keep it
File.foreach("xmlfile") do |line|
  if line[/<appId>/]
    line.sub!(/<appId>\d+<\/appId>/, "<appId>#{replace}</appId>")
  end
  puts line
end
The right way is to use an XML parsing tool, an example of which is XmlSimple.
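A minimal sketch of the parser-based route. It uses REXML (shipped with Ruby; a bundled gem since Ruby 3.0) rather than XmlSimple, and assumes the value lives in elements named appId; set_app_id is a hypothetical name.

```ruby
require 'rexml/document'

# Hypothetical helper: parses the XML, sets the text of every <appId> element,
# and returns the serialized document.
def set_app_id(xml, new_id)
  doc = REXML::Document.new(xml)
  REXML::XPath.each(doc, '//appId') { |el| el.text = new_id.to_s }
  doc.to_s
end
```

Unlike a regex, this still works if the element gains attributes or its content spans several lines.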
You did tag your question with regex. If you really must do it with a regex, then
s = "Blah blah <appId>455360226</appId> blah"
s.sub(/<appId>\d+<\/appId>/, "<appId>42</appId>")
is an illustration of the kind of thing you can do but shouldn't.
