Ruby streaming tar/gz - ruby

Basically I want to stream data from memory into a tar/gz format (possibly multiple files into the tar, but it should NEVER TOUCH THE HARDDRIVE, only streaming!), then stream them somewhere else (an HTTP request body in my case).
Anyone know of an existing library that can do this? Is there something in Rails?
libarchive-ruby is only a C wrapper and seems like it would be very platform-dependent (the docs want you to compile as an installation step?!).
SOLUTION:
require 'zlib'
require 'rubygems/package'
tar = StringIO.new
Gem::Package::TarWriter.new(tar) { |writer|
writer.add_file("a_file.txt", 0644) { |f|
(1..1000).each { |i|
f.write("some text\n")
}
}
writer.add_file("another_file.txt", 0644) { |f|
f.write("some more text\n")
}
}
tar.seek(0)
gz = Zlib::GzipWriter.new(File.new('this_is_a_tar_gz.tar.gz', 'wb')) # Make sure you use 'wb' for binary write!
gz.write(tar.read)
tar.close
gz.close
That's it! You can swap out the File in the GzipWriter with any IO to keep it streaming. Cookies for dw11wtq!
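Note that the snippet above still holds the whole tar in a StringIO before compressing it. Here is a minimal sketch of a variant that writes the tar straight into the gzip stream, assuming the size of every entry is known up front. As far as I know, `add_file` needs a seekable IO, whereas `add_file_simple` writes the header first and never seeks. `stream_tgz` and its `entries` hash are hypothetical names used only for illustration.

```ruby
require 'stringio'
require 'zlib'
require 'rubygems/package'

# Hypothetical helper: streams `entries` (name => body) as tar.gz into `out`,
# which can be any writable IO (socket, pipe, StringIO, ...).
def stream_tgz(entries, out)
  gz = Zlib::GzipWriter.new(out)
  Gem::Package::TarWriter.new(gz) do |tar|
    entries.each do |name, body|
      # add_file_simple takes the size up front, so the header can be
      # written before the data without seeking back.
      tar.add_file_simple(name, 0o644, body.bytesize) { |f| f.write(body) }
    end
  end
  gz.finish # writes the gzip trailer but leaves `out` open
  out
end

buffer = StringIO.new(String.new) # stand-in for a real output stream
stream_tgz({ 'a_file.txt' => "some text\n",
             'another_file.txt' => "some more text\n" }, buffer)
```

Because nothing is rewound or re-read, `out` only needs to support `write`, which is what makes this usable with a request body stream.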

Take a look at the TarWriter class in RubyGems: http://rubygems.rubyforge.org/rubygems-update/Gem/Package/TarWriter.html. It just operates on an IO stream, which may be a StringIO.
tar = StringIO.new
Gem::Package::TarWriter.new(tar) do |writer|
writer.add_file("hello_world.txt", 0644) { |f| f.write("Hello world!\n") }
end
tar.seek(0)
p tar.read #=> mostly padding, but a tar nonetheless
It also provides methods to add directories if you need a directory layout in the tarball.
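A short sketch of the directory case, assuming the `mkdir` method of `Gem::Package::TarWriter`; the entry names are made up for illustration. The archive is read back with `Gem::Package::TarReader` to show which entries are directories.

```ruby
require 'stringio'
require 'rubygems/package'

tar = StringIO.new
Gem::Package::TarWriter.new(tar) do |writer|
  writer.mkdir("docs", 0o755) # directory entry
  writer.add_file("docs/hello_world.txt", 0o644) { |f| f.write("Hello world!\n") }
end
tar.rewind

# Collect [name, directory?] for every entry in the archive.
listing = []
Gem::Package::TarReader.new(tar) do |reader|
  reader.each { |entry| listing << [entry.full_name, entry.directory?] }
end
p listing
```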
For reference, you could achieve the gzipping with IO.popen, just piping the data in/out of the system process:
http://www.ruby-doc.org/core-1.9.2/IO.html#method-c-popen
The gzipping itself would look something like this:
gzipped_data = IO.popen("gzip", "w+") do |gzip|
gzip.puts "Hello world!"
gzip.close_write
gzip.read
end
# => "\u001F\x8B\b\u0000\xFD\u001D\xA2N\u0000\u0003\xF3H\xCD\xC9\xC9W(\xCF/\xCAIQ\xE4\u0002\u0000A䩲\r\u0000\u0000\u0000"
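If spawning an external process is not desirable, the same compression can be done in-process with the zlib standard library. This is a different technique from the `IO.popen` pipe above. The sketch assumes a Ruby version whose `Zlib` module provides `Zlib.gzip` and `Zlib.gunzip`, which current releases do.

```ruby
require 'zlib'

data = "Hello world!\n"
gzipped_data = Zlib.gzip(data)           # compress in-process, no external gzip
restored     = Zlib.gunzip(gzipped_data) # decompress again to verify
puts restored == data
```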

Based on the solution the OP wrote, I wrote a fully in-memory tgz archive function that I use to POST to a web server.
# Create a tar.gz archive from files, entirely in memory.
# Parameters:
# files: Array of hashes with string keys "filename" and "body"
# Ex: [{"filename" => "foo.txt", "body" => "This is foo.txt"}, ...]
#
# Return:: the tar.gz archive as a binary string
def create_tgz_archive_from_files(files)
tar = StringIO.new
Gem::Package::TarWriter.new(tar){ |tar_writer|
files.each{|file|
tar_writer.add_file(file['filename'], 0644){|f|
f.write(file['body'])
}
}
}
tar.rewind
gz = StringIO.new('', 'r+b')
gz.set_encoding("BINARY")
gz_writer = Zlib::GzipWriter.new(gz)
gz_writer.write(tar.read)
tar.close
gz_writer.finish
gz.rewind
tar_gz_buf = gz.read
return tar_gz_buf
end

Related

Compressing using LZMA on-the-fly to a file?

This code compresses data on the fly, writing a CSV file through a Bzip2 writer.
File.open('file.bz2', 'wb') do |f|
writer = Bzip2::Writer.new f
CSV(writer) do |csv|
(2**16).times { csv << arr }
end
writer.close
end
I want to do the same using the LZMA algorithm. The ruby-lzma gem could be useful, but it offers only one method: compressed = LZMA.compress('data to compress').
Question:
Is there a way to do a similar compression using lzma?
Use ruby-xz which has a much better interface to liblzma (using FFI).
The library has an XZ::StreamWriter class. Check the docs for ruby-xz.
However, the CSV constructor does not accept an XZ::StreamWriter, so you need to change the code to use CSV.generate_line. I was able to run the following, which does generate the file on the fly:
require 'xz'
require 'csv'
arr = ['one', 'two', 'three']
File.open('file.xz', 'wb') do |f|
XZ::StreamWriter.new(f) do |writer|
(2**16).times { writer << CSV.generate_line(arr) }
writer.finish
end
end

Compress a complete directory in Ruby with zlib?

This is the code I'm trying.
require 'zlib'
Dir.glob('*.*').each do |file|
Zlib::GzipWriter.open('output.gz') do |gz|
gz.mtime = File.mtime(file)
gz.orig_name = File.basename(file)
gz.write IO.binread(file)
end
end
I've tried different variations of this. There doesn't seem to be a howto for "multiple files" online. I keep ending up with the first file name in output.gz, and I think it may have the content of the last file from the directory (not sure). But that's beside the point. I just want to put each file as a separate entity in a compressed file. The more cross-platform compatible it is, the better.
This answer is taken from http://old.thoughtsincomputation.com/posts/tar-and-a-few-feathers-in-ruby who took it from the RubyGems library.
require 'rubygems'
require 'rubygems/package'
require 'zlib'
require 'fileutils'
module Util
module Tar
# Creates a tar file in memory recursively
# from the given path.
#
# Returns a StringIO whose underlying String
# is the contents of the tar file.
def tar(path)
tarfile = StringIO.new("")
Gem::Package::TarWriter.new(tarfile) do |tar|
Dir[File.join(path, "**/*")].each do |file|
mode = File.stat(file).mode
relative_file = file.sub /^#{Regexp::escape path}\/?/, ''
if File.directory?(file)
tar.mkdir relative_file, mode
else
tar.add_file relative_file, mode do |tf|
File.open(file, "rb") { |f| tf.write f.read }
end
end
end
end
tarfile.rewind
tarfile
end
# gzips the underlying string in the given StringIO,
# returning a new StringIO representing the
# compressed file.
def gzip(tarfile)
gz = StringIO.new("")
z = Zlib::GzipWriter.new(gz)
z.write tarfile.string
z.close # this is necessary!
# z was closed to write the gzip footer, so
# now we need a new StringIO
StringIO.new gz.string
end
# un-gzips the given IO, returning the
# decompressed version as a StringIO
def ungzip(tarfile)
z = Zlib::GzipReader.new(tarfile)
unzipped = StringIO.new(z.read)
z.close
unzipped
end
# untars the given IO into the specified
# directory
def untar(io, destination)
Gem::Package::TarReader.new io do |tar|
tar.each do |tarfile|
destination_file = File.join destination, tarfile.full_name
if tarfile.directory?
FileUtils.mkdir_p destination_file
else
destination_directory = File.dirname(destination_file)
FileUtils.mkdir_p destination_directory unless File.directory?(destination_directory)
File.open destination_file, "wb" do |f|
f.print tarfile.read
end
end
end
end
end
end
end
### Usage Example: ###
#
# include Util::Tar
#
# io = tar("./Desktop") # io is a TAR of files
# gz = gzip(io) # gz is a TGZ
#
# io = ungzip(gz) # io is a TAR
# untar(io, "./untarred") # files are untarred
#
First off, that will keep overwriting output.gz, leaving it containing only the last file compressed.
Second, the gzip format does not hold multiple files. It only holds one. You need to use the .tar.gz or .zip format. .zip is more "cross platform compatible". Take a look at rubyzip.

How can I copy the contents of one file to another using Ruby's file methods?

I want to copy the contents of one file to another using Ruby's file methods.
How can I do it using a simple Ruby program using file methods?
There is a very handy method for this, the IO.copy_stream class method. See the output of ri copy_stream.
Example usage:
File.open('src.txt', 'w') do |f| # open for writing; the default mode is read-only
f.puts 'Some text'
end
IO.copy_stream('src.txt', 'dest.txt')
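`IO.copy_stream` also returns the number of bytes copied and accepts an optional byte count and source offset. Here is a small sketch using a temporary directory; the file names and contents are made up for illustration.

```ruby
require 'tmpdir'

copied = nil
partial = nil
Dir.mktmpdir do |dir|
  src  = File.join(dir, 'src.txt')
  dest = File.join(dir, 'dest.txt')
  File.write(src, "Some text\nMore text\n")

  # Copy the whole file; the return value is the number of bytes copied.
  copied = IO.copy_stream(src, dest)

  # Third and fourth arguments: how many bytes to copy and where to start.
  # Here: 9 bytes starting at offset 10, i.e. the second line without its newline.
  IO.copy_stream(src, dest, 9, 10)
  partial = File.read(dest)
end
puts copied
puts partial
```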
For those that are interested, here's a variation of the IO.copy_stream, File.open + block answer(s) (written against ruby 2.2.x, 3 years too late).
copy = Tempfile.new
File.open(file, 'rb') do |input_stream|
File.open(copy, 'wb') do |output_stream|
IO.copy_stream(input_stream, output_stream)
end
end
As a precaution, I would recommend copying through a fixed-size buffer unless you can guarantee that the whole file always fits into memory:
File.open("source", "rb") do |input|
File.open("target", "wb") do |output|
while buff = input.read(4096)
output.write(buff)
end
end
end
Here is my implementation:
class File
def self.copy(source, target)
File.open(source, 'rb') do |infile|
File.open(target, 'wb') do |outfile2|
while buffer = infile.read(4096)
outfile2 << buffer
end
end
end
end
end
Usage:
File.copy sourcepath, targetpath
Here is a simple way of doing that using ruby file operation methods :
source_file, destination_file = ARGV
script = $0
input = File.open(source_file)
data_to_copy = input.read() # gather the data using read() method
puts "The source file is #{data_to_copy.length} bytes long"
output = File.open(destination_file, 'w')
output.write(data_to_copy) # write up the data using write() method
puts "File has been copied"
output.close()
input.close()
You can also use File.exist? to check whether the file exists; it returns true if it does. (File.exists? is a deprecated alias that was removed in Ruby 3.2.)
Here's a fast and concise way to do it.
# Open first file, read it, store it, then close it
input = File.open(ARGV[0]) {|f| f.read() }
# Open second file, write to it, then close it
output = File.open(ARGV[1], 'w') {|f| f.write(input) }
An example for running this would be.
$ ruby this_script.rb from_file.txt to_file.txt
This runs this_script.rb and takes in two arguments through the command line. The first one in our case is from_file.txt (the file being copied from) and the second is to_file.txt (the file being copied to).
You can also use File.binread and File.binwrite if you wish to hold onto the file contents for a bit. (Other answers use an instant copy_stream instead.)
If the contents are not plain text, such as images, File.read and File.write in text mode can corrupt the data on some platforms (for example through newline conversion on Windows), so use the binary variants:
temp_image = Tempfile.new('image.jpg')
actual_img = IO.binread('image.jpg')
IO.binwrite(temp_image, actual_img)
Source: binread, binwrite.
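A minimal round-trip sketch for binary data, assuming nothing beyond the standard library. The sample bytes are the 8-byte PNG signature, chosen because it contains CR and LF bytes that a text-mode conversion could alter; the file name is hypothetical.

```ruby
require 'tmpdir'

# PNG signature: includes 0x0D 0x0A (CR LF), which text mode may translate.
bytes = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A].pack('C*')

round_trip = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.bin')
  File.binwrite(path, bytes)       # write without any newline/encoding conversion
  round_trip = File.binread(path)  # read back as a binary (ASCII-8BIT) string
end
puts round_trip == bytes
```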

How do I copy file contents to another file?

As basic as this seems, I simply can't manage to copy the contents of one file to another. Here is my code thus far:
#!/usr/bin/ruby
Dir.chdir( "/mnt/Shared/minecraft-server/plugins/Permissions" )
flist = Dir.glob( "*" )
flist.each do |mod|
mainperms = File::open( "AwesomeVille.yml" )
if mod == "AwesomeVille.yml"
puts "Shifting to next item..."
shift
else
File::open( mod, "w" ) do |newperms|
newperms << mainperms
end
end
puts "Updated #{ mod } with the contents of #{ mainperms }."
end
Why copy the contents of one file to another? Why not use either the OS to copy the file, or use Ruby's built-in FileUtils.copy_file?
ri FileUtils.copy_file
FileUtils.copy_file
(from ruby core)
------------------------------------------------------------------------------
copy_file(src, dest, preserve = false, dereference = true)
------------------------------------------------------------------------------
Copies file contents of src to dest. Both of src and
dest must be a path name.
A more flexible/powerful alternate is to use Ruby's built-in FileUtils.cp:
ri FileUtils.cp
FileUtils.cp
(from ruby core)
------------------------------------------------------------------------------
cp(src, dest, options = {})
------------------------------------------------------------------------------
Options: preserve noop verbose
Copies a file content src to dest. If dest is a
directory, copies src to dest/src.
If src is a list of files, then dest must be a directory.
FileUtils.cp 'eval.c', 'eval.c.org'
FileUtils.cp %w(cgi.rb complex.rb date.rb), '/usr/lib/ruby/1.6'
FileUtils.cp %w(cgi.rb complex.rb date.rb), '/usr/lib/ruby/1.6', :verbose => true
FileUtils.cp 'symlink', 'dest' # copy content, "dest" is not a symlink
This works for me
IO.copy_stream mainperms, mod
I realize that this isn't the completely approved way, but
IO.readlines(filename).join('') # join with an empty string because readlines includes its own newlines
will load a file into a string, which you can then write to newperms like any other string. There's a good chance the current code fails because you are writing an IO object to the file, and the IO object isn't converted into a string the way you want.
However, another fix might be
newperms << mainperms.read
Also, make sure you close mainperms before the script exits, as it might break something if you don't.
Hope this helps.
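Putting the suggestions in this thread together, here is a sketch of how the original loop could look. It uses `next` to skip the master file (`shift` is not a loop-control keyword) and `FileUtils.cp` to copy the contents. `sync_from_master` is a hypothetical helper name, and the demo runs in a temporary directory with made-up file names rather than the real plugin folder.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: overwrite every other regular file in `dir`
# with the contents of `master_name`. Returns the updated file names.
def sync_from_master(dir, master_name)
  master = File.join(dir, master_name)
  updated = []
  Dir.children(dir).sort.each do |name|
    path = File.join(dir, name)
    next if name == master_name || !File.file?(path) # skip the master itself
    FileUtils.cp(master, path)
    updated << name
  end
  updated
end

result = nil
contents = nil
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'AwesomeVille.yml'), "groups: admin\n")
  File.write(File.join(dir, 'OtherWorld.yml'), "old\n")
  result = sync_from_master(dir, 'AwesomeVille.yml')
  contents = File.read(File.join(dir, 'OtherWorld.yml'))
end
p result
```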

RSpec: how to test file operations and file content

In my app, I have the following code:
File.open "filename", "w" do |file|
file.write("text")
end
I want to test this code via RSpec. What are the best practices for doing this?
I would suggest using StringIO for this and making sure your SUT accepts a stream to write to instead of a filename. That way, different files or outputs can be used (more reusable), including a StringIO (good for testing).
So in your test code (assuming your SUT instance is sutObject and the serializer is named writeStuffTo):
testIO = StringIO.new
sutObject.writeStuffTo testIO
testIO.string.should == "Hello, world!"
StringIO behaves like an open file. So if the code can already work with a File object, it will work with a StringIO.
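A self-contained sketch of that idea, without any test framework. `Greeter` and `write_to` are hypothetical names standing in for the system under test and its serializer.

```ruby
require 'stringio'

# Hypothetical class under test: it writes to whatever IO-like object it is given.
class Greeter
  def write_to(io)
    io.write("Hello, world!")
  end
end

# In production code: File.open("filename", "w") { |file| Greeter.new.write_to(file) }
# In a test, pass a StringIO instead and inspect what was written:
test_io = StringIO.new
Greeter.new.write_to(test_io)
puts test_io.string
```

The design choice here is dependency injection: the method receives the stream rather than opening the file itself, so the test never touches the filesystem.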
For very simple i/o, you can just mock File. So, given:
def foo
File.open "filename", "w" do |file|
file.write("text")
end
end
then:
describe "foo" do
it "should create 'filename' and put 'text' in it" do
file = mock('file')
File.should_receive(:open).with("filename", "w").and_yield(file)
file.should_receive(:write).with("text")
foo
end
end
However, this approach falls flat in the presence of multiple reads/writes: simple refactorings which do not change the final state of the file can cause the test to break. In that case (and possibly in any case) you should prefer @Danny Staple's answer.
This is how to mock File (with rspec 3.4), so you could write to a buffer and check its content later:
it 'How to mock File.open for write with rspec 3.4' do
@buffer = StringIO.new()
@filename = "somefile.txt"
@content = "the content of the file"
allow(File).to receive(:open).with(@filename,'w').and_yield( @buffer )
# call the function that writes to the file
File.open(@filename, 'w') {|f| f.write(@content)}
# reading the buffer and checking its content.
expect(@buffer.string).to eq(@content)
end
You can use fakefs.
It stubs the filesystem and creates files in memory.
You can check whether the file was created with
File.exist? "filename"
You can also just read it with
File.open
and run expectations on its contents.
For anyone who, like me, needs to modify multiple files in multiple directories (e.g. a generator for Rails), I use a temp folder.
Dir.mktmpdir do |dir|
Dir.chdir(dir) do
# Generate a clean Rails folder
Rails::Generators::AppGenerator.start ['foo', '--skip-bundle']
File.open(File.join(dir, 'foo.txt'), 'w') {|f| f.write("write your stuff here") }
expect(File.exist?(File.join(dir, 'foo.txt'))).to eq(true)
end
end
