Compress a complete directory in Ruby with zlib? - ruby

This is the code I'm trying.
require 'zlib'
Dir.glob('*.*').each do |file|
Zlib::GzipWriter.open('output.gz') do |gz|
gz.mtime = File.mtime(file)
gz.orig_name = File.basename(file)
gz.write IO.binread(file)
end
end
I've tried different variations of this. There doesn't seem to be a how-to for "multiple files" online. I keep ending up with the first file name in output.gz, and I think it may have the content of the last file from the directory (not sure). But that's beside the point. I just want to put each file as a separate entity in a compressed file. The more cross-platform compatible it is, the better.

This answer is taken from http://old.thoughtsincomputation.com/posts/tar-and-a-few-feathers-in-ruby who took it from the RubyGems library.
require 'rubygems'
require 'rubygems/package'
require 'zlib'
require 'fileutils'
module Util
module Tar
# Creates a tar file in memory recursively
# from the given path.
#
# Returns a StringIO whose underlying String
# is the contents of the tar file.
def tar(path)
tarfile = StringIO.new("")
Gem::Package::TarWriter.new(tarfile) do |tar|
Dir[File.join(path, "**/*")].each do |file|
mode = File.stat(file).mode
relative_file = file.sub /^#{Regexp::escape path}\/?/, ''
if File.directory?(file)
tar.mkdir relative_file, mode
else
tar.add_file relative_file, mode do |tf|
File.open(file, "rb") { |f| tf.write f.read }
end
end
end
end
tarfile.rewind
tarfile
end
# gzips the underlying string in the given StringIO,
# returning a new StringIO representing the
# compressed file.
def gzip(tarfile)
gz = StringIO.new("")
z = Zlib::GzipWriter.new(gz)
z.write tarfile.string
z.close # this is necessary!
# z was closed to write the gzip footer, so
# now we need a new StringIO
StringIO.new gz.string
end
# un-gzips the given IO, returning the
# decompressed version as a StringIO
def ungzip(tarfile)
z = Zlib::GzipReader.new(tarfile)
unzipped = StringIO.new(z.read)
z.close
unzipped
end
# untars the given IO into the specified
# directory
def untar(io, destination)
Gem::Package::TarReader.new io do |tar|
tar.each do |tarfile|
destination_file = File.join destination, tarfile.full_name
if tarfile.directory?
FileUtils.mkdir_p destination_file
else
destination_directory = File.dirname(destination_file)
FileUtils.mkdir_p destination_directory unless File.directory?(destination_directory)
File.open destination_file, "wb" do |f|
f.print tarfile.read
end
end
end
end
end
end
end
### Usage Example: ###
#
# include Util::Tar
#
# io = tar("./Desktop") # io is a TAR of files
# gz = gzip(io) # gz is a TGZ
#
# io = ungzip(gz) # io is a TAR
# untar(io, "./untarred") # files are untarred
#

First off, that will keep overwriting output.gz, leaving it containing only the last file compressed.
Second, the gzip format does not hold multiple files. It only holds one. You need to use the .tar.gz or .zip format. .zip is more "cross platform compatible". Take a look at rubyzip.
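As a rough sketch of the .tar.gz route using only the standard library (Zlib plus the TarWriter class bundled with RubyGems, which the earlier answer also relies on), something like the following could stream every file of a directory into a single archive on disk. The helper name tar_gz_directory and its arguments are made up for illustration, it assumes Ruby 2.5 or newer for the base: option of Dir.glob, and the '**/*' pattern skips dotfiles by default.

```ruby
require 'rubygems/package'
require 'zlib'

# Hypothetical helper: packs every regular file under +dir+ into one .tar.gz
# written to +output_path+. Entry names are stored relative to +dir+.
def tar_gz_directory(dir, output_path)
  Zlib::GzipWriter.open(output_path) do |gz|
    Gem::Package::TarWriter.new(gz) do |tar|
      Dir.glob('**/*', base: dir).sort.each do |relative|
        full = File.join(dir, relative)
        next unless File.file?(full)

        # add_file_simple takes the size up front, so it works on a
        # non-seekable stream such as a GzipWriter.
        mode = File.stat(full).mode & 0o777
        tar.add_file_simple(relative, mode, File.size(full)) do |entry|
          File.open(full, 'rb') do |f|
            while (chunk = f.read(64 * 1024))
              entry.write(chunk)
            end
          end
        end
      end
    end
  end
  output_path
end
```

Usage would be along the lines of tar_gz_directory('.', '/tmp/output.tar.gz'); writing the archive outside the directory being packed avoids the archive trying to include itself.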

Related

RubyZip docx issues with write_buffer instead of open

I'm adapting the RubyZip recursive zipping example to work with write_buffer instead of open, and I'm running into a host of issues. I'm doing this because the zip archive I'm producing has Word documents in it, and I'm getting errors when opening those Word documents. Therefore, I'm trying the workaround that RubyZip suggests, which is to use write_buffer instead of open.
The problem is, I'm getting errors because I'm using an absolute path, but I'm not sure how to get around that. I'm getting the error "#//', name must not start with />"
Second, I'm not sure what to do to mitigate the issue with word documents. When I used my original code, which worked and created an actual zip file, any word document in that zip file had the following error upon opening: "Word found unreadable content in Do you want to recover the contents of this document? If you trust the source of this document, click Yes." The unreadable content error is the reason why I went down the road of attempting to use write_buffer.
Any help would be appreciated.
Here is the code that I'm currently using:
require 'zip'
require 'zip/zipfilesystem'
module AdvisoryBoard
class ZipService
def initialize(input_dir, output_file)
@input_dir = input_dir
@output_file = output_file
end
# Zip the input directory.
def write
entries = Dir.entries(@input_dir) - %w[. ..]
path = ""
buffer = Zip::ZipOutputStream.write_buffer do |zipfile|
entries.each do |e|
zipfile_path = path == '' ? e : File.join(path, e)
disk_file_path = File.join(@input_dir, zipfile_path)
@file = nil
@data = nil
if !File.directory?(disk_file_path)
@file = File.open(disk_file_path, "r+b")
@data = @file.read
unless [@output_file, @input_dir].include?(e)
zipfile.put_next_entry(e)
zipfile.write @data
end
@file.close
end
end
zipfile.put_next_entry(@output_file)
zipfile.put_next_entry(@input_dir)
end
File.open(@output_file, "wb") { |f| f.write(buffer.string) }
end
end
end
I was able to get word documents to open without any warnings or corruption! Here's what I ended up doing:
require 'nokogiri'
require 'zip'
require 'zip/zipfilesystem'
class ZipService
# Initialize with the directory to zip and the location of the output archive.
def initialize(input_dir, output_file)
@input_dir = input_dir
@output_file = output_file
end
# Zip the input directory.
def write
entries = Dir.entries(@input_dir) - %w[. ..]
::Zip::File.open(@output_file, ::Zip::File::CREATE) do |zipfile|
write_entries entries, '', zipfile
end
end
private
# A helper method to make the recursion work.
def write_entries(entries, path, zipfile)
entries.each do |e|
zipfile_path = path == '' ? e : File.join(path, e)
disk_file_path = File.join(@input_dir, zipfile_path)
if File.directory? disk_file_path
recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
else
put_into_archive(disk_file_path, zipfile, zipfile_path, e)
end
end
end
def recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
zipfile.mkdir zipfile_path
subdir = Dir.entries(disk_file_path) - %w[. ..]
write_entries subdir, zipfile_path, zipfile
end
def put_into_archive(disk_file_path, zipfile, zipfile_path, entry)
if File.extname(zipfile_path) == ".docx"
Zip::File.open(disk_file_path) do |zip|
doc = zip.read("word/document.xml")
xml = Nokogiri::XML.parse(doc)
zip.get_output_stream("word/document.xml") {|f| f.write(xml.to_s)}
end
zipfile.add(zipfile_path, disk_file_path)
else
zipfile.add(zipfile_path, disk_file_path)
end
end
end

Create a zip archive without saving the archived file to disk in Ruby

I am trying to create a zip archive without first saving the file being archived to disk. First, I wrote a method that does save it to disk:
begin
file = Zip::File.open("#{file_name}.zip", Zip::File::CREATE)
save_file file_name
file.add(file_name, file_name)
rescue IOError => e
puts "Error: #{e}"
ensure
file.close unless file.nil?
File.delete file_name
end
This works fine, but it has to save the file being archived to disk before creating the archive.
Second, I tried the code below. It first creates the zip archive in a StringIO containing the file I need, but then I can't save it to disk in binary mode:
string_io = Zip::OutputStream.write_buffer do |zos|
zos.put_next_entry(file_name)
zos.write dictionary.join(', ')
end
# Something wrong below
File.open("#{file_name}.zip", 'wb') do |file|
file.write string_io
file.close
end
What am I doing wrong, and what is the right way to do it?
Found!
string_io = Zip::OutputStream.write_buffer do |zos|
zos.put_next_entry(file_name)
zos.write dictionary.join(', ')
end
# Write the buffer's bytes to the file in binary mode
# (IO.write would open the file in text mode)
IO.binwrite("#{file_name}.zip", string_io.string)

Uncompress a .gz file and read it chunk by chunk

I have a number of quite big .gz files that I want to read. I don't want to read each file all at once because that may exhaust RAM; instead I want to read it chunk by chunk. How can I do that? The documentation only describes the traditional approach of reading a whole file:
Zlib::GzipReader.open('hoge.gz') do |gz|
print gz.read
end
File.open('hoge.gz') do |f|
gz = Zlib::GzipReader.new(f)
print gz.read
gz.close
end
I couldn't find an example of this anywhere, so I had to read the documentation.
require 'zlib'
def read_gz_by_chunk
  # The block form closes the reader and the underlying file when done
  Zlib::GzipReader.open("file_name.gz") do |rgz|
    until rgz.eof?
      data = rgz.readpartial(256) # up to 256 decompressed bytes
      # do stuff
      puts data
    end
  end
end
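The same idea can be wrapped in a small reusable helper. This is only a sketch: the name each_gz_chunk and the default chunk size are made up for illustration. It relies on Zlib::GzipReader#read(length) returning nil once the end of the stream is reached, so the loop needs no explicit eof? check.

```ruby
require 'zlib'

# Hypothetical helper: yields the decompressed contents of the gzip file at
# +path+ in pieces of at most +chunk_size+ bytes, never holding more than
# one piece in memory at a time.
def each_gz_chunk(path, chunk_size = 64 * 1024)
  Zlib::GzipReader.open(path) do |gz|
    # read(n) returns nil at end of stream, which ends the loop
    while (chunk = gz.read(chunk_size))
      yield chunk
    end
  end
end
```

A call such as each_gz_chunk('hoge.gz') { |chunk| process(chunk) } would then hand each piece to the caller's own (here hypothetical) process method.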

Ruby: check if a .zip file exists, and extract

2 small questions to create the effect I'm looking for.
How do I check if a file exists within a directory with the extension of .zip?
If it does exist I need to make a folder with the same name as the .zip without the .zip extension for the folder.
Then I need to extract the files into the folder.
Secondly, what do I do if there are more than one .zip files in the folder?
I'm doing something like this and trying to put it into ruby
`mkdir fileNameisRandom`
`unzip fileNameisRandom.zip -d fileNameisRandom`
On a similar post I found something like
Dir.entries("#{Dir.pwd}").select {|f| File.file? f}
which I know checks all files within a directory and makes sure they are a file.
The problem is I don't know how to restrict this to files with a .zip extension.
Also, I found the glob function, which can match file names by extension: http://ruby-doc.org/core-1.9.3/Dir.html
How do I check that such a file exists in that case, so that I can print an error if it doesn't?
From the comment I now have
if Dir['*.zip'].first == nil #check to see if any exist
puts "A .zip file was not found"
elsif Dir['*.zip'].select {|f| File.file? f} then #ensure each of them are a file
#use a foreach loop to go through each one
Dir['*.zip'].select.each do |file|
puts "#{file}"
end ## end for each loop
end
Here's a way of doing this with less branching:
# prepare the data
zips = Dir['*.zip'].select { |f| File.file?(f) }
# check if data is sane
if zips.empty?
puts "No zips"
exit 0 # or return
end
# process data
zips.each do |z|
end
This pattern is easier to follow for fellow programmers.
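To illustrate what the body of that loop might do for the folder-naming part of the question, here is a minimal sketch. The helper name prepare_zip_folders is hypothetical, and it only creates the target folders; the extraction itself (via unzip or a gem) is left to the caller.

```ruby
require 'fileutils'

# Hypothetical helper: for each .zip file directly inside +dir+, creates a
# folder named after the archive without its extension, and returns a hash
# mapping each zip path to its folder path.
def prepare_zip_folders(dir)
  zips = Dir[File.join(dir, '*.zip')].select { |f| File.file?(f) }.sort
  zips.each_with_object({}) do |zip, folders|
    # File.basename with a suffix argument strips the ".zip" extension
    folder = File.join(dir, File.basename(zip, '.zip'))
    FileUtils.mkdir_p(folder)
    folders[zip] = folder
  end
end
```

An empty hash signals that no .zip file was found, so the caller can print an error in that case.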
You can also do it using a ruby gem called rubyzip
Gemfile:
source 'https://rubygems.org'
gem 'rubyzip'
run bundle
unzip.rb:
require 'zip'
zips = Dir['*.zip'].select { |f| File.file?(f) }
if zips.empty?
puts "No zips"
exit 0 # or return
end
zips.each do |zip|
Zip::File.open(zip) do |files|
files.each do |file|
# write file somewhere
# see here https://github.com/rubyzip/rubyzip
end
end
end
I finally pieced together different information from tutorials and used @rogerdpack and his comment for help.
require 'rubygems/package'
#require 'zlib'
require 'fileutils'
#move to the unprocessed directory to unpack the files
#if a .tgz file exists
#take all .tgz files
#make a folder with the same name
#put all contained folders from .tgz file inside of similarly named folder
#Dir.chdir("awaitingApproval/")
if Dir['*.zip'].first == nil #check to see if any exist, I use .first because Dir[] returns an array
puts "A .zip file was not found"
elsif Dir['*.zip'].select {|f| File.file? f} then #ensure each of them are a file
#use a foreach loop to go through each one
Dir['*.zip'].select.each do |file|
puts "" #newline for each file
puts "#{file}" #print out file name
#next line based on `mkdir fileNameisRandom`
`mkdir #{Dir.pwd}/awaitingValidation/#{ File.basename(file, File.extname(file)) }`
#next line based on `unzip fileNameisRandom.zip -d fileNameisRandom`
placement = "awaitingValidation/" + File.basename(file, File.extname(file))
puts "#{placement}"
`sudo unzip #{file} -d #{placement}`
puts "Unzip complete"
end ## end for each loop
end

How can I copy the contents of one file to another using Ruby's file methods?

I want to copy the contents of one file to another using Ruby's file methods.
How can I do it using a simple Ruby program using file methods?
There is a very handy method for this, IO.copy_stream (a class method); see the output of ri copy_stream.
Example usage:
File.open('src.txt', 'w') do |f|
  f.puts 'Some text'
end
IO.copy_stream('src.txt', 'dest.txt')
For those who are interested, here's a variation of the IO.copy_stream and File.open + block answers (written against Ruby 2.2.x, 3 years too late).
require 'tempfile'
copy = Tempfile.new('copy')
File.open(file, 'rb') do |input_stream|
  File.open(copy.path, 'wb') do |output_stream|
    IO.copy_stream(input_stream, output_stream)
  end
end
As a precaution, I would recommend copying through a buffer unless you can guarantee that the whole file always fits into memory:
File.open("source", "rb") do |input|
File.open("target", "wb") do |output|
while buff = input.read(4096)
output.write(buff)
end
end
end
Here is my implementation:
class File
def self.copy(source, target)
File.open(source, 'rb') do |infile|
File.open(target, 'wb') do |outfile2|
while buffer = infile.read(4096)
outfile2 << buffer
end
end
end
end
end
Usage:
File.copy sourcepath, targetpath
Here is a simple way of doing that using ruby file operation methods :
source_file, destination_file = ARGV
script = $0
input = File.open(source_file)
data_to_copy = input.read() # gather the data using read() method
puts "The source file is #{data_to_copy.length} bytes long"
output = File.open(destination_file, 'w')
output.write(data_to_copy) # write up the data using write() method
puts "File has been copied"
output.close()
input.close()
You can also use File.exist? to check whether the file exists; it returns true if it does. (The older alias File.exists? is deprecated and was removed in Ruby 3.2.)
Here's a fast and concise way to do it.
# Open first file, read it, store it, then close it
input = File.open(ARGV[0]) {|f| f.read() }
# Open second file, write to it, then close it
output = File.open(ARGV[1], 'w') {|f| f.write(input) }
An example for running this would be.
$ ruby this_script.rb from_file.txt to_file.txt
This runs this_script.rb and takes in two arguments through the command line. The first one in our case is from_file.txt (text being copied from) and the second is to_file.txt (text being copied to).
You can also use File.binread and File.binwrite if you wish to hold onto the file contents for a bit. (Other answers use an instant copy_stream instead.)
If the contents are not plain text, such as images, File.read and File.write in the default text mode may corrupt the data on some platforms (notably Windows), so use the binary variants:
require 'tempfile'
temp_image = Tempfile.new('image.jpg')
actual_img = IO.binread('image.jpg')
IO.binwrite(temp_image.path, actual_img)
Source: binread, binwrite.
