How to split/rejoin a zip file using Ruby

I am new to Ruby. Is there any way to split a large zip file and then join the split files back into one large zip file?
I found a link with a split sample, but I get an error when running it (split object error).
split sample link
Can anyone help me split and join zip files in Ruby?

Zip::ZipFile.split isn't available in rubyzip 0.9.9, the latest released version. It exists only in the master branch of the source code. If you just need to split a large file into small parts and join them later, and you don't need the intermediate parts to be usable on their own, you can use the Unix/Linux split command, e.g. when you want to copy the parts onto a USB drive and join them on another computer.
# each part will contain 1048576 bytes
# the file will be split into xaa, xab, xac...
# you can add an optional prefix for the part names at the end of the command
split -b 1048576 large_input_file.zip
# join them somewhere later
cat x* >large_input_file.zip
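The same byte-level split and join can be sketched in plain Ruby, which may help where split and cat are not available. This is only a minimal sketch: split_file and join_files are hypothetical helper names (not part of rubyzip), the ".partNNN" naming is an assumption, and each part is read fully into memory, so keep the part size modest.

```ruby
# Hypothetical helpers (not part of rubyzip): split a file into fixed-size
# parts and join the parts back, using only the standard library.

# Writes path.part000, path.part001, ... and returns the part paths in order.
def split_file(path, part_size)
  parts = []
  File.open(path, "rb") do |input|
    index = 0
    # IO#read returns nil at end of file, which ends the loop.
    while (chunk = input.read(part_size))
      part_path = format("%s.part%03d", path, index)
      File.binwrite(part_path, chunk)
      parts << part_path
      index += 1
    end
  end
  parts
end

# Concatenates the parts, in the order given, into output_path.
def join_files(part_paths, output_path)
  File.open(output_path, "wb") do |output|
    part_paths.each do |part|
      File.open(part, "rb") { |input| IO.copy_stream(input, output) }
    end
  end
  output_path
end
```

For example, split_file("large_input_file.zip", 1_048_576) would produce 1 MiB parts, and join_files applied to the returned list rebuilds the original file. As with split, the parts are not valid zip files on their own.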
The rubyzip gem provides a way to create multi-part zip files from a large zip file. You can use p7zip or WinRAR to unzip the split zip file parts. However, it's strange that unzip doesn't support multi-part zip files. The manual of unzip says,
Multi-part archives are not yet supported, except in conjunction with zip. (All parts must be concatenated together in order, and then "zip -F" (for zip 2.x) or "zip -FF" (for zip 3.x) must be performed on the concatenated archive in order to "fix" it. Also, zip 3.0 and later can combine multi-part (split) archives into a combined single-file archive using "zip -s- inarchive -O outarchive". See the zip 3 manual page for more information.) This will definitely be corrected in the next major release.
If you want this, you can clone the latest master branch and use that lib to do the job.
$ git clone https://github.com/aussiegeek/rubyzip.git
$ vim split.rb
Then in your ruby file "split.rb":
$:.unshift './rubyzip/lib'
require 'zip/zip'
part_zip_count = Zip::ZipFile.split("large_zip_file.zip", 102400, false)
puts "Zip file split into #{part_zip_count} parts"
You can check out the docs for split.

Related

Extracting specific files with file extension from a .tar.xz archive using MacOS terminal

I have a number of compressed archives with the extension .tar.xz. I am advised that, when decompressed, the total size required is around 2TB.
Within the archives are a number of images that I am solely after.
Is there a way to extract only certain files, for example those with the extensions .jpeg and .gif, from the compressed archives without having to extract every file?
Thanks
It's trivial to extract just one of the file types; for example:
tar -xJf archive.tar.xz '*.jpeg'
will extract all files with the .jpeg extension. It's important to quote the *, as otherwise the shell would attempt to expand it, and tar would then only try to match the files that the shell found in the current directory (or would be passed the literal pattern if no files matched).
You can similarly use other patterns like '*.gif', or both together:
tar -xJf archive.tar.xz '*.jpeg' '*.gif'
Because you tagged the question as OS X, I've left out the --wildcards option, which GNU tar on Linux needs in order to treat these arguments as patterns.

how to get `zip` to add .zip extension when the filename is constructed from a variable

If I run...
$ myTest="bar"
$ zip -r foo-${myTest} path/*
...then I get a zip file named foo-bar.zip. (note the .zip extension!) However, if I run...
$ myTest="1.0.1"
$ zip -r foo-${myTest} path/*
...then I get a zip file named foo-1.0.1. (no .zip extension!)
I can obviously add .zip to my script, but I would like to understand what is going on here. Why doesn't zip add the extension when the filename is built from a variable with numbers in it?
It dawned on me as I wrote that last question that this isn't about numbers. Quoting from man zip:
If the name of the zip archive does not contain an extension, the extension .zip is added. If the name already contains an extension other than .zip, the existing extension is kept unchanged. However, split archives (archives split over multiple files) require the .zip extension on the last split.
The problem is that I have .'s in the variable, which zip interprets as filename extensions. Luckily, my script constructs the variable with .'s so I can confidently add .zip to the end. Otherwise, I would need to test for .'s to name the file correctly.
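If the archive name is built in a script, the check can be made explicit. The following is a minimal Ruby sketch: zip_archive_name is a hypothetical helper, and File.extname is used only as an approximation of how zip decides whether a name already has an extension.

```ruby
# Hypothetical helper: return an archive name that always ends in .zip,
# even when the base name contains dots (e.g. a version number).
def zip_archive_name(base)
  File.extname(base).downcase == ".zip" ? base : "#{base}.zip"
end

# File.extname shows why "foo-1.0.1" looks like it already has an extension.
puts File.extname("foo-bar").inspect    # prints ""
puts File.extname("foo-1.0.1").inspect  # prints ".1"
puts zip_archive_name("foo-1.0.1")      # prints foo-1.0.1.zip
```

Passing the result to zip avoids relying on zip's own extension rule.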

Extracting contents of many zipped folders into a single directory

Kind of easy question, but I can't find the answer. I want to extract the contents of multiple zipped folders into a single directory. I am using the bash console, which is the only tool available on the particular website I am using.
For example, I have two zipped folders: a.zip (which contains a1.txt and a2.txt) and b.zip (which contains b1.txt and b2.txt). I want to extract all four text files into a single directory.
I have tried
unzip \*.zip -d \newdirectory
But it creates two directories (a and b) with two text files in each.
I also tried concatenating the two zipped folders into one big folder and extracting it, but it still creates two directories, even when I specify a new directory.
I can't figure out what I am doing wrong. Any help?
Thanks in advance!
Use the -j option to junk paths, i.e. to ignore any directory structure stored in the archives.
unzip -j -d /path/to/your/directory '*.zip'

Replacing a file in a zip archive

Using Ruby (1.9.3) I need to replace a single file in a zip archive.
The situation is as follows. I have ~1000 zip archives that need to be updated, specifically one file in each of them needs to be replaced. The archives are all of the same structure. Is there a quick and dirty way for Ruby, or a library/gem for Ruby, to simply say "replace the file in this zip archive with this file on the filesystem"?
I'll work on a solution of my own in the meantime.
You can use the zip command, called from Ruby, which will probably be the best solution. From the zip manpage:
-d
--delete
Remove (delete) entries from a zip archive. For example:
zip -d foo foo/tom/junk foo/harry/\* \*.o
will remove the entry foo/tom/junk, all of the files that start with foo/harry/, and all of the files that end with .o (in any path). Note that shell pathname expansion has been inhibited with backslashes, so that zip can see the asterisks, enabling zip to match on the contents of the zip archive instead of the contents of the current directory. (The backslashes are not used on MSDOS-based platforms.) Can also use quotes to escape the asterisks as in
zip -d foo foo/tom/junk "foo/harry/*" "*.o"
Not escaping the asterisks on a system where the shell expands wildcards could result in the asterisks being converted to a list of files in the current directory and that list used to delete entries from the archive.
Under MSDOS, -d is case sensitive when it matches names in the zip archive. This requires that file names be entered in upper case if they were zipped by PKZIP on an MSDOS system. (We considered making this case insensitive on systems where paths were case insensitive, but it is possible the archive came from a system where case does matter and the archive could include both Bar and bar as separate files in the archive.) But see the new option -ic to ignore case in the archive.
If you want a pure Ruby solution, take a look at ZipFileSystem.
Zip::ZipFile looks promising. It appears to have a way to delete and add files to a zip archive.
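For the "call zip from Ruby" route, a separate delete step is not strictly needed: by default zip updates an existing entry when it is given a file whose path matches the entry's name. A minimal sketch, assuming the zip command is installed and on the PATH; replace_in_zip, source_root and entry_path are hypothetical names introduced only for illustration.

```ruby
# Hypothetical helper: replace one entry in an existing zip archive by
# calling the external zip command (assumed to be installed and on the PATH).
#
# archive     - path to the zip archive to update
# source_root - directory that contains the replacement file
# entry_path  - path of the file relative to source_root; this must equal
#               the entry's name inside the archive, e.g. "data/config.txt"
def replace_in_zip(archive, source_root, entry_path)
  # Resolve the archive path first, because the working directory changes.
  archive = File.expand_path(archive)
  # zip stores the path as given on the command line, so run it from
  # source_root to make the stored name equal entry_path.
  Dir.chdir(source_root) do
    # Arguments are passed as a list, so no shell quoting is involved.
    system("zip", "-q", archive, entry_path) or
      raise "zip failed for #{archive}"
  end
end
```

For roughly 1000 archives, this could be called once per archive in a loop, e.g. over Dir.glob("archives/*.zip"), where that directory layout is again only an assumption.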

What is the fastest way to unzip textfiles in Matlab during a function?

I would like to scan text of textfiles in Matlab with the textscan function. Before I can open the textfile with fid = fopen('C:\path'), I need to unzip the files first. The files have the extension: *.gz
There are thousands of files which I need to analyze and high performance is important.
I have two ideas:
(1) Use an external program and call it from the command line in Matlab
(2) Use a Matlab 'zip' toolbox. I have heard of gunzip, but don't know about its performance.
Does anyone know a way to unzip these files as quickly as possible from within Matlab?
Thanks!
You could always try the Matlab unzip() function:
unzip
Extract contents of zip file
Syntax
unzip(zipfilename)
unzip(zipfilename, outputdir)
unzip(url, ...)
filenames = unzip(...)
Description
unzip(zipfilename) extracts the archived contents of zipfilename into the current folder and sets the files' attributes, preserving the timestamps. It overwrites any existing files with the same names as those in the archive if the existing files' attributes and ownerships permit it. For example, files from rerunning unzip on the same zip filename do not overwrite any of those files that have a read-only attribute; instead, unzip issues a warning for such files.
Internally, this uses the Java zip library org.apache.tools.zip. If your zip archives each contain many text files, it might be faster to drop down into Java and extract them entry by entry, without explicitly unzipping the files. Look at the source of unzip.m to get some ideas, and also at the Java documentation.
I've found 7zip-commandline(Windows) / p7zip(Unix) to be somewhat speedier for this.
[edit]From some quick testing, it seems making a system call to gunzip is faster than using MATLAB's native gunzip. You could give that a try as well.
Just write a new function that imitates basic MATLAB gunzip functionality:
function [] = sunzip(fullfilename,output_dir)
% Extract an archive with 7za; output_dir defaults to the archive's folder.
if ~exist('output_dir','var'), output_dir = fileparts(fullfilename); end
app_path = '/usr/bin/7za';
switches = ' e'; % extract files ignoring directory structure
options = [' -o' output_dir]; % 7za expects the output folder directly after -o
system([app_path switches options ' ' fullfilename]);
Then use it as you would use gunzip:
sunzip('/data/time_1000.out.gz',tmp_dir);
With MATLAB's toc timer, I get the following extraction times with 6 uncompressed 114MB ASCII files:
gunzip: 10.15s
sunzip: 7.84s
This worked well; I just needed a minor change to Max's syntax for calling the executable:
system([app_path switches ' ' fullfilename options ]);
