Issue using System.IO.Compression with a ZIP file containing files larger than 4 GB - ZipArchive

I've just spent a few hours struggling to do a very simple task: enumerate zip file contents using C# and System.IO.Compression.
When I have zip files with small files inside, all is good.
When I use a zip file of 1.2 GB which has a 4.8 GB database file inside, I get "Number of entries expected in End Of Central Directory does not correspond to number of entries in Central Directory." I've read a lot and I can't seem to find a way to work with my archive.
The code is:
string zipPath = @"E:\Path\Filename.zip";
using (ZipArchive archive = ZipFile.OpenRead(zipPath))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        Console.WriteLine(entry.FullName);
    }
}
Is there a way, any way, to use System.IO.Compression with large zip file contents? All I want is to enumerate the contents of the archive, and I do not want to use any 3rd-party bits, as was suggested in other places.

Related

Downloading files as a single .zip on Windows server

A client has a download area where users can download or browse single files. Files are divided into folders, so there are documents, catalogues, newsletters and so on, and their extensions can vary: they can be .pdf, .ai or simple .jpeg. He asked me if I can provide a link to download every item in a specific folder as one big, compressed file. Problem is, I'm on a Windows server, so I'm a bit clueless whether there's a way. I can edit the pages of this area, so I can include jQuery and scripts with a little freedom. Any hint?
The archiver built into Windows is TAR, and you need to build a tarball (historically, all related files in one Tape ARchive).
I have a file server which is mapped as S:\ (it does not have the TAR command itself, and tar cannot use a URL, but it can use a mapped device such as S:).
For any folder's contents (including subfolders) it is easy to remotely save all current files in a zip with a single command (for multiple root locations you need a loop or a list).
It will build the tape archive as a Windows .zip using the -a (auto) switch, but you need to consider the desired level of nesting by collecting all contents at the desired root location.
tar -a [other options] -cf file.zip [folder / files]
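For example, a minimal sketch run from a Windows command prompt; the S:\downloads path and the catalogues folder are hypothetical stand-ins for your own layout:

rem -a picks the zip format from the .zip extension, -c creates, -f names the output
tar -a -cf S:\downloads\all.zip -C S:\downloads catalogues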
Points to watch out for:
ensure there is not an older archive already at the target location
it may print errors/warnings during the run; however, it should complete without failing.
Once you have the zip file you can post it as a web asset, such as:
<a href="\\server\folder\all.zip" download="all.zip">Get All</a>
for other notes see https://stackoverflow.com/a/68728992/10802527

Moving files inside a tar archive

I have a script that archives a mongo collection:
archive.tar.gz contains:
folder/file.bson
and I need to add an additional top-level folder to that structure, for example:
top-folder/folder/file.bson
It seems that one way is to unpack and re-pack everything, but is there any other solution to this?
The problem is that there's a third-party script that unpacks the archive and fetches the files from top-folder/folder/file.bson, and in the current format the path is wrong.
.tar.gz is actually what the name suggests: first tar converts a directory structure to a byte stream (i.e. a single file), and this byte stream is then compressed by gzip.
This means that changing a file path inside the archive is equivalent to byte-editing a compressed data stream - an unnecessarily difficult thing to do without decompressing the stream.
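So in practice the answer is to unpack and repack. A minimal shell sketch, assuming GNU tar and the file names from the question, with top-folder as the prefix to add:

mkdir top-folder
tar -xzf archive.tar.gz -C top-folder      # unpack under the new prefix
tar -czf archive-new.tar.gz top-folder     # repack; every path now starts with top-folder/

GNU tar also has a --transform option that rewrites member paths while creating or extracting, which avoids the intermediate directory, but the unpack-repack route above is the most portable.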

Make: Dependency on newest file in directory

For a small project, I have the following workflow:
compile code and generate ./data and ./images
run code, which will write many files to ./data
generate images from the data files, place them in ./images
generate a video from the images
I have written a makefile which can run the code, and compile it beforehand if necessary. But I don't know how to implement the dependencies of steps 3 and 4, and currently I run those targets manually.
So, is there a way to check if e.g. the newest file in ./data is newer than the newest file in ./images? It's not necessary to do this on a file-by-file basis, and the total number of data/image files is not known.
Typically a directory's timestamp is updated when a file inside it is added, removed, or renamed, so you can use the timestamp on the directory itself for dependencies:
images: data
	# recipe to generate the images goes here
Alternatively, if there is a mapping between the files in the two directories, you could do something like:
images/%.img: data/%.dat
	# recipe to generate one image from one data file
which would prevent reprocessing data that's already been handled.
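Putting the pattern rule together with the video step, here is a minimal sketch of the whole pipeline; the .dat/.img extensions and the generate-* commands are placeholders for whatever the project actually uses:

DATA   := $(wildcard data/*.dat)
IMAGES := $(patsubst data/%.dat,images/%.img,$(DATA))

# Step 4: the video depends on every image, so it is rebuilt
# whenever any image is newer than it.
video.mp4: $(IMAGES)
	generate-video $(IMAGES) -o $@   # placeholder command

# Step 3: one image per data file; "| images" is an order-only
# prerequisite, so the directory's changing timestamp does not
# force needless rebuilds.
images/%.img: data/%.dat | images
	generate-image $< $@             # placeholder command

images:
	mkdir -p images

With explicit file lists like this, make compares each image against its own data file and the video against all images, so nothing is reprocessed unnecessarily.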

How to restore a folder structure 7Zip'd with split volume option?

I 7-Zipped a multi-gig folder, which contained many folders each with many files, using the split-to-volumes (9 MB) option. 7-Zip created files of type .zip.001, .zip.002, etc. When I extract .001 it appears to work correctly, but I get an 'unexpected end of data' error. 7-Zip does not automatically go to .002. When I extract .002, it also gives the same error, and it does not continue the original folder/file structure. Instead it extracts a zip file in the same folder as the previously extracted files. How do I properly extract split files to obtain the original folder/file structure? Thank you.

What is the fastest way to unzip textfiles in Matlab during a function?

I would like to scan the text of text files in MATLAB with the textscan function. Before I can open the text file with fid = fopen('C:\path'), I need to unzip the files first. The files have the extension *.gz.
There are thousands of files which I need to analyze and high performance is important.
I have two ideas:
(1) Use an external program and call it from the command line in MATLAB
(2) Use a MATLAB 'zip' toolbox. I have heard of gunzip, but don't know about its performance.
Does anyone know a way to unzip these files as quickly as possible from within MATLAB?
Thanks!
You could always try the Matlab unzip() function:
unzip
Extract contents of zip file
Syntax
unzip(zipfilename)
unzip(zipfilename, outputdir)
unzip(url, ...)
filenames = unzip(...)
Description
unzip(zipfilename) extracts the archived contents of zipfilename into the current folder, setting the files' attributes and preserving the timestamps. It overwrites any existing files with the same names as those in the archive if the existing files' attributes and ownerships permit it. For example, if you rerun unzip on the same zipfilename, files that have a read-only attribute are not overwritten; instead, unzip issues a warning for such files.
Internally, this uses Java's zip library org.apache.tools.zip. If your zip archives each contain many text files, it might be faster to drop down into Java and extract them entry by entry, without explicitly creating unzipped files on disk. Look at the source of unzip.m to get some ideas, and also the Java documentation.
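Since the files in question are .gz rather than .zip, the same drop-to-Java idea applies via java.util.zip.GZIPInputStream. A minimal sketch; the file path is hypothetical, and IOUtils comes from Apache commons-io, which has shipped on MATLAB's static Java classpath in many releases (if it is not available, fall back to MATLAB's own gunzip):

% Decompress one .gz file entirely in memory, with no temp file on disk.
in    = java.util.zip.GZIPInputStream(java.io.FileInputStream('C:\path\file.gz'));
bytes = org.apache.commons.io.IOUtils.toByteArray(in);        % int8 column vector
in.close();
txt   = native2unicode(typecast(bytes, 'uint8')', 'UTF-8');   % raw bytes -> char

The resulting char array can then be passed to textscan directly in place of a file identifier.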
I've found 7zip-commandline(Windows) / p7zip(Unix) to be somewhat speedier for this.
[edit] From some quick testing, it seems making a system call to gunzip is faster than using MATLAB's native gunzip. You could give that a try as well.
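For instance, a minimal sketch, assuming a gunzip binary on the system path; the -k flag (keep the original .gz) needs gzip 1.6 or newer, and fname is a hypothetical variable holding the .gz path:

% decompress next to the original, keeping the .gz and overwriting any existing output
status = system(['gunzip -k -f "' fname '"']);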
Just write a new function that imitates basic MATLAB gunzip functionality:
function [] = sunzip(fullfilename,output_dir)
if ~exist('output_dir','var'), output_dir = fileparts(fullfilename); end
app_path = '/usr/bin/7za';
switches = ' e'; %extract files ignoring directory structure
options = [' -o' output_dir];
system([app_path switches options '_' fullfilename]);
Then use it as you would use gunzip:
sunzip('/data/time_1000.out.gz',tmp_dir);
With MATLAB's toc timer, I get the following extraction times with 6 uncompressed 114MB ASCII files:
gunzip: 10.15s
sunzip: 7.84s
This worked well; it just needed a minor change to Max's syntax for calling the executable:
system([app_path switches ' ' fullfilename options ]);
