When I use 7z to unzip a 50 GB archive that contains about 600,000 files, the extraction speed drops dramatically as the number of extracted files increases. It takes approximately 20 hours to unzip the entire batch.
I wrote a Linux script that runs in parallel to the 7z process. It moves unzipped files to another directory so that the 7z target directory never accumulates too many files. Doing this dropped the total time to about 30 minutes.
While this works, is there a better way to do this? Preferably built into 7z?
Running on Ubuntu 22.04.1 LTS
7-Zip [64] V16.02
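For readers who want to try the same workaround, the mover described above might look roughly like the sketch below. The function name drain_once, its arguments, and the one-minute age threshold are illustrative assumptions, not the asker's actual script and not 7z features. It assumes the files land directly in one directory and that a file whose modification time is more than a minute old is no longer being written.

```shell
#!/usr/bin/env bash
# Sketch of a "mover" that keeps the 7z output directory small.
# drain_once, its arguments and the one-minute threshold are illustrative
# assumptions, not 7z features.

drain_once() {
    local src=$1 dst=$2 min_age_min=${3:-1}
    mkdir -p "$dst"
    # Move only top-level regular files whose modification time is more than
    # min_age_min minutes old, on the assumption that 7z is done with them.
    # "-exec ... {} +" batches many files into each mv call (mv -t is GNU).
    find "$src" -maxdepth 1 -type f -mmin +"$min_age_min" \
        -exec mv -t "$dst" -- {} +
}
```

One way to use it is to call drain_once in a loop with a short sleep while 7z is running, then move whatever is left once 7z has exited.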
I have a 7zip file that I need to split into several smaller files so that I can put it on a FAT32 flash drive. However, unzipping it requires more space than I currently have. How can I split the 7zip file into several smaller files so that I can later unzip them on another computer?
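You can use the split command to partition the file bytewise. Giving an explicit output prefix (here xyz.7z.part-) makes the pieces easy to identify:
split -b bytesize xyz.7z xyz.7z.part-
To merge, concatenate the pieces in order and redirect the result into a single file:
cat xyz.7z.part-* > xyz.7z
The shell expands the wildcard in sorted order, so the pieces are joined in the order split produced them.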
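To check that splitting and rejoining loses nothing, here is a small round-trip sketch on a stand-in file. The temporary directory, the 1,000,000-byte dummy file, and the 300,000-byte piece size are assumptions chosen only to keep the demo quick. For the real archive, each piece has to stay below 4 GiB, because FAT32 cannot hold a larger single file.

```shell
#!/usr/bin/env bash
# Round-trip check on a small stand-in file; the names and sizes here are
# illustrative assumptions, not values from the question.
workdir=$(mktemp -d)
head -c 1000000 /dev/urandom > "$workdir/xyz.7z"   # dummy "archive"

# Cut into pieces of at most 300,000 bytes: ...part-aa, ...part-ab, and so on.
split -b 300000 "$workdir/xyz.7z" "$workdir/xyz.7z.part-"

# Rejoin under a different name, then compare byte for byte.
cat "$workdir"/xyz.7z.part-* > "$workdir/rejoined.7z"
cmp "$workdir/xyz.7z" "$workdir/rejoined.7z" && echo "round trip OK"
```

Running the same cmp (or a checksum tool) on the other computer after copying the pieces is a cheap way to catch a piece that went missing or was copied out of order.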
My bash script uses a tar command to create a daily backup of my files. The command is:
tar czfP /path/to/file.tar.gz /path/to/source
It works, but the tar.gz file is 14 GB, and when I untar it, the total size of all the files is only about 9 GB.
I've read that this could be caused by blocks of white space and that the solution is to use the i argument, but the final file size is the same when I use it (approx. 14 GB).
tar czfPi /path/to/file.tar.gz /path/to/source
Where is the problem? Is it really in how I'm using the i argument, or do I need to change my code in some other way?
Thanks
Roman
I have a folder containing a series of 750 csv files. I wrote a Stata program that runs through each of these files and performs a single task. The files are quite big, and my program has been running for more than 12 hours.
Is there a way to know which of the csv files Stata used last? I would like to know if the code is somewhere near finishing. I thought that sorting the 750 files by "last used" would do the trick, but it does not.
Next time I should be more careful about signalling how the process is going...
Thank you
From the OS X terminal, cd to the directory containing the CSV files, and run the command
ls -lut | head
which lists your files sorted by most recent access time (-u selects the access time, -t sorts by it) and keeps only the first 10 lines of output.
On the most basic level you can use display and log your session:
clear
set more off
local myfiles file1 file2 file3
foreach f of local myfiles {
display "processing `f'"
// <do-something-with-file>
}
See also help log.
I'm using command line 7zip to zip up the contents of a folder (in Windows) thus:
7za a myzip.zip * -tzip -r
I've discovered that running exactly the same command line twice produces two different ZIP files: they have the same size, but a binary compare (i.e. fc /b file1.zip file2.zip) shows that they differ. To complicate matters, it seems that if you make the two zips in rapid succession, they are the same. But if you make them on different days, or a few hours apart, they are not.
I presume that there's a date/time stamp in the ZIP file somewhere but I can't find anything on the 7zip site to confirm that.
Assuming I'm right does anyone know how to suppress the date/time? Or is something else causing the binaries to be different?
7-Zip has the switch -m with the parameter tc, which defaults to on when it is not specified on the command line.
With -mtc=on, all 3 dates of a file on an NTFS partition are stored in the archive:
the last modification time,
the creation time, and also
the last access time.
See the 7-Zip help page titled -m (Set compression Method) switch.
The last access time of the files is most likely responsible for the differences between the archive files.
Append -mtc=off to the command line to keep the NTFS timestamps out of the archive file.
I download many folders and then split their contents into 1 GB folders on Windows. Right now, it is a manual process. Is there a Windows command or program that splits all the files into 1 GB folders?
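You can use the split command (a Unix tool, so on Windows it needs an environment such as Cygwin or WSL), which cuts a single file into pieces of 1 GiB: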
split -b1G your_filename part
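Since split cuts one file into pieces rather than grouping files into folders, a grouping pass might look roughly like the sketch below. It assumes a bash environment on Windows (for example WSL, Cygwin, or Git Bash). The function name bin_files and the part_001, part_002, ... folder names are hypothetical. Files are taken in alphabetical order, and a new folder is started whenever the next file would push the current one over the limit.

```shell
#!/usr/bin/env bash
# Sketch: distribute the files of one directory into numbered subfolders
# (part_001, part_002, ...) holding at most "limit" bytes each.
# bin_files and the part_NNN naming are illustrative assumptions.

bin_files() {
    local src=$1 limit=$2
    local n=1 used=0 size f dest
    for f in "$src"/*; do
        [ -f "$f" ] || continue          # skip subfolders, including part_* ones
        size=$(wc -c < "$f")             # file size in bytes
        # Start a new folder when this file would exceed the limit. A single
        # file larger than the limit still gets a folder of its own.
        if [ "$used" -gt 0 ] && [ $((used + size)) -gt "$limit" ]; then
            n=$((n + 1))
            used=0
        fi
        dest=$(printf '%s/part_%03d' "$src" "$n")
        mkdir -p "$dest"
        mv -- "$f" "$dest/"
        used=$((used + size))
    done
}
```

For 1 GiB folders the call would be something like bin_files /path/to/downloads $((1024 * 1024 * 1024)). This simple first-fit-in-order approach does not try to pack folders as tightly as possible.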