Is 7-Zip faster at archiving (solid or not solid)?

I am creating a .7z file with the -ms=on flag, which is supposed to result in a solid archive, but the listing of the archive shows that solid is off.
My real question, though, is: what is the fastest way to archive with 7-Zip, solid or not solid?
I don't really care about compression. What I want is the fastest elapsed time, both for creating the archive and especially for unpacking it. I have heard that a solid .7z unpacks very quickly. I am using PowerShell to run the commands. (The resulting archive is about 760 MB and holds about 176K files.) It currently takes me about 12 minutes to create the archive and about 8 minutes to unpack it.
[string]$zipper = "$($Env:ProgramFiles)\7-Zip\7z.exe"
[Array]$archive = "C:\zip\GL.7z"
[Array]$flags = "a","-t7z","-mx0","-mmt=on","-ms=on", "-r"
[Array]$skip = "-xr!.svn","-xr!.vs","-xr!bin","-xr!obj","-xr!Properties","-x!*.csproj","-x!*.user","-x!*.sln","-x!*.suo","-x!web.config","-x!web.*.config"
$ElapsedTime = [System.Diagnostics.Stopwatch]::StartNew()
echo "Toby..."
[Array]$in = "C:\wwwroot\Toby"
[Array]$cmd = $flags + $archive + $in + $skip
& $zipper $cmd

plushpuffin was correct: solid archives are only created if compression is enabled, e.g. -mx1.
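For reference, the command line that ends up producing a solid archive looks roughly like this (same paths as the script above; 7z.exe assumed to be on the PATH, run from PowerShell):
# -mx1 is the fastest real compression level; with it, -ms=on actually yields a solid archive
7z a -t7z -mx1 -ms=on -mmt=on -r C:\zip\GL.7z C:\wwwroot\Toby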
Here are the timings for compressing and uncompressing.
The original data is 950 MB of disk space in 176K files, mostly JPGs.
7z uncompressed, not solid, -mx0
size: 728 MB
pack: 12:28
unpack: 9:28
7z compressed, solid, -mx1
size: 555 MB
pack: 18:18
unpack: 9:13
7z compressed, solid, -mx1 -mmt=off (single thread)
size: 555 MB
pack: 22:48
unpack: 10:32

It depends on how you want to use the archive. If you want the smallest archive possible, will always want ALL of the files at once (all or nothing), and are okay with losing everything if any part of it is corrupted (or with adding blocking and recovery data, which makes the file larger), then .7z would be fine.
If you want to look at one or a few files (rather than decompressing the entire archive), or to add, change, or remove files, then you want the .zip format, because it can do that. It can read just the file-name index, so you can choose one or more files to extract. A solid .7z archive would have to decompress the entire archive every single time, and then re-compress the entire archive if a change were needed (adding a file, for instance).
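For what it's worth, the 7-Zip command line can pull a single file out of a .zip without touching the rest; a small sketch (the archive and file names here are placeholders):
# list the contents, then extract one file with its stored path
7z l C:\zip\GL.zip
7z x C:\zip\GL.zip "Toby\images\photo1.jpg"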
Also, if a file in the archive is damaged, the remaining files in a .zip are likely still recoverable. Unless a .7z archive was created with blocking and recovery data added, the entire archive of files would likely be lost.
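Whichever format you use, it is worth verifying an archive right after creating it; 7-Zip's t command does that (archive name taken from the question above):
7z t C:\zip\GL.7z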
For my use, I think that the .zip format offers more data safety than .7z and much better decompression speed when extracting a file or two, or when adding or deleting files.
The safety of my data and the ease and speed of viewing, adding, or deleting single files are my primary considerations, so I prefer the non-solid .zip format.
Plus, ZIP is not a proprietary format. You won't lose access to your archives if you later decide to switch to different archive software.
Just my two cents,
AnneF

Related

Why can making a compressed archive (tar + pigz) result in a file that is substantially larger than the sum of the original files?

I have about 700 folders containing relatively heavy files (10-40 GB each), and I'm trying to make a compressed archive out of them.
When I use tar cf - ~/folder/* | pigz -p 30 > ~/folder/archive.tar.gz, I run out of storage space (~13 TB), which is more than the total size of my original files.
If I just tar the original files without compressing them, the resulting archive is about the same size as the sum of the original files (~8 TB).
I tried compressing a small subset of my folders, and while pigz did not reduce the size by much, at least it did not increase it...
What can the issue be here?
P.S. Not sure if it's relevant, but some of the heavy files in my folders are already in a compressed format.
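A way to check the ratio for a single sample folder without writing anything to disk, assuming GNU tar, du and pigz are available (the folder name is a placeholder); data that is already compressed (JPEG, video, .gz) will come out at a ratio close to 1:
# original size in bytes of one sample folder
du -sb ~/folder/sample_folder
# compressed size in bytes, sent to a pipe instead of a file
tar cf - ~/folder/sample_folder | pigz -p 30 | wc -c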

WinRAR settings, batch file, or any other way to monitor a folder and create archives automatically

I searched the WinRAR settings but couldn't find what I need. Is there any way to set up WinRAR to monitor a specific folder, so that every folder placed into it gets packed into a separate archive? I don't need any compression. Each archive has to have:
The same name as the folder.
A size of no more than 0.99 GB (1,073,000,000 bytes); if it would be bigger, the archive should be split into parts of at most 0.99 GB (1,073,000,000 bytes) each.
Password protection (the same password for every archive).
Is it also possible to set a rule like "put the content of C:\FolderX into every archive"?
After the archives have been created successfully, only the created archive/parts should remain in the folders; the rest should be moved to the Recycle Bin.
C:\FolderX [every file located here should be put into every archive that is created]
C:\MainFolder [every folder located here should be turned into an archive, like this:]
C:\MainFolder\Folder1 = 500 MB -> Folder1.rar = 500 MB, password = xyz
C:\MainFolder\Folder2 = 1500 MB -> Folder2.part1.rar = 0.99 GB, Folder2.part2.rar = 427 MB, password = xyz
C:\MainFolder\Folder3 = 3500 MB -> Folder3.part1.rar = 0.99 GB, Folder3.part2.rar = 0.99 GB, Folder3.part3.rar = 0.99 GB, Folder3.part4.rar = 281 MB, password = xyz
When the archives are extracted, the result should be just one folder with its content, e.g. Folder1\textfile1, textfile2, etc.
(no nested subfolders like Folder1\Folder1\textfile1, textfile2).
WinRAR should only run one process at a time; the rest should be queued.
Can this be done with a batch file, and can that batch file be run as a Windows service?
My OS is Windows Server 2019 Standard. I hope you can understand my English.
best regards
jimboy
WinRAR itself does not do that, as it is the Windows GUI; instead use RAR.exe (in the same folder as WinRAR.exe). The RAR manual is RAR.txt in that folder.
The RAR command line for Windows works with batch files, so it can be tested and run from a Command Prompt (type cmd in the search bar).
It also works well with the more advanced Windows PowerShell.
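As a rough sketch of the kind of command the manual describes (the folder name, the password xyz and the 0.99 GB volume size are taken from the question; check RAR.txt before relying on these switches), run from the WinRAR folder or with the full path to Rar.exe:
Rar.exe a -m0 -r -hpxyz -v1073000000b -ep1 "C:\MainFolder\Folder1.rar" "C:\MainFolder\Folder1\*" "C:\FolderX\*"
Here a adds to an archive, -m0 stores without compression, -hp sets the password (and encrypts the headers too), -v splits into volumes of the given size in bytes, -r recurses into subfolders, and -ep1 drops the base folder from the stored paths so extraction does not create a nested Folder1\Folder1.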
PS. My RAR Command Line Secrets course will be available in 2021; you can register for priority access at bit.ly/RCLSw-rN for more information about this affordable course.

How can I compare the file sizes match between duplicate directories?

I need to compare two directories to validate a backup.
Say my directory looks like the following:
user@main_server:~/mydir/             user@backup_server:~/mydir/
Filename       Filesize               Filename       Filesize
file1000.txt   4182410737             file1000.txt   4182410737
file1001.txt   8241410737             -                            <-- missing on backup_server!
...                                   ...
file9999.txt   2410418737             file9999.txt   1111111111    <-- size != main_server
Is there a quick one liner that would get me close to output like:
Invalid Backup Files:
file1001.txt
file9999.txt
(with the goal to instruct the backup script to refetch these files)
I've tried to get variations of the following to no avail.
[main_server] $ rsync -n ~/mydir/ user@backup_server:~/mydir
I cannot use rsync to back up the directories itself because it takes way too long (8-24 hrs). Instead I run multiple threads of scp to fetch files in batches, which regularly completes in under an hour. However, occasionally I find a few files that were somehow missed (perhaps a dropped connection).
Speed is a priority, so comparing file sizes should be sufficient, but I'm open to including a checksum, provided it doesn't slow the process down the way I find rsync does.
Here's my test process:
# Generate Large Files (1GB)
for i in {1..100}; do head -c 1073741824 </dev/urandom >foo-$i ; done
# SCP them from src to dest
for i in {1..100}; do ( scp ~/mydir/foo-$i user@backup_server:~/mydir/ & ) ; sleep 0.1 ; done
# Confirm destination has everything from source
# This is the point of the question. I've tried:
rsync -Sa ~/mydir/ user@backup_server:~/mydir
# Way too slow
What do you recommend?
By default, rsync uses its quick-check method, which only transfers files that differ in size or last-modified time. As you report that the sizes are unchanged, that would seem to indicate that the timestamps differ. Two options to handle this are:
Use -p with scp (or -t/-a with rsync) to preserve timestamps when transferring the files.
Use --size-only to ignore timestamps and transfer only files that differ in size.
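With that in mind, a dry run along these lines (paths as in the question) should print the files that are missing on the backup or that differ in size, without transferring anything:
# -r recurse, -v list the file names, -n dry run, --size-only compare sizes and ignore timestamps
rsync -rvn --size-only ~/mydir/ user@backup_server:~/mydir/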

File's date changes after zipping and unzipping again, according to XCOPY

So, here's the problem: I have regular files which are put into a ZIP file (see below for details on the ZIP tool). Then I unzip them (see below for details on the tool used), and the files are restored. The date of each file is restored, as is standard in the ZIP/UNZIP tools used. When querying with DIR, or in Windows Explorer, the files involved have the same date as they had before being handled by the ZIP/UNZIP process.
So far, all OK.
But then I use the XCOPY /D command to further manipulate different copies of those files on the disk ... and XCOPY says one file is newer than the other. Given that the date, hour and minutes are the same, would the difference be in a smaller unit, like seconds?
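To check for a difference smaller than a minute, the full timestamps can be queried from PowerShell, for example (paths as in the DIR example below):
# prints the full last-write time, including seconds and fractions of a second
(Get-Item C:\windows\Background_mycomputer.cmd).LastWriteTime.ToString("o")
(Get-Item C:\my\directory\Background_mycomputer.cmd).LastWriteTime.ToString("o")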
All involved disks have NTFS file system.
Example:
C:\my>dir C:\windows\Background_mycomputer.cmd C:\my\directory\Background_mycomputer.cmd
Volume in drive C is mycomputerC
Volume Serial Number is 1234-5678
Directory of C:\windows
31/12/2014 19:50 51 Background_mycomputer.cmd
1 File(s) 51 bytes
Directory of C:\my\directory
31/12/2014 19:50 51 Background_mycomputer.cmd
1 File(s) 51 bytes
0 Dir(s) 33.655.316.480 bytes free
C:\my>xcopy C:\windows\Background_mycomputer.cmd C:\my\directory\Background_mycomputer.cmd /D
Overwrite C:\my\directory\Background_mycomputer.cmd (Yes/No/All)? y
C:\windows\Background_mycomputer.cmd
1 File(s) copied
C:\my>xcopy C:\my\directory\Background_mycomputer.cmd C:\windows\Background_mycomputer.cmd /D
0 File(s) copied
C:\my>xcopy C:\windows\Background_mycomputer.cmd C:\my\directory\Background_mycomputer.cmd /D
0 File(s) copied
C:\my>unzip -v
UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
Compiled with Microsoft C 13.10 (Visual C++ 7.1) for
Windows 9x / Windows NT/2K/XP/2K3 (32-bit) on Apr 20 2009.
UnZip special compilation options:
ASM_CRC
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
NTSD_EAS
SET_DIR_ATTRIB
TIMESTAMP
UNIXBACKUP
USE_EF_UT_TIME
USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
UNICODE_SUPPORT [wide-chars] (handle UTF-8 paths)
MBCS-support (multibyte character support, MB_CUR_MAX = 1)
LARGE_FILE_SUPPORT (large files over 2 GiB supported)
ZIP64_SUPPORT (archives using Zip64 for large files supported)
USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.5, 10-Dec-2007)
VMS_TEXT_CONV
[decryption, version 2.11 of 05 Jan 2007]
UnZip and ZipInfo environment options:
UNZIP: [none]
UNZIPOPT: [none]
ZIPINFO: [none]
ZIPINFOOPT: [none]
C:\my>ver
Microsoft Windows [Version 6.1.7601]
C:\my>zip -?
Copyright (c) 1990-2006 Info-ZIP - Type 'zip "-L"' for software license.
Zip 2.32 (June 19th 2006). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
The default action is to add or replace zipfile entries from list, which
can include the special name - to compress standard input.
If zipfile and list are omitted, zip compresses stdin to stdout.
-f freshen: only changed files -u update: only changed or new files
-d delete entries in zipfile -m move into zipfile (delete files)
-r recurse into directories -j junk (don't record) directory names
-0 store only -l convert LF to CR LF (-ll CR LF to LF)
-1 compress faster -9 compress better
-q quiet operation -v verbose operation/print version info
-c add one-line comments -z add zipfile comment
-# read names from stdin -o make zipfile as old as latest entry
-x exclude the following names -i include only the following names
-F fix zipfile (-FF try harder) -D do not add directory entries
-A adjust self-extracting exe -J junk zipfile prefix (unzipsfx)
-T test zipfile integrity -X eXclude eXtra file attributes
-! use privileges (if granted) to obtain all aspects of WinNT security
-R PKZIP recursion (see manual)
-$ include volume label -S include system and hidden files
-e encrypt -n don't compress these suffixes
C:\my>
Question: I do not want XCOPY to make updates when I know they are invalid because the time format is doing something wrong. How do I prevent that?
From what I can see, there are different things involved: XCOPY, very specific ZIP and UNZIP builds, and the NTFS file system. Which one is doing something wrong?
I must stress that apart from ZIP and UNZIP, no other changes are made to the files, such as changing one file and then making a change to another one less than 60 seconds later.
At the moment of the test, the time shown was NOT the current time, and not close to it either. No file is being adjusted to the current time; the times refer to the last changes of the file in question, which may be any time in the past. In this case it is one day later, but it can be anything.
I noticed the peculiar behavior Raymond Chen describes when writing a Powershell script (GitHub link) to freshen a zip archive using the System.IO.Compression and System.IO.Compression.FileSystem libraries.
Interestingly, Zip archives can store multiple copies of the same file with identical metadata (name, relative path, modification dates). Extracting the second copy of the file will fail in Windows Explorer because the file already exists.
When trying to prevent re-zipping a file that was already archived, I checked the relative path and date, and noticed a discrepancy of up to two seconds in the LastWriteTime. This workaround compensates for the loss of precision:
$AlreadyArchivedFile = ($WriteArchive.Entries | Where-Object {
    # Zip will store multiple copies of the exact same file - prevent this by checking whether it is already archived.
    # ZipFileExtensions timestamps are only precise to within 2 seconds, hence the tolerance.
    ($_.FullName -eq $RelativePath) -and
    ($_.Length -eq $File.Length) -and
    ([math]::Abs(($_.LastWriteTime.UtcDateTime - $File.LastWriteTimeUtc).TotalSeconds) -le 2)
})
Also, the IsDaylightSavingTime flag is not stored in the Zip archive. As a result, I was surprised when extracted files came out an hour newer than the original archived files. I tried this several times and saw the extracted file's timestamp incremented by an hour every time it was compressed and extracted.
Here's a very ugly workaround that decreases the archived file's time by one hour, so that the original source file and the extracted file end up with consistent timestamps:
If ($File.LastWriteTime.IsDaylightSavingTime() -and $ArchivedFile) {
    # HACK: fix for a buggy date - an hour is added inside the archive when the zipped file was created during PDT
    # (files created during PST are not affected).
    $entry = $WriteArchive.GetEntry($RelativePath)
    $entry.LastWriteTime = ($File.LastWriteTime.ToLocalTime() - (New-TimeSpan -Hours 1))
}
There's probably a better way to handle this. Unfortunately I'm not aware of any way to store a Daylight Savings indicator for a file in a .Zip archive, and that information is lost.

Is rsync really any faster on files that have changed?

Why can't I trust rsync to be at least as fast as cp? (I'm ignoring negligible differences in overhead.)
It seems to me that rsync is fairly slow on files with no content difference but a changed timestamp.
If I make a copy of a file: cp -a testfile-100M destfile
And then I rsync them, I get what you would expect:
$ rsync -av testfile-100M destfile
sending incremental file list
sent 56 bytes received 12 bytes 8.00 bytes/sec
total size is 104857600 speedup is 1542023.53
But that's just because rsync is checking the size and the timestamp and skipping the file. What if I just change the timestamp?
$ touch testfile-100M
$ rsync -av testfile-100M destfile
sending incremental file list
testfile-100M
sent 104870495 bytes received 31 bytes 113804.15 bytes/sec
total size is 104857600 speedup is 1.00
Also note that even though the speedup is 1, the initial copy took about a quarter of the time that the final rsync did, even though the contents are exactly the same. So what's going on here? Is it just the overhead of doing the comparisons?
If that's the case, then when does rsync ever provide a performance advantage? Only when files are exactly the same on both sides?
For local files, if the size or mtime have changed, rsync by default just copies the whole thing without using its delta algorithm. You can turn this off with the --no-whole-file option, but for local copies this will typically be slower.
For the specific case of touching a file without changing it:
If you give the --size-only option, it will assume that files that have the same size are unchanged.
If you give the --checksum option, it will first hash the file to see if anything has changed, before copying it.
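For the touched-but-unchanged test file above, either option makes rsync skip the copy, for example:
# the data copy is skipped because the sizes match, even though the mtime differs
rsync -av --size-only testfile-100M destfile
# both files are read and hashed first; the contents match, so only the timestamp is touched up
rsync -av --checksum testfile-100M destfile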
When the source and destination are both locally mounted filesystems, rsync just copies the file(s) if the timestamps or sizes don't match. rsync wins where you have large files with small differences and they are on machines separated by a low-bandwidth link.
EDIT: Since someone felt the need to downvote this ancient answer: as to why rsync on local files might be slower than cp, there does not seem to be any good reason.
It appears the answer is that rsync performs some extra steps in order to keep files in a consistent state, rather than a partially transferred state, while it operates. Using the --inplace option removes this overhead.
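So, assuming the partial-transfer safety net is not needed for a local copy, something like this avoids the extra temporary-file step:
rsync -av --inplace testfile-100M destfile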
Interestingly, for me rsync is about 4× faster than cp for copying to an external USB drive.
