Concatenate files on Windows, and the reverse operation

I'm currently trying to find a way to concatenate several files, typically all files within a directory (including subdirectories), into a single stream for further processing.
TAR looks like an obvious candidate, except that it is not at all standard on Windows, and unfortunately all versions I could find (mostly variations of GNU tar) are much too large (several hundred KB once DLL dependencies are included). I need something much smaller.
Apparently, the standard COPY command could do the trick. For example, the following command works:
COPY /B sourcefile1+sourcefile2 destinationfile
However, there are still two problems: I don't know how to write the result to stdout (for a pipe), and even more importantly, how do I achieve the reverse operation?
I need a small utility to do this concatenation job, either as C source code, a standard Windows command, or a distributable binary. It doesn't need to respect the TAR format (although it's not a bad thing if it does), and obviously the concatenation must be reversible.

I suggest using 7-Zip. It has a portable version, can compress very well (or just copy without compression), recurses into subdirectories, and can write its output to a single stream (stdout).
It has a "-so" (write data to stdout) switch. For example,
7z x archive.gz -so > Doc.txt
decompresses the archive.gz archive to the output stream and then redirects that stream to the Doc.txt file.
7z a -tzip -so -r src\*.cpp src\*.h > archive.zip
compresses all *.cpp and *.h files in the src directory and all its subdirectories to 7-Zip's standard output stream and writes that stream to the archive.zip file (remove "> archive.zip" and intercept the output with your own program).
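For the reverse direction, 7-Zip also has a "-si" (read data from stdin) switch. As a rough sketch (the archive type and output directory here are placeholders for your setup),
yourprogram | 7z x -si -ttar -oC:\restore
reads a tar-formatted stream from the pipe and extracts it into C:\restore.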

Why don't you use ZIP (disable compression if you want)? It's very standard, and support comes built into Windows. See Creating a ZIP file on Windows (XP/2003) in C/C++
Pure concatenation isn't reversible, because you can't know where to split the stream again, so you need a directory of chunk sizes such as the one found in the ZIP and TAR formats.
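If the 7-Zip command line is acceptable, a store-only ZIP is one plausible way to get such a catalogued, reversible "concatenation"; a sketch, with placeholder paths:
7z a -tzip -mx=0 -r archive.zip src\*
Here -mx=0 disables compression, so the archive is essentially the concatenated files plus the directory needed to split them apart again, and 7z x archive.zip reverses the operation.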

Well, Shelwien's almost solved the issue.
The tar version he proposes is lean enough (~120 KB) and does not require external DLL dependencies.
http://downloads.sourceforge.net/project/unxutils/unxutils/current/UnxUtils.zip
Unfortunately, it also has some problems of its own, such as no support for Unicode characters, escape-sequence interpretation (so a directory name starting with t produces a \t, which is treated as a tab), and a potential problem with the pipe implementation under Windows XP (although this last one could come from the other program).
So that's a dead end.
A solution is still to be found...
[Edit] Shelwien just provided a solution by creating "shar", a tar replacement that is much smaller and much more efficient, without the limitations described above. This solves the issue.

Related

7zip produces different output from identical input

I'm using command line 7zip to zip up the contents of a folder (in Windows) thus:
7za a myzip.zip * -tzip -r
I've discovered that running exactly the same command line twice will produce two different ZIP files: they have the same size, but if you run a binary compare (i.e. fc /b file1.zip file2.zip) they are different. To complicate matters, it seems that if you make the two zips in rapid succession they are the same, but if you make them on different days, or separated by a few hours, they are not.
I presume that there's a date/time stamp in the ZIP file somewhere but I can't find anything on the 7zip site to confirm that.
Assuming I'm right does anyone know how to suppress the date/time? Or is something else causing the binaries to be different?
7-Zip has the switch -m with parameter tc, which defaults to on if not specified on the command line.
With -mtc=on, all three timestamps of a file stored on an NTFS partition are stored in the archive:
the last modification time,
the creation time, and also
the last access time.
See the page titled "-m (Set compression Method) switch" in the 7-Zip help.
The last access time of the files is most likely responsible for the differences between the archive files.
You have to append -mtc=off to avoid storing the NTFS timestamps in the archive file.
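Applied to the command from the question, that would be something like:
7za a myzip.zip * -tzip -r -mtc=off
which should make repeated runs over unchanged input produce byte-identical archives (assuming nothing else, such as file contents or modification times, changes between runs).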

How to tar a folder while files inside it might be written to by some other process

I am trying to create a script for a cron job. I have a folder of around 8 GB containing thousands of files. I am trying to create a bash script which first tars the folder and then transfers the tarred file to an FTP server.
But I am not sure what happens if, while tar is tarring the folder, some other process is accessing files inside it or writing to them.
It is fine for me if the tarred file does not contain the most recent changes made while tar was running.
Please suggest the proper way to do this. Thanks.
tar will happily tar "whatever it can". But you will probably have some surprises when untarring, as tar records the size of each file before tarring it.
A very unpleasant surprise: if the file was truncated after its size was recorded, tar will "fill" it with NUL characters to match the recorded size, which can have very unpleasant side effects. In some cases, tar will say nothing when untarring and silently add as many NUL characters as it needs to match the size (in fact, on Unix it doesn't even need to do that: the OS does it, see "sparse files"). In other cases, if truncation occurred while the file was being tarred, tar will complain about an unexpected end of file when untarring (as it expected XXX bytes but read fewer than that), but will still report that the file should be XXX bytes (and Unix systems will then create it as a sparse file, with NUL characters magically appended at the end to match the expected size when you read it).
(To see the NUL characters, an easy way is to less thefile, or cat -v thefile | more on a very old Unix; look for any ^@.)
On the contrary, if files are only appended to (logs, etc.), the side effect is less problematic: you will only miss the last bits of them (which you say you are OK with) and not get that unpleasant "fill with NUL characters" effect. tar may complain when untarring the file, but it will still untar it.
I think tar fails (and so does not create the archive) when an archived file is modified during archiving. As Etan said, the solution depends on what you finally want in the tarball.
To avoid a tar failure, you can simply copy the folder elsewhere before calling tar. But in this case you cannot be confident in the consistency of the backed-up directory: the copy is NOT an atomic operation, so some files will be up to date while others will be outdated. Depending on your situation, this may or may not be a severe issue.
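A minimal sketch of that copy-first approach (paths are placeholders for your setup):
cp -a /path/to/data /tmp/data-snapshot && tar -czf /tmp/backup.tar.gz -C /tmp data-snapshot && rm -rf /tmp/data-snapshot
tar then only ever sees the snapshot copy, so it cannot trip over files changing underneath it, but as noted the snapshot itself is still not taken atomically.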
If you can, I suggest you configure how these files are created. For example: "only recent files are appended to; files older than one day are never changed". In that case you can easily back up only the old files, and the backup will be consistent.
More generally, you either have to accept losing the latest data AND the lack of consistency (each file is backed up at a different time), or you have to act at a different level. I suggest:
Configure the software that produces the data so that it can provide a consistent state
Or use OS/virtualization features. For example, it is possible to take a consistent snapshot of a volume on some kinds of virtual storage...

PVRTexTool, is there a way to run it on multiple files at once?

I am using PVRTexTool to convert PNG files to PVR files, but the tool seems to only be able to run on one file at a time (it won't accept *.png as a file name).
Does anyone know how to run it on a group of files at once?
It's really a hassle to run it on all of my textures.
In a shell, run
for file in *.png ; do
    PVRTexTool "$file"
done
(I don't know how to call PVRTexTool from the command line, so please substitute the second line with a correct version.)
This is a general way to feed each file to a command which only accepts one file at a time. See any introduction on shell scripting, e.g. this discussion of the for loop.
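If you are on Windows without a Unix-style shell, the cmd.exe equivalent would be something like this (again, substitute the real PVRTexTool invocation for your setup):
for %f in (*.png) do PVRTexTool "%f"
Double the percent signs (%%f) if you put this line in a batch file.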

Compress command results in corrupted zip file

I have a script set up to rotate some log files on Windows, and as part of the process I'd like it to automatically compress the rotated file. To do this I use the command
compress source.file destination.file.zip
However, if I try to open the file, I get the message "The Compressed (zipped) Folder is invalid or corrupted"
I've tried compress with -Z, and I get the same message. What am I doing wrong?
compress output is not compatible with the ZIP file format; it uses the LZW algorithm.
The only way to "open" a compressed file is with uncompress or gunzip.
Windows ports of common Unix commands, including compress and gzip/gunzip, are available here.
EDIT: To produce ZIP files from the command line in Windows, you can use something like 7-Zip, which includes a command line application (7z.exe). The Unix commands linked above also include zip.exe for manipulating ZIP files from the command line.
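For example, assuming 7z.exe is on your PATH, the rotation step could become something like:
7z a destination.file.zip source.file
which produces a genuine ZIP archive that Windows Explorer can open.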

What are these stray zero-byte files extracted from tarball? (OSX)

I'm extracting a folder from a tarball, and I see these zero-byte files showing up in the result (where they are not in the source.) Setup (all on OS X):
On machine one, I have a directory /My/Stuff/Goes/Here/ containing several hundred files.
I build it like this
tar -cZf mystuff.tgz /My/Stuff/Goes/Here/
On machine two, I scp the tgz file to my local directory, then unpack it.
tar -xZf mystuff.tgz
It creates ~scott/My/Stuff/Goes/, but then under Goes, I see two files:
Here/ - a directory,
Here.bGd - a zero byte file.
The "Here.bGd" zero-byte file has a random 3-character suffix, mixed upper and lower-case characters. It has the same name as the lowest-level directory mentioned in the tar-creation command. It only appears at the lowest level directory named. Anybody know where these come from, and how I can adjust my tar creation to get rid of them?
Update: I checked the table of contents using tar tZvf: it does not list the zero-byte files, so I'm leaning toward the suggestion that the unpacking machine is at fault. OS X is version 10.5.5 on the unpacking machine (I'm not sure how to check the filesystem type). tar is GNU tar 1.15.1, and it came with the machine.
You can get a table of contents from the tarball by doing
tar tZvf mystuff.tgz
If those zero-byte files are listed in the table of contents, then the problem is on the computer making the tarball. If they aren't listed, then the problem is on the computer decompressing the tarball.
I can't replicate this on my 10.5.5 system.
So, for each system:
what version of OSX are you using?
what filesystem is in use?
I have not seen this particular problem before with tar. However, there is another problem where tar bundles metadata files with regular files (they have the same name but are prefixed with "._"). The solution to this was to set the environment variable COPYFILE_DISABLE=y. If those weird files you have are more metadata files, maybe this would solve your problem as well?
Failing that, you could try installing a different version of tar.
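For example, if the stray files do turn out to be that kind of metadata, setting the variable on the machine that creates the tarball might help (an untested guess for this particular case):
COPYFILE_DISABLE=y tar -cZf mystuff.tgz /My/Stuff/Goes/Here/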
On my Mac OS X (10.4.11) machine, I sometimes find .DS_Store files in a directory (but these are not empty files), and I've seen other hidden file names on memory sticks that have been used on the Mac. These are somehow related to the Mac file system. I'd guess that what you are seeing is related to one or the other of these sets of files. Original Macs (Mac OS 9 and earlier) had data forks and resource forks for files.
A little bit of play shows that a directory in which I've never used Finder has no .DS_Store file; if I use the Finder to navigate to that directory, the .DS_Store file appears. It presumably contains information about how the files should appear in the Finder display (so if you move files around, they stay where you put them).
This doesn't directly answer your question; I hope it gives some pointers.
I don't know (and boy is this a hard problem to Google for!), but here's a troubleshooting step: try tar without Z. That will determine whether compress or tar is causing the issue.
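That is, on the creating and extracting machines respectively:
tar -cvf mystuff.tar /My/Stuff/Goes/Here/
tar -xvf mystuff.tar
If the zero-byte file still shows up, tar itself is the culprit; if it disappears, the compress step (Z) is.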
