Interesting usage of tar... but what is happening? - bash

I saw the following interesting usage of tar in a co-worker's Bash scripts:
`tar cf - * | (cd <dest> ; tar xf - )`
Apparently it works much like rsync -av does, but faster. The question arises, how?
EDIT: Can anyone explain why should this solution be preferable over the following?
cp -rfp * dest
Is the former faster?

It writes the archive to standard output, then pipes it to a subprocess -- wrapped by the parentheses -- that changes to a different directory and reads/extracts from standard input. That's what the dash character after the f argument means. It's basically copying all the visible files and subdirectories of the current directory to another directory.
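A minimal sketch of the same pipeline, runnable in a scratch directory. The directory and file names are made up for illustration. It archives . rather than * so that hidden files are included, and uses && so extraction is skipped if cd fails.

```shell
#!/bin/sh
# Scratch area; all names below are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dest"
echo "hello" > "$work/src/sub/file.txt"
echo "hidden" > "$work/src/.dotfile"

# Writer: archive the current directory to stdout (the "-" after f).
# Reader: a subshell that changes directory, then extracts from stdin.
(cd "$work/src" && tar cf - .) | (cd "$work/dest" && tar xf -)

# List what arrived at the destination.
(cd "$work/dest" && find . -type f | sort)
```

Because the archive only ever exists in the pipe, no intermediate tar file is written to disk.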

On the difference between cp and tar to copy the directory hierarchies, a simple experiment can be conducted to show the difference:
alastair box:~/hack/cptest [1134]% mkdir src
alastair box:~/hack/cptest [1135]% cd src
alastair box:~/hack/cptest/src [1136]% touch foo
alastair box:~/hack/cptest/src [1137]% ln -s foo foo-s
alastair box:~/hack/cptest/src [1138]% ln foo foo-h
alastair box:~/hack/cptest/src [1139]% ls -l
total 0
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo-h
lrwxrwxrwx 1 alastair alastair 3 Nov 25 14:59 foo-s -> foo
alastair box:~/hack/cptest/src [1142]% mkdir ../cpdest
alastair box:~/hack/cptest/src [1143]% cp -rfp * ../cpdest
alastair box:~/hack/cptest/src [1144]% mkdir ../tardest
alastair box:~/hack/cptest/src [1145]% tar cf - * | (cd ../tardest ; tar xf - )
alastair box:~/hack/cptest/src [1146]% cd ..
alastair box:~/hack/cptest [1147]% ls -l cpdest
total 0
-rw-r--r-- 1 alastair alastair 0 Nov 25 14:59 foo
-rw-r--r-- 1 alastair alastair 0 Nov 25 14:59 foo-h
lrwxrwxrwx 1 alastair alastair 3 Nov 25 15:00 foo-s -> foo
alastair box:~/hack/cptest [1148]% ls -l tardest
total 0
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo-h
lrwxrwxrwx 1 alastair alastair 3 Nov 25 15:00 foo-s -> foo
The difference is in the hard-linked files. Notice how the hard-linked files are copied individually with cp and together with tar. To make the difference more obvious, have a look at the inodes for each:
alastair box:~/hack/cptest [1149]% ls -i cpdest
24690722 foo 24690723 foo-h 24690724 foo-s
alastair box:~/hack/cptest [1150]% ls -i tardest
24690801 foo 24690801 foo-h 24690802 foo-s
There are probably other reasons to prefer tar, but this is one big one, at least if you have extensively hard-linked files.

For a directory with 25,000 empty files:
$ time { tar -cf - * | (cd ../bar; tar -xf - ); }
real 0m4.209s
user 0m0.724s
sys 0m3.380s
$ time { cp * ../baz/; }
real 0m18.727s
user 0m0.644s
sys 0m7.127s
For a directory with 4 files of 1073741824 bytes (1GB) each
$ time { tar -cf - * | (cd ../bar; tar -xf - ); }
real 3m44.007s
user 0m3.390s
sys 0m25.644s
$ time { cp * ../baz/; }
real 3m11.197s
user 0m0.023s
sys 0m9.576s
My guess is this phenomenon is highly filesystem-dependent. If I'm right you will see a drastic difference between a filesystem that specializes in numerous small files, such as reiserfs 3.6, and a filesystem that is better at handling large files.
(I ran the above tests on HFS+.)

This is a unique usage of pipes. Basically, the first tar typically writes directly to a file, but instead it's going to write to stdout (the -), which is then redirected to the other tar which takes stdin rather than a file. Basically this is the same thing as tarring to a file and untarring later, except without the file in between.

The PowerTools book has the copy as:
tar cf - * | (cd <dest> && tar xvBf - )
The '&&' is a conditional that checks the return code of the preceding command. That is, if the cd failed, the tar xf - would not be executed. I always throw in a -v (verbose) and a -B (reblock input).
I use tar all the time. It is especially useful for copying to a remote system, such as:
tar cvf - . | ssh someone@somemachine '(cd somewhere && tar xBf -)'

tar cf - * | (cd <dest> ; tar xf - )
tars all non-hidden files and directories of the current directory to stdout, then pipes that into a new subshell's stdin. That subshell first changes its working directory to <dest>, then untars the archive there.

Some old versions of cp didn't have -f / -p (and similar) options for preserving permissions, so this tar trick did the job.

I believe the tar will do a Windows style 'merge' operation with deeply nested directories, whereas the cp will overwrite sub-directories.
For example if you have the layout:
dir/subdir/file1
and you copy it to a destination that contains:
dir/subdir/file2
Then with copy you will be left with:
dir/subdir/file1
But with the tar command, your destination will contain:
dir/subdir/file1
dir/subdir/file2
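The tar half of this claim can be checked with a small sketch. The layout below copies the example above into a hypothetical scratch directory. It only demonstrates what tar does, and does not test the cp behaviour.

```shell
#!/bin/sh
# Hypothetical scratch layout matching the example above.
work=$(mktemp -d)
mkdir -p "$work/src/dir/subdir" "$work/dest/dir/subdir"
touch "$work/src/dir/subdir/file1"      # exists only in the source
touch "$work/dest/dir/subdir/file2"     # exists only in the destination

# Extract the source tree on top of the existing destination tree.
(cd "$work/src" && tar cf - .) | (cd "$work/dest" && tar xf -)

# Files already in the destination are left in place next to the new ones.
(cd "$work/dest" && find dir -type f | sort)
```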

tar cf - *
This uses tar to send * to stdout
|
This does the obvious redirect of stdout to...
(cd <dest> ; tar xf - )
This, which changes PWD to the appropriate location and then extracts from stdin
I do not know why this would be faster than rsync, as there is no compression involved.

The tar solution will preserve symbolic links, whereas cp will just make copies and destroy the links.
tar has been a standard Unix utility a lot longer than rsync. You're more likely to find it in a situation when a directory hierarchy needs to be copied to another location (even another computer). rsync is probably easier to use these days, but is slower because it compares the source and the destination and syncs them. tar just copies in one direction.

If you have GNU cp (which all Linux-based systems will), the cp --archive will work, even on hard-linked files, and tar is not needed.
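One way to check this on your own system, assuming GNU cp is installed. The file names mirror the foo / foo-h / foo-s experiment above, and the scratch directory is hypothetical. The -ef test is true when two names refer to the same inode.

```shell
#!/bin/bash
# Assumes GNU cp; all paths are hypothetical scratch paths.
work=$(mktemp -d)
mkdir "$work/src"
touch "$work/src/foo"
ln "$work/src/foo" "$work/src/foo-h"     # hard link
ln -s foo "$work/src/foo-s"              # symbolic link

cp --archive "$work/src" "$work/cpdest"

# -ef: both names refer to the same inode, i.e. they are still hard links.
if [ "$work/cpdest/foo" -ef "$work/cpdest/foo-h" ]; then
    echo "hard link preserved"
else
    echo "hard link broken"
fi

# -L: the name is a symbolic link rather than a copy of its target.
if [ -L "$work/cpdest/foo-s" ]; then
    echo "symlink preserved"
fi
```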

As it happens, a co-worker wrote a nearly identical command into one of our scripts. After I spent some time puzzling over it, I asked why he had used that rather than cp. His answer, as I recall it, was that cp is slow when making a copy from one file system to another.
Whether or not this is true would require more testing than I care to spend on the question, but it makes a certain amount of sense. The first tar process reads from the source device as quickly as possible only waiting for that device to read. Meanwhile, the second tar process reads from its input pipe and writes as quickly as possible. It might have to wait for input, but if writes on the destination device are slower than reads on the source device it will only wait on the destination device. A single cp command will have to wait on both the source and the destination devices.
On the other hand, modern operating systems do a pretty good job of pre-caching IO operations. It's entirely possible cp will spend most of its time waiting on writes and getting reads from memory rather than the device itself. It seems like one would need really solid data to choose two tar commands over the more straightforward cp command.

Related

make directory with bash builtin

I am writing a bash script for Windows 7 and am limited to bash.exe
mkdir -p does not work (command not found)
I am looking for any replacement/cheat with bash builtins. (the -p flag is not mandatory, could do this with a loop instead)
One idea is to transform a file into a directory, or some other nice hack only an expert would know. This question is not about how to fix a missing mkdir binary, but about reducing dependencies on (broken) core utils.
These are the available bash builtins:
& (( . : [ [[ alias bg bind break builtin caller case cd command compgen complete compopt continue coproc declare dirs disown echo enable eval exec exit export false fc fg for function getopts hash help history if jobs kill let local logout mapfile popd printf pushd pwd read readarray readonly return select set shift shopt source suspend test time times trap true type typeset ulimit umask unalias unset until wait while {
If it is not possible with bash builtins, there are some external utils available. There is already one example, which depends on cp:
Create a new folder using bash without mkdir command
This is all I have:
bash blobpack blobunpack bzip2 cat chmod clear cpio cut dd dhtbsign dos2unix dumpimage elftool expr file find futility grep gzip hexdump kernel_dump loki_tool ls lz4 lzop mac2unix mboot md5sum mkbootimg mkimage mkmtkhdr mv printf pxa-mkbootimg pxa-unpackbootimg rkcrc rm sed stat sudo tail tar touch unix2dos unix2mac unpackbootimg unpackelf xz
Note: I have edited the question to clarify the needs.
If I'm reading the contents of that pastebin.com link correctly, this is not a true/complete cygwin installation.
That link shows the following (in a windows environment):
PATH = ... C:\Android\bash ... # this is not where cygwin is typically installed though, yeah, you could override the default installation directory
C:\Users\mint>bash # this is not how `cygwin/bash` is invoked
bash-4.1$
bash-4.1$ ls /cygdrive/c/Android/bash # full cygwin install does not throw everything under a single directory like this:
bash.exe ... snip ... mv.exe
At this point this doesn't appear to be an actual cygwin installation but rather some sort of reduced/incomplete/bastardized bash installation under the C:\Android\bash directory.
Where/How did OP 'install' the contents under the C:\Android\bash directory?
For comparison purposes (my windows machine):
# cygwin installation directory:
C:\cygwin64>dir
Volume in drive C is Windows7
Volume Serial Number is xxxx-yyyy
Directory of C:\cygwin64
11/06/2020 12:29 <DIR> .
11/06/2020 12:29 <DIR> ..
03/30/2021 16:08 <DIR> bin
05/25/2019 17:15 53,342 Cygwin-Terminal.ico
05/25/2019 18:46 95 Cygwin.2.bat
05/25/2019 17:15 88 Cygwin.bat
05/25/2019 17:15 157,097 Cygwin.ico
02/08/2021 13:01 <DIR> dev
02/04/2021 12:06 <DIR> etc
02/26/2021 16:35 <DIR> home
02/02/2021 11:34 <DIR> lib
07/12/2020 17:28 <DIR> sbin
11/06/2020 12:29 <DIR> srv
05/07/2021 07:46 <DIR> tmp
02/02/2021 11:34 <DIR> usr
05/25/2019 17:15 <DIR> var
6 File(s) 210,622 bytes
12 Dir(s) 20,305,154,048 bytes free
C:\cygwin64\bin>dir
Volume in drive C is Windows7
Volume Serial Number is xxxx-yyyy
Directory of C:\cygwin64\bin
03/30/2021 16:08 <DIR> .
03/30/2021 16:08 <DIR> ..
... snip ...
02/03/2017 14:40 37,395 base64.exe
02/03/2017 14:40 29,715 basename.exe
01/27/2017 14:13 739,859 bash.exe # bash binary
01/27/2017 14:13 7,291 bashbug
10/17/2014 17:00 81,949 bc.exe
... snip ...
12/20/2020 17:01 11,564 mintheme
02/03/2017 14:40 62,995 mkdir.exe # mkdir binary
02/03/2017 14:40 29,715 mkfifo.exe
08/22/2020 14:00 21,523 mkgroup.exe
... snip ...
2/19/2020 11:37 30 zstdless
2/03/2017 14:41 64,019 [.exe
1130 File(s) 541,740,761 bytes # 1100+ binaries in this directory
2 Dir(s) 20,305,154,048 bytes free
# example cygwin session startup
C:\cygwin64\bin\mintty.exe -i /Cygwin-Terminal.ico -
# user has option to startup a few different tty's;
# actual OS is determined from /etc/passwd entry (/usr/bin/bash in my case)
At this point if OP wants to run cygwin/bash then I'd suggest installing an actual cygwin environment (see cygwin.org ), making sure to also install the bash package.
As for how to simulate mkdir with ... whatever is installed under C:\Android\bash ... shrug ... have the originator pull more of the binaries (eg, mkdir.exe) from a complete cygwin/bash installation?
----------------------- previous answer (before reviewing the contents of the pastebin.com link)
I'm wondering if this could be an issue of an incomplete/corrupted cygwin/bash installation or an invalid $PATH ... ?
From my cygwin environment:
$ bash --version
GNU bash, version 4.4.12(3)-release (x86_64-unknown-cygwin)
$ which mkdir
/usr/bin/mkdir
$ command -v mkdir
/usr/bin/mkdir
$ mkdir --version
mkdir (GNU coreutils) 8.26
Packaged by Cygwin (8.26-2)
Does /usr/bin/mkdir exist?
What is returned when running find / -name mkdir*?
If you can find mkdir(.exe) then the next check would be for the location/path also being defined in $PATH ...
This trick can be helpful to run Windows md shell builtin from within bash script
echo -e 'md %*\r' > mkdir.bat
./mkdir.bat test

mkdir doesn't do path expansion

So I have folder aa
$ mkdir aa
and path expansion for ls command works like this:
$ ls -la a*
total 0
drwxr-xr-x 1 a a 0 Mar 29 08:41 ./
drwxr-xr-x 1 a a 0 Dec 31 1979 ../
$ ls -la a?
total 0
drwxr-xr-x 1 a a 0 Mar 29 08:41 ./
drwxr-xr-x 1 a a 0 Dec 31 1979 ../
But "the same" for mkdir shows an error:
$ mkdir a*/bb
mkdir: cannot create directory 'a*/bb': No such file or directory
$ mkdir a?/bb
mkdir: cannot create directory 'a?/bb': No such file or directory
Where can I read why this difference in behavior happens and is there simple trick to let mkdir be "smarter" for behavior like in ls?
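This does not work because wildcard expansion is done by the shell before the argument is passed to mkdir. bash tries to expand a*/bb, finds no match (aa/bb does not exist yet), and by default passes the pattern through unchanged, so mkdir receives the literal string a*/bb and fails. You can see what the shell passes with e.g.
echo a*/bb
or as you did before
ls -la a*/bb
echo prints the unexpanded pattern, and ls reports the same kind of "No such file or directory" error.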
Now I realize how stupid that question was. Probably I wanted something like this for expansion to work:
mkdir "$(ls -d a?)"/bb
Try:
mkdir -p a*/aa
mkdir -p a?/aa
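Another approach is to let the shell expand only the part of the path that already exists and append the new name in a loop. A minimal sketch, assuming a scratch directory with two hypothetical directories aa and ab:

```shell
#!/bin/bash
# Hypothetical layout: two existing directories that match a?.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir aa ab

# The trailing slash restricts the pattern to directories.
for d in a?/; do
    [ -d "$d" ] || continue     # nothing matched; skip the literal pattern
    mkdir -p "${d}bb"
done

ls -d a?/bb
```

The -d guard matters because an unmatched pattern is passed through as a literal string by default.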

Is there an elegant way to control permissions for directories created by "install -D"?

I am using [/usr/bin/]install in a Makefile to copy some binaries into my $HOME directory. My umask is set to 077.
The problem is that I am using install -D -m 700 to install the binaries and the parent directory is created with permissions of 755 and not 700:
$ umask
077
$ ls
$ touch hello
$ ls -l
total 0
-rw------- 1 emuso emuso 0 Apr 5 13:15 hello
$ install -D -m 700 hello $PWD/this/is/hello
$ ls -ld this
drwxr-xr-x 3 emuso emuso 4096 Apr 5 13:17 this
$ ls -lR this
this:
total 4
drwxr-xr-x 2 emuso emuso 4096 Apr 5 13:17 is
this/is:
total 0
-rwx------ 1 emuso emuso 0 Apr 5 13:17 hello
I want the directories this and is to get permissions 700 instead of 755.
Solutions that come to my mind are:
using install -d -m 700 to create the directory structure by hand.
using chmod to fix permissions manually.
The major drawback of the first solution is that I have a directory structure which I would have to traverse and create by hand.
So my question is: Is there an elegant way to control permissions for directories created by "install -D"?
What you want to achieve does not seem possible with a single invocation to install only, so you might have to resort to a combination of mkdir and install. Depending on your exact situation, you might be able to take advantage of a canned recipe, using something like this:
define einstall
test -d "$(dir $@)" || mkdir -p "$(dir $@)"
install -m 700 $< $@
endef
some/new/test/hello: hello
$(einstall)
If you plan to play around with canned recipes with make v3.81 or older, please make sure to read this answer to Why GNU Make canned recipe doesn't work?
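Outside of make, the same mkdir-plus-install combination can be sketched in plain shell. This is only an illustration under assumptions: the paths mirror the this/is/hello layout from the question, and it relies on mkdir -p applying the current umask to each directory it creates.

```shell
#!/bin/sh
# Hypothetical scratch paths mirroring the layout from the question.
work=$(mktemp -d)
cd "$work" || exit 1
touch hello

dest=this/is/hello

# Run mkdir -p in a subshell with umask 077 so that every directory it
# creates ("this" and "this/is") ends up with mode 700.
(umask 077 && mkdir -p "$(dirname "$dest")")
install -m 700 hello "$dest"

ls -ld this this/is "$dest"
```

Directories that already exist keep their current permissions; only newly created ones are affected.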

UNIX / Linux / Mac OSX get permission of file as number

This must be really simple to do but have completely drawn a blank. I can see the permission of files by using ls -la which can give something like:
-rwxr-xr-x 1 james staff 68 8 Feb 13:33 basic.sh*
-rw-r--r-- 1 james staff 68 8 Feb 13:33 otherFile.sh*
How do I translate that into a number for use with chmod, like chmod 755 otherFile.sh (without doing the manual conversion)?
stat -f "%Lp" [filename] works for me in OS X 10.8.
You should be able to use the stat command instead of ls. From looking at the manpage, this should work to get the file permissions:
for f in dir/*
do
perms=$(stat -f '0%Hp%Mp%Lp' "$f")
echo "$f has permissions $perms"
done
(although I am not at my Mac at the moment and therefore cannot test it).
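Since the question covers both Linux and Mac OS X, here is a small sketch that tries the GNU form first and falls back to the BSD form used above. The helper name perm_of is made up for illustration.

```shell
#!/bin/sh
# perm_of is a hypothetical helper name, not a standard command.
# GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'.
perm_of() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

f=$(mktemp)
chmod 640 "$f"
perm_of "$f"
```

The printed value can be passed straight back to chmod, e.g. chmod "$(perm_of basic.sh)" otherFile.sh.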

Suppress make rule error output

I have a rule that creates a directory
bin:
-mkdir $@
However after the first time the directory has been generated, I receive this output:
mkdir bin
mkdir: cannot create directory `bin': File exists
make: [bin] Error 1 (ignored)
Is there some way I can only run the rule if the directory doesn't exist, or suppress the output when the directory already exists?
Another way to suppress the make: error ... (ignored) output is to append || true to a command that could fail. Example with grep that checks for errors in a LaTeX log file:
undefined:
@grep -i undefined *.log || true
Without the || true, make complains when grep fails to find any matches.
This works for all commands, not just mkdir; that's why I added this answer.
The error is ignored already by the leading '-' on the command line. If you really want to lose the error messages from mkdir, use I/O redirection:
bin:
-mkdir bin 2> /dev/null
You will still get the 'ignored' warning from make, though, so it might be better to use the option to mkdir that doesn't cause it to fail when the target already exists, which is the -p option:
MKDIR_P = mkdir -p
bin:
${MKDIR_P} $@
The -p option actually creates all the directories that are missing on the given paths, so it can generate a number of directories in one invocation, but a side effect is that it does not generate an error for already existing directories. This does assume a POSIX-ish implementation of mkdir; older machines may not support it (though it has been standard for a long time now).
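The difference in exit status, which is what make reacts to, can be seen in a short sketch run in a hypothetical scratch directory:

```shell
#!/bin/sh
# Hypothetical scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir bin                          # first creation succeeds

mkdir bin 2>/dev/null              # directory exists: plain mkdir fails
plain_status=$?

mkdir -p bin                       # directory exists: -p still succeeds
p_status=$?

echo "plain mkdir exit status: $plain_status"
echo "mkdir -p exit status: $p_status"
```

A zero exit status means make has no error to report or ignore, so the leading '-' is no longer needed.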
The traditional way to handle directory creation is to use a stamp file that is depended on and creates the dir as a side effect. Remove the stamp file when making distclean or whatever your "really clean" target is:
bin/.dirstamp:
mkdir -p $(DIRS)
touch $@
bin/foo: bin/.dirstamp
$(MKFOO) -o $@
distclean:
rm -rf bin
The reason for this is as follows: whenever a file in bin is created/removed, the mtime of the containing directory is updated. If a target depends on bin, then the next time make runs, it will then recreate files that it doesn't need to.
Well I ended up with this construct, maybe someone will find it useful or can comment on it:
BINDIR = .
TMPDIR = tmp
OUTDIRS = $(BINDIR) $(TMPDIR)
$(OUTDIRS):
@test -d $@ || mkdir $@
Your rule should not be executed unless its target does not exist or is out of date because of its dependencies. In other words, you should never encounter this error.
[Example Added]
[max@truth tmp]$ ls -la
total 20
drwxr-xr-x. 2 max users 4096 Aug 14 21:11 .
drwx------. 80 max max 4096 Aug 14 18:25 ..
-rw-rw-r-- 1 max max 38 Aug 14 21:11 Makefile
[max@truth tmp]$ cat Makefile
.PHONY: all
all: bin
bin:
mkdir $@
[max@truth tmp]$ make
mkdir bin
[max@truth tmp]$ ls -la
total 24
drwxr-xr-x. 3 max users 4096 Aug 14 21:11 .
drwx------. 80 max max 4096 Aug 14 18:25 ..
drwxrwxr-x 2 max max 4096 Aug 14 21:11 bin
-rw-rw-r-- 1 max max 38 Aug 14 21:11 Makefile
[max@truth tmp]$ make
make: Nothing to be done for `all'.
[max@truth tmp]$ make
make: Nothing to be done for `all'.
Does your folder depend on anything? Is your folder a phony target?
Make assumes the first target is the default target. If that is your complete makefile, then it must be always trying to remake the default target, which is bin. Insert the following lines to the top of your makefile:
all: bin
.PHONY: all

Resources