Multiprocessing untar of a tbz2 file in Linux

I have a really high-end machine with a large number of cores, and I want to unzip a really big tbz2 file, but the default decompressor uses only one core.
Is there a way to unzip it using all the CPU cores, something like the -j32 option of make?
tar xjpf abc.tbz2 -C $PWD -j32

I have an alias for this myself, which makes tar use the parallelized bzip2 decoder lbzip2:
sudo apt install lbzip2 # on Debian-like systems
alias tbz2='tar --use-compress-program=lbzip2'
tbz2 -xpf abc.tbz2 -C "$PWD"  # dashed options and no -j: the alias already selects lbzip2
You could also specify another parallel bzip2 decoder like ibzip2, but until I have optimized that further, lbzip2 is probably the fastest, most easily accessible, and most time-tested solution. Note that others like pbzip2 compress in parallel but can only decompress in parallel the files that pbzip2 itself created; ordinary single-stream .bz2 files are still decompressed on one core.
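The same idea can be wrapped in a small shell function. This is a minimal sketch assuming GNU tar; untar_tbz2 is a hypothetical name, and the fallback to plain bzip2 when lbzip2 is not installed is an addition of this sketch, not part of the alias above. When extracting, tar runs the chosen program with -d.

```shell
#!/bin/sh
# untar_tbz2 (hypothetical helper): extract a .tbz2 archive with lbzip2 if it
# is installed, otherwise fall back to the single-threaded bzip2.
untar_tbz2() {
    archive=$1
    dest=${2:-.}
    if command -v lbzip2 >/dev/null 2>&1; then
        prog=lbzip2          # parallel decoder, uses all cores by default
    else
        prog=bzip2           # single-threaded fallback
    fi
    mkdir -p "$dest"
    # All options are dashed, so this works regardless of argument order.
    tar --use-compress-program="$prog" -xpf "$archive" -C "$dest"
}
```

Usage: untar_tbz2 abc.tbz2 "$PWD"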

Related

Rename/translate multiple files on Mac

I have a lot of files with names in one language, and I need to rename them all by translating the names into another language.
Is there any script for this? (preferably on Mac)
Figured it out. We can do it with the translate-shell CLI utility.
Install it with brew install translate-shell
Then run the following script in your folder:
for i in *.txt
do
sleep 5  # throttle requests to Google Translate
mv -i -- "$i" "$(printf '%s\n' "${i%.txt}" | trans -b nl:en).txt"
done
translate-shell calls the Google Translate server to do the translation
sleep 5 is needed to avoid being blocked by Google's server for sending too many requests per second
trans is the actual translate command
-b stands for "brief", as we don't need verbose output
nl:en are the source and destination languages
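Before renaming a whole folder, it can help to preview the result. This is a dry-run sketch of the loop above that only prints "old -> new"; preview_renames, TRANSLATE and DELAY are hypothetical names introduced here, and by default it still calls trans -b nl:en with the same 5-second pause.

```shell
#!/bin/sh
# preview_renames (hypothetical): print the renames the loop above would do,
# without moving anything. TRANSLATE and DELAY are optional overrides.
preview_renames() {
    translate=${TRANSLATE:-trans -b nl:en}
    for i in *.txt; do
        [ -e "$i" ] || continue          # no match: the glob stays literal
        sleep "${DELAY:-5}"              # same throttling as the real loop
        # $translate is unquoted on purpose so the command splits into words
        new=$(printf '%s\n' "${i%.txt}" | $translate).txt
        printf '%s -> %s\n' "$i" "$new"
    done
}
```

Once the preview looks right, run the real loop.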

What is the practical purpose of -c in the linux install (/usr/bin/install) command?

What is the practical purpose of -c in the linux install command? The man page shows:
-c (ignored)
I came across this scenario while installing Nagios command-mode:
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/var/rw
From the BSD install man page:
-c Copy the file. This is actually the default. The -c option is only included for backwards compatibility.
I think it exhibits the same behavior for copying files and creating directories. Also, taking a look at https://github.com/NagiosEnterprises/nagioscore/blob/master/Makefile.in#L37-L39 and https://github.com/NagiosEnterprises/nagioscore/blob/master/Makefile.in#L415-L421, we can see that the install program and its options are set by the ./configure script.
Taking a look at configure.ac, we see that the default autoconf macros determine the binary to use: https://github.com/NagiosEnterprises/nagioscore/blob/master/configure.ac#L17-L19. Setting $(COMMAND_OPTS), on the other hand, is Nagios-specific: https://github.com/NagiosEnterprises/nagioscore/blob/master/configure.ac#L237-L240.
Hope this helps!
TL;DR: It's a relic and doesn't matter :)
Historically, BSD's install would not copy files unless specified with the -c option. GNU Coreutils's version of install copied files by default, and provided -c as a BSD-compatibility option that is ignored. The GNU Coreutils manpage simply says
-c (ignored)
and doesn't provide context. This is the version of coreutils present on most Linux systems.
FreeBSD's manpage is a little bit more descriptive:
-c Copy the file. This is actually the default. The -c option is only included for backwards compatibility.
Nowadays, there's no particular reason to use the -c option unless you want compatibility with very old systems. That being said, there's also no particular downside to leaving it in, since it's unlikely you'll hit the command-line length limit, so it's kind of an "up to you" sort of thing.
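To check this on your own machine, here is a small sketch that installs the same file with and without -c and compares the results. demo_install_c is a hypothetical name; the sketch assumes an install that accepts -c, which both GNU coreutils and BSD versions do.

```shell
#!/bin/sh
# demo_install_c (hypothetical): copy a file with and without -c, and create a
# directory with -c -d as in the Nagios command, then compare the outcomes.
demo_install_c() (
    work=$(mktemp -d) || exit 1
    printf 'demo\n' > "$work/src.txt"
    install -m 644 "$work/src.txt" "$work/plain.txt"
    install -c -m 644 "$work/src.txt" "$work/with_c.txt"
    # -c is accepted together with -d and changes nothing there either
    install -c -m 775 -d "$work/rw"
    # The source must still exist (it was copied, not moved) and both copies match.
    if [ -f "$work/src.txt" ] && [ -d "$work/rw" ] &&
       cmp -s "$work/plain.txt" "$work/with_c.txt"; then
        echo identical
    fi
    rm -r "$work"
)
```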

Stop GNU make from compiling by default

I just want to use GNU make to compress some files.
So I wrote the Makefile as follows:
lib.tar.lzma: $(shell find ~/lib -name "*")
	rm -f lib.tar.lzma
	tar -cavf lib.tar.lzma -C ~/ lib/
However, after I run make, it automatically compiles the C++ source code in that directory.
How can I stop it from compiling them? I just want to compress them.
Update:
I got the following error:
<builtin>: recipe for target '/home/xxx/lib/app' failed
It seems to be a built-in recipe.
(We don't know your entire Makefile and your full file tree, so this is only a guess; I assume that you have shown us a fragment of your much bigger Makefile)
However, after I run make, it automatically compiles the C++ source code in that directory.
This is probably happening because your $(shell find ~/lib -name "*") expands to a list that includes your object files and executables (like lib/app from the error). Since they are prerequisites, make tries to remake them with its built-in rules, so they get recompiled whenever their source files are newer. BTW, you might want to use $(shell cd .. ; find lib -name "*") instead, or, if lib has no subdirectories, even $(wildcard ../lib/*).
You probably don't need any dependency for that lib.tar.lzma target, so just have:
lib.tar.lzma:
	rm -f lib.tar.lzma
	tar -cavf lib.tar.lzma -C ~/ lib/
BTW, that -C ~/ could also be written -C $$HOME (the doubled $ is needed inside a recipe). make runs commands with /bin/sh; POSIX shells do perform tilde expansion, but $$HOME is more explicit. Perhaps a -C .. might be better ...
Perhaps you could write a shell script make-backup.sh that does a cleverer tar, and then code:
lib.tar.lzma: make-backup.sh
	./make-backup.sh $@
However, perhaps you do have dependencies (e.g. if you need to archive some generated files). Then you need to list them explicitly and wisely (you certainly don't want to depend on all the files; perhaps only the source ones). Also, you might not need to archive any object files *.o, if you have some (but YMMV).
I recommend using make --trace or remake -x to debug your Makefile.
BTW, having a Makefile only for a backup is useless; write a shell script instead.
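What such a make-backup.sh might contain is sketched below. The script name comes from the answer, but its arguments, defaults and the make_backup function are hypothetical; it assumes GNU tar, whose -a option picks the compressor from the archive suffix, and it skips object files as suggested above.

```shell
#!/bin/sh
# make-backup.sh (hypothetical contents)
# Usage: ./make-backup.sh ARCHIVE [PARENT] [DIR]
# Archives PARENT/DIR (default: $HOME/lib) into ARCHIVE, leaving out *.o files.
make_backup() {
    archive=$1
    parent=${2:-$HOME}
    dir=${3:-lib}
    rm -f -- "$archive"
    # -a: choose the compressor from the suffix (.tar.lzma, .tar.gz, ...)
    tar --exclude='*.o' -caf "$archive" -C "$parent" "$dir"
}

# Only run when given arguments, e.g. from the make rule: ./make-backup.sh $@
if [ "$#" -gt 0 ]; then
    make_backup "$@"
fi
```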
I also strongly recommend using some version control system (like git) if you don't use any. Notice that git has an archive subcommand which might be a more clever backup.

Rename directory contents in OSX terminal

I have multiple folders with similarly named files (e.g., Article(1).rtf, Article(2).rtf, etc.). I want to store them all in a single directory, but since they have exactly the same names, they are immediately overwritten. I would like to add, for example, a different number to the name of each file depending on the folder it's in, so that the names are different. E.g., Article(1)1.rtf, Article(2)1.rtf for the first folder, Article(1)2.rtf, Article(2)2.rtf for the second, etc. How would I go about it in Terminal? Thanks.
Use mv from GNU Coreutils
One way to do this is with the mv utility from GNU coreutils. You can install it via Homebrew with:
brew install coreutils
Once you have the utilities installed, you can use the GNU version of mv, which Homebrew makes available as gmv to avoid conflicts with the native BSD version. The GNU version has a very handy --backup flag, which is ideal for this use case. For example:
# {} must come right before +, so the target directory is given with -t
find . -name '*.rtf' \
-exec gmv --backup=numbered -t /path/to/merge/directory/ {} +
This prevents files from being clobbered and results in numbered backup filenames like:
Article(1).rtf
Article(1).rtf.~1~
Article(1).rtf.~2~
Article(2).rtf
Article(2).rtf.~1~
Article(2).rtf.~2~
and so forth. You can then use other Homebrew-installable utilities like fdupes to deduplicate or rename to further munge the filenames to suit your purposes.
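If you want exactly the naming from the question (the folder's number appended before the extension), a plain POSIX sh loop also works without GNU tools. This is a sketch under stated assumptions: merge_numbered is a hypothetical name, the folders are the immediate subdirectories of SRC numbered in sorted glob order, and DEST should lie outside SRC.

```shell
#!/bin/sh
# merge_numbered SRC DEST (hypothetical): move SRC/*/*.rtf into DEST, appending
# each folder's ordinal before the extension: Article(1).rtf -> Article(1)2.rtf
merge_numbered() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    n=0
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue        # no subdirectories: glob stays literal
        n=$((n + 1))
        for f in "$dir"*.rtf; do
            [ -e "$f" ] || continue      # folder without .rtf files
            name=$(basename "$f" .rtf)
            mv -- "$f" "$dest/$name$n.rtf"
        done
    done
}
```

Usage: merge_numbered ~/Articles ~/Merged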

shell: can I use linux-like command style `rm tmp/ -fr` in mac

Say I want to remove a tmp/ dir in the shell.
In linux (like ubuntu), I can rm tmp/ -fr or rm -fr tmp/. But in mac, I can only rm -fr tmp/.
Is there any way to configure the Mac shell (bash/zsh/...), so I can use the Linux-like command style: rm tmp/ -fr?
Having options after operands as in rm tmp/ -fr is non-standard. IEEE Std 1003.1-2001 has, in section 12.2 Utility Syntax Guidelines:
Guideline 9:
All options should precede operands on the command line.
The implementors of the GNU utilities (as used by most Linux distributions) have chosen to add many non-standard extensions. While sometimes convenient, using these extensions is inherently unportable. MacOS X has a userland derived from BSD, which does not have most of the non-standard GNU extensions. If you expect to be working with non-GNU systems such as BSD, Solaris or any other commercial UNIX in the future, it really pays to stick to the standard syntax of utilities and not get used to any GNU extensions. It saves a lot of hassle when working with all the different UNIX operating systems out there. This is especially true and important when writing scripts. Relying on GNU syntax in scripts will make them unportable.
So instead of installing GNU coreutils on MacOS X, my advice would be to use it as an opportunity to get used to standard syntax (IEEE Std 1003.1, POSIX.2 etc.).
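To see Guideline 9 in action, here is a small sketch using the POSIX getopts builtin, which, like the option parser of the BSD-derived rm on macOS, stops at the first operand instead of looking for options after it. parse_demo is a hypothetical function that only recognizes -f and -r and prints what it parsed.

```shell
#!/bin/sh
# parse_demo (hypothetical): parse -f/-r the POSIX way and report the result.
parse_demo() {
    OPTIND=1                 # reset so the function can be called repeatedly
    f=0 r=0
    while getopts fr opt; do
        case $opt in
            f) f=1 ;;
            r) r=1 ;;
            *) return 2 ;;   # unknown option
        esac
    done
    shift $((OPTIND - 1))    # drop the parsed options; the rest are operands
    printf 'f=%s r=%s operands=%s\n' "$f" "$r" "$*"
}
```

parse_demo -fr tmp/ prints f=1 r=1 operands=tmp/, whereas parse_demo tmp/ -fr prints f=0 r=0 operands=tmp/ -fr, because parsing stops at tmp/ and -fr is left as an operand.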
It depends on the commands that you use. OS X comes from the BSD family (via NeXTSTEP), and so their standard utilities mostly descend from BSD. There is a large degree of compatibility between BSD and Linux style commands, but every once in a while you run into implementation-specific things like this. If you really want to use the Linux versions of the commands, you can install the GNU coreutils package. It's available from Homebrew (recommended) and MacPorts. In general, though, it's a pain to install the GNU coreutils over the built in BSD toolchain, because of naming clashes and such.
In general, it depends on the implementation of the command you execute. However, for rm, since you used that as an example, you cannot; at least, it will not do what you think.
rm -fr tmp/: will delete directory tmp/ with the options -r and -f, so recursive (here mandatory) and forced.
rm tmp/ -fr: will attempt to delete the directory tmp/ and a file or directory named -fr, but will fail with an error such as rm: tmp/: is a directory (without -r, rm refuses to remove directories).
Note: to remove a file that is actually named -fr, use rm -- -fr (or rm ./-fr).
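Both forms from the note can be tried safely in a scratch directory. This sketch (demo_dash_names is a hypothetical name) creates a file named -fr twice and removes it once with -- and once with a ./ prefix, both of which work with BSD and GNU rm.

```shell
#!/bin/sh
# demo_dash_names (hypothetical): delete a file whose name starts with '-'.
# The body runs in a subshell, so the cd does not affect the caller.
demo_dash_names() (
    work=$(mktemp -d) || exit 1
    cd "$work" || exit 1
    : > ./-fr                  # create it; ./ keeps the name from looking like options
    rm -- -fr                  # '--' ends option parsing, so -fr is an operand
    [ -e ./-fr ] && exit 1
    : > ./-fr
    rm ./-fr                   # a path prefix works with any rm
    [ -e ./-fr ] && exit 1
    cd / && rmdir "$work"
    echo removed
)
```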
