Redirect output of xargs to file - bash

I want to delete the first line of every file in a directory and save the result to a new file whose name is the original filename with '.tmp' appended. For example, if there is a file named input.txt with the following content:
line 1
line 2
I want to create a file in the same directory with name input.txt.tmp which will have the following content
line 2
I'm trying this command:
find . -type f | xargs -I '{}' tail -n +2 '{}' > '{}'.tmp
The problem is that, instead of writing the output to separate files with a .tmp suffix, it creates just one file named {}.tmp. I understand that this happens because the shell handles the redirection for the xargs command as a whole, not for each file. But is there any way to tell xargs that the output redirection is part of its argument?

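Note that you can use find together with -exec, with no need to pipe to xargs:
find . -type f -exec sh -c 'tail -n +2 "$1" > "$1.tmp"' sh {} \;
Here sh -c runs a small script once per file. find passes the file name as the argument {}, the script receives it as "$1", and both the tail and the redirection happen inside that script. Passing the name as an argument, instead of pasting {} into the script text, keeps file names with spaces or quotes intact.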
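Since the question asks about xargs specifically, the same per-file sh -c idea can be sketched with xargs as well. This is only an illustration under some assumptions: the scratch directory and the sample files are made up, -print0 and -0 are GNU/BSD extensions rather than POSIX, and the ! -name '*.tmp' test is added so that freshly written output files are not picked up as input.

```shell
# Hypothetical scratch directory with two sample files.
demo=$(mktemp -d)
printf 'line 1\nline 2\n' > "$demo/input.txt"
printf 'a\nb\nc\n' > "$demo/my file.txt"

# The redirection is part of the sh -c script, so it is performed once per
# file. The file name arrives as "$1" and is never pasted into the script.
find "$demo" -type f ! -name '*.tmp' -print0 |
  xargs -0 -I{} sh -c 'tail -n +2 "$1" > "$1.tmp"' sh {}

first=$(cat "$demo/input.txt.tmp")
second=$(cat "$demo/my file.txt.tmp")
printf '%s\n' "$first"

rm -rf "$demo"
```

In the original command the shell opens {}.tmp once, before xargs even starts, which is why only a single file appears.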

If you have GNU Parallel you can run:
find . -type f | parallel tail -n +2 {} '>' {}.tmp
All new computers have multiple cores, but most programs are serial in nature and will therefore not use the multiple cores. However, many tasks are extremely parallelizable:
Run the same program on many files
Run the same program for every line in a file
Run the same program for every block in a file
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.
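For readers without GNU Parallel at hand, the "start a new job as soon as one finishes" behaviour can be roughly illustrated with xargs -P. Note that this swaps in a different tool: -P is a GNU/BSD xargs extension, not POSIX and not GNU Parallel itself, and the eight sleep jobs are purely hypothetical stand-ins for real work.

```shell
# Eight hypothetical jobs, at most four running at once. xargs starts the
# next job as soon as any running one exits, so no slot sits idle.
finished=$(
  printf '%s\n' 1 2 3 4 5 6 7 8 |
    xargs -P 4 -I{} sh -c 'sleep 1; echo "job $1 done"' sh {} |
    wc -l | tr -d ' '
)
echo "$finished jobs finished"
```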
Installation
A personal installation does not require root access. It can be done in 10 seconds by doing this:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Related

How to delete all files except the N newest files?

This command lets me log in to a server, into a specific directory, from my PC:
ssh -t xxx.xxx.xxx.xxx "cd /directory_wanted ; bash"
How can I then do the following operation in that directory? I want to delete all files except the N newest.
find ./tmp/ -maxdepth 1 -type f -iname *.tgz | sort -n | head -n -10 | xargs rm -f
This command should work:
ls -t *.tgz | tail -n +11 | xargs rm -f
Warning: Before doing rm -f, confirm that the files being listed by ls -t *.tgz | tail -n +11 are as expected.
How it works:
ls lists the contents of the directory. The -t flag sorts by modification time (newest first). See the man page of ls.
tail -n +11 outputs everything starting from line 11, that is, it skips the 10 newest files. See the man page of tail for more details.
If the system is Mac OS X, you can delete based on creation time too. Use ls with the -Ut flags, which sorts the contents by creation time.
You can use this command:
ssh -t xxx.xxx.xxx.xxx "cd /directory_wanted; ls -t *.tgz | tail -n +11 | xargs rm -f; bash"
Inside the quotes you can add whatever operations should be performed on the remote machine, with the commands separated by semicolons (;).
Note: this includes the same command suggested by silentMonk. It is simple and it works, but verify it once before performing the operation.
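To check the ls -t | tail pipeline before pointing it at real data, it can be tried on throwaway files first. The following is a minimal sketch, assuming a scratch directory, five made-up archive names and keep=3. Like any approach that parses ls output, it is only safe when the file names contain no newlines or other unusual characters.

```shell
demo=$(mktemp -d)   # hypothetical scratch directory
keep=3              # number of newest files to keep

# Create five sample archives with increasing modification times.
for i in 1 2 3 4 5; do
  touch -t "20240101000$i" "$demo/backup$i.tgz"
done

# Dry run: list what would be deleted (everything after the newest $keep).
old=$(cd "$demo" && ls -t -- *.tgz | tail -n +"$((keep + 1))")
printf 'would delete:\n%s\n' "$old"

# Real run.
(cd "$demo" && ls -t -- *.tgz | tail -n +"$((keep + 1))" | xargs rm -f --)

remaining=$(cd "$demo" && echo *.tgz)
echo "remaining: $remaining"
rm -rf "$demo"
```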

using "wc -l" on script counts more than using on terminal

I'm making a bash script and it's like this:
#!/bin/bash
DNUM=$(ls -lAR / 2> /dev/null | grep '^d' | wc -l)
echo there are $DNUM directories.
The problem is that when I run this line directly in the terminal:
ls -lAR / 2> /dev/null | grep '^d' | wc -l
I get a number.
But when I run the script, it displays a greater number, about 30 to 50 more.
What is the problem here?
Why does the wc command count more lines when run from a script?
You may have different directory roots for the two runs. Instead of using ls to find the directories, you can use
find parent_directory -type d
and pipe to wc -l to count.
The /proc directory contains an entry for every running process, each of which appears as a directory, so its contents change from run to run. To exclude it from the count use
find / -path /proc -prune -o -type d -print | wc -l
The explicit -print matters: without it, find also prints the pruned /proc entry itself.
To find the differences in your exact case, I would suggest running
#!/bin/bash
# Capture the directory listing twice and compare the two runs.
for r in 1 2; do
ls -lAR / 2> /dev/null | grep '^d' > out${r}.txt
done
diff -Nura out1.txt out2.txt
rm -f out1.txt out2.txt
But as most people have already said, it would make sense to exclude directories such as sys and proc.
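The -prune approach can be tried safely on a small made-up tree instead of the real /. This is a minimal sketch, assuming a scratch directory whose proc subdirectory stands in for /proc; the directory names are hypothetical.

```shell
demo=$(mktemp -d)   # hypothetical stand-in for /
mkdir -p "$demo/etc/conf.d" "$demo/proc/1234" "$demo/home/user"
touch "$demo/etc/passwd"

# Skip the proc subtree entirely, then print only the remaining directories.
# Counted: the root itself, etc, etc/conf.d, home and home/user.
count=$(find "$demo" -path "$demo/proc" -prune -o -type d -print | wc -l | tr -d ' ')
echo "there are $count directories."

rm -rf "$demo"
```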

Xargs stdout to file redirection appending file indefinitely

I have a problem with redirecting output from xargs namely I do something like:
find . -mmin -10 | xargs grep mypattern > greping
This keeps writing to the file indefinitely (I waited until the file reached around 25 GB), but when I add a pipe to grep at the end I get proper results (a file of around 25 kB):
find . -mmin -10 | xargs grep mypattern | grep 2013-07-11 > greping
What am I missing here and why does xargs in first code snippet keep writing to file ?
Bash version GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
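Change the redirected file from >greping to >../greping or >/tmp/greping.
Basically, the output file should not be in the current directory or any of its subdirectories. The file greping has just been modified, so find . -mmin -10 lists it, and grep then reads the very file it is writing to and keeps finding new matches in its own output.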
Or try:
find . -mmin -10 | grep -Fx -v './greping' | xargs grep mypattern > greping
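A variant of the exclusion that lets find skip the output file itself can be sketched like this. It is only an illustration: the scratch directory and the two log files are made up, and -mmin, -print0 and -0 are GNU/BSD extensions rather than POSIX.

```shell
demo=$(mktemp -d)   # hypothetical scratch directory
(
  cd "$demo" || exit 1
  echo 'mypattern 2013-07-11' > a.log
  echo 'mypattern 2013-07-12' > b.log

  # The output file is created in the searched directory, so it is
  # excluded by name; otherwise grep would read its own output.
  find . -type f -mmin -10 ! -name greping -print0 |
    xargs -0 grep mypattern > greping
)
matches=$(wc -l < "$demo/greping" | tr -d ' ')
echo "$matches matching lines"

rm -rf "$demo"
```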

bash: Script arguments ${1} interfering with xargs' ${1} inside script

Long story short, I have a script chunk that looks like this:
cd ${1} # via command-line argument
find . -name "output*.txt" | xargs cat ${1} | grep -v ${filter} > temp.txt
I essentially got to this point by building the find ... line on the command line, then pasting it into my script, then adding the cd command to make it easy to reuse this script in a wrapper that will run it on a large set of directories. Anyway ...
The problem is that cd and xargs use the same ${1} variable, which sort of makese
I know that I can drop the ${1} argument from xargs, and I can probably rewrite the find command to not need xargs at all, but my question remains:
Is there a way to "reset" ${1} after I use it for cd so that xargs doesn't
I'm not familiar with a version of xargs that uses ${1} as a default replacement string, but the following should work:
find . -name "output*.txt" | xargs -I '{}' cat '{}' | grep -v ${filter} > temp.txt
Your use of find + xargs suffers from the separator problem (file names containing spaces, quotes or newlines get split or mangled): https://en.wikipedia.org/wiki/Xargs#Separator_problem
Here is a solution that does not have that problem. It uses GNU Parallel:
find . -name "output*.txt" |
parallel cat {} |
grep -v ${filter} > temp.txt
Watch the intro videos to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
The xargs utility only manages how many arguments are passed to a command; with -I, the braces {} act as a placeholder for each argument or file name.
I simulated the wrapper you have and hit the same issue: the pathname passed as ${1} to my function ended up on the xargs command line. That is simply how the shell works: it substitutes and expands all variables to complete the command line before executing anything.
The utilities are not written to handle this themselves; they never see the unexpanded text.
Example:
echo * ;
Here the shell expands * to the names of all files in the current directory and passes them to echo as arguments before echo runs.
Likewise, in your case the shell expands ${1} to the pathname you passed before executing the command, which is why you got that result.
Solution: you could use the braces (empty, of course) or, better, just drop them. This works fine for me:
find . -iname "*${filename}*" | xargs ls -l | more commands ....;
Hope this helps.
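The point about expansion can be seen in a small sketch where ${1} is used only for the cd and the xargs command line has no placeholder at all. The function name collect, the filter value and the sample files are hypothetical; -print0 and -0 are GNU/BSD extensions.

```shell
demo=$(mktemp -d)   # hypothetical scratch directory
mkdir "$demo/run1"
echo 'keep me' > "$demo/run1/output1.txt"
echo 'drop me' > "$demo/run1/output2.txt"
filter='drop'       # hypothetical filter pattern

collect() {
  # "$1" is expanded by the shell right here, before any command runs;
  # xargs never sees the text ${1}, so it needs no "reset".
  (
    cd "$1" || exit 1
    find . -name 'output*.txt' -print0 | xargs -0 cat | grep -v -- "$filter"
  )
}

result=$(collect "$demo/run1")
printf '%s\n' "$result"
rm -rf "$demo"
```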

How to use > in an xargs command?

I want to find a bash command that will let me grep every file in a directory and write the output of that grep to a separate file. My guess would have been to do something like this
ls -1 | xargs -I{} "grep ABC '{}' > '{}'.out"
but, as far as I know, xargs doesn't like the double-quotes. If I remove the double-quotes, however, then the command redirects the output of the entire command to a single file called '{}'.out instead of to a series of individual files.
Does anyone know of a way to do this using xargs? I just used this grep scenario as an example to illustrate my problem with xargs so any solutions that don't use xargs aren't as applicable for me.
Do not make the mistake of doing this:
sh -c "grep ABC {} > {}.out"
This will break under a lot of conditions, including funky filenames and is impossible to quote right. Your {} must always be a single completely separate argument to the command to avoid code injection bugs. What you need to do, is this:
xargs -I{} sh -c 'grep ABC "$1" > "$1.out"' -- {}
Applies to xargs as well as find.
By the way, never use xargs without the -0 option (unless for very rare and controlled one-time interactive use where you aren't worried about destroying your data).
Also don't parse ls. Ever. Use globbing or find instead: http://mywiki.wooledge.org/ParsingLs
Use find for everything that needs recursion and a simple loop with a glob for everything else:
find /foo -exec sh -c 'grep ABC "$1" > "$1.out"' -- {} \;
or non-recursive:
for file in *; do grep ABC "$file" > "$file.out"; done
Notice the proper use of quotes.
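To see that passing the name as a separate argument survives awkward file names, the find variant can be tried on made-up files. This is a minimal sketch, assuming a scratch directory and two hypothetical file names, one with a single quote and one with double quotes; the ! -name '*.out' test is an addition so that freshly created output files are not searched again.

```shell
demo=$(mktemp -d)   # hypothetical scratch directory
(
  cd "$demo" || exit 1
  echo ABC > "I'm here.txt"
  echo ABC > 'say "hi".txt'

  # {} is a separate argument and reaches the script as "$1", so quotes
  # and spaces in the name are never parsed as shell syntax.
  find . -type f ! -name '*.out' -exec sh -c 'grep ABC "$1" > "$1.out"' sh {} \;
)
single=$(cat "$demo/I'm here.txt.out")
double=$(cat "$demo/say \"hi\".txt.out")
printf '%s %s\n' "$single" "$double"

rm -rf "$demo"
```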
A solution without xargs is the following:
find . -mindepth 1 -maxdepth 1 -type f -exec sh -c "grep ABC '{}' > '{}.out'" \;
...and the same can be done with xargs, it turns out:
ls -1 | xargs -I {} sh -c "grep ABC '{}' > '{}.out'"
Edit: single quotes added after remark by lhunath.
I assume your example is just an example and that you may need > for other things. GNU Parallel http://www.gnu.org/software/parallel/ may be your rescue. It does not need additional quoting as long as your filenames do not contain \n:
ls | parallel "grep ABC {} > {}.out"
If you have filenames with \n in them:
find . -print0 | parallel -0 "grep ABC {} > {}.out"
As an added bonus you get the jobs run in parallel.
Watch the intro videos to learn more: http://pi.dk/1
If you need to move it to a server, that does not have GNU Parallel installed, try parallel --embed.
Actually, most of the answers here do not work with all filenames (if they contain double and single quotes), including the answer by lhunath and Stephan202.
This solution works with filenames with single and double quotes:
find . -mindepth 1 -print0 | xargs -0 -I{} sh -c 'grep ABC "$1" > "$1.out"' -- {}
Here's a test with a filename that contains a single quote and a space:
echo ABC > "I'm here.txt"
# lhunath solution (hangs waiting for input)
$ find . -exec sh -c 'grep "$1" > "$1.out"' -- {} \;
# Stephan202 solutions
$ find . -mindepth 1 -maxdepth 1 -type f -exec sh -c "grep ABC '{}' > '{}.out'" \;
grep: ./Im: No such file or directory
grep: here.txt > ./Im here.txt.out: No such file or directory
$ ls -1 | xargs -I {} sh -c "grep ABC '{}' > '{}.out'"
xargs: unterminated quote
# this solution
$ find . -mindepth 1 -print0 | xargs -0 -I{} sh -c 'grep ABC "$1" > "$1.out"' -- {}
$ ls -1
"I'm here.txt"
"I'm here.txt.out"
