Suppose you have a folder that contains two files.
Example: stop_tomcat_center.sh and start_tomcat_center.sh.
In my example, ls *tomcat* returns these two scripts.
How can I search and execute these two scripts simultaneously?
I tried
ls *tomcat* | xargs sh
but only the first script is executed (not the second).
An easy way to do multiple things in parallel is with GNU Parallel:
parallel ::: ./*tomcat*
Or, if your scripts don't have a shebang at the first line:
parallel bash ::: ./*tomcat*
Or if you like xargs:
ls *tomcat* | xargs -n 1 -P 2 sh
xargs is missing the -n 1 option.
From man xargs:
-n max-args, --max-args=max-args
Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.
xargs otherwise tries to execute the command with as many parameters as possible, which makes sense for most commands.
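In your case, ls *tomcat* | xargs sh runs sh with both file names on one command line (sh start_tomcat_center.sh stop_tomcat_center.sh, since ls sorts the names), so only the first script is executed; the second name is passed to it as $1, which the script probably just ignores.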
Also, it is not a good idea to parse the output of ls. A better way would be to use
find . -maxdepth 1 -name '*tomcat*' -print0 | xargs -0 -n 1 sh
or
for command in *tomcat*; do sh "$command"; done
This answer is based on the assumption that the OP meant "both with one command line" when he wrote "simultaneously".
For solutions on parallel execution, take a look at the other answers.
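To see the difference that -n 1 makes, here is a minimal sketch that uses echo in place of sh, so nothing is actually executed. The arguments a, b and c are placeholders, not names from the question.

```shell
# Without -n, xargs packs all arguments into one command line.
packed=$(printf '%s\n' a b c | xargs echo run:)

# With -n 1, xargs starts one command per argument.
single=$(printf '%s\n' a b c | xargs -n 1 echo run:)

printf '%s\n' "$packed"   # a single "run:" line with all three arguments
printf '%s\n' "$single"   # three "run:" lines, one argument each
```

Replacing echo run: with sh gives the behaviour the question asks for: one sh invocation per script.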
You can do the following to search and execute
find . -name "*.sh" -exec sh {} \;
find locates the matching files and -exec runs sh on each match.
Related
I have a scenario where I need to execute a series of commands on each file that's found. This normally would work great, except I have over 100 files and folders to exclude from find's results for execution. This becomes unwieldy and non-executable from the shell directly. It seems like it would be optimal to use an "exclusion file" similar to how tar or grep allows for such files.
Since find does not accept a file for exclusion, but grep does, I want to know: how can the following be converted to a command that would replace the exclusion (prune) and exec functions in find to instead utilize grep with an exclusion file (grep -v -f excludefile) to exclude the folders and files and then execute a series of commands on the result like the current command does it:
find $IN_PATH -regextype posix-extended \
-regex "/(excluded1|excluded2|excluded3|...|excludedN)" -prune \
-o -type f \
-exec sh -c "( cmd -with_args 1 '{}'; cmd -args2 '{}'; cmd3 '{}') \
| cmd4 | cmd5 | cmd6; cmd7 '{}'" \; \
> output
As a side note (not critical), I've read that if you don't use exec this process becomes much less efficient and this process is already consuming over 100 minutes to execute each time that it's run, so I don't want to slow it down any more than is necessary.
The best way I can think of for your scenario is to split the one-liner into two commands and use xargs with parallel execution.
find $IN_PATH -regextype posix-extended \
-regex "/(excluded1|excluded2|excluded3|...|excludedN)" -prune \
-o -type f > /tmp/full_file_list
grep -v -f excludefile /tmp/full_file_list | tr '\n' '\0' | xargs -0 -n 1 -P <nr_procs> sh -c 'command here' sh > output
See "Bash script processing limited number of commands in parallel" and "Doing parallel processing in bash?" to learn more about parallel execution in bash.
Finding files and running commands on them in a single one-liner makes the two compete for disk I/O; splitting the one-liner could speed up the process a little.
Hint: remember to add full_file_list, excludefile and output to your exclude rules, and always debug your command on a smaller directory to reduce waiting time.
Why not simply:
find . -type f |
grep -v -f excludefile |
xargs whatever
With respect to "this process is already consuming over 100 minutes to execute": that is almost certainly a problem with whatever command line you wrote to replace whatever above, and we could probably help you improve it if you post a separate question.
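A minimal sketch of the find | grep -v -f | xargs pipeline above, run against a throwaway directory. The directory layout, the /cache/ pattern and the use of basename as the command are all assumptions made for illustration, and the sketch assumes file names contain no whitespace or newlines.

```shell
# Build a throwaway tree: two files to keep, one directory to exclude.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/cache"
touch "$work/src/a.txt" "$work/src/b.txt" "$work/cache/c.txt"

# One pattern per line; -F treats them as fixed strings, not regexes.
printf '%s\n' '/cache/' > "$work/excludefile"

# List files, drop excluded paths, then run a command on each survivor.
kept=$(cd "$work" &&
    find . -type f ! -name excludefile |
    grep -v -F -f excludefile |
    sort |
    xargs -n 1 basename)

printf '%s\n' "$kept"
rm -rf "$work"
```

Dropping -F makes grep treat each line of the exclude file as a regular expression, which is closer to the -regex alternation in the question.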
I am using jsonlint to lint a bunch of files in a directory (recursively). I wrote the following command:
find ./config/pages -name '*.json' -print0 | xargs -0I % sh -c 'echo Linting: %; jsonlint -V ./config/schema.json -q %;'
It works for most files, but for some files I get the following error:
Linting: ./LONG_FILE_NAME.json
fs.js:500
return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
^
Error: ENOENT, no such file or directory '%'
It appears to fail for long filenames. Is there a way to fix this? Thanks.
Edit 1:
Found the problem.
-I replstr
Execute utility for each input line, replacing one or more occurrences
of replstr in up to replacements (or 5 if no -R flag is specified)
arguments to utility with the entire line of input. The resulting
arguments, after replacement is done, will not be allowed to grow
beyond 255 bytes; this is implemented by concatenating as much of the
argument containing replstr as possible, to the constructed arguments
to utility, up to 255 bytes. The 255 byte limit does not apply to
arguments to utility which do not contain replstr, and furthermore, no
replacement will be done on utility itself. Implies -x.
Edit 2:
Partial solution. Supports longer file names than before but still not as long as I need.
find ./config/pages -name '*.json' -print0 | xargs -0I % sh -c 'file=%; echo Linting: $file; jsonlint -V ./config/schema.json -q $file;'
On BSD like systems (e.g. Mac OS X)
If you happen to be on a Mac, FreeBSD, etc., your xargs implementation may support the -J option, which does not suffer from the argument size limit imposed on -I.
Excerpt from the man page:
-J replstr
If this option is specified, xargs will use the data read from standard input to replace the first occurrence of replstr instead of appending that data after all other arguments. This option will not affect how many arguments will be read from input (-n), or the size of the command(s) xargs will generate (-s). The option just moves where those arguments will be placed in the command(s) that are executed. The replstr must show up as a distinct argument to xargs. It will not be recognized if, for instance, it is in the middle of a quoted string. Furthermore, only the first occurrence of the replstr will be replaced. For example, the following command will copy the list of files and directories which start with an uppercase letter in the current directory to destdir:
/bin/ls -1d [A-Z]* | xargs -J % cp -Rp % destdir
If you need to refer to the replstr multiple times (TL;DR of the excerpt above: -J only replaces the first occurrence), you can use this pattern:
echo hi | xargs -J{} sh -c 'arg=$0; echo "$arg $arg"' "{}"
=> hi hi
POSIX compliant method
The POSIX-compliant method would be to use some other tool, e.g. sed, to construct the code you want to execute and then use xargs just to specify the utility. When no replacement string is used in xargs, the 255-byte limit does not apply (see the xargs POSIX spec).
find . -type f -name '*.json' -print |
sed "s_^_-c 'file=\\\"_g;s_\$_\\\"; echo \\\"Definitely over 255 byte script..$(printf "a%.0s" {1..255}): \\\$file\\\"; wc -l \\\"\\\$file\\\"'_g" |
xargs -L1 sh
This of course largely defeats the purpose of xargs to begin with, but it can still be used to get, for example, parallel execution with xargs -L1 -P10 sh, which is quite widely supported, though not POSIX.
Use -exec in find instead of piping to xargs.
find ./config/pages -name '*.json' -exec echo Linting: {} \; -exec jsonlint -V ./config/schema.json -q {} \;
The limit on xargs's command line length is imposed by the system (not an environment) variable ARG_MAX. You can check it like:
$ getconf ARG_MAX
2097152
Surprisingly, there does not seem to be a way to change it, barring kernel modification.
Even more surprising, xargs is capped by default at a much lower value, which you can raise with the -s option. Still, you cannot pass ARG_MAX itself to -s: according to man xargs you need to subtract the size of the environment plus some "headroom" (I have no idea why). To find the actual number, use the following command (alternatively, passing an arbitrarily large number to -s results in a descriptive error):
$ xargs --show-limits 2>&1 | grep "limit on argument length (this system)"
POSIX upper limit on argument length (this system): 2092120
So you need to run … | xargs -s 2092120 …, e.g. with your command:
find ./config/pages -name '*.json' -print0 | xargs -s 2092120 -0I % sh -c 'echo Linting: %; jsonlint -V ./config/schema.json -q %;'
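Another way around the replacement-size limit is to avoid -I altogether and hand each file name to sh as a positional parameter, so the script text never grows with the path. This is a minimal sketch under assumptions: printf stands in for jsonlint, and the temporary directory with its deliberately long path is made up for the demonstration.

```shell
work=$(mktemp -d)
name=$(printf '%0150d' 0)          # a 150-character name component
mkdir -p "$work/$name"
touch "$work/$name/$name.json"     # the full path is well over 255 bytes

# The script text is fixed; each file name arrives as "$1" (the trailing
# sh fills $0, which would otherwise swallow the file name).
out=$(find "$work" -name '*.json' -print0 |
    xargs -0 -n 1 sh -c 'printf "Linting: %s\n" "$1"' sh)

printf '%s\n' "$out"
rm -rf "$work"
```

Because the file name is passed as an argument and quoted as "$1", this also copes with spaces and quotes in file names, which the % substitution inside the script text does not.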
I want to do something on the lines of:
find -name *.mk | xargs "for i in $@ do mv i i.aside end"
I realize that there might be more than one error in this, but I'd specifically like to know about this sort of inline command definition that I can pass to xargs.
This particular command isn't a great example, but you can use an "inline shell script" by giving sh -c 'here is the script' as the command. You can also give it arguments, which will be "$@" inside the script, but there's a catch: the first argument after 'here is the script' goes to $0 inside the script, so you have to put an extra word there or you'll lose the first argument.
find . -name '*.mk' -exec sh -c 'for i; do mv "$i" "$i.aside"; done' fnord '{}' +
Another fun feature I took advantage of there is the fact that for loops iterate over the command line arguments by default: for i; do ... is equivalent to for i in "$@"; do ...
I reiterate, the above command is convoluted and slow compared to the many other methods of doing the bulk mv. I'm posting it only to show some cool syntax.
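The $0 catch is easy to check in isolation. In this small sketch the word placeholder plays the role of fnord above, and a, b, c are arbitrary arguments:

```shell
# Without a placeholder, the first argument lands in $0 and "$@" misses it.
lost=$(sh -c 'echo "$@"' a b c)

# With a placeholder word, all three arguments show up in "$@".
kept=$(sh -c 'echo "$@"' placeholder a b c)

# "for i" with no "in" clause walks the positional parameters.
looped=$(sh -c 'for i; do printf "[%s]" "$i"; done' placeholder a b c)

printf '%s\n' "$lost" "$kept" "$looped"
```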
There's no need for xargs here:
find . -name '*.mk' -exec mv {} {}.aside \;
I'm not sure what the semantics of your for loop should be, but blindly coding it would give something like this:
find . -name '*.mk' | while IFS= read -r file
do
for i in "$file"; do mv "$i" "$i.aside"; done
done
If the body is used in multiple places, you can also use bash functions.
Some versions of find require a path argument: use . for the current directory.
The star * must be quoted or escaped so that the shell does not expand it.
You can prefix the command with echo first to check what it will do:
find . -name '*.mk' -print0 | xargs -0i sh -c "echo mv '{}' '{}.aside'"
man xargs
/-i
man sh
/-c
I'm certain you could do this in a nice manner, but since you requested xargs:
find . -name "*.mk" | xargs -I% mv % %.aside
Looping over filenames makes no sense, since you can only rename one at a time. The inline ugliness is not necessary, but I could not make it work with the pipe and either eval or bash -c.
Can someone show me how to use xargs properly? Or, if not xargs, what Unix command should I use?
I basically want to pass more than one file name as the <localfile> input parameter.
For example:
1. use `find` to get list of files
2. use each filename as input to shell script
Usage of shell script:
test.sh <localdir> <localfile> <projectname>
My attempt, but not working:
find /share1/test -name '*.dat' | xargs ./test.sh /staging/data/project/ '{}' projectZ \;
Edit:
After some input from everybody and trying -exec, I am finding that my <localfile> filename input with find is also giving me the full path. /path/filename.dat instead of filename.dat. Is there a way to get the basename from find? I think this will have to be a separate question.
I'd just use find -exec here:
% find /share1/test -name '*.dat' -exec ./test.sh /staging/data/project/ {} projectZ \;
This will invoke ./test.sh with your three arguments once for each .dat file under /share1/test.
xargs would pack up all of these filenames and pass them into one invocation of ./test.sh, which doesn't look like your desired behaviour.
If you want to execute the shell script for each file (as opposed to executing it only once on the whole list of files), you may want to use find -exec:
find /share1/test -name '*.dat' -exec ./test.sh /staging/data/project/ '{}' projectZ \;
Remember:
find -exec is for when you want to run a command on one file, for each file.
xargs instead runs a command only once, using all the files as arguments.
xargs stuffs as many files as it can onto the end of the command line.
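The difference is easy to observe by counting invocations. A minimal sketch, using echo as a stand-in for ./test.sh and a temporary directory with three made-up .dat files:

```shell
work=$(mktemp -d)
touch "$work/a.dat" "$work/b.dat" "$work/c.dat"

# -exec ... \; starts one command per file: three lines of output.
per_file=$(find "$work" -name '*.dat' -exec echo run: {} \; | wc -l | tr -d ' ')

# xargs appends all names to a single command: one line of output.
batched=$(find "$work" -name '*.dat' | xargs echo run: | wc -l | tr -d ' ')

echo "per file: $per_file, batched: $batched"
rm -rf "$work"
```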
Do you want to execute the script on one file at a time or on all files at once? For one at a time, use find's -exec, whose syntax it looks like you're already using, and which xargs doesn't understand:
find /share1/test -name '*.dat' -exec ./test.sh /staging/data/project/ '{}' projectZ \;
xargs does not have to combine arguments; that is just the default behavior. The following uses xargs properly to execute the command once per file, as intended.
find /share1/test -name '*.dat' -print0 | xargs -0 -I'{}' ./test.sh /staging/data/project/ '{}' projectZ
When piping find to xargs, NUL termination is usually preferred, so I recommend adding the -print0 option to find. You must then add -0 to xargs so that it expects NUL-terminated arguments. This ensures proper handling of file names containing spaces or newlines. It is not POSIX, but it is widely supported. You can always drop the NUL-terminating options if your commands lack support.
Remember that while find's purpose is finding files, xargs is much more generic. I often use xargs to process non-filename arguments.
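On the basename question from the edit: one possible approach, sketched here under assumptions, is to wrap the call in sh -c and strip the directory part with basename. Here echo stands in for ./test.sh, and the two .dat files live in a temporary directory created for the demonstration.

```shell
work=$(mktemp -d)
touch "$work/one.dat" "$work/two.dat"

# "$1" is the full path handed over by xargs; basename strips the directory.
out=$(find "$work" -name '*.dat' -print0 |
    xargs -0 -n 1 sh -c 'echo /staging/data/project/ "$(basename "$1")" projectZ' sh |
    sort)

printf '%s\n' "$out"
rm -rf "$work"
```

With GNU find, -printf '%f\n' prints the base name directly, but that is a GNU extension.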
I have thousands of PNG files that I would like to make smaller with pngcrush. I have a simple find .. -exec job, but it's sequential. My machine has plenty of resources, so I'd like to run this in parallel.
The operation to be performed on every png is:
pngcrush input output && mv output input
Ideally I can specify the maximum number of parallel operations.
Is there a way to do this with bash and/or other shell helpers? I'm on Ubuntu or Debian.
You can use xargs to run multiple processes in parallel:
find /path -type f -name '*.png' -print0 | xargs -0 -n 1 -P <nr_procs> sh -c 'pngcrush "$1" "temp.$$" && mv "temp.$$" "$1"' sh
xargs will read the list of files produced by find (separated by NUL characters, hence -0) and run the provided command (sh -c '...' sh) with one parameter at a time (-n 1). xargs will run up to <nr_procs> processes in parallel (-P <nr_procs>).
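The same pattern can be tried safely before pointing it at real images. In this minimal sketch cp stands in for pngcrush, the four .png files are dummy text files in a temporary directory, and -P 2 is an arbitrary job limit:

```shell
work=$(mktemp -d)
for n in 1 2 3 4; do printf 'data %s\n' "$n" > "$work/img$n.png"; done

# cp stands in for pngcrush: write a temporary output, then replace the input.
# "$1.tmp" sits next to each input, so parallel jobs never share a temp file.
find "$work" -type f -name '*.png' -print0 |
    xargs -0 -n 1 -P 2 sh -c 'cp "$1" "$1.tmp" && mv "$1.tmp" "$1"' sh

remaining=$(find "$work" -type f | wc -l | tr -d ' ')
echo "files after processing: $remaining"
rm -rf "$work"
```

Naming the temporary file after the input is a design choice for the sketch; the temp.$$ form above also works, because each sh invocation has its own process ID.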
You can use custom find/xargs solutions (see Bart Sas' answer), but when things become more complex you have -at least- two powerful options:
parallel (from package moreutils)
GNU parallel
With GNU Parallel http://www.gnu.org/software/parallel/ it can be done like:
find /path -print0 | parallel -0 pngcrush {} {.}.temp '&&' mv {.}.temp {}
Learn more:
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line
will love you for it.