Using find and xargs how can I stop execution on errors without crapping out - bash

In my script I have the following 3 commands
Basically what it is trying to do is:
create a symlink to a certain bunch of files based on their filenames, in a temp directory.
change the name of the symlink to match the current date
move the symlinks from a temp directory to their proper location
-
find . -type f -name "*${regex}-*" -exec ln -s {} "${DataTempPath}/"{} \;
find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs -n2 mv
mv $DataTempPath/* $DataSetPath
This will be inserted as a cron job to run every 15 mins, which is not a problem when the source directory contains valid data.
However when it doesn't contain any files I get errors on the second find command and the mv command
What I want I guess is a way of not executing the last two lines of the script if the first one does not create any new links

GNU xargs supports a --no-run-if-empty option (short form -r). To quote the documentation: "If the standard input is completely empty, do not run the command. By default, the command is run once even if there is no input".
This should help avoid the xargs error (assuming you are running GNU xargs).
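A quick way to see the effect, assuming GNU xargs (the variable names are only for illustration):

```shell
# Empty input: with -r the command is not run at all, so nothing is printed.
empty_run=$(printf '' | xargs -r echo "ran")

# Non-empty input: -r changes nothing, the command runs as usual.
nonempty_run=$(printf 'a b\n' | xargs -r echo "ran")

echo "empty input:     '${empty_run}'"
echo "non-empty input: '${nonempty_run}'"
```

In the script from the question this would mean writing xargs -r -n2 mv in the second command.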

Check the exit status of the first command before running the others:
find . -type f -name "*${regex}-*" -exec ln -s {} "${DataTempPath}/"{} \;
if [[ $? == 0 ]]; then
    find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs -n2 mv
    mv "$DataTempPath"/* "$DataSetPath"
fi
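One caveat: find normally exits with status 0 even when it matched nothing, so the status check alone may not skip the later steps. A minimal sketch of an alternative guard that tests whether any symlink actually exists in the temp directory. The two directories are temporary stand-ins for the question's variables, has_links and move_links are hypothetical helper names, and -quit assumes GNU (or BSD) find:

```shell
#!/bin/bash
# Hypothetical stand-ins for the directories used in the question.
DataTempPath=$(mktemp -d)
DataSetPath=$(mktemp -d)

# Succeeds only if directory $1 directly contains at least one symlink.
has_links() {
    [ -n "$(find "$1" -maxdepth 1 -type l -print -quit)" ]
}

# Moves the links only when there is something to move.
move_links() {
    if has_links "$DataTempPath"; then
        mv "$DataTempPath"/* "$DataSetPath"/
        echo "moved"
    else
        echo "skipped"
    fi
}

first_run=$(move_links)                           # temp directory is still empty
ln -s /dev/null "$DataTempPath/A20240101-sample"  # simulate the first find creating a link
second_run=$(move_links)
echo "first run: $first_run, second run: $second_run"
```

The same has_links test could wrap the rename step as well, so that neither of the last two commands runs on an empty directory.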

Related

For loop, wildcard and conditional statement

I don't really know what am I supposed to do with it.
For each file in the /etc directory whose name starts with o or l and whose second letter is t or r, display its name, size and type ('file'/'directory'/'link'). Use: a wildcard, a for loop and a conditional statement for the type.
#!/bin/bash
etc_dir=$(ls -a /etc/ | grep '^o|^l|^.t|^.r')
for file in $etc_dir
do
stat -c '%s-%n' "$file"
done
I was thinking about something like that, but I have to use an if statement.
You may reach the goal by using the find command.
This will search through all subdirectories.
#!/bin/bash
_dir='/etc'
find "${_dir}" -name "[ol][tr]*" -exec stat -c '%s-%n' {} \; 2>/dev/null
To control whether subdirectories are searched, you may use the -maxdepth flag. The example below looks only at the files and directories directly inside /etc and does not descend into subdirectories.
#!/bin/bash
_dir='/etc'
find "${_dir}" -maxdepth 1 -name "[ol][tr]*" -exec stat -c '%s-%n' {} \; 2>/dev/null
You may also use the -type f or -type d tests to restrict the results to files or directories respectively (if needed).
#!/bin/bash
_dir='/etc'
find "${_dir}" -name "[ol][tr]*" -type f -exec stat -c '%s-%n' {} \; 2>/dev/null
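If GNU find is available, its -printf action can print the name, size and a one-letter type in a single step, without calling stat. This is only a sketch on a throwaway directory (the file names are made up); %f is the base name, %s the size in bytes and %y the type letter (f, d, l):

```shell
#!/bin/bash
demo=$(mktemp -d)
printf 'abc' > "$demo/otter"    # regular file, 3 bytes
mkdir "$demo/lrdir"             # directory
touch "$demo/oak"               # second letter is not t or r, so it is skipped

# -maxdepth comes first so that GNU find does not warn about option order.
listing=$(find "$demo" -maxdepth 1 -name '[ol][tr]*' -printf '%f %s %y\n' | sort)
echo "$listing"
```

Replacing "$demo" with /etc gives the listing for the directory in the question.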
Update #1
As requested in the comments, here is a longer way that uses a for loop and an if statement.
Note: I'd strongly recommend reviewing and practicing the commands used in this script instead of just copying and pasting them to get the score ;)
#!/bin/bash
# Set the main directory path.
_mainDir='/etc'
# Find everything under $_mainDir (ignoring errors, if any) and store the paths in the $_files variable.
_files=$(find "${_mainDir}" 2>/dev/null)
# In this for loop we will
# loop over all paths (note: word splitting breaks paths that contain whitespace),
# extract the bare file name from the full path,
# and IF the bare file name matches the pattern, run `stat` on that file and print its output.
for _file in ${_files} ;do
    _fileName=$(basename "${_file}")
    if [[ "${_fileName}" =~ ^[ol][tr].* ]] ;then
        # %n is the name, %s the size in bytes, %F the file type as text.
        stat -c 'Name: %n , Size: %s , Type: %F' "${_file}"
    fi
done
exit 0
You should break down your problem into multiple pieces and tackle them one by one.
First, try to build an expression that finds the right files. If you were to run your grep expression in a shell:
ls -a /etc/ | grep '^o|^l|^.t|^.r'
You would immediately see that you don't get the right output. So the first step would be to understand how grep works and fix the expression to:
ls -a /etc/ | grep '^[ol][tr]'
Then, you have the file name, and you need the size and a textual file type. The size is easy to obtain using a stat call.
But you soon realize that stat does not print the type in the form the exercise asks for (GNU stat's %F format gives 'regular file', 'directory' or 'symbolic link'), so you probably have to use an if clause to present that.
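Putting those pieces together, a minimal sketch that uses a wildcard, a for loop and an if statement might look like this. The function name list_matching is hypothetical, and stat -c assumes GNU coreutils:

```shell
#!/bin/bash
# Prints "name size type" for every entry in directory $1 whose name
# starts with o or l followed by t or r.
list_matching() {
    local dir=$1 entry kind
    for entry in "$dir"/[ol][tr]*; do
        # If the glob matched nothing it stays literal; skip that case.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        if [ -L "$entry" ]; then        # test links first: -d and -f follow them
            kind=link
        elif [ -d "$entry" ]; then
            kind=directory
        else
            kind=file
        fi
        printf '%s %s %s\n' "${entry##*/}" "$(stat -c %s "$entry")" "$kind"
    done
}

list_matching /etc
```

Checking for a link before the other tests matters, because [ -d ] and [ -f ] look at the target of a symlink rather than at the link itself.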
How about this:
shopt -s extglob
ls -dp /etc/@(o|l)@(t|r)* | grep -v '/$'
Explanation:
shopt extglob - enable extended globbing (https://www.google.com/search?q=bash+extglob)
ls -d - list directory names, not their contents
ls -dp - and add / at the end of each directory name
@(o|l)@(t|r) - o or l once (@), and then t or r once
grep -v '/$' - remove all lines containing / at the end
Of course, Vab's find solution is better than this ls:
find /etc -maxdepth 1 -name "[ol][tr]*" -type f -exec stat {} \;

How to cd into grep output?

I have a shell script which basically searches all folders inside a location and I use grep to find the exact folder I want to target.
for dir in /root/*; do
grep "Apples" "${dir}"/*.* || continue
While grep successfully finds my target directory, I'm stuck on how I can move the folders I want to move in my target directory. An idea I had was to cd into grep output but that's where I got stuck. Tried some Google results, none helped with my case.
Example grep output: Binary file /root/ant/containers/secret/Documents/2FD412E0/file.extension matches
I want to cd into 2FD412E0 and move two folders inside that directory.
dirname is the key to that:
cd $(dirname $(grep "...." ...))
will let you enter the directory.
As people mentioned, dirname is the right tool to strip off the file name from the path.
I would use find for this kind of task:
while IFS= read -r file
do
    target_dir=$(dirname "$file")
    # do something with "$target_dir"
done < <(find /root/ -type f \
    -exec grep "Apples" --files-with-matches {} \;)
Consider using find's -maxdepth option. See the man page for find.
Well, there is actually a simpler solution :) I just like to write bash scripts. You might simply use a single find command like this:
find /root/ -type f -exec grep Apples {} ';' -exec ls -l {} ';'
Note the second -exec. It will be executed, if the previous -exec command exited with status 0 (success). From the man page:
-exec command ;
Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of ; is encountered. The string {} is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find.
Replace the ls -l command with your stuff.
And if you want to execute dirname within the -exec command, you may do the following trick:
find /root/ -type f -exec grep -q Apples {} ';' \
    -exec sh -c 'cd "$(dirname "$0")" && pwd' {} ';'
Replace pwd with your stuff.
When find is not available
In the comments you write that find is not available on your system. The following solution works without find:
grep -R --files-with-matches Apples "${dir}" | while IFS= read -r file
do
    target_dir=$(dirname "$file")
    # do something with "$target_dir"
    echo "$target_dir"
done
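As a self-contained illustration of the grep plus dirname idea, here is a sketch that runs on a throwaway directory tree instead of /root. The helper name dirs_with_match and the sample paths are made up, and it assumes file names without newlines:

```shell
#!/bin/bash
# Prints the directory of every file under $2 that contains the string $1.
dirs_with_match() {
    grep -rl -- "$1" "$2" | while IFS= read -r file; do
        dirname -- "$file"
    done | sort -u
}

# Small sample tree so the sketch can be run anywhere.
root=$(mktemp -d)
mkdir -p "$root/ant/2FD412E0" "$root/bee/0000"
echo "Apples" > "$root/ant/2FD412E0/file.extension"
echo "Pears"  > "$root/bee/0000/file.extension"

target=$(dirs_with_match "Apples" "$root")
echo "target directory: $target"

# cd inside a subshell so the caller's working directory is left alone.
( cd "$target" && ls )
```

Using grep -l (list file names only) avoids having to parse messages such as "Binary file ... matches".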

Execute loop on bash thru keywords

I have a script that searches my mail files and, if a keyword is found, moves the matching files to another location.
How can I make it work for multiple keywords? For example, I would have 11 KEYs and would not want to copy and paste the find command over and over.
DIRF='move/from'
DIRT='move/to'
KEY='discount'
find $DIRF -type f -exec grep -ilR "$KEY" {} \; | xargs -I % mv % $DIRT
Why are you using find here at all?
You are already telling grep to operate recursively (-R), so just point it at $DIRF and be done. -R is also pointless if you only ever give it files (from -type f).
Also grep takes a pattern that can do alternation. Just use that.
grep -RilE 'KEY1|KEY2|KEY3|Key4' "$DIRF"
for KEY in "discount" "other_value" "other_value2"
do
    find "$DIRF" -type f -exec grep -ilR "$KEY" {} \; | xargs -I % mv % "$DIRT"
done
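A minimal sketch of the single-grep approach, run on temporary stand-ins for move/from and move/to (the sample mails and keywords are made up). It assumes GNU grep, xargs and mv, since -Z, -0, -r and -t are GNU options:

```shell
#!/bin/bash
DIRF=$(mktemp -d)   # stand-in for move/from
DIRT=$(mktemp -d)   # stand-in for move/to
printf 'Big DISCOUNT today\n' > "$DIRF/mail1"
printf 'your invoice\n'       > "$DIRF/mail2"
printf 'special offer\n'      > "$DIRF/mail3"

# One -e per keyword; extend this list instead of repeating the command.
set -- -e 'discount' -e 'offer'

# -r recurse, -i ignore case, -l list matching file names, -Z end each
# name with NUL, -F treat the keywords as fixed strings, not regexes.
grep -rilZF "$@" "$DIRF" | xargs -0 -r mv -t "$DIRT"

ls "$DIRT"
```

Each file is listed once even if it contains several keywords, so mv never sees the same name twice, which the per-keyword loop cannot guarantee.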

how to grep large number of files?

I am trying to grep 40k files in the current directory and I am getting this error.
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
Thanks
Upendra
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is not like any other programming language because much of the interpretation of lines comes from the power of the shell interpolating them before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
echo The text files are *.txt
> echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
$
The set -x allows you to see how the shell actually interpolates the glob and then passes that back to the command as input. The > points to the line that is actually being executed by the command.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Then and only then does the echo command actually execute.
When you have 40K plus files, and you do grep *, you're expanding that * to the names of those 40,000 plus files before grep even has a chance to execute, and that's where the error message /usr/bin/grep: Argument list too long is coming from.
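The expansion step can be made visible without set -x by counting the arguments the shell produces from a glob. This is a small sketch on a throwaway directory; getconf ARG_MAX reports the system limit on the combined size of arguments and environment that the error message refers to:

```shell
#!/bin/bash
demo=$(mktemp -d)
touch "$demo/bar.txt" "$demo/foo.txt" "$demo/fubar.log"
cd "$demo" || exit 1

# The shell, not the command, expands the glob. Store the result in the
# positional parameters and count them.
set -- *.txt
txt_count=$#
echo "the shell turned *.txt into $txt_count arguments: $*"

# Upper bound (in bytes) for the arguments plus environment of a new process.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"
```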
Fortunately, Unix has a way around this dilemma:
$ find . -name "*.kaks" -type f -maxdepth 1 | xargs grep -f A01/genes.txt
The find . -name "*.kaks" -type f -maxdepth 1 will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory. The -type f makes sure you only pick up files and not a directory.
The find command pipes the names of the files into xargs, and xargs will append the names of the files to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve. It knows how long the command line buffer is, and will execute the grep when the command line buffer is full, then pass in another series of files to the grep. This way, grep gets executed maybe three or ten times (depending upon the size of the command line buffer), and all of our files are used.
Unfortunately, xargs uses whitespace as a separator for the file names. If your files contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -name "*.kaks" -type f -maxdepth 1 -print0 | xargs -0 grep -f A01/genes.txt
The -print0 will cause find to print out the names of the files not separated by newlines, but by the NUL character. The -0 parameter for xargs tells xargs that the file separator isn't whitespace, but the NUL character. This fixes the issue.
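The difference is easy to reproduce with one file name that contains a space. This is a sketch on a throwaway directory, with made-up file names and search string; it assumes the temporary path itself contains no whitespace:

```shell
#!/bin/bash
work=$(mktemp -d)
printf 'needle\n' > "$work/with space.kaks"
printf 'needle\n' > "$work/plain.kaks"

# Default splitting: xargs breaks "with space.kaks" into two bogus names,
# so grep only finds the file without a space (errors are discarded).
plain_hits=$(find "$work" -name '*.kaks' -type f | xargs grep -l needle 2>/dev/null | wc -l)

# NUL-separated names survive intact, so both files are searched.
nul_hits=$(find "$work" -name '*.kaks' -type f -print0 | xargs -0 grep -l needle | wc -l)

echo "whitespace-separated: $plain_hits file(s), NUL-separated: $nul_hits file(s)"
```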
You could also do this too:
$ find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This runs grep once for each and every file found, whereas xargs runs grep only as many times as it takes to fit all the file names on the command line. The advantage of this is that it avoids shell interference entirely. However, it may or may not be less efficient.
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
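A third variant worth timing, not mentioned above, is find's "{} +" terminator, which batches file names into as few grep invocations as possible, much like xargs but without a pipe. A minimal sketch on made-up sample data (the .kaks contents and gene names are invented for the demonstration):

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
mkdir A01
printf 'geneA\ngeneB\n' > A01/genes.txt
printf 'geneA 0.12\n' > one.kaks
printf 'geneC 0.50\n' > two.kaks
printf 'geneB 0.33\n' > three.kaks

# "{} +" appends as many file names as fit to a single grep call.
# -H always prints the file name, -F treats each line of genes.txt as a
# fixed string, -f reads the patterns from the file.
find . -maxdepth 1 -type f -name '*.kaks' \
    -exec grep -HFf A01/genes.txt {} + | sort > A01/A01.result.txt

cat A01/A01.result.txt
```

Prefixing the find command with time, as suggested above, allows a direct comparison with the \; and xargs versions.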
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
You can use the recursive feature of grep:
for i in $(cat A01/genes.txt); do
grep -r $i .
done > A01/A01.result.txt
Though if you want to select only kaks files:
for i in $(cat A01/genes.txt); do
    find . -iregex '.*\.kaks$' -exec grep "$i" {} \;
done > A01/A01.result.txt
Put another for loop inside your outer one:
for f in *.kaks; do
grep -H $i "$f"
done
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string exists in there one or more times? If it is "good enough" to know the string occurs in there one or more times, you can specify "-m 1" to grep and it will not bother reading/searching the rest of the file after finding the first match, which could potentially save lots of time.
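For reference, the grep option that stops after a given number of matching lines is -m NUM (--max-count); -n only adds line numbers. A small sketch with a made-up sample file:

```shell
#!/bin/bash
sample=$(mktemp)
printf 'geneA first\ngeneB\ngeneA second\n' > "$sample"

# Without -m, grep reads the whole file and counts every matching line.
all_hits=$(grep -c 'geneA' "$sample")

# With -m 1, grep stops reading after the first matching line.
first_only=$(grep -m 1 'geneA' "$sample")

echo "matching lines in total: $all_hits"
echo "first match only:        $first_only"
```

If only the names of the matching files matter, grep -l has the same early-exit behaviour per file.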
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.“, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/

Modifying replace string in xargs

When I am using xargs sometimes I do not need to explicitly use the replacing string:
find . -name "*.txt" | xargs rm -rf
In other cases, I want to specify the replacing string in order to do things like:
find . -name "*.txt" | xargs -I '{}' mv '{}' /foo/'{}'.bar
The previous command would move all the text files under the current directory into /foo and it will append the extension bar to all the files.
If instead of appending some text to the replace string, I wanted to modify that string such that I could insert some text between the name and extension of the files, how could I do that? For instance, let's say I want to do the same as in the previous example, but the files should be renamed/moved from <name>.txt to /foo/<name>.bar.txt (instead of /foo/<name>.txt.bar).
UPDATE: I managed to find a solution:
find . -name "*.txt" | xargs -I{} \
sh -c 'base=$(basename $1) ; name=${base%.*} ; ext=${base##*.} ; \
mv "$1" "foo/${name}.bar.${ext}"' -- {}
But I wonder if there is a shorter/better solution.
The following command constructs the move commands with xargs, replaces the second occurrence of '.' with '.bar.', then executes the commands with bash. It works on Mac OS X.
ls *.txt | xargs -I {} echo mv {} foo/{} | sed 's/\./.bar./2' | bash
It is possible to do this in one pass (tested in GNU) avoiding the use of the temporary variable assignments
find . -name "*.txt" | xargs -I{} sh -c 'mv "$1" "foo/$(basename ${1%.*}).new.${1##*.}"' -- {}
In cases like this, a while loop would be more readable:
find . -name "*.txt" | while IFS= read -r pathname; do
base=$(basename "$pathname"); name=${base%.*}; ext=${base##*.}
mv "$pathname" "foo/${name}.bar.${ext}"
done
Note that you may find files with the same name in different subdirectories. Are you OK with duplicates being over-written by mv?
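The name-splitting logic can also be pulled into a small function, which makes edge cases such as files without an extension easier to handle. This is only a sketch: insert_tag is a hypothetical helper name, the directories are temporary stand-ins, and mv -n (do not overwrite) is one possible answer to the duplicate question:

```shell
#!/bin/bash
# Prints the base name of path $1 with tag $2 inserted before the extension,
# e.g. ./sub/notes.txt + bar -> notes.bar.txt
insert_tag() {
    base=${1##*/}                   # strip the directory part
    case $base in
        ?*.*) printf '%s.%s.%s\n' "${base%.*}" "$2" "${base##*.}" ;;
        *)    printf '%s.%s\n' "$base" "$2" ;;    # no extension: just append
    esac
}

src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/notes.txt"

find "$src" -name '*.txt' -type f | while IFS= read -r pathname; do
    # -n refuses to overwrite an existing target with the same name.
    mv -n -- "$pathname" "$dest/$(insert_tag "$pathname" bar)"
done

ls "$dest"
```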
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
find . -name "*.txt" | parallel 'ext={/} ; mv -- {} foo/{/.}.bar."${ext##*.}"'
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
If you're allowed to use something other than bash/sh, AND this is just for a fancy "mv"... you might try the venerable "rename.pl" script. I use it on Linux and cygwin on windows all the time.
http://people.sc.fsu.edu/~jburkardt/pl_src/rename/rename.html
rename.pl 's/^(.*?)\.(.*)$/\1-new_stuff_here.\2/' list_of_files_or_glob
You can also use a "-p" parameter to rename.pl to have it tell you what it WOULD HAVE DONE, without actually doing it.
I just tried the following in my c:/bin (cygwin/windows environment). I used the "-p" so it spit out what it would have done. This example just splits the base and extension, and adds a string in between them.
perl c:/bin/rename.pl -p 's/^(.*?)\.(.*)$/\1-new_stuff_here.\2/' *.bat
rename "here.bat" => "here-new_stuff_here.bat"
rename "htmldecode.bat" => "htmldecode-new_stuff_here.bat"
rename "htmlencode.bat" => "htmlencode-new_stuff_here.bat"
rename "sdiff.bat" => "sdiff-new_stuff_here.bat"
rename "widvars.bat" => "widvars-new_stuff_here.bat"
the files should be renamed/moved from <name>.txt to /foo/<name>.bar.txt
You can use rename utility, e.g.:
rename s/\.txt$/\.txt\.bar/g *.txt
Hint: The substitution syntax is similar to sed or vim.
Then move the files to some target directory by using mv:
mkdir /some/path
mv *.bar /some/path
To rename files into subdirectories based on some part of their name, check for:
-p/--mkpath/--make-dirs Create any non-existent directories in the target path.
Testing:
$ touch {1..5}.txt
$ rename --dry-run "s/.txt$/.txt.bar/g" *.txt
'1.txt' would be renamed to '1.txt.bar'
'2.txt' would be renamed to '2.txt.bar'
'3.txt' would be renamed to '3.txt.bar'
'4.txt' would be renamed to '4.txt.bar'
'5.txt' would be renamed to '5.txt.bar'
Adding to that, the Wikipedia article is surprisingly informative.
For example:
Shell trick
Another way to achieve a similar effect is to use a shell as the launched command, and deal with the complexity in that shell, for example:
$ mkdir ~/backups
$ find /path -type f -name '*~' -print0 | xargs -0 bash -c 'for filename; do cp -a "$filename" ~/backups; done' bash
Inspired by an answer by @justaname above, this command, which incorporates a Perl one-liner, will do it:
find ./ -name \*.txt | perl -p -e 's/^(.*\/(.*)\.txt)$/mv $1 .\/foo\/$2.bar.txt/' | bash
