xargs with multiple commands only working on some files - bash

I'm trying (starting with my MacBook) to get a list of all image files matching the specification in the line below, along with their size and SHA-512. I'm doing this to audit the tens of thousands of such files I have spread over multiple systems.
sudo find /Users \( -iname '*.JPG' -or -iname '*.NEF' -or -iname '*.PNG' \
  -or -iname '*.RAF' -or -iname '*.PW2' -or -iname '*.DNG' \) -type f -and \
  -size +10000k -print0 | xargs -0 -I ## \
  /bin/bash -c '{ stat -n -f"MACBOOK %z " "##" && shasum -p -a 512 "##"; }'
When run, this correctly produces the output I want for some of the files; for example, I get:
MACBOOK 32465640 <SHA512-REDACTED> ?/Users/<REDACTED>/Pictures/Pendleton Roundup/2018/2018-09-13/_DSC3955.NEF
But for some of the files, the ## replacement doesn't seem to work properly, and instead I get:
MACBOOK 28130793 shasum: ##:
If I add a -v flag to the bash line to print out the commands being executed, when it goes wrong I see this:
{ stat -n -f"MACBOOK %z " "/Users/<REDACTED>/Pictures/Photos Library D750.photoslibrary/Masters/2018/07/29/20180729-223141/DSC_3274.NEF"; shasum -p -a 512 "##"; }
If I manually run that line with the ## replaced with the filename, it works as expected, so it seems that the -I ## parameter to xargs is somehow not always working, and I'm at a loss as to what the cause might be.
Can anyone help me find a fix for this? I've tried putting the ## in quotes and tried different patterns, but it's always the same issue.

Consider:
find_args=( -false )
for type in jpg nef png raf pw2 dng; do
  find_args+=( -o -iname "*.$type" )
done
sudo find /Users '(' "${find_args[@]}" ')' \
  -type f \
  -size +10000k \
  -exec sh -c '
    for arg; do
      stat -n -f"MACBOOK %z " "$arg"
      shasum -p -a 512 "$arg"
    done' _ {} +
Using -exec ... {} + lets find invoke only one copy of sh per batch of files (as many as will fit on a command line on your local platform).
Even more importantly, not using {} (or ##) inside the sh -c argument text avoids command injection vulnerabilities: with the original code, a malicious filename could run arbitrary commands, which is especially important when you're running under sudo, since those commands would be executed as root!
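To make the risk concrete, here is a hypothetical demonstration (the filename and /tmp paths are invented for illustration):
printf '%s\0' '/tmp/$(touch /tmp/HACKED).JPG' |
  xargs -0 -I ## /bin/bash -c 'stat -n -f"MACBOOK %z " "##"'
# xargs pastes the name into the script text *before* bash parses it, so bash
# sees stat -n -f"MACBOOK %z " "/tmp/$(touch /tmp/HACKED).JPG" and runs the
# embedded touch: /tmp/HACKED appears even though no such command was typed.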

The problem isn't that you were using xargs. It's that your find was run with sudo but any process receiving your piped or redirected output was not run with sudo, so your permissions during the find do not match your permissions during the subsequent xargs execution.
So, for example, instead of running:
sudo ls -al >> list.txt
you should instead run the entire pipeline of commands with sudo, as follows:
sudo sh -c 'ls -al >> list.txt'
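Applied to the pipeline in the question, that would mean something like the following sketch (the find tests and xargs command are elided as ...):
sudo sh -c 'find /Users ... -print0 | xargs -0 ...'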

UPDATE: NOT RECOMMENDED - see comment below.
I seem to have a variant that works now without xargs.
sudo find /Users \( -iname '*.JPG' -or -iname '*.NEF' -or -iname '*.PNG' -or -iname '*.RAF' \
  -or -iname '*.PW2' -or -iname '*.DNG' \) -type f -and -size +10000k \
  -exec sh -c '{ stat -n -f"MACBOOK %z " "{}"; shasum -p -a 512 "{}"; }' {} \;

Related

Wildcard within if conditional of Bash fails to execute. Works when literal filename provided

I am trying to execute a command depending on the file type within a directory, but I am unable to check the contents of the directory using a wildcard. When provided a literal filename, I am able to execute it.
find ./* -type d -execdir bash -c 'DIR=$(basename {}); if [[ -e {}/*.png ]]; then echo "img2pdf {}/*.png -o $DIR.pdf"; fi ' \;
Instead of going over directories and then looking for PNGs inside, find can find the PNGs straight away:
find . -name '*.png'
Then you can process them as you do now, or with xargs:
find . -name '*.png' | xargs -I '{}' img2pdf '{}' -o '{}.pdf'
The command above will convert each PNG to a separate PDF.
If you want to pass all the PNGs at once and call img2pdf only once:
find . -name '*.png' | xargs img2pdf -o out.pdf
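Note that a plain pipe into xargs mishandles filenames containing spaces or quotes. If that matters here, the null-delimited forms should behave the same way (a sketch using find's -print0 and xargs -0):
find . -name '*.png' -print0 | xargs -0 -I '{}' img2pdf '{}' -o '{}.pdf'
find . -name '*.png' -print0 | xargs -0 img2pdf -o out.pdf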

Bash: Find command with multiple -name variable [duplicate]

I have a find command that finds files whose names match any of several patterns given with the -name parameter:
find -L . \( -name "SystemOut*.log" -o -name "*.out" -o -name "*.log" -o -name "javacore*.*" \)
This finds the required files successfully at the command line. What I am looking for is to use this command in a shell script and join it with a tar command to create a tar of all the log files. So, in a script, I do the following:
LIST="-name \"SystemOut*.log\" -o -name \"*.out\" -o -name \"*.log\" -o -name \"javacore*.*\" "
find -L . \( ${LIST} \)
This does not print the files that I am looking for.
First, why is this script not functioning like the command? Once it does, can I combine it with cpio or similar to create a tar in one shot?
Looks like find fails to match * in patterns from unquoted variables. This syntax works for me (using bash arrays):
LIST=( -name \*.tar.gz )
find . "${LIST[#]}"
Your example would become the following:
LIST=( -name SystemOut\*.log -o -name \*.out -o -name \*.log -o -name javacore\*.\* )
find -L . \( "${LIST[#]}" \)
eval "find -L . \( ${LIST} \)"
You could use an eval and xargs,
eval "find -L . \( $LIST \) " | xargs tar cf 1.tar
When you have a long list of file patterns to use, you may want to try the following syntax instead:
# List of file patterns
Pat=( "SystemOut*.log"
"*.out"
"*.log"
"javacore*.*" )
# Loop through each file pattern and build a 'find' string
find $startdir \( -name $(printf -- $'\'%s\'' "${Pat[0]}") $(printf -- $'-o -name \'%s\' ' "${Pat[#]:1}") \)
That method constructs the argument list sequentially from the elements of an array, which tends to work better (at least in my recent experience).
You can use find's -exec option to pass the results to an archiving program:
find -L . \( .. \) -exec tar -rf archive.tar {} \;
LIST="-name SystemOut*.log -o -name *.out -o -name *.log -o -name javacore*.*"
The wildcards are already quoted (the whole right-hand side of the assignment is in double quotes), so you don't need to quote them again. Moreover, here
LIST="-name \"SystemOut*.log\""
the inner quotes are preserved, and find gets them as part of the argument, so the pattern will never match.
Building -name list for find command
Here is a proper way to do this:
cmd=();for p in \*.{log,tmp,bak} .debug-\*;do [ "$cmd" ] && cmd+=(-o);cmd+=(-name "$p");done
Or
cmd=()
for p in \*.{log,tmp,bak,'Spaced FileName'} {.debug,.log}-\* ;do
[ "$cmd" ] && cmd+=(-o)
cmd+=(-name "$p")
done
You could dump your cmd array:
declare -p cmd
declare -a cmd=([0]="-name" [1]="*.log" [2]="-o" [3]="-name" [4]="*.tmp" [5]="-o"
[6]="-name" [7]="*.bak" [8]="-o" [9]="-name" [10]="*.Spaced FileName"
[11]="-o" [12]="-name" [13]=".debug-*" [14]="-o" [15]="-name" [16]=".log-*")
Then you could run:
find [-L] [/path] \( "${cmd[@]}" \)
For example:
find \( "${cmd[@]}" \)
(Note: if no path is submitted, the current path . is the default.)
find /home/user/SomeDir \( "${cmd[@]}" \)
find -L /path -type f \( "${cmd[@]}" \)
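Coming back to the question's original goal of building a tar in one shot, the array composes directly with a null-delimited pipe (a sketch assuming GNU tar, which supports --null and -T):
find -L . \( "${cmd[@]}" \) -type f -print0 | tar -cf logs.tar --null -T -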

How to run a command (1000 times) that requires two different types of input files

I have calculated directed modularity by means of DirectedLouvain (https://github.com/nicolasdugue/DirectedLouvain). I am now trying to test the significance of the values obtained by means of a null model. To do this, I need to run one of the DirectedLouvain commands 1000 times, over 1000 different input files.
Following @KamilCuk's recommendations, I have used this code, which takes the 1000 *.txt input files and generates 1000 *.bin files and 1000 *.weights files. It worked perfectly:
find -type f -name '*.txt' |
while IFS= read -r file; do
  file_no_extension=${file##*/}
  file_no_extension=${file_no_extension%%.*}
  ./convert -i "$file" -o "$file_no_extension".bin -w "$file_no_extension".weights
done
Now I am trying to use another command that takes these two types of files (*.bin and *.weights) and generates *.tree files. I have tried this with no success:
find ./ -type f \( -iname \*.bin -o -iname \*.weights \) |
while IFS= read -r file; do
  file_no_extension=${file##*/}
  file_no_extension=${file_no_extension%%.*}
  ./community "$file.bin" -l -1 -w "$file.weights" > "$file_no_extension".tree
done
Any suggestions?
Find all files with that extension.
For each file:
Extract the filename without the extension.
Run the command.
So:
find -type f -name '*.ext' |
while IFS= read -r file; do
  file_no_extension=${file##*/}
  file_no_extension=${file_no_extension%%.*}
  ./convert -i "$file" -o "$file_no_extension".bin -w "$file_no_extension".weights
done
With find:
find -type f -name '*.ext' -exec sh -c 'f=$(basename "$1" .ext); ./convert -i "$1" -o "$f".bin -w "$f".weights' _ {} \;
With xargs:
find -type f -name '*.ext' |
xargs -d '\n' -n1 sh -c 'f=$(basename "$1" .ext); ./convert -i "$1" -o "$f".bin -w "$f".weights' _
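For the second step in the question, the same pattern can drive ./community by searching only for the *.bin files and deriving each matching .weights name (a sketch that assumes every .weights file sits beside its .bin):
find . -type f -name '*.bin' -exec sh -c '
  for f; do
    base=${f%.bin}       # path without the .bin extension
    name=${base##*/}     # bare filename for the .tree output
    ./community "$f" -l -1 -w "$base.weights" > "$name.tree"
  done' _ {} +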
You could use GNU Parallel to run your jobs in parallel across all your CPU cores like this:
parallel convert -i {} -o {.}.bin -w {.}.weights ::: input*.txt
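(Here {} stands for the input file and {.} for the input file with its extension removed, so input1.txt produces input1.bin and input1.weights.)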
Initially, you may like to do a "dry run" that shows what it would do without actually doing anything:
parallel --dry-run convert -i {} -o {.}.bin -w {.}.weights ::: input*.txt
If you get errors about the argument list being too long because you have too many files, you can feed their names in on stdin like this instead:
find . -name "input*txt" -print0 | parallel -0 convert -i {} -o {.}.bin -w {.}.weights
You can use find to list your files and execute a command on all of them:
find -name '*.ext' -exec ./runThisExecutable '{}' \;
If you have a.ext and b.ext in a directory, this will run ./runThisExecutable a.ext and ./runThisExecutable b.ext.
To test whether it identifies the right files, you can run it without -exec so it only prints the filenames:
find -name '*.ext'
./a.ext
./b.ext

What's wrong with this bash code to replace words in files?

I wrote this code a few months ago and didn't touch it again. Now I picked it up to complete it. This is part of a larger script to find all files with specific extensions, find which ones have a certain word, and replace every instance of that word with another one.
In this excerpt, ARG4 is the directory it starts looking at (it keeps going recursively).
ARG2 is the word it looks for.
ARG3 is the word that replaces ARG2.
ARG4="$4"
find -P "$ARG4" -type f -name '*.h' -o -name '*.C' \
-o -name '*.cpp' -o -name "*.cc" \
-exec grep -l "$ARG2" {} \; | while read file; do
echo "$file"
sed -n -i -E "s/"$ARG2"/"$ARG3"/g" "$file"
done
Like I said it's been a while, but I've read the code and I think it's pretty understandable. I think the problem must be in the while loop. I googled more info about "while read ---" but I didn't find much.
EDIT 2: See my answer down below for the solution.
I discovered that find wasn't working properly. It turns out that it's because of -maxdepth 0 which I put there so that the search would only happen in the current directory. I took it out, but then the output of find was one single string with all of the file names. They needed to be separate entities so that the while loop could read each one. So I rewrote it:
files=( $(find . -type f \( -name "*.h" -o -name "*.C" -o \
          -name "*.cpp" -o -name "*.cc" \) \
          -exec grep -l "$ARG1" {} \;) )
for i in "${files[@]}" ; do
  echo "$i"
  gsed -E -i "s/$ARG1/$ARG2/g" "$i"
done
I had to install GNU sed, the regular one just wouldn't accept the file names.
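For what it's worth, the intermediate array can be dropped entirely by piping find straight into the while loop, which also copes with filenames containing spaces (a sketch built from the commands above; it assumes the same gsed):
find . -type f \( -name "*.h" -o -name "*.C" -o -name "*.cpp" -o -name "*.cc" \) \
  -exec grep -l "$ARG1" {} \; |
while IFS= read -r file; do
  echo "$file"
  gsed -E -i "s/$ARG1/$ARG2/g" "$file"
done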
It's hard to say if this is the only issue, since you haven't said precisely what's wrong. However, your find command's -exec action is only being applied to *.cc files, because find's implicit -and binds more tightly than -o. If you want it to apply to any of those extensions, it should look more like:
ARG4="$4"
find -P "$ARG4" -type f \( -name '*.h' -o -name '*.C' \
-o -name '*.cpp' -o -name "*.cc" \) \
-exec grep -l "$ARG2" {} \; | while read file; do
echo "$file"
sed -n -i -E "s/"$ARG2"/"$ARG3"/g" "$file"
done
Note the added \( and \) for grouping, which attaches the -exec action to the result of all of those -name tests rather than just the last one.
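To see what was happening, here is the original expression with the implicit grouping written out explicitly (an equivalent sketch):
find -P "$ARG4" \( -type f -name '*.h' \) \
  -o -name '*.C' \
  -o -name '*.cpp' \
  -o \( -name '*.cc' -exec grep -l "$ARG2" {} \; \)
so the -exec only ever fired on the *.cc branch.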

How to run a command recursively on all files except for those under .svn directories

Here is how I run dos2unix recursively on all files:
find -exec dos2unix {} \;
What do I need to change to make it skip over files under .svn/ directories?
Actual tested solution:
$ find . -type f \! -path \*/\.svn/\* -exec dos2unix {} \;
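Note that ! -path still descends into every .svn directory and tests each file inside it; a -prune variant skips those subtrees entirely, which can be noticeably faster on big trees (a sketch):
find . -type d -name .svn -prune -o -type f -exec dos2unix {} \;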
Here's a general script in which you can change the last line as required. I've taken the technique from my findrepo script:
repodirs=".git .svn CVS .hg .bzr _darcs"
for dir in $repodirs; do
repo_ign="$repo_ign${repo_ign+" -o "}-name $dir"
done
find \( -type d -a \( $repo_ign \) \) -prune -o \
\( -type f -print0 \) |
xargs -r0 \
dos2unix
Just offering an additional tip: piping the results through xargs instead of using find's -exec option will improve performance in a large directory structure, provided the program accepts multiple arguments, since it reduces the number of fork()s:
find <opts> | xargs dos2unix
One caveat: piping through xargs will fail horribly if any filenames include whitespace.
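The usual fix is to make both ends null-delimited (a sketch, assuming GNU find and xargs):
find <opts> -print0 | xargs -0 dos2unix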
In bash (with globstar enabled):
shopt -s globstar
for fic in **/*; do dos2unix "$fic"; done
Or even better in zsh, where the (.) qualifier restricts the glob to regular files:
for fic in **/*(.); do dos2unix "$fic"; done
find . -path '*/.svn' -prune -o -type f -print0 | xargs -0 -I{} echo dos2unix "{}" "{}"
If you have bash 4.0:
shopt -s globstar
shopt -s dotglob
for file in /path/**
do
  case "$file" in
    */.svn* ) continue;;
  esac
  echo dos2unix "$file" "$file"
done
