Can find take a file as argument from stdin? - bash

I have a list of 3,900 ID numbers and I need to find on our FTP server the matching files.
Finding one file is quite simple e.g.
find . -name "*IDNumber*" -exec ls '{}' ';' -print
but how do I do this for 3,900 IDs numbers? I created a file with the IDs like so
028892663163
028923481973
...
but how do I pass the list of ID numbers as argument? Can you provide some pointers?
Thanks!

I would try to reduce the number of times you have to invoke find:
find . -type f -print | grep -f id.file | xargs cp -t target_dir

You may try to optimize it by running find with more than one id at a time.
With bash (100 at a time, you may try with more):
c= p=
while IFS= read -r; do
p+=" -name '*$REPLY*' -o "
(( ++c ))
(( c % 100 )) || {
eval find . ${p% -o }
p=
}
done < id_list_all
[[ $p ]] &&
eval find . ${p% -o }

Figured it out.
put all my 3,900 ID numbers in a file outfile
typed the command line:
cat outfile | while read line
do
find . -name "$line" -exec cp '{}' /target_directory ';' -print
done
Worked awesome!

I read your question wrong the first time... arguments from find to other things. What you want is arguments from a file passed to find. So, here's the correct answer with xargs:
xargs --max-args=1 -I X -d '\n' find . -name X -exec [...] < your_list

Related

sed to replace string in file only displayed but not executed

I want to find all files with certain name (Myfile.txt) that do not contain certain string (my-wished-string) and then do a sed in order to do a replace in the found files. I tried with:
find . -type f -name "Myfile.txt" -exec grep -H -E -L "my-wished-string" {} + | sed 's/similar-to-my-wished-string/my-wished-string/'
But this only displays me all files with wished name that miss the "my-wished-string", but does not execute the replacement. Do I miss here something?
With a for loop and invoking a shell.
find . -type f -name "Myfile.txt" -exec sh -c '
for f; do
grep -H -E -L "my-wished-string" "$f" &&
sed -i "s/similar-to-my-wished-string/my-wished-string/" "$f"
done' sh {} +
You might want to add a -q to grep and -n to sed to silence the printing/output to stdout
You can do this by constructing two stacks; the first containing the files to search, and the second containing negative hits, which will then be iterated over to perform the replacement.
find . -type f -name "Myfile.txt" > stack1
while read -r line;
do
[ -z $(sed -n '/my-wished-string/p' "${line}") ] && echo "${line}" >> stack2
done < stack1
while read -r line;
do
sed -i "s/similar-to-my-wished-string/my-wished-string/" "${line}"
done < stack2
With some versions of sed, you can use -i to edit the file. But don't pipe the list of names to sed, just execute sed in the find:
find . -type f -name Myfile.txt -not -exec grep -q "my-wished-string" {} \; -exec sed -i 's/similar-to-my-wished-string/my-wished-string/g' {} \;
Note that any file which contains similar-to-my-wished-string also contains the string my-wished-string as a substring, so with these exact strings the command is a no-op, but I suppose your actual strings are different than these.

How to get list of certain strings in a list of files using bash?

The title is maybe not really descriptive, but I couldn't find a more concise way to describe the problem.
I have a directory containing different files which have a name that e.g. looks like this:
{some text}2019Q2{some text}.pdf
So the filenames have somewhere in the name a year followed by a capital Q and then another number. The other text can be anything, but it won't contain anything matching the format year-Q-number. There will also be no numbers directly before or after this format.
I can work something out to get this from one filename, but I actually need a 'list' so I can do a for-loop over this in bash.
So, if my directory contains the files:
costumerA_2019Q2_something.pdf
costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerB_2019Q3_something.pdf
costumerC_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerD2020Q2something.pdf
I want a for loop that goes over 2019Q2, 2019Q3, 2020Q1, and 2020Q2.
EDIT:
This is what I have so far. It is able to extract the substrings, but it still has doubles. Since I'm already in the loop and I don't see how I can remove the doubles.
find original/*.pdf -type f -print0 | while IFS= read -r -d '' line; do
echo $line | grep -oP '[0-9]{4}Q[0-9]'
done
# list all _filanames_ that end with .pdf from the folder original
find original -maxdepth 1 -name '*.pdf' -type f -print "%p\n" |
# extract the pattern
sed 's/.*\([0-9]{4}Q[0-9]\).*/\1/' |
# iterate
while IFS= read -r file; do
echo "$file"
done
I used -print %p to print just the filename, instead of full path. The GNU sed has -z option that you can use with -print0 (or -print "%p\0").
With how you have wanted to do this, if your files have no newline in the name, there is no need to loop over list in bash (as a rule of a thumb, try to avoid while read line, it's very slow):
find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]'
or with a zero seprated stream:
find original -maxdepth 1 -name '*.pdf' -type f -print0 |
grep -zoP '[0-9]{4}Q[0-9]' | tr '\0' '\n'
If you want to remove duplicate elements from the list, pipe it to sort -u.
Try this, in bash:
~ > $ ls
costumerA_2019Q2_something.pdf costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf other.pdf
costumerA_2020Q1_something.pdf someother.file.txt
~ > $ for x in `(ls)`; do [[ ${x} =~ [0-9]Q[1-4] ]] && echo $x; done;
costumerA_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerB_2019Q2_something.pdf
~ > $ (for x in *; do [[ ${x} =~ ([0-9]{4}Q[1-4]).+pdf ]] && echo ${BASH_REMATCH[1]}; done;) | sort -u
2019Q2
2019Q3
2020Q1

Count filenumber in directory with blank in its name

If you want a breakdown of how many files are in each dir under your current dir:
for i in $(find . -maxdepth 1 -type d) ; do
echo -n $i": " ;
(find $i -type f | wc -l) ;
done
It does not work when the directory name has a blank in the name. Can anyone here tell me how I must edite this shell script so that such directory names also accepted for counting its file contents?
Thanks
Your code suffers from a common issue described in http://mywiki.wooledge.org/BashPitfalls#for_i_in_.24.28ls_.2A.mp3.29.
In your case you could do this instead:
for i in */; do
echo -n "${i%/}: "
find "$i" -type f | wc -l
done
This will work with all types of file names:
find . -maxdepth 1 -type d -exec sh -c 'printf "%s: %i\n" "$1" "$(find "$1" -type f | wc -l)"' Counter {} \;
How it works
find . -maxdepth 1 -type d
This finds the directories just as you were doing
-exec sh -c 'printf "%s: %i\n" "$1" "$(find "$1" -type f | wc -l)"' Counter {} \;
This feeds each directory name to a shell script which counts the files, similarly to what you were doing.
There are some tricks here: Counter {} are passed as arguments to the shell script. Counter becomes $0 (which is only used if the shell script generates an error. find replaces {} with the name of a directory it found and this will be available to the shell script as $1. This is done is a way that is safe for all types of file names.
Note that, wherever $1 is used in the script, it is inside double-quotes. This protects it for word splitting or other unwanted shell expansions.
I found the solution what I have to consider:
Consider_this
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for i in $(find . -maxdepth 1 -type d); do
echo -n " $i: ";
(find $i -type f | wc -l) ;
done
IFS=$SAVEIFS

Bash function find and rm

I am trying to do a recursive grep and deleting files with less than a specified entry.
To be more clear, I have a directory of 400000 text files and in each file i have 10 items each starting with the >. Now the problem is that some of the files out of the 4000000 files have only 6-7 or 8-9 items starting with >.
So I wish to delete the files which have fewer than 10 items. I am using the recursive function, however i am not able to figure out how to add rm in the recursive way. What I have till now is:
find . -name "*.[txt]" -exec grep ">" -c {} \;
You can use -exec like this:
find . -name "*.txt" -exec bash -c '(( $(grep ">" -c "$1") <= 10 )) && rm "$1"' - '{}' \;
To avoid creating shell per file you can use:
while read -r f; do
(( $(grep ">" -c "$f") <= 10 )) && rm "$f"
done < <(find . -name "*.txt")
I would break it up into smaller steps:
find . -type f -exec grep -c '>' {} + |
awk -F: '$2 != 10 {print $1}' |
xargs echo rm
remove the "echo" if you're satisfied it's working
The awk step is fragile if you have any filenames containing ":"

find -exec with multiple commands

I am trying to use find -exec with multiple commands without any success. Does anybody know if commands such as the following are possible?
find *.txt -exec echo "$(tail -1 '{}'),$(ls '{}')" \;
Basically, I am trying to print the last line of each txt file in the current directory and print at the end of the line, a comma followed by the filename.
find accepts multiple -exec portions to the command. For example:
find . -name "*.txt" -exec echo {} \; -exec grep banana {} \;
Note that in this case the second command will only run if the first one returns successfully, as mentioned by #Caleb. If you want both commands to run regardless of their success or failure, you could use this construct:
find . -name "*.txt" \( -exec echo {} \; -o -exec true \; \) -exec grep banana {} \;
find . -type d -exec sh -c "echo -n {}; echo -n ' x '; echo {}" \;
One of the following:
find *.txt -exec awk 'END {print $0 "," FILENAME}' {} \;
find *.txt -exec sh -c 'echo "$(tail -n 1 "$1"),$1"' _ {} \;
find *.txt -exec sh -c 'echo "$(sed -n "\$p" "$1"),$1"' _ {} \;
Another way is like this:
multiple_cmd() {
tail -n1 $1;
ls $1
};
export -f multiple_cmd;
find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;
in one line
multiple_cmd() { tail -1 $1; ls $1 }; export -f multiple_cmd; find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;
"multiple_cmd()" - is a function
"export -f multiple_cmd" - will export it so any other subshell can see it
"find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;" - find that will execute the function on your example
In this way multiple_cmd can be as long and as complex, as you need.
Hope this helps.
There's an easier way:
find ... | while read -r file; do
echo "look at my $file, my $file is amazing";
done
Alternatively:
while read -r file; do
echo "look at my $file, my $file is amazing";
done <<< "$(find ...)"
Extending #Tinker's answer,
In my case, I needed to make a command | command | command inside the -exec to print both the filename and the found text in files containing a certain text.
I was able to do it with:
find . -name config -type f \( -exec grep "bitbucket" {} \; -a -exec echo {} \; \)
the result is:
url = git#bitbucket.org:a/a.git
./a/.git/config
url = git#bitbucket.org:b/b.git
./b/.git/config
url = git#bitbucket.org:c/c.git
./c/.git/config
I don't know if you can do this with find, but an alternate solution would be to create a shell script and to run this with find.
lastline.sh:
echo $(tail -1 $1),$1
Make the script executable
chmod +x lastline.sh
Use find:
find . -name "*.txt" -exec ./lastline.sh {} \;
Thanks to Camilo Martin, I was able to answer a related question:
What I wanted to do was
find ... -exec zcat {} | wc -l \;
which didn't work. However,
find ... | while read -r file; do echo "$file: `zcat $file | wc -l`"; done
does work, so thank you!
1st answer of Denis is the answer to resolve the trouble. But in fact it is no more a find with several commands in only one exec like the title suggest. To answer the one exec with several commands thing we will have to look for something else to resolv. Here is a example:
Keep last 10000 lines of .log files which has been modified in the last 7 days using 1 exec command using severals {} references
1) see what the command will do on which files:
find / -name "*.log" -a -type f -a -mtime -7 -exec sh -c "echo tail -10000 {} \> fictmp; echo cat fictmp \> {} " \;
2) Do it: (note no more "\>" but only ">" this is wanted)
find / -name "*.log" -a -type f -a -mtime -7 -exec sh -c "tail -10000 {} > fictmp; cat fictmp > {} ; rm fictmp" \;
I usually embed the find in a small for loop one liner, where the find is executed in a subcommand with $().
Your command would look like this then:
for f in $(find *.txt); do echo "$(tail -1 $f), $(ls $f)"; done
The good thing is that instead of {} you just use $f and instead of the -exec … you write all your commands between do and ; done.
Not sure what you actually want to do, but maybe something like this?
for f in $(find *.txt); do echo $f; tail -1 $f; ls -l $f; echo; done
should use xargs :)
find *.txt -type f -exec tail -1 {} \; | xargs -ICONSTANT echo $(pwd),CONSTANT
another one (working on osx)
find *.txt -type f -exec echo ,$(PWD) {} + -exec tail -1 {} + | tr ' ' '/'
A find+xargs answer.
The example below finds all .html files and creates a copy with the .BAK extension appended (e.g. 1.html > 1.html.BAK).
Single command with multiple placeholders
find . -iname "*.html" -print0 | xargs -0 -I {} cp -- "{}" "{}.BAK"
Multiple commands with multiple placeholders
find . -iname "*.html" -print0 | xargs -0 -I {} echo "cp -- {} {}.BAK ; echo {} >> /tmp/log.txt" | sh
# if you need to do anything bash-specific then pipe to bash instead of sh
This command will also work with files that start with a hyphen or contain spaces such as -my file.html thanks to parameter quoting and the -- after cp which signals to cp the end of parameters and the beginning of the actual file names.
-print0 pipes the results with null-byte terminators.
for xargs the -I {} parameter defines {} as the placeholder; you can use whichever placeholder you like; -0 indicates that input items are null-separated.
I found this solution (maybe it is already said in a comment, but I could not find any answer with this)
you can execute MULTIPLE COMMANDS in a row using "bash -c"
find . <SOMETHING> -exec bash -c "EXECUTE 1 && EXECUTE 2 ; EXECUTE 3" \;
in your case
find . -name "*.txt" -exec bash -c "tail -1 '{}' && ls '{}'" \;
i tested it with a test file:
[gek#tuffoserver tmp]$ ls *.txt
casualfile.txt
[gek#tuffoserver tmp]$ find . -name "*.txt" -exec bash -c "tail -1 '{}' && ls '{}'" \;
testonline1=some TEXT
./casualfile.txt
Here is my bash script that you can use to find multiple files and then process them all using a command.
Example of usage. This command applies a file linux command to each found file:
./finder.sh file fb2 txt
Finder script:
# Find files and process them using an external command.
# Usage:
# ./finder.sh ./processing_script.sh txt fb2 fb2.zip doc docx
counter=0
find_results=()
for ext in "${#:2}"
do
# #see https://stackoverflow.com/a/54561526/10452175
readarray -d '' ext_results < <(find . -type f -name "*.${ext}" -print0)
for file in "${ext_results[#]}"
do
counter=$((counter+1))
find_results+=("${file}")
echo ${counter}") ${file}"
done
done
countOfResults=$((counter))
echo -e "Found ${countOfResults} files.\n"
echo "Processing..."
counter=0
for file in "${find_results[#]}"
do
counter=$((counter+1))
echo -n ${counter}"/${countOfResults}) "
eval "$1 '${file}'"
done
echo "All files have been processed."

Resources