Shell: how to use a command on every file in a directory - bash

What I have to do is find all regular files within and below a directory. For each of these regular files, I have to egrep for a pattern ($ARG) and find out whether the file's contents match the pattern; if they do, I add one to a counter.
What I have so far is the file command:
$count = 0
file *
However, I am having trouble getting egrep $ARG > /dev/null ; echo $? to run on each file that file * reports.
I understand that file * | egrep directory > /dev/null ; echo $? will output 0 because it finds the pattern 'directory' in the output, but I am having trouble looping through each regular file so I can add one to the counter every time the pattern is matched.

The question is not clear, but if you're looking for the number of files containing a pattern,
grep -l "pattern" * 2>/dev/null | wc -l
will give you that. Errors coming from directories are ignored.
If you want to recursively search the complete tree, including dot files:
grep -r -l "pattern" . | wc -l
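To store the result in the counter variable from your attempt, capture the output with command substitution:
count=$(grep -rl "pattern" . | wc -l)
echo "$count"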

You can try this:
counter=0
while IFS= read -r file
do
    if grep -qi -e "$pattern" "$file"
    then counter=$((counter+1))
    fi
done < <(find /path/to/directory/ -type f)   # process substitution, not a pipe, so counter survives the loop
echo "$counter"

See http://mywiki.wooledge.org/BashFAQ/020
counter=0
shopt -s globstar nullglob
for file in **; do
    [[ -f $file ]] || continue   # ** also matches directories; skip them
    grep -qiE "$pattern" "$file" && ((counter++))
done
echo "$counter"
If you want to include hidden files, add shopt -s dotglob

Related

bash: Check if a file in sequence is missing

I have a bunch of files in a directory whose names contain numbers.
/mnt/exp-data/6/instrument/caen2018/stage0/S0Test_26060_500ns_CW_0ns_CBT_0ns_DEBT.root
/mnt/exp-data/6/instrument/caen2018/stage0/S0Run_26061_500ns_CW_0ns_CBT_0ns_DEBT.root
/mnt/exp-data/6/instrument/caen2018/stage0/S0Test_26063_500ns_CW_0ns_CBT_0ns_DEBT.root
/mnt/exp-data/6/instrument/caen2018/stage0/S0Run_26065_500ns_CW_0ns_CBT_0ns_DEBT.root
What I'd like to do is find which files are missing and then do something with those. In the above case the files containing the numbers 26062 and 26064 are missing.
So far I'm doing the following
#___________________________________________________________________________________________________________________________
#-3-Find the missing runs
REPLAYED_RUNS_DIR=/mnt/exp-data/6/instrument/caen2018/stage0
echo "..........Looking for non replayed runs in the range $smallest_run-$biggest_run"
for (( i=$smallest_run; i<=$biggest_run; ++i));do
filename="$REPLAYED_DATA_DIR/*$i*.root"
#echo $filename
if [ ! -f $filename ]; then
echo "Run $i does not exist."
./produce_file $i
fi
done
This snippet manages to find files that are missing, however I have a few issues:
In some cases I get the following error for files that do exist and I have no idea why.
./check_missing.sh: line 53: [: /mnt/exp-data/6/instrument/caen2018/stage0/S0Run_25829_500ns_CW_0ns_CBT_0ns_DEBT.root: binary operator expected
If I uncomment echo $filename, I get as output the full path of the files, as if I were doing ls instead of echo. Is this to be expected?
Is there a better way (i.e. faster, more efficient) to do what I'm trying to do?
Here is a script that does the whole job in bulk.
#!/bin/bash
d="path/to/directory"
start=$1
end=$2
join -v1 <(
seq "$start" "$end"
) <(
find "$d" -type f -printf "%f\0" |
awk -F"/" -v RS="\0" '{split($NF,a,"_"); print a[2]}' | sort
) | xargs -r -n1 echo ./produce_file
join -v1 file1 file2 outputs all lines of file1 that are not in file2. In place of those two files, using process substitution, we put the sequence to be tested and the filenames found by find, filtered by awk to extract the number in them and finally sorted, because join wants sorted inputs.
Finally you can pipe the result to your script produce_file. -r stands for --no-run-if-empty, a GNU extension that avoids one execution with empty input if the previous command produced no output.
Remove echo after testing. If your script can process multiple number arguments, remove -n1 also, to process all of them at once.
Testing:
> mkdir -p path/to/directory
touch path/to/directory/S0Test_26060_0ns_CBT_0ns_DEBT.root
touch path/to/directory/S0Run_26061_500ns.root
touch path/to/directory/S0Test_26063_500ns_CW.root
touch path/to/directory/S0Test_26065_500ns.root
touch path/to/directory/S0Test_30000_500ns.root
> bash test.sh 26060 26065
./produce_file 26062
./produce_file 26064
I found another way to do it using find and [ -z "$filename" ] to check if find returned an empty result.
for (( i=$start; i<=$end; ++i )); do
    filename=$(find "$DIR" -type f -name "*$i*")
    if [ -z "$filename" ]; then
        echo "File $i does not exist."
    fi
done
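Note that this runs one find per run number, which is slow for large ranges. A sketch that runs find only once and remembers which numbers exist, assuming bash 4+ associative arrays and that the run number is always the second underscore-separated field of the file name:
declare -A present
while IFS= read -r -d '' f; do
    name=${f##*/}       # basename, e.g. S0Run_26061_500ns_...
    num=${name#*_}      # drop the prefix up to the first underscore
    num=${num%%_*}      # keep only the run number
    present[$num]=1
done < <(find "$DIR" -type f -name '*.root' -print0)

for (( i=start; i<=end; ++i )); do
    [[ ${present[$i]} ]] || echo "File $i does not exist."
done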

Shell script to loop over all files in a folder and pick them in numerical order

I have the following code to loop through the files of a folder. Files are named 1.txt, 2.txt all the way to 15.txt
for file in .solutions/*; do
if [ -f "$file" ]; then
echo "test case ${file##*/}:"
cat ./testcases/${file##*/}
echo
echo "result:"
cat "$file"
echo
echo
fi
done
My issue is that I get 1.txt first, then 10.txt to 15.txt, before 2.txt to 9.txt are displayed.
I would like it to be displayed in numerical order instead of lexicographical order; in other words, I want the loop to iterate through the files in numerical order. Is there any way to achieve this?
ls *.txt | sort -n
This would solve the problem, provided .solutions is a directory and no directory is named with an extension .txt.
and if you want complete accuracy,
ls -al *.txt | awk '$0 ~ /^-/ {print $9}' | sort -n
Update:
As per your edits,
you can simply do this,
ls | sort -n |
while read file
do
#do whatever you want here
:
done
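For the question's concrete case the body could look like this (a sketch; the answer below explains why parsing ls is fragile in general):
ls .solutions | sort -n |
while read -r file
do
    echo "test case $file:"
    cat "./testcases/$file"
    echo "result:"
    cat ".solutions/$file"
done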
Looping through ls is usually a bad idea since file names can have newlines in them. Redirecting using process substitution instead of piping the results will keep the scope the same (variables you set will stay after the loop).
#!/usr/bin/env bash
while IFS= read -r -d '' file; do
    echo "test case ${file##*/}:"
    cat "./testcases/${file##*/}"
    echo
    echo "result:"
    cat "$file"
    echo
    echo
done < <(find '.solutions/' -name '*.txt' -type f -print0 | sort -zn -t/ -k2)
Setting IFS to "" keeps leading/trailing spaces, -r stops backslashes from being interpreted, and -d '' makes read use NUL instead of newline as the delimiter.
The find command looks for normal files (-type f), so the if [ -f "$file" ] check isn't needed. It finds -name '*.txt' files in '.solutions/' and prints them NUL-terminated (-print0).
The sort command accepts NUL-terminated strings with the -z option and sorts the file-name field numerically with -n -t/ -k2 (a plain -n would tie on the non-numeric '.solutions/' prefix and fall back to lexicographic order).
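If the names really are just 1.txt through 15.txt, as in the question, a plain numeric loop avoids sorting entirely (a sketch assuming exactly that naming scheme):
for i in {1..15}; do
    file=".solutions/$i.txt"
    [ -f "$file" ] || continue   # skip any missing number
    echo "test case $i.txt:"
    cat "./testcases/$i.txt"
    echo
    echo "result:"
    cat "$file"
    echo
    echo
done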

Shell Script: How to copy files with specific string from big corpus

I have a small bug and don't know how to solve it. I want to copy files that contain a specific string from a big folder with many files. For this I use grep, ack or (in this example) ag. When I'm inside the folder the search matches without problem, but when I run the following script to loop over the files, it doesn't loop over the matches. Here is my script:
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while read -d $'\0' file; do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done
SEARCH_QUERY holds the string I want to find inside the files, INPUT_DIR is the folder where the files are located, and OUTPUT_DIR is the folder the found files should be copied to. Is there something wrong with the while loop?
EDIT:
Thanks for the suggestions! I took this one now, because it also looks for files in subfolders and saves a list with all the files.
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" > "output_list.txt"
while read -r file
do
echo "${file##*/}"
cp "${file}" "${OUTPUT_DIR}/${file##*/}"
done < "output_list.txt"
Better to implement it like below with a find command:
find "${INPUT_DIR}" -name "*.*" | xargs grep -l "${SEARCH_QUERY}" > /tmp/file_list.txt
while read file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
or another option:
grep -l "${SEARCH_QUERY}" "${INPUT_DIR}/*.*" > /tmp/file_list.txt
while read file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
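Both of the above break on file names containing spaces or newlines. A NUL-safe sketch, assuming GNU grep's -Z/--null option and GNU xargs:
# grep -lZ prints matching file names NUL-terminated; read -d '' consumes them
find "${INPUT_DIR}" -type f -print0 | xargs -0 grep -lZ "${SEARCH_QUERY}" |
while IFS= read -r -d '' file; do
    cp "$file" "${OUTPUT_DIR}/${file##*/}"
done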
if you do not mind doing it in just one line, then
grep -lr 'ONE\|TWO\|THREE' | xargs -I xxx -P 0 cp xxx dist/
guide:
-l just print file name and nothing else
-r search recursively the CWD and all sub-directories
'ONE\|TWO\|THREE' match any of these words: 'ONE' or 'TWO' or 'THREE'
| pipe the output of grep to xargs
-I xxx the name of each file is saved in xxx; it is just a placeholder
-P 0 run the commands (= cp) in parallel, as many at once as possible
cp each file xxx to the dist directory
If I understand the behavior of ag correctly, then you have to
adjust the read delimiter to '\n', or
use ag -0 -l to force delimiting by '\0'
to solve the problem in your loop (a sketch of the first option follows).
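A sketch of the first option, reading the newline-delimited output of ag -l:
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" |
while IFS= read -r file; do
    echo "$file"
    cp "$file" "${OUTPUT_DIR}/${file##*/}"
done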
Alternatively, you can use the following script, which is based on find instead of ag. Note that it matches $SEARCH_QUERY against file names rather than file contents.
while read file; do
echo "$file"
cp "$file" "$OUTPUT_DIR/$file"
done < <(find "$INPUT_DIR" -name "*$SEARCH_QUERY*" -print)

List files whose last line doesn't contain a pattern

The very last line of my file should be "#"
if I tail -n 1 * | grep -L "#" the result is (standard input) obviously because it's being piped.
I was hoping for a grep solution that just searches the last line, as opposed to reading the entire file.
for i in *; do tail -n 1 "$i" | grep -q -v '#' && echo "$i"; done
You can use sed for that:
sed -n 'N;${/pattern/!p}' file
The above command prints all lines of file if its last line doesn't contain the pattern.
However, it looks like I misunderstood you: you only want to print the names of those files whose last line doesn't match the pattern. In this case I would use find together with the following (GNU) sed command:
find -maxdepth 1 -type f -exec sed -n '${/pattern/!F}' {} \;
The find command iterates over all files in the current folder and executes the sed command. $ addresses the last line of input. If /pattern/ isn't found (!), then F prints the file name.
The solution above looks nice and executes fast, but it has a drawback: it will not print the names of empty files, since their last line is never reached and $ will not match.
For a stable solution I would suggest to put the commands into a script:
script.sh
#!/bin/bash
# Check whether the file is empty ...
if [ ! -s "$1" ] ; then
echo "$1"
else
# ... or whether the last line lacks the pattern
sed -n '${/pattern/!F}' "$1"
# If you don't have GNU sed you can use this instead:
# (($(tail -n 1 "$1" | grep -c pattern))) || echo "$1"
fi
make it executable
chmod +x script.sh
And use the following find command:
find -maxdepth 1 -type f -exec ./script.sh {} \;
Consider this one-liner:
while read -r name ; do tail -n 1 "$name" | grep -q '#' || echo "$name does not contain the pattern" ; done < <( find -type f )
It uses tail to get the last line of each file and grep to test that line against the pattern. Performance will not be the best on many files because two new processes are started in each iteration.
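If process startup dominates and the files are small enough that reading them fully is acceptable, a single GNU awk process can check every last line at once; a sketch assuming gawk's BEGINFILE/ENDFILE extensions:
# remember each file's last line; when the file ends, report it if the line lacked '#'
gawk 'BEGINFILE { last = "" }
      { last = $0 }
      ENDFILE { if (last !~ /#/) print FILENAME }' *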

How to use grep in a for loop

Could someone please help with this script. I need to use grep to loop through the filenames that need to be changed.
#!/bin/bash
file=
for file in $(ls $1)
do
grep "^.old" | mv "$1/$file" "$1/$file.old"
done
bash can handle regular expressions without using grep.
for f in "$1"/*; do
[[ $f =~ \.old ]] && continue
# Or a pattern instead
# [[ $f == *.old* ]] && continue
mv "$f" "$f.old"
done
You can also move the name checking into the pattern itself:
shopt -s extglob
for f in "$1/"!(*.old*); do
mv "$f" "$f.old"
done
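A quick interactive check of what the extglob pattern excludes, with hypothetical file names:
$ shopt -s extglob
$ ls
a.txt  b.txt.old  c
$ echo !(*.old*)
a.txt c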
If I understand your question correctly, you want to rename a file (i.e. dir/file.txt ==> dir/file.old) only if the file has not been renamed before. The solution is as follows.
#!/bin/bash
for file in "$1/"*
do
backup_file="${file%.*}.old"
if [ ! -e "$backup_file" ]
then
echo mv "$file" "$backup_file"
fi
done
Discussion
The script currently does not actually perform the rename; it only displays the action. Run the script once and examine the output. If this is what you want, then remove the echo from the script and run it again.
Update
Here is the no-if solution:
ls "$1/"* | grep -v '\.old' | while read -r file
do
    echo mv "$file" "${file}.old"
done
Discussion
The ls command displays all files.
The grep command filters out the files that have the .old extension so they won't be displayed.
The while loop reads the remaining file names one by one and renames them.
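The same no-if idea works without parsing ls by reusing the extglob pattern from the first answer (a sketch):
shopt -s extglob
for file in "$1"/!(*.old*); do
    echo mv "$file" "${file}.old"
done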
