Delete empty files - Improve performance of logic - bash

I need to find & remove empty files. In my use case, an empty file is one that has zero lines.
I did try testing whether a file is empty, but this behaves strangely: even when a file is empty, the test doesn't detect it as such.
Hence, the best thing I could write up is the script below, which is way too slow given that it has to test several hundred thousand files:
#!/bin/bash
LOOKUP_DIR="/path/to/source/directory"
cd "${LOOKUP_DIR}" || { echo "cd failed"; exit 1; }
for fname in $(realpath */*)
do
if [[ $(wc -l "${fname}" | awk '{print $1}') -eq 0 ]]
then
echo "${fname}" is empty
rm -f "${fname}"
fi
done
Is there a better way to do what I'm after? Alternatively, can the above logic be rewritten in a way that performs better?

Your script is slow because wc reads every file to the end, which is not needed for your purpose. This might be what you're looking for:
#!/bin/bash
lookup_dir='/path/to/source/directory'
cd "$lookup_dir" || exit
for file in *; do
if [[ -f "$file" && -r "$file" && ! -L "$file" ]]; then
read < "$file" || echo rm -f -- "$file"
fi
done
Drop the echo after making sure it works as intended.
Another version, calling the rm only once, could be:
#!/bin/bash
lookup_dir='/path/to/source/directory'
cd "$lookup_dir" || exit
for file in *; do
if [[ -f "$file" && -r "$file" && ! -L "$file" ]]; then
read < "$file" || files_to_be_deleted+=("$file")
fi
done
rm -f -- "${files_to_be_deleted[@]}"
Explanation:
The core logic is in the line
read < "$file" || rm -f -- "$file"
The read < "$file" command attempts to read a line from "$file". If it succeeds, that is, a line is read, then the rm command on the right-hand side of the || won't be executed (that's how || works). If it fails, the rm command will be executed. In either case, at most one line is read, which is a great advantage over the wc approach, because wc would read the whole file.
if ! read < "$file"; then rm -f -- "$file"; fi
could be used instead. The two lines are equivalent.
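If you prefer to let find enumerate the files instead of a glob, the same zero-line test can be driven from a NUL-delimited find. This is only a sketch under assumptions not in the original answers (GNU find, bash, and the question's two-level */* layout):
#!/bin/bash
lookup_dir='/path/to/source/directory'
# enumerate regular files two levels down, NUL-delimited to survive odd names
find "$lookup_dir" -mindepth 2 -maxdepth 2 -type f -print0 |
while IFS= read -r -d '' file; do
    # read fails on a zero-line file, so the (still echoed) rm runs
    read -r < "$file" || echo rm -f -- "$file"
done
As with the other versions, drop the echo once the printed list looks right.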

To check whether "$fname" exists and is empty, negate the [ -s "$fname" ] test:
#!/usr/bin/env sh
LOOKUP_DIR="/path/to/source/directory"
for fname in "$LOOKUP_DIR"/*/*; do
if ! [ -s "$fname" ]; then
echo "${fname}" is empty
# remove echo when output is what you want
echo rm -f "${fname}"
fi
done
See: help test:
File operators:
...
-s FILE True if file exists and is not empty.

Yet another method
wc -l ~/tmp/* 2>/dev/null | awk '$1 == 0 {print $2}' | xargs echo rm
This will break if any of your files have whitespace in the name.
To work around that, still using awk:
wc -l ~/tmp/* 2>/dev/null \
| awk 'sub(/^[[:blank:]]+0[[:blank:]]+/, "")' \
| xargs echo rm
This works because the sub function returns the number of substitutions made, which can be treated as a boolean zero/not-zero condition.
Remove the echo to actually delete the files.
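One caveat worth adding: even with the awk fix, plain xargs still splits its input on blanks, so a file name containing spaces can be mangled again before it reaches rm. If you have GNU xargs, a hedged variant is to make newline the only delimiter:
wc -l ~/tmp/* 2>/dev/null \
    | awk 'sub(/^[[:blank:]]+0[[:blank:]]+/, "")' \
    | xargs -d '\n' echo rm --
As before, keep the echo until the printed list looks right; note that wc also prints a summary "total" line when given several files, and it could slip into the list if its count happens to be 0.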

Related

Add character to file name if duplicate when moving with bash

I currently use a bash script and PDFgrep to rename files to a certain structure. However, in order to stop overwriting when the new file has a duplicate name, I want to add a number at the end of the name. Keep in mind that there may be 3 or 4 duplicate names. What's the best way to do this?
#!/bin/bash
if [ $# -ne 1 ]; then
echo Usage: Renamer file
exit 1
fi
f="$1"
id1=$(pdfgrep -m 1 -i "MR# : " "$f" | grep -oE "[M][0-9][0-9]+") || continue
id2=$(pdfgrep -m 1 -i "Visit#" "$f" | grep -oE "[V][0-9][0-9]+") || continue
{ read today; read dob; read dop; } < <(pdfgrep -i " " "$f" | grep -oE "[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]")
dobsi=$(echo $dob | sed -e 's/\//-/g')
dopsi=$(echo $dop | sed -e 's/\//-/g')
mv -- "$f" "${id1}_${id2}_$(printf "$dobsi")_$(printf "$dopsi")_1.pdf"
Use a loop that checks if the destination filename exists, and increments a counter if it does. Replace the mv line with this:
prefix="${id1}_${id2}_${dobsi}_${dopsi}"
counter=0
while true
do
if [ "$counter" -ne 0 ]
then target="${prefix}_${counter}.pdf"
else target="${prefix}.pdf"
fi
if [ ! -e "$target" ]
then
mv -- "$f" "$target"
break
fi
((counter++))
done
Note that this suffers from a TOCTTOU problem: if a duplicate file is created between the ! -e "$target" test and the mv, it will be overwritten. I thought it would be possible to replace the existence check by using mv -n; but while that won't overwrite the file, mv still reports success, so you can't test its exit status to see whether you need to increment the counter.
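A possible hedge (just a sketch, reusing the same prefix as above, and assuming mv only ever declines because the target already exists) is to attempt the move with mv -n and then check whether the source file is still there, since that tells you whether the rename actually happened:
counter=0
while true
do
    if [ "$counter" -ne 0 ]
    then target="${prefix}_${counter}.pdf"
    else target="${prefix}.pdf"
    fi
    mv -n -- "$f" "$target"
    # mv -n exits 0 even when it refuses to clobber, so test the source instead
    [ ! -e "$f" ] && break
    ((counter++))
done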

How to find symlinks in a directory that points to another

I need to write a bash script that finds and lists symlinks from one directory (let's say "Directory1"), but only the ones pointing to files in a certain other directory (let's say "Directory2"). I can't use find.
I have tried something like this but it's apparently wrong:
if [[ -d $1 ]]&&[[ -d $2 ]]
then
current_dir='pwd'
cd $1
do
for plik in *
if[[-L $file] && ["$(readlink -- "$file")" = "$2"] ]
then
#ls -la | grep ^l
echo "$(basename "$file")"
fi
done
fi
How about a simple ls with grep :
ls -l Directory1/ | grep "\->" | grep "Directory2"
I have found a solution:
for file1 in "$_cat1"/*; do
if [ -L "$file1" ]; then
dir1="$(dirname "$(readlink -f "$file1")")"
dir2="$(dirname "$(readlink -f "$_cat2")")"
if [ "$dir1" == "$dir2" ]; then
echo "$dir1"
fi
fi
done
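For completeness, here is a plain-bash sketch of the stated goal (list the symlinks in Directory1 whose targets are files directly inside Directory2, without find). It assumes GNU readlink and that the two directories are passed as $1 and $2; the variable names are illustrative, not taken from the answers above:
#!/bin/bash
dir1=$1
dir2=$(readlink -f -- "$2")        # canonical path of the target directory

for link in "$dir1"/*; do
    [[ -L $link ]] || continue      # only interested in symlinks
    target=$(readlink -f -- "$link")
    # keep the link if its resolved target sits directly inside dir2
    if [[ $(dirname -- "$target") == "$dir2" ]]; then
        basename -- "$link"
    fi
done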

find multiple patterns in multiple files bash

I'm trying to find multiple patterns (I have a file of them) in multiple different files spread across a lot of subdirs.
I'm trying to use exit codes so that I don't output all the patterns that are found (because I only need the ones which are NOT found), but exit codes don't work as I understand them.
while read pattern; do
grep -q -n -r $pattern ./dir/
if [ $? -eq 0 ]; then
: #echo $pattern ' exists'
else
echo $pattern " doesn't exist"
fi
done <strings.tmp
You can use this in bash:
while read -r pattern; do
grep -F -q -r "$pattern" ./dir/ || echo "$pattern doesn't exist"
done < strings.tmp
Use read -r to safely read regex patterns
Quote "$pattern" to prevent word splitting and globbing
No need for -n since you're using the -q (quiet) flag
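To see why -r matters here: without it, read treats backslashes in the pattern file as escape characters, so a regex like foo\.bar would silently lose its backslash. A quick illustration (the sample pattern is hypothetical):
while read pattern;    do echo "$pattern"; done <<< 'foo\.bar'   # prints foo.bar
while read -r pattern; do echo "$pattern"; done <<< 'foo\.bar'   # prints foo\.bar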
#anubhava's solution should work. If it doesn't for some reason, try the following
while read -r pattern; do
lines=`grep -r "$pattern" ./dir/ | wc -l`
if [ $lines -eq 0 ]; then
echo $pattern " doesn't exist"
else
echo $pattern "exists"
fi
done < strings.tmp
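If strings.tmp is large, re-scanning the whole tree once per pattern gets expensive. A single-pass sketch (assuming GNU grep, fixed-string patterns, no blank lines in strings.tmp, and no pattern being a substring of another; the temp-file names are arbitrary) is to collect every pattern that matches anywhere and diff that against the full list:
sort -u strings.tmp > all_patterns.tmp
grep -rhoFf all_patterns.tmp ./dir/ | sort -u > found_patterns.tmp
# lines only in the first file are the patterns that were never found
comm -23 all_patterns.tmp found_patterns.tmp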

create and rename multiple copies of files

I have a file input.txt that looks as follows.
abas_1.txt
abas_2.txt
abas_3.txt
1fgh.txt
3ghl_1.txt
3ghl_2.txt
I have a folder ff. The files in this folder are abas.txt, 1fgh.txt, 3ghl.txt. Based on the input file, I would like to create and rename multiple copies in the ff folder.
For example, in the input file abas has three copies, so in the ff folder I need to create three copies of abas.txt named abas_1.txt, abas_2.txt, abas_3.txt. There is no need to copy and rename 1fgh.txt in the ff folder.
Your valuable suggestions would be appreciated.
You can try something like this (to be run from within your folder ff):
#!/bin/bash
while IFS= read -r fn; do
[[ $fn =~ ^(.+)_[[:digit:]]+\.([^\.]+)$ ]] || continue
fn_orig=${BASH_REMATCH[1]}.${BASH_REMATCH[2]}
echo cp -nv -- "$fn_orig" "$fn"
done < input.txt
Remove the echo if you're happy with it.
If you don't want to run from within the folder ff, just replace the line
echo cp -nv -- "$fn_orig" "$fn"
with
echo cp -nv -- "ff/$fn_orig" "ff/$fn"
The -n option keeps cp from overwriting existing files, and the -v option makes it verbose. The -- tells cp that there are no more options beyond this point, so that it will not be confused if one of the files starts with a hyphen.
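With the sample input.txt from the question, the dry run (before removing the echo) should print something along these lines:
cp -nv -- abas.txt abas_1.txt
cp -nv -- abas.txt abas_2.txt
cp -nv -- abas.txt abas_3.txt
cp -nv -- 3ghl.txt 3ghl_1.txt
cp -nv -- 3ghl.txt 3ghl_2.txt
1fgh.txt produces no line because it doesn't match the _<number> pattern, which is exactly the "no need to copy" case from the question.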
Using for and grep:
#!/bin/bash
for i in $(ls)
do
x=$(echo "$i" | sed 's/^\(.*\)\..*/\1/')"_"
for j in $(grep "$x" in)
do
cp -n "$i" "$j"
done
done
Try this one
#!/bin/bash
while read newFileName;do
#split the string by _ delimiter
arr=(${newFileName//_/ })
extension="${newFileName##*.}"
fileToCopy="${arr[0]}.$extension"
#check for empty : '1fgh.txt' case
if [ -n "${arr[1]}" ]; then
#check if file exists
if [ -f "$fileToCopy" ]; then
echo "copying $fileToCopy -> $newFileName"
cp "$fileToCopy" "$newFileName"
#else
# echo "File $fileToCopy does not exist, so it can't be copied"
fi
fi
done
You can call your script like this:
cat input.txt | ./script.sh
If you could change the format of input.txt, I suggest you adjust it in order to make your task easier. If not, here is my solution:
#!/bin/bash
SRC_DIR=/path/to/ff
INPUT=/path/to/input.txt
BACKUP_DIR=/path/to/backup
for cand in `ls $SRC_DIR`; do
grep "^${cand%.*}_" $INPUT | while read new
do
cp -fv $SRC_DIR/$cand $BACKUP_DIR/$new
done
done

Bash: Native way to check if an entry is one line?

I have a find script that automatically opens a file if just one file is found. The way I currently handle it is doing a word count on the number of lines of the search results. Is there an easier way to do this?
if [ "$( cat "$temp" | wc -l | xargs echo )" == "1" ]; then
edit `cat "$temp"`
fi
EDITED - here is the context of the whole script.
term="$1"
temp=".aafind.txt"
find src sql common -iname "*$term*" | grep -v 'src/.*lib' >> "$temp"
if [ ! -s "$temp" ]; then
echo "ø - including lib..." 1>&2
find src sql common -iname "*$term*" >> "$temp"
fi
if [ "$( cat "$temp" | wc -l | xargs echo )" == "1" ]; then
# just open it in an editor
edit `cat "$temp"`
else
# format output
term_regex=`echo "$term" | sed "s%\*%[^/]*%g" | sed "s%\?%[^/]%g" `
cat "$temp" | sed -E 's%//+%/%' | grep --color -E -i "$term_regex|$"
fi
rm "$temp"
Unless I'm misunderstanding, the variable $temp contains one or more filenames, one per line, and if there is only one filename it should be edited?
[ $(wc -l <<< "$temp") = "1" ] && edit "$temp"
If $temp is a file containing filenames:
[ $(wc -l < "$temp") = "1" ] && edit "$(cat "$temp")"
Several of the results here will read through an entire file, whereas one can stop and have an answer after one line and one character:
if { IFS='' read -r result && ! read -n 1 _; } <file; then
echo "Exactly one line: $result"
else
echo "Either no valid content at all, or more than one line"
fi
For safely reading from find, if you have GNU find and bash as your shell, replace <file with < <(find ...) in the above. Even better, in that case, is to use NUL-delimited names, such that filenames with newlines (yes, they're legal) don't trip you up:
if { IFS='' read -r -d '' result && ! read -r -d '' -n 1 _; } \
< <(find ... -print0); then
printf 'Exactly one file: %q\n' "$result"
else
echo "Either no results, or more than one"
fi
Well, given that you are storing these results in the file $temp this is a little easier:
[ "$( wc -l < $temp )" -eq 1 ] && edit "$( cat $temp )"
Instead of 'cat $temp' you can do '< $temp', but it might take away some readability if you are not very familiar with redirection 8)
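For the record, the $(< file) form that comment alludes to would look like this in bash (functionally the same as the cat version, just without the extra process):
[ "$( wc -l < "$temp" )" -eq 1 ] && edit "$(< "$temp")"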
If you want to test whether the file is empty or not, test -s does that.
if [ -s "$temp" ]; then
edit `cat "$temp"`
fi
(A non-empty file by definition contains at least one line. You should find that wc -l agrees.)
If you genuinely want a line count of exactly one, then yes, it can be simplified substantially;
if [ $( wc -l <"$temp" ) = 1 ]; then
edit `cat "$temp"`
fi
You can use arrays:
x=($(find . -type f))
[ "${#x[*]}" -eq 1 ] && echo "just one" || echo "many"
But you might have problems in case of filenames with whitespace, etc.
Still, something like this would be a native way
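To make the array approach robust against whitespace (and even newlines) in file names, one hedged variant is to have find emit NUL-delimited names and read them with mapfile; this sketch assumes bash 4.4+ and GNU find:
# fill the array NUL-safely, then count its elements
mapfile -d '' -t x < <(find . -type f -print0)
[ "${#x[@]}" -eq 1 ] && echo "just one" || echo "many"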
no this is the way, though you're making it over-complicated:
if [ "`wc -l $temp | cut -d' ' -f1`" = "1" ]; then
edit "$temp";
fi
what's complicating it is:
a useless use of cat,
an unnecessary use of xargs,
and I'm not sure if you really want the edit `cat $temp` part, which edits the file(s) named by the content of $temp rather than $temp itself.
