Select modified files using AWK - bash

I am working on a task for which AWK is the designated tool.
The task is to list files that are:
modified today (the same day the script is run)
of size 1 MiB or less (size <= 1048576 bytes)
The user's input specifies where to start the search. Files are searched for recursively.
Script:
#!/bin/bash
#User's input target for search of files.
target="$1"
#Absolute path of target.
ap="$(realpath $target)"
echo "Start search in: $ap/*"
#Today's date (yyyy-mm-dd).
today="$(date '+%x')"
#File(s) modified today.
filemod="$(find $target -newermt $today)"
#Loop through files modified today.
for fm in $filemod
do
#Print name and size of file if no larger than 1 MiB.
ls -l $fm | awk '{if($5<=1048576) print $5"\t"$9}'
done
My problem is that the for loop completely ignores the size limit on files!
Every variable gets its intended value, and AWK does what it should outside of a for loop. I've experimented with quotation marks to no avail.
Can anyone tell me what's wrong?
I appreciate any feedback, thanks.
Update:
I've solved it by searching explicitly for files:
filemod="$(find $target -type f -newermt $today)"
Why does that matter?

Don't parse 'ls' output. Use stat instead, like this:
for fm in $filemod; do
    size=$(stat --printf='%s\n' "$fm")
    if (( size <= 1048576 )); then
        printf "%s\t%s\n" "$size" "$fm"
    fi
done
The above method is not immune to file names that contain whitespace or wildcard characters. To handle such files gracefully, do this:
while IFS= read -r -d '' file; do
    size=$(stat --printf='%s\n' "$file")
    if (( size <= 1048576 )); then
        printf "%s\t%s\n" "$size" "$file"
    fi
done < <(find "$target" -type f -newermt "$today" -print0)  # -type f skips directories, per your update
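As for the update: without -type f, find also returns directories, and ls -l on a directory lists the directory's contents rather than the directory itself, so your awk filter was likely testing the wrong lines. A quick illustration (hypothetical paths):
mkdir -p demo/sub
touch demo/sub/inner
ls -l demo/sub    # one line per entry *inside* demo/sub, not demo/sub itself
ls -ld demo/sub   # -d shows the entry for the directory itself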
See also:
How to loop through file names returned by find?


Add file name to txt file if it is not zero bytes

I would like to add files that meet a set of conditions to a txt file for easy transfer later. I do this with:
ls -1 > AllFilesPresent.txt
value=$(<AllFilesPresent.txt)
rm AllFilesPresent.txt
for val in $value; do
case $val in
(Result*.RData) echo "$val" >> CompletedJobs.txt ;;
esac
done
I've run into a situation where some of the files are corrupted and show up as zero-byte files, which I can find manually with:
find . -size 0 -maxdepth 1
How do I adjust my loop to reject files that are zero bytes?
The code
ls -1 > AllFilesPresent.txt
value=$(<AllFilesPresent.txt)
rm AllFilesPresent.txt
for val in $value; do
has essentially identical functionality to
for val in $(ls -1); do
That doesn't work in general. It breaks if filenames have whitespace or glob characters in them, at least. See Bash Pitfalls #1 (for f in $(ls *.mp3)). In addition, there are particular problems with using the output of ls in programs. It's only suitable for interactive use. See Why you shouldn't parse the output of ls(1).
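For example, with a file name containing a space (hypothetical name), the two loop styles behave quite differently:
touch 'one file.mp3'
for f in $(ls); do printf '[%s]\n' "$f"; done   # word splitting: prints [one] and [file.mp3]
for f in *; do printf '[%s]\n' "$f"; done       # glob: prints [one file.mp3] intact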
A correct, completely safe, and much shorter and faster alternative is:
for val in *; do
A full solution for your question is:
shopt -s nullglob
for file in Result*.RData; do
    [[ -f $file && -s $file ]] && printf '%s\n' "$file"
done >CompletedJobs.txt
shopt -s nullglob prevents glob patterns from expanding to (what amounts to) garbage if they don't match any files.
I've replaced val with the more meaningful (to me anyway) file.
The Result*.RData causes the loop to only process files that match that pattern.
I've added a -f $file test to avoid processing any non-file things (directories, fifos, ...) that might be lying around. It still allows symlinks to files through. You might not want that. You can add a ! -L $file && at the start of the test expression if you want to rule out symlinks.
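For example, with the symlink exclusion added, the test line becomes:
[[ ! -L $file && -f $file && -s $file ]] && printf '%s\n' "$file"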
I've replaced echo "$val" with printf '%s\n' "$file" because the original code doesn't work in general. See the accepted, and excellent, answer to Why is printf better than echo?.
I've moved the redirection to CompletedJobs.txt outside the loop, as suggested by @CharlesDuffy in a comment.
Note that this code won't work if any of the files have newlines in their names (e.g. create one with echo data > $'Result\n1.RData'). That's very uncommon, but possible. The only way to safely store arbitrary unquoted filenames in files is to separate them with ASCII NUL characters (which can't appear in file names). To do that, replace the printf ... with printf '%s\0' "$file". That would mean that CompletedJobs.txt is no longer a text file, though. It would also require modifications to any tools that read the file.
You could also do this just with find:
find . -maxdepth 1 -type f -name 'Result*.RData' -not -size 0 -printf '%P\n' >CompletedJobs.txt
The %P format with -printf removes the leading ./ from outputs so you get Result2.RData instead of ./Result2.RData (which find would print by default, or with the -print option).
Replace \n with \0 to make the output safe for any possible filename.
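A consumer of such a NUL-delimited list would then read it back with read -d '' rather than line by line. A minimal sketch, assuming CompletedJobs.txt was written with one of the \0 variants above:
while IFS= read -r -d '' file; do
    printf 'processing %s\n' "$file"   # stand-in for the real per-file action
done < CompletedJobs.txt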
-s file: True if file exists and has a size greater than zero.
ls -1 > AllFilesPresent.txt
value=$(<AllFilesPresent.txt)
rm AllFilesPresent.txt
for val in $value; do
    if [[ -s "${val}" ]]; then
        case $val in
            (Result*.RData) echo "$val" >> CompletedJobs.txt ;;
        esac
    fi
done

How to create a txt file with a list of directory names if directories have a certain file

I have a parent directory with 800+ directories, each of which has a unique name. Some of these directories house a sub-directory called y in which a file called z (if it exists) can be found.
I need to script a loop that will check each of the 800+ for z, and if it's there, I need to append the name of the directory (the directory before y) into a text file. I'm not sure how to do this.
This is what I have
#!/bin/bash
for d in *; do
if [ -d "y"]; then
for f in *; do
if [ -f "x"]
echo $d >> IDlist.txt
fi
fi
done
Let's assume that any foo/y/z is a file (that is, you do not have directories with such names). If you had a really large number of such files, storing all the paths in a bash variable could lead to memory issues and would argue for another solution, but about 800 paths is not large. So, something like this should be OK:
declare -a names=(*/y/z)
printf '%s\n' "${names[@]%%/*}" > IDlist.txt
Explanation: the paths of all z files are first stored in the array names, thanks to the glob pattern */y/z. Then a pattern substitution is applied to each array element to strip the /y/z part: "${names[@]%%/*}". The result is printed, one name per line, by printf '%s\n'.
If you also had directories named z, or if you had millions of files, find could be used, instead, with a bit of awk to retain only the leading directory name:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
awk -F/ '{print $2}' > IDlist.txt
If you prefer sed for the post-processing:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
sed 's|^\./\(.*\)/y/z|\1|' > IDlist.txt
These two are probably also more efficient (faster).
Note: your initial attempt could also work, even if using bash loops is far less efficient, but it needs several changes:
#!/bin/bash
for d in *; do
    if [ -d "$d/y" ]; then
        for f in "$d"/y/*; do
            if [ "$f" = "$d/y/z" ]; then
                printf '%s\n' "$d" >> IDlist.txt
            fi
        done
    fi
done
As noted by @LéaGris, printf is better than echo because if $d is the string -e, for instance, echo "$d" interprets it as an option of the echo command and does not print it.
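A quick demonstration of that failure mode:
d='-e'
echo "$d"            # echo consumes -e as an option and prints only an empty line
printf '%s\n' "$d"   # prints -e, as intended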
But a simpler and more efficient version (even if not as efficient as the first proposal or the find-based ones) would be:
#!/bin/bash
for d in *; do
    if [ -f "$d/y/z" ]; then
        printf '%s\n' "$d"
    fi
done > IDlist.txt
As you can see, there is another improvement (also suggested by @LéaGris), which is to redirect the output of the entire loop to the IDlist.txt file. This opens and closes the file only once, instead of once per iteration.
This should solve it:
for f in */y/z; do
    [ -f "$f" ] && echo "${f%%/*}"
done
Note:
If there is a possibility of a weird top-level directory name like -e, use printf instead of echo, as noted in the other answers.
This should do it:
shopt -s nullglob
outfile=IDlist.txt
>"$outfile"
for found in */y/z
do
    [[ -f $found ]] && echo "${found%%/*}" >>"$outfile" # Drop the /y/z part
done
The nullglob ensures that the loop is skipped if there is no match, and the quotes in the echo ensure that the directory name is output correctly even if it contains two successive spaces.
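You can see the nullglob effect with a pattern that matches nothing (hypothetical directory name):
shopt -u nullglob
for f in nosuchdir/*/y/z; do echo "$f"; done   # prints the literal pattern once
shopt -s nullglob
for f in nosuchdir/*/y/z; do echo "$f"; done   # loop body never runs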
You can first do some filtering using find.
The following will list all z files recursively within the current directory:
find . -type f -name z
Then let's say one of the output lines was
./dir001/y/z
You can then extract the required part using multiple tools: grep, sed, awk, etc.
e.g. with grep
find . -type f | grep z | grep -E -o "y.*$"
will give
y/z
The first example doesn't check that z is a file, but I think it's worth showing compgen:
#!/bin/bash
compgen -G '*/y/z' | sed 's|/.*||' > IDlist.txt
Doing glob expansion, file check and path splitting with perl only:
perl -E 'foreach $p (glob "*/y/z") {say substr($p, 0, index($p, "/")) if -f $p}' > IDlist.txt

Identifying folder with name as largest number in the directory

There is a directory which contains folders named with numbers; I have to find the folder with the largest number in that directory.
This is the script I've written to find that folder:
files='ls path/'
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
But it's not working. It gives the error below when the script is invoked with sudo ./restore2.sh.
ls
path/
./restore2.sh: line 6: path/: syntax error: operand expected (error token is "/")
0
Try this:
#!/bin/bash
files=`ls path/`
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
There's a backtick here: `ls path/` instead of single or double quotes.
I've only corrected this statement and it worked. Also notice the #!/bin/bash added at the top of the script. This tells your system to run the script in a bash shell.
You're using single quotes instead of backticks in files='ls path/'. Bash treats it as a literal string instead of evaluating the command.
Also, for that specific task, you can just do:
ls path/ | awk '{if($1 > largest){largest = $1}} END{print largest}'
That keeps it a bit simpler.
Use find instead:
find . -maxdepth 1 -type d -regextype "posix-extended" -regex "^\./[[:digit:]]+$" -printf '%P\n' | sort -n | tail -1
Set maxdepth to 1 to check for directories within this directory only and no deeper. Set the regular expression type to posix-extended and match directories whose names consist of one or more digits. Print the names without the leading ./ (the -printf '%P\n') so that sort -n can compare them numerically, then take the largest one with tail -1.
Does path/ have any files in it? It looks like it's empty.
You should be getting a completely different complaint...
You don't want the path info in the filename. Rather than strip it with ${file##*/}, just go there and use non-path'd names.
An adaptation using your own logic as its base -
cd /whatever/path/ # go where the files are
var=-1 # initialize comparator
for file in [0-9]* # each entry that starts with a digit
do [[ "$file" =~ [^0-9] ]] && continue # skip any file with nondigit contents
[[ -f "$file" ]] || continue # only process plain files
(( file > var )) && var=$file # remember largest seen
done
echo $var # report largest
If you are sure there will be no negative numbered filenames, this should do it.
If there can be valid negatives, then your initialization needs to be appropriately lower, and the exclusion of nondigits should include the minus sign, as well as the list of files to select.
Note that this doesn't parse ls and doesn't require piping through a sort or spawning any other processes -- it's all handled in the bash interpreter and should be pretty efficient.
If you are sure of your data, and know there aren't any negatives or files named just 0 or non-plain-file entries in the directory that match the [0-9]* pattern, you can simplify it to just
cd /whatever/path/ # go where the files are
for file in [0-9]*; do (( file > var )) && var=$file; done
echo $var # report largest
As an aside, if you wanted to preserve the "make a list first" logic, you should still NOT use ls. Use an array.
cd /wherever/your/files/are/
files=( [0-9]* )
for file in "${files[@]}"
do : ...

Iterate through several files in bash [duplicate]

This question already has answers here:
How to zero pad a sequence of integers in bash so that all have the same width?
I have a folder with several files that are named like this:
file.001.txt.gz, file.002.txt.gz, ... , file.150.txt.gz
What I want to do is use a loop to run a program with each file. I was thinking of something like this (just a sketch):
for i in {1:150}
gunzip file.$i.txt.gz
./my_program file.$i.txt output.$1.txt
gzip file.$1.txt
First of all, I don't know if something like this is going to work, and second, I can't figure out how to keep the three-digit numbering the files have ('001' instead of just '1').
Thanks a lot
The syntax for ranges in bash is
{1..150}
not {1:150}.
Moreover, if your bash is recent enough, you can add the leading zeroes:
{001..150}
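For example:
echo {001..005}   # 001 002 003 004 005 (zero-padded expansion needs bash 4.0 or later)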
The correct syntax of the for loop needs do and done.
for i in {001..150} ; do
# ...
done
It's unclear what $1 contains in your script.
To iterate over files I believe the simpler way is:
(assuming there are no files named 'file.*.txt' already in the directory and that your output file can have a different name)
for i in file.*.txt.gz; do
    gunzip "$i"                                     # leaves file.NNN.txt
    ./my_program "${i%.gz}" "${i%.gz}-output.txt"
    gzip "${i%.gz}"                                 # re-compress just this file
done
Using the find command:
# Path to the source directory
dir="./"
while IFS= read -r file
do
    output="$(basename "$file")"
    output="$(dirname "$file")/${output/#file/output}"
    echo "$file ==> $output"
done < <(find "$dir" \
    -regextype 'posix-egrep' \
    -regex '.*file\.[0-9]{3}\.txt\.gz$')
The same via pipe:
find "$dir" \
-regextype 'posix-egrep' \
-regex '.*file\.[0-9]{3}\.txt\.gz$' | \
while read file
do
output="$(basename "$file")"
output="$(dirname "$file")/"${output/#file/output}
echo "$file ==> $output"
done
Sample output
/home/ruslan/tmp/file.001.txt.gz ==> /home/ruslan/tmp/output.001.txt.gz
/home/ruslan/tmp/file.002.txt.gz ==> /home/ruslan/tmp/output.002.txt.gz
(for $dir=/home/ruslan/tmp/).
Description
The scripts iterate over the files in the $dir directory. The $file variable is filled with the next line read from the find command.
The find command returns a list of paths corresponding to the regular expression '.*file\.[0-9]{3}\.txt\.gz$'.
The $output variable is built from two parts: basename (path without directories) and dirname (path to file's directory).
The ${output/#file/output} expression replaces file with output at the front of the $output variable (see Manipulating Strings).
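For instance:
s="file.001.txt.gz"
echo "${s/#file/output}"   # output.001.txt.gz: "file" is replaced only at the start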
Try:
for i in $(seq -w 1 150)   # -w adds the leading zeroes
do
    gunzip file."$i".txt.gz
    ./my_program file."$i".txt output."$i".txt
    gzip file."$i".txt
done
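Note that seq -w pads with leading zeroes to the width of the largest value, so the range end controls the number of digits:
seq -w 1 150 | head -n 3   # prints 001, 002, 003: three digits because 150 has three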
The syntax for ranges is as choroba said, but when iterating over files you usually want to use a glob. If you know all the files have three digits in their names, you can match on the digits:
shopt -s nullglob
for f in file.0[0-9][0-9].txt.gz file.1[0-4][0-9].txt.gz file.150.txt.gz; do
    base=${f%.gz}                                # file.NNN.txt
    gunzip "$f"
    ./my_program "$base" "output.${base#file.}"  # writes output.NNN.txt
    gzip "$base"
done
This will only iterate through files that exist. If you use the range expression, you have to take extra care not to try to operate on files that don't exist.
for i in file.{000..150}.txt.gz; do
    [[ -e "$i" ]] || continue
    ...otherstuff
done

Bash to determine file size

Still learning bash, but I had some questions regarding my script.
My goal with the script is to scan a folder of jpg images and, if an image is 34.9 kB, report that the file is not present. 34.9 kB is the size of the placeholder image that shows "image not present".
#!/bin/bash
#Location
DIR="/mnt/windows/images"
file=file.jpg
badfile=12345
actualsize=$(du -b "$file" | cut -f 1)
if [ $actualsize -ge $badfile ]; then
echo $file does not exist >> results.txt
else
echo $file exists >> results.txt
fi
I need it to print each result line to a txt file named results.txt. In my research some people suggested using du -b and others stat -c '%s', but I could not see the pros and cons of one versus the other. Should the print-to-file redirection come after the if/else, or stay with the if, since I'm printing for each file? I need to print the name and result on the same line. What would be the best way to echo the file?
stat -c '%s' will give you the file size and nothing else, while du -b will include the file name in the output, so you'll have to use, for instance, cut or awk to get just the file size. For your requirements I'd go with stat.
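To illustrate the difference (hypothetical file and size):
du -b file.jpg          # prints the size, a tab, and the name: 35738  file.jpg
stat -c '%s' file.jpg   # prints the size only: 35738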
Based on your question and your comments on your following question I'm assuming what you want to do is:
Iterate through all the *.jpg files in a specific directory
Run different commands depending on the size of the image
Specifically, you want to print "[filename] does not exist" if the file is of size 40318 bytes.
If my assumptions are close, then this should get you started:
# Location
DIR="/home/lsc"
# Size to match
BADSIZE=40318
find "$DIR" -maxdepth 1 -name "*.jpg" | while read filename; do
FILESIZE=$(stat -c "%s" "$filename") # get file size
if [ $FILESIZE -eq $BADSIZE ]; then
echo "$filename has a size that matches BADSIZE"
else
echo "$filename is fine"
fi
done
Note that I've used "find ... | while read filename" instead of "for filename in *.jpg" because the former can better handle paths that contain spaces.
Also note that $filename will contain the full path to the file (e.g. /mnt/windows/images/pic.jpg). If you want to print only the filename without the path, you can use either:
echo "${filename##*/}"
or:
echo "$(basename "$filename")"
The first uses Bash string manipulation, which is more efficient but less readable; the latter does so by making a call to basename.
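For example:
filename=/mnt/windows/images/pic.jpg
echo "${filename##*/}"           # pic.jpg, via parameter expansion (no extra process)
echo "$(basename "$filename")"   # pic.jpg, via the external basename utility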
