Adding up file sizes in bash shells - bash

I've written a shell script that takes a directory as an argument and prints the file names and sizes. I want to add up the file sizes and store the total so that I can print it after the loop. I've tried a few things but haven't gotten anywhere so far. Any ideas?
#!/bin/bash
echo "Directory <$1> contains the following files:"
let "x=0"
TEMPFILE=./count.tmp
echo 0 > $TEMPFILE
ls $1 |
while read file
do
if [ -f $1/$file ]
then
echo "file: [$file]"
stat -c%s $file > $TEMPFILE
fi
cat $TEMPFILE
done
echo "number of files:"
cat ./count.tmp
Help would be thoroughly appreciated.

A number of issues in your code:
Don't parse ls
Quote variables in the large majority of cases
Don't use temp files when they're not needed
Use already made tools like du for this (see comments)
Assuming you just want to get practice at this and/or want to do something other than what du already does, you could restructure it to something like
#!/bin/bash
dir="$1"
[[ $dir == *'/' ]] || dir="$dir/"
if [[ -d $dir ]]; then
echo "Directory <$1> contains the following files:"
else
echo "<$1> is not a valid directory, exiting"
exit 1
fi
shopt -s dotglob
for file in "$dir"*; do
if [[ -f $file ]]; then
echo "file: [$file]"
((size+=$(stat -c%s "$file")))
fi
done
echo "$size"
Note:
You don't have to pre-initialize variables in bash; in an arithmetic context an unset $size is treated as 0
You can use (()) for math that doesn't require decimal places.
You can use globs (*) to get all files (including dirs, symlinks, etc...) in a particular directory (and globstar ** for recursive)
shopt -s dotglob is needed so hidden (dot) files are included in glob matching.
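For comparison, the du route mentioned above can be sketched like this (an assumption: GNU du, whose -b flag reports apparent sizes in bytes):

```shell
#!/bin/bash
# Total the apparent byte sizes of the regular files directly in a directory.
# find picks only regular files; du -cb adds a grand "total" line at the end.
dir="${1:-.}"
find "$dir" -maxdepth 1 -type f -exec du -cb {} + | tail -n1
```

With very many files, find may invoke du more than once and print several "total" lines, so for huge directories summing the per-file sizes with awk is safer.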

You can use ls -l to find size of files:
echo "Directory $1 contains the following:"
size=0
for f in "$1"/*; do
if [[ ! -d $f ]]; then
while read _ _ _ _ bytes _; do
if [[ -n $bytes ]]; then
((size+=$bytes))
echo -e "\tFile: ${f/$1\//} Size: $bytes bytes"
fi
done < <(ls -l "$f")
fi
done
echo "$1 Files total size: $size bytes"
Parsing ls results for size is ok here as byte size will always be found in the 5th field.
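Since the byte size sits in field 5, the whole summation can also be done in a single awk pass, a compact variant of the loop above (a sketch; the /^-/ test keeps only regular-file lines, which also skips the leading "total" line):

```shell
#!/bin/bash
# Sum the 5th field (byte size) of every regular-file line in ls -l output.
dir="${1:-.}"
ls -l "$dir" | awk '/^-/ {sum += $5} END {print sum + 0}'
```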
If you know what the date stamp format for ls is on your system and portability isn't important, you can parse ls to reliably find both the size and file in a single while read loop.
echo "Directory $1 contains the following:"
size=0
while read _ _ _ _ bytes _ _ _ file; do
if [[ -f $1/$file ]]; then
((size+=$bytes))
echo -e "\tFile: $file Size: $bytes bytes"
fi
done < <(ls -l "$1")
echo "$1 Files total size: $size bytes"
Note: These solutions would not include hidden files. Use ls -la for that.
Depending on the need or preference, ls can also print sizes in a number of different formats using options like -h or --block-size=SIZE.

#!/bin/bash
echo "Directory <$1> contains the following files:"
find "${1}"/* -prune -type f -ls | \
awk '{print; SIZE+=$7} END {print ""; print "total bytes: " SIZE}'
Use find with -prune (so it does not recurse into subdirectories) and -type f (so it will only list files and no symlinks or directories) and -ls (so it lists the files).
Pipe the output into awk and
for each line print the whole line (print; replace with print $NF to only print the last item of each line, which is the filename including the directory). Also add the value of the 7th field, which is the file size (in my version of find) to the variable SIZE.
After all lines have been processed (END) print the calculated total size.
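With GNU find, -printf '%s' avoids the version-dependent field position of -ls altogether; a sketch of the same pipeline, where -maxdepth 1 plays the role of -prune:

```shell
#!/bin/bash
# Print each regular file with its size, then a total; GNU find only.
dir="${1:-.}"
find "$dir" -maxdepth 1 -type f -printf '%s %p\n' |
awk '{size += $1; print} END {print ""; print "total bytes: " size}'
```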

Related

How can I sort an array based on a non-integer substring in Bash?

I wrote a cleanup script to delete certain files. The files are stored in subfolders, and I use find to collect them into an array, recursively. So an array entry could look like this:
(path to file)
./2021_11_08_17_28_45_1733556/2021_11_12_04_15_51_1733556_0.jfr
As you can see, the filenames are timestamps. find sorts by the folder name only (./2021_11_08_17_28_45_1733556), but I need to sort all the files, which can be in different folders, by the timestamps of the files only and not of the folders (those can be completely ignored), so I can delete the oldest files first. Below is my script in its not yet properly working state; I need to add some sorting to fix my problems.
Any Ideas?
#!/bin/bash
# handle -h (help)
if [[ "$1" == "-h" || "$1" == "" ]]; then
echo -e '-p [path to the target folder] \n-f [number of files that should remain in the folder] \n-d [false to disable dryRun]'
exit 0
fi
# handle parameters
while getopts p:f:d: flag
do
case "${flag}" in
p) pathToFolder=${OPTARG};;
f) maxFiles=${OPTARG};;
d) dryRun=${OPTARG};;
*) echo -e '-p [path to the target folder] \n-f [number of files that should remain in the folder] \n-d [false to disable dryRun]'
esac
done
if [[ -z $dryRun ]]; then
dryRun=true
fi
# fill array with .jfr files, sorted so that the oldest files get deleted first
fillarray() {
files=($(find -name "*.jfr" -type f))
totalFiles=${#files[@]}
}
# Return size of file
getfilesize() {
filesize=$(du -k "$1" | cut -f1)
}
count=0
checkfiles() {
# Check if File matches the maxFiles parameter
if [[ ${#files[@]} -gt $maxFiles ]]; then
# Check if dryRun is enabled
if [[ $dryRun == "false" ]]; then
echo "msg=\"Removal result\", result=true, file=$(realpath $1) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
((count++))
rm $1
else
((count++))
echo msg="\"Removal result\", result=true, file=$(realpath $1 ) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
fi
# Remove the file from the files array
files=(${files[@]/$1})
else
echo msg="\"Removal result\", result=false, file=$( realpath $1), reason=\"within max file boundary\""
fi
}
# Scan for empty files
scanfornullfiles() {
for file in "${files[@]}"
do
filesize=$(! getfilesize $file)
if [[ $filesize == 0 ]]; then
files=(${files[@]/$file})
echo msg="\"Removal result\", result=false, file=$(realpath $file), reason=\"empty file\""
fi
done
}
echo msg="jfrcleanup.sh started", maxFiles=$maxFiles, dryRun=$dryRun, directory=$pathToFolder
{
cd $pathToFolder > /dev/null 2>&1
} || {
echo msg="no permission in directory"
echo msg="jfrcleanup.sh stopped"
exit 0
}
fillarray #> /dev/null 2>&1
scanfornullfiles
for file in "${files[@]}"
do
checkfiles $file
done
echo msg="\"jfrcleanup.sh finished\", totalFileCount=$totalFiles filesRemoved=$count"
Assuming the file paths do not contain newline characters, would you please try the following Schwartzian transform method:
#!/bin/bash
pat="/([0-9]{4}(_[0-9]{2}){5})[^/]*\.jfr$"
while IFS= read -r -d "" path; do
if [[ $path =~ $pat ]]; then
printf "%s\t%s\n" "${BASH_REMATCH[1]}" "$path"
fi
done < <(find . -type f -name "*.jfr" -print0) | sort -k1,1 | head -n 1 | cut -f2- | tr "\n" "\0" | xargs -0 echo rm
The string pat is a regex pattern to extract the timestamp from the
filename such as 2021_11_12_04_15_51.
Then the timestamp is prepended to the filename delimited by a tab
character.
The output lines are sorted by the timestamp in ascending order
(oldest first).
head -n 1 picks the oldest line. If you want to change the number of files
to remove, modify the number to the -n option.
cut -f2- drops the timestamp to retrieve the filename.
tr "\n" "\0" protects the filenames which contain whitespaces or
tab characters.
xargs -0 echo rm just outputs the command lines as a dry run.
If the output looks good, drop echo.
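To see the decorate-sort-undecorate idea in isolation, here it is on two made-up paths (the directory names are purely illustrative): each path is prefixed with its extracted timestamp, sorted on that prefix, and the prefix is then stripped, so the oldest file comes out first.

```shell
#!/usr/bin/env bash
# Decorate with the timestamp, sort by it, undecorate with cut.
pat="/([0-9]{4}(_[0-9]{2}){5})"
printf '%s\n' "new_dir/2021_11_12_04_15_51_x.jfr" "old_dir/2021_11_08_17_28_45_y.jfr" |
while IFS= read -r path; do
  [[ $path =~ $pat ]] && printf '%s\t%s\n' "${BASH_REMATCH[1]}" "$path"
done | sort -k1,1 | cut -f2-
```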
If you have GNU find, and pathnames don't contain new-line ('\n') and tab ('\t') characters, the output of this command will be ordered by basenames:
find path/to/dir -type f -printf '%f\t%p\n' | sort | cut -f2-
TL;DR, but since you're using find, and if it supports the -printf option, something like:
find . -type f -name '*.jfr' -printf '%f/%h/%f\n' | sort -k1 -n | cut -d '/' -f2-
Otherwise a while read loop with another -printf option.
#!/usr/bin/env bash
while IFS='/' read -rd '' time file; do
printf '%s\n' "$file"
done < <(find . -type f -name '*.jfr' -printf '%T@/%p\0' | sort -zn)
Note that -printf from find and the -z flag of sort are GNU extensions.
To save the file names instead of printing them, you could change
printf '%s\n' "$file"
to an append into an array named files:
files+=("$file")
Then "${files[@]}" has the file names as elements.
The last snippet, with the while read loop, does not depend on the file names but on the timestamp reported by GNU find.
I solved the problem! I sort the array with the following so the oldest files will be deleted first:
files=($(printf '%s\n' "${files[@]}" | sort -t/ -k3))

Grab random files from a directory using just bash

I am looking to create a bash script that can grab files fitting a certain glob pattern and cp them to another folder for example
$foo\
a.txt
b.txt
c.txt
e.txt
f.txt
g.txt
run script that request 2 files I would get
$bar\
c.txt
f.txt
I am not sure if bash has a random number generator or how to use one to pull from a list. The directory is large as well (over 100K files), so some of the glob-based approaches won't work.
Thanks in advance
Using GNU shuf, this copies N random files matching the given glob pattern in the given source directory to the given destination directory.
#!/bin/bash -e
shopt -s failglob
n=${1:?} glob=${2:?} source=${3:?} dest=${4:?}
declare -i rand
IFS=
[[ -d "$source" ]]
[[ -d "$dest" && -w "$dest" ]]
cd "$dest"
dest=$PWD
cd "$OLDPWD"
cd "$source"
printf '%s\0' $glob |
shuf -zn "$n" |
xargs -0 cp -t "$dest"
Use like:
./cp-rand 2 '?.txt' /source/dir /dest/dir
This will work for a directory containing thousands of files. xargs will manage limits like ARG_MAX.
$glob, unquoted, undergoes filename expansion (glob expansion). Because IFS is empty, the glob pattern can contain whitespace.
Matching sub-directories will cause cp to error and a premature exit (some files may have already been copied). cp -r to allow sub-directories.
cp -t target and xargs -0 are not POSIX.
Note that using a random number to select files from a list can cause duplicates, so you might copy fewer than N files. Hence using GNU shuf.
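The without-replacement property is exactly what shuf provides; picking 2 of 5 names never yields a duplicate (assuming GNU shuf is installed; the file names are illustrative):

```shell
#!/bin/bash
# shuf -n K samples K distinct lines from its input, never repeating one.
printf '%s\n' a.txt b.txt c.txt d.txt e.txt | shuf -n 2
```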
Try this:
#!/bin/bash
sourcedir="files"
# Arguments processing
if [[ $# -ne 1 ]]
then
echo "Usage: random_files.bash NUMBER-OF-FILES"
echo " NUMBER-OF-FILES: how many random files to select"
exit 0
else
numberoffiles="$1"
fi
# Validations
listoffiles=()
while IFS='' read -r line; do listoffiles+=("$line"); done < <(find "$sourcedir" -type f -print)
totalnumberoffiles=${#listoffiles[@]}
# loop on the number of files the user wanted
for (( i=1; i<=numberoffiles; i++ ))
do
# Select a random number between 0 and $totalnumberoffiles
randomnumber=$(( RANDOM % totalnumberoffiles ))
echo "${listoffiles[$randomnumber]}"
done
build an array with the filenames
pick a random number from 0 to the size of the array
display the filename at that index
I built in a loop if you want to randomly select more than one file
you can setup another argument for the location of the files, I hard coded it here.
Another method, if this one fails because of too many files in the same directory, could be:
#!/bin/bash
sourcedir="files"
# Arguments processing
if [[ $# -ne 1 ]]
then
echo "Usage: random_files.bash NUMBER-OF-FILES"
echo " NUMBER-OF-FILES: how many random files to select"
exit 0
else
numberoffiles="$1"
fi
# Validations
find "$sourcedir" -type f -print >list.txt
totalnumberoffiles=$(wc -l list.txt | awk '{print $1}')
# loop on the number of files the user wanted
for (( i=1; i<=numberoffiles; i++ ))
do
# Select a random number between 1 and $totalnumberoffiles
randomnumber=$(( ( RANDOM % totalnumberoffiles ) + 1 ))
sed -n "${randomnumber}p" list.txt
done
/bin/rm -f list.txt
build a list of the files, so that each filename will be on one line
select a random number
in this one, randomnumber must be offset by +1, since sed line numbering starts at 1, not at 0 like array indices.
use sed to print the random line from the list of files
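If GNU shuf is available, the loop and the +1 offset of this second variant collapse into a single call; a sketch reusing the same list-file idea (argument defaults added so the fragment runs standalone):

```shell
#!/bin/bash
sourcedir="${2:-files}" numberoffiles="${1:-1}"
# One shuf call picks N distinct random lines; no sed, no off-by-one.
find "$sourcedir" -type f -print > list.txt
shuf -n "$numberoffiles" list.txt
rm -f list.txt
```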

Bash command does not work in script but in console

I have the following two commands in a script where I want to check whether all files in a directory are media files:
1 All_LINES=$(ls -1 | wc -l)
2 echo "Number of lines: ${All_LINES}"
3
4 REACHED_LINES=$(ls -1 *(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) | wc -l)
5 echo "Number of reached lines: ${REACHED_LINES}"
if[...]
Running lines 4 and 5 sequentially in a shell works as expected, counting all files ending with .jpg, .JPG, etc.
Running all together in a script gives the following error though:
Number of lines: 12
/home/andreas/.bash_scripts/rnimgs: command substitution: line 17: syntax error near unexpected token `('
/home/andreas/.bash_scripts/rnimgs: command substitution: line 17: `ls -1 *(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) | wc -l)'
Number of reached lines:
Could somebody explain this to me, please?
EDIT: This is as far as I got:
#!/bin/bash
# script to rename images, sorted by "type" and "date modified" and named by current folder
#get/set basename for files
CURRENT_BASENAME=$(basename "${PWD}")
echo -e "Current directory/basename is: ${CURRENT_BASENAME}\n"
read -e -p "Please enter basename: " -i "${CURRENT_BASENAME}" BASENAME
echo -e "\nNew basename is: ${BASENAME}\n"
#start
echo -e "START RENAMING"
#get nr of all files in directory
All_LINES=$(ls -1 | wc -l)
echo "Number of lines: ${All_LINES}"
#get nr of media files in directory
REACHED_LINES=$(ls -1 *(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) | wc -l)
echo "Number of reached lines: ${REACHED_LINES}"
EDIT1: Thanks again guys, this is my result so far. Still room for improvement, but a start and ready to test.
#!/bin/bash
#script to rename media files to a choosable name (default: ${basename} of current directory) and sorted by date modified
#config
media_file_extensions="(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg)"
#enable option extglob (extended globbing): If set, the extended pattern matching features described above under Pathname Expansion are enabled.
#more info: https://askubuntu.com/questions/889744/what-is-the-purpose-of-shopt-s-extglob
#used for regex
shopt -s extglob
#safe and set IFS (The Internal Field Separator): IFS is used for word splitting after expansion and to split lines into words with the read builtin command.
#more info: https://bash.cyberciti.biz/guide/$IFS
#used to get blanks in filenames
SAVEIFS=$IFS;
IFS=$(echo -en "\n\b");
#get and print current directory
basedir=$PWD
echo "Directory:" $basedir
#get and print nr of files in current directory
all_files=( "$basedir"/* )
echo "Number of files in directory: ${#all_files[@]}"
#get and print nr of media files in current directory
media_files=( "$basedir"/*${media_file_extensions} )
echo -e "Number of media files in directory: ${#media_files[@]}\n"
#validation if #all_files = #media_files
if [ ${#all_files[@]} -ne ${#media_files[@]} ]
then
echo "ABORT - YOU DID NOT REACH ALL FILES, PLEASE CHECK YOUR FILE ENDINGS"
exit
fi
#make a copy
backup_dir="backup_95f528fd438ef6fa5dd38808cdb10f"
backup_path="${basedir}/${backup_dir}"
mkdir "${backup_path}"
rsync -r "${basedir}/" "${backup_path}" --exclude "${backup_dir}"
echo "BACKUP MADE"
echo -e "START RENAMING"
#set new basename
basename=$(basename "${PWD}")
read -e -p "Please enter file basename: " -i "$basename" basename
echo -e "New basename is: ${basename}\n"
#variables
counter=1;
new_name="";
file_extension="";
#iterate over files
for f in $(ls -1 -t -r *${media_file_extensions})
do
#catch file name
echo "Current file is: $f"
#catch file extension
file_extension="${f##*.}";
echo "Current file extension is: ${file_extension}"
#create new name
new_name="${basename}_${counter}.${file_extension}"
echo "New name is: ${new_name}";
#rename file
mv $f "${new_name}";
echo -e "Counter is: ${counter}\n"
((counter++))
done
#get and print nr of media files before
echo "Number of media files before: ${#media_files[@]}"
#get and print nr of media files after
media_files=( "$basedir"/*${media_file_extensions} )
echo -e "Number of media files after: ${#media_files[@]}\n"
#delete backup?
while true; do
read -p "Do you wish to keep the result? " yn
case $yn in
[Yy]* ) rm -r ${backup_path}; echo "BACKUP DELETED"; break ;;
[Nn]* ) rm -r !(${backup_dir}); rsync -r "${backup_path}/" "${basedir}"; rm -r ${backup_path}; echo "BACKUP RESTORED THEN DELETED"; break;;
* ) echo "Please answer yes or no.";;
esac
done
#reverse IFS to default
IFS=$SAVEIFS;
echo -e "END RENAMING"
You don't need to and don't want to use ls at all here. See https://mywiki.wooledge.org/ParsingLs
Also, don't use uppercase for your private variables. See Correct Bash and shell script variable capitalization
#!/bin/bash
shopt -s extglob
read -e -p "Please enter basename: " -i "$PWD" basedir
all_files=( "$basedir"/* )
echo "Number of files: ${#all_files[@]}"
media_files=( "$basedir"/*(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) )
echo "Number of media files: ${#media_files[@]}"
As @chepner already pointed out in a comment, you likely need to explicitly enable extended globbing in your script. c.f. Greg's WIKI
It's also possible to condense that pattern to eliminate some redundancy and add mixed case if you like -
$: ls -1 *.*([Jj][Pp]*([Ee])[Gg]|[Pp][Nn][Gg])
a.jpg
b.JPG
c.jpeg
d.JPEG
mixed.jPeG
mixed.pNg
x.png
y.PNG
You can also accomplish this without ls, which is error-prone. Try this:
$: all_lines=(*)
$: echo ${#all_lines[@]}
55
$: reached_lines=( *.*([Jj][Pp]*([Ee])[Gg]|[Pp][Nn][Gg]) )
$: echo ${#reached_lines[@]}
8
c.f. this breakdown
If all you want is counts, but prefer not to include directories:
all_dirs=( */ )
num_files=$(( ${#all_files[@]} - ${#all_dirs[@]} ))
If there's a chance you will have a directory with a name that matches your jpg/png pattern, then it gets trickier. At that point it's probably easier to just use @markp-fuso's solution.
One last thing - avoid all-caps variable names. Those are generally reserved for system stuff.
Assuming the OP wants to limit the counts to normal files (ie, exclude non-files like directories, pipes, symbolic links, etc), a solution based on find may provide more accurate counts.
Updating OP's original code to use find (ignoring dot files for now):
ALL_LINES=$(find . -maxdepth 1 -type f | wc -l)
echo "Number of lines: ${ALL_LINES}"
REACHED_LINES=$(find . -maxdepth 1 -type f \( -iname '*.jpg' -o -iname '*.png' -o -iname '*.jpeg' \) | wc -l)
echo "Number of reached lines: ${REACHED_LINES}"

Bash script - How to find substring in all txt files in directory one by one and write a message for all positive findings

This is what I'm asked to do.
Browse all text (regular / plain) files in the specified folder (the first parameter of the script) and look for the specified word in them (the second parameter of the script). If the file contains the specified word, write the message: "YES, the word $2 is in $file". Otherwise write: "NO, there is no word $2 in $file"
Here is what i came with so far:
#!/bin/bash
FILES=$1
for file in $FILES
do
if [ $file == "*$2*" ];then
echo "YES, the word $2 is in $file"
else
echo "NO, there is no word $2 in $file"
fi
done
Using find
#!/usr/bin/env bash
pattern=$2
folder=$1
export pattern
find "$folder" -type f -exec sh -c '
for f; do
if grep -q "$pattern" "$f"; then
printf "%s is in %s\n" "$pattern" "$f"
else
printf "%s is not found in %s\n" "$pattern" "$f"
fi
done' _ {} +
Howto use.
./myscript folder pattern
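If the folder is flat and you don't need to recurse, a plain glob loop does the same job without find; a sketch using the exact messages from the assignment (grep -q reports only an exit status, printing nothing):

```shell
#!/usr/bin/env bash
folder=${1:-.} word=${2:-word}
for file in "$folder"/*; do
  [[ -f $file ]] || continue          # skip directories and other non-files
  if grep -q "$word" "$file"; then
    echo "YES, the word $word is in $file"
  else
    echo "NO, there is no word $word in $file"
  fi
done
```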

Why is my while loop not working?

AIM: To find files with a word count less than 1000 and move them another folder. Loop until all under 1k files are moved.
STATUS: It will only move one file, then error with "Unable to move file as it doesn't exist". For some reason $INPUT_SMALL doesn't seem to update with the new file name.
What am I doing wrong?
Current Script:
Check for input files already under 1k and move to Split folder
INPUT_SMALL=$( ls -S /folder1/ | grep -i reply | tail -1 )
INPUT_COUNT=$( cat /folder1/$INPUT_SMALL 2>/dev/null | wc -l )
function moveSmallInput() {
while [[ $INPUT_SMALL != "" ]] && [[ $INPUT_COUNT -le 1003 ]]
do
echo "Files smaller than 1k have been found in input folder, these will be moved to the split folder to be processed."
mv /folder1/$INPUT_SMALL /folder2/
done
}
I assume you are looking for files that have the word reply somewhere in the path. My solution is:
wc -w $(find /folder1 -type f -path '*reply*') | \
while read wordcount filename
do
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" $wordcount $filename
#mv "$filename" /folder2
fi
done
Run the script once, if the output looks correct, then uncomment the mv command and run it for real this time.
Update
The above solution has trouble with files with embedded spaces. The problem occurs when the find command hands its output to the wc command. After a little bit of thinking, here is my revised solution:
find /folder1 -type f -path '*reply*' | \
while read filename
do
set $(wc -w "$filename") # $1= word count, $2 = filename
wordcount=$1
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" $wordcount $filename
#mv "$filename" /folder2
fi
done
A somewhat shorter version
#!/bin/bash
find ./folder1 -type f | while read f
do
(( $(wc -w "$f" | awk '{print $1}' ) < 1000 )) && cp "$f" folder2
done
I left cp instead of mv for safety reasons. Change to mv after validating.
If you also want to filter on reply, use @Hai's version of the find command.
Your variables INPUT_SMALL and INPUT_COUNT are not functions, they're just values you assigned once. You either need to move them inside your while loop or turn them into functions and evaluate them each time (rather than just expanding the variable values, as you are now).
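To make the idea concrete, here is a minimal sketch of the recompute-inside-the-loop fix, with the folders passed as parameters; the ls parsing and the 1003-line threshold are kept from the question for familiarity, even though a find-based walk would be more robust:

```shell
#!/bin/bash
# Move "reply" files under the line threshold from $1 to $2,
# re-evaluating the candidate file and its line count on every pass.
moveSmallInput() {
  local src=$1 dest=$2 small count
  while :; do
    small=$(ls -S "$src" | grep -i reply | tail -1)   # re-evaluated each pass
    [[ -z $small ]] && break                          # no matching files left
    count=$(wc -l < "$src/$small")
    (( count > 1003 )) && break                       # smallest file is too big
    echo "Moving $small ($count lines) to $dest"
    mv "$src/$small" "$dest"/
  done
}
```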
