Why is while not working? - bash

AIM: To find files with a word count less than 1000 and move them to another folder. Loop until all files under 1k are moved.
STATUS: It will only move one file, then error with "Unable to move file as it doesn't exist". For some reason $INPUT_SMALL doesn't seem to update with the new file name.
What am I doing wrong?
Current Script:
# Check for input files already under 1k and move to Split folder
INPUT_SMALL=$( ls -S /folder1/ | grep -i reply | tail -1 )
INPUT_COUNT=$( cat /folder1/$INPUT_SMALL 2>/dev/null | wc -l )
function moveSmallInput() {
while [[ $INPUT_SMALL != "" ]] && [[ $INPUT_COUNT -le 1003 ]]
do
echo "Files smaller than 1k have been found in input folder, these will be moved to the split folder to be processed."
mv /folder1/$INPUT_SMALL /folder2/
done
}

I assume you are looking for files that have the word reply somewhere in the path. My solution is:
wc -w $(find /folder1 -type f -path '*reply*') | \
while read wordcount filename
do
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" $wordcount $filename
#mv "$filename" /folder2
fi
done
Run the script once; if the output looks correct, uncomment the mv command and run it for real.
Update
The above solution has trouble with files with embedded spaces. The problem occurs when the find command hands its output to the wc command. After a little bit of thinking, here is my revised solution:
find /folder1 -type f -path '*reply*' | \
while IFS= read -r filename
do
set -- $(wc -w "$filename") # $1 = word count, $2 = filename
wordcount=$1
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" "$wordcount" "$filename"
#mv "$filename" /folder2
fi
done

A somewhat shorter version
#!/bin/bash
find ./folder1 -type f | while read f
do
(( $(wc -w "$f" | awk '{print $1}' ) < 1000 )) && cp "$f" folder2
done
I left cp instead of mv for safety reasons. Change it to mv after validating.
If you also want to filter on reply, use @Hai's version of the find command.

Your variables INPUT_SMALL and INPUT_COUNT are not functions, they're just values you assigned once. You either need to move them inside your while loop or turn them into functions and evaluate them each time (rather than just expanding the variable values, as you are now).
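As a minimal sketch of that advice, the candidate file and its line count can be recomputed on every pass through the loop. The function names smallest_reply_file and move_small_inputs and the directory arguments are hypothetical; the 1003-line default is taken from the question, and a glob replaces the ls | grep pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: function names and arguments are assumptions, not the asker's code.

# Print the basename of the smallest (by bytes) regular file in $1 whose
# name contains "reply" in any letter case; print nothing if there is none.
smallest_reply_file() {
    local dir=$1 f best= best_size=0 size
    for f in "$dir"/*; do
        [[ -f $f && ${f##*/} == *[Rr][Ee][Pp][Ll][Yy]* ]] || continue
        size=$(wc -c < "$f")
        if [[ -z $best ]] || (( size < best_size )); then
            best=$f
            best_size=$size
        fi
    done
    if [[ -n $best ]]; then
        printf '%s\n' "${best##*/}"
    fi
}

# Move files from $1 to $2 while the smallest candidate has at most $3 lines.
move_small_inputs() {
    local src=$1 dst=$2 limit=${3:-1003} name count
    # The candidate is looked up again on every iteration.
    while name=$(smallest_reply_file "$src") && [[ -n $name ]]; do
        count=$(wc -l < "$src/$name")
        (( count <= limit )) || break
        mv -- "$src/$name" "$dst"/ || break   # stop instead of looping forever
    done
}
```

It would be called as, for example, move_small_inputs /folder1 /folder2. Because the lookup happens inside the loop condition, a file that has already been moved is never selected twice.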

Related

How can I sort an array based on a non-integer substring in Bash?

I wrote a cleanup script to delete certain files. The files are stored in subfolders. I use find to collect those files into an array, and the search is recursive because of find. So an array entry could look like this:
(path to File)
./2021_11_08_17_28_45_1733556/2021_11_12_04_15_51_1733556_0.jfr
As you can see, the file names are timestamps. find sorts by the folder name only (./2021_11_08_17_28_45_1733556), but I need to sort all files, which can be in different folders, by the timestamp of the file only and not of the folder (the folders can be ignored completely), so that I can delete the oldest files first. Here is my script in its current, not properly working state; I need to add some sorting to fix my problem.
Any ideas?
#!/bin/bash
# handle -h (help)
if [[ "$1" == "-h" || "$1" == "" ]]; then
echo -e '-p [path to the target folder] \n-f [number of files that should remain in the folder] \n-d [false to disable dryRun]'
exit 0
fi
# handle parameters
while getopts p:f:d: flag
do
case "${flag}" in
p) pathToFolder=${OPTARG};;
f) maxFiles=${OPTARG};;
d) dryRun=${OPTARG};;
*) echo -e '-p [path to the target folder] \n-f [number of files that should remain in the folder] \n-d [false to disable dryRun]'
esac
done
if [[ -z $dryRun ]]; then
dryRun=true
fi
# fill array with .jfr files, sorted so that the oldest files get deleted first
fillarray() {
files=($(find -name "*.jfr" -type f))
totalFiles=${#files[@]}
}
# Return size of file
getfilesize() {
filesize=$(du -k "$1" | cut -f1)
}
count=0
checkfiles() {
# Check if File matches the maxFiles parameter
if [[ ${#files[@]} -gt $maxFiles ]]; then
# Check if dryRun is enabled
if [[ $dryRun == "false" ]]; then
echo "msg=\"Removal result\", result=true, file=$(realpath $1) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
((count++))
rm $1
else
((count++))
echo msg="\"Removal result\", result=true, file=$(realpath $1 ) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
fi
# Remove the file from the files array
files=(${files[@]/$1})
else
echo msg="\"Removal result\", result=false, file=$( realpath $1), reason=\"within max file boundary\""
fi
}
# Scan for empty files
scanfornullfiles() {
for file in "${files[@]}"
do
filesize=$(! getfilesize $file)
if [[ $filesize == 0 ]]; then
files=(${files[@]/$file})
echo msg="\"Removal result\", result=false, file=$(realpath $file), reason=\"empty file\""
fi
done
}
echo msg="jfrcleanup.sh started", maxFiles=$maxFiles, dryRun=$dryRun, directory=$pathToFolder
{
cd $pathToFolder > /dev/null 2>&1
} || {
echo msg="no permission in directory"
echo msg="jfrcleanup.sh stopped"
exit 0
}
fillarray #> /dev/null 2>&1
scanfornullfiles
for file in "${files[@]}"
do
checkfiles $file
done
echo msg="\"jfrcleanup.sh finished\", totalFileCount=$totalFiles filesRemoved=$count"
Assuming the file paths do not contain newline characters, would you please try
the following Schwartzian transform method:
#!/bin/bash
pat="/([0-9]{4}(_[0-9]{2}){5})[^/]*\.jfr$"
while IFS= read -r -d "" path; do
if [[ $path =~ $pat ]]; then
printf "%s\t%s\n" "${BASH_REMATCH[1]}" "$path"
fi
done < <(find . -type f -name "*.jfr" -print0) | sort -k1,1 | head -n 1 | cut -f2- | tr "\n" "\0" | xargs -0 echo rm
The string pat is a regex pattern to extract the timestamp from the
filename such as 2021_11_12_04_15_51.
Then the timestamp is prepended to the filename delimited by a tab
character.
The output lines are sorted by the timestamp in ascending order
(oldest first).
head -n 1 picks the oldest line. To remove more files, change the number
passed to the -n option.
cut -f2- drops the timestamp to retrieve the filename.
tr "\n" "\0" protects filenames which contain whitespace or
tab characters.
xargs -0 echo rm just outputs the command lines as a dry run.
If the output looks good, drop echo.
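The decorate, sort, undecorate steps described above can be tried in isolation on a list of paths read from standard input. This is only a sketch: the function name sort_by_file_timestamp is hypothetical, it reuses the regex from the answer, and it assumes paths contain no newline or tab characters.

```shell
#!/usr/bin/env bash
# Sketch of the Schwartzian transform on paths read from stdin.
# sort_by_file_timestamp is a hypothetical name used for illustration.
pat='/([0-9]{4}(_[0-9]{2}){5})[^/]*\.jfr$'

sort_by_file_timestamp() {
    local path
    while IFS= read -r path; do
        # Decorate: prefix each matching path with the timestamp of its basename.
        if [[ $path =~ $pat ]]; then
            printf '%s\t%s\n' "${BASH_REMATCH[1]}" "$path"
        fi
    done |
        LC_ALL=C sort -t $'\t' -k1,1 |   # sort on the timestamp key only
        cut -f2-                          # undecorate: drop the key again
}
```

For example, find . -type f -name '*.jfr' | sort_by_file_timestamp prints the paths oldest first according to the timestamp in the file name; paths that do not match the pattern are dropped.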
If you have GNU find, and pathnames don't contain new-line ('\n') and tab ('\t') characters, the output of this command will be ordered by basenames:
find path/to/dir -type f -printf '%f\t%p\n' | sort | cut -f2-
TL;DR, but since you're using find, and if it supports the -printf option, try something like:
find . -type f -name '*.jfr' -printf '%f/%h/%f\n' | sort -k1 -n | cut -d '/' -f2-
Otherwise, use a while read loop with another -printf option:
#!/usr/bin/env bash
while IFS='/' read -rd '' time file; do
printf '%s\n' "$file"
done < <(find . -type f -name '*.jfr' -printf '%T@/%p\0' | sort -zn)
Note that -printf for find and the -z flag for sort are GNU extensions.
To save the file names, you could change
printf '%s\n' "$file"
to something like the following, which appends to an array named files:
files+=("$file")
Then "${files[@]}" has the file names as elements.
The last snippet, with the while read loop, does not depend on the file names but on the modification time reported by GNU find.
I solved the problem! I sort the array with the following so the oldest files will be deleted first:
files=($(printf '%s\n' "${files[@]}" | sort -t/ -k3))

How to find files and count them (storing the info into a variable)?

I want to have a conditional behavior depending on the number of files found:
found=$(find . -type f -name "$1")
numfiles=$(printf "%s\n" "$found" | wc -l)
if [ $numfiles -eq 0 ]; then
echo "cannot access $1: No such file" > /dev/stderr; exit 2;
elif [ $numfiles -gt 1 ]; then
echo "cannot access $1: Duplicate file found" > /dev/stderr; exit 2;
else
echo "File: $(ls $found)"
head $found
fi
EDITED CODE (to reflect more precisely what I need)
Though, numfiles isn't equal to 2 (or more) when duplicate files are found...
All the filenames are on one line, separated by a space.
On the other hand, this works correctly:
find . -type f -name "$1" | wc -l
but I don't want to run the recursive search twice in the if/then/else construct...
Adding -print0 doesn't help either.
What would?
PS- Simplifications or improvements are always welcome!
You want to find files and count the files with a name "$1":
find . 2>/dev/null | grep -c "/${1}$"
And store the result in a var. In one command:
numfiles=$(grep -c "/${1}$" <(find . 2>/dev/null))
Using $() to store data in a variable trims trailing newlines. Since the final newline does not appear in the variable found, wc miscounts by one. You can recover the trailing newline with:
numfiles=$(printf "%s\n" "$found" | wc -l)
This miscounts if found is empty (and if any filenames contain a newline), emphasizing the fact that this entire approach is faulty. If you really want to go this way, you can try:
numfiles=$(test -z "$found" && echo 0 || printf "%s\n" "$found" | wc -l)
or pipe the output of find to a script that counts the output and prints a count along with the first filename:
find . -type f -name "$1" | tr '\n' ' ' |
awk '{c=NF; f=$1 } END {print c, f; exit c!=1}' c=0 |
while read count name; do
case $count in
0) echo no files >&2;;
1) echo 1 file $name;;
*) echo Duplicate files >&2;;
esac;
done
All of these solutions fail miserably if any pathnames contain whitespace. If that matters, you could change the awk to a perl script to make it easier to handle null separators and use -print0, but really I think you should stop worrying about special cases. (find -exec and find | xargs both fail to handle the case of 0 matching files cleanly. Arguably this awk solution also doesn't handle it cleanly.)
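If bash 4.4 or newer is available, one alternative is to run find once with -print0 and read the results into an array, so that the count is simply the array length and whitespace in names is not a problem. This is a sketch under that assumption; the function name find_unique is hypothetical, and the messages are copied from the question.

```shell
#!/usr/bin/env bash
# Sketch: search once, then branch on the number of matches.
# Assumes bash 4.4+ (mapfile -d '') and a find that supports -print0.
find_unique() {
    local name=$1
    local -a found=()
    # Each NUL-terminated path becomes one array element.
    mapfile -d '' -t found < <(find . -type f -name "$name" -print0)
    case ${#found[@]} in
        0) echo "cannot access $name: No such file" >&2; return 2 ;;
        1) printf 'File: %s\n' "${found[0]}"
           head -- "${found[0]}" ;;
        *) echo "cannot access $name: Duplicate file found" >&2; return 2 ;;
    esac
}
```

Called as find_unique "$1", it returns 2 for zero or several matches and prints the file name and its first lines when there is exactly one.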

Adding up file sizes in bash shells

I've written a shell script that takes a directory as an argument and prints the file names and sizes. I want to find out how to add up the file sizes and store the total so that I can print it after the loop. I've tried a few things but haven't gotten anywhere so far. Any ideas?
#!/bin/bash
echo "Directory <$1> contains the following files:"
let "x=0"
TEMPFILE=./count.tmp
echo 0 > $TEMPFILE
ls $1 |
while read file
do
if [ -f $1/$file ]
then
echo "file: [$file]"
stat -c%s $file > $TEMPFILE
fi
cat $TEMPFILE
done
echo "number of files:"
cat ./count.tmp
Help would be thoroughly appreciated.
A number of issues in your code:
Don't parse ls
Quote variables in the large majority of cases
Don't use temp files when they're not needed
Use already made tools like du for this (see comments)
Assuming you're just wanting to get practice at this and/or want to do something else other than what du already does, you should change syntax to something like
#!/bin/bash
dir="$1"
[[ $dir == *'/' ]] || dir="$dir/"
if [[ -d $dir ]]; then
echo "Directory <$1> contains the following files:"
else
echo "<$1> is not a valid directory, exiting"
exit 1
fi
shopt -s dotglob
for file in "$dir"*; do
if [[ -f $file ]]; then
echo "file: [$file]"
((size+=$(stat -c%s "$file")))
fi
done
echo "$size"
Note:
You don't have to pre-allocate variables in bash, $size is assumed to be 0
You can use (()) for math that doesn't require decimal places.
You can use globs (*) to get all files (including dirs, symlinks, etc...) in a particular directory (and globstar ** for recursive)
shopt -s dotglob is needed so that hidden .whatever files are included in glob matching.
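As a rough illustration of the globstar remark, the same loop can be made recursive. This is a sketch, not part of the answer above: the function name total_size is hypothetical, it needs bash 4 or newer, and it uses wc -c in place of stat -c%s so that it does not depend on GNU stat.

```shell
#!/usr/bin/env bash
# Sketch: sum the sizes of all regular files below a directory, recursively.
# The body runs in a subshell (parentheses) so the shopt settings do not leak.
total_size() (
    shopt -s globstar dotglob nullglob
    dir=${1%/}
    size=0
    for file in "$dir"/**/*; do
        # Count regular files only; skip directories and symlinks.
        if [[ -f $file && ! -L $file ]]; then
            (( size += $(wc -c < "$file") ))
        fi
    done
    printf '%s\n' "$size"
)
```

total_size "$1" then prints the total number of bytes, hidden files included.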
You can use ls -l to find size of files:
echo "Directory $1 contains the following:"
size=0
for f in "$1"/*; do
if [[ ! -d $f ]]; then
while read _ _ _ _ bytes _; do
if [[ -n $bytes ]]; then
((size+=$bytes))
echo -e "\tFile: ${f/$1\//} Size: $bytes bytes"
fi
done < <(ls -l "$f")
fi
done
echo "$1 Files total size: $size bytes"
Parsing ls results for size is ok here as byte size will always be found in the 5th field.
If you know what the date stamp format for ls is on your system and portability isn't important, you can parse ls to reliably find both the size and file in a single while read loop.
echo "Directory $1 contains the following:"
size=0
while read _ _ _ _ bytes _ _ _ file; do
if [[ -f $1/$file ]]; then
((size+=$bytes))
echo -e "\tFile: $file Size: $bytes bytes"
fi
done < <(ls -l "$1")
echo "$1 Files total size: $size bytes"
Note: These solutions would not include hidden files. Use ls -la for that.
Depending on the need or preference, ls can also print sizes in a number of different formats using options like -h or --block-size=SIZE.
#!/bin/bash
echo "Directory <$1> contains the following files:"
find "${1}"/* -prune -type f -ls | \
awk '{print; SIZE+=$7} END {print ""; print "total bytes: " SIZE}'
Use find with -prune (so it does not recurse into subdirectories) and -type f (so it will only list files and no symlinks or directories) and -ls (so it lists the files).
Pipe the output into awk and
for each line print the whole line (print; replace with print $NF to only print the last item of each line, which is the filename including the directory). Also add the value of the 7th field, which is the file size (in my version of find) to the variable SIZE.
After all lines have been processed (END) print the calculated total size.

How to locate the directory where the sum of the number of lines of regular files is greatest (in bash)

Hi, I'm new to Unix and bash and I'd like to ask how I can do this:
The specified directory is given as arguments. Locate the directory
where the sum of the number of lines of regular file is greatest.
Browse all specific directories and their subdirectories. Amounts
count only for files that are directly in the directory.
I tried something but it's not working properly.
while [ $# -ne 0 ]; do
case "$1" in
-h) show_help ;;
-*) echo "Error: Wrong arguments" 1>&2; exit 1 ;;
*) directories=("$@"); break ;;
esac
shift
done
IFS='
'
amount=0
for direct in "${directories[@]}"; do
for subdirect in `find $direct -type d `; do
temp=`find "$subdirect" -type f -exec cat {} \; | wc -l | tr -s " "`
if [ $amount -lt $temp ]; then
amount=$temp
subdirect2=$subdirect
fi
done
echo Output: "'"$subdirect2$amount"'"
done
The problem appears when I pass, for example, this directory as the argument:
/home/usr/first, which contains these files:
/home/usr/first/tmp/first.txt (50 lines)
/home/usr/first/tmp/second.txt (30 lines)
/home/usr/first/tmp1/one.txt (20 lines)
The output is /home/usr/first/tmp1 100, which is wrong; it should be /home/usr/first/tmp 80.
I'd like to scan all directories and all their subdirectories in depth. Also, if multiple directories reach the maximum, all of them should be listed.
Given your sample files, I'm going to assume you only want to look at the immediate subdirectories, not recurse down several levels:
max=-1
# the trailing slash limits the wildcard to directories only
for dir in */; do
count=0
for file in "$dir"/*; do
[[ -f "$file" ]] && (( count += $(wc -l < "$file") ))
done
if (( count > max )); then
max=$count
maxdir="$dir"
fi
done
echo "files in $maxdir have $max lines"
files in tmp/ have 80 lines
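The question also asks for subdirectories at any depth and for all directories that tie for the maximum. A possible extension of the loop above is sketched here; the function name busiest_dirs is hypothetical, hidden files are ignored, and find is assumed to support -print0.

```shell
#!/usr/bin/env bash
# Sketch: visit every directory below the given roots, count lines only in the
# regular files directly inside each one, and print every directory that ties.
busiest_dirs() {
    local max=-1 dir file count
    local -a winners=()
    while IFS= read -r -d '' dir; do
        count=0
        for file in "$dir"/*; do
            if [[ -f $file && ! -L $file && -r $file ]]; then
                (( count += $(wc -l < "$file") ))
            fi
        done
        if (( count > max )); then
            max=$count
            winners=("$dir")       # new maximum: start a fresh list
        elif (( count == max )); then
            winners+=("$dir")      # tie: remember this directory as well
        fi
    done < <(find "$@" -type d -print0)
    for dir in "${winners[@]}"; do
        printf '%s %s\n' "$dir" "$max"
    done
}
```

With the sample layout from the question, busiest_dirs /home/usr/first should print /home/usr/first/tmp 80.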
In the spirit of Unix (cough), here's an absolutely disgusting chain of pipes that I personally hate, but it's a lot of fun to construct :):
find . -mindepth 1 -maxdepth 1 -type d -exec sh -c 'find "$1" -maxdepth 1 -type f -print0 | wc -l --files0-from=- | tail -1 | { read a _ && echo "$a $1"; }' _ {} \; | sort -nr | head -1
Of course, don't use this unless you're mentally ill, use glenn jackman's nice answer instead.
You can have great control on find's unlimited filtering possibilities, too. Yay. But use glenn's answer!

Need a bash scripts to move files to sub folders automatically

I have a folder with 320G of images. I want to move the images into 5 subfolders randomly (they just need to end up in 5 subfolders). But I know nothing about bash scripts. Could someone please help? Thanks!
You could move the files to different directories based on their first letter:
mv [A-Fa-f]* dir1
mv [F-Kf-k]* dir2
mv [^A-Ka-k]* dir3
Here is my take on this. To use it, place the script somewhere else (not in your folder) but run it from your folder. If you call your script file rmove.sh, you can place it in, say, ~/scripts/, then cd to your folder and run:
source ~/scripts/rmove.sh
#!/bin/bash
ndirs=$((`find -type d | wc -l` - 1))
for file in *; do
if [ -f "${file}" ]; then
rand=`dd if=/dev/random bs=1 count=1 2>/dev/null | hexdump -b | head -n1 | cut -d" " -f2`
rand=$((rand % ndirs))
i=0
for directory in `find -type d`; do
if [ "${directory}" = . ]; then
continue
fi
if [ $i -eq $rand ]; then
mv "${file}" "${directory}"
fi
i=$((i + 1))
done
fi
done
Here's my stab at the problem:
#!/usr/bin/env bash
sdprefix=subdir
dirs=5
# pre-create all possible sub dirs
for ((n = 1; n <= dirs; n++)) ; do
mkdir -p "${sdprefix}$n"
done
fcount=$(find . -maxdepth 1 -type f | wc -l)
while IFS= read -r -d $'\0' file ; do
subdir="${sdprefix}"$(expr \( $RANDOM % $dirs \) + 1)
mv -f "$file" "$subdir"
done < <(find . -maxdepth 1 -type f -print0)
Works with huge numbers of files
Does not break if a file is not moveable
Creates subdirectories if necessary
Does not break on unusual file names
Relatively cheap
Any scripting language will do so I'll write in Python here:
#!/usr/bin/python
import os
import random
new_paths = ['/path1', '/path2', '/path3', '/path4', '/path5']
image_directory = '/path/to/images'
for file_path in os.listdir(image_directory):
full_path = os.path.abspath(os.path.join(image_directory, file_path))
random_subdir = random.choice(new_paths)
new_path = os.path.abspath(os.path.join(random_subdir, file_path))
os.rename(full_path, new_path)
mv `ls | while read x; do echo "\`expr $RANDOM % 1000\`:$x"; done \
| sort -n | sed 's/[0-9]*://' | head -1` ./DIRNAME
Run it in your image directory. Each run selects one file at random and moves it to ./DIRNAME; repeat the command until there are no more files to move.
Note that ` is a backquote, not an ordinary quote character, and that the inner backquotes must be escaped with a backslash.
