Issue with line count in the loop - shell

I have a script as below.
mount=/transfer/input
rm src/cb.log
array=( $(find ${mount} -type f \( -name "Turbine_DATA*" -o -name "GT_Data*" -o -name "INSIGHTS_data*" \) -printf '%f\n' ))
for i in "${array[@]}";do
echo $i >>src/cb.log
Filecount=`find ${mount} -maxdepth 1 -type f -name "$i" | wc -l`
echo $Filecount>> src/cb.log;
done
I am facing the issue below.
Although the file Turbine_DATA has 3 lines of data in it, Filecount still shows 1. I am able to run wc -l on that file in the /transfer/input directory where the file is present, and I can see a count of 3 there.
The point to note here is that I am running the script from the /NAS/Files directory.
Is this the problem here?

Filecount is the number of files, not the total number of lines in those files.
That's why it's named Filecount, not Linecount.
It's counting the number of files with the given name (I assume there can be multiple files of the same name in different subdirectories).
However, for a correct understanding you should probably consult the author of that script - it's not clear what information it should write to cb.log.
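If what the script is actually supposed to record is the line count of each file (rather than how many files share that name), a minimal sketch, assuming the filenames contain no whitespace or newlines:
mount=/transfer/input
: > src/cb.log    # start with an empty log instead of rm'ing it
find "${mount}" -type f \( -name "Turbine_DATA*" -o -name "GT_Data*" -o -name "INSIGHTS_data*" \) |
while IFS= read -r path; do
    # log the file's basename and its line count, one per line, like the original
    printf '%s\n%s\n' "$(basename "$path")" "$(wc -l < "$path")" >> src/cb.log
done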

Related

Keep latest pair of files and move older files to another (Unix)

For example, I have the following files in a directory:
FILE_1_2021-01-01.csum
FILE_1_2021-01-01.csv
FILE_1_2021-01-02.csum
FILE_1_2021-01-02.csv
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
I want to keep FILE_1_2021-01-03.csum and FILE_1_2021-01-03.csv in the current directory, but zip and move the rest of the older files to another directory.
So far I have tried the following, but I'm stuck on how to correctly identify the pairs:
file_count=0
PATH=/path/to/dir
ARCH=/path/to/dir
for file in ${PATH}/*
do
if [[ ! -d $file ]]
then
file_count=$(($file_count+1))
fi
done
echo "file count $file_count"
if [ $file_count -gt 2 ]
then
echo "moving old files to $ARCH"
// How to do it
fi
Since the timestamps are in a format that naturally sorts out with earliest first, newest last, an easy approach is to just use filename expansion to store the .csv and .csum filenames in a pair of arrays, and then do something with all but the last element of both:
declare -a csv=( FILE_*.csv ) csum=( FILE_*.csum )
mv "${csv[#]:0:${#csv[#]}-1}" "${csum[#]:0:${#csum[#]}-1}" new_directory/
(Or tar them up first, or whatever.)
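If you do want to tar them up first, one possible sketch reusing the csv and csum arrays from above (the archive name old_files.tar.gz is arbitrary):
old=( "${csv[@]:0:${#csv[@]}-1}" "${csum[@]:0:${#csum[@]}-1}" )   # everything except the newest pair
tar -czf old_files.tar.gz "${old[@]}" &&
    rm -- "${old[@]}" &&                                          # remove the originals once archived
    mv old_files.tar.gz new_directory/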
First off ...
it's bad practice to use all uppercase variables as these can clash with OS-level variables (also all uppercase); case in point ...
PATH is an OS-level variable for keeping track of where to locate binaries but in this case ...
OP has just wiped out the OS-level variable with the assignment PATH=/path/to/dir
As for the question, some assumptions:
each *.csv file has a matching *.csum file
the 2 files to 'keep' can be determined from the first 2 lines of output resulting from a reverse sort of the filenames
not sure what OP means by 'zip and move' (eg, zip? gzip? tar all old files into a single .tar and then (g)zip?) so for the sake of this answer I'm going to just gzip each file and move to a new directory (OP can adjust the code to fit the actual requirement)
Setup:
srcdir='/tmp/myfiles'
arcdir='/tmp/archive'
rm -rf "${srcdir}" "${arcdir}"
mkdir -p "${srcdir}" "${arcdir}"
cd "${srcdir}"
touch FILE_1_2021-01-0{1..3}.{csum,csv} abc XYZ
ls -1
FILE_1_2021-01-01.csum
FILE_1_2021-01-01.csv
FILE_1_2021-01-02.csum
FILE_1_2021-01-02.csv
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
XYZ
abc
Get list of *.csum/*.csv files and sort in reverse order:
$ find . -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r
/tmp/myfiles/FILE_1_2021-01-03.csv
/tmp/myfiles/FILE_1_2021-01-03.csum
/tmp/myfiles/FILE_1_2021-01-02.csv
/tmp/myfiles/FILE_1_2021-01-02.csum
/tmp/myfiles/FILE_1_2021-01-01.csv
/tmp/myfiles/FILE_1_2021-01-01.csum
Eliminate first 2 files (ie, generate list of files to zip/move):
$ find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail +3
/tmp/myfiles/FILE_1_2021-01-02.csv
/tmp/myfiles/FILE_1_2021-01-02.csum
/tmp/myfiles/FILE_1_2021-01-01.csv
/tmp/myfiles/FILE_1_2021-01-01.csum
Process our list of files:
while read -r fname
do
gzip "${fname}"
mv "${fname}".gz "${arcdir}"
done < <(find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail +3)
NOTE: the find|sort|tail results could be piped to xargs (or parallel) to perform the gzip/mv operations but without more details on what OP means by 'zip and move' I've opted for a simpler, albeit less performant, while loop
Results:
$ ls -1 "${srcdir}"
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
XYZ
abc
$ ls -1 "${arcdir}"
FILE_1_2021-01-01.csum.gz
FILE_1_2021-01-01.csv.gz
FILE_1_2021-01-02.csum.gz
FILE_1_2021-01-02.csv.gz
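For completeness, here's what the xargs route mentioned in the NOTE might look like as a rough sketch (GNU xargs assumed for -r, and the filenames contain no whitespace, as in this example):
find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail +3 |
    xargs -r gzip                      # compress all of the 'old' files in one pass
mv "${srcdir}"/*.gz "${arcdir}"        # then move the resulting .gz files (assumes no other .gz files live in srcdir)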
Your algorithm of counting files can be simplified using find. You seem to look for non-directories. The option -not -type d does exactly that. By default find searches into the subfolders, so you need to pass -maxdepth 1 to limit the search to a depth of 1.
find "$PATH" -maxdepth 1 -not -type d
If you want to get the number of files, you may pipe the command to wc:
file_count=$(find "$PATH" -maxdepth 1 -not -type d | wc -l)
Now there are two ways of detecting which file is the more recent: by looking at the filename, or by looking at the date when the files were last created/modified/etc. Since your naming convention looks pretty solid, I would recommend the first option. Sorting by creation/modification date is more complex and there are numerous cases where this information is not reliable, such as copying files, zipping/unzipping them, touching files, etc.
You can sort with sort and then grab the last element with tail -1:
find "$PATH" -maxdepth 1 -not -type d | sort | tail -1
You can do the same thing by sorting in reverse order using sort -r and then grab the first element with head -1. From a functional point of view, it is strictly equivalent, but it is slightly faster because it stops at the first result instead of parsing all results. Plus it will be more relevant later on.
find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1
Once you have the filename of the most recent file, you can extract the base name in order to create a pattern out of it.
most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1)
most_recent_file=${most_recent_file%.*}
most_recent_file=${most_recent_file##*/}
Let’s explain this:
first, we grab the filename into a variable called most_recent_file
then we remove the extension using ${most_recent_file%.*} ; the % symbol will cut at the end, and .* will cut everything after the last dot, including the dot itself
finally, we remove the folder using ${most_recent_file##*/} ; the ## symbol will cut at the beginning with a greedy catch, and */ will cut everything before the last slash, including the slash itself
The difference between # and ## is how greedy the pattern is. If your file is /path/to/file.csv then ${most_recent_file#*/} (single #) will cut the first slash only, i.e. it will output path/to/file.csv, while ${most_recent_file##*/} (double #) will cut all paths, i.e. it will output file.csv.
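A quick illustration of those expansions with the same example path:
f='/path/to/file.csv'
echo "${f%.*}"     # /path/to/file    (extension removed)
echo "${f#*/}"     # path/to/file.csv (single #: only the leading slash is cut)
echo "${f##*/}"    # file.csv         (double #: everything up to the last slash is cut)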
Once you have this string, you can make a pattern to include/exclude similar files using find.
find "$PATH" -maxdepth 1 -not -type d -name "$most_recent_file.*"
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*"
The first line will list all files which match your pattern, and the second line will list all files which do not match the pattern.
Since you want to move your 'old' files to a folder, you may execute a mv command for the last list.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -exec mv {} "$ARCH" \;
If your version of find supports it, you may use + in order to batch the move operations.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -exec mv -t "$ARCH" {} +
Otherwise you can pipe to xargs.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" | xargs mv -t "$ARCH"
If put altogether:
file_count=0
PATH=/path/to/dir
ARCH=/path/to/dir
file_count=$(find "$PATH" -maxdepth 1 -not -type d | wc -l)
echo "file count $file_count"
if [ $file_count -gt 2 ]
then
echo "moving old files to $ARCH"
most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1)
most_recent_file=${most_recent_file%.*}
most_recent_file=${most_recent_file##*/}
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" | xargs mv -t "$ARCH"
fi
As a last note, if your file names contain newlines, this will not work. If you want to handle this case, you need a few modifications. Counting files would be done like this:
file_count=$(find "$PATH" -maxdepth 1 -not -type d -printf '.' | wc -c)
Getting the most recent file:
most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d -print0 | sort -rz | head -zn1 | tr -d '\0')
Moving files with xargs:
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -print0 | xargs -0 mv -t "$ARCH"
(There’s no problem if moving files using -exec)
I won’t go into details, but just know that the issue is known and these are the kind of solutions you can apply if need be.

Shell script to remove and compress files from 2 paths

I need to delete log files older than 60 days and compress files that are older than 30 days but younger than 60 days. I have to remove and compress files from the 2 paths listed in PURGE_DIR_PATH.
I also have to take the output of the find command and redirect it to a log file. Basically, I need to create an entry in the log file whenever a file is deleted. How can I achieve this?
I also have to validate whether the directory path is valid or not and put a message in the log file either way.
I have written a shell script but it doesn't cover all the scenarios. This is my first shell script and I need some help. How do I keep just one variable, log_retention, and use it to compress files where the condition is >30 days and <60 days? How do I validate whether the directories are valid or not? Is my if condition checking that?
Please let me know.
#!/bin/bash
LOG_RETENTION=60
WEB_HOME="/web/local/artifacts"
ENG_DIR="$(dirname $0)"
PURGE_DIR_PATH="$(WEB_HOME)/backup/csvs $(WEB_HOME)/home/archives"
if[[ -d /PURGE_DIR_PATH]] then echo "/PURGE_DIR_PATH exists on your filesystem." fi
for dir_name in ${PURGE_DIR_PATH}
do
echo $PURGE_DIR_PATH
find ${dir_name} -type f -name "*.csv" -mtime +${LOG_RETENTION} -exec ls -l {} \;
find ${dir_name} -type f -name "*.csv" -mtime +${LOG_RETENTION} -exec rm {} \;
done
Off the top of my head -
#!/bin/bash
CSV_DELETE=60
CSV_COMPRESS=30
WEB_HOME="/web/local/artifacts"
PURGE_DIR_PATH=( "$(WEB_HOME)/backup/csvs" "$(WEB_HOME)/home/archives" ) # array, not single string
# eliminate the oldest
find "${PURGE_DIR_PATH[#]}" -type f -name "*.csv" -mtime +${CSV_DELETE} |
xargs -P 100 rm -f # run 100 in bg parallel
# compress the old-enough after the oldest are gone
find "${PURGE_DIR_PATH[#]}" -type f -name "*.csv" -mtime +${CSV_COMPRESS} |
xargs -P 100 gzip # run 100 in bg parallel
Shouldn't need loops.
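The question also asks for a log entry whenever a file is deleted; one way to get that is to tee the file list into a log before handing it to xargs. A minimal sketch (the purge.log path is just a placeholder, and it assumes the filenames contain no whitespace):
LOG_FILE="${WEB_HOME}/purge.log"    # placeholder path; adjust as needed

find "${PURGE_DIR_PATH[@]}" -type f -name "*.csv" -mtime +${CSV_DELETE} |
    tee -a "${LOG_FILE}" |          # append each about-to-be-deleted file to the log
    xargs -P 100 rm -f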

count number of lines for each file found

I think I don't understand very well how the find command in Unix works; I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.
find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
echo "$nb_ligne_fichier_R"
done
output:
43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R
90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
45 .//system d exploi/r-repos/gbm/R/checks.R
178 total: File name too long
Can I save just the total number of lines in my variable? Here in my example that would be 178, and I want that for each folder found under "$d_path".
Many thanks
Maybe I'm missing something, but wouldn't this do what you want?
wc -l R/*.[Rr]
Solution:
find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
echo "$nb_fichier_R" #here is fine
find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
wc -l "$fille" #here is the problem nothing shown
done
done
Explanation:
With -print0, the first find produced no newlines, so you had to pass -d '' to tell read not to look for one. Your subsequent finds output newlines, so you can use read with its default delimiter. I removed -print0 and -d '' from all calls so it is consistent and idiomatic. Newlines are good in the Unix world.
For the command:
find "$d_path" -type d -maxdepth 1 -name R -print0
there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:
The number of files matching *.R
For each such file, the number of lines in it.
Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.
counter.sh
shopt -s nullglob;
for dir in "$#"
do
count=0
for file in "$dir"/*.R; do ((count++)); done
echo "$count"
wc -l "$dir"/*.R </dev/null
done
The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:
for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done
or use find (rather carefully):
find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +
Using counter.sh
find "$d_path" -type d -maxdepth 1 -name R -exec sh ./counter.sh {} +
This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The sh ./counter.sh part of the find command assumes that the counter.sh script is in the current directory. If it can be found on $PATH, then you can drop the sh and the ./.
Discussion
The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and using xargs -0, but gives you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need (but you could write one as a shell script), then do so and use it. If you might need to do the job more than once, you can keep the script. If you're sure you won't, you can delete it after you're done with it. It is generally much easier to handle files with awkward names like this than it is to fiddle with $IFS.
Consider this solution:
# If `"$dir"/*.R` doesn't match anything, yield nothing instead of giving the pattern.
shopt -s nullglob
# Allows matching both `*.r` and `*.R` in one expression. Using them separately would
# give double results.
shopt -s nocaseglob
while IFS= read -ru 4 -d '' dir; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
# Use process substitution to prevent going to a subshell. This may not be
# necessary for now but it could be useful to future modifications.
# Let's also use a custom fd to keep troubles isolated.
# It works with `-u 4`.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)
Another form is to use readarray which allocates all found directories at once. Only caveat is that it can only read normal newline-terminated paths.
shopt -s nullglob
shopt -s nocaseglob
readarray -t dirs < <(exec find "$d_path" -type d -maxdepth 1 -name R)
for dir in "${dirs[#]}"; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
done

why is part of my one liner not being executed

I am trying to write a one-liner to find the number of files in each home directory. I am trying to do this because the other day I had a situation where I ran out of inodes on /home. It took me a long time to find the offender and I want to shorten this process. This is what I have, but it is not working.
for i in /home/*; do if [ -d "$i" ]; then cd $i find . -xdev -maxdepth 100 -type f |wc -l; fi done
When I run it, it prints a 0 for each home directory, and I remain in roots home directory.
However when I run just this part:
for i in /home/*; do if [ -d "$i" ]; then cd $i; fi done
I wind up in the last home directory leading me to believe I traversed them all.
And when I run this in each users home directory:
find . -xdev -maxdepth 100 -type f |wc -l
I get a legit answer.
You're missing a command terminator (a ; or newline) after your cd, so the rest of the line is passed as arguments to cd and the find never runs, which is why wc -l prints 0. But more importantly, using cd can cause unwanted errors if you're not careful, so try the below instead (cd not needed).
for i in /home/*; do [ -d "$i" ] && echo "$i" && find "$i" -xdev -maxdepth 100 -type f | wc -l; done
Since find can take multiple paths, you don't need a loop:
find /home/*/ -xdev -maxdepth 100 -type f | wc -l
To avoid any issues with filenames containing newlines (rare, yes), you can take advantage of an additional GNU extension to find (you're using -maxdepth, so I assume you can use -printf as well):
find /home/*/ -xdev -maxdepth 100 -type f -printf "." | wc -c
Since you aren't actually using the name of the file for counting, replace it with a single-character string, then count the length of the resulting string.
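That still collapses everything into a single total, though; since the point was to find which home directory is using up the inodes, a per-directory breakdown is probably more useful. A minimal sketch (assuming GNU find for -printf):
for d in /home/*/; do
    count=$(find "$d" -xdev -maxdepth 100 -type f -printf '.' | wc -c)
    printf '%s\t%s\n' "$count" "$d"
done | sort -rn        # biggest offenders first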

How can I list all unique file names without their extensions in bash?

I have a task where I need to move a bunch of files from one directory to another. I need to move all files with the same base name (i.e. blah.pdf, blah.txt, blah.html, etc...) at the same time, and I can move a set of these every four minutes. I had a short bash script to just move a single file at a time at these intervals, but the new name requirement is throwing me off.
My old script is:
find ./ -maxdepth 1 -type f | while read line; do mv "$line" ~/target_dir/; echo "$line"; sleep 240; done
For the new script, I basically just need to replace find ./ -maxdepth 1 -type f
with a list of unique file names without their extensions. I can then just replace do mv "$line" ~/target_dir/; with do mv "$line*" ~/target_dir/;.
So, with all of that said, what's a good way to get a unique list of file names without their extensions in a bash script? I was thinking about using a regex to grab the file names and then throwing them in a hash to get uniqueness, but I'm hoping there's an easier/better/quicker way. Ideas?
A one-liner tolerant of weirdly named files could be:
find . -maxdepth 1 -type f -and -iname 'blah*' -print0 | xargs -0 -I {} mv {} ~/target/dir
If the files can start with multiple prefixes, you can use logic operators in find. For example, to move blah.* and foo.*, use:
find . -maxdepth 1 -type f -and \( -iname 'blah.*' -or -iname 'foo.*' \) -print0 | xargs -0 -I {} mv {} ~/target/dir
EDIT
Updated after comment.
Here's how I'd do it:
find ./ -type f -printf '%f\n' | sed 's/\..*//' | sort | uniq | ( while read filename ; do find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \; ; sleep 240; done )
Perhaps it needs some explanation:
find ./ -type f -printf '%f\n': find all files and print just their name, followed by a newline. If you don't want to look in subdirectories, this can be substituted by a simple ls;
sed 's/\..*//': strip the file extension by removing everything after the first dot. Both foo.tar and foo.tar.gz are transformed into foo;
sort | uniq: sort the filenames just found and remove duplicates;
(: open a subshell:
while read filename: read a line and put it into the $filename variable;
find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \;: find in the current directory (find .) all the files (-type f) whose name starts with the value in filename (-iname "$filename"'*', this works also for files containing whitespaces in their name) and execute the mv command on each one (-exec mv {} /dest/dir \;)
sleep 240: sleep
): end of subshell.
Add -maxdepth 1 as argument to find as you see fit for your requirements.
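For the original requirement (top-level files only, with the 4-minute pause between sets), that might look like the following sketch:
find ./ -maxdepth 1 -type f -printf '%f\n' | sed 's/\..*//' | sort | uniq | while read -r filename; do
    find ./ -maxdepth 1 -type f -iname "$filename"'*' -exec mv {} /dest/dir \;
    sleep 240
done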
Never mind, I'm dumb. There's a uniq command. Duh. New working script is: find ./ -maxdepth 1 -type f | sed -e 's/\.[a-zA-Z]*$//' | sort | uniq | while read -r line; do mv "$line"* ~/target_dir/; echo "$line"; sleep 240; done
