Adding file sizes using the bash command "wc"

I ran this command to find each file modified yesterday:
find /eqtynas/ -type f -mtime -1 > /home/writtenToStorage.20171026 &
and then developed this script to read the list of files it collected and sum their sizes.
#!/bin/bash
ydate=$(date +%Y%m%d --date="yesterday")
file="/home/writtenToStorage.$ydate"
fileSize=0
for line in $(cat $file)
do
    if [ -f $line ] && [ -s $line ] ; then
        fileSize1=$fileSize
        fileSize=$(wc -c < $line)
        Total=$(( $fileSize + $fileSize1 ))
    fi
done
echo $Total
However, when I stat just one of the files in the list, it comes out to 18942, whereas the total for all the files combined comes out at 34499.
wc -c /eqty/fixed
18942 /eqty/fixed
Is the script OK? I ran another check and the total size was 314 GB:
find /eqtynas/ -type f -mtime -1 -print0 | du -ch --files0-from=- --total -s > 24hourUsage.20171026 &

Continuing from my comment, you may prefer something similar to:
sum=0
while read -r sz; do
    sum=$((sum + sz))
done < <(find /eqtynas/ -type f -mtime -1 -exec stat -c %s '{}' \;)
echo "sum: $sum"
There are a number of ways to do this. You can also pipe the result of -exec ls -al '{}' to awk and just sum the 5th field.
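For example, a sketch of that awk variant (assuming no pathname contains a newline):
find /eqtynas/ -type f -mtime -1 -exec ls -al '{}' + | awk '{ sum += $5 } END { print sum }'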
If you have already written the filenames to /home/writtenToStorage.20171026, then you can simply redirect the file to your while loop and stat each name, e.g.
while read -r f; do
    sum=$((sum + $(stat -c %s "$f")))
done <"/home/writtenToStorage.20171026"
Look things over and let me know if you have any questions.

You're not adding to Total, you're just setting it to the sum of the sizes of the last two files.
for line in $(cat $file)
do
    if [ -f $line ] && [ -s $line ] ; then
        fileSize=$(wc -c < $line)
        ((Total += fileSize))
    fi
done
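Putting both points together, a minimal corrected version of the whole script might look like this (a sketch; it keeps wc -c from the original and reads the list line by line so paths containing spaces survive):
#!/bin/bash
ydate=$(date +%Y%m%d --date="yesterday")
file="/home/writtenToStorage.$ydate"
Total=0
while IFS= read -r line; do
    # accumulate into Total instead of overwriting it
    [ -f "$line" ] && Total=$(( Total + $(wc -c < "$line") ))
done < "$file"
echo "$Total"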

Related

Print top N files by word count in two columns

I would like to make a script that prints the filenames of the top n files from two directories (n being the number of files I give on the command line), ordered by the number of words they contain. My biggest problem, however, is the way they should be displayed.
Say my command line looks like this:
myscript.sh 5 dir1 dir2
The output should have 2 columns: on the left the top 5 files in descending order from dir1, and on the right the top 5 files in descending order from dir2.
This is what I have in terms of code; however, I'm missing something. I think that pr -m -t should do what I want, but I couldn't make it work.
#!/bin/bash
dir=$1
dir2=$2
for files in "$dir"
do
    find ./reuters-topics/$dir -type f -exec wc -l {} + | sort -rn | head -n 15
done
for files in "$dir2"
do
    find ./reuters-topics/$dir2 -type f -exec wc -l {} + | sort -rn | head -n 15
done
This is a solution in fish:
for i in (find . -type f); wc -l $i; end | sort -rn | head -n15 | awk '{print $2 "\t" $1}'
As you can see, the re-ordering (filename first, number of words second) is done by awk. As a separator I use a tab character:
awk '{print $2 "\t" $1}'
The difference between my loop and your find call, btw, is that I do not get the "total" line in the output.
I did not test whether this (including the awk part) also works well for files with spaces in the name.
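For what it's worth, a space-safe bash sketch of the same loop, using NUL-delimited names (assumes your find supports -print0):
# read NUL-delimited filenames so whitespace survives; print "name<TAB>count"
find . -type f -print0 | while IFS= read -r -d '' f; do
    printf '%s\t%s\n' "$f" "$(wc -l < "$f")"
done | sort -t $'\t' -k 2,2rn | head -n 15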
#!/usr/bin/env bash
_top_files_by_words_usage() {
    local usage=""
    read -r -d '' usage <<-"EOF"
Usage:
top_files_by_words <show_count> <dir1> <dir2>
EOF
    1>&2 printf "%s\n" "$usage"
}

top_files_by_words() {
    if (( $# != 3 )) || [[ "$1" != +([0-9]) ]]; then
        _top_files_by_words_usage
        return 1
    fi
    local -i showCount=0
    local dir1=""
    local dir2=""
    showCount="$1"
    dir1="$2"
    dir2="$3"
    shopt -s extglob
    if [[ ! -d "$dir1" ]]; then
        1>&2 printf "directory '%s' does not exist or is not a directory\n" "$dir1"
        return 1
    fi
    if [[ ! -d "$dir2" ]]; then
        1>&2 printf "directory '%s' does not exist or is not a directory\n" "$dir2"
        return 1
    fi
    local -a out1=()
    local -a out2=()
    IFS=$'\n' read -r -d '' -a out1 < <(find "$dir1" -type f -exec wc -w {} \; | sort -k 1gr | head -n "$showCount")
    IFS=$'\n' read -r -d '' -a out2 < <(find "$dir2" -type f -exec wc -w {} \; | sort -k 1gr | head -n "$showCount")
    local -i i=0
    local -i maxLen=0
    local -i len=0
    for (( i = 0; i < showCount; ++i )); do
        len="${#out1[$i]}"
        if (( len > maxLen )); then
            maxLen=$len
        fi
        # len="${#out2[$i]}"
        # if (( len > maxLen )); then
        #     maxLen=$len
        # fi
    done
    for (( i = 0; i < showCount; ++i )); do
        printf "%-*.*s %s\n" "$maxLen" "$maxLen" "${out1[$i]}" "${out2[$i]}"
    done
    return 0
}

top_files_by_words "$@"
$ ~/tmp/count_words.bash 15 tex tikz
2309328 tex/resume.log 9692402 tikz/tikz-Graphics in LaTeX with TikZ.mp4
2242997 tex/resume_cv.log 2208818 tikz/tikz-Tikz-Graphs and Automata.mp4
2242969 tex/cover_letters/resume_cv.log 852631 tikz/tikz-Drawing Automata with TikZ in LaTeX.mp4
73859 tex/pgfplots/plotdata/heightmap.dat 711004 tikz/tikz-tutorial.mp4
49152 tex/pgfplots/lena.dat 300038 tikz/.ipynb_checkpoints/TikZ 11 Design Principles-checkpoint.ipynb
43354 tex/nancy.mp4 300038 tikz/TikZ 11 Design Principles.ipynb
31226 tex/pgfplots/pgfplotstodo.tex 215583 tikz/texample/bridges-of-konigsberg.svg
26000 tex/pgfplots/plotdata/ou.dat 108040 tikz/Visual TikZ.pdf
20481 tex/pgfplots/pgfplotstable.tex 82540 tikz/worldflags.pdf
19571 tex/pgfplots/pgfplots.reference.3dplots.tex 37608 tikz/texample/india-map.tex
19561 tex/pgfplots/plotdata/risingdrop3d_coord.dat 35798 tikz/.ipynb_checkpoints/TikZ-checkpoint.ipynb
19561 tex/pgfplots/plotdata/risingdrop3d_vel.dat 35656 tikz/texample/periodic_table.svg
18207 tex/pgfplots/ChangeLog 35501 tikz/TikZ.ipynb
17710 tex/pgfplots/pgfplots.reference.markers-meta.tex 25677 tikz/tikz-Graphics in LaTeX with TikZ.info.json
13800 tex/pgfplots/pgfplots.reference.axisdescription.tex 14760 tikz/tikz-Tikz-Graphs and Automata.info.json
column can print files side-by-side in columns. You can use process substitution with <(command) to have those "files" be live commands instead of actual files.
#!/bin/bash
top-files() {
    local n="$1"
    local dir="$2"
    find "$dir" -type f -exec wc -l {} + |
        head -n -1 | sort -rn | head -n "$n"
}
n="$1"
dir1="$2"
dir2="$3"
column <(top-files "$n" reuters-topics/"$dir1") \
       <(top-files "$n" reuters-topics/"$dir2")
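Invoked like the script in the question, e.g.:
$ ./myscript.sh 5 dir1 dir2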

find command in for loop does not list all the files

I made a script which lists every single file in the current directory and its subdirectories, takes the md5sum of the head and tail (with an offset) of each file, and saves the results into a .txt file.
I made it with pipes, so I wasn't able to use a variable that the user had entered earlier. So I changed my script to a for loop.
The problem now: it doesn't list all the files, but only one, and it seems to pick that one randomly. Why doesn't it list all the files like before?
I even tried **.* and ./* and so on. I use a MacBook Pro with macOS 10.13.6. I once installed something so I could use Linux commands as well, for example tree.
Any help is appreciated! I have no clue what else I can do.
Old code, in which the variable couldn't be passed in:
#!/bin/bash
echo Wie heißt die Festplatte?    # "What is the drive called?"
read varname
echo Los gehts!    # "Here we go!"
before=$(date +%s)
find . \( ! -regex '.*/\..*' \) -type f -exec bash -c 'h=`tail -n +50000 "{}" | head -c 1000 | md5`;\
t=`tail -c 51000 "{}" | head -c 1000 | md5`;\
echo "$varname {} ; $h ; $t"' \;> /Users/Tobias/Desktop/$varname.txt
after=$(date +%s)
echo Das hat: $(((after - $before)/60)) Minuten bzw $(((after - $before))) Sekunden gedauert    # "That took X minutes or X seconds"
New code, in which it doesn't list all the files but only one:
#!/bin/bash
echo Wie heißt die Festplatte?    # "What is the drive called?"
read varname
echo Los gehts!    # "Here we go!"
before=$(date +%s)
for i in $( find . \( ! -regex '.*/\..*' \) -type f ); do
    h=$(tail -n +50000 $i | head -c 1000 | md5)
    t=$(tail -c 51000 $i | head -c 1000 | md5)
    echo "$varname; $i ; $h ; $t" > /Users/Tobias/Desktop/$varname.txt
done
after=$(date +%s)
echo Das hat: $(((after - $before)/60)) Minuten bzw $(((after - $before))) Sekunden gedauert    # "That took X minutes or X seconds"
You are overwriting the file in each iteration of the loop. Use the append mode instead:
echo "$varname; $i ; $h ; $t" >> /Users/Tobias/Desktop/"$varname".txt
# ~~
or redirect the output of the whole loop:
echo "$varname; $i ; $h ; $t"
done > /Users/Tobias/Desktop/"$varname".txt
Redirect the output of the entire loop, not each echo statement, which overwrites the file each time.
for i in $( find . \( ! -regex '.*/\..*' \) -type f ); do
    h=$(tail -n +50000 $i | head -c 1000 | md5)
    t=$(tail -c 51000 $i | head -c 1000 | md5)
    echo "$varname; $i ; $h ; $t"
done > /Users/Tobias/Desktop/$varname.txt

Rename files in a folder using find (shell)

I have n files in a folder, like abc.mp3, acc.mp3, bbb.mp3, and I want to rename them 01-abc.mp3, 02-acc.mp3, 03-bbb.mp3, and so on. I tried this:
#!/bin/bash
IFS='
'
COUNT=1
for file in ./uff/*;
do mv "$file" "${COUNT}-$file" let COUNT++ done
but I keep getting errors like syntax error near 'do' and sometimes command not found... Can someone provide a single-line solution using "find" from the terminal? I'm looking for a solution using find only, due to certain constraints... Thanks in advance.
I'd probably use:
#!/bin/bash
cd ./uff || exit 1
COUNT=1
for file in *.mp3
do
    mv "$file" "$(printf "%.2d-%s" ${COUNT} "$file")"
    ((COUNT++))
done
This avoids a number of issues and also includes a 2-digit number for the first 9 files (the next 90 get 2-digit numbers anyway, and after that you get 3-digit numbers, etc).
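If the format string is unfamiliar: %.2d zero-pads the count to at least two digits, e.g.:
$ printf "%.2d-%s\n" 3 abc.mp3
03-abc.mp3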
You can try this:
#!/bin/bash
COUNT=1
for file in ./uff/*
do
    path=$(dirname "$file")
    filename=$(basename "$file")
    if [ $COUNT -lt 10 ]; then
        mv "$file" "$path"/0"${COUNT}-$filename"
    else
        mv "$file" "$path"/"${COUNT}-$filename"
    fi
    COUNT=$(($COUNT + 1))
done
Eg:
user@host:/tmp/test$ ls uff/
abc.mp3 acc.mp3 bbb.mp3
user@host:/tmp/test$ ./test.sh
user@host:/tmp/test$ ls uff/
01-abc.mp3 02-acc.mp3 03-bbb.mp3
Ok, here's the version without loops:
paste -d'\n' <(printf "%s\n" *) <(printf "%s\n" * | nl -w1 -s-) | xargs -d'\n' -n2 mv -v
You can also use find if you want:
paste -d'\n' <(find -mindepth 1 -maxdepth 1 -printf "%f\n") <(find -mindepth 1 -maxdepth 1 -printf "%f\n" | nl -w1 -s-) | xargs -d'\n' -n2 mv -v
Replace mv with echo mv for the "dry run":
paste -d'\n' <(printf "%s\n" *) <(printf "%s\n" * | nl -w1 -s-) | xargs -d'\n' -n2 echo mv -v
Here's a solution.
i=1
for f in $(find ./uff -mindepth 1 -maxdepth 1 -type f | sort)
do
    n=$i
    [ $i -lt 10 ] && n="0$i"
    echo "$f" "$n-$(basename "$f")"
    ((i++))
done
And here it is as a one-liner (but in real life if you ever tried anything remotely like what's below in a coding or ops interview you'd not only fail to get the job, you'd probably give the interviewer PTSD. They'd wake up in cold sweats thinking about how terrible your solution was).
i=1; for f in $(find ./uff -mindepth 1 -maxdepth 1 -type f | sort); do n=$i; [ $i -lt 10 ] && n="0$i"; echo "$f" "$n-$(basename "$f")" ; ((i++)); done
Alternatively, you could just cd ./uff if you wanted to rename them in the same directory, and then use find . (along with the other find arguments) to clear everything up; a sketch follows below. I'm assuming you only want files moved, not directories, and that you don't want to rename files or directories recursively.
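A rough sketch of that find-based variant (assuming no newlines in the filenames; swap mv for echo mv for a dry run):
cd ./uff || exit 1
i=1
# sort buffers find's output, so renamed files are not re-found mid-loop
find . -mindepth 1 -maxdepth 1 -type f | sort | while IFS= read -r f; do
    mv "$f" "$(printf '%02d' "$i")-$(basename "$f")"
    i=$((i + 1))
done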

How to locate the directory where the sum of the number of lines of regular files is greatest (in bash)

Hi, I'm new to Unix and bash and I'd like to ask how I can do this:
The specified directories are given as arguments. Locate the directory
where the sum of the number of lines of the regular files is greatest.
Browse all specified directories and their subdirectories. Count only
files that are directly in the directory.
I tried something, but it's not working properly.
while [ $# -ne 0 ]; do
    case "$1" in
        -h) show_help ;;
        -*) echo "Error: Wrong arguments" 1>&2
            exit 1 ;;
        *)  directories=("$@")
            break ;;
    esac
    shift
done
IFS='
'
amount=0
for direct in "${directories[@]}"; do
    for subdirect in `find $direct -type d`; do
        temp=`find "$subdirect" -type f -exec cat {} \; | wc -l | tr -s " "`
        if [ $amount -lt $temp ]; then
            amount=$temp
            subdirect2=$subdirect
        fi
    done
    echo Output: "'"$subdirect2$amount"'"
done
The problem is here: when I pass this directory as the argument (just an example),
/home/usr/first, and it contains these files:
/home/usr/first/tmp/first.txt (50 lines)
/home/usr/first/tmp/second.txt (30 lines)
/home/usr/first/tmp1/one.txt (20 lines)
it gives me the output /home/usr/first/tmp1 100, which is wrong; it should be /home/usr/first/tmp 80.
I'd like to scan all directories and all their subdirectories in depth. Also, if multiple directories meet the maximum, it should list all of them.
Given your sample files, I'm going to assume you only want to look at the immediate subdirectories, not recurse down several levels:
max=-1
# the trailing slash limits the wildcard to directories only
for dir in */; do
    count=0
    for file in "$dir"/*; do
        [[ -f "$file" ]] && (( count += $(wc -l < "$file") ))
    done
    if (( count > max )); then
        max=$count
        maxdir="$dir"
    fi
done
echo "files in $maxdir have $max lines"
files in tmp/ have 80 lines
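If you really do want to descend into every subdirectory (still counting only the files directly inside each one, as the assignment asks), the same idea can be driven by find. A sketch, assuming no newlines in directory names; printing ties for the maximum would need an extra pass:
find . -type d | while IFS= read -r dir; do
    count=0
    for file in "$dir"/*; do
        [[ -f "$file" ]] && (( count += $(wc -l < "$file") ))
    done
    printf '%s %s\n' "$count" "$dir"
done | sort -rn | head -n 1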
In the spirit of Unix (cough), here's an absolutely disgusting chain of pipes that I personally hate, but it's a lot of fun to construct :):
find . -mindepth 1 -maxdepth 1 -type d -exec sh -c 'find "$1" -maxdepth 1 -type f -print0 | wc -l --files0-from=- | tail -1 | { read a _ && echo "$a $1"; }' _ {} \; | sort -nr | head -1
Of course, don't actually use this; use glenn jackman's nice answer instead.
You get great control from find's filtering possibilities, too. Yay. But use glenn's answer!

Bash script to list files not found

I have been looking for a way to list files that do not exist, from a list of files that are required to exist. The files can exist in more than one location. What I have now:
#!/bin/bash
fileslist="$1"
while read fn
do
    if [ ! -f `find . -type f -name $fn` ]
    then
        echo $fn
    fi
done < $fileslist
If a file does not exist, the find command prints nothing and the test does not work. Removing the not and creating an if-then-else condition does not resolve the problem.
How can I print the filenames that are not found, from a list of file names?
New script:
#!/bin/bash
fileslist="$1"
foundfiles="~/tmp/tmp`date +%Y%m%d%H%M%S`.txt"
touch $foundfiles
while read fn
do
    find . -type f -name $fn | sed 's:./.*/::' >> $foundfiles
done < $fileslist
cat $fileslist $foundfiles | sort | uniq -u
rm $foundfiles
#!/bin/bash
fileslist="$1"
while read fn
do
    FPATH=`find . -type f -name $fn`
    if [ "$FPATH." = "." ]
    then
        echo $fn
    fi
done < $fileslist
You were close!
Here is test.bash:
#!/bin/bash
fn=test.bash
exists=`find . -type f -name $fn`
if [ -n "$exists" ]
then
    echo Found it
fi
It sets $exists to the result of the find; the -n test checks that the result is not empty.
Try replacing the loop body with [[ -z "$(find . -type f -name $fn)" ]] && echo $fn. (Note that this code is bound to have problems with filenames containing spaces.)
More efficient bashism:
diff <(sort $fileslist|uniq) <(find . -type f -printf %f\\n|sort|uniq)
I think you can handle diff output.
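For example, the lines marked < in the diff output are names that appear only in your list, i.e. the missing files:
diff <(sort $fileslist | uniq) <(find . -type f -printf %f\\n | sort | uniq) | grep '^<' | cut -c3-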
Give this a try:
find -type f -print0 | grep -Fzxvf - requiredfiles.txt
The -print0 and -z protect against filenames which contain newlines. If your utilities don't have these options and your filenames don't contain newlines, you should be OK.
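In that case, a rough unprotected equivalent (assuming newline-free filenames) would be:
find -type f | grep -Fxvf - requiredfiles.txt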
The repeated find to filter one file at a time is very expensive. If your file list is directly compatible with the output from find, run a single find and remove any matches from your list:
find . -type f |
fgrep -vxf - "$1"
If not, maybe you can massage the output from find in the pipeline before the fgrep so that it matches the format in your file; or, conversely, massage the data in your file into a find-compatible format.
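For example, if the list holds bare filenames rather than ./-prefixed paths, you could strip the directory part first (using GNU find's -printf; an assumption about your list's format):
find . -type f -printf '%f\n' |
fgrep -vxf - "$1"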
I use this script and it works for me:
#!/bin/bash
fileslist="$1"
found="Found:"
notfound="Not found:"
len=`cat $1 | wc -l`
n=0
while read fn
do
    # don't worry about this, I use it to display the file list progress
    n=$((n + 1))
    echo -en "\rLooking $(echo "scale=0; $n * 100 / $len" | bc)% "
    if [ $(find / -name $fn | wc -l) -gt 0 ]
    then
        found=$(printf "$found\n\t$fn")
    else
        notfound=$(printf "$notfound\n\t$fn")
    fi
done < $fileslist
printf "\n$found\n$notfound\n"
This line counts the number of matches; if it's greater than 0, the find was a success. It searches everything on the disk; you could replace / with . to search just the current directory:
$(find / -name $fn | wc -l) -gt 0
Then I simply run it, with the files in the file list separated by newlines:
./search.sh files.list
