Print an ordered list of files based on file size in bash

I made the following script to find files with a find command and then print out the results:
#!/bin/bash
loc_to_look='./'
file_list=$(find $loc_to_look -type f -name "*.txt" -size +5M)
total_size=`du -ch $file_list | tail -1 | cut -f 1`
echo 'total size of all files is: '$total_size
for file in $file_list; do
    size_of_file=`du -h $file | cut -f 1`
    echo $file" "$size_of_file
done
...which gives me output like:
>>> ./file_01.txt 12.0M
>>> ./file_04.txt 24.0M
>>> ./file_06.txt 6.0M
>>> ./file_02.txt 6.2M
>>> ./file_07.txt 84.0M
>>> ./file_09.txt 55.0M
>>> ./file_10.txt 96.0M
What I would like to do first, though, is sort the list by file size before printing it out. What is the best way to go about doing this?

This is easy to do if you grab the file size in bytes; just pipe to sort:
find $loc_to_look -type f -name "*.txt" -size +5M -printf "%f %s\n" | sort -n -k 2
If you want the file sizes printed in MB, you can additionally pipe to awk:
find $loc_to_look -type f -printf "%f %s\n" | sort -n -k 2 | awk '{ printf "%s %.1fM\n", $1, $2/1024/1024}'
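As a variation, if you also want full paths and human-readable sizes in the sorted output, GNU coreutils' numfmt can do the unit conversion; a minimal sketch, assuming GNU find and coreutils:
find "$loc_to_look" -type f -name "*.txt" -size +5M -printf '%s %p\n' | sort -n | numfmt --to=iec --field=1
Here %p prints the whole path rather than just the basename, and numfmt rewrites the byte count in field 1 as an IEC size like 12M.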

Related

How to call a function while using find in bash?

So my objective here is to print a small graph, followed by the file size and the path, for the 15 largest files. However, I'm running into issues trying to call the create_graph function on each line. Here's what isn't working:
find $path -type f | sort -nr | head -$n | while read line; do
    size=$(stat -c '%s' $line)
    create_graph $largest $size 50
    echo "$size $line"
done
My problem is that it isn't sorting the files, and the files aren't the n largest files. So it appears my "while read line" is messing it all up.
Any suggestions?
The first command,
find $path -type f
just prints out file names. So it can't sort them by size. If you want to sort them by size, you need to make it print out the size. Try this:
find $path -type f -exec du -b {} \; | sort -nr | cut -f 2 | head -$n | ...
Update:
Actually, only the first part of that seems to do everything you want from it:
find $path -type f -exec du -b {} \; | sort -nr | head -$n
will print out a table with size and filename, sorted by file size, and limited to $n rows.
Of course, I don't know what create_graph does.
Explanation:
find $path -type f -exec du -b {} \;
Find all files (not directories or links) in ${path} or its subdirectories, and execute the command du -b <file> on each.
du -b <file>
will output the size of the file (disk usage). See man du for details.
This will produce something like this:
8880 ./line_too_long/line.o
4470 ./line_too_long/line.f
934 ./random/rand.f
9080 ./random/rand
23602 ./random/monte
7774 ./random/monte.f90
13610 ./format/form
288 ./format/form.f90
411 ./delme.f90
872 ./delme_mod.mod
9029 ./delme
So for each file, it prints the size (-b for 'in bytes').
Then you can do a numerical sort on that.
$ find . -type f -exec du -b {} \; | sort -nr
23602 ./random/monte
13610 ./format/form
9080 ./random/rand
9029 ./delme
8880 ./line_too_long/line.o
7774 ./random/monte.f90
4470 ./line_too_long/line.f
934 ./random/rand.f
872 ./delme_mod.mod
411 ./delme.f90
288 ./format/form.f90
And if you then cut it off after, say, the first five entries:
$ find . -type f -exec du -b {} \; | sort -nr | head -5
23602 ./random/monte
13610 ./format/form
9080 ./random/rand
9029 ./delme
8880 ./line_too_long/line.o
One way to put that back together; note that du separates the size and the name with a tab, so letting read split the line is more robust than cut -d ' ' (which would not split on the tab at all):
find . -type f -exec du -b {} \; | sort -nr | head -"$n" | while IFS=$'\t' read -r size file; do
    create_graph "$largest" "$size" 50
    echo "$size $file"
done
Note that I have no idea what create_graph is or what $largest contains. I took that straight out of your script.
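Since create_graph isn't shown, here is a purely hypothetical sketch of what such a function might look like, assuming it draws a bar proportional to the file's share of the largest size, capped at a given width:
create_graph() {
    # Hypothetical: print a '#' bar of length (size / largest) * width
    local largest=$1 size=$2 width=$3
    local len=$((size * width / largest))
    printf '%*s\n' "$len" '' | tr ' ' '#'
}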

How can I count the number of words in a directory recursively?

I'm trying to calculate the number of words written in a project. There are a few levels of folders and lots of text files within them.
Can anyone help me find out a quick way to do this?
bash or vim would be good!
Thanks
Use find to scan the directory tree; wc will do the rest:
$ find path -type f | xargs wc -w | tail -1
The last line gives the total. One caveat: if there are so many files that xargs has to invoke wc more than once, tail -1 only shows the last partial total (the next answer handles that case).
tldr;
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
Explanation:
The find . -type f -exec wc -w {} + will run wc -w on all the files (recursively) contained by . (the current working directory). find will execute wc as few times as possible, but as many times as necessary to comply with ARG_MAX, the system command-line length limit. When the quantity of files (and/or their constituent lengths) exceeds ARG_MAX, find invokes wc -w more than once, giving multiple total lines:
$ find . -type f -exec wc -w {} + | awk '/total/{print $0}'
8264577 total
654892 total
1109527 total
149522 total
174922 total
181897 total
1229726 total
2305504 total
1196390 total
5509702 total
9886665 total
Isolate these partial sums by printing only the first whitespace-delimited field of each total line:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}'
8264577
654892
1109527
149522
174922
181897
1229726
2305504
1196390
5509702
9886665
paste the partial sums with a + delimiter to give an infix summation:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+
8264577+654892+1109527+149522+174922+181897+1229726+2305504+1196390+5509702+9886665
Evaluate the infix summation using bc, which supports both infix expressions and arbitrary precision:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
30663324
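As a side note, the partial sums can also be accumulated inside awk itself, dropping the paste and bc stages; a minimal sketch that matches wc's summary lines by their second field (find prefixes real paths with ./, so only the total lines match):
$ find . -type f -exec wc -w {} + | awk '$2 == "total" {sum += $1} END {print sum}'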
References:
https://www.cyberciti.biz/faq/argument-list-too-long-error-solution/
https://www.in-ulm.de/~mascheck/various/argmax/
https://linux.die.net/man/1/find
https://linux.die.net/man/1/wc
https://linux.die.net/man/1/awk
https://linux.die.net/man/1/paste
https://linux.die.net/man/1/bc
You could find and print all the content and pipe to wc:
find path -type f -exec cat {} \; -exec echo \; | wc -w
Note: the -exec echo \; is needed in case a file doesn't end with a newline character, in which case the last word of one file and the first word of the next will not be separated.
Or you could run wc on each file via find and use awk to aggregate the counts:
find . -type f -exec wc -w {} \; | awk '{ sum += $1 } END { print sum }'
If there's one thing I've learned from all the bash questions on SO, it's that a filename with a space will mess you up. This script will work even if you have whitespace in the file names.
#!/usr/bin/env bash
shopt -s globstar nullglob   # nullglob: skip the loop entirely if nothing matches
count=0
for f in **/*.txt
do
    # wc prints "count filename"; keep only the count
    words=$(wc -w "$f" | awk '{print $1}')
    count=$((count + words))
done
echo "$count"
Assuming you don't need a recursive count and want to include all the files in the current directory, you can use a simple approach such as (wc -w counts words; wc -l would count lines):
wc -w *
10 000292_0
500 000297_0
510 total
If you want to count the words for only files with a specific extension in the current directory, you could try:
cat *.txt | wc -w

Script to count totals from commands and output to screen

I am looking for assistance in creating a bash script that will run several similar commands, sum up the totals and output that total to the screen. I want to run the following commands:
find /var/log/audit -xdev -type f -printf '%i\n' | sort -u | wc -l
find /boot -xdev -type f -printf '%i\n' | sort -u | wc -l
find /home -xdev -type f -printf '%i\n' | sort -u | wc -l
And so on. I have a few others. What I am basically doing is counting up all of the files in each mount point on my system. I then need the script to sum up the output of each command's wc -l and print the grand total to the screen. Any help is greatly appreciated.
this should work:
a=$(find /var/log/audit -xdev -type f -printf '%i\n' | sort -u | wc -l)
b=$(find /boot -xdev -type f -printf '%i\n' | sort -u | wc -l)
c=$(find /home -xdev -type f -printf '%i\n' | sort -u | wc -l)
final=$(($a+$b+$c))
echo $final
This will work without naming individual variables; replace the echo commands with your find pipelines:
awk '{sum+=$1} END{print "total: "sum}' < <(echo 4; echo 5; echo 6)
Alternatively, if the individual counts are not required, you can pass more than one path to find:
find path1 path2 path3 ...
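For example, with the paths from the question; one caveat: inode numbers are only unique within a filesystem, so a single sort -u across several mount points could collapse distinct files that happen to share an inode number, which the per-mount loop in the next answer avoids:
find /var/log/audit /boot /home -xdev -type f -printf '%i\n' | sort -u | wc -l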
This might be a good place for dc:
{
    for mnt in /var/log/audit /boot /home; do
        find "$mnt" -xdev -type f -printf '%i\n' | sort -u | wc -l
    done
    echo "+"
    echo "+"
    echo "p"
} | dc
You need one less "+" than your number of mountpoints.
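A variation on the same idea (just a sketch) seeds the stack with 0 and emits a + after every count, so nothing depends on the number of mount points:
{
    echo 0
    for mnt in /var/log/audit /boot /home; do
        find "$mnt" -xdev -type f -printf '%i\n' | sort -u | wc -l
        echo "+"
    done
    echo "p"
} | dc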
I would redirect each command's output to a file:
your_command >> results.txt
and sum them up
awk '{ sum += $1 } END { print sum }' results.txt

Need a command that will separate the count according to files that are < 1M lines and > 1M lines

Environment: Solaris 9
I have a command that gives me a total count of files. But I need a command that will separately count files that are less than 1M lines and files that are more than 1M lines long. How can I do that?
find . -type f -exec wc -l {} \; | awk '{print $1}' | paste -sd+ | bc
Use the -size option:
echo "Smaller: $(find . -type f -size -1M | wc -l)"
echo "Larger: $(find . -type f -size +1M | wc -l)"
If your find does not support the 1M suffix, write out the full number instead.
EDIT: As #rojomoke's comment points out, -size tests the file's size, not its number of lines. Here is a version that counts LINES in the files with the wc utility, since that is what you used in your original post.
Code:
# here I am already in the directory with the files, so I just use *
# to refer to all files
# wc -l prints a "count filename" line per file plus a final "total"
# summary line, which the awk script skips so it isn't counted as a file
wc -l * | awk '$2 == "total" {next} $1 > 1e6 {bigger++} $1 <= 1e6 {smaller++} END {print "Files > 1M lines = ", bigger, "\nFiles <= 1M lines = ", smaller}'
Output:
"Files > 1M lines = 454"
"Files < 1M lines = 528"

Find files in order of modification time

I have a certain shell script like this:
for name in `find $1 -name $2 -type f -mmin +$3`
do
    Filename=`basename "ls $name"`
    echo "$Filename">>$1/order.txt
done
The find command returns N files in alphabetical order, so their names are inserted into order.txt in alphabetical order. How do I change this to the order of modification time?
That is, if file F2 was modified before file F1, the above script enters F1 first and then F2 into order.txt, as per alphabetical order. But I want F2 to be entered first and then F1, that is, in order of modification time. I want order.txt after the script to be:
F2
F1
and not:
F1
F2
Please help.
find has an -exec switch, allowing you to pass any matched filenames to an external command:
find $1 -name $2 -type f -mmin +$3 -exec ls -1t [-r] {} +
With this, find will pass all of the matching files at once to ls and allow that to do the sorting for you. With the optional -r flag, files will be printed in order of oldest to newest; without, in order of newest to oldest.
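Put together for the parameters in the question, a sketch that writes basenames to order.txt oldest-first (read keeps names with spaces intact):
find "$1" -name "$2" -type f -mmin +"$3" -exec ls -1tr {} + | while IFS= read -r name; do
    basename "$name"
done > "$1"/order.txt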
for name in `find $1 -name $2 -type f -mmin +$3`
do
    ftime=$(stat -c %Y "$name")
    Filename=$(basename "$name")
    echo "$ftime $Filename"
done | sort -n | awk '{print $2}' > $1/order.txt
One way: get each file's mtime in seconds since the epoch, sort on that, then print only the filename.
Here you go:
find_date_sorted() {
# Ascending, by ISO date
    while IFS= read -r -d '' -u 9
    do
        cut -d ' ' -f 3- <<< "$REPLY"
    done 9< <(find ${1+"$@"} -printf '%TY-%Tm-%Td %TH:%TM:%TS %p\0' | sort -z)
}
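Hypothetical usage, passing ordinary find arguments straight through; the paths print oldest first:
find_date_sorted "$1" -name "$2" -type f -mmin +"$3"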
