Getting file size in bytes with bash (Ubuntu)

Hi, I'm looking for a way to output a file size in bytes. Whatever I try, I get either 96 or 96k instead of 96000.
if [[ -d $1 ]]; then
    largestN=$(find $1 -depth -type f | tr '\n' '\0' | du -s --files0-from=- | sort | tail -n 1 | awk '{print $2}')
    largestS=$(find $1 -depth -type f | tr '\n' '\0' | du -h --files0-from=- | sort | tail -n 1 | awk '{print $1}')
    echo "The largest file is $largestN which is $largestS bytes."
else
    echo "$1 is not a directory..."
fi
This prints "The largest file [file] is 96k bytes"

There is a -b option for this:
$ du -b ...

Looks like you're trying to find the largest file in a given directory. It's more efficient (and shorter) to let find do the heavy lifting for you:
find "$1" -type f -printf '%s %p\n' | sort -n | tail -n 1
Here, %s expands to the size in bytes of the file, and %p expands to the name of the file.
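Wrapped back into the original script's shape, a minimal sketch of that approach (assuming GNU find, as on Ubuntu; the function name is illustrative):

```shell
#!/bin/bash
# Print the largest file (in bytes) under the directory given as an argument.
largest_file() {
    local dir=$1
    if [[ -d $dir ]]; then
        local largest
        # %s = size in bytes, %p = path; numeric sort, keep the last line
        largest=$(find "$dir" -type f -printf '%s %p\n' | sort -n | tail -n 1)
        echo "The largest file is ${largest#* } which is ${largest%% *} bytes."
    else
        echo "$dir is not a directory..."
    fi
}
```

`${largest%% *}` keeps everything before the first space (the size), and `${largest#* }` keeps everything after it (the path).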

Related

How to return an MD5 and SHA1 value for multiple files in a directory using BASH

I am creating a BASH script to take a directory as an argument and return to stdout a list of all files in that directory with both the MD5 and SHA1 value of each file present. The only files I'm interested in are those between 100 and 500K. So far I've gotten this far. (Section of script)
cd $1 &&
find . -type f -size +100k -size -500k -printf '%f \t %s \t' -exec md5sum {} \; |
awk '{printf "NAME:" " " $1 "\t" "MD5:" " " $3 "\t" "BYTES:" "\t" $2 "\n"}'
I'm getting a little confused when adding the SHA1 and am obviously leaving something out.
Can anybody suggest a way to achieve this?
Ideally I'd like the script to format in the following way
Name Md5 SHA1
(With the relevant fields underneath)
Your awk printf bit is overly complicated. Try this:
find . -type f -printf "%f\t%s\t" -exec md5sum {} \; | awk '{ printf "NAME: %s MD5: %s BYTES: %s\n", $1, $3, $2 }'
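That one-liner drops the SHA1 the question asks for. One hedged extension (the field order and labels are illustrative, and the size filters from the question are kept):

```shell
find . -type f -size +100k -size -500k -printf '%f\t%s\t' \
    -exec sh -c 'printf "%s\t" "$(md5sum < "$1" | cut -d" " -f1)"; sha1sum < "$1" | cut -d" " -f1' -- {} \; |
awk -F'\t' '{ printf "NAME: %s MD5: %s SHA1: %s BYTES: %s\n", $1, $3, $4, $2 }'
```

Hashing from stdin (`md5sum < file`) makes both tools print `-` instead of the file name, so only the hash survives the `cut`.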
Just read, line by line, the list of files output by find:
find . -type f |
while IFS= read -r l; do
    echo "$(basename "$l") $(md5sum <"$l" | cut -d" " -f1) $(sha1sum <"$l" | cut -d" " -f1)"
done
It's better to use a zero-separated stream:
find . -type f -print0 |
while IFS= read -r -d '' l; do
    echo "$(basename "$l") $(md5sum <"$l" | cut -d" " -f1) $(sha1sum <"$l" | cut -d" " -f1)"
done
You could speed things up with xargs, running multiple processes via its -P option:
find . -type f -print0 |
xargs -0 -n1 sh -c 'echo "$(basename "$1") $(md5sum <"$1" | cut -d" " -f1) $(sha1sum <"$1" | cut -d" " -f1)"' --
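For example, with four hashing workers in parallel (the -P value here is arbitrary; note that with -P the output order is no longer deterministic):

```shell
find . -type f -print0 |
xargs -0 -n1 -P4 sh -c 'echo "$(basename "$1") $(md5sum <"$1" | cut -d" " -f1) $(sha1sum <"$1" | cut -d" " -f1)"' --
```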
Consider adding -maxdepth 1 to find if you are not interested in files in subdirectories recursively.
It's easy from xargs to go to -exec:
find . -type f -exec sh -c 'echo "$1 $(md5sum <"$1" | cut -d" " -f1) $(sha1sum <"$1" | cut -d" " -f1)"' -- {} \;
Tested on repl.
Add those -size +100k -size -500k args to find to limit the sizes.
The | cut -d" " -f1 removes the - that both md5sum and sha1sum print after the hash when reading stdin. If there are no spaces in the filenames, you could run a single cut process for the whole stream, which should be slightly faster:
find . -type f -print0 |
xargs -0 -n1 sh -c 'echo "$(basename "$1") $(md5sum <"$1") $(sha1sum <"$1")"' -- |
cut -d" " -f1,2,5
I also think that running a single md5sum and a single sha1sum process would probably be faster than spawning separate processes for each file, but that method needs to store all the filenames somewhere. Below, a bash array is used:
IFS=$'\n' files=($(find . -type f))
paste -d' ' <(
    printf "%s\n" "${files[@]}") <(
    md5sum "${files[@]}" | cut -d' ' -f1) <(
    sha1sum "${files[@]}" | cut -d' ' -f1)
Your find is fine; you want to join the results of two of those, one for each hash. The command for that is join, which expects sorted inputs.
doit() { find -type f -size +100k -size -500k -exec $1 {} + | sort -k2; }
join -j2 <(doit md5sum) <(doit sha1sum)
and that gets you the raw data in sane environments. If you want pretty data, you can use the column utility:
join -j2 <(doit md5sum) <(doit sha1sum) | column -t
and add nice headers:
(echo Name Md5 SHA1; join -j2 <(doit md5sum) <(doit sha1sum)) | column -t
and if you're in an unclean environment where people put spaces in file names, protect against that by subbing in tabs for the field markers:
doit() { find -type f -size +100k -size -500k -exec $1 {} + \
| sed 's, ,\t,'| sort -k2 -t$'\t' ; }
join -j2 -t$'\t' <(doit md5sum) <(doit sha1sum) | column -ts$'\t'

I want my script to echo "$1" into a file literally

This is part of my script
#!/bin/bash
echo "ls /SomeFolder | grep $1 | xargs cat | grep something | grep .txt | awk '{print $2}' | sed 's/;$//';" >> script2.sh
This echoes everything nicely into my script except $1 and $2. Instead it outputs the values of those variables, but I want it to literally read "$1" and "$2". Help?
Escape it:
echo "ls /SomeFolder | grep \$1 | xargs cat | grep something | grep .txt | awk '{print \$2}' | sed 's/;\$//';" >> script2.sh
Quote it:
echo "ls /SomeFolder | grep "'$'"1 | xargs cat | grep something | grep .txt | awk '{print "'$'"2}' | sed 's/;"'$'"//';" >> script2.sh
or like this:
echo 'ls /SomeFolder | grep $1 | xargs cat | grep something | grep .txt | awk '\''{print $2}'\'' | sed '\''s/;$//'\'';' >> script2.sh
Use quoted here document:
cat << 'EOF' >> script2.sh
ls /SomeFolder | grep $1 | xargs cat | grep something | grep .txt | awk '{print $2}' | sed 's/;$//';
EOF
Basically you want to prevent expansion, i.e. take the string literally. You may want to read the BashFAQ entry on quotes.
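A quick illustration of what each quoting style does to $1 (the positional parameter is set to a made-up value here):

```shell
#!/bin/sh
set -- hello                # make $1 expand to "hello"

echo "grep $1"              # double quotes expand:   grep hello
echo "grep \$1"             # backslash keeps it:     grep $1
echo 'grep $1'              # single quotes keep it:  grep $1
```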
First, you'd never write this (see https://mywiki.wooledge.org/ParsingLs and http://porkmail.org/era/unix/award.html; you don't need greps+seds+pipes when you're using awk):
ls /SomeFolder | grep $1 | xargs cat | grep something | grep .txt | awk '{print $2}' | sed 's/;$//'
you'd write this instead:
find /SomeFolder -mindepth 1 -maxdepth 1 -type f -name "*$1*" -exec \
awk '/something/ && /.txt/{sub(/;$/,"",$2); print $2}' {} +
or if you prefer using print | xargs instead of -exec:
find /SomeFolder -mindepth 1 -maxdepth 1 -type f -name "*$1*" -print0 |
xargs -0 awk '/something/ && /.txt/{sub(/;$/,"",$2); print $2}'
and now to append that script to a file would be:
cat <<'EOF' >> script2.sh
find /SomeFolder -mindepth 1 -maxdepth 1 -type f -name "*$1*" -print0 |
xargs -0 awk '/something/ && /.txt/{sub(/;$/,"",$2); print $2}'
EOF
Btw, if you want the . in .txt to be treated literally instead of as a regexp metachar meaning "any character" then you should be using \.txt instead of .txt.
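A small demonstration of the difference (the file names are made up):

```shell
# '.' is a regexp metachar, so ".txt" also matches names with no dot at all
printf 'report.txt\nreportxtxt\n' | grep -c '.txt'    # counts 2 lines
printf 'report.txt\nreportxtxt\n' | grep -c '\.txt'   # counts 1 line
```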

How to call a function while using find in bash?

So my objective here is to print a small graph, followed by the file size and the path, for the 15 largest files. However, I'm running into issues trying to call the create_graph function on each line. Here's what isn't working:
find $path -type f | sort -nr | head -$n | while read line; do
    size=$(stat -c '%s' $line)
    create_graph $largest $size 50
    echo "$size $line"
done
My problem is that it isn't sorting the files, and the files aren't the n largest files. So it appears my "while read line" is messing it all up.
Any suggestions?
The first command,
find $path -type f
just prints out file names. So it can't sort them by size. If you want to sort them by size, you need to make it print out the size. Try this:
find $path -type f -exec du -b {} \; | sort -nr | cut -f 2 | head -$n | ...
Update:
Actually, the first part alone seems to do everything you want:
find $path -type f -exec du -b {} \; | sort -nr | head -$n
will print out a table with size and filename, sorted by file size, and limited to $n rows.
Of course, I don't know what create_graph does.
Explanation:
find $path -type f -exec du -b {} \;
Find all files (not directories or links) in ${path} or its subdirectories, and execute the command du -b <file> on each.
du -b <file>
will output the size of the file (disk usage). See man du for details.
This will produce something like this:
8880 ./line_too_long/line.o
4470 ./line_too_long/line.f
934 ./random/rand.f
9080 ./random/rand
23602 ./random/monte
7774 ./random/monte.f90
13610 ./format/form
288 ./format/form.f90
411 ./delme.f90
872 ./delme_mod.mod
9029 ./delme
So for each file, it prints the size (-b for 'in bytes').
Then you can do a numerical sort on that.
$ find . -type f -exec du -b {} \; | sort -nr
23602 ./random/monte
13610 ./format/form
9080 ./random/rand
9029 ./delme
8880 ./line_too_long/line.o
7774 ./random/monte.f90
4470 ./line_too_long/line.f
934 ./random/rand.f
872 ./delme_mod.mod
411 ./delme.f90
288 ./format/form.f90
And if you then cut it off after, say, the first five entries:
$ find . -type f -exec du -b {} \; | sort -nr | head -5
23602 ./random/monte
13610 ./format/form
9080 ./random/rand
9029 ./delme
8880 ./line_too_long/line.o
One way to put that back together:
find . -type f -exec du -b {} \; | sort -nr | head -"$n" | while read -r line; do
    size=$(cut -f 1 <<< "$line")
    file=$(cut -f 2 <<< "$line")
    create_graph "$largest" "$size" 50
    echo "$line"
done
Note that du -b separates the size from the file name with a tab, which is cut's default delimiter, so no -d option is needed.
Note that I have no idea what create_graph is or what $largest contains. I took that straight out of your script.
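A variant of that loop which lets read split the size from the file name directly, instead of running cut twice per line (create_graph and $largest come from the question and are assumed to be defined; du -b separates the two fields with a tab):

```shell
n=5   # illustrative; stands in for the question's $n
find . -type f -exec du -b {} \; | sort -nr | head -"$n" |
while IFS=$'\t' read -r size file; do
    # create_graph "$largest" "$size" 50   # uncomment once create_graph exists
    echo "$size $file"
done
```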

Print an ordered list of files based on files size in bash

I made the following script to find files based on a 'find' command and then print out the results:
#!/bin/bash
loc_to_look='./'
file_list=$(find $loc_to_look -type f -name "*.txt" -size +5M)
total_size=`du -ch $file_list | tail -1 | cut -f 1`
echo 'total size of all files is: '$total_size
for file in $file_list; do
    size_of_file=`du -h $file | cut -f 1`
    echo $file" "$size_of_file
done
...which give me output like:
>>> ./file_01.txt 12.0M
>>> ./file_04.txt 24.0M
>>> ./file_06.txt 6.0M
>>> ./file_02.txt 6.2M
>>> ./file_07.txt 84.0M
>>> ./file_09.txt 55.0M
>>> ./file_10.txt 96.0M
What I would like to do first, though, is sort the list by file size before printing it out. What is the best way to go about doing this?
Easy to do if you grab the file size in bytes; just pipe to sort:
find $loc_to_look -type f -name "*.txt" -size +5M -printf "%f %s\n" | sort -n -k 2
If you want to print the file sizes in MB, you can pipe to awk at the end:
find $loc_to_look -type f -printf "%f %s\n" | sort -n -k 2 | awk '{ printf "%s %.1fM\n", $1, $2/1024/1024}'
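Alternatively, GNU sort understands human-readable sizes via -h, so you can keep du -h output and still sort correctly (a sketch; $loc_to_look is the question's variable):

```shell
loc_to_look='./'
find "$loc_to_look" -type f -name "*.txt" -size +5M -exec du -h {} + | sort -h
```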

Bash scripting: Deleting the oldest directory

I want to look for the oldest directory (inside a directory) and delete it. I am using the following:
rm -R $(ls -1t | tail -1)
ls -1t | tail -1 does indeed give me the oldest directory, but the problem is that it is not deleting the directory, and that it also lists files.
How could I fix that, please?
rm -R "$(find . -maxdepth 1 -type d -printf '%T@\t%p\n' | sort -r | tail -n 1 | sed 's/[0-9]*\.[0-9]*\t//')"
This also works with directories whose names contain spaces or tabs, or start with a "-".
This is not pretty but it works:
rm -R $(ls -lt | grep '^d' | tail -1 | tr " " "\n" | tail -1)
rm -R $(ls -tl | grep '^d' | tail -1 | cut -d' ' -f8)
find directory_name -type d -printf "%TY%Tm%Td%TH%TM%TS %p\n" | sort -nr | tail -1 | cut -d" " -f2 | xargs -n1 echo rm -Rf
Once it produces the right results, remove the echo before the rm.
