Bash script to delete watched videos - bash

I would like to create a bash script that checks the video files in a directory and deletes those that have already been watched.
I was thinking of using stat -c %w and stat -c %x to compare the birth time and the last access time of each video.
I have used stat -c %w to determine the file creation date, but I am unsure about stat -c %x. When is the access time updated? Will it really show the last time the video was opened? Are there any other scenarios that could change the access time?
Another issue is files that sit inside a subdirectory rather than "naked" in the work directory. How should I handle those?
Do you maybe have a better solution?

find uses the stat system call as well.
Its -printf option can display the modification time in seconds since the epoch (%T@), the access time in seconds since the epoch (%A@), and the filename of each file found.
Naturally, awk can compare the two values and print only the filenames whose times differ.
Give this a try:
find . -type f -printf "%T@ %A@ %p\n" | awk '{if (substr($0,1,21)!=substr($0,23,21)) { print substr($0,45); }}' | xargs -I xxxx rm 'xxxx'
Maybe it should be tested first with printf:
find . -type f -printf "%T@ %A@ %p\n" | awk '{if (substr($0,1,21)!=substr($0,23,21)) { print substr($0,45); }}' | xargs -I xxxx printf "%s\n" 'xxxx'
The test of the version with printf:
$ touch foo bar
$ ls
bar foo
$ find . -type f -printf "%T@ %A@ %p\n" | awk '{if (substr($0,1,21)!=substr($0,23,21)) { print substr($0,45); }}' | xargs -I xxxx printf "%s\n" 'xxxx'
$ cat foo
$ find . -type f -printf "%T@ %A@ %p\n" | awk '{if (substr($0,1,21)!=substr($0,23,21)) { print substr($0,45); }}' | xargs -I xxxx printf "%s\n" 'xxxx'
./foo
Note: this one-liner only works with filenames that do not contain special characters such as \n, ", ', etc.
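If filenames may contain spaces, quotes, or newlines, a NUL-delimited variant is safer. A minimal sketch, with hypothetical file names and a dry-run printf in place of the actual rm (swap in rm -- "$name" once the output looks right):

```shell
#!/usr/bin/env bash
# Demo setup: one "unwatched" file (atime == mtime) and one "watched"
# file whose access time was bumped later.
dir=$(mktemp -d)
touch -d '2020-01-01 00:00:00' "$dir/new.mp4"
touch -d '2020-01-01 00:00:00' "$dir/seen name.mp4"     # space on purpose
touch -a -d '2020-01-02 00:00:00' "$dir/seen name.mp4"

# NUL delimiters survive any character a filename can contain.
find "$dir" -type f -printf '%T@ %A@ %p\0' |
while IFS= read -r -d '' entry; do
    mtime=${entry%% *}; rest=${entry#* }
    atime=${rest%% *};  name=${rest#* }
    if [ "$mtime" != "$atime" ]; then
        printf 'would delete: %s\n' "$name"   # replace with: rm -- "$name"
    fi
done
rm -r "$dir"
```

Since %p comes last in each record, everything after the second space is the filename, spaces and all.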


Passing awk results to command after pipe

I'm trying to pass what would be the awk outputs of print $1 and print $2 to setfattr after a pipe. The value of the extended attribute is an MD5 hash which is calculated from input files from the output of a find command. This is what I have so far:
find /path/to/dir -type f \
-regextype posix-extended \
-not -iregex '.*\.(jpg|docx|psd|jpeg|png|html|bmp|gif|txt|pdf|mp3|bts|srt)' \
| parallel -j 64 md5sum | awk '{system("setfattr -n user.digest.md5 -v " $1 $2)}'
Having awk '{print $1}' and $2 after the last pipe returns the hash and file path respectively just fine; I'm just not sure how to get those values into setfattr, which throws a generic usage error when the command above is run. Is this just a syntax issue, or am I going about this totally wrong?
Try piping the output of the parallel command into a while loop:
find /path/to/dir -type f \
-regextype posix-extended \
-not -iregex '.*\.(jpg|docx|psd|jpeg|png|html|bmp|gif|txt|pdf|mp3|bts|srt)' |
parallel -j 64 md5sum |
while read -r hash file; do
setfattr -n user.digest.md5 -v "${hash}" "${file}"
done
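Before letting it touch any extended attributes, the loop's parsing can be checked with a dry run that prints the setfattr invocation instead of executing it. A sketch (the temp directory and file name are illustrative; md5sum output has the form "<hash>  <name>"):

```shell
#!/usr/bin/env bash
# Dry run: print the setfattr command each iteration would execute.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a file.txt"     # space in name on purpose

find "$dir" -type f -print0 |
xargs -0 md5sum |
while read -r hash file; do
    # read splits on the first whitespace run: hash, then the rest as file.
    # Quoting "$hash" and "$file" matters once paths contain spaces.
    printf 'setfattr -n user.digest.md5 -v %s %s\n' "$hash" "$file"
done
rm -r "$dir"
```

Note that read strips the filename's leading whitespace, so this breaks only for the rare filename that starts with a space.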

How to count files in subdir and filter output in bash

Hi, hoping someone can help. I have some directories on disk and I want to count the number of files in them (as well as the dir size, if possible) and then strip info from the output. So far I have this:
find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l) "{}"' | sort -n
This gets me all the dirs that match my pattern, as well as the number of files - great!
This gives me something like
2 ./bob/sourceimages/psd/dzv_body.psd,d
2 ./bob/sourceimages/psd/dzv_body_nrm.psd,d
2 ./bob/sourceimages/psd/dzv_body_prm.psd,d
2 ./bob/sourceimages/psd/dzv_eyeball.psd,d
2 ./bob/sourceimages/psd/t_zbody.psd,d
2 ./bob/sourceimages/psd/t_gear.psd,d
2 ./bob/sourceimages/psd/t_pupil.psd,d
2 ./bob/sourceimages/z_vehicles_diff.tga,d
2 ./bob/sourceimages/zvehiclesa_diff.tga,d
5 ./bob/sourceimages/zvehicleswheel_diff.jpg,d
From that I would like to filter on a maximum number of files, so > 4 for example, and I would like to capture the filetype as a variable for each remaining result, e.g. ./bob/sourceimages/zvehicleswheel_diff.jpg,d
I guess I could use awk for this?
Then finally I would like to remove all the results from disk; with find I normally just do something like -exec rm -rf {} \; but I'm not clear how that would work here.
Thanks a lot
EDITED
While this is clearly not the answer, these commands get me the info I want in the form I want it. I just need a way to put it all together and not search multiple times, as that's total rubbish.
filetype=$(find . -type d -name "*,d" | awk 'BEGIN { FS = "." }; { print $3 }' | cut -d',' -f1)
filesize=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'du -h {};' | awk '{ print $1 }')
filenumbers=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l);')
files_count=`ls | nl`
For instance:
ls | nl
nl prefixes each line with its line number.
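The counting, filtering, and filetype extraction above can be combined into a single pass over the directories. A sketch, using hypothetical "*,d" directory names and a printf dry run in place of the deletion:

```shell
#!/usr/bin/env bash
# Demo setup: one small and one large "*,d" directory.
dir=$(mktemp -d)
mkdir "$dir/small.psd,d" "$dir/big.jpg,d"
touch "$dir/small.psd,d/f1"
touch "$dir/big.jpg,d/f1" "$dir/big.jpg,d/f2" "$dir/big.jpg,d/f3" \
      "$dir/big.jpg,d/f4" "$dir/big.jpg,d/f5"

# One pass: count the files under each matching dir, keep only those
# holding more than 4, and pull the filetype out of the dir name.
find "$dir" -type d -name '*,d' | while IFS= read -r d; do
    n=$(find "$d" -type f | wc -l)
    if [ "$n" -gt 4 ]; then
        ext=${d##*.}            # e.g. "jpg,d"
        ext=${ext%,d}           # -> "jpg"
        printf '%s %s %s\n' "$n" "$ext" "$d"
        # to delete instead:  rm -rf -- "$d"
    fi
done
rm -r "$dir"
```

du -h "$d" could be added inside the loop for the size; the rm line stays commented out until the printed list looks right.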

How can I count the number of words in a directory recursively?

I'm trying to calculate the number of words written in a project. There are a few levels of folders and lots of text files within them.
Can anyone help me find out a quick way to do this?
bash or vim would be good!
Thanks
use find to scan the dir tree and wc will do the rest
$ find path -type f | xargs wc -w | tail -1
last line gives the totals.
tldr;
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
Explanation:
The find . -type f -exec wc -w {} + will run wc -w on all the files (recursively) contained by . (the current working directory). find will execute wc as few times as possible but as many times as is necessary to comply with ARG_MAX --- the system command length limit. When the quantity of files (and/or their constituent lengths) exceeds ARG_MAX, then find invokes wc -w more than once, giving multiple total lines:
$ find . -type f -exec wc -w {} + | awk '/total/{print $0}'
8264577 total
654892 total
1109527 total
149522 total
174922 total
181897 total
1229726 total
2305504 total
1196390 total
5509702 total
9886665 total
Isolate these partial sums by printing only the first whitespace-delimited field of each total line:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}'
8264577
654892
1109527
149522
174922
181897
1229726
2305504
1196390
5509702
9886665
paste the partial sums with a + delimiter to give an infix summation:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+
8264577+654892+1109527+149522+174922+181897+1229726+2305504+1196390+5509702+9886665
Evaluate the infix summation using bc, which supports both infix expressions and arbitrary precision:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
30663324
References:
https://www.cyberciti.biz/faq/argument-list-too-long-error-solution/
https://www.in-ulm.de/~mascheck/various/argmax/
https://linux.die.net/man/1/find
https://linux.die.net/man/1/wc
https://linux.die.net/man/1/awk
https://linux.die.net/man/1/paste
https://linux.die.net/man/1/bc
You could find and print all the content and pipe to wc:
find path -type f -exec cat {} \; -exec echo \; | wc -w
Note: the -exec echo \; is needed in case a file doesn't end with a newline character, in which case the last word of one file and the first word of the next will not be separated.
Or you could find and wc and use awk to aggregate the counts:
find . -type f -exec wc -w {} \; | awk '{ sum += $1 } END { print sum }'
If there's one thing I've learned from all the bash questions on SO, it's that a filename with a space will mess you up. This script will work even if you have whitespace in the file names.
#!/usr/bin/env bash
shopt -s globstar
count=0
for f in **/*.txt
do
words=$(wc -w "$f" | awk '{print $1}')
count=$(($count + $words))
done
echo $count
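Another whitespace-safe option, assuming GNU wc is available: its --files0-from flag reads NUL-delimited names from stdin, so a single wc invocation handles any number of files (no ARG_MAX splitting) and any filename find can emit. A sketch with hypothetical demo files:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'one two three\n' > "$dir/a.txt"
printf 'four five\n'     > "$dir/with space.txt"

# One wc call for the whole tree; the last output line is the grand total.
find "$dir" -type f -name '*.txt' -print0 |
wc -w --files0-from=- |
tail -n 1                      # "<grand total> total"
rm -r "$dir"
```

Because only one wc runs, there is exactly one "total" line, so the tail -1 trick from earlier needs no summing afterwards.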
Assuming you don't need to count the words recursively and that you want to include all the files in the current directory, you can use a simple approach such as:
wc -w *
10 000292_0
500 000297_0
510 total
If you want to count the words only for files with a specific extension in the current directory, you could try:
cat *.txt | wc -w

bash - padding find results

I'm running the following command to get a directory listing:
find ./../ \
-type f -newer ./lastsearchstamp -path . -prune -name '*.txt' -o -name '*.log' \
| awk -F/ '{print $NF " - " $FILENAME}'
Is there some way I can format the output in a 2 column left indented layout so that the output looks legible?
The command above always adds a constant spacing between the filename and the path.
Expected output:
abc.txt /root/somefolder/someotherfolder/
helloworld.txt /root/folder/someotherfolder/
a.sh /root/folder/someotherfolder/scripts
A nice tool for this kind of thing is column -t. You just add the command to the end of the pipeline:
find ... | awk -F/ '{print $NF " - " $FILENAME}' | column -t
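If you'd rather not depend on column, awk's printf can pad the first field to a fixed minimum width itself. A sketch with made-up paths (the width of 30 is an arbitrary choice):

```shell
# Pad the basename to at least 30 characters, then print the full path.
printf '%s\n' '/root/somefolder/abc.txt' '/root/folder/helloworld.txt' |
awk -F/ '{printf "%-30s %s\n", $NF, $0}'
```

Unlike column -t, this gives a constant column width rather than one sized to the widest entry, and names longer than 30 characters simply push the second column right.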

Find files in order of modification time

I have a certain shell script like this:
for name in `find $1 -name $2 -type f -mmin +$3`
do
Filename=`basename "ls $name"`
echo "$Filename">>$1/order.txt
done
The find command returns N files in alphabetical order, so their names are inserted into order.txt in alphabetical order. How can I change this to the order of modification time?
i.e., if file F2 was modified first and then file F1, the script above enters F1 first and then F2 into order.txt, as per alphabetical order. But I want F2 to be entered first and then F1, that is, in order of modification time. I want order.txt after the script to be:
F2
F1
and not:
F1
F2
Please help.
find has an -exec switch, allowing you to pass any matched filenames to an external command:
find "$1" -name "$2" -type f -mmin "+$3" -exec ls -1t [-r] {} +
With this, find will pass all of the matching files at once to ls and allow that to do the sorting for you. With the optional -r flag, files will be printed in order of oldest to newest; without, in order of newest to oldest.
for name in `find $1 -name $2 -type f -mmin +$3`
do
ftime=$(stat -c %Y "$name")
Filename=$(basename "$name")
echo "$ftime $Filename"
done | sort -n | awk '{print $2}' > $1/order.txt
One way: get file mtimes in seconds since epoch, sort on seconds since epoch, then print only the filename
Here you go:
find_date_sorted() {
# Ascending, by ISO date
while IFS= read -r -d '' -u 9
do
cut -d ' ' -f 3- <<< "$REPLY"
done 9< <(find ${1+"$@"} -printf '%TY-%Tm-%Td %TH:%TM:%TS %p\0' | sort -z)
}
