Check for size of files deleted after sleep - bash

I have a script that takes a snapshot of the contents of a directory and then sleeps for a certain amount of time.
After it wakes up, it should accumulate the total size of the files that were DELETED, not merely shrunk or grown.
My approach to this was something like this:
FILES_SNAPSHOT="$(ls -l | awk 'NR>1 {print $5,$9}')"
echo "Sleeping for $1 seconds..."
sleep $1
FILES_CURRENT="$(ls -l | awk 'NR>1 {print $5,$9}')"
So the snapshot basically stores this:
762 filename.sh
16 anotherfile.sh
...
The first column is the size of the file, the second the name of the file.
After I store this, the script sleeps, and I take another snapshot.
Now I need to compare the second column of my two variables and find the filenames that are missing, i.e. the files that have been deleted.
Then I need to take those names and match them to their sizes, using the first variable I declared, where I still have that information.
I am not quite sure, but I assume these variables just store a regular string that is formatted nicely. How can I iterate over both stored strings and know which file is missing? Should I use a loop to iterate, and inside the loop look for a failed match (index -1) to spot missing files?
Or should I use awk somehow, in combination with merging these variables?
Any help would be much appreciated, thanks in advance.

init:
snap1=$(mktemp)
snap2=$(mktemp)
find . -maxdepth 1 -type f -printf "%p\t%k\n" | sort > "$snap2"
loop:
cp "$snap2" "$snap1"
sleep 60
find . -maxdepth 1 -type f -printf "%p\t%k\n" | sort > "$snap2"
# any name present in the old snapshot but missing from the new one was
# deleted; sum those sizes and print the total size of deleted files in KB
awk 'BEGIN {FS="\t"}
NR==FNR {a[$1]=$2; next}
$1 in a {delete a[$1]}
END {for(k in a) {sum+=a[k]}; print sum+0}' "$snap1" "$snap2"
end:
unlink "$snap1"
unlink "$snap2"
Using ls in scripts is not recommended; use find instead, as shown. Creating temporary files is no concern on Unix systems; you can avoid them if you like, but that will just make things harder. I'm not sure how your script is currently written, but the loop content can be replaced with what I suggested. You just need to initialize one of the snapshots at the start and delete both temp files at the end.
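If you do want to skip the temp files, here is a minimal sketch of the same awk comparison applied directly to your two string variables via process substitution; it assumes FILES_SNAPSHOT and FILES_CURRENT hold "size name" lines as in your question, and that no filename contains spaces or newlines:
deleted=$(awk 'NR==FNR {a[$2]=$1; next}   # remember old sizes, keyed by name
$2 in a {delete a[$2]}                    # still present, so not deleted
END {for (f in a) sum+=a[f]; print sum+0}' \
<(printf '%s\n' "$FILES_SNAPSHOT") \
<(printf '%s\n' "$FILES_CURRENT"))
echo "Total size of deleted files: $deleted bytes"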

Related

Is there a way to take an input that behaves like a file in bash?

I have a task where I'm given an input of the format:
4
A CS 22 M
B ECE 23 M
C CS 23 F
D CS 22 F
as the user input from the command line. From this, we have to perform tasks like determining the number of male and female students, determining which department has the most students, etc. I have done this using awk with the input in a file. Is there any way to do this with user input instead of a file?
Example of a command I used for a file (where the content in the file is in the same format):
numberofmales=$(awk -F ' ' '{print $4}' file.txt | grep M | wc -l) #list number of males
Not Reproducible
It works fine for me; I can't reproduce your problem with either GNU or BSD awk under Bash 5.0.18(1). With your posted code and file sample:
$ numberofmales=$(awk -F ' ' '{print $4}' file.txt | grep M | wc -l)
$ echo $numberofmales
2
Check to make sure you don't have problems in your input file, or elsewhere in your code.
Also, note that if you call awk without a file argument or input from a pipe, it tries to collect data from standard input. It may not actually be hanging; it's probably just waiting on end-of-file, which you can trigger with CTRL+D.
Recommended Improvements
Even if your code works, it can be improved. Consider the following, which skips the unnecessary field-separator definition and performs all the actions of your pipeline within awk.
males=$(
awk 'tolower($4)=="m" {count++}; END {print count+0}' file.txt
)
echo "$males"
Fewer moving parts are often easier to debug, and can often be more performant on large datasets. However, your mileage may vary.
User Input
If you want to use user input rather than a file, you can use standard input to collect your data, and then pass it as a quoted argument to a function. For example:
count_males () {
awk 'tolower($4)=="m" {count++}; END {print count+0}' <<< "$*"
}
echo "Enter data (CTRL-D when done):"
data=$(cat -)
# If at command prompt, wait until EOF above before
# pasting this line. Won't matter in scripts.
males=$(count_males "$data")
The result is now stored in males, and you can echo "$males" or make use of the variable in whatever other way you like.
Bash indeed does not care whether a file handle is connected to standard input or to a file, and neither does Awk.
However, if you want to pass the same input to multiple Awk instances, it really does make sense to store it in a temporary file.
A better overall solution is to write a better Awk script so you only need to read the input once.
awk 'NF > 1 { ++a[$4] } END { for (g in a) print g, a[g] }'
Demo: https://ideone.com/0ML7Xk
The NF > 1 condition skips the silly first line (the record count, which has only one field). Probably don't put that information there in the first place; let Awk figure out how many lines there are, as it's probably better at counting than you are anyway.
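For example, you could paste the sample data straight into the one-pass script with a here-document instead of a file:
awk 'NF > 1 { ++a[$4] } END { for (g in a) print g, a[g] }' <<'EOF'
4
A CS 22 M
B ECE 23 M
C CS 23 F
D CS 22 F
EOF
This prints one count per gender (here M 2 and F 2; the order of groups from awk's for-in loop is unspecified).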

Incrementing variable in Bash -sed command

I have a bash script that I'm trying to put together that finds all of the images in a folder and then puts the names of those files into a pre-formatted CSV. I actually have the more complicated parts of it figured out and working well... I'm stuck on a really basic part. I have a variable that I need to increment for each file found, simple enough right? I've tried a bunch of different things and cannot for the life of me get it to increment. Here's the script I'm working with:
EDITED to show less context
i=0
find "$(pwd)" -maxdepth 1 -type f -exec file {} \; | awk -F: '{if ($2 ~/image/) print $1}' | grep -o -E '[^/]*$' | sed -e "s/^/$((++i))/" > "$(pwd)/inventory-$(date +%Y%m%d)-$(date +%I%M).csv"
I've tried incrementing it with i++, i=+1, i=i+1 as well as putting the dollar sign before the different iterations of the i variable... nothing seems to actually increment the variable. My best guess is that this isn't a true loop so it doesn't save the changes to the variable? Any guidance would be greatly appreciated!
The $((++i)) is expanded by your shell, and the shell evaluates this line only once, before the pipeline even runs, so this line cannot do what you need.
I would increment a counter inside awk instead, print it alongside the file name, and then combine the output in the further commands.
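A minimal sketch of that idea, keeping the rest of your pipeline but doing the numbering and the basename extraction inside awk, so the shell-side i is no longer needed. The comma separator is an assumption (the target is a CSV), and filenames containing a colon would still confuse the field split:
find "$(pwd)" -maxdepth 1 -type f -exec file {} \; |
awk -F: '$2 ~ /image/ {
    sub(/^.*\//, "", $1)    # strip the directory part, keep the basename
    print ++n "," $1        # n increments once per matching file
}' > "$(pwd)/inventory-$(date +%Y%m%d)-$(date +%I%M).csv"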

Getting the last opened file

input file:
wtf.txt|/Users/jaro/documents/inc/face/|
lol.txt|/Users/jaro/documents/inc/linked/|
lol.txt|/Users/jaro/documents/inc/twitter/|
lol.txt|/Users/jaro/documents/inc/face/|
wtf.txt|/Users/jaro/documents/inc/face/|
omg.txt|/Users/jaro/documents/inc/twitter/|
omg.txt|/Users/jaro/documents/inc/linked/|
wtf.txt|/Users/jaro/documents/inc/linked/|
lol.txt|/Users/jaro/documents/inc/twitter/|
wtf.txt|/Users/jaro/documents/inc/linked/|
lol.txt|/Users/jaro/documents/inc/face/|
omg.txt|/Users/jaro/documents/inc/twitter/|
omg.txt|/Users/jaro/documents/inc/face/|
wtf.txt|/Users/jaro/documents/inc/face/|
wtf.txt|/Users/jaro/documents/inc/twitter/|
omg.txt|/Users/jaro/documents/inc/linked/|
omg.txt|/Users/jaro/documents/inc/linked/|
The input file is the list of opened files (each line represents one file being opened). I want to get the last opened file in a given directory.
e.g.: get the last opened file in dir /Users/jaro/documents/inc/face/
output:
wtf.txt
This fetches the last line in the file whose second field is the desired folder name, and prints the first field.
awk -F '\|' '$2 == "/Users/jaro/documents/inc/face/" { f=$1 }
END { print f }' file
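A hypothetical reusable variant of the same script, passing the directory in as an awk variable rather than hard-coding it:
dir=/Users/jaro/documents/inc/face/
awk -F '|' -v d="$dir" '$2 == d { f=$1 } END { print f }' file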
To test whether the most recent file is also an existing file, I would use the shell to reverse the order with tac and perform the logic: skip the files in the wrong path and the ones which don't exist, then print the first success and quit.
tac file |
while IFS='|' read -r basename path _; do
case $path in "/Users/jaro/documents/inc/face/") ;; *) continue;; esac
test -e "$path$basename" || continue
echo "$basename"
break
done |
grep .
The final grep . is to produce an exit code which reflects whether or not the command was successful: if it printed a file, it's okay; if none of the extracted files existed, it returns an error.
Below is my original answer, based on a plausible but apparently incorrect interpretation of your question.
Here is a quick attempt at finding the file with the newest modification time from the list. I avoid parsing ls, preferring instead to use properly machine-parseable output from stat. Since your input file is line-oriented, I assume no file names contain newlines, which simplifies things quite a bit.
awk -F '\|' '$2 == "/Users/jaro/documents/inc/face/" { print $2 $1 }' file |
sort -u |
xargs stat -f '%m %N' |
sort -rn |
awk -F '/' '{ print $NF; exit(0) }'
The first sort removes any duplicates, to avoid running stat more times than necessary (premature optimization, perhaps); stat prefixes each line with the file's modification time expressed as seconds since the epoch, which facilitates easy numerical sorting by age; and the final Awk script neatly combines head -n 1 | rev | cut -d / -f1 | rev, i.e. it extracts just the basename from the first line of output, then quits.
If there is any way to use a less wacky input format, that would be an improvement (probably of your life in general as well).
The output format from stat is not properly standardized, but your question is tagged both linux and osx, so I assume BSD stat (the -f format option used above is the macOS variant; GNU coreutils stat uses -c instead). If portability is desired, maybe look at find (which however may be overkill and/or not much better standardized across diverse platforms) or write a small Perl or Python script instead. (Well, Ruby too, I suppose, but personally, I'd go with Perl.)
perl -F'\|' -lane '{ $t{$F[0]} = (stat($F[1].$F[0]))[9]  # [9] is mtime; [8] atime, [10] ctime
    if !defined $t{$F[0]} and $F[1] eq "/Users/jaro/documents/inc/face/" }
    END { print ((sort { $t{$a} <=> $t{$b} } keys %t)[-1]) }' file
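For reference, a sketch of the same stat pipeline under GNU coreutils, whose stat takes -c with %Y (mtime as epoch seconds) and %n (file name) instead of BSD's -f '%m %N':
awk -F '|' '$2 == "/Users/jaro/documents/inc/face/" { print $2 $1 }' file |
sort -u |
xargs stat -c '%Y %n' |
sort -rn |
awk -F '/' '{ print $NF; exit(0) }'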
atime – The atime (access time) is the time when the data of a file was last accessed. Displaying the contents of a file or executing a shell script will update a file's atime, for example. You can view the atime with the ls -lu command.
http://www.techtrunch.com/linux/ctime-mtime-atime-linux-timestamps
So in your case, the following will do the trick:
ls -lu /Users/jaro/documents/inc/face/
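To narrow that down to the single most recently accessed file, ls can sort by atime directly; a small sketch (parsing ls output is fragile if filenames contain newlines):
ls -tu /Users/jaro/documents/inc/face/ | head -n 1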

How do you get the total size of all files of a certain type within a directory in linux?

I'm trying to figure out how much space all my JPGs within a particular directory ('Personal Drive') are taking up. I figure there is a command line command to accomplish this.
You might be able to use the du command ...
du -ch *.jpg
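If you only want the grand total rather than the per-file breakdown, you can trim the output; the summary line produced by -c is always last:
du -ch *.jpg | tail -n 1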
Found it.
ls -lR | grep '\.jpg' | awk '{sum = sum + $5} END {print sum}'
As I understand it:
ls says list all the files in the directory.
Adding in the -l flag says show me more details like owner, permissions, and file size.
Tacking R onto that flag says 'do this recursively'.
Piping that to grep '\.jpg' allows only the output from ls that contains '.jpg' to continue on to the next phase.
Piping that output to awk '{sum = sum + $5} END {print sum}' says: take each line, pull its fifth column (the file size in this case), and add that value to our variable sum; when we reach the end of the list, print that variable's value.
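A hedged alternative that avoids parsing ls output: assuming GNU find (for -printf), this sums the exact byte sizes of all .jpg files recursively, regardless of odd filenames:
find . -type f -iname '*.jpg' -printf '%s\n' |
awk '{ sum += $1 } END { print sum+0 }'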

Bash scripting: how to filter the most counted first line var

How do I find the most frequently occurring first variable (field) across the lines of all the files under a directory (where subdirectories should also be checked)?
I want to scan every line of all my files (all the files in lots of folders under pwd) and report which first-field value appears the most times.
I am trying to use awk like this:
awk -f : { print $1} FILENAME
EDIT:
I will explain the purpose:
I have a server and I want to filter its logs, because there is a certain IP which repeats about 100 times every day; the first variable in each line is the IP.
I want to find which IP repeats. Problem: I have two servers, so checking this by scanning one log 100 times is not efficient. I hope this script will help me find out what the repeating IP is...
You should rewrite your question to make it clearer. I understood that you want to know which first lines are most common across a set of files. For that, I'd use this:
head -qn 1 * | sort | uniq -c | sort -nr
head prints the first line of every file in the current directory; -q causes it not to print the name of the file too, and -n lets you specify the number of lines.
sort groups them in sorted order.
uniq -c counts the occurrences, that is the amount of repeated lines in each block after the previous sort.
sort -nr orders them with the most popular coming first; -n sorts numerically, and -r means reverse, since by default it sorts in ascending order.
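Given the edit about IPs, here is a sketch closer to what you seem to want: count the first field of every line across all files under the current directory and print the most frequent one. The find/cat combination and the field number are assumptions based on your description:
find . -type f -exec cat {} + |
awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' |
sort -rn | head -n 1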
Not sure if this helps; the question is not so clear.
See if something like this can help.
find . -type f -name "*.*" -exec head -1 {} \; 2>/dev/null | awk -F':' 'BEGIN {max=0;}{if($2>max){max=$2;}}END{print max;}'
find - finds all the files under the current directory (-type f) with any name and extension (*.*), and head -1 gets the first line of each of those files.
awk - sets the field separator to : (-F':') and, before processing the first line, BEGIN sets max to 0.
It takes the second field after the : ($2) and checks whether $2 > the current max value; if it is, it sets that field as the new max value.
At the end of processing all the lines (the first lines of all the files under the current directory), END prints the max value.
