Scripting a clamscan summary that adds multiple "Infected files" outputs together - bash

I want a simple way to add 2 numbers taken from a text file. Details below:
Daily, I run clamscan against my /home/ folder, which generates a simple log along the lines of this:
Scanning 851M in /home/.
----------- SCAN SUMMARY -----------
Infected files: 0
Time: 0.000 sec (0 m 0 s)
Start Date: 2021:11:27 06:25:02
End Date: 2021:11:27 06:25:02
Weekly, I scan both my /home/ folder and an external drive, so I get twice as much in the log:
Scanning 851M in /home/.
----------- SCAN SUMMARY -----------
Infected files: 0
Time: 0.000 sec (0 m 0 s)
Start Date: 2021:11:28 06:25:02
End Date: 2021:11:28 06:25:02
Scanning 2.8T in /mnt/ext/.
----------- SCAN SUMMARY -----------
Infected files: 0
Time: 0.005 sec (0 m 0 s)
Start Date: 2021:11:28 06:26:30
End Date: 2021:11:28 06:26:30
I don't email the log to myself, I just have a bash script that sends an email that (for the daily scan) reads the number that comes after "Infected files:" and says either "No infected files found" or "Infected files found, check log." (And, to be honest, once I'm 100% comfortable that it all works the way I want it to, I'll skip the "No infected files found" email.) The problem is, I don't know how to make that work for the weekly scan of multiple folders, because the summary I get doesn't combine those numbers.
I'd like the script to find both lines that start with "Infected files:", get the numbers that follow, and add them. I guess the ideal solution would use a loop, in case I ever need to scan more than two folders. I've taken a couple of stabs at it with grep and cut, but I'm just not an experienced enough coder to make it all work.
Thanks!

This bash script will print out the sum of infected files:
#!/bin/bash
# Collect each number after "Infected files:" (one per line), then replace
# the newlines with "+" and let arithmetic expansion add them up.
n=$(sed -n 's/^Infected files://p' logfile)
echo $((${n//$'\n'/+}))
or a one-liner:
echo $(( $(sed -n 's/^Infected files: \(.*\)/+\1/p' logfile) ))
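Since the question asks for a loop that scales past two folders, here is an equivalent explicit-loop sketch; the sample log is generated inline purely for demonstration:

```shell
#!/bin/bash
# Build a throwaway log shaped like the weekly scan output.
log=$(mktemp)
cat > "$log" <<'EOF'
Infected files: 0
Time: 0.000 sec (0 m 0 s)
Infected files: 3
Time: 0.005 sec (0 m 0 s)
EOF
total=0
# sed prints one count per "Infected files:" line; the loop sums them.
while read -r count; do
  total=$((total + count))
done < <(sed -n 's/^Infected files: *//p' "$log")
echo "Total infected: $total"
rm -f "$log"
```

The process substitution (`< <(...)`) keeps the loop in the current shell, so `total` survives after the loop ends.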

Related

What is the numerical difference in the number of files in two different directories for every sequence (seq 1-current)?

Every time I write a new batch of data, two new directories are created; together they are called a sequence.
Directory 1 should always be 9 files larger than Directory 2.
I'm using ls | wc -l to output the number of files in each directory, then manually computing the difference.
For example
Sequence 151
Directory 1: /raid2/xxx/xxxx/NHY274938WSP1151-OnlineSEHD-hyp (1911 files) - the sequence number follows WSP1.
Directory 2: /raid/xxx/ProjectNumber/xxxx/seq0151 (1902 files)
Sequence 152
Directory 1: /raid2/xxx/xxxx/NHY274938WSP1152-OnlineSEHD-hyp (1525 files)
Directory 2: /raid/xxx/ProjectNumber/xxxx/seq0152 (1516 files)
Is there a script that will output the difference (minus 9) for every sequence?
I.e.
151 diff=0
152 diff=0
That works great. However, I can now see that in some sequences Directory 1 (RAW/all files) contains extra files that I don't want compared against Directory 2. These are:
Warmup files at the beginning (not a set amount every sequence)
Duplicate files with an _
For example :
20329.uutt -warmup
20328.uutt -warmup
.
.
21530.uutt First good file after warmup
.
.
19822.uutt
19821.uutt
19820.uutt
19821_1.uutt
Directory 2 (reprocessed/missing files) doesn't include warmup shots or duplicate files with an _.
For example :
Missing shots
*021386 - first available file (files are missing before it).
*021385
.
.
*019822
*019821
*019820
If we remove the warmup files and any duplicates, I should have the number of missing files.
Or output
diff, D1#warmup files, D1#duplicate files, TOTdiff
To get D1#duplicate files, maybe I could count the total number of occurrences of _ in filenames ending in .uutt.
To get D1#warmup files, I have a log file where warmup shots have "WARM" at the end of each line, in /raid2/xxx/xxxx/NHY274938WSP1151.log
i.e.
"01/27/21 15:33:51 :FLD211018WSP1004: SP:21597: SRC:2: Shots:1037: Manifold:2020:000 Vol:4000:828 Spread: 1.0:000 FF: nan:PtP: 0.000:000 WARM"
"01/27/21 15:34:04 :FLD211018WSP1004: SP:21596: SRC:4: Shots:1038: Manifold:2025:000 Vol:4000:000 Spread: 0.2:000 FF: nan:PtP: 0.000:000 WARM"
Is there a script that will output the difference (minus 9) for every sequence? I.e. 151 diff=0 152 diff=0
There it is:
#!/bin/bash
d1p=/raid2/xxx/xxxx/NHY274938WSP1 # Directory 1 prefix
d1s=-OnlineSEHD-hyp # Directory 1 suffix
d2=/raid/xxx/ProjectNumber/xxxx/seq0
for d in $d2*
do s=${d: -3} # extract sequence from Directory 2
echo $s diff=$(( $(ls $d1p$s$d1s | wc -l) - $(ls $d | wc -l) - 9 ))
done
With filename expansion * we get all the directory names, and by removing the fixed part with the parameter expansion ${parameter:offset} we get the sequence.
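The `${d: -3}` slice used above can be checked interactively (sample path taken from the question; note the space before the minus sign, which distinguishes it from the `${d:-...}` default-value expansion):

```shell
d=/raid/xxx/ProjectNumber/xxxx/seq0151
s=${d: -3}        # last three characters of $d
echo "$s"         # prints 151
```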
For comparison here's a variant using arrays as suggested by tripleee:
#!/bin/bash
d1p=/raid2/xxx/xxxx/NHY274938WSP1 # Directory 1 prefix
d1s=-OnlineSEHD-hyp # Directory 1 suffix
d2=/raid/xxx/ProjectNumber/xxxx/seq0
shopt -s nullglob # make it work also for 0 files
for d in $d2*
do s=${d: -3} # extract sequence from Directory 2
f1=($d1p$s$d1s/*) # expand files from Directory 1
f2=($d/*) # expand files from Directory 2
echo $s diff=$((${#f1[@]} - ${#f2[@]} - 9))
done
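For the follow-up counts (duplicates containing `_`, WARM lines in the shot log), a rough sketch of the two greps, run here against throwaway sample data rather than the real paths (filenames are hypothetical):

```shell
#!/bin/bash
# Build a sample "Directory 1" with one "_" duplicate among .uutt files.
dir=$(mktemp -d)
touch "$dir"/19820.uutt "$dir"/19821.uutt "$dir"/19821_1.uutt
dups=$(ls "$dir" | grep -c '_.*\.uutt$')   # duplicate files

# Build a sample shot log where warmup lines end in "WARM".
log=$(mktemp)
printf '... WARM\n... WARM\n... end\n' > "$log"
warm=$(grep -c 'WARM$' "$log")             # warmup shots

echo "dups=$dups warm=$warm"
rm -rf "$dir" "$log"
```

Those two counts could then be subtracted alongside the `- 9` in the loop above.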

How to print lines extracted from a log file within a specified time range?

I'd like to fetch results, let's say from 2017-12-19 19:14 until the end of that day, from a log file that looks like this -
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:00.723 Info: Saving /var/opt/MarkLogic/Forests/Meters/00001829
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:01.134 Info: Saved 9 MB at 22 MB/sec to /var/opt/MarkLogic/Forests/Meters/00001829
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:01.376 Info: Merging 19 MB from /var/opt/MarkLogic/Forests/Meters/0000182a and /var/opt/MarkLogic/Forests/Meters/00001829 to /var/opt/MarkLogic/Forests/Meters/0000182c, timestamp=15137318408510140
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:02.585 Info: Merged 18 MB in 1 sec at 15 MB/sec to /var/opt/MarkLogic/Forests/Meters/0000182c
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:05.200 Info: Deleted 15 MB at 337 MB/sec /var/opt/MarkLogic/Forests/Meters/0000182a
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:05.202 Info: Deleted 9 MB at 4274 MB/sec /var/opt/MarkLogic/Forests/Meters/00001829
I am new to Unix and familiar with grep command. I tried the below command
date="2017-12-19 [19-23]:[14-59]"
echo "$date"
grep "$date" $root_path_values
but it throws an "Invalid range end" error. Any solution? The date is going to be coming from a variable, so it will be unpredictable; please don't build a command around just this example. $root_path_values is a sequence of error files like errorLog.txt, errorLog_1.txt, errorLog_2.txt and so on.
I'd like to fetch results, let's say from 2017-12-19 19:14 until the end of that day … The date is going to be coming from a variable …
This is not a job for regular expressions. Since the timestamp has a sensible form, we can simply compare it as a whole, e. g.:
start='2017-12-19 19:14'
end='2017-12-20'
awk -v start="$start" -v end="$end" 'start <= $0 && $0 < end' ErrorLog_1.txt
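Since the timestamps are zero-padded, plain string comparison orders them correctly; here is a quick self-contained check against a throwaway sample (out-of-range lines added deliberately):

```shell
#!/bin/bash
f=$(mktemp)
cat > "$f" <<'EOF'
2017-12-19 19:13:59.000 Info: before range
2017-12-19 19:14:00.723 Info: Saving
2017-12-19 23:59:59.999 Info: Merging
2017-12-20 00:00:01.000 Info: next day
EOF
start='2017-12-19 19:14'
end='2017-12-20'
# Lexicographic string comparison keeps exactly the two in-range lines.
matched=$(awk -v start="$start" -v end="$end" 'start <= $0 && $0 < end' "$f")
echo "$matched"
rm -f "$f"
```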
egrep '2017-12-19 (19:(1[4-9]|[2-5][0-9])|2[0-3]:[0-5][0-9])' path/to/your/file Try this regexp.
(The minute restriction should only apply to hour 19; for hours 20-23 any minute is in range. The variable must also be quoted, or the space in the pattern gets split into separate arguments.)
In case you need the pattern in a variable:
#!/bin/bash
date='2017-12-19 (19:(1[4-9]|[2-5][0-9])|2[0-3]:[0-5][0-9])'
egrep "${date}" path/to/your/file

How to resume reading a file?

I'm trying to find the best and most efficient way to resume reading a file from a given point.
The given file is being written frequently (this is a log file).
This file is rotated on a daily basis.
In the log file I'm looking for the pattern 'slow transaction'. Such lines end with a number in parentheses. I want the sum of those numbers.
Example of log line:
Jun 24 2015 10:00:00 slow transaction (5)
Jun 24 2015 10:00:06 slow transaction (1)
This is the easy part, which I could do with an awk command to get a total of 6 in the above example.
Now my challenge is that I want to get the values from this file on a regular basis. I've an external system that polls a custom OID using SNMP. When hitting this OID the Linux host runs a couple of basic commands.
I want this SNMP polling event to get the number of events since the last polling only. I don't want to have the total every time, just the total of the newly added lines.
Just to mention that only bash can be used, or basic commands such as awk, sed, tail, etc. No Perl or other advanced programming language.
I hope my description is clear enough. Apologies if this is a duplicate; I did some research before posting but did not find anything that precisely corresponds to my need.
Thank you for any assistance.
In addition to the methods in the comment link, you can also simply use dd and stat to read the logfile size, save it and sleep 300 then check the logfile size again. If the filesize has changed, then skip over the old information with dd and read the new information only.
Note: you can add a test to handle the case where the logfile is deleted and then recreated with size 0 (e.g. if ((newsize < size)); then read the whole file).
Here is a short example with 5 minute intervals:
#!/bin/bash
lfn=${1:-/path/to/logfile}
size=$(stat -c "%s" "$lfn") ## save original log size
while :; do
newsize=$(stat -c "%s" "$lfn") ## get new log size
if ((size != newsize)); then ## if change, use new info
## use dd to skip over existing text to new text
newtext=$(dd if="$lfn" bs="$size" skip=1 2>/dev/null)
## process newtext however you need
printf "\nnewtext:\n\n%s\n" "$newtext"
size=$((newsize)); ## update size to newsize
fi
sleep 300
done
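The dd skip trick can be verified in isolation (GNU stat assumed, as in the script above; file contents are arbitrary):

```shell
#!/bin/bash
# Remember the size, append a line, then read only the bytes past
# the remembered size: bs=<old size> with skip=1 skips exactly that much.
f=$(mktemp)
printf 'old line\n' > "$f"
size=$(stat -c "%s" "$f")
printf 'slow transaction (5)\n' >> "$f"
newtext=$(dd if="$f" bs="$size" skip=1 2>/dev/null)
echo "$newtext"                            # only the appended line
rm -f "$f"
```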

Need a shell script for the following scenario

I have multiple log files in a directory /home/user/ with pattern x.log, y.log, z.log :
content of files are :
error
pass
fail
executed
not executed
Summary:
test 1
test 2
test 3
Finished in 2682 min 43.9 sec.
done
completed
I want the output in a single new file, built from the multiple log files, as:
Summary:
test 1
test 2
test 3
Finished in 2682 min 43.9 sec.
Summary:
test 1
test 2
test 3
Finished in 2682 min 43.9 sec.
Summary:
test 1
test 2
test 3
Finished in 2682 min 43.9 sec.
Can you help me out with a shell script?
You can use awk:
awk '/Summary/ {run=1} run==1 {print} /Finished/ {run=0}' *.log > log.agr
This will take the contents of every file ending in .log, start writing to log.agr when it finds a line containing Summary, and stop writing after a line containing Finished. It repeats that through the entire contents of all the *.log files.
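A quick self-contained check of the Summary/Finished range logic, using a throwaway file shaped like the question's logs:

```shell
#!/bin/bash
f=$(mktemp)
cat > "$f" <<'EOF'
error
pass
Summary:
test 1
Finished in 2682 min 43.9 sec.
done
EOF
# Only the lines from Summary through Finished survive.
agr=$(awk '/Summary/ {run=1} run==1 {print} /Finished/ {run=0}' "$f")
echo "$agr"
rm -f "$f"
```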

fastest hashing in a unix environment?

I need to examine the output of a certain script 1000s of times on a unix platform and check if any of it has changed from before.
I've been doing this:
(script_stuff) | md5sum
and storing this value. I actually don't really need "md5", just a simple hash function whose result I can compare against a stored value to see if it's changed. It's okay if there's an occasional false positive.
Is there anything better than md5sum that works faster and generates a fairly usable hash value? The script itself generates a few lines of text - maybe 10-20 on average to max 100 or so.
I had a look at fast md5sum on millions of strings in bash/ubuntu - that's wonderful, but I can't compile a new program. Need a system utility... :(
Additional "background" details:
I've been asked to monitor the DNS record of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to do a dig xyz +short statement and hash its output and store that, and then check it against a previously stored value. Any change will trigger the other script, otherwise it just goes on. Right now, we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
I have no idea what the use of such a system would be, I'm just doing this as a job for someone else...
The cksum utility calculates a non-cryptographic CRC checksum.
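A sketch of how the store-and-compare step might look with cksum in place of md5sum (the state file and the stand-in output are assumptions, not part of the original setup):

```shell
#!/bin/bash
state=$(mktemp)                 # would persist between cron runs in practice
output='93.184.216.34'          # stand-in for the real `dig xyz +short` result
current=$(printf '%s\n' "$output" | cksum)
previous=$(cat "$state" 2>/dev/null)
if [ "$current" != "$previous" ]; then
  changed=yes                   # here you would trigger the follow-up script
else
  changed=no
fi
printf '%s\n' "$current" > "$state"
rm -f "$state"                  # cleanup for this throwaway demo only
```

On the first run the state file is empty, so the comparison reports a change; subsequent runs with identical output would not.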
How big is the output you're checking? A hundred lines max. I'd just save the entire original file then use cmp to see if it's changed. Given that a hash calculation will have to read every byte anyway, the only way you'll get an advantage from a checksum type calculation is if the cost of doing it is less than reading two files of that size.
And cmp won't give you any false positives or negatives :-)
pax> echo hello >qq1.txt
pax> echo goodbye >qq2.txt
pax> cp qq1.txt qq3.txt
pax> cmp qq1.txt qq2.txt >/dev/null
pax> echo $?
1
pax> cmp qq1.txt qq3.txt >/dev/null
pax> echo $?
0
Based on your question update:
I've been asked to monitor the DNS record of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to do a dig xyz +short statement and hash its output and store that, and then check it against a previously stored value. Any change will trigger the other script, otherwise it just goes on. Right now, we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
I'm not sure you need to worry too much about the file I/O. The following script executed dig microsoft.com +short 5000 times first with file I/O then with output to /dev/null (by changing the comments).
#!/bin/bash
rm -rf qqtemp
mkdir qqtemp
((i = 0))
while [[ $i -ne 5000 ]] ; do
#dig microsoft.com +short >qqtemp/microsoft.com.$i
dig microsoft.com +short >/dev/null
((i = i + 1))
done
The elapsed times at 5 runs each are:
File I/O | /dev/null
----------+-----------
3:09 | 1:52
2:54 | 2:33
2:43 | 3:04
2:49 | 2:38
2:33 | 3:08
After removing the outliers and averaging, the results are 2:49 for the file I/O and 2:45 for the /dev/null. The time difference is four seconds for 5000 iterations, only 1/1250th of a second per item.
However, since an iteration over the 5000 takes up to three minutes, that's the maximum time it will take to detect a problem (a minute and a half on average). If that's not acceptable, you need to move away from bash to another tool.
Given that a single dig only takes about 0.012 seconds, you should theoretically do 5000 in sixty seconds assuming your checking tool takes no time at all. You may be better off doing something like this in Perl and using an associative array to store the output from dig.
Perl's semi-compiled nature means that it will probably run substantially faster than a bash script and Perl's fancy stuff will make the job a lot easier. However, you're unlikely to get that 60-second time much lower just because that's how long it takes to run the dig commands.
