AWK - Getting all columns where first is = $var && Date >= $date - bash

I'm new to AWK and am trying to work out how to get all the results where the first column equals a variable and the date is greater than another variable formatted as YYYYMMDDHHMMSS. I'm using 'last' as my command. Example output is:
bob pts/2 172.6.14.37 Fri July 24 12:43 - 12:17 (9+23:34)
bob pts/2 172.6.14.37 Fri July 24 10:03 - 12:17 (5+23:34)
bob pts/2 172.6.14.37 Tue June 4 17:55 - 09:42 (8+15:46)
bob pts/2 172.6.14.37 Tue Mar 4 17:55 - 09:42 (8+15:46)
tim pts/1 172.6.14.37 Mon Mar 3 16:22 - 17:30 (1+01:08)
root pts/1 172.6.14.37 Thu Feb 27 09:38 - 09:56 (4+00:18)
and so I want all the results where 'bob' is in the first column. I've got
last -f /var/log/btmp | awk '$1 == "bob"'
Which gives me all of bob's failed logins. Now I need to filter again, keeping rows where the date field is greater than, say, '20140723145100', with something like:
last -f /var/log/btmp | awk '$1 == "bob" && $4 >= $DATE'
Assuming $DATE = 20140723145100, the result I would want would be:
bob pts/2 172.6.14.37 Fri July 24 12:43 - 12:17 (9+23:34)
bob pts/2 172.6.14.37 Fri July 24 10:03 - 12:17 (5+23:34)

bash:
user=bob
since=20140623145100
last -Fa -f /var/log/btmp |
while read -r line; do
    set -- $line    # no quotes here: split the line into $1..$n
    [[ $1 == "$user" ]] || continue
    # with -Fa the full login date sits in fields 3-7 (Dow Mon DD HH:MM:SS YYYY)
    [[ $(date -d "$3 $4 $5 $6 $7" +%Y%m%d%H%M%S) > $since ]] && echo "$line"
done

Use the -s option in last:
last -s 20140723145100
From man last:
-s, --since time
Display the state of logins since specified time. This is useful,
e.g., to determine easily who was logged in at a particular time. The
option is often combined with --until.
And then grep for the user:
last -s 20140723145100 | grep "^bob"
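Note that last itself also accepts usernames as arguments, so the grep can often be dropped (a sketch, assuming a util-linux last):
last -s 20140723145100 bob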
If your last does not have the -s option, you can use this workaround: store the full last output and the output up to a certain time (using the -t option), then compare the two:
last -f /var/log/btmp | grep "^bob" > everything
last -f /var/log/btmp -t "20140723145100" | grep "^bob" > upto_20140723145100
grep -vf upto_20140723145100 everything

Using GNU Awk:
gawk -v user=bob -v date=20140723145100 -F '[[:space:]]{3,}| - ' '$1 == user { cmd = "exec date -d \"" $4 "\" +%Y%m%d%H%M%S"; cmd | getline d; close(cmd); if (d >= date) print }' sample
Output:
bob pts/2 172.6.14.37 Fri July 24 12:43 - 12:17 (9+23:34)
bob pts/2 172.6.14.37 Fri July 24 10:03 - 12:17 (5+23:34)
Of course, the actual command is last -f /var/log/btmp | gawk -v user=bob -v date=20140723145100 ....
And here's a script version:
#!/usr/bin/gawk -f
BEGIN {
    FS = "[[:space:]]{3,}| - "
}
$1 == user {
    cmd = "exec date -d \"" $4 "\" +%Y%m%d%H%M%S"
    cmd | getline d
    close(cmd)
    if (d >= date)
        print
}
Usage:
last -f /var/log/btmp | gawk -v user=bob -v date=20140723145100 -f script.awk
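As a side note, spawning date once per matching line gets slow on large btmp files. Here is a gawk-only sketch (not part of the answer above) that builds the comparable %Y%m%d%H%M%S string itself; it assumes every entry falls in the current year, since plain last output omits the year:
last -f /var/log/btmp | gawk -v user=bob -v date=20140723145100 '
BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
    for (i in m) mon[m[i]] = sprintf("%02d", i)
    year = strftime("%Y")    # assumption: every entry is from the current year
}
$1 == user {
    # $5 = month name, $6 = day, $7 = login HH:MM
    split($7, hm, ":")
    d = year mon[substr($5, 1, 3)] sprintf("%02d", $6) hm[1] hm[2] "00"
    if (d >= date) print
}'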

Related

Get the longest logon time of a given user using awk

My task is to write a bash script, using awk, to find the longest logon of a given user ("still logged in" does not count), and print the month day IP logon time in minutes.
Sample input: ./scriptname.sh username1
Content of last username1:
username1 pts/ IP Apr 2 .. .. .. .. (00.03)
username1 pts/ IP Apr 3 .. .. .. .. (00.13)
username1 pts/ IP Apr 5 .. .. .. .. (12.00)
username1 pts/ IP Apr 9 .. .. .. .. (12.11)
Sample output:
Apr 9 IP 731
(note: 12 hours and 11 minutes is in total 731 minutes)
I have written this script, but a bunch of errors pop up, and I am really confused:
#!/bin/bash
usr=$1
last $usr | grep -v "still logged in" | awk 'BEGIN {max=-1;}
{
    h=substr($10,2,2);
    min=substr($10,5,2) + h/60;
}
(max < min){
    max = min;
}
END{
    maxh=max/60;
    maxmin=max-maxh;
    ($maxh == 0 && $maxmin >=10){
        last $usr | grep "00:$maxmin" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
    ($maxh == 0 $$ $maxmin < 10){
        last $usr | grep "00:0$maxmin" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
    ($maxh < 10 && $maxmin == 0){
        last $usr | grep "0$maxh:00" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
    ($maxh < 10 && $maxmin < 10){
        last $usr | grep "0$maxh:0$maxmin" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
    ($maxh >= 10 && $maxmin < 10){
        last $usr | grep "$maxh:0$maxmin" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
    ($maxh >=10 && $maxmin >= 10){
        last $usr | grep "$maxh:$maxmin" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
}'
A bit of explanation of how I imagined this would work:
After the initialization, I want to find the (hh:mm) column of the last $usr command, save the hours and minutes of every line, and find the biggest number (in minutes), meaning the longest logon time.
After I have found the longest logon time (in minutes, stored in the variable max), I have to reformat the minutes-only value back to hh:mm so I can grep for it, run last again searching only for the line(s) that contain the max logon time, and print the needed information in the month day IP logon-time-in-minutes format using another awk.
Errors I get when running this code: a bunch of syntax errors when I try to use grep and awk inside the original awk.
awk is not shell. You can't directly call tools like last, grep and awk from awk any more than you could call them directly from a C program.
Using any awk in any shell on every Unix box, and assuming that if multiple rows share the max time you want all of them printed, and that if no timestamped rows are found you want something like No matching records printed (both are easy tweaks if not; just tell us your requirements for those cases and include them in the example in your question):
last username1 |
awk '
    /still logged in/ { next }
    {
        split($NF,t,/[().]/)
        cur = (t[2] * 60) + t[3]
    }
    cur >= max {
        out = ( cur > max ? "" : out ORS ) $4 OFS $5 OFS $3 OFS cur
        max = cur
    }
    END {
        print (out ? out : "No matching records")
    }
'
Apr 9 IP 731
If gnu-awk is available, you might use a pattern with 2 capture groups for the numbers in the last field, then print the format that you want in the END block.
In this example, file contains the sample content and the last column holds the logon duration:
awk '
match($NF, /\(([0-9]+)\.([0-9]+)\)/, a) {
    hm = (a[1] * 60) + a[2]
    if (hm > max) { max = hm; line = $0 }
}
END {
    n = split(line, a, /[[:space:]]+/)
    print a[3], a[4], a[5], max
}
' file
Output
IP Apr 9 731
Testing the last command on my machine (Red Hat Linux 7.8), I got the following output:
user0022 pts/1 10.164.240.158 Sat Apr 25 19:32 - 19:47 (00:14)
user0022 pts/1 10.164.243.80 Sat Apr 18 22:31 - 23:31 (1+01:00)
user0022 pts/1 10.164.243.164 Sat Apr 18 19:21 - 22:05 (02:43)
user0011 pts/0 10.70.187.1 Thu Nov 21 15:26 - 18:37 (03:10)
user0011 pts/0 10.70.187.1 Thu Nov 7 16:21 - 16:59 (00:38)
astukals pts/0 10.70.187.1 Mon Oct 7 19:10 - 19:13 (00:03)
reboot system boot 3.10.0-957.10.1. Mon Oct 7 22:09 - 14:30 (156+17:21)
astukals pts/0 10.70.187.1 Mon Oct 7 18:56 - 19:08 (00:12)
reboot system boot 3.10.0-957.10.1. Mon Oct 7 21:53 - 19:08 (-2:-44)
IT pts/0 10.70.187.1 Mon Oct 7 18:50 - 18:53 (00:03)
IT tty1 Mon Oct 7 18:48 - 18:49 (00:00)
user0022 pts/1 30.30.30.168 Thu Apr 16 09:43 - 14:54 (05:11)
user0022 pts/1 30.30.30.59 Wed Apr 15 11:48 - 04:59 (17:11)
user0022 pts/1 30.30.30.44 Tue Apr 14 19:03 - 04:14 (09:11)
Found that the duration format is DD+HH:MM, where the DD+ part appears only when the day count is not zero.
Found that there are additional technical entries (IT, system, reboot) that need to be filtered out.
Suggested solution:
last | awk 'BEGIN { FS = "[ ()+:]*" }
/reboot|system|still/ { next }
{ print $5 OFS $6 OFS $3 OFS $(NF-1) + ($(NF-2) * 60) + ($(NF-3) * 60 * 24) }
' | sort -rnk 4 | head -1
Result:
Apr 15 30.30.30.59 85991
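One caveat with the FS trick above: on lines whose duration has no DD+ part, $(NF-3) picks up the logout minutes instead of a day count, which inflates the totals (that is where 85991 comes from). Here is a sketch that parses only the parenthesized duration instead, assuming GNU awk for the three-argument match(); the IT entries noted above would still need their own filter:
last | gawk '
/reboot|system|still/ { next }
match($0, /\((([0-9]+)\+)?([0-9]+):([0-9]+)\)/, a) {
    # a[2] = days (may be empty), a[3] = hours, a[4] = minutes
    print $5, $6, $3, a[2] * 1440 + a[3] * 60 + a[4]
}' | sort -rnk 4 | head -1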

calculate start and end of working, grep in logfile

I need to know when I can do maintenance on a frequently used system. All I can check is a logfile, where I can see when the users, on average, start and end their work.
I need to do this for weekdays, Saturday and Sunday.
I know how to grep this information, but I don't know how to separate weekdays from weekends or how to build an average from the timestamps. Can anyone help me with that please? Kind regards
Edit: More information as requested
Here is my script so far:
i=14
while [ $i -ge 0 ]; do
    dow=$(date -d "-$i day" +%A)
    if [ $dow = "Saturday" ] || [ $dow = "Sunday" ]; then
        i=$((i-1)); continue    # skip weekend days
    fi
    beginnweek+=(`zgrep T400: logfile|grep -v 'T811:Icinga'|head -n 1|cut -d " " -f2`)
    endweek+=(`zgrep T400: logfile|grep -v 'T811:Icinga'|tail -n 1|cut -d " " -f2`)
    i=$((i-1))
done
### calculate average begin and end - that's what's missing
i=14
while [ $i -ge 0 ]; do
    dow=$(date -d "-$i day" +%A)
    if [ $dow = "Monday" ] || [ $dow = "Tuesday" ] || [ $dow = "Wednesday" ] || [ $dow = "Thursday" ] || [ $dow = "Friday" ] || [ $dow = "Sunday" ]; then
        i=$((i-1)); continue    # keep only Saturdays
    fi
    beginnSat+=(`zgrep T400: logfile|grep -v 'T811:Icinga'|head -n 1|cut -d " " -f2`)
    endSat+=(`zgrep T400: logfile|grep -v 'T811:Icinga'|tail -n 1|cut -d " " -f2`)
    i=$((i-1))
done
### calculate average begin and end - that's what's missing
i=14
while [ $i -ge 0 ]; do
    dow=$(date -d "-$i day" +%A)
    if [ $dow = "Monday" ] || [ $dow = "Tuesday" ] || [ $dow = "Wednesday" ] || [ $dow = "Thursday" ] || [ $dow = "Friday" ] || [ $dow = "Saturday" ]; then
        i=$((i-1)); continue    # keep only Sundays
    fi
    beginnSun+=(`zgrep T400: logfile|grep -v 'T811:Icinga'|head -n 1|cut -d " " -f2`)
    endSun+=(`zgrep T400: logfile|grep -v 'T811:Icinga'|tail -n 1|cut -d " " -f2`)
    i=$((i-1))
done
### calculate average begin and end - that's what's missing
I'm working with
GNU bash, version 4.2.46
on SLES and with
GNU bash, version 3.1.17
The logfiles are looking like this:
19/10/2018 04:00:03.175 : [32631] INFO : (8) >>\\\\\\\\\\T090:NOPRINT,NOSAVE|T400:551200015480|T811:Icinga|T8904:001|T8905:001|//////////
19/10/2018 07:17:19.501 : [4935] INFO : >>\\\\\\\\\\T021:datamax|T050:software|T051:V 1.0|T101:|T400:428568605212|T520:00000000|T510:|T500:|T545:19.10.2018||T821:DE|PRINTINFO:|PRINT1:|PRINT0:intermec pf4i.int01|//////////
First of all you should ask yourself if you really want to use an average. An average only makes sense if all users log in in the morning, stay logged in over noon, and log out in the evening. If logouts are distributed all over the day, the average logout time is meaningless.
But even in such an idealized case you shouldn't start maintenance right after the average logout time, since around 50% of the users would still be logged in at that time.
I would rather visualize logins as bars and determine a good maintenance time by hand. ranwhen.py is a very nice tool to display when your system was up. Maybe you can find something similar for logins or adapt the tool yourself.
Nevertheless, here's what you asked for:
Parsing The Logs
Instead of parsing the log manually, I would advise you to use the last tool, which prints the last logins in a simpler format. Since you are on Linux, there should be an -F option for last to print dates prefixed with their weekday. With -R we suppress some unneeded information. The output of last -FR looks as follows:
socowi pts/5 Fri Oct 19 17:42:16 2018 still logged in
reboot system boot Fri Oct 19 14:34:44 2018 still running
alice pts/2 Fri Oct 19 10:35:05 2018 - Fri Oct 19 11:51:03 2018 (01:15)
alice tty7 Fri Oct 19 10:24:32 2018 - Fri Oct 19 11:51:52 2018 (01:27)
bob tty7 Fri Oct 19 10:04:21 2018 - Fri Oct 19 10:14:01 2018 (00:09)
reboot system boot Fri Oct 19 12:03:34 2018 - Fri Oct 19 11:51:55 2018 (00:-11)
carol tty7 Fri Oct 19 08:10:49 2018 - down (01:50)
dave tty7 Thu Oct 18 12:48:12 2018 - crash (04:28)
wtmp begins Tue Oct 16 12:38:03 2018
To extract the valid login and logout dates we use the following functions.
onlyUsers() { last -FR | head -n -2 | grep -Ev '^reboot '; }
onlyDates() { grep -F :; }
loginDates() { onlyUsers | cut -c 23-46 | onlyDates; }
logoutDates() { onlyUsers | cut -c 50-73 | onlyDates; }
Filter By Weekday
The functions loginDates and logoutDates print something like
Fri Oct 19 17:42:16 2018
Fri Oct 19 14:34:44 2018
[...]
Thu Oct 18 12:48:12 2018
Filtering out specific weekdays is pretty easy:
workweek() { grep -E 'Mon|Tue|Wed|Thu|Fri'; }
weekend() { grep -E 'Sat|Sun'; }
If you want all login dates on weekends, you would write loginDates | weekend.
Computing An Average Time
To compute the average time from multiple dates, we first extract the time of day from the dates. Then we convert the HH:MM format to minutes since midnight. Computing an average of a list of numbers is easy. Afterwards we convert back to HH:MM.
timeOfDay() { cut -c 12-16; }
timeToMins() { awk -F: '{print $1*60 + $2}'; }
minsToTime() { awk '{printf "%02d:%02d", $1/60, $1%60}'; }
avgMins() { awk '{s+=$1}END{printf "%d", s/NR}'; }
avgTime() { timeOfDay | timeToMins | avgMins | minsToTime; }
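Given the caveat above that around 50% of the users are still logged in at the average logout time, a high percentile may be a safer cutoff than the mean. A sketch reusing the helpers above; the 0.9 quantile is an arbitrary choice:
pctMins() { sort -n | awk -v p=0.9 '{ v[NR] = $1 } END { i = int(NR * p); if (i < 1) i = 1; print v[i] }'; }
pctTime() { timeOfDay | timeToMins | pctMins | minsToTime; }
# e.g. the time by which ~90% of weekend logouts have happened:
late="$(logoutDates | weekend | pctTime)"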
Putting Everything Together
To get the average times just combine the commands as needed. Some examples:
# Average login times during workweeks
avg="$(loginDates | workweek | avgTime)"
# Average logout times on weekends
avg="$(logoutDates | weekend | avgTime)"

Csh - Fetching fields via awk inside xargs

I'm struggling to understand this behavior:
Script behavior: read a file (containing dates); print a list of files in a multi-level directory tree and get their size; print the file size only (future step: sum the overall file size).
Starting script:
cat dates | xargs -I {} sh -c "echo '{}: '; du -d 2 "/folder/" | grep {} | head"
2000-03:
1000 /folder/2000-03balbasldas
2000-04:
12300 /folder/2000-04asdwqdas
[and so on]
But when I try to filter via awk on the first field, I still get the whole line:
cat dates | xargs -I {} sh -c "echo '{}: '; du -d 2 "/folder/" | grep {} | awk '{print $1}'"
2000-03:
1000 /folder/2000-03balbasldas
2000-04:
12300 /folder/2000-04asdwqdas
I've already approached it via divide-et-impera, and the following command works just fine:
du -d 2 "/folder/" | grep '2000-03' | awk '{print $1}'
1000
I'm afraid that I'm missing something very trivial, but I haven't found anything so far.
Any idea? Thanks!
Input: directory containing folders named YYYY-MM-random_data and a file containing strings:
ls -l
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-03-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-04-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-05-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-06-blablabla
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-06-blablablb
drwxr-xr-x 2 user staff 68 Apr 24 11:21 2000-06-blablablc
[...]
cat dates
2000-03
2000-04
2000-05
[...]
Expected output: sum of the disk space occupied by all the files contained in the folders whose names include the string in the file dates
2000-03: 1000
2000-04: 2123
2000-05: 1222112
[...]
======
But in particular, I'm interested in why awk is not able to fetch the column $1 I asked it to.
Ok it seems I found the answer myself after a lot of research :D
I'll post it here, hoping that it will help somebody else out.
https://unix.stackexchange.com/questions/282503/right-syntax-for-awk-usage-in-combination-with-other-command-inside-xargs-sh-c
The trick was to escape the $ sign, so that the outer shell does not expand $1 inside the double-quoted sh -c string before awk ever sees it.
cat dates | xargs -I {} sh -c "echo '{}: '; du -d 2 "/folder/" | grep {} | awk '{print \$1}'"
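Alternatively, the nested quoting can be sidestepped entirely by dropping xargs and letting awk both filter and sum. A sketch that also covers the "future step" of summing the sizes, using the /folder/ path and dates file from the question:
while read -r d; do
    printf '%s: ' "$d"
    du -d 2 /folder/ | awk -v d="$d" '$0 ~ d { s += $1 } END { print s + 0 }'
done < dates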
Using GNU Parallel it looks like this:
parallel --tag "eval du -s folder/{}* | perl -ne '"'$s+=$_ ; END {print "$s\n"}'"'" :::: dates
--tag prepends the line with the date.
{} is replaced with the date.
eval du -s folder/{}* finds all the dirs starting with the date and gives the total du from those dirs.
perl -ne '$s+=$_ ; END {print "$s\n"}' sums up the output from du
Finally, there is a bit of quoting trickery to get it all quoted correctly.
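If the eval/quoting dance gets unwieldy, the sum can also be done in awk, which needs only one escaped $ (a sketch, assuming GNU Parallel and the same folder/ layout):
parallel --tag 'du -s folder/{}* | awk "{ s += \$1 } END { print s + 0 }"' :::: dates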

How can I extract the data between two time in two or more log files

I have two log files, namely Log1.log and Log2.log, each containing the following data.
Log1.log:
Apr 10 02:07:20 Data 1
May 10 04:11:09 Data 2
June 11 06:22:35 Data 3
Aug 12 09:08:07 Data 4
Log2.log
Apr 10 09:07:20 Data 1
Apr 10 10:07:10 Data 2
Jul 11 11:07:30 Data 3
Aug 18 12:50:40 Data 4
What command can I use to get the data between Apr 10 02:07:20 and Aug 18 12:50:40?
I have used
$ awk -v start=01:06:04 -v stop=01:07:16 'start <= $3 && $3 <= stop' Log1.log Log2.log
I have also used
awk -v StartTime="$StartTime" -v EndTime="$EndTime" -f script.sh Log1.log Log2.log
where script.sh contains:
BEGIN { Keep = 0; }
{
    if ($3 >= StartTime)
    {
        keep = 1;
    }
    if ($3 > EndTime)
    {
        exit;
    }
    if (keep)
    {
        print;
    }
}
I am not getting the desired result. Can someone help me improve my answer? Thanks in advance.
I would first use sort to sort the input. Then I would use sed to extract that range:
LC_TIME=C sort -t' ' -k1,1M -k2,3n 1.log 2.log \
| sed -n '/Apr 10 02:07:20/,/Aug 18 12:50:40/p'
Btw, it is not fully clear to me whether you want to exclude or include the range borders. The above example includes them; the example below excludes them:
LC_TIME=C sort -t' ' -k1,1M -k2,3n 1.log 2.log \
| sed -n '/Apr 10 02:07:20/,/Aug 18 12:50:40/{/Apr 10 02:07:20/!{/Aug 18 12:50:40/!p}}'
GNU sed, at least, allows simplifying the latter command to:
LC_TIME=C sort -t' ' -k1,1M -k2,3n 1.log 2.log \
| sed -n '/Apr 10 02:07:20/,/Aug 18 12:50:40/{//!p}'
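If the range borders live in shell variables, the same sed range can be built with double quotes (a sketch; it assumes the timestamps occur verbatim in the sorted logs and contain no regex metacharacters):
start='Apr 10 02:07:20'
stop='Aug 18 12:50:40'
LC_TIME=C sort -t' ' -k1,1M -k2,3n 1.log 2.log \
| sed -n "/$start/,/$stop/p"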

pick up files based on dates in ksh script

I have this list of files. Now I have to pick the latest file based on some conditions:
3679 Jul 21 23:59 belk_rpo_error_**po9324892**_07212014.log
0 Jul 22 23:59 belk_rpo_error_**po9324892**_07222014.log
3679 Jul 23 23:59 belk_rpo_error_**po9324892**_07232014.log
22 Jul 22 06:30 belk_rpo_error_**po9324267**_07012014.log
0 Jul 20 05:50 belk_rpo_error_**po9999992**_07202014.log
411 Jul 21 06:30 belk_rpo_error_**po9999992**_07212014.log
742 Jul 21 07:30 belk_rpo_error_**po9999991**_07212014.log
0 Jul 23 2014 belk_rpo_error_**po9999991**_07232014.log
For a PARTICULAR Order_No (marked with ** **):
If the latest file is 0 kB, we will discard it (and the rest of the files with the same Order_No as well).
If the latest file is non-zero, I will take it (only the latest one).
Then append the contents to a txt file.
My expected output would be:
411 Jul 21 06:30 belk_rpo_error_**po9999992**_07212014.log
3679 Jul 23 23:59 belk_rpo_error_**po9324892**_07232014.log
22 Jul 22 06:30 belk_rpo_error_**po9324267**_07012014.log
I am at my wits' end here. I can't seem to figure out how to compare dates in Unix. Any help is much appreciated.
You can try something like:
touch test.txt
for var in `find . ! -empty -exec ls -r {} \;`
do
    cat $var >> test.txt
done
untested
use stat to emit date (epoch time), size and filename.
use awk to filter out zero-length files and extract order number.
sort by order number and date
awk to pick up the last filename for each order number
stat -c $'%Y\t%s\t%n' *.log |
awk -F'\t' -v OFS='\t' '
$2 > 0 {
split($3, a, /_/)
print a[4], $1, $3
}' |
sort -t $'\t' -k1,1 -k2,2n |
awk -F'\t' '
NR > 1 && $1 != prev_order {print filename}
{filename = $3; prev_order = $1}
END {print filename}
'
The sort command might be wrong: in order to group by order number, you might need to sort first by file time, then by order number.
If I understand your question, the resulting files need to be concatenated and appended to a file. If the above pipeline is working OK, then pipe into | xargs cat >> something.log
