How can I get the number of logins for each day, from the beginning of the wtmp file, using AWK?
I thought about using an associative array, but I don't know how to implement it in AWK.
myscript.sh
#!/bin/bash
awk 'BEGIN{numberoflogins=0}
#code goes here'
The output of the last command:
[fnorbert#localhost Documents]$ last
fnorbert tty2 /dev/tty2 Mon Apr 24 13:25 still logged in
reboot system boot 4.8.6-300.fc25.x Mon Apr 24 16:25 still running
reboot system boot 4.8.6-300.fc25.x Mon Apr 24 13:42 still running
fnorbert tty2 /dev/tty2 Fri Apr 21 16:14 - 21:56 (05:42)
reboot system boot 4.8.6-300.fc25.x Fri Apr 21 19:13 - 21:56 (02:43)
fnorbert tty2 /dev/tty2 Tue Apr 4 08:31 - 10:02 (01:30)
reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:30 - 10:02 (00:-27)
fnorbert tty2 /dev/tty2 Tue Apr 4 08:14 - 08:26 (00:11)
reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:13 - 08:26 (-1:-47)
wtmp begins Mon Mar 6 09:39:43 2017
The shell script's output should be:
Apr 4: 4
Apr 21: 2
Apr 24: 3
using an associative array, if possible.
In awk, arrays can be indexed by strings or numbers, so you can use them like associative arrays.
However, what you're asking will be hard to do reliably with awk alone, because the delimiters are whitespace: empty fields will throw off the column numbering, and if you use FIELDWIDTHS you'll instead be thrown off by columns longer than their assigned width.
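The counting idiom itself is short: index the array with whatever string identifies the day, increment it, and dump the totals in an END block. A minimal, standalone illustration (the sample lines are made up, and the for-in order is not guaranteed):
$ printf 'Apr 24\nApr 24\nApr 4\n' | awk '{count[$0]++} END {for (d in count) print d": "count[d]}'
Apr 24: 2
Apr 4: 1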
If all you're looking for is just the number of logins per day you might want to use a combination of sed and awk (and sort):
last | \
sed -E 's/^.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([ 0-9]{2}).*$/\2 \3/p;d' | \
awk '{arr[$0]++} END { for (a in arr) print a": " arr[a]}' | \
sort -M
The sed -E uses extended regular expressions, and the pattern just prints the date portion of each line emitted by last (it matches on the day of week, but prints only the month and day).
We could have used uniq -c to get the counts, but using awk we can build an associative array as you hinted.
Finally, sort -M sorts on the abbreviated month names, so dates like Apr 24, Mar 16, etc. come out in calendar order.
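For comparison, the uniq -c variant mentioned above would look roughly like this; the count ends up in the first column instead of after the date, and the extra sort at the end (by month name, then day) is optional:
last | \
sed -E 's/^.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([ 0-9]{2}).*$/\2 \3/p;d' | \
sort | uniq -c | sort -k2M -k3n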
Try the following awk script (it assumes all logins fall within the same month, i.e. the current month):
myscript.awk:
#!/bin/awk -f
{
    a[NR] = $0                   # save each line, indexed by line number
}
END {
    for (i = NR; i >= 1; i--) {               # walk the saved lines in reverse order
        if (a[i] ~ /^wtmp begins/) continue   # skip the trailing "wtmp begins ..." line
        # the three-argument form of match() requires GNU awk (gawk)
        if (match(a[i], /[A-Z][a-z]{2} ([A-Z][a-z]{2}) *([0-9]{1,2}) [0-9]{2}:[0-9]{2}/, b)) {
            m = b[1]             # remember the month name
            c[b[2]]++            # count the logins per day of month
        }
    }
    for (i in c) print m, i": "c[i]
}
Usage:
last | awk -f myscript.awk
The output:
Apr 4: 4
Apr 21: 2
Apr 24: 3
I am currently struggling with splitting a string that contains a varying amount of spaces, coming from a log file.
An excerpt of the log file:
ProcessA Mon Nov 9 09:59 - 10:48 (00:48)
ProcessB Sun Nov 8 11:16 - 11:17 (00:00)
ProcessC Sat Nov 7 12:52 - 12:53 (00:00)
ProcessD Fri Nov 6 09:31 - 11:25 (01:54)
ProcessE Thu Nov 5 16:41 - 16:41 (00:00)
ProcessF Thu Nov 5 11:39 - 11:40 (00:00)
As you can see, the number of spaces between the process name and the date of execution varies between 2 and 5.
I would like to split it up into three parts: process, date of execution, and execution time.
However, I don't see a solution because of the unequal spacing. Am I wrong, or is splitting such a string incredibly hard?
Hopefully somebody out there is way smarter than me and can provide me with a solution for that 😊
Thanks in advance to everybody who is willing to help me with this!
You can also assign fields directly in a read.
while read -r prc wd mon md start _ end dur _; do
    echo "prc='$prc' wd='$wd' mon='$mon' md='$md' start='$start' end='$end' dur='${dur//[)(]/}'"
done < file
Output:
prc='ProcessA' wd='Mon' mon='Nov' md='9' start='09:59' end='10:48' dur='00:48'
prc='ProcessB' wd='Sun' mon='Nov' md='8' start='11:16' end='11:17' dur='00:00'
prc='ProcessC' wd='Sat' mon='Nov' md='7' start='12:52' end='12:53' dur='00:00'
prc='ProcessD' wd='Fri' mon='Nov' md='6' start='09:31' end='11:25' dur='01:54'
prc='ProcessE' wd='Thu' mon='Nov' md='5' start='16:41' end='16:41' dur='00:00'
prc='ProcessF' wd='Thu' mon='Nov' md='5' start='11:39' end='11:40' dur='00:00'
read generally doesn't care how much whitespace separates the fields.
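For instance, with made-up filler text and throwaway variable names:
$ printf 'ProcessA   Mon  Nov 9\n' | while read -r prc wd mon md; do echo "prc='$prc' wd='$wd' mon='$mon' md='$md'"; done
prc='ProcessA' wd='Mon' mon='Nov' md='9'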
In bash, you can use a regex to parse each line:
#! /bin/bash
while IFS=' ' read -r line ; do
    if [[ "$line" =~ ([^\ ]+)\ +(.+[^\ ])\ +'('([^\)]+)')' ]] ; then
        process=${BASH_REMATCH[1]}
        date=${BASH_REMATCH[2]}
        time=${BASH_REMATCH[3]}
        echo "$process $date $time."
    fi
done
Or, use parameter expansions:
#! /bin/bash
shopt -s extglob
while IFS=' ' read -r process datetime ; do
    date=${datetime%%+( )\(*}
    time=${datetime#*\(}
    time=${time%\)}
    echo "$process $date $time."
done
Using awk to re-emit each line with the fields separated by single spaces:
awk '{printf "%s", $1; for (i=2; i<NF; i++) printf " %s",$i; print "",$NF}' < file.txt
produces:
ProcessA Mon Nov 9 09:59 - 10:48 (00:48)
ProcessB Sun Nov 8 11:16 - 11:17 (00:00)
ProcessC Sat Nov 7 12:52 - 12:53 (00:00)
ProcessD Fri Nov 6 09:31 - 11:25 (01:54)
ProcessE Thu Nov 5 16:41 - 16:41 (00:00)
ProcessF Thu Nov 5 11:39 - 11:40 (00:00)
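If the goal is really the three separate parts rather than just normalized spacing, a rough awk sketch (assuming the first field is the process, the last field is the parenthesized duration, and everything in between is the date) could look like this:
awk '{
    process = $1                        # first field: process name
    dur = $NF; gsub(/[()]/, "", dur)    # last field: duration with parentheses stripped
    date = $2                           # remaining middle fields: date of execution
    for (i = 3; i < NF; i++) date = date " " $i
    print process "|" date "|" dur
}' file.txt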
I don't want to print repeated lines based on columns 6 and 7; sort -u does not seem to help.
The output of cat /tmp/testing:
-rwxrwxr-x. 1 root root 52662693 Feb 27 13:11 /home/something/bin/proxy_exec
-rwxrwxr-x. 1 root root 27441394 Feb 27 13:12 /home/something/bin/keychain_exec
-rwxrwxr-x. 1 root root 45570820 Feb 27 13:11 /home/something/bin/wallnut_exec
-rwxrwxr-x. 1 root root 10942993 Feb 27 13:12 /home/something/bin/log_exec
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
When I try cat /tmp/testing | sort -u -k 6,6 -k 7,7, I get:
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
-rwxrwxr-x. 1 root root 52662693 Feb 27 13:11 /home/something/bin/proxy_exec
The desired output is below, as that is the only file that differs from the others in the month and date columns:
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
To [not] print repeated lines based on columns 6 and 7 using awk, you could:
$ awk '
++seen[$6,$7]==1 {          # count seen instances
    keep[$6,$7]=$0          # keep the first seen ones
}
END {                       # in the end
    for(i in seen)
        if(seen[i]==1)      # the ones seen only once
            print keep[i]   # get printed
}' file                     # from a file, or pipe your ls output to the awk
Output for given input:
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
Notice: All standard warnings against parsing ls output still apply.
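If avoiding ls parsing matters here, one alternative sketch (assuming GNU find, and that the files of interest sit directly under /home/something/bin) builds the month/day key itself; it only demonstrates the uniqueness test, so the printed lines lack the permission and size columns:
find /home/something/bin -maxdepth 1 -type f -printf '%Tb %Td %p\n' |
awk '{ key = $1 FS $2; seen[key]++; keep[key] = $0 }
     END { for (k in seen) if (seen[k] == 1) print keep[k] }'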
Tried with GNU sed (this deletes every line whose sixth and seventh fields are the repeated Feb 27 key from the sample):
sed -E '/^\s*(\S+\s+){5}Feb\s+27/d' testing
Tried with GNU awk (this remembers the key of the first line and prints only the lines whose key differs):
awk 'NR==1{a=$6$7;next} a!=$6$7{print}' testing
I'm trying to write a shell script that displays unique names, usernames, and dates using the finger command.
Right now when I enter finger, it displays:
Login Name Tty Idle Login Time Office
1xyz xyz pts/13 Dec 2 18:24 (76.126.34.32)
1xyz xyz pts/13 Dec 2 18:24 (76.126.34.32)
2xxxx xxxx pts/23 2 Dec 2 21:35 (108.252.136.12)
2zzzz zzzz pts/61 13 Dec 2 20:46 (24.4.205.223)
2yyyy yyyy pts/32 57 Dec 2 21:06 (205.154.255.145)
1zzz zzz pts/35 37 Dec 2 20:56 (71.198.36.189)
1zzz zzz pts/48 12 Dec 2 20:56 (71.198.36.189)
I would like the script to eliminate the duplicate entries per username and display the result like:
xyz (1xyz) Dec 2 18:24
xxxx (2xxxx) Dec 2 21:35
zzzz (2zzzz) Dec 2 20:46
yyyy (2yyyy) Dec 2 21:06
zzz (1zzz) Dec 2 20:56
The name is in the first column, the username is in parentheses, and the date is the last column.
Thanks in advance!
Ugly but should work.
finger | sed 's/\t/ /' | sed 's/pts\/[0-9]* *[0-9]*//' | awk '{print $2"\t("$1")\t"$3" "$4" "$5}' | sort | uniq
Unique names with sort -u is the easy part.
When you only want to parse the data in your example, you can try matching all strings in one command.
finger | sed 's/^\([^ ]*\) *\([^ ]*\) *pts[^A-Z]*\([^(]*\).*/\2\t(\1)\t\3/'
However, this is hard work and bound to fail sooner or later. My finger returns:
Login Name Tty Idle Login Time Where
notroot notroot *:0 - Nov 26 15:30 console
notroot notroot pts/0 7d Nov 26 15:30
notroot notroot *pts/1 - Nov 26 15:30
You can try to improve the sed command; good luck with that!
I think the only reliable way is looking at the columns: read the finger output one line at a time and slice each line into parts with ${line:start:len} (removing the padding spaces afterwards), then count the unique results (and beware of that_user_with_a_long_name).
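A rough sketch of that column-slicing idea follows; the offsets are guesses based on the example header and will almost certainly need adjusting for the real finger layout on your system:
finger | tail -n +2 | while IFS= read -r line; do
    login=${line:0:9}                  # assumed width of the "Login" column
    name=${line:9:9}                   # assumed width of the "Name" column
    when=${line:35:12}                 # assumed position of the "Login Time" column
    read -r login <<< "$login"         # strip the padding spaces
    read -r name  <<< "$name"
    read -r mon day tm <<< "$when"
    printf '%s (%s) %s %s %s\n' "$name" "$login" "$mon" "$day" "$tm"
done | sort -u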
I am trying to search for the string 'Cannot proceed: the database is empty' in the file out.log, from bottom to top only (the log file is quite huge, and each day's entries are appended at the end), and only between the timestamps yesterday 10:30 pm and today 00:30 am.
Extract from out.log is as below:
[Thu Jun 5 07:56:17 2014]Local/data///47480280486528/Info(1019022)
Writing Database Mapping For [data]
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1250008)
Setting Outline Paging Cachesize To [8192KB]
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1013202)
Cannot proceed: the database is empty
[Thu Jun 5 07:56:20 2014]Local/data///47480280486528/Info(1013205)
Received Command [Load Database]
[Thu Jun 5 07:56:21 2014]Local/data///47480280486528/Info(1019018)
Writing Parameters For Database
I searched on Google and SO and explored commands like sed and grep, but unfortunately grep doesn't parse timestamps and sed prints all lines between two patterns.
Can anybody please let me know how I can achieve this?
You can compare the day and time fields directly in awk (here, day 5 after 22:30 or day 6 up to 00:30):
tac file | awk '/Cannot proceed: the database is empty/ {f=$0; next} f{if (($3==5 && $4>"22:30:00") || ($3==6 && $4<="00:30:00")) {print; print f} f=""}'
Test
For this given file:
$ cat a
[Thu Jun 5 07:56:17 2014]Local/data///47480280486528/Info(1019022)
Writing Database Mapping For [data]
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1250008)
Setting Outline Paging Cachesize To [8192KB]
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1013202)
Cannot proceed: the database is empty
[Thu Jun 5 07:56:20 2014]Local/data///47480280486528/Info(1013205)
Received Command [Load Database]
[Thu Jun 5 07:56:21 2014]Local/data///47480280486528/Info(1019018)
Writing Parameters For Database
[Thu Jun 5 23:56:20 2014]Local/data///47480280486528/Info(1013205)
Writing Parameters For Database
[Thu Jun 5 23:56:20 2014]Local/data///47480280486528/Info(1013205)
Cannot proceed: the database is empty
[Thu Jun 5 22:56:21 2014]Local/data///47480280486528/Info(1019018)
Cannot proceed: the database is empty
It returns:
$ tac a | awk '/Cannot proceed: the database is empty/ {f=$0; next} f{if (($3==5 && $4>"22:30:00") || ($3==6 && $4<="00:30:00")) {print; print f} f=""}'
[Thu Jun 5 22:56:21 2014]Local/data///47480280486528/Info(1019018)
Cannot proceed: the database is empty
[Thu Jun 5 23:56:20 2014]Local/data///47480280486528/Info(1013205)
Cannot proceed: the database is empty
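For an actual yesterday-22:30-to-today-00:30 window, a more robust variant (a sketch, assuming GNU awk for mktime() and GNU date for computing the boundaries) compares real epoch timestamps instead of bare day and time fields:
lo=$(date -d 'yesterday 22:30' +%s)    # lower bound of the window
hi=$(date -d 'today 00:30' +%s)        # upper bound of the window
tac out.log | gawk -v lo="$lo" -v hi="$hi" '
/Cannot proceed: the database is empty/ { msg = $0; next }
msg {
    # timestamp line, e.g. "[Thu Jun  5 07:56:17 2014]Local/...":
    # $2 = month name, $3 = day of month, $4 = HH:MM:SS, the year is glued to $5
    split($4, t, ":")
    m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $2) + 2) / 3
    ts = mktime(substr($5, 1, 4) " " m " " $3 " " t[1] " " t[2] " " t[3])
    if (ts >= lo && ts <= hi) { print; print msg }
    msg = ""
}'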
You can get the line with this:
awk '/Cannot proceed: the database is empty/{ts = last; msg = $0; next}; {last = $0}; END{if (ts) printf "%s\n%s\n", ts, msg}' log
Output:
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1013202)
Cannot proceed: the database is empty
It should be easy to refine the code depending on which part is really needed.
I'm just wondering how we can use awk to do exact matches.
For example:
$ cal 09 09 2009
   September 2009
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30
$ cal 09 09 2009 | awk '{day="9"; col=index($0,day); print col }'
17
0
0
11
20
0
8
0
As you can see, the above command outputs an index for every line of the cal output (0 for lines that don't contain a "9"). Is there a way to make awk output the index for only the 4th line of the cal output above? Or maybe there is an even more elegant solution?
I'm using awk to get the day name from the cal command. Here's the whole line of code:
$ dayOfWeek=$(cal $day $month $year | awk '{day='$day'; split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", array); column=index($0,day); dow=int((column+2)/3); print array[dow]}')
The problem with the above code is that if multiple matches are found, I get multiple results, whereas I want it to output only one result.
Thanks!
Limit the processing to only those lines which contain your "day" as a standalone word:
awk -v day=$day 'BEGIN{split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", array)} $0 ~ "\\<"day"\\>"{for(i=1;i<=NF;i++)if($i == day){print array[i]}}'
Proof of Concept
$ cal 02 1956
   February 1956
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29
$ day=18; cal 02 1956 | awk -v day=$day 'BEGIN{split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", array)} $0 ~ "\\<"day"\\>"{for(i=1;i<=NF;i++)if($i == day){print array[i]}}'
Saturday
Update
If all you are looking for is to get the day of the week from a certain date, you should really be using the date command like so:
$ day=9;month=9;year=2009;
$ dayOfWeek=$(date +%A -d "$month/$day/$year")
$ echo $dayOfWeek
Wednesday
you wrote
cal 09 09 2009
I'm not aware of a version of cal that accepts a day of month as input, only
cal ${mon} (optional) ${year} (optional)
But, that doesn't affect your main issue.
you wrote
is there a way to make awk output index number in only the 4th line of cal output above.?
NR (the number of the current record) is your friend,
and there are numerous ways to use it.
cal 09 09 2009 | awk 'NR==4{day="9"; col=index($0,day); print col }'
OR
cal 09 09 2009 | awk '{day="9"; if (NR==4) {col=index($0,day); print col } }'
ALSO
In awk, if you have variable assignments that should apply to your whole program, it is better to put them in the BEGIN section so that the assignment is only performed once. Not a big deal in your example, but why develop bad habits ;-)?
HENCE
cal 09 2009 | awk 'BEGIN{day="9"}; NR==4 {col=index($0,day); print col }'
FINALLY
It is not completely clear what problem you are trying to solve. Are you sure you always want to grab line 4? If not, then how do you propose to solve that?
Problems stated as " 1. I am trying to do X. 2. Here is my input. 3. Here is my output. 4. Here is the code that generated that output" are much easier to respond to.
It looks like you're trying to do date calculations. You can get much more robust and general solutions by using the GNU date command. I have seen numerous useful discussions of this tagged as bash, shell (and possibly date).
I hope this helps.
This is so much easier to do in a language that has time functionality built-in. Tcl is great for that, but many other languages are too:
$ echo 'puts [clock format [clock scan 9/9/2009] -format %a]' | tclsh
Wed
If you want awk to only output for line 4, restrict the rule to line 4:
$ awk 'NR == 4 { ... }'