using awk to do exact match in a file - bash

I'm wondering how to use awk to do exact matches.
For example:
$ cal 09 09 2009
September 2009
Su Mo Tu We Th Fr Sa
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30
$ cal 09 09 2009 | awk '{day="9"; col=index($0,day); print col }'
17
0
0
11
20
0
8
0
As you can see, the command above prints the index of the string "9" for every line, with 0 where there is no match. Is there a way to make awk print the index only for the 4th line of the cal output above? Or is there perhaps a more elegant solution?
I'm using awk to get the day name from the cal command. Here's the whole line of code:
$ dayOfWeek=$(cal $day $month $year | awk '{day='$day'; split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", array); column=index($0,day); dow=int((column+2)/3); print array[dow]}')
The problem with the above code is that if multiple matches are found then I get multiple results, whereas I want it to output only one result.
Thanks!

Limit the match to those lines that contain your "day" as a whole word, then compare each field against it exactly (\< and \> are GNU awk word-boundary operators):
awk -v day=$day 'BEGIN{split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", array)} $0 ~ "\\<"day"\\>"{for(i=1;i<=NF;i++)if($i == day){print array[i]}}'
Proof of Concept
$ cal 02 1956
February 1956
Su Mo Tu We Th Fr Sa
1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29
$ day=18; cal 02 1956 | awk -v day=$day 'BEGIN{split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", array)} $0 ~ "\\<"day"\\>"{for(i=1;i<=NF;i++)if($i == day){print array[i]}}'
Saturday
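One caveat with the field loop above: awk splits on whitespace, so in a first week that does not begin on Sunday the fields shift left and the field number no longer equals the weekday column (in the February 1956 calendar, day 1 is field 1 but falls on a Wednesday). Below is a minimal sketch that reads fixed character columns instead, assuming the classic cal layout of seven 3-character columns. The helper names sample_cal and dow_from_cal are made up for illustration.

```shell
# sample_cal: hypothetical stand-in for `cal 02 1956`, with the usual column spacing
sample_cal() {
    printf '%s\n' \
        '   February 1956' \
        'Su Mo Tu We Th Fr Sa' \
        '          1  2  3  4' \
        ' 5  6  7  8  9 10 11' \
        '12 13 14 15 16 17 18' \
        '19 20 21 22 23 24 25' \
        '26 27 28 29'
}

# dow_from_cal DAY: read cal output on stdin, print the weekday name of DAY
dow_from_cal() {
    awk -v day="$1" '
        BEGIN { split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", name, " ") }
        NR > 2 {                                        # skip the title and the weekday header
            for (col = 1; col <= 7; col++) {
                cell = substr($0, (col - 1) * 3 + 1, 2) # the 2-character day cell
                if (cell ~ /[0-9]/ && cell + 0 == day + 0) {
                    print name[col]
                    exit
                }
            }
        }'
}

sample_cal | dow_from_cal 1     # Wednesday
sample_cal | dow_from_cal 18    # Saturday
```

With the real command this would be used as cal 02 1956 | dow_from_cal 18, provided the installed cal prints the same column widths.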
Update
If all you are looking for is the day of the week for a certain date, you should really be using the date command (GNU date shown here; passing the date in year-month-day order avoids any day/month ambiguity):
$ day=9;month=9;year=2009;
$ dayOfWeek=$(date -d "$year-$month-$day" +%A)
$ echo $dayOfWeek
Wednesday

You wrote:
cal 09 09 2009
I'm not aware of a version of cal that accepts the day of the month as an input, only
cal ${mon} (optional) ${year} (optional)
But that doesn't affect your main issue.
You wrote:
is there a way to make awk output index number in only the 4th line of cal output above.?
NR (the number of the current record) is your friend, and there are numerous ways to use it.
cal 09 09 2009 | awk 'NR==4{day="9"; col=index($0,day); print col }'
OR
cal 09 09 2009 | awk '{day="9"; if (NR==4) {col=index($0,day); print col } }'
ALSO
In awk, if you have variable assignments that should be used throughout your whole program, it is better to put them in the BEGIN section so that the assignment is performed only once. Not a big deal in your example, but why set bad habits ;-)?
HENCE
cal 09 2009 | awk 'BEGIN{day="9"}; NR==4 {col=index($0,day); print col }'
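When the value comes from a shell variable, as it does in the question, awk's standard -v option is a tidier way to hand it over than splicing it into the program text between quotes. Here is a small sketch, assuming the calendar text shown in the question. sample_cal and index_on_line are hypothetical helper names, not part of any tool.

```shell
# sample_cal: hypothetical stand-in for the September 2009 output of cal
sample_cal() {
    printf '%s\n' \
        '   September 2009' \
        'Su Mo Tu We Th Fr Sa' \
        '       1  2  3  4  5' \
        ' 6  7  8  9 10 11 12' \
        '13 14 15 16 17 18 19' \
        '20 21 22 23 24 25 26' \
        '27 28 29 30'
}

# index_on_line LINE TEXT: print the position of TEXT on line number LINE of stdin
index_on_line() {
    awk -v line="$1" -v text="$2" 'NR == line { print index($0, text); exit }'
}

day=9
sample_cal | index_on_line 4 "$day"    # 11
```

The line number is still hard-coded by the caller, so the question raised below (is it always line 4?) applies to this sketch as well.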
FINALLY
It is not completely clear what problem you are trying to solve. Are you sure you always want to grab line 4? If not, then how do you propose to solve that?
Problems stated as " 1. I am trying to do X. 2. Here is my input. 3. Here is my output. 4. Here is the code that generated that output" are much easier to respond to.
It looks like you're trying to do date calculations. You can get much more robust and general solutions by using the GNU date command. I have seen numerous useful discussions of this tagged as bash, shell, (date?).
I hope this helps.

This is so much easier to do in a language that has time functionality built in. Tcl is great for that, and so are many other languages:
$ echo 'puts [clock format [clock scan 9/9/2009] -format %a]' | tclsh
Wed

If you want awk to only output for line 4, restrict the rule to line 4:
$ awk 'NR == 4 { ... }'

Related

Number of logins on Linux using Shell script and AWK

How can I get the number of logins of each day from the beginning of the wtmp file using AWK?
I thought about using an associative array but I don't know how to implement it in AWK..
myscript.sh
#!/bin/bash
awk 'BEGIN{numberoflogins=0}
#code goes here'
The output of the last command:
[fnorbert@localhost Documents]$ last
fnorbert tty2 /dev/tty2 Mon Apr 24 13:25 still logged in
reboot system boot 4.8.6-300.fc25.x Mon Apr 24 16:25 still running
reboot system boot 4.8.6-300.fc25.x Mon Apr 24 13:42 still running
fnorbert tty2 /dev/tty2 Fri Apr 21 16:14 - 21:56 (05:42)
reboot system boot 4.8.6-300.fc25.x Fri Apr 21 19:13 - 21:56 (02:43)
fnorbert tty2 /dev/tty2 Tue Apr 4 08:31 - 10:02 (01:30)
reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:30 - 10:02 (00:-27)
fnorbert tty2 /dev/tty2 Tue Apr 4 08:14 - 08:26 (00:11)
reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:13 - 08:26 (-1:-47)
wtmp begins Mon Mar 6 09:39:43 2017
The shell script's output should be:
Apr 4: 4
Apr 21: 2
Apr 24: 3
I would like to use an associative array, if that is possible.
In awk, arrays can be indexed by strings or numbers, so you can use it like an associative array.
However, what you're asking will be hard to do with awk reliably because the delimiters are whitespace, therefore empty fields will throw off the columns, and if you use FIELDWIDTHS you'll also get thrown off by columns longer than their assigned width.
If all you're looking for is just the number of logins per day you might want to use a combination of sed and awk (and sort):
last | \
sed -E 's/^.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([ 0-9]{2}).*$/\2 \3/p;d' | \
awk '{arr[$0]++} END { for (a in arr) print a": " arr[a]}' | \
sort -M
sed -E enables extended regular expressions, and the pattern prints just the date from each line emitted by last (it matches on the day of the week, but prints only the month and the day of the month).
We could have used uniq -c to get the counts, but with awk we can use an associative array, as you hinted.
Finally, sort -M sorts on the abbreviated month names, as in Apr 24, Mar 16, etc.
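For comparison, the same count can be sketched in awk alone by searching each line for a weekday name followed by a month name, which sidesteps the problem of columns shifting between user and reboot lines. This is only an illustration under the assumption that the date always appears as "Day Mon DD". sample_last and count_logins_per_day are made-up names, and the sample text is the output quoted in the question.

```shell
# sample_last: the `last` output from the question, as a stand-in for the real command
sample_last() {
    printf '%s\n' \
        'fnorbert tty2 /dev/tty2 Mon Apr 24 13:25 still logged in' \
        'reboot system boot 4.8.6-300.fc25.x Mon Apr 24 16:25 still running' \
        'reboot system boot 4.8.6-300.fc25.x Mon Apr 24 13:42 still running' \
        'fnorbert tty2 /dev/tty2 Fri Apr 21 16:14 - 21:56 (05:42)' \
        'reboot system boot 4.8.6-300.fc25.x Fri Apr 21 19:13 - 21:56 (02:43)' \
        'fnorbert tty2 /dev/tty2 Tue Apr 4 08:31 - 10:02 (01:30)' \
        'reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:30 - 10:02 (00:-27)' \
        'fnorbert tty2 /dev/tty2 Tue Apr 4 08:14 - 08:26 (00:11)' \
        'reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:13 - 08:26 (-1:-47)' \
        '' \
        'wtmp begins Mon Mar 6 09:39:43 2017'
}

# count_logins_per_day: count entries per "Mon DD" key with an associative array
count_logins_per_day() {
    awk '
        $1 == "wtmp" { next }                          # skip the trailer line
        {
            for (i = 1; i <= NF - 2; i++)
                if ($i ~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)$/ &&
                    $(i + 1) ~ /^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$/) {
                    key = $(i + 1) " " $(i + 2)        # e.g. "Apr 24"
                    if (!(key in count)) order[++n] = key
                    count[key]++
                    break
                }
        }
        END {
            # last lists the newest entries first, so walk the keys backwards
            for (j = n; j >= 1; j--) print order[j] ": " count[order[j]]
        }'
}

sample_last | count_logins_per_day
```

Remembering the order in which keys first appear avoids the separate sort step, at the cost of relying on last printing its entries newest first.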
Try the following awk script (assuming all entries fall within the same month):
myscript.awk:
#!/bin/awk -f
# Needs GNU awk: the three-argument form of match() is a gawk extension.
{
    a[NR]=$0; # saving each line into an array indexed by line number
}
END {
    for (i=NR-1;i>=1;i--) { # iterating lines in reverse order, skipping the final "wtmp begins" line
        # lines without a date (such as a blank line) fail the match and are not counted
        if (match(a[i],/[A-Z][a-z]{2} ([A-Z][a-z]{2}) *([0-9]{1,2}) [0-9]{2}:[0-9]{2}/, b)) {
            m=b[1]; # saving month name
            c[b[2]]++; # accumulating the number of occurrences
        }
    }
    for (i in c) print m,i": "c[i]
}
Usage:
last | awk -f myscript.awk
The output:
Apr 4: 4
Apr 21: 2
Apr 24: 3

How to get the data from log file with different months in unix?

I want to get the data between two times in a log file that spans different months and dates. If my start time is not present in the log file, I want to extract the data starting from the nearest following time in the log file. Likewise, if the entered end time is not present in the log file, the extraction has to end before the end time.
My log file data:
Apr 10 16 02:07:20 Data 1
Apr 11 16 02:07:20 Data 1
May 10 16 04:11:09 Data 2
May 12 16 04:11:09 Data 2
Jun 11 16 06:22:35 Data 3
Jun 12 16 06:22:35 Data 3
The solution I am using is,
awk -v start="$StartTime" -v stop="$EndTime" 'start <= $StartTime && $EndTime <= stop' $file
where I am storing my start time in $StartTime and my end time in $EndTime. But I am not getting the expected output. Please help.
Something like this maybe:
$ BashVarStart="16 05 10 00 00 00" # the same format that awk function will reformat to
$ BashVarStop="16 06 11 00 00 00"
$ awk -v start="$BashVarStart" -v stop="$BashVarStop" -F"[ :]" -v OFS=\ '
function reformatdate(m,d,y,h,mm,s) { # basically throw year to the beginning
monstr="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"; # numerize the months
split(monstr,monarr," "); # split monstr to an array to enumerate the months
# monarr[1]="Jan", monarr[2]="Feb" etc
for(i in monarr) { # iterate over all month numbers in monarr index
if(monarr[i]==m) # when the month name matches
m=sprintf("%02d",i) # replace it with the zero-padded month number: 5 -> 05
};
return y" "m" "d" "h" "mm" "s # return in different order
}
start < reformatdate($1,$2,$3,$4,$5,$6) && stop > reformatdate($1,$2,$3,$4,$5,$6)
' test.in
May 10 16 04:11:09 Data 2
May 12 16 04:11:09 Data 2
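A more compact variant of the same idea is sketched below: build a sortable key per line and compare it as a string with inclusive bounds, so a start or stop time that is missing from the file simply selects the nearest lines inside the range. The key format "YYMMDD HH:MM:SS" and the names sample_log and filter_log_range are assumptions made for this illustration.

```shell
# sample_log: the log lines from the question
sample_log() {
    printf '%s\n' \
        'Apr 10 16 02:07:20 Data 1' \
        'Apr 11 16 02:07:20 Data 1' \
        'May 10 16 04:11:09 Data 2' \
        'May 12 16 04:11:09 Data 2' \
        'Jun 11 16 06:22:35 Data 3' \
        'Jun 12 16 06:22:35 Data 3'
}

# filter_log_range START STOP: keep lines whose timestamp lies in [START, STOP]
# START and STOP use the made-up key format "YYMMDD HH:MM:SS"
filter_log_range() {
    awk -v start="$1" -v stop="$2" '
        {
            # month name to month number via its offset in a lookup string
            mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $1) + 2) / 3
            key = sprintf("%02d%02d%02d %s", $3, mon, $2, $4)
        }
        key >= start && key <= stop'
}

sample_log | filter_log_range "160510 00:00:00" "160611 00:00:00"
```

Because every part of the key is zero-padded and ordered from year down to second, plain string comparison gives chronological order.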

Using Finger in Bash to display unique names

I'm trying to write a shell script that displays unique names, user names and dates using the finger command.
Right now, when I enter finger, it displays:
Login Name Tty Idle Login Time Office
1xyz xyz pts/13 Dec 2 18:24 (76.126.34.32)
1xyz xyz pts/13 Dec 2 18:24 (76.126.34.32)
2xxxx xxxx pts/23 2 Dec 2 21:35 (108.252.136.12)
2zzzz zzzz pts/61 13 Dec 2 20:46 (24.4.205.223)
2yyyy yyyy pts/32 57 Dec 2 21:06 (205.154.255.145)
1zzz zzz pts/35 37 Dec 2 20:56 (71.198.36.189)
1zzz zzz pts/48 12 Dec 2 20:56 (71.198.36.189)
I would like the script to eliminate the duplicate user names and display the result like this:
xyz (1xyz) Dec 2 18:24
xxxx (2xxxx) Dec 2 21:35
zzzz (2zzzz) Dec 2 20:46
yyyy (2yyyy) Dec 2 21:06
zzz (1zzz) Dec 2 20:56
The name is in the first column, the user name is in parentheses, and the date is in the last column.
Thanks in Advance!
Ugly but should work.
finger | sed 's/\t/ /' | sed 's/pts\/[0-9]* *[0-9]*//' | awk '{print $2"\t("$1")\t"$3" "$4" "$5}' | sort | uniq
Getting unique names with sort -u is the easy part.
If you only want to parse the data in your example, you can try matching all the strings in one command.
finger | sed 's/^\([^ ]*\) *\([^ ]*\) *pts[^A-Z]*\([^(]*\).*/\2\t(\1)\t\3/'
However, this is hard work and bound to fail sooner or later. My finger returns
Login Name Tty Idle Login Time Where
notroot notroot *:0 - Nov 26 15:30 console
notroot notroot pts/0 7d Nov 26 15:30
notroot notroot *pts/1 - Nov 26 15:30
You can try to improve the sed command, good luck with that!
I think the only reliable way is to look at the columns: read the finger output one line at a time and slice each line into parts with ${line:start:len} (removing the spaces afterwards). Have fun counting the columns (and beware of that_user_with_a_long_name).
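As an alternative to counting character columns, here is a sketch that keeps the first line seen for each login and locates the date by searching for the month name, so an empty Idle column does not shift the result. It assumes a one-word name in the second column, which real finger output does not guarantee. sample_finger and unique_finger_users are hypothetical names.

```shell
# sample_finger: the finger output from the question
sample_finger() {
    printf '%s\n' \
        'Login Name Tty Idle Login Time Office' \
        '1xyz xyz pts/13 Dec 2 18:24 (76.126.34.32)' \
        '1xyz xyz pts/13 Dec 2 18:24 (76.126.34.32)' \
        '2xxxx xxxx pts/23 2 Dec 2 21:35 (108.252.136.12)' \
        '2zzzz zzzz pts/61 13 Dec 2 20:46 (24.4.205.223)' \
        '2yyyy yyyy pts/32 57 Dec 2 21:06 (205.154.255.145)' \
        '1zzz zzz pts/35 37 Dec 2 20:56 (71.198.36.189)' \
        '1zzz zzz pts/48 12 Dec 2 20:56 (71.198.36.189)'
}

# unique_finger_users: print "name (login) Mon DD HH:MM" once per login
unique_finger_users() {
    awk '
        NR == 1 { next }                    # skip the header line
        seen[$1]++ { next }                 # this login was already printed
        {
            for (i = 3; i <= NF - 2; i++)
                if ($i ~ /^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$/) {
                    print $2 " (" $1 ") " $i, $(i + 1), $(i + 2)
                    break
                }
        }'
}

sample_finger | unique_finger_users
```

With the real command the call would be finger | unique_finger_users. Names containing spaces would need the column-slicing approach described above.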

How can i switch place of hour and minutes from a clock command (for crontab) using awk

I want to use a command to make a crontab that plays an alarm (for my wife). The program is called ipraytime and it gives an output like this.
$ ipraytime -u +2
Prayer schedule for,
City : Custom
Latitude : 021° 25' 12" N
Longitude : 039° 49' 47" E
Angle Method : Umm Al-Qurra University
TimeZone : UTC+2.0
Qibla : 061° 45' 42" W of true North
Date Fajr Shorooq Zuhr Asr Maghrib Isha
--------------------------------------------------------------------
[09-05-2012] 4:19 5:43 12:16 15:35 18:48 20:18
Today's Imsaak : 4:11
Tomorrow's Imsaak : 4:10
Tomorrow's Fajr : 4:18
What I want is the times in a format suitable for a crontab, which means I need to swap the minute and the hour: 19 4 instead of 4 19.
I have made this command, but I don't know how to make that swap.
ipraytime -u +2| awk 'NR==12 {print $2"\n"$3"\n"$4"\n"$5"\n"$6"\n"$7}' | sed 's/:/ /g'
This gives me an output like this
4 19
5 43
12 16
15 35
18 48
20 18
But i want it to be like this
19 4
43 5
16 12
35 15
48 18
18 20
That is the order a crontab uses. I have played with sort a bit but couldn't find a solution there either.
(Sorry for the bad topic.. didn't know how to write a good one for this)
It's not necessary to use sed at all.
$ ipraytime -u +2 | awk -F ' +|:' 'NR == 12 {for (i = 2; i <= 12; i += 2) print $(i+1), $i}'
19 4
43 5
16 12
35 15
48 18
18 20
Use sed 's/\(.*\):\(.*\)/\2 \1/'
Command:
ipraytime -u +2 | awk 'NR==12 {print $2"\n"$3"\n"$4"\n"$5"\n"$6"\n"$7}' |
sed 's/\(.*\):\(.*\)/\2 \1/'
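Another way to express the swap, sketched here with awk's split() and without relying on the schedule being on line 12: select the line whose first field looks like a bracketed date, then split each h:mm value at the colon. swap_hour_minute is a made-up helper name, and the selection pattern assumes the date is always printed as in the sample output.

```shell
# swap_hour_minute: turn each H:MM on the schedule line into "MM H", one pair per line
swap_hour_minute() {
    awk '
        $1 ~ /^\[[0-9]/ {                     # the schedule line starts with "[dd-mm-yyyy]"
            for (i = 2; i <= NF; i++)
                if (split($i, t, ":") == 2)   # t[1] = hour, t[2] = minute
                    print t[2], t[1]
        }'
}

echo "[09-05-2012] 4:19 5:43 12:16 15:35 18:48 20:18" | swap_hour_minute
```

With the real program this would be used as ipraytime -u +2 | swap_hour_minute.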

Convert multi-line file into TSV using awk

Am using Windows 7 & gawk 3.1.3 (via UnxUtils).
I'd like to turn this input (Liverpool FC's fixtures):
Sunday, 27 November 2011
Barclays Premier League
Liverpool v Man City, 16:00
Tuesday, 29 November 2011
Carling Cup
Chelsea v Liverpool, QF, 19:45
...
into a tab-separated file, such as:
Sunday, 27 November 2011<tab>Barclays Premier League<tab>Liverpool v Man City, 16:00
Tuesday, 29 November 2011<tab>Carling Cup<tab>Chelsea v Liverpool, QF, 19:45
...
I've tried doing this with awk, but failed thus far. Identifying every first and second line is easy enough:
if (NR % 3 == 1 || NR % 3 == 2) print;
but despite many attempts (usually resulting in syntax errors) can't find out how to strip out the (Windows) line-endings and concatenate those with every third line.
I'm now wondering if awk is actually the right tool for the job.
Thanks for any pointers.
awk '(NR % 3) > 0 {printf("%s\t",$0)}
(NR % 3) == 0 {printf("%s\n",$0)}'
Should work. For every line where NR (the number of records read so far) modulo 3 is not 0, it prints the line followed by a tab character; otherwise it prints the (input) line followed by a newline character.
HTH
see the test below:
kent$ echo "Sunday, 27 November 2011
Barclays Premier League
Liverpool v Man City, 16:00
Tuesday, 29 November 2011
Carling Cup
Chelsea v Liverpool, QF, 19:45
"|awk '{printf "%s\t", $0; if(!(NR%3))print""}'
output:
Sunday, 27 November 2011 Barclays Premier League Liverpool v Man City, 16:00
Tuesday, 29 November 2011 Carling Cup Chelsea v Liverpool, QF, 19:45
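Neither answer strips the Windows line endings the question mentions. A minimal sketch of one way to do that, assuming each input line ends in a carriage return plus newline and that awk reads the carriage return as part of the line; to_tsv is a hypothetical helper name:

```shell
# to_tsv: join every three input lines into one tab-separated line
to_tsv() {
    awk '{
        sub(/\r$/, "")                               # drop a trailing carriage return, if any
        printf "%s%s", $0, (NR % 3 ? "\t" : "\n")    # tab after lines 1 and 2, newline after line 3
    }'
}

printf 'Sunday, 27 November 2011\r\nBarclays Premier League\r\nLiverpool v Man City, 16:00\r\n' | to_tsv
```

Choosing the separator per line also avoids the trailing tab that the one-liner above leaves at the end of each output line.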

Resources