I have a csv file with the details below and want to label each line based on its time:
if the hour is between 10 and 18 it should be printed as morning, and the rest of the lines as night.
time,id
2022-08-01T00:09:14+09:00,PKA990
2022-08-01T06:48:24+09:00,PKA990
2022-08-01T08:27:23+09:00,
2022-08-01T11:04:18+09:00,ABCD890
2022-08-01T11:23:22+09:00,ABCD890
2022-08-01T11:30:14+09:00,
2022-08-01T12:01:12+09:00,ABCD890
2022-08-01T15:11:59+09:00,JIKOPL8
2022-08-01T18:20:53+09:00,TUVNDGD
I'm expecting the output to be like:
time,id,session
2022-08-01T00:09:14+09:00,PKA990,night
2022-08-01T06:48:24+09:00,PKA990,night
2022-08-01T08:27:23+09:00,LOADING,night
2022-08-01T11:04:18+09:00,ABCD890,morning
2022-08-01T11:23:22+09:00,ABCD890,morning
2022-08-01T11:30:14+09:00,LOADING,morning
2022-08-01T12:01:12+09:00,ABCD890,morning
2022-08-01T15:11:59+09:00,JIKOPL8,morning
2022-08-01T18:20:53+09:00,TUVNDGD,night
Please suggest.
Sorry, I have edited some rows: whenever the id field is blank it should be filled with "LOADING".
Apologies.
Here is an awk solution:
#! /bin/bash
LC_ALL=en_US awk '
BEGIN {
FS=OFS=","
}
{
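# touching $1 forces awk to rebuild $0 with OFS (an idiom; effectively a no-op here since FS == OFS)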
$1 = $1
}
NR == 1 {
print $0, "session", "day_of_week"
next
}
$2 == "" {
$2 = "LOADING"
}
{
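# build a "YYYY MM DD HH MM SS" string for mktime, then derive the weekday name via strftime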
date_timestamp=sprintf("%04d %02d %02d 00 00 00", substr($1, 1, 4), substr($1, 6, 2), substr($1, 9, 2))
day_of_week=tolower(strftime("%A", mktime(date_timestamp)))
if ($1 ~ /^[-0-9]+T(10|11|12|13|14|15|16|17):/) {
session="morning"
} else {
session="night"
}
}
{
print $0, session, day_of_week
}
' <"datetime.csv"
Output:
time,id,session,day_of_week
2022-08-01T00:09:14+09:00,PKA990,night,monday
2022-08-01T06:48:24+09:00,PKA990,night,monday
2022-08-01T08:27:23+09:00,LOADING,night,monday
2022-08-01T11:04:18+09:00,ABCD890,morning,monday
2022-08-01T11:23:22+09:00,ABCD890,morning,monday
2022-08-01T11:30:14+09:00,LOADING,morning,monday
2022-08-01T12:01:12+09:00,ABCD890,morning,monday
2022-08-01T15:11:59+09:00,JIKOPL8,morning,monday
2022-08-01T18:20:53+09:00,TUVNDGD,night,monday
A pure bash solution:
#! /bin/bash
INPUT_FILENAME="datetime.csv"
FLAG_FIRST=1
FS=","
OFS=","
while read -r LINE; do
if [[ ${FLAG_FIRST} -eq 1 ]]; then
printf "%s%s%s%s%s\n" "${LINE//${FS}/${OFS}}" "${OFS}" "session" "${OFS}" "day_of_week"
FLAG_FIRST=0
continue
fi
# Ignore empty lines
[[ -z "${LINE}" ]] && continue
# If LINE ends with a comma (assume the id field is empty):
# append the "LOADING" token to the line
[[ "${LINE}" =~ ${FS}$ ]] && LINE+="LOADING"
# LC_ALL=en_US to obtain the day of week in English (I'm French)
# Use ${LINE%%T*} to use date field without hour and timezone
DAY_OF_WEEK=$(LC_ALL=en_US date +%A --date "${LINE%%T*}")
# In English, the day of week begins with an uppercase letter:
# a comma after the variable name (${DAY_OF_WEEK,}) lowercases its first character
if [[ "${LINE}" =~ ^[-0-9]+T(10|11|12|13|14|15|16|17): ]]; then
SESSION="morning"
else
SESSION="night"
fi
printf "%s%s%s%s%s\n" "${LINE//${FS}/${OFS}}" "${OFS}" "${SESSION}" "${OFS}" "${DAY_OF_WEEK,}"
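# the trailing 'echo' below adds a final newline so 'read' does not drop a last line lacking one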
done < <(cat "${INPUT_FILENAME}"; echo)
Output:
time,id,session,day_of_week
2022-08-01T00:09:14+09:00,PKA990,night,monday
2022-08-01T06:48:24+09:00,PKA990,night,monday
2022-08-01T08:27:23+09:00,LOADING,night,monday
2022-08-01T11:04:18+09:00,ABCD890,morning,monday
2022-08-01T11:23:22+09:00,ABCD890,morning,monday
2022-08-01T11:30:14+09:00,LOADING,morning,monday
2022-08-01T12:01:12+09:00,ABCD890,morning,monday
2022-08-01T15:11:59+09:00,JIKOPL8,morning,monday
2022-08-01T18:20:53+09:00,TUVNDGD,night,monday
I have to create a script that, given a country and a sport, outputs the number of medalists and medals won, after reading a csv file.
The csv is called "athletes.csv" and has this header:
id|name|nationality|sex|date_of_birth|height|weight|sport|gold|silver|bronze|info
When you call the script you have to pass the nationality and the sport as parameters.
The script I have created is this one:
#!/bin/bash
participants=0
medals=0
while IFS=, read -ra array
do
if [[ "${array[2]}" == $1 && "${array[7]}" == $2 ]]
then
participants=$participants++
medals=$(($medals+${array[8]}+${array[9]}+${array[10]))
fi
done < athletes.csv
echo $participants
echo $medals
where array[2] is the nationality (3rd field), array[7] is the sport (8th field) and array[8] to array[10] are the numbers of medals won.
When I run the script with the correct parameters I get 0 participants and 0 medals.
Could you help me understand what I'm doing wrong?
Note: I cannot use awk or grep.
Thanks in advance.
Try this:
#! /bin/bash -p
nation_arg=$1
sport_arg=$2
declare -i participants=0
declare -i medals=0
declare -i line_num=0
while IFS=, read -r _ _ nation _ _ _ _ sport ngold nsilver nbronze _; do
(( ++line_num == 1 )) && continue # Skip the header
[[ $nation == "$nation_arg" && $sport == "$sport_arg" ]] || continue
participants+=1
medals+=ngold+nsilver+nbronze
done <athletes.csv
declare -p participants
declare -p medals
The code uses named variables instead of numbered positional parameters and array indexes to try to improve readability and maintainability.
Using declare -i means that strings assigned to the declared variables are treated as arithmetic expressions. That reduces clutter by avoiding the need for $(( ... )).
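As a minimal standalone illustration of that behaviour (not part of the script above):
declare -i total=0
total+=2+3       # with -i, the string "2+3" is evaluated arithmetically: total is now 5
echo "$total"    # prints 5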
The code assumes that the field separator in the CSV file is a comma (,), not the pipe (|) shown in the header. If the separator is really |, replace IFS=, with IFS='|'.
I'm assuming that the field delimiter of your CSV file is a comma but you can set it to whatever character you need.
Here's a fixed version of your code:
#!/bin/bash
participants=0
medals=0
{
# skip the header
read
# process the records
while IFS=',' read -ra array
do
if [[ "${array[2]}" == $1 && "${array[7]}" == $2 ]]
then
(( participants++ ))
medals=$(( medals + array[8] + array[9] + array[10] ))
fi
done
} < athletes.csv
echo "$participants" "$medals"
Remark: as $1 and $2 are left unquoted on the right side of [[ ... == ... ]], they are subject to glob matching. For example, you'll be able to show the total number of medals won by the US with:
./script.sh 'US' '*'
But I have to say, doing text processing in pure shell isn't considered good practice; there exist dedicated tools for that. Here's an example with awk:
awk -v FS=',' -v country="$1" -v sport="$2" '
BEGIN {
participants = medals = 0
}
NR == 1 { next }
$3 == country && $8 == sport {
participants++
medals += $9 + $10 + $11
}
END { print participants, medals }
' athletes.csv
There's also a potential problem remaining: the CSV format might need a real CSV parser to be read accurately. There exist a few awk libraries for that, but IMHO it's simpler to use a CSV-aware tool that provides the functionality you need.
Here's an example with Miller:
mlr --icsv --ifs=',' filter -s country="$1" -s sport="$2" '
begin {
@participants = 0;
@medals = 0;
}
$nationality == @country && $sport == @sport {
@participants += 1;
@medals += $gold + $silver + $bronze;
}
false;
end { print @participants, @medals; }
' athletes.csv
I have some missing dates in a file, e.g.
$cat ifile.txt
20060805
20060807
20060808
20060809
20060810
20060813
20060815
20060829
20060901
20060903
20060904
20060905
20070712
20070713
20070716
20070717
The dates are in the format YYYYMMDD. My intention is to fill in the missing dates between two consecutive dates, but only when at most 5 days are missing, e.g.
20060805
20060806 ---- This was missed
20060807
20060808
20060809
20060810
20060811 ----- This was missed
20060812 ----- This was missed
20060813
20060814 ----- This was missed
20060815
20060829
20060830 ------ This was missed
20060831 ------ This was missed
20060901
20060902 ------ This was missed
20060903
20060904
20060905
20070712
20070713
20070714 ----- This was missed
20070715 ----- This was missed
20070716
20070717
Other dates are not needed where there is a gap of more than 5 days. For example, I don't need to fill the dates between 20060815 and 20060829, because the gap between them is more than 5 days.
I am doing it in the following way, but don't get anything.
#!/bin/sh
awk BEGIN'{
a[NR]=$1
} {
for(i=1; i<NR; i++)
if ((a[NR+1]-a[NR]) <= 5)
for (j=1; j<(a[NR+1]-a[NR]); j++)
print a[j]
}' ifile.txt
Desired output:
20060805
20060806
20060807
20060808
20060809
20060810
20060811
20060812
20060813
20060814
20060815
20060829
20060830
20060831
20060901
20060902
20060903
20060904
20060905
20070712
20070713
20070714
20070715
20070716
20070717
Could you please try the following, written and tested in GNU awk with the shown samples.
awk '
FNR==1{
print
prev=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
next
}
{
found=i=diff=""
curr_time=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
diff=(curr_time-prev)/86400
if(diff>1 && diff<=6){  # fill only when at most 5 dates are missing
while(++i<=diff){ print strftime("%Y%m%d", prev+86400*i) }
found=1
}
prev=mktime(substr($0,1,4)" "substr($0,5,2)" "substr($0,7,2) " 00 00 00")
}
!found
' Input_file
The following seems to work:
stringtodate() {
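# noon (12:00:00) rather than midnight keeps 24-hour steps on the right date across DST changes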
echo "${1:0:4}-${1:4:2}-${1:6:2} 12:00:00"
}
datetoseconds() {
LC_ALL=C date -d "$(stringtodate "$1")" +%s
}
secondstodate() {
LC_ALL=C date -d "#$1" +%Y%m%d
}
outputdatesbetween() {
local start=$1
local stop=$2
for ((i = start; i < stop; i += 3600*24)); do
secondstodate "$i"
done
}
prev=
while IFS= read -r line; do
now=$(datetoseconds "$line")
if [[ -n "$prev" ]] &&
((
now - prev > 3600 * 24 &&
now - prev < 3600 * 24 * 5
))
then
outputdatesbetween "$((prev + 3600 * 24))" "$now"
fi
echo "$line"
prev="$now"
done < ifile.txt
Tested on repl
Here is a quick GNU awk script. We use GNU awk to make use of the time functions mktime and strftime:
awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
{t=mktime($1 " " $2 " " $3 " 0 0 0",1) }
(t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i,1) }
{print; p=t}' file
Using mktime we convert each date into seconds since 1970. The function strftime converts seconds back into the desired format. Be aware that we enable the UTC flag in both functions to ensure that we do not end up with surprises around Daylight Saving Time. Furthermore, since we already require GNU awk, we can use FIELDWIDTHS to define the field lengths.
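For instance, a quick standalone sketch of the FIELDWIDTHS splitting on its own (GNU awk only):
echo 20060805 | gawk 'BEGIN{FIELDWIDTHS="4 2 2"} {print $1, $2, $3}'
# output: 2006 08 05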
note: If your awk does not support the UTC-flag in mktime and strftime, you can run the following:
TZ=UTC awk -v n=5 'BEGIN{FIELDWIDTHS="4 2 2"}
{t=mktime($1 " " $2 " " $3 " 0 0 0") }
(t-p < n*86400) { for(i=p+86400;i<t;i+=86400) print strftime("%Y%m%d",i) }
{print; p=t}' file
I have my command below and I want to have the result on a single line with delimiters. My command:
Array=("GET" "POST" "OPTIONS" "HEAD")
echo $(date "+%Y-%m-%d %H:%M")
for i in "${Array[@]}"
do
cat /home/log/myfile_log | grep "$(date "+%d/%b/%Y:%H")"| awk -v last5=$(date --date="-5 min" "+%M") -F':' '$3>=last5 && $3<last5+5{print}' | egrep -a "$i" | wc -l
done
Results is:
2019-01-01 13:27
1651
5760
0
0
I want to have the result below:
2019-01-01 13:27,1651,5760,0,0
It looks (to me) like the overall objective is to scan /home/log/myfile_log for entries that have occurred within the last 5 minutes and which match one of the 4 entries in ${Array[@]}, keeping count of the matches along the way and finally printing the current date and the counts on a single line of output.
I've opted for a complete rewrite that uses awk's abilities of pattern matching, keeping counts and generating a single line of output:
date1=$(date "+%Y-%m-%d %H:%M") # current date
date5=$(date --date="-5 min" "+%M") # minutes field from 5 minutes ago
awk -v d1="${date1}" -v d5="${date5}" -F":" '
BEGIN { keep=0 # init some variables
g=0
p=0
o=0
h=0
}
$3>=d5 && $3<d5+5 { keep=1 } # do we keep processing this line?
!keep { next } # if not then skip to next line
/GET/ { g++ } # increment our counters
/POST/ { p++ }
/OPTIONS/ { o++ }
/HEAD/ { h++ }
{ keep=0 } # reset keep flag for next line
# print results to single line of output
END { printf "%s,%s,%s,%s,%s\n", d1, g, p, o, h }
' <(grep "$(date '+%d/%b/%Y:%H')" /home/log/myfile_log)
NOTE: The OP may need to revisit the <(grep "$(date ...)" /home/log/myfile_log) pre-filter to handle time windows that span hours, days, months and years, e.g., 14:59 - 15:04, 12/31/2019 23:59 - 01/01/2020 00:04, etc.
Yeah, it's a bit verbose but a bit easier to understand; the OP can rewrite/reduce it as they see fit.
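As one possible sketch for the NOTE above (untested, file name as in the question): have the pre-filter match both the current hour and the hour from 5 minutes ago, so a window that crosses an hour (or day/month/year) boundary is still captured:
grep -e "$(date '+%d/%b/%Y:%H')" \
     -e "$(date --date='-5 min' '+%d/%b/%Y:%H')" /home/log/myfile_log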
I have a csv file with multiple lines. Each line has the same number of columns. What I need to do is to group those lines by a few specified columns and aggregate data from the other columns. Example of input file:
proces1,pathA,5-May-2011,10-Sep-2017,5
proces2,pathB,6-Jun-2014,7-Jun-2015,2
proces1,pathB,6-Jun-2017,7-Jun-2017,1
proces1,pathA,11-Sep-2017,15-Oct-2017,2
For the above example I need to group lines by the first two columns. From the 3rd column I need the min value, for the 4th column the max value, and the 5th column should hold the sum. So, for such an input file I need this output:
proces1,pathA,5-May-2011,15-Oct-2017,7
proces1,pathB,6-Jun-2017,7-Jun-2017,1
proces2,pathB,6-Jun-2014,7-Jun-2015,2
I need to process it in bash (I can use awk or sed as well).
With bash and sort:
#!/bin/bash
# create associative arrays
declare -A month2num=([Jan]=1 [Feb]=2 [Mar]=3 [Apr]=4 [May]=5 [Jun]=6 [Jul]=7 [Aug]=8 [Sep]=9 [Oct]=10 [Nov]=11 [Dec]=12)
declare -A p ds de # date start and date end
declare -A -i sum # set integer attribute
# function to convert 5-Jun-2011 to 20110605
date2num() { local d m y; IFS="-" read -r d m y <<< "$1"; printf "%d%.2d%.2d\n" $y ${month2num[$m]} $d; }
# read all columns to variables p1 p2 d1 d2 s
while IFS="," read -r p1 p2 d1 d2 s; do
# if associative array is still empty for this entry
# fill with current strings/value
if [[ -z ${p[$p1,$p2]} ]]; then
p[$p1,$p2]="$p1,$p2"
ds[$p1,$p2]="$d1"
de[$p1,$p2]="$d2"
sum[$p1,$p2]="$s"
continue
fi
# compare strings, set new strings and sum value
if [[ ${p[$p1,$p2]} == "$p1,$p2" ]]; then
[[ $(date2num "$d1") < $(date2num ${ds[$p1,$p2]}) ]] && ds[$p1,$p2]="$d1"
[[ $(date2num "$d2") > $(date2num ${de[$p1,$p2]}) ]] && de[$p1,$p2]="$d2"
sum[$p1,$p2]=sum[$p1,$p2]+s
fi
done < file
# print the content of all associative arrays, using the keys of associative array p
for i in "${!p[@]}"; do echo "${p[$i]},${ds[$i]},${de[$i]},${sum[$i]}"; done
Usage: ./script.sh | sort
Output to stdout:
proces1,pathA,5-May-2011,15-Oct-2017,7
proces1,pathB,6-Jun-2017,7-Jun-2017,1
proces2,pathB,6-Jun-2014,7-Jun-2015,2
See: help declare, help read and of course man bash
With awk + sort
awk -F',|-' '
BEGIN{
A["Jan"]="01"
A["Feb"]="02"
A["Mar"]="03"
A["Apr"]="04"
A["May"]="05"
A["Jun"]="06"
A["July"]="07"
A["Aug"]="08"
A["Sep"]="09"
A["Oct"]="10"
A["Nov"]="11"
A["Dec"]="12"
}
{
B[$1","$2]=B[$1","$2]+$9
z=sprintf( "%.2d",$3)
y=sprintf("%s",$5 A[$4] z)
if(!start[$1$2])
{
end[$1$2]=0
start[$1$2]=99999999
}
if (y < start[$1$2])
{
start[$1$2]=y
C[$1","$2]=$3"-"$4"-"$5
}
x=sprintf( "%.2d",$6)
w=sprintf("%s",$8 A[$7] x)
if(w > end[$1$2] )
{
end[$1$2]=w
D[$1","$2]=$6"-"$7"-"$8
}
}
END{
for (i in B)print i "," C[i] "," D[i] "," B[i]
}
' infile | sort
Extended GNU awk solution:
awk -F, 'function parse_date(d_str){
split(d_str, d, "-");
t = mktime(sprintf("%d %d %d 00 00 00", d[3], m[d[2]], d[1]));
return t
}
BEGIN{ m["Jan"]=1; m["Feb"]=2; m["Mar"]=3; m["Apr"]=4; m["May"]=5; m["Jun"]=6;
m["Jul"]=7; m["Aug"]=8; m["Sep"]=9; m["Oct"]=10; m["Nov"]=11; m["Dec"]=12;
}
{
k=$1 SUBSEP $2;
if (k in a){
if (parse_date(a[k]["min"]) > parse_date($3)) { a[k]["min"]=$3 }
if (parse_date(a[k]["max"]) < parse_date($4)) { a[k]["max"]=$4 }
} else {
a[k]["min"]=$3; a[k]["max"]=$4
}
a[k]["sum"]+= $5
}
END{
for (i in a) {
split(i, j, SUBSEP);
print j[1], j[2], a[i]["min"], a[i]["max"], a[i]["sum"]
}
}' OFS=',' file
The output:
proces1,pathA,5-May-2011,15-Oct-2017,7
proces1,pathB,6-Jun-2017,7-Jun-2017,1
proces2,pathB,6-Jun-2014,7-Jun-2015,2
Consider a plain text file containing the page-breaking ASCII control character "Form Feed" ($'\f'):
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that each page has a random number of lines.
I need a bash routine that returns the page number of a given line number in a text file containing page-breaking ASCII control characters.
After a long time researching the solution I finally came across this piece of code:
function get_page_from_line
{
local nline="$1"
local input_file="$2"
local npag=0
local ln=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( ++npag ))
ln=$(echo -n "$page" | wc -l)
total=$(( total + ln ))
if [ $total -ge $nline ]; then
echo "${npag}"
return
fi
done < "$input_file"
echo "0"
return
}
But, unfortunately, this solution proved to be very slow in some cases.
Any better solution ?
Thanks!
The idea to use read -d $'\f' and then count the lines is good.
This version might appear inelegant: if nline does not exceed the number of lines in the file, the file is read twice (once by wc, once by head).
Give it a try, because it is super fast:
function get_page_from_line ()
{
local nline="${1}"
local input_file="${2}"
if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
printf "0\n"
else
printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
fi
}
The performance of awk is better than that of the above bash version; awk was created for this kind of text processing.
Give this tested version a try:
function get_page_from_line ()
{
awk -v nline="${1}" '
BEGIN {
npag=1;
}
{
if (index($0,"\f")>0) {
npag++;
}
if (NR==nline) {
print npag;
linefound=1;
exit;
}
}
END {
if (!linefound) {
print 0;
}
}' "${2}"
}
When \f is encountered, the page number is increased.
NR is the current line number.
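For example, with the three-page sample file from the question saved as pages.txt (a name assumed here), line 9 ("earth") is the first line of the third page:
get_page_from_line 9 pages.txt    # prints 3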
----
For the record, there is another bash version.
This version uses only built-in commands to count the lines in the current page.
The speedtest.sh that you had provided in the comments showed it is slightly ahead (approx. 20 sec), which makes it roughly equivalent to your version:
function get_page_from_line ()
{
local nline="$1"
local input_file="$2"
local npag=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( npag + 1 ))
IFS=$'\n'
for line in ${page}
do
total=$(( total + 1 ))
if [[ total -eq nline ]] ; then
printf "%d\n" ${npag}
unset IFS
return
fi
done
unset IFS
done < "$input_file"
printf "0\n"
return
}
awk to the rescue!
awk -v RS='\f' -v n=09 '$0~"^"n"." || $0~"\n"n"." {print NR}' file
3
Updated the anchoring as mentioned in the comments:
$ for i in $(seq -w 12); do awk -v RS='\f' -v n="$i" \
'$0~"^"n"." || $0~"\n"n"." {print n,"->",NR}' file; done
01 -> 1
02 -> 1
03 -> 1
04 -> 2
05 -> 2
06 -> 2
07 -> 2
08 -> 2
09 -> 3
10 -> 3
11 -> 3
12 -> 3
A script of similar length can be written in bash itself to locate and respond to the embedded <form-feed>s contained in a file. (It will work in a POSIX shell as well, with a substitute for the substring expansion and expr for the math.) For example,
#!/bin/bash
declare -i ln=1 ## line count
declare -i pg=1 ## page count
fname="${1:-/dev/stdin}" ## read from file or stdin
printf "\nln:pg text\n" ## print header
while read -r l; do ## read each line
if [ "${l:0:1}" = $'\f' ]; then ## if form-feed found
((pg++))
printf "<ff>\n%2s:%2s '%s'\n" "$ln" "$pg" "${l:1}"
else
printf "%2s:%2s '%s'\n" "$ln" "$pg" "$l"
fi
((ln++))
done < "$fname"
Example Input File
The simple input file with embedded <form-feed>s was created with:
$ echo -e "a\nb\nc\n\fd\ne\nf\ng\nh\n\fi\nj\nk\nl" > dat/affex.txt
Which when output gives:
$ cat dat/affex.txt
a
b
c
d
e
f
g
h
i
j
k
l
Example Use/Output
$ bash affex.sh <dat/affex.txt
ln:pg text
1: 1 'a'
2: 1 'b'
3: 1 'c'
<ff>
4: 2 'd'
5: 2 'e'
6: 2 'f'
7: 2 'g'
8: 2 'h'
<ff>
9: 3 'i'
10: 3 'j'
11: 3 'k'
12: 3 'l'
With Awk, you can set RS (the record separator, default newline) to form feed (\f) and FS (the input field separator, default any sequence of horizontal whitespace) to newline (\n), and obtain the number of lines as the number of "fields" in a "record", which here is a "page".
The placement of form feeds in your data will produce some empty lines within a page so the counts are off where that happens.
awk -F '\n' -v RS='\f' '{ print NF }' file
You could reduce the number by one if $NF == "", and perhaps pass in the number of the desired page as a variable:
awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
To obtain the page number for a particular line, just feed the output of head -n number to the script, or loop over the counts until you have accrued the sum of lines:
line=1
page=1
for count in $(awk -F '\n' -v RS='\f' '{ print NF - ($NF == "") }' file); do
old=$line
((line += count))
echo "Lines $old through line are on page $page"
((page++)
done
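With the three-page sample file from the question, this loop would print:
Lines 1 through 3 are on page 1
Lines 4 through 8 are on page 2
Lines 9 through 12 are on page 3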
This GNU awk script prints the "page" for the line number given as a command-line argument:
BEGIN { ffcount=1;
search = ARGV[2]
delete ARGV[2]
if (!search ) {
print "Please provide linenumber as argument"
exit(1);
}
}
$1 ~ search { printf( "line %s is on page %d\n", search, ffcount) }
/[\f]/ { ffcount++ }
Use it like awk -f formfeeds.awk formfeeds.txt 05, where formfeeds.awk is the script, formfeeds.txt is the file and '05' is a line number.
The BEGIN rule deals mostly with the command line argument. The other rules are simple rules:
$1 ~ search applies when the first field matches the commandline argument stored in search
/[\f]/ applies when there is a formfeed
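For instance, assuming formfeeds.txt holds the 12 zero-padded line numbers used in the earlier test (pages of 3, 5 and 4 lines), the call above would print:
$ awk -f formfeeds.awk formfeeds.txt 05
line 05 is on page 2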