unix shell script to get nth business day - shell

Referencing the solution posted on this unix.com thread for getting the Nth business day of the month, I tried to get the 16th business day of the month using the following code, but it doesn't work.
currCal=`/usr/bin/cal`
BUSINESS_DAYS=`echo $($currCal|nawk 'NR>2 {print substr($0,4,14)}' |tr "\n" " ")`
The error when executing this is:
nawk: syntax error at source line 1 context is
NR>2 {print >>> substr(test. <<< sh,4,14)}
nawk: illegal statement at source line 1
I'm guessing it takes $0 as the script name, causing the syntax error. Please help.

There seem to be a few issues with what you have above.
First, I agree with @John1024 that in order to get the nawk error you've posted, you must actually be running:
BUSINESS_DAYS=`echo $($currCal|nawk "NR>2 {print substr($0,4,14)}" |tr "\n" " ")`
with double quotes around the nawk script.
Furthermore, once you resolve the nawk error, you're going to run into issues with how you are using currCal. You get the actual output of the cal command into the currCal variable, but then you use the variable's value (that is, the output of cal) as a command before the | rather than echoing it into the pipe or something similar.
This brings up an additional question of why you're using echo on the result of a subshell command (the $() part) within another subshell (the outer backticks).
Finally, the two lines you show above only get a list of the business days in the current month into the BUSINESS_DAYS variable. They do not output/save the 16th such day.
Taking all of the above into consideration (and also changing to use the $() subshell syntax consistently), you might want one of the following invocations:
If you really need to cache the current month's calendar and want to pull multiple days:
currCal="$(/usr/bin/cal)"
BUSINESS_DAYS="$(echo "${currCal}" | \
nawk 'NR>2 {print substr($0,4,14)}' | \
tr "\n" " ")"
DAY=16
DAYTH_DAY="$(echo "${BUSINESS_DAYS}" | nawk -v "day=${DAY}" '{ print $day }')
If this is just a one-and-done:
DAY=16
DAYTH_DAY="$(/usr/bin/cal | \
nawk 'NR>2 {print substr($0,4,14)}' | \
tr "\n" " " | \
nawk -v "day=${DAY}" '{ print $day }')"
One more note: the processing here can be simplified if done entirely in awk(/nawk), but I wanted to stick to the basic framework you had already chosen.
Update per the request in the comment:
A pure POSIX awk version:
DAY=16
DAYTH="$(cal | awk -v "day=${DAY}" '
(NR < 3) { next ; }
/^.[0-9]/ { $1="" ; }
(NR == 3) || (NF == 7) { $NF="" ; }
{ hold=hold $0 ; }
END { split(hold,arr," ") ; print arr[day] ; }')"
Yes, simplified is a matter of opinion, and I'm sure someone can make this more concise. Explanation of how this works:
Skip the header of the cal output:
(NR < 3) { next ; }
For weeks that have a date on the Sunday (the second character is then a digit), trim the date of that Sunday:
/^.[0-9]/ { $1="" ; }
For the first week of the month (row three of the cal output, which may start after Sunday but always ends on Saturday) or weeks that have a full seven days, trim the date of Saturday for that week:
(NR == 3) || (NF == 7) { $NF="" ; }
Once the lines only have the dates of weekdays, concatenate them onto hold:
{ hold=hold $0 ; }
At the end, split hold on spaces so we can grab the Nth day:
END { split(hold,arr," ") ; print arr[day] ; }
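For a quick sanity check you can pin the month, since cal accepts the usual month and year arguments (a sketch; December 2021 is an arbitrary choice):
DAY=16
DAYTH="$(cal 12 2021 | awk -v "day=${DAY}" '
(NR < 3) { next ; }
/^.[0-9]/ { $1="" ; }
(NR == 3) || (NF == 7) { $NF="" ; }
{ hold=hold $0 ; }
END { split(hold,arr," ") ; print arr[day] ; }')"
echo "${DAYTH}"
# the business days of December 2021 run 1,2,3,6,...,21,22,... so this prints 22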

No awk, just software tools:
set -- $(cal -h | rev | cut --complement -b-5,20- | rev | tail -n +3) ; \
shift 15 ; echo $1
Output:
22
The output of cal is tricky to parse because:
It's right justified.
It's space delimited.
One- or two-digit dates mean two or one delimiting spaces, respectively.
More leading spaces for first days of month.
Parsing won't quite work without the -h option (which turns off 'today' highlighting).
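To pull the Nth business day rather than hard-coding the 16th (the shift 15 above), the shift count can be parameterized; a sketch reusing the same GNU cut pipeline, where N is a hypothetical variable:
N=16
set -- $(cal -h | rev | cut --complement -b-5,20- | rev | tail -n +3)
shift $((N - 1)) ; echo $1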

Related

Using awk to replace part of a line with the output of a program

I've got a file with several columns, like so:
13:46:48 1.2.3.4:57 user1
13:46:49 5.6.7.8:58 user2
13:48:07 9.10.11.12:59 user3
I'd like to transform one of the columns by passing it as input to a program:
echo "1.2.3.4:57" | transformExternalIp
10.0.0.4:57
I wrote a small bit of awk to do this:
awk '{ ("echo " $2 " | transformExternalIp") | getline output; $2=output; print}'
But what I got surprised me. Initially, it looked like it was working as expected, but then I started to see weird repeated values. In order to debug, I removed my fancy "transformExternalIp" program in case it was the problem and replaced it with echo and cat, which means literally nothing should change:
awk '{ ("echo " $2 " | cat") | getline output; print $2 " - " output}' connections.txt
For the first thousand lines or so, the left and right sides matched, but then after that, the right side frequently stopped changing:
1.2.3.4:57 - 1.2.3.4:57
2.2.3.4:12 - 2.2.3.4:12
3.2.3.4:24 - 3.2.3.4:24
# .... (okay for a long while)
120.120.3.4:57 - 120.120.3.4:57
121.120.3.4:25 - 120.120.3.4:57
122.120.3.4:100 - 120.120.3.4:57
123.120.3.4:76 - 120.120.3.4:57
What the heck have I done wrong? I'm guessing that I'm misunderstanding something about awk.
Close the command after each invocation to ensure a new copy of the command is run for the next set of input, e.g.:
awk '{ ("echo " $2 " | transformExternalIp") | getline output
close("echo " $2 " | transformExternalIp")
$2=output
print
}'
# or, to reduce issues from making a typo:
awk '{ cmd="echo " $2 " | transformExternalIp"
(cmd) | getline output
close(cmd)
$2=output
print
}'
For more details see this and this.
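Applied to the debug one-liner from the question, the fix looks like this (a sketch):
awk '{ cmd = "echo " $2 " | cat"
       cmd | getline output
       close(cmd)
       print $2 " - " output
     }' connections.txt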
During my testing with a dummy script (echo $RANDOM; sleep .1) I could generate results similar to the OP's ... some good/expected lines and then a bunch of duplicates.
I noticed that as soon as the duplicates started occurring, the dummy script wasn't actually being called any more and instead awk was treating the system call as a static result (i.e., it kept re-using the value from the last 'good' call); it was quite noticeable because the sleep .1 was no longer being called, so the output from the awk script sped up significantly.
Can't say that I understand 100% what's happening under the covers ... perhaps an issue with how the script (my dummy script; OP's transformExternalIp) behaves with multiple lines of input when expecting one line of input ... an issue with a limit on the number of open/active process handles ... shrug
("echo" $2" | cat") creates a fork almost every time that you use it.
Then, when the above instruction reaches some kind of fork limit, the output variable isn't updated by getline anymore; that's what's happening here.
If you're using GNU awk then you can fix the issue with a Coprocess:
awk '
BEGIN { cmd = "cat" }
{
print $2 |& cmd
cmd |& getline output
print $2 " - " output
}
' connections.txt
Awk is a tool to manipulate text. A shell is a tool to sequence calls to other tools. Don't use a shell to call awk to sequence calls to transformExternalIp as if it were a shell, just use a shell:
while read -r _ old_ip _; do
new_ip=$(printf '%s\n' "$old_ip" | transformExternalIp)
printf '%s - %s\n' "$old_ip" "$new_ip"
done < connections.txt
When you're using awk you're spawning a subshell for every call to transformExternalIp so it's no more efficient (probably a bit less efficient) than just staying in shell.

Show with star symbols how many times a user has logged in

I'm trying to create a simple shell script showing how many times a user has logged in to their Linux machine over at least the past week. The output of the shell script should be like this:
2021-12-16
****
2021-12-15
**
2021-12-14
*******
I have tried this so far, but it shows only numbers; I want it to show * symbols.
user="$1"
last -F | grep "${user}" | sed -E "s/${user}.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) //" | awk '{print $1"-"$2"-"$4}' | uniq -c
Any help?
You might want to refactor all of this into a simple Awk script, where repeating a string n times is also easy.
user="$1"
last -F |
awk -v user="$1" 'BEGIN { split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":");
for(i=1; i<=12; i++) mon[m[i]] = sprintf("%02i", i) }
$1 == user { ++count[$8 "-" mon[$5] "-" sprintf("%02i", $6)] }
END { for (date in count) {
padded = sprintf("%-" count[date] "s", "*");
gsub(/ /, "*", padded);
print date, padded } }'
The BEGIN block creates an associative array mon which maps English month abbreviations to month numbers.
sprintf("%02i", number) produces the value of number with zero padding to two digits (i.e. adds a leading zero if number is a single digit).
The $1 == user condition matches the lines where the first field is equal to the user name we passed in. (Your original attempt had two related bugs here: it would look for the user name anywhere in the line, so if the user name happened to match another field, it would erroneously match on that; and the regex you used would match a substring of a longer field.)
When that matches, we just update the value in the associative array count whose key is the current date.
Finally, in the END block, we simply loop over the values in count and print them out. Again, we use sprintf to produce a field with a suitable length. We play a little trick here by space-padding to the specified width, because sprintf does that out of the box, and then replace the spaces with more asterisks.
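The padding trick in isolation (a sketch, using a fixed count of 5):
awk 'BEGIN { padded = sprintf("%-5s", "*"); gsub(/ /, "*", padded); print padded }'
# prints *****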
Your desired output shows the asterisks on a separate line from the date; obviously, it's easy to change that if you like, but I would advise against it in favor of a format which is easy to sort, grep, etc (perhaps to then reformat into your desired final human-readable form).
If you have GNU sed you're almost there. Just pipe the output of uniq -c to this GNU sed command:
sed -En 's/^\s*(\S+)\s+(\S+).*/printf "\2\n%\1s" ""/e;s/ /*/g;p'
Explanation: in the output of uniq -c we substitute a line like:
6 Dec-15-2021
by:
printf "Dec-15-2021\n%6s" ""
and we use the e GNU sed flag (this is a GNU sed extension so you need GNU sed) to pass this to the shell. The output is:
Dec-15-2021
where the second line contains 6 spaces. This output is copied back into the sed pattern space. We finish by a global substitution of spaces by stars and print:
Dec-15-2021
******
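Put together with the pipeline you already have (a sketch, reusing your command unchanged):
user="$1"
last -F | grep "${user}" | sed -E "s/${user}.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) //" |
awk '{print $1"-"$2"-"$4}' | uniq -c |
sed -En 's/^\s*(\S+)\s+(\S+).*/printf "\2\n%\1s" ""/e;s/ /*/g;p'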
A simple solution, using a tempfile
#!/bin/bash
user="$1"
tempfile="/tmp/last.txt"
IFS='
'
last -F | grep "${user}" | sed -E "s/${user}.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) //" | awk '{print $1"-"$2"-"$4}' | uniq -c > "$tempfile"
for LINE in $(cat $tempfile)
do
qtde=$(echo $LINE | awk '{print $1}')
data=$(echo $LINE | awk '{print $2}')
echo -e "$data "
for ((i=1; i<=qtde; i++))
do
echo -e "*\c"
done
echo -e "\n"
done

Bash find last entry before timestamp

I have a .csv file that is formatted thus;
myfile.csv
Date,Timestamp,Data1,Data2,Data3,Data4,Data5,Data6
20130730,22:08:51.244,APPLES,Spain,67p,blah,blah
20130730,22:08:51.244,PEARS,Spain,32p,blah,blah
20130730,22:08:51.708,APPLES,France,102p,blah,blah
20130730,22:10:62.108,APPLES,Spain,67p,blah,blah
20130730,22:10:68.244,APPLES,Spain,67p,blah,blah
I wish to feed in a timestamp which most likely will NOT match up perfectly to the millisecond with those in the file, and find the preceding line that matches a particular grep search.
so e.g. something like;
cat myfile.csv | grep 'Spain' | grep 'APPLES' | grep -B1 "22:09"
should return
20130730,22:08:51.244,APPLES,Spain,67p,blah,blah
But thus far I can only get it to work with exact timestamps in the grep. Is there a way to get it to treat these as a time series? (I am guessing that's what the issue is here - it's trying pure pattern matching and not unreasonably failing to find one)
I have also a fancy solution using awk:
awk -F ',' -v mytime="2013 07 30 22 09 00" '
BEGIN {tlimit=mktime(mytime); lastline=""}
{
l_y=substr($1,1,4); l_m=substr($1,5,2); l_d=substr($1,7,2);
split($2,l_hms,":"); l_hms[3]=int(l_hms[3]);
line_time=mktime(sprintf("%d %d %d %d %d %d", l_y, l_m, l_d, l_hms[1], l_hms[2], l_hms[3]));
if (line_time>tlimit) exit; lastline=$0;
}
END{if (lastline=="") print $0; else print lastline;}' myfile.csv
It works by building a timestamp from each line with awk's time function mktime. I also make the assumption that $1 is the date.
On the first line, you have to provide the timestamp of the time limit you want (here I chose 2013 07 30 22 09 00). You have to write it according to the format used by mktime: YYYY MM DD hh mm ss. The awk script begins by building the timestamp of your time limit. Then, for each line, it extracts year, month and day from $1 (line 4), then the exact time from $2 (line 5). As mktime takes only whole seconds, I truncate the seconds (you can round instead with int(l_hms[3]+0.5)). Here you can do whatever you want to approximate the timestamp, like discarding the seconds. On line 6, I build the timestamp from the six date fields I have extracted. Finally, on line 7, I compare timestamps and exit on reaching your time limit. As you want the preceding line, I store each line in the variable lastline. On exit, I print lastline; in case the time limit is reached on the first line, I print the first line.
This solution works well on your sample file, and works for any date you supply. You only have to supply the date limit in the correct format!
EDIT
I realize that mktime is not necessary. If $1 is indeed the date written as YYYYMMDD, you can compare the date as a number and then the time (extracted with split, rebuilt as a number as in other answers). In that case, you can supply the time limit in the format you want and recover proper date and time limits in the BEGIN block.
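A sketch of that simplification (assuming $1 is YYYYMMDD and the lines are in ascending order; the limit is supplied as separate date and time strings):
awk -F ',' -v d="20130730" -v t="22:09" '
($1 > d) || ($1 == d && $2 >= t) { exit }
{ lastline = $0 }
END { print lastline }' myfile.csv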
You could have an awk script that keeps in memory the last line it saw with a timestamp lower than the one you feed it, and prints the last match at the end (assuming the lines are in ascending order). For example:
awk -v FS=',' -v thetime="22:09" '($2 < thetime) { before=$0 ; } END { print before ; }' myfile.csv
This happens to work because, lexicographically, the string you feed it doesn't need to have the complete length (i.e. 22:09:00.000) to be compared.
The same, but on several lines for readability:
awk -v FS=',' -v thetime="22:09" '
($2 < thetime) { before=$0 ; }
END { print before ; }' myfile.csv
Now if I understand your complete requirements: you need to find, among lines matching a country and a type of product, the last line before a timestamp? Then:
awk -v FS=',' -v thetime="${timestamp}" -v country="${thecountry}" -v product="${theproduct}" '
( $4 == country ) && ( $3 == product ) && ( $2 < thetime ) { before=$0 ; }
END { print before ; }' myfile.csv
should work for you... (feed it 22:09, Spain and APPLES, and it returns the expected "20130730,22:08:51.244,APPLES,Spain,67p,blah,blah" line)
And if your file spans several days (to address Bentoy13's concern):
awk -v FS=',' -v theday="${theday}" -v thetime="${timestamp}" -v thecountry="${thecountry}" -v theproduct="${theproduct}" '
( $4 == thecountry ) && ( $3 == theproduct ) && (($1<theday)||(($1==theday)&&($2<thetime))) { before=$0 ; }
END { print before ; }' myfile.csv
That last one also works if the first column changes (i.e., if the data spans several days), but you also need to feed it theday.
You could use awk instead of your grep like this:
awk -v FS=',' -v Hour=22 -v Min=9 '{split($2, a, "[:]"); if ((3600*a[1] + 60*a[2] + a[3] - 3600*Hour - 60*Min)^2 < 100) print $0}' file
and change the 100 to whatever tolerance you want; it is the square of the allowed deviation in seconds, so 100 means within 10 seconds.
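For instance, with Hour=22 and Min=9 on the sample file, every line within ten seconds of 22:09:00 is printed:
awk -v FS=',' -v Hour=22 -v Min=9 '{split($2, a, "[:]"); if ((3600*a[1] + 60*a[2] + a[3] - 3600*Hour - 60*Min)^2 < 100) print $0}' myfile.csv
20130730,22:08:51.244,APPLES,Spain,67p,blah,blah
20130730,22:08:51.244,PEARS,Spain,32p,blah,blah
20130730,22:08:51.708,APPLES,France,102p,blah,blah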

How to print all the columns after a particular number using awk?

On shell, I pipe to awk when I need a particular column.
This prints column 9, for example:
... | awk '{print $9}'
How can I tell awk to print all the columns including and after column 9, not just column 9?
awk '{ s = ""; for (i = 9; i <= NF; i++) s = s $i " "; print s }'
When you want a range of fields, awk doesn't really have a straightforward way to do this. I would recommend cut instead:
cut -d' ' -f 9- ./infile
Edit
Added a space field delimiter because the default is a tab. Thanks to Glenn for pointing this out.
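For example:
echo '1 2 3 4 5 6 7 8 9 10 11' | cut -d' ' -f 9-
9 10 11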
awk '{print substr($0, index($0,$9))}'
Edit:
Note, this doesn't work if any field before the ninth contains the same value as the ninth.
sed -re 's,\s+, ,g' | cut -d ' ' -f 9-
Instead of dealing with variable width whitespace, replace all whitespace as single space. Then use simple cut with the fields of interest.
It doesn't use awk so isn't germane but seemed appropriate given the other answers/comments.
Generally perl replaces awk/sed/grep et al., and is much more portable (as well as just being a better penknife).
perl -lane 'print "#F[8..$#F]"'
Timtowtdi applies of course.
awk -v m="\x01" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
This chops what is before the given field nr., N, and prints all the rest of the line, including field nr. N, maintaining the original spacing (it does not reformat). It doesn't matter if the field's string also appears somewhere else in the line, which is the problem with Ascherer's answer.
Define a function:
fromField () {
awk -v m="\x01" -v N="$1" '{$N=m$N; print substr($0,index($0,m)+1)}'
}
And use it like this:
$ echo " bat bi iru lau bost " | fromField 3
iru lau bost
$ echo " bat bi iru lau bost " | fromField 2
bi iru lau bost
Output maintains everything, including trailing spaces
For N=0 it returns the whole line, as is, and for N>NF the empty string
Here is an example of ls -l output:
-rwxr-----# 1 ricky.john 1493847943 5610048 Apr 16 14:09 00-Welcome.mp4
-rwxr-----# 1 ricky.john 1493847943 27862521 Apr 16 14:09 01-Hello World.mp4
-rwxr-----# 1 ricky.john 1493847943 21262056 Apr 16 14:09 02-Typical Go Directory Structure.mp4
-rwxr-----# 1 ricky.john 1493847943 10627144 Apr 16 14:09 03-Where to Get Help.mp4
My solution to print anything post $9 is awk '{print substr($0, 61, 50)}'
Using cut instead of awk and overcoming issues with figuring out which column to start at by using the -c character cut command.
Here I am saying, give me all but the first 49 characters of the output.
ls -l /some/path/*/* | cut -c 50-
The /*/* at the end of the ls command is saying show me what is in subdirectories too.
You can also pull out certain ranges of characters ala (from the cut man page). E.g., show the names and login times of the currently logged in users:
who | cut -c 1-16,26-38
To display the first 3 fields and print the remaining fields you can use:
awk '{s = ""; for (i=4; i<=NF; i++) s = s $i " "; print $1, $2, $3, s}' filename
where $1 $2 $3 are the first 3 fields.
function print_fields(field_num1, field_num2){
    input_line = $0
    j = 1;
    for (i = field_num1; i <= field_num2; i++){
        $(j++) = $(i);                      # copy the wanted fields to the front
    }
    NF = field_num2 - field_num1 + 1;       # truncate the record to those fields
    print $0
    $0 = input_line                         # restore the original record
}
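Assuming the function is pasted into the same program, a usage sketch for the question's case (print from field 9 through the last; the extra parameters make i, j, and input_line local, a common awk idiom):
awk '{ print_fields(9, NF) }
function print_fields(field_num1, field_num2,    input_line, i, j) {
    input_line = $0
    j = 1
    for (i = field_num1; i <= field_num2; i++)
        $(j++) = $(i)
    NF = field_num2 - field_num1 + 1
    print $0
    $0 = input_line
}' file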
Usually it is desired to pass the remaining columns unmodified, that is, without collapsing contiguous white space.
Imagine the case of processing the output of ls -l or ps faux (not recommended, just giving examples where the last column may contain sequences of whitespace). We'd want any contiguous white space in the remaining columns preserved so that a file named my  file.txt (with two spaces) doesn't become my file.txt.
Preserving white space for the remainder of the line is surprisingly difficult using awk. The accepted awk-based answer does not, even with the suggested refinements.
sed or perl are better suited to this task.
sed
echo '1 2 3 4 5 6 7 8 9 10' | sed -E 's/^([^ \t]*[ \t]*){8}//'
Result:
9 10
The -E option enables modern ERE regex syntax. This saves me the trouble of backslash escaping the parentheses and braces.
The {8} is a quantifier indicating to match the previous item exactly 8 times.
The sed s command replaces 8 occurrences of white space delimited words by an empty string. The remainder of the line is left intact.
perl
Perl regex supports the \h escape for horizontal whitespace.
echo '1 2 3 4 5 6 7 8 9 10' | perl -pe 's/^(\H*\h*){8}//'
Result:
9 10
ruby -lane 'print $F[3..-1].join(" ")' file

Print last 10 rows of specific columns using awk

I have the awk command below and it works, aside from the fact that it performs the print action on the entire file (as expected). I would like it to perform the formatting on just the last 10 lines of the file (or any arbitrary number). Any suggestions are greatly appreciated, thanks!
I know one solution would be to pipe it with tail, but would like to stick with a pure awk solution.
awk '{print "<category label=\"" $13 " " $14 " " $15 "\"/>"}' foofile
There is no need to be orthodox with a language or tool on the Unix shell.
tail -10 foofile | awk '{print "<category label=\"" $13 " " $14 " " $15 "\"/>"}'
is a good solution. And, you already had it.
Your arbitrary number can still be used as an argument to tail; nothing is lost, and the solution loses no elegance.
Using a ring buffer, this one-liner prints the last 10 lines:
awk '{a[NR%10]=$0}END{for(i=NR+1;i<=NR+10;i++)print a[i%10]}'
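For example (a sketch):
seq 15 | awk '{a[NR%10]=$0}END{for(i=NR+1;i<=NR+10;i++)print a[i%10]}'
prints 6 through 15.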
then, you can merge "print last 10 lines" and "print specific columns" like below;
{
arr_line[NR % 10] = $0;
}
END {
for (i = NR + 1; i <= NR + 10; i++) {
split(arr_line[i % 10], arr_field);
print "<category label=\"" arr_field[13] " " \
arr_field[14] " " \
arr_field[15] "\"/>";
}
}
I don't think this can be tidily done in awk. The only way you can do it is to buffer the last X lines, and then print them in the END block.
I think you'll be better off sticking with tail :-)
Just for the last 10 rows:
awk 'BEGIN{OFS="\n"}
{
a=b;b=c;c=d;d=e;e=f;f=g;g=h;h=i;i=j;j=$0
}END{
print a,b,c,d,e,f,g,h,i,j
}' file
In the case of a variable number of columns, I have worked out two solutions:
#cutlast [number] [[$1] [$2] [$3]...]
function cutlast {
    num=${1:-1}; shift
    list=( ${*:-$(cat /proc/$$/fd/0)} )      # use args, or read stdin
    output=${list[@]:${#list[@]}-num}        # keep the last num words
    test -z "$output" && return 1 || echo $output
}
#example: cutlast 2 one two three print print  # echoes "print print"
#example1: echo one two three four print print | cutlast 2  # echoes "print print"
or
function cutlast {
    num=${1:-1}; shift
    input=${*:-$(cat /proc/$$/fd/0)}                         # use args, or read stdin
    output=$(echo $input | rev | cut -d ' ' -f-$num | rev)   # keep the last num fields
    test -z "$output" && return 1 || echo $output
}
#example: cutlast 2 one two three print print  # echoes "print print"
There are loads of awk one-liners in this text document; not sure if any of those will help.
This specifically might be what you're after (something similar anyway):
# print the last 2 lines of a file (emulates "tail -2")
awk '{y=x "\n" $0; x=$0};END{print y}'
awk '{ y=x "\n" $0; x=$0 }; END { print y }'
This is very inefficient: it reads the whole file line by line only to print the last two lines.
Because there is no seek() statement in awk, it is recommended to use tail to print the last lines of a file.
