Convert GNU awk command to default macOS awk command

Given a file containing many lines such as, e.g.:
Z|X|20210903|07:00:00|S|33|27.71||
With wanted output of, e.g.:
Z|X|20210903|07:00:00|S|33|27.71|||03-09-2021 07:00:00
This GNU awk command works:
gawk -F'|' '{dt = gensub(/(....)(..)(..)/,"\\3-\\2-\\1",1,$3); print $0"|"dt,$4}' infile > outfile
However, I need this to work under macOS with the version of awk that is installed by default, and it produces the following error:
awk: calling undefined function gensub
input record number 1, file
source line number 1
I'm assuming the default version of awk in macOS is too old and doesn't support the gensub function.
Note that I have tried numerous other string functions to no avail. awk programming is not my area of expertise; I arrived at the GNU awk command above through a fair amount of googling, but my google-fu was unsuccessful in getting something to work with macOS awk.
Can the above GNU awk command be rewritten to work with the default version of awk in, e.g., macOS Catalina and if so how?

Would you please try the following:
awk -F'|' '{dt=substr($3,7,2) "-" substr($3,5,2) "-" substr($3,1,4); print $0 "|" dt, $4}' infile > outfile
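As a quick check, the substr() approach can be run against the sample record from the question. This is a minimal sketch; the variable names line and result are only illustrative and not part of the original answer.

```shell
# Sample record from the question
line='Z|X|20210903|07:00:00|S|33|27.71||'

# POSIX awk only: substr() slices YYYYMMDD into DD-MM-YYYY, no gensub() needed
result=$(printf '%s\n' "$line" | awk -F'|' '{
    dt = substr($3, 7, 2) "-" substr($3, 5, 2) "-" substr($3, 1, 4)
    print $0 "|" dt, $4
}')
printf '%s\n' "$result"
```

The comma in the print statement inserts the default output field separator (a single space) between the date and the time.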

Using perl instead of gawk:
$ perl -lne '
my @F = split /[|]/, $_, -1;
my $dt = ($F[2] =~ s/(....)(..)(..)/$3-$2-$1/r);
print join("|", @F, "$dt $F[3]")' <<<"Z|X|20210903|07:00:00|S|33|27.71||"
Z|X|20210903|07:00:00|S|33|27.71|||03-09-2021 07:00:00

Related

What's wrong with repeating entries in awk print statements?

I was trying to answer this other question, about how to repeat an existing column.
I thought this to be fairly easy, just by doing something like:
awk '{print $0 $2}'
This, however, only seems to print $0.
So, I decided to do some more tests:
awk '{print $0 $0}' // prints the entire line only once
awk '{print $1 $1 $1}' // prints the first entry only once
awk '{print $2 $1 $0}' // prints the first entry, followed
// by the entire line
// (the second part is not printed)
...
Looking at the results, I have the impression that awk is more or less checking what it has already printed and refuses to print it again.
Why is that?
I'm using awk from my Windows subsystem for Linux (WSL), more exactly the Ubuntu app from Canonical. This is the result of awk --version:
GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
Copyright (C) 1989, 1991-2019 Free Software Foundation.
awk '{print $0 $0}' // prints the entire line only once
awk '{print $0 $2}' // prints only $0
All of this is caused by DOS line endings (\r\n) in your file. Each record ends with a carriage return \r, which moves the cursor back to the start of the line, so whatever is printed after it overwrites what is already there and you see the text only once.
You can remove \r using tr or sed like this:
tr -d '\r' < file > file.new
sed -i.bak $'s/\\r$//' file
Or you can ask awk to treat \r\n as the record separator (this requires GNU awk):
awk -v RS='\r\n' '{print $0, $0}' file
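If GNU awk is not available, the carriage return can also be stripped inside awk itself. The following is a minimal sketch using made-up two-line input; the variable name fixed is hypothetical.

```shell
# Two sample lines with DOS (CRLF) endings, generated by printf
fixed=$(printf 'a b\r\nc d\r\n' | awk '{
    sub(/\r$/, "")   # drop the trailing carriage return; $0 is re-split afterwards
    print $0 $2      # now both parts are visible
}')
printf '%s\n' "$fixed"
```

Because sub() modifies $0, the fields are recomputed, so $2 no longer carries the \r either.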

awk expression that works on awk v4.0.2 but it does not on >= 4.2.1

I have this awk command:
echo www.host.com |awk -F. '{$1="";OFS="." ; print $0}' | sed 's/^.//'
which gets the domain from the hostname:
host.com
that command works on CentOS 7 (awk v 4.0.2), but it does not work on ubuntu 19.04 (awk 4.2.1) nor alpine (gawk 5.0.1), the output is:
host com
How could I fix that awk expression so it works in recent awk versions ?
For your provided samples, could you please try the following. It matches the regex from the first . to the end of the line, then prints everything after that first dot.
echo www.host.com | awk 'match($0,/\..*/){print substr($0,RSTART+1,RLENGTH-1)}'
Fix for the OP's code: if you want to keep your own approach, the following may help. There are two points here: first, no other command is needed alongside awk; second, FS and OFS should be set once in the BEGIN section rather than on every line.
echo www.host.com | awk 'BEGIN{FS=OFS="."} {$1="";sub(/\./,"");print}'
To get the domain, use:
$ echo www.host.com | awk 'BEGIN{FS=OFS="."}{print $(NF-1),$NF}'
host.com
Explained:
awk '
BEGIN { # before processing the data
FS=OFS="." # set input and output delimiters to .
}
{
print $(NF-1),$NF # then print the next-to-last and last fields
}'
It also works if you have arbitrarily long fqdns:
$ echo if.you.have.arbitrarily.long.fqdns.example.com |
awk 'BEGIN{FS=OFS="."}{print $(NF-1),$NF}'
example.com
And yeah, funny, your version really works with 4.0.2. And awk version 20121220.
Update:
Updated with some content checking features, see comments. Are there domains that go higher than three levels?:
$ echo and.with.peculiar.fqdns.like.co.uk |
awk '
BEGIN {
FS=OFS="."
pecs["co\034uk"]
}
{
print (($(NF-1),$NF) in pecs?$(NF-2) OFS:"")$(NF-1),$NF
}'
like.co.uk
You got 2 very good answers using awk, but I believe this should be handled with cut because of the simplicity it offers in getting all fields starting from a known position:
echo 'www.host.com' | cut -d. -f2-
host.com
Options used are:
-d.: Set delimiter as .
-f2-: Extract all the fields starting from position 2
What you are observing was a bug in GNU awk which was fixed in release 4.2.1. The changelog states:
2014-08-12 Arnold D. Robbins
OFS being set should rebuild $0 using previous OFS if $0 needs to be
rebuilt. Thanks to Mike Brennan for pointing this out.
awk.h (rebuild_record): Declare.
eval.c (set_OFS): If not being called from var_init(), check if $0 needs rebuilding. If so, parse the record fully and rebuild it. Make OFS point to a separate copy of the new OFS for next time, since OFS_node->var_value->stptr was
already updated at this point.
field.c (rebuild_record): Is now extern instead of static. Use OFS and OFSlen instead of the value of OFS_node.
When reading the code in the OP, it states:
awk -F. '{$1="";OFS="." ; print $0}'
which, according to POSIX does the following:
-F.: set the field separator FS to represent the <dot>-character
read a record
Perform field splitting with FS="."
$1="": redefine field 1 and rebuild record $0 using OFS. At this time, OFS is still a single space, so if the record $0 was www.foo.com it now reads _foo_com (underscores represent spaces). The fields are not split again, so NF is still 3.
OFS=".": redefine the output field separator OFS to be the <dot>-character. This is where the bug happened: GNU awk knew that a rebuild was needed, but performed it with the new OFS instead of the old one.
print $0: print the record $0, which is now _foo_com.
The minimal change to your program would be:
awk -F. '{OFS="."; $1=""; print $0}'
The clean change would be:
awk 'BEGIN{FS=OFS="."}{$1="";print $0}'
The perfect change would be to replace the awk and sed by the cut solution of Anubahuva
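The effect of the assignment order can be checked directly. This is a small sketch; the variable names late_ofs, early_ofs and stripped are hypothetical, and the first result reflects the POSIX behaviour described in this answer (older GNU awk releases with the bug print something different).

```shell
# OFS changed after the field assignment: $0 was already rebuilt with the old OFS (a space)
late_ofs=$(echo www.host.com | awk -F. '{ $1 = ""; OFS = "."; print $0 }')

# OFS changed before the field assignment: $0 is rebuilt with "."
early_ofs=$(echo www.host.com | awk -F. '{ OFS = "."; $1 = ""; print $0 }')

# Setting both separators once in BEGIN and dropping the leading dot inside awk
stripped=$(echo www.host.com | awk 'BEGIN { FS = OFS = "." } { $1 = ""; sub(/^\./, ""); print }')

printf '%s\n' "$early_ofs" "$stripped"
```

The second command still leaves a leading dot because the emptied first field remains part of the record; the third removes it without a separate sed call.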
If the hostname is already in a shell variable, you could use parameter expansion:
var=www.foo.com
echo "${var#*.}"
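For reference, a short sketch of the related POSIX parameter expansions; the variable names domain, tld and host are illustrative only.

```shell
var=www.foo.com

domain=${var#*.}    # remove the shortest prefix matching "*."  (drops "www.")
tld=${var##*.}      # remove the longest prefix matching "*."   (drops "www.foo.")
host=${var%%.*}     # remove the longest suffix matching ".*"   (drops ".foo.com")

printf '%s\n' "$domain" "$tld" "$host"
```

No external process is started here, which can matter when this runs inside a loop.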

convert first column in a csv file from timestamp to year-month format

Trying to convert the first column in a csv file from a unix timestamp to a date (year-month format).
Tried date -d @number '+%Y-%m' and awk, but awk doesn't recognize @ when used together
Extract from a csv file :
1556113878,60662402644292
1554090396,59547403093308
Expected output:
2019-04,60662402644292
2019-03,59547403093308
If you have GNU awk (sometimes called gawk), try:
gawk -F, '{print strftime("%Y-%m", $1),$2}' OFS=, file.csv
For example, consider this input file:
$ cat file.csv
1556113878,60662402644292
1554090396,59547403093308
Our command produces this output:
$ gawk -F, '{print strftime("%Y-%m", $1),$2}' OFS=, file.csv
2019-04,60662402644292
2019-03,59547403093308
On many Linux systems, GNU awk is the default. On others like Ubuntu, it is not, but it can be easily installed: sudo apt-get install gawk. On macOS, GNU awk can be installed via Homebrew.
If you don't have GNU AWK, you may have a system Ruby, in which case you can do this:
▶ ruby -F, -ane \
'$F[0] = Time.at($F[0].to_i).strftime("%Y-%m"); print $F.join(",")' FILE
2019-04,60662402644292
2019-04,59547403093308
Further explanation:
Unlike Perl's POSIX::strftime, system Ruby should ship with the Time module. Thus my choice of Ruby.
The command-line options: -F, sets the field separator as in AWK; -n loops over the input lines without printing them, as in sed; -a turns on AWK-like auto-split; -e supplies the script on the command line, as in sed.
$F is the array of split fields, so $F[0] corresponds to AWK's $1 (the whole line, AWK's $0, is $_). $F[0].to_i converts the epoch time string in the first field to an integer.
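If neither GNU awk nor Ruby is an option, the year and month can be computed with plain POSIX awk arithmetic. The sketch below swaps in a different technique from the answers above: a days-to-civil-date calculation. It assumes non-negative epoch seconds and always reports UTC, so a timestamp close to a month boundary may differ from the local-time strftime() result. The function name year_month and the variable converted are hypothetical.

```shell
# Assumption: timestamps are non-negative epoch seconds; output is UTC
converted=$(printf '1556113878,60662402644292\n1554090396,59547403093308\n' |
awk -F, -v OFS=, '
function year_month(ts,    z, era, doe, yoe, doy, mp, y, m) {
    z   = int(ts / 86400) + 719468            # days since 0000-03-01
    era = int(z / 146097)                     # 400-year cycles
    doe = z - era * 146097                    # day within the cycle
    yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
    doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
    mp  = int((5 * doy + 2) / 153)            # month index, March = 0
    m   = (mp < 10) ? mp + 3 : mp - 9
    y   = yoe + era * 400 + (m <= 2 ? 1 : 0)  # Jan/Feb belong to the next year
    return sprintf("%04d-%02d", y, m)
}
{ print year_month($1), $2 }')
printf '%s\n' "$converted"
```

The second sample timestamp falls on 2019-04-01 in UTC, which is why this sketch reports 2019-04 for it.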

Find string in col 1, print col 2 in awk

I'm on a Mac, and I want to find a field in a CSV file adjacent to a search string
This is going to be a single file with a hard path; here's a sample of it:
84:a5:7e:6c:a6:b0, AP-ATC-151g84
84:a5:7e:6c:a6:b1, AP-A88-131g84
84:a5:7e:73:10:32, AP-AG7-133g56
84:a5:7e:73:10:30, AP-ADC-152g81
84:a5:7e:73:10:31, AP-D78-152e80
so if my search string is "84:a5:7e:73:10:32"
I want to get returned "AP-AG7-133g56"
I had been working within an Applescript, but maybe a shell script will do.
I just need the proper syntax for opening the file and having awk search it. Again, I'm weak conceptually on how shell commands run, how they must be executed, etc
This errors, gives me ("command not found"):
set the_file to "/Users/Paw/Desktop/AP-Decoder 3.app/Contents/Resources/BSSIDtable.csv"
set the_val to "70:56:81:cb:a2:dc"
do shell script "'awk $1 ~ the_val {print $2} the_file'"
Thank you for coddling me...
This is relatively simple:
awk '$1 == "70:56:81:cb:a2:dc," {print "The answer is "$2}' 'BSSIDtable.csv'
(the "The answer is " text can be omitted if you wish to see only the data, but this shows how to get more user-friendly output if desired).
The comma is included since awk uses white space for separators so the comma becomes part of column 1.
If the thing you're looking for is in a shell variable, you can use -v to provide that to awk as an awk variable:
lookfor="70:56:81:cb:a2:dc,"
awk -v mac="$lookfor" '$1 == mac {print "The answer is "$2}' 'BSSIDtable.csv'
As an aside, your AppleScript solution is probably not working because $1 and $2 are being interpreted as shell variables rather than awk variables. If you insist on using AppleScript, you will have to figure out how to construct a shell command that quotes the awk commands correctly.
My advice is to just use the shell directly, the number of people proficient in that almost certainly far outnumber those proficient in AppleScript :-)
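A self-contained sketch of the lookup, using the sample rows from the question written to a temporary file. As an assumption beyond the answer above, it sets the field separator to a comma followed by optional spaces so the comma never becomes part of column 1; the names table, lookfor and name are illustrative.

```shell
# Sample rows from the question, stored in a temporary file
table=$(mktemp)
cat > "$table" <<'EOF'
84:a5:7e:6c:a6:b0, AP-ATC-151g84
84:a5:7e:6c:a6:b1, AP-A88-131g84
84:a5:7e:73:10:32, AP-AG7-133g56
84:a5:7e:73:10:30, AP-ADC-152g81
84:a5:7e:73:10:31, AP-D78-152e80
EOF

lookfor='84:a5:7e:73:10:32'

# -F', *' splits on a comma plus any following spaces; exit stops at the first match
name=$(awk -F', *' -v mac="$lookfor" '$1 == mac { print $2; exit }' "$table")
printf '%s\n' "$name"

rm -f "$table"
```

With this separator the search string can be passed without a trailing comma.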
If sed is available (it normally is on a Mac, even though the question is not tagged sed):
Simple, but reads the whole file:
sed -n 's/84:a5:7e:73:10:32,[[:blank:]]*//p' YourFile
Quit after the first occurrence (on average about 50% faster on a huge file):
sed -n -e '/84:a5:7e:73:10:32,[[:blank:]]*/!b' -e 's///p;q' YourFile
With awk:
awk '/^84:a5:7e:73:10:32/ {print $2}' YourFile
# OR using a variable for batch interaction (the separator ', *' keeps the comma out of $1)
awk -F', *' -v Src='84:a5:7e:73:10:32' '$1 == Src {print $2}' YourFile
# OR if the case is unknown (tolower() is portable; IGNORECASE is GNU awk only)
awk -F', *' -v Src='84:a5:7e:73:10:32' 'tolower($1) == tolower(Src) {print $2}' YourFile
By default a regex pattern is tested against the whole record $0; the ^ anchors it to the start of the line, which is where the first field begins.

Awk double-slash record separator

I am trying to separate RECORDS of a file based on the string, "//".
What I've tried is:
awk -v RS="//" '{ print "******************************************\n\n"$0 }' myFile.gb
Where the "******" etc, is just a trace to show me that the record is split.
However, the file also contains / (by themselves) and my trace, ****** is being printed there as well meaning that awk is interpreting those also as my record separator.
How can I get awk to only split records on // ????
UPDATE: I am running on Unix (the one that comes with OS X)
I found a temporary solution, being:
sed s/"\/\/"/"*"/g | awk -v RS="*" ...
But there must be a better way, especially with massive files that I am working with.
On a Mac, awk version 20070501 does not support multi-character RS. Here's an illustration using such an awk, and a comparison (on the same machine) with gawk:
$ /usr/bin/awk --version
awk version 20070501
$ /usr/bin/awk -v RS="//" '{print NR ":" $0}' <<< x//y//z
1:x
2:
3:y
4:
5:z
$ gawk -v RS="//" '{print NR ":" $0}' <<< x//y//z
1:x
2:y
3:z
If you cannot find a suitable awk, then pick a better character than *. For example, if tabs are acceptable, and if your shell supports $'...', then you could use this incantation of sed:
sed $'s,//,\t,g'
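Another way to avoid a multi-character RS is to keep awk reading line by line and treat a line consisting only of // as the end of a record. This is a sketch under the assumption that the separator always sits on a line of its own, as it does in GenBank-style files; the sample input and the variable name records are made up for illustration.

```shell
# Sample input: two records, each terminated by a line containing only "//"
records=$(printf 'LOCUS A\nfoo/bar\n//\nLOCUS B\nbaz\n//\n' | awk '
    $0 == "//" {                  # terminator line: emit the buffered record
        n++
        printf "record %d:\n%s", n, buf
        buf = ""
        next
    }
    { buf = buf $0 "\n" }         # any other line, including ones with a single "/"
')
printf '%s\n' "$records"
```

Lines that merely contain a single / are kept as ordinary content, and no placeholder character has to be chosen.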

Resources