awk expression that works on awk v4.0.2 but not on >= 4.2.1 - bash

I have this awk command:
echo www.host.com |awk -F. '{$1="";OFS="." ; print $0}' | sed 's/^.//'
What it does is extract the domain from the hostname:
host.com
That command works on CentOS 7 (awk v4.0.2), but it does not work on Ubuntu 19.04 (awk 4.2.1) or Alpine (gawk 5.0.1); the output is:
host com
How could I fix that awk expression so it works in recent awk versions?

For your provided sample, could you please try the following. It matches the regex from the very first . to the end of the line, then prints everything after that first dot.
echo www.host.com | awk 'match($0,/\..*/){print substr($0,RSTART+1,RLENGTH-1)}'
Fix for the OP's code: in case you want to keep your own attempt, the following may help. There are two points here: 1st, there is no need to use any other command along with awk for this processing; 2nd, FS and OFS should be set in the BEGIN section, not on every line as you are doing.
echo www.host.com | awk 'BEGIN{FS=OFS="."} {$1="";sub(/\./,"");print}'
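Another way to sidestep the field-rebuild question entirely is to never assign a field at all and just delete the first label and its dot from $0 with sub(). A minimal sketch, assuming each input line is one hostname; strip_first_label is a hypothetical helper name:

```shell
# Hypothetical helper: delete everything up to and including the first dot.
# Only $0 is edited; no field is assigned, so FS/OFS and record
# rebuilding never come into play. Lines without a dot pass through.
strip_first_label() {
  awk '{ sub(/^[^.]*\./, ""); print }'
}

echo www.host.com | strip_first_label   # prints host.com
```

Because nothing depends on when OFS is set, this behaves the same on old and new gawk.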

To get the domain, use:
$ echo www.host.com | awk 'BEGIN{FS=OFS="."}{print $(NF-1),$NF}'
host.com
Explained:
awk '
BEGIN { # before processing the data
FS=OFS="." # set input and output delimiters to .
}
{
print $(NF-1),$NF # then print the next-to-last and last fields
}'
It also works if you have arbitrarily long fqdns:
$ echo if.you.have.arbitrarily.long.fqdns.example.com |
awk 'BEGIN{FS=OFS="."}{print $(NF-1),$NF}'
example.com
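If the input is a list of hostnames, a single-label name such as localhost would come out as localhost.localhost (both $(NF-1) and $NF refer to the whole line when NF is 1), and on a blank line NF-1 is -1, an invalid field number. A sketch with a guard, assuming one hostname per line; last_two_labels is a hypothetical name:

```shell
# Sketch: print the last two labels of each hostname read from stdin.
# Lines with fewer than two labels (e.g. "localhost" or a blank line)
# are passed through unchanged.
last_two_labels() {
  awk 'BEGIN { FS = OFS = "." }
       NF >= 2 { print $(NF-1), $NF; next }
       { print }'
}

# prints host.com, example.com and localhost, one per line
printf '%s\n' www.host.com a.b.example.com localhost | last_two_labels
```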
And yes, funnily enough, your version really does work with 4.0.2, and with awk version 20121220.
Update:
Updated with some content checking features, see comments. Are there domains that go higher than three levels?:
$ echo and.with.peculiar.fqdns.like.co.uk |
awk '
BEGIN {
FS=OFS="."
pecs["co\034uk"]
}
{
print (($(NF-1),$NF) in pecs?$(NF-2) OFS:"")$(NF-1),$NF
}'
like.co.uk
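The pecs table can also be filled from a list passed in with -v instead of being hard-coded. A sketch under that assumption; registered_domain is a hypothetical name, and the suffixes in the usage line are only examples, not a complete registry of such suffixes:

```shell
# Sketch: like the answer above, but the two-label suffixes are passed in
# as a space-separated list such as "co.uk com.au".
registered_domain() {
  awk -v suffixes="$1" '
    BEGIN {
      FS = OFS = "."
      n = split(suffixes, list, " ")
      for (i = 1; i <= n; i++) {
        split(list[i], part, "[.]")
        pecs[part[1], part[2]]   # with default SUBSEP, same key as pecs["co\034uk"]
      }
    }
    NF >= 3 && (($(NF-1), $NF) in pecs) { print $(NF-2), $(NF-1), $NF; next }
    NF >= 2 { print $(NF-1), $NF; next }
    { print }'
}

# prints like.co.uk, then host.com
printf '%s\n' and.with.peculiar.fqdns.like.co.uk www.host.com |
  registered_domain "co.uk com.au"
```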

You got two very good awk answers, but I believe this should be handled with cut because of the simplicity it offers in getting all fields starting from a known position:
echo 'www.host.com' | cut -d. -f2-
host.com
Options used are:
-d.: Set delimiter as .
-f2-: Extract all the fields starting from position 2
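Note that cut -f2- and the $(NF-1),$NF answer agree on www.host.com but not on longer names: cut drops only the first label, while the awk answer keeps only the last two. A small comparison sketch; compare_domain is a hypothetical helper name:

```shell
# Hypothetical helper: show what each approach returns for one hostname.
compare_domain() {
  printf '%s -> cut: %s, awk: %s\n' "$1" \
    "$(echo "$1" | cut -d. -f2-)" \
    "$(echo "$1" | awk 'BEGIN{FS=OFS="."}{print $(NF-1),$NF}')"
}

compare_domain www.host.com         # both give host.com
compare_domain mail.eu.example.com  # cut: eu.example.com, awk: example.com
```

Which one is right depends on whether you want "everything after the host label" or "the last two labels".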

What you are observing was a bug in GNU awk which was fixed in release 4.2.1. The changelog states:
2014-08-12 Arnold D. Robbins
OFS being set should rebuild $0 using previous OFS if $0 needs to be
rebuilt. Thanks to Mike Brennan for pointing this out.
awk.h (rebuild_record): Declare.
eval.c (set_OFS): If not being called from var_init(), check if $0 needs rebuilding. If so, parse the record fully and rebuild it. Make OFS point to a separate copy of the new OFS for next time, since OFS_node->var_value->stptr was already updated at this point.
field.c (rebuild_record): Is now extern instead of static. Use OFS and OFSlen instead of the value of OFS_node.
When reading the code in the OP, it states:
awk -F. '{$1="";OFS="." ; print $0}'
which, according to POSIX, does the following:
-F.: set the field separator FS to the <dot> character.
Read a record.
Perform field splitting with FS=".".
$1="": redefine field 1 and rebuild the record $0 using OFS. At this time OFS is still a single space, so if the record $0 was www.foo.com it now reads _foo_com (underscores represent spaces). The fields are not split again, so NF is still 3, but $0 no longer contains any dots.
OFS=".": redefine the output field separator OFS to be the <dot> character. This is where the bug was in older gawk versions: GNU awk knew that a rebuild was needed, but did it with the new OFS instead of the old one.
print $0: print the record $0, which is now _foo_com.
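The same rule, that assigning to any field rebuilds $0 with the OFS in effect at that moment, is what the common $1=$1 idiom relies on: set OFS first, then touch a field to force the rebuild. A minimal sketch; redelimit is a hypothetical helper name and the separators used below are arbitrary:

```shell
# Set OFS *before* assigning a field, so the rebuilt $0 uses the new OFS.
# The assignment $1 = $1 changes no value; it only forces the rebuild.
redelimit() {
  awk -F. -v OFS="$1" '{ $1 = $1; print }'
}

echo www.foo.com | redelimit -   # prints www-foo-com
```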
The minimal change to your awk program (the sed that strips the leading dot is still needed) would be:
awk -F. '{OFS="."; $1=""; print $0}'
The clean change would be:
awk 'BEGIN{FS=OFS="."}{$1="";print $0}'
The perfect change would be to replace both awk and sed with the cut solution by Anubahuva.
If you have the hostname in a shell variable, you could use parameter expansion instead:
var=www.foo.com
echo "${var#*.}"
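${var#*.} removes the shortest leading part that ends in a dot; if the value contains no dot, it is returned unchanged. The related POSIX expansions ## (longest prefix) and % (shortest suffix) cover the other common cases. A sketch; show_expansions is a hypothetical helper name:

```shell
# POSIX parameter expansion, no awk or sed process needed.
#   ${var#*.}   drop the first label      www.foo.com -> foo.com
#   ${var##*.}  keep only the last label  www.foo.com -> com
#   ${var%.*}   drop the last label       www.foo.com -> www.foo
show_expansions() {
  var=$1
  printf '%s | %s | %s\n' "${var#*.}" "${var##*.}" "${var%.*}"
}

show_expansions www.foo.com   # prints foo.com | com | www.foo
```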

Related

Find string in col 1, print col 2 in awk

I'm on a Mac, and I want to find the field in a CSV file adjacent to a search string.
This is going to be a single file with a hard path; here's a sample of it:
84:a5:7e:6c:a6:b0, AP-ATC-151g84
84:a5:7e:6c:a6:b1, AP-A88-131g84
84:a5:7e:73:10:32, AP-AG7-133g56
84:a5:7e:73:10:30, AP-ADC-152g81
84:a5:7e:73:10:31, AP-D78-152e80
So if my search string is "84:a5:7e:73:10:32",
I want "AP-AG7-133g56" returned.
I had been working within an AppleScript, but maybe a shell script will do.
I just need the proper syntax for opening the file and having awk search it. Again, I'm weak conceptually on how shell commands run, how they must be executed, etc.
This fails with "command not found":
set the_file to "/Users/Paw/Desktop/AP-Decoder 3.app/Contents/Resources/BSSIDtable.csv"
set the_val to "70:56:81:cb:a2:dc"
do shell script "'awk $1 ~ the_val {print $2} the_file'"
Thank you for coddling me...
This is relatively simple:
awk '$1 == "70:56:81:cb:a2:dc," {print "The answer is "$2}' 'BSSIDtable.csv'
(The "The answer is " text can be omitted if you only wish to see the data, but it shows how to get more user-friendly output if desired.)
The comma is included because awk splits on white space by default, so the comma becomes part of column 1.
If the thing you're looking for is in a shell variable, you can use -v to provide that to awk as an awk variable:
lookfor="70:56:81:cb:a2:dc,"
awk -v mac="$lookfor" '$1 == mac {print "The answer is "$2}' 'BSSIDtable.csv'
As an aside, your AppleScript solution is probably not working because the $1/$2 are being interpreted as shell variables rather than awk variables; also, the_val and the_file sit inside the string literal, so AppleScript passes them as literal text rather than substituting their values. If you insist on using AppleScript, you will have to figure out how to construct a shell command that quotes the awk program correctly.
My advice is to just use the shell directly, the number of people proficient in that almost certainly far outnumber those proficient in AppleScript :-)
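Alternatively, awk can split on the comma itself if FS is a regular expression, so the MAC address can be passed without its trailing comma. A sketch, assuming every line has the form "MAC, name"; ap_name is a hypothetical helper that reads the file named in its second argument, or standard input when none is given:

```shell
# FS is the regex ", *": a comma followed by optional spaces.
# $1 is then the bare MAC address and $2 the AP name.
# exit stops reading after the first match.
ap_name() {
  awk -F', *' -v mac="$1" '$1 == mac { print $2; exit }' "${2:--}"
}

# prints AP-AG7-133g56
printf '84:a5:7e:6c:a6:b0, AP-ATC-151g84\n84:a5:7e:73:10:32, AP-AG7-133g56\n' |
  ap_name 84:a5:7e:73:10:32
```

With the real file it would be called as ap_name 84:a5:7e:73:10:32 BSSIDtable.csv.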
If sed is available (it normally is on a Mac, even though sed is not tagged in the question):
Simple, but reads the whole file:
sed -n 's/84:a5:7e:73:10:32,[[:blank:]]*//p' YourFile
Quits after the first occurrence (so on average about 50% faster on a huge file):
sed -n -e '/84:a5:7e:73:10:32,[[:blank:]]*/!b' -e 's///p;q' YourFile
awk:
awk '/^84:a5:7e:73:10:32/ {print $2}' YourFile
# OR using a variable for batch interaction (-F', *' keeps the comma out of $1)
awk -F', *' -v Src='84:a5:7e:73:10:32' '$1 == Src {print $2}' YourFile
# OR assuming that the case is unknown
awk -F', *' -v Src='84:a5:7e:73:10:32' 'BEGIN{IGNORECASE=1} $1 == Src {print $2}' YourFile
A bare regex pattern is tested against the whole line $0 by default; the ^ anchors it to the start of the line, so it effectively matches the first field.
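IGNORECASE is a gawk extension, so the awk shipped with macOS will not honor it. A portable sketch of the case-insensitive lookup lowercases both sides with tolower(), which POSIX awk provides; ap_name_ci is a hypothetical name, and FS is set to ", *" so that $1 does not carry the comma:

```shell
# Case-insensitive match that works in any POSIX awk (no IGNORECASE).
# Reads the file given as $2, or standard input if none is given.
ap_name_ci() {
  awk -F', *' -v Src="$1" 'tolower($1) == tolower(Src) { print $2 }' "${2:--}"
}

# prints AP-AG7-133g56 even though the file uses upper-case hex digits
printf '84:A5:7E:73:10:32, AP-AG7-133g56\n' | ap_name_ci 84:a5:7e:73:10:32
```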

Awk double-slash record separator

I am trying to separate RECORDS of a file based on the string, "//".
What I've tried is:
awk -v RS="//" '{ print "******************************************\n\n"$0 }' myFile.gb
Where the "******" etc, is just a trace to show me that the record is split.
However, the file also contains / (by themselves) and my trace, ****** is being printed there as well meaning that awk is interpreting those also as my record separator.
How can I get awk to only split records on //?
UPDATE: I am running on Unix (the one that comes with OS X)
I found a temporary solution, being:
sed s/"\/\/"/"*"/g | awk -v RS="*" ...
But there must be a better way, especially with massive files that I am working with.
On a Mac, awk version 20070501 does not support multi-character RS. Here's an illustration using such an awk, and a comparison (on the same machine) with gawk:
$ /usr/bin/awk --version
awk version 20070501
$ /usr/bin/awk -v RS="//" '{print NR ":" $0}' <<< x//y//z
1:x
2:
3:y
4:
5:z
$ gawk -v RS="//" '{print NR ":" $0}' <<< x//y//z
1:x
2:y
3:z
If you cannot find a suitable awk, then pick a better character than *. For example, if tabs are acceptable, and if your shell supports $'...', then you could use this incantation of sed:
sed $'s,//,\t,g'
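If neither gawk nor a safe placeholder character is available, the split can also be done inside any POSIX awk by scanning each line for the literal string // with index(). A minimal sketch that works line by line, so only the current record is held in memory; split_on_double_slash is a hypothetical name, and unlike gawk's RS="//" it does not keep the file's final trailing newline in the last record:

```shell
# Split input into records on the literal string "//" without a
# multi-character RS. Each record is printed as "N:record".
split_on_double_slash() {
  awk '
    function emit(r) { n++; print n ":" r }
    {
      # Put back the newline that separated this line from the previous one.
      line = (NR > 1 ? "\n" : "") $0
      while ((i = index(line, "//")) > 0) {
        emit(rec substr(line, 1, i - 1))   # text before "//" closes a record
        rec = ""
        line = substr(line, i + 2)         # continue after the "//"
      }
      rec = rec line                       # remainder starts the next record
    }
    END { if (NR) emit(rec) }'
}

printf 'x//y//z\n' | split_on_double_slash   # prints 1:x, 2:y, 3:z
```

A lone / is left alone because index() only matches the two-character string.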

How do I pass a stored value as the column number parameter to edit in awk?

I have a .dat file with a | separator, and I want to change the value of the column whose number is passed as an argument and stored in a variable. My code is:
awk -v var="$value" -F'|' '{ FS = OFS = "|" } $1=="$id" {$"\{$var}"=8}1' myfile.dat > tmp && mv tmp myfiletemp.dat
This changes the whole line to 8, so it obviously doesn't work. I was wondering what the right way to write this part is:
{$"\{$var}"=8}1
For example, if I want to change the fourth column to 8 and I have value=4, how do I get {$4=8}?
The other answer is mostly correct, but just wanted to add a couple of notes, in case it wasn't totally clear.
Referring to a variable with a $ in front of it turns it into a reference to the column. So i=3; print $i; print i will print the third column and then the number 3.
Passing all your variables in on the command line with -v avoids any problems with trying to include bash variables inside your single-quoted awk code, which won't work.
You can let awk do the output to the specific file instead of relying on bash to redirect output and move files.
The -F option on the command line specifies FS for you, so no need to redeclare it in your code.
Here's how I would do this:
#!/bin/bash
column=4
value=8
id=1
awk -v col="$column" -v val="$value" -v id="$id" -F"|" '
BEGIN {OFS="|"}
{$1==id && $col=val; print > "myfiletemp.dat"}
' myfile.dat
You can refer to the awk variable directly by its name. Here is a slight rewrite of your script with a correct reference to the column-number variable:
awk -F'|' -v var="$value" 'BEGIN{OFS=FS} $1=="$id"{$var=8}1'
This should work as long as $value is a number. If id is another bash variable, pass it in the same way, as an awk variable:
awk -F'|' -v var="$value" -v id="$id" 'BEGIN{OFS=FS} $1==id{$var=8}1'
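The same -v pattern fits in a small reusable helper. A sketch, assuming a |-separated file whose first column is the id; set_column is a hypothetical name, and it rejects non-numeric or zero column numbers because a column number of 0 would target $0, the whole record:

```shell
# Hypothetical helper: in a |-separated file, set column COL to VAL on
# every line whose first column equals ID; other lines pass through.
# Usage: set_column ID COL VAL [FILE]   (reads stdin if FILE is omitted)
set_column() {
  case $2 in
    ''|*[!0-9]*) echo "set_column: column must be a number" >&2; return 1 ;;
  esac
  if [ "$2" -lt 1 ]; then
    echo "set_column: column must be at least 1" >&2; return 1
  fi
  awk -F'|' -v id="$1" -v col="$2" -v val="$3" '
    BEGIN { OFS = FS }
    $1 == id { $col = val }   # $col: the field whose number is stored in col
    { print }' "${4:--}"
}

# prints 1|a|b|8|d and 2|a|b|c|d
printf '1|a|b|c|d\n2|a|b|c|d\n' | set_column 1 4 8
```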
Not only can you use a number in a variable by putting a $ in front of it, you can also put a $ in front of an expression!
$ date | tee /dev/stderr | awk '{print $(2+2)}'
Mon Aug 3 12:47:39 CDT 2020
12:47:39
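Because $ accepts any expression, the column number can also be computed at run time, for example by looking a column up by its header name. A sketch, assuming the first line is a |-separated header row; column_by_name and the header names in the usage line are made up for illustration:

```shell
# Hypothetical helper: print the column whose header (line 1) equals NAME.
# col is found once on the header line, then $col is used for every row.
# If no header matches, col stays unset and nothing is printed.
column_by_name() {
  awk -F'|' -v name="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    col     { print $col }' "${2:--}"
}

# prints 80, then 25
printf 'id|host|port\n1|www.host.com|80\n2|mail.host.com|25\n' | column_by_name port
```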

Using a multi-character field separator in awk on Solaris

I wish to use a string (Birch) as a field delimiter in awk to print the second field. I am trying the following command:
cat tmp.log|awk -FBirch '{ print $2}'
The following output is printed:
irch2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Desired output:
2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Contents of the tmp.log file:
-bash-3.2# cat tmp.log
Dec 05 13:49:23 [x.x.x.x.180.100] business-log-dev/int [TEST][0x80000001][business-log][info] mpgw(Test): trans(8497187)[request][10.x.x.x]:
Birch2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Am I doing something wrong?
OS: Solaris10
Shell: Bash
I tried the command below, suggested in one of the answers. I get the desired output, but with an extra empty line at the top. How can this be eliminated from the output?
-bash-3.2# /usr/xpg4/bin/awk -FBirch '{print $2}' tmp.log
2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Originally, I suggested putting quotes around "Birch" (-F'Birch') but actually, I don't think that should make any difference.
I'm not at all experienced working with Solaris but you may want to also try using nawk ("new awk") instead of awk.
nawk -FBirch '{print $2}' file
If this works, you may want to consider creating an alias so that you always use the newer version of awk with more features.
You may also want to try the version of awk in the /usr/xpg4/bin directory, which is a POSIX-compliant implementation and so should support a multi-character FS:
/usr/xpg4/bin/awk -FBirch '{print $2}' file
If you only want to print lines which have more than one field, you can add a condition:
/usr/xpg4/bin/awk -FBirch 'NF>1{print $2}' file
This only prints the second field when there is more than one field.
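Where no awk with multi-character FS is at hand, the same result can be had with only index() and substr(), which should be available even in older awk implementations. A sketch; after_birch is a hypothetical name, it prints everything after the first occurrence of Birch (equal to $2 only when Birch appears at most once per line), and it skips lines without Birch, so no empty line is printed:

```shell
# Print the text following the first "Birch" on each line that contains it.
# 5 is length("Birch"); lines without the marker are skipped.
after_birch() {
  awk '{ i = index($0, "Birch"); if (i > 0) print substr($0, i + 5) }' "${1:--}"
}

# prints 2014/06/23,04:36:45,3
printf 'Dec 05 13:49:23 header:\nBirch2014/06/23,04:36:45,3\n' | after_birch
```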
From the man page of the default awk on Solaris, /usr/bin/awk:
-Fc Uses the character c as the field separator
(FS) character. See the discussion of FS
below.
As you can see, Solaris awk only takes a single character as a field separator.
Also in the man page is split:
split(s, a, fs)
Split the string s into array elements a[1], a[2], ...
a[n], and returns n. The separation is done with the
regular expression fs or with the field separator FS if
fs is not given.
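As you can see, it takes a regular expression as a separator, so we can use:
awk 'split($0,a,"Birch") > 1 {print a[2]}' file
to print the second field split by Birch. (The > 1 test skips lines without Birch, for which split returns at most 1 and a[2] would print as an empty line.)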

How to Extract text between a string and a character?

I have these lines in my text file:
msg_wdraw[] = "whatever a sentence here,"
"This is the second part of this text1 ."
msg_sp2million[] = "whatever a sentence here,"
"This is the second part of this text2."
I need to extract the sentence between msg_sp2million and the period "." and print it,
i.e. ("whatever a sentence here,"
"This is the second part of this text2.")
I tried this: sed -n "/msg_sp2million/,/./p" filename.txt
However, this sed command also returns the value of msg_wdraw (the first variable).
I also tried awk, grep and other sed commands, but they all failed.
How can I fix this problem? And why does this return not only the value of msg_sp2million but also the value of msg_wdraw?
Please help # ~ #
Maybe something like this:
awk '/msg_sp2million/{ split($0,a,"="); print a[length(a)]; getline; print}' file.txt
Match the regexp, print what comes after the last =, get the next line, and print that too.
Returns:
"whatever a sentence here,"
"This is the second part of this text2."
Using a simple awk command:
awk -F '= *' -v RS='.' -v ORS='."\n' '$1 ~ /msg_sp2million/ {sub(/" *\n */, "\" ", $2);
print $2}' file
"whatever a sentence here," "This is the second part of this text2."
I'm unable to add my solution (a POSIX-compliant derivative of qwwqwwq's solution, referred to as qww below) as a comment. qww's solution works, but ONLY in GNU awk from a certain version onward (apparently 3.1.5; see also http://awk.freeshell.org/AwkFeatureComparison).
Tip: Try
awk -W posix '/msg_sp2million/{ split($0,a,"="); print a[length(a)]; getline; print}' file.txt
in a non-GNU environment and you will almost certainly get an error message, e.g. about using an array in a scalar context.
The following solution should also work on an HP-UX workstation
(the -W posix may of course be omitted, but it is invaluable during the testing stage):
awk -W posix '/msg_sp2million/{ amount=split($0,a,"="); print a[amount]; getline; print}' file.txt
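Along the same POSIX lines, the return value of getline can be checked as well: if the match is on the last line, a plain getline fails and the following print repeats the matched line. A sketch; extract_msg is a hypothetical name, the key is treated as a regular expression, and stripping the space after = is an added assumption about the desired output:

```shell
# POSIX awk: print what follows the last "=" on the matching line, then
# the next line. getline's return value is checked so nothing stale is
# printed when the match is on the last line of the input.
extract_msg() {
  awk -v key="$1" '
    $0 ~ key {
      n = split($0, a, "=")
      sub(/^ +/, "", a[n])                 # drop the space after "="
      print a[n]
      if ((getline nxt) > 0) print nxt
    }' "${2:--}"
}

# Usage: extract_msg msg_sp2million file.txt
```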
