Use Awk to extract substring - bash

Given a hostname in the format aaa0.bbb.ccc, I want to extract the first substring before the ., that is, aaa0 in this case. I use the following awk script to do so:
echo aaa0.bbb.ccc | awk '{if (match($0, /\./)) {print substr($0, 0, RSTART - 1)}}'
The script produces aaa0 on machine A, but on machine B it produces only aaa, without the 0 at the end. Both machines run Ubuntu/Linaro, but A runs a newer awk (gawk version 3.1.8) while B runs an older awk (mawk version 1.2).
I am asking in general, how to write a compatible awk script that performs the same functionality ...

You just want to set the field separator as . using the -F option and print the first field:
$ echo aaa0.bbb.ccc | awk -F'.' '{print $1}'
aaa0
Same thing but using cut:
$ echo aaa0.bbb.ccc | cut -d'.' -f1
aaa0
Or with sed:
$ echo aaa0.bbb.ccc | sed 's/[.].*//'
aaa0
Even grep:
$ echo aaa0.bbb.ccc | grep -o '^[^.]*'
aaa0

Or just use cut:
echo aaa0.bbb.ccc | cut -d'.' -f1

I am asking in general, how to write a compatible awk script that
performs the same functionality ...
Solving the problem in your question is easy (check the other answers).
Writing an awk script that is portable to every awk implementation and version (gawk/nawk/mawk...) is really hard, even with --posix (gawk).
For example:
some awks work on strings in terms of characters, some in terms of bytes
some support the \x escape, some do not
FS is interpreted differently
restrictions on abbreviating keywords/reserved words
restrictions on some operators, e.g. **
even within the same awk implementation (gawk, for example), versions 4.0 and 3.x differ
the implementations of certain functions also differ (your problem is one example, see below)
All the points above are general remarks. Back to your problem: it only involves fundamental features of awk. A line like awk '{print $x}' will work in all awks.
There are two reasons why your awk line behaves differently on gawk and mawk:
You used the substr() function wrongly. This is the main cause. You have substr($0, 0, RSTART - 1); the 0 should be 1, no matter which awk you use. awk arrays, string indexes, etc. are 1-based.
gawk and mawk implement substr() differently.
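A minimal sketch of the corrected call, assuming only POSIX awk features. With substr() starting at position 1, the result should no longer depend on how an implementation treats a start index of 0. The helper name first_label and the fallback for names without a dot are assumptions made for this example.

```shell
# first_label is a hypothetical helper name, not part of awk or the question.
first_label() {
    # match() sets RSTART to the 1-based position of the first dot;
    # substr() starts at 1 and returns the RSTART - 1 characters before it.
    printf '%s\n' "$1" | awk '{
        if (match($0, /\./)) print substr($0, 1, RSTART - 1)
        else print $0    # no dot: print the whole name (assumed behavior)
    }'
}

first_label aaa0.bbb.ccc    # prints aaa0
```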

You don't need awk for this...
echo aaa0.bbb.ccc | cut -d. -f1
cut -d. -f1 <<< aaa0.bbb.ccc
echo aaa0.bbb.ccc | { IFS=. read a _ ; echo $a ; }
{ IFS=. read a _ ; echo $a ; } <<< aaa0.bbb.ccc
x=aaa0.bbb.ccc; echo ${x/.*/}
Heavier options:
sed:
echo aaa0.bbb.ccc | sed 's/\..*//'
sed 's/\..*//' <<< aaa0.bbb.ccc
awk:
echo aaa0.bbb.ccc | awk -F. '{print $1}'
awk -F. '{print $1}' <<< aaa0.bbb.ccc

You do not need any external command at all, just use Parameter Expansion in bash:
hostname=aaa0.bbb.ccc
echo ${hostname%%.*}
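For comparison, here is a short sketch of the four related suffix and prefix removal forms, using the same hostname value. It only illustrates standard shell parameter expansion and should behave the same in bash and other POSIX shells.

```shell
hostname=aaa0.bbb.ccc

# %% removes the longest suffix matching ".*", leaving the first label.
echo "${hostname%%.*}"   # aaa0

# % removes the shortest matching suffix, so only the last label is dropped.
echo "${hostname%.*}"    # aaa0.bbb

# ## and # do the same from the front of the string.
echo "${hostname##*.}"   # ccc
echo "${hostname#*.}"    # bbb.ccc
```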

If you don't want to change the input field separator, you can use the split function:
echo "some aaa0.bbb.ccc text" | awk '{split($2, a, "."); print a[1]}'
documentation (from gawk; the fourth argument, seps, is a gawk extension):
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep
and store the pieces in array and the separator
strings in the seps array.
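split() also returns the number of pieces, which makes it easy to reach the last one. The sketch below uses only the three-argument form, so it should not depend on gawk. The helper name label_info is made up for illustration.

```shell
# label_info is a hypothetical name used only for this illustration.
label_info() {
    awk '{
        n = split($2, parts, ".")      # n is the number of pieces
        print n, parts[1], parts[n]    # count, first piece, last piece
    }'
}

echo "some aaa0.bbb.ccc text" | label_info    # prints: 3 aaa0 ccc
```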

awk is still the cleanest approach :
mawk NF=1 FS='[.]' <<< aaa0.bbb.ccc
aaa0
If there's stuff before or after :
mawk ++NF FS='[.].+$|^[^ ]* ' OFS= <<< 'some aaa0.bbb.ccc text'
mawk '$!NF=$2' FS='[ .]' <<< 'some aaa0.bbb.ccc text'
aaa0

Related

Trim line to the first comma (bash)

I have a line from which I need to cut the branch name to the first comma:
commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)
I need to cut out MGB-322.
The number of characters in a line is always different.
awk -F "origin/" '{print $2}' - this is how I cut out
MGB-322, refs/pipelines/36877)
But how to tell it to trim to the first comma?
I tried doing it via substr,
awk -F "origin/" '{print substr ($2,1, index $2 ,)}'
But it is not clear how to correctly specify the comma in index
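For reference, a sketch of how the substr/index attempt above could be written: index() takes the string and the text to search for as two comma-separated arguments, with the comma itself passed as the string ",". The variable names line and branch are assumptions for this example.

```shell
line='commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)'

# index($2, ",") is the 1-based position of the first comma in $2,
# so a substring of length index - 1 stops just before that comma.
branch=$(printf '%s\n' "$line" |
    awk -F 'origin/' '{print substr($2, 1, index($2, ",") - 1)}')

echo "$branch"    # prints MGB-322
```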
With any awk, use / and , as field separators:
awk '{print $3}' FS='[/,]' file
Output:
MGB-322
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
A fix based on the OP's code, assuming origin occurs only once per line; if it occurs more than once, change $NF to $2 in the following code. Written and tested at https://ideone.com/xjv2we
awk -F"origin/" '{print $NF}' Input_file
sed could also help here. This generic solution is based on the first occurrence of a comma and a /, as per the OP's thread title. I wrote this on mobile, so I have not been able to test it yet; it should work, and I will test it later.
sed 's/\([^,]*\),\([^/]*\)\/\(.*\)/\3/' Input_file
"I need to cut out MGB-322."
You can use cut in two steps:
echo "${line}" | cut -d"/" -f2 | cut -d"," -f1
I would prefer one step with awk (already answered by others) or sed:
echo "${line}" | sed -r 's/.*origin.(.*), refs.*/\1/'
Why spawn procs? bash's built-in parameter parsing will handle this.
If
$: line="commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)"
then
$: [[ "$line" =~ .*origin.(.*), ]] && echo "${BASH_REMATCH[1]}"
MGB-322
or maybe
$: tmp=${line#*, origin/}; echo ${tmp%,*}
MGB-322
or even
$: IFS=",/" read _ _ x _ <<< "$line" && echo $x
MGB-322
c.f. https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html

How to: In bash print a value from a key/value pair

I need to print only the 900 in this line: auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900
However, this line will not always be in this order.
I need to find out how to print the value after the =.
I will need to do this for unlock_time and fail_interval
I have been searching all night for something that will work exactly for me and cannot find it. I have been toying around with sed and awk and have not nailed this down yet.
Let's define your string:
s='auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900'
Using awk:
$ printf %s "$s" | awk -F= '$1=="fail_interval"{print $2}' RS=' '
900
Or:
$ printf %s "$s" | awk -F= '$1=="unlock_time"{print $2}' RS=' '
604800
How it works
Awk divides its input into records. We tell it to use a space as the record separator. Each record is divided into fields. We tell awk to use = as the field separator. In more detail:
printf %s "$s"
This prints the string. printf is safer than echo in cases where the string might begin with -.
-F=
This tells awk to use = as the field separator.
$1=="fail_interval" {print $2}
If the first field is fail_interval, then we tell awk to print the second field.
RS=' '
This tells awk to use a space as the record separator.
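The same idea can be wrapped in a small function so that the key is passed as a parameter instead of being hard-coded. This is only a sketch: get_value is a hypothetical helper name, and the key is handed to awk with -v rather than spliced into the program text.

```shell
# get_value is a hypothetical helper: get_value KEY "LINE"
get_value() {
    # Records are separated by spaces, fields by "=";
    # print the value of the record whose first field equals the key.
    printf '%s' "$2" | awk -F= -v RS=' ' -v key="$1" '$1 == key { print $2 }'
}

s='auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900'
get_value fail_interval "$s"    # prints 900
get_value unlock_time "$s"      # prints 604800
```

Because each key=value pair is its own record, the order of the options on the line does not matter.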
You may use sed for this
Command
echo "...stuff.... unlock_time=604800 fail_interval=900" | sed -E '
s/^.*unlock_time=([[:digit:]]*).*fail_interval=([[:digit:]]*).*$/\1 \2/'
Output
604800 900
Notes
The (...) in sed marks capture groups.
[[:digit:]]* (or [0-9]*) matches any number of digits.
\1 and \2 refer back to the captured groups, in order.
This pattern assumes unlock_time appears before fail_interval on the line.
Given an input variable:
input="auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900"
With GNU grep:
$ grep -oP 'fail_interval=\K([0-9]*)' <<< "$input"
900
$ grep -oP 'unlock_time=\K([0-9]*)' <<< "$input"
604800
Try using this.
unlock_time=$(echo "auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900" | awk -F'unlock_time=' '{print $2}' | awk '{print $1}')
echo "$unlock_time"
fail_interval=$(echo "auth required pam_faillock.so preauth silent deny=3 unlock_time=604800 fail_interval=900" | awk -F'fail_interval=' '{print $2}' | awk '{print $1}')
echo "$fail_interval"

Split String in Unix Shell Script

I have a String like this
//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf
and want to get last part of
00000000957481f9-08d035805a5c94bf
Let's say you have
text="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
If you know the position, i.e. in this case the 9th, you can go with
echo "$text" | cut -d'/' -f9
However, if this is dynamic and you want to split at "/", it's safer to go with:
echo "${text##*/}"
This removes everything from the beginning to the last occurrence of "/" and should be the shortest form to do it.
For more information on this see: Bash Reference manual
For more information on cut see: cut man page
The tool basename does exactly that:
$ basename //ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf
00000000957481f9-08d035805a5c94bf
I would use bash string function:
$ string="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
$ echo "${string##*/}"
00000000957481f9-08d035805a5c94bf
But following are some other options:
$ awk -F'/' '$0=$NF' <<< "$string"
00000000957481f9-08d035805a5c94bf
$ sed 's#.*/##g' <<< "$string"
00000000957481f9-08d035805a5c94bf
Note: <<< is here-string notation. Here-strings do not create a subshell; however, they are NOT portable to POSIX sh (as implemented by shells such as ash or dash).
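Where POSIX sh compatibility matters, a pipe from printf can stand in for the here-string. A minimal sketch, assuming the same string value as above (the variable name last is made up for this example):

```shell
string="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"

# A pipe from printf works in any POSIX shell, unlike <<<.
last=$(printf '%s\n' "$string" | awk -F/ '{print $NF}')
echo "$last"

# Parameter expansion is also POSIX and needs no external process.
echo "${string##*/}"
```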
In case you want more than just the last part of the path,
you could do something like this:
echo $PWD | rev | cut -d'/' -f1-2 | rev
You can use this BASH regex:
s='//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf'
[[ "$s" =~ [^/]+$ ]] && echo "${BASH_REMATCH[0]}"
00000000957481f9-08d035805a5c94bf
This can be done easily in awk:
string="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
echo "${string}" | awk -v FS="/" '{ print $NF }'
Use "/" as field separator and print the last field.
You can try this...
echo //ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf |awk -F "/" '{print $NF}'

Awk: Drop last record separator in one-liner

I have a simple command (part of a bash script) that I'm piping through awk but can't seem to suppress the final record separator without then piping to sed. (Yes, I have many choices and mine is sed.) Is there a simpler way without needing the last pipe?
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd \
| uniq | awk '{IRS="\n"; ORS=","; print}'| sed s/,$//);
Without the sed, this produces output like echo,sierra,victor, and I'm just trying to drop the last comma.
You don't need awk, try:
egrep -o ....uniq|paste -d, -s
Here is another example:
kent$ echo "a
b
c"|paste -d, -s
a,b,c
Also, I think your chained command could be simplified. awk could do everything in a one-liner.
Instead of egrep, uniq, awk, sed etc, all this can be done in one single awk command:
awk -F":" '!($1 in a){l=l $1 ","; a[$1]} END{sub(/,$/, "", l); print l}' /etc/passwd
Here is a small and quite straightforward one-liner in awk that suppresses the final record separator:
echo -e "alpha\necho\nnovember" | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=","
Gives:
alpha,echo,november
So, your example becomes:
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd | uniq | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=",");
The benefit of using awk over paste or tr is that this also works with a multi-character ORS.
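Another common way to get the same effect, sketched here under the assumption that a trailing newline is wanted, is to print the separator before every record except the first instead of after each one. join_lines is a hypothetical helper name; the separator may be more than one character.

```shell
# join_lines is a hypothetical helper: join_lines SEPARATOR < input
join_lines() {
    awk -v sep="$1" '
        NR > 1 { printf "%s", sep }    # separator before every record except the first
        { printf "%s", $0 }
        END { if (NR) print "" }       # end the output with a single newline
    '
}

printf 'alpha\necho\nnovember\n' | join_lines ', '    # prints: alpha, echo, november
```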
Since you tagged it bash here is one way of doing it:
#!/bin/bash
# Read the /etc/passwd file in to an array called names
while IFS=':' read -r name _; do
names+=("$name");
done < /etc/passwd
# Assign the content of the array to a variable
dolls=$( IFS=, ; echo "${names[*]}")
# Display the value of the variable
echo "$dolls"
echo "a
b
c" |
mawk 'NF-= _==$NF' FS='\n' OFS=, RS=
a,b,c

How to replace the nth column/field in a comma-separated string using sed/awk?

Assume I have a string
"1,2,3,4"
Now I want to replace, e.g. the 3rd field of the string by some different value.
"1,2,NEW,4"
I managed to do this with the following command:
echo "1,2,3,4" | awk -F, -v OFS=, '{$3="NEW"; print }'
Now the index for the column to be replaced should be passed as a variable. So in this case
index=3
How can I pass this to awk? Because this won't work:
echo "1,2,3,4" | awk -F, -v OFS=, '{$index="NEW"; print }'
echo "1,2,3,4" | awk -F, -v OFS=, '{$($index)="NEW"; print }'
echo "1,2,3,4" | awk -F, -v OFS=, '{\$$index="NEW"; print }'
Thanks for your help!
This might work for you:
index=3
echo "1,2,3,4" | awk -F, -v OFS=, -v INDEX=$index '{$INDEX="NEW"; print }'
or:
index=3
echo "1,2,3,4" | sed 's/[^,]*/NEW/'$index
Have the shell interpolate the index in the awk program:
echo "1,2,3,4" | awk -F, -v OFS=, '{$'$index'="NEW"; print }'
Note how the originally single quoted awk program is split in three parts, a single quoted beginning '{$', the interpolated index value, followed by the single quoted remainder of the program.
Here's a seductive way to break the awkwardness:
$ echo "1,2,3,4" | sed 's/,/\n/g' | sed -e $index's/.*/NEW/'
This is easily extendable to multiple indexes just by adding another -e $newindex's/.*/NEWNEW/'
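# This should be faster than awk or sed.
str="1,2,3,4"
IFS=',' read -r -a f <<< "$str"     # split on commas into array f, IFS changed for read only
f[2]='NEW'                          # bash arrays are 0-based, so index 2 is the 3rd field
(IFS=','; printf '%s\n' "${f[*]}")  # join with commas in a subshell, leaving the global IFS alone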
With plain awk (i.e. not gawk etc.) I believe you'll have to use split( string, array, [fieldsep] ); change the array entry of choice and then join them back together with sprintf or similar in a loop.
gawk allows you to have a variable as a field name, $index in your example. See here.
gawk is the default awk on many Linux distributions, so change your invocation to gawk "script" and see if it works.
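For what it is worth, the $ operator accepts an expression in POSIX awk, so a variable field reference should not be limited to gawk. The sketch below passes the field number with -v and names the awk variable n, because index is the name of a built-in awk function and cannot be used as a variable. replace_field is a hypothetical helper name.

```shell
# replace_field is a hypothetical helper: replace_field N NEW_VALUE < csv_line
replace_field() {
    # $n refers to the field whose number is stored in the awk variable n.
    awk -F, -v OFS=, -v n="$1" -v new="$2" '{ $n = new; print }'
}

index=3
echo "1,2,3,4" | replace_field "$index" NEW    # prints 1,2,NEW,4
```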

Resources