Merging awk and cut into one command - bash

My line is:
var1="_source.statistics.test1=AAAAA;;;_source.statistics.test2=BBBB;;;_source.statistics.test3=CCCCC"
awk -F ";;;" '{print $1}' <<<$var1 | cut -d= -f2
AAAAA
awk -F ";;;" '{print $2}' <<<$var1 | cut -d= -f2
BBBB
How can I get to the same result using only AWK?

Awk lets you split a field on another delimiter.
awk -F ";;;" '{split($1, a, /=/); print a[2] }' <<<"$var1"
However, perhaps a more fruitful approach would be to transform this horribly hostile input format to something a little bit more normal, and take it from there with standard tools.
sed 's/;;;/\
/g' inputfile | ...
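To make the "take it from there" step concrete, here is a minimal sketch that finishes the pipeline with awk. The get_value helper is a hypothetical name, not part of the answer above, and it assumes the names themselves never contain an = sign.

```shell
var1="_source.statistics.test1=AAAAA;;;_source.statistics.test2=BBBB;;;_source.statistics.test3=CCCCC"

# get_value is a hypothetical helper, not part of the original answer.
# It puts each name=value pair on its own line, then selects by exact name.
get_value() {
  printf '%s\n' "$var1" |
    sed 's/;;;/\
/g' |
    awk -F= -v key="$1" '$1 == key { print $2 }'
}

get_value _source.statistics.test1   # prints AAAAA
get_value _source.statistics.test2   # prints BBBB
```

Once every pair sits on its own line, ordinary line-oriented tools (grep, cut, sort) can work on the data as well.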

Try the following: a single awk can do this if you set the field separator (-F) to either = or ; for each line passed to awk.
echo "$var1" | awk -F'=|;' '{print $2}'
AAAAA
echo "$var1" | awk -F'=|;' '{print $6}'
BBBB
OR
echo "$var1" | awk -F"=|;;;" '{print $2}'
AAAAA
echo "$var1" | awk -F"=|;;;" '{print $4}'
BBBB
If you need these values in variables, you could store them in an array with sed or awk and use them from there. IMHO this is what arrays are for: they save you from creating N separate variables.
Creating an array with sed:
array=( $(echo "$var1" | sed 's/\([^=]*\)=\([^;]*\)\([^=]*\)=\([^;]*\)\(.*\)/\2 \4/' ) )
Creating an array with awk:
array=( $(echo "$var1" | awk -F"=|;;;" '{print $2,$4}') )
The above creates an array holding the values AAAAA and BBBB. To fetch them you could use:
for i in {0..1}; do echo "$i : ${array[$i]}"; done
0 : AAAAA
1 : BBBB
I used a for loop to illustrate; you could also use ${array[0]} for AAAAA or ${array[1]} for BBBB directly.
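If only a couple of values are needed, a plain read into named variables is an alternative to the array. This is a sketch, not part of the answer above; the names first and second are made up for illustration, and it assumes the values contain no whitespace.

```shell
var1="_source.statistics.test1=AAAAA;;;_source.statistics.test2=BBBB;;;_source.statistics.test3=CCCCC"

# Read the two fields printed by awk straight into named variables.
read -r first second <<EOF
$(printf '%s\n' "$var1" | awk -F'=|;;;' '{ print $2, $4 }')
EOF

echo "$first"    # prints AAAAA
echo "$second"   # prints BBBB
```

The here-document keeps read in the current shell, so the variables are still set afterwards; piping into read would typically run it in a subshell and lose them.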

Whenever you have name/tag=val input data, it's useful to create an array of tag-value pairs so you can print or do whatever else you like with the data by its tags, e.g.:
$ awk -F';;;|=' '{for (i=1; i<NF; i+=2) f[$i]=$(i+1); print f["_source.statistics.test1"]}' <<<"$var1"
AAAAA
$ awk -F';;;|=' '{for (i=1; i<NF; i+=2) f[$i]=$(i+1); print f["_source.statistics.test3"], f["_source.statistics.test2"]}' <<<"$var1"
CCCCC BBBB
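The same tag-value array can be wrapped so the tag is passed in from the shell instead of being hard-coded in the awk program. A minimal sketch, assuming the tag contains no backslashes (awk -v would interpret them); lookup is a hypothetical name.

```shell
var1="_source.statistics.test1=AAAAA;;;_source.statistics.test2=BBBB;;;_source.statistics.test3=CCCCC"

# lookup is a hypothetical wrapper: $1 is the tag, the record arrives on stdin.
lookup() {
  awk -F';;;|=' -v tag="$1" '{
    for (i = 1; i < NF; i += 2) f[$i] = $(i + 1)   # map each tag to its value
    print f[tag]
  }'
}

printf '%s\n' "$var1" | lookup _source.statistics.test3   # prints CCCCC
```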

Related

How to grab fields in inverted commas

I have a text file which contains the following lines:
"user","password_last_changed","expires_in"
"jeffrey","2021-09-21 12:54:26","90 days"
"root","2021-09-21 11:06:57","0 days"
How can I grab the two fields jeffrey and 90 days from inside the inverted commas and save them in variables?
If awk is an option, you could save an array and then save the elements as individual variables.
$ IFS="\"" read -ra var <<< $(awk -F, '/jeffrey/{ print $1, $NF }' input_file)
$ var2="${var[3]}"
$ echo "$var2"
90 days
$ var1="${var[1]}"
$ echo "$var1"
jeffrey
while read -r line; do # read in line by line
name=$(echo "$line" | awk -F, '{ print $1 }' | sed 's/"//g') # grab first column and strip the quotes
expire=$(echo "$line" | awk -F, '{ print $3 }' | sed 's/"//g') # grab third column and strip the quotes
echo "$name" "$expire" # do your business
done < yourfile.txt
IFS=","
arr=( $(cat txt | head -2 | tail -1 | cut -d, -f 1,3 | tr -d '"') )
echo "${arr[0]}"
echo "${arr[1]}"
The result is stored in an array; you can access its elements by index.
Maybe the method below, using sed and awk, will help:
#!/bin/sh
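username=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{gsub(/"/, ""); print $1}') # strip the quotes, print first column
echo "$username"
expires_in=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{gsub(/"/, ""); print $3}') # strip the quotes, print third column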
echo "$expires_in"
Output :
jeffrey
90 days
Note:
This method only works if the username is distinct.
As far as I know, usernames are not duplicated.
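Another option is to split on the double quote itself, so the values arrive already unquoted and the user column can be compared exactly. This is only a sketch: it assumes no field contains an embedded double quote, and the data variable stands in for the input file.

```shell
data='"user","password_last_changed","expires_in"
"jeffrey","2021-09-21 12:54:26","90 days"
"root","2021-09-21 11:06:57","0 days"'

# Splitting on the double quote puts user in $2 and expires_in in $6.
# The exact comparison on $2 avoids matching jeffrey in other columns.
{ read -r name; read -r expires; } <<EOF
$(printf '%s\n' "$data" | awk -F'"' '$2 == "jeffrey" { print $2; print $6; exit }')
EOF

echo "$name"      # prints jeffrey
echo "$expires"   # prints 90 days
```

Printing one value per line lets read keep "90 days" intact even though it contains a space.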

How to get the line number of a string in another string in Shell

Given
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
I'd like to get the line number of the first occurrence of $str in $sourceStr, which should be 3.
I don't know how to do it.
I have tried:
awk 'match($0, v) { print NR; exit }' v=$str <<<$sourceStr
grep -n $str <<< $sourceStr | grep -Eo '^[^:]+';
grep -n $str <<< $sourceStr | cut -f1 -d: | sort -ug
grep -n $str <<< $sourceStr | awk -F: '{ print $1 }' | sort -u
All output 1, not 3.
How can I get the line number of $str in $sourceStr?
Thanks!
You may use this awk + printf in bash:
awk -v s="$str" '$0 == s {print NR; exit}' <(printf "%b\n" "$sourceStr")
3
Or even this awk without any bash support:
awk -v s="$str" -v source="$sourceStr" 'BEGIN {
split(source, a); for (i=1; i in a; ++i) if (a[i] == s) {print i; exit}}'
3
You may use this sed as well:
sed -n "/^$str$/{=;q;}" <(printf "%b\n" "$sourceStr")
3
Or this grep + cut:
printf "%b\n" "$sourceStr" | grep -nxF -m 1 "$str" | cut -d: -f1
3
It's not clear if you've just made a cut-n-paste error, but your sourceStr is not a multiline string (as demonstrated below). Also, you really need to quote your herestring (also demonstrated below). Perhaps you just want:
$ sourceStr="abc\nefg\nhij\nlmn\nhij"
$ echo "$sourceStr"
abc\nefg\nhij\nlmn\nhij
$ sourceStr=$'abc\nefg\nhij\nlmn\nhij'
$ echo "$sourceStr"
abc
efg
hij
lmn
hij
$ cat <<< $sourceStr
abc efg hij lmn hij
$ cat <<< "$sourceStr"
abc
efg
hij
lmn
hij
$ str=hij
$ awk "/${str}/ {print NR; exit}" <<< "$sourceStr"
3
Just use sed!
printf 'abc\nefg\nhij\nlmn\nhij\n' \
| sed -n '/hij/ { =; q; }'
Explanation: if sed meets a line that contains "hij" (regex /hij/), it prints the line number (the = command) and exits (the q command). Else it doesn't print anything (the -n switch) and goes on with the next line.
[update] Hmmm, sorry, I just noticed your "All output 1, not 3".
The primary reason why your commands don't output 3 is that sourceStr="abc\nefg\nhij\nlmn\nhij" doesn't automagically change your \n into new lines, so it ends up being one single line and that's why your commands always display 1.
If you want a multiline string, here are two solutions with bash:
printf -v sourceStr "abc\nefg\nhij\nlmn\nhij"
sourceStr=$'abc\nefg\nhij\nlmn\nhij'
And now that your variable contains whitespace characters (newlines), as stated by William Pursell, you must enclose $sourceStr in double quotes in order to preserve them:
grep -n "$str" <<< "$sourceStr" | ...
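The difference between the literal and the expanded string is easy to check by counting lines. A small sketch; count_lines, literal and expanded are names made up for this illustration.

```shell
literal='abc\nefg\nhij\nlmn\nhij'        # backslash-n pairs, still a single line
expanded=$(printf '%b' "$literal")       # %b turns each \n into a real newline

# count_lines is a hypothetical helper that reports how many lines awk sees.
count_lines() { printf '%s\n' "$1" | awk 'END { print NR }'; }

count_lines "$literal"     # prints 1
count_lines "$expanded"    # prints 5
```

With one input line, any tool that reports a line number can only ever answer 1, which matches what the question observed.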
There's always a hard way to do it:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | nl | grep $str | head -1 | gawk '{ print $1 }'
or, a bit more efficient:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | gawk '/'$str/'{ print NR; exit }'

awk script read line matching a pattern and result output with comma separated

I need help with a shell script that reads a comma-separated line, picks out the entries matching a pattern, and prints the result comma-separated again. In the example below, the line is split on commas and only the puppet strings should be output, again separated by commas.
echo "docker-one,puppet-one,puppet-two,docker-three,puppet-four" | script
output required:
docker-one,docker-three
awk to the rescue!
echo "docker-one,puppet-one,puppet-two,docker-three,puppet-four" |
awk 'BEGIN{RS=ORS=","} /puppet/'
puppet-one,puppet-two,puppet-four
for docker, and replacing the last comma
echo "docker-one,puppet-one,puppet-two,docker-three,puppet-four" |
awk 'BEGIN{RS=ORS=","} /docker/' |
sed 's/,$/\n/'
docker-one,docker-three
or, if you meant non puppet
echo "docker-one,puppet-one,puppet-two,docker-three,puppet-four" |
awk 'BEGIN{RS=ORS=","} !/puppet/' |
sed 's/,$/\n/'
docker-one,docker-three
It sounds like one of these might be what you're looking for:
$ echo "docker-one,puppet-one,puppet-two,docker-three,puppet-four" |
awk -F, '{for (i=1;i<=NF;i++) if ($i ~ /puppet/) printf "%s%s", (c++?FS:""), $i; print ""}'
puppet-one,puppet-two,puppet-four
$ echo "docker-one,puppet-one,puppet-two,docker-three,puppet-four" |
awk -F, '{for (i=1;i<=NF;i++) if ($i !~ /puppet/) printf "%s%s", (c++?FS:""), $i; print ""}'
docker-one,docker-three
$ echo "docker-one,puppet-one,puppet-two,docker-three,puppet-four" |
awk -F, '{for (i=1;i<=NF;i++) if ($i ~ /docker/) printf "%s%s", (c++?FS:""), $i; print ""}'
docker-one,docker-three
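If the pattern needs to vary, the loop can be wrapped in a function that takes the regular expression as an argument. This is a sketch rather than part of the answer above: keep_matching is a hypothetical name, and it builds the output in a string that is reset per line instead of using the c++ counter.

```shell
# keep_matching is a hypothetical helper; $1 is an awk regular expression.
keep_matching() {
  awk -F, -v re="$1" '{
    out = ""                                   # reset for every input line
    for (i = 1; i <= NF; i++)
      if ($i ~ re) out = out (out == "" ? "" : FS) $i
    print out
  }'
}

list="docker-one,puppet-one,puppet-two,docker-three,puppet-four"
printf '%s\n' "$list" | keep_matching '^docker'   # prints docker-one,docker-three
printf '%s\n' "$list" | keep_matching '^puppet'   # prints puppet-one,puppet-two,puppet-four
```

Resetting out on each line matters once the input has more than one line; a counter that is never reset would put a leading comma on every line after the first.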
Using the native bash regex operator =~, with GNU paste only for the CSV formatting:
IFS="," read -ra myArray <<<"docker-one,puppet-one,puppet-two,docker-three,puppet-four"
for i in "${myArray[@]}";do [[ $i =~ ^puppet ]] && echo "$i" ; done | paste -sd ','
produces an output as
puppet-one,puppet-two,puppet-four
and for the strings that do not start with puppet, do a negative regex match,
for i in "${myArray[@]}";do [[ ! $i =~ ^puppet ]] && echo "$i" ; done | paste -sd ','
docker-one,docker-three
Using tr, grep, and paste:
$ echo "docker-one,puppet-one,puppet-two,docker-three,puppet-four" \
| tr , '\n' | grep -v puppet | paste -s -d , -
docker-one,docker-three

getting command output assigned to variable (BASH)

I'm running a command which basically parses some JSON and then extracts an ID using awk and sed.
When I run the command on its own it gives the correct output, e.g.:
cat CustomThemeProfile.json | sed -e 's/[{}]/''/g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}' | awk -F ":" '{print $0}' | grep id | awk -F ":" '{print $2}' | sed 's/\"//g'
2F13F732-4BCB-49DC-A0FB-C91B5DE58472
But when I try to assign the output to a variable, nothing is returned, e.g.:
cat CustomThemeProfile.json | id=$(sed -e 's/[{}]/''/g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}' | awk -F ":" '{print $0}' | grep id | awk -F ":" '{print $2}' | sed 's/\"//g'); echo $id
Any ideas? I really want to run this from a script, but for the moment the script just does nothing and sits waiting for something.
This is the script I'm calling it from.
A first script finds all the JSON files and then calls this one, so the file is passed as an argument.
#!/bin/bash
echo "running search and replace script ..."
id="$(sed -e 's/[{}]/''/g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}' | awk -F ":" '{print $0}' | grep id | awk -F ":" '{print $2}' | sed 's/\"//g')"
filler="0-0000-0000-0000-000000000000"
echo $id
if [ "${#id}" -ge 8 ]; then echo "New Profile ID in use"; exit
else idnew=$id$filler
fi
sed -i "s/$id/$idnew/g" "$1"
sed -i 's/ps_hpa/ps_hpa/g' $1
You need to rearrange your syntax a little bit:
id=$(sed -e 's/[{}]/''/g' CustomThemeProfile.json | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}' | awk -F ":" '{print $0}' | grep id | awk -F ":" '{print $2}' | sed 's/\"//g')
Notice that I am avoiding a useless use of cat and passing the file directly to sed. In your script, sed is given neither a file nor piped input, so it waits on standard input, which is why the script appears to do nothing. It would be possible to move cat inside the command substitution, but there's no advantage to doing so. If a tool is capable of reading a file itself, you should use that capability.
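As a rough illustration of where the input has to come from, here is a sketch using a made-up one-line JSON sample, since the real CustomThemeProfile.json is not shown. The extract_id function and the sample's structure are assumptions, and this kind of text mangling is no substitute for a real JSON parser.

```shell
# Hypothetical one-line sample; the real CustomThemeProfile.json is not shown.
tmp=$(mktemp)
printf '%s\n' '{"name":"demo","id":"2F13F732-4BCB-49DC-A0FB-C91B5DE58472"}' > "$tmp"

# extract_id reads standard input, like the pipeline in the question.
extract_id() {
  sed 's/[{}"]//g' | tr ',' '\n' | awk -F: '$1 == "id" { print $2 }'
}

id_from_redirect=$(extract_id < "$tmp")    # input supplied inside the substitution
id_from_pipe=$(cat "$tmp" | extract_id)    # also works, but cat is unnecessary

echo "$id_from_redirect"
rm -f "$tmp"
```

In both forms the data source sits inside $( ), so the assignment itself happens in the current shell and the variable is still set on the next line.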
The better solution would be to parse your JSON properly, using jq for example. In order for us to show you how to do that, you should edit your question to show us a sample of your input.

Use Awk to extract substring

Given a hostname in format of aaa0.bbb.ccc, I want to extract the first substring before ., that is, aaa0 in this case. I use following awk script to do so,
echo aaa0.bbb.ccc | awk '{if (match($0, /\./)) {print substr($0, 0, RSTART - 1)}}'
While the script running on machine A produces aaa0, running it on machine B produces only aaa, without the 0 at the end. Both machines run Ubuntu/Linaro, but A runs a newer awk (gawk version 3.1.8) while B runs an older one (mawk version 1.2).
I am asking in general, how to write a compatible awk script that performs the same functionality ...
You just want to set the field separator as . using the -F option and print the first field:
$ echo aaa0.bbb.ccc | awk -F'.' '{print $1}'
aaa0
Same thing but using cut:
$ echo aaa0.bbb.ccc | cut -d'.' -f1
aaa0
Or with sed:
$ echo aaa0.bbb.ccc | sed 's/[.].*//'
aaa0
Even grep:
$ echo aaa0.bbb.ccc | grep -o '^[^.]*'
aaa0
Or just use cut:
echo aaa0.bbb.ccc | cut -d'.' -f1
I am asking in general, how to write a compatible awk script that
performs the same functionality ...
Solving the problem in your question is easy (check the other answers).
Writing an awk script that is portable to every awk implementation and version (gawk/nawk/mawk...) is really hard, even with --posix (gawk).
for example:
some awk works on string in terms of characters, some with bytes
some supports \x escape, some not
FS interpreter works differently
keywords/reserved words abbreviation restriction
some operator restriction e.g. **
even within the same awk implementation (gawk, for example), versions 4.0 and 3.x differ too.
the implementations of certain functions also differ (your problem is one example, see below).
Those points are all general, though. Your problem only involves a fundamental feature of awk: a line like awk '{print $x}' will work in every awk.
There are two reasons why your awk line behaves differently on gawk and mawk:
You used the substr() function wrongly. This is the main cause. You have substr($0, 0, RSTART - 1); the 0 should be 1, no matter which awk you use. Awk string indexes, split() arrays, etc. are 1-based.
gawk and mawk implement substr() differently.
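With the start index corrected to 1, the original approach might look like the sketch below. The second variant with index() is an addition of mine, not from the answer; it looks for the literal dot without a regular expression. The variable names are illustrative.

```shell
host=aaa0.bbb.ccc

# substr() positions start at 1, so begin at 1 rather than 0.
with_match=$(printf '%s\n' "$host" |
  awk '{ if (match($0, /\./)) print substr($0, 1, RSTART - 1) }')

# index() returns the position of the first literal dot, or 0 if there is none.
with_index=$(printf '%s\n' "$host" |
  awk '{ n = index($0, "."); print (n ? substr($0, 1, n - 1) : $0) }')

echo "$with_match"   # prints aaa0
echo "$with_index"   # prints aaa0
```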
You don't need awk for this...
echo aaa0.bbb.ccc | cut -d. -f1
cut -d. -f1 <<< aaa0.bbb.ccc
echo aaa0.bbb.ccc | { IFS=. read a _ ; echo $a ; }
{ IFS=. read a _ ; echo $a ; } <<< aaa0.bbb.ccc
x=aaa0.bbb.ccc; echo ${x/.*/}
Heavier options:
sed:
echo aaa0.bbb.ccc | sed 's/\..*//'
sed 's/\..*//' <<< aaa0.bbb.ccc
awk:
echo aaa0.bbb.ccc | awk -F. '{print $1}'
awk -F. '{print $1}' <<< aaa0.bbb.ccc
You do not need any external command at all, just use Parameter Expansion in bash:
hostname=aaa0.bbb.ccc
echo ${hostname%%.*}
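For reference, the four related prefix and suffix removal forms behave as sketched here. These are standard POSIX parameter expansions; the variable names are only for illustration.

```shell
host=aaa0.bbb.ccc

first=${host%%.*}     # remove the longest suffix matching ".*"  -> aaa0
trimmed=${host%.*}    # remove the shortest suffix matching ".*" -> aaa0.bbb
rest=${host#*.}       # remove the shortest prefix matching "*." -> bbb.ccc
last=${host##*.}      # remove the longest prefix matching "*."  -> ccc

printf '%s\n' "$first" "$trimmed" "$rest" "$last"
```

The patterns are shell globs, not regular expressions, so the dot matches only a literal dot.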
If you don't want to change the input field separator, you can use the split function:
echo "some aaa0.bbb.ccc text" | awk '{split($2, a, "."); print a[1]}'
documentation:
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep
and store the pieces in array and the separator
strings in the seps array.
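Note that the fourth seps argument is a gawk extension; the three-argument form is the portable one. A small sketch of the portable call, which also uses the return value (the number of pieces):

```shell
# Three-argument split() is POSIX; it returns the number of pieces.
result=$(awk 'BEGIN {
  n = split("aaa0.bbb.ccc", parts, ".")
  print n, parts[1], parts[n]
}')

echo "$result"   # prints 3 aaa0 ccc
```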
awk is still the cleanest approach :
mawk NF=1 FS='[.]' <<< aaa0.bbb.ccc
aaa0
If there's stuff before or after :
mawk ++NF FS='[.].+$|^[^ ]* ' OFS= <<< 'some aaa0.bbb.ccc text'
mawk '$!NF=$2' FS='[ .]' <<< 'some aaa0.bbb.ccc text'
aaa0

Resources