How to extract two pieces of data from a string - bash

I am trying to extract two pieces of data from a string and I am having a bit of trouble. The string is formatted like this:
11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
What I am trying to achieve is to print the first column (11111111-2222:3333:4444:555555555555) and the third section of the colon-separated string (cccccccc) on the same line, with a space between the two, as the first column is an identifier. Ideally this would be something that can be run as a one-liner from the terminal.
I have tried using cut and awk but I have yet to find a good way to make this work.

How about a sed expression like this?
echo "11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd" |
sed -e "s/\(.*\) .*:.*:\(.*\):.*/\1 \2/"
Result:
11111111-2222:3333:4444:555555555555 cccccccc

The following awk script does the job without relying on the format of the first column.
awk -F: 'BEGIN {RS=ORS=" "} NR==1; NR==2 {print $3}'
Use it in a pipe or pass the string as a file (simply append the filename as an argument) or as a here-string (append <<< "your string").
Explanation:
Instead of lines, this awk script splits the input into space-separated records (RS=ORS=" "). Each record is subdivided into :-separated fields (-F:). The first record is printed as is (NR==1;, which is the same as NR==1 {print $0}). For the second record, only the 3rd field is printed (NR==2 {print $3}); in the case of the record aaa:bbb:ccc:ddd the 3rd field is ccc.
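Putting the pieces together (a quick check; printf avoids the trailing newline from echo becoming part of the last record):

```shell
input='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
# first record printed whole, third field of the second record appended
result=$(printf '%s' "$input" |
    awk -F: 'BEGIN {RS=ORS=" "} NR==1; NR==2 {print $3}')
result=${result% }   # drop the trailing ORS space
echo "$result"
```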

I think the answer from user803422 is better but here's another option. Maybe it'll help you use cut in the future.
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
first=$(echo "$str" | cut -d ' ' -f1)
second=$(echo "$str" | cut -d ':' -f6)
echo "$first $second"
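To see why the second cut uses -f6: with : as the delimiter the whole line splits into seven fields, so field 6 is the wanted section. Note this relies on the first column always containing exactly three colons:

```shell
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
# fields 1-7: 11111111-2222 | 3333 | 4444 | "555555555555 aaaaaaaa" | bbbbbbbb | cccccccc | dddddddd
second=$(echo "$str" | cut -d ':' -f6)
echo "$second"
```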

With pure Bash Regex:
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
echo "$([[ $str =~ (.*\ ).*:.*:([^:]*): ]])${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
Explanations:
[[ $str =~ (.*\ ).*:.*:([^:]*): ]]: Match $str against the POSIX Extended RegEx (.*\ ).*:.*:([^:]*):, which contains two capture groups. Capture group 1: (.*\ ), 0 or more of any characters followed by a space; capture group 2: ([^:]*), any number of characters that are not :. The trailing : matters: without it, the greedy .* parts would leave the last section (dddddddd) rather than the third one for capture group 2.
$([[ $str =~ (.*\ ).*:.*:([^:]*): ]]): execute the RegEx match in a sub-shell during the string value expansion (here it produces no output, but the RegEx capture groups are referenced later).
${BASH_REMATCH[1]}${BASH_REMATCH[2]}: expand the content of the RegEx captured groups that Bash keeps in the dedicated $BASH_REMATCH array.
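The same match written as a plain if statement, which may be easier to read (a sketch; the trailing : keeps the greedy .* parts from swallowing the third section):

```shell
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
# group 1: everything up to and including the space; group 2: the third colon section
if [[ $str =~ (.*\ ).*:.*:([^:]*): ]]; then
    out="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
fi
echo "$out"
```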


Cut-n-paste while preserving last blank line for empty match (awk or sed)

I have a two-line "keyword=keyvalue" line pattern (selectively excised from systemd/networkd.conf file):
DNS=0.0.0.0
DNS=
and need the following 2-line answer:
0.0.0.0
But all attempts using sed or awk resulted in omitting the newline if the last line pattern matching resulted in an empty match.
EDIT:
Oh, one last thing: this multi-line result has to be stored back into a bash variable that contains the same last blank line as well, so this is a two-step operation:
cut out everything up to and including the equal = symbol (i.e. keep the content after it) while preserving a newline in case of an empty result (this is the key here), or possibly jerry-rig a weak fixup that attaches a newline in case of an empty match on the last line;
save the multi-line result back into a bash variable.
sed Approach
When cutting up to and including the matched character in the bash shell, the sed result loses the blank line that corresponds to an empty pattern match:
raw="DNS=0.0.0.0
DNS=
"
rawp="$(printf "%s\n" "$raw")"
kvs="$(echo "$rawp"| sed -e '/^[^=]*=/s///')"
echo "result: '${kvs}'"
gives the result:
0.0.0.0
without the corresponding blank line.
awk Approach
Awk has the same problem:
raw="DNS=0.0.0.0
DNS=
"
rawp="$(printf "%s\n" "$raw")"
kvs="$(echo "$rawp"| awk -F '=' -v OFS="" '{$1=""; print}')"
echo "result: '${kvs}'"
gives the same answer (it removed the blank line).
Please Advise
Somehow, I need the following answer:
0.0.0.0
in form of a two-line output containing 0.0.0.0 and a blank line.
Other Observations Made
I also noticed that if I provide 3-line data as follows (two lines with a keyvalue and the middle one without a keyvalue):
DNS=0.0.0.0
DNS=
DNS=999.999.999.999
Both sed and awk provided the correct answer:
0.0.0.0
999.999.999.999
Weird, huh?
The above approaches (both sed and awk) work for:
a single line with its keyvalue,
multiple lines, provided every line has a non-empty keyvalue, BUT
the last line MUST have a keyvalue.
It just doesn't work when the last line has an empty keyvalue.
:-/
You can use this awk:
raw="DNS=0.0.0.0
DNS=
"
awk -F= 'NF == 2 {print $2}' <<< "$raw"
0.0.0.0
Following cut should also work:
cut -d= -f2 <<< "${raw%$'\n'}"
0.0.0.0
To store output including trailing line breaks use read with process substitution:
IFS= read -rd '' kvs < <(awk -F= 'NF == 2 {print $2}' <<< "$raw")
declare -p kvs
declare -- kvs="0.0.0.0

"
Code Demo:
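A runnable sketch of the whole round trip (same commands as in the answer above):

```shell
raw="DNS=0.0.0.0
DNS=
"
# capture including trailing newlines; read returns non-zero at EOF, which is fine here
IFS= read -rd '' kvs < <(awk -F= 'NF == 2 {print $2}' <<< "$raw")
printf '%s' "$kvs"   # prints 0.0.0.0 followed by a blank line
```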

How can I interpret a string that contains decimal escape sequences?

I'm trying to parse the "parsable" output of the avahi-browse command for use in a shell script, e.g.
for i in $(avahi-browse -afkpt | awk -F';' '{print $4}') ; do <do something with $i> ; done
The output looks like:
+;br.vlan150;IPv4;Sonos-7828CAC5D944\064Bedroom;_sonos._tcp;local
I am particularly interested in the value of the 4th field, which is a "service name".
With the -p|--parsable flag, avahi-browse escapes the "service name" values.
For example 7828CAC5D944\064Bedroom, where \064 is a zero-padded decimal representation of the ASCII character '@'.
I just want 7828CAC5D944@Bedroom so I can, for example, use it as an argument to another command.
I can't quite figure out how to do this inside the shell.
I tried using printf, but that only seems to interpret octal escape sequences. e.g.:
# \064 is interpreted as 4
$ printf '%b\n' '7828CAC5D944\064Bedroom'
7828CAC5D9444Bedroom
How can I parse these values, converting any of the decimal escape sequences to their corresponding ASCII characters?
Assumptions:
there's a reason the -p flag cannot be removed (will removing -p generate a @ instead of \064?)
the 4th field is to be further processed by stripping off all text up to and including a hyphen (-)
\064 is the only escaped decimal value we need to worry about (for now)
Since OP is already calling awk to process the raw data I propose we do the rest of the processing in the same awk call.
One awk idea:
awk -F';' '
{ n=split($4,arr,"-")       # split field #4 on the hyphen delimiter
  gsub(/\\064/,"@",arr[n])  # perform the string replacement in the last arr[] entry
  print arr[n]              # print the newly modified string
}'
# or as a one-liner:
awk -F';' '{n=split($4,arr,"-");gsub(/\\064/,"@",arr[n]);print arr[n]}'
Simulating the avahi-browse call feeding into awk:
echo '+;br.vlan150;IPv4;Sonos-7828CAC5D944\064Bedroom;_sonos._tcp;local' |
awk -F';' '{n=split($4,arr,"-");gsub(/\\064/,"@",arr[n]);print arr[n]}'
This generates:
7828CAC5D944@Bedroom
And for the main piece of code I'd probably opt for a while loop, especially if there's a chance the avahi-browse/awk process could generate data with white space:
while read -r i
do
<do something with $i>
done < <(avahi-browse -afkpt | awk -F';' '{n=split($4,arr,"-");gsub(/\\064/,"@",arr[n]);print arr[n]}')
Using perl to do the conversion:
$ perl -pe 's/\\(\d+)/chr $1/ge' <<<"7828CAC5D944\064Bedroom"
7828CAC5D944@Bedroom
As part of your larger script, completely replacing awk:
while read -r i; do
# do something with i
done < <(avahi-browse -afkpt | perl -F';' -lane 'print $F[3] =~ s/\\(\d+)/chr $1/ger')
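Since the escapes are decimal, a quick printf round trip shows which character \064 encodes (decimal 64 is octal 100):

```shell
# decimal 64 -> octal escape -> character
printf -v ch "\\$(printf '%03o' 64)"
echo "$ch"   # @
```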

how to iterate over awk result

I have the following string that I want to retrieve a specific ID for eu-central-1 only:
ca-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd
so what I want as an output is: ami-bbbb
The way I am doing it right now is:
echo 'ca-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd' |
awk -F',' '{ print $2 }' |
awk -F':' '{print $2}'
The problem with this approach is that I am explicitly specifying that eu-central-1 is the second ($2) result for the first awk call, but sometimes the pairs might be in a different order, so I would need to iterate over the result. Is it possible to achieve this in one line, without knowing beforehand where eu-central-1:ami-bbbb lands in the string?
Use grep like so:
echo your_string | grep -Po '\beu-central-1:\K[^,]+'
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only, 1 match/line, not the entire lines.
\b : Word boundary.
\K : Pretend that the match starts at this point. Specifically, ignore the preceding part of the regex when printing the match.
[^,]+ : Any characters that are not a comma, one or more occurrences.
SEE ALSO:
grep manual
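For example, with the sample string from the question (this needs GNU grep built with PCRE support for -P):

```shell
s='ca-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd'
ami=$(echo "$s" | grep -Po '\beu-central-1:\K[^,]+')
echo "$ami"
```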
I'd prefer grep as in Timur Shtatland's answer. But for completeness here is an alternative:
You can set awk's record separator (linebreak by default) and then only print that record starting with eu-central-1.
awk -F: -v RS=, '$1 == "eu-central-1" { print $2 }'
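For example:

```shell
s='ca-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd'
# each ,-separated record is split on :; the record whose first field matches wins
ami=$(printf '%s' "$s" | awk -F: -v RS=, '$1 == "eu-central-1" { print $2 }')
echo "$ami"
```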
With GNU sed or OSX/BSD sed for -E:
$ sed -E 's/(^|.*,)eu-central-1:([^,]*).*/\2/' file
ami-bbbb
One sed idea:
id='eu-central-1'
# desired id in middle of input string:
echo 'a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd' | \
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p"
Where:
-En - enable extended regex support
^(.*,)* - [capture group #1] - matches start of line plus zero or more instances of characters ending with a comma (,)
^(.*,)*${id}: - capture group #1 followed by ${id} + :
([^,]*) - [capture group #2] - matches everything up to, but not including, the next comma (,)
(,.*)*$ - [capture group #3] - matches zero or more instances of comma followed by other characters to end of line
\2/p - print capture group #2
Alternatively, using a here-string to eliminate the pipe/sub-process call:
id='eu-central-1'
# desired id at start of input string:
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p" <<< 'eu-central-1:ami-bbbb,a-central-1:ami-aaaa,eu-north-1:ami-cccc,eu-west-1:ami-dddd'
# desired id at end of input string:
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p" <<< 'a-central-1:ami-aaaa,eu-north-1:ami-cccc,eu-west-1:ami-dddd,eu-central-1:ami-bbbb'
All three generate:
ami-bbbb
Defining , as the line (record) separator and : as the field separator, a simple condition on $1 prints the result:
echo -n a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd |
awk 'BEGIN{RS=","; FS=":"}$1=="eu-central-1"{print $2}'
ami-bbbb

bash string manipulation - regex match with delimiter

I have a string like this:
zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|inheritDNSRestrictions=true|name=SCB-INET-A
The order inside the delimiter | can be random - that means the key-value pairs can be randomly ordered in the string.
I want an output string like the following:
"INTERNET","10.10.10.0/24","SCB-INET-A"
All values in the output are values from the key-value string above
Does anyone know how I can solve this with awk or sed?
Given your input is a variable var:
var="zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|inheritDNSRestrictions=true|name=SCB-INET-A"
echo "$var" | tr "|" "\n" | sed -n -r "s/(zone|name|gateway)=(.*)/\"\2\"/p"
"INTERNET"
"10.10.10.100"
"SCB-INET-A"
Appending two more pipe stages to the command above (SOFAR stands for that whole pipeline) inserts commas and removes the line breaks:
SOFAR | tr "\n" "," | sed 's/,$//'
"INTERNET","10.10.10.100","SCB-INET-A"
Whenever you have name -> value pairs in your input the best approach is to create an array of those mappings (f[] below) and then access the values by their names:
$ cat tst.awk
BEGIN { RS="|"; FS="[=\n]"; OFS="," }
{ f[$1] = "\"" $2 "\"" }
END { print f["zone"], f["CIDR"], f["name"] }
$ awk -f tst.awk file
"INTERNET","10.10.10.0/24","SCB-INET-A"
The above will work efficiently (i.e. literally orders of magnitude faster than a shell loop) and portably using any awk in any shell on any UNIX box, unlike all of the other answers so far which all rely on non-POSIX functionality. It does full string matching instead of partial regexp matching, like some of the other answers, so it is extremely robust and will not result in bad output given partial matches. It also will not interpret any input characters (e.g. escape sequences and/or globbing chars), like some of your other answers do, and instead will just robustly reproduce them as-is in the output.
If you need to enhance it to print any extra field values just add them as , f["<field name>"] to the print statement and if you need to change the output format or do anything else it's all absolutely trivial too.
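The script also works inline against a here-string instead of a file (a sketch using a shortened version of the example string):

```shell
var='zone=INTERNET|status=good|CIDR=10.10.10.0/24|gateway=10.10.10.100|name=SCB-INET-A'
out=$(awk 'BEGIN { RS="|"; FS="[=\n]"; OFS="," }
           { f[$1] = "\"" $2 "\"" }
           END { print f["zone"], f["CIDR"], f["name"] }' <<< "$var")
echo "$out"
```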
Using awk:
var="zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|name=SCB-INET-A|inheritDNSRestrictions=true"
awk -v RS='|' -v ORS=',' -F= '$1~/zone|gateway|name/{print "\"" $2 "\""}' <<<"$var" | sed 's/,$//'
"INTERNET","10.10.10.100","SCB-INET-A"
The input record separator RS is set to |.
The input field separator FS is set to =.
The output record separator ORS is set to ,.
$1~/zone|gateway|name/ filters the parameters to extract. The print statement adds double quotes around each parameter value.
The sed statement removes the annoying trailing , (which the ORS setting adds after the last value).
One more solution using Bash. It is not the shortest, but I hope it is the most readable and therefore the most maintainable.
#!/bin/bash
# Function split_key_val()
# selects values from a string with key-value pairs
# IN: string_with_key_value_pairs wanted_key_1 [wanted_key_2] ...
# OUT: result
function split_key_val {
local KEY_VAL_STRING="$1"
local RESULT
# read the string with key-value pairs into array
IFS=\| read -r -a ARRAY <<< "$KEY_VAL_STRING"
#
shift
# while there are wanted-keys ...
while [[ -n $1 ]]
do
WANTED_KEY="$1"
# Search the array for the wanted-key
for KEY_VALUE in "${ARRAY[@]}"
do
# the key is the part before "="
KEY=$(echo "$KEY_VALUE" |cut --delimiter="=" --fields=1)
# the value is the part after "="
VALUE=$(echo "$KEY_VALUE" |cut --delimiter="=" --fields=2)
if [[ $KEY == "$WANTED_KEY" ]]
then
# if result is empty; result= found value...
if [[ -z $RESULT ]]
then
# (quote the damned quotes)
RESULT="\"${VALUE}\""
else
# ... else add a comma as a separator
RESULT="${RESULT},\"${VALUE}\""
fi
fi # key == wanted-key
done # searched whole array
shift # prepare for next wanted-key
done
echo "$RESULT"
return 0
}
STRING="zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|inheritDNSRestrictions=true|name=SCB-INET-A"
split_key_val "$STRING" zone CIDR name
The result is:
"INTERNET","10.10.10.0/24","SCB-INET-A"
without using more sophisticated text editing tools (as an exercise!)
$ tr '|' '\n' <file | # make it columnar
egrep '^(zone|CIDR|name)=' | # get exact key matches
cut -d= -f2 | # get values
while read line; do echo '"'$line'"'; done | # quote values
paste -sd, # flatten with comma
will give
"INTERNET","10.10.10.0/24","SCB-INET-A"
you can also replace while statement with xargs printf '"%s"\n'
Not using sed or awk but the Bash Arrays feature.
line="zone=INTERNET|sta=good|CIDR=10.10.10.0/24|a=1 1|...=...|name=SCB-INET-A"
echo "$line" | tr '|' '\n' | {
declare -A vars
while read -r item ; do
if [ -n "$item" ] ; then
vars["${item%%=*}"]="${item##*=}"
fi
done
echo "\"${vars[zone]}\",\"${vars[CIDR]}\",\"${vars[name]}\"" ; }
One advantage of this method is that you always get your fields in order independent of the order of fields in the input line.

Find and Replace with awk

I have this value, cut from a .txt file:
,Request Id,dummy1,dummy2,dummyN
I am trying to find and replace the space with "_", like this:
#iterator to read lines of txt
#if conditions
trim_line=$(echo "$user" | awk '{gsub(" ", "_", $0); print}')
echo $trim_line
but the echo is showing:
Id,dummy1,dummy2,dummyN
Expected output:
,Request_Id,dummy1,dummy2,dummyN
Where is my bug?
EDIT:
The echo of user is not what I expected; it is:
Id,dummy1,dummy2,dummyN
And should be:
,Request Id,dummy1,dummy2,dummyN
To do this operation I am using:
for user in $(cut -d: -f1 $FILENAME)
do (....) find/replace
You can try Bash's search-and-replace substring expansion:
echo $user
,Request Id,dummy1,dummy2,dummyN
echo ${user// /_} ## For all the spaces
,Request_Id,dummy1,dummy2,dummyN
echo ${user/ /_} ## For first match
This will replace all the blank spaces with _. Note that here two / are used after user. This is to do the search and replace operation on whole text. If you put only one / then search and replace would be done over first match.
Your problem is your use of a for loop to read the contents of your file. The shell splits the output of your command substitution $(cut -d: -f1 $FILENAME) on white space and you have one in the middle of your line, so it breaks.
Use a while read loop to read the file line by line:
while IFS=: read -r col junk; do
col=${col// /_}
# use $col here
done < "$FILENAME"
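A quick check of the loop, using a hypothetical throwaway file whose first colon-separated column contains a space:

```shell
FILENAME=$(mktemp)
printf '%s\n' ',Request Id,dummy1,dummy2,dummyN:extra' > "$FILENAME"
out=$(while IFS=: read -r col junk; do
          col=${col// /_}
          echo "$col"
      done < "$FILENAME")
echo "$out"
rm -f "$FILENAME"
```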
As others have mentioned, there's no need to use an external tool to make the substitution.
...That said, if you don't plan on doing something different (e.g. executing other commands) with each line, then the best option is to use awk:
awk -F: '{ gsub(/ /, "_", $1); print $1 }' "$FILENAME"
The output of this command is the first column of your input file, with the substitution made.
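For example, against a hypothetical input file built the same way:

```shell
FILENAME=$(mktemp)
printf '%s\n' ',Request Id,dummy1,dummy2,dummyN:extra' > "$FILENAME"
# replace spaces in the first :-separated field only, then print it
out=$(awk -F: '{ gsub(/ /, "_", $1); print $1 }' "$FILENAME")
echo "$out"
rm -f "$FILENAME"
```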
If your data is already in a shell variable, the fastest way is to use the built-in bash replacement feature directly:
echo "${user// /_}"
With awk, set the separator as , or the space character will be interpreted as the separator.
echo ",Request Id,dummy1,dummy2,dummyN" | awk -F, '{gsub(" ", "_", $0); print}'
,Request_Id,dummy1,dummy2,dummyN
note: if it's just to replace a character in a raw string (no tokens, no fields), bash, sed and tr are best suited.
