how to iterate over awk result - bash

I have the following string that I want to retrieve a specific ID for eu-central-1 only:
ca-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd
so what I want as an output is: ami-bbbb
The way I am doing it right now is:
echo a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd |
awk -F',' '{ print $2 }' |
awk -F':' '{print $2}'
The problem with this approach is that I am explicity specifying that eu-central-1 is the second ($2) result for the first awk call, but sometimes they might in different order, so I might need to iterate over this result. Is it possible to achieve this in one line, and without knowing before hand in which place in the string eu-central-1:ami-bbbb will land?

Use grep like so:
echo your_string | grep -Po '\beu-central-1:\K[^,]+'
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only, 1 match/line, not the entire lines.
\b : Word boundary.
\K : Pretend that the match starts at this point. Specifically, ignore the preceding part of the regex when printing the match.
[^,]+ : Any characters that are not a comma, one or more occurrences.
SEE ALSO:
grep manual

I'd prefer grep as in Timur Shtatland's answer. But for completeness here is an alternative:
You can set awk's record separator (linebreak by default) and then only print that record starting with eu-central-1.
awk -F: -v RS=, '$1 == "eu-central-1" { print $2 }'

With GNU sed or OSX/BSD sed for -E:
$ sed -E 's/(^|.*,)eu-central-1:([^,]*).*/\2/' file
ami-bbbb

One sed idea:
id='eu-central-1'
# desired id in middle of input string:
echo 'a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd' | \
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p"
Where:
-En - enable extended regex support
^(.*,)* - [capture group #1] - matches start of line plus zero or more instances of characters ending with a comma (,)
^(.*,)*${id}: - capture group #1 followed by ${id} + :
([^,]*) - [capture group #2] - matches everything up to, but not including, the next comma (,)
(,.*)*$ - [capture group #3] - matches zero or more instances of comma followed by other characters to end of line
\2/p - print capture group #2
Alternatively, using a here-string to eliminate the pipe/sub-process call:
id='eu-central-1'
# desired id at start of input string:
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p" <<< 'eu-central-1:ami-bbbb,a-central-1:ami-aaaa,eu-north-1:ami-cccc,eu-west-1:ami-dddd'
# desired id at end of input string:
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p" <<< 'a-central-1:ami-aaaa,eu-north-1:ami-cccc,eu-west-1:ami-dddd,eu-central-1:ami-bbbb'
All three generate:
ami-bbbb

Defining , as line (record) separator and : as field separator, a simple condition over $1 prints the result.
echo -n a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd |
awk 'BEGIN{RS=","; FS=":"}$1=="eu-central-1"{print $2}'

Related

How do i get the value present in first double quotes?

I'm currently writing a bash script to get the first value among the many comma separated strings.
I have a file that looks like this -
name
things: "water bottle","40","new phone cover",10
place
I just need to return the value in first double quotes.
water bottle
The value in first double quotes can be one word/two words. That is, water bottle can be sometimes replaced with pen.
I tried -
awk '/:/ {print $2}'
But this just gives
water
I wanted to comma separate it, but there's colon(:) after things. So, I'm not sure how to separate it.
How do i get the value present in first double quotes?
EDIT:
SOLUTION:
I used the below code since I particularly wanted to use awk -
awk '/:/' test.txt | cut -d\" -f2
A solution using the cut utility could be
cut -d\" -f2 infile > outfile
Using gnu awk you could make use of a capture group, and use a negated character class to not cross the , as that is the field delimiter.
awk 'match($0, /^[^",:]*:[^",]*"([^"]*)"/, a) {print a[1]}' file
Output
water bottle
The pattern matches
^ Start of string
[^",:]*:Optionally match any value except " and , and :, then match :
[^",]* Optionally match any value except " and ,
"([^"]*)" Capture in group 1 the value between double quotes
If the value is always between double quotes, a short option to get the desired result could be setting the field separator to " and check if group 1 contains a colon, although technically you can also get water bottle if there is only a leading double quote and not closing one.
awk -F'"' '$1 ~ /:/ {print $2}' file
With your shown samples, please try following awk code.
awk '/^things:/ && match($0,/"[^"]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Explanation: In awk program checking if line starts with things: AND using match function to match everything between 1st and 2nd " and printing them accordingly.
Solution 1: awk
You can use a single awk command:
awk -F\" 'index($1, ":"){print $2}' test.txt > outfile
See the online demo.
The -F\" sets the field separator to a " char, index($1, ":") condition makes sure Field 1 contains a : char (no regex needed) and then {print $2} prints the second field value.
Solution 2: awk + cut
You can use awk + cut:
awk '/:/' test.txt | cut -d\" -f2 > outfile
With awk '/:/' test.txt, you will extract line(s) containing : char, and then the piped cut -d\" -f2 command will split the string with " as a separator and return the second item. See the online demo.
Solution 3: sed
Alternatively, you can use sed:
sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' file > outfile
See the online demo:
#!/bin/bash
s='name
things: "water bottle","40","new phone cover",10
place'
sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' <<< "$s"
# => water bottle
The command means
-n - the option suppresses the default line output
^[^"]*"\([^"]*\)".* - a POSIX BRE regex pattern that matches
^ - start of string
[^"]* - zero or more chars other than "
" - a " char
\([^"]*\) - Group 1 (\1 refers to this value): any zero or more chars other than "
".* - a " char and the rest of the string.
\1 replaces the match with Group 1 value
p - only prints the result of a successful substitution.

Writing the output of a command to specific columns of a csv file, unix

I wanted to write the output of command to specific columns (3rd and 5th) of the csv file.
#!/bin/bash
echo -e "Value,1\nCount,1" >> file.csv
echo "Header1,Header2,Path,Header4,Value,Header6" >> file.csv
sed 'y/ /,/' input.csv >> file.csv
input.csv in the above snippet will look something like this
1234567890 /training/folder
0325435287 /training/newfolder
Current output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
1234567890,/training/folder
0325435287,/training/newfolder
Expected Output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,/training/folder,,1234567890,
,,/training/newfolder,,0325435287,
All the operations can be done in a single awk:
awk -v OFS=, -v pre="Value,1\nCount,1" -v hdr="Header1,Header2,Path,Header4,Value,Header6" '
BEGIN {print pre; print hdr}
{print "", "", $1, "", $2, ""}
' input.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,i1234567890,,/training/folder,
,,0325435287,,/training/newfolder,
With sed you could try following code. Which is using sed's capability of back reference.
sed -E 's/(^[^ ]*) +(.*$)/,,\2,,\1,/' Input_file
Explanation: Using -E option of sed to enable ERE(extended regular expressions) first. Then in main program using s option to perform substitution operation. In 1st part of substitution creating 2 back references(capability to catch values by using regex and keep them in temp buffer memory to be used later on while substituting it with in 2nd part of substitution). In 2nd part of substitution substituting whole line with 2 commas followed by 2nd capturing group\2 followed by 2 commas followed by 1st capturing group \1 following by ,.
You can use awk instead of sed
cat input.csv | awk '{print ",," $1 "," $2 ","}' >> file.csv
awk can process a stdin input by line to line. It implements a print function and each word is processed as a argument (in your case, $1 and $2). In the above example, I added ,, and , as an inline argument.
You can trivially add empty columns as part of your sed script.
sed 'y/ /,/;s/,/,,/;s/^/,,/;s/$/,/' input.csv >> file.csv
This replaces the first comma with two, then adds two up front and one at the end.
Your expected output does not look like valid CSV, though. This is also brittle in that it will fail for any file names which contain a space or a comma.

How to extract two pieces of data from a string

I am trying to extract two pieces of data from a string and I have having a bit of trouble. The string is formatted like this:
11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
What I am trying to achieve is to print the first column (11111111-2222:3333:4444:555555555555) and the third section of the colon string (cccccccc), on the same line with a space between the two, as the first column is an identifier. Ideally in a way that can just be run as one-line from the terminal.
I have tried using cut and awk but I have yet to find a good way to make this work.
How about a sed expression like this?
echo "11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd" |
sed -e "s/\(.*\) .*:.*:\(.*\):.*/\1 \2/"
Result:
11111111-2222:3333:4444:555555555555 cccccccc
The following awk script does the job without relying on the format of the first column.
awk -F: 'BEGIN {RS=ORS=" "} NR==1; NR==2 {print $3}'
Use it in a pipe or pass the string as a file (simply append the filename as an argument) or as a here-string (append <<< "your string").
Explanation:
Instead of lines this awk script splits the input into space-separated records (RS=ORS=" "). Each record is subdivided into :-separated fields (-F:). The first record will be printed as is (NR==1;, that's the same as NR==1 {print $0}). In the second record, we will only print the 3rd field (NR==2 {print {$3}}); in case of the record aaa:bbb:ccc:ddd the 3rd field is ccc.
I think the answer from user803422 is better but here's another option. Maybe it'll help you use cut in the future.
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
first=$(echo "$str" | cut -d ' ' -f1)
second=$(echo "$str" | cut -d ':' -f6)
echo "$first $second"
With pure Bash Regex:
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
echo "$([[ $str =~ (.*\ ).*:.*:([^:]*) ]])${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
Explanations:
[[ $str =~ (.*\ ).*:.*:([^:]* ]]: Match $str against the POSIX Extended RegEx (.*\ ).*:.*:([^:]*) witch contains two capture groups: 1: (.*\ ) 0 or more of any characters, followed by a space; and capture group 2: ([^:]*) witch contains any number of characters that are not :.
$([[ $str =~ (.*\ ).*:.*:([^:]*) ]]): execute the RegEx match in a sub-shell during the string value expansion. (here it produces no output, but the RegEx captured groups are referenced later).
${BASH_REMATCH[1]}${BASH_REMATCH[2]}: expand the content of the RegEx captured groups that Bash keeps in the dedicated $BASH_REMATCH array.

output csv with lines that contains only one column

with input csv file
sid,storeNo,latitude,longitude
2,1,-28.03720000,153.42921670
9
I wish to output only the lines with one column, in this example it's line 3.
how can this be done in bash shell script?
Using awk
The following awk would be usfull
$ awk -F, 'NF==1' inputFile
9
What it does?
-F, sets the field separator as ,
NF==1 matches lines with NF, number of fields as 1. No action is provided hence default action, printing the entire record is taken. it is similar to NF==1{print $0}
inputFile input csv file to the awk script
Using grep
The same function can also be done using grep
$ grep -v ',' inputFile
9
-v option prints lines that do not match the pattern
, along with -v greps matches lines that do not contain , field separator
Using sed
$ sed -n '/^[^,]*$/p' inputFile
9
what it does?
-n suppresses normal printing of pattern space
'/^[^,]*$/ selects lines that match the pattern, lines without any ,
^ anchors the regex at the start of the string
[^,]* matches anything other than ,
$ anchors string at the end of string
p action p makes sed to print the current pattern space, that is pattern space matching the input
try this bash script
#!/bin/bash
while read -r line
do
IFS=","
set -- $line
case ${#} in
1) echo $line;;
*) continue;;
esac
done < file

How to retrieve digits including the separator "."

I am using grep to get a string like this: ANS_LENGTH=266.50 then I use sed to only get the digits: 266.50
This is my full command: grep --text 'ANS_LENGTH=' log.txt | sed -e 's/[^[[:digit:]]]*//g'
The result is : 26650
How can this line be changed so the result still shows the separator: 266.50
You don't need grep if you are going to use sed. Just use sed' // to match the lines you need to print.
sed -n '/ANS_LENGTH/s/[^=]*=\(.*\)/\1/p' log.txt
-n will suppress printing of lines that do not match /ANS_LENGTH/
Using captured group we print the value next to = sign.
p flag at the end allows to print the lines that matches our //.
If your grep happens to support -P option then you can do:
grep -oP '(?<=ANS_LENGTH=).*' log.txt
(?<=...) is a look-behind construct that allows us to match the lines you need. This requires the -P option
-o allows us to print only the value part.
You need to match a literal dot as well as the digits.
Try sed -e 's/[^[[:digit:]\.]]*//g'
The dot will match any single character. Escaping it with the backslash will match only a literal dot.
Here is some awk example:
cat file:
some data ANS_LENGTH=266.50 other=22
not mye data=43
gnu awk (due to RS)
awk '/ANS_LENGTH/ {f=NR} f&&NR-1==f' RS="[ =]" file
266.50
awk '/ANS_LENGTH/ {getline;print}' RS="[ =]" file
266.50
Plain awk
awk -F"[ =]" '{for(i=1;i<=NF;i++) if ($i=="ANS_LENGTH") print $(i+1)}' file
266.50
awk '{for(i=1;i<=NF;i++) if ($i~"ANS_LENGTH") {split($i,a,"=");print a[2]}}' file
266.50

Resources