Extract string between quotes in a script - bash

My text:
(
"en-US"
)
What I need:
en-US
Currently I'm able to get it by piping it through
... | tr -d '[:space:]' | sed s/'("'// | sed s/'("'// | sed s/'")'//
I wonder if there is a simple way to extract the string between the quotes rather than chopping off the useless parts one by one.

... | grep -oP '(?<=").*(?=")'
Explanation:
-o: Only output matching string
-P: Use Perl style RegEx
(?<="): Lookbehind, so only match text that is preceded by a double quote
.*: Match any characters
(?="): Lookahead, so only match text that is followed by a double quote

With sed
echo '(
"en-US"
)' | sed -rn 's/.*"(.*)".*/\1/p'
with 2 commands
echo '(
"en-US"
)' | tr -d "\n" | cut -d '"' -f2

Could you please try the following, where var is the bash variable holding the sample value shown.
echo "$var" | awk 'match($0,/".*"/){print substr($0,RSTART+1,RLENGTH-2)}'
Explanation (the annotated form below is for explanation only):
echo "$var" | ##Print the variable var and pipe its output to awk as input.
awk ' ##Start the awk program.
match($0,/".*"/){ ##match() searches the line for the regex ".*", i.e. from the first " to the last " on the line; on success it sets the built-in variables RSTART and RLENGTH.
print substr($0,RSTART+1,RLENGTH-2) ##RSTART is the index where the match starts and RLENGTH is its length, so starting at RSTART+1 and taking RLENGTH-2 characters prints only the text between the quotes.
}' ##Close the awk program.
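One caveat with the regex above: .* is greedy, so on a line holding several quoted values the match runs from the first quote to the last one. Below is a minimal sketch of a variant that stops at the next quote by using [^"]* instead. The function names first_quoted_greedy and first_quoted are made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper names, for illustration only.

# Greedy version from the answer: matches from the first quote to the LAST quote on the line.
first_quoted_greedy() {
  awk 'match($0, /".*"/) { print substr($0, RSTART + 1, RLENGTH - 2) }'
}

# Variant: [^"]* cannot cross a quote, so the match ends at the NEXT quote.
first_quoted() {
  awk 'match($0, /"[^"]*"/) { print substr($0, RSTART + 1, RLENGTH - 2) }'
}

var='(
"en-US"
)'
echo "$var" | first_quoted_greedy             # en-US
echo "$var" | first_quoted                    # en-US
echo 'say "a" and "b"' | first_quoted_greedy  # a" and "b
echo 'say "a" and "b"' | first_quoted         # a
```

For the sample in the question both versions give the same result; they only differ once a line carries more than one quoted value.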

Consider using
... | grep -o '"[^"]\{1,\}"' | sed -e 's/^"//' -e 's/"$//'
grep extracts all non-empty substrings enclosed in quotes, one per line, and sed then removes the quotes on both ends.

And this one ?
... | grep '"' | cut -d '"' -f 2
It works if you have just one quoted value per line.
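If the text is already in a bash variable, parameter expansion can do the same job without any external command. This is a minimal sketch, assuming the value contains at least one pair of double quotes; between_quotes is a hypothetical helper name, not something from the answers above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text between the first pair of double quotes.
between_quotes() {
  local s=$1
  [[ $s == *'"'*'"'* ]] || return 1   # fail unless there are at least two quotes
  s=${s#*\"}                          # drop everything up to and including the first quote
  printf '%s\n' "${s%%\"*}"           # drop the next quote and everything after it
}

var='(
"en-US"
)'
between_quotes "$var"   # en-US
```

Because the pattern * also matches newlines in parameter expansion, the multi-line sample needs no tr or sed preprocessing.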

Related

How do I get the value present in first double quotes?

I'm currently writing a bash script to get the first value among the many comma separated strings.
I have a file that looks like this -
name
things: "water bottle","40","new phone cover",10
place
I just need to return the value in first double quotes.
water bottle
The value in first double quotes can be one word/two words. That is, water bottle can be sometimes replaced with pen.
I tried -
awk '/:/ {print $2}'
But this just gives
water
I wanted to split it on commas, but there's a colon (:) after things, so I'm not sure how to separate it.
How do I get the value present in first double quotes?
EDIT:
SOLUTION:
I used the below code since I particularly wanted to use awk -
awk '/:/' test.txt | cut -d\" -f2
A solution using the cut utility could be
cut -d\" -f2 infile > outfile
Using GNU awk, you could make use of a capture group and a negated character class so that the match does not cross a comma, as that is the field delimiter.
awk 'match($0, /^[^",:]*:[^",]*"([^"]*)"/, a) {print a[1]}' file
Output
water bottle
The pattern matches
^ Start of string
[^",:]*: Optionally match any value except " and , and :, then match :
[^",]* Optionally match any value except " and ,
"([^"]*)" Capture in group 1 the value between double quotes
If the value is always between double quotes, a shorter option is to set the field separator to " and check whether field 1 contains a colon. Note that this also prints water bottle when the line has only an opening double quote and no closing one.
awk -F'"' '$1 ~ /:/ {print $2}' file
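The unclosed-quote caveat is easy to see with a quick test. The sketch below runs the command on the sample from the question and on a line whose quote is never closed, then adds a stricter variant. The NF >= 3 check and both function names are assumptions of this sketch, not part of the answer.

```shell
#!/usr/bin/env bash
sample='name
things: "water bottle","40","new phone cover",10
place'

# Loose version, as in the answer.
first_value_loose() { awk -F'"' '$1 ~ /:/ { print $2 }'; }

# Stricter variant (assumption of this sketch): with " as the separator,
# NF >= 3 means the line holds at least two double quotes, so the first
# value is actually closed.
first_value_strict() { awk -F'"' '$1 ~ /:/ && NF >= 3 { print $2 }'; }

first_value_loose  <<< "$sample"                 # water bottle
first_value_loose  <<< 'things: "water bottle'   # water bottle (quote never closed)
first_value_strict <<< "$sample"                 # water bottle
first_value_strict <<< 'things: "water bottle'   # prints nothing
```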
With your shown samples, please try following awk code.
awk '/^things:/ && match($0,/"[^"]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Explanation: In awk program checking if line starts with things: AND using match function to match everything between 1st and 2nd " and printing them accordingly.
Solution 1: awk
You can use a single awk command:
awk -F\" 'index($1, ":"){print $2}' test.txt > outfile
See the online demo.
The -F\" sets the field separator to a " char, index($1, ":") condition makes sure Field 1 contains a : char (no regex needed) and then {print $2} prints the second field value.
Solution 2: awk + cut
You can use awk + cut:
awk '/:/' test.txt | cut -d\" -f2 > outfile
With awk '/:/' test.txt, you will extract line(s) containing : char, and then the piped cut -d\" -f2 command will split the string with " as a separator and return the second item. See the online demo.
Solution 3: sed
Alternatively, you can use sed:
sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' file > outfile
See the online demo:
#!/bin/bash
s='name
things: "water bottle","40","new phone cover",10
place'
sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' <<< "$s"
# => water bottle
The command means
-n - the option suppresses the default line output
^[^"]*"\([^"]*\)".* - a POSIX BRE regex pattern that matches
^ - start of string
[^"]* - zero or more chars other than "
" - a " char
\([^"]*\) - Group 1 (\1 refers to this value): any zero or more chars other than "
".* - a " char and the rest of the string.
\1 replaces the match with Group 1 value
p - only prints the result of a successful substitution.

how to iterate over awk result

I have the following string, from which I want to retrieve the ID for eu-central-1 only:
ca-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd
so what I want as an output is: ami-bbbb
The way I am doing it right now is:
echo ca-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd |
awk -F',' '{ print $2 }' |
awk -F':' '{print $2}'
The problem with this approach is that I am explicitly specifying that eu-central-1 is the second ($2) result of the first awk call, but the entries might sometimes be in a different order, so I might need to iterate over this result. Is it possible to achieve this in one line, without knowing beforehand where in the string eu-central-1:ami-bbbb will land?
Use grep like so:
echo your_string | grep -Po '\beu-central-1:\K[^,]+'
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only, 1 match/line, not the entire lines.
\b : Word boundary.
\K : Pretend that the match starts at this point. Specifically, ignore the preceding part of the regex when printing the match.
[^,]+ : Any characters that are not a comma, one or more occurrences.
SEE ALSO:
grep manual
I'd prefer grep as in Timur Shtatland's answer. But for completeness here is an alternative:
You can set awk's record separator (a newline by default) to a comma and then print only the record whose first field is eu-central-1.
awk -F: -v RS=, '$1 == "eu-central-1" { print $2 }'
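One detail to watch with RS=, is the final record: if the input ends with a newline, that newline stays inside the last record's $2, so an extra blank line is printed when the wanted entry is the last one. Here is a minimal sketch that avoids this by feeding the string with printf '%s' and passes the region in as a variable. The name region_ami and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: region_ami REGION LIST
region_ami() {
  # printf '%s' writes the list without a trailing newline, so the final
  # record does not carry a stray newline inside $2.
  printf '%s' "$2" | awk -F: -v RS=, -v region="$1" '$1 == region { print $2 }'
}

list='ca-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd'
region_ami eu-central-1 "$list"   # ami-bbbb
region_ami eu-west-1    "$list"   # ami-dddd (last entry, no extra blank line)
```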
With GNU sed or OSX/BSD sed for -E:
$ sed -E 's/(^|.*,)eu-central-1:([^,]*).*/\2/' file
ami-bbbb
One sed idea:
id='eu-central-1'
# desired id in middle of input string:
echo 'a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd' | \
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p"
Where:
-En - -E enables extended regex support, -n suppresses the automatic printing of each line
^(.*,)* - [capture group #1] - matches start of line plus zero or more instances of characters ending with a comma (,)
^(.*,)*${id}: - capture group #1 followed by ${id} + :
([^,]*) - [capture group #2] - matches everything up to, but not including, the next comma (,)
(,.*)*$ - [capture group #3] - matches zero or more instances of comma followed by other characters to end of line
\2/p - replace the whole line with capture group #2 and print it
Alternatively, using a here-string to eliminate the pipe/sub-process call:
id='eu-central-1'
# desired id at start of input string:
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p" <<< 'eu-central-1:ami-bbbb,a-central-1:ami-aaaa,eu-north-1:ami-cccc,eu-west-1:ami-dddd'
# desired id at end of input string:
sed -En "s/^(.*,)*${id}:([^,]*)(,.*)*$/\2/p" <<< 'a-central-1:ami-aaaa,eu-north-1:ami-cccc,eu-west-1:ami-dddd,eu-central-1:ami-bbbb'
All three generate:
ami-bbbb
Defining , as line (record) separator and : as field separator, a simple condition over $1 prints the result.
echo -n a-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd |
awk 'BEGIN{RS=","; FS=":"}$1=="eu-central-1"{print $2}'
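Since the question asks about iterating, the same lookup can also be written as a plain bash loop with no awk at all. This is a sketch under the assumption that the list is a single line of comma-separated region:id pairs; lookup_ami is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lookup_ami REGION LIST
lookup_ami() {
  local region=$1 list=$2 pair
  local -a pairs
  IFS=, read -r -a pairs <<< "$list"       # split the list on commas
  for pair in "${pairs[@]}"; do
    if [[ ${pair%%:*} == "$region" ]]; then # text before the first colon
      printf '%s\n' "${pair#*:}"            # text after the first colon
      return 0
    fi
  done
  return 1                                  # region not present
}

amis='ca-central-1:ami-aaaa,eu-central-1:ami-bbbb,eu-north-1:ami-cccc,eu-west-1:ami-dddd'
lookup_ami eu-central-1 "$amis"   # ami-bbbb
```

The non-zero return status makes a missing region easy to detect in a script, e.g. lookup_ami us-east-1 "$amis" || echo "not found".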

getting first part of a string that has two parts

I have a string that has two parts (path and owner) both separated by a space.
This is the input file input.txt
/dir1/dir2/file1 #owner1
/dir1/dir2/foo\ bar #owner2
I want to extract all the paths to a separate output file - output.txt
I cannot use a space as the delimiter, since paths can also contain filenames with spaces in them. The expected output is:
/dir1/dir2/file1
/dir1/dir2/foo\ bar
Here is a different way of doing it with rev + GNU grep:
rev file | grep -oP '.*# \K.*' | rev
OR
rev file | grep -oP '.*#\s+\K.*' | rev
Or, as a simpler solution:
awk -F' #' '{print $1}' Input_file
Assuming spaces that shouldn't be parsed as delimiters are escaped by a backslash as in your sample, you could use the following regex:
^(\\ |[^ ])*
For instance with grep:
grep -oE '^(\\ |[^ ])*'
Starting at the beginning of the line, the regex matches any number of items that are either a backslash followed by a space or any character other than a space, and it stops at the first occurrence of a space that isn't preceded by a backslash.
You can try it here.
I would trim the ending part with sed.
sed 's/ [^ ]*$//' /path/to/file
This will match from the end of the line:
(blank) matches the space character
[^ ]* matches the longest string that contains no spaces, i.e. #owner1
$ matches the end of the line
And they will be replaced by nothing, which will act as if you deleted the matched string.
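To see this on the sample from the question, here is a small sketch that feeds the two lines through a pipe instead of reading input.txt, and also shows the same trimming done with bash parameter expansion. The helper names sample_lines and strip_owner are made up for the example, and both variants assume the owner part itself contains no space.

```shell
#!/usr/bin/env bash
# Sample lines from the question, produced inline instead of read from input.txt.
sample_lines() {
  printf '%s\n' '/dir1/dir2/file1 #owner1' '/dir1/dir2/foo\ bar #owner2'
}

# The sed command from the answer: delete the last space and everything after it.
sample_lines | sed 's/ [^ ]*$//'

# Same idea in pure bash: ${line% *} removes the shortest suffix that starts with a space.
strip_owner() {
  local line
  while IFS= read -r line; do
    printf '%s\n' "${line% *}"
  done
}
sample_lines | strip_owner
```

Both commands print /dir1/dir2/file1 and /dir1/dir2/foo\ bar, one per line; read -r keeps the backslash in the second path intact.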
A one-liner would do it:
while read p _; do printf '%q\n' "$p"; done <input.txt >output.txt
You can put them in an array and process them:
mapfile test < input.txt; test=("${test[@]% *}")
echo "${test[@]}"
echo "${test[0]}"
echo "${test[1]}"
You can try with simple awk
awk ' { $NF=""; print } '
Try it here https://ideone.com/W8J1ZO

Use sed to transform a space-separated list into a comma-separated list with quotes around each element

I have this
a/b/Test b/c/Test c/d/Test
and want to transform it into:
"a/b/Test", "b/c/Test", "c/d/Test"
I know I can use this (here: path=a/b/Test b/c/Test c/d/Test)
test=$(echo $path | sed 's/ /", "/g')
to transform it into
a/b/Test", "b/c/Test", "c/d/Test
But here I am missing the first and last ".
I don't quite know how to use sed for this. Can I somehow change it and use the anchors ^ and $ to get the first and last part of the string and add " there?
sed 's/.*/"&"/g ; s/ /", "/g' filename
You may use awk:
s='a/b/Test b/c/Test c/d/Test'
awk -v OFS=', ' '{for (i=1; i<=NF; i++) $i = "\"" $i "\""} 1' <<< "$s"
"a/b/Test", "b/c/Test", "c/d/Test"
awk is easier:
awk -v OFS=", " -v q='"' '{for(i=1;i<=NF;i++)$i=q $i q}7'
You may just add double quotes if you have a single line text:
test="a/b/Test b/c/Test c/d/Test"
test='"'$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g')'"'
echo "$test"
See the online demo
If you have multiple lines use
test=$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/g; s/$/"/g')
test=$(echo "$test" | sed -E 's/[[:space:]]+/",&"/g; s/^|$/"/g')
See this online demo
The [[:space:]]\{1,\} POSIX BRE pattern (equal to [[:space:]]+ POSIX ERE) matches one or more whitespace chars and & in the replacement pattern inserts this matched value back in the resulting string.
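For a single line held in a variable, the same result is also reachable without sed, using a bash array and printf. This is an alternative technique rather than the method from the answers, sketched under the assumption that the list is non-empty and the elements contain no whitespace; quote_list is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: quote_list "a b c"  ->  "a", "b", "c"
quote_list() {
  local -a items
  local out
  read -r -a items <<< "$1"               # split on whitespace into an array
  printf -v out '"%s", ' "${items[@]}"    # the format is reused for every element
  printf '%s\n' "${out%, }"               # drop the trailing ", "
}

path='a/b/Test b/c/Test c/d/Test'
test=$(quote_list "$path")
echo "$test"   # "a/b/Test", "b/c/Test", "c/d/Test"
```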

How can I get the index of a given occurrence of a character that is repeated several times in a text line, using a shell (bash) script

I have a Text string like below
"/path/to/log/file/LOG_FILE.log.2013-10-02-15:2013-10-02 15:46:57.809 INFO - TTT005|Receive|0000293|N~0000284~YOS~TTT005~ ~000~YC~|YOS TYOS-YCUPDT1-H 20131002154657669284YCARR TTT005 Y0TD04 |1|0150520106050|001|051052020603|003|015030010101502702060510520101|000||000|| "
Here "|" is repeated several times within the string and I need to get the index of 4th occurrence of "|" character using shell-script (BASH) command. I tried to find a way using grep command's options.
Thanks.
Using awk you can do:
awk -F '|' '{print index($0, $5)-1}' file
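This prints the 1-based character position of the fourth pipe on each line of the file, provided the text of the fifth field does not also occur earlier in the line.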
grep can print the byte-offset; when used with -o it prints the byte-offset of the matching part.
$ string="/path/to/log/file/LOG_FILE.log.2013-10-02-15:2013-10-02 15:46:57.809 INFO - TTT005|Receive|0000293|N~0000284~YOS~TTT005~ ~000~YC~|YOS TYOS-YCUPDT1-H 20131002154657669284YCARR TTT005 Y0TD04 |1|0150520106050|001|051052020603|003|015030010101502702060510520101|000||000||"
$ grep -ob "[^|]*" <<< "${string}" | sed '5!d' | cut -d: -f1
132
Alternatively, without using grep:
$ newstring=$(echo "${string}" | cut -d\| -f5-)
$ echo $(( ${#string} - ${#newstring} ))
132
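The subtraction idea generalizes to any occurrence number with a short pure-bash function. The sketch below assumes a 1-based position is wanted, like the 132 above, and that the delimiter is a fixed string; nth_index is a hypothetical name. Unlike grep -o, which does not print empty matches, it also counts empty fields between adjacent delimiters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: nth_index STRING DELIM N
# Prints the 1-based position of the N-th occurrence of DELIM in STRING,
# or returns 1 if there are fewer than N occurrences.
nth_index() {
  local s=$1 ch=$2 n=$3 rest=$1 i
  for ((i = 0; i < n; i++)); do
    [[ $rest == *"$ch"* ]] || return 1
    rest=${rest#*"$ch"}          # cut through the next occurrence of DELIM
  done
  printf '%d\n' $(( ${#s} - ${#rest} ))
}

nth_index 'a|b|c|d|e' '|' 4   # 8
nth_index 'a||b|c' '|' 2      # 3
```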

Resources