Extract string from brackets - bash

I'm pretty new at bash so this is a pretty noob question..
Suppose I have a string:
string1 [string2] string3 string4
I would like to extract string2 from the square brackets; but the brackets may be surrounding any other string at any other time.
How would I use sed, etc, to do this? Thanks!

Try this:
echo "$str" | cut -d "[" -f2 | cut -d "]" -f1

Here's one way using awk:
echo "string1 [string2] string3 string4" | awk -F'[][]' '{print $2}'
This sed option also works:
echo "string1 [string2] string3 string4" | sed 's/.*\[\([^]]*\)\].*/\1/g'
Here's a breakdown of the sed command:
s/ <-- this means it should perform a substitution
.* <-- this means match zero or more characters
\[ <-- this means match a literal [ character
\( <-- this starts saving the pattern for later use
[^]]* <-- this means match zero or more characters that are not a ] character
the outer [ and ] signify that this is a character class
having the ^ character as the first character in the class means "not"
\) <-- this closes the saving of the pattern match for later use
\] <-- this means match a literal ] character
.* <-- this means match zero or more characters
/\1 <-- this means replace everything matched with the first saved pattern
(the match between "\(" and "\)" )
/g <-- this means the substitution is global (all occurrences on the line)
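One consequence of the breakdown above is that the leading .* is greedy, so when a line holds several bracketed groups the command keeps the last one. A minimal sketch to illustrate, assuming a hypothetical helper name last_bracketed and using sed -n with the p flag so that lines without brackets print nothing:

```shell
# Hypothetical helper (not from the original answer): print the text inside
# the last [...] pair of each input line; lines without brackets are skipped.
last_bracketed() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

printf '%s\n' 'string1 [string2] string3 string4' | last_bracketed   # string2
printf '%s\n' 'a [first] b [second] c' | last_bracketed              # second
printf '%s\n' 'no brackets here' | last_bracketed                    # (no output)
```

The g flag from the original command is dropped here because the pattern already consumes the whole line, so there is never a second match to replace.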

In pure bash:
STR="string1 [string2] string3 string4"
STR=${STR#*\[}
STR=${STR%\]*}
echo "$STR"
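The same parameter expansions can be wrapped in a small function that also reports when no bracket pair is present. This is only a sketch: the name between_brackets is made up for illustration, and it uses %% so that it stops at the first ] after the first [ rather than the last one.

```shell
# Hypothetical helper: print the text between the first "[" and the next "]".
# The body runs in a subshell ( ... ) so the variable s does not leak out.
between_brackets() (
    s=$1
    case $s in
        *\[*\]*) ;;            # require a "[" followed later by a "]"
        *) exit 1 ;;           # no bracket pair: fail without output
    esac
    s=${s#*\[}                 # drop everything up to and including the first "["
    s=${s%%\]*}                # drop the first "]" and everything after it
    printf '%s\n' "$s"
)

between_brackets 'string1 [string2] string3 string4'   # string2
between_brackets 'a [x] b [y]'                         # x
```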

Specify awk multiple delimiters with -F '[delimiters]'
If the delimiters are square brackets, put them back to back like this ][
awk -F '[][]' '{print $2}'
otherwise you will have to escape them
awk -F '[\\[\\]]' '{print $2}'
Other examples to get the value between the brackets:
echo "string1 (string2) string3" | awk -F '[()]' '{print $2}'
echo "string1 {string2} string3" | awk -F '[{}]' '{print $2}'

Here's another one , but it takes care of multiple occurrences, eg
$ echo "string1 [string2] string3 [string4 string5]" | awk -vRS="]" -vFS="[" '{print $2}'
string2
string4 string5
The simple logic is this: you split on "]", go through the resulting pieces looking for a "[", then split each such piece on "[" and take the last field. In Python:
for item in "string1 [string2] string3 [string4 string5]".split("]"):
    if "[" in item:
        print(item.split("[")[-1])
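For comparison, the same "every bracketed group" result can be sketched in shell with grep -o, which prints each match on its own line. Note that -o is an extension found in GNU and BSD grep rather than a POSIX option, and the function name all_bracketed is just an illustrative choice.

```shell
# Hypothetical helper: print the contents of every [...] group, one per line.
all_bracketed() {
    # grep -o emits each "[...]" match separately; sed then strips the brackets.
    grep -o '\[[^]]*\]' | sed 's/^\[//; s/\]$//'
}

printf '%s\n' 'string1 [string2] string3 [string4 string5]' | all_bracketed
# string2
# string4 string5
```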

Here is an awk example, but I'm matching on parentheses, which also makes it more obvious how -F works.
echo 'test (lskdjf)' | awk -F'[()]' '{print $2}'

Another awk:
$ echo "string1 [string2] string3 [string4]" |
awk -v RS=[ -v FS=] 'NR>1{print $1}'
string2
string4

Read file in which the delimiter is square brackets:
$ cat file
123;abc[202];124
125;abc[203];124
127;abc[204];124
To print the value present within the brackets:
$ awk -F '[][]' '{print $2}' file
202
203
204
At first sight, the delimiter used in the above command might be confusing. It's simple: two delimiters are used in this case, one is [ and the other is ]. Since the delimiters are themselves square brackets and must be placed within square brackets, it looks tricky at first.
Note: If square brackets are the delimiters, they must be written this way, meaning ] first, followed by [. Writing the delimiter as -F '[[]]' will be interpreted differently.
See this link: http://www.theunixschool.com/2012/07/awk-10-examples-to-read-files-with.html
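To see how the '[][]' separator carves up a line, it can help to print every field with its index. A small sketch, with show_fields as a hypothetical name:

```shell
# Hypothetical helper: list each field awk produces when both "[" and "]"
# act as field separators.
show_fields() {
    awk -F '[][]' '{ for (i = 1; i <= NF; i++) printf "field %d: %s\n", i, $i }'
}

printf '%s\n' '123;abc[202];124' | show_fields
# field 1: 123;abc
# field 2: 202
# field 3: ;124
```

Field 2 is the text between the brackets, which is why the answers above print $2.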

Inline solution could be:
a="first \"Foo1\" and second \"Foo2\""
echo ${a#*\"} | { read b; echo ${b%%\"*}; }
You can test in single line:
a="first \"Foo1\" and second \"Foo2\""; echo ${a#*\"} | { read b; echo ${b%%\"*}; }
Output: Foo1
Example with brackets:
a="first [Foo1] and second [Foo2]"
echo ${a#*[} | { read b; echo ${b%%]*}; }
That in one line:
a="first [Foo1] and second [Foo2]"; echo ${a#*[} | { read b; echo ${b%%]*}; }
Output: Foo1

Related

Extract String before bracket and create new line

I have data in below format
ABC-ERW 12344 ZYX 12345
FFANKN 2345 QW [123457, 89053]
FAFDJ-ER 1234 MNO [6532, 789, 234578]
I want to create the data in below format using sed or awk.
ABC-ERW 12344 ZYX 12345
FFANKN 2345 QW 123457
FFANKN 2345 QW 89053
FAFDJ-ER 1234 MNO 6532
FAFDJ-ER 1234 MNO 789
FAFDJ-ER 1234 MNO 234578
I can extract the data before bracket but I don't know how to concatenate the same with data from bracket repeatedly.
My Effort :--
#!/bin/bash
while IFS= read -r line
do
echo "$line"
cnt=`echo $line | grep -o "\[" | wc -l`
if [ $cnt -gt 0 ]
then
startstr=`echo $line | awk -F[ '{print $1}'`
echo $startstr
intrstr=`echo $line | cut -d "[" -f2 | cut -d "]" -f1`
echo $intrstr
else
echo "$line" >> newfile.txt
fi
done < 1.txt
I am able to get the first part and also keep the rows without "[" in a new file, but I don't know how to get the values inside "[" and append each of them at the end, since the number of values inside "[" keeps changing.
Regards
With your shown samples, please try the following awk code.
awk '
match($0,/\[[^]]*\]$/){
num=split(substr($0,RSTART+1,RLENGTH-2),arr,", ")
for(i=1;i<=num;i++){
print substr($0,1,RSTART-1) arr[i]
}
next
}
1
' Input_file
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
match($0,/\[[^]]*\]$/){ ##Using match function to match from [ till ] at the end of line.
num=split(substr($0,RSTART+1,RLENGTH-2),arr,", ") ##Splitting matched values by regex above and passing into array named arr with delimiters comma and space.
for(i=1;i<=num;i++){ ##Running for loop till value of num.
print substr($0,1,RSTART-1) arr[i] ##printing sub string before matched along with element of arr with index of i.
}
next ##next will skip all further statements from here.
}
1 ##1 will print current line.
' Input_file ##Mentioning Input_file name here.
Suggesting simple awk script:
awk 'NR==1{print}{for (i=2;i<NF;i++)print $1, $i}' FS="( \\\[)|(, )|(\\\]$)" input.1.txt
Explanation:
FS="( \\\[)|(, )|(\\\]$)" Sets the awk field separator to " [", ", ", or "]" at the end of the line.
This makes the interesting fields $2 through $(NF-1) (the field after the closing ] is empty), each of which is appended to $1.
NR==1{print} prints the first line as it is.
{for (i=2;i<NF;i++)print $1, $i} for every line with a bracketed list, prints field $1 followed by the current field.
This might work for you (GNU sed):
sed -E '/(.*)\[([^,]*), /{s//\1\2\n\1[/;P;D};s/[][]//g' file
Match the string up to the opening square bracket, and also the string after it up to the comma and space.
Replace the entire match with the leading and trailing matched strings, followed by a newline and the leading matched string again (with the opening bracket restored).
Print/delete the first line and repeat.
The final pass fails to match because there is no trailing comma and space; at that point the remaining opening and closing square brackets are removed.
Alternative:
sed -E ':a;s/([^\n]*)\[([^,]*), /\1\2\n\1[/;ta;s/[][]//g' file
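The same expansion can also be sketched in plain POSIX shell, closer in spirit to the loop in the question. This is an illustration under assumptions, not a replacement for the awk and sed answers: the name expand_bracket_list is made up, the list is assumed to sit at the end of the line, and items are assumed to be separated by a comma plus one space as in the sample.

```shell
# Hypothetical helper: for lines ending in "[a, b, c]", print one line per
# item with the text before the bracket repeated; other lines pass through.
# The body runs in a subshell so its variables stay local.
expand_bracket_list() (
    while IFS= read -r line; do
        case $line in
            *\[*\])
                prefix=${line%%\[*}        # text before the first "["
                list=${line#*\[}           # text after the first "["
                list=${list%\]}            # drop the closing "]"
                while [ -n "$list" ]; do
                    item=${list%%,*}       # up to the next comma
                    printf '%s%s\n' "$prefix" "${item# }"
                    case $list in
                        *,*) list=${list#*,} ;;   # more items remain
                        *) list= ;;               # that was the last item
                    esac
                done
                ;;
            *) printf '%s\n' "$line" ;;
        esac
    done
)

printf '%s\n' 'ABC-ERW 12344 ZYX 12345' 'FFANKN 2345 QW [123457, 89053]' |
    expand_bracket_list
# ABC-ERW 12344 ZYX 12345
# FFANKN 2345 QW 123457
# FFANKN 2345 QW 89053
```

A line with an empty list ("[]") produces no output in this sketch.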

getting first part of a string that has two parts

I have a string that has two parts (path and owner) both separated by a space.
This is the input file input.txt
/dir1/dir2/file1 #owner1
/dir1/dir2/foo\ bar #owner2
I want to extract all the paths to a separate output file - output.txt
I cannot use space as delimiter since paths can also have filenames with space and delimiter in them
/dir1/dir2/file1
/dir1/dir2/foo\ bar
Here could be a different way of doing it with rev + GNU grep:
rev file | grep -oP '.*# \K.*' | rev
OR
rev file | grep -oP '.*#\s+\K.*' | rev
With original simple solution go with:
awk -F' #' '{print $1}' Input_file
Assuming spaces that shouldn't be parsed as delimiters are escaped by a backslash as in your sample, you could use the following regex :
^(\\ |[^ ])*
For instance with grep :
grep -oE '^(\\ |[^ ])*'
The regex matches, from the start of the line, any number of either a backslash followed by a space or any character other than a space, and stops at the first occurrence of a space that isn't preceded by a backslash.
I would trim the ending part with sed.
sed 's/ [^ ]*$//' /path/to/file
This will match from the end of the line:
(blank) matches the space character
[^ ]* matches the longest string that contains no spaces, i.e. #owner1
$ matches the end of the line
And they will be replaced by nothing, which will act as if you deleted the matched string.
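The same "cut off the last space-separated part" idea can be sketched without sed, using a shell suffix-removal expansion. The helper name strip_owner is hypothetical, and the sketch assumes every line ends with a space followed by a # owner field, as in the sample input.

```shell
# Hypothetical helper: remove the trailing " #owner" part of each line.
strip_owner() {
    # read -r keeps backslashes, so "foo\ bar" survives unchanged
    while IFS= read -r line; do
        printf '%s\n' "${line% #*}"   # drop from the last " #" to end of line
    done
}

printf '%s\n' '/dir1/dir2/file1 #owner1' '/dir1/dir2/foo\ bar #owner2' | strip_owner
# /dir1/dir2/file1
# /dir1/dir2/foo\ bar
```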
A one-line would do it:
while read p _; do printf '%q\n' "$p"; done <input.txt >output.txt
You can put them in an array and process
mapfile test < input.txt; test=("${test[@]% *}")
echo "${test[@]}"
echo "${test[0]}"
echo "${test[1]}"
You can try with simple awk
awk ' { $NF=""; print } '
Try it here https://ideone.com/W8J1ZO

Extract string between quotes in a script

my text-
(
"en-US"
)
what i need -
en-US
currently I'm able to get it by piping it with
... | tr -d '[:space:]' | sed s/'("'// | sed s/'("'// | sed s/'")'//
I wonder if there is a simple way to extract the string between the quotes rather than chopping off useless parts one by one
... | grep -oP '(?<=").*(?=")'
Explanation:
-o: Only output matching string
-P: Use Perl style RegEx
(?<="): Lookbehind, so only match text that is preceded by a double quote
.*: Match any characters
(?="): Lookahead, so only match text that is followed by a double quote
With sed
echo '(
"en-US"
)' | sed -rn 's/.*"(.*)".*/\1/p'
with 2 commands
echo '(
"en-US"
)' | tr -d "\n" | cut -d '"' -f2
Could you please try the following, where var is the bash variable holding the sample value shown.
echo "$var" | awk 'match($0,/".*"/){print substr($0,RSTART+1,RLENGTH-2)}'
Explanation: Following is only for explanation purposes.
echo "$var" | ##Using echo to print variable named var and using |(pipe) to send its output to awk command as an Input.
awk ' ##Starting awk program from here.
match($0,/".*"/){ ##Using awk's match function to match from the first " to the last " on the line (.* is greedy); on a match, the built-in variables RSTART and RLENGTH are set.
print substr($0,RSTART+1,RLENGTH-2) ##RSTART is the start index of the match and RLENGTH is its length; printing the substring that starts at RSTART+1 with length RLENGTH-2 yields only the text between the quotes.
}' ##Closing awk command here.
Consider using
... | grep -o '"[^"]\{1,\}"' | sed -e 's/^"//' -e 's/"$//'
grep will extract all substrings between quotes (excluding empty ones), the sed later will remove the quotes on both ends.
And this one ?
... | grep '"' | cut -d '"' -f 2
It works if you have just 1 quoted value by line.
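When a line may hold more than one quoted value, one possible sketch is to split on the double quote and print every even-numbered field. The helper name quoted_values is an assumption for illustration, and the approach presumes quotes are balanced on each line.

```shell
# Hypothetical helper: print every double-quoted value, one per line.
# With '"' as the separator, fields 2, 4, 6, ... are the quoted parts.
quoted_values() {
    awk -F '"' '{ for (i = 2; i <= NF; i += 2) print $i }'
}

printf '(\n"en-US"\n)\n' | quoted_values
# en-US
printf '%s\n' 'say "one" and "two"' | quoted_values
# one
# two
```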

Use sed to transform a comma space separated list into a comma separated list with quotes around each element

I have this
a/b/Test b/c/Test c/d/Test
and want to transform it into:
"a/b/Test", "b/c/Test", "c/d/Test"
I know I can use this (here: path=a/b/Test b/c/Test c/d/Test)
test=$(echo $path | sed 's/ /", "/g')
to transform it into
a/b/Test", "b/c/Test", "c/d/Test
But here I am missing the first and last ".
I don't quite know how to use sed for this. Can I somehow change it and use the anchors ^ and $ to get the first and last part of the string and add " there?
sed 's/.*/"&"/g ; s/ /", "/g' filename
You may use awk:
s='a/b/Test b/c/Test c/d/Test'
awk -v OFS=', ' '{for (i=1; i<=NF; i++) $i = "\"" $i "\""} 1' <<< "$s"
"a/b/Test", "b/c/Test", "c/d/Test"
awk is easier:
awk -v OFS=", " -v q='"' '{for(i=1;i<=NF;i++)$i=q $i q}7'
You may just add double quotes if you have a single line text:
test="a/b/Test b/c/Test c/d/Test"
test='"'$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g')'"'
echo "$test"
If you have multiple lines use
test=$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/g; s/$/"/g')
test=$(echo "$test" | sed -E 's/[[:space:]]+/",&"/g; s/^|$/"/g')
The [[:space:]]\{1,\} POSIX BRE pattern (equal to [[:space:]]+ POSIX ERE) matches one or more whitespace chars and & in the replacement pattern inserts this matched value back in the resulting string.
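A slightly different sketch of the same idea: wrap every run of non-whitespace characters in quotes first, then turn each run of whitespace into a comma and a space. The function name quote_words is hypothetical, and this assumes the elements themselves contain no whitespace and the line has no leading or trailing blanks.

```shell
# Hypothetical helper: quote each whitespace-separated word and join with ", ".
quote_words() {
    # 1st command: wrap each run of non-space characters in double quotes
    # 2nd command: replace each run of whitespace with ", "
    sed 's/[^[:space:]][^[:space:]]*/"&"/g; s/[[:space:]][[:space:]]*/, /g'
}

printf '%s\n' 'a/b/Test b/c/Test c/d/Test' | quote_words
# "a/b/Test", "b/c/Test", "c/d/Test"
```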

awk to ignore leading and trailing space and blank lines and commented lines if any from a file

Need help on awk
awk to ignore leading and trailing space and blank lines and commented lines if any from a file
Here you go,
grep "MyText" FromMyLog.log |awk -F " " '{print $2}'|awk -F "#" '{print $1}'
Here MyText is the key to grep from file FromMyLog.log
-F sets the field separator, here a single space between the quotes.
'{print $2}' prints the 2nd field of each matching line; use $1, $2, etc. as required.
awk -F "#" '{print $1}' keeps only the text before the first #, which strips a trailing comment.
This is just a hint for you, Modify the code with your requirements. This works for me while grep.
grep -v '^$\|^\s*\#' <filename> or egrep -v '^[[:space:]]*$|^ *#' <file_name> (if white spaces)
I think this is what you were asking for:
$> echo -e ' abc \t
\t efg
# alskdjfl
#
awk
# askdfh
' |
awk '
# match if the first non-space character is not a hash sign
/^[[:space:]]*[^#[:space:]]/ {
# delete any spaces from start and end of line
sub(/^[[:space:]]*/, "");
sub(/[[:space:]]*$/, "");
print
}'
abc
efg
awk
This can be folded onto a single line if so needed. Any problems, an actual example of the input (in a code block in your question) would be helpful.
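An alternative sketch of the same filter trims first and tests afterwards, which keeps the patterns simple. The name clean_lines is made up for illustration, and "whitespace" is assumed here to mean spaces and tabs only, to stay portable across awk implementations.

```shell
# Hypothetical helper: trim leading/trailing blanks, then drop lines that are
# empty or start with "#".
clean_lines() {
    awk '
        { gsub(/^[ \t]+|[ \t]+$/, "") }   # trim both ends of the line
        $0 == "" || /^#/ { next }         # skip blank lines and comments
        { print }
    '
}

printf ' abc \t\n\t efg\n # alskdjfl\n#\n\n   awk\n' | clean_lines
# abc
# efg
# awk
```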
Here's one way to extract required content ignoring spaces
FILE CONTENT
Server: 192.168.XX.XX
Address 1: 192.168.YY.YY
Name: central.google.com
Now to extract the server's address without spaces.
COMMAND
awk -F':' '/Server/ {print $2}' YOURFILENAME | tr -s " "
option -s for squeezing the repetition of spaces.
which gives,
192.168.XX.XX
Here, notice that there is one leading space in the address.
To completely ignore spaces you can change that to,
awk -F':' '/Server/ {print $2}' YOURFILENAME | tr -d '[:space:]'
option -d for removing particular characters, which is [:space:] here.
which gives,
192.168.XX.XX
without any leading or trailing spaces.
tr is a Unix utility for translating, deleting, or squeezing repeated characters; the name is short for "translate".
Examples:
tr '[:lower:]' '[:upper:]'
gives,
YOUAREAWESOME
for
youareawesome
Hope that helps.
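If only the blanks around the value should go, the trimming can also be done inside awk itself, which leaves the trailing newline in place. A minimal sketch, assuming a hypothetical helper name server_address and input shaped like the FILE CONTENT sample above:

```shell
# Hypothetical helper: print the value of the "Server:" line with leading and
# trailing spaces/tabs removed.
server_address() {
    awk -F ':' '/^Server/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

printf '%s\n' 'Server:    192.168.XX.XX' 'Address 1: 192.168.YY.YY' | server_address
# 192.168.XX.XX
```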

Resources