extract text between two words using sed - bash

I want to extract text in shell variable which is in between two matching words/characters, like below.
Input string-
extract='sometext Query State: FINISHED\n Query Status: OK\n soonnnnnnnnnnnn Query State: STARTING\n'
I want to extract the query state, which is in between the text 'Query State' and first occurrence of '\n'
I have used below sed expression-
query_state=$(echo $extract | sed 's/.*Query State: \(.*\)\\n .*/\1/')
but I get output as - FINISHED\n Query Status: OK,
basically, the above is giving eveything between the the words 'Query Status' and last occurence of '\n'.
So, I changed to sed expression like below to get output 'FINISHED'
query_state=$(echo $extract | sed 's/.*Query State: \(.*\)\\n Query Status.*/\1/')
But above expression is having hard dependency on the text 'Query Status'. How can I change the expression to get exactly at first occurence of '\n' ?
Update:I want to extract the query state, which is in between the first occurence of text 'Query State' followed by first occurrence of '\n'
-Thanks

grep solution (since you are only looking to find a match, you are not looking to edit anything):
$ echo "$extract"
sometext Query State: FINISHED\n Query Status: OK\n soonnnnnnnnnnnn
$ echo "$extract" | grep -oP '(?<=Query State: ).*?(?=\\n)'
FINISHED
Explanation:
-o Return only the matched substring (this will return all matches, one per line)
-P For perl-compatible regular expressions; needed for lookaround as well as lazy quantifier
(?<= ... ) lookbehind : The match should start at a position immediately following the last character (in this case, the space) between the opening sequence (?<= and the closing parenthesis.
.*? zero or more characters (any characters), as few as possible. *? is called lazy (or non-greedy) quantifier.
(?=\\n) lookahead : Similar to lookbehind. Backslash must be escaped.
EDIT:
If the "Query State: ..." fragment may appear at the very end of the string, not terminated by the \n marker, and if in that case the state must still be returned, the regular expression needs to be modified as follows:
$ echo $extract
sometext Query State: FINISHED
$ echo $extract | grep -oP '(?<=Query State: ).*?((?=\\n)|$)'
FINISHED
Notice the alternation in the lookahead: we are looking for the substring \n or the end of the input string; either one will work.

For a short case you can consider an extra call to sed:
echo "$extract" | sed -n 's/\\n/\n/g; s/.*Query State: //p'
Can you tell anything about the possible values of the state? Another solution can be something like
echo "$extract" | sed -r 's/.*Query State: ([A-Za-z ]*).*/\1/'

This works.
extract='sometext Query State: FINISHED\n Query Status: OK\n soonnnnnnnnnnnn'
echo "$extract" | sed 's/.*Query State: \([^\\n]\+\).*$/\1/'
Output
FINISHED
Works with awk
echo "$extract" | awk -F'[: \\\\n]+' '{print $4}'

Related

sed replace string with pipe and stars

I have the following string:
|**barak**.version|2001.0132012031539|
in file text.txt.
I would like to replace it with the following:
|**barak**.version|2001.01.2012031541|
So I run:
sed -i "s/\|\*\*$module\*\*.version\|2001.0132012031539/|**$module**.version|$version/" text.txt
but the result is a duplicate instead of replacing:
|**barak**.version|2001.01.2012031541|**barak**.version|2001.0132012031539|
What am I doing wrong?
Here is the value for module and version:
$ echo $module
barak
$ echo $version
2001.01.2012031541
Assumptions:
lines of interest start and end with a pipe (|) and have one more pipe somewhere in the middle of the data
search is based solely on the value of ${module} existing between the 1st/2nd pipes in the data
we don't know what else may be between the 1st/2nd pipes
the version number is the only thing between the 2nd/3rd pipes
we don't know the version number that we'll be replacing
Sample data:
$ module='barak'
$ version='2001.01.2012031541'
$ cat text.txt
**barak**.version|2001.0132012031539| <<<=== leave this one alone
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.0132012031539| <<<=== replace this one
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.0132012031539| <<<=== replace this one
One sed solution with -Extended regex support enabled and making use of a capture group:
$ sed -E "s/^(\|[^|]*${module}[^|]*).*/\1|${version}|/" text.txt
Where:
\| - first occurrence (escaped pipe) tells sed we're dealing with a literal pipe; follow-on pipes will be treated as literal strings
^(\|[^|]*${module}[^|]*) - first capture group that starts at the beginning of the line, starts with a pipe, then some number of non-pipe characters, then the search pattern (${module}), then more non-pipe characters (continues up to next pipe character)
.* - matches rest of the line (which we're going to discard)
\1|${version}| - replace line with our first capture group, then a pipe, then the new replacement value (${version}), then the final pipe
The above generates:
**barak**.version|2001.0132012031539|
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.01.2012031541| <<<=== replaced
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.01.2012031541| <<<=== replaced
An awk alternative using GNU awk:
awk -v mod="$module" -v vers="$version" -F \| '{ OFS=FS;split($2,map,".");inmod=substr(map[1],3,length(map[1])-4);if (inmod==mod) { $3=vers } }1' file
Pass two variables mod and vers to awk using $module and $version. Set the field delimiter to |. Split the second field into array map using the split function and using . as the delimiter. Then strip the leading and ending "**" from the first index of the array to expose the module name as inmod using the substr function. Compare this to the mod variable and if there is a match, change the 3rd delimited field to the variable vers. Print the lines with short hand 1
Pipe is only special when you're using extended regular expressions: sed -E
There's no reason why you need extended here, stick with basic regex:
sed "
# for lines matching module.version
/|\*\*$module\*\*.version|/ {
# replace the version
s/|2001.0132012031539|/|$version|/
}
" text.txt
or as an unreadable one-liner
sed "/|\*\*$module\*\*.version|/ s/|2001.0132012031539|/|$version|/" text.txt

Reverse four length of letters with sed in unix

How can I reverse a four length of letters with sed?
For example:
the year was 1815.
Reverse to:
the raey was 5181.
This is my attempt:
cat filename | sed's/\([a-z]*\) *\([a-z]*\)/\2, \1/'
But it does not work as I intended.
not sure it is possible to do it with GNU sed for all cases. If _ doesn't occur immediately before/after four letter words, you can use
sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
\b is word boundary, word definition being any alphabet or digit or underscore character. So \b will ensure to match only whole words not part of words
$ echo 'the year was 1815.' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
the raey was 5181.
$ echo 'two time five three six good' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
two emit evif three six doog
$ # but won't work if there are underscores around the words
$ echo '_good food' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
_good doof
tool with lookaround support would work for all cases
$ echo '_good food' | perl -pe 's/(?<![a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])(?!=[a-z0-9])/$4$3$2$1/gi'
_doog doof
(?<![a-z0-9]) and (?!=[a-z0-9]) are negative lookbehind and negative lookahead respectively
Can be shortened to
perl -pe 's/(?<![a-z0-9])[a-z0-9]{4}(?!=[a-z0-9])/reverse $&/gie'
which uses the e modifier to place Perl code in substitution section. This form is suitable to easily change length of words to be reversed
Possible shortest sed solution even if a four length of letters contains _s.
sed -r 's/\<(.)(.)(.)(.)\>/\4\3\2\1/g'
Following awk may help you in same. Tested this in GNU awk and only with provided sample Input_file
echo "the year was 1815." |
awk '
function reverse(val){
num=split(val, array,"");
i=array[num]=="."?num-1:num;
for(;i>q;i--){
var=var?var array[i]:array[i]
};
printf (array[num]=="."?var".":var);
var=""
}
{
for(j=1;j<=NF;j++){
printf("%s%s",j==NF||j==2?reverse($j):$j,j==NF?RS:FS)
}}'
This might work for you (GNU sed):
sed -r '/\<\w{4}\>/!b;s//\n&\n/g;s/^[^\n]/\n&/;:a;/\n\n/!s/(.*\n)([^\n])(.*\n)/\2\1\3/;ta;s/^([^\n]*)(.*)\n\n/\2\1/;ta;s/\n//' file
If there are no strings of the length required to reverse, bail out.
Prepend and append newlines to all required strings.
Insert a newline at the start of the pattern space (PS). The PS is divided into two parts, the first line will contain the current word being reversed. The remainder will contain the original line.
Each character of the word to be reversed is inserted at the front of the first line and removed from the original line. When all the characters in the word have been processed, the original word will have gone and only the bordering newlines will exist. These double newlines are then replaced by the word in the first line and the process is repeated until all words have been processed. Finally the newline introduced to separate the working line and the original is removed and the PS is printed.
N.B. This method may be used to reverse strings of varying string length i.e. by changing the first regexp strings of any number can be reversed. Also strings between two lengths may also be reversed e.g. /\<w{2,4}\>/ will change all words between 2 and 4 character length.
It's a recurrent problem so somebody created a bash command called "rev".
echo "$(echo the | rev) $(echo year | rev) $(echo was | rev) $(echo 1815 | rev)".
OR
echo "the year was 1815." | rev | tr ' ' '\n' | tac | tr '\n' ' '

bash script: how to insert text between two specific characters

For example, I have a file containing a line as below:
"abc":"def"
I need to insert 123 between "abc":" and def" so that the line will become: "abc":"123def".
As "abc" appears only once so I think I can just search it and do the insertion.
How to do this with bash script such as sed or awk?
AMD$ sed 's/"abc":"/&123/' File
"abc":"123def"
Match "abc":", then append this match with 123 (& will contain the matched string "abc":")
If you want to take care of space before and after :, you can use:
sed 's/"abc" *: *"/&123/'
For replacing all such patterns, use g with sed.
sed 's/"abc" *: *"/&123/g' File
sed:
$ sed -E 's/(:")(.*)/\1123\2/' <<<'"abc":"def"'
"abc":"123def"
(:") gets :" and put in captured group 1
(.*) gets the remaining portion and put in captured group 2
in the replacement, \1123\2 puts 123 between the groups
awk:
$ awk -F: 'sub(".", "&123", $2)' <<<'"abc":"def"'
"abc" "123def"
In the sub() function, the second ($2) field is being operated on, pattern is used as . (which would match "), and in the replacement the matched portion (&) is followed by 123.
echo '"abc":"def"'| awk '{sub(/def/,"123def")}1'
"abc":"123def"

Print word between two characters by going backward in the line

I having problems in extracting the word from a line. What i want is that it picks the first word before the symbol # but after the /. Which is the only delimiter that stand out.
A line looks like this:
,["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]
I want the word Programming.
To get that line i am using this which narrows it down.
sed -n '/.*picasa.*.jpg/p' 5743548866439293105
So i want it to pretty much find # and then go backward until it hit the first /. Then print it out. In this case the word should be Programming but could be anything.
I want it to be as short as possible and have experimented with
sed -n '/.*picasa.*.jpg/p' 5743548866439293105 | awk '$0=$2' FS="/" RS="[$#]"
You can do that with sed (slightly shortened for formatting but works on your original string as well):
pax> echo ',["https://p.g.com/111/Prog#574' | sed 's/^[^#]*\/\([^#]*\)#.*$/\1/'
Prog
pax>
Explaining in more detail:
/---+------------------> greedy capture up to '/'.
/ |
| | /------+---------> capture the stuff between '/' and '#'.
| |/ |
| || | /-+-----> everything from '#' to end of line.
| || |/ |
| || || |
's/^[^#]*\/\([^#]*\)#.*$/\1/'
||
\+---> replace with captured group.
It basically searches for an entire line that has the pattern you want (first # following a /), whilst capturing (with the \( and \) brackets) just the stuff between / and #.
The substitution then replaces the entire line with just that captured text you're interested in (via \1).
Using grep with some Perl regex extensions:
echo $string | grep -P -o "(?<=/)[^/]+(?=#)"
-P tells grep to use Perl extensions. -o tells grep to display only the matched text. To understand what gets matched, break the regex into three parts: (?<=/), [^/]+?, and (?=#). The first part says that the matched text must follow a '/', without including the '/' in the match. The second parts matches a string of non-'/' characters. The last part says that the matched text must be immediately followed by a '#', without including the '#' in the match.
Another grep, using the "\K" feature to "throw away" the match up to the last '/' before the '#':
# Match as much as possible up to a '/', but throw it away, then match as much as you can
# up to the first #
echo $string | grep -oP ".*/\K.+(?=#)"
Using cut and awk to get the first field (splitting on #) followed by the last field (splitting on /):
echo $string | cut -d# -f1 | awk -F/ '{print $NF}'
Using some temporary variables and bash's parameter expansion facilities:
$ FOO=["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]
$ BAR=${FOO%#*} # Strip the last # and everything after
$ echo $BAR
[https://picasaweb.google.com/111560558537332305125/Programming
$ BAZ=${BAR##*/} # Strip everything up to and including the last /
$ echo $BAZ
Programming
This might work for you:
sed '/.*\/\([^#]*\)#.*/{s//\1/;q};d' file

insert a string at specific position in a file by SED awk

I have a string which i need to insert at a specific position in a file :
The file contains multiple semicolons(;) i need to insert the string just before the last ";"
Is this possible with SED ?
Please do post the explanation with the command as I am new to shell scripting
before :
adad;sfs;sdfsf;fsdfs
string = jjjjj
after
adad;sfs;sdfsf jjjjj;fsdfs
Thanks in advance
This might work for you:
echo 'adad;sfs;sdfsf;fsdfs'| sed 's/\(.*\);/\1 jjjjj;/'
adad;sfs;sdfsf jjjjj;fsdfs
The \(.*\) is greedy and swallows the whole line, the ; makes the regexp backtrack to the last ;. The \(.*\) make s a back reference \1. Put all together in the RHS of the s command means insert jjjjj before the last ;.
sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/' filename
(substitute jjjjj with what you need to insert).
Example:
$ echo 'adad;sfs;sdfsf;fsdfs;' | sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/'
adad;sfs;sdfsfjjjjj;fsdfs;
Explanation:
sed finds the following pattern: \([^;]*\)\(;[^;]*;$\). Escaped round brackets (\(, \)) form numbered groups so we can refer to them later as \1 and \2.
[^;]* is "everything but ;, repeated any number of times.
$ means end of the line.
Then it changes it to \1jjjjj\2.
\1 and \2 are groups matched in first and second round brackets.
For now, the shorter solution using sed : =)
sed -r 's#;([^;]+);$#; jjjjj;\1#' <<< 'adad;sfs;sdfsf;fsdfs;'
-r option stands for extented Regexp
# is the delimiter, the known / separator can be substituted to any other character
we match what's finishing by anything that's not a ; with the ; final one, $ mean end of the line
the last part from my explanation is captured with ()
finally, we substitute the matching part by adding "; jjjj" ans concatenate it with the captured part
Edit: POSIX version (more portable) :
echo 'adad;sfs;sdfsf;fsdfs;' | sed 's#;\([^;]\+\);$#; jjjjj;\1#'
echo 'adad;sfs;sdfsf;fsdfs;' | sed -r 's/(.*);(.*);/\1 jjjj;\2;/'
You don't need the negation of ; because sed is by default greedy, and will pick as much characters as it can.
sed -e 's/\(;[^;]*\)$/ jjjj\1/'
Inserts jjjj before the part where a semicolon is followed by any number of non-semicolons ([^;]*) at the end of the line $. \1 is called a backreference and contains the characters matched between \( and \).
UPDATE: Since the sample input has no longer a ";" at the end.
Something like this may work for you:
echo "adad;sfs;sdfsf;fsdfs"| awk 'BEGIN{FS=OFS=";"} {$(NF-1)=$(NF-1) " jjjjj"; print}'
OUTPUT:
adad;sfs;sdfsf jjjjj;fsdfs
Explanation: awk starts with setting FS (field separator) and OFS (output field separator) as semi colon ;. NF in awk stands for number of fields. $(NF-1) thus means last-1 field. In this awk command {$(NF-1)=$(NF-1) " jjjjj" I am just appending jjjjj to last-1 field.

Resources