Same pattern in grep and sed select different values - bash

I want to create a sed pattern that matches type="..." whatever the quoted value is.
For this I tried the pattern
type=".*\?"
echo 'aa type="none" stretchChildren="first"' | sed s/'type=".*\?"'/hello/
Above is the sed command which prints
aa hello
Which means it selects 'type="none" stretchChildren="first"' for 'type=".*\?"'
Now below is the grep command using same pattern on same string
echo 'aa type="none" stretchChildren="first"' | grep -oP 'type=".*?"'
It gives output
type="none"
I don't know what I am missing in the sed pattern.
Can someone help me out here?
Output of sed should be
aa hello stretchChildren="first"

sed doesn't have non-greedy pattern matching, so using *? or *\? won't work.
If you want the same output as grep, use a negated bracket expression, [^"]+, in place of .*? so the match cannot run past the closing quote:
sed -r 's/type="[^"]+"/hello/'
[ and ] delimit a bracket expression (a set of characters), and a leading ^ negates it, so [^"] means any character that is not a ".
For OSX use -E instead of -r.
(-E also works in GNU sed. Older releases did not document it in --help or in man sed, which is why -r is used here; newer releases do document it.)
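To make the difference concrete, here is a minimal sketch that runs the greedy pattern and the negated bracket expression on the string from the question. It assumes an empty type="" should also match, hence [^"]* rather than [^"]+, and it sticks to plain BRE so neither -r nor -E is needed. The variable names are illustrative only.

```shell
line='aa type="none" stretchChildren="first"'

# Greedy: .* runs to the LAST double quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/type=".*"/hello/')

# Bounded: [^"]* cannot cross a double quote, so it stops at the first closing one.
bounded=$(printf '%s\n' "$line" | sed 's/type="[^"]*"/hello/')

printf '%s\n' "$greedy" "$bounded"
```

The first line printed is aa hello and the second is aa hello stretchChildren="first", which is the output the question asks for.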

Related

Convert text from HttpStatus.NOT_FOUND into status().isNotFound() in bash

I want to convert the text in a bash variable i.e. HttpStatus.NOT_FOUND into status().isNotFound() and I had accomplished this by using sed:
result=HttpStatus.NOT_FOUND
result=$(echo $result | cut -d'.' -f2- | sed -r 's/(^|_)([A-Z])/\L\2/g' | sed -E 's/([[:lower:]])|([[:upper:]])/\U\1\L\2/g')
echo "status().is$result()"
Output:
status().isNotFound()
As you can see here I'm using 2 sed commands.
Is there a way to achieve the same result using 1 sed or any other simpler way?
Because the replacement inserts a lot of new text, the sed command can be written out in full as below. Pass the variable's content through a pipe; cut is not needed.
result=HttpStatus.NOT_FOUND
echo "$result" |
sed -E 's/^.*(Status)\.([[:upper:]])([[:upper:]]+)_([[:upper:]])([[:upper:]]+)$/\L\1().is\u\2\L\3\u\4\L\5()/g'
The idea is to apply GNU sed's case-conversion operators to the captured groups. We capture:
(Status) in \1, which is lowercased in full (\L) and followed by the literal text ().is
The next captured group, \2, is the first uppercase character after the ., which is N, and the rest of the word, OT, goes in \3. \2 is kept uppercase (\u) and \3 is lowercased (\L).
The same sequence is repeated for the next word, FOUND, with \4 and \5.
The \L, \u are case conversion operators available in GNU sed.
If you are looking to modify only the part beyond the . to CamelCase, then you can use sed as
result=HttpStatus.NOT_FOUND
result=$(echo "$result" |
sed -E 's/^.*\.([[:upper:]])([[:upper:]]+)_([[:upper:]])([[:upper:]]+)/\u\1\L\2\u\3\L\4/g')
echo "status().is$result()"
This might work for you (GNU sed):
<<<"$result" sed -r 's/.*(Status)\.(.*)_(.*)/\L\1().is\u\2\u\3()/'
Use pattern matching/grouping/back references. The majority of the RHS is lowercase, so use the \L metacharacter to convert from Status... to lowercase and uppercase just the start of words using \u which converts only the next character to uppercase.
N.B. \L (and likewise \U) converts all following characters to lowercase (uppercase) until \E or the opposite operator is seen; \l and \u override this for the next character only.
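As a small illustration of the scoping described in the N.B., the sketch below contrasts \U...\E with \u. It assumes GNU sed, since these operators are GNU extensions, and the input string is made up for the example.

```shell
# Requires GNU sed: \U, \u and \E are GNU extensions to the replacement syntax.
words='not found'

# \U uppercases everything that follows until \E switches conversion off.
shout=$(printf '%s\n' "$words" | sed -E 's/(.*) (.*)/\U\1\E \2/')

# \u uppercases only the next character of the replacement.
title=$(printf '%s\n' "$words" | sed -E 's/(.*) (.*)/\u\1 \u\2/')

printf '%s\n' "$shout" "$title"
```

This prints NOT found followed by Not Found.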
Since you are using GNU sed (-r switch), here's another sed solution,
just a little bit more concise, and locale safe:
$ result=HttpStatus.NOT_FOUND
$ echo "$result" | sed -r 's/^.*([A-Z][a-z]*)\.([a-zA-Z])([a-zA-Z]*)_([a-zA-Z])([a-zA-Z]*)/\L\1().is\u\2\L\3\U\4\L\5()/'
status().isNotFound()
An even more concise sed version is:
echo "$result" | sed -r 's/^.*([A-Z][a-z]*)\.([a-zA-Z]*)_([a-zA-Z]*)/\L\1().is\u\2\u\3()/'
Both are case-insensitive for the second part; for example, .nOt_fOuNd also works here.
And a GNU awk solution:
echo "$result" | awk 'function cap(str){return (toupper(substr(str,1,1)) tolower(substr(str,2)))}match($0, /([A-Z][a-z]*)\.([a-zA-Z]*)_([a-zA-Z]*)/, m){print tolower(m[1]) "().is" cap(m[2]) cap(m[3]) "()"}'
You can use the sed option "-e" to chain multiple expressions in a single sed invocation.
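One way the -e suggestion could look in practice is sketched below: the first expression replaces cut by dropping everything up to the dot, and the second converts each underscore-separated word. It assumes GNU sed (for \L) and an input made only of uppercase words joined by underscores; the variable name camel is illustrative.

```shell
# Requires GNU sed for \L.
result=HttpStatus.NOT_FOUND

camel=$(printf '%s\n' "$result" |
  sed -E -e 's/^[^.]*\.//' \
         -e 's/(^|_)([A-Z])([A-Z]*)/\2\L\3/g')

printf 'status().is%s()\n' "$camel"
```

This prints status().isNotFound(). Because the second expression uses the g flag, the same command should also handle names with more than two words.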

sed: remove all characters except for last n characters

I am trying to remove every character in a text string except for the last 11 characters. The string is Sample Text_that-would$normally~be,here--pe_-l4_mBY and what I want to end up with is just -pe_-l4_mBY.
Here's what I've tried:
$ cat food
Sample Text_that-would$normally~be,here--pe_-l4_mBY
$ cat food | sed 's/^.*(.{3})$/\1/'
sed: 1: "s/^.*(.{3})$/\1/": \1 not defined in the RE
Please note that the text string isn't really stored in a file, I just used cat food as an example.
OS is macOS High Sierra 10.13.6 and bash version is 3.2.57(1)-release
You can use this sed with a capture group:
sed -E 's/.*(.{11})$/\1/' file
-pe_-l4_mBY
Basic regular expressions (used by default by sed) require both the parentheses in the capture group and the braces in the brace expression to be escaped. ( and { are otherwise treated as literal characters to be matched.
$ cat food | sed 's/^.*\(.\{3\}\)$/\1/'
mBY
By contrast, explicitly requesting sed to use extended regular expressions with the -E option reverses the meaning, with \( and \{ being the literal characters.
$ cat food | sed -E 's/^.*(.{3})$/\1/'
mBY
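Applied to the 11 characters the question actually asks for, the two syntaxes can be compared side by side. This is a minimal sketch with illustrative variable names; both commands should behave the same on BSD (macOS) and GNU sed.

```shell
s='Sample Text_that-would$normally~be,here--pe_-l4_mBY'

# BRE (default): groups and intervals are written \( \) and \{ \}.
bre=$(printf '%s\n' "$s" | sed 's/^.*\(.\{11\}\)$/\1/')

# ERE (-E): the same operators are written without backslashes.
ere=$(printf '%s\n' "$s" | sed -E 's/^.*(.{11})$/\1/')

printf '%s\n' "$bre" "$ere"
```

Both lines of output are -pe_-l4_mBY.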
Try this also:
grep -o -E '.{11}$' food
grep, like sed, accepts an arbitrary number of file name arguments, so there is no need for a separate cat. (See also useless use of cat.)
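You can use tail or parameter expansion:
string='Sample Text_that-would$normally~be,here--pe_-l4_mBY'
echo "$string" | tail -c 12
echo "${string#"${string%???????????}"}"
-pe_-l4_mBY
-pe_-l4_mBY
tail -c counts bytes and echo appends a newline, so 12 bytes are needed to keep 11 characters; the expansion uses one ? per character to keep.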
also with rev/cut/rev
$ echo abcdefghijklmnopqrstuvwxyz | rev | cut -c1-11 | rev
pqrstuvwxyz
man rev => rev - reverse lines characterwise

Proper use of capture groups in SED command

I need to convert a string "1,234" =to=> 1234.
this string is just a part of a bigger line. There are thousands of such lines in the file.
I have written a sed command which is not working as I expect it to.
echo \"1,234\" | sed 's/\("\)\([0-9]+\)\(,\)\([0-9]+\)\("\)/\2\4/g'
As far as I understand, in this code,
\1 is "
\2 is the digits before comma
\3 is ,
\4 is the digits after comma
I expect this command to output 1234 which should be \2\4. But it just yields back "1,234". So I think it is not being parsed properly. Some help would be appreciated.
I would suggest you use POSIX Extended Regular Expressions (ERE), where you don't have to escape parentheses and the repetition operator. To enable ERE in sed, you can use the -E switch (or -r in GNU sed). Your expression will then look like this:
$ echo '"1,234"' | sed -E 's/"([0-9]+),([0-9]+)"/\1\2/g'
1234
For completeness, your original BRE expression works if you escape the + (\+ is a GNU extension to BRE):
echo \"1,234\" | sed 's/\("\)\([0-9]\+\)\(,\)\([0-9]\+\)\("\)/\2\4/g'
1234
Your second and fourth groups contain [0-9]+, which matches any digit followed by a plus sign.
It looks like you meant [0-9]\+, to match one or more digits.
In passing: there's no need to group the parts you'll not be using (\1, \3 and \5). You can simplify to:
echo \"1,234\" | sed 's/"\([0-9]\+\),\([0-9]\+\)"/\1\2/g'
If you're finding all those \ hard to handle, you could use Extended Regular Expression syntax, with the -E flag:
echo \"1,234\" | sed -E 's/"([0-9]+),([0-9]+)"/\1\2/g'
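Since the quoted number is only part of a longer line, a sketch on a fuller line may help. The sample line is invented for illustration. The second command shows the same substitution in portable BRE, where \{1,\} is the POSIX way to say "one or more".

```shell
line='id=7 price="1,234" qty="12,000" note="n/a"'

# ERE: + and ( ) need no backslashes.
ere=$(printf '%s\n' "$line" | sed -E 's/"([0-9]+),([0-9]+)"/\1\2/g')

# Portable BRE: \{1,\} means "one or more" (\+ is a GNU extension).
bre=$(printf '%s\n' "$line" | sed 's/"\([0-9]\{1,\}\),\([0-9]\{1,\}\)"/\1\2/g')

printf '%s\n' "$ere" "$bre"
```

Both commands print id=7 price=1234 qty=12000 note="n/a". Only quoted values of the form digits, comma, digits are touched.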

Adding double quotes to beginning, end and around commas in bash variable

I have a shell script that accepts a parameter that is comma delimited,
-s 1234,1244,1567
That is passed to a curl PUT json field. Json needs the values in a "1234","1244","1567" format.
Currently, I am passing the parameter with the quotes already in it:
-s "\"1234\",\"1244\",\"1567\"", which works, but the users are complaining that it's too much typing and hard to do. So I'd like to take a comma-delimited list like the one at the top and programmatically insert the quotes.
Basically, I want a parameter to be passed in as 1234,2345 and end up as a variable that is "1234","2345"
I've come to read that easiest approach here is to use sed, but I'm really not familiar with it and all of my efforts are failing.
You can do this in BASH:
$> arg='1234,1244,1567'
$> echo "\"${arg//,/\",\"}\""
"1234","1244","1567"
awk to the rescue!
$ awk -F, -v OFS='","' -v q='"' '{$1=$1; print q $0 q}' <<< "1234,1244,1567"
"1234","1244","1567"
or shorter with sed
$ sed -r 's/[^,]+/"&"/g' <<< "1234,1244,1567"
"1234","1244","1567"
translating this back to awk
$ awk '{print gensub(/([^,]+)/,"\"\\1\"","g")}' <<< "1234,1244,1567"
"1234","1244","1567"
You can use this:
QV=$(echo 1234,2345,56788 | sed -e 's/^/"/' -e 's/$/"/' -e 's/,/","/g')
Result:
echo "$QV"
"1234","2345","56788"
This adds a double quote at the start and at the end, and replaces every comma with quote/comma/quote globally.
easy to do with sed
$ echo '1234,1244,1567' | sed 's/[0-9]*/"\0"/g'
"1234","1244","1567"
[0-9]* zero or more consecutive digits; since * is greedy, it will try to match as many as possible
"\0" double quote the matched pattern, entire match is by default saved in \0
g global flag, to replace all such patterns
In case, \0 isn't recognized in some sed versions, use & instead:
$ echo '1234,1244,1567' | sed 's/[0-9]*/"&"/g'
"1234","1244","1567"
Similar solution with perl
$ echo '1234,1244,1567' | perl -pe 's/\d+/"$&"/g'
"1234","1244","1567"
Note: Using * instead of + with perl will give
$ echo '1234,1244,1567' | perl -pe 's/\d*/"$&"/g'
"1234""","1244""","1567"""
""$
I think this difference between sed and perl is similar to this question: GNU sed, ^ and $ with | when first/last character matches
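The sed/perl difference comes from how each tool treats empty matches. One way to avoid depending on that behavior is to require at least one digit. A minimal sketch in plain BRE, with illustrative variable names:

```shell
list='1234,1244,1567'

# [0-9][0-9]* requires at least one digit, so an empty match is impossible
# and the result does not depend on how the tool treats empty matches.
quoted=$(printf '%s\n' "$list" | sed 's/[0-9][0-9]*/"&"/g')

printf '%s\n' "$quoted"
```

This prints "1234","1244","1567".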
Using sed:
$ echo 1234,1244,1567 | sed 's/\([0-9]\+\)/\"\1\"/g'
"1234","1244","1567"
ie. replace all strings of numbers with the same strings of numbers quoted using backreferencing (\1).
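For context, here is a sketch of how the quoting step might sit inside the script before the curl call. The field name ids and the array shape of the payload are assumptions made for illustration; the question does not show the real JSON.

```shell
ids='1234,1244,1567'   # what the user would type after -s

# Quote every run of non-comma characters.
quoted=$(printf '%s\n' "$ids" | sed 's/[^,][^,]*/"&"/g')

# "ids" is a hypothetical field name, used for illustration only.
payload=$(printf '{"ids":[%s]}' "$quoted")

printf '%s\n' "$payload"
```

This prints {"ids":["1234","1244","1567"]}. Using [^,] instead of [0-9] means values that are not purely numeric are quoted too.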

Case insensitive search matching with sed?

I'm trying to use sed to extract the text between two words, such as "Account" and "Recognized", and I'd like the search to be case-insensitive. So I tried the I flag, but I get this error message:
cat Security.txt | sed -n "/Account/,/Recognized/pI" | sed -e '1d' -e '$d'
sed: -e expression #1, char 24: extra characters after command
Avoid useless use of cat
/pattern/I is how to specify case-insensitive matching in sed
sed -n "/Account/I,/Recognized/Ip" Security.txt | sed -e '1d' -e '$d'
You can use single sed command to achieve the same:
sed -n '/account/I,/recognized/I{/account/I!{/recognized/I!p}}' Security.txt
Or awk
awk 'BEGIN{IGNORECASE=1} /account/{f=1; next} /recognized/{f=0} f' Security.txt
Reference:
How to select lines between two patterns?
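IGNORECASE is specific to GNU awk. Where another awk is installed, a variant with tolower() may be an option. The sketch below uses made-up sample data and a hypothetical flag variable named inside.

```shell
sample='header
ACCOUNT start
line one
line two
recognized end
footer'

between=$(printf '%s\n' "$sample" | awk '
  tolower($0) ~ /recognized/ { inside = 0 }   # stop before printing the end line
  inside                                       # print lines while inside the range
  tolower($0) ~ /account/    { inside = 1 }   # start after the start line
')

printf '%s\n' "$between"
```

This prints line one and line two, leaving out both marker lines.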
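Use:
sed -n "/Account/I,/Recognized/Ip"
i.e. put I directly after each address and before p (Ip instead of pI). I modifies the address regex, not the p command, so every pattern that should ignore case needs its own I.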
You have useless use of cat where you should've fed the file directly to sed. Below could be a way of doing it.
$ cat file.txt
Some stuff Account sllslsljjs Security.
Another stuff account name and ffss security.
$ sed -nE 's/^.*account[[:blank:]]*(.*)[[:blank:]]*security.*$/\1/pI' file.txt
sllslsljjs
name and ffss
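The first [[:blank:]]* is greedy and strips the spaces before the required text. Because (.*) is greedy too, any spaces just before security can end up inside the captured group rather than being stripped by the second [[:blank:]]*. The -E option enables the use of extended regular expressions.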
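If trailing spaces in the captured text matter, one option is to force the group to end in a non-blank character. A sketch assuming GNU sed (for the I flag of the s command) and the two sample lines above:

```shell
# Requires GNU sed for the I (ignore-case) flag of s.
sample='Some stuff Account sllslsljjs Security.
Another stuff account name and ffss security.'

# (.*[^[:blank:]]) must end in a non-blank, so trailing spaces stay outside the group.
extracted=$(printf '%s\n' "$sample" |
  sed -nE 's/^.*account[[:blank:]]+(.*[^[:blank:]])[[:blank:]]+security.*$/\1/pI')

printf '%s\n' "$extracted"
```

This prints sllslsljjs and name and ffss, each without a trailing space.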
