Sed regex, extracting part of a string in Mac terminal - macos

I have sample data like "(stuff/thing)" and I'm trying to extract "thing".
I'm doing this in the terminal on OSX and I can't quite seem to get this right.
Here's the last broken attempt
echo '(stuff/thing)' | sed -n 's/\((.*)\)/\1/p'

I would say:
$ echo '(stuff/thing)' | sed -n 's#.*/\([^)]*\))#\1#p'
thing
To build this up, I start by removing everything up to the last slash:
$ echo '(stuff/thing)' | sed -n 's#.*/##p'
thing)
Note that I use # as the sed delimiter for better readability.
Next, I want to get rid of everything from the ) onwards. For this, we capture the block before it with \([^)]*\), match the literal ), and print the captured block back with \1.
So all together this is doing:
.*/\([^)]*\))#\1
.*/         everything up to a /
\([^)]*\)   capture group: all but )
)           the literal closing )
\1          print the captured group

To provide an awk alternative to fedorqui's helpful answer:
awk makes it easy to parse lines into fields based on separators:
$ echo '(stuff/thing)' | awk -F'[()/]' '{print $3}'
thing
-F[()/] specifies that any of the characters ( ) / should serve as a field separator when breaking each input line into fields.
$3 refers to the 3rd field (thing is the 3rd field, because the line starts with a field separator, which implies that field 1 ($1) is the empty string before it).
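To make the field numbering concrete, here is a small sketch that prints every field awk sees for the sample input. The variable name fields is only for illustration.

```shell
# Print each field with its index; the empty $1 and $4 come from the
# separators at the very start and very end of the line.
fields=$(echo '(stuff/thing)' |
  awk -F'[()/]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')
echo "$fields"
# $1=[]
# $2=[stuff]
# $3=[thing]
# $4=[]
```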
As for why your sed command didn't work:
Since you're not using -E, you must use basic regexes (BREs), where, counter-intuitively, parentheses must be escaped to be special - you have it the other way around.
The main problem, however, is that in order to output only part of the line, you must match ALL of it, and replace it with the part of interest.
With a BRE, that would be:
echo '(stuff/thing)' | sed -n 's/^.*\/\(.*\))$/\1/p'
With an ERE (extended regex), it would be:
echo '(stuff/thing)' | sed -En 's/^.*\/(.*)\)$/\1/p'
Also note that both commands work as-is with GNU sed, so the problem is not Mac-specific (but note that the -E option to activate EREs is an alias there for the better-known -r).
That said, regex dialects do differ across implementations; GNU sed generally supports extensions to the POSIX-mandated BREs and EREs.
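As a quick side-by-side check of the two dialects, the sketch below runs both forms on the sample input and stores the results. The variable names input, bre and ere are only for illustration, and -E is assumed to be available in your sed (it is in both BSD and current GNU sed).

```shell
input='(stuff/thing)'

# BRE (default): \( \) delimit the group, a bare ) is a literal parenthesis.
bre=$(printf '%s\n' "$input" | sed -n 's/^.*\/\(.*\))$/\1/p')

# ERE (-E): ( ) delimit the group, \) is a literal parenthesis.
ere=$(printf '%s\n' "$input" | sed -En 's/^.*\/(.*)\)$/\1/p')

echo "BRE: $bre"
echo "ERE: $ere"
```

Both lines should report thing, which shows that the only difference between the two commands is which form of parenthesis is escaped.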

I would do it in 2 easy parts - remove everything up to and including the slash and then everything from the closing parenthesis onwards:
echo '(stuff/thing)' | sed -e 's/.*\///' -e 's/).*//'
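If the value is already in a shell variable, the same two steps can probably be done without sed at all, using POSIX parameter expansion. This is only a sketch; the variable name s is an assumption for the example.

```shell
s='(stuff/thing)'
s=${s##*/}       # remove the longest prefix ending in /   -> thing)
s=${s%%')'*}     # remove the longest suffix starting at ) -> thing
echo "$s"
```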

Related

"sed" doesn't match pattern

I'm trying to format cut, paste output but sed not working.
file.txt
Apple
Banana
Apple
Banana
Orange
Apple
Orange
code.sh
cut -f2 file.txt | sort | uniq | sed 's/^\|$/#/g'| paste -sd,\& -
expected output / output on ubuntu
#Apple#,#Banana#&#Orange#
getting output / output on macos
Apple,Banana&Orange
Note: The code works on Ubuntu, but on MacOS it doesn't.
This can be done in a single gnu-awk:
awk '!seen[$1]++{} END {
PROCINFO["sorted_in"]="@ind_str_asc"
for (i in seen)
s = s (s == "" ? "" : (++j==1?",":"&")) "#" i "#"
print s
}' file
#Apple#,#Banana#&#Orange#
On OS X, I have GNU awk installed via Homebrew.
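If GNU awk is not an option, roughly the same result can be had by letting sort -u do the sorting and de-duplication and using only POSIX awk features for the joining. This is a sketch under that assumption; the sample data is fed in with printf instead of file.txt, and the variable name result is only for illustration.

```shell
result=$(printf '%s\n' Apple Banana Apple Banana Orange Apple Orange |
  sort -u |
  awk '{
    # No separator before the first item, "," before the second, "&" after that.
    s = s (NR == 1 ? "" : (NR == 2 ? "," : "&")) "#" $0 "#"
  }
  END { print s }')
echo "$result"
# #Apple#,#Banana#&#Orange#
```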
As mentioned elsewhere, BSD sed doesn't support \|. Instead of replacing ^ and $, you can substitute # around the whole line.
sort -u file.txt | sed 's/.*/#&#/' | paste -sd,'&' -
As far as I know, BSD/Mac sed doesn't support \|. See sed not giving me correct substitute operation for newline with Mac - differences between GNU sed and BSD / OSX sed for details.
As an alternative, you can use ERE instead of BRE. I checked this on Linux; apparently it still doesn't work on Mac (see also: MacOS sed: match either beginning or end).
$ echo 'Apple' | sed -E 's/^|$/#/g'
#Apple#
# workaround for Mac
$ echo 'Apple' | sed -e 's/^/#/' -e 's/$/#/'
#Apple#
Instead of sort+uniq+sed, you can also use awk (but note that awk solution shown here removes duplicates while preserving original order, doesn't sort the input):
$ awk '!seen[$0]++{print "#" $0 "#"}' ip.txt
#Apple#
#Banana#
#Orange#
Change $0 to $2 if you want only the second field, based on your use of cut
A simple way to do it using the sed command:
sed -E 's/[[:alnum:]]+/#&#/'
-E enables POSIX ERE (extended regular expressions).
[[:alnum:]]+ matches the alphanumeric characters (in ASCII, equivalent to [A-Za-z0-9]); the plus (+) means one or more.
& in the replacement refers to the text matched by the pattern, which we surround with #.

Clean output using sed

I have a file that begins with this kind of format
INFO|NOT-CLONED|/folder/another-folder/another-folder|last-folder-name|
What I need is to read the file and get this output:
INFO|NOT-CLONED|last-folder-name
I have this so far:
cat clone_them.log | grep 'INFO|NOT-CLONED' | sed -E 's/INFO\|NOT-CLONED\|(.*)/g'
But is not working as intended
NOTE: the last "another-folder" and "last-folder-name" are the same
If you want a sed solution:
$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p' file
INFO|NOT-CLONED|last-folder-name
How it works:
-E
Use extended regex
-n
Don't print unless we explicitly tell it to.
s/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p
Look for lines that include INFO|NOT-CLONED| (save this in group 1) followed by anything, .*, followed by | followed by any characters not |, [^|]* (saved in group 2), followed by | at the end of the line. The replacement text is group 1 followed by group 2.
The p option tells sed to print the line if the match succeeds. Since the substitution only succeeds for lines that contain INFO|NOT-CLONED|, this eliminates the need for an extra grep process.
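To see the filtering effect of -n together with p, here is a small sketch with two made-up log lines, only one of which starts with INFO|NOT-CLONED|. The sample paths and the variable name cleaned are assumptions for the example.

```shell
cleaned=$(printf '%s\n' \
  'INFO|NOT-CLONED|/folder/a/last-folder-name|last-folder-name|' \
  'INFO|CLONED|/folder/b/other|other|' |
  sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p')

# Only the line where the substitution succeeded is printed.
echo "$cleaned"
# INFO|NOT-CLONED|last-folder-name
```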
Variation: Returning just the last-folder-name
To just get the last-folder-name without the INFO|NOT-CLONED, we need only remove \1 from the output:
$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\2/p' file
last-folder-name
Since we no longer need the first capture group, we could simplify and remove the now unneeded parens so that the only capture group is the last folder name:
$ sed -En 's/INFO\|NOT-CLONED\|.*\|([^|]*)\|$/\1/p' file
last-folder-name
It's simpler in awk, as the input file is properly delimited by the | symbol. You need to tell awk that the input fields are separated by | and that the output should also remain separated by |, using FS and OFS respectively.
awk 'BEGIN{FS=OFS="|"}/INFO\|NOT-CLONED/{print $1,$2,$(NF-1)}' clone_them.log
INFO|NOT-CLONED|last-folder-name
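The reason for $(NF-1) rather than $NF is the trailing | on each line, which leaves an empty last field. A small sketch to check this on the sample line (the variable names line, nf and last are only for illustration):

```shell
line='INFO|NOT-CLONED|/folder/another-folder/another-folder|last-folder-name|'

# The trailing | produces an empty fifth field, so NF is 5.
nf=$(printf '%s\n' "$line" | awk -F'|' '{ print NF }')

# The folder name is therefore the second-to-last field.
last=$(printf '%s\n' "$line" | awk -F'|' '{ print $(NF-1) }')

echo "$nf $last"
# 5 last-folder-name
```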

Convert text from HttpStatus.NOT_FOUND into status().isNotFound() in bash

I want to convert the text in a bash variable i.e. HttpStatus.NOT_FOUND into status().isNotFound() and I had accomplished this by using sed:
result=HttpStatus.NOT_FOUND
result=$(echo $result | cut -d'.' -f2- | sed -r 's/(^|_)([A-Z])/\L\2/g' | sed -E 's/([[:lower:]])|([[:upper:]])/\U\1\L\2/g')
echo "status().is$result()"
Output:
status().isNotFound()
As you can see here I'm using 2 sed commands.
Is there a way to achieve the same result using 1 sed or any other simpler way?
Since it involves a lot of new text insertion in the replacement part, the sed command can be written in detail as below. Just pass the variable content over a pipe without using cut
result=HttpStatus.NOT_FOUND
echo "$result" |
sed -E 's/^.*(Status)\.([[:upper:]])([[:upper:]]+)_([[:upper:]])([[:upper:]]+)$/\L\1().is\u\2\L\3\u\4\L\5()/g'
The idea is add the case conversion functions of GNU sed on the captured groups. So we capture
(Status) in \1 in which we just lowercase the entire string and then append a ().is to the result
The next captured group, \2, is the first uppercase character following the ., which is N, and the rest of the word, OT, is in \3. We keep the second group as is and lowercase the third group.
The same sequence as above is repeated for the next word FOUND in \4 and \5.
The \L, \u are case conversion operators available in GNU sed.
If you are looking to modify only the part beyond the . to CamelCase, then you can use sed as
result=HttpStatus.NOT_FOUND
result=$(echo "$result" |
sed -E 's/^.*\.([[:upper:]])([[:upper:]]+)_([[:upper:]])([[:upper:]]+)/\u\1\L\2\u\3\L\4/g')
echo "status().is$result()"
This might work for you (GNU sed):
<<<"$result" sed -r 's/.*(Status)\.(.*)_(.*)/\L\1().is\u\2\u\3()/'
Use pattern matching/grouping/back references. The majority of the RHS is lowercase, so use the \L metacharacter to convert from Status... to lowercase and uppercase just the start of words using \u which converts only the next character to uppercase.
N.B. \L and likewise \U convert all following characters to lowercase/uppercase until \E or another \U/\L is seen; \l and \u only interrupt this for the next character.
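A small sketch of these operators in isolation may help. It assumes GNU sed, since BSD/macOS sed does not support the case-conversion escapes; the sample strings and variable names are made up for the example.

```shell
# GNU sed only: \L lowercases everything that follows until \E ends it.
lower_first=$(echo 'HELLO WORLD' | sed -E 's/(.*) (.*)/\L\1\E \2/')
echo "$lower_first"
# hello WORLD

# \u uppercases just the next character; \L stays in effect for the rest.
camel=$(echo 'Status.NOT_FOUND' | sed -E 's/(.*)\.(.*)_(.*)/\L\1.\u\2\u\3/')
echo "$camel"
# status.NotFound
```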
Since you are using GNU sed (-r switch), here's another sed solution,
just a little bit more concise, and locale safe:
$ result=HttpStatus.NOT_FOUND
$ echo "$result" | sed -r 's/^.*([A-Z][a-z]*)\.([a-zA-Z])([a-zA-Z]*)_([a-zA-Z])([a-zA-Z]*)/\L\1().is\u\2\L\3\U\4\L\5()/'
status().isNotFound()
An even more concise way of sed is:
echo "$result" | sed -r 's/^.*([A-Z][a-z]*)\.([a-zA-Z]*)_([a-zA-Z]*)/\L\1().is\u\2\u\3()/'
They both are case insensitive for the second part, for example .nOt_fOuNd also works here.
And a GNU awk solution:
echo "$result" | awk 'function cap(str){return (toupper(substr(str,1,1)) tolower(substr(str,2)))}match($0, /([A-Z][a-z]*)\.([a-zA-Z]*)_([a-zA-Z]*)/, m){print tolower(m[1]) "().is" cap(m[2]) cap(m[3]) "()"}'
You can use the sed option "-e" to concatenate multiple expressions.
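For systems without GNU sed or GNU awk (macOS, for instance), here is a sketch that relies only on POSIX awk features and also copes with more than two underscore-separated words. The function name to_matcher is hypothetical, and the status(). prefix is hard-coded as in the question.

```shell
# Hypothetical helper: HttpStatus.SOME_NAME -> status().isSomeName()
to_matcher() {
  printf '%s\n' "$1" | awk -F'.' '{
    # Split the part after the dot on underscores and capitalise each word.
    n = split($2, words, "_")
    out = ""
    for (i = 1; i <= n; i++)
      out = out toupper(substr(words[i], 1, 1)) tolower(substr(words[i], 2))
    print "status().is" out "()"
  }'
}

two_words=$(to_matcher HttpStatus.NOT_FOUND)
three_words=$(to_matcher HttpStatus.INTERNAL_SERVER_ERROR)
echo "$two_words"
echo "$three_words"
```

The first call should give status().isNotFound(), matching the output asked for in the question.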

sed: remove all characters except for last n characters

I am trying to remove every character in a text string except for the last 11 characters. The string is Sample Text_that-would$normally~be,here--pe_-l4_mBY and what I want to end up with is just -pe_-l4_mBY.
Here's what I've tried:
$ cat food
Sample Text_that-would$normally~be,here--pe_-l4_mBY
$ cat food | sed 's/^.*(.{3})$/\1/'
sed: 1: "s/^.*(.{3})$/\1/": \1 not defined in the RE
Please note that the text string isn't really stored in a file, I just used cat food as an example.
OS is macOS High Sierra 10.13.6 and bash version is 3.2.57(1)-release
You can use this sed with a capture group:
sed -E 's/.*(.{11})$/\1/' file
-pe_-l4_mBY
Basic regular expressions (used by default by sed) require both the parentheses in the capture group and the braces in the brace expression to be escaped. ( and { are otherwise treated as literal characters to be matched.
$ cat food | sed 's/^.*\(.\{3\}\)$/\1/'
mBY
By contrast, explicitly requesting sed to use extended regular expressions with the -E option reverses the meaning, with \( and \{ being the literal characters.
$ cat food | sed -E 's/^.*(.{3})$/\1/'
mBY
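Since the string is not really stored in a file, the BRE form can be wrapped in a small function that takes the string and the number of characters to keep. This is only a sketch: the name last_n is hypothetical, and the count is assumed to be a positive integer no larger than the string length (otherwise sed leaves the line unchanged).

```shell
# Hypothetical helper: print the last $2 characters of string $1.
last_n() {
  # The count is spliced into the BRE interval \{n\}.
  printf '%s\n' "$1" | sed 's/^.*\(.\{'"$2"'\}\)$/\1/'
}

string='Sample Text_that-would$normally~be,here--pe_-l4_mBY'
tail11=$(last_n "$string" 11)
tail3=$(last_n "$string" 3)
echo "$tail11"
echo "$tail3"
```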
Try this also:
grep -o -E '.{11}$' food
grep, like sed, accepts an arbitrary number of file name arguments, so there is no need for a separate cat. (See also useless use of cat.)
You can use tail or parameter expansion:
string='Sample Text_that-would$normally~be,here--pe_-l4_mBY'
echo "$string" | tail -c 12   # 11 characters plus the newline added by echo
echo "${string#"${string%???????????}"}"   # 11 question marks, one per character to keep
-pe_-l4_mBY
-pe_-l4_mBY
also with rev/cut/rev
$ echo abcdefghijklmnopqrstuvwxyz | rev | cut -c1-11 | rev
pqrstuvwxyz
man rev => rev - reverse lines characterwise

Grep last match of returned multi line result and assign to variable

Let's say that I have a command list kittens that returns something in this multi-line format in my terminal (in this exact layout):
[ 'fluffy'
'buster'
'bob1' ]
How can I fetch bob1 and assign to a variable for scripting use? Here's my non working try so far.
list kittens | grep "'([^']+)' \]"
I am not overly familiar with grepping on the cli and am running into issues of syntax with quotes and such.
If you know that bob1 will be in the last line, you can capture it like that:
myvar="$(list kittens | tail -n1 | grep -oP "'\K[^']+(?=')")"
This uses tail to get the last line and then grep with \K (which drops everything matched so far from the output) and a lookahead in the regular expression to extract the part inside the quotes.
Edit: The above assumes that you are using GNU grep (for the -P mode). Here's an alternative with sed:
myvar="$(list kittens | tail -n1 | sed -e "s/^[^']*'//; s/'[^']*$//")"
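To try the sed variant without the real command, here is a sketch that uses a stand-in function. list_kittens is hypothetical and simply prints the sample output from the question.

```shell
# Hypothetical stand-in for the real "list kittens" command.
list_kittens() {
  printf '%s\n' "[ 'fluffy'" "'buster'" "'bob1' ]"
}

# Keep the last line, strip everything up to the first quote,
# then strip everything from the closing quote onwards.
myvar=$(list_kittens | tail -n 1 | sed -e "s/^[^']*'//" -e "s/'[^']*\$//")
echo "$myvar"
# bob1
```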
Could be done by awk alone:
list kittens |awk 'END{gsub(/\047|[[:blank:]]|\]/,"");print $0}'
bob1
Example:
echo "$kit"
[ 'fluffy'
'buster'
'bob1' ]
echo "$kit" |awk 'END{gsub(/\047|[[:blank:]]|\]/,"");print $0}'
bob1
To Assign it to any variable:
var=$(list kittens |awk 'END{gsub(/\047|[[:blank:]]|\]/,"");print $0}')
Explanation:
END{}: The END block is used to take data from the last line, as we are only interested in the last line.
gsub: This is awk's built-in function for search-and-replace tasks. Here blanks, single quotes and the closing ] are removed. Note that \047 is the octal escape used to match a single quote.
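A variation on the same idea, offered only as a sketch: with the single quote as the field separator, the name is always the second field, so it can be remembered on every line and printed in the END block. list_kittens is again a hypothetical stand-in for the real command.

```shell
# Hypothetical stand-in for the real "list kittens" command.
list_kittens() {
  printf '%s\n' "[ 'fluffy'" "'buster'" "'bob1' ]"
}

# Split on single quotes; $2 is the text between the first pair of quotes.
# Saving it in a variable avoids relying on $0 still being set in END.
var=$(list_kittens | awk -F"'" '{ name = $2 } END { print name }')
echo "$var"
# bob1
```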
