expr does not return the pattern if not at the beginning of the string - bash

Using this version of bash:
GNU bash, version 4.1.2(1)-release (i386-redhat-linux-gnu)
How can I get expr to find my pattern within a string, if the pattern I'm looking for does not begin this string?
Example:
expr match "123 abc 456 def ghi789" '\([0-9]*\)' #returns 123 as expected
expr match "z 123 abc 456 def ghi789" '\([0-9]*\)' #returns nothing
In the second example, I would expect 123 to be returned.
Further analysis:
If I start from the end of the string by adding .* in my command, I get a weird result:
expr match "123 abc 456 def ghi789" '.*\([0-9]*\)' #returns nothing
expr match "123 abc 456 def ghi789" '.*\([0-9]\)' #returns 9 as expected
expr match "123 abc 456 def ghi789 z" '.*\([0-9]\)' #returns also 9
Here, it seems that the pattern can be found at the end of the string (so at the beginning of my search), and also if it's not at the end of the string. But it does not work if I add the * at the end of the regular expression.
On the other hand, the same does not apply if I start from the beginning of my string:
expr match "z 123 abc 456 def ghi789" '\([0-9]\)' #returns nothing
I think I must misunderstand something obvious, but I cannot find what.
Thank you for your help :)

Would
expr match "123 abc 456 def ghi789 z" '[^0-9]*\([0-9]*\)'
do it? (I just used [^0-9]* instead of .* at the beginning.)
As mentioned in the comments, the expression
expr match "123 abc 456 def ghi789 z" '^[^0-9]*\([0-9]*\)'
would fit better, because the part ^[^0-9]* can be read as "skip all characters which are not digits ([^0-9]) from the beginning (the ^ as first character)". expr anchors every match at the start of the string, which is why the original '\([0-9]*\)' returns nothing when the string begins with a non-digit.

Related

How do I write a regex that captures the first non-numeric part of string that also doesn't include 3 or more spaces?

I'm using Ruby 2.4. I want to extract from a string the first consecutive occurrence of non-numeric characters that do not include at least three or more spaces. For example, in this string
str = "123 aa bb cc 33 dd"
The first such occurrence is " aa bb ". I thought the below expression would help me
str.split(/[[:space:]][[:space:]][[:space:]]+/).first[/\p{L}\D+\p{L}\p{L}/i]
but if the string is "123 456 aaa", it fails to return " aaa", which I would want it to.
r = /
(?: # begin non-capture group
[ ]{,2} # match 0, 1 or 2 spaces
[^[ ]\d]+ # match 1+ characters that are neither spaces nor digits
)+ # end non-capture group and perform 1+ times
[ ]{,2} # match 0, 1 or 2 spaces
/x # free-spacing regex definition mode
str = "123 aa bb cc 33 dd"
str[r] #=> " aa bb "
Note that [ ] could be replaced by a space if free-spacing regex definition mode is not used:
r = /(?: {,2}[^ \d]+)+ {,2}/
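A quick way to check the pattern above is to run it against both inputs from the question. This is only a sketch: it assumes the first input originally had three spaces between bb and cc (the spacing appears to have been collapsed in the text above), and the constant FIRST_CHUNK and the method first_chunk are names made up for illustration.

```ruby
# Pattern from the answer above: runs of non-space, non-digit characters,
# each preceded by at most two spaces, followed by at most two spaces.
FIRST_CHUNK = /(?: {,2}[^ \d]+)+ {,2}/

# Hypothetical helper: returns the first matching chunk, or nil if none.
def first_chunk(str)
  str[FIRST_CHUNK]
end

# Assumed input: three spaces between "bb" and "cc".
p first_chunk("123 aa bb   cc 33 dd")  # => " aa bb  "
p first_chunk("123 456 aaa")           # => " aaa"
```

With three spaces in the input, the match keeps two of them at the end, because the trailing {,2} is allowed to consume up to two spaces before the pattern stops.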
Remove all digits + spaces from the start of a string. Then split with 3 or more whitespaces and grab the first item.
def parse_it(s)
s[/\A(?:[\d[:space:]]*\d)?(\D+)/, 1].split(/[[:space:]]{3,}/).first
end
puts parse_it("123 aa bb cc 33 dd")
# => aa bb
puts parse_it("123 456 aaa")
# => aaa
See the Ruby demo
The first regex \A(?:[\d[:space:]]*\d)?(\D+) matches:
\A - start of a string
(?:[\d[:space:]]*\d)? - an optional sequence of:
[\d[:space:]]* - 0+ digits or whitespaces
\d - a digit
(\D+) - Group 1, capturing 1 or more non-digits
The splitting regex is [[:space:]]{3,}, it matches 3 or more whitespaces.
It looks like this'd do it:
regex = /(?: {1,2}[[:alpha:]]{2,})+/
"123 aa bb cc 33 dd"[regex] # => " aa bb"
"123 456 aaa"[regex] # => " aaa"
(?: ... ) is a non-capturing group.
{1,2} means "find at least one, and at most two".
[[:alpha:]] is a POSIX definition for alphabet characters. It's more comprehensive than [a-z].
You should be able to figure out the rest, which is all documented in the Regexp documentation and String's [] documentation.
Will this work?
str.match(/(?: ?)?(?:[^ 0-9]+(?: ?)?)+/)[0]
or apparently
str[/(?: ?)?(?:[^ 0-9]+(?: ?)?)+/]
or using Cary's nice space match,
str[/ {,2}(?:[^ 0-9]+ {,2})+/]

Gsub causing part of string to be substituted

I want to replace all occurrences of a single quote (') with backslash single quote (\'). I tried doing this with gsub, but I'm getting partial string duplication:
a = "abc 'def' ghi"
a.gsub("'", "\\'")
# => "abc def' ghidef ghi ghi"
Can someone explain why this happens and what a solution to this is?
It happens because "\\'" has a special meaning when it occurs as the replacement argument of gsub, namely it means the post-match substring.
To do what you want, you can use a block:
a.gsub("'"){"\\'"}
# => "abc \\'def\\' ghi"
Notice that the backslash is escaped in the string inspection, so it appears as \\.
Your "\\'" actually represents a literal \' because the first backslash escapes the second. In a gsub replacement string, that literal \' is a special sequence that expands to the part of the string that follows the matched portion (the post-match). So here's what's happening.
abc 'def' ghi
^
The caret points to the first match, '. Replace it with everything to its right, i.e. def' ghi.
abc def' ghidef' ghi
++++++++
Now find the next match:
abc def' ghidef' ghi
^
Once again, replace the ' with everything to its right, i.e. ghi.
abc def' ghidef ghi ghi
++++
It's possible you just need a higher dose of escaping:
a.gsub(/'/, "\\\\'" )
Result:
abc \'def\' ghi
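To see the three behaviours side by side, here is a small sketch that runs each call from the answers above on the same input. The method names are invented for this example; the gsub calls themselves are the ones quoted above.

```ruby
# The replacement string \' is expanded by gsub to the post-match text.
def with_post_match(s)
  s.gsub("'", "\\'")
end

# A block's return value is inserted literally, with no expansion.
def with_block(s)
  s.gsub("'") { "\\'" }
end

# The replacement string \\' is read by gsub as an escaped backslash
# followed by a plain quote.
def with_double_escape(s)
  s.gsub("'", "\\\\'")
end

a = "abc 'def' ghi"
puts with_post_match(a)     # abc def' ghidef ghi ghi
puts with_block(a)          # abc \'def\' ghi
puts with_double_escape(a)  # abc \'def\' ghi
```

puts is used here instead of inspecting the result, so each backslash is printed once rather than shown escaped as \\.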

Ruby: eval with string interpolation

I don't understand, why eval works like this:
"123 #{456.to_s} 789" # => "123 456 789"
eval('123 #{456.to_s} 789') # => 123
How can I interpolate into a string inside eval?
Update:
Thank you, friends. It worked.
So if you have a string variable with #{} that you want to eval later, you should do it as explained below:
string = '123 #{456} 789'
eval("\"" + string + "\"")
# => 123 456 789
or
string = '123 #{456} 789'
eval('"' + string + '"')
# => 123 456 789
What's happening is that eval evaluates the string as source code. When the evaluated code contains a double-quoted string, that string is interpolated:
eval '"123 #{456.to_s} 789"'
# => "123 456 789"
However when you use single quotes, there is no interpolation, hence the # starts a comment, and you get
123 #{456.to_s} 789
# => 123
If you pass eval a double-quoted string directly, the interpolation happens before the eval call, because the argument is evaluated before the method receives it.
Also note the 456.to_s is unnecessary, you can just do #{456}.
You wanted:
eval('"123 #{456.to_s} 789"')
. . . hopefully you can see why?
The code passed to the interpreter by eval is treated exactly as if you had written it yourself (in irb, or as part of a .rb file). So if you want eval to return a string value, the code you evaluate must include the quotes that make the expression inside it a String.
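As a minimal sketch of the point above, the wrapping can be put in a small helper. The name interpolate_later is hypothetical, and the sketch assumes the template is trusted and contains no double quotes of its own (either would break or subvert the evaluated code).

```ruby
# Hypothetical helper: wraps a template in double quotes and evaluates it,
# so Ruby treats #{...} as interpolation instead of the start of a comment.
# Assumes a trusted template with no embedded double quotes.
def interpolate_later(template)
  eval('"' + template + '"')
end

template = '123 #{400 + 56} 789'  # single quotes: nothing is interpolated yet
puts interpolate_later(template)  # 123 456 789
p eval(template)                  # 123, because # starts a comment
```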

How do I modify \1 in gsub?

When I see some text that matches my pattern, I want to create a link to an external site using RedCloth that has a query link for it.
If I have something like:
Text 123 1234 12345
When I see that I want to replace it with:
"Text 123 1234 12345":http://site.com/query?value=Text%20123%201234%2012345
If I let it keep the spaces, RedCloth won't notice this as a link correctly.
Here is where I am at:
s = "this is a string which has Text 123 1234 12345 "
s = s.gsub(/(Text \d+ \d+ \d+)/, '"\1":http://site.com/query?value=\1')
=> "Text 123 1234 12345":http://site.com/query?value=Text 123 1234 12345"
The problem is that RedCloth stops parsing after:
"Text 123 1234 12345":http://site.com/query?value=Text
So I really need:
"Text 123 1234 12345":http://site.com/query?value=Text%20123%201234%2012345"
Is there a way I can mess with \1 in the right hand side of gsub, such that I could get the following? If not, what's the best way to do this?
Ok, thanks to the comment by Narfanator I found the following: "$1 and \1 in Ruby".
The solution was super easy:
require 'cgi'
s = "this is a string which has Text 123 1234 12345 "
s = s.gsub(/(Text \d+ \d+ \d+)/) { |x| "\"" + x + "\":https://site.com/query?value=" + CGI.escape(x) }
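One detail worth checking: CGI.escape writes a space as +, whereas the question asked for %20. If the %20 form matters, a possible variant (a sketch, not what the answer above used) swaps in ERB::Util.url_encode from the standard library. The method name link_text_codes is made up for this example.

```ruby
require 'erb'

# Hypothetical helper: turns each "Text <n> <n> <n>" run into a
# RedCloth-style link whose query value is percent-encoded.
def link_text_codes(s)
  s.gsub(/Text \d+ \d+ \d+/) do |match|
    # url_encode writes a space as %20 rather than +.
    encoded = ERB::Util.url_encode(match)
    "\"#{match}\":http://site.com/query?value=#{encoded}"
  end
end

puts link_text_codes("this is a string which has Text 123 1234 12345 ")
```

The block form is what makes this work: the block receives the matched text as a normal Ruby string, so any method can be applied to it, which is not possible with \1 inside a replacement string.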

How to use sed command to add a string before a pattern string?

I want to use sed to modify my file named "baz".
When I search for the pattern foo, and foo is not at the beginning or end of a line, I want to insert bar before foo. How can I do this with sed?
Input file named baz:
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
Output file
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
You can just use something like:
sed 's/foo/barfoo/g' baz
(the g at the end means global, every occurrence on each line rather than just the first).
For an arbitrary (rather than fixed) pattern such as foo[0-9], you could use capture groups as follows:
pax$ echo 'xyz fooA abc
xyz foo5 abc
xyz fooB abc' | sed 's/\(foo[0-9]\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc
xyz fooB abc
The parentheses capture the actual text that matched the pattern and the \1 uses it in the substitution.
You can use arbitrarily complex patterns with this one, including ensuring you match only complete words. For example, only changing the pattern if it's immediately surrounded by a word boundary:
pax$ echo 'xyz fooA abc
xyz foo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc' | sed 's/\(\bfoo[0-9]\b\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc
In terms of how the capture groups work, you can use parentheses to store the text that matches a pattern for later use in the replacement. The captured identifiers are based on the ( characters reading from left to right, so the regex (I've left off the \ escape characters and padded it a bit for clarity):
( ( \S* ) ( \S* ) )
^ ^ ^ ^ ^ ^
| | | | | |
| +--2--+ +--3--+ |
+---------1---------+
when applied to the text Pax Diablo would give you three groups:
\1 = Pax Diablo
\2 = Pax
\3 = Diablo
as shown below:
pax$ echo 'Pax Diablo' | sed 's/\(\(\S*\) \(\S*\)\)/[\1] [\2] [\3]/'
[Pax Diablo] [Pax] [Diablo]
Just substitute the start of the line with something different.
sed '/^foo/s/^/bar/'
To replace or modify all "foo" except at beginning or end of line, I would suggest to temporarily replace them at beginning and end of line with a unique sentinel value.
sed 's/^foo/____veryunlikelytoken_bol____/
s/foo$/____veryunlikelytoken_eol____/
s/foo/bar&/g
s/^____veryunlikelytoken_bol____/foo/
s/____veryunlikelytoken_eol____$/foo/'
In sed there is no way to specify "cannot match here". In Perl regex and derivatives (meaning languages which borrowed from Perl's regex, not necessarily languages derived from Perl) you have various negative assertions so you can do something like
perl -pe 's/(?!^)foo(?!$)/barfoo/g'
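Ruby is one of the languages whose regex engine borrowed these assertions, so the same idea can be sketched there. This is an illustration only, with a made-up method name, and it is not a sed solution.

```ruby
# Hypothetical helper: prefixes "foo" with "bar" unless that "foo"
# starts or ends a line, using the same negative lookaheads as the
# Perl one-liner above.
def bar_before_inner_foo(text)
  text.gsub(/(?!^)foo(?!$)/, 'barfoo')
end

puts bar_before_inner_foo("blah_foo_blahblahblah")  # blah_barfoo_blahblahblah
puts bar_before_inner_foo("foo_x_foo_y_foo")        # foo_x_barfoo_y_foo
```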
