Ruby replace word

I have a specific situation. I am trying to replace some words in a string. I have two example strings:
string1 = "aaabbb aaa bbb"
string2 = "a. bbb"
In string1 I want to replace the whole word "aaa" with "ccc", so I do it like this:
translation = "aaa"
string1.gsub(/\b#{translation}\b/, "ccc") # => "aaabbb ccc bbb"
That works, but when I try to replace "a." with "aaa" it doesn't work: it returns string2 unchanged.
I also tried this:
translation = "a."
string2.gsub(translation, "aaa") # => "aaa bbb"
But when I use the same plain-string gsub on string1 (with translation = "aaa"), I get "cccbbb ccc bbb", because it also replaces the "aaa" inside "aaabbb". Sorry for my English, but I hope I explained it clearly enough. Thanks for all answers.

Try
string1.gsub(/\b#{Regexp.escape(translation)}\b/, "ccc")
In a regex, '.' means "any character". By calling Regexp.escape you turn 'a.' into 'a\.', which means "a followed by a literal period".
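A quick way to see what Regexp.escape changes (a minimal sketch; the test strings are only illustrative):

```ruby
translation = "a."
escaped = Regexp.escape(translation)
puts escaped # prints a\.

# Unescaped, the dot matches any character, so "ab" also matches:
p "ab".match?(/#{translation}/) # => true
# Escaped, only a literal "a." matches:
p "ab".match?(/#{escaped}/)     # => false
p "a.".match?(/#{escaped}/)     # => true
```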
Update
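As @Daniel has noted in the comments, word boundaries have some subtleties: \b matches only between a word character and a non-word character, so right after "a." (followed by a space or the end of the string) there is no boundary. So for the above to work with "a." you need to replace the \b with lookbehinds and lookaheads:
string2.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")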
# => "ccc bbb"
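To check that the lookaround version handles both of the OP's strings, here is a sketch wrapped in a helper. `replace_whole_word` is a hypothetical name, not a built-in; it uses gsub's block form so the replacement text is inserted literally. The \b version is included for comparison.

```ruby
# Hypothetical helper: replace `word` only where it is not touching a word character.
def replace_whole_word(text, word, replacement)
  # Block form: the replacement is used as-is (no \1 or \' processing).
  text.gsub(/(?<!\w)#{Regexp.escape(word)}(?!\w)/) { replacement }
end

string1 = "aaabbb aaa bbb"
string2 = "a. bbb"

p replace_whole_word(string1, "aaa", "ccc") # => "aaabbb ccc bbb"
p replace_whole_word(string2, "a.", "aaa")  # => "aaa bbb"

# With \b, "a." never matches: "." and " " are both non-word characters,
# so there is no boundary between them.
p string2.gsub(/\b#{Regexp.escape("a.")}\b/, "aaa") # => "a. bbb"
```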

Since \w excludes dots, which I guess the OP wants to allow inside tokens, I propose a whitelist approach: lookarounds that require whitespace (or the start/end of the string) around the token:
string = "a. b.a. a. bbb"
translation = "a."
# With (?<!\w) and (?!\w), "b.a." is not treated as a single token
string.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# Notice b.ccc
#=> "ccc b.ccc ccc bbb"
# With \s-based lookarounds, "b.a." is treated as a single token
string.gsub(/(?<=^|\s)#{Regexp.escape(translation)}(?=\s|$)/, "ccc")
# Notice b.a.
#=> "ccc b.a. ccc bbb"
Anyway, which approach is right depends on the OP's needs ;-)
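If "token" simply means "run of non-whitespace characters", the same result as the \s-based lookarounds can be sketched without any escaping: match each token and compare it to the word as a plain string. This is an alternative to the regex above, not the answerer's method; it keeps the original whitespace because gsub only rewrites the tokens.

```ruby
string = "a. b.a. a. bbb"
translation = "a."

# Match every whitespace-delimited token and replace only exact matches.
result = string.gsub(/\S+/) { |token| token == translation ? "ccc" : token }
p result # => "ccc b.a. ccc bbb"
```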

The . (dot) has a special meaning in regexes: it matches any character (except a newline, unless the /m modifier is used).
You should escape it as \. to match a literal dot.

Related

Gsub causing part of string to be substituted

I want to replace all occurrences of a single quote (') with backslash single quote (\'). I tried doing this with gsub, but I'm getting partial string duplication:
a = "abc 'def' ghi"
a.gsub("'", "\\'")
# => "abc def' ghidef ghi ghi"
Can someone explain why this happens and what a solution to this is?
It happens because "\\'" has a special meaning when it occurs as the replacement argument of gsub, namely it means the post-match substring.
To do what you want, you can use a block:
a.gsub("'"){"\\'"}
# => "abc \\'def\\' ghi"
Notice that the backslash is escaped in the string inspection, so it appears as \\.
Your "\\'" actually represents a literal \' because the first backslash escapes the second. And in a gsub replacement string, \' is a special sequence that is replaced by the part of the string that follows the matched portion (the post-match). So here's what's happening.
abc 'def' ghi
    ^
The caret points to the first match, '. Replace it with everything to its right, i.e. def' ghi.
abc def' ghidef' ghi
    ++++++++
Now find the next match (the second ' of the original string; gsub does not rescan the inserted text):
abc def' ghidef' ghi
               ^
Once again, replace the ' with everything to its right, i.e. " ghi".
abc def' ghidef ghi ghi
               ++++
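You can confirm that "\\'" inserts the post-match by producing it explicitly in a block, where the global $' holds the text after the current match (a small sketch):

```ruby
a = "abc 'def' ghi"

# "\\'" in a replacement string inserts the post-match...
via_string = a.gsub(/'/, "\\'")
# ...which is the same text that $' holds inside a block.
via_block = a.gsub(/'/) { $' }

p via_string              # => "abc def' ghidef ghi ghi"
p via_string == via_block # => true
```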
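Alternatively, add one more level of escaping, so the replacement string contains \\' (an escaped backslash followed by a quote), which gsub turns into a literal \':
a.gsub(/'/, "\\\\'")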
Result:
abc \'def\' ghi
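For comparison, three ways to get a literal backslash before each quote: the block from the first answer, the extra escaping from this answer, and a hash replacement. The hash form is an addition here; like a block's return value, a value looked up from the hash is inserted without backslash processing.

```ruby
a = "abc 'def' ghi"

with_block  = a.gsub("'") { "\\'" }         # block result is used literally
with_escape = a.gsub(/'/, "\\\\'")          # \\ in the replacement means one literal backslash
with_hash   = a.gsub("'", { "'" => "\\'" }) # hash values are also used literally

puts with_block # prints abc \'def\' ghi
p [with_block, with_escape, with_hash].uniq.size # => 1
```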

Extracting a word with regex

I want to replace $word with another word in the following string:
"Hello $word How are you"
I tried /\$(.*)/, /\$(.*)(\s)/ and /\$(.* \s)/. Because of the *, I get the whole rest of the string after $, but I only need that word; I need it to stop at the space. I tried \s, \b, and a few other options, but I cannot figure it out. Any help would be appreciated.
* is a greedy operator: it matches as much as it can while still allowing the remainder of the regular expression to match. The token .* first matches every remaining character in the string; the regex engine then backtracks until the next token, \s, can match, which happens at the last whitespace (the one before "you"), giving you "word How are" as the captured text.
You can use \S+ in place of .*; \S matches any non-whitespace character.
\$\S+
Or to simply match only word characters, you can use the following:
\$\w+
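A short sketch comparing the greedy pattern with the \S+ and \w+ versions on the question's string:

```ruby
string = "Hello $word How are you"

# Greedy .* runs to the end, then backtracks only as far as the last \s:
p string[/\$(.*)\s/, 1]       # => "word How are"
# \S+ stops at the first whitespace:
p string[/\$(\S+)/, 1]        # => "word"
# \w+ stops at the first non-word character:
p string.gsub(/\$\w+/, "Bob") # => "Hello Bob How are you"
```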
If you only want to replace "$word" using a regex, try this:
"Hello $word How are you".gsub(/\$word/, 'other_word')
Or:
"Hello $word How are you".sub('$word',"*")
You can read more about gsub here: http://www.ruby-doc.org/core-2.2.0/String.html#method-i-gsub
Substituting placeholder words for other words is usually not done with a regex but with the % method and a hash:
h = {word: "aaa", other_word: "bbb"}
p "Hello %{word} How are you. %{other_word}. Bye %{word}" % h
# => "Hello aaa How are you. bbb. Bye aaa"
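If you would rather keep the $word placeholder syntax than switch to %{word}, a similar lookup can be sketched with gsub and a block. `values` is a hypothetical hash; placeholders with no entry are left unchanged.

```ruby
values = { word: "aaa", other_word: "bbb" }
template = "Hello $word How are you. $other_word. Bye $word. $unknown"

# $1 is the placeholder name; $& is the whole match, used as the fallback.
result = template.gsub(/\$(\w+)/) { values.fetch($1.to_sym, $&) }
p result # => "Hello aaa How are you. bbb. Bye aaa. $unknown"
```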
Consider:
>> string = "Hello $word How are you"
=> "Hello $word How are you"
>> replace_regex = /(?<replace_word>\$\w+)/
=> /(?<replace_word>\$\w+)/
>> string.gsub(replace_regex, "Bob")
=> "Hello Bob How are you"
>> string.match(replace_regex)[:replace_word]
=> "$word"
Note:
replace_word is the name of a named capture group in replace_regex, which is why the match can be indexed with [:replace_word].
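Named groups can also be used directly in a replacement string with \k<name>. A small sketch; the group name `name` is chosen for illustration:

```ruby
string = "Hello $word How are you"
replace_regex = /\$(?<name>\w+)/

# \k<name> in a replacement string refers to the named group.
p string.sub(replace_regex, '<\k<name>>') # => "Hello <word> How are you"

# MatchData can be indexed by group name.
m = string.match(replace_regex)
p m[:name] # => "word"
```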

Is it possible in Ruby to print a part of a regex match (a group) instead of the whole matched substring?

Is it possible in sed, or maybe even in Ruby, to remember the matched part of a pattern and print it instead of the full matched string:
"aaaaaa bbb ccc".strip.gsub(/([a-z])+/, \1) # \1 as a part of the regex which I would like to remember and print then instead of the matched string.
# => "a b c"
I think in sed it should be possible with its h (hold space) command or similar, but what about Ruby?
Not sure what you mean, but here are a few examples:
print "aaaaaa bbb ccc".strip.gsub(/([a-z])+/, '\1')
# => "a b c"
And,
print "aaaaaa bbb ccc".strip.scan(/([a-z])+/).flatten
# => ["a", "b", "c"]
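One detail about the gsub example: a quantified group like ([a-z])+ keeps only its last repetition, so \1 is the last letter of each word. With "aaaaaa bbb ccc" that is invisible because each word repeats a single letter. A sketch with other words shows the difference, and how capturing the first letter explicitly avoids it:

```ruby
s = "abc def"

# ([a-z])+ : the group is re-captured on every repetition, so \1 is the LAST letter.
p s.gsub(/([a-z])+/, '\1')      # => "c f"
# ([a-z])[a-z]* : the group matches once, so \1 is the FIRST letter.
p s.gsub(/([a-z])[a-z]*/, '\1') # => "a d"
```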
The shortest answer is grep:
echo "aaaaaa bbb ccc" | grep -o '\<.'
You can do:
a = "aaaaaa bbb ccc".split
and then join the first character of each element back together:
a.map { |w| w[0, 1] }.join(" ") # => "a b c"
@glennjackman's suggestion: ruby -ne 'puts $_.split.map {|w| w[0]}.join(" ")'

Why $ doesn't match \r\n

Can someone explain this:
str = "hi there\r\n\r\nfoo bar"
rgx = /hi there$/
str.match rgx # => nil
rgx = /hi there\s*$/
str.match rgx # => #<MatchData "hi there\r\n\r">
On the one hand it seems like $ does not match \r. But then if I first capture all the white spaces, which also include \r, then $ suddenly does appear to match the second \r, not continuing to capture the trailing "\nfoo bar".
Is there some special rule here about consecutive \r\n sequences? The docs on $ simply say it will match "end of line" which doesn't explain this behavior.
$ is a zero-width assertion. It doesn't match any character, it matches at a position. Namely, it matches either immediately before a \n, or at the end of string.
/hi there\s*$/ matches because \s* matches "\r\n\r", which allows the $ to match at the position before the second \n. The $ could have also matched at the position before the first \n, but the \s* is greedy and matches as much as it can, while still allowing the overall regex to match.
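A sketch of the behaviour described above, plus two ways to cope with \r\n line endings (an optional \r before $, or a lookahead for the line break):

```ruby
str = "hi there\r\n\r\nfoo bar"

# "\r" sits between "there" and "\n", so $ cannot match right after "there":
p str.match(/hi there$/)            # => nil
p str.match(/hi there\s*$/)[0]      # => "hi there\r\n\r"
# Allow an optional \r before the line end:
p str.match(/hi there\r?$/)[0]      # => "hi there\r"
# Or assert a CRLF/LF line end without consuming it:
p str.match(/hi there(?=\r?\n)/)[0] # => "hi there"
```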

How to escape newline in regex scan

str = "This\n is a sample text for test"
str.scan(/\S.{0,15}\S(?=\s|$)|\S+/)
# => ["This", "is a sample text", "for test"]
Here, it splits when the newline (\n) is present. I actually want the output as,
["This\n is a", "sample text for", "test"]
How can I achieve that?
Use the /m modifier which allows the dot to match newlines:
str.scan(/\S.{0,15}\S(?=\s|\z)|\S+/m)
Also, I suggest you use \z instead of $, because $ matches at the end of any line; \z is the only anchor that forces Ruby to match at the very end of the string. It doesn't matter in this example, but it's a good habit to get into. Ruby differs from most other regex flavors on these two points: its /m modifier is what other flavors call dotall (/s), and its $ always matches at line ends.
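Running the variants side by side (a sketch) shows that /m lets the first chunk span the newline, but with .{0,15} a chunk can be up to 17 characters long, so the result is not exactly the desired output. The chunks in the desired output are at most 15 characters long; reducing the quantifier to .{0,13} is my adjustment, assuming a 15-character limit was intended, and it reproduces that output:

```ruby
str = "This\n is a sample text for test"

# Original pattern: the dot cannot cross "\n".
without_m = str.scan(/\S.{0,15}\S(?=\s|$)|\S+/)
# With /m the dot crosses "\n", but chunks can be up to 17 characters.
with_m = str.scan(/\S.{0,15}\S(?=\s|\z)|\S+/m)
# Assumed 15-character limit: 1 + 13 + 1 characters per chunk.
chunks_15 = str.scan(/\S.{0,13}\S(?=\s|\z)|\S+/m)

p without_m # => ["This", "is a sample text", "for test"]
p with_m    # => ["This\n is a sample", "text for test"]
p chunks_15 # => ["This\n is a", "sample text for", "test"]
```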
