How do I keep the split token in the second part of what was split in Ruby? - ruby

In Ruby, how do you split a string and keep the token you are splitting on in the second part of the result? I have
line.split(/(?<=#{Regexp.escape(split_token)})/)
But the token is getting merged into the first part of the split, and I want it in the second part:
2.4.0 :004 > split_token = "aaa"
=> "aaa"
2.4.0 :005 > line = "bbb aaa ccc"
=> "bbb aaa ccc"
2.4.0 :006 > line.split(/(?<=#{Regexp.escape(split_token)})/)
=> ["bbb aaa", " ccc"]

Changing lookbehind ((?<=) to lookahead ((?=) seems to do the trick:
split_token = "aaa"
line = "bbb aaa ccc"
line.split(/(?=#{Regexp.escape(split_token)})/)
# => ["bbb ", "aaa ccc"]
This just changes the split point to before the token rather than after it.
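A minimal side-by-side sketch of the two lookarounds, using the sample values from the question. The helper names split_after and split_before are made up for this illustration and are not part of Ruby.

```ruby
# Hypothetical helpers, named here only for illustration.

# Lookbehind: the split point sits just after the token,
# so the token stays at the end of the first part.
def split_after(line, token)
  line.split(/(?<=#{Regexp.escape(token)})/)
end

# Lookahead: the split point sits just before the token,
# so the token starts the second part.
def split_before(line, token)
  line.split(/(?=#{Regexp.escape(token)})/)
end

p split_after("bbb aaa ccc", "aaa")   # => ["bbb aaa", " ccc"]
p split_before("bbb aaa ccc", "aaa")  # => ["bbb ", "aaa ccc"]
```

Both patterns are zero-width, so no characters are consumed by the split; only the position of the cut changes.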

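Another possibility is to use slice_before (this splits on whitespace first, so the token has to be a whole word and the original spacing is not preserved):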
line.split.slice_before('aaa').map{|s| s.join(' ')}
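For comparison, here is a small sketch of what the slice_before version returns for the sample line. The method name split_words_before is hypothetical and only wraps the one-liner above.

```ruby
# Hypothetical wrapper around the slice_before one-liner.
def split_words_before(line, token)
  words = line.split                     # whitespace-separated words
  # slice_before starts a new group at every word for which token === word
  words.slice_before(token).map { |group| group.join(" ") }
end

p split_words_before("bbb aaa ccc", "aaa")  # => ["bbb", "aaa ccc"]
```

Note that the space after "bbb" is gone here, whereas the lookahead split keeps it ("bbb ").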

Related

How do I match something that is not a letter or a number or a space?

I'm using Ruby 2.4. How do I match something that is not a letter or a number or a space? I tried
2.4.0 :004 > str = "-"
=> "-"
2.4.0 :005 > str =~ /[^[:alnum:]]*/
=> 0
2.4.0 :006 > str = " "
=> " "
2.4.0 :007 > str =~ /[^[:alnum:]]*/
=> 0
but as you can see it is still matching a space.
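Your /[^[:alnum:]]*/ pattern matches zero or more characters other than alphanumerics. Whitespace is not alphanumeric, so it matches a space; and because * also accepts zero characters, the pattern matches at index 0 of any string.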
To match 1 or more chars other than alphanumeric and whitespace, you can use
/[^[:alnum:][:space:]]+/
Use the negated bracket expression with the relevant POSIX character classes inside.
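A quick check of both patterns, as a sketch. The constant names LOOSE and STRICT are assumptions made for this example only.

```ruby
# Constant names are illustrative only.
LOOSE  = /[^[:alnum:]]*/            # original pattern: zero or more non-alphanumerics
STRICT = /[^[:alnum:][:space:]]+/   # one or more chars that are neither alnum nor whitespace

p("a"   =~ LOOSE)    # => 0   (empty match at the start)
p(" "   =~ LOOSE)    # => 0   (a space is not alphanumeric)

p("-"   =~ STRICT)   # => 0
p(" "   =~ STRICT)   # => nil
p("a1 " =~ STRICT)   # => nil
p("a-b" =~ STRICT)   # => 1
```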

Remove duplicate substrings from a string

str = "hi ram hi shyam hi jhon"
I want something like:
"ram hi shyam hi jhon"
"ram shyam hi jhon"
I assume you want to remove duplicate occurrences of all words, not just "hi". Here are two ways of doing that.
1 Use String#split, Array#reverse and Array#uniq
str = "hi shyam ram hi shyam hi jhon"
str.split.reverse.uniq.reverse.join(' ')
#=> "ram shyam hi jhon"
The doc for uniq states: "self is traversed in order, and the first occurrence is kept."
2 Use a regular expression
r = /
\b # match a word break
(\w+) # match a word in capture group 1
\s # match a trailing space
(?= # begin a positive lookahead
.* # match any number of characters
\s # match a space
\1 # match the contents of capture group 1
\b # match a word break
) # end the positive lookahead
/x # free-spacing regex definition mode
str.gsub(r, '')
#=> "ram shyam hi jhon"
To remove the extra spaces change \s to \s+ in the third line of the regex definition.
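As a sketch of that \s+ variant, the snippet below runs it on an input with repeated spaces. The names DUP_WORD and drop_earlier_duplicates are hypothetical, and the input string is an assumption chosen to show the effect.

```ruby
# Hypothetical names; the regex is the one above with \s changed to \s+.
DUP_WORD = /
  \b (\w+)            # a word, captured in group 1
  \s+                 # one or more trailing whitespace characters
  (?= .* \s \1 \b )   # lookahead: the same word occurs again later
/x

def drop_earlier_duplicates(str)
  str.gsub(DUP_WORD, "")
end

p drop_earlier_duplicates("hi   ram hi shyam hi jhon")  # => "ram shyam hi jhon"
```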
str = "hi ram hi shyam hi jhon"
To remove one occurrence:
str.sub('hi', '').strip.squeeze(' ')
#⇒ "ram hi shyam hi jhon"
To remove n occurrences:
n.times { str.replace(str.sub('hi', '').strip.squeeze(' ')) }
You are looking for sub!:
str = "hi ram hi shyam hi jhon"
str.sub!("hi ", "")
#=> "ram hi shyam hi jhon"
str.sub!("hi ", "")
#=> "ram shyam hi jhon"
str.sub!("hi ", "")
#=> "ram shyam jhon"
In case you do not want to modify your original string (which is not what the example suggests), you might want to use sub instead, with an extra variable.
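A minimal sketch of the non-destructive route, assuming you want to drop the first n occurrences and leave the input untouched. The method name remove_first_n is made up for this example.

```ruby
# Hypothetical helper: removes the first n occurrences of token
# from a copy of str, leaving str itself unchanged.
def remove_first_n(str, token, n)
  result = str.dup
  n.times { result.sub!(token, "") }  # sub! returns nil when nothing matches; ignored here
  result
end

str = "hi ram hi shyam hi jhon"
p remove_first_n(str, "hi ", 1)  # => "ram hi shyam hi jhon"
p remove_first_n(str, "hi ", 2)  # => "ram shyam hi jhon"
p str                            # => "hi ram hi shyam hi jhon"
```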

Ruby replace word

I have a specific situation. I am trying to replace some words in a string. I have two example strings:
string1 = "aaabbb aaa bbb"
string2 = "a. bbb"
In string1 I want to replace the full word "aaa" with "ccc", so I do it like this:
translation = "aaa"
string1.gsub(/\b#{translation}\b/, "ccc") # => "aaabbb ccc bbb"
This works, but when I try to replace "a." with "aaa" the same way, it does not work and returns string2 unchanged.
I also tried this:
translation = "a."
string2.gsub(translation, "aaa") # => "aaa bbb"
But when I use that plain-string gsub on string1 (replacing "aaa" with "ccc"), I get "cccbbb ccc bbb". Sorry for my English; I hope I explained it understandably. Thanks for all answers.
Try
string1.gsub(/\b#{Regexp.escape(translation)}\b/, "ccc")
In a regex, '.' means "any character". By calling escape you turn 'a.' into 'a\.', which means "a followed by a literal period".
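A short sketch of what the escaping changes. The constant names are assumptions for the example; String#match? requires Ruby 2.4 or later.

```ruby
# Constant names are illustrative only.
TOKEN     = "a."
UNESCAPED = /#{TOKEN}/                  # "." matches any character
ESCAPED   = /#{Regexp.escape(TOKEN)}/   # "." matches only a literal dot

p Regexp.escape(TOKEN)       # => "a\\."
p "ab".match?(UNESCAPED)     # => true
p "ab".match?(ESCAPED)       # => false
p "a.".match?(ESCAPED)       # => true
```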
Update
As @Daniel has noted in the comments, word boundaries have some subtleties. So for the above to work with "a." you need to replace the \b with lookaheads and lookbehinds:
string2.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# => "ccc bbb"
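The subtlety is that \b only matches between a word character and a non-word character, and there is no such boundary between "." and a space. The sketch below shows the difference; replace_token is a hypothetical helper name.

```ruby
# Hypothetical helper using negative lookarounds instead of \b.
def replace_token(str, token, replacement)
  str.gsub(/(?<!\w)#{Regexp.escape(token)}(?!\w)/, replacement)
end

# \b after the dot fails: "." and " " are both non-word characters.
p "a. bbb".gsub(/\ba\.\b/, "aaa")                 # => "a. bbb"

p replace_token("a. bbb", "a.", "aaa")            # => "aaa bbb"
p replace_token("aaabbb aaa bbb", "aaa", "ccc")   # => "aaabbb ccc bbb"
```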
Since \w excludes dots, which I guess the OP wants to treat as token characters, I propose a whitelist approach with lookarounds:
string = "a. b.a. a. bbb"
translation = "a."
# Using (?<!\w) and (?!\w), b.a. is not treated as a single token
string.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# Notice b.ccc
#=> "ccc b.ccc ccc bbb"
# Using \s in the lookarounds, b.a. is treated as a single token
string.gsub(/(?<=^|\s)#{Regexp.escape(translation)}(?=\s|$)/, "ccc")
# Notice b.a.
#=> "ccc b.a. ccc bbb"
Anyway, whether my reasoning is right depends on what the OP needs ;-)
The . (dot) has a special meaning in regexes: it means match any character.
You should escape it with \.

Is it possible in Ruby to print a part of a regex (group) instead of the whole matched substring?

Is it possible in sed, or maybe even in Ruby, to remember the matched part of a pattern and print it instead of the full string that was matched:
"aaaaaa bbb ccc".strip.gsub(/([a-z])+/, \1) # \1 as a part of the regex which I would like to remember and print then instead of the matched string.
# => "a b c"
I think in sed it should be possible with its h (hold space) command or similar, but what about Ruby?
Not sure what you mean. Here are a few examples:
print "aaaaaa bbb ccc".strip.gsub(/([a-z])+/, '\1')
# => "a b c"
And,
print "aaaaaa bbb ccc".strip.scan(/([a-z])+/).flatten
# => ["a", "b", "c"]
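One caveat worth sketching: a quantified group such as ([a-z])+ keeps only its last iteration, so the examples above return the first letter only because every letter in each word is the same. To reliably keep the first letter, capture it once and match the rest outside the group. The helper name first_letters is hypothetical.

```ruby
# Hypothetical helper: keeps the first letter of each lowercase run.
def first_letters(str)
  str.gsub(/([a-z])[a-z]*/, '\1')
end

p "abc def".gsub(/([a-z])+/, '\1')   # => "c f"  (last iteration of the group)
p first_letters("abc def")           # => "a d"
p first_letters("aaaaaa bbb ccc")    # => "a b c"
```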
The shortest answer is grep:
echo "aaaaaa bbb ccc" | grep -o '\<.'
You can do:
a = "aaaaaa bbb ccc".split
and then join the first character of each element back together:
a.map { |w| w[0, 1] }.join(" ")
@glennjackman's suggestion: ruby -ne 'puts $_.split.map {|w| w[0]}.join(" ")'

Replace the pipe character "|" with line breaks?

I am trying to use a regex in my Ruby program to convert the "|" character into a line break, for example:
# convert("title|subtitle") => "title \n subtitle"
The regex I'm trying is the following:
title_params =~ s/\|/\\n/
But I keep getting errors saying that "|" is not recognized.
Regex is not needed for this simple problem:
=> puts "foo|bar".tr("|","\n")
foo
bar
I don't really know the syntax of your way of doing this, but this works fine for me:
>> a = "title | subtitle"
=> "title | subtitle"
>> a.gsub(/\|/,"\n")
=> "title \n subtitle"
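Putting the two answers together, a possible sketch of the convert method from the question. Both method names are hypothetical, and whether to trim the spaces around the pipe is an assumption about what the asker wants.

```ruby
# Hypothetical helpers, named for illustration only.

# Plain character translation, no regex involved.
def pipes_to_newlines(text)
  text.tr("|", "\n")
end

# Variant that also drops whitespace around each pipe.
def pipes_to_newlines_trimmed(text)
  text.gsub(/\s*\|\s*/, "\n")
end

p pipes_to_newlines("title|subtitle")             # => "title\nsubtitle"
p pipes_to_newlines("title | subtitle")           # => "title \n subtitle"
p pipes_to_newlines_trimmed("title | subtitle")   # => "title\nsubtitle"
```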
