In Ruby, how do you split a string and keep the token you are splitting on in the second part of the result? I have
line.split(/(?<=#{Regexp.escape(split_token)})/)
But the token is getting merged into the first part of the split, and I want it in the second part:
2.4.0 :004 > split_token = "aaa"
=> "aaa"
2.4.0 :005 > line = "bbb aaa ccc"
=> "bbb aaa ccc"
2.4.0 :006 > line.split(/(?<=#{Regexp.escape(split_token)})/)
=> ["bbb aaa", " ccc"]
Changing the lookbehind (?<=...) to a lookahead (?=...) seems to do the trick:
split_token = "aaa"
line = "bbb aaa ccc"
line.split(/(?=#{Regexp.escape(split_token)})/)
# => ["bbb ", "aaa ccc"]
This just changes the split point to before the token rather than after it.
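Another possibility is to use slice_before (note that this splits on whitespace first, so the original spacing is not preserved):
line.split.slice_before('aaa').map{|s| s.join(' ')}
# => ["bbb", "aaa ccc"]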
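To compare the three variants side by side, here is a minimal sketch. The method names split_after, split_before and slice_words_before are made up for illustration and are not part of Ruby.

```ruby
# Hypothetical helper names, for illustration only.

# Lookbehind: the split point is just after the token.
def split_after(line, token)
  line.split(/(?<=#{Regexp.escape(token)})/)
end

# Lookahead: the split point is just before the token.
def split_before(line, token)
  line.split(/(?=#{Regexp.escape(token)})/)
end

# slice_before works on whitespace-separated words, so the original
# spacing around the token is not preserved.
def slice_words_before(line, token)
  line.split.slice_before(token).map { |words| words.join(' ') }
end

line = "bbb aaa ccc"
p split_after(line, "aaa")        # => ["bbb aaa", " ccc"]
p split_before(line, "aaa")       # => ["bbb ", "aaa ccc"]
p slice_words_before(line, "aaa") # => ["bbb", "aaa ccc"]
```

The regex versions keep every character of the input. The slice_before version only makes sense when the token is a whole whitespace-separated word.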
I'm using Ruby 2.4. How do I match something that is not a letter or a number or a space? I tried
2.4.0 :004 > str = "-"
=> "-"
2.4.0 :005 > str =~ /[^[:alnum:]]*/
=> 0
2.4.0 :006 > str = " "
=> " "
2.4.0 :007 > str =~ /[^[:alnum:]]*/
=> 0
but as you can see it is still matching a space.
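Your /[^[:alnum:]]*/ pattern matches 0 or more characters other than alphanumeric ones. Whitespace is not alphanumeric, so it matches a space. And because * also accepts an empty match, =~ returns 0 for any string.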
To match 1 or more chars other than alphanumeric and whitespace, you can use
/[^[:alnum:][:space:]]+/
Use the negated bracket expression with the relevant POSIX character classes inside.
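As a quick check of the difference, the sketch below runs both patterns against a few sample strings. The constant names LOOSE and STRICT are only labels chosen for this example.

```ruby
# LOOSE is the pattern from the question, STRICT the suggested fix.
LOOSE  = /[^[:alnum:]]*/
STRICT = /[^[:alnum:][:space:]]+/

["-", " ", "a", "a-b"].each do |s|
  puts "#{s.inspect}: loose=#{(s =~ LOOSE).inspect} strict=#{(s =~ STRICT).inspect}"
end
# "-": loose=0 strict=0
# " ": loose=0 strict=nil
# "a": loose=0 strict=nil
# "a-b": loose=0 strict=1
```

LOOSE reports index 0 every time because the empty string at the start always satisfies *. STRICT returns nil unless at least one character that is neither alphanumeric nor whitespace is present.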
str = "hi ram hi shyam hi jhon"
I want something like:
"ram hi shyam hi jhon"
"ram shyam hi jhon"
I assume you want to remove duplicate occurrences of all words, not just "hi". Here are two ways of doing that.
1 Use String#split, Array#reverse and Array#uniq
str = "hi shyam ram hi shyam hi jhon"
str.split.reverse.uniq.reverse.join(' ')
#=> "ram shyam hi jhon"
The doc for uniq states: "self is traversed in order, and the first occurrence is kept."
2 Use a regular expression
r = /
    \b     # match a word break
    (\w+)  # match a word in capture group 1
    \s     # match a trailing space
    (?=    # begin a positive lookahead
      .*   # match any number of characters
      \s   # match a space
      \1   # match the contents of capture group 1
      \b   # match a word break
    )      # end the positive lookahead
    /x     # free-spacing regex definition mode
str.gsub(r, '')
#=> "ram shyam hi jhon"
To remove the extra spaces change \s to \s+ in the third line of the regex definition.
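One edge case worth checking is a word that is immediately followed by its own duplicate ("hi hi"). The lookahead above requires some whitespace between the removed word's trailing space and the later copy, so it does not fire there. The sketch below is a variant, not the pattern from the answer: it makes the ".*\s" part optional. The names keep_last_by_split, keep_last_by_regex and KEEP_LAST are invented for this example.

```ruby
# Hypothetical helpers: keep only the last occurrence of each word.

def keep_last_by_split(str)
  # uniq keeps the first occurrence, so reverse before and after.
  str.split.reverse.uniq.reverse.join(' ')
end

# Variant regex (an assumption, not the answer's pattern): the later copy
# may start right at the lookahead position or after any whitespace.
KEEP_LAST = /\b(\w+)\s+(?=(?:.*\s)?\1\b)/

def keep_last_by_regex(str)
  str.gsub(KEEP_LAST, '')
end

p keep_last_by_split("hi shyam ram hi shyam hi jhon") # => "ram shyam hi jhon"
p keep_last_by_regex("hi shyam ram hi shyam hi jhon") # => "ram shyam hi jhon"
p keep_last_by_regex("hi hi jhon")                    # => "hi jhon"
```

Both helpers assume a single-line string, since . does not match a newline by default.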
str = "hi ram hi shyam hi jhon"
To remove one occurrence:
str.sub('hi', '').strip.squeeze(' ') # squeeze(' ') collapses only repeated spaces
#⇒ "ram hi shyam hi jhon"
To remove n occurrences:
n.times { str.replace(str.sub('hi', '').strip.squeeze(' ')) }
You are looking for sub!:
str = "hi ram hi shyam hi jhon"
str.sub!("hi ", "")
#=> "ram hi shyam hi jhon"
str.sub!("hi ", "")
#=> "ram shyam hi jhon"
str.sub!("hi ", "")
#=> "ram shyam jhon"
In case you do not want to modify your original string (which is not what the example suggests), you might want to use sub instead, with an extra variable.
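A non-mutating version could look like the sketch below. The method name remove_first and its parameters are hypothetical. It works on a copy, and it matches the word with \b on both sides so that a "hi" inside a longer word is left alone.

```ruby
# Hypothetical helper: remove the first n whole-word occurrences of word
# without touching the original string.
def remove_first(str, word, n = 1)
  result = str.dup
  pattern = /\b#{Regexp.escape(word)}\b\s*/
  # sub! returns nil when nothing matches; the return value is ignored here.
  n.times { result.sub!(pattern, '') }
  result.strip
end

str = "hi ram hi shyam hi jhon"
p remove_first(str, "hi")    # => "ram hi shyam hi jhon"
p remove_first(str, "hi", 2) # => "ram shyam hi jhon"
p remove_first(str, "hi", 3) # => "ram shyam jhon"
p str                        # => "hi ram hi shyam hi jhon"
```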
I have a specific situation. I am trying to replace some words in a string. I have two example strings:
string1 = "aaabbb aaa bbb"
string2 = "a. bbb"
In string1 I want to replace the full word "aaa" with "ccc", so I do it like this:
translation = "aaa"
string1.gsub(/\b#{translation}\b/, "ccc") # => "aaabbb ccc bbb"
That works and I am happy, but when I try to replace "a." with "aaa" it does not work and returns string2 unchanged.
I also tried this:
translation = "a."
string2.gsub(translation, "aaa") # => "aaa bbb"
But when I use the above gsub on string1, I get "cccbbb ccc bbb". Sorry for my English, but I hope I explained it understandably. Thanks for all answers.
Try
string1.gsub(/\b#{Regexp.escape(translation)}\b/, "ccc")
In a regex, '.' means "any character". By calling escape you turn 'a.' into 'a\.', which means "a followed by a literal period".
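A small sketch of what escaping changes. The helper names pattern_match? and literal_match? are made up for this example.

```ruby
# Hypothetical helpers: does the whole text match the token when the token
# is interpolated as a pattern, or as a literal string?
def pattern_match?(text, token)
  text.match?(/\A#{token}\z/)
end

def literal_match?(text, token)
  text.match?(/\A#{Regexp.escape(token)}\z/)
end

p Regexp.escape("a.")         # => "a\\."
p pattern_match?("ab", "a.")  # => true  (the dot matches "b")
p literal_match?("ab", "a.")  # => false
p literal_match?("a.", "a.")  # => true
```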
Update
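As @Daniel has noted in the comments, word boundaries have some subtleties: \b only matches between a word character and a non-word character, so it cannot match between the "." and the space that follows it. For the above to work with "a." you need to replace the \b with a negative lookbehind and a negative lookahead:
string2.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# => "ccc bbb"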
Since \w excludes dots, which I guess the OP wants to allow inside tokens, I propose lookarounds that whitelist whitespace (or the string boundaries) as the only delimiters:
string = "a. b.a. a. bbb"
translation = "a."
# Using (?<!\w) and (?!\w), b.a. is not treated as a single token
string.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# Notice b.ccc
#=> "ccc b.ccc ccc bbb"
# Using \s, b.a. is treated as a single token
string.gsub(/(?<=^|\s)#{Regexp.escape(translation)}(?=\s|$)/, "ccc")
# Notice b.a.
#=> "ccc b.a. ccc bbb"
Anyway, whether my reasoning is right depends on what the OP needs ;-)
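The whitespace-delimited approach can be wrapped in one helper that handles both of the question's cases. This is only a sketch: the name replace_token is invented, and it uses (?<!\S) and (?!\S), a more compact way to write "preceded / followed by whitespace or a string boundary", rather than the exact lookarounds shown above.

```ruby
# Hypothetical helper: replace token only where it stands alone between
# whitespace (or at the start / end of the string).
def replace_token(string, token, replacement)
  string.gsub(/(?<!\S)#{Regexp.escape(token)}(?!\S)/, replacement)
end

p replace_token("aaabbb aaa bbb", "aaa", "ccc") # => "aaabbb ccc bbb"
p replace_token("a. bbb", "a.", "aaa")          # => "aaa bbb"
p replace_token("a. b.a. a. bbb", "a.", "ccc")  # => "ccc b.a. ccc bbb"
```

If the replacement text can contain backslashes, the block form of gsub would be the safer choice, since backslash sequences in a replacement string are interpreted as backreferences.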
The . (dot) has a special meaning in regexes: it means match any character.
You should escape it with \.
Is it possible in sed, or maybe even in Ruby, to remember the matched part of a pattern and print it instead of the full string that was matched?
"aaaaaa bbb ccc".strip.gsub(/([a-z])+/, \1) # \1 as a part of the regex which I would like to remember and print then instead of the matched string.
# => "a b c"
I think in sed it should be possible with its h (hold space) command or similar, but what about Ruby?
Not sure what you mean. Here are a few examples:
print "aaaaaa bbb ccc".strip.gsub(/([a-z])+/, '\1')
# => "a b c"
And,
print "aaaaaa bbb ccc".strip.scan(/([a-z])+/).flatten
# => ["a", "b", "c"]
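Some details about the backreference are easy to miss, so here is a short sketch. The helper first_letters is a made-up name, and its pattern is a suggestion rather than something from the answers.

```ruby
text = "aaaaaa bbb ccc"

# In the replacement string, single-quoted '\1' refers to capture group 1.
# (Double-quoted "\1" would be an octal character escape instead.)
p text.gsub(/([a-z])+/, '\1')      # => "a b c"

# Block form: $1 holds group 1 for the current match.
p text.gsub(/([a-z])+/) { $1 }     # => "a b c"

# A repeated group keeps only its last iteration, so on mixed words the
# same pattern yields the last letter of each word, not the first.
p "abc def".gsub(/([a-z])+/, '\1') # => "c f"

# Hypothetical helper: keep the first character of each word.
def first_letters(str)
  str.gsub(/\b(\w)\w*/, '\1')
end

p first_letters("abc def")         # => "a d"
```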
The shortest answer is grep:
echo "aaaaaa bbb ccc" | grep -o '\<.'
You can do:
a = "aaaaaa bbb ccc".split
and then join that array back together with the first character of each element
a.map { |w| w[0, 1] }.join(" ")
# => "a b c"
@glennjackman's suggestion: ruby -ne 'puts $_.split.map {|w| w[0]}.join(" ")'
I am trying to use a regex in my Ruby program to convert the "|" character into a line break, for example:
# convert("title|subtitle") => "title \n subtitle"
The regex I'm trying is the following:
title_params =~ s/\|/\\n/
But I kept getting errors saying that "|" is not recognized.
Regex is not needed for this simple problem:
=> puts "foo|bar".tr("|","\n")
foo
bar
I don't really know the syntax of your way of doing this but this works fine for me.
>> a = "title | subtitle"
=> "title | subtitle"
>> a.gsub(/\|/,"\n")
=> "title \n subtitle"
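Putting the two answers together, a convert method might look like the sketch below. The name convert comes from the question's comment, while convert_padded is a hypothetical variant added here to produce the " \n " spacing shown in that comment.

```ruby
# tr replaces each "|" with a newline character, no regex needed.
def convert(title)
  title.tr('|', "\n")
end

# Hypothetical variant: normalise any spaces around "|" to " \n ".
def convert_padded(title)
  title.gsub(/\s*\|\s*/, " \n ")
end

p convert("title|subtitle")          # => "title\nsubtitle"
p convert_padded("title|subtitle")   # => "title \n subtitle"
p convert_padded("title | subtitle") # => "title \n subtitle"

# The newline must be written in double quotes: '\n' in single quotes is
# a backslash followed by the letter n.
p '\n'.length # => 2
p "\n".length # => 1
```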