Remove duplicate substrings from a string - ruby

str = "hi ram hi shyam hi jhon"
I want something like:
"ram hi shyam hi jhon"
"ram shyam hi jhon"

I assume you want to remove duplicate occurrences of all words, not just "hi". Here are two ways of doing that.
1 Use String#split, Array#reverse and Array#uniq
str = "hi shyam ram hi shyam hi jhon"
str.split.reverse.uniq.reverse.join(' ')
#=> "ram shyam hi jhon"
The doc for uniq states: "self is traversed in order, and the first occurrence is kept."
2 Use a regular expression
r = /
\b # match a word break
(\w+) # match a word in capture group 1
\s # match a trailing space
(?= # begin a positive lookahead
.* # match any number of characters
\s # match a space
\1 # match the contents of capture group 1
\b # match a word break
) # end the positive lookahead
/x # free-spacing regex definition mode
str.gsub(r, '')
#=> "ram shyam hi jhon"
To remove the extra spaces change \s to \s+ in the third line of the regex definition.
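As a rough sketch, the two approaches above can be wrapped in methods and compared side by side. The method and constant names here are made up for illustration, and the regex already uses the \s+ variant mentioned above:

```ruby
# Sketch only: dedupe_keep_last, dedupe_with_regex and REPEATED_LATER
# are hypothetical names, not part of any library.

# Approach 1: reverse, keep the first of each word, reverse back,
# which amounts to keeping the last occurrence of every word.
def dedupe_keep_last(str)
  str.split.reverse.uniq.reverse.join(' ')
end

# Approach 2: delete a word (and the whitespace after it) whenever the
# same whole word appears again later, preceded by whitespace.
REPEATED_LATER = /\b(\w+)\s+(?=.*\s\1\b)/

def dedupe_with_regex(str)
  str.gsub(REPEATED_LATER, '')
end

str = "hi shyam ram hi shyam hi jhon"
puts dedupe_keep_last(str)   # prints: ram shyam hi jhon
puts dedupe_with_regex(str)  # prints: ram shyam hi jhon
```

One difference to be aware of: the regex consumes the whitespace after a word and then requires more whitespace before the repeat, so directly adjacent duplicates such as "hi hi" are left alone by the regex version but collapsed by the split/uniq version.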

str = "hi ram hi shyam hi jhon"
To remove one occurrence:
str.sub('hi', '').strip.squeeze(' ')
#⇒ "ram hi shyam hi jhon"
To remove n occurrences:
n.times { str.replace(str.sub('hi', '').strip.squeeze(' ')) }

You are looking for sub!:
str = "hi ram hi shyam hi jhon"
str.sub!("hi ", "")
#=> "ram hi shyam hi jhon"
str.sub!("hi ", "")
#=> "ram shyam hi jhon"
str.sub!("hi ", "")
#=> "ram shyam jhon"
In case you do not want to modify your original string (which is not what your example suggests), you might want to use sub instead, along with an extra variable.
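A minimal sketch of that non-mutating variant follows. The method name remove_first and its times parameter are hypothetical, and the word-boundary pattern is an addition so that "hi" is not removed from inside a longer word:

```ruby
# Sketch only: remove_first is a hypothetical helper, not a core method.
# It returns a new string and leaves the receiver untouched.
def remove_first(str, word, times = 1)
  # \b keeps "hi" from matching inside words such as "this".
  pattern = /\b#{Regexp.escape(word)}\b\s*/
  result = str
  times.times { result = result.sub(pattern, '') }  # sub returns a copy
  result.strip
end

str = "hi ram hi shyam hi jhon"
puts remove_first(str, "hi")     # prints: ram hi shyam hi jhon
puts remove_first(str, "hi", 2)  # prints: ram shyam hi jhon
puts str                         # prints: hi ram hi shyam hi jhon
```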

Related

How do I write a regex that captures the first non-numeric part of string that also doesn't include 3 or more spaces?

I'm using Ruby 2.4. I want to extract from a string the first consecutive occurrence of non-numeric characters that do not include at least three or more spaces. For example, in this string
str = "123 aa bb   cc 33 dd"
The first such occurrence is " aa bb ". I thought the below expression would help me
data.split(/[[:space:]][[:space:]][[:space:]]+/).first[/\p{L}\D+\p{L}\p{L}/i]
but if the string is "123 456 aaa", it fails to return " aaa", which I would want it to.
r = /
(?: # begin non-capture group
[ ]{,2} # match 0, 1 or 2 spaces
[^[ ]\d]+ # match 1+ characters that are neither spaces nor digits
)+ # end non-capture group and perform 1+ times
[ ]{,2} # match 0, 1 or 2 spaces
/x # free-spacing regex definition mode
str = "123 aa bb   cc 33 dd"
str[r] #=> " aa bb  "
Note that [ ] could be replaced by a space if free-spacing regex definition mode is not used:
r = /(?: {,2}[^ \d]+)+ {,2}/
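To see how the one-line form behaves on both strings from the question, here is a small check. It assumes the first string has three spaces between "bb" and "cc", which is typed out explicitly below:

```ruby
# Assumption: with_gap contains exactly three spaces between "bb" and "cc".
r = /(?: {,2}[^ \d]+)+ {,2}/

with_gap = "123 aa bb   cc 33 dd"
no_gap   = "123 456 aaa"

p with_gap[r]  #=> " aa bb  "
p no_gap[r]    #=> " aaa"
```

The match on the first string keeps up to two of the trailing spaces, because the final {,2} is greedy; call strip or rstrip on the result if those are unwanted.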
Remove all digits + spaces from the start of a string. Then split with 3 or more whitespaces and grab the first item.
def parse_it(s)
  s[/\A(?:[\d[:space:]]*\d)?(\D+)/, 1].split(/[[:space:]]{3,}/).first
end
puts parse_it("123 aa bb   cc 33 dd")
# prints " aa bb" (with the leading space)
puts parse_it("123 456 aaa")
# prints " aaa" (with the leading space)
The first regex \A(?:[\d[:space:]]*\d)?(\D+) matches:
\A - start of a string
(?:[\d[:space:]]*\d)? - an optional sequence of:
[\d[:space:]]* - 0+ digits or whitespaces
\d - a digit
(\D+) - Group 1, capturing 1 or more non-digits
The splitting regex is [[:space:]]{3,}, it matches 3 or more whitespaces.
It looks like this'd do it:
regex = /(?: {1,2}[[:alpha:]]{2,})+/
"123 aa bb   cc 33 dd"[regex] # => " aa bb"
"123 456 aaa"[regex] # => " aaa"
(?: ... ) is a non-capturing group.
{1,2} means "find at least one, and at most two".
[[:alpha:]] is a POSIX definition for alphabet characters. It's more comprehensive than [a-z].
You should be able to figure out the rest, which is all documented in the Regexp documentation and String's [] documentation.
Will this work?
str.match(/(?: ?)?(?:[^ 0-9]+(?: ?)?)+/)[0]
or apparently
str[/(?: ?)?(?:[^ 0-9]+(?: ?)?)+/]
or using Cary's nice space match,
str[/ {,2}(?:[^ 0-9]+ {,2})+/]

Ruby - How to remove space after some characters?

I need to remove white spaces after some characters, not all of them. I want to remove white spaces after these chars: I, R, P, O. How can I do it?
"I ".gsub(/(?<=[IRPO]) /, "") # => "I"
"A ".gsub(/(?<=[IRPO]) /, "") # => "A "
" P $ R 3I&".gsub(/([IRPO])\s+/,'\1')
#=> " P$ R3I&"
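If the set of characters may change, the lookbehind version can be wrapped in a small method. This is only a sketch; strip_space_after and its chars parameter are hypothetical names:

```ruby
# Sketch only: strip_space_after is a hypothetical helper.
# chars is a plain string listing the characters of interest, e.g. "IRPO".
def strip_space_after(str, chars)
  # Regexp.escape keeps characters such as ] or - literal inside the class.
  char_class = Regexp.escape(chars)
  # The lookbehind checks the preceding character without consuming it,
  # so only the whitespace itself is replaced.
  str.gsub(/(?<=[#{char_class}])\s+/, '')
end

p strip_space_after(" P $ R 3I&", "IRPO")  #=> " P$ R3I&"
p strip_space_after("A ", "IRPO")          #=> "A "
```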

what would the regular expression to extract the 3 from be?

I basically need to get the bit after the last pipe
"3083505|07733366638|3"
What would the regular expression for this be?
You can do this without regex. Here:
"3083505|07733366638|3".split("|").last
# => "3"
With regex: (assuming it's always going to be integer values)
"3083505|07733366638|3".scan(/\|(\d+)$/)[0][0] # or use \w+ if you want to extract any word after `|`
# => "3"
Try this regex :
.*\|(.*)
It returns whatever comes after LAST | .
You could do that most easily by using String#rindex:
line = "3083505|07733366638|37"
line[line.rindex('|')+1..-1]
#=> "37"
If you insist on using a regex:
r = /
.* # match any number of any character (greedily!)
\| # match pipe
(.+) # match one or more characters in capture group 1
/x # extended mode
line[r,1]
#=> "37"
Alternatively:
r = /
.* # match any number of any character (greedily!)
\| # match pipe
\K # forget everything matched so far
.+ # match one or more characters
/x # extended mode
line[r]
#=> "37"
or, as suggested by @engineersmnky in a comment on @shivam's answer:
r = /
(?<=\|) # match a pipe in a positive lookbehind
\d+ # match one or more digits
\z # match end of string
/x # extended mode
line[r]
#=> "37"
I would use split and last, but you could do
last_field = line.sub(/.+\|/, "")
That removes all characters up to and including the last pipe.
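For comparison, here is a short sketch of three regex-free or near regex-free ways to take the last field. String#rpartition is not mentioned above but is a core method; the edge-case strings are made up for illustration:

```ruby
line = "3083505|07733366638|37"

p line.split("|").last       #=> "37"
p line.rpartition("|").last  #=> "37"
p line[/[^|]*\z/]            #=> "37"

# The approaches differ when the last field is empty: split drops
# trailing empty fields, the other two return an empty string.
trailing = "a|"
p trailing.split("|").last       #=> "a"
p trailing.rpartition("|").last  #=> ""
p trailing[/[^|]*\z/]            #=> ""
```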

Check if string1 is before string2 on the same line

I am trying to match comment lines in a c#/sql code. CREATE may come before or after /*. They can be on the same line.
line6 = " CREATE /* this is ACTIVE line 6"
line5 = " charlie /* CREATE inside this is comment 5"
In the first case, it will be an active line; in the second, it will be a comment. I probably can do some kind of charindex, but maybe there is a simpler way
regex1 = /\/\*|--/
if (line6 =~ regex1) then puts "Match comment___" + line6 else puts '____' end
if (line5 =~ regex1) then puts "Match comment___" + line5 else puts '____' end
With the regex
r = /
\/ # match forward slash
\* # match asterisk
\s+ # match > 0 whitespace chars
CREATE # match chars
\b # match word break (to avoid matching CREATED)
/x # extended mode for regex def
you can return an array of the comment lines thus:
[line6, line5].select { |l| l =~ r }
#=> [" charlie /* CREATE inside this is comment 5"]
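The "charindex" idea from the question can also be sketched directly with String#index, comparing where CREATE and the comment opener first appear. The method name create_is_active? is hypothetical, and treating -- as a comment opener is taken from the question's own regex:

```ruby
# Sketch only: create_is_active? is a hypothetical helper.
# A line counts as active when CREATE appears before any /* or --.
def create_is_active?(line)
  create_at = line.index(/\bCREATE\b/)
  return false if create_at.nil?       # no CREATE on this line at all

  comment_at = line.index(%r{/\*|--})
  comment_at.nil? || create_at < comment_at
end

line6 = " CREATE /* this is ACTIVE line 6"
line5 = " charlie /* CREATE inside this is comment 5"

p create_is_active?(line6)  #=> true
p create_is_active?(line5)  #=> false
```

Unlike the regex above, this does not require CREATE to follow the comment opener immediately, but it also knows nothing about comments opened on an earlier line.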

Extracting word in with regex

I want to replace $word with another word in the following string:
"Hello $word How are you"
I used /\$(.*)/, /\$(.*)(\s)/ , /\$(.* \s)/. Due to *, I get the whole string after $, but I only need that word; I need to stop at the space. I tried \s, \b, and a few other options, but I cannot figure it out. Any help would be appreciated.
* is a greedy operator, meaning it will match as much as it can while still allowing the remainder of the regular expression to match. The token .* first greedily matches every remaining character in the string. The regex engine then backtracks until the next token, \s, can match; that is the last whitespace before the word "you", giving you a result of "word How are".
You can use \S+ in place of .*; it matches one or more non-whitespace characters.
\$\S+
Or to simply match only word characters, you can use the following:
\$\w+
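A quick way to see the difference between the three patterns is to run them on a string where the placeholder is followed by punctuation. The sample string with a comma is an assumption added here for illustration:

```ruby
# Assumption: a comma after $word is added to show where \S+ and \w+ differ.
str = "Hello $word, how are you"

# Greedy .* runs to the end, then backs off to the last whitespace.
p str[/\$(.*)\s/, 1]        #=> "word, how are"

# \S+ stops at the first whitespace, so it swallows the comma too.
p str.gsub(/\$\S+/, "Bob")  #=> "Hello Bob how are you"

# \w+ stops at the first non-word character and leaves the comma alone.
p str.gsub(/\$\w+/, "Bob")  #=> "Hello Bob, how are you"
```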
If you only want to replace "$word" using a regex, try this:
"Hello $word How are you".gsub(/\$word/, 'other_word')
Or:
"Hello $word How are you".sub('$word',"*")
You can read more about gsub here: http://www.ruby-doc.org/core-2.2.0/String.html#method-i-gsub
Substituting placeholder words for other words is usually not done with a regex but with the % method and a hash:
h = {word: "aaa", other_word: "bbb"}
p "Hello %{word} How are you. %{other_word}. Bye %{word}" % h
# => "Hello aaa How are you. bbb. Bye aaa"
Consider:
>> string = "Hello $word How are you"
=> "Hello $word How are you"
>> replace_regex = /(?<replace_word>\$\w+)/
=> /(?<replace_word>\$\w+)/
>> string.gsub(replace_regex, "Bob")
=> "Hello Bob How are you"
>> string.match(replace_regex)[:replace_word]
=> "$word"
Note:
replace_regex is the regex; replace_word is its named capture group.
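If there are several $-style placeholders, the same kind of pattern can be combined with a hash and the block form of gsub. This is a sketch under assumptions: fill_placeholders and the values hash are hypothetical, and unknown placeholders are simply left as they are:

```ruby
# Sketch only: fill_placeholders is a hypothetical helper.
# values maps placeholder names (without the $) to replacement strings.
def fill_placeholders(template, values)
  template.gsub(/\$(\w+)/) do |match|
    # Group 1 is the name after the dollar sign; fall back to the
    # original text when there is no entry for it.
    values.fetch(Regexp.last_match(1), match)
  end
end

values = { "word" => "aaa" }
puts fill_placeholders("Hello $word How are you, $name", values)
# prints: Hello aaa How are you, $name
```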

Resources