Splitting string based on word - ruby

I have a string composed of words separated by '#', for instance 'this#is#an#example'. I need to extract the last word or the last two words, depending on the second-to-last word.
If the second to last is 'myword' I need the last two words otherwise just the last one.
'this#is#an#example' => 'example'
'this#is#an#example#using#myword#also' => 'myword#also'
Is there a better way than splitting and checking the second to last? Perhaps using a regular expression?
Thanks.

You can use the end-of-line anchor $ and make the myword# prefix optional:
str = 'this#is#an#example'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "example"
str = 'this#is#an#example#using#myword#also'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "myword#also"
However, I don't think using a regular expression is "better" in this case. I would use something like Santosh's (deleted) answer: split the line by # and use an if clause.
def foo(str)
  *, a, b = str.split('#')
  if a == 'myword'
    "#{a}##{b}"
  else
    b
  end
end

str = 'this#is#an#example#using#myword#also'
array = str.split('#')
array[-2] == 'myword' ? array[-2..-1].join('#') : array[-1]
With regex:
'this#is#an#example'[/(myword\#)*\w+$/]
# => "example"
'this#is#an#example#using#myword#also'[/(myword\#)*\w+$/]
# => "myword#also"
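As a sketch only: the two regex answers above differ slightly (`?` allows one optional myword prefix, `*` allows several, and `\w+` is stricter than `[^#]+`). The helper below is hypothetical (the name `last_segment` and the `keyword` parameter are not from the question); it assumes at most one prefix is wanted and also handles a string with no '#' at all.

```ruby
# Hypothetical helper: returns the last word, or "keyword#last" when the
# second-to-last word equals keyword. Assumes '#' is the only separator.
def last_segment(str, keyword = 'myword')
  # (?:\A|#)  start of string or a separator (not captured)
  # (?:kw#)?  the optional keyword prefix, escaped in case it has regex chars
  # [^#]+\z   the final word, anchored at the very end of the string
  str[/(?:\A|#)((?:#{Regexp.escape(keyword)}#)?[^#]+)\z/, 1]
end

last_segment('this#is#an#example')                   # "example"
last_segment('this#is#an#example#using#myword#also') # "myword#also"
last_segment('single')                               # "single"
```

Whether this is "better" than split plus an if is still a matter of taste; the split version is arguably easier to read.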

Related

Splitting the string with last underscore

I have a string like "a_b_c" or "a_b_c_d" or "a_b_c_d_e". I want to split the string at the last underscore.
**input**
'a_b_c'
**output**
a_b
c
**input**
'a_b_c_d'
**output**
a_b_c
d
I have done the following:
a='a_b_c'
a=a.split('_')
last=a.pop
a.delete(last)
p a.join("_")
p last
and achieved the result, but I don't think this should be done this way. I hope there is some regular expression to achieve this. Is there anyone who can help me with this?
You can use String#rpartition, which searches for a given pattern from the right end of the string and splits where it finds it.
'a_b_c_d_e'.rpartition(/_/)
=> ["a_b_c_d", "_", "e"]
s = 'a_b_c_d_e'
parts = s.rpartition(/_/)
[parts.first, parts.last]
=> ["a_b_c_d", "e"]
EDIT: applying advice from the comments:
'a_b_c_d_e'.rpartition('_').values_at(0,2)
=> ["a_b_c_d", "e"]
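One thing worth knowing about rpartition is what it returns when the separator is missing: two empty strings followed by the whole string. A minimal sketch that handles that case (the name `split_at_last` is made up for illustration):

```ruby
# Hypothetical helper: split at the last occurrence of sep.
# Returns [head, tail], or [str] when sep does not occur.
def split_at_last(str, sep = '_')
  head, match, tail = str.rpartition(sep)
  # rpartition yields ["", "", str] when sep is absent, so match is empty
  match.empty? ? [str] : [head, tail]
end

split_at_last('a_b_c')     # ["a_b", "c"]
split_at_last('a_b_c_d_e') # ["a_b_c_d", "e"]
split_at_last('abc')       # ["abc"]
```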
Do you really need to split? How about just replacing the _ with a space? e.g. using rindex and []=
a[a.rindex('_')] = ' '
I didn't do a benchmark, but split creates a new array, which typically requires more resources, at least in other languages.
EDIT: as the question was edited, it's now clear the OP is asking for a list instead of a string output.
You can also get values as below,
> a = a.split('_')
> a[0..-2].join('_')
# => "a_b_c_d"
> a[-1]
# => "e"
'a_b_c_d_e'.split /_(?!.*_)/
#=> ["a_b_c_d", "e"]
The negative lookahead (?!.*_) requires that following the match of the underscore there is no other underscore in the string.
Split it with regex:
a.split(/_(?=[^_]+$)/)
Explanation:
_ matches a literal underscore.
The positive lookahead (?=[^_]+$) requires that the underscore be followed by one or more characters that are not underscores ([^_]+), and
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any).
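To compare the two lookaround answers above side by side, here is a small sketch (the constant names are mine). They agree on the inputs from the question and only differ when the string ends with an underscore.

```ruby
# Negative lookahead: an underscore with no other underscore after it
NO_LATER_UNDERSCORE = /_(?!.*_)/
# Positive lookahead: an underscore followed only by non-underscores until the end
ONLY_NON_UNDERSCORES_AFTER = /_(?=[^_]+$)/

'a_b_c_d_e'.split(NO_LATER_UNDERSCORE)        # ["a_b_c_d", "e"]
'a_b_c_d_e'.split(ONLY_NON_UNDERSCORES_AFTER) # ["a_b_c_d", "e"]

# With a trailing underscore the positive lookahead finds nothing to split on,
# because [^_]+ needs at least one character after the underscore.
'a_b_'.split(NO_LATER_UNDERSCORE)        # ["a_b"]
'a_b_'.split(ONLY_NON_UNDERSCORES_AFTER) # ["a_b_"]
```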
Assuming you know the string always follows this format (single-character segments):
str = 'a_b_c_d_e'
# Remainder
str[0...-2] # -> 'a_b_c_d'
# Last symbol
str[-1] # -> 'e'

How to split a string in half, into two variables, in one statement?

I want to split str in half and assign each half to first and second
Like this pseudo code example:
first,second = str.split( middle )
class String
  def halves
    chars.each_slice(size / 2).map(&:join)
  end
end
Will work, but you will need to adjust to how you want to handle odd-sized strings.
Or in-line:
first, second = str.chars.each_slice(str.length / 2).map(&:join)
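For odd-sized strings the each_slice version yields three pieces (for 'abcde' the slices are "ab", "cd", "e"), so `first, second = ...` silently drops the last character. One possible adjustment, sketched here with a made-up method name, is to compute the split point yourself and give the extra character to the first half:

```ruby
# Hypothetical helper: always returns exactly two strings.
# Assumption: for odd lengths the first half gets the extra character.
def halves_ceil(str)
  first_len = (str.size / 2.0).ceil
  [str[0, first_len], str[first_len..-1]]
end

first, second = halves_ceil('abcdef') # "abc", "def"
first, second = halves_ceil('abcde')  # "abc", "de"
```

Use `str.size / 2` instead of the ceil if you would rather have the longer half second.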
first,second = str.partition(/.{#{str.size/2}}/)[1,2]
Explanation
You can use partition. Using a regex pattern to look for X amount of characters (in this case str.size / 2).
Partition returns three elements; head, match, and tail. Because we are matching on any character, the head will always be a blank string. So we only care about the match and tail hence [1,2]
Here are two ways to do that
def halves(str)
  # build the regex inside the method so that it can see str
  rgx = /
    (?<=               # begin a positive lookbehind
      \A               # match the beginning of the string
      .{#{str.size/2}} # match any character #{str.size/2} times
    )                  # end positive lookbehind
  /x                   # invoke free-spacing regex definition mode
  str.split(rgx)
end
first, second = halves('abcdef')
#=> ["abc", "def"]
first, second = halves('abcde')
#=> ["ab", "cde"]
The regular expression is conventionally written
/(?<=\A.{#{str.size/2}})/
Note that the regular expression matches a location between two successive characters.
def halves(str)
  [str[0, str.size/2], str[str.size/2..-1]]
end
first, second = halves('abcdef')
#=> ["abc", "def"]
first, second = halves('abcde')
#=> ["ab", "cde"]
Note: This only works with even length strings.
Along the line of your pseudocode,
first, second = string[0...string.length/2], string[string.length/2...string.length]
If string is the original string.

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now; my output is "no duplicates" printed 26 times. I think my if statement is written incorrectly.
string = "abcade"
puts string
for i in ('a'..'z')
  if string =~ /(.)\1/
    puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
  else
    puts "no duplicates"
  end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first pair of adjacent identical characters:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
  # found duplicate; potentially do something with $1
else
  # there is no match
end
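If you would rather not rely on the global $1, String#match returns a MatchData object carrying the same information. A minimal sketch (the method name and the hash it returns are just for illustration):

```ruby
# Hypothetical helper: position and character of the first adjacent pair,
# or nil when no two neighbouring characters are equal.
def first_adjacent_duplicate(str)
  m = str.match(/(.)\1/)
  # m.begin(0) is the index where the pair starts, m[1] is the captured character
  m && { index: m.begin(0), char: m[1] }
end

first_adjacent_duplicate('1234556') # {index: 4, char: "5"}
first_adjacent_duplicate('abcade')  # nil, since the two a's are not adjacent
```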
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
The method below returns the word that occurs most often in a string (the earliest such word on a tie), which may be useful for finding a repeated word:
def firstRepeatedWord(string)
  h_data = Hash.new(0)
  string.split(" ").each{|x| h_data[x] +=1}
  h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by @DaveNewton:
require 'set'
def first_repeat_char(str)
  str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
  nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero or more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
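To make the difference between the two interpretations concrete, here is a regex-free sketch of both (method names are mine, not from the answers above). They agree on "abcdebf" but not on "abcdcbe".

```ruby
require 'set'

# Interpretation 1: the first character that has already been seen earlier.
def first_char_seen_before(str)
  seen = Set.new
  # Set#add? returns nil when the element is already in the set
  str.each_char.find { |c| !seen.add?(c) }
end

# Interpretation 2: the first character that occurs more than once anywhere.
def first_char_that_repeats(str)
  counts = Hash.new(0)
  str.each_char { |c| counts[c] += 1 }
  str.each_char.find { |c| counts[c] > 1 }
end

first_char_seen_before('abcdcbe')  # "c" (the second c arrives before the second b)
first_char_that_repeats('abcdcbe') # "b" (b is the earliest character with a repeat)
```

Both make a fixed number of passes over the string, unlike the variants that call s.count(c) for every character.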
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

How do I remove a substring after a certain character in a string using Ruby?

How do I remove a substring after a certain character in a string using Ruby?
new_str = str.slice(0..(str.index('blah')))
I find that "Part1?Part2".split('?')[0] is easier to read.
I'm surprised nobody suggested to use 'gsub'
irb> "truncate".gsub(/a.*/, 'a')
=> "trunca"
The bang version of gsub can be used to modify the string.
str = "Hello World"
stopchar = 'W'
str.sub /#{stopchar}.+/, stopchar
#=> "Hello W"
A special case is if you have multiple occurrences of the same character and you want to delete from the last occurrence to the end (not the first one).
Following what Jacob suggested, you just have to use rindex instead of index as rindex gets the index of the character in the string but starting from the end.
Something like this:
str = '/path/to/some_file'
puts str.slice(0, str.index('/')) # => ""
puts str.slice(0, str.rindex('/')) # => "/path/to"
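Note that index and rindex return nil when the character is not found, and slice(0, nil) raises a TypeError. A sketch that guards against that (the method name and the `last:` option are hypothetical; unlike the slice calls above, this version keeps the character itself and drops only what follows it):

```ruby
# Hypothetical helper: keep everything up to and including char.
# last: true uses the last occurrence instead of the first.
# Returns the string unchanged when char does not occur.
def keep_through(str, char, last: false)
  i = last ? str.rindex(char) : str.index(char)
  i ? str[0..i] : str
end

keep_through('Part1?Part2', '?')                    # "Part1?"
keep_through('/path/to/some_file', '/', last: true) # "/path/to/"
keep_through('no separator here', '?')              # "no separator here"
```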
We can also use partition and rpartition, depending on whether we want to use the first or last instance of the specified character:
string = "abc-123-xyz"
last_char = "-"
string.partition(last_char)[0..1].join #=> "abc-"
string.rpartition(last_char)[0..1].join #=> "abc-123-"

How to replace the last occurrence of a substring in ruby?

I want to replace the last occurrence of a substring in Ruby. What's the easiest way?
For example, in abc123abc123, I want to replace the last abc with ABC. How do I do that?
How about
new_str = old_str.reverse.sub(pattern.reverse, replacement.reverse).reverse
For instance:
irb(main):001:0> old_str = "abc123abc123"
=> "abc123abc123"
irb(main):002:0> pattern="abc"
=> "abc"
irb(main):003:0> replacement="ABC"
=> "ABC"
irb(main):004:0> new_str = old_str.reverse.sub(pattern.reverse, replacement.reverse).reverse
=> "abc123ABC123"
"abc123abc123".gsub(/(.*(abc.*)*)(abc)(.*)/, '\1ABC\4')
#=> "abc123ABC123"
But probably there is a better way...
Edit:
...which Chris kindly provided in the comment below.
So, as * is a greedy operator, the following is enough:
"abc123abc123".gsub(/(.*)(abc)(.*)/, '\1ABC\3')
#=> "abc123ABC123"
Edit2:
There is also a solution which neatly illustrates parallel array assignment in Ruby:
*a, b = "abc123abc123".split('abc', -1)
a.join('abc')+'ABC'+b
#=> "abc123ABC123"
Since Ruby 2.0 we can use \K which removes any text matched before it from the returned match. Combine with a greedy operator and you get this:
'abc123abc123'.sub(/.*\Kabc/, 'ABC')
#=> "abc123ABC123"
This is about 1.4 times faster than using capturing groups as Hirurg103 suggested, but that speed comes at the cost of lowering readability by using a lesser-known pattern.
more info on \K: https://www.regular-expressions.info/keep.html
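If the substring comes from a variable, a sketch along these lines may be safer. The method name is made up, and it assumes you want a literal (non-regex) match: Regexp.escape protects characters like '.', the /m flag lets `.*` run across newlines, and the block form of sub inserts the replacement verbatim, so backslash sequences in it are not treated as backreferences.

```ruby
# Hypothetical helper: replace the last literal occurrence of substring.
def replace_last(str, substring, replacement)
  # .* is greedy, so it backtracks to the LAST place where substring matches;
  # \K then drops the .* part from the text that gets replaced.
  str.sub(/.*\K#{Regexp.escape(substring)}/m) { replacement }
end

replace_last('abc123abc123', 'abc', 'ABC') # "abc123ABC123"
replace_last("a.b\na.b", 'a.b', 'X')       # "a.b\nX"
replace_last('abc', 'x', 'y')              # "abc" (no occurrence, unchanged)
```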
Here's another possible solution:
>> s = "abc123abc123"
=> "abc123abc123"
>> s[s.rindex('abc')...(s.rindex('abc') + 'abc'.length)] = "ABC"
=> "ABC"
>> s
=> "abc123ABC123"
When searching in huge streams of data, using reverse will definitely* lead to performance issues. I use string.rpartition*:
sub_or_pattern = "!"
replacement = "?"
string = "hello!hello!hello"
array_of_pieces = string.rpartition sub_or_pattern
( array_of_pieces[(array_of_pieces.find_index sub_or_pattern)] = replacement ) rescue nil
p array_of_pieces.join
# "hello!hello?hello"
The same code must work with a string with no occurrences of sub_or_pattern:
string = "hello_hello_hello"
# ...
# "hello_hello_hello"
*rpartition uses rb_str_subseq() internally. I didn't check if that function returns a copy of the string, but I think it preserves the chunk of memory used by that part of the string. reverse uses rb_enc_cr_str_copy_for_substr(), which suggests that copies are done all the time -- although maybe in the future a smarter String class may be implemented (having a flag reversed set to true, and having all of its functions operating backwards when that is set), as of now, it is inefficient.
Moreover, Regex patterns can't be simply reversed. The question only asks for replacing the last occurrence of a sub-string, so that's OK, but readers in need of something more robust won't benefit from the most voted answer (as of this writing).
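For readers who do need a real regex pattern rather than a fixed sub-string, one possible approach (a sketch, not benchmarked, with a made-up method name) is to walk the matches from left to right with Regexp#match and its start-position argument, remember the last one, and rebuild the string around it:

```ruby
# Hypothetical helper: replace the last non-overlapping match of pattern.
# Returns a copy of str when the pattern never matches.
def replace_last_match(str, pattern, replacement)
  last = nil
  pos = 0
  while (m = pattern.match(str, pos))
    last = m
    # continue after this match; step one extra character on an empty match
    # so that the loop cannot get stuck
    pos = m.end(0) + (m[0].empty? ? 1 : 0)
    break if pos > str.size
  end
  last ? last.pre_match + replacement + last.post_match : str.dup
end

replace_last_match('abc123abc123', /[a-c]+/, 'ABC') # "abc123ABC123"
replace_last_match('x1y22z333', /\d+/, 'N')         # "x1y22zN"
```

No reversing is involved, so it works with any pattern, at the cost of scanning every match.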
You can achieve this with String#sub and greedy regexp .* like this:
'abc123abc123'.sub(/(.*)abc/, '\1ABC')
simple and efficient:
s = "abc123abc123abc"
p = "123"
s.slice!(s.rindex(p), p.size)
s == "abc123abcabc"
string = "abc123abc123"
pattern = /abc/
replacement = "ABC"
matches = string.scan(pattern).length
index = 0
string.gsub(pattern) do |match|
  index += 1
  index == matches ? replacement : match
end
#=> abc123ABC123
I've used this handy helper method quite a bit:
def gsub_last(str, source, target)
  return str unless str.include?(source)
  top, middle, bottom = str.rpartition(source)
  "#{top}#{target}#{bottom}"
end
If you want to make it more Rails-y, extend it on the String class itself:
class String
  def gsub_last(source, target)
    return self unless self.include?(source)
    top, middle, bottom = self.rpartition(source)
    "#{top}#{target}#{bottom}"
  end
end
Then you can just call it directly on any String instance, eg "fooBAR123BAR".gsub_last("BAR", "FOO") == "fooBAR123FOO"
.gsub /abc(?=[^abc]*$)/, 'ABC'
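Matches "abc" and then asserts (with the positive lookahead (?=...)) that none of the characters a, b, or c occur between the match and the end of the string. Because [^abc] is a character class rather than a negated substring, this works for the example but misses the last "abc" whenever a stray a, b, or c follows it.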
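A character class such as [^abc] excludes single characters, not the substring "abc". If that matters for your data, a negative lookahead on the whole substring is one alternative. A small sketch comparing the two (the constant names are mine):

```ruby
# Lookahead from the answer above: no a, b or c may follow the match
CHAR_CLASS_VERSION = /abc(?=[^abc]*$)/
# Alternative: no further "abc" may follow (/m lets . cross newlines)
NEGATIVE_LOOKAHEAD_VERSION = /abc(?!.*abc)/m

'abc123abc123'.sub(CHAR_CLASS_VERSION, 'ABC')         # "abc123ABC123"
'abc123abc123'.sub(NEGATIVE_LOOKAHEAD_VERSION, 'ABC') # "abc123ABC123"

# A lone "b" after the last "abc" defeats the character-class version
'abc123abcb'.sub(CHAR_CLASS_VERSION, 'ABC')         # "abc123abcb" (unchanged)
'abc123abcb'.sub(NEGATIVE_LOOKAHEAD_VERSION, 'ABC') # "abc123ABCb"
```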
