Find the longest substring in a string - ruby

I would like to find the longest sequence of repeated characters in a string.
ex:
"aabbccc" #=> ccc
"aabbbddccdddd" #=> dddd
etc
In the first example, ccc is the longest sequence because c is repeated 3 times. In the second example, dddd is the longest sequence because d is repeated 4 times.
It should be something like this:
b = []
a.scan(/(.)(.)(.)/) do |x,y,z|
  b<<x<<y<<z if x==y && y==z
end
but with some flags to keep the count of repeating, I guess

This should work:
string = 'aabbccc'
string.chars.chunk {|a| a}.max_by {|_, ary| ary.length}.last.join
Update:
Explanation of |_, ary|: at this point we have an array of 2-element arrays. We only need the second element of each pair and ignore the first. If we wrote |char, ary| instead, some IDEs would complain about an unused local variable. Naming the parameter _ is the Ruby convention for a value that is intentionally unused.
Using regex:
We can achieve the same thing with a regex:
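To make the intermediate structure concrete, here is a small sketch of what chunk produces for the example string before max_by picks the longest run. The variable names runs and longest are only for illustration.

```ruby
string = 'aabbccc'

# chunk groups consecutive equal characters into [key, members] pairs
runs = string.chars.chunk { |c| c }.to_a
# runs is [["a", ["a", "a"]], ["b", ["b", "b"]], ["c", ["c", "c", "c"]]]

# max_by compares the member arrays by length; `_` marks the unused key
longest = runs.max_by { |_, ary| ary.length }.last.join
puts longest # prints "ccc"
```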
string.scan(/([a-z])(\1*)/).map(&:join).max_by(&:length)

Here's a solution using a regular expression:
LETTER_MATCH = Regexp.new(('a'..'z').collect do |letter|
  "#{letter}+"
end.join('|'))
def repeated(string)
  string.scan(LETTER_MATCH).sort_by(&:length).last
end

Here's another solution. It's longer, but it still works.
def most_frequent_char_in_a_row(my_str)
  my_str = my_str.chars
  temp=[]
  ary=[]
  for i in 0..my_str.count-1
    if my_str[i]==my_str[i+1]
      temp<<my_str[i] unless temp.include?(my_str[i])
      temp<<my_str[i+1]
    else
      ary<<temp
      temp=[]
    end
  end
  result = ary.max_by(&:size).join
  p "#{result} - #{result.size}"
end

Related

Space before first word after .join array to string

To convert an array to a string I used Array#join and got a space at the beginning of the string, between the opening quote mark and the first word. I do not understand why this is happening.
I resolved it with String#strip, but I would like to understand the cause.
def order(words)
  arr_new = []
  arr = words.split(" ")
  nums = ["1","2","3","4","5","6","7","8","9"]
  arr.each do |word|
    nums.each do |num|
      if word.include? num
        arr_new[num.to_i] = word
      end
    end
  end
  arr_new.join(" ").strip
end
order("is2 Thi1s T4est 3a")
Without .strip the output is:
" Thi1s is2 3a T4est"
After .strip:
"Thi1s is2 3a T4est"
The reason you're seeing the extra space is that arrays in Ruby are zero-indexed, so you end up with a nil element at index 0 because your first insert is at index 1. join converts that nil to an empty string, which leaves a leading space.
x = []
x[1] = "test"
This creates an array as such:
[
nil,
"test"
]
If you created an empty array named x and assigned x[10] = "test" you'd have 10 nil values, and the word "test" in your array.
So, your array, before joining, is actually:
[nil, "Thi1s", "is2", "3a", "T4est"]
You have a couple options:
Change your strings to start with zero
Change your assignment to adjust the offset (subtract one)
Use compact before you join (this will remove nils)
Use strip as you noted
I'd suggest compact because it would address a few edge cases (such as "gaps" in your numbers).
More info in the array docs
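As a rough illustration of the compact suggestion, here is a minimal sketch. It keeps the index-by-digit idea from the question but drops the nil placeholders before joining; the variable name slots and the sample inputs are assumptions made for the example.

```ruby
def order(words)
  slots = []
  words.split.each do |word|
    digit = word[/[1-9]/]            # first digit 1-9 in the word, or nil
    slots[digit.to_i] = word if digit
  end
  slots.compact.join(" ")            # compact removes the nil placeholders
end

x = []
x[1] = "test"
p x                            # prints [nil, "test"]
p order("is2 Thi1s T4est 3a")  # prints "Thi1s is2 3a T4est"
p order("b3 a1")               # prints "a1 b3"
```

Because compact also removes nils in the middle of the array, gaps in the numbering (as in the last call) do not produce doubled spaces.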
@Jay's explanation is indeed correct.
I'll simply suggest a cleaner version of your code that doesn't have the same problem.
This assumes that the 1-9 order isn't dynamic; it wouldn't work if you wanted to sort by arbitrary characters, for example.
def order(words)
  words.split.sort_by { |word| word[/\d/].to_i }.join ' '
end

Determine if the end of a string overlaps with beginning of a separate string

I want to find if the ending of a string overlaps with the beginning of separate string. For example if I have these two strings:
string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
How do I find that the "but I" part at the end of string_1 is the same as the beginning of string_2?
I could write a method to loop over the two strings, but I'm hoping for an answer that has a Ruby string method that I missed or a Ruby idiom.
Set MARKER to some string that never appears in your string_1 and string_2. There are ways to do that dynamically, but I assume you can come up with some fixed such string in your case. I assume:
MARKER = "###"
to be safe for your case. Change it depending on your use case. Then,
string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
(string_1 + MARKER + string_2).match?(/(.+)#{MARKER}\1/) # => true
string_1 = 'People say nothing is impossible, but I'
string_2 = 'but you do nothing every day.'
(string_1 + MARKER + string_2).match?(/(.+)#{MARKER}\1/) # => false
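If you also want the overlapping text rather than just true or false, the same pattern can be used with a capture group. This is only a sketch: overlap_via_marker is a hypothetical helper name, and it still assumes the marker never occurs in either string. Regexp.escape is added so that a marker containing regex metacharacters is matched literally.

```ruby
# Hypothetical helper: returns the overlapping text, or nil if there is none.
# Assumes `marker` never occurs in either input string.
def overlap_via_marker(string_1, string_2, marker = "###")
  # (.+) must end right before the marker and repeat right after it
  pattern = /(.+)#{Regexp.escape(marker)}\1/m
  (string_1 + marker + string_2)[pattern, 1]
end

p overlap_via_marker('People say nothing is impossible, but I',
                     'but I do nothing every day.')   # prints "but I"
p overlap_via_marker('People say nothing is impossible, but I',
                     'but you do nothing every day.') # prints nil
```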
You can use a simple loop and test at the end:
a=string_1.split(/\b/)
idx=0
while (idx<=a.length) do
  break if string_2.start_with?(a[idx..-1].join)
  idx+=1
end
p a[idx..-1].join if idx<a.length
Since this starts at 0, the longest substring overlap is found.
You can use the same logic in a .detect block on the same array:
> a[(0..a.length).detect { |idx| string_2.start_with?(a[idx..-1].join) }..-1].join
=> "but I"
Or, as pointed out in the comments, you can work on the string directly instead of the array:
string_1[(0..string_1.length).detect { |idx| string_2.start_with?(string_1[idx..-1]) }..-1]
Here's a solution that compares the end of string_1 to the start of string_2, starting with the longest possible overlap (the length of the shorter string) and working down to a single character. It returns the length of the overlap, or nil if there is none; that length can be used to extract the matching portion from the end of string_1 or the beginning of string_2.
class String
  def oindex(other)
    [length, other.length].min.downto(1).detect do |i|
      end_with?(other[0, i])
    end
  end
end
string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
if (idx = string_1.oindex(string_2))
  puts "Last #{idx} characters match: #{string_1[-idx..-1]}"
end
Here's an alternative that finds all the indexes of the first character of the other string in the string, and uses those indexes as starting points to check for matches:
class String
  def each_index(other)
    return enum_for(__callee__, other) unless block_given?
    i = -1
    yield i while i = index(other, i.succ)
  end
  def oindex(other)
    each_index(other.chr).detect do |i|
      other.start_with?(self[i..-1]) and break length - i
    end
  end
end
This should be more efficient than checking every index, especially on longer strings with shorter matches, but I haven't benchmarked it.
Here are a couple of ways to do that. The first converts the two strings to arrays and then compares sequences from those arrays. The second operates on the two strings directly, comparing substrings.
#1 Convert strings to arrays and compare sequences from those arrays
Here's a simple alternative that requires the strings to be converted to arrays of words. It assumes all pairs of words are separated by one space.
def begins_with_ends?(end_str, begin_str)
  end_arr = end_str.split
  begin_arr = begin_str.split
  !!begin_arr.each_index.find { |i| begin_arr[0,i+1] == end_arr[-1-i..-1] }
end
!!obj converts obj to false when it's "falsy" (nil or false) and to true when it's "truthy" (not "falsy"). For example, !!3 #=> true and !!nil #=> false.
end_str = 'People say nothing is impossible, but I when I'
begin_str = 'but I when I do nothing every day.'
begins_with_ends?(end_str, begin_str)
#=> true
Here the match is on the second word "I" in begin_str. Often, however, the last word of end_str only matches (at most) a single word in begin_str.
#2 Compare substrings
I've implemented the following algorithm.
1. Set start_idx to 0.
2. Attempt to match the last word of end_str (the value of target) in begin_str, beginning at offset start_idx. If no match is found, return false; else let idx be the offset in begin_str just past the last character of the matched target.
3. Return true if the string comprising the first idx characters of begin_str equals the string comprising the last idx characters of end_str; else set start_idx = idx + 2 and repeat step 2.
def begins_with_ends?(end_str, begin_str)
  target = end_str[/[[:alnum:]]+\z/]
  start_idx = 0
  loop do
    idx = begin_str.index(/\b#{target}\b/, start_idx)
    return false if idx.nil?
    idx += target.size
    return true if end_str[-idx..-1] == begin_str[0, idx]
    start_idx = idx + 2
  end
end
begins_with_ends?(end_str, begin_str)
#=> true
This approach recognizes different numbers of spaces between the same two words in both strings (in which case there is no match).
Perhaps something like this would meet your needs?
string_1.split(' ') - string_2.split(' ')
=> ["People", "say", "is", "impossible,"]
Or this is more convoluted, but would give you the exact overlap:
string_2.
  chars.
  each_with_index.
  map { |_, i| string_2[0..i] }.
  select { |prefix| string_1.end_with?(prefix) }.
  max_by(&:length).
  to_s
=> "but I"

Counting words in Ruby with some exceptions

Say that we want to count the number of words in a document. I know we can do the following:
text.each_line(){ |line| totalWords = totalWords + line.split.size }
Say, that I just want to add some exceptions, such that, I don't want to count the following as words:
(1) numbers
(2) standalone letters
(3) email addresses
How can we do that?
Thanks.
You can wrap this up pretty neatly:
total_words = 0
text.each_line do |line|
  total_words += line.split.reject do |word|
    word.match(/\A(\d+|\w|\S*@\S+\.\S+)\z/)
  end.length
end
Roughly speaking that defines an approximate email address.
Remember Ruby strongly encourages the use of variables with names like total_words and not totalWords.
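For reference, here is a self-contained sketch of the same idea wrapped in a method. The name count_words and the sample text are made up for illustration, and the pattern assumes "@" is the separator in an email address; it is a rough filter, not a real email validator.

```ruby
# Hypothetical helper: counts words, skipping numbers, single characters,
# and anything that roughly looks like an email address.
def count_words(text)
  excluded = /\A(\d+|\w|\S*@\S+\.\S+)\z/
  text.each_line.sum do |line|
    line.split.count { |word| !word.match?(excluded) }
  end
end

sample = "I have 2 cats\nmail me at bob@example.com now\n"
p count_words(sample) # prints 6
```

In the sample, "I" and "2" are skipped on the first line and the address on the second, leaving 2 + 4 counted words.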
Assuming you can represent all the exceptions in a single regular expression regex_variable, you could do:
text.each_line(){ |line| totalWords = totalWords + line.split.count {|wrd| wrd !~ regex_variable } }
Your regular expression could look something like:
regex_variable = /\A\d+\z|\A[a-z]\z|\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
I don't claim to be a regex expert, so you may want to double-check that, particularly the email validation part.
In addition to the other answers, a little gem hunting came up with this:
WordsCounted Gem
Get the following data from any string or readable file:
Word count
Unique word count
Word density
Character count
Average characters per word
A hash map of words and the number of times they occur
A hash map of words and their lengths
The longest word(s) and its length
The most occurring word(s) and its number of occurrences.
Count individual strings for occurrences.
A flexible way to exclude words (or anything) from the count. You can pass a string, a regexp, an array, or a lambda.
Customisable criteria. Pass your own regexp rules to split strings if you prefer. The default regexp has two features:
Filters special characters but respects hyphens and apostrophes.
Plays nicely with diacritics (UTF and unicode characters): "São Paulo" is treated as ["São", "Paulo"] and not ["S", "", "o", "Paulo"].
Opens and reads files. Pass in a file path or a url instead of a string.
Have you ever started answering a question and found yourself wandering, exploring interesting, but tangential issues, or concepts you didn't fully understand? That's what happened to me here. Perhaps some of the ideas might prove useful in other settings, if not for the problem at hand.
For readability, we might define some helpers in the class String, but to avoid contamination, I'll use Refinements.
Code
module StringHelpers
  refine String do
    def count_words
      remove_punctuation.split.count { |w|
        !(w.is_number? || w.size == 1 || w.is_email_address?) }
    end
    def remove_punctuation
      gsub(/[.!?,;:)](?:\s|$)|(?:^|\s)\(|\-|\n/,' ')
    end
    def is_number?
      self =~ /\A-?\d+(?:\.\d+)?\z/
    end
    def is_email_address?
      include?('#') # for testing only
    end
  end
end
module CountWords
  using StringHelpers
  def self.count_words_in_file(fname)
    IO.foreach(fname).reduce(0) { |t,l| t+l.count_words }
  end
end
Note that using is called inside a module here. According to the Ruby refinements documentation, refinements can be activated at the top level of a file and inside class and module bodies, but not inside a method.
Example
Let's first informally check that the helpers are working correctly:
module CheckHelpers
  using StringHelpers
  s = "You can reach my dog, a 10-year-old golden, at fido#dogs.org."
  p s = s.remove_punctuation
    #=> "You can reach my dog a 10 year old golden at fido#dogs.org "
  p words = s.split
    #=> ["You", "can", "reach", "my", "dog", "a", "10",
    #    "year", "old", "golden", "at", "fido#dogs.org"]
  p '123'.is_number? #=> 0
  p '-123'.is_number? #=> 0
  p '1.23'.is_number? #=> 0
  p '123.'.is_number? #=> nil
  p "fido#dogs.org".is_email_address? #=> true
  p "fido(at)dogs.org".is_email_address? #=> false
  p s.count_words #=> 9 (`'a'`, `'10'` and "fido#dogs.org" excluded)
  s = "My cat, who has 4 lives remaining, is at abbie(at)felines.org."
  p s = s.remove_punctuation
  p s.count_words
end
All looks OK. Next, I'll put some text in a file:
FName = "pets"
text =<<_
My cat, who has 4 lives remaining, is at abbie(at)felines.org.
You can reach my dog, a 10-year-old golden, at fido#dogs.org.
_
File.write(FName, text)
#=> 125
and confirm the file contents:
File.read(FName)
#=> "My cat, who has 4 lives remaining, is at abbie(at)felines.org.\n
# You can reach my dog, a 10-year-old golden, at fido#dogs.org.\n"
Now, count the words:
CountWords.count_words_in_file(FName)
#=> 18 (9 in each line)
Note that there is at least one problem with the removal of punctuation. It has to do with the hyphen. Any idea what that might be?
Something like...?
def is_countable(word)
  return false if word.size < 2
  return false if word =~ /^[0-9]+$/
  return false if is_an_email_address(word) # you need a gem for this...
  return true
end
wordCount = text.split.count { |word| is_countable(word) }
Or, since I am jumping to the conclusion that you can just split your entire text into an array with split(), you might need:
wordCount = 0
text.each_line do |line|
  line.split.each{|word| wordCount += 1 if is_countable(word) }
end

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now. It prints "no duplicates" 26 times. I think my if statement is written incorrectly.
string = "abcade"
puts string
for i in ('a'..'z')
  if string =~ /(.)\1/
    puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
  else
    puts "no duplicates"
  end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first duplicate character:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
  # found duplicate; potentially do something with $1
else
  # there is no match
end
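If you would rather not rely on the global $1, a variant using String#match and the returned MatchData could look like the sketch below. The method name first_adjacent_duplicate is made up for the example.

```ruby
# Hypothetical helper: returns the first character that is immediately
# followed by itself, or nil if no such character exists.
def first_adjacent_duplicate(str)
  match = str.match(/(.)\1/)
  match && match[1]
end

p first_adjacent_duplicate('1234556') # prints "5"
p first_adjacent_duplicate('abcade')  # prints nil
```

Note that "abcade" yields nil here: the letter a occurs twice, but never twice in a row, which is why this pattern reports no duplicate for that input.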
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland:
s.each_char.find { |c| s.count(c) > 1 }
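Both forms above call s.count once per character, which rescans the string each time. For long strings, a possible alternative is to count every character once up front. This sketch assumes Ruby 2.7 or later for Enumerable#tally, and the method name is hypothetical.

```ruby
# Hypothetical helper: first character (in string order) that occurs
# more than once anywhere in the string, or nil if all are unique.
def first_char_appearing_twice(s)
  counts = s.each_char.tally          # e.g. {"a"=>2, "b"=>1, ...}
  s.each_char.find { |c| counts[c] > 1 }
end

p first_char_appearing_twice("abcade") # prints "a"
p first_char_appearing_twice("abc")    # prints nil
```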
The method below might be useful for finding the most frequently repeated word in a string (the earliest such word if there is a tie):
def firstRepeatedWord(string)
  h_data = Hash.new(0)
  string.split(" ").each{|x| h_data[x] +=1}
  h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by @DaveNewton:
require 'set'
def first_repeat_char(str)
  str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
  nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
    (.) # match any character in capture group #1
    .*  # match any character zero or more times
    ?   # do the preceding lazily
    \K  # forget everything matched so far
    \1  # match the contents of capture group 1
    /x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'd use a positive lookahead with the String#[] method:
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

Randomly replace letters in word

I tried to write a function which will be able to randomly change letters in word except first and last one.
def fun(string)
  z=0
  s=string.size
  tab=string
  a=(1...s-1).to_a.sample s-1
  for i in 1...(s-1)
    puts tab[i].replace(string[a[z]])
    z=z+1
  end
  puts tab
end
fun("sample")
My output is:
p
l
a
m
sample
Does anybody know how to make tab come out correctly?
It seems to change inside the for block, because the output was 'plamp', so it's random as I wanted, but when I print the whole word (expecting something like splampe) it doesn't work. :(
What about:
def fun(string)
  first, *middle, last = string.chars
  [first, middle.shuffle, last].join
end
fun("sample") #=> "smalpe"
s = 'sample'
[s[0], s[1..-2].chars.shuffle, s[-1]].join
# => "slpmae"
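Since shuffle is random, the output differs from run to run. If you need repeatable results (for a test, say), Array#shuffle accepts a random: keyword. The sketch below is one way to use it; jumble_middle and its rng parameter are names invented for this example, and the early return for short words is an added assumption about the desired behaviour.

```ruby
# Hypothetical helper: shuffles everything except the first and last
# characters. `rng` can be injected to make the result reproducible.
def jumble_middle(word, rng = Random.new)
  return word if word.length <= 3   # at most one middle character, nothing to rearrange
  word[0] + word[1..-2].chars.shuffle(random: rng).join + word[-1]
end

jumbled = jumble_middle("sample", Random.new(42))
puts jumbled
```

Calling it again with a fresh Random.new(42) should give the same string, while omitting the second argument gives a different arrangement on each run.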
Here is my solution:
def fun(string)
  first = string[0]
  last = string[-1]
  middle = string[1..-2]
  puts "#{first}#{middle.split('').shuffle.join}#{last}"
end
fun('sample')
There are some problems with your function. First, when you say tab=string, tab becomes a reference to the same object as string, so when you change characters in tab you change the characters of string too. For clarity, I think it is better to keep the indexes from sample (1...s-1) to reference positions in the original string.
I suggest the usage of tab as a new array.
def fun(string)
  return string if string.length <= 2
  z=1
  s=string.size
  tab = []
  tab[0] = string[0]
  a=(1...s-1).to_a.sample(s-1)
  (1...s-1).to_a.each do |i|
    tab[z] = string[a[i - 1]]
    z=z+1
  end
  tab.push string[string.size-1]
  tab.join('')
end
fun("sample")
=> "spalme"
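The aliasing issue mentioned at the start of this answer (tab=string making both names point at one object) can be seen in isolation with a short sketch. The variable names are only for illustration.

```ruby
original = +"sample"   # unary plus gives a mutable string even under frozen_string_literal
tab = original         # same object, not a copy
tab[1] = "X"
p original             # prints "sXmple"
p tab.equal?(original) # prints true

copy = original.dup    # dup creates an independent copy
copy[2] = "Y"
p original             # prints "sXmple"
p copy                 # prints "sXYple"
```

Building a new array, as the function above does, avoids the problem in the same way dup does: the original string is only ever read, never modified.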
Another way, using String#gsub with a block:
def inner_jumble(str)
  str.gsub(/(?<=\w)\w{2,}(?=\w)/) { |s| s.chars.shuffle.join }
end
inner_jumble("pneumonoultramicroscopicsilicovolcanoconiosis") # *
#=> "poovcanaiimsllinoonroinuicclprsciscuoooomtces"
inner_jumble("what ho, fellow coders?")
#=> "waht ho, folelw coedrs?"
(?<=\w) is a ("zero-width") positive look-behind that requires the match to immediately follow a word character.
(?=\w) is a ("zero-width") positive look-ahead that requires the match to be followed immediately by a word character.
You could use \w\w+ in place of \w{2,} for matching two or more consecutive word characters.
If you only want it to apply to individual words, you can use gsub or sub.
*A lung disease caused by inhaling very fine ash and sand dust, supposedly the longest word in some English dictionaries.

Resources