Finding Longest Substring No Duplicates - Help Optimizing Code [Ruby]

So I've been trying to solve a Leetcode Question, "Given a string, find the length of the longest substring without repeating characters."
For example
Input: "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length of 3.
Currently I optimized my algorithm when it comes to figuring out if the substring is unique by using a hash table. However my code still runs in O(n^2) runtime, and as a result exceeds the time limit during submissions.
What I'm trying to do is essentially go through every possible substring and check whether it has any duplicate characters. Am I as efficient as it gets when it comes to the brute force method here? I know there are other methods, such as the sliding window method, but I'm trying to get the brute force method down first.
# @param {String} s
# @return {Integer}
def length_of_longest_substring(s)
max_length = 0
max_string = ""
n = s.length
for i in (0..n-1)
for j in (i..n-1)
substring = s[i..j]
#puts substring
if unique(substring)
if substring.length > max_length
max_length = substring.length
max_string = substring
end
end
end
end
return max_length
end
def unique(string)
hash = Hash.new(false)
array = string.split('')
array.each do |char|
if hash[char] == true
return false
else
hash[char] = true
end
end
return true
end

Approach
Here is a way of doing that with a hash that maps characters to indices. For a string s, suppose the characters in the substring s[j..j+n-1] are unique, and therefore the substring is a candidate for the longest unique substring. The next element is therefore e = s[j+n]. We wish to determine if s[j..j+n-1] includes e. If it does not, we can append e to the substring, keeping it unique.
If s[j..j+n-1] includes e, we determine if n (the size of the substring) is greater than the length of the previously-known longest substring, and update our records if it is. To determine if s[j..j+n-1] includes e, we could perform a linear search of the substring, but it is faster to maintain a hash c_to_i whose key-value pairs are s[i]=>i, i = j..j+n-1. That is, c_to_i maps the characters in the substring to their indices in the full string s. That way we can merely evaluate c_to_i.key?(e) to see if the substring contains e. If the substring includes e, we use c_to_i to determine its index in s and add one: j = c_to_i[e] + 1. The new substring is therefore s[j..j+n-1] with the new value of j. Note that several characters of s may be skipped in this step.
Regardless of whether the substring contained e, we must now append e to the (possibly-updated) substring, so that it becomes s[j..j+n].
Code
def longest_no_repeats(str)
c_to_i = {}
longest = { length: 0, end: nil }
str.each_char.with_index do |c,i|
j = c_to_i[c]
if j
longest = { length: c_to_i.size, end: i-1 } if
c_to_i.size > longest[:length]
c_to_i.reject! { |_,k| k <= j }
end
c_to_i[c] = i
end
c_to_i.size > longest[:length] ? { length: c_to_i.size, end: str.size-1 } :
longest
end
Example
a = ('a'..'z').to_a
#=> ["a", "b",..., "z"]
str = 60.times.map { a.sample }.join
#=> "ekgdaxxzlwbxixhlfbpziswcoelplhobivoygmupdaexssbuuawxmhprkfms"
longest = longest_no_repeats(str)
#=> {:length=>14, :end=>44}
str[0..longest[:end]]
#=> "ekgdaxxzlwbxixhlfbpziswcoelplhobivoygmupdaexs"
str[longest[:end]-longest[:length]+1,longest[:length]]
#=> "bivoygmupdaexs"
Efficiency
Here is a benchmark comparison to @mechnicov's code:
require 'benchmark/ips'
a = ('a'..'z').to_a
arr = 50.times.map { 1000.times.map { a.sample }.join }
Benchmark.ips do |x|
x.report("mechnicov") { arr.sum { |s| max_non_repeated(s)[:length] } }
x.report("cary") { arr.sum { |s| longest_no_repeats(s)[:length] } }
x.compare!
end
displays:
Comparison:
cary: 35.8 i/s
mechnicov: 0.0 i/s - 1198.21x slower

From your link:
Input: "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3.
That means you need the first of the longest substrings without repeated characters.
I suggest a method like this:
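def max_non_repeated(string)
  max_string = string.
    each_char.
    map.with_index { |_, i| string[i..].split('') }.
    map do |v|
      # collect characters from this start until a duplicate has been appended
      ary = []
      v.each { |l| ary << l if ary.size == ary.uniq.size }
      ary.uniq.join
    end.
    max_by(&:length) || '' # longest candidate (plain max would compare alphabetically)
  {
    string: max_string,
    length: max_string.length
  }
end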
max_non_repeated('pwwkew')[:string] #=> "wke"
max_non_repeated('pwwkew')[:length] #=> 3
In Ruby < 2.6 use [i..-1] instead of [i..]

Related

Ruby - Find the longest non-repeating substring in any given string

I am working on an assignment where I have to take user input of a string and search through it to find the longest non-repeating string in it. So for example:
If the string is:
"abcabcabcdef"
My output needs to be:
"abcdef is the longest substring at the value of 6 characters"
Here is my poorly made code:
class Homework_4
puts "Enter any string of alphabetical characters: "
user_input = gets
longest_str = 0
empty_string = ""
map = {}
i = 0
j = 0
def long_substr()
while j < str_len
if map.key?(user_input[j])
i = [map[user_input[j]], i].max
end
longest_str = [longest_str, j - i + 1].max
map[user_input[j]] = j + 1
j += 1
end
longest_str
end
long_substr(user_input)
end
I have been working on this for over 6 hours today and I just can't figure it out. It seems like the internet has many ways to do it. Almost all of them confuse me greatly and don't really explain what they're doing. I don't understand the syntax they use or any of the variables or conditions.
All I understand is that I need to create two indicators that go through the inputted string searching for a non-repeating substring (sliding window method). I don't understand how to create them, what to make them do or even how to make them find and build the longest substring. It is very confusing to try and read the code that is full of random letters, symbols, and conditions. I'm sure my code is all sorts of messed up but any help or tips that could point me in the right direction would be greatly appreciated!
def uniq?(s)
  # Are all characters of s unique?
  s.chars.uniq == s.chars
end
def subs(s)
  # Return all non-empty substrings of s (i is the start index, len the length).
  (0...s.length).inject([]){|ai,i|
    (1..s.length - i).inject(ai){|aj,len|
      aj << s[i,len]
    }
  }.uniq
end
def longest_usub(s)
  # Return the first longest substring of s without repeated characters.
  subs(s).inject(''){|res, sub| (uniq?(sub) and sub.length > res.length) ? sub : res}
end
ruby's inject is actually a reduce function, where inject(optional_start_value){<lambda expression>} - and the lambda expression is similar to Python's lambda x, y: <return expression using x and y> just that lambda expressions are strangely written in Ruby as {|x, y| <return expression using x and y>}.
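Python's range(i, y) is Ruby's i...y (three dots exclude the end; i..y includes y).
Python's slicing s[i:j] is in Ruby s[i...j] or s[i, j - i] (the second argument of s[i, n] is a length, not an end index).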
<< means add to end of the array.
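To make these comparisons concrete, here is a small sketch; the sample values are made up for illustration and are not part of the answer's solution.

```ruby
# inject folds a collection into a single value, like Python's functools.reduce.
total = (1..4).inject(0) { |sum, x| sum + x }   # 0 + 1 + 2 + 3 + 4

s = "abcdef"
by_range  = s[1..3]   # inclusive range: indices 1, 2 and 3
by_length = s[1, 3]   # start at index 1 and take 3 characters

arr = []
arr << "a" << "b"     # << appends to the end of the array and can be chained
```

Both slicing forms above select the same three characters, which is why the solution can use s[i,j] with j as a length.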
Second solution (inspired by @Rajagopalan's answer)
def usub(s)
# Return first chunk of uniq substring in s
arr = []
s.each_char do |char|
break if arr.include? char
arr << char
end
arr.join
end
def usubs(s)
# Return each position's usub() in s
(0..s.length).to_a.map{|i| usub(s[i,s.length])}
end
def longest_usub(s)
# return the longest one of the usubs() over s
usubs(s).max_by(&:length)
end
then you can do:
longest_usub("abcabcabcdef")
## "abcdef"
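For comparison, the two "indicators" (sliding window) idea mentioned in the question can be sketched as follows. This is only one possible way to write it; the method name longest_unique_window and its variable names are hypothetical and do not come from any of the answers.

```ruby
# Sliding-window sketch: left and right bound a window with no repeated characters.
def longest_unique_window(s)
  last_seen = {}   # character => index of its most recent occurrence
  left = 0         # left edge of the current window
  best_start = 0
  best_len = 0
  s.each_char.with_index do |ch, right|
    # If ch already occurs inside the window, move the left edge just past it.
    left = last_seen[ch] + 1 if last_seen.key?(ch) && last_seen[ch] >= left
    last_seen[ch] = right
    if right - left + 1 > best_len
      best_len = right - left + 1
      best_start = left
    end
  end
  s[best_start, best_len]
end

longest_unique_window("abcabcabcdef")
```

Each character is visited once, so the work grows linearly with the string length instead of re-scanning a substring from every starting position.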
I have assumed that a string is defined to be repeating if it contains a substring s of one or more characters that is followed by the same substring s, and that a string is non-repeating if it is not repeating.
A string is seen to be repeating if and only if it matches the regular expression
R = /([a-z]+)\1/
Demo
The regular expression reads, "match one or more letters that are saved to capture group one, then match the content of capture group 1".
For convenience we can construct a simple helper method.
def nonrepeating?(str)
!str.match? R
end
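As a quick check of how the helper behaves, here is a self-contained sketch that repeats the definitions above; the sample strings are my own and chosen only for illustration.

```ruby
R = /([a-z]+)\1/

def nonrepeating?(str)
  !str.match?(R)
end

first  = nonrepeating?("abcabd")  # no block is immediately followed by a copy of itself
second = nonrepeating?("abab")    # "ab" is immediately followed by "ab"
third  = nonrepeating?("hello")   # "l" is immediately followed by "l"
```

Note that under this definition "abcabd" counts as non-repeating even though the letters a and b each occur twice, because the repeated block must be adjacent to its copy.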
I will perform a binary search to find the longest non-repeating string. First, I need a second helper method:
def find_nonrepeating(str, len)
0.upto(str.size-len) do |i|
s = str[i,len]
return s if nonrepeating?(s)
end
nil
end
find_nonrepeating("abababc", 7) #=> nil
find_nonrepeating("abababc", 6) #=> nil
find_nonrepeating("abababc", 5) #=> nil
find_nonrepeating("abababc", 4) #=> "babc"
find_nonrepeating("abababc", 3) #=> "aba"
find_nonrepeating("abababc", 2) #=> "ab"
find_nonrepeating("abababc", 1) #=> "a"
We may now implement the binary search.
def longest(str)
  longest = ''
  low = 1
  high = str.size # the whole string is a candidate too
  while low <= high
    mid = (low + high)/2
    s = find_nonrepeating(str, mid)
    if s
      longest = s
      low = mid + 1
    else
      high = mid - 1
    end
  end
  longest
end
longest("dabcabcdef")
#=> "bcabcdef"
a = "abcabcabcdef"
arr = []
words = []
b=a
a.length.times do
b.chars.each do |char|
break if arr.include? char
arr << char
end
words << arr.join
arr.clear
b=b.chars.drop(1).join
end
p words.map(&:chars).max_by(&:length).join
Output
"abcdef"

Using regular expressions to multiply and sum numeric string characters contained in a hash of mixed numeric strings

Without getting too much into biology, Proteins are made of Amino Acids. Each of the 20 Amino Acids that make up Proteins are represented by characters in a sequence. Each Amino Acid char has a different chemical formula, which I represent as strings. For example, "M" has a formula of "C5H11NO2S"
Given the 20 different formulas (and the varying frequency of each amino acid chars in a protein sequence) I want to compile all 20 of them into a single formula that will yield the total formula for the protein.
So first: multiply each formula by the frequency of its char in the sequence
Second : sum together all multiplied formulas into one formula.
To accomplish this, I first tried multiplying each amino acid char frequency in the sequence by the numbers in the chemical formula. I did this using .tally
sequence ="MGAAARTLRLALGLLLLATLLRPADACSCSPVHPQQAFCNADVVIRAKAVSEKEVDSGNDIYGNPIKRIQYEIKQIKMFKGPEKDIEFI"
sequence.chars.tally #=> {"M"=>2, "G"=>5, "A"=>11, "R"=>5, "T"=>2, "L"=>9, "P"=>5, "D"=>5, "C"=>3, "S"=>4, "V"=>5, "H"=>1, "Q"=>4, "F"=>3, "N"=>3, "I"=>8, "K"=>7, "E"=>5, "Y"=>2}
Then, I listed all the amino acids chars and formulas into a hash
hash_of_formulas = {"A"=>"C3H7NO2", "R"=>"C6H14N4O2", "N"=>"C4H8N2O3", "D"=>"C4H7NO4", "C"=>"C3H7NO2S", "E"=>"C5H9NO4", "Q"=>"C5H10N2O3", "G"=>"C2H5NO2", "H"=>"C6H9N3O2", "I"=>"C6H13NO2", "L"=>"C6H13NO2", "K"=>"C6H14N2O2", "M"=>"C5H11NO2S", "F"=>"C9H11NO2", "P"=>"C5H9NO2", "S"=>"C3H7NO3", "T"=>"C4H9NO3", "W"=>"C11H12N2O2", "Y"=>"C9H11NO3", "V"=>"C5H11NO2"}
An example of what the process for my overall goal is:
In the sequence, "M" occurs twice, so "C5H11NO2S" becomes "C10H22N2O4S2". "C", which has the formula "C3H7NO2S", occurs 3 times in the sequence, so "C3H7NO2S" becomes "C9H21N3O6S3".
So, Summing together "C10H22N2O4S2" and "C9H21N3O6S3" will yield "C19H43N5O10S5"
How can I repeat the process of multiplying each formula by its frequency and then summing together all multiplied formulas?
I know that I could use regex for multiplying a formula by its frequency for an individual string using
formula_multiplied_by_frequency = "C5H11NO2S".gsub(/\d+/) { |x| x.to_i * 4}
But I'm not sure of any methods to use regex on strings embedded within hashes
If I understand correctly, you want to provide the total formula for a given protein sequence. Here's how I'd do it:
NUCLEOTIDES = {"A"=>"C3H7NO2", "R"=>"C6H14N4O2", "N"=>"C4H8N2O3", "D"=>"C4H7NO4", "C"=>"C3H7NO2S", "E"=>"C5H9NO4", "Q"=>"C5H10N2O3", "G"=>"C2H5NO2", "H"=>"C6H9N3O2", "I"=>"C6H13NO2", "L"=>"C6H13NO2", "K"=>"C6H14N2O2", "M"=>"C5H11NO2S", "F"=>"C9H11NO2", "P"=>"C5H9NO2", "S"=>"C3H7NO3", "T"=>"C4H9NO3", "W"=>"C11H12N2O2", "Y"=>"C9H11NO3", "V"=>"C5H11NO2"}
NUCLEOTIDE_COMPOSITIONS = NUCLEOTIDES.each_with_object({}) { |(nucleotide, formula), compositions|
compositions[nucleotide] = formula.scan(/([A-Z][a-z]*)(\d*)/).map { |element, count| [element, count.empty? ? 1 : count.to_i] }.to_h
}
def formula(sequence)
sequence.each_char.with_object(Hash.new(0)) { |nucleotide, final_counts|
NUCLEOTIDE_COMPOSITIONS[nucleotide].each { |element, element_count|
final_counts[element] += element_count
}
}.map { |element, element_count|
"#{element}#{element_count == 1 ? "" : element_count}"
}.join
end
sequence = "MGAAARTLRLALGLLLLATLLRPADACSCSPVHPQQAFCNADVVIRAKAVSEKEVDSGNDIYGNPIKRIQYEIKQIKMFKGPEKDIEFI"
p formula(sequence)
# => "C434H888N120O213S5"
You can't use regexp to multiply things. You can use it to parse a formula, but then it's on you and regular Ruby to do the math. The first job is to prepare a composition lookup by breaking down each nucleotide formula. Once we have a composition hash for each nucleotide, we can iterate over a nucleotide sequence, and add up all the elements of each nucleotide.
BTW, tally is not particularly useful here, since tally will need to iterate over the sequence, and then you have to iterate over tally anyway — and there is no aggregate operation going on that can't be done going over each letter independently.
EDIT: I probably made the regexp slightly more complicated than it needs to be, but it should parse stuff like CuSO4 correctly. I don't know if it's an accident or not that all nucleotides are only composed of elements with a single-character symbol... :P
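To illustrate the parsing step on its own, here is a minimal sketch built around the same regular expression. The constant ELEMENT and the method composition are hypothetical names introduced for this example only.

```ruby
ELEMENT = /([A-Z][a-z]*)(\d*)/   # element symbol, then an optional count

# Turn a formula string into a hash of element => number of atoms.
def composition(formula)
  formula.scan(ELEMENT).each_with_object(Hash.new(0)) do |(element, count), totals|
    totals[element] += count.empty? ? 1 : count.to_i   # a missing count means 1
  end
end

copper_sulfate = composition("CuSO4")      # the two-letter symbol "Cu" stays together
methionine     = composition("C5H11NO2S")  # formula for "M" from the question
```

Adding to the running total (rather than assigning) also keeps the count right if a symbol appears more than once in a formula.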
Givens
We are given a string representing a protein comprised of amino acids:
sequence = "MGAAARTLRLALGLLLLATLLRPADACSCSPVHPQQAFCNADVVIR" +
"AKAVSEKEVDSGNDIYGNPIKRIQYEIKQIKMFKGPEKDIEFI"
and a hash that contains the formulas of amino acids:
formulas = {
"A"=>"C3H7NO2", "R"=>"C6H14N4O2", "N"=>"C4H8N2O3", "D"=>"C4H7NO4",
"C"=>"C3H7NO2S", "E"=>"C5H9NO4", "Q"=>"C5H10N2O3", "G"=>"C2H5NO2",
"H"=>"C6H9N3O2", "I"=>"C6H13NO2", "L"=>"C6H13NO2", "K"=>"C6H14N2O2",
"M"=>"C5H11NO2S", "F"=>"C9H11NO2", "P"=>"C5H9NO2", "S"=>"C3H7NO3",
"T"=>"C4H9NO3", "W"=>"C11H12N2O2", "Y"=>"C9H11NO3", "V"=>"C5H11NO2"
}
Obtain counts of atoms in each amino acid
As a first step we can calculate the numbers of each atom in each amino acid:
counts = formulas.transform_values do |s|
s.scan(/[CHNOS]\d*/).
each_with_object({}) do |s,h|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
end
#=> {"A"=>{"C"=>3, "H"=>7, "N"=>1, "O"=>2},
# "R"=>{"C"=>6, "H"=>14, "N"=>4, "O"=>2},
# ...
# "M"=>{"C"=>5, "H"=>11, "N"=>1, "O"=>2, "S"=>1}
# ...
# "V"=>{"C"=>5, "H"=>11, "N"=>1, "O"=>2}}
Compute formula for protein
Then it's simply:
def protein_formula(sequence, counts)
sequence.each_char.
with_object("C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0) do |c,h|
counts[c].each { |aa,cnt| h[aa] += cnt }
end.each_with_object('') { |(aa,nbr),s| s << "#{aa}#{nbr}" }
end
protein_formula(sequence, counts)
#=> "C434H888N120O213S5"
Another example:
protein_formula("MCMPCFTTDHQMARKCDDCCGGKGRGKCYGPQCLCR", counts)
#=> "C158H326N52O83S11"
Explanation of calculation of counts
This calculation:
counts = formulas.transform_values do |s|
s.scan(/[CHNOS]\d*/).each_with_object({}) do |s,h|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
end
uses the method Hash#transform_values. It will return a hash having the same keys as the hash formulas, with the values of those keys in formulas modified by transform_values's block. For example, formulas["A"] ("C3H7NO2") is "transformed" to the hash {"C"=>3, "H"=>7, "N"=>1, "O"=>2} in the hash that is returned, counts.
transform_values passes each value of formulas to the block and sets the block variable equal to it. The first value passed is "C3H7NO2", so it sets:
s = "C3H7NO2"
We can write the block calculation more simply:
h = {}
s.scan(/[CHNOS]\d*/).each do |s|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
h
(Once you understand this calculation, which I explain below, see Enumerable#each_with_object to understand why I used that method in my solution.)
After initializing h to an empty hash, the following calculations are performed:
h = {}
a = s.scan(/[CHNOS]\d*/)
#=> ["C3", "H7", "N", "O2"]
a is computed using String#scan with the regular expression /[CHNOS]\d*/. That regular expression, or regex, matches exactly one character in the character class [CHNOS] followed by zero or more (*) digits (\d). It therefore separates the string "C3H7NO2" into the substrings that are returned in the array shown under the calculation of a above. Continuing,
a.each do |s|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
changes h to the following:
h #=> {"C"=>3, "H"=>7, "N"=>1, "O"=>2}
The block variable s is initially set equal to the first element of a that is passed to each's block:
s = "C3"
then we compute:
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
h["C"] = 2 == 1 ? 1 : "3".to_i
       = false ? 1 : 3
       = 3
This is repeated for each element of a.
Explanation of construction of formula for the protein
We can simplify the following code¹:
sequence.each_char.with_object("C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0) do |c,h|
counts[c].each { |aa,cnt| h[aa] += cnt }
end.each_with_object('') { |(aa,nbr),s| s << "#{aa}#{nbr}" }
to more or less the following:
h = { "C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0 }
ch = sequence.chars
#=> ["M", "G", "A",..., "F", "I"]
ch.each do |c|
counts[c].each { |aa,cnt| h[aa] += cnt }
end
h #=> {"C"=>434, "H"=>888, "N"=>120, "O"=>213, "S"=>5}
When the first value of ch ("M") is passed to each's block (when h = { "C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0 }), the following calculations are performed:
c = "M"
g = counts[c]
#=> {"C"=>5, "H"=>11, "N"=>1, "O"=>2, "S"=>1}
g.each { |aa,cnt| h[aa] += cnt }
h #=> {"C"=>5, "H"=>11, "N"=>1, "O"=>2, "S"=>1}
Lastly, (when h #=> {"C"=>434, "H"=>888, "N"=>120, "O"=>213, "S"=>5})
s = ''
h.each { |aa,nbr| s << "#{aa}#{nbr}" }
s #=> "C434H888N120O213S5"
When aa = "C" and nbr = 434,
"#{aa}#{nbr}"
#=> "C434"
is appended to the string s.
¹ ("C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0) is shorthand for ({"C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0}).

Permutations of strings takes too long to solve

I'm creating an array of permutated and unique letters in a string, only to sort them alphabetically and find the middle element in the set.
def middle_permutation(string)
length = string.length
permutation_set = string.split("").permutation(length).to_a.map{|item| item.join}.sort
permutation_set.length.even? ? permutation_set[(permutation_set.length)/2-1] : permutation_set[(permutation_set.length/2)+1]
end
For example:
middle_permutation("zxcvbnmasd") should equal "mzxvsndcba"
Even for small strings (N >=10), the calculations take pretty long to finish, and I can forget about anything double that; is there a quicker way?
I'm assuming the letters are unique, as in the OP's question.
1. Sort the letters.
2. Pluck the middle letter of the sorted string (rounded down). This is the first letter of the middle permutation.
3. If the original list had an even number of letters, the rest of the permutation is the reverse sort of the remaining letters.
4. If not, take the middle letter again. Now the rest of the result is the reverse sort of the remaining letters.
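The steps above can be sketched in Ruby roughly as follows, assuming all letters are distinct. The method name middle_permutation_by_steps is hypothetical.

```ruby
def middle_permutation_by_steps(string)
  letters = string.chars.sort
  # Middle letter, rounded down: index n/2 - 1 for even n, n/2 for odd n.
  result = [letters.delete_at((letters.size - 1) / 2)]
  # An odd-length input leaves an even number of letters; take the middle again.
  if string.size.odd? && letters.any?
    result << letters.delete_at((letters.size - 1) / 2)
  end
  # Whatever remains is appended in reverse sorted order.
  (result + letters.reverse).join
end

middle_permutation_by_steps("zxcvbnmasd")
```

Because only a sort and a couple of deletions are involved, this avoids building the N! permutations that make the original approach slow.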
The method below returns the desired permutation directly, without iterating through permutations.
The asker has stated that the string contains no duplicated letters, which is a requirement for this method. I assume the characters of the string are sorted. If they are not, the creation of a sorted string would be the first step:
str = "ebadc".chars.sort.join
#=> "abcde"
Code
def mid_perm(str)
return mid_perm_even_length_strings(str) if str.size.even?
first_char_index = str.size/2
str[first_char_index] << mid_perm_even_length_strings(str[0,first_char_index] +
str[first_char_index+1..-1])
end
def mid_perm_even_length_strings(str)
first_char_index = str.size/2-1
str[first_char_index] + (str[0,first_char_index] + str[first_char_index+1..-1]).reverse
end
Examples
mid_perm 'abcd'
#=> "bdca"
mid_perm 'abcde'
#=> "cbeda"
mid_perm 'abcdefghijklmnopqrstuvwxyz'
#=> "mzyxwvutsrqponlkjihgfedcba"
Explanation
Let's start by defining a method to produce permutations of the letters of a string.
def perms(str)
str.chars.permutation(str.size).map(&:join)
end
Strings containing an even number of characters
Consider
str = "abcd"
a = perms str
#=> ["abcd", "abdc", "acbd", "acdb", "adbc", "adcb",
# "bacd", "badc", "bcad", "bcda", "bdac", "bdca",
# "cabd", "cadb", "cbad", "cbda", "cdab", "cdba",
# "dabc", "dacb", "dbac", "dbca", "dcab", "dcba"]
a contains 4! #=> 4*3*2 => 24 elements, 4 being the length of the string.
Notice that since the characters in perms' argument are sorted, the array returned is also sorted¹.
a == a.sort #=>true
As a.size #=> 24, the "middle" element is either a[11] #=> "bdca" or a[12] #=> "cabd" (where 11 = (24-1)/2 and 12 = 24/2), depending on how we want to round. The question stipulates that, for even-length strings, we are to round down, so that would be "bdca".
Now let's slice a into str.size equal arrays, each containing a.size/str.size #=> 24/4 => 6 elements:
b = a.each_slice(a.size/str.size).to_a
#=> [["abcd", "abdc", "acbd", "acdb", "adbc", "adcb"],
# ["bacd", "badc", "bcad", "bcda", "bdac", "bdca"],
# ["cabd", "cadb", "cbad", "cbda", "cdab", "cdba"],
# ["dabc", "dacb", "dbac", "dbca", "dcab", "dcba"]]
The desired element is therefore
b[str.size/2-1][-1]
#=> "bdca"
This value can be computed more directly as follows.
first_char_index = str.size/2-1
#=> 1
first_char = str[first_char_index]
#=> "b"
remaining_chars = (str[0,first_char_index] + str[first_char_index+1..-1]).reverse
#=> "dca"
first_char + remaining_chars
#=> "bdca"
The same logic applies to all strings having an even number of characters. We therefore can write the method mid_perm_even_length_strings shown in the Code section above.
For example (for a 12-character string)
mid_perm_even_length_strings 'abcdefghijkl'
#=> "flkjihgedcba"
Strings containing an odd number of characters
Now consider
str = "abcde"
a = perms str
#=> ["abcde", "abced", "abdce", "abdec", "abecd", "abedc",
# "acbde", "acbed", "acdbe", "acdeb", "acebd", "acedb",
# "adbce", "adbec", "adcbe", "adceb", "adebc", "adecb",
# "aebcd", "aebdc", "aecbd", "aecdb", "aedbc", "aedcb",
# "bacde", "baced", "badce", "badec", "baecd",..., "bedca",
# "cabde", "cabed", "cadbe", "cadeb", "caebd", "caedb",
# "cbade", "cbaed", "cbdae", "cbdea", "cbead", "cbeda",
# "cdabe", "cdaeb", "cdbae", "cdbea", "cdeab", "cdeba",
# "ceabd", "ceadb", "cebad", "cebda", "cedab", "cedba",
# "dabce", "dabec", "dacbe", "daceb", "daebc",..., "decba",
# "eabcd", "eabdc", "eacbd", "eacdb", "eadbc",..., "edcba"]
Here the permutation contains 5! #=> 120 elements, in 5 blocks of 24. (Again, a.each_cons(2).all? { |s1,s2| s1 < s2 } #=> true.)
The middle element of a is clearly the middle element of the block of elements that begin with
str[str.size/2] #=> "c"
That block would be the array
b = a.each_slice(a.size/str.size).to_a[str.size/2]
#=> ["cabde", "cabed", "cadbe", "cadeb", "caebd", "caedb",
# "cbade", "cbaed", "cbdae", "cbdea", "cbead", "cbeda",
# "cdabe", "cdaeb", "cdbae", "cdbea", "cdeab", "cdeba",
# "ceabd", "ceadb", "cebad", "cebda", "cedab", "cedba"]
which would be 'c' plus the middle element of the array
["abde", "abed", "adbe", "adeb", "aebd", "aedb",
"bade", "baed", "bdae", "bdea", "bead", "beda",
"dabe", "daeb", "dbae", "dbea", "deab", "deba",
"eabd", "eadb", "ebad", "ebda", "edab", "edba"]
That array is merely the permutations of the string "abde". Since that string contains an even number characters, its middle element is
mid_perm_even_length_strings 'abde'
#=> "beda"
It follows that the middle element of the permutations of the letters of "abcde" is therefore
'c' + 'beda'
#=> "cbeda"
This clearly applies to all strings containing an odd number of characters.
1. The doc for Array#permutation states, "The implementation makes no guarantees about the order in which the permutations are yielded.". We therefore might need to tack .sort to the end of the operative line of perms, but with Ruby v2.4 (and I suspect, earlier versions) that is, in fact not necessary here.
I was able to compact it like this:
def middle_permutation(string)
list = string.chars.permutation.map(&:join).sort
list[list.length / 2 - (list.length.even? ? 1 : 0)]
end
Which yields:
middle_permutation('zxcvbnmasd')
# => "mzxvsndcba"
You don't need to generate all permutations. Just find overall number of permutations as PN = N! where N is string (of different chars) length and calculate only needed PN/2-th permutation by its number - for example, using this approach
public static int[] perm(int n, int k)
{
int i, ind, m=k;
int[] permuted = new int[n];
int[] elems = new int[n];
for(i=0;i<n;i++) elems[i]=i;
for(i=0;i<n;i++)
{
ind=m%(n-i);
m=m/(n-i);
permuted[i]=elems[ind];
elems[ind]=elems[n-i-1];
}
return permuted;
}
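Here is a Ruby sketch of the same "compute the permutation from its number" idea. It is not a translation of the snippet above: it uses factorial-base digits so that permutation numbers follow sorted (lexicographic) order, it assumes the characters are distinct, and the method names kth_permutation and middle_permutation_by_rank are hypothetical.

```ruby
# Return the k-th (0-based) permutation of chars in lexicographic order.
def kth_permutation(chars, k)
  pool = chars.sort
  result = []
  until pool.empty?
    block = (1..pool.size - 1).inject(1, :*)   # (pool.size - 1)! permutations per leading letter
    index, k = k.divmod(block)                 # which leading letter, and the rank within its block
    result << pool.delete_at(index)
  end
  result.join
end

def middle_permutation_by_rank(string)
  total = (1..string.size).inject(1, :*)       # N!
  rank = [total / 2 - 1, 0].max                # lower of the two middle positions
  kth_permutation(string.chars, rank)
end

middle_permutation_by_rank("zxcvbnmasd")
```

Only N letters are picked in total, so no permutations are enumerated.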
So it turns out there are two tracks to this, odd strings and even strings.
For odd strings, take out the middle element of the sorted array and the one before it, in that order. That leaves two arrays, one to the right and one to the left, both alphabetically sorted. Tack on the elements of the right array, starting with its last element, then do the same for the left array.
For even strings, do the same but take only one character in the first step: the (N/2)th element.
Here's my solution:
def middle_permutation(string)
string_array = string.chars.sort
mid_string = []
length = string.length
if length.even?
mid_string << string_array[length/2-1]
string_array.delete_at(length/2-1)
(mid_string << string_array.reverse).flatten.join
else
mid_string << string_array[(length/2)-1..length/2].reverse
string_array.slice!((length/2)-1, 2)
(mid_string << string_array.reverse).flatten.join
end
end

Ruby - Finding the longest palindromic substring in a string

I understand how to find if one string is a palindrome
string1 == string1.reverse
It's a little more difficult though with multiple palindromes in a string
"abcdxyzyxabcdaaa"
In the above string, there are 4 palindromes of length greater than 1
"xyzyx", "yzy", "aaa" and "aa"
In this case, the longest palindrome is "xyzyx", which is 5 characters long.
How would I go about solving this problem though.
I know of the array#combination method, but that won't work in this case.
I was thinking of implementing something like this
def longest_palindrome(string)
palindromes = []
for i in 2..string.length-1
string.chars.each_cons(i).each {|x|palindromes.push(x) if x == x.reverse}
end
palindromes.map(&:join).max_by(&:length)
end
If you're just looking for the largest palindrome substring, here is a quick and dirty solution.
def longest_palindrome(string, size)
  string.size.times do |start| # loop over the starting positions
    break if start + size > string.size # bounds check
    candidate = string[start, size]
    if candidate == candidate.reverse # look for a palindrome of this size
      return candidate # return the largest palindrome
    end
  end
  return '' if size.zero? # nothing left to try
  longest_palindrome(string, size - 1) # Palindrome not found, let's look for the next smallest size
end
def longest_palindrome(string)
longest = ''
i = 0
while i < string.length
j = 1
while (i + j) <= string.length
x = string.slice(i, j)
if (x.length > longest.length) && (x == x.reverse)
longest = x
end
j += 1
end
i += 1
end
longest
end
The slice method is handy to have for solving this problem. Test each substring with the classic double while loop approach with (i, j) representing a starting index and length of the substring respectively. string.slice(start_index, substring_length)
The String#slice method works like this:
"bdehannahc".slice(3, 6) == "hannah" # which is a palindrome and would be
# found by the method introduced above
This checks if the entire string str is a palindrome. If it is, we're finished; if not, check all substrings of length str.size-1. If one is a palindrome, we're finished; if not, check substrings of length str.size-2, and so on.
def longest_palindrome(str)
arr = str.downcase.chars
str.length.downto(1) do |n|
ana = arr.each_cons(n).find { |b| b == b.reverse }
return ana.join if ana
end
end
longest_palindrome "abcdxyzyxabcdaaa"
#=> "xyzyx"
longest_palindrome "abcdefghba"
#=> "a"
The key method here is Enumerable#each_cons.
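As a small illustration of what each_cons contributes, here is a sketch on a made-up four-letter string; the variable names are mine.

```ruby
chars = "xyzy".chars
windows = chars.each_cons(3).to_a                   # every run of 3 consecutive characters
palindromic = windows.find { |w| w == w.reverse }   # first window that reads the same reversed
```

Here windows holds the two runs x-y-z and y-z-y, and find returns the second one. The method above does the same for each length n, starting from the longest.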
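Here is another solution, using fewer features of Ruby and iteration instead of recursion:
def palindrome?(s)
  # helper assumed by the method below: true if s reads the same in both directions
  s == s.reverse
end
def longest_palindrome(string)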
# to find the longest palindrome, start with whole thing
substr_start = 0
substr_length = string.length
while substr_length > 0 # 1 is a trivial palindrome and the end case
# puts 'substr_length is:' + substr_length.to_s
while substr_start <= string.length - substr_length
# puts 'start is: ' + substr_start.to_s
if palindrome?(string.slice(substr_start,substr_length))
puts 'found palindrome: ' + string.slice(substr_start,substr_length)
return string.slice(substr_start,substr_length)
end
substr_start += 1
end
substr_start = 0 # inner loop ctr reset
substr_length -= 1
end
puts 'null string tested?'
return ''
end

Finding if a string is a repeated substring in Ruby

Here is the problem:
A string is called a k-string if it can be represented as k concatenated copies of some string. For example, the string "aabaabaabaab" is at the same time a 1-string, a 2-string and a 4-string, but it is not a 3-string, a 5-string, or a 6-string and so on. Obviously any string is a 1-string.
You are given a string s, consisting of lowercase English letters and a positive integer k. Your task is to reorder the letters in the string s in such a way that the resulting string is a k-string.
Input
The first input line contains integer k (1 ≤ k ≤ 1000). The second line contains s, all characters in s are lowercase English letters. The string length s satisfies the inequality 1 ≤ |s| ≤ 1000, where |s| is the length of string s.
Output
Rearrange the letters in string s in such a way that the result is a k-string. Print the result on a single output line. If there are multiple solutions, print any of them.
If the solution doesn't exist, print "-1" (without quotes).
Here is my code:
k = gets.to_i
str = gets.chomp.split(//)
n = str.length/k
map = Hash.new(0)
map2 = Hash.new(0)
str.each { |i| map[i] += 1 }
x = str.uniq.permutation(n).map(&:join).each do |string|
string.each_char { |c| map2[c] += k }
if map2 == map
puts string*k
exit
end
map2 = Hash.new(0)
end
puts '-1'
To me this solution seems like it should work, but it fails on a test case. Can anyone tell me why?
Here's my solution.
Just create one segment, then output it k times. If a character does not appear k times (or a multiple of it), then stop early and output -1.
k = gets.to_i
str = gets.chomp.split(//)
counts = Hash.new(0)
str.each { |i| counts[i] += 1 }
out = ''
str.uniq.each do |c|
if counts[c] % k != 0
puts -1
exit
end
out = out + c*(counts[c]/k)
end
puts out*k
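The same idea can be wrapped in a method so it can be tried without standard input. This is only a sketch: the name k_string is hypothetical, and String#each_char followed by tally needs Ruby 2.7 or later.

```ruby
# Build one segment holding count/k copies of each letter, then repeat it k times.
def k_string(k, s)
  counts = s.each_char.tally
  # Every letter must be split evenly across the k copies.
  return "-1" unless counts.values.all? { |n| (n % k).zero? }
  segment = counts.map { |ch, n| ch * (n / k) }.join
  segment * k
end

k_string(2, "aazz")
```

Checking the counts directly avoids generating permutations of the letters, which is where the question's code spends its time.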
