Split a string into parts in efficient way - ruby

My code is here
str = "Early in his first term in office, Obama signed into law economic stimulus legislation in response"
arr= str.split(" ")
set_element= arr.each_cons(2).to_a
sub_str = set_element.map {|i| i.join(' ')}
With a very large string, this process takes about 6.50 sec.
This is the type of result I want:
sub_str= ["Early in", "in his", "his first", "first term", "term in", "in office,", "office, Obama", "Obama signed", "signed into", "into law", "law economic", "economic stimulus", "stimulus legislation", "legislation in", "in response"]
Is there a more efficient way to do this?

Use scan instead of split and you can get your word pairs directly.
str.scan(/\S+(?:\s+\S+)?/)
EDIT: Just to assure myself that this was relatively efficient, I made a little micro-benchmark. Here are the results for the answers seen to date:
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-linux]
10 times on string of size 2284879
user system total real
original 4.180000 0.070000 4.250000 ( 4.272856)
sergio 2.090000 0.000000 2.090000 ( 2.102469)
dbenhur 1.050000 0.000000 1.050000 ( 1.042167)
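One thing worth checking before comparing timings is whether the approaches return the same pairs. The following is only a sketch (the method names overlapping_pairs and disjoint_pairs are made up for illustration), but it suggests that each_cons(2) yields overlapping pairs, as in the question's desired output, while the scan pattern consumes two words per match and so yields non-overlapping pairs:

```ruby
# Overlapping pairs, as in the question's desired output.
def overlapping_pairs(str)
  str.split.each_cons(2).map { |pair| pair.join(' ') }
end

# Pairs produced by the scan pattern: each match consumes two words,
# so the next match starts at the third word.
def disjoint_pairs(str)
  str.scan(/\S+(?:\s+\S+)?/)
end

sample = "Early in his first term"
p overlapping_pairs(sample) # => ["Early in", "in his", "his first", "first term"]
p disjoint_pairs(sample)    # => ["Early in", "his first", "term"]
```

If the overlapping output is what you need, the timings for the non-overlapping variants are not a like-for-like comparison.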

set_element = arr.each_cons(2).to_a
The line above creates a ton of temporary objects that you don't need. Try this, should be faster:
str = "Early in his first term in office, Obama signed into law economic stimulus legislation in response"
arr = str.split(" ")
sub_str = arr.each_with_object([]).with_index do |(el, memo), idx|
if idx % 2 == 0
memo << el
else
memo.last << ' ' << el
end
end
sub_str # => ["Early in", "his first", "term in", "office, Obama", "signed into", "law economic", "stimulus legislation", "in response"]

You can try this. one step less :)
arr= str.scan(/\S+/)
s = []
arr.each_with_index { |x, i| s << (x + " " + arr[i + 1]) if arr[i+1] }

Related

How to loop nested arrays in Ruby

VERY new to Ruby and coding in general. I'm trying to loop through two dimensional arrays but can't figure it out. Here's what I have:
--Use a loop to print out each person on separate lines with their alter egos.
--Bruce Wayne, a.k.a. Batman
people = [
["Bruce", "Wayne", "Batman"],
["Selina", "Kyle", "Catwoman"],
["Barbara", "Gordon", "Oracle"],
["Terry", "McGinnis", "Batman Beyond"]
]
index = people[0][0]
first_name = people[0][0]
last_name = people[0][1]
hero_name = people[0][2]
4.times do
puts first_name + " " + last_name + "," " " + "a.k.a" " " + hero_name
index = index + 1
end
It does print the first line but then raises an error:
Bruce Wayne, a.k.a Batman
# `+': no implicit conversion of Integer into String (TypeError)
In Ruby we rarely loop by index with for and its relatives; instead we iterate over collections:
people =
[["Bruce", "Wayne", "Batman"],
["Selina", "Kyle", "Catwoman"],
["Barbara", "Gordon", "Oracle"],
["Terry", "McGinnis", "Batman Beyond"]]
people.each do |first, last, nick|
puts "#{first} #{last}, a.k.a #{nick}"
end
or
people.each do |first_last_nick|
*first_last, nick = first_last_nick
puts [first_last.join(' '), nick].join(', a.k.a ')
end
Your code produces an error because you assign a String to index
index = people[0][0]
and then you use it to count with
index = index + 1
You could have used
index = 0
and
index += 1
A more Rubyesque way would be to enumerate the array and print it like this
people.each do |person|
puts "#{person.first} #{person[1]}, a.k.a #{person.last}"
end
Which gives
Bruce Wayne, a.k.a Batman
Selina Kyle, a.k.a Catwoman
Barbara Gordon, a.k.a Oracle
Terry McGinnis, a.k.a Batman Beyond
Storing the parts in variables improves readability but lengthens the code, which in turn diminishes readability; the choice is yours.
As an alternative you could name the indices or decompose like mudasobwa suggests.
Firstname, Lastname, Nickname = 0, 1, 2
people.each do |person|
puts "#{person[Firstname]} #{person[Lastname]}, a.k.a #{person[Nickname]}"
end
For your code to work:
4.times do |character|
puts people[character][0] + " " + people[character][1] + "," " " + "a.k.a" " " + people[character][2]
end
But iterating in ruby is done as answered by others.
This is a version using a block {} instead:
people = [["Bruce", "Wayne", "Batman"], ["Selina", "Kyle", "Catwoman"], ["Barbara", "Gordon", "Oracle"], ["Terry", "McGinnis", "Batman Beyond"]]
people.each { |character| puts "#{character[0]} #{character[1]}, a.k.a #{character[2]}" }
#=> Bruce Wayne, a.k.a Batman
#=> Selina Kyle, a.k.a Catwoman
#=> Barbara Gordon, a.k.a Oracle
#=> Terry McGinnis, a.k.a Batman Beyond
In general to loop through nested arrays:
people.each do |character|
character.each do |name|
puts name
end
end
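If you would rather build the lines than print them straight away, the block-parameter destructuring shown in the earlier answers combines nicely with map. This is just a sketch; alter_ego_lines is a made-up helper name, not something from the question:

```ruby
people = [
  ["Bruce", "Wayne", "Batman"],
  ["Selina", "Kyle", "Catwoman"]
]

# Hypothetical helper: returns one formatted string per inner array.
# Each inner array is destructured into three block parameters.
def alter_ego_lines(people)
  people.map { |first, last, hero| "#{first} #{last}, a.k.a. #{hero}" }
end

puts alter_ego_lines(people)
# Bruce Wayne, a.k.a. Batman
# Selina Kyle, a.k.a. Catwoman
```

Returning the strings keeps the formatting separate from the printing, so the same helper can feed puts, a file, or a test.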

Capture output from `each_with_index` and ignore the return

I have:
deli_line = ["stuff", "things", "people", "places"]
I want the string:
"1. stuff 2. things 3. people 4. places"
then, do
string1 += "1. stuff 2. things 3. people 4. places"
I cannot figure that out. I am doing:
deli_line.each_with_index do |x, i| print "#{i+1}. #{x} " end
I get output:
1. stuff 2. things 3. people 4. places
# => ["stuff", "things", "people", "places"]
and I am currently trying to append the return, which is an array, to a string, causing an error.
deli_line.map.with_index(1){|x, i| "#{i}. #{x}"}.join(" ")
# => "1. stuff 2. things 3. people 4. places"
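To get from there to the append the question asks about, one option is to wrap the expression in a small method and add its return value to the existing string. A minimal sketch, assuming a helper named numbered (the name and the starting value of string1 are made up):

```ruby
# Hypothetical helper: returns the numbered items as a single string.
def numbered(items)
  items.map.with_index(1) { |item, i| "#{i}. #{item}" }.join(" ")
end

deli_line = ["stuff", "things", "people", "places"]
string1 = "In line: "
string1 += numbered(deli_line)
puts string1 # => In line: 1. stuff 2. things 3. people 4. places
```

The point is that map returns the new strings, whereas each_with_index returns the original array, which is what caused the error when appending.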
idx = 1.step
deli_line.map { |s| "%s. %s" % [idx.next, s] }.join(" ")
#=> "1. stuff 2. things 3. people 4. places"
Ruby v.2.1.0 changed Numeric#step to allow "the limit argument to be omitted, in which case an infinite sequence of numbers is generated" (ref).
Therefore
idx = 1.step #=> #<Enumerator: 1:step>
idx.next #=> 1
idx.next #=> 2
idx.next #=> 3
...
Experienced Rubyists alert: gory detail follows. You may wish to avert your eyes.
The steps:
enum = deli_line.map
#=> #<Enumerator: ["stuff", "things", "people", "places"]:map>
Ruby invokes Enumerator#each on enum:
enum.each { |s| "%s. %s" % [idx.next, s] }
#=> ["1. stuff", "2. things", "3. people", "4. places"]
Note that when Enumerable#map is given without a block (e.g., to be chained to another enumerator or an Enumerable method [as Enumerator includes Enumerable]), it returns an enumerator. Here it does have a block, but the calculations are no different than those I describe below. The same is true of all Enumerable instance methods that may or may not be invoked with a block.
(Enumerator#each in turn invokes Array#each because the receiver, deli_line is an instance of Array.)
each passes each element of enum to the block and assigns the block variable to that value. The first is passed as follows (ref Enumerator#next):
s = enum.next
#=> "stuff"
and the block calculation is performed, using the method String#%:
"%s. %s" % [idx.next, s]
#=> "%s. %s" % [1, "stuff"]
#=> "1. stuff"
The second element of enum is passed to the block:
s = enum.next
#=> "things"
"%s. %s" % [idx.next, s]
#=> "2. things"
Similar calculations are performed for the third and last elements of enum, after which we have determined that:
arr = deli_line.map { |s| "%s. %s" % [idx.next, s] }
#=> ["1. stuff", "2. things", "3. people", "4. places"]
All that remains is to apply Array#join to join the elements of arr with one space between each element:
arr.join(" ")
#=> "1. stuff 2. things 3. people 4. places"
Try this:
string = ''
deli_line.each.with_index do |word, index|
string << "#{index+1}. #{word} "
end
puts string #=> 1. stuff 2. things 3. people 4. places

Programming a basic calculator in Ruby

This is my first foray into computer programming. I have chosen to learn Ruby, and I am enjoying it quite a bit. However, I am a little confused as to why the answer will not output properly in this bit of code.
def addition_function
puts "Which numbers would you like to add?"
@n1 = gets.chomp
@n2 = gets.chomp
@n1 + @n2 == @answer
puts "The sum is... #{@answer}"
end
def subtraction_function
puts "Which numbers would you like to subtract?"
@n1 = gets.chomp.to_i
@n2 = gets.chomp.to_i
@n1 - @n2 == @answer
puts "The answer is... #{@answer}"
end
def multiplication_function
puts "Which numbers would you like to multiply?"
@n1 = gets.chomp
@n2 = gets.chomp
@n1 * @n2 == @answer
puts "The answer is... #{@answer}"
end
puts "Would you like to [add], [multiply], or [subtract]?"
response = gets.chomp
if response == "add" then
addition_function
end
if response == "subtract" then
subtraction_function
end
if response == "multiply" then
multiplication_function
end
I know this is probably horrible code... but could someone help steer me in the right direction?
Consider this code:
def get_int_values
[gets, gets].map{ |s| s.chomp.to_i }
end
puts "Would you like to [add], [multiply], or [subtract]?"
response = gets.chomp
case response.downcase
when 'add'
puts "Which numbers would you like to add?"
operator = :+
when 'subtract'
puts "Which numbers would you like to subtract?"
operator = :-
when 'multiply'
puts "Which numbers would you like to multiply?"
operator = :*
end
answer = get_int_values.inject(operator)
puts "The answer is... #{ answer }"
The idea is to follow the "DRY" principle: "DRY" means "Don't Repeat Yourself", which, the vast majority of the time, is a really good thing.
To help avoid typing mistakes I'd recommend doing something like:
puts "Would you like to [a]dd, [m]ultiply, or [s]ubtract?"
response = gets.chomp
case response[0].downcase
then change the when clauses to match the first letter of the desired operation.
Which will work unless response is empty. You can figure out how to handle that.
another way to obtain answer, once operator is determined, is answer = gets.to_i.send(operator, gets.to_i)
That's true, but here's why I refactored the code the way I did: If, for some reason, there was a need to operate on more than two values, only one thing has to be changed:
[gets, gets].map{ |s| s.chomp.to_i }
could become:
[gets, gets, gets].map{ |s| s.chomp.to_i }
Or, better, could be transformed to something like:
def get_int_values(n)
n.times.map { gets.chomp.to_i }
end
Nothing else will have to change except to find out how many values are needed.
Now, to do it all right would require different text to alert the user that multiple values are expected, but that's easily done by letting the user say how many they want to enter, and then prompting for each gets:
def get_int_values(n)
n.times.map.with_index { |n|
print "Enter value ##{ 1 + n }: "
gets.chomp.to_i
}
end
puts "Would you like to [add], [multiply], or [subtract]?"
response = gets.chomp
puts "How many values?"
num_of_values = gets.to_i
case response.downcase
when 'add'
puts "Which numbers would you like to add?"
operator = :+
when 'subtract'
puts "Which numbers would you like to subtract?"
operator = :-
when 'multiply'
puts "Which numbers would you like to multiply?"
operator = :*
end
answer = get_int_values(num_of_values).inject(operator)
puts "The answer is... #{ answer }"
inject can scale up easily because it doesn't presuppose knowledge about the number of values being operated on.
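One gap in the code above is that an unrecognized response leaves operator as nil, and that nil is then handed to inject. A possible way to guard against that, sketched here with made-up helper names (operator_for and calculate) and with the values passed in rather than read from gets:

```ruby
# Hypothetical lookup: maps the typed word to the operator symbol,
# returning nil for anything it does not recognize.
def operator_for(response)
  { 'add' => :+, 'subtract' => :-, 'multiply' => :* }[response.downcase]
end

# Hypothetical helper: returns nil for an unknown operation or an empty list.
def calculate(response, values)
  operator = operator_for(response)
  return nil if operator.nil? || values.empty?
  values.inject(operator)
end

p calculate('add', [1, 2, 3])    # => 6
p calculate('Subtract', [10, 3]) # => 7
p calculate('divide', [10, 2])   # => nil
```

Because inject takes any number of values, calculate works unchanged whether the user enters two numbers or ten.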
I think with_index in n.times.map.with_index is an artifact you forgot to delete.
It was deliberate but I like this better:
def get_int_values(n)
1.upto(n).map { |n|
print "Enter value ##{ n }: "
gets.chomp.to_i
}
end
Your assignments are on the wrong side of the statement. You should have answer = n1 * n2,
which is not the same as answer == n1 * n2 (this is a check for equality, using ==). The expression always goes on the right, and the variable the result is assigned to goes on the left -- this is pretty much universal, but not necessarily intuitive coming from algebra.
Also: using an @ prior to a variable name differentiates it as an instance variable, or member, of a class. From what you've shown here you don't need to include those; ordinary local variables are all this requires.
Check out this question for more on that part.
The "@" sigil is used to indicate a class instance variable; you have no class, so don't use it.
@n1 + @n2 == @answer
is a boolean expression evaluating whether @n1 + @n2 is equal to @answer.
It will evaluate to true or false... but you don't make use of the result.
What you want is ...
answer = n1 + n2
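To see the difference in action, here is a small sketch (sum_of is a made-up name, and the inputs are passed in as strings the way gets would return them) that also shows why the addition needs to_i:

```ruby
# Hypothetical helper: converts both inputs, then assigns the sum.
def sum_of(first_input, second_input)
  n1 = first_input.to_i
  n2 = second_input.to_i
  answer = n1 + n2 # assignment: the sum is stored in answer
  answer
end

p sum_of("2\n", "3\n") # => 5
p "2" + "3"            # => "23" (string concatenation without to_i)
p 2 + 3 == 5           # => true (a comparison only, nothing is stored)
```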
I strongly recommend you always run Ruby with the -w option. It will save you much much heartache.
Please indent your "end"'s to match your "def" (or "if").
You repeat n1 = gets.chomp.to_i all over the place, do it once and pass the answers as a parameter...
response = gets.chomp
n1 = gets.chomp.to_i
n2 = gets.chomp.to_i
if response == "add" then
addition_function( n1, n2)
elsif...
A few suggestions not mentioned by others:
Shorten your method (not "function") names and use verbs (e.g., add instead of addition_function).
As well as using local variables rather than instance variables (mentioned by others), eliminate them where you can. For example, you could simplify
def add
puts "Which numbers would you like to add?"
n1 = gets.to_i
n2 = gets.to_i
answer = n1 + n2
puts "The sum is... #{answer}"
end
to
def add
puts "Which numbers would you like to add?"
puts "The sum is... #{gets.to_i + gets.to_i}"
end
Notice I've used the Ruby convention of indenting two spaces.
You don't need chomp here (though it does no harm), because "123followed by \n or any other non-digits".to_i => 123.
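The flip side of that leniency is that to_i never complains about bad input. A quick sketch of the behaviour, plus a stricter alternative built on Kernel#Integer (strict_int is a made-up helper name):

```ruby
p "123\n".to_i # => 123
p "12abc".to_i # => 12
p "abc".to_i   # => 0

# Hypothetical helper: returns nil instead of silently treating bad input as 0.
# The second argument forces base 10, so "010" is read as ten, not octal.
def strict_int(text)
  Integer(text.strip, 10)
rescue ArgumentError
  nil
end

p strict_int("123\n") # => 123
p strict_int("12abc") # => nil
```

Whether you want the lenient or the strict behaviour depends on whether a typo should count as zero or prompt the user again.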
A case statement would work well at the end (and let's loop until the user chooses to quit):
loop do
puts "Would you like to [add], [multiply], [subtract] or [quit]?"
case gets.chomp
when "add"
add
when "subtract"
subtract
when "multiply"
multiply
when "quit"
break
end
end
or just
def quit() exit end # break is not valid inside a method body, so exit instead
loop do
puts "Would you like to [add], [multiply], [subtract] or [quit]?"
send(gets.chomp)
end
Here we do need chomp. You could replace loop do with while true do or use other equivalent constructs.
class Calculator
def Calc
puts "== welcome to mobiloitte calculator =="
puts "enter the first operand:"
@op1 = gets.chomp
return if @op1 == "q"
@o1 = @op1.to_i
puts "enter the second operand:"
@op2 = gets.chomp
return if @op2 == "q"
@o2 = @op2.to_i
puts "enter any one operator of your choice (add,sub,mul,div,mod)"
operator = gets.chomp
case operator
when 'add' then @s=@o1+@o2 ; puts "\n #@o1 + #@o2 =#@s"
when 'sub' then @t=@o1-@o2 ; puts "\n #@o1 - #@o2 =#@t"
when 'mul' then @l=@o1*@o2 ; puts "\n #@o1 * #@o2 =#@l"
when 'div' then @r=@o1/@o2 ; puts "\n #@o1 / #@o2 =#@r"
when 'mod' then @d=@o1%@o2 ; puts "\n #@o1 % #@o2 =#@d"
else
puts "invalid input"
end
end
end
obj = Calculator.new
$f = obj.Calc
You are using @n1 + @n2 == @answer to try and set the answer. What you want to do is @answer = @n1 + @n2.
= is assignment, == is a comparison operator.
Also, you will need to use @n1 = gets.chomp.to_i. This will convert your input from a string to an integer. Do that with @n2 as well.
You also do not need to use the @ before each of your variables. That should only be used when you are dealing with classes, which you do not appear to be doing.
print "enter number 1 : "
n1 = gets.chomp.to_f
print "enter number 2 : "
n2 = gets.chomp.to_f
print "enter operator: "
op = gets.chomp
if op == '+'
puts "#{n1} + #{n2} = #{n1 + n2}"
elsif op == '-'
puts "#{n1} - #{n2} = #{n1 - n2}"
elsif op == '*'
puts "#{n1} * #{n2} = #{n1 * n2}"
elsif op == '/'
puts "#{n1} / #{n2} = #{n1 / n2}"
end
puts "Would you like to
0 ---- [exit],
1 ---- [add],
2 ---- [subtract],
3 ---- [multiply],
4 ---- [divide]"
response = gets.chomp
case response.downcase
when '1'
def addition_function
puts "Which numbers would you like to add?"
n1 = gets.to_i
n2 = gets.to_i
answer = n1 + n2
puts "The sum is... #{n1} + #{n2} = #{answer}"
end
addition_function()
#Subtract
when '2'
def subtraction_function
puts "Which numbers would you like to subtract?"
n1 = gets.to_i
n2 = gets.to_i
answer = n1 - n2
puts "The subtraction is... #{n1} - #{n2} = #{answer}"
end
subtraction_function()
#Multiply
when '3'
def multiplication_function
puts "Which numbers would you like to multiply?"
n1 = gets.to_i
n2 = gets.to_i
answer = n1 * n2
puts "The multiplication is... #{n1} * #{n2} = #{answer}"
end
multiplication_function()
#Division
when '4'
def division_function
puts "Which numbers would you like to divide?"
n1 = gets.to_i
n2 = gets.to_i
answer = n1 / n2
puts "The division is... #{n1} / #{n2} = #{answer}"
end
division_function()
else '0'
puts "Exit! Thank You for using us!"
end
#ruby script to do the calculator
puts " enter the number1"
in1=gets.to_i
puts " enter the number2"
in2=gets.to_i
puts "enter the operator"
op=gets.chomp
case op
when '+'
plus=in1+in2
puts "#{in1+in2}"
#puts "#{plus}"
when '-'
min=in1-in2
puts "#{min}"
when '*'
mul= in1*in2
puts "#{mul}"
when '/'
div=in1/in2
puts "#{div}"
else
puts "invalid operator"
end
begin
puts 'First number:'
a = $stdin.gets.chomp.to_i
puts 'Second number:'
b = $stdin.gets.chomp.to_i
operation = nil
until ['+', '-', '*', '/', '**'].include?(operation)
puts 'Choose operation: (+ - * / **):'
operation = $stdin.gets.chomp
end
result = nil
success = false
case operation
when '+'
result = (a + b).to_s
when '-'
result = (a - b).to_s
when '*'
result = (a * b).to_s
when '/'
result = (a / b).to_s
when '**'
result = (a**b).to_s
else
puts 'There is not such kind of operation'
end
success = true
puts "Result: #{result}"
rescue ZeroDivisionError => e
puts "You tried to divide a number by zero! Error: #{e.message}"
end
if success
puts "\nSuccess!"
else
puts "\nSomething goes wrong :("
end
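Keep in mind that the rescue above only fires for integer operands; dividing a Float by zero does not raise. A small sketch of the difference (safe_divide is a made-up helper name):

```ruby
# Hypothetical helper: returns nil when integer division by zero is attempted.
def safe_divide(a, b)
  a / b
rescue ZeroDivisionError
  nil
end

p safe_divide(7, 2)   # => 3 (integer division)
p safe_divide(7, 0)   # => nil
p safe_divide(7.0, 0) # => Infinity (floats do not raise)
```

So if you switch the inputs from to_i to to_f, you would need an explicit check for a zero divisor instead of relying on the exception.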
puts ("plz enter a number :")
num1 = gets.chomp.to_f
puts ("plz enter a another number")
num2 = gets.chomp.to_f
puts ("plz enter the operation + , - , x , / ")
opp = gets.chomp
if opp == "+"
puts (num1 + num2)
elsif opp == "-"
puts (num1 - num2)
elsif opp == "x"
puts (num1 * num2)
elsif opp == "/"
puts (num1 / num2)
else puts ("try again :|")
end

How to get words frequency in efficient way with ruby?

Sample input:
"I was 09809 home -- Yes! yes! You was"
and output:
{ 'yes' => 2, 'was' => 2, 'i' => 1, 'home' => 1, 'you' => 1 }
My code that does not work:
def get_words_f(myStr)
myStr=myStr.downcase.scan(/\w/).to_s;
h = Hash.new(0)
myStr.split.each do |w|
h[w] += 1
end
return h.to_a;
end
print get_words_f('I was 09809 home -- Yes! yes! You was');
This works but I am kinda new to Ruby too. There might be a better solution.
def count_words(string)
words = string.split(' ')
frequency = Hash.new(0)
words.each { |word| frequency[word.downcase] += 1 }
return frequency
end
Instead of .split(' '), you could also do .scan(/\w+/); however, .scan(/\w+/) would separate aren and t in "aren't", while .split(' ') won't.
Output of your example code:
print count_words('I was 09809 home -- Yes! yes! You was');
#{"i"=>1, "was"=>2, "09809"=>1, "home"=>1, "yes"=>2, "you"=>1}
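If you want the sample output from the question, where the number is dropped and the most frequent words come first, one possible middle ground between split and scan is to scan for runs of letters and apostrophes. This is only a sketch under that assumption (word_frequencies is a made-up name, and the pattern only covers ASCII letters):

```ruby
# Hypothetical helper: counts words made of ASCII letters and apostrophes,
# case-insensitively, and orders the result by descending count.
def word_frequencies(text)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z']+/) { |word| counts[word] += 1 }
  counts.sort_by { |_word, count| -count }.to_h
end

p word_frequencies("I was 09809 home -- Yes! yes! You was")
p word_frequencies("They aren't here, aren't they?")
```

The first call gives counts of 2 for "was" and "yes" and 1 for "i", "home" and "you"; the order among words with equal counts is not guaranteed. The second keeps "aren't" in one piece.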
def count_words(string)
string.scan(/\w+/).reduce(Hash.new(0)){|res,w| res[w.downcase]+=1;res}
end
Second variant:
def count_words(string)
string.scan(/\w+/).each_with_object(Hash.new(0)){|w,h| h[w.downcase]+=1}
end
def count_words(string)
Hash[
string.scan(/[a-zA-Z]+/)
.group_by{|word| word.downcase}
.map{|word, words|[word, words.size]}
]
end
puts count_words 'I was 09809 home -- Yes! yes! You was'
This code will ask you for input and then find the word frequency for you:
puts "enter some text man"
text = gets.chomp
words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word.downcase] += 1 }
frequencies = frequencies.sort_by {|a, b| b}
frequencies.reverse!
frequencies.each do |word, frequency|
puts word + " " + frequency.to_s
end
This works, and ignores the numbers:
def get_words(my_str)
my_str = my_str.scan(/\w+/)
h = Hash.new(0)
my_str.each do |s|
s = s.downcase
if s !~ /^[0-9]*\.?[0-9]+$/
h[s] += 1
end
end
return h
end
print get_words('I was there 1000 !')
puts
You can look at my code that splits the text into words. The basic code would look as follows:
sentence = "Ala ma kota za 5zł i 10$."
splitter = SRX::Polish::WordSplitter.new(sentence)
histogram = Hash.new(0)
splitter.each do |word,type|
histogram[word.downcase] += 1 if type == :word
end
p histogram
You should be careful if you wish to work with languages other than English, since in Ruby 1.9 the downcase won't work as you expected for letters such as 'Ł'.
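On newer Rubies the situation is better: as far as I know, String#downcase has handled non-ASCII letters since Ruby 2.4. The sketch below assumes Ruby 2.4 or later and swaps the word splitter for a plain \p{L}+ scan, so it is not equivalent to the SRX code above; letter_histogram is a made-up name:

```ruby
# Hypothetical helper: counts runs of Unicode letters, case-insensitively.
# Assumes Ruby 2.4+, where downcase applies Unicode case mapping.
def letter_histogram(sentence)
  sentence.scan(/\p{L}+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word.downcase] += 1
  end
end

# "\u0141" is the capital L with stroke, "\u0142" the small one.
p "\u0141".downcase == "\u0142" # => true on Ruby 2.4+

# The two words below differ only in the case of their first letter.
p letter_histogram("\u0141\u00F3d\u017A \u0142\u00F3d\u017A").values # => [2]
```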
class String
def frequency
self.scan(/[a-zA-Z]+/).each.with_object(Hash.new(0)) do |word, hash|
hash[word.downcase] += 1
end
end
end
puts "I was 09809 home -- Yes! yes! You was".frequency

Find consecutive substring indexes

Given a search string and a result string (which is guaranteed to contain all letters of the search string, case-insensitive, in order), how can I most efficiently get an array of ranges representing the indices in the result string corresponding to the letters in the search string?
Desired output:
substrings( "word", "Microsoft Office Word 2007" )
#=> [ 17..20 ]
substrings( "word", "Network Setup Wizard" )
#=> [ 3..5, 19..19 ]
#=> [ 3..4, 18..19 ] # Alternative, acceptable, less-desirable output
substrings( "word", "Watch Network Daemon" )
#=> [ 0..0, 10..11, 14..14 ]
This is for an autocomplete search box. Here's a screenshot from a tool similar to Quicksilver that underlines letters as I'm looking to do. Note that--unlike my ideal output above--this screenshot does not prefer longer single matches.
Benchmark Results
Benchmarking the current working results shows that @tokland's regex-based answer is basically as fast as the StringScanner-based solutions I put forth, with less code:
user system total real
phrogz1 0.889000 0.062000 0.951000 ( 0.944000)
phrogz2 0.920000 0.047000 0.967000 ( 0.977000)
tokland 1.030000 0.000000 1.030000 ( 1.035000)
Here is the benchmark test:
a=["Microsoft Office Word 2007","Network Setup Wizard","Watch Network Daemon"]
b=["FooBar","Foo Bar","For the Love of Big Cars"]
test = { a=>%w[ w wo wor word ], b=>%w[ f fo foo foobar fb fbr ] }
require 'benchmark'
Benchmark.bmbm do |x|
%w[ phrogz1 phrogz2 tokland ].each{ |method|
x.report(method){ test.each{ |words,terms|
words.each{ |master| terms.each{ |term|
2000.times{ send(method,term,master) }
} }
} }
}
end
To have something to start with, how about that?
>> s = "word"
>> re = /#{s.chars.map{|c| "(#{c})" }.join(".*?")}/i # /(w).*?(o).*?(r).*?(d)/i
>> match = "Watch Network Daemon".match(re)
=> #<MatchData "Watch Network D" 1:"W" 2:"o" 3:"r" 4:"D">
>> 1.upto(s.length).map { |idx| match.begin(idx) }
=> [0, 10, 11, 14]
And now you only have to build the ranges (if you really need them, I guess the individual indexes are also ok).
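For completeness, here is one way the range-building step might look. It is a sketch rather than the code that was benchmarked: it assumes Ruby 2.2+ for slice_when, escapes each character so that regex metacharacters in the search string are treated literally, and returns nil when nothing matches:

```ruby
# Sketch: match the search letters in order, then group consecutive
# match positions into ranges.
def substrings(search, result)
  pattern = search.chars.map { |c| "(#{Regexp.escape(c)})" }.join(".*?")
  match = result.match(/#{pattern}/i)
  return nil unless match

  indexes = 1.upto(search.length).map { |idx| match.begin(idx) }
  # Split wherever two neighbouring indexes are not consecutive,
  # then turn each run into a first..last range.
  indexes.slice_when { |a, b| b != a + 1 }.map { |run| run.first..run.last }
end

p substrings("word", "Microsoft Office Word 2007") # => [17..20]
p substrings("word", "Network Setup Wizard")       # => [3..5, 19..19]
p substrings("word", "Watch Network Daemon")       # => [0..0, 10..11, 14..14]
```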
Ruby's Abbrev module is a good starting point. It breaks down a string into a hash consisting of the unique keys that can identify the full word:
require 'abbrev'
require 'pp'
abbr = Abbrev::abbrev(['ruby'])
>> {"rub"=>"ruby", "ru"=>"ruby", "r"=>"ruby", "ruby"=>"ruby"}
For every keypress you can do a lookup and see if there's a match. I'd filter out all keys shorter than a certain length, to reduce the size of the hash.
The keys will also give you a quick set of words to look up the subword matches in your original string.
For fast lookups to see if there's a substring match:
regexps = Regexp.union(
abbr.keys.sort.reverse.map{ |k|
Regexp.new(
Regexp.escape(k),
Regexp::IGNORECASE
)
}
)
Note that it's escaping the patterns, which would allow characters to be entered, such as ?, * or ., and be treated as literals, instead of special characters for regex, like they would normally be treated.
The result looks like:
/(?i-mx:ruby)|(?i-mx:rub)|(?i-mx:ru)|(?i-mx:r)/
Regexp's match will return information about what was found.
Because the union "ORs" the patterns, it will only find the first match, which will be the shortest occurrence in the string. To fix that reverse the sort.
That should give you a good start on what you want to do.
EDIT: Here's some code to directly answer the question. We've been busy at work so it's taken a couple of days to get back to this:
require 'abbrev'
require 'pp'
abbr = Abbrev::abbrev(['ruby'])
regexps = Regexp.union( abbr.keys.sort.reverse.map{ |k| Regexp.new( Regexp.escape(k), Regexp::IGNORECASE ) } )
target_str ='Ruby rocks, rub-a-dub-dub, RU there?'
str_offset = 0
offsets = []
loop do
match_results = regexps.match(target_str, str_offset)
break if (match_results.nil?)
s, e = match_results.offset(0)
offsets << [s, e - s]
str_offset = 1 + s
end
pp offsets
>> [[0, 4], [5, 1], [12, 3], [27, 2], [33, 1]]
If you want ranges replace offsets << [s, e - s] with offsets << [s .. e] which will return:
>> [[0..4], [5..6], [12..15], [27..29], [33..34]]
Here's a late entrant that's making a move as it nears the finish line.
code
def substrings( search_str, result_str )
search_chars = search_str.downcase.chars
next_char = search_chars.shift
result_str.downcase.each_char.with_index.take_while.with_object([]) do |(c,i),a|
if next_char == c
(a.empty? || i != a.last.last+1) ? a << (i..i) : a[-1]=(a.last.first..i)
next_char = search_chars.shift
end
next_char
end
end
demo
substrings( "word", "Microsoft Office Word 2007" ) #=> [17..20]
substrings( "word", "Network Setup Wizard" ) #=> [3..5, 19..19]
substrings( "word", "Watch Network Daemon" ) #=> [0..0, 10..11, 14..14]
benchmark
user system total real
phrogz1 1.120000 0.000000 1.120000 ( 1.123083)
cary 0.550000 0.000000 0.550000 ( 0.550728)
I don't think there are any built-in methods that will really help with this; probably the best way is to go through each letter in the word you're searching for and build up the ranges manually. Your next best option would probably be to build a regex like in @tokland's answer.
Here's my implementation:
require 'strscan'
def substrings( search, master )
[].tap do |ranges|
scan = StringScanner.new(master)
init = nil
last = nil
prev = nil
search.chars.map do |c|
return nil unless scan.scan_until /#{c}/i
last = scan.pos-1
if !init || (last-prev) > 1
ranges << (init..prev) if init
init = last
end
prev = last
end
ranges << (init..last)
end
end
And here's a shorter version using another utility method (also needed by @tokland's answer):
require 'strscan'
def substrings( search, master )
s = StringScanner.new(master)
search.chars.map do |c|
return nil unless s.scan_until(/#{c}/i)
s.pos - 1
end.to_ranges
end
class Array
def to_ranges
return [] if empty?
[].tap do |ranges|
init,last = first
each do |o|
if last && o != last.succ
ranges << (init..last)
init = o
end
last = o
end
ranges << (init..last)
end
end
end
