Check if a string variable is in a set of strings - ruby

Which one is better:
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
x =~ /abc|def|ghi/
?

Which one is better? The question can't be easily answered, because they don't all do the same things.
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
compare x against fixed strings for equality. x has to be one of those values. Between those two I tend to go with the second because it's easier to maintain. Imagine what it would look like if you had to compare against twenty, fifty or one hundred strings.
The third test:
x =~ /abc|def|ghi/
matches substrings:
x = 'xyzghi'
(x =~ /abc|def|ghi/) # => 3
so it isn't the same as the first two.
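If the regex is meant to behave like the first two tests, it has to be anchored to the whole string. A minimal sketch, assuming exact-match semantics are what is wanted; the constant and method names are made up for illustration:

```ruby
# Hypothetical names, for illustration only.
ALLOWED = %w[abc def ghi].freeze
EXACT   = /\A(?:abc|def|ghi)\z/ # \A and \z anchor to the whole string

def in_list?(x)
  ALLOWED.include?(x)
end

def exact_match?(x)
  !(x =~ EXACT).nil?
end

in_list?('ghi')        # => true
in_list?('xyzghi')     # => false
exact_match?('ghi')    # => true
exact_match?('xyzghi') # => false
```

Note that ^ and $ anchor to lines rather than to the whole string in Ruby, so "abc\nzzz" =~ /^(?:abc|def|ghi)$/ still matches, while the \A...\z version does not.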
EDIT: There are some things in the benchmarks done by nash that I'd do differently. Using Ruby 1.9.2-p180 on a MacBook Pro, this tests 1,000,000 loops and compares the results of anchoring the regex, using grouping, along with not splitting the %w() array each time through the loop:
require 'benchmark'
str = "test"
n = 1_000_000
Benchmark.bm do |x|
x.report { n.times { str == 'abc' || str == 'def' || str == 'ghi' } }
x.report { n.times { %w(abc def ghi).include? str } }
x.report { ary = %w(abc def ghi); n.times { ary.include? str } }
x.report { n.times { str =~ /abc|def|ghi/ } }
x.report { n.times { str =~ /^abc|def|ghi$/ } }
x.report { n.times { str =~ /^(abc|def|ghi)$/ } }
x.report { n.times { str =~ /^(?:abc|def|ghi)$/ } }
x.report { n.times { str =~ /\b(?:abc|def|ghi)\b/ } }
end
# >> user system total real
# >> 1.160000 0.000000 1.160000 ( 1.165331)
# >> 1.920000 0.000000 1.920000 ( 1.920120)
# >> 0.990000 0.000000 0.990000 ( 0.983921)
# >> 1.070000 0.000000 1.070000 ( 1.068140)
# >> 1.050000 0.010000 1.060000 ( 1.054852)
# >> 1.060000 0.000000 1.060000 ( 1.063909)
# >> 1.060000 0.000000 1.060000 ( 1.050813)
# >> 1.050000 0.000000 1.050000 ( 1.056147)

The first might be a tad quicker, since there are no method calls and you're doing straight string comparisons, but it's also probably the least readable and least maintainable.
The second is definitely the grooviest, and the Ruby way of going about it. It's the most maintainable, and probably the best to read.
The last way uses old-school Perl regex syntax. Fairly fast, not as annoying as the first to maintain, fairly readable.
I guess it depends what you mean by "better".
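If "better" also means scaling to many candidate strings, a Set from the standard library is another option. It is not part of the benchmarks in this thread, so treat this as a sketch rather than a measured recommendation. Set#include? does a hash lookup instead of scanning an array; the constant name is hypothetical:

```ruby
require 'set'

# Hypothetical constant name; built once, reused for every lookup.
KNOWN = Set.new(%w[abc def ghi]).freeze

KNOWN.include?('def')    # => true
KNOWN.include?('xyzghi') # => false
```

For three strings the difference is unlikely to matter; it would only be worth measuring once the list grows.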

some benchmarks:
require 'benchmark'
str = "test"
Benchmark.bm do |x|
x.report {100000.times {if str == 'abc' || str == 'def' || str == 'ghi'; end}}
x.report {100000.times {if %w(abc def ghi).include? str; end}}
x.report {100000.times {if str =~ /abc|def|ghi/; end}}
end
user system total real
0.250000 0.000000 0.250000 ( 0.251014)
0.374000 0.000000 0.374000 ( 0.402023)
0.265000 0.000000 0.265000 ( 0.259014)
So as you can see, the first way is faster than the others. And the longer str is, the slower the last way gets:
str = "testasdasdasdasdasddkmfskjndfbdkjngdjgndksnfg"
user system total real
0.234000 0.000000 0.234000 ( 0.248014)
0.405000 0.000000 0.405000 ( 0.403023)
1.046000 0.000000 1.046000 ( 1.038059)

Related

String length modification in Ruby

How can I fix the length of strings to be 10? An input string will always have 10 or fewer characters. If it has fewer than 10 characters, I have to add 0's at the beginning of the string.
Example of input:
123456
1234567
Needed output:
0000123456
0001234567
Input is arbitrary.
Thanks in advance!
String#rjust does just that:
'1234567'.rjust(10, '0') # => "0001234567"
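A small sketch of how rjust behaves on the inputs from the question, plus one input that is already longer than the target width. The helper name pad10 is hypothetical:

```ruby
# pad10 is a hypothetical helper name, not part of Ruby.
def pad10(s)
  s.rjust(10, '0')
end

pad10('123456')      # => "0000123456"
pad10('1234567')     # => "0001234567"
pad10('12345678901') # => "12345678901" (11 characters, returned unchanged)
```

rjust only pads and never truncates, so a string longer than 10 characters comes back as it was.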
a = "123456"
b = "1234567"
"0"*(10-a.size)+a
=> "0000123456"
"0"*(10-b.size)+b
=> "0001234567"
Interestingly for speed purposes:
a = "123456"
n = 50000
Benchmark.bm do |x|
x.report{n.times do ; a.rjust(10,'0'); end}
x.report{n.times do ; "0"*(10-a.size)+a; end}
end
user system total real
0.020000 0.000000 0.020000 ( 0.016442)
0.010000 0.000000 0.010000 ( 0.015134)
or an even higher sample size:
irb(main):001:0> require 'benchmark'
=> true
irb(main):002:0> a = "123456"
=> "123456"
irb(main):003:0> n = 5_000_000
=> 5000000
irb(main):004:0> Benchmark.bm do |x|
irb(main):005:1* x.report{n.times do ; a.rjust(10,'0'); end}
irb(main):006:1> x.report{n.times do ; "0"*(10-a.size)+a; end}
irb(main):007:1> end
user system total real
1.510000 0.000000 1.510000 ( 1.519720)
1.480000 0.000000 1.480000 ( 1.486935)

Sorting array of string by numbers

I want to sort an array like the following:
["10a","10b","9a","9b","8a","8b"]
When I call,
a = a.sort {|a,b| a <=> b}
it will sort like the following:
["10a","10b","8a","8b","9a","9b"]
The 10 is a string and is not handled as a number. When I first sort by integer and then by string, it will just do the same. Does anybody know how I can handle the 10 as a 10 without making it into an integer? That would mess up the letters a, b etc.
"When I first sort by integer and then by string, it will just do the same."
That would have been my first instinct, and it seems to work perfectly:
%w[10a 10b 9a 9b 8a 8b].sort_by {|el| [el.to_i, el] }
# => ['8a', '8b', '9a', '9b', '10a', '10b']
I'd do something like this:
ary = ["10a","10b","9a","9b","8a","8b"]
sorted_ary = ary.sort_by{ |e|
/(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]
}
ary # => ["10a", "10b", "9a", "9b", "8a", "8b"]
sorted_ary # => ["8a", "8b", "9a", "9b", "10a", "10b"]
sort_by is going to be faster than sort for this sort of problem. Because the value being sorted isn't a direct comparison and we need to dig into it to get the values to use for collation, a normal sort would have to do it multiple times for each element. Instead, using sort_by caches the computed value, and then sorts based on it.
/(?<digit>\d+)(?<alpha>\D+)/ =~ e isn't what you'll normally see for a regular expression. The named-captures ?<digit> and ?<alpha> define the names of local variables that can be accessed immediately, when used in that form.
[digit.to_i, alpha] returns an array consisting of the leading number converted to an integer, followed by the character. That array is then used for comparison by sort_by.
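The two pieces described above can be tried in isolation. This is only an illustrative sketch using the same regex; the variable name key is made up:

```ruby
# Named captures become local variables only when the regex literal
# is on the left-hand side of =~.
if /(?<digit>\d+)(?<alpha>\D+)/ =~ '10b'
  key = [digit.to_i, alpha]
end
key                       # => [10, "b"]

# Arrays compare element by element, so the integer decides first
# and the letter breaks ties.
([9, 'b'] <=> [10, 'a'])  # => -1
([10, 'a'] <=> [10, 'b']) # => -1
('9b' <=> '10a')          # => 1 (plain string comparison gets it wrong)
```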
Benchmarking sort vs. sort_by using Fruity: I added some length to the array being sorted to push the routines a bit harder for better time resolution.
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 1000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test 2 times. Test will take about 1 second.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 19.999999999999996% ± 1.0%
# >> Running each test once. Test will take about 1 second.
# >> jorge_sort_by is faster than jorg_sort by 10.000000000000009% ± 1.0%
Ruby's sort_by uses a Schwartzian Transform, which can make a major difference in sort speed when dealing with objects where we have to compute the value to be sorted.
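To see the caching effect without timing anything, one can count how often the key is computed. A rough sketch; the counter and lambda are made up for illustration, and the exact number of calls made by sort depends on the sort implementation:

```ruby
ary = %w[10a 10b 9a 9b 8a 8b]

calls = 0
key = lambda do |s|
  calls += 1 # count every key computation
  [s.to_i, s]
end

# sort recomputes the key for both operands of every comparison.
with_sort = ary.sort { |a, b| key.call(a) <=> key.call(b) }
sort_calls = calls

# sort_by computes the key once per element, then sorts on the cached keys.
calls = 0
with_sort_by = ary.sort_by { |s| key.call(s) }
sort_by_calls = calls

with_sort == with_sort_by  # => true
sort_by_calls              # => 6, one per element
sort_calls > sort_by_calls # => true
```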
Could you run your benchmark for 100_000 instead of 1_000 in the definition of ARY?
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 100_000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test once. Test will take about 10 seconds.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 2x ± 1.0
# >> Running each test once. Test will take about 26 seconds.
# >> jorg_sort is similar to jorge_sort_by
The Wikipedia article has a good efficiency analysis and example that explains why sort_by is preferred for costly comparisons.
Ruby's sort_by documentation also covers this well.
I don't think the size of the array will make much difference. If anything, as the array size grows, if the calculation for the intermediate value is costly, sort_by will still be faster because of its caching. Remember, sort_by is all compiled code, whereas using a Ruby-script-based transform is subject to slower execution as the array is transformed, handed off to sort and then the original object is plucked from the sub-arrays. A larger array means it just has to be done more times.
▶ a = ["10a","10b","9a","9b","8a","8b"]
▶ a.sort { |a,b| a.to_i == b.to_i ? a <=> b : a.to_i <=> b.to_i }
#=> [
# [0] "8a",
# [1] "8b",
# [2] "9a",
# [3] "9b",
# [4] "10a",
# [5] "10b"
#]
Hope it helps.
Two ways that don't use String#to_i (but rely on the assumption that each string consists of one or more digits followed by one lower case letter).
ary = ["10a","10b","9a","9b","8a","8b","100z", "96b"]
#1
mx = ary.map(&:size).max
ary.sort_by { |s| s.rjust(mx) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
#2
ary.sort_by { |s| s.to_i(36) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
Hmmm, I wonder if:
ary.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
or
ary.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
would be faster.
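Why the two keys above sort correctly, and what happens when the stated assumption (digits followed by exactly one lower case letter) does not hold. A small sketch with hand-picked values:

```ruby
# Base 36 treats the trailing letter as one more digit, so a string
# with more leading digits always yields a larger number.
'9a'.to_i(36)   # => 334   (9 * 36 + 10)
'10a'.to_i(36)  # => 1306  (1 * 36**2 + 0 * 36 + 10)

# rjust pads with spaces, and a space sorts before any digit.
'9a'.rjust(3)   # => " 9a"
(' 9a' < '10a') # => true

# With two trailing letters the assumption is broken and the key misorders:
'9aa'.to_i(36)                         # => 12034
%w[10a 9aa].sort_by { |s| s.to_i(36) } # => ["10a", "9aa"]
```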
["10a","10b","9a","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9a", "9b", "10a", "10b"]
["10a3","10a4", "9", "9aa","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9", "9aa", "9b", "10a3", "10a4"]
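The intermediate values make it easier to see what the sort key looks like. A sketch that pulls the block out into a lambda; the names key and part are made up here:

```ruby
# The capture group in the pattern makes split keep the non-digit runs.
'10a3'.split(/(\D+)/) # => ["10", "a", "3"]

# Even positions are digit runs (converted to integers),
# odd positions are the non-digit runs in between.
key = lambda do |s|
  s.split(/(\D+)/).map.with_index { |part, i| i.odd? ? part : part.to_i }
end

key.call('10a3') # => [10, "a", 3]
key.call('9')    # => [9]
key.call('9aa')  # => [9, "aa"]

# A shorter array that is a prefix of a longer one sorts first.
([9] <=> [9, 'aa']) # => -1
```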
Gentlemen, start your engines!
I decided to benchmark the various solutions that have been offered. One of the things I was curious about was the effect of converting sort_by solutions to sort solutions. For example, I compared my method
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
to
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
This always involves mapping the original array to the transformed values within the sort_by block, sorting that array, then mapping the results back to the elements in the original array (when that can be done).
I tried this sort_by-to-sort conversion with some of the methods that use sort_by. Not surprisingly, the conversion to sort was generally faster, though the amount of improvement varied quite a bit.
Methods compared
module Methods
def mudasobwa(a)
a.sort { |a,b| a.to_i == b.to_i ? a <=> b : a.to_i <=> b.to_i }
end
def jorg(a)
a.sort_by {|el| [el.to_i, el] }
end
def jorg_sort(a)
a.map {|el| [el.to_i, el] }.sort.map(&:last)
end
def the(a)
a.sort_by {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha] }
end
def the_sort(a)
a.map {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]}.sort.map {|d,a| d.to_s+a }
end
def engineer(a) a.sort_by { |s|
s.scan(/(\d+)(\D+)/).flatten.tap{ |a| a[0] = a[0].to_i } }
end
def sawa(a) a.sort_by { |s|
s.split(/(\D+)/).map.with_index { |s, i| i.odd? ? s : s.to_i } }
end
def cary_rjust(a)
mx = a.map(&:size).max
a.sort_by {|s| s.rjust(mx)}
end
def cary_rjust_sort(a)
mx = a.map(&:size).max
a.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
end
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
end
include Methods
methods = Methods.instance_methods(false)
#=> [:mudasobwa, :jorg, :jorg_sort, :the, :the_sort, :engineer, :sawa,
# :cary_rjust, :cary_rjust_sort, :cary_to_i, :cary_to_i_sort]
Test data and helper
def test_data(n)
a = 10_000.times.to_a.map(&:to_s)
b = [*'a'..'z']
n.times.map { a.sample + b.sample }
end
def compute(m,a)
send(m,a)
end
Confirm methods return the same values
a = test_data(1000)
puts "All methods correct: #{methods.map { |m| compute(m,a) }.uniq.size == 1}"
Benchmark code
require 'benchmark'
indent = methods.map { |m| m.to_s.size }.max
n = 500_000
a = test_data(n)
puts "\nSort random array of size #{n}"
Benchmark.bm(indent) do |bm|
methods.each do |m|
bm.report m.to_s do
compute(m,a)
end
end
end
Test
Sort random array of size 500000
user system total real
mudasobwa 4.760000 0.000000 4.760000 ( 4.765170)
jorg 2.870000 0.020000 2.890000 ( 2.892359)
jorg_sort 2.980000 0.020000 3.000000 ( 3.010344)
the 9.040000 0.100000 9.140000 ( 9.160944)
the_sort 4.570000 0.090000 4.660000 ( 4.668146)
engineer 10.110000 0.070000 10.180000 ( 10.198117)
sawa 27.310000 0.160000 27.470000 ( 27.504958)
cary_rjust 1.080000 0.010000 1.090000 ( 1.087788)
cary_rjust_sort 0.740000 0.000000 0.740000 ( 0.746132)
cary_to_i 0.570000 0.000000 0.570000 ( 0.576570)
cary_to_i_sort 0.460000 0.020000 0.480000 ( 0.477372)
Addendum
@theTinMan demonstrated that the comparison between the sort_by and sort methods is sensitive to the choice of test data. Using the data he used:
def test_data(n)
(%w[10a 10b 9a 9b 8a 8b] * (n/6)).shuffle
end
I got these results:
Sort random array of size 500000
user system total real
mudasobwa 0.620000 0.000000 0.620000 ( 0.622566)
jorg 0.620000 0.010000 0.630000 ( 0.636018)
jorg_sort 0.640000 0.010000 0.650000 ( 0.638493)
the 8.790000 0.090000 8.880000 ( 8.886725)
the_sort 2.670000 0.070000 2.740000 ( 2.743085)
engineer 3.150000 0.040000 3.190000 ( 3.184534)
sawa 3.460000 0.040000 3.500000 ( 3.506875)
cary_rjust 0.360000 0.010000 0.370000 ( 0.367094)
cary_rjust_sort 0.480000 0.010000 0.490000 ( 0.499956)
cary_to_i 0.190000 0.010000 0.200000 ( 0.187136)
cary_to_i_sort 0.200000 0.000000 0.200000 ( 0.203509)
Notice that the absolute times are also affected.
Can anyone explain the reason for the difference in the benchmarks?

What is the complexity of Ruby's Array#insert?

What is the complexity of Ruby's Array#insert?
Is it O(1) or O(n) (memory is copied)?
Simple benchmark shows that insert is O(n):
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..100000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
end
user system total real
0.078000 0.000000 0.078000 ( 0.077023)
0.500000 0.000000 0.500000 ( 0.522345)
5.953000 0.000000 5.953000 ( 5.967949)
The exception is pushing to the end of the array, which is (amortized) O(1):
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..100000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..10000000).to_a
x.report { 10000.times {arr.push 1} }
end
user system total real
0.000000 0.000000 0.000000 ( 0.001002)
0.000000 0.000000 0.000000 ( 0.001000)
0.000000 0.000000 0.000000 ( 0.001001)
0.000000 0.000000 0.000000 ( 0.002001)
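One way to picture the difference is a toy model that counts how many existing elements have to move. This is not how MRI implements Array#insert (the real work happens in C, presumably as a block memory move; that detail is an assumption here), and the method name is hypothetical:

```ruby
# Toy model: insert value at index by shifting later elements one slot
# to the right, and return how many elements were moved.
def insert_counting_moves(arr, index, value)
  moves = 0
  arr.push(nil) # grow by one slot
  (arr.size - 1).downto(index + 1) do |i|
    arr[i] = arr[i - 1]
    moves += 1
  end
  arr[index] = value
  moves
end

a = [1, 2, 3, 4, 5]
insert_counting_moves(a, 1, 99) # => 4 moves, a is now [1, 99, 2, 3, 4, 5]

b = [1, 2, 3, 4, 5]
insert_counting_moves(b, 5, 99) # => 0 moves, same as push
```

Inserting near the front moves almost every element, so the cost grows with the array size; inserting at the end moves none.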

Ruby - get bit range from variable

I have a variable and want to take a range of bits from that variable. I want the CLEANEST way to do this.
If x = 19767 and I want bit3 - bit8 (starting from the right):
100110100110111 is 19767 in binary.
I want the part in parentheses 100110(100110)111 so the answer is 38.
What is the simplest/cleanest/most-elegant way to implement the following function with Ruby?
bit_range(orig_num, first_bit, last_bit)
PS. Bonus points for answers that are computationally less intensive.
19767.to_s(2)[-9..-4].to_i(2)
or
19767 >> 3 & 0x3f
Update:
Soup-to-nuts (why do people say that, anyway?) ...
class Fixnum
def bit_range low, high
len = high - low + 1
self >> low & ~(-1 >> len << len)
end
end
p 19767.bit_range(3, 8)
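Fixnum was merged into Integer in Ruby 2.4 and the constant was removed in Ruby 3.2, so reopening it no longer works on current Rubies. A plain method with the signature from the question avoids that; this is a sketch of the same shift-and-mask idea, not the code that was benchmarked in this thread:

```ruby
# Shift the wanted bits down to position 0, then mask off everything
# above the requested width.
def bit_range(orig_num, first_bit, last_bit)
  width = last_bit - first_bit + 1
  (orig_num >> first_bit) & ((1 << width) - 1)
end

bit_range(19767, 3, 8) # => 38
bit_range(19767, 0, 2) # => 7
```

Ruby 2.7 and later also accept a range in Integer#[], so 19767[3..8] should give the same result there.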
orig_num.to_s(2)[(-last_bit-1)..(-first_bit-1)].to_i(2)
Here is how this could be done using pure number operations:
class Fixnum
def slice(range_or_start, length = nil)
if length
start = range_or_start
else
range = range_or_start
start = range.begin
length = range.count
end
mask = 2 ** length - 1
self >> start & mask
end
end
def p n
puts "0b#{n.to_s(2)}"; n
end
p 0b100110100110111.slice(3..8) # 0b100110
p 0b100110100110111.slice(3, 6) # 0b100110
Just to show the speeds of the suggested answers:
require 'benchmark'
ORIG_NUMBER = 19767
def f(x,i,j)
b = x.to_s(2)
n = b.size
b[(n-j-1)...(n-i)].to_i(2)
end
class Fixnum
def bit_range low, high
len = high - low + 1
self >> low & ~(-1 >> len << len)
end
def slice(range_or_start, length = nil)
if length
start = range_or_start
else
range = range_or_start
start = range.begin
length = range.count
end
mask = 2 ** length - 1
self >> start & mask
end
end
def p n
puts "0b#{n.to_s(2)}"; n
end
n = 1_000_000
puts "Using #{ n } loops in Ruby #{ RUBY_VERSION }."
Benchmark.bm(21) do |b|
b.report('texasbruce') { n.times { ORIG_NUMBER.to_s(2)[(-8 - 1)..(-3 - 1)].to_i(2) } }
b.report('DigitalRoss string') { n.times { ORIG_NUMBER.to_s(2)[-9..-4].to_i(2) } }
b.report('DigitalRoss binary') { n.times { ORIG_NUMBER >> 3 & 0x3f } }
b.report('DigitalRoss bit_range') { n.times { 19767.bit_range(3, 8) } }
b.report('Philip') { n.times { f(ORIG_NUMBER, 3, 8) } }
b.report('Semyon Perepelitsa') { n.times { ORIG_NUMBER.slice(3..8) } }
end
And the output:
Using 1000000 loops in Ruby 1.9.3.
user system total real
texasbruce 1.240000 0.010000 1.250000 ( 1.243709)
DigitalRoss string 1.000000 0.000000 1.000000 ( 1.006843)
DigitalRoss binary 0.260000 0.000000 0.260000 ( 0.262319)
DigitalRoss bit_range 0.840000 0.000000 0.840000 ( 0.858603)
Philip 1.520000 0.000000 1.520000 ( 1.543751)
Semyon Perepelitsa 1.150000 0.010000 1.160000 ( 1.155422)
That's on my old MacBook Pro. Your mileage might vary.
Makes sense to define a function for that:
def f(x,i,j)
b = x.to_s(2)
n = b.size
b[(n-j-1)...(n-i)].to_i(2)
end
puts f(19767, 3, 8) # => 38
Expanding on the idea from DigitalRoss - instead of taking two arguments, you can pass a range:
class Fixnum
def bit_range range
len = range.last - range.first + 1
self >> range.first & ~(-1 >> len << len)
end
end
19767.bit_range 3..8

In Ruby how do I generate a long string of repeated text?

What is the best way to generate a long string quickly in ruby? This works, but is very slow:
str = ""
length = 100000
(1..length).each {|i| str += "0"}
I've also noticed that creating a string of a decent length and then appending that to an existing string up to the desired length works much faster:
str = ""
incrementor = ""
length = 100000
(1..1000).each {|i| incrementor += "0"}
(1..100).each {|i| str += incrementor}
Any other suggestions?
str = "0" * 999999
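The loop in the question is slow because str += "0" builds a brand new string on every iteration and copies the old contents into it. If the string has to be built up piece by piece, appending in place with << avoids that copying. A minimal sketch:

```ruby
length = 100_000

# << appends to the existing string instead of allocating a new one.
str = ''.dup # dup gives a mutable string even with frozen string literals
length.times { str << '0' }

str.size            # => 100000
str == '0' * length # => true
```

For a single repeated character, "0" * length remains the simpler choice.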
Another relatively quick option is
str = '%0999999d' % 0
Though benchmarking
require 'benchmark'
Benchmark.bm(9) do |x|
x.report('format :') { '%099999999d' % 0 }
x.report('multiply:') { '0' * 99999999 }
end
Shows that multiplication is still faster
user system total real
format : 0.300000 0.080000 0.380000 ( 0.405345)
multiply: 0.080000 0.080000 0.160000 ( 0.172504)
