In Ruby how do I generate a long string of repeated text? - ruby

What is the best way to generate a long string quickly in Ruby? This works, but is very slow:
str = ""
length = 100000
(1..length).each {|i| str += "0"}
I've also noticed that creating a string of a decent length and then appending that to an existing string up to the desired length works much faster:
str = ""
incrementor = ""
length = 100000
(1..1000).each {|i| incrementor += "0"}
(1..100).each {|i| str += incrementor}
Any other suggestions?

str = "0" * 999999

Another relatively quick option is
str = '%0999999d' % 0
Though benchmarking
require 'benchmark'
Benchmark.bm(9) do |x|
x.report('format :') { '%099999999d' % 0 }
x.report('multiply:') { '0' * 99999999 }
end
shows that multiplication is still faster:
               user     system      total        real
format :   0.300000   0.080000   0.380000 (  0.405345)
multiply:  0.080000   0.080000   0.160000 (  0.172504)
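To see why the loop in the question is slow, here is a minimal sketch that builds the same string three ways. It is not taken from the answers above, and the method names are made up for illustration. The point is that `str += "0"` allocates a brand-new string on every iteration, while `String#*` allocates the result once and `String#<<` appends to the existing object.

```ruby
LENGTH = 100_000

# String#* builds the whole result in one call.
def zeros_by_multiply(n)
  "0" * n
end

# String#<< mutates the receiver instead of creating a new string each time.
def zeros_by_append(n)
  str = "".dup # unfrozen empty string
  n.times { str << "0" }
  str
end

# Array#join assembles the result from many small pieces.
def zeros_by_join(n)
  Array.new(n, "0").join
end

puts zeros_by_multiply(LENGTH) == zeros_by_append(LENGTH) # => true
puts zeros_by_append(LENGTH) == zeros_by_join(LENGTH)     # => true
```

All three produce the same string; only the amount of allocation and copying differs.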

Related

String length modification in Ruby

How can I pad strings to a fixed length of 10? An input string will always have 10 characters or fewer. If it has fewer than 10 characters, I have to add 0's at the beginning of the string.
Example of input:
123456
1234567
Needed output:
0000123456
0001234567
Input is arbitrary.
Thanks in advance!
String#rjust does just that:
'1234567'.rjust(10, '0') # => "0001234567"
a = "123456"
b = "1234567"
"0"*(10-a.size)+a
=> "0000123456"
"0"*(10-b.size)+b
=> "0001234567"
Interestingly, for speed purposes:
require 'benchmark'
a = "123456"
n = 50000
Benchmark.bm do |x|
x.report{n.times do ; a.rjust(10,'0'); end}
x.report{n.times do ; "0"*(10-a.size)+a; end}
end
user system total real
0.020000 0.000000 0.020000 ( 0.016442)
0.010000 0.000000 0.010000 ( 0.015134)
Or with an even higher sample size:
irb(main):001:0> require 'benchmark'
=> true
irb(main):002:0> a = "123456"
=> "123456"
irb(main):003:0> n = 5_000_000
=> 5000000
irb(main):004:0> Benchmark.bm do |x|
irb(main):005:1* x.report{n.times do ; a.rjust(10,'0'); end}
irb(main):006:1> x.report{n.times do ; "0"*(10-a.size)+a; end}
irb(main):007:1> end
user system total real
1.510000 0.000000 1.510000 ( 1.519720)
1.480000 0.000000 1.480000 ( 1.486935)
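For reference, a small sketch of the padding itself. The helper name `pad_left` is hypothetical, and the `format` line assumes the input is numeric, which the question does not guarantee.

```ruby
# Left-pad a string with zeros up to the given width.
# Strings that are already at least `width` long are returned unchanged.
def pad_left(str, width = 10)
  str.rjust(width, "0")
end

puts pad_left("123456")       # => 0000123456
puts pad_left("1234567")      # => 0001234567
puts pad_left("12345678901")  # => 12345678901

# For integers, Kernel#format can do the same padding.
puts format("%010d", 123456)  # => 0000123456
```

Note that `rjust` never truncates, so anything longer than 10 characters passes through as-is.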

Which Ruby statement is more efficient?

I have a hash table:
hash = Hash.new(0)
hash[:key] = hash[:key] + 1 # Line 1
hash[:key] += 1 # Line 2
Line 1 and Line 2 do the same thing. It looks like line 1 needs to query the hash by key twice while line 2 does so only once. Is that true? Or are they actually the same?
I created a Ruby script to benchmark it:
require 'benchmark'
def my_case1()
  @hash[:key] = @hash[:key] + 1
end
def my_case2()
  @hash[:key] += 1
end
n = 10000000
Benchmark.bm do |test|
  test.report("case 1") {
    @hash = Hash.new(1)
    @hash[:key] = 0
    n.times do; my_case1(); end
  }
  test.report("case 2") {
    @hash = Hash.new(1)
    @hash[:key] = 0
    n.times do; my_case2(); end
  }
end
Here is the result
             user     system      total        real
case 1   3.620000   0.080000   3.700000 (  4.253319)
case 2   3.560000   0.080000   3.640000 (  4.178699)
It looks like hash[:key] += 1 is slightly faster.
@sza beat me to it :)
Here is my example irb session:
> require 'benchmark'
=> true
> n = 10000000
=> 10000000
> Benchmark.bm do |x|
> hash = Hash.new(0)
> x.report("Case 1:") { n.times do; hash[:key] = hash[:key] + 1; end }
> hash = Hash.new(0)
> x.report("Case 2:") { n.times do; hash[:key] += 1; end }
> end
              user     system      total        real
Case 1:   1.070000   0.000000   1.070000 (  1.071366)
Case 2:   1.040000   0.000000   1.040000 (  1.043644)
The Ruby Language Specification spells out the algorithm for evaluating abbreviated indexing assignment expressions quite clearly. It is something like this:
primary_expression[indexing_argument_list] ω= expression
# ω can be any operator, in this example, it is +
is (roughly) evaluated like
o = primary_expression
*l = indexing_argument_list
v = o.[](*l)
w = expression
l << (v ω w)
o.[]=(*l)
In particular, you can see that both the getter and the setter are called exactly once.
You can also see that by looking at the informal desugaring:
hash[:key] += 1
# is syntactic sugar for
hash[:key] = hash[:key] + 1
# which is syntactic sugar for
hash.[]=(:key, hash.[](:key).+(1))
Again, you see that both the setter and the getter are called exactly once.
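One way to check this empirically is to count the calls. The sketch below uses a hypothetical `CountingHash` subclass (not part of Ruby or of the answers above) that increments a counter each time `[]` or `[]=` is invoked.

```ruby
# Hash subclass that records how often the getter and setter are called.
class CountingHash < Hash
  attr_reader :reads, :writes

  def initialize(*args, &block)
    super
    @reads = 0
    @writes = 0
  end

  def [](key)
    @reads += 1
    super
  end

  def []=(key, value)
    @writes += 1
    super
  end
end

short = CountingHash.new(0)
short[:key] += 1
puts "+= form:   reads=#{short.reads} writes=#{short.writes}"  # reads=1 writes=1

long = CountingHash.new(0)
long[:key] = long[:key] + 1
puts "long form: reads=#{long.reads} writes=#{long.writes}"    # reads=1 writes=1
```

Both forms perform one lookup and one store, which fits the near-identical benchmark timings.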
The second one is the customary way of doing it, and in the benchmarks above it is marginally faster.

Why is `inject` very slow?

Here is the benchmark
require 'benchmark'
# create random array
arr = 40000.times.map { rand(100000).to_s }
r1 = ''
r2 = ''
r3 = ''
Benchmark.bm do |x|
x.report {
r1 = (arr.map { |s|
"[#{s}]"
}).join
}
x.report {
r2 = arr.inject('') { |memo, s|
memo + "[#{s}]"
}
}
x.report {
r3 = ''
arr.each { |s|
r3 << "[#{s}]"
}
}
end
# confirm result is same
puts r1 == r2
puts r2 == r3
Here is result
user system total real
0.047000 0.000000 0.047000 ( 0.046875)
5.031000 0.844000 5.875000 ( 5.875000)
0.031000 0.000000 0.031000 ( 0.031250)
true
true
Is there any way to make inject faster?
Here's my guess: unlike the other two methods, the approach with inject keeps creating bigger and bigger strings. All of them (except the last) are temporary and will have to be garbage-collected. That's wasted memory and CPU right there. This is also a good example of Shlemiel the Painter's algorithm.
... The inefficiency to which Spolsky was drawing an analogy was the poor programming practice of repeated concatenation of C-style null-terminated character arrays (that is, strings) in which the position of the destination string has to be recomputed from the beginning of the string each time because it is not carried over from a previous concatenation. ...
The approach with map creates many small strings, so, at least, it doesn't spend as much time allocating memory.
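A rough way to put numbers on that guess is to count how many characters end up being copied. This is only a back-of-the-envelope sketch: the method names are invented, and the append count ignores any copying Ruby does internally when it grows the buffer.

```ruby
# With String#+, every step copies the whole accumulated string into a new one.
def chars_copied_with_plus(pieces)
  copied = 0
  memo = ""
  pieces.each do |piece|
    memo = memo + piece
    copied += memo.length # size of the freshly allocated string
  end
  copied
end

# With in-place appending, each piece is copied once (buffer growth ignored).
def chars_copied_with_append(pieces)
  pieces.sum(&:length)
end

pieces = Array.new(1_000) { "[12345]" } # 1,000 pieces of 7 characters
puts chars_copied_with_plus(pieces)    # => 3503500
puts chars_copied_with_append(pieces)  # => 7000
```

The first figure grows quadratically with the number of pieces, the second linearly.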
Update
As pointed out by Yevgeniy Anfilofyev in the comments, you can avoid creation of many big strings by not creating any. Just keep appending to the memo.
r2 = arr.inject('') { |memo, s|
memo << "[#{s}]"
}
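This works because inject uses the block's return value as the next memo. String#<< appends to the memo in place and returns that same object, whereas String#+ allocates and returns a new string every time.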
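The difference between the two operators is easy to see with object identity. A minimal sketch follows; `bracket_all` is just an illustrative name for the inject variant shown above.

```ruby
# inject passes the block's return value along as the next memo,
# and String#<< returns the (mutated) receiver.
def bracket_all(arr)
  arr.inject("".dup) { |memo, s| memo << "[#{s}]" }
end

a = "ab".dup
plus_result = a + "c"
puts plus_result.equal?(a)   # => false (String#+ made a new object)

append_result = a << "c"
puts append_result.equal?(a) # => true (String#<< returned the receiver)
puts a                       # => abc

puts bracket_all(%w[1 2 3])  # => [1][2][3]
```

`each_with_object` would express the same idea without relying on the block's return value.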

Ruby - get bit range from variable

I have a variable and want to take a range of bits from that variable. I want the CLEANEST way to do this.
If x = 19767 and I want bit 3 through bit 8 (counting from 0, starting from the right):
100110100110111 is 19767 in binary.
I want the part in parentheses, 100110(100110)111, so the answer is 38.
What is the simplest/cleanest/most-elegant way to implement the following function with Ruby?
bit_range(orig_num, first_bit, last_bit)
PS. Bonus points for answers that are computationally less intensive.
19767.to_s(2)[-9..-4].to_i(2)
or
19767 >> 3 & 0x3f
Update:
Soup-to-nuts (why do people say that, anyway?) ...
class Integer # Fixnum was folded into Integer in Ruby 2.4
def bit_range low, high
len = high - low + 1
self >> low & ~(-1 >> len << len)
end
end
p 19767.bit_range(3, 8)
orig_num.to_s(2)[(-last_bit-1)..(-first_bit-1)].to_i(2)
Here is how this could be done using pure number operations:
class Integer # Fixnum was folded into Integer in Ruby 2.4
def slice(range_or_start, length = nil)
if length
start = range_or_start
else
range = range_or_start
start = range.begin
length = range.count
end
mask = 2 ** length - 1
self >> start & mask
end
end
def p n
puts "0b#{n.to_s(2)}"; n
end
p 0b100110100110111.slice(3..8) # 0b100110
p 0b100110100110111.slice(3, 6) # 0b100110
Just to show the speeds of the suggested answers:
require 'benchmark'
ORIG_NUMBER = 19767
def f(x,i,j)
b = x.to_s(2)
n = b.size
b[(n-j-1)...(n-i)].to_i(2)
end
class Integer # Fixnum was folded into Integer in Ruby 2.4
def bit_range low, high
len = high - low + 1
self >> low & ~(-1 >> len << len)
end
def slice(range_or_start, length = nil)
if length
start = range_or_start
else
range = range_or_start
start = range.begin
length = range.count
end
mask = 2 ** length - 1
self >> start & mask
end
end
def p n
puts "0b#{n.to_s(2)}"; n
end
n = 1_000_000
puts "Using #{ n } loops in Ruby #{ RUBY_VERSION }."
Benchmark.bm(21) do |b|
b.report('texasbruce') { n.times { ORIG_NUMBER.to_s(2)[(-8 - 1)..(-3 - 1)].to_i(2) } }
b.report('DigitalRoss string') { n.times { ORIG_NUMBER.to_s(2)[-9..-4].to_i(2) } }
b.report('DigitalRoss binary') { n.times { ORIG_NUMBER >> 3 & 0x3f } }
b.report('DigitalRoss bit_range') { n.times { 19767.bit_range(3, 8) } }
b.report('Philip') { n.times { f(ORIG_NUMBER, 3, 8) } }
b.report('Semyon Perepelitsa') { n.times { ORIG_NUMBER.slice(3..8) } }
end
And the output:
Using 1000000 loops in Ruby 1.9.3.
                            user     system      total        real
texasbruce              1.240000   0.010000   1.250000 (  1.243709)
DigitalRoss string      1.000000   0.000000   1.000000 (  1.006843)
DigitalRoss binary      0.260000   0.000000   0.260000 (  0.262319)
DigitalRoss bit_range   0.840000   0.000000   0.840000 (  0.858603)
Philip                  1.520000   0.000000   1.520000 (  1.543751)
Semyon Perepelitsa      1.150000   0.010000   1.160000 (  1.155422)
That's on my old MacBook Pro. Your mileage might vary.
It makes sense to define a function for that:
def f(x,i,j)
b = x.to_s(2)
n = b.size
b[(n-j-1)...(n-i)].to_i(2)
end
puts f(19767, 3, 8) # => 38
Expanding on the idea from DigitalRoss - instead of taking two arguments, you can pass a range:
class Integer # Fixnum was folded into Integer in Ruby 2.4
def bit_range range
len = range.last - range.first + 1
self >> range.first & ~(-1 >> len << len)
end
end
19767.bit_range 3..8
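For anyone who would rather not reopen a core class, the shift-and-mask idea can also be written as a plain function with the signature from the question. This is only a sketch, assuming bits are numbered from 0 at the least significant end and that both bounds are inclusive.

```ruby
# Extract bits first_bit..last_bit (inclusive, bit 0 = least significant).
def bit_range(orig_num, first_bit, last_bit)
  width = last_bit - first_bit + 1
  mask = (1 << width) - 1          # e.g. width 6 -> 0b111111
  (orig_num >> first_bit) & mask   # drop the low bits, keep `width` bits
end

puts bit_range(19767, 3, 8)                 # => 38
puts bit_range(19767, 3, 8).to_s(2)         # => 100110
puts 19767.to_s(2)[-9..-4].to_i(2)          # => 38 (string version, for comparison)
```

The explicit parentheses around the shift are not required by Ruby's precedence rules, but they make the intent easier to read.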

Check if a string variable is in a set of strings

Which one is better:
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
x =~ /abc|def|ghi/
?
Which one is better? The question can't be easily answered, because they don't all do the same things.
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
compare x against fixed strings for equality. x has to be one of those values. Between those two I tend to go with the second because it's easier to maintain. Imagine what it would look like if you had to compare against twenty, fifty or one hundred strings.
The third test:
x =~ /abc|def|ghi/
matches substrings:
x = 'xyzghi'
(x =~ /abc|def|ghi/) # => 3
so it isn't the same as the first two.
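A short sketch of the difference, with made-up constant names. It also shows one way to make the regex behave like the equality tests, assuming that is what you want: anchor it with \A and \z. In Ruby, ^ and $ match at line boundaries, not only at the ends of the string.

```ruby
ALLOWED      = %w[abc def ghi].freeze
UNANCHORED   = /abc|def|ghi/
WHOLE_STRING = /\A(?:abc|def|ghi)\z/
LINE_ANCHORS = /^(?:abc|def|ghi)$/

def allowed?(x)
  ALLOWED.include?(x)
end

puts allowed?("ghi")                      # => true
puts allowed?("xyzghi")                   # => false
puts UNANCHORED.match?("xyzghi")          # => true  (substring match)
puts WHOLE_STRING.match?("xyzghi")        # => false
puts WHOLE_STRING.match?("ghi")           # => true
puts LINE_ANCHORS.match?("abc\nxyz")      # => true  (^ and $ are per line)
puts WHOLE_STRING.match?("abc\nxyz")      # => false
```

Hoisting the array into a constant also avoids rebuilding it on every check.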
EDIT: There are some things in the benchmarks done by nash that I'd do differently. Using Ruby 1.9.2-p180 on a MacBook Pro, this tests 1,000,000 loops and compares the results of anchoring the regex, using grouping, along with not splitting the %w() array each time through the loop:
require 'benchmark'
str = "test"
n = 1_000_000
Benchmark.bm do |x|
x.report { n.times { str == 'abc' || str == 'def' || str == 'ghi' } }
x.report { n.times { %w(abc def ghi).include? str } }
x.report { ary = %w(abc def ghi); n.times { ary.include? str } }
x.report { n.times { str =~ /abc|def|ghi/ } }
x.report { n.times { str =~ /^abc|def|ghi$/ } }
x.report { n.times { str =~ /^(abc|def|ghi)$/ } }
x.report { n.times { str =~ /^(?:abc|def|ghi)$/ } }
x.report { n.times { str =~ /\b(?:abc|def|ghi)\b/ } }
end
# >> user system total real
# >> 1.160000 0.000000 1.160000 ( 1.165331)
# >> 1.920000 0.000000 1.920000 ( 1.920120)
# >> 0.990000 0.000000 0.990000 ( 0.983921)
# >> 1.070000 0.000000 1.070000 ( 1.068140)
# >> 1.050000 0.010000 1.060000 ( 1.054852)
# >> 1.060000 0.000000 1.060000 ( 1.063909)
# >> 1.060000 0.000000 1.060000 ( 1.050813)
# >> 1.050000 0.000000 1.050000 ( 1.056147)
The first might be a tad quicker, since there are no method calls and you're doing straight string comparisons, but it's also probably the least readable and least maintainable.
The second is definitely the grooviest, and the Ruby way of going about it. It's the most maintainable, and probably the best to read.
The last way uses old-school Perl regex syntax. Fairly fast, not as annoying as the first to maintain, fairly readable.
I guess it depends what you mean by "better".
Some benchmarks:
require 'benchmark'
str = "test"
Benchmark.bm do |x|
x.report {100000.times {if str == 'abc' || str == 'def' || str == 'ghi'; end}}
x.report {100000.times {if %w(abc def ghi).include? str; end}}
x.report {100000.times {if str =~ /abc|def|ghi/; end}}
end
user system total real
0.250000 0.000000 0.250000 ( 0.251014)
0.374000 0.000000 0.374000 ( 0.402023)
0.265000 0.000000 0.265000 ( 0.259014)
So as you can see, the first way is faster than the others. And the longer str is, the slower the last way gets:
str = "testasdasdasdasdasddkmfskjndfbdkjngdjgndksnfg"
user system total real
0.234000 0.000000 0.234000 ( 0.248014)
0.405000 0.000000 0.405000 ( 0.403023)
1.046000 0.000000 1.046000 ( 1.038059)
