String length modification in Ruby

How can I pad strings to a fixed length of 10? An input string will always have 10 characters or fewer. If it has fewer than 10 characters, I have to add 0's at the beginning of the string.
Example of input:
123456
1234567
Needed output:
0000123456
0001234567
Input is arbitrary.
Thanks in advance!

String#rjust does just that:
'1234567'.rjust(10, '0') # => "0001234567"
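As a minimal sketch of how this applies to the inputs in the question, the call can be wrapped in a small helper. The name zero_pad and its default width are made up for illustration and are not part of Ruby:

```ruby
# Hypothetical helper (name and default width are assumptions for this example).
# Pads str on the left with "0" until it is `width` characters long.
def zero_pad(str, width = 10)
  str.rjust(width, '0')
end

zero_pad('123456')     # => "0000123456"
zero_pad('1234567')    # => "0001234567"
zero_pad('1234567890') # => "1234567890" (already 10 characters, returned as is)
zero_pad('')           # => "0000000000"
```

rjust returns a new string, so the input is left untouched.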

a = "123456"
b = "1234567"
"0"*(10-a.size)+a
=> "0000123456"
"0"*(10-b.size)+b
=> "0001234567"
Interestingly, for speed purposes:
require 'benchmark'
a = "123456"
n = 50000
Benchmark.bm do |x|
x.report{n.times do ; a.rjust(10,'0'); end}
x.report{n.times do ; "0"*(10-a.size)+a; end}
end
user system total real
0.020000 0.000000 0.020000 ( 0.016442)
0.010000 0.000000 0.010000 ( 0.015134)
Or with an even larger sample size:
irb(main):001:0> require 'benchmark'
=> true
irb(main):002:0> a = "123456"
=> "123456"
irb(main):003:0> n = 5_000_000
=> 5000000
irb(main):004:0> Benchmark.bm do |x|
irb(main):005:1* x.report{n.times do ; a.rjust(10,'0'); end}
irb(main):006:1> x.report{n.times do ; "0"*(10-a.size)+a; end}
irb(main):007:1> end
user system total real
1.510000 0.000000 1.510000 ( 1.519720)
1.480000 0.000000 1.480000 ( 1.486935)
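The timings are close, but the two approaches behave differently once a string is longer than 10 characters. The question rules that input out, so the following is only a sketch of the edge case, using a made-up 11-character value:

```ruby
long = '12345678901' # 11 characters, outside the input range stated in the question

# rjust never truncates: a string that is already long enough comes back unchanged.
padded = long.rjust(10, '0') # => "12345678901"

# The multiplication approach ends up calling "0" * -1, which raises ArgumentError.
error = begin
  '0' * (10 - long.size) + long
  nil
rescue ArgumentError => e
  e
end
error.class # => ArgumentError
```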

Related

How do I create the intersection of two hashes?

I have two hashes:
hash1 = {1 => "a" , 2 => "b" , 3 => "c" , 4 => "d"}
hash2 = {3 => "hello", 4 => "world" , 5 => "welcome"}
I need a hash which contains common keys in both hashes:
hash3 = {3 => "hello" , 4 => "world"}
Is it possible to do it without any loop?
hash3 = hash1.keep_if { |k, v| hash2.key? k }
This won't have the same effect as the code in the question; instead it will return:
hash3 #=> { 3 => "c", 4 => "d" }
The order of the hashes is important here. The values will always be taken from the hash that #keep_if is sent to.
hash3 = hash2.keep_if { |k, v| hash1.key? k }
#=> {3 => "hello", 4 => "world"}
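One thing worth keeping in mind with #keep_if is that it modifies the receiver in place and returns that same object, whereas #select builds a new hash. A short sketch with the hashes from the question (the names size_after_select and hash4 are only for illustration):

```ruby
hash1 = {1 => "a", 2 => "b", 3 => "c", 4 => "d"}
hash2 = {3 => "hello", 4 => "world", 5 => "welcome"}

# select returns a new hash and leaves hash2 alone.
hash3 = hash2.select { |k, _v| hash1.key?(k) }
size_after_select = hash2.size # => 3

# keep_if deletes the non-matching pairs from hash2 itself.
hash4 = hash2.keep_if { |k, _v| hash1.key?(k) }
hash4.equal?(hash2) # => true, same object
hash2.size          # => 2
```

If the original hashes are needed afterwards, select (or calling keep_if on a dup) is the safer choice.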
I'd go with this:
hash1 = {1 => "a" , 2 => "b" , 3 => "c" , 4 => "d"}
hash2 = {3 => "hello", 4 => "world" , 5 => "welcome"}
Hash[(hash1.keys & hash2.keys).zip(hash2.values_at(*(hash1.keys & hash2.keys)))]
=> {3=>"hello", 4=>"world"}
Which can be reduced a bit to:
keys = (hash1.keys & hash2.keys)
Hash[keys.zip(hash2.values_at(*keys))]
The trick is in Array's & method. The documentation says:
Set Intersection — Returns a new array containing elements common to the two arrays, excluding any duplicates. The order is preserved from the original array.
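To make the documented behaviour concrete, here is a small sketch of Array#& on its own and then applied to the keys of the two hashes from the question (the variable names common and result are illustrative):

```ruby
hash1 = {1 => "a", 2 => "b", 3 => "c", 4 => "d"}
hash2 = {3 => "hello", 4 => "world", 5 => "welcome"}

# Order comes from the receiver, and duplicates are dropped.
ordered = [4, 3, 3, 1] & [3, 4, 4] # => [4, 3]

# Keys present in both hashes.
common = hash1.keys & hash2.keys   # => [3, 4]

# Pair each common key with its value from hash2.
result = Hash[common.zip(hash2.values_at(*common))]
# result is {3 => "hello", 4 => "world"}
```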
Here are some benchmarks to show what is the most efficient way to do this:
require 'benchmark'
HASH1 = {1 => "a" , 2 => "b" , 3 => "c" , 4 => "d"}
HASH2 = {3 => "hello", 4 => "world" , 5 => "welcome"}
def tinman
keys = (HASH1.keys & HASH2.keys)
Hash[keys.zip(HASH2.values_at(*keys))]
end
def santhosh
HASH2.select {|key, value| HASH1.has_key? key }
end
def santhosh_2
HASH2.select {|key, value| HASH1[key] }
end
def priti
HASH2.select{|k,v| HASH1.assoc(k) }
end
def koraktor
HASH1.keep_if { |k, v| HASH2.key? k }
end
def koraktor2
HASH2.keep_if { |k, v| HASH1.key? k }
end
N = 1_000_000
puts RUBY_VERSION
puts "N= #{N}"
puts [:tinman, :santhosh, :santhosh_2, :priti, :koraktor, :koraktor2].map{ |s| "#{s.to_s} = #{send(s)}" }
Benchmark.bm(11) do |x|
x.report('tinman') { N.times { tinman() }}
x.report('santhosh_2') { N.times { santhosh_2() }}
x.report('santhosh') { N.times { santhosh() }}
x.report('priti') { N.times { priti() }}
x.report('koraktor') { N.times { koraktor() }}
x.report('koraktor2') { N.times { koraktor2() }}
end
Ruby 1.9.3-p448:
1.9.3
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
user system total real
tinman 2.430000 0.000000 2.430000 ( 2.430030)
santhosh_2 1.000000 0.020000 1.020000 ( 1.003635)
santhosh 1.090000 0.010000 1.100000 ( 1.104067)
priti 1.350000 0.000000 1.350000 ( 1.352476)
koraktor 0.490000 0.000000 0.490000 ( 0.484686)
koraktor2 0.480000 0.000000 0.480000 ( 0.483327)
Running under Ruby 2.0.0-p247:
2.0.0
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
user system total real
tinman 1.890000 0.000000 1.890000 ( 1.882352)
santhosh_2 0.710000 0.010000 0.720000 ( 0.735830)
santhosh 0.790000 0.020000 0.810000 ( 0.807413)
priti 1.030000 0.010000 1.040000 ( 1.030018)
koraktor 0.390000 0.000000 0.390000 ( 0.389431)
koraktor2 0.390000 0.000000 0.390000 ( 0.389072)
Koraktor's original code doesn't work, but he turned it around nicely with his second code pass, and walks away with the best speed. I added the santhosh_2 method to see what effect removing key? would have. It sped the routine up a little, but not enough to catch up to Koraktor's.
Just for documentation purposes, I tweaked Koraktor's second code to remove the key? method also, and shaved more time from it. Here's the added method and the new output:
def koraktor3
HASH2.keep_if { |k, v| HASH1[k] }
end
1.9.3
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
koraktor3 = {3=>"hello", 4=>"world"}
user system total real
tinman 2.380000 0.000000 2.380000 ( 2.382392)
santhosh_2 0.970000 0.020000 0.990000 ( 0.976672)
santhosh 1.070000 0.010000 1.080000 ( 1.078397)
priti 1.320000 0.000000 1.320000 ( 1.318652)
koraktor 0.480000 0.000000 0.480000 ( 0.488613)
koraktor2 0.490000 0.000000 0.490000 ( 0.490099)
koraktor3 0.390000 0.000000 0.390000 ( 0.389386)
2.0.0
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
koraktor3 = {3=>"hello", 4=>"world"}
user system total real
tinman 1.840000 0.000000 1.840000 ( 1.832491)
santhosh_2 0.720000 0.010000 0.730000 ( 0.737737)
santhosh 0.780000 0.020000 0.800000 ( 0.801619)
priti 1.040000 0.010000 1.050000 ( 1.044588)
koraktor 0.390000 0.000000 0.390000 ( 0.387265)
koraktor2 0.390000 0.000000 0.390000 ( 0.388648)
koraktor3 0.320000 0.000000 0.320000 ( 0.327859)
hash2.select {|key, value| hash1.has_key? key }
# => {3=>"hello", 4=>"world"}
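Ruby 2.5 added Hash#slice, which allows compact code like:
hash3 = hash2.slice(*hash1.keys)
# => {3=>"hello", 4=>"world"}
Note that the values come from the receiver, so hash1.slice(*hash2.keys) would return {3=>"c", 4=>"d"} instead.
In older Rubies this was possible in Rails or in projects using Active Support's hash extensions.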
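A quick sketch of Hash#slice with the hashes from the question, assuming Ruby 2.5 or newer. Which hash receives the call decides whose values end up in the result, and keys that the receiver does not have are simply ignored:

```ruby
hash1 = {1 => "a", 2 => "b", 3 => "c", 4 => "d"}
hash2 = {3 => "hello", 4 => "world", 5 => "welcome"}

# Values are taken from the receiver of slice.
from_hash2 = hash2.slice(*hash1.keys) # {3 => "hello", 4 => "world"}
from_hash1 = hash1.slice(*hash2.keys) # {3 => "c", 4 => "d"}

# slice returns a new hash; neither input is modified.
hash1.size # => 4
hash2.size # => 3
```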

What is the complexity of Ruby's Array#insert?

What is the complexity of Ruby's Array#insert?
Is it O(1) or O(n) (memory is copied)?
A simple benchmark shows that insert is O(n):
require 'benchmark'
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..100000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
end
user system total real
0.078000 0.000000 0.078000 ( 0.077023)
0.500000 0.000000 0.500000 ( 0.522345)
5.953000 0.000000 5.953000 ( 5.967949)
The exception is pushing onto the end of the array, which is O(1) (amortized):
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..100000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..10000000).to_a
x.report { 10000.times {arr.push 1} }
end
user system total real
0.000000 0.000000 0.000000 ( 0.001002)
0.000000 0.000000 0.000000 ( 0.001000)
0.000000 0.000000 0.000000 ( 0.001001)
0.000000 0.000000 0.000000 ( 0.002001)
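The numbers above are what one would expect if inserting near the front has to move every element after the insertion point, while pushing moves none. The following is only an illustration of that idea in plain Ruby; naive_insert is a made-up name, and the assumption that the interpreter performs an equivalent shift internally (as a bulk memory move in C) is mine, not something measured here:

```ruby
# Illustrative only: a hand-written insert that shifts elements one by one.
# The work is proportional to the number of elements after `index`.
def naive_insert(arr, index, value)
  arr << nil                              # grow the array by one slot
  (arr.size - 1).downto(index + 1) do |i| # shift the tail one place to the right
    arr[i] = arr[i - 1]
  end
  arr[index] = value
  arr
end

naive_insert([1, 2, 3], 1, 9) # => [1, 9, 2, 3]
naive_insert([1, 2, 3], 3, 9) # => [1, 2, 3, 9] (nothing to shift, same as push)
```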

What's the fastest way to build a string in Ruby?

In Ternary operator, a person wanting to join ["foo", "bar", "baz"] with commas and an "and" cited The Ruby Cookbook as saying
If efficiency is important to you,
don't build a new string when you can
append items onto an existing string.
[And so on]... Use str << var1 << ' '
<< var2 instead.
But the book was written in 2006.
Is appending (i.e. <<) still the fastest way to build a large string given an array of smaller strings, in all major implementations of Ruby?
Use Array#join when you can, and String#<< when you can't.
The problem with using String#+ is that it must create an intermediary (unwanted) string object, while String#<< mutates the original string. Here are the time results (in seconds) of joining 1,000 strings with ", " 1,000 times, via Array#join, String#+, and String#<<:
Ruby 1.9.2p180 user system total real
Array#join 0.320000 0.000000 0.320000 ( 0.330224)
String#+ 1 7.730000 0.200000 7.930000 ( 8.373900)
String#+ 2 4.670000 0.600000 5.270000 ( 5.546633)
String#<< 1 1.260000 0.010000 1.270000 ( 1.315991)
String#<< 2 1.600000 0.020000 1.620000 ( 1.793415)
JRuby 1.6.1 user system total real
Array#join 0.185000 0.000000 0.185000 ( 0.185000)
String#+ 1 9.118000 0.000000 9.118000 ( 9.118000)
String#+ 2 4.544000 0.000000 4.544000 ( 4.544000)
String#<< 1 0.865000 0.000000 0.865000 ( 0.866000)
String#<< 2 0.852000 0.000000 0.852000 ( 0.852000)
Ruby 1.8.7p334 user system total real
Array#join 0.290000 0.010000 0.300000 ( 0.305367)
String#+ 1 7.620000 0.060000 7.680000 ( 7.682265)
String#+ 2 4.820000 0.130000 4.950000 ( 4.957258)
String#<< 1 1.290000 0.010000 1.300000 ( 1.304764)
String#<< 2 1.350000 0.010000 1.360000 ( 1.347226)
Rubinius (head) user system total real
Array#join 0.864054 0.008001 0.872055 ( 0.870757)
String#+ 1 9.636602 0.076005 9.712607 ( 9.714820)
String#+ 2 6.456403 0.064004 6.520407 ( 6.521633)
String#<< 1 2.196138 0.016001 2.212139 ( 2.212564)
String#<< 2 2.176136 0.012001 2.188137 ( 2.186298)
Here's the benchmarking code:
WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000
require 'benchmark'
Benchmark.bmbm do |x|
x.report("Array#join"){
N.times{ s = WORDS.join(', ') }
}
x.report("String#+ 1"){
N.times{
s = WORDS.first
WORDS[1..-1].each{ |w| s += ", "; s += w }
}
}
x.report("String#+ 2"){
N.times{
s = WORDS.first
WORDS[1..-1].each{ |w| s += ", " + w }
}
}
x.report("String#<< 1"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", "; s << w }
}
}
x.report("String#<< 2"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", " << w }
}
}
end
Results obtained on Ubuntu under RVM. Results from Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 shown above.
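The difference between String#+ and String#<< described above can be checked by object identity. This is a small sketch with the three words from the question; the variable names are illustrative:

```ruby
words = %w[foo bar baz]

# String#+ (via +=) builds a new String and rebinds the variable to it.
s = words.first.dup
before = s
s += ', ' + words[1]
plus_same_object = s.equal?(before)   # => false

# String#<< appends to the existing String object.
t = words.first.dup
before = t
t << ', ' << words[1]
shovel_same_object = t.equal?(before) # => true

# Array#join builds the final string in a single call.
joined = words.join(', ')             # => "foo, bar, baz"
```

The dup calls matter here for the same reason as in the benchmark: without them, << would modify the first element of the array itself.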
What if your source of string bits is not an array?
TL;DR: even when your source of string bits is not a giant array, you are still much better off constructing an array first and using join. + is not as bad in 2.1.1 as in 1.9.3, but it's still bad (for this use case). 1.9.3 is actually slightly faster at both Array#join and <<.
Old hands at benchmarking may have looked at @Phrogz's answer and thought "but but but..." because the join benchmark doesn't have the array enumeration overhead that the others do. I was curious to see how much difference it made, so...
WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000
require 'benchmark'
Benchmark.bmbm do |x|
x.report("Array#join"){
N.times{ s = WORDS.join(', ') }
}
x.report("Array#join 2"){
N.times{
arr = Array.new(WORDS.length)
arr[0] = WORDS.first
WORDS[1..-1].each{ |w| arr << w; }
s = WORDS.join(', ')
}
}
x.report("String#+ 1"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first
WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
}
}
x.report("String#+ 2"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first
WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
}
}
x.report("String#<< 1"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first.dup
WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
}
}
x.report("String#<< 2"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first.dup
WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
}
}
x.report("String#<< 2 A"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", " << w }
}
}
end
small words, ruby 2.1.1
user system total real
Array#join 0.130000 0.000000 0.130000 ( 0.128281)
Array#join 2 0.220000 0.000000 0.220000 ( 0.219588)
String#+ 1 1.720000 0.770000 2.490000 ( 2.478555)
String#+ 2 1.040000 0.370000 1.410000 ( 1.407190)
String#<< 1 0.370000 0.000000 0.370000 ( 0.371125)
String#<< 2 0.360000 0.000000 0.360000 ( 0.360161)
String#<< 2 A 0.310000 0.000000 0.310000 ( 0.318130)
small words, ruby 1.9.3
user system total real
Array#join 0.090000 0.000000 0.090000 ( 0.092072)
Array#join 2 0.180000 0.000000 0.180000 ( 0.180423)
String#+ 1 3.400000 0.750000 4.150000 ( 4.149934)
String#+ 2 1.740000 0.370000 2.110000 ( 2.122511)
String#<< 1 0.360000 0.000000 0.360000 ( 0.359707)
String#<< 2 0.340000 0.000000 0.340000 ( 0.343233)
String#<< 2 A 0.300000 0.000000 0.300000 ( 0.297420)
I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters so I reran with:
WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }
As I expected, the impact on + was quite significant, but I was pleasantly surprised that it had very little impact on join or <<:
words often longer than 23 chars, ruby 2.1.1
user system total real
Array#join 0.150000 0.000000 0.150000 ( 0.152846)
Array#join 2 0.230000 0.010000 0.240000 ( 0.231272)
String#+ 1 7.450000 5.490000 12.940000 ( 12.936776)
String#+ 2 4.200000 2.590000 6.790000 ( 6.791125)
String#<< 1 0.400000 0.000000 0.400000 ( 0.399452)
String#<< 2 0.380000 0.010000 0.390000 ( 0.389791)
String#<< 2 A 0.340000 0.000000 0.340000 ( 0.341099)
words often longer than 23 chars, ruby 1.9.3
user system total real
Array#join 0.130000 0.010000 0.140000 ( 0.132957)
Array#join 2 0.220000 0.000000 0.220000 ( 0.220181)
String#+ 1 20.060000 5.230000 25.290000 ( 25.293366)
String#+ 2 9.750000 2.670000 12.420000 ( 12.425229)
String#<< 1 0.390000 0.000000 0.390000 ( 0.397733)
String#<< 2 0.390000 0.000000 0.390000 ( 0.390540)
String#<< 2 A 0.330000 0.000000 0.330000 ( 0.333791)
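As a sketch of the "collect into an array, then join" pattern when the pieces do not start out in an array, here is a version that reads lines from an IO-like source. The StringIO and its contents are stand-ins made up for the example:

```ruby
require 'stringio'

# Stand-in for any source that yields pieces one at a time (file, socket, ...).
source = StringIO.new("alpha\nbeta\ngamma\n")

# Collect the pieces first...
parts = []
source.each_line { |line| parts << line.chomp }

# ...then build the final string in one step.
result = parts.join(', ') # => "alpha, beta, gamma"
```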

Check if a string variable is in a set of strings

Which one is better:
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
x =~ /abc|def|ghi/
?
Which one is better? The question can't be easily answered, because they don't all do the same things.
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
compare x against fixed strings for equality. x has to be one of those values. Between those two I tend to go with the second because it's easier to maintain. Imagine what it would look like if you had to compare against twenty, fifty or one hundred strings.
The third test:
x =~ /abc|def|ghi/
matches substrings:
x = 'xyzghi'
(x =~ /abc|def|ghi/) # => 3
so it isn't the same as the first two.
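If a regex is wanted anyway, it has to be anchored to behave like the equality tests. A small sketch follows; note that in Ruby ^ and $ match at line boundaries, so \A and \z are the anchors for the whole string:

```ruby
candidates = %w[abc def ghi]
x = 'xyzghi'

candidates.include?(x)               # => false
(x =~ /abc|def|ghi/)                 # => 3   (substring match)
(x =~ /\A(?:abc|def|ghi)\z/)         # => nil (whole string must match)
('def' =~ /\A(?:abc|def|ghi)\z/)     # => 0

# ^ and $ anchor to lines, not to the whole string:
multi = "abc\nxyz"
(multi =~ /^(?:abc|def|ghi)$/)       # => 0
(multi =~ /\A(?:abc|def|ghi)\z/)     # => nil
```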
EDIT: There are some things in the benchmarks done by nash that I'd do differently. Using Ruby 1.9.2-p180 on a MacBook Pro, this tests 1,000,000 loops and compares the results of anchoring the regex, using grouping, along with not splitting the %w() array each time through the loop:
require 'benchmark'
str = "test"
n = 1_000_000
Benchmark.bm do |x|
x.report { n.times { str == 'abc' || str == 'def' || str == 'ghi' } }
x.report { n.times { %w(abc def ghi).include? str } }
x.report { ary = %w(abc def ghi); n.times { ary.include? str } }
x.report { n.times { str =~ /abc|def|ghi/ } }
x.report { n.times { str =~ /^abc|def|ghi$/ } }
x.report { n.times { str =~ /^(abc|def|ghi)$/ } }
x.report { n.times { str =~ /^(?:abc|def|ghi)$/ } }
x.report { n.times { str =~ /\b(?:abc|def|ghi)\b/ } }
end
# >> user system total real
# >> 1.160000 0.000000 1.160000 ( 1.165331)
# >> 1.920000 0.000000 1.920000 ( 1.920120)
# >> 0.990000 0.000000 0.990000 ( 0.983921)
# >> 1.070000 0.000000 1.070000 ( 1.068140)
# >> 1.050000 0.010000 1.060000 ( 1.054852)
# >> 1.060000 0.000000 1.060000 ( 1.063909)
# >> 1.060000 0.000000 1.060000 ( 1.050813)
# >> 1.050000 0.000000 1.050000 ( 1.056147)
The first might be a tad quicker, since there are no method calls and you're doing straight string comparisons, but it's also probably the least readable and least maintainable.
The second is definitely the grooviest, and the Ruby way of going about it. It's the most maintainable, and probably the best to read.
The last way uses old-school Perl regex syntax. Fairly fast, not as annoying as the first to maintain, fairly readable.
I guess it depends what you mean by "better".
Some benchmarks:
require 'benchmark'
str = "test"
Benchmark.bm do |x|
x.report {100000.times {if str == 'abc' || str == 'def' || str == 'ghi'; end}}
x.report {100000.times {if %w(abc def ghi).include? str; end}}
x.report {100000.times {if str =~ /abc|def|ghi/; end}}
end
user system total real
0.250000 0.000000 0.250000 ( 0.251014)
0.374000 0.000000 0.374000 ( 0.402023)
0.265000 0.000000 0.265000 ( 0.259014)
So as you can see, the first way is faster than the others. And the longer str is, the slower the last way gets:
str = "testasdasdasdasdasddkmfskjndfbdkjngdjgndksnfg"
user system total real
0.234000 0.000000 0.234000 ( 0.248014)
0.405000 0.000000 0.405000 ( 0.403023)
1.046000 0.000000 1.046000 ( 1.038059)
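None of the answers above mention it, but for a long list of candidates the standard library's Set is another option to consider, since its lookup is hash-based rather than a linear scan. This is a sketch of an alternative, not something that was benchmarked in this thread:

```ruby
require 'set'

# Build the set once, outside any hot loop.
allowed = Set.new(%w[abc def ghi])

allowed.include?('def')    # => true
allowed.include?('xyzghi') # => false (exact match only, like Array#include?)
```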

In Ruby how do I generate a long string of repeated text?

What is the best way to generate a long string quickly in Ruby? This works, but is very slow:
str = ""
length = 100000
(1..length).each {|i| str += "0"}
I've also noticed that creating a string of a decent length and then appending that to an existing string up to the desired length works much faster:
str = ""
incrementor = ""
length = 100000
(1..1000).each {|i| incrementor += "0"}
(1..100).each {|i| str += incrementor}
Any other suggestions?
str = "0" * 999999
Another relatively quick option is
str = '%0999999d' % 0
Though benchmarking
require 'benchmark'
Benchmark.bm(9) do |x|
x.report('format :') { '%099999999d' % 0 }
x.report('multiply:') { '0' * 99999999 }
end
shows that multiplication is still faster:
user system total real
format : 0.300000 0.080000 0.380000 ( 0.405345)
multiply: 0.080000 0.080000 0.160000 ( 0.172504)
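For the length used in the question, both approaches produce the same string, which is easy to confirm. The sketch below also shows one practical difference: String#* repeats any text, while the format trick only pads with zeros. The variable names are illustrative:

```ruby
length = 100_000

by_multiply = '0' * length
by_format   = format("%0#{length}d", 0)

by_multiply == by_format # => true
by_multiply.size         # => 100000

# String#* works for arbitrary repeated text, not just "0".
repeated = 'ab' * 3      # => "ababab"
```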

Resources