How can I fix the length of a string at 10 characters? An input string will always have 10 or fewer characters; if it has fewer than 10, I have to add 0s at the beginning of the string.
Example of input:
123456
1234567
Needed output:
0000123456
0001234567
Input is arbitrary.
Thanks in advance!
String#rjust does just that:
'1234567'.rjust(10, '0') # => "0001234567"
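A quick check of that behaviour, as a sketch with made-up inputs: rjust pads only up to the requested width and never truncates, so a string that already has 10 or more characters comes back unchanged.

```ruby
# String#rjust(width, padstr) left-pads with padstr until the string is
# `width` characters long; longer strings are returned unchanged.
inputs = ["123456", "1234567", "1234567890", ""]
padded = inputs.map { |s| s.rjust(10, "0") }
p padded
# => ["0000123456", "0001234567", "1234567890", "0000000000"]
```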
a = "123456"
b = "1234567"
"0"*(10-a.size)+a
=> "0000123456"
"0"*(10-b.size)+b
=> "0001234567"
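The hand-rolled version relies on the question's guarantee that no input exceeds 10 characters: String#* raises ArgumentError for a negative count, so an 11-character string would fail. A minimal sketch of a clamped variant (the helper name zero_pad is hypothetical):

```ruby
# Hypothetical helper: left-pads with "0" up to `width` characters.
# The count is clamped at 0 because "0" * -1 raises ArgumentError.
def zero_pad(s, width = 10)
  "0" * [width - s.size, 0].max + s
end

p zero_pad("123456")       # => "0000123456"
p zero_pad("12345678901")  # => "12345678901" (no padding, no error)

begin
  "0" * (10 - "12345678901".size)
rescue ArgumentError => e
  puts "unclamped version fails: #{e.message}"
end
```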
Interestingly, in terms of speed:
require 'benchmark'
a = "123456"
n = 50000
Benchmark.bm do |x|
x.report{n.times do ; a.rjust(10,'0'); end}
x.report{n.times do ; "0"*(10-a.size)+a; end}
end
user system total real
0.020000 0.000000 0.020000 ( 0.016442)
0.010000 0.000000 0.010000 ( 0.015134)
Or with an even larger sample size:
irb(main):001:0> require 'benchmark'
=> true
irb(main):002:0> a = "123456"
=> "123456"
irb(main):003:0> n = 5_000_000
=> 5000000
irb(main):004:0> Benchmark.bm do |x|
irb(main):005:1* x.report{n.times do ; a.rjust(10,'0'); end}
irb(main):006:1> x.report{n.times do ; "0"*(10-a.size)+a; end}
irb(main):007:1> end
user system total real
1.510000 0.000000 1.510000 ( 1.519720)
1.480000 0.000000 1.480000 ( 1.486935)
I want to sort an array like the following:
["10a","10b","9a","9b","8a","8b"]
When I call,
a = a.sort {|a,b| a <=> b}
it will sort like the following:
["10a","10b","8a","8b","9a","9b"]
The 10 is a string and is not handled as a number. When I first sort by integer and then by string, it will just do the same. Does anybody know how I can handle the 10 as a 10 without making it into an integer? That would mess up the letters a, b etc.
When I first sort by integer and then by string, it will just do the same.
That would have been my first instinct, and it seems to work perfectly:
%w[10a 10b 9a 9b 8a 8b].sort_by {|el| [el.to_i, el] }
# => ['8a', '8b', '9a', '9b', '10a', '10b']
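This works because sort_by compares the keys with Array#<=>, which compares element by element: the integer decides first, and the original string only breaks ties. A small sketch of the comparisons involved:

```ruby
# Array#<=> compares arrays element-wise, left to right.
p([8, "8a"] <=> [10, "10a"])    # => -1 (8 < 10, the strings are never consulted)
p([10, "10a"] <=> [10, "10b"])  # => -1 (tie on 10, so "10a" <=> "10b" decides)

# Plain string comparison goes character by character, which is why "10a" < "8a".
p("10a" <=> "8a")               # => -1

sorted = %w[10a 10b 9a 9b 8a 8b].sort_by { |el| [el.to_i, el] }
p sorted # => ["8a", "8b", "9a", "9b", "10a", "10b"]
```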
I'd do something like this:
ary = ["10a","10b","9a","9b","8a","8b"]
sorted_ary = ary.sort_by{ |e|
/(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]
}
ary # => ["10a", "10b", "9a", "9b", "8a", "8b"]
sorted_ary # => ["8a", "8b", "9a", "9b", "10a", "10b"]
sort_by is going to be faster than sort for this kind of problem. Because the value being sorted isn't compared directly and we need to dig into each element to get the values used for collation, a plain sort would have to recompute them every time an element takes part in a comparison. sort_by instead computes the value once per element, caches it, and then sorts based on it.
/(?<digit>\d+)(?<alpha>\D+)/ =~ e isn't what you'll normally see for a regular expression. When a regex literal containing named captures is on the left-hand side of =~, the names ?<digit> and ?<alpha> become local variables that can be accessed immediately.
[digit.to_i, alpha] returns an array consisting of the leading digits converted to an integer, followed by the trailing letters. sort_by then uses that array for comparison.
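The named-capture behaviour can be checked in isolation. This sketch assumes the regex literal is on the left-hand side of =~; in that form Ruby assigns each named group to a local variable of the same name, and assigns nil when the match fails. Putting the string on the left (e =~ /.../) does not create the variables; match with [:name] is the explicit alternative.

```ruby
# Regex literal on the LEFT of =~: named captures become local variables.
/(?<digit>\d+)(?<alpha>\D+)/ =~ "10a"
p digit # => "10"
p alpha # => "a"

# On a failed match the same locals are set to nil.
/(?<digit>\d+)(?<alpha>\D+)/ =~ "abc"
p digit # => nil

# Explicit alternative via MatchData, independent of operand order.
m = "9b".match(/(?<digit>\d+)(?<alpha>\D+)/)
key = [m[:digit].to_i, m[:alpha]]
p key # => [9, "b"]
```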
Benchmarking sort vs. sort_by using Fruity: I added some length to the array being sorted to push the routines a bit harder for better time resolution.
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 1000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test 2 times. Test will take about 1 second.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 19.999999999999996% ± 1.0%
# >> Running each test once. Test will take about 1 second.
# >> jorge_sort_by is faster than jorg_sort by 10.000000000000009% ± 1.0%
Ruby's sort_by uses a Schwartzian Transform, which can make a major difference in sort speed when dealing with objects where we have to compute the value to be sorted.
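One way to see the caching that makes sort_by attractive is to count how often the key is computed. In this sketch (the counter variables and the 6,000-element array are illustrative), sort_by computes the key exactly once per element, while a sort block rebuilds both keys on every comparison.

```ruby
# Build a shuffled test array; the fixed seed keeps the run reproducible.
ary = (%w[10a 10b 9a 9b 8a 8b] * 1000).shuffle(random: Random.new(42))

sort_by_calls = 0
by_key = ary.sort_by do |s|
  sort_by_calls += 1 # one key computation per element
  [s.to_i, s]
end

sort_calls = 0
by_block = ary.sort do |x, y|
  sort_calls += 2 # both keys are rebuilt for every comparison
  [x.to_i, x] <=> [y.to_i, y]
end

puts "sort_by computed #{sort_by_calls} keys"
puts "sort computed #{sort_calls} keys"
```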
Could you run your benchmark for 100_000 instead of 1_000 in the definition of ARY?
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 100_000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test once. Test will take about 10 seconds.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 2x ± 1.0
# >> Running each test once. Test will take about 26 seconds.
# >> jorg_sort is similar to jorge_sort_by
The Wikipedia article on the Schwartzian transform has a good efficiency analysis and an example that explains why sort_by is preferred for costly comparisons.
Ruby's sort_by documentation also covers this well.
I don't think the size of the array will make much difference. If anything, as the array grows, sort_by should stay faster when the intermediate value is costly to compute, because of its caching. Remember, sort_by is implemented in compiled code, whereas a transform written in Ruby runs slower: the array is transformed, handed off to sort, and then the original objects are plucked back out of the sub-arrays. A larger array just means that has to be done more times.
a = ["10a","10b","9a","9b","8a","8b"]
a.sort { |x, y| x.to_i == y.to_i ? x <=> y : x.to_i <=> y.to_i }
#=> ["8a", "8b", "9a", "9b", "10a", "10b"]
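The same two-level comparison can be written without repeating to_i in a ternary by chaining with Integer#nonzero?, which returns the receiver unless it is zero, in which case it returns nil. A sketch, using x and y as block parameters so they don't shadow the outer a:

```ruby
a = ["10a", "10b", "9a", "9b", "8a", "8b"]

# Compare by the integer prefix first; only when that comparison is 0
# (nonzero? returns nil) fall back to comparing the whole strings.
sorted = a.sort { |x, y| (x.to_i <=> y.to_i).nonzero? || x <=> y }
p sorted # => ["8a", "8b", "9a", "9b", "10a", "10b"]
```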
Hope it helps.
Two ways that don't pull out the number with a base-10 String#to_i (both rely on the assumption that each string consists of one or more digits followed by a single lowercase letter):
ary = ["10a","10b","9a","9b","8a","8b","100z", "96b"]
#1
mx = ary.map(&:size).max
ary.sort_by { |s| s.rjust(mx) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
#2
ary.sort_by { |s| s.to_i(36) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
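Approach #2 works because String#to_i(36) reads the digits 0-9 and the letters a-z as base-36 digits, so a longer digit prefix always produces a larger number. A sketch that also shows where the stated assumption (exactly one trailing letter) matters:

```ruby
# Base 36: "0".."9" are 0..9, "a".."z" are 10..35.
p "9b".to_i(36)   # => 335  (9 * 36 + 11)
p "10a".to_i(36)  # => 1306 (1 * 36**2 + 0 * 36 + 10)

# With two trailing letters the digit prefix no longer dominates:
# "9aa" becomes 9 * 36**2 + 10 * 36 + 10 = 12034, which sorts after "10a".
p %w[10a 9aa].sort_by { |s| s.to_i(36) } # => ["10a", "9aa"]
```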
Hmmm, I wonder if:
ary.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
or
ary.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
would be faster.
["10a","10b","9a","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9a", "9b", "10a", "10b"]
["10a3","10a4", "9", "9aa","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9", "9aa", "9b", "10a3", "10a4"]
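The key step is that String#split keeps the delimiters when the pattern contains a capture group, so digit runs land at even indices and letter runs at odd indices. A sketch of the intermediate keys (the lambda name key is only for illustration):

```ruby
# A capture group in the split pattern keeps the matched separators.
p "10a3".split(/(\D+)/) # => ["10", "a", "3"]
p "9".split(/(\D+)/)    # => ["9"]

# Even indices hold digit runs (converted to Integer), odd indices hold letters.
key = lambda do |s|
  s.split(/(\D+)/).map.with_index { |part, i| i.odd? ? part : part.to_i }
end
p key.call("10a3") # => [10, "a", 3]
p key.call("9aa")  # => [9, "aa"]
```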
Gentlemen, start your engines!
I decided to benchmark the various solutions that have been offered. One of the things I was curious about was the effect of converting sort_by solutions to sort solutions. For example, I compared my method
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
to
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
This always involves mapping the original array to the transformed values within the sort_by block, sorting that array, then mapping the results back to the elements in the original array (when that can be done).
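When the transformed value can't be mapped back to the original element, a common workaround is to carry a reference to the element through the sort. A minimal sketch, assuming a hypothetical helper sort_via_keys that pairs each key with the element's index, so ties are broken by original position and the elements themselves never need to be compared:

```ruby
# Hypothetical helper: decorate each element with [key, index], sort the
# pairs, then undecorate by looking the elements up by index.
def sort_via_keys(a)
  a.each_with_index
   .map { |e, i| [yield(e), i] }
   .sort
   .map { |_key, i| a[i] }
end

ary = ["10a", "10b", "9a", "9b", "8a", "8b"]
p sort_via_keys(ary) { |s| s.to_i } # => ["8a", "8b", "9a", "9b", "10a", "10b"]
```

Here the key (to_i alone) discards the letters, yet the result is still correct because the index preserves the original a-before-b order on ties.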
I tried this sort_by-to-sort conversion with some of the methods that use sort_by. Not surprisingly, the conversion to sort was generally faster, though the amount of improvement varied quite a bit.
Methods compared
module Methods
def mudasobwa(a)
a.sort { |a,b| a.to_i == b.to_i ? a <=> b : a.to_i <=> b.to_i }
end
def jorg(a)
a.sort_by {|el| [el.to_i, el] }
end
def jorg_sort(a)
a.map {|el| [el.to_i, el] }.sort.map(&:last)
end
def the(a)
a.sort_by {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha] }
end
def the_sort(a)
a.map {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]}.sort.map {|d,a| d.to_s+a }
end
def engineer(a) a.sort_by { |s|
s.scan(/(\d+)(\D+)/).flatten.tap{ |a| a[0] = a[0].to_i } }
end
def sawa(a) a.sort_by { |s|
s.split(/(\D+)/).map.with_index { |s, i| i.odd? ? s : s.to_i } }
end
def cary_rjust(a)
mx = a.map(&:size).max
a.sort_by {|s| s.rjust(mx)}
end
def cary_rjust_sort(a)
mx = a.map(&:size).max
a.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
end
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
end
include Methods
methods = Methods.instance_methods(false)
#=> [:mudasobwa, :jorg, :jorg_sort, :the, :the_sort, :engineer, :sawa,
# :cary_rjust, :cary_rjust_sort, :cary_to_i, :cary_to_i_sort]
Test data and helper
def test_data(n)
a = 10_000.times.to_a.map(&:to_s)
b = [*'a'..'z']
n.times.map { a.sample + b.sample }
end
def compute(m,a)
send(m,a)
end
Confirm methods return the same values
a = test_data(1000)
puts "All methods correct: #{methods.map { |m| compute(m,a) }.uniq.size == 1}"
Benchmark code
require 'benchmark'
indent = methods.map { |m| m.to_s.size }.max
n = 500_000
a = test_data(n)
puts "\nSort random array of size #{n}"
Benchmark.bm(indent) do |bm|
methods.each do |m|
bm.report m.to_s do
compute(m,a)
end
end
end
Results
Sort random array of size 500000
user system total real
mudasobwa 4.760000 0.000000 4.760000 ( 4.765170)
jorg 2.870000 0.020000 2.890000 ( 2.892359)
jorg_sort 2.980000 0.020000 3.000000 ( 3.010344)
the 9.040000 0.100000 9.140000 ( 9.160944)
the_sort 4.570000 0.090000 4.660000 ( 4.668146)
engineer 10.110000 0.070000 10.180000 ( 10.198117)
sawa 27.310000 0.160000 27.470000 ( 27.504958)
cary_rjust 1.080000 0.010000 1.090000 ( 1.087788)
cary_rjust_sort 0.740000 0.000000 0.740000 ( 0.746132)
cary_to_i 0.570000 0.000000 0.570000 ( 0.576570)
cary_to_i_sort 0.460000 0.020000 0.480000 ( 0.477372)
Addendum
@theTinMan demonstrated that the comparisons between the sort_by and sort methods are sensitive to the choice of test data. Using the data he used:
def test_data(n)
(%w[10a 10b 9a 9b 8a 8b] * (n/6)).shuffle
end
I got these results:
Sort random array of size 500000
user system total real
mudasobwa 0.620000 0.000000 0.620000 ( 0.622566)
jorg 0.620000 0.010000 0.630000 ( 0.636018)
jorg_sort 0.640000 0.010000 0.650000 ( 0.638493)
the 8.790000 0.090000 8.880000 ( 8.886725)
the_sort 2.670000 0.070000 2.740000 ( 2.743085)
engineer 3.150000 0.040000 3.190000 ( 3.184534)
sawa 3.460000 0.040000 3.500000 ( 3.506875)
cary_rjust 0.360000 0.010000 0.370000 ( 0.367094)
cary_rjust_sort 0.480000 0.010000 0.490000 ( 0.499956)
cary_to_i 0.190000 0.010000 0.200000 ( 0.187136)
cary_to_i_sort 0.200000 0.000000 0.200000 ( 0.203509)
Notice that the absolute times are also affected.
Can anyone explain the reason for the difference in the benchmarks?
What is the complexity of Ruby's Array#insert?
Is it O(1) or O(n) (memory is copied)?
A simple benchmark shows that insert is O(n):
require 'benchmark'
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..100000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
end
user system total real
0.078000 0.000000 0.078000 ( 0.077023)
0.500000 0.000000 0.500000 ( 0.522345)
5.953000 0.000000 5.953000 ( 5.967949)
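The linear growth is what you'd expect if inserting near the front has to move every later element one slot to the right. The following is only an illustrative model in plain Ruby (the helper shifting_insert is hypothetical; MRI implements Array#insert in C, not with a Ruby loop), but it shows why the work is proportional to the number of elements after the insertion point:

```ruby
# Illustrative model: insert `value` at `index` by growing the array by one
# and shifting every element at or after `index` one position to the right.
# Returns the number of elements moved, which is (original size - index).
def shifting_insert(arr, index, value)
  moved = 0
  arr.push(nil) # make room at the end
  (arr.size - 1).downto(index + 1) do |i|
    arr[i] = arr[i - 1] # shift one slot right
    moved += 1
  end
  arr[index] = value
  moved
end

arr = (1..10).to_a
moved = shifting_insert(arr, 1, 1)
p arr   # => [1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p moved # => 9
```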
That holds as long as you aren't pushing onto the end of the array; appending is (amortized) O(1):
require 'benchmark'
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..100000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..10000000).to_a
x.report { 10000.times {arr.push 1} }
end
user system total real
0.000000 0.000000 0.000000 ( 0.001002)
0.000000 0.000000 0.000000 ( 0.001000)
0.000000 0.000000 0.000000 ( 0.001001)
0.000000 0.000000 0.000000 ( 0.002001)
I have a variable and want to take a range of bits from that variable. I want the CLEANEST way to do this.
If x = 19767 and I want bit3 - bit8 (starting from the right):
100110100110111 is 19767 in binary.
I want the part in parentheses, 100110(100110)111, so the answer is 38.
What is the simplest/cleanest/most-elegant way to implement the following function with Ruby?
bit_range(orig_num, first_bit, last_bit)
PS. Bonus points for answers that are computationally less intensive.
19767.to_s(2)[-9..-4].to_i(2)
or
19767 >> 3 & 0x3f
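The second expression generalizes directly: shift the wanted bits down to position 0, then AND with a mask of len one-bits, where len = last_bit - first_bit + 1. A sketch using the question's bit_range signature; on Ruby 2.7 and later, Integer#[] also accepts a range and computes the same value:

```ruby
# Extract bits first_bit..last_bit (inclusive, counted from 0 at the right).
def bit_range(orig_num, first_bit, last_bit)
  len = last_bit - first_bit + 1
  mask = (1 << len) - 1 # len low-order one-bits, e.g. 0b111111 for len 6
  (orig_num >> first_bit) & mask
end

p bit_range(19767, 3, 8) # => 38
p 19767[3..8]            # => 38 (Integer#[] with a Range, Ruby 2.7+)
```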
Update:
Soup-to-nuts (why do people say that, anyway?) ...
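class Integer # Fixnum was merged into Integer in Ruby 2.4 and removed in 3.2
def bit_range low, high
len = high - low + 1
# -1 >> len is still -1 (arithmetic shift), so the mask is ~(-1 << len),
# i.e. len low-order one-bits
self >> low & ~(-1 >> len << len)
end
end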
p 19767.bit_range(3, 8)
orig_num.to_s(2)[(-last_bit-1)..(-first_bit-1)].to_i(2)
Here is how this could be done using pure number operations:
class Integer # Fixnum no longer exists as a separate class in modern Ruby
def slice(range_or_start, length = nil)
if length
start = range_or_start
else
range = range_or_start
start = range.begin
length = range.count
end
mask = 2 ** length - 1
self >> start & mask
end
end
def p n
puts "0b#{n.to_s(2)}"; n
end
p 0b100110100110111.slice(3..8) # 0b100110
p 0b100110100110111.slice(3, 6) # 0b100110
Just to show the speeds of the suggested answers:
require 'benchmark'
ORIG_NUMBER = 19767
def f(x,i,j)
b = x.to_s(2)
n = b.size
b[(n-j-1)...(n-i)].to_i(2)
end
class Integer
def bit_range low, high
len = high - low + 1
self >> low & ~(-1 >> len << len)
end
def slice(range_or_start, length = nil)
if length
start = range_or_start
else
range = range_or_start
start = range.begin
length = range.count
end
mask = 2 ** length - 1
self >> start & mask
end
end
def p n
puts "0b#{n.to_s(2)}"; n
end
n = 1_000_000
puts "Using #{ n } loops in Ruby #{ RUBY_VERSION }."
Benchmark.bm(21) do |b|
b.report('texasbruce') { n.times { ORIG_NUMBER.to_s(2)[(-8 - 1)..(-3 - 1)].to_i(2) } }
b.report('DigitalRoss string') { n.times { ORIG_NUMBER.to_s(2)[-9..-4].to_i(2) } }
b.report('DigitalRoss binary') { n.times { ORIG_NUMBER >> 3 & 0x3f } }
b.report('DigitalRoss bit_range') { n.times { 19767.bit_range(3, 8) } }
b.report('Philip') { n.times { f(ORIG_NUMBER, 3, 8) } }
b.report('Semyon Perepelitsa') { n.times { ORIG_NUMBER.slice(3..8) } }
end
And the output:
Using 1000000 loops in Ruby 1.9.3.
user system total real
texasbruce 1.240000 0.010000 1.250000 ( 1.243709)
DigitalRoss string 1.000000 0.000000 1.000000 ( 1.006843)
DigitalRoss binary 0.260000 0.000000 0.260000 ( 0.262319)
DigitalRoss bit_range 0.840000 0.000000 0.840000 ( 0.858603)
Philip 1.520000 0.000000 1.520000 ( 1.543751)
Semyon Perepelitsa 1.150000 0.010000 1.160000 ( 1.155422)
That's on my old MacBook Pro. Your mileage might vary.
It makes sense to define a function for that:
def f(x,i,j)
b = x.to_s(2)
n = b.size
b[(n-j-1)...(n-i)].to_i(2)
end
puts f(19767, 3, 8) # => 38
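One caveat with the string-based f: if x has fewer than j + 1 binary digits, the computed start index goes negative and the slice no longer lines up (for example, f(5, 3, 8) raises an error instead of returning 0). Left-padding the binary string with zeros first avoids that; a sketch, with the hypothetical name f_padded:

```ruby
# Same slicing as f, but the binary string is first left-padded with "0"
# to at least j + 1 digits so that bit j always exists.
def f_padded(x, i, j)
  b = x.to_s(2).rjust(j + 1, "0")
  n = b.size
  b[(n - j - 1)...(n - i)].to_i(2)
end

p f_padded(19767, 3, 8) # => 38
p f_padded(5, 3, 8)     # => 0
```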
Expanding on DigitalRoss's idea: instead of taking two arguments, you can pass a range:
class Integer
def bit_range range
len = range.last - range.first + 1
self >> range.first & ~(-1 >> len << len)
end
end
19767.bit_range 3..8
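One thing to watch with the range version: range.last on an exclusive range such as 3...9 returns 9, so the computed length is one too large and an extra bit can leak in. Range#min and Range#max account for exclusive integer ends, so they handle both forms. A sketch as a standalone function (the name bit_range_for is hypothetical), with the mask written as (1 << len) - 1:

```ruby
# Standalone variant: accepts inclusive (3..8) or exclusive (3...9) ranges.
def bit_range_for(num, range)
  low  = range.min
  high = range.max # 8 for both 3..8 and 3...9
  len  = high - low + 1
  (num >> low) & ((1 << len) - 1)
end

p bit_range_for(19767, 3..8)  # => 38
p bit_range_for(19767, 3...9) # => 38
p bit_range_for(1023, 3...9)  # => 63 (bits 3..8 of ten one-bits)
```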