Say I want to take a number and return its digits as an array in Ruby.
For this specific purpose, and for string functions versus number functions in general, which is faster?
These are the algorithms I assume would be most commonly used:
Using Strings: n.to_s.split(//).map {|x| x.to_i}
Using Numbers:
array = []
until n == 0
  m = n % 10
  array.unshift(m)
  n /= 10
end
The difference seems to be less than one order of magnitude, with the integer-based approach faster for Fixnums. For Bignums, the relative performance starts out more or less even, with the string approach winning out significantly as the number of digits grows.
As strings
Program
#!/usr/bin/env ruby
require 'profile'
$n = 1234567890
10000.times do
  $n.to_s.split(//).map {|x| x.to_i}
end
Output
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
55.64      0.74      0.74    10000     0.07     0.10  Array#map
21.05      1.02      0.28   100000     0.00     0.00  String#to_i
10.53      1.16      0.14        1   140.00  1330.00  Integer#times
 7.52      1.26      0.10    10000     0.01     0.01  String#split
 5.26      1.33      0.07    10000     0.01     0.01  Fixnum#to_s
 0.00      1.33      0.00        1     0.00  1330.00  #toplevel
As integers
Program
#!/usr/bin/env ruby
require 'profile'
$n = 1234567890
10000.times do
  array = []
  n = $n
  until n == 0
    m = n%10
    array.unshift(m)
    n /= 10
  end
  array
end
Output
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
70.64      0.77      0.77        1   770.00  1090.00  Integer#times
29.36      1.09      0.32   100000     0.00     0.00  Array#unshift
 0.00      1.09      0.00        1     0.00  1090.00  #toplevel
Addendum
The pattern seems to hold for smaller numbers also. With $n = 12345, it was around 800ms for the string-based approach and 550ms for the integer-based approach.
When I crossed the boundary into Bignums, say, with $n = 12345678901234567890, I got 2375ms for both approaches. The difference appears to even out nicely, which I would have taken to mean that the internal logic powering Bignum is string-like. However, the documentation seems to suggest otherwise.
For academic purposes, I once again doubled the number of digits to $n = 1234567890123456789012345678901234567890. I got around 4450ms for the string approach and 9850ms for the integer approach, a stark reversal that rules out my previous postulate.
Summary
Number of digits | String program | Integer program | Difference
---------------------------------------------------------------------------
5 | 800ms | 550ms | Integer wins by 250ms
10 | 1330ms | 1090ms | Integer wins by 240ms
20 | 2375ms | 2375ms | Tie
40 | 4450ms | 9850ms | String wins by 4400ms
Steven's response is impressive, but I looked at it for a couple of minutes and couldn't distill it into a simple answer, so here is mine.
For Fixnums
It is fastest to use the digits method I provide below. It's also pretty quick (and much easier) to use num.to_s.each_char.map(&:to_i).
For Bignums
It is fastest to use num.to_s.each_char.map(&:to_i).
The Solution
If speed is honestly the determining factor for what code you use (meaning don't be evil), then this code is the best choice for the job.
class Integer
  def digits
    working_int, digits = self, Array.new
    until working_int.zero?
      digits.unshift working_int % 10
      working_int /= 10
    end
    digits
  end
end

class Bignum
  def digits
    to_s.each_char.map(&:to_i)
  end
end
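A side note that is not part of the original answer: since Ruby 2.4, Bignum has been folded into Integer and there is a built-in Integer#digits (which the monkey patch above would shadow). It returns the digits least-significant first, so reverse it if you want the order built above:
1234567890.digits          # => [0, 9, 8, 7, 6, 5, 4, 3, 2, 1] (least significant first)
1234567890.digits.reverse  # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]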
Here are the approaches I considered to arrive at this conclusion.
I put together a comparison with the 'benchmark' library, using the code examples from Steven Xu and a String#each_byte version.
require 'benchmark'
MAX = 10_000
#Solution based on http://stackoverflow.com/questions/6445496/how-much-slower-are-strings-containing-numbers-compared-to-numbers/6447254#6447254
class Integer
  def digits
    working_int, digits = self, Array.new
    until working_int.zero?
      digits.unshift working_int % 10
      working_int /= 10
    end
    digits
  end
end

class Bignum
  def digits
    to_s.each_char.map(&:to_i)
  end
end
[
  12345,
  1234567890,
  12345678901234567890,
  1234567890123456789012345678901234567890,
].each { |num|
  puts "========="
  puts "Benchmark #{num}"
  Benchmark.bm do |b|
    b.report("Integer%        ") do
      MAX.times {
        array = []
        n = num
        until n == 0
          m = n%10
          array.unshift(m)
          n /= 10
        end
        array
      }
    end
    b.report("Integer% <<     ") do
      MAX.times {
        array = []
        n = num
        until n == 0
          m = n%10
          array << m
          n /= 10
        end
        array.reverse
      }
    end
    b.report("Integer#divmod  ") do
      MAX.times {
        array = []
        n = num
        until n == 0
          n, x = *n.divmod(10)
          array.unshift(x)
        end
        array
      }
    end
    b.report("Integer#divmod<<") do
      MAX.times {
        array = []
        n = num
        until n == 0
          n, x = *n.divmod(10)
          array << x
        end
        array.reverse
      }
    end
    b.report("String+split//  ") do
      MAX.times { num.to_s.split(//).map {|x| x.to_i} }
    end
    b.report("String#each_byte") do
      # Note: this yields one-character strings, not integers, so it does
      # slightly less work than the other variants.
      MAX.times { num.to_s.each_byte.map{|x| x.chr } }
    end
    b.report("String#each_char") do
      MAX.times { num.to_s.each_char.map{|x| x.to_i } }
    end
    # http://stackoverflow.com/questions/6445496/how-much-slower-are-strings-containing-numbers-compared-to-numbers/6447254#6447254
    b.report("Num#digit       ") do
      # Same code as the String#each_char report above.
      MAX.times { num.to_s.each_char.map{|x| x.to_i } }
    end
  end
}
My results:
Benchmark 12345
user system total real
Integer% 0.015000 0.000000 0.015000 ( 0.015625)
Integer% << 0.016000 0.000000 0.016000 ( 0.015625)
Integer#divmod 0.047000 0.000000 0.047000 ( 0.046875)
Integer#divmod<< 0.031000 0.000000 0.031000 ( 0.031250)
String+split// 0.109000 0.000000 0.109000 ( 0.109375)
String#each_byte 0.047000 0.000000 0.047000 ( 0.046875)
String#each_char 0.047000 0.000000 0.047000 ( 0.046875)
Num#digit 0.047000 0.000000 0.047000 ( 0.046875)
=========
Benchmark 1234567890
user system total real
Integer% 0.047000 0.000000 0.047000 ( 0.046875)
Integer% << 0.046000 0.000000 0.046000 ( 0.046875)
Integer#divmod 0.063000 0.000000 0.063000 ( 0.062500)
Integer#divmod<< 0.062000 0.000000 0.062000 ( 0.062500)
String+split// 0.188000 0.000000 0.188000 ( 0.187500)
String#each_byte 0.063000 0.000000 0.063000 ( 0.062500)
String#each_char 0.093000 0.000000 0.093000 ( 0.093750)
Num#digit 0.079000 0.000000 0.079000 ( 0.078125)
=========
Benchmark 12345678901234567890
user system total real
Integer% 0.234000 0.000000 0.234000 ( 0.234375)
Integer% << 0.234000 0.000000 0.234000 ( 0.234375)
Integer#divmod 0.203000 0.000000 0.203000 ( 0.203125)
Integer#divmod<< 0.172000 0.000000 0.172000 ( 0.171875)
String+split// 0.266000 0.000000 0.266000 ( 0.265625)
String#each_byte 0.125000 0.000000 0.125000 ( 0.125000)
String#each_char 0.141000 0.000000 0.141000 ( 0.140625)
Num#digit 0.141000 0.000000 0.141000 ( 0.140625)
=========
Benchmark 1234567890123456789012345678901234567890
user system total real
Integer% 0.718000 0.000000 0.718000 ( 0.718750)
Integer% << 0.657000 0.000000 0.657000 ( 0.656250)
Integer#divmod 0.562000 0.000000 0.562000 ( 0.562500)
Integer#divmod<< 0.485000 0.000000 0.485000 ( 0.484375)
String+split// 0.500000 0.000000 0.500000 ( 0.500000)
String#each_byte 0.218000 0.000000 0.218000 ( 0.218750)
String#each_char 0.282000 0.000000 0.282000 ( 0.281250)
Num#digit 0.265000 0.000000 0.265000 ( 0.265625)
String#each_byte/each_char is faster than split; for lower numbers, the integer version is faster.
Related
n = 5000000
Benchmark.bm(7) do |x|
x.report("upto :") { for i in 1..n; 0.upto(( 10 )) ; end }
x.report("range :") { for i in 1..n; 0..10 ; end }
end
Results:
user system total real
upto : 1.116440 0.068953 1.185393 ( 1.187705)
range : 0.156921 0.000000 0.156921 ( 0.156759)
Why does upto() take so much time compared to the manual range?
This benchmark actually compares oranges to apples: upto performs the actual iterations under the hood, while 0..10 is a range literal that causes only the creation of a tiny object per iteration (with no memory allocations).
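If the intent is a like-for-like comparison, one option (a rough sketch, not from the original answer) is to make both sides actually iterate, keeping the bare range literal as a third case:
require 'benchmark'

n = 5_000_000
Benchmark.bm(15) do |x|
  x.report("upto + block :") { for i in 1..n; 0.upto(10) { };   end }
  x.report("range + each :") { for i in 1..n; (0..10).each { }; end }
  x.report("range literal:") { for i in 1..n; 0..10;            end }
end
That keeps the per-iteration work comparable between the first two reports, so the third one isolates the cost of merely creating the range.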
I am trying to perform a 1x1 convolution using the Apple BNNS (Basic Neural Network Subroutines) library in Accelerate.
When I run on a 9x1 column vector, I get unexpected results.
Sample code posted at: https://gist.github.com/cancan101/5887cb93cc91a2d10e2bfd23284bb438 (a modification of BNNS sample code.)
Expected Results:
Print numbers 0-8.
Actual Results:
o0: 0.000000
o1: 0.000000
o2: 0.000000
o3: 3.000000
o4: 0.000000
o5: 5.000000
o6: 0.000000
o7: 7.000000
o8: 0.000000
I suspect I am doing this right, but am open for feedback on the linked code.
If you transpose to row vectors, you'll see the expected output.
from this:
i_desc.width = 1;
i_desc.height = 9;
i_desc.row_stride = 1;
to this:
i_desc.width = 9;
i_desc.height = 1;
i_desc.row_stride = 9;
same for output:
o_desc.width = 9;
o_desc.height = 1;
o_desc.row_stride = 9;
Result:
Input image stack: 9 x 1 x 1
Output image stack: 9 x 1 x 1
Convolution kernel: 1 x 1
o0: 0.000000
o1: 1.000000
o2: 2.000000
o3: 3.000000
o4: 4.000000
o5: 5.000000
o6: 6.000000
o7: 7.000000
o8: 8.000000
I noticed that array.min seems slow, so I did this test against my own naive implementation:
require 'benchmark'
array = (1..100000).to_a.shuffle
Benchmark.bmbm(5) do |x|
x.report("lib:") { 99.times { min = array.min } }
x.report("own:") { 99.times { min = array[0]; array.each { |n| min = n if n < min } } }
end
The results:
Rehearsal -----------------------------------------
lib: 1.531000 0.000000 1.531000 ( 1.538159)
own: 1.094000 0.016000 1.110000 ( 1.102130)
-------------------------------- total: 2.641000sec
user system total real
lib: 1.500000 0.000000 1.500000 ( 1.515249)
own: 1.125000 0.000000 1.125000 ( 1.145894)
I'm shocked. How can my own implementation running a block via each beat the built-in? And beat it by so much?
Am I somehow mistaken? Or is this somehow normal? I'm confused.
My Ruby version, running on Windows 8.1 Pro:
C:\>ruby --version
ruby 2.2.3p173 (2015-08-18 revision 51636) [i386-mingw32]
Have a look at the implementation of Enumerable#min. It might use each eventually to loop through the elements and get the min element, but before that it does some extra checking to see if it needs to return more than one element, or if it needs to compare the elements via a passed block. In your case the elements get compared via the min_i function, and I suspect that's where the speed difference comes from - that function will be slower than simply comparing two numbers.
There's no extra optimization for arrays, all enumerables are traversed the same way.
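As a small illustration (mine, not from the original answer) of the extra cases min has to be prepared for:
array = [5, 3, 9, 1]

array.min                    # => 1, plain <=> comparison
array.min(2)                 # => [1, 3], may have to return more than one element
array.min { |a, b| b <=> a } # => 9, comparison delegated to a passed block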
It's even faster if you use:
def my_min(ary)
  the_min = ary[0]
  i = 1
  len = ary.length
  while i < len
    the_min = ary[i] if ary[i] < the_min
    i += 1
  end
  the_min
end
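For a quick sanity check, one could benchmark it the same way the question does (this snippet is mine, not part of the original answer, and simply reuses a shuffled array like the one in the question):
require 'benchmark'

array = (1..100_000).to_a.shuffle

Benchmark.bmbm(5) do |x|
  x.report("lib:") { 99.times { array.min } }
  x.report("own:") { 99.times { my_min(array) } }
end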
NOTE
I know this is not an answer, but I thought it was worth sharing and putting this code into a comment would have been exceedingly ugly.
For those who like to upgrade to newer versions of software:
require 'benchmark'
array = (1..100000).to_a.shuffle
Benchmark.bmbm(5) do |x|
x.report("lib:") { 99.times { min = array.min } }
x.report("own:") { 99.times { min = array[0]; array.each { |n| min = n if n < min } } }
end
Rehearsal -----------------------------------------
lib: 0.021326 0.000017 0.021343 ( 0.021343)
own: 0.498233 0.001024 0.499257 ( 0.499746)
-------------------------------- total: 0.520600sec
user system total real
lib: 0.018126 0.000000 0.018126 ( 0.018139)
own: 0.492046 0.000000 0.492046 ( 0.492367)
RUBY_VERSION # => "2.7.1"
If you are looking to solve this in a really performant manner, in O(log(n)) or O(n), look at https://en.wikipedia.org/wiki/Selection_algorithm#Incremental_sorting_by_selection and https://en.wikipedia.org/wiki/Heap_(data_structure)
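To make the heap suggestion concrete, here is a minimal binary min-heap sketch in Ruby (my own illustration, not taken from the linked pages): building it by repeated push is O(n log n), reading the minimum is O(1), and removing it is O(log n), which only pays off when you need the minimum repeatedly while the collection changes.
class MinHeap
  def initialize(items = [])
    @a = []
    items.each { |x| push(x) }
  end

  # Add an element and restore the heap property by sifting it up.
  def push(x)
    @a << x
    i = @a.size - 1
    while i > 0
      parent = (i - 1) / 2
      break if @a[parent] <= @a[i]
      @a[parent], @a[i] = @a[i], @a[parent]
      i = parent
    end
    self
  end

  # Smallest element without removing it.
  def min
    @a.first
  end

  # Remove and return the smallest element, sifting the new root down.
  def pop
    return nil if @a.empty?
    top  = @a.first
    last = @a.pop
    unless @a.empty?
      @a[0] = last
      i = 0
      loop do
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        smallest = left  if left  < @a.size && @a[left]  < @a[smallest]
        smallest = right if right < @a.size && @a[right] < @a[smallest]
        break if smallest == i
        @a[smallest], @a[i] = @a[i], @a[smallest]
        i = smallest
      end
    end
    top
  end
end

heap = MinHeap.new((1..100_000).to_a.shuffle)
heap.min # => 1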
In my particular case, I want to decide this for MATLAB. Is it faster (inside a for loop with ~250,000 iterations) to use the "if", so that fprintf is only called 250 times?
for i=1:250042
    if ~(mod(i, 1000))
        fprintf(<something to standard output>);
    end
end
I know from programming in C that programs are much slower when they print to standard output.
It is much slower to print everything. By profiling the following code
clear all
clear classes
for i=1:250042
    if ~(mod(i, 1000))
        fprintf(['current loop: ', num2str(i), '\n']);
    end
end
for i=1:250042
    fprintf(['current loop: ', num2str(i), '\n']);
end
I found the following:
  time    calls   line
< 0.01        1      1   clear all
< 0.01        1      2   clear classes
                     3
              1      4   for i=1:250042
  0.13   250042      5       if ~(mod(i, 1000))
  0.12      250      6           fprintf(['current loop: ', num2str(i), '\n']);
            250      7       end
  0.24   250042      8   end
                     9
              1     10   for i=1:250042
 37.90   250042     11       fprintf(['current loop: ', num2str(i), '\n']);
  0.47   250042     12   end
Printing everything is orders of magnitude slower.
Calls to fprintf have considerable overhead, especially for small write operations. For instance, executing the following code:
fid = fopen ( 'a.txt' , 'w+' );

timeStart1000 = tic;
for ( ii = 1 : 10 )
    for ( iii = 1 : 100 )
        b = num2str ( ii );
        fprintf ( fid , b );
    end
end
timeStop1000 = toc ( timeStart1000 );

timeStart10 = tic;
for ( ii = 1 : 10 )
    c = '';
    for ( iii = 1 : 100 )
        c = [ c , num2str(ii) ];
    end
    fprintf ( fid , c );
end
timeStop10 = toc ( timeStart10 );
There is a significant time difference between 1000 and 10 calls to fprintf: timeStop1000 = 0.1816 vs. timeStop10 = 0.0765.
First of all, I prefer not to use i as an index, since it represents the imaginary unit in MATLAB.
Your question is easy to test:
tic
for jj=1:250042
    if ~mod(jj,1000)
        disp('Hello')
    end
end
a=toc;

tic
for jj=1:250042
    if ~mod(jj,100)
        disp('Hello')
    end
end
b=toc;

clc
disp(a)
disp(b)
gives
0.0295
0.0736
so the answer is: yes, it is faster to print less.
Running this code:
require 'benchmark'
Benchmark.bm do |x|
x.report("1+1") {15_000_000.times {1+1}}
x.report("1+1") {15_000_000.times {1+1}}
x.report("1+1") {15_000_000.times {1+1}}
x.report("1+1") {15_000_000.times {1+1}}
x.report("1+1") {15_000_000.times {1+1}}
end
Outputs these results:
user system total real
1+1 2.188000 0.000000 2.188000 ( 2.250000)
1+1 2.250000 0.000000 2.250000 ( 2.265625)
1+1 2.234000 0.000000 2.234000 ( 2.250000)
1+1 2.203000 0.000000 2.203000 ( 2.250000)
1+1 2.266000 0.000000 2.266000 ( 2.281250)
Guessing the variation is a result of the system environment, but wanted to confirm this is the case.
"Guessing the variation is a result of the system environment", you are right.
Benchmarks can't be precise all the time. You don't have a perfectly regular machine that runs the same code in exactly the same time on every run. Treat two benchmark numbers as the same if they are very close, as they are in this case.
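One thing that helps (it reduces the noise, it does not remove it) is Benchmark.bmbm, which runs a rehearsal pass before the measured pass so that warm-up and GC state weigh less on the reported numbers; a minimal sketch:
require 'benchmark'

Benchmark.bmbm do |x|
  # The rehearsal pass warms things up; read the second set of numbers.
  x.report("1+1") { 15_000_000.times { 1 + 1 } }
  x.report("1+1") { 15_000_000.times { 1 + 1 } }
end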
I tried using eval to partially unroll the loop, and although it made it faster, it made the execution time less consistent!
$VERBOSE &&= false # You do not want 15 thousand "warning: useless use of + in void context" warnings
# large_number = 15_000_000 # Too large! Caused eval to take too long, so I gave up
somewhat_large_number = 15_000
unrolled = "def do_addition\n" + ("1+1\n" * somewhat_large_number) + "end\n" ; nil
eval(unrolled)
require 'benchmark'
Benchmark.bm do |x|
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
end
gave me
user system total real
1+1 partially unrolled 0.750000 0.000000 0.750000 ( 0.765586)
1+1 partially unrolled 0.765000 0.000000 0.765000 ( 0.765586)
1+1 partially unrolled 0.688000 0.000000 0.688000 ( 0.703089)
1+1 partially unrolled 0.797000 0.000000 0.797000 ( 0.796834)
1+1 partially unrolled 0.750000 0.000000 0.750000 ( 0.749962)
1+1 partially unrolled 0.781000 0.000000 0.781000 ( 0.781210)
1+1 partially unrolled 0.719000 0.000000 0.719000 ( 0.718713)
1+1 partially unrolled 0.750000 0.000000 0.750000 ( 0.749962)
1+1 partially unrolled 0.765000 0.000000 0.765000 ( 0.765585)
1+1 partially unrolled 0.781000 0.000000 0.781000 ( 0.781210)
For the purpose of comparison, your benchmark on my computer gave
user system total real
1+1 2.406000 0.000000 2.406000 ( 2.406497)
1+1 2.407000 0.000000 2.407000 ( 2.484629)
1+1 2.500000 0.000000 2.500000 ( 2.734655)
1+1 2.515000 0.000000 2.515000 ( 2.765908)
1+1 2.703000 0.000000 2.703000 ( 4.391075)
(real time varied in the last line, but not user or total)