Why is array.min so slow? - ruby

I noticed that array.min seems slow, so I did this test against my own naive implementation:
require 'benchmark'
array = (1..100000).to_a.shuffle
Benchmark.bmbm(5) do |x|
x.report("lib:") { 99.times { min = array.min } }
x.report("own:") { 99.times { min = array[0]; array.each { |n| min = n if n < min } } }
end
The results:
Rehearsal -----------------------------------------
lib: 1.531000 0.000000 1.531000 ( 1.538159)
own: 1.094000 0.016000 1.110000 ( 1.102130)
-------------------------------- total: 2.641000sec
user system total real
lib: 1.500000 0.000000 1.500000 ( 1.515249)
own: 1.125000 0.000000 1.125000 ( 1.145894)
I'm shocked. How can my own implementation running a block via each beat the built-in? And beat it by so much?
Am I somehow mistaken? Or is this somehow normal? I'm confused.
My Ruby version, running on Windows 8.1 Pro:
C:\>ruby --version
ruby 2.2.3p173 (2015-08-18 revision 51636) [i386-mingw32]

Have a look at the implementation of Enumerable#min. It might use each eventually to loop through the elements and get the min element, but before that it does some extra checking to see if it needs to return more than one element, or if it needs to compare the elements via a passed block. In your case the elements will get to be compared via min_i function, and I suspect that's where the speed difference comes from - that function will be slower than simply comparing two numbers.
There's no extra optimization for arrays, all enumerables are traversed the same way.
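To make the extra checks concrete, here is a minimal sketch of the three call forms that min has to tell apart before it starts comparing. The sample array is made up for illustration; the hand-written loop from the question covers only the first form.

```ruby
# Small array so the three call forms are easy to compare by eye.
array = [5, 3, 8, 1, 9, 2]

# 1. Plain form: elements are compared with <=>.
smallest = array.min

# 2. Count form: returns the n smallest elements as a sorted array.
two_smallest = array.min(2)

# 3. Block form: the block decides the ordering, here by distance from 6.
closest_to_six = array.min { |a, b| (a - 6).abs <=> (b - 6).abs }

# The hand-written loop from the question handles only case 1.
own = array[0]
array.each { |n| own = n if n < own }

puts smallest, two_smallest.inspect, closest_to_six, own
```

The loop wins in the question's benchmark partly because it does strictly less: no argument handling and a direct < on two integers.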

It's even faster if you use:
def my_min(ary)
  the_min = ary[0]
  i = 1
  len = ary.length
  while i < len
    the_min = ary[i] if ary[i] < the_min
    i += 1
  end
  the_min
end
NOTE
I know this is not an answer, but I thought it was worth sharing and putting this code into a comment would have been exceedingly ugly.

For those who like to upgrade to newer versions of software:
require 'benchmark'
array = (1..100000).to_a.shuffle
Benchmark.bmbm(5) do |x|
x.report("lib:") { 99.times { min = array.min } }
x.report("own:") { 99.times { min = array[0]; array.each { |n| min = n if n < min } } }
end
Rehearsal -----------------------------------------
lib: 0.021326 0.000017 0.021343 ( 0.021343)
own: 0.498233 0.001024 0.499257 ( 0.499746)
-------------------------------- total: 0.520600sec
user system total real
lib: 0.018126 0.000000 0.018126 ( 0.018139)
own: 0.492046 0.000000 0.492046 ( 0.492367)
RUBY_VERSION # => "2.7.1"
If you are looking to solve this in a really performant manner, O(log(n)) or O(n), look at https://en.wikipedia.org/wiki/Selection_algorithm#Incremental_sorting_by_selection and https://en.wikipedia.org/wiki/Heap_(data_structure)

Related

Why is Ruby's upto method so slow?

n = 5000000
Benchmark.bm(7) do |x|
x.report("upto :") { for i in 1..n; 0.upto(( 10 )) ; end }
x.report("range :") { for i in 1..n; 0..10 ; end }
end
Results:
user system total real
upto : 1.116440 0.068953 1.185393 ( 1.187705)
range : 0.156921 0.000000 0.156921 ( 0.156759)
Why does upto() take so much time compared to the manual range?
This benchmark now compares apples to oranges: upto performs the actual iterations under the hood, while 0..10 is a range literal that only evaluates to a tiny Range object on each pass (with no memory allocations) and never walks its elements.
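A minimal sketch of a like-for-like comparison, assuming the intent was to measure iteration. One thing to watch: when upto is called without a block, as in the snippet above, Ruby returns an Enumerator instead of iterating, so both forms need a block before they do comparable work. The loop count n is an arbitrary choice.

```ruby
n = 1_000

# Without a block, upto does no iteration: it returns an Enumerator.
enum = 0.upto(10)

# A range literal just evaluates to a Range object.
range = (0..10)

# To compare like with like, give both forms a block so both really iterate.
upto_sum = 0
n.times { 0.upto(10) { |i| upto_sum += i } }

range_sum = 0
n.times { (0..10).each { |i| range_sum += i } }

puts enum.class, range.class, upto_sum == range_sum
```

Wrapping the two block forms in x.report calls would then time the same amount of work on each side.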

Slow Ruby for simple array operations

I created simple insertion sort implementation in Ruby, following pseudo-code from Cormen's "Introduction to Algorithms":
def sort_insert(array)
  (1 ... array.length).each do |item_index|
    key = array[item_index]
    i = item_index - 1
    while i >= 0 && array[i] > key do
      array[i + 1] = array[i]
      i -= 1
    end
    array[i + 1] = key
  end
  array
end
It works, but performs very slowly. For an array of ~20k elements, array = ((0..10_000).to_a * 2).shuffle, it takes about 20 seconds to sort. I measure time only for this method call, with no data preparation etc. In JavaScript, a very similar solution takes about 1 second. Why is Ruby (v. 2.2.2p95) so slow here?
Edit:
JS version of this sorting, which I use:
function SortMethods() {
}
SortMethods.prototype.sortInsert = function(array) {
for(let itemIndex = 1; itemIndex < array.length; itemIndex++) {
let key = array[itemIndex];
let i = itemIndex - 1;
while( i >= 0 && array[i] > key) {
array[i + 1] = array[i];
i--;
}
array[i + 1] = key;
}
return array;
}
I'm going to disagree with the premise of your question—Ruby is turning in more than respectable performance on my machine.
To illustrate this, I created a file with 100k random numbers:
$ ruby -e '100_000.times {printf "%22.20f\n", rand}' > rand100k.csv
I then sorted this with the system sort utility, saving the results for later comparison (as a correctness check):
$ time sort -n < rand100k.csv > foo
real 0m0.067s
user 0m0.056s
sys 0m0.011s
I wrote a quicksort algorithm (which flips over to insertion sort when the sublist size gets small enough) in pure ruby, ran it, saved the results, and diff'ed the system sort output and ruby output files:
$ time ruby quicksort_w_insertion.rb < rand100k.csv > bar
real 0m0.546s
user 0m0.537s
sys 0m0.008s
$ diff foo bar
$
As you can see, both sorting runs produce identical output, and very quickly. In my opinion, a pure Ruby program which is only 8-10 times slower than the corresponding system utility is doing mighty darn fine speed-wise.
These runs were made with ruby 2.2.2p95 (2015-04-13 revision 50293) [x86_64-darwin14] on a MacBook Pro.
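The quicksort_w_insertion.rb script itself is not shown, so the following is only a sketch of the general technique (quicksort that hands small sublists to insertion sort), not the code that produced the timings above. The cutoff of 16, the middle-element pivot, and the names hybrid_sort! and insertion_sort! are all assumptions made for illustration.

```ruby
CUTOFF = 16 # assumed threshold; the answer does not say which value it used

# Insertion sort on the slice array[lo..hi], in place.
def insertion_sort!(array, lo, hi)
  (lo + 1).upto(hi) do |index|
    key = array[index]
    i = index - 1
    while i >= lo && array[i] > key
      array[i + 1] = array[i]
      i -= 1
    end
    array[i + 1] = key
  end
  array
end

# Quicksort that hands small sublists to insertion sort.
def hybrid_sort!(array, lo = 0, hi = array.length - 1)
  while hi - lo + 1 > CUTOFF
    pivot = array[(lo + hi) / 2] # middle element as pivot (an assumption)
    i = lo
    j = hi
    # Hoare-style partition around the pivot value.
    while i <= j
      i += 1 while array[i] < pivot
      j -= 1 while array[j] > pivot
      if i <= j
        array[i], array[j] = array[j], array[i]
        i += 1
        j -= 1
      end
    end
    # Recurse into the smaller side, loop on the larger to bound stack depth.
    if j - lo < hi - i
      hybrid_sort!(array, lo, j)
      lo = i
    else
      hybrid_sort!(array, i, hi)
      hi = j
    end
  end
  insertion_sort!(array, lo, hi)
end

data = Array.new(20_000) { rand }
sorted = hybrid_sort!(data.dup)
puts sorted == data.sort
```

Insertion sort alone is quadratic, so on 20k elements it does on the order of a hundred million comparisons; the hybrid only ever runs it on sublists of at most CUTOFF elements.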

Source of Ruby benchmark irregularities

Running this code:
require 'benchmark'
Benchmark.bm do |x|
  x.report("1+1") {15_000_000.times {1+1}}
  x.report("1+1") {15_000_000.times {1+1}}
  x.report("1+1") {15_000_000.times {1+1}}
  x.report("1+1") {15_000_000.times {1+1}}
  x.report("1+1") {15_000_000.times {1+1}}
end
Outputs these results:
       user     system      total        real
1+1  2.188000   0.000000   2.188000 (  2.250000)
1+1  2.250000   0.000000   2.250000 (  2.265625)
1+1  2.234000   0.000000   2.234000 (  2.250000)
1+1  2.203000   0.000000   2.203000 (  2.250000)
1+1  2.266000   0.000000   2.266000 (  2.281250)
Guessing the variation is a result of the system environment, but wanted to confirm this is the case.
"Guessing the variation is a result of the system environment", you are right.
Benchmarks can't be perfectly precise. No machine runs the same code in exactly the same time on every run, so treat two benchmark numbers as equal when they are as close as they are here.
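One way to judge whether two numbers are "too near" to tell apart is to repeat the measurement and look at the spread. For reference, the five user times in the question range from 2.188 to 2.266 seconds, a spread of roughly 3.5% of their mean. Below is a minimal sketch; timing_summary, the trial count and the summary fields are illustrative choices, not part of the Benchmark library.

```ruby
require 'benchmark'

# Run the block `trials` times and summarise the wall-clock times.
# `trials` and the summary fields are illustrative choices.
def timing_summary(trials = 5)
  times = Array.new(trials) { Benchmark.realtime { yield } }
  mean = times.sum / trials
  variance = times.sum { |t| (t - mean)**2 } / trials
  { min: times.min, max: times.max, mean: mean, stddev: Math.sqrt(variance) }
end

summary = timing_summary { 1_000_000.times { 1 + 1 } }
spread = (summary[:max] - summary[:min]) / summary[:mean]
printf("mean %.4fs, stddev %.4fs, spread %.1f%%\n",
       summary[:mean], summary[:stddev], spread * 100)
```

If the difference between two candidates is smaller than the spread of either one, the benchmark has not really separated them.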
I tried using eval to partially unroll the loop, and although it made it faster, it made the execution time less consistent!
$VERBOSE &&= false # You do not want 15 thousand "warning: useless use of + in void context" warnings
# large_number = 15_000_000 # Too large! Caused eval to take too long, so I gave up
somewhat_large_number = 15_000
unrolled = "def do_addition\n" + ("1+1\n" * somewhat_large_number) + "end\n" ; nil
eval(unrolled)
require 'benchmark'
Benchmark.bm do |x|
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
x.report("1+1 partially unrolled") { i = 0; while i < 1000; do_addition; i += 1; end}
end
gave me
user system total real
1+1 partially unrolled 0.750000 0.000000 0.750000 ( 0.765586)
1+1 partially unrolled 0.765000 0.000000 0.765000 ( 0.765586)
1+1 partially unrolled 0.688000 0.000000 0.688000 ( 0.703089)
1+1 partially unrolled 0.797000 0.000000 0.797000 ( 0.796834)
1+1 partially unrolled 0.750000 0.000000 0.750000 ( 0.749962)
1+1 partially unrolled 0.781000 0.000000 0.781000 ( 0.781210)
1+1 partially unrolled 0.719000 0.000000 0.719000 ( 0.718713)
1+1 partially unrolled 0.750000 0.000000 0.750000 ( 0.749962)
1+1 partially unrolled 0.765000 0.000000 0.765000 ( 0.765585)
1+1 partially unrolled 0.781000 0.000000 0.781000 ( 0.781210)
For the purpose of comparison, your benchmark on my computer gave
user system total real
1+1 2.406000 0.000000 2.406000 ( 2.406497)
1+1 2.407000 0.000000 2.407000 ( 2.484629)
1+1 2.500000 0.000000 2.500000 ( 2.734655)
1+1 2.515000 0.000000 2.515000 ( 2.765908)
1+1 2.703000 0.000000 2.703000 ( 4.391075)
(real time varied in the last line, but not user or total)

Number crunching in Ruby (optimisation needed)

Ruby may not be the optimal language for this but I'm sort of comfortable working with this in my terminal so that's what I'm going with.
I need to process the numbers from 1 to 666666 and pick out all the numbers that contain a 6 but don't contain 7, 8 or 9. The first number will be 6, the next 16, then 26 and so forth.
Then I needed it printed like this (6=6) (16=6) (26=6) and when I have ranges like 60 to 66 I need it printed like (60 THRU 66=6) (SPSS syntax).
I have this code and it works but it's neither beautiful nor very efficient so how could I optimize it?
(silly code may follow)
class Array
def to_ranges
array = self.compact.uniq.sort
ranges = []
if !array.empty?
# Initialize the left and right endpoints of the range
left, right = array.first, nil
array.each do |obj|
# If the right endpoint is set and obj is not equal to right's successor
# then we need to create a range.
if right && obj != right.succ
ranges << Range.new(left,right)
left = obj
end
right = obj
end
ranges << Range.new(left,right) unless left == right
end
ranges
end
end
write = ""
numbers = (1..666666).to_a
# split each number into an array containing its digits
numbers = numbers.map { |i| i.to_s.split(//) }
# delete the arrays that don't contain 6 and the ones that contain 6 but also 7, 8 or 9
numbers = numbers.delete_if { |i| !i.include?('6') }
numbers = numbers.delete_if { |i| i.include?('7') }
numbers = numbers.delete_if { |i| i.include?('8') }
numbers = numbers.delete_if { |i| i.include?('9') }
# join the ciphers back into the original numbers
numbers = numbers.map { |i| i.join }
numbers = numbers.map { |i| i = Integer(i) }
# rangify consecutive numbers
numbers = numbers.to_ranges
# edit the ranges that go from 1..1 into just 1
numbers = numbers.map do |i|
if i.first == i.last
i = i.first
else
i = i
end
end
# string stuff
numbers = numbers.map { |i| i.to_s.gsub(".."," thru ") }
numbers = numbers.map { |i| "(" + i.to_s + "=6)"}
numbers.each { |i| write << " " + i }
File.open('numbers.txt','w') { |f| f.write(write) }
As I said it works for numbers even in the millions - but I'd like some advice on how to make prettier and more efficient.
I deleted my earlier attempt to parlez-vous-ruby? and made up for that. I now have an optimized version of x3ro's excellent example.
$,="\n"
puts ["(0=6)", "(6=6)", *(1.."66666".to_i(7)).collect {|i| i.to_s 7}.collect do |s|
s.include?('6')? "(#{s}0 THRU #{s}6=6)" : "(#{s}6=6)"
end ]
Compared to x3ro's version
... It is down to three lines
... 204.2 x faster (to 66666666)
... has byte-identical output
It uses all my ideas for optimization
gen numbers based on modulo 7 digits (so base-7 numbers)
generate the last digit 'smart': this is what compresses the ranges
So... what are the timings? This was testing with 8 digits (to 66666666, or 823544 lines of output):
$ time ./x3ro.rb > /dev/null
real 8m37.749s
user 8m36.700s
sys 0m0.976s
$ time ./my.rb > /dev/null
real 0m2.535s
user 0m2.460s
sys 0m0.072s
Even though the performance is actually good, it isn't even close to the C optimized version I posted before: I couldn't run my.rb to 6666666666 (6x10) because of OutOfMemory. When running to 9 digits, this is the comparative result:
sehe@meerkat:/tmp$ time ./my.rb > /dev/null
real 0m21.764s
user 0m21.289s
sys 0m0.476s
sehe@meerkat:/tmp$ time ./t2 > /dev/null
real 0m1.424s
user 0m1.408s
sys 0m0.012s
The C version is still some 15x faster... which is only fair considering that it runs on the bare metal.
Hope you enjoyed it, and can I please have your votes if only for learning Ruby for the purpose :)
(Can you tell I'm proud? This is my first encounter with ruby; I started the ruby koans 2 hours ago...)
Edit by @johndouthat:
Very nice! The use of base7 is very clever and this a great job for your first ruby trial :)
Here's a slight modification of your snippet that will let you test 10+ digits without getting an OutOfMemory error:
puts ["(0=6)", "(6=6)"]
(1.."66666666".to_i(7)).each do |i|
s = i.to_s(7)
puts s.include?('6') ? "(#{s}0 THRU #{s}6=6)" : "(#{s}6=6)"
end
# before:
real 0m26.714s
user 0m23.368s
sys 0m2.865s
# after
real 0m15.894s
user 0m13.258s
sys 0m1.724s
Exploiting patterns in the numbers, you can short-circuit lots of the loops, like this:
If you define a prefix as the 100s place and everything before it,
and define the suffix as everything in the 10s and 1s place, then, looping
through each possible prefix:
If the prefix is blank (i.e. you're testing 0-99), then there are 13 possible matches
elsif the prefix contains a 7, 8, or 9, there are no possible matches.
elsif the prefix contains a 6, there are 49 possible matches (a 7x7 grid)
else, there are 13 possible matches. (see the image below)
(the code doesn't yet exclude numbers that aren't specifically in the range, but it's pretty close)
number_range = (1..666_666)
prefix_range = ((number_range.first / 100)..(number_range.last / 100))
for p in prefix_range
ps = p.to_s
# TODO: if p == prefix_range.last or p == prefix_range.first,
# TODO: test to see if number_range.include?("#{ps}6".to_i), etc...
if ps == '0'
puts "(6=6) (16=6) (26=6) (36=6) (46=6) (56=6) (60 thru 66) "
elsif ps =~ /7|8|9/
# there are no candidate suffixes if the prefix contains 7, 8, or 9.
elsif ps =~ /6/
# If the prefix contains a 6, then there are 49 candidate suffixes
for i in (0..6)
print "(#{ps}#{i}0 thru #{ps}#{i}6) "
end
puts
else
# If the prefix doesn't contain 6, 7, 8, or 9, then there are only 13 candidate suffixes.
puts "(#{ps}06=6) (#{ps}16=6) (#{ps}26=6) (#{ps}36=6) (#{ps}46=6) (#{ps}56=6) (#{ps}60 thru #{ps}66) "
end
end
Which prints out the following:
(6=6) (16=6) (26=6) (36=6) (46=6) (56=6) (60 thru 66)
(106=6) (116=6) (126=6) (136=6) (146=6) (156=6) (160 thru 166)
(206=6) (216=6) (226=6) (236=6) (246=6) (256=6) (260 thru 266)
(306=6) (316=6) (326=6) (336=6) (346=6) (356=6) (360 thru 366)
(406=6) (416=6) (426=6) (436=6) (446=6) (456=6) (460 thru 466)
(506=6) (516=6) (526=6) (536=6) (546=6) (556=6) (560 thru 566)
(600 thru 606) (610 thru 616) (620 thru 626) (630 thru 636) (640 thru 646) (650 thru 656) (660 thru 666)
(1006=6) (1016=6) (1026=6) (1036=6) (1046=6) (1056=6) (1060 thru 1066)
(1106=6) (1116=6) (1126=6) (1136=6) (1146=6) (1156=6) (1160 thru 1166)
(1206=6) (1216=6) (1226=6) (1236=6) (1246=6) (1256=6) (1260 thru 1266)
(1306=6) (1316=6) (1326=6) (1336=6) (1346=6) (1356=6) (1360 thru 1366)
(1406=6) (1416=6) (1426=6) (1436=6) (1446=6) (1456=6) (1460 thru 1466)
(1506=6) (1516=6) (1526=6) (1536=6) (1546=6) (1556=6) (1560 thru 1566)
(1600 thru 1606) (1610 thru 1616) (1620 thru 1626) (1630 thru 1636) (1640 thru 1646) (1650 thru 1656) (1660 thru 1666)
etc...
Note: I don't speak Ruby, but I have done a Ruby version later, just for speed comparison :)
If you just iterate all numbers from 0 to 117648 (ruby <<< 'print "666666".to_i(7)') and print them in base-7 notation, you'll at least have discarded any numbers containing 7,8,9. This includes the optimization suggestion by MrE, apart from lifting the problem to simple int arithmetic instead of char-sequence manipulations.
All that remains is to check for the presence of at least one 6. This would make the algorithm skip at most 6 items in a row, so I deem it less important (the average number of skippable items on the total range is 40%).
Simple benchmark to 6666666666
(Note that this means outputting 222,009,073 (222M) lines of 6-y numbers)
Staying close to this idea, I wrote this quite highly optimized C code (I don't speak Ruby) to demonstrate it. I ran it to 282475248 (which is 6666666666 when written in base 7), so it was more of a benchmark to measure: 0m26.5s
#include <stdio.h>
static char buf[11];
char* const bufend = buf+10;
char* genbase7(int n)
{
char* it = bufend; int has6 = 0;
do
{
has6 |= 6 == (*--it = n%7);
n/=7;
} while(n);
return has6? it : 0;
}
void asciify(char* rawdigits)
{
do { *rawdigits += '0'; }
while (++rawdigits != bufend);
}
int main()
{
*bufend = 0; // init
long i;
for (i=6; i<=282475248; i++)
{
char* b7 = genbase7(i);
if (b7)
{
asciify(b7);
puts(b7);
}
}
}
I also benchmarked another approach, which unsurprisingly ran in less than half the time because
this version directly manipulates the results in ascii string form, ready for display
this version shortcuts the has6 flag for deeper recursion levels
this version also optimizes the 'twiddling' of the last digit when it is required to be '6'
the code is simply shorter...
Running time: 0m12.8s
#include <stdio.h>
#include <string.h>
/* static inline: a plain C99 inline function has no external definition to link against */
static inline void recursive_permute2(char* const b, char* const m, char* const e, int has6)
{
    if (m<e)
        for (*m = '0'; *m<'7'; (*m)++)
            recursive_permute2(b, m+1, e, has6 || (*m=='6'));
    else
        if (has6)
            for (*e = '0'; *e<'7'; (*e)++)
                puts(b);
        else /* optimize for last digit must be 6 */
            puts((*e='6', b));
}
static inline void recursive_permute(char* const b, char* const e)
{
    recursive_permute2(b, b, e-1, 0);
}
int main()
{
    char buf[] = "0000000000";
    recursive_permute(buf, buf+sizeof(buf)/sizeof(*buf)-1);
}
Benchmarks measured with:
gcc -O4 t6.c -o t6
time ./t6 > /dev/null
$range_start = -1
$range_end = -1
$f = File.open('numbers.txt','w')
def output_number(i)
if $range_end == i-1
$range_end = i
elsif $range_start < $range_end
$f.puts "(#{$range_start} thru #{$range_end})"
$range_start = $range_end = i
else
$f.puts "(#{$range_start}=6)" if $range_start > 0 # no range, print out previous number
$range_start = $range_end = i
end
end
'1'.upto('666') do |n|
next unless n =~ /6/ # keep only numbers that contain 6
next if n =~ /[789]/ # remove numbers that contain 7, 8 or 9
output_number n.to_i
end
if $range_start < $range_end
$f.puts "(#{$range_start} thru #{$range_end})"
end
$f.close
puts "Ruby is beautiful :)"
I came up with this piece of code, which I tried to keep more or less in FP style. It is probably not much more efficient. As has been said, basic number logic will increase performance, for example by skipping from 19xx to 2000 directly, but I will leave that up to you :)
def check(n)
n = n.to_s
n.include?('6') and
not n.include?('7') and
not n.include?('8') and
not n.include?('9')
end
def spss(ranges)
ranges.each do |range|
if range.first === range.last
puts "(" + range.first.to_s + "=6)"
else
puts "(" + range.first.to_s + " THRU " + range.last.to_s + "=6)"
end
end
end
range = (1..666666)
range = range.select { |n| check(n) }
range = range.inject([0..0]) do |ranges, n|
temp = ranges.last
if temp.last + 1 === n
ranges.pop
ranges.push(temp.first..n)
else
ranges.push(n..n)
end
end
spss(range)
My first answer was trying to be too clever. Here is a much simpler version
class MutablePrintingCandidateRange < Struct.new(:first, :last)
def to_s
if self.first == nil and self.last == nil
''
elsif self.first == self.last
"(#{self.first}=6)"
else
"(#{self.first} thru #{self.last})"
end
end
def <<(x)
if self.first == nil and self.last == nil
self.first = self.last = x
elsif self.last == x - 1
self.last = x
else
puts(self) # print the candidates
self.first = self.last = x # reset the range
end
end
end
and how to use it:
number_range = (1..666_666)
current_range = MutablePrintingCandidateRange.new
for i in number_range
  candidate = i.to_s
  if candidate =~ /6/ and candidate !~ /7|8|9/
    # number contains a 6, but not a 7, 8, or 9
    current_range << i
  end
end
puts current_range
Basic observation: If the current number is (say) 1900 you know that you can safely skip up to at least 2000...
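A minimal sketch of that skip, assuming decimal strings are an acceptable way to find the offending digit. The helper names next_without_789 and sixes_up_to are hypothetical, introduced only for this example.

```ruby
# Smallest integer >= n whose decimal digits avoid 7, 8 and 9.
def next_without_789(n)
  loop do
    s = n.to_s
    idx = s.index(/[789]/)
    return n unless idx
    # Every number sharing the prefix s[0..idx] still has the bad digit,
    # so bump the part left of it and zero everything from idx onwards.
    n = (s[0...idx].to_i + 1) * 10**(s.length - idx)
  end
end

# Collect the matching numbers up to `limit`, skipping blocked stretches.
def sixes_up_to(limit)
  found = []
  n = next_without_789(1)
  while n <= limit
    found << n if n.to_s.include?('6')
    n = next_without_789(n + 1)
  end
  found
end

puts next_without_789(1900) # => 2000
puts sixes_up_to(70).inspect
```

The loop in next_without_789 matters for cases like 6700, where the first jump lands on 7000 and a second jump is needed to reach 10000.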
(I didn't bother updating my C solution for formatting. Instead I went with x3ro's excellent ruby version and optimized that)
Undeleted:
I am still not sure whether the changed range-notation behaviour isn't actually what the OP wants: this version no longer breaks up ranges that are contiguous when counted in base 7; I wouldn't be surprised if the OP actually expected
.
....
(555536=6)
(555546=6)
(555556 THRU 666666=6)
instead of
....
(666640 THRU 666646=6)
(666650 THRU 666656=6)
(666660 THRU 666666=6)
I'll let the OP decide, and here is the modified version, which runs in 18% of the time of x3ro's version (3.2s instead of 17.0s when generating up to 6666666 (7x6)).
def check(n)
n.to_s(7).include?('6')
end
def spss(ranges)
ranges.each do |range|
if range.first === range.last
puts "(" + range.first.to_s(7) + "=6)"
else
puts "(" + range.first.to_s(7) + " THRU " + range.last.to_s(7) + "=6)"
end
end
end
range = (1..117648)
range = range.select { |n| check(n) }
range = range.inject([0..0]) do |ranges, n|
temp = ranges.last
if temp.last + 1 === n
ranges.pop
ranges.push(temp.first..n)
else
ranges.push(n..n)
end
end
spss(range)
My answer below is not complete, but just to show a path (I might come back and continue the answer):
There are only two cases:
1) Every digit besides the lowest one is either absent or not 6
6, 16, ...
2) At least one digit besides the lowest one is 6
60--66, 160--166, 600--606, ...
Cases in (1) do not include any consecutive numbers because they all have 6 in the lowest digit and differ from one another. Cases in (2) all appear as continuous ranges where the lowest digit runs from 0 to 6. Any single range in (2) is not contiguous with another one in (2) or with anything from (1), because the number one less than xxxxx0 is xxxxy9 and the number one more than xxxxx6 is xxxxx7, and both are excluded.
Therefore, the question reduces to the following:
3)
Get all strings between "" to "66666" that do not include "6"
For each of them ("xxx"), output the string "(xxx6=6)"
4)
Get all strings between "" to "66666" that include at least one "6"
For each of them ("xxx"), output the string "(xxx0 THRU xxx6=6)"
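Steps 3) and 4) can be sketched in a few lines if the prefixes are generated as base-7 numerals, which by construction never contain 7, 8 or 9. This is only an illustration of the reduction above; spss_clauses and its prefix_digits parameter are hypothetical names, and prefix_digits = 5 corresponds to the original limit of 666666.

```ruby
# Build the SPSS clauses for every number whose prefix (all digits except
# the lowest) has at most `prefix_digits` digits.
def spss_clauses(prefix_digits)
  clauses = ['(6=6)'] # empty prefix: the single number 6
  last_prefix = ('6' * prefix_digits).to_i(7)
  (1..last_prefix).each do |i|
    prefix = i.to_s(7) # base-7 digits are 0..6, so no 7, 8 or 9 can appear
    clauses << if prefix.include?('6')
                 "(#{prefix}0 THRU #{prefix}6=6)" # case 4: a full 0..6 run
               else
                 "(#{prefix}6=6)"                 # case 3: a lone number
               end
  end
  clauses
end

puts spss_clauses(1)
```

No number is ever tested and rejected here: every loop iteration produces exactly one clause of output.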
The killer here is
numbers = (1..666666).to_a
Range supports iteration, so you would be better off going over the whole range and accumulating the numbers that match into blocks. When one block is finished and supplanted by another, you can write it out.
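A minimal sketch of that streaming approach, assuming the same "contains 6, no 7, 8 or 9" test as in the question. The method name each_run is hypothetical; it walks the range once and hands each finished block to the caller, so nothing the size of the range is ever held in memory.

```ruby
# Yield [first, last] for each run of consecutive matching numbers,
# walking the range once without materialising it as an array.
def each_run(range)
  return enum_for(:each_run, range) unless block_given?

  first = last = nil
  range.each do |n|
    s = n.to_s
    next unless s.include?('6') && s !~ /[789]/

    if last && n == last + 1
      last = n # extend the current run
    else
      yield first, last if first # the previous run is finished
      first = last = n
    end
  end
  yield first, last if first # flush the final run
end

each_run(1..70) do |first, last|
  puts first == last ? "(#{first}=6)" : "(#{first} THRU #{last}=6)"
end
```

Writing each line to the file inside the block, rather than building one large string first, keeps memory use flat as the range grows.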

How do I optimize this bit of ruby code to go faster?

It's an implementation of Sieve of Eratosthenes.
class PrimeGenerator
def self.get_primes_between( x, y)
sieve_array = Array.new(y) {|index|
(index == 0 ? 0 : index+1)
}
position_when_we_can_stop_checking = Math.sqrt(y).to_i
(2..position_when_we_can_stop_checking).each{|factor|
sieve_array[(factor).. (y-1)].each{|number|
sieve_array[number-1] = 0 if isMultipleOf(number, factor)
}
}
sieve_array.select{|element|
( (element != 0) && ( (x..y).include? element) )
}
end
def self.isMultipleOf(x, y)
return (x % y) == 0
end
end
Now I did this for a 'submit solutions to problems since you have time to kill' site. I chose Ruby as my implementation language; however, my submission timed out.
I did some benchmarking
require 'benchmark'
Benchmark.bmbm do |x|
x.report ("get primes") { PrimeGenerator.get_primes_between(10000, 100000)}
end
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]
L:\Gishu\Ruby>ruby prime_generator.rb
Rehearsal ----------------------------------------------
get primes 33.953000 0.047000 34.000000 ( 34.343750)
------------------------------------ total: 34.000000sec
user system total real
get primes 33.735000 0.000000 33.735000 ( 33.843750)
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32]
Rehearsal ----------------------------------------------
get primes 65.922000 0.000000 65.922000 ( 66.110000)
------------------------------------ total: 65.922000sec
user system total real
get primes 67.359000 0.016000 67.375000 ( 67.656000)
So I redid the thing in C# 2.0 / VS 2008 -->
722 milliseconds
So now this makes me wonder: is it a problem with my implementation, or is the performance difference between languages really this wide? (I was amazed with the 1.9 Ruby VM... until I had to go compare it with C# :)
UPDATE:
Turned out to be my "put-eratosthenes-to-shame-adaptation" after all :) Eliminating unnecessary loop iterations was the major optimization. In case anyone is interested in the details.. you can read it here; this question is too long anyways.
I'd start by looking at your inner loop. sieve_array[(factor).. (y-1)] is going to create a new array each time it's executed. Instead, try replacing it with a normal indexing loop.
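To illustrate the point about slicing, here is a small sketch with a made-up ten-element sieve_array. It shows that range indexing hands back a separate Array, and what a plain index loop over the same elements looks like.

```ruby
sieve_array = [0, 2, 3, 4, 5, 6, 7, 8, 9, 10]
factor = 2

# Range indexing returns a fresh Array; writing to it leaves the original alone.
slice = sieve_array[factor..(sieve_array.length - 1)]
slice[0] = :changed
puts sieve_array[factor] # => 3

# An index loop reads the same elements without building a copy.
visited = []
index = factor
while index < sieve_array.length
  visited << sieve_array[index]
  index += 1
end
puts visited.inspect
```

In the original method that copy is made once per factor, so for y = 100000 the code builds a few hundred arrays of nearly 100000 elements each.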
Obviously each computer is going to benchmark this differently, but I was able to make this run approximately 50x faster on my machine (Ruby 1.8.6) by replacing the each block over the array with index loops, and by making the inner loop check fewer numbers.
factor = 2
while factor <= position_when_we_can_stop_checking
  number = factor * 2 # start at the first multiple above factor so factor itself is kept
  while number <= y
    sieve_array[number - 1] = 0 # every number visited here is a multiple of factor
    number = number + factor # Was incrementing by 1, causing too many checks
  end
  factor = factor + 1
end
I don't know how it compares for speed, but this is a fairly small and simple SoE implementation that works fine for me:
def sieve_to(n)
  s = (0..n).to_a
  s[0]=s[1]=nil
  s.each do |p|
    next unless p
    break if p * p > n
    (p*p).step(n, p) { |m| s[m] = nil }
  end
  s.compact
end
There are a few further little speedups possible, but I think it's pretty good.
They're not exactly equivalent, so your 10_000 to 1_000_000 would equate to
sieve_to(1_000_000) - sieve_to(9_999)
or something closely approximate.
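As a sketch of what "closely approximate" could look like, the wrapper below gives sieve_to the same between-two-bounds interface as the question's method. primes_between is a hypothetical name; it filters in one pass instead of subtracting a second sieve.

```ruby
# sieve_to as given in the answer above: all primes <= n.
def sieve_to(n)
  s = (0..n).to_a
  s[0] = s[1] = nil
  s.each do |p|
    next unless p
    break if p * p > n
    (p * p).step(n, p) { |m| s[m] = nil }
  end
  s.compact
end

# Hypothetical wrapper: primes p with x <= p <= y.
def primes_between(x, y)
  sieve_to(y).select { |p| p >= x }
end

puts primes_between(10, 30).inspect # => [11, 13, 17, 19, 23, 29]
```

Both bounds are inclusive here, which matches the (x..y).include? test in the question's code.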
Anyhow, on WinXP, with Ruby 1.8.6 (and fairly hefty Xeon CPUs) I get this:
require 'benchmark'
Benchmark.bm(30) do |r|
r.report("Mike") { a = sieve_to(10_000) - sieve_to(1_000) }
r.report("Gishu") { a = PrimeGenerator.get_primes_between( 1_000, 10_000) }
end
which gives
user system total real
Mike 0.016000 0.000000 0.016000 ( 0.016000)
Gishu 1.641000 0.000000 1.641000 ( 1.672000)
(I stopped running the one million case because I got bored waiting).
So I'd say it was your algorithm. ;-)
The C# solution is pretty much guaranteed to be orders of magnitude faster though.
The Sieve of Eratosthenes works fine as an illustrative way to find primes, but I would implement it a little bit differently. The essence is that you don't have to check numbers which are multiples of already known primes. Now, instead of using an array to store this information, you can also create a list of all sequential primes up to the square root of the number you are checking, and then it suffices to go through that list to test the number for primality.
If you think of it, this does what you do on the image, but in a more "virtual" way.
Edit: Quickly hacked implementation of what I mean (not copied from the web ;) ):
public class Sieve {
private readonly List<int> primes = new List<int>();
private int maxProcessed;
public Sieve() {
primes.Add(maxProcessed = 2); // one could add more to speed things up a little, but one is required
}
public bool IsPrime(int i) {
// first check if we can compare against known primes
if (i <= primes[primes.Count-1]) {
return primes.BinarySearch(i) >= 0;
}
// if not, make sure that we got all primes up to the square root of i
int maxFactor = (int)Math.Sqrt(i);
while (maxProcessed < maxFactor) {
maxProcessed++;
bool isPrime = true;
for (int primeIndex = 0; primeIndex < primes.Count; primeIndex++) {
int prime = primes[primeIndex];
if (maxProcessed % prime == 0) {
isPrime = false;
break;
}
}
if (isPrime) {
primes.Add(maxProcessed);
}
}
// now apply the sieve to the number to check
foreach (int prime in primes) {
if (i % prime == 0) {
return false;
}
if (prime > maxFactor) {
break;
}
}
return true;
}
}
Uses about 67ms on my slow machine.... test app:
class Program {
static void Main(string[] args) {
Stopwatch sw = new Stopwatch();
sw.Start();
Sieve sieve = new Sieve();
for (int i = 10000; i <= 100000; i++) {
sieve.IsPrime(i);
}
sw.Stop();
Debug.WriteLine(sw.ElapsedMilliseconds);
}
}
Profile it with ruby-prof. It can produce output that tools like kcachegrind can read, so you can see where your code is slow.
Then once you make the ruby fast, use RubyInline to optimize the method for you.
I would also note that Ruby, in my experience, is a lot slower on Windows systems than on *nix. I'm not sure what speed processor you have, of course, but running this code on my Ubuntu box in Ruby 1.9 took around 10 seconds, and 1.8.6 took 30.
