A loop with many iterations makes execution incredibly slow

A loop with many iterations makes execution incredibly slow - ruby

Execution time for this code is around 1 second:
start_time = Time.now
prev = 1
(1..1000).each do |i|
(1..10000).each do |j|
result = j * prev
result = result + prev
result = result - prev
result = result / prev
prev = j
end
end
end_time = Time.now
printf('%f sec', end_time - start_time)
But when I use one loop with 10000000 iterations (instead of 2 loops, 1000 and 10000 iterations as written above), it becames much slower (around 4.5 seconds):
start_time = Time.now
prev = 1
(1..10000000).each do |j|
result = j * prev
result = result + prev
result = result - prev
result = result / prev
prev = j
end
end_time = Time.now
printf('%f sec', end_time - start_time)
Why is it happening? The total iterations number is still the same.

The second example processes much larger numbers than the first one (as #Sergii K commented above). It is possible that the second sample code reaches the maximum Fixnum limit on your system. On a 32-bit system, the maximum signed integer is 2**(32-1) - 1 = 2147483647 which is much less than the maximum product j * prev in the second example (as opposed to max products in the first example). In a situation like this ruby has to convert the Fixnums to Bignums internally which is why the second sample code may be slower than the first one.
On a 64-bit system I would expect both samples to run approximately the same time because the biggest integers will never reach the Fixnum limit. That is why perhaps most other commenters did not see a big difference in the timings.
Update: if the max Fixnum number is only 1073741823, as commented by the OP above, then it must mean that while the OS itself is 64-bits and perhaps the installed ruby is also a 64-bit ruby, it still uses only 4 bytes to store Fixnum numbers (instead of 8 in truly 64-bit rubies). The max integer value is much less then needed in the second example so it indeed has to convert the higher numbers to Bignums and that is where the slowness of the second sample comes from.
You can check this yourself if you compare:
(2**(0.size * 8 -2) -1).class # => Fixnum vs:
(2**(0.size * 8 -2) -1 + 1).class # => should be Bignum

Related

Fastest way to generate a kmer count vector from a nucleotide sequence (Julia)

Given a nucleotide sequence, I'm writing some Julia code to generate a sparse vector of (masked) kmer counts, and I would like it to run as fast as possible.
Here is my current implementation,
using Distributions
using SparseArrays
function kmer_profile(seq, k, mask)
basis = [4^i for i in (k - 1):-1:0]
d = Dict('A'=>0, 'C'=>1, 'G'=>2, 'T'=>3)
kmer_dict = Dict{Int, Int32}(4^k=>0)
for n in 1:(length(seq) - length(mask) + 1)
kmer_hash = 1
j = 1
for i in 1:length(mask)
if mask[i]
kmer_hash += d[seq[n+i-1]] * basis[j]
j += 1
end
end
haskey(kmer_dict, kmer_hash) ? kmer_dict[kmer_hash] += 1 : kmer_dict[kmer_hash] = 1
end
return sparsevec(kmer_dict)
end
seq = join(sample(['A','C','G','T'], 1000000))
mask_str = "111111011111001111111111111110"
mask = BitArray([parse(Bool, string(m)) for m in split(mask_str, "")])
k = sum(mask)
#time kmer_profile(seq, k, mask)
This code runs in about 0.3 seconds on my M1 MacBook Pro, is there any way to make it run significantly faster?
The function kmer_profile uses a sliding window of size length(mask) to count the number of times each masked kmer appears in the nucleotide sequence. A mask is a binary sequence, and a masked kmer is a kmer with nucleotides dropped at positions at which the mask is zero. E.g. the kmer ACGT and mask 1001 will produce the masked kmer AT.
To produce the kmer hash, the function treats each kmer as a base 4 number and then converts it to a (base 10) 64-bit integer, for indexing into the kmer vector.
The size of k is equal to the number of ones in the mask string, and is implicitly limited to 31 so that kmer hashes can fit into a 64-bit integer type.

There are several possible optimizations to make this code faster.
First of all, one can convert the Dict to an array since array-based indexing is faster than dictionary-based indexing one and this is possible here since the key is an ASCII character.
Moreover, the extraction of the sequence codes can be done once instead of length(mask) times by pre-computing code and putting the result in a temporary array.
Additionally, the mask-based conditional and the loop carried dependency make things slow. Indeed, the condition cannot be (easily) predicted by the processor causing it to stall for several cycles. The loop carried dependency make things even worse since the processor can hardly execute other instructions during this stall. This problem can be solved by pre-computing the factors based on both mask and basis. The result is a faster branch-less loop.
Once the above optimizations are done, the biggest bottleneck is sparsevec. In fact, it was also taking nearly half the time of the initial implementation! Optimizing this step is difficult but not impossible. It is slow because of random accesses in the Julia implementation. One can speed this up by sorting the keys-values pairs in the first place. It is faster due to a more cache-friendly execution and it can also help the prediction unit of the processor. This is a complex topic. For more details about how this works, please read Why is processing a sorted array faster than processing an unsorted array?.
Here is the final optimized code:
function kmer_profile_opt(seq, k, mask)
basis = [4^i for i in (k - 1):-1:0]
d = zeros(Int8, 128)
d[Int64('A')] = 0
d[Int64('C')] = 1
d[Int64('G')] = 2
d[Int64('T')] = 3
seq_codes = [d[Int8(e)] for e in seq]
j = 1
premult = zeros(Int64, length(mask))
for i in 1:length(mask)
if mask[i]
premult[i] = basis[j]
j += 1
end
end
kmer_dict = Dict{Int, Int32}(4^k=>0)
for n in 1:(length(seq) - length(mask) + 1)
kmer_hash = 1
j = 1
for i in 1:length(mask)
kmer_hash += seq_codes[n+i-1] * premult[i]
end
haskey(kmer_dict, kmer_hash) ? kmer_dict[kmer_hash] += 1 : kmer_dict[kmer_hash] = 1
end
sorted_kmer_pairs = sort(collect(kmer_dict))
sorted_kmer_keys = [e[1] for e in sorted_kmer_pairs]
sorted_kmer_values = [e[2] for e in sorted_kmer_pairs]
return sparsevec(sorted_kmer_keys, sorted_kmer_values)
end
This code is a bit more than twice faster than the initial implementation on my machine. A significant fraction of the time is still spent in the sorting algorithm.
The code can still be optimized further. One way is to use a parallel sort algorithm. Another way is to replace the premult[i] multiplication by a shift which is faster assuming premult[i] is modified so to contain exponents. I expect the code to be about 4 times faster than the original code. The main bottleneck should be the big dictionary creation. Improving further the performance of this is very hard (though it is still possible).

Inspired by Jérôme's answer, and squeezing some more by avoiding Dicts altogether:
function kmer_profile_opt3a(seq, k, mask)
d = zeros(Int8, 128)
d[Int64('A')] = 0
d[Int64('C')] = 1
d[Int64('G')] = 2
d[Int64('T')] = 3
seq_codes = [d[Int8(e)] for e in seq]
basis = [4^i for i in (k-1):-1:0]
j = 1
premult = zeros(Int64, length(mask))
for i in 1:length(mask)
if mask[i]
premult[i] = basis[j]
j += 1
end
end
kmer_vec = Vector{Int}(undef, length(seq)-length(mask)+1)
#inbounds for n in 1:(length(seq) - length(mask) + 1)
kmer_hash = 1
for i in 1:length(mask)
kmer_hash += seq_codes[n+i-1] * premult[i]
end
kmer_vec[n] = kmer_hash
end
sort!(kmer_vec)
return sparsevec(kmer_vec, ones(length(kmer_vec)), 4^k, +)
end
This achieved another 2x over Jérôme's answer on my machine.
The auto-combining feature of sparsevec makes the code a bit more compact.
Trying to slim the code further, and avoid unnecessary allocations in sparse vector creation, the following can be used:
using SparseArrays, LinearAlgebra
function specialsparsevec(nzs, n)
vals = Vector{Int}(undef, length(nzs))
j, k, count, last = (1, 1, 0, nzs[1])
while k <= length(nzs)
if nzs[k] == last
count += 1
else
vals[j], nzs[j] = (count, last)
count, last = (1, nzs[k])
j += 1
end
k += 1
end
vals[j], nzs[j] = (count, last)
resize!(nzs, j)
resize!(vals, j)
return SparseVector(n, nzs, vals)
end
function kmer_profile_opt3(seq, k, mask)
d = zeros(Int8, 128)
foreach(((i,c),) -> d[Int(c)]=i-1, enumerate(collect("ACGT")))
seq_codes = getindex.(Ref(d), Int8.(collect(seq)))
premult = foldr(
(i,(p,j))->(mask[i] && (p[i]=j ; j<<=2) ; (p,j)),
1:length(mask); init=(zeros(Int64,length(mask)),1)) |> first
kmer_vec = sort(
[ dot(#view(seq_codes[n:n+length(mask)-1]),premult) + 1 for
n in 1:(length(seq)-length(mask)+1)
])
return specialsparsevec(kmer_vec, 4^k)
end
This last version gets another 10% speedup (but is a little cryptic):
julia> #btime kmer_profile_opt($seq, $k, $mask);
367.584 ms (81 allocations: 134.71 MiB) # other answer
julia> #btime kmer_profile_opt3a($seq, $k, $mask);
140.882 ms (22 allocations: 54.36 MiB) # 1st this answer
julia> #btime kmer_profile_opt3($seq, $k, $mask);
127.016 ms (14 allocations: 27.66 MiB) # 2nd this answer

A faster alternative to all(a(:,i)==a,1) in MATLAB

It is a straightforward question: Is there a faster alternative to all(a(:,i)==a,1) in MATLAB?
I'm thinking of a implementation that benefits from short-circuit evaluations in the whole process. I mean, all() definitely benefits from short-circuit evaluations but a(:,i)==a doesn't.
I tried the following code,
% example for the input matrix
m = 3; % m and n aren't necessarily equal to those values.
n = 5000; % It's only possible to know in advance that 'm' << 'n'.
a = randi([0,5],m,n); % the maximum value of 'a' isn't necessarily equal to
% 5 but it's possible to state that every element in
% 'a' is a positive integer.
% all, equal solution
tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
%%%%%%%%%% all and equal solution %%%%%%%%%
ax_boo = all(a(:,i)==a,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc
% alternative solution
tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
%%%%%%%%%%% alternative solution %%%%%%%%%%%
ax_boo = a(1,i) == a(1,:);
for k = 2:m
ax_boo(ax_boo) = a(k,i) == a(k,ax_boo);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc
but it's intuitive that any "for-loop-solution" within the MATLAB environment will be naturally slower. I'm wondering if there is a MATLAB built-in function written in a faster language.
EDIT:
After running more tests I found out that the implicit expansion does have a performance impact in evaluating a(:,i)==a. If the matrix a has more than one row, all(repmat(a(:,i),[1,n])==a,1) may be faster than all(a(:,i)==a,1) depending on the number of columns (n). For n=5000 repmat explicit expansion has proved to be faster.
But I think that a generalization of Kenneth Boyd's answer is the "ultimate solution" if all elements of a are positive integers. Instead of dealing with a (m x n matrix) in its original form, I will store and deal with adec (1 x n matrix):
exps = ((0):(m-1)).';
base = max(a,[],[1,2]) + 1;
adec = sum( a .* base.^exps , 1 );
In other words, each column will be encoded to one integer. And of course adec(i)==adec is faster than all(a(:,i)==a,1).
EDIT 2:
I forgot to mention that adec approach has a functional limitation. At best, storing adec as uint64, the following inequality must hold base^m < 2^64 + 1.

Since your goal is to count the number of columns that match, my example converts the binary encoding to integer decimals, then you just loop over the possible values (with 3 rows that are 8 possible values) and count the number of matches.
a_dec = 2.^(0:(m-1)) * a;
num_poss_values = 2 ^ m;
num_matches = zeros(num_poss_values, 1);
for i = 1:num_poss_values
num_matches(i) = sum(a_dec == (i - 1));
end
On my computer, using 2020a, Here are the execution times for your first 2 options and the code above:
Elapsed time is 0.246623 seconds.
Elapsed time is 0.553173 seconds.
Elapsed time is 0.000289 seconds.
So my code is 853 times faster!
I wrote my code so it will work with m being an arbitrary integer.
The num_matches variable contains the number of columns that add up to 0, 1, 2, ...7 when converted to a decimal.

As an alternative you can use the third output of unique:
[~, ~, iu] = unique(a.', 'rows');
for i = 1:n
ax_boo = iu(i) == iu;
end
As indicated in a comment:
ax_boo isolates the indices of the columns I have to sum in a row vector b. So, basically the next line would be something like c = sum(b(ax_boo),2);
It is a typical usage of accumarray:
[~, ~, iu] = unique(a.', 'rows');
C = accumarray(iu,b);
for i = 1:n
c = C(i);
end

Schedule tasks in smallest interval: Algorithm

Given a char array representing tasks CPU need to do. It contains
capital letters A to Z where different letters represent different
tasks.Tasks could be done without original order. Each task could be
done in one interval. For each interval, CPU could finish one task or
just be idle.
However, there is a non-negative cooling interval n that means between
two same tasks, there must be at least n intervals that CPU are doing
different tasks or just be idle.
You need to return the least number of intervals the CPU will take to
finish all the given tasks. (source: https://leetcode.com/problems/task-scheduler/#/description)
My code:
def least_interval(tasks, n)
calculate_smallest_interval(tasks, n, 0, 0)
end
def calculate_smallest_interval(tasks, n, index, occupied_task_count, cool_down = Hash.new { |h, k| h[k] = 0})
return 0 if tasks.empty?
index = 0 if index == tasks.length && !tasks.empty? # reset to the front
cool_down.each_key do |k|
cool_down[k] -= 1
occupied_task_count -= 1 if cool_down[k] < 0
end
curr_task = tasks[index]
if cool_down[curr_task] > 0
calculate_smallest_interval(tasks, n, index + 1, occupied_task_count, cool_down)
elsif occupied_task_count == tasks.length
# idle period
# don't increment index here because reconsider curr element
1 + calculate_smallest_interval(tasks, n, index, occupied_task_count, cool_down)
else
cool_down[curr_task] = n # set the cooldown timer
occupied_task_count = 1 # increment number of tasks in hash
1 + calculate_smallest_interval(tasks[0...index] + tasks[index + 1..-1], n, index, occupied_task_count, cool_down)
end
end
Caveat:
I understand now that the insight of this problem is that you always want to take the task that has the highest remaining number of instances pending & isn't cooling down to get the best result. However, I'm still interested in understanding where my code is wrong otherwise in approach (since my output is far smaller than the optimal result).
Explanation:
I have a hash table that keeps track of all relevant tasks cooling down. I also have an index variable that increments to keep track of current task -- if current task is available, keep it.
For a large tasks array and high n value, my output is significantly lower than the optimal output (so I know I'm wrong). Here's a smaller but demonstrative input/output combo:
IN: tasks = ['A','A','A','B','B','B'], n = 2
Correct output: 8
Mine: 7
Could someone tell me where they see a logical error in reasoning?

Project Euler number 35 efficiency

https://projecteuler.net/problem=35
All problems on Project Euler are supposed to be solvable by a program in under 1 minute. My solution, however, has a runtime of almost 3 minutes. Other solutions I've seen online are similar to mine conceptually, but have runtimes that are exponentially faster. Can anyone help make my code more efficient/run faster?
Thanks!
#genPrimes takes an argument n and returns a list of all prime numbers less than n
def genPrimes(n):
primeList = [2]
number = 3
while(number < n):
isPrime = True
for element in primeList:
if element > number**0.5:
break
if number%element == 0 and element <= number**0.5:
isPrime = False
break
if isPrime == True:
primeList.append(number)
number += 2
return primeList
#isCircular takes a number as input and returns True if all rotations of that number are prime
def isCircular(prime):
original = prime
isCircular = True
prime = int(str(prime)[-1] + str(prime)[:len(str(prime)) - 1])
while(prime != original):
if prime not in primeList:
isCircular = False
break
prime = int(str(prime)[-1] + str(prime)[:len(str(prime)) - 1])
return isCircular
primeList = genPrimes(1000000)
circCount = 0
for prime in primeList:
if isCircular(prime):
circCount += 1
print circCount

Two modifications of your code yield a pretty fast solution (roughly 2 seconds on my machine):
Generating primes is a common problem with many solutions on the web. I replaced yours with rwh_primes1 from this article:
def genPrimes(n):
sieve = [True] * (n/2)
for i in xrange(3,int(n**0.5)+1,2):
if sieve[i/2]:
sieve[i*i/2::i] = [False] * ((n-i*i-1)/(2*i)+1)
return [2] + [2*i+1 for i in xrange(1,n/2) if sieve[i]]
It is about 65 times faster (0.04 seconds).
The most important step I'd suggest, however, is to filter the list of generated primes. Since each circularly shifted version of an integer has to be prime, the circular prime must not contain certain digits. The prime 23, e.g., can be easily spotted as an invalid candidate, because it contains a 2, which indicates divisibility by two when this is the last digit. Thus you might remove all such bad candidates by the following simple method:
def filterPrimes(primeList):
for i in primeList[3:]:
if '0' in str(i) or '2' in str(i) or '4' in str(i) \
or '5' in str(i) or '6' in str(i) or '8' in str(i):
primeList.remove(i)
return primeList
Note that the loop starts at the fourth prime number to avoid removing the number 2 or 5.
The filtering step takes most of the computing time (about 1.9 seconds), but reduces the number of circular prime candidates dramatically from 78498 to 1113 (= 98.5 % reduction)!
The last step, the circulation of each remaining candidate, can be done as you suggested. If you wish, you can simplify the code as follows:
circCount = sum(map(isCircular, primeList))
Due to the reduced candidate set this step is completed in only 0.03 seconds.

Difference in times is returning a negative number

I have this trial timer code to time euler solutions in Ruby.
$RUNS = 12
def run(solve)
times = []
$RUNS.times do
start_t = Time.now.usec
solve.call
end_t = Time.now.usec
times << (end_t - start_t)/1000.0
end
#times = times.delete_if {|i| i < 0}
puts times.inspect
times.sort
mean = times.inject{|a,c| a+c} / $RUNS
puts("Mean:\t#{mean}");
if (times.length % 2 == 0) then
median = (times[times.length / 2 - 1] + times[times.length / 2]) / 2.0
else
median = times[times.length / 2];
end
puts("Median: #{median}");
end
Unfortunately, I keep getting answers like this:
[409.805, 418.16, -582.23, 402.223, -581.94, 413.196, 426.816, -584.732, 519.457, -569.557, 558.918, -579.176]
What can I do to avoid these strange negative numbers?

usec returns the microseconds from the time in the same was as month returns the month. It is not the number of microseconds for the given time since the epoch.
So if start_t was 1049896564.259970 seconds and end_t was 1049896592.123130 seconds then you would get 123130 - 259970 if you subtracted the usecs. i.e. a negative number.
Instead you could use Time.now.to_f to convert to floating point number of seconds since epoch and subtract those from each other. You can also just subtract one Time object from another directly e.g.
start_t = Time.now
solve.call
end_t = Time.now
times << end_t - start_t

Current time in seconds since the Epoch:
Time.now.to_f
=> 1278631398.143
That should have microsecond resolution, despite only three decimal places being shown here.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

A loop with many iterations makes execution incredibly slow - ruby

Related

Fastest way to generate a kmer count vector from a nucleotide sequence (Julia)

A faster alternative to all(a(:,i)==a,1) in MATLAB

Schedule tasks in smallest interval: Algorithm

Project Euler number 35 efficiency

Difference in times is returning a negative number

Categories

Resources