convert 1mio Perl bitvector to .5 mio integer array - performance

I have a bitvector of length 1 million coming from the database, in which every two bits represent one integer (for compressed storage). For example:
the bit string 10110001 from the database
must become the array 2 3 0 1 for further processing.
The current solution is:
use Bit::Vector;

my $bitstring = $sth->fetchrow_array();    # 2 bits per SNP; needs converting to integers
my $snp_no = 1000000;
my $bitArray = [];
my $j = 0;
for ( my $i = 0; $i <= $snp_no - 1; $i++ ) {
    my $A2 = substr( $bitstring, $j, 2 );
    $j = $j + 2;
    my $vec = Bit::Vector->new_Bin( 32, $A2 );
    $bitArray->[$i] = $vec->to_Dec();
}
This does work, but it is way too slow: processing one such vector takes a second,
and with thousands of them the processing will take hours.
Does anyone have an idea how this can be made faster?

If you start with the data "packed", use the following:
my @decode =
   map [
      ($_ >> 6) & 3,
      ($_ >> 4) & 3,
      ($_ >> 2) & 3,
      ($_ >> 0) & 3,
   ],
   0x00..0xFF;

my @nums = map @{ $decode[$_] }, unpack 'C*', $bytes;
For me, this takes roughly 1.1 s for 1,000,000 bytes, that is, about 1.1 microseconds per byte.
A specialized pure C solution takes about half the time.
use Inline C => <<'__EOI__';
void decode(AV* av, SV* sv) {
STRLEN len;
U8* p = (U8*)SvPVbyte(sv, len);
av_fill(av, len*4);
av_clear(av);
while (len--) {
av_push(av, newSViv(*p >> 6 ));
av_push(av, newSViv(*p >> 4 & 3));
av_push(av, newSViv(*p >> 2 & 3));
av_push(av, newSViv(*(p++) & 3));
}
}
__EOI__
decode(\my @nums, $bytes);
If you start with the binary representation of the bits, use the following first:
my $bytes = pack('B*', $bits);
(This assumes the number of bits is divisible by 8. If it isn't, left-pad with zeroes, and don't forget to remove the extra leading entries this creates in @nums.)
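For comparison, here is a minimal Python sketch of the same lookup-table idea. The names DECODE, decode_bytes and decode_bits are hypothetical and not part of the answer above. It assumes the input bit string has an even length (2 bits per value), left-pads it to whole bytes as described, and drops the leading values the padding creates.

```python
# 256-entry lookup table: each byte value maps to its four 2-bit fields, high bits first
DECODE = [((b >> 6) & 3, (b >> 4) & 3, (b >> 2) & 3, b & 3) for b in range(256)]


def decode_bytes(data: bytes) -> list[int]:
    """Expand every byte into four integers in the range 0..3."""
    nums: list[int] = []
    for byte in data:
        nums.extend(DECODE[byte])
    return nums


def decode_bits(bits: str) -> list[int]:
    """Decode a '0'/'1' string holding 2 bits per value (length assumed even)."""
    pad = (-len(bits)) % 8            # zero bits needed to reach a whole byte
    padded = "0" * pad + bits
    if not padded:
        return []
    data = int(padded, 2).to_bytes(len(padded) // 8, "big")
    return decode_bytes(data)[pad // 2:]  # drop the values created by the padding


print(decode_bits("10110001"))  # [2, 3, 0, 1]
```

The table is built once, so the per-byte work is a single list lookup, which is the same trade-off the Perl @decode table makes.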

Is this any faster?
#!/usr/bin/env perl
use warnings;
use strict;
my %bin2dec = (
'0' => 0,
'1' => 1,
'00' => 0,
'01' => 1,
'10' => 2,
'11' => 3
);
#warn "$_ => $bin2dec{$_}\n" for sort keys %bin2dec;
my @results;
while (<>)
{
    foreach my $bitstring (/([01]+)/g)
    {
        my @result;
        #warn "bitstring is $bitstring\n";
        for ( my $i = 0 ; $i < length($bitstring) ; $i += 2 )
        {
            #warn "value is ", substr( $bitstring, $i, 2 ), "\n";
            push( @result, $bin2dec{ substr( $bitstring, $i, 2 ) } );
        }
        push( @results, \@result );
    }
}
foreach my $result (@results)
{
    print join( ' ', @$result ), "\n";
}
Saved to a file named b2dec. Example output:
$ echo 10010101010010010101001111001011010101w00101010 | b2dec
2 1 1 1 1 0 2 1 1 1 0 3 3 0 2 3 1 1 1
0 2 2 2
$ b2dec b2dec
0
0
1
1
0
0
1
1
2
3
1
0

Related

Numerical sequence of 1 2 4

I need help in providing an algorithm for a numerical sequence which should display a series of 1 2 4 and its consecutive summations.
e.g. If my input value is 20, it should display
1 2 4 8 9 11 15 16 18
Wherein
1 = 1
2 = 1 + 1
4 = 2 + 2
8 = 4 + 4
Then the additions of 1, 2 and 4 repeat, starting from the current number (8), and so on:
9 = 8 + 1
11 = 9 + 2
15 = 11 + 4
16 = 15 + 1
18 = 16 + 2
As you can see, it should not proceed to 22 (18 + 4) since our sample input value is 20. I hope you guys get my point. I'm having trouble designing the algorithm in the for loop. What I have now, which is not working, is:
$input = 20;
for ($i = $i; $i < $input; $i = $i+$i) {
if($i==0){
$i = 4;
$i = $i - 3;
}elseif($i % 4 == 0){
$i = $i + 1;
}
print_r("this is \$i = $i<br><br>");
}
NOTE: Only one variable and one for loop are allowed; the solution will not be accepted if it uses functions or arrays. Please help me, this is one of the most difficult problems I've encountered in PHP.
You can use this code:
$input = 20;
$current = 1;
$val = 1;
while($val < $input){
print_r("this is \$val = $val\n");
$val = $val + $current;
$current = ($current == 4 ? 1 : $current*2);
}
Since you mentioned that only one variable and one for loop are allowed, try this:
$input = 20;
for ($i = 1; $i < $input; $i) {
if($i>$input) break;
print_r("this is \$i = $i<br><br>");
$i=$i+1;
if($i>$input) break;
print_r("this is \$i = $i<br><br>");
$i=$i+2;
if($i>$input) break;
print_r("this is \$i = $i<br><br>");
$i=$i+4;
}
def getSeq(n):
    if n == 1:
        return [1]
    temp = [1]
    seq = [1, 2, 4]
    count, current, prev = 0, 0, 1
    while True:
        current = prev + seq[count]
        if current > n:
            break
        prev = current
        temp += [current]
        count = (count + 1) % 3
    return temp

print(getSeq(20))
I'm fairly sure this one works.
The only special case is n == 1, for which we return the static result [1].
In all other cases, the increments 1, 2, 4 repeat cyclically and each one is added to the previous value.
This Python solution should be implementable in any reasonable language:
limit = 20
n = 1 << 2
while n >> 2 < limit:
    print(n >> 2)
    n = (((n >> 2) + (2 ** (n & 3))) << 2) + ((n & 3) + 1) % 3
Perl Equivalent (using the style of for loop you expect):
$limit = 20;
for ($n = 1 << 2; $n >> 2 < $limit; $n = ((($n >> 2) + (2 ** ($n & 3))) << 2) + (($n & 3) + 1) % 3) {
print($n >> 2, "\n");
}
OUTPUT
1
2
4
8
9
11
15
16
18
EXPLANATION
The basic solution is this:
limit = 20
n = 1
i = 0
while n < limit:
    print(n)
    n = n + (2 ** i)
    i = (i + 1) % 3
But we need to eliminate the extra variable i. Since i only cycles through 0, 1 and 2 we can store it in two bits. So we shift n up two bits and store the value for i in the lower two bits of n, adjusting the code accordingly.
Not only one variable and one for loop, no if statements either!
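To see the bit-packing at work, the hypothetical helper below (a Python sketch, not part of the answer) records each packed state n together with the value n >> 2 and the cycle counter n & 3 that it encodes.

```python
def packed_trace(limit):
    """Return (n, value, i) for each step of the single-variable loop,
    where value = n >> 2 and the cycle counter i = n & 3."""
    trace = []
    n = 1 << 2  # value 1, counter 0
    while n >> 2 < limit:
        trace.append((n, n >> 2, n & 3))
        # add 2**i to the value, then advance the counter modulo 3
        n = (((n >> 2) + 2 ** (n & 3)) << 2) + ((n & 3) + 1) % 3
    return trace


for n, value, i in packed_trace(20)[:4]:
    print(f"n = {n:3d}  value = {value:2d}  i = {i}")
```

The first four states are 4, 9, 18 and 32, which decode to the values 1, 2, 4, 8 with counters 0, 1, 2, 0: the counter wraps back to 0 exactly when the increment returns to 1.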

Algorithm to determine when midnight occurs based on segments of a traveling bus

I am writing an importer that takes information from a bus company and provides it in the following format:
1. Stations are numbered with indexes from 0 to n (0,1,2,3,4,5... etc)
The provider sends a list of segments: 0->1,0->3,4->5, etc, which represent the trips between the stations. Each station has at least one segment provided.
Each segment has an integer that represents how many times the trip passes midnight.
So here are a few examples:
Example 1:
0->2: 1
0->3: 1
1->2: 0
1->3: 0
Which actually means that midnight occurs only once between station 0 and station 1.
Example 2:
0->2: 1
1->3: 1
2->3: 0
Which means that midnight occurs only once between station 1 and station 2
It is possible that in some cases the information will not be enough to find all midnight crossings, in which case the destination should be skipped.
Is there an algorithm for discovering these things?
My attempts so far:
I discovered that if I lay out all of the stations for the second example like:
0--------1--------2--------3
Then I apply the maximum crossings for 0-2 and 1-3, this means that:
0->1 has between 0 and 1 crossings
1->2 has between 0 and 1 crossings
2->3 has between 0 and 1 crossings
After that I apply the third rule - 2->3 has 0 crossings, which leaves me with:
0->1: 0,1
1->2: 0,1
2->3: 0
Which gives me the following combinations:
0,0,0
0,1,0
1,1,0
1,0,0
Then I apply the rules again (position 1 + position 2 should be 1 and position 2 + position 3 should be 1) and I'm left with only:
0,1,0
Which means that midnight occurs once between station 1 and station 2
However, this method requires generating all possible combinations of values, which does not scale. Each segment could have 0, 1, 2 or 3 crossings, so with 20 stations that would be roughly 4 to the power of 20 combinations.
Does anyone have another idea on how to do this?
You could solve this as a system of simultaneous equations using Gaussian elimination.
The numbers of midnight crossings between adjacent stations are your variables, and your coefficients will just be 1 for every adjacent pair of stations covered by a segment and 0 otherwise.
Take the second example:
0->2: 1
1->3: 1
2->3: 0
Think of 0->1 as variable a, 1->2 as variable b and 2->3 as variable c, then you can rewrite as:
a + b = 1
b + c = 1
c = 0
or in matrix form as
[ 1 1 0 ] [ a ] [ 1 ]
[ 0 1 1 ] [ b ] = [ 1 ]
[ 0 0 1 ] [ c ] [ 0 ]
(The number of columns in the matrix should equal the number of pairs of adjacent stations, and the number of rows is the number of equations you have.) Solve for a, b, c to find the number of midnight crossings between each pair of stations.
You have an additional constraint that the values are non-negative, so for example if a + b = 0 you know that both a and b are zero because it's not possible for one to be positive and the other negative. So you can just add a = 0 and b = 0 as two more equations to your system.
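A minimal Python sketch of this approach, assuming segments arrive as (start, end, crossings) tuples with start < end. It uses Gauss-Jordan elimination over exact fractions (a variant of the elimination described above), returns only the gaps whose values the equations pin down uniquely, and leaves the rest out so the caller can skip those destinations. It does not detect inconsistent input or apply the non-negativity rule; build_system and solve_determined are hypothetical names.

```python
from fractions import Fraction


def build_system(segments, num_stations):
    """One row per segment: coefficient 1 for every gap k (station k -> k+1)
    the segment covers, followed by the segment's crossing count."""
    rows = []
    for start, end, crossings in segments:
        row = [Fraction(0)] * (num_stations - 1)
        for k in range(start, end):
            row[k] = Fraction(1)
        rows.append(row + [Fraction(crossings)])
    return rows


def solve_determined(rows, num_vars):
    """Gauss-Jordan elimination; returns {gap: crossings} for uniquely determined gaps."""
    m = [row[:] for row in rows]
    pivot_cols = []
    r = 0
    for c in range(num_vars):
        if r == len(m):
            break
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no remaining equation involves this gap
        m[r], m[pivot] = m[pivot], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
    solved = {}
    for row_idx, c in enumerate(pivot_cols):
        # determined only if no other variable is left in this reduced row
        if all(m[row_idx][k] == 0 for k in range(num_vars) if k != c):
            value = m[row_idx][num_vars]
            solved[c] = int(value) if value.denominator == 1 else value
    return solved


# Example 2 from the question: 0->2: 1, 1->3: 1, 2->3: 0
rows = build_system([(0, 2, 1), (1, 3, 1), (2, 3, 0)], num_stations=4)
print(solve_determined(rows, num_vars=3))  # {0: 0, 1: 1, 2: 0}
```

Gap 1 (station 1 to station 2) gets the single crossing, matching the question. Exact fractions avoid the rounding issues a floating-point solver would introduce when testing coefficients against zero.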
After using @samgak's solution, converting the segments to variables and creating the matrix, I found a programming algorithm that calculates the final result.
You can find the algorithm in multiple languages here: https://rosettacode.org/wiki/Gaussian_elimination
Here is the PHP answer (that's what I needed):
function swap_rows(&$a, &$b, $r1, $r2)
{
if ($r1 == $r2) return;
$tmp = $a[$r1];
$a[$r1] = $a[$r2];
$a[$r2] = $tmp;
$tmp = $b[$r1];
$b[$r1] = $b[$r2];
$b[$r2] = $tmp;
}
function gauss_eliminate($A, $b, $N)
{
for ($col = 0; $col < $N; $col++)
{
$j = $col;
$max = abs($A[$j][$j]);
for ($i = $col + 1; $i < $N; $i++)
{
$tmp = abs($A[$i][$col]);
if ($tmp > $max)
{
$j = $i;
$max = $tmp;
}
}
swap_rows($A, $b, $col, $j);
for ($i = $col + 1; $i < $N; $i++)
{
$tmp = $A[$i][$col] / $A[$col][$col];
for ($j = $col + 1; $j < $N; $j++)
{
$A[$i][$j] -= $tmp * $A[$col][$j];
}
$A[$i][$col] = 0;
$b[$i] -= $tmp * $b[$col];
}
}
$x = array();
for ($col = $N - 1; $col >= 0; $col--)
{
$tmp = $b[$col];
for ($j = $N - 1; $j > $col; $j--)
{
$tmp -= $x[$j] * $A[$col][$j];
}
$x[$col] = $tmp / $A[$col][$col];
}
return $x;
}
function test_gauss()
{
$a = array(
array(1.00, 0.00, 0.00, 0.00, 0.00, 0.00),
array(1.00, 0.63, 0.39, 0.25, 0.16, 0.10),
array(1.00, 1.26, 1.58, 1.98, 2.49, 3.13),
array(1.00, 1.88, 3.55, 6.70, 12.62, 23.80),
array(1.00, 2.51, 6.32, 15.88, 39.90, 100.28),
array(1.00, 3.14, 9.87, 31.01, 97.41, 306.02)
);
$b = array( -0.01, 0.61, 0.91, 0.99, 0.60, 0.02 );
$x = gauss_eliminate($a, $b, 6);
ksort($x);
print_r($x);
}
test_gauss();

Maximum length of zigzag sequence

A sequence of integers is called zigzag sequence if each of its elements is either strictly less or strictly greater than its neighbors.
Example : The sequence 4 2 3 1 5 3 forms a zigzag, but 7 3 5 5 2 and 3 8 6 4 5 don't.
For a given array of integers we need to find the length of its largest (contiguous) sub-array that forms a zigzag sequence.
Can this be done in O(N) ?
Currently my solution is O(N^2) which is just simply taking every two points and checking each possible sub-array if it satisfies the condition or not.
I claim that any two maximal zigzag sub-sequences overlap in at most 1 element.
Proof by contradiction:
Assume a_i .. a_j is the longest zigzag sub-sequence, and there is another zigzag sub-sequence b_m...b_n overlapping it.
Without loss of generality, let's say the overlapping part is
a_i ... a_k...a_j
--------b_m...b_k'...b_n
a_k = b_m, a_k+1 = b_m+1....a_j = b_k' where k'-m = j-k > 0 (at least 2 elements are overlapping)
Then they can merge to form a longer zig-zag sequence, contradiction.
This means the only way they can overlap is like this:
3 5 3 2 3 2 3
3 5 3 and 3 2 3 2 3 overlap in 1 element.
I believe this can still be solved in O(N): greedily increase the zigzag length whenever possible. If that fails, move the iterator back 1 element and treat it as a new zigzag starting point.
Keep track of the latest and the longest zigzag lengths you have found.
Walk along the array and check whether the current item belongs to (fits the definition of) a zigzag. Remember the last zigzag start, which is either the array's start or the first zigzag element after the most recent non-zigzag element. This start and the current item define a zigzag subarray. When it is longer than the previously found one, store the new longest zigzag length. Proceed until the end of the array and you will complete the task in O(N).
Sorry, I used Perl to write this.
#!/usr/bin/perl
@a = ( 5, 4, 2, 3, 1, 5, 3, 7, 3, 5, 5, 2, 3, 8, 6, 4, 5 );
$n = scalar @a;
$best_start = 0;
$best_end = 1;
$best_length = 2;
$start = 0;
$end = 1;
$direction = ($a[0] > $a[1]) ? 1 : ($a[0] < $a[1]) ? -1 : 0;
for($i=2; $i<$n; $i++) {
    # a trick here: equal values make $new_direction = $direction
$new_direction = ($a[$i-1] > $a[$i]) ? 1 : ($a[$i-1] < $a[$i]) ? -1 : $direction;
print "$a[$i-1] > $a[$i] : direction $new_direction Vs $direction\n";
if ($direction != $new_direction) {
$end = $i;
} else {
$this_length = $end - $start + 1;
if ($this_length > $best_length) {
$best_start = $start;
$best_end = $end;
$best_length = $this_length;
}
$start = $i-1;
$end = $i;
}
$direction = $new_direction;
}
$this_length = $end - $start + 1;
if ($this_length > $best_length) {
$best_start = $start;
$best_end = $end;
$best_length = $this_length;
}
print "BEST $best_start to $best_end length $best_length\n";
for ($i=$best_start; $i <= $best_end; $i++) {
print $a[$i], " ";
}
print "\n";
For each index i, you can find the smallest j such that the subarray with index j,j+1,...,i-1,i is a zigzag. This can be done in two phases:
Find the longest "increasing" zig zag (starts with a[1]>a[0]):
start = 0
increasing[0] = 0
sign = true
for (int i = 1; i < n; i ++) {
if ((arr[i] > arr[i-1] && sign) || (arr[i] < arr[i-1] && !sign)) {
increasing[i] = start
sign = !sign
} else if (arr[i-1] < arr[i]) { //increasing and started last element
start = i-1
sign = false
increasing[i] = i-1
} else { //started this element
start = i
sign = true
increasing[i] = i
}
}
Do similarly for "decreasing" zig-zag, and you can find for each index the "earliest" possible start for a zig-zag subarray.
From there, finding the maximal possible zig-zag is easy.
Since all operations are done in O(n), and you basically do one after the other, that is your complexity.
You can combine the both "increasing" and "decreasing" to one go:
start = 0
maxZigZagStart[0] = 0
sign = true
for (int i = 1; i < n; i ++) {
if ((arr[i] > arr[i-1] && sign) || (arr[i] < arr[i-1] && !sign)) {
maxZigZagStart[i] = start
sign = !sign
} else if (arr[i-1] > arr[i]) { //decreasing: the next step must go up
start = i-1
sign = true
maxZigZagStart[i] = i-1
} else if (arr[i-1] < arr[i]) { //increasing: the next step must go down
start = i-1
sign = false
maxZigZagStart[i] = i-1
} else { //equality
start = i
//guess it is increasing, if it is not - will be taken care of next iteration
sign = true
maxZigZagStart[i] = i
}
}
You can even drop the maxZigZagStart auxiliary array and store the local maximal length instead.
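A runnable Python sketch of the combined one-pass idea, keeping only the current start index and the best length instead of the maxZigZagStart array. The function name longest_zigzag and the expect_up flag (which plays the role of sign) are illustrative.

```python
def longest_zigzag(arr):
    """Length of the longest contiguous zigzag subarray, in one pass."""
    if not arr:
        return 0
    best = 1
    start = 0         # first index of the zigzag ending at the current element
    expect_up = True  # True: the next step must go up to continue the zigzag
    for i in range(1, len(arr)):
        if (arr[i] > arr[i - 1] and expect_up) or (arr[i] < arr[i - 1] and not expect_up):
            expect_up = not expect_up   # zigzag continues
        elif arr[i] < arr[i - 1]:
            start = i - 1               # new zigzag arr[i-1], arr[i] going down
            expect_up = True
        elif arr[i] > arr[i - 1]:
            start = i - 1               # new zigzag arr[i-1], arr[i] going up
            expect_up = False
        else:
            start = i                   # equal neighbours: restart at arr[i]
            expect_up = True            # either next direction is handled above
        best = max(best, i - start + 1)
    return best


print(longest_zigzag([4, 2, 3, 1, 5, 3]))  # 6
```

Each element is examined once and only a few scalars are kept, so the sketch runs in O(n) time and O(1) extra space.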
A sketch of a simple one-pass algorithm. Cmp compares neighbouring elements, returning -1, 0, 1 for less, equal and greater cases.
Zigzag ends for cases of Cmp transitions:
0 0
-1 0
1 0
Zigzag ends and new series starts:
0 -1
0 1
-1 -1
1 1
Zigzag series continues for transitions
-1 1
1 -1
Algo (assumes N >= 2):
Start = 0
LastCmp = - Compare(A[1], A[0]) //so that the first pair always continues the series
MaxLen = 1
for i = 1 to N - 1 do
  Cmp = Compare(A[i], A[i - 1]) //returns -1, 0, 1 for less, equal and greater cases
  if Abs(Cmp - LastCmp) <> 2 then
    //zigzag condition is violated: series A[Start..i-1] ends, new series starts
    MaxLen = Max(MaxLen, i - Start)
    Start = (Cmp = 0) ? i : i - 1
  //else series continues, nothing to do
  LastCmp = Cmp
//account for the final series A[Start..N-1]
MaxLen = Max(MaxLen, N - Start)
Examples of output:
2 6 7 1 7 0 7 3 1 1 7 4
7 (6 7 1 7 0 7 3)
8 0 0 3 5 8
2
0 0 7 0
3
1 2 0 7 9
4
8 3 5 2
4
1 3 7 1 6 6
4
1 4 0 6 6 3 4 3 8 0 9 9
7
Let's consider the sequence 5 9 3 6 5 7 2 3 6 5 2 1 3 as an example. You have a condition that every internal element of the subsequence must satisfy (the element is strictly less or strictly greater than both its neighbors). Let's compute this condition for every element of the whole sequence:
5 9 3 6 5 7 2 3 6 5 2 1 3
0 1 1 1 1 1 1 0 1 0 0 1 0
The condition is undefined for outermost elements because they have only one neighbor each. But I defined it as 0 for convenience.
The longest subsequence of 1's (9 3 6 5 7 2) is the internal part of the longest zigzag subsequence (5 9 3 6 5 7 2 3). So the algorithm is:
Find the longest subsequence of elements satisfying condition.
Add to it one element to each side.
The first step can be done in O(n) by the following algorithm:
max_length = 0
current_length = 0
for i from 2 to len(a) - 1:
    if a[i - 1] < a[i] > a[i + 1] or a[i - 1] > a[i] < a[i + 1]:
        current_length += 1
    else:
        max_length = max(max_length, current_length)
        current_length = 0
max_length = max(max_length, current_length)
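The special cases: if the total length of the sequence is 0 or 1, the whole sequence is the longest zigzag. If no element satisfies the condition (max_length is 0), the answer is 2 when some pair of neighbors differs and 1 otherwise.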
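A Python sketch of this two-step approach with 0-based indexing, including the special cases (the function name longest_zigzag_by_turns is hypothetical).

```python
def longest_zigzag_by_turns(a):
    """Longest run of interior turning points, plus one element on each side."""
    n = len(a)
    if n < 2:
        return n
    best_run = run = 0
    for i in range(1, n - 1):
        # a[i] is a strict peak or a strict valley
        if a[i - 1] < a[i] > a[i + 1] or a[i - 1] > a[i] < a[i + 1]:
            run += 1
            best_run = max(best_run, run)
        else:
            run = 0
    if best_run > 0:
        return best_run + 2
    # no turning points: any pair of unequal neighbours is a zigzag of length 2
    return 2 if any(a[i] != a[i + 1] for i in range(n - 1)) else 1


print(longest_zigzag_by_turns([5, 9, 3, 6, 5, 7, 2, 3, 6, 5, 2, 1, 3]))  # 8
```

Adding one element on each side is safe because the elements next to a turning point always differ from it, so they extend the zigzag.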
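#include <cstdio>     // scanf, printf
#include <algorithm>  // max
using namespace std ;
// Note: this greedy counts the longest zigzag *subsequence* (elements need not be contiguous).
// size1 follows a zigzag whose first step goes up, size2 one whose first step goes down.
int main(){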
int t ; scanf("%d",&t) ;
while(t--){
int n ; scanf("%d",&n) ;
int size1 = 1 , size2 = 1 , seq1 , seq2 , x ;
bool flag1 = true , flag2 = true ;
for(int i=1 ; i<=n ; i++){
scanf("%d",&x) ;
if( i== 1 )seq1 = seq2 = x ;
else {
if( flag1 ){
if( x>seq1){
size1++ ;
seq1 = x ;
flag1 = !flag1 ;
}
else if( x < seq1 )
seq1 = x ;
}
else{
if( x<seq1){
size1++ ;
seq1=x ;
flag1 = !flag1 ;
}
else if( x > seq1 )
seq1 = x ;
}
if( flag2 ){
if( x < seq2 ){
size2++ ;
seq2=x ;
flag2 = !flag2 ;
}
else if( x > seq2 )
seq2 = x ;
}
else {
if( x > seq2 ){
size2++ ;
seq2 = x ;
flag2 = !flag2 ;
}
else if( x < seq2 )
seq2 = x ;
}
}
}
printf("%d\n",max(size1,size2)) ;
}
return 0 ;
}

Finding the GCD of two numbers quickly

Are there any ways to make this program faster? I am thinking about some faster tools for user input etc.
Here is my code:
sub partia {
my ( $u, $v ) = @_;
if ( $u == $v ) { return $u }
if ( $u == 0 ) { return $v }
if ( $v == 0 ) { return $u }
if ( ~$u & 1 ) {
if ( $v & 1 ) {
return partia( $u >> 1, $v );
}
else {
return partia( $u >> 1, $v >> 1 ) << 1;
}
}
if ( ~$v & 1 ) {
return partia( $u, $v >> 1 );
}
if ( $u > $v ) {
return partia( ( $u - $v ) >> 1, $v );
}
return partia( ( $v - $u ) >> 1, $u );
}
sub calosc {
$t = <>;
while ($t) {
@tab = split( /\s+/, <> );
print( partia( $tab[0], $tab[1] ), "\n" );
$t--;
}
}
calosc();
How the program works:
It returns the greatest common divisor of 2 numbers entered by the user. It's mostly Stein's algorithm.
INPUT :
First line:
How many pairs user wants to check.[enter]
Second line:
first number [space] second number[enter]
OUTPUT:
GCD[enter]
In Python I would use things like :
from sys import stdin
t=int(stdin.readline())
instead of
t=input()
Is there any way to do it?
Your solution — Recursive Stein's Algorithm
It appears that you're simply trying to get the GCD of two numbers, and wanting to do so quickly.
You're apparently using the recursive version of the Binary GCD Algorithm. Typically speaking, it is much better to use an iterative algorithm for both speed and scalability. However, I would assert that it is almost certainly worth it to try the much simpler Euclidean algorithm first.
Alternatives — Iterative Stein's Algorithm and Basic Euclidean Algorithm
I've adapted your script to take 3 number pairs from the __DATA__ block as input. The first pair is just two small numbers, then I have two numbers from the Fibonacci sequence, and finally two larger numbers that share some powers of two.
I then coded two new subroutines. One of them uses the iterative Stein's algorithm (the method you're using), and the other is just a simple Euclidean algorithm. Benchmarking your partia subroutine against my two subroutines for 1 million iterations shows that the iterative version is 50% faster and that Euclid is 3 times faster.
use strict;
use warnings;
use Benchmark;
#use Math::Prime::Util::GMP qw(gcd);
# Original solution
# - Stein's Algorithm (recursive)
sub partia {
    my ( $u, $v ) = @_;
if ( $u == $v ) { return $u }
if ( $u == 0 ) { return $v }
if ( $v == 0 ) { return $u }
if ( ~$u & 1 ) {
if ( $v & 1 ) {
return partia( $u >> 1, $v );
}
else {
return partia( $u >> 1, $v >> 1 ) << 1;
}
}
if ( ~$v & 1 ) {
return partia( $u, $v >> 1 );
}
if ( $u > $v ) {
return partia( ( $u - $v ) >> 1, $v );
}
return partia( ( $v - $u ) >> 1, $u );
}
# Using Euclidean Algorithm
sub euclid {
    my ( $quotient, $divisor ) = @_;
return $divisor if $quotient == 0;
return $quotient if $divisor == 0;
while () {
my $remainder = $quotient % $divisor;
return $divisor if $remainder == 0;
$quotient = $divisor;
$divisor = $remainder;
}
}
# Stein's Algorithm (Iterative)
sub stein {
    my ($u, $v) = @_;
# GCD(0,v) == v; GCD(u,0) == u, GCD(0,0) == 0
return $v if $u == 0;
return $u if $v == 0;
# Remove all powers of 2 shared by U and V
my $twos = 0;
while ((($u | $v) & 1) == 0) {
$u >>= 1;
$v >>= 1;
++$twos;
}
# Remove Extra powers of 2 from U. From here on, U is always odd.
$u >>= 1 while ($u & 1) == 0;
do {
# Remove all factors of 2 in V -- they are not common
# Note: V is not zero, so while will terminate
$v >>= 1 while ($v & 1) == 0;
# Now U and V are both odd. Swap if necessary so U <= V,
# then set V = V - U (which is even). For bignums, the
# swapping is just pointer movement, and the subtraction
# can be done in-place.
($u, $v) = ($v, $u) if $u > $v;
$v -= $u;
} while ($v != 0);
return $u << $twos;
}
# Process 3 pairs of numbers
my @nums;
while (<DATA>) {
my ($num1, $num2) = split;
# print "Numbers = $num1, $num2\n";
# print ' partia = ', partia($num1, $num2), "\n";
# print ' euclid = ', euclid($num1, $num2), "\n";
# print ' stein = ', stein($num1, $num2), "\n";
# print ' gcd = ', gcd($num1, $num2), "\n\n";
push #nums, [$num1, $num2];
}
# Benchmark!
timethese(1_000_000, {
    'Partia' => sub { partia(@$_) for @nums },
    'Euclid' => sub { euclid(@$_) for @nums },
    'Stein'  => sub { stein(@$_) for @nums },
#   'GCD'    => sub { gcd(@$_) for @nums },
});
__DATA__
20 25 # GCD of 5
89 144 # GCD of Fibonacci numbers = 1
4789084 957196 # GCD of 388 = 97 * 2 * 2
Outputs:
Benchmark: timing 1000000 iterations of Euclid, Partia, Stein...
Euclid: 9 wallclock secs ( 8.31 usr + 0.00 sys = 8.31 CPU) # 120279.05/s (n=1000000)
Partia: 26 wallclock secs (26.00 usr + 0.00 sys = 26.00 CPU) # 38454.14/s (n=1000000)
Stein: 18 wallclock secs (17.36 usr + 0.01 sys = 17.38 CPU) # 57544.02/s (n=1000000)
Module Solution — Math::Prime::Util::GMP qw(gcd)
The fastest solutions are likely to be C implementations of these algorithms though. I therefore recommend finding already coded versions like that provided by Math::Prime::Util::GMP.
Running the benchmarks with this new function included shows that it is about twice as fast again as the basic Euclidean algorithm that I programmed:
Benchmark: timing 1000000 iterations of Euclid, GCD, Partia, Stein...
Euclid: 8 wallclock secs ( 8.32 usr + 0.00 sys = 8.32 CPU) # 120264.58/s (n=1000000)
GCD: 3 wallclock secs ( 3.93 usr + 0.00 sys = 3.93 CPU) # 254388.20/s (n=1000000)
Partia: 26 wallclock secs (25.94 usr + 0.00 sys = 25.94 CPU) # 38546.04/s (n=1000000)
Stein: 18 wallclock secs (17.55 usr + 0.00 sys = 17.55 CPU) # 56976.81/s (n=1000000)
Unless I've completely forgotten what I'm doing (no promises), this algorithm looks like it keeps dividing its terms by 2 in each recursion, which means it is O(log2 N). Unless you can find a constant-time algorithm, you've probably got the best one at the moment.
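To check the logarithmic claim, here is a hypothetical Python helper that follows the same case analysis as partia() but runs as a loop and counts iterations. Every step that does not return at least halves the product u*v while keeping both values positive, so for positive inputs the count stays within u.bit_length() + v.bit_length().

```python
import math


def partia_steps(u, v):
    """Iterative version of the recursive partia(); returns (gcd, number_of_steps)."""
    steps, shift = 0, 0  # shift counts the common factors of 2 removed so far
    while True:
        steps += 1
        if u == v:
            return u << shift, steps
        if u == 0:
            return v << shift, steps
        if v == 0:
            return u << shift, steps
        if u % 2 == 0:
            if v % 2 == 1:
                u >>= 1
            else:
                u, v, shift = u >> 1, v >> 1, shift + 1
        elif v % 2 == 0:
            v >>= 1
        elif u > v:
            u = (u - v) >> 1
        else:
            u, v = (v - u) >> 1, u


for u, v in [(20, 25), (89, 144), (4789084, 957196)]:
    g, steps = partia_steps(u, v)
    print(u, v, "gcd", g, "math.gcd", math.gcd(u, v),
          "steps", steps, "bound", u.bit_length() + v.bit_length())
```

For example, partia_steps(20, 25) returns (5, 5). Replacing the recursion with a loop like this also avoids Perl's per-call overhead, which is where the iterative Stein version gains its speed.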
Now that @ikegami has mentioned micro-optimizations: if you want to make those, I suggest you check out Devel::NYTProf, an excellent Perl profiler that can tell you where your algorithm spends its time, so you can target your micro-optimizations.

Tournament bracket placement algorithm

Given a list of opponent seeds (for example seeds 1 to 16), I'm trying to write an algorithm that will result in the top seed playing the lowest seed in that round, the 2nd seed playing the 2nd-lowest seed, etc.
Grouping 1 and 16, 2 and 15, etc. into "matches" is fairly easy, but I also need to make sure that the higher seed will play the lower seed in subsequent rounds.
An example bracket with the correct placement:
1 vs 16
        1 vs 8
8 vs 9
                1 vs 4
4 vs 13
        4 vs 5
5 vs 12
                        1 vs 2
2 vs 15
        2 vs 7
7 vs 10
                2 vs 3
3 vs 14
        3 vs 6
6 vs 11
As you can see, seed 1 and 2 only meet up in the final.
This JavaScript returns an array where each even index plays the next odd index
function seeding(numPlayers){
var rounds = Math.log(numPlayers)/Math.log(2)-1;
var pls = [1,2];
for(var i=0;i<rounds;i++){
pls = nextLayer(pls);
}
return pls;
function nextLayer(pls){
var out=[];
var length = pls.length*2+1;
pls.forEach(function(d){
out.push(d);
out.push(length-d);
});
return out;
}
}
> seeding(2)
[1, 2]
> seeding(4)
[1, 4, 2, 3]
> seeding(8)
[1, 8, 4, 5, 2, 7, 3, 6]
> seeding(16)
[1, 16, 8, 9, 4, 13, 5, 12, 2, 15, 7, 10, 3, 14, 6, 11]
With your assumptions, players 1 and 2 will play in the final, players 1-4 in the semifinals, players 1-8 in the quarterfinals and so on, so you can build the tournament recursively backwards from the final as AakashM proposed. Think of the tournament as a tree whose root is the final.
In the root node, your players are {1, 2}.
To expand the tree recursively to the next level, take the nodes on the bottom layer of the tree one by one, create two children for each, and place one of the players of the original node in each of the child nodes. Then add the next layer of players and map them to the games so that the worst newly added player plays against the best pre-existing player, and so on.
Here are the first rounds of the algorithm:
{1,2}  --- create next layer

        {1, _}
       /          --- now fill the empty slots
{1,2}
       \{2, _}

        {1, 4}    --- the slots filled in reverse order
       /
{1,2}
       \{2, 3}    --- create next layer again

               /{1, _}
        {1, 4}
       /       \{4, _}
{1,2}                     --- again fill
       \       /{2, _}
        {2, 3}
               \{3, _}

               /{1, 8}
        {1, 4}
       /       \{4, 5}    --- ... and so on
{1,2}
       \       /{2, 7}
        {2, 3}
               \{3, 6}
As you can see, it produces the same tree you posted.
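A Python sketch of this expansion (the helper names are hypothetical): each existing seed s gets the partner 2*len(order)+1-s in the next layer, and a tournament without upsets is then simulated to confirm that every round pairs the best remaining seed with the worst one. It assumes the number of players is a power of two.

```python
def bracket_order(num_players):
    """Seed order for a bracket of num_players (assumed a power of two >= 2),
    built outward from the final: each seed s is paired with total - s."""
    order = [1, 2]
    while len(order) < num_players:
        total = 2 * len(order) + 1  # seeds meeting in a next-layer match sum to this
        order = [s for seed in order for s in (seed, total - seed)]
    return order


def chalk_rounds(order):
    """Play the bracket with no upsets (the lower seed number always wins);
    return the list of matches in each round."""
    rounds = []
    current = order
    while len(current) > 1:
        matches = [(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        rounds.append(matches)
        current = [min(match) for match in matches]
    return rounds


for matches in chalk_rounds(bracket_order(8)):
    print(matches)
```

For 8 players this prints [(1, 8), (4, 5), (2, 7), (3, 6)], then [(1, 4), (2, 3)], then [(1, 2)], the same layers as the tree above.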
I've come up with the following algorithm. It may not be super-efficient, but I don't think that it really needs to be. It's written in PHP.
<?php
$players = range(1, 32);
$count = count($players);
$numberOfRounds = log($count / 2, 2);
// Order players.
for ($i = 0; $i < $numberOfRounds; $i++) {
$out = array();
$splice = pow(2, $i);
while (count($players) > 0) {
$out = array_merge($out, array_splice($players, 0, $splice));
$out = array_merge($out, array_splice($players, -$splice));
}
$players = $out;
}
// Print match list.
for ($i = 0; $i < $count; $i++) {
printf('%s vs %s<br />%s', $players[$i], $players[++$i], PHP_EOL);
}
?>
I also wrote a solution written in PHP. I saw Patrik Bodin's answer, but thought there must be an easier way.
It does what darkangel asked for: it returns all seeds in the correct positions. The matches are the same as in his example, but in a prettier order: seed 1 and seed 16 are on the outside of the schema (as you see in tennis tournaments).
If there are no upsets (meaning a higher seeded player always beats a lower seeded player), you will end up with seed 1 vs seed 2 in the final.
It actually does two more things:
It shows the correct order (which is a requirement for putting byes in the correct positions)
It fills in byes in the correct positions (if required)
A perfect explanation about what a single elimination bracket should look like: http://blog.playdriven.com/2011/articles/the-not-so-simple-single-elimination-advantage-seeding/
Code example for 16 participants:
<?php
define('NUMBER_OF_PARTICIPANTS', 16);
$participants = range(1,NUMBER_OF_PARTICIPANTS);
$bracket = getBracket($participants);
var_dump($bracket);
function getBracket($participants)
{
$participantsCount = count($participants);
$rounds = ceil(log($participantsCount)/log(2));
$bracketSize = pow(2, $rounds);
$requiredByes = $bracketSize - $participantsCount;
echo sprintf('Number of participants: %d<br/>%s', $participantsCount, PHP_EOL);
echo sprintf('Number of rounds: %d<br/>%s', $rounds, PHP_EOL);
echo sprintf('Bracket size: %d<br/>%s', $bracketSize, PHP_EOL);
echo sprintf('Required number of byes: %d<br/>%s', $requiredByes, PHP_EOL);
if($participantsCount < 2)
{
return array();
}
$matches = array(array(1,2));
for($round=1; $round < $rounds; $round++)
{
$roundMatches = array();
$sum = pow(2, $round + 1) + 1;
foreach($matches as $match)
{
$home = changeIntoBye($match[0], $participantsCount);
$away = changeIntoBye($sum - $match[0], $participantsCount);
$roundMatches[] = array($home, $away);
$home = changeIntoBye($sum - $match[1], $participantsCount);
$away = changeIntoBye($match[1], $participantsCount);
$roundMatches[] = array($home, $away);
}
$matches = $roundMatches;
}
return $matches;
}
function changeIntoBye($seed, $participantsCount)
{
//return $seed <= $participantsCount ? $seed : sprintf('%d (= bye)', $seed);
return $seed <= $participantsCount ? $seed : null;
}
?>
The output:
Number of participants: 16
Number of rounds: 4
Bracket size: 16
Required number of byes: 0
C:\projects\draw\draw.php:7:
array (size=8)
0 =>
array (size=2)
0 => int 1
1 => int 16
1 =>
array (size=2)
0 => int 9
1 => int 8
2 =>
array (size=2)
0 => int 5
1 => int 12
3 =>
array (size=2)
0 => int 13
1 => int 4
4 =>
array (size=2)
0 => int 3
1 => int 14
5 =>
array (size=2)
0 => int 11
1 => int 6
6 =>
array (size=2)
0 => int 7
1 => int 10
7 =>
array (size=2)
0 => int 15
1 => int 2
If you change 16 into 6 you get:
Number of participants: 6
Number of rounds: 3
Bracket size: 8
Required number of byes: 2
C:\projects\draw\draw.php:7:
array (size=4)
0 =>
array (size=2)
0 => int 1
1 => null
1 =>
array (size=2)
0 => int 5
1 => int 4
2 =>
array (size=2)
0 => int 3
1 => int 6
3 =>
array (size=2)
0 => null
1 => int 2
Here's one in Python; it uses a nested list comprehension to be succinct:
def seed(n):
    """Returns list of 1..n in standard tournament seed order.
    Note that n need not be a power of 2 - 'byes' are returned as zero.
    """
    ol = [1]
    # (n - 1).bit_length() == ceil(log2(n)) for n >= 1, without floating-point rounding
    for i in range((n - 1).bit_length()):
        l = 2 * len(ol) + 1
        ol = [e if e <= n else 0 for s in [[el, l - el] for el in ol] for e in s]
    return ol
For JavaScript code, use one of the two functions below. The former embodies imperative style & is much faster. The latter is recursive & neater, but only applicable to a relatively small number of teams (<16384).
// imperative style
function foo(n) {
const arr = new Array(n)
arr[0] = 0
for (let i = n >> 1, m = 1; i >= 1; i >>= 1, m = (m << 1) + 1) {
for (let j = n - i; j > 0; j -= i) {
arr[j] = m - arr[j -= i]
}
}
return arr
}
Here you fill in the spots one by one by mirroring already occupied ones. For example, the first-seeded team (that is number 0) goes to the topmost spot. The second one (1) occupies the opposite spot in the other half of the bracket. The third team (2) mirrors 1 in their half of the bracket & so on. Despite the nested loops, the algorithm has a linear time complexity depending on the number of teams.
Here is the recursive method:
// functional style
const foo = n =>
n === 1 ? [0] : foo(n >> 1).reduce((p, c) => [...p, c, n - c - 1], [])
Basically, you do the same mirroring as in the previous function, but recursively:
For n = 1 team, it's just [0].
For n = 2 teams, you apply this function to the argument n >> 1 (that is, 1) & get [0]. Then you double the array by inserting each element's mirror (n - 1 - element) right after it, at the odd positions. Thus, [0] becomes [0, 1].
For n = 4 teams, you do the same operation, so [0, 1] becomes [0, 3, 1, 2].
If you want to get human-readable output, increase each element of the resulting array by one:
const readableArr = arr.map(i => i + 1)
At each round, sort the teams by the seeding criteria.
If there are n teams in a round, the team at the ith position plays the team at position n-i+1.
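A minimal Python sketch of this re-seeding rule, assuming an even number of teams and a caller-supplied seed_of function that ranks teams (lower is better); both pair_round and seed_of are hypothetical names.

```python
def pair_round(teams, seed_of):
    """Sort teams by seed and pair position i with position n - i + 1 (1-based)."""
    ranked = sorted(teams, key=seed_of)
    n = len(ranked)
    # 0-based: index i plays index n - 1 - i
    return [(ranked[i], ranked[n - 1 - i]) for i in range(n // 2)]


print(pair_round(["C", "A", "D", "B"], seed_of="ABCD".index))  # [('A', 'D'), ('B', 'C')]
```

Unlike a fixed bracket, this pairing is recomputed from the surviving teams each round, so after an upset the remaining teams are simply re-ranked.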
Since this comes up when searching on the subject, and it's hopeless to find another answer that solves the problem AND puts the seeds in a "prettier" order, I will add my version of the PHP code from darkangel. I also added the possibility to give byes to the higher seed players.
This was coded in an OO environment, so the number of participants is in $this->finalists and the number of byes is in $this->byes. I have only tested the code without byes and with two byes.
public function getBracket() {
$players = range(1, $this->finalists);
for ($i = 0; $i < log($this->finalists / 2, 2); $i++) {
$out = array();
$reverse = false;
foreach ($players as $player) {
$splice = pow(2, $i);
if ($reverse) {
$out = array_merge($out, array_splice($players, -$splice));
$out = array_merge($out, array_splice($players, 0, $splice));
$reverse = false;
} else {
$out = array_merge($out, array_splice($players, 0, $splice));
$out = array_merge($out, array_splice($players, -$splice));
$reverse = true;
}
}
$players = $out;
}
if ($this->byes) {
for ($i = 0; $i < $this->byes; $i++ ) {
for ($j = (($this->finalists / pow(2, $i)) - 1); $j > 0; $j--) {
$newPlace = ($this->finalists / pow(2, $i)) - 1;
if ($players[$j] > ($this->finalists / (pow(2 ,($i + 1))))) {
$player = $players[$j];
unset($players[$j]);
array_splice($players, $newPlace, 0, $player);
}
}
}
for ($i = 0; $i < $this->finalists / (pow(2, $this->byes)); $i++ ) {
$swap[] = $players[$i];
}
for ($i = 0; $i < $this->finalists /(pow(2, $this->byes)); $i++ ) {
$players[$i] = $swap[count($swap) - 1 - $i];
}
return array_reverse($players);
}
return $players;
}
I worked on a PHP / Laravel plugin that generates brackets with or without a preliminary round robin. Maybe it can be useful to you; I don't know what tech you are using. Here is the GitHub repository:
https://github.com/xoco70/kendo-tournaments
Hope it helps!
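A C version (PlayerCnt must be a power of two; the caller must free() the returned array):
#include <stdlib.h>

int * pctournamentSeedArray(int PlayerCnt)
{
    int * Array;
    int * PrevArray;
    int i;
    Array = malloc(sizeof(int) * PlayerCnt);
    if (Array == NULL)
        return NULL;
    if (PlayerCnt == 1)
    {
        Array[0] = 0;
        return Array;
    }
    PrevArray = pctournamentSeedArray(PlayerCnt / 2);
    if (PrevArray == NULL)
    {
        free(Array);
        return NULL;
    }
    for (i = 0; i < PlayerCnt; i += 2)
    {
        /* each previous seed is followed by its mirror in the larger bracket */
        Array[i] = PrevArray[i / 2];
        Array[i + 1] = (PlayerCnt - 1) - Array[i];
    }
    free(PrevArray);
    return Array;
}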
