Tune my code to solve a puzzle - performance

5 monkey share n peaches, they cannot distribute equally. So the first monkey dump 1 peach, and total number of peaches can be divided by 5, and the first monkey took his part.
Then is the second monkey, -1 peach, can be divided by 5 and took his part.
Until the fifth monkey finished all the steps. There may be some peaches still left.
Give the minimum number of peaches that satisfy this condition.
perl code 1:
#!/usr/bin/perl -w
for $n (0..10000){ #this is basic idea but code is too messy !
if( ($n-1) % 5 == 0 ){
$remain = 4/5 * ($n -1 );
if( ($remain - 1) % 5 == 0){
$remain = 4/5 * ($remain -1 );
if( ($remain - 1) % 5 == 0){
$remain = 4/5 * ($remain -1 );
if( ($remain - 1) % 5 == 0){
$remain = 4/5 * ($remain -1 );
if( ($remain - 1) % 5 == 0){
$remain = 4/5 * ($remain -1 );
print "remain: $remain original: $n\n";
}
}
}
}
}
}
perl code 2:
sub doit($){
($n) = #_;
if( ($n - 1) % 5 ==0 ){ #if can be distributed by 5 monkey
$n = ($n - 1) * 4/5; #commit distribute
return $n;
}else{
return -1; #fail
}
}
for $n (0..10000){ #restriction
$r = $n; #"recursively" find solution
$o = $n; #backup n
$count = 0;
for ($i = 0; $i < 5; $i++){ #assume there is 5 monkey, it can be changed
$r = doit($r);
if($r == -1){ #skip once fail
last;
}
$count++;
}
if($count == 5){ # if pass 5 test, then you found the number !
print "now ".$r."\n";
print "origin ".$o."\n";
}
}
I am thinking to cut some code. But felt hard. Can anyone help ?

First of all, you really should use strict and warnings pragmas at the top of your scripts. Your $n usage is especially worrisome. In the future, if you declare variables with my but use the same name, you convey the fact that they will represent the same quantity, without the fear that they might collide.
Anyway here is a slightly polished, and more importantly strict and warnings safe version:
#!/usr/bin/env perl
use strict;
use warnings;
sub doit {
my ($n) = #_;
if( ($n - 1) % 5 ==0 ){ #if can be distributed by 5 monkey
$n = ($n - 1) * 4/5; #commit distribute
return $n;
} else {
return undef; #fail
}
}
OUTER: for my $n (0..10000){ #restriction
my $r = $n; #"recursively" find solution
for (1..5){ #assume there is 5 monkey, it can be changed
$r = doit($r);
next OUTER unless defined $r;
}
# if code gets here, then it passed 5 test, then you found the number !
print "now: $r\torigin: $n\n";
}
And now, if you really want to be fun with it (don't use this in production, readability first! ):
#!/usr/bin/env perl
use strict;
use warnings;
OUTER: for my $n (0..10000){
my $r = $n;
$r = ($r - 1) % 5 ? next OUTER : 4/5 * ($r - 1) for (1..5);
print "now: $r\torigin: $n\n";
}
or even golfed:
for(0..10000){$r=$n=$_;map$r*=--$r%5?next:4/5,1..5;print"now: $r\torigin: $n\n"}

Consider this solution:
sub share {
($_[0] - 1) % 5 == 0 ? ($_[0]-1)/5*4 : die "unable to share";
}
for my $i (1..10000) {
eval {
share(share(share(share(share($i)))));
};
unless ($#) {
print "solution: $i\n";
last;
}
}
I'm sure there is a monad lurking within.

I'm not 100% sure I understand your question, but instead of searching for the answer, start with the last monkey. The minimum peaches he could take is 1, and even though there could be some left, to get the minimum, assume there are 0 left. Now, calculate how many peaches the second to last monkey saw, and so on.
There is no need to loop, if you start from the last monkey
# set numPeaches to what the last monkey had
$numPeaches = 1;
# then, figure out how many the second to last monkey had, and add to numPeaches
# and, so on ...
# until you get to the first monkey

Related

Numerical sequence of 1 2 4

I need help in providing an algorithm for a numerical sequence which should display a series of 1 2 4 and its consecutive summations.
e.g. If my input value is 20, it should display
1 2 4 8 9 11 15 16 18
Wherein
1 = 1
2 = 1 + 1
4 = 2 + 2
8 = 4 + 4
And the summation of 1 and 2 and 4 will repeat again starting with the present number which is 8 and so on..
9 = 8 + 1
11 = 9 + 2
15 = 11 + 4
16 = 15 + 1
18 = 16 + 2
As you can see, it should not proceed to 22 (18 + 4) since our sample input value is 20. I hope you guys get my point. I'm having a problem in designing the algorithms in the for loop. What I have now which is not working is
$input = 20;
for ($i = $i; $i < $input; $i = $i+$i) {
if($i==0){
$i = 4;
$i = $i - 3;
}elseif($i % 4 == 0){
$i = $i + 1;
}
print_r("this is \$i = $i<br><br>");
}
NOTE: Only one variable and one for loop is required, it will not be accepted if we use functions or arrays. Please help me, this is one of the most difficult problems I've encountered in PHP..
you can use the code
$input = 20;
$current = 1;
$val = 1;
while($val < $input){
print_r("this is \$val = $val\n");
$val = $val + $current;
$current = ($current == 4 ? 1 : $current*2);
}
see the online compiler
Since you have mentioned Only one variable and one for loop is required
Try this,
$input = 20;
for ($i = 1; $i < $input; $i) {
if($i>$input) break;
print_r("this is \$i = $i<br><br>");
$i=$i+1;
if($i>$input) break;
print_r("this is \$i = $i<br><br>");
$i=$i+2;
if($i>$input) break;
print_r("this is \$i = $i<br><br>");
$i=$i+4;
}
Online Compiler
def getSeq(n):
if n == 1:
return [1]
temp = [1]
seq = [ 1, 2, 4]
count, current, prev = 0, 0, 1
while True:
current = prev + seq[count]
if current > n:
break
prev = current
temp += [current]
count = (count + 1) % 3
return temp
print getSeq(20)
I'm pretty sure that this one is going to work
the case that we have to take care of is n == 1 and return a static result [1].
in other cases the second value is repeating circularly and adding up to previous value.
This Python solution should be implementable in any reasonable language:
limit = 20
n = 1 << 2
while n >> 2 < limit:
print(n >> 2)
n = (((n >> 2) + (2 ** (n & 3))) << 2) + ((n & 3) + 1) % 3
Perl Equivalent (using the style of for loop you expect):
$limit = 20;
for ($n = 1 << 2; $n >> 2 < $limit; $n = ((($n >> 2) + (2 ** ($n & 3))) << 2) + (($n & 3) + 1) % 3) {
print($n >> 2, "\n");
}
OUTPUT
1
2
4
8
9
11
15
16
18
EXPLANATION
The basic solution is this:
limit = 20
n = 1
i = 0
while n < limit:
print(n)
n = n + (2 ** i)
i = (i + 1) % 3
But we need to eliminate the extra variable i. Since i only cycles through 0, 1 and 2 we can store it in two bits. So we shift n up two bits and store the value for i in the lower two bits of n, adjusting the code accordingly.
Not only one variable and one for loop, no if statements either!

Algorithm to determine when midnight occurs based on segments of a traveling bus

I am writing an importer that takes information from a bus company and provides it in the following format:
1. Stations are numbered with indexes from 0 to n (0,1,2,3,4,5... etc)
The provider sends a list of segments: 0->1,0->3,4->5, etc, which represent the trips between the stations. Each station has at least one segment provided.
Each segment has an integer that represents how many times time travels past midnight.
So here are a few examples:
Example 1:
0->2: 1
0->3: 1
1->2: 0
1->3: 0
Which actually means that midnight occurs only once between station 0 and station 1.
Example 2:
0->2: 1
1->3: 1
2->3: 0
Which means that midnight occurs only once between station 1 and station 2
It is possible that in some cases the information will not be enough to find all midnight crossings, in which case the destination should be skipped.
Is there an algorithm for discovering these things?
My attempts so far:
I discovered that if I lay out all of the stations for the second example like:
0--------1--------2--------3
Then I apply the maximum crossings for 0-2 and 1-3, this means that:
0->1 has between 0 and 1 crossings
1->2 has between 0 and 1 crossings
2->3 has between 0 and 1 crossings
After that I apply the third rule - 2->3 has 0 crossings, which leaves me with:
0->1: 0,1
1->2: 0,1
2->3: 0
Which gives me the following combinations:
0,0,0
0,1,1
1,1,0
1,0,0
Then I apply the rules again (position 1 + position 2 should be 1 and position 2 + position 3 should be 1) and I'm left with only:
0,1,0
Which means that midnight occurs once between station 1 and station 2
However, this method requires generating all possible combinations between the numbers, which is not applicable to a programming algorithm. There is a possibility that each segment will have 0,1,2,3 and with 20 stations, that would be 4 to the power of 20 combinations.
Does anyone have another idea on how to do this?
You could solve this as a system of simultaneous equations using Guaussian elimination.
The number of midnight crossings between adjacent stations are your variables, and your co-efficients will just be 1 for every pair of stations included in a segment and 0 otherwise.
Take the second example:
0->2: 1
1->3: 1
2->3: 0
Think of 0->1 as variable a, 1->2 as variable b and 2->3 as variable c, then you can rewrite as:
a + b = 1
b + c = 1
c = 0
or in matrix form as
[ 1 1 0 ] [ a ] [ 1 ]
[ 0 1 1 ] [ b ] = [ 1 ]
[ 0 0 1 ] [ c ] [ 0 ]
(Number of columns in matrix should equal number of pairs of adjacent stations, number of rows is number of equations you have). Solve for a, b, c to find the number of midnight crossings between each pair of stations.
You have an additional constraint that the values are non-negative, so for example if a + b = 0 you know that both a and b are zero because it's not possible for one to be positive and the other negative. So you can just add a = 0 and b = 0 as two more equations to your system.
After using #samgak 's solution and converting the segments to variables and creating the matrix, I found a programming algorithm that calculates the final result.
You can find the algorithm in multiple languages here: https://rosettacode.org/wiki/Gaussian_elimination
Here is the PHP answer (that's what I needed):
function swap_rows(&$a, &$b, $r1, $r2)
{
if ($r1 == $r2) return;
$tmp = $a[$r1];
$a[$r1] = $a[$r2];
$a[$r2] = $tmp;
$tmp = $b[$r1];
$b[$r1] = $b[$r2];
$b[$r2] = $tmp;
}
function gauss_eliminate($A, $b, $N)
{
for ($col = 0; $col < $N; $col++)
{
$j = $col;
$max = $A[$j][$j];
for ($i = $col + 1; $i < $N; $i++)
{
$tmp = abs($A[$i][$col]);
if ($tmp > $max)
{
$j = $i;
$max = $tmp;
}
}
swap_rows($A, $b, $col, $j);
for ($i = $col + 1; $i < $N; $i++)
{
$tmp = $A[$i][$col] / $A[$col][$col];
for ($j = $col + 1; $j < $N; $j++)
{
$A[$i][$j] -= $tmp * $A[$col][$j];
}
$A[$i][$col] = 0;
$b[$i] -= $tmp * $b[$col];
}
}
$x = array();
for ($col = $N - 1; $col >= 0; $col--)
{
$tmp = $b[$col];
for ($j = $N - 1; $j > $col; $j--)
{
$tmp -= $x[$j] * $A[$col][$j];
}
$x[$col] = $tmp / $A[$col][$col];
}
return $x;
}
function test_gauss()
{
$a = array(
array(1.00, 0.00, 0.00, 0.00, 0.00, 0.00),
array(1.00, 0.63, 0.39, 0.25, 0.16, 0.10),
array(1.00, 1.26, 1.58, 1.98, 2.49, 3.13),
array(1.00, 1.88, 3.55, 6.70, 12.62, 23.80),
array(1.00, 2.51, 6.32, 15.88, 39.90, 100.28),
array(1.00, 3.14, 9.87, 31.01, 97.41, 306.02)
);
$b = array( -0.01, 0.61, 0.91, 0.99, 0.60, 0.02 );
$x = gauss_eliminate($a, $b, 6);
ksort($x);
print_r($x);
}
test_gauss();

Finding the GCD of two numbers quickly

Are there any ways to make this program faster? I am thinking about some faster tools for user input etc.
Here is my code:
sub partia {
my ( $u, $v ) = #_;
if ( $u == $v ) { return $u }
if ( $u == 0 ) { return $v }
if ( $v == 0 ) { return $u }
if ( ~$u & 1 ) {
if ( $v & 1 ) {
return partia( $u >> 1, $v );
}
else {
return partia( $u >> 1, $v >> 1 ) << 1;
}
}
if ( ~$v & 1 ) {
return partia( $u, $v >> 1 );
}
if ( $u > $v ) {
return partia( ( $u - $v ) >> 1, $v );
}
return partia( ( $v - $u ) >> 1, $u );
}
sub calosc {
$t = <>;
while ($t) {
#tab = split( /\s+/, <> );
print( partia( $tab[0], $tab[1] ), "\n" );
$t--;
}
}
calosc();
How does program works :
Generally it returns greatest common divisor for 2 numbers inputed by user. It's mostly Stein's algorithm.
INPUT :
First line:
How many pairs user wants to check.[enter]
Second line:
first number [space] second number[enter]
OUTPUT:
GCD[enter]
In Python I would use things like :
from sys import stdin
t=int(stdin.readline())
instead of
t=input()
Is there any way to do it?
Your solution — Recursive Stein's Algorithm
It appears that you're simply trying to get the GCD of two numbers, and wanting to do so quickly.
You're apparently using the recursive version of the Binary GCD Algorithm. Typically speaking, it is much better to use an iterative algorithm for both speed and scalability. However, I would assert that it is almost certainly worth it to try the much simpler Euclidean algorithm first.
Alternatives — Iterative Stein's Algorithm and Basic Euclidean Algorithm
I've adapted your script to take 3 number pairs from the __DATA__ block as input. The first pair are just two small numbers, then I have two numbers from the Fibonacci Sequence, and finally two larger numbers including some shared powers of two.
I then coded two new subroutines. One of them uses the Iterative Stein's Algorithm (the method your using), and the other is just a simple Euclidean Algorithm. Benchmarking your partia subroutine versus my two subroutine for 1 million iterations report that the iterative is 50% faster, and that Euclid is 3 times faster.
use strict;
use warnings;
use Benchmark;
#use Math::Prime::Util::GMP qw(gcd);
# Original solution
# - Stein's Algorithm (recursive)
sub partia {
my ( $u, $v ) = #_;
if ( $u == $v ) { return $u }
if ( $u == 0 ) { return $v }
if ( $v == 0 ) { return $u }
if ( ~$u & 1 ) {
if ( $v & 1 ) {
return partia( $u >> 1, $v );
}
else {
return partia( $u >> 1, $v >> 1 ) << 1;
}
}
if ( ~$v & 1 ) {
return partia( $u, $v >> 1 );
}
if ( $u > $v ) {
return partia( ( $u - $v ) >> 1, $v );
}
return partia( ( $v - $u ) >> 1, $u );
}
# Using Euclidian Algorithm
sub euclid {
my ( $quotient, $divisor ) = #_;
return $divisor if $quotient == 0;
return $quotient if $divisor == 0;
while () {
my $remainder = $quotient % $divisor;
return $divisor if $remainder == 0;
$quotient = $divisor;
$divisor = $remainder;
}
}
# Stein's Algorithm (Iterative)
sub stein {
my ($u, $v) = #_;
# GCD(0,v) == v; GCD(u,0) == u, GCD(0,0) == 0
return $v if $u == 0;
return $u if $v == 0;
# Remove all powers of 2 shared by U and V
my $twos = 0;
while ((($u | $v) & 1) == 0) {
$u >>= 1;
$v >>= 1;
++$twos;
}
# Remove Extra powers of 2 from U. From here on, U is always odd.
$u >>= 1 while ($u & 1) == 0;
do {
# Remove all factors of 2 in V -- they are not common
# Note: V is not zero, so while will terminate
$v >>= 1 while ($v & 1) == 0;
# Now U and V are both odd. Swap if necessary so U <= V,
# then set V = V - U (which is even). For bignums, the
# swapping is just pointer movement, and the subtraction
# can be done in-place.
($u, $v) = ($v, $u) if $u > $v;
$v -= $u;
} while ($v != 0);
return $u << $twos;
}
# Process 3 pairs of numbers
my #nums;
while (<DATA>) {
my ($num1, $num2) = split;
# print "Numbers = $num1, $num2\n";
# print ' partia = ', partia($num1, $num2), "\n";
# print ' euclid = ', euclid($num1, $num2), "\n";
# print ' stein = ', stein($num1, $num2), "\n";
# print ' gcd = ', gcd($num1, $num2), "\n\n";
push #nums, [$num1, $num2];
}
# Benchmark!
timethese(1_000_000, {
'Partia' => sub { partia(#$_) for #nums },
'Euclid' => sub { euclid(#$_) for #nums },
'Stein' => sub { stein(#$_) for #nums },
# 'GCD' => sub { gcd(#$_) for #nums },
});
__DATA__
20 25 # GCD of 5
89 144 # GCD of Fibonacci numbers = 1
4789084 957196 # GCD of 388 = 97 * 2 * 2
Outputs:
Benchmark: timing 1000000 iterations of Euclid, Partia, Stein...
Euclid: 9 wallclock secs ( 8.31 usr + 0.00 sys = 8.31 CPU) # 120279.05/s (n=1000000)
Partia: 26 wallclock secs (26.00 usr + 0.00 sys = 26.00 CPU) # 38454.14/s (n=1000000)
Stein: 18 wallclock secs (17.36 usr + 0.01 sys = 17.38 CPU) # 57544.02/s (n=1000000)
Module Solution — Math::Prime::Util::GMP qw(gcd)
The fastest solutions are likely to be C implementations of these algorithms though. I therefore recommend finding already coded versions like that provided by Math::Prime::Util::GMP.
Running benchmarks including this new function shows that it is twice again as fast as the basic Euclidean algorithm that I programmed:
Benchmark: timing 1000000 iterations of Euclid, GCD, Partia, Stein...
Euclid: 8 wallclock secs ( 8.32 usr + 0.00 sys = 8.32 CPU) # 120264.58/s (n=1000000)
GCD: 3 wallclock secs ( 3.93 usr + 0.00 sys = 3.93 CPU) # 254388.20/s (n=1000000)
Partia: 26 wallclock secs (25.94 usr + 0.00 sys = 25.94 CPU) # 38546.04/s (n=1000000)
Stein: 18 wallclock secs (17.55 usr + 0.00 sys = 17.55 CPU) # 56976.81/s (n=1000000)
Unless I've completely forgotten what I'm doing (no promises) - this algorithm looks like it keeps dividing it's terms by 2 in each recurse, which means your algorithm is O(log-base2-N). Unless you can find a constant-time algorithm, you've probably got the best one at the moment.
Now #ikegami has mentioned micro-optimizations...if you want to make those, I suggest that you check out Devel::NYTProf for an awesome Perl profiler that should be able to tell you where you're spending time in your algorithm, so you can target your microoptimisations.

Looking for an algorithm that distributes a width to columns

I would like to print a database-table to STDOUT. If the table-width is greater than the screen-width I would like to cut the column each with the same percentage (unless a table-width has reached min_width) until the table fits in the screen. I've tried to solve this with the posted subroutine. Does somebody know a shorter and more elegant algorithm to solve this problem?
sub cal_size {
my ( $maxcols, $ref ) = #_;
# $maxcols => screen width
# $ref => ref to an AoA; holds the table
my ( $max ) = cal_tab( $ref );
# $max => ref to an array; holds the length of the longest string of each column
# $tab = 2;
if ( $max and #$max ) {
my $sum = sum( #$max ) + $tab * #$max;
$sum -= $tab;
my #max_tmp = #$max;
my $percent = 0;
while ( $sum > $maxcols ) {
$percent += 0.5;
if ( $percent > 99 ) {
return;
}
my $count = 0;
for my $i ( 0 .. $#max_tmp ) {
# $min_width => columns width should not be less than $min_width if possible
next if $min_width >= $max_tmp[$i];
# no need to cut if the column width id less than min_width
next if $min_width >= minus_x_percent( $max_tmp[$i], $percent );
# don't cut if column width become less than min_width
$max_tmp[$i] = minus_x_percent( $max_tmp[$i], $percent );
$count++;
last if $sum <= $maxcols;
}
$min_width-- if $count == 0 and $min_width > 1;
# if no cuts but $sum > $maxcols reduce $min_width
$sum = sum( #max_tmp ) + $tab * #max_tmp;
$sum -= $tab;
}
my $rest = $maxcols - $sum;
while ( $rest > 0 ) { # distribute the rest
my $count = 0;
for my $i ( 0 .. $#max_tmp ) {
if ( $max_tmp[$i] < $max->[$i] ) {
$max_tmp[$i]++;
$rest--;
$count++;
last if $rest < 1;
}
}
last if $count == 0;
last if $rest < 1;
}
$max = [ #max_tmp ] if #max_tmp;
}
return $max;
}
sub minus_x_percent {
my ( $value, $percent ) = #_;
return int $value - ( $value * 1/100 * $percent );
}
This problem would be simple if it wasn't for the lower limit for the field widths. Once a field cannot get any smaller only the larger ones are eligible for scaling, so the calculation varies depending on whether all, none, or some of the fields have been scaled down to their limit.
The scaling has several bands, one per unique field width. As the fields are scaled down in equal proportion the smallest of them will be the first to hit the minimum field size limit. After that only the columns bigger than the smallest size can be reduced any further until the second smallest also reaches the limit.
This continues until all columns have reached their minimum size, after which the available space is just divided equally between the columns.
This program implements the calculations for that algorithm, and I think does what you want.
Note that the returned field widths are floating-point values and you must round them as you see fit.
use strict;
use warnings;
use List::Util 'max';
my $min_col_width = 10;
my $tab = 2;
my $widths = recalc_widths(80, [ 10, 15, 20, 25, 30 ]);
print join ' ', map sprintf('%.3f', $_), #$widths;
print "\n";
sub recalc_widths {
my ($target, $widths) = #_;
$target -= (#$widths - 1) * $tab;
my #sorted_widths = sort { $a <=> $b } #$widths;
my $num_limited = 0;
my $adjustable_total_width = 0;
$adjustable_total_width += $_ for #sorted_widths;
while (#sorted_widths) {
my $boundary = $sorted_widths[0];
my $scale = ($target - $num_limited * $min_col_width) / $adjustable_total_width;
if ($boundary * $scale >= $min_col_width) {
return [ map max($_ * $scale, $min_col_width), #$widths ];
}
while (#sorted_widths and $sorted_widths[0] == $boundary) {
shift #sorted_widths;
$adjustable_total_width -= $boundary;
$num_limited++;
}
}
return [ ($target / $num_limited) x $num_limited ];
}
output
10.000 10.333 13.778 17.222 20.667
I think it's better to use a proper module from CPAN. If you can calc the width of each column, then configure each row print with Text::Format

How should I find nearest neighbors for every element in a list?

I have two sets of integers A and B (size of A less than or equal to B), and I want to answer the question, "How close is A to B?". The way I want to answer this question is by producing a measure of how far you have to go from a given a in A to find a b in B.
The specific measure I want to produce does the following: for each a, find the closest b, the only catch being that once I match a b with an a, I can no longer use that b to match any other a's. (EDIT: the algorithm I'm trying to implement will always prefer a shorter match. So if b is the nearest neighbor to more than one a, pick the a nearest to b. I'm not sure what to do if more than one a has the same distance to b, right now I'm picking the a that precedes b, but that's quite arbitrary and not necessarily optimal.) The measure that I'll for make these sets, the final product, is a histogram showing the number of pairs in the vertical axis and the distance of the pairs in the x-axis.
So if A = {1, 3, 4} and B = {1, 5, 6, 7}, I will get the following a,b pairs: 1,1, 4,5, 3,6. For these data, the histogram should show one pair with distance zero, one pair with distance 1, and one pair with distance 3.
(The actual size of these sets has an upper bound around 100,000 elements, and I read them in from disk already sorted low to high. The integers range from 1 to ~20,000,000. EDIT: also, the elements of A and B are unique, i.e. no repeated elements.)
The solution I've come up with feels a bit clunky. I'm using Perl, but the problem is more or less language agnostic.
First I make a hash, with one key for each number that appears in the union of A and B and values indicating whether each number appears in A, B, or both, e.g. $hash{5} = {a=>1, b=>1} if the number 5 appears in both data-sets. (If it only appeared in A, you'd have $hash{5} = {a=>1}.)
Next, I iterate over A to find all the hash elements that appear in A and B, mark them in the measure, and remove them from the hash.
Then, I sort all the hash keys and make each element of the hash point to its nearest neighbors, like a linked list, where a given hash element now looks like $hash{6} = {b=>1, previous=>4, next=>8}. The linked list doesn't know whether the next and previous elements are in A or B.
Then I loop over pair distances starting at d=1, and find all pairs with distance d, mark them, remove them from the hash, until there are no more elements of A to match.
The loop looks like this:
for ($d=1; #a > 0; $d++) {
#left = ();
foreach $a in #a {
$next = $a;
# find closest b ahead of $a, stop searching if you pass $d
while (exists $hash{$next}{next} && $next - $a < $d) {
$next = $hash{$next}{next};
}
if ($next is in B && $next - $a == $d) {
# found a pair at distance $d
mark_in_measure($a, $next);
remove_from_linked_list($next);
remove_from_linked_list($a);
next;
}
# do same thing looking behind $a
$prev = $a;
...
# you didn't find a match for $a
push #left, $a;
}
#a = #left;
}
This loop obviously prefers pairs that match b's that appear later than a's; I can't tell whether there's a sensible way to decide whether later is better than prior (better in terms of getting closer pairs). The main optimization I'm interested in is processing time.
Sounds like you have a particular case of the Assignment Problem (finding a minimum matching in a weighted bipartite graph).
The algorithm to solve the assignment problem is too slow for you at O(N^3) but I'm pretty sure there you can shave some of this complexity off by exploiting your particular weight function or how you only want a histogram instead of the exact matching.
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
use diagnostics;
# http://www.hungarianalgorithm.com/solve.php?c=3-2-6-22--7-2-2-18--13-8-4-12--23-18-14-2&random=1
# https://www.topcoder.com/community/data-science/data-science-tutorials/assignment-problem-and-hungarian-algorithm/
# http://www.cse.ust.hk/~golin/COMP572/Notes/Matching.pdf
my #mat;
my #out_mat;
my $spaces = 6;
my $precision = 0;
my $N = 10;
my $M = 12;
my $r = 100;
my #array1; my #array2;
for my $i (1..$N) {
push #array1, sprintf( "%.${precision}f", rand($r) );
}
for my $i (1..$M) {
push #array2, sprintf( "%.${precision}f", rand($r) );
}
##array1 = ( 1, 3, 4); # $mat[i]->[j] = abs( array1[i] - array2[j] )
##array2 = ( 1, 5, 6, 7);
# 1 5 6 7
# 1 [ 0* 4 5 6 ]
# 3 [ 2 2* 3 4 ]
# 4 [ 3 1 2* 3 ]
my $min_size = $#array1 < $#array2 ? $#array1 : $#array2;
my $max_size = $#array1 > $#array2 ? $#array1 : $#array2;
for (my $i = 0; $i < #array1; $i++){
my #weight_function;
for (my $j = 0; $j < #array2; $j++){
my $dif = sprintf( "%.${precision}f", abs ($array1[$i] - $array2[$j]) );
#my $dif = sprintf( "%.${precision}f", ($array1[$i] - $array2[$j])**2 );
push #weight_function, $dif;
}
push #mat, \#weight_function;
}
# http://cpansearch.perl.org/src/TPEDERSE/Algorithm-Munkres-0.08/lib/Algorithm/Munkres.pm
Algorithm::Munkres::assign(\#mat,\#out_mat);
print "\n\#out_mat index = [";
for my $index (#out_mat) {
printf("%${spaces}d", $index);
}
print " ]\n";
print "\#out_mat values = [";
my %hash;
for my $i (0 .. $max_size){
my $j = $out_mat[$i];
last if ( $i > $min_size and $#array1 < $#array2 );
next if ( $j > $min_size and $#array1 > $#array2 );
my $dif = $mat[$i]->[$j];
printf( "%${spaces}.${precision}f", $dif );
$hash{ $dif } { $i } { 'index_array1' } = $i;
$hash{ $dif } { $i } { 'index_array2' } = $j;
$hash{ $dif } { $i } { 'value_array1' } = $array1[$i];
$hash{ $dif } { $i } { 'value_array2' } = $array2[$j];
}
print " ]\n\n";
my $soma_da_dif = 0;
foreach my $min_diferenca ( sort { $a <=> $b } keys %hash ){
foreach my $k ( sort { $a <=> $b } keys %{$hash{$min_diferenca}} ){
$soma_da_dif += $min_diferenca;
my $index_array1 = $hash{ $min_diferenca } { $k } { 'index_array1' };
my $index_array2 = $hash{ $min_diferenca } { $k } { 'index_array2' };
my $value_array1 = $hash{ $min_diferenca } { $k } { 'value_array1' };
my $value_array2 = $hash{ $min_diferenca } { $k } { 'value_array2' };
printf( " index (%${spaces}.0f,%${spaces}.0f), values (%${spaces}.${precision}f,%${spaces}.${precision}f), dif = %${spaces}.${precision}f\n",
$index_array1, $index_array2, $value_array1, $value_array2, $min_diferenca );
}
}
print "\n\nSum = $soma_da_dif\n";
#-------------------------------------------------#
#------------------ New-Package ------------------#
{ # start scope block
package Algorithm::Munkres;
use 5.006;
use strict;
use warnings;
require Exporter;
our #ISA = qw(Exporter);
our #EXPORT = qw( assign );
our $VERSION = '0.08';
...
... <---- copy all the 'package Algorithm::Munkres' here
...
return $minval;
}
1; # don't forget to return a true value from the file
} # end scope block

Resources