Similarity between two ordered lists of numbers with fuzziness

Similarity between two ordered lists of numbers with fuzziness - algorithm

I have ordered lists of numbers (like barcode positions, spectral lines) that I am trying to compare for similarity. Ideally, I would like to compare two lists to get a value from 1.0 (match) degrading gracefully to 0.
The lists could be offset by an arbitrary amount, and that should not degrade the match. The diffs between adjacent items are the most applicable characterization.
Due to noise in the system, some items may be missing (alternatively, extra items may be inserted, depending on point of view).
The diff values may be reordered.
The diff values may be scaled.
Multiple transformations above may be applied and each should reduce similarity proportionally.
Here is some test data:
# deltas
d = [100+(i*10) for i in xrange(10)] # [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
d_swap = d[:4] + [d[5]] + [d[4]] + d[6:] # [100, 110, 120, 130, 150, 140, 160, 170, 180, 190]
# absolutes
a = [1000+j for j in [0]+[sum(d[:i+1]) for i in xrange(len(d))]] # [1000, 1100, 1210, 1330, 1460, 1600, 1750, 1910, 2080, 2260, 2450]
a_offs = [i+3000 for i in a] # [4000, 4100, 4210, 4330, 4460, 4600, 4750, 4910, 5080, 5260, 5450]
a_rm = a[:2] + a[3:] # [1000, 1100, 1330, 1460, 1600, 1750, 1910, 2080, 2260, 2450]
a_add = a[:7] + [(a[6]+a[7])/2] + a[7:] # [1000, 1100, 1210, 1330, 1460, 1600, 1750, 1830, 1910, 2080, 2260, 2450]
a_swap = [1000+j for j in [0]+[sum(d_swap[:i+1]) for i in xrange(len(d_swap))]] # [1000, 1100, 1210, 1330, 1460, 1610, 1750, 1910, 2080, 2260, 2450]
a_stretch = [1000+j for j in [0]+[int(sum(d[:i+1])*1.1) for i in xrange(len(d))]] # [1000, 1110, 1231, 1363, 1506, 1660, 1825, 2001, 2188, 2386, 2595]
a_squeeze = [1000+j for j in [0]+[int(sum(d[:i+1])*0.9) for i in xrange(len(d))]] # [1000, 1090, 1189, 1297, 1414, 1540, 1675, 1819, 1972, 2134, 2305]
Sim(a, a_offs) should be 1.0 since offset is not considered a penalty.
Sim(a, a_rm) and Sim(a, a_add) should be about 0.91 because 10 of 11 or 11 of 12 match.
Sim(a, a_swap) should be about 0.96 because one diff is out of place (possibly with a further penalty based on distance if moved more than one position).
Sim(a, a_stretch) and Sim(a, a_squeeze) should be about 0.9 because diffs were scaled by about 1 part in 10.
I am thinking of something like difflib.SequenceMatcher but that works for numeric values with fuzziness instead of hard-compared hashables. It would also need to retain some awareness of the diff (first derivative) relationship.
This seems to be a dynamic programming problem, but I can't figure out how to construct an appropriate cost metric.

Related

serialize ruby fixnum by denominator into an array

I have a number, say it's 1000000. I want to give a second number, say 100 and create an Array that looks like [0,100,200,300,400....1000000]. I'm having trouble finding a way to iterate in a way that would serialize based on a given denominator. Any ideas?

n = 200
i = 10
(0..n).step(i).to_a
#=> [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120,
# 130, 140, 150, 160, 170, 180, 190, 200]
See Range#step.

Filter sequence to subsequence with elements evenly apart

I have an input of integers (or floats), eg.
100, 203, 230, 280, 400, 410, 505, 600
And I want to filter them to get a subsequence so that the numbers would be almost evenly apart, that is, remove outsiders, in my case, the filtered sequence would be:
100, 203, 280, 400, 505, 600
Since they all lie about 100 units from each other.
In addition I also know that the average distance of the whole sequence is limited, eg.
70 < dist < 130.
What algorithm should I use?
Extra: Could the algorithm be improved that it would also work if now and then an element of the sequence would be missing, e.g. 505:
100, 203, 230, 280, 400, 410, 600
So the result would be either
100, 203, 280, 400, 600
or something like
100, 203, 280, 400, 500, 600
Also: I am working with semi-big data and would prefer prefer a fast solutions (not checking all possible subsequences for example).

Finding positions of milestones given their pairwise distances

There is a straight road with 'n' number of milestones. You are given
an array with the distance between all the pairs of milestones in
some random order. Find the position of milestones.
Example:
Consider a road with 4 milestones (a,b,c,d) :
a ---3Km--- b ---5Km--- c ---2Km--- d
Distance between a and b is 3
Distance between a and c is 8
Distance between a and d is 10
Distance between b and c is 5
Distance between b and d is 7
Distance between c and d is 2
All the above values are given in a random order say 7, 10, 5, 2, 8, 3.
The output must be 3, 5, 2 or 2, 5, 3.
Assuming the length of the give array is n. My idea is:
Calculate the number of milestones by solving a quadratic equation, saying it's x.
There are P(n, x-1) possibilities.
Validate every possible permutation.
Is there any better solution for this problem?

I can't find an algorithm for this that has good worst-case behaviour. However, the following heuristic may be useful for practical solution:
Say the first landmark is at position zero. You can find the last landmark. Then all other landmark positions need to appear in the input array. Their distances to the last landmark must also appear.
Let's build a graph on these possible landmark positions.
If a and b are two possible landmark positions, then either |a-b| appears in the input array or at least one of a and b isn't a landmark position. Draw an edge between a and b if |a-b| appears in the input array.
Iteratively filter out landmark positions whose degree is too small.
You wind up with something that's almost a clique-finding problem. Find an appropriately large clique; it corresponds to a positioning of the landmarks. Check that this positioning actually gives rise to the right distances.
At worst here, you've narrowed down the possible landmark positions to a more manageable set.

Ok. I will give my idea , which could reduce the number of permutations.
Finding n, is simple, you could even run a Reverse factorial https://math.stackexchange.com/questions/171882/is-there-a-way-to-reverse-factorials
Assumption:
Currently I have no idea of how to find the numbers. But I assume you have found out the numbers somehow. After finding n and elements we could apply this for partial reduction of computation.
Consider a problem like,
|<--3-->|<--6-->|<--1-->|<--7-->|
A B C D E
Now as you said, the sum they will give (in random order too) 3,9,10,17,6,7,14,1,8,7.
But you could take any combination (mostly it will be wrong ),
6-3-1-7. (say this is our taken combination)
Now,
6+3 -> 9 There, so Yes //Checking in the list whether the 2 numbers could possibly be adjacent.
3+1 -> 4 NOT THERE, so cannot
1+7 -> 8 There, So Yes
6+7 -> 13 NOT THERE, So cannot be ajacent
Heart concept :
For, 2 numbers to be adjacent, their sum must be there in the list. If the sum is not in the list, then the numbers are not adjacent.
Optimization :
So, 3 and 1 will not come nearby. And 6 and 7 will not come nearby.
Hence while doing permutation, we could eliminate
*31*,*13*,*76* and *67* combinations. Where * is 0 or more no of digits either preceding or succeeding.
i.e instead of trying permutation for 4! = 24 times, we could only check for 3617,1637,3716,1736. ie only 4 times. i.e 84% of computation is saved.
Worst case :
Say in your case it is 5,2,3.
Now, we have to perform this operation.
5+2 -> 7 There
2+3 -> 5 There
5+3 -> 8 There
Oops, your example is worst case, where we could not optimize the solution in these type of cases.

Place the milestones one by one
EDIT See new implementation below (with timings).
The key idea is the following:
Build a list of milestones one by one, starting with one milestone at 0 and a milestone at max(distances). Lets call them endpoints.
The largest distance that's not accounted for has to be from one of the endpoints, which leaves at most two positions for the corresponding milestone.
The following Python program simply checks if the milestone can be placed from the left endpoint, and if not, tries to place the milestone from the right endpoint (always using the largest distances that's not accounted for by the already placed milestones). This has to be done with back-tracking, as placements may turn out wrong later.
Note that there is another (mirrored) solution that is not output. (I don't think there can be more than 2 solutions (symmetric), but I haven't proven it.)
I consider the position of the milestones as the solution and use a helper function steps for the output desired by the OP.
from collections import Counter
def milestones_from_dists(dists, milestones=None):
if not dists: # all dist are acounted for: we have a solution!
return milestones
if milestones is None:
milestones = [0]
max_dist = max(dists)
solution_from_left = try_milestone(dists, milestones, min(milestones) + max_dist)
if solution_from_left is not None:
return solution_from_left
return try_milestone(dists, milestones, max(milestones) - max_dist)
def try_milestone(dists, milestones, new_milestone):
unused_dists = Counter(dists)
for milestone in milestones:
dist = abs(milestone - new_milestone)
if unused_dists[dist]:
unused_dists[dist] -= 1
if unused_dists[dist] == 0:
del unused_dists[dist]
else:
return None # no solution
return milestones_from_dists(unused_dists, milestones + [new_milestone])
def steps(milestones):
milestones = sorted(milestones)
return [milestones[i] - milestones[i - 1] for i in range(1, len(milestones))]
Example usage:
>>> print(steps(milestones_from_dists([7, 10, 5, 2, 8, 3])))
[3, 5, 2]
>>> import random
>>> milestones = random.sample(range(1000), 100)
>>> dists = [abs(x - y) for x in milestones for y in milestones if x < y]
>>> solution = sorted(milestones_from_dists(dists))
>>> solution == sorted(milestones)
True
>>> print(solution)
[0, 10, 16, 23, 33, 63, 72, 89, 97, 108, 131, 146, 152, 153, 156, 159, 171, 188, 210, 211, 212, 215, 219, 234, 248, 249, 273, 320, 325, 329, 339, 357, 363, 387, 394, 396, 402, 408, 412, 418, 426, 463, 469, 472, 473, 485, 506, 515, 517, 533, 536, 549, 586, 613, 614, 615, 622, 625, 630, 634, 640, 649, 651, 653, 671, 674, 697, 698, 711, 715, 720, 730, 731, 733, 747, 758, 770, 772, 773, 776, 777, 778, 783, 784, 789, 809, 828, 832, 833, 855, 861, 873, 891, 894, 918, 952, 953, 968, 977, 979]
>>> print(steps(solution))
[10, 6, 7, 10, 30, 9, 17, 8, 11, 23, 15, 6, 1, 3, 3, 12, 17, 22, 1, 1, 3, 4, 15, 14, 1, 24, 47, 5, 4, 10, 18, 6, 24, 7, 2, 6, 6, 4, 6, 8, 37, 6, 3, 1, 12, 21, 9, 2, 16, 3, 13, 37, 27, 1, 1, 7, 3, 5, 4, 6, 9, 2, 2, 18, 3, 23, 1, 13, 4, 5, 10, 1, 2, 14, 11, 12, 2, 1, 3, 1, 1, 5, 1, 5, 20, 19, 4, 1, 22, 6, 12, 18, 3, 24, 34, 1, 15, 9, 2]
New implementation incorporationg suggestions from the comments
from collections import Counter
def milestones_from_dists(dists):
dists = Counter(dists)
right_end = max(dists)
milestones = [0, right_end]
del dists[right_end]
sorted_dists = sorted(dists)
add_milestones_from_dists(dists, milestones, sorted_dists, right_end)
return milestones
def add_milestone
s_from_dists(dists, milestones, sorted_dists, right_end):
if not dists:
return True # success!
# find max dist that's not fully used yet
deleted_dists = []
while not dists[sorted_dists[-1]]:
deleted_dists.append(sorted_dists[-1])
del sorted_dists[-1]
max_dist = sorted_dists[-1]
# for both possible positions, check if this fits the already placed milestones
for new_milestone in [max_dist, right_end - max_dist]:
used_dists = Counter() # for backing up
for milestone in milestones:
dist = abs(milestone - new_milestone)
if dists[dist]: # this distance is still available
dists[dist] -= 1
if dists[dist] == 0:
del dists[dist]
used_dists[dist] += 1
else: # no solution
dists.update(used_dists) # back up
sorted_dists.extend(reversed(deleted_dists))
break
else: # unbroken
milestones.append(new_milestone)
success = add_milestones_from_dists(dists, milestones, sorted_dists, right_end)
if success:
return True
dists.update(used_dists) # back up
sorted_dists.extend(reversed(deleted_dists))
del milestones[-1]
return False
def steps(milestones):
milestones = sorted(milestones)
return [milestones[i] - milestones[i - 1] for i in range(1, len(milestones))]
Timings for random milestones in the range from 0 to 100000:
n = 10: 0.00s
n = 100: 0.05s
n = 1000: 3.20s
n = 10000: still takes too long.

The largest distance in the given set of distance is the distance between the first and the last milestone, i.e. in your example 10. You can find this in O(n) step.
For every other milestone (every one except the first or the last), you can find their distances from the first and the last milestone by looking for a pair of distances that sums up to the maximum distance, i.e. in your example 7+3 = 10, 8+2 = 10. You can find these pairs trivially in O(n^2).
Now if you think the road is from east to west, what remains is that for all the interior milestones (all but the first or the last), you need to know which one of the two distances (e.g. 7 and 3, or 8 and 2) is towards east (the other is then towards west).
You can trivially enumerate all the possibilities in time O(2^(n-2)), and for every possible orientation check that you get the same set of distances as in the problem. This is faster than enumerating through all permutations of the smallest distances in the set.
For example, if you assume 7 and 8 are towards west, then the distance between the two internal milestones is 1 mile, which is not in the problem set. So it must be 7 towards west, 8 towards east, leading to solution (or it's mirror)
WEST | -- 2 -- | -- 5 -- | -- 3 -- | EAST
For a larger set of milestones, you would just start guessing the orientation of the two distances to the endpoints, and whenever you product two milestones that have a distance between them that is not in the problem set, you backtrack.

Mathematical representation of large numbers?

I am attempting to write a function which takes a large number as input (upwards of 800 digits long) and returns a simple formula of no complex math as a string.
By simple math, I mean just numbers with +,-,*,/,^ and () as needed.
'4^25+2^32' = giveMeMath(1125904201809920); // example
Any language would do. I can refactor it, just looking for some help with the logic.
Bonus. The shorter the output the better. Processing time is important. Also, mathematical accuracy is a must.
Update:
to clarify, all input values will be positive integers (no decimals)

I think the entire problem can be recast to a run-length encoding problem on the binary representation of the long integer.
For example, take the following number:
17976931348623159077293051907890247336179769789423065727343008115773
26758055009631327084773224075360211201138798713933576587897688144166
22492847430639474110969959963482268385702277221395399966640087262359
69162804527670696057843280792693630866652907025992282065272811175389
6392184596904358265409895975218053120L
This looks fairly horrendous. In binary, though:
>>> bin(_)
'0b11111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111100000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000
0000000'
Which is about 500 ones, followed by 500 zeroes. This suggests an expression like:
2**1024 - 2**512
Which is how I obtained the large number in the first place.
If there are no significantly long runs in the binary representation of the integer, this won't work well at all. 101010101010101010.... is the worst case.

Here is my attempt in Python:
def give_me_math(n):
if n % 2 == 1:
n = n - 1 # we need to make every odd number even, and add back one later
odd = 1
else:
odd = 0
exps = []
while n > 0:
c = 0
num = 0
while num <= n/2:
c += 1
num = 2**c
exps.append(c)
n = n - num
return (exps, odd)
Results:
>>> give_me_math(100)
([6, 5, 2], 0) #2**6 + 2**5 + 2**2 + 0 = 100
>>> give_me_math(99)
([6, 5, 1], 1) #2**6 + 2**5 + 2**1 + 1 = 99
>>> give_me_math(103)
([6, 5, 2, 1], 1) #2**6 + 2**5 + 2**2 + 2**1 + 1 = 103
I believe the results are accurate, but I am not sure about your other criteria.
Edit:
Result: Calculates in about a second.
>>> give_me_math(10**100 + 3435)
([332, 329, 326, 323, 320, 319, 317, 315, 314, 312, 309, 306, 304, 303, 300, 298, 295, 294, 289, 288, 286, 285, 284, 283, 282, 279, 278, 277, 275, 273, 272, 267, 265, 264, 261, 258, 257, 256, 255, 250, 247, 246, 242, 239, 238, 235, 234, 233, 227, 225, 224, 223, 222, 221, 220, 217, 216, 215, 211, 209, 207, 206, 203, 202, 201, 198, 191, 187, 186, 185, 181, 176, 172, 171, 169, 166, 165, 164, 163, 162, 159, 157, 155, 153, 151, 149, 148, 145, 142, 137, 136, 131, 127, 125, 123, 117, 115, 114, 113, 111, 107, 106, 105, 104, 100, 11, 10, 8, 6, 5, 3, 1], 1)
800 digit works fast too:
>>> give_me_math(10**800 + 3452)
But the output is too long to post here, which is OPs concern of course.
Time complexity here is 0(ln(n)), so it is pretty efficient.

In java, you should take a look at the BigDecimal class in java.math package.

I'd suggest you to have a look at
The GMP library (The GNU Multiple Precision Arithmetic Library) for performing the arithmetics
Take a look at integer factorization. The link redirects to Wikipedia which should give probably a good overview. However to be a bit more scientific:
Integer factorization (PDF) by Daniel Bernstein of the University of Illinois
Integer Factorization Algorithms (PDF) by Connelly Barnes of the Department of Physics, Oregon State University

fitting n variable height images into 3 (similar length) column layout

I'm looking to make a 3-column layout similar to that of piccsy.com. Given a number of images of the same width but varying height, what is a algorithm to order them so that the difference in column lengths is minimal? Ideally in Python or JavaScript...
Thanks a lot for your help in advance!
Martin

How many images?
If you limit the maximum page size, and have a value for the minimum picture height, you can calculate the maximum number of images per page. You would need this when evaluating any solution.
I think there were 27 pictures on the link you gave.
The following uses the first_fit algorithm mentioned by Robin Green earlier but then improves on this by greedy swapping.
The swapping routine finds the column that is furthest away from the average column height then systematically looks for a swap between one of its pictures and the first picture in another column that minimizes the maximum deviation from the average.
I used a random sample of 30 pictures with heights in the range five to 50 'units'. The convergenge was swift in my case and improved significantly on the first_fit algorithm.
The code (Python 3.2:
def first_fit(items, bincount=3):
items = sorted(items, reverse=1) # New - improves first fit.
bins = [[] for c in range(bincount)]
binsizes = [0] * bincount
for item in items:
minbinindex = binsizes.index(min(binsizes))
bins[minbinindex].append(item)
binsizes[minbinindex] += item
average = sum(binsizes) / float(bincount)
maxdeviation = max(abs(average - bs) for bs in binsizes)
return bins, binsizes, average, maxdeviation
def swap1(columns, colsize, average, margin=0):
'See if you can do a swap to smooth the heights'
colcount = len(columns)
maxdeviation, i_a = max((abs(average - cs), i)
for i,cs in enumerate(colsize))
col_a = columns[i_a]
for pic_a in set(col_a): # use set as if same height then only do once
for i_b, col_b in enumerate(columns):
if i_a != i_b: # Not same column
for pic_b in set(col_b):
if (abs(pic_a - pic_b) > margin): # Not same heights
# new heights if swapped
new_a = colsize[i_a] - pic_a + pic_b
new_b = colsize[i_b] - pic_b + pic_a
if all(abs(average - new) < maxdeviation
for new in (new_a, new_b)):
# Better to swap (in-place)
colsize[i_a] = new_a
colsize[i_b] = new_b
columns[i_a].remove(pic_a)
columns[i_a].append(pic_b)
columns[i_b].remove(pic_b)
columns[i_b].append(pic_a)
maxdeviation = max(abs(average - cs)
for cs in colsize)
return True, maxdeviation
return False, maxdeviation
def printit(columns, colsize, average, maxdeviation):
print('columns')
pp(columns)
print('colsize:', colsize)
print('average, maxdeviation:', average, maxdeviation)
print('deviations:', [abs(average - cs) for cs in colsize])
print()
if __name__ == '__main__':
## Some data
#import random
#heights = [random.randint(5, 50) for i in range(30)]
## Here's some from the above, but 'fixed'.
from pprint import pprint as pp
heights = [45, 7, 46, 34, 12, 12, 34, 19, 17, 41,
28, 9, 37, 32, 30, 44, 17, 16, 44, 7,
23, 30, 36, 5, 40, 20, 28, 42, 8, 38]
columns, colsize, average, maxdeviation = first_fit(heights)
printit(columns, colsize, average, maxdeviation)
while 1:
swapped, maxdeviation = swap1(columns, colsize, average, maxdeviation)
printit(columns, colsize, average, maxdeviation)
if not swapped:
break
#input('Paused: ')
The output:
columns
[[45, 12, 17, 28, 32, 17, 44, 5, 40, 8, 38],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 34, 9, 37, 44, 30, 20, 28]]
colsize: [286, 267, 248]
average, maxdeviation: 267.0 19.0
deviations: [19.0, 0.0, 19.0]
columns
[[45, 12, 17, 28, 17, 44, 5, 40, 8, 38, 9],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 34, 37, 44, 30, 20, 28, 32]]
colsize: [263, 267, 271]
average, maxdeviation: 267.0 4.0
deviations: [4.0, 0.0, 4.0]
columns
[[45, 12, 17, 17, 44, 5, 40, 8, 38, 9, 34],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 37, 44, 30, 20, 28, 32, 28]]
colsize: [269, 267, 265]
average, maxdeviation: 267.0 2.0
deviations: [2.0, 0.0, 2.0]
columns
[[45, 12, 17, 17, 44, 5, 8, 38, 9, 34, 37],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 44, 30, 20, 28, 32, 28, 40]]
colsize: [266, 267, 268]
average, maxdeviation: 267.0 1.0
deviations: [1.0, 0.0, 1.0]
columns
[[45, 12, 17, 17, 44, 5, 8, 38, 9, 34, 37],
[7, 34, 12, 19, 41, 30, 16, 7, 23, 36, 42],
[46, 44, 30, 20, 28, 32, 28, 40]]
colsize: [266, 267, 268]
average, maxdeviation: 267.0 1.0
deviations: [1.0, 0.0, 1.0]
Nice problem.
Heres the info on reverse-sorting mentioned in my separate comment below.
>>> h = sorted(heights, reverse=1)
>>> h
[46, 45, 44, 44, 42, 41, 40, 38, 37, 36, 34, 34, 32, 30, 30, 28, 28, 23, 20, 19, 17, 17, 16, 12, 12, 9, 8, 7, 7, 5]
>>> columns, colsize, average, maxdeviation = first_fit(h)
>>> printit(columns, colsize, average, maxdeviation)
columns
[[46, 41, 40, 34, 30, 28, 19, 12, 12, 5],
[45, 42, 38, 36, 30, 28, 17, 16, 8, 7],
[44, 44, 37, 34, 32, 23, 20, 17, 9, 7]]
colsize: [267, 267, 267]
average, maxdeviation: 267.0 0.0
deviations: [0.0, 0.0, 0.0]
If you have the reverse-sorting, this extra code appended to the bottom of the above code (in the 'if name == ...), will do extra trials on random data:
for trial in range(2,11):
print('\n## Trial %i' % trial)
heights = [random.randint(5, 50) for i in range(random.randint(5, 50))]
print('Pictures:',len(heights))
columns, colsize, average, maxdeviation = first_fit(heights)
print('average %7.3f' % average, '\nmaxdeviation:')
print('%5.2f%% = %6.3f' % ((maxdeviation * 100. / average), maxdeviation))
swapcount = 0
while maxdeviation:
swapped, maxdeviation = swap1(columns, colsize, average, maxdeviation)
if not swapped:
break
print('%5.2f%% = %6.3f' % ((maxdeviation * 100. / average), maxdeviation))
swapcount += 1
print('swaps:', swapcount)
The extra output shows the effect of the swaps:
## Trial 2
Pictures: 11
average 72.000
maxdeviation:
9.72% = 7.000
swaps: 0
## Trial 3
Pictures: 14
average 118.667
maxdeviation:
6.46% = 7.667
4.78% = 5.667
3.09% = 3.667
0.56% = 0.667
swaps: 3
## Trial 4
Pictures: 46
average 470.333
maxdeviation:
0.57% = 2.667
0.35% = 1.667
0.14% = 0.667
swaps: 2
## Trial 5
Pictures: 40
average 388.667
maxdeviation:
0.43% = 1.667
0.17% = 0.667
swaps: 1
## Trial 6
Pictures: 5
average 44.000
maxdeviation:
4.55% = 2.000
swaps: 0
## Trial 7
Pictures: 30
average 295.000
maxdeviation:
0.34% = 1.000
swaps: 0
## Trial 8
Pictures: 43
average 413.000
maxdeviation:
0.97% = 4.000
0.73% = 3.000
0.48% = 2.000
swaps: 2
## Trial 9
Pictures: 33
average 342.000
maxdeviation:
0.29% = 1.000
swaps: 0
## Trial 10
Pictures: 26
average 233.333
maxdeviation:
2.29% = 5.333
1.86% = 4.333
1.43% = 3.333
1.00% = 2.333
0.57% = 1.333
swaps: 4

This is the offline makespan minimisation problem, which I think is equivalent to the multiprocessor scheduling problem. Instead of jobs you have images, and instead of job durations you have image heights, but it's exactly the same problem. (The fact that it involves space instead of time doesn't matter.) So any algorithm that (approximately) solves either of them will do.

Here's an algorithm (called First Fit Decreasing) that will get you a very compact arrangement, in a reasonable amount of time. There may be a better algorithm but this is ridiculously simple.
Sort the images in order from tallest to shortest.
Take the first image, and place it in the shortest column.
(If multiple columns are the same height (and shortest) pick any one.)
Repeat step 2 until no images remain.
When you're done, you can re-arrange the elements in the each column however you choose if you don't like the tallest-to-shortest look.

Here's one:
// Create initial solution
<run First Fit Decreasing algorithm first>
// Calculate "error", i.e. maximum height difference
// after running FFD
err = (maximum_height - minimum_height)
minerr = err
// Run simple greedy optimization and random search
repeat for a number of steps: // e.g. 1000 steps
<find any two random images a and b from two different columns such that
swapping a and b decreases the error>
if <found>:
swap a and b
err = (maximum_height - minimum_height)
if (err < minerr):
<store as best solution so far> // X
else:
swap two random images from two columns
err = (maximum_height - minimum_height)
<output the best solution stored on line marked with X>

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Similarity between two ordered lists of numbers with fuzziness - algorithm

Related

serialize ruby fixnum by denominator into an array

Filter sequence to subsequence with elements evenly apart

Finding positions of milestones given their pairwise distances

Mathematical representation of large numbers?

fitting n variable height images into 3 (similar length) column layout

Categories

Resources