How to find out which subtotals make up a sum? - algorithm

I need to find the numbers in a list that make up a specific total:
Sum: 500
Subtotals: 10 490 20 5 5
In the end I need: {10 490, 490 5 5}
What is this type of problem called? Are there algorithms to solve it efficiently?

This is the knapsack problem, and it is NP-complete, i.e. no efficient (polynomial-time) algorithm is known for it.

This is not a knapsack problem.
In the worst case, with N subtotals, there can be exponentially many (up to 2^N) solutions, so any algorithm that lists them all needs at least that much time in the worst case (thus this enumeration task is not a decision problem and does not belong to the class NP at all).
Let's assume there are no non-positive elements in the Subtotals array and no element is greater than Sum. We can sort the array of subtotals in descending order, then build an array of tail sums, appending 0 at the end. In your example, it will look like:
Subtotals: (490, 20, 10, 5, 5)
PartialSums: (530, 40, 20, 10, 5, 0)
Now for any "remaining sum" S, position i, and "current list" L we have problem E(S, i, L):
E(0, i, L) = (print L).
E(S, i, L) | (PartialSums[i] < S) = (nothing).
E(S, i, L) = E(S, i+1, L), E(S-Subtotals[i], j, L||Subtotals[i]), where L||x appends x to L, and j is the index of the first element of Subtotals that is less than or equal to (S-Subtotals[i]), or i+1, whichever is greater.
Our problem is E(Sum, 0, {}).
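A minimal Python sketch of the recurrence E(S, i, L), assuming all subtotals are positive integers. The function name `all_subsets_summing_to` is illustrative, not from the answer. Like the recurrence, it treats equal values as distinct items, so duplicates can produce repeated solutions.

```python
def all_subsets_summing_to(subtotals, total):
    """Enumerate sub-multisets of positive `subtotals` that sum to `total`.

    Sketch of E(S, i, L): values sorted descending, tail[i] = sum(values[i:]).
    """
    # values larger than the target can never be used
    values = sorted((v for v in subtotals if v <= total), reverse=True)
    n = len(values)
    tail = [0] * (n + 1)  # tail[n] = 0 plays the role of the appended 0
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + values[i]

    results = []

    def search(remaining, i, chosen):
        if remaining == 0:                 # E(0, i, L): report L
            results.append(list(chosen))
            return
        if tail[i] < remaining:            # even taking everything left is too little
            return
        search(remaining, i + 1, chosen)   # branch 1: skip values[i]
        rest = remaining - values[i]       # branch 2: take values[i]
        j = i + 1
        while j < n and values[j] > rest:  # jump to the first value <= rest
            j += 1
        chosen.append(values[i])
        search(rest, j, chosen)
        chosen.pop()

    search(total, 0, [])
    return results


print(all_subsets_summing_to([10, 490, 20, 5, 5], 500))
```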
Of course, there's a problem with duplicates (if there were another 490 in your list, this algorithm would output 4 solutions). If that's not what you need, using an array of (value, multiplicity) pairs may help.
P.S. You may also consider dynamic programming if Sum is small enough (every set below holds at most Sum+1 values):
Start with the set {0}. Create an array of sets with one entry per subtotal.
For every subtotal, create a new set from the previous set by adding the subtotal value to each element. Remove all elements greater than Sum. Merge it with the previous set (the result is the set of all sums reachable so far).
If the final set doesn't contain Sum, there is no solution. Otherwise, backtrack the solution from Sum to 0, checking whether the previous set contains [value] and [value-subtotal].
Example:
(10, 490, 20, 5, 5)
Sets:
(0)
(0, 10)
(0, 10, 490, 500)
(0, 10, 20, 30, 490, 500) (510, 520 - discarded)
(0, 5, 10, 15, 20, 25, 30, 35, 490, 495, 500)
(0, 5, 10, 15, 20, 25, 30, 35, 40, 490, 495, 500)
From the last set: [500-5] is in the previous set, [495-5] is in the previous set, [490-20] is not in the previous set (but [490] is, so 20 is skipped), and [490-490] is 0, resulting in the answer {5, 5, 490}.
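The set-based DP and the backtracking step can be sketched in Python as follows, assuming positive integer subtotals; `one_subset_summing_to` is a hypothetical name. It walks the subtotals from last to first, like the worked example, and returns one solution or None.

```python
def one_subset_summing_to(subtotals, total):
    """Return one list of subtotals summing to `total`, or None (set-based DP sketch).

    sets[k] holds every sum <= total reachable with the first k subtotals.
    """
    sets = [{0}]
    for value in subtotals:
        prev = sets[-1]
        shifted = {s + value for s in prev if s + value <= total}  # drop sums > total
        sets.append(prev | shifted)                                # merge with previous set
    if total not in sets[-1]:
        return None
    chosen, remaining = [], total
    for k in range(len(subtotals), 0, -1):      # backtrack from Sum to 0
        value = subtotals[k - 1]
        if remaining - value in sets[k - 1]:    # [value - subtotal] was reachable before
            chosen.append(value)
            remaining -= value
        # otherwise [value] itself was already reachable, so this subtotal is skipped
    return chosen


print(one_subset_summing_to([10, 490, 20, 5, 5], 500))  # [5, 5, 490]
```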

Related

How to determine if a number is polygonal for a polygon with s sides

A polygonal number is defined as being a number represented as dots arranged in the shape of a regular polygon.
For example:
Triangular numbers are 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, ...
Square numbers are 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
Pentagonal numbers are 0, 1, 5, 12, 22, 35, 51, 70, 92, 117, ...
and so on...
There are well-known formulas to calculate any of these numbers. To calculate the n-th s-gonal number, one can use the formula (n^2 * (s - 2) - (n * (s - 4))) / 2
What I would like to know is: is there an efficient way to check whether a given number is s-gonal for a given s?
The obvious approach would be to take successive values from a function that generates s-gonal numbers until either the given number is found or the values exceed it; however, this has linear time complexity.
I know there are formulas that can be used to determine if a number is s-gonal for specific values of s, but I would like one that works for any s.
Based on Wikipedia's article on Polygonal numbers, I came up with the following predicate, which seems to solve the problems I ran into with the one the OP proposed:
from math import sqrt

def isPolygonal(s, x):
    ''' Check if x is an s-gonal number '''
    assert s > 2 and s % 1 == 0 and x % 1 == 0
    # Determine if x is some nth s-gonal number,
    # fail if n doesn't come out a whole number
    n = (sqrt(8 * (s - 2) * x + (s - 4) ** 2) + (s - 4)) / (2 * (s - 2))
    return n % 1 == 0
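A floating-point `sqrt` can misjudge very large x, and for x = 0 with s > 4 the formula above yields n = (s-4)/(s-2), so the float version reports 0 as non-polygonal even though 0 starts every sequence in the question. A sketch of an integer-only variant (hypothetical name `is_polygonal_exact`; needs Python 3.8+ for `math.isqrt`) checks that the discriminant is a perfect square and that the division comes out whole:

```python
from math import isqrt


def is_polygonal_exact(s, x):
    """Integer-only s-gonal test (sketch).

    For x > 0, x is the n-th s-gonal number for some whole n >= 1 exactly when
    8*(s-2)*x + (s-4)**2 is a perfect square r*r and 2*(s-2) divides r + s - 4.
    """
    if s < 3 or x < 0:
        raise ValueError("need s >= 3 and x >= 0")
    if x == 0:
        return True  # n = 0; the '+' root of the quadratic misses this case for s > 4
    disc = 8 * (s - 2) * x + (s - 4) ** 2
    r = isqrt(disc)
    if r * r != disc:
        return False
    return (r + s - 4) % (2 * (s - 2)) == 0


print([x for x in range(120) if is_polygonal_exact(5, x)])
```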

What is the best approach to find and store all pair distances in a weighted tree?

My current approach is to run BFS from every node, but this approach obviously has high complexity. Can we improve on it?
A reasonable way to store them is to use consecutive node numbers starting at 0 so the distances fit neatly in a triangular array d[i,j] where j < i. A reasonable way to compute them is to augment a single search. I'll use the shorthand D[i,j] for d[max(i,j), min(i,j)]. This lets me ignore vertex numbering for convenience.
Let C be a set of completed nodes
Let W be a working set of nodes
Choose any node. Add it to C. Add all its adjacent nodes to W.
while W is not empty
    Remove any node x from W
    Let e be the unique edge (x, y) where y is in C, and let d(x, y) be its length
    D[x, y] = d(x, y)
    for each node z in C - {y}
        D[x, z] = D[x, y] + D[y, z]
    add x to C
    add all nodes adjacent to x but not in C to W
The loop invariant is that for each pair of nodes in C -- call such a pair (p, q) -- we have already computed D[p,q]. The nodes in C always correspond to a subtree and W the nodes adjacent to that subtree.
While this has the same asymptotic complexity as doing n breadth first searches, it's potentially quite a bit faster because it traverses each graph edge only once rather than n times and computes each distance once rather than twice.
A quick Python implementation:
from pprint import pprint

def distance_matrix(graph):
    adj, dist = graph
    # result[i][j] holds the distance between nodes i and j, for j < i
    result = [ [0 for j in range(i)] for i in range(len(adj)) ]
    c = set([0])                        # completed nodes
    w = set([(x, 0) for x in adj[0]])   # working set of (node, its neighbour in c)
    while w:
        x, y = pair = w.pop()
        d = result[max(pair)][min(pair)] = dist[pair]
        for z in c:
            if z != y:
                result[max(x,z)][min(x,z)] = d + result[max(y,z)][min(y,z)]
        c.add(x)
        for a in adj[x]:
            if a not in c:
                w.add((a, x))
    return result

def to_graph(tree):
    adj = [ set() for i in range(len(tree)) ]
    dist = {}
    for (parent, child_pairs) in tree:
        for (edge_len, child) in child_pairs:
            adj[child].add(parent)
            adj[parent].add(child)
            dist[(parent, child)] = edge_len
            dist[(child, parent)] = edge_len
    return (adj, dist)

def main():
    tree = (
        (0, ((12, 1), (7, 2), (9, 3))),
        (1, ((5, 4), (19, 5))),
        (2, ()),
        (3, ((31, 6),)),
        (4, ((27, 7), (15, 8))),
        (5, ()),
        (6, ((23, 9), (11, 10))),
        (7, ()),
        (8, ()),
        (9, ()),
        (10, ()))
    graph = to_graph(tree)
    pprint(distance_matrix(graph))

if __name__ == "__main__":
    main()
Output (with pprint):
[[],
[12],
[7, 19],
[9, 21, 16],
[17, 5, 24, 26],
[31, 19, 38, 40, 24],
[40, 52, 47, 31, 57, 71],
[44, 32, 51, 53, 27, 51, 84],
[32, 20, 39, 41, 15, 39, 72, 42],
[63, 75, 70, 54, 80, 94, 23, 107, 95],
[51, 63, 58, 42, 68, 82, 11, 95, 83, 34]]
If you need the distance for all pairs, you can't do better than that: there are O(N^2) pairs, so you need time on the order of N^2 just to write out the answer.
If you don't need to compute all of them (say, you get pairs as query parameters later on), other solutions with their own trade-offs are possible.
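For the query-later case, one common option (my choice of illustration, not taken from the answer) is to root the tree once, store each node's distance from the root, and answer each query through the lowest common ancestor: dist(u, v) = rootdist[u] + rootdist[v] - 2 * rootdist[lca(u, v)]. A minimal sketch with a naive LCA; the names `root_tree` and `tree_distance` are hypothetical, and `adj`/`dist` are assumed to have the same shape as the output of `to_graph`:

```python
def root_tree(adj, dist, root=0):
    """Root the tree once: parent, depth and distance-from-root for every node."""
    parent = {root: None}
    depth = {root: 0}
    rootdist = {root: 0}
    stack = [root]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:          # not visited yet
                parent[y] = x
                depth[y] = depth[x] + 1
                rootdist[y] = rootdist[x] + dist[(x, y)]
                stack.append(y)
    return parent, depth, rootdist


def tree_distance(u, v, parent, depth, rootdist):
    """Distance between u and v via a naive lowest-common-ancestor walk."""
    a, b = u, v
    while depth[a] > depth[b]:           # lift the deeper node first
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:                        # then lift both until they meet
        a, b = parent[a], parent[b]
    return rootdist[u] + rootdist[v] - 2 * rootdist[a]
```

Each query here costs O(height); binary lifting or an Euler-tour LCA structure reduces that to O(log n) or O(1) per query after preprocessing.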

Partitioning of an array into

Given an array of positive integers {a1, a2, ..., an}, you are required to partition the array into k contiguous blocks/partitions such that the maximum of the partition sums is as small as possible. Restriction: you cannot alter the order in which the numbers appear (example: if you have {2, 5, 80, 1, 200, 80, 8000, 90}, one partition CANNOT be {2, 80, 1, 90}). I need the output partition values and the maximum sum of the partition. Some kind of Knuth's algorithm or anything else? Any suggestion? I have no idea...
So, for example:
{11, 16, 5, 5, 12, 10} with k = 3
The best partitioning according to the problem is:
[(11), (16, 5, 5), (12, 10)]
Given that you can't change the order of the numbers in the array, my suggestion of the solution is:
Binary search on the maximum sum.
Given the current maximum sum, use a greedy algorithm to check whether k blocks are enough to cover the whole array.
The algorithm looks like:
# A is the input list, k the number of blocks
l = 0
r = sum(A)
while l <= r:
    mid = (l + r) // 2
    if greedy(mid):
        r = mid - 1
    else:
        l = mid + 1
The final maximum sum will be l, and you can use it to construct the partition using greedy.
And the function greedy will look like:
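def greedy(s):
    # Can A be cut into at most k contiguous blocks whose sums are all <= s?
    now_k = 0
    now_sum = 0
    for i in A:
        if i > s:
            return False        # a single element already exceeds s
        if now_sum + i > s:
            now_k += 1          # close the current block
            now_sum = i         # and start a new one with i
        else:
            now_sum += i
    if now_sum > 0:
        now_k += 1
    return now_k <= k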
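A self-contained sketch of the same binary-search-plus-greedy idea that also rebuilds the blocks, since the question asks for the partition itself. It assumes positive integers and k >= 1; the name `min_max_partition` is hypothetical. Starting the search at max(values) guarantees that every single element fits in a block.

```python
def min_max_partition(values, k):
    """Split `values` into at most k contiguous blocks minimizing the largest block sum.

    Returns (largest block sum, list of blocks).
    """
    def blocks_needed(limit):
        count, current = 1, 0
        for v in values:
            if current + v > limit:   # v does not fit: open a new block
                count += 1
                current = v
            else:
                current += v
        return count

    lo, hi = max(values), sum(values)
    while lo < hi:                    # smallest limit that needs <= k blocks
        mid = (lo + hi) // 2
        if blocks_needed(mid) <= k:
            hi = mid
        else:
            lo = mid + 1

    blocks, current, total = [], [], 0
    for v in values:                  # rebuild one partition greedily with that limit
        if total + v > lo:
            blocks.append(current)
            current, total = [v], v
        else:
            current.append(v)
            total += v
    blocks.append(current)
    return lo, blocks


print(min_max_partition([11, 16, 5, 5, 12, 10], 3))  # (26, [[11], [16, 5, 5], [12, 10]])
```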

Finding positions of milestones given their pairwise distances

There is a straight road with n milestones. You are given an array with the distances between all pairs of milestones, in some random order. Find the positions of the milestones.
Example:
Consider a road with 4 milestones (a,b,c,d) :
a ---3Km--- b ---5Km--- c ---2Km--- d
Distance between a and b is 3
Distance between a and c is 8
Distance between a and d is 10
Distance between b and c is 5
Distance between b and d is 7
Distance between c and d is 2
All the above values are given in a random order say 7, 10, 5, 2, 8, 3.
The output must be 3, 5, 2 or 2, 5, 3.
Assume the length of the given array is n. My idea is:
Calculate the number of milestones, say x, by solving a quadratic equation (n = x(x-1)/2).
There are P(n, x-1) possibilities.
Validate every possible permutation.
Is there any better solution for this problem?
I can't find an algorithm for this that has good worst-case behaviour. However, the following heuristic may be useful for practical solution:
Say the first landmark is at position zero. Then the last landmark is at the largest distance. All other landmark positions must appear in the input array, and their distances to the last landmark must also appear.
Let's build a graph on these possible landmark positions.
If a and b are two possible landmark positions, then either |a-b| appears in the input array or at least one of a and b isn't a landmark position. Draw an edge between a and b if |a-b| appears in the input array.
Iteratively filter out landmark positions whose degree is too small.
You wind up with something that's almost a clique-finding problem. Find an appropriately large clique; it corresponds to a positioning of the landmarks. Check that this positioning actually gives rise to the right distances.
At worst here, you've narrowed down the possible landmark positions to a more manageable set.
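A rough Python sketch of the candidate-filtering step, assuming distinct milestone positions with the first one at 0. The name `prune_candidates` is hypothetical, and only the narrowing is shown (the clique search and the final distance check are not). A real milestone is adjacent to all x - 1 other milestones, so dropping nodes with smaller degree never removes a true position.

```python
from math import isqrt


def prune_candidates(dists):
    """Candidate milestone positions after iterative degree filtering (heuristic)."""
    present = set(dists)
    length = max(dists)                               # position of the last milestone
    x = (1 + isqrt(1 + 8 * len(dists))) // 2          # number of milestones
    # an interior position p needs both p and length - p among the distances
    cands = {0, length} | {p for p in present if p < length and length - p in present}
    changed = True
    while changed:
        changed = False
        for p in sorted(cands):
            # edges join a and b when |a - b| appears in the input
            degree = sum(1 for q in cands if q != p and abs(p - q) in present)
            if degree < x - 1:
                cands.discard(p)
                changed = True
    return sorted(cands)
```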
OK, I will give my idea, which could reduce the number of permutations.
Finding n is simple: n milestones give n(n-1)/2 distances, so n follows from a quadratic equation, in the same spirit as reversing a factorial (https://math.stackexchange.com/questions/171882/is-there-a-way-to-reverse-factorials).
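Recovering the milestone count is a one-line quadratic: x milestones give x(x-1)/2 distances, so x = (1 + sqrt(1 + 8 * count)) / 2. A small integer-only sketch (hypothetical name `milestone_count`; Python 3.8+ for `math.isqrt`):

```python
from math import isqrt


def milestone_count(num_distances):
    """Number of milestones x with x*(x-1)/2 == num_distances."""
    disc = 1 + 8 * num_distances          # discriminant of x^2 - x - 2*num_distances = 0
    r = isqrt(disc)
    if r * r != disc:
        raise ValueError("distance count is not of the form x*(x-1)/2")
    return (1 + r) // 2


print(milestone_count(6))   # 4
print(milestone_count(10))  # 5
```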
Assumption:
Currently I have no idea how to find the numbers themselves (the gaps between adjacent milestones), but assume you have found them somehow. After finding n and the gaps, we could apply the following to cut down part of the computation.
Consider a problem like this:
|<--3-->|<--6-->|<--1-->|<--7-->|
A       B       C       D       E
Now, as you said, the distances you are given (in random order too) are 3, 9, 10, 17, 6, 7, 14, 1, 8, 7.
You could take any ordering (most of them will be wrong), say
6-3-1-7.
Now check, for pairs of numbers, whether their sum is in the list, i.e. whether they could possibly be adjacent:
6+3 -> 9: present, so they can be adjacent
3+1 -> 4: not present, so they cannot be adjacent
1+7 -> 8: present, so they can be adjacent
6+7 -> 13: not present, so they cannot be adjacent
Core idea:
For 2 numbers to be adjacent, their sum must be in the list. If the sum is not in the list, the numbers are not adjacent.
Optimization:
So 3 and 1 cannot be next to each other, and neither can 6 and 7.
Hence, while generating permutations, we can eliminate
the *31*, *13*, *76* and *67* combinations, where * stands for zero or more numbers before or after.
I.e. instead of trying all 4! = 24 permutations, we only need to check 3617, 1637, 3716, 1736 and their mirror images (7163, 7361, 6173, 6371), i.e. 8 permutations, which saves about two thirds of the computation.
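A small sketch of this pruning as a backtracking search (hypothetical name `adjacency_filtered`), assuming the gaps are already known: an ordering is extended only when the new neighbour's sum with the previous gap is in the distance list, so rejected prefixes are never expanded. Repeated gap values would produce repeated orderings.

```python
def adjacency_filtered(gaps, dists):
    """Orderings of `gaps` in which every adjacent pair sums to a known distance."""
    present = set(dists)
    results = []

    def extend(order, remaining):
        if not remaining:
            results.append(tuple(order))
            return
        for idx, g in enumerate(remaining):
            # prune: g may follow order[-1] only if their sum is in the list
            if order and order[-1] + g not in present:
                continue
            extend(order + [g], remaining[:idx] + remaining[idx + 1:])

    extend([], list(gaps))
    return results


print(adjacency_filtered([6, 3, 1, 7], [3, 9, 10, 17, 6, 7, 14, 1, 8, 7]))
```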
Worst case:
Say in your case the gaps are 5, 2, 3.
Then we have to perform this check:
5+2 -> 7: present
2+3 -> 5: present
5+3 -> 8: present
Oops: your example is a worst case, where no pair can be eliminated, so this optimization does not help.
Place the milestones one by one
EDIT: See the new implementation below (with timings).
The key idea is the following:
Build a list of milestones one by one, starting with one milestone at 0 and one at max(distances). Let's call them endpoints.
The largest distance that's not yet accounted for has to be from one of the endpoints, which leaves at most two positions for the corresponding milestone.
The following Python program simply checks whether the milestone can be placed from the left endpoint and, if not, tries to place it from the right endpoint (always using the largest distance that is not accounted for by the already placed milestones). This has to be done with backtracking, as placements may turn out to be wrong later.
Note that there is another (mirrored) solution that is not output. (I don't think there can be more than 2 solutions (symmetric), but I haven't proven it.)
I consider the position of the milestones as the solution and use a helper function steps for the output desired by the OP.
from collections import Counter

def milestones_from_dists(dists, milestones=None):
    if not dists:  # all dists are accounted for: we have a solution!
        return milestones
    if milestones is None:
        milestones = [0]
    max_dist = max(dists)
    solution_from_left = try_milestone(dists, milestones, min(milestones) + max_dist)
    if solution_from_left is not None:
        return solution_from_left
    return try_milestone(dists, milestones, max(milestones) - max_dist)

def try_milestone(dists, milestones, new_milestone):
    unused_dists = Counter(dists)
    for milestone in milestones:
        dist = abs(milestone - new_milestone)
        if unused_dists[dist]:
            unused_dists[dist] -= 1
            if unused_dists[dist] == 0:
                del unused_dists[dist]
        else:
            return None  # no solution
    return milestones_from_dists(unused_dists, milestones + [new_milestone])

def steps(milestones):
    milestones = sorted(milestones)
    return [milestones[i] - milestones[i - 1] for i in range(1, len(milestones))]
Example usage:
>>> print(steps(milestones_from_dists([7, 10, 5, 2, 8, 3])))
[3, 5, 2]
>>> import random
>>> milestones = random.sample(range(1000), 100)
>>> dists = [abs(x - y) for x in milestones for y in milestones if x < y]
>>> solution = sorted(milestones_from_dists(dists))
>>> solution == sorted(milestones)
True
>>> print(solution)
[0, 10, 16, 23, 33, 63, 72, 89, 97, 108, 131, 146, 152, 153, 156, 159, 171, 188, 210, 211, 212, 215, 219, 234, 248, 249, 273, 320, 325, 329, 339, 357, 363, 387, 394, 396, 402, 408, 412, 418, 426, 463, 469, 472, 473, 485, 506, 515, 517, 533, 536, 549, 586, 613, 614, 615, 622, 625, 630, 634, 640, 649, 651, 653, 671, 674, 697, 698, 711, 715, 720, 730, 731, 733, 747, 758, 770, 772, 773, 776, 777, 778, 783, 784, 789, 809, 828, 832, 833, 855, 861, 873, 891, 894, 918, 952, 953, 968, 977, 979]
>>> print(steps(solution))
[10, 6, 7, 10, 30, 9, 17, 8, 11, 23, 15, 6, 1, 3, 3, 12, 17, 22, 1, 1, 3, 4, 15, 14, 1, 24, 47, 5, 4, 10, 18, 6, 24, 7, 2, 6, 6, 4, 6, 8, 37, 6, 3, 1, 12, 21, 9, 2, 16, 3, 13, 37, 27, 1, 1, 7, 3, 5, 4, 6, 9, 2, 2, 18, 3, 23, 1, 13, 4, 5, 10, 1, 2, 14, 11, 12, 2, 1, 3, 1, 1, 5, 1, 5, 20, 19, 4, 1, 22, 6, 12, 18, 3, 24, 34, 1, 15, 9, 2]
New implementation incorporating suggestions from the comments:
from collections import Counter

def milestones_from_dists(dists):
    dists = Counter(dists)
    right_end = max(dists)
    milestones = [0, right_end]
    del dists[right_end]
    sorted_dists = sorted(dists)
    add_milestones_from_dists(dists, milestones, sorted_dists, right_end)
    return milestones

def add_milestones_from_dists(dists, milestones, sorted_dists, right_end):
    if not dists:
        return True  # success!
    # find max dist that's not fully used yet
    deleted_dists = []
    while not dists[sorted_dists[-1]]:
        deleted_dists.append(sorted_dists[-1])
        del sorted_dists[-1]
    max_dist = sorted_dists[-1]
    # for both possible positions, check if this fits the already placed milestones
    for new_milestone in [max_dist, right_end - max_dist]:
        used_dists = Counter()  # for backing up
        for milestone in milestones:
            dist = abs(milestone - new_milestone)
            if dists[dist]:  # this distance is still available
                dists[dist] -= 1
                if dists[dist] == 0:
                    del dists[dist]
                used_dists[dist] += 1
            else:  # no solution
                dists.update(used_dists)  # back up
                break
        else:  # unbroken
            milestones.append(new_milestone)
            success = add_milestones_from_dists(dists, milestones, sorted_dists, right_end)
            if success:
                return True
            dists.update(used_dists)  # back up
            del milestones[-1]
    sorted_dists.extend(reversed(deleted_dists))  # restore exactly once before failing
    return False

def steps(milestones):
    milestones = sorted(milestones)
    return [milestones[i] - milestones[i - 1] for i in range(1, len(milestones))]
Timings for random milestones in the range from 0 to 100000:
n = 10: 0.00s
n = 100: 0.05s
n = 1000: 3.20s
n = 10000: still takes too long.
The largest distance in the given set of distances is the distance between the first and the last milestone, i.e. 10 in your example. You can find it in O(n) steps.
For every other milestone (every one except the first and the last), you can find its distances from the first and the last milestone by looking for a pair of distances that sums to the maximum distance, i.e. in your example 7+3 = 10, 8+2 = 10. You can find these pairs trivially in O(n^2).
Now, if you think of the road as running from east to west, what remains is that for every interior milestone (all but the first and the last), you need to know which of its two distances (e.g. 7 and 3, or 8 and 2) is towards the east (the other is then towards the west).
You can trivially enumerate all the possibilities in time O(2^(n-2)), and for every possible orientation check that you get the same set of distances as in the problem. This is faster than enumerating all permutations of the smallest distances in the set.
For example, if you assume 7 and 8 are both towards the west, then the distance between the two interior milestones is 1 km, which is not in the problem set. So it must be 7 towards the west and 8 towards the east, leading to the solution (or its mirror)
WEST | -- 2 -- | -- 5 -- | -- 3 -- | EAST
For a larger set of milestones, you would just start guessing the orientation of the two distances to the endpoints, and whenever you produce two milestones whose distance is not in the problem set, you backtrack.
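A sketch of the orientation backtracking (hypothetical name `orient_milestones`). It assumes the endpoint pairs are already extracted, one (a, length - a) pair per interior milestone; finding them is the O(n^2) step described above and is not done here. Each interior milestone is tried at position a and then at length - a, consuming the distances to all placed milestones from a multiset and backing up on a miss.

```python
from collections import Counter


def orient_milestones(length, pairs, dists):
    """Return sorted milestone positions (west end at 0), or None if no orientation fits."""
    remaining = Counter(dists)
    remaining[length] -= 1          # the distance between the two endpoints
    placed = [0, length]

    def place(k):
        if k == len(pairs):
            # every given distance must have been used exactly once
            return sorted(placed) if all(c == 0 for c in remaining.values()) else None
        a, _ = pairs[k]
        for pos in (a, length - a):  # a measured from the west end, or from the east end
            need = Counter(abs(pos - q) for q in placed)
            if all(remaining[d] >= c for d, c in need.items()):
                remaining.subtract(need)
                placed.append(pos)
                found = place(k + 1)
                if found is not None:
                    return found
                placed.pop()        # back up and try the other orientation
                remaining.update(need)
        return None

    return place(0)


print(orient_milestones(10, [(7, 3), (8, 2)], [7, 10, 5, 2, 8, 3]))  # [0, 2, 7, 10]
```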

Compare all elements inside a 2D array with each other

I have a perfectly square 64x64 2D array of integers that will never have a value greater than 64. I was wondering if there is a really fast way to compare all of the elements with each other and display the ones that are the same, in a unique way.
At the current moment I have this
2D int array named array
loop from i = 0 to 64
    loop from j = 0 to 64
        loop from k = (j+1) to 64
            loop from z = 0 to 64
                if(array[i][j] == array[k][z])
                    print "element [i][j] is same as [k][z]"
As you can see, having 4 nested loops is quite a stupid thing that I would like to avoid. Language does not matter at all; I am simply curious to see what kind of cool solutions are possible. Since no value will be greater than 64, I guess you could use only 6 bits and transform the array into something fancier, which would require less memory and would allow for some really fancy bitwise operations. Alas, I am not knowledgeable enough to think in that format, and therefore would like to see what you can come up with.
Thanks to anyone in advance for a really unique solution.
There's no need to sort the array via an O(m log m) algorithm; you can use an O(m) bucket sort. (Letting m = n*n = 64*64).
An easy O(m) method using lists is to set up an array H of n+1 integers, initialized to -1, and to allocate an array L of m integers to use as list elements. For the i'th array element, with value A[i], set k=A[i], L[i]=H[k] and H[k]=i. When that's done, each H[k] is the head of a list of entries with equal values. For 2D arrays, treat array element A[i,j] as A[i+n*(j-1)].
Here's a python example using python lists, with n=7 for ease of viewing results:
import random
n = 7
m = n*n
a = [random.randint(1, n) for i in range(m)]
h = [[] for i in range(n+1)]   # h[k]: indices holding value k
for i in range(m):
    k = a[i]
    h[k].append(i)
for i in range(1, n+1):
    print('With value %2d: %s' % (i, h[i]))
Its output looks like:
With value 1: [1, 19, 24, 28, 44, 45]
With value 2: [3, 6, 8, 16, 27, 29, 30, 34, 42]
With value 3: [12, 17, 21, 23, 32, 41, 47]
With value 4: [9, 15, 36]
With value 5: [0, 4, 7, 10, 14, 18, 26, 33, 38]
With value 6: [5, 11, 20, 22, 35, 37, 39, 43, 46, 48]
With value 7: [2, 13, 25, 31, 40]
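The array-based H/L variant described in this answer (head array H, next-pointer array L, -1 as the end marker) can be sketched in Python like this; `bucket_lists` and `indices_with_value` are hypothetical names, and values are assumed to lie in 0..max_value.

```python
def bucket_lists(a, max_value):
    """Build the head array H and next array L for a flat list `a` (sketch).

    Following heads[k], nxt[heads[k]], ... until -1 visits every index i
    with a[i] == k, most recent index first.
    """
    heads = [-1] * (max_value + 1)   # H: one list head per possible value
    nxt = [-1] * len(a)              # L: next pointer for each element
    for i, k in enumerate(a):
        nxt[i] = heads[k]            # L[i] = H[k]
        heads[k] = i                 # H[k] = i
    return heads, nxt


def indices_with_value(heads, nxt, k):
    """Walk the list for value k and collect its indices."""
    out, i = [], heads[k]
    while i != -1:
        out.append(i)
        i = nxt[i]
    return out
```

For a 64x64 grid, one option is to flatten with index = row * 64 + col (0-based, row-major; my choice of convention) and call `bucket_lists(flat, 64)`; every list with two or more entries is a group of equal values.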
class temp {
    int i, j;
    int value;
}
then store all 64*64 elements in an array of temp objects and sort it by value (in Java you can do this by implementing the Comparable interface). Equal elements will then be next to each other, and you can read off i and j for each of them.
This is far faster than four nested loops: sorting m = 64*64 elements takes O(m log m) time, i.e. O(n^2 log n) for an n x n array.
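The same sort-then-group idea, sketched in Python instead of Java: sorting (value, row, col) tuples plays the role of sorting `temp` objects by value, and `itertools.groupby` collects the runs of equal values. The name `duplicate_groups` is hypothetical.

```python
from itertools import groupby


def duplicate_groups(grid):
    """Map each value that occurs more than once to its (row, col) positions."""
    records = sorted((v, i, j) for i, row in enumerate(grid) for j, v in enumerate(row))
    groups = {}
    for value, run in groupby(records, key=lambda r: r[0]):
        positions = [(i, j) for _, i, j in run]
        if len(positions) > 1:          # only report values that repeat
            groups[value] = positions
    return groups


print(duplicate_groups([[1, 2], [2, 3]]))  # {2: [(0, 1), (1, 0)]}
```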
Use quicksort on the array, then iterate through the array, storing a temporary value of the "cursor" (current value you're looking at), and determine if the temporary value is the same as the next cursor.
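array[64][64];
quicksort(array);            // sort all 64*64 values as one sequence
temp = none;                 // no previous value yet
for x in array[] {
    for y in array[x][] {
        if (temp == array[x][y]) {
            print "duplicate found at x,y";   // x,y are positions in the sorted array
        }
        temp = array[x][y];
    }
}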
