I would like to find the maximum sums, taking only one value per row. I already solved it by brute force, which is O(N^5). Now I would like to find a way, with dynamic programming or otherwise, to reduce the complexity.
For example:
Matrix:
100 5 4 3 1
90 80 70 60 50
70 69 65 20 10
60 20 10 5 1
50 45 15 6 1
Solution for the top 5 sums:
100 + 90 + 70 + 60 + 50 = 370
100 + 90 + 69 + 60 + 50 = 369
100 + 90 + 70 + 60 + 45 = 365
100 + 90 + 65 + 60 + 50 = 365
100 + 90 + 69 + 60 + 45 = 364
Sum: 1833
Example of computing the sums by brute force:
for(int i=0; i<matrix[0].size(); i++) {
    for(int j=0; j<matrix[1].size(); j++) {
        for(int k=0; k<matrix[2].size(); k++) {
            for(int l=0; l<matrix[3].size(); l++) {
                for(int x=0; x<matrix[4].size(); x++) {
                    sum.push_back(matrix[0][i] + matrix[1][j] + matrix[2][k] + matrix[3][l] + matrix[4][x]);
                }
            }
        }
    }
}
sort(sum.begin(), sum.end(), mySort);
Thanks!
You can solve it in O(k log k) time with Dijkstra's algorithm. A node in the graph is represented by a list of 5 indexes, one into each row of the matrix.
For example in the matrix
100 5 4 3 1
90 80 70 60 50
70 69 65 20 10
60 20 10 5 1
50 45 15 6 1
the node [0, 0, 2, 0, 1] represents the numbers [100, 90, 65, 60, 45]
The initial node is [0, 0, 0, 0, 0]. Every node has up to 5 outgoing edges increasing 1 of the 5 indexes by 1, and the distance between nodes is the absolute difference in the sums of the indexed numbers.
So for that matrix the edges from the node [0, 0, 2, 0, 1] lead:
to [1, 0, 2, 0, 1] with distance 100 - 5 = 95
to [0, 1, 2, 0, 1] with distance 90 - 80 = 10
to [0, 0, 3, 0, 1] with distance 65 - 20 = 45
to [0, 0, 2, 1, 1] with distance 60 - 20 = 40
to [0, 0, 2, 0, 2] with distance 45 - 15 = 30
With this setup you can use Dijkstra's algorithm to find the k - 1 closest nodes to the initial node.
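Here is a minimal sketch of this approach in Python, using heapq as the priority queue (top_k_sums is a name I made up for illustration):

import heapq

def top_k_sums(m, k):
    # each row sorted descending; a node is a tuple of column indexes
    rows = [sorted(row, reverse=True) for row in m]
    start = tuple(0 for _ in rows)
    best = sum(row[0] for row in rows)   # sum of the initial node
    heap = [(0, start)]                  # (distance from initial node, node)
    seen = {start}
    sums = []
    while heap and len(sums) < k:
        dist, node = heapq.heappop(heap)
        sums.append(best - dist)         # distance = drop below the maximum sum
        for i, c in enumerate(node):     # up to 5 outgoing edges
            if c + 1 < len(rows[i]):
                nxt = node[:i] + (c + 1,) + node[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    step = rows[i][c] - rows[i][c + 1]
                    heapq.heappush(heap, (dist + step, nxt))
    return sums

m = [[100, 5, 4, 3, 1],
     [90, 80, 70, 60, 50],
     [70, 69, 65, 20, 10],
     [60, 20, 10, 5, 1],
     [50, 45, 15, 6, 1]]
print(top_k_sums(m, 5))   # [370, 369, 365, 365, 364], total 1833

Since the distance to a node is path-independent (it is exactly how far that combination's sum falls below the maximum), the heap pops combinations in order of decreasing sum.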
Update: I previously used a greedy algorithm, which doesn't work for this problem. Here is a more general solution.
Suppose we've already found the combinations with the m highest sums. The next highest combination (number m+1) must be one step away from one of them, where a step means shifting one row's focus one column to the right. (Any combination more than one step away from all of the top m combinations cannot be number m+1, because undoing one of those steps, i.e. moving back toward one of the existing combinations, converts it into a higher-sum combination that is still not in the top m.)
For m = 1, we know that the "m highest combinations" just means the combination made by taking the first element of each row of the matrix (assuming each row is sorted from highest to lowest). So then we can work out from there:
1. Create a set of candidate combinations to consider for the next highest position. Initially this holds only the highest possible combination (the first column of the matrix).
2. Identify the candidate with the highest sum and move it to the results.
3. Find all the combinations that are one step away from the one just added to the results and add them to the candidate set. At most n are added each round, where n is the number of rows in the matrix; duplicates of previously identified candidates should be ignored.
4. Go back to step 2. Repeat until there are 5 results.
Here is some Python code that does this:
m = [
    [100, 5, 4, 3, 1],
    [90, 80, 70, 60, 50],
    [70, 69, 65, 20, 10],
    [60, 20, 10, 5, 1],
    [50, 45, 15, 6, 1]
]

n_cols = len(m[0])  # matrix width

# helper function to calculate the sum for any combination,
# where a "combination" is a tuple of column indexes, one per row
score = lambda combo: sum(m[r][c] for r, c in enumerate(combo))

# define candidate set, initially with single highest combination
# (this set could also store the score for each combination
# to avoid calculating it repeatedly)
candidates = {tuple(0 for row in m)}
results = set()

# get 5 highest-scoring combinations
for i in range(5):
    result = max(candidates, key=score)
    results.add(result)
    candidates.remove(result)  # don't test it again
    # find combinations one step away from the latest result
    # and add them to the candidates set
    for j, c in enumerate(result):
        if c + 1 >= n_cols:
            continue  # don't step past edge of matrix
        combo = result[:j] + (c + 1,) + result[j + 1:]
        if combo not in results:
            candidates.add(combo)  # set drops duplicates

# convert from column indexes to actual values
final = [
    [m[r][c] for r, c in enumerate(combo)]
    for combo in results
]
final.sort(key=sum, reverse=True)
print(final)
# [
#     [100, 90, 70, 60, 50],
#     [100, 90, 69, 60, 50],
#     [100, 90, 70, 60, 45],
#     [100, 90, 65, 60, 50],
#     [100, 90, 69, 60, 45],
# ]
If you want just the maximum sum, then sum the maximum value of each row.
That is,
M = [[100, 5, 4, 3, 1],
     [90, 80, 70, 60, 50],
     [70, 69, 65, 20, 10],
     [60, 20, 10, 5, 1],
     [50, 45, 15, 6, 1]]
sum(max(row) for row in M)
Edit
It is not necessary to use dynamic programming, etc.
There is a simple rule: pick the next number by looking at the difference between each current number and the next number in its row, and advance in the row where that difference is smallest.
Here is code using numpy.

import numpy as np

M = np.array(M)
M = -np.sort(-M, axis=1)   # sort each row in descending order
k = 3
answer = []
ind = np.zeros(M.shape[0], dtype=int)   # current column index in each row
for _ in range(k):
    answer.append(sum(M[list(range(M.shape[0])), ind]))
    # advance the row where the drop to the next value is smallest
    min_ind = np.argmin(M[list(range(len(ind))), ind] - M[list(range(len(ind))), ind + 1])
    ind[min_ind] += 1
Result is [370, 369, 365].
I have ordered lists of numbers (like barcode positions, spectral lines) that I am trying to compare for similarity. Ideally, I would like to compare two lists to get a value from 1.0 (match) degrading gracefully to 0.
The lists could be offset by an arbitrary amount, and that should not degrade the match. The diffs between adjacent items are the most applicable characterization.
Due to noise in the system, some items may be missing (alternatively, extra items may be inserted, depending on point of view).
The diff values may be reordered.
The diff values may be scaled.
Multiple transformations above may be applied and each should reduce similarity proportionally.
Here is some test data:
# deltas
d = [100+(i*10) for i in xrange(10)] # [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
d_swap = d[:4] + [d[5]] + [d[4]] + d[6:] # [100, 110, 120, 130, 150, 140, 160, 170, 180, 190]
# absolutes
a = [1000+j for j in [0]+[sum(d[:i+1]) for i in xrange(len(d))]] # [1000, 1100, 1210, 1330, 1460, 1600, 1750, 1910, 2080, 2260, 2450]
a_offs = [i+3000 for i in a] # [4000, 4100, 4210, 4330, 4460, 4600, 4750, 4910, 5080, 5260, 5450]
a_rm = a[:2] + a[3:] # [1000, 1100, 1330, 1460, 1600, 1750, 1910, 2080, 2260, 2450]
a_add = a[:7] + [(a[6]+a[7])/2] + a[7:] # [1000, 1100, 1210, 1330, 1460, 1600, 1750, 1830, 1910, 2080, 2260, 2450]
a_swap = [1000+j for j in [0]+[sum(d_swap[:i+1]) for i in xrange(len(d_swap))]] # [1000, 1100, 1210, 1330, 1460, 1610, 1750, 1910, 2080, 2260, 2450]
a_stretch = [1000+j for j in [0]+[int(sum(d[:i+1])*1.1) for i in xrange(len(d))]] # [1000, 1110, 1231, 1363, 1506, 1660, 1825, 2001, 2188, 2386, 2595]
a_squeeze = [1000+j for j in [0]+[int(sum(d[:i+1])*0.9) for i in xrange(len(d))]] # [1000, 1090, 1189, 1297, 1414, 1540, 1675, 1819, 1972, 2134, 2305]
Sim(a, a_offs) should be 1.0 since offset is not considered a penalty.
Sim(a, a_rm) and Sim(a, a_add) should be about 0.91 because 10 of 11 or 11 of 12 match.
Sim(a, a_swap) should be about 0.96 because one diff is out of place (possibly with a further penalty based on distance if moved more than one position).
Sim(a, a_stretch) and Sim(a, a_squeeze) should be about 0.9 because diffs were scaled by about 1 part in 10.
I am thinking of something like difflib.SequenceMatcher, but one that works on numeric values with fuzziness instead of hard-compared hashables. It would also need to retain some awareness of the diff (first derivative) relationship.
This seems to be a dynamic programming problem, but I can't figure out how to construct an appropriate cost metric.
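One possible starting point, sketched below: take the diff sequences of both lists, align them with a Needleman-Wunsch-style dynamic program where matching two diffs costs their relative difference and a gap (a missing or extra item) costs a fixed penalty, then map the averaged cost to a similarity. This is my own illustration, not an existing library; it covers the offset, missing/extra, and scaling cases, while the reordering case would need an extra transposition move (as in Damerau-Levenshtein), and a dropped item would be modeled more faithfully by letting two adjacent diffs merge into one.

def similarity(xs, ys, gap_penalty=1.0):
    # first derivatives; assumes strictly increasing lists (positive diffs)
    dx = [b - a for a, b in zip(xs, xs[1:])]
    dy = [b - a for a, b in zip(ys, ys[1:])]
    n, m = len(dx), len(dy)
    INF = float('inf')
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n and j < m:   # match dx[i] against dy[j]
                c = abs(dx[i] - dy[j]) / max(dx[i], dy[j])
                cost[i + 1][j + 1] = min(cost[i + 1][j + 1], cost[i][j] + c)
            if i < n:             # dx[i] has no counterpart
                cost[i + 1][j] = min(cost[i + 1][j], cost[i][j] + gap_penalty)
            if j < m:             # dy[j] has no counterpart
                cost[i][j + 1] = min(cost[i][j + 1], cost[i][j] + gap_penalty)
    return max(0.0, 1.0 - cost[n][m] / max(n, m))

print(similarity(a, a_offs))     # 1.0, offsets cancel in the diffs
print(similarity(a, a_stretch))  # ~0.91, each diff scaled by about 1/10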
I am currently trying to understand the inception-v3 architecture and was taking a closer look at the definition of the model's layers:
with scopes.arg_scope([ops.conv2d, ops.max_pool, ops.avg_pool], stride=1, padding='VALID'):
    # 299 x 299 x 3
    end_points['conv0'] = ops.conv2d(inputs, 32, [3, 3], stride=2, scope='conv0')
    # 149 x 149 x 32
    end_points['conv1'] = ops.conv2d(end_points['conv0'], 32, [3, 3], scope='conv1')
    # 147 x 147 x 32
    end_points['conv2'] = ops.conv2d(end_points['conv1'], 64, [3, 3], padding='SAME', scope='conv2')
    # 147 x 147 x 64
    end_points['pool1'] = ops.max_pool(end_points['conv2'], [3, 3], stride=2, scope='pool1')
    # 73 x 73 x 64
    end_points['conv3'] = ops.conv2d(end_points['pool1'], 80, [1, 1], scope='conv3')
    # 73 x 73 x 80.
    end_points['conv4'] = ops.conv2d(end_points['conv3'], 192, [3, 3], scope='conv4')
    # 71 x 71 x 192.
    end_points['pool2'] = ops.max_pool(end_points['conv4'], [3, 3], stride=2, scope='pool2')
    # 35 x 35 x 192.
    net = end_points['pool2']
Checking the dimensions of each layer, I first had to look at the different padding styles: VALID and SAME. VALID discards edge pixels, while SAME pads the input equally on both sides so the convolution is also applied at the edges.
For example, the first layer goes from 299x299 pixels to 149x149 with a stride of 2: with a [3, 3] filter and VALID padding (edges discarded), only every other position is considered, so the output is 149x149 rather than 150x150. Convolving this layer again with the same filter size but stride 1 gives 147x147, since the edges are again discarded. This layer is then convolved once more, now with padding set to SAME, which preserves the 147x147 dimension.
Now comes the part that confuses me:
Assuming SAME padding applies only to the conv2 layer and the global setting is still VALID, the dimension for pool1 is correctly shown as 73x73 after discarding the edges. Going to the next convolutional layer, conv3, I would expect it to become 71x71 with VALID padding active. However, the output of conv3 remains at 73x73, which suggests that SAME padding is used. But in conv4, the padding now seems to be VALID again, reducing the dimension to 71x71, which confuses me completely.
In the README of slim's arg_scope on GitHub I found that setting an argument locally overrides the globally given one:
with slim.arg_scope([slim.ops.conv2d], padding='SAME', stddev=0.01, weight_decay=0.0005):
    net = slim.ops.conv2d(inputs, 64, [11, 11], scope='conv1')
    net = slim.ops.conv2d(net, 128, [11, 11], padding='VALID', scope='conv2')
    net = slim.ops.conv2d(net, 256, [11, 11], scope='conv3')
As the example illustrates, the use of arg_scope makes the code
cleaner, simpler and easier to maintain. Notice that while argument
values are specified in the arg_scope, they can be overwritten locally.
In particular, while the padding argument has been set to 'SAME', the
second convolution overrides it with the value of 'VALID'.
However, this would mean that conv4 should also have a dimension of 73x73, because the padding would be SAME, preserving the edges, and the final pooling layer pool2 would then even be 37x37.
What am I missing? Where is my mistake?
Thank you for helping me; I hope I have made this confusing problem clear.
I didn't see that the filter size for the conv3 layer is actually [1, 1], so it does not reduce the dimensions at all. It has nothing to do with the arg_scope; the output stays exactly as it should.
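Plugging the layer parameters into the standard TensorFlow output-size arithmetic confirms the numbers (out_size is just a helper name for this check):

import math

def out_size(n, f, stride, padding):
    # spatial output size of a conv/pool along one dimension
    if padding == 'VALID':
        return (n - f) // stride + 1    # window must fit entirely inside
    else:                               # 'SAME'
        return math.ceil(n / stride)    # input padded so edges are kept

print(out_size(299, 3, 2, 'VALID'))  # 149 (conv0)
print(out_size(149, 3, 1, 'VALID'))  # 147 (conv1)
print(out_size(147, 3, 1, 'SAME'))   # 147 (conv2)
print(out_size(147, 3, 2, 'VALID'))  # 73  (pool1)
print(out_size(73, 1, 1, 'VALID'))   # 73  (conv3, 1x1 filter)
print(out_size(73, 3, 1, 'VALID'))   # 71  (conv4)
print(out_size(71, 3, 2, 'VALID'))   # 35  (pool2)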
I don't even know how to explain this... I've been looking for algos but no luck.
I need a function that would return an array of incrementally bigger numbers (not sure what kind of curve) from two numbers that I'd pass as parameters.
Ex.:
$length = 20;
get_numbers(1, 1000, $length);
> 1, 2, 3, 5, 10, 20, 30, 50, 100, 200, 500... // let's say that these are 20 numbers that add up to 1000
Any idea how I could do this..? I guess I'm not smart enough to figure it out.
How about an exponential curve? Sample Python implementation:
begin = 1
end = 1000
diff = end - begin
length = 10
# common ratio X chosen so that begin + X**(length-1) reaches end
X = diff ** (1.0 / (length - 1))
seq = []
for i in range(length):
    seq.append(int(begin + X ** i))
print(seq)
(note: ** is the Python operator for exponentiation. Other languages may or may not use ^ instead)
Result:
[2, 3, 5, 10, 22, 47, 100, 216, 464, 999]
I need to find the numbers in a list which make up a specific total:
Sum: 500
Subtotals: 10 490 20 5 5
In the end I need: {10, 490} and {490, 5, 5}
What is this type of problem called? Are there algorithms to solve it efficiently?
This is the knapsack problem, and it is NP-complete, i.e. there is no efficient algorithm known for it.
This is not a knapsack problem.
In the worst case, with N subtotals, there can be O(2^N) solutions (for example, 2N copies of the value 1 with Sum = N give exponentially many index sets), so any algorithm that lists them all can be no better than that in the worst case (thus the problem doesn't belong to the class NP at all).
Let's assume there are no non-positive elements in the Subtotals array and that no element is greater than Sum. We can sort the array of subtotals in decreasing order, then build the array of tail sums, appending 0 to the end. For your example it will look like:
Subtotals: (490, 20, 10, 5, 5)
PartialSums: (530, 40, 20, 10, 5, 0)
Now for any remaining sum S, position i, and current list L we have a problem E(S, i, L):
E(0, i, L) = (print L).
E(S, i, L) | (PartialSums[i] < S) = (nothing), since the remaining elements cannot reach S.
E(S, i, L) = E(S, i+1, L), E(S - Subtotals[i], j, L || Subtotals[i]), where j is the index of the first element of Subtotals less than or equal to (S - Subtotals[i]), or i+1, whichever is greater.
Our problem is E(Sum, 0, {}).
Of course, there's a problem with duplicates: if there were another 490 in your list, this algorithm would output 4 solutions. If that's not what you need, using an array of pairs (value, multiplicity) may help.
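For what it's worth, here is a sketch of this recursion in Python (enumerate_sums is my own name; for brevity it always steps to i+1 instead of jumping ahead to j):

def enumerate_sums(subtotals, total):
    subs = sorted(subtotals, reverse=True)
    n = len(subs)
    # tails[i] = sum of subs[i:], i.e. the PartialSums pruning bound
    tails = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tails[i] = tails[i + 1] + subs[i]
    results = []
    def rec(s, i, chosen):
        if s == 0:
            results.append(list(chosen))
            return
        if i >= n or tails[i] < s:
            return                        # remaining elements cannot reach s
        rec(s, i + 1, chosen)             # skip subs[i]
        if subs[i] <= s:                  # take subs[i]
            chosen.append(subs[i])
            rec(s - subs[i], i + 1, chosen)
            chosen.pop()
    rec(total, 0, [])
    return results

print(enumerate_sums([10, 490, 20, 5, 5], 500))
# [[490, 5, 5], [490, 10]]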
P.S. You may also consider dynamic programming if the size of the problem is small enough:
Start with the set {0}. Create an array of sets, equal in size to the array of subtotals.
For every subtotal, create a new set from the previous set by adding the subtotal's value to each element. Remove all elements greater than Sum, then merge it with the previous set (it is essentially the set of all sums reachable so far).
If the final set doesn't contain Sum, there is no solution. Otherwise, backtrack the solution from Sum to 0, checking at each step whether the previous set contains [value] or [value - subtotal].
Example:
(10, 490, 20, 5, 5)
Sets:
(0)
(0, 10)
(0, 10, 490, 500)
(0, 10, 20, 30, 490, 500) (510, 520 - discarded)
(0, 5, 10, 15, 20, 25, 30, 35, 490, 495, 500)
(0, 5, 10, 15, 20, 25, 30, 35, 40, 490, 495, 500)
From the last set: [500 - 5] is in the previous set, [495 - 5] is in the previous set, [490 - 20] is not in the previous set (but [490] is), [490 - 490] is 0; the resulting answer is {5, 5, 490}.
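For illustration, here is a minimal sketch of that DP with backtracking in Python (subset_sum_dp is a hypothetical name; it recovers one solution rather than all of them):

def subset_sum_dp(subtotals, target):
    # sets[i] = all sums (up to target) reachable using the first i subtotals
    sets = [{0}]
    for x in subtotals:
        prev = sets[-1]
        sets.append(prev | {s + x for s in prev if s + x <= target})
    if target not in sets[-1]:
        return None                       # no solution
    # backtrack from target down to 0
    chosen, s = [], target
    for i in range(len(subtotals), 0, -1):
        x = subtotals[i - 1]
        if s - x >= 0 and (s - x) in sets[i - 1]:
            chosen.append(x)              # take subtotals[i-1]
            s -= x
        # otherwise s was already reachable without subtotals[i-1]
    return chosen

print(subset_sum_dp([10, 490, 20, 5, 5], 500))   # [5, 5, 490]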