DTW algorithm: simple implementation - Verification - algorithm

I have tried to make a simple implementation of the DTW algorithm in C, without using any substantial optimization techniques. I am trying to use this implementation for some simple sketch recognition, which is to say finding the k closest neighbors of a given sketch from within a set. I have gotten some results that seem weird to me and I would like to know if this is because of my DTW implementation. I need someone to verify my algorithm.
As I said, I am trying to find the k closest neighbors, so the only 'optimization' I have implemented to make calculations faster is that if the minimum cost of a given line calculated is at any point greater than the maximum distance between the k sketches currently considered as the closest neighbors, I stop calculating and return +inf.
Here is the corresponding algorithm:
(returnValue totalCost) dtw(sketch1, sketch2, curMaxDist){
    distMatrix = 'empty matrix of size (sketch1.size) x (sketch2.size)'
    totalCostMatrix = 'empty matrix of size (sketch1.size) x (sketch2.size)'
    for(i = 0 to sketch1.size - 1){
        for(j = 0 to sketch2.size - 1){
            distMatrix[i][j] = euclidianDistance(sketch1.point[i], sketch2.point[j])
            totalCostMatrix[i][j] = +inf
        }
    }
    //I am forcing the first points of each sketch to correspond to one another
    // and continue applying the algorithm from the next points.
    // The forced match has to be seeded; otherwise every cell below stays at +inf.
    totalCostMatrix[0][0] = distMatrix[0][0]
    for(i = 1 to sketch1.size - 1){
        curMinDist = +inf
        for(j = 1 to sketch2.size - 1){
            totalCostMatrix[i][j] = min(totalCostMatrix[i-1][j-1],
                                        totalCostMatrix[i-1][j],
                                        totalCostMatrix[i][j-1]) + distMatrix[i][j]
            if(totalCostMatrix[i][j] < curMinDist)
                curMinDist = totalCostMatrix[i][j]
        }
        if(curMinDist > curMaxDist)
            return +inf
    }
    return totalCostMatrix[sketch1.size - 1][sketch2.size - 1]
}
I am sure there is nothing wrong with the implementation as far as the syntax, the C language etc. are concerned, since I have checked that and I always get the expected result. I was just wondering if there is something wrong with the reasoning behind the algorithm. I am asking because it is a really well known algorithm and a really simple implementation, so maybe it is easy for someone to spot an error there.
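For comparison, here is a minimal Python sketch of DTW with the same row-wise early abandoning. It is not the C implementation from the question: it assumes non-empty sequences of coordinate tuples, uses the textbook boundary conditions (the first row and column are accumulated rather than left at +inf), and the names `dtw` and `cutoff` are made up for illustration.

```python
import math

def dtw(a, b, cutoff=math.inf):
    """DTW cost between two non-empty point sequences.

    Returns math.inf as soon as the cheapest cell of a row exceeds cutoff,
    because every warping path has to pass through each row and costs only grow.
    """
    m = len(b)
    prev = [math.inf] * m  # accumulated costs of the previous row
    for i, p in enumerate(a):
        cur = [math.inf] * m
        for j, q in enumerate(b):
            if i == 0 and j == 0:
                best = 0.0  # the first points are always matched to each other
            else:
                best = min(
                    prev[j],                             # step in a only
                    cur[j - 1] if j > 0 else math.inf,   # step in b only
                    prev[j - 1] if j > 0 else math.inf,  # step in both
                )
            cur[j] = best + math.dist(p, q)
        if min(cur) > cutoff:
            return math.inf  # early abandon: this candidate cannot beat cutoff
        prev = cur
    return prev[m - 1]
```

Calling `dtw(sketch, candidate, cutoff=current_kth_distance)` inside the neighbor search mirrors the `curMaxDist` argument of the pseudocode. `math.dist` requires Python 3.8 or later.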

Related

Proving that there are no overlapping sub-problems?

I just got the following interview question:
Given a list of float numbers, insert “+”, “-”, “*” or “/” between each consecutive pair of numbers to find the maximum value you can get. For simplicity, assume that all operators are of equal precedence order and evaluation happens from left to right.
Example:
(1, 12, 3) -> 1 + 12 * 3 = 39
If we built a recursive solution, we would find that we would get an O(4^N) solution. I tried to find overlapping sub-problems (to increase the efficiency of this algorithm) and wasn't able to find any overlapping problems. The interviewer then told me that there wasn't any overlapping subsolutions.
How can we detect when there are overlapping solutions and when there isn't? I spent a lot of time trying to "force" subsolutions to appear and eventually the Interviewer told me that there wasn't any.
My current solution looks as follows:
def maximumNumber(array, current_value=None):
    if current_value is None:
        current_value = array[0]
        array = array[1:]
    if len(array) == 0:
        return current_value
    return max(
        maximumNumber(array[1:], current_value * array[0]),
        maximumNumber(array[1:], current_value - array[0]),
        maximumNumber(array[1:], current_value / array[0]),
        maximumNumber(array[1:], current_value + array[0])
    )
Looking for "overlapping subproblems" sounds like you're trying to do bottom up dynamic programming. Don't bother with that in an interview. Write the obvious recursive solution. Then memoize. That's the top down approach. It is a lot easier to get working.
You may get challenged on that. Here was my response the last time that I was asked about that.
There are two approaches to dynamic programming, top down and bottom up. The bottom up approach usually uses less memory but is harder to write. Therefore I do the top down recursive/memoize and only go for the bottom up approach if I need the last ounce of performance.
It is a perfectly true answer, and I got hired.
Now you may notice that tutorials about dynamic programming spend more time on bottom up. They often even skip the top down approach. They do that because bottom up is harder. You have to think differently. It does provide more efficient algorithms because you can throw away parts of that data structure that you know you won't use again.
Coming up with a working solution in an interview is hard enough already. Don't make it harder on yourself than you need to.
EDIT Here is the DP solution that the interviewer thought didn't exist.
def find_best(floats):
    # Maps every value reachable so far to one path of operators producing it.
    current_answers = {floats[0]: ()}
    floats = floats[1:]
    for f in floats:
        next_answers = {}
        for v, path in current_answers.items():
            next_answers[v + f] = (path, '+')
            next_answers[v * f] = (path, '*')
            next_answers[v - f] = (path, '-')
            if 0 != f:
                next_answers[v / f] = (path, '/')
        current_answers = next_answers
    best_val = max(current_answers.keys())
    return (best_val, current_answers[best_val])
Generally the overlapping sub problem approach is something where the problem is broken down into smaller sub problems, the solutions to which, when combined, solve the big problem. When these sub problems exhibit an optimal sub structure, DP is a good way to solve it.
The decision about what you do with a new number that you encounter has little to do with the numbers you have already processed. Other than accounting for signs of course.
So I would say this is an overlapping sub problem solution but not a dynamic programming problem. You could use divide and conquer or even more straightforward recursive methods.
Initially let's forget about negative floats.
process each new float according to the following rules
If the new float is less than 1, insert a / before it
If the new float is more than 1 insert a * before it
If it is 1 then insert a +.
If you see a zero just don't divide or multiply
This would solve it for all positive floats.
Now let's handle the case of negative numbers thrown into the mix.
Scan the input once to figure out how many negative numbers you have.
Isolate all the negative numbers in a list, convert all the numbers whose absolute value is less than 1 to the multiplicative inverse. Then sort them by magnitude. If you have an even number of elements we are all good. If you have an odd number of elements store the head of this list in a special var , say k, and associate a processed flag with it and set the flag to False.
Proceed as before with some updated rules
If you see a negative number less than 0 but more than -1, insert a / divide before it
If you see a negative number less than -1, insert a * before it
If you see the special var and the processed flag is False, insert a - before it. Set processed to True.
There is one more optimization you can perform, which is removing pairs of negative ones as candidates for blanket subtraction from our initial negative numbers list, but this is just an edge case and I'm pretty sure your interviewer won't care.
Now the sum is only a function of the number you are adding and not the sum you are adding to :)
Computing max/min results for each operation from previous step. Not sure about overall correctness.
Time complexity O(n), space complexity O(n)
const max_value = (nums) => {
    const ops = [(a, b) => a+b, (a, b) => a-b, (a, b) => a*b, (a, b) => a/b]
    const dp = Array.from({length: nums.length}, _ => [])
    dp[0] = Array.from({length: ops.length}, _ => [nums[0],nums[0]])
    for (let i = 1; i < nums.length; i++) {
        for (let j = 0; j < ops.length; j++) {
            // If current number is zero, skip division (index 3) for this step only
            if (nums[i] === 0 && j === 3) continue
            let mx = -Infinity
            let mn = Infinity
            // dp[i-1] holds fewer than ops.length entries when division was skipped
            for (let k = 0; k < dp[i-1].length; k++) {
                const opMax = ops[j](dp[i-1][k][0], nums[i])
                const opMin = ops[j](dp[i-1][k][1], nums[i])
                mx = Math.max(opMax, opMin, mx)
                mn = Math.min(opMax, opMin, mn)
            }
            dp[i].push([mx,mn])
        }
    }
    return Math.max(...dp[nums.length-1].map(v => Math.max(...v)))
}
// Tests
console.log(max_value([1, 12, 3]))
console.log(max_value([1, 0, 3]))
console.log(max_value([17,-34,2,-1,3,-4,5,6,7,1,2,3,-5,-7]))
console.log(max_value([59, 60, -0.000001]))
console.log(max_value([0, 1, -0.0001, -1.00000001]))
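The max/min idea can be boiled down further. For a fixed right operand, each of the four operations is monotone in the running value, so the best and worst results after a step can only come from the best or worst result before it. The following Python sketch relies on that observation; it is an illustration under the question's left-to-right evaluation rule, not the code from any of the answers, and it only returns the value, not the operators.

```python
def max_value(nums):
    """Largest value reachable by inserting +, -, *, / and evaluating left to right."""
    hi = lo = nums[0]  # largest and smallest value reachable so far
    for x in nums[1:]:
        candidates = []
        for v in (hi, lo):
            candidates += [v + x, v - x, v * x]
            if x != 0:  # division by zero is simply not offered as a choice
                candidates.append(v / x)
        hi, lo = max(candidates), min(candidates)
    return hi
```

This runs in O(n) time with O(1) extra space. For the example from the question, `max_value([1, 12, 3])` gives 39.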

Dynamic programming from cormen's book

When reading about dynamic programming in "Introduction to Algorithms" by Cormen, Chapter 15: Dynamic Programming, I came across this statement:
When developing a dynamic-programming algorithm, we follow a sequence of
four steps:
Characterize the structure of an optimal solution.
Recursively define the value of an optimal solution.
Compute the value of an optimal solution, typically in a bottom-up fashion.
Construct an optimal solution from computed information.
Steps 1–3 form the basis of a dynamic-programming solution to a problem. If we
need only the value of an optimal solution, and not the solution itself, then we
can omit step 4. When we do perform step 4, we sometimes maintain additional
information during step 3 so that we can easily construct an optimal solution.
I did not understand the difference in step 3 and 4.
computing the value of optimal solution
and
constructing the optimal solution.
I was expecting to understand this by reading even further, but failed to understand.
Can some one help me understanding this by giving an example ?
Suppose we are using dynamic programming to work out whether there is a subset of [1,3,4,6,10] that sums to 9.
The answer to step 3 is the value, in this case "TRUE".
The answer to step 4 is working out the actual subset that sums to 9, in this case "3+6".
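A minimal Python sketch of that example may make the split clearer. The function name `subset_sum` and the `parent` bookkeeping are illustrative choices, and the sketch assumes positive integers. Filling the table is step 3 (it answers TRUE or FALSE); walking back through the stored links is step 4 (it recovers the subset).

```python
def subset_sum(nums, target):
    """Return (reachable, subset) for a list of positive integers."""
    # Step 3: parent[s] records how sum s was first reached: (previous sum, number used).
    parent = {0: None}
    for x in nums:
        # Iterate over a snapshot so that each number is used at most once.
        for s in list(parent):
            t = s + x
            if t <= target and t not in parent:
                parent[t] = (s, x)
    if target not in parent:
        return False, []
    # Step 4: follow the stored links back from the target to rebuild the subset.
    subset = []
    s = target
    while parent[s] is not None:
        s, x = parent[s]
        subset.append(x)
    return True, sorted(subset)
```

With the numbers above, `subset_sum([1, 3, 4, 6, 10], 9)` returns `(True, [3, 6])`. If only the TRUE/FALSE answer were needed, a plain set of reachable sums would do and the `parent` links could be dropped, which is exactly the "additional information" the book mentions.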
In dynamic programming we most of the time end up with a huge results hash. Initially, however, it only contains the result obtained from the first, smallest, simplest (bottom) case; by using these initial results and calculating on top of them we eventually converge on the target. At this point the last item in the hash is, most of the time, the target (step 3 completed). Then we have to process it to get the desired result.
A perfect example could be finding the minimum number of cubes summing up to a target. Target is 500 and we should get [5,5,5,5] or if the target is 432 we must get [6,6].
So we can implement this task in JS as follows;
function getMinimumCubes(tgt){
var maxi = Math.floor(Math.fround(Math.pow(tgt,1/3))),
hash = {0:[[]]},
cube = 0;
for (var i = 1; i <= maxi; i++){
cube = i*i*i;
for (var j = 0; j <= tgt - cube; j++){
hash[j+cube] = hash[j+cube] ? hash[j+cube].concat(hash[j].map(e => e.concat(i)))
: hash[j].map(e => e.concat(i));
}
}
return hash[tgt].reduce((p,c) => p.length < c.length ? p:c);
}
var target = 432,
result = [];
console.time("perf:");
result = getMinimumCubes(target);
console.timeEnd("perf:");
console.log(result);
So in this code, hash = {0:[[]]} is step 1; the nested for loops which eventually prepare hash[tgt] are in fact step 3; and the .reduce() call at the return stage is step 4, since it shapes the last item of the hash (hash[tgt]) into the desired result by picking the shortest among all results that sum up to the target value.
To me step 2 is somewhat meaningless, not only because of the mention of recursion but also in its meaning. Besides, I have never used nor seen a recursive approach in dynamic programming. It's best implemented with while or for loops.

Given a large database of over 50,000 points, how can I quickly search for desired points

I have a database of over 50,000 points. Each point has 3 dimensions. Let's label them [i,j,k]
I wish to find the points that no other point beats across the board, i.e. every desired point is still better than any other point in at least one dimension.
For example, Object A [10 10 3], Object B [1 1 4], Object C [1 1 1] and Object D [1 1 10].
Then the desired output would be A and D (since C is worse than all of them, and B beats A in dimension [k] but D beats B in dimension [k]).
I've tried some basic comparison algorithms (i.e. if else statements) which do work when I cut down the database size. But with 50,000, it takes more than 10mins to find the desired output, which of course is not a good solution.
Could somebody recommend me a method or two to do this the fastest possible way?
Thanks
EDIT:
Thanks I think I've got it
You can do many optimizations to your code:
{
    vector<bool> isinterst(n, true);
    for (int i=0; i<n; i++) {
        for (int j=0; j<n; j++) {
            if (isinterst[i]) {
                bool worseelsewhere=false;
                for (int k=0; k<d; k++)
                {
                    if (point[i][k]<point[j][k])
                    {
                        worseelsewhere=true;
                        break; //you can exit for loop if worseelsewhere is set to true
                    }
                }
                if(worseelsewhere == false)
                {
                    continue; //skip the rest if worseelsewhere is false
                }
                bool worse=true;
                for (int k=0; k<d; k++)
                {
                    if (point[i][k]>point[j][k])
                    {
                        worse=false;
                        break; //you can exit for loop if worse is set to false
                    }
                }
                if (worseelsewhere && worse) {
                    isinterst[i]=false;
                    //cout << i << " Not desirable " << endl;
                }
            }
        }
    }
}
You're looking for Pareto-optimal points. These form the Pareto front, which plays a role similar to a convex hull (although it is not one in general). That's easiest to see in 2 dimensions. Use an iterative algorithm to determine the Pareto-optimal points of the first N points. For N=1, that's just the first point. For N=2, the next point is either dominated by the first (discard 2nd), dominates the 1st (discard 1st), lies above to the left, or below to the right (and so is also Pareto-optimal).
You can speed up classification by keeping a simplified upper and lower bound for the front, e.g. just single points {minX, minY, minZ} and {maxX, maxY, maxZ}. If P={x,y,z} is dominated by {minX, minY, minZ} then it is dominated by all Pareto-optimal points so far and can be discarded. If P dominates {maxX, maxY, maxZ}, it also dominates all points that were Pareto-optimal so far and you can discard all those.
A quick initial step is to find the point with max X, the point with max Y, and the point with max Z; sorting by each coordinate does this in O(N log N), and a single scan does it in O(N). Finding the Pareto-optimal points in this N=3 subset is easy, and can be hardcoded. You can then use this set as a first approximation.
A more refined solution is to then sort by X+Y, X+Z, Y+Z and X+Y+Z and find those maxima as well. Again, this produces points which are good initial candidates because they will dominate many other points.
E.g. in your case, sorting by X and sorting by Y would both produce point A; sorting by Z would produce point D, neither dominates the other, and you can then quickly discard B and C.
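The iterative idea can be sketched in a few lines of Python. This is an illustration that assumes "better" means "larger" in every dimension; the function names are made up, and the bounding-point shortcut described above is left out for brevity.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    front = []
    for p in points:
        if any(dominates(q, p) for q in front):
            continue  # p is beaten by a point that is already kept
        # p survives, so drop every kept point that p beats
        front = [q for q in front if not dominates(p, q)]
        front.append(p)
    return front

points = [(10, 10, 3), (1, 1, 4), (1, 1, 1), (1, 1, 10)]  # A, B, C, D
print(pareto_front(points))  # [(10, 10, 3), (1, 1, 10)], i.e. A and D
```

The worst case is still quadratic, but each point is only compared against the current front, which is usually far smaller than the whole data set.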
Without knowing your definition of "better" it's a bit hard to make concrete suggestions here. I note, however, that you appear to be working with spatial data. A data structure that is often used when working with spatial data is the R-Tree (http://en.wikipedia.org/wiki/R-tree). This provides an efficient index for multidimensional information.
Perhaps the boost::geometry library has some tools that will assist: http://www.boost.org/doc/libs/1_53_0/libs/geometry/doc/html/geometry/introduction.html

Algorithm to find out all the possible positions

I need an algorithm to find out all the possible positions of a group of pieces in a chessboard. Like finding all the possible combinations of the positions of a number N of pieces.
For example in a chessboard numbered like cartesian coordinate systems any piece would be in a position
(x,y) where 1 <= x <= 8 and 1 <= y <= 8
I'd like to get an algorithm which can calculate for example for 3 pieces all the possible positions of the pieces in the board. But I don't know how can I get them in any order. I can get all the possible positions of a single piece but I don't know how to mix them with more pieces.
for(int i = 1; i <= 8; i++){
    for(int j = 1; j <= 8; j++){
        System.out.println("Position: x:"+i+", y:"+j);
    }
}
How can I get a good algorithm to find all the possible positions of the pieces on a chessboard?
Thanks.
You got 8x8 board, so total of 64 squares.
Populate a list containing these 64 squares [let it be list], and find all of the possibilities recursively: each step will "guess" one point, and invoke the recursive call to find the other points.
Pseudo code:
choose(list,numPieces,sol):
    if (sol.length == numPieces): //base clause: print the possible solution
        print sol
        return
    for each point in list:
        sol.append(point) //append the point to the end of sol
        list.remove(point)
        choose(list,numPieces,sol) //recursive call
        list.add(point) //clean up environment before next recursive call
        sol.removeLast()
invoke with choose(list,numPieces,[]) where list is the pre-populated list with 64 elements, and numPieces is the pieces you are going to place.
Note: This solution assumes pieces are not identical, so [(1,2),(2,1)] and [(2,1),(1,2)] are both good different solutions.
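In Python the same enumeration can be sketched with the standard library instead of hand-written recursion. The names `SQUARES` and `placements` are illustrative; `distinct=True` corresponds to the assumption above that the pieces are not identical, and `distinct=False` treats them as interchangeable.

```python
from itertools import combinations, permutations

# All 64 squares as (x, y) with 1 <= x <= 8 and 1 <= y <= 8
SQUARES = [(x, y) for x in range(1, 9) for y in range(1, 9)]

def placements(num_pieces, distinct=True):
    """Return an iterator over every way to put the pieces on different squares."""
    chooser = permutations if distinct else combinations
    return chooser(SQUARES, num_pieces)

print(sum(1 for _ in placements(2)))         # 64 * 63 = 4032 ordered placements
print(sum(1 for _ in placements(2, False)))  # C(64, 2) = 2016 unordered placements
```

Both functions are lazy, so positions can be consumed one at a time without storing them all.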
EDIT:
Just a word about complexity: there are (n^2)!/(n^2-k)! possible solutions for your problem, and since you are looking for all of them, the run time of any algorithm explodes as k grows. Trying to invoke it with just 10 pieces will take ~400 years.
[In the above notation, n is the width and length of the board, and k is the number of pieces]
You can use a recursive algorithm to generate all possiblities:
void combine(String instr, StringBuffer outstr, int index)
{
for (int i = index; i < instr.length(); i++)
{
outstr.append(instr.charAt(i));
System.out.println(outstr);
combine(instr, outstr, i + 1);
outstr.deleteCharAt(outstr.length() - 1);
}
}
combine("abc", new StringBuffer(), 0);
As I understand it, you should consider that some figure may block potential positions for figures that could reach them on an empty board. I guess that is the trickiest part.
So you should build the set of vertices (board states) that are reachable from a single vertex (the initial board state).
The first algorithm that comes to my mind:
Pre-conditions:
Order the figures in some way to form a circle.
Assume the initial set of board states (S0) contains a single element which represents the initial board state.
Actions
Choose the next figure to extend the set of possible positions.
For each board state within S(n), walk depth-first through all possible movements that produce new board states and call the result F(n) (frame).
Form S(n+1) = S(n) ∪ F(n).
Repeat the steps until every frame produced during a whole pass around the circle is empty.
This is a kind of mix of breadth-first and depth-first search.

Suggestion on algorithm to distribute objects of different value

I have the following problem:
Given N objects (N < 30) of different values multiple of a "k" constant i.e. k, 2k, 3k, 4k, 6k, 8k, 12k, 16k, 24k and 32k, I need an algorithm that will distribute all items to M players (M <= 6) in such a way that the total value of the objects each player gets is as even as possible (in other words, I want to distribute all objects to all players in the fairest way possible).
EDIT: By fairest distribution I mean that the difference between the value of the objects any two players get is minimal.
Another similar case would be: I have N coins of different values and I need to divide them equally among M players; sometimes they don't divide exactly and I need to find the next best case of distribution (where no player is angry because another one got too much money).
I don't need (pseudo)code to solve this (also, this is not a homework :) ), but I'll appreciate any ideas or links to algorithms that could solve this.
Thanks!
The problem is strongly NP-complete. This means no algorithm is known that guarantees an optimal solution in polynomial time, so an exact answer may be impractical for larger inputs. (See the 3-partition problem, thanks Paul).
Instead you'll want to go for a good approximate solution generator. These can often get very close to the optimal answer in very short time. I can recommend the Simulated Annealing technique, which you will also be able to use for a ton of other NP-complete problems.
The idea is this:
Distribute the items randomly.
Continually make random swaps between two random players, as long as it makes the system more fair, or only a little less fair (see the wiki for details).
Stop when you have something fair enough, or you have run out of time.
This solution is much stronger than the 'greedy' algorithms many suggest. The greedy algorithm is the one where you continuously add the largest item to the 'poorest' player. An example of a testcase where greedy fails is [10,9,8,7,7,5,5].
I did an implementation of SA for you. It follows the wiki article strictly, for educational purposes. If you optimize it, I would say a 100x improvement wouldn't be unrealistic.
from __future__ import division
import random, math

values = [10,9,8,7,7,5,5]
M = 3
kmax = 1000
emax = 0

def s0():
    # Initial state: every item goes to a random player
    s = [[] for i in range(M)]
    for v in values:
        random.choice(s).append(v)
    return s

def E(s):
    # Energy: squared deviation of each player's total from the average
    avg = sum(values)/M
    return sum(abs(avg-sum(p))**2 for p in s)

def neighbour(s):
    # Move one random item from a non-empty player p1 to another player p2
    snew = [p[:] for p in s]
    while True:
        p1, p2 = random.sample(range(M),2)
        if s[p1]: break
    item = random.randrange(len(s[p1]))
    snew[p2].append(snew[p1].pop(item))
    return snew

def P(e, enew, T):
    # Acceptance probability: always accept improvements
    if enew < e: return 1
    return math.exp((e - enew) / T)

def temp(r):
    return (1-r)*100

s = s0()
e = E(s)
sbest = s
ebest = e
k = 0
while k < kmax and e > emax:
    snew = neighbour(s)
    enew = E(snew)
    if enew < ebest:
        sbest = snew; ebest = enew
    if P(e, enew, temp(k/kmax)) > random.random():
        s = snew; e = enew
    k += 1
print(sbest)
Update: After playing around with Branch'n'Bound, I now believe this method to be superior, as it gives perfect results for the N=30, M=6 case within a second. However I guess you could play around with the simulated annealing approach just as much.
The greedy solution suggested by a few people seems like the best option, I ran it a bunch of times with some random values, and it seems to get it right every time.
If it's not optimal, it's at the very least very close, and it runs in O(nm) or so (I can't be bothered to do the math right now)
C# Implementation:
static List<List<int>> Dist(int n, IList<int> values)
{
var result = new List<List<int>>();
for (int i = 1; i <= n; i++)
result.Add(new List<int>());
var sortedValues = values.OrderByDescending(val => val);
foreach (int val in sortedValues)
{
var lowest = result.OrderBy(a => a.Sum()).First();
lowest.Add(val);
}
return result;
}
how about this:
order the k values.
order the players.
loop over the k values giving the next one to the next player.
when you get to the end of the players, turn around and continue giving the k values to the players in the opposite direction.
Repeatedly give the available object with the largest value to the player who has the least total value of objects assigned to him.
This is a straight-forward implementation of Justin Peel's answer:
M = 3
players = [[] for i in range(M)]
values = [10,4,3,1,1,1]
values.sort()
values.reverse()
for v in values:
    # sorted() is stable, so ties go to the earliest player
    lowest=sorted(players, key=lambda x: sum(x))[0]
    lowest.append(v)
print(players)
print([sum(p) for p in players])
I am a beginner with Python, but it seems to work okay. This example will print
[[10], [4, 1], [3, 1, 1]]
[10, 5, 5]
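To check how far greedy can be from the optimum, here is a small Python sketch that compares it with an exhaustive search on the test case [10,9,8,7,7,5,5] from the simulated annealing answer. It measures fairness as the difference between the richest and the poorest player, as in the question's EDIT. The function names are illustrative, and the exhaustive search is only feasible for tiny inputs since it tries M^N assignments.

```python
from itertools import product

def spread(players):
    """Difference between the largest and the smallest total."""
    sums = [sum(p) for p in players]
    return max(sums) - min(sums)

def greedy(values, m):
    """Give each value, largest first, to the currently poorest player."""
    players = [[] for _ in range(m)]
    for v in sorted(values, reverse=True):
        min(players, key=sum).append(v)
    return players

def exhaustive(values, m):
    """Try every assignment of values to players and keep the fairest one."""
    best = None
    for owners in product(range(m), repeat=len(values)):
        players = [[] for _ in range(m)]
        for v, owner in zip(values, owners):
            players[owner].append(v)
        if best is None or spread(players) < spread(best):
            best = players
    return best

values = [10, 9, 8, 7, 7, 5, 5]
print(spread(greedy(values, 3)))      # 5: totals are 20, 16 and 15
print(spread(exhaustive(values, 3)))  # 0: totals are 17, 17 and 17
```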
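30 ^ 6 isn't that large (it's less than 1 billion), but note that assigning each of N objects to one of M players gives M^N possible allocations (6^30, roughly 2.2 * 10^23, in the largest case). So going through every possible allocation and picking the one that's the fairest by whatever measure you define is only practical when N is small.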
EDIT:
The purpose was to use the greedy solution with small improvement in the implementation, which is maybe transparent in C#:
static List<List<int>> Dist(int n, IList<int> values)
{
var result = new List<List<int>>();
for (int i = 1; i <= n; i++)
result.Add(new List<int>());
var sortedValues = values.OrderByDescending(val => val);//Assume the most efficient sorting algorithm - O(N log(N))
foreach (int val in sortedValues)
{
var lowest = result.OrderBy(a => a.Sum()).First();//This can be done in O(M * log(n)) [M - size of sortedValues, n - size of result]
lowest.Add(val);
}
return result;
}
Regarding this stage:
var lowest = result.OrderBy(a => a.Sum()).First();//This can be done in O(M * log(n)) [M - size of sortedValues, n - size of result]
The idea is that the list is always sorted (in this code it is done by OrderBy). Eventually, this sorting won't take more than O(log(n)), because we just need to INSERT at most one item into a sorted list, which should take the same as a binary search.
Because we need to repeat this phase for sortedValues.Length times, the whole algorithm runs in O(M * log(n)).
So, in words, it can be rephrased as:
Repeat the steps below till you finish the Values values:
1. Add the biggest value to the smallest player
2. Check if this player still has the smallest sum
3. If yes, go to step 1.
4. Insert the last-that-was-got player to the sorted players list
Step 4 is the O (log(n)) step - as the list is always sorted.
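One way to get the logarithmic step in practice is a binary heap keyed on each player's running total, so that neither re-sorting nor re-summing is needed. The Python sketch below is an illustration of that idea rather than a translation of the C# code; `distribute` is a made-up name, and ties are broken by player index.

```python
import heapq

def distribute(values, m):
    """Greedy distribution: largest value first, always to the poorest player."""
    heap = [(0, i) for i in range(m)]  # (running total, player index)
    heapq.heapify(heap)
    players = [[] for _ in range(m)]
    for v in sorted(values, reverse=True):
        total, i = heapq.heappop(heap)  # poorest player, O(log m)
        players[i].append(v)
        heapq.heappush(heap, (total + v, i))
    return players

print(distribute([10, 4, 3, 1, 1, 1], 3))  # [[10], [4, 1], [3, 1, 1]]
```

The initial sort costs O(N log N) and each of the N heap updates costs O(log M), where N is the number of values and M the number of players.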

Resources