In section 3.5.4.2 of RFC 5104, an algorithm is given to derive the bounding set of a set of lines. Basically, each line is of the form y = mx + b, and the objective is to find the points of intersection that identify the convex hull (equivalent to determining which receivers are relevant for bitrate adaptation in RTP media sessions). The following observation is taken from the RFC:
These observations lead to the conclusion that when processing the TMMBR tuples to select the initial bounding set, one should sort and process the tuples by order of increasing overhead. Once a particular tuple has been added to the bounding set, all tuples not already selected and having lower overhead can be eliminated, because the next side of the bounding polygon has to be steeper (i.e., the corresponding TMMBR must have higher overhead) than the latest added tuple.
I don't think this is correct. Assume you have a line such as the one labelled 'a' in Figure 1 of the RFC. It is possible to draw a line with a larger slope, such as the line labelled 'b', that starts on the Y-axis below line 'a'. In other words, if line 'b' has a lower intercept on the Y-axis, you should consider line 'b' first; but if that is true, the rest of the algorithm does not work.
I believe the key here, and indeed the reason why the greedy approach works at all to solve the problem, is the sort ordering. Your intuitive claim is correct in the sense that such a line can exist in general, but under the prescribed ordering it is guaranteed never to appear in a situation where it would actually invalidate the proof.
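To see why the ordering does the work, here is a minimal sketch, in Python, of the generic lower-envelope construction for lines y = mx + b. This is not the RFC's exact procedure, just the textbook construction that the RFC's greedy step is an instance of: once the lines are sorted by slope, a candidate can only ever be discarded by comparing it with its neighbours on a stack, and a line with a larger slope and a lower intercept (your line 'b') simply gets processed at its proper place in the sorted order and pops any earlier line it makes redundant.

```python
def lower_envelope(lines):
    """Return the lines (m, b) that appear on the lower envelope of y = m*x + b,
    ordered from left to right along the envelope (slopes sorted descending here;
    sorting the other way works symmetrically)."""
    lines = sorted(lines, key=lambda mb: (-mb[0], mb[1]))
    hull = []
    for m, b in lines:
        if hull and hull[-1][0] == m:
            continue                     # same slope, higher intercept: never the minimum
        while len(hull) >= 2:
            (m1, b1), (m2, b2) = hull[-2], hull[-1]
            # Pop hull[-1] if the new line already undercuts hull[-2] at (or before)
            # the point where hull[-1] would have taken over from hull[-2].
            if (b - b1) * (m1 - m2) <= (b2 - b1) * (m1 - m):
                hull.pop()
            else:
                break
        hull.append((m, b))
    return hull

# Example: the middle line never reaches the envelope and is discarded.
print(lower_envelope([(1.0, 0.0), (0.5, 10.0), (0.0, 3.0)]))
```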
Wikipedia has a great set of references on solutions to this problem. For a more elegant, closed-form treatment of the problem and its variants, consider asking about the finer points of this question more abstractly on http://cs.stackexchange.com or http://math.stackexchange.com.
Best of luck with retrofitting RTP to meet your needs.
I am trying to use the Viterbi min-sum algorithm, which finds the path through a bunch of nodes that minimizes the overall Hamming distance (a fancy term for "XOR two numbers and count the resulting set bits") against some fixed input.
I understand how to use DP to compute the minimal overall distance, but I am having trouble using it to also recover the path that achieves that minimum.
It seems like memoizing the path at each node would be really memory-intensive. Is there a standard way to handle these kinds of problems?
Edit:
http://i.imgur.com/EugiEWG.jpg
Here is a sample trellis with what I am talking about. The general idea is to find the path through the trellis that most closely emulates the input bitstring, with minimal error (measured by minimizing overall Hamming distance, or the number of mismatched bits).
As you can see, the first chunk of my input string is 01, and I can traverse there in column 1 of the trellis. The next chunk is 10, and I can move there in column 2. Next chunk is 11. Fine so far. Next chunk is 10, which is a problem because I can't reach that state from where I am now, so I have to go to the next best thing (00) and the rest can be filled fine.
But this can become more complex. I'd need to be able to somehow get the corresponding path to the minimal Hamming distance.
(The point of this exercise is that the trellis represents what are ACTUALLY valid transitions, whereas the input string is something you receive over a telecommunications channel and might get garbled, with incorrect bits here and there. This program tries to figure out what the input string SHOULD be by minimizing the error.)
There's the usual "follow the path backwards" technique, which requires only the table of values (the whole table of values, though; no cheating with "keep only the most recent part"). The algorithm is simple: start at the end and decide which way you came from. You can make that decision because either there is exactly one predecessor that, had you come from it, would produce the value that matches the stored one, or several predecessors give the same value and it doesn't matter which one you choose.
Storing a table of "back-pointers" as well doesn't take much space (about as much as the table of weights, though you can actually omit most of the table of weights if you do this), and it gives you a much simpler backwards phase: just follow the pointers. That really is the path, just stored backwards.
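A minimal sketch of the back-pointer version in Python, assuming a made-up trellis layout in which trellis[s] lists the (next_state, output_bits) edges leaving state s; the extra memory is one small integer per state per column, not a copy of a path per node:

```python
import math

def hamming(a, b):
    """Number of differing bits between two equal-width chunks."""
    return bin(a ^ b).count("1")

def viterbi(trellis, n_states, received, start_state=0):
    """Returns (total Hamming distance, state path of length len(received) + 1)."""
    INF = math.inf
    cost = [0.0 if s == start_state else INF for s in range(n_states)]
    back = []                          # back[t][s] = best predecessor of state s at step t
    for chunk in received:
        new_cost, bp = [INF] * n_states, [None] * n_states
        for s in range(n_states):
            if cost[s] == INF:
                continue
            for nxt, out_bits in trellis[s]:
                c = cost[s] + hamming(out_bits, chunk)
                if c < new_cost[nxt]:
                    new_cost[nxt], bp[nxt] = c, s
        cost = new_cost
        back.append(bp)
    # Trace back from the cheapest final state by following the stored pointers.
    state = min(range(n_states), key=lambda s: cost[s])
    total, path = cost[state], [state]
    for bp in reversed(back):
        state = bp[state]
        path.append(state)
    return total, path[::-1]

# Toy example with 2 states and made-up 2-bit outputs on the edges.
trellis = {0: [(0, 0b00), (1, 0b11)], 1: [(0, 0b10), (1, 0b01)]}
print(viterbi(trellis, 2, [0b01, 0b10, 0b11, 0b10]))
```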
You are correct that the immediate approach for calculating the paths is space-expensive.
This problem comes up often in DNA sequencing, where the cost is prohibitive. There are a number of ways to overcome it (see more here):
You can reduce the space by up to a square-root factor if you are willing to double the execution time (see 2.1.1 in the link above; a rough sketch of this checkpointing idea follows the next point).
Using a compressed tree, you can reduce one of the dimensions logarithmically (see 2.1.2 in the link above).
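This is not the exact scheme from the linked survey, just a generic sketch of the checkpoint-and-recompute idea applied to the Viterbi recurrence: keep the score vector only every ~√T columns, and re-run one block at a time during traceback to regenerate its back-pointers. The transition_cost(prev_state, state, chunk) hook is hypothetical (e.g. the Hamming distance of that edge's output bits against the received chunk, or infinity for edges that don't exist in the trellis).

```python
import math

def viterbi_block(left_scores, columns, transition_cost):
    """Plain Viterbi over one block: returns (scores at its right edge, back-pointers per column)."""
    n = len(left_scores)
    scores, back = list(left_scores), []
    for col in columns:
        new_scores, bp = [], []
        for s in range(n):
            best_p = min(range(n), key=lambda p: scores[p] + transition_cost(p, s, col))
            bp.append(best_p)
            new_scores.append(scores[best_p] + transition_cost(best_p, s, col))
        back.append(bp)
        scores = new_scores
    return scores, back

def viterbi_sqrt_space(columns, n_states, transition_cost, block=None):
    """Keep score vectors only at block boundaries (block ~ sqrt(T)); recompute inside blocks."""
    T = len(columns)
    block = block or max(1, math.isqrt(T))
    checkpoints = {0: [0.0] * n_states}          # assume any start state is allowed
    scores = checkpoints[0]
    for t in range(0, T, block):                 # forward pass, discarding back-pointers
        scores, _ = viterbi_block(scores, columns[t:t + block], transition_cost)
        checkpoints[t + block] = scores
    state = min(range(n_states), key=lambda s: scores[s])
    path = [state]
    for t in range(((T - 1) // block) * block, -1, -block):
        _, back = viterbi_block(checkpoints[t], columns[t:t + block], transition_cost)
        for bp in reversed(back):                # walk this block backwards
            state = bp[state]
            path.append(state)
    return path[::-1]
```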
This is a problem I come across frequently, and I'm searching for a more efficient way to solve it. Take a look at this picture:
Let's say you want to find the shortest distance from the red point to a line segment. Assume you only know the start/end points (x, y) of the segments and the point. This can be done in O(n), where n is the number of line segments, by checking the distance from the point to every segment. This is IMO not efficient, because in the worst case there have to be n-1 distance checks until the right one is found.
This can be a real performance issue for, say, n = 1000 (which is a realistic number), especially if the distance calculation isn't just done in Euclidean space with the Pythagorean theorem but, for example, with a geodesic method like the haversine formula or Vincenty's.
This is a general problem in different situations:
Is the point within a given radius of one of the vertices?
Which set of vertices is nearest to the point?
Is the point surrounded by line segments?
To answer these questions, the only approach I know is O(n). Is there a data structure or a different strategy to solve these problems more efficiently?
To make it short: I'm looking for a way to "filter" the line segments / vertices somehow, so that I get a set of potential candidates before I start my distance calculations; something that reduces the complexity to O(m) where m < n.
Probably not an acceptable answer, but too long for a comment: The most appropriate answer here depends on details that you did not state in the question.
If you only want to perform this test once, then there will be no way to avoid a linear search. However, if you have a fixed set of lines (or a set of lines that does not change too significantly over time), then you may employ various techniques for accelerating the queries. These are usually referred to as spatial indices; a quadtree is one example.
You'll have to expect a trade-off between several factors, like the query time and the memory consumption, or the query time and the time that is required for updating the data structure when the given set of lines changes. The latter also depends on whether it is a structural change (lines being added or removed), or whether only the positions of the existing lines change.
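As a concrete illustration of the "filter first" idea, here is a rough sketch in Python using a uniform grid instead of a quadtree (same principle: nearby segments end up in nearby cells). It assumes plain Euclidean distance and segments given as pairs of (x, y) endpoints; the class and its cell_size parameter are made up for the example.

```python
import math
from collections import defaultdict

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

class SegmentGrid:
    """Bucket segments into grid cells so a query only has to look at nearby cells."""
    def __init__(self, segments, cell_size):
        self.segments, self.cell = segments, cell_size
        self.buckets = defaultdict(set)
        for i, (a, b) in enumerate(segments):
            # Register each segment in every cell its bounding box overlaps (an over-approximation).
            for cx in range(int(min(a[0], b[0]) // cell_size), int(max(a[0], b[0]) // cell_size) + 1):
                for cy in range(int(min(a[1], b[1]) // cell_size), int(max(a[1], b[1]) // cell_size) + 1):
                    self.buckets[(cx, cy)].add(i)

    def candidates(self, p, radius):
        """Indices of segments that could lie within `radius` of p -- the filtered set of m candidates."""
        found = set()
        for cx in range(int((p[0] - radius) // self.cell), int((p[0] + radius) // self.cell) + 1):
            for cy in range(int((p[1] - radius) // self.cell), int((p[1] + radius) // self.cell) + 1):
                found |= self.buckets.get((cx, cy), set())
        return found

    def nearest(self, p, radius):
        """Nearest candidate segment within `radius` of p, or (None, radius) if there is none."""
        best, best_d = None, radius
        for i in self.candidates(p, radius):
            d = point_segment_distance(p, *self.segments[i])
            if d <= best_d:
                best, best_d = i, d
        return best, best_d
```

If nearest comes back empty, you re-query with a larger radius; the "is the point within radius r of anything" questions are answered by candidates plus one distance check per candidate.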
I'm trying to implement an SVM algorithm, but I'm having a hard time understanding how d-dimensional data sets are actually handled. In my particular case, each 'point' has nearly 400 identifying features.
In two-dimensional space, it basically tries to find a line between the two groups that maximizes the margin to the nearest points on either side. I can sort of imagine what such a 'line' would look like in a d-dimensional space, but I'm completely lost on how the classification would actually work.
There is a similar question here, but I'm not getting it. I sort of get how the separation would occur after you have the classifier, but I'm lost on how to actually get the classifier.
If you can imagine how the line of the 2D case becomes a hyperplane in d dimensions, then you are pretty much done. The actual classification occurs when you test a point against the hyperplane, which gives you a positive number if the point belongs to class 1 and a negative one if it belongs to class 2.
Notice that in the decision function there is no restriction on the dimension of each point x:
f(x) = sign(w · x - b)
[Formula courtesy of Wikipedia; w is the learned weight vector and b the bias.]
And in case you are curious about what happens in the non-linear case, when you use the kernel trick, I would like to share a video that illustrates the idea very well:
http://www.youtube.com/watch?v=3liCbRZPrZA
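For what it's worth, here is a tiny sketch of just the classification step for d-dimensional points. Obtaining w and b is the training part, which the linked question and most libraries handle for you; the numbers below are made up:

```python
def classify(w, b, x):
    """+1 if x falls on the positive side of the hyperplane w·x - b = 0, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if score >= 0 else -1

# The code does not care whether d is 2 or 400; here d = 4 with made-up values.
print(classify([0.2, -1.0, 0.5, 0.0], 0.1, [1.0, 0.3, 2.0, -1.0]))
```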
I have a need to take a 2D graph of n points and reduce it to r points (where r is a specific number less than n). For example, I may have two datasets with slightly different numbers of total points, say 1021 and 1001, and I'd like to force both datasets to have 1000 points. I am aware of a couple of simplification algorithms: Lang simplification and Douglas-Peucker. I have used Lang in a previous project with slightly different requirements.
The specific properties of the algorithm I am looking for is:
1) must preserve the shape of the line
2) must allow me reduce dataset to a specific number of points
3) is relatively fast
This post is a discussion of the merits of the different algorithms. I will post a second message for advice on implementations in Java or Groovy (why reinvent the wheel).
I am concerned about requirement 2 above. I am not an expert enough in these algorithms to know whether I can dictate the exact number of output points. The implementation of Lang that I've used took lookAhead, tolerance and the array of Points as input, so I don't see how to dictate the number of points in the output. This is a critical requirement of my current needs. Perhaps this is due to the specific implementation of Lang we had used, but I have not seen a lot of information on Lang on the web. Alternatively we could use Douglas-Peucker but again I am not sure if the number of points in the output can be specified.
I should add I am not an expert on these types of algorithms or any kind of math wiz, so I am looking for mere mortal type advice :) How do I satisfy requirements 1 and 2 above? I would sacrifice performance for the right solution.
I think you can adapt Douglas-Peucker quite straightforwardly. Adapt the recursive algorithm so that rather than producing a list it produces a tree mirroring the structure of the recursive calls. The root of the tree will be the single-line approximation P0-Pn; the next level will represent the two-line approximation P0-Pm-Pn where Pm is the point between P0 and Pn which is furthest from P0-Pn; the next level (if full) will represent a four-line approximation, etc. You can then trim the tree either on the basis of depth or on the basis of distance of the inserted point from the parent line.
Edit: in fact, if you take the latter approach you don't need to build a tree. Instead you populate a priority queue where the priority is given by the distance of the inserted point from the parent line. Then when you've finished the queue tells you which points to remove (or keep, according to the order of the priorities).
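One possible reading of that priority idea, as a rough sketch in Python (coordinates as (x, y) tuples): rank every interior point by the distance at which plain Douglas-Peucker would insert it, then keep the endpoints plus the r-2 highest-ranked interior points. This is a simplification of the tree/queue described above, not its exact procedure.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from p to the (infinite) line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def rank_points(points):
    """rank[k] = distance of point k from its parent line in the Douglas-Peucker recursion."""
    rank = {}
    def recurse(i, j):
        if j <= i + 1:
            return
        k = max(range(i + 1, j), key=lambda m: point_line_distance(points[m], points[i], points[j]))
        rank[k] = point_line_distance(points[k], points[i], points[j])
        recurse(i, k)
        recurse(k, j)
    recurse(0, len(points) - 1)
    return rank

def simplify_to(points, r):
    """Keep the endpoints plus the r-2 interior points with the largest insertion distances."""
    rank = rank_points(points)
    keep = {0, len(points) - 1}
    keep.update(sorted(rank, key=rank.get, reverse=True)[:max(r - 2, 0)])
    return [points[i] for i in sorted(keep)]

# Example: reduce a 5-point line to exactly 3 points.
print(simplify_to([(0, 0), (1, 0.1), (2, 3), (3, 0.2), (4, 0)], 3))
```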
You can find my C++ implementation and article on Douglas-Peucker simplification here and here. I also provide a modified version of the Douglas-Peucker simplification that allows you to specify the number of points of the resulting simplified line. It uses a priority queue as mentioned by 'Peter Taylor'. It's a lot slower though, so I don't know if it would satisfy the 'is relatively fast' requirement.
I'm planning on providing an implementation for Lang simplification (and several others). Currently I don't see any easy way to adjust Lang to reduce to a fixed point count. If you could live with a less strict requirement ('must allow me to reduce the dataset to an approximate number of points'), then you could use an iterative approach: guess an initial value for the lookahead (point count / desired point count), then slowly increase the lookahead until you approximately hit the desired point count.
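A rough sketch of that iterative tuning in Python; `simplify` stands in for whichever Lang implementation you end up plugging in (assumed to return fewer points as the lookahead grows), so only the search loop is real here:

```python
def tune_lookahead(points, target, simplify, max_lookahead=50):
    """Increase the lookahead until the simplified output drops to (or below) `target` points,
    keeping whichever result came closest to the target along the way."""
    best = list(points)
    for lookahead in range(2, max_lookahead + 1):
        result = simplify(points, lookahead)     # e.g. a Lang call with a fixed tolerance
        if abs(len(result) - target) < abs(len(best) - target):
            best = result
        if len(result) <= target:
            break
    return best
```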
I hope this helps.
p.s.: I just remembered something: you could also try the Visvalingam-Whyatt algorithm (a rough sketch follows the list). In short:
- compute the triangle area for each point with its direct neighbours
- sort these areas
- remove the point with the smallest area
- update the areas of its neighbours
- re-sort
- continue until the desired number of points remains
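A naive O(n²) sketch of those steps in Python, reducing to exactly r points (the input is assumed to be a list of (x, y) tuples; a real implementation would keep the areas in a heap instead of rescanning):

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by point b and its neighbours a and c."""
    return abs((a[0] - c[0]) * (b[1] - a[1]) - (a[0] - b[0]) * (c[1] - a[1])) / 2.0

def visvalingam(points, r):
    pts = list(points)
    while len(pts) > max(r, 2):
        # The interior point whose removal changes the shape the least has the smallest area.
        k = min(range(1, len(pts) - 1),
                key=lambda i: triangle_area(pts[i - 1], pts[i], pts[i + 1]))
        del pts[k]        # its neighbours' areas are recomputed implicitly on the next pass
    return pts

# Example usage: reduced = visvalingam(original_points, 1000)
```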
I'm in the process of learning about simulated annealing algorithms and have a few questions on how I would modify an example algorithm to solve a 0-1 knapsack problem.
I found this great code on CP:
http://www.codeproject.com/KB/recipes/simulatedAnnealingTSP.aspx
I'm pretty sure I understand how it all works now (except the whole Boltzmann acceptance condition, which as far as I'm concerned is black magic, though I understand it is about escaping local optima and apparently this does exactly that). I'd like to re-design this to solve a 0-1 knapsack-"ish" problem. Basically I'm putting one of 5,000 objects in each of 10 sacks and need to optimize for the least unused space. The actual "score" I assign to a solution is a bit more complex, but not related to the algorithm.
This seems easy enough. It means the Anneal() function would be basically the same, and I'd have to implement the GetNextArrangement() function to fit my needs. In the TSP, he just swaps two random nodes along the path (i.e., he makes a very small change each iteration).
For my problem, on the first iteration I'd pick 10 random objects and look at the leftover space. For the next iteration, would I just pick 10 new random objects? Or am I better off swapping out only a few of the objects, like half of them or even just one? Or maybe the number of objects I swap out should be relative to the temperature? Any of these seem doable to me; I'm just wondering if someone has advice on the best approach (though I can mess around with improvements once I have the code working).
Thanks!
Mike
With simulated annealing, you want to make neighbour states as close in energy as possible. If the neighbours have significantly greater energy, then it will just never jump to them without a very high temperature -- high enough that it will never make progress. On the other hand, if you can come up with heuristics that exploit lower-energy states, then exploit them.
For the TSP, this means swapping adjacent cities. For your problem, I'd suggest a conditional neighbour selection algorithm as follows:
If there are objects that fit in the empty space, then it always puts the biggest one in.
If no objects fit in the empty space, then pick an object to swap out -- but prefer to swap objects of similar sizes.
That is, give each object a selection probability that is inversely related to the difference in sizes. You might want to use something like roulette selection here, with the slice size being something like 1 / (size1 - size2)^2.
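A rough sketch of that biased move in Python. This is not the GetNextArrangement() from the CodeProject article; the names and the single-sack simplification are made up for illustration (with 10 sacks you would first pick which sack to modify), and infeasible swaps can simply be rejected or penalized by your scoring function:

```python
import random

def neighbour(chosen, sizes, capacity):
    """One annealing move on a set of chosen object indices for a single sack."""
    chosen = set(chosen)
    unused = [i for i in range(len(sizes)) if i not in chosen]
    free = capacity - sum(sizes[i] for i in chosen)
    fitting = [i for i in unused if sizes[i] <= free]
    if fitting:
        # Something still fits in the empty space: always put the biggest such object in.
        chosen.add(max(fitting, key=lambda i: sizes[i]))
        return chosen
    # Nothing fits: swap a chosen object for an unused one, preferring similar sizes
    # (roulette weights ~ 1 / ((size difference)^2 + 1); the +1 avoids division by zero).
    out = random.choice(sorted(chosen))
    weights = [1.0 / ((sizes[i] - sizes[out]) ** 2 + 1.0) for i in unused]
    swap_in = random.choices(unused, weights=weights, k=1)[0]
    chosen.discard(out)
    chosen.add(swap_in)
    return chosen
```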
Ah, I think I found my answer on Wikipedia. It suggests moving to a "neighbor" state, which usually means changing as little as possible (like swapping two cities in a TSP).
From: http://en.wikipedia.org/wiki/Simulated_annealing
"The neighbours of a state are new states of the problem that are produced after altering the given state in some particular way. For example, in the traveling salesman problem, each state is typically defined as a particular permutation of the cities to be visited. The neighbours of some particular permutation are the permutations that are produced for example by interchanging a pair of adjacent cities. The action taken to alter the solution in order to find neighbouring solutions is called "move" and different "moves" give different neighbours. These moves usually result in minimal alterations of the solution, as the previous example depicts, in order to help an algorithm to optimize the solution to the maximum extent and also to retain the already optimum parts of the solution and affect only the suboptimum parts. In the previous example, the parts of the solution are the parts of the tour."
So I believe my GetNextArrangement() function would want to swap out a random item in the current set for a random item not yet used.