I'm trying to code the livewire algorithm, but I'm a little stuck because the explanation in the article "Intelligent Scissors for Image Composition" is a little messy and I don't completely understand how to apply certain things, for example how to calculate the local cost map.
So please, can anyone give me a hand and explain it step by step in simple words?
I would appreciate any help.
Thanks.
You should read Mortensen, Eric N., and William A. Barrett. "Interactive segmentation with intelligent scissors." Graphical models and image processing 60.5 (1998): 349-384. which contains more details about the algorithm than the shorter paper "Intelligent Scissors for Image Composition."
Here is a high-level overview:
The Intelligent Scissors algorithm uses a variant of Dijkstra's graph search algorithm to find a minimum cost path from a seed pixel to a destination pixel (the position of the mouse cursor during interactive segmentation).
1) Local costs
Each edge from a pixel p to a pixel q has a local cost, which is a linear combination of the following cost components (adjusted by the distance between p and q to account for diagonal neighbors):
Laplacian zero-crossing f_Z(q)
Gradient magnitude f_G(q)
Gradient direction f_D(p,q)
Edge pixel value f_P(q)
Inside pixel value f_I(q)
Outside pixel value f_O(q)
Some of these local costs are static and can be computed offline. f_Z and f_G are computed at different scales (meaning with different kernel sizes) to better represent the edge at a pixel q. f_P, f_I, and f_O are computed dynamically, and f_G has a dynamic component, to support on-the-fly training.
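To make the gradient magnitude term concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows, uses simple central differences instead of the multi-scale kernels mentioned above, and scales the step cost by the Euclidean distance between p and q. The function names gradient_magnitude_cost and link_cost are made up for this illustration and are not taken from the paper.

```python
import math

def gradient_magnitude_cost(image):
    """Return f_G = 1 - G / max(G) for a 2D list of gray values."""
    h, w = len(image), len(image[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border (an assumption).
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            grad[y][x] = math.hypot(gx, gy)
    g_max = max(max(row) for row in grad) or 1.0  # avoid dividing by zero
    # Strong edges get a low cost, flat regions get a high cost.
    return [[1.0 - g / g_max for g in row] for row in grad]

def link_cost(f_g, p, q):
    """Cost of stepping from p to neighbor q using only f_G, scaled by step length."""
    (py, px), (qy, qx) = p, q
    step = math.hypot(qy - py, qx - px)  # 1 for axis steps, sqrt(2) for diagonals
    return f_g[qy][qx] * step

image = [[0, 0, 10, 10] for _ in range(3)]  # vertical edge in the middle
f_g = gradient_magnitude_cost(image)
```

In a full implementation the other components (f_Z, f_D, and the trained terms) would be added to this with their own weights.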
2) On-the-fly training
To prevent snapping to a different edge with a lower cost than the current one being followed, the algorithm uses on-the-fly training to assign a lower cost to neighboring pixels that "look like" past pixels along the current edge.
This is done by building a histogram of image value features along the last 64 or 128 edge pixels. The image value features are computed by scaling and rounding f'_G (where f_G = 1 - f'_G), f_P, f_I, and f_O so that they have integer values in [0, 255] or [0, 1023], which can be used to index the histograms.
The histograms are inverted and scaled to compute dynamic cost maps m_G, m_P, m_I, and m_O. The idea is that a low cost neighbor q should fit in the histogram of the 64 or 128 pixels previously seen.
The paper gives pseudo code showing how to compute these dynamic costs given a list of previously chosen pixels on the path.
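As a rough illustration of the histogram idea (not the paper's pseudo code, which may differ in details such as smoothing), the sketch below builds one dynamic cost map from feature values along the most recent path pixels. It assumes the feature values are already normalized to [0, 1]; the name dynamic_cost_map and its parameters are hypothetical.

```python
def dynamic_cost_map(features, bins=256, window=64):
    """Turn feature values of recent path pixels into a per-bin cost table."""
    recent = features[-window:]  # only the last `window` pixels are used
    hist = [0] * bins
    for value in recent:
        # Scale and round the feature to an integer histogram index.
        index = min(bins - 1, max(0, round(value * (bins - 1))))
        hist[index] += 1
    peak = max(hist) or 1  # avoid dividing by zero when there is no training data
    # Invert and scale: frequently seen values get a low cost.
    return [1.0 - count / peak for count in hist]

# Ten path pixels with feature 0.5 and five with feature 1.0, using 11 bins.
m = dynamic_cost_map([0.5] * 10 + [1.0] * 5, bins=11)
```

A neighbor q whose quantized feature falls into a well-populated bin then gets a low dynamic cost, which is what keeps the wire on the edge it has been following.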
3) Graph search
The static and dynamic costs are combined together into a single cost to move from pixel p to one of its 8 neighbors q. Finding the lowest cost path from a seed pixel to a destination pixel is done by essentially using Dijkstra's algorithm with a min-priority queue. The paper gives pseudo code.
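The graph search itself can be sketched with Python's heapq module. This is a simplified version under a few assumptions: the combined local cost is given as one value per pixel in a 2D list, the step cost is that value times the step length, and the search stops as soon as the destination is reached (an interactive tool would more likely expand the whole image once per seed). The name live_wire_path is made up for this example.

```python
import heapq
import math

def live_wire_path(cost, seed, goal):
    """Dijkstra on an 8-connected pixel grid; returns the path from seed to goal."""
    h, w = len(cost), len(cost[0])
    dist = {seed: 0.0}
    prev = {}
    heap = [(0.0, seed)]
    done = set()
    while heap:
        d, p = heapq.heappop(heap)
        if p in done:
            continue  # stale queue entry
        done.add(p)
        if p == goal:
            break
        py, px = p
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                qy, qx = py + dy, px + dx
                if (dy == 0 and dx == 0) or not (0 <= qy < h and 0 <= qx < w):
                    continue
                # Diagonal steps are longer, so they cost sqrt(2) times as much.
                nd = d + cost[qy][qx] * math.hypot(dy, dx)
                if nd < dist.get((qy, qx), math.inf):
                    dist[(qy, qx)] = nd
                    prev[(qy, qx)] = p
                    heapq.heappush(heap, (nd, (qy, qx)))
    # Walk the back-pointers from the goal to the seed.
    path = [goal]
    while path[-1] != seed:
        path.append(prev[path[-1]])
    return path[::-1]

cost = [[9, 9, 9, 9, 9],
        [1, 1, 1, 1, 1],   # a cheap "edge" running through the middle row
        [9, 9, 9, 9, 9]]
path = live_wire_path(cost, (1, 0), (1, 4))
```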
Given the (lat, lon) coordinates of a group of n locations on the surface of the earth, find a (lat, lon) point c, and a value of r > 0 such that we maximize the density, d, of locations per square mile, say, in the surface area described and contained by the circle defined by c and r.
At first I thought maybe you could solve this using linear programming. However, density depends on area, which depends on r squared, a quadratic term. So I don't think the problem is amenable to linear programming.
Is there a known method for solving this kind of thing? Suppose you simplify the problem to (x, y) coordinates on the Cartesian plane. Does that make it easier?
You've got two variables, c and r, that you're trying to find so as to maximize the density, which is a function of c and r (and of the locations, which are constant). So maybe a hill-climbing, gradient descent, or simulated annealing approach might work? You can make a pretty good guess for your first value: just use the centroid of the locations. I think the local maximum you reach from there would be a global maximum.
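For the simplified Cartesian version, the objective and the suggested starting point could look like the sketch below. The names density and centroid are just illustrative, and the circle area is the plain planar pi * r^2, which ignores the curvature of the earth.

```python
import math

def density(points, center, radius):
    """Number of points inside the circle divided by the circle's area."""
    cx, cy = center
    inside = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)
    return inside / (math.pi * radius ** 2)

def centroid(points):
    """Mean of the points, used as the first guess for the center c."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
start = centroid(points)
value = density(points, start, 3.0)
```

A hill-climbing loop would then perturb the center and radius and keep changes that increase density. Note that with at least one location inside the circle, the value keeps growing as the radius shrinks, so a practical search would probably need a lower limit on r or on the number of enclosed locations; that limit is an assumption, not something the question states.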
Steps:
Cluster your points using a density-based clustering algorithm (see note 1);
Calculate the density of each cluster;
Recursively (or iteratively) sub-cluster the points in the most dense cluster;
The algorithm has to set the outliers aside and treat them as a cluster of their own. This way, outliers with high density will be kept and outliers with low density will be weeded out.
Keep track of the cluster with the highest density observed so far. Return when you finally reach a cluster made of a single point.
This algorithm will work only when you have clusters like the ones shown below as the recursive exploration will be resulting in similarly shaped clusters:
The algorithm will fail with awkwardly shaped clusters like this: even though the triangles are most densely placed when you calculate the density within the donut shape, they will report a far lower density with respect to the circle centered at [0, 0]:
1. One density based clustering algorithm that will work for you is DBSCAN.
I am interested in finding the diameter of two point sets in 128 dimensions. The first has 10000 points and the second 1000000. For that reason I would like to do something better than the naive approach, which takes O(n²). The algorithm should be able to handle any number of points and dimensions, but I am currently very interested in these two particular data sets.
I am much more interested in speed than in accuracy. So my idea is to find the (approximate) bounding box of the point set by computing the min and max value per coordinate, which takes O(n*d) time. Then, if I find the diameter of this box, the problem is solved.
In the 3d case, I could find the diagonal of one face, since I know its two edges, and then apply the Pythagorean theorem again with the third edge, which is perpendicular to that face. I am not sure about this, however, and I can't see how to generalize it to d dimensions.
An interesting answer can be found here, but it seems to be specific for 3 dimensions and I want a method for d dimensions.
Interesting paper: "On computing the diameter of a point set in high dimensional Euclidean space". However, implementing the algorithm seems too much for me at this phase.
The classic 2-approximation algorithm for this problem, with running time O(nd), is to choose an arbitrary point and then return the maximum distance to another point. The diameter is no smaller than this value and no larger than twice this value.
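A minimal sketch of this 2-approximation, assuming the points are given as equal-length tuples and taking the first point as the arbitrary one (the function name approx_diameter is made up here):

```python
import math

def approx_diameter(points):
    """Largest distance from one arbitrary point; within a factor 2 of the diameter."""
    first = points[0]  # any point works, the first one is simply convenient
    return max(math.dist(first, p) for p in points)

points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]
estimate = approx_diameter(points)  # true diameter here is 2.0
```

This makes a single pass over the n points with O(d) work per point, which matches the O(nd) running time.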
I would like to add a comment, but not enough reputation for that...
I just want to warn other readers that the "bounding box" solution is very inaccurate. Take for example the Euclidean ball of radius one. This set has diameter two, but its bounding box is [-1, 1]^d, which has diameter twice the square root of d. For d = 128, this is already a very bad approximation.
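To put numbers on this, the sketch below uses the 2d points +e_i and -e_i (the unit vectors and their negatives), which lie on the unit sphere and have the same bounding box [-1, 1]^d. The point set is a made-up test case chosen only for illustration.

```python
import math

d = 128
points = []
for i in range(d):
    for sign in (1.0, -1.0):
        p = [0.0] * d
        p[i] = sign
        points.append(tuple(p))

# Bounding box diagonal from the per-coordinate min and max, O(n*d).
lo = [min(p[i] for p in points) for i in range(d)]
hi = [max(p[i] for p in points) for i in range(d)]
box_diagonal = math.dist(lo, hi)

# Exact diameter by brute force over all pairs (fine for 256 points).
true_diameter = max(math.dist(p, q) for p in points for q in points)
```

Here box_diagonal comes out at 2 * sqrt(128), roughly 22.6, while true_diameter is 2, so the box overestimates by a factor of sqrt(d).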
For a crude estimate, I would stay with David Eisenstat's answer.
There is a precision-based algorithm which performs very well in any dimension, and which is based on computing the dimensions of an axial bounding box.
The idea is that it's possible to find lower and upper bounds on the axis bounding box length function, since its partial derivatives are bounded and depend on the angle between the axes.
The limit of the local maxima derivatives between two axes in 2d space can be computed as:
sin(a/2)*(1 + tan(a/2))
That means that, for example, for 90 degrees between axes the bound is sqrt(2), about 1.41.
The expression reduces to a/2 as a => 0, so for small angles the upper bound is proportional to the angle.
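Both claims about the formula are easy to check numerically. The helper name angle_bound is just for this check.

```python
import math

def angle_bound(a):
    """The expression sin(a/2) * (1 + tan(a/2)) from the answer."""
    return math.sin(a / 2) * (1 + math.tan(a / 2))

at_right_angle = angle_bound(math.pi / 2)          # should be close to sqrt(2)
small_angle_ratio = angle_bound(1e-3) / (1e-3 / 2)  # should be close to 1
```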
For a multidimensional case the formula varies slightly, but still it's easy to compute.
So, the search for local minima converges in logarithmic time.
The good news is that we can run the search of such local maxima in parallel.
Also, we can filter out both the regions of the search, based on the best result achieved so far, and the points themselves, when they are below the lower limit of the search in the worst region.
The worst case of the algorithm is where all of the points are placed on the surface of a sphere.
This can be further improved: when we detect a local search which operates on just a few points, we switch to brute force for this particular axis. It works fast, because we need only the points which are subject to that particular local search, which can be determined as the points actually bounded by two opposite spherical cones of a particular angle sharing the same axis.
It's hard to figure out the big O notation, because it depends on desired precision and the distribution of points (bad when most of the points are on a sphere's surface).
The algorithm I use is as follows:
1. Set the initial angle a = pi/2.
2. Take one axis for each dimension. The angle and the axes form the initial 'bucket'.
3. For each axis, compute the span on that axis by projecting all the points onto the axis and finding the min and max of the coordinates on the axis.
4. Compute the upper and lower bounds of the diameter of interest. They are based on the formula sin(a/2)*(1 + tan(a/2)), multiplied by an asymmetry coefficient computed from the lengths of the current axis projections.
5. For the next step, kill all of the points which fall under the lower bound in every dimension at the same time.
6. For each axis, if the number of points above the upper bound is less than some reasonable amount (determined experimentally), then compute the diameter by brute force (N^2) on the set of points in question, adjust the lower bound, and kill the axis for the next step.
7. For the next step, kill all of the axes which have all of their points under the lower bound.
8. If the precision is satisfactory, (upper bound - lower bound) < epsilon, then return the upper bound as the result.
9. For each surviving axis, there is a virtual cone on that axis (actually, two opposite cones), which covers some area on a virtual sphere which encloses a face of the cube. If I'm not mistaken, its angle would be a * sqrt(2). Set the new angle to a / sqrt(2). Create a whole bucket of new axes (2 * number of dimensions), so that the new cone areas cover the initial cone area. This is the hard part for me, as I don't have enough imagination for the n>3-dimensional case.
10. Continue from step (3).
You can parallelize the procedure, synchronizing the limits computed so far for the points from (5) through (7).
I'm going to summarize the algorithm proposed by Timothy Shields.
1. Pick a random point x.
2. Pick the point y furthest from x.
3. If not done, let x = y, and go to step 2.
The more times you repeat, the more accurate the result will be... ??
EDIT: actually this algorithm is not very good. Think about a 2D rectangle with vertices ABCD. There are two maxima: between AC and BD, which are separated by a sizable valley. This algorithm will get stuck at one or the other 50/50. If AC is slightly larger than BD, you'll be getting the wrong answer 50% of the time no matter how many times you iterate. Other regular polygons have the same issue, and in higher dimensions it is even worse.
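The failure mode is easy to reproduce. The sketch below uses a made-up quadrilateral that is almost a rectangle, with diagonal AC slightly longer than BD, and a hypothetical iterate_farthest function that stops once the distance no longer improves.

```python
import math

def iterate_farthest(points, start, max_rounds=100):
    """Repeatedly jump to the farthest point; return the best distance seen."""
    x = start
    best = 0.0
    for _ in range(max_rounds):
        y = max(points, key=lambda p: math.dist(x, p))
        d = math.dist(x, y)
        if d <= best:
            break  # no improvement: we are bouncing between the same two points
        best, x = d, y
    return best

A, B, C, D = (0.0, 0.0), (4.0, 0.0), (4.2, 3.0), (0.0, 3.0)
corners = [A, B, C, D]
from_a = iterate_farthest(corners, A)  # finds AC, the true diameter
from_b = iterate_farthest(corners, B)  # gets stuck on BD, which is shorter
```

Starting from A or C the loop reports |AC|, about 5.16, but starting from B or D it bounces between B and D and reports 5.0, no matter how many rounds are allowed.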
I have an edge map of a scene and would like to extract the edge which best separates the sky and terrain. This seems to be well framed as a graph traversal problem. However, popular search algorithms such as A* are reliant upon the use of a starting and ending point (other than the first and last column respectively). Are there any algorithms for graph search which do not require these parameters? I would also like to maximize some global features of the extracted edge such as smoothness.
Note: speed is a significant issue, this needs to be done in real time.
Computer vision researchers have attacked this type of problem with minimum cuts. Wikipedia has a whole article about graph cuts in computer vision. I'll sketch here the algorithm proposed by Greig, Porteous, and Seheult, who were the first to make this connection.
Suppose that we have a function from pixel colors to log likelihoods of how likely that pixel is to be sky versus terrain. Prepare a graph with a source vertex, a sink vertex, and a vertex for each pixel. Connect the source to each pixel with capacity equal to the log likelihood of that pixel being sky. Connect each pixel to the sink with capacity equal to the log likelihood of that pixel being terrain. For each pair of adjacent pixels, connect them with capacity equal to the log likelihood of them having different classifications. Compute a minimum cut. All of the vertices on the source side of the cut are classified as sky, and all of the vertices on the sink side of the cut are classified as terrain.
Alternatively, if the terrain is known to be at the bottom of the image and the sky is known to be at the top, connect the source instead to each of the top pixels and connect the bottom pixels to the sink, with infinite capacity. Then we can dispense with the log likelihoods for classifying pixels based on color, leaving the edge capacities to vary with the similarity of adjacent pixel colors.
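Here is a small sketch of the first construction on a single image column of four pixels. The capacities are made-up non-negative integer scores standing in for the likelihood-based values, and the minimum cut is found with a plain breadth-first augmenting path search (Edmonds-Karp), which is only one of several ways to compute it and is not tuned for real-time use. The function name source_side_of_min_cut is hypothetical.

```python
from collections import defaultdict, deque

def source_side_of_min_cut(capacity, source, sink):
    """Run max flow, then return the vertices still reachable from the source."""
    residual = defaultdict(lambda: defaultdict(int))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0  # make sure the reverse edge exists
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # no path left: these vertices are the source side
        # Find the bottleneck along the path, then push that much flow.
        flow = float("inf")
        v = sink
        while parent[v] is not None:
            flow = min(flow, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= flow
            residual[v][u] += flow
            v = u

sky = [9, 7, 2, 1]      # assumed score for each pixel being sky, top to bottom
terrain = [1, 2, 8, 9]  # assumed score for each pixel being terrain
smooth = 3              # assumed capacity between adjacent pixels
capacity = {}
for i in range(4):
    capacity[("s", i)] = sky[i]
    capacity[(i, "t")] = terrain[i]
for i in range(3):
    capacity[(i, i + 1)] = smooth
    capacity[(i + 1, i)] = smooth

side = source_side_of_min_cut(capacity, "s", "t")
sky_pixels = sorted(p for p in side if p != "s")
```

With these numbers the two top pixels end up on the source side, so the sky/terrain boundary falls between pixels 1 and 2.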
If I have a set of quadtrees (say on a Hilbert curve), what would be a good way to approach finding the optimum (or good enough) set of ranges at a particular depth?
For example, if I'm searching for points between the bounding box 0,0 and 1,3 then I can apply the following naive ranges:
Depth 1 - Range 0,0-1,0 (~33% search space)
Depth 2 - Ranges 0,0-1,0 and 1,0-0,1 (~13% search space)
Depth 3 - Ranges 0,0-1,0 and 1,3-0,3 (~9.8% search space)
Clearly depth 3 for this search is optimal but the reduced search space has only dropped a small amount compared to the drop from depth 1 to depth 2.
At (much) bigger depths, or with searches that cross boundaries, is there a good algorithm for estimating the difference between various depths, or ideally for picking a mix of ranges at different depths that covers the bounding box?
I'm not interested in polygons specifically but bonus points if there is a solution that works for polygons.
Although your question needs more details, here are some answers:
You can estimate the depth of a quad by log4(N) (take the base-4 logarithm of the number of elements N).
Depending on the type of quadtree, you can limit the maximum depth to that number.
The order of inserting the elements influences the structure of the quad.
Pre-sorting the data before inserting can improve the quad structure a bit. The type of pre-sort depends on the quad. If you use a Hilbert-backed quad, you could pre-sort the data by Hilbert index.
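Pre-sorting by Hilbert index could look like the sketch below. It uses the well-known bit-by-bit conversion from (x, y) to a distance along the curve and assumes integer coordinates on an n by n grid where n is a power of two. The name hilbert_index is made up for this example.

```python
def hilbert_index(n, x, y):
    """Distance of cell (x, y) along the Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

points = [(3, 0), (0, 1), (2, 2), (0, 0), (1, 1)]
presorted = sorted(points, key=lambda p: hilbert_index(4, p[0], p[1]))
```

Points that are close along the curve tend to be close in space, so inserting them in this order keeps neighboring inserts in nearby parts of the structure.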
When you use a Hilbert curve, it's a spatial index, not a quadtree. A quadtree also has some limitations, for example on how many points you can store. So, on a Hilbert curve it's better to use small tiles so that the bounding box fits nicely.
I started toying with this idea some years ago when I wrote my university papers. The idea is this - the perfect color quantization algorithm would take an arbitrary true-color picture and reduce the number of colors to the minimum possible, while maintaining that the new image is completely indistinguishable from the original with a naked eye.
Basically the setting is simple - you have a set of points in the RGB cube (from 0 to 255 integer values on each axis). You have to replace each of these points with another point in such a way that:
The total number of points after the operation is as small as possible;
The distance from an original point to the replaced point is no larger than some predefined constants R, G and B on each of the red, green and blue axes (these are taken from the sensitivity of the human eye and are in general configurable by the user).
I know that there are many color quantization algorithms out there that work with different efficiencies, but they are mostly targeted at reducing colors to a certain number, not "the minimum possible without violating these constraints".
Also, I would like the algorithm to produce really absolute minimum possible, not just something that is "pretty close to minimum".
Is this possible without a time-consuming full search of all combinations (infeasible for any real picture)? My instincts tell me that this is an NP-complete problem or worse, but I cannot prove it.
Bonus setting: Change the limit from constants R,G,B to a function F(Rsource, Gsource, Bsource, Rtarget, Gtarget, Btarget) which returns TRUE if the mapping would be OK, and FALSE if it was out of range.
Given your definitions the structure of the picture (i.e. how the pixels are organized) does not matter at all, the only thing that matters is the subset of RGB triplets that appear at least once in the picture as a pixel value. Let that subset be S. You want to find then another subset of RGB triplets E (the encoding) such that for every s in S there exists a counterpart e in E such that diff(s,e) <= threshold where threshold is the limit you impose on the acceptable difference and diff(...) reduces the triplet distance into a single number.
Additionally, you want to find E that is minimal in size i.e. for any E' s.t. |E'|<|E|, there is at least one (s,e) pair violating the difference constraint.
This particular problem cannot be given an asymptotic complexity assessment because it has only a finite set of instances. It can be solved in constant time (theoretically) by precalculating the minimum set E for every subset S. There is a huge number of subsets S, but still only a finite number, so the problem cannot be classified as, e.g., an NP-complete optimization problem. The actual run-time of your algorithm for this particular problem hence depends completely on the amount of preprocessing you are willing to tolerate. In order to get an asymptotic complexity assessment you need to generalize the problem first so that the set of problem instances is strictly infinite.
Optimal quantization is an NP-hard problem (Son H. Nguyen, Andrzej Skowron — Quantization Of Real Value Attributes, 1995).
Predefined maximum distance doesn't make things easier when you have clusters of points which are larger than your sphere, but distances between points are less than sphere radius — then you have a lot of combinations (as each choice of placement of a sphere may displace all other spheres). And unfortunately this is going to happen quite often on real images with gradients (it's not unusual for entire histogram to be one huge cluster).
You can modify many quantization algorithms to pick the number of clusters so that a certain quality is satisfied, e.g. in Median Cut and Linde–Buzo–Gray you can simply stop subdividing space when you reach your quality limit. There is no guarantee that the result is a global minimum (that is NP-hard), but in LBG you'll at least know you're at a local minimum.
Here's an idea how I'd go about this - unfortunately this will probably need a lot of memory and be very slow:
You create a 256x256x256 cubic data structure that contains a counter and a "neighbors" list of colors. For every unique color that you find in your image you increase the counter of each cell which is within the radius of a sphere around that color. The radius of the sphere is the maximum acceptable distance that you have defined originally. You also add the color to the neighbors list of each cell.
Once you have added all unique colors you loop through the cube and find the cell with the maximum counter value. Add this color to your result list. Now loop through your cube again and remove this color and all colors that are in the neighbors list of that color from all cells and decrease each cell's counter whenever you remove a color. Then repeat searching for the maximum counter and removing until no more colors are in the cube.
Alternatively, one could also add the same color multiple times if it occurs more often in the image. Not sure if that would improve the visual result.
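A scaled-down sketch of this greedy idea is shown here. To keep memory small it only considers the image's own unique colors as candidate palette entries instead of every cell of the 256x256x256 cube, which is a simplification of what is described above, and it uses Euclidean distance in RGB. The name greedy_palette is made up for this example.

```python
import math

def greedy_palette(colors, radius):
    """Repeatedly pick the candidate that covers the most uncovered colors."""
    remaining = set(colors)
    candidates = sorted(remaining)  # simplification: candidates are the input colors
    palette = []
    while remaining:
        # The "counter" of a candidate: uncovered colors within the radius.
        best = max(
            candidates,
            key=lambda c: sum(1 for s in remaining if math.dist(c, s) <= radius),
        )
        covered = {s for s in remaining if math.dist(best, s) <= radius}
        palette.append(best)
        remaining -= covered  # these colors are now represented by `best`
    return palette

colors = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (100, 100, 100)]
palette = greedy_palette(colors, radius=2)
```

Being greedy, this gives a small palette but not necessarily the absolute minimum asked for in the question.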