How to compute area of each blob individually [closed] - image

I am attaching an image for which I need to count the number of blobs and compute the area of each blob separately. I am using Matlab for this.
The black regions have index value '0' and the white background has index value '1'.
Thank you in advance. It would be great if someone could help me with this.

The main problem is to find the number of blobs. For that, I'd rather use k-means clustering. It would take too long to explain what k-means clustering does and how it works, so I'll jump straight to the point: the k-means algorithm groups n points into k groups (clusters). The result is a partitioned space: a given point cannot be in two clusters at the same time, and a cluster is identified by its centroid (the mean point).
So let's import the image and find all x and y coordinates for black points: these are indeed the points we want to cluster.
I = imread('image.jpg');              % load the image
BW = im2bw(I, graythresh(I));         % binarize with Otsu's threshold
[x, y] = find(BW == 0);               % coordinates of all black pixels
Now it's time to trigger the k-Means algorithm in order to group such points. Since we don't know k, that is the number of blobs, we can perform some sort of bruteforce approach. We select some candidate values for k and apply k-Means clustering to all of these values. Later on, we select the best k by means of the Elbow Method: we plot the so-called Within Cluster Sum of Squares (that is the sum of all the distances between points and their respective centroid) and select the k value such that adding another cluster doesn't give much better modeling of the data.
for k = 1:10
    [idx{k}, C{k}, sumd{k}] = kmeans([x y], k, 'Replicates', 20);
end
sumd = cellfun(@(x) sum(x(:)), sumd);   % total within-cluster sum of squares for each k
The code above performs k-means for k in the range [1, 10]. Since in standard k-means the initial centroids are randomly selected amongst the points in our dataset (i.e. the black points), we repeat k-means 20 times for each value of k and the algorithm automatically returns the best result amongst the 20 repetitions. The results are: idx, a vector with one entry per black point whose j-th element is the ID of the centroid assigned to the j-th black point; C, the centroid coordinates; and sumd, the within-cluster sums of squares.
We then plot the sum of squares vs the k candidates:
figure(6);
plot(1:10,sumd,'*-');
and we obtain something like:
According to the Elbow Method explained above, 6 is the optimal number of clusters: indeed after 6 the plot tends to be rather horizontal.
So from the arrays above, we select the 6th element:
best_k=6;
best_idx=idx{best_k};
best_C=C{best_k};
and the returned clusters are
gscatter(x,y,best_idx); hold on;
plot(best_C(:,1),best_C(:,2),'xk','LineWidth',2);
Note: the image appears rotated because plot() handles coordinates differently from imshow(). Also, the black crosses mark the centroid of each cluster.
And finally by counting the number of points per cluster, we gather the area of the cluster itself (i.e. the blob):
for m = 1:best_k
    Area(m) = sum(best_idx == m);   % number of black pixels assigned to cluster m
end
Area =
1619 46 141 104 584 765
Obviously the i-th item in Area is the area of the i-th cluster, as reported by the legend.
Further readings
In this Wikipedia link you can find some more details regarding the determination of the number of clusters (the "best k") in the k-means algorithm. Amongst these methods you can find the Elbow Method as well. As @rayryeng correctly pointed out in the comments below, the Elbow plot is just a heuristic: in some datasets you cannot clearly spot a "knee" in the curve... we've been lucky, though!
Last but not least, if you want to know more about the k-means algorithm, please have a look at @rayryeng's answer linked below in the comments: it's a brilliantly detailed answer that not only describes the algorithm itself, but also covers the repetitions I've set in the code, the random selection of the initial centroids and all the aspects I've skipped in order to avoid an endless answer.

Related

How do I implement a genetic algorithm for placing 2 or more kinds of elements with different (repeating) distances in a grid?

Please forgive me if I do not explain my question clearly in title.
Here may I show you two pictures as my example:
My question is as follows: I have 2 or more different objects (in the pictures, two objects: circle and cross), each of which is placed repeatedly into a grid with a fixed row/column spacing (in the pictures, the circle has a spacing of 4 and the cross has a spacing of 2).
In the first picture, each of the two objects is repeated correctly without any interruptions (here an interruption means one object occupying another one's position), but the arrangement in the first picture is non-uniformly distributed; on the contrary, in the second picture the two objects may have interruptions (the circle object occupies cross objects' positions) but the picture is uniformly distributed.
My target is to get the placement as uniform as possible (the objects are still placed with fixed spacings but some occupations may be allowed). Is there a potential algorithm for this question? Or are there any similar questions?
I have some preliminary thoughts on this problem: 1. occupation may be related to the least common multiple; 2. how do I define "uniformly distributed" mathematically? Maybe there's no genetic solution, but is there a solution for some special cases (for example, 3 objects with spacings that are multiples of 2, or multiples of 3)?
Uniformity can be measured as the sum of squared inverse distances (or of distances to the equilibrium distances). Because of the squared relation, any single piece that approaches the others incurs a big fitness penalty, so the system will not tolerate pieces that are too close and will prefer a better distribution.
If you use simple distances rather than squared (or higher-order) distances, the system starts tolerating even overlapping pieces.
If you want to compute uniformity manually, compute the standard deviation of the distances. You'd call it perfect with a distance of 1 and a deviation of 0, but a small enough deviation is also acceptable.
I tested this only on a problem of fitting 106 circles in a square that is 10x the size of a circle.
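As a rough Python illustration of the two measures above (the function names are mine, "distances" is read here as nearest-neighbour distances, and points are assumed to be (x, y) tuples):
import math
from itertools import combinations

def repulsion_penalty(points):
    # Fitness penalty: sum of squared inverse pairwise distances, so pieces
    # that get very close to each other dominate the penalty.
    return sum(1.0 / max(math.dist(p, q), 1e-9) ** 2
               for p, q in combinations(points, 2))

def uniformity(points):
    # Manual check: standard deviation of each piece's nearest-neighbour
    # distance; 0 means perfectly even spacing.
    nearest = [min(math.dist(p, q) for q in points if q != p) for p in points]
    mean = sum(nearest) / len(nearest)
    return math.sqrt(sum((d - mean) ** 2 for d in nearest) / len(nearest))
A genetic algorithm would then minimise repulsion_penalty (or the deviation) as its fitness.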

Most efficient way to select point with the most surrounding points

N.B: there's a major edit at the bottom of the question - check it out
Question
Say I have a set of points:
I want to find the point with the most points surrounding it, either within a given radius of it (i.e. a circle) or within a given square around it, in 2 dimensions. I'll refer to this as the densest point function.
For the diagrams in this question, I'll represent the surrounding region as circles. In the image above, the middle point's surrounding region is shown in green. This middle point has the most surrounding points of all the points and would be returned by the densest point function.
What I've tried
A viable way to solve this problem would be to use a range searching solution; this answer explains it further and gives its worst-case query time. Using this, I could get the number of points surrounding each point and choose the point with the largest surrounding-point count.
However, if the points were extremely densely packed (in the order of a million), as such:
then each of these million points would need to have a range search performed on it. A comparable worst-case query time, in terms of the number of points k returned in the range, holds for the following point tree types:
kd-trees of two dimensions (which are actually slightly worse, at O(sqrt(n) + k)),
2d-range trees,
Quadtrees.
So, for a group of points in which every point lies within the query radius of every other, k is on the order of the group size for each query. For a million such points this yields over a trillion operations!
Any ideas on a more efficient, precise way of achieving this, so that I could find the point with the most surrounding points for a group of points in a reasonable time (preferably O(n log n) or less)?
EDIT
Turns out that the method above is correct! I just need help implementing it.
(Semi-)Solution
If I use a 2d-range tree:
A range reporting query costs O(log(n)^d + k), for k returned points,
For a range tree with fractional cascading (also known as a layered range tree) the complexity is O(log(n)^(d-1) + k),
For 2 dimensions, that is O(log(n) + k),
Furthermore, if I perform a range counting query (i.e., I do not report each point), then it costs O(log(n)).
I'd perform this on every point, yielding the O(n log n) complexity I desired!
Problem
However, I cannot figure out how to write the code for a counting query on a 2d layered range tree.
I've found a great resource (from page 113 onwards) about range trees, including 2d-range-tree pseudocode. But I can't figure out how to introduce fractional cascading, nor how to correctly implement the counting query so that it runs in O(log n).
I've also found two range tree implementations here and here in Java, and one in C++ here, although I'm not sure it uses fractional cascading, as it states above the countInRange method that
It returns the number of such points in worst case O(log(n)^d) time. It can also return the points that are in the rectangle in worst case O(log(n)^d + k) time where k is the number of points that lie in the rectangle.
which suggests to me it does not apply fractional cascading.
Refined question
To answer the question above, therefore, all I need to know is whether there are any libraries of 2d-range trees with fractional cascading that offer a range counting query of O(log n) complexity, so I don't go reinventing any wheels; or can you help me write/modify the resources above to perform a query of that complexity?
I'm also not complaining if you can provide any other method of achieving an O(log n) range counting query over 2d points!
I suggest using a plane sweep algorithm. This allows one-dimensional range queries instead of 2-d ones (which is more efficient, simpler, and in the case of a square neighborhood does not require fractional cascading):
Sort points by Y-coordinate to array S.
Advance 3 pointers through array S: one (C) for the currently inspected (center) point; another one, A (a little bit ahead), for the nearest point at distance > R below C; and the last one, B (a little bit behind), for the farthest point at distance < R above it.
Insert the points passed by pointer A into an order statistic tree (ordered by coordinate X) and remove the points passed by pointer B from this tree. Use this tree to find points at distance R to the left/right of C and use the difference of these points' positions in the tree to get the number of points in the square area around C.
Use results of previous step to select "most surrounded" point.
This algorithm could be optimized if you rotate the points (or just exchange the X and Y coordinates) so that the width of the occupied area is not larger than its height. You could also cut the points into vertical slices (with R-sized overlap) and process the slices separately in case there are too many elements for the tree to fit in the CPU cache (which is unlikely for only 1 million points). This algorithm (optimized or not) has time complexity O(n log n). A sketch of the square-neighborhood version follows.
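A minimal Python sketch of the square-neighborhood sweep, assuming points is a list of (x, y) tuples; a plain sorted list with bisect stands in for the order statistic tree, so insertions here are O(n) rather than O(log n):
import bisect

def most_surrounded_square(points, R):
    # Sweep over y while keeping a sorted list of the x-coordinates of all
    # points whose y lies within R of the current center point.
    pts = sorted(points, key=lambda p: p[1])            # step 1: sort by y
    xs = []                                             # active x-coordinates
    lo = hi = 0                                         # window bounds into pts
    best, best_count = None, -1
    for cx, cy in pts:
        while hi < len(pts) and pts[hi][1] <= cy + R:   # take in points ahead of C
            bisect.insort(xs, pts[hi][0]); hi += 1
        while pts[lo][1] < cy - R:                      # drop points left behind
            xs.pop(bisect.bisect_left(xs, pts[lo][0])); lo += 1
        # points whose x lies within R of C (the count includes C itself)
        count = bisect.bisect_right(xs, cx + R) - bisect.bisect_left(xs, cx - R)
        if count > best_count:
            best, best_count = (cx, cy), count
    return best, best_count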
For a circular neighborhood (if R is not too large and the points are evenly distributed) you could approximate the circle with several rectangles:
In this case step 2 of the algorithm should use more pointers to allow insertion/removal to/from several trees. And in step 3 you should do a linear search near points at the proper distance (<= R) to distinguish points inside the circle from points outside it.
Another way to deal with a circular neighborhood is to approximate the circle with rectangles of equal height (but here the circle should be split into more pieces). This results in a much simpler algorithm (where sorted arrays are used instead of order statistic trees):
Cut area occupied by points into horizontal slices, sort slices by Y, then sort points inside slices by X.
For each point in each slice, assume it to be a "center" point and do step 3.
For each nearby slice use binary search to find points with Euclidean distance close to R, then use linear search to tell "inside" points from "outside" ones. Stop linear search where the slice is completely inside the circle, and count remaining points by difference of positions in the array.
Use results of previous step to select "most surrounded" point.
This algorithm allows optimizations mentioned earlier as well as fractional cascading.
I would start by creating something like a k-d tree (https://en.wikipedia.org/wiki/K-d_tree), with the points at the leaves and each node holding information about its descendants. At each node I would keep a count of the number of descendants and a bounding box enclosing those descendants.
Now for each point I would recursively search the tree. At each node I visit, either all of the bounding box is within R of the current point, all of the bounding box is more than R away from the current point, or some of it is inside R and some outside. In the first case I can use the count of descendants of the current node to increase the count of points within R of the current point and return up one level of the recursion. In the second case I can simply return up one level without incrementing anything. Only in the intermediate case do I need to continue recursing down the tree.
So I can work out, for each point, the number of neighbours within R without checking every other point, and pick the point with the highest count.
If the points are spread out evenly then I think you will end up constructing a k-d tree whose lower levels are close to a regular grid, and if that grid is of size A x A then in the worst case R is large enough that its boundary is a circle intersecting O(A) low-level cells, so with O(n) points I think you could expect this to cost about O(n * sqrt(n)). A sketch of this approach follows.
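Here is a rough Python sketch of that idea, assuming points are (x, y) tuples; the node layout (descendant count plus bounding box at every node, single points at the leaves) follows the description above, but the class and function names are mine:
import math

class Node:
    def __init__(self, points, depth=0):
        self.count = len(points)                         # number of descendants
        xs = [p[0] for p in points]; ys = [p[1] for p in points]
        self.box = (min(xs), min(ys), max(xs), max(ys))  # bounding box
        if len(points) == 1:
            self.point, self.left, self.right = points[0], None, None
        else:
            axis = depth % 2                             # alternate split axis
            points = sorted(points, key=lambda p: p[axis])
            mid = len(points) // 2
            self.point = None
            self.left = Node(points[:mid], depth + 1)
            self.right = Node(points[mid:], depth + 1)

def count_within(node, c, R):
    # Count descendants of node within distance R of the centre c.
    xmin, ymin, xmax, ymax = node.box
    dx_far = max(abs(c[0] - xmin), abs(c[0] - xmax))
    dy_far = max(abs(c[1] - ymin), abs(c[1] - ymax))
    if dx_far ** 2 + dy_far ** 2 <= R ** 2:
        return node.count                       # box entirely inside the circle
    dx_near = max(xmin - c[0], 0, c[0] - xmax)
    dy_near = max(ymin - c[1], 0, c[1] - ymax)
    if dx_near ** 2 + dy_near ** 2 > R ** 2:
        return 0                                # box entirely outside the circle
    if node.point is not None:                  # leaf: test the point itself
        return 1 if math.dist(node.point, c) <= R else 0
    return count_within(node.left, c, R) + count_within(node.right, c, R)

def densest_point(points, R):
    root = Node(points)
    return max(points, key=lambda p: count_within(root, p, R))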
You can speed up whatever algorithm you use by preprocessing your data in O(n) time to estimate the number of neighbouring points.
For a circle of radius R, create a grid whose cells have dimension R in both the x- and y-directions. For each point, determine to which cell it belongs. For a given cell c this test is easy:
c.x<=p.x && p.x<=c.x+R && c.y<=p.y && p.y<=c.y+R
(You may want to think deeply about whether a closed or half-open interval is correct.)
If you have relatively dense/homogeneous coverage, then you can use an array to store the values. If coverage is sparse/heterogeneous, you may wish to use a hashmap.
Now, consider a point on the grid. The extremal locations of a point within a cell are as indicated:
Points at the corners of the cell can only be neighbours with points in four cells. Points along an edge can be neighbours with points in six cells. Points not on an edge are neighbours with points in 7-9 cells. Since it's rare for a point to fall exactly on a corner or edge, we assume that any point in the focal cell is neighbours with the points in all 8 surrounding cells.
So, if a point p is in a cell (x,y), N[p] identifies the number of neighbours of p within radius R, and Np[y][x] denotes the number of points in cell (x,y), then N[p] is given by:
N[p] = Np[y][x]+
Np[y][x-1]+
Np[y-1][x-1]+
Np[y-1][x]+
Np[y-1][x+1]+
Np[y][x+1]+
Np[y+1][x+1]+
Np[y+1][x]+
Np[y+1][x-1]
Once we have the number of neighbours estimated for each point, we can heapify that data structure into a maxheap in O(n) time (with, e.g. make_heap). The structure is now a priority-queue and we can pull points off in O(log n) time per query ordered by their estimated number of neighbours.
Do this for the first point and use an O(log n + k) circle search (or some more clever algorithm) to determine the actual number of neighbours that point has. Make a note of this point in a variable best_found and update its N[p] value.
Peek at the top of the heap. If the estimated number of neighbours is less than N[best_found] then we are done. Otherwise, repeat the above operation. A sketch of this estimate-then-verify loop follows.
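A small Python sketch of the estimate-then-verify loop, assuming points are (x, y) tuples; the exact neighbour count here is a brute-force scan standing in for the O(log n + k) circle search mentioned above:
from collections import defaultdict
import heapq, math

def neighbour_estimates(points, R):
    # Bin points into an R x R grid and estimate each point's neighbour count
    # as the population of its own cell plus the 8 surrounding cells.
    cell_of = {p: (int(p[0] // R), int(p[1] // R)) for p in points}
    pop = defaultdict(int)
    for c in cell_of.values():
        pop[c] += 1
    return {p: sum(pop[(cx + dx, cy + dy)]
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1))
            for p, (cx, cy) in cell_of.items()}

def densest_point_grid(points, R):
    est = neighbour_estimates(points, R)
    heap = [(-e, p) for p, e in est.items()]          # max-heap via negation
    heapq.heapify(heap)
    best, best_count = None, -1
    # Pop candidates until the best exact count beats every remaining estimate.
    while heap and -heap[0][0] > best_count:
        _, p = heapq.heappop(heap)
        exact = sum(math.dist(p, q) <= R for q in points)   # expensive check
        if exact > best_count:
            best, best_count = p, exact
    return best, best_count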
To improve estimates you could use a finer grid, like so:
along with some clever sliding window techniques to reduce the amount of processing required (see, for instance, this answer for rectangular cases - for circular windows you should probably use a collection of FIFO queues). To make the estimates more robust you can randomize the origin of the grid.
Considering again the example you posed:
It's clear that this heuristic has the potential to save considerable time: with the above grid, only a single expensive check would need to be performed in order to prove that the middle point has the most neighbours. Again, a higher-resolution grid will improve the estimates and decrease the number of expensive checks which need to be made.
You could, and should, use a similar bounding technique in conjunction with mcdowella's answer; however, that answer does not provide a good place to start looking, so it is possible to spend a lot of time exploring low-value points.

Clustering 2d integer coordinates into sets of at most N points

I have a number of points on a relatively small 2-dimensional grid, which wraps around in both dimensions. The coordinates can only be integers. I need to divide them into sets of at most N points that are close together, where N will be quite a small cut-off, I suspect 10 at most.
I'm designing an AI for a game, and I'm 99% certain using minimax on all the game pieces will give me a usable lookahead of about 1 move, if that. However distant game pieces should be unable to affect each other until we're looking ahead by a large number of moves, so I want to partition the game into a number of sub-games of N pieces at a time. However, I need to ensure I select a reasonable N pieces at a time, i.e. ones that are close together.
I don't care whether outliers are left on their own or lumped in with their least-distant cluster. Breaking up natural clusters larger than N is inevitable, and only needs to be sort-of reasonable. Because this is used in a game AI with limited response time, I'm looking for as fast an algorithm as possible, and willing to trade off accuracy for performance.
Does anyone have any suggestions for algorithms to look at adapting? K-means and its relatives don't seem appropriate, as I don't know how many clusters I want to find, but I do have a bound on how large the clusters may be. I've seen some evidence that approximating a solution by snapping points to a grid can help some clustering algorithms, so I'm hoping the integer coordinates make the problem easier. Hierarchical distance-based clustering will be easy to adapt to the wrap-around coordinates, as I just plug in a different distance function, and it is also relatively easy to cap the size of the clusters. Are there any other ideas I should be looking at?
I'm more interested in algorithms than libraries, though libraries with good documentation of how they work would be welcome.
EDIT: I originally asked this question when I was working on an entry for the Fall 2011 AI Challenge, which I sadly never got finished. The page I linked to has a reasonably short reasonably high-level description of the game.
The two key points are:
Each player has a potentially large number of ants
Every ant is given orders every turn, moving 1 square either north, south, east or west; this means the branching factor of the game is O(4^ants).
In the contest there were also strict time constraints on each bot's turn. I had thought to approach the game using minimax (the turns are really simultaneous, but as a heuristic I thought it would be okay), but I feared there wouldn't be time to look ahead very many moves if I considered the whole game at once. But as each ant moves only one square per turn, two ants that are N spaces apart by the shortest route cannot possibly interfere with one another until we're looking ahead N/2 moves.
So the solution I was searching for was a good way to pick smaller groups of ants at a time and minimax each group separately. I had hoped this would allow me to search deeper into the move-tree without losing much accuracy. But obviously there's no point using a very expensive clustering algorithm as a time-saving heuristic!
I'm still interested in the answer to this question, though more in what I can learn from the techniques than for this particular contest, since it's over! Thanks for all the answers so far.
The median-cut algorithm is very simple to implement in 2D and would work well here. Your outliers would end up as groups of 1 which you could discard or whatever.
Further explanation requested:
Median cut is a quantization algorithm, but all quantization algorithms are special-case clustering algorithms. In this case the algorithm is extremely simple: find the smallest bounding box containing all the points, split the box along its longest side (and shrink it to fit the points), and repeat until the target number of boxes is achieved.
A more detailed description and coded example
Wiki on color quantization has some good visuals and links
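A minimal Python sketch of 2-D median cut, adapted to this question by splitting until every group holds at most max_size points rather than until a fixed number of boxes is reached (wrap-around coordinates are ignored for simplicity, and the function name is mine):
def median_cut(points, max_size):
    # Recursively split the bounding box along its longest side until every
    # group contains at most max_size points.
    boxes, done = [list(points)], []
    while boxes:
        box = boxes.pop()
        if len(box) <= max_size:
            done.append(box)
            continue
        xs = [p[0] for p in box]; ys = [p[1] for p in box]
        axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
        box.sort(key=lambda p: p[axis])
        mid = len(box) // 2                    # split at the median point
        boxes.append(box[:mid])
        boxes.append(box[mid:])
    return done
For the question's setting, median_cut(pieces, 10) would give groups of at most 10 pieces, with outliers simply ending up in very small groups.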
Since you are writing a game where (I assume) only a constant number of pieces move between each clustering, you can take advantage of an online algorithm to get constant update times.
The property of not locking yourself into a fixed number of clusters is called non-stationarity, I believe.
This paper seems to have a good algorithm with both of the above properties: Improving the Robustness of 'Online Agglomerative Clustering Method' Based on Kernel-Induce Distance Measures (you might be able to find it elsewhere as well).
Here is a nice video showing the algorithm at work:
Construct a graph G=(V, E) over your grid, and partition it.
Since you are interested in algorithms rather than libraries, here is a recent paper:
Daniel Delling, Andrew V. Goldberg, Ilya Razenshteyn, and Renato F. Werneck. Graph Partitioning with Natural Cuts. In 25th International Parallel and Distributed Processing Symposium (IPDPS'11). IEEE Computer Society, 2011. [PDF]
From the text:
The goal of the graph partitioning problem is to find a minimum-cost partition P such that the size of each cell is bounded by U.
So you will set U=10.
You can calculate a minimum spanning tree and remove the longest edges, then compute the clusters. Remove another long edge and recompute the clusters. Rinse and repeat until you have clusters of N=10. I believe this algorithm is named single-link k-clustering, and the clusters are similar to Voronoi diagrams:
"The single-link k-clustering algorithm ... is precisely Kruskal's algorithm ... equivalent to finding an MST and deleting the k-1 most expensive edges."
See for example here: https://stats.stackexchange.com/questions/1475/visualization-software-for-clustering
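For reference, a short Python sketch of the single-link k-clustering in that quote (Kruskal's algorithm over the complete graph, then dropping the k-1 heaviest MST edges); it is O(n^2 log n) on n points, and the function names are mine:
import math
from itertools import combinations

def single_link_clusters(points, k):
    # Kruskal's algorithm: grow the MST from the lightest edges upward.
    parent = {p: p for p in points}
    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p
    mst = []
    for a, b in sorted(combinations(points, 2), key=lambda e: math.dist(*e)):
        if find(a) != find(b):
            parent[find(a)] = find(b)
            mst.append((a, b))
    # Keep only the n-k lightest MST edges; the components are the clusters.
    parent = {p: p for p in points}
    for a, b in mst[: len(points) - k]:
        parent[find(a)] = find(b)
    clusters = {}
    for p in points:
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values())
Note this fixes the number of clusters k rather than a maximum cluster size, so for the question's constraint you would still re-split any cluster that comes out larger than N.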
Consider the case where you only want two clusters. If you run k-means, you will get two cluster centres, and the division between the two clusters is a plane orthogonal to the line between those centres. You can find out which cluster a point is in by projecting it onto the line and then comparing its position on the line with a threshold (e.g. take the dot product between the direction of the line and a vector from either of the two cluster centres to the point).
For two clusters, this means that you can adjust the sizes of the clusters by moving the threshold. You can sort the points by their distance along the line connecting the two cluster centres and then move the threshold along the line quite easily, trading off the inequality of the split against how neat the clusters are.
You probably don't have k=2, but you can run this hierarchically, by dividing into two clusters and then sub-dividing the clusters.
(After comment)
I'm not good with pictures, but here is some relevant algebra.
With k-means we divide points according to their distance from cluster centres, so for a point Xi and two centres Ai and Bi we might be interested in
SUM_i (Xi - Ai)^2 - SUM_i(Xi - Bi)^2
This is SUM_i Ai^2 - SUM_i Bi^2 + 2 SUM_i (Bi - Ai)Xi
So a point gets assigned to one cluster or the other depending on the sign of K + 2(B - A).X - a constant plus the dot product between the vector to the point and the vector joining the two cluster centres. In two dimensions, the dividing line between the points that end up in one cluster and the points that end up in the other is a line perpendicular to the line between the two cluster centres. What I am suggesting is that, in order to control the number of points on each side of the division, you compute (B - A).X for each point X and then choose a threshold that divides all points in one cluster from all points in the other. This amounts to sliding the dividing line up or down the line between the two cluster centres, while keeping it perpendicular to the line between them.
Once you have the dot products Yi, where Yi = SUM_j (Bj - Aj) Xij, a measure of how closely grouped a cluster is is SUM_i (Yi - Ym)^2, where Ym is the mean of the Yi in the cluster. I am suggesting that you use the sum of these values for the two clusters to tell how good a split you have. To move a point into or out of a cluster and get the new sum of squares without recomputing everything from scratch, note that SUM_i (Si + T)^2 = SUM_i Si^2 + 2T SUM_i Si + T^2, so if you keep track of sums and sums of squares you can work out what happens to a sum of squares when you add or subtract a value from every component, as happens when the cluster mean changes because a point is added or removed.
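Here is a small Python/NumPy sketch of that hierarchical bisection; for simplicity it slides the threshold to the median of the projections (a perfectly balanced split) rather than optimizing the tightness/balance trade-off described above, and the initial choice of centres is an arbitrary assumption of mine:
import numpy as np

def balanced_two_split(points, max_size):
    # Recursively split the points in two by projecting onto the line between
    # two cluster centres and thresholding at the median projection.
    pts = np.asarray(points, dtype=float)
    if len(pts) <= max_size:
        return [pts]
    # Crude 2-means: start from the two x-extreme points and iterate a few times.
    A, B = pts[pts[:, 0].argmin()].copy(), pts[pts[:, 0].argmax()].copy()
    for _ in range(10):
        d = pts @ (B - A)                         # (B - A).X for every point X
        thresh = np.median(d)
        left, right = pts[d <= thresh], pts[d > thresh]
        if len(left) == 0 or len(right) == 0:     # degenerate case: split arbitrarily
            left, right = pts[: len(pts) // 2], pts[len(pts) // 2:]
            break
        A, B = left.mean(axis=0), right.mean(axis=0)
    return balanced_two_split(left, max_size) + balanced_two_split(right, max_size)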

Best parallel method for calculating the integral of a 2D function

In a number-crunching program, I have a function which is just 1 or 0, viewed in three dimensions. I do not know the function in advance, but I need to know the total "surface" over which the function is equal to zero. As an analogous problem, I could draw a rectangle over the 2D representation of the map of the United Kingdom. The function is equal to 0 at sea and 1 on land, and I need to know the total water surface. I wonder what the best parallel algorithm or method for doing this is.
I first thought about the following approach: a) divide the 2D map area into a rectangular grid. For each point at the center of a cell, check whether it is earth or water. This can be done in parallel, and at the end of the procedure I will have a matrix of ones and zeroes, which gives the area with some precision. Now I want to increase this precision, so b) choose the cells that are in the border regions between zeroes and ones (what is the best criterion for doing this?), divide those cells again into smaller cells, and repeat the process until the desired accuracy is reached. I guess that in this process the critical parameters are the grid size for each new stage, and how to store and check the cells that belong to the border area. Finally, the most optimal method, from the computational point of view, is the one that performs the minimal number of checks needed to obtain the total surface with the desired accuracy.
First of all, it looks like you are talking about a 3D function: for two coordinates x and y you have f(x, y) = 0 if (x, y) belongs to the sea, and f(x, y) = 1 otherwise.
Having said that, you can use the following simple approach.
Split your rectangle into N subrectangles, where N is the number of your processors (or processor cores, or nodes in a cluster, etc.).
For each subrectangle, use the Monte Carlo method to calculate the surface of the water.
Add the N values to calculate the total surface of the water.
Of course, you can use any other method to calculate the surface; Monte Carlo was just an example. But the idea is the same: subdivide your problem into N subproblems, solve them in parallel, then combine the results.
Update: For the Monte Carlo method the error estimate decreases as 1/sqrt(N) where N is the number of samples. For instance, to reduce the error by a factor of 2 requires a 4-fold increase in the number of sample points.
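A minimal Python sketch of this split-and-sum scheme; f is a stand-in indicator function (0 over water, 1 over land), and the strip-wise split and sample count are arbitrary choices of mine:
from concurrent.futures import ProcessPoolExecutor
import random

def f(x, y):
    # Hypothetical indicator: 0 over water, 1 over land (problem-specific).
    return 1 if (x - 0.5) ** 2 + (y - 0.5) ** 2 > 0.2 else 0

def water_in_subrect(args):
    x0, x1, y0, y1, samples = args
    hits = sum(f(random.uniform(x0, x1), random.uniform(y0, y1)) == 0
               for _ in range(samples))
    return (x1 - x0) * (y1 - y0) * hits / samples      # water area of this strip

def water_surface(x0, x1, y0, y1, n_workers=4, samples=100_000):
    # Split the rectangle into n_workers vertical strips, estimate the water
    # area of each strip by Monte Carlo in parallel, and sum the results.
    w = (x1 - x0) / n_workers
    jobs = [(x0 + i * w, x0 + (i + 1) * w, y0, y1, samples)
            for i in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as ex:
        return sum(ex.map(water_in_subrect, jobs))

if __name__ == "__main__":
    print(water_surface(0.0, 1.0, 0.0, 1.0))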
I believe that your approach is reasonable.
Choose the cells that are in the border regions between zeroes and ones (what is the best criterion for doing this?)
Each cell has 8 surrounding cells (3x3), or 24 surrounding cells (5x5). If at least one of the 9 or 25 cells contains land, and at least one of them contains water, increase the accuracy for the whole block of cells (3x3 or 5x5) and query again.
When the accuracy is good enough, instead of splitting further, just add the land area to the sum.
Efficiency
Use a producer-consumer queue. Create n threads, where n equals the number of cores on your machine. All threads should do the same job:
Dequeue a geo-cell from the queue.
If the area of the cell is still large, divide it into 3x3 or 5x5 subcells and check each subcell for land/sea. If there is a mix, enqueue all of these subcells. If it is only land, just add the area. If it is only sea, do nothing.
To start, just divide the whole area into reasonably sized cells and enqueue all of them.
You can also optimize by not enqueueing all 9 or 25 subcells when there is a mix, but examining the pattern (only the top/bottom/left/right cells).
Edit:
There is a tradeoff between accuracy and performance: if the initial cell size is too large, you may miss small lakes or small islands. Therefore the optimization criterion should be: start with the largest cells that still give you enough accuracy. A sketch of this queue-based refinement follows.
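A structural Python sketch of the producer-consumer refinement described above (3x3 splits only); is_land(x, y) is a hypothetical 0/1 sampling function, and the Python threads here illustrate the structure rather than providing true parallelism:
from queue import Queue
import threading

def land_area(is_land, x0, y0, size, min_size, n_threads=4):
    q, total, lock = Queue(), [0.0], threading.Lock()
    q.put((x0, y0, size))

    def worker():
        while True:
            x, y, s = q.get()
            sub = s / 3.0
            corners = [(x + i * sub, y + j * sub) for i in range(3) for j in range(3)]
            land = [is_land(cx + sub / 2, cy + sub / 2) for cx, cy in corners]
            if all(land) or s <= min_size:      # pure land, or accuracy reached
                with lock:
                    total[0] += s * s * (sum(land) / 9.0)
            elif any(land):                     # mixed cell: refine further
                for c in corners:
                    q.put((*c, sub))
            q.task_done()                       # pure sea cells add nothing

    for _ in range(n_threads):
        threading.Thread(target=worker, daemon=True).start()
    q.join()
    return total[0]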

Histogram peak identification and Gaussian fitting with minimal accumulated height difference in C++

I already asked a similar question some time ago in the following thread: previous thread. Until now I unfortunately couldn't entirely solve the issue and have only worked around it. Since it is difficult to include all the new information in the previous thread, I am posting a refined and extended question with distinct context here and linking it to the old thread.
I am currently implementing an algorithm from a paper which extracts certain regions of a 3D data set by dynamically identifying value ranges in the data set's histogram.
In a simplified way the method can be described as follows:
Find the highest peak in the histogram
Fit a Gaussian to the peak
Using the value range defined by the Gaussian's mean (µ) +/- deviation (σ), certain regions of the histogram are identified, and the voxels (=3D pixels) of these regions are removed from the original histogram.
As a result of the previous step a new highest peak should be revealed, based on which steps 1-3 can be repeated. The steps are repeated until the data set histogram is empty.
My questions relate to steps 1 and 2 of the above description, which are described as follows in the paper: "The highest peak is identified and a Gaussian curve is fitted to its shape. The Gaussian is described by its midpoint µ, height h and deviation σ. The fitting process minimizes the accumulated height difference between the histogram and the middle part of the Gaussian. The error summation range is µ +/- σ." [1]
In the following I will ask my questions and add my reflections on them:
How should I identify the bins of the total histogram that describe the highest peak? To identify its apex I simply run through the histogram and store the index of the bin with the highest frequency. But how far should the extent of the peak reach to the left and right of the highest bin? At the moment I simply walk to the left and right of the highest bin for as long as the next bin is smaller than the previous one. However, this usually gives a very small range, since creases (mini peaks) occur in the histogram. I have already thought about smoothing the histogram, but I would have to do that after each iteration, since the subtraction of voxels (step 3 in the description above) can cause the histogram to contain creases again. I am also worried that repeated smoothing would distort the results.
Therefore I would like to ask whether there is an efficient way to detect the extent of a peak which is better than my current approach. There were suggestions about mixture models and deconvolution in the previous thread. However, are these methods really reasonable if the shape of the histogram constantly changes after each iteration?
How can I fit a Gaussian curve to the identified peak so that the accumulated height difference between the histogram and the middle part of the Gaussian is minimized?
Following question one from the previous thread, I fitted the curve to a given range of histogram bins by computing their mean and deviation (I hope this is correct?!). But how do I minimize the accumulated height difference between the histogram and the middle part of the Gaussian from this point?
Thank you for your help!
Regards Marc
Add histogram values to the left and right until the goodness of the fit begins to decrease.
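A rough Python/SciPy sketch of that strategy, widening the bin window around the apex until the per-bin fit error starts to increase; hist is assumed to be a 1-D NumPy array of bin counts, and the paper's exact error definition over µ +/- σ may differ:
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, mu, h, sigma):
    return h * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def fit_highest_peak(hist):
    # Fit a Gaussian to the highest peak, growing the window around the apex
    # until the goodness of fit begins to decrease.
    apex = int(np.argmax(hist))
    best_params, best_err = None, np.inf
    for half in range(1, len(hist)):
        lo, hi = max(apex - half, 0), min(apex + half, len(hist) - 1)
        x = np.arange(lo, hi + 1)
        try:
            params, _ = curve_fit(gauss, x, hist[x],
                                  p0=[apex, hist[apex], max(half, 1)])
        except RuntimeError:                 # fit did not converge for this window
            continue
        err = np.abs(hist[x] - gauss(x, *params)).mean()
        if err > best_err:                   # goodness of fit began to decrease
            break
        best_params, best_err = params, err
    return best_params                       # (mu, h, sigma)
The returned µ and σ would then give the value range µ - σ to µ + σ used in step 3 of the method.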
