H3 polyfill for country-scale polygons - geopandas

I am trying to generate a grid for a given (multi) polygon. I understand a grid as a collection of h3 indices within a (multi)polygon boundary.
Here is the code that I implemented so far:
import logging
import time

import geopandas as gpd
import h3pandas  # assumed source of the .h3 accessor used below
import pandas as pd

def generate_grid(region_bounds: gpd.GeoDataFrame) -> pd.DataFrame:
    """
    Generates H3 resolution 10 grid. It utilizes the h3.polyfill method.
    For more detail see https://geographicdata.science/book/data/h3_grid/build_sd_h3_grid.html
    Returns: a dataframe with the following columns:
    <index, h3_res_10>
    """
    logging.info("Start grid generation")
    start = time.time()
    resolution = 10
    grid = region_bounds.h3.polyfill(resolution)
    end = time.time()
    logging.info(f"grid generation took {end - start} sec")
    # convert polyfill result to df (only the first row of region_bounds is used)
    grid_df = pd.DataFrame.from_dict({"h3_res_10": grid.h3_polyfill[0]})
    return grid_df
The problem occurs when the region to process is big, like a country or state.
Is there any way to run the polyfill in parallel for multiple subregions? How can I efficiently split the region into subregions to run h3.polyfill in parallel?

Yes, polyfill (polygonToCells in v4) can be CPU/memory intensive for large regions at fine resolutions of H3. Res 10 is roughly a city block, so a large country will likely have millions of cells.
The best option at the moment is to split up the input into contiguous polygons. The resulting cells will not overlap if the input polygons do not overlap. In Python, you could try Shapely to split the polygon, simply taking vertical tranches with successive north-south lines as the splitters: shapely.ops.split
The resulting set of polygons can then be processed either serially or in parallel with much less memory pressure. CPU time is roughly the same or higher, but could be split across threads.
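A minimal sketch of the vertical-strip idea, under some assumptions: instead of shapely.ops.split it intersects the region with bounding-box strips (a swapped-in technique that also handles MultiPolygons), and the names splitter_longitudes and split_into_strips are hypothetical helpers, not part of Shapely or H3.

```python
def splitter_longitudes(min_lon, max_lon, n_strips):
    """Interior longitudes that cut [min_lon, max_lon] into n_strips equal strips."""
    width = (max_lon - min_lon) / n_strips
    return [min_lon + i * width for i in range(1, n_strips)]


def split_into_strips(polygon, n_strips):
    """Cut a shapely (Multi)Polygon into vertical strips of equal width.

    Assumption: intersection with boxes is used here instead of
    shapely.ops.split; the resulting pieces do not overlap.
    """
    # Imported lazily so the helper above runs without shapely installed.
    from shapely.geometry import box

    min_lon, min_lat, max_lon, max_lat = polygon.bounds
    edges = [min_lon, *splitter_longitudes(min_lon, max_lon, n_strips), max_lon]
    strips = []
    for west, east in zip(edges[:-1], edges[1:]):
        piece = polygon.intersection(box(west, min_lat, east, max_lat))
        if not piece.is_empty:
            strips.append(piece)
    return strips
```

Each returned strip could then be passed to polyfill on its own, serially or from a worker pool (for example concurrent.futures.ProcessPoolExecutor), and the per-strip cell sets concatenated at the end.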


How to Plot Several 3D Trajectories in Julia (using a for-loop, presumably?) -- also, how to animate over time?

I am trying to plot a set of 3D trajectories in a single plot in Julia. By 3-D trajectories I mean: different sets of 3-D coordinates over time. These trajectories are stored in a multidimensional array called positions, where the dimensions respectively correspond to the Trajectory ID, X-Y-Z coordinate and Time. For example, positions[75,2,1:100] refers to the Y (2nd) coordinate of the 75th Trajectory, across the first 100 timesteps of the trajectory.
I am trying to figure out why the following code doesn't work:
using Plots
time_indices = 1:100
ax = scatter3d(positions[1,1,time_indices], positions[1,2,time_indices], positions[1,3,time_indices], label="Trajectory 1 for times 1 to 100")
for n in 2:size(positions,1)
    scatter3d!(ax, positions[n,1,time_indices], positions[n,2,time_indices], positions[n,3,time_indices], label="Trajectory $n for times 1 to 100")
end
When I run that code, I don't see anything in the Plots window (I'm using Atom), although I don't get any errors / it appears to run successfully. Any thoughts on what I'm doing wrong? Should I use a different backend? It doesn't work on either gr() or plotlyjs() (those are the only ones I know of, based on tutorials I've completed).
Follow up question: once I can successfully plot such 3-D trajectories in a single static plot, I am wondering how you would go about animating them over time (using the @gif or @animate macros, presumably)? I am asking this here, because I wasn't able to understand the documentation / tutorial on 3D animations, unfortunately. Googling / other sources have also not helped :(
It looks like you're just missing a display(ax) at the end, after the loop.
Edit: to animate, try
anim = @animate for n in 1:size(positions,1)
    scatter3d(positions[n,1,time_indices], positions[n,2,time_indices], positions[n,3,time_indices], label="Trajectory $n for times 1 to 100")
end
gif(anim, "some_file_name.gif", fps=15)
(or if you want the prior trajectories to show up as well in later frames of the gif, then replace the scatter3d with scatter3d!)
I can't test the above without having your positions, but here is another example that I have just tested on Julia 1.5.0beta with Plots and the GR backend:
using Plots; gr();
anim = @animate for i=1:100
    plot(sin.(range(0,i/10*pi,length=1000)), label="", ylims=(-1,1))
end
gif(anim, "anim_fps15.gif", fps=15)
How to do the second bit (the animation - see details in comments on the accepted answer):
anim = @animate for t in time_indices
    all_positions_t = positions[:,:,t] # positions of all trajectories at time t
    scatter3d(all_positions_t[:,1], all_positions_t[:,2], all_positions_t[:,3], label="")
end
gif(anim, "some_file_name.gif", fps=15)

Failed to convert structure to matrix with regionprops in MATLAB

I am working with particle tracking in images in MATLAB and using regionprops function. On the provided resource there is an example with circles:
stats = regionprops('table',bw,'Centroid',...
centers = stats.Centroid;
diameters = mean([stats.MajorAxisLength stats.MinorAxisLength],2);
radii = diameters/2;
In my MATLAB R2014b, the line centers = stats.Centroid; produces an undesired result: my stats.Centroid structure has 20 elements (each element is two numbers, the coordinates of the center of the region). However, after that command, my variable centers is only a 1x2 matrix instead of the desired 20x2.
Screenshot attached.
I tried to go around this with different methods. The only solution I found is to do:
for i=1:20
However, as we all know loops are slow in MATLAB. Is there another method that takes advantage of MATLAB matrix operations?
Doing stats.Centroid would in fact give you a comma-separated list of centroids, so MATLAB would only give you the first centre of that matrix if you did centers = stats.Centroid. What you must do is encapsulate the centres in an array (i.e. [stats.Centroid]), then reshape when you're done.
Something like this should work for you:
centers = reshape([stats.Centroid], 2, []).';
What this will do is read in the centroids as a 1 x 2*M array, where M is the total number of blobs. Because MATLAB reshapes in column-major order, you should make sure that you specify the total number of rows to be 2 and let MATLAB figure out the number of columns by itself. You then transpose the result to get the M x 2 matrix you want.
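To make the reshape step concrete, here is the same regrouping logic sketched in plain Python rather than MATLAB. The function name centroids_to_rows is hypothetical, and the flat list stands in for what [stats.Centroid] produces: x1, y1, x2, y2, and so on.

```python
def centroids_to_rows(flat):
    """Turn [x1, y1, x2, y2, ...] into [[x1, y1], [x2, y2], ...].

    This mirrors reshape(flat, 2, []).' in MATLAB: consecutive pairs
    become the columns of a 2 x M matrix, and the transpose turns
    each pair into one row.
    """
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of values (x, y pairs)")
    return [[flat[i], flat[i + 1]] for i in range(0, len(flat), 2)]


flat = [10.5, 20.0, 30.25, 40.0, 50.0, 60.75]
rows = centroids_to_rows(flat)  # one [x, y] row per blob
```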
Minor Note
If you look at the Tips section of the regionprops documentation page - http://www.mathworks.com/help/images/ref/regionprops.html#buorh6l-1, you will see that they surround stats.Area, which is the area of each blob, with [] brackets to ensure that the comma-separated list of values is encapsulated in an array. This is not an accident: the brackets are there for exactly the reason described above.

Sort labels of segmented image in kmeans based on cluster mean

I have a simple but interesting question. As you know, k-means can give a different result on each run because the cluster centers are initialized randomly. However, assume I know that cluster 1 has a smaller mean value than cluster 2, cluster 2 has a smaller mean value than cluster 3, and so on. I want an algorithm that assigns cluster indices by mean value, so that the cluster with the smallest mean gets the smallest index.
This is my MATLAB code. If you have a shorter or clearer way, please suggest it.
%% K-mean
nrows = size(Img_original,1);
ncols = size(Img_original,2);
I_1D = reshape(Img_original,nrows*ncols,1);
[cluster_idx mu]=kmeans(double(I_1D),num_cluster,'distance','sqEuclidean','Replicates',3);
cluster_label = reshape(cluster_idx,nrows,ncols);
%% Sort based on mu
[mu_sort id_sort]=sort(mu);
%% Save index of order if mu
for i=1:num_cluster
%% Sort cluster label based on mu
for i=1:num_cluster
It's unclear to me as to why you'd want to relabel the clusters based on the ordering of each centroid. You can simply use the labelling vector that is output from k-means to reference which cluster / centroid each point belongs to.
Nevertheless, the initial idea that you had to sort the centroids is a good one. The last part of your code seems rather inefficient because you're looping over each label and doing the reassignment. One thing I could perhaps suggest is to have a lookup table where the input is the original label and the output is the reordered labels based on the sorted centroids.
If you want to pursue this route, you can use a containers.Map where the keys are the original labels in the order returned by the second output of sort, and the values are the reordered labels, namely a vector that goes from 1 up to as many classes as you have. You need to do this because the second output of sort tells you, for each position in the sorted result, which index of the original array that value came from, so the k-th entry of that output is the original label that must be relabelled as k. In addition, I would use the sortrows function in MATLAB, not raw sort. With how you're doing it, you are sorting each column / variable independently, and that will give the wrong centroids once there is more than one column. Raw sort works for grayscale images, where you only have one feature to consider, namely the intensity, but if you go beyond grayscale into RGB or whatever colour space you desire, raw sort will give you incorrect results. You need to consider each row as a single point, then sort the rows jointly.
Given your code, you'd do something like this:
%% K-mean
nrows = size(Img_original,1);
ncols = size(Img_original,2);
I_1D = reshape(Img_original,nrows*ncols,1);
[cluster_idx mu]=kmeans(double(I_1D),num_cluster,'distance','sqEuclidean','Replicates',3);
%% Sort based on mu
[mu_sort id_sort]=sortrows(mu);
%// New - Create lookup
lookup = containers.Map(id_sort, 1:size(mu_sort,1));
%// Relabel the vector
cluster_idx_sort = lookup.values(num2cell(cluster_idx));
cluster_idx_sort = [cluster_idx_sort{:}];
%// Reshape back to original image dimensions
cluster_label = reshape(cluster_idx_sort,nrows,ncols);
This should hopefully give you some speedup in your code.
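For readers who want to check the lookup-table logic outside MATLAB, here is a small Python sketch of the same relabelling. The function name relabel_by_centroid is hypothetical; labels are assumed to be 1-based as in MATLAB's kmeans output, and centroids are tuples so that multi-feature centroids are compared row by row, like sortrows.

```python
def relabel_by_centroid(labels, centroids):
    """Relabel clusters so that label 1 has the smallest centroid.

    labels    : list of 1-based cluster ids, one per pixel
    centroids : centroids[k - 1] is the centroid (a tuple) of cluster k
    """
    # order[new] = old 0-based index, the analogue of id_sort from sortrows
    order = sorted(range(len(centroids)), key=lambda k: centroids[k])
    # lookup maps an original label to its rank among the sorted centroids
    lookup = {old + 1: new + 1 for new, old in enumerate(order)}
    return [lookup[label] for label in labels]


centroids = [(200.0,), (50.0,), (120.0,)]
labels = [1, 2, 3, 1, 2]
relabelled = relabel_by_centroid(labels, centroids)  # [3, 1, 2, 3, 1]
```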
To double check, I tried this on the cameraman.tif image, that's part of the image processing toolbox. Running the code gives me these cluster centres:
>> mu
mu =
Once I sort the clusters in ascending order, this is what I get for the ordering and for the centroids:
>> mu_sort
mu_sort =
>> id_sort
id_sort =
So that works as we expected... now if we display the original cluster label map before sorting on the centroids with:
cluster_label = reshape(cluster_idx, nrows, ncols);
... we get this image:
Now, if we run through the sorting logic and display the centroids:
imshow(cluster_label, []);
... we get this image:
This works as I expected. Because the centroids flipped, so should the colouring.

How to average multiple images using Octave and matrix manipulation to reduce noise?

Here is my code, which is meant to add up the two matrices using element-by-element addition and then divide by two.
function [ finish ] = stackAndMeanImage (initFrame, finalFrame)
  cd 'C:\Users\Disc-1119\Desktop\Internships\Tracking\Octave\highway\highway (6-13-2014 11-13-41 AM)';
  pkg load image;
  i = initFrame;
  f = finalFrame;
  astr = num2str(i);
  tmp = imread(astr, 'jpg');
  d = f - i
  for a = 1:d
    astr = num2str(i + 1);
    read_tmp = imread(astr, 'jpg');
    read_tmp = rgb2gray(read_tmp);
    tmp = tmp .+ read_tmp;
    tmp = tmp / 2;
  endfor
  imwrite(tmp, 'meanimage.JPG');
  finish = 'done';
endfunction
Here are two example input images
And here is one output image
I am really confused as to what is happening. I have not implemented what the other answers have said yet though.
I am working on an image processing project where I currently choose by hand the images that are 'empty', i.e. that only show the background, so that my algorithm can compute the differences and then do some more analysis. I have a simple piece of code that computes the mean of two images, which I have converted to grayscale matrices, but it only works for two images. When I find the mean of two images, then take the mean of that result and the next image, and repeat this, I end up with a washed-out white image that is absolutely useless. You can't even see anything.
I found that there is a function in MATLAB called imfuse that is able to average images. I was wondering if anyone knew the process that imfuse uses to combine images, as I am happy to implement it in Octave, or if anyone knew of or has already written a piece of code that achieves something similar. Again, I am not asking for anyone to write code for me; I am just wondering what the process is and whether there are pre-existing functions out there that I have not found in my research.
You should not end up with a washed-out image. Instead, you should end up with an image which is, technically speaking, temporally low-pass filtered. What this means is that half of the information content is from the last image, one quarter from the second-last image, one eighth from the third-last image, etc.
Actually, the effect in a moving image is similar to a display with slow response time.
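The weighting produced by repeatedly averaging the running result with the next frame can be checked with a few lines of Python. This is only an illustration of the arithmetic; pairwise_average_weights is a hypothetical helper name.

```python
def pairwise_average_weights(n_frames):
    """Weight of each frame (oldest first) after repeated pairwise averaging.

    Each step computes result = (result + next_frame) / 2, so every
    existing weight is halved and the new frame enters with weight 1/2.
    """
    weights = [1.0]
    for _ in range(n_frames - 1):
        weights = [w / 2 for w in weights] + [0.5]
    return weights


print(pairwise_average_weights(4))  # [0.125, 0.125, 0.25, 0.5]
```

A true mean of four frames would weight each frame by 0.25 instead, which is why the repeated pairwise scheme is dominated by the most recent frames.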
If you are ending up with a white image, you are doing something wrong. nkjt's guess of type challenges is a good one. Another possibility is that you have forgotten to divide by two after summing the two images.
One more thing... If you are doing linear operations (such as averaging) on images, your image intensity scale should be linear. If you just use the RGB values or some grayscale values simply calculated from them, you may get bitten by the nonlinearity of the image. This property is called the gamma correction. (Admittedly, most image processing programs just ignore the problem, as it is not always a big challenge.)
As your project calculates differences of images, you should take this into account. I suggest using linearised floating point values. Unfortunately, the linearisation depends on the source of your image data.
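As a sketch of what linearisation can look like, the following assumes the images are 8-bit and sRGB-encoded. That is an assumption made for illustration only, since the right curve depends on the image source; the function names are hypothetical.

```python
def srgb_to_linear(value_8bit):
    """Convert an 8-bit sRGB-encoded value to linear intensity in [0, 1]."""
    c = value_8bit / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(linear):
    """Convert linear intensity in [0, 1] back to an 8-bit sRGB value."""
    if linear <= 0.0031308:
        c = 12.92 * linear
    else:
        c = 1.055 * linear ** (1 / 2.4) - 0.055
    return round(c * 255.0)


def average_pixels_linear(values_8bit):
    """Average 8-bit sRGB values in linear space and re-encode the result."""
    linear = [srgb_to_linear(v) for v in values_8bit]
    return linear_to_srgb(sum(linear) / len(linear))
```

Averaging a black and a white pixel this way gives a noticeably brighter value than the naive (0 + 255) / 2, which is the nonlinearity the paragraph above warns about.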
On the other hand, averaging is often the most efficient way of reducing noise. So there you are on the right track, assuming the images are similar enough.
However, after having a look at your images, it seems that you may actually want to do something other than average the images. If I understand your intention correctly, you would like to get rid of the cars in your road cam images to give you just the carless background, which you could then subtract from the image to get the cars.
If that is what you want to do, you should consider using a median filter instead of averaging. What this means is that you take for example 11 consecutive frames. Then for each pixel you have 11 different values. Now you order (sort) these values and take the middle (6th) one as the background pixel value.
If your road is empty most of the time (at least 6 frames of 11), then the 6th sample will represent the road regardless of the colour of the cars passing your camera.
If you have an empty road, the result from the median filtering is close to averaging. (Averaging is better with Gaussian white noise, but the difference is not very big.) But your averaging will be affected by white or black cars, whereas median filtering is not.
The problem with median filtering is that it is computationally intensive. I am very sorry, I speak very broken and ancient Octave, so I cannot give you any useful code. In MATLAB or PyLab you would stack, say, 11 images into an M x N x 11 array, and then use a single median command along the depth axis. (When I say intensive, I do not mean it couldn't be done in real time with your data. It can, but it is much more complicated than averaging.)
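The per-pixel temporal median can be sketched in plain Python as follows. Frames are assumed to be equally sized grayscale images stored as nested lists, and median_background is a hypothetical name; with NumPy the same thing would typically be a single median call along the frame axis.

```python
from statistics import median


def median_background(frames):
    """Per-pixel median over a list of equally sized grayscale frames.

    frames : list of 2-D lists (rows of pixel values), one per time step
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


# Three tiny 1 x 2 frames: a dark car covers pixel 0 in the last frame
# and a white car covers pixel 1 in the second frame.
frames = [[[100, 100]], [[100, 255]], [[0, 100]]]
background = median_background(frames)  # the road value 100 wins at both pixels
```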
If you have really a lot of traffic, the road is visible behind the cars less than half of the time. Then the median trick will fail. You will need to take more samples and then find the most typical value, because it is likely to be the road (unless all cars have similar colours). There it will help a lot to use the colour image, as cars look more different from each other in RGB or HSV than in grayscale.
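One way to read "most typical value" is a per-pixel histogram mode. The sketch below is an assumption about how that could be done, not the answerer's method: samples are grouped into intensity bins of a hypothetical width of 8, and the mean of the fullest bin is returned.

```python
from collections import Counter


def most_typical_value(samples, bin_width=8):
    """Estimate the most typical intensity among samples of one pixel.

    samples   : integer intensities of the same pixel over many frames
    bin_width : assumed histogram bin width; tune it to the noise level
    """
    bins = Counter(value // bin_width for value in samples)
    best_bin, _ = bins.most_common(1)[0]
    members = [v for v in samples if v // bin_width == best_bin]
    return sum(members) / len(members)


# Road (about 100) is visible in only half of the samples,
# but the cars all differ, so the road bin is still the fullest.
samples = [100, 102, 101, 30, 220, 240, 35, 99]
road = most_typical_value(samples)
```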
Unfortunately, if you need to resort to this type of processing, the path is slightly slippery and rocky. Average is very easy and fast, median is easy (but not that fast), but then things tend to get rather complicated.
Another BTW came into my mind. If you want to have a rolling average, there is a very simple and effective way to calculate it with an arbitrary length (arbitrary number of frames to average):
# N is the number of images to average
# P[i] are the input frames
# S is a sum accumulator (sum of N frames)
# calculate the sum of the first N frames
S <- 0
I <- 0
while I < N
    S <- S + P[I]
    I <- I + 1
# save_img() saves an averaged image
while there are images to process
    save_img(S / N)
    S <- -P[I-N] + S + P[I]
    I <- I + 1
Of course, you'll probably want to use for-loops, and += and -= operators, but still the idea is there. For each frame you only need one subtraction, one addition, and one division by a constant (which can be modified into a multiplication or even a bitwise shift in some cases if you are in a hurry).
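The rolling-sum idea can be written as runnable Python along these lines. Frames are assumed to be flat lists of pixel values of equal length, and rolling_averages is a hypothetical name; unlike the pseudocode it returns the averages instead of saving images.

```python
def rolling_averages(frames, n):
    """Rolling average over the last n frames, updated incrementally.

    frames : list of frames, each a flat list of pixel values
    n      : number of frames in the averaging window
    Returns one averaged frame per full window, oldest window first.
    """
    if n <= 0 or len(frames) < n:
        return []
    # Sum of the first n frames, pixel by pixel.
    total = [sum(pixels) for pixels in zip(*frames[:n])]
    averages = [[s / n for s in total]]
    for i in range(n, len(frames)):
        # Drop the frame leaving the window and add the one entering it.
        total = [s - old + new for s, old, new in zip(total, frames[i - n], frames[i])]
        averages.append([s / n for s in total])
    return averages


result = rolling_averages([[0], [2], [4], [6]], 2)  # [[1.0], [3.0], [5.0]]
```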
I may have misunderstood your problem, but I think what you're trying to do is the following. Basically, read all images into a matrix and then use mean(). This assumes that you are able to fit them all in memory.
function [finish] = stackAndMeanImage (ini_frame, final_frame)
  pkg load image;
  dir_path = 'C:\Users\Disc-1119\Desktop\Internships\Tracking\Octave\highway\highway (6-13-2014 11-13-41 AM)';
  ## number of frames, counting both ini_frame and final_frame
  n_frames = final_frame - ini_frame + 1;
  imgs = cell (1, 1, n_frames);
  ## read all images into a cell array
  for n = 1:n_frames
    fname = fullfile (dir_path, sprintf ("%i", ini_frame + n - 1));
    imgs{n} = rgb2gray (imread (fname, "jpg"));
  endfor
  ## create 3D matrix out of all frames and calculate mean across 3rd dimension
  imgs = cell2mat (imgs);
  avg = mean (imgs, 3);
  ## mean returns double precision so we cast it back to uint8 after
  ## rescaling it to range [0 1]. This assumes that images were all
  ## originally uint8, but since they are jpgs, that's a safe assumption
  avg = im2uint8 (avg ./255);
  imwrite (avg, fullfile (dir_path, "meanimage.jpg"));
  finish = "done";
endfunction

How to detect boundaries of a pattern [duplicate]

So as the title says, I am trying to detect boundaries of patterns. In the images attached, you can basically see three different patterns.
1. Close stripe lines
2. One thick L shaped line
3. The area between 1 & 2
I am trying to separate these three into, say, 3 separate images. Depending on where the answers go, I will upload more images if needed. Either ideas or code will be helpful.
You can solve (for some values of "solve") this problem using morphology. First, to make the image more uniform, remove irrelevant minima. One way to do this is using the h-dome transform for regional minima, which suppresses minima of height < h. Now, we want to join the thin lines. That is accomplished by a morphological opening with a horizontal line of length l. If the lines were merged, then the regional minima of the current image is the background. So we can fill holes to obtain the relevant components. The following code summarizes these tasks:
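f = rgb2gray(imread('http://i.stack.imgur.com/02X9Z.jpg'));
h = 30; % example value: minima of height < h are suppressed
l = 15; % example value: length of the horizontal line
hm = imhmin(f, h);
o = imopen(hm, strel('line', l, 0));
result = imfill(~imregionalmin(o), 'holes');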
Now, you need to determine h and l. The parameter h is expected to be easier since it is not related to the scale of the input, and in your example, values in the range [10, 30] work fine. To determine l maybe a granulometry analysis could help. Another way is to check if the result contains two significant connected components, corresponding to the bigger L shape and the region of the thin lines. There is no need to increase l one by one, you could perform something that resembles a binary search.
Here are the hm, o and result images with h = 30 and l = 15 (l in [13, 19] works equally well here). This approach gives flexibility in choosing parameters, making it easier to pick/find good values.
To calculate the area in the space between the two largest components, we could merge them and simply count the black pixels inside the new connected component.
You can pass a window (10x10 pixels?) and collect features for that window. The features could be something as simple as the cumulative gradients (edges) within that window. This would distinguish the various areas as long as the window is big enough.
Then, using each window as a data point, you can do some clustering, or if the patterns don't vary that much you can use some simple thresholds to determine which data points belong to which patterns (the larger gradient sums belong to the small lines: more edges, while the smallest gradient sums belong to the thickest lines: only one edge, and those in between belong to the other "in-between" pattern).
Once you have this classification, you can create separate images if need be.
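A rough Python sketch of the window feature described above, under a few assumptions: the image is a grayscale nested list, windows do not overlap, and only horizontal intensity differences are summed as the "cumulative gradient". The name window_gradient_sums is hypothetical.

```python
def window_gradient_sums(image, window=10):
    """Sum of absolute horizontal differences inside each window.

    image  : 2-D list of grayscale values
    window : side length of the square, non-overlapping windows
    Returns a dict mapping the (top, left) corner of each window to its sum.
    """
    rows, cols = len(image), len(image[0])
    features = {}
    for top in range(0, rows - window + 1, window):
        for left in range(0, cols - window + 1, window):
            total = 0
            for r in range(top, top + window):
                for c in range(left, left + window - 1):
                    total += abs(image[r][c + 1] - image[r][c])
            features[(top, left)] = total
    return features


# Tiny example: the left window contains an edge, the right one is flat.
image = [[0, 255, 0, 0],
         [0, 255, 0, 0]]
features = window_gradient_sums(image, window=2)
```

Windows with large sums would then be candidates for the striped region and windows with small sums for the thick line or flat areas, with the thresholds chosen by inspection or by clustering.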
Just throwing out ideas. You can binarize the image and do connected component labelling. Then perform some analysis on the connected components such as width to discriminate between the regions.
