Return value of MPI_Dims_create()

Assuming I have 64 processes and I want to create a 3-D MPI Cartesian topology, the default topology returned by MPI_Dims_create() is 4x4x4. Why is it 4x4x4 and not 8x4x2, 4x8x2, 16x2x2, or any other possible combination?

MPI_Dims_create is specifically made as a convenience function to create a balanced topology.
A balanced topology, ideally a cube, has certain optimal properties. Suppose you are running a simulation on a 160x160x160 grid with your 64 processes.
- With 4x4x4, each process gets a 40x40x40 block to work on and, in a simple border exchange, has to send 40x40 = 1600 cells to each of its 6 neighbors (9600 cells in total).
- With 8x4x2, each process gets 20x40x80; the border is 2x(20x40) + 2x(20x80) + 2x(40x80) = 11200 cells.
- With 16x2x2, each process gets 10x80x80; the border is 4x(10x80) + 2x(80x80) = 16000 cells.
As you can see, the border size that needs to be exchanged is the smallest for the cube. Generally, a balanced topology is a good default.
You can also constrain MPI_Dims_create (nonzero entries in the dims array are kept fixed), or pass your own dimensions straight to MPI_Cart_create for full control over the Cartesian topology.
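To sanity-check those numbers, the sketch below (plain Python, using the 160x160x160 grid and the decompositions from the example above) computes the per-process border size for each factorization; mpi4py users can also get the MPI_Dims_create result itself via MPI.Compute_dims(64, 3).

    # Border (halo) cells exchanged per process on a 160x160x160 grid,
    # reproducing the numbers quoted above. No MPI needed for the arithmetic.
    def halo_cells(grid, dims):
        nx, ny, nz = (g // d for g, d in zip(grid, dims))
        return 2 * (ny*nz + nx*nz + nx*ny)   # one face to each of the 6 neighbors

    grid = (160, 160, 160)
    for dims in [(4, 4, 4), (8, 4, 2), (16, 2, 2)]:
        print(dims, halo_cells(grid, dims))  # 9600, 11200, 16000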


FlowField Pathfinding on Large RTS Maps

When building a large-map RTS game, my team is experiencing performance issues with pathfinding.
A* is inefficient here, not only because of janky paths but because of the processing cost when large groups of units move at the same time.
After some research, the obvious solution would be FlowField pathfinding, the industry standard for RTS games as it stands.
The issue we are having, after creating the base algorithm, is that the map is quite large, requiring a grid of around 766 x 485. This creates a noticeable processing freeze or lag when computing the flowfield for the units to follow.
Has anybody experienced this before or have any solutions on how to make the flowfields more efficient? I have tried the following:
- Adding flowfields to a list when created and referencing them later (works once created, but obviously lags on creation).
- Processing flowfields before the game starts and referencing the list (due to the sheer number of cells, this simply doesn't work).
- Creating a grid based on the distance between the furthest selected unit and the destination point (works for short distances, not when moving from one end of the map to the other).
I was thinking about maybe splitting up the map into multiple flowfields, but I'm trying to work out how I would make them move from field to field.
Any advice on this?
Thanks in advance!
This may be a bit of a late answer. Since you mention this is a large RTS game, the computation should not be limited to one CPU core. Here are a few pieces of advice for using flowfields more efficiently:
1. Use multiple threads to compute a new flowfield for each unit movement command.
2. Group units, so that all units in the same command group share the same flowfield.
3. Partition the flowfield grid, so you only have to update a partition whose contents changed (a new building, moving units).
4. Pre-bake the flowfield grid slot costs: pre-compute the basic cost of each cell from the environment and other static values that won't change during the game.
Divide: e.g. with a 766 x 485 map, pad it to 800 x 500 and divide it into 100 partitions of 80 x 50 cells each, as stated in advice 3. You then have a coarse grid of 10 x 10 = 100 slots; create a directed graph (https://en.wikipedia.org/wiki/Graph_theory) from the initial flowfield map (without considering any game units) and run the A* algorithm over it before the game begins, so that you know all the connections between partitions.
5. For each new flowfield, build the field only for the partitions marked by a quick A* search in that graph (see the sketch after this list). If one node of the route is totally blocked from reaching the next, mark the node as blocked, run A* again, and use the alternative route.
6. Cache: save the flowfield result from step 5 for further use (e.g. the same unit type spawning at home and going to the enemy base uses the same partition route). Invalidate the cache whenever the path changes, but invalidate only the changed partition first and check whether that partition still connects to its neighbours; then only a minor change has to be made within that partition.
7. Late-update the units' commands at runtime. If the map is large enough, move the units immediately towards the next partition without a flowfield (use A* on the 10 x 10 graph to get the next partition) and, during that movement, build the flowfield in the background using steps 1-6. Optimized properly, the calculation takes only a few milliseconds, after which the units adjust their route; most of the time the player won't notice a thing. In the worst case, where all partitions must be searched to find the only possible route, only the first command has any delay, and the cache minimises it afterwards, since that route will be reused repeatedly.
8. Re-do the build process above every few seconds for each command group (in the background), just in case anything changes along the way.
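As promised in step 5, here is a minimal sketch of the coarse search (in Python, since the post names no engine; the 10 x 10 partition layout is from the example above, while the blocked set and the start/goal values are placeholders):

    # Hypothetical sketch of step 5: A* over the coarse partition graph decides
    # which partitions the real flowfield must be built for.
    import heapq

    COLS, ROWS = 10, 10   # the 800x500 map split into 80x50-cell partitions

    def neighbors(p):
        x, y = p
        for n in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
            if 0 <= n[0] < COLS and 0 <= n[1] < ROWS:
                yield n

    def astar(start, goal, blocked=frozenset()):
        """Return the partition route to build flowfields for, or None."""
        h = lambda p: abs(p[0]-goal[0]) + abs(p[1]-goal[1])  # Manhattan distance
        frontier = [(h(start), 0, start, [start])]
        best = {start: 0}
        while frontier:
            _, g, p, path = heapq.heappop(frontier)
            if p == goal:
                return path
            for n in neighbors(p):
                if n in blocked:
                    continue
                if g + 1 < best.get(n, float('inf')):
                    best[n] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(n), g + 1, n, path + [n]))
        return None  # route fully blocked: mark nodes and retry, as in step 5

    route = astar((0, 0), (9, 9))
    print(len(route))  # build real flowfields only for these ~19 partitions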
I got this working with a much larger random map (2000x2000) with no FPS drop at all.
Hope this helps anyone in the future.

Finding the flow of water in a connected network

I have a set of water meters for water consumers drawn up as GeoJSON and visualized with ol3. For each consumer house I have its water usage for the given year, and the water pipe system is given as linestrings, with metadata for the diameter of each pipe section.
What is the minimum information I need to visualize/calculate the total amount of water that passed through each pipe over the year, when the pipes contain inner loops/circles?
Is there a library that makes it easy to do the calculations in JavaScript?
Naive approach: start from each house, move to the first pipe junction, add the house's metered water usage as water out of that junction, and continue until the water plant is reached. This works only if there are no loops within the pipe system.
This sounds more like a physics or civil engineering problem than a programming one.
But as best I can tell, you would need time series data for sources and sinks.
Consider this simple network:
Say, A is a source and B and D are sinks/outlets.
If the flow out of B is given, the flow in |CB| would be dependent on the flow out of D.
So e.g. if B and D were always open at the same time, the total volume that has passed |CB| might be close to 0. Conversely, if B and D were never open at the same time the number might be equal to the volume that flowed through |AB|.
If you can obtain time series data, so you have concurrent values of flow through D and B, I would think there would exist a standard way of determining the flow through |CB|.
Wikipedia's Pipe Network Analysis article mentions one such method: The Hardy Cross method, which:
"assumes that the flow going in and out of the system is known and that the pipe length, diameter, roughness and other key characteristics are also known or can be assumed".
If time series data are not an option, I would pretend it was always average (which might not be so bad given a large network, like in your image) and then do the same thing.
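For a feel of the Hardy Cross method, a single loop-correction step is small enough to sketch (Python; the resistances r, the initial flow guesses Q, and the head-loss model h = r*Q*|Q|, i.e. exponent n = 2, are all illustrative assumptions):

    # One Hardy Cross correction step for a single pipe loop. Sign convention:
    # Q > 0 means clockwise flow around the loop. r and Q values are invented.
    def hardy_cross_step(pipes):
        num = sum(r * q * abs(q) for r, q in pipes)   # sum of head losses
        den = sum(2 * r * abs(q) for r, q in pipes)   # d(head loss)/dQ
        dq = -num / den                               # loop flow correction
        return [(r, q + dq) for r, q in pipes]

    loop = [(4.0, 1.2), (2.0, -0.8), (5.0, 0.5)]      # (r, initial Q guess)
    for _ in range(10):                               # iterate until dq ~ 0
        loop = hardy_cross_step(loop)
    print([round(q, 3) for _, q in loop])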
You can use the Ford-Fulkerson algorithm to find the maximum flow in a network. To use this algorithm, you need to represent your network as a graph with nodes that represent your houses and edges to represent your pipes.
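If you try that route, you don't have to implement Ford-Fulkerson yourself; graph libraries ship it. As an illustration (in Python with networkx rather than JavaScript, and with invented node names and capacities), keep in mind that max-flow tells you how much *could* pass through each pipe, not the year's actual volume, so you would still combine it with the measured demands:

    # Max-flow illustration with networkx; nodes/capacities are made up.
    import networkx as nx

    G = nx.DiGraph()
    G.add_edge("plant", "C", capacity=100.0)  # capacity ~ what the pipe admits
    G.add_edge("C", "B", capacity=40.0)
    G.add_edge("C", "D", capacity=60.0)

    flow_value, flow_dict = nx.maximum_flow(G, "plant", "B")
    print(flow_value)           # 40.0
    print(flow_dict["C"]["B"])  # flow assigned to pipe |CB|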
You can first simplify the network by consolidating demands on the dead-ends. Next you'll need pressure data at the 3 feeds into this network, which I read as the top feed from the 90 (mm?), the centre feed at the 63, and the bottom feed near the 50. These 3 clusters are linked by a 63 mm pipe running down, which carries the consolidated demand; the pressure readings at the feeds would be sufficient to give the flow rate across the inner clusters.

Finding regions in scattered data

I have a number of scattered data sets in Nx3 matrices, a simple example plotted with scatter3 is shown below (pastebin of the raw values):
Each of my data sets has an arbitrary number of regions/blobs; the example above, for instance, has 4.
Does anyone know of a simple method to programmatically find the number of regions in this form of data?
My initial idea was a delaunayTriangulation / convexHull approach, but without any data treatment this only finds the outer volume of the entire plot rather than each region.
My next idea involves nearest-neighbour statistics for each point: asking whether it lies within one grid-size distance of another point, then lumping the ones that do into separate blobs/clusters.
Is there a higher level Matlab function I'm not aware of that could assist me here, or does anyone have a better suggestion of how to pull the region count out of data like this?
It sounds like you need a clustering algorithm. Fortunately for you, MATLAB provides a number of these out of the box. There are plenty of algorithms to choose from, and it sounds like you need something where the number of clusters is unknown beforehand, correct?
If this is the case, and your data is as "nice" as your example I would suggest kmeans combined with a technique to properly choose "k", as suggested here.
There are other options of course, I recommend you learn more about the clustering options in MATLAB, here's a nice reference for more reading.
Determining the number of distinct clusters in a dataset is a tricky problem, and probably harder than it seems at first sight. In fact, algorithms like k-means depend heavily on it. Wikipedia has a nice article on it, but no clear and easy method.
The Elbow method mentioned there seems comparatively easy to apply, although it might be computationally costly. In essence, you try different numbers of clusters and choose the number at which the explained variance stops growing much and plateaus.
Also, the notion of a cluster need to be clearly defined - what if zooming into any of the blobs displays a similar structure as the corner structure in the picture?
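To make the elbow method concrete, here is a minimal sketch (Python with scikit-learn rather than MATLAB, purely for illustration; the four synthetic blobs stand in for the scattered Nx3 data): fit k-means for a range of k and watch where the within-cluster variance stops dropping.

    # Elbow-method sketch: fit k-means for several k and watch the inertia
    # (total within-cluster sum of squares) plateau.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    centers = np.array([[0, 0, 0], [5, 5, 0], [0, 5, 5], [5, 0, 5]])
    X = np.vstack([c + rng.normal(scale=0.3, size=(50, 3)) for c in centers])

    for k in range(1, 8):
        inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        print(k, round(inertia, 1))
    # the drop flattens sharply after k = 4: that's the elbow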
I would suggest implementing a "light" version of the Gaussian Mixture Model.
Have each point "vote" for a cube. In the above example, all the points centred around (-1.5, -1.5, 0) would each add +1 to the cube [-2,-1] x [-2,-1] x [-0.2,0.2]. Finally, you can analyse the peaks in the voting matrix.
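A rough sketch of that voting idea (Python/numpy for illustration; the bin width and the demo blobs are assumptions you would tune to your grid size):

    # Voting sketch: each point adds +1 to the cube (bin) it falls in; blobs
    # show up as bins with high counts. The 0.4 bin width is an assumed value.
    import numpy as np

    def vote_cubes(points, width=0.4):
        mins, maxs = points.min(axis=0), points.max(axis=0)
        bins = [np.arange(lo, hi + width, width) for lo, hi in zip(mins, maxs)]
        return np.histogramdd(points, bins=bins)

    # demo: two blobs -> two well-separated peaks in the voting matrix
    rng = np.random.default_rng(1)
    pts = np.vstack([rng.normal((-1.5, -1.5, 0.0), 0.1, (100, 3)),
                     rng.normal((1.0, 1.0, 1.0), 0.1, (100, 3))])
    counts, _ = vote_cubes(pts)
    print(int(counts.max()), int((counts > 20).sum()))  # peak height, hot bins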
In the interest of completeness, there's a vastly simpler answer to this problem (which I have built) than hierarchical clustering; it gives much better results and can differentiate between 1 cluster and 2 (an issue that I couldn't manage to fix with MarkV's suggestions). It assumes your data is on a regular grid of known size, and that you have an unknown number of clusters separated by at least 2*(grid size):
% Idea is as follows:
% * We have a known grid size, dx.
% * A random point [may as well be minima(1,:)] will be in a cluster of
%   values if any others in the list lie dx away (with one dimension
%   varied), sqrt(2)*dx (two dimensions varied) or sqrt(3)*dx (three
%   dimensions varied).
% * Chain these objects together until all are found; any with distances
%   beyond sqrt(3)*dx of the cluster are ignored for now.
% * Set this cluster aside, repeat until no minima data is left.
function [blobs, clusterIdx] = findClusters(minima, dx)
    % problem setup
    dx2 = sqrt(2)*dx;
    dx3 = sqrt(3)*dx;
    eqf = @(list,dx,dx2,dx3)(abs(list-dx) < 0.001 | abs(list-dx2) < 0.001 | abs(list-dx3) < 0.001);
    notDoneClust = true;
    notDoneMinima = true;
    clusterIdx = zeros(size(minima,1),1);
    point = minima(1,:);
    list = minima(2:end,:);
    blobs = 0;
    while notDoneMinima
        cluster = nan(1,3);
        while notDoneClust
            [~, dist] = knnsearch(point,list);            % distance from every remaining point to the current frontier
            nnidx = eqf(dist,dx,dx2,dx3);                 % logical index of grid-neighbour values to point
            cluster = cat(1,cluster,point,list(nnidx,:)); % add points to current cluster
            point = list(nnidx,:);                        % points to check are now all values that are nn to initial point
            list = list(~nnidx,:);                        % list is now all other values that are not in that list
            notDoneClust = ~isempty(point);               % if there are no more points to check, all values of the cluster have been found
        end
        blobs = blobs + 1;
        % ismemberf (File Exchange) is a float-tolerant ismember; plain
        % ismember(...,'rows') also works when coordinates match exactly
        clusterIdx(ismemberf(minima,cluster(2:end,:),'rows')) = blobs;
        % reset points and list for a new cluster
        if ~isempty(list)
            if size(list,1) > 1   % count rows: length() of a single 1x3 row is 3
                point = list(1,:);
                list = list(2:end,:);
                notDoneClust = true;
            else
                % point is a cluster of its own. Don't reset loops, add point in
                % as a cluster and exit (NOTE: I have yet to test this portion).
                blobs = blobs + 1;
                clusterIdx(ismemberf(minima,point,'rows')) = blobs;
                notDoneMinima = false;
            end
        else
            notDoneMinima = false;
        end
    end
end
I fully understand this method is useless for clustering data in the general sense, as any outlying data will be marked as a separate cluster. This (if it happens) is what I need anyway, so this may just be an edge case scenario.

Confusion with neural networks in MATLAB

I'm working on character recognition (and later fingerprint recognition) using neural networks. I'm getting confused with the sequence of events. I'm training the net with 26 letters. Later I will increase this to include 26 clean letters and 26 noisy letters. If I want to recognize one letter say "A", what is the right way to do this? Here is what I'm doing now.
1) Train network with a 26x100 matrix; each row contains a letter from segmentation of the bmp (10x10).
2) However, for the test targets I use my input matrix for "A". I had 25 rows of zeros after the first row so that my input matrix is the same size as my target matrix.
3) I run perform(net, testTargets,outputs) where outputs are the outputs from the net trained with the 26x100 matrix. testTargets is the matrix for "A".
This doesn't seem right, though. Is training supposed to be separate from recognizing a character? What I want to happen is as follows.
1) Training the network for an image file that I select (after processing the image into logical arrays).
2) Use this trained network to recognize letter in a different image file.
So train the network to recognize A through Z. Then pick an image, run the network to see what letters are recognized from the picked image.
Okay, so it seems the question here is more along the lines of "how do I neural networks?". I can outline the basic procedure to try to solidify the idea in your mind, but as far as actually implementing it goes you're on your own. Personally I believe that proprietary languages (MATLAB) are an abomination, but I always appreciate intellectual zeal.
The basic concept of a neural net is that you have a series of nodes arranged in layers, with weights that connect them (depending on what you want to do, you can connect each node only to the layers above and beneath, or connect every node, or anything in between). Each node has a "work function", a probabilistic function that represents the chance that the given node, or neuron, will evaluate to "on", i.e. 1.
The general workflow starts from whatever top-layer neurons/nodes you've got, initializing them to the values of your data (in your case, you would probably start each of these off as the pixel values in your image; normalized to binary would be simplest). Each of those nodes is then multiplied by a weight and fed down to your second layer, which would be considered a "hidden layer"; the resulting sum (either geometric or arithmetic, depending on your implementation) is passed through the work function to determine the state of the hidden layer.
That last point was a little theoretical and hard to follow, so here's an example. Imagine your first row has three nodes ([1,0,1]), and the weights connecting the three of those nodes to the first node in your second layer are something like ([0.5, 2.0, 0.6]). If you're doing an arithmetic sum that means that the weighting on the first node in your "hidden layer" would be
1*0.5 + 0*2.0 + 1*0.6 = 1.1
If you're using a logistic function as your work function (a very common choice, though tanh is also common) this would make the chance of that node evaluating to 1 approximately 75%.
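That worked example is one line of linear algebra; a tiny numpy sketch using the same numbers:

    # The worked example above: weighted sum, then logistic work function.
    import numpy as np

    x = np.array([1.0, 0.0, 1.0])   # states of the three input nodes
    w = np.array([0.5, 2.0, 0.6])   # weights into one hidden node
    z = x @ w                       # arithmetic sum -> 1.1
    p = 1.0 / (1.0 + np.exp(-z))    # logistic work function
    print(z, round(p, 3))           # 1.1 0.75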
You would probably want your final layer to have 26 nodes, one for each letter, but you could add in more hidden layers to improve your model. You would assume that the letter your model predicted would be the final node with the largest weighting heading in.
After you have that up and running you want to train it though, because you probably just randomly seeded your weights, which makes sense. There are a lot of different methods for this, but I'll generally outline back-propagation which is a very common method of training neural nets. The idea is essentially, since you know which character the image should have been recognized, you compare the result to the one that your model actually predicted. If your model accurately predicted the character you're fine, you can leave the model as is, since it worked. If you predicted an incorrect character you want to go back through your neural net and increment the weights that lead from the pixel nodes you fed in to the ending node that is the character that should have been predicted. You should also decrement the weights that led to the character it incorrectly returned.
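For what it's worth, the increment/decrement scheme described here is closer to a perceptron-style update than to full gradient back-propagation; a minimal numpy sketch of exactly that rule (the learning rate and array sizes are invented):

    # Strengthen weights into the correct output node, weaken weights into the
    # wrongly predicted one. Perceptron-style, not gradient back-propagation.
    import numpy as np

    def update(W, pixels, predicted, target, lr=0.1):
        """W: (n_letters, n_pixels) weights from input pixels to letter nodes."""
        if predicted != target:
            W[target] += lr * pixels     # increment weights to the right letter
            W[predicted] -= lr * pixels  # decrement weights to the wrong letter
        return W

    rng = np.random.default_rng(0)
    W = rng.normal(size=(26, 100))             # 26 letters, 10x10 binary images
    pixels = (rng.random(100) > 0.5).astype(float)
    pred = int(np.argmax(W @ pixels))          # predicted letter index
    W = update(W, pixels, predicted=pred, target=0)  # suppose the truth is "A"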
Hope that helps, let me know if you have any more questions.

OpenGL ES 2.0 on Tegra2: How many GPU cores are used in glDrawArrays/glDrawElements functions?

Does anybody have information about how many GPU cores are used when I call glDrawArrays/glDrawElements?
A bit more detail to explain my question.
The Tegra 2 processor has a 4-core GPU, and libGLESv2.so is used to drive it.
After all the preparatory work is done (creating and linking shaders, uploading textures, etc.), I call a draw function, which starts rasterization and creates the image in the framebuffer.
I would think the draw function has to use as many cores as possible to make rasterization faster.
But I can't find any documents that confirm this theory.
The OpenGL documentation only covers its own API level and, understandably, says nothing about the levels below, and NVIDIA doesn't describe how libGLESv2.so is implemented.
Since nobody else wants to answer, I will do it myself :)
After a few attempts, I got the following results:
Please note that I use the GPU for data computation: the data is a linear array, so the "screen" is defined with height = 1 and width = array size, and the computation is performed by drawing a line of width = array size.
The draw functions use as many cores as possible, but it depends on how many vertices are sent to the draw call.
For example: drawing a line as 2 vertices gives one level of performance; dividing that same line into several smaller lines (4 or more vertices) performs better.
All in all, I suppose that to use all cores it is necessary to call the draw function with at least as many vertices as there are GPU cores. In my case the computation was about 20% faster when the line was divided into two sublines; further subdivision gave almost no additional performance increase.
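A trivial sketch of the subdivision trick (Python, just generating the vertex data; the GL context and the draw call themselves are omitted):

    # Split one computation "line" into n sublines so the draw call carries
    # more vertices than there are GPU cores (the observation above).
    # Produces GL_LINES-style vertex pairs; n_segments = 2 was enough for the
    # ~20% gain reported here.
    def subdivide_line(x0, x1, n_segments):
        step = (x1 - x0) / n_segments
        verts = []
        for i in range(n_segments):
            verts.append((x0 + i * step, 0.0))        # segment start
            verts.append((x0 + (i + 1) * step, 0.0))  # segment end
        return verts

    print(subdivide_line(-1.0, 1.0, 2))  # 4 vertices -> two sublines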
