Mutably borrow multiple disjoint sections of an ImageBuffer - image

I would like to wrap an inner image with tiled versions of wrapper images using the image-rs crate.
Like so (the blue square is the inner image, and the other colors are the wrappers. Sizes may differ. One layer may be N pixels wide, while the next is K pixels wide. The wrapper layers are not solid colors, but scaled images. This is simply example output illustrating the problem):
I have some code that accomplishes this, in a single RgbaImage allocation, and in a single thread:
let mut out: RgbaImage = construct_image(); // constructed differently, but here to show types
...
// TODO: This loop can be parallelized?
// LocalizedWrapper just holds coords, size, and a wrapper that knows how to wrap around an image, given coords and size
for LocalizedWrapper {
    coordinates, // top-left coordinates
    size,        // size of bounding rectangle
    wrapper,     // abstract logic for wrapping differently
} in localized_wrappers {
    wrapper.wrap_mut(&mut out, coordinates, size, ppi)?; // wrap_mut() calls image::ImageBuffer::sub_image(), image::imageops::tile(), and friends
}
This works great (it generated the image above). However, it mutably borrows the entire RgbaImage on every iteration, and I'd like to parallelize the loop so that each layer is written concurrently. This seems sound, since I can guarantee the layers will never overlap.
I initially thought to convert the RgbaImage to its raw contents using ImageBuffer::into_raw(), split the result repeatedly with slice.split_at_mut(), and use Rayon to populate each slice in parallel; but then I end up with a messy split job and multiple &mut [u8] per layer, with no way of recreating a GenericImage for the layer.
tl;dr How do I wrap multiple images around a central image, in parallel, in a single allocation, using GenericImage to allow using image::imageops::tile() and friends in wrap_mut()?

Related

How should you correctly encode a large number of blit or scaling commands in Metal?

In an application I'm working on that uses a traditional Metal rendering loop, I periodically need to copy a lot of image data from IOSurfaces to MTLTextures so that the data can be accessed in the fragment shader. I am trying to learn the most effective way to do that.
Each IOSurface represents a tile in a potentially very large image. (Like a stitched panorama.) Core Image is used for rendering image data into each IOSurface.
In Metal, I have an MTLTexture of type 2DArray that contains enough slices to cover the viewport and/or image itself, if the image is "zoomed out" smaller than the view's size.
The IOSurface and MTLTexture each have dimensions that are powers of two, but they might be different dimensions at times. When they are the same dimensions, I use an MTLBlitCommandEncoder, but when they differ in size I use MPSImageScale.
If I need to copy a lot of IOSurfaces to a lot of Metal textures, should I do it one at a time, in batches, or all at once?
Attempt 1: All At Once
This method works but starts to break down if the number of visible surfaces becomes quite large. You end up pre-allocating a bunch of surface-backed textures before committing them. This method seems the most logical to me, but it also causes the most warnings in Xcode's GPU insights and uses up the most texture memory when it doesn't need to.
Pseudo-code below:
func renderAllAtOnce() {
    // Create one command buffer.
    let commandBuffer = commandQueue.makeCommandBuffer()
    let blitEncoder = commandBuffer.makeBlitCommandEncoder()

    // Encode a copy for each surface.
    for surface in visibleSurfaces {
        // Make a texture from the surface.
        let surfaceTexture = makeTextureFromSurface(surface)

        // Copy from the surface-backed texture into the appropriate slice in the destination texture.
        blitEncoder.copy(surfaceTexture, to: destinationTexture, slice:...)
    }

    // Commit the encoder.
    blitEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Bind textures and issue draw calls using a render encoder.
    renderEncoder.draw(...)
}
Attempt 2: In Batches
In this implementation, I arbitrarily group the copy commands into groups of 10. This means I only ever pre-allocate up to 10 surface-backed 'sourceTextures' before committing the buffer. This seems to make the GPU a bit happier, but the value of 10 seems rather arbitrary. Is there an optimal number here one could determine based on the hardware?
func renderInBatches() {
    // Arbitrarily group surfaces into groups of 10.
    for group in visibleSurfaces(groupsOf: 10) {
        // Create a new command buffer and encoder for each group.
        let commandBuffer = commandQueue.makeCommandBuffer()
        let blitEncoder = commandBuffer.makeBlitCommandEncoder()

        // Encode only up to 10 copy commands.
        for surface in group {
            let surfaceTexture = makeTextureFromSurface(surface)
            blitEncoder.copy(surfaceTexture, to: destinationTexture, slice:...)
        }

        blitEncoder.endEncoding()
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted()
    }

    // Bind textures and issue draw calls using a render encoder.
}
Attempt 3: One at a Time
No code for this one; it's just the batch option above but with groups of 1. In effect, it creates a new command buffer and blit encoder for every surface that needs to be copied to a texture. Initially this seemed incredibly wasteful, but now I realize that command buffers and encoders are quite lightweight. After all, you create new ones on each render pass anyway.
But is doing it one at a time under-utilizing the GPU? There are no dependencies between the copy operations.
TL;DR
If you have to issue a lot of blit copy commands, or scale commands using MPS, what is the most efficient and "correct" way of doing that?
For now I'm building against macOS 11.0 and higher. The application is expected to run on any supported hardware.
You should definitely put as much work into a single command buffer and encoder as possible.
In this case, you can have a single command buffer, which you populate with image filters first, and then do all the blits in a single blit command encoder.
On another note, you can also create an MTLTexture from IOSurface, so you won't have to blit if they have the same dimensions.
https://developer.apple.com/documentation/metal/mtldevice/1433378-newtexturewithdescriptor?language=objc

Efficient algorithms for image pixel manipulations

Is there any more efficient method to access and change image pixels than the usual approach of scanning the pixel array and changing them? I have pseudocode, but I want a better method than this. I just need an algorithm; any language is fine. It looks something like this:
for i in range(0, len(pixel_array), 4):
    pixel_array[i]     = a  # a is some random value
    pixel_array[i + 1] = a
    pixel_array[i + 2] = a
    pixel_array[i + 3] = 1
I had a similar problem once, while I was moving pixels around in an SDL_Texture. There is not really any better option for you here, unless you tell us what exactly you want to do. Do you have to manipulate pixel by pixel, or can you set complete areas in one go (e.g. if you draw a line, you could use memset to set a whole range of the array to that data)? You need to check what could be done faster. Drawing a rectangle can be optimized by editing the pixels in bulk. But if you really need to change every pixel independently, then no, there isn't any faster approach besides doing it on the GPU instead of the CPU (e.g. using OpenGL).
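To illustrate the "bulk" idea for something like a filled rectangle (sketched in TypeScript since any language is fine; the flat RGBA layout and the parameter names are just assumptions for illustration):
// Fill an axis-aligned rectangle in a flat RGBA pixel buffer row by row,
// instead of visiting every pixel through the generic per-pixel loop.
function fillRect(
    pixels: Uint8ClampedArray, imageWidth: number,
    x: number, y: number, w: number, h: number,
    rgba: [number, number, number, number]
): void {
    // Build one row of the fill colour once...
    const row = new Uint8ClampedArray(w * 4);
    for (let i = 0; i < w; i++) row.set(rgba, i * 4);
    // ...then blit that whole row into the buffer once per scanline (the memset analogue).
    for (let j = 0; j < h; j++) {
        pixels.set(row, ((y + j) * imageWidth + x) * 4);
    }
}
Per-pixel work that genuinely differs for every pixel still has to visit every pixel, as noted above.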

WebGL: Framebuffers and textures with one, one-byte channel?

I'm generating blurred drop shadows in WebGL by drawing the object to be blurred onto an off-screen framebuffer/texture, then applying a few passes of a filter to it (back and forth between two off-screen framebuffers), then copying the result to the final output.
However, I'm just dropping the RGB channels, overwriting them with the desired color of the drop shadow (usually black) while maintaining the alpha channel. It seems like I could probably get better performance by just having my off-screen framebuffers be a single (alpha) channel.
Is there a way to do that, and would it actually help?
Also, is there a better way to apply multiple passes of a filter than just alternating between two frame buffers and using the previous frame buffer's bound texture as the input?
Assuming WebGL follows GLES then per the spec (Page 91):
The name of the color buffer of an application-created framebuffer object
is COLOR_ATTACHMENT0 ... Color buffers consist of R, G, B, and,
optionally, A unsigned integer values.
So you can't attach only to A, or only to any single colour channel.
Options to explore:
Use colorMask to disable writing to R, G and B. Depending on the data layout your GPU uses internally, that could effectively achieve exactly what you want, or it could have no effect whatsoever.
Is there a way you could render to the depth channel instead of to the alpha channel?
Reducing memory bandwidth is often helpful but if it's not a bottleneck then you could end up prematurely optimising.
To avoid excessive per-frame ping-ponging, you'd normally try to rework your shader so that it does the work of all the stages in one pass. Otherwise, consider whether there's any better-than-linear way to combine multiple passes: instead of knowing only how to get from stage n to stage n+1, can you go from stage n to stage 2n? Or even just to n+2?
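For what it's worth, the colorMask experiment from the first option above is a one-liner to try; a minimal TypeScript sketch, assuming the off-screen framebuffer and the blur draw call already exist elsewhere:
// Write only the alpha channel while the blur passes run, then restore the default mask.
function runAlphaOnlyPass(gl: WebGLRenderingContext, drawBlurPass: () => void): void {
    gl.colorMask(false, false, false, true); // R, G, B writes disabled; alpha still written
    drawBlurPass();                          // issue the blur draw call(s) into the bound framebuffer
    gl.colorMask(true, true, true, true);    // re-enable full colour writes for everything else
}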

Storing data for levels in a game like RISK or Total War

I'm working on a game which is a bit like the boardgame RISK, or the campaign section of the Total War series. I currently have a working implementation of the region system, but because of bad performance, the game hangs after certain commands. I'm sure it is possible to do it better.
What I want to do
I want to be able to present a map, such as a world map, and divide it up into regions (e.g. countries). I want to be able to select regions by clicking on them, send units to them, and get the adjacent regions.
What I've tried
A map is defined by 3 files:
A text file, which contains data formatted like this:
"Region Name" "Region Color" "Game-related information" ["Adjacent Region 1", "Adjacent Region 2", ...]'
An image file, where each region is separated by a black border and has its own color. So, for example, there could be two regions: one would have the RGB values 255, 0, 0 (red) and another 255, 255, 255 (white). They are separated by a black border (but this is not necessary for the algorithm to work).
Another image file, which is the actual image that is drawn to the screen. It is the "nice looking" map.
An example of such a colour map:
(All the white parts evaluate to the same region in the current implementation. Just imagine they all have different colours).
When I load these files, I first load the colour image. Then I load the text file and go through each line. I create regions with the correct settings, as I want to. There's no real performance hit here, as it's simply reading data. A bunch of Region objects is then made, and given the correct colors.
At this stage, everything works fine. I can click on regions, ask the pixel data of the colour image, and by going through all the Regions in a list I can find the one that matches the colour of that particular pixel.
Issues
However, here's where the performance hit comes in:
Issue 1: Units
Each player has a bunch of units. I want to be able to spawn these units in a region. Let's say I want to spawn a unit in the red region. I go through all the pixels in my file, and when I hit a red one, I place the unit there.
for (int i = 0; i < worldmap.size(); i++) {
    for (int j = 0; j < worldmap[i].size(); j++) {
        if (worldmap[i][j].color == unit_color) {
            // place it here
        }
    }
}
A simple glance at this pseudocode shows that this is not going to work well. Not at a reasonable pace, anyway.
Issue 2: Region colouring
Another issue is that I want to colour the regions owned by players on the "nice looking" map. Let's say player one owns three regions: Blue, Red and Green. I then go through the worldmap, find the blue, red and green pixels on the colour image, and then colour those pixels on the "nice looking" map in a transparent version of the player colour.
However, this is also a very heavy operation and it takes a few seconds.
What I want to ask
Since this is a turn-based game, it's not really that big a deal if the game slows down a bit every now and then. However, it is not to my liking that I'm writing this ugly code.
I have considered other options, such as storing each point of a region as a float, but that would be a massive strain on memory (64 bits times a 3000x1000 resolution image is a lot).
I was wondering if there are algorithms created for this, or if I should try to use more memory to relieve the processor. I've looked for other games and how they do this, but to no avail. I've yet to find some source code on this, or an article.
I have deliberately not put too much code in this question, since it's already fairly lengthy and the code has a lot of dependencies on other parts of my application. However, if it is needed to solve the problem, I will post some ASAP.
Thanks in advance!
Problem 1: go through the color map with a step size of 10 in both X and Y directions. This reduces the number of pixels considered by a factor of 100. Works if each country contains a square of at least 10x10 pixels.
Problem 2: The best solution here is to do this once, not once per player or once per region. Create a lookup table from region color to player color, iterate over all pixels of the region map, and look up the corresponding player color to apply.
It may help to reduce the region color map to RGB 332 (8 bits total). You probably don't need that many fine shades of lilac, and using one-byte colors makes the lookup table a lot easier: a plain array with 256 elements would work. Considering your maps are 3000x1000 pixels, this would also reduce the map size by 6 MB.
Another thing to consider is whether you really need a region map with 3000x1000 pixel resolution. The nice map may be that big, but the region map could be resampled at 1500x500 pixel resolution. Your borders looked thick enough (more than 2 pixels) so a 1 pixel loss of region resolution would not matter. Yet it would reduce the region map by another 2.25 MB. At 750 kB, it now probably fits in the CPU cache.
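As a concrete sketch of that single recolouring pass from Problem 2 (TypeScript here, though the idea is language-agnostic; the one-byte RGB 332 region map, the RGBA layout of the nice map, and equal dimensions for both are assumptions):
// ownerTint is the 256-entry lookup table: region byte -> RGBA player tint, or null if unowned.
function tintOwnedRegions(
    regionMap: Uint8Array,            // one RGB 332 byte per pixel
    niceMap: Uint8ClampedArray,       // RGBA "nice looking" map, 4 bytes per pixel, same dimensions
    ownerTint: ([number, number, number, number] | null)[]
): void {
    for (let i = 0; i < regionMap.length; i++) {
        const tint = ownerTint[regionMap[i]];
        if (tint === null) continue;  // nobody owns this region, leave the pixel alone
        const o = i * 4;
        const a = tint[3] / 255;      // blend the transparent player colour over the map colour
        niceMap[o]     = niceMap[o]     * (1 - a) + tint[0] * a;
        niceMap[o + 1] = niceMap[o + 1] * (1 - a) + tint[1] * a;
        niceMap[o + 2] = niceMap[o + 2] * (1 - a) + tint[2] * a;
    }
}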
What if you traced the regions (so one read through the entire data file) and stored the boundaries. For example, in Java there is a Path2D class which I have used before to store the outlines of states. In fact, if you used this method your data file wouldn't even need all the pixel data, just the boundaries of the areas. This is especially true since it seems your regions aren't dynamic, so you can simply hard-code the boundary values into the data file.
From here you can simply target a location within the boundaries (most libraries/languages with this concept support some sort of isPointInBoundary(x, y) method). You could even create your own Region class that has a boundary saved to it, along with other information (such as what game pieces are currently on it!).
Hope that helps you think about it more clearly - should be pretty nice to code too.
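If the framework at hand doesn't provide a Path2D-style class, the isPointInBoundary(x, y) test is also easy to roll by hand; a hedged TypeScript sketch of the standard ray-casting test, assuming the boundary is stored as a list of [x, y] vertices:
// Returns true if (x, y) lies inside the polygon described by `boundary`.
function isPointInBoundary(x: number, y: number, boundary: [number, number][]): boolean {
    let inside = false;
    for (let i = 0, j = boundary.length - 1; i < boundary.length; j = i++) {
        const [xi, yi] = boundary[i];
        const [xj, yj] = boundary[j];
        // Count how many polygon edges a horizontal ray from (x, y) crosses.
        if ((yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi) {
            inside = !inside;
        }
    }
    return inside; // an odd number of crossings means the point is inside
}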

How to implement batches using webgl?

I am working on a small game using WebGL. Within this game I have a kind of forest, which consists of many (100+) tree objects. Because I only have a few different tree models, I rotate and scale these models in different ways before I display them.
At the moment I loop over all trees to display them:
for (var tree of trees) {
    tree.display();
}
While the display() method of tree looks like:
display : function() { // tree
    this.treeModel.setRotation(this.rotation);
    this.treeModel.setScale(this.scale);
    this.treeModel.setPosition(this.position);
    this.treeModel.display();
}
Many tree objects share the same treeModel object, so I have to set the rotation/scale/position of the model every time before I display it. The rotation/scale/position values are different for every tree.
The display method of treeModel does all the gl stuff:
display : function() { // treeModel
    // bind texture
    // set uniforms for projection/modelview matrix based on rotation/scale/position
    // bind buffers
    // drawArrays
}
All tree models use the same shader but can use different textures.
Because a single tree model consists of only a few triangles, I want to combine all trees into one VBO and display the whole forest with one drawArrays() call.
Some assumptions to make talking about numbers easier:
There are 250 trees to display
There are 5 different tree models
Every tree model has 50 triangles
Questions I have:
At the moment I have 5 buffers that are 50 * 3 * 8 (position + normal + texCoord) * floatSize bytes large. When I want to display all trees with one VBO, I would have a buffer of 250 * 50 * 3 * 8 * floatSize bytes. I think I can't use an index buffer because I have different position values for every tree (computed from the position value of the tree model and the tree's position/scale/rotation). Is this correct, or is there still a way I can use index buffers to reduce the buffer size at least a bit? Maybe there are other ways to optimize this?
How to handle different textures of the tree models? I can bind all textures to different texture units but how can I decide within the shader which texture should be used for the fragment that is currently displayed?
When I want to add a new tree (or any other kind of object) to this buffer at runtime: Do I have to create a new buffer and copy the content? I think new values can't be added by using glMapBuffer. Is this correct?
Index (element array) buffers can only reference vertices at indices up to 65535, so you need to use drawArrays instead. It's usually not a big loss.
You can add trees to the end of the buffers using GL.bufferSubData.
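In other words, if the VBO is allocated with some spare room up front, appending a tree at runtime might look roughly like this (TypeScript sketch; the byte-offset bookkeeping is assumed to live elsewhere):
// Appends one tree's vertex data to a pre-allocated VBO and returns the new used size in bytes.
// Assumes the buffer was created with gl.bufferData(gl.ARRAY_BUFFER, capacityBytes, gl.DYNAMIC_DRAW).
function appendTree(
    gl: WebGLRenderingContext,
    vbo: WebGLBuffer,
    usedBytes: number,
    treeVertices: Float32Array
): number {
    gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
    gl.bufferSubData(gl.ARRAY_BUFFER, usedBytes, treeVertices); // overwrite the unused tail, no reallocation
    return usedBytes + treeVertices.byteLength;
}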
If your textures are a reasonable size (like 128x128 or 256x256), you can probably merge them into one big texture and handle the whole thing with the UV coords. If not, you can add another attribute saying which texture the vertex belongs to and have a condition in the vertex shader, or alternatively an array of sampler2Ds (not sure that works, never tried it). Remember that conditions in shaders are pretty slow.
If you decide to stick with your current solution, make sure to sort the trees so the ones using the same texture are rendered after each other - keeping state switching down is essential, always.
A few thoughts:
Once you plant a tree in your world, do you ever modify it? Will it animate at all? Or is it just static geometry? If it's truly static, you could always build a single buffer with several copies of each tree. As you append trees, first apply (in Javascript) that instance's world transform to the vertices. If using triangle strips, you can link trees together using degenerate polygons.
You could roll your own pseudo-instanced drawing:
Encode an instance ID in the array buffer. Just set this to the same value for all vertices that are part of the same tree instance. I seem to recall that you can't have non-floaty vertex attributes in ES GLSL (maybe that's a Chrome limitation), so you will need to bring it in as a float but use it as an int. Since it's coming in as a float, you will have to deal with the fact that it's interpolated across your triangle, and so the value will have minor fluctuations - but simply rounding to the nearest integer fixes that right up.
Use a separate texture (which I will call the data texture) to encode all the per-instance information. In your vertex shader, look at the instance ID of the current vertex and use that to compute a texture coordinate in the data texture. Pull out whatever you need to transform the current vertex, and apply it. I think this is called a "dependent texture read", which is generally frowned upon because it can cause performance issues, but it might help you batch your geometry, which can help solve performance issues. If you're interested, you'll have to try it and see what happens (a rough sketch of the shader side is below).
Hope for an extension to support real instanced drawing.
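To make the pseudo-instancing option more concrete, here's a rough sketch of what the vertex-shader side could look like (GLSL ES 1.00 held in a TypeScript string; the attribute/uniform names and the one-RGBA-texel-per-instance layout are invented for illustration, and it assumes the hardware exposes at least one vertex texture unit):
const pseudoInstancedVertexShader: string = `
attribute vec3 aPosition;
attribute float aInstanceId;        // same value for every vertex of one tree
uniform sampler2D uInstanceData;    // 1 x N data texture, one RGBA texel per instance
uniform float uInstanceCount;
uniform mat4 uViewProjection;

void main() {
    // The ID arrives as a float, so snap it back to the nearest integer as described above.
    float id = floor(aInstanceId + 0.5);
    // Dependent texture read: fetch this instance's world offset from the data texture.
    vec2 uv = vec2((id + 0.5) / uInstanceCount, 0.5);
    vec3 offset = texture2D(uInstanceData, uv).xyz;
    gl_Position = uViewProjection * vec4(aPosition + offset, 1.0);
}`;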
Your current approach isn't so bad. I'd say: Stick with it until you hit some wall.
50 triangles is already a reasonable batch size for a single drawElements/drawArrays call. It's not optimal, but also not that bad. So for every tree, change the parameters like location, texture and maybe shape through uniforms, then do a draw call for each tree. A total of 250 drawElements calls isn't so bad either.
So I'd use one single VBO that contains all the used tree geometry variants. I'd actually split up the trees into building blocks, so that I could recombine them for added variety. And for each tree set appropriate offsets into the VBO before calling drawArrays or drawElements.
Also don't forget that you can do a very cheap field of view culling of each tree.
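That per-tree test can be as cheap as a single dot product against the camera's forward vector; a small TypeScript sketch with made-up names:
interface Vec3 { x: number; y: number; z: number; }

// Returns true if the tree is roughly inside the camera's view cone.
// `forward` must be normalised; `cosHalfFov` is the cosine of half the field-of-view angle.
function isTreeVisible(camPos: Vec3, forward: Vec3, cosHalfFov: number, treePos: Vec3): boolean {
    const dx = treePos.x - camPos.x;
    const dy = treePos.y - camPos.y;
    const dz = treePos.z - camPos.z;
    const dist = Math.sqrt(dx * dx + dy * dy + dz * dz);
    if (dist === 0) return true; // camera is standing on the tree
    const cosAngle = (dx * forward.x + dy * forward.y + dz * forward.z) / dist;
    return cosAngle >= cosHalfFov; // inside the cone if the angle to the tree is small enough
}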

Resources