What could prevent OpenGL glDrawPixels from working on some video cards? - macos

The following code writes no data to the back buffer on Intel integrated video cards, for example, on a MacBook. On ATI cards, such as in the iMac, it draws to the back buffer as expected. The width and height are correct (an 800x600 buffer) and m_PixelBuffer is correctly filled with 0xAA00AA00.
My best guess so far is that something is amiss with glWindowPos needing to be set. I do not currently set it (or the raster position), and when I query GL_CURRENT_RASTER_POSITION I notice that the default on the ATI card is (0,0,0,0) while on the Intel it's (0,0,0,1). When I set the raster position on the ATI card to (0,0,0,1) I get the same result as on the Intel card: nothing drawn to the back buffer. Is there some transform state I'm missing? This is a 2D application, so the view transform is a very simple glOrtho.
glDrawPixels(GetBufferWidth(), GetBufferHeight(), GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, m_PixelBuffer);
If there's any more info I can provide, please ask. I'm pretty much an OpenGL and Mac newb, so I don't know if I'm providing enough information.

I've always had problems with OpenGL implementations from Intel, though I'm not sure that's your problem this time. I think you're running into some byte-order issues. Give this a read and feel free to experiment with different constants for packing and color order.
http://developer.apple.com/documentation/MacOSX/Conceptual/universal_binary/universal_binary_tips/chapter_5_section_25.html
I know it's an OS X guide, but you can probably find similar OpenGL articles on other sites for other platforms. The advice should still be applicable.
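For instance, here is a hedged sketch of the kind of experimentation meant above (GetBufferWidth(), GetBufferHeight() and m_PixelBuffer come from the question; none of these format/type pairs is a known fix, they just isolate byte-order behavior):

// GL_UNSIGNED_INT_8_8_8_8_REV reads each pixel as one 32-bit word in host
// byte order, while GL_UNSIGNED_BYTE reads the components byte by byte.
glDrawPixels(GetBufferWidth(), GetBufferHeight(), GL_BGRA, GL_UNSIGNED_INT_8_8_8_8, m_PixelBuffer);
glDrawPixels(GetBufferWidth(), GetBufferHeight(), GL_RGBA, GL_UNSIGNED_BYTE, m_PixelBuffer);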

I've always had problems with OpenGL
implementations from Intel
This is kind of what I'm worried about, but I have a hard time believing they'd screw up something as basic as glDrawPixels, and also, since I can "duplicate" the problem by changing the raster position vector, it makes me think it's my fault and I'm missing something basic.
I think you're running into some
byte-order issues
That was my first inclination, and I've tried packing differently, with no result. I also tried packing the buffer with values that would present a usable alpha if swizzled, with no result. This is why I'm barking up the raster pos tree, but I'm still honestly not 100% sure. Note that I'm targeting only Intel Macs if that makes a difference.
Thanks for the link, it was a good read, and good to tuck away for future reference. I'd upmod but I can't until I get 3 more rep points :)

It's highly unlikely that a basic function like glDrawPixels is simply broken. Have you tried some really simple settings, like GL_RGB or GL_RGBA for the format and GL_UNSIGNED_BYTE or GL_FLOAT for the type? If not, can you share with us the smallest possible program that replicates your problem?
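For instance, a minimal hedged test along those lines (assumes the question's glOrtho setup so that (0,0) lands inside the viewport; the 2x2 test pattern is purely illustrative):

// Four distinct opaque pixels; if any of them show up, the basic path works.
GLubyte test[16] = {
    255, 0, 0, 255,     0, 255, 0, 255,
    0, 0, 255, 255,     255, 255, 255, 255
};
glRasterPos2i(0, 0);
glDrawPixels(2, 2, GL_RGBA, GL_UNSIGNED_BYTE, test);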

The default raster position should be (0,0,0,1), but you can reset it to make sure.
Just before calling glDrawPixels(), try
GLboolean valid;
glGetBooleanv(GL_CURRENT_RASTER_POSITION_VALID, &valid);
This should tell you if the current raster position is valid. If it is, then this is not your problem.
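And if it does turn out to be invalid, you can reset it explicitly; a quick sketch (note glWindowPos2i requires OpenGL 1.4, but it takes window coordinates and bypasses the transform pipeline entirely):

glWindowPos2i(0, 0);   // window coordinates; never clipped, always valid
// or, going through the current modelview/projection transforms:
glRasterPos2i(0, 0);   // must transform to a point inside the viewport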

Related

Vulkan/OpenGL subpasses that fetch more than a single fragment

So, Vulkan introduced subpasses, and OpenGL implements similar behaviour with ARM_framebuffer_fetch.
In the past, I have used framebuffer_fetch successfully for tonemapping post-effect shaders.
Back then the limitation was that one could only read the contents of the framebuffer at the location of the currently rendered fragment.
Now, what I wonder is whether there is by now any way in Vulkan (or even OpenGL ES) to read from multiple locations (for example to implement a blur kernel) without the tiled hardware having to store the intermediate result to RAM and load it back.
In theory I guess it should be possible: the first pass would just need to render a slightly larger area than the blur subpass, based on the kernel size (so, for example, if the kernel size was 4 pixels then the resolved tile would need to be 4 pixels smaller than the in-tile buffer sizes), and some pixels would have to be rendered redundantly (on the overlaps of tiles).
Now, is there a way to do that?
I seem to recall having seen some Vulkan instruction related to subpasses that would allow defining the support size (which sounded like what I'm looking for now), but I can't recall where I saw it.
So my questions:
With Vulkan on a mobile tiled renderer architecture, is it possible to forward-render some geometry and then render a full-screen blur over it, all within a single in-tile pass (without the hardware having to store the result of the intermediate pass to RAM first and then load the texture from RAM when blurring)? If so, how?
If the answer to 1 is yes, can it also be done in OpenGL ES?
Short answer: no. Vulkan subpasses still have the 1:1 fragment-to-pixel association requirement: a subpass input attachment can only be read at the current fragment's own location (subpassLoad takes no coordinate argument), so a kernel that samples neighboring pixels cannot stay within a single in-tile pass.

Store resource in Direct2D on GPU

Is there some way to store a "scene" in Direct2D on the GPU?
I'm looking for something like ID2D1Mesh (i.e. storing the resource in vector format, not as a bitmap) but where I can configure if the mesh/scene/resource should be rendered with anti-aliasing or not.
Rick is correct in that you can apply antialiasing at two different levels. Either through Direct2D or through Direct3D. You can do both but that’s pointless and would only waste resources and lead to poor results. Direct2D antialiasing is suitable if you want per-primitive geometry-aware antialiasing. Direct3D antialiasing is useful if you want to sacrifice a bit of quality for better overall performance in some scenarios.
The Direct2D 1.1 command list literally stores/records a list of drawing commands that can be played back against different targets. This may be what you’re after as it’s not rasterized. Conceptually it’s like storing a vector image in device memory. Command lists are somewhat limited in that you cannot modify the command list once created and resources being drawn may also not be changed, but it’s still quite handy nonetheless.
There is a way to get antialiasing with ID2D1Mesh, but it's non-trivial. You have to create the Direct3D device yourself and then use ID2D1Factory::CreateDxgiSurfaceRenderTarget(). This allows you to configure the multisampling/antialiasing settings of the D3D device directly, and then meshes play along just fine (in fact I think you'd just always tell Direct2D to use aliased rendering). I haven't done this myself, but there is a MSDN sample that shows how to do this. It's not for the faint of heart ... and in order to do software rendering you have to initialize a WARP device. It does work, however.
Also, in Direct2D 1.1 (Windows 8, or Windows 7 + Platform Update), you can use the ID2D1CommandList interface for record/playback stuff. I'm not sure if that's implemented as "compile to GPU" (ala mesh), or if it's just macros (record/playback of commands).
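For what it's worth, a hedged sketch of the record/playback flow (assumes an existing ID2D1DeviceContext named dc plus a brush and target bitmap; all names are illustrative and error handling is omitted):

#include <d2d1_1.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID2D1CommandList> cmdList;
dc->CreateCommandList(&cmdList);
dc->SetTarget(cmdList.Get());   // record into the list instead of a surface
dc->BeginDraw();
dc->DrawRectangle(D2D1::RectF(10, 10, 100, 100), brush.Get());
dc->EndDraw();
cmdList->Close();               // after Close() the list is immutable

// Later, play it back against any compatible target:
dc->SetTarget(targetBitmap.Get());
dc->BeginDraw();
dc->DrawImage(cmdList.Get());   // a command list is an ID2D1Image
dc->EndDraw();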
In Windows 8.1, Direct2D introduced geometry realizations, which lets you store a tessellated version of the geometry and later render it back with or without anti-aliasing, just like you asked. These are highly recommended over the use of meshes. Command lists, while convenient, don't have the same caching abilities as creating and storing the geometry realizations yourself.
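A hedged sketch of the geometry-realization flow (requires ID2D1DeviceContext1 from d2d1_2.h, i.e. Windows 8.1; dc1, geometry, and brush are assumed to already exist):

#include <d2d1_2.h>
#include <d2d1_2helper.h>

ComPtr<ID2D1GeometryRealization> realization;
FLOAT tolerance = D2D1::ComputeFlatteningTolerance(D2D1::Matrix3x2F::Identity());
// The device context's antialias mode at creation time determines whether
// the realization's contents are aliased or anti-aliased.
dc1->CreateFilledGeometryRealization(geometry.Get(), tolerance, &realization);

// Playing back the cached tessellation is much cheaper than re-tessellating:
dc1->DrawGeometryRealization(realization.Get(), brush.Get());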

j2me: what is the effect of 'g.clipRect()' on speed?

I'm developing in J2ME and using a Canvas to draw some images.
Now, my question is: what is the difference in drawing speed between the sample codes below?
drawing after clipping the area rectangle:
g.clipRect(x, y, myImage.getWidth(), myImage.getHeight());
g.drawImage(myImage, x , y, Graphics.TOP | Graphics.LEFT);
g.setClip(0, 0, screenWidth, screenHeight);
drawing without clip:
g.drawImage(myImage, x, y, Graphics.TOP | Graphics.LEFT);
Is the first one faster? I'm drawing to the screen a lot.
Well, the direct answer to your question would be Mu, I'm afraid, because you appear to be approaching the issue from the wrong direction.
The thing is, the clipping API is not intended for performance considerations / optimizations. You can find full coverage of its purpose in the API documentation (available online); it does not state anything related to performance impact:
Clipping
The clip is the set of pixels in the destination of the Graphics object that may be modified by graphics rendering operations.
There is a single clip per Graphics object. The only pixels modified by graphics operations are those that lie within the clip. Pixels outside the clip are not modified by any graphics operations.
Operations are provided for intersecting the current clip with a given rectangle and for setting the current clip outright...
Attempting to use the clipping API for imaginary performance gains will make your code a nightmare to understand for future maintainers. Note this future maintainer may be you yourself, just a few weeks / months / years later - I for one have had my nose broken on my own code written some time ago without clearly understandable intent - trust me, it hurts the same as messing with poor code written by anyone else.
Don't get me wrong - there is a chance that clipping may have a substantial performance impact in some particular case on a specific device - why not, everything is possible given the variety of MIDP implementations. Know what? There is even a chance of it having the opposite impact on some other device, why not.
If (if) that happens, and if (if) you somehow get a clear, solid, tested and proven justification of a specific performance impact - then (then) go ahead and implement whatever tricks are necessary to reach the required performance, no matter how perverse they may be (BTDTGTTS). Until then, though, drop any baseless assumptions that just may come to your mind.
Until then... Just. Drop. It.
Developers love to optimize code and with good reason. It is so satisfying and fun. But knowing when to optimize is far more important. Unfortunately, developers generally have horrible intuition about where the performance problems in an application will actually be... Most performance tuning reminds me of the old joke about the guy who's looking for his keys in the kitchen even though he lost them in the street, because the light's better in the kitchen... (Brian Goetz)
This will almost certainly vary between platforms, and will depend on how much you're actually drawing.
I suggest you measure performance yourself by logging the number of paints per second, or the average duration of a paint method, and painting this on screen.
Drawing without clip should be faster on any platform for the simple reason that you are not calling two clip methods. But I might ask, why are you using clip to begin with?
You usually use clipping when you have an animation sprite sheet or icon variations in the same file. The alternative is to create a separate file for each frame/icon; that will increase your jar file size and use more heap space to hold these images in memory, but they will be drawn faster.

How to work with pixels using Direct2D

Could somebody provide an example of an efficient way to work with pixels using Direct2D?
For example, how can I swap all green pixels (RGB = 0x00FF00) with red pixels (RGB = 0xFF0000) on a render target? What is the standard approach? Is it possible to use ID2D1HwndRenderTarget for that? I assume it uses some kind of hardware acceleration. Should I create a different object for direct pixel manipulation?
Using DirectDraw I would use BltFast method on the IDirectDrawSurface7 with logical operation. Is there something similar with Direct2D?
Another task is to generate complex images dynamically, where each point's location and color is the result of a mathematical function. For the sake of an example, let's simplify everything and draw Y = X ^ 2. How would you do that with Direct2D? Ultimately I'm going to need to draw complex functions, but a simple example for Y = X ^ 2 would be a great start.
First, it helps to think of ID2D1Bitmap as a "device bitmap". It may or may not live in local, CPU-addressable memory, and it doesn't give you any convenient (or at least fast) way to read/write the pixels from the CPU side of the bus. So that is probably the wrong angle of approach.
What I think you want is a regular WIC bitmap, IWICBitmap, which you can create with IWICImagingFactory::CreateBitmap(). From there you can call Lock() to get at the buffer, and then read/write using pointers and do whatever you want. Then, when you need to draw it on-screen with Direct2D, use ID2D1RenderTarget::CreateBitmap() to create a new device bitmap, or ID2D1Bitmap::CopyFromMemory() to update an existing device bitmap. You can also render into an IWICBitmap by making use of ID2D1Factory::CreateWicBitmapRenderTarget() (not hardware accelerated).
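A hedged sketch of that Lock() round trip (error handling omitted; the 800x600 size and the pixel format are placeholders):

#include <wincodec.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<IWICBitmap> bitmap;
wicFactory->CreateBitmap(800, 600, GUID_WICPixelFormat32bppPBGRA,
                         WICBitmapCacheOnDemand, &bitmap);

WICRect rect = { 0, 0, 800, 600 };
ComPtr<IWICBitmapLock> lock;
bitmap->Lock(&rect, WICBitmapLockRead | WICBitmapLockWrite, &lock);

UINT size = 0, stride = 0;
BYTE* data = nullptr;
lock->GetStride(&stride);
lock->GetDataPointer(&size, &data);
// data now points at the pixels; rows are 'stride' bytes apart.
// ... read/write pixels here ...
lock.Reset();   // release the lock before using the bitmap elsewhere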
You will not get hardware acceleration for these types of operations. The updated Direct2D in Win8 (should also be available for Win7 eventually) has some spiffy stuff for this but it's rather complex looking.
Rick's answer talks about the methods you can use if you don't care about losing hardware acceleration. I'm focusing on how to accomplish this using a substantial amount of GPU acceleration.
In order to keep your rendering hardware accelerated and to get the best performance, you are going to want to switch from ID2D1HwndRenderTarget to the newer ID2D1Device and ID2D1DeviceContext interfaces. It honestly doesn't add much more logic to your code and the performance benefits are substantial. It also works on Windows 7 with the Platform Update. To summarize the process (see the sketch after this list):
Create a DXGI factory when you create your D2D factory.
Create a D3D11 device and a D2D device to match.
Create a swap chain using your DXGI factory and the D3D device.
Ask the swap chain for its back buffer and wrap it in a D2D bitmap.
Render like before, between calls to BeginDraw() and EndDraw(). Remember to unbind the back buffer and destroy the D2D bitmap wrapping it!
Call Present() on the swap chain to see the results.
Repeat from 4.
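Here is a hedged sketch of steps 1 through 7 (a compressed outline, assuming an existing HWND named hwnd; all names are illustrative and error handling is omitted):

#include <d3d11.h>
#include <dxgi1_2.h>
#include <d2d1_1.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// 1-2: a D3D11 device with BGRA support, and a matching D2D device/context.
ComPtr<ID3D11Device> d3d;
D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
                  D3D11_CREATE_DEVICE_BGRA_SUPPORT, nullptr, 0,
                  D3D11_SDK_VERSION, &d3d, nullptr, nullptr);
ComPtr<IDXGIDevice> dxgiDevice;
d3d.As(&dxgiDevice);
ComPtr<ID2D1Device> d2dDevice;
D2D1CreateDevice(dxgiDevice.Get(), nullptr, &d2dDevice);
ComPtr<ID2D1DeviceContext> dc;
d2dDevice->CreateDeviceContext(D2D1_DEVICE_CONTEXT_OPTIONS_NONE, &dc);

// 3: a swap chain for the window, via the DXGI factory.
ComPtr<IDXGIAdapter> adapter;
dxgiDevice->GetAdapter(&adapter);
ComPtr<IDXGIFactory2> dxgiFactory;
adapter->GetParent(IID_PPV_ARGS(&dxgiFactory));
DXGI_SWAP_CHAIN_DESC1 scDesc = {};
scDesc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
scDesc.SampleDesc.Count = 1;
scDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
scDesc.BufferCount = 2;
ComPtr<IDXGISwapChain1> swapChain;
dxgiFactory->CreateSwapChainForHwnd(d3d.Get(), hwnd, &scDesc,
                                    nullptr, nullptr, &swapChain);

// 4: wrap the back buffer in a D2D bitmap and make it the target.
ComPtr<IDXGISurface> backBuffer;
swapChain->GetBuffer(0, IID_PPV_ARGS(&backBuffer));
D2D1_BITMAP_PROPERTIES1 props = D2D1::BitmapProperties1(
    D2D1_BITMAP_OPTIONS_TARGET | D2D1_BITMAP_OPTIONS_CANNOT_DRAW,
    D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_IGNORE));
ComPtr<ID2D1Bitmap1> target;
dc->CreateBitmapFromDxgiSurface(backBuffer.Get(), &props, &target);
dc->SetTarget(target.Get());

// 5-7: draw between BeginDraw()/EndDraw(), then present.
dc->BeginDraw();
dc->Clear(D2D1::ColorF(D2D1::ColorF::CornflowerBlue));
dc->EndDraw();
swapChain->Present(1, 0);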
Once you've done that, you have unlocked a number of possible solutions. Probably the simplest and most performant way to solve your exact problem (swapping color channels) is to use the color matrix effect, as one of the other answers mentioned. It's important to recognize that you need the newer ID2D1DeviceContext interface rather than ID2D1HwndRenderTarget to get this, however. There are lots of other effects that can do more complicated operations if you so choose. Here are some of the most useful ones for simple pixel manipulation:
Color matrix effect
Arithmetic operation
Blend operation
For generally solving the problem of manipulating the pixels directly without dropping hardware acceleration or doing tons of copying, there are two options. The first is to write a pixel shader and wrap it in a completely custom D2D effect. It's more work than just getting the pixel buffer on the CPU and doing old-fashioned bit mashing, but doing it all on the GPU is substantially faster. The D2D effects framework also makes it super simple to reuse your effect for other purposes, combine it with other effects, etc.
For those times when you absolutely have to do CPU pixel manipulation but still want a substantial degree of acceleration, you can manage your own mappable D3D11 textures. For example, you can use staging textures if you want to asynchronously manipulate your texture resources from the CPU. There is another answer that goes into more detail. See ID3D11Texture2D for more information.
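A hedged sketch of the staging-texture route (illustrative only; gpuTex is assumed to be an existing default-usage ID3D11Texture2D, with device and context already created):

// Create a CPU-accessible staging twin of the GPU texture.
D3D11_TEXTURE2D_DESC desc = {};
gpuTex->GetDesc(&desc);
desc.Usage = D3D11_USAGE_STAGING;
desc.BindFlags = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ | D3D11_CPU_ACCESS_WRITE;
desc.MiscFlags = 0;
ComPtr<ID3D11Texture2D> staging;
device->CreateTexture2D(&desc, nullptr, &staging);

context->CopyResource(staging.Get(), gpuTex.Get());    // GPU -> staging
D3D11_MAPPED_SUBRESOURCE mapped = {};
context->Map(staging.Get(), 0, D3D11_MAP_READ_WRITE, 0, &mapped);
// mapped.pData / mapped.RowPitch give the CPU direct access to the pixels.
context->Unmap(staging.Get(), 0);
context->CopyResource(gpuTex.Get(), staging.Get());    // staging -> GPU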
The specific issue of swapping all green pixels with red pixels can be addressed via ID2D1Effect as of Windows 8 and Platform Update for Windows 7.
More specifically, the color matrix effect.
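A hedged sketch of that approach (assumes an ID2D1DeviceContext dc and a source bitmap; note a color matrix permutes channels for every pixel, which maps pure green to pure red and vice versa):

#include <d2d1effects.h>

ComPtr<ID2D1Effect> colorMatrix;
dc->CreateEffect(CLSID_D2D1ColorMatrix, &colorMatrix);
// Rows correspond to the R, G, B, A inputs plus an offset row;
// columns to the R, G, B, A outputs. This matrix swaps R and G.
D2D1_MATRIX_5X4_F swapRG = {
    0, 1, 0, 0,   // red in   -> green out
    1, 0, 0, 0,   // green in -> red out
    0, 0, 1, 0,   // blue unchanged
    0, 0, 0, 1,   // alpha unchanged
    0, 0, 0, 0    // no offset
};
colorMatrix->SetValue(D2D1_COLORMATRIX_PROP_COLOR_MATRIX, swapRG);
colorMatrix->SetInput(0, sourceBitmap.Get());
dc->BeginDraw();
dc->DrawImage(colorMatrix.Get());
dc->EndDraw();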

Tips for optimizing performance of -webkit-transform?

I'm using -webkit-transform: translate3d and a few other properties pretty extensively on a mobile app for iPhone because they're hardware accelerated. With about 98% of the features in place, performance is great. I'm aware of not trying to do too much at once.
I'm successfully simulating swiping in a very excellent, native way. What I've noticed now is that when I add the last 2% of features, I'm seeing some image redrawing issues in the section that is being animated while swiping. After you swipe through all 4 images and they load, performance is perfectly smooth again. However, when this section is hidden and shown, the same thing happens.
What I hypothesize is happening is there's an internal buffer being hit and it has to reload each time.
So with that background, the general question is: what kinds of performance optimizations have other developers made for -webkit-transform? I'm not necessarily asking about my particular situation, but rather about the wider range of optimizations people have figured out for their individual needs.
Hopefully if this question gets some answers, it can be a resource for other folks asking the same question down the road.
It's a fairly well known thing, but making sure the element you transform uses 3D transforms where possible helps a lot on devices that hardware accelerate transforms (iOS at the moment).
The easiest way to do that is to add:
transform: translate3d(0,0,0);
with the appropriate vendor prefixes to the CSS of the element in question, then just animate it as normal, using either 2D or 3D transforms.
It might sound a bit weird, but I had a similar issue and I solved it by using -webkit-perspective: 1000.
I don't know why this helps the transitions, but in my case it did.
