I use ffmpeg functions to decode h264 frames and display them in a window on the Windows platform. The approach I use is as below (from FFMPEG Frame to DirectX Surface):
AVFrame *frame = avcodec_alloc_frame();   // the frame must be allocated before decoding into it
avcodec_decode_video(_ffcontext, frame, etc...);
lockYourSurface();
uint8_t *buf = getPointerToYourSurfacePixels();
// Create an AVPicture structure which contains a pointer to the RGB surface.
AVPicture pict;
memset(&pict, 0, sizeof(pict));
avpicture_fill(&pict, buf, PIX_FMT_RGB32,
               _ffcontext->width, _ffcontext->height);
// Convert the image into RGB and copy to the surface.
img_convert(&pict, PIX_FMT_RGB32, (AVPicture *)frame,
            _ffcontext->pix_fmt, _ffcontext->width, _ffcontext->height);
unlockYourSurface();
In the code, I use sws_scale instead of img_convert.
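Concretely, the conversion part looks roughly like this (a sketch only; it assumes the surface pitch equals width*4, and relies on the fact that PIX_FMT_RGB32 maps to BGRA byte order on little-endian machines):

extern "C" {
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>
}

// buf is the locked surface pointer from getPointerToYourSurfacePixels().
AVPicture pict;
memset(&pict, 0, sizeof(pict));
avpicture_fill(&pict, buf, PIX_FMT_RGB32, _ffcontext->width, _ffcontext->height);

SwsContext *sws = sws_getContext(
    _ffcontext->width, _ffcontext->height, _ffcontext->pix_fmt,   // source
    _ffcontext->width, _ffcontext->height, PIX_FMT_RGB32,         // destination
    SWS_BILINEAR, NULL, NULL, NULL);

sws_scale(sws, frame->data, frame->linesize, 0, _ffcontext->height,
          pict.data, pict.linesize);
sws_freeContext(sws);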
When I pass the surface data pointer to sws_scale (in fact, in avpicture_fill), it seems that the data pointer is actually in RAM, not in GPU memory, and when I display the surface, the data appears to be moved to the GPU and then displayed. As far as I know, CPU utilization is high when data is copied between RAM and GPU memory.
How can I tell ffmpeg to render directly to a surface in GPU memory (not a data pointer in RAM)?
I have found the answer to this problem. To avoid extra CPU usage when displaying frames with ffmpeg, we must not convert the frame to RGB. Almost all video files decode to YUV (this is the native image format inside the video file). The point here is that the GPU can display YUV data directly, without any need to convert it to RGB. As far as I know, with the usual ffmpeg builds, decoded data always ends up in RAM. For a frame, the amount of YUV data is much smaller than the RGB32 equivalent of the same frame (for 4:2:0, 12 bits per pixel versus 32). So if we move the YUV data to the GPU instead of converting it to RGB and then moving that, we speed things up in two ways:
No conversion to RGB
The amount of data moved between RAM and GPU is decreased
So finally the overall CPU usage is decreased.
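For illustration, here is a rough Direct3D 9 sketch of the idea (not the exact code; it assumes the driver exposes the YV12 FOURCC for offscreen plain surfaces and uses half the luma pitch for the chroma planes, which is the usual YV12 layout):

#include <d3d9.h>
#include <cstring>
extern "C" {
#include <libavcodec/avcodec.h>
}

// Copy a decoded planar YUV 4:2:0 AVFrame into a D3D9 YV12 surface; the GPU
// then does the YUV-to-RGB conversion when the surface is StretchRect'ed to
// the back buffer.
void CopyFrameToYV12Surface(AVFrame *frame, IDirect3DSurface9 *surface,
                            int width, int height)
{
    D3DLOCKED_RECT lr;
    if (FAILED(surface->LockRect(&lr, NULL, 0)))
        return;

    uint8_t *dstY = (uint8_t *)lr.pBits;
    // Y plane, row by row (surface pitch and frame linesize may differ).
    for (int y = 0; y < height; ++y)
        memcpy(dstY + y * lr.Pitch, frame->data[0] + y * frame->linesize[0], width);

    // YV12 stores V before U, while FFmpeg's YUV420P stores U (data[1]) before V (data[2]).
    uint8_t *dstV = dstY + lr.Pitch * height;
    uint8_t *dstU = dstV + (lr.Pitch / 2) * (height / 2);
    for (int y = 0; y < height / 2; ++y) {
        memcpy(dstV + y * (lr.Pitch / 2), frame->data[2] + y * frame->linesize[2], width / 2);
        memcpy(dstU + y * (lr.Pitch / 2), frame->data[1] + y * frame->linesize[1], width / 2);
    }

    surface->UnlockRect();
    // Afterwards: device->StretchRect(surface, NULL, backBuffer, NULL, D3DTEXF_LINEAR) and Present().
}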
I'm trying to make a simple video-player-like program with Direct2D and WIC bitmaps.
It requires fast and CPU-economical drawing (with stretching) of YUV pixel-format frame data.
I've already tested with GDI. I hope switching to Direct2D gives at least a 10x performance gain (smaller CPU overhead).
What I'll be doing is basically as below:
Create an empty WIC bitmap A (for drawing canvas)
Create another WIC bitmap B with YUV frame data (format conversion)
Draw bitmap B onto A, then draw A to the D2D render target
For steps 1 and 2, I must select a pixel format.
WIC Native Pixel Formats
There is an MSDN page that recommends WICPixelFormat32bppPBGRA:
http://msdn.microsoft.com/en-us/library/windows/desktop/hh780393(v=vs.85).aspx
What's the difference between WICPixelFormat32bppPBGRA and WICPixelFormat32bppBGRA? (The former has an additional P.)
If WICPixelFormat32bppPBGRA is the way to go, is that always the case, regardless of hardware and/or configuration?
What is the most effective pixel format for WIC bitmap processing actually?
Unfortunately, using Direct2D 1.1 or lower, you cannot use a pixel format other than DXGI_FORMAT_B8G8R8A8_UNORM, which is equivalent to WIC's WICPixelFormat32bppPBGRA (the "P" applies if you use the D2D1_ALPHA_MODE_PREMULTIPLIED alpha mode in D2D).
If your target OS is Windows 8, then you can use Direct2D's newer features. As far as I remember there is some kind of YUV support for D2D bitmaps. (Edit: No, there is not. RGB32 remains the only pixel format supported, along with some alpha-only formats.)
What is the most effective pixel format for WIC bitmap processing actually?
I'm not sure how to measure pixel-format effectiveness, but if you want to use hardware acceleration you should draw using D2D instead of WIC, and use WIC only for colorspace conversion. (GDI is also hardware accelerated, btw.)
What's the difference between WICPixelFormat32bppPBGRA and WICPixelFormat32bppBGRA? (The former has an additional P.)
P means that the RGB components are premultiplied by the alpha value. (see here)
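In other words, each color channel is stored already multiplied by the alpha value, roughly like this hypothetical helper (not a WIC function):

#include <cstdint>

// Straight BGRA -> premultiplied BGRA (what the extra "P" implies).
inline uint32_t Premultiply(uint32_t bgra)
{
    uint32_t a = (bgra >> 24) & 0xFF;
    uint32_t r = ((bgra >> 16) & 0xFF) * a / 255;
    uint32_t g = ((bgra >>  8) & 0xFF) * a / 255;
    uint32_t b = ( bgra        & 0xFF) * a / 255;
    return (a << 24) | (r << 16) | (g << 8) | b;
}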
What I'll be doing is basically as below:
Create an empty WIC bitmap A (for drawing canvas)
Create another WIC bitmap B with YUV frame data (format conversion)
Draw bitmap B onto A, then draw A to the D2D render target
If you are targeting performance, you should minimize bitmap copy operations, and you should avoid using a WIC bitmap render target, because it uses software rendering. If your player only renders to a window, you can use an HWND render target, or a DeviceContext with a swap chain (depending on the Direct2D version you use).
Instead of rendering frame B onto frame A, you can use the software pixel-format conversion features of WIC (e.g. IWICFormatConverter). Another way would be to write (or find) a custom conversion routine using SIMD operations, or to use shaders to convert the format (colorspace) on the GPU side, but the latter two require advanced knowledge.
Once the frame is converted, you can lock the pixels to get the pixel data and copy it directly into a D2D bitmap (ID2D1Bitmap::CopyFromMemory()), given that you already have a D2D bitmap created.
The last step would be to render the bitmap to the render target; you can use transformation matrices to achieve stretching.
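To make the pipeline concrete, here is a hedged sketch (error handling omitted; wicFactory, sourceWicBitmap, d2dBitmap, renderTarget, width, height, targetWidth and targetHeight are placeholders you would already have, and d2dBitmap is assumed to be a 32bppPBGRA bitmap of the same size as the frame):

#include <wincodec.h>
#include <d2d1.h>
#include <vector>

// 1) Convert the source frame to premultiplied BGRA with WIC.
IWICFormatConverter *converter = NULL;
wicFactory->CreateFormatConverter(&converter);
converter->Initialize(sourceWicBitmap, GUID_WICPixelFormat32bppPBGRA,
                      WICBitmapDitherTypeNone, NULL, 0.0,
                      WICBitmapPaletteTypeCustom);

// 2) Pull the converted pixels into a buffer...
UINT stride = width * 4;
std::vector<BYTE> pixels(stride * height);
converter->CopyPixels(NULL, stride, (UINT)pixels.size(), pixels.data());
converter->Release();

// 3) ...push them into the pre-created D2D bitmap...
D2D1_RECT_U dstRect = D2D1::RectU(0, 0, width, height);
d2dBitmap->CopyFromMemory(&dstRect, pixels.data(), stride);

// 4) ...and draw it stretched; the destination rectangle (or a scale
//    transform on the render target) takes care of the stretching.
D2D1_RECT_F dest = D2D1::RectF(0.0f, 0.0f, (FLOAT)targetWidth, (FLOAT)targetHeight);
renderTarget->BeginDraw();
renderTarget->DrawBitmap(d2dBitmap, &dest);
renderTarget->EndDraw();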
What is the difference between ID2D1Bitmap and IWICBitmap?
I have raw memory data and I want to create a bitmap.
A WIC bitmap represents an image in system memory that can be in a wide range of formats (JPEG, PNG, BMP, etc.). A D2D bitmap represents an image in GPU memory that is in one of a handful of hardware-accelerated formats.
Assuming you want to draw the bitmap to the screen using D2D, and your raw memory data is in a format compatible with D2D, you should use ID2D1RenderTarget::CreateBitmap directly. If it is not in a compatible format (e.g. it is a pointer to the raw data of a .png file), you will need to load it into an IWICBitmap and then use ID2D1RenderTarget::CreateBitmapFromWicBitmap.
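For the first case, a minimal sketch (assuming the raw memory is 32bpp premultiplied BGRA, and that renderTarget, rawPixels, width, height and stride already exist):

#include <d2d1.h>

// Wrap raw BGRA pixels in an ID2D1Bitmap so the render target can draw it.
D2D1_BITMAP_PROPERTIES props = D2D1::BitmapProperties(
    D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_PREMULTIPLIED));

ID2D1Bitmap *bitmap = NULL;
HRESULT hr = renderTarget->CreateBitmap(
    D2D1::SizeU(width, height),
    rawPixels,      // pointer to the raw memory (top-down BGRA rows)
    stride,         // bytes per row
    &props,
    &bitmap);
// On success, draw with renderTarget->DrawBitmap(bitmap, ...).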
I'm writing a video player where my code decodes the video to raw YCbCr frames. What would be the fastest way to output these through the Qt framework? I want to avoid copying data around too much, as the images are in HD format.
I am afraid that software color conversion into a QImage would be slow, and that the QImage would then be copied again when drawing into the GUI.
I have had a look at QAbstractVideoSurface and even have running code, but cannot grasp how this is faster, since, as in the VideoWidget example (http://idlebox.net/2010/apidocs/qt-everywhere-opensource-4.7.0.zip/multimedia-videowidget.html), rendering is still done by calling QPainter::drawImage with a QImage, which has to be in RGB.
The preferred solution seems to me to be direct access to a hardware surface into which I could decode the YCbCr, or at least into which I could do the RGB conversion (with libswscale) directly. But I cannot see how I could do this (without using OpenGL, which would give me free scaling too, though).
One common solution is to use a QGLWidget with texture mapping. The application allocates a texture buffer on the first frame, then calls a texture update for the remaining frames. These are pure GL calls, since Qt does not support texture manipulation yet, but QGLWidget can be used as a container.
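A rough sketch of that pattern, assuming the frame has already been converted to RGB24 (e.g. with libswscale); the class and member names here are just placeholders:

#include <QGLWidget>

// QGLWidget used purely as a GL container: the texture is created on the
// first frame and only updated on the following ones.
class VideoGLWidget : public QGLWidget {
public:
    VideoGLWidget(QWidget *parent = 0)
        : QGLWidget(parent), m_data(0), m_tex(0), m_w(0), m_h(0), m_allocated(false) {}

    void setFrame(const uchar *rgb, int w, int h) {
        m_data = rgb; m_w = w; m_h = h;
        updateGL();                                   // repaint with the new frame
    }

protected:
    void initializeGL() {
        glEnable(GL_TEXTURE_2D);
        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);        // RGB24 rows are not 4-byte aligned
        glGenTextures(1, &m_tex);
    }
    void resizeGL(int w, int h) { glViewport(0, 0, w, h); }
    void paintGL() {
        if (!m_data) return;
        glBindTexture(GL_TEXTURE_2D, m_tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        if (!m_allocated) {                           // allocate once...
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, m_w, m_h, 0,
                         GL_RGB, GL_UNSIGNED_BYTE, m_data);
            m_allocated = true;
        } else {                                      // ...then only update
            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_w, m_h,
                            GL_RGB, GL_UNSIGNED_BYTE, m_data);
        }
        // Full-viewport textured quad; scaling comes for free.
        glBegin(GL_QUADS);
        glTexCoord2f(0, 1); glVertex2f(-1, -1);
        glTexCoord2f(1, 1); glVertex2f( 1, -1);
        glTexCoord2f(1, 0); glVertex2f( 1,  1);
        glTexCoord2f(0, 0); glVertex2f(-1,  1);
        glEnd();
    }

private:
    const uchar *m_data;
    GLuint m_tex;
    int m_w, m_h;
    bool m_allocated;
};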
Decoding was done using SSE2. Hope this helps.
I'm building one part of an H264 encoder. To test the system, I need to create input images for encoding. We have a program that reads the image into RAM in the raw file format we use.
My question is how to create a raw file: bitmap or TIFF (I don't want to use a compressed format like JPEG)? I googled and found a lot of raw file types. So what type should I use, and how do I create it? I think I will use C/C++ or Matlab to create the raw file.
P/S: the format I need is YUV (or Y Cb Cr) 4:2:0 with 8-bit colour depth.
The easiest raw format is just a stream of numbers, representing the pixels. Each raw format can be associated with metadata such as:
width, height
width of an image row in bytes, i.e. stride (e.g. gstreamer & X Window align each row to dword boundaries)
bits per pixel
byte format / endianness (if 16 bits per pixel or more)
number of image channels
color system: HSV, RGB, Bayer, YUV
order of channels, e.g. RGBA, ABGR, GBR
planar vs. packed (or FOURCC code)
or this metadata can be just an internal specification...
I believe one of the easiest approaches (after, of course, a steep learning curve :) is to use e.g. gstreamer, where you can use existing file/stream sources that read data from a camera, a file, a pre-existing JPEG etc. and pass those raw streams through a defined pipeline. One useful element is the filesink, which simply writes a single raw data frame, or a few successive ones, to your filesystem. The gstreamer infrastructure has possibly hundreds of converters and filters, btw. including an h264 encoder...
I would bet that if you just dump your memory, the output will already conform to some FOURCC format (also recognized by gstreamer).
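For the YUV 4:2:0, 8-bit case asked about above, dumping the planes yourself is only a few lines; here is a hypothetical C++ example that writes a single solid grey I420 (planar Y, U, V) frame:

#include <cstdio>
#include <vector>

int main()
{
    const int w = 640, h = 480;                            // must be known out-of-band
    std::vector<unsigned char> Y(w * h, 128);              // luma plane
    std::vector<unsigned char> U((w / 2) * (h / 2), 128);  // chroma planes are
    std::vector<unsigned char> V((w / 2) * (h / 2), 128);  // quarter-sized in 4:2:0

    // A "raw" YUV file is just the planes written back to back, frame after frame.
    std::FILE *f = std::fopen("frame_640x480.yuv", "wb");
    std::fwrite(Y.data(), 1, Y.size(), f);
    std::fwrite(U.data(), 1, U.size(), f);
    std::fwrite(V.data(), 1, V.size(), f);
    std::fclose(f);
    return 0;
}

The width, height and plane order are exactly the kind of out-of-band metadata listed above; such a file can then be checked with something like ffplay -f rawvideo -pixel_format yuv420p -video_size 640x480 frame_640x480.yuv.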
I'm rendering a lot of big images with alpha testing, and I'm hitting the fill-rate limit.
If I change the texture format from RGBA8888 to RGBA4444 will the fill rate improve?
EDIT: Hardware is an iPhone 3GS with OpenGL ES 1.1
Doing some tests I've found the following:
Loading the png as RGBA8888 I get 42 fps
Loading the png as RGBA4444 I get 44 fps
Loading pvrtc2 I get 53 fps (and I had to double the texture size because it was not square)
It seems that changing from RGBA8888 to RGBA4444 does not improve the framerate, but using pvrtc2 might.
You don't specify the particular hardware you're asking about, but Apple has this to say about the PowerVR GPUs in iOS devices:
If your application cannot use compressed textures, consider using a lower precision pixel format. A texture in RGB565, RGBA5551, or RGBA4444 format uses half the memory of a texture in RGBA8888 format. Use RGBA8888 only when your application needs that level of quality.
While this will improve memory usage, I'm not sure that it will have a dramatic effect on fill rate. You might see an improvement from being able to hold more of the texture in cache at once, but I'd listen to Tommy's point about per-pixel operations being the more likely bottleneck here.
Also, when it comes to texture size, you'll get much better image quality at a smaller size by using texture compression (like PVRTC for the PowerVR GPUs) than by lowering the precision of the texture pixel format.
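For completeness, on the upload side the only change between the two formats is the type argument to glTexImage2D (a sketch for OpenGL ES 1.1; textureId, width, height and the pixel buffers are placeholders, and repacking the image data into 16-bit 4:4:4:4 texels is up to you):

// RGBA8888 upload:
// glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
//              GL_RGBA, GL_UNSIGNED_BYTE, pixels8888);

// RGBA4444 upload: same format, but a 16-bit packed type.
glBindTexture(GL_TEXTURE_2D, textureId);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_RGBA, GL_UNSIGNED_SHORT_4_4_4_4, pixels4444);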