I am transcoding video on an NVIDIA Quadro K4200 in Ubuntu (ffmpeg version 2.7.1, NVENC SDK 5.0.1). GPU memory usage for one stream is 100 MB. Please see the output of the nvidia-smi command:
But when I run the same transcoding process with the same ffmpeg parameters on another computer with an NVIDIA GTX 980 Ti (ffmpeg version 3.0, NVENC SDK 5.0.1), GPU memory usage for one stream is 170 MB. Please see the screenshot below:
Why is there such a difference in memory usage? Can I decrease the GPU memory usage on the GTX 980 Ti to 100 MB per transcode process, as on the Quadro K4200?
On Quadro and Tesla GPUs, the maximum number of simultaneous NVENC sessions is unlimited, and as such, these platforms will often incur lower driver overheads for the same work unit.
It is also worth considering that, unlike NVCUVENC (which uses your CUDA cores to encode elementary video streams), NVENC is a dedicated hardware silicon intellectual property (SIP) core, and if you're comparing across different driver and platform versions, all other factors being equal, your mileage will vary.
Thanks and regards,
Brainiarc7.
I am working on a project which captures the screen and encodes it. I can already capture the screen using the Desktop Duplication API (Win8+). Using the API I can get ID3D11Texture2D textures, transfer them from the GPU to the CPU, and then use libx264 to encode them.
However, pulling textures from the GPU to the CPU can be a bottleneck which can potentially reduce the frame rate. Also, libx264 takes up CPU cycles (depending on quality) to encode frames. As an optimization, I am looking to encode the ID3D11Texture2D textures on the GPU itself instead of using the CPU.
I have already checked the documentation and some sample code, but I have had no success. I would appreciate it if someone could point me to a resource that does exactly what I want, reliably.
Video encoders, hardware and software, may be available in different form factors. Windows itself offers extensible APIs with a choice of encoders, and additional encoders may be available as libraries and SDKs. You are already using one such library (x264). Hardware encoders are typically vendor-specific and depend on the available hardware, which is involved directly in the encoding process. If you are interested in a solution for specific hardware, it may make sense to check the respective SDK.
Otherwise, the typical generic interface for hardware-backed video encoding in Windows is the Media Foundation Transform (MFT). Microsoft provides a stock software-only H.264 video encoder, which is unlikely to give any advantage over x264 except that it shares the MFT interface with the other options. Video hardware drivers, however, often install additional MFTs for the available hardware, backed by hardware implementations. Examples of such are:
Intel® Quick Sync Video H.264 Encoder MFT
NVIDIA H.264 Encoder MFT
AMDh264Encoder
Offered by different vendors, these MFTs provide similar functionality, and using them to encode H.264 is a good way to take advantage of hardware video encoding across a wide range of hardware.
See also:
Registering and Enumerating MFTs
Basic MFT Processing Model
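For discovery in code, here is a minimal sketch (assuming a Windows/C++ project with the Media Foundation headers; error handling trimmed) that enumerates the hardware H.264 encoder MFTs registered on the system and prints their friendly names:

```cpp
// Sketch: list hardware H.264 encoder MFTs by friendly name.
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mftransform.h>
#include <cstdio>

#pragma comment(lib, "mfplat")
#pragma comment(lib, "mfuuid")
#pragma comment(lib, "ole32")

int main()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);
    MFStartup(MF_VERSION);

    // Only hardware transforms whose output type is H.264 video.
    MFT_REGISTER_TYPE_INFO outputType = { MFMediaType_Video, MFVideoFormat_H264 };

    IMFActivate **activators = nullptr;
    UINT32 count = 0;
    HRESULT hr = MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER,
                           MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER,
                           nullptr,          // accept any input type
                           &outputType,
                           &activators, &count);
    if (SUCCEEDED(hr))
    {
        for (UINT32 i = 0; i < count; i++)
        {
            WCHAR *name = nullptr;
            UINT32 length = 0;
            if (SUCCEEDED(activators[i]->GetAllocatedString(
                    MFT_FRIENDLY_NAME_Attribute, &name, &length)))
            {
                wprintf(L"Hardware H.264 encoder MFT: %s\n", name);
                CoTaskMemFree(name);
            }
            activators[i]->Release();
        }
        CoTaskMemFree(activators);
    }

    MFShutdown();
    CoUninitialize();
    return 0;
}
```

On a machine with the corresponding drivers installed, this prints entries such as the Intel, NVIDIA, and AMD MFTs listed above.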
You have to check whether sharing a texture between the GPU encoder and DirectX is possible.
I know it's possible to share a texture between the NVIDIA decoder and DirectX, because I've done it successfully. NVIDIA has some interop capability, so first check whether you can share the texture and do everything on the GPU.
With NVIDIA you can do this: NVIDIA decoding -> DirectX display, entirely on the GPU.
Check whether DirectX display -> NVIDIA encoding is possible (knowing that NVIDIA offers interop).
For Intel and ATI, I don't know whether they provide interop with DirectX.
The main thing is to check whether you can interop your DirectX texture with the GPU encoder's texture.
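If you go through a hardware encoder MFT, the usual way to keep ID3D11Texture2D frames on the GPU is to hand the transform a DXGI device manager. A hedged sketch (assuming you already have an ID3D11Device and an activated encoder IMFTransform; error handling abbreviated):

```cpp
// Sketch: let a hardware encoder MFT share an existing D3D11 device so
// input samples can wrap ID3D11Texture2D surfaces instead of CPU buffers.
// The device should be created with D3D11_CREATE_DEVICE_VIDEO_SUPPORT.
#include <d3d11.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mftransform.h>

HRESULT AttachDeviceManager(ID3D11Device *device, IMFTransform *encoder)
{
    UINT resetToken = 0;
    IMFDXGIDeviceManager *deviceManager = nullptr;

    HRESULT hr = MFCreateDXGIDeviceManager(&resetToken, &deviceManager);
    if (FAILED(hr)) return hr;

    // Associate our D3D11 device with the manager.
    hr = deviceManager->ResetDevice(device, resetToken);
    if (SUCCEEDED(hr))
    {
        // Tell the MFT to use this device. After this, input IMFSamples can
        // carry buffers created with MFCreateDXGISurfaceBuffer() around a
        // texture, so frames never round-trip through system memory.
        hr = encoder->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER,
                                     reinterpret_cast<ULONG_PTR>(deviceManager));
    }

    deviceManager->Release(); // the MFT keeps its own reference on success
    return hr;
}
```

An MFT that is not Direct3D-aware will reject MFT_MESSAGE_SET_D3D_MANAGER, which is effectively the interop check described above.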
My application encodes frames captured via GDI or DXGI. Currently I am doing the encoding with the help of the x264 library.
AFAIK x264 is a software-based library. I want to do the encoding with the help of the GPU, so it can save CPU cycles, and hopefully the speed will also be faster.
After searching, I found the H.264 Video Encoder MFT, which does H.264 encoding.
But a couple of questions remain unanswered for me.
1) Is it faster than the x264 encoding library?
2) Can bitmap frames be encoded with the help of this MFT?
- I have seen that only the MFVideoFormat_I420, MFVideoFormat_IYUV, MFVideoFormat_NV12, MFVideoFormat_YUY2, and MFVideoFormat_YV12 formats are supported.
3) Is it hardware accelerated (i.e., does it use the CPU or the GPU)?
- Initially my understanding was that it uses the GPU, but I got confused after reading the post "MFT Encoder (h264) High CPU utilization".
4) Can the H.264 Video Encoder MFT be used standalone, without using the sink writer, as I have to send the data over the network?
5) Is there any other alternative on Windows?
Some of these questions may be very silly; please feel free to edit.
The Media Foundation H.264 Video Encoder is a software encoder. From my [subjective] experience it is slower than x264 and, perhaps more importantly, x264 offers a wider range of settings, specifically when it comes to choosing modes at the speed-over-quality end of the range. Either way, the stock MS encoder is not hardware accelerated.
However, there might be other MFTs available (typically installed with the respective hardware drivers) that do hardware-accelerated H.264 encoding. You can discover them by enumerating MFTs; perhaps the most popular is the Intel Quick Sync Video (QSV) encoder.
The HardwareVideoEncoderTransform app does the enumeration and provides you with the relevant details.
Typical input is NV12; some offer other input choices (such as 32-bit RGB). If you need other formats, you will have to pre-convert the input.
Hardware-backed encoders' CPU consumption is low, and their efficiency depends on the hardware implementation. Yes, you can use them standalone, either entirely standalone or wrapped as a DirectShow filter and included in a normal DirectShow pipeline.
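For the standalone case, a hedged sketch (assuming an already-activated encoder IMFTransform; the 1280x720, 30 fps and 4 Mbps values are placeholders) of configuring the MFT's media types directly, without a sink writer:

```cpp
// Sketch: configure an H.264 encoder MFT for standalone use (no sink writer).
#include <mfapi.h>
#include <mfidl.h>
#include <mftransform.h>
#include <codecapi.h>   // eAVEncH264VProfile_Main

HRESULT ConfigureEncoder(IMFTransform *encoder)
{
    IMFMediaType *outType = nullptr;
    HRESULT hr = MFCreateMediaType(&outType);
    if (FAILED(hr)) return hr;

    outType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    outType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    outType->SetUINT32(MF_MT_AVG_BITRATE, 4000000);
    outType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    outType->SetUINT32(MF_MT_MPEG2_PROFILE, eAVEncH264VProfile_Main);
    MFSetAttributeSize(outType, MF_MT_FRAME_SIZE, 1280, 720);
    MFSetAttributeRatio(outType, MF_MT_FRAME_RATE, 30, 1);

    // On encoder MFTs the output type is set before the input type.
    hr = encoder->SetOutputType(0, outType, 0);
    outType->Release();
    if (FAILED(hr)) return hr;

    IMFMediaType *inType = nullptr;
    hr = MFCreateMediaType(&inType);
    if (FAILED(hr)) return hr;

    inType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    inType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);  // typical input format
    MFSetAttributeSize(inType, MF_MT_FRAME_SIZE, 1280, 720);
    MFSetAttributeRatio(inType, MF_MT_FRAME_RATE, 30, 1);

    hr = encoder->SetInputType(0, inType, 0);
    inType->Release();
    return hr;
}
```

Frames are then pushed with ProcessInput() and compressed output pulled with ProcessOutput() (hardware MFTs are usually asynchronous and signal readiness through IMFMediaEventGenerator events), and the resulting H.264 samples can be packetized and sent over the network directly.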
Alternative H.264 encoders are typically SDK-based, or wrappers over those SDKs in DirectShow/MFT form factors, because vendors package their implementations in well-known forms already familiar to multimedia developers.
I compiled the ffmpeg and h264 libraries for Android using the NDK.
I am recording videos using the muxing.c example from the ffmpeg library. Everything works correctly (I still haven't worked on the audio), but the camera is dropping frames and it takes around 100 ms to save each frame, which is unacceptable.
I have also tried making a queue and saving the frames on another thread (let's call it B), but at the end I need to wait around 120 seconds because the background thread (B) is still saving the frames.
Is there a workaround for this issue besides reducing the video size? Ideally I would like to save the frames in real time, or at least reduce the saving time. Is it just that Android is incapable of doing this?
First of all, check whether you can be better served by the hardware encoder (via MediaRecorder or MediaCodec in Java, or using OpenMAX from native code).
If for some reason you must encode in software, and your device is multi-core, you can gain a lot by compiling x264 to use sliced multithreading. Let me cite my post from two years ago:
We are using x264 directly (no ffmpeg code involved), and with the ultrafast/zerolatency preset we get 30 FPS for VGA on a Samsung Galaxy Note 10.1 (http://www.gsmarena.com/samsung_galaxy_note_10_1_n8000-4573.php) with a quad-core 1.4 GHz Cortex-A9 Exynos 4412 CPU, which is on paper weaker than the Droid DNA's quad-core 1.5 GHz Krait Qualcomm MDM615m/APQ8064 (http://www.gsmarena.com/htc_droid_dna-5113.php).
Note that the x264 build scripts do not enable pthreads for Android (because the NDK does not include libpthread.a), but you can build the library with multithreading support (very nice for a quad-core CPU) if you simply create a dummy libpthread.a; see https://mailman.videolan.org/pipermail/x264-devel/2013-March/009941.html.
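For reference, a minimal sketch (assuming a libx264 build with pthread support as described above; the resolution, frame rate, and CRF values are placeholders) of opening an encoder tuned for low-latency sliced multithreading:

```cpp
// Sketch: open an x264 encoder using sliced multithreading for low latency.
extern "C" {
#include <x264.h>
}

x264_t *open_low_latency_encoder(int width, int height, int fps, int cores)
{
    x264_param_t param;
    // "zerolatency" already favours sliced threads and disables lookahead.
    if (x264_param_default_preset(&param, "ultrafast", "zerolatency") < 0)
        return nullptr;

    param.i_width   = width;
    param.i_height  = height;
    param.i_fps_num = fps;
    param.i_fps_den = 1;
    param.i_csp     = X264_CSP_I420;

    param.b_sliced_threads = 1;     // slice-based threading: each frame finishes sooner
    param.i_threads        = cores; // e.g. 4 on a quad-core device

    param.rc.i_rc_method   = X264_RC_CRF;
    param.rc.f_rf_constant = 28;    // placeholder quality target

    if (x264_param_apply_profile(&param, "baseline") < 0)
        return nullptr;

    return x264_encoder_open(&param);
}
```

Frames are then fed with x264_encoder_encode(); with sliced threads all cores work on the current frame, so per-frame latency drops rather than just overall throughput improving.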
Note that encoder setup is only one part of the problem. If you work with the deprecated camera API, you should use preallocated buffers and a background thread for camera callbacks, as I explained elsewhere.
I would like to do a performance comparison between Ffmpeg and Intel Media SDK in transcoding.
I have to write a new application that will do the following.
Receive frames from MJPEG, MPEG4 and H264 cameras.
Transcode the frames. The output will be H.264. Here I have to use either Ffmpeg or the Intel Media SDK.
Multicast the transcoded frames as an RTSP stream.
I have noticed that both of these libraries are CPU intensive. Are there any settings in Ffmpeg that can reduce the CPU usage?
Thanks in advance,
As with all media encoding, you trade speed for quality. x264 (ffmpeg) will produce higher quality (or smaller files at the same quality) but will use more CPU. The Intel Media SDK should use very little CPU, but the quality will be a bit lower. It accomplishes this by using specialized hardware on the CPU, if your CPU supports it.
So, what is best? It depends on what you want to optimize for: CPU, power usage, quality, or file size.
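On the question about reducing FFmpeg's CPU usage: if you stay on the software path, the encoder preset is the main knob. A hedged sketch (assuming FFmpeg's libavcodec API with libx264 enabled; "veryfast" and "zerolatency" are illustrative choices) of opening the encoder with a cheaper preset:

```cpp
// Sketch: select a faster libx264 preset through FFmpeg's libavcodec API
// to lower CPU usage, at some cost in quality/bitrate efficiency.
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>
}

AVCodecContext *open_x264(int width, int height, int fps)
{
    const AVCodec *codec = avcodec_find_encoder_by_name("libx264");
    if (!codec) return nullptr;

    AVCodecContext *ctx = avcodec_alloc_context3(codec);
    if (!ctx) return nullptr;

    ctx->width     = width;
    ctx->height    = height;
    ctx->time_base = AVRational{1, fps};
    ctx->framerate = AVRational{fps, 1};
    ctx->pix_fmt   = AV_PIX_FMT_YUV420P;

    // Faster presets use noticeably less CPU ("ultrafast" is the cheapest).
    av_opt_set(ctx->priv_data, "preset", "veryfast", 0);
    av_opt_set(ctx->priv_data, "tune", "zerolatency", 0);  // useful for live RTSP

    if (avcodec_open2(ctx, codec, nullptr) < 0) {
        avcodec_free_context(&ctx);
        return nullptr;
    }
    return ctx;
}
```

The same API lets you request a hardware encoder instead (for example the Quick Sync "h264_qsv" encoder in builds configured with --enable-libmfx), which moves most of the work off the CPU entirely.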
I have a potential job which will require me to do some video encoding with FFMPEG and x264. I'll have a series of files which I'll need to encode once, then I'll be able to bring down the instances. Since I'm not really sure of the resource utilization of x264 and FFMPEG, what kind of instances should I get? I'm thinking either a
High-CPU Extra Large Instance
7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge
or, alternatively a
Cluster GPU Quadruple Extra Large Instance
22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge
What should I use? Does x264/FFMPEG perform better with faster/more CPUs, or does it really pound the GPU more? In any case, the Cluster GPU appears to be the higher-performance instance. What should I prefer?
Ffmpeg recently added support for VAAPI and VDPAU, but this allows it to use the GPU only for decoding H.264 video. For encoding, it uses the CPU.
The short answer to the CPU/GPU question is: it depends on how you use ffmpeg to do the H.264 encoding.
Also, you are confusing H.264 and x264. H.264 is the video codec standard, and x264 is one implementation of that standard. x264 is so popular that it has sometimes become synonymous with, and confused with, H.264. The reason I point this out is that x264 is a software-based implementation of H.264, which means it will use only the CPU cores for all of its processing. There will be no GPU usage in your use case when you use x264 for video encoding.
That being said, maybe what you are trying to ask is whether to go for
a hardware-based implementation of H.264 (which uses the GPU), or
a software-based implementation of H.264 (which uses the CPU).
There are several implementations available for each. Ffmpeg already has a nice page on this. If you are planning to use the NVIDIA GPU instances, then you would need to compile FFmpeg with NVENC support to get the hardware implementation. Using GPUs/CPUs efficiently across your whole transcoding process is an art in itself.
So, in short, x264 will not use the GPU. If you want to use the GPU, you need to use hardware implementations of the encoders. Which implementation is better largely depends on your use case and what you care about (quality, cost, turnaround time, etc.).
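As an illustration of that last point, with an FFmpeg build that includes NVENC support, the choice between hardware and software comes down to which encoder you ask libavcodec for. A hedged sketch (encoder names as registered in recent FFmpeg builds; older builds used "nvenc"/"nvenc_h264"):

```cpp
// Sketch: prefer the NVENC hardware H.264 encoder when the FFmpeg build
// supports it, and fall back to software x264 otherwise.
extern "C" {
#include <libavcodec/avcodec.h>
}

const AVCodec *pick_h264_encoder()
{
    // "h264_nvenc" is only present in builds compiled with NVENC support.
    const AVCodec *codec = avcodec_find_encoder_by_name("h264_nvenc");
    if (!codec)
        codec = avcodec_find_encoder_by_name("libx264");  // CPU fallback
    return codec;
}
```

Note that even with NVENC selected, demuxing, decoding (unless also offloaded), scaling, and muxing still run on the CPU, so a GPU instance does not make the CPU irrelevant.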
My background / disclaimer: I work as a Senior Engineer at Bitmovin. We solve this "cluster/resource" allocation engineering problem, among many other problems, to extract the best possible video quality out of a given bitrate, and in the end we offer APIs that you can simply plug into your workflow. The views expressed here are my own.
At present, Amazon EC2 offers (some) GPU-accelerated instances with modern NVIDIA GPUs, meaning you can take advantage of NVENC on them.
You are probably better off using a service like zencoder.com; they have an excellent API, and the quality you will get out of it will most probably be better than hours of fiddling with ffmpeg parameter optimisation.