DirectX vs FFmpeg

I'm in the process of deciding how to decode received video frames, based on the following:
platform is Windows;
frames are encoded in H264 or H265;
the GPU should be used as much as possible;
we prefer less coding and the simplest possible code. We just need to decode and show the result on screen; no recording or anything else is required.
I'm still a newbie, but I think one may decode a frame either directly with DirectX or through FFmpeg. Am I right?
If so, which one is preferred?

For a simple approach and simple code using the GPU only, take a look at my project using DirectX: H264Dxva2Decoder.
If you are ready to code, you can use my approach.
If not, you can use MediaFoundation or FFmpeg; both can do the job.
MediaFoundation is C++ and COM oriented, while FFmpeg is C oriented. That difference may matter to you.
EDIT
You can use my program because your frames are encoded in H264 or H265. For H265, you will have to add extra code.
Of course, you will need to make modifications. And yes, you can send frames to DirectX without using a file. This project only handles the AVCC video format, but it can be modified for other cases.
You don't need the atom parser. You will need to modify the NALU parser and the buffering mechanism if your frames are in Annex-B format, for example.
I can help you if you provide frame samples encoded in H264.
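For illustration, here is a minimal sketch of the Annex-B start-code scan such a modified parser would begin with (names are mine, not from the project; a real parser must also strip the 00 00 03 emulation-prevention bytes inside NAL units):

    // Scan a raw H.264 Annex-B byte stream for the 3- and 4-byte start codes
    // (00 00 01 / 00 00 00 01) that separate NAL units.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<size_t> FindNaluOffsets(const uint8_t* buf, size_t len) {
        std::vector<size_t> offsets; // first byte of each NAL unit payload
        size_t i = 0;
        while (i + 2 < len) {
            if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
                offsets.push_back(i + 3);          // 3-byte start code 00 00 01
                i += 3;
            } else if (i + 3 < len && buf[i] == 0 && buf[i + 1] == 0 &&
                       buf[i + 2] == 0 && buf[i + 3] == 1) {
                offsets.push_back(i + 4);          // 4-byte start code 00 00 00 01
                i += 4;
            } else {
                ++i;
            }
        }
        return offsets;
    }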
About FFmpeg: it has fewer limitations than my program with respect to the H264 specification,
but it does not provide the rendering mechanism. You would have to combine FFmpeg with my rendering mechanism, for example.
Or study a program like MPC-HC that shows how to combine them. I cannot help further here.
EDIT 2
One thing to know: you can't hand encoded packets directly to the GPU. You need to parse them first. That's why there is a NALU parser (see DXVA_PicParams_H264).
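To make that concrete: the host code parses the SPS/PPS itself and hands the results to the GPU decoder in a DXVA_PicParams_H264 structure (declared in dxva.h). A rough sketch, where ParsedSps is a hypothetical holder for already-parsed SPS fields:

    // The GPU never reads bitstream headers; the host fills the picture
    // parameters from its own SPS parse. ParsedSps is illustrative only.
    #include <windows.h>
    #include <dxva.h>

    struct ParsedSps {
        int pic_width_in_mbs_minus1;
        int pic_height_in_map_units_minus1;
        int num_ref_frames;
        int log2_max_frame_num_minus4;
        int pic_order_cnt_type;
    };

    void FillPicParams(DXVA_PicParams_H264& pp, const ParsedSps& sps) {
        pp.wFrameWidthInMbsMinus1    = (USHORT)sps.pic_width_in_mbs_minus1;
        pp.wFrameHeightInMbsMinus1   = (USHORT)sps.pic_height_in_map_units_minus1;
        pp.num_ref_frames            = (UCHAR)sps.num_ref_frames;
        pp.log2_max_frame_num_minus4 = (UCHAR)sps.log2_max_frame_num_minus4;
        pp.pic_order_cnt_type        = (UCHAR)sps.pic_order_cnt_type;
        // ...plus many more fields: current picture, reference lists, flags, etc.
    }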
If you are not ready to code and to understand how it works, use FFmpeg; it will indeed be simpler. You can focus on rendering, not on decoding.
It's also important to know which one gives better results, consumes fewer resources (CPU, GPU, and both system and graphics card memory), supports a wider range of formats, and so on.
You are asking for real expertise...
If you write your own program, you will be able to optimize it, and you will certainly get better results. If you use FFmpeg and it has performance problems in your context, you could be stuck, because you will not be able to modify FFmpeg.
You say you will use Bosch cameras. Normally, all the encoded video will be in the same format, so once your code is able to decode it, you don't really need all the FFmpeg features.

Related

Real-time microphone audio manipulation on Windows

I would like to make an app (target PC: Windows) that lets you modify the microphone input in real time, like introducing sound effects or even modulating your voice.
I searched the internet and only found people saying it would not be possible without using a virtual audio cable.
However, I know some apps with similar behavior (Voicemod, Resonance) that do not use a virtual audio cable, so I would like some help on how this can be done (just the name of a capable library would be enough) or where to start.
Firstly, you can use professional ready-made software for that: a digital audio workstation (DAW) in combination with a huge number of plugins.
See 5 steps to real-time process your instrument in the DAW.
And What is (audio) direct monitoring?
If you are sure you have to write your own, you can use libraries for real-time audio processing (as far as I know, C++ is better for this than C#).
These libraries really work; they are specifically designed for real-time use.
https://github.com/thestk/rtaudio
http://www.portaudio.com/
See also https://en.wikipedia.org/wiki/Csound
If you don't have a professional sound interface yet but want to minimize latency, read about ASIO4ALL.
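As a taste of what coding against these libraries looks like, here is a minimal duplex pass-through sketch using RtAudio (assuming a mono 16-bit stream at 44.1 kHz; error handling omitted):

    // Microphone in, speakers out, with the spot for an effect marked.
    #include "RtAudio.h"
    #include <cstdint>

    int passthrough(void* out, void* in, unsigned int nFrames,
                    double /*streamTime*/, RtAudioStreamStatus /*status*/,
                    void* /*userData*/) {
        const int16_t* src = static_cast<const int16_t*>(in);
        int16_t* dst = static_cast<int16_t*>(out);
        for (unsigned int i = 0; i < nFrames; ++i)
            dst[i] = src[i]; // apply your effect here instead of plain copying
        return 0;
    }

    int main() {
        RtAudio audio;
        RtAudio::StreamParameters inP, outP;
        inP.deviceId  = audio.getDefaultInputDevice();
        outP.deviceId = audio.getDefaultOutputDevice();
        inP.nChannels = outP.nChannels = 1;
        unsigned int bufferFrames = 256; // small buffer keeps latency low
        audio.openStream(&outP, &inP, RTAUDIO_SINT16, 44100,
                         &bufferFrames, &passthrough);
        audio.startStream();
        // ... keep the app alive while audio runs; stop with audio.stopStream()
    }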
The linked tutorial worked for me. In it, a sound is recorded and saved to a .wav.
The key to streaming this to a speaker would be opening a SourceDataLine and outputting to that instead of writing to a .wav file. So, instead of outputting to AudioSystem.write on line 59, output to a SourceDataLine's write method.
IDK if there will be a feedback issue. Probably good to output to headphones and not your speakers!
To add an effect, the AudioInputLine has to be accessed and processed in segments. In each segment the following needs to happen (a sketch of the middle steps appears at the end of this answer):
obtain the byte array from the AudioInputLine
convert the audio bytes to PCM
apply your audio effect to the PCM (if the effect is a volume change over time, this could be done by progressively altering a volume factor between 0 to 1, multiplying the factor against the PCM)
convert back to audio bytes
write to the SourceDataLine
All these steps have been covered in StackOverflow posts.
The linked tutorial simplifies how file locations, threads, and stopping and starting are handled. But most importantly, it shows a working, live audio line from the microphone.
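As an illustration of the middle steps (convert to PCM, apply the effect, convert back), here is a minimal sketch for 16-bit PCM, shown in C++ for brevity; the Java version is analogous. It assumes little-endian samples, so check your AudioFormat, since Java Sound often defaults to big-endian:

    #include <cstddef>
    #include <cstdint>

    void ApplyVolume(uint8_t* bytes, size_t len, float factor) {
        for (size_t i = 0; i + 1 < len; i += 2) {
            // convert two audio bytes to one signed 16-bit PCM sample
            int16_t sample = (int16_t)(bytes[i] | (bytes[i + 1] << 8));
            // the "effect": scale by a volume factor between 0 and 1
            sample = (int16_t)(sample * factor);
            // convert back to audio bytes
            bytes[i]     = (uint8_t)(sample & 0xFF);
            bytes[i + 1] = (uint8_t)((sample >> 8) & 0xFF);
        }
    }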

How to make mpv more compatible with ffmpeg filters like minterpolate?

The ffmpeg filter minterpolate (motion interpolation) does not work in mpv.
(The file then plays normally, just without minterpolate.)
(I researched with search engines and throughout the documentation, troubleshot to try to make use of OpenGL, and generally tried everything apart from asking for help and digging deeper into the source code; I'm not a programmer.)
--gpu-context=angle --gpu-api=opengl also does not make OpenGL work. (I'm guessing OpenGL could help, from seeing its use in the documentation.)
Note
To get a full list of available video filters, see --vf=help and
http://ffmpeg.org/ffmpeg-filters.html .
Also, keep in mind that most actual filters are available via the
lavfi wrapper, which gives you access to most of libavfilter's
filters. This includes all filters that have been ported from MPlayer
to libavfilter.
Most builtin filters are deprecated in some ways, unless they're only
available in mpv (such as filters which deal with mpv specifics, or
which are implemented in mpv only).
If a filter is not builtin, the lavfi-bridge will be automatically
tried. This bridge does not support help output, and does not verify
parameters before the filter is actually used. Although the mpv syntax
is rather similar to libavfilter's, it's not the same. (Which means
not everything accepted by vf_lavfi's graph option will be accepted by
--vf.)
You can also prefix the filter name with lavfi- to force the wrapper.
This is helpful if the filter name collides with a deprecated mpv
builtin filter. For example --vf=lavfi-scale=args would use
libavfilter's scale filter over mpv's deprecated builtin one.
I expect mpv to play with minterpolate enabled (one of several filters mpv can use, listed at http://ffmpeg.org/ffmpeg-filters.html). But this is what happens:
Input: "--vf=lavfi=[minterpolate=fps=60000/1001:mi_mode=mci]"
Output:
cplayer: (+) Video --vid=1 (*) (h264 1280x720 29.970fps)
cplayer: (+) Audio --aid=1 (*) (aac 2ch 44100Hz)
vd: Using hardware decoding (d3d11va).
ffmpeg: Impossible to convert between the formats supported by the filter 'mpv_src_in0' and the filter 'auto_scaler_0'
lavfi: failed to configure the filter graph
vf: Disabling filter lavfi.00 because it has failed.
(It is also interesting that --gpu-api=opengl does not work (even though, according to the specification, my (not to brag) HD Graphics 400 Braswell supports OpenGL 4.2)... Also, aresample seems to have no effect, and with the few audio filters I selected, playback often neither starts nor outputs errors.)
The problem is that you're using hardware decoding WITHOUT copying the decoded video back to system memory. This means your video filter can't access it. The fix is simple but that error message makes it very hard to figure this out.
To fix this, just pass in --hwdec=no. --hwdec=auto-copy also fixes it, but minterpolate in mci mode is so CPU-intensive that there's not much point in also using hardware decoding (for most video sources).
All together:
mpv input.mkv --hwdec=no --vf=lavfi="[minterpolate=fps=60000/1001:mi_mode=mci]"
Explanation: the most efficient hardware decoding doesn't copy the video data back to system memory after decoding, but you need it there to run CPU-based filtering on the decoded video. You were asking mpv to do video filtering without giving it access to the decoded video data.
More details from the mpv docs:
auto-copy selects only modes that copy the video data back to system memory after decoding. This selects modes like vaapi-copy (and so on). If none of these work, hardware decoding is disabled. This mode is usually guaranteed to incur no additional quality loss compared to software decoding (assuming modern codecs and an error free video stream), and will allow CPU processing with video filters. This mode works with all video filters and VOs.
Because these copy the decoded video back to system RAM, they're often less efficient than the direct modes, and may not help too much over software decoding.

ffmpeg capture streams in sync

I'd like to capture multiple real-time video streams arriving over RTP, using ffmpeg. When I initiate the recording by issuing the ffmpeg <command line parameters> command, it always takes a while for the connection to be established and the actual recording to begin. This can be more than 2 seconds in certain cases, which causes a constant time offset at replay.
How can I extract the time of the first actually recorded frame from ffmpeg? If it's not possible with ffmpeg without editing the source (which I did, and would like to avoid for other reasons), is there any similar multi-platform open-source tool which could be used?
Not possible without effort on your side. Use something like live555 to capture your streams. All your sources must synchronize to a single clock using NTP, and then the RTP timestamps can be used at the receiver end to synchronize the various streams. This is not trivial and is what video conferencing systems do. I am not aware of any free implementation of it.
If you do not have control over the sources, then you are out of luck, because there is no such thing as a common time base across the streams. If you do, you still need to modify live555 and your player to synchronize using the timestamps on the streams and the NTP clock. Like I said, not trivial.
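For reference, the standard mechanism is the RTCP sender report (SR): it pairs an RTP timestamp with an NTP wallclock time, so a receiver can place any RTP timestamp on the common clock. A minimal sketch, with all names illustrative rather than from a specific library:

    #include <cstdint>

    struct SenderReport {
        double   ntp_seconds;   // NTP time from the SR, in seconds
        uint32_t rtp_timestamp; // RTP timestamp captured at that NTP time
    };

    double RtpToWallclock(uint32_t rtp_ts, const SenderReport& sr,
                          double clock_rate /* e.g. 90000 for video */) {
        // signed delta handles RTP timestamp wrap-around
        int32_t delta = (int32_t)(rtp_ts - sr.rtp_timestamp);
        return sr.ntp_seconds + delta / clock_rate;
    }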
Perhaps gstreamer might already have plugins for it; it's been a while since I used it, so I am not sure. You could take a look there (gstreamer.net).

Playing H.264 video in an application through ffmpeg using DXVA2 acceleration

I am trying to output H.264 video in a Windows application. I am moderately familiar with FFmpeg and I have been successful at getting it to play H.264 in an SDL window without a problem. Still, I would really benefit from using hardware acceleration (probably through DXVA2).
I am reading raw H264 video, no container, no audio... just raw video (and no B-frames, just I and P). Also, I know that all the systems that will use this application have Nvidia GPUs supporting at least VP3.
Given that set of assumptions, I was hoping to cut some corners and make it simple instead of general, just having it work for my particular scenario.
So far I know that I need to set the hardware acceleration in the codec context by filling the hwaccel member through a call to ff_find_hwaccel. My plan is to look at Media Player Classic Home Cinema (MPC-HC), which does a pretty good job of supporting DXVA2 through FFmpeg when decoding H.264. However, the code is quite large and I am not exactly sure where to look. I can find the place where ff_find_hwaccel is called in h264.c, but I was wondering where else I should be looking.
More specifically, I would like to know the minimum set of steps that I have to code to get DXVA2 working through FFmpeg.
EDIT: I am open to looking at VLC or anything else if someone knows where I can find the "important" piece of code that does the trick. I just mentioned MPC-HC because I think it is the easiest to compile on Windows.
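For anyone reading this now: modern FFmpeg builds expose hardware decoding through the public hwcontext API rather than internals like ff_find_hwaccel. A minimal DXVA2 setup sketch (error handling mostly omitted):

    extern "C" {
    #include <libavcodec/avcodec.h>
    #include <libavutil/hwcontext.h>
    }

    bool InitDxva2(AVCodecContext* ctx) {
        AVBufferRef* hw = nullptr;
        if (av_hwdevice_ctx_create(&hw, AV_HWDEVICE_TYPE_DXVA2,
                                   nullptr, nullptr, 0) < 0)
            return false;                  // no DXVA2 device available
        ctx->hw_device_ctx = av_buffer_ref(hw); // decoder keeps its own ref
        av_buffer_unref(&hw);
        return ctx->hw_device_ctx != nullptr;
    }

    // Decoded GPU frames arrive as AV_PIX_FMT_DXVA2_VLD; to touch the pixels
    // on the CPU, copy them back to system memory first:
    bool Download(AVFrame* gpu_frame, AVFrame* sw_frame) {
        return av_hwframe_transfer_data(sw_frame, gpu_frame, 0) >= 0;
    }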

Encode WebCam frames with H.264 on .NET

What I want to do is the following procedure:
Get a frame from the webcam.
Encode it with an H264 encoder.
Create a packet with that frame using my own "protocol" to send it via UDP.
Receive it and decode it...
It would be live streaming.
Well, I just need help with the second step.
I'm retrieving camera images with the AForge framework.
I don't want to write frames to files and then decode them; that would be very slow, I guess.
I would like to handle encoded frames in memory and then create the packets to be sent.
I need to use an open-source encoder. I already tried x264, following this example:
How does one encode a series of images into H264 using the x264 C API?
but it seems it only works on Linux, or at least that's what I thought after seeing some 50 errors when trying to compile the example with Visual C++ 2010.
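For what it's worth, the x264 C API itself is portable; the Linux-only impression usually comes from the example's build setup rather than from the API. A minimal in-memory encode sketch (function names are mine; the caller allocates pic_in with x264_picture_alloc and fills its I420 planes):

    extern "C" {
    #include <x264.h>
    }

    x264_t* OpenEncoder(int width, int height, int fps) {
        x264_param_t p;
        x264_param_default_preset(&p, "veryfast", "zerolatency"); // live use
        p.i_width   = width;
        p.i_height  = height;
        p.i_fps_num = fps;
        p.i_fps_den = 1;
        p.i_csp     = X264_CSP_I420;
        x264_param_apply_profile(&p, "baseline");
        return x264_encoder_open(&p);
    }

    // Returns the encoded size; *nals points at encoder-owned memory holding
    // the NAL units, ready to be packetized and sent over UDP.
    int EncodeFrame(x264_t* enc, x264_picture_t* pic_in,
                    x264_nal_t** nals, int* nal_count) {
        x264_picture_t pic_out;
        return x264_encoder_encode(enc, nals, nal_count, pic_in, &pic_out);
    }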
I have to make clear that I already did a lot of research (a week of reading) before writing this, but couldn't find a (simple) way to do it.
I know there is the RTMP protocol, but the video stream will always be seen by one person at a time, and RTMP is more oriented toward streaming to many people. Also, I already streamed with an Adobe Flash application I made, but it was too laggy.
I would also like advice on whether it's OK to send frames one by one or whether it would be better to send several of them within each packet.
I hope that at least someone can point me in the right direction.
My English is not good, apologies. :P
PS: it doesn't have to be in .NET; it can be in any language as long as it works on Windows.
Many thanks in advance.
You could try your approach using Microsoft's DirectShow technology. There is an open-source x264 wrapper available for download at Monogram.
If you download the filter, you need to register it with the OS using regsvr32. I would suggest doing some quick testing to find out if this approach is feasible: use the GraphEdit tool to connect your webcam to the encoder and have a look at the configuration options.
"I would also like advice on whether it's OK to send frames one by one or whether it would be better to send several of them within each packet."
This really depends on the required latency: the more frames you package, the less header overhead, but the more latency, since you have to wait for multiple frames to be encoded before you can send them. For live streaming, latency should be kept to a minimum, and the typical protocols used are RTP/UDP. This implies that your maximum packet size is limited to the MTU of the network, often requiring IDR frames to be fragmented and sent in multiple packets.
My advice would be not to worry about sending more frames in one packet until/unless you have a reason to. This is more often necessary with audio streaming, since the header size (e.g. IP + UDP + RTP) is large in relation to the audio payload.
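To illustrate the fragmentation mentioned above: RFC 6184 defines FU-A fragmentation units for splitting one H.264 NAL unit across several RTP payloads. A minimal sketch (RTP header handling omitted; names are mine):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<std::vector<uint8_t>> FragmentNal(const uint8_t* nal, size_t len,
                                                  size_t max_payload) {
        std::vector<std::vector<uint8_t>> packets;
        if (len <= max_payload) {                    // fits: send as a single unit
            packets.emplace_back(nal, nal + len);
            return packets;
        }
        const uint8_t indicator = (nal[0] & 0xE0) | 28; // keep F/NRI, type 28 = FU-A
        const uint8_t nal_type  = nal[0] & 0x1F;        // original NAL unit type
        size_t pos = 1;                                 // skip the original header
        while (pos < len) {
            size_t chunk = std::min(max_payload - 2, len - pos);
            uint8_t fu_header = nal_type;
            if (pos == 1)           fu_header |= 0x80;  // S bit: first fragment
            if (pos + chunk == len) fu_header |= 0x40;  // E bit: last fragment
            std::vector<uint8_t> pkt = {indicator, fu_header};
            pkt.insert(pkt.end(), nal + pos, nal + pos + chunk);
            packets.push_back(std::move(pkt));
            pos += chunk;
        }
        return packets;
    }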
