I'm going through the HEVC decoder integrated in FFMPEG. I'm trying to understand its flow and how it works.
By flow, I mean the part of the code where it reads the various parameters of the input .bin file: where it reads the resolution, where it decides the fps for playback, the output display format (yuv420p), and so on.
My first suspect was the HEVC demuxer at /libavformat/hevcdec.c, which does the input file reading. It has a probe function used to detect which decoder to select for the input bin stream, and further there is a FF_DEF_RAWVIDEO_DEMUXER. Is this where the resolution and other parameters are read from the input file?
My second suspect is the HEVC parser at /libavcodec/hevc_parser.c, but I think it only parses the frame data, i.e. finds the end of each frame. Is that assumption of mine right?
Any suggestions or pointers would be really helpful. Thanks in advance.
To understand more specifically what is going on in the decoder, it's better to start your study with the HEVC/H.265 standard (http://www.itu.int/rec/T-REC-H.265). It contains all the information you need to locate the resolution, fps, etc.
If you want to get more details from FFMPEG, here are some hints:
/libavcodec/hevc_parser.c contains an H.265 Annex B parser, which converts the byte stream into a series of NAL units. Each NAL unit has its own format and should be parsed depending on its NAL unit type.
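The byte-stream-to-NAL split itself is simple to sketch. Below is a minimal standalone scanner for the 0x000001 start codes that delimit Annex B NAL units; this is an illustration of the idea, not FFmpeg's actual parser code.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the next 0x000001 start code at or after `pos`,
 * or `len` if none is found. Four-byte start codes (0x00000001) end in
 * the same three bytes, so they are matched too (one byte later). */
static size_t next_start_code(const uint8_t *buf, size_t len, size_t pos)
{
    for (size_t i = pos; i + 2 < len; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return len;
}
```

Scanning from one start code to the next yields the byte range of each NAL unit; the unit's type then determines which syntax (SPS, PPS, slice, ...) to parse inside it.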
If you are looking for basic properties of video sequence, you may be interested in SPS (Sequence Parameter Set) parsing. Its format is described in section 7.3.2.2.1 of the standard and there is a function ff_hevc_decode_nal_sps located in /libavcodec/hevc_ps.c which extracts SPS parameters from the bitstream.
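Most SPS fields, including pic_width_in_luma_samples and pic_height_in_luma_samples, are coded as unsigned exp-Golomb values, written ue(v) in the standard; FFmpeg reads them through its bit-reader. Here is a standalone illustrative sketch of a ue(v) decoder (the BitReader type is made up for this example and is not FFmpeg's GetBitContext):

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader, for illustration only. */
typedef struct { const uint8_t *buf; size_t pos; } BitReader;

static unsigned get_bit(BitReader *br)
{
    unsigned b = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return b;
}

/* ue(v): count leading zero bits, read that many more bits, subtract 1.
 * E.g. bit pattern "1" -> 0, "010" -> 1, "011" -> 2, "00100" -> 3. */
static unsigned get_ue(BitReader *br)
{
    int zeros = 0;
    while (get_bit(br) == 0)
        zeros++;
    unsigned val = 1;
    for (int i = 0; i < zeros; i++)
        val = (val << 1) | get_bit(br);
    return val - 1;
}
```

ff_hevc_decode_nal_sps is essentially a long sequence of such reads in the exact field order that section 7.3.2.2.1 prescribes.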
Note: I was talking about FFMPEG version 2.5.3. Code structure can be different for other versions.
Related
Every once in a while this comes up, and I've never been able to find a good solution, but I figured I'd ask the experts: is there a video frame parser in FFMPEG that can determine the frame type WITHOUT resorting to decoding?
Each codec has its own particular syntax, and the decoder is the bespoke component that can work with that syntax. Now, is there an operational mode where decoders analyze and log surface-level parameters of the payload without entering the code path that generates raster output? In general, no.
There is a -debug option (http://www.ffmpeg.org/ffmpeg-all.html#Codec-Options); when it is set, certain decoders, mainly the native MPEG decoders, will log some metadata, but this still invokes the decoder. For modern codecs, there's the trace_headers bitstream filter, which can be used in streamcopy mode. This will parse and print all the parameter sets and slice headers. You can dump these to a file and inspect them.
According to the description, it places global headers in extradata instead of in every keyframe.
But what's the actual purpose? Is it useful, e.g., for streaming?
I guess the resulting video file will be marginally smaller, but more prone to corruption (no redundancy = the file is unplayable if the main header is corrupted or only partially downloaded)?
Or maybe it somehow improves decoding a little, since the headers are truly global and cannot change from keyframe to keyframe?
Thanks!
By extradata FFmpeg means out-of-band, as opposed to in-band. The behavior of the flag is format-specific. This is useful for headers that are not expected to change, because it reduces the overhead.
Example: for H.264 in MP4, the SPS and PPS are stored in the avcC atom. For the same H.264 stream in, say, MPEG-TS, the parameter sets are repeated in the bitstream.
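The two layouts are easy to tell apart in code: an avcC box begins with a configurationVersion byte of 1, whereas Annex B extradata begins with a 0x000001 or 0x00000001 start code. A hedged sketch of that heuristic (the function name is made up; FFmpeg performs an equivalent first-byte check when it decodes H.264 extradata):

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if H.264 extradata looks like an avcC box (out-of-band
 * SPS/PPS), 0 if it looks like Annex B. A valid avcC is at least
 * 7 bytes and starts with configurationVersion = 1; Annex B data
 * starts with a 0x00 byte of a start code instead. */
static int extradata_is_avcc(const uint8_t *ed, size_t len)
{
    return len >= 7 && ed[0] == 1;
}
```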
I want to stream video (no audio) from a server to a client. I will encode the video using libx264 and decode it with ffmpeg. I plan to use fixed settings (at the very least they will be known in advance by both the client and the server). I was wondering if I can avoid wrapping the compressed video in a container format (like mp4 or mkv).
Right now I am able to encode my frames using x264_encoder_encode. I get a compressed frame back, and I can do that for every frame. What extra information (if anything at all) do I need to send to the client so that ffmpeg can decode the compressed frames, and more importantly how can I obtain it with libx264. I assume I may need to generate NAL information (x264_nal_encode?). Having an idea of what is the minimum necessary to get the video across, and how to put the pieces together would be really helpful.
I found out that the minimum necessary information is the NAL units from each frame; this gives me a raw H.264 stream. If I write this to a file, I can watch it using VLC by adding a .h264 extension.
I can also open such a file using ffmpeg, but if I want to stream it, then it makes more sense to use RTSP, and a good open-source library for that is Live555: http://www.live555.com/liveMedia/
In their FAQ they mention how to send the output from your encoder to live555, and there is source code for both a client and a server. I have yet to finish coding this, but it seems like a reasonable solution.
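For the file-dump path above, a raw .h264 file is just each NAL payload prefixed with an Annex B start code. Note that when x264 runs with b_annexb set (the default), the p_payload buffers it returns already include the start codes, so you can write them out as-is; the helper below is only an illustrative sketch for the case where you have bare payloads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one bare NAL payload to `out`, prefixed with a 4-byte Annex B
 * start code (0x00000001), and return the new write offset. The caller
 * guarantees `out` has room for nal_len + 4 more bytes. */
static size_t append_annexb(uint8_t *out, size_t off,
                            const uint8_t *nal, size_t nal_len)
{
    static const uint8_t sc[4] = {0, 0, 0, 1};
    memcpy(out + off, sc, 4);
    memcpy(out + off + 4, nal, nal_len);
    return off + 4 + nal_len;
}
```

Concatenating SPS, PPS, and slice NALs this way produces exactly the kind of stream VLC and ffmpeg will accept as .h264.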
I am new to MPEG-4 and taking baby steps to learn it. I am using FFMPEG as reference.
I understand that all MPEG-4 video is encoded into NAL units, and that in FFMPEG the av_read_frame() function returns one NAL unit. Am I right? Is a frame a NAL unit (though it can be a combination of multiple NALs)?
I also saw that h264_parser.c implements a function called h264_parse, which calls parse_nal_units() internally. If I need to get NAL units, how can I use parse_nal_units from my main function?
What does av_parser_parse2() do? Does it return decoded NAL units?
Alternatively, FFMPEG has the -vbsf h264_mp4toannexb switch to dump raw NAL units. Can somebody help me understand how I can use it from my main function?
Please help me out here...
-ash5
For question 1:
The following article has links that will help you understand what NALs are.
In h264 NAL units means frame.?
NALs are divided into several types; depending on the type, a NAL can contain decoding parameters (SPS, PPS), enhancement information (SEI), or video samples (slice header and data). A common sequence from a broadcast transport stream would be SPS, PPS, SEI, slice_header(), slice_data(), SEI, slice_header(), slice_data(), and so on.
You probably don't need to understand ISO 14496-10 section 7.3 "Syntax in tabular form" for your application.
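The one bit of Table 7-1 of ISO 14496-10 that is handy to know is how the type is derived: it is the low 5 bits of the first NAL header byte. A small illustrative helper covering the common types mentioned above (labels are my own, not from any library):

```c
#include <stdint.h>

/* Map an H.264 nal_unit_type (low 5 bits of the NAL header byte) to a
 * label for the common types. ISO 14496-10 Table 7-1 defines the rest. */
static const char *nal_type_name(uint8_t header_byte)
{
    switch (header_byte & 0x1F) {
    case 1:  return "slice (non-IDR)";
    case 5:  return "slice (IDR)";
    case 6:  return "SEI";
    case 7:  return "SPS";
    case 8:  return "PPS";
    default: return "other";
    }
}
```

For example, the byte 0x67 that typically opens an H.264 stream right after the start code is an SPS (0x67 & 0x1F == 7).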
I tried to decode a series of NAL units with ffmpeg (libavcodec), but I get a "no frame" error. I produced the NAL units following the guide at How does one encode a series of images into H264 using the x264 C API?. I tried the following strategy for decoding:
avcodec_init();
avcodec_register_all();
AVCodec* pCodec;
pCodec=avcodec_find_decoder(CODEC_ID_H264);
AVCodecContext* pCodecContext;
pCodecContext=avcodec_alloc_context();
avcodec_open(pCodecContext,pCodec);
AVFrame *pFrame;
pFrame=avcodec_alloc_frame();
//for every nal unit:
int frameFinished=0;
//nalData2 is nalData without the first 4 bytes
avcodec_decode_video(pCodecContext,pFrame,&frameFinished,(uint8_t*) nalData2,nalLength);
I passed all the units I got to this code, but frameFinished stays 0. I guess there must be something wrong with the pCodecContext setup. Can someone send me a working example?
Thank you
Check out this tutorial. It should be able to decode any video type including H.264:
http://dranger.com/ffmpeg/
I don't know exactly what is causing the problem, but I suspect it has something to do with the fact that you are not using av_read_frame from libavformat to parse out a frame's worth of data at a time. H.264 can split a frame into multiple slices and therefore multiple NAL units.
I am pretty sure the x264 encoder does not do this by default and produces one NAL unit per frame. However, there are NAL units with other stream information that need to be fed to the decoder. I have experimented with this in the past, and when av_read_frame parses out a frame's worth of data, it sometimes contains multiple NAL units. I would suggest following the tutorial closely and seeing if that works.
Another thing: I think you do need to pass the first 4 bytes of the NAL unit into avcodec_decode_video if that is the start code you are talking about (0x00000001). Having investigated the output of av_read_frame, the start codes are still in the data when it is passed to the decoder.
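As a quick sanity check that each buffer you hand the decoder still begins with a start code, a tiny standalone helper (illustrative only, independent of libavcodec):

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if `buf` begins with a 3-byte (0x000001) or 4-byte
 * (0x00000001) Annex B start code, 0 otherwise. */
static int has_start_code(const uint8_t *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0 && buf[1] == 0 && buf[2] == 1)
        return 1;
    if (len >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 1)
        return 1;
    return 0;
}
```

If this returns 0 for the data you pass in (as with nalData2, which had its first 4 bytes stripped), the decoder is likely to refuse the buffer.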
Try this after the codec context instantiation code:
if(pCodec->capabilities & CODEC_CAP_TRUNCATED)
pCodecContext->flags |= CODEC_FLAG_TRUNCATED; /* We may send incomplete frames */
/* CODEC_FLAG2_CHUNKS is a flags2 value, not a capability bit, so there
   is no corresponding bit to test in pCodec->capabilities */
pCodecContext->flags2 |= CODEC_FLAG2_CHUNKS;