Anyone familiar with mp4 data structure? - data-structures

Where in the MP4 file structure is the file's duration stored?

This may not be the answer to your problem but it was to mine: http://mediainfo.sourceforge.net/
(It has a library and it's open source so you can just check for the part(s) you need)

See https://github.com/sannies/mp4parser project. It is a Java library that shows the structure of mp4 files.

For the Red5 MP4 reader I used the "mvhd" atom, since it contains both the time scale and duration fields. How you read the duration depends on the atom's version, as the example below shows:
public long create_full_atom(MP4DataStream bitstream) throws IOException {
long value = bitstream.readBytes(4);
version = (int)value >> 24;
flags = (int)value & 0xffffff;
readed += 4;
return readed;
}
public long create_movie_header_atom(MP4DataStream bitstream) throws IOException {
create_full_atom(bitstream);
if (version == 1) {
creationTime = createDate(bitstream.readBytes(8));
modificationTime = createDate(bitstream.readBytes(8));
timeScale = (int)bitstream.readBytes(4);
duration = bitstream.readBytes(8);
readed += 28;
} else {
creationTime = createDate(bitstream.readBytes(4));
modificationTime = createDate(bitstream.readBytes(4));
timeScale = (int)bitstream.readBytes(4);
duration = bitstream.readBytes(4);
readed += 16;
}
int qt_preferredRate = (int)bitstream.readBytes(4);
int qt_preferredVolume = (int)bitstream.readBytes(2);
bitstream.skipBytes(10);
long qt_matrixA = bitstream.readBytes(4);
long qt_matrixB = bitstream.readBytes(4);
long qt_matrixU = bitstream.readBytes(4);
long qt_matrixC = bitstream.readBytes(4);
long qt_matrixD = bitstream.readBytes(4);
long qt_matrixV = bitstream.readBytes(4);
long qt_matrixX = bitstream.readBytes(4);
long qt_matrixY = bitstream.readBytes(4);
long qt_matrixW = bitstream.readBytes(4);
long qt_previewTime = bitstream.readBytes(4);
long qt_previewDuration = bitstream.readBytes(4);
long qt_posterTime = bitstream.readBytes(4);
long qt_selectionTime = bitstream.readBytes(4);
long qt_selectionDuration = bitstream.readBytes(4);
long qt_currentTime = bitstream.readBytes(4);
long nextTrackID = bitstream.readBytes(4);
readed += 80;
return readed;
}
On a side note I used the values to calculate play time and fps like so:
double fps = (videoSampleCount * timeScale) / (double) duration;
double videoTime = ((double) duration / (double) timeScale);
The videoSampleCount variable comes from the "stsz" atom.
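For readers who are not on Java, the same version-dependent read can be sketched in C++. This is only a minimal sketch, assuming C++17 and assuming you already have the mvhd payload (the bytes after the 8-byte size/type header) in memory; the names MovieHeader, readBE, parseMvhd and durationSeconds are made up for illustration and are not part of Red5 or any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type for this sketch (not part of Red5 or any library).
struct MovieHeader {
    std::uint32_t timeScale;  // time units per second
    std::uint64_t duration;   // movie length, expressed in timeScale units
};

// Read an n-byte big-endian unsigned integer starting at data[pos].
static std::uint64_t readBE(const std::vector<std::uint8_t>& data, std::size_t pos, std::size_t n) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value = (value << 8) | data[pos + i];
    }
    return value;
}

// Parse the payload of an "mvhd" atom (the bytes after the size/type header).
// Returns std::nullopt if the payload is too short for the declared version.
std::optional<MovieHeader> parseMvhd(const std::vector<std::uint8_t>& payload) {
    if (payload.size() < 4) return std::nullopt;
    const std::uint8_t version = payload[0];  // 1 byte of version, then 3 bytes of flags
    // Version 1 stores creation time, modification time and duration in 8 bytes each;
    // any other version is treated like version 0 (4 bytes each), as in the Java code above.
    const std::size_t timeWidth = (version == 1) ? 8 : 4;
    const std::size_t needed = 4 + 2 * timeWidth + 4 + timeWidth;
    if (payload.size() < needed) return std::nullopt;

    std::size_t pos = 4 + 2 * timeWidth;  // skip version/flags, creation and modification time
    MovieHeader header{};
    header.timeScale = static_cast<std::uint32_t>(readBE(payload, pos, 4));
    pos += 4;
    header.duration = readBE(payload, pos, timeWidth);
    return header;
}

// Play time in seconds = duration / timeScale.
double durationSeconds(const MovieHeader& header) {
    assert(header.timeScale != 0);
    return static_cast<double>(header.duration) / header.timeScale;
}
```

With a time scale of 1000 and a duration of 5000, durationSeconds gives 5 seconds, which matches the videoTime calculation above.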

As far as I know, the "mp4" container is derived from the QuickTime atom structure. You can read the description of the QuickTime File Format.
Parsing QuickTime atoms is not a big deal (look at the AtomicParsley project). I'm not sure about MP4, but for MOV files there is a "duration" field in the "mvhd" (movie header) atom and also in the "tkhd" (track header) atom. This duration is expressed in units of the "time scale" attribute, so dividing it by the time scale gives seconds.
Time scale can be found in the same atoms.

MP4 is a "container" format, which basically means it can contain a number of different audio or video streams. And each stream could have its own duration value...
To dig out what you need, you're going to want some more reference files. I might suggest looking here and here... but you'll probably have to go searching beyond that for the different types of A/V streams you want to support.

Basically MP4 structure is a tree.
Macro areas are:
ftyp - file type
moov - contains metadata (song title, authors, URL, and other info)
free - empty area to separate header and data
mdat - contains the audio frames
You can try this freeware MP4 Analyzer tool
http://www.thinmultimedia.co.kr/products/MP4Reader_download.html
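If you would rather look at the top level of that tree yourself, here is a rough C++ sketch. It assumes the usual box header layout (a 4-byte big-endian size followed by a 4-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end of the file) and that the whole file is already in memory. BoxInfo and listTopLevelBoxes are hypothetical names, not from any tool mentioned here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for this sketch: one top-level box (atom).
struct BoxInfo {
    std::string type;      // four-character code, e.g. "ftyp", "moov", "mdat"
    std::uint64_t offset;  // byte offset of the box header within the file
    std::uint64_t size;    // total box size in bytes, header included
};

// Read an n-byte big-endian unsigned integer starting at data[pos].
static std::uint64_t readBE(const std::vector<std::uint8_t>& data, std::size_t pos, std::size_t n) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value = (value << 8) | data[pos + i];
    }
    return value;
}

// List the top-level boxes of an MP4 file held in memory.
// Stops quietly at the first truncated or malformed box.
std::vector<BoxInfo> listTopLevelBoxes(const std::vector<std::uint8_t>& file) {
    std::vector<BoxInfo> boxes;
    std::size_t pos = 0;
    while (file.size() - pos >= 8) {
        std::uint64_t size = readBE(file, pos, 4);
        std::string type;
        for (std::size_t i = 4; i < 8; ++i) {
            type.push_back(static_cast<char>(file[pos + i]));
        }
        std::uint64_t headerSize = 8;
        if (size == 1) {
            // A 64-bit size follows the type field.
            if (file.size() - pos < 16) break;
            size = readBE(file, pos + 8, 8);
            headerSize = 16;
        } else if (size == 0) {
            // The box extends to the end of the file.
            size = file.size() - pos;
        }
        if (size < headerSize || size > file.size() - pos) break;
        boxes.push_back(BoxInfo{type, pos, size});
        pos += static_cast<std::size_t>(size);
    }
    return boxes;
}
```

To reach mvhd you would apply the same walk again to the contents of the moov box, since moov is itself a container of child boxes.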

Duration of the movie is in the movie header mvhd.
The duration in seconds is derived from two fields in mvhd:
4 byte time scale
4 byte duration (8 bytes when the mvhd version is 1)
These are lines 380 and 382 in the spec posted by @Tom Brito.
So given timescale 'ts' and duration 'dur'
Duration in seconds = dur / ts

Using MP4Parser http://code.google.com/p/mp4parser/ as previous poster mentioned - they even have a sample that provides duration:
https://mp4parser.googlecode.com/svn/trunk/examples/src/main/java/com/googlecode/mp4parser/GetDuration.java

Media Box Viewer can be used. It is an MP4 and QuickTime parser. When you open a QuickTime file, you can see the atom structure. Look for the video description atom; one of its properties is the duration. Media Box Viewer can be downloaded from www.jdxsoftware.org.

Related

FFmpeg libavcodec decoder then re-encode video issue

I'm trying to use the libavcodec library in FFmpeg to decode and then re-encode an H.264 video.
I have the decoding part working (it renders to an SDL window fine), but when I try to re-encode the frames I get bad data in the re-encoded video's samples.
Here is a cut-down code snippet of my encode logic.
EncodeResponse H264Codec::EncodeFrame(AVFrame* pFrame, StreamCodecContainer* pStreamCodecContainer, AVPacket* pPacket)
{
int result = 0;
result = avcodec_send_frame(pStreamCodecContainer->pEncodingCodecContext, pFrame);
if(result < 0)
{
return EncodeResponse::Fail;
}
while (result >= 0)
{
result = avcodec_receive_packet(pStreamCodecContainer->pEncodingCodecContext, pPacket);
// If the encoder needs more frames to create a packet then return and wait for
// this method to be called again once a new frame is present.
// Else check if we have failed to encode for some reason.
// Else a packet has successfully been returned, then write it to the file.
if (result == AVERROR(EAGAIN) || result == AVERROR_EOF)
{
// Higher level logic decodes the next frame from the source
// video then calls this method again.
return EncodeResponse::SendNextFrame;
}
else if (result < 0)
{
return EncodeResponse::Fail;
}
else
{
// Prepare packet for muxing.
if (pStreamCodecContainer->codecType == AVMEDIA_TYPE_VIDEO)
{
av_packet_rescale_ts(m_pPacket, pStreamCodecContainer->pEncodingCodecContext->time_base,
m_pDecodingFormatContext->streams[pStreamCodecContainer->streamIndex]->time_base);
}
m_pPacket->stream_index = pStreamCodecContainer->streamIndex;
int result = av_interleaved_write_frame(m_pEncodingFormatContext, m_pPacket);
av_packet_unref(m_pPacket);
}
}
return EncodeResponse::EncoderEndOfFile;
}
One strange behaviour I notice is that I have to send 50+ frames to avcodec_send_frame before I get the first packet from avcodec_receive_packet.
I built a debug build of FFmpeg, and stepping into the code I noticed that AVERROR(EAGAIN) is returned by avcodec_receive_packet because of the following check in x264encoder::encode in encoder.c
if( h->frames.i_input <= h->frames.i_delay + 1 - h->i_thread_frames )
{
/* Nothing yet to encode, waiting for filling of buffers */
pic_out->i_type = X264_TYPE_AUTO;
return 0;
}
For some reason my codec context (h) never has any frames. I have spent a long time debugging FFmpeg to determine what I'm doing wrong, but I have reached the limit of my video codec knowledge (which is little).
I'm testing this with a video that has no audio to reduce complication.
I have created a cut-down version of my application and provided a self-contained project (with the FFmpeg and SDL built dependencies). Hopefully this can help anyone willing to help me :).
Project Link
https://github.com/maxhap/video-codec
After looking into encoder initialisation I found that I have to set the AV_CODEC_FLAG_GLOBAL_HEADER flag on the codec context before calling avcodec_open2
pStreamCodecContainer->pEncodingCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
This change led to the re-encoded moov box looking much healthier (I used MP4Box.js to parse it). However, the video still does not play correctly: the output video has grey frames at the start when played in VLC and won't play in other players.
I have since tried creating an encoding context via the sample code, rather than using my decoding codec parameters. This fixed the bad data/encoding issue. However, my DTS times are scaling to huge numbers.
Here is my new codec init:
if (pStreamCodecContainer->codecType == AVMEDIA_TYPE_VIDEO)
{
pStreamCodecContainer->pEncodingCodecContext->height = pStreamCodecContainer->pDecodingCodecContext->height;
pStreamCodecContainer->pEncodingCodecContext->width = pStreamCodecContainer->pDecodingCodecContext->width;
pStreamCodecContainer->pEncodingCodecContext->sample_aspect_ratio = pStreamCodecContainer->pDecodingCodecContext->sample_aspect_ratio;
/* take first format from list of supported formats */
if (pStreamCodecContainer->pEncodingCodec->pix_fmts)
{
pStreamCodecContainer->pEncodingCodecContext->pix_fmt = pStreamCodecContainer->pEncodingCodec->pix_fmts[0];
}
else
{
pStreamCodecContainer->pEncodingCodecContext->pix_fmt = pStreamCodecContainer->pDecodingCodecContext->pix_fmt;
}
/* video time_base can be set to whatever is handy and supported by encoder */
pStreamCodecContainer->pEncodingCodecContext->time_base = av_inv_q(pStreamCodecContainer->pDecodingCodecContext->framerate);
pStreamCodecContainer->pEncodingCodecContext->sample_aspect_ratio = pStreamCodecContainer->pDecodingCodecContext->sample_aspect_ratio;
}
else
{
pStreamCodecContainer->pEncodingCodecContext->channel_layout = pStreamCodecContainer->pDecodingCodecContext->channel_layout;
pStreamCodecContainer->pEncodingCodecContext->channels =
av_get_channel_layout_nb_channels(pStreamCodecContainer->pEncodingCodecContext->channel_layout);
/* take first format from list of supported formats */
pStreamCodecContainer->pEncodingCodecContext->sample_fmt = pStreamCodecContainer->pEncodingCodec->sample_fmts[0];
pStreamCodecContainer->pEncodingCodecContext->time_base = AVRational{ 1, pStreamCodecContainer->pEncodingCodecContext->sample_rate };
}
Any ideas why my DTS time is re-scaling incorrectly?
I managed to fix the DTS scaling by using the time_base value directly from the decoding streams.
So
pStreamCodecContainer->pEncodingCodecContext->time_base = m_pDecodingFormatContext->streams[pStreamCodecContainer->streamIndex]->time_base;
Instead of
pStreamCodecContainer->pEncodingCodecContext->time_base = av_inv_q(pStreamCodecContainer->pDecodingCodecContext->framerate);
I will create an answer based on all my findings.
To fix the initial problem of a corrupted moov box I had to add the AV_CODEC_FLAG_GLOBAL_HEADER flag to the encoding codec context before calling avcodec_open2.
encCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
The next issue was badly scaled DTS values in the encoded packets, which had the side effect of the final mp4 duration being hundreds of hours long. To fix this I had to change the encoding codec context time base to that of the decoding format context's stream time base. This is different from using av_inv_q(framerate) as suggested in the avcodec transcoding example.
encCodecContext->time_base = decCodecFormatContext->streams[streamIndex]->time_base;
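To see how a time base mismatch can inflate timestamps this badly, here is a small self-contained C++ sketch of the rescaling arithmetic. It does not call FFmpeg; TimeBase and rescale are hypothetical stand-ins for the idea behind rational time base conversion, and the values 1/30 and 1/15360 are made-up examples. It shows one plausible way the inflation arises, not a confirmed diagnosis of the code above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a rational time base: one tick lasts num / den seconds.
struct TimeBase {
    std::int64_t num;
    std::int64_t den;
};

// Convert a timestamp from one time base to another, rounding to the nearest tick.
// Simplified for illustration: non-negative timestamps only, no overflow protection.
std::int64_t rescale(std::int64_t ts, TimeBase from, TimeBase to) {
    assert(ts >= 0 && from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    const std::int64_t numerator = ts * from.num * to.den;
    const std::int64_t denominator = from.den * to.num;
    return (numerator + denominator / 2) / denominator;
}
```

With an assumed frame time base of 1/30 and stream time base of 1/15360, rescale(30, {1, 30}, {1, 15360}) gives 15360, i.e. one second in both. But if the value 15360 is already in stream units and is rescaled as though it were in 1/30 units, the result is 7864320, which is 512 seconds instead of 1. When both time bases are the same, the rescale leaves the value unchanged, which is consistent with the fix described above.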

avcodec_open only works with uncompressed format

Context: I have a file called libffmpeg.so that I took from the APK of an Android application that uses FFmpeg to encode and decode files between several codecs. Thus, I take for granted that it was compiled with the encoding options enabled and that this .so file contains all the codecs somewhere. The file is compiled for ARM (what we call the ARMEABI profile on Android).
I also have a very complete class with interops to call the FFmpeg API. Whatever the origin of this library, all call responses are good and most endpoints exist. If not, I add them or fix the deprecated ones.
When I want to create an ffmpeg Encoder, the returned encoder is correct.
var thisIsSuccessful = avcodec_find_encoder(myAVCodec.id);
Now, I have a problem with Codecs. The problem is that - let's say that out of curiosity - I iterate through the list of all the codecs to see which one I'm able to open with the avcodec_open call ...
AVCodec codec;
var res = FFmpeg.av_codec_next(&codec);
while((res = FFmpeg.av_codec_next(res)) != null)
{
var name = res->longname;
AVCodec* encoder = FFmpeg.avcodec_find_encoder(res->id);
if (encoder != null) {
AVCodecContext c = new AVCodecContext ();
/* put sample parameters */
c.bit_rate = 64000;
c.sample_rate = 22050;
c.channels = 1;
if (FFmpeg.avcodec_open (ref c, encoder) >= 0) {
System.Diagnostics.Debug.WriteLine ("[YES] - " + name);
}
} else {
System.Diagnostics.Debug.WriteLine ("[NO ] - " + name);
}
}
... then only uncompressed codecs are working. (YUV, FFmpeg Video 1, etc)
My hypotheses are:
An option was missing when the .so file was compiled.
The avcodec_open call behaves differently depending on the properties of the AVCodecContext I've referenced in the call.
I'm really curious why only a minimal set of uncompressed codecs is returned.
[EDIT]
@ronald-s-bultje's answer led me to read the AVCodecContext API description, and there are a lot of mandatory fields marked "MUST be set by user" when used on an encoder. Setting a value for these parameters on AVCodecContext made most of the nice codecs available:
c.time_base = new AVRational (); // Output framerate. Here, 30fps
c.time_base.num = 1;
c.time_base.den = 30;
c.me_method = 1; // Motion-estimation mode on compression -> 1 is none
c.width = 640; // Source width
c.height = 480; // Source height
c.gop_size = 30; // Used by h264. Just here for test purposes.
c.bit_rate = c.width * c.height * 4; // Randomly set to that...
c.pix_fmt = FFmpegSharp.Interop.Util.PixelFormat.PIX_FMT_YUV420P; // Source pixel format
"The avcodec_open call behaves differently depending on the properties of the AVCodecContext I've referenced in the call."
It's basically that. I mean, for the video encoders, you didn't even set width/height, so most encoders really can't be expected to do anything useful like this, and are right to error out.
You can set default parameters using e.g. avcodec_get_context_defaults3(), which should go a long way toward getting some useful settings into the AVCodecContext. After that, set typical ones like width/height/pix_fmt to the values describing your input format. (If you want to do audio encoding, which is actually surprisingly unclear from your question, you'll need to set some different ones like sample_fmt/sample_rate/channels, but it's the same idea.) And then you should be relatively good to go.

Extracting Frames From A Video In Matlab

I was trying to extract frames from a small video using the following lines of code :
clc;
close all;
% Open a sample AVI file
[FileName,PathName] = uigetfile('*.AVI','Select the Video');
file = fullfile(PathName,FileName);
%filename = '.\003.AVI';
mov = MMREADER(file);
% Output folder
outputFolder = fullfile(cd, 'frames');
if ~exist(outputFolder, 'dir')
mkdir(outputFolder);
end
%getting no of frames
numberOfFrames = mov.NumberOfFrames;
numberOfFramesWritten = 0;
for frame = 1 : numberOfFrames
thisFrame = read(mov, frame);
outputBaseFileName = sprintf('%3.3d.png', frame);
outputFullFileName = fullfile(outputFolder, outputBaseFileName);
imwrite(thisFrame, outputFullFileName, 'png');
progressIndication = sprintf('Wrote frame %4d of %d.', frame,numberOfFrames);
disp(progressIndication);
numberOfFramesWritten = numberOfFramesWritten + 1;
end
progressIndication = sprintf('Wrote %d frames to folder "%s"',numberOfFramesWritten,outputFolder);
disp(progressIndication);
However, I am getting the following error on running this code :
??? Error using ==> extract at 10
The file requires the following codec(s) to be installed on your system:
Unknown Codec
Can someone help me sort out this error? Thanks.
The file seems to be encoded with an unknown video codec (probably unknown to MATLAB). The file extension (.avi, .mpeg, etc.) does not denote a codec but rather a container, if I'm not mistaken.
The links at the bottom provide some information about the file formats MATLAB supports. You should find out which container and codec your video file uses and see if MATLAB supports them. One way of retrieving the codec is to open the file in VLC media player (by VideoLAN), right-click the movie and choose extra -> codec information; on Windows, simply open the movie in VLC and press CTRL+J.
Some useful links:
http://www.mathworks.nl/help/matlab/ref/mmreader-class.html
http://www.mathworks.nl/help/matlab/import_export/supported-video-file-formats.html
http://www.videolan.org/vlc/
Kind regards,
Ernst Jan
Instead of MMREADER, I used the following lines of code :
movieInfo = aviinfo(movieFullFileName);
mov = aviread(movieFullFileName);
% movie(mov);
% Determine how many frames there are.
numberOfFrames = size(mov, 2);
numberOfFramesWritten = 0;
It worked.

Is packet duration guaranteed to be uniform for entire stream?

I use packet duration to translate from frame index to pts and back, and I'd like to be sure that this is a reliable method of doing so.
Alternatively, is there a better way to translate pts to a frame index and vice versa?
A snippet showing my usage:
bool seekFrame(int64_t frame)
{
if(frame > container.frameCount)
frame = container.frameCount;
// Seek to a frame behind the desired frame because nextFrame() will also increment the frame index
int64_t seek = pts_cache[frame-1]; // pts_cache is an array of all frame pts values
// get the nearest prior keyframe
int preceedingKeyframe = av_index_search_timestamp(container.video_st, seek, AVSEEK_FLAG_BACKWARD);
// here's where I'm worried that packetDuration isn't a reliable method of translating frame index to
// pts value
int64_t nearestKeyframePts = preceedingKeyframe * container.packetDuration;
avcodec_flush_buffers(container.pCodecCtx);
int ret = av_seek_frame(container.pFormatCtx, container.videoStreamIndex, nearestKeyframePts, AVSEEK_FLAG_ANY);
if(ret < 0) return false;
container.lastPts = nearestKeyframePts;
AVFrame *pFrame = NULL;
while(nextFrame(pFrame, NULL) && container.lastPts < seek)
{
;
}
container.currentFrame = frame-1;
av_free(pFrame);
return true;
}
No, not guaranteed. It may work with some codec/container combinations where the frame rate is static; avi, raw h264 (annex-b) and yuv4mpeg come to mind. But other containers like flv, mp4 and ts have a PTS/DTS (or CTS) for EVERY frame. The source could be variable frame rate, or frames could have been dropped at some point during processing due to bandwidth. Also, some codecs will remove duplicate frames.
So unless you created the file yourself, do not trust it. There is no guaranteed way to look at a frame and know its 'index' except to start at the beginning and count.
Your method MAY be good enough for most files, however.
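Since the question already builds a pts_cache by reading every frame, one option is to use that cache for both directions instead of multiplying by a packet duration. The sketch below is plain C++17 with no FFmpeg calls. It assumes the cache is sorted ascending in presentation order, and frameIndexForPts and ptsForFrameIndex are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// ptsCache[i] is assumed to hold the pts of frame i, sorted ascending
// (presentation order), built by counting frames from the start of the stream.

// Index of the frame being displayed at the given pts, i.e. the last frame
// whose pts is <= the requested value. Empty if pts precedes the first frame.
std::optional<std::size_t> frameIndexForPts(const std::vector<std::int64_t>& ptsCache,
                                            std::int64_t pts) {
    auto it = std::upper_bound(ptsCache.begin(), ptsCache.end(), pts);
    if (it == ptsCache.begin()) return std::nullopt;
    return static_cast<std::size_t>((it - ptsCache.begin()) - 1);
}

// pts of a given frame index. Empty if the index is out of range.
std::optional<std::int64_t> ptsForFrameIndex(const std::vector<std::int64_t>& ptsCache,
                                             std::size_t index) {
    if (index >= ptsCache.size()) return std::nullopt;
    return ptsCache[index];
}
```

For a made-up variable-rate cache of {0, 512, 1024, 2048, 2560}, the pts 2048 belongs to frame 3, whereas dividing by a uniform packet duration of 512 would give 4. That is the kind of drift the answer above warns about.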

Video decoding using ffms2 (ffmpegsource)

I'm using ffms2 (aka FFmpegSource) to decode video frames and display them in a UI based on wxWidgets.
My player works fine for low-resolution video (320*240, 640*480), but for higher resolution (1080) it is very slow. I'm not able to meet the desired frame rate for high-resolution video.
After timing analysis I found that the FFMS_GetFrame() function takes much longer for high-resolution frames.
Here are the results.
1. 320*240 FFMS_GetFrame takes 4-6 ms
2. 640*480 FFMS_GetFrame takes >20 ms
3. 1080*720 FFMS_GetFrame takes >40 ms
Which means that I'll never meet the 30 fps requirement for 1080p frames with FFMS2. But I'm not sure if this is the case.
Please suggest what could be going wrong.
void SetPosition(int64 pos)
{
uint8_t* data_ptr = NULL;
/*check if position is valid*/
if (!m_track || pos < 0 || pos > m_videoProp->NumFrames - 1)
return; // ERR_POS;
wxMilliClock_t start_wx_t = wxGetLocalTimeMillis();
long long start_t = start_wx_t.GetValue();
m_frameId = pos;
if(m_video)
{
m_frameProp = FFMS_GetFrame(m_video, m_frameId, &m_errInfo);
if(!m_frameProp) return;
if(m_frameProp)
{
m_width_ffms2 = m_frameProp->EncodedWidth;
m_height_ffms2 = m_frameProp->EncodedHeight;
}
wxMilliClock_t end_wx_t = wxGetLocalTimeMillis();
long long end_t = end_wx_t.GetValue();
long long diff_t = end_t - start_t;
wxLogDebug(wxString(wxT("Frame Grabe Millisec") + ToString(diff_t)));
//m_frameInfo = FFMS_GetFrameInfo(m_track, FFMS_TYPE_VIDEO);
/* If you want to change the output colorspace or resize the output frame size, now is the time to do it.
IMPORTANT: This step is also required to prevent resolution and colorspace changes midstream. You can
always tell a frame's original properties by examining the Encoded properties in FFMS_Frame. */
/* A -1 terminated list of the acceptable output formats (see pixfmt.h for the list of pixel formats/colorspaces).
To get the name of a given pixel format, strip the leading PIX_FMT_ and convert to lowercase. For example,
PIX_FMT_YUV420P becomes "yuv420p". */
#if 0
int pixfmt[2];
pixfmt[0] = FFMS_GetPixFmt("bgr24");
pixfmt[1] = -1;
#endif
// FFMS_SetOutputFormatV2 returns 0 on success. It Returns non-0 and sets ErrorMsg on failure.
int failure = FFMS_SetOutputFormatV2(m_video, pixfmt, m_width_ffms2, m_height_ffms2, FFMS_RESIZER_BICUBIC, &m_errInfo);
if (failure)
{
//FFMS_DestroyVideoSource(m_video);
//m_video = NULL;
return; //return ERR_POS;
}
data_ptr = m_frameProp->Data[0];
}
else
{
m_width_ffms2 = 320;
m_height_ffms2 = 240;
}
if(data_ptr)
{
memcpy(m_buf, data_ptr, 3*m_height_ffms2 * m_width_ffms2);
}
else
{
memset(m_buf, 0, 3*m_height_ffms2 * m_width_ffms2);
}
}
Slower video decoding with larger frames is totally normal. 1080x720 has about ten times as many pixels as 320x240, so having GetFrame take about ten times as long is not surprising (it's not a strictly linear relationship, as there are a lot of other factors that play into decoding speed, but pixel count and time to decode are fairly correlated).
Setting the output format for every frame is unnecessary and will make things a lot slower. Unless you specifically want the output format to change, you should call it just once after opening the video, and it will apply to all frames requested after that.
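The arithmetic behind the "about ten times" figure and the 30 fps target can be checked with a few lines of C++. This is just a back-of-the-envelope sketch; pixelRatio, frameBudgetMs and meetsFrameRate are hypothetical helper names and nothing here touches ffms2.

```cpp
#include <cassert>

// Ratio of pixel counts between frame size A and frame size B.
double pixelRatio(int widthA, int heightA, int widthB, int heightB) {
    assert(widthB > 0 && heightB > 0);
    return (static_cast<double>(widthA) * heightA) /
           (static_cast<double>(widthB) * heightB);
}

// Time available per frame, in milliseconds, at a given frame rate.
double frameBudgetMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// True if a measured per-frame time fits within the budget for that frame rate.
bool meetsFrameRate(double perFrameMs, double fps) {
    return perFrameMs <= frameBudgetMs(fps);
}
```

pixelRatio(1080, 720, 320, 240) comes out at 10.125, and the budget at 30 fps is roughly 33 ms per frame, so a 40 ms GetFrame call misses it while a 6 ms call fits comfortably. The measured times in the question include more than pure decoding, so treat this as a rough guide only.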

Resources