Video decoding using ffms2 (ffmpegsource) - ffmpeg

I'm using ffms2 (aka FFmpegSource) to decode video frames and display them in a UI based on wxWidgets.
My player works fine for low-resolution video (320*240, 640*480), but for higher resolutions (1080) it is very slow. I'm not able to meet the desired frame rate for high-resolution video.
After timing analysis I found that the FFMS_GetFrame() function takes much longer for high-resolution frames.
Here are the results.
1. 320*240: FFMS_GetFrame takes 4-6 ms
2. 640*480: FFMS_GetFrame takes >20 ms
3. 1080*720: FFMS_GetFrame takes >40 ms
This suggests I'll never meet the 30 fps requirement for 1080p frames with FFMS2, but I'm not sure if that is really the case.
Please suggest what could be going wrong.
void SetPosition(int64 pos)
{
    uint8_t* data_ptr = NULL;

    /* Check that the position is valid. */
    if (!m_track || pos < 0 || pos > m_videoProp->NumFrames - 1)
        return; // ERR_POS;

    wxMilliClock_t start_wx_t = wxGetLocalTimeMillis();
    long long start_t = start_wx_t.GetValue();
    m_frameId = pos;

    if (m_video)
    {
        m_frameProp = FFMS_GetFrame(m_video, m_frameId, &m_errInfo);
        if (!m_frameProp)
            return;

        m_width_ffms2 = m_frameProp->EncodedWidth;
        m_height_ffms2 = m_frameProp->EncodedHeight;

        wxMilliClock_t end_wx_t = wxGetLocalTimeMillis();
        long long end_t = end_wx_t.GetValue();
        long long diff_t = end_t - start_t;
        wxLogDebug(wxString(wxT("Frame grab millisec ") + ToString(diff_t)));

        //m_frameInfo = FFMS_GetFrameInfo(m_track, FFMS_TYPE_VIDEO);

        /* If you want to change the output colorspace or resize the output frame size, now is the time to do it.
           IMPORTANT: This step is also required to prevent resolution and colorspace changes midstream. You can
           always tell a frame's original properties by examining the Encoded properties in FFMS_Frame. */
        /* A -1 terminated list of the acceptable output formats (see pixfmt.h for the list of pixel formats/colorspaces).
           To get the name of a given pixel format, strip the leading PIX_FMT_ and convert to lowercase. For example,
           PIX_FMT_YUV420P becomes "yuv420p". */
        int pixfmt[2];
        pixfmt[0] = FFMS_GetPixFmt("bgr24");
        pixfmt[1] = -1;

        // FFMS_SetOutputFormatV2 returns 0 on success; it returns non-0 and sets ErrorMsg on failure.
        int failure = FFMS_SetOutputFormatV2(m_video, pixfmt, m_width_ffms2, m_height_ffms2, FFMS_RESIZER_BICUBIC, &m_errInfo);
        if (failure)
        {
            //FFMS_DestroyVideoSource(m_video);
            //m_video = NULL;
            return; // ERR_POS;
        }
        data_ptr = m_frameProp->Data[0];
    }
    else
    {
        m_width_ffms2 = 320;
        m_height_ffms2 = 240;
    }

    if (data_ptr)
        memcpy(m_buf, data_ptr, 3 * m_height_ffms2 * m_width_ffms2);
    else
        memset(m_buf, 0, 3 * m_height_ffms2 * m_width_ffms2);
}

Slower video decoding with larger frames is totally normal. 1080x720 has about ten times as many pixels as 320x240, so GetFrame taking about ten times as long is not surprising (the relationship isn't strictly linear, since many other factors play into decoding speed, but pixel count and decoding time are fairly well correlated).
Setting the output format for every frame is unnecessary and is making things a lot slower. Unless you specifically want the output format to change, call FFMS_SetOutputFormatV2 just once after opening the video; it will then apply to all frames requested after that.
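For example, a minimal sketch of that one-time setup right after creating the video source (reusing the member names from the question; error handling elided):
// One-time conversion setup after FFMS_CreateVideoSource(), not per frame.
int pixfmts[2];
pixfmts[0] = FFMS_GetPixFmt("bgr24");
pixfmts[1] = -1; // the list is -1 terminated

const FFMS_Frame* first = FFMS_GetFrame(m_video, 0, &m_errInfo);
if (first && !FFMS_SetOutputFormatV2(m_video, pixfmts,
                                     first->EncodedWidth, first->EncodedHeight,
                                     FFMS_RESIZER_BICUBIC, &m_errInfo))
{
    // Success: every later FFMS_GetFrame() call returns bgr24 at this size,
    // so SetPosition() only needs FFMS_GetFrame() plus the memcpy.
}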

Related

Text typing effect in C# based on current FPS and characters per second?

I am working on a typing effect for TextMeshPro texts in Unity that should take into account both the current FPS and a user-supplied 'characters per second' value (to determine the speed).
The implementation runs in an IEnumerator and displays one or more characters of a given text at a time. After displaying the char(s), there is a 'yield return new WaitForSeconds()' before the next round of revealing begins. (It should be possible to display more than one char at a time, because WaitForSeconds() takes too much time in some cases, even for very small values. So rather than waiting after every single char, the coroutine should wait after a precomputed number of chars to maintain the specified typing speed.)
I'm not sure my approach works as intended in the different possible scenarios, and additionally I'm not happy with the computations inside the IEnumerator, because I think they could slow down the revealing process.
I tested the typing effect at different FPS in PlayMode (by setting the FPS manually with "Application.targetFrameRate") and noticed that below 30 FPS the text is revealed very haltingly, which looks laggy and frustrates the viewer. Maybe someone has experience with this and can suggest a simpler implementation?
// Precompute the time for displaying a single character from a given number of characters per second:
public void GetTimePerChar() {
    if (_charsPerSecond > 0) {
        TimePerChar = 1f / _charsPerSecond;
    } else {
        TimePerChar = 0f;
    }
}

// Coroutine that reveals characters of a given text over time:
private IEnumerator DisplayText() {
    /* some other code */
    while (TmpText.maxVisibleCharacters < TotalCharCount) { // Reveal characters until the total amount of chars is reached
        if (CharsPerSecond > 0) {
            TimePerFrame = Time.deltaTime; // How much time 1 frame takes
            if (TimePerChar > 0) {
                CharsPerFrame = TimePerFrame / TimePerChar; // How many chars can be displayed within a single frame
            } else {
                CharsPerFrame = 0f;
            }
            CharsPerRound = (int)Math.Ceiling(CharsPerFrame); // Rounded-up number of chars (fractions don't make sense to display)
            TimePerRound = TimePerChar * CharsPerRound; // The individual waiting time before the next char(s) get revealed
            TmpText.maxVisibleCharacters += CharsPerRound; // Reveal one or more characters at a time
            if (CharsPerRound - CharsPerFrame > 0) { // If the number of chars got rounded up, wait as long as displaying them should take
                yield return new WaitForSeconds(TimePerRound);
            } else {
                yield return null; // Avoid revealing everything in a single frame when no rounding occurred
            }
        } else {
            TmpText.maxVisibleCharacters = TotalCharCount; // With zero CharsPerSecond, display the whole text at once
        }
    }
    /* some other code */
}

Problems with video recording logic: filling empty frames and synchronizing fps

I have some problems with my video recording logic.
My recording algorithm is shown below as pseudocode.
fps = 30
msPerFrame = 1000 / fps
videoRecorder = VideoRecorder(fps)
timer.start()
while (true) {
    if ((timer.elapsed() >= msPerFrame) && (newFrame.isReady() == true)) {
        videoRecorder.push(newFrame)
        timer.restart()
    }
}
Note that the videoRecorder determines the fps of the video file as soon as it is created and starts recording.
There are two problems:
1. What is the best way to handle the case (timer.elapsed() >= msPerFrame) && (newFrame.isReady() == false)? If I just wait for a frame to become ready, the gap is still recorded as a single frame interval despite being larger than the actual msPerFrame.
2. How do I calibrate the recording fps error? If fps=30, msPerFrame=33.3333.... However, timer.elapsed() returns a millisecond value, so timer.elapsed() >= msPerFrame may only become true once 34 ms have elapsed. So 30 newFrames captured over 1020 milliseconds get pushed into 1000 milliseconds of the resulting video.
I solved question (1) simply, using a frame buffer that holds newFrames in FIFO order. The recorder now references the last frame of the frame buffer whenever it needs to push a frame to the recording buffer, so if no newFrame is ready when timer.elapsed() >= msPerFrame, the recorder pushes the previous frame one more time.
bufferMaxSize = MAX_BUF_SIZE
frameBuffer
isRecording = false

// This callback is called when a newFrame is ready.
callbackFunction(newFrame) {
    frameBuffer.push(newFrame) // Push to the back of the buffer
    while (frameBuffer.size() > MAX_BUF_SIZE)
        frameBuffer.pop() // Remove the front of the buffer
}

startRecordingFunction() {
    fps = 30
    msPerFrame = 1000 / fps
    videoRecorder = VideoRecorder(fps)
    timer.start()
    while (isRecording) {
        if (timer.elapsed() >= msPerFrame) {
            videoRecorder.push(frameBuffer.back())
            timer.restart()
        }
    }
    videoRecorder.save("video.mp4")
}
Also, the thread running startRecordingFunction() is time-critical, so giving that thread a higher priority gave me a more stable result.
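(One way to reduce the fps error from problem (2) is to schedule each frame against an absolute start time instead of restarting a relative millisecond timer, so rounding never accumulates. A minimal C++ sketch under that assumption; the push/back calls are the hypothetical ones from the pseudocode above:)
#include <chrono>
#include <thread>

// Sketch: frame n is due at start + n/fps, measured from one fixed origin,
// so a 33 ms vs 34 ms rounding error cannot accumulate across frames.
void recordLoop(double fps, const bool& isRecording) // real code: std::atomic<bool>
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    long long frameIndex = 0;
    while (isRecording)
    {
        auto deadline = start + std::chrono::duration_cast<clock::duration>(
            std::chrono::duration<double>(frameIndex / fps));
        std::this_thread::sleep_until(deadline);
        // videoRecorder.push(frameBuffer.back()); // as in the pseudocode above
        ++frameIndex;
    }
    // videoRecorder.save("video.mp4");
}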

FFmpeg libavcodec decode then re-encode video issue

I'm trying to use the libavcodec library in FFmpeg to decode and then re-encode an h264 video.
I have the decoding part working (it renders to an SDL window fine), but when I try to re-encode the frames I get bad data in the re-encoded video's samples.
Here is a cut-down snippet of my encode logic.
EncodeResponse H264Codec::EncodeFrame(AVFrame* pFrame, StreamCodecContainer* pStreamCodecContainer, AVPacket* pPacket)
{
    int result = avcodec_send_frame(pStreamCodecContainer->pEncodingCodecContext, pFrame);
    if (result < 0)
    {
        return EncodeResponse::Fail;
    }
    while (result >= 0)
    {
        result = avcodec_receive_packet(pStreamCodecContainer->pEncodingCodecContext, pPacket);
        // If the encoder needs more frames to create a packet, return and wait for
        // this method to be called again when a new frame is present.
        // Otherwise, check whether encoding failed for some reason.
        // Otherwise, a packet has successfully been returned, so write it to the file.
        if (result == AVERROR(EAGAIN) || result == AVERROR_EOF)
        {
            // Higher-level logic decodes the next frame from the source
            // video and then calls this method again.
            return EncodeResponse::SendNextFrame;
        }
        else if (result < 0)
        {
            return EncodeResponse::Fail;
        }
        else
        {
            // Prepare the packet for muxing.
            if (pStreamCodecContainer->codecType == AVMEDIA_TYPE_VIDEO)
            {
                av_packet_rescale_ts(pPacket, pStreamCodecContainer->pEncodingCodecContext->time_base,
                    m_pDecodingFormatContext->streams[pStreamCodecContainer->streamIndex]->time_base);
            }
            pPacket->stream_index = pStreamCodecContainer->streamIndex;
            int writeResult = av_interleaved_write_frame(m_pEncodingFormatContext, pPacket);
            av_packet_unref(pPacket);
            if (writeResult < 0)
            {
                return EncodeResponse::Fail;
            }
        }
    }
    return EncodeResponse::EncoderEndOfFile;
}
One strange behaviour I noticed is that I have to send 50+ frames to avcodec_send_frame before I get the first packet from avcodec_receive_packet.
I built a debug build of FFmpeg and, stepping into the code, I noticed that AVERROR(EAGAIN) is returned by avcodec_receive_packet because of the following in x264_encoder_encode in encoder.c:
if( h->frames.i_input <= h->frames.i_delay + 1 - h->i_thread_frames )
{
    /* Nothing yet to encode, waiting for filling of buffers */
    pic_out->i_type = X264_TYPE_AUTO;
    return 0;
}
For some reason my codec context (h) never has any frames. I have spent a long time trying to debug FFmpeg and to determine what I'm doing wrong, but I have reached the limit of my video codec knowledge (which is little).
I'm testing this with a video that has no audio, to reduce complication.
I have created a cut-down version of my application and provided a self-contained project (with FFmpeg and SDL built dependencies). Hopefully this can help anyone willing to help me :).
Project Link
https://github.com/maxhap/video-codec
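(Note: encoder delay of this kind is normal: x264 buffers input frames for lookahead and B-frame decisions before emitting the first packet, and the buffered packets are retrieved at end of stream by flushing the encoder with a NULL frame. A minimal sketch, assuming the context and packet names from the snippet above:)
// Sketch: drain delayed packets at end of stream by sending a NULL frame.
avcodec_send_frame(pStreamCodecContainer->pEncodingCodecContext, NULL);
while (avcodec_receive_packet(pStreamCodecContainer->pEncodingCodecContext, pPacket) == 0)
{
    // Rescale and mux pPacket exactly as in EncodeFrame() above.
    av_packet_unref(pPacket);
}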
After looking into the encoder initialisation I found that I have to set the AV_CODEC_FLAG_GLOBAL_HEADER flag on the codec context before calling avcodec_open2:
pStreamCodecContainer->pEncodingCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
This change led to the re-encoded moov box looking much healthier (I used MP4Box.js to parse it). However, the video still does not play correctly: the output has grey frames at the start in VLC and won't play at all in other players.
I have since tried creating the encoding context the way the sample code does, rather than reusing my decoding codec parameters. This fixed the bad-data/encoding issue. However, my DTS times now scale to huge numbers.
Here is my new codec init:
if (pStreamCodecContainer->codecType == AVMEDIA_TYPE_VIDEO)
{
    pStreamCodecContainer->pEncodingCodecContext->height = pStreamCodecContainer->pDecodingCodecContext->height;
    pStreamCodecContainer->pEncodingCodecContext->width = pStreamCodecContainer->pDecodingCodecContext->width;
    pStreamCodecContainer->pEncodingCodecContext->sample_aspect_ratio = pStreamCodecContainer->pDecodingCodecContext->sample_aspect_ratio;

    /* Take the first format from the list of supported formats. */
    if (pStreamCodecContainer->pEncodingCodec->pix_fmts)
    {
        pStreamCodecContainer->pEncodingCodecContext->pix_fmt = pStreamCodecContainer->pEncodingCodec->pix_fmts[0];
    }
    else
    {
        pStreamCodecContainer->pEncodingCodecContext->pix_fmt = pStreamCodecContainer->pDecodingCodecContext->pix_fmt;
    }

    /* The video time_base can be set to whatever is handy and supported by the encoder. */
    pStreamCodecContainer->pEncodingCodecContext->time_base = av_inv_q(pStreamCodecContainer->pDecodingCodecContext->framerate);
}
else
{
    pStreamCodecContainer->pEncodingCodecContext->channel_layout = pStreamCodecContainer->pDecodingCodecContext->channel_layout;
    pStreamCodecContainer->pEncodingCodecContext->channels =
        av_get_channel_layout_nb_channels(pStreamCodecContainer->pEncodingCodecContext->channel_layout);

    /* Take the first format from the list of supported formats. */
    pStreamCodecContainer->pEncodingCodecContext->sample_fmt = pStreamCodecContainer->pEncodingCodec->sample_fmts[0];
    pStreamCodecContainer->pEncodingCodecContext->time_base = AVRational{ 1, pStreamCodecContainer->pEncodingCodecContext->sample_rate };
}
Any ideas why my DTS times are rescaling incorrectly?
I managed to fix the DTS scaling by using the time_base value directly from the decoding stream.
So
pStreamCodecContainer->pEncodingCodecContext->time_base = m_pDecodingFormatContext->streams[pStreamCodecContainer->streamIndex]->time_base;
Instead of
pStreamCodecContainer->pEncodingCodecContext->time_base = av_inv_q(pStreamCodecContainer->pDecodingCodecContext->framerate);
I will create an answer based on all my findings.
To fix the initial problem of a corrupted moov box I had to add the AV_CODEC_FLAG_GLOBAL_HEADER flag to the encoding codec context before calling avcodec_open2.
encCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
The next issue was badly scaled DTS values in the encoded packets, which had the side effect of making the final mp4's duration hundreds of hours long. To fix it I had to change the encoding codec context's time_base to the time_base of the corresponding decoding stream. This is different from using av_inv_q(framerate), as suggested in the avcodec transcoding example.
encCodecContext->time_base = decCodecFormatContext->streams[streamIndex]->time_base;
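(Putting the two fixes together, a minimal sketch of the encoder-context initialisation, using the variable names from the snippets above; gating the flag on AVFMT_GLOBALHEADER is the usual muxer-driven way to set it:)
// Sketch: apply both fixes before opening the encoder.
AVCodecContext* encCodecContext = avcodec_alloc_context3(pEncodingCodec);
// ... set width/height/pix_fmt/etc. as in the initialisation code above ...

// 1. Emit global headers when the container needs them (fixes the corrupt moov box).
if (m_pEncodingFormatContext->oformat->flags & AVFMT_GLOBALHEADER)
    encCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

// 2. Reuse the decoding stream's time_base so packet timestamps rescale correctly.
encCodecContext->time_base = m_pDecodingFormatContext->streams[streamIndex]->time_base;

if (avcodec_open2(encCodecContext, pEncodingCodec, NULL) < 0)
    return EncodeResponse::Fail;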

How to seek to a specific frame in a h.264 video?

I need to decode a video starting from a specific frame with FFmpeg. I know an h.264 video has I-frames, P-frames and B-frames. When I use av_seek_frame() to seek to a specific frame, it can only take me to the nearest preceding I-frame, so I tried to use AVPacket.dts and a while loop to navigate to the location of the specific frame, and then start to demux the video from there.
auto DurationTime = static_cast<uint64_t>(m_pAVFormatContext->streams[m_VideoStreamIndex]->duration);
auto FrameNum = m_pAVFormatContext->streams[m_VideoStreamIndex]->nb_frames;
av_seek_frame(m_pAVFormatContext, m_VideoStreamIndex, DurationTime / FrameNum * vSpecificFrame, AVSEEK_FLAG_FRAME | AVSEEK_FLAG_BACKWARD);

int TempValue = 0;
// Skip packets that belong to other streams.
while ((TempValue = av_read_frame(m_pAVFormatContext, &m_AVPacket)) >= 0 && m_AVPacket.stream_index != m_VideoStreamIndex)
{
    av_packet_unref(&m_AVPacket);
}
// Keep reading until the packet's dts reaches the requested frame.
while (m_AVPacket.dts / (DurationTime / FrameNum) < vSpecificFrame)
{
    if (m_AVPacket.data)
        av_packet_unref(&m_AVPacket);
    while ((TempValue = av_read_frame(m_pAVFormatContext, &m_AVPacket)) >= 0 && m_AVPacket.stream_index != m_VideoStreamIndex)
    {
        av_packet_unref(&m_AVPacket);
    }
}
if (m_AVPacket.data)
    av_packet_unref(&m_AVPacket);
But the problem is that when I demux the next frame, I can get its pos, data, size and all other information, yet the image cannot be displayed. All frames between this specific frame and the next I-frame have the same problem; only from the next I-frame onward do frames display correctly.
Did I use the wrong method? Or is there another way to seek to a specific frame of a video? Thanks very much.
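(The usual fix is to decode, not just demux, every frame from the preceding keyframe up to the target and discard the output; P- and B-frames cannot be reconstructed without their reference frames. A minimal sketch under that assumption, reusing the question's member names; targetPts and m_pCodecContext are hypothetical:)
// Sketch: seek back to a keyframe, then decode and discard until the target.
av_seek_frame(m_pAVFormatContext, m_VideoStreamIndex, targetPts, AVSEEK_FLAG_BACKWARD);
avcodec_flush_buffers(m_pCodecContext); // drop stale decoder state after seeking

AVFrame* frame = av_frame_alloc();
bool done = false;
while (!done && av_read_frame(m_pAVFormatContext, &m_AVPacket) >= 0)
{
    if (m_AVPacket.stream_index == m_VideoStreamIndex)
    {
        avcodec_send_packet(m_pCodecContext, &m_AVPacket);
        while (avcodec_receive_frame(m_pCodecContext, frame) == 0)
        {
            if (frame->pts >= targetPts) // first displayable frame at/after the target
            {
                done = true;
                break;
            }
        }
    }
    av_packet_unref(&m_AVPacket);
}
av_frame_free(&frame);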

Is packet duration guaranteed to be uniform for entire stream?

I use packet duration to translate from frame index to pts and back, and I'd like to be sure that this is a reliable method of doing so.
Alternatively, is there a better way to translate pts to a frame index and vice versa?
A snippet showing my usage:
bool seekFrame(int64_t frame)
{
    if (frame > container.frameCount)
        frame = container.frameCount;

    // Seek to the frame behind the desired frame because nextFrame() will also increment the frame index.
    int64_t seek = pts_cache[frame - 1]; // pts_cache is an array of all frame pts values

    // Get the nearest prior keyframe.
    int precedingKeyframe = av_index_search_timestamp(container.video_st, seek, AVSEEK_FLAG_BACKWARD);

    // Here's where I'm worried that packetDuration isn't a reliable way of
    // translating a frame index to a pts value.
    int64_t nearestKeyframePts = precedingKeyframe * container.packetDuration;

    avcodec_flush_buffers(container.pCodecCtx);
    int ret = av_seek_frame(container.pFormatCtx, container.videoStreamIndex, nearestKeyframePts, AVSEEK_FLAG_ANY);
    if (ret < 0)
        return false;

    container.lastPts = nearestKeyframePts;
    AVFrame *pFrame = NULL;
    while (nextFrame(pFrame, NULL) && container.lastPts < seek)
    {
        ; // decode forward until the target pts is reached
    }
    container.currentFrame = frame - 1;
    av_free(pFrame);
    return true;
}
No, it is not guaranteed. It may work for some codec/container combinations where the frame rate is static; avi, raw h264 (Annex B) and yuv4mpeg come to mind. But other containers like flv, mp4 and ts have a PTS/DTS (or CTS) for EVERY frame. The source could be variable frame rate, or frames could have been dropped at some point during processing due to bandwidth. Some codecs will also remove duplicate frames.
So unless you created the file yourself, do not trust it. There is no guaranteed way to look at a frame and know its 'index' except to start at the beginning and count.
Your method MAY be good enough for most files, however.
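(A sketch of the count-from-the-beginning approach described above: demux the whole file once and cache every video packet's pts, so frame-index-to-pts lookups are exact afterwards. This assumes one packet per video frame, which generally holds for video streams; the cache is sorted at the end because B-frames make pts non-monotonic in decode order:)
#include <algorithm>
#include <vector>
extern "C" {
#include <libavformat/avformat.h>
}

// Sketch: build an exact frame-index -> pts table by scanning every packet once.
std::vector<int64_t> buildPtsCache(AVFormatContext* fmtCtx, int videoStreamIndex)
{
    std::vector<int64_t> cache;
    AVPacket pkt;
    while (av_read_frame(fmtCtx, &pkt) >= 0)
    {
        if (pkt.stream_index == videoStreamIndex && pkt.pts != AV_NOPTS_VALUE)
            cache.push_back(pkt.pts);
        av_packet_unref(&pkt);
    }
    std::sort(cache.begin(), cache.end()); // decode order -> presentation order
    return cache;
}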
