Video Slideshow from png files + mp3 audio - ffmpeg

I have a bunch of .png frames and a .mp3 audio file which I would like to convert into a video. Unfortunately, the frames do not correspond to a constant frame rate. For instance, one frame may need to be displayed for 1 second, whereas another may need to be displayed for 3 seconds.
Is there any open-source software (something like ffmpeg) which would help me accomplish this? Any feedback would be greatly appreciated.
Many thanks!

This is not an elegant solution, but it will do the trick: duplicate frames as necessary so that you end up with a fairly high constant framerate, say 30 or 60 fps (or higher if you need finer time resolution). You simply switch to the next source image at the output frame closest to the exact timestamp you want. Exact duplicate frames are encoded to a tiny size (a few bytes) by any decent codec, so the result stays fairly compact. Then just encode with ffmpeg as usual.
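Here is a minimal Python sketch of that trick; the file names, durations, and the 30 fps target are all illustrative:

import shutil
import subprocess

FPS = 30
frames = ["frame0.png", "frame1.png", "frame2.png"]   # source images
durations = [1.0, 3.0, 0.5]                           # seconds each stays on screen

out_index = 0
t_end = 0.0
for src, dur in zip(frames, durations):
    t_end += dur
    # Duplicate the current image until the constant-rate output
    # timeline catches up with this frame's end timestamp.
    while out_index / FPS < t_end:
        shutil.copy(src, f"out{out_index:05d}.png")
        out_index += 1

# Encode at a constant 30 fps and mux the audio; the exact duplicate
# frames compress down to a few bytes each.
subprocess.run([
    "ffmpeg", "-framerate", str(FPS), "-i", "out%05d.png",
    "-i", "audio.mp3", "-c:v", "libx264", "-pix_fmt", "yuv420p",
    "-shortest", "slideshow.mp4",
], check=True)

Tracking the cumulative end timestamp, rather than rounding each duration separately, keeps the rounding error from accumulating across frames.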
If you have a whole lot of these and need to do it the "right" way: you can indicate the timing either in the container (mp4, mkv, etc.) or in the codec. For example, in an H.264 stream you would insert SEI messages of type pic_timing to specify the timing of each frame. Alternatively, you could write your own muxer on top of a container library (such as libmatroska for MKV or GPAC for MP4) and indicate the timing in the container. Note that not all codecs/containers support an arbitrarily variable frame rate, and only a few codecs support timing in the codec at all. Also, if timing is specified in both the container and the codec, the container timing is used (but when muxing a stream into a container, the muxer should pick up the individual frame timestamps from the codec).

Related

FFmpeg api, how to mux raw h264 without pts timestamps to mp4 or m3u8

I tried to follow this example: https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/muxing.c
Problem: my h264 stream cannot be demuxed properly, so the frames I send to the encoder have some blank fields, e.g. pkt.pts == AV_NOPTS_VALUE, which causes an error when calling the av_interleaved_write_frame (mux) function.
Considering that the framerate is not constant, how do I generate pkt.pts correctly for the video frames as I receive them from the raw live stream?
Is there any way for ffmpeg libav to automatically calculate pkt.pts, pkt.dts timestamps as I send frames to the muxer with av_interleaved_write_frame?
Quite an old question, but it's still worth answering, since FFmpeg doesn't make this easy.
Consecutive frames' PTS and DTS (in the generic case they will be the same) should equal the previous packet's PTS plus the previous packet's duration. currentPacket.duration is just what it sounds like: how long the given frame is displayed before switching to the next one. Remember that this duration is expressed in the stream's time base units, which is a rational fraction of a second (for example, a 1/50 time base means the shortest frame of that stream can last 1/50 s, or 20 ms). So you can translate the time difference between two video frames into a frame duration, i.e. when you receive a video frame, its duration is the time it takes for the next frame to arrive, again in the stream's time base. And that's all you need to calculate PTS and DTS for the frames.
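To make the arithmetic concrete, here is a small Python sketch of that calculation; the 1/90000 time base and the arrival times are illustrative:

from fractions import Fraction

TIME_BASE = Fraction(1, 90000)   # stream time base: one tick = 1/90000 s

def packets_with_pts(arrival_times):
    """arrival_times: capture time (seconds) of each raw frame."""
    pts = 0
    for t, t_next in zip(arrival_times, arrival_times[1:]):
        # pkt.duration = time until the next frame, in time-base ticks.
        duration = round((t_next - t) / TIME_BASE)
        # No B-frames in this sketch, so DTS equals PTS.
        yield {"pts": pts, "dts": pts, "duration": duration}
        pts += duration          # next PTS = previous PTS + duration

for pkt in packets_with_pts([0.0, 0.04, 0.12, 0.14]):
    print(pkt)                   # durations of 3600, 7200, 1800 ticks

In the real muxing code these three numbers go into pkt.pts, pkt.dts and pkt.duration (rescaled with av_rescale_q if the encoder and stream time bases differ) before calling av_interleaved_write_frame.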

What does ffmpeg think is the difference between an audio frame and audio sample?

Here's a curious option listed in the man pages of ffmpeg:
-aframes number (output)
Set the number of audio frames to output. This is an obsolete alias for "-frames:a", which you should use instead.
What an 'audio frame' is seems dubious to me. This SO answer says that frame is synonymous with sample, but that can't be what ffmpeg thinks a frame is. Just look at this example, where I resample some audio to 22.05 kHz and trim it to exactly 313 frames:
$ ffmpeg -i input.mp3 -frames:a 313 -ar:a 22.05K output.wav
If 'frame' and 'sample' were synonymous, we would expect the output duration to be about 0.014 seconds (313 / 22050), but the actual duration is 8 seconds. ffmpeg apparently thinks the frame rate of my input is 39.125 (313 frames / 8 seconds).
What's going on here? What does ffmpeg think an audio frame really is? And how do I find the frame rate of my input audio?
FFmpeg uses an AVFrame structure internally to convey and process all media data in chunks. The number of samples per frame depends on the decoder. For video, a frame consists of all pixel data for one picture, which is a logical grouping, although it can also contain pixel data for two half-pictures of an interlaced video stream.
For audio, decoders of DCT-based codecs typically fill a frame with the number of samples used in the DCT window - that's 1024 for AAC and 576/1152 for MP3, as Brad mentioned, depending on sampling rate. PCM samples are independent so there is no inherent concept of framing and thus frame size. However the samples still need to be accommodated within AVFrames, and ffmpeg defaults to 1024 samples per frame for planar PCM in each buffer (one for each channel).
You can use the ashowinfo filter to display the frame size. You can also use the asetnsamples filter to regroup the data in a custom frame size.
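For instance, assuming an input.mp3 in the working directory, this sketch decodes it, regroups the stream into 1024-sample frames, and prints each frame's size and timestamp:

import subprocess

subprocess.run([
    "ffmpeg", "-i", "input.mp3",
    # asetnsamples regroups frames to 1024 samples each;
    # ashowinfo then prints n, nb_samples and pts for every frame.
    "-af", "asetnsamples=n=1024,ashowinfo",
    "-f", "null", "-",   # decode and discard; we only want the log
], check=True)

The per-frame lines appear in ffmpeg's log output on stderr.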
A "frame" is a bit of an overloaded term here.
In PCM, a frame is a set of samples occurring at the same instant, one per channel. If your audio were PCM at 22.05 kHz and you had 313 PCM frames, its length in time would be about 14 milliseconds, as you expect.
However, your audio isn't PCM... it's MP3. An MP3 frame is about 26 milliseconds long. 313 of them add up to about 8 seconds. The frame here is a block of audio that cannot be decoded independently. (In fact, some frames actually depend on other frames via the bit reservoir!)
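For the arithmetic behind those numbers (the per-frame sample counts are fixed by the MP3 spec: 1152 samples at 32/44.1/48 kHz, 576 at the lower MPEG-2 rates), a quick Python check:

print(1152 / 44100)        # ≈ 0.0261 s per frame at 44.1 kHz
print(576 / 22050)         # ≈ 0.0261 s per frame at 22.05 kHz
print(313 * 576 / 22050)   # 313 frames ≈ 8.2 s, matching the question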

Using ffprobe to get number of keyframes in raw AVI file *without* processing entire file?

This question and answer cover how to get the framecount and keyframe count from an AVI file, which is very useful. I've got a raw AVI file and want to count the number of keyframes (equivalent to non-dropped frames for raw AVI), but it takes a long time to process through a raw AVI file.
There is clearly some way to get this information without fully processing the file: VirtualDub provides both the framecount and keyframe count in its file information, as well as the total keyframe size, almost instantly for a 25-second raw 1920x1080 AVI. ffprobe, however, requires count_frames to populate nb_read_frames, which takes considerable processing time.
I can do some math with the file's size and the frame's width/height/format to get a fairly good estimate of the number of frames, but I'm worried the overhead of the container could be enough to throw the math off for very short clips. (For my 25 second clip, I get 1286.12 frames, when there are really 1286.)
Any thoughts on whether there is a way to get this information programmatically with ffprobe or ffmpeg without processing the whole file? Or with another API on Windows?
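A sketch of the size-based estimate mentioned above; the default of 2 bytes per pixel assumes a UYVY-style format, so adjust bytes_per_pixel to the clip's real pixel format, and note that the RIFF/chunk headers and the index are exactly the overhead the estimate ignores:

import os

def estimate_frames(path, width, height, bytes_per_pixel=2):
    # Uncompressed frame payload size, ignoring container overhead.
    frame_bytes = width * height * bytes_per_pixel
    return os.path.getsize(path) / frame_bytes

print(estimate_frames("capture.avi", 1920, 1080))  # e.g. ~1286.12 for a 1286-frame clip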

Adjusting PTS when switching between streams

My application needs to switch between two (or more) streams at the input while there is only one output (you could think of it as a stream multiplexer). The frames from the input are decoded and then re-encoded because of some overlay processing.
To arrange the AVFrame PTS values, I calculate an interval before encoding the frames. The trouble is that when I switch between an RTMP stream and an MP4 file, the video is delayed a bit with every switch, so by the third switch the resulting stream is out of sync.
I don't know if there is something I'm missing that I need to modify on the frame before encoding. I also thought about creating an independent PTS for the output frames, but I don't know how to generate it.
The input streams can have different frame rates, time bases, or codecs, and the application must be able to deal with all of them.
I discovered the root cause.
The problem was the MP4 file. With this type of file (for some reason) the video and audio packets are read in big bunches (e.g. 20 video packets and then 20 audio packets), whereas on an RTMP stream it is more like 2 video and then 2 audio packets.
So the problem was that the switch was being applied before the whole bunch had been read (e.g. after 20 video packets but only 10 audio packets), and from that point on the resulting stream is out of sync no matter what you do.
The solution I implemented waits until a decoded frame's media type differs from the previous one's; that is when I perform the switch.
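A rough sketch of that gating logic; the helper names are hypothetical, and the real application would feed decoded libav frames through the overlay and encoder:

def mux_loop(frames, switch_requested, perform_switch):
    """frames yields (media_type, frame) pairs, media_type in {'video', 'audio'}."""
    prev_type = None
    for media_type, frame in frames:
        # Only honor a pending switch when the decoded frame's type
        # changes, i.e. after a whole same-type bunch has been drained.
        if switch_requested() and prev_type is not None and media_type != prev_type:
            perform_switch()
        prev_type = media_type
        # ... overlay, re-encode and write `frame` here ...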

In ffmpeg, can I specify time in frames rather than seconds?

I am programmatically extracting multiple audio clips from single video files using ffmpeg.
My input data (start and end points) are specified in frames rather than seconds, and the audio clip will be used by a frame-centric user (an animator). So, I'd prefer to work in frames throughout.
In addition, the framerate is 30fps, which means I'd be working in steps of 0.033333 seconds, and I'm not sure it's reasonable to expect ffmpeg to trim correctly given such values.
Is it possible to specify a frame number instead of an ffmpeg time duration for start point (-ss) and duration (-t)? Or are there frame-centric ffmpeg commands that I've missed?
Audio frame or sample numbers don't correspond to video frame numbers, and I don't see a way to specify audio trim points by referencing video frame indices. Nevertheless, see this answer for more details.
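One workaround is to convert frame indices to fractional seconds yourself, since -ss and -t accept decimal values. A small sketch, assuming 30 fps and hypothetical file names:

import subprocess

FPS = 30

def extract_clip(video, start_frame, end_frame, out_wav):
    start = start_frame / FPS                 # frame index -> seconds
    duration = (end_frame - start_frame) / FPS
    subprocess.run([
        "ffmpeg", "-ss", f"{start:.6f}",      # fractional seconds are fine
        "-i", video,
        "-t", f"{duration:.6f}",
        "-vn",                                # drop video, keep audio
        out_wav,
    ], check=True)

extract_clip("input.mp4", 120, 330, "clip.wav")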
