My application needs to switch between two (or more) input streams while producing a single output (you could think of it as a stream multiplexer). The input frames are decoded and then re-encoded because an overlay is applied to them.
So to keep the AVFrame PTS consistent I calculate an interval before encoding the frames. The problem is that when I switch between an RTMP stream and an MP4 file, the video is delayed a bit on every switch, so by the third switch the resulting stream is out of sync.
I don't know if there is something I'm missing that I have to modify on the frame before encoding. I also thought about creating an independent PTS for the output frames, but I don't know how to generate it.
The input streams can have different FPS, time bases, or codecs, and the application must be able to deal with all of them.
I discovered the root cause.
The problem was the MP4 file. With this type of file (for some reason) the video and audio packets are read in big bunches (e.g. 20 video packets and then 20 audio packets), whereas with an RTMP stream it is more like 2 video and then 2 audio packets.
So the problem was that the switch was being applied before the whole bunch had been read (e.g. after 20 video packets but only 10 audio packets), and from that point on the resulting stream is out of sync no matter what you do.
The solution I implemented waits until a decoded frame's type differs from the previous one's; only then do I perform the switch.
Related
I tried to follow this example: https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/muxing.c
Problem: my H.264 stream cannot be demuxed, so the frames I send to the encoder are missing some data, e.g. pkt.pts == AV_NOPTS_VALUE, which causes an error when calling the av_interleaved_write_frame (muxing) function.
Considering that the frame rate is not constant, how do I generate pkt.pts correctly for the video frames as I receive them from the raw live stream?
Is there any way for ffmpeg libav to automatically calculate pkt.pts, pkt.dts timestamps as I send frames to the muxer with av_interleaved_write_frame?
Quite an old question, but it's still worth answering, since FFmpeg doesn't make it easy.
Consecutive frames' PTS and DTS (in the generic case they will be the same) should equal previousPacket.pts + currentPacket.duration. Your currentPacket.duration is just what it sounds like: how long the given frame is displayed before switching to the next one. Remember that this duration is in the stream's time base units, which is a rational fraction of a second (for example, a 1/50 time base means the shortest frame of that stream lasts 1/50 s, or 20 ms). So you can translate the time difference between two video frames into a frame duration, i.e. when you receive a video frame, its duration is the time until the next frame arrives, again in the stream's time base. And that's all you need to calculate PTS and DTS for the frames.
My video stream is encoded with H.264 and my audio stream with AAC. In fact, I get these streams by reading an FLV file. I decode only the video stream in order to get all the video frames, then do something to them with ffmpeg before re-encoding, such as changing some pixels. Finally I push the video and audio streams to Crtmpserver. When I pull the live stream from this server, I find the video stutters but the audio is normal. However, when I change gop_size from 12 to 3, everything is OK. What causes that problem? Can anyone explain it to me?
Either the CPU or the bandwidth is not sufficient for your usage. RTMP always processes audio before video. If ffmpeg or the network cannot keep up with the live stream, video frames will be dropped. Because audio is much smaller and cheaper to encode, a very slow CPU or congested network will usually have no problem keeping up with it.
I have series of encoded packets, H.264 video and AAC audio. As they're coming on, I'm writing them to a video file, using av_write_frame.
Given the following sequence of data:
10 seconds of video, then
10 seconds of video and audio, then
10 seconds of video.
Everything muxes fine, and when played back via VLC or QuickTime everything looks good. But if I play it in Windows Media Player, the audio is played immediately, at the start of the file, instead of 10 seconds in.
It seems I'm doing something wrong, but checking the PTS of the audio stream packets, they are set to 10 seconds based on the audio stream's time base.
It seems that it's best to inject empty audio packets at the beginning of the stream. This is the only way that video playback in WMP would work. Every player handles the streams differently and this is the best way to ensure compatibility across players.
I'm converting my video to MP4 (H.264) with ffmpeg, then moving the moov atom to the front with qt-faststart so I can stream the video.
Everything works fine with small videos (5-10 minutes), but with large ones (1-2 hours) it can take a significant time to start playing: it loads 6-10 MB and only then starts playing the video.
With FLV that's not the case; it plays immediately no matter how large the video is. How can I fix that?
It is just the nature of the formats. The moov atom contains all the metadata for every frame of audio or video in the file. So, the more frames, the larger the moov. Putting all this metadata in one place makes seeking within the file much easier: once you have downloaded the moov, the player knows exactly which byte in the file to request to seek to a specific frame or time. An FLV file is sent one frame at a time; there is no index of frame locations, which makes seeking extremely difficult for the player.
You can try making the moov smaller by ensuring your video does not use a variable frame rate and that you do not have unnecessary data (such as movie posters) embedded in the metadata. Having the server send gzip-compressed responses may help as well, since the moov should compress well.
I have a bunch of .png frames and a .mp3 audio file which I would like to convert into a video. Unfortunately, the frames do not correspond to a constant frame rate. For instance, one frame may need to be displayed for 1 second, whereas another may need to be displayed for 3 seconds.
Is there any open-source software (something like ffmpeg) which would help me accomplish this? Any feedback would be greatly appreciated.
Many thanks!
This is not an elegant solution, but it will do the trick: duplicate frames as necessary so that you end up with some (fairly high) constant frame rate, e.g. 30 or 60 fps (or higher if you need finer time resolution). You simply switch to the next source frame at the output frame closest to the exact timestamp you want. Frames that are exact duplicates will be encoded to a tiny size (a few bytes) by any decent codec, so this is fairly compact. Then just encode with ffmpeg as usual.
If you have a whole lot of these and need to do it the "right" way, you can indicate the timing either in the container (such as mp4, mkv, etc.) or in the codec. For example, in an H.264 stream you would insert SEI messages of type pic_timing to specify the timing of each frame. Alternatively, you can write your own muxer on top of a container library such as Matroska (mkv) or GPAC (mp4) to indicate the timing in the container. Note that not all codecs/containers support arbitrarily variable frame rates, and only a few codecs support timing in the codec. Also, if timing is specified in both the container and the codec, the container timing is used (but if you are muxing a stream into a container, the muxer should pick up the individual frame timestamps from the codec).