ffmpeg decoding slow without calling avformat_find_stream_info - performance

I am decoding an H.264 RTP stream with ffmpeg on Android. I found a strange problem: if I don't call avformat_find_stream_info, decoding a P frame takes tens of milliseconds; by contrast, calling avformat_find_stream_info before decoding reduces the P frame decoding time to less than 1 ms on average. However, avformat_find_stream_info is itself time consuming on network streams. Is there anything I can do to make decoding fast without calling avformat_find_stream_info?

When avformat_find_stream_info is called, the streaming URL (or local file) is scanned by this function to find the valid streams in the given input.
In other words, it decodes a few packets from the input, so that afterwards you can decode packets quickly with the AVCodecContext it has initialized.
I haven't tested it, but in the general case the stream cannot be decoded without calling avformat_find_stream_info, or the codec context may end up being initialized every time a packet is decoded.
Either way, that is why avformat_find_stream_info consumes network traffic: as mentioned, it pulls the first few packets.
If you really want to decode packets quickly without calling this function, you have to initialize the AVCodecContext yourself, manually.
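For an H.264 stream whose parameters are known up front, that manual initialization might look roughly like the sketch below (a sketch only: it assumes the SPS/PPS arrive in-band or are copied from the SDP, and the helper name open_h264_decoder is made up for illustration):

    extern "C" {
    #include <libavcodec/avcodec.h>
    }

    // Sketch: open an H.264 decoder directly, skipping avformat_find_stream_info.
    // Assumes the stream is known to be H.264; SPS/PPS must arrive in-band
    // (Annex B) or be copied into ctx->extradata from the SDP.
    AVCodecContext *open_h264_decoder()
    {
        const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
        if (!codec)
            return nullptr;

        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        if (!ctx)
            return nullptr;

        if (avcodec_open2(ctx, codec, nullptr) < 0) {
            avcodec_free_context(&ctx);
            return nullptr;
        }
        return ctx;   // feed packets via avcodec_send_packet()/avcodec_receive_frame()
    }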

Related

Convert LPCM buffer to AAC for HTTP Live Streaming

I have an application that records audio from devices into a Float32 (LPCM) buffer.
However, LPCM needs to be encoded into a compressed audio format (MP3, AAC) to be usable as a media segment for streaming, according to the HTTP Live Streaming specification. I have found some useful resources on how to convert an LPCM file to an AAC/MP3 file, but that is not exactly what I am looking for, since I want to convert a buffer, not a file.
What are the main differences between converting an audio file and a raw audio buffer (LPCM, Float32)? Is the latter simpler?
My initial thought was to create a thread that would regularly fetch data from a ring buffer (where the raw audio is stored) and convert it to a valid audio format (either AAC or MP3).
Would it be more sensible to do the conversion immediately when the AudioBuffer is captured through an AURenderCallback, and hence prune the ring buffer?
Thanks for your help,
The Core Audio recording buffer length and the desired audio file length are rarely exactly the same. So it is usually better to poll your circular/ring buffer (you know the sample rate, which tells you approximately how often to poll) to decouple the two rates, and convert the buffer (once it is sufficiently full) to a file at a later time. You can memory-map a raw audio file to the buffer, but there may or may not be any performance difference between that and asynchronously writing a temp file.
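A rough sketch of that polling approach (my own illustration, not from the answer above): the locked queue and encode_chunk_to_aac() below stand in for your ring buffer and whichever converter you end up using (AudioConverter, ffmpeg, ...).

    #include <algorithm>
    #include <atomic>
    #include <chrono>
    #include <cstddef>
    #include <deque>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Stand-in for the ring buffer fed by the capture callback. A real
    // implementation would be lock-free so the render callback never blocks.
    struct SampleQueue {
        std::mutex m;
        std::deque<float> samples;
    };

    // Placeholder for the actual LPCM -> AAC/MP3 conversion.
    void encode_chunk_to_aac(const std::vector<float> &chunk);

    constexpr size_t kSamplesPerChunk = 1024;   // roughly one AAC frame of samples

    void polling_loop(SampleQueue &q, std::atomic<bool> &running)
    {
        std::vector<float> chunk(kSamplesPerChunk);
        while (running.load()) {
            bool got_chunk = false;
            {
                std::lock_guard<std::mutex> lock(q.m);
                if (q.samples.size() >= kSamplesPerChunk) {
                    std::copy_n(q.samples.begin(), kSamplesPerChunk, chunk.begin());
                    q.samples.erase(q.samples.begin(),
                                    q.samples.begin() + kSamplesPerChunk);
                    got_chunk = true;
                }
            }
            if (got_chunk) {
                encode_chunk_to_aac(chunk);   // convert outside the lock
            } else {
                // At 44.1 kHz, 1024 samples last ~23 ms, so polling every
                // 10 ms keeps up while decoupling capture from encoding.
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
        }
    }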

av_interleaved_write_frame returns 0 but no data written

I use ffmpeg to stream encoded AAC data, and I call
av_interleaved_write_frame()
to write each frame.
The return value is 0, which means success according to the documentation:
Write a packet to an output media file ensuring correct interleaving.
The packet must contain one audio or video frame. If the packets are already correctly interleaved, the application should call av_write_frame() instead as it is slightly faster. It is also important to keep in mind that completely non-interleaved input will need huge amounts of memory to interleave with this, so it is preferable to interleave at the demuxer level.
Parameters
s media file handle
pkt The packet containing the data to be written. pkt->buf must be set to a valid AVBufferRef describing the packet data. Libavformat takes ownership of this reference and will unref it when it sees fit. The caller must not access the data through this reference after this function returns. This can be NULL (at any time, not just at the end), to flush the interleaving queues. Packet's stream_index field must be set to the index of the corresponding stream in s.streams. It is very strongly recommended that timing information (pts, dts, duration) is set to correct values.
Returns
0 on success, a negative AVERROR on error.
However, I found that no data is written.
What did I miss? How can I solve it?
av_interleaved_write_frame() may hold data in memory before it writes it out. Interleaving is the process of taking multiple streams (one audio stream and one video stream, for example) and serializing them in monotonic order. So, if you write an audio frame, it is kept in memory until you write a video frame that comes 'later'. Once a later video frame is written, the audio frame can be flushed. This way streams can be produced at different speeds or in different threads, but the output is still monotonic. If you are only writing one stream (one AAC stream, no video), then use av_write_frame() as suggested.
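As a side note (my illustration, not part of the answer above): whatever is still sitting in the interleaving queue is forced out when you flush with a NULL packet or write the trailer at the end of the stream.

    extern "C" {
    #include <libavformat/avformat.h>
    }

    // Sketch: flush the interleaving queues and finalize the output.
    // Error handling omitted for brevity.
    void finish_output(AVFormatContext *ofmt_ctx)
    {
        // A NULL packet flushes the interleaving queues of all streams.
        av_interleaved_write_frame(ofmt_ctx, nullptr);

        // av_write_trailer() also flushes anything still buffered and
        // finalizes the container (index, trailer, etc.).
        av_write_trailer(ofmt_ctx);
    }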

RTP decoding issue on P frames

I am streaming an RTSP stream from an IP camera. I have a parser which assembles the data into frames based on the RTP payload type. The parser is able to process I frames, since these contain the start-of-frame and end-of-frame packets, as well as packets in between (this is the FU-A payload type).
These are combined to create a complete frame. The problem comes in when I try to construct P frames: from the Wireshark dump, some of these appear to be fragmented (FU-A payload type) and contain the start-of-frame and end-of-frame packets, but no packets in between. Also, in some instances the camera sends strangely marked packets with a payload type of 1, which according to my understanding should be a complete frame.
After assembling these two kinds of P frames, I use ffmpeg to attempt to decode them and receive error messages like 'top block unavailable for intra mode 4x4'.
At first I thought this could be due to an old ffmpeg version, but I searched the web and recompiled ffmpeg, and the problem remained.
The I frames appear fragmented and contain lots of packets; some P frames have a start-of-frame (0x81) and end-of-frame (0x41) packet but nothing in between, and some just look corrupt, starting with 0x41 (which seems like it should be the second byte), giving a payload type of 1. I am a novice when it comes to these issues, but I have looked at the RTP documentation and cannot find an issue with how I handle the data.
Also, streaming with VLC seems fine, though it appears to halve the frame rate; I am not sure how it manages to reconstruct the frames.
Please could someone help.
It is common for I-frames to be fragmented, since they are usually a lot bigger than P-frames, but P-frames can also be fragmented. There is nothing wrong with a P-frame that has been fragmented into 2 RTP packets, i.e. one with the FU-header start bit set and the following one with the end bit set. There do not need to be packets in between. For example, if the MTU is 1500 and the NAL unit is 1600 bytes, it will be fragmented into 2 RTP packets.
As for the packets "looking corrupt" starting with 0x41 without a prior packet with a 0x81, you should examine the sequence number in the RTP header as this will tell you straight away if packets are missing. If you are seeing packet loss, the first thing to try is to increase your socket receiver buffer size.
Since VLC is able to play the stream, there is most likely an issue in the way you are reassembling the NAL units.
Also, in your question it is not always clear which byte you are referring to: I'm assuming that the 0x41 and 0x81 appear in the 2nd byte of the RTP payload, i.e. the FU header in the case where the NAL unit type of the first byte is FU-A.
Finally, note that "payload type" is the RTP payload type (RFC3550), not the NAL unit type defined in the H.264 standard.
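To make the sequence-number and FU-header checks concrete, here is a small illustration of inspecting an incoming packet (my own sketch, assuming a plain 12-byte RTP header with no CSRC list or header extension):

    #include <cstddef>
    #include <cstdint>

    // Sketch of pulling the fields relevant to reassembly out of an RTP
    // packet carrying H.264 (RFC 6184). Real code must also honor the CSRC
    // count and extension bit in the first RTP header byte.
    struct RtpInfo {
        uint16_t seq;        // RTP sequence number (detects loss/reordering)
        uint8_t  nal_type;   // NAL unit type from the first payload byte
        bool     fu_a;       // true if the payload is an FU-A fragment (type 28)
        bool     fu_start;   // FU header S bit (0x81 = start of a type-1 NAL)
        bool     fu_end;     // FU header E bit (0x41 = end of a type-1 NAL)
    };

    bool parse_rtp_h264(const uint8_t *pkt, size_t len, RtpInfo &out)
    {
        if (len < 14) return false;   // 12-byte RTP header + at least 2 payload bytes
        out.seq = static_cast<uint16_t>((pkt[2] << 8) | pkt[3]);

        const uint8_t *payload = pkt + 12;
        out.nal_type = payload[0] & 0x1F;    // 1 = non-IDR slice, 28 = FU-A, ...
        out.fu_a     = (out.nal_type == 28);
        out.fu_start = out.fu_a && (payload[1] & 0x80) != 0;   // S bit
        out.fu_end   = out.fu_a && (payload[1] & 0x40) != 0;   // E bit
        return true;
    }

A jump in seq between the 0x81 (start) fragment and the 0x41 (end) fragment would mean the middle fragments were lost in transit, which would explain decoder errors like 'top block unavailable'.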

Parallelize encoding of audio-only segments in ffmpeg

We are looking to decrease the execution time of segmenting/encoding WAV to segmented AAC for HTTP Live Streaming, using ffmpeg to segment and generate an m3u8 playlist, by utilizing all the cores of our machine.
In one experiment, I had ffmpeg directly segment a WAV file into AAC with libfdk_aac; however, it took quite a long time to finish.
In the second experiment, I had ffmpeg segment the WAV file as is (WAV), which was quite fast (< 1 second on our machines), then used GNU parallel to run ffmpeg again to encode the WAV segments to AAC, and manually changed the .m3u8 file without changing the segment durations. This was much faster, however "silence" gaps could be heard when streaming the output audio.
I initially tried the second scenario with MP3 and the result was much the same. I have read that LAME adds padding during encoding (http://scruss.com/blog/2012/02/21/generational-loss-in-mp3-re-encoding/); does this mean that libfdk_aac also adds padding during encoding?
Maybe this is related to this question: How can I encode and segment audio files without having gaps (or audio pops) between segments when I reconstruct them?
According to section 4 of the HLS specification:
A Transport Stream or audio elementary stream segment MUST be the continuation of the encoded media at the end of the segment with the previous sequence number, where values in a continuous series, such as timestamps and Continuity Counters, continue uninterrupted.
"Silence" gaps are 99.99% of the time related to wrong counters/discontinuities. Because you wrote that you manually changed the .m3u8 file without changing the segment durations, I deduce you tried to cut the audio yourself. It can't be done.
The creation of an HLS stream can't be parallelized because of these counters: they must follow a continuous sequence [MPEG2-TS :-(]. You'd better get a faster processor.

DirectShow push sources, syncing and timestamping

I have a filter graph that takes raw audio and video input and then uses the ASF Writer to encode them to a WMV file.
I've written two custom push source filters to provide the input to the graph. The audio filter just uses WASAPI in loopback mode to capture the audio and send the data downstream. The video filter takes raw RGB frames and sends them downstream.
For both the audio and video frames, I have the performance counter value for the time the frames were captured.
Question 1: If I want to properly timestamp the video and audio, do I need to create a custom reference clock that uses the performance counter or is there a better way for me to sync the two inputs, i.e. calculate the stream time?
The video input is captured from a Direct3D buffer somewhere else and I cannot guarantee the framerate, so it behaves like a live source. I always know the start time of a frame, of course, but how do I know the end time?
For instance, let's say the video filter ideally wants to run at 25 FPS, but due to latency and so on, frame 1 starts perfectly at the 1/25th mark but frame 2 starts later than the expected 2/25th mark. That means there's now a gap in the graph since the end time of frame 1 doesn't match the start time of frame 2.
Question 2: Will the downstream filters know what to do with the delay between frame 1 and 2, or do I manually have to decrease the length of frame 2?
One option is to omit time stamps, but this might result in downstream filters failing to process the data. Another option is to use the System Reference Clock to generate time stamps; either way, this is preferable to using the performance counter directly as a time stamp source.
Yes, you need to time stamp the video and audio in order to keep them in sync; this is the only way to tell that the data is actually attributed to the same time.
Video samples don't really need a stop time: you can omit it or set it equal to the start time, and a gap between one frame's stop time and the next frame's start time has no consequences.
Renderers are free to choose whether or not to respect time stamps; with audio, of course, you will want a smooth stream without gaps in the time stamps.
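As a rough sketch of how that can look in a CSourceStream-based push pin (the class name CMyPushPin and the member m_rtStreamStart are my own, and the details depend on your filter), each sample is stamped with stream time derived from the graph clock:

    #include <streams.h>   // DirectShow base classes

    // Sketch only: stamp each sample with stream-relative times (100-ns units).
    // m_rtStreamStart is assumed to hold the reference-clock time saved when
    // the filter entered the Run state.
    HRESULT CMyPushPin::FillBuffer(IMediaSample *pSample)
    {
        // ... copy the captured RGB frame (or audio block) into the sample here ...

        REFERENCE_TIME rtNow = 0;
        IReferenceClock *pClock = nullptr;
        if (SUCCEEDED(m_pFilter->GetSyncSource(&pClock)) && pClock) {
            pClock->GetTime(&rtNow);
            pClock->Release();
        }

        // Stream time = absolute clock time minus the time the graph started running.
        REFERENCE_TIME rtStart = rtNow - m_rtStreamStart;
        REFERENCE_TIME rtStop  = rtStart;   // per the answer: stop time may equal start time
        pSample->SetTime(&rtStart, &rtStop);
        pSample->SetSyncPoint(TRUE);
        return S_OK;
    }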
