How does Cobalt packetize an Opus track? - cobalt

When the audio codec is Opus, some extra parameters are very important for our integration.
Is there a way to get the codec delay, seek preroll, and codec private data?
When SB_API_VERSION is at least SB_AUDIO_SPECIFIC_CONFIG_AS_POINTER, the 'codec private' data for Opus is already passed to Starboard.
Since I am not sure whether the audio samples have already been preprocessed with 'codec delay' and 'seek preroll', is it unnecessary for the audio decoder to apply them?

Opus metadata is stored in AudioDecoderConfig::extra_data() and passed into SbPlayerCreate() via SbMediaAudioHeader::audio_specific_config.
You may parse it using code similar to the ParseOpusHeader function in "media/filters/opus_audio_decoder.cc".
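For illustration, here is a minimal sketch of such a parser in C, modeled on the ParseOpusHeader function mentioned above (the exact struct and signature here are hypothetical; the field layout follows the standard OpusHead format used as Opus codec private data in WebM). Note that the 'pre-skip' field is the codec delay expressed in 48 kHz samples, and the seek preroll for Opus is conventionally a fixed 80 ms, so it does not need to come from the header at all:

    #include <stdint.h>
    #include <string.h>

    /* Illustrative OpusHead ("codec private") parser; layout per the
       Opus-in-WebM/Ogg ID header. */
    typedef struct {
      int channels;
      int pre_skip;         /* codec delay, in 48 kHz samples */
      uint32_t input_rate;  /* original input sample rate, informational */
    } OpusHeader;

    static int parse_opus_head(const uint8_t* data, size_t size, OpusHeader* h) {
      if (size < 19 || memcmp(data, "OpusHead", 8) != 0)
        return -1;  /* too short, or bad magic */
      h->channels = data[9];
      h->pre_skip = data[10] | (data[11] << 8);  /* little endian */
      h->input_rate = (uint32_t)data[12] | ((uint32_t)data[13] << 8) |
                      ((uint32_t)data[14] << 16) | ((uint32_t)data[15] << 24);
      return 0;
    }
    /* Codec delay in nanoseconds = pre_skip * 1000000000LL / 48000.
       Seek preroll for Opus is conventionally 80 ms regardless of the header. */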
Unfortunately |audio_specific_config| is an array of 8 bytes in COBALT_9, so the extra bytes of the Opus metadata are missing. There are several solutions to this:
1. Remove support for Opus, as it is optional per the 2017 requirements, and use AAC instead.
2. Use an Opus decoder that doesn't need the metadata.
3. Wait until COBALT_11 is released, which removes the size limit on |audio_specific_config|. This may not be feasible given your 2017 release schedule.
4. Increase the size of SbMediaAudioHeader::audio_specific_config to a larger number (say, 1024). This will make your future rebases slightly harder.

Related

Convert m3u8 (HLS) to mpd (MPEG-DASH)

I have an HLS live stream [https://82-80-192-30.vidnt.com/ipbc_IPBCchannel11LVMRepeat/definst/IPBCchannel11LVM_3.stream/playlist.m3u8] and I want to convert it to MPEG-DASH.
What is the best practice?
The stream is already H.264/AAC, so I understand I do not need to re-encode; I just need to transmux.
What should I use?
ffmpeg? mp4box?
Notes:
I used nginx-rtmp-module (https://github.com/ut0mt8/nginx-rtmp-module/) in order to create DASH from an RTMP stream, following this tutorial: https://isrv.pw/html5-live-streaming-with-mpeg-dash
But nginx-rtmp-module accepts only RTMP streams as input, so it did not work for me with an HLS stream.
I used ffmpeg in order to create DASH from the m3u8 as follows:
ffmpeg -i https://82-80-192-30.vidnt.com/ipbc_IPBCchannel11LVMRepeat/_definst_/IPBCchannel11LVM_3.stream/playlist.m3u8 -strict -2 -min_seg_duration 2000 -window_size 5 -extra_window_size 5 -use_template 1 -use_timeline 1 -f dash out.mpd
But this is very limited: I can't control the segment duration.
The min_seg_duration parameter of ffmpeg does not work well for me; it sets the minimum duration, while I want to limit the maximum duration of each segment (segments come out at ~10 seconds, while I need ~2-4 seconds as I'm playing live).
Firstly, it is worth saying that if you can avoid doing this, you will save yourself a whole lot of work!
Most devices and clients these days can play both HLS and DASH streams, so the usual approach is to add any extra functionality needed in your app or client.
If you do have to convert server side, then it's worth being aware that while HLS streams typically used TS segments in the past, support for fragmented MP4 has recently become available within the HLS ecosystem.
If you have TS video streams then you will need to do a conversion along the lines you outline above with ffmpeg.
If you have fragmented MP4 then you should actually have the correct format already and may find you just have to create the manifest file so DASH can access the fragmented mp4 streams.
All the above assumes that your content is not encrypted, or that you don't have to support encryption. If it is encrypted, you may not be able to convert the media at all, or you may have to encrypt the media differently for some streams than for others: most deployed Windows and Chrome devices and browsers currently use a slightly different encryption approach (a different AES mode) than Apple devices.

FFmpeg: what does the "global_header" flag do?

According to the description, it places global headers in extradata instead of every keyframe.
But what's the actual purpose? Is it useful e.g. for streaming?
I guess that the resulting video file will be marginally shorter, but more prone to corruption (no redundancy means the file is unplayable if the main header is corrupted or only partially downloaded)?
Or maybe it somehow improves decoding a little bit? As headers are truly global and cannot change from keyframe to keyframe?
Thanks!
By extradata FFmpeg means out-of-band, as opposed to in-band. The behavior of the flag is format specific. This is useful for headers that are not expected to change because it reduces the overhead.
Example: for H.264 in MP4, the SPS and PPS are stored in the avcC atom. For the same H.264 stream in, say, MPEG-TS, these parameter sets are repeated in-band in the bitstream.
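As a concrete illustration, when muxing with the FFmpeg C API the usual idiom (seen in FFmpeg's own transcoding examples) is to request global headers only when the output format declares that it needs them:

    #include <libavformat/avformat.h>
    #include <libavcodec/avcodec.h>

    /* Sketch: ask the encoder for out-of-band headers (extradata) only if the
       muxer wants them (e.g. MP4); formats like MPEG-TS keep repeating the
       parameter sets in-band instead. */
    static void setup_global_header(const AVFormatContext* ofmt_ctx,
                                    AVCodecContext* enc_ctx) {
      if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)
        enc_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    }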

FFmpeg - Is it difficult to use?

I am trying to use ffmpeg and have been experimenting a lot over the last month.
I have not been able to get through. Is it really difficult to use FFmpeg?
My requirement is simple, as below.
Can you please guide me on whether ffmpeg is suitable, or whether I have to implement this on my own (using the available codec libraries)?
I have a webm file (having VP8 and OPUS frames)
I will read the encoded data and send it to remote guy
The remote guy will read the encoded data from socket
The remote guy will write it to a file (can we avoid decoding?).
Then the remote guy should be able to play the file using ffplay or any player.
Now I will take a specific example.
Say I have a file small.webm, containing VP8 and OPUS frames.
I am reading only the audio frames (OPUS) using the av_read_frame API (then I check the stream index and filter audio frames only).
So now I have the encoded data buffer as packet.data and the encoded data size as packet.size (please correct me if I'm wrong).
Here is my first doubt: the audio packet size is not the same every time. Why the difference? Sometimes the packet size is as low as 54 bytes and sometimes it is 420 bytes. For OPUS, will the frame size vary from time to time?
Next, say I somehow extract a single frame (I really do not know how to extract a single frame) from the packet and send it to the remote guy.
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API. Both of them take an AVPacket as argument. Now I can have an AVPacket and set its data and size members. Then I can call the av_write_frame API. But that does not work. The reason may be that one should set the other members of the packet, like pts, dts, etc. But I do not have such information to set.
Can somebody help me figure out whether FFmpeg is the right choice, or should I write custom logic, like parsing an OPUS file and getting it frame by frame?
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API. Both of them take an AVPacket as argument. Now I can have an AVPacket and set its data and size members. Then I can call the av_write_frame API. But that does not work. The reason may be that one should set the other members of the packet, like pts, dts, etc. But I do not have such information to set.
Yes, you do. They were in the original packet you received from the demuxer on the sender side. You need to serialize all the information in this packet and set each value accordingly in the receiver.
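As a rough sketch (the wire format below is made up for illustration; only the AVPacket field names are real FFmpeg API), the sender could serialize the packet fields alongside the payload, and the receiver would restore them into a fresh packet before calling av_interleaved_write_frame():

    #include <stdint.h>
    #include <string.h>
    #include <libavcodec/avcodec.h>

    /* Hypothetical wire format: pts, dts, stream_index, flags, duration, size,
       then the payload bytes. A real protocol also needs framing and a fixed
       byte order. */
    static size_t serialize_packet(const AVPacket* pkt, uint8_t* buf) {
      size_t off = 0;
      memcpy(buf + off, &pkt->pts, sizeof(pkt->pts));
      off += sizeof(pkt->pts);
      memcpy(buf + off, &pkt->dts, sizeof(pkt->dts));
      off += sizeof(pkt->dts);
      memcpy(buf + off, &pkt->stream_index, sizeof(pkt->stream_index));
      off += sizeof(pkt->stream_index);
      memcpy(buf + off, &pkt->flags, sizeof(pkt->flags));
      off += sizeof(pkt->flags);
      memcpy(buf + off, &pkt->duration, sizeof(pkt->duration));
      off += sizeof(pkt->duration);
      memcpy(buf + off, &pkt->size, sizeof(pkt->size));
      off += sizeof(pkt->size);
      memcpy(buf + off, pkt->data, pkt->size);  /* the encoded OPUS frame(s) */
      off += pkt->size;
      return off;
    }

One more thing to watch for: if the output stream's time base differs from the input stream's, the receiver also has to rescale pts, dts, and duration before writing (av_packet_rescale_ts() does exactly that).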

FFMPEG: Frame parameter initializations in HEVC decoder

I'm going through the HEVC decoder integrated in FFMPEG. I'm trying to understand its flow and how it works.
By flow, I mean the part of the code where it reads the various parameters of the input .bin file: where it reads the resolution, where it decides the fps at which it needs to play, the output display format (i.e. yuv420p), etc.
Initially, what I suspected is the HEVC demuxer situated at /libavformat/hevcdec.c. Does the input file reading happen in this file? There is a probe function which is used to detect which decoder to select for decoding the input bin stream. Further, we have FF_DEF_RAWVIDEO_DEMUXER. Is it here that the resolution and other parameters are read from the input file?
Secondly, I suspect the HEVC parser situated at /libavcodec/hevc_parser.c, but I think it is just parsing the frame data, that is, finding the end of a frame. Is this assumption of mine right?
Any suggestions will be really helpful. Thanks in advance.
To understand more specifically what is going on in the decoder, it's better to start your study with the HEVC/H.265 standard (http://www.itu.int/rec/T-REC-H.265). It contains all the information you need to find where the resolution, fps, etc. are located.
If you want to get more details from FFMPEG, here are some hints:
/libavcodec/hevc_parser.c contains the H.265 Annex B parser, which converts the byte stream into a series of NAL units. Each NAL unit has its own format and should be parsed depending on its NAL unit type.
If you are looking for basic properties of video sequence, you may be interested in SPS (Sequence Parameter Set) parsing. Its format is described in section 7.3.2.2.1 of the standard and there is a function ff_hevc_decode_nal_sps located in /libavcodec/hevc_ps.c which extracts SPS parameters from the bitstream.
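To make the parser's job concrete, here is a minimal sketch (my own code, not FFMPEG's) that scans an Annex B buffer for start codes and reports each NAL unit type. SPS units have nal_unit_type 33 per the standard; a real parser would go on to decode the SPS payload for resolution and the rest:

    #include <stdint.h>
    #include <stdio.h>

    /* Walk an H.265 Annex B buffer and print the type of each NAL unit.
       The NAL header's first byte is: forbidden_zero_bit (1) +
       nal_unit_type (6) + ..., hence the (byte >> 1) & 0x3F below. */
    static void list_nal_units(const uint8_t* buf, size_t size) {
      for (size_t i = 0; i + 3 < size; i++) {
        /* 3-byte start code 00 00 01 (a 4-byte 00 00 00 01 also ends here) */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
          uint8_t nal_unit_type = (buf[i + 3] >> 1) & 0x3F;
          printf("NAL unit type: %u%s\n", nal_unit_type,
                 nal_unit_type == 33 ? " (SPS)" : "");
          i += 2;  /* skip past the start code */
        }
      }
    }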
Note: I was talking about FFMPEG version 2.5.3. Code structure can be different for other versions.

What is the minimum amount of metadata needed to stream video-only using libx264 to encode at the server and libffmpeg to decode at the client?

I want to stream video (no audio) from a server to a client. I will encode the video using libx264 and decode it with ffmpeg. I plan to use fixed settings (at the very least they will be known in advance by both the client and the server). I was wondering if I can avoid wrapping the compressed video in a container format (like mp4 or mkv).
Right now I am able to encode my frames using x264_encoder_encode. I get a compressed frame back, and I can do that for every frame. What extra information (if anything at all) do I need to send to the client so that ffmpeg can decode the compressed frames, and more importantly, how can I obtain it with libx264? I assume I may need to generate NAL information (x264_nal_encode?). Having an idea of what is the minimum necessary to get the video across, and how to put the pieces together, would be really helpful.
I found out that the minimum amount of information needed is the NAL units from each frame; this gives me a raw H.264 stream. If I write this to a file, I can watch it using VLC by adding a .h264 extension.
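A minimal sketch of that finding using the real libx264 calls (encoder setup and error handling omitted; with the default b_annexb = 1, each payload already carries its start code): x264_encoder_headers() hands you the SPS/PPS/SEI NAL units once up front, and each x264_encoder_encode() call returns the NAL units of one frame, so concatenating the payloads yields exactly the raw .h264 stream described above.

    #include <stdio.h>
    #include <x264.h>

    /* With b_annexb = 1 each nal.p_payload includes its start code, so the
       payloads can be written back to back. */
    static void write_nals(FILE* out, x264_nal_t* nals, int count) {
      for (int i = 0; i < count; i++)
        fwrite(nals[i].p_payload, 1, nals[i].i_payload, out);
    }

    int encode_to_raw_h264(x264_t* enc, x264_picture_t* pic_in, FILE* out) {
      x264_nal_t* nals;
      int nnal;
      x264_picture_t pic_out;

      /* SPS/PPS/SEI first, so the decoder learns resolution, profile, etc. */
      if (x264_encoder_headers(enc, &nals, &nnal) < 0)
        return -1;
      write_nals(out, nals, nnal);

      /* Then one frame per call; repeat for every input picture. */
      if (x264_encoder_encode(enc, &nals, &nnal, pic_in, &pic_out) < 0)
        return -1;
      write_nals(out, nals, nnal);
      return 0;
    }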
I can also open such a file using ffmpeg, but if I want to stream it, it makes more sense to use RTSP, and a good open-source library for that is Live555: http://www.live555.com/liveMedia/
In their FAQ they mention how to send the output of your encoder to Live555, and there is source code for both a client and a server. I have yet to finish coding this, but it seems like a reasonable solution.
