Read from UDP Multicast RTSP Video Stream - ffmpeg

I am currently developing an application that needs to decode a UDP multicast RTSP stream. At the moment, I can view the RTP stream using ffplay via
ffplay -rtsp_transport udp_multicast rtsp://streamURLGoesHere
However, I am trying to use the FFmpeg API to open the UDP stream via the following (error checking and cleanup code removed for brevity):
AVFormatContext* ctxt = NULL;
av_open_input_file(&ctxt, urlString, NULL, 0, NULL);
av_find_stream_info(ctxt);

int videoStreamIdx = -1;
for (int i = 0; i < ctxt->nb_streams; i++)
{
    if (ctxt->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
    {
        videoStreamIdx = i;
        break;
    }
}

AVCodecContext* codecCtxt = ctxt->streams[videoStreamIdx]->codec;
AVCodec* codec = avcodec_find_decoder(codecCtxt->codec_id);
avcodec_open(codecCtxt, codec);

AVPacket packet;
while (av_read_frame(ctxt, &packet) >= 0)
{
    if (packet.stream_index == videoStreamIdx)
    {
        /// Decoding performed here
        ...
    }
}
...
This approach works fine for file inputs consisting of a raw encoded video stream, but for UDP multicast RTSP streams, av_open_input_file() fails every error check. Please advise...

It turns out that opening a multicast UDP RTSP stream can be performed via the following:
AVFormatContext* ctxt = avformat_alloc_context();
AVDictionary* options = NULL;
av_dict_set(&options, "rtsp_transport", "udp_multicast", 0);
avformat_open_input(&ctxt, urlString, NULL, &options);
...
avformat_free_context(ctxt);
Using avformat_open_input() in this manner instead of av_open_input_file() results in the desired behavior. av_open_input_file() is in fact deprecated in favor of avformat_open_input(), and it takes no options dictionary, so there is no way to request the udp_multicast transport through it.
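For completeness, here is a minimal sketch of the rest of the question's pipeline against the current API (error handling again omitted; avcodec_send_packet()/avcodec_receive_frame() replace the old per-frame decode calls):
AVFormatContext* ctxt = NULL;
AVDictionary* options = NULL;
av_dict_set(&options, "rtsp_transport", "udp_multicast", 0);
avformat_open_input(&ctxt, urlString, NULL, &options);
avformat_find_stream_info(ctxt, NULL);

// pick the first/best video stream and set up its decoder from codecpar
int videoStreamIdx = av_find_best_stream(ctxt, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
const AVCodec* codec = avcodec_find_decoder(ctxt->streams[videoStreamIdx]->codecpar->codec_id);
AVCodecContext* codecCtxt = avcodec_alloc_context3(codec);
avcodec_parameters_to_context(codecCtxt, ctxt->streams[videoStreamIdx]->codecpar);
avcodec_open2(codecCtxt, codec, NULL);

AVPacket packet;
AVFrame* frame = av_frame_alloc();
while (av_read_frame(ctxt, &packet) >= 0)
{
    if (packet.stream_index == videoStreamIdx)
    {
        avcodec_send_packet(codecCtxt, &packet);
        while (avcodec_receive_frame(codecCtxt, frame) >= 0)
        {
            /// frame now holds a decoded picture
        }
    }
    av_packet_unref(&packet);
}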

Related

Encoding images to h264 and rtp output: SDP file without sprop-parameter-sets does not play

tl;dr: I try to encode acquired camera frames to h264, send via RTP
and play this back on another device. SDP file generated by ffmpeg for
a sample video has info which my own SDP file misses. My SDP file
plays in ffplay, but not VLC, while both play ffmpeg's SDP file. I am
suspecting missing sprop-parameter-sets in my SDP file.
Ultimately I want to play this back in VLC.
I am writing code that encodes images to h264 and outputs to an RTP
server (or client? anyway the part that is listening). I generate an
SDP file for this.
ffplay plays the stream without problems
mplayer shows a green box embedded in a larger black box, but I read somewhere that it only supports mpegts over RTP, so I'm not sure
VLC does not play the SDP file.
Now when instead I use some random video and have ffmpeg output an SDP
file like so
ffmpeg -re -i some.mp4 -an -c:v copy -f rtp -sdp_file
video.sdp "rtp://127.0.0.1:5004"
I can see that the generated SDP file – which plays in both ffplay and
VLC – includes the base64 encoded sprop-parameter-sets field, and
removing this causes the stream to not play.
> cat video.sdp
v=0
o=- 0 0 IN IP4 127.0.0.1
s=No Name
c=IN IP4 127.0.0.1
t=0 0
a=tool:libavformat 58.76.100
m=video 5004 RTP/AVP 96
b=AS:1034
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1;
sprop-parameter-sets=Z2QANKzZQDAA7fiMBagICAoAAAMAAgAAAwDwHjBjLA==,aOvjyyLA;
profile-level-id=640034
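For reference, sprop-parameter-sets is just the SPS and PPS NAL units, base64 encoded and comma separated, which is why it mirrors the encoder's extradata. If you ever need them back as raw bytes, libavutil can decode them; a quick sketch (sps_b64 is the first token copied from the fmtp line above):
#include <libavutil/base64.h>

const char* sps_b64 = "Z2QANKzZQDAA7fiMBagICAoAAAMAAgAAAwDwHjBjLA=="; // first sprop token
uint8_t sps[256];
int sps_size = av_base64_decode(sps, sps_b64, sizeof(sps));
// sps[0] is now 0x67, an SPS NAL header; the profile byte 0x64 matches profile-level-id=640034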
My own SDP file, on the other hand, does not contain this information, and VLC hangs for 10 seconds and then stops trying with "no data received".
> cat test.sdp
v=0
o=- 0 0 IN IP4 127.0.0.1
s=No Name
c=IN IP4 127.0.0.1
t=0 0
a=tool:libavformat 58.76.100
m=video 44499 RTP/AVP 96
b=AS:2000
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1
So my theory is that my custom code must somehow add this SPS
information to the SDP file. But despite hours of searching, I could
not find a structured way to set the extradata field on the AVStream's
AVCodecParameters. The code I'm using is roughly this (I'm sure there
are unrelated errors in there):
// variables
std::vector<std::uint8_t> imgbuf;
AVFormatContext *ofmt_ctx = nullptr;
AVCodec *out_codec = nullptr;
AVStream *out_stream = nullptr;
AVCodecContext *out_codec_ctx = nullptr;
SwsContext *swsctx = nullptr;
cv::Mat canvas_;
unsigned int height_;
unsigned int width_;
unsigned int fps_;
AVFrame *frame_ = nullptr;
AVOutputFormat *format = av_guess_format("rtp", nullptr, nullptr);
const auto url = std::string("rtp://127.0.0.1:5001");
avformat_alloc_output_context2(&ofmt_ctx, format, format->name, url.c_str());
out_codec = avcodec_find_encoder(AV_CODEC_ID_H264);
out_stream = avformat_new_stream(ofmt_ctx, out_codec);
out_codec_ctx = avcodec_alloc_context3(out_codec);
// then, for each incoming image:
while (receive_image) {
    static bool first_time = true;
    if (first_time) {
        // discover necessary params such as image dimensions from the first
        // received image
        first_time = false;
        height_ = image.rows;
        width_ = image.cols;
        out_codec_ctx->codec_tag = 0;
        out_codec_ctx->bit_rate = 2e6;
        // does nothing, unfortunately
        out_codec_ctx->thread_count = 1;
        out_codec_ctx->codec_id = AV_CODEC_ID_H264;
        out_codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
        out_codec_ctx->width = width_;
        out_codec_ctx->height = height_;
        out_codec_ctx->gop_size = 6;
        out_codec_ctx->pix_fmt = AV_PIX_FMT_YUV420P;
        out_codec_ctx->framerate = AVRational{static_cast<int>(fps_), 1};
        out_codec_ctx->time_base = av_inv_q(out_codec_ctx->framerate);
        avcodec_parameters_from_context(out_stream->codecpar, out_codec_ctx);
        // this stuff is empty: is that the problem?
        out_stream->codecpar->extradata = out_codec_ctx->extradata;
        out_stream->codecpar->extradata_size = out_codec_ctx->extradata_size;
        AVDictionary *codec_options = nullptr;
        av_dict_set(&codec_options, "profile", "high", 0);
        av_dict_set(&codec_options, "preset", "ultrafast", 0);
        av_dict_set(&codec_options, "tune", "zerolatency", 0);
        // open video encoder
        avcodec_open2(out_codec_ctx, out_codec, &codec_options);
        out_stream->time_base.num = 1;
        out_stream->time_base.den = fps_;
        avio_open(&(ofmt_ctx->pb), url.c_str(), AVIO_FLAG_WRITE);
        /* Write a file for VLC */
        char buf[200000];
        AVFormatContext *ac[] = {ofmt_ctx};
        av_sdp_create(ac, 1, buf, sizeof(buf));
        printf("sdp:\n%s\n", buf);
        FILE *fsdp = fopen("test.sdp", "w");
        fprintf(fsdp, "%s", buf);
        fclose(fsdp);
        swsctx = sws_getContext(width_, height_, AV_PIX_FMT_BGR24, width_, height_,
                                out_codec_ctx->pix_fmt, SWS_BICUBIC, nullptr,
                                nullptr, nullptr);
    }
    if (!frame_) {
        frame_ = av_frame_alloc();
        std::uint8_t *framebuf = new uint8_t[av_image_get_buffer_size(
            out_codec_ctx->pix_fmt, width_, height_, 1)];
        av_image_fill_arrays(frame_->data, frame_->linesize, framebuf,
                             out_codec_ctx->pix_fmt, width_, height_, 1);
        frame_->width = width_;
        frame_->height = height_;
        frame_->format = static_cast<int>(out_codec_ctx->pix_fmt);
        int success = avformat_write_header(ofmt_ctx, nullptr);
    }
    if (imgbuf.empty()) {
        imgbuf.resize(height_ * width_ * 3 + 16);
        canvas_ = cv::Mat(height_, width_, CV_8UC3, imgbuf.data(), width_ * 3);
    } else {
        image.copyTo(canvas_);
    }
    const int stride[] = {static_cast<int>(image.step[0])};
    sws_scale(swsctx, &canvas_.data, stride, 0, canvas_.rows, frame_->data,
              frame_->linesize);
    frame_->pts += av_rescale_q(1, out_codec_ctx->time_base, out_stream->time_base);
    AVPacket pkt = {0};
    avcodec_send_frame(out_codec_ctx, frame_);
    avcodec_receive_packet(out_codec_ctx, &pkt);
    av_interleaved_write_frame(ofmt_ctx, &pkt);
}
Can anyone offer some advice here?
--
Update
When setting
this->out_codec_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
extradata is actually present in the codec context, but I had to move avcodec_parameters_from_context() after avcodec_open2(), as the extradata is empty before opening the codec. I now get sprop-parameter-sets in the SDP file, but VLC still does not play it.
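In code, the working order is roughly this (using the variable names from the snippet above):
out_codec_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER; // ask the encoder for out-of-band SPS/PPS
avcodec_open2(out_codec_ctx, out_codec, &codec_options); // extradata gets filled in here
avcodec_parameters_from_context(out_stream->codecpar, out_codec_ctx); // now copies real extradata
// av_sdp_create() can then derive sprop-parameter-sets from codecpar->extradata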
The solution in my case was the port number (???). Apparently, VLC cannot receive on 44499, which is the port I was using, but 5004 like in the ffmpeg example works. I don't know if this is a macOS idiosyncrasy or transfers to Linux as well.
I tried several ports:
5001 does not work
5002 works
5003 does not work
5004 works
5005 does not work
5006 works
44498 works
So it seems that for VLC to receive RTP packets, the port number must be even-numbered? Wat?
The explanation seems to be that live555 discards the least significant bit of the port number: https://github.com/rgaufman/live555/blob/master/liveMedia/MediaSession.cpp#L696
So only even ports make it through unchanged. This is what the RFC recommends:
For UDP and similar protocols,
RTP SHOULD use an even destination port number and the corresponding
RTCP stream SHOULD use the next higher (odd) destination port number.

Encoding of raw frames (D3D11Texture2D) to an rtsp stream using libav*

I have managed to create an RTSP stream using libav* and a DirectX texture (which I am obtaining from the GDI API using the BitBlt method). Here's my approach for creating a live RTSP stream:
Create output context and stream (skipping the checks here)
avformat_alloc_output_context2(&ofmt_ctx, NULL, "rtsp", rtsp_url); //RTSP
vid_codec = avcodec_find_encoder(ofmt_ctx->oformat->video_codec);
vid_stream = avformat_new_stream(ofmt_ctx,vid_codec);
vid_codec_ctx = avcodec_alloc_context3(vid_codec);
Set codec params
vid_codec_ctx->codec_tag = 0;
vid_codec_ctx->codec_id = ofmt_ctx->oformat->video_codec;
//vid_codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
vid_codec_ctx->width = width; vid_codec_ctx->height = height;
vid_codec_ctx->gop_size = 12;
//vid_codec_ctx->gop_size = 40;
//vid_codec_ctx->max_b_frames = 3;
vid_codec_ctx->pix_fmt = target_pix_fmt; // AV_PIX_FMT_YUV420P
vid_codec_ctx->framerate = { stream_fps, 1 };
vid_codec_ctx->time_base = { 1, stream_fps };
if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)
{
    vid_codec_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
}
Initialize video stream
if (avcodec_parameters_from_context(vid_stream->codecpar, vid_codec_ctx) < 0)
{
    Debug::Error("Could not initialize stream codec parameters!");
    return false;
}
AVDictionary* codec_options = nullptr;
if (vid_codec->id == AV_CODEC_ID_H264) {
    av_dict_set(&codec_options, "profile", "high", 0);
    av_dict_set(&codec_options, "preset", "fast", 0);
    av_dict_set(&codec_options, "tune", "zerolatency", 0);
}
// open video encoder
int ret = avcodec_open2(vid_codec_ctx, vid_codec, &codec_options);
if (ret < 0) {
    Debug::Error("Could not open video encoder: ", avcodec_get_name(vid_codec->id), " error ret: ", AVERROR(ret));
    return false;
}
vid_stream->codecpar->extradata = vid_codec_ctx->extradata;
vid_stream->codecpar->extradata_size = vid_codec_ctx->extradata_size;
Start streaming
// Create new frame and allocate buffer
AVFrame* AllocateFrameBuffer(AVCodecContext* codec_ctx, int width, int height)
{
    AVFrame* frame = av_frame_alloc();
    // note: framebuf is local, so its storage is released when this function returns;
    // the frame only stays usable because av_frame_get_buffer() reallocates it later
    std::vector<uint8_t> framebuf(av_image_get_buffer_size(codec_ctx->pix_fmt, width, height, 1));
    av_image_fill_arrays(frame->data, frame->linesize, framebuf.data(), codec_ctx->pix_fmt, width, height, 1);
    frame->width = width;
    frame->height = height;
    frame->format = static_cast<int>(codec_ctx->pix_fmt);
    //Debug::Log("framebuf size: ", framebuf.size(), " frame format: ", frame->format);
    return frame;
}
void RtspStream(AVFormatContext* ofmt_ctx, AVStream* vid_stream, AVCodecContext* vid_codec_ctx, char* rtsp_url)
{
    printf("Output stream info:\n");
    av_dump_format(ofmt_ctx, 0, rtsp_url, 1);
    const int width = WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetTextureWidth();
    const int height = WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetTextureHeight();
    //DirectX BGRA to h264 YUV420p
    SwsContext* conversion_ctx = sws_getContext(width, height, src_pix_fmt,
        vid_stream->codecpar->width, vid_stream->codecpar->height, target_pix_fmt,
        SWS_BICUBIC | SWS_BITEXACT, nullptr, nullptr, nullptr);
    if (!conversion_ctx)
    {
        Debug::Error("Could not initialize sample scaler!");
        return;
    }
    AVFrame* frame = AllocateFrameBuffer(vid_codec_ctx, vid_codec_ctx->width, vid_codec_ctx->height);
    if (!frame) {
        Debug::Error("Could not allocate video frame\n");
        return;
    }
    if (avformat_write_header(ofmt_ctx, NULL) < 0) {
        Debug::Error("Error occurred when writing header");
        return;
    }
    if (av_frame_get_buffer(frame, 0) < 0) {
        Debug::Error("Could not allocate the video frame data\n");
        return;
    }
    int frame_cnt = 0;
    //av start time in microseconds
    int64_t start_time_av = av_gettime();
    AVRational time_base = vid_stream->time_base;
    AVRational time_base_q = { 1, AV_TIME_BASE };
    // frame pixel data info
    int data_size = width * height * 4;
    uint8_t* data = new uint8_t[data_size];
    while (RtspStreaming::IsStreaming())
    {
        /* make sure the frame data is writable */
        if (av_frame_make_writable(frame) < 0)
        {
            Debug::Error("Can't make frame writable");
            break;
        }
        //get copy/ref of the texture
        //uint8_t* data = WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetBuffer();
        if (!WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetPixels(data, 0, 0, width, height))
        {
            Debug::Error("Failed to get frame buffer. ID: ", RtspStreaming::WindowId());
            std::this_thread::sleep_for(std::chrono::seconds(2));
            continue;
        }
        // convert BGRA to yuv420 pixel format
        int srcStrides[1] = { 4 * width };
        if (sws_scale(conversion_ctx, &data, srcStrides, 0, height, frame->data, frame->linesize) < 0)
        {
            Debug::Error("Unable to scale d3d11 texture to frame. ", frame_cnt);
            break;
        }
        //Debug::Log("frame pts: ", frame->pts, " time_base:", av_rescale_q(1, vid_codec_ctx->time_base, vid_stream->time_base));
        frame->pts = frame_cnt++;
        //encode to the video stream
        int ret = avcodec_send_frame(vid_codec_ctx, frame);
        if (ret < 0)
        {
            Debug::Error("Error sending frame to codec context! ", frame_cnt);
            break;
        }
        AVPacket* pkt = av_packet_alloc();
        ret = avcodec_receive_packet(vid_codec_ctx, pkt);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
        {
            av_packet_free(&pkt);
            continue;
        }
        else if (ret < 0)
        {
            Debug::Error("Error during receiving packet: ", AVERROR(ret));
            av_packet_free(&pkt);
            break;
        }
        if (pkt->pts == AV_NOPTS_VALUE)
        {
            //Write PTS
            //Duration between 2 frames (us)
            int64_t calc_duration = (double)AV_TIME_BASE / av_q2d(vid_stream->r_frame_rate);
            //Parameters
            pkt->pts = (double)(frame_cnt * calc_duration) / (double)(av_q2d(time_base) * AV_TIME_BASE);
            pkt->dts = pkt->pts;
            pkt->duration = (double)calc_duration / (double)(av_q2d(time_base) * AV_TIME_BASE);
        }
        int64_t pts_time = av_rescale_q(pkt->dts, time_base, time_base_q);
        int64_t now_time = av_gettime() - start_time_av;
        if (pts_time > now_time)
            av_usleep(pts_time - now_time);
        //write frame and send
        if (av_interleaved_write_frame(ofmt_ctx, pkt) < 0)
        {
            Debug::Error("Error muxing packet, frame number:", frame_cnt);
            break;
        }
        av_packet_free(&pkt);
    }
    delete[] data;
    /* Write the trailer, if any. The trailer must be written before you
     * close the CodecContexts open when you wrote the header; otherwise
     * av_write_trailer() may try to use memory that was freed on
     * av_codec_close(). */
    av_write_trailer(ofmt_ctx);
    av_frame_unref(frame);
    av_frame_free(&frame);
    printf("streaming thread CLOSED!\n");
}
Now, this allows me to connect to my rtsp server and maintain the connection. However, on the rtsp client side I am getting either a gray frame or a single static frame (screenshot omitted).
Would appreciate it if you can help with the following questions:
Firstly, why is the stream not working in spite of a continued connection to the server and updated frames?
Video codec. By default the rtsp format uses the Mpeg4 codec; is it possible to use h264? When I manually set it to AV_CODEC_ID_H264 the program fails at avcodec_open2 with a return value of -22.
Do I need to create and allocate a new "AVFrame" and "AVPacket" for every frame? Or can I just reuse a global variable for this?
Do I need to explicitly define some code for real-time streaming? (Like the "-re" flag in ffmpeg.)
Would be great if you can point out some example code for creating a livestream. I have checked the following resources:
https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/encode_video.c
streaming FLV to RTMP with FFMpeg using H264 codec and C++ API to flv.js
https://medium.com/swlh/streaming-video-with-ffmpeg-and-directx-11-7395fcb372c4
Update
While testing I found that I am able to play the stream using ffplay, while it gets stuck on the VLC player. (ffplay log screenshot omitted.)
The basic construction and initialization seem to be okay. Find my responses to your questions below.
Why is the stream not working in spite of a continued connection to the server and updated frames?
If you're getting an error or a broken stream, you might want to check into the presentation and decoding timestamps (pts/dts) of your packets.
In your code, I notice that you're taking time_base from the video stream object, which is not guaranteed to be the same as the codec->time_base value and usually varies depending on the active stream.
AVRational time_base = vid_stream->time_base;
AVRational time_base_q = { 1, AV_TIME_BASE };
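A safer pattern, assuming frame->pts is counted in vid_codec_ctx->time_base units (one tick per frame at 1/fps), is to rescale every packet into the stream's time_base before muxing, e.g.:
// convert pts/dts/duration from the encoder's time_base to the muxer's
av_packet_rescale_ts(pkt, vid_codec_ctx->time_base, vid_stream->time_base);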
Video codec. By default the rtsp format uses the Mpeg4 codec; is it possible to use h264?
I don't see why not... RTSP is just a protocol for carrying your packets over the network. So you should be able to use AV_CODEC_ID_H264 for encoding the stream.
Do I need to create and allocate a new "AVFrame" and "AVPacket" for every frame? Or can I just reuse a global variable for this?
In libav, during the encoding process a single packet is used for encoding a video frame, while there can be multiple audio frames in a single packet. I should reference this, but I can't seem to find any source at the moment. Anyway, the point is that you would need to create a new packet every time.
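A common shape for that loop, reusing the names from the question (a sketch, not the poster's exact code; note one avcodec_send_frame() call can yield zero or more packets, so the receive side loops):
int ret = avcodec_send_frame(vid_codec_ctx, frame);
while (ret >= 0) {
    AVPacket* pkt = av_packet_alloc(); // fresh packet for each encoder output
    ret = avcodec_receive_packet(vid_codec_ctx, pkt);
    if (ret < 0) { // AVERROR(EAGAIN)/AVERROR_EOF mean "no more output for now"
        av_packet_free(&pkt);
        break;
    }
    av_packet_rescale_ts(pkt, vid_codec_ctx->time_base, vid_stream->time_base);
    av_interleaved_write_frame(ofmt_ctx, pkt); // consumes the packet's data reference
    av_packet_free(&pkt);
}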
Do I need to explicitly define some code for real-time streaming? (Like the "-re" flag in ffmpeg.)
You don't need to add anything else for real-time streaming, although you might want to limit the number of frame updates you pass to the encoder to save some performance.
For me, the difference between a good capture in ffplay and a bad capture in VLC (for UDP packets) was the pkt_size=xxx attribute (ffmpeg -re -i test.mp4 -f mpegts udp://127.0.0.1:23000?pkt_size=1316) (VLC: open media, network tab, udp://@:23000:pkt_size=1316). So only if pkt_size is defined (and equal on both ends) is VLC able to capture.

RTMP live stream directly from NVENC encoder

I am trying to create a live RTMP stream containing the animation generated with NVIDIA OptiX. The stream is to be received by nginx + rtmp module and broadcasted in MPEG-DASH format. Full chain up to dash.js player is working if the video is first saved to .flv file and then I send it with ffmpeg without any reformatting using command:
ffmpeg -re -i my_video.flv -c:v copy -f flv rtmp://x.x.x.x:1935/dash/test
But I want to stream directly from the code, and with this I am failing... Nginx logs the error "dash: invalid avcc received (2: No such file or directory)". Then it seems to receive the stream correctly (segments are rolling, the dash manifest is there), however the stream is not possible to play in the browser.
I can see only one difference in the manifest between the direct stream and the stream from the file: the codecs attribute of the representation in the direct stream is mangled, codecs="avcc1.000000" instead of "avc1.640028", which I get when streaming from the file.
My code opens the stream:
av_register_all();
AVOutputFormat* fmt = av_guess_format("flv", file_name, nullptr);
fmt->video_codec = AV_CODEC_ID_H264;
AVFormatContext* _oc;
avformat_alloc_output_context2(&_oc, fmt, nullptr, "rtmp://x.x.x.x:1935/dash/test");
AVStream* _vs = avformat_new_stream(_oc, nullptr);
_vs->id = 0;
_vs->time_base = AVRational { 1, 25 };
_vs->avg_frame_rate = AVRational{ 25, 1 };
AVCodecParameters *vpar = _vs->codecpar;
vpar->codec_id = fmt->video_codec;
vpar->codec_type = AVMEDIA_TYPE_VIDEO;
vpar->format = AV_PIX_FMT_YUV420P;
vpar->profile = FF_PROFILE_H264_HIGH;
vpar->level = _level;
vpar->width = _width;
vpar->height = _height;
vpar->bit_rate = _avg_bitrate;
avio_open(&_oc->pb, _oc->filename, AVIO_FLAG_WRITE);
avformat_write_header(_oc, nullptr);
Width, height, bitrate, level and profile I get from the NVENC encoder settings. I also do error checking, omitted here. Then I have a loop writing each encoded packet, with IDR frames etc. all prepared on the fly with NVENC. The loop body is:
auto & pkt_data = _packets[i];
AVPacket pkt = { 0 };
av_init_packet(&pkt);
pkt.pts = av_rescale_q(_n_frames++, AVRational{ 1, 25 }, _vs->time_base);
pkt.duration = av_rescale_q(1, AVRational{ 1, 25 }, _vs->time_base);
pkt.dts = pkt.pts;
pkt.stream_index = _vs->index;
pkt.data = pkt_data.data();
pkt.size = (int)pkt_data.size();
if (!memcmp(pkt_data.data(), "\x00\x00\x00\x01\x67", 5))
{
    pkt.flags |= AV_PKT_FLAG_KEY;
}
av_write_frame(_oc, &pkt);
Obviously ffmpeg is writing the avcc code somewhere... I have no clue where to add this code so the RTMP server can recognize it. Or am I missing something else?
Any hint greatly appreciated, folks!
Thanks to Gyan's comment I was able to solve the issue. Following the AV_CODEC_FLAG_GLOBAL_HEADER flag through the wrapper, one can see how the global header is added, which was missing in my case. You could call the NVENC API function nvEncGetSequenceParams directly, but since I am using the SDK anyway, this is a bit cleaner.
So I had to attach the header to AVCodecParameters::extradata:
std::vector<uint8_t> payload;
_encoder->GetSequenceParams(payload);
vpar->extradata_size = payload.size();
vpar->extradata = (uint8_t*)av_mallocz(payload.size() + AV_INPUT_BUFFER_PADDING_SIZE);
memcpy(vpar->extradata, payload.data(), payload.size());
_encoder is my instance of NvEncoder from SDK.
The wrapper is doing the same thing, however using deprecated struct AVCodecContext.
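For comparison, when the encoder is a libavcodec one (so an AVCodecContext exists; codec_ctx and codec are hypothetical names here), the same header falls out of avcodec_parameters_from_context():
codec_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER; // request out-of-band SPS/PPS; must be set before opening
avcodec_open2(codec_ctx, codec, nullptr);
avcodec_parameters_from_context(_vs->codecpar, codec_ctx); // copies extradata into codecpar as well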

sync dumped rtp streams [duplicate]

Hi, I am in need of a bit of help/guidance because I've gotten stuck in my research.
The problem:
How to convert RTP data using either gstreamer or avlib (ffmpeg) in either API (by programming) or console versions.
Data
I have an RTP dump that comes from RTP/RTCP over TCP, so I can get the precise start and stop of each RTP packet in the file. It's an H264 video stream dump.
The data is in this fashion because I need to acquire the RTCP/RTP interleaved stream via libcurl (which I'm currently doing).
Status
I've tried to use ffmpeg to consume pure RTP packets, but it seems that using rtp, either from the console or programmatically, involves "starting" the whole rtsp/rtp session business in ffmpeg. I've stopped there, and for the time being I didn't pursue this avenue deeper. I guess this is possible with a lower-level RTP API like ff_rtp_parse_packet(), but I'm too new to this lib to do it straight out.
Then there is gstreamer. It has somewhat more capabilities to do it without programming, but for the time being I'm not able to figure out how to pass it the RTP dump I have.
I have also tried a bit of trickery: streaming the dump via socat/nc to a UDP port and listening on it via ffplay with an SDP file as input. There seems to be some progress, the rtp at least gets recognized, but with socat there are loads of packets missing (data sent too fast perhaps?) and in the end the data is not visualized. When I used nc the video was badly misshapen, but at least there were not that many receive errors.
One way or another the data is not properly visualized.
I know I can depacketize the data "by hand", but the idea is to do it via some kind of library, because in the end there would also be a second stream with audio that would have to be muxed together with the video.
I would appreciate any help on how to tackle this problem.
Thanks.
Finally, after some time, I was able to sit down with this problem again, and I've got a solution that satisfies me. I went on with the RTP interleaved stream (RTP is interleaved with RTCP over a single TCP connection).
So I had an interleaved RTCP/RTP stream that needed to be disassembled into Audio (PCM A-Law) and Video (h.264 Constrained Baseline) RTP packets.
The decomposition of the RTSP stream containing RTP data is described in rfc2326.
Depacketization of the H264 is described in rfc6184; for the PCM A-Law, the frames came out as raw audio in RTP, so no depacketization was necessary.
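For reference, the only non-trivial part of RFC 6184 for a typical stream like this is reassembling fragmented NAL units (FU-A); a rough sketch of the logic, with nal_start()/nal_append() as hypothetical sinks for the assembled NAL units:
// p = RTP payload, len = payload size
static void depacketize_h264(const uint8_t *p, size_t len)
{
    uint8_t nal_type = p[0] & 0x1F;
    if (nal_type >= 1 && nal_type <= 23) {
        nal_start();                          // single NAL unit packet: payload *is* the NAL unit
        nal_append(p, len);
    } else if (nal_type == 28 && len > 2) {   // FU-A fragment
        uint8_t fu_header = p[1];
        if (fu_header & 0x80) {               // S bit: first fragment, rebuild the NAL header
            uint8_t nal_hdr = (p[0] & 0xE0) | (fu_header & 0x1F);
            nal_start();
            nal_append(&nal_hdr, 1);
        }
        nal_append(p + 2, len - 2);           // remainder is raw NAL payload
    }
}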
The next step was to calculate a proper PTS (presentation time stamp) for each stream; that was a bit of a hassle, but finally the Live555 code came to help
(see RTP lip-sync synchronization).
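The gist of that approach: each RTCP sender report pairs an RTP timestamp with an NTP wallclock time, so once one report per stream has arrived, every RTP timestamp can be mapped onto the shared clock. Schematically (ntp_base/rtp_base taken from the latest sender report; clock rate 90000 for the H264 video, 8000 for the PCM A-Law audio):
// the int32_t cast keeps RTP timestamp wraparound from breaking the math
double pts_seconds = ntp_base + (double)(int32_t)(rtp_ts - rtp_base) / clock_rate;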
The last task was to mux it into a container that supports PCM A-Law; I've used ffmpeg's av libraries.
There are many examples over the Internet, but many of them are outdated (ffmpeg is very 'dynamic' in the API-changes region), so I'm posting (the most important parts of) what actually worked for me in the end:
The setup part:
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include "libavutil/intreadwrite.h"
#include "libavutil/mathematics.h"
AVFormatContext *formatContext;
AVOutputFormat *outputFormat;
AVStream *video_st;
AVStream *audio_st;
AVCodec *av_encode_codec = NULL;
AVCodec *av_audio_encode_codec = NULL;
AVCodecContext *av_video_encode_codec_ctx = NULL;
AVCodecContext *av_audio_encode_codec_ctx = NULL;
av_register_all();
av_log_set_level(AV_LOG_TRACE);
outputFormat = av_guess_format(NULL, pu8outFileName, NULL);
outputFormat->video_codec = AV_CODEC_ID_H264;
av_encode_codec = avcodec_find_encoder(AV_CODEC_ID_H264);
av_audio_encode_codec = avcodec_find_encoder(AV_CODEC_ID_PCM_ALAW);
avformat_alloc_output_context2(&formatContext, NULL, NULL, pu8outFileName);
formatContext->oformat = outputFormat;
strcpy(formatContext->filename, pu8outFileName);
outputFormat->audio_codec = AV_CODEC_ID_PCM_ALAW;
av_video_encode_codec_ctx = avcodec_alloc_context3(av_encode_codec);
av_audio_encode_codec_ctx = avcodec_alloc_context3(av_audio_encode_codec);
av_video_encode_codec_ctx->codec_id = outputFormat->video_codec;
av_video_encode_codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
av_video_encode_codec_ctx->bit_rate = 4000;
av_video_encode_codec_ctx->width = u32width;
av_video_encode_codec_ctx->height = u32height;
av_video_encode_codec_ctx->time_base = (AVRational){ 1, u8fps };
av_video_encode_codec_ctx->max_b_frames = 0;
av_video_encode_codec_ctx->pix_fmt = AV_PIX_FMT_YUV420P;
av_audio_encode_codec_ctx->sample_fmt = AV_SAMPLE_FMT_S16;
av_audio_encode_codec_ctx->codec_id = AV_CODEC_ID_PCM_ALAW;
av_audio_encode_codec_ctx->codec_type = AVMEDIA_TYPE_AUDIO;
av_audio_encode_codec_ctx->sample_rate = 8000;
av_audio_encode_codec_ctx->channels = 1;
av_audio_encode_codec_ctx->time_base = (AVRational){ 1, u8fps };
av_audio_encode_codec_ctx->channel_layout = AV_CH_LAYOUT_MONO;
video_st = avformat_new_stream(formatContext, av_encode_codec);
audio_st = avformat_new_stream(formatContext, av_audio_encode_codec);
audio_st->index = 1;
video_st->avg_frame_rate = (AVRational){ 90000, 90000 / u8fps };
av_stream_set_r_frame_rate(video_st, (AVRational){ 90000, 90000 / u8fps });
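(The snippets stop before the output is actually opened; given that the deinit below closes formatContext->pb and writes a trailer, something like the following has to sit between the setup and the write loops:)
avio_open(&formatContext->pb, pu8outFileName, AVIO_FLAG_WRITE);
avformat_write_header(formatContext, NULL);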
The packets for video are written like this:
uint8_t *pu8framePtr = video_frame;
AVPacket pkt = { 0 };
av_init_packet(&pkt);
if (0x65 == pu8framePtr[4] || 0x67 == pu8framePtr[4] || 0x68 == pu8framePtr[4])
{
    pkt.flags = AV_PKT_FLAG_KEY;
}
pkt.data = (uint8_t *)pu8framePtr;
pkt.size = u32LastFrameSize;
pkt.pts = av_rescale_q(s_video_sync.fSyncTime.tv_sec * 1000000 + s_video_sync.fSyncTime.tv_usec, (AVRational){ 1, 1000000 }, video_st->time_base);
pkt.dts = pkt.pts;
pkt.stream_index = video_st->index;
av_interleaved_write_frame(formatContext, &pkt);
av_packet_unref(&pkt);
and for the audio like this:
AVPacket pkt = { 0 };
av_init_packet(&pkt);
pkt.flags = AV_PKT_FLAG_KEY;
pkt.data = (uint8_t *)pu8framePtr;
pkt.size = u32AudioDataLen;
pkt.pts = av_rescale_q(s_audio_sync.fSyncTime.tv_sec * 1000000 + s_audio_sync.fSyncTime.tv_usec, (AVRational){ 1, 1000000 }, audio_st->time_base);
pkt.dts = pkt.pts;
pkt.stream_index = audio_st->index;
if (u8FirstIFrameFound) {av_interleaved_write_frame(formatContext, &pkt);}
av_packet_unref(&pkt);
and at the end some deinits:
av_write_trailer(formatContext);
av_dump_format(formatContext, 0, pu8outFileName, 1);
avcodec_free_context(&av_video_encode_codec_ctx);
avcodec_free_context(&av_audio_encode_codec_ctx);
avio_closep(&formatContext->pb);
avformat_free_context(formatContext);

Transcoding videos with LibAvFormat for playback on iOS devices

I’m trying to transcode a video in my iOS app using FFMpeg/LibAv.
What I’m trying to accomplish is to transcode a video in order to resize each frame and possibly lower the bitrate, to save valuable MB on the device.
The resulting video must be playable on all iPhone5+ devices.
After reading the documentation I found out that:
I do not need to encode/decode the audio stream -> I’ll copy as-is to the output file
I need to encode the video using the h264 codec (LibX264) with a profile supported by iOS (baseline profile with level 3.0 - https://trac.ffmpeg.org/wiki/Encode/H.264#Compatibility)
I’m also setting the picture format to YUV planar since it’s the only one supported by iOS
For the sake of testing I’m not using any filter (I’m using a dummy/passthrough) at all, nor am I trying to lower the bitrate; I’m just trying to decode the video stream and encode it again
Most of the code is based on the transcoding.c and filtering.c available on the FFMpeg examples directory
FFMpeg-wise what I’m trying to achieve with LibAv is:
ffmpeg -i INPUT.MOV -c:v libx264 -preset ultrafast -profile:v baseline -level 3.0 -c:a copy output.MOV
(the resulting file - which can be found below - is playable on QuickTime if it’s generated by FFMpeg through the command line)
The original video was generated with a regular iPhone using iOS 8.2, but the problem is not device specific or iOS specific; it occurs on all videos generated with LibAv.
Although both resulting files are playable by VideoLan (VLC), the one I generated through LibAv is not playable by QuickTime, even though I can’t find anything wrong with it.
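One way to narrow this down (not from the original post, just standard ffmpeg tooling) is to dump both files' stream parameters with ffprobe and diff the output:
ffprobe -v error -show_format -show_streams local-ffmpeg.MOV > a.txt
ffprobe -v error -show_format -show_streams generated-by-Ze.MOV > b.txt
diff a.txt b.txt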
As you can see below, I create the video stream with the proper video codec on the call to avformat_new_stream:
AVStream *out_stream; // output stream
AVStream *in_stream; // input stream
AVCodecContext *dec_ctx, *enc_ctx; // codec context for the stream
AVCodec *encoder; // codec used
int ret;
unsigned int i;
ofmt_ctx = NULL;
// Allocate an AVFormatContext for an output format. This will be the file header (similar to avformat_open_input but with zero'ed memory)
avformat_alloc_output_context2(&ofmt_ctx, NULL, NULL, filename);
if (!ofmt_ctx) {
    av_log(NULL, AV_LOG_ERROR, "Could not create output context\n");
    [self errorWith:kErrorCreatingOutputContext and:@"Could not create output context"];
    return AVERROR_UNKNOWN;
}
// we must not use the AVCodecContext from the video stream directly! So we have to use avcodec_copy_context() to copy the context to a new location (after allocating memory for it, of course).
// iterate over all input streams
for (i = 0; i < ifmt_ctx->nb_streams; i++) {
    in_stream = ifmt_ctx->streams[i]; // input stream
    dec_ctx = in_stream->codec; // get the codec context for the decoder
    if (dec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
        // lets use h264
        encoder = avcodec_find_encoder(AV_CODEC_ID_H264);
        if (!encoder) {
            [self errorWith:kErrorCodecNotFound and:@"H264 Codec Not Found"];
            return AVERROR_UNKNOWN;
        }
        out_stream = avformat_new_stream(ofmt_ctx, encoder); // create a new stream with h264 codec
        if (!out_stream) {
            av_log(NULL, AV_LOG_ERROR, "Failed allocating output stream\n");
            [self errorWith:kErrorAllocateOutputStream and:@"Failed allocating output stream"];
            return AVERROR_UNKNOWN;
        }
        enc_ctx = out_stream->codec; // pointer to the stream codec context
        /* we transcode to same properties (picture size,
         * sample rate etc.). These properties can be changed for output
         * streams easily using filters */
        if (dec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
            enc_ctx->width = dec_ctx->width;
            enc_ctx->height = dec_ctx->height;
            enc_ctx->sample_aspect_ratio = dec_ctx->sample_aspect_ratio;
            enc_ctx->pix_fmt = AV_PIX_FMT_YUV420P;
            enc_ctx->time_base = dec_ctx->time_base;
            av_opt_set(enc_ctx->priv_data, "preset", "ultrafast", 0);
            av_opt_set(enc_ctx->priv_data, "profile", "baseline", 0);
            av_opt_set(enc_ctx->priv_data, "level", "3.0", 0);
        }
        out_stream->time_base = in_stream->time_base;
        AVDictionaryEntry *tag = NULL;
        while ((tag = av_dict_get(in_stream->metadata, "", tag, AV_DICT_IGNORE_SUFFIX))) {
            printf("%s=%s\n", tag->key, tag->value);
            char *k = av_strdup(tag->key); // if your strings are already allocated,
            char *v = av_strdup(tag->value); // you can avoid copying them like this
            av_dict_set(&out_stream->metadata, k, v, 0);
        }
        ret = avcodec_open2(enc_ctx, encoder, NULL);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Cannot open video encoder for stream #%u\n", i);
            [self errorWith:kErrorCantOpenOutputFile and:[NSString stringWithFormat:@"Cannot open video encoder for stream #%u", i]];
            return ret;
        }
    }
    else if (dec_ctx->codec_type == AVMEDIA_TYPE_UNKNOWN) {
        // if we cant figure out the stream type, fail
        av_log(NULL, AV_LOG_FATAL, "Elementary stream #%d is of unknown type, cannot proceed\n", i);
        [self errorWith:kErrorUnknownStream and:[NSString stringWithFormat:@"Elementary stream #%d is of unknown type, cannot proceed", i]];
        return AVERROR_INVALIDDATA;
    }
    else {
        out_stream = avformat_new_stream(ofmt_ctx, NULL);
        if (!out_stream) {
            av_log(NULL, AV_LOG_ERROR, "Failed allocating output stream\n");
            [self errorWith:kErrorAllocateOutputStream and:@"Failed allocating output stream"];
            return AVERROR_UNKNOWN;
        }
        enc_ctx = out_stream->codec;
        /* this stream must be remuxed */
        // copies ifmt_ctx->streams[i]->codec into ofmt_ctx->streams[i]->codec - Copy the settings of the source AVCodecContext into the destination AVCodecContext.
        ret = avcodec_copy_context(ofmt_ctx->streams[i]->codec, ifmt_ctx->streams[i]->codec);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Copying stream context failed\n");
            [self errorWith:kErrorCopyStreamFailed and:@"Copying stream context failed"];
            return ret;
        }
    }
    // formats with AVFMT_GLOBALHEADER expect codec extradata (global headers) instead of in-band headers
    if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)
        enc_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
}
if (!(ofmt_ctx->oformat->flags & AVFMT_NOFILE)) {
    // Create and initialize an AVIOContext for accessing the
    // resource indicated by url.
    ret = avio_open(&ofmt_ctx->pb, filename, AVIO_FLAG_WRITE);
    if (ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "Could not open output file '%s'", filename);
        [self errorWith:kErrorCantOpenOutputFile and:[NSString stringWithFormat:@"Could not open output file '%s'", filename]];
        return ret;
    }
}
/* init muxer, write output file header */
// Allocate the stream private data and write the stream header to an output media file.
ret = avformat_write_header(ofmt_ctx, NULL);
if (ret < 0) {
    av_log(NULL, AV_LOG_ERROR, "Error occurred when opening output file\n");
    [self errorWith:kErrorOutFileCantWriteHeader and:@"Error occurred when opening output file"];
    return ret;
}
return 0;
You can find the files here:
Original final: https://www.dropbox.com/s/2jjs1uy2pu2veyy/IMG_5705.MOV?dl=0
File generated with FFMpeg - https://www.dropbox.com/s/9hfmq3fcifgpfqc/local-ffmpeg.MOV?dl=0
File generated by code - https://www.dropbox.com/s/rttvny39rj7ejpf/generated-by-Ze.MOV?dl=0
Thank you so much,
Ze
