I'm trying to write a decoder for a WebRTC app in C. I receive an RTP stream, parse every packet, reorder them, and put the payload in an AVPacket, as described here (FFmpeg decode raw buffer with avcodec_decode_video2).
The reordering part is not described in that link, but I'm pretty sure that part is OK.
The question is, I don't know how to give the decoder information about resolution, pix_fmt, etc. Do I need to create an AVStream* and fill it with all the information I took from the RTP header?
Does someone have a piece of running code that decodes depacketized VP8 packets without the use of rtp_dec, etc.?
In that link, no further information seems to be sent to the decoder; is it able to decode without knowing the resolution and without any header?
You don't need to feed "resolution, pix_fmt etc." information to the decoder; it derives those from the input AVPackets.
Encoders need information such as resolution, pix_fmt, etc. to generate the compressed byte/bit stream, and they embed that information in the generated bitstream. Once the decoder receives the bitstream in the right order, it derives the resolution and pix_fmt before proceeding to decompress it.
The order of the packets you are feeding to the decoder is probably the cause in your case.
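For reference, a minimal sketch (untested, using the newer send/receive API rather than avcodec_decode_video2) of a VP8 decoder that is never told the resolution or pix_fmt up front; the function names and the assumption that data/size hold one fully reassembled frame are mine:

    #include <stdint.h>
    #include <libavcodec/avcodec.h>

    /* Open a VP8 decoder without setting width/height/pix_fmt. */
    static AVCodecContext *open_vp8_decoder(void)
    {
        const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_VP8);
        AVCodecContext *ctx = codec ? avcodec_alloc_context3(codec) : NULL;

        if (!ctx || avcodec_open2(ctx, codec, NULL) < 0)
            return NULL;        /* width/height stay 0 until the first frame */
        return ctx;
    }

    /* 'data'/'size' must be one complete, reassembled VP8 frame: all RTP
     * payloads of that frame, in order, with the RTP headers stripped. */
    static int decode_vp8_frame(AVCodecContext *ctx, AVFrame *frame,
                                uint8_t *data, int size)
    {
        AVPacket *pkt = av_packet_alloc();
        int ret;

        pkt->data = data;
        pkt->size = size;

        ret = avcodec_send_packet(ctx, pkt);
        if (ret >= 0)
            ret = avcodec_receive_frame(ctx, frame);
        /* On success, frame->width, frame->height and frame->format have
         * been filled in by the decoder itself. */
        av_packet_free(&pkt);
        return ret;
    }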
Related
I am trying to use ffmpeg and have been experimenting with it for the last month.
I have not been able to get it working. Is it really that difficult to use FFmpeg?
My requirement is simple, as described below.
Can you please guide me on whether ffmpeg is a suitable choice, or whether I have to implement this on my own (using the available codec libraries)?
I have a webm file (containing VP8 and OPUS frames)
I will read the encoded data and send it to the remote guy
The remote guy will read the encoded data from the socket
The remote guy will write it to a file (can we avoid decoding?)
Then the remote guy should be able to play the file using ffplay or any player.
Now I will take a specific example.
Say I have a file small.webm, containing VP8 and OPUS frames.
I am reading only audio frames (OPUS) using the av_read_frame API (then checking the stream index and filtering audio frames only)
So now I have the encoded data buffer as packet.data and the encoded data size as packet.size (please correct me if I'm wrong). A sketch of this read side is shown below.
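For what it's worth, this is a minimal sketch (untested) of that read side, assuming the file name small.webm and using av_find_best_stream to locate the Opus stream instead of checking stream_index by hand:

    #include <stdio.h>
    #include <inttypes.h>
    #include <libavformat/avformat.h>

    static int dump_opus_packets(const char *path)   /* e.g. "small.webm" */
    {
        AVFormatContext *fmt = NULL;
        AVPacket *pkt = av_packet_alloc();
        int audio_idx, ret;

        if ((ret = avformat_open_input(&fmt, path, NULL, NULL)) < 0)
            goto end;
        if ((ret = avformat_find_stream_info(fmt, NULL)) < 0)
            goto end;

        audio_idx = av_find_best_stream(fmt, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);
        if (audio_idx < 0) { ret = audio_idx; goto end; }

        while (av_read_frame(fmt, pkt) >= 0) {
            if (pkt->stream_index == audio_idx) {
                /* pkt->data / pkt->size is one encoded Opus packet; the
                 * size is expected to vary from packet to packet. */
                printf("opus packet: %d bytes, pts=%" PRId64 "\n",
                       pkt->size, pkt->pts);
            }
            av_packet_unref(pkt);
        }
        ret = 0;
    end:
        av_packet_free(&pkt);
        avformat_close_input(&fmt);
        return ret;
    }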
Here is my first doubt: the audio packet size is not the same every time. Why the difference? Sometimes the packet size is as low as 54 bytes and sometimes it is 420 bytes. For OPUS, does the frame size vary from time to time?
Next, say I somehow extract a single frame (I really do not know how to extract a single frame) from the packet and send it to the remote guy.
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API. Both of them take an AVPacket as an argument. Now I can have an AVPacket, set its data and size members, and then call av_write_frame. But that does not work. The reason may be that one should set other members of the packet, like pts, dts, etc., but I do not have such information to set.
Can somebody help me figure out whether FFmpeg is the right choice, or whether I should write custom logic, like parsing an Opus file and getting it frame by frame?
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API. Both of them take an AVPacket as an argument. Now I can have an AVPacket, set its data and size members, and then call av_write_frame. But that does not work. The reason may be that one should set other members of the packet, like pts, dts, etc., but I do not have such information to set.
Yes, you do. They were in the original packet you received from the demuxer in the sender. You need to serialize all information in this packet and set each value accordingly in the receiver.
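A sketch (untested) of what that could look like; the wire header below is made up purely for illustration, only the AVPacket fields are real:

    #include <stdint.h>
    #include <libavformat/avformat.h>

    /* Hypothetical header sent in front of every packet's payload. This
     * layout is not part of any spec; it just lists the packet fields that
     * need to survive the trip besides data/size. */
    struct wire_packet_header {
        int64_t pts;
        int64_t dts;
        int64_t duration;
        int32_t stream_index;
        int32_t flags;
        int32_t size;            /* number of payload bytes that follow */
    };

    /* Receiver side: rebuild the AVPacket and hand it to the muxer. */
    static int write_received_packet(AVFormatContext *out_ctx,
                                     const struct wire_packet_header *hdr,
                                     uint8_t *payload)
    {
        AVPacket *pkt = av_packet_alloc();
        int ret;

        if (!pkt)
            return AVERROR(ENOMEM);

        pkt->data         = payload;
        pkt->size         = hdr->size;
        pkt->pts          = hdr->pts;
        pkt->dts          = hdr->dts;
        pkt->duration     = hdr->duration;
        pkt->stream_index = hdr->stream_index;
        pkt->flags        = hdr->flags;

        /* If the output stream's time base differs from the sender's, the
         * timestamps also need av_packet_rescale_ts() before writing. */
        ret = av_interleaved_write_frame(out_ctx, pkt);
        av_packet_free(&pkt);
        return ret;
    }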
I have an application that sends raw H264 NALUs generated on the fly by encoding with x264's x264_encoder_encode. I am getting them over plain TCP, so I am not missing any frames.
I need to be able to decode such a stream in the client using hardware acceleration on Windows (DXVA2). I have been struggling to find a way to get this to work using FFmpeg. It may be easier to try Media Foundation or DirectShow, but they won't take raw H264.
I either need to:
Change the code of the server application to give back an mp4 stream. I am not that experienced with x264. I was able to get raw H264 by calling x264_encoder_encode, following the answer to this question: How does one encode a series of images into H264 using the x264 C API? How can I go from this to something wrapped in mp4 while still being able to stream it in real time?
I could, at the receiver, wrap it with mp4 headers and feed it into something that can play it using DXVA. I wouldn't know how to do this.
I could find another way to accelerate it using DXVA with FFmpeg, or something else that takes it in raw format (a rough sketch of this option is shown after this list).
An important restriction is that I need to be able to pre-process each decoded frame before displaying it. Any solution that does decoding and displaying in a single step would not work for me
I would be fine with either solution
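For the third option, here is a rough sketch (untested, and only relevant if the FFmpeg build has DXVA2 support) of how the hwaccel is commonly wired up with the libavcodec API; the function names are my own:

    #include <libavcodec/avcodec.h>
    #include <libavutil/hwcontext.h>

    /* Prefer the DXVA2 pixel format if the decoder offers it. */
    static enum AVPixelFormat pick_dxva2(AVCodecContext *ctx,
                                         const enum AVPixelFormat *fmts)
    {
        const enum AVPixelFormat *p;
        for (p = fmts; *p != AV_PIX_FMT_NONE; p++)
            if (*p == AV_PIX_FMT_DXVA2_VLD)
                return *p;
        return fmts[0];   /* fall back to the decoder's software format */
    }

    static AVCodecContext *open_h264_dxva2(void)
    {
        const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
        AVCodecContext *ctx = codec ? avcodec_alloc_context3(codec) : NULL;
        AVBufferRef *hw_dev = NULL;

        if (!ctx || av_hwdevice_ctx_create(&hw_dev, AV_HWDEVICE_TYPE_DXVA2,
                                           NULL, NULL, 0) < 0)
            return NULL;

        ctx->hw_device_ctx = av_buffer_ref(hw_dev);
        ctx->get_format    = pick_dxva2;

        if (avcodec_open2(ctx, codec, NULL) < 0)
            return NULL;

        /* Raw Annex B NALUs from the TCP socket can be fed as AVPackets;
         * decoded frames come back as DXVA2 surfaces and can be copied to
         * system memory with av_hwframe_transfer_data() for pre-processing
         * before display. */
        return ctx;
    }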
I believe you should be able to use H.264 packets off the wire with Media Foundation. There's an example on page 298 of this book http://www.docstoc.com/docs/109589628/Developing-Microsoft-Media-Foundation-Applications# that uses an HTTP stream with Media Foundation.
I'm only learning Media Foundation myself and am trying to do something similar to you; in my case I want to use H.264 payloads from RTP packets, and from my understanding that will require a custom IMFSourceReader. Accessing the decoded frames should also be possible from what I've read, since there seems to be complete flexibility in chaining components together into topologies.
I want to stream video (no audio) from a server to a client. I will encode the video using libx264 and decode it with ffmpeg. I plan to use fixed settings (at the very least they will be known in advance by both the client and the server). I was wondering if I can avoid wrapping the compressed video in a container format (like mp4 or mkv).
Right now I am able to encode my frames using x264_encoder_encode. I get a compressed frame back, and I can do that for every frame. What extra information (if any) do I need to send to the client so that ffmpeg can decode the compressed frames, and more importantly, how can I obtain it with libx264? I assume I may need to generate NAL information (x264_nal_encode?). Having an idea of what is the minimum necessary to get the video across, and how to put the pieces together, would be really helpful.
I found out that the minimum amount of information is the NAL units from each frame; this gives me a raw H264 stream. If I write this to a file, I can watch it using VLC after giving it a .h264 extension.
I can also open such a file using ffmpeg, but if I want to stream it, then it makes more sense to use RTSP, and a good open source library for that is Live555: http://www.live555.com/liveMedia/
In their FAQ they mention how to send the output of your encoder to Live555, and there is source code for both a client and a server. I have yet to finish coding this, but it seems like a reasonable solution.
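To illustrate the raw-stream part, a sketch (untested) of the encoder side: libx264 emits Annex B NAL units with start codes, and the NALs of one frame are contiguous in memory, so the whole frame can be written or sent in one go. Depending on whether b_repeat_headers is enabled, SPS/PPS may also need to be sent up front via x264_encoder_headers.

    #include <stdio.h>
    #include <x264.h>

    /* Encode one picture and append the resulting Annex B NAL units to
     * 'out' (which could just as well be a socket write). */
    static int encode_and_write(x264_t *enc, x264_picture_t *pic_in, FILE *out)
    {
        x264_nal_t *nal;
        int i_nal;
        x264_picture_t pic_out;

        int frame_size = x264_encoder_encode(enc, &nal, &i_nal, pic_in, &pic_out);
        if (frame_size < 0)
            return -1;                       /* encode error */

        /* All NALs of this frame are laid out back to back starting at
         * nal[0].p_payload, and frame_size is their total size. */
        if (frame_size > 0)
            fwrite(nal[0].p_payload, 1, (size_t)frame_size, out);

        return frame_size;
    }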
I have to write an RTMP parser that will handle packets captured from an RTMP stream with Wireshark, and I will extract the data from the pcap.
I have gone through the spec and I am able to understand the handshake process and to locate the media in the TCP packets, but I am confused by the case of multiple audio/video sessions interleaved within a single pcap: how can we handle that in the parser so that it can parse multiple streams simultaneously? Any unique identifier would be very helpful for parsing the different RTMP streams in parallel.
EDIT (after @Martin Redmond's answer): Yes, that much I am able to figure out, but it seems like some FLV data is being streamed over the RTMP with the FLV header missing, and there appear to be separate handshakes and FLV data streaming for the same IP on different ports. So I am not able to tell whether it is a real FLV file or only a header, because if I extract only the header and the other data, I am not able to make an FLV file from it.
Is there any way to validate or extract the media from that RTMP stream?
The header information for each chunk of data lets you figure out which stream the chunk belongs to. It's not straightforward, though: the header information gets compressed, and the relevant info may only have been sent at the beginning of the stream, so you need to keep a context for each chunk stream.
The important part is the streamid. Video and audio from the same source will have the same streamid but different channel numbers and data types.
In the spec, the streamid is referred to as the message stream id (section 6.1.2.1) and is only sent with a type 0 header.
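As an illustration, here is a sketch of the per-chunk-stream state such a parser has to keep; the field names follow the RTMP spec's type 0 message header, and this is not a complete parser:

    #include <stdint.h>

    /* One of these is needed per chunk stream id seen in the capture,
     * because type 1/2/3 chunk headers omit fields and reuse whatever the
     * last type 0 header on the same chunk stream carried. */
    struct rtmp_chunk_stream_ctx {
        uint32_t chunk_stream_id;  /* from the basic header */
        uint32_t timestamp;
        uint32_t msg_length;       /* message length (3 bytes in the header) */
        uint8_t  msg_type_id;      /* 8 = audio, 9 = video, 18 = data, ... */
        uint32_t msg_stream_id;    /* the "streamid" that ties A/V together */
        uint32_t bytes_received;   /* progress through the current message */
    };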
I am working with ffmpeg and trying to add an audio stream on the fly. I am using AudioQueues and I get a raw audio buffer. I am encoding the audio as linear PCM, so the audio I get is in raw format, which I know ffmpeg accepts. But I cannot figure out how. I have looked into AVStream, where we have to create a new stream for this audio channel, but how do I encode it into a video that is already initialized in another AVStream structure?
Overall, I would like to get an idea of the architecture of ffmpeg. I have found it difficult to work with, since it is poorly documented. Any pointers or details are appreciated.
Thanks and Regards,
Raj Pawan G
If you want to use Java, you'll find a much better documented API wrapper for FFmpeg in Xuggler.
That said, FFmpeg does support raw PCM, but not all containers can contain it. Use the PCM codecs (see avcodec.h) and find the one that has the right size and attributes you want. To add the audio to the same container, find the AVFormatContext object that you use for your existing video stream and use av_new_stream(...) to add a new stream. Then attach your PCM encoder, 'encode' to that, and write the resulting packets. See output_example.c in the FFmpeg source for examples of this API in action.
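A sketch (untested) of that last step with the current API, where avformat_new_stream() has replaced av_new_stream(); the sample rate and channel count are assumptions:

    #include <libavformat/avformat.h>

    /* Add a raw PCM (s16le) audio stream to a muxer context that already
     * has a video stream. Uses the AVChannelLayout API (FFmpeg >= 5.1);
     * older versions set codecpar->channels / channel_layout instead. */
    static AVStream *add_pcm_stream(AVFormatContext *oc)
    {
        AVStream *st = avformat_new_stream(oc, NULL);
        if (!st)
            return NULL;

        st->codecpar->codec_type  = AVMEDIA_TYPE_AUDIO;
        st->codecpar->codec_id    = AV_CODEC_ID_PCM_S16LE;  /* raw PCM "codec" */
        st->codecpar->format      = AV_SAMPLE_FMT_S16;
        st->codecpar->sample_rate = 44100;                  /* assumed */
        av_channel_layout_default(&st->codecpar->ch_layout, 2); /* assumed stereo */

        st->time_base = (AVRational){ 1, st->codecpar->sample_rate };

        /* Packets for this stream must set pkt->stream_index = st->index;
         * the container must be one that accepts raw PCM. */
        return st;
    }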