Why are there so many "zzzzzzzzzzzz..." in Chromecast payload type 127 RTP packets? - chromecast

As the picture shows, Chromecast payload type 127 RTP packets from a Nexus 5 contain lots of "zzzz..." compared to those from Ubuntu.
These "zzzzz..." runs don't look meaningful for screen mirroring, so I'm pretty confused about why there is so much redundant content in the packets.

Related

Streaming vulkan framebuffer over HLS

I need to stream frames from a native Android Vulkan application of mine. I'm successfully copying data off my framebuffer, and the raw frame data is ready for encoding. However, I'm torn on what the best next steps would be.
The raw image needs to be encoded to JPEG, then transported via HLS, ending up at an HTTP server that hosts the playlist externally.
The main point of contention is how I can encode the raw frame and then serve it to the HTTP server.
I assume I'll need to start a TCP server to send raw frames to an ffmpeg/gstreamer pipeline for encoding, but I'm not sure what that command would look like, or whether there is a better way to do this.
You could use appsrc to push your buffers into your GStreamer pipeline.
You can find more information here:
https://gstreamer.freedesktop.org/documentation/tutorials/basic/short-cutting-the-pipeline.html?gi-language=c
Here is the HLS sink element:
https://gstreamer.freedesktop.org/documentation/hls/hlssink2.html?gi-language=c
Are you sure the HLS protocol supports JPEG?
If you want to transfer a JPEG-encoded file, here is an example:
https://gstreamer.freedesktop.org/documentation/curl/curlhttpsink.html?gi-language=c
In conclusion:
Buffer -> appsrc -> jpegenc -> jpegparse -> curlhttpsink
For HLS, there is the hlssink2 element: https://gstreamer.freedesktop.org/documentation/hls/hlssink2.html?gi-language=c
Here is how to make a GStreamer Android app:
https://gstreamer.freedesktop.org/documentation/tutorials/android/index.html?gi-language=c

FFmpeg: why avcodec_receive_frame need additional packet to get an I frame

I'm new to FFmpeg. While learning it with this nice repo (https://github.com/leandromoreira/ffmpeg-libav-tutorial), I found that in the hello_world example avcodec_receive_frame doesn't return the first I frame until it gets the third packet, as the following screenshot shows:
I'm wondering why additional packets are needed to receive an I frame.
Most modern video codecs use I/P/B frames, which is where the decoding timestamp (DTS) and the presentation timestamp (PTS) come from. What hello_world does with FFmpeg's libraries is the following:
Demuxing (av_read_frame)
Reads packets according to the file format (mp4/avi/mkv etc.) until you have a packet for the stream that you want (e.g. video). (These could perhaps be described as NAL units, but I'm not sure.)
Feeds the decoder with the packet (avcodec_send_packet)
Starts the decoding process, which continues until the decoder has enough packets to give you the first frame (decoding follows DTS order).
Checks whether a frame is ready to be presented (avcodec_receive_frame)
Asks the decoder whether it has a frame to present after being fed. It might not be ready, in which case you need to feed it again, or it might even give you more than one frame at once. (Frames come out in PTS order.)
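To illustrate the send/receive behaviour described above, here is a toy model in Python. It is not FFmpeg's actual logic: ToyDecoder, decode_all and the reorder depth of 2 are hypothetical, chosen only so that the first frame appears on the third packet as in the question. In real FFmpeg the delay may also have other causes, such as multithreaded decoding.

```python
import heapq


class ToyDecoder:
    """Toy model of a decoder with a reorder buffer (NOT FFmpeg's real logic)."""

    def __init__(self, reorder_depth):
        # Assumed number of frames the decoder holds back before it outputs anything.
        self.reorder_depth = reorder_depth
        self._heap = []  # min-heap of (pts, frame_type)

    def send_packet(self, pts, frame_type):
        # Packets arrive in decode (DTS) order; pretend each decodes instantly.
        heapq.heappush(self._heap, (pts, frame_type))

    def receive_frame(self):
        # Output the lowest-PTS frame only once more than reorder_depth frames
        # are buffered; None plays the role of AVERROR(EAGAIN).
        if len(self._heap) > self.reorder_depth:
            return heapq.heappop(self._heap)
        return None

    def flush(self):
        # At end of stream, drain whatever is still buffered, in PTS order.
        return [heapq.heappop(self._heap) for _ in range(len(self._heap))]


def decode_all(packets, reorder_depth):
    """Return ([(packet_number, frame), ...], flushed_frames)."""
    decoder = ToyDecoder(reorder_depth)
    log = []
    for number, (pts, frame_type) in enumerate(packets, start=1):
        decoder.send_packet(pts, frame_type)
        frame = decoder.receive_frame()
        while frame is not None:
            log.append((number, frame))
            frame = decoder.receive_frame()
    return log, decoder.flush()


# Packets in decode order: the I frame comes first, B frames follow the P frame
# they depend on, so PTS values are out of order.
packets = [(0, "I"), (3, "P"), (1, "B"), (2, "B"), (6, "P"), (4, "B"), (5, "B")]
log, flushed = decode_all(packets, reorder_depth=2)
for number, (pts, frame_type) in log:
    print(f"after packet {number}: frame {frame_type} with pts {pts}")
```

In this model the I frame is fully decodable from the first packet, but the decoder cannot know that no frame with an earlier PTS will follow until it has buffered enough packets, so it holds the frame back.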

PCM audio streaming over websocket

I've been struggling with the following problem and can't figure out a solution. The provided Java server application sends PCM audio data in chunks over a websocket connection. There are no headers etc. My task is to play these raw chunks of audio data in the browser without any delay. In the earlier version, I used audioContext.decodeAudioData because I was getting the full array with the 44-byte header at the beginning. Now there is no header, so decodeAudioData cannot be used. I'd be very grateful for any suggestions and tips. Maybe I have to use some JS decoding library; any example or link would help me a lot.
Thanks.
1) Your requirement "play these raw chunks of audio data in the browser without any delay" is not possible. It always takes some amount of time to send audio, receive it, and play it. Read about the term "latency." First you must settle on a realistic requirement. It might be 1 second or 50 milliseconds, but it has to be realistic.
2) Web sockets use TCP. TCP is designed for reliable communication, congestion control, etc. It is not designed for fast, low-latency communication.
3) Give more information about your problem. Are your client and server communicating over the Internet or over a local LAN? This will hugely affect your performance and design.
4) The 44-byte header was a WAV file header. It describes the type of data (sample rate, mono/stereo, bits per sample). You must know this information to be able to play the audio. If you know the PCM type, you could insert the header yourself and use your decoder as you did before. Otherwise, you need to construct an audio player manually.
Streaming audio over networks is not a trivial task.
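As a rough illustration of point 4, here is a minimal Python sketch of what a canonical 44-byte PCM WAV header contains. The function name wav_header is hypothetical, and the 16-bit mono 44100 Hz values in the usage lines are assumptions; the real values have to come from whoever produces the stream.

```python
import struct


def wav_header(num_data_bytes, sample_rate, channels, bits_per_sample):
    """Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align          # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + num_data_bytes, b"WAVE",     # RIFF chunk: size excludes its first 8 bytes
        b"fmt ", 16,                               # fmt sub-chunk, 16 bytes long for PCM
        1,                                         # audio format 1 = uncompressed PCM
        channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", num_data_bytes,                   # data sub-chunk header
    )


# Assumed format: 16-bit mono at 44100 Hz (replace with the stream's real format).
pcm_chunk = b"\x00\x01" * 100
header = wav_header(len(pcm_chunk), sample_rate=44100, channels=1, bits_per_sample=16)
wav_bytes = header + pcm_chunk
print(len(header))
```

The same byte layout can be written in the browser with a DataView over an ArrayBuffer before the result is passed to decodeAudioData.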

Encode WebCam frames with H.264 on .NET

What I want to do is the following procedure:
Get a frame from the Webcam.
Encode it with an H264 encoder.
Create a packet with that frame with my own "protocol" to send it via UDP.
Receive it and decode it...
It would be a live streaming.
I just need help with the second step.
I'm retrieving camera images with the AForge framework.
I don't want to write frames to files and then decode them; I guess that would be very slow.
I would like to handle encoded frames in memory and then create the packets to be sent.
I need to use an open-source encoder. I already tried x264, following this example:
How does one encode a series of images into H264 using the x264 C API?
but it seems it only works on Linux, or at least that's what I thought after I saw about 50 errors when trying to compile the example with Visual C++ 2010.
I have to make clear that I already did a lot of research (one week of reading) before writing this but couldn't find a (simple) way to do it.
I know there is the RTMP protocol, but the video stream will always be seen by one person at a time, and RTMP is more oriented towards streaming to many people. I also already streamed with an Adobe Flash application I made, but it was too laggy ¬¬.
I would also like some advice on whether it's OK to send frames one by one or whether it would be better to send several of them within each packet.
I hope that someone can at least point me in the right direction.
My English is not good, so apologies for that. :P
PS: it doesn't have to be in .NET; it can be in any language as long as it works on Windows.
Many thanks in advance.
You could try your approach using Microsoft's DirectShow technology. There is an open-source x264 wrapper available for download at Monogram.
If you download the filter, you need to register it with the OS using regsvr32. I would suggest doing some quick testing to find out whether this approach is feasible: use the GraphEdit tool to connect your webcam to the encoder and have a look at the configuration options.
Regarding whether it's OK to send frames one by one or whether it would be better to send several of them within each packet:
This really depends on the required latency: the more frames you package, the less header overhead, but the more latency, since you have to wait for multiple frames to be encoded before you can send them. For live streaming, latency should be kept to a minimum, and the protocols typically used are RTP/UDP. This implies that your maximum packet size is limited to the MTU of the network, which often requires IDR frames to be fragmented and sent in multiple packets.
My advice would be to not worry about sending more frames in one packet until/unless you have a reason to. This is more often necessary with audio streaming, since the header size (e.g. IP + UDP + RTP) is considered big in relation to the audio payload.
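To make the MTU point concrete, here is a small Python sketch of splitting one encoded frame across several datagram-sized packets and putting it back together. The 8-byte header layout (frame id, fragment index, fragment count), the 1400-byte limit, and the names fragment_frame and reassemble are all hypothetical choices for a custom protocol; this is not the RTP fragmentation format.

```python
import struct

# Hypothetical per-packet header: frame_id (uint32), fragment_index (uint16),
# fragment_count (uint16), all in network byte order.
HEADER = struct.Struct("!IHH")


def fragment_frame(frame_id, payload, max_packet_size=1400):
    """Split one encoded frame into packets no larger than max_packet_size."""
    chunk_size = max_packet_size - HEADER.size
    count = max(1, -(-len(payload) // chunk_size))  # ceiling division
    return [
        HEADER.pack(frame_id, index, count)
        + payload[index * chunk_size:(index + 1) * chunk_size]
        for index in range(count)
    ]


def reassemble(packets):
    """Rebuild a frame from its packets (any order); None if a fragment is missing."""
    parts = {}
    expected = None
    for packet in packets:
        _frame_id, index, count = HEADER.unpack_from(packet)
        parts[index] = packet[HEADER.size:]
        expected = count
    if expected is None or len(parts) != expected:
        return None
    return b"".join(parts[index] for index in range(expected))


# A made-up 5000-byte "IDR frame" does not fit in one 1400-byte packet.
frame = bytes(range(256)) * 19 + bytes(136)
packets = fragment_frame(frame_id=1, payload=frame)
print(len(packets), [len(p) for p in packets])
```

Because UDP can drop or reorder datagrams, the receiver in this sketch discards a frame when any fragment is missing instead of waiting for it.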

RTMP parsing with multiple Audio Video Session in the pcap

I have to write an RTMP parser that handles packets captured from an RTMP stream in Wireshark and extracts the data from the pcap.
I have gone through the spec, and I understand the handshake process and can locate the media in the TCP packets. However, I am confused about the case of multiple audio/video sessions interleaved within a single pcap: how can the parser handle them so that it parses multiple streams simultaneously? Any unique identifier would be very helpful for parsing the different RTMP streams simultaneously.
EDIT (after @Martin Redmond's answer): Yes, I was able to figure that out, but it seems that some FLV data is being streamed over RTMP with the FLV header missing. There also seem to be different handshakes, and FLV data is streaming for the same IP on different ports. So I can't tell whether it's a real FLV file or only a header, because if I extract just the header and the other data, I can't build an FLV file from it.
Is there any way to validate or extract the media from that RTMP stream?
The header information for each chunk of data lets you figure out which stream the chunk belongs to. It's not straightforward, though. The header information gets compressed, and the relevant info may have been sent only at the beginning of the stream, so you need to keep a context for each chunk.
The important part is the streamid. Video and audio from the same source will have the same streamid but will have different channel numbers and datatypes.
In the spec, the streamid is referred to as the message stream id (section 6.1.2.1) and is only sent with a type 0 header.
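A minimal Python sketch of the per-chunk context idea follows. It parses only the chunk header, following the layout in the RTMP chunk stream spec (basic header with format and chunk stream id, then an 11/7/3/0-byte message header for types 0 to 3). The function name parse_chunk_header and the dictionary keys are hypothetical, and the sketch assumes it is run separately on each reassembled TCP connection, after the handshake.

```python
MESSAGE_HEADER_SIZES = {0: 11, 1: 7, 2: 3, 3: 0}


def parse_chunk_header(data, offset, contexts):
    """Parse one chunk header at `offset`.

    `contexts` maps chunk stream id -> last known header fields and is updated
    in place. Returns (offset_of_chunk_payload, chunk_stream_id, fmt, context_copy).
    """
    first = data[offset]
    fmt = first >> 6      # header type 0..3
    csid = first & 0x3F   # chunk stream id ("channel")
    offset += 1
    if csid == 0:         # 2-byte basic header form
        csid = data[offset] + 64
        offset += 1
    elif csid == 1:       # 3-byte basic header form
        csid = data[offset] + data[offset + 1] * 256 + 64
        offset += 2

    ctx = contexts.setdefault(csid, {})
    if fmt <= 2:          # types 0-2 carry a timestamp (type 0) or a delta (types 1-2)
        ctx["timestamp_field"] = int.from_bytes(data[offset:offset + 3], "big")
        ctx["extended"] = ctx["timestamp_field"] == 0xFFFFFF
    if fmt <= 1:          # types 0-1 carry message length and message type id
        ctx["length"] = int.from_bytes(data[offset + 3:offset + 6], "big")
        ctx["type_id"] = data[offset + 6]  # 8 = audio, 9 = video
    if fmt == 0:          # only type 0 carries the message stream id (little-endian)
        ctx["stream_id"] = int.from_bytes(data[offset + 7:offset + 11], "little")
    offset += MESSAGE_HEADER_SIZES[fmt]

    if ctx.get("extended"):  # 4-byte extended timestamp follows the message header
        ctx["timestamp_field"] = int.from_bytes(data[offset:offset + 4], "big")
        offset += 4
    return offset, csid, fmt, dict(ctx)


# Usage: a type 0 header on chunk stream 4 (audio, message stream id 1), then a
# type 3 header on the same chunk stream, which inherits everything from the context.
contexts = {}
type0 = (bytes([0x04]) + (0).to_bytes(3, "big") + (10).to_bytes(3, "big")
         + bytes([8]) + (1).to_bytes(4, "little"))
first_result = parse_chunk_header(type0, 0, contexts)
second_result = parse_chunk_header(bytes([0xC4]), 0, contexts)
print(first_result[3]["stream_id"], second_result[3]["stream_id"])
```

A full parser would also need to track the negotiated chunk size and how many payload bytes of each message have been consumed so far, which this sketch leaves out.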
