I'm implementing an RTMP server right now, and everything has been working except for video streaming. I can stream audio with no problems (using OBS to stream) and play it back via VLC. The problem is that VLC plays the audio but no video. What I'm doing right now is forwarding every audio and video message I receive from OBS: I grab the original payload (audio/video data) and put it in a Type 0 chunk, since I've seen pretty much every implementation do this. I don't know if I'm missing some sort of processing that should be done on the video data.
If I try to play it back with ffmpeg (saving the RTMP stream to an FLV file), I get this output:
[NULL @ 000001eb053ed440] missing picture in access unit with size 5209
[AVBSFContext @ 000001eb053ecbc0] No start code is found.
rtmp://192.168.1.2/app/publish: could not find codec parameters
Input #0, flv, from 'rtmp://192.168.1.2/app/publish':
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0: Data: none
Stream #0:1: Video: h264, none, 1k tbn
Output #0, flv, to 'av.flv':
Output file #0 does not contain any stream
It says missing picture in access unit with size 5209, No start code is found, and could not find codec parameters. What am I missing here? I know I'm forwarding the payload exactly as I received it on my server; I even did a hash check on the video payload I'm receiving and the one I'm sending, and it's exactly the same. Any help would be greatly appreciated.
Fixed by following @szatmary's suggestion: resending the sequence headers to every playback client before sending any audio/video messages.
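For anyone hitting the same wall, here is a minimal sketch of what that fix can look like. This is not the original server's code: the function names on_publisher_av_message, on_new_player and send_av_message are hypothetical, and it assumes the RTMP message payload uses the FLV tag-body layout that OBS sends. The point is simply that the AVC/AAC sequence headers appear once at the start of the publish session, so a player that joins later never sees them unless the server caches and replays them.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint8_t *data; size_t size; } Msg;

static Msg g_video_seq_hdr;  /* AVCDecoderConfigurationRecord message */
static Msg g_audio_seq_hdr;  /* AudioSpecificConfig message */

static void cache(Msg *slot, const uint8_t *payload, size_t size) {
    free(slot->data);
    slot->data = malloc(size);
    memcpy(slot->data, payload, size);
    slot->size = size;
}

/* Placeholder for the server's real send path. */
static void send_av_message(void *client, int is_video, const uint8_t *payload, size_t size) {
    printf("replaying cached %s sequence header (%zu bytes) to client %p\n",
           is_video ? "video" : "audio", size, client);
}

/* Call for every audio/video message received from the publisher (OBS). */
void on_publisher_av_message(int is_video, const uint8_t *payload, size_t size) {
    if (size < 2) return;
    if (is_video) {
        /* byte 0 low nibble: codec id (7 = AVC), byte 1: AVCPacketType (0 = sequence header) */
        if ((payload[0] & 0x0F) == 7 && payload[1] == 0)
            cache(&g_video_seq_hdr, payload, size);
    } else {
        /* byte 0 high nibble: sound format (10 = AAC), byte 1: AACPacketType (0 = sequence header) */
        if ((payload[0] >> 4) == 10 && payload[1] == 0)
            cache(&g_audio_seq_hdr, payload, size);
    }
    /* ...then forward the message to already-connected players as before... */
}

/* Call once per new playback client, before forwarding any live messages to it. */
void on_new_player(void *client) {
    if (g_video_seq_hdr.data)
        send_av_message(client, 1, g_video_seq_hdr.data, g_video_seq_hdr.size);
    if (g_audio_seq_hdr.data)
        send_av_message(client, 0, g_audio_seq_hdr.data, g_audio_seq_hdr.size);
}

int main(void) {
    /* Fake AVC sequence-header message: 0x17 = keyframe + AVC, 0x00 = sequence header. */
    uint8_t fake[] = { 0x17, 0x00, 0x00, 0x00, 0x00, 0x01, 0x64 };
    on_publisher_av_message(1, fake, sizeof(fake));
    on_new_player((void *)0x1);
    return 0;
}
```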
Related
Given this stream from an RTSP camera which produces an H264 stream:
Input #0, rtsp, from 'rtsp://admin:admin@192.168.0.15:554':
Metadata:
title : LIVE555 Streaming Media v2017.10.28
comment : LIVE555 Streaming Media v2017.10.28
Duration: N/A, start: 0.881956, bitrate: N/A
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1600x900, 25 fps, 25 tbr, 90k tbn, 50 tbc
I want to run ffmpeg and pipe its output to an HTML5 video element with MSE.
Everything is fine and smooth as long as I run this ffmpeg command (piping is removed!):
$ ffmpeg -i 'rtsp://admin:admin@192.168.0.15:554' -c:v copy -an -movflags frag_keyframe+empty_moov -f mp4
However, it takes a bit of time at the beginning.
I realized that the function avformat_find_stream_info causes about 15-20 seconds of delay on my system. Here are the docs.
Now I have also realized that if I add -probesize 32, avformat_find_stream_info will return almost immediately, but it causes some warnings:
$ ffmpeg -probesize 32 -i 'rtsp://admin:admin@192.168.0.15:554' -c:v copy -an -movflags frag_keyframe+empty_moov -f mp4
[rtsp @ 0x1b2b300] Stream #0: not enough frames to estimate rate; consider increasing probesize
[rtsp @ 0x1b2b300] decoding for stream 0 failed
Input #0, rtsp, from 'rtsp://admin:admin@192.168.0.15:554':
Metadata:
title : LIVE555 Streaming Media v2017.10.28
comment : LIVE555 Streaming Media v2017.10.28
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1600x900, 25 tbr, 90k tbn, 50 tbc
If I dump this stream out into a file (test.mp4), all media players can play it perfectly.
However, if I pipe this output into the HTML5 video element with MSE, the stream is sometimes displayed correctly and sometimes it just isn't. No warnings or error messages are printed in the browser console.
From the second output I can see the fps is missing. I tried to set it manually, but did not succeed (it seemed I could not change it manually).
How can I avoid avformat_find_stream_info and have the HTML5 MSE playback if I know everything of the stream beforehand?
Update
Following @szatmary's comments and answers, I have searched for an H264 bitstream parser.
This is what I found. I also saved the mp4 file, which is not playable by HTML5 video but plays fine in VLC, and dropped it into this analyser.
Here is a screenshot of my analysis:
Some facts here:
Until #66 there is no type 7 (SPS) unit in the stream.
#62 is the last PPS before the first SPS arrives.
There are a lot of PPS units even before #62.
The bitstream ends at #103.
Played in VLC, the stream is 20 seconds long.
I have several things to clarify:
Do the #62 and #66 SPS/PPS units (or whatever they are) hold metadata only for the frames that follow, or can they also refer to previous frames?
VLC plays 20 seconds; is it possible that it scans the whole file first and then plays the frames from #1 based on the #62 and #66 info? If VLC received the file as a stream, it might then play only a few seconds (#66 - #103).
Most important: what shall I do with the bitstream parser to make the HTML5 video element play this data? Shall I drop all the units before #62? Or before #66?
Now I'm really lost in this topic. I have created a video with FFmpeg, but this time I allowed it to finish its avformat_find_stream_info function.
I saved the video with the same method as previously. VLC now plays 18 seconds (this is okay; I have a 1000-frame limit in the ffmpeg command).
However, let's now look at the bitstream information:
Now the PPS and SPS are at #130 and #133 respectively. This resulted in a stream which is 2 seconds shorter than before (I guess).
Now I have learned that even in a correctly parsed H264 stream there can still be a lot of units before the first SPS/PPS.
So I would fine-tune my question above: what shall I do with the bitstream parser to make the HTML5 video element play this data?
Also, the bitstream parser I found is not good, because it uses a binary wrapper, so it cannot be run purely on the client side.
I'm looking at mp4box now.
How can I avoid avformat_find_stream_info and have the HTML5 MSE playback if I know everything of the stream beforehand?
You don't know everything about the stream beforehand. You don't know the resolution, the bitrate, the level, the profile, or the constraint flags. You don't know the scaling list values, the VUI data, or whether CABAC is used.
The player needs all of these things to play the video, and they are not known until the player, or ffmpeg, sees the first SPS/PPS in the stream. By limiting the analyze duration you are telling ffmpeg to give up looking for them, so it can't be guaranteed to produce a valid stream. It may work sometimes and not other times, and it largely depends on which frame in the RTSP stream you start on.
A possible solution would be to add more keyframes to the source video if you can; this will send the SPS/PPS more frequently. If you don't control the source stream, you must simply wait until an SPS/PPS shows up in the stream.
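If you want to detect that moment yourself rather than rely on ffmpeg's probing, a minimal sketch like the following reports where the first SPS and PPS appear. It assumes you are looking at the raw Annex B H.264 elementary stream (not the fragmented MP4 output); everything before the first SPS/PPS plus the next IDR frame is undecodable for a client that joins there.

```c
/* Scan an Annex B H.264 elementary stream and report the offset of the
 * first SPS (NAL type 7) and PPS (NAL type 8). */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s stream.h264\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    uint8_t *buf = malloc(size);
    if (fread(buf, 1, size, f) != (size_t)size) { perror("fread"); return 1; }
    fclose(f);

    long first_sps = -1, first_pps = -1;
    for (long i = 0; i + 3 < size; i++) {
        /* 3-byte start code 00 00 01 (a 4-byte start code just has an extra leading 00) */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            int nal_type = buf[i + 3] & 0x1F;
            if (nal_type == 7 && first_sps < 0) first_sps = i;
            if (nal_type == 8 && first_pps < 0) first_pps = i;
        }
    }
    printf("first SPS at byte %ld, first PPS at byte %ld\n", first_sps, first_pps);
    free(buf);
    return 0;
}
```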
Edit: I found the cause. The stream always begins with something which is not a JPEG; only after that is there a normal MJPEG stream. Interestingly, not all of the small examples of using V4L2/MJPEG decoders can properly divide what the camera produces into frames. Something called capturev4l2.c is a rare example of doing it properly. Possibly there is some detail which decides whether the camera's bugginess is worked around or not.
I have a noname almost-UVC-compliant camera (it fails several compatibility tests). This is a relatively cheap global shutter camera, and thus I would like to use it instead of something properly documented. It outputs what is reported (and properly played) by mplayer as
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
libavcodec version 57.107.100 (external)
Selected video codec: [ffmjpeg] vfm: ffmpeg (FFmpeg MJPEG)
ffprobe shows the following:
[mjpeg @ 0x55c086dcc080] Format mjpeg detected only with low score of 25, misdetection possible!
Input #0, mjpeg, from '/home/sc/Desktop/a.raw':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: mjpeg, yuvj422p(pc, bt470bg/unknown/unknown), 640x480, 25 tbr, 1200k tbn, 25 tbc
But as opposed to mplayer, it is unable to play it.
I tried decode_jpeg_raw from mjpegtools; it complains about the header, which seems to change with each captured stream. So it does not look like an unwrapped stream of JPEG images.
I thus tried 0_hello_world.c from libavcodec/libavformat, but it stops at avformat_open_input() with the error Invalid data found when processing input. A 100-frame sample file is sitting here: a.raw. Do you have any idea how to determine a method of decoding it in C into a plain bitmap?
The file is grayscale and does not begin with a constant value; guvcview and mplayer are the only players I know of which can decode it without artifacts...
Since you have a raw stream, I think what you need is a decoder with a parser.
Check this decode_video.c example on ffmpeg:
https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/decode_video.c
Change the necessary parts accordingly, like avcodec_find_decoder(...).
Hope that helps.
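A rough sketch of those changed parts might look like this. It follows the decode_video.c pattern with the codec id switched to MJPEG; the input file name a.raw is taken from the question, and it has not been tested against that particular camera.

```c
/* decode_video.c adapted for a raw MJPEG capture: the parser splits the byte
 * stream into individual JPEG frames, then the MJPEG decoder produces AVFrames. */
#include <libavcodec/avcodec.h>
#include <stdio.h>

#define INBUF_SIZE 4096

int main(int argc, char **argv) {
    const char *filename = argc > 1 ? argv[1] : "a.raw";
    const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_MJPEG);
    AVCodecParserContext *parser = av_parser_init(codec->id);
    AVCodecContext *ctx = avcodec_alloc_context3(codec);
    AVPacket *pkt = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();
    uint8_t inbuf[INBUF_SIZE + AV_INPUT_BUFFER_PADDING_SIZE] = {0};
    int nframes = 0;

    if (avcodec_open2(ctx, codec, NULL) < 0) { fprintf(stderr, "open failed\n"); return 1; }
    FILE *f = fopen(filename, "rb");
    if (!f) { perror(filename); return 1; }

    for (;;) {
        size_t n = fread(inbuf, 1, INBUF_SIZE, f);
        if (n == 0) break;
        uint8_t *data = inbuf;
        while (n > 0) {
            int used = av_parser_parse2(parser, ctx, &pkt->data, &pkt->size,
                                        data, n, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
            if (used < 0) { fprintf(stderr, "parse error\n"); return 1; }
            data += used;
            n -= used;
            if (pkt->size == 0) continue;
            /* One complete JPEG frame: decode it. */
            if (avcodec_send_packet(ctx, pkt) == 0)
                while (avcodec_receive_frame(ctx, frame) == 0)
                    printf("frame %d: %dx%d\n", ++nframes, frame->width, frame->height);
        }
    }
    fclose(f);
    printf("%d frames decoded\n", nframes);
    return 0;
}
```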
I am trying to receive H264 frames from a USB web camera connected to my Raspberry Pi.
Using the RPi Camera Module, I can run the following command to get H264 data output on stdout with close to zero latency: raspivid -t 0 -w 640 -h 320 -fps 15 -o -
Is there an equivalent function to do this with a USB camera? I have two USB cameras I would like to do this with.
Using ffprobe /dev/videoX I get the following output: (shorted down to the important details):
$ ffprobe /dev/video0
...
Input #0, video4linux2,v4l2, from '/dev/video0':
Duration: N/A, start: 18876.273861, bitrate: 147456 kb/s
Stream #0:0: Video: rawvideo (YUY2 / 0x32595559), yuyv422, 1280x720, 147456 kb/s, 10 fps, 10 tbr, 1000k tbn, 1000k tbc
$ ffprobe /dev/video1
...
Input #0, video4linux2,v4l2, from '/dev/video1':
Duration: N/A, start: 18980.783228, bitrate: 115200 kb/s
Stream #0:0: Video: rawvideo (YUY2 / 0x32595559), yuyv422, 800x600, 115200 kb/s, 15 fps, 15 tbr, 1000k tbn, 1000k tbc
$ ffprobe /dev/video2
...
Input #0, video4linux2,v4l2, from '/dev/video2':
Duration: N/A, start: 18998.984143, bitrate: N/A
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, -5 kb/s, 30 fps, 30 tbr, 1000k tbn, 2000k tbc
As far as I can tell, two of them are not H264 and would need to be encoded to H264, so I understand a bit of latency is added there. But the third one (video2) is H264, so I should be able to get data from it? I've tried to just pipe it out with cat, but it says I got invalid arguments.
I've come to the conclusion that using FFmpeg might be the only option here. I would like to use software easily available for all RPis (apt install).
Bonus question regarding H264 packets: When I stream the data from the raspivid command to my decoder, it works perfectly. But if I decide to drop the first 10 packets, then it never initializes the decoding process and just shows a black background. Does anyone know what might be missing in the first packets that I might be able to recreate in my software, so I don't have to restart the stream for every newly connected user?
EDIT: Bonus Question Answer: After googling around I see that the first two NAL units raspivid sends me are the SPS and PPS (below). So by ignoring the first two units my decoder won't "decode" properly. So if I save those units and send them first to all new users, it works perfectly. It seems these are used in some kind of initialization process.
0x27 = 0 01 00111 → nal_unit_type 7: sequence parameter set (SPS)
0x28 = 0 01 01000 → nal_unit_type 8: picture parameter set (PPS)
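In other words, the type is just the low five bits of the byte that follows the start code. A tiny sketch of that check (the cache-and-resend logic around it is up to your software):

```c
/* Classify a NAL unit from the raspivid byte stream by the low 5 bits of the
 * byte following the 00 00 00 01 start code: 0x27 & 0x1F == 7 (SPS) and
 * 0x28 & 0x1F == 8 (PPS). These are the units worth caching and sending first
 * to every newly connected user. */
#include <stdint.h>
#include <stdio.h>

static int is_parameter_set(const uint8_t *nal_unit /* first byte after start code */) {
    int type = nal_unit[0] & 0x1F;   /* nal_unit_type */
    return type == 7 || type == 8;   /* 7 = SPS, 8 = PPS */
}

int main(void) {
    uint8_t sps_hdr = 0x27, pps_hdr = 0x28, idr_hdr = 0x65;
    printf("0x27 -> %d, 0x28 -> %d, 0x65 -> %d\n",
           is_parameter_set(&sps_hdr), is_parameter_set(&pps_hdr), is_parameter_set(&idr_hdr));
    return 0;
}
```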
First, let us get the data flow right. For the Raspi cam:
The Raspi camera is connected by CSI (Camera Serial Interface) to the Raspi. This link carries uncompressed, raw image data.
raspivid talks to the embedded GPU of the Raspi to access the image data and also asks the GPU to perform H.264 encoding, which always adds some latency (you could use raspiyuv to get the raw uncompressed data, possibly with less latency).
USB webcams typically transfer uncompressed, raw image data. But some also transfer H.264 or jpeg encoded data.
Next, the Video for Linux API version 2 was not made for shell pipes, so you can't get data out of a /dev/videoX with cat. You need some code to perform IOCTL calls to negotiate what and how to read data from the device. ffmpeg does exactly that.
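As a rough illustration of that negotiation, here is a minimal sketch (not what ffmpeg literally does internally). It assumes the H.264 device is /dev/video2, as in the ffprobe output above; a real capture loop would follow with VIDIOC_REQBUFS / VIDIOC_STREAMON and mmap'ed buffers, which is what ffmpeg's v4l2 input device implements.

```c
/* V4L2 ioctl negotiation that cat cannot do: open the device, query its
 * capabilities and ask for the H.264 pixel format. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>

int main(void) {
    int fd = open("/dev/video2", O_RDWR);
    if (fd < 0) { perror("open /dev/video2"); return 1; }

    struct v4l2_capability cap;
    if (ioctl(fd, VIDIOC_QUERYCAP, &cap) < 0) { perror("VIDIOC_QUERYCAP"); return 1; }
    printf("driver: %s, card: %s\n", (const char *)cap.driver, (const char *)cap.card);

    struct v4l2_format fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 1920;
    fmt.fmt.pix.height = 1080;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_H264;  /* ask for compressed H.264 */
    fmt.fmt.pix.field = V4L2_FIELD_ANY;
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) { perror("VIDIOC_S_FMT"); return 1; }
    printf("negotiated %ux%u, fourcc 0x%08x\n",
           fmt.fmt.pix.width, fmt.fmt.pix.height, fmt.fmt.pix.pixelformat);

    close(fd);
    return 0;
}
```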
Regarding your bonus question, you might try the --inline option of raspivid, which forces the stream to include PPS and SPS headers on every I-frame.
Next, when outputting H.264 data from ffmpeg, using -f rawvideo looks wrong to me, since rawvideo means uncompressed video. You could instead try -f h264 to force a raw H.264 video output format:
ffmpeg -i /dev/video2 -c copy -f h264 pipe:1
Finally, you actually want to get an H.264 stream from your USB webcam. Since the image data comes uncompressed from the camera, it first has to be encoded to H.264. The sensible option on the Raspi is to use the hardware encoder, since using a software encoder like x264 would consume too much CPU.
If you have an ffmpeg that was configured using --enable-mmal and/or --enable-omx-rpi, you can use ffmpeg to talk to the hardware H.264 encoder.
Otherwise, take a look at gstreamer and its omxh264enc element, e.g. here. gstreamer can also talk to v4l2 devices.
Can anyone tell me where metadata is stored in common video file formats, and whether it would be located towards the start of the file or scattered throughout?
I'm working with a remote object store containing a lot of video files and I want to extract metadata, in particular video duration and video dimensions from those files, without streaming the entire file contents to the local machine.
I'm hoping that this metadata will be stored in the first X bytes of files, and so I can just fetch a byte range starting at the beginning instead of the whole file, passing this partial file data to ffprobe.
For testing purposes I created a 22MB MP4 file, and used the following command to supply only the first 1MB of data to ffprobe:
head -c1024K '2013-07-04 12.20.07.mp4' | ffprobe -
It prints:
avprobe version 0.8.6-4:0.8.6-0ubuntu0.12.04.1, Copyright (c) 2007-2013 the Libav developers
built on Apr 2 2013 17:02:36 with gcc 4.6.3
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1a6b7a0] stream 0, offset 0x10beab: partial file
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pipe:':
Metadata:
major_brand : isom
minor_version : 0
compatible_brands: isom3gp4
creation_time : 1947-07-04 11:20:07
Duration: 00:00:09.84, start: 0.000000, bitrate: N/A
Stream #0.0(eng): Video: h264 (High), yuv420p, 1920x1080, 20028 kb/s, PAR 65536:65536 DAR 16:9, 29.99 fps, 30 tbr, 90k tbn, 180k tbc
Metadata:
creation_time : 1947-07-04 11:20:07
Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 189 kb/s
Metadata:
creation_time : 1947-07-04 11:20:07
So I see the first 1MB was enough to extract video duration 9.84 seconds and video dimensions 1920x1080, even though ffprobe printed the warning about detecting a partial file. If I supply less than 1MB, it fails completely.
Would this approach work for other common video file formats to reliably extract metadata, or do any common formats scatter metadata throughout the file?
I'm aware of the concept of container formats and that various codecs may be used to represent the audio/video data inside those containers. I'm not familiar with the details though. So I guess the question may apply to common combinations of containers + codecs? Thanks in advance.
Okay, to answer my own question after a lot of digging through the specs for MP4, 3GP and AVI...
AVI
Metadata is at the start of AVI files, according to the AVI file format specification.
Video duration is not stored verbatim in AVI files, but is calculated (in microseconds) as dwMicroSecPerFrame x dwTotalFrames.
Reading between the lines of the spec, it seems that many items of metadata can be read directly from offsets within AVI files without parsing at all. But the spec does not mention these offsets explicitly so using this rule of thumb could be risky.
Offset 32: dwMicroSecPerFrame, offset 48: dwTotalFrames, offset 64: dwWidth, offset 68: dwHeight.
So for AVI, it is possible to extract this metadata with only the first X bytes of the file.
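As a sanity check of those offsets, here is a small sketch that reads them directly, with the caveat from above that this layout is a rule of thumb rather than something the spec guarantees. Only the first ~72 bytes of the file are needed.

```c
/* Read the rule-of-thumb offsets from the start of an AVI file:
 * dwMicroSecPerFrame, dwTotalFrames, dwWidth, dwHeight. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t le32(const uint8_t *p) {  /* AVI fields are little-endian */
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s file.avi\n", argv[0]); return 1; }
    uint8_t hdr[72];
    FILE *f = fopen(argv[1], "rb");
    if (!f || fread(hdr, 1, sizeof(hdr), f) != sizeof(hdr)) { perror("read"); return 1; }
    fclose(f);
    if (memcmp(hdr, "RIFF", 4) || memcmp(hdr + 8, "AVI ", 4)) {
        fprintf(stderr, "not an AVI file\n");
        return 1;
    }
    uint32_t usec_per_frame = le32(hdr + 32); /* dwMicroSecPerFrame */
    uint32_t total_frames   = le32(hdr + 48); /* dwTotalFrames */
    uint32_t width          = le32(hdr + 64); /* dwWidth */
    uint32_t height         = le32(hdr + 68); /* dwHeight */
    double duration_sec = (double)usec_per_frame * total_frames / 1e6;
    printf("%ux%u, duration %.2f s\n", width, height, duration_sec);
    return 0;
}
```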
MP4, 3GP (3GPP), 3G2 (3GPP2)
All of these file formats are based on the ISO base media file format known as ISO/IEC 14496-12 (MPEG-4 Part 12).
This format allows metadata to be stored anywhere in the file, but in practice it will be either at the start or the end, because the raw captured audio/video data is saved contiguously in the middle. (An exception, however, would be "fragmented" MP4 files, which are rare.)
Only files with the metadata stored at the start can be played via progressive download, but it is up to the capture device or decoder to support this.
AFAICT this means that to extract metadata from these files, only the first X bytes of the file would be required, and from that information it could be determined whether the last X bytes would potentially also be required. Bytes in the middle would not be required.
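One way to check which case a given file falls into is to walk the top-level box headers. A sketch: it only assumes the ISO base media box layout (a 4-byte big-endian size followed by a 4-byte type, with size == 1 meaning a 64-bit size follows). If 'moov' appears before 'mdat', the metadata is at the front and a prefix of the file is enough for tools like ffprobe.

```c
/* Walk the top-level boxes of an ISO base media file (MP4/3GP/3G2) and print
 * their types and offsets. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint64_t be(const uint8_t *p, int n) {  /* box sizes are big-endian */
    uint64_t v = 0;
    for (int i = 0; i < n; i++) v = v << 8 | p[i];
    return v;
}

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s file.mp4\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }
    uint64_t offset = 0;
    uint8_t hdr[16];
    while (fread(hdr, 1, 8, f) == 8) {
        uint64_t size = be(hdr, 4);
        char type[5] = {0};
        memcpy(type, hdr + 4, 4);
        if (size == 1) {                       /* 64-bit "largesize" follows */
            if (fread(hdr + 8, 1, 8, f) != 8) break;
            size = be(hdr + 8, 8);
        } else if (size == 0) {                /* box extends to end of file */
            printf("%-4s at offset %llu (to end of file)\n", type, (unsigned long long)offset);
            break;
        }
        if (size < 8) { fprintf(stderr, "malformed box\n"); break; }
        printf("%-4s at offset %llu, size %llu\n", type,
               (unsigned long long)offset, (unsigned long long)size);
        offset += size;
        if (fseek(f, (long)offset, SEEK_SET) != 0) break;  /* skip box payload */
    }
    fclose(f);
    return 0;
}
```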
I created a simple DirectShow source filter using FFmpeg. I read RTP packets from an RTSP source and give them to the decoder. It works for an H264 stream.
MyRtspSourceFilter[H264 Stream] ---> h264 Decoder --> Video Renderer
The bad news is that it does not work for MPEG-4. I am able to connect my RTSP source filter to the MPEG-4 decoder. I get no exception, but the video renderer does not show anything. Actually, it just shows one frame and then nothing [it just stops]... The decoders and renderers are third-party, so I cannot debug them.
MyRtspSourceFilter[MP4 Stream] ---> MPEG-4 Decoder --> Video Renderer
I am able to get RTP packets from the MPEG-4 RTSP source using FFmpeg successfully. There is no problem with that.
It seems that I have not set something(?) in my RTSP source filter which is not necessary for an H264 stream but may be important for an MPEG-4 stream.
What may cause this difference between the H264 stream and the MPEG-4 stream in a DirectShow RTSP source filter? Any ideas?
More Info:
-- First, I tried some other RTSP source filters for the MPEG-4 stream... Although my RTSP source is the same, I see different subtypes in their pin connections.
-- Secondly, I got really suspicious about whether the source is really MPEG-4, so I checked with FFmpeg... FFmpeg gives the source codec id as "CODEC_ID_MPEG4".
Update:
[ Hack ]
I just set m_bmpInfo.biCompression = DWORD('xvid') and it worked fine... But it is static. How can I dynamically get/determine this value using ffmpeg or some other way?
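One hedged way to do that with libavformat is sketched below (not tested against this filter graph): read the container's codec_tag after opening the RTSP URL, and fall back to a hand-picked FOURCC when the tag is zero, which is common for RTSP. Current API names are used; very old builds spell these CODEC_ID_* and use AVStream->codec instead of codecpar.

```c
/* Derive the biCompression FOURCC from what libavformat reports instead of
 * hard-coding DWORD('xvid'). */
#include <libavformat/avformat.h>
#include <stdio.h>

int main(int argc, char **argv) {
    const char *url = argc > 1 ? argv[1] : "rtsp://example.com/stream";  /* placeholder URL */
    AVFormatContext *fmt = NULL;
    if (avformat_open_input(&fmt, url, NULL, NULL) < 0) { fprintf(stderr, "open failed\n"); return 1; }
    if (avformat_find_stream_info(fmt, NULL) < 0) { fprintf(stderr, "no stream info\n"); return 1; }

    int vidx = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
    if (vidx < 0) { fprintf(stderr, "no video stream\n"); return 1; }
    AVCodecParameters *par = fmt->streams[vidx]->codecpar;

    uint32_t fourcc = par->codec_tag;  /* container FOURCC, often 0 for RTSP */
    if (fourcc == 0) {
        /* No tag from the demuxer: pick one by codec id (an assumption about
         * what the downstream DirectShow decoder accepts). */
        if (par->codec_id == AV_CODEC_ID_MPEG4) fourcc = MKTAG('x', 'v', 'i', 'd');
        else if (par->codec_id == AV_CODEC_ID_H264) fourcc = MKTAG('H', '2', '6', '4');
    }
    printf("codec id %d, biCompression fourcc: %c%c%c%c\n", par->codec_id,
           fourcc & 0xFF, (fourcc >> 8) & 0xFF, (fourcc >> 16) & 0xFF, (fourcc >> 24) & 0xFF);

    avformat_close_input(&fmt);
    return 0;
}
```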
I am on the RTSP server side, a different use case with the required per-frame conversions:
MP4 file ---> MPEG-4 Decoder --> H264 Encoder --> RTSP Stream
I will deploy libav, which is the core of ffmpeg.
EDIT:
With an H264-encoded video layer, the video just needs to be remuxed from the length-prefixed file format ("AVCC") to the byte-stream format described in Annex B of the H.264 / MPEG-4 Part 10 specification. libav provides the required bitstream filter, "h264_mp4toannexb":
MP4 file ---> h264_mp4toannexb_bsf --> RTSP Stream
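A sketch of driving that filter with the modern AVBSFContext API (older builds declare av_bsf_* in avcodec.h rather than bsf.h): packets demuxed from the MP4 (AVCC, length-prefixed) come out as Annex B packets with start codes and in-band SPS/PPS, ready for an RTSP packetizer.

```c
#include <libavcodec/bsf.h>
#include <libavformat/avformat.h>
#include <stdio.h>

int main(int argc, char **argv) {
    const char *in = argc > 1 ? argv[1] : "input.mp4";  /* placeholder file name */
    AVFormatContext *fmt = NULL;
    if (avformat_open_input(&fmt, in, NULL, NULL) < 0) return 1;
    if (avformat_find_stream_info(fmt, NULL) < 0) return 1;
    int vidx = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
    if (vidx < 0) return 1;

    const AVBitStreamFilter *filter = av_bsf_get_by_name("h264_mp4toannexb");
    AVBSFContext *bsf = NULL;
    if (!filter || av_bsf_alloc(filter, &bsf) < 0) return 1;
    avcodec_parameters_copy(bsf->par_in, fmt->streams[vidx]->codecpar);
    bsf->time_base_in = fmt->streams[vidx]->time_base;
    if (av_bsf_init(bsf) < 0) return 1;

    AVPacket *pkt = av_packet_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {
        if (pkt->stream_index == vidx && av_bsf_send_packet(bsf, pkt) == 0) {
            while (av_bsf_receive_packet(bsf, pkt) == 0) {
                /* pkt->data now starts with 00 00 00 01: hand it to the RTSP packetizer. */
                printf("Annex B packet, %d bytes\n", pkt->size);
                av_packet_unref(pkt);
            }
        } else {
            av_packet_unref(pkt);
        }
    }
    av_bsf_free(&bsf);
    av_packet_free(&pkt);
    avformat_close_input(&fmt);
    return 0;
}
```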
Now, for decoding RTSP:
Video and Audio come in separate channels. Parsing and decoding the H264 stream is done here: my basic h264 decoder using libav
Audio is a different thing:
The RTP transport suggests that AAC frames are encapsulated in ADTS, whereas RTSP players like VLC expect plain AAC; accordingly, available RTSP server implementations (AACSource::HandleFrame()) pinch the ADTS header off.
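Stripping that header is cheap, because ADTS only prepends 7 bytes per frame (9 when the optional CRC is present). A sketch:

```c
/* Strip the ADTS header from an AAC frame before handing it to a client that
 * expects plain (raw) AAC. The header is 7 bytes, or 9 when a CRC is present
 * (protection_absent bit == 0). */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Returns a pointer to the raw AAC payload and stores its size in *out_size,
 * or NULL if the buffer does not start with an ADTS syncword. */
static const uint8_t *strip_adts(const uint8_t *buf, size_t size, size_t *out_size) {
    if (size < 7 || buf[0] != 0xFF || (buf[1] & 0xF0) != 0xF0)
        return NULL;                              /* no 12-bit ADTS syncword */
    size_t header_len = (buf[1] & 0x01) ? 7 : 9;  /* protection_absent bit */
    if (size < header_len) return NULL;
    *out_size = size - header_len;
    return buf + header_len;
}

int main(void) {
    /* Hypothetical 7-byte ADTS header followed by 4 payload bytes. */
    uint8_t frame[] = { 0xFF, 0xF1, 0x50, 0x80, 0x01, 0x7F, 0xFC, 0xDE, 0xAD, 0xBE, 0xEF };
    size_t payload_size = 0;
    const uint8_t *payload = strip_adts(frame, sizeof(frame), &payload_size);
    printf("payload size: %zu\n", payload ? payload_size : 0);
    return 0;
}
```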
Another different thing is "time stamps and RTP":
VLC does not support compensation of time offsets between audio and video. Nearly every RTSP producer or consumer has constraints or undocumented assumptions about a time offset; you might consider an additional delay pipe to compensate for the offset of an RTSP source.