Synchronize RTSP with computer time - ffmpeg

I am successfully using libav to receive the video stream from an RTSP network source. The point is that I need to synchronize my computer's time with the video capture, meaning that I need to know which datetime on my computer corresponds to the first frame (pts = 0). My API calls are the following:
av_register_all()
avcodec_register_all()
avformat_network_init()
avformat_open_input()
avformat_find_stream_info()
av_read_play()
loop
    av_init_packet()
    av_read_frame()
    [...]
    av_free_packet()
end loop
With the calls above I successfully read frames, but I need to know the exact absolute datetime that corresponds to the first frame, since that frame has a pts of 0. Maybe I can call time() or GetSystemTime (I am using Windows) between two of the calls above, but I do not really know how libav works internally.
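A minimal sketch of one approach, assuming FFmpeg's libavformat: AVFormatContext has a start_time_realtime field (microseconds since the Unix epoch) that the RTSP/RTP layer can fill from RTCP sender reports when the camera provides them; if it is not set, you can fall back to sampling the system clock right after av_read_frame() returns the first packet. The helper name below is made up, and the fallback ignores network and buffering latency, so treat the result as an approximation.

#include <libavformat/avformat.h>
#include <libavutil/mathematics.h>
#include <libavutil/time.h>

/* Estimate the absolute time (microseconds since the Unix epoch) that
 * corresponds to pts == 0 of the stream the first packet belongs to.
 * Call this right after av_read_frame() returns the very first packet. */
static int64_t estimate_stream_epoch(AVFormatContext *fmt_ctx, const AVPacket *first_pkt)
{
    /* Filled from RTCP sender reports when available; depending on the
     * FFmpeg version an unknown value may be AV_NOPTS_VALUE or 0. */
    if (fmt_ctx->start_time_realtime != AV_NOPTS_VALUE &&
        fmt_ctx->start_time_realtime != 0)
        return fmt_ctx->start_time_realtime;

    /* Fallback: sample the wall clock now and subtract the first packet's
     * pts converted from the stream time base to microseconds. */
    AVStream *st    = fmt_ctx->streams[first_pkt->stream_index];
    int64_t  now_us = av_gettime();
    int64_t  pts_us = av_rescale_q(first_pkt->pts, st->time_base, AV_TIME_BASE_Q);
    return now_us - pts_us;
}

On Windows, a GetSystemTime-based timestamp would serve the same purpose as av_gettime() here, as long as you convert it to the same unit.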

Related

How do I buffer and capture an RTSP stream to disk based on a trigger?

I think what I'm asking about is similar to this ffmpeg post about how to capture a lightning strike (https://trac.ffmpeg.org/wiki/Capture/Lightning).
I have a Raspberry Pi with an IP cam over RTSP, and what I'm wondering is how to maintain a continual 5 second live video buffer, until I trigger a "save" command which will pipe that 5 second buffer to disk, and continue streaming the live video to disk until I turn it off.
Essentially, Pi boots up, this magic black box process starts and is saving live video into a fixed-size, 5-second buffer, and then let's say an hour later - I click a button, and it flushes that 5-second buffer to a file on disk and continues to pipe the video to disk, until I click cancel.
In my environment, I'm able to use ffmpeg, gstreamer, or openRTSP. For each of these, I can connect to my RTSP stream and save it to disk, but I'm not sure how to create this ever-present 5 second cache.
I feel like the gstreamer docs are alluding to it here (https://gstreamer.freedesktop.org/documentation/application-development/advanced/buffering.html?gi-language=c), but I guess I'm just not grokking how the buffering fits in with a triggered save. From that article, I get the impression that the end-time of the video is known in advance (I could artificially limit mine, I guess).
I'm not in a great position to post-process the file, so using something like openRTSP, saving a whole bunch of video segments, and then merging them isn't really an option.
Note: After a successful save, I wouldn't need to save another video for a minute or so, so that 5 second cache has plenty of time to fill back up before the next trigger.
This is the closest similar question that I've found: https://video.stackexchange.com/questions/18514/ffmpeg-buffered-recording
Hey,
I don't know if you have any knowledge of Python, but there is a library called PyAV that is a fancy Python wrapper/interface for FFmpeg.
With it you can just read your frames from an RTSP source and handle those frames however you want.
Here is just an idea/hack implementation of what you describe; you still need to design your frame buffer. When you know that you get 25 FPS from your camera, you can restrict the queue size to 125 (25 fps * 5 s).
import av
import time
import queue
from threading import Thread, Event


class LightningRecorder(Thread):
    def __init__(self, source: str = ""):
        Thread.__init__(self)
        self.source = source
        self.av_instance = None
        self.connected = False
        # ~25 fps * 5 s = 125 frames of rolling buffer (adjust to your camera)
        self.frame_buffer = queue.Queue(maxsize=125)
        self.record_event = Event()
        self.saved_frames = 0

    def open_rtsp_stream(self):
        try:
            self.av_instance = av.open(self.source, 'r')
            self.connected = True
            print("Connected")
        except av.error.HTTPUnauthorizedError:
            print("HTTPUnauthorizedError")
        except Exception as error:
            # Catch other PyAV errors if you want, just for example
            print(error)

    def run(self):
        self.open_rtsp_stream()
        while True:
            if self.connected:
                for packet in self.av_instance.demux():
                    if packet.stream.type != 'video':
                        continue
                    for frame in packet.decode():
                        # Keep roughly 5 seconds of video: drop the oldest
                        # frame once the queue is full.
                        if self.frame_buffer.full():
                            self.frame_buffer.get()
                        print("Add frame to framebuffer", frame)
                        self.frame_buffer.put(frame)
                        if self.record_event.is_set():
                            # Flush the buffered frames to disk as JPEGs.
                            while not self.frame_buffer.empty():
                                buffered = self.frame_buffer.get()
                                buffered.to_image().save('frame-%04d.jpg' % self.saved_frames)
                                self.saved_frames += 1
            else:
                time.sleep(10)
                self.open_rtsp_stream()  # retry the connection


LightningRecorder(source='rtsp://root:pass#192.168.1.197/axis-media/media.amp').start()
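With this sketch, the "save" trigger from the question maps to calling record_event.set() on the running recorder (and record_event.clear() to stop writing), while sizing the queue to roughly fps * 5 frames keeps the in-memory buffer at about five seconds.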
iSpy/AgentDVR can do exactly what you want (https://www.ispyconnect.com/userguide-recording.aspx):
Buffer: This is the number of seconds of video to buffer in memory.
This feature enables iSpy to capture the full event that causes the
motion detection event.
Edit:
iSpy runs only on Windows, unlike AgentDVR, which also has versions for Linux/OSX/RPi.

FFmpeg: why avcodec_receive_frame needs additional packets to get an I frame

I'm new to FFmpeg. While learning it with the nice repo (https://github.com/leandromoreira/ffmpeg-libav-tutorial), in the hello_world example I found that avcodec_receive_frame doesn't return the first I frame until it gets the third packet, as the following screenshot shows:
I'm wondering why additional packets are needed to receive an I frame.
Most modern video codecs use I/P/B frames, which is why there are separate decoding time stamps (DTS) and presentation time stamps (PTS). So, what hello_world does with FFmpeg's libraries is the following:
Demuxing (av_read_frame)
Demuxes packets based on the file format (mp4/avi/mkv etc.) until you have a packet for the stream that you want (e.g. video) - for H.264 you could loosely think of NAL units here, though I'm not sure that's exact.
Feeds the decoder with the packet (avcodec_send_packet)
Starts the decoding process; the decoder consumes input in DTS order and may need several packets before it can hand you the first frame.
Checks whether a frame is ready to be presented (avcodec_receive_frame)
Asks the decoder whether it has a frame ready after being fed. It might not be ready yet, so you need to feed it again, or it might even give you more than one frame at once (frames come out in PTS order). See the sketch below.
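A minimal sketch of that send/receive loop (the helper name is made up): for the first packet or two, avcodec_receive_frame() typically returns AVERROR(EAGAIN), which simply means the decoder needs more input before it can emit the first frame in presentation order.

#include <inttypes.h>
#include <stdio.h>
#include <libavcodec/avcodec.h>

/* Feed one demuxed packet to the decoder and drain whatever frames are ready.
 * With B-frames / decoder delay, the first packet(s) usually produce no frame
 * yet: avcodec_receive_frame() answers AVERROR(EAGAIN), meaning "send more". */
static int decode_packet(AVCodecContext *dec_ctx, const AVPacket *pkt, AVFrame *frame)
{
    int ret = avcodec_send_packet(dec_ctx, pkt);     /* packets arrive in DTS order */
    if (ret < 0)
        return ret;

    while (ret >= 0) {
        ret = avcodec_receive_frame(dec_ctx, frame); /* frames come out in PTS order */
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            return 0;      /* decoder needs more input (or is fully drained) */
        if (ret < 0)
            return ret;    /* a real decoding error */

        printf("decoded frame pts=%" PRId64 "\n", frame->pts);
        av_frame_unref(frame);
    }
    return 0;
}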

Onvif playback stream cannot seek

I'm trying to obtain playback video streams from some Axis and Hikvision cameras, using Onvif.
I'm doing this in a C# application, and the resulted stream is played in VLC.
Using the FindRecordings/GetRecordingSearchResult calls and then GetReplayUri I can obtain the playback stream (RTSP/H264), but here I have this problem: this behaves like a live stream - I can only use play and pause. I cannot use the time cursor to seek, cannot play in reverse.
So I find this unusable for a playback application - you have to watch the entire recording (days or hours of recording!) in order to see a specific event in time. And once you play it, you cannot go back 1 minute to see it again.
This seems quite stupid to me, so I believe that I'm doing something wrong in my code. Maybe I'm missing some configuration in order to obtain a 'true' playback stream.
My question is: is this playback stream behavior the 'standard' one, and I cannot expect more on this? Or some of you have this working ok (seek, reverse, frame by frame stepping), so I will know it can be done.
Thank you.
Reverse playback is possible, but it is not easy. First, the reverse replay is initiated using the Scale header field with a negative value. As an example:
PLAY rtsp://192.168.0.1/path/to/recording RTSP/1.0
Cseq: 123
Session: 12345678
Require: onvif-replay
Range: clock=20090615T114900.440Z-
Rate-Control: no
Scale: -1.0
After the stream is initialized, you will get GOPs in reverse order, not just reversed frames. I don't know if VLC supports this way of operating.
Be aware that only devices with the ReversePlayback capability support reverse playback.
Please refer to the streaming specification for further details.
This is not a real solution to the problem above, but maybe it would help others to deal with this situation.
Some cameras with which I worked were continuously recording on the same video file (so the time range was not known) and they were reporting (via RTSP) the available time interval like this:
range:npt=0-
Due to this, VLC was not displaying any time interval in the time slider, so it was not allowing seeking. In my case, it was a requirement to use VLC, so I had to find a workaround for the problem.
The workaround was a module acting as a proxy, sitting between VLC and the RTSP source (camera). All RTSP traffic between VLC and the camera went through this module, which I controlled, so I could change the camera's responses into a form VLC accepted, and that made the seek capability available in VLC.

How to scale and mux audio?

The first problem is with audio rescaling. I'm trying to redo doc/examples/transcode_aac.c so that it also resamples from 41100 to 48000; it contained a warning that it can't do that.
Using doc/examples/resampling_audio.c as a reference, I saw that before calling swr_convert, I need to find the number of audio samples at the output with code like this:
int dst_nb_samples = av_rescale_rnd(input_frame->nb_samples + swr_get_delay(resampler_context, 41100),
                                    48000, 41100, AV_ROUND_UP);
Problem is, when I just set int dst_nb_samples = input_frame->nb_samples (which is 1024), it encodes and plays normally, but when I do that av_rescale_rnd thing (which results in 1196), audio is slowed down and distorted, like there are skips in the audio.
Second problem is with trying to mux webm with opus audio.
When I set AVStream->time_base to 1/48000 and increase AVFrame->pts by 960, the resulting file is shown by the player as much longer than it is: 17 seconds of audio shows as 16m11s, but it plays back normally.
When I increase pts by 20, the duration displays correctly, but I get a lot of [libopus @ 00ffa660] Queue input is backward in time messages during encoding. Same for a pts increment of 30 - still those messages.
Should I try a time_base of 1/1000? WebM always has timecodes in milliseconds, and Opus has a packet size of 20 ms (960 samples at 48000 Hz).
Search for pts += 20; in the file to find that part.
Here is the whole file; all modifications I made are marked with //MINE: http://www.mediafire.com/file/jlgo7x4hiz7bw64/transcode_aac.c
Here is the file I tested it on http://www.mediafire.com/file/zdy0zarlqw3qn6s/480P_600K_71149981_soundonly.mkv
The easiest way to achieve that is by using swr_convert_frame, which takes a frame and resamples it into a completely different one.
You can read more about it here: https://ffmpeg.org/doxygen/3.2/swresample_8h_source.html
dst_nb_samples can be calculated as this:
dst_nb_samples = 48000.0 / audio_stream->codec->sample_rate * inputAudioFrame->nb_samples;
Yours is probably correct too (I didn't check), but this is the one I have used before, and the number you gave checks out. So the real problem is probably somewhere else. Try to supply 960 samples in sync with the video frames; to do this you need to store the audio samples in an additional linear buffer. See if that fixes the problem.
And/or:
Secondly, in my experience the audio pts increases by the number of samples per frame (i.e. 960 per frame at 48000 Hz with 50 fps video: 48000/50), not by milliseconds. If you supply 1196 samples, use pts += 1196 (if you don't use the additional buffer I mentioned above). This is different from the video frame pts. Hope that helps.
You are definitely on the right path. I'll examine the source code if I have time. Anyway, hope that helps.
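To make the timestamp bookkeeping concrete, here is a rough sketch; the variable names (resampler_context, out_frame, next_pts) are assumptions rather than code from the linked file. The idea is to convert with swr_convert and then advance pts by the number of samples actually produced, assuming the output stream's time base is 1/48000:

#include <stdint.h>
#include <libavutil/frame.h>
#include <libswresample/swresample.h>

/* Resample one decoded frame and timestamp the result so that pts advances
 * by output samples (time base assumed to be 1/48000), not by milliseconds. */
static int resample_and_stamp(struct SwrContext *resampler_context,
                              AVFrame *input_frame, AVFrame *out_frame,
                              int64_t *next_pts)
{
    int out_samples = swr_convert(resampler_context,
                                  out_frame->data, out_frame->nb_samples,
                                  (const uint8_t **)input_frame->data,
                                  input_frame->nb_samples);
    if (out_samples < 0)
        return out_samples;

    out_frame->nb_samples = out_samples;
    out_frame->pts        = *next_pts;    /* counted in samples of the 48 kHz output */
    *next_pts            += out_samples;  /* the next frame starts right after this one */
    return out_samples;
}

As for the additional linear buffer: if I remember the example correctly, transcode_aac.c already stores samples in an AVAudioFifo, and reading exactly 960 samples out of that FIFO per encoded frame matches Opus's 20 ms packet size mentioned in the question.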

FFmpeg - Is it difficult to use?

I am trying to use ffmpeg and have been experimenting with it for the last month.
I have not been able to get through. Is it really difficult to use FFmpeg?
My requirement is simple, as below.
Can you please guide me on whether ffmpeg is the suitable tool, or whether I have to implement this on my own (using the available codec libraries)?
I have a webm file (containing VP8 and OPUS frames)
I will read the encoded data and send it to the remote guy
The remote guy will read the encoded data from the socket
The remote guy will write it to a file (can we avoid decoding?)
Then the remote guy should be able to play the file using ffplay or any player.
Now I will take a specific example.
Say I have a file small.webm, containing VP8 and OPUS frames.
I am reading only the audio frames (OPUS) using the av_read_frame API (then I check the stream index and keep only the audio packets).
So now I have the encoded data buffer as packet.data and the encoded data size as packet.size (please correct me if I am wrong).
Here is my first doubt: the audio packet size is not the same every time - why the difference? Sometimes the packet size is as low as 54 bytes and sometimes it is 420 bytes. Does the OPUS frame size vary from time to time?
Next, say I somehow extract a single frame (I really do not know how to extract a single frame) from the packet and send it to the remote guy.
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API. Both of them take an AVPacket as argument. Now I can have an AVPacket and set its data and size members. Then I can call av_write_frame, but that does not work. The reason may be that one should set other members in the packet, like pts, dts, etc., but I do not have such information to set.
Can somebody help me figure out whether FFmpeg is the right choice, or whether I should write custom logic, like parsing an Opus file and getting it frame by frame?
Now the remote guy needs to write the buffer to a file. To write the file we can use the av_interleaved_write_frame or av_write_frame API. Both of them take an AVPacket as argument. Now I can have an AVPacket and set its data and size members. Then I can call av_write_frame, but that does not work. The reason may be that one should set other members in the packet, like pts, dts, etc., but I do not have such information to set.
Yes, you do. That information was in the original packet you received from the demuxer on the sender side. You need to serialize all of that packet's fields and set each value accordingly on the receiver side.
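As a rough sketch of what that means in practice (the helper below is made up, not code from the question): besides data and size, at least pts, dts, duration, flags and the stream index of the original AVPacket have to travel with the payload, and the timestamps must be rescaled to the output stream's time base before writing.

#include <libavformat/avformat.h>

/* Write one received packet into the output muxer. 'in_time_base' is the
 * time base of the stream the packet came from on the sender side; it has
 * to be transmitted (or agreed upon) together with pts/dts/duration/flags. */
static int forward_packet(AVFormatContext *out_ctx, AVPacket *pkt,
                          AVRational in_time_base, int out_stream_index)
{
    pkt->stream_index = out_stream_index;

    /* Convert pts, dts and duration from the sender's time base to the
     * receiver's output stream time base. */
    av_packet_rescale_ts(pkt, in_time_base,
                         out_ctx->streams[out_stream_index]->time_base);

    return av_interleaved_write_frame(out_ctx, pkt);
}

On the receiver side you also need an output stream created with the same codec parameters as the sender's stream (avcodec_parameters_copy is the usual way); otherwise the muxer cannot write a valid WebM header.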

Resources