ffmpeg av_seek_frame with AVSEEK_FLAG_ANY causes grey screen - ffmpeg

Problem:
omxplayer's source code calls the ffmpeg av_seek_frame() function with the AVSEEK_FLAG_BACKWARD flag. Although I am not 100% sure, I believe this seeks to the closest i-frame. I want to seek to exact locations instead, so I modified the source code so that av_seek_frame() now uses the AVSEEK_FLAG_ANY flag. Now, when the movie loads, I get a grey screen, generally for 1 second, during which I can hear the audio. I have tried this on multiple computers (I am actually synchronizing them, so at the same time too), so it is not an isolated incident. My guess is that seeking to non-i-frames is computationally more expensive, resulting in the initial grey screen.
Question: How, using ffmpeg, can I instruct the audio to wait until the video is ready before proceeding?

Actually, AVSEEK_FLAG_BACKWARD indicates that you want to find the closest keyframe with a timestamp smaller than the one you are seeking.
By using AVSEEK_FLAG_ANY, you get the frame that corresponds exactly to the timestamp you asked for. But this frame might not be a keyframe, which means that it cannot be fully decoded on its own. That explains your "grey screen", which appears until the next keyframe is reached.
The solution is therefore to seek backward using AVSEEK_FLAG_BACKWARD and, from this keyframe, read and decode the following frames (e.g. using av_read_frame()) until you get to the one corresponding to your timestamp. At this point, your frame will be fully decoded and will no longer appear as a "grey screen".
NOTE: It appears that, for some reason, av_seek_frame() using AVSEEK_FLAG_BACKWARD returns the next keyframe when the frame that I am seeking is the one directly before this keyframe. Otherwise it returns the previous keyframe (which is what I want). My solution is to change the timestamp I give to av_seek_frame() to ensure that it will return the keyframe before the frame I am seeking.

Completing JonesV's answer with some code:
void seekFrame(unsigned frameIndex)
{
    // Seek is done on packet dts
    int64_t target_dts_usecs = (int64_t)round(frameIndex
        * (double)m_video_stream->r_frame_rate.den
        / m_video_stream->r_frame_rate.num * AV_TIME_BASE);
    // Remove first dts: when non zero seek should be more accurate
    auto first_dts_usecs = (int64_t)round(m_video_stream->first_dts
        * (double)m_video_stream->time_base.num
        / m_video_stream->time_base.den * AV_TIME_BASE);
    target_dts_usecs += first_dts_usecs;
    int rv = av_seek_frame(
        m_format_ctx, -1, target_dts_usecs, AVSEEK_FLAG_BACKWARD);
    if (rv < 0)
        throw exception("Failed to seek");
    avcodec_flush_buffers(m_codec_ctx);
}
Then you can begin decoding, checking AVPacket.dts against the original target dts, expressed in AVStream.time_base units. As soon as you reach the target dts, the next decoded frame should be the desired frame.
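The seek-then-decode-forward idea can be tried out without FFmpeg at all. What follows is only a small simulation, not omxplayer or libav code: the Packet tuple and the function names are made up for illustration, timestamps are plain frame numbers, and it assumes a constant frame rate with no frame reordering.

```python
from collections import namedtuple

AV_TIME_BASE = 1_000_000  # FFmpeg's internal time base: microseconds

# Hypothetical stand-in for AVPacket: decode timestamp (in frames) and keyframe flag.
Packet = namedtuple("Packet", ["dts", "is_key"])


def frame_to_usecs(frame_index, fps_num, fps_den):
    """Convert a frame index to microseconds, like the arithmetic in seekFrame()."""
    return round(frame_index * fps_den / fps_num * AV_TIME_BASE)


def seek_backward(packets, target_dts):
    """Index of the last keyframe at or before target_dts (the AVSEEK_FLAG_BACKWARD idea)."""
    candidates = [i for i, p in enumerate(packets) if p.is_key and p.dts <= target_dts]
    if not candidates:
        raise ValueError("no keyframe at or before target")
    return candidates[-1]


def packets_to_decode(packets, target_dts):
    """Packets that must be decoded, in order, before the target frame can be shown."""
    start = seek_backward(packets, target_dts)
    needed = []
    for p in packets[start:]:
        needed.append(p)
        if p.dts >= target_dts:
            break
    return needed


# A simulated stream with a keyframe every 5 frames.
stream = [Packet(dts=i, is_key=(i % 5 == 0)) for i in range(20)]
needed = packets_to_decode(stream, target_dts=13)
print([p.dts for p in needed])      # [10, 11, 12, 13]
print(frame_to_usecs(25, 25, 1))    # 1000000
```

Only the last packet in the list is displayed; the earlier ones are decoded and discarded, which is why an exact seek costs more than a keyframe seek.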

Related

How do I buffer and capture an RTSP stream to disk based on a trigger?

I think what I'm asking about is similar to this ffmpeg post about how to capture a lightning strike (https://trac.ffmpeg.org/wiki/Capture/Lightning).
I have a Raspberry Pi with an IP cam over RTSP, and what I'm wondering is how to maintain a continual 5 second live video buffer, until I trigger a "save" command which will pipe that 5 second buffer to disk, and continue streaming the live video to disk until I turn it off.
Essentially, Pi boots up, this magic black box process starts and is saving live video into a fixed-size, 5-second buffer, and then let's say an hour later - I click a button, and it flushes that 5-second buffer to a file on disk and continues to pipe the video to disk, until I click cancel.
In my environment, I'm able to use ffmpeg, gstreamer, or openRTSP. For each of these, I can connect to my RTSP stream and save it to disk, but I'm not sure how to create this ever-present 5 second cache.
I feel like the gstreamer docs are alluding to it here (https://gstreamer.freedesktop.org/documentation/application-development/advanced/buffering.html?gi-language=c), but I guess I'm just not grokking how the buffering fits in with a triggered save. From that article, I get the impression that the end-time of the video is known in advance (I could artificially limit mine, I guess).
I'm not in a great position to post-process the file, so using something like openRTSP, saving a whole bunch of video segments, and then merging them isn't really an option.
Note: After a successful save, I wouldn't need to save another video for a minute or so, so that 5 second cache has plenty of time to fill back up before the next
This is the closest similar question that I've found: https://video.stackexchange.com/questions/18514/ffmpeg-buffered-recording
Hey,
I don't know if you have any knowledge of Python, but there is a library called PyAV that is a fancy Python wrapper/interface for ffmpeg.
With it you can just read your frames from an RTSP source and handle those frames however you want.
Here is just an idea/hack implementation of what you describe; you need to design your frame buffer. When you know that you get 25 FPS from your camera, then you can restrict the queue size to 125 (5 seconds at 25 FPS).
import av
import time
import queue
from threading import Thread, Event


class LightingRecorder(Thread):
    def __init__(self, source: str = ""):
        Thread.__init__(self)
        self.source = source
        self.av_instance = None
        self.connected = False
        self.frame_buffer = queue.Queue()
        self.record_event = Event()

    def open_rtsp_stream(self):
        try:
            self.av_instance = av.open(self.source, 'r')
            self.connected = True
            print("Connected")
        except av.error.HTTPUnauthorizedError:
            print("HTTPUnauthorizedError")
        except Exception as error:
            # Catch other pyav errors if you want, just for example
            print(error)

    def run(self):
        self.open_rtsp_stream()
        while 1:
            if self.connected:
                for packet in self.av_instance.demux():
                    for frame in packet.decode():
                        if packet.stream.type == 'video':
                            # TODO:
                            # Handle clearing of the frame buffer: remove frames that are older than a specific timestamp,
                            # or calculate the FPS and store that many frames per second in the frame buffer
                            print("Add Frame to framebuffer", frame)
                            self.frame_buffer.put(frame)
                            if self.record_event.is_set():
                                # queue.Queue is not iterable, so drain it frame by frame
                                while not self.frame_buffer.empty():
                                    buffered = self.frame_buffer.get()
                                    buffered.to_image().save('frame-%04d.jpg' % buffered.index)
            else:
                time.sleep(10)


LightingRecorder(source='rtsp://root:pass@192.168.1.197/axis-media/media.amp').start()
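The frame buffer handling left as a TODO above could be designed around a bounded deque. This is only a sketch of the buffering logic under some assumptions: the PreRollBuffer class and its method names are invented here, plain integers stand in for video frames, and a Python list stands in for the file on disk.

```python
from collections import deque


class PreRollBuffer:
    """Keeps the newest `seconds` worth of frames until recording is triggered."""

    def __init__(self, fps=25, seconds=5):
        # A deque with maxlen drops the oldest frame automatically when full.
        self.buffer = deque(maxlen=fps * seconds)
        self.recording = False
        self.saved = []  # stand-in for a file on disk

    def push(self, frame):
        if self.recording:
            self.saved.append(frame)
        else:
            self.buffer.append(frame)

    def trigger(self):
        """Flush the buffered frames to the output and keep recording live frames."""
        self.saved.extend(self.buffer)
        self.buffer.clear()
        self.recording = True

    def stop(self):
        self.recording = False


buf = PreRollBuffer(fps=25, seconds=5)
for n in range(1000):          # 40 seconds of frames arrive before the trigger
    buf.push(n)
buf.trigger()
for n in range(1000, 1010):    # frames after the trigger go straight to the output
    buf.push(n)
print(buf.saved[0], buf.saved[-1], len(buf.saved))  # 875 1009 135
```

Buffering decoded frames like this is simple but memory hungry on a Raspberry Pi. Buffering the encoded packets instead would be much lighter, with the caveat that the saved file then has to start at a keyframe.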
iSpy/AgentDVR can do exactly what you want https://www.ispyconnect.com/userguide-recording.aspx:
Buffer: This is the number of seconds of video to buffer in memory.
This feature enables iSpy to capture the full event that causes the
motion detection event.
Edit:
iSpy runs only on Windows unlike AgentDVR which also has versions for Linux/OSX/RPi.

How to scale and mux audio?

The first problem is with audio resampling. I'm trying to rework doc/examples/transcode_aac.c so that it also resamples from 41100 to 48000; the example contains a warning that it can't do that.
Using doc/examples/resampling_audio.c as a reference, I saw that before doing swr_convert, I need to find the number of audio samples at the output with the code like this:
int dst_nb_samples = av_rescale_rnd( input_frame->nb_samples + swr_get_delay(resampler_context, 41100),
48000, 41100, AV_ROUND_UP);
Problem is, when I just set int dst_nb_samples = input_frame->nb_samples (which is 1024), it encodes and plays normally, but when I do that av_rescale_rnd thing (which results in 1196), audio is slowed down and distorted, like there are skips in the audio.
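The sample count above can be checked by hand. The snippet below is a plain-Python sketch of the rounding that av_rescale_rnd() performs with AV_ROUND_UP; the helper name is made up, and it assumes the resampler delay is 0 for the first frame. It also shows the result for 44100 Hz, purely as a comparison: the question only ever mentions 41100, so whether the input is really 44100 Hz is an assumption.

```python
def rescale_round_up(a, b, c):
    """Integer ceil(a * b / c), the value av_rescale_rnd() yields with AV_ROUND_UP."""
    return -(-a * b // c)


nb_samples = 1024  # samples in one input frame, as stated in the question
delay = 0          # assumed: nothing buffered inside the resampler yet

print(rescale_round_up(nb_samples + delay, 48000, 41100))  # 1196
print(rescale_round_up(nb_samples + delay, 48000, 44100))  # 1115
```

So 1196 is exactly what a source rate of 41100 produces. If the file's real sample rate were 44100 Hz, the expected count would be 1115, and asking for 1196 samples per frame would stretch the audio, so the rate used in the code is worth double-checking against the stream.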
Second problem is with trying to mux webm with opus audio.
When I set AVStream->time_base to 1/48000 and increase AVFrame->pts by 960, the resulting file is shown in the player as being much longer than it is: 17 seconds of audio show as 16m11s, although it plays normally.
When I increase pts by 20, the duration displays normally, but I get a lot of "[libopus # 00ffa660] Queue input is backward in time" messages during encoding. The same happens for pts += 30; those messages still appear.
Should I try a time_base of 1/1000? WebM always has timecodes in milliseconds, and Opus has a packet size of 20 ms (960 samples at 48000 Hz).
Search for pts += 20;
Here is the whole file, all modification I did are marked with //MINE: http://www.mediafire.com/file/jlgo7x4hiz7bw64/transcode_aac.c
Here is the file I tested it on http://www.mediafire.com/file/zdy0zarlqw3qn6s/480P_600K_71149981_soundonly.mkv
The easiest way to achieve that is to use swr_convert_frame, which takes a frame and resamples it into a different one.
You can read more about it here: https://ffmpeg.org/doxygen/3.2/swresample_8h_source.html
dst_nb_samples can be calculated like this:
dst_nb_samples = 48000.0 / audio_stream->codec->sample_rate * inputAudioFrame->nb_samples;
Yours is probably correct too; I didn't check it, but this is the one I have used before. Compare it with yours, but the number you gave checks out, so the real problem is probably somewhere else. Try to supply 960 samples in sync with the video frames; to do this you need to store the audio frames in an additional linear buffer. See if that fixes the problem.
And/or:
Secondly, in my experience audio pts increases by the number of samples per frame (i.e. 960 for 50 fps video at 48000 Hz, since 48000/50 = 960), not by milliseconds. If you supply 1196 samples, use pts += 1196 (if you are not using the additional buffer I mentioned above). This is different from video frame pts.
You are definitely on the right path. I'll examine the source code if I have time. Anyway, hope that helps.
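To make the pts arithmetic concrete, here is a small sketch of how a sample-based pts maps to milliseconds. It is plain Python, not libav code: rescale_pts is a hypothetical helper that does roughly what av_rescale_q() does, and the 1/1000 time base is assumed only because the question mentions that WebM stores timecodes in milliseconds.

```python
from fractions import Fraction


def rescale_pts(pts, src_tb, dst_tb):
    """Convert a timestamp from one time base to another, rounded to an integer."""
    return round(pts * src_tb / dst_tb)


SAMPLE_RATE = 48000
FRAME_SAMPLES = 960                    # 20 ms of audio at 48 kHz
audio_tb = Fraction(1, SAMPLE_RATE)    # encoder time base: one tick per sample
ms_tb = Fraction(1, 1000)              # millisecond time base

pts = 0
for n in range(4):
    print(n, pts, rescale_pts(pts, audio_tb, ms_tb))
    pts += FRAME_SAMPLES               # advance by samples, not by milliseconds
```

With a 1/48000 time base each frame advances pts by 960, which is the same instant as 20 in a 1/1000 time base. Mixing the two (a 1/48000 time base with pts += 20, or the reverse) gives timestamps that are off by a factor of 48.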

ffmpeg with vlc - too heavy screen grabbing, resulting in jumpy file, frame loss

I'm using 'vlc/ffmpeg' package to grab the screen and convert it to H.264 file.
The problem arises when the host is heavily loaded. I need to maintain correct time stamps and use 5 fps (a relatively low frame rate). Yet sometimes the resulting file jumps a few seconds forward, apparently due to frame loss.
I can deal with the frame loss, it's OK, but I need to duplicate lost frames to maintain correct timing.
My configuration file:
vlc.exe screen:// -I dummy --verbose=2 --one-instance :screen-fps=5 :screen-caching=10000 :sout=#transcode{venc=x264{preset=ultrafast,tune=zerolatency},vcodec=h264,fps=5,vb=3000,width=1024,height=576,acodec=none}:file{dst="C:\tmp\output.mp4"}
What should I add/config to preserve proper time stamps and clip duration?
Many thanks for your help.
OK, I found adding 'copyts' option does exactly what I need.

On removing silence from videos

I have a video with voice over.
I extracted its audio.
I extracted a cutlist from adobe audition of this audio like this:
Start:
00:00:12:00
End:
00:00:13:00
These are the parts that are silence and that are to be removed.
I converted these to frames given 25fps video file.
I created an avisynth file like this:
AVISource("20130531_1303_46.avi")
Crop(2,0,852,480)
Trim(0,4-1) ++ Trim(50+1,0)
Trim(0,34-1) ++ Trim(82+1,0)
....
Each line contains the start(first trim) minus (sum of all differences between previous end and start) and end(second trim) minus (sum of all differences between previous end and start) - frames.
I load this into virtualdub.
I remove all silences in audition according to the cutlist and save as mp3
I load the mp3 into virtualdub.
Problem: It's not in sync over the whole video, i.e. it starts in sync and after a while it drifts off in the positive direction (I have to enter a value of -3000 ms for it to be in sync in the middle). Also, it is chopped off more often than not.
This means that something is wrong, I guess with the sum of all differences.
To understand this:
When you select a part in virtualdub and remove it, the total count of frames is the total count of frames minus the amount of frames the part has.
Example:
Frames 2-5 were removed. So it's basically 1-6-7-8-... left. According to the original time-frame relationship I would start at, let's say, 7. But 7 is now 3. And this adds up the more I remove.
So I thought: If I use the frame number minus the sum of all previously removed frames, I should cut at the right place.
I seem to forget something, what is it?
I found the solution. If I use Start = 4 and End = 50 then I need to step
End - Start + 1
forward and not
End - Start
But still, the 25fps resolution is too low, sometimes audio parts are chopped off.
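The bookkeeping above can be written down as a short sketch. The function names and the example cut list are made up for illustration; cuts are treated as inclusive (start, end) frame ranges in the original numbering, and the timecodes are assumed to be HH:MM:SS:FF at 25 fps.

```python
def timecode_to_frame(tc, fps=25):
    """Convert an 'HH:MM:SS:FF' timecode to an absolute frame number."""
    hours, minutes, seconds, frames = (int(x) for x in tc.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) * fps + frames


def shift_cuts(cuts):
    """Renumber inclusive (start, end) cuts to account for frames removed earlier."""
    removed = 0
    shifted = []
    for start, end in sorted(cuts):
        shifted.append((start - removed, end - removed))
        removed += end - start + 1  # an inclusive range holds end - start + 1 frames
    return shifted


print(timecode_to_frame("00:00:12:00"))      # 300
print(shift_cuts([(4, 50), (81, 129)]))      # [(4, 50), (34, 82)]
```

Removing frames 4 to 50 takes out 47 frames, not 46, so the second cut moves from 81..129 to 34..82. At 25 fps each frame still covers 40 ms, so cuts can only land on 40 ms boundaries, which fits the remark that some audio is chopped off.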

Synchronize RTSP with computer time

I am successfully using libav to receive the video stream from an RTSP network source. The point is that I need to synchronize my computer's time with the video capture, meaning that I need to know which datetime on my computer corresponds to the first frame (pts = 0). My API calls are the following:
av_register_all()
avcodec_register_all()
avformat_network_init()
avformat_open_input()
avformat_find_stream_info()
av_read_play()
loop
    av_init_packet()
    av_read_frame()
    [...]
    av_free_packet()
end loop
With the calls above, I successfully read frames, but I need to know the exact absolute datetime that corresponds to the first frame, since it has a pts of 0. Maybe I can call a time() function or GetSystemTime (I am using Windows) between two of the calls above, but I do not really know how libav works.
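One simple way to try the time() idea is to record the system clock when the first frame arrives and offset every later pts from it. The sketch below shows only that arithmetic: WallClockMapper is a hypothetical class, not part of libav, and the 1/90000 time base is an assumption (90 kHz is the usual RTP clock for video).

```python
import time
from fractions import Fraction


class WallClockMapper:
    """Maps stream pts values to local wall-clock time (seconds since the Unix epoch)."""

    def __init__(self, time_base):
        self.time_base = time_base
        self.first_pts = None
        self.first_wall = None

    def wall_time(self, pts, now=None):
        # The first call pins the first pts to the current system time.
        if self.first_pts is None:
            self.first_pts = pts
            self.first_wall = time.time() if now is None else now
        return self.first_wall + float((pts - self.first_pts) * self.time_base)


mapper = WallClockMapper(Fraction(1, 90000))
print(mapper.wall_time(0, now=1_700_000_000.0))   # first frame: the pinned time
print(mapper.wall_time(180000))                   # 2 seconds later in stream time
```

This gives the time at which the first frame was received, not the time the camera captured it, so network and buffering latency are included and the result is only an approximation.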

Resources