On removing silence from videos - algorithm

I have a video with a voice-over.
I extracted its audio.
I exported a cutlist of this audio from Adobe Audition, like this:
Start:
00:00:12:00
End:
00:00:13:00
These entries mark the silent parts that are to be removed.
I converted these timestamps to frame numbers, given a 25fps video file.
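For reference, a minimal sketch of that conversion (assuming Audition's HH:MM:SS:FF timecode format and a constant 25fps):

def timecode_to_frame(tc, fps=25):
    # "HH:MM:SS:FF" -> absolute frame number
    hh, mm, ss, ff = (int(p) for p in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

print(timecode_to_frame("00:00:12:00"))  # -> 300
print(timecode_to_frame("00:00:13:00"))  # -> 325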
I created an AviSynth file like this:
AVISource("20130531_1303_46.avi")
Crop(2,0,852,480)
Trim(0,4-1) ++ Trim(50+1,0)
Trim(0,34-1) ++ Trim(82+1,0)
....
Each line's two frame numbers are the cut's original start (first Trim) and end (second Trim), each reduced by the running offset, i.e. the sum of (end - start) over all previously removed cuts, since every removed cut shifts all later frames down by its length.
I load this into VirtualDub.
I remove all silences in Audition according to the cutlist and save the result as MP3.
I load the MP3 into VirtualDub.
Problem: it's not in sync over the whole video. It starts in sync and after a while drifts off in the positive direction (I have to enter an offset of -3000ms for it to be in sync in the middle; also, the audio is chopped off more often than not).
This means something is wrong, presumably with the sum of all differences.
To understand this:
When you select a part in VirtualDub and remove it, the total frame count decreases by the number of frames in that part.
Example:
Frames 2-5 were removed, so frames 1, 6, 7, 8, ... remain. According to the original time-to-frame mapping I would cut at, say, frame 7, but frame 7 is now frame 3. This error adds up the more I remove.
So I thought: if I use the frame number minus the sum of all previously removed frames, I should cut at the right place.
I seem to be forgetting something. What is it?

I found the solution. If a cut runs from Start = 4 to End = 50, then the number of removed frames is
End - Start + 1
and not
End - Start
because the range is inclusive (this cut removes 47 frames, not 46). But still, the 25fps resolution is too coarse: sometimes audio parts are chopped off.
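A minimal sketch of the corrected bookkeeping, regenerating the Trim lines with the inclusive cut length (the second cut below is hypothetical):

def make_trim_lines(cuts):
    # cuts: (start, end) frame pairs in ORIGINAL frame numbers, sorted.
    # Each cut is shifted down by the frames already removed before it,
    # and the inclusive length end - start + 1 is added to the offset.
    removed = 0
    for start, end in cuts:
        yield "Trim(0,%d-1) ++ Trim(%d+1,0)" % (start - removed, end - removed)
        removed += end - start + 1

for line in make_trim_lines([(4, 50), (81, 129)]):
    print(line)
# Trim(0,4-1) ++ Trim(50+1,0)
# Trim(0,34-1) ++ Trim(82+1,0)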

Related

How to record last X hours of a live stream via FFMPEG?

In the normal case, saving a live stream to a local file is easy. But I'm looking for a way to keep only the last X amount of time of the stream, e.g. the last 2 days.
Obviously the file must be updated constantly.
Any help? Thanks.
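One possible approach (a sketch, not from the original thread): ffmpeg's segment muxer can write a fixed number of rotating segment files that together always hold the most recent footage. For example, 48 one-hour segments cover roughly the last 2 days:

ffmpeg -i <stream_url> -c copy -f segment -segment_time 3600 -segment_wrap 48 -reset_timestamps 1 buffer_%02d.ts

Once all 48 segments exist, the oldest is overwritten in place; for playback, concatenate the segments in wrap order. The stream URL and segment length are placeholders to adapt.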

FFMPEG Increment filename past 1000 and reset from 1

From the documentation:
ffmpeg -i video.webm image-%03d.png
This will extract 25 images per second from the file video.webm and save them as image-000.png, image-001.png, image-002.png, up to image-999.png. If there are more than 1000 frames, the last image will be overwritten with the remaining frames, leaving only the last frame.
Is there any way to increment this number past 1000, and can I also have this restart from 1 so that we're not just overwriting the last frame?
I have a script that analyzes these images as they come in so I use locally stored images as a buffer/queue. It's also useful for me to have more images stored so I can go back and debug anything, so being able to do the above would be quite helpful for me.
ffmpeg -i video.webm image-%04d.png
This will output image-0001.png, etc., allowing you to go beyond 999.
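As a side note (not from the original answer), an unpadded pattern also works if you don't want to commit to a fixed width:

ffmpeg -i video.webm image-%d.png

This writes image-1.png, image-2.png, ... with no upper bound on the counter.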

How to scale and mux audio?

The first problem is with audio resampling. I'm trying to redo doc/examples/transcode_aac.c so that it also resamples from 41100 to 48000; the example contains a warning that it cannot do this.
Using doc/examples/resampling_audio.c as a reference, I saw that before doing swr_convert, I need to find the number of audio samples at the output with code like this:
int dst_nb_samples = av_rescale_rnd(
    input_frame->nb_samples + swr_get_delay(resampler_context, 41100),
    48000, 41100, AV_ROUND_UP);
The problem is, when I just set int dst_nb_samples = input_frame->nb_samples (which is 1024), it encodes and plays normally, but when I do the av_rescale_rnd thing (which results in 1196), the audio is slowed down and distorted, like there are skips in it.
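A quick sanity check of that arithmetic (a minimal sketch; it ignores the swr_get_delay term):

import math

# av_rescale_rnd(1024, 48000, 41100, AV_ROUND_UP) computes ceil(1024 * 48000 / 41100)
print(math.ceil(1024 * 48000 / 41100))  # -> 1196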
The second problem is with trying to mux WebM with Opus audio.
When I set AVStream->time_base to 1/48000 and increase AVFrame->pts by 960, the resulting file is reported by the player as much longer than it is: 17 seconds of audio shows as 16m11s, but it plays normally.
When I increase pts by 20, the duration displays normally, but I get a lot of [libopus @ 00ffa660] Queue input is backward in time messages during encoding. The same goes for a pts increment of 30.
Should I try a time_base of 1/1000? WebM always has timecodes in milliseconds, and Opus has a packet size of 20ms (960 samples at 48000 Hz).
Here is the whole file; all modifications I made are marked with //MINE (search for pts += 20;): http://www.mediafire.com/file/jlgo7x4hiz7bw64/transcode_aac.c
Here is the file I tested it on: http://www.mediafire.com/file/zdy0zarlqw3qn6s/480P_600K_71149981_soundonly.mkv
The easiest way to achieve that is by using swr_convert_frame, which takes a frame and resamples it into a new frame with different parameters.
You can read more about it here: https://ffmpeg.org/doxygen/3.2/swresample_8h_source.html
dst_nb_samples can be calculated like this:
dst_nb_samples = 48000.0 / audio_stream->codec->sample_rate * inputAudioFrame->nb_samples;
Yours is probably correct too; I didn't check, but this is the one I have used before, and the number you gave checks out. So the real problem is probably somewhere else. Try to supply 960 samples in sync with the video frames; to do this you need to store the audio samples in an additional linear buffer. See if that fixes the problem.
And/or:
Secondly, my experience says the audio pts increases by the number of samples per frame (e.g. 960 for 50fps video at 48000 Hz, i.e. 48000/50), not by milliseconds. If you supply 1196 samples, use pts += 1196 (if you did not use the additional buffer I mentioned above). This is different from the video frame pts. Hope that helps.
You are definitely on the right path. I'll examine the source code if I have time. Anyway, hope that helps.

Converting subtitles for different framerates

I'm trying to make a simple CLI program that parses an SRT subtitle file and creates a new one, editing the timestamps to fit the desired framerate.
E.g. I have a one-hour video track that runs at 25.0fps, with proper subtitles.
When encoding the same video at 23.976fps, the output video is a few seconds shorter (approximately 3 seconds).
I've tried applying the following cross-multiplication to each time value in my SRT file:
timestamp = timestamp * outputfps / inputfps
This produces captions that are approximately 3 minutes early compared to the input SRT (for the last captions; for the first ones the offset is obviously smaller), whereas the maximum offset should be 3 seconds, according to the new video file's length.
This is all new to me, and it seems obvious that something's wrong with the way I convert these timestamps. Could you please point out my mistake?
Edit: According to j_random_hacker's clever answer, the video should have the same duration at 25fps as at 12fps, which is easily verified. It seems the 3-second offset I have is there no matter what the output framerate is; I guess there's some sort of trimming happening behind the scenes.
The main question remains: how does one convert a subtitle track so it doesn't go out of sync as the video file plays? (See my own comment below if this is unclear.)
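For reference, a minimal retiming sketch (not from the original thread; retime_srt and its parameters are illustrative). If the output really shows the same frames at a different rate, frame n moves from n/input_fps to n/output_fps, so each timestamp scales by input_fps/output_fps, the inverse of the ratio tried above; for the case in the edit (same duration, fixed trim) a scale of 1.0 with a constant offset_ms is enough:

import re

def parse_ts(ts):
    # "HH:MM:SS,mmm" -> milliseconds
    h, m, s_ms = ts.split(":")
    s, ms = s_ms.split(",")
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def format_ts(ms):
    ms = max(0, int(round(ms)))
    h, rem = divmod(ms, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

def retime_srt(text, input_fps, output_fps, offset_ms=0):
    # t' = t * input_fps / output_fps + offset_ms, applied to every timestamp
    scale = input_fps / output_fps
    fix = lambda m: format_ts(parse_ts(m.group(0)) * scale + offset_ms)
    return re.sub(r"\d{2}:\d{2}:\d{2},\d{3}", fix, text)

Usage for the case above would be retime_srt(srt_text, 25.0, 23.976).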

ffmpeg av_seek_frame with AVSEEK_FLAG_ANY causes grey screen

Problem:
omxplayer's source code calls ffmpeg's av_seek_frame() method using the AVSEEK_FLAG_BACKWARD flag. Although not 100% sure, I believe this seeks to the closest i-frame. Instead, I want to seek to exact locations, so I modified the source code so that av_seek_frame() now uses the AVSEEK_FLAG_ANY flag. Now, when the movie loads, I get a grey screen, generally for 1 second, during which I can hear the audio. I have tried this on multiple computers (which I am actually synchronizing, so they run at the same time too), so it is not an isolated incident. My guess is that seeking to non-i-frames is computationally more expensive, resulting in the initial grey screen.
Question: How, using ffmpeg, can I instruct the audio to wait until the video is ready before proceeding?
Actually, AVSEEK_FLAG_BACKWARD indicates that you want to find the closest keyframe with a smaller timestamp than the one you are seeking to.
By using AVSEEK_FLAG_ANY, you get the frame that corresponds exactly to the timestamp you asked for. But this frame might not be a keyframe, which means that it cannot be fully decoded. That explains your "grey screen", which appears until the next keyframe is reached.
The solution would therefore be to seek backward using AVSEEK_FLAG_BACKWARD and, from this keyframe, read the next frames (e.g. using av_read_frame()) until you get to the one corresponding to your timestamp. At this point, your frame would be fully decoded, and would not appear as a "grey screen" anymore.
NOTE: It appears that, for some reason, av_seek_frame() using AVSEEK_FLAG_BACKWARD returns the next keyframe when the frame that I am seeking is the one directly before this keyframe. Otherwise it returns the previous keyframe (which is what I want). My solution is to change the timestamp I give to av_seek_frame() to ensure that it will return the keyframe before the frame I am seeking.
Completing JonesV's answer with some code:
void seekFrame(unsigned frameIndex)
{
    // Seek is done on packet dts
    int64_t target_dts_usecs = (int64_t)round(
        frameIndex * (double)m_video_stream->r_frame_rate.den
                   / m_video_stream->r_frame_rate.num * AV_TIME_BASE);
    // Account for the stream's first dts: when it is non-zero,
    // shifting the target by it makes the seek more accurate
    auto first_dts_usecs = (int64_t)round(
        m_video_stream->first_dts * (double)m_video_stream->time_base.num
                                  / m_video_stream->time_base.den * AV_TIME_BASE);
    target_dts_usecs += first_dts_usecs;

    int rv = av_seek_frame(
        m_format_ctx, -1, target_dts_usecs, AVSEEK_FLAG_BACKWARD);
    if (rv < 0)
        throw std::runtime_error("Failed to seek");

    avcodec_flush_buffers(m_codec_ctx);
}
Then you can begin decoding, checking each AVPacket.dts against the original target dts (computed in AVStream.time_base units). As soon as you reach the target dts, the next decoded frame should be the desired frame.
