Need to create a longer version of a video by looping it, while keeping the frame rate, bitrate, audio quality, and resolution - moviepy

I have this 5 minute long 24fps video, and I need to extend its length by some integer factor, something like 12, though preferably 24. I need to keep the frame rate the same, which I have achieved through moviepy, but the bit rate has changed. The whole point of this is to create a video that is equally intensive on an APU, but longer. I don't understand the implications of higher bitrates, data rate or total bitrate, but I need to. Is there a way to concatenate a video with itself and maintain those values in the 'Details' tab of Properties? Keep in mind I'm doing low-power measurements.
I tried Microsoft Clipchamp, but that gave me 30fps. I looked for other free video editors, but none gave 24fps. I tried moviepy, which gave 24fps but a much lower bitrate.
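One approach that might keep everything untouched is ffmpeg's -stream_loop input option combined with stream copy, since no re-encoding happens at all: the output is just the original packets repeated, so frame rate, bitrate, resolution, and audio should carry over. A minimal sketch in Python, assuming ffmpeg is installed and the file is named input.mp4 (both assumptions):

    # Sketch: play the clip 12 times in a row by stream-copying it, so nothing
    # is re-encoded and the original frame rate / bitrate / resolution / audio
    # survive. Assumes ffmpeg is on PATH and the input is "input.mp4".
    import subprocess

    FACTOR = 12  # total number of plays of the original clip

    subprocess.run([
        "ffmpeg",
        "-stream_loop", str(FACTOR - 1),  # number of *extra* repeats of the input
        "-i", "input.mp4",
        "-c", "copy",                     # copy video and audio packets as-is
        "looped.mp4",
    ], check=True)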

Related

Converting image sequence to video with inconsistent frame rate

I recently collected video data where the video was generated as image sequences. However, between different videos of the same length, different numbers of frames were acquired, which makes me think the image sequences have varying frame rates between videos. So my question is: how do I convert these image sequences back to video with accurate durations between frames? Is there a way to get that information from each frame's creation date and time using code? I know ffmpeg seems to be the tool many people use.
I am not sure where to start. I am not very familiar with coding, so I already have trouble running the right commands.
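A rough sketch of one way this could be done with Python and ffmpeg's concat demuxer, assuming the frames are PNG files in a folder called frames and that each file's modification time reflects when it was captured (on many cameras these timestamps only have one-second resolution, so that is worth checking first; the folder, pattern, and output names are all assumptions):

    # Sketch: rebuild a variable-frame-rate video from an image sequence, using
    # each file's modification time as its capture time. Assumes ffmpeg is on
    # PATH and the frames are "frames/*.png".
    import glob
    import os
    import subprocess

    frames = sorted(glob.glob("frames/*.png"))
    times = [os.path.getmtime(f) for f in frames]

    # The concat demuxer reads a list of "file" lines, each optionally followed
    # by a "duration" line saying how long that frame should be shown.
    with open("list.txt", "w") as f:
        for i, frame in enumerate(frames):
            f.write(f"file '{os.path.abspath(frame)}'\n")
            if i + 1 < len(times):
                f.write(f"duration {times[i + 1] - times[i]:.6f}\n")

    subprocess.run([
        "ffmpeg",
        "-f", "concat", "-safe", "0", "-i", "list.txt",
        "-vsync", "vfr",                   # keep the per-frame timestamps
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "output.mp4",
    ], check=True)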

What's the difference between crf and qp in ffmpeg?

After browsing around Google, I came across this page about H.264 encoding and discovered qp. https://trac.ffmpeg.org/wiki/Encode/H.264
My questions are: What are the differences between crf and qp? Is it better to use qp over crf in general, or only when using qp 0 for true lossless quality? Does qp have a known sensible setting if it is preferred? So far, I know crf has a default value of 23, while 18 is a sensible choice for higher quality, although I don't understand why 18 wouldn't be the default if it is the better sensible choice. Lastly, would changing either of them cause incompatibility with non-ffmpeg players, or only qp?
I'm converting from webm to mp4 by the way.
I was going to test crf 23 and 18 and pick whichever is best, but I can't seem to find any concrete information on this comparison or about qp.
When you set the quantization parameter (QP) directly, it remains constant throughout the encode, and each frame is compressed based on that value.
Constant rate factor (CRF) allows the QP to go up for frames with a lot of motion and down for still frames, resulting in consistent perceived quality while keeping the compression efficient.
This article explains it very well.
The CRF default is just a default; you need to pick a value adapted to your type of video. FFmpeg has filters like PSNR and SSIM which allow you to compare the results.
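To make the comparison concrete, here is a small sketch (file names are assumptions) that encodes the same webm once with -crf and once with -qp, then uses the ssim filter mentioned above to score each result against the source:

    # Sketch: encode the same source with CRF and with constant QP, then compare
    # each result to the source with the SSIM filter. File names are assumptions.
    import subprocess

    SRC = "input.webm"

    for mode, value, out in [("-crf", "18", "crf18.mp4"), ("-qp", "18", "qp18.mp4")]:
        subprocess.run(
            ["ffmpeg", "-y", "-i", SRC, "-c:v", "libx264", mode, value, out],
            check=True,
        )
        # SSIM of the encode against the original; the score is printed to stderr.
        subprocess.run(
            ["ffmpeg", "-i", out, "-i", SRC, "-lavfi", "ssim", "-f", "null", "-"],
            check=True,
        )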
Constant QP mode is useful in limited circumstances.
I have some video game recordings from Subnautica. When I travel through a gate, the wormhole effects are very messy and CRF mode will reduce the quality. Unfortunately this causes the HUD elements to become fuzzy. In order to preserve the crispness of the HUD, I switched from CRF to constant QP mode. I guess a future version of the H.264 encoder could be improved to keep blocks of unchanging pixels crisp, but that does not exist today.
Constant QP would probably be a good choice for intermediate encodings (like converting screen captures from variable frame rate to constant frame rate, because Blender is not architected to properly handle variable frame rate content) where you want to preserve quality and can afford the extra disk space.
In most footage CRF will save you some disk space or bandwidth. But in a few cases like mine I prefer constant QP.
Just for the record, -qp is supported by both the libx264 and h264_nvenc codecs (the CUDA-backed encoder).
CUDA ignores -crf, which took me forever to notice.
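For what it's worth, a sketch of the NVENC variant (file names are assumptions): with h264_nvenc you pick the quality with -qp (or the encoder's own rate-control options), because -crf has no effect on it, as noted above:

    # Sketch: constant-QP encode on the NVENC hardware encoder, which ignores
    # -crf. File names are assumptions.
    import subprocess

    subprocess.run([
        "ffmpeg", "-i", "input.mp4",
        "-c:v", "h264_nvenc",
        "-rc", "constqp", "-qp", "20",   # constant-QP rate control
        "output.mp4",
    ], check=True)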

FFMPEG API -- How much do stream parameters change frame-to-frame?

I'm trying to extract raw streams from devices and files using ffmpeg. I notice the crucial frame information (Video: width, height, pixel format, color space, Audio: sample format) is stored both in the AVCodecContext and in the AVFrame. This means I can access it prior to the stream playing and I can access it for every frame.
How much do I need to account for these values changing frame-to-frame? I found https://ffmpeg.org/doxygen/trunk/demuxing__decoding_8c_source.html#l00081 which indicates that at least width, height, and pixel format may change frame to frame.
Will the color space and sample format also change frame to frame?
Will these changes be temporary (a single frame) or lasting (a significant block of frames) and is there any way to predict for this stream which behavior will occur?
Is there a way to find the most descriptive attributes that this stream is capable of producing, so that I can scale all the lower-quality frames up, but not offer a result that is needlessly higher quality than the source, even if this is a device or a network stream where I cannot play all the frames in advance?
The fundamental question is: how do I resolve the flexibility of this API with the restriction that raw streams (my output) do not have any way of specifying a change of stream attributes mid-stream. I imagine I will need to either predict the most descriptive attributes to give the stream, or offer a new stream when the attributes change. Which choice to make depends on whether these values will change rapidly or stay relatively stable.
So, to add to what #szatmary says, the typical use case for stream parameter changes is adaptive streaming:
imagine you're watching YouTube on a laptop with various methods of internet connectivity, and suddenly bandwidth decreases. Your stream will automatically switch to a lower bandwidth. FFmpeg (which is used by Chrome) needs to support this.
alternatively, imagine a similar scenario in an RTC video chat.
The reason FFmpeg does what it does is that the API is essentially trying to accommodate the common denominator. Videos shot on a phone won't ever change resolution. Neither will most videos exported from video editing software. Even videos from youtube-dl will typically not switch resolution; this is a client-side decision, and youtube-dl simply won't do that. So what should you do? I'd just use the stream information from the first frame(s) and rescale all subsequent frames to that resolution. This will work for 99.99% of the cases. Whether you want to accommodate your service to the remaining 0.01% depends on what type of videos you think people will upload and whether resolution changes make any sense in that context.
Does colorspace change? It could (theoretically) in software that mixes screen recording with video fragments, but it's highly unlikely (in practice). Sample format changes as often as video resolution: quite often in the adaptive scenario, but whether you care depends on your service and the types of videos you expect to get.
Usually not often, or ever. However, this depends on the codec and on options chosen at encode time. I pass the decoded frames through swscale just in case.
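A minimal sketch of the "lock everything to the first frame" approach, using the PyAV bindings (PyAV and the file name are my assumptions, not something from the answers above); it converts every decoded frame to the width, height, and pixel format of the first frame, which is effectively passing the frames through swscale:

    # Sketch: decode a file and normalise every frame to the attributes of the
    # first frame, so the raw output stream never changes mid-stream.
    # Uses the PyAV bindings (an assumption); "input.mp4" is also an assumption.
    import av

    container = av.open("input.mp4")
    video = container.streams.video[0]

    reference = None
    for frame in container.decode(video):
        if reference is None:
            reference = frame  # the first frame defines width/height/pixel format
        # reformat() rescales and converts via swscale under the hood
        out = frame.reformat(
            width=reference.width,
            height=reference.height,
            format=reference.format.name,
        )
        # ... hand `out` to whatever consumes the raw stream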

ffplay seek function

I'm trying to figure out how the seek function using the left/right arrows in ffplay works.
I went inside their open source code and tried to change the values from 10, -10 to different values so I could see if the seek moves correctly, but after a few attempts I saw that the movie position after using either the left or right arrow isn't moving to exactly the value I specified.
For example, if I used the default value 10, and the movie was at 00:10:00, after pressing the right arrow, which is supposed to move the movie to 00:20:00, I got something like 00:21:35, and it was not consistent.
I tried that on a variety of movies and got different results each time.
Does anyone have any idea what I'm doing wrong, or can anyone explain how seeking works in ffplay?
Video seeking precision depends on a variety of factors, but mainly PTS, DTS, and GOP length. A GOP (Group of Pictures) starts with an I-frame (a self-contained picture). When you seek, ffplay is probably just trying to find the closest I-frame that has a PTS (presentation timestamp) greater than the requested position. What complicates things even further is that not all videos have a fixed GOP length, so seeking 10 seconds forward from different positions will not always overshoot the target by the same amount.
Check out this article on GOP
http://en.wikipedia.org/wiki/Group_of_pictures
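One way to see this for yourself is to list the I-frame (keyframe) timestamps of the file you are testing; the positions ffplay lands on after a seek should line up with them. A sketch using ffprobe (the file name is an assumption):

    # Sketch: print the timestamps of the keyframes (I-frames) in a file.
    # Seeks generally snap to these positions. "movie.mp4" is an assumption.
    import subprocess

    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-skip_frame", "nokey",              # look at keyframes only
            "-show_entries", "frame=pts_time",
            "-of", "csv=p=0",
            "movie.mp4",
        ],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)  # one keyframe timestamp (in seconds) per line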

How does MPEG4 compression work?

Can anyone explain in a simple, clear way how MPEG4 works to compress data? I'm mostly interested in video. I know there are different standards or parts to it. I'm just looking for the predominant overall compression method, if there is one with MPEG4.
MPEG-4 is a huge standard, and employs many techniques to achieve the high compression rates that it is capable of.
In general, video compression is concerned with throwing away as much information as possible whilst having a minimal effect on the viewing experience for an end user. For example, using subsampled YUV instead of RGB cuts the video size in half straight away. This is possible as the human eye is less sensitive to colour than it is to brightness. In YUV, the Y value is brightness, and the U and V values represent colour. Therefore, you can throw away some of the colour information which reduces the file size, without the viewer noticing any difference.
After that, most compression techniques take advantage of 2 redundancies in particular. The first is temporal redundancy and the second is spatial redundancy.
Temporal redundancy notes that successive frames in a video sequence are very similar. Typically a video would be in the order of 20-30 frames per second, and nothing much changes in 1/30 of a second. Take any DVD and pause it, then move it on one frame and note how similar the 2 images are. So, instead of encoding each frame independently, MPEG-4 (and other compression standards) only encodes the difference between successive frames (using motion estimation to find the difference between frames).
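As a toy illustration of why encoding only the difference pays off, imagine two consecutive frames in which a single small object has moved; almost every pixel of the difference is zero and compresses to nearly nothing (the numbers below are made up for the example):

    # Toy illustration of temporal redundancy: two synthetic consecutive frames
    # that differ only in one small block.
    import numpy as np

    prev_frame = np.zeros((1080, 1920), dtype=np.uint8)
    next_frame = prev_frame.copy()
    next_frame[100:164, 200:264] = 255        # the only thing that changed

    diff = next_frame.astype(np.int16) - prev_frame.astype(np.int16)
    changed = np.count_nonzero(diff)
    print(f"{changed} of {diff.size} pixels changed "
          f"({100.0 * changed / diff.size:.2f}%)")
    # -> roughly 0.2% of the frame; encoding the difference is far cheaper
    #    than encoding the whole frame again.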
Spatial redundancy takes advantage of the fact that in general the colour spread across images tends to be quite low frequency. By this I mean that neighbouring pixels tend to have similar colours. For example, in an image of you wearing a red jumper, all of the pixels that represent your jumper would have a very similar colour. It is possible to use the DCT to transform the pixel values into the frequency space, where some high frequency information can be thrown away. Then, when the reverse DCT is performed (during decoding), the image is reconstructed without the thrown-away high-frequency information.
To view the effects of throwing away high frequency information, open MS Paint and draw a series of overlapping horizontal and vertical black lines. Save the image as a JPEG (which also uses the DCT for compression). Now zoom in on the pattern and notice how the edges of the lines are not as sharp anymore and are kinda blurry. This is because some high frequency information (the transition from black to white) has been thrown away during compression. Read this for an explanation with nice pictures.
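The same effect can be reproduced numerically. The sketch below (using SciPy's DCT routines, which are my choice here, not something from the original answer) takes an 8x8 block with a hard black-to-white edge, zeroes the high-frequency coefficients, and inverts the transform; the sharp jump comes back as a blurred ramp:

    # Toy DCT example: discard the high-frequency coefficients of a block with
    # a sharp edge and watch the edge blur after the inverse transform.
    import numpy as np
    from scipy.fft import dctn, idctn

    block = np.zeros((8, 8))
    block[:, 4:] = 255.0                      # hard black-to-white edge

    coeffs = dctn(block, norm="ortho")        # to frequency space
    coeffs[4:, :] = 0.0                       # drop high vertical frequencies
    coeffs[:, 4:] = 0.0                       # ...and high horizontal frequencies
    rebuilt = idctn(coeffs, norm="ortho")     # back to pixel space

    print(np.round(block[0]))                 # [  0.   0.   0.   0. 255. 255. 255. 255.]
    print(np.round(rebuilt[0]))               # values now ramp up instead of jumping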
For further reading, this book is quite good, if a little heavy on the maths.
Like any other popular video codec, MPEG4 uses a variation of discrete cosine transform and a variety of motion-compensation techniques (which you can think of as motion-prediction if that helps) that reduce the amount of data needed for subsequent frames. This page has an overview of what is done by plain MPEG4.
It's not totally dissimilar to the techniques used by JPEG.
MPEG4 uses a variety of techniques to compress video.
If you haven't already looked at Wikipedia, this would be a good starting point.
There is also this article from the IEEE which explains these techniques in more detail.
Sharp edges certainly DO contain high frequencies. Reducing or eliminating high frequencies reduces the sharpness of edges. Fine detail, including sharp edges, is removed along with the high frequencies: the ability to resolve two small objects is lost when the high frequencies go, and then you see just one.
