Change the scale of overlay video based on audio level - ffmpeg

I'm looking to adjust the scale of my image overlay based on the input audio's loudness.
As the volume rises, I want the overlay to get larger; as it gets quieter, I want it to get smaller.
I can't figure out how to access any relevant audio information in video filters.
I'm open to multi-step solutions, but the result needs to stay in sync.
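For what it's worth, getting per-frame loudness out of ffmpeg seems straightforward; a rough sketch (filenames are placeholders) that logs the RMS level of every audio frame along with its timestamp:
ffmpeg -i input.mp4 -af "astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=levels.log" -f null -
What I can't work out is how to drive a video filter such as scale or overlay from values like these while keeping everything in sync.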

Related

Resampling HTMLImageElement for animation

An HTMLVideoElement can be resampled in order to get different frames into a texture over time.
For example, as shown at https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/Tutorial/Animating_textures_in_WebGL
However, when loading an animated GIF into an HTMLImageElement, resampling does not show the updated texture. This is true even if the image is mounted in the DOM and the different frames show on that copy.
Is there a standard way to display an animated gif in webgl, or must it be somehow rewritten into a spritesheet (or series of textures) at runtime?
GIFs aren't automatically animated with WebGL (or a regular canvas, for that matter), and there is no standard way of doing this.
Unlike video elements, GIF images will only draw their first frame via drawImage(), while drawing a video element draws its current frame. This is partly because there is no API access to an image's individual frames (this also applies to animated PNG files, aka APNG); animating images is an internal process conducted at the discretion of the browser, and only when the image is in the DOM.
With video elements, though, we do have access to "frames", that is, to time via currentTime, which sort of implies that we deal with what we see, or what exists, at the current time.
You have to animate a GIF image manually. This means you have to extract each frame as a separate image/buffer first, then show the frames at the rate you choose. The browser won't help you a bit here, but you can do this by parsing the file format yourself.
Of course, this can be a bit tedious, but luckily there are people out there who have done all the hard lifting. For example, gifuct (I have not tested it myself, and there are others out there as well) will let you extract each frame from a GIF.
Then render each frame you get from it into the frame buffer and upload it to the GPU at the frame rate you choose.
Or:
- pre-process the GIF into a spritesheet, as you mention,
- load it as an image sequence instead, or
- convert the GIF to a video (this may even reduce the total size).
And, as a shameless plug, should you consider APNG instead: I have made apng-parser, which does the same for APNG files.
My recommendation, though, is to convert the GIF/APNG to a video file: you get the animation capability for free, potentially smaller files, the ability to buffer and stream long animations, less code to include, and typically a single file to deal with (you may have to provide different video formats for older browsers). Free software such as FFmpeg can help you with the conversion.
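As a rough sketch of such a conversion (filenames are placeholders; the scale expression just forces the even dimensions that H.264 with yuv420p requires):
ffmpeg -i animation.gif -movflags faststart -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" animation.mp4
From there the result plays in a video element, whose current frame can be uploaded to a texture via currentTime as described above.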

FFMPEG API -- How much do stream parameters change frame-to-frame?

I'm trying to extract raw streams from devices and files using ffmpeg. I notice the crucial frame information (Video: width, height, pixel format, color space, Audio: sample format) is stored both in the AVCodecContext and in the AVFrame. This means I can access it prior to the stream playing and I can access it for every frame.
How much do I need to account for these values changing frame-to-frame? I found https://ffmpeg.org/doxygen/trunk/demuxing__decoding_8c_source.html#l00081 which indicates that at least width, height, and pixel format may change frame to frame.
Will the color space and sample format also change frame to frame?
Will these changes be temporary (a single frame) or lasting (a significant block of frames) and is there any way to predict for this stream which behavior will occur?
Is there a way to find the most descriptive attributes that this stream is capable of producing, such that I can scale all the lower-quality frames up, but not offer a result that is needlessly higher quality than the source, even if this is a device or a network stream where I cannot play all the frames in advance?
The fundamental question is: how do I resolve the flexibility of this API with the restriction that raw streams (my output) do not have any way of specifying a change of stream attributes mid-stream. I imagine I will need to either predict the most descriptive attributes to give the stream, or offer a new stream when the attributes change. Which choice to make depends on whether these values will change rapidly or stay relatively stable.
So, to add to what #szatmary says, the typical use case for stream parameter changes is adaptive streaming:
- imagine you're watching YouTube on a laptop with various methods of internet connectivity, and suddenly bandwidth decreases. Your stream will automatically switch to a lower bandwidth. FFmpeg (which is used by Chrome) needs to support this.
- alternatively, imagine a similar scenario in an RTC video chat.
The reason FFmpeg does what it does is that the API is essentially trying to accommodate the common denominator. Videos shot on a phone won't ever change resolution. Neither will most videos exported from video editing software. Even videos from youtube-dl will typically not switch resolution: that is a client-side decision, and youtube-dl simply won't do it. So what should you do? I'd just use the stream information from the first frame(s) and rescale all subsequent frames to that resolution. This will work for 99.99% of cases. Whether you want to accommodate your service to the remaining 0.01% depends on what type of videos you think people will upload and whether resolution changes make any sense in that context.
Does colorspace change? It could (theoretically) in software that mixes screen recording with video fragments, but it's highly unlikely in practice. Sample format changes about as often as video resolution: quite often in the adaptive scenario, but whether you care depends on your service and the types of videos you expect to get.
Usually not often, or ever. However, this depends on the codec and on options chosen at encode time. I pass the decoded frames through swscale just in case.

Why don't we use the original image instead of the decoded image for P-frames?

I'm trying to understand P-frames in MPEG.
I have a question about the reference image.
Why don't we use the original image instead of the decoded image to make a P-frame?
I-frames, B-frames and P-frames allow the video to be compressed.
Indeed, in a video you have a lot of redundant information.
Think about a car moving across the screen: the pixels in the background do not change from one picture to the next; only those around the car are "moving". With the I-B-P frame trick, you encode the background once and then just signal the slight changes (the car moving) through motion vectors.
This way you have to carry less information than if you had to repeat the entire picture each time. As for why the reference is the decoded picture rather than the original: the decoder only ever has the decoded picture, so the encoder has to predict from that same decoded picture, or the encoder's and decoder's references would drift apart.
See also:
Video compression
https://stackoverflow.com/a/24084121/3194340

Detect frames that have a given image/logo with FFmpeg

I'm trying to split a video by detecting the presence of a marker (an image) in the frames. I've gone over the documentation and I see removelogo but not detectlogo.
Does anyone know how this could be achieved? I know what the logo is and the region it will be on.
I'm thinking I can extract all frames to png's and then analyse them one by one (or n by n) but it might be a lengthy process...
Any pointers?
ffmpeg doesn't have any such ability natively. The delogo filter simply takes a rectangular region from its parameters and interpolates that region based on its surroundings; it doesn't care what the region previously contained, and fills it in regardless.
If you need to detect the presence of a logo, that's a totally different task. You'll need to create it yourself; if you're serious about this, I'd recommend that you start familiarizing yourself with the ffmpeg filter API and get ready to get your hands dirty. If the logo has a distinctive color, that might be a good way to detect it.
Since what you're after is probably going to just be outputting information on which frames contain (or don't contain) the logo, one filter to look at as a model will be the blackframe filter (which searches for all-black frames).
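To get a feel for the kind of per-frame report such a filter produces, blackframe can be run straight from the command line; a quick sketch (the thresholds are just example values):
ffmpeg -i input.mp4 -vf "blackframe=amount=98:threshold=32" -f null -
It prints a log line for every frame that is at least 98% black, including the frame number and timestamp; a logo-detection filter would report its matches in much the same way.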
You can write a detect-logo module: decode the video (YUV 4:2:0 format), feed each raw frame to the module, and compute a SAD (Sum of Absolute Differences) over the region where you expect the logo; if the SAD is negligible, it's a match, so record the frame number. You can then split the video at these frames.
The SAD is computed only on the Y (luma) plane. To save processing, you can scale the video to a lower resolution first.
I have successfully detected a logo using a Raspberry Pi and a Coral AI accelerator, in conjunction with ffmpeg to extract the JPEGs. Crop the image to just the logo, then feed it to your trained model. Even then you will need to sample a minute or so of video to determine the actual logo's identity.
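If it helps, the JPEG extraction and the crop can be combined in a single ffmpeg pass; a rough sketch (the sample rate and the crop geometry w:h:x:y are placeholders to adjust to your logo region):
ffmpeg -i input.mp4 -vf "fps=1,crop=120:60:16:16" -q:v 2 frame_%05d.jpg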

Still images to video for storage - But back to still images for viewing

Using ffmpeg I can take a number of still images and turn them into a video. I would like to do this to decrease the total size of all my timelapse photos. But I would also like to extract the still images for use at a later date.
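The creation step I have in mind is something along these lines (filenames, frame rate and quality settings are placeholders):
ffmpeg -framerate 24 -i img%06d.jpg -c:v libx264 -crf 18 -pix_fmt yuv420p timelapse.mp4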
In order to use this method:
- I will need to correlate the original still image against a frame number in the video.
- And I will need to extract a thumbnail of a given frame number in the video.
But before I go down this rabbit hole, I want to know if the requirements are possible using ffmpeg, and if so any hints on how to accomplish the task.
note: The still images are a timelapse from a single camera over a day, so temporal compression should yield significant savings compared to a stack of JPEGs.
When you use ffmpeg to create a video from a sequence of images, the images aren't affected in any way. You should still be able to use them for what you're trying to do, unless I'm misunderstanding your question.
Edit: You can use ffmpeg to create images from an existing video. I'm not sure how well it will work for your purposes, but the images are pretty high quality, if not the same as the originals. You'd have to play around with it to make sure the extracted images are exactly the same as the input images as far as sequential order and naming, but if you take fps into account, it should work.
The command to do this (from the ffmpeg documentation) is as follows:
ffmpeg -i movie.mpg movie%d.jpg
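If you need to pull out one specific frame by number rather than by time, a variation like this should work (a sketch; frame indexing is 0-based and 1234 is just an example):
ffmpeg -i movie.mpg -vf "select=eq(n\,1234)" -frames:v 1 still_1234.png
Combined with a known, constant frame rate, that gives you the frame-number-to-original-image correlation you're after.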
