I need to know what would be the most favorable approach for streaming screen content and controlling a remote computer (mouse, keyboard). (I would like to build something like a "one-click TeamViewer".)
So my main question is about picking a video compression method given these requirements:
Most information between subsequent frames stays the same
Color depth can be degraded, but the details (text) must remain sharp
It should work on low-end bandwidths: 512k and below
Frames can be dropped
An HTMLVideoElement can be resampled in order to get different frames into a texture over time.
For example, as shown at https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/Tutorial/Animating_textures_in_WebGL
However, when loading an animated GIF into an HTMLImageElement, re-sampling does not show an updated texture. This is true even if the image is mounted in the DOM and the different frames show on that copy.
Is there a standard way to display an animated GIF in WebGL, or must it somehow be rewritten into a sprite sheet (or series of textures) at runtime?
GIFs aren't automatically animated with WebGL (or regular canvas for that matter) and there is no standard way of doing this.
Unlike video elements, GIF images will only draw their first frame via drawImage(), whereas drawing a video element draws its current frame. This is in part because we don't have API access to any of the image's individual frames (this also applies to animated PNG files, aka APNG); animating an image is handled purely as an internal process, at the discretion of the browser, and only when the image is in the DOM.
With video elements, though, we do have access to "frames", that is, to time via currentTime, so the API sort of implies that we deal with what we see, or what exists, at the current time.
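For reference, a minimal sketch of the video case, essentially what the MDN animated-textures tutorial linked above does (the function names here are just for illustration): since the browser advances the video's current frame for you, re-uploading the element on each animation tick is enough.

```typescript
// Minimal sketch: re-upload the <video> element's current frame every tick.
function uploadVideoFrame(gl: WebGLRenderingContext, video: HTMLVideoElement, texture: WebGLTexture): void {
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
}

function tick(gl: WebGLRenderingContext, video: HTMLVideoElement, texture: WebGLTexture): void {
  if (video.readyState >= HTMLMediaElement.HAVE_CURRENT_DATA) {
    uploadVideoFrame(gl, video, texture); // whatever currentTime points at gets uploaded
  }
  // ...draw the textured geometry here...
  requestAnimationFrame(() => tick(gl, video, texture));
}
```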
You have to animate a GIF image manually, though. This means you have to extract each frame as a separate image/buffer first, then show them at the rate you choose. The browser won't help you a bit here, but you can do this by parsing the file format yourself.
Of course, this can be a bit tedious, but luckily there are people out there who have done all the hard lifting. For example gifuct (I have not tested it myself, but there are others out there as well) will allow you to extract each frame from a GIF.
Then render each frame you get from that into a frame buffer and upload it to the GPU at the frame rate you choose.
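To make that concrete, here is a rough, untested sketch of that approach; it assumes gifuct-js's parseGIF/decompressFrames API and the per-frame patch/dims/delay fields it documents, and it skips proper GIF disposal handling.

```typescript
// Rough sketch, assuming gifuct-js exposes parseGIF/decompressFrames and that
// each decoded frame has { patch, dims, delay }. Disposal handling is omitted.
import { parseGIF, decompressFrames } from "gifuct-js";

async function playGifOnTexture(gl: WebGLRenderingContext, url: string, texture: WebGLTexture): Promise<void> {
  const buf = await (await fetch(url)).arrayBuffer();
  const frames = decompressFrames(parseGIF(buf), true); // true => build RGBA patches

  // Composite each frame's patch onto a scratch canvas, then upload the canvas.
  const canvas = document.createElement("canvas");
  canvas.width = frames[0].dims.width;
  canvas.height = frames[0].dims.height;
  const ctx = canvas.getContext("2d")!;

  let i = 0;
  const showNext = () => {
    const frame = frames[i];
    ctx.putImageData(
      new ImageData(frame.patch, frame.dims.width, frame.dims.height),
      frame.dims.left,
      frame.dims.top
    );
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, canvas);
    i = (i + 1) % frames.length;
    setTimeout(showNext, frame.delay); // delay should be in milliseconds; verify for your version
  };
  showNext();
}
```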
Or:
pre-process the GIF into a spritesheet as you mention
or load it as an image sequence instead
or convert the GIF to a video (this may even reduce the total size)
And, as a shameless plug, should you consider APNG instead: I have made apng-parser, which does the same for APNG files.
My recommendation, though, is to convert the GIF/APNG to a video file: you get the animation capability for free, potentially smaller files, buffering and streaming for long animations, less code to include, and typically a single file to deal with (you may have to provide different video formats for older browsers). Free software such as FFmpeg can help you with the conversion.
I'm trying to extract raw streams from devices and files using ffmpeg. I notice the crucial frame information (Video: width, height, pixel format, color space, Audio: sample format) is stored both in the AVCodecContext and in the AVFrame. This means I can access it prior to the stream playing and I can access it for every frame.
How much do I need to account for these values changing frame-to-frame? I found https://ffmpeg.org/doxygen/trunk/demuxing__decoding_8c_source.html#l00081 which indicates that at least width, height, and pixel format may change frame to frame.
Will the color space and sample format also change frame to frame?
Will these changes be temporary (a single frame) or lasting (a significant block of frames) and is there any way to predict for this stream which behavior will occur?
Is there a way to find the most descriptive attributes that this stream is capable of producing, such that I can scale all the lower-quality frames up, but not offer a result that is needlessly higher quality than the source, even if this is a device or a network stream where I cannot play all the frames in advance?
The fundamental question is: how do I reconcile the flexibility of this API with the restriction that raw streams (my output) have no way of signalling a change of stream attributes mid-stream? I imagine I will need to either predict the most descriptive attributes to give the stream, or start a new stream when the attributes change. Which choice to make depends on whether these values change rapidly or stay relatively stable.
So, to add to what #szatmary says, the typical use case for stream parameter changes is adaptive streaming:
imagine you're watching youtube on a laptop with various methods of internet connectivity, and suddenly bandwidth decreases. Your stream will automatically switch to a lower bandwidth. FFmpeg (which is used by Chrome) needs to support this.
alternatively, imagine a similar scenario in a rtc video chat.
The reason FFmpeg does what it does is that the API is essentially trying to accommodate the common denominator. Videos shot on a phone won't ever change resolution. Neither will most videos exported from video-editing software. Even videos from youtube-dl will typically not switch resolution; that is a client-side decision, and youtube-dl simply won't do it. So what should you do? I'd just use the stream information from the first frame(s) and rescale all subsequent frames to that resolution. This will work for 99.99% of cases. Whether you want to accommodate the remaining 0.01% depends on what type of videos you think people will upload and whether resolution changes make any sense in that context.
Does the colorspace change? It could (theoretically) in software that mixes screen recordings with video fragments, but it's highly unlikely in practice. The sample format changes about as often as the video resolution: quite often in the adaptive scenario, but whether you care depends on your service and the types of videos you expect to get.
Usually not often, or ever. However, this depends on the codec and on options chosen at encode time. I pass the decoded frames through swscale just in case.
I would like to copy pixels from a 1080p video from one location to another efficiently/with as little CPU impact as possible.
So far my implementation is fairly simple:
using BitmapData's draw() method to grab the pixels from the video
using BitmapData's copyPixels() to shuffle pixels about
Ideally this would have as little CPU impact as possible but I am running out of options and could really use some tips from experienced actionscript 3 developers.
I've profiled my code with Scout and noticed the CPU usage is mostly around 70% but goes above 100% quite a bit. I've looked into StageVideo but one of the main limitations is this:
The video data cannot be copied into a BitmapData object (BitmapData.draw).
Is there a more direct way to access video pixels, rather than rasterizing a DisplayObject?
Can I access each video frame as a ByteArray directly and plug it into a BitmapData object?
(I found appendBytes but it seems to do the reverse of what I need in my setup).
What is the most CPU-friendly way to manipulate pixels from an H.264 1080p video in ActionScript 3?
Also, is there a faster way to move pixels around than copyPixels() in Flash Player? I also see that Scout points out the video is not hardware accelerated (.rend.video.hwrender: false). Shouldn't H.264 video be hardware accelerated (even without StageVideo) according to this article, or is that only for fullscreen mode?
The latest AIR beta introduced video-as-texture support, which you could possibly use to manipulate the video on the GPU (and do it far faster than with BitmapData). Keep in mind, though, that it is currently available only for AIR on Windows and there are some other limitations.
I'm trying to split a video by detecting the presence of a marker (an image) in the frames. I've gone over the documentation and I see removelogo but not detectlogo.
Does anyone know how this could be achieved? I know what the logo is and the region it will be on.
I'm thinking I can extract all frames to PNGs and then analyse them one by one (or n by n), but it might be a lengthy process...
Any pointers?
ffmpeg doesn't have any such ability natively. The delogo filter simply takes a rectangular region from its parameters and interpolates that region based on its surroundings; it doesn't care what the region previously contained and will fill it in regardless.
If you need to detect the presence of a logo, that's a totally different task. You'll need to create it yourself; if you're serious about this, I'd recommend that you start familiarizing yourself with the ffmpeg filter API and get ready to get your hands dirty. If the logo has a distinctive color, that might be a good way to detect it.
Since what you're after is probably just outputting information on which frames contain (or don't contain) the logo, one filter to look at as a model would be the blackframe filter (which searches for all-black frames).
You can write a detect-logo module: decode the video (to YUV 420p format), feed each raw frame to the module, and do a SAD (sum of absolute differences) on the region where you expect the logo. If the SAD is negligible, it's a match, so record the frame number. You can then split the video at these frames.
The SAD is done only on the Y (luma) plane. To save processing, you can scale the video down to a lower resolution first.
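As an illustration (not the poster's code), here is a sketch of that SAD check; it assumes you already have the decoded Y plane of a frame plus a reference luma patch of the logo taken from a frame known to contain it.

```typescript
// Sketch only: compare the expected logo region of a decoded frame's Y plane
// against a reference luma patch of the logo. All names and the threshold idea
// below are illustrative assumptions, not part of any library API.
function lumaSAD(
  yPlane: Uint8Array, stride: number,           // decoded frame: luma plane + row stride
  logo: Uint8Array, logoW: number, logoH: number,
  left: number, top: number                     // expected logo position in the frame
): number {
  let sad = 0;
  for (let y = 0; y < logoH; y++) {
    const row = (top + y) * stride + left;
    for (let x = 0; x < logoW; x++) {
      sad += Math.abs(yPlane[row + x] - logo[y * logoW + x]);
    }
  }
  return sad / (logoW * logoH); // normalise by area so the threshold is per pixel
}
```

If the per-pixel SAD stays below some empirically chosen threshold for a run of frames, treat those frames as containing the logo and record their indices as candidate split points.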
I have successfully detected a logo using an RPi and a Coral AI accelerator in conjunction with ffmpeg to extract the JPEGs. Crop the image to just the logo, then feed it to your trained model. Even then you will need to sample a minute or so of video to determine the actual logo's identity.
I use Windows' RDP-based Remote Desktop Connection utility to connect to my desktop from my laptop. It's much faster and looks better than remote-control applications like TeamViewer etc.
Out of curiosity, why is RDP better?
Thank you.
There are two major factors at work which determine the performance of a remote control product:
How does it detect when changes occur on the screen?
Some RC products divide the screen into tiles and scan the screen frame buffer periodically to determine if any changes have occurred.
Others will hook directly into the OS. In the past this was done by intercepting the video driver. Now you can create a mirror driver into which the OS "mirrors" all drawing operations. This is, obviously, much faster.
How does it send those changes across the wire?
Some products (like VNC) will always send bitmaps of any area that changed.
Others will send the actual operation that caused the change. e.g. render text string s using font f at coordinates (x,y) or draw bezier curve using a given set of parameters and, of course, render bitmap. This is, again, much faster.
RDP uses the faster (and more difficult to implement) technique in both cases. I believe the actual protocol it uses is T.128.
Bitmaps are usually compressed. Some products (like Carbon Copy) also maintain synchronized bitmap caches on both sides of the connection in order to squeeze out even more performance.
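To make the first point concrete, here is a naive sketch of the tile-scanning approach (the slower of the two detection methods); real products hash tiles or hook the OS rather than keeping a full copy of the previous frame, so treat the buffers and tile size below as illustrative simplifications.

```typescript
// Naive sketch of tile-based change detection over an RGBA screen capture.
const TILE = 64; // tile edge in pixels (illustrative)

function changedTiles(
  prev: Uint8Array, curr: Uint8Array,   // RGBA buffers, width * height * 4 bytes
  width: number, height: number
): Array<{ x: number; y: number }> {
  const dirty: Array<{ x: number; y: number }> = [];
  for (let ty = 0; ty < height; ty += TILE) {
    for (let tx = 0; tx < width; tx += TILE) {
      const h = Math.min(TILE, height - ty);
      const w = Math.min(TILE, width - tx);
      let differs = false;
      for (let y = 0; y < h && !differs; y++) {
        const off = ((ty + y) * width + tx) * 4;
        for (let b = 0; b < w * 4; b++) {
          if (prev[off + b] !== curr[off + b]) { differs = true; break; }
        }
      }
      if (differs) dirty.push({ x: tx, y: ty });
    }
  }
  return dirty; // only these tiles need to be compressed and sent over the wire
}
```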
RDP is a specific protocol that transmits low-level screen-drawing operations. It is also aware of pixmap entities on the screen: for example, it understands when an icon is drawn and caches it (typically in a lossily compressed format) on the client side.
Other software does not have this low-level access: it waits for the screen to change and then re-transmits a capture of the screen or of the changed regions. Whenever the screen changes, a pixmap representation has to be transmitted, and because that is generally lossily compressed, it also looks worse.