What image processing library should I use?

I have been reading https://stackoverflow.com/questions/158756/what-is-the-best-image-manipulation-library and have tried a few libraries, and I'm now looking for input on what best fits our needs. I'll start by describing our current setup and problems.
We have a system that needs to resize and crop a large number of images from big originals. We handle 50,000+ images every day on 2 powerful servers. Today we use ImageGlue from WebSupergoo, but we don't like it at all: it is slow and hangs the service now and then (that's in another unanswered Stack Overflow question). We have a threaded Windows service that uses the .NET ThreadPool to resize as much as possible in parallel on the 8-core machines.
I have tried AForge and it went very well: it was much faster and never crashed. But I had quality problems on a few images. That is of course down to which algorithms I used, so it can be tweaked, but I want to widen the search and see whether that's the right way to go.
So:
It needs to be C#/.NET and run in a Windows service (since we won't change the rest of the service, only the image handling).
It needs to handle a threaded environment well.
It needs to be fast, since today's solution is too slow, but we also want good quality and small file sizes, since the images are later displayed on a web page with lots of visitors.
So we have high demands on getting good quality at a fast pace, and secondarily on keeping file sizes down, even if that can be adjusted somewhat with compression.
Any comments or suggestions on what library to use?
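To make the requirements concrete, here is a rough sketch of the per-image step (written against plain System.Drawing rather than any particular library; the class name, paths and sizes are just placeholders) showing the two knobs we care about, interpolation quality and JPEG quality:

    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.Linq;

    static class ResizeSketch
    {
        // Crop srcRect out of the source image, scale it to destWidth x destHeight
        // and save it as JPEG. InterpolationMode trades speed for quality;
        // the JPEG quality parameter trades quality for file size.
        public static void ResizeAndSave(string sourcePath, string destPath,
                                         Rectangle srcRect, int destWidth, int destHeight,
                                         long jpegQuality = 85L)
        {
            using (var source = new Bitmap(sourcePath))
            using (var dest = new Bitmap(destWidth, destHeight))
            using (var g = Graphics.FromImage(dest))
            {
                g.InterpolationMode = InterpolationMode.HighQualityBicubic; // quality knob
                g.PixelOffsetMode = PixelOffsetMode.HighQuality;
                g.DrawImage(source, new Rectangle(0, 0, destWidth, destHeight),
                            srcRect, GraphicsUnit.Pixel);

                var jpegCodec = ImageCodecInfo.GetImageEncoders()
                                              .First(c => c.MimeType == "image/jpeg");
                var encoderParams = new EncoderParameters(1);
                encoderParams.Param[0] = new EncoderParameter(Encoder.Quality, jpegQuality); // file-size knob
                dest.Save(destPath, jpegCodec, encoderParams);
            }
        }
    }

Everything here is method-local, so each call can run as its own ThreadPool work item, which is how our current service schedules the work.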

I understand you said that you want to stay with C#, but I'm offering an alternative.
Depending on the amount of work you are doing, the fastest way to manipulate images is to do it entirely on a GPU (that would offload most of the pixel work). You can interoperate with CUDA from Managed C++, which you can call from your service, or use DirectX surfaces and render targets (you get antialiasing and all the high-quality stuff out of the box).
However, before doing anything, make sure your workload is dominated by the trilinear/bilinear resizing and not by the encoding/decoding of the images. By the way, you will need at least one fast NVIDIA video card in each server to do the offloading (a cheap GTX 460 would be more than enough).
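As a rough way to check that before committing to GPU work, time the three stages separately on a representative image. A minimal sketch (C# with System.Drawing; the path and target size are placeholders, and GDI+ may defer part of the decode, so treat the numbers as approximate):

    using System;
    using System.Diagnostics;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    static class WorkloadCheck
    {
        // Breaks one image's processing into decode (load), resize and encode (save)
        // so you can see which stage dominates. If decode/encode dominate,
        // offloading only the resize to the GPU won't buy much.
        public static void TimeStages(string sourcePath)
        {
            var sw = Stopwatch.StartNew();
            using (var source = new Bitmap(sourcePath))
            {
                long decodeMs = sw.ElapsedMilliseconds;

                sw.Restart();
                using (var resized = new Bitmap(800, 600))
                using (var g = Graphics.FromImage(resized))
                {
                    g.InterpolationMode = InterpolationMode.HighQualityBilinear;
                    g.DrawImage(source, 0, 0, 800, 600);
                    long resizeMs = sw.ElapsedMilliseconds;

                    sw.Restart();
                    using (var ms = new MemoryStream())
                    {
                        resized.Save(ms, ImageFormat.Jpeg);
                    }
                    long encodeMs = sw.ElapsedMilliseconds;

                    Console.WriteLine($"decode {decodeMs} ms, resize {resizeMs} ms, encode {encodeMs} ms");
                }
            }
        }
    }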

Related

What are the performance costs imposed on a browser to download images?

Images don't block the initial render. I believe this is mostly true: requesting/downloading an image from the network does not happen on the main thread. I'm assuming decoding/rasterizing an image also happens off the main thread in some browsers (but I might be wrong).
I've often heard people simply say "just let the images download in the background". However, taking that information to its logical conclusion, images should have zero impact on the performance of your web app as measured by Time to Interactive or Time to First Meaningful Paint. In my experience they do have an impact: performance improves by 2-4 seconds when lazy loading images (using IntersectionObserver) on an image-heavy page versus "just letting them download in the background".
What specific browser internals/steps involved in decoding and painting an image cause performance regressions when loading a web page? Which steps take resources away from the main thread?
That's a bit broad; there are many things that can affect the rest of the page, all depending on a lot of different factors.
Network requests are not handled on the main thread, so there should be little overhead from the download itself.
Parsing the metadata is implementation-dependent; it could happen in the same process or in a dedicated one, but it's generally not a heavy operation.
Decoding the image data and rasterizing it are implementation-dependent too: some browsers will do it as soon as they get the data, while others will wait until it's actually needed, though there are ways to force it to happen synchronously (on the same thread).
Painting the image may be hardware-accelerated (done on the GPU) or done in software (on the CPU); in the latter case it can affect overall performance, but modern renderers will generally skip images that are not in the current viewport.
And finally, letting your <img> elements be resized by their content will cause a complete reflow of your page. That's usually the most noticeable performance hit when loading images in a web page, so be sure to set the sizes of your images through CSS to prevent that reflow.

Lightweight 2D library for embedded Linux

I'm developing an application on a relatively restricted embedded Linux platform: it has 256 MB of flash, though RAM is not a problem. The application uses an SPI TFT screen, exposed through a framebuffer driver. The only thing required from the UI is text presentation with various fonts and sizes, including text animations (fade, slide, etc.). On the prototype, which ran on a Raspberry Pi 3, I used libcairo and it went well. Now, given the tight space constraints of the real platform, it doesn't seem feasible to use libcairo anymore, since from what I've seen it requires more than 100 MB of space with all its dependencies. Note, however, that I come from the bare-metal world and have never dealt with complex UIs, so I might be completely wrong about libcairo and its size. So please suggest which 2D library I could pick for my case (C++ is preferred, but C is also fine), and if there is a way to use libcairo with a footprint of a few megabytes, please point me in the right direction.
Regards

How does WinRT handle BitmapImage and Image memory

I am new to programming Windows Store Apps with C# and I am trying to understand how image memory is handled. My app is very simple:
It references a bitmap file using a Windows.UI.Xaml.Media.Imaging.BitmapImage object and then uses that as the Source for a Windows.UI.Xaml.Controls.Image object. In my case the image on disk has larger dimensions than what is displayed on screen, so it is scaled down by the system.
My question is: how does WinRT handle the memory for the image? I used the vmmap tool and I see that the Mapped File section has an entry for my image file. I guess this means that the raw bytes for this file are fully loaded into memory. Since this is a JPG, those bytes must still be decoded into pixels. From my tests, setting the UriSource of the BitmapImage doesn't actually cause any processing to take place, since it takes 0 ms; instead there seems to be some lazy loading going on.
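For reference, the setup is essentially the following (a trimmed-down sketch; the container panel and asset path are placeholders, and the DecodePixelWidth line is only an aside showing one documented way to influence how much pixel data gets decoded, it's not part of my original code):

    using System;
    using Windows.UI.Xaml.Controls;
    using Windows.UI.Xaml.Media.Imaging;

    static class ImageSetupSketch
    {
        // Load a large JPG from the app package and show it in an Image element
        // that is smaller than the source bitmap, so the system scales it down.
        public static void ShowImage(Panel container)
        {
            var bitmap = new BitmapImage();
            bitmap.UriSource = new Uri("ms-appx:///Assets/large-photo.jpg"); // placeholder path

            // Aside (not in my original code): DecodePixelWidth asks the decoder
            // for a buffer close to the display size instead of the full-size one.
            bitmap.DecodePixelWidth = 400;

            var image = new Image { Width = 400, Source = bitmap };
            container.Children.Add(image);
        }
    }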
So the questions are: which object is the dominator of the uncompressed, unscaled pixel data? Which object is the dominator of the scaled pixel data that gets drawn on screen? Are there tools that can easily show me this? In the Java world I use the Eclipse Memory Analyzer; I tried PerfView, but the results make no sense to me, and it seems that tool is meant for analyzing performance rather than memory.
UPDATE:
At the BUILD conference the team discussed the Windows Performance Toolkit. I never heard anyone mention PerfView, so I believe WPT is the latest and greatest tool for analyzing memory and performance. Here is a link:
http://msdn.microsoft.com/en-us/performance/cc825801.aspx
The short answer is most likely "optimally". I'm not trying to be a smartass; there are just a lot of different systems out there. Hardware acceleration has already been mentioned; you can also consider the number of cores, display memory, disk speed, monitor bit depth and resolution, and the list goes on.

What is the overhead of constantly uploading new Textures to the GPU in OpenGL?

What is the overhead of continually uploading textures to the GPU (and replacing old ones)? I'm working on a new cross-platform 3D windowing system that uses OpenGL and am planning on uploading a single bitmap for each window (containing its UI elements). That bitmap would be updated in sync with the GPU (using VSync). I was wondering if this is a good idea, or if constantly writing bitmaps would incur too much of a performance overhead. Thanks!
Well, something like NVIDIA's GeForce 460M has 60 GB/s of bandwidth to its local memory.
PCI Express 2.0 x16 can manage 8 GB/s.
As such, if you try to transfer too many textures over the PCIe bus, you can expect to run into bandwidth problems. 8 GB/s gives you about 136 MB per frame at 60 Hz, and an uncompressed 24-bit 1920x1080 image is roughly 6 MB, so suffice it to say you could upload a fair few full frames' worth of texture data per displayed frame on an x16 graphics card.
Sure, it's not quite as simple as that: there is PCIe overhead of around 20%, and all draw commands must be uploaded over that link too.
In general, though, you should be fine provided you don't overdo it. Bear in mind that it is sensible to upload a texture during one frame that you don't expect to use until the next frame (or even later). That way you don't create a bottleneck where rendering stalls waiting for a PCIe upload to complete.
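To illustrate that last point, here is a sketch of a double-buffered per-window texture (C#, assuming the OpenTK OpenGL bindings; the class and method names, the BGRA byte[] source and the code that actually draws the quad are placeholders):

    using System;
    using OpenTK.Graphics.OpenGL;

    // Two textures per window: each frame you upload new UI pixels into the "back"
    // texture while the GPU samples from the "front" one uploaded last frame,
    // so rendering never waits on the PCIe upload it just issued.
    class WindowSurface
    {
        private readonly int[] textures = new int[2];
        private readonly int width, height;
        private int front; // index of the texture that is safe to draw this frame

        public WindowSurface(int width, int height)
        {
            this.width = width;
            this.height = height;
            for (int i = 0; i < 2; i++)
            {
                textures[i] = GL.GenTexture();
                GL.BindTexture(TextureTarget.Texture2D, textures[i]);
                GL.TexParameter(TextureTarget.Texture2D,
                                TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
                GL.TexParameter(TextureTarget.Texture2D,
                                TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
                // Allocate storage once; per-frame updates reuse it via TexSubImage2D.
                GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba,
                              width, height, 0, PixelFormat.Bgra, PixelType.UnsignedByte, IntPtr.Zero);
            }
        }

        // Call once per VSync'd frame with the window's current BGRA pixels.
        // Returns the texture id to draw *this* frame (the one uploaded last frame).
        public int UploadAndGetTextureToDraw(byte[] bgraPixels)
        {
            int back = 1 - front;
            GL.BindTexture(TextureTarget.Texture2D, textures[back]);
            GL.TexSubImage2D(TextureTarget.Texture2D, 0, 0, 0, width, height,
                             PixelFormat.Bgra, PixelType.UnsignedByte, bgraPixels);

            int toDraw = textures[front]; // already resident on the GPU
            front = back;                 // what we just uploaded gets drawn next frame
            return toDraw;
        }
    }

The price is one frame of latency on the window contents, which is usually acceptable for UI.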
Ultimately, your answer is going to come from profiling. However, one early optimization you can make is to avoid updating a texture when nothing has changed; depending on the size of the textures and the pixel format, re-uploading everything every frame could easily become prohibitively expensive.
Profile with a simpler setup that simulates the kind of usage you expect. I suspect the performance overhead (without the optimization I mentioned, at least) will become unusable once you have more than a handful of windows, depending on their size.

Accelerated downloads with HTTP byte range headers

Has anybody got any experience of using HTTP byte ranges across multiple parallel requests to speed up downloads?
I have an app that needs to download fairly large images from a web service (1 MB+) and then send out the modified files (resized and cropped) to the browser. There are many of these images, so it is likely that caching will be ineffective, i.e. the cache may well be empty. In that case we are hit by fairly large latencies while waiting for the image to download, 500 ms+, which is over 60% of our app's total response time.
I am wondering if I could speed up the download of these images by using a group of parallel HTTP Range requests, e.g. each thread downloads 100 KB of data and the responses are concatenated back into a full file.
Does anybody out there have experience with this sort of thing? Would the overhead of the extra requests negate the speed increase, or might this technique actually work? The app is written in Ruby, but experiences/examples from any language would help.
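For concreteness, this is the sort of thing I have in mind, sketched here in C# purely for illustration (the chunk size, URL and names are placeholders, and the server has to honour Range headers for it to work at all):

    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;

    static class RangeDownloadSketch
    {
        private static readonly HttpClient Client = new HttpClient();

        // Download one file by issuing parallel Range requests and concatenating
        // the pieces back together in order.
        public static async Task<byte[]> DownloadInChunksAsync(string url, int chunkSize = 100 * 1024)
        {
            // Find the total size with a HEAD request.
            var head = await Client.SendAsync(new HttpRequestMessage(HttpMethod.Head, url));
            head.EnsureSuccessStatusCode();
            long total = head.Content.Headers.ContentLength
                         ?? throw new InvalidOperationException("Server did not report Content-Length.");

            // One task per chunk; each asks for its own byte range.
            var chunkTasks = Enumerable.Range(0, (int)((total + chunkSize - 1) / chunkSize))
                .Select(async i =>
                {
                    long from = (long)i * chunkSize;
                    long to = Math.Min(from + chunkSize - 1, total - 1);
                    var request = new HttpRequestMessage(HttpMethod.Get, url);
                    request.Headers.Range = new RangeHeaderValue(from, to);
                    var response = await Client.SendAsync(request);
                    response.EnsureSuccessStatusCode();          // expect 206 Partial Content
                    return await response.Content.ReadAsByteArrayAsync();
                })
                .ToArray();

            var chunks = await Task.WhenAll(chunkTasks);         // results come back in order
            return chunks.SelectMany(c => c).ToArray();          // reassemble the file
        }
    }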
A few specifics about the setup:
There are no bandwidth or connection restrictions on the service (it's owned by my company)
It is difficult to pre-generate all the cropped and resized images; there are millions, with lots of potential permutations
It is difficult to host the app on the same hardware as the image disk boxes (political!)
Thanks
I found your post by Googling to see if someone had already written a parallel analogue of wget that does this. It's definitely possible and would be helpful for very large files over a relatively high-latency link: I've gotten >10x improvements in speed with multiple parallel TCP connections.
That said, since your organization runs both the app and the web service, I'm guessing your link is high-bandwidth and low-latency, so I suspect this approach will not help you.
Since you're transferring large numbers of small files (by modern standards), I suspect you are actually getting burned by the connection setup more than by the transfer speeds. You can test this by loading a similar page full of tiny images. In your situation you may want to go serial rather than parallel: see if your HTTP client library has an option to use persistent HTTP connections, so that the three-way handshake is done only once per page or less instead of once per image.
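For example, here is what the serial-but-persistent version looks like (sketched in C# since the question welcomes any language; a single reused HttpClient keeps the underlying TCP connection alive between requests, so the handshake cost is paid once rather than per image):

    using System.Net.Http;
    using System.Threading.Tasks;

    static class PersistentConnectionSketch
    {
        // One shared client means one (or a few) long-lived TCP connections
        // instead of a fresh three-way handshake for every image.
        private static readonly HttpClient Client = new HttpClient();

        public static async Task<byte[][]> DownloadAllAsync(string[] imageUrls)
        {
            var results = new byte[imageUrls.Length][];
            for (int i = 0; i < imageUrls.Length; i++)
            {
                // Serial requests over the same kept-alive connection.
                results[i] = await Client.GetByteArrayAsync(imageUrls[i]);
            }
            return results;
        }
    }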
If you end up getting really fanatical about TCP latency, it's also possible to cheat, as certain major web services like to.
(My own problem involves the other end of the TCP performance spectrum, where a long round-trip time is really starting to drag on my bandwidth for multi-TB file transfers, so if you do turn up a parallel HTTP library, I'd love to hear about it. The only tool I found, called "puf", parallelizes by files rather than byteranges. If the above doesn't help you and you really need a parallel transfer tool, likewise get in touch: I may have given up and written it by then.)
I've written the backend and services for the sort of place you're pulling images from. Every site is different so details based on what I did might not apply to what you're trying to do.
Here are my thoughts:
If you have a service agreement with the company you're pulling images from (which you should because you have a fairly high bandwidth need), then preprocess their image catalog and store the thumbnails locally, either as database blobs or as files on disk with a database containing the paths to the files.
Doesn't that service already have the images available as thumbnails? They're not going to send a full-sized image to someone's browser either... unless they're crazy or sadistic and their users are crazy and masochistic. We preprocessed our images into three or four different thumbnail sizes so it would have been trivial to supply what you're trying to do.
If your request is something they expect then they should have an API or at least some resources (programmers) who can help you access the images in the fastest way possible. They should actually have a dedicated host for that purpose.
As a photographer I also need to mention that there could be copyright and/or terms-of-service issues with what you're doing, so make sure you're above board by consulting a lawyer AND the site you're accessing. Don't assume everything is ok, KNOW it is. Copyright laws don't fit the general public's conception of what copyrights are, so involving a lawyer up front can be really educational, plus give you a good feeling you're on solid ground. If you've already talked with one then you know what I'm saying.
I would guess that using any P2P network would be useless, as there are more permutations than frequently used files.
Downloading a few parts of a file in parallel only gives an improvement on slow networks (slower than 4-10 Mbps).
To get any improvement from parallel downloads, you need to make sure there is enough server capacity. Given your current problem (waiting over 500 ms per request), I assume you already have a problem with your servers:
you should add or improve load balancing,
you should consider switching the server software to something with better performance.
And again, if 500 ms is 60% of the total response time, then your servers are overloaded; if you think they are not, you should look for the bottleneck in the connections or in server performance.
