How does WinRT handle BitmapImage and Image memory?

I am new to programming Windows Store apps with C# and I am trying to understand how image memory is handled. My app is very simple: it references a bitmap from a file using a Windows.UI.Xaml.Media.Imaging.BitmapImage object and then uses that as the Source for a Windows.UI.Xaml.Controls.Image object. In my case the image on disk has larger dimensions than what is displayed on screen, so it is being scaled by the system.
My question is: how does WinRT handle the memory for the image? I used the vmmap tool and I see that the Mapped File section has an entry for my image file. I guess this means that the raw bytes of the file are fully loaded into memory. Since this is a JPG, these bytes must be decoded into pixel bytes. From my tests, setting the UriSource of the BitmapImage doesn't seem to cause any processing to take place, since it takes 0 ms; instead there seems to be some lazy loading going on.
So the questions are: Which object is the dominator of the uncompressed, unscaled pixel data? Which object is the dominator of the scaled pixel data that gets drawn on screen? Are there tools that can easily show me this? In the Java world I use the Eclipse Memory Analyzer tool. I tried using PerfView but the results make no sense to me; it seems the tool was meant for analyzing performance.
At the BUILD conference the team discussed the Windows Performance Toolkit. I never heard anyone mention PerfView so I believe that WPT is the latest and greatest tool for analyzing memory and performance, here is a link:

A short answer is most likely "optimally". I'm not being a smartass; there are just a lot of different systems out there. Someone mentioned hardware acceleration, but you can also consider the number of cores, display memory, disk speed, monitor bit depth and resolution. The list goes on and on.


Which library/code is responsible for rendering the terminal in retro computers?

For example, as you type, which library tells the computer screen to display the respective ASCII character and to move the cursor accordingly?
Imagine something like the old-school computers (with no GUI) running DOS or BASIC... which library is responsible for the UI?
Links to source code would be great for understanding how said library (or libraries) works.
The photo you have posted is of a BBC Micro running in Mode 7. This was an exception to most rules. Mode 7 was a low-memory mode in which there were no pixels, just 256 text characters. 1K of RAM was reserved to hold what was displayed on the screen at that moment. A special chip on the circuit board, the SAA5050 teletext character generator, took the character codes from that memory and turned them into the video output. The character shapes were held in ROM inside that chip and could not be changed by the programmer.
The ZX81 worked in a similar way: 256 possible text characters and no pixels. However, the ZX81 had fewer dedicated chips and the main CPU did most of the work.
A more common setup was that every pixel was represented by a number of bits in memory (often more than one bit per pixel was needed because colours had to be indicated). Examples are the BBC in modes 1-6, the Acorn Electron, the Spectrum, the C64 and many others. When the user placed text on the screen, the computer's ROM would convert it to the correct pixels. Graphics could often be written directly to the RAM, or 'plotted' via BASIC. Once again, dedicated chips and circuitry would then render this memory to the output. This approach required much more memory for the display.
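To make the difference in memory cost concrete, here is a rough sketch comparing a character-cell screen with a bitmapped one, and showing how a ROM routine might copy a glyph into a bitmap. The 40x25 grid matches the 1K text screen described above. The 256x192 one-bit bitmap is the Spectrum's resolution with its separate colour attribute bytes ignored. The linear memory layout and the 8x8 glyph data are assumptions made for illustration, not the real layout of any of these machines.

```python
def text_screen_bytes(columns, rows):
    """Text mode: one byte (a character code) per cell."""
    return columns * rows


def bitmap_screen_bytes(width, height, bits_per_pixel):
    """Bitmap mode: every pixel is stored in RAM."""
    return width * height * bits_per_pixel // 8


# Hypothetical 8x8 glyph for 'A': one byte per row, top bit = leftmost pixel.
GLYPH_A = [0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00]


def draw_glyph(framebuffer, bytes_per_row, col, row, glyph):
    """Copy an 8x8 glyph into a 1-bit-per-pixel framebuffer at cell (col, row).

    Assumes a simple linear layout: pixel rows stored one after another.
    """
    for line, bits in enumerate(glyph):
        framebuffer[(row * 8 + line) * bytes_per_row + col] = bits


text_bytes = text_screen_bytes(40, 25)           # fits in 1K
bitmap_bytes = bitmap_screen_bytes(256, 192, 1)  # several times larger
framebuffer = bytearray(bitmap_bytes)
draw_glyph(framebuffer, 256 // 8, col=2, row=1, glyph=GLYPH_A)
```

In text mode the program only ever writes the character code; in bitmap mode something (usually a ROM routine) has to do the equivalent of draw_glyph for every character printed.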
Every 8-bit computer had its own way of representing the display in RAM. You need to get the manuals for the machine you are trying to program (they are easy to find on the internet for the better-known micros).
Many emulators are open source, if you want to see the internals. For example:
If you're interested in seeing the internals of a terminal to better understand how it works and renders input/output, Bash is completely open source. You can download its latest source code here.

XAP file from XNA game is huge, how can I compile resources without images being so big?

I have typically been writing XNA games for Windows Phone 7 with all my content set to a build action of Compile, which is the default. After finishing a new project I noticed that my XAP file is now huge: the build seems to have taken 15MB worth of images and blown them up to 200MB. Is there any way to make the build smaller while keeping the images compiled? From what I read, it compiles images as basically full bitmaps. What other direction can I take to resolve this? Forcing users to download a 200MB app seems unfair when it should take up 15-20MB at most.
The XNA Content Pipeline basically stores images as they will be used on the GPU. That is either as an uncompressed bitmap, or DXT compressed (which doesn't compress it by much).
So if your original files were in jpeg format (or, to a lesser extent, png), you will find that your original files are much smaller than the built XNB files.
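As a back-of-the-envelope check on why the built files grow, the sketch below computes the size of a texture in GPU-ready form. It assumes 32 bits per pixel for the uncompressed case and uses the standard DXT block sizes (8 bytes per 4x4 block for DXT1, 16 bytes for DXT5). The function name and format labels are made up for this example, and the XNB container overhead and any mipmaps are ignored.

```python
def texture_bytes(width, height, fmt):
    """Approximate in-memory size of a texture, ignoring mipmaps and headers."""
    if fmt == "rgba32":
        return width * height * 4              # 4 bytes per pixel
    blocks = ((width + 3) // 4) * ((height + 3) // 4)
    if fmt == "dxt1":
        return blocks * 8                      # 8 bytes per 4x4 block
    if fmt == "dxt5":
        return blocks * 16                     # 16 bytes per 4x4 block
    raise ValueError("unknown format: " + fmt)


uncompressed = texture_bytes(1024, 1024, "rgba32")
dxt1 = texture_bytes(1024, 1024, "dxt1")
dxt5 = texture_bytes(1024, 1024, "dxt5")
```

A 1024x1024 image therefore costs 4MB uncompressed regardless of how small the source jpeg was, and DXT only reduces that by a fixed ratio.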
So the answer is to distribute your original jpeg and png files, and load them with Texture2D.FromStream. Note that this uses more CPU power to convert them into the right format at runtime (although I've heard reports of faster loading in some cases, because there's less data being transferred). You'll also have to apply premultiplied alpha yourself (along with anything else that the content pipeline was handling for you).
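For reference, premultiplying alpha just means scaling each colour channel by the pixel's alpha. The sketch below shows the arithmetic on a raw RGBA byte buffer. It is written in Python purely to illustrate the formula; the function name and the RGBA byte order are assumptions, and in an XNA game you would apply the same per-pixel maths to the texture's pixel data after loading it.

```python
def premultiply_alpha(rgba):
    """Return a copy of an RGBA8 buffer with colour channels scaled by alpha."""
    out = bytearray(len(rgba))
    for i in range(0, len(rgba), 4):
        r, g, b, a = rgba[i:i + 4]
        # (c * a + 127) // 255 rounds c * a / 255 to the nearest integer.
        out[i] = (r * a + 127) // 255
        out[i + 1] = (g * a + 127) // 255
        out[i + 2] = (b * a + 127) // 255
        out[i + 3] = a  # alpha itself is unchanged
    return bytes(out)


half_transparent = premultiply_alpha(bytes([255, 128, 0, 128]))
opaque = premultiply_alpha(bytes([255, 10, 20, 255]))
transparent = premultiply_alpha(bytes([200, 100, 50, 0]))
```

Opaque pixels come out unchanged and fully transparent pixels become all zeros, which is a quick sanity check for any implementation.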
Another thing you might want to look into is turning on compression for your sound effects. By default they are uncompressed. See this answer for details.
For more info, this article looks helpful.

What image processing Library should I use

I have been reading up on this and have tried a few libraries, and I am now looking for input on which is best for our needs. I will start by describing our current setup and problems.
We have a system that needs to resize and crop a large number of images from big originals. We handle 50 000+ images every day on 2 powerful servers. Today we use ImageGlue from WebSupergoo but we don't like it at all; it is slow and hangs the service now and then (that is covered in another, unanswered Stack Overflow question). We have a threaded Windows service that uses the Microsoft ThreadPool to resize as much as possible on the 8-core machines.
I have tried AForge and it went very well; it was loads faster and never crashed. But I had quality problems on a few images. That is down to the algorithms I used, of course, so it can be tweaked. But I want to look more widely to see if that's the right way to go.
It needs to be C# .NET and run in a Windows service (since we won't change the rest of the service, only the image handling).
It needs to handle a threaded environment well.
We have a great need for it to be fast, since today it's too slow. But we also want good quality and small file sizes, since the images are later displayed on a web page with loads of visitors.
So we need good quality at a fast pace and, as a secondary goal, small file sizes, even if those can be adjusted a bit with compression.
Any comments or suggestions on what library to use?
I understand that you want to keep using C#, but I am offering an alternative.
Depending on the amount of work you are doing, the fastest way to manipulate images is to do it entirely on a GPU (that would offload most of the pixel work). You can interoperate with CUDA from Managed C++ that you call from your service. Or use DirectX surfaces and render targets (you get antialiasing and all the high-quality stuff out of the box).
However, before doing anything, make sure your workload is dominated by the trilinear/bilinear resizing and not by the encoding/decoding of the image. BTW, you will need at least one fast nVidia video card in each server to do the offloading (a cheap GTX 460 would be more than enough).
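One simple way to check where the time goes is to time each stage of the pipeline separately over a representative batch. The sketch below is language-neutral in spirit and written in Python only for brevity. The stage names and the placeholder lambdas are hypothetical stand-ins for your real decode, resize and encode calls.

```python
import time
from collections import defaultdict


def profile_stages(items, stages):
    """Run every item through the stages in order and total the time per stage."""
    totals = defaultdict(float)
    for item in items:
        for name, stage in stages:
            start = time.perf_counter()
            item = stage(item)  # each stage's output feeds the next stage
            totals[name] += time.perf_counter() - start
    return dict(totals)


# Placeholder stages: swap in the real library calls when measuring.
stages = [
    ("decode", lambda data: list(data)),
    ("resize", lambda pixels: pixels[::2]),
    ("encode", lambda pixels: bytes(pixels)),
]
totals = profile_stages([bytes(range(256))] * 100, stages)
slowest = max(totals, key=totals.get)
```

If the slowest stage turns out to be decode or encode rather than resize, moving the resize to the GPU will not buy much.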

What is the overhead of constantly uploading new Textures to the GPU in OpenGL?

What is the overhead of continually uploading textures to the GPU (and replacing old ones). I'm working on a new cross-platform 3D windowing system that uses OpenGL, and am planning on uploading a single Bitmap for each window (containing the UI elements). That Bitmap would be updated in sync with the GPU (using the VSync). I was wondering if this is a good idea, or if constantly writing bitmaps would incur too much of a performance overhead. Thanks!
Well, something like nVidia's GeForce GTX 460M has 60GB/sec of bandwidth to local memory.
PCI express 2.0 x16 can manage 8GB/sec.
As such, if you try to transfer too many textures over the PCIe bus, you can expect to run into bus bandwidth problems. 8GB/sec gives you about 136 meg per frame at 60Hz. An uncompressed 24-bit 1920x1080 image is roughly 6 meg. So, suffice to say, you could upload a fair few frames of video per frame on a 16x graphics card.
Of course, it's not as simple as that. There is PCIe overhead of around 20%, and all draw commands must be uploaded over that link too.
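The arithmetic behind those figures can be reproduced with a few lines. This is only a sketch of the estimate: it treats 8GB/sec as 8 GiB/sec, applies the 20% overhead as a flat reduction, and ignores the draw-command traffic. The function names are made up for the example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def upload_budget_per_frame(bus_bytes_per_sec, refresh_hz, overhead=0.0):
    """Bytes that can cross the bus during one display refresh."""
    return bus_bytes_per_sec * (1.0 - overhead) / refresh_hz


def image_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed image."""
    return width * height * bytes_per_pixel


budget = upload_budget_per_frame(8 * GIB, 60)                    # ideal case
budget_real = upload_budget_per_frame(8 * GIB, 60, overhead=0.20)
full_hd = image_bytes(1920, 1080, 3)                             # 24-bit colour

images_ideal = int(budget // full_hd)
images_real = int(budget_real // full_hd)
print(round(budget / MIB, 1), round(full_hd / MIB, 2), images_ideal, images_real)
```

Even with the overhead included, the budget is well over a dozen full-screen bitmaps per refresh, which is why a handful of window textures is unlikely to saturate the bus on its own.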
In general, though, you should be fine provided you don't overdo it. Bear in mind that it would be sensible to upload a texture in one frame that you aren't expecting to use until the next (or even later). This way you don't create a bottleneck where the rendering is halted waiting for a PCIe upload to complete.
Ultimately, your answer is going to be profiling. However, one early optimization you can make is to avoid updating a texture if nothing has changed. Depending on the size of the textures and the pixel format, re-uploading every texture every frame could easily be prohibitively expensive.
Profile with a simpler situation that simulates the kind of usage you expect. I suspect the performance (without the optimization I mentioned, at least) will be unusable once you have more than a handful of windows, depending on the size of these windows.
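The "skip the upload if nothing changed" idea usually comes down to a dirty flag per window. Here is a minimal sketch of that pattern. The class name, the on_vsync hook and the upload callable are all hypothetical; in a real renderer the callable would wrap the actual texture update call, and the UI toolkit would normally mark the window dirty itself rather than comparing pixel buffers.

```python
class WindowTexture:
    """Tracks whether a window's bitmap needs re-uploading to the GPU."""

    def __init__(self, upload):
        self._upload = upload   # hypothetical stand-in for the real GPU upload
        self._pixels = b""
        self._dirty = False

    def set_pixels(self, pixels):
        """Called by the UI layer whenever it redraws the window."""
        if pixels != self._pixels:
            self._pixels = pixels
            self._dirty = True

    def on_vsync(self):
        """Called once per frame; uploads only if the content changed."""
        if self._dirty:
            self._upload(self._pixels)
            self._dirty = False


sent = []                       # records each upload in place of a real GPU
window = WindowTexture(sent.append)
window.set_pixels(b"frame-1")
window.on_vsync()               # uploads
window.on_vsync()               # nothing changed, no upload
window.set_pixels(b"frame-1")   # same content, still clean
window.on_vsync()
window.set_pixels(b"frame-2")
window.on_vsync()               # uploads again
```

Four frames pass but only two uploads happen, which is the saving you would want to measure when profiling.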

Report Direct3D memory usage

I have a Direct3D 9 application and I would like to monitor the memory usage.
Is there a tool to know how much system and video memory is used by Direct3D?
Ideally, it would also report how much is allocated for textures, vertex buffers, index buffers...
You can use the old DirectDraw interface to query the total and available memory.
The numbers you get that way are not reliable though.
The free memory may change at any instant, and the available memory often takes the AGP memory into account (which is strictly not video memory). You can use the numbers to make a good guess about the default texture resolutions and detail level of your application/game, but that's it.
You may wonder why there is no way to get better numbers; after all, it can't be too hard to track the resource usage.
From an application's point of view this is correct. You may think that the video memory just contains surfaces, textures, index and vertex buffers and some shader programs, but that's not true on the low-level side.
There are lots of other resources as well. All of these are created and managed by the Direct3D driver to make the rendering as fast as possible. Among others there are hierarchical z-buffer acceleration structures and pre-compiled command lists (i.e. the data required to render something, in the format understood by the GPU). The driver may also queue rendering commands for multiple frames in advance to even out the frame rate and increase parallelism between the GPU and CPU.
The driver also does a lot of work under the hood for you. Heuristics are used to detect draw calls with static geometry and constant rendering settings. A driver may decide to optimize the geometry in these cases for better cache usage. This all happens in parallel and under the control of the driver. All this stuff needs space as well, so the free memory may change at any time.
However, the driver also does caching for your resources, so you don't really need to know the resource usage in the first place.
If you need more space than is available, that's no problem. The driver will move the data between system RAM, AGP memory and video RAM for you. In practice you never have to worry about running out of video memory. Sure, once you need more video memory than is available the performance will suffer, but that's life :-)
Two suggestions:
You can call GetAvailableTextureMem at various times to obtain a (rough) estimate of how overall memory usage progresses.
Assuming you develop on nVidia hardware, PerfHUD includes a graphical representation of consumed AGP/VID memory (shown separately).
You probably won't be able to obtain a nice clean matrix of memory consumers (vertex buffers etc.) vs. memory location (AGP, VID, system), because:
(1) the driver has a lot of freedom in transferring resources between memory types, and
(2) the actual variety of memory consumers is far greater than the exposed D3D interfaces.
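The first suggestion amounts to sampling the available figure at known points and looking at the differences. A rough sketch of that bookkeeping follows. The labels and byte counts are invented purely to show the calculation; in a real application each sample would be the value returned by GetAvailableTextureMem at that point, and the resulting numbers are only as reliable as the caveats above allow.

```python
def usage_since_baseline(samples):
    """Turn (label, available_bytes) samples into (label, bytes_consumed) pairs.

    Consumption is measured relative to the first sample.
    """
    baseline = samples[0][1]
    return [(label, baseline - available) for label, available in samples]


MB = 1024 * 1024

# Invented readings for illustration only.
samples = [
    ("startup", 512 * MB),
    ("after loading textures", 400 * MB),
    ("after loading meshes", 380 * MB),
]
usage = usage_since_baseline(samples)
```

Because the driver moves resources around on its own, treat the deltas as a trend rather than an exact per-resource accounting.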
