What is the best way to draw 4bpp 2D tiles with multiple palettes? - winapi

I'm creating a generic SNES tilemap editor (similar to NES Screen Tool), which means drawing a lot of 4bpp tiles. However, my graphics loop takes too long to run, even with CachedBitmaps, whose palettes can't be changed after creation; I may need to switch between 8 palettes. I can deal with the SNES format and size of things, but am struggling with the Windows side.
// basically the entire graphics drawing routine
case WM_PAINT: {
    PAINTSTRUCT ps;
    HDC hdc = BeginPaint(hwnd, &ps);
    Gdiplus::Graphics graphics(hdc);
    graphics.Clear(ARGB1555toARGB8888(CGRAM[0])); // convert 1st 15-bit CGRAM color to 32-bit & clear bkgd
    // tileset2[i]->SetPalette(colorpalette); // called in tileset loading to test 1 palette
    // note: these CachedBitmaps are recreated (and leaked) on every WM_PAINT
    for (uint16_t i = 0; i < 1024; i++) {
        tilesetX[i] = new Gdiplus::CachedBitmap(tileset2[i], &graphics);
    }
    /* struct SNES_Tile {
        uint16_t tileIndex: 10;
        uint16_t palette: 3;
        uint16_t priority: 1; // (irrelevant for this project)
        uint16_t horzFlip: 1;
        uint16_t vertFlip: 1;
    }; */
    // I can see each individual tile being drawn
    for (int y = 0; y < 32; y++) {
        for (int x = 0; x < 32; x++) {
            // assume tilemap is set to 32x32, and not 64x32 or 32x64 or 64x64
            graphics.DrawCachedBitmap(tilesetX[BG2[y * 32 + x] & 0x03FF], x * BG2CHRSize, y * BG2CHRSize);
            // BG2[y * 32 + x] & 0x03FF : get tile index from VRAM and strip attributes
            // tilesetX[...] : get CachedBitmap to draw
        }
    }
    EndPaint(hwnd, &ps);
    break;
}
I am early enough in my program that rewriting the entire graphics routine wouldn't be too much of a hassle.
Should I give up on GDI+ and switch to Direct2D or something else? Is there a faster way to draw 4bpp bitmaps without having to create a copy for each palette?
EDIT:
The reason my graphics drawing routine was so slow was that I was drawing directly to the screen. It is much faster to draw to a separate bitmap as a buffer, then draw the buffer to the screen.
Updating the tile's palette when drawing to the buffer results in perfectly reasonable speeds.
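For reference, a minimal sketch of that double-buffering approach (GDI+; names like mapWidthPx are illustrative, not from my actual code):
// Inside WM_PAINT: render everything into an off-screen bitmap first,
// then draw the finished buffer to the window in a single call.
PAINTSTRUCT ps;
HDC hdc = BeginPaint(hwnd, &ps);
{
    Gdiplus::Bitmap buffer(mapWidthPx, mapHeightPx, PixelFormat32bppRGB);
    Gdiplus::Graphics bufG(&buffer);
    bufG.Clear(ARGB1555toARGB8888(CGRAM[0]));
    // ... draw every tile into bufG here, switching palettes as needed ...
    Gdiplus::Graphics screen(hdc);
    screen.DrawImage(&buffer, 0, 0); // one blit to the screen
}
EndPaint(hwnd, &ps);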

Related

GDI: Create Mountain Chart/Graph?

I can use the Polyline() GDI function to plot values to create a graph, but now I want the lower part of it filled in to create a mountain-type chart. Is there something built-in to help create that? (I don't need a gradient, but that would be a nice touch.)
TIA!!
For this diagram type you need to draw a filled shape. The Polygon function can be used to draw an irregularly shaped filled object:
The polygon is outlined by using the current pen and filled by using the current brush [...].
The polygon points need to be constructed from the data points (left to right). To turn this into a closed shape, the bottom right and bottom left points of the diagram area need to be appended. The Polygon function then closes the shape automatically by drawing a line from the last vertex to the first.
The following implementation renders a single data set into a given device context's area:
/// \brief Renders a diagram into a DC
///
/// \param dc Device context to render into
/// \param area Diagram area in client coordinates
/// \param pen_col Diagram outline color
/// \param fill_col Diagram fill color
/// \param data Data points to render in data coordinate space
/// \param y_min Y-axis minimum in data coordinate space
/// \param y_max Y-axis maximum in data coordinate space
void render_diagram(HDC dc, RECT area,
                    COLORREF pen_col, COLORREF fill_col,
                    std::vector<int> const& data, int y_min, int y_max) {
    // Make sure we have data
    if (data.size() < 2) { return; }
    // Make sure the diagram area isn't empty
    if (::IsRectEmpty(&area)) { return; }
    // Make sure the y-scale is sane
    if (y_max <= y_min) { return; }

    std::vector<POINT> polygon{};
    // Reserve enough room for the data points plus bottom/right
    // and bottom/left to close the shape
    polygon.reserve(data.size() + 2);

    auto const area_width{ area.right - area.left };
    auto const area_height{ area.bottom - area.top };
    // Translate coordinates from data space to diagram space
    // In lieu of a `zip` view in C++ we're using a raw loop here
    // (we need the index to scale the x coordinate, so we cannot
    // use a range-based `for` loop)
    for (int index{}; index < static_cast<int>(data.size()); ++index) {
        // Scale x value
        auto const x = ::MulDiv(index, area_width - 1, static_cast<int>(data.size()) - 1) + area.left;
        // Flip y value so that the origin is in the bottom/left
        auto const y_flipped = y_max - (data[index] - y_min);
        // Scale y value (offset by y_min and area.top so the result
        // lands inside `area` for any y_min, not just 0)
        auto const y = ::MulDiv(y_flipped - y_min, area_height - 1, y_max - y_min) + area.top;
        polygon.emplace_back(POINT{ x, y });
    }
    // Semi-close the shape
    polygon.emplace_back(POINT{ area.right - 1, area.bottom - 1 });
    polygon.emplace_back(POINT{ area.left, area.bottom - 1 });

    // Prepare the DC for rendering
    auto const prev_pen{ ::SelectObject(dc, ::GetStockObject(DC_PEN)) };
    auto const prev_brush{ ::SelectObject(dc, ::GetStockObject(DC_BRUSH)) };
    ::SetDCPenColor(dc, pen_col);
    ::SetDCBrushColor(dc, fill_col);
    // Render the graph
    ::Polygon(dc, polygon.data(), static_cast<int>(polygon.size()));
    // Restore DC (stock objects do not need to be destroyed)
    ::SelectObject(dc, prev_brush);
    ::SelectObject(dc, prev_pen);
}
Most of this function deals with translating and scaling data values into the target (client) coordinate space. The actual rendering is fairly compact in comparison, and starts from the comment reading // Prepare the DC for rendering.
To test this you can start from a standard Windows Desktop application, and dump the following into the WM_PAINT handler:
case WM_PAINT:
{
    RECT rc{};
    ::GetClientRect(hWnd, &rc);
    // Leave a 10px border around the diagram area
    ::InflateRect(&rc, -10, -10);

    PAINTSTRUCT ps;
    HDC hdc = BeginPaint(hWnd, &ps);

    auto pen_col = RGB(0x00, 0x91, 0x7C);
    auto fill_col = RGB(0xCC, 0xE9, 0xE4);
    render_diagram(hdc, rc, pen_col, fill_col, g_dataset1, 0, 100);

    pen_col = RGB(0x02, 0x59, 0x55);
    fill_col = RGB(0xCC, 0xDD, 0xDD);
    render_diagram(hdc, rc, pen_col, fill_col, g_dataset2, 0, 100);

    EndPaint(hWnd, &ps);
}
break;
g_dataset1/g_dataset2 are containers holding random values that serve as test input. It is important to understand that the final diagram is rendered back to front, meaning that data sets with smaller values need to be rendered after data sets with higher values; the lower portion gets repeatedly overdrawn.
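A minimal sketch for producing such test data (the original datasets aren't shown, so the value ranges here are assumptions):
#include <random>
#include <vector>

std::vector<int> g_dataset1, g_dataset2;

// Fill both containers with random values in [0, 100]; g_dataset2 is
// biased low so it stays visible when rendered after g_dataset1.
void init_test_data(int count) {
    std::mt19937 gen{ std::random_device{}() };
    std::uniform_int_distribution<int> high{ 40, 100 };
    std::uniform_int_distribution<int> low{ 0, 40 };
    for (int i = 0; i < count; ++i) {
        g_dataset1.push_back(high(gen));
        g_dataset2.push_back(low(gen));
    }
}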
This produces output with the two filled graphs stacked back to front (screenshot omitted).
Note that on a HiDPI display, GDI rendering gets auto-scaled by default. Looking closely at that output you'll observe that the lines come out wider than 1 device pixel. If you'd rather have a more crisp look, you can disable DPI virtualization by declaring the application as DPI aware. Things don't change for a standard-DPI display; on a HiDPI display the rendering then comes out crisp at native resolution.
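Declaring DPI awareness is usually done via the application manifest; as a quick sketch, the legacy API call also works if made before any windows are created (newer applications would rather use SetProcessDpiAwarenessContext on Windows 10+):
#include <windows.h>

int APIENTRY wWinMain(HINSTANCE hInstance, HINSTANCE, PWSTR, int nCmdShow) {
    // Opt out of DPI virtualization so GDI draws in device pixels.
    ::SetProcessDPIAware();
    // ... register window class, create window, run message loop ...
    return 0;
}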

Processing: Efficiently create uniform grid

I'm trying to create a grid of an image (in the way one would tile a background). Here's what I've been using:
PImage bgtile;
PGraphics bg;
int tilesize = 50;
void setup() {
  int t = millis();
  fullScreen(P2D);
  background(0);
  bgtile = loadImage("bgtile.png");
  int bgw = ceil(((float) width) / tilesize) + 1;
  int bgh = ceil(((float) height) / tilesize) + 1;
  bg = createGraphics(bgw * tilesize, bgh * tilesize);
  bg.beginDraw();
  for (int i = 0; i < bgw; i++) {
    for (int j = 0; j < bgh; j++) {
      bg.image(bgtile, i * tilesize, j * tilesize, tilesize, tilesize);
    }
  }
  bg.endDraw();
  print(millis() - t);
}
The timing code says that this takes about a quarter of a second, but by my count there's a full second once the window opens before anything shows up on screen (which should happen as soon as draw is first run). Is there a faster way to get this same effect? (I want to avoid rendering bgtile hundreds of times in the draw loop for obvious reasons)
One way could be to make use of the GPU and let OpenGL repeat a texture for you.
Processing makes it fairly easy to repeat a texture via textureWrap(REPEAT).
Instead of drawing an image you'd make your own quad shape, and instead of calling vertex(x, y) you'd call vertex(x, y, u, v), passing texture coordinates (see the OpenGL documentation for the lower-level details). The simple idea is that x, y control the geometry on screen while u, v control how the texture is applied to the geometry.
Another thing you can control is textureMode(), which lets you choose how the texture coordinates (U, V) are specified:
IMAGE mode is the default: you use pixel coordinates (based on the dimensions of the texture)
NORMAL mode uses values between 0.0 and 1.0 (also known as normalised values), where 1.0 means the maximum extent of the texture (e.g. image width for U or image height for V) and you don't need to know the texture image dimensions
Here's a basic example based on the textureMode() example above:
PImage img;

void setup() {
  fullScreen(P2D);
  noStroke();
  img = loadImage("https://processing.org/examples/moonwalk.jpg");
  // texture mode can be IMAGE (pixel dimensions) or NORMAL (0.0 to 1.0)
  // normal means 1.0 is full width (for U) or height (for V) without having to know the image resolution
  textureMode(NORMAL);
  // this is what will handle tiling for you
  textureWrap(REPEAT);
}

void draw() {
  // drag mouse on X axis to change tiling
  int tileRepeats = (int) map(constrain(mouseX, 0, width), 0, width, 1, 100);
  // draw a textured quad
  beginShape(QUADS);
  // set the texture
  texture(img);
  //     x      y       U            V
  vertex(0,     0,      0,           0);
  vertex(width, 0,      tileRepeats, 0);
  vertex(width, height, tileRepeats, tileRepeats);
  vertex(0,     height, 0,           tileRepeats);
  endShape();
  text((int) frameRate + "fps", 15, 15);
}
Drag the mouse on the X axis to control the number of repetitions.
In this simple example both vertex coordinates and texture coordinates are going clockwise (top left, top right, bottom right, bottom left order).
There are probably other ways to achieve the same result: using a PShader comes to mind.
Your approach of caching the tiles in setup() is OK.
Even flattening your nested loop into a single loop would at best shave a few milliseconds off, nothing substantial.
If you tried to cache my snippet above it would make a minimal difference.
In this particular case, because of the back and forth between Java/OpenGL (via JOGL), as far as I can tell using VisualVM there's not a lot of room for improvement, since simply swapping buffers (e.g. bg.image()) takes so long (profiler screenshot omitted).
An easy way to do this would be to use Processing's built-in get(), which returns a PImage of the region you pass. For example, PImage pic = get(0, 0, width, height); will capture a "screenshot" of your entire window. So you can create the image like you already are, draw it once, and then take a screenshot and display that screenshot.
PImage bgtile;
PGraphics bg;
PImage screenGrab;
int tilesize = 50;

void setup() {
  fullScreen(P2D);
  background(0);
  bgtile = loadImage("bgtile.png");
  int bgw = ceil(((float) width) / tilesize) + 1;
  int bgh = ceil(((float) height) / tilesize) + 1;
  bg = createGraphics(bgw * tilesize, bgh * tilesize);
  bg.beginDraw();
  for (int i = 0; i < bgw; i++) {
    for (int j = 0; j < bgh; j++) {
      bg.image(bgtile, i * tilesize, j * tilesize, tilesize, tilesize);
    }
  }
  bg.endDraw();
  // draw the buffer to the window once, so get() has something to grab
  image(bg, 0, 0);
  screenGrab = get(0, 0, width, height);
}

void draw() {
  image(screenGrab, 0, 0);
}
This will still take a little bit to generate the image, but once it does, there is no need to use the for loops again unless you change the tilesize.
@George Profenza's answer looks more efficient than my solution, but mine may take a little less modification to the code you already have.

Direct Access to CreateDIBitmap Bits

[The final fix, which works unconditionally: use SetDIBitsToDevice, not BitBlt, to copy out the post-text-draw image data. With this change, all occurrences of the problem are gone.]
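A sketch of that copy step (hdcDest is an illustrative name; bmi/pvBits are as set up in the creation code below, not the exact original call):
// Copy the DIB's bits straight to the destination DC instead of
// BitBlt'ing from the memory DC the bitmap is selected into.
::SetDIBitsToDevice(hdcDest,
                    0, 0,            // destination x, y
                    width, height,   // size of the copied area
                    0, 0,            // source x, y
                    0, height,       // first scan line, number of scan lines
                    pvBits,          // pointer to the DIB bits
                    &bmi,            // BITMAPINFO used at creation time
                    DIB_RGB_COLORS);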
I fixed the problem I was having, but for the life of me I can't figure out why it occurred.
1. Create a bitmap with CreateDIBitmap. Get a pointer to the bitmap bits.
2. Select the bitmap into a memory DC.
3. Background fill the bitmap by directly writing the bitmap memory.
4. TextOut.
5. No text displays.
What fixed the problem: changing item 3 from a direct fill to a call to FillRect. All is well, it works perfectly.
This is under Windows 10 but from what little I could find on the web, it spans all versions of Windows. NO operations work on the bitmap - even calling FillRect - after the manual write. No savvy, Kimosabe. Elsewhere in the app, I even build gradient fills by directly writing to that bitmap memory and there is no problem. But once TextOut is called after the manual fill, the bitmap is locked (effectively) and no further functions work on it - nor do any return an error.
I'm using a font with a 90 degree escapement. I haven't tried it with a "normal" font (0 degree escapement). DrawTextEx with DT_CALCRECT specifically states it only works on fonts with 0 degree escapement, which is why I had to use TextOut.
Very bizarre.
No, there were no stupid mistakes like using the same text color as the background color. I've spent too long on this for that. One option people have available is that the endless energy that would normally be spent destroying the question and/or the person who asked it could instead be used to write a few lines of code and try it for yourself.
Here's a function to make a bitmap. Don't pass a plain colour, pass a gradient fill, say going from white to pinkish.
Does it display correctly? If so, does the TextOut call on top of that work?
static HBITMAP MakeBitmap(unsigned char *rgba, int width, int height, VOID **buff)
{
    VOID *pvBits;      // pointer to DIB section
    HBITMAP answer;
    BITMAPINFO bmi;
    HDC screen, hdc;
    int x, y;
    int red, green, blue, alpha;

    // setup bitmap info (zero the fields we don't set explicitly)
    memset(&bmi, 0, sizeof(bmi));
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = width;
    bmi.bmiHeader.biHeight = height;
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = 32; // four 8-bit components
    bmi.bmiHeader.biCompression = BI_RGB;
    bmi.bmiHeader.biSizeImage = width * height * 4;

    screen = GetDC(0);
    hdc = CreateCompatibleDC(screen);
    ReleaseDC(0, screen); // don't leak the screen DC
    answer = CreateDIBSection(hdc, &bmi, DIB_RGB_COLORS, &pvBits, NULL, 0x0);

    for (y = 0; y < height; y++)
    {
        for (x = 0; x < width; x++)
        {
            red = rgba[(y*width + x) * 4];
            green = rgba[(y*width + x) * 4 + 1];
            blue = rgba[(y*width + x) * 4 + 2];
            alpha = rgba[(y*width + x) * 4 + 3];
            // premultiply alpha (approximate: >>8 instead of /255)
            red = (red * alpha) >> 8;
            green = (green * alpha) >> 8;
            blue = (blue * alpha) >> 8;
            // DIB is bottom-up, so flip the row
            ((UINT32 *)pvBits)[(height - y - 1) * width + x] = (alpha << 24) | (red << 16) | (green << 8) | blue;
        }
    }
    DeleteDC(hdc);

    *buff = pvBits;
    return answer;
}

Windows StretchBlt API performance

I timed a DDB drawing operation which uses multiple StretchBlt and StretchDIBits calls.
I found that the time to complete increases/decreases proportionally with the destination window size.
With a 900x600 window it takes around 5ms, but at 1920x1080 it takes as much as 55ms (the source image is 1280x640).
It seems the Stretch* APIs don't use any hardware acceleration features.
The source image (actually a temporary drawing canvas) is created with CreateDIBSection, because I need the resulting (stretched and merged) bitmap's pixel data for every frame drawn.
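For reference, a minimal sketch of how such a call can be timed (QueryPerformanceCounter based; not my original measurement code):
#include <windows.h>
#include <cstdio>

// Time a single StretchBlt call with the high-resolution counter.
void TimeStretchBlt(HDC dest, int dw, int dh, HDC src, int sw, int sh)
{
    LARGE_INTEGER freq, t0, t1;
    ::QueryPerformanceFrequency(&freq);
    ::QueryPerformanceCounter(&t0);
    ::StretchBlt(dest, 0, 0, dw, dh, src, 0, 0, sw, sh, SRCCOPY);
    ::GdiFlush(); // flush batched GDI calls so the timing is meaningful
    ::QueryPerformanceCounter(&t1);
    std::printf("StretchBlt: %.2f ms\n",
                (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart);
}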
Let's assume Windows GDI is hopeless. What is the promising alternative, then?
I considered D3D, and D2D with the WIC method (write to a WIC bitmap, draw it with D2D, then read back pixel data from the WIC bitmap).
I planned to try the D2D with WIC method because I will need extensive text drawing features sometime soon.
But it seems WIC is not that promising: What is the most effective pixel format for WIC bitmap processing?
I've implemented a D2D + WIC routine today. The test results are really good.
With my previous GDI StretchDIBits version, it took 20~60ms to draw a 1280x640 DDB into a 1920x1080 window. After switching to Direct2D + WIC, it usually takes under 5ms, and the picture quality looks better too.
I used an ID2D1HwndRenderTarget together with a WicBitmapRenderTarget, because I need to read/write raw pixel data.
The HwndRenderTarget is only used for screen painting (WM_PAINT).
The main advantage of the HwndRenderTarget is that the destination window size doesn't affect drawing performance.
The WicBitmapRenderTarget is used as a temporary drawing canvas (like a memory DC in GDI drawing). We can create a WicBitmapRenderTarget on top of a WIC bitmap object (like a GDI DIBSection). We can read/write raw pixel data from/to this WIC bitmap at any time, and it's very fast. As a side note, the somewhat similar D3D GetFrontBufferData call is really slow.
Actual pixel I/O is done through the IWICBitmap and IWICBitmapLock interfaces.
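A sketch of the setup this implies (error handling elided; the smart pointer types come from _COM_SMARTPTR_TYPEDEF, and width/height are the canvas dimensions — names are mine, not necessarily the original members):
IWICImagingFactoryPtr wicFactory;
ID2D1FactoryPtr d2dFactory;
IWICBitmapPtr m_wicCanvas;
ID2D1RenderTargetPtr m_wicRenderTarget;

// WIC factory + a 32bppPBGRA bitmap to serve as the off-screen canvas
CoCreateInstance(CLSID_WICImagingFactory, NULL, CLSCTX_INPROC_SERVER,
                 IID_PPV_ARGS(&wicFactory.GetInterfacePtr()));
wicFactory->CreateBitmap(width, height, GUID_WICPixelFormat32bppPBGRA,
                         WICBitmapCacheOnLoad, &m_wicCanvas.GetInterfacePtr());

// D2D factory + a render target that draws into the WIC bitmap
D2D1CreateFactory(D2D1_FACTORY_TYPE_SINGLE_THREADED,
                  &d2dFactory.GetInterfacePtr());
d2dFactory->CreateWicBitmapRenderTarget(m_wicCanvas,
                                        D2D1::RenderTargetProperties(),
                                        &m_wicRenderTarget.GetInterfacePtr());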
Writing:
IWICBitmapPtr m_wicRemote;
...
const uint8* image = ...;
...
WICRect rcLock = { 0, 0, width, height };
IWICBitmapLockPtr wicLock;
HRESULT hr = m_wicRemote->Lock(&rcLock, WICBitmapLockWrite, &wicLock);
if (SUCCEEDED(hr))
{
    UINT cbBufferSize = 0;
    BYTE *pv = NULL;
    hr = wicLock->GetDataPointer(&cbBufferSize, &pv);
    if (SUCCEEDED(hr))
    {
        memcpy(pv, image, cbBufferSize);
    }
}

m_wicRenderTarget->BeginDraw();
m_wicRenderTarget->SetTransform(D2D1::Matrix3x2F::Identity());

ID2D1BitmapPtr d2dBitmap;
hr = m_wicRenderTarget->CreateBitmapFromWicBitmap(m_wicRemote, &d2dBitmap.GetInterfacePtr());
if (SUCCEEDED(hr))
{
    // renderTargetSize comes from m_wicRenderTarget->GetSize()
    float cw = renderTargetSize.width / 2;
    float ch = renderTargetSize.height;
    float x, y, w, h;
    FitFrameToCenter(cw, ch, (float)width, (float)height, x, y, w, h);
    m_wicRenderTarget->DrawBitmap(d2dBitmap, D2D1::RectF(x, y, x + w, y + h));
}
m_wicRenderTarget->EndDraw();
Reading:
IWICBitmapPtr m_wicCanvas;
IWICBitmapLockPtr m_wicLockedData;
...
UINT width, height;
HRESULT hr = m_wicCanvas->GetSize(&width, &height);
if (SUCCEEDED(hr))
{
    WICRect rcLock = { 0, 0, (INT)width, (INT)height };
    hr = m_wicCanvas->Lock(&rcLock, WICBitmapLockRead, &m_wicLockedData);
    if (SUCCEEDED(hr))
    {
        UINT cbBufferSize = 0;
        BYTE *pv = NULL;
        hr = m_wicLockedData->GetDataPointer(&cbBufferSize, &pv);
        if (SUCCEEDED(hr))
        {
            return pv; // return data pointer
            // need to Release m_wicLockedData after reading is done
        }
    }
}
Drawing:
ID2D1HwndRenderTargetPtr m_renderTarget;
...
D2D1_SIZE_F renderTargetSize = m_renderTarget->GetSize();

m_renderTarget->BeginDraw();
m_renderTarget->SetTransform(D2D1::Matrix3x2F::Identity());
m_renderTarget->Clear(D2D1::ColorF(D2D1::ColorF::Black));

ID2D1BitmapPtr d2dBitmap;
HRESULT hr = m_renderTarget->CreateBitmapFromWicBitmap(m_wicCanvas, &d2dBitmap.GetInterfacePtr());
if (SUCCEEDED(hr))
{
    UINT width, height;
    hr = m_wicCanvas->GetSize(&width, &height);
    if (SUCCEEDED(hr))
    {
        float x, y, w, h;
        FitFrameToCenter(renderTargetSize.width, renderTargetSize.height, (float)width, (float)height, x, y, w, h);
        m_renderTarget->DrawBitmap(d2dBitmap, D2D1::RectF(x, y, x + w, y + h));
    }
}
m_renderTarget->EndDraw();
In my opinion, the GDI Stretch* APIs are totally useless on Windows 7+ (for performance-sensitive applications).
Also note that, unlike Direct3D, basic graphics operations such as text drawing and line drawing are really simple in Direct2D.

Upscaling images on Retina devices

I know images upscale by default on Retina devices, but the default scaling makes the images blurry.
I was wondering if there was a way to scale an image in nearest-neighbor mode, where no new blended pixels are created, but rather each pixel is duplicated into a 2x2 block, so it looks like it would on a non-Retina device.
Example of what I'm talking about can be seen in the image below.
example http://cclloyd.com/downloads/sdfsdf.png
CoreGraphics will not do a 2x scale like that; you need to write a bit of explicit pixel mapping logic to do something like this. The following is some code I used to do this operation. You would of course need to fill in the details, as this operates on an input buffer of pixels and writes to an output buffer of pixels that is 2x larger.
// Use special case "DOUBLE" logic that will simply duplicate the exact
// RGB value from the indicated pixel into the 2x sized output buffer.
int numOutputPixels = resizedFrameBuffer.width * resizedFrameBuffer.height;
uint32_t *inPixels32 = (uint32_t*)cgFrameBuffer.pixels;
uint32_t *outPixels32 = (uint32_t*)resizedFrameBuffer.pixels;

int outRow = 0;
int outColumn = 0;

for (int i = 0; i < numOutputPixels; i++) {
    if ((i > 0) && ((i % resizedFrameBuffer.width) == 0)) {
        outRow += 1;
        outColumn = 0;
    }
    // Divide by 2 to get the column/row in the input framebuffer
    int inColumn = outColumn / 2;
    int inRow = outRow / 2;
    // Get the pixel for the row and column this output pixel corresponds to
    int inOffset = (inRow * cgFrameBuffer.width) + inColumn;
    uint32_t pixel = inPixels32[inOffset];
    outPixels32[i] = pixel;
    //fprintf(stdout, "Wrote 0x%.10X for 2x row/col %d %d (%d), read from row/col %d %d (%d)\n", pixel, outRow, outColumn, i, inRow, inColumn, inOffset);
    outColumn += 1;
}
This code of course depends on you creating a buffer of pixels and then wrapping it back up into a CGImageRef. But you can find all the code to do that kind of thing easily.
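For completeness, a sketch of that wrapping step (assuming 32-bit host-order premultiplied pixels, and reusing the buffer names from the snippet above):
// Wrap the 2x output buffer in a bitmap context, then snapshot it
// as a CGImageRef.
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
CGContextRef ctx = CGBitmapContextCreate(
    outPixels32,                                   // the scaled pixel buffer
    resizedFrameBuffer.width,                      // output width (2x input)
    resizedFrameBuffer.height,                     // output height (2x input)
    8,                                             // bits per component
    resizedFrameBuffer.width * 4,                  // bytes per row
    colorSpace,
    kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Host);
CGImageRef scaled = CGBitmapContextCreateImage(ctx);
// ... use `scaled`, e.g. as a CALayer's contents ...
CGImageRelease(scaled);
CGContextRelease(ctx);
CGColorSpaceRelease(colorSpace);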
