I am working in Vivado HLS. I am reading an image via a stream and storing it in an hls::Mat. I want to perform an element-wise operation on this Mat. Does Mat really represent a matrix? Is there a way I can access it like a matrix, i.e. A[rows][columns]?
The method A.at<double>(0,0) does not work.
No, according to Xilinx application note XAPP1167:
A second limitation is that the hls::Mat<> datatype used to model
images is internally defined as a stream of pixels, using the
hls::stream<> datatype, rather than as an array of pixels in external
memory. As a result, random access is not supported on images, and the
cv::Mat<>.at() method and cvGet2D() function have no corresponding
equivalent function in the synthesizable library.
So you can only stream data to/from hls::Mat and you cannot access a random element.
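If all you need is an element-wise operation, the usual workaround is to consume the stream pixel by pixel and write the result to an output Mat. A minimal sketch, using the MY_IMAGE/MY_PIXEL typedefs from the code below (the inversion is just a placeholder for your per-pixel operation):

void elementwise(MY_IMAGE& src, MY_IMAGE& dst, int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
#pragma HLS PIPELINE II = 1
            MY_PIXEL pix;
            src >> pix;                    // the only supported access pattern
            pix.val[0] = 255 - pix.val[0]; // placeholder per-pixel operation
            dst << pix;
        }
    }
}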
I found the answer using the Sobel code from XAPP1167:
void created_window(MY_IMAGE& src, MY_IMAGE& dst, int rows, int cols)
{
    MY_BUFFER buff_A;
    MY_WINDOW WINDOW_3x3;

    for (int row = 0; row < rows + 1; row++) {
        for (int col = 0; col < cols + 1; col++) {
#pragma HLS loop_flatten off
#pragma HLS dependence variable=&buff_A false
#pragma HLS PIPELINE II = 1
            // Temp values are used to reduce the number of memory reads
            unsigned char temp;
            MY_PIXEL tempx;

            // Line buffer fill
            if (col < cols) {
                buff_A.shift_down(col);
                temp = buff_A.getval(0, col);
            }

            // There is an offset to accommodate the active pixel region:
            // there are only MAX_WIDTH and MAX_HEIGHT valid pixels in the image
            if (col < cols && row < rows) {
                MY_PIXEL new_pix;
                src >> new_pix;
                tempx = new_pix;
                buff_A.insert_bottom(tempx.val[0], col);
            }

            // Shift the processing window to make room for the new column
            WINDOW_3x3.shift_right();

            // The processing window only needs to store luminance values;
            // the rgb2y function computes the luminance from the color pixel
            if (col < cols) {
                WINDOW_3x3.insert(buff_A.getval(2, col), 2, 0);
                WINDOW_3x3.insert(temp, 1, 0);
                WINDOW_3x3.insert(tempx.val[0], 0, 0);
            }

            MY_PIXEL conn_obj;
            // The operator only works on the inner part of the image;
            // this design assumes there are no connected objects on the image boundary
            conn_obj = find_conn(&WINDOW_3x3);

            // The output image is offset from the input to account for the line buffer
            if (row > 0 && col > 0) {
                dst << conn_obj;
            }
        }
    }
}
void create_window(AXI_STREAM& video_in, AXI_STREAM& video_out, int rows, int cols)
{
    // Create AXI streaming interfaces for the core
#pragma HLS INTERFACE axis port=video_in bundle=INPUT_STREAM
#pragma HLS INTERFACE axis port=video_out bundle=OUTPUT_STREAM
#pragma HLS INTERFACE s_axilite port=rows bundle=CONTROL_BUS offset=0x14
#pragma HLS INTERFACE s_axilite port=cols bundle=CONTROL_BUS offset=0x1C
#pragma HLS INTERFACE s_axilite port=return bundle=CONTROL_BUS
#pragma HLS INTERFACE ap_stable port=rows
#pragma HLS INTERFACE ap_stable port=cols

    MY_IMAGE img_0(rows, cols);
    MY_IMAGE img_1(rows, cols);

#pragma HLS dataflow
    hls::AXIvideo2Mat(video_in, img_0);
    created_window(img_0, img_1, rows, cols);
    hls::Mat2AXIvideo(img_1, video_out);
}
I referenced the header file where the class is located and added its location to the Additional Include Directories, but the build still fails with error LNK2019. I don't know what I did wrong. I've tried several approaches, but none of them seem to work. Any ideas? The code is as follows and a screenshot of the error is here: https://i.stack.imgur.com/eTKlU.png
The header files are in the trng folder of this repository: https://github.com/rabauke/trng4
#include <cstdlib>
#include <iostream>
#include <omp.h>
#include <trng/yarn2.hpp>
#include <trng/uniform01_dist.hpp>

int main() {
  const long samples = 1000000l; // total number of points in square
  long in = 0l;                  // no points in circle
  // distribute workload over all processes and make a global reduction
#pragma omp parallel reduction(+:in)
  {
    trng::yarn2 rx, ry; // random number engines for x- and y-coordinates
    int size = omp_get_num_threads(); // get total number of processes
    int rank = omp_get_thread_num();  // get rank of current process
    // split PRN sequences by leapfrog method
    rx.split(2, 0); // choose sub-stream no. 0 out of 2 streams
    ry.split(2, 1); // choose sub-stream no. 1 out of 2 streams
    rx.split(size, rank); // choose sub-stream no. rank out of size streams
    ry.split(size, rank); // choose sub-stream no. rank out of size streams
    trng::uniform01_dist<> u; // random number distribution
    // throw random points into square
    for (long i = rank; i < samples; i += size) {
      double x = u(rx), y = u(ry); // choose random x- and y-coordinates
      if (x * x + y * y <= 1.0)    // is point in circle?
        ++in; // increase thread-local counter
    }
  }
  // print result
  std::cout << "pi = " << 4.0 * in / samples << std::endl;
  return EXIT_SUCCESS;
}
I am looking to copy an AVFrame into an array where pixels are stored one channel at a time in a row-major order.
Details:
I am using FFMPEG's api to read frames from a video. I have used avcodec_decode_video2 to fetch each frame as an AVFrame as follows:
AVFormatContext* fmt_ctx = NULL;
avformat_open_input(&fmt_ctx, filepath, NULL, NULL);
...
int video_stream_idx; // stores the stream index for the video
...
AVFrame* vid_frame = NULL;
vid_frame = av_frame_alloc();
AVPacket vid_pckt;
int frame_finish;
...
while (av_read_frame(fmt_ctx, &vid_pckt) >= 0) {
    if (vid_pckt.stream_index == video_stream_idx) {
        avcodec_decode_video2(cdc_ctx, vid_frame, &frame_finish, &vid_pckt);
        if (frame_finish) {
            /* perform conversion */
        }
    }
}
The destination array looks like this:
unsigned char* frame_arr = new unsigned char [cdc_ctx->width * cdc_ctx->height * 3];
I need to copy all of vid_frame into frame_arr, where the range of pixel values should be [0, 255]. The problem is that the array needs to store the frame in row-major order, one channel at a time, i.e. R11, R12, ... R21, R22, ... G11, G12, ... G21, G22, ... B11, B12, ... B21, B22, ... (I have used the notation [color channel][row index][column index]; i.e. G21 is the green channel value of the pixel at row 2, column 1). I have had a look at sws_scale, but I don't understand it well enough to figure out whether that function is capable of doing such a conversion. Can somebody help? :)
The format you called "one channel at a time" is known as planar (the opposite format is called packed), and almost every pixel format is stored in row order.
The problem here is that the input format may vary, and all of them should be converted to one format. That's what sws_scale() does.
However, there is no such planar RGB format in the ffmpeg libraries yet. You would have to write your own pixel format description into the ffmpeg source code (libavutil/pixdesc.c) and rebuild the libraries.
Alternatively, you can just convert the frame into the AV_PIX_FMT_GBRP format, which is the most similar one to what you want. AV_PIX_FMT_GBRP is a planar format, with the green channel first, blue in the middle, and red last. You can then rearrange these channels yourself.
// Create a SwsContext first:
SwsContext* sws_ctx = sws_getContext(cdc_ctx->width, cdc_ctx->height, cdc_ctx->pix_fmt,
                                     cdc_ctx->width, cdc_ctx->height, AV_PIX_FMT_GBRP,
                                     0, 0, 0, 0);

// Allocate some new space for storing the converted frame
AVFrame* gbr_frame = av_frame_alloc();
gbr_frame->format = AV_PIX_FMT_GBRP;
gbr_frame->width  = cdc_ctx->width;
gbr_frame->height = cdc_ctx->height;
av_frame_get_buffer(gbr_frame, 32);
....
while (av_read_frame(fmt_ctx, &vid_pckt) >= 0) {
    ret = avcodec_send_packet(cdc_ctx, &vid_pckt);
    // In particular, we don't expect AVERROR(EAGAIN), because we read all
    // decoded frames with avcodec_receive_frame() until done.
    if (ret < 0)
        break;
    ret = avcodec_receive_frame(cdc_ctx, vid_frame);
    if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
        break;
    if (ret >= 0) {
        // convert image from native format to planar GBR
        sws_scale(sws_ctx, vid_frame->data,
                  vid_frame->linesize, 0, vid_frame->height,
                  gbr_frame->data, gbr_frame->linesize);
        // rearrange the GBR channels in gbr_frame as you like:
        // G channel is gbr_frame->data[0]
        // B channel is gbr_frame->data[1]
        // R channel is gbr_frame->data[2]
        // ......
    }
}
av_frame_free(&gbr_frame);
av_frame_free(&vid_frame);
sws_freeContext(sws_ctx);
avformat_close_input(&fmt_ctx); // opened with avformat_open_input()
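The channel rearrangement left as a comment above can be sketched like this (gbrp_to_rgb_planes is a hypothetical helper name; frame_arr is the destination array from the question):

extern "C" {
#include <libavutil/frame.h>
}
#include <cstring>

// Copy gbr_frame's planes into frame_arr laid out as R-plane, G-plane,
// B-plane, each row-major. In GBRP, data[0] = G, data[1] = B, data[2] = R.
void gbrp_to_rgb_planes(const AVFrame* gbr_frame, unsigned char* frame_arr,
                        int width, int height)
{
    const int plane_order[3] = { 2, 0, 1 }; // source planes in R, G, B order
    for (int p = 0; p < 3; ++p) {
        const uint8_t* src = gbr_frame->data[plane_order[p]];
        for (int y = 0; y < height; ++y) {
            // linesize may include padding, so copy row by row
            std::memcpy(frame_arr + (p * height + y) * width,
                        src + y * gbr_frame->linesize[plane_order[p]],
                        width);
        }
    }
}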
I have several 32-bit bitmaps with an alpha channel.
I need to compose a new bitmap that again has an alpha channel, so the final bitmap can later be used with AlphaBlend.
There is no need for stretching. If there were no alpha channel, I would just use BitBlt to create the new bitmap.
I am not using managed code; I want to do this with standard GDI / WinAPI functions, and I am interested in a solution that does not need any special libraries.
TIA
Note: I know that I could use several AlphaBlend calls to do the same composition in the final output, but for ease of use in my program I would prefer to compose such a bitmap once.
You can go through every pixel and compose them manually:
void ComposeBitmaps(BITMAP* bitmaps, int bitmapCount, BITMAP& outputBitmap)
{
    for (int y = 0; y < outputBitmap.bmHeight; ++y)
    {
        for (int x = 0; x < outputBitmap.bmWidth; ++x)
        {
            int b = 0;
            int g = 0;
            int r = 0;
            int a = 0;
            for (int i = 0; i < bitmapCount; ++i)
            {
                unsigned char* samplePtr = (unsigned char*)bitmaps[i].bmBits + (y*outputBitmap.bmWidth + x)*4;
                b += samplePtr[0]*samplePtr[3];
                g += samplePtr[1]*samplePtr[3];
                r += samplePtr[2]*samplePtr[3];
                a += samplePtr[3];
            }
            unsigned char* outputSamplePtr = (unsigned char*)outputBitmap.bmBits + (y*outputBitmap.bmWidth + x)*4;
            if (a > 0)
            {
                outputSamplePtr[0] = b/a;
                outputSamplePtr[1] = g/a;
                outputSamplePtr[2] = r/a;
                outputSamplePtr[3] = a/bitmapCount;
            }
            else
                outputSamplePtr[3] = 0;
        }
    }
}
(Assuming all bitmaps are 32-bit and have the same width and height)
Or, if you want to draw bitmaps one on top of another, rather than mix them in equal proportions:
unsigned char* outputSamplePtr = (unsigned char*)outputBitmap.bmBits + (y*outputBitmap.bmWidth + x)*4;
outputSamplePtr[3] = 0;
for (int i = 0; i < bitmapCount; ++i)
{
    unsigned char* samplePtr = (unsigned char*)bitmaps[i].bmBits + (y*outputBitmap.bmWidth + x)*4;
    outputSamplePtr[0] = (outputSamplePtr[0]*outputSamplePtr[3]*(255-samplePtr[3]) + samplePtr[0]*samplePtr[3]*255)/(255*255);
    outputSamplePtr[1] = (outputSamplePtr[1]*outputSamplePtr[3]*(255-samplePtr[3]) + samplePtr[1]*samplePtr[3]*255)/(255*255);
    outputSamplePtr[2] = (outputSamplePtr[2]*outputSamplePtr[3]*(255-samplePtr[3]) + samplePtr[2]*samplePtr[3]*255)/(255*255);
    outputSamplePtr[3] = samplePtr[3] + outputSamplePtr[3]*(255-samplePtr[3])/255;
}
I found the following solution, which fits best for me:

1) Create a new target bitmap with CreateDIBSection.
2) Prefill the new bitmap with fully transparent pixels (FillMemory/ZeroMemory).
3) Retrieve the pixels that need to be copied with GetDIBits. If the width allows, copy the rows directly into the buffer created in step 1; otherwise copy the data row by row into that buffer.

The resulting bitmap can be used with AlphaBlend and in CImageList objects.
Because the bitmaps don't overlap, I don't need to take care of the target data.
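A rough sketch of steps 1 and 2 above (the helper name, dimensions, and error handling are mine, not from the original code):

#include <windows.h>

// Sketch: create a zeroed (fully transparent) 32-bit top-down DIB section.
// CreateTransparentTarget is a hypothetical helper name.
HBITMAP CreateTransparentTarget(int width, int height, void** bitsOut)
{
    BITMAPINFO bi = {};
    bi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bi.bmiHeader.biWidth       = width;
    bi.bmiHeader.biHeight      = -height; // negative = top-down row order
    bi.bmiHeader.biPlanes      = 1;
    bi.bmiHeader.biBitCount    = 32;
    bi.bmiHeader.biCompression = BI_RGB;

    void* bits = NULL;
    HBITMAP hTarget = CreateDIBSection(NULL, &bi, DIB_RGB_COLORS, &bits, NULL, 0);
    if (hTarget)
        ZeroMemory(bits, (size_t)width * height * 4); // step 2: fully transparent
    if (bitsOut)
        *bitsOut = bits;
    return hTarget;
}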
I have a set of image files, and I want to reduce the number of colors in them to 64. How can I do this with OpenCV?
I need this so I can work with a 64-sized image histogram.
I'm implementing CBIR techniques
What I want is color quantization to a 4-bit palette.
This subject was well covered in the OpenCV 2 Computer Vision Application Programming Cookbook:
Chapter 2 shows a few reduction operations, one of them demonstrated here in C++ and later in Python:
#include <iostream>
#include <vector>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
void colorReduce(cv::Mat& image, int div=64)
{
    int nl = image.rows;                    // number of lines
    int nc = image.cols * image.channels(); // number of elements per line

    for (int j = 0; j < nl; j++)
    {
        // get the address of row j
        uchar* data = image.ptr<uchar>(j);

        for (int i = 0; i < nc; i++)
        {
            // process each pixel
            data[i] = data[i] / div * div + div / 2;
        }
    }
}
int main(int argc, char* argv[])
{
    // Load input image (colored, 3-channel, BGR)
    cv::Mat input = cv::imread(argv[1]);
    if (input.empty())
    {
        std::cout << "!!! Failed imread()" << std::endl;
        return -1;
    }

    colorReduce(input);

    cv::imshow("Color Reduction", input);
    cv::imwrite("output.jpg", input);
    cv::waitKey(0);

    return 0;
}
Below you can find the input image (left) and the output of this operation (right):
The equivalent code in Python would be the following:
(credits to @eliezer-bernart)
import cv2
import numpy as np
input = cv2.imread('castle.jpg')
# colorReduce()
div = 64
quantized = input // div * div + div // 2
cv2.imwrite('output.jpg', quantized)
You might consider K-means, yet in this case it will most likely be extremely slow. A better approach might be doing this "manually" on your own. Let's say you have image of type CV_8UC3, i.e. an image where each pixel is represented by 3 RGB values from 0 to 255 (Vec3b). You might "map" these 256 values to only 4 specific values, which would yield 4 x 4 x 4 = 64 possible colors.
I've had a dataset, where I needed to make sure that dark = black, light = white and reduce the amount of colors of everything between. This is what I did (C++):
inline uchar reduceVal(const uchar val)
{
    if (val < 64) return 0;
    if (val < 128) return 64;
    return 255;
}

void processColors(cv::Mat& img)
{
    uchar* pixelPtr = img.data;
    for (int i = 0; i < img.rows; i++)
    {
        for (int j = 0; j < img.cols; j++)
        {
            const int pi = i*img.cols*3 + j*3;
            pixelPtr[pi + 0] = reduceVal(pixelPtr[pi + 0]); // B
            pixelPtr[pi + 1] = reduceVal(pixelPtr[pi + 1]); // G
            pixelPtr[pi + 2] = reduceVal(pixelPtr[pi + 2]); // R
        }
    }
}
causing [0,64) to become 0, [64,128) to become 64, and [128,255] to become 255, yielding 3 values per channel and thus 27 colors:
To me this seems neat, perfectly clear, and faster than anything else mentioned in the other answers.
You might also consider reducing these values to one of the multiples of some number, let's say:
inline uchar reduceVal(const uchar val)
{
    if (val < 192) return uchar(val / 64.0 + 0.5) * 64;
    return 255;
}
which would yield a set of 5 possible values: {0, 64, 128, 192, 255}, i.e. 125 colors.
There are many ways to do it. The methods suggested by jeff7 are OK, but they have some drawbacks:
Method 1 has parameters N and M that you must choose, and you must also convert the image to another colorspace.
Method 2 can be very slow, since you would have to compute a 16.7-million-bin histogram and sort it by frequency (to obtain the 64 most frequent values).
I like to use an algorithm based on the most significant bits of an RGB color to convert the image to a 64-color image. If you're using C/OpenCV, you can use something like the function below.
If you're working with gray-level images I recommend using the LUT() function of OpenCV 2.3, since it is faster. There is a tutorial on how to use LUT to reduce the number of colors (see: Tutorial: How to scan images, lookup tables...). However, I find it more complicated if you're working with RGB images.
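For illustration, a minimal sketch of the LUT approach, assuming the same divide-into-4-levels mapping used elsewhere in this thread (reduceWithLUT is a hypothetical name):

#include <opencv2/core/core.hpp>

cv::Mat reduceWithLUT(const cv::Mat& image)
{
    // Build a 256-entry table mapping each value to one of 4 levels;
    // cv::LUT applies the same table to every channel.
    cv::Mat lookUpTable(1, 256, CV_8U);
    uchar* p = lookUpTable.ptr();
    for (int i = 0; i < 256; ++i)
        p[i] = (uchar)(i / 64 * 64 + 32); // same mapping as colorReduce()
    cv::Mat reduced;
    cv::LUT(image, lookUpTable, reduced);
    return reduced;
}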
void reduceTo64Colors(IplImage *img, IplImage *img_quant)
{
    int i, j;
    int height = img->height;
    int width  = img->width;
    int step   = img->widthStep;

    uchar *data = (uchar *)img->imageData;
    int step2 = img_quant->widthStep;
    uchar *data2 = (uchar *)img_quant->imageData;

    for (i = 0; i < height; i++)
    {
        for (j = 0; j < width; j++)
        {
            // & 192 (= 11000000) keeps the 2 most significant bits of a channel;
            // the shifts then pack the three 2-bit values into one 6-bit index
            uchar C1 = (data[i*step + j*3 + 0] & 192) >> 2;
            uchar C2 = (data[i*step + j*3 + 1] & 192) >> 4;
            uchar C3 = (data[i*step + j*3 + 2] & 192) >> 6;

            data2[i*step2 + j] = C1 | C2 | C3; // merges the 2 MSBs of each channel
        }
    }
}
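Note that the output image passed to this function must be allocated as a single-channel 8-bit image of the same size, since each pixel stores a 6-bit color index. A usage sketch, assuming the legacy IplImage API as above:

IplImage* img_quant = cvCreateImage(cvGetSize(img), IPL_DEPTH_8U, 1);
reduceTo64Colors(img, img_quant);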
Here's a Python implementation of color quantization using K-Means Clustering with cv2.kmeans. The idea is to reduce the number of distinct colors in an image while preserving the color appearance of the image as much as possible. Here's the result:
Input -> Output
Code
import cv2
import numpy as np

def kmeans_color_quantization(image, clusters=8, rounds=1):
    h, w = image.shape[:2]
    samples = np.zeros([h*w, 3], dtype=np.float32)
    count = 0

    for x in range(h):
        for y in range(w):
            samples[count] = image[x][y]
            count += 1

    compactness, labels, centers = cv2.kmeans(samples,
            clusters,
            None,
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10000, 0.0001),
            rounds,
            cv2.KMEANS_RANDOM_CENTERS)

    centers = np.uint8(centers)
    res = centers[labels.flatten()]
    return res.reshape((image.shape))

image = cv2.imread('1.jpg')
result = kmeans_color_quantization(image, clusters=8)
cv2.imshow('result', result)
cv2.waitKey()
The answers suggested here are really good. I thought I would add my idea as well. I follow the formulation of many comments here, in which it is said that 64 colors can be represented by 2 bits of each channel in an RGB image.
The function in code below takes as input an image and the number of bits required for quantization. It uses bit manipulation to 'drop' the LSB bits and keep only the required number of bits. The result is a flexible method that can quantize the image to any number of bits.
#include "include\opencv\cv.h"
#include "include\opencv\highgui.h"
// quantize the image to numBits
cv::Mat quantizeImage(const cv::Mat& inImage, int numBits)
{
    cv::Mat retImage = inImage.clone();

    uchar maskBit = 0xFF;
    // keep the top numBits bits set; the (8 - numBits) bits to the right are all 0
    maskBit = maskBit << (8 - numBits);

    for (int j = 0; j < retImage.rows; j++)
        for (int i = 0; i < retImage.cols; i++)
        {
            cv::Vec3b valVec = retImage.at<cv::Vec3b>(j, i);
            valVec[0] = valVec[0] & maskBit;
            valVec[1] = valVec[1] & maskBit;
            valVec[2] = valVec[2] & maskBit;
            retImage.at<cv::Vec3b>(j, i) = valVec;
        }

    return retImage;
}

int main()
{
    cv::Mat inImage;
    inImage = cv::imread("testImage.jpg");
    char buffer[30];

    for (int i = 1; i <= 8; i++)
    {
        cv::Mat quantizedImage = quantizeImage(inImage, i);
        sprintf(buffer, "%d Bit Image", i);
        cv::imshow(buffer, quantizedImage);
        sprintf(buffer, "%d Bit Image.png", i);
        cv::imwrite(buffer, quantizedImage);
    }

    cv::waitKey(0);
    return 0;
}
Here is an image that is used in the above function call:
Image quantized to 2 bits for each RGB channel (Total 64 Colors):
3 bits for each channel:
4 bits ...
There is the K-means clustering algorithm which is already available in the OpenCV library. In short it determines which are the best centroids around which to cluster your data for a user-defined value of k ( = no of clusters). So in your case you could find the centroids around which to cluster your pixel values for a given value of k=64. The details are there if you google around. Here's a short intro to k-means.
Something similar to what you are probably trying was asked here on SO using k-means, hope it helps.
Another approach would be to use the pyramid mean shift filter function in OpenCV. It yields somewhat "flattened" images, i.e. the number of colors are less so it might be able to help you.
If you want a quick and dirty method in C++, in 1 line:
capImage &= cv::Scalar(0b11000000, 0b11000000, 0b11000000);
So, what it does is keep the upper 2 bits of each R, G, B component, and discards the lower 6 bits, hence the 0b11000000.
Because of the 3 channels in RGB, you get a maximum of 4 R x 4 G x 4 B = 64 colors. The advantage of doing this is that you can run this on any number of images and the same colors will be mapped.
Note that this can make your image a bit darker since it discards some bits.
For a greyscale image, you can do:
capImage &= 0b11111100;
This will keep the upper 6 bits, which means you get 64 grays out of 256, and again the image can become a bit darker.
Here's an example, original image = 251424 unique colors.
And the resulting image has 46 colors:
Assuming that you want to use the same 64 colors for all images (i.e. a palette not optimized per image), there are at least a couple of choices I can think of:
1) Convert to Lab or YCrCb colorspace and quantize using N bits for luminance and M bits for each color channel, N should be greater than M.
2) Compute a 3D histogram of color values over all your training images, then choose the 64 colors with the largest bin values. Quantize your images by assigning each pixel the color of the closest bin from the training set.
Method 1 is the most generic and easiest to implement, while method 2 can be better tailored to your specific dataset.
Update:
For example, 32 colors is 5 bits, so assign 3 bits to the luminance channel and 1 bit to each color channel. To do this quantization, do integer division of the luminance channel by 2^8/2^3 = 32 and of each color channel by 2^8/2^1 = 128. Now there are only 8 different luminance values and 2 different values per color channel. Recombine these values into a single integer using bit shifting or arithmetic (quantized color value = luminance*4 + color1*2 + color2).
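As an illustration, a sketch assuming the pixel has already been converted to YCrCb and is held in a cv::Vec3b (quantizeYCrCb is a hypothetical name):

#include <opencv2/core/core.hpp>

// Map a YCrCb pixel to one of 32 quantized colors:
// 3 bits of luminance, 1 bit per color channel.
int quantizeYCrCb(const cv::Vec3b& p)
{
    int lum    = p[0] / 32;  // 2^8 / 2^3 = 32  -> 8 luminance levels
    int color1 = p[1] / 128; // 2^8 / 2^1 = 128 -> 2 levels
    int color2 = p[2] / 128;
    return lum * 4 + color1 * 2 + color2; // index in [0, 31]
}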
A simple bitwise and with a proper bitmask would do the trick.
Python, for 64 colors:
img = img & int("11000000", 2)
For an RGB image, the number of colors should be a perfect cube (the same number of levels across the 3 channels).
For this method, the number of possible values per channel should be a power of 2. (The code below does not enforce this and silently uses the next lower power of 2.)
import numpy as np
import cv2 as cv

def is_cube(n):
    cbrt = np.cbrt(n)
    return cbrt ** 3 == n, int(cbrt)

def reduce_color_space(img, n_colors=64):
    n_valid, cbrt = is_cube(n_colors)

    if not n_valid:
        print("n_colors should be a perfect cube")
        return

    n_bits = int(np.log2(cbrt))

    if n_bits > 8:
        print("Can't generate more colors")
        return

    bitmask = int(f"{'1' * n_bits}{'0' * (8 - n_bits)}", 2)

    return img & bitmask

img = cv.imread("image.png")

cv.imshow("orig", img)
cv.imshow("reduced", reduce_color_space(img))
cv.waitKey(0)
img = numpy.multiply(img//32, 32)
Why don't you just do matrix multiplication/division? Values will be automatically rounded.

Pseudocode:

1) Convert your channels to unsigned characters (CV_8UC3).
2) Divide by total colors / desired colors: Mat = Mat / (256/64). Decimal points will be truncated.
3) Multiply by the same number: Mat = Mat * 4.

Done. Each channel now only contains 64 colors.
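A minimal OpenCV C++ sketch of these steps (assuming image is CV_8UC3; note that OpenCV's scalar division rounds rather than truncates, so results can differ slightly from pure integer division):

#include <opencv2/core/core.hpp>

cv::Mat reduceBy4(const cv::Mat& image) // image assumed CV_8UC3
{
    // divide then multiply: each channel collapses to 64 levels
    return (image / 4) * 4; // 256 / 64 = 4
}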
I need to emulate the window placement strategy of the Fluxbox window manager.
As a rough guide, visualize randomly sized windows filling up the screen one at a time, where the rough size of each results in an average of 80 windows on screen without any window overlapping another.
If you have Fluxbox and Xterm installed on your system, you can try the xwinmidiarptoy BASH script to see a rough prototype of what I want happening. See the xwinmidiarptoy.txt notes I've written about it explaining what it does and how it should be used.
It is important to note that windows will close and the space that closed windows previously occupied becomes available once more for the placement of new windows.
The algorithm needs to be an Online Algorithm processing data "piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start."
The Fluxbox window placement strategy has three binary options which I want to emulate:
Windows build horizontal rows or vertical columns (potentially)
Windows are placed from left to right or right to left
Windows are placed from top to bottom or bottom to top
Differences between the target algorithm and a window-placement algorithm
The coordinate units are not pixels. The grid within which blocks will be placed will be 128 x 128 units. Furthermore, the placement area may be further shrunk by a boundary area placed within the grid.
Why is the algorithm a problem?
It needs to operate to the deadlines of a real time thread in an audio application.
At this moment I am only concerned with getting a fast algorithm, don't concern yourself over the implications of real time threads and all the hurdles in programming that that brings.
And although the algorithm will never place a window that overlaps another, the user will be able to place and move certain types of blocks, so overlapping windows will exist. The data structure used for storing the windows and/or free space needs to be able to handle this overlap.
So far I have two choices which I have built loose prototypes for:
1) A port of the Fluxbox placement algorithm into my code.
The problem with this is, the client (my program) gets kicked out of the audio server (JACK) when I try placing the worst case scenario of 256 blocks using the algorithm. This algorithm performs over 14000 full (linear) scans of the list of blocks already placed when placing the 256th window.
For a demonstration of this I created a program called text_boxer-0.0.2.tar.bz2 which takes a text file as input and arranges it within ASCII boxes. Run make to build it. It is a little unfriendly; use --help (or any other invalid option) for a list of command line options. You must specify the text file using an option.
2) My alternative approach.
Only partially implemented, this approach uses a data structure for each area of rectangular free unused space (the list of windows can be entirely separate, and is not required for testing of this algorithm). The data structure acts as a node in a doubly linked list (with sorted insertion), as well as containing the coordinates of the top-left corner, and the width and height.
Furthermore, each block data structure also contains four links which connect to each immediately adjacent (touching) block on each of the four sides.
IMPORTANT RULE: Each block may only touch with one block per side. This is a rule specific to the algorithm's way of storing free unused space and bears no factor in how many actual windows may touch each other.
The problem with this approach is, it's very complex. I have implemented the straightforward cases where 1) space is removed from one corner of a block, 2) splitting neighbouring blocks so that the IMPORTANT RULE is adhered to.
The less straightforward case, where the space to be removed can only be found within a column or row of boxes, is only partially implemented - if one of the blocks to be removed is an exact fit for width (ie column) or height (ie row) then problems occur. And don't even mention the fact this only checks columns one box wide, and rows one box tall.
I've implemented this algorithm in C, the language I am using for this project (I've not used C++ for a few years and am uncomfortable using it after having focused all my attention on C development; it's a hobby). The implementation is 700+ lines of code (including plenty of blank lines, brace lines, comments, etc). The implementation only works for the horizontal-rows + left-right + top-bottom placement strategy.
So I've either got to add some way of making these 700+ lines of code work for the other 7 placement strategy options, or I'm going to have to duplicate those 700+ lines for the other seven options. Neither is attractive: the first because the existing code is complex enough, the second because of bloat.
The algorithm is not even at a stage where I can use it in the real time worst case scenario, because of missing functionality, so I still don't know if it actually performs better or worse than the first approach.
The current state of the C implementation of this algorithm is freespace.c. I use gcc -O0 -ggdb freespace.c to build, and run it in an xterm sized to at least 124 x 60 characters.
What else is there?
I've skimmed over and discounted:
Bin packing algorithms: their emphasis on optimal fit does not match the requirements of this algorithm.
Recursive bisection placement algorithms: these sound promising, but they are designed for circuit layout and their emphasis is optimal wire length.
In both of these, especially the latter, all elements to be placed are known before the algorithm begins.
What are your thoughts on this? How would you approach it? What other algorithms should I look at? Or even what concepts should I research seeing as I've not studied computer science/software engineering?
Please ask questions in comments if further information is needed.
Further ideas developed since asking this question
Some combination of my "alternative algorithm" with a spatial hashmap for identifying if a large window to be placed would cover several blocks of free space.
I would consider some kind of spatial hashing structure. Imagine your entire free space is gridded coarsely, call them blocks. As windows come and go, they occupy certain sets of contiguous rectangular blocks. For each block, keep track of the largest unused rectangle incident to each corner, so you need to store 2*4 real numbers per block. For an empty block, the rectangles at each corner have size equal to the block. Thus, a block can only be "used up" at its corners, and so at most 4 windows can sit in any block.
Now each time you add a window, you have to search for a rectangular set of blocks for which the window will fit, and when you do, update the free corner sizes. You should size your blocks so that a handful (~4x4) of them fit into a typical window. For each window, keep track of which blocks it touches (you only need to keep track of extents), as well as which windows touch a given block (at most 4, in this algorithm). There is an obvious tradeoff between the granularity of the blocks and the amount of work per window insertion/removal.
When removing a window, loop over all blocks it touches, and for each block, recompute the free corner sizes (you know which windows touch it). This is fast since the inner loop is at most length 4.
I imagine a data structure like
struct block {
    int free_x[4];    // 0 = top left, 1 = top right,
    int free_y[4];    // 2 = bottom left, 3 = bottom right
    int n_windows;    // number of windows that occupy this block
    int window_id[4]; // IDs of windows that occupy this block
};

block blocks[NX][NY];

struct window {
    int id;
    int used_block_x[2]; // 0 = first index of used block,
    int used_block_y[2]; // 1 = last index of used block
};
Edit
Here is a picture:
It shows two example blocks. The colored dots indicate the corners of the block, and the arrows emanating from them indicate the extents of the largest-free-rectangle from that corner.
You mentioned in the edit that the grid on which your windows will be placed is already quite coarse (127x127), so the block sizes would probably be something like 4 grid cells on a side, which probably wouldn't gain you much. This method is suitable if your window corner coordinates can take on a lot of values (I was thinking they would be pixels), but not so much in your case. You can still try it, since it's simple. You would probably want to also keep a list of completely empty blocks so that if a window comes in that is larger than 2 block widths, then you look first in that list before looking for some suitable free space in the block grid.
After some false starts, I have eventually arrived here. This is where the use of data structures for storing rectangular areas of free space has been abandoned; instead, there is a 2d array with 128 x 128 elements to achieve the same result, but with much less complexity.
The following function scans the array for an area width * height in size. When it finds one, it writes the top-left coordinates of the first such position to where resultx and resulty point.
_Bool freespace_remove( freespace* fs,
                        int width, int height,
                        int* resultx, int* resulty)
{
    int x = 0;
    int y = 0;
    const int rx = FSWIDTH - width;
    const int by = FSHEIGHT - height;

    *resultx = -1;
    *resulty = -1;

    char* buf[height];

    for (y = 0; y < by; ++y)
    {
        x = 0;
        char* scanx = fs->buf[y];

        while (x < rx)
        {
            while (x < rx && *(scanx + x))
                ++x;

            int w, h;

            for (h = 0; h < height; ++h)
                buf[h] = fs->buf[y + h] + x;

            _Bool usable = true;
            w = 0;

            while (usable && w < width)
            {
                h = 0;
                while (usable && h < height)
                    if (*(buf[h++] + w))
                        usable = false;
                ++w;
            }

            if (usable)
            {
                for (w = 0; w < width; ++w)
                    for (h = 0; h < height; ++h)
                        *(buf[h] + w) = 1;

                *resultx = x;
                *resulty = y;
                return true;
            }

            x += w;
        }
    }
    return false;
}
The 2d array is zero initialized. Any areas in the array where the space is used are set to 1. This structure and function will work independently from the actual list of windows that are occupying the areas marked with 1's.
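The freespace type for this simple version is not shown in the post; for reference, a minimal definition consistent with the function above would be:

typedef struct freespace
{
    char buf[FSHEIGHT][FSWIDTH]; /* FSWIDTH = FSHEIGHT = 128, zero-initialized */
} freespace;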
The advantage of this method is its simplicity: it only uses one data structure, an array. The function is short and should not be too difficult to adapt to handle the remaining placement options (here it only handles Row Smart + Left to Right + Top to Bottom).
My initial tests also look promising on the speed front. Though I don't think this would be suitable for a window manager placing windows on, for example, a 1600 x 1200 desktop with pixel accuracy, for my purposes I believe it is going to be much better than any of the previous methods I have tried.
Compilable test code here:
http://jwm-art.net/art/text/freespace_grid.c
(in Linux I use gcc -ggdb -O0 freespace_grid.c to compile)
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#define FSWIDTH 128
#define FSHEIGHT 128
#ifdef USE_64BIT_ARRAY
#define FSBUFBITS 64
#define FSBUFWIDTH 2
typedef uint64_t fsbuf_type;
#define TRAILING_ZEROS( v ) __builtin_ctzl(( v ))
#define LEADING_ONES( v ) __builtin_clzl(~( v ))
#else
#ifdef USE_32BIT_ARRAY
#define FSBUFBITS 32
#define FSBUFWIDTH 4
typedef uint32_t fsbuf_type;
#define TRAILING_ZEROS( v ) __builtin_ctz(( v ))
#define LEADING_ONES( v ) __builtin_clz(~( v ))
#else
#ifdef USE_16BIT_ARRAY
#define FSBUFBITS 16
#define FSBUFWIDTH 8
typedef uint16_t fsbuf_type;
#define TRAILING_ZEROS( v ) __builtin_ctz( 0xffff0000 | ( v ))
#define LEADING_ONES( v ) __builtin_clz(~( v ) << 16)
#else
#ifdef USE_8BIT_ARRAY
#define FSBUFBITS 8
#define FSBUFWIDTH 16
typedef uint8_t fsbuf_type;
#define TRAILING_ZEROS( v ) __builtin_ctz( 0xffffff00 | ( v ))
#define LEADING_ONES( v ) __builtin_clz(~( v ) << 24)
#else
#define FSBUFBITS 1
#define FSBUFWIDTH 128
typedef unsigned char fsbuf_type;
#define TRAILING_ZEROS( v ) (( v ) ? 0 : 1)
#define LEADING_ONES( v ) (( v ) ? 1 : 0)
#endif
#endif
#endif
#endif
static const fsbuf_type fsbuf_max  = ~(fsbuf_type)0;
static const fsbuf_type fsbuf_high = (fsbuf_type)1 << (FSBUFBITS - 1);

typedef struct freespacegrid
{
    fsbuf_type buf[FSHEIGHT][FSBUFWIDTH];
    _Bool left_to_right;
    _Bool top_to_bottom;
} freespace;
void freespace_dump(freespace* fs)
{
    int x, y;

    for (y = 0; y < FSHEIGHT; ++y)
    {
        for (x = 0; x < FSBUFWIDTH; ++x)
        {
            fsbuf_type i = FSBUFBITS;
            fsbuf_type b = fs->buf[y][x];

            for (; i != 0; --i, b <<= 1)
                putchar(b & fsbuf_high ? '#' : '/');
            /*
            if (x + 1 < FSBUFWIDTH)
                putchar('|');
            */
        }
        putchar('\n');
    }
}
freespace* freespace_new(void)
{
    freespace* fs = malloc(sizeof(*fs));

    if (!fs)
        return 0;

    int y;

    for (y = 0; y < FSHEIGHT; ++y)
    {
        memset(&fs->buf[y][0], 0, sizeof(fsbuf_type) * FSBUFWIDTH);
    }

    fs->left_to_right = true;
    fs->top_to_bottom = true;

    return fs;
}

void freespace_delete(freespace* fs)
{
    if (!fs)
        return;
    free(fs);
}
/* would be private function: */
void fs_set_buffer( fsbuf_type buf[FSHEIGHT][FSBUFWIDTH],
                    unsigned x,
                    unsigned y1,
                    unsigned xoffset,
                    unsigned width,
                    unsigned height)
{
    fsbuf_type v;
    unsigned y;

    for (; width > 0 && x < FSBUFWIDTH; ++x)
    {
        if (width < xoffset)
            v = (((fsbuf_type)1 << width) - 1) << (xoffset - width);
        else if (xoffset < FSBUFBITS)
            v = ((fsbuf_type)1 << xoffset) - 1;
        else
            v = fsbuf_max;

        for (y = y1; y < y1 + height; ++y)
        {
#ifdef FREESPACE_DEBUG
            if (buf[y][x] & v)
                printf("**** over-writing area ****\n");
#endif
            buf[y][x] |= v;
        }

        if (width < xoffset)
            return;

        width -= xoffset;
        xoffset = FSBUFBITS;
    }
}
_Bool freespace_remove( freespace* fs,
                        unsigned width, unsigned height,
                        int* resultx, int* resulty)
{
    unsigned x, x1, y;
    unsigned w, h;
    unsigned xoffset, x1offset;
    unsigned tz; /* trailing zeros */

    fsbuf_type* xptr;
    fsbuf_type mask = 0;
    fsbuf_type v;

    _Bool scanning = false;
    _Bool offset = false;

    *resultx = -1;
    *resulty = -1;

    for (y = 0; y < (unsigned) FSHEIGHT - height; ++y)
    {
        scanning = false;
        xptr = &fs->buf[y][0];

        for (x = 0; x < FSBUFWIDTH; ++x, ++xptr)
        {
            if (*xptr == fsbuf_max)
            {
                scanning = false;
                continue;
            }

            if (!scanning)
            {
                scanning = true;
                x1 = x;
                x1offset = xoffset = FSBUFBITS;
                w = width;
            }
retry:
            if (w < xoffset)
                mask = (((fsbuf_type)1 << w) - 1) << (xoffset - w);
            else if (xoffset < FSBUFBITS)
                mask = ((fsbuf_type)1 << xoffset) - 1;
            else
                mask = fsbuf_max;

            offset = false;

            for (h = 0; h < height; ++h)
            {
                v = fs->buf[y + h][x] & mask;
                if (v)
                {
                    tz = TRAILING_ZEROS(v);
                    offset = true;
                    break;
                }
            }

            if (offset)
            {
                if (tz)
                {
                    x1 = x;
                    w = width;
                    x1offset = xoffset = tz;
                    goto retry;
                }
                scanning = false;
            }
            else
            {
                if (w <= xoffset) /***** RESULT! *****/
                {
                    fs_set_buffer(fs->buf, x1, y, x1offset, width, height);
                    *resultx = x1 * FSBUFBITS + (FSBUFBITS - x1offset);
                    *resulty = y;
                    return true;
                }
                w -= xoffset;
                xoffset = FSBUFBITS;
            }
        }
    }
    return false;
}
int main(int argc, char** argv)
{
    int x[1999];
    int y[1999];
    int w[1999];
    int h[1999];
    int i;

    freespace* fs = freespace_new();

    for (i = 0; i < 1999; ++i)
    {
        w[i] = rand() % 18 + 4;
        h[i] = rand() % 18 + 4;
        freespace_remove(fs, w[i], h[i], &x[i], &y[i]);
        /*
        freespace_dump(fs);
        printf("w:%d h:%d x:%d y:%d\n", w[i], h[i], x[i], y[i]);
        if (x[i] == -1)
            printf("not removed space %d\n", i);
        getchar();
        */
    }

    freespace_dump(fs);
    freespace_delete(fs);
    return 0;
}
The above code requires one of USE_64BIT_ARRAY, USE_32BIT_ARRAY, USE_16BIT_ARRAY, USE_8BIT_ARRAY to be defined otherwise it will fall back to using only the high bit of an unsigned char for storing the state of grid cells.
The function fs_set_buffer will not be declared in the header, and will become static within the implementation when this code gets split between .h and .c files. A more user friendly function hiding the implementation details will be provided for removing used space from the grid.
Overall, this implementation is faster without optimization than my previous answer with maximum optimization (using GCC on 64bit Gentoo, optimization options -O0 and -O3 respectively).
Regarding USE_NNBIT_ARRAY and the different bit sizes, I used two different methods of timing the code which make 1999 calls to freespace_remove.
Timing main() using the Unix time command (and disabling any output in the code) seemed to prove my expectations correct - that higher bit sizes are faster.
On the other hand, timing individual calls to freespace_remove (using gettimeofday) and comparing the maximum time taken over the 1999 calls seemed to indicate lower bit sizes were faster.
This has only been tested on a 64bit system (Intel Dual Core II).
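For reference, the per-call timing described above can be sketched like this (POSIX gettimeofday; the variable names are illustrative and match the test loop in main):

#include <sys/time.h> /* gettimeofday */

struct timeval t0, t1;
long usec, max_usec = 0;

gettimeofday(&t0, NULL);
freespace_remove(fs, w[i], h[i], &x[i], &y[i]);
gettimeofday(&t1, NULL);

usec = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
if (usec > max_usec)
    max_usec = usec; /* worst case over the 1999 calls */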