How do convolution matrices work? - image

How do those matrices work? Do I need to multiply every single pixel? How about the upper-left, upper-right, bottom-left and bottom-right pixels, where there's no surrounding pixel? And does the matrix work from left to right and then from top to bottom, or from top to bottom first and then left to right?
Why does this kernel (Edge enhance): http://i.stack.imgur.com/d755G.png
turn into this image: http://i.stack.imgur.com/NRdkK.jpg

The convolution filter is applied to every single pixel.
At the edges there are a few things you can do (each either leaves a kind of border or shrinks the image):
skip the edges and crop 1 pixel from each edge of the image
substitute 0 or 255 for any pixel that is out of bounds for the image
use a cubic spline (or another interpolation method) between 0 (or 255) and the value of the image's edge pixel to come up with a substitute.
The order in which you apply the convolution does not matter (upper left to bottom right is most common); you should get the same results no matter the order.
However, a common mistake when applying a convolution matrix is to overwrite the current pixel you are examining with the new value. This will affect the value you come up with for the pixel next to the current one. A better method would be to create a buffer to hold the computed values, so that previous applications of the convolution filter do not affect current application of the matrix.
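A minimal sketch of the buffered approach, in Python rather than any language used in the question. It assumes a single-channel image stored as a list of rows, a square kernel with odd side length, and the "substitute 0" edge rule; the name convolve_buffered is made up for illustration. Like most image code it does not flip the kernel, which makes no difference for symmetric kernels.

```python
def convolve_buffered(image, kernel, fill=0):
    """Apply a square, odd-sized kernel to a 2D list of grayscale values.

    Out-of-bounds neighbours are replaced with `fill`. Results are written
    to a separate buffer, so earlier outputs never feed later ones.
    """
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    output = [[0] * width for _ in range(height)]  # separate result buffer
    for y in range(height):
        for x in range(width):
            total = 0
            for ky, row in enumerate(kernel):
                for kx, weight in enumerate(row):
                    sy, sx = y + ky - radius, x + kx - radius
                    inside = 0 <= sy < height and 0 <= sx < width
                    total += weight * (image[sy][sx] if inside else fill)
            output[y][x] = total
    return output


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(convolve_buffered(image, identity))   # same values as the input
print(convolve_buffered(image, box)[1][1])  # 45, the sum of all nine pixels
```

Dividing each sum by the kernel total (9 for the box kernel) and clamping to 0..255 would turn the raw sums into displayable pixel values.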
From your example images it is hard to tell why the filter applied creates the black and white version without seeing the original image.

Below is a step by step example of applying a convolution kernel to an image (1D for simplicity).
As for the edge enhancement kernel in your post, notice the +1 next to the -1. Think about what that will do. If the region is constant, the two pixels under the +/-1 will add to zero (black). If the two pixels are different, they will produce a non-zero value. So what you are seeing is that neighboring pixels that differ get highlighted, while ones that are the same get set to black. The bigger the difference, the brighter (more white) the pixel in the filtered image.
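The effect of a +1 next to a -1 can be seen in a small 1D sketch. The two-tap kernel [-1, 1] and the sample row are hypothetical stand-ins, not the kernel or image from the question, and filter_1d is a made-up helper that only keeps positions where the kernel fits entirely.

```python
def filter_1d(signal, kernel):
    """Slide `kernel` over `signal`, keeping only positions where it fully fits."""
    span = len(kernel)
    return [
        sum(weight * signal[i + j] for j, weight in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


row = [10, 10, 10, 200, 200, 200]  # a flat dark region, then a flat bright one
print(filter_1d(row, [-1, 1]))     # [0, 0, 190, 0, 0]
```

Only the position that straddles the jump from 10 to 200 is non-zero. A falling edge would give a negative value, which a real filter would have to clamp or take the absolute value of before display.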

Yes, you multiply every pixel by that matrix. The traditional method is to find the relevant pixels relative to the pixel being convolved, multiply them by the factors, and average the result. So a 3x3 blur of:
1, 1, 1,
1, 1, 1,
1, 1, 1
This matrix means you take the relevant values of each color component, multiply them by the matching factors, add them up, and then divide by the number of elements. So you would take that 3 by 3 box, add up all the red values, then divide by 9. You'd do the same with the green values, and again with the blue values.
This means a couple of things. First, you need a second giant chunk of memory to perform this operation. Second, you do this for every pixel you can.
However, that's only for the traditional method, and the traditional method is actually needlessly convoluted (get it?). If you return the results in a corner, you never actually need any additional memory and can always do the entire operation within the memory footprint you started with.
public static void convolve(int[] pixels, int offset, int stride, int x, int y, int width, int height, int[][] matrix, int parts) {
int index = offset + x + (y*stride);
for (int j = 0; j < height; j++, index += stride) {
for (int k = 0; k < width; k++) {
int pos = index + k;
pixels[pos] = convolve(pixels,stride,pos, matrix, parts);
}
}
}
private static int crimp(int color) {
return (color >= 0xFF) ? 0xFF : (color < 0) ? 0 : color;
}
private static int convolve(int[] pixels, int stride, int index, int[][] matrix, int parts) {
int redSum = 0;
int greenSum = 0;
int blueSum = 0;
int pixel, factor;
for (int j = 0, m = matrix.length; j < m; j++, index+=stride) {
for (int k = 0, n = matrix[j].length; k < n; k++) {
pixel = pixels[index + k];
factor = matrix[j][k];
redSum += factor * ((pixel >> 16) & 0xFF);
greenSum += factor * ((pixel >> 8) & 0xFF);
blueSum += factor * ((pixel) & 0xFF);
}
}
return 0xFF000000 | ((crimp(redSum / parts) << 16) | (crimp(greenSum / parts) << 8) | (crimp(blueSum / parts)));
}
The kernel traditionally returns its value to the centermost pixel. This allows the image to blur around the edges but more or less remain where it started. This seemed like a good idea, but it's actually problematic. The correct way to do it is to put the result pixel in the upper-left corner. Then you can simply, and with no extra memory, iterate over the entire image with a scanline, going one pixel at a time and returning the value, without causing errors. The bulk of the color weight is shifted up and left by one pixel. But it's one pixel, and you can shift it back down and to the right if you iterate backwards with the result pixel in the bottom-right, though this might be trouble for cache hits.
However, a lot of modern architectures have GPUs now, so the entire image can be processed simultaneously, making it a kind of moot point. Still, it is strange that one of the most important algorithms in graphics is weird in requiring this, as that makes the easiest way to do the operation impossible, and a memory hog.
This is why people like Matt on this question say things like "However, a common mistake when applying a convolution matrix is to overwrite the current pixel you are examining with the new value." -- Really this is the correct way to do it; the error is writing the result pixel to the center rather than the upper-left corner. Because unlike the upper-left corner, you will need the center pixel again. You won't ever need the upper-left corner again (assuming you are iterating left->right, top->bottom), and so it's safe to store your value there.
"This will affect the value you come up with for the pixel next to the current one." -- If you wrote it to the upper left corner as you processed it as a scan, you would overwrite data that you do not ever need again. Using a bunch of extra memory isn't a better solution.
As such, here's likely the fastest Java blur you'd ever see.
private static void applyBlur(int[] pixels, int stride) {
int v0, v1, v2, r, g, b;
int pos;
pos = 0;
try {
while (true) {
v0 = pixels[pos];
v1 = pixels[pos+1];
v2 = pixels[pos+2];
r = ((v0 >> 16) & 0xFF) + ((v1 >> 16) & 0xFF) + ((v2 >> 16) & 0xFF);
g = ((v0 >> 8 ) & 0xFF) + ((v1 >> 8) & 0xFF) + ((v2 >> 8) & 0xFF);
b = ((v0 ) & 0xFF) + ((v1 ) & 0xFF) + ((v2 ) & 0xFF);
r/=3;
g/=3;
b/=3;
pixels[pos++] = r << 16 | g << 8 | b;
}
}
catch (ArrayIndexOutOfBoundsException e) { }
pos = 0;
try {
while (true) {
v0 = pixels[pos];
v1 = pixels[pos+stride];
v2 = pixels[pos+stride+stride];
r = ((v0 >> 16) & 0xFF) + ((v1 >> 16) & 0xFF) + ((v2 >> 16) & 0xFF);
g = ((v0 >> 8 ) & 0xFF) + ((v1 >> 8) & 0xFF) + ((v2 >> 8) & 0xFF);
b = ((v0 ) & 0xFF) + ((v1 ) & 0xFF) + ((v2 ) & 0xFF);
r/=3;
g/=3;
b/=3;
pixels[pos++] = r << 16 | g << 8 | b;
}
}
catch (ArrayIndexOutOfBoundsException e) { }
}
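The claim in this answer, that a result stored at the upper-left corner of the window can overwrite the source without corrupting later results, can be checked with a small sketch. It is Python rather than Java, works on a single-channel 2D list, and only processes windows that fit inside the image; box_blur_top_left and make_grid are made-up names.

```python
def box_blur_top_left(image, in_place):
    """3x3 box blur whose result is stored at the window's upper-left pixel.

    Scans left to right, top to bottom. The last two rows and columns are
    left unchanged because no full window starts there.
    """
    height, width = len(image), len(image[0])
    # Read from the image itself, or from an untouched copy for comparison.
    source = image if in_place else [row[:] for row in image]
    for y in range(height - 2):
        for x in range(width - 2):
            total = sum(source[y + j][x + k] for j in range(3) for k in range(3))
            image[y][x] = total // 9
    return image


def make_grid():
    """Arbitrary 6x5 test pattern."""
    return [[(7 * x + 13 * y) % 256 for x in range(6)] for y in range(5)]


overwritten = box_blur_top_left(make_grid(), in_place=True)
buffered = box_blur_top_left(make_grid(), in_place=False)
print(overwritten == buffered)  # True
```

Every pixel already written lies to the left of or above the current window, so the in-place scan never reads a value it has produced. The cost, as the answer notes, is that the output is shifted up and left by one pixel.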

Related

Is it possible to know the brightness of a picture in Flutter?

I am building an application which has a Camera inside.
After I take a photo, I want to analyze it to determine its brightness; if it is bad, I have to take the photo again.
This is my code right now. It's a JavaScript function that I found and am rewriting in Dart:
Thanks to @Abion47
EDIT 1
for (int i = 0; i < pixels.length; i++) {
int pixel = pixels[i];
int b = (pixel & 0x00FF0000) >> 16;
int g = (pixel & 0x0000FF00) >> 8;
int r = (pixel & 0x000000FF);
avg = ((r + g + b) / 3).floor();
colorSum += avg;
}
brightness = (colorSum / (width * height)).floor();
}
brightness = (colorSum / (width * height)).round();
// I tried with this other code
//brightness = (colorSum / pixels.length).round();
return brightness;
But I've got less brightness on white than black, the numbers are a little bit weird.
Do you know a better way to know the brightness?
SOLUTION:
After further investigation we found the solution: we had an error in the image decoding, so we used an Image function to do it.
Here is our final code:
Image image = decodeImage(file.readAsBytesSync());
var data = image.getBytes();
var colorSum = 0;
for(var x = 0; x < data.length; x += 4) {
int r = data[x];
int g = data[x + 1];
int b = data[x + 2];
int avg = ((r + g + b) / 3).floor();
colorSum += avg;
}
var brightness = (colorSum / (image.width * image.height)).floor();
return brightness;
Hope it helps you.
There are several things wrong with your code.
First, you are getting a range error because you are attempting to access a pixel that doesn't exist. This is probably due to width and/or height being greater than the image's actual width or height. There are a lot of ways to try and get these values, but for this application it doesn't actually matter since the end result is to get an average value across all pixels in the image, and you don't need the width or height of the image for that.
Second, you are fetching the color values by serializing the color value into a hex string and then parsing the individual channel substrings. Your substring is going to result in incorrect values because:
foo.substring(a, b) takes the substring of foo from a to b, exclusive. That means that a and b are indices, not lengths, and the resulting string will not include the character at b. So assuming hex is "01234567", when you do hex.substring(0, 2), you get "01", and then you do hex.substring(3, 5) you get "34" while hex.substring(6, 8) gets you "67". You need to do hex.substring(0, 2) followed by hex.substring(2, 4) and hex.substring(4, 6) to get the first three channels.
That being said, you are fetching the wrong channels. The image package stores its pixel values in ABGR format, meaning the first two characters in the hex string are going to be the alpha channel, which is unimportant when calculating image brightness. Instead, you want the second, third, and fourth channels for the blue, green, and red values respectively.
And having said all that, this is an extremely inefficient way to do this anyway when the preferred way to retrieve channel data from an integer color value is with bitwise operations on the integer itself. (Never convert a number to a string or vice versa unless you absolutely have to.)
So in summary, what you want will likely be something akin to the following:
final pixels = image.data;
double colorSum = 0;
for (int i = 0; i < pixels.length; i++) {
  int pixel = pixels[i];
  // Pixels are packed as ABGR, so red sits in the lowest byte
  int b = (pixel & 0x00FF0000) >> 16;
  int g = (pixel & 0x0000FF00) >> 8;
  int r = (pixel & 0x000000FF);
  double avg = (r + g + b) / 3;
  colorSum += avg;
}
return colorSum / pixels.length;
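The same calculation can be tried outside Flutter. This is a Python sketch, not Dart, and it assumes each pixel is a 32-bit integer laid out as 0xAABBGGRR as described above; average_brightness is a made-up name.

```python
def average_brightness(pixels):
    """Mean of (r + g + b) / 3 over pixels packed as 0xAABBGGRR."""
    total = 0.0
    for pixel in pixels:
        b = (pixel >> 16) & 0xFF
        g = (pixel >> 8) & 0xFF
        r = pixel & 0xFF
        total += (r + g + b) / 3  # alpha (top byte) is ignored
    return total / len(pixels)


WHITE = 0xFFFFFFFF
BLACK = 0xFF000000
print(average_brightness([WHITE]))         # 255.0
print(average_brightness([BLACK]))         # 0.0
print(average_brightness([WHITE, BLACK]))  # 127.5
```

White scores higher than black here, which is the sanity check the original code failed.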

Why Am I Getting a Negative Number from this code in processing?

Please help, I cannot figure out why this code prints a negative number instead of a value from 0 - 255.
void setup() {
size(1024, 1024);
background(random(0,255), random(0,255), random(0,255));
println(get(1,1));
}
The reason is described in the Processing documentation:
From a technical standpoint, colors are 32 bits of information ordered
as AAAAAAAARRRRRRRRGGGGGGGGBBBBBBBB where the A's contain the alpha
value, the R's are the red value, G's are green, and B's are blue.
Each component is 8 bits (a number between 0 and 255). These values
can be manipulated with bit shifting.
The color type is secretly an int. The second reference, on bit shifting, describes how you can use this value to extract alpha, red, green and blue:
// Using "right shift" as a faster technique than red(), green(), and blue()
color argb = color(204, 204, 51, 255);
int a = (argb >> 24) & 0xFF;
int r = (argb >> 16) & 0xFF; // Faster way of getting red(argb)
int g = (argb >> 8) & 0xFF; // Faster way of getting green(argb)
int b = argb & 0xFF; // Faster way of getting blue(argb)
fill(r, g, b, a);
rect(30, 20, 55, 55);
>> - shift bits to right, i.e. cut last 24bits
& 0xFF - get only last 8 bits
and according to the authors, this is a faster method than red(), green(), and blue() functions.
If you want to compute this value by yourself, I tried to recreate the calculation:
int a = 255;
int r = 204;
int g = 204;
int b = 51;
int res = ((-128*int(a/128)+int(a%128))<<24) | (r << 16) | (g << 8) | b;
((-128*int(a/128)+int(a%128))<<24) - Alpha is in the range [0;255]. In their solution alpha is transformed into the range [-128;127], wrapping around on overflow, i.e. 126=>126, 127=>127, 128=>-128, 129=>-127 and so on up to 255=>-1
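The negative number is simply the 32-bit ARGB pattern read as a signed integer. A short Python sketch of that reading (Python integers are unbounded, so the wrap to 32 bits is done by hand; to_signed32 and pack_argb are made-up helpers, not Processing functions):

```python
def to_signed32(value):
    """Interpret the low 32 bits of `value` as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def pack_argb(a, r, g, b):
    """Pack four 0..255 components as AAAAAAAARRRRRRRRGGGGGGGGBBBBBBBB."""
    return to_signed32((a << 24) | (r << 16) | (g << 8) | b)


color = pack_argb(255, 204, 204, 51)
print(color)                # -3355597
print((color >> 16) & 0xFF)  # 204, the red component
print(pack_argb(127, 204, 204, 51) > 0)  # True: top bit of alpha is clear
```

Any alpha of 128 or more sets the top bit, so every opaque color prints as a negative number, while the shift-and-mask extraction still recovers the original components.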
Questions like this are best answered by looking at the reference.
The get() function returns a color type. Behind the scenes, color is represented by an int value, which is what you're seeing when you print it. You usually don't need to worry about this int value, and you can just use the color type directly, for example by passing into the fill() function.
But if you want the RGB values from a color value, you can use the red(), green(), and blue() functions.

Algorithm for bit expansion/duplication?

Is there an efficient (fast) algorithm that will perform bit expansion/duplication?
For example, expand each bit in an 8bit value by 3 (creating a 24bit value):
1101 0101 => 11111100 01110001 11000111
The brute force method that has been proposed is to create a lookup table. In the future, the expansion value may need to be variable. That is, in the above example we are expanding by 3 but may need to expand by some other value(s). This would require multiple lookup tables that I'd like to avoid if possible.
There is a chance to make it quicker than lookup table if arithmetic calculations are for some reason faster than memory access. This may be possible if calculations are vectorized (PPC AltiVec or Intel SSE) and/or if other parts of the program need to use every bit of cache memory.
If expansion factor = 3, only 7 instructions are needed:
out = (((in * 0x101 & 0x0F00F) * 0x11 & 0x0C30C3) * 5 & 0x249249) * 7;
Or other alternative, with 10 instructions:
out = (in | in << 8) & 0x0F00F;
out = (out | out << 4) & 0x0C30C3;
out = (out | out << 2) & 0x249249;
out *= 7;
For other expansion factors >= 3:
unsigned mask = 0x0FF;
unsigned out = in;
for (scale = 4; scale != 0; scale /= 2)
{
shift = scale * (N - 1);
mask &= ~(mask << scale);
mask |= mask << (scale * N);
out = out * ((1 << shift) + 1) & mask;
}
out *= (1 << N) - 1;
Or other alternative, for expansion factors >= 2:
unsigned mask = 0x0FF;
unsigned out = in;
for (scale = 4; scale != 0; scale /= 2)
{
shift = scale * (N - 1);
mask &= ~(mask << scale);
mask |= mask << (scale * N);
out = (out | out << shift) & mask;
}
out *= (1 << N) - 1;
shift and mask values are better to be calculated prior to bit stream processing.
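For checking the loop above, here is a Python sketch of the shift-and-OR variant next to a one-bit-at-a-time reference. It assumes an 8-bit input and N >= 2; expand_bits and expand_bits_naive are made-up names, and Python's unbounded integers stand in for a sufficiently wide unsigned type.

```python
def expand_bits(value, n):
    """Repeat each of the 8 bits of `value` n times (n >= 2)."""
    mask = 0xFF
    out = value & 0xFF
    scale = 4
    while scale:
        shift = scale * (n - 1)
        mask &= ~(mask << scale)      # keep the lower half of each group
        mask |= mask << (scale * n)   # and add its new, spread-out position
        out = (out | (out << shift)) & mask
        scale //= 2
    # Each input bit now sits at position n*i; fill the n-1 bits above it.
    return out * ((1 << n) - 1)


def expand_bits_naive(value, n):
    """Reference version: place a run of n ones for every set input bit."""
    out = 0
    for i in range(8):
        if value & (1 << i):
            out |= ((1 << n) - 1) << (i * n)
    return out


print(f"{expand_bits(0b11010101, 3):024b}")  # 111111000111000111000111
```

The printed value matches the 1101 0101 example in the question, and the two functions can be compared over all 256 inputs for any expansion factor of interest.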
You can do it one input bit at a time. Of course, it will be slower than a lookup table, but if you're doing something like writing for a tiny, 8-bit microcontroller without enough room for a table, it should have the smallest possible ROM footprint.

How to reduce the number of colors in an image with OpenCV?

I have a set of image files, and I want to reduce the number of colors of them to 64. How can I do this with OpenCV?
I need this so I can work with a 64-sized image histogram.
I'm implementing CBIR techniques
What I want is color quantization to a 6-bit palette (64 colors).
This subject was well covered on OpenCV 2 Computer Vision Application Programming Cookbook:
Chapter 2 shows a few reduction operations, one of them demonstrated here in C++ and later in Python:
#include <iostream>
#include <vector>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
void colorReduce(cv::Mat& image, int div=64)
{
int nl = image.rows; // number of lines
int nc = image.cols * image.channels(); // number of elements per line
for (int j = 0; j < nl; j++)
{
// get the address of row j
uchar* data = image.ptr<uchar>(j);
for (int i = 0; i < nc; i++)
{
// process each pixel
data[i] = data[i] / div * div + div / 2;
}
}
}
int main(int argc, char* argv[])
{
// Load input image (colored, 3-channel, BGR)
cv::Mat input = cv::imread(argv[1]);
if (input.empty())
{
std::cout << "!!! Failed imread()" << std::endl;
return -1;
}
colorReduce(input);
cv::imshow("Color Reduction", input);
cv::imwrite("output.jpg", input);
cv::waitKey(0);
return 0;
}
Below you can find the input image (left) and the output of this operation (right):
The equivalent code in Python would be the following:
(credits to @eliezer-bernart)
import cv2
import numpy as np
input = cv2.imread('castle.jpg')
# colorReduce()
div = 64
quantized = input // div * div + div // 2
cv2.imwrite('output.jpg', quantized)
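To see what the value // div * div + div // 2 formula does to a single channel, it can be run over all 256 possible values without OpenCV. This is a plain-Python sketch; reduce_channel is a made-up helper name.

```python
def reduce_channel(value, div=64):
    """Snap a 0..255 channel value to the middle of its div-wide bucket."""
    return value // div * div + div // 2


levels = sorted({reduce_channel(v) for v in range(256)})
print(levels)            # [32, 96, 160, 224]
print(len(levels) ** 3)  # 64 possible colors across three channels
```

With div = 64 each channel collapses to four levels, so a three-channel image can contain at most 4 x 4 x 4 = 64 distinct colors. Adding div // 2 centers each level in its bucket, which keeps the overall brightness closer to the original than truncation alone.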
You might consider K-means, yet in this case it will most likely be extremely slow. A better approach might be doing this "manually" on your own. Let's say you have image of type CV_8UC3, i.e. an image where each pixel is represented by 3 RGB values from 0 to 255 (Vec3b). You might "map" these 256 values to only 4 specific values, which would yield 4 x 4 x 4 = 64 possible colors.
I've had a dataset, where I needed to make sure that dark = black, light = white and reduce the amount of colors of everything between. This is what I did (C++):
inline uchar reduceVal(const uchar val)
{
if (val < 64) return 0;
if (val < 128) return 64;
return 255;
}
void processColors(Mat& img)
{
uchar* pixelPtr = img.data;
for (int i = 0; i < img.rows; i++)
{
for (int j = 0; j < img.cols; j++)
{
const int pi = i*img.cols*3 + j*3;
pixelPtr[pi + 0] = reduceVal(pixelPtr[pi + 0]); // B
pixelPtr[pi + 1] = reduceVal(pixelPtr[pi + 1]); // G
pixelPtr[pi + 2] = reduceVal(pixelPtr[pi + 2]); // R
}
}
}
causing [0,64) to become 0, [64,128) -> 64 and [128,255] -> 255, yielding 27 colors:
To me this seems to be neat, perfectly clear and faster than anything else mentioned in other answers.
You might also consider reducing these values to one of the multiples of some number, let's say:
inline uchar reduceVal(const uchar val)
{
if (val < 192) return uchar(val / 64.0 + 0.5) * 64;
return 255;
}
which would yield a set of 5 possible values: {0, 64, 128, 192, 255}, i.e. 125 colors.
There are many ways to do it. The methods suggested by jeff7 are OK, but some drawbacks are:
method 1 has parameters N and M that you must choose, and you must also convert the image to another colorspace.
method 2 can be very slow, since you have to compute a histogram with 16.7 million bins and sort it by frequency (to obtain the 64 most frequent values).
I like to use an algorithm based on the most significant bits of an RGB color to convert it to a 64-color image. If you're using C/OpenCV, you can use something like the function below.
If you're working with gray-level images I recommend using the LUT() function of OpenCV 2.3, since it is faster. There is a tutorial on how to use LUT to reduce the number of colors. See: Tutorial: How to scan images, lookup tables... However, I find it more complicated if you're working with RGB images.
void reduceTo64Colors(IplImage *img, IplImage *img_quant) {
int i,j;
int height = img->height;
int width = img->width;
int step = img->widthStep;
uchar *data = (uchar *)img->imageData;
int step2 = img_quant->widthStep;
uchar *data2 = (uchar *)img_quant->imageData;
for (i = 0; i < height ; i++) {
for (j = 0; j < width; j++) {
// operator XXXXXXXX & 11000000 equivalent to XXXXXXXX AND 11000000 (=192)
// operator 01000000 >> 2 is a 2-bit shift to the right = 00010000
uchar C1 = (data[i*step+j*3+0] & 192)>>2;
uchar C2 = (data[i*step+j*3+1] & 192)>>4;
uchar C3 = (data[i*step+j*3+2] & 192)>>6;
data2[i*step2+j] = C1 | C2 | C3; // merges the 2 MSB of each channel
}
}
}
Here's a Python implementation of color quantization using K-Means Clustering with cv2.kmeans. The idea is to reduce the number of distinct colors in an image while preserving the color appearance of the image as much as possible. Here's the result:
Input -> Output
Code
import cv2
import numpy as np
def kmeans_color_quantization(image, clusters=8, rounds=1):
    h, w = image.shape[:2]
    samples = np.zeros([h*w,3], dtype=np.float32)
    count = 0
    for x in range(h):
        for y in range(w):
            samples[count] = image[x][y]
            count += 1
    compactness, labels, centers = cv2.kmeans(samples,
            clusters,
            None,
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10000, 0.0001),
            rounds,
            cv2.KMEANS_RANDOM_CENTERS)
    centers = np.uint8(centers)
    res = centers[labels.flatten()]
    return res.reshape((image.shape))
image = cv2.imread('1.jpg')
result = kmeans_color_quantization(image, clusters=8)
cv2.imshow('result', result)
cv2.waitKey()
The answers suggested here are really good. I thought I would add my idea as well. I follow the formulation of many comments here, in which it is said that 64 colors can be represented by 2 bits of each channel in an RGB image.
The function in code below takes as input an image and the number of bits required for quantization. It uses bit manipulation to 'drop' the LSB bits and keep only the required number of bits. The result is a flexible method that can quantize the image to any number of bits.
#include "include\opencv\cv.h"
#include "include\opencv\highgui.h"
// quantize the image to numBits
cv::Mat quantizeImage(const cv::Mat& inImage, int numBits)
{
cv::Mat retImage = inImage.clone();
uchar maskBit = 0xFF;
// keep numBits as 1 and (8 - numBits) would be all 0 towards the right
maskBit = maskBit << (8 - numBits);
for(int j = 0; j < retImage.rows; j++)
for(int i = 0; i < retImage.cols; i++)
{
cv::Vec3b valVec = retImage.at<cv::Vec3b>(j, i);
valVec[0] = valVec[0] & maskBit;
valVec[1] = valVec[1] & maskBit;
valVec[2] = valVec[2] & maskBit;
retImage.at<cv::Vec3b>(j, i) = valVec;
}
return retImage;
}
int main ()
{
cv::Mat inImage;
inImage = cv::imread("testImage.jpg");
char buffer[30];
for(int i = 1; i <= 8; i++)
{
cv::Mat quantizedImage = quantizeImage(inImage, i);
sprintf(buffer, "%d Bit Image", i);
cv::imshow(buffer, quantizedImage);
sprintf(buffer, "%d Bit Image.png", i);
cv::imwrite(buffer, quantizedImage);
}
cv::waitKey(0);
return 0;
}
Here is an image that is used in the above function call:
Image quantized to 2 bits for each RGB channel (Total 64 Colors):
3 bits for each channel:
4 bits ...
There is the K-means clustering algorithm which is already available in the OpenCV library. In short it determines which are the best centroids around which to cluster your data for a user-defined value of k ( = no of clusters). So in your case you could find the centroids around which to cluster your pixel values for a given value of k=64. The details are there if you google around. Here's a short intro to k-means.
Something similar to what you are probably trying was asked here on SO using k-means, hope it helps.
Another approach would be to use the pyramid mean shift filter function in OpenCV. It yields somewhat "flattened" images, i.e. the number of colors are less so it might be able to help you.
If you want a quick and dirty method in C++, in 1 line:
capImage &= cv::Scalar(0b11000000, 0b11000000, 0b11000000);
So, what it does is keep the upper 2 bits of each R, G, B component, and discards the lower 6 bits, hence the 0b11000000.
Because of the 3 channels in RGB, you get a maximum of 4 R x 4 G x 4 B = 64 colors. The advantage of doing this is that you can run this on any number of images and the same colors will be mapped.
Note that this can make your image a bit darker since it discards some bits.
For a greyscale image, you can do:
capImage &= 0b11111100;
This will keep the upper 6 bits, which means you get 64 grays out of 256, and again the image can become a bit darker.
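The darkening mentioned above follows directly from the mask, as a plain-Python sketch on individual channel values shows (no OpenCV involved; mask_pixel is a made-up helper):

```python
MASK = 0b11000000  # keep the upper 2 bits of each channel


def mask_pixel(pixel):
    """Apply the 2-bit mask to every channel of a (b, g, r) tuple."""
    return tuple(channel & MASK for channel in pixel)


levels = sorted({v & MASK for v in range(256)})
print(levels)                       # [0, 64, 128, 192]
print(mask_pixel((255, 255, 255)))  # (192, 192, 192)
```

Every value is rounded down to the bottom of its bucket, so pure white becomes light gray. Adding half a bucket (32 here) after masking is one way to compensate, at the cost of pure black no longer mapping to 0.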
Here's an example, original image = 251424 unique colors.
And the resulting image has 46 colors:
Assuming that you want to use the same 64 colors for all images (i.e. a palette not optimized per image), there are at least a couple of choices I can think of:
1) Convert to Lab or YCrCb colorspace and quantize using N bits for luminance and M bits for each color channel, N should be greater than M.
2) Compute a 3D histogram of color values over all your training images, then choose the 64 colors with the largest bin values. Quantize your images by assigning each pixel the color of the closest bin from the training set.
Method 1 is the most generic and easiest to implement, while method 2 can be better tailored to your specific dataset.
Update:
For example, 32 colors is 5 bits, so assign 3 bits to the luminance channel and 1 bit to each color channel. To do this quantization, do integer division of the luminance channel by 2^8/2^3 = 32 and of each color channel by 2^8/2^1 = 128. Now there are only 8 different luminance values and 2 different values for each color channel. Recombine these values into a single integer using bit shifting or math (quantized color value = luminance*4+color1*2+color2).
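The arithmetic in that update can be sketched for a single pixel. This Python example assumes the pixel has already been converted to a luminance value and two color values, each in 0..255; the conversion to Lab or YCrCb itself is not shown, and quantize_32 is a made-up name.

```python
def quantize_32(luma, color1, color2):
    """Map an 8-bit (luma, color1, color2) triple to a 5-bit index 0..31."""
    l = luma // 32      # 3 bits: 8 luminance levels
    c1 = color1 // 128  # 1 bit
    c2 = color2 // 128  # 1 bit
    return l * 4 + c1 * 2 + c2


print(quantize_32(0, 0, 0))        # 0
print(quantize_32(255, 255, 255))  # 31
print(quantize_32(100, 200, 50))   # 3*4 + 1*2 + 0 = 14
```

The resulting index can be used directly as a histogram bin, which is what the 64-bin case in the question needs with a different bit split (for example 2 bits per channel, or 4 for luminance and 1 for each color channel).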
A simple bitwise and with a proper bitmask would do the trick.
python, for 64 colors,
img = img & int("11000000", 2)
The number of colors for an RGB image should be a perfect cube (same across 3 channels).
For this method, the number of possible values for a channel should be a power of 2. (This check is ignored by the code and the next lower power of 2 is taken by it)
import numpy as np
import cv2 as cv
def is_cube(n):
    # Round before cubing so floating-point error cannot hide a perfect cube
    cbrt = int(round(np.cbrt(n)))
    return cbrt ** 3 == n, cbrt
def reduce_color_space(img, n_colors=64):
    n_valid, cbrt = is_cube(n_colors)
    if not n_valid:
        print("n_colors should be a perfect cube")
        return
    # Number of bits kept per channel (next lower power of 2 if cbrt is not one)
    n_bits = int(np.log2(cbrt))
    if n_bits > 8:
        print("Can't generate more colors")
        return
    bitmask = int(f"{'1' * n_bits}{'0' * (8 - n_bits)}", 2)
    return img & bitmask
img = cv.imread("image.png")
cv.imshow("orig", img)
cv.imshow("reduced", reduce_color_space(img))
cv.waitKey(0)
img = numpy.multiply(img//32, 32)
Why don't you just do Matrix multiplication/division? Values will be automatically rounded.
Pseudocode:
Convert your channels to unsigned characters (CV_8UC3).
Divide by total colors / desired colors: Mat = Mat / (256/64). Decimal points will be truncated.
Multiply by the same number: Mat = Mat * 4.
Done. Each channel now only contains 64 colors.

Calculate the Hilbert value of a point for use in a Hilbert R-Tree?

I have an application where a Hilbert R-Tree (wikipedia) (citeseer) would seem to be an appropriate data structure. Specifically, it requires reasonably fast spatial queries over a data set that will experience a lot of updates.
However, as far as I can see, none of the descriptions of the algorithms for this data structure even mention how to actually calculate the requisite Hilbert Value; which is the distance along a Hilbert Curve to the point.
So any suggestions for how to go about calculating this?
Fun question!
I did a bit of googling, and the good news is, I've found an implementation of Hilbert Value.
The potentially bad news is, it's in Haskell...
http://www.serpentine.com/blog/2007/01/11/two-dimensional-spatial-hashing-with-space-filling-curves/
It also proposes a Lebesgue distance metric you might be able to compute more easily.
Below is my java code adapted from C code in the paper "Encoding and decoding the Hilbert order" by Xian Lu and Gunther Schrack, published in Software: Practice and Experience Vol. 26 pp 1335-46 (1996).
Hope this helps. Improvements welcome !
Michael
/**
 * Find the Hilbert order (=vertex index) for the given grid cell
 * coordinates.
 * @param x cell column (from 0)
 * @param y cell row (from 0)
 * @param r resolution of Hilbert curve (grid will have Math.pow(2,r)
 * rows and cols)
 * @return Hilbert order
 */
public static int encode(int x, int y, int r) {
int mask = (1 << r) - 1;
int hodd = 0;
int heven = x ^ y;
int notx = ~x & mask;
int noty = ~y & mask;
int temp = notx ^ y;
int v0 = 0, v1 = 0;
for (int k = 1; k < r; k++) {
v1 = ((v1 & heven) | ((v0 ^ noty) & temp)) >> 1;
v0 = ((v0 & (v1 ^ notx)) | (~v0 & (v1 ^ noty))) >> 1;
}
hodd = (~v0 & (v1 ^ x)) | (v0 & (v1 ^ noty));
return interleaveBits(hodd, heven);
}
/**
 * Interleave the bits from two input integer values
 * @param odd integer holding bit values for odd bit positions
 * @param even integer holding bit values for even bit positions
 * @return the integer that results from interleaving the input bits
 *
 * @todo: I'm sure there's a more elegant way of doing this !
 */
private static int interleaveBits(int odd, int even) {
int val = 0;
// Replaced this line with the improved code provided by Tuska
// int n = Math.max(Integer.highestOneBit(odd), Integer.highestOneBit(even));
int max = Math.max(odd, even);
int n = 0;
while (max > 0) {
n++;
max >>= 1;
}
for (int i = 0; i < n; i++) {
int bitMask = 1 << i;
int a = (even & bitMask) > 0 ? (1 << (2*i)) : 0;
int b = (odd & bitMask) > 0 ? (1 << (2*i+1)) : 0;
val += a + b;
}
return val;
}
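For comparison, here is a Python sketch of a different, commonly cited way to compute a 2D Hilbert index: walk from the coarsest quadrant to the finest, rotating and flipping the coordinates at each level. This is not the Lu and Schrack method used in the Java code above, and the curve it traces may be oriented differently, so the two should not be expected to return identical numbers. hilbert_index is a made-up name.

```python
def hilbert_index(order, x, y):
    """Distance along a Hilbert curve covering a 2**order by 2**order grid."""
    side = 1 << order
    d = 0
    s = side >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        # Rotate/flip so the sub-quadrant is traversed in standard orientation.
        if ry == 0:
            if rx == 1:
                x = side - 1 - x
                y = side - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Order-1 curve visits (0,0), (0,1), (1,1), (1,0).
print([hilbert_index(1, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])  # [0, 1, 2, 3]
```

A useful sanity check for any Hilbert implementation is that every cell gets a distinct index and that cells with consecutive indices are always direct neighbours on the grid.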
See uzaygezen.
The code and java code above are fine for 2D data points. But for higher dimensions you may need to look at Jonathan Lawder's paper: J.K.Lawder. Calculation of Mappings Between One and n-dimensional Values Using the Hilbert Space-filling Curve.
I figured out a slightly more efficient way to interleave bits. It can be found at the Stanford Graphics Website. I included a version that I created that can interleave two 32 bit integers into one 64 bit long.
public static long spreadBits32(int y) {
long[] B = new long[] {
0x5555555555555555L,
0x3333333333333333L,
0x0f0f0f0f0f0f0f0fL,
0x00ff00ff00ff00ffL,
0x0000ffff0000ffffL,
0x00000000ffffffffL
};
int[] S = new int[] { 1, 2, 4, 8, 16, 32 };
long x = y;
x = (x | (x << S[5])) & B[5];
x = (x | (x << S[4])) & B[4];
x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];
return x;
}
public static long interleave64(int x, int y) {
return spreadBits32(x) | (spreadBits32(y) << 1);
}
Obviously, the B and S local variables should be class constants but it was left this way for simplicity.
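The same spread-and-mask steps can be cross-checked against a bit-by-bit loop. This is a Python sketch that mirrors the masks of the Java version above; the function names are made up, and Python's unbounded integers take the place of Java's long.

```python
def spread_bits(value):
    """Move bit i of a 32-bit value to bit 2*i, leaving zeros in between."""
    x = value & 0xFFFFFFFF
    x = (x | (x << 16)) & 0x0000FFFF0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0F
    x = (x | (x << 2)) & 0x3333333333333333
    x = (x | (x << 1)) & 0x5555555555555555
    return x


def interleave(x, y):
    """x supplies the even bit positions, y the odd ones."""
    return spread_bits(x) | (spread_bits(y) << 1)


def interleave_naive(x, y):
    """Reference version: copy one bit at a time."""
    out = 0
    for i in range(32):
        out |= ((x >> i) & 1) << (2 * i)
        out |= ((y >> i) & 1) << (2 * i + 1)
    return out


print(bin(interleave(0b11, 0b00)))  # 0b101
print(bin(interleave(0b00, 0b11)))  # 0b1010
```

Interleaving x and y directly like this gives the Z-order (Morton) value; the Hilbert code above interleaves two derived words instead of the raw coordinates.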
Michael,
thanks for your Java code! I tested it and it seems to work fine, but I noticed that the bit-interleaving function overflows at recursion level 7 (at least in my tests, but I used long values), because the "n"-value is calculated using highestOneBit()-function, which returns the value and not the position of the highest one bit; so the loop does unnecessarily many interleavings.
I just changed it to the following snippet, and after that it worked fine.
int max = Math.max(odd, even);
int n = 0;
while (max > 0) {
n++;
max >>= 1;
}
If you need a spatial index with fast delete/insert capabilities, have a look at the PH-tree. It is partly based on quadtrees but is faster and more space efficient. Internally it uses a Z-curve, which has slightly worse spatial properties than an H-curve but is much easier to calculate.
Paper: http://www.globis.ethz.ch/script/publication/download?docid=699
Java implementation: http://globis.ethz.ch/files/2014/11/ph-tree-2014-11-10.zip
Another option is the X-tree, which is also available here:
https://code.google.com/p/xxl/
Suggestion: A good simple efficient data structure for spatial queries is a multidimensional binary tree.
In a traditional binary tree, there is one "discriminant"; the value that's used to determine whether you take the left branch or the right branch. This can be considered to be the one-dimensional case.
In a multidimensional binary tree, you have multiple discriminants; consecutive levels use different discriminants. For example, for two-dimensional spatial data, you could use the X and Y coordinates as discriminants. Consecutive levels would use X, Y, X, Y...
For spatial queries (for example finding all nodes within a rectangle) you do a depth-first search of the tree starting at the root, and you use the discriminant at each level to avoid searching down branches that contain no nodes in the given rectangle.
This allows you to potentially cut the search space in half at each level, making it very efficient for finding small regions in a massive data set. (BTW, this data structure is also useful for partial-match queries, i.e. queries that omit one or more discriminants. You just search down both branches at levels with an omitted discriminant.)
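A minimal Python sketch of such a tree for 2D points, with a rectangle query that prunes branches using the discriminant at each level. It builds the tree once from a static list by splitting at the median; insertion, deletion and rebalancing are left out, and the function names and dictionary layout are made up for illustration.

```python
def build(points, depth=0):
    """Build a 2D tree; levels alternate between x (axis 0) and y (axis 1)."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build(points[:mid], depth + 1),       # values <= discriminant
        "right": build(points[mid + 1:], depth + 1),  # values >= discriminant
    }


def range_query(node, lo, hi, depth=0):
    """Return all points p with lo[i] <= p[i] <= hi[i] on both axes."""
    if node is None:
        return []
    axis = depth % 2
    point = node["point"]
    found = []
    if lo[0] <= point[0] <= hi[0] and lo[1] <= point[1] <= hi[1]:
        found.append(point)
    # Skip a branch when the rectangle lies entirely on the other side.
    if lo[axis] <= point[axis]:
        found += range_query(node["left"], lo, hi, depth + 1)
    if point[axis] <= hi[axis]:
        found += range_query(node["right"], lo, hi, depth + 1)
    return found


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(sorted(range_query(tree, lo=(3, 1), hi=(8, 5))))  # [(5, 4), (7, 2), (8, 1)]
```

For a workload with many updates, as in the question, a static median build like this would need periodic rebuilding or a balancing scheme to stay efficient.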
A good paper on this data structure: http://portal.acm.org/citation.cfm?id=361007
This article has good diagrams and algorithm descriptions: http://en.wikipedia.org/wiki/Kd-tree

Resources