MPSImageIntegral returning all zeros - macOS

I am trying to use MPSImageIntegral to calculate the sum of some elements in an MTLTexture. This is what I'm doing:
std::vector<float> integralSumData;
for (int i = 0; i < 10; i++)
    integralSumData.push_back((float)i);

MTLTextureDescriptor *textureDescriptor =
    [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatR32Float
                                                       width:integralSumData.size()
                                                      height:1
                                                   mipmapped:NO];
textureDescriptor.usage = MTLTextureUsageShaderRead | MTLTextureUsageShaderWrite;
id<MTLTexture> texture = [_device newTextureWithDescriptor:textureDescriptor];

// Calculate the number of bytes per row in the image.
NSUInteger bytesPerRow = integralSumData.size() * sizeof(float);
MTLRegion region =
{
    { 0, 0, 0 },                      // MTLOrigin
    { integralSumData.size(), 1, 1 }  // MTLSize
};

// Copy the bytes from the data object into the texture.
[texture replaceRegion:region
           mipmapLevel:0
             withBytes:integralSumData.data()
           bytesPerRow:bytesPerRow];

MTLTextureDescriptor *textureDescriptor2 =
    [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatR32Float
                                                       width:integralSumData.size()
                                                      height:1
                                                   mipmapped:NO];
textureDescriptor2.usage = MTLTextureUsageShaderRead | MTLTextureUsageShaderWrite;
id<MTLTexture> outtexture = [_device newTextureWithDescriptor:textureDescriptor2];

// Create an MPS filter.
MPSImageIntegral *integral = [[MPSImageIntegral alloc] initWithDevice:_device];
MPSOffset offset = { 0, 0, 0 };
[integral setOffset:offset];
[integral setEdgeMode:MPSImageEdgeModeZero];
[integral encodeToCommandBuffer:commandBuffer sourceTexture:texture destinationTexture:outtexture];

[commandBuffer commit];
[commandBuffer waitUntilCompleted];
But when I check my outtexture values, they're all zeros. Am I doing something wrong? Is this the correct way to use MPSImageIntegral?
I'm using the following code to read the values written into outtexture:
float outData[100];
[outtexture getBytes:outData bytesPerRow:bytesPerRow fromRegion:region mipmapLevel:0];
for (int i = 0; i < 100; i++)
    std::cout << outData[i] << "\n";
Thanks

As pointed out by @Matthijs: all I had to do was use an MTLBlitCommandEncoder to synchronize my MTLTexture before reading it back on the CPU, and it worked like a charm!
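For reference, a minimal sketch of that fix, assuming the textures use the default MTLStorageModeManaged on macOS; the blit is encoded after the MPS kernel and before the commit:
// Encode a blit that flushes the GPU's writes to the managed texture
// back to CPU-visible memory before the command buffer completes.
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
[blitEncoder synchronizeResource:outtexture];
[blitEncoder endEncoding];

[commandBuffer commit];
[commandBuffer waitUntilCompleted];
// getBytes:bytesPerRow:fromRegion:mipmapLevel: now sees the GPU results.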

Related

How to repeat a tensor in libtorch

I am using libtorch for inference. I have read data from a txt file into a vector and converted it to a tensor. I want to repeat the tensor three times and then reshape it to 3D.
I tried this:
std::vector<std::vector<float>> feature_data(255, std::vector<float>(221));
ifstream f_data("../data.txt");
if (!f_data) {
    cout << "Error, file couldn't be opened" << endl;
    return 1;
}
for (int i = 0; i < 255; i++)
{
    for (int j = 0; j < 221; j++)
    {
        if (!f_data)
        {
            std::cout << "read error" << std::endl;
            break;
        }
        f_data >> feature_data[i][j];
    }
}

auto data_options = torch::TensorOptions().dtype(at::kFloat);
auto feature_tensor = torch::zeros({255, 221}, data_options);
for (int i = 0; i < 255; i++)
    feature_tensor.slice(0, i, i + 1) = torch::from_blob(feature_data[i].data(), {221}, data_options);

// begin to repeat three times
auto tensor_clone = feature_tensor.clone();
auto one_time_clone = torch::cat({feature_tensor, tensor_clone}, 0);
auto two_times_clone = torch::cat({one_time_clone, tensor_clone}, 0);
auto transformed_asr = two_times_clone.view({3, 255, 221});
This looks cumbersome and I am not sure it is right. Is there an easier way?
...
for (int i = 0; i < 255; i++)
    feature_tensor.slice(0, i, i + 1) = torch::from_blob(feature_data[i].data(), {221}, data_options);
auto tensor_clone = feature_tensor.repeat({3, 1});
...
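For what it's worth, if the target shape is {3, 255, 221}, a sketch of an alternative that skips the final view by adding the leading axis first (same tensor names as above):
// {255, 221} -> {1, 255, 221} -> {3, 255, 221}
auto transformed_asr = feature_tensor.unsqueeze(0).repeat({3, 1, 1});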

Issue in plotting the resultant bitmap of two bitmaps' difference

I want to compare one bitmap against another (reference) bitmap and draw all of their differences into a resultant bitmap.
Using the code below I am able to draw only the difference area, but not with its exact colors.
Here is my code:
Bitmap ResultantBitMap = new Bitmap(bitMap1.Width, bitMap1.Height);
BitmapData bitMap1Data = bitMap1.LockBits(new Rectangle(0, 0, bitMap1.Width, bitMap1.Height),
    System.Drawing.Imaging.ImageLockMode.ReadOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
BitmapData bitMap2Data = bitMap2.LockBits(new Rectangle(0, 0, bitMap2.Width, bitMap2.Height),
    System.Drawing.Imaging.ImageLockMode.ReadOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
BitmapData bitMapResultantData = ResultantBitMap.LockBits(new Rectangle(0, 0, ResultantBitMap.Width, ResultantBitMap.Height),
    System.Drawing.Imaging.ImageLockMode.ReadWrite, System.Drawing.Imaging.PixelFormat.Format32bppArgb);

IntPtr scan0 = bitMap1Data.Scan0;
IntPtr scan02 = bitMap2Data.Scan0;
IntPtr scan0ResImg1 = bitMapResultantData.Scan0;
int bitMap1Stride = bitMap1Data.Stride;
int bitMap2Stride = bitMap2Data.Stride;
int ResultantImageStride = bitMapResultantData.Stride;

for (int y = 0; y < bitMap1.Height; y++)
{
    // define the pointers inside the first loop for parallelizing
    byte* p = (byte*)scan0.ToPointer();
    p += y * bitMap1Stride;
    byte* p2 = (byte*)scan02.ToPointer();
    p2 += y * bitMap2Stride;
    byte* pResImg1 = (byte*)scan0ResImg1.ToPointer();
    pResImg1 += y * ResultantImageStride;

    for (int x = 0; x < bitMap1.Width; x++)
    {
        // always copy the complete pixel when a difference is found
        if (Math.Abs(p[0] - p2[0]) >= 20 || Math.Abs(p[1] - p2[1]) >= 20 || Math.Abs(p[2] - p2[2]) >= 20)
        {
            pResImg1[0] = p2[0]; // B
            pResImg1[1] = p2[1]; // G
            pResImg1[2] = p2[2]; // R
            pResImg1[3] = p2[3]; // A (opacity)
        }
        p += 4;
        p2 += 4;
        pResImg1 += 4;
    }
}

bitMap1.UnlockBits(bitMap1Data);
bitMap2.UnlockBits(bitMap2Data);
ResultantBitMap.UnlockBits(bitMapResultantData);
ResultantBitMap.Save(@"c:\abcd\abcd.jpeg");
What I want is a difference image that preserves the exact colors of the reference image.
It's hard to tell what's going on without knowing what all those library calls and "+= 4"s do, but are you sure p and p2 correspond to the first and second images of your diagram?
Also, note that with "Format32bppArgb" the channel bytes are laid out B, G, R, A in memory on little-endian machines, so index [0] is blue, not red or alpha. Maybe there's a problem with that, too.

NSColors returned by NSBitmapImageRep colorAtX:y: keep growing on the heap

I'm comparing the "color distance" between two images with the same width and height to see how similar they are - the measure of similarity is just comparing them pixel by pixel and seeing how far each of their color channels are from one another.
- (NSNumber *)calculateFitness:(NSImage *)currentImage
           andDestinationImage:(NSImage *)destinationImage {
    NSData *tiffData = [currentImage TIFFRepresentation];
    NSBitmapImageRep *currentImageRep = [NSBitmapImageRep imageRepWithData:tiffData];
    NSData *destinationImageTiffData = [destinationImage TIFFRepresentation];
    NSBitmapImageRep *destinationImageRep = [NSBitmapImageRep imageRepWithData:destinationImageTiffData];
    long fitnessScore = 0;
    for (int width = 0; width < currentImageRep.size.width; width++) {
        for (int height = 0; height < currentImageRep.size.height; height++) {
            NSColor *destinationColor = [destinationImageRep colorAtX:width y:height];
            NSColor *currentColor = [currentImageRep colorAtX:width y:height];
            CGFloat deltaRed = (currentColor.redComponent - destinationColor.redComponent) * 255;
            CGFloat deltaGreen = (currentColor.greenComponent - destinationColor.greenComponent) * 255;
            CGFloat deltaBlue = (currentColor.blueComponent - destinationColor.blueComponent) * 255;
            fitnessScore += (deltaRed * deltaRed) +
                            (deltaGreen * deltaGreen) +
                            (deltaBlue * deltaBlue);
        }
    }
    return @(fitnessScore);
}
I call this method many, many times in my program to compare the fitness of thousands of images to one another. What I'm noticing in Instruments is that the number of living NSCalibratedRGBColor objects keeps growing, and it's due to the destinationColor and currentColor objects created by NSBitmapImageRep colorAtX:y: above. Eventually, my entire system memory will be consumed.
So - is there a reason why this happens? What am I doing wrong? Is there a more efficient way to get the raw bitmap data for my images?
Thanks
Mustafa
You might get better performance by using the raw bitmap data. NSBitmapImageRep's -colorAtX:y: (and -getPixel:atX:y:) are quite slow if you're walking all the image data. Also, all the NSColors allocated will be held in the autorelease pool until your app returns to the main loop.
unsigned char *currentData = [currentImageRep bitmapData];
unsigned char *destinationData = [destinationImageRep bitmapData];
NSUInteger width = [currentImageRep pixelsWide];
NSUInteger height = [currentImageRep pixelsHigh];
NSUInteger currentBytesPerRow = [currentImageRep bytesPerRow];
NSUInteger destBytesPerRow = [destinationImageRep bytesPerRow];
long fitnessScore = 0; // accumulate as in the original method
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        // assumes 4 bytes per pixel (e.g. RGBA)
        unsigned char *srcPixel = currentData + (x * 4) + (y * currentBytesPerRow);
        unsigned char *destPixel = destinationData + (x * 4) + (y * destBytesPerRow);
        unsigned char sr = *srcPixel;
        unsigned char sg = *(srcPixel + 1);
        unsigned char sb = *(srcPixel + 2);
        unsigned char dr = *destPixel;
        unsigned char dg = *(destPixel + 1);
        unsigned char db = *(destPixel + 2);
        CGFloat deltaRed = sr - dr;
        CGFloat deltaGreen = sg - dg;
        CGFloat deltaBlue = sb - db;
        fitnessScore += (deltaRed * deltaRed) +
                        (deltaGreen * deltaGreen) +
                        (deltaBlue * deltaBlue);
    }
}
I wrote https://medium.com/@iainx/fast-colour-analysis-d8b6422c1135 on doing fast colour analysis, and this was one of the things I discovered.
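If you do stick with colorAtX:y:, wrapping the inner loop in an explicit autorelease pool keeps the accumulated NSColor objects from outliving each pass; a minimal sketch of that workaround:
for (int x = 0; x < currentImageRep.pixelsWide; x++) {
    @autoreleasepool {
        // drain once per column so autoreleased NSColors stay bounded
        for (int y = 0; y < currentImageRep.pixelsHigh; y++) {
            NSColor *destinationColor = [destinationImageRep colorAtX:x y:y];
            NSColor *currentColor = [currentImageRep colorAtX:x y:y];
            // ... accumulate fitnessScore as before ...
        }
    }
}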

Parallelizing with OpenMP - how?

I want to parallelize a raytracing algorithm with OpenMP. The algorithm contains two for loops.
Is there anything more I can do than just setting omp_set_num_threads(omp_get_max_threads()) and putting #pragma omp parallel for in front of the first for loop?
So far I've achieved a 2.13x speedup.
Code:
start = omp_get_wtime();
#pragma omp parallel for
for (int i = 0; i < (viewport.xvmax - viewport.xvmin); i++)
{
    for (int j = 0; j < (viewport.yvmax - viewport.yvmin); j++)
    {
        int intersection_object = -1;           // none
        int reflected_intersection_object = -1; // none
        double current_lambda = 0x7fefffffffffffff;           // maximum positive double
        double current_reflected_lambda = 0x7fefffffffffffff; // maximum positive double
        RAY ray, shadow_ray, reflected_ray;
        PIXEL pixel;
        SPHERE_INTERSECTION intersection, current_intersection, shadow_ray_intersection, reflected_ray_intersection, current_reflected_intersection;
        double red, green, blue;
        double theta, reflected_theta;
        bool bShadow = false;

        pixel.i = i;
        pixel.j = j;
        // 1. compute the ray:
        compute_ray(&ray, &view_point, &viewport, &pixel, &camera_frame, focal_distance);
        // 2. check whether the ray hits an object:
        for (int k = 0; k < NSPHERES; k++)
        {
            if (sphere_intersection(&ray, &sphere[k], &intersection))
            {
                // there is an intersection between the ray and the object
                // 1. compute the normal...
                intersection_normal(&sphere[k], &intersection, &ray);
                // 2. if the intersection's lambda is smaller than the current one:
                if (intersection.lambda_in < current_lambda)
                {
                    current_lambda = intersection.lambda_in;
                    intersection_object = k;
                    copy_intersection_struct(&current_intersection, &intersection);
                }
                // compute the current lambda:     current_lambda =
                // mark the current object:        intersection_object =
                // copy the intersection struct:   copy_intersection_struct();
            }
        }
        // compute the color of the pixel:
        if (intersection_object > -1)
        {
            compute_shadow_ray(&shadow_ray, &intersection, &light);
            theta = dotproduct(&(shadow_ray.direction), &(intersection.normal));
            for (int l = 0; l < NSPHERES; l++)
            {
                if (l != intersection_object)
                {
                    if (sphere_intersection(&shadow_ray, &sphere[l], &shadow_ray_intersection) && (theta > 0.0))
                        bShadow = true;
                }
            }
            if (bShadow)
            {
                // if in shadow, add only ambient light to the surface color
                red = shadow(sphere[intersection_object].ka_rgb[CRED], ambi_light_intensity);
                green = shadow(sphere[intersection_object].ka_rgb[CGREEN], ambi_light_intensity);
                blue = shadow(sphere[intersection_object].ka_rgb[CBLUE], ambi_light_intensity);
            }
            else
            {
                // the intersection is not in shadow:
                red = blinnphong_shading(&current_intersection, &light, &view_point,
                        sphere[intersection_object].kd_rgb[CRED], sphere[intersection_object].ks_rgb[CRED], sphere[intersection_object].ka_rgb[CRED], sphere[intersection_object].shininess,
                        light_intensity, ambi_light_intensity);
                green = blinnphong_shading(&current_intersection, &light, &view_point,
                        sphere[intersection_object].kd_rgb[CGREEN], sphere[intersection_object].ks_rgb[CGREEN], sphere[intersection_object].ka_rgb[CGREEN], sphere[intersection_object].shininess,
                        light_intensity, ambi_light_intensity);
                blue = blinnphong_shading(&current_intersection, &light, &view_point,
                        sphere[intersection_object].kd_rgb[CBLUE], sphere[intersection_object].ks_rgb[CBLUE], sphere[intersection_object].ka_rgb[CBLUE], sphere[intersection_object].shininess,
                        light_intensity, ambi_light_intensity);
            }
            tabelaPixlov[i][j].red = red;
            tabelaPixlov[i][j].green = green;
            tabelaPixlov[i][j].blue = blue;
            glColor3f(tabelaPixlov[i][j].red, tabelaPixlov[i][j].green, tabelaPixlov[i][j].blue);
            intersection_object = -1;
            bShadow = false;
        }
        else
        {
            // draw the pixel with the background color
            tabelaPixlov[i][j].red = 0;
            tabelaPixlov[i][j].green = 0;
            tabelaPixlov[i][j].blue = 0;
            intersection_object = -1;
            bShadow = false;
        }
        current_lambda = 0x7fefffffffffffff;
        current_reflected_lambda = 0x7fefffffffffffff;
    }
}
//glFlush();
stop = omp_get_wtime();
for (int i = 0; i < (viewport.xvmax - viewport.xvmin); i++)
{
    for (int j = 0; j < (viewport.yvmax - viewport.yvmin); j++)
    {
        glColor3f(tabelaPixlov[i][j].red, tabelaPixlov[i][j].green, tabelaPixlov[i][j].blue);
        glBegin(GL_POINTS);
        glVertex2i(i, j);
        glEnd();
    }
}
printf("%f\nnumber of threads: %d\n", stop - start, omp_get_max_threads());
glutSwapBuffers();
}
With ray tracing you should use schedule(dynamic). Besides that, I would suggest fusing the two loops:
#pragma omp parallel for schedule(dynamic)
for (int n = 0; n < (viewport.xvmax - viewport.xvmin) * (viewport.yvmax - viewport.yvmin); n++) {
    int i = n / (viewport.yvmax - viewport.yvmin);
    int j = n % (viewport.yvmax - viewport.yvmin);
    //...
}
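If the manual index arithmetic feels error-prone, OpenMP 3.0 and later can fuse perfectly nested loops for you via collapse; a sketch of the equivalent form, with the loop body unchanged from the question:
#pragma omp parallel for collapse(2) schedule(dynamic)
for (int i = 0; i < (viewport.xvmax - viewport.xvmin); i++)
{
    for (int j = 0; j < (viewport.yvmax - viewport.yvmin); j++)
    {
        // ... per-pixel work exactly as in the question ...
    }
}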
Also, why are you setting the number of threads? Just use the default, which should be the number of logical cores. Ray tracing is one of the algorithms that benefits from Hyper-Threading, so you don't want to pin the thread count to the number of physical cores.
In addition to using MIMD with OpenMP, I would suggest looking into SIMD for ray tracing. See Ingo Wald's PhD thesis for an example of how to do this: http://www.sci.utah.edu/~wald/PhD/. Basically, you shoot four (eight) rays in one SSE (AVX) register and then go down the ray tree for each ray in parallel. However, if one ray finishes, you hold it and wait until all four are finished (this is similar to what is done on the GPU). Many papers have been written since with more advanced tricks based on this idea.

How to create a new QImage from an array of floats

I have an array of floats that represents an image (stored column-first).
I want to show the image on a QGraphicsScene as a QPixmap. In order to do that I tried to create a new image from my array with the QImage constructor QImage(const uchar *data, int width, int height, Format format).
I first created a new unsigned char array, cast every value from my original array into it, and then tried to create a new image with the following code:
unsigned char *data = new unsigned char[fres.length()];
for (int i = 0; i < fres.length(); i++)
    data[i] = char(fres.dataPtr()[i]);
bcg = new QImage(data, fres.cols(), fres.rows(), 1, QImage::Format_Mono);
The problem is when I try to access the information in the following way:
bcg->pixel(i,j);
I get only the value 12345.
How can I create a viewable image from my array?
Thanks
There are two problems here.
One: casting a float to a char simply truncates it, so 0.3 becomes 0 and even 0.9 becomes 0. For a range of 0..1, the char will only ever contain 0 or 1.
To give the char the full range, use a multiply:
data[i] = (unsigned char)(fres.dataPtr()[i] * 255);
(Also, your cast was incorrect.)
The other problem is that your QImage::Format is incorrect; Format_Mono expects 1 BPP bit-packed data, not the 8 BPP you are supplying. There are two ways to fix this issue:
// Build a grayscale colour table
QByteArray data(fres.length(), 0);
for (int i = 0; i < fres.length(); ++i) {
    data[i] = (unsigned char)(fres.dataPtr()[i] * 255);
}
QVector<QRgb> grayscale;
for (int i = 0; i < 256; ++i) {
    grayscale.append(qRgb(i, i, i));
}
QImage image((const uchar *)data.constData(), fres.cols(), fres.rows(), QImage::Format_Indexed8);
image.setColorTable(grayscale);
// Use RGBA directly
QByteArray data(fres.length() * 4, 0);
for (int i = 0, j = 0; i < fres.length(); ++i, j += 4) {
    data[j] = data[j + 1] = data[j + 2] =  // R, G, B (same value: gray)
        (unsigned char)(fres.dataPtr()[i] * 255);
    data[j + 3] = ~0;                      // Alpha
}
QImage image((const uchar *)data.constData(), fres.cols(), fres.rows(), QImage::Format_ARGB32_Premultiplied);
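Either way, the resulting QImage can then go onto the scene as a pixmap. A minimal sketch, assuming a QGraphicsScene named scene already exists; note that this QImage constructor does not copy the buffer, so the image is deep-copied before the QByteArray goes out of scope:
// copy() detaches the image from the QByteArray buffer; fromImage()
// converts it into a pixmap the scene can display.
scene->addPixmap(QPixmap::fromImage(image.copy()));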
