How to join objects in a digital image?

I'm looking for an algorithm to join objects in a digital image, for example to combine an apple into a tree, and for a demo in MATLAB. Please point me to some material on this. Thanks for reading and helping!

I'm not sure I understand your question, but if you are looking to overlap images, as Photoshop layers do, you can use an image characteristic to determine the degree of transparency.
For example, consider two RGB images. Image A will be overlapped by image B. To do this, we'll use the brightness of image B to determine the degree of transparency (255 = 100%).
```
Intensity = PixelB / 255;
NewPixel = (PixelB * (1 - Intensity)) + (PixelA * Intensity);
```
Since Intensity is a fraction between 0 and 1, and the two pixels are weighted by Intensity and by its complement, the sum can never exceed 255 (the maximum gray level).
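To see why the blend cannot overflow, here is a minimal C sketch of the per-channel weighted sum. `mix_channel` is a hypothetical helper, not part of the answer's code: it returns `p * (1 - t) + q * t`, which always lies between `p` and `q` because the two weights are non-negative and add up to 1.

```c
#include <stdint.h>

/* Weighted sum of two 8-bit channel values: p * (1 - t) + q * t, t in [0, 1].
   The weights are non-negative and sum to 1, so the result lies between p and q
   and therefore never exceeds 255. */
static uint8_t mix_channel(uint8_t p, uint8_t q, double t)
{
    if (t < 0.0) t = 0.0;       /* guard against out-of-range weights */
    if (t > 1.0) t = 1.0;
    double v = p * (1.0 - t) + q * t;
    return (uint8_t)(v + 0.5);  /* round to the nearest integer */
}
```

With `p = PixelB`, `q = PixelA` and `t = Intensity`, this matches the weighting used in the loop that follows.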
```csharp
// Requires an unsafe context. Assumes imageA/imageB expose Width, Height and a
// byte* Buffer with tightly packed rows (no padding between rows).
int strideA = imageA.Width * channels;
int strideB = imageB.Width * channels;
int width = Math.Min(imageA.Width, imageB.Width) * channels;
int height = Math.Min(imageA.Height, imageB.Height);

for (int y = 0; y < height; y++)
{
    // Point at the start of row y in both images
    byte* ptrA = imageA.Buffer + y * strideA;
    byte* ptrB = imageB.Buffer + y * strideB;

    for (int x = 0; x < width; x += channels, ptrA += channels, ptrB += channels)
    {
        // Intensity of the image B pixel. If RGB (channels = 3), intensity = (R+G+B) / 3.
        // If grayscale, the pixel value is the intensity itself.
        int sum = 0;
        for (int j = 0; j < channels; ++j)
            sum += ptrB[j];

        // Intensity as a fraction between 0 and 1 (0..100%)
        double intensity = (double)sum / channels / 255.0;

        for (int j = 0; j < channels; ++j)
        {
            // Write the result into image A: image B pixel multiplied by (100% - intensity)
            // plus image A pixel multiplied by intensity
            ptrA[j] = (byte)((ptrB[j] * (1.0 - intensity)) + (intensity * ptrA[j]));
        }
    }
}
```
You can also change this algorithm to overlap image A over B, or to place one image at a different position. Here I'm assuming that image B's coordinate (0, 0) overlaps image A's coordinate (0, 0).
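Placing image B at a different position mostly means offsetting the destination index and skipping pixels that fall outside image A. A minimal C sketch of that idea, assuming both images are tightly packed, 8 bits per channel, with the same channel count; `overlay_at` and its parameters are hypothetical names, and the weighting is the same one used in the loop above.

```c
#include <stddef.h>
#include <stdint.h>

/* Overlay image B onto image A with B's top-left corner at (ox, oy) in A.
   B's brightness is the transparency degree: 0 = B opaque, 255 = B invisible. */
static void overlay_at(uint8_t *a, int wa, int ha,
                       const uint8_t *b, int wb, int hb,
                       int channels, int ox, int oy)
{
    for (int y = 0; y < hb; y++) {
        int ya = y + oy;
        if (ya < 0 || ya >= ha) continue;           /* row falls outside A */
        for (int x = 0; x < wb; x++) {
            int xa = x + ox;
            if (xa < 0 || xa >= wa) continue;       /* column falls outside A */
            const uint8_t *pb = b + ((size_t)y * wb + x) * channels;
            uint8_t *pa = a + ((size_t)ya * wa + xa) * channels;

            int sum = 0;
            for (int j = 0; j < channels; j++) sum += pb[j];
            double intensity = (double)sum / channels / 255.0;

            for (int j = 0; j < channels; j++)
                pa[j] = (uint8_t)(pb[j] * (1.0 - intensity) + pa[j] * intensity + 0.5);
        }
    }
}
```

Negative offsets work too: the parts of B that land outside A are simply skipped.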
But once again, I'm not sure this is what you are looking for.


How to spread the audio spectrum into a grid

I'm trying to use Processing to take an audio input and create an audio spectrum that is broken into multiple rows and fits uniformly to the width of the sketch.
I want the ellipses to be spread out in a grid-like fashion and also represent different parts of the spectrum.
```java
import ddf.minim.analysis.*;
import ddf.minim.*;

Minim minim;
FFT fft;
AudioInput mic;

void setup() {
  size(512, 512, P3D);
  minim = new Minim(this);
  mic = minim.getLineIn();
  fft = new FFT(mic.bufferSize(), mic.sampleRate());
}

void draw() {
  for (int i = 0; i < fft.specSize(); i++) {
    float size = fft.getBand(i);
    float x = map(i, 0, fft.specSize(), 0, height);
    float y = i;
    ellipse(x, y, size, size);
  }
}
```
The FFT data is a 1D signal and you want to visualise it as a 2D grid.
If you know how many rows and columns you want your grid to have, you can use arithmetic to calculate the x and y grid location based on the index.
Let's say you have 100 elements and you want to display them in a 10x10 grid:
take the 1D array counter modulo (%) the number of columns to get the 2D x index, and divide it (/) by the number of columns to get the 2D y index:
```java
for (int i = 0; i < 100; i++) {
  println(i, i % 10, i / 10);
}
```
Here's a longer commented example:
```java
// static-mode sketch; the canvas size here is just an example value
size(400, 400);

// fft data placeholder
float[] values = new float[100];
// fill with 100 random values
for (int i = 0; i < values.length; i++) {
  values[i] = random(0.0, 1.0);
}

// how many rows/cols
int rows = 10;
int cols = 10;

// how large a grid element will be (including spacing)
float widthPerSquare = (float) width / cols;

// grid elements offset from top left
float offsetX = widthPerSquare * 0.5;
float offsetY = widthPerSquare * 0.5;

// traverse data
for (int i = 0; i < values.length; i++) {
  // calculate x,y indices
  int gridX = i % cols;
  int gridY = i / cols;
  // calculate on-screen x,y position based on grid element size
  float x = offsetX + (gridX * widthPerSquare);
  float y = offsetY + (gridY * widthPerSquare);
  // set the size to only be 75% of the grid element (to leave some spacing)
  float size = values[i] * widthPerSquare * 0.75;
  //fill(values[i] * 255);
  ellipse(x, y, size, size);
}
```
In your case, let's say fft.specSize() is around 512 and you want to draw a square grid; you could do something like this:
```java
import ddf.minim.analysis.*;
import ddf.minim.*;

Minim minim;
FFT fft;
AudioInput mic;

int rows;
int cols;
float xSpacing;
float ySpacing;

void setup() {
  size(512, 512, P3D);
  minim = new Minim(this);
  mic = minim.getLineIn();
  fft = new FFT(mic.bufferSize(), mic.sampleRate());

  // define your own grid size or use an estimation based on the square root of your FFT data
  rows = cols = (int) sqrt(fft.specSize());
  println(rows, rows * rows);

  xSpacing = (float) width / cols;
  ySpacing = (float) height / rows;
}

void draw() {
  background(0);
  // analyse the current audio buffer before reading the bands
  fft.forward(mic.mix);
  for (int i = 0; i < fft.specSize(); i++) {
    float size = fft.getBand(i) * 90;
    float x = (i % cols) * xSpacing;
    float y = (i / cols) * ySpacing;
    ellipse(x, y, size, size);
  }
}
```
Notice that this example doesn't apply the offset and that the grid is 22 x 22 (484 != 512),
but hopefully it gives you some ideas.
The other thing to bear in mind is the contents of that FFT array.
You might want to scale it logarithmically to account for how we perceive sound.
Check out Processing > Examples > Contributed Libraries > Minim > Analysis > SoundSpectrum and have a look at logAverages(). Playing with minBandwidth and bandsPerOctave might help you get a nicer visualisation.
If you want to go a bit deeper into visualisation, check out this excellent answer by wakjah, and if you have time, go through Dan Ellis' amazing Music Signal Computing course.

Can't isolate pixels from av_frame_copy_to_buffer

I'm trying to pull the YUV pixel data from an AVFrame, modify the pixels, and put it back into FFmpeg.
I'm currently using this to retrieve the YUV buffer:
```c
const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(base->format);
int baseSize = av_image_get_buffer_size(base->format, base->width, base->height, 32);
uint8_t *baseBuffer = (uint8_t*)malloc(baseSize);
av_image_copy_to_buffer(baseBuffer, baseSize, base->data, base->linesize, base->format, base->width, base->height, 32);
```
But I can't seem to correctly target pixels in that buffer. From the source code, the planes seem to be stacked one after another, which led me to attempt this:
```c
int width = base->width;
int height = base->height;
int chroma2h = desc->log2_chroma_h;
int linesizeY = base->linesize[0];
int linesizeU = base->linesize[1];
int linesizeV = base->linesize[2];
int chromaHeight = (height + (1 << chroma2h) - 1) >> chroma2h;

int x = 100;
int y = 100;

uint8_t *vY = base;
uint8_t *vU = base + (linesizeY * height);
uint8_t *vV = base + ((linesizeY * height) + (linesizeU * chromaHeight));

vY += x + (y * linesizeY);
vU += x + (y * linesizeU);
vV += x + (y * linesizeV);
```
Using that, if I try to modify pixels in the range 300,300 to 400,400, I get a small box darker than the rest of the video, along with horizontal dark stripes across the video. The original color is still there, so I think all 3 pointers are still hitting the Y plane.
How can I actually hit the pixels I want to hit?
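As far as I can tell from libavutil's imgutils.c, av_image_copy_to_buffer() writes the planes one after another, and each row in the buffer is padded to a multiple of `align` starting from the plane's minimal line size for `width`, not from the frame's `linesize`. Under that assumption, the attempt above has three problems: `vY` starts at `base` (the AVFrame) instead of `baseBuffer`, the offsets use `base->linesize` instead of the buffer's own row stride, and the chroma coordinates are not divided by the subsampling factor. Here's a hedged sketch for 8-bit planar YUV; `locate_pixel`, `yuv_ptrs` and `align_up` are hypothetical helpers, and the layout should be checked against your FFmpeg version. If you don't need a contiguous copy, indexing the frame directly with `base->data[1] + (y >> desc->log2_chroma_h) * base->linesize[1] + (x >> desc->log2_chroma_w)` sidesteps the issue.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to a multiple of align (align must be a power of two). */
static int align_up(int n, int align) { return (n + align - 1) & ~(align - 1); }

typedef struct { uint8_t *y, *u, *v; } yuv_ptrs;

/* Pointers to the Y, U and V samples covering pixel (x, y) in a planar 8-bit
   YUV buffer stored plane after plane, each row padded to `align` bytes.
   log2_cw / log2_ch are the chroma subsampling shifts (1 and 1 for yuv420p). */
static yuv_ptrs locate_pixel(uint8_t *buf, int width, int height, int align,
                             int log2_cw, int log2_ch, int x, int y)
{
    int cw = (width  + (1 << log2_cw) - 1) >> log2_cw;   /* chroma width, rounded up */
    int ch = (height + (1 << log2_ch) - 1) >> log2_ch;   /* chroma height, rounded up */
    int stride_y = align_up(width, align);
    int stride_c = align_up(cw, align);

    uint8_t *plane_y = buf;
    uint8_t *plane_u = plane_y + (size_t)stride_y * height;
    uint8_t *plane_v = plane_u + (size_t)stride_c * ch;

    yuv_ptrs p;
    p.y = plane_y + (size_t)y * stride_y + x;
    /* chroma planes are subsampled, so both coordinates must be shifted */
    p.u = plane_u + (size_t)(y >> log2_ch) * stride_c + (x >> log2_cw);
    p.v = plane_v + (size_t)(y >> log2_ch) * stride_c + (x >> log2_cw);
    return p;
}
```

For yuv420p you would call it with `desc->log2_chroma_w` and `desc->log2_chroma_h` (both 1) and the same `align` (32) passed to av_image_copy_to_buffer().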

Low pass filter with unit integral

In image processing, specifically in fingerprint recognition, I have to apply a two-dimensional low-pass filter with a unit integral.
What does this unit integral mean? Also, if I choose a Gaussian filter, what sigma should I use?
Unit integral means that the total area of the mask (kernel) should be 1, i.e. its coefficients sum to 1. For example, in a 3 x 3 averaging filter every coefficient is 1/9, so summing all of the elements in the mask gives 1.
The Gaussian filter inherently has a unit integral (an area of 1). If you use MATLAB, the fspecial command with the 'gaussian' flag returns an already normalized mask.
However, if you want to create the Gaussian mask yourself, you can use the following equation:

$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$
Bear in mind that (x, y) are locations inside the mask relative to the centre. For example, in a 5 x 5 mask, row = 2, col = 2 (counting from 0) corresponds to x = 0 and y = 0. However, the above equation does not produce a mask whose elements sum to 1. The area is exactly 1 only when you integrate over the entire 2D plane; because we truncate the Gaussian to a finite mask, the sum falls short of 1. So once you have generated all of your coefficients, sum every element of the mask and divide each element by that sum.

In fact, when you generate the Gaussian mask, you don't need to multiply the exponential term by the scale factor in the equation at all: normalizing the mask so that it sums to 1 removes the scale anyway. You can use just the exponential term and save some calculations.
As for sigma, that is completely up to you. A common choice is the 3*sigma rule for the half-width, so the total width from left to right in 1D is 6*sigma + 1 (including the centre). To choose a specific sigma, people often determine how wide the smallest feature in the image is, use that as the width, and derive sigma from it. For example, if that width is 13, rearranging the equation for sigma gives 2:
13 = 6*sigma + 1
12 = 6*sigma
sigma = 2
So you'd set sigma to 2 and make the mask 13 x 13. For more about the 3*sigma rule, see my post on the topic: By which measures should I set the size of my Gaussian filter in MATLAB?
Once you have created the mask, use any convolution method you like to filter your image with it.
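As one possible convolution method, here is a straightforward C sketch for a single-channel float image and a square, odd-width kernel stored row by row. Border handling by clamping coordinates to the nearest edge pixel is an assumption; other choices (zero padding, mirroring) are equally valid. Strictly speaking the loop computes a correlation, which is identical to convolution for a symmetric kernel such as a Gaussian.

```c
#include <stddef.h>

/* Filter src (w x h) with a kw x kw kernel into dst. Out-of-range
   coordinates are clamped to the image edge. */
static void convolve2d(const float *src, float *dst, int w, int h,
                       const float *kernel, int kw)
{
    int half = kw / 2;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float acc = 0.0f;
            for (int i = -half; i <= half; i++) {
                int yy = y + i < 0 ? 0 : (y + i >= h ? h - 1 : y + i);
                for (int j = -half; j <= half; j++) {
                    int xx = x + j < 0 ? 0 : (x + j >= w ? w - 1 : x + j);
                    acc += src[(size_t)yy * w + xx] * kernel[(i + half) * kw + (j + half)];
                }
            }
            dst[(size_t)y * w + x] = acc;
        }
    }
}
```

Because the mask sums to 1, a flat region keeps its value after filtering, which is a quick sanity check for the normalization.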
Here's another post that may help if you can use MATLAB:
How to make a Gaussian filter in Matlab
If you need to use another language like C or Java, then you could create a Gaussian mask in the following way:
C / C++
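```c
#include <math.h>

#define WIDTH 13

/* Fill mask with a normalized WIDTH x WIDTH Gaussian (sigma from WIDTH = 6*sigma + 1). */
void make_gaussian_mask(float mask[WIDTH][WIDTH])
{
    float sigma = ((float)WIDTH - 1.0f) / 6.0f;
    int half_width = WIDTH / 2;
    float scale = 0.0f;

    for (int i = -half_width; i <= half_width; i++) {
        for (int j = -half_width; j <= half_width; j++) {
            /* exponential term only; the 1/(2*pi*sigma^2) factor cancels below */
            mask[i + half_width][j + half_width] = expf(-(float)(i * i + j * j) / (2.0f * sigma * sigma));
            scale += mask[i + half_width][j + half_width];
        }
    }
    for (int i = 0; i < WIDTH; i++)      /* normalize so the mask sums to 1 */
        for (int j = 0; j < WIDTH; j++)
            mask[i][j] /= scale;
}
```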
Java
```java
// Returns a normalized width x width Gaussian mask (sigma from width = 6*sigma + 1).
static float[][] gaussianMask(int width) {
    float sigma = ((float) width - 1.0f) / 6.0f;
    int halfWidth = width / 2;
    float[][] mask = new float[width][width];
    float scale = 0.0f;

    for (int i = -halfWidth; i <= halfWidth; i++) {
        for (int j = -halfWidth; j <= halfWidth; j++) {
            mask[i + halfWidth][j + halfWidth] =
                (float) Math.exp(-((double) (i * i + j * j) / (2.0 * sigma * sigma)));
            scale += mask[i + halfWidth][j + halfWidth];
        }
    }
    for (int i = 0; i < width; i++)      // normalize so the mask sums to 1
        for (int j = 0; j < width; j++)
            mask[i][j] /= scale;
    return mask;
}
```
As I noted before, notice that in the code I didn't have to divide by 2*pi*sigma^2. Again, the reason why is because when you normalize the kernel, this constant factor gets cancelled out anyway, so there's no need to add any additional overhead when computing the mask coefficients.

Which is better: simple Gaussian blur or FFT-based Gaussian blur for sigma=20?

I'm making a program to blur a 16 bit grayscale image in CUDA.
In my program, if I use a Gaussian blur function with sigma = 20 or 30, it takes a lot of time, while it is fast with sigma = 2.0 or 3.0.
I've read on a website that Gaussian blur via FFT is better for a large kernel size or a large sigma value.
Is that really true?
Which algorithm should I use: simple Gaussian blur or Gaussian blur with FFT?
My code for Gaussian blur is below. Is there something wrong in my code?
```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void gaussian_blur(
    unsigned short* const blurredChannel,      // output: blurred channel
    const unsigned short* const inputChannel,  // input channel from the original image
    int rows,
    int cols,
    const float* const filterWeight,           // Gaussian filter weights (bell shape, sum = 1)
    int filterWidth)                           // kernel width in pixels in x and y
{
    int r = blockIdx.y * blockDim.y + threadIdx.y; // current row
    int c = blockIdx.x * blockDim.x + threadIdx.x; // current column
    if ((r >= rows) || (c >= cols))
        return;

    int half = filterWidth / 2;
    float blur = 0.f; // accumulated blurred value
    int width = cols - 1;
    int height = rows - 1;

    for (int i = -half; i <= half; ++i) {          // rows
        for (int j = -half; j <= half; ++j) {      // columns
            // Clamp filter to the image border
            int h = min(max(r + i, 0), height);
            int w = min(max(c + j, 0), width);

            // Blur is the sum of pixel values times their weights.
            // The weights sum to 1, so this is a weighted average.
            int idx = w + cols * h; // current pixel index
            float pixel = static_cast<float>(inputChannel[idx]);

            idx = (i + half) * filterWidth + j + half;
            float weight = filterWeight[idx];

            blur += pixel * weight;
        }
    }
    blurredChannel[c + r * cols] = static_cast<unsigned short>(blur);
}

void createFilter(float *gKernel, double sigma, int radius)
{
    double s = 2.0 * sigma * sigma;
    double sum = 0.0; // sum is for normalization

    // generate a (2*radius+1) x (2*radius+1) kernel, i.e. 9x9 for radius = 4
    int m = 0;
    for (int x = -radius; x <= radius; x++) {
        for (int y = -radius; y <= radius; y++) {
            double r2 = x * x + y * y;
            // the constant factor cancels out during normalization
            gKernel[m] = static_cast<float>(exp(-r2 / s) / (3.14 * s));
            sum += gKernel[m];
            m++;
        }
    }

    // normalize the kernel (restart from the first element)
    m = 0;
    for (int i = 0; i < (radius * 2 + 1); ++i)
        for (int j = 0; j < (radius * 2 + 1); ++j)
            gKernel[m++] /= sum;
}

int main()
{
    cudaError_t cudaStatus;
    const int radius = 4;                  // 9 x 9 kernel
    const int size = 81;                   // (2 * radius + 1)^2 weights
    float gKernel[size];
    createFilter(gKernel, 2.0, radius);    // example sigma; build the weights before copying them

    float *dev_p = 0;
    cudaStatus = cudaMalloc((void**)&dev_p, size * sizeof(float));
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed!");
        return 1;
    }
    cudaStatus = cudaMemcpy(dev_p, gKernel, size * sizeof(float), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed!");
        return 1;
    }

    /* I read the image buffer as unsigned short; that code is not shown here because it is long.
       It copies the image data from host to device, so suppose unsigned short *d_img holds the image. */
    const int image_width = 1580, image_height = 1050;
    const size_t length = (size_t)image_width * image_height;
    unsigned short *d_img = 0, *d_blur_img = 0;
    cudaMalloc((void**)&d_img, length * sizeof(unsigned short));
    cudaMalloc((void**)&d_blur_img, length * sizeof(unsigned short));

    static const int BLOCK_WIDTH = 32;
    int x = static_cast<int>(ceilf(static_cast<float>(image_width) / BLOCK_WIDTH));
    int y = static_cast<int>(ceilf(static_cast<float>(image_height) / BLOCK_WIDTH));
    const dim3 grid(x, y, 1);              // number of blocks
    const dim3 block(BLOCK_WIDTH, BLOCK_WIDTH, 1);
    gaussian_blur<<<grid, block>>>(d_blur_img, d_img, image_height, image_width, dev_p, 2 * radius + 1);

    /* After blurring, I copy the buffer from device to host and free GPU memory. */
    return 0;
}
```
Short answer: both algorithms are fine for image blurring, so feel free to pick the fastest one for your use case.
Kernel size and sigma value are directly correlated: the greater the sigma, the larger the kernel (and thus the more operations-per-pixel to get the final result).
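A worked example, assuming the common 3σ rule for the kernel half-width (the code above uses a fixed 9 x 9 kernel instead), so that the width is k = 2⌈3σ⌉ + 1 and a naive 2D convolution costs k² multiply-adds per pixel:

$$\sigma = 2:\; k = 13,\; k^2 = 169 \qquad\qquad \sigma = 20:\; k = 121,\; k^2 = 14641$$

That is roughly 87 times more work per pixel for σ = 20 than for σ = 2, which is consistent with the slowdown you observed.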
If you implemented a naive convolution, try a separable convolution implementation instead; for large kernels that alone can reduce the computation time by an order of magnitude or more.
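Separable convolution works because a 2D Gaussian is the outer product of two 1D Gaussians, so you can filter the rows with a 1D kernel and then filter the columns of that result with the same kernel, costing 2k instead of k² multiply-adds per pixel. A minimal CPU sketch in C, assuming clamped borders like the CUDA kernel above; `separable_blur` and `clampi` are hypothetical names, and a GPU version would run each pass as its own kernel.

```c
#include <stddef.h>
#include <stdlib.h>

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Blur src (w x h) into dst with a 1D kernel of odd width kw applied along rows,
   then along columns. With clamped borders this equals the full 2D convolution
   with the outer-product kernel (up to float rounding). Returns 0 on success. */
static int separable_blur(const float *src, float *dst, int w, int h,
                          const float *k1d, int kw)
{
    int half = kw / 2;
    float *tmp = malloc((size_t)w * h * sizeof *tmp);
    if (!tmp) return -1;

    for (int y = 0; y < h; y++)                  /* horizontal pass */
        for (int x = 0; x < w; x++) {
            float acc = 0.0f;
            for (int j = -half; j <= half; j++)
                acc += src[(size_t)y * w + clampi(x + j, 0, w - 1)] * k1d[j + half];
            tmp[(size_t)y * w + x] = acc;
        }

    for (int y = 0; y < h; y++)                  /* vertical pass */
        for (int x = 0; x < w; x++) {
            float acc = 0.0f;
            for (int i = -half; i <= half; i++)
                acc += tmp[(size_t)clampi(y + i, 0, h - 1) * w + x] * k1d[i + half];
            dst[(size_t)y * w + x] = acc;
        }

    free(tmp);
    return 0;
}
```

For σ = 20 with a 121-wide kernel, this is 242 multiply-adds per pixel instead of 14641.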
Now some more insight: the two implement almost the same Gaussian blurring operation. Why almost? Because taking the FFT of an image implicitly periodizes it. Hence, at the border of the image, the convolution kernel sees an image that has been wrapped around its edges. This is called circular convolution (because of the wrapping). Plain Gaussian blur, on the other hand, implements a linear convolution.
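The border difference is easy to see in 1D. In this hedged C sketch (hypothetical function names), `conv1d_circular` wraps indices around the ends, which is what FFT-based filtering does implicitly, while `conv1d_clamped` repeats the edge sample, as the CUDA kernel above does. Both compute a correlation, which equals convolution for a symmetric kernel.

```c
/* Circular: indices wrap around, as with FFT-based filtering. */
static void conv1d_circular(const float *x, int n, const float *k, int kw, float *out)
{
    int half = kw / 2;
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int j = -half; j <= half; j++)
            acc += x[((i + j) % n + n) % n] * k[j + half];
        out[i] = acc;
    }
}

/* Linear with clamped borders: out-of-range indices repeat the edge sample. */
static void conv1d_clamped(const float *x, int n, const float *k, int kw, float *out)
{
    int half = kw / 2;
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int j = -half; j <= half; j++) {
            int idx = i + j < 0 ? 0 : (i + j >= n ? n - 1 : i + j);
            acc += x[idx] * k[j + half];
        }
        out[i] = acc;
    }
}
```

With the signal {0, 0, 0, 9} and kernel {0.25, 0.5, 0.25}, the circular version leaks the bright last sample into the first output (2.25), while the clamped version leaves it at 0. Interior samples are identical, and padding the image before the FFT is the usual way to remove the wrap-around.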

Area of scanned (2D) figure

Let's say I have 100 single-colored A4 sheets of paper that are cut into different shapes and figures (2D), scanned, and saved as image files, and that then need to be sorted in ascending order of area.
Is there an effective way to find the area of the figures and arrange them?
If all pictures have the same size and all shapes have the same color (that's the situation if I understand your question correctly), you can calculate the average color value of each image.
The closer the average color comes to the figure's color, the bigger the shape in the image.
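The reasoning can be made explicit. Assume a perfectly uniform background value b, a uniform figure value f, and a figure covering a fraction p of the image. Then the average value c̄ is a weighted mix of the two, and p follows directly:

$$\bar{c} = (1 - p)\,b + p\,f \quad\Longrightarrow\quad p = \frac{\bar{c} - b}{f - b}$$

Since p is linear in c̄, sorting by average color sorts by area. If each scan covers exactly one A4 sheet (210 mm x 297 mm = 62370 mm²), the area of a figure is approximately p x 62370 mm². Noise, shadows and uneven lighting break the "uniform" assumption, so treat this as an estimate.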
Some code:
```csharp
// Requires System.Drawing. Returns the average color over all pixels of the image.
private Color GetAverageImageColor(Image img)
{
    double[] rgb = new double[3];
    using (Bitmap bmp = new Bitmap(img))
    {
        for (int y = 0; y < bmp.Height; y++)
        {
            for (int x = 0; x < bmp.Width; x++)
            {
                Color col = bmp.GetPixel(x, y);
                rgb[0] += col.R;
                rgb[1] += col.G;
                rgb[2] += col.B;
            }
        }

        double pixelCount = (double)bmp.Width * bmp.Height;
        for (int i = 0; i < 3; i++)
            rgb[i] = Math.Round(rgb[i] / pixelCount);
    }
    return Color.FromArgb((int)rgb[0], (int)rgb[1], (int)rgb[2]);
}
```
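A closely related alternative (a different technique from averaging, named plainly here) is to threshold each scan and count the figure's pixels directly, which does not require the figure color to be known exactly. A minimal C sketch, assuming an 8-bit grayscale scan with a light background and a darker figure; `count_foreground`, `shape_area` and `by_area` are hypothetical names, and the threshold must be chosen for your scans.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count pixels darker than `threshold` (flip the comparison for light figures). */
static size_t count_foreground(const uint8_t *gray, size_t n_pixels, uint8_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n_pixels; i++)
        if (gray[i] < threshold) count++;
    return count;
}

typedef struct { int id; size_t area; } shape_area;

/* qsort comparator: ascending by pixel count. */
static int by_area(const void *a, const void *b)
{
    size_t x = ((const shape_area *)a)->area;
    size_t y = ((const shape_area *)b)->area;
    return (x > y) - (x < y);
}
```

Fill one `shape_area` per scanned image with its count, then call `qsort(shapes, n, sizeof shapes[0], by_area)` to get them in ascending order of area.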
