I would like a background subtraction method for outdoor conditions, one that gradually adjusts itself to ambient light variations but can still reveal a presence even when it is not moving.
The problem with the adaptive OpenCV background subtraction methods is that they can only detect a presence while it is moving. On the other hand, the old background subtraction methods do not work when the lighting conditions change over time.
To get this I've modified Golan Levin's method from the Processing video library (current frames are compared against an initial reference frame), setting a fairly low difference threshold.
I therefore assume that all changes above that threshold are due to a presence (people, animals, etc.), while changes below it are due to gradual lighting variation, so I copy those pixels into the background pixel array. For example, with mindDif = 20, a pixel whose summed RGB difference is 15 is treated as lighting drift and absorbed into the background, while a difference of 120 is treated as presence.
/* auto-updating background part */
int diferencia = diffR + diffG + diffB;
if (diferencia < mindDif) backgroundPixels[i] = video.pixels[i];
That's not working satisfactorily: the image gets dirty, far from homogeneous. Any idea of how to achieve this would be extremely welcome.
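One possible variant of that per-pixel update (a sketch of my own, in plain C rather than Processing, and an assumption rather than what the code below does): blend below-threshold pixels toward the current frame instead of copying them outright, so a single noisy frame cannot freeze speckle into the reference image.

#include <stdio.h>
#include <stdint.h>

/* Assumed variant of the update rule: move the stored background a
   fraction of the way toward the current pixel each frame. A float
   background avoids stalling on integer rounding. */
static float blend_toward(float bg, uint8_t cur, float rate)
{
    return bg + (cur - bg) * rate;   /* rate in (0,1], e.g. 0.25 */
}

int main(void)
{
    float bg = 100.0f;               /* background pixel; lighting drifted to 112 */
    for (int frame = 0; frame < 8; frame++) {
        bg = blend_toward(bg, 112, 0.25f);
        printf("frame %d: background = %.1f\n", frame, bg);
    }
    return 0;                        /* converges toward 112 within a few frames */
}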
I post the whole code below, in case it is of some help. Thanks a lot for your time.
import processing.video.*;

int numPixels;
int[] backgroundPixels;
Capture video;
int camSel = 0;
int topDiff = 763;
int unbralDif = 120;
int mindDif = 20;
boolean subtraction, lowSubtr;
PGraphics _tempPG;

void setup() {
  size(640, 480);
  _tempPG = createGraphics(width, height);
  if (camSel == 0) video = new Capture(this, width, height);
  else video = new Capture(this, width, height, Capture.list()[1]);
  video.start();
  numPixels = video.width * video.height;
  backgroundPixels = new int[numPixels];
  loadPixels();
}

void draw() {
  if (video.available()) {
    video.read();
    video.loadPixels();
    int presenceSum = 0;
    for (int i = 0; i < numPixels; i++) {
      color currColor = video.pixels[i];
      color bkgdColor = backgroundPixels[i];
      // extract the R, G and B channels of the current and background pixels
      int currR = (currColor >> 16) & 0xFF;
      int currG = (currColor >> 8) & 0xFF;
      int currB = currColor & 0xFF;
      int bkgdR = (bkgdColor >> 16) & 0xFF;
      int bkgdG = (bkgdColor >> 8) & 0xFF;
      int bkgdB = bkgdColor & 0xFF;
      // per-channel absolute difference
      int diffR = abs(currR - bkgdR);
      int diffG = abs(currG - bkgdG);
      int diffB = abs(currB - bkgdB);
      presenceSum += diffR + diffG + diffB;
      // show the difference image
      pixels[i] = 0xFF000000 | (diffR << 16) | (diffG << 8) | diffB;

      /* auto-updating background part */
      int diferencia = diffR + diffG + diffB;
      // detect pixels that have changed below the threshold
      if (lowSubtr && diferencia < mindDif) {
        /* substitute them into the background pixel array */
        backgroundPixels[i] = video.pixels[i];
      }
      /* end auto-updating background part */
    }
    updatePixels();
  }
  subtraction = false;
}

void keyPressed() {
  // any key press takes the current frame as the reference background
  startSubtr();
}

void startSubtr() {
  arraycopy(video.pixels, backgroundPixels);
  lowSubtr = true;
}

// copy _inputArr into _srcArr (currently unused)
void actualizacion(int[] _srcArr, int[] _inputArr, int _ind) {
  for (int i = 0; i < _srcArr.length; i++) {
    _srcArr[i] = _inputArr[i];
  }
}
I was investigating the _delay_ms function of avr-gcc. In delay.h I found its definition:
void _delay_ms(double __ms)
{
    double __tmp ;
#if __HAS_DELAY_CYCLES && defined(__OPTIMIZE__) && \
    !defined(__DELAY_BACKWARD_COMPATIBLE__) && \
    __STDC_HOSTED__
    uint32_t __ticks_dc;
    extern void __builtin_avr_delay_cycles(unsigned long);
    __tmp = ((F_CPU) / 1e3) * __ms;

#if defined(__DELAY_ROUND_DOWN__)
    __ticks_dc = (uint32_t)fabs(__tmp);

#elif defined(__DELAY_ROUND_CLOSEST__)
    __ticks_dc = (uint32_t)(fabs(__tmp)+0.5);

#else
    //round up by default
    __ticks_dc = (uint32_t)(ceil(fabs(__tmp)));
#endif

    __builtin_avr_delay_cycles(__ticks_dc);

#else
    ...
}
I am interested in what the __builtin_avr_delay_cycles function looks like internally, and where it is defined. Where can I find the source?
As said in my comment to this very question on electronics.SE:
Compiler builtins are always kinda funky to find, because they are not just C functions but things that get inserted while parsing/compiling the code (at various levels of abstraction from the textual representation of the code itself; compiler theory stuff). What you're looking for is the function avr_expand_builtin in the GCC source tree. There's a case AVR_BUILTIN_DELAY_CYCLES in there. Look for what happens there.
Which is:
/* Implement `TARGET_EXPAND_BUILTIN'.  */
/* Expand an expression EXP that calls a built-in function,
   with result going to TARGET if that's convenient
   (and in mode MODE if that's convenient).
   SUBTARGET may be used as the target for computing one of EXP's operands.
   IGNORE is nonzero if the value is to be ignored.  */

static rtx
avr_expand_builtin (tree exp, rtx target,
                    rtx subtarget ATTRIBUTE_UNUSED,
                    machine_mode mode ATTRIBUTE_UNUSED,
                    int ignore)
{
  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
  const char *bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
  unsigned int id = DECL_FUNCTION_CODE (fndecl);
  const struct avr_builtin_description *d = &avr_bdesc[id];
  tree arg0;
  rtx op0;

  gcc_assert (id < AVR_BUILTIN_COUNT);

  switch (id)
    {
    case AVR_BUILTIN_NOP:
      emit_insn (gen_nopv (GEN_INT (1)));
      return 0;

    case AVR_BUILTIN_DELAY_CYCLES:
      {
        arg0 = CALL_EXPR_ARG (exp, 0);
        op0 = expand_expr (arg0, NULL_RTX, VOIDmode, EXPAND_NORMAL);

        if (!CONST_INT_P (op0))
          error ("%s expects a compile time integer constant", bname);
        else
          avr_expand_delay_cycles (op0);

        return NULL_RTX;
      }
…
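As an aside, the CONST_INT_P check above is what rejects non-constant arguments. A minimal usage sketch (my own, assuming avr-gcc with any -mmcu target):

/* compile with, e.g., avr-gcc -mmcu=atmega328p -O2 delay_demo.c */
int main(void)
{
    __builtin_avr_delay_cycles(1000);   /* fine: expands to delay loops plus nops */

    /* an argument the compiler cannot fold to a constant (a function
       parameter, a volatile read, ...) fails with the error seen above:
       "__builtin_avr_delay_cycles expects a compile time integer constant" */

    for (;;)
        ;
}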
The function you're looking for, then, is avr_expand_delay_cycles in the same file:
static void
avr_expand_delay_cycles (rtx operands0)
{
  unsigned HOST_WIDE_INT cycles = UINTVAL (operands0) & GET_MODE_MASK (SImode);
  unsigned HOST_WIDE_INT cycles_used;
  unsigned HOST_WIDE_INT loop_count;

  if (IN_RANGE (cycles, 83886082, 0xFFFFFFFF))
    {
      loop_count = ((cycles - 9) / 6) + 1;
      cycles_used = ((loop_count - 1) * 6) + 9;
      emit_insn (gen_delay_cycles_4 (gen_int_mode (loop_count, SImode),
                                     avr_mem_clobber()));
      cycles -= cycles_used;
    }

  if (IN_RANGE (cycles, 262145, 83886081))
    {
      loop_count = ((cycles - 7) / 5) + 1;
      if (loop_count > 0xFFFFFF)
        loop_count = 0xFFFFFF;
      cycles_used = ((loop_count - 1) * 5) + 7;
      emit_insn (gen_delay_cycles_3 (gen_int_mode (loop_count, SImode),
                                     avr_mem_clobber()));
      cycles -= cycles_used;
    }

  if (IN_RANGE (cycles, 768, 262144))
    {
      loop_count = ((cycles - 5) / 4) + 1;
      if (loop_count > 0xFFFF)
        loop_count = 0xFFFF;
      cycles_used = ((loop_count - 1) * 4) + 5;
      emit_insn (gen_delay_cycles_2 (gen_int_mode (loop_count, HImode),
                                     avr_mem_clobber()));
      cycles -= cycles_used;
    }

  if (IN_RANGE (cycles, 6, 767))
    {
      loop_count = cycles / 3;
      if (loop_count > 255)
        loop_count = 255;
      cycles_used = loop_count * 3;
      emit_insn (gen_delay_cycles_1 (gen_int_mode (loop_count, QImode),
                                     avr_mem_clobber()));
      cycles -= cycles_used;
    }

  while (cycles >= 2)
    {
      emit_insn (gen_nopv (GEN_INT (2)));
      cycles -= 2;
    }

  if (cycles == 1)
    {
      emit_insn (gen_nopv (GEN_INT (1)));
      cycles--;
    }
}
The most interesting point here is that this takes a node of the Abstract Syntax Tree and expands it straight into instructions: the requested cycle count is decomposed greedily into nested delay loops of decreasing width, and any remainder is padded with NOPs.
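To see what that expansion amounts to, here is a sketch of my own in plain C (not GCC code) that reproduces the same greedy decomposition and prints which delay loops would be emitted for a given cycle budget:

#include <stdio.h>
#include <stdint.h>

/* Mirrors avr_expand_delay_cycles: pick the widest loop that fits,
   subtract the cycles it consumes, fall through to narrower loops,
   and pad the remainder with 2-cycle and 1-cycle nops. */
static void decompose_delay_cycles(uint32_t cycles)
{
    uint32_t loop_count;

    if (cycles >= 83886082UL) {        /* 32-bit loop: 6 cycles/iteration + 9 overhead */
        loop_count = (cycles - 9) / 6 + 1;
        printf("delay_cycles_4: %lu iterations\n", (unsigned long)loop_count);
        cycles -= (loop_count - 1) * 6 + 9;
    }
    if (cycles >= 262145UL) {          /* 24-bit loop: 5 cycles/iteration + 7 overhead */
        loop_count = (cycles - 7) / 5 + 1;
        if (loop_count > 0xFFFFFF) loop_count = 0xFFFFFF;
        printf("delay_cycles_3: %lu iterations\n", (unsigned long)loop_count);
        cycles -= (loop_count - 1) * 5 + 7;
    }
    if (cycles >= 768UL) {             /* 16-bit loop: 4 cycles/iteration + 5 overhead */
        loop_count = (cycles - 5) / 4 + 1;
        if (loop_count > 0xFFFF) loop_count = 0xFFFF;
        printf("delay_cycles_2: %lu iterations\n", (unsigned long)loop_count);
        cycles -= (loop_count - 1) * 4 + 5;
    }
    if (cycles >= 6UL) {               /* 8-bit loop: 3 cycles/iteration */
        loop_count = cycles / 3;
        if (loop_count > 255) loop_count = 255;
        printf("delay_cycles_1: %lu iterations\n", (unsigned long)loop_count);
        cycles -= loop_count * 3;
    }
    while (cycles >= 2) {              /* 2-cycle nop */
        printf("nop (2-cycle)\n");
        cycles -= 2;
    }
    if (cycles == 1)                   /* plain 1-cycle nop */
        printf("nop\n");
}

int main(void)
{
    decompose_delay_cycles(16000000UL);   /* e.g. 1 s at F_CPU = 16 MHz */
    return 0;
}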
I am trying to write code to display the Mandelbrot set for the numbers between
(-3,-3) and (2,2) on my terminal.
The main function generates and feeds a complex number to the analyze function.
The analyze function returns the character "*" for a complex number Z within the set and "." for numbers that lie outside the set.
The code:
#define MAX_A 2     // upper bound on real
#define MAX_B 2     // upper bound on imaginary
#define MIN_A -3    // lower bound on real
#define MIN_B -3    // lower bound on imaginary
#define NX 300      // no. of points along x
#define NY 200      // no. of points along y
#define max_its 50

int analyze(double real, double imag);

void main()
{
    double a, b;
    int x, x_arr, y, y_arr;
    int array[NX][NY];
    int res;
    for (y = NY - 1, x_arr = 0; y >= 0; y--, x_arr++)
    {
        for (x = 0, y_arr++; x <= NX - 1; x++, y_arr++)
        {
            a = MIN_A + (x / ((double)NX - 1) * (MAX_A - MIN_A));
            b = MIN_B + (y / ((double)NY - 1) * (MAX_B - MIN_B));
            //printf("%f+i%f ", a, b);
            res = analyze(a, b);
            if (res > 49)
                array[x][y] = 42;   // '*'
            else
                array[x][y] = 46;   // '.'
        }
        // printf("\n");
    }
    for (y = 0; y < NY; y++)
    {
        for (x = 0; x < NX; x++)
            printf("%2c", array[x][y]);
        printf("\n");
    }
}
The analyze function accepts a coordinate on the complex plane and computes (Z^2)+Z up to 50 times; while iterating, if the complex number explodes, the function returns immediately, otherwise it returns after finishing the 50 iterations:
int analyze(double real, double imag)
{
    int iter = 0;
    double r = 4.0;
    while (iter < 50)
    {
        if (r < ((real*real) + (imag*imag)))
        {
            return iter;
        }
        real = ((real*real) - (imag*imag) + real);
        imag = ((2*real*imag) + imag);
        iter++;
    }
    return iter;
}
So I am analyzing 60,000 (NX * NY) numbers and displaying them on the terminal, using a 3:2 ratio (300, 200). I even tried 4:3 (NX:NY), but the output remains the same and the generated shape is not even close to the Mandelbrot set; the output also appears inverted.
I browsed around and came across lines like:
(x - 400) / ZOOM;
(y - 300) / ZOOM;
in many Mandelbrot programs, but I am unable to understand how such lines could rectify my output.
I guess I am having trouble mapping the output to the terminal!
(LB_Real,UB_Imag) --- (UB_Real,UB_Imag)
| |
(LB_Real,LB_Imag) --- (UB_Real,LB_Imag)
Any hint/help will be very useful.
The Mandelbrot recurrence is z_{n+1} = z_n^2 + c.
Here's your implementation:
real = ((real*real) - (imag*imag) + real);
imag = ((2*real*imag) + imag);
Problem 1. You're updating real to its next value before you've used the old value to compute the new imag.
Problem 2. Assuming you fix problem 1, you're computing z_{n+1} = z_n^2 + z_n, not z_n^2 + c.
Here's how I'd do it using double:
int analyze(double cr, double ci) {
    double zr = 0, zi = 0;
    int r;
    for (r = 0; (r < 50) && (zr*zr + zi*zi < 4.0); ++r) {
        double zr1 = zr*zr - zi*zi + cr;
        double zi1 = 2 * zr * zi + ci;
        zr = zr1;
        zi = zi1;
    }
    return r;
}
But it's easier to understand if you use the standard C99 support for complex numbers:
#include <complex.h>

int analyze(double cr, double ci) {
    double complex c = cr + ci * I;
    double complex z = 0;
    int r;
    for (r = 0; (r < 50) && (cabs(z) < 2); ++r) {
        z = z * z + c;
    }
    return r;
}
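Either version can be driven with the question's own grid. A usage sketch of my own (reusing the bound macros from the question, and printing rows from the top down so the imaginary axis points up):

#include <stdio.h>

#define MAX_A 2
#define MAX_B 2
#define MIN_A -3
#define MIN_B -3
#define NX 300
#define NY 200

int analyze(double cr, double ci);   /* either version above */

int main(void)
{
    /* print y from NY-1 down to 0 so the first terminal row holds the
       largest imaginary part; this avoids the inverted output the
       question describes */
    for (int y = NY - 1; y >= 0; y--) {
        for (int x = 0; x < NX; x++) {
            double a = MIN_A + x / (double)(NX - 1) * (MAX_A - MIN_A);
            double b = MIN_B + y / (double)(NY - 1) * (MAX_B - MIN_B);
            putchar(analyze(a, b) >= 50 ? '*' : '.');
        }
        putchar('\n');
    }
    return 0;
}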
I'm trying to implement an OCR algorithm (the GOCR algorithm, specifically) on a 32F429IDISCOVERY board, and I'm still getting nothing back...
I'm recording an image from an OV7670 camera in RGB565 format into the board's SDRAM; it is then converted to grayscale and passed to the algorithm itself.
From this and other forums I got the impression that GOCR is a very good algorithm, and it seemed to work very well on a PC, but I just can't get it to work on the board.
Does anyone have experience with implementing OCR or GOCR? I am not sure where the problem is, because it behaves in a very weird way: the code stops in a different part of the algorithm almost every time...
Calling the OCR algorithm:
void ocr_algorithm(char *output_str) {
    job_t job1, *job; /* fixme, don't want global variables for lib */
    job = OCR_JOB = &job1;
    int linecounter;
    const char *line;
    uint8_t r, g, b;
    uint32_t n, i, buffer;
    char *p_pic;
    uint32_t *image = (uint32_t*) SDRAM_START_ADR;

    setvbuf(stdout, (char *) NULL, _IONBF, 0); /* not buffered */

    job_init(job); /* init cfg and db */
    job_init_image(job); /* single image */

    p_pic = malloc(IMG_ROWS * IMG_COLUMNS);

    // Converting RGB565 to grayscale
    i = 0;
    for (n = 0; n < IMG_ROWS * IMG_COLUMNS; n++) {
        // two RGB565 pixels are packed into each 32-bit word
        if (n % 2 == 0) {
            buffer = image[i] & 0xFFFF;
        }
        else {
            buffer = (image[i] >> 16) & 0xFFFF;
            i++;
        }
        r = (uint8_t) ((buffer >> 11) & 0x1F);
        g = (uint8_t) ((buffer >> 5) & 0x3F);
        b = (uint8_t) (buffer & 0x1F);
        // expand to RGB888
        r = ((r * 527) + 23) >> 6;
        g = ((g * 259) + 33) >> 6;
        b = ((b * 527) + 23) >> 6;
        // grayscale (luma weights)
        p_pic[n] = 0.299*r + 0.587*g + 0.114*b;
    }

    // read_picture
    job->src.p.p = p_pic;
    job->src.p.x = IMG_ROWS;
    job->src.p.y = IMG_COLUMNS;
    job->src.p.bpp = 1;

    /* call main loop */
    pgm2asc(job);

    // print output
    strcpy(output_str, "");
    linecounter = 0;
    line = getTextLine(&(job->res.linelist), linecounter++);
    while (line) {
        strcat(output_str, line);
        strcat(output_str, "\n");
        line = getTextLine(&(job->res.linelist), linecounter++);
    }
    free_textlines(&(job->res.linelist));
    job_free_image(job);
    free(p_pic);
}
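Since the RGB565 unpacking is the easiest part to verify off-target, here is a sanity-check sketch of my own (plain C, runnable on a PC) that feeds the same conversion two known pixels packed the way the loop above expects:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* pixel 0 = 0xF800 (pure red) in the low half, pixel 1 = 0x07E0
       (pure green) in the high half, matching the loop's packing */
    uint32_t image[1] = { (0x07E0u << 16) | 0xF800u };
    uint8_t p_pic[2];
    uint32_t buffer = 0;

    for (int n = 0, i = 0; n < 2; n++) {
        if (n % 2 == 0) {
            buffer = image[i] & 0xFFFF;
        } else {
            buffer = (image[i] >> 16) & 0xFFFF;
            i++;
        }
        uint8_t r = (buffer >> 11) & 0x1F;
        uint8_t g = (buffer >> 5) & 0x3F;
        uint8_t b = buffer & 0x1F;
        r = ((r * 527) + 23) >> 6;   /* 5-bit -> 8-bit */
        g = ((g * 259) + 33) >> 6;   /* 6-bit -> 8-bit */
        b = ((b * 527) + 23) >> 6;
        p_pic[n] = 0.299 * r + 0.587 * g + 0.114 * b;
        printf("pixel %d: gray = %d\n", n, p_pic[n]);   /* expect 76, then 149 */
    }
    return 0;
}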
I've written a CUDA 4 Bayer demosaicing routine, but it's slower than single-threaded CPU code, running on a 16-core GTS250.
The block size is (16,16) and the image dimensions are a multiple of 16, but changing this doesn't improve it.
Am I doing anything obviously stupid?
--------------- calling routine ------------------
uchar4 *d_output;
size_t num_bytes;
cudaGraphicsMapResources(1, &cuda_pbo_resource, 0);
cudaGraphicsResourceGetMappedPointer((void **)&d_output, &num_bytes, cuda_pbo_resource);
// Do the conversion, leave the result in the PBO for display
kernel_wrapper( imageWidth, imageHeight, blockSize, gridSize, d_output );
cudaGraphicsUnmapResources(1, &cuda_pbo_resource, 0);
--------------- cuda -------------------------------
texture<uchar, 2, cudaReadModeElementType> tex;
cudaArray *d_imageArray = 0;

__global__ void convertGRBG(uchar4 *d_output, uint width, uint height)
{
    uint x = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;
    uint y = __umul24(blockIdx.y, blockDim.y) + threadIdx.y;
    uint i = __umul24(y, width) + x;

    // input is GR/BG, output is BGRA
    if ((x < width) && (y < height)) {
        if (y & 0x01) {
            if (x & 0x01) {
                d_output[i].x = (tex2D(tex,x+1,y) + tex2D(tex,x-1,y)) / 2;  // B
                d_output[i].y = (tex2D(tex,x,y));                           // G in B
                d_output[i].z = (tex2D(tex,x,y+1) + tex2D(tex,x,y-1)) / 2;  // R
            } else {
                d_output[i].x = (tex2D(tex,x,y));                                                                        // B
                d_output[i].y = (tex2D(tex,x+1,y) + tex2D(tex,x-1,y) + tex2D(tex,x,y+1) + tex2D(tex,x,y-1)) / 4;         // G
                d_output[i].z = (tex2D(tex,x+1,y+1) + tex2D(tex,x+1,y-1) + tex2D(tex,x-1,y+1) + tex2D(tex,x-1,y-1)) / 4; // R
            }
        } else {
            if (x & 0x01) {
                // odd col = R
                d_output[i].x = (tex2D(tex,x+1,y+1) + tex2D(tex,x+1,y-1) + tex2D(tex,x-1,y+1) + tex2D(tex,x-1,y-1)) / 4; // B
                d_output[i].z = (tex2D(tex,x,y));                                                                        // R
                d_output[i].y = (tex2D(tex,x+1,y) + tex2D(tex,x-1,y) + tex2D(tex,x,y+1) + tex2D(tex,x,y-1)) / 4;         // G
            } else {
                d_output[i].x = (tex2D(tex,x,y+1) + tex2D(tex,x,y-1)) / 2;  // B
                d_output[i].y = (tex2D(tex,x,y));                           // G in R
                d_output[i].z = (tex2D(tex,x+1,y) + tex2D(tex,x-1,y)) / 2;  // R
            }
        }
    }
}
void initTexture(int imageWidth, int imageHeight, uchar *imagedata)
{
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(8, 0, 0, 0, cudaChannelFormatKindUnsigned);
    cutilSafeCall( cudaMallocArray(&d_imageArray, &channelDesc, imageWidth, imageHeight) );
    uint size = imageWidth * imageHeight * sizeof(uchar);
    cutilSafeCall( cudaMemcpyToArray(d_imageArray, 0, 0, imagedata, size, cudaMemcpyHostToDevice) );
    cutFree(imagedata);

    // bind array to texture reference with point sampling
    tex.addressMode[0] = cudaAddressModeClamp;
    tex.addressMode[1] = cudaAddressModeClamp;
    tex.filterMode = cudaFilterModePoint;
    tex.normalized = false;

    cutilSafeCall( cudaBindTextureToArray(tex, d_imageArray) );
}
There aren't any obvious bugs in your code, but there are several obvious performance opportunities:
1) For best performance, you should use texture to stage into shared memory; see the 'SobelFilter' SDK sample.
2) As written, the code is writing bytes to global memory, which is guaranteed to incur a large performance hit. You can use shared memory to stage results before committing them to global memory.
3) There is a surprisingly big performance advantage to sizing blocks in a way that matches the hardware's texture cache attributes. On Tesla-class hardware, the optimal block size for kernels using the same addressing scheme as your kernel is 16x4 (64 threads per block); a launch sketch follows this list.
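For point 3, a minimal launch sketch (my own, assuming the question's convertGRBG kernel and image dimensions that divide evenly by the block):

dim3 blockSize(16, 4);   // 64 threads, shaped to suit the texture cache
dim3 gridSize(imageWidth / blockSize.x, imageHeight / blockSize.y);
convertGRBG<<<gridSize, blockSize>>>(d_output, imageWidth, imageHeight);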
For workloads like this, it may be hard to compete with optimized CPU code. SSE2 can do 16 byte-sized operations in a single instruction, and CPUs are clocked about 5 times as fast.
Based on an answer on the Nvidia forums, here (for the search engines) is a slightly more optimised version, which writes a 2x2 block of pixels in each thread. Although the difference in speed isn't measurable on my setup.
Note it should be called with a grid size half the size of the image:
dim3 blockSize(16, 16); // for example
dim3 gridSize((width/2) / blockSize.x, (height/2) / blockSize.y);
__global__ void d_convertGRBG(uchar4 *d_output, uint width, uint height)
{
    uint x = 2 * (__umul24(blockIdx.x, blockDim.x) + threadIdx.x);
    uint y = 2 * (__umul24(blockIdx.y, blockDim.y) + threadIdx.y);
    uint i = __umul24(y, width) + x;

    // input is GR/BG, output is BGRA
    if ((x < width-1) && (y < height-1)) {
        // x+1, y+1:
        d_output[i+width+1] = make_uchar4( (tex2D(tex,x+2,y+1) + tex2D(tex,x,y+1)) / 2,                                          // B
                                           (tex2D(tex,x+1,y+1)),                                                                 // G in B
                                           (tex2D(tex,x+1,y+2) + tex2D(tex,x+1,y)) / 2,                                          // R
                                           0xff);
        // x, y+1:
        d_output[i+width]   = make_uchar4( (tex2D(tex,x,y+1)),                                                                   // B
                                           (tex2D(tex,x+1,y+1) + tex2D(tex,x-1,y+1) + tex2D(tex,x,y+2) + tex2D(tex,x,y)) / 4,    // G
                                           (tex2D(tex,x+1,y+2) + tex2D(tex,x+1,y) + tex2D(tex,x-1,y+2) + tex2D(tex,x-1,y)) / 4,  // R
                                           0xff);
        // x+1, y:
        d_output[i+1]       = make_uchar4( (tex2D(tex,x,y-1) + tex2D(tex,x+2,y-1) + tex2D(tex,x,y+1) + tex2D(tex,x+2,y+1)) / 4,  // B
                                           (tex2D(tex,x+2,y) + tex2D(tex,x,y) + tex2D(tex,x+1,y+1) + tex2D(tex,x+1,y-1)) / 4,    // G
                                           (tex2D(tex,x+1,y)),                                                                   // R
                                           0xff);
        // x, y:
        d_output[i]         = make_uchar4( (tex2D(tex,x,y+1) + tex2D(tex,x,y-1)) / 2,                                            // B
                                           (tex2D(tex,x,y)),                                                                     // G in R
                                           (tex2D(tex,x+1,y) + tex2D(tex,x-1,y)) / 2,                                            // R
                                           0xff);
    }
}
There are many ifs and elses in the code. If you structure the code to eliminate all the conditional statements, you will get a huge performance boost, as branching is a performance killer. It is indeed possible to remove the branches: there are exactly 30 cases which you will have to code explicitly. I have implemented it on the CPU, and it does not contain any conditional statements. I am thinking of writing a blog post explaining it, and will post it once it's done.