I am trying to detect a doctor's tools in endoscopy video using the following code:
static void video_tracking() {
    // read image
    IplImage orgImg = cvLoadImage("C:/Users/Ioanna/Desktop/pic.png");
    IplImage thresholdImage = hsvThreshold(orgImg);
    cvSaveImage("hsvthreshold.jpg", thresholdImage);
    Dimension position = getCoordinates(thresholdImage);
    System.out.println("Dimension of original Image : " + thresholdImage.width() + " , " + thresholdImage.height());
    System.out.println("Position of red spot : x : " + position.width + " , y : " + position.height);
}

static Dimension getCoordinates(IplImage thresholdImage) {
    int posX = 0;
    int posY = 0;
    CvMoments moments = new CvMoments();
    cvMoments(thresholdImage, moments, 1);
    double momX10 = cvGetSpatialMoment(moments, 1, 0); // (x,y)
    double momY01 = cvGetSpatialMoment(moments, 0, 1); // (x,y)
    double area = cvGetCentralMoment(moments, 0, 0);
    // centroid = first-order moments divided by the area
    posX = (int) (momX10 / area);
    posY = (int) (momY01 / area);
    return new Dimension(posX, posY);
}

static IplImage hsvThreshold(IplImage orgImg) {
    // Convert the image into an HSV image
    IplImage imgHSV = cvCreateImage(cvGetSize(orgImg), 8, 3);
    cvCvtColor(orgImg, imgHSV, CV_BGR2HSV);
    // create a new image that will hold the thresholded image
    // 1 channel = monochrome
    IplImage imgThreshold = cvCreateImage(cvGetSize(orgImg), orgImg.depth(), 1);
    // do the actual thresholding
    cvInRangeS(imgHSV, cvScalar(13, 0, 0, 0), cvScalar(40, 117, 124, 88), imgThreshold);
    cvSmooth(imgThreshold, imgThreshold, CV_MEDIAN, 13);
    // return the thresholded image
    return imgThreshold;
}
The input to the above code is a color image containing two of the doctor's tools, and the output is a grayscale image in which the tools are detected. Sorry, but the system doesn't allow me to upload the images.
The problem is that I don't know how to find the positions of the two tools and draw a rectangle around each. Can someone please explain how to achieve this using javacv/opencv?
This can be achieved using Haar cascade files. Some of the built-in Haar cascade files that ship with OpenCV can be found in the opencv\data\haarcascades directory. These are XML files produced by lengthy training sessions. Some of the sample cascades are used for face detection, ear detection, and so on. A sample project can be found here. You can also generate your own Haar cascade file for any purpose; one method for generating a Haar cascade file can be found here.
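If the thresholded mask from the question is already reliable, an alternative to training a cascade is to label the connected blobs in the mask and take one bounding box per blob. The following is a minimal sketch in plain C++ under that assumption. `Box` and `findBlobBoxes` are hypothetical names, not JavaCV or OpenCV functions, and the mask is assumed to be a row-major 8-bit buffer in which any non-zero value is foreground.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Axis-aligned bounding box of one blob (inclusive pixel coordinates) plus its pixel count.
struct Box { int minX, minY, maxX, maxY, area; };

// Hypothetical helper: finds 4-connected foreground blobs with an iterative
// flood fill and returns one bounding box per blob, in scan order.
std::vector<Box> findBlobBoxes(const std::vector<std::uint8_t>& mask, int width, int height) {
    assert(width > 0 && height > 0);
    assert(mask.size() == static_cast<std::size_t>(width) * height);
    std::vector<char> seen(mask.size(), 0);
    std::vector<Box> boxes;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t start = static_cast<std::size_t>(y) * width + x;
            if (!mask[start] || seen[start]) continue;
            Box box{x, y, x, y, 0};
            std::vector<std::pair<int, int>> stack;
            stack.push_back(std::make_pair(x, y));
            seen[start] = 1;
            while (!stack.empty()) {
                const std::pair<int, int> p = stack.back();
                stack.pop_back();
                box.minX = std::min(box.minX, p.first);
                box.maxX = std::max(box.maxX, p.first);
                box.minY = std::min(box.minY, p.second);
                box.maxY = std::max(box.maxY, p.second);
                ++box.area;
                for (int k = 0; k < 4; ++k) {
                    const int nx = p.first + dx[k];
                    const int ny = p.second + dy[k];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const std::size_t idx = static_cast<std::size_t>(ny) * width + nx;
                    if (mask[idx] && !seen[idx]) {
                        seen[idx] = 1;
                        stack.push_back(std::make_pair(nx, ny));
                    }
                }
            }
            boxes.push_back(box);
        }
    }
    return boxes;
}
```

With two tools in view, the two boxes with the largest `area` would be the natural candidates to draw, for example with a rectangle call such as cvRectangle.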
I want to resize a screen captured using the Desktop Duplication API in SharpDX. I am using the Screen Capture sample code from the SharpDX Samples repository; the relevant portion follows:
SharpDX.DXGI.Resource screenResource;
OutputDuplicateFrameInformation duplicateFrameInformation;

// Try to get duplicated frame within given time
duplicatedOutput.AcquireNextFrame(10000, out duplicateFrameInformation, out screenResource);

if (i > 0)
{
    // copy resource into memory that can be accessed by the CPU
    using (var screenTexture2D = screenResource.QueryInterface<Texture2D>())
        device.ImmediateContext.CopyResource(screenTexture2D, screenTexture);

    // Get the desktop capture texture
    var mapSource = device.ImmediateContext.MapSubresource(screenTexture, 0, MapMode.Read, MapFlags.None);

    // Create Drawing.Bitmap
    var bitmap = new System.Drawing.Bitmap(width, height, PixelFormat.Format32bppArgb);
    var boundsRect = new System.Drawing.Rectangle(0, 0, width, height);

    // Copy pixels from screen capture Texture to GDI bitmap
    var mapDest = bitmap.LockBits(boundsRect, ImageLockMode.WriteOnly, bitmap.PixelFormat);
    var sourcePtr = mapSource.DataPointer;
    var destPtr = mapDest.Scan0;
    for (int y = 0; y < height; y++)
    {
        // Iterate and write to bitmap...
    }
}
I would like to resize the image much smaller than the actual screen size before processing it as a byte array. I do not need to save the image, just get at the bytes. I would like to do this relatively quickly and efficiently (e.g. leveraging GPU if possible).
I'm not able to scale during CopyResource, as the output dimensions are required to be the same as the input dimensions. Can I perform another copy from my screenTexture2D to scale? How exactly do I scale the resource - do I use a Swap Chain, Matrix transform, or something else?
If you are fine with shrinking the screen by a power of two, you can do it as follows:
1.) Create a "smaller" texture with RenderTarget/ShaderResource usage and the GenerateMipMaps option, the same size as the screen, with a mip count > 1 (2 to get size /2, 3 to get size /4, etc.).
2.) Copy the first mip level of the screen texture to the smaller texture.
3.) Call GenerateMips on the device context for the smaller texture.
4.) Copy the selected mip level of the smaller texture (1: /2, 2: /4, etc.) to the staging texture, which should also be declared smaller, i.e. the same size as the mip level that is going to be used.
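To size the staging texture for a given mip level, the arithmetic can be sketched as below. This is a small plain C++ illustration, not SharpDX code; `Size` and `mipLevelSize` are hypothetical names. It relies on the Direct3D rule that each mip level halves both dimensions, rounding down, with a minimum of 1.

```cpp
#include <algorithm>
#include <cassert>

struct Size { int width, height; };

// Size of mip level `level` for a texture whose level 0 is `base`.
// Each level halves both dimensions (rounding down), never going below 1.
Size mipLevelSize(Size base, int level) {
    assert(base.width > 0 && base.height > 0 && level >= 0 && level < 31);
    return Size{ std::max(1, base.width >> level), std::max(1, base.height >> level) };
}
```

For a 1920x1080 desktop, level 1 is 960x540 and level 2 is 480x270, so a staging texture meant to receive level 2 would be declared 480x270.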
A quick hack on the original code to generate a /2 texture would be like this:
private static void Main()
{
    // # of graphics card adapter
    const int numAdapter = 0;
    // # of output device (i.e. monitor)
    const int numOutput = 0;
    const string outputFileName = "ScreenCapture.bmp";

    // Create DXGI Factory1
    var factory = new Factory1();
    var adapter = factory.GetAdapter1(numAdapter);
    // Create device from Adapter
    var device = new Device(adapter);
    // Get DXGI.Output
    var output = adapter.GetOutput(numOutput);
    var output1 = output.QueryInterface<Output1>();

    // Width/Height of desktop to capture
    int width = output.Description.DesktopBounds.Width;
    int height = output.Description.DesktopBounds.Height;

    // Create Staging texture CPU-accessible (half the screen size)
    var textureDesc = new Texture2DDescription
    {
        CpuAccessFlags = CpuAccessFlags.Read,
        BindFlags = BindFlags.None,
        Format = Format.B8G8R8A8_UNorm,
        Width = width/2,
        Height = height/2,
        OptionFlags = ResourceOptionFlags.None,
        MipLevels = 1,
        ArraySize = 1,
        SampleDescription = { Count = 1, Quality = 0 },
        Usage = ResourceUsage.Staging
    };
    var stagingTexture = new Texture2D(device, textureDesc);

    // Create GPU texture with a mip chain (same size as the screen)
    var smallerTextureDesc = new Texture2DDescription
    {
        CpuAccessFlags = CpuAccessFlags.None,
        BindFlags = BindFlags.RenderTarget | BindFlags.ShaderResource,
        Format = Format.B8G8R8A8_UNorm,
        Width = width,
        Height = height,
        OptionFlags = ResourceOptionFlags.GenerateMipMaps,
        MipLevels = 4,
        ArraySize = 1,
        SampleDescription = { Count = 1, Quality = 0 },
        Usage = ResourceUsage.Default
    };
    var smallerTexture = new Texture2D(device, smallerTextureDesc);
    var smallerTextureView = new ShaderResourceView(device, smallerTexture);

    // Duplicate the output
    var duplicatedOutput = output1.DuplicateOutput(device);

    bool captureDone = false;
    for (int i = 0; !captureDone; i++)
    {
        try
        {
            SharpDX.DXGI.Resource screenResource;
            OutputDuplicateFrameInformation duplicateFrameInformation;

            // Try to get duplicated frame within given time
            duplicatedOutput.AcquireNextFrame(10000, out duplicateFrameInformation, out screenResource);

            if (i > 0)
            {
                // copy the screen into mip level 0 of the GPU texture
                using (var screenTexture2D = screenResource.QueryInterface<Texture2D>())
                    device.ImmediateContext.CopySubresourceRegion(screenTexture2D, 0, null, smallerTexture, 0);

                // Generates the mipmap of the screen
                device.ImmediateContext.GenerateMips(smallerTextureView);

                // Copy the mipmap 1 of smallerTexture (size/2) to the staging texture
                device.ImmediateContext.CopySubresourceRegion(smallerTexture, 1, null, stagingTexture, 0);

                // Get the desktop capture texture
                var mapSource = device.ImmediateContext.MapSubresource(stagingTexture, 0, MapMode.Read, MapFlags.None);

                // Create Drawing.Bitmap
                var bitmap = new System.Drawing.Bitmap(width/2, height/2, PixelFormat.Format32bppArgb);
                var boundsRect = new System.Drawing.Rectangle(0, 0, width/2, height/2);

                // Copy pixels from screen capture Texture to GDI bitmap
                var mapDest = bitmap.LockBits(boundsRect, ImageLockMode.WriteOnly, bitmap.PixelFormat);
                var sourcePtr = mapSource.DataPointer;
                var destPtr = mapDest.Scan0;
                for (int y = 0; y < height/2; y++)
                {
                    // Copy a single line
                    Utilities.CopyMemory(destPtr, sourcePtr, width/2 * 4);
                    // Advance pointers
                    sourcePtr = IntPtr.Add(sourcePtr, mapSource.RowPitch);
                    destPtr = IntPtr.Add(destPtr, mapDest.Stride);
                }

                // Release source and dest locks
                bitmap.UnlockBits(mapDest);
                device.ImmediateContext.UnmapSubresource(stagingTexture, 0);

                // Save the output
                bitmap.Save(outputFileName);

                // Capture done
                captureDone = true;
            }

            // Release the frame so the next AcquireNextFrame can succeed
            screenResource.Dispose();
            duplicatedOutput.ReleaseFrame();
        }
        catch (SharpDXException e)
        {
            if (e.ResultCode.Code != SharpDX.DXGI.ResultCode.WaitTimeout.Result.Code)
            {
                throw;
            }
        }
    }

    // Display the texture using system associated viewer
    System.Diagnostics.Process.Start(Path.GetFullPath(Path.Combine(Environment.CurrentDirectory, outputFileName)));

    // TODO: We should clean up all allocated COM objects here
}
You need to take your original source surface in GPU memory and Draw() it on to a smaller surface. This involves simple vector/pixel shaders, which some folks with simple needs would rather bypass.
I would look to see if someone has made a sprite library for SharpDX. It should be a common "thing". Alternatively, use Direct2D (which is much more fun). Since D2D is just a user-mode library over D3D, it interops with D3D very easily.
I've never used SharpDX, but from memory you would do something like this:
1.) Create an ID2D1Device, wrapping your existing DXGI Device (make sure your dxgi device creation flag has D3D11_CREATE_DEVICE_BGRA_SUPPORT)
2.) Get the ID2D1DeviceContext from your ID2D1Device
3.) Wrap your source and destination DXGI surfaces into D2D bitmaps with ID2D1DeviceContext::CreateBitmapFromDxgiSurface
4.) ID2D1DeviceContext::SetTarget of your destination surface
5.) BeginDraw, ID2D1DeviceContext::DrawBitmap, passing your source D2D bitmap. EndDraw
6.) Save your destination
Here is a pixelate example...
D2D1_SIZE_F rtSize = mp_ppBitmap1->GetSize();
rtSize.height *= (1.0f / cbpx.iPixelsize.y);
rtSize.width *= (1.0f / cbpx.iPixelsize.x);
D2D1_RECT_F rtRect = { 0.0f, 0.0f, rtSize.width, rtSize.height };
D2D1_SIZE_F rsSize = mp_ppBitmap0->GetSize();
D2D1_RECT_F rsRect = { 0.0f, 0.0f, rsSize.width, rsSize.height };
d2d_device_context_h()->DrawBitmap(mp_ppBitmap0.Get(), &rtRect, 1.0f,
d2d_device_context_h()->DrawBitmap(mp_ppBitmap1.Get(), &rsRect, 1.0f,
Here iPixelsize.xy is the size of the "pixelated pixel". Note that I use linear interpolation only when shrinking the bitmap and NOT when I re-enlarge it. This generates a pixelation effect.
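The shrink-then-enlarge idea can also be sketched on the CPU. The block below is a rough plain C++ analogue, not Direct2D code: it assumes a single-channel row-major image, approximates the smoothing shrink with a box average over each block, and models the nearest-neighbour enlarge by writing that average back to every pixel of the block. `pixelate` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Replaces every block x block tile with its average value.
// Equivalent to an averaging downscale followed by a nearest-neighbour upscale.
std::vector<std::uint8_t> pixelate(const std::vector<std::uint8_t>& gray,
                                   int width, int height, int block) {
    assert(width > 0 && height > 0 && block > 0);
    assert(gray.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out(gray.size());
    for (int by = 0; by < height; by += block) {
        for (int bx = 0; bx < width; bx += block) {
            // Tiles on the right and bottom edges may be smaller than `block`.
            const int x1 = std::min(bx + block, width);
            const int y1 = std::min(by + block, height);
            long sum = 0;
            for (int y = by; y < y1; ++y)
                for (int x = bx; x < x1; ++x)
                    sum += gray[static_cast<std::size_t>(y) * width + x];
            const std::uint8_t avg =
                static_cast<std::uint8_t>(sum / ((x1 - bx) * (y1 - by)));
            for (int y = by; y < y1; ++y)
                for (int x = bx; x < x1; ++x)
                    out[static_cast<std::size_t>(y) * width + x] = avg;
        }
    }
    return out;
}
```

The GPU version described above has the same structure, with the two DrawBitmap calls doing the averaging and the enlarging.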
I have tried this code. 540 is the leftmost x value of the box, 3 is the topmost y value, 262 is the width and 23 is the height of the region in which I want to calculate the ratio of white to black pixels. What I really want to do is compute the white/black pixel ratio in a specific region. I have calculated the coordinates for each cell (the regions I am going to specify) and tried this code, but the count comes out wrong.
Could someone please give me an idea about this issue?
I am really stuck here with my final year project.
CvSize cvSize = cvSize(img.width(), img.height());
IplImage image = cvCreateImage(cvSize, IPL_DEPTH_8U, 1);
IplImage image2 = cvCreateImage(cvSize, IPL_DEPTH_8U, 3);
cvCvtColor(image2, image, CV_RGB2GRAY);
cvSetImageROI(image2, cvRect(540,3,262,23));
//IplImage image2 = cvCreateImage(cvSize, IPL_DEPTH_8U, 3);
//cvCvtColor(arg0, arg1, arg2)
// cvCvtColor(image2, image, CV_RGB2GRAY);
//cvThreshold(image, image, 128, 255, CV_THRESH_BINARY);
CvLineIterator iterator = new CvLineIterator();
double sum = 0, green_sum = 0, red_sum = 0;
CvPoint p2 = new CvPoint(802,3);
CvPoint p1 = new CvPoint(540,26);
int lineCount = cvInitLineIterator(image2, p1, p2, iterator, 8, 0 );
for (int i = 0; i < lineCount; i++) {
    sum += iterator.ptr().get() & 0xFF;
}
It gave the result: sum................0.0
I am really stuck with this. Can you please suggest a solution?
Move the CV_NEXT_LINE_POINT(iterator); line inside the for loop. Without it the iterator never advances, so every iteration reads the same pixel. Then it should work.
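Since the stated goal is the white/black ratio over a rectangular region rather than along a single line, a direct count over the region may be simpler. The following is a minimal plain C++ sketch, not JavaCV code; it assumes a single-channel row-major grayscale buffer, and `PixelCounts`, `countInRegion` and the default threshold of 128 are hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PixelCounts { long white; long black; };

// Counts pixels at or above `threshold` (white) and below it (black)
// inside the rectangle with top-left corner (x, y), width w and height h.
PixelCounts countInRegion(const std::vector<std::uint8_t>& gray, int width, int height,
                          int x, int y, int w, int h, std::uint8_t threshold = 128) {
    assert(width > 0 && height > 0);
    assert(gray.size() == static_cast<std::size_t>(width) * height);
    assert(x >= 0 && y >= 0 && w > 0 && h > 0 && x + w <= width && y + h <= height);
    PixelCounts counts{0, 0};
    for (int row = y; row < y + h; ++row) {
        for (int col = x; col < x + w; ++col) {
            const std::uint8_t v = gray[static_cast<std::size_t>(row) * width + col];
            if (v >= threshold) ++counts.white; else ++counts.black;
        }
    }
    return counts;
}
```

For the region in the question this would be called with x = 540, y = 3, w = 262, h = 23; the ratio is then `white` divided by `black`, after checking that `black` is not zero.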
Can someone help me translate this code to ruby-opencv?
IplImage* GetThresholdedImage(IplImage* img)
{
    // Convert the image into an HSV image
    IplImage* imgHSV = cvCreateImage(cvGetSize(img), 8, 3);
    cvCvtColor(img, imgHSV, CV_BGR2HSV);
    // create a new image that will hold the thresholded image (which will be returned).
    IplImage* imgThreshed = cvCreateImage(cvGetSize(img), 8, 1);
    // Now we do the actual thresholding:
    cvInRangeS(imgHSV, cvScalar(20, 100, 100), cvScalar(30, 255, 255), imgThreshed);
    return imgThreshed;
}
def getThresholdedImage2(img)
  # blur the source image to reduce color noise
  img = img.smooth(CV_GAUSSIAN, 7, 7)
  # convert the image to HSV (Hue, Saturation, Value) so it's
  # easier to determine the color to track (hue)
  imgHSV = IplImage.new(img.width, img.height, 8, 3);
  imgHSV = img.BGR2HSV
  # create a new image that will hold the thresholded image (which will be returned).
  imgThreshed = IplImage.new(img.width, img.height, 8, 1);
  # Now we do the actual thresholding:
  imgThreshed = imgHSV.in_range(CvScalar.new(20, 100, 100), CvScalar.new(30, 255, 255));
  return imgThreshed
end
Actually, I don't know Ruby at all, but I think I found a solution to your problem. Ruby-OpenCV seems to be just a wrapper library.
For example, to find the analogue of the cvInRangeS function, you can do the following.
By searching the source files I found ext/opencv/cvmat.h with this content:
VALUE rb_range(VALUE self, VALUE start, VALUE end);
VALUE rb_range_bang(VALUE self, VALUE start, VALUE end);
And in the cpp file there's description:
* call-seq:
* in_range(<i>min, max</i>) -> cvmat
* Check that element lie between two object.
* <i>min</i> and <i>max</i> should be CvMat that have same size and type, or CvScalar.
* Return new matrix performed per-element,
* dst(I) = within the range ? 0xFF : 0
So you should be able to find all the Ruby functions you need this way. Good luck!
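The per-element rule quoted above, dst(I) = within the range ? 0xFF : 0, can be illustrated with a small plain C++ sketch. This is not the Ruby-OpenCV implementation. `Pixel` and `inRange` are hypothetical names, and treating both bounds as inclusive is an assumption; the library's handling of the upper bound may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Pixel = std::array<std::uint8_t, 3>;  // e.g. one H, S, V triple

// Returns a mask with 0xFF where every channel of the pixel lies in
// [lo, hi] (assumed inclusive on both ends) and 0 elsewhere.
std::vector<std::uint8_t> inRange(const std::vector<Pixel>& src,
                                  const Pixel& lo, const Pixel& hi) {
    for (std::size_t c = 0; c < 3; ++c) assert(lo[c] <= hi[c]);
    std::vector<std::uint8_t> dst;
    dst.reserve(src.size());
    for (const Pixel& p : src) {
        bool inside = true;
        for (std::size_t c = 0; c < 3; ++c)
            inside = inside && lo[c] <= p[c] && p[c] <= hi[c];
        dst.push_back(inside ? 0xFF : 0);
    }
    return dst;
}
```

With the bounds from the question, (20, 100, 100) and (30, 255, 255), only pixels whose hue, saturation and value all fall inside those limits end up white in the mask.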
The code below converts OpenGL output to a JPEG image using libjpeg, but the resulting image is flipped vertically.
The code works perfectly otherwise, but the final image is flipped and I don't know why.
unsigned char *pdata = new unsigned char[width*height*3];
glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, pdata);

FILE *outfile;
if ((outfile = fopen("sample.jpeg", "wb")) == NULL) {
    printf("can't open %s\n", "sample.jpeg");
    exit(1);
}

struct jpeg_compress_struct cinfo;
struct jpeg_error_mgr jerr;
cinfo.err = jpeg_std_error(&jerr);
jpeg_create_compress(&cinfo);
jpeg_stdio_dest(&cinfo, outfile);

cinfo.image_width = width;
cinfo.image_height = height;
cinfo.input_components = 3;
cinfo.in_color_space = JCS_RGB;
jpeg_set_defaults(&cinfo);
/* set the quality [0..100] */
jpeg_set_quality(&cinfo, 100, true);
jpeg_start_compress(&cinfo, true);

JSAMPROW row_pointer;
int row_stride = width * 3;
while (cinfo.next_scanline < cinfo.image_height) {
    row_pointer = (JSAMPROW) &pdata[cinfo.next_scanline*row_stride];
    jpeg_write_scanlines(&cinfo, &row_pointer, 1);
}

/* finish the file and release resources */
jpeg_finish_compress(&cinfo);
jpeg_destroy_compress(&cinfo);
fclose(outfile);
delete[] pdata;
OpenGL's coordinate system has its origin in the lower left corner of the image, whereas libjpeg assumes the origin is in the upper left corner. Make the following change to fix your code, so that scanlines are read from the buffer bottom row first:
while (cinfo.next_scanline < cinfo.image_height)
{
    row_pointer = (JSAMPROW) &pdata[(cinfo.image_height-1-cinfo.next_scanline)*row_stride];
    jpeg_write_scanlines(&cinfo, &row_pointer, 1);
}
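An alternative to reversing the row index while writing is to flip the pixel buffer once, right after glReadPixels, so that any later consumer sees a top-down image. The following is a minimal plain C++ sketch of that idea; `flipRowsInPlace` is a hypothetical helper, and the buffer is assumed to be tightly packed (no row padding).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reverses the order of the rows of a tightly packed image in place.
void flipRowsInPlace(std::vector<unsigned char>& pixels, int width, int height, int channels) {
    assert(width > 0 && height > 0 && channels > 0);
    const std::ptrdiff_t stride = static_cast<std::ptrdiff_t>(width) * channels;
    assert(pixels.size() == static_cast<std::size_t>(stride) * height);
    // Swap row `top` with row `bottom`, moving inwards until they meet.
    for (int top = 0, bottom = height - 1; top < bottom; ++top, --bottom) {
        std::swap_ranges(pixels.begin() + top * stride,
                         pixels.begin() + (top + 1) * stride,
                         pixels.begin() + bottom * stride);
    }
}
```

After such a flip, the scanline loop can index rows with next_scanline directly. The index-reversal fix above avoids the extra pass over the buffer, so it is usually the cheaper option when the image is only written once.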
I'm trying to detect an object using cvblob, so I use the cvRenderBlob() method. The program compiles successfully, but at run time it throws an unhandled exception. When I break, the debugger points to the statement CvLabel *labels = (CvLabel *)imgLabel->imageData + imgLabel_offset + (blob->miny * stepLbl); in the definition of cvRenderBlob() in cvblob.cpp. If I use cvRenderBlobs() instead, it works fine. I need to detect only one blob, the largest one. Could someone please help me handle this exception?
Here is my VC++ code:
CvCapture* capture = 0;
IplImage* frame = 0;
int key = 0;
CvBlobs blobs;
CvBlob *blob;

capture = cvCaptureFromCAM(0);
if (!capture) {
    printf("Could not initialize capturing....\n");
    return 1;
}

int screenx = GetSystemMetrics(SM_CXSCREEN);
int screeny = GetSystemMetrics(SM_CYSCREEN);

while (key!='q') {
    frame = cvQueryFrame(capture);
    if (!frame) break;

    IplImage* imgHSV = cvCreateImage(cvGetSize(frame), 8, 3);
    cvCvtColor(frame, imgHSV, CV_BGR2HSV);
    IplImage* imgThreshed = cvCreateImage(cvGetSize(frame), 8, 1);
    cvInRangeS(imgHSV, cvScalar(61, 156, 205),cvScalar(161, 256, 305), imgThreshed); // for light blue color
    IplImage* imgThresh = imgThreshed;
    cvSmooth(imgThresh, imgThresh, CV_GAUSSIAN, 9, 9);
    cvShowImage("Thresh", imgThresh);

    IplImage* labelImg = cvCreateImage(cvGetSize(imgHSV), IPL_DEPTH_LABEL, 1);
    unsigned int result = cvLabel(imgThresh, labelImg, blobs);
    blob = blobs[cvGreaterBlob(blobs)];
    cvRenderBlob(labelImg, blob, frame, frame);
    /*cvRenderBlobs(labelImg, blobs, frame, frame);*/
    /*cvFilterByArea(blobs, 60, 500);*/
    cvFilterByLabel(blobs, cvGreaterBlob(blobs));

    cvShowImage("Video", frame);
    key = cvWaitKey(1);
}
First off, I'd like to point out that you are actually using the regular C syntax; the C++ API uses the Mat class. I've been working on blob extraction based on green objects in the picture. Once the image is thresholded properly, meaning we have a "binary" background/foreground image, I use
findContours() //this function expects quite a bit, read documentation
It is described more clearly in the documentation on structural analysis. It gives you the contours of all the blobs in the image, in a vector that holds other vectors, each of which holds points in the image, like so:
vector<vector<Point>> contours;
I too need to find the biggest blob, and though my approach may be faulty to some extent, I don't need it to be any different. I use
minAreaRect() // expects a set of points (contained in the vector or Mat classes)
which is also described under structural analysis.
Then access the size of the rect:
int sizeOfObject = 0;
int idxBiggestObject = 0; // will track the biggest object
if (contours.size() != 0) // only runs if there are any blobs / contours in the image
{
    for (int i = 0; i < contours.size(); i++) // runs once per "blob" in the image
    {
        RotatedRect myVector = minAreaRect(contours[i]);
        if (myVector.size.area() > sizeOfObject)
        {
            sizeOfObject = myVector.size.area(); // saves area to compare with further blobs
            idxBiggestObject = i; // saves index, so you know which is biggest; alternatively, .push_back into another vector
        }
    }
}
So okay, we really only measure a rotated bounding box, but in most cases it will do. I hope that you will either switch to the C++ syntax or get some inspiration from the basic algorithm.
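If the bounding-box approximation turns out to be too coarse, the same selection loop can compare the area enclosed by each contour instead. The sketch below is plain C++ without OpenCV and swaps in the shoelace formula for the rotated rectangle; `Pt`, `polygonArea` and `indexOfLargestContour` are hypothetical names, and each contour is assumed to be a simple closed polygon given as an ordered list of points.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Area enclosed by a simple polygon, computed with the shoelace formula.
double polygonArea(const std::vector<Pt>& contour) {
    double twiceArea = 0.0;
    for (std::size_t i = 0; i < contour.size(); ++i) {
        const Pt& a = contour[i];
        const Pt& b = contour[(i + 1) % contour.size()];  // wraps to close the polygon
        twiceArea += a.x * b.y - b.x * a.y;
    }
    return std::fabs(twiceArea) / 2.0;
}

// Index of the contour with the largest area, or -1 if there are none.
int indexOfLargestContour(const std::vector<std::vector<Pt>>& contours) {
    int best = -1;
    double bestArea = -1.0;
    for (std::size_t i = 0; i < contours.size(); ++i) {
        const double area = polygonArea(contours[i]);
        if (area > bestArea) {
            bestArea = area;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Returning -1 for an empty list makes the "no blobs in this frame" case explicit, which is worth checking before rendering anything.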