I am trying to find a way to detect a hot object using Panasonic's GridEye which is a sensor that gives an 8x8 matrix with a temperature in each square.
Is there a common algorithm that allows to detect high variations in the coeffcients of my matrix? (allowing to detect very high and very low temperature objects)
In the image attached for example, I'd like to detect the red squares. The algorithm shall return "Object with temperature 28°C detected".
Temperature profile when a human enters the camera field (distance=1m)
Thanks in advance for your answers :)
Related
I am working on a video stabilization project and need a robust algorithm to estimate transformation matrix between adjacent frames.
My present approach is detecting Harris corners on a image and applying optical flow to find correspondent points of those corners on the other. Subsequently, I use RANSAC estimation to find Rigid transform between 2 point patterns, which represents the global motion between the two frames. The thing is, when images are low quality, there are few or even no key points detected, which leads to failed estimation of motion matrix.
Other methods such as block-based (using phase correlation) do not give better performance, from my experience. The most accurate method is image registration based on global motion optimization algorithm, but it's unfeasible for my application in term of computational load.
Could anyone please give me an idea?
Thank in advance.
So here's the thing: consider a detector, made of several vertical strings, each hosting 60 sensors distributed equidistantly, as shown in the following picture (dark dots are sensors):
Particles will then flow through and produce pulses on each sensor that you can use. Informations include time of passing by, coordinates (stringnumber+position on string or cartesian coordinates), total charge, etc.
The final goal is to reconstruct angles of incoming particles and their energies. Though we first just consider a simple classification problem of finding out whether the particle came in form the north or south hemisphere (i.e. from the top or bottom direction of the detector.
As input values we use every timestamp of each sensor, their exact position hasn't to be passed as it will be encoded by the position of the timestamp in the input list.
The problem we have is with sensor that didn't detect any particle. Is it clever to insert their timestamp as float("Inf")? Leaving their time as zero is another option, but it will then be ambiguous with the sensor that triggered as first (having timestamps normalised to [0,1]).
I'd be glad to hear any hint from people who have already worked with timestamps in neural networks. Also, if you have any good idea on developing NN's for our future tasks please share them! The planned approach are convolutional (deep) neural networks, but we still have to think about how to encode sensor-positions in that irregular hexagonal shape.
A few points to consider:
- How about -1?
- How about NaN?
What language are you programming in?
I am interested in using the Project Tango tablet for 3D reconstruction using arbitrary point features. In the current SDK version, we seem to have access to the following data.
A 1280 x 720 RGB image.
A point cloud with 0-~10,000 points, depending on the environment. This seems to average between 3,000 and 6,000 in most environments.
What I really want is to be able to identify a 3D point for key points within an image. Therefore, it makes sense to project depth into the image plane. I have done this, and I get something like this:
The problem with this process is that the depth points are sparse compared to the RGB pixels. So I took it a step further and performed interpolation between the depth points. First, I did Delaunay triangulation, and once I got a good triangulation, I interpolated between the 3 points on each facet and got a decent, fairly uniform depth image. Here are the zones where the interpolated depth is valid, imposed upon the RGB iamge.
Now, given the camera model, it's possible to project depth back into Cartesian coordinates at any point on the depth image (since the depth image was made such that each pixel corresponds to a point on the original RGB image, and we have the camera parameters of the RGB camera). However, if you look at the triangulation image and compare it to the original RGB image, you can see that depth is valid for all of the uninteresting points in the image: blank, featureless planes mostly. This isn't just true for this single set of images; it's a trend I'm seeing for the sensor. If a person stands in front of the sensor, for example, there are very few depth points within their silhouette.
As a result of this characteristic of the sensor, if I perform visual feature extraction on the image, most of the areas with corners or interesting textures fall in areas without associated depth information. Just an example: I detected 1000 SIFT keypoints from an an RGB image from an Xtion sensor, and 960 of those had valid depth values. If I do the same thing to this system, I get around 80 keypoints with valid depth. At the moment, this level of performance is unacceptable for my purposes.
I can guess at the underlying reasons for this: it seems like some sort of plane extraction algorithm is being used to get depth points, whereas Primesense/DepthSense sensors are using something more sophisticated.
So anyway, my main question here is: can we expect any improvement in the depth data at a later point in time, through improved RGB-IR image processing algorithms? Or is this an inherent limit of the current sensor?
I am from the Project Tango team at Google. I am sorry you are experiencing trouble with depth on the device. Just so that we are sure your device is in good working condition, can you please test the depth performance against a flat wall. Instructions are as below:
https://developers.google.com/project-tango/hardware/depth-test
Even with a device in good working condition, the depth library is known to return sparse depth points on scenes with low IR reflectance objects, small sized objects, high dynamic range scenes, surfaces at certain angles and objects at distances larger than ~4m. While some of these are inherent limitations in the depth solution, we are working with the depth solution provider to bring improvements wherever possible.
Attached an image of a typical conference room scene and the corresponding point cloud. As you can see, 1) no depth points are returned from the laptop screen (low reflectance), the table top objects such as post-its, pencil holder etc (small object sizes), large portions of the table (surface at an angles), room corner at the far right (distance >4m).
But as you move around the device, you will start getting depth point returns. Accumulating depth points is a must to get denser point clouds.
Please also keep us posted on your findings at project-tango-hardware-support#google.com
In my very basic initial experiments, you are correct with respect to depth information returned from the visual field, however, the return of surface points is anything but constant. I find as I move the device I can get major shifts in where depth information is returned, i.e. there's a lot of transitory opacity in the image with respect to depth data, probably due to the characteristics of the surfaces.
So while no return frame is enough, the real question seems to be the construction of a larger model (point cloud to open, possibly voxel spaces as one scales up) to bring successive scans into a common model. It's reminiscent of synthetic aperture algorithms in spirit, but the letters in the equations are from a whole different set of laws.
In short, I think a more interesting approach is to synthesize a more complete model by successive accumulation of point cloud data - now, for this to work, the device team has to have their dead reckoning on the money for whatever scale this is done. Also this addresses an issue that no sensor improvements can address - if your visual sensor is perfect, it still does nothing to help you relate the sides of an object at least be in the close neighborhood of the front of the object.
I have been working a self project in image processing and robotics where instead robot as usual detecting colors and picking out the object, it tries to detect the holes(resembling different polygons) on the board. For a better understanding of the setup here is an image:
As you can see I have to detect these holes, find out their shapes and then use the robot to fit the object into the holes. I am using a kinect depth camera to get the depth image. The pic is shown below:
I was lost in thought of how to detect the holes with the camera, initially using masking to remove the background portion and some of the foreground portion based on the depth measurement,but this did not work out as, at different orientations of the camera the holes would merge with the board... something like inranging (it fully becomes white). Then I came across adaptiveThreshold function
adaptiveThreshold(depth1,depth3,255,ADAPTIVE_THRESH_GAUSSIAN_C,THRESH_BINARY,7,-1.0);
With noise removal using erode, dilate, and gaussian blur; which detected the holes in a better manner as shown in the picture below. Then I used the cvCanny edge detector to get the edges but so far it has not been good as shown in the picture below.After this I tried out various feature detectors from SIFT, SURF, ORB, GoodFeaturesToTrack and found out that ORB gave the best times and the features detected. After this I tried to get the relative camera pose of a query image by finding its keypoints and matching those keypoints for good matches to be given to the findHomography function. The results are as shown below as in the diagram:
In the end i want to get the relative camera pose between the two images and move the robot to that position using the rotational and translational vectors got from the solvePnP function.
So is there any other method by which I could improve the quality of the
holes detected for the keypoints detection and matching?
I had also tried contour detection and approxPolyDP but the approximated shapes are not really good:
I have tried tweaking the input parameters for the threshold and canny functions but
this is the best I can get
Also ,is my approach to get the camera pose correct?
UPDATE : No matter what I tried I could not get good repeatable features to map. Then I read online that a depth image is cheap in resolution and its only used for stuff like masking and getting the distances. So , it hit me that the features are not proper because of the low resolution image with its messy edges. So I thought of detecting features on a RGB image and using the depth image to get only the distances of those features. The quality of features I got were literally off the charts.It even detected the screws on the board!! Here are the keypoints detected using GoodFeaturesToTrack keypoint detection..
I met an another hurdle while getting the distancewith the distances of the points not coming out properly. I searched for possible causes and it occured to me after quite a while that there was a offset in the RGB and depth images because of the offset between the cameras.You can see this from the first two images. I then searched the net on how to compensate this offset but could not find a working solution.
If anyone one of you could help me in compensate the offset,it would be great!
UPDATE: I could not make good use of the goodFeaturesToTrack function. The function gives the corners in Point2f type .If you want to compute the descriptors we need the keypoints and converting Point2f to Keypoint with the code snippet below leads to the loss of scale and rotational invariance.
for( size_t i = 0; i < corners1.size(); i++ )
{
keypoints_1.push_back(KeyPoint(corners1[i], 1.f));
}
The hideous result from the feature matching is shown below .
I have to start on different feature matchings now.I'll post further updates. It would be really helpful if anyone could help in removing the offset problem.
Compensating the difference between image output and the world coordinates:
You should use good old camera calibration approach for calibrating the camera response and possibly generating a correction matrix for the camera output (in order to convert them into real scales).
It's not that complicated once you have printed out a checkerboard template and capture various shots. (For this application you don't need to worry about rotation invariance. Just calibrate the world view with the image array.)
You can find more information here: http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/own_calib.html
--
Now since I can't seem to comment on the question, I'd like to ask if your specific application requires the machine to "find out" the shape of the hole on the fly. If there are finite amount of hole shapes, you may then model them mathematically and look for the pixels that support the predefined models on the B/W edge image.
Such as (x)^2+(y)^2-r^2=0 for a circle with radius r, whereas x and y are the pixel coordinates.
That being said, I believe more clarification is needed regarding the requirements of the application (shape detection).
If you're going to detect specific shapes such as the ones in your provided image, then you're better off using a classifer. Delve into Haar classifiers, or better still, look into Bag of Words.
Using BoW, you'll need to train a bunch of datasets, consisting of positive and negative samples. Positive samples will contain N unique samples of each shape you want to detect. It's better if N would be > 10, best if >100 and highly variant and unique, for good robust classifier training.
Negative samples would (obviously), contain stuff that do not represent your shapes in any way. It's just for checking the accuracy of the classifier.
Also, once you have your classifier trained, you could distribute your classifier data (say, suppose you use SVM).
Here are some links to get you started with Bag of Words:
https://gilscvblog.wordpress.com/2013/08/23/bag-of-words-models-for-visual-categorization/
Sample code:
http://answers.opencv.org/question/43237/pyopencv_from-and-pyopencv_to-for-keypoint-class/
How do I segment a 2D image into blobs of similar values efficiently? The given input is a n array of integer, which includes hue for non-gray pixels and brightness of gray pixels.
I am writing a virtual mobile robot using Java, and I am using segmentation to analyze the map and also the image from the camera. This is a well-known problem in Computer Vision, but when it's on a robot performance does matter so I wanted some inputs. Algorithm is what matters, so you can post code in any language.
Wikipedia article: Segmentation (image processing)
[PPT] Stanford CS-223-B Lecture 11 Segmentation and Grouping (which says Mean Shift is perhaps the best technique to date)
Mean Shift Pictures (paper is also available from Dorin Comaniciu)
I would downsample,in colourspace and in number of pixels, use a vision method(probably meanshift) and upscale the result.
This is good because downsampling also increases the robustness to noise, and makes it more likely that you get meaningful segments.
You could use floodfill to smooth edges afterwards if you need smoothness.
Some more thoughts (in response to your comment).
1) Did you blend as you downsampled? y[i]=(x[2i]+x[2i+1])/2 This should eliminate noise.
2)How fast do you want it to be?
3)Have you tried dynamic meanshift?(also google for dynamic x for all algorithms x)
Not sure if it is too efficient, but you could try using a Kohonen neural network (or, self-organizing map; SOM) to group the similar values, where each pixel contains the original color and position and only the color is used for the Kohohen grouping.
You should read up before you implement this though, as my knowledge of the Kohonen network goes as far as that it is used for grouping data - so I don't know what the performance/viability options are for your scenario.
There are also Hopfield Networks. They can be mangled into grouping from what I read.
What I have now:
Make a buffer of the same size as the input image, initialized to UNSEGMENTED.
For each pixel in the image where the corresponding buffer value is not UNSEGMENTED, flood the buffer using the pixel value.
a. The border checking of the flooding is done by checking if pixel is within EPSILON (currently set to 10) of the originating pixel's value.
b. Flood filling algorithm.
Possible issue:
The 2.a.'s border checking is called many times in the flood filling algorithm. I could turn it into a lookup if I could precalculate the border using edge detection, but that may add more time than current check.
private boolean isValuesCloseEnough(int a_lhs, int a_rhs) {
return Math.abs(a_lhs - a_rhs) <= EPSILON;
}
Possible Enhancement:
Instead of checking every single pixel for UNSEGMENTED, I could randomly pick a few points. If you are expecting around 10 blobs, picking random points in that order may suffice. Drawback is that you might miss a useful but small blob.
Check out Eyepatch (eyepatch.stanford.edu). It should help you during the investigation phase by providing a variety of possible filters for segmentation.
An alternative to flood-fill is the connnected-components algorithm. So,
Cheaply classify your pixels. e.g. divide pixels in colour space.
Run the cc to find the blobs
Retain the blobs of significant size
This approach is widely used in early vision approaches. For example in the seminal paper "Blobworld: A System for Region-Based Image Indexing and Retrieval".