I am working on a problem that involves segregating solar panel vs house.
Both the house as well as panel are of same color.
NOTE: There are two houses in the image. I am referring to the one which is bluish.
PFB the image as well as my approach.
Any insights how to deal with such situations are welcome.
My approach
transform to hsv colorspace
Perform thresholding on hue component of the image.
Dialate/Erode.
hsv_img = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
## Thresholding values
red_MIN = np.array([100, 10, 10],np.uint8)
red_MAX = np.array([130, 255, 255],np.uint8)
frame_threshed = cv2.inRange(hsv_img, red_MIN, red_MAX)
k_dialation = np.ones((5,5),np.uint8)
dialation = cv2.dilate(frame_threshed,k_dialation, iterations =5)
k = np.ones((3,3),np.int8)
erosion = cv2.erode(dialation,k,iterations =8)
I also tried drawing contours perform shape analysis,
calculate area,
but as both panel and house have same shape from top and similar area, This approach doesnt work.
I tried template matching,
query image : House structure.
I convolved over the image with query image to find relevant structure(house in my case)
steps 4-6 might work on this image but isnt generalized solution . A slight variation in terms of house position or a different shape house will break the algorithm.
Result after doing 1-5 steps.
Since the problem involves segregating the solar panels with roof tops, you cant expect a generic OpenCV solution to work out. Using supervised learning methods can be one choice.
If you want to segregate roof tops from solar panels, one distinction we can observe is that solar panels have a certain repetitive distinctive pattern. This can be employed for the job. Let me introduce to you this method called Histogram of Sparse Codes.
It is more like the HOG, where the object is recognized by the shape (gradients), with the difference that HSC takes " by learning and
using local representations that are much more expressive
than gradients ".
Then, as an improvement, you can try region proposal algorithms like Selective search instead of sliding window based approach.
HSC makes a dictionary of signals (Signals here are patches of images) from the train images and then reconstructs your test image from those signals. Then a histogram of the codes used to regenerate each signal is generated. Since your region of interest is unique, different set of codes are generated for your object and background and hence can make a distinction. Passing it to SVM can easily segregate your solar panels.
Link to the paper.
Link for selective search
You can use mlpack for sparse coding implementation.
All the best finding solar panel datasets.
Image after selective search (No thresholding etc hence more generalized usage):
If you are finding all the above very difficult to implement. Find the "+" signs inside the selective search regions. You can use template matching or HOG. Then the region with more rate of +'s is the solar panel.
Related
For my bachelor thesis I need to analyse images taken in the ocean to count and measure the size of water particles.
my problem:
besides the wanted water particles, the images show hexagonal patches all over the image in:
- different sizes
- not regular shape
- different greyscale values
(Example image below!)
It is clear that these patches will falsify my image analysis concerning the size and number of particles.
For this reason this patches need to be detected and deleted somehow.
Since it will be just a little part of the work in my thesis, I don't want to spend much time in it and already tried classic ways like: (imageJ)
playing with the threshold (resulting in also deleting wanted water particles)
analyse image including the hexagonal patches and later sort out the biggest areas (the hexagonal patches have quite the biggest areas, but you will still have a lot of haxagons)
playing with filters: using gaussian filter on a duplicated image and subtract the copy from the original deletes many patches (in reducing the greyscale value) but also deletes little wanted water particles and so again falsifies the result
a more complicated and time consuming solution would be to use a implemented library in for example matlab or opencv to detect points, that describe the shapes.
but so far I could not find any code that fits my task.
Does anyone of you have created such a code I could use for my task or any other idea?
You can see a lot of hexagonal patches in different depths also.
the little spots with an greater pixel value are the wanted particles!
Image processing is quite an involved area so there are no hard and fast rules.
But if it was me I would 'Mask' the image. This involves either defining what you want to keep or remove as a pixel 'Mask'. You then scan the mask over the image recursively and compare the mask to the image portion selected. You then select or remove the section (depending on your method) if it meets your criterion.
One such example of a criteria would be the spatial and grey-scale error weighted against a likelihood function (eg Chi-squared, square mean error etc.) or a Normal distribution that you define the uncertainty..
Some food for thought
Maybe you can try with the Hough transform:
https://en.wikipedia.org/wiki/Hough_transform
Matlab have an built-in function, hough, wich implements this, but only works for lines. Maybe you can start from that and change it to recognize hexagons.
I have been working a self project in image processing and robotics where instead robot as usual detecting colors and picking out the object, it tries to detect the holes(resembling different polygons) on the board. For a better understanding of the setup here is an image:
As you can see I have to detect these holes, find out their shapes and then use the robot to fit the object into the holes. I am using a kinect depth camera to get the depth image. The pic is shown below:
I was lost in thought of how to detect the holes with the camera, initially using masking to remove the background portion and some of the foreground portion based on the depth measurement,but this did not work out as, at different orientations of the camera the holes would merge with the board... something like inranging (it fully becomes white). Then I came across adaptiveThreshold function
adaptiveThreshold(depth1,depth3,255,ADAPTIVE_THRESH_GAUSSIAN_C,THRESH_BINARY,7,-1.0);
With noise removal using erode, dilate, and gaussian blur; which detected the holes in a better manner as shown in the picture below. Then I used the cvCanny edge detector to get the edges but so far it has not been good as shown in the picture below.After this I tried out various feature detectors from SIFT, SURF, ORB, GoodFeaturesToTrack and found out that ORB gave the best times and the features detected. After this I tried to get the relative camera pose of a query image by finding its keypoints and matching those keypoints for good matches to be given to the findHomography function. The results are as shown below as in the diagram:
In the end i want to get the relative camera pose between the two images and move the robot to that position using the rotational and translational vectors got from the solvePnP function.
So is there any other method by which I could improve the quality of the
holes detected for the keypoints detection and matching?
I had also tried contour detection and approxPolyDP but the approximated shapes are not really good:
I have tried tweaking the input parameters for the threshold and canny functions but
this is the best I can get
Also ,is my approach to get the camera pose correct?
UPDATE : No matter what I tried I could not get good repeatable features to map. Then I read online that a depth image is cheap in resolution and its only used for stuff like masking and getting the distances. So , it hit me that the features are not proper because of the low resolution image with its messy edges. So I thought of detecting features on a RGB image and using the depth image to get only the distances of those features. The quality of features I got were literally off the charts.It even detected the screws on the board!! Here are the keypoints detected using GoodFeaturesToTrack keypoint detection..
I met an another hurdle while getting the distancewith the distances of the points not coming out properly. I searched for possible causes and it occured to me after quite a while that there was a offset in the RGB and depth images because of the offset between the cameras.You can see this from the first two images. I then searched the net on how to compensate this offset but could not find a working solution.
If anyone one of you could help me in compensate the offset,it would be great!
UPDATE: I could not make good use of the goodFeaturesToTrack function. The function gives the corners in Point2f type .If you want to compute the descriptors we need the keypoints and converting Point2f to Keypoint with the code snippet below leads to the loss of scale and rotational invariance.
for( size_t i = 0; i < corners1.size(); i++ )
{
keypoints_1.push_back(KeyPoint(corners1[i], 1.f));
}
The hideous result from the feature matching is shown below .
I have to start on different feature matchings now.I'll post further updates. It would be really helpful if anyone could help in removing the offset problem.
Compensating the difference between image output and the world coordinates:
You should use good old camera calibration approach for calibrating the camera response and possibly generating a correction matrix for the camera output (in order to convert them into real scales).
It's not that complicated once you have printed out a checkerboard template and capture various shots. (For this application you don't need to worry about rotation invariance. Just calibrate the world view with the image array.)
You can find more information here: http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/own_calib.html
--
Now since I can't seem to comment on the question, I'd like to ask if your specific application requires the machine to "find out" the shape of the hole on the fly. If there are finite amount of hole shapes, you may then model them mathematically and look for the pixels that support the predefined models on the B/W edge image.
Such as (x)^2+(y)^2-r^2=0 for a circle with radius r, whereas x and y are the pixel coordinates.
That being said, I believe more clarification is needed regarding the requirements of the application (shape detection).
If you're going to detect specific shapes such as the ones in your provided image, then you're better off using a classifer. Delve into Haar classifiers, or better still, look into Bag of Words.
Using BoW, you'll need to train a bunch of datasets, consisting of positive and negative samples. Positive samples will contain N unique samples of each shape you want to detect. It's better if N would be > 10, best if >100 and highly variant and unique, for good robust classifier training.
Negative samples would (obviously), contain stuff that do not represent your shapes in any way. It's just for checking the accuracy of the classifier.
Also, once you have your classifier trained, you could distribute your classifier data (say, suppose you use SVM).
Here are some links to get you started with Bag of Words:
https://gilscvblog.wordpress.com/2013/08/23/bag-of-words-models-for-visual-categorization/
Sample code:
http://answers.opencv.org/question/43237/pyopencv_from-and-pyopencv_to-for-keypoint-class/
i need a way to determine wheter a picture is a photograph or not. I've got a bunch of random image files (paper document scans, logos and of course photographs taken by a camera) and i need to filter out only the photographs for creating a preview.
The solution proposed at Determine if image is photograph or drawing, quickly only works in a limited way (i.e. some logos are completly black with wite font, some logos have only colors in it - no white areas) and sometimes i've got scan of a white paper containing multiple photographs with white space arround - i need to identify those, too - because then i have to key out the white part and save the photographs on the scan in seperate files.
Your process to do this should probably be similar to the following:
Extract features from the image (pixel values, groups of pixels,
HoG, SIFT, GIST, DCT, Wavelet, Dictionary learning coefficients,
etc. depending on how much time you have)
Aggregate these features somehow so that you get a fixed length
vector (histogram, pyramid scheme)
Apply a standard classification (SVM, k-NN, neural network, Random
Forest) or clustering algorithm (k-means, GMM, etc.) and measure how
well it works (F1 score is usually okay, ROC may be better for
2-class problems)
Repeat from step 1 with different features if you are unsatisfied with the results from 3
The solution you reference seems to be pretty reasonable in terms of steps 1 and 2.
A simple next step in extracting and aggregating features could be to create histograms from all pixel values in the image. If you have a lot of labeled data you should feed these features to a standard classifier. Otherwise, run a clustering algorithm on these histogram features and check the cluster assignments to see if they are correlated with the photograph/non-photograph assignment.
Check the following paper:
http://www.vision.ee.ethz.ch/~gallju/projects/houghforest/houghforest.html . They provide source code.
I believe the program accepts an input file with negative and positive images for training. The output of the classification part of it will be a image voting map (hough map?). You might need to decide on a threshold value to locate regions of interest. So if there two logos in the image it will mark out both of them. The algorithm worked very well for me in a past.
Training on 100 positive and 100 negative images should be enough, I believe. Don't use big images for training also (256x256 should be enough).
I have a web cam that takes a picture every N seconds. This gives me a collection of images of the same scene over time. I want to process that collection of images as they are created to identify events like someone entering into the frame, or something else large happening. I will be comparing images that are adjacent in time and fixed in space - the same scene at different moments of time.
I want a reasonably sophisticated approach. For example, naive approaches fail for outdoor applications. If you count the number of pixels that change, for example, or the percentage of the picture that has a different color or grayscale value, that will give false positive reports every time the sun goes behind a cloud or the wind shakes a tree.
I want to be able to positively detect a truck parking in the scene, for example, while ignoring lighting changes from sun/cloud transitions, etc.
I've done a number of searches, and found a few survey papers (Radke et al, for example) but nothing that actually gives algorithms that I can put into a program I can write.
Use color spectroanalisys, without luminance: when the Sun goes down for a while, you will get similar result, colors does not change (too much).
Don't go for big changes, but quick changes. If the luminance of the image changes -10% during 10 min, it means the usual evening effect. But when the change is -5%, 0, +5% within seconds, its a quick change.
Don't forget to adjust the reference values.
Split the image to smaller regions. Then, when all the regions change same way, you know, it's a global change, like an eclypse or what, but if only one region's parameters are changing, then something happens there.
Use masks to create smart regions. If you're watching a street, filter out the sky, the trees (blown by wind), etc. You may set up different trigger values for different regions. The regions should overlap.
A special case of the region is the line. A line (a narrow region) contains less and more homogeneous pixels than a flat area. Mark, say, a green fence, it's easy to detect wheter someone crosses it, it makes bigger change in the line than in a flat area.
If you can, change the IRL world. Repaint the fence to a strange color to create a color spectrum, which can be identified easier. Paint tags to the floor and wall, which can be OCRed by the program, so you can detect wheter something hides it.
I believe you are looking for Template Matching
Also i would suggest you to look on to Open CV
We had to contend with many of these issues in our interactive installations. It's tough to not get false positives without being able to control some of your environment (sounds like you will have some degree of control). In the end we looked at combining some techniques and we created an open piece of software named OpenTSPS (Open Toolkit for Sensing People in Spaces - http://www.opentsps.com). You can look at the C++ source in github (https://github.com/labatrockwell/openTSPS/).
We use ‘progressive background relearn’ to adjust to the changing background over time. Progressive relearning is particularly useful in variable lighting conditions – e.g. if lighting in a space changes from day to night. This in combination with blob detection works pretty well and the only way we have found to improve is to use 3D cameras like the kinect which cast out IR and measure it.
There are other algorithms that might be relevant, like SURF (http://achuwilson.wordpress.com/2011/08/05/object-detection-using-surf-in-opencv-part-1/ and http://en.wikipedia.org/wiki/SURF) but I don't think it will help in your situation unless you know exactly the type of thing you are looking for in the image.
Sounds like a fun project. Best of luck.
The problem you are trying to solve is very interesting indeed!
I think that you would need to attack it in parts:
As you already pointed out, a sudden change in illumination can be problematic. This is an indicator that you probably need to achieve some sort of illumination-invariant representation of the images you are trying to analyze.
There are plenty of techniques lying around, one I have found very useful for illumination invariance (applied to face recognition) is DoG filtering (Difference of Gaussians)
The idea is that you first convert the image to gray-scale. Then you generate two blurred versions of this image by applying a gaussian filter, one a little bit more blurry than the first one. (you could use a 1.0 sigma and a 2.0 sigma in a gaussian filter respectively) Then you subtract from the less-blury image, the pixel intensities of the more-blurry image. This operation enhances edges and produces a similar image regardless of strong illumination intensity variations. These steps can be very easily performed using OpenCV (as others have stated). This technique has been applied and documented here.
This paper adds an extra step involving contrast equalization, In my experience this is only needed if you want to obtain "visible" images from the DoG operation (pixel values tend to be very low after the DoG filter and are veiwed as black rectangles onscreen), and performing a histogram equalization is an acceptable substitution if you want to be able to see the effect of the DoG filter.
Once you have illumination-invariant images you could focus on the detection part. If your problem can afford having a static camera that can be trained for a certain amount of time, then you could use a strategy similar to alarm motion detectors. Most of them work with an average thermal image - basically they record the average temperature of the "pixels" of a room view, and trigger an alarm when the heat signature varies greatly from one "frame" to the next. Here you wouldn't be working with temperatures, but with average, light-normalized pixel values. This would allow you to build up with time which areas of the image tend to have movement (e.g. the leaves of a tree in a windy environment), and which areas are fairly stable in the image. Then you could trigger an alarm when a large number of pixles already flagged as stable have a strong variation from one frame to the next one.
If you can't afford training your camera view, then I would suggest you take a look at the TLD tracker of Zdenek Kalal. His research is focused on object tracking with a single frame as training. You could probably use the semistatic view of the camera (with no foreign objects present) as a starting point for the tracker and flag a detection when the TLD tracker (a grid of points where local motion flow is estimated using the Lucas-Kanade algorithm) fails to track a large amount of gridpoints from one frame to the next. This scenario would probably allow even a panning camera to work as the algorithm is very resilient to motion disturbances.
Hope this pointers are of some help. Good Luck and enjoy the journey! =D
Use one of the standard measures like Mean Squared Error, for eg. to find out the difference between two consecutive images. If the MSE is beyond a certain threshold, you know that there is some motion.
Also read about Motion Estimation.
if you know that the image will remain reletivly static I would reccomend:
1) look into neural networks. you can use them to learn what defines someone within the image or what is a non-something in the image.
2) look into motion detection algorithms, they are used all over the place.
3) is you camera capable of thermal imaging? if so it may be worthwile to look for hotspots in the images. There may be existing algorithms to turn your webcam into a thermal imager.
I have a number of images (as well as the original data sources) that exhibit specific features. Some of them have distinct vertical/horizontal regions, as shown in the following figure or simply "blobs"/concentrations of points in very specific regions.
These images are associated with specific labels/classes, for instance, a label "A" exhibits very characteristic horizontal lines (like those marked in figure) at y = 700 and y = 150. Those images that belong to class "B", exhibit vertical lines at x = 200, 260 and 370, class "C"..., and so on.
Besides these known/labelled classes, I have a bunch of images that exhibit one of these features, or their combination.
My goal is to use these known classes to train some ML algorithm in order to further use it for classifying those images that do not have any labels. I understand that I need to somehow extract these particularities (vertical/horizontal lines, blobs of high point density that usually occur in the upper-right corner of the image, or in the (x,y) region of (250-400, 800-1500) and so on). Next, I would need to train some ML algorithm with these features, and only then use the trained system for classif.
I have been looking and playing with some tools for 3-4 days now (like PIL, with different blurring, smoothing and edge detecting techniques, or MDP's Gaussian classifiers and many posts on stackoverflow). The problem is that I cannot for a clear "solution process + appropriate tools" combination.
I would greatly appreciate if someone could guide me a bit more into the techniques for extracting these very specific/weird features from images (or even original datasets), and/or tools to use.
I understand you have the feature vectors for your samples (training data).
If this is so and you are only looking for a machine learning algorithm implementation, I would suggest you to use Support Vector Machines SVM. A popular implementation called SVM-light is available free of cost for your use. http://svmlight.joachims.org/
Please note that the above site gives a 2-class implementation. If you need a multi-class SVM you can get it from http://svmlight.joachims.org/svm_multiclass.html
Yet few more popular classifiers are
Nearest Neighbour classifier
C4.5 Decision Trees
Neural Network