I need to do computer vision tasks in order to detect water bottles or soda cans. I will obtain 'frontal' images of bottles, soda cans, or other random objects (one at a time), and my algorithm should determine whether it's a bottle, a can, or neither of them.
Some details about the object detection scenario:
As mentioned, I will test one single object per image/video frame.
Not all water bottles are the same. The plastic, lid, or label color may vary, and some bottles may have no label or lid at all.
The same variation applies to soda cans, although no wrinkled cans will be tested.
There could be small size variation between objects.
I could have a green (or any custom color) background.
I will apply any needed filters to the image.
This will be run on a Raspberry Pi.
Just in case, an example of each:
I've tested OpenCV's face detection algorithms a couple of times and I know they work pretty well, but with this approach I'd need to obtain a special Haar cascade features XML file for each custom object I want to detect.
So, the distinct alternatives I have in mind are:
Creating a custom Haar Classifier.
Considering shapes.
Considering outlines.
I'd like to keep the algorithm simple, and I think creating a custom Haar classifier might not even be needed. What would you suggest?
Update
I strongly considered the shape/aspect ratio approach.
However, I'm facing some issues, since bottles come in different sizes and even shapes. But this made me settle on the following considerations:
I'm applying a threshold with THRESH_BINARY method. (Thanks to the answers).
I will use a white background on detection.
Soda cans are all same size.
So, a bounding box for soda cans with high accuracy might distinguish a can.
What I've achieved:
Thresholding really helped me; I could notice that in the white background tests I would obtain this for cans:
And this is what was obtained for bottles:
So the dominance of darker areas is noticeable. There are some cases with cans where this might turn into false negatives, and for bottles, light and angle may lead to inconsistent results, but I really do think this could be a shorter approach.
So I'm quite confused now about how I should evaluate that darkness dominance. I've read that findContours can lead to it, but I'm quite lost on how to make use of that function. For example, in the case of soda cans it may find several contours, so I get lost on what to evaluate.
Note: I'm open to testing any other algorithms or libraries besides OpenCV.
I see a few basic ideas here:
Check the object's (to be precise, the object's bounding rect) height/width ratio. For a can it's approximately 2-2.5; for a bottle I think it will be >3. It's a very simple idea, so it should be easy to test quickly, and I think it should have quite good accuracy. For some in-between values, like 2.75 (assuming the values I gave are correct, which most likely isn't true), you can fall back to some different algorithm (see the sketch after this list).
Check whether your object contains glass/transparent regions - if yes, then it's definitely a bottle. Here you can read more about it.
Use the GrabCut algorithm to get the object mask/a more precise shape, and check whether the shape's width at the top is similar to the width at the bottom - if yes, then it's a can; if not, it's a bottle (bottles have a screw cap at the top).
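A minimal sketch of the first idea in Python, assuming OpenCV 4's two-value findContours return and a binarised image binary in which the object shows up as the white region; the ratio cut-offs are only the rough guesses given above:
import cv2

def classify_by_aspect_ratio(binary):
    # outer contours of the thresholded object
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return "unknown"
    # assume the largest blob is the object of interest
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    ratio = float(h) / w          # height/width of the bounding rect
    if ratio > 3.0:
        return "bottle"
    if 2.0 <= ratio <= 2.5:
        return "can"
    return "unknown"              # ambiguous; hand off to another test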
Since you want to recognize can vs. bottle rather than Pepsi vs. Coke, shape matching is probably the way to go compared to Haar and the features2d matchers like SIFT/SURF/ORB.
A unique background color will make things easier.
First create a histogram from an image of just the background
int channels[] = {0,1,2}; // use all the channels
int rgb_bins = 32; // quantize to 32 colors per channel
int histSize[] = {rgb_bins, rgb_bins, rgb_bins};
float _range[] = {0,256}; // upper bound is exclusive, so 256 covers values up to 255
const float* ranges[] = {_range, _range, _range};
cv::SparseMat bghist;
cv::calcHist(&bg_image, 1, channels, cv::noArray(),bghist, 3, histSize, ranges );
Then use calcBackProject to create a mask of bg and not bg
cv::MatND temp_ND;
cv::calcBackProject( &bottle_image, 1, channels, bghist, temp_ND, ranges );
cv::Mat bottle_mask, bottle_backproj;
if( feeling_lazy ){
cv::normalize(temp_ND, bottle_backproj, 0, 255, cv::NORM_MINMAX, CV_8U);
//a small blur here could work nicely
threshold( bottle_backproj, bottle_mask, 0, 255, THRESH_OTSU );
bottle_mask = cv::Scalar(255) - bottle_mask; //invert the mask
} else {
//finding just the right value here might be better than the above method
int magic_threshold = 64;
temp_ND.convertTo( bottle_backproj, CV_8U, 255.);
//I expect temp_ND to be CV_32F ranging from 0-1, but I might be wrong.
threshold( bottle_backproj, bottle_mask, magic_threshold, 255, THRESH_BINARY_INV );
}
Then either:
Compare bottle_mask or bottle_backproj to a few sample bottle masks/backprojections using matchTemplate with a threshold on confidence to decide if it's a match.
cv::Mat result;
matchTemplate(bottle_mask, bottle_template, result, CV_TM_CCORR_NORMED);
double confidence; minMaxLoc( result, NULL, &confidence);
Or use matchShapes, though I've never gotten this to work properly.
double confidence = matchShapes(bottle_mask, bottle_template, CV_CONTOURS_MATCH_I3);
Or use linemod which is difficult to set up but works great for images like this where the shape isn't very complex. Aside from the linked file, I haven't found any working samples of this method so here's what I did.
First create/train the detector with some sample images
//some magic numbers
std::vector<int> T_at_level;
T_at_level.push_back(4);
T_at_level.push_back(8);
//add some padding so linemod doesn't scream at you
const int T = 32;
int width = bottle_mask.cols;
if( width % T != 0)
width += T - width % T;
int height = bottle_mask.rows;
if( height % T != 0)
height += T - height % T;
//in this case template_backproj is created specifically from a sample bottle_backproj
cv::Rect padded_roi( (width - template_backproj.cols)/2, (height - template_backproj.rows)/2, template_backproj.cols, template_backproj.rows);
cv::Mat padded_backproj = cv::Mat::zeros( height, width, template_backproj.type());
template_backproj.copyTo( padded_backproj( padded_roi ) );
cv::Mat padded_mask = cv::Mat::zeros( height, width, template_mask.type());
template_mask.copyTo( padded_mask( padded_roi ) );
//you might need to erode padded_mask by a few pixels.
//initialize detector
std::vector< cv::Ptr<cv::linemod::Modality> > modalities;
modalities.push_back( cv::makePtr<cv::linemod::ColorGradient>() ); //for those that don't have a kinect
cv::Ptr<cv::linemod::Detector> new_detector = cv::makePtr<cv::linemod::Detector>(modalities, T_at_level);
//add sample images to the detector
std::vector<cv::Mat> template_images;
template_images.push_back( padded_backproj );
cv::Rect ignore_me;
const std::string class_id = "bottle";
int template_id = new_detector->addTemplate(template_images, class_id, padded_mask, &ignore_me);
Then do some matching
std::vector<cv::Mat> sources_vec;
sources_vec.push_back( padded_backproj );
//padded_backproj doesn't need to be the same size as the trained template images, but it does need to be padded the same way.
float matching_threshold = 0.8; //a higher number makes the algorithm faster
std::vector<cv::linemod::Match> matches;
std::vector<cv::String> class_ids;
new_detector->match(sources_vec, matching_threshold, matches,class_ids);
float confidence = matches.size() > 0? matches[0].similarity : 0;
As cyriel suggests, the aspect ratio (width/height) might be one useful measure. Here is some OpenCV Python code that finds contours (hopefully including the outline of the bottle or can) and gives you aspect ratio and some other measurements:
# src image should have already had some contrast enhancement (such as
# cv2.threshold) and edge finding (such as cv2.Canny)
contours, hierarchy = cv2.findContours(src, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    num_points = len(contour)
    if num_points < 5:
        # The contour has too few points to fit an ellipse. Skip it.
        continue
    # We could use area to help determine the type of object.
    # Small contours are probably false detections (not really a whole object).
    area = cv2.contourArea(contour)
    bounding_ellipse = cv2.fitEllipse(contour)
    center, radii, angle_degrees = bounding_ellipse
    # Let's define an ellipse's normal orientation to be landscape (width > height).
    # We must ensure that the ellipse's measurements match this orientation.
    if radii[0] < radii[1]:
        radii = (radii[1], radii[0])
        angle_degrees -= 90.0
    # We could use the angle to help determine the type of object.
    # A bottle or can's angle is probably approximately a multiple of 90 degrees,
    # assuming that it is at rest and not falling.
    # Calculate the aspect ratio (width / height), which is >= 1 after the
    # reorientation above. For example, 2.0 means the object's longer dimension
    # is 2 times its shorter one.
    # A bottle is probably taller than a can.
    aspect_ratio = radii[0] / radii[1]
For checking transparency, you can compare the picture to a known background using histogram analysis or background subtraction.
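For example, a rough sketch of that background-comparison idea (the names scene and background are assumptions: two grayscale images, with and without the object, taken under the same lighting; the difference threshold of 25 is an arbitrary guess):
import cv2

# absolute difference between the current frame and an empty-background shot
diff = cv2.absdiff(scene, background)
_, changed = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

# bounding box around everything that differs from the background
contours, _ = cv2.findContours(changed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))

# fraction of pixels inside that box that actually changed;
# a transparent bottle should leave this noticeably lower than an opaque can
roi = changed[y:y+h, x:x+w]
changed_fraction = cv2.countNonZero(roi) / float(roi.size)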
The contour's moments can be used to determine its centroid (center of gravity):
moments = cv2.moments(contour)
m00 = moments['m00']
m01 = moments['m01']
m10 = moments['m10']
centroid = (m10 / m00, m01 / m00)
You could compare this to the center. If the object is bigger ("heavier") on one end, the centroid will be closer to that end than the center is.
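For instance, reusing center from fitEllipse and centroid from the moments above (just an illustration of that comparison, not part of the original snippet):
# positive offset: the centroid sits below the ellipse center in image coordinates,
# i.e. the object is bottom-heavy, as an upright bottle with a narrow neck would be
vertical_offset = centroid[1] - center[1]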
So, my main approach for detection was:
Bottles are transparent and cans are opaque
Generally, the algorithm consisted of:
Take a grayscale picture.
Apply a binary threshold.
Select a convenient ROI from it.
Obtain its mean intensity and even the standard deviation.
Distinguish.
Implementation was basically reduced to this function (where CAN and BOTTLE were previously defined):
int detector(int x, int y, int width, int height, int thresholdValue, CvCapture* capture) {
Mat img;
Rect r;
vector<Mat> channels;
r = Rect(x,y,width,height);
if ( !capture ) {
fprintf( stderr, "ERROR: capture is NULL \n" );
getchar();
return -1;
}
img = Mat(cvQueryFrame( capture ));
cvtColor(img, img, CV_BGR2GRAY); // captured frames are BGR, not RGB
threshold(img, img, 127, 255, THRESH_BINARY);
// ROI
Mat roiImage = img(r);
split(roiImage, channels);
Scalar m = mean(channels[0]);
float media = m[0];
printf("Media: %f\n", media);
if (media < thresholdValue) {
return CAN;
}
else {
return BOTTLE;
}
}
As can be seen, a THRESH_BINARY threshold was applied, and a plain white background was used. However, the main and critical issue I faced with this whole approach and algorithm was luminosity changes in the environment, even minor ones.
Sometimes I could notice that THRESH_BINARY_INV might help more, but I wonder whether I could use certain threshold parameters, or whether applying other filters may help get rid of environment lighting as an issue.
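One thing that might be worth trying for the lighting issue (an untested suggestion, not part of the approach above) is letting OpenCV pick the threshold per frame instead of hard-coding 127, e.g. Otsu's method or an adaptive threshold, sketched here in Python with frame standing in for a captured BGR image:
import cv2

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Otsu picks a global threshold from each frame's histogram, which absorbs
# slow, uniform changes in overall scene brightness
_, binary_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# an adaptive threshold computes a local threshold per neighbourhood,
# which helps when illumination varies across the image
binary_adaptive = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 2)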
I really appreciate the aspect ratio calculation approach, whether from a bounding box or from finding contours, but I found this one straightforward and simple once the conditions were adjusted.
I'd use deep learning, based on Transfer learning.
The idea is this: given a highly complex, well-trained neural network that was trained on a similar classification task (typically over a large public dataset, like ImageNet), you can freeze the majority of its weights and only train the last layers. There are lots of tutorials out there. You don't need to have a background in deep learning.
There is an almost out-of-the-box tutorial with TensorFlow here, and here is another based on Keras.
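As a bare-bones sketch of that idea with Keras (my own illustration, not taken from those tutorials; MobileNetV2 as the frozen base, the three-class head, and the train_ds dataset are all assumptions):
import tensorflow as tf

# frozen convolutional base pre-trained on ImageNet
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

# small trainable head for three classes: bottle, can, other
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds would be a tf.data.Dataset of (image, label) pairs built from your own photos
# model.fit(train_ds, epochs=5)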
You know how every colour eventually turns white in an image if it's bright enough or sufficiently over-exposed? I'm trying to figure out a function to do this, to apply to generated HDR images, in a realistic and pleasing-looking way (using idealised camera performance as a reference, I guess).
The problem the algorithm/function I want should solve is this: say you have an orange pixel with the (linear RGB) values {1.0, 0.2, 0.0}. Everything is fine if you multiply each value by a factor of 1.0 or less, but say you multiply that pixel by 6; now you get {6.0, 1.2, 0.0}. What do you do with your out-of-range red and green values of 6.0 and 1.2? You could clip them, which would give you {1.0, 1.0, 0.0}; sadly that is what Photoshop and 3ds Max seem to do, but it looks very wrong, as your formerly orange pixel is now yellow (so if you start with any saturated hue, meaning at least one channel is 0.0, you always end up with either magenta, yellow or cyan) and it will never become white.
I considered taking half of the excess of one channel and splitting it equally between the other channels, so for example {1.6, 0.5, 0.1} would become {1.0, 0.8, 0.4} but it's too simplistic and not very realistic. I strongly doubt that an acceptable solution could be anywhere near this trivial.
I'm sure there must have been research done on the topic, but I cannot find any relevant literature and sensitometry doesn't seem to be quite what I'm looking for.
Modifying the Python code I left in an answer on another question to work in the range [0.0-1.0]:
def redistribute_rgb(r, g, b):
    threshold = 1.0
    m = max(r, g, b)
    if m <= threshold:
        return r, g, b
    total = r + g + b
    if total >= 3 * threshold:
        return threshold, threshold, threshold
    x = (3 * threshold - total) / (3 * m - total)
    gray = threshold - x * m
    return gray + x * r, gray + x * g, gray + x * b
This should return acceptable results in either a linear or gamma-corrected color space, although linear will be better.
Multiplying each r,g,b value by the same amount retains their original proportions and thus the hue, up to the point where x=0 and you've achieved white. You've expressed interest in a non-linear response once clipping starts, but I'm not entirely sure how to work that in. The math was carefully chosen so that at least one of the returned values will be at the threshold, and none will be above.
Running this on your example of (1.6, 0.5, 0.1) returns (1.0, 0.6615, 0.5385).
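A quick way to reproduce that number with the function above:
print(redistribute_rgb(1.6, 0.5, 0.1))   # -> (1.0, 0.6615..., 0.5384...)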
I've found a way to do it based on Mark Ransom's suggestion, with a twist. When the colour is out of gamut, we compute the grey colour of equivalent perceptual luminosity, then we linearly interpolate between the out-of-gamut input colour and that grey value to find the first in-gamut colour. Weighting each RGB channel to get the perceptual luminosity part is the tricky bit, seeing as the most commonly used luminance formula, L = 0.2126*red + 0.7152*green + 0.0722*blue, is quite blatantly wrong to my eyes, as it makes the blue way too bright. Instead I did some tests and chose the weights which looked the most correct to me, though these are not definitive and you might want to tweak them, although for this particular problem this is perhaps not too crucial.
Or in fewer words the solution is to desaturate the out-of-gamut colour just enough that it might be in-gamut.
Here is my solution in C code. All variables are in floating point format.
Wr=0.125; Wg=0.68; Wb=0.195; // these are the weights for each colour
max = MAXN(MAXN(red, grn), blu); // max is the maximum value of the 3 colours
if (max > 1.) // if the colour is out of gamut
{
L = Wr*red + Wg*grn + Wb*blu; // Luminosity of the colour's grey point
if (L < 1.) // if the grey point is no brighter than white
{
// t represents the ratio on the line between the input colour
// and its corresponding grey point. t is between 0 and 1,
// a lower t meaning closer to the grey point and a
// higher t meaning closer to the input colour
t = (1.-L) / (max-L);
// a simple linear interpolation between the
// input colour and its grey point
red = red*t + L*(1.-t);
grn = grn*t + L*(1.-t);
blu = blu*t + L*(1.-t);
}
else // if it's too bright regardless of saturation
{
red = grn = blu = 1.;
}
}
Here's what it looks like with a linear orange gradient:
It does not use anything like an arbitrary gamma, which is good; the only mostly arbitrary thing has to do with the luminosity weights, but I guess those are quite necessary.
You have to map it to some non-linear scale. For example: http://en.wikipedia.org/wiki/Gamma_correction .
Ex: Let y = f(x) = log(1+x) - log(1-x) define the "actual" luminance.
The reverse function is x = g(y) = (e^y-1)/(e^y+1).
Now, you have the values x=1 and x=0.2. For the first case the corresponding y is infinity, and six times infinity is still infinity; applying the function g gives the new value x_new = 1.
For x=0.2, y = 0.4054651. After multiplying by 6, y_new = 2.432791 . The corresponding x_new = 0.8385876.
For x=0, x_new will still be 0 (I will leave the calculations to you).
So starting from (1.0, 0.2, 0.0) your new set of points are (1.0, 0.8385876, 0.0).
This is one example of mapping function. There are infinite number of them. Choose one that looks best to you.
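A small sketch of that particular mapping (my own transcription of the math above into Python; x is clamped just below 1 so the forward mapping stays finite):
import math

def f(x):                      # channel value -> "actual" luminance
    return math.log(1 + x) - math.log(1 - x)

def g(y):                      # inverse mapping back to a channel value
    return (math.exp(y) - 1) / (math.exp(y) + 1)

def expose(x, factor, eps=1e-6):
    x = min(x, 1 - eps)        # x = 1 would map to infinity, so clamp slightly below
    return g(f(x) * factor)

print([round(expose(c, 6), 7) for c in (1.0, 0.2, 0.0)])
# roughly [1.0, 0.8385876, 0.0], matching the worked example above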
Though I have found a lot of topics on color tint and temperature, I have not yet seen any definite solution, which is the reason I am creating this post. My apologies for that.
I am interested in adjusting color temperature and tint in images from RGB values, somewhat similar to the iPhoto application on iOS, where they can be adjusted with a slider bar from left to right.
From what I have found, temperature and tint are orthogonal properties: temperature adjustment runs along the blue (left; cool colors) to yellow (right; warm colors) axis, and tint along the green (left) to magenta (right) axis.
How do I adjust them using formulas from RGB values, i.e., what is the underlying implementation of the color temperature and tint slider bars?
I can convert to HSV space and then rotate the hue channel towards those angles (blue, yellow, green, magenta), but how do I do this in a systematic fashion, similar to the slider bar implementation, changing gradually from a low level (middle of the slider bar) to a high level (left/right ends of the slider bar)?
Thanks!
You should try using HSL instead of HSV. In HSL, saturation is separate from the hue, and lightness has a very definite range when it comes to mathematical calculation.
In HSL, to add tint you move the L factor between 50 and 100, and to add shade the L factor varies between 0 and 50. Also, saturation in HSL controls the tone directly, unlike in HSV.
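Sketched in Python with the standard colorsys module (just to illustrate the tint/shade idea; colorsys calls the model HLS and works with 0-1 values rather than 0-100, and the function names here are my own):
import colorsys

def tint(r, g, b, amount):
    # push lightness toward 1.0 (white); amount in [0, 1], r/g/b in [0, 1]
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return colorsys.hls_to_rgb(h, l + (1.0 - l) * amount, s)

def shade(r, g, b, amount):
    # push lightness toward 0.0 (black); amount in [0, 1]
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return colorsys.hls_to_rgb(h, l * (1.0 - amount), s)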
For temperature, you have to devise your own strategy for shifting the color between red and blue, but one golden hint I can give you is this: every pure RGB color has one of the 3 channel values at zero, a second fixed at 255, and the 3rd varying with a factor of 255/60.
Hope this helps-
Whereas color temperature is a physical value, its expression in terms of RGB values is not trivial. If all you need is a pair of orthogonal axes in the RGB colorspace for the visual adjustment of white balance, they can be defined with relative ease in such a way as to resemble the true color temperature and its derivative, the tint.
Let us name our RGB temperature BY (for the balance between blue and yellow), and our RGB tint GR (for the balance between green and red). Now, these functions must satisfy the following obvious requirements:
They shall not depend on brightness, or be invariant to multiplication of all the RGB components by the same factor:
BY(r,g,b) = BY(kr, kg, kb),
GR(r,g,b) = GR(kr, kg, kb).
They shall be zero for neutral gray:
BY(x,x,x) = 0,
GR(x,x,x) = 0.
They shall belong to the same range, symmetrical around the zero point. I will use [-1..+1].
Any combination of BY and GR shall define a valid color.
Now, one of the ways to define them could be:
BY = (r + g - 2b)/(r + g + 2b),
GR = (r - g)/(r + g),
so that each pair of BY and GR determines a specific proportion
r : g : b = (1 + BY)(1 + GR) : (1 + BY)(1 - GR) : (1 - BY).
The following image shows the colors of maximum brightness on our BY-GR plane. BY is directed right, GR down, and the neutral point (0,0) is at the center:
Proper adjustment of white balance consists of multiplication of the linear RGB values by individual factors:
r_new = wb_r * r_old
g_new = wb_g * g_old
b_new = wb_b * b_old
It happens to work on gamma-compressed RGB too, but not so well on sRGB because of the piece-wise definition of its transfer function; the distortion will, however, be small and often unnoticeable. If you want a perfect adjustment, make sure to work in linear RGB.
Once a BY-GR pair is chosen and the corresponding RGB proportion calculated, only one degree of freedom remains—the overall multiplier (see req. 1). Choose it so that no pixels become clipped.
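A small numeric sketch of those two steps in Python (my own illustration of the formulas above, not code from the answer; image is assumed to be a linear-RGB float array with values in [0, 1], and whether you multiply by these factors or by their reciprocals depends on whether you are adding a cast or removing one):
import numpy as np

def wb_factors(by, gr, image):
    # RGB proportion defined by the chosen BY-GR pair (see the formulas above)
    r = (1 + by) * (1 + gr)
    g = (1 + by) * (1 - gr)
    b = 1 - by
    factors = np.array([r, g, b], dtype=np.float64)
    # the remaining degree of freedom: scale so that no pixel is pushed above 1.0
    per_channel_max = image.reshape(-1, 3).max(axis=0)
    scale = 1.0 / (factors * per_channel_max).max()
    return factors * scale

# usage: balanced = image * wb_factors(0.1, -0.05, image)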
I know this is possibly a duplicate question:
Ruby, Generate a random hex color
My question is slightly different. I need to know how to generate random light hex colors only, not dark ones.
In this thread colour luminance is described with the formula
(0.2126*r) + (0.7152*g) + (0.0722*b)
The same formula for luminance is given on Wikipedia (and it is taken from this publication). It reflects human perception, with green being the most "intense" and blue the least.
Therefore, you can keep selecting random r, g, b values until the luminance goes above your chosen division between light and dark (somewhere on the 0 to 255 scale). For example:
lum, ary = 0, []
while lum < 128
  ary = (1..3).collect {rand(256)}
  lum = ary[0]*0.2126 + ary[1]*0.7152 + ary[2]*0.0722
end
hex = "%02x%02x%02x" % ary # format the resulting RGB triple as a hex colour string
Another article refers to brightness, being the arithmetic mean of r, g and b. Note that brightness is even more subjective, as a given target luminance can elicit different perceptions of brightness in different contexts (in particular, the surrounding colours can affect your perception).
All in all, it depends on which colours you consider "light".
Just some pointers:
Use HSL and generate the individual values randomly, but keeping L in the interval of your choosing. Then convert to RGB, if needed.
It's a bit harder than generating RGB with all components over a certain value (say 0x7f), but this is the way to go if you want the colors distributed evenly.
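For instance, in Python that could look like the following (the lightness interval 0.7-0.9 and the saturation range are arbitrary choices of what counts as "light"):
import colorsys
import random

def random_light_hex():
    h = random.random()              # any hue
    l = random.uniform(0.7, 0.9)     # keep lightness high so the colour stays light
    s = random.uniform(0.3, 1.0)     # some saturation so it is not just gray
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return "%02x%02x%02x" % (int(r * 255), int(g * 255), int(b * 255))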
I found that 128 to 256 gives the lighter colors:
Dim rand As New Random
Dim col As Color
col = Color.FromArgb(rand.Next(128, 256), rand.Next(128, 256), rand.Next(128, 256))
All colors where each of r, g, b is greater than 0x7f:
color = (0..2).map{"%0x" % (rand * 0x80 + 0x80)}.join
I modified one of the answers from the linked question (Daniel Spiewak's answer) to come up with something that is pretty flexible in terms of excluding darker colors:
floor = 0x22 # meaning the darkest possible color is #222222
r = (rand(256-floor) + floor).to_s 16
g = (rand(256-floor) + floor).to_s 16
b = (rand(256-floor) + floor).to_s 16
[r,g,b].map {|h| h.rjust 2, '0'}.join
You can change the floor value to suit your needs. A higher value will limit the output to lighter colors, and a lower value will allow darker colors.
A really nice solution is provided by the color-generator gem, where you can call:
ColorGenerator.new(saturation: 0.75, lightness: 0.5).create_hex
I want to draw some items on screen, each item is in one of N sets. The number of sets changes all the time, so I need to calculate N different colours which are as different as possible (to make it easy to identify what is in which set).
So, for example with N = 2 my results would be black and white. With three I guess I would get red, green and blue. With four, it's less obvious what the correct answer is, and this is where I'm having trouble.
EDIT: The obvious approach is to map 0 to red, 1 to green, and all the values in between to the appropriate rainbow colours; then you can get a colour for set N by doing GetRainbowColour(N / TotalSets), so a GetRainbowColour method is all that's needed to solve this problem.
You can read up on the HSL and HSV color models in this Wikipedia article. The "H" in the acronyms stands for Hue, and that is the rainbow you want. It sounds like you want saturation to max out. The article also explains how to convert to RGB color.
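A possible GetRainbowColour along those lines, sketched in Python with the standard colorsys module (the function name and the 0-255 output range are just my assumptions here):
import colorsys

def get_rainbow_colour(n, total_sets):
    hue = n / float(total_sets)                    # spread hues evenly around the wheel
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)   # full saturation and value
    return int(r * 255), int(g * 255), int(b * 255)

# e.g. [get_rainbow_colour(i, 4) for i in range(4)]
# gives roughly red, chartreuse, cyan and violet for four sets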
Looks like a similar question has been asked before here.
The answer to this question is subjective: what gives the best contrast to someone with full vision is not necessarily the best contrast for someone who is colour blind, has limited vision, or has normal eyesight but is operating the equipment in a dark environment.
Physiologically, humans have much better resolution for intensity than for hue or saturation. That is why analogue TV, digital video and photo compression throw away colour information to reduce bandwidth (4:2:2 chroma subsampling): if you put two pixels of different intensities next to each other, it doesn't matter what colour they are; we can only sense colour differences over large areas of similar intensity.
So if the thing you are trying to display has lots of small areas of only a few pixels, or you want it to be used in the dark (in the dark the brain boosts the blue cells and we don't see colour as well) or by the 10% of the male population who are colour blind, consider using intensity as the main differentiating factor rather than hue. GetRainbowColour would ignore the most important dimension of the human visual sense, but can work quite well for continuous data.
Thanks, brainjam, for the suggestion to use HSL. I whipped up this little function that seems to work nicely for stacked graphs:
var contrastingColors = function(len) {
    var result = [];
    if (len > 0) {
        var h = [0, 180]; // red, cyan
        var shft = 360 / len;
        var flip = 0;
        var l = 50;
        for (var ix = 0; ix < len; ix++) {
            result.push("hsl(" + h[flip] + ",100%," + l + "%)");
            h[flip] = (h[flip] + shft) % 360;
            flip = flip ? 0 : 1;
            if (flip == 0) {
                l = (l == 50) ? 30 : (l == 30 ? 70 : 50);
            }
        }
    }
    return result;
};