Invisible, interactable objects in AS3 -- how to code efficient invisibility? - performance

Alpha invisibility.
I currently define circular regions on some images as "hot spots". For instance, I could have my photo on screen and overlay a circle on my head. To check for interaction with my head in real time, I would call returnOverlaps and do some manipulation on all objects overlapping the circle. For debugging, I make the circle yellow with alpha 0.5, and for release I decrease alpha to 0, making the circle invisible (as it should be).
Does this slow down the program? Is there another way to make the circle itself invisible while still remaining capable of interaction? Is there some way to color it "invisible" without using a (potentially) costly alpha of 0? Cache as bitmap matrix? Or some other efficient way to solve the "hot spot" detection without using masks?

Having just a few invisible display objects should not slow it down much, but having many could. I think a cleaner option may be to handle it all in code, rather than have actual invisible display objects on the stage.
For a circle, you would define the center point and radius. Then, to check whether a click landed inside it, you could do:
var xDist:Number = circle.x - mousePoint.x;
var yDist:Number = circle.y - mousePoint.y;
if ((xDist * xDist) + (yDist * yDist) <= (circle.radius * circle.radius)) {
    // mousePoint is within circle
} else {
    // mousePoint is outside of circle
}
If you insist on using display objects to set these circular hit areas (sometimes it can be easier to do it visually than by numbers), you could also write some code to read those display objects (and remove them from being rendered) to get their positions and radius sizes.
Added method:
// inputX and inputY are the hotspot's x and y positions, and inputRadius is the radius of the hotspot
function hitTestObj(inputA:DisplayObject, inputX:int, inputY:int, inputRadius:int):Boolean {
    var xDist:Number = inputX - inputA.x;
    var yDist:Number = inputY - inputA.y;
    var minDist:Number = inputRadius + (inputA.width / 2);
    return (((xDist * xDist) + (yDist * yDist)) <= (minDist * minDist));
}
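For the read-and-remove idea above, a minimal AS3 sketch (hotspotLayer and the hotspots array are hypothetical names, not from the code above):
var hotspots:Array = [];
while (hotspotLayer.numChildren > 0) {
    var spot:DisplayObject = hotspotLayer.getChildAt(0);
    // keep only the data the math-based hit test needs
    hotspots.push({x: spot.x, y: spot.y, radius: spot.width / 2});
    hotspotLayer.removeChild(spot); // stop it from being rendered
}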

An alpha of 0 isn't all that costly in terms of rendering, as Flash Player will optimize for that (check here for actual figures). Bitmap caching wouldn't be of any help since the sprite is invisible. There are other ways to perform collision detection by doing the math yourself (more relevant in games with tens or even hundreds of sprites), but that would be overkill in your case.

Related

My mesh flips for a rotation smaller than Math.PI

I am coming back since I am facing a geometric problem that I never had in Unity.
For an F-Zero-style game, I have a collider box (white in the screen captures), which is the origin of my raycast and is bound to the movement of the vehicle.
In the code shown, this is this.collider. I control its rotation via a traditional applyMatrix and there is no problem.
Then, on top of that, I have the rendered body of the vehicle in this.meshes. It inherits the rotation of the collider box, but gets some extra rotation on its vertical axis to give a visual sliding dynamic during the hard turns.
It is separate from the collider to keep the vector.forward of the movement (and the raycast) unaffected by the extra rotation. This is purely visual.
My question is: what is the best way to implement it?
I tried different things, but basically, if I copy the position and rotation of the collider, there is no problem. As soon as I try to add some extra rotation (this.driftRotation), my body flips when the rotation.y value goes below -Math.PI. I can adjust the value of the rotation by incrementing Math.PI (like in Unity), but it doesn't work here.
I found no clean solution with applyMatrix either, and not a lot of Google answers on "vertical rotation flip mesh"... though I'm pretty sure this issue is common.
Some code:
this.meshes.position.set(
    this.collider.position.x,
    this.collider.position.y,
    this.collider.position.z);
this.meshes.rotation.x = this.collider.rotation.x;
this.meshes.rotation.y = this.collider.rotation.y + this.driftRotation;
this.meshes.rotation.z = this.collider.rotation.z;
Enclosed more explicit pictures:
Thank you
Marquizzo, that's precisely the point: the third picture follows the second one, so I'm still turning right, but the rotation suddenly flips (again, when rotation.y reaches -PI).
Anyway, I fixed it by not trying to change the rotation.y value directly, but by playing with the matrix instead. It just takes time to understand what does what.
For those who may face a similar problem, here is my temporary solution, until I find something more performant:
this.meshes.matrix.identity();
if (Math.abs(driftAmount) > 0)
{
    this.driftAxis.copy(this.driftDirection);
    this.driftValue = js.Utils.lerp(this.driftValue, Math.sign(driftAmount) * 0.4, 0.05);
    this.meshes.matrix.makeRotationAxis(this.driftAxis, this.driftValue);
}
else if (Math.abs(this.driftValue) > 0)
{
    this.driftAxis.copy(this.driftDirection);
    this.driftValue = js.Utils.lerp(this.driftValue, 0, 0.1);
    if (Math.abs(this.driftValue) < 0.001)
    {
        this.driftValue = 0;
    }
    this.meshes.matrix.makeRotationAxis(this.driftAxis, this.driftValue);
}
this.meshes.applyMatrix(this.collider.matrix);
I had to add a driftAxis along a driftDirection, which is the axis for my vertical rotation.
For reference, I think this subject is more or less related to the issue I had:
https://github.com/mrdoob/three.js/issues/1460
Now I have another issue: how to add another rotation to this.meshes on another axis, the forward one, for a rolling effect. If I just add another makeRotationAxis in this code, it overwrites the first one. But that sounds less difficult to figure out; there must be an equivalent of combineMatrix or something...
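In case it helps, a minimal sketch of combining two axis rotations by matrix multiplication; rollAxis and rollValue are hypothetical names for the forward-axis roll, not from the code above:
var driftMatrix = new THREE.Matrix4().makeRotationAxis(this.driftAxis, this.driftValue);
var rollMatrix = new THREE.Matrix4().makeRotationAxis(rollAxis, rollValue);
// multiply() composes the two rotations instead of replacing the matrix,
// which is what a plain second makeRotationAxis() call would do
this.meshes.matrix.identity();
this.meshes.matrix.multiply(driftMatrix).multiply(rollMatrix);
this.meshes.applyMatrix(this.collider.matrix);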

Outlining text in Processing

My goal is to obtain an outline of text that is 1 pixel wide.
It could look something like this: https://jsfiddle.net/Lk1ju9yw/
I can't think of a good way to go about this, so I did the following (in pseudocode):
PImage img;
void setup() {
    size(400, 400);
    // use text() to write on the canvas
    // initialize PImage img
    // load pixels for canvas and img
    // loop thru canvas pixels and look for contrast
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            // compare canvas pixels at x-y with their neighbors
            // change the respective pixel on PImage img so as not to disturb the canvas
        }
    }
    // update pixels and draw img over the canvas
    img.updatePixels();
    image(img, 0, 0);
}
In a nutshell, I wrote white text on a black background on the canvas, did some edge detection and drew the results on a PImage, then used the PImage to store the results. I guess I could have skipped the PImage phase but I wanted to see what the edge detection algorithm produced.
So this does a decent job of getting the outline but there are some problems:
The outline is sometimes more than 1 pixel wide. This is a problem. Suppose I want to store the outline (i.e. all the positions of the white pixels) in an ArrayList.
For example, if I use the ArrayList to draw an ellipse at EVERY point along the outline, the result is OK. But if I want the ellipses spaced apart, the ellipse outline becomes kind of rough. In the fiddle I provided, the left edge of the letter 'h' is 2 pixels wide. Sometimes the ellipse will be drawn at the inner pixel, sometimes at the outer. That kind of thing makes it look ugly.
Elements might be neighbors in the ArrayList but not on the PImage. If I want to draw a circle for every 10th ArrayList location, the results won't necessarily be evenly spaced on the PImage.
Here is an example of how ugly it can be: https://jsfiddle.net/Lk1ju9yw/1/
I am quite sure I understand why this is happening. I just don't know how to avoid it.
I also believe there is a solution (a PFont method) in p5.js. I am comfortable using p5, but unless I have to (let's say, because of difficulty), I would rather use Processing. I've also heard of some libraries in Processing that can help with this. Partly I am interested in the result, but I am also interested in learning whether I can program a solution myself (with some guidance, that is).
You can get an outline of text very easily in P5.js, because text honors the fill and stroke colors. So if you call noFill() the text will not be filled in, and if you call stroke(0) the text will have a black outline.
function setup() {
  createCanvas(400, 200);
  noSmooth();
}

function draw() {
  background(220);
  textSize(72);
  textAlign(CENTER);
  noFill();
  stroke(0);
  text("hey", width/2, height/2);
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.5.16/p5.js"></script>
Unfortunately this approach won't work in regular Processing, because Processing just uses the fill color for text and ignores the stroke. I'm not totally sure about Processing.js, but my guess is it's the same as Processing.
If you draw this to a buffer (using createGraphics()), then you can iterate over the buffer to get a list of points that make up your outline.
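A minimal p5.js sketch of that buffer idea (the canvas size, text, and threshold here are placeholder choices):
function setup() {
  createCanvas(400, 200);
  let pg = createGraphics(400, 200);
  pg.pixelDensity(1); // keep one RGBA entry per buffer pixel
  pg.background(255);
  pg.textSize(72);
  pg.noFill();
  pg.stroke(0);
  pg.text("hey", 100, 100);
  pg.loadPixels();
  let outlinePoints = [];
  for (let y = 0; y < pg.height; y++) {
    for (let x = 0; x < pg.width; x++) {
      // pixels is RGBA (4 entries per pixel); a dark red channel means outline
      if (pg.pixels[4 * (y * pg.width + x)] < 128) {
        outlinePoints.push({x: x, y: y});
      }
    }
  }
  image(pg, 0, 0); // draw the buffer so you can see what was collected
}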
Now, as for putting the points in the correct order, you're going to have to do that yourself. The first approach that occurs to me is to sort them and group them by letter.
For example, your algorithm might be:
Find the upper-left-most point. Add it to your list.
Does that point you just added have any neighbors? If so, pick one and add it to your list. Repeat this step until the point has no neighbors.
Are there any points left? If so, find the point closest to the one you just added, and add it to your list. Go to step 2.
This might not be perfect, but if you want something more advanced you might have to start thinking about processing the list of points: maybe removing points that have a left neighbor, for example. You're going to have to play around to find the effect you're looking for.
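A rough sketch of those steps as a greedy nearest-neighbor ordering (assuming an outlinePoints-style array of {x, y} objects as above):
function orderPoints(points) {
  let remaining = points.slice();
  // Step 1: start from the upper-left-most point.
  remaining.sort((a, b) => (a.y - b.y) || (a.x - b.x));
  let ordered = [remaining.shift()];
  // Steps 2-3: repeatedly take the closest remaining point
  // (an adjacent pixel when one exists, otherwise the jump to the next letter).
  while (remaining.length > 0) {
    let last = ordered[ordered.length - 1];
    let best = 0, bestD = Infinity;
    for (let i = 0; i < remaining.length; i++) {
      let dx = remaining[i].x - last.x;
      let dy = remaining[i].y - last.y;
      let d = dx * dx + dy * dy;
      if (d < bestD) { bestD = d; best = i; }
    }
    ordered.push(remaining.splice(best, 1)[0]);
  }
  return ordered;
}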
This was an interesting question, thanks for that. Good luck; it sounds like a fun project.

Do I need to move the tiles or the player in a 2d tile world?

I'm currently creating a 2D tile game, and I'm wondering whether the tiles have to move or the character.
I ask because I have already created the 2D tile map, but it runs too slowly and I can't fix it. I have tried everything, and the result is that I get 30 fps.
The reason it runs too slowly is that the tiles are redrawn every 1 ms by a timer, but I can't figure out how to fix this problem.
This is how I make the map :
public void makeBoard()
{
    for (int i = 0; i < tileArray.GetLength(0); i++)
    {
        for (int j = 0; j < tileArray.GetLength(1); j++)
        {
            tileArray[i, j] = new Tile() { xPos = j * 50, yPos = i * 50 };
        }
    }
}
Here I redraw the tiles and sprites every 1 ms (or more):
private void Wereld_Paint_1(object sender, PaintEventArgs e)
{
    //label1.Text = k++.ToString();
    using (Graphics grap = Graphics.FromImage(bmp))
    {
        for (int i = 0; i < tileArray.GetLength(0); i++)
        {
            for (int j = 0; j < tileArray.GetLength(1); j++)
            {
                grap.DrawImage(tileArray[i, j].tileImage, j * 50, i * 50, 50, 50);
            }
        }
        grap.DrawImage(player.movingObjectImage, player.xPos, player.yPos, 50, 50);
        grap.DrawImage(enemyGoblin.movingObjectImage, enemyGoblin.xPos, enemyGoblin.yPos, 50, 50);
        groundPictureBox.Image = bmp;
        // grap.Dispose();
    }
}
This is the timer with a specific interval:
private void UpdateTimer_Tick(object sender, EventArgs e)
{
    if (player.Update() == true) // true: a key-down event was fired
    {
        this.Invalidate();
    }
    label1.Text = lastFrameRate.ToString(); // show the fps rate
    CalculateFrameRate(); // for the fps display
}
Are you writing the tile implementation yourself? Probably the issue is that at every frame you're drawing all tiles.
2D engines with scrolling tiles should draw the tiles onto a sprite larger than the screen, then draw that sprite around, which is a fast operation (you'd need to specify the language you're using for a hint on how to actually make that fast; basically an in-video-memory accelerated blit, but every language has its way to make it happen).
When the border of this super-sprite gets closer to the screen border than a threshold (usually half a tile), the larger sprite is redrawn around the current position. But there is no need to draw all the tiles for this: start by copying the old super-sprite onto the recentered one, and you only need to draw the tiles missing from the previous super-sprite because of the offset.
As mentioned in the comments your concept is wrong. So here's just a simple summary of how to do this task:
Tile map is static
From a functional point of view it does not matter whether the player moves or the map, but from a performance point of view the number of tiles is hugely bigger than the number of players, so moving the player is faster.
To achieve player-centered or follow views, you have to move the camera too.
rendering
Repainting every 1 ms is insane and most likely impossible on today's computers if you have a scene of medium complexity. Human vision can't detect it anyway, so there is no point in repainting at more than 25-40 fps. The only reason for a higher fps is to synchronize with your monitor's refresh to avoid scan-line artifacts (even LCDs use scan-line refreshing). Having more fps than the refresh rate of your monitor is pointless (many FPS players would disagree, but our perception is what it is, no matter what they say).
Anyway, if your rendering takes more than 1 ms (which is more than likely), then your timer is screwed, because it fires again several times before the first handler even finishes. That usually causes massive slowdowns due to synchronization problems, so the resulting fps is usually even smaller than the rendering engine could provide. So how to remedy that?
Set the timer interval to 20 ms or more.
Add bool _redraw=false.
Use it to redraw only when you need to repaint the screen: on any action like player movement, camera movement or turn, or an animation change, set it to true.
Inside the timer event handler, call your repaint only if _redraw==true, and set it to false afterwards (see the sketch below).
This will boost performance a lot. Even if your repaint takes more than the timer interval, this will still be much, much faster than your current approach.
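A minimal C# sketch of that flag, reusing the names from the question (the exact wiring is an assumption, not your code):
private bool _redraw = false; // set to true on player/camera/animation changes

private void UpdateTimer_Tick(object sender, EventArgs e) // Interval = 20 ms
{
    if (player.Update()) // the player moved, so the scene changed
    {
        _redraw = true;
    }
    if (_redraw)
    {
        _redraw = false;
        this.Invalidate(); // triggers the Paint handler exactly once
    }
}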
To avoid flickering, use back buffering.
camera and clipping
Your map is most likely much bigger than the screen, so there is no point in repainting all the tiles. You can look at the camera as a means of selecting the right part of your map. If your game does not use rotations, you need just a position and maybe a zoom/scale. If you want rotations, then 2D 3x3 homogeneous matrices are the way to go.
Let's assume you have only a position (no zoom or rotation); then you can use these transformations:
screen_x=world_x-camera_x
screen_y=world_y-camera_y
world_x=screen_x+camera_x
world_y=screen_y+camera_y
Here camera is your camera view position, world is your tile position in the map grid, and screen is the position on screen. If you have the indexes of your tile in the map, just multiply them by the tile size in pixels to obtain the world coordinates.
To select only the visible tiles, obtain the corner positions of your screen, convert them into world coordinates, then into indexes in the map, and finally render only the tiles inside the rectangle these points form in your map, plus some margin of error (for example, render a rectangle enlarged by 1 tile in all directions). This way the rendering is independent of your map size. This process is called clipping.
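A minimal C# sketch of that clipping, using the transforms above; cameraX, cameraY, screenWidth and screenHeight are assumed names, while tileArray and grap come from the question:
int tileSize = 50;
// screen corners -> world -> map indexes, with a 1-tile margin
int firstCol = Math.Max(0, cameraX / tileSize - 1);
int firstRow = Math.Max(0, cameraY / tileSize - 1);
int lastCol = Math.Min(tileArray.GetLength(1) - 1, (cameraX + screenWidth) / tileSize + 1);
int lastRow = Math.Min(tileArray.GetLength(0) - 1, (cameraY + screenHeight) / tileSize + 1);

for (int i = firstRow; i <= lastRow; i++)
{
    for (int j = firstCol; j <= lastCol; j++)
    {
        // world -> screen: subtract the camera position
        grap.DrawImage(tileArray[i, j].tileImage,
            j * tileSize - cameraX, i * tileSize - cameraY, tileSize, tileSize);
    }
}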
I strongly recommend to look at these related QAs:
Improving performance of click detection on a staggered column isometric grid
2D Diamond (isometric) map editor ... read the comments there!!!
The demos in the linked QAs use only GDI and direct pixel access to bitmaps in a Win32 form app, so you can compare performance with your code (they should be similar) and tweak your code until it behaves as it should.

How can I implement slow, smooth background scrolling in SDL?

I am trying to implement background scrolling using SDL 2.
As far as I understand, one can only move the source rectangle by an integer value.
My scrolling works fine when I move it by one every iteration of the game loop.
But I want to move it slower. I tried to move it using this code
moved += speed;
if (moved >= 1.0) {
    ++src_rect.x;
    moved -= 1;
}
Here moved and speed are doubles. I want my background to move something like ten times slower, so I set speed to 0.1. It does move ten times slower, but the animation is no longer smooth. It kind of jumps from one pixel to another, which looks and feels ugly when the speed is low.
I am thinking of making my background larger and scrolling it using an integer. Maybe when the background is large enough, a speed of 1 will seem slower.
Is there a way to scroll a not-very-large background slowly and smoothly at the same time?
Thanks.
What I would do is have a set of floats that track the virtual screen position, and then just cast the floats to integers when you actually render; that way you never lose the precision of the floats.
To give you an example: I have an SDL_Rect that I want to move every frame. I keep two floating-point variables that track the x and y position of the rect; every frame I update those positions, cast them to integers, and then render the rect, e.g.:
// Rect position
float XPos = 0.0f;
float YPos = 0.0f;
SDL_Rect rect = {0, 0, 64, 64};

// Update virtual positions
XPos += 20.0f * DeltaTime;
YPos += 20.0f * DeltaTime;

// Move rect down and to the right
rect.x = (int)XPos;
rect.y = (int)YPos;
While this doesn't give you the exact result you want, it is the only way I know of to do this. It lets you delay your movement more precisely without that ugly chunkiness, and it also lets you add things like more precise acceleration. Hope this helps.
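Applied to the scrolling from the question, a minimal sketch (renderer, background and src_rect are assumed to exist as in your code):
double scrollX = 0.0; // virtual scroll position, kept as a double

// each iteration of the game loop:
scrollX += speed; // e.g. speed = 0.1
src_rect.x = (int)scrollX; // cast only when rendering
SDL_RenderCopy(renderer, background, &src_rect, NULL);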

What algorithms or approaches apart from Haar cascades could be used for custom objects detection?

I need to do computer vision tasks in order to detect water bottles or soda cans. I will obtain 'frontal' images of bottles, soda cans or any other random objects (one by one), and my algorithm should determine whether it's a bottle, a can or none of them.
Some details about object detecting scenario:
As mentioned, I will test one single object per image/video frame.
Not all water bottles are the same. There could be variation in plastic color, lid or label. Some might have no label or lid.
The same variation goes for soda cans, though no crushed soda cans will be tested.
There could be small size variation between objects.
I could have a green (or any custom color) background.
I will do any needed filters on image.
This will be run on a Raspberry Pi.
Just in case, an example of each:
I've tested OpenCV's face detection algorithms a couple of times and I know they work pretty well, but I'd need to obtain a special Haar cascade features XML file for detecting each custom object with this approach.
So, the distinct alternatives I have in mind are:
Creating a custom Haar Classifier.
Considering shapes.
Considering outlines.
I'd like to get a simple algorithm, and I think creating a custom Haar classifier may not even be needed. What would you suggest?
Update
I strongly considered the shape/aspect-ratio approach.
However, I guess I'm facing some issues, as bottles come in distinct sizes or even shapes. But this made me think about and set the following considerations:
I'm applying a threshold with the THRESH_BINARY method (thanks to the answers).
I will use a white background on detection.
Soda cans are all same size.
So, a bounding box for soda cans with high accuracy might distinguish a can.
What I've achieved:
Threshold really helped me; I could notice that in white-background tests this is what I would obtain for cans:
And this is what was obtained for bottles:
So the dominance of darker areas is noticeable. There are some cases with cans where this might turn into false negatives. And for bottles, light and angle may lead to inconsistent results, but I really think this could be a shorter approach.
So, I'm quite confused now about how I should evaluate that darkness dominance. I've read that findContours leads to it, but I'm quite lost on how to use that function. For example, in the case of soda cans, it may find several contours, so I get lost on what to evaluate.
Note: I'm open to test any other algorithms or libraries distinct to Open CV.
I see few basic ideas here:
Check the object's (to be precise, its bounding rect's) width/height ratio. For a can it's approximately 2-2.5; for a bottle I think it will be >3. It's a very simple idea, so it should be easy to test quickly, and I think it should have quite good accuracy. For borderline values, like 2.75 (assuming the values I gave are correct, which most likely isn't true), you can fall back to some different algorithm.
Check whether your object contains glass/transparent regions. If yes, then it's definitely a bottle. Here you can read more about it.
Use the GrabCut algorithm to get the object mask/a more precise shape, and check whether the shape's width at the top is similar to the width at the bottom. If yes, it's a can; if no, it's a bottle (bottles have a screw cap at the top).
Since you want to recognize can vs. bottle rather than Pepsi vs. Coke, shape matching is probably the way to go compared to Haar and the features2d matchers like SIFT/SURF/ORB.
A unique background color will make things easier.
First create a histogram from an image of just the background
int channels[] = {0,1,2}; // use all the channels
int rgb_bins = 32; // quantize to 32 colors per channel
int histSize[] = {rgb_bins, rgb_bins, rgb_bins};
float _range[] = {0,255};
float* ranges[] = {_range, _range, _range};
cv::SparseMat bghist;
cv::calcHist(&bg_image, 1, channels, cv::noArray(),bghist, 3, histSize, ranges );
Then use calcBackProject to create a mask of bg and not bg
cv::MatND temp_ND;
cv::calcBackProject( &bottle_image, 1, channels, bghist, temp_ND, ranges );
cv::Mat bottle_mask, bottle_backproj;
if( feeling_lazy ){
    cv::normalize(temp_ND, bottle_backproj, 0, 255, cv::NORM_MINMAX, CV_8U);
    //a small blur here could work nicely
    threshold( bottle_backproj, bottle_mask, 0, 255, THRESH_OTSU );
    bottle_mask = cv::Scalar(255) - bottle_mask; //invert the mask
} else {
    //finding just the right value here might be better than the above method
    int magic_threshold = 64;
    temp_ND.convertTo( bottle_backproj, CV_8U, 255.);
    //I expect temp_ND to be CV_32F ranging from 0-1, but I might be wrong.
    threshold( bottle_backproj, bottle_mask, magic_threshold, 255, THRESH_BINARY_INV );
}
Then either:
Compare bottle_mask or bottle_backproj to a few sample bottle masks/backprojections using matchTemplate with a threshold on confidence to decide if it's a match.
matchTemplate(bottle_mask, bottle_template, result, CV_TM_CCORR_NORMED);
double confidence; minMaxLoc( result, NULL, &confidence);
Or use matchShapes, though I've never gotten this to work properly.
double confidence = matchShapes(bottle_mask, bottle_template, CV_CONTOURS_MATCH_I3);
Or use linemod which is difficult to set up but works great for images like this where the shape isn't very complex. Aside from the linked file, I haven't found any working samples of this method so here's what I did.
First create/train the detector with some sample images
//some magic numbers
std::vector<int> T_at_level;
T_at_level.push_back(4);
T_at_level.push_back(8);

//add some padding so linemod doesn't scream at you
const int T = 32;
int width = bottle_mask.cols;
if( width % T != 0)
    width += T - width % T;
int height = bottle_mask.rows;
if( height % T != 0)
    height += T - height % T;

//in this case template_backproj is created specifically from a sample bottle_backproj
cv::Rect padded_roi( (width - template_backproj.cols)/2, (height - template_backproj.rows)/2, template_backproj.cols, template_backproj.rows);
cv::Mat padded_backproj = cv::Mat::zeros( height, width, template_backproj.type()); //zeros takes (rows, cols)
template_backproj.copyTo( padded_backproj( padded_roi ) ); //copyTo actually writes into the ROI
cv::Mat padded_mask = cv::Mat::zeros( height, width, template_mask.type());
template_mask.copyTo( padded_mask( padded_roi ) );
//you might need to erode padded_mask by a few pixels.

//initialize detector
std::vector< cv::Ptr<cv::linemod::Modality> > modalities;
modalities.push_back( cv::makePtr<cv::linemod::ColorGradient>() ); //for those that don't have a kinect
cv::Ptr<cv::linemod::Detector> new_detector = cv::makePtr<cv::linemod::Detector>(modalities, T_at_level);

//add sample images to the detector
std::vector<cv::Mat> template_images;
template_images.push_back( padded_backproj );
cv::Rect ignore_me;
const std::string class_id = "bottle";
int template_id = new_detector->addTemplate(template_images, class_id, padded_mask, &ignore_me);
Then do some matching
std::vector<cv::Mat> sources_vec;
sources_vec.push_back( padded_backproj );
//padded_backproj doesn't need to be the same size as the trained template images, but it does need to be padded the same way.
float matching_threshold = 0.8; //a higher number makes the algorithm faster
std::vector<cv::linemod::Match> matches;
std::vector<cv::String> class_ids;
new_detector->match(sources_vec, matching_threshold, matches,class_ids);
float confidence = matches.size() > 0? matches[0].similarity : 0;
As cyriel suggests, the aspect ratio (width/height) might be one useful measure. Here is some OpenCV Python code that finds contours (hopefully including the outline of the bottle or can) and gives you aspect ratio and some other measurements:
# src image should have already had some contrast enhancement (such as
# cv2.threshold) and edge finding (such as cv2.Canny)
contours, hierarchy = cv2.findContours(src, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    num_points = len(contour)
    if num_points < 5:
        # The contour has too few points to fit an ellipse. Skip it.
        continue

    # We could use area to help determine the type of object.
    # Small contours are probably false detections (not really a whole object).
    area = cv2.contourArea(contour)

    bounding_ellipse = cv2.fitEllipse(contour)
    center, radii, angle_degrees = bounding_ellipse

    # Let's define an ellipse's normal orientation to be landscape (width > height).
    # We must ensure that the ellipse's measurements match this orientation.
    if radii[0] < radii[1]:
        radii = (radii[1], radii[0])
        angle_degrees -= 90.0

    # We could use the angle to help determine the type of object.
    # A bottle or can's angle is probably approximately a multiple of 90 degrees,
    # assuming that it is at rest and not falling.

    # Calculate the aspect ratio (width / height).
    # For example, 0.5 means the object's height is 2 times its width.
    # A bottle is probably taller than a can.
    aspect_ratio = radii[0] / radii[1]
For checking transparency, you can compare the picture to a known background using histogram analysis or background subtraction.
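For that background-comparison idea, a minimal Python/OpenCV sketch (bg_image and obj_image are hypothetical BGR images of the empty background and of the scene with the object):
import cv2

# 3D color histograms, 32 bins per channel
bg_hist = cv2.calcHist([bg_image], [0, 1, 2], None, [32, 32, 32], [0, 256] * 3)
obj_hist = cv2.calcHist([obj_image], [0, 1, 2], None, [32, 32, 32], [0, 256] * 3)
cv2.normalize(bg_hist, bg_hist)
cv2.normalize(obj_hist, obj_hist)

# High correlation suggests the background shows through (transparency)
similarity = cv2.compareHist(bg_hist, obj_hist, cv2.HISTCMP_CORREL)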
The contour's moments can be used to determine its centroid (center of gravity):
moments = cv2.moments(contour)
m00 = moments['m00']
m01 = moments['m01']
m10 = moments['m10']
centroid = (m10 / m00, m01 / m00)
You could compare this to the center. If the object is bigger ("heavier") on one end, the centroid will be closer to that end than the center is.
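A small sketch of that comparison, continuing the contour loop above (the names are illustrative):
# Compare the centroid to the bounding box center along the vertical axis
x, y, w, h = cv2.boundingRect(contour)
box_center_y = y + h / 2.0
# For an upright object, a centroid below the box center suggests it is
# "heavier" toward the bottom, as with a tapered bottle
heavier_at_bottom = centroid[1] > box_center_y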
So, my main approach for detection was:
Bottles are transparent and cans are opaque
Generally, the algorithm consisted of:
Take a grayscale picture.
Apply a binary threshold.
Select a convenient ROI from it.
Obtain its color mean and even the standard deviation.
Distinguish.
Implementation was basically reduced to this function (where CAN and BOTTLE were previously defined):
int detector(int x, int y, int width, int height, int thresholdValue, CvCapture* capture) {
    Mat img;
    Rect r;
    vector<Mat> channels;
    r = Rect(x, y, width, height);

    if ( !capture ) {
        fprintf( stderr, "ERROR: capture is NULL \n" );
        getchar();
        return -1;
    }

    img = Mat(cvQueryFrame( capture ));
    cvtColor(img, img, CV_RGB2GRAY);
    threshold(img, img, 127, 255, THRESH_BINARY);

    // ROI
    Mat roiImage = img(r);
    split(roiImage, channels);
    Scalar m = mean(channels[0]);
    float media = m[0];
    printf("Media: %f\n", media);

    if (media < thresholdValue) {
        return CAN;
    }
    else {
        return BOTTLE;
    }
}
As can be seen, a THRESH_BINARY threshold was applied against a plain white background. However, the main and critical issue I faced with this whole approach and algorithm was luminosity changes in the environment, even minor ones.
Sometimes I could notice that THRESH_BINARY_INV might help more, but I wonder whether I could use certain threshold parameters, or whether applying other filters may get rid of environment lighting as an issue.
I really appreciate the aspect-ratio calculation approach, from a bounding box or from finding contours, but I found this one straightforward and simple once conditions were adjusted.
I'd use deep learning, based on transfer learning.
The idea is this: given a highly complex, well-trained neural network that was trained on a similar classification task (typically over a large public dataset, like ImageNet), you can freeze the majority of its weights and only train the last layers. There are lots of tutorials out there. You don't need a background in deep learning.
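As a rough illustration of that freeze-and-retrain idea (Python/Keras; MobileNetV2, the input size and the three classes are arbitrary choices for this sketch, not taken from the tutorials):
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models

# Pretrained base (ImageNet weights), used as a frozen feature extractor
base = MobileNetV2(weights="imagenet", include_top=False, pooling="avg",
                   input_shape=(224, 224, 3))
base.trainable = False

# Small trainable head: bottle / can / other
model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])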
There is a tutorial which is almost out of the box with TensorFlow here, and here there is another based on Keras.
