dlib object detector for dog face detection - need advice on improving recall

I'm trying to train a dog face detector with dlib's HOG pyramid detector.
I used Columbia dogs dataset: ftp://ftp.umiacs.umd.edu/pub/kanazawa/CU_Dogs.zip
At first I would get a recall of 0%, but by increasing the C value I managed to raise it to 65% on the training set and 45% on the testing set. After a certain point (C of 1000+), increasing C stopped helping and only slowed down training.
Could you give any advice on how I could improve recall to a decent level?
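For reference, here is a minimal training sketch using dlib's Python API; the XML file names and the parameter values are placeholders, so treat it as a starting point rather than the exact setup used above.

import dlib

# dog_faces_train.xml / dog_faces_test.xml: imglab-style files listing images and face boxes
options = dlib.simple_object_detector_training_options()
options.add_left_right_image_flips = True    # doubles the effective positives
options.C = 5                                # soft-margin parameter; tune on a held-out split
options.detection_window_size = 80 * 80      # smaller windows help with small faces
options.num_threads = 4
options.be_verbose = True

dlib.train_simple_object_detector("dog_faces_train.xml", "dog_face_detector.svm", options)

# Prints precision, recall and average precision for each set.
print(dlib.test_simple_object_detector("dog_faces_train.xml", "dog_face_detector.svm"))
print(dlib.test_simple_object_detector("dog_faces_test.xml", "dog_face_detector.svm"))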

Related

Kalman Filter implementation to estimate position with IMU under high impacts and acceleration

I am trying to implement a Kalman filter to estimate the position of my arm moving in the sagittal plane (2D). To do this, I have an IMU and, as is usually done, I use the gyro as the input to my state model and the accelerometer as my observation.
Regarding the bias, I used 0.001 for the variances in the covariance matrix of my state estimation equation and 0.03 for the variance of the accelerometer (measurement).
This filter works really well if I move my arm slowly from 0 to 90º. But if I perform sudden movements, the accelerometer pulls my estimate downward and it is not very precise (I'm off by about 15º); once I move slowly again it works well. In short, the response under high acceleration/sudden movements is not good.
For this reason, I've thought of having a variance switch that tracks the variance of the last 10-20 accelerometer angle measurements; if that variance is above a certain level, I would increase the accelerometer variance in the measurement covariance matrix.
Would this be an accurate approach in a system with very high accelerations? What would be a more correct way to estimate the angle under sudden movements? As I mentioned, the result I get when the accelerometer has low variance is very good, but not when "shaken fast".
Also, I would assume that, due to this behavior, the accelerometer's noise does not follow a Gaussian distribution, but I would not know how to model this behavior.
You can run a "bank of filters", that is, independent filters with different measurement noise levels, and then compute a weighted average of the estimates based on their likelihoods. You can find several references in the literature; during my recent work I discovered that Y. Bar-Shalom has documented such an approach.
In scientific terms, what you are describing is an adaptive stochastic state estimation problem; long story short, there exist methods to change the modelled measurement noise online depending on performance indications from the filter.
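To make the "variance switch" concrete, below is a minimal 1-state sketch in Python; the window length, threshold and noise values are illustrative assumptions rather than tuned numbers.

import numpy as np

def adaptive_kf_angle(gyro_rates, accel_angles, dt,
                      q=0.001, r_base=0.03, r_inflated=1.0,
                      window=15, var_threshold=5.0):
    # 1-state Kalman filter: the gyro rate drives the prediction, the
    # accelerometer angle is the measurement. R is inflated whenever the
    # variance of the recent accelerometer angles exceeds the threshold.
    angle, p = accel_angles[0], 1.0
    history, estimates = [], []
    for rate, z in zip(gyro_rates, accel_angles):
        # predict using the gyro as the state-model input
        angle += rate * dt
        p += q
        # variance switch: pick R from the spread of recent measurements
        history.append(z)
        if len(history) > window:
            history.pop(0)
        r = r_inflated if np.var(history) > var_threshold else r_base
        # update with the accelerometer angle
        k = p / (p + r)
        angle += k * (z - angle)
        p *= (1.0 - k)
        estimates.append(angle)
    return np.array(estimates)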
All the best,
D.D.

Gain/Lift chart interpretation using H2OFlow

The above image is the lift chart of an H2O GBM classification model for the training and validation data sets. I am confused by it compared with the other lift charts I have seen. Normally the baseline is a 45-degree line and the lift curve has a somewhat convex shape above that baseline. In the above figure, if the green line is the lift curve, why is it constant at first and then coming down until it touches the other line? Also, why is the baseline not at 45 degrees? Can anyone help me interpret the model using the above graph? Does my model perform well?
The black line is not the baseline, but the cumulative capture rate. The capture rate is the proportion of all the events that fall into the group/bin. E.g. if 90 out of a total of 100 positive outcomes/events fall into the first bin, then the capture rate for that bin is 0.9.
The green line is the cumulative lift curve, so by definition the two lines converge at 1.
Whether your model performs well or not depends on your goal. According to the validation metrics, you could capture about 80% of the events by targeting only 50% of the population, which corresponds to a lift of about 1.6.
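To illustrate how the two curves are built, here is a small Python sketch (not the H2O implementation) that computes the cumulative capture rate and cumulative lift per decile from predicted scores and binary labels; at the last bin both values reach 1, which is why the curves converge.

import numpy as np

def cumulative_gains(scores, labels, n_bins=10):
    order = np.argsort(scores)[::-1]          # highest predicted scores first
    labels = np.asarray(labels)[order]
    bins = np.array_split(labels, n_bins)
    total_events = labels.sum()               # all positive outcomes
    base_rate = total_events / len(labels)
    cum_events, cum_rows, rows = 0, 0, []
    for b in bins:
        cum_events += b.sum()
        cum_rows += len(b)
        capture = cum_events / total_events           # black line
        lift = (cum_events / cum_rows) / base_rate    # green line
        rows.append((cum_rows / len(labels), capture, lift))
    return rows  # (fraction targeted, cumulative capture rate, cumulative lift)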

Shading mask algorithm for radiation calculations

I am working on software (Ruby - SketchUp) to calculate the radiation (sun, sky and surrounding buildings) within an urban development at pedestrian level. The final goal is to be able to create a contour map that shows the level of total radiation. By total radiation I mean shortwave (light) and longwave (heat). (To give you an idea: http://www.iaacblog.com/maa2011-2012-digitaltools/files/2012/01/Insolation-Analysis-All-Year.jpg)
I know there are several existing software that do this, but I need to write my own as this calculation is only part of a more complex workflow.
The (obvious) pseudocode is the following (a Python sketch of the same loops is shown right after it):
Select and mesh the surface for analysis
For each point of the mesh
    For each of the n (see below) precalculated rays in the upper hemisphere
        Cast the ray and check whether it is in shade
        If in shade => extract properties from the intersected surface
        If not in shade => flag it
    End loop (rays)
End loop (points)
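Here is the Python sketch mentioned above. It assumes directions is the precalculated list of upper-hemisphere ray directions and intersect(point, direction) is a placeholder for whatever ray/scene intersection the host environment provides (the SketchUp Ruby API in the actual tool); it should return the hit surface, or None when the ray reaches the sky.

def shading_mask(mesh_points, directions, intersect):
    # one entry per mesh point: for every ray, either the intersected
    # surface (in shade) or the "sky" flag (not in shade)
    mask = []
    for point in mesh_points:
        rays = []
        for direction in directions:
            surface = intersect(point, direction)
            if surface is not None:
                rays.append(surface)   # in shade: keep the surface properties
            else:
                rays.append("sky")     # not in shade: flag it
        mask.append((point, rays))
    return mask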
The approach above is brute force, but it is the only one I can think of. The calculation time increases with the fourth power of the accuracy (Dx, Dy, Dazimuth, Dtilt). I know that software like Radiance uses a Monte Carlo approach to reduce the number of rays.
As you can imagine, the accuracy of the calculation for a specific point of the mesh is strongly dependent on the accuracy of the skydome subdivision. Similarly, the accuracy over the surface depends on the coarseness of the mesh.
I was thinking of a different approach using adaptive refinement based on the results of the calculations. The refinement could work both for the analyzed surface and for the skydome. If the results at two adjacent points differ by more than a threshold value, then a refinement is performed (see the sketch below). This is commonly done in fluid simulation, but I could not find anything about light simulation.
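The sketch below is a deliberately simplified 1-D illustration of that refinement rule: evaluate(x) stands in for the expensive per-point radiation calculation, and a midpoint is added wherever two neighbouring results differ by more than the threshold. The same test can be applied per cell of the surface mesh or of the skydome.

def refine_1d(xs, evaluate, threshold, max_points=1000):
    values = {x: evaluate(x) for x in xs}      # initial coarse samples
    changed = True
    while changed and len(values) < max_points:
        changed = False
        pts = sorted(values)
        for a, b in zip(pts, pts[1:]):
            if abs(values[a] - values[b]) > threshold:
                mid = 0.5 * (a + b)
                if mid not in values:
                    values[mid] = evaluate(mid)
                    changed = True
    return values                              # position -> result, refined where needed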
Also, I wonder whether there are algorithms, from computer graphics for example, that would allow me to minimize the number of calculations. For example: check the maximum height of the surroundings so as to exclude certain parts of the skydome for certain points.
I don't need extreme accuracy as I am not doing rendering. My priority is speed at this moment.
Any suggestion on the approach?
Thanks
n rays
At the moment I subdivide the sky by constant azimuth and tilt steps; this causes irregular solid angles. There are other subdivisions (e.g. Tregenza) that maintain a constant solid angle.
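As an illustration of the constant-solid-angle idea, the sketch below keeps the tilt bands but scales the number of azimuth cells in each band with cos(tilt), so every cell covers roughly the same patch of sky; Tregenza's 145-patch subdivision follows the same principle with fixed per-band counts. The band and cell counts here are arbitrary placeholders.

import math

def equal_solid_angle_directions(n_bands=7, cells_at_horizon=30):
    dirs = []
    d_tilt = 0.5 * math.pi / n_bands
    for j in range(n_bands):
        tilt = (j + 0.5) * d_tilt              # band centre altitude
        # the band's solid angle shrinks roughly with cos(tilt),
        # so give it proportionally fewer azimuth cells
        n_az = max(1, round(cells_at_horizon * math.cos(tilt)))
        for i in range(n_az):
            az = 2.0 * math.pi * (i + 0.5) / n_az
            dirs.append((math.cos(az) * math.cos(tilt),
                         math.sin(az) * math.cos(tilt),
                         math.sin(tilt)))
    return dirs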
EDIT: Response to the great questions from Spektre
Time frame. I run one simulation for each hour of the year. The weather data is extracted from an EPW weather file. It contains, for each hour, solar altitude and azimuth, direct radiation, diffuse radiation, and cloudiness (for atmospheric longwave diffuse). My algorithm calculates the shadow mask separately, then uses this shadow mask to calculate the radiation on the surface (and on a typical pedestrian) for each hour of the year. It is in this second step that I add the actual radiation. In the first step I just gather information on the geometry and properties of the various surfaces.
Sun paths. No, I don't. See point 1.
Include reflection from buildings? Not at the moment, but I plan to include it as an overall diffuse shortwave reflection based on sky view factor. I consider only shortwave reflection from the ground now.
Include heat dissipation from buildings? Absolutely yes. That is the reason why I wrote this code myself. Here in Dubai this is key, as building surfaces get very, very hot.
Surface albedo? Yes, I do. In SketchUp I have associated a dictionary with every surface, and in this dictionary I include all the surface properties: temperature, emissivity, etc. At the moment the temperatures are fixed (ambient temperature if not assigned), but I plan, in the future, to combine this with the results of a dynamic building thermal simulation that already calculates all the surface temperatures.
Map resolution. The resolution is chosen by the user and the mesh is generated by the algorithm. In terms of scale, I use this for masterplans; the scale goes from 100 m x 100 m up to 2000 m x 2000 m, and I usually use a minimum resolution of 2 m. The limits are memory and simulation time. I also have the option to refine specific areas with a much finer mesh: for example, areas with restaurants or other amenities.
Framerate. I do not need to make an animation. Results are exported in a VTK file and visualized in Paraview and animated there just to show off during presentations :-)
Heat and light. Yes. Shortwave and longwave are handled separately. See point 4. The geolocalization is only used to select the correct weather file. I do not calculate all the radiation components. The weather files I need have measured data. They are not great, but good enough for now.
https://www.lucidchart.com/documents/view/5ca88b92-9a21-40a8-aa3a-0ff7a5968142/0
visible light
For a relatively flat global base-ground light map I would use projected shadow texture techniques instead of ray-traced angular integration. It is way faster with almost the same result. This will not work on non-flat ground (many bigger bumps which cast bigger shadows and also make the active light absorption area anisotropic). Urban areas are usually flat enough (inclination does not matter), so the technique is as follows:
camera and viewport
The ground map is the target screen, so set the viewpoint underground, looking up along the Sun direction. The resolution is at least your map resolution and there is no perspective projection.
rendering light map 1st pass
First clear the map with the full radiation (direct + diffuse) (light blue), then render buildings/objects with diffuse radiation only (shadow). This produces the base map, without reflections or soft shadows, in the Magenta rendering target.
rendering light map 2nd pass
Now you need to add reflections from the building faces (walls). For that I would take every outdoor face of the building that faces the Sun or is heated enough, compute the reflection points onto the light map, and render the reflection directly to the map.
In this part you can add ray tracing for vertices only, to make it more precise and also to include multiple reflections (but in that case do not forget to add scattering).
project target screen to destination radiation map
Just project the Magenta rendering target image onto the ground plane (green). It is only a simple linear affine transform ...
post processing
You can add soft shadows by blurring/smoothing the light map. To make it more precise, you can add info to each pixel saying whether it is shadow or wall. Actual walls are just pixels that are at 0 m height above ground, so you can use the Z-buffer values directly for this. The level of blurring depends on the scattering properties of the air, and of course pixels at 0 m ground height are not blurred at all.
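A sketch of this post-processing step, assuming the light map and a boolean "keep sharp" mask (walls / 0 m pixels identified from the Z-buffer, as described above) are available as NumPy arrays; the sigma value stands in for the air-scattering-dependent blur level.

import numpy as np
from scipy.ndimage import gaussian_filter

def soften_shadows(light_map, keep_sharp_mask, sigma=2.0):
    blurred = gaussian_filter(light_map, sigma=sigma)
    # masked pixels keep their original value, everything else is smoothed
    return np.where(keep_sharp_mask, light_map, blurred)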
IR
This can be done in a similar way, but temperature behaves a bit differently, so I would make several layers of the scene at a few altitudes above ground, forming a volume render, and then post-process the energy transfers between pixels and layers. Also do not forget to add the cooling effect of green plants and water evaporation.
I do not have enough experience in this field to make more suggestions; I am more used to temperature maps with very high temperature variances in specific conditions and materials, not outdoor conditions.
PS. I forgot: the albedo for IR and visible light is very different for many materials, especially aluminium and some wall paints.

What is the recognition rate of PCA eigenfaces?

I used the Database of Faces (formerly the ORL Database) from AT&T Laboratories Cambridge. The database consists of 400 images with 10 images per person, i.e., there are 10 images of each of the 40 people.
I separated 5 images of each person for training and the remaining 5 images of each person for testing.
So I have 2 folders:
1) Training (5 images/person = 200 images)
2) Testing (5 images/person = 200 images)
The photos in the training folder are different from those in the testing folder.
The recognition rate I got is only 80%.
But if I pre-process the images before recognition I get:
pre-processing with imadjust: 82%
pre-processing with sharpen: 83%
pre-processing with sharpen and imadjust: 84%
(If pre-processing is done, it is applied to both training and testing images.)
For the number of eigenfaces used, all eigenvalues of matrix L are sorted and those that are less than a specified threshold are eliminated.
L_eig_vec = [];
for i = 1 : size(V,2)
    % keep eigenvector V(:,i) only if its eigenvalue exceeds the threshold (here 1)
    if ( D(i,i) > 1 )
        L_eig_vec = [L_eig_vec V(:,i)];
    end
end
I use MATLAB to implement the face recognition system. Is it normal for the recognition rate to be that low?
The accuracy would depend on the classifier you are using once you have the data in the PCA projected space. In the original Turk/Pentland eigenface paper
http://www.face-rec.org/algorithms/PCA/jcn.pdf
they just use kNN / Euclidean distance, but a modern implementation might use SVMs, e.g. with an RBF kernel, as the classifier in the "face space", with the C and gamma parameters optimized using a grid search. LibSVM would do that for you, and there is a MATLAB wrapper available.
Also, you should register the faces first, i.e. warp the images so that the facial landmarks (e.g. eyes, nose, mouth) are in a harmonised position across the whole dataset. If the images aren't pre-registered then you will get a performance loss. I would expect performance in the 90s for this dataset (5 training images per person) using eigenfaces with an SVM and pre-registration. That figure is a gut feeling based on prior implementations / the performance of past student projects. One thing to note, however, is that your number of training examples is very low: 5 points in a high-dimensional space is not much to train a classifier on.
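Here is a sketch of the suggested pipeline (eigenfaces followed by an RBF-SVM, with C and gamma chosen by grid search), using scikit-learn instead of the LibSVM MATLAB wrapper mentioned above; the component count and parameter grids are illustrative placeholders.

from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([
    ("pca", PCA(n_components=100, whiten=True)),   # project into the "face space"
    ("svm", SVC(kernel="rbf")),
])
param_grid = {
    "svm__C": [1, 10, 100, 1000],
    "svm__gamma": [1e-4, 1e-3, 1e-2],
}
search = GridSearchCV(pipe, param_grid, cv=5)
# X_train / X_test: rows of flattened, registered face images; y: person labels
# search.fit(X_train, y_train)
# print(search.best_params_, search.score(X_test, y_test))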

Advice to consider when training a robust cascade classifier?

I'm training a cascade classifier in order to detect animals in images. Unfortunately my false positive rate is quite high (super high using Haar and LBP, acceptable using HOG). I'm wondering how I could possibly improve my classifier.
Here are my questions:
What amount of training samples is necessary for robust detection? I've read somewhere that 4000 positive and 800 negative samples are needed. Is that a good estimate?
How different should the training samples be? Is there a way to quantify image difference in order to include / exclude possible 'duplicate' data?
How should I deal with occluded objects? Should I train only on the part of the animal that is visible, or should I rather pick my ROI so that the average ROI is quite constant?
Re occluded objects: animals have legs, arms, tails, heads etc. Since some body parts tend to be occluded quite often, does it make sense to select the 'torso' as the ROI?
Should I try to downscale my images and train on smaller image sizes? Could this possibly improve things?
I'm open for any pointers here!
4000 positives to 800 negatives is a bad ratio. The thing with negative samples is that you need to train your system with as many of them as possible, since the AdaBoost algorithm (the core algorithm behind Haar-like feature selection) depends highly on them. Using 4000 / 10000 would be a good improvement.
Detecting "animals" is a hard problem. Since your problem is a decision process, which is already NP-hard, you are increasing complexity with your range of classification. Start with cats first. Have a system that detects cats. Then apply the same to dogs. Have, say, 40 systems detecting different animals and use them for your purpose later on.
For training, do not use occluded objects as positives; i.e. if you want to detect frontal faces, then train on frontal faces, applying only position and orientation changes, without including any other object in front of them.
Downscaling is not important, as the Haar classifier itself downscales everything to 24x24. Watch the whole Viola-Jones presentation when you have enough time.
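As a concrete starting point for the suggested 4000 / 10000 split, here is a hedged sketch of how such a training run might be launched from Python, assuming OpenCV's opencv_traincascade tool is installed; all paths, counts and the window size are placeholders.

import subprocess

subprocess.run([
    "opencv_traincascade",
    "-data", "cascade_out",     # output directory for the trained stages
    "-vec", "positives.vec",    # positives packed with opencv_createsamples
    "-bg", "negatives.txt",     # one negative image path per line
    "-numPos", "3800",          # kept a bit below the .vec total to leave headroom per stage
    "-numNeg", "10000",
    "-numStages", "20",
    "-featureType", "HAAR",     # or "LBP"
    "-w", "24", "-h", "24",     # training window size
], check=True)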
Good luck.
