How to remove false detections (false positives) in Faster R-CNN

I am using Faster R-CNN with Inception V2 on a custom dataset. My model works fine, with good detection accuracy. However, I have a false positive problem: when I pass an image to the model, I get the correct prediction, but I also get some wrong bounding boxes with high confidence scores. Is there any method that can be used as post-processing to remove these extra detections?

It seems like this is a common problem with transfer learning; you should check out this discussion. In the end, it all seems to boil down to the source of your false positives.
For instance, I once trained a detector to detect smoke in wildfire images, but it ended up also catching the clouds. To solve that, I also annotated the clouds as a new class and ignored its detections. This greatly improved the performance.
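The "annotate the look-alike as its own class, then ignore it" trick ends with a simple filter over the model's output. A minimal sketch, assuming detections arrive as (class_name, score, box) tuples and that the extra class is named "cloud"; both the tuple layout and the class name are hypothetical and should be adapted to whatever your detector returns.

```python
from typing import List, Set, Tuple

# Hypothetical detection record: (class_name, score, box), box = (ymin, xmin, ymax, xmax)
Detection = Tuple[str, float, Tuple[float, float, float, float]]

# Assumption: "cloud" is an extra class trained only to absorb look-alike false positives.
DECOY_CLASSES: Set[str] = {"cloud"}


def drop_decoy_detections(detections: List[Detection],
                          decoys: Set[str] = DECOY_CLASSES) -> List[Detection]:
    """Keep every detection whose class is not one of the decoy classes."""
    return [d for d in detections if d[0] not in decoys]


dets = [
    ("smoke", 0.93, (0.1, 0.2, 0.4, 0.5)),
    ("cloud", 0.88, (0.0, 0.0, 0.2, 0.9)),
]
print(drop_decoy_detections(dets))  # only the "smoke" detection remains
```

The point of the extra class is that the model now has somewhere "legal" to put the cloud-like regions, instead of forcing them into the smoke class; the filter itself is trivial.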
If it's making wrong detections with high confidence, I think it would be hard to solve this problem with post-processing alone.
You could also try hard mining, although I'm not really sure how to do it for Faster R-CNN.
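One simple, image-level form of hard negative mining (not specific to Faster R-CNN, and not necessarily what the answer had in mind) is: run the current model on images that contain none of your objects, keep the ones it fires on most confidently, and add them back to the training set as pure-background examples, if your training pipeline accepts images without boxes. A minimal sketch, where `predict` is a hypothetical wrapper around your own inference code that returns the detection scores for one image.

```python
from typing import Callable, Dict, List


def mine_hard_negatives(
    background_image_ids: List[str],
    predict: Callable[[str], List[float]],  # hypothetical: image id -> detection scores
    score_thresh: float = 0.5,
    max_images: int = 100,
) -> List[str]:
    """Return object-free images on which the model fires most confidently."""
    offenders: Dict[str, float] = {}
    for image_id in background_image_ids:
        top = max(predict(image_id), default=0.0)
        # Any confident detection on an image with no objects is a false positive.
        if top >= score_thresh:
            offenders[image_id] = top
    # Hardest first: the most confident mistakes are the most informative negatives.
    ranked = sorted(offenders, key=offenders.get, reverse=True)
    return ranked[:max_images]


fake_scores = {"a": [0.9, 0.2], "b": [0.1], "c": [0.7], "d": []}
print(mine_hard_negatives(list(fake_scores), lambda i: fake_scores[i]))
```

After retraining with the mined images, the loop can be repeated, since the model's mistakes change as it learns.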

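When testing the model, there is a value like this in the code:
min_score_thresh=0.90
If you set this value to 0.90, only detections with a confidence score of 0.90 (90%) or higher are shown. Keep in mind that the score is the model's confidence, not a guarantee that the detection is correct.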
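The same cut-off can be applied as a post-processing step on the raw detector outputs, not only when drawing boxes. A minimal sketch, assuming the detector returns parallel sequences of boxes, scores and class ids (check the layout your own inference code actually produces).

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def filter_by_score(
    boxes: Sequence[Box],
    scores: Sequence[float],
    classes: Sequence[int],
    min_score_thresh: float = 0.90,
) -> List[Tuple[Box, float, int]]:
    """Keep only detections whose confidence score is at least min_score_thresh."""
    return [
        (box, score, cls)
        for box, score, cls in zip(boxes, scores, classes)
        if score >= min_score_thresh
    ]


kept = filter_by_score([(0, 0, 1, 1), (0, 0, 0.5, 0.5)], [0.95, 0.6], [1, 2])
print(kept)  # [((0, 0, 1, 1), 0.95, 1)]
```

Raising the threshold trades recall for precision, so it cannot remove false positives that score as high as the true detections, which is the situation described in the earlier answer.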

Related

Time Series Anomaly Detection from Data vs Image

I was assigned a project to do anomaly detection on our company's KPIs. I googled and found AnomalyDetection by Twitter. My colleague suggested doing the anomaly detection on graph images (comparing them with the previous week's images to identify anomaly points) instead of on the raw time-series data.
I am not familiar with anomaly detection. Can anyone with experience advise which approach is better (anomaly detection from data or from images) in terms of:
1. Accuracy
2. Storage
3. Processing
Advantages of the image-based approach:
Data-agnostic. It can theoretically be run on anything from which one can get an image/visualization.
Image models are relatively well understood.
Pretrained models are available.
Disadvantages:
Requires much more data to learn a useful model.
The image pixel space is much more complicated than the time-series it represents. Probably at least 100x.
Requires much more compute power. Both at training time, and at prediction time. Probably at least 100x.
Requires much more storage for datasets. Probably at least 100x.
Sensitive to changes in visualization.
A change in tick marks or font, for example, would be an anomaly. Even a change in image compression may have an impact, if not controlled for.
Loses explainability. It may be hard to know why a certain image is anomalous, even for simple cases like a mean shift.
Much more complex model setup and infrastructure needed.
For an application like anomaly detection on time-series metrics, I would not recommend it. I am not even sure I have seen it studied.
I think it is unlikely that a high performing Anomaly Detection system for metrics can be built effectively with image processing on graphs.
Anomalies are typically quite rare, which means it is a "low data" scenario. But many anomalies are also quite simple and can be detected with simple methods; even something as basic as well-chosen thresholds can go a long way. Using image processing does not help with any of these challenges; in fact, it is worse in most regards.
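To illustrate the point that simple threshold methods go a long way, here is the colleague's "compare with last week" idea applied to the raw series instead of to chart images. A minimal sketch, assuming evenly spaced samples and a hypothetical hourly sampling rate (so one week is 168 points); the robust median/MAD threshold is one common choice, not the only one.

```python
from statistics import median
from typing import List, Sequence


def week_over_week_anomalies(
    values: Sequence[float],
    period: int = 7 * 24,  # assumption: hourly samples, so one week = 168 points
    k: float = 5.0,        # how many median absolute deviations count as anomalous
) -> List[int]:
    """Flag indices whose change versus the same time one period earlier is unusually large.

    residual[t] = values[t] - values[t - period]; a point is flagged when
    |residual - median(residuals)| > k * MAD (median absolute deviation).
    """
    residuals = [values[t] - values[t - period] for t in range(period, len(values))]
    if not residuals:
        return []
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        # Residuals are (almost) all identical: flag anything that deviates at all.
        return [i + period for i, r in enumerate(residuals) if r != center]
    return [i + period for i, r in enumerate(residuals) if abs(r - center) > k * mad]


# Toy series with period 3 and one spike at index 14.
series = [10, 20, 30, 11, 19, 31, 10, 21, 30, 11, 20, 29, 10, 20, 80]
print(week_over_week_anomalies(series, period=3))  # [14]
```

Storage and compute here are a handful of numbers per data point, and a flagged index comes with an explanation (its week-over-week change), which addresses the explainability concern above.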

Regularization vs. Validation

What I always see in papers and articles about under/overfitting is a falling curve for training error and a U-shaped curve for test error, with the area to the left of the bottom of the U-curve described as underfitting and the area to the right of it as overfitting.
To find the best model, we can test each configuration (e.g. changing the number of nodes and layers) and compare the test error values to find the minimum point (typically via cross-validation). That looks straightforward and perfect.
Do we need a regularizer to reach this point? This is where I am not sure I understand the topic well. To me, it seems that we don't need a regularizer if we can test different model configurations. The only case where a regularizer comes into play is when we have a fixed model configuration (e.g. a fixed number of nodes and layers) and don't want to try other configurations, so we use a regularizer to limit the model complexity by forcing the other model parameters (e.g. network weights) to low values. Is this view right?
But if it is right, what is the intuition behind it? First of all, when using a regularizer, we don't know in advance whether this network configuration/complexity puts us to the right or the left of the minimum of the test error curve. It may already be underfitting, overfitting, or well fitted. Putting the math aside, why does forcing the weights to lower values make the network generalize better and overfit less? Is there any analogy between this method and the previous method of moving along the test loss curve to find its minimum? Also, the regularizer does its job during training; it cannot do anything with the test data. How can it help to move toward the minimum test error?
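A small experiment can make the relationship between the two methods concrete. A sketch under stated assumptions: toy 1-D data invented for illustration (sin(3x) plus a fixed alternating perturbation), a deliberately over-flexible degree-9 polynomial as the "fixed configuration", and an L2 (ridge) penalty with strength `lam`. It shows the weight norm shrinking as `lam` grows, and `lam` itself being picked by validation error, i.e. a search along a U-shaped validation curve just like the search over nodes and layers. It illustrates one case, not a general proof.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def features(x: float, degree: int) -> List[float]:
    return [x ** p for p in range(degree + 1)]


def ridge_fit(xs: Sequence[float], ys: Sequence[float], degree: int, lam: float) -> List[float]:
    """Minimize sum((y - w.phi(x))^2) + lam * ||w||^2 via (X^T X + lam I) w = X^T y."""
    rows = [features(x, degree) for x in xs]
    d = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def mse(w: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> float:
    preds = [sum(wi * f for wi, f in zip(w, features(x, len(w) - 1))) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(xs)


# Invented toy data: training points with a fixed perturbation, clean validation points.
train_x = [-1 + 0.2 * i for i in range(11)]
train_y = [math.sin(3 * x) + 0.1 * (-1) ** i for i, x in enumerate(train_x)]
val_x = [-0.9 + 0.2 * i for i in range(10)]
val_y = [math.sin(3 * x) for x in val_x]

DEGREE = 9  # the fixed, deliberately over-flexible "architecture"
lams = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
results = []
for lam in lams:
    w = ridge_fit(train_x, train_y, DEGREE, lam)
    norm = math.sqrt(sum(wi * wi for wi in w))
    results.append((lam, norm, mse(w, train_x, train_y), mse(w, val_x, val_y)))
    print(f"lambda={lam:<6} ||w||={norm:8.3f} train={results[-1][2]:.4f} val={results[-1][3]:.4f}")

best_lam = min(results, key=lambda r: r[3])[0]
print("lambda chosen by validation error:", best_lam)
```

In this view the regularizer does not look at test data either; it only provides a continuous complexity knob, and validation data is still what tells you where on that knob the minimum lies.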

Keras «Powerful image classification with little data»: disparity between training and validation

I followed this post and first got it working on the «Cats vs dogs» dataset. Then I replaced that dataset with my own images, which show either the presence or the absence of an object. My dataset is even smaller than the one in the post: I only have 496 images containing the object for training and 160 images containing it for validation. For the «absent» class I have plenty of samples (images without the object).
So far I haven't tried class_weight to tackle the imbalanced data problem; I just randomly chose 496 and 160 images without the object for training and validation, respectively. Basically, I am doing two-class image classification on a smaller dataset using the techniques in this post. I therefore expected worse performance because of the insufficient data, but the actual problem is that the performance does not converge, as shown in the figures.
Could you tell me possible reasons for the non-convergence? I guess the problem is related to my dataset, since the model works perfectly for «cats vs dogs», but I don't know how to address it. Are there any good techniques to make it converge?
Thank you.
This performance plot is based on VGG16, keeping all layers up to the fully connected layer and training a small fully connected layer with 256 neurons.
This performance plot is also based on VGG16, but uses 128 neurons instead of 256. I also set the number of epochs to 80.
Based on the suggestions so far, I'm thinking of building a custom convnet model to fight the overfitting problem. But how should I do this? One of my worries is that a model with fewer layers will hurt training performance. Are there any guidelines for designing a good model for little data? Thank you.
Update:
Now I think I understand half of the reason for the non-convergence. Actually, I only have 100+ images of my own; the rest were downloaded from Flickr. I thought those images, with centred objects and better quality, would help the model. But later I found that they did not contribute to accuracy and even made the output class probabilities worse. After removing the downloaded images, the performance went up a little and the non-convergence disappeared. Note that I only use 64*2 images for training and 48*2 images for testing. I also found that image augmentation did not improve performance on my dataset. Without augmentation, the training accuracy can reach 1, but with some augmentation it only reaches about 85%. Has anybody had a similar experience? Why doesn't data augmentation always work? Is it because of my specific dataset? Thank you very much.
Your model is working great, but it's "overfitting". It means it's capable of memorizing all your training data without really "thinking". That leads to great training results and bad test results.
Common ways to avoid overfitting are:
More data - If you have little data, the chance of overfitting increases
Fewer units/layers - Make the model less capable, so it will stop memorizing and start thinking.
Add "dropouts" to your layers (something that randomly discards part of the results to prevent the model from being too powerful)
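What a dropout layer does to its inputs fits in a few lines. A minimal, framework-free sketch of "inverted" dropout, the variant in which the surviving values are scaled up during training so that nothing needs rescaling at inference time; Keras's `Dropout(rate)` layer behaves this way, but the function below is an illustration, not Keras code.

```python
import random
from typing import List, Optional, Sequence


def dropout(activations: Sequence[float], rate: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout.

    During training, each value is zeroed with probability `rate` and the survivors
    are multiplied by 1 / (1 - rate), so the expected value is unchanged.
    At inference time the layer passes values through unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


out = dropout([1.0] * 10, rate=0.5, training=True, rng=random.Random(0))
print(out)  # each entry is either 0.0 or 2.0
```

Because a different random subset is dropped on every training step, no single unit can be relied on to memorize a particular example, which is why dropout tends to reduce overfitting.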
Do more layers mean more power and performance?
If by performance you mean capability of learning, yes. (If you mean "speed", no)
Yes, more layers mean more power. But too much power leads to overfitting: the model is so capable that it can memorize training data.
So there is an optimal point:
A model that is not very capable will not give you the proper results (both training and test results will be bad)
A model that is too capable will memorize the training data (excellent training results, but bad test results)
A balanced model will learn the right things (good training and test results)
That's exactly why we use test data: it's data that is not presented during training, so the model doesn't learn from it.

Identify changes in the slope using machine learning

I want to get my hands dirty with some machine learning, and I finally have a problem which seems like a good beginner project. However, despite reading a lot about the subject I am unsure how to get started, and what my basic approach should be.
I have a dataset which should look like this.
A real dataset looks more like this:
I want to identify the points in the red circles (on the first image), and be robust against occasional artifacts like the one in the blue circle.
It sounds like a really easy task. However, there is quite a lot of noise in the raw data. My current implementation is pretty traditional: it blurs the data and compares the first and second derivatives to some estimated threshold values. This approach works, but it "only" identifies the points with ~99.7% accuracy, and since I do around 100,000 measurements a day, I would love to increase this number.
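The traditional pipeline described above (blur, then threshold a derivative) could look roughly like the sketch below. It is a reconstruction for illustration, not the author's code: it uses only the second derivative, and the window `width` and `thresh` are hypothetical parameters. A function like this is also a convenient way to produce the initial labels for the training set mentioned later.

```python
from typing import List, Sequence


def moving_average_valid(xs: Sequence[float], width: int) -> List[float]:
    """Mean of each full window; output[k] is centred on input index k + width // 2."""
    return [sum(xs[k:k + width]) / width for k in range(len(xs) - width + 1)]


def slope_change_points(ys: Sequence[float], width: int, thresh: float) -> List[int]:
    """Indices (into ys) where the slope of the blurred signal changes.

    d2 = s[j-1] - 2*s[j] + s[j+1] is a discrete second derivative: near zero on straight
    segments, large in magnitude at a knee. Each run of |d2| >= thresh (thresh > 0)
    is reported once, at the centre of the run.
    """
    half = width // 2
    s = moving_average_valid(ys, width)
    d2 = [s[j - 1] - 2 * s[j] + s[j + 1] for j in range(1, len(s) - 1)]
    points, start = [], None
    for j, v in enumerate(d2 + [0.0]):  # sentinel closes a run at the very end
        if abs(v) >= thresh and start is None:
            start = j
        elif abs(v) < thresh and start is not None:
            mid = (start + j - 1) // 2      # centre of the run, in d2 indices
            points.append(mid + 1 + half)   # d2[i] sits on s[i + 1], i.e. ys[i + 1 + half]
            start = None
    return points


# Flat segment, then a ramp: the knee is at index 19. Alternating noise is added too.
ys = [0.0] * 20 + [float(i) for i in range(1, 21)]
noisy = [y + 0.05 * (-1) ** i for i, y in enumerate(ys)]
print(slope_change_points(ys, 5, 0.1), slope_change_points(noisy, 5, 0.1))  # [19] [19]
```

The blur width and threshold are exactly the hand-tuned parameters a learned model would replace, so this function also serves as the baseline to beat.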
So, this is what I have:
All the datasets I want/need
A pretty good model of how the data should look.
A pretty good training set, using my existing algorithm (the outliers can be fixed manually)
However, I have no basic idea which approach I should use. It feels like none of the material I've read on machine learning fits this problem.
Can someone help me with the super high level approach to solve this problem?

Binary classification of sensor data

My problem is the following: I need to classify a data stream coming from a sensor. I have managed to get a baseline using the median of a window, and I subtract the values from that baseline (I want to avoid negative peaks, so I only use the absolute value of the difference).
Now I need to distinguish an event (= something triggered the sensor) from the noise near the baseline.
The problem is that I don't know which method to use.
There are several approaches of which I thought of:
sum up the values in a window; if the sum is above a threshold, the class should be EVENT ('integrate and dump')
sum up the differences of the values in a window and take the mean (which gives something like the first derivative); if the value is positive and above a threshold, set class EVENT, otherwise set class NO-EVENT
combination of both
(unfortunately these approaches have the drawback that I need to guess the threshold values and set the window size)
using an SVM that learns from manually classified data (but I don't know how to set up this algorithm properly: which features should I look at, e.g. median/mean of a window, integral, first derivative...?)
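The three threshold rules in the list can be written in a few lines, which makes it easy to compare them on manually labelled data before reaching for an SVM. A minimal sketch, assuming `deviation` already holds |value - baseline| per sample and that windows do not overlap; `width`, `sum_thresh` and `slope_thresh` are exactly the guessed parameters the question worries about, so their values here are placeholders. The same (integral, mean difference) pairs could also serve as feature vectors for the SVM option.

```python
from typing import List, Sequence, Tuple


def window_features(deviation: Sequence[float], width: int) -> List[Tuple[float, float]]:
    """(integral, mean first difference) for each non-overlapping window.

    The integral is the 'integrate and dump' sum; the mean of the consecutive
    differences telescopes to (last - first) / (width - 1).
    """
    if width < 2:
        raise ValueError("width must be at least 2")
    feats = []
    for start in range(0, len(deviation) - width + 1, width):
        w = deviation[start:start + width]
        feats.append((sum(w), (w[-1] - w[0]) / (width - 1)))
    return feats


def classify_windows(feats: Sequence[Tuple[float, float]], sum_thresh: float,
                     slope_thresh: float, mode: str = "either") -> List[str]:
    """Approach 1 (sum), approach 2 (mean slope), combined with OR ('either') or AND ('both')."""
    if mode not in ("either", "both"):
        raise ValueError("mode must be 'either' or 'both'")
    labels = []
    for integral, mean_diff in feats:
        by_sum = integral > sum_thresh       # integrate and dump
        by_slope = mean_diff > slope_thresh  # positive and large mean slope
        hit = (by_sum or by_slope) if mode == "either" else (by_sum and by_slope)
        labels.append("EVENT" if hit else "NO-EVENT")
    return labels


dev = [0.1] * 10 + [1.0, 2.0, 3.0, 4.0, 5.0] + [0.1] * 5
print(classify_windows(window_features(dev, 5), sum_thresh=2.0, slope_thresh=0.5))
# ['NO-EVENT', 'NO-EVENT', 'EVENT', 'NO-EVENT']
```

Non-overlapping windows keep the sketch short; a sliding window (step 1) gives finer time resolution at the cost of labelling each event several times.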
What would you suggest? Are there better/simpler methods to get this task done?
I know there are a lot of sophisticated algorithms, but I'm confused about what the best approach could be - please have a little patience with a newbie who has no machine learning/DSP background :)
Thank you a lot and best regards.
The key to evaluating your heuristic is to develop a model of the behaviour of the system.
For example, what is the model of the physical process you are monitoring? Do you expect your samples, for example, to be correlated in time?
What is the model for the sensor output? Can it be modelled as, for example, a discretized linear function of the voltage? Is there a noise component? Is the magnitude of the noise known or unknown but constant?
Once you've listed your knowledge of the system that you're monitoring, you can then use that to evaluate and decide upon a good classification system. You may then also get an estimate of its accuracy, which is useful for consumers of the output of your classifier.
Edit:
Given the more detailed description, I'd suggest trying some simple models of behaviour that can be tackled using classical techniques before moving to a generic supervised learning heuristic.
For example, suppose:
The baseline, event threshold and noise magnitude are all known a priori.
The underlying process can be modelled as a Markov chain: it has two states (off and on) and the transition times between them are exponentially distributed.
You could then use a hidden Markov model (HMM) approach to determine the most likely underlying state at any given time. Even when the noise parameters and thresholds are unknown, you can use the HMM forward-backward training method to train the parameters (e.g. the mean and variance of a Gaussian) associated with the output for each state.
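With the parameters assumed known (the first bullet above), the most likely state at each sample is given by the per-sample posterior from the forward-backward algorithm. A minimal sketch for the two-state case with Gaussian output noise; every numeric default (levels, noise, switching probabilities) is a hypothetical placeholder, not a value from the question. Learning these parameters when they are unknown (forward-backward training, also known as Baum-Welch) wraps this same computation in an expectation-maximization loop and is not shown.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def posterior_on(
    obs: Sequence[float],
    means: Tuple[float, float] = (0.0, 1.0),       # hypothetical output level: OFF, ON
    stds: Tuple[float, float] = (0.2, 0.2),        # hypothetical noise std per state
    p_switch: Tuple[float, float] = (0.01, 0.05),  # P(OFF->ON), P(ON->OFF) per sample
    p_on_start: float = 0.5,
) -> List[float]:
    """Scaled forward-backward for a two-state HMM: returns P(state_t = ON | all obs)."""
    n = len(obs)
    if n == 0:
        return []
    trans = [[1 - p_switch[0], p_switch[0]],
             [p_switch[1], 1 - p_switch[1]]]
    # Floor the densities so far-out samples cannot underflow to exactly zero.
    emit = [[max(gaussian_pdf(x, means[s], stds[s]), 1e-300) for s in (0, 1)] for x in obs]

    # Forward pass: alpha[t][s] = P(state_t = s | obs[0..t]), renormalized at each step.
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            a = [(1 - p_on_start) * emit[0][0], p_on_start * emit[0][1]]
        else:
            a = [sum(alpha[t - 1][r] * trans[r][s] for r in (0, 1)) * emit[t][s] for s in (0, 1)]
        c = sum(a)
        scale.append(c)
        alpha.append([v / c for v in a])

    # Backward pass, divided by the same per-step scale factors.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[t + 1][r] * beta[t + 1][r] for r in (0, 1)) for s in (0, 1)]
        beta[t] = [v / scale[t + 1] for v in b]

    # Posterior per sample: alpha * beta, normalized over the two states.
    posts = []
    for t in range(n):
        g0, g1 = alpha[t][0] * beta[t][0], alpha[t][1] * beta[t][1]
        posts.append(g1 / (g0 + g1))
    return posts


obs = [0.0] * 20 + [1.0] * 10 + [0.0] * 20
states = [p > 0.5 for p in posterior_on(obs)]
```

Thresholding the posterior at 0.5 gives the per-sample EVENT/NO-EVENT decision; the switching probabilities act as a built-in preference against short, isolated flips.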
If you know even more about the events, you can get by with simpler approaches: for example, if you knew that the event signal always reached a level above the baseline + noise, and that events were always separated in time by an interval larger than the width of the event itself, you could just do a simple threshold test.
Edit:
The classic intro to HMMs is Rabiner's tutorial (a copy can be found here). Relevant also are these errata.
From your description, a correctly parameterized moving average might be sufficient.
Try to understand the sensor and its output. Make a model and write a simulator that provides mock data covering the expected data, with noise and all that stuff
Get lots of real sensor data recorded
visualize the data and verify your assumptions and model
annotate your sensor data, i.e. generate ground truth (your simulator should do that for the mock data)
from what you have learned so far, propose one or more algorithms
make a test system that can verify your algorithms against ground truth and do regression against previous runs
implement your proposed algorithms and run them against ground truth
try to understand the false positives and false negatives from the recorded data (and try to adapt your simulator to reproduce them)
adapt your algorithm(s)
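The "verify your algorithms against ground truth" step needs a scoring function. A minimal sketch, assuming both the detector output and the annotation are per-sample on/off sequences; counting a predicted event as a hit whenever it overlaps any annotated event is a simplifying assumption (you may prefer a timing tolerance or one-to-one matching), and two predicted events overlapping the same true event both count as hits here.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # [start, end) sample indices


def to_events(labels: Sequence[bool]) -> List[Interval]:
    """Turn a per-sample on/off sequence into a list of [start, end) event intervals."""
    events, start = [], None
    for i, on in enumerate(list(labels) + [False]):  # sentinel closes a trailing event
        if on and start is None:
            start = i
        elif not on and start is not None:
            events.append((start, i))
            start = None
    return events


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def score_events(pred: Sequence[bool], truth: Sequence[bool]) -> Dict[str, int]:
    """Event-level true positives, false positives and false negatives."""
    p_ev, t_ev = to_events(pred), to_events(truth)
    tp = sum(any(overlaps(p, t) for t in t_ev) for p in p_ev)
    fn = sum(not any(overlaps(t, p) for p in p_ev) for t in t_ev)
    return {"tp": tp, "fp": len(p_ev) - tp, "fn": fn}


truth = [False] * 5 + [True] * 5 + [False] * 5 + [True] * 3 + [False] * 2
pred = [False] * 6 + [True] * 3 + [False] * 3 + [True] + [False] * 7
print(score_events(pred, truth))  # {'tp': 1, 'fp': 1, 'fn': 1}
```

Storing these counts per run makes the "regression against previous runs" step a simple comparison of numbers, and the unmatched intervals are exactly the false positives and false negatives worth inspecting.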
some other tips
you may implement hysteresis on thresholds to avoid bouncing
you may implement delays to avoid bouncing
beware of delays if implementing debouncers or low pass filters
you may implement multiple algorithms and voting
for testing relative improvements, you may run regression tests on large amounts of unannotated data, then check only the detections that flip to find performance increases/decreases
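The first two tips (hysteresis and a delay-based debounce) combine naturally in one small state machine. A minimal sketch; `on_thresh`, `off_thresh` and `min_hold` are hypothetical parameters to tune on your annotated data, and `min_hold` introduces exactly the kind of delay the third tip warns about (the output lags by `min_hold - 1` samples).

```python
from typing import List, Sequence


def hysteresis_detect(values: Sequence[float], on_thresh: float, off_thresh: float,
                      min_hold: int = 1) -> List[bool]:
    """Per-sample event state with hysteresis and a simple debounce.

    The state switches ON only when the value rises above on_thresh and back OFF only
    when it falls below the lower off_thresh, so noise around a single threshold cannot
    make it bounce. A switch is accepted only after the new condition has held for
    `min_hold` consecutive samples.
    """
    if off_thresh > on_thresh:
        raise ValueError("off_thresh must not exceed on_thresh")
    state, streak, out = False, 0, []
    for v in values:
        wants_switch = (v < off_thresh) if state else (v > on_thresh)
        streak = streak + 1 if wants_switch else 0
        if streak >= min_hold:
            state, streak = not state, 0
        out.append(state)
    return out


# A single 0.5 threshold would bounce on the first half of this signal.
signal = [0.0, 0.6, 0.4, 0.6, 0.4, 1.2, 0.8, 0.6, 0.8, 0.2, 0.0]
print(hysteresis_detect(signal, on_thresh=1.0, off_thresh=0.3))
```

The gap between the two thresholds should be somewhat larger than the peak-to-peak noise near the baseline, otherwise the output can still bounce.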
