Autotuning an asymmetric PID loop using the relay method

I am currently working on a custom PID autotuner for a temperature controller. My heater is controlled by a relay, and I do not have a cooling element. I am struggling to apply the Åström-Hägglund method to my situation, following this document.
First, the heater block does not respond in a symmetric sinusoidal fashion, and I do not fully understand how to determine Ku and Pu in this situation.
Second, my relay does not provide the option of a negative step, as Åström-Hägglund seems to require; I rely on Newton's law of cooling for that part.
How do I account for these discrepancies in my calculation? Is there a better tuning method to attempt?
[My tuning profile: compare with the tuning profile in Figure 4 on page 2 of the document linked above for further clarification.]
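For reference, the standard symmetric-relay result from the Åström-Hägglund method is Ku = 4d/(πa), where d is the relay amplitude and a the amplitude of the resulting process oscillation, with Pu taken as the period of that oscillation. Below is a minimal sketch of that calculation, assuming recorded time/output arrays; note that it assumes a symmetric relay, so the asymmetry described above will bias the estimates:

```python
import numpy as np

def relay_tuning_estimates(t, y, relay_amplitude, setpoint):
    """Estimate Ku and Pu from recorded relay-oscillation data.

    t: sample times (s); y: process output; relay_amplitude: d.
    Uses the describing-function result Ku = 4*d / (pi * a), which
    assumes a symmetric relay; a heat-only plant will bias this.
    """
    t, y = np.asarray(t, float), np.asarray(y, float)
    a = (y.max() - y.min()) / 2.0                  # oscillation amplitude
    Ku = 4.0 * relay_amplitude / (np.pi * a)

    # Period from successive upward crossings of the setpoint.
    rising = (~(y[:-1] > setpoint)) & (y[1:] > setpoint)
    Pu = np.diff(t[1:][rising]).mean()
    return Ku, Pu
```

Classical Ziegler-Nichols rules (e.g. Kp = 0.6 Ku, Ti = Pu/2, Td = Pu/8) can then be applied to the estimates.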

Related

How to test if my implementation of a backpropagation neural network is correct

I am working on an implementation of the backpropagation algorithm. What I have implemented so far seems to work, but I can't be sure that the algorithm is implemented correctly. Here is what I have noticed during training tests of my network:
Specification of the implementation:
A data set containing almost 100,000 rows (3 variables as input, the sine of the sum of those three variables as the expected output).
The network has 7 layers; all layers use the sigmoid activation function.
When I run the backpropagation training process:
The minimum error cost is found at the fourth iteration (the minimum cost is 140; is that normal? I was expecting much less than that).
After the fourth iteration the error cost starts increasing (I don't know whether this is normal or not).
The short answer would be "no, very likely your implementation is incorrect". Your network is not training, as can be observed by the very high error cost. As discussed in the comments, your network suffers very heavily from the vanishing gradient problem, which is inevitable in deep networks. In essence, the first layers of your network learn much more slowly than the later ones. All neurons get some random weights at the beginning, right? Since the first layer learns almost nothing, the large initial error propagates through the whole network!
How to fix it? From the description of your problem it seems that a feedforward network with just a single hidden layer should be able to do the trick (as proven in the universal approximation theorem).
Check e.g. free online book by Michael Nielsen if you'd like to learn more.
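To make the suggestion concrete, here is a minimal sketch of a single-hidden-layer feedforward network trained by backpropagation on the sin(x1 + x2 + x3) task from the question; the hidden size, learning rate, and synthetic data are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data matching the question: 3 inputs, target = sin(x1 + x2 + x3).
X = rng.uniform(-1.0, 1.0, size=(10000, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

n_hidden, lr = 32, 0.1
W1 = rng.normal(0.0, 3 ** -0.5, (3, n_hidden))        # std = m^(-1/2)
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, n_hidden ** -0.5, (n_hidden, 1))
b2 = np.zeros(1)

for epoch in range(500):
    h = np.tanh(X @ W1 + b1)       # single hidden layer
    out = h @ W2 + b2              # linear output unit for regression
    err = out - y
    # Backpropagate the mean-squared-error gradient.
    d_out = 2.0 * err / len(X)
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print("final MSE:", float((err ** 2).mean()))
```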
So do I understand from that that backpropagation can't deal with deep neural networks? Or is there some method to prevent this problem?
It can, but it's by no means a trivial challenge. Deep neural networks have been in use since the '60s, but only in the '90s did researchers come up with methods for dealing with them efficiently. I recommend reading the "Efficient BackProp" chapter (by Y. A. LeCun et al.) of "Neural Networks: Tricks of the Trade".
Here is the summary:
Shuffle the examples
Center the input variables by subtracting the mean
Normalize the input variables to a standard deviation of 1
If possible, decorrelate the input variables.
Pick a network with the sigmoid function f(x) = 1.7159 * tanh(2x/3): it won't saturate at +1/-1, but instead will have the highest gain at these points (the second derivative is at its maximum)
Set the target values within the range of the sigmoid, typically +1 and -1.
The weights should be randomly drawn from a distribution with mean zero and a standard deviation given by m^(-1/2), where m is the number of inputs to the unit
The preferred method for training the network should be picked as follows:
If the training set is large (more than a few hundred samples) and redundant, and if the task is classification, use stochastic gradient with careful tuning, or use the stochastic diagonal Levenberg-Marquardt method.
If the training set is not too large, or if the task is regression, use conjugate gradient.
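A small sketch of the preprocessing, activation, and initialization recommendations above (the shapes and seed are placeholders):

```python
import numpy as np

def shuffle_and_preprocess(X, seed=0):
    """Shuffle the examples, center each input variable, and scale
    it to unit standard deviation."""
    rng = np.random.default_rng(seed)
    X = X[rng.permutation(len(X))]
    X = X - X.mean(axis=0)
    return X / X.std(axis=0)

def lecun_sigmoid(x):
    """f(x) = 1.7159 * tanh(2x/3), chosen so that f(+/-1) = +/-1 and
    the targets +1/-1 fall in the high-gain region."""
    return 1.7159 * np.tanh(2.0 * x / 3.0)

def init_weights(n_inputs, n_units, seed=0):
    """Zero-mean weights with standard deviation m^(-1/2), m = fan-in."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, n_inputs ** -0.5, size=(n_inputs, n_units))
```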
Also, some of my general remarks:
Watch for numerical stability if you implement it yourself; it's easy to get into trouble.
Think about the architecture. Fully-connected multi-layer networks are rarely a smart idea. Unfortunately, ANNs are poorly understood from a theoretical point of view, and one of the best things you can do is simply check what has worked for others and learn the useful patterns (regularization, pooling, dropout layers, and such).

Binary classification of sensor data

My problem is the following: I need to classify a data stream coming from a sensor. I have managed to get a baseline using the median of a window, and I subtract the values from that baseline (I want to avoid negative peaks, so I only use the absolute value of the difference).
Now I need to distinguish an event (= something triggered the sensor) from the noise near the baseline.
The problem is that I don't know which method to use.
There are several approaches I have thought of:
sum up the values in a window; if the sum is above a threshold, the class should be EVENT ('integrate and dump'; see the sketch after this list)
sum up the differences of the values in a window and get the mean value (which gives something like the first derivative), if the value is positive and above a threshold set class EVENT, set class NO-EVENT otherwise
combination of both
(unfortunately these approaches have the drawback that I need to guess the threshold values and set the window size)
using an SVM that learns from manually classified data (but I don't know how to set up this algorithm properly: which features should I look at, like the median/mean of a window? the integral? the first derivative? ...)
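For concreteness, here is a sketch of the 'integrate and dump' approach; the window size and threshold are exactly the values that would have to be guessed:

```python
import numpy as np

def integrate_and_dump(signal, window=50, threshold=10.0):
    """Label each window EVENT if the summed baseline-corrected,
    absolute values exceed a threshold, NO-EVENT otherwise."""
    labels = []
    for start in range(0, len(signal) - window + 1, window):
        total = float(np.sum(signal[start:start + window]))
        labels.append("EVENT" if total > threshold else "NO-EVENT")
    return labels
```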
What would you suggest? Are there better/simpler methods to get this task done?
I know there exist a lot of sophisticated algorithms, but I'm confused about what could be the best way; please have a little patience with a newbie who has no machine learning/DSP background :)
Thank you a lot and best regards.
The key to evaluating your heuristic is to develop a model of the behaviour of the system.
For example, what is the model of the physical process you are monitoring? Do you expect your samples, for example, to be correlated in time?
What is the model for the sensor output? Can it be modelled as, for example, a discretized linear function of the voltage? Is there a noise component? Is the magnitude of the noise known or unknown but constant?
Once you've listed your knowledge of the system that you're monitoring, you can then use that to evaluate and decide upon a good classification system. You may then also get an estimate of its accuracy, which is useful for consumers of the output of your classifier.
Edit:
Given the more detailed description, I'd suggest trying some simple models of behaviour that can be tackled using classical techniques before moving to a generic supervised learning heuristic.
For example, suppose:
The baseline, event threshold and noise magnitude are all known a priori.
The underlying process can be modelled as a Markov chain: it has two states (off and on) and the transition times between them are exponentially distributed.
You could then use a hidden Markov Model approach to determine the most likely underlying state at any given time. Even when the noise parameters and thresholds are unknown, you can use the HMM forward-backward training method to train the parameters (e.g. mean, variance of a Gaussian) associated with the output for each state.
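As a concrete illustration of this suggestion, here is a minimal two-state sketch using the hmmlearn Python library (the library choice, the Gaussian emission model, and all parameter values are my assumptions, not part of the original answer):

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# signal: baseline-corrected sensor values, shaped (n_samples, 1);
# random data stands in for a real recording here.
signal = np.abs(np.random.default_rng(0).normal(0.0, 1.0, 1000)).reshape(-1, 1)

# Two hidden states (off/on); Baum-Welch (forward-backward) training
# estimates the Gaussian mean and variance of each state's output.
model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=100)
model.fit(signal)

states = model.predict(signal)   # most likely state sequence (Viterbi)
```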
If you know even more about the events, you can get by with simpler approaches: for example, if you knew that the event signal always reached a level above the baseline + noise, and that events were always separated in time by an interval larger than the width of the event itself, you could just do a simple threshold test.
Edit:
The classic intro to HMMs is Rabiner's tutorial (a copy can be found here). Relevant also are these errata.
From your description, a correctly parameterized moving average might be sufficient.
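A minimal version of that idea; the window length and threshold are the parameters to tune:

```python
import numpy as np

def moving_average_events(signal, window=20, threshold=5.0):
    """Smooth with a moving average, then threshold to flag events."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(signal, kernel, mode="same")
    return smoothed > threshold   # boolean EVENT mask per sample
```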
Try to understand the sensor and its output. Make a model and build a simulator that provides mock data covering the expected data, with noise and all that stuff.
Get lots of real sensor data recorded.
Visualize the data and verify your assumptions and model.
Annotate your sensor data, i.e. generate ground truth (your simulator shall do that for the mock data).
From what you have learned so far, propose one or more algorithms.
Make a test system that can verify your algorithms against ground truth and do regression against previous runs.
Implement your proposed algorithms and run them against ground truth.
Try to understand the false positives and false negatives from the recorded data (and try to adapt your simulator to reproduce them).
Adapt your algorithm(s).
Some other tips:
you may implement hysteresis on thresholds to avoid bouncing (a sketch follows this list)
you may implement delays to avoid bouncing
beware of delays if implementing debouncers or low-pass filters
you may implement multiple algorithms and voting
for testing relative improvements you may run regression tests on large amounts of unannotated data, then check only the detections that flipped to find the performance increase or decrease
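A sketch of the hysteresis tip above: using two thresholds instead of one keeps the detector from bouncing when the signal hovers around a single level (the threshold values are placeholders):

```python
def hysteresis_detect(signal, on_threshold=8.0, off_threshold=4.0):
    """Switch to EVENT above on_threshold and back to NO-EVENT only
    below off_threshold; the gap between them suppresses bouncing."""
    state, states = False, []
    for x in signal:
        if not state and x > on_threshold:
            state = True
        elif state and x < off_threshold:
            state = False
        states.append(state)
    return states
```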

How to determine if a current set of data values represents or relates to previous historic data values?

I am trying to develop a method to identify the browsing pattern of a user on the basis of page requests.
In a simple example I have created 8 pages, and for each page request from the user I have stored that page's request frequency in the database, as you can see below:
Now, my hypothesis is to identify the difference in the page-request pattern, which leads to my assumption that if the pattern differs from the pre-existing one then it is a different (fraudulent) user. I am trying to develop this method as part of a multi-factor authentication system.
Now when a user logs in and browses with a different pattern from the ones observed previously, the system should be able to identify it as a change in pattern.
The question is how to utilize these data values to check whether the current pattern relates to pre-existing patterns or not.
OK, here's a pretty simple idea (and basically, what you're looking to do is generate a set of features, then identify if the current session behaviour is different to the previously observed behaviour). I like to think of these one-class problems (only normal behaviour to train on, want to detect significant departure) as density estimation problems, so here's a simple probability model which will allow you to get the probability of a current request pattern. Basically, when this gets too low (and how low that is will be something you need to tune for the desired behaviour), something is going on.
Our observations consist of counts for each of the pages. Let their sum, the total number of requests, be equal to c_total, and counts for each page i be p_i. Then I'd propose:
c_total ~ Poisson(\lambda)
p|c_total ~ Multinomial(\theta, c_total)
This allows you to assign probability to a new observation given learned user-specific parameters \lambda (uni-variate) and \theta (vector of same dimension as p). To do this, calculate the probability of seeing that many requests from the pmf of the Poisson distribution, then calculate the probability of seeing the page counts from the multinomial, and multiply them together. You probably then want to normalise by c_total so that you can compare sessions with different numbers of requests (since the more requests, the more numbers < 1 you're multiplying together).
So, all that's left is to get the parameters from previous, "good" sessions from that user. The simplest thing is maximum likelihood, where \lambda is the mean total number of requests in previous sessions, and \theta_i is the proportion of all page views which were p_i (for that particular user). This may work for you: however, given that you want to be learning from very small numbers of observations, I'd be tempted to go with a full Bayesian model. This will also let you neatly update parameters after each non-suspicious observation. Inference in these distributions is very easy, with conjugate priors for \lambda and \theta and analytic predictive distributions, so it won't be difficult if you're familiar with these kinds of model at all.
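A sketch of the maximum-likelihood variant described above, using scipy (the function names and session layout are illustrative):

```python
import numpy as np
from scipy.stats import multinomial, poisson

def fit_user(sessions):
    """sessions: array of shape (n_sessions, n_pages) of page counts
    from previous 'good' sessions of one user."""
    sessions = np.asarray(sessions, dtype=float)
    lam = sessions.sum(axis=1).mean()               # mean total requests
    theta = sessions.sum(axis=0) / sessions.sum()   # per-page proportions
    return lam, theta

def session_score(counts, lam, theta):
    """Per-request log-probability of a new session under the model."""
    counts = np.asarray(counts)
    c_total = counts.sum()
    logp = (poisson.logpmf(c_total, lam)
            + multinomial.logpmf(counts, c_total, theta))
    return logp / c_total                           # normalise by c_total
```

Sessions whose score falls below a cutoff tuned on held-out good sessions would then be flagged as suspicious.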
One approach would be to use an unsupervised learning method such as a Self-Organizing Map (SOM, http://en.wikipedia.org/wiki/Self-organizing_map). Train the SOM on data representing expected/normal user behavior and then see how well the candidate data set fits the trained map. Keywords to search for in conjunction with "Self-organizing maps" might be "novelty/anomaly/intrusion detection" (turns up e.g. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.55.2616&rep=rep1&type=pdf)
You should think about whether fraudulent use-cases can be modeled in advance (in which case you can train detectors specifically for them) or whether only deviations from normal behavior are of interest.
If you want to start simple, implement a cosine similarity measure. This would allow you to define a set of "good" vectors. The current user's activity could be compared to the good vectors. If you cannot retrieve a good vector, then the activity is flagged.
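A minimal sketch of that comparison; the similarity cutoff is a placeholder to tune:

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def is_flagged(session, good_vectors, cutoff=0.8):
    """Flag the session if it is not close to any known-good vector."""
    return all(cosine_similarity(session, g) < cutoff for g in good_vectors)
```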

Determining the duration of a frequency and the magnitude

I am working with a system in which I am getting data from a sensor (gyro) at 1 kHz.
What I am trying to do is determine when the system is vibrating so that I can turn down the PID gains on the output.
What I currently have is a high-pass filter on the incoming values. I have set the alpha value to 1/64, which I believe should filter for about a 10 kHz frequency. I then take this value and integrate it if it is individually above a threshold. When my integrated value passes another threshold, I assume that the system is vibrating. I also reset the integrated value every half second to ensure that it does not simply grow towards the threshold.
What I am trying to do with this system is make sure that it is really vibrating and not just seeing a jolt. I have tried to do this with an upper limit on how much can be added to the integrated value, but this does not really appear to work.
What I am looking for is any better way to detect that the system is vibrating without being affected by a jolt; my primary concern is that I do not mistake a jolt for a vibration, because that would cause the PID values to be lowered unnecessarily.
FFT. It will separate out the "jolts" from the vibrations, because jolts will register across all frequencies and vibrations will spike around a particular frequency.
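A sketch of that test with NumPy: a window whose spectral energy is concentrated in a few bins looks like a vibration, while broadband energy looks like a jolt (the window length, the 3-bin rule, and the cutoff are illustrative assumptions):

```python
import numpy as np

def classify_window(window, fs=1000.0, concentration_cutoff=0.5):
    """Return ('vibration', peak_hz) if most spectral energy sits in
    a few bins, ('jolt', peak_hz) if it is spread broadband."""
    spectrum = np.abs(np.fft.rfft(window - np.mean(window))) ** 2
    peak_hz = np.argmax(spectrum) * fs / len(window)
    concentration = np.sort(spectrum)[-3:].sum() / spectrum.sum()
    label = "vibration" if concentration > concentration_cutoff else "jolt"
    return label, peak_hz
```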
I agree with the above. There are many free algorithms for the Fast Fourier Transform available online. If you are not familiar with the FFT, it is an operation that defines a relationship between a function in the time domain and its representation in the frequency domain, enabling analysis of the original function's frequency content. This will enable you to determine if there is any noise or oscillatory behavior in your signal or time series.
Another method you could use to establish whether your time series has underlying periodicity is the structure function (structure function analysis). Structure function analysis provides a way of quantifying time variability in a signal without the problems of aliasing or windowing that are encountered with the traditional FFT technique. Potentially it can provide information on the nature of the process that causes the variability. The method is mainly concerned with the categorization of underlying noise processes and the identification of correlation time-scales. This is a fairly simple algorithm that you could probably write yourself.
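A sketch of the first-order structure function, D(tau) = mean((x(t+tau) - x(t))^2); the lag range is an assumption:

```python
import numpy as np

def structure_function(x, max_lag=100):
    """First-order structure function; a plateau or break in D(tau)
    indicates a correlation time-scale in the signal."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean((x[lag:] - x[:-lag]) ** 2)
                     for lag in range(1, max_lag + 1)])
```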
Going one step further and being more "snazzy" would be to use a wavelet transform. Fourier analysis is a very powerful tool for detecting and quantifying periodic oscillations in time series, that is, signals of truly constant period, phase, and amplitude. However, real systems almost never exhibit such consistent behavior; periodic oscillations often arise intermittently as transient phenomena. Although Fourier analysis can, to some extent, detect and quantify such transient behavior, it is far from ideal for the purpose. Wavelet analysis has been developed to overcome these difficulties. See http://atoc.colorado.edu/research/wavelets/software.html for some source code and more information about wavelets.

An explanation of the packet-pair probing algorithm in plain language

Networked applications often benefit from the ability to estimate the bandwidth between two end-points on the Internet. This may be advantageous not only for rate control purposes, but also in isolating preferred connections where a number of alternatives exist.
Although there are a couple of rigorous treatments of packet-pair probing, a summary of the high-level principles and salient points, covering both the how and the why of the method, would be very beneficial, even if only to serve as a bootstrap to more in-depth study.
Any pointers to implementations or usage of packet-pair probing that serve as good examples would also be much appreciated.
Update:
I found some good soft introductory material in a USENIX paper derived from work on the nettimer tool; in particular, the discussion concerning the use of cross-talk filters and sampling windows for increased agility makes a lot of sense.
About the high-level principles: traditional means of estimating bandwidth send one packet to the target and wait for it to return, then send another packet and wait for its return, and so on, in a sequential way. One then computes some kind of average/median of the total round-trip time per k-byte (or any other unit). This information is then used against the theoretical maximum bandwidth (when available) to estimate the available unused bandwidth.
Packet-pair probing sends a group of packets to the target at once (i.e., in a parallel way) and waits for them to return. A kind of average/median is computed too and evaluated against the maximum theoretical bandwidth.
If you send more packets at once, you are disturbing the system you are trying to measure and you have to take this into account in your estimates, but it is faster than the one-by-one method and feels more like a snapshot. The bottom-line question is: what is the trade-off between measurement accuracy and speed of measurement in both cases? Is there any value in this trade-off?
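The core arithmetic behind packet-pair probing is simple: two packets of size L sent back-to-back through a bottleneck link of capacity C arrive separated by roughly L/C, so C can be estimated as L divided by the measured dispersion. Here is a receiver-side sketch (the timestamps and packet size are placeholders; real tools filter many samples to reject cross-traffic noise):

```python
def packet_pair_estimate(arrival_times, packet_size_bytes):
    """Estimate bottleneck bandwidth from per-pair arrival timestamps.

    arrival_times: list of (t_first, t_second) tuples, in seconds, for
    packets sent back-to-back. Uses the median dispersion to reject
    pairs stretched or compressed by cross traffic.
    """
    gaps = sorted(t2 - t1 for t1, t2 in arrival_times)
    median_gap = gaps[len(gaps) // 2]
    return packet_size_bytes * 8 / median_gap   # bits per second
```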
I have written a program for bandwidth estimation using the packet-pair method. If anyone wants to have a look at it, I will be happy to share it.
EDIT:
Here is how I implemented it in a class assignment:
https://github.com/npbendre/Bandwidth-Estimation-using-Packet-Pair-Probing-Algorithm
Hope this helps!
