I noticed that one can constrain parameters in lmfit using min, max, and/or an expression. I was wondering if there is a way I could use an expression to constrain a parameter to follow a normal distribution defined by a mean and standard deviation. For example, one of my parameters lies between -3000 and 5000; if I specify these as minimum and maximum values, the optimizer considers all values in between as equally likely (uniform), but instead I want it to consider values far from the mean less likely (i.e. normal). Thank you.
Specifying min and max values does not actually assert equal probability for all values between these bounds. It does assert zero probability outside the bounds.
A non-linear least-squares fit, as done with lmfit, seeks the highest-probability values for all parameters; it does not treat all values as equally probable. You supply starting values for each parameter, and the method uses computed gradients (analytic in principle, but typically numeric) to find the direction in which to adjust each parameter value.
But if I understand your goal, you don't really want "hard wall constraints"; you want to penalize the fit if a parameter strays too far from its expected value. Lmfit does not have a built-in way to enable this easily, but such penalties can be added in the objective function. One approach is to add a "penalty" value as an extra element in the array to be minimized, i.e. to extend the residual. Since least-squares already assumes a Gaussian distribution for the residual, you can simply append (np.concatenate) a term of:
(current_parameter_value - expected_value)/sigma_expected_value
to the residual. In some sense this is similar to regularization; such a term is sometimes called a restraint, because it allows but penalizes parameter values that are far from the expected value.
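For concreteness, here is a minimal sketch of that approach in Python with lmfit (the exponential model, the fake data, and the numbers for offset_expected and offset_sigma are all made up for illustration, not part of your problem):

import numpy as np
from lmfit import Parameters, minimize

# Hypothetical data for a model y = amp * exp(-x/tau) + offset
x = np.linspace(0, 10, 201)
y = 4.3 * np.exp(-x / 2.1) + 0.5 + np.random.normal(scale=0.05, size=x.size)

# Assumed prior knowledge about 'offset': expected value and its uncertainty
offset_expected = 0.4
offset_sigma = 0.2

def residual(params, x, y):
    p = params.valuesdict()
    model = p['amp'] * np.exp(-x / p['tau']) + p['offset']
    resid = y - model                                  # ordinary residual
    # Restraint: penalize 'offset' for straying from its expected value,
    # scaled by the uncertainty of that expectation (a Gaussian prior).
    penalty = (p['offset'] - offset_expected) / offset_sigma
    return np.concatenate([resid, [penalty]])

params = Parameters()
params.add('amp', value=1.0, min=0)
params.add('tau', value=1.0, min=0)
params.add('offset', value=0.0)

result = minimize(residual, params, args=(x, y))

The only change from a plain fit is the np.concatenate line; the rest is an ordinary lmfit objective function.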
Hope that makes sense!
Related
I am creating a baseline.
This baseline is composed of several variables.
Some of these variables are strictly positive, while others take both positive and negative values.
Here is the problem: I need to apply a log transformation to these variables for another model. How can I deal with it?
Do I need to add to every positive/negative vector the absolute value of its minimum, plus 1?
In that case, my baseline will be modified. Can I scale it back afterwards using the constant I added (the absolute value of its minimum, plus 1)?
Thank you in advance to everyone.
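For what it's worth, a minimal sketch of the shift-then-log transform described in the question (the array is made up; the key point is that the shift constant must be kept if you want to undo the transform afterwards):

import numpy as np

x = np.array([-3.2, 0.0, 1.7, 5.4])    # hypothetical variable with negative values

shift = abs(x.min()) + 1               # the constant described in the question
x_log = np.log(x + shift)              # strictly positive now, so the log is defined

# To get back to the original scale, invert with the same constant:
x_back = np.exp(x_log) - shift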
So I have this issue where I have to find the best distribution that, when passed through a function, matches a known surface. I have written a script that creates the distribution given some parameters and spits out a metric comparing the generated surface to the known one, but this script takes a non-negligible time, so I can't just run through a very large set of parameters to find the optimal set. I looked into the simplex method, and it seems to be on the right path, but it's not quite what I need, because I don't exactly have a set of linear equations and don't know the constraints for the parameters; I just have one method that gives a single output (and that's all). Can anyone point me in the right direction on how to solve this problem? Thanks!
To quickly go over my process / problem again, I have a set of parameters (at this point 2 but will be expanded to more later) that defines a distribution. This distribution is used to create a surface, which is compared to a known surface, and an error metric is produced. I want to find the optimal set of parameters, but cannot run through an arbitrarily large number of parameters due to the time constraint.
One situation consistent with what you have asked is a model in which you have a reasonably tractable probability distribution which generates an unknown value. This unknown value goes through a complex and not mathematically nice process and generates an observation. Your surface corresponds to the observed probability distribution on the observations. You would be happy finding the parameters that give a good least squares fit between the theoretical and real life surface distribution.
One approximation for the fitting process is that you compute a grid of values in the space output by the probability distribution. Each set of parameters gives you a probability for each point on this grid. The not nice process maps each grid point here to a nearest grid point in the space of the surface. The least squares fit is a quadratic in the probabilities calculated for the first grid, because the probabilities calculated for a grid point in the surface are the sums of the probabilities calculated for values in the first grid that map to something nearer to that point in the surface than any other point in the surface. This means that it has first (and even second) derivatives that you can calculate. If your probability distribution is nice enough you can use the chain rule to calculate derivatives for the least squares fit in the initial parameters. This means that you can use optimization methods to calculate the best fit parameters which require not just a means to calculate the function to be optimized but also its derivatives, and these are generally more efficient than optimization methods which require only function values, such as Nelder-Mead or Torczon Simplex. See e.g. http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/optim/package-summary.html.
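If the derivatives described above are not available and you end up treating the error metric as a black box, a derivative-free optimizer is the usual fallback. A minimal sketch with SciPy (surface_error and the starting point are placeholders for your actual script and parameters):

import numpy as np
from scipy.optimize import minimize

def surface_error(params):
    # Placeholder for the expensive script: build the distribution from
    # params, generate the surface, and return the error metric against
    # the known surface. This dummy quadratic is only for illustration.
    a, b = params
    return (a - 1.5)**2 + (b + 0.3)**2

x0 = np.array([1.0, 0.0])    # initial guess for the two parameters
result = minimize(surface_error, x0, method='Nelder-Mead',
                  options={'xatol': 1e-3, 'fatol': 1e-3})
print(result.x, result.fun)

Each call to surface_error is one run of your expensive script, so the optimizer's job is to keep the number of such calls small.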
Another possible approach is via something called the EM Algorithm. Here EM stands for Expectation-Maximization. It can be used for finding maximum likelihood fits in cases where the problem would be easy if you could see some hidden state that you cannot actually see. In this case the output produced by the initial distribution might be such a hidden state. One starting point is http://www-prima.imag.fr/jlc/Courses/2002/ENSI2.RNRF/EM-tutorial.pdf.
I read an initialization rule for variables included in reduction clauses in OpenMP, from the pdf:
Parallel Programming in Fortran 95 using OpenMP, 2002.
In Table 3.1, it is said that for the MAX operation, the initial value should be the smallest representable number.
So does that mean I have to use the smallest representable number on my machine for the specific type of the variable? Why can't I just use a value that is small enough for the correct result?
Using the smallest possible number ensures that whatever other value you pass will be higher, so MAX will return that one instead of the default.
If you were to use another value and MAX then gets called with a value smaller than that, which is possible, MAX will return the initial/default one.
You may consider that to be purely academic, but if you still need to pick an initial value, why not use the only one that works in all cases?
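To illustrate the point in a language-neutral way, here is a small Python sketch (the thread-local results are made up): a MAX reduction behaves like folding max over the partial results, and the only initial value that can never beat a legitimate result is the smallest representable number (or -inf for floating point).

from functools import reduce

partial_results = [-1.0e30, -5.2e18, -3.7e25]   # hypothetical per-thread maxima, all very negative

# Safe identity: nothing can be smaller, so it never masks a real result.
safe = reduce(max, partial_results, float('-inf'))   # -5.2e18, correct

# A "reasonable-looking" initial value such as 0 silently wins and gives a wrong answer.
wrong = reduce(max, partial_results, 0.0)             # 0.0, incorrect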
I have a vector of equiprobable values. Let's say:
[ 12.62 22.856 22.983 23.111 24.295]
I have to pick a single value among these. In this case, in my opinion, I would exclude 12.62. Then the mean of the remaining values (22.856, 22.983, 23.111, 24.295) is 23.311, and I think 23.111 would be a good choice among these equiprobable values. When considering a general vector of arbitrary values and dimension, which criterion/algorithm should I use to pick a single value inside the considered vector?
I'm trying to read (i.e., guess) from your question and from the comments so far what criteria you're implicitly already applying. I came up with these:
The vector shall be mapped to one of its elements.
Outliers amongst the elements shall neither be candidates nor influence which of the other elements is chosen.
The mean of the candidates shall be in some way relevant to what candidate is chosen.
With these criteria, one can come up with the following algorithm:
Identify outliers (criteria of what constitutes an outlier to be determined)
Remove outliers
Compute mean of remaining values
Choose the one of the remaining values closest to that mean
Of course there are infinitely many other algorithms that would also satisfy the identified criteria, with varying meaningfulness and varying applicability to different use cases. And of course, again depending on the actual use case, the criteria I identified might or might not be the correct generalization of what you did.
When considering a general vector of arbitrary values and dimension, which criterion/algorithm should I use to pick a single value inside the considered vector?
It depends what you want.
It can be mean(input_vector) or norm(input_vector).
You should first ask yourself what you want this scalar value to be/represent.
A simple solution:
vector = [12.62 22.856 22.983 23.111 24.295];

% Flag outliers: values more than two standard deviations from the median
outlier_idx = abs(vector - median(vector)) > 2*std(vector);

% Drop the outliers and take the mean of what remains
vector = vector(~outlier_idx);
val = mean(vector);

% Return the remaining value closest to that mean
[~, idx] = min(abs(vector - val));
vector(idx)
This code returns 23.111. I assumed that outliers are values more than two standard deviations from the median.
I am a data mining student and I have a problem that I was hoping that you guys could give me some advice on:
I need a genetic algorithm that optimizes the weights between three inputs. The weights need to be positive values AND they need to sum to 100%.
The difficulty is in creating an encoding that satisfies the sum to 100% requirement.
As a first pass, I thought that I could simply create a chromosome with a series of numbers (e.g. 4, 7, 9). Each weight would simply be its number divided by the sum of all of the chromosome's numbers (e.g. 4/20 = 20%).
The problem with this encoding method is that any change to the chromosome will change the sum of all the chromosome's numbers resulting in a change to all of the chromosome's weights. This would seem to significantly limit the GA's ability to evolve a solution.
Could you give any advice on how to approach this problem?
I have read about real valued encoding and I do have an implementation of a GA but it will give me weights that may not necessarily add up to 100%.
It is mathematically impossible to change one value without changing at least one more if you need the sum to remain constant.
One way to make changes would be exactly what you suggest: weight = value/sum. In this case when you change one value, the difference to be made up is distributed across all the other values.
The other extreme is to only change pairs. Start with a set of values that add to 100, and whenever one value changes, change another by the opposite amount to maintain your sum. The other value could be picked randomly or by a rule. I'd expect this to take longer to converge than the first method.
If your chromosome is only 3 values long, then mathematically, these are your only two options.
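A minimal sketch of both options in Python (the chromosome values, the mutation step, and the function names are arbitrary):

import random

# Option 1: store unconstrained positive genes and normalize to get the weights.
chromosome = [4.0, 7.0, 9.0]
weights = [g / sum(chromosome) for g in chromosome]   # always sums to 1 (100%)

# Option 2: store the weights directly and mutate them in compensating pairs.
def mutate_pair(weights, step=0.05):
    # Move some mass from one randomly chosen weight to another, keeping
    # the sum fixed and all weights non-negative.
    w = list(weights)
    i, j = random.sample(range(len(w)), 2)
    delta = min(step, w[i])      # cannot take more than weight i has
    w[i] -= delta
    w[j] += delta
    return w

new_weights = mutate_pair(weights)
assert abs(sum(new_weights) - 1.0) < 1e-9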