How do I set a max value for a parameter to be dependent on the value of another parameter in lmfit?

I am currently trying to estimate a Voigt profile on a measurement. I want to set an upper limit for the parameter 'amplitude', where the value of the upper limit is decided by another parameter, gamma:
Voigt_dBm = Model(V_dBm) #V_dBm is defined as a Voigt profile
params = Voigt_dBm.make_params(gamma=5, alpha=720, ...
amplitude=2e-8, offset=1e-9, max_lin=max(y_lin)) #Values for parameters are appropriate for the data
params.add('max_lin', vary=False) #This value comes from the data and should be kept static
params.add('amplitude',max=max_lin**(gamma*2)**2) # <--- This is where I want to add the gamma-dependent limit
result = Voigt_dBm.fit(y,params,x=f,nan_policy='propagate')

lmfit does not allow using expressions for bounds - the bounds need to have values known before the fit begins and cannot change during the fit.
You could do something like this:
params = Voigt_dBm.make_params(gamma=5, alpha=720, offset=1e-9, ...)
params.add('max_lin', value=max(y_lin), vary=False)
# initial guess for the offset: 1/4 of the bound evaluated at the initial gamma
params.add('amp_offset', value=max(y_lin)**(params['gamma'].value*2)**2/4.0, min=0)
params.add('amplitude', expr='max_lin**(gamma*2)**2 - amp_offset')
This constrains amplitude to be max_lin**(gamma*2)**2 minus some variable amount. Because that amplitude offset is bounded to be non-negative (min=0), the resulting amplitude cannot exceed max_lin**(gamma*2)**2, even if gamma changes during the fit. I guessed an initial value of 1/4 of that amount, but you may have a better idea of what a reasonable initial value should be.
You can put bounds on parameters constrained by a mathematical expression, so if you wanted to ensure amplitude was positive, you could add min=0 to params.add('amplitude', ....).
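To see why the offset trick keeps amplitude under a moving limit, here is a minimal pure-Python sketch of the arithmetic only. It does not call lmfit; the function names are hypothetical, and the bound expression is simply copied from the question.

```python
def amplitude_upper_bound(max_lin, gamma):
    """Bound from the question; note ** is right-associative,
    so this is max_lin ** ((gamma * 2) ** 2)."""
    return max_lin ** (gamma * 2) ** 2


def constrained_amplitude(max_lin, gamma, amp_offset):
    """Mirror of expr='max_lin**(gamma*2)**2 - amp_offset' with min=0 on amp_offset."""
    if amp_offset < 0:
        # plays the role of min=0 on the amp_offset parameter
        raise ValueError("amp_offset must be non-negative")
    return amplitude_upper_bound(max_lin, gamma) - amp_offset


if __name__ == "__main__":
    # whatever gamma is, the amplitude never exceeds the gamma-dependent bound
    for gamma in (0.5, 1.0, 2.0):
        bound = amplitude_upper_bound(0.5, gamma)
        amp = constrained_amplitude(0.5, gamma, amp_offset=bound / 4.0)
        print(gamma, bound, amp)
```

The fit varies gamma and amp_offset; amplitude is derived from them, so the limit moves with gamma instead of being fixed before the fit starts.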

How do I add noise/variability to a dataset in Python, given the CV?

Given a dataset of blood results, say cholesterol level, and knowing that the instrument that produced those results is subject to a known degree of variability, how would I add that variability back into the dataset? i.e. I want to assume the result in the original dataset is the true/mean value, and then produce new results that are subject to the known variability of the instrument.
In Excel you use =NORM.INV(RAND(), mean, std_dev), where RAND() provides a random value between 0 and 1, "mean" will be the original value and I have the CV so I can calculate the SD. NORM.INV then provides the inverse of the cumulative normal distribution function.
I've done the following to create a new column with my new values, but would like to know if it is valid (i.e., will each row have a different random number between 0 and 1 as the probability, and is this formula equivalent to NORM.INV?):
df8000['HDL_1'] = norm.ppf(random(), loc = df8000['HDL_0'], scale = TAE_df.loc[0,'HDL'])
Thanks in advance!
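For comparison, a minimal standard-library sketch of the Excel formula applied row by row. It assumes the SD is CV times the original value (the snippet above uses a single fixed scale instead), and the function names are hypothetical. A single call to random() without a size argument returns one number, so if random in the snippet above is random.random or numpy.random.random, every row would share the same probability; the sketch draws a fresh probability per row.

```python
import random
from statistics import NormalDist


def norm_inv(p, mean, sd):
    """Inverse normal CDF, the same quantity Excel's NORM.INV(p, mean, sd) returns."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(p)


def add_instrument_noise(values, cv, rng=random):
    """Return one noisy value per input value, treating each input as the true mean.

    Assumption: sd = cv * mean, with cv given as a fraction (0.05 for 5 %).
    Values must be positive so that sd > 0.
    """
    noisy = []
    for mean in values:
        sd = cv * mean
        p = rng.random()          # fresh probability for every row
        while p == 0.0:           # inv_cdf is undefined at exactly 0
            p = rng.random()
        noisy.append(norm_inv(p, mean, sd))
    return noisy


if __name__ == "__main__":
    hdl = [1.2, 1.5, 0.9]         # made-up example values
    print(add_instrument_noise(hdl, cv=0.05, rng=random.Random(0)))
```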

Is it possible to pre-assign values to decision variables in CPLEX OPL

I have a large number of variables (both binary and continuous), so I have worked out some logic to fix some of them to 0 so that they do not become part of the optimisation process.
For example I have a binary decision variable y[b][t]:
where b varies from 1 to 100
and t from 1 to 5.
I could determine using some logic that y[20][2] onwards to y[100][2] would be 0. I want to assign the fixed value of 0 to these variables y[20][2] onwards to y[100][2] thereby reducing the number of variables in my optimisation problem. While y is a binary decision variable I have other continuous variable as well which I would like to similarly set to 0 in advance.
Is there a way to achieve this? I haven't used Python with CPLEX, but I hear that this can probably be achieved by setting a lower and upper bound on the variables. Is there a similar method in OPL?
----Added 13th Aug
Maybe I was not very clear, or I could not understand the solution suggested.
What I wanted is say I have the following decision variable Xbmt ...(I have a few of them)
Originally declared as :
dvar float+ Xbmt[PitBlocks][Plants][TimePeriods];
But for some of the PitBlocks and some time periods I want to define this decision variable as 0. Those time periods for which I want to set the decision variable as 0 are defined in a tuple nullVariables. It has block id same as PitBlocks, and it has time_period same as TimePeriod. Hence I want something like below. But I cannot declare the decision variable twice. I need it 0 only for those ids in the nullVariable set.
dvar float+ Xbmt[NullVariablesSet.block_id][Plants][NullVariablesSet.time_period] in 0..0;
How can this be achieved, so that some of Xbmt remain decision variables whereas others are removed by setting them to 0?
See https://github.com/AlexFleischerParis/zooopl/blob/master/zoopreassign.mod within Making Decision Optimization Simple:
int nbKids=300;
{int} seats={40,30}; // how many seats, {} means this is a set
float costBus[seats]=[500,400];
// Now let's see how to preassign some decision variables
// Suppose we know that we have exactly 6 buses with 40 seats
{int} preassignedseats={40};
int preassignedvalues[preassignedseats]=[6];
dvar int+ nbBus[s in seats]
in
((s in preassignedseats)?preassignedvalues[s]:0)
..
((s in preassignedseats)?preassignedvalues[s]:maxint);
minimize sum(b in seats) costBus[b]*nbBus[b];
subject to
{
sum(b in seats) b*nbBus[b]>=nbKids;
}

EasyPredictModelWrapper giving wrong prediction

public BinomialModelPrediction predictBinomial(RowData data) throws PredictException {
double[] preds = this.preamble(ModelCategory.Binomial, data);
BinomialModelPrediction p = new BinomialModelPrediction();
double d = preds[0];
p.labelIndex = (int)d;
String[] domainValues = this.m.getDomainValues(this.m.getResponseIdx());
p.label = domainValues[p.labelIndex];
p.classProbabilities = new double[this.m.getNumResponseClasses()];
System.arraycopy(preds, 1, p.classProbabilities, 0, p.classProbabilities.length);
if(this.m.calibrateClassProbabilities(preds)) {
p.calibratedClassProbabilities = new double[this.m.getNumResponseClasses()];
System.arraycopy(preds, 1, p.calibratedClassProbabilities, 0, p.calibratedClassProbabilities.length);
}
return p;
}
Eg: classProbabilities = [0.82333, 0.276666]
labelIndex = 1
label = true
domainValues = [false,true]
What does labelIndex signify, and is the order of the class probabilities the same as the order of the domain values? If the order is the same, then the probability of false is 0.82333 and the probability of true is 0.27666, so why is labelIndex 1 and label true?
Please help me figure out this issue.
As Tom commented, the prediction is not "wrong". You can infer from this that the threshold H2O has chosen is less than 0.27666. You probably have imbalanced training data, otherwise H2O would not have picked such a low threshold and classified a predicted value of 0.27666 as a 1. Does your training set include fewer examples of the positive class than the negative class?
If you don't like that threshold for whatever reason, then you can manually create your own. Just make sure you know how to properly evaluate the effect of using different thresholds on the performance of your model, otherwise I'd recommend just using the default threshold.
The name "classProbabilities" is a misnomer. These are not actual probabilities, they are predicted values, though people often use the terms interchangeably. Binary classification algorithms produce "predicted values" that look like probabilities because they lie between 0 and 1, but unless a calibration process is performed, they are not going to represent the probabilities. Calibration is not necessarily a straightforward process and there are many techniques. Here's some more info about calibration methods for imbalanced data. In H2O, you can perform calibration with Platt scaling via the calibrate_model option. But this is probably not really necessary for what you're trying to do.
The proper way to use the raw output from a binary classification model is to look only at the predicted value for the positive class (you can simply ignore the predicted value for the negative class). Then you choose a threshold which suits your needs, or you can use the default threshold in H2O, which is chosen to maximize the F1 score. Some other software will use a hardcoded threshold of 0.5, but that will be a terrible choice if you don't have roughly equal numbers of positive and negative examples in your training data. If you have only a few positive examples in your training data, then the best threshold will be something much lower than 0.5.
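To make the thresholding idea concrete, here is a minimal pure-Python sketch of picking the threshold that maximizes F1 on labelled data and then applying it to the positive-class predicted value. This is not H2O's implementation; the function names, the >= comparison, and the toy data are all assumptions for illustration.

```python
def f1_at_threshold(labels, scores, threshold):
    """F1 score when every score >= threshold is predicted as the positive class (1)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_f1_threshold(labels, scores):
    """Try each distinct score as a candidate threshold and keep the best one."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(labels, scores, t))


def predict_label(p1, threshold):
    """Label from the positive-class predicted value only."""
    return 1 if p1 >= threshold else 0


if __name__ == "__main__":
    # made-up imbalanced data: 8 negatives, 2 positives
    labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.30, 0.25, 0.40]
    t = best_f1_threshold(labels, scores)
    print(t, predict_label(0.27666, t), predict_label(0.27666, 0.5))
```

With imbalanced data like this, the F1-maximizing threshold lands well below 0.5, which is how a predicted value of 0.27666 can still be labelled as the positive class.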

how can I get the location for the maximum value in fortran?

I have a 250*2001 matrix. I want to find the location for the maximum value for a(:,i) where i takes 5 different values: i = i + 256
a(:,256)
a(:,512)
a(:,768)
a(:,1024)
a(:,1280)
I tried using MAXLOC, but since I'm new to fortran, I couldn't get it right.
Try this
maxloc(a(:,256:1280:256))
but be warned, this call will return a value in the range 1..5 for the second dimension. The call returns the location of the maximum within the 250*5 array section that you pass to it, so to get the column index in the original array you'll have to multiply the second element by 256. And note that since the argument in the call to maxloc is a rank-2 array section, the call will return a 2-element vector.
Your question is a little unclear: it could be either of two things you want.
One value for the maximum over the entire 250-by-5 subarray;
One value for the maximum in each of the 5 250-by-1 subarrays.
Your comments suggest you want the latter, and there is already an answer for the former.
So, in case it is the latter:
b(1:5) = MAXLOC(a(:,256:1280:256), DIM=1)
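The index bookkeeping behind the two MAXLOC calls above can be sketched in plain Python (not Fortran). The helper names are hypothetical; indices are kept 1-based to match Fortran, and the array is a list of rows.

```python
def maxloc_section(a, start, stop, step):
    """1-based (row, section_column) of the maximum over columns start, start+step, ..., stop.

    Scans in column-major order and keeps the first maximum, like MAXLOC
    on the rank-2 section a(:, start:stop:step).
    """
    best = None
    for k, col in enumerate(range(start, stop + 1, step), start=1):
        for row in range(1, len(a) + 1):
            value = a[row - 1][col - 1]
            if best is None or value > best[0]:
                best = (value, row, k)
    return best[1], best[2]


def original_column(section_column, start, step):
    """Map a section column index (1, 2, ...) back to the original column."""
    return start + (section_column - 1) * step


def maxloc_per_column(a, start, stop, step):
    """1-based row of the maximum in each selected column, like MAXLOC(..., DIM=1)."""
    rows = []
    for col in range(start, stop + 1, step):
        column = [a[r][col - 1] for r in range(len(a))]
        rows.append(column.index(max(column)) + 1)
    return rows
```

With start = step = 256, original_column reduces to 256 times the section column, which is the multiplication mentioned in the first answer.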

Suggest an algorithm/method for finding a proper value

I have a bunch of values, for example: [1,2,14,51,100,103,107,110,300,505,1034].
And I have a pattern values, for example [1,10,20,100,500,1000].
I need to get the best 'suitable' value FROM pattern. In my example it is 100. How can I detect this value?
Example from life. The app has a bunch of distances between the user's position and some objects. The app also has a preset filter by distance: [1 meter, 10 meters, 20 meters, 100 meters]. I need to set the default filter not just to the first value (1 meter in my example), but to the value which best matches the bunch of distances (100 meters in my example). I need to detect one value.
Thank you for help and any ideas.
I would say create a function like this (this is not real code):
var ratio1 = 0.66
var ratio2 = 1.5
function Score(currentPatternValue, arrayOfValues)
{
    count = 0
    for each value in arrayOfValues
        if value > ratio1 * currentPatternValue AND value < ratio2 * currentPatternValue
            count++
    return count
}
Then run this for each value in your pattern values and pick the one with the highest score returned from that function.
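A runnable Python version of the scoring idea above might look like this. The ratios 0.66 and 1.5 come from the answer; the function names and the tie-breaking rule (the first pattern value with the highest score wins) are assumptions.

```python
def score(pattern_value, values, ratio1=0.66, ratio2=1.5):
    """Count the values lying strictly between ratio1 and ratio2 times the pattern value."""
    low = ratio1 * pattern_value
    high = ratio2 * pattern_value
    return sum(1 for v in values if low < v < high)


def best_pattern_value(pattern, values):
    """Return the pattern value with the highest score (first one wins on ties)."""
    return max(pattern, key=lambda p: score(p, values))


if __name__ == "__main__":
    values = [1, 2, 14, 51, 100, 103, 107, 110, 300, 505, 1034]
    pattern = [1, 10, 20, 100, 500, 1000]
    print(best_pattern_value(pattern, values))  # → 100
```

On the data from the question, 100 scores 4 (the values 100, 103, 107 and 110) while every other pattern value scores 1.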
