How to deal with negative and positive variables in a log transformation

I am creating a baseline composed of several variables.
Some of these variables are strictly positive, while others take both positive and negative values.
Here is the problem: I need to apply a log transformation to these variables for another model. How can I deal with this?
Do I need to add to every positive/negative vector the absolute value of its minimum, plus 1?
In that case, my baseline will be modified. Can I rescale it afterwards using the constant I added (the absolute value of its minimum, plus 1)?
Thank you in advance to everyone.
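A minimal numpy sketch of the shift-then-log idea described above (the data here is invented for illustration): shift each mixed-sign vector by the absolute value of its minimum plus 1, take the log, and store the shift constant so the transformation can be undone.

```python
import numpy as np

# Illustrative baseline variables: one strictly positive, one with mixed signs.
pos = np.array([0.5, 2.0, 7.3])
mixed = np.array([-4.2, -0.1, 3.8])

# Shift the mixed-sign vector by |min| + 1 so every entry is >= 1,
# then take the log. The shift constant must be kept to undo it later.
shift = abs(mixed.min()) + 1          # here: 5.2
log_mixed = np.log(mixed + shift)

# Back-transform: exponentiate, then subtract the stored constant.
recovered = np.exp(log_mixed) - shift
print(np.allclose(recovered, mixed))  # True
```

Note that the recovered values match the originals only if the same constant is stored per variable; each mixed-sign vector needs its own shift.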

Related

lmfit: defining a parameter to follow a Gaussian distribution

I noticed that one can constrain parameters in LMFIT using min, max, and/or an expression. I was wondering whether I could use an expression to constrain a parameter to follow a normal distribution defined by a mean and standard deviation. For example, one of my parameters lies between -3000 and 5000; if I specify these as minimum and maximum values, the optimizer considers all values in between equally likely (uniform), but instead I want it to consider values far from the mean less likely (i.e., normal). Thank you.
Specifying min and max values does not actually assert equal probability for all values between these bounds. It does assert zero probability outside the bounds.
A non-linear least-squares fit as done with lmfit seeks to find the highest probability value for all parameters, it does not treat all values as equally probable. You supply starting values for each parameter, and the method uses computed gradients (could be analytic, but typically numeric) to find the direction for optimizing each parameter value.
But if I understand your goal, you don't really want "hard wall constraints"; you want to penalize the fit if a parameter strays too far from the expected value. Lmfit does not have a built-in way to enable this, but such penalties can be added in the objective function. One approach is to add a "penalty" value as an extra element in the array to be minimized; that is, you can extend the residual. Since "least-squares" already assumes a Gaussian distribution for the residual, you can simply append (with np.concatenate) a term of:
(current_parameter_value - expected_value)/sigma_expected_value
to the residual. In some sense this is similar to regularization; it is sometimes called a restraint, allowing but penalizing values of a parameter that are far from the expected value.
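To make the appended-penalty idea concrete, here is a sketch using scipy.optimize.least_squares as a stand-in for an lmfit objective (the data, parameter names, and expected values are all invented; with lmfit, the same concatenated array would simply be returned from the objective function):

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data: noisy line y = 2.0*x + 1.0
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, x.size)

# Prior belief about the slope (illustrative values).
SLOPE_EXPECTED, SLOPE_SIGMA = 2.0, 0.05

def residual(p):
    slope, intercept = p
    resid = slope * x + intercept - y
    # Restraint: penalize slope values far from the expected value,
    # scaled by how tightly we believe in that expectation.
    penalty = (slope - SLOPE_EXPECTED) / SLOPE_SIGMA
    return np.concatenate([resid, [penalty]])

fit = least_squares(residual, x0=[1.0, 0.0])
print(fit.x)  # slope and intercept near 2.0 and 1.0
```

The smaller SLOPE_SIGMA is, the larger the penalty for a given deviation, so the stronger the pull toward the expected value.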
Hope that makes sense!

Why must the smallest representable number be used to initialize a REDUCTION clause for MAX in OpenMP?

I read an initialization rule for variables included in reduction clauses in OpenMP, from the PDF:
Parallel Programming in Fortran 95 using OpenMP, 2002.
In table 3.1, it is stated that for the MAX operation, the initial value should be the smallest representable number.
Does that mean I have to use the smallest representable number on my machine for the specific type of the variables? Why can't I just use a value that is small enough to give the correct result?
Using the smallest possible number ensures that any other value encountered will be greater, so MAX will return that value instead of the initial one.
If you were to use another value, and MAX were then called only with values smaller than it, which is possible, MAX would return the initial/default one.
You may consider that purely academic, but if you still need to pick an initial value, why not use the only one that works in all cases?
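The rule can be illustrated outside of OpenMP. A small Python sketch of the per-chunk reduction that OpenMP performs per thread (the chunk values are invented):

```python
import math

# Each "thread" reduces its own chunk, starting from an initial value.
chunks = [[-50.0, -20.0], [-90.0, -10.0], [-70.0, -30.0]]

def reduce_max(chunk, init):
    acc = init
    for v in chunk:
        acc = max(acc, v)
    return acc

# Correct identity element for MAX: smaller than any possible input,
# so it can never "win" against real data.
partials = [reduce_max(c, -math.inf) for c in chunks]
print(max(partials))   # -10.0

# A seemingly harmless initial value like 0 silently corrupts
# all-negative data: 0 beats every real element.
wrong = [reduce_max(c, 0.0) for c in chunks]
print(max(wrong))      # 0.0, not -10.0
```

This is exactly why the identity element (the smallest representable number, or -inf for floating point) is the only safe choice regardless of the input data.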

Algorithm to pick a single value from a vector in matlab?

I have a vector of equiprobable values. Let's say:
[ 12.62 22.856 22.983 23.111 24.295]
I have to pick a single value among these. In this case, in my opinion, I would exclude 12.62. The mean of the remaining values (22.856, 22.983, 23.111, 24.295) is 23.311, so I think 23.111 would be a good choice among these equiprobable values. For a general vector of arbitrary values and dimension, which criterion/algorithm should I use to pick a single value from the vector?
I'm trying to read (i.e., guess) from your question and from the comments so far what criteria you're implicitly already applying. I came up with these:
The vector shall be mapped to one of its elements.
Outliers amongst the elements shall neither be candidates nor influence which of the other elements is chosen.
The mean of the candidates shall be in some way relevant to what candidate is chosen.
With these criteria, one can come up with the following algorithm:
Identify outliers (criteria of what constitutes an outlier to be determined)
Remove outliers
Compute mean of remaining values
Choose the one of the remaining values closest to that mean
Of course there are infinitely many other algorithms that would also satisfy the identified criteria, with varying meaningfulness and varying applicability to different use cases. And of course, again depending on the actual use case, the criteria I identified might or might not be the correct generalization of what you did.
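A possible Python sketch of the four steps above (the "more than two standard deviations from the median" outlier criterion is one arbitrary choice among many):

```python
import statistics

def pick_value(values, k=2.0):
    """Pick the element closest to the mean of the non-outliers."""
    med = statistics.median(values)
    sd = statistics.stdev(values)
    # Steps 1-2: identify and remove outliers
    # (criterion: more than k standard deviations from the median).
    kept = [v for v in values if abs(v - med) <= k * sd]
    # Step 3: mean of the remaining values.
    m = statistics.mean(kept)
    # Step 4: the remaining value closest to that mean.
    return min(kept, key=lambda v: abs(v - m))

print(pick_value([12.62, 22.856, 22.983, 23.111, 24.295]))  # 23.111
```

On the example vector this excludes 12.62 and returns 23.111, matching the choice made by hand in the question.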
When considering a general vector of arbitrary values and dimension which criterion/algorithm should I use to pick a single value inside the considered vector?
It depends what you want.
It can be mean(input_vector) or norm(input_vector).
You should first ask yourself what you want this scalar value to be/represent.
A simple solution:
vector = [12.62 22.856 22.983 23.111 24.295];
% Flag outliers: more than two standard deviations from the median.
outlier_idx = abs(vector - median(vector)) > 2*std(vector);
vector = vector(~outlier_idx);
% Mean of the remaining values.
val = mean(vector);
% Pick the remaining value closest to that mean.
[~, idx] = min(abs(vector - val));
vector(idx)
This code returns 23.111; I assumed that outliers are values more than two standard deviations from the median.

Mathematica - Solving for the input of a taylor series such that coefficients are minimized

I need to find the value of a variable s such that the Taylor expansion of an expression involving s:
has a minimum (preferably zero, but due to binary representation a minimum is sufficient) in as many coefficients other than the 0th order as possible (preferably more than one minimum coefficient, but the 2nd and 3rd have priority);
reports the best n values of s that fulfill the condition within the region (i.e., show me the 3 best values of s and what the coefficients look like for each).
I have no idea how to even get the output of a Series[] command into any other Mathematica command without receiving an error, much less how to actually solve the problem. The equation I am working with is too complex to post here (a multi-regional but continuous polynomial expression that can be expanded). Does anyone know what commands to use for this?
The first thing to realize is that the output of Series is not a sum but a SeriesData object. To convert it into a sum, wrap it in Normal[Series[...]]. Since the question doesn't provide details, I can't say more.
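To illustrate with a simple expansion: Series returns a SeriesData object (carrying an O[x]^n term) that most other commands will not accept, while Normal strips that term and yields an ordinary polynomial whose coefficients can then be extracted:

```mathematica
s = Series[Exp[x], {x, 0, 3}]    (* SeriesData: 1 + x + x^2/2 + x^3/6 + O[x]^4 *)
poly = Normal[s]                 (* ordinary polynomial: 1 + x + x^2/2 + x^3/6 *)
CoefficientList[poly, x]         (* {1, 1, 1/2, 1/6} *)
```

From there, the coefficient list can be fed to minimization commands as ordinary expressions in s.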

Genetic Algorithm Implementation for weight optimization

I am a data-mining student and I have a problem that I was hoping you could give me some advice on:
I need a genetic algorithm that optimizes the weights between three inputs. The weights need to be positive values AND they need to sum to 100%.
The difficulty is in creating an encoding that satisfies the sum-to-100% requirement.
As a first pass, I thought I could simply create a chromosome with a series of numbers (e.g., 4, 7, 9). Each weight would simply be its number divided by the sum of all of the chromosome's numbers (e.g., 4/20 = 20%).
The problem with this encoding is that any change to the chromosome changes the sum of its numbers, which in turn changes all of the chromosome's weights. This would seem to significantly limit the GA's ability to evolve a solution.
Could you give any advice on how to approach this problem?
I have read about real-valued encoding, and I do have an implementation of a GA, but it gives me weights that may not necessarily add up to 100%.
It is mathematically impossible to change one value without changing at least one more if you need the sum to remain constant.
One way to make changes would be exactly what you suggest: weight = value/sum. In this case when you change one value, the difference to be made up is distributed across all the other values.
The other extreme is to change only pairs. Start with a set of values that add to 100, and whenever one value changes, change another by the opposite amount to maintain the sum. The other value could be picked randomly, or by a rule. I'd expect this to take longer to converge than the first method.
If your chromosome is only 3 values long, then mathematically, these are your only two options.
