Math algorithm question

I'm not sure if this can be done without some determining factor, but I wanted to see if someone knew of a way to do this.
I want to create a shifting scale for numbers.
Let's say I have the number 26000. I want the outcome of this algorithm to be 6500, or 25% of the original number. But if I have the number 5000, I want the outcome to be 2500, or 50% of the original number.
The percentages don't have to be exact, this is just an example.
I just want something like a curve, a sine-wave sort of thing: as the input number gets higher, the output number is a lower percentage of the input.
Does that make sense?

Plot some points in Excel, fit a trendline, and use the option to display the trendline's equation on the chart.

Something like f(x) = x / log x (log base 10)?
x | f(x)
=======================
26000 | 5889 (22.6 %)
5000 | 1351 (27.0 %)
100000 | 20000 (20 %)
1000000 | 166666 (16.6 %)
Just a simple example. You can tweak it by playing with the base of the logarithm, by adding multiplicative constants on the numerator (x) or denominator (log x), by using square roots, squaring (or taking the root of) log x or x etc.
Here's what f(x) = 2*log(x)^2*sqrt(x) gives:
x | f(x)
=======================
26000 | 6285 (24 %)
5000 | 1934 (38 %)
500 | 325 (65 %)
100 | 80 (80 %)
1000000 | 72000 (7.2 %)
100000 | 15811 (15 %)
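A quick Python sketch (my own, assuming base-10 logs, which is what the tables above use) to reproduce and tweak these curves:

import math

def f1(x):
    # First suggestion: x / log10(x)
    return x / math.log10(x)

def f2(x):
    # Second suggestion: 2 * log10(x)^2 * sqrt(x)
    return 2 * math.log10(x) ** 2 * math.sqrt(x)

for x in (100, 500, 5000, 26000, 100000, 1000000):
    print(x, round(f2(x)), f"{100 * f2(x) / x:.1f} %")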

A suitable logarithmic scale might help.

It may be possible to define the function you want exactly if you specify a third input/output pair in addition to the two you've already given. If you have some specific aim in mind, it's quite likely to fit a well-known mathematical definition which at least one poster could identify for you. It does sound as though you're talking about a logarithmic function. However, you'll have to be more specific about your requirements to define a useful algorithm.

I'd suggest you play with the power law family of functions, c*x^a == c * pow(x,a) where a is the power. If you want an exact fraction of your answer, you would choose a=1 and it would just be a constant fraction. But you want the percentage to slowly decrease, so you could choose a<1. For example, we might choose a = 0.9 and c = 0.2 and get
x | f(x)
=======================
1 | 0.2
10 | 1.59
100 | 12.6
1000 | 100.2
So it ranges from 20% at 1 to 10% at 1000. You can pick smaller a to make the fraction decrease more rapidly. (And you can scale everything to fit your range.)
In particular, if c*5000^a = 2500 and c*26000^a = 6500, then by dividing we get (5.2)^a = 2.6, which we can solve as a = log(2.6)/log(5.2) = 0.57957.... Then we plug back in to get c*139.25 = 2500, so c = 17.953...
Now the progression goes like so
1000 984
3000 1859
5000 2500
10000 3736
15000 4726
26000 6500
50000 9495
90000 13349
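A small Python sketch of that two-point fit (the function name and layout are mine):

import math

def fit_power_law(x1, y1, x2, y2):
    # Solve c * x1^a = y1 and c * x2^a = y2 for a and c
    a = math.log(y2 / y1) / math.log(x2 / x1)
    c = y1 / x1 ** a
    return a, c

a, c = fit_power_law(5000, 2500, 26000, 6500)   # a ~ 0.57957, c ~ 17.953
for x in (1000, 3000, 5000, 10000, 15000, 26000, 50000, 90000):
    print(x, round(c * x ** a))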

This is somewhat similar to simple compression schemes used for analogue audio. See the Wikipedia entry for Companding.

Related

Communicate estimates from GEE or linear mixed models to a general audience

I want to transform the estimates from a GEE model into estimates that are easy to interpret. I am analyzing in Stata v17 (for Mac) the data from a therapeutic intervention in a pilot study with 26 individuals, randomized to either the treatment or a placebo. The outcome is a set of inflammatory proteins, whose expression has been normalized on an arbitrary log2 scale. There are longitudinal measurements at weeks 0, 1, 2 and 3.
To evaluate the impact of treatment on each protein, I have used GEE models. For each protein, the code looks like this:
xtgee log2_biomarker treatment##c.week, family(gaussian) link(identity) corr(ar 1)
And the model output
GEE population-averaged model Number of obs = 104
Group and time vars: id week Number of groups = 26
Family: Gaussian Obs per group:
Link: Identity min = 4
Correlation: AR(1) avg = 4.0
max = 4
Wald chi2(3) = 11.38
Scale parameter = 1.018093 Prob > chi2 = 0.0098
----------------------------------------------------------------------------------
log2_biomarker | Coefficient Std. err. z P>|z| [95% conf. interval]
-----------------+----------------------------------------------------------------
treatment |
Placebo | .1010699 .379534 0.27 0.790 -.6428031 .8449428
week | -.1841974 .125257 -1.47 0.141 -.4296967 .0613018
|
treatment#c.week |
Placebo | .3919614 .1771402 2.21 0.027 .044773 .7391497
|
_cons | .063295 .268371 0.24 0.814 -.4627026 .5892926
----------------------------------------------------------------------------------
The interaction term "treatment#c.week" indicates that this protein increases over time in the placebo arm, relative to treatment. In order to put it in context with the estimates from models for the other proteins, I would like to translate this 0.39 coefficient into something like this:
"Subjects in the placebo arm experience a X % (or X-fold) greater protein increase per week".
But, having a log2 transformed outcome, I am struggling to come up with the correct formula.
Thanks!
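A sketch of the usual back-transformation for a log2 outcome (my own reading, using the coefficients above): the model is linear on the log2 scale, so a slope difference of b log2-units per week corresponds to a 2^b-fold difference per week on the raw scale. Here
2^0.392 = e^(0.392 * ln 2) ≈ 1.31
so subjects in the placebo arm would experience roughly a 31% (1.31-fold) greater protein increase per week, with the confidence limits transforming the same way: 2^0.045 ≈ 1.03 to 2^0.739 ≈ 1.67.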

Determine max slope of slowly descending signal

I have an analog power signal from a motor. The signal ramps up quickly but powers off slowly over the course of several seconds. The signal looks almost like a series of plateaus on the descent. The problem is that the signal doesn't settle back to zero; it settles back to an intermediate level that is unknown and varies from motor to motor. See the data below.
I'm trying to find a way to determine when the motor is off and at that intermediate level.
My thought is to find and store the max point, and calculate the slopes thereafter until the slope is steeper than some large negative value like -160 (about -60 degrees), then declare that the motor must be powering off. The sample points below have all duplicates removed (there are typically about 5000 samples).
My problem is determining the X values. In the formula (y2 - y1) / (x2 - x1), the x values could be far enough apart in time that the slope never appears steeper than -30 degrees. Picking an absolute number like 10 would fix this, but is there a more mathematically correct method?
The data shows the slope calculated with the method described above, relative to the max of 921, i.e. (y2 - y1) / ((10+1) - 10). In this scheme, at datapoint 9, I would say the motor is "Off". I'm looking for a more principled way to determine an X value rather than picking 10 arbitrarily.
+---+-----+----------+
| X | Y | Slope |
+---+-----+----------+
| 1 | 65 | 856.000 |
| 2 | 58 | 863.000 |
| 3 | 57 | 864.000 |
| 4 | 638 | 283.000 |
| 5 | 921 | 0.000 |
| 6 | 839 | -82.000 |
| 7 | 838 | -83.000 |
| 8 | 811 | -110.000 |
| 9 | 724 | -197.000 |
+---+-----+----------+
EDIT: A much simpler answer:
Since your motor is either ON or OFF, and ON wattages are strictly higher than OFF wattages, you should be able to discriminate between ON and OFF wattages by maintaining an average wattage, reporting ON if the current measurement is higher than the average and OFF if it is lower.
Count = 0
Average = 500
Whenever a measurement comes in,
Count = Count + 1
Average = Average + (Measurement - Average) / Count
Return Measurement > Average ? ON : OFF
This represents an average of all the values the wattage has ever been. If we want to eventually "forget" the earliest values (before the motor was ever turned on), we could either keep a buffer of recent values and use that for a moving average, or approximate a moving average with an IIR like
Average = (1-X) * Average + X * Measurement
for some X between 0 and 1 (closer to 0 to change more slowly).
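A runnable Python sketch of both ideas (the smoothing factor 0.05 and the starting average of 500 are arbitrary choices of mine):

def make_detector(alpha=0.05, initial_average=500.0):
    # IIR moving average; alpha near 0 means the average changes slowly
    state = {"avg": initial_average}
    def update(measurement):
        state["avg"] = (1 - alpha) * state["avg"] + alpha * measurement
        return "ON" if measurement > state["avg"] else "OFF"
    return update

detect = make_detector()
for watts in (65, 58, 57, 638, 921, 839, 838, 811, 724):
    print(watts, detect(watts))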
Original answer:
You could treat this as an online clustering problem, where you expect three clusters (before the motor turns on, when the motor is on, and when the motor is turned off), or perhaps four (before the motor turns on, peak power, when the motor is running normally, and when the motor turns off). In effect, you're trying to learn what it looks like when a motor is on (or off).
If you don't have any other information about whether the motor is on or off (which could be used to train a model), here's a simple approach:
Define an "Estimate" to contain:
float Value
int Count
Define an "Estimator" to contain:
float TotalError = 0.0
Estimate COLD_OFF = {Value = 0, Count = 1}
Estimate ON = {Value = 1000, Count = 1}
Estimate WARM_OFF = {Value = 500, Count = 1}
a function Update_Estimate(float Measurement)
Find the Estimate E such that E.Value is closest to Measurement
Update TotalError = TotalError + (E.Value - Measurement)*(E.Value - Measurement)
Update E.Value = (E.Value * E.Count + Measurement) / (E.Count + 1)
Update E.Count = E.Count + 1
return E
This takes initial guesses for what the wattages of these stages should be and updates them with the measurements. However, this has some problems. What if our initial guesses are off?
You could initialize some number of Estimators with different possible (e.g. random) guesses for COLD_OFF, ON, and WARM_OFF; after receiving a measurement, let each Estimator update itself and aggregate their values somehow. This aggregation should reward the better estimates. Since you're storing TotalError for each estimate, you could just pick the output of the Estimator that has the lowest TotalError so far, or you could let the Estimators vote (giving each Estimator's vote a weight proportional to 1/(TotalError + 1) or something like that).
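A compact Python version of that Estimator (a sketch; the class layout and names are mine):

class Estimator:
    def __init__(self, cold_off=0.0, on=1000.0, warm_off=500.0):
        # Each estimate is [value, count]; counts start at 1
        self.estimates = {"COLD_OFF": [cold_off, 1],
                          "ON": [on, 1],
                          "WARM_OFF": [warm_off, 1]}
        self.total_error = 0.0

    def update(self, measurement):
        # Find the estimate whose value is closest to the measurement
        label = min(self.estimates,
                    key=lambda k: abs(self.estimates[k][0] - measurement))
        value, count = self.estimates[label]
        self.total_error += (value - measurement) ** 2
        # Fold the measurement into that estimate's running mean
        self.estimates[label] = [(value * count + measurement) / (count + 1),
                                 count + 1]
        return label

Several such Estimators with different initial guesses can then be run side by side and weighted by 1/(TotalError + 1), as described above.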

How do I represent percentages in an evolutionary algorithm?

Considering I have 4 chromosomes (gi, i = 1 to 4) to represent 4 percentages of different things, so that the sum of the 4 percentages equals 100: how do I represent this efficiently?
I know that it is possible via g1/(g1+g2+g3+g4). However, this is not efficient: if all gi = 0.2 or all gi = 0.1, each gene represents 25% in both cases. It is possible to generate many cases where different genes represent the same percentages. Is there another, more efficient way, where a unique combination of genes represents a unique set of percentages?
Thanks in advance.
I think you're confusing genes and chromosomes. A chromosome encodes a candidate solution to your problem. A gene is part of a chromosome.
Under this setting, why would you want that constraint on the chromosomes? It sounds like you want it on the genes of a chromosome.
In order to do this you can do a number of things: have each gene encode an integer in [0, 100]. If the genes do not add to 100 in the end, penalize the fitness of those chromosomes.
Another way, which might make crossover operators more natural to apply, is to have each gene store 100 bits. If x bits are set, that means the gene will encode x%.
Yet another way is to have the entire chromosome encode 100 set bits. Then each gene will hold a value x, which represents an interval. The number of set bits between two split points is the percentage associated to that gene. For example:
1 2 3 4 5 6 7 8 ... 100
1 1 1 1 1 1 1 1 ... 1
| | | | |
g1 g2 g3 g4
This can be done by generating 3 random split points <= 100, adding the fixed endpoints 0 and 100, sorting, and taking the differences between consecutive values.
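A Python sketch of that construction (names are mine):

import random

def random_percentages(n_genes=4, total=100):
    # n_genes - 1 random split points, plus the fixed endpoints 0 and total
    points = sorted(random.randint(0, total) for _ in range(n_genes - 1))
    points = [0] + points + [total]
    return [hi - lo for lo, hi in zip(points, points[1:])]

print(random_percentages())   # four non-negative integers summing to 100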
One way to assign X units to N possibilities is to store X * (N-1) bits. Every unit is given (N-1) bits and if k of the (N-1) bits are set then the unit is assigned to k.
This is easy to work with as there are no invalid solutions and no penalties/repairs are necessary. This makes fitness evaluation, crossover and mutation easier to implement.
For example, the problem is to assign 5 units (X) to one of 4 (N) possibilities. Each individual is (4-1)x5=15 bits.
The bit string 010 100 000 011 111 assigns the first two units to possibility 1, because both groups have 1 bit set. The third unit, which has no bits set, is assigned to 0. The fourth unit is assigned to 2 and the fifth to 3.
partition | units
0 | 1
1 | 2
2 | 1
3 | 1
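A short Python sketch of that decoding (the helper is mine):

def assign_units(bits, n_possibilities):
    # Each unit gets n_possibilities - 1 bits; its popcount is its assignment
    width = n_possibilities - 1
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    return [chunk.count("1") for chunk in chunks]

print(assign_units("010100000011111", 4))   # -> [1, 1, 0, 2, 3]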

order of growth in algorithms

Suppose that you time a program as a function of N and produce
the following table.
N seconds
-------------------
19683 0.00
59049 0.00
177147 0.01
531441 0.08
1594323 0.44
4782969 2.46
14348907 13.58
43046721 74.99
129140163 414.20
387420489 2287.85
Estimate the order of growth of the running time as a function of N.
Assume that the running time obeys a power law T(N) ~ a N^b. For your
answer, enter the constant b. Your answer will be marked as correct
if it is within 1% of the target answer - we recommend using
two digits after the decimal separator, e.g., 2.34.
Can someone explain how to calculate this?
Well, it is a simple mathematical problem.
I : a*387420489^b = 2287.85 -> a = 2287.85 / 387420489^b
II : a*43046721^b = 74.99 -> a = 74.99 / 43046721^b
III: (I and II) -> 2287.85 / 387420489^b = 74.99 / 43046721^b ->
-> http://www.purplemath.com/modules/solvexpo2.htm
Use logarithms to solve.
1. You should calculate the ratio of the growth change from one row to the next:
N seconds
--------------------
14348907 13.58
43046721 74.99
129140163 414.2
387420489 2287.85
2. Calculate the change's ratio for N:
43046721 / 14348907 = 3
129140163 / 43046721 = 3
therefore the rate of change for N is 3.
3. Calculate the change's ratio for seconds:
74.99 / 13.58 = 5.52
Now let's check the ratio between one more pair of rows to be sure:
414.2 / 74.99 = 5.52
so the change's ratio for seconds is 5.52.
4. Build the following equation:
3^b = 5.52
b = log(5.52) / log(3) ≈ 1.56
Finally, we get that the order of growth of the running time is about 1.56.
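The same calculation in a couple of lines of Python (using the last two rows of the table):

import math

# T(N) ~ a * N^b  =>  b = log(T2/T1) / log(N2/N1)
b = math.log(2287.85 / 414.20) / math.log(387420489 / 129140163)
print(round(b, 2))   # 1.56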

about number of bits required for Fibonacci number

I am reading an algorithms book by S. Dasgupta. The following is a snippet from the text regarding the number of bits required for the nth Fibonacci number.
It is reasonable to treat addition as a single computer step if small numbers are being added, 32-bit numbers say. But the nth Fibonacci number is about 0.694n bits long, and this can far exceed 32 as n grows. Arithmetic operations on arbitrarily large numbers cannot possibly be performed in a single, constant-time step.
My question is: for example, for Fibonacci numbers F1 = 1, F2 = 1, F3 = 2, and so on, substituting n into the formula 0.694n gives approximately 1 bit for F1 and approximately 2 bits for F2, but for F3 and beyond the formula seems to fail. I think I didn't properly understand what the author means here; can anyone please help me understand?
Thanks
Well,
n | 0.694n | F(n) | bits | log2(F(n))
3 | 2.08 | 2 | 2 | 1
4 | 2.78 | 3 | 2 | 1.58
5 | 3.47 | 5 | 3 | 2.32
6 | 4.16 | 8 | 4 | 3
7 | 4.86 | 13 | 4 | 3.7
8 | 5.55 | 21 | 5 | 4.39
Bits required is floor(log2(F(n))) + 1, i.e. the base-2 log rounded down plus one, so this is close enough for me.
The value 0.694 comes from the fact that F(n) is the closest integer to φ^n/√5. So log2(F(n)) ≈ n * log2(φ) - log2(√5), and log2(φ) is 0.694. As n gets bigger, the log2(√5) and the rounding rapidly become insignificant.
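A quick Python check of those constants (note that the magic numbers in the nobFib function below are log2(φ) and log2(√5) - 1):

import math

phi = (1 + math.sqrt(5)) / 2          # the golden ratio
print(math.log2(phi))                 # ~ 0.69424191363062
print(math.log2(math.sqrt(5)))        # ~ 1.16096404744368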
private static int nobFib(int n) // number of bits Fib(n)
{
    return n < 6 ? ++n/2 : (int)(0.69424191363061738 * n - 0.1609640474436813);
}
Checked it for n from 0 to 500,000, and for n = 500,000,000 and n = 1,000,000,000.
It's based on Binet's formula.
Needed it for: Fibonacci Sequence Binary Plot.
See: http://bigintegers.blogspot.com/2012/09/fibonacci-sequence-binary-plot-edd-peg.html
First of all, the word "about" is very important, as in "the nth Fibonacci number is about 0.694n bits long". Second, I think the author means the behavior as n -> infinity. Try some big numbers and check :)
You can't have, say, half a bit; the number of bits must be rounded. So it means:
number of bits = Math.ceil(Math.max(0.694*n, 32));
That is, the estimate is rounded up once 0.694*n exceeds 32, and is 32 below that (for 32-bit systems, that is), and the number may not be exact.
I think he's just using the Fibonacci numbers to illustrate his point that for large numbers (more than 32 bits) addition cannot be assumed to be constant any more, because it involves more than a single instruction on the CPU.
Why does the formula fail? For F3 = 2 the binary representation needs 2 bits (3 * 0.694 = 2.08). Take F50 = 12586269025, which needs 34 bits (50 * 0.694 = 34.7), still reasonably close to the true value.
N | F(N) | bits (≈ 0.694*N)
1 | 0 | 1
2 | 1 | 1
3 | 1 | 1
4 | 2 | 2
5 | 3 | 2
6 | 5 | 3
7 | 8 | 4
8 | 13 | 4
etc. That's my interpretation. But then, that means that you have to get to f(47) = 1,836,311,903 before you exceed 32 bits.
The author is basically describing how large numbers affect the performance of the algorithm. To oversimplify: a processor can add numbers of the register size very quickly, but if the numbers exceed the register size, more low-level processor instructions need to be executed.
