There are multiple methods for finding the average of a set of numbers.
First, the sum / count quotient. Add all values and divide them by the number of values.
Second, the moving average. The function I found in another Stack answer is:
New average = old average * (n-1)/n + new value /n
This works as long as each value is added to the average one value at a time.
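As a minimal sketch, assuming n is the count of values seen so far (including the new one), the update above can be rearranged to `avg + (x - avg) / n`, which is algebraically the same thing. The function name `running_mean` is made up for illustration.

```python
def running_mean(values):
    """Incrementally update the mean one value at a time."""
    avg = 0.0
    for n, x in enumerate(values, start=1):
        # Same as avg * (n - 1) / n + x / n, rearranged
        avg += (x - avg) / n
    return avg
```

With n growing on every step this reproduces the ordinary sum / count average, up to floating point rounding.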
My concern is that the second method is more computationally expensive for my processor to execute, but I also fear that the first method will result in a loss of resolution for data sets that result in large sums. On a 32-bit system, for example, the resolution of a stored float value decreases automatically as the magnitude of the number grows.
Does a moving average preserve resolution?
A "moving average" does not calculate the average over a large interval.
It smooths the data in such a way that newer measurements have a larger impact, and the weight of older measurements becomes smaller and smaller.
If you are worried about large sums and want to preserve as many bits of the data as possible, consider special methods such as the Kahan summation algorithm.
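For illustration, here is a minimal sketch of Kahan (compensated) summation. The function name `kahan_sum` is an assumption, and the code works on Python floats (64-bit doubles), not on the 32-bit floats mentioned in the question.

```python
def kahan_sum(values):
    """Sum floats while carrying a correction term for lost low-order bits."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error so far
    for x in values:
        y = x - compensation            # apply the correction to the next value
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total
```

Dividing the result by the number of values then gives the sum / count average with much less accumulated rounding error than a plain running total.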
I need an algorithm, or technique, or any guidance to optimize the following problem:
I have two companies:
Company A has 324 employees
Company B has 190 employees
The total of employees (A+B) is 514. I need to randomly select 28% of these 514 employees.
Ok, so let's do it: 28% of 514 is 143.92; Oh... this is bad, we are dealing with people here, so we cannot have decimal places. Ok then, I'll try rounding that up or down.
If I round down: 143 is 27.82101167%, which is not good, since I must have at least 28%, so I must round up to 144.
So now I know that 144 employees must be selected.
The main problem comes now... It's time to work out what percentage I must use for each company to get the total of 144. How do I do that so that the percentage for each company is as close as possible to 28%?
I'll exemplify:
If I just apply 28% for each company I get:
Company A has 324 employees: 0.28 * 324 = 90.72
Company B has 190 employees: 0.28 * 190 = 53.2
Again, I end up with decimal places. So I must figure out which ones I should round up and which ones I should round down to get a total of 144.
Note: For this example I only used two companies, but in the real problem I have 30 companies.
There are many methods to perform apportionment, and no objective best method.
The following is in terms of states and seats rather than companies and people. Credit probably goes to Dr. Larry Bowen who is cited on the base site for the first link.
Hamilton’s Method
Also known as the Method of Largest Remainders and sometimes as Vinton's Method.
Procedure:
1. Calculate the Standard Divisor.
2. Calculate each state’s Standard Quota.
3. Initially assign each state its Lower Quota.
4. If there are surplus seats, give them, one at a time, to states in descending order of the fractional parts of their Standard Quota.
Here, the Standard Divisor can be found by dividing the total population (the sum of the population of each company) by the number of people you want to sample (144 in this case). The Standard Quota is the company's population divided by the Standard Divisor. The Lower Quota is this value rounded down. However, this method has some flaws.
Problems:
The Alabama Paradox: An increase in the total number of seats to be apportioned causes a state to lose a seat.
The Population Paradox: An increase in a state’s population can cause it to lose a seat.
The New States Paradox: Adding a new state with its fair share of seats can affect the number of seats due other states.
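Hamilton's procedure can be sketched as follows. This is only an illustration: the function name `hamilton` is made up, exact fractions are used to avoid floating point ties, and ties between equal remainders are broken by list order, which is an assumption rather than part of the method.

```python
import math
from fractions import Fraction

def hamilton(populations, seats):
    """Largest-remainder apportionment of `seats` across `populations`."""
    total = sum(populations)
    # Standard Quota = population / Standard Divisor = population * seats / total
    quotas = [Fraction(p * seats, total) for p in populations]
    alloc = [math.floor(q) for q in quotas]  # Lower Quotas
    surplus = seats - sum(alloc)
    # Indices ordered by descending fractional part of the Standard Quota
    order = sorted(range(len(populations)),
                   key=lambda i: quotas[i] - alloc[i],
                   reverse=True)
    for i in order[:surplus]:
        alloc[i] += 1
    return alloc
```

For the two companies in the question, `hamilton([324, 190], 144)` gives 91 people from Company A and 53 from Company B.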
This is probably the simplest method to implement. Below are some other methods with their accompanying procedures and drawbacks.
Jefferson’s Method
Also known as the Method of Greatest Divisors and in Europe as the Method of d'Hondt or the Hagenbach-Bischoff Method.
Procedure:
1. Calculate the Standard Divisor.
2. Calculate each state’s Standard Quota.
3. Initially assign each state its Lower Quota.
4. Check to see if the sum of the Lower Quotas is equal to the correct number of seats to be apportioned.
5. If the sum of the Lower Quotas is equal to the correct number of seats to be apportioned, then apportion to each state the number of seats equal to its Lower Quota.
6. If the sum of the Lower Quotas is NOT equal to the correct number of seats to be apportioned, then, by trial and error, find a number, MD, called the Modified Divisor to use in place of the Standard Divisor so that when the Modified Quota, MQ, for each state (computed by dividing each State's Population by MD instead of SD) is rounded DOWN, the sum of all the rounded (down) Modified Quotas is the exact number of seats to be apportioned. (Note: The MD will always be smaller than the Standard Divisor.) These rounded (down) Modified Quotas are sometimes called Modified Lower Quotas. Apportion each state its Modified Lower Quota.
Problem:
Violates the Quota Rule. (However, it can only violate Upper Quota—never Lower Quota.)
Webster’s Method
Also known as the Webster-Willcox Method as well as the Method of Major Fractions.
Procedure:
1. Calculate the Standard Divisor.
2. Calculate each state’s Standard Quota.
3. Initially assign a state its Lower Quota if the fractional part of its Standard Quota is less than 0.5. Initially assign a state its Upper Quota if the fractional part of its Standard Quota is greater than or equal to 0.5. [In other words, round down or up based on the arithmetic mean (average).]
4. Check to see if the sum of the Quotas (Lower and/or Upper from Step 3) is equal to the correct number of seats to be apportioned.
5. If the sum of the Quotas (Lower and/or Upper from Step 3) is equal to the correct number of seats to be apportioned, then apportion to each state the number of seats equal to its Quota (Lower or Upper from Step 3).
6. If the sum of the Quotas (Lower and/or Upper from Step 3) is NOT equal to the correct number of seats to be apportioned, then, by trial and error, find a number, MD, called the Modified Divisor to use in place of the Standard Divisor so that when the Modified Quota, MQ, for each state (computed by dividing each State's Population by MD instead of SD) is rounded based on the arithmetic mean (average), the sum of all the rounded Modified Quotas is the exact number of seats to be apportioned. Apportion each state its Modified Rounded Quota.
Problem:
Violates the Quota Rule. (However, violations are rare and are usually associated with contrived situations.)
Huntington-Hill Method
Also known as the Method of Equal Proportions.
Current method used to apportion the U.S. House of Representatives.
Developed around 1911 by Joseph A. Hill, Chief Statistician of the Bureau of the Census, and Edward V. Huntington, Professor of Mechanics & Mathematics, Harvard.
Preliminary terminology: The Geometric Mean of two numbers a and b is the square root of their product, sqrt(a × b).
Procedure:
1. Calculate the Standard Divisor.
2. Calculate each state’s Standard Quota.
3. Initially assign a state its Lower Quota if its Standard Quota is less than the Geometric Mean of the two whole numbers that the Standard Quota is immediately between (for example, 16.47 is immediately between 16 and 17, whose geometric mean is about 16.49). Initially assign a state its Upper Quota if its Standard Quota is greater than or equal to that Geometric Mean. [In other words, round down or up based on the geometric mean.]
4. Check to see if the sum of the Quotas (Lower and/or Upper from Step 3) is equal to the correct number of seats to be apportioned.
5. If the sum of the Quotas (Lower and/or Upper from Step 3) is equal to the correct number of seats to be apportioned, then apportion to each state the number of seats equal to its Quota (Lower or Upper from Step 3).
6. If the sum of the Quotas (Lower and/or Upper from Step 3) is NOT equal to the correct number of seats to be apportioned, then, by trial and error, find a number, MD, called the Modified Divisor to use in place of the Standard Divisor so that when the Modified Quota, MQ, for each state (computed by dividing each State's Population by MD instead of SD) is rounded based on the geometric mean, the sum of all the rounded Modified Quotas is the exact number of seats to be apportioned. Apportion each state its Modified Rounded Quota.
Problem:
Violates the Quota Rule.
For reference, the Quota Rule:
An apportionment method that always gives each state either its Lower Quota or its Upper Quota follows the Quota Rule.
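The three divisor methods above (Jefferson, Webster, Huntington-Hill) can be sketched without the trial-and-error search by using the equivalent "highest averages" formulation: hand out seats one at a time to the state with the largest population / divisor(seats so far). This is a sketch under assumptions: the function names are invented, ties go to the earlier state in the list, and the Huntington-Hill variant gives every state one seat first, so it needs at least as many seats as states.

```python
import math

def highest_averages(populations, seats, divisor):
    """Assign seats one by one to the state with the highest priority value."""
    alloc = [0] * len(populations)
    for _ in range(seats):
        def priority(i):
            d = divisor(alloc[i])
            # A zero divisor means "must receive a seat before anyone else"
            return math.inf if d == 0 else populations[i] / d
        winner = max(range(len(populations)), key=priority)
        alloc[winner] += 1
    return alloc

def jefferson(populations, seats):
    return highest_averages(populations, seats, lambda s: s + 1)

def webster(populations, seats):
    return highest_averages(populations, seats, lambda s: s + 0.5)

def huntington_hill(populations, seats):
    # Geometric mean of s and s + 1
    return highest_averages(populations, seats, lambda s: math.sqrt(s * (s + 1)))
```

For the two companies in the question all three variants agree on 91 and 53, but they can differ on other inputs: with populations 53, 24, 23 and 7 seats, `jefferson` favors the largest group while `webster` does not.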
The problem can be framed as that of finding the closest integer approximation to a set of ratios. For instance, if you want to assign respectively A, B, C ≥ 0 members from 3 groups to match the ratios a, b, c ≥ 0 (with a + b + c = N > 0), where N = A + B + C > 0 is the total allocation desired, then you're approximating (a, b, c) by (A, B, C) with A, B and C restricted to integers.
One way to solve this may be to set it up as a least squares problem - that of minimizing |a - A|² + |b - B|² + |c - C|²; subject to the constraints A + B + C = N and A, B, C ≥ 0.
A necessary condition for the optimum is that it be a local optimum with respect to discrete unit changes. For instance, the change (A,B,C) → (A+1,B-1,C), possible if B > 0, must not decrease the objective, which entails the condition (A - B ≥ a - b - 1 or B = 0).
For the situation at hand, the optimization problem is to minimize:
|A - a|² + |B - b|²
where a = 144×324/(324+190) ≅ 90.770, b = 144×190/(324+190) ≅ 53.230
which leads to the conditions:
A - B ≥ a - b - 1 ≅ +36.541 or B = 0
B - A ≥ b - a - 1 ≅ -38.541 or A = 0
A + B = 144
Since they are integers the inequalities can be strengthened:
A - B ≥ +37 or B = 0
B - A ≥ -38 or A = 0
A + B = 144
The boundary cases A = 0 and B = 0 are ruled out, since they don't satisfy all three conditions. So, you're left with 37 ≤ A - B ≤ 38 or, since A + B = 144: 181 ≤ 2A ≤ 182 or A = 91 ... and B = 53.
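As a sanity check on this derivation, a brute-force search over every integer split gives the same answer. This is a sketch for the two-group case only; the function name `closest_split` is made up.

```python
def closest_split(populations, total):
    """Try every integer A in 0..total and keep the least-squares best."""
    pop_a, pop_b = populations
    whole = pop_a + pop_b
    a = total * pop_a / whole  # ideal (fractional) share for group A
    b = total * pop_b / whole  # ideal (fractional) share for group B
    best_a = min(range(total + 1),
                 key=lambda A: (A - a) ** 2 + ((total - A) - b) ** 2)
    return best_a, total - best_a
```

Calling `closest_split((324, 190), 144)` returns A = 91 and B = 53, matching the result above.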
It is quite possible that this way of framing the problem may be equivalent, in terms of its results, to one of the algorithms cited in an earlier reply.
My suggestion is to just take 28% of each company and round up to the nearest person.
In your case, you would go with 91 and 54. Admittedly, this does result in having a bit over 28%.
The most accurate method is as follows:
Calculate the exact number that you want.
Take 28% for each company and round down.
Sort the companies in descending order by the remainder.
Go through the sorted list and round up the top n companies, where n is the difference between the exact number you want and the sum of the rounded-down values.
Since I originally posted this question I came across a description of this exact problem in Martin Fowler's book "Patterns of Enterprise Application Architecture" (page 489 and 490).
Martin talks about "Matt Foemmel’s simple conundrum": dividing 5 cents between two accounts while obeying a 70% / 30% distribution. This describes my problem in a much simpler way.
Here are the solutions he presents in his book to that problem:
Perhaps the most common is to ignore it; after all, it’s only a penny here and there. However this tends to make accountants understandably nervous.
When allocating you always do the last allocation by subtracting from what you’ve allocated so far. This avoids losing pennies, but you can get a cumulative amount of pennies on the last allocation.
Allow users of a Money class to declare the rounding scheme when they call the method. This permits a programmer to say that the 70% case rounds up and the 30% rounds down. Things can get complicated when you allocate across ten accounts instead of two. You also have to remember to round. To encourage people to remember I’ve seen some Money classes force a rounding parameter into the multiply operation. Not only does this force the programmer to think about what rounding she needs, it also might remind her of the tests to write. However, it gets messy if you have a lot of tax calculations that all round the same way.
My favorite solution: have an allocator function on the money. The parameter to the allocator is a list of numbers, representing the ratio to be allocated (it would look something like aMoney.allocate([7,3])). The allocator returns a list of monies, guaranteeing that no pennies get dropped by scattering them across the allocated monies in a way that looks pseudo-random from the outside. The allocator has faults: You have to remember to use it and any precise rules about where the pennies go are difficult to enforce.
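A rough sketch of such an allocator, written as a plain function over integer cents rather than a Money class. This is my own illustration, not the code from the book: the name `allocate` and its signature are assumptions, and leftover pennies simply go to the first entries in the list instead of being scattered pseudo-randomly.

```python
def allocate(amount_cents, ratios):
    """Split an integer amount by ratio without dropping any pennies."""
    total = sum(ratios)
    # Integer division rounds every share down
    shares = [amount_cents * r // total for r in ratios]
    remainder = amount_cents - sum(shares)
    # Hand the leftover pennies out one at a time, starting from the front
    for i in range(remainder):
        shares[i] += 1
    return shares
```

For the 5 cent conundrum, `allocate(5, [7, 3])` yields 4 cents and 1 cent, and the parts always add up to the original amount.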
If I am given the total number of occurrences of an event over the last hour, and I can get this data at arbitrary times (but at least once an hour), how can I work out the total number of occurrences over a 24-hour period?
Obviously, you can't. For example -- if the first two observations overlap then it would be impossible to determine the number of kills during the overlap. If there is a time gap between the first two observations then there is no way to determine what happened during the gap. You could try to set up a system of equations -- but the resulting system will be underdetermined (but it could give you both a min and a max, which might be relevant).
Why not adopt a statistical approach? Let X = kills over a 1 hour period. This is a random variable. Estimate its expected value by sampling it at randomly chosen times and multiply your estimate by 24.
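A minimal sketch of that estimate, assuming the one-hour counts were observed at randomly chosen times. The function name `estimate_daily_total` is hypothetical, and the result is only an estimate of the expected 24-hour total, not the actual count.

```python
def estimate_daily_total(hourly_counts):
    """Estimate E[X] from sampled one-hour counts and scale it to 24 hours."""
    if not hourly_counts:
        raise ValueError("need at least one observation")
    mean_per_hour = sum(hourly_counts) / len(hourly_counts)
    return 24 * mean_per_hour
```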
What are the maximum and minimum numbers that the MATLAB function normrnd(mu,sigma) can output, if the mean is 0 and the S.D. is 2? And what happens if we increase the S.D. in this function?
Ideally, there are no such maximum and minimum. A normal (Gaussian) pdf has infinite support, so it can produce any value, no matter how high or low, with positive probability. Of course, exceeding a value x is less probable as x grows; but the probability is never 0.
In reality, MATLAB cannot represent values with absolute value greater than realmax (about 10^308). But that's a very large number, and the probability of reaching a value close to that is very small.
The S.D. is a scale factor of the distribution. A greater S.D. tends to produce random numbers with larger absolute value. You can think about it this way: you generate a number according to a standard normal distribution (0 mean, 1 S.D.), and then you multiply it by the actual S.D. and add the actual mean.
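The scale-and-shift idea can be sketched in Python as a stand-in (this is not MATLAB's normrnd, and the helper name `normal_sample` is made up):

```python
import random

def normal_sample(mu, sigma):
    """Draw from N(mu, sigma^2) by scaling and shifting a standard normal."""
    z = random.gauss(0.0, 1.0)  # standard normal: mean 0, S.D. 1
    return mu + sigma * z
```

Doubling sigma doubles the distance of every draw from the mean, which is why larger values of the S.D. tend to give numbers of larger magnitude.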
I need to calculate APR for loans. Constant repayment loans were covered clearly in "Calculating annual percentage rate (need some help with inherited code)".
My problem is where the repayment amount is not constant. The monthly repayments can differ and therefore Newton-Raphson does not seem applicable.
The formula is still 0 = loan amount - sum[Rp/(1+x)^p] where Rp is the repayment amount for repayment p. There are n repayments. Is there a way to solve this or is there a good way to make a good second guess to x based on the results of previous guesses?
It sounds like you're given the Rp values and want to calculate x. You can just use Newton-Raphson as before - the question you linked to showed you how to do that.
For this one, you just need to change your F(x) and F'(x) functions.
F(x) = loan amount - sum[Rp/(1+x)^p]
You'll have to write code with a little loop in it to do the sum.
F'(x) = sum[p*Rp/(1+x)^(p+1)] (the minus sign in front of the sum and the -p from differentiating (1+x)^(-p) cancel)
A little loop there and you're set.
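Putting the two functions together, a minimal Newton-Raphson sketch might look like this. Assumptions: repayments are made at the end of periods 1..n, x is the rate per period (converting it to an annual figure is left out), and the function name, starting guess and tolerance are all invented for illustration.

```python
def periodic_rate(loan, repayments, guess=0.05, tol=1e-10, max_iter=100):
    """Solve 0 = loan - sum(Rp / (1 + x)**p) for x by Newton-Raphson."""
    x = guess
    for _ in range(max_iter):
        # F(x) = loan - sum[Rp / (1 + x)^p]
        f = loan - sum(r / (1 + x) ** p
                       for p, r in enumerate(repayments, start=1))
        # F'(x) = sum[p * Rp / (1 + x)^(p + 1)]
        df = sum(p * r / (1 + x) ** (p + 1)
                 for p, r in enumerate(repayments, start=1))
        step = f / df
        x -= step
        if abs(step) < tol:
            return x
    raise ValueError("Newton-Raphson did not converge")
```

The repayments list can hold different amounts for each period, which is the only change compared with the constant repayment case.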