sum vs simpleSum vs sumCompensation in DoubleSummaryStatistics? What is the significance of having them? - java-8

What is the use of having these three fields?
public class DoubleSummaryStatistics implements DoubleConsumer {
    private long count;
    private double sum;
    private double sumCompensation;
    private double simpleSum;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
}

sum and sumCompensation are used to reduce the error of regular floating-point summation via compensated (Kahan) summation.
simpleSum contains the simple sum (obtained by applying simpleSum += value; on each added value) and is used for non-finite sums.
The comments in the implementation explain this:
private double sum;
private double sumCompensation; // Low order bits of sum
private double simpleSum; // Used to compute right sum for non-finite inputs
This is how sum and sumCompensation are computed:
/**
 * Incorporate a new double value using Kahan summation /
 * compensated summation.
 */
private void sumWithCompensation(double value) {
    double tmp = value - sumCompensation;
    double velvel = sum + tmp; // Little wolf of rounding error
    sumCompensation = (velvel - sum) - tmp;
    sum = velvel;
}
You can see how they are used in getSum():
public final double getSum() {
    // Better error bounds to add both terms as the final sum
    double tmp = sum + sumCompensation;
    if (Double.isNaN(tmp) && Double.isInfinite(simpleSum))
        // If the compensated sum is spuriously NaN from
        // accumulating one or more same-signed infinite values,
        // return the correctly-signed infinity stored in
        // simpleSum.
        return simpleSum;
    else
        return tmp;
}
The Javadoc of getSum() doesn't require any specific implementation, but allows for implementations that reduce the error bound:
The value of a floating-point sum is a function both of the input values as well as the order of addition operations. The order of addition operations of this method is intentionally not defined to allow for implementation flexibility to improve the speed and accuracy of the computed result. In particular, this method may be implemented using compensated summation or other technique to reduce the error bound in the numerical sum compared to a simple summation of double values.
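To see the effect in practice, here is a small hedged demo (the class name KahanDemo is mine, not from the JDK): a naive double sum silently drops addends smaller than half an ulp of the running total, while DoubleSummaryStatistics recovers them through the compensation term.

import java.util.DoubleSummaryStatistics;

public class KahanDemo {
    public static void main(String[] args) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        double naive = 0.0;
        stats.accept(1.0);
        naive += 1.0;
        for (int i = 0; i < 10_000_000; i++) {
            stats.accept(1e-16); // individually below half an ulp of 1.0, so the naive sum ignores it
            naive += 1e-16;
        }
        System.out.println(naive);          // prints 1.0: every tiny addend was rounded away
        System.out.println(stats.getSum()); // prints ~1.000000001: the compensation preserved them
    }
}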

Related

How to determine how many cycles it would take to get a given value to 0 by decreasing it by a given amount

This is my problem: I need to take the double onHand and reduce it by the double consume, then determine how many cycles it would take to reach 0, then use Math.Round with 3 to round it to 3 decimal places.
public static int Test4(double onHand, double consume)
{
    int answer = 1;
    for (int i = (int)(consume); i > onHand; i--)
    {
        answer -= (int)onHand;
    }
    return answer;
}
I tried creating multiple variables, introducing decimals, and casting the doubles into floats and ints, but I can only get to the point where my answer outputs the int of onHand.
You just have to do a division.
Assuming onHand is 3.02 and consume is 0.24, you divide them as onHand / consume, which gives 12.583333. You then have to round that value up (13). That is the number of times it will go through the loop to reach or pass 0.
Example
public static int Test4(double onHand, double consume)
{
    decimal answer = (decimal)onHand / (decimal)consume;
    return (int)Math.Ceiling(answer);
}
I'm no expert on C#, so I don't know if the casting is necessary.
public static int Test4(double onHand, double consume)
{
    int answer = 0;
    // count how many times consume can be taken from onHand before reaching 0
    while (onHand > 0)
    {
        onHand -= consume;
        answer++;
    }
    return answer;
}

Can this function be refactored to be O(1)?

I have this function which is used to calculate a value with diminishing returns. It counts how often an ever increasing value can be subtracted from the input value and returns the number of subtractions. It is currently implemented iteratively with an infinite loop:
// inputValue is our parameter. It is manipulated in the method body.
// step counts how many subtractions have been performed so far. It is also our returned value.
// loss is the value that is being subtracted from the inputValue at each step. It grows polynomially with each step.
public int calculate(int inputValue) {
    for (int step = 1; true; step++) { // infinite for loop; exits via return
        int loss = (int) (1 + 0.0006 * step*step + 0.2 * step);
        if (inputValue > loss) {
            inputValue -= loss;
        } else {
            return step;
        }
    }
}
This function is used in various places within the larger application and sometimes in performance critical code. I would prefer it to be refactored in a way which does not require the loop anymore.
I am fairly sure that it is possible to somehow calculate the result more directly. But my mathematical skills seem to be insufficient to do this.
Can anybody show me a function which produces identical results without the need for a loop or recursion? It is OK if the refactored code may produce different results for extreme values and corner cases. Negative inputs need not be considered.
Thank you all in advance.
I don't think you can make the code faster while preserving the exact logic. In particular, you have some hard-to-emulate rounding at
int loss = (int) (1 + 0.0006 * step*step + 0.2 * step);
If this is a requirement of your business logic rather than a bug, I don't think you can do significantly better. On the other hand, if what you really want is something like this (from the syntax I assume you use Java):
public static int calculate_double(int inputValue) {
    double value = inputValue;
    for (int step = 1; true; step++) { // infinite for loop; exits via return
        double loss = (1 + 0.0006 * step * step + 0.2 * step); // no rounding!
        if (value > loss) {
            value -= loss;
        } else {
            return step;
        }
    }
}
I.e. the same logic but without the rounding at every step, then there is some hope.
Note: unfortunately this rounding does make a difference. For example, according to my test the outputs of calculate and calculate_double are slightly different for every inputValue in the range [4, 46465] (sometimes even by more than +1; for example, for inputValue = 1000 it is calculate = 90 vs calculate_double = 88). For bigger inputValue the results are more consistent; for example, for the results 519/520 the range of disagreement is only [55294, 55547]. Still, for every result there is some range of inputs where the two differ.
First of all, the sum of loss in the case of no rounding for a given max step (let's call it n) has a closed formula, using the standard identities sum(k) = n*(n+1)/2 and sum(k^2) = n*(n+1)*(2n+1)/6:
sum(n) = n + 0.0006*n*(n+1)*(2n+1)/6 + 0.2*n*(n+1)/2
So theoretically finding an n such that sum(n) < inputValue < sum(n+1) can be done by solving the cubic equation sum(x) = inputValue, which has a closed formula, and then checking values like floor(x) and ceil(x). However the math behind this is a bit complicated, so I didn't go that route.
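For illustration, that direct route could also be approximated with a few Newton iterations instead of the full cubic formula. A hedged sketch (the method name solveStep is mine; it expands sum(x) to 0.0002x^3 + 0.1003x^2 + 1.1001x), which still needs the floor/ceil check described above:

// Solve sum(x) = inputValue for real x by Newton's method.
// sum(x) expands to 0.0002*x^3 + 0.1003*x^2 + 1.1001*x.
static double solveStep(int inputValue) {
    // start above the root (the cubic term alone already reaches inputValue);
    // Newton on this convex increasing function then descends monotonically
    double x = Math.cbrt(inputValue / 0.0002);
    for (int i = 0; i < 30; i++) {
        double f = ((0.0002 * x + 0.1003) * x + 1.1001) * x - inputValue;
        double df = (0.0006 * x + 0.2006) * x + 1.1001;
        double next = x - f / df;
        if (Math.abs(next - x) < 1e-9)
            return next;
        x = next;
    }
    return x; // then compare floor(x) and ceil(x) against sum() as described above
}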
Please also note that since int has a limited range, theoretically even your implementation of the algorithm is O(1) (because it will never take more steps than to compute calculate(Integer.MAX_VALUE) which is a constant). So probably what you really want is just a significant speed up.
Unfortunately the coefficients 0.0006 and 0.2 are small enough that different summands dominate the sum for different n. Still, you can use binary search for much better performance:
static int sum(int step) {
    // n + 0.2 * n*(n+1)/2 + 0.0006 * n*(n+1)*(2n+1)/6
    // = ((0.0001*(2n+1) + 0.1) * (n+1) + 1) * n
    double s = ((0.0001 * (2 * step + 1) + 0.1) * (step + 1) + 1) * step;
    return (int) s;
}
static int calc_bin_search2(int inputValue) {
    int left = 0;
    // inputValue / 2 is a safe estimate; the answer for 100 is 27 or 28
    int right = inputValue < 100 ? inputValue : inputValue / 2;
    // for big inputValue reduce right more aggressively before starting the binary search
    if (inputValue > 1000) {
        while (true) {
            int test = right / 8;
            int tv = sum(test);
            if (tv > inputValue)
                right = test;
            else {
                left = test;
                break;
            }
        }
    }
    // just the usual binary search
    while (true) {
        int mid = (left + right) / 2;
        int mv = sum(mid);
        if (mv == inputValue)
            return mid;
        else if (mid == left)
            return mid + 1;
        else if (mv < inputValue)
            left = mid;
        else
            right = mid;
    }
}
Note: the return mid + 1 is a copy of your original logic that returns one step after the last loss was subtracted.
In my tests this implementation matches the output of calculate_double and has roughly the same performance for inputValue under 1000, is about 50x faster for values around 1_000_000, and about 200x faster for values around 1_000_000_000.
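If you adopt this, a cheap way to gain confidence is a brute-force comparison against calculate_double over a range of inputs. A hedged test sketch (very small inputs are corner cases that the question explicitly allows to differ):

public static void main(String[] args) {
    // report every input where the binary-search version disagrees with calculate_double
    for (int v = 1; v <= 100_000; v++) {
        if (calc_bin_search2(v) != calculate_double(v))
            System.out.println("mismatch at inputValue = " + v);
    }
}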

Get percentage of unique numbers generated by a random generator

I am using a random generator in my Python code. I want to get the percentage of unique random numbers generated over a huge range, like random(0, 10^8), and I need to generate 10^12 numbers. What would be an efficient algorithm in terms of space complexity?
The code is similar to:
import random

dif = {}
for i in range(0, 1000):
    rannum = random.randint(0, 50)
    dif[rannum] = True
dif_len = len(dif)
print(dif_len)
per = float(dif_len) / 50
print(per)
You have to keep track of each number the generator generates, or there is no way to know whether a new number has been seen before. What is the best way to do that? It depends on how many numbers you are going to examine. For small N, use a HashSet. At some large N it becomes more efficient to use a bitmap (see the sketch after the HashSet example below).
For small N...
using System.Collections.Generic;

public class Accumulator {
    private int uniqueNumbers = 0;
    private long totalAccumulated = 0; // long: the question mentions 10^12 samples, which overflows int
    private HashSet<int> set = new HashSet<int>();

    public void Add(int i) {
        if (!set.Contains(i)) {
            set.Add(i);
            uniqueNumbers++;
        }
        totalAccumulated++;
    }

    public double PercentUnique() {
        return 100.0 * uniqueNumbers / totalAccumulated;
    }
}
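For the bitmap variant mentioned above, a hedged sketch in Java using java.util.BitSet (the class name BitmapAccumulator is mine): a range of 10^8 values needs a fixed ~12.5 MB of bits no matter how many of the 10^12 samples are drawn, whereas the HashSet grows with every distinct value seen.

import java.util.BitSet;

public class BitmapAccumulator {
    private final BitSet seen;           // one bit per possible value in the range
    private long uniqueNumbers = 0;
    private long totalAccumulated = 0;

    public BitmapAccumulator(int rangeSize) {
        seen = new BitSet(rangeSize);
    }

    public void add(int value) {
        if (!seen.get(value)) {
            seen.set(value);
            uniqueNumbers++;
        }
        totalAccumulated++;
    }

    public double percentUnique() {
        return 100.0 * uniqueNumbers / totalAccumulated;
    }
}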

Sampling from a discrete distribution while accounting for past occurrences

I have a discrete distribution that I need to sample from. It is heavily skewed and an example is as follows
a - 1-40
b - 40-80
c - 80-85
d - 85-90
e - 90-95
f - 95-100
To currently sample from this distribution, I'm choosing a random number in the interval [1,100] and choosing the corresponding value.
However, I'd like to be certain that if I see one of [c,d,e,f] I don't see the exact same value being sampled for the next x samples. The context being that they are powerups in a game I'm building. I'd like the powerups to be random, but not hand out the same powerup to the player repeatedly.
Is there any method that incorporates past occurrences of samples into generating a value or do I have to repeatedly sample till I get a value I'd prefer?
One way to do this is to shuffle an array containing the values 1-100 and then iterate through as necessary.
For example, an implementation of this in Java:
public class PowerUpSelector
{
    private int current;
    private int x;
    private int[] distribution;

    public PowerUpSelector(int x)
    {
        this.x = x;
        current = 0;
        distribution = new int[100];
        for (int i = 0; i < distribution.length; i++)
            distribution[i] = i;
        shuffle(distribution); // Fisher-Yates shuffle; see the sketch below
    }

    public int returnPowerUpID()
    {
        if (current >= x)
        {
            shuffle(distribution);
            current = 0;
        }
        return distribution[current++];
    }
}
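The shuffle the constructor refers to is the standard Fisher-Yates algorithm; a hedged sketch of what it could look like as a method of the class above (assuming a java.util.Random field, which is my addition):

private final java.util.Random rng = new java.util.Random();

private void shuffle(int[] array) {
    // Fisher-Yates: walk backwards, swapping each element with a random one at or before it
    for (int i = array.length - 1; i > 0; i--) {
        int j = rng.nextInt(i + 1);
        int tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
    }
}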

Use class method as `op` argument to `accumulate`

I'm trying to create a class that calculates its variance from a vector<float>. It should do this by using its previously calculated this->mean in diffSquaredSum. I'm trying to call the method diffSquaredSum inside of accumulate but have no idea what the magical syntax is.
What is the correct syntax to use the diffSquaredSum class method as the op argument to accumulate in setVariance?
float diffSquaredSum(float sum, float f) {
    // requires a prior call to setMean
    float diff = f - this->mean;
    float diff_square = pow(diff, 2);
    return sum + diff_square;
}

void setVariance(vector<float>& values) {
    size_t n = values.size();
    double sum = accumulate(
        values.begin(), values.end(), 0,
        bind(this::diffSquaredSum)); // does not compile; this is the syntax I cannot figure out
    this->variance = sum / n;
}
double sum = std::accumulate(
    values.begin(),
    values.end(),
    0.f,
    [&](float sum, float x) { return diffSquaredSum(sum, x); }
);
bind is only rarely useful. Prefer lambdas; they are easier to write and read.
You could instead get fancy with binding, but why?
