Situation
After working on a coding kata I finally got the algorithm to work on my small test cases, only to find out it did not work at a larger scale. Time is not an issue, but the size of the numbers is.
In one of the test cases I need to perform the following calculation:
var numberOfColumns = 34359738368;
var numberOfRows = 28827050410;
var valueOverflow = 13719506;
var totalOfSingleRow = (numberOfColumns * (numberOfColumns - 1))/2;
var totalGridValue = totalOfSingleRow * numberOfRows;
var result = (totalOfSingleRow * totalGridValue) % valueOverflow;
Because the top row is sequential, I can calculate the sum of the first row by doing (numberOfColumns * (numberOfColumns - 1)) / 2.
Then I need to multiply that answer by the number of rows and apply the modulo to get my resulting value.
The problem
The problem is that JavaScript can only calculate reliably with integers up to 9007199254740991 (Number.MAX_SAFE_INTEGER).
But the calculation above results in a totalGridValue of 17016487081526963049249353236480.
You can imagine that my calculation does not produce the desired value of 10552574, because the value gets truncated to 1.7016487081526963e+31.
This results in the wrong value of 8479672.
Question
How can I alter my calculation so the result becomes the desired 10552574?
I've tried applying the modulo operator earlier, on numberOfColumns, without getting the desired result.
I've also looked at adding two large values as strings, but this process would become too slow as I would have to add strings too many times.
Note
Because I need to submit this on codewars, I cannot use any external libraries!
I can use other languages, but I know it is possible in JavaScript.
I think you were on the right track with moving the modulus operation further up the chain, but I wouldn't use the modulo operator for that. Instead, use regular floating-point division, carry that value through to the last step when all other calculations are done, and then convert that float's decimal part into an integer. Basically, with pure division you're just doing a transformation on the value, rather than changing the value; once you go to the modulus, you've changed the value. (I'd also change the totalOfSingleRow formula to divide the big values first and then multiply the results, rather than multiply then divide.)
I agree that moving the modulus operation further up the chain is the correct basic idea. The one subtlety is handling the calculation of totalOfSingleRow, because it involves a division by two. I think what you need to do is perform that stage modulo (2 * valueOverflow), and only compute the result modulo valueOverflow at the end. Perhaps as follows:
var doubleOverflow = 2 * valueOverflow;
var totalOfSingleRow = (((numberOfColumns % doubleOverflow) * ((numberOfColumns + doubleOverflow - 1) % doubleOverflow)) / 2) % valueOverflow;
var totalGridValue = (totalOfSingleRow * (numberOfRows % valueOverflow)) % valueOverflow;
var result = (totalOfSingleRow * totalGridValue) % valueOverflow;
You'll need to check that (4 * valueOverflow*valueOverflow) is less than the maximum integer value that can be represented correctly.
Related
For a current project, I have to discretize quasi-continuous values into bins defined by some pre-defined binning resolution. For this purpose, I have written a function which I expected to be highly efficient, as it is able to process both scalar and vector inputs using bsxfun. However, after some profiling, I found out that almost all the processing time of my much larger project is spent in this function, and within the function it's mainly the bsxfun part that takes time, with the min query in second place. Long story short, I am looking for advice on how to solve this task MUCH faster in MATLAB. Side note: I am usually passing vectors with some 50k elements.
Here's the code:
function sampleNo = value2sample(value,bins)
%Make sure both vectors have orientations fitting bsxfun
value = value(:);
bins = bins(:)';
%Recover bin resolution (avoids passing another parameter)
delta = median(diff(bins));
%Calculate distance matrix between all combinations
dist = abs(bsxfun(@minus,value,bins));
%What we really want to know is the minimum distance per row
[minval,ind] = min(dist,[],2);
%Make sure we don't accidentally further process NaNs as 1st bin
ind(isnan(minval))=NaN;
sampleNo = ind;
sampleNo(minval>delta) = NaN;
end
The reason your function is slow is that you are computing the distance between every element of value and every bin and storing them all in an array - if there are N values and M bins then you will require N*M elements to store all the distances, which is probably a really big number (e.g. if each input has 50,000 elements then you need 2.5 billion elements in the output array).
Moreover, since your bins are sorted (you didn't state this, but it looks like you are assuming it in your code) you do not need to compute the distance from every value to every bin. You can be much smarter:
function ind = value2sample(value, bins)
% Find median bin distance
delta = median(diff(bins));
% Bucket into 'nearest' bin by using midpoints
bins = bins(:);
mids = [-Inf; 0.5 * (bins(1:end-1) + bins(2:end))];
[~, ind] = histc(value, mids);
% Ensure that NaN values and points that aren't near any bin are returned as NaN
ind(isnan(value)) = NaN;
ind(abs(value - bins(ind)) > delta) = NaN;
end
In my tests, with values = randn(10000, 1) and bins = -50:50 it takes around 4.5 milliseconds to run the original function, and 485 microseconds to run the code above, so you are getting around a 10x speedup (and the speedup will be even greater as you increase the size of the inputs).
Thanks to @Chris Taylor, I was able to solve the problem very efficiently. The code now runs almost 400 times faster than before. The only changes I had to make from his version are reflected in the code below. The main issue was to replace histc (whose use is no longer encouraged) with discretize.
function ind = value2sample(value, bins)
% Make sure the vectors are standing
value = value(:);
bins = bins(:);
% Bucket into 'nearest' bin by using midpoints
mids = [eps; 0.5 * (bins(1:end-1) + bins(2:end))];
ind = discretize(value, mids);
The only thing is that in this implementation your bins must be non-negative. Other than that, this code does exactly what I want, including the fact that ind has the same size as value and contains NaNs whenever a value is NaN or out of the range of bins.
I'm writing MATLAB code which uses the digits of an irrational number. I tried finding them using a Taylor expansion of $\sqrt{1+x}$. Since division with large numbers could be a bad idea in MATLAB, this method does not seem like a good one to me.
I wonder if there is a simpler and more efficient method to do this?
If you have the Symbolic Toolbox, vpa does that. You can specify the number of significant digits you want:
x = '2'; %// define x as a *string*. This avoids loss of precision
n = 100; %// desired number of *significant* digits
result = vpa(['sqrt(' x ')'], n);
The result is a symbolic variable. If needed, convert to a string:
result = char(result);
In the example above,
result =
1.414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641573
Note that this is subject to rounding. For example, the result with n = 7 is 1.414214 instead of 1.414213.
In newer Matlab versions (tested on R2017b), using a char input with vpa is discouraged, and support for this may be removed in the future. The recommended approach is to first define the variable as symbolic, and then apply the required operations to it:
x = sym(2);
n = 100;
result = vpa(sqrt(x), n);
It seems you need a method of digit-by-digit root calculation that was discovered long before the computer era.
To make sure that this is not a duplicate, I have already checked this and this out.
I want to generate random numbers in a specific range with a specific step size (not from a continuous distribution).
For example, I want to generate random numbers between -2 and 3 in which the step between two consecutive numbers is 0.02 (e.g. [-2 -1.98 -1.96 ... 2.96 2.98 3], so a generated number should be 2.96, not 2.95).
I have tried this:
a=-2*100;
b=3*100;
r = (b-a).*rand(5,1) + a;
for i=1:length(r)
if r(i) >= 0
if mod(fix(r(i)),2)
r(i)=ceil(r(i))/100;
else
r(i)=floor(r(i))/100;
end
else
if mod(fix(r(i)),2)
r(i)=floor(r(i))/100;
else
r(i)=ceil(r(i))/100;
end
end
end
and it works.
There is an alternative way to do this in MATLAB, which is:
y = datasample(-2:0.02:3,5,'Replace',false)
I want to know:
How can I make my own implementation faster (improve the performance)?
If the second method is faster (it looks faster to me), how can I use a similar implementation in C++?
Those previous answers do cover your case if you read carefully. For example, this one produces random numbers between limits with a step size of one. But let's generalize this to an arbitrary step size in case you can't figure out how to get there. There are several different ways. Here's one using randi where we use the default step size of one and the range from one to the number of possible values as indices:
lo = 2;
hi = 3;
step = 0.02;
v = lo:step:hi;
r = v(randi(length(v),[5 1]))
If you look inside datasample (type edit datasample in your command window to view the code) you'll see that it's doing something very similar to this answer. In the case of the 'Replace' option being true see around line 135 (in R2013a at least).
If the 'Replace' option is false, as in your use of datasample above, then randperm actually needs to be used instead (see around line 159):
lo = 2;
hi = 3;
step = 0.02;
v = lo:step:hi;
r = v(randperm(length(v),51))
Because there is no replacement in this case, 51 is the maximum number of values that can be requested in a call and all values of r will be unique.
In C++ you should not use rand() if you're doing scientific computing and generating large numbers of random variates. Instead you should use a large-period random number generator such as Mersenne Twister (the default in MATLAB). C++11 includes a version of this generator as part of <random>. If you want something fast, you should try the Double precision SIMD-oriented Fast Mersenne Twister. You'll have to ask another question if you want to implement your code in C++.
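For illustration only, here is a minimal C++11 sketch of the lookup-vector approach from above using std::mt19937 from <random>; the range, step, and count of five samples are the ones from the question, and everything else is just an assumed example, not part of the original answer:
#include <iostream>
#include <random>
#include <vector>

int main() {
    const double lo = -2.0, hi = 3.0, step = 0.02;

    // Build the vector of allowed values, the equivalent of v = lo:step:hi.
    std::vector<double> v;
    for (int i = 0; lo + i * step <= hi + step / 2; ++i)
        v.push_back(lo + i * step);

    // Mersenne Twister engine seeded from the system entropy source.
    std::mt19937 gen(std::random_device{}());

    // Uniform indices into v, the equivalent of v(randi(length(v), [5 1])).
    std::uniform_int_distribution<std::size_t> idx(0, v.size() - 1);

    for (int k = 0; k < 5; ++k)
        std::cout << v[idx(gen)] << '\n';
}
For the without-replacement case (the randperm variant), one could instead std::shuffle the vector and take its first few elements.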
The distribution you want is a simple transform of integers, so how about:
step = 0.02
r = randi([-2 3] / step, [5, 1]) * step;
In C++, rand() generates integers too, so it should be pretty obvious how to take a similar approach there.
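As a rough C++ sketch of that integer transform (drawing the integer with <random> rather than rand(), per the advice above; the bounds -100 and 150 are just -2/0.02 and 3/0.02):
#include <iostream>
#include <random>

int main() {
    const double step = 0.02;
    std::mt19937 gen(std::random_device{}());

    // Draw an integer in [-100, 150] and scale it by the step,
    // mirroring r = randi([-2 3] / step, [5, 1]) * step above.
    std::uniform_int_distribution<int> d(-100, 150);

    for (int k = 0; k < 5; ++k)
        std::cout << d(gen) * step << '\n';
}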
Let's have a matrix A, say A = magic(100). I have seen two ways of computing the sum of all elements of matrix A.
sumOfA = sum(sum(A));
Or
sumOfA = sum(A(:));
Is one of them faster (or better practice) than the other? If so, which one is it? Or are they both equally fast?
It seems that you can't make up your mind about whether performance or floating point accuracy is more important.
If floating point accuracy were of paramount importance, then you would segregate the positive and negative elements, sorting each segment. Then sum in order of increasing absolute value. Yeah, I know, it's more work than anyone would do, and it probably will be a waste of time.
Instead, use adequate precision such that any errors made will be irrelevant. Use good numerical practices about tests, etc, such that there are no problems generated.
As far as the time goes, for an NxM array,
sum(A(:)) will require N*M-1 additions.
sum(sum(A)) will require (N-1)*M + M-1 = N*M-1 additions.
Either method requires the same number of adds, so for a large array, even if the interpreter is not smart enough to recognize that they are both the same op, who cares?
It is simply not an issue. Don't make a mountain out of a mole hill to worry about this.
Edit: in response to Amro's comment about the errors for one method over the other, there is little you can control. The additions will be done in a different order, but there is no assurance about which sequence will be better.
A = randn(1000);
format long g
The two solutions are quite close. In fact, compared to eps, the difference is barely significant.
sum(A(:))
ans =
945.760668102446
sum(sum(A))
ans =
945.760668102449
sum(sum(A)) - sum(A(:))
ans =
2.72848410531878e-12
eps(sum(A(:)))
ans =
1.13686837721616e-13
Suppose you choose the segregate and sort trick I mentioned. See that the negative and positive parts will be large enough that there will be a loss of precision.
sum(sort(A(A<0),'descend'))
ans =
-398276.24754782
sum(sort(A(A<0),'descend')) + sum(sort(A(A>=0),'ascend'))
ans =
945.7606681037
So you really would need to accumulate the pieces in a higher precision array anyway. We might try this:
[~,tags] = sort(abs(A(:)));
sum(A(tags))
ans =
945.760668102446
An interesting problem arises even in these tests. Will there be an issue because the tests are done on a random (normal) array? Essentially, we can view sum(A(:)) as a random walk, a drunkard's walk. But consider sum(sum(A)). Each element of sum(A) (i.e., the internal sum) is itself a sum of 1000 normal deviates. Look at a few of them:
sum(A)
ans =
Columns 1 through 6
-32.6319600960983 36.8984589766173 38.2749084367497 27.3297721091922 30.5600109446534 -59.039228262402
Columns 7 through 12
3.82231962760523 4.11017616179294 -68.1497901792032 35.4196443983385 7.05786623564426 -27.1215387236418
When we add them up, there will be a loss of precision. So potentially, the operation sum(A(:)) might be slightly more accurate. Is it so? What if we use a higher precision for the accumulation? So first, I'll form the sum down the columns using doubles, then convert to 25 digits of decimal precision, and sum the row of column sums. (I've displayed only 20 digits here, leaving 5 digits hidden as guard digits.)
sum(hpf(sum(A)))
ans =
945.76066810244807408
Or, instead, convert immediately to 25 digits of precision, then sum the result.
sum(hpf(A(:)))
945.76066810244749807
So both forms in double precision were equally wrong here, in opposite directions. In the end, this is all moot, since any of the alternatives I've shown are far more time consuming compared to the simple variations sum(A(:)) or sum(sum(A)). Just pick one of them and don't worry.
Performance-wise, I'd say both are very similar (assuming a recent MATLAB version). Here is a quick test using the TIMEIT function:
function sumTest()
M = randn(5000);
timeit( @() func1(M) )
timeit( @() func2(M) )
end
function v = func1(A)
v = sum(A(:));
end
function v = func2(A)
v = sum(sum(A));
end
the results were:
>> sumTest
ans =
0.0020917
ans =
0.0017159
What I would worry about is floating-point issues. Example:
>> M = randn(1000);
>> abs( sum(M(:)) - sum(sum(M)) )
ans =
3.9108e-11
The error magnitude increases for larger matrices.
I think a simple way to check is to apply the tic/toc functions at the beginning and end of your code.
tic
A = randn(5000);
format long g
sum(A(:));
toc
But when you use the randn function, its elements are random, so the calculation time can differ on each run.
It is better to use one fixed matrix with a large number of elements when comparing calculation times.
I'm working on a new data type for arbitrary-length numbers (only non-negative integers) and I got stuck implementing the square root and exponentiation functions (only for natural exponents). Please help.
I store the arbitrary-length number as a string, so all operations are done char by char.
Please don't include advice to use a different (existing) library or another way to store the number than a string. It's meant to be a programming exercise, not a real-world application, so optimization and performance are not that important.
If you include code in your answer, I would prefer it to be in either pseudo-code or in C++. The important thing is the algorithm, not the implementation itself.
Thanks for the help.
Square root: Babylonian method. I.e.
function sqrt(N):
    oldguess = -1
    guess = 1
    while abs(guess - oldguess) > 1:
        oldguess = guess
        guess = (guess + N/guess) / 2
    return guess
Exponentiation: by squaring.
function exp(base, pow):
    result = 1
    bits = toBinary(pow)
    for bit in bits:
        result = result * result
        if (bit):
            result = result * base
    return result
where toBinary returns a list/array of 1s and 0s, MSB first, for instance as implemented by this Python function:
def toBinary(x):
    return map(lambda b: 1 if b == '1' else 0, bin(x)[2:])
Note that if your implementation is done using binary numbers, this can be implemented using bitwise operations without needing any extra memory. If using decimal, then you will need the extra memory to store the binary encoding.
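One common bitwise formulation scans the exponent from the least significant bit, so no separate bit list has to be built at all. A minimal C++ sketch for built-in unsigned integers (the same loop structure carries over to a string-based big number type, if it can be halved and tested for evenness):
#include <cstdint>

// Exponentiation by squaring, scanning the exponent's bits from least
// significant to most significant, so no separate binary encoding is kept.
std::uint64_t ipow(std::uint64_t base, std::uint64_t pow) {
    std::uint64_t result = 1;
    while (pow != 0) {
        if (pow & 1)       // low bit set: multiply this power of the base in
            result *= base;
        base *= base;      // square the base for the next bit
        pow >>= 1;         // drop the processed bit
    }
    return result;
}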
However, there is a decimal version of the algorithm, which looks something like this:
function exp(base, pow):
    lookup = [1, base, base*base, base*base*base, ...]  # ...up to base^9
    # The above line can be optimised using exp-by-squaring if desired
    result = 1
    digits = toDecimal(pow)
    for digit in digits:
        result = result**10 * lookup[digit]  # result^10, since ten is the base of the digits
    return result
Exponentiation is trivially implemented with multiplication - the most basic implementation is just a loop:
result = 1;
for (int i = 0; i < power; ++i) result *= base;
You can (and should) implement a better version using squaring with divide & conquer - i.e. a^5 = a^4 * a = (a^2)^2 * a.
Square root can be found using Newton's method - you have to get an initial guess (a good one is to take the square root of the highest digit and multiply it by the base of the digits raised to half of the original number's length), and then refine it using division: if a is an approximation to sqrt(x), then a better approximation is (a + x / a) / 2. You should stop when the next approximation is equal to the previous one, or to x / a.
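As a rough C++ sketch of that Newton iteration, shown for a built-in unsigned type (the string-based big-number type would plug in its own add, divide, and compare; the initial guess here is simply x / 2 rather than the smarter digit-based guess described above):
#include <cstdint>

// Floor of the square root via Newton's method: a -> (a + x / a) / 2,
// stopping once the next approximation is no longer smaller.
std::uint64_t isqrt(std::uint64_t x) {
    if (x < 2) return x;
    std::uint64_t a = x / 2;            // crude initial guess
    std::uint64_t b = (a + x / a) / 2;  // one refinement step
    while (b < a) {
        a = b;
        b = (a + x / a) / 2;
    }
    return a;
}
For example, isqrt(10) returns 3 and isqrt(16) returns 4.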
Square root can be found using Newton's method - you have to get an initial guess (a good one is to take a square root from the highest digit, and to multiply that by base of the digits raised to half of the original number's length), and then to refine it using division: if a is an approximation to sqrt(x), then a better approximation is (a + x / a) / 2. You should stop when the next approximation is equal to the previous one, or to x / a.