Exponentiation on a point on elliptic curve unreasonably fast in SageMath - runtime

I am working with elliptic curves in SageMath. I was trying to collect benchmarks for the group operation and for exponentiation of points on the NIST P-256 elliptic curve. A group operation on two points of the curve takes roughly 2 microseconds. Exponentiation of a point with a random exponent takes only 3 microseconds. How is this even possible? Since I am exponentiating with a 256-bit value, this should take at least the time required for 256 group operations, which is more than 0.5 ms. I am worried that my code is wrong!
p = 115792089210356248762697446949407573530086143415290314195533631308867097853951
order = 115792089210356248762697446949407573529996955224135760342422259061068512044369
b = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b
F = GF(p)
E = EllipticCurve(F, [-3,b])
runs = 10000
G = E.abelian_group()
F2 = GF(order)
exponent = [F2.random_element() for i in range(runs)]
e2 = [G.random_element() for i in range(runs)]
# Benchmark 1: scalar multiplication with random exponents
t1 = time()
for i in range(runs):
    e = Integer(exponent[i])*e2[i]
t2 = time()
print("Time per operation = %s seconds" % ((t2 - t1)/runs))
e1 = [G.random_element() for i in range(runs)]
e2 = [G.random_element() for i in range(runs)]
# Benchmark 2: a single group operation (addition of two elements)
t1 = time()
for i in range(runs):
    e = e1[i]+e2[i]
t2 = time()
print("Time per operation = %s seconds" % ((t2 - t1)/runs))

Do not use E.abelian_group() if your goal is to time the elliptic curve scalar multiplication:
sage: P = G.random_element()
sage: P.parent()
Additive abelian group isomorphic to Z/115792089210356248762697446949407573529996955224135760342422259061068512044369 embedded in Abelian group of points on Elliptic Curve defined by y^2 = x^3 + 115792089210356248762697446949407573530086143415290314195533631308867097853948*x + 41058363725152142129326129780047268409114441015993725554835256314039467401291 over Finite Field of size 115792089210356248762697446949407573530086143415290314195533631308867097853951
sage: P.__class__
<class 'sage.groups.additive_abelian.additive_abelian_wrapper.AdditiveAbelianGroupWrapper_with_category.element_class'>
sage: Q = E.random_element()
sage: Q.parent()
Abelian group of points on Elliptic Curve defined by y^2 = x^3 + 115792089210356248762697446949407573530086143415290314195533631308867097853948*x + 41058363725152142129326129780047268409114441015993725554835256314039467401291 over Finite Field of size 115792089210356248762697446949407573530086143415290314195533631308867097853951
sage: Q.__class__
<class 'sage.schemes.elliptic_curves.ell_point.EllipticCurvePoint_finite_field'>
E.abelian_group() is a discrete log representation of E(𝔽_p): one or more generators of the group are chosen:
sage: G.gens()
((20722840521025081260844746572646324413063607211611059363846436737341554936251 : 92859506829486198345561119925850006904261672023969849576492780649068338418688 : 1),)
and points are represented as vectors of exponents:
sage: P.vector()
hence c*P simply multiplies the exponent by c and reduces modulo the order of the curve.
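To see why this is so cheap, here is a minimal Python sketch of the idea. It is not Sage's actual implementation: the class name `DlogElement` and its field `k` are hypothetical. An element k*g is stored only as the integer k relative to a fixed generator g, so scaling it costs one modular multiplication, whatever the bit length of the scalar.

```python
# Toy model of a "vector of exponents" representation (hypothetical names,
# not Sage's API): an element k*g is stored as the integer k modulo the
# group order, so no curve arithmetic is needed to add or scale elements.
from dataclasses import dataclass

# Order of the P-256 group, as given in the question.
ORDER = 115792089210356248762697446949407573529996955224135760342422259061068512044369


@dataclass(frozen=True)
class DlogElement:
    k: int  # exponent relative to the chosen generator

    def __add__(self, other):
        return DlogElement((self.k + other.k) % ORDER)

    def __rmul__(self, c):
        # A single modular multiplication, independent of the size of c.
        return DlogElement((c * self.k) % ORDER)


P = DlogElement(123456789)
c = 2**255 + 12345
Q = c * P
print(Q.k == (c * 123456789) % ORDER)  # True
```

Timing `c * P` on such an object measures integer arithmetic, not elliptic curve arithmetic.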
Use E.random_element() to get points of the curve and perform true elliptic curve operations:
sage: c = 2^100
sage: %timeit c*Q
100 loops, best of 3: 3.88 ms per loop
sage: c = 2^1000
sage: %timeit c*Q
10 loops, best of 3: 32.4 ms per loop
sage: c = 2^10000
sage: %timeit c*Q
1 loop, best of 3: 321 ms per loop
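The timings above grow roughly in proportion to the bit length of c, which is what a double-and-add scalar multiplication would predict. The following is a small sketch under stated assumptions, not Sage's code: it uses a toy textbook curve y^2 = x^3 + 2x + 2 over GF(17) (the point (5, 1) has order 19), and the helper names `add` and `scalar_mul` are made up for illustration. It counts calls to the group law, including the trivial ones that involve the point at infinity.

```python
# Toy curve y^2 = x^3 + 2x + 2 over GF(17); (5, 1) generates a group of order 19.
P_MOD, A = 17, 2
O = None  # point at infinity


def add(P, Q):
    """Affine point addition (and doubling) on the toy curve."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O  # P + (-P), or doubling a point with y = 0
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def scalar_mul(c, P):
    """Left-to-right double-and-add; returns (c*P, number of calls to add)."""
    R, ops = O, 0
    for bit in bin(c)[2:]:
        R = add(R, R)  # one doubling per bit of c
        ops += 1
        if bit == "1":
            R = add(R, P)  # one extra addition per set bit
            ops += 1
    return R, ops


G = (5, 1)
print(scalar_mul(2, G)[0])   # (6, 3)
print(scalar_mul(19, G))     # (None, 8): 19*G is the point at infinity
```

With this scheme a 256-bit scalar needs at least 256 group operations, which matches the estimate in the question.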


Calculate certainty of Monte Carlo simulation

Let's say that we use the Monte Carlo method to estimate the area of an object, in the exact same way you'd use it to estimate the value of π.
Now, let's say we want to calculate the certainty of our simulation result. We've cast n samples, m of which landed inside the object, so the area of the object is approximately m/n of the total sampled area. We would like to make a statement such as:
"We are 99% certain that the area of the object is between a1 and a2."
How can we calculate a1 and a2 above (given n, m, total area, and the desired certainty)?
Here is a program which attempts to estimate this bound numerically. Here the samples are points in [0,1), and the object is the segment [0.25,0.75). It prints a1 and a2 for 50%, 90%, and 99%, for a range of sample counts:
import std.algorithm;
import std.random;
import std.range;
import std.stdio;
void main()
{
    foreach (numSamples; iota(0, 1000+1, 100).filter!(n => n > 0))
    {
        auto samples = new double[numSamples];
        enum objectStart = 0.25;
        enum objectEnd = 0.75;
        enum numTotalSamples = 10_000_000;
        auto numSizes = numTotalSamples / numSamples;
        auto sizes = new double[numSizes];
        foreach (ref size; sizes)
        {
            // One simulated run: count the samples that land inside the object.
            size_t numHits;
            foreach (i; 0 .. numSamples)
            {
                auto sample = uniform01!double;
                if (sample >= objectStart && sample < objectEnd)
                    numHits++;
            }
            size = 1.0 / numSamples * numHits;
        }
        // Sort the estimates so that percentiles can be read off by index.
        sizes.sort();
        writef("%d samples:", numSamples);
        foreach (certainty; [50, 90, 99])
        {
            auto centerDist = numSizes * certainty / 100 / 2;
            auto startPos = numSizes / 2 - centerDist;
            auto endPos = numSizes / 2 + centerDist;
            writef("\t%.5f..%.5f", sizes[startPos], sizes[endPos]);
        }
        writeln();
    }
}
(Run it online.) It outputs:
//                   50%               90%               99%
 100 samples:  0.47000..0.53000  0.42000..0.58000  0.37000..0.63000
 200 samples:  0.47500..0.52500  0.44500..0.56000  0.41000..0.59000
 300 samples:  0.48000..0.52000  0.45333..0.54667  0.42667..0.57333
 400 samples:  0.48250..0.51750  0.46000..0.54250  0.43500..0.56500
 500 samples:  0.48600..0.51600  0.46400..0.53800  0.44200..0.55800
 600 samples:  0.48667..0.51333  0.46667..0.53333  0.44833..0.55167
 700 samples:  0.48714..0.51286  0.46857..0.53143  0.45000..0.54857
 800 samples:  0.48750..0.51250  0.47125..0.53000  0.45375..0.54625
 900 samples:  0.48889..0.51111  0.47222..0.52667  0.45778..0.54111
1000 samples:  0.48900..0.51000  0.47400..0.52500  0.45800..0.53900
Is it possible to precisely calculate these numbers instead?
(Context: I'd like to add something like "±X.Y GB with 99% certainty" to btdu)
OK, since the question is language-agnostic, here is an illustration of how to do error estimation with Monte Carlo.
Suppose you want to compute the integral
$I = \int_0^1 f(x)\,dx$
where f(x) is a simple polynomial function
$f(x) = x^n$
Here is an illustration of the calculations.
For that you have to compute not only the mean value, but the standard deviation as well.
Then, knowing that the Monte Carlo error goes down as the inverse square root of the number of samples, computing the confidence interval is simple.
Code, Python 3.7, Windows 10 x64
import numpy as np
rng = np.random.default_rng()
N = 100000
n = 2
def f(x):
    return np.power(x, n)
sample = f(rng.random(N)) # N samples of the function
m = np.mean(sample) # mean value of the sample, approaching integral value as N->∞
s = np.std(sample, ddof=1) # standard deviation with Bessel correction
e = s / np.sqrt(N) # Monte Carlo error decreases as inverse square root
t = 2.576 # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
#t = 3.00 # For 99.7% confidence interval, we should take 3 sigma, per Gaussian distribution
print(f'True integral value is {1.0/(1.0+n)}')
print(f'Computed integral value is in the range [{m-t*e}...{m+t*e}] with 99% confidence')
will print something like
True integral value is 0.3333333333333333
Computed integral value is in the range [0.33141772204489295...0.3362795491124624] with 99% confidence
You could use a Z-score table, along the lines of the one below, to print the table you want. You could vary N to see how the interval depends on N.
zscore = {'50%': 0.674, '80%': 1.282, '90%': 1.645, '95%': 1.960, '98%': 2.326, '99%': 2.576, '99.7%': 3.0}
for c, z in zscore.items():
    print(f'Computed integral value is in the range [{m-z*e}...{m+z*e}] with {c} confidence')
Based on Severin's answer, here is the code to calculate the values as stated in the question:
def calculate_error(n, m, z):
    p = m / n # Estimated probability that a sample hits the object
    std_dev = (p * (1 - p)) ** 0.5 # Standard deviation of Bernoulli variable
    error = std_dev / n ** 0.5 # Monte Carlo error decreases as inverse square root
    return (p - z * error, p + z * error)
n = 1000
z = 2.576 # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
print(calculate_error(n, n * 0.5, z))
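To get a1 and a2 in the terms of the question (n, m, total area, desired certainty), the interval for the hit fraction can be scaled by the total sampled area. The sketch below is one possible way to do it and rests on the same normal approximation; the function name `area_interval` and its parameters are made up for illustration, and the z-score comes from the standard library's `statistics.NormalDist` instead of a table. The approximation degrades when n is small or when m/n is close to 0 or 1.

```python
from statistics import NormalDist


def area_interval(n, m, total_area, confidence):
    """Normal-approximation interval (a1, a2) for the area of the object.

    n: number of samples cast, m: number of hits,
    total_area: area of the sampled region, confidence: e.g. 0.99.
    """
    p = m / n                                        # estimated hit fraction
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided z-score
    err = z * (p * (1 - p) / n) ** 0.5               # half-width of the interval for p
    return (total_area * (p - err), total_area * (p + err))


a1, a2 = area_interval(1000, 500, 1.0, 0.99)
print(f"{a1:.5f}..{a2:.5f}")
```

For n = 1000 and m = 500 this gives an interval of about 0.459..0.541, close to the 99% column that the simulation in the question printed for 1000 samples.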

How to calculate the cost or number of operations in a matrix multiplication?

Suppose there are three matrices named A (with dimension 10×30), B (with dimension 30×5), and C (with dimension 5×60).
(AB)C = (10×30×5) + (10×5×60) = 1500 + 3000 = 4500 operations
A(BC) = (30×5×60) + (10×30×60) = 9000 + 18000 = 27000 operations
How are these operation counts for (AB)C and A(BC) calculated?
Please elaborate.
(AB)C represents two matrix multiplications:
X = AB
Y = XC
To calculate the number of operations for (AB)C, first calculate the number of operations for each of the two individual matrix multiplications:
X = AB = (10×30×5)
Y = XC = (10×5×60)
Then add them:
Y = (AB)C = (10×30×5) + (10×5×60)
This notation abuses the = sign to mean sometimes "matrix equals matrix" and sometimes "matrix requires that number of operations to calculate". I hope you are not getting confused by that.
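The individual counts come from the usual rule that multiplying a p×q matrix by a q×r matrix produces p×r entries, each of which needs q scalar multiplications, so p×q×r in total. A small Python sketch of that bookkeeping (the helper name `product` is made up for illustration; shapes are (rows, columns) tuples):

```python
A, B, C = (10, 30), (30, 5), (5, 60)


def product(x, y):
    """Return (shape of the product, scalar multiplications needed)."""
    assert x[1] == y[0], "inner dimensions must match"
    return (x[0], y[1]), x[0] * x[1] * y[1]


# (AB)C: first X = AB, then Y = XC
ab, cost_ab = product(A, B)
_, cost_xc = product(ab, C)
left = cost_ab + cost_xc

# A(BC): first X = BC, then Y = AX
bc, cost_bc = product(B, C)
_, cost_ax = product(A, bc)
right = cost_bc + cost_ax

print(left, right)  # 4500 27000
```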

Algorithm to evaluate best weights for weighted average

I have a data set of the form:
[9.1 5.6 7.4] => 8.5, [4.1 4.4 5.2] => 4.9, ... , x => y(x)
So x is a real vector of three elements and y is a scalar function.
I'm assuming a weighted average model of this data:
y(x) = (a * x[0] + b * x[1] + c * x[2]) / (a+b+c) + E(x)
where E is an unknown random error term.
I need an algorithm to find a,b,c, that minimizes total sum square error:
error = sum over all x of { E(x)^2 }
for a given data set.
Assume that the weights are normalized to sum to 1 (which happily is without loss of generality), then we can re-cast the problem with c = 1 - a - b, so we are actually solving for a and b.
With this we can write
error(a,b) = sum over all x { a x[0] + b x[1] + (1 - a - b) x[2] - y(x) }^2
Now it's just a question of taking the partial derivatives d_error/da and d_error/db and setting them to zero to find the minimum.
With some fiddling, you get a system of two equations in a and b.
C(X[0],X[0],X[2]) a + C(X[0],X[1],X[2]) b = C(X[0],Y,X[2])
C(X[1],X[0],X[2]) a + C(X[1],X[1],X[2]) b = C(X[1],Y,X[2])
The meaning of X[i] is the vector of all i'th components from the dataset x values.
The meaning of Y is the vector of all y(x) values.
The coefficient function C has the following meaning:
C(p, q, r) = sum over i { p[i] ( q[i] - r[i] ) }
I'll omit how to solve the 2x2 system unless this is a problem.
If we plug in the two-element data set you gave, we should get precise coefficients because you can always approximate two points perfectly with a line. So for example the first equation coefficients are:
C(X[0],X[0],X[2]) = 9.1(9.1 - 7.4) + 4.1(4.1 - 5.2) = 10.96
C(X[0],X[1],X[2]) = -19.66
C(X[0],Y,X[2]) = 8.78
Similarly for the second equation: 4.68 -13.6 4.84
Solving the 2x2 system produces: a = 0.42515, b = -0.20958. Therefore c = 0.78443.
Note that in this problem a negative coefficient results. There is nothing to guarantee they'll be positive, though "real" data sets may produce this result.
Indeed if you compute weighted averages with these coefficients, they are 8.5 and 4.9.
For fun I also tried this data set:
X[0]         X[1]         X[2]         Y
0.018056028  9.70442075   9.368093544  6.360312244
8.138752835  5.181373099  3.824747424  5.423581239
6.296398214  4.74405298   9.837741509  7.714662742
5.177385358  1.241610571  5.028388255  4.491743107
4.251033792  8.261317658  7.415111851  6.430957844
4.720645386  1.0721718    2.187147908  2.815078796
1.941872069  1.108191586  6.24591771   3.994268819
4.220448549  9.931055481  4.435085917  5.233711923
9.398867623  2.799376317  7.982096264  7.612485261
4.971020963  1.578519218  0.462459906  2.248086465
I generated the Y values with 1/3 x[0] + 1/6 x[1] + 1/2 x[2] + E where E is a random number in [-0.1..+0.1]. If the algorithm is working correctly we'd expect to get roughly a = 1/3 and b = 1/6 from this result. Indeed we get a = .3472 and b = .1845.
OP has now said that his actual data are larger than 3-vectors. This method generalizes without much trouble. If the vectors are of length n, then you get an n-1 x n-1 system to solve.
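For vectors of any length, one way to set this up numerically is to substitute the last weight as 1 minus the sum of the others and hand the resulting linear least-squares problem to a solver. The sketch below assumes NumPy and uses a made-up function name, `fit_weights`; it minimises the sum of squared errors of the model directly, so for noisy data it need not reproduce the figures quoted above exactly. On the two-row data set from the question the fit is exact.

```python
import numpy as np


def fit_weights(X, y):
    """Least-squares weights that sum to 1 for y ~ sum_i w[i] * x[i]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    last = X[:, -1]
    # With w[-1] = 1 - sum(w[:-1]) the model becomes
    #   y - x[-1] = sum_i w[i] * (x[i] - x[-1])  for i < n-1
    D = X[:, :-1] - last[:, None]
    w, *_ = np.linalg.lstsq(D, y - last, rcond=None)
    return np.append(w, 1.0 - w.sum())


w = fit_weights([[9.1, 5.6, 7.4], [4.1, 4.4, 5.2]], [8.5, 4.9])
print(np.round(w, 5))
```

On this data set the weights come out as roughly 0.42515, -0.20958 and 0.78443.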

Efficiently compute pairwise squared Euclidean distance in Matlab

Given two sets of d-dimensional points, how can I most efficiently compute the pairwise squared Euclidean distance matrix in Matlab?
Set one is given by a (numA,d)-matrix A and set two is given by a (numB,d)-matrix B. The resulting distance matrix shall be of the format (numA,numB).
Example points:
d = 4; % dimension
numA = 100; % number of set 1 points
numB = 200; % number of set 2 points
A = rand(numA,d); % set 1 given as matrix A
B = rand(numB,d); % set 2 given as matrix B
The usually given answer here is based on bsxfun (cf. e.g. [1]). My proposed approach is based on matrix multiplication and turns out to be much faster than any comparable algorithm I could find:
helpA = zeros(numA,3*d);
helpB = zeros(numB,3*d);
for idx = 1:d
    helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*A(:,idx), A(:,idx).^2 ];
    helpB(:,3*idx-2:3*idx) = [B(:,idx).^2 , B(:,idx), ones(numB,1)];
end
distMat = helpA * helpB';
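The trick works because, per dimension, [1, -2a, a^2] · [b^2, b, 1] = b^2 - 2ab + a^2 = (a - b)^2, so one matrix product sums these terms over all dimensions. As a cross-check, here is a NumPy port of the same idea (a sketch, not part of the Matlab code; the variable names are translated and a brute-force reference is included for comparison):

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_a, num_b = 4, 100, 200
A = rng.random((num_a, d))
B = rng.random((num_b, d))

# Interleave three columns per dimension: [1, -2a, a^2] and [b^2, b, 1].
help_a = np.stack([np.ones_like(A), -2 * A, A**2], axis=2).reshape(num_a, 3 * d)
help_b = np.stack([B**2, B, np.ones_like(B)], axis=2).reshape(num_b, 3 * d)
dist_mat = help_a @ help_b.T  # shape (num_a, num_b)

# Brute-force reference via broadcasting.
reference = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
print(np.allclose(dist_mat, reference))  # True
```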
Please note:
For constant d one can replace the for-loop by hardcoded implementations, e.g.
helpA = [ones(numA,1), -2*A(:,1), A(:,1).^2, ... % d == 2
         ones(numA,1), -2*A(:,2), A(:,2).^2 ]; % etc.
%% create some points
d = 2; % dimension
numA = 20000;
numB = 20000;
A = rand(numA,d);
B = rand(numB,d);
%% pairwise distance matrix
% proposed method:
tic;
helpA = zeros(numA,3*d);
helpB = zeros(numB,3*d);
for idx = 1:d
    helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*A(:,idx), A(:,idx).^2 ];
    helpB(:,3*idx-2:3*idx) = [B(:,idx).^2 , B(:,idx), ones(numB,1)];
end
distMat = helpA * helpB';
toc;
% compare to pdist2 (assumed call; requires the Statistics Toolbox):
tic;
distMat = pdist2(A,B).^2;
toc;
% compare to [1] (assumed to be the bsxfun-based expansion):
tic;
distMat = bsxfun(@plus, dot(A,A,2), dot(B,B,2)') - 2*(A*B');
toc;
% Another method: added 07/2014
% compare to ndgrid method (cf. Dan's comment)
tic;
[idxA,idxB] = ndgrid(1:numA,1:numB);
distMat = zeros(numA,numB);
distMat(:) = sum((A(idxA,:) - B(idxB,:)).^2,2);
toc;
Elapsed time is 1.796201 seconds.
Elapsed time is 5.653246 seconds.
Elapsed time is 3.551636 seconds.
Elapsed time is 22.461185 seconds.
For a more detailed evaluation w.r.t. dimension and number of data points follow the discussion below (#comments). It turns out that different algos should be preferred in different settings. In non time critical situations just use the pdist2 version.
Further development:
One can think of replacing the squared euclidean by any other metric based on the same principle:
help = zeros(numA,numB,d);
for idx = 1:d
    help(:,:,idx) = [ones(numA,1), A(:,idx) ] * ...
                    [B(:,idx)' ; -ones(1,numB)];
end
distMat = sum(ANYFUNCTION(help),3);
Nevertheless, this is quite time consuming. It could be useful to replace for smaller d the 3-dimensional matrix help by d 2-dimensional matrices. Especially for d = 1 it provides a method to compute the pairwise difference by a simple matrix multiplication:
pairDiffs = [ones(numA,1), A ] * [B'; -ones(1,numB)];
Do you have any further ideas?
For squared Euclidean distance one can also use the following formula
||a-b||^2 = ||a||^2 + ||b||^2 - 2<a,b>
Where <a,b> is the dot product between a and b
nA = sum( A.^2, 2 ); %// squared norm of each row of A
nB = sum( B.^2, 2 ); %// squared norm of each row of B
distMat = bsxfun( @plus, nA, nB' ) - 2 * A * B' ;
Recently, I've been told that as of R2016b this method for computing the squared Euclidean distance is faster than the accepted method.
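For readers outside Matlab, the same expansion can be sketched in NumPy, where broadcasting plays the role of bsxfun. The function name `sq_dists` is made up for illustration, and the final clamp is an addition that is not in the Matlab snippet: cancellation in floating point can leave tiny negative values where two points nearly coincide.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2<a,b>."""
    nA = (A**2).sum(axis=1)[:, None]   # column of squared row norms of A
    nB = (B**2).sum(axis=1)[None, :]   # row of squared row norms of B
    # Clamp at zero to hide small negative values caused by rounding.
    return np.maximum(nA + nB - 2 * A @ B.T, 0.0)


rng = np.random.default_rng(1)
A = rng.random((100, 4))
B = rng.random((200, 4))
D = sq_dists(A, B)
print(D.shape)  # (100, 200)
```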

Determining Floating Point Square Root

How do I determine the square root of a floating point number? Is the Newton-Raphson method a good way? I have no hardware square root either. I also have no hardware divide (but I have implemented floating point divide).
If possible, I would prefer to reduce the number of divides as much as possible since they are so expensive.
Also, what should be the initial guess to reduce the total number of iterations???
Thank you so much!
When you use Newton-Raphson to compute a square-root, you actually want to use the iteration to find the reciprocal square root (after which you can simply multiply by the input--with some care for rounding--to produce the square root).
More precisely: we use the function f(x) = x^-2 - n. Clearly, if f(x) = 0, then x = 1/sqrt(n). This gives rise to the newton iteration:
x_(i+1) = x_i - f(x_i)/f'(x_i)
        = x_i - (x_i^-2 - n)/(-2 x_i^-3)
        = x_i + (x_i - n x_i^3)/2
        = x_i*(3/2 - (1/2) n x_i^2)
Note that (unlike the iteration for the square root), this iteration for the reciprocal square root involves no divisions, so it is generally much more efficient.
I mentioned in your question on divide that you should look at existing soft-float libraries, rather than re-inventing the wheel. That advice applies here as well. This function has already been implemented in existing soft-float libraries.
Edit: the questioner seems to still be confused, so let's work an example: sqrt(612). 612 is 1.1953125 x 2^9 (or b1.0011001 x 2^9, if you prefer binary). Pull out the even portion of the exponent (9) to write the input as f * 2^(2m), where m is an integer and f is in the range [1,4). Then we will have:
sqrt(n) = sqrt(f * 2^(2m)) = sqrt(f) * 2^m
applying this reduction to our example gives f = 1.1953125 * 2 = 2.390625 (b10.011001) and m = 4. Now do a newton-raphson iteration to find x = 1/sqrt(f), using a starting guess of 0.5 (as I noted in a comment, this guess converges for all f, but you can do significantly better using a linear approximation as an initial guess):
x_0 = 0.5
x_1 = x_0*(3/2 - 1/2 * 2.390625 * x_0^2)
    = 0.6005859...
x_2 = x_1*(3/2 - 1/2 * 2.390625 * x_1^2)
    = 0.6419342...
x_3 = 0.6467077...
x_4 = 0.6467616...
So even with a (relatively bad) initial guess, we get rapid convergence to the true value of 1/sqrt(f) = 0.6467616600226026.
Now we simply assemble the final result:
sqrt(f) = x_n * f = 1.5461646...
sqrt(n) = sqrt(f) * 2^m = 24.738633...
And check: sqrt(612) = 24.738633...
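The worked example can be reproduced with a few lines of Python. This is only a sketch of the steps above, not the soft-float implementation mentioned below: Python floats stand in for the software floating-point type, `math.frexp` and `math.ldexp` stand in for direct exponent manipulation, and the function name `soft_sqrt` and the fixed iteration count are assumptions made for illustration.

```python
import math


def soft_sqrt(n, iterations=7):
    """sqrt(n) for n > 0 via a division-free Newton iteration on 1/sqrt(f)."""
    mant, exp = math.frexp(n)        # n = mant * 2**exp with mant in [0.5, 1)
    # Rewrite n as f * 2**(2*m) with f in [1, 4).
    if exp % 2 == 0:
        f, m = mant * 4.0, (exp - 2) // 2
    else:
        f, m = mant * 2.0, (exp - 1) // 2
    x = 0.5                          # starting guess for 1/sqrt(f)
    for _ in range(iterations):
        x = x * (1.5 - 0.5 * f * x * x)   # multiplications only
    return math.ldexp(x * f, m)      # sqrt(f) * 2**m


print(soft_sqrt(612.0))   # close to 24.7386...
```

A fixed iteration count is used here for simplicity; a better initial guess, as mentioned above, would allow fewer iterations.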
Obviously, if you want correct rounding, careful analysis is needed to ensure that you carry sufficient precision at each stage of the computation. This requires careful bookkeeping, but it isn't rocket science. You simply keep careful error bounds and propagate them through the algorithm.
If you want correct rounding without explicitly checking a residual, you need to compute sqrt(f) to a precision of 2p + 2 bits (where p is the precision of the source and destination type). However, you can also take the strategy of computing sqrt(f) to a little more than p bits, squaring that value, and adjusting the trailing bit by one if necessary (which is often cheaper).
sqrt is nice in that it is a unary function, which makes exhaustive testing for single-precision feasible on commodity hardware.
You can find the OS X soft-float sqrtf function on opensource.apple.com, which uses the algorithm described above (I wrote it, as it happens). It is licensed under the APSL, which may or may not be suitable for your needs.
Probably (still) the fastest implementation for finding the inverse square root and the 10 lines of code that I adore the most.
It's based on Newton Approximation, but with a few quirks. There's even a great story around this.
Easiest to implement (you can even implement this in a calculator):
def sqrt(x, TOL=0.000001):
    y = 1.0  # initial guess
    while abs(x/y - y) > TOL:
        y = (y + x/y)/2.0
    return y
This is exactly the Newton-Raphson iteration:
y(new) = y - f(y)/f'(y)
f(y) = y^2-x and f'(y) = 2y
Substituting these values:
y(new) = y - (y^2-x)/(2y) = (y^2+x)/(2y) = (y+x/y)/2
If division is expensive you should consider: http://en.wikipedia.org/wiki/Shifting_nth-root_algorithm .
Shifting algorithms:
Let us assume you have two numbers a and b such that the least significant set bit of a is larger than b, and b has only one bit equal to 1 (e.g. a=1000 and b=10 in binary). Let s(b) = log_2(b) (which is just the position of the 1 bit in b).
Assume we already know the value of a^2. Now (a+b)^2 = a^2 + 2ab + b^2. a^2 is already known, 2ab: shift a by s(b)+1, b^2: shift b by s(b).
Initialize a such that a has only one bit equal to one and a^2 <= n < (2*a)^2.
Let q = s(a).
b = a
sqra = a*a
For i = q-1 to -10 (or whatever significance you want):
    b = b/2 (so that b = 2^i)
    sqrab = sqra + 2ab + b^2
    if sqrab <= n:
        sqra = sqrab
        a = a + b
Example with n = 612:
a=10000 (16)
sqra = 256
Iteration 1:
b=01000 (8)
sqrab = (a+b)^2 = 24^2 = 576
sqrab < n => a=a+b = 24
Iteration 2:
b = 4
sqrab = (a+b)^2 = 28^2 = 784
sqrab > n => a=a
Iteration 3:
b = 2
sqrab = (a+b)^2 = 26^2 = 676
sqrab > n => a=a
Iteration 4:
b = 1
sqrab = (a+b)^2 = 25^2 = 625
sqrab > n => a=a
Iteration 5:
b = 0.5
sqrab = (a+b)^2 = 24.5^2 = 600.25
sqrab < n => a=a+b = 24.5
Iteration 6:
b = 0.25
sqrab = (a+b)^2 = 24.75^2 = 612.5625
sqrab > n => a=a
Iteration 7:
b = 0.125
sqrab = (a+b)^2 = 24.625^2 = 606.390625
sqrab < n => a=a+b = 24.625
and so on.
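The iterations above can be checked with a short Python sketch. Floats are used for readability; in a fixed-point or soft-float setting, the products `2 * a * b` and `b * b` reduce to shifts because b is always a power of two. The function name `shift_sqrt` and the parameter `frac_bits` are made up for illustration.

```python
def shift_sqrt(n, frac_bits=10):
    """Bit-by-bit square root of n >= 1, accurate to 2**-frac_bits."""
    a = 1.0
    while (2 * a) * (2 * a) <= n:    # largest power of two with a^2 <= n
        a *= 2
    sqra = a * a                     # running value of a^2
    b = a
    while b > 2.0 ** -frac_bits:
        b /= 2
        sqrab = sqra + 2 * a * b + b * b   # (a+b)^2 built from the known a^2
        if sqrab <= n:               # keep this bit only if we do not overshoot
            a, sqra = a + b, sqrab
    return a


print(shift_sqrt(612, 3))    # 24.625, the value reached after iteration 7
print(shift_sqrt(612, 20))   # approaches sqrt(612) from below
```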
A good approximation to square root on the range [1,4) is
def sqrt(x):
    # Degree-6 polynomial in x, evaluated with Horner's scheme
    y = x*-0.000267
    y = x*(0.004686+y)
    y = x*(-0.034810+y)
    y = x*(0.144780+y)
    y = x*(-0.387893+y)
    y = x*(0.958108+y)
    return y+0.315413
Normalise your floating point number so the mantissa is in the range [1,4), use the above algorithm on it, and then divide the exponent by 2. No floating point divisions anywhere.
With the same CPU time budget you can probably do much better, but that seems like a good starting point.
