Why is fitting a polynomial faster than changing polynomial basis? - performance

Given some data points on the interval [–1, 1] and the best-fit Chebyshev polynomial to those points, I want to convert the Chebyshev polynomial to a Legendre polynomial.
There are 2 ways to do it, as shown in the code below. The direct way is to call convert(kind = Legendre) on the Chebyshev polynomial, which took 19.591 seconds. The alternative is to call Legendre.fit on the data points, which took only 3.356 seconds.
import numpy as np
from numpy.polynomial import Chebyshev, Legendre
x = np.linspace(-1, 1, 1000)
y = 1.0 / (1 + x ** 2) + 1e-3 * np.random.random(1000)
T = Chebyshev.fit(x, y, 99)
from timeit import timeit
timeit("T.convert(kind = Legendre)", setup = "from __main__ import x, y, T, Legendre",
number = 200)
timeit("Legendre.fit(x, y, 99)", setup = "from __main__ import x, y, Legendre",
number = 200)
Question: Why is Legendre.fit much faster than convert(kind = Legendre)? Am I doing it wrongly?

Related

Generate matrix in Keras based on row and column

How I could create a layer in Keras that ouputs matrix given dimensions (e.g. m, n) with cells having a value based on the row and column?
Here is the forumula:
A[i, 2j] = i / (10**(2*j))
A[i, 2j+1] = i / (10**(2*j))
I tried to look on the lamba function but it seems Keras passed only the cell value and not the indices! Any other options (not a loop)
You could do the following:
from keras.layers import Input
import keras.backend as K
import numpy as np
def CustomConstantInput(m, n):
x = np.arange(m)
y = 10 ** (2 * (np.arange(n) // 2))
matrix = x[:, None] / y[None, :]
print(matrix)
fixed_input = K.constant(matrix)
return Input(tensor=fixed_input)
t = CustomConstantInput(3, 4)

Fixing the intercept in statsmodels ols

In Python's statsmodels.formula.api, the ols functionality automatically includes and estimates an intercept:
results = sm.ols(formula="s ~ x + y + z", data=somedata).fit()
results.params
(* Intercept 0.632646, x -1.258761, y 0.465076, z 0.497991 *)
Because I'm using it in a linear probability model, is there any way to fix the intercept to 0.5?
You can reproduce this behavior in 2 steps:
Subtract the predefined_intercept from your targets
Fit OLS without intercept: include "-1" in your formula
Minimal example:
from statsmodels.formula.api import ols
import pandas as pd
import numpy as np
n_samples = 100
predefined_intercept = 0.5
somedata = pd.DataFrame(np.random.random((n_samples, 3)), columns = ['x', 'y', 'z'])
somedata['s'] = somedata['x'] - 2 * somedata['y'] + 5 * somedata['z'] - predefined_intercept
results = ols(formula="s ~ x + y + z - 1", data=somedata).fit()
print(results.params)
Output:
x 0.671561
y -2.315076
z 4.759542
See an official example notebook on formulas for detailed explanations and more.

Finding center point given distance matrix

I have a matrix (really a loaded image) in which every element is a L2 distance from some unknown center point.
Here is a trivial example
A = [1.4142 1.0000 1.4142 2.2361]
[1.0000 0.0000 1.0000 2.0000]
[1.4142 1.0000 1.4142 2.2361]
In this case, the center is obviously at coordinate (1,1) (index A[1,1] in a 0-indexed matrix or 2D array).
However, in the case where my centers are not constrained to be integer indices, it's no longer as obvious. For example, given this matrix B, where is my center coordinate?
B = [3.0292 1.9612 2.8932 5.8252]
[1.2292 0.1612 1.0932 4.0252]
[1.4292 0.3612 1.2932 4.2252]
How would you find that the answer in this case is at row 1.034 and column 1.4?
I am aware of the trilateration solution (having provided MATLAB code to visualize that in 3D previously), but is there a more efficient way (e.g. one without a matrix inversion)?
This question is sort of language agnostic, as I am looking more for algorithmic help. If you could stick to MATLAB, Python, or C++ though in a solution, that would be great ;-).
While having no experience with similar tasks, i read some stuff and also tried something.
When unfamiliar with this topic it's hard to grasp it seems and all those resources i found are a bit chaotic.
Still unclear in regards to theory for me:
is the problem as stated above a convex-optimization problem (local-minimum = global-minimum; would mean access to powerful solvers!)
there are much more resources about more generic problems (Sensor Network
Localization), which are non-convex and where extremely complex methods have been developed
is your trilateration-approach able to exploit > 3 points (trilateration vs. multilateration; at least this code does not seem like it can which means: bad performance with noise!)
Here some example code with two approaches:
A: Convex-optimization: SOCP-Relaxation
Follows SECOND-ORDER CONE PROGRAMMING RELAXATION OF SENSOR NETWORK LOCALIZATION
Not impressive performance, but should be powerful as approximation for big-data
Guaranteed global-optimum for this relaxation!
Implemented with cvxpy
B: Nonlinear-programming optimization
Implemented using scipy.optimize
Pretty much perfect in my synthetic experiments; even good results in noisy case; despite the fact we are using numerical-differentiation (automatic-diff hard to use here)
Some additional remark:
Your example B surely has some (pretty bad) noise or some other problem in my opinion, as my approaches are completely off; while especially approach B shines for my synthetic-data (at least that's my impression)
Code:
import numpy as np
import cvxpy as cvx
from scipy.spatial.distance import cdist
from scipy.optimize import minimize
np.random.seed(1)
""" Create noise-free (not anymore!) fake-problem """
real_x = np.random.random(size=2) * 3
M, N = 5, 10
NOISE_DISTS = 0.1
pos = np.array([(i,j) for i in range(M) for j in range(N)]) # ugly -> tile/repeat/stack
real_x_stacked = np.vstack([real_x for i in range(pos.shape[0])])
Y = cdist(pos, real_x[np.newaxis])
Y += np.random.normal(size=Y.shape)*NOISE_DISTS # Let's add some noise!
print('-----')
print('PROBLEM')
print('-------')
print('real x: ', real_x)
print('dist mat: ', np.round(Y,3).T)
""" Helper """
def cost(x, Y, pos):
res = np.linalg.norm(pos - x, ord=2, axis=1) - Y.ravel()
return np.linalg.norm(res, 2)
print('cost with real_x (check vs. noisy): ', cost(real_x, Y, pos))
""" SOLVER SOCP """
def solve_socp_relax(pos, Y):
x = cvx.Variable(2)
y = cvx.Variable(pos.shape[0])
fake_stack = [x for i in range(pos.shape[0])] # hacky
objective = cvx.sum_entries(cvx.norm(y - Y))
x_stacked = cvx.reshape(cvx.vstack(*fake_stack), pos.shape[0], 2) # hacky
constraints = [cvx.norm(pos - x_stacked, 2, axis=1) <= y]
problem = cvx.Problem(cvx.Minimize(objective), constraints)
problem.solve(solver=cvx.ECOS, verbose=False)
return x.value.T
""" SOLVER NLP """
def solve_nlp(pos, Y):
sol = minimize(cost, np.zeros(pos.shape[1]), args=(Y, pos), method='BFGS')
# print(sol)
return sol.x
""" TEST """
print('-----')
print('SOLVE')
print('-----')
socp_relax_sol = solve_socp_relax(pos, Y)
print('SOCP RELAX SOL: ', socp_relax_sol)
nlp_sol = solve_nlp(pos, Y)
print('NLP SOL: ', nlp_sol)
Output:
-----
PROBLEM
-------
real x: [ 1.25106601 2.16097348]
dist mat: [[ 2.444 1.599 1.348 1.276 2.399 3.026 4.07 4.973 6.118 6.746
2.143 1.149 0.412 0.766 1.839 2.762 3.851 4.904 5.734 6.958
2.377 1.432 0.856 1.056 1.973 2.843 3.885 4.95 5.818 6.84
2.711 2.015 1.689 1.939 2.426 3.358 4.385 5.22 6.076 6.97
3.422 3.153 2.759 2.81 3.326 4.162 4.734 5.627 6.484 7.336]]
cost with real_x (check vs. noisy): 0.665125233772
-----
SOLVE
-----
SOCP RELAX SOL: [[ 1.95749275 2.00607253]]
NLP SOL: [ 1.23560791 2.16756168]
Edit: Further speedup can be achieved (especially in large-scale) in using nonlinear-least-squares instead of the more general NLP-approach! My results are still the same (as expected if the problem would be convex). Timings between NLP/NLS can look like 9 vs. 0.5 seconds!
This is my recommended method!
def solve_nls(pos, Y):
def res(x, Y, pos):
return np.linalg.norm(pos - x, ord=2, axis=1) - Y.ravel()
sol = least_squares(res, np.zeros(pos.shape[1]), args=(Y, pos), method='lm')
# print(sol)
return sol.x
Especially the second-approach (NLP) will also run for much bigger instances (cvxpy's overhead hurts; that's not a downside of the SOCP-solver which should scale much much better!).
Here some output for M, N = 500, 1000 with some more noise:
-----
PROBLEM
-------
real x: [ 12.51066014 21.6097348 ]
dist mat: [[ 24.706 23.573 23.693 ..., 1090.29 1091.216
1090.817]]
cost with real_x (check vs. noisy): 353.354267797
-----
SOLVE
-----
NLP SOL: [ 12.51082419 21.60911561]
used: 5.9552763315495625 # SECONDS
So in my experiments it works, but i won't give any global-convergence guarantees or reconstruction-guarantees (still missing some theory).
At first i though about using the global optimum of the relaxed-SOCP-problem as initial-point in the NLP-solver, but i did not find any example where this is needed!
Some just-for-fun visuals using:
M, N = 20, 30
NOISE_DISTS = 0.2
...
import matplotlib.pyplot as plt
plt.imshow(Y.reshape(M, N), cmap='viridis', interpolation='none')
plt.colorbar()
plt.scatter(nlp_sol[1], nlp_sol[0], color='red', s=20)
plt.xlim((0, N))
plt.ylim((0, M))
plt.show()
And some super noisy case (nice performance!):
M, N = 50, 100
NOISE_DISTS = 5
-----
PROBLEM
-------
real x: [ 12.51066014 21.6097348 ]
dist mat: [[ 22.329 18.745 27.588 ..., 94.967 80.034 91.206]]
cost with real_x (check vs. noisy): 354.527196716
-----
SOLVE
-----
NLP SOL: [ 12.44158986 21.50164637]
used: 0.01050068340320306
If I understand correctly, you have a matrix A, where A[i,j] holds the distance from (i,j) to some unknown point (y,x). You could find (y,x) like this:
Square each element of A, to make a matrix B say.
We then want to find (y,x) so
(y-i)*(y-i) + (x-j)*(x-j) = B[i,j]
Subtracting each equation from the 0,0 equation and rearranging:
2*i*y + 2*j*x = B[0,0] + i*i + j*j - B[i,j]
This can be solved by linear least squares. Note that since there are 2 unknowns, the matix inversion (better, factorisation) involved will be on a 2x2 matrix and so not time consuming. You could indeed, given just the dimensions of A, work out the required matrix and its inverse analytically.

Testing GPU with tensorflow matrix multiplication

As many machine learning algorithms rely to matrix multiplication(or at least can be implemented using matrix multiplication) to test my GPU is I plan to create matrices a , b , multiply them and record time it takes for computation to complete.
Here is code that will generate two matrices of dimensions 300000,20000 and multiply them :
import tensorflow as tf
import numpy as np
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
#a = np.array([[1, 2, 3], [4, 5, 6]])
#b = np.array([1, 2, 3])
a = np.random.rand(300000,20000)
b = np.random.rand(300000,20000)
println("Init complete");
result = tf.mul(a , b)
v = sess.run(result)
print(v)
Is this a sufficient test to compare performance of GPU's ? What other factors should I consider ?
Here's an example of a matmul benchmark which avoids common pitfalls, and matches the official 11 TFLOP mark on Titan X Pascal.
import os
import sys
os.environ["CUDA_VISIBLE_DEVICES"]="1"
import tensorflow as tf
import time
n = 8192
dtype = tf.float32
with tf.device("/gpu:0"):
matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype))
matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype))
product = tf.matmul(matrix1, matrix2)
# avoid optimizing away redundant nodes
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())
iters = 10
# pre-warming
sess.run(product.op)
start = time.time()
for i in range(iters):
sess.run(product.op)
end = time.time()
ops = n**3 + (n-1)*n**2 # n^2*(n-1) additions, n^3 multiplications
elapsed = (end - start)
rate = iters*ops/elapsed/10**9
print('\n %d x %d matmul took: %.2f sec, %.2f G ops/sec' % (n, n,
elapsed/iters,
rate,))

Integration of orbits with solar system gravity fields from Skyfield - speed issues

In the time tests shown below, I found that Skyfield takes several hundred microseconds up to a millisecond to return obj.at(jd).position.km for a single time value in jd, but the incremental cost for longer JulianDate objects (a list of points in time) is only about one microsecond per point. I see similar speeds using Jplephem and with two different ephemerides.
My question here is: if I want to random-access points in time, for example as a slave to an external Runge-Kutta routine which uses its own variable stepsize, is there a way I can do this faster within python (without having to learn to compile code)?
I understand this is not at all the typical way Skyfield is intended to be used. Normally we'd load a JulianDate object with a long list of time points and then calculate them at once, and probably do that a few times, not thousands of times (or more), the way an orbit integrator might do.
Workaround: I can imagine a work-around where I build my own NumPy database by running Skyfield once using a JulianDate object with fine time granularity, then writing my own Runge-Kutta routine which changes step sizes up and down by discrete amounts such that the timesteps always correspond directly to the striding of NumPy array.
Or I could even try re-interpolating. I am not doing highly precise calculations so a simple NumPy or SciPy 2nd order might be fine.
Ultimately I'd like to try integrating the path of objects under the influence of the gravity field of the solar system (e.g. deep-space satellite, comet, asteroid). When looking for an orbit solution one might try millions of starting state vectors in 6D phase space. I know I should be using things like ob.at(jd).observe(large_body).position.km method because gravity travels at the speed of light like everything else. This seems to cost substantial time since (I'm guessing) it's an iterative calculation ("Let's see... where would Jupiter have been such that I feel it's gravity right NOW"). But let's peel the cosmic onion one layer at a time.
Figure 1. Skyfield and JPLephem performance on my laptop for different length JulianDate objects, for de405 and de421. They are all about the same - (very) roughly about a half-millisecond for the first point and a microsecond for each additional point. Also the very first point to be calculated when the script runs (Earth (blue) with len(jd) = 1) has an additional millisecond artifact.
Earth and Moon are slower because it is a two-step calculation internally (the Earth-Moon Barycenter plus the individual orbits about the Barycenter). Mercury may be slower because it moves so fast compared to the ephemeris time steps that it requires more coefficients in the (costly) Chebyshev interpolation?
SCRIPT FOR SKYFIELD DATA the JPLephem script is farther down
import numpy as np
import matplotlib.pyplot as plt
from skyfield.api import load, JulianDate
import time
ephem = 'de421.bsp'
ephem = 'de405.bsp'
de = load(ephem)
earth = de['earth']
moon = de['moon']
earth_barycenter = de['earth barycenter']
mercury = de['mercury']
jupiter = de['jupiter barycenter']
pluto = de['pluto barycenter']
things = [ earth, moon, earth_barycenter, mercury, jupiter, pluto ]
names = ['earth', 'moon', 'earth barycenter', 'mercury', 'jupiter', 'pluto']
ntimes = [i*10**n for n in range(5) for i in [1, 2, 5]]
years = [np.zeros(1)] + [np.linspace(0, 100, n) for n in ntimes[1:]] # 100 years
microsecs = []
for y in years:
jd = JulianDate(utc=(1900 + y, 1, 1))
mics = []
for thing in things:
tstart = time.clock()
answer = thing.at(jd).position.km
mics.append(1E+06 * (time.clock() - tstart))
microsecs.append(mics)
microsecs = np.array(microsecs).T
many = [len(y) for y in years]
fig = plt.figure()
ax = plt.subplot(111, xlabel='length of JD object',
ylabel='microseconds',
title='time for thing.at(jd).position.km with ' + ephem )
for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
ax.get_xticklabels() + ax.get_yticklabels()):
item.set_fontsize(item.get_fontsize() + 4) # http://stackoverflow.com/a/14971193/3904031
for name, mics in zip(names, microsecs):
ax.plot(many, mics, lw=2, label=name)
plt.legend(loc='upper left', shadow=False, fontsize='x-large')
plt.xscale('log')
plt.yscale('log')
plt.savefig("skyfield speed test " + ephem.split('.')[0])
plt.show()
SCRIPT FOR JPLEPHEM DATA the Skyfield script is above
import numpy as np
import matplotlib.pyplot as plt
from jplephem.spk import SPK
import time
ephem = 'de421.bsp'
ephem = 'de405.bsp'
kernel = SPK.open(ephem)
jd_1900_01_01 = 2415020.5004882407
ntimes = [i*10**n for n in range(5) for i in [1, 2, 5]]
years = [np.zeros(1)] + [np.linspace(0, 100, n) for n in ntimes[1:]] # 100 years
barytup = (0, 3)
earthtup = (3, 399)
# moontup = (3, 301)
microsecs = []
for y in years:
mics = []
#for thing in things:
jd = jd_1900_01_01 + y * 365.25 # roughly, it doesn't matter here
tstart = time.clock()
answer = kernel[earthtup].compute(jd) + kernel[barytup].compute(jd)
mics.append(1E+06 * (time.clock() - tstart))
microsecs.append(mics)
microsecs = np.array(microsecs)
many = [len(y) for y in years]
fig = plt.figure()
ax = plt.subplot(111, xlabel='length of JD object',
ylabel='microseconds',
title='time for jplephem [0,3] and [3,399] with ' + ephem )
# from here: http://stackoverflow.com/a/14971193/3904031
for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
ax.get_xticklabels() + ax.get_yticklabels()):
item.set_fontsize(item.get_fontsize() + 4)
#for name, mics in zip(names, microsecs):
ax.plot(many, microsecs, lw=2, label='earth')
plt.legend(loc='upper left', shadow=False, fontsize='x-large')
plt.xscale('log')
plt.yscale('log')
plt.ylim(1E+02, 1E+06)
plt.savefig("jplephem speed test " + ephem.split('.')[0])
plt.show()

Resources