Optimal parameters not found: Number of calls to function has reached maxfev = 1000 - curve fitting

I'm new to Python. I'm trying to fit this data, but when I get the graph, only the original data appears, along with the message "Optimal parameters not found: Number of calls to function has reached maxfev = 1000." Could you help me find my mistake?
%matplotlib inline
import matplotlib.pylab as m
from scipy.optimize import curve_fit
import numpy as num
import scipy.optimize as optimize

xData = num.array([0, 0, 100, 200, 250, 300, 400], dtype="float")
yData = num.array([0, 0, 0, 0, 75, 100, 100], dtype="float")

m.plot(xData, yData, 'ro', label='Datos originales')

def fun(x, a, b):
    return a + b * num.log(x)

popt, pcov = optimize.curve_fit(fun, xData, yData, p0=[1, 1], maxfev=1000)
print(popt)

x = num.linspace(1, 400, 7)
m.plot(x, fun(x, *popt), label='Función ajustada')
m.xlabel('concentración')
m.ylabel('% mortalidad')
m.legend()
m.grid()

The model in your code is "a + b * num.log(x)". Because your data contains an x value of 0.0, the evaluation of log(0.0) gives errors and will not allow the fitting software to function. Sometimes these x values of 0.0 can be replaced with very small numbers, as log(small number) will not fail - but in this case the equation and data do not appear to match and so using that technique alone would not be sufficient here.
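As a quick illustration of that replacement idea only (a sketch; the epsilon value 1e-6 is an arbitrary choice, and as noted this is not a good fix for this particular data set):
import numpy as np
from scipy.optimize import curve_fit

xData = np.array([0, 0, 100, 200, 250, 300, 400], dtype="float")
yData = np.array([0, 0, 0, 0, 75, 100, 100], dtype="float")

def fun(x, a, b):
    return a + b * np.log(x)

# replace x = 0 with a small positive value so log() can be evaluated
xSafe = np.where(xData == 0.0, 1e-6, xData)

popt, pcov = curve_fit(fun, xSafe, yData, p0=[1, 1])
print(popt)  # the fit now runs, but the log model still matches this data poorly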
My thought is that a different equation would be a better model for this data. I performed an equation search using your data, and found that several different sigmoidal type equations gave suspiciously good fits to this data set - which is not surprising because of the small number of data points.
The sigmoidal equations I tried were all extremely sensitive to the initial parameter estimates. Here is a graphical Python fitter using scipy's Differential Evolution genetic algorithm module to determine the initial parameter estimates for curve_fit's non-linear solver. That scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, requiring bounds within which to search. Here those bounds are taken from the data maximum and minimum values.
I personally would not use this fit, precisely because the small number of data points is giving such suspiciously good fits, and I strongly recommend taking additional data points if at all possible. However, I could not find any equation with fewer than three parameters that would fit this data.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

xData = numpy.array([0, 0, 100, 200, 250, 300, 400], dtype="float")
yData = numpy.array([0, 0, 0, 0, 75, 100, 100], dtype="float")


def func(x, a, b, c):  # Sigmoid B equation from zunzun.com
    return a / (1.0 + numpy.exp(-1.0 * (x - b) / c))


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)

    parameterBounds = []
    parameterBounds.append([minX, maxX])  # search bounds for a
    parameterBounds.append([minX, maxX])  # search bounds for b
    parameterBounds.append([0.0, 2.0])    # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x


# by default, differential_evolution polishes its result with a local minimizer, using the parameter bounds
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters)

absError = modelPredictions - yData
SE = numpy.square(absError)  # squared errors
MSE = numpy.mean(SE)         # mean squared errors
RMSE = numpy.sqrt(MSE)       # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()

##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData), 100)
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

Related

Subtracting a best fit line with numpy.polyfit()?

So I'm working on a project and I have a set of data that I loaded in as a csv. The data has a spot that I need to flatten out. I used the numpy.polyfit() function to find a line of best fit, but what I can't seem to figure out is how to subtract off the best fit line. Any advice?
Here is the code I'm using so far:
import pandas as pd
import numpy as np

μ = pd.read_csv("C:\\Users\\ander\\Documents\\Data\\plots and code\\dataframe2.csv")
yvalue = "average"
xvalue = "xvalue"
X = μ[xvalue][173:852]
Y = μ[yvalue][173:852]
fit = np.polyfit(X, Y, 1)
μ = μ.subtract(fit, μ)
The polyfit function finds the linear coefficient of the best fit. In order to subtract the line from your data, you first need to create the linear function itself. For example, you can use the numpy.poly1d function.
I'll show you an example. Since we don't have access to the .csv file I made up X and Y:
import matplotlib.pyplot as plt
import numpy as np
DATA_SIZE = 500
μ_X = np.sort(np.random.uniform(0,10,DATA_SIZE))
μ_Y = 3*np.exp(-(μ_X-7)**2) + np.random.normal(0,0.08,DATA_SIZE) + 0.5*μ_X
X = μ_X[50:200]
Y = μ_Y[50:200]
plt.scatter(μ_X, μ_Y, label='Full data')
plt.scatter(X, Y, label='Selected region')
plt.legend()
plt.show()
Now we can fit the baseline from the orange data and subtract the linear function from all the data (blue).
fit = np.polyfit(X, Y, 1)
linear_baseline = np.poly1d(fit) # create the linear baseline function
μ_Y = μ_Y - linear_baseline(μ_X) # subtract the baseline from μ_Y
plt.scatter(μ_X, μ_Y, label='Linear baseline removed')
plt.legend()
plt.show()
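Applied to the original dataframe, the same idea would look roughly like this (a sketch assuming the column names and row range from the question; the file path is shortened here):
import numpy as np
import pandas as pd

df = pd.read_csv("dataframe2.csv")   # the question's csv file
X = df["xvalue"][173:852]            # region used to fit the baseline
Y = df["average"][173:852]

fit = np.polyfit(X, Y, 1)            # slope and intercept of the baseline
linear_baseline = np.poly1d(fit)     # callable baseline function

# subtract the baseline evaluated at every x, not only in the fitted region
df["average_flat"] = df["average"] - linear_baseline(df["xvalue"])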

SciPy: von Mises distribution on a half circle?

I'm trying to figure out the best way to define a von-Mises distribution wrapped on a half-circle (I'm using it to draw directionless lines at different concentrations). I'm currently using SciPy's vonmises.rvs(). Essentially, I want to be able to put in, say, a mean orientation of pi/2 and have the distribution truncated to no more than pi/2 either side.
I could use a truncated normal distribution, but I will lose the wrapping of the von-mises (say if I want a mean orientation of 0)
I've seen this done in research papers looking at mapping fibre orientations, but I can't figure out how to implement it (in python). I'm a bit stuck on where to start.
If my von Mises is defined as (from numpy.vonmises):
np.exp(kappa*np.cos(x-mu))/(2*np.pi*i0(kappa))
with:
mu, kappa = 0, 4.0
x = np.linspace(-np.pi, np.pi, num=51)
How would I alter it to wrap around a half-circle instead?
Could anyone with some experience with this offer some guidance?
It is useful to have direct numerical inverse CDF sampling; it should work well for a distribution with a bounded domain. Here is a code sample that builds PDF and CDF tables and samples using the inverse CDF method. It could be optimized and vectorized, of course.
Code, Python 3.8, x64 Windows 10
import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate as integrate

def PDF(x, μ, κ):
    return np.exp(κ*np.cos(x - μ))

N = 201

μ = np.pi/2.0
κ = 4.0

xlo = μ - np.pi/2.0
xhi = μ + np.pi/2.0

# PDF normalization
I = integrate.quad(lambda x: PDF(x, μ, κ), xlo, xhi)
print(I)
I = I[0]

x = np.linspace(xlo, xhi, N, dtype=np.float64)
step = (xhi-xlo)/(N-1)

p = PDF(x, μ, κ)/I  # PDF table

# making CDF table
c = np.zeros(N, dtype=np.float64)
for k in range(1, N):
    c[k] = integrate.quad(lambda x: PDF(x, μ, κ), xlo, x[k])[0] / I
c[N-1] = 1.0  # so random() in [0...1) range would work right

#%%
# sampling from tabular CDF via inverse CDF method
def InvCDFsample(c, x, gen):
    r = gen.random()
    i = np.searchsorted(c, r, side='right')
    q = (r - c[i-1]) / (c[i] - c[i-1])
    return (1.0 - q) * x[i-1] + q * x[i]

# sampling test
RNG = np.random.default_rng()
s = np.empty(20000)
for k in range(0, len(s)):
    s[k] = InvCDFsample(c, x, RNG)

# plotting PDF, CDF and sampling density
plt.plot(x, p, 'b^')  # PDF
plt.plot(x, c, 'r.')  # CDF
n, bins, patches = plt.hist(s, x, density=True, color='green', alpha=0.7)
plt.show()
and the graph with PDF, CDF, and sampling histogram:
You could discard the values outside the desired range via numpy's filtering (theta = theta[(theta >= 0) & (theta <= np.pi)], shortening the array of samples). So you could first increase the number of generated samples, then filter, and then take a subarray of the desired size (a sketch of this appears after the plotting code below).
Or you could add/subtract pi to put them all into that range (via theta = np.where(theta < 0, theta + np.pi, np.where(theta > np.pi, theta - np.pi, theta))). As noted by @SeverinPappadeux, such wrapping changes the distribution and is probably not desired.
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
import numpy as np
from scipy.stats import vonmises

mu = np.pi / 2
kappa = 4
orig_theta = vonmises.rvs(kappa, loc=mu, size=(10000))

fig, axes = plt.subplots(ncols=2, sharex=True, sharey=True, figsize=(12, 4))
for ax in axes:
    theta = orig_theta.copy()
    if ax == axes[0]:
        ax.set_title(f"$Von Mises, \\mu={mu:.2f}, \\kappa={kappa}$")
    else:
        theta = theta[(theta >= 0) & (theta <= np.pi)]
        print(len(theta))
        ax.set_title(f"$Von Mises, angles\\ filtered\\ ({100 * len(theta) / (len(orig_theta)):.2f}\\ \\%)$")
    segs = np.zeros((len(theta), 2, 2))
    segs[:, 1, 0] = np.cos(theta)
    segs[:, 1, 1] = np.sin(theta)
    line_segments = LineCollection(segs, linewidths=.1, colors='blue', alpha=0.5)
    ax.add_collection(line_segments)
    ax.autoscale()
    ax.set_aspect('equal')
plt.show()
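A minimal sketch of the oversample-then-filter idea mentioned above (the factor of 3 is an arbitrary safety margin, not a tuned value):
import numpy as np
from scipy.stats import vonmises

mu = np.pi / 2
kappa = 4
n_wanted = 10000

# draw more samples than needed, keep only those in [0, pi], then truncate
theta = vonmises.rvs(kappa, loc=mu, size=3 * n_wanted)
theta = theta[(theta >= 0) & (theta <= np.pi)]
theta = theta[:n_wanted]
print(len(theta))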

How come some of the lines get ignored with hough line function?

I'm struggling a bit to figure out how to make sure all lines get recognized with the straight-line Hough transform taken from the scikit-image library.
https://scikit-image.org/docs/dev/auto_examples/edges/plot_line_hough_transform.html#id3
Here below all lines got recognized:
But if I apply the same script to a similar image, one line gets ignored after applying the Hough transform.
I have read the documentation which says:
The Hough transform constructs a histogram array representing the parameter space (i.e., an M × N matrix, for M different values of the radius and N different values of θ). For each parameter combination, r and θ, we then find the number of non-zero pixels in the input image that would fall close to the corresponding line, and increment the array at position (r, θ) appropriately.
We can think of each non-zero pixel "voting" for potential line candidates. The local maxima in the resulting histogram indicate the parameters of the most probable lines.
So my conclusion is that the line got removed because it didn't get enough "votes" (I have tested it with different precisions of 0.05, 0.5, and 0.1 degrees, but still got the same issue).
Here is the code:
import numpy as np
from skimage.transform import hough_line, hough_line_peaks
from skimage.feature import canny
from skimage import data, io
import matplotlib.pyplot as plt
from matplotlib import cm

# Constructing test image
image = io.imread("my_image.png")

# Classic straight-line Hough transform
# Set a precision of 0.05 degree.
tested_angles = np.linspace(-np.pi / 2, np.pi / 2, 3600)
h, theta, d = hough_line(image, theta=tested_angles)

# Generating figure 1
fig, axes = plt.subplots(1, 3, figsize=(15, 6))
ax = axes.ravel()

ax[0].imshow(image, cmap=cm.gray)
ax[0].set_title('Input image')
ax[0].set_axis_off()

ax[1].imshow(np.log(1 + h),
             extent=[np.rad2deg(theta[-1]), np.rad2deg(theta[0]), d[-1], d[0]],
             cmap=cm.gray, aspect=1/1.5)
ax[1].set_title('Hough transform')
ax[1].set_xlabel('Angles (degrees)')
ax[1].set_ylabel('Distance (pixels)')
ax[1].axis('image')

ax[2].imshow(image, cmap=cm.gray)
origin = np.array((0, image.shape[1]))
for _, angle, dist in zip(*hough_line_peaks(h, theta, d)):
    y0, y1 = (dist - origin * np.cos(angle)) / np.sin(angle)
    ax[2].plot(origin, (y0, y1), '-r')
ax[2].set_xlim(origin)
ax[2].set_ylim((image.shape[0], 0))
ax[2].set_axis_off()
ax[2].set_title('Detected lines')

plt.tight_layout()
plt.show()
How should I "catch" this line too? Any suggestions?
Shorter lines have lower accumulator values in the Hough transform, so you have to adjust the threshold appropriately. If you know how many line segments you are looking for, you can set the threshold fairly low and then limit the number of peaks detected (a short sketch using num_peaks follows the code below).
Here's a condensed version of the code above, with modified threshold, for reference:
import numpy as np
from skimage.transform import hough_line, hough_line_peaks
from skimage import io
import matplotlib.pyplot as plt
from matplotlib import cm
from skimage import color

# Constructing test image
image = color.rgb2gray(io.imread("my_image.png"))

# Classic straight-line Hough transform
# Set a precision of 0.05 degree.
tested_angles = np.linspace(-np.pi / 2, np.pi / 2, 3600)
h, theta, d = hough_line(image, theta=tested_angles)
hpeaks = hough_line_peaks(h, theta, d, threshold=0.2 * h.max())

fig, ax = plt.subplots()
ax.imshow(image, cmap=cm.gray)
for _, angle, dist in zip(*hpeaks):
    (x0, y0) = dist * np.array([np.cos(angle), np.sin(angle)])
    ax.axline((x0, y0), slope=np.tan(angle + np.pi/2))
plt.show()
(Note: axline requires matplotlib 3.3.)
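If the number of line segments is known in advance, a variation on the above (a sketch, not part of the original answer) is to lower the threshold and cap the number of returned peaks with the num_peaks argument of hough_line_peaks; the value 4 here is arbitrary:
# keep at most the 4 strongest peaks, with a lower accumulator threshold
hpeaks = hough_line_peaks(h, theta, d, threshold=0.1 * h.max(), num_peaks=4)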

Kalman FIlter Convergence

Attached is a simple python Kalman filter example of a free-fall object (g=-9.8m/s^2)
Alas, I have a problem. The state vector x contains both the position and the velocity but the z vector (measurement) contains only the position.
If I set a wrong initial position value, the algorithm converges to the true value even with noisy measurements (see picture below).
However, if I set a wrong initial velocity value, the algorithm does not converge even though the motion model is defined correctly.
Attached is the python code:
kalman.py
In your code I see two problems.
You set the Q matrix to zero. It means you trust your model too much and give the filter no chance to improve the estimate through the measurements. Your filter becomes too stiff; you can think of it like a low-pass filter with a very large time constant.
In my code I set the Q-Matrix to
Q = np.array([[1,0],[0,0.1]])
The second issue is your measurement noise. You simulate the noisy measurements with R=100 but communicate R=4 to the filter. The filter trusts the measurements more than it should. This issue is not really relevant to your question, but it should still be corrected.
Now even if I set the initial velocity to 20, the position estimation works fine.
Here is the estimation for R = 4:
And for R = 100:
UPDATE
The velocity estimation goes wrong because you have some mistakes in your matrix operations. Please note that matrix multiplication goes through np.dot(), not through *.
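A quick illustration of the difference (added for clarity, not part of the original answer): with 2-D arrays, * multiplies element-wise, while np.dot() performs matrix multiplication.
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)         # element-wise product: [[ 5 12] [21 32]]
print(np.dot(A, B))  # matrix product:       [[19 22] [43 50]]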
Here is a correct result for v0 = 20:
Many thanks, Anton.
Attached below is the corrected code for your convenience:
Roi
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
from numpy.linalg import inv

N = 1000   # number of time steps
dt = 0.01  # sampling time (s)
t = dt*np.arange(N)

F = np.array([[1, dt], [0, 1]])          # system matrix - state
B = np.array([[-1/2*dt**2], [-dt]])      # system matrix - input
H = np.array([[1, 0]])                   # observation matrix
Q = np.array([[1, 0], [0, 1]])

u = 9.80665                              # input = acceleration due to gravity (m/s^2)
I = np.array([[1, 0], [0, 1]])           # identity matrix

# Define the initial position and velocity
y0 = 100  # m
v0 = 0    # m/s

G2 = np.array([-1/2*dt**2, -dt])         # system matrix - input

# Initialize the state vector (true state)
xt = np.zeros((2, N))                    # true state vector
xt[:, 0] = [y0, v0]
for k in range(1, N):
    xt[:, k] = np.dot(F, xt[:, k-1]) + G2*u

# Generate the noisy measurement from the true state
R = 4                                    # m^2/s^2
v = np.sqrt(R)*np.random.randn(N)        # measurement noise
z = np.dot(H, xt) + v                    # noisy measurement
R2 = 4

# Initialize the covariance matrix
P = np.array([[10, 0], [0, 0.1]])        # covariance for initial state error

# Loop through and perform the Kalman filter equations recursively
x_list = []
x_kalman = np.array([[117], [290]])
x_list.append(x_kalman)
print(-B*u)
for k in range(1, N):
    x_kalman = np.dot(F, x_kalman) + B*u
    P = np.dot(np.dot(F, P), F.T) + Q
    S = (np.dot(np.dot(H, P), H.T) + R2)
    S2 = inv(S)
    K = np.dot(P, H.T)*S2
    x_kalman = x_kalman + K*((z[:, k] - np.dot(H, x_kalman)))
    P = np.dot((I - K*H), P)
    x_list.append(x_kalman)

x_array = np.array(x_list)
print(x_array.shape)

plt.figure()
plt.plot(t, z[0, :], label="measurement", color='LIME', linewidth=1)
plt.plot(t, x_array[:, 0, :], label="kalman", linewidth=5)
plt.plot(t, xt[0, :], linestyle='--', label="Truth", linewidth=6)
plt.legend(fontsize=30)
plt.grid(True)
plt.xlabel("t[s]")
plt.title("Position Estimation", fontsize=20)
plt.ylabel("$X_t$ = h[m]")
plt.gca().set(ylim=(0, 110))
plt.gca().set(xlim=(0, 6))

plt.figure()
#plt.plot(t, z, label="measurement", color='LIME')
plt.plot(t, x_array[:, 1, :], label="kalman", linewidth=4)
plt.plot(t, xt[1, :], linestyle='--', label="Truth", linewidth=2)
plt.legend()
plt.grid(True)
plt.xlabel("t[s]")
plt.title("Velocity Estimation")
plt.ylabel("$X_t$ = v[m/s]")

How do I perform a curve fit with an array of points and touching a specific point in that array

I need help with curve fitting a given set of points. The points form a parabola and I need to find the peak point of the result. The issue is that when I do a curve fit, it sometimes doesn't touch the max y-coordinate even when the actual point is given in the input array.
Following is the code snippet. Here 1.88 is the actual peak y-coordinate (13.05, 1.88), but the curve generated by the code does not pass through that point. So is there a way to fit the curve while making sure it touches the max point given in the input array?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit, minimize_scalar

fig = plt.gcf()
#fig.set_size_inches(18.5, 10.5)

x = [4.59, 9.02, 13.05, 18.47, 20.3]
y = [1.7, 1.84, 1.88, 1.7, 1.64]

def f(x, p1, p2, p3):
    return p3*(p1/((x-p2)**2 + (p1/2)**2))

plt.plot(x, y, "ro")
popt, pcov = curve_fit(f, x, y)

# find the peak
fm = lambda x: -f(x, *popt)
r = minimize_scalar(fm, bounds=(1, 5))
print("maximum:", r["x"], f(r["x"], *popt))  # maximum: 2.99846874275 18.3928199902

plt.text(1, 1.9, 'maximum ' + str(round(r["x"], 2)) + '( #' + str(round(f(r["x"], *popt), 2)) + ' )')
x_curve = np.linspace(min(x), max(x), 50)
plt.plot(x_curve, f(x_curve, *popt))
plt.plot(r['x'], f(r['x'], *popt), 'ko')
plt.show()
Here is a graphical code example using your equation with weighted fitting, where I have made the max point larger to more easily see the effect of the weighting. In non-weighted curve fitting, all weights are implicitly 1.0, as all data points have equal weight. Scipy's curve_fit routine uses weights in the form of uncertainties, so giving a point a very small uncertainty (which I have done) is like giving the point a very large weight. This technique can be used to make a fit pass arbitrarily close to any single data point by any software that can perform weighted fitting.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = [4.59, 9.02, 13.05, 18.47, 20.3]
y = [1.7, 1.84, 2.0, 1.7, 1.64]

# note the single very small uncertainty - try making this value 1.0
uncertainties = numpy.array([1.0, 1.0, 1.0E-6, 1.0, 1.0])

# rename data to use previous example
xData = numpy.array(x)
yData = numpy.array(y)

def func(x, p1, p2, p3):
    return p3*(p1/((x-p2)**2 + (p1/2)**2))

# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0])

# curve fit the test data, first without uncertainties to
# get us closer to initial starting parameters
ssqParameters, pcov = curve_fit(func, xData, yData, p0=initialParameters)

# now that we have better starting parameters, use uncertainties
fittedParameters, pcov = curve_fit(func, xData, yData, p0=ssqParameters, sigma=uncertainties, absolute_sigma=True)

modelPredictions = func(xData, *fittedParameters)

absError = modelPredictions - yData
SE = numpy.square(absError)  # squared errors
MSE = numpy.mean(SE)         # mean squared errors
RMSE = numpy.sqrt(MSE)       # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()

##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
