Derivative of sigmoid - algorithm

I'm creating a neural network using the backpropagation technique for learning.
I understand we need to find the derivative of the activation function used. I'm using the standard sigmoid function
f(x) = 1 / (1 + e^(-x))
and I've seen that its derivative is
dy/dx = f(x)' = f(x) * (1 - f(x))
This may be a daft question, but does this mean that we have to pass x through the sigmoid function twice during the equation, so it would expand to
dy/dx = f(x)' = 1 / (1 + e^(-x)) * (1 - (1 / (1 + e^(-x))))
or is it simply a matter of taking the already calculated output of f(x), which is the output of the neuron, and substituting that value for f(x)?

Dougal is correct. Just do
f = 1/(1+exp(-x))
df = f * (1 - f)

The two ways of doing it are equivalent (since mathematical functions don't have side effects and always return the same output for a given input), so you might as well do it the (faster) second way.
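A minimal Python sketch of the second approach, assuming the neuron's output is kept from the forward pass. The function names are illustrative, not from any library. The backward pass reuses the stored output y rather than evaluating the sigmoid a second time:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x)); math.exp overflows for x below about -709."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative_from_output(y: float) -> float:
    """f'(x) expressed through the neuron's output y = f(x): no second exp() call."""
    return y * (1.0 - y)

x = 0.5
y = sigmoid(x)                          # forward pass: computed once and stored
dy = sigmoid_derivative_from_output(y)  # backward pass: reuses y
print(f"f({x}) = {y:.6f}, f'({x}) = {dy:.6f}")
```

The saving is simple caching: exp() runs once per neuron per training step instead of twice.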

A little algebra can simplify this so that you don't have to have df call f.
df = exp(-x)/(1+exp(-x))^2
derivation:
df = 1/(1+e^-x) * (1 - (1/(1+e^-x)))
df = 1/(1+e^-x) * (1+e^-x - 1)/(1+e^-x)
df = 1/(1+e^-x) * (e^-x)/(1+e^-x)
df = (e^-x)/(1+e^-x)^2
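A quick numerical check in Python (function names hypothetical) that the simplified form agrees with f(x) * (1 - f(x)). It also shows one practical difference: for large positive x, f(x) rounds to exactly 1.0 in double precision, so f * (1 - f) returns 0.0, while e^(-x)/(1+e^(-x))^2 still returns a tiny positive value. Conversely, math.exp(-x) overflows for x below about -709, so neither form as written is robust over the whole real line.

```python
import math

def dsigmoid_closed_form(x: float) -> float:
    """f'(x) = e^(-x) / (1 + e^(-x))^2."""
    e = math.exp(-x)
    return e / (1.0 + e) ** 2

def dsigmoid_via_f(x: float) -> float:
    """f'(x) = f(x) * (1 - f(x)), with f(x) computed from scratch."""
    f = 1.0 / (1.0 + math.exp(-x))
    return f * (1.0 - f)

for x in (-5.0, -1.0, 0.0, 0.5, 3.0, 40.0):
    print(f"x = {x:+5.1f}   closed form = {dsigmoid_closed_form(x):.6e}   "
          f"f*(1-f) = {dsigmoid_via_f(x):.6e}")
```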

You can use the output of your sigmoid function and pass it to your SigmoidDerivative function to be used as the f(x) in the following:
dy/dx = f(x)' = f(x) * (1 - f(x))

Numerically stable way to compute (x-sin(x))/x^3

I'm looking for a very fast way to compute (x-sin(x))/(x^3) for all x using IEEE floating point arithmetic and standard trigonometric functions. At 0, it should return 1/6.
For sin(x)/x, it's sufficient to check whether x = 0 and return 1, and otherwise just compute it using the standard floating-point sin and division. For (1-cos(x))/x^2, if cos(x) <= 0 the expression is fine as is; otherwise, express it as (sin(x)/x)^2/(1+cos(x)).
But I can't figure out how to express (x-sin(x))/x^3.
So far, the best I have is to compute the infinite sum until it converges:
$\frac{x-\sin x}{x^3} = \sum_{n=0}^{\infty} \frac{1}{4^{n+1}} \cdot \frac{\sin(x/2^{n+1})}{x/2^{n+1}} \cdot \frac{1-\cos(x/2^{n+1})}{(x/2^{n+1})^2}$
but I'd prefer a closed form.
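A sketch of the halving series in Python, assuming plain double-precision floats. It rests on the identity x - sin x = 2(x/2 - sin(x/2)) + 2 sin(x/2)(1 - cos(x/2)), which gives g(x) = g(x/2)/4 + S(x/2)C(x/2)/4 with g(x) = (x - sin x)/x^3, S(y) = sin(y)/y and C(y) = (1 - cos y)/y^2. Unrolling it makes the n-th term use the argument x/2^(n+1). S uses a zero check, and C uses 1 - cos y = 2 sin^2(y/2) so that neither factor suffers cancellation. The function names are hypothetical.

```python
import math

def sinc(y: float) -> float:
    """S(y) = sin(y)/y, with the removable singularity at y = 0 filled in."""
    return 1.0 if y == 0.0 else math.sin(y) / y

def cosm1_over_sq(y: float) -> float:
    """C(y) = (1 - cos y)/y^2, computed as 2*(sin(y/2)/y)^2 to avoid cancellation."""
    if y == 0.0:
        return 0.5
    s = math.sin(0.5 * y) / y
    return 2.0 * s * s

def sinmx_over_xcubed_series(x: float, max_terms: int = 80) -> float:
    """(x - sin x)/x^3 as sum over n >= 0 of 4^-(n+1) * S(x/2^(n+1)) * C(x/2^(n+1))."""
    total = 0.0
    weight = 0.25   # 4^-(n+1)
    y = 0.5 * x     # x / 2^(n+1)
    for _ in range(max_terms):
        new_total = total + weight * sinc(y) * cosm1_over_sq(y)
        if new_total == total:  # further terms no longer change the double result
            break
        total = new_total
        weight *= 0.25
        y *= 0.5
    return total
```

For large |x| the direct formula has no cancellation problem, so in practice the series would only be needed near 0, where it converges in roughly 30 terms.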

(1 - cos x) / x^2 is fundamentally different from (x - sin x) / x^3, in that unity can be constructed from trigonometric functions as sin^2 x + cos^2 x = 1, while the same is not true of x. This means we cannot transform the latter function into a numerically advantageous closed-form trigonometric formula. I thought long about this and also tried manipulating the formula with all the trigonometric identities I am aware of. I would love to be proven wrong; that seems like a question for Math Stack Exchange. The easiest and most accurate way to implement the former function is
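#include <float.h>
#include <math.h>

// (1-cos(x))/x**2
double cosm1_over_xsquared (double x)
{
    if (fabs (x) < sqrt (DBL_EPSILON)) {
        return 0.5;
    } else {
        // uses 1 - cos(x) = 2 * sin(x/2)^2, which has no cancellation
        double s = sin (x * 0.5) / x;
        return 2.0 * s * s;
    }
}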
If the standard math library computes sin() with an error just slightly over half an ulp, this implementation computes (1 - cos x) / x^2 with an error no larger than 4 ulp. As a side note, this function also lends itself to Kahan's self-correction technique, which he first demonstrated for the computation of (e^x - 1) / x in
William M. Kahan, "Interval arithmetic in the proposed IEEE floating point arithmetic standard." In Karl L. E. Nickel (ed.), Interval Arithmetic 1980, Academic Press 1980, pp. 99-128.
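#include <math.h>

// (1-cos(x))/x**2 on [-3, 3] using Kahan's self-compensation technique
double cosm1_over_xsquared_kahan (double x)
{
    double u = cos (x);
    double n = 1.0 - u;
    if (n == 0.0) {
        return 0.5;
    }
    // acos(u) is the angle whose cosine is the rounded u, so the numerator
    // and denominator refer to the same argument
    double d = acos (u);
    return n / (d * d);
}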
If both cos() and acos() have a maximum error just slightly over half an ulp, this function returns results with an error of less than 5 ulps. Because cos, unlike e^x, is a periodic function, this approach works only on the restricted interval noted in the code comment.
The above suggests that we should aim for an implementation of (x - sin x) / x^3 with a maximum error of about 4 ulp. Characterizing the naive computation, we find that it is adequate for |x| > 1 under this provision. Despite the narrow input domain that needs an alternate computation, Kahan's self-compensation technique does not work for this function. The old standby of math function implementers, a polynomial minimax approximation, works just fine, however. This results in the following code:
#include <math.h>

// (x-sin(x))/x**3
double sinmx_over_xcubed (double x)
{
    if (fabs(x) < 1.0) { // minimax approximation: polynomial in x^2, Horner's rule
        double x2 = x * x;
        double p = 7.5475867852548673E-13;
        p = p * x2 - 1.6057658525730946E-10;
        p = p * x2 + 2.5052098906959416E-8;
        p = p * x2 - 2.7557319191306421E-6;
        p = p * x2 + 1.9841269841218293E-4;
        p = p * x2 - 8.3333333333333055E-3;
        p = p * x2 + 1.6666666666666666E-1;
        return p;
    } else { // the naive formula is adequate for |x| >= 1
        return (x - sin (x)) / x / x / x;
    }
}
Well, the Taylor series for this expression is:
1/6 - x^2/120 + x^4/5040 + O(x^6) (the series converges for all x; at x = 0 it gives the limit 1/6)
which should be pretty good for most applications when |x| is small.
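A Python sketch of this approach: the three terms 1/6 - x^2/120 + x^4/5040 are used for |x| below a cutoff and the direct formula elsewhere. The cutoff of 0.05 is my assumption, chosen as a rough balance point: the dropped x^6/362880 term grows with |x|, while the cancellation error in x - sin(x) grows as |x| shrinks (roughly 6*eps/x^2 relative). With this cutoff both stay at a few times 1e-13 relative, which is fine for many uses but well short of the few-ulp accuracy of the minimax version. The function name is hypothetical.

```python
import math

def sinmx_over_xcubed_taylor(x: float, cutoff: float = 0.05) -> float:
    """(x - sin x)/x^3: truncated Taylor series near 0, direct formula elsewhere."""
    if abs(x) < cutoff:
        x2 = x * x
        # 1/6 - x^2/120 + x^4/5040 in Horner form; the dropped term is x^6/362880
        return 1.0 / 6.0 + x2 * (-1.0 / 120.0 + x2 / 5040.0)
    # Away from 0 the subtraction x - sin(x) loses fewer digits
    return (x - math.sin(x)) / (x * x * x)
```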
Addendum
If you are trying to find the limit at 0: apply L'Hôpital's rule, since the expression has the form 0/0 as x -> 0:
$\lim_{x\to 0}\frac{x-\sin x}{x^3} = \lim_{x\to 0}\frac{1-\cos x}{3x^2} = \lim_{x\to 0}\frac{\sin x}{6x} = \lim_{x\to 0}\frac{\cos x}{6} = \frac{1}{6}$
In other words, it's probably best to use an if statement that sends x = 0 (and values near it) to the Taylor series, which will be a LOT faster than floating-point sin and cos unless you have purpose-built hardware or are using GPUs.

Why does finding the eigenvalues of a 4*4 matrix with z3py take so much time and not give any solutions?

I'm trying to calculate the eigenvalues of a 4*4 matrix called A in my code (I know that the eigenvalues are real). All the elements of A are z3 expressions and need to be calculated from the previous constraints. The code below is the last part of a long program that tries to calculate matrix A, then its eigenvalues. The code is written as a single piece, but I've split it into two separate parts in order to debug it: part 1, in which the code tries to find the matrix A, and part 2, which is the eigenvalue calculation. In part 1, the code works very fast and calculates A in less than a second, but when I add part 2, it doesn't give me any solution, even after a long wait.
I was wondering what the reason could be. Is it because of the degree of the polynomial (which is 4), or something else? I would appreciate it if anyone can help me find an alternative way to calculate the eigenvalues or give me some hints on how to rewrite the code so it can solve the problem.
(Note that A2 in the actual code is a matrix whose elements are all z3 expressions defined by previous constraints in the code. Here, I've defined the elements as real values just to make the code executable. This way, the code gives a solution very quickly, but in the real situation it takes very long, on the order of days.
For example, one of the elements of A looks roughly like this:
0 +
1*Vq0__1 +
2 * -Vd0__1 +
0 +
((5.5 * Iq0__1 - 0)/64/5) *
(0 +
0 * (Vq0__1 - 0) +
-521702838063439/62500000000000 * (-Vd0__1 - 0)) +
((.10 * Id0__1 - Etr_q0__1)/64/5) *
(0 +
521702838063439/62500000000000 * (Vq0__1 - 0) +
0.001 * (-Vd0__1 - 0)) +
0 +
0 + 0 +
0 +
((100 * Iq0__1 - 0)/64/5) * 0 +
((20 * Id0__1 - Etr_q0__1)/64/5) * 0 +
0 +
-5/64
All the variables in this example are z3 variables.)
from z3 import *
import numpy as np

def sub(*arg):
    # Element-wise difference: the first matrix minus each of the following ones
    counter = 0
    for matrix in arg:
        if counter == 0:
            counter += 1
            Sub = []
            for i in range(len(matrix)):
                Sub1 = []
                for j in range(len(matrix[0])):
                    Sub1 += [matrix[i][j]]
                Sub += [Sub1]
        else:
            row = len(matrix)
            colmn = len(matrix[0])
            for i in range(row):
                for j in range(colmn):
                    Sub[i][j] = Sub[i][j] - matrix[i][j]
    return Sub

Landa = RealVector('Landa', 2)  # Eigenvalues considered as real values
LandaI0 = np.diag([Landa[0] for i in range(4)]).tolist()  # lambda * I
ALandaz3 = RealVector('ALandaz3', 4 * 4)
############# Building ( A - \lambda * I ) to find the eigenvalues ############
A2 = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [3, 7, 4, 1],
      [4, 9, 7, 1]]
s = Solver()
for i in range(4):
    for j in range(4):
        s.add(ALandaz3[4 * i + j] == sub(A2, LandaI0)[i][j])
ALanda = [[ALandaz3[0], ALandaz3[1], ALandaz3[2], ALandaz3[3]],
          [ALandaz3[4], ALandaz3[5], ALandaz3[6], ALandaz3[7]],
          [ALandaz3[8], ALandaz3[9], ALandaz3[10], ALandaz3[11]],
          [ALandaz3[12], ALandaz3[13], ALandaz3[14], ALandaz3[15]]]
def det(M):
    # Determinant by cofactor (Laplace) expansion along the first row;
    # works on z3 expressions as well as numbers
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]  # drop row 0 and column j
        total = total + (-1) ** j * M[0][j] * det(minor)
    return total

Determinant = det(ALanda)
tol = 0.001
s.add(And(Determinant >= -tol, Determinant <= tol))  # giving some flexibility instead of equalling to zero
result = s.check()
print(result)
if result == sat:
    print(s.model())
Note that you seem to be using Z3 for a type of equation it absolutely isn't meant for. Z3 is a SAT/SMT solver. Such a solver works internally with a huge number of boolean equations. Integers and fractions can be converted to boolean expressions, but with general floats Z3 quickly reaches its limits. See here and here for a lot of typical examples, and note how floats are avoided.
Z3 can work in a limited way with floats, converting them to fractions, but it doesn't work with the approximations and accuracies needed in numerical algorithms. Therefore, the results are usually not what you are hoping for.
Finding eigenvalues is a typical numerical problem, where accuracy issues are very tricky. Python has libraries such as numpy and scipy to deal with those efficiently. See e.g. numpy.linalg.eig.
If, however, your A2 matrix contains symbolic expressions (and uses fractions instead of floats), sympy's matrix functions could be an interesting alternative.

15-digit floating-point calculation on a microcontroller

I want to calculate an equation on a controller (Arduino):
y = -0.0000000104529251928664x^3 + 0.0000928316793270531x^2 - 0.282333029643959x + 297.661280719026
The decimal values of the coefficients are important because x varies in the thousands, so the cubic term cannot be ignored. I have tried manipulating the equation in Excel to reduce the coefficients, but R^2 is lost in the process and I would like to avoid that.
The largest floating-point type available on the Arduino is 4 bytes, and a Google search didn't turn up an appropriate solution.
Thank you for your time.
Since
-0.0000000104529251928664 ^ (1/3) = -0.0021864822
0.0000928316793270531 ^ (1/2) = 0.00963491978
The formula
y = -0.0000000104529251928664x^3 + 0.0000928316793270531x^2 - 0.282333029643959x + 297.661280719026
Can be rewritten:
y = -(0.0021864822 * x)^3 + (0.00963491978 * x)^2 - 0.282333029643959 * x + 297.661280719026
Rounding all coefficients to 10 decimal places, we get:
y = -(0.0021864822 * x)^3 + (0.00963491978 * x)^2 - 0.2823330296 * x + 297.6612807
But I don't know Arduino, so I'm not sure what the correct number of decimal places is, nor what the compiler will accept or refuse.
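One way to check whether a 4-byte float is good enough is to emulate single precision on a desktop and compare against double. The Python sketch below assumes the Arduino's float is IEEE 754 single precision and rounds every intermediate result to single precision via the struct module. It evaluates both the rewritten form above and Horner's rule (my substitution, not part of the answer) over x = 0..5000, an assumed input range. The helper names are hypothetical.

```python
import struct

def f32(v: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def y_double(x: float) -> float:
    """Reference: the original cubic with its full coefficients, in double precision."""
    return (-0.0000000104529251928664 * x**3 + 0.0000928316793270531 * x**2
            - 0.282333029643959 * x + 297.661280719026)

# Coefficients of the rewritten form, stored as 4-byte floats
A = f32(0.0021864822)
B = f32(0.00963491978)
C = f32(0.2823330296)
D = f32(297.6612807)

def y_float_rewritten(x: float) -> float:
    """-(A*x)^3 + (B*x)^2 - C*x + D with every operation rounded to single precision."""
    x = f32(x)
    ax = f32(A * x)
    bx = f32(B * x)
    cube = f32(f32(ax * ax) * ax)
    square = f32(bx * bx)
    linear = f32(C * x)
    return f32(f32(f32(square - cube) - linear) + D)

# Original coefficients as floats, for Horner's rule
K3 = f32(-0.0000000104529251928664)
K2 = f32(0.0000928316793270531)
K1 = f32(-0.282333029643959)
K0 = f32(297.661280719026)

def y_float_horner(x: float) -> float:
    """((K3*x + K2)*x + K1)*x + K0, again rounding after each operation."""
    x = f32(x)
    acc = f32(f32(K3 * x) + K2)
    acc = f32(f32(acc * x) + K1)
    return f32(f32(acc * x) + K0)

for name, fn in (("rewritten", y_float_rewritten), ("horner", y_float_horner)):
    worst = max(abs(fn(x) - y_double(x)) for x in range(0, 5001))
    print(f"{name}: max |float - double| for x = 0..5000 is {worst:.2e}")
```

By my estimate both forms stay well under 0.01 of the double result over this range, because a float keeps about 7 significant digits regardless of how many leading zeros a coefficient has. On Uno-class (ATmega) boards, double is the same 4-byte type as float, so declaring the coefficients double does not help there.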

How can I create (approximately) the double sigmoid in the shown figure?

I want to create (approximately) the double sigmoid in the shown figure as
a function of the parameters X, Y, Z, a, b, c and d.
Any idea? Thanks.
This question seems to have gone ignored, so try something like this:
k = 1 # adjust this for "sharpness"
s(x) = (tanh(k * x) + 1) / 2
f(x) = X + (Y-X) * s(x-b) + (Z-Y) * s(x-c)
Here's an example plot.
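The same formula as a runnable Python sketch. The names smooth_step and double_sigmoid are hypothetical. The curve sits at X far to the left, at about Y between b and c (when c - b is large compared with 1/k), and at Z far to the right; as in the formula above, a and d are not used.

```python
import math

def smooth_step(x: float, k: float = 1.0) -> float:
    """(tanh(k*x) + 1)/2: rises from 0 to 1 around x = 0; larger k means a sharper step."""
    return (math.tanh(k * x) + 1.0) / 2.0

def double_sigmoid(x, X, Y, Z, b, c, k=1.0):
    """X + (Y - X)*s(x - b) + (Z - Y)*s(x - c): level X, step at b to Y, step at c to Z."""
    return X + (Y - X) * smooth_step(x - b, k) + (Z - Y) * smooth_step(x - c, k)

# Example: levels 0 -> 1 -> 3 with steps at -10 and 10, sampled for plotting
xs = [-20.0 + i for i in range(41)]
ys = [double_sigmoid(x, 0.0, 1.0, 3.0, -10.0, 10.0) for x in xs]
```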

envelope function (spatstat) - error "unused arguments"

I would like your help finding out why, when I use the function envelope, my arguments are not accepted but are reported as "unused arguments".
The data I'm using are a ppp without marks, and I would like to create an L-function plot with simulated data and my data.
Here the code for my ppp data:
map2008MLW = ppp(xy2008_BNGppp$x, xy2008_BNGppp$y, window = IoM_polygon_MLWowin)
And then:
L2008 = Lest(map2008MLW,correction="Ripley")
OP = par(mar=c(5,5,4,4))
plot(L2008, . -r ~ r, ylab=expression(hat("L")), xlab = "d (m)"); par(OP)
L2008$iso = L2008$iso - L2008$r
L2008$theo = L2008$theo - L2008$r
# Desired number of simulations
n = 9999
# Desired p significance level to display
p = 0.05
And at this point the envelope function doesn't seem very happy:
EL2008 = envelope(map2008MLW[W], Lest, nsim=n, rank=(p * (n + 1)))
Error in envelope(map2008MLW[W], Lest, nsim = n, rank = (p * (n + 1))) :
unused arguments (nsim = n, rank = (p * (n + 1)))
It seems to be a generic error, and I am not sure it is caused by the spatstat package. Please help me find a solution to this, as I can't proceed with my analyses.
Thank you very much,
Martina
The argument rank should be nrank.
Also, the relationship between the significance level and the argument nrank is not correct in the example. For a two-sided test, the significance level is alpha = 2 * nrank/(nsim + 1), so nrank = alpha * (nsim + 1)/2.
Your rank = p * (n + 1) omits this factor of 2, so with nsim = 9999 you want nrank = 0.05 * 10000/2 = 250 to get a test with significance level 0.05.
Such a large number of simulations (9999) is unnecessary in this kind of application. Monte Carlo tests are valid with small values of nsim. In your example I would normally use nsim=39 and nrank=1.
See Chapter 10 of the spatstat book.
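The arithmetic in this answer, as a tiny Python helper (hypothetical, not part of spatstat) that inverts alpha = 2 * nrank/(nsim + 1). It refuses combinations where nrank would not be a positive whole number, such as alpha = 0.05 with nsim = 99.

```python
def nrank_for(alpha: float, nsim: int) -> int:
    """Solve alpha = 2 * nrank / (nsim + 1) for nrank (two-sided envelope test)."""
    nrank = alpha * (nsim + 1) / 2
    rounded = round(nrank)
    if rounded < 1 or abs(nrank - rounded) > 1e-9:
        raise ValueError("alpha * (nsim + 1) / 2 must be a positive whole number")
    return int(rounded)

print(nrank_for(0.05, 9999))  # 250
print(nrank_for(0.05, 39))    # 1
```

The two calls reproduce the nsim = 9999 and nsim = 39 settings recommended above.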
