Is the derivative of the sigmoid function always the same regardless of its inner variables? - algorithm

The derivative of the sigmoid function is:
d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
Is this always true no matter what 'x' is? I'm working through the backpropagation of a CNN by hand, and it seems to hold when my output function is a sigmoid and I solve for the gradients with respect to W0, P, etc.
Thank you.
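One way to check this numerically: the identity holds for the derivative with respect to the sigmoid's own argument, whatever that argument is. When the argument is itself a function of other variables, the chain rule multiplies by the derivative of that inner function. A minimal Python sketch follows; the inner function z = w*x + b and all names here are illustrative assumptions, not taken from the question's network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Derivative with respect to the sigmoid's own argument z
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_derivative(f, x, h=1e-6):
    # Central finite difference, used only as an independent check
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Hypothetical inner function z = w*x + b (assumed for illustration)
X_IN, BIAS = 0.5, -0.3

def output_wrt_w(w):
    return sigmoid(w * X_IN + BIAS)

def analytic_grad_w(w):
    # Chain rule: d sigmoid(z)/dw = sigmoid'(z) * dz/dw, with dz/dw = X_IN
    z = w * X_IN + BIAS
    return sigmoid_prime(z) * X_IN

for z in (-4.0, -1.0, 0.0, 0.7, 3.0):
    print(z, sigmoid_prime(z), numeric_derivative(sigmoid, z))
print(analytic_grad_w(1.2), numeric_derivative(output_wrt_w, 1.2))
```

So the formula itself does not change, but the gradient with respect to a weight picks up the extra factor from the inner function.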

Related

Fitting data to unknown curve -- Possible tanh

I am trying to fit the following dataset:
0.01 3.69470157
0.59744 3.514991345
0.65171 3.265043489
0.70076 2.978933734
0.75021 2.700637918
0.80103 2.413791532
0.84878 2.086939551
0.89572 1.819489189
0.94717 1.532756131
0.99626 1.244667864
1.01643 1.130430784
1.03626 1.024324017
1.05633 0.910153046
1.07605 0.804981232
1.09791 0.708171108
1.11795 0.612456485
1.13841 0.516217721
1.15944 0.421844141
1.18032 0.335218393
1.20003 0.258073446
1.22204 0.181296813
1.24223 0.115157866
1.25935 0.069310744
Where the first column is x and the second is y.
I have tried a tanh function, polynomials, and am now trying the erf function. Nothing seems to fit correctly.
Is there a way to know what function I should be fitting to this? And if so, what is the form of such a function? Thank you.
BIG EDIT: the function must decrease monotonically as x increases and have asymptotic behavior at both tails. For this dataset it looks like it should approach ~3.7 and ~0.0.
A simple sine(radians) equation with an offset gives a good fit:
y = amplitude * sin(pi * (x - center) / width) + Offset
amplitude = -2.2364202059901910E+00
center = 8.6047683705837374E-01
width = 1.1558269132014631E+00
Offset = 2.0456549443735259E+00
R-squared: 0.99994
RMSE: 0.00909
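To compare the quoted fit with the posted data, one can evaluate the model at each x and compute the root-mean-square error. A minimal Python sketch using the parameter values above; the function and variable names are my own.

```python
import math

# (x, y) pairs copied from the question
DATA = [
    (0.01, 3.69470157), (0.59744, 3.514991345), (0.65171, 3.265043489),
    (0.70076, 2.978933734), (0.75021, 2.700637918), (0.80103, 2.413791532),
    (0.84878, 2.086939551), (0.89572, 1.819489189), (0.94717, 1.532756131),
    (0.99626, 1.244667864), (1.01643, 1.130430784), (1.03626, 1.024324017),
    (1.05633, 0.910153046), (1.07605, 0.804981232), (1.09791, 0.708171108),
    (1.11795, 0.612456485), (1.13841, 0.516217721), (1.15944, 0.421844141),
    (1.18032, 0.335218393), (1.20003, 0.258073446), (1.22204, 0.181296813),
    (1.24223, 0.115157866), (1.25935, 0.069310744),
]

# Fitted parameters quoted in the answer
AMPLITUDE = -2.2364202059901910
CENTER = 8.6047683705837374e-01
WIDTH = 1.1558269132014631
OFFSET = 2.0456549443735259

def model(x):
    return AMPLITUDE * math.sin(math.pi * (x - CENTER) / WIDTH) + OFFSET

def rmse(data):
    squared = [(model(x) - y) ** 2 for x, y in data]
    return math.sqrt(sum(squared) / len(squared))

print("RMSE:", rmse(DATA))
```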

Particle swarm optimization and function with several parameters

I would like to optimize a function with several parameters using particle swarm optimization. How can I do it? Everywhere I look I find the formula below, but I don't understand how to apply it: I can only optimize a function of one variable. For example, I have a function with 2 parameters and I want to maximize it. How can I do that with PSO?
v_{i,d} ← ω v_{i,d} + φ_p r_p (p_{i,d} - x_{i,d}) + φ_g r_g (g_d - x_{i,d})
function (x, y)
{
return x + y
}
Since you have just 2 variables to optimize, your search space is two-dimensional. Assume that you want to optimize parameters x1 and x2, where x1 is in the range [a1,b1] and x2 is in the range [a2,b2]. First, initialize a random population of particles (say 30 particles) inside the search space boundary and assign random values to the velocity vectors (V). Next, evaluate the fitness of all particles and determine the best particle (the global best). Then perform the main updating mechanism of PSO.
This link would be helpful:
http://yarpiz.com/50/ypea102-particle-swarm-optimization
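The steps described above can be sketched in Python roughly as follows. This is a minimal illustration, not a tuned implementation: the coefficient values, the seed, and the test function `example` (chosen because its maximum at (1, -2) is known) are assumptions of mine. Each particle is simply a list with one entry per parameter, and the velocity update is applied per dimension d.

```python
import random

def pso_maximize(f, bounds, n_particles=30, n_iters=200,
                 omega=0.7, phi_p=1.5, phi_g=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, random initial velocities
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[rng.uniform(-(hi - lo), hi - lo) for lo, hi in bounds]
          for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(*x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                rp, rg = rng.random(), rng.random()
                # Velocity update, one dimension at a time
                vs[i][d] = (omega * vs[i][d]
                            + phi_p * rp * (pbest[i][d] - xs[i][d])
                            + phi_g * rg * (gbest[d] - xs[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search range
                xs[i][d] = min(max(xs[i][d] + vs[i][d], lo), hi)
            val = f(*xs[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val

def example(x, y):
    # Hypothetical two-parameter objective with its maximum at (1, -2)
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2

best, best_val = pso_maximize(example, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

For the x + y function from the question, the maximum would simply sit at the upper corner of whatever bounds are chosen.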

Plotting two variables function

This question is for learning purposes. I am writing my own function to plot an equation. For example:
function e(x) { return sin(x); }
plot(e);
I wrote a plot function that takes a function as a parameter. The plotting code is simple: x runs from one value to another, increasing by a small step. This is the plot that plot() manages to produce.
But there is a problem. It cannot express the circle equation x^2 + y^2 = 1. So the question is: how should the plot and equation functions look in order to handle two variables?
Note that I am not only interested in the circle equation, but in a more general way of plotting functions with two variables.
Well, to plot a non-function 1D equation (in variables x, y) you have 3 choices:
1. convert to parametric form
so for example x^2 + y^2 = 1 will become:
x = cos(t);
y = sin(t);
t = <0,2*PI>
So plot each function as a 1D function plot, with t used as the parameter. But for this you need to exploit mathematical identities and substitutions, which is not easily done programmatically.
2. convert to 1D functions
Non-function means you get more than one y value for some x values. If you separate your equation into intervals and split it into cases that together cover the whole plot, you can plot each derived function instead.
So you derive y algebraically (let's assume the unit circle again):
x^2 + y^2 = 1
y^2 = 1 - x^2
y = +/- sqrt (1 - x^2)
----------------------
y1 = +sqrt (1 - x^2)
y2 = -sqrt (1 - x^2)
x = <-1,+1>
This is also not easily done programmatically, but it is an order of magnitude easier than #1.
3. do a 2D plot using the equation as a predicate
Simply loop your view through all pixels and render only those for which the equation is true. So again, for the unit circle:
for (x=-1.0;x<=+1.0;x+=0.001)
 for (y=-1.0;y<=+1.0;y+=0.001)
  if (fabs((x*x)+(y*y)-1.0)<=0.002) // threshold on the order of the step size, otherwise almost no pixel passes
   plot_pixel(x,y,some_color); // x,y should be rescaled and offset to the actual plot view
So you just convert your equation to implicit form:
x^2 + y^2 = 1
-----------------
x^2 + y^2 - 1 = 0
and compare to zero with some threshold (to avoid FPU accuracy problems):
| x^2 + y^2 - 1 | <= threshold_near_zero
The threshold is half the plot line width, so this way you can easily change the plot width to any pixel size. As you can see, this is easily done programmatically, but the plot is slower because you need to loop through all the pixels of the plot view. The step of the x,y loops should match the pixel size at the view scale.
Also, while using an equation as a predicate you should handle math singularities, because with blind probing you will most likely hit some, such as division by zero or domain errors for asin, acos, sqrt, etc.
So for an arbitrary 1D non-function use #3, unless you have some mighty symbolic math engine for #1 or #2.
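Option #3 can be sketched in Python along these lines. This is a minimal illustration under my own assumptions: the function names, grid ranges, step, and threshold are chosen for the example, and the points are collected in a list rather than drawn, so the result still needs rescaling to an actual view. Math errors from blind probing are caught and the grid point is skipped.

```python
import math

def implicit_points(f, x_range, y_range, step, threshold):
    """Return grid points (x, y) where |f(x, y)| <= threshold."""
    points = []
    nx = int(round((x_range[1] - x_range[0]) / step))
    ny = int(round((y_range[1] - y_range[0]) / step))
    for i in range(nx + 1):
        x = x_range[0] + i * step  # index-based to avoid accumulated error
        for j in range(ny + 1):
            y = y_range[0] + j * step
            try:
                value = f(x, y)
            except (ValueError, ZeroDivisionError, OverflowError):
                continue  # singularity or domain error: skip this point
            if abs(value) <= threshold:
                points.append((x, y))
    return points

def unit_circle(x, y):
    # Implicit form: x^2 + y^2 - 1 = 0
    return x * x + y * y - 1.0

pts = implicit_points(unit_circle, (-1.5, 1.5), (-1.5, 1.5),
                      step=0.05, threshold=0.1)
print(len(pts), "points near the circle")
```

Any other equation works the same way once it is written as f(x, y) = 0 and passed in place of `unit_circle`.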
Definition of a function: a function f takes an input x and returns a single output f(x).
This means that for any input there is one and only one output. Take y = sin(x): it is a function of x, and y is its value.
For an equation like (x*x) + (y*y) = 1 there are two possible values of y for a single value of x, hence it cannot be written as a single function y = f(x).
If you need to draw it, one possible solution is to plot two points for each value of x, i.e. sqrt(1-(x*x)) and -1*sqrt(1-(x*x)). Plot both values (one is positive and the other negative, with the same absolute value).

Matlab function gradient for fminunc

f = @(w) sum(log(1 + exp(-t .* (phis * w'))))/size(phis, 1) + coef * w*w';
options = optimset('Display', 'notify', 'MaxFunEvals', 2e+6, 'MaxIter', 2e+6);
w = fminunc(f, ones(1, size(phis, 2)), options);
phis has size N x (N+1)
t has size N x 1
coef is a constant
I'm trying to minimize the function f. At first I was using fminsearch, but it takes a long time, so now I use fminunc. There is one problem: I need the gradient of the function to speed things up. Can you please help me construct the gradient of f? I always get this warning:
Warning: Gradient must be provided for trust-region algorithm;
using line-search algorithm instead.
What you are trying to do is called logistic regression with L2 regularization. There are far better ways to solve this problem than a generic call to a Matlab function, since the log-likelihood function is concave.
You should ask your question on the statistics site, or have a look at my earlier question there.
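For reference, differentiating f term by term gives grad f(w) = (1/N) * sum_i ( -t_i * sigmoid(-t_i * phi_i.w) * phi_i ) + 2 * coef * w, where phi_i is the i-th row of phis. The Python sketch below computes the loss and this gradient and can be checked against finite differences. It is written in plain Python rather than Matlab, and the tiny dataset is made up purely for the check. In Matlab the same expression would be returned as a second output of the objective, with the gradient option enabled (for example 'GradObj','on' in optimset).

```python
import math

def softplus(z):
    # log(1 + exp(z)) computed without overflow
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss_and_grad(w, phis, t, coef):
    """f(w) = mean(log(1 + exp(-t_i * phi_i.w))) + coef * w.w, and its gradient."""
    n = len(phis)
    loss = 0.0
    grad = [0.0] * len(w)
    for row, ti in zip(phis, t):
        margin = ti * sum(p * wj for p, wj in zip(row, w))
        loss += softplus(-margin)
        factor = -ti * sigmoid(-margin)  # derivative of softplus(-margin) w.r.t. phi_i.w
        for j, p in enumerate(row):
            grad[j] += factor * p
    loss = loss / n + coef * sum(wj * wj for wj in w)
    grad = [g / n + 2.0 * coef * wj for g, wj in zip(grad, w)]
    return loss, grad

# Made-up data for a sanity check (N = 3 rows, N + 1 = 4 columns)
PHIS = [[1.0, 0.2, -0.5, 0.3], [1.0, -1.1, 0.4, 0.0], [1.0, 0.7, 0.9, -0.6]]
T = [1.0, -1.0, 1.0]
W0 = [0.1, -0.2, 0.3, 0.05]
print(loss_and_grad(W0, PHIS, T, coef=0.01))
```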

Quadratic Bezier Interpolation

I would like some AS2 code to interpolate a quadratic Bezier curve. The nodes are meant to be at a constant distance from each other. Basically, it is to animate a ball at constant speed along a non-hyperbolic quadratic Bezier curve defined by 3 points.
Thanks!
The Bezier curve math is really quite simple, so I'll help you out with that and you can translate it into ActionScript.
A 2D quadratic Bezier curve is defined by three (x,y) coordinates. I will refer to these as P0 = (x0,y0), P1 = (x1,y1) and P2 = (x2,y2). Additionally a parameter value t, which ranges from 0 to 1, is used to indicate any position along the curve. All x, y and t variables are real-valued (floating point).
The equation for a quadratic Bezier curve is:
P(t) = P0*(1-t)^2 + P1*2*(1-t)*t + P2*t^2
So, using pseudocode, we can smoothly trace out the Bezier curve like so:
for i = 0 to step_count
    t = i / step_count
    u = 1 - t
    P = P0*u*u + P1*2*u*t + P2*t*t
    draw_ball_at_position( P )
This assumes that you have already defined the points P0, P1 and P2 as above. If you space the control points evenly then you should get nice even steps along the curve. Just define step_count to be the number of steps along the curve that you would like to see.
Please note that the expression can be evaluated more efficiently.
P(t) = P0*(1-t)^2 + P1*2*(1-t)*t + P2*t^2
and
P = P0*u*u + P1*2*u*t + P2*t*t
both contain multiplications by t that can be simplified.
For example:
C = A*t + B*(1-t) = A*t + B - B*t = t*(A-B) + B
This needs one multiplication instead of two.
The solution proposed by Naaff, that is P(t) = P0*(1-t)^2 + P1*2*(1-t)*t + P2*t^2, will get you the correct "shape", but selecting evenly spaced t in the [0:1] interval will not produce evenly spaced P(t). In other words, the speed is not constant (you can differentiate the previous equation with respect to t to see it).
A common method to traverse a parametric curve at constant speed is to reparametrize by arc length. This means expressing P as P(s), where s is the length traversed along the curve. Obviously, s varies from zero to the total length of the curve. In the case of a quadratic Bezier curve, there's a closed-form solution for the arc length as a function of t, but it's a bit complicated. Computationally, it's often faster to just integrate numerically using your favorite method. Notice however that the idea is to compute the inverse relation, that is, t(s), so as to express P as P(t(s)). Then, choosing evenly spaced s will produce evenly spaced P.
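A rough Python sketch of the numerical route, easy to port to ActionScript: sample the curve finely, accumulate the polyline length into a table of s against t, then invert that table by search plus linear interpolation. The function names, the sample count, and the example control points are assumptions made for illustration.

```python
import math
from bisect import bisect_left

def bezier(p0, p1, p2, t):
    u = 1.0 - t
    return (p0[0] * u * u + 2.0 * p1[0] * u * t + p2[0] * t * t,
            p0[1] * u * u + 2.0 * p1[1] * u * t + p2[1] * t * t)

def evenly_spaced_points(p0, p1, p2, n_points, samples=1000):
    """Return n_points (>= 2) positions at equal arc-length intervals."""
    ts = [i / samples for i in range(samples + 1)]
    cum = [0.0]  # cum[i] approximates the arc length from t=0 to ts[i]
    prev = bezier(p0, p1, p2, 0.0)
    for t in ts[1:]:
        cur = bezier(p0, p1, p2, t)
        cum.append(cum[-1] + math.dist(prev, cur))
        prev = cur
    total = cum[-1]
    points = []
    for k in range(n_points):
        s = total * k / (n_points - 1)  # target arc length
        i = min(bisect_left(cum, s), samples)
        if i == 0:
            t = 0.0
        else:
            seg = cum[i] - cum[i - 1]
            frac = (s - cum[i - 1]) / seg if seg > 0 else 0.0
            t = ts[i - 1] + frac * (ts[i] - ts[i - 1])  # t(s) by interpolation
        points.append(bezier(p0, p1, p2, t))
    return points

pts = evenly_spaced_points((0.0, 0.0), (1.0, 2.0), (3.0, 0.0), 21)
gaps = [math.dist(a, b) for a, b in zip(pts, pts[1:])]
print(min(gaps), max(gaps))
```

Moving the ball to successive entries of `pts`, one per frame, approximates constant speed; more points or more samples tighten the approximation.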

Resources