Which type of function is ReL? - backpropagation

I was wondering whether the ReL function is a linear function or an identity function.
If it were linear (meaning y = kx + d), the function could change its input considerably.
If, on the other hand, it were an identity function (meaning y = x), then the output wouldn't be changed, at least for positive inputs.
As far as I've read, positive values that are run through a ReLU are not changed, meaning that the function (from 0 onwards) is an identity function. Is my understanding correct?

Applied to input values distributed around zero, the ReL function is the identity (y = x) on the positive half of the input domain and constant zero on the negative half. Each piece is linear on its own, but the function as a whole is nonlinear because of the kink at zero. That is why it is referred to as a piecewise linear function, or a hinge function.
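In code the whole function is just a max with zero; a minimal Python/NumPy sketch:

    import numpy as np

    def relu(x):
        # identity for positive inputs, constant zero for negative inputs
        return np.maximum(0.0, x)

    xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(relu(xs))  # [0.  0.  0.  0.5 2. ]  positive values pass through unchanged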

Related

Pseudo Random Function vs Pseudo Random Generator

Hi all, I am reading about PBKDF2 and I could not understand what exactly this pseudo-random function is.
What is the difference between a pseudo-random function and a pseudo-random generator?
1) In my personal opinion, a pseudo-random function is a function-like construction that can replace a truly random function, simulating a choice from the exponentially large set of all true random functions (for example 2^2n of them). With a pseudo-random function, the user just uses a seed k as a key, which is combined with the input x to produce a random-looking result. The effect is as if a function had been chosen at random from the whole function set.
2) Generally speaking, the defining property of a pseudo-random function is that the advantage of a distinguisher, i.e. the difference between its probability of identifying the pseudo-random function and its probability of identifying a truly random function, is negligible. That is why people can choose a pseudo-random function to replace a random function.
3) A truly random function cannot be used in practice: the whole set of functions to choose from is so large that it cannot be stored in a computer or written down by a human. So we choose a pseudo-random function as a smart way around this.
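To make the distinction concrete, here is a minimal Python sketch (my own illustration, not part of PBKDF2 itself) that treats HMAC-SHA256 as a PRF and then builds a simple PRG on top of it by running the PRF in counter mode:

    import hmac, hashlib

    def prf(key: bytes, x: bytes) -> bytes:
        # a keyed function: for a fixed key it behaves like a randomly chosen function of x
        return hmac.new(key, x, hashlib.sha256).digest()

    def prg(seed: bytes, nblocks: int) -> bytes:
        # a generator: stretches one short seed into a long pseudorandom stream
        return b"".join(prf(seed, i.to_bytes(4, "big")) for i in range(nblocks))

    key = b"sixteen byte key"
    print(prf(key, b"input x").hex())   # PRF: queried at an arbitrary input
    print(prg(key, 4).hex())            # PRG: 128 pseudorandom bytes from one seed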

Algorithm to generate a (pseudo-) random high-dimensional function

I don't mean a function that generates random numbers, but an algorithm to generate a random function.
"High-dimensional" means the function is multi-variable, e.g. a 100-dimensional function has 100 different variables.
Let's say the domain is [0,1], so we need to generate a function f: [0,1]^n -> [0,1]. This function is chosen from a certain class of functions such that the probability of choosing any function in the class is the same.
(This class of functions can be all continuous functions, or all K-times differentiable ones, whichever is convenient for the algorithm.)
Since the functions on a closed interval domain are uncountably infinite, we only require the algorithm to be pseudo-random.
Is there a polynomial time algorithm to solve this problem?
I just want to add a possible algorithm to the question (not feasible, though, due to its exponential time complexity). The algorithm was proposed by the friend who actually brought up this question in the first place:
The algorithm can be described simply as follows. First, assume the dimension d = 1 for example, and consider smooth functions on the interval I = [a, b]. Split the domain [a, b] into N small intervals. For each interval I_i, generate a random number f_i drawn from some specific distribution (Gaussian or uniform). Finally, interpolate the series (a_i, f_i), where a_i is a characteristic point of I_i (e.g., we can choose a_i as the midpoint of I_i). After interpolation, we obtain a smooth curve, which can be regarded as a one-dimensional random function construction living in the function space C^m[a, b] (where m depends on the interpolation algorithm we choose).
This is just to say that the algorithm does not need to be that formal and rigorous; it simply has to provide something that works.
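For what it's worth, the d = 1 construction above is only a few lines with off-the-shelf interpolation; a sketch in Python, with SciPy's CubicSpline standing in for "the interpolation algorithm we choose" (so the result lives in C^2):

    import numpy as np
    from scipy.interpolate import CubicSpline

    rng = np.random.default_rng(42)
    a, b, N = 0.0, 1.0, 20
    edges = np.linspace(a, b, N + 1)
    a_i = (edges[:-1] + edges[1:]) / 2    # characteristic points: midpoints of the N subintervals
    f_i = rng.uniform(0.0, 1.0, size=N)   # one random value per subinterval
    f = CubicSpline(a_i, f_i)             # smooth interpolant through (a_i, f_i)

    print(f(0.5))                         # evaluate the random function anywhere in [a, b]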
So if I get it right, you need a function returning a scalar from a vector.
The easiest way I see is to use a dot product.
For example, let n be the dimensionality you need, and create a random vector a[n] containing random coefficients in the range <0,1> such that the sum of all coefficients is 1:
create float a[n]
fill it with positive random numbers (no zeros)
compute the sum of all a[i]
divide each a[i] by this sum
Now the function y = f(x[n]) is simply
y = dot(a[n],x[n]) = a[0]*x[0] + a[1]*x[1] + ... + a[n-1]*x[n-1]
If I didn't miss something, the target range should be <0,1>:
if x == (0,0,0,..,0) then y = 0
if x == (1,1,1,..,1) then y = 1
If you need something more complex, use a higher order of polynomial,
something like y = dot(a0[n],x[n]) * dot(a1[n],x[n]^2) * dot(a2[n],x[n]^3) ...
where x[n]^2 means (x[0]*x[0], x[1]*x[1], ...).
Both approaches result in a function with the same "direction":
if any x[i] rises, then y rises too.
If you want to change that, you have to allow negative values in a[] as well,
but to make that work you need to add some offset to y to shift it away from negative values,
and the a[] normalization process will be a bit more complex,
because you need to find the min and max values.
An easier option is to add a random flag vector m[n] to the process,
where m[i] flags whether 1-x[i] should be used instead of x[i].
This way everything above stays as is.
You can create more types of mapping to make the result even more variable; a runnable sketch follows below.
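A minimal Python/NumPy sketch of the above, including the flip-flag vector m[n]; the function and variable names are my own:

    import numpy as np

    rng = np.random.default_rng()

    def make_random_linear(n):
        a = rng.uniform(0.01, 1.0, size=n)   # positive coefficients, no zeros
        a /= a.sum()                         # normalize so the coefficients sum to 1
        m = rng.integers(0, 2, size=n)       # m[i] == 1 means use 1 - x[i] instead of x[i]
        def f(x):
            x = np.asarray(x, dtype=float)
            x = np.where(m == 1, 1.0 - x, x)
            return float(a @ x)              # stays in [0, 1]: each term is a[i] times a value in [0, 1]
        return f

    f = make_random_linear(100)
    print(f(np.zeros(100)), f(np.ones(100)), f(rng.uniform(size=100)))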
This might not only be hard, but impossible if you actually want to be able to generate every continuous function.
For the one-dimensional case you might be able to create a useful approximation by looking into the Faber-Schauder system (also see the wiki). This gives you a Schauder basis for the continuous functions on an interval. This kind of basis only covers the whole vector space if you include infinite linear combinations of basis vectors. Thus you can create some random functions by building random finite linear combinations from this basis, but in general you won't be able to create the functions that can only be represented by an infinite number of basis vectors this way.
Edit in response to your update:
It seems like choosing a random polynomial function of order K (for the class of K-times differentiable functions) might be sufficient for you, since any such function can be approximated (around a given point) by a polynomial (see Taylor's theorem). Choosing a random polynomial function is easy: just pick K + 1 random real numbers as the coefficients of your polynomial. (Note that this will, for example, not return functions similar to abs(x).)
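A sketch of that in Python/NumPy; K + 1 random coefficients give a random polynomial of order K:

    import numpy as np
    from numpy.polynomial import Polynomial

    rng = np.random.default_rng()
    K = 5
    coeffs = rng.standard_normal(K + 1)  # random real coefficients c0 .. cK
    p = Polynomial(coeffs)               # p(x) = c0 + c1*x + ... + cK*x**K

    print(p(0.5))                        # a smooth (infinitely differentiable) random function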

What algorithm is used to implement the HermiteH function (Mathematica)

I need to port a numerical simulation written in Wolfram Mathematica to another language. The part that is giving me trouble is that the code calls the HermiteH function with a non-integral order (the parameter n is a fractional number, not an integer), which I'm guessing relies on some extension of the Hermite polynomials. What algorithm can be used to implement this function, and what does it actually calculate when given a non-integral order?
(I do know how to implement Hermite polynomials for integral orders.)
http://www.maplesoft.com/support/help/maple/view.aspx?path=HermiteH
For n different from a non-negative integer, the analytic extension of the Hermite polynomial is given by

HermiteH(n, x) = 2^n * sqrt(Pi) * ( KummerM(-n/2, 1/2, x^2) / GAMMA((1-n)/2) - 2*x * KummerM((1-n)/2, 3/2, x^2) / GAMMA(-n/2) )

where KummerM is Kummer's function (of the first kind) M and GAMMA is the gamma function Γ.
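Assuming that formula from the Maple documentation, a direct Python translation using mpmath (whose hyp1f1 is Kummer's M); mpmath also ships a built-in hermite for arbitrary order, which is handy as a cross-check:

    from mpmath import mp, mpf, gamma, hyp1f1, sqrt, pi, hermite

    mp.dps = 30  # working precision in decimal digits

    def hermite_h(n, x):
        # analytic extension of HermiteH to non-integral order n
        n, x = mpf(n), mpf(x)
        return 2**n * sqrt(pi) * (hyp1f1(-n/2, mpf(1)/2, x**2) / gamma((1 - n)/2)
                                  - 2*x*hyp1f1((1 - n)/2, mpf(3)/2, x**2) / gamma(-n/2))

    print(hermite_h(0.5, 1.2))
    print(hermite(0.5, 1.2))   # mpmath's built-in Hermite function, for comparison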

Single Perceptron - Non-linear Evaluating function

In the case of a single perceptron, the literature states that it cannot be used for separating non-linear discriminant cases like the XOR function. This is understandable, since the VC dimension of a line (in 2-D) is 3, and so a single 2-D line cannot discriminate outputs like XOR.
However, my question is why should the evaluating function in the single perceptron be a linear-step function? Clearly if we have a non-linear evaluating function like a sigmoid, this perceptron can discriminate between the 1s and 0s of XOR. So, am I missing something here?
if we have a non-linear evaluating function like a sigmoid, this perceptron can discriminate between the 1s and 0s of XOR
That's not true at all. The criterion for discrimination is not the shape of the line (or hyperplane in higher dimensions), but rather whether the function allows linear separability.
There is no single hyperplane capable of separating the points of the XOR function. A curved boundary can separate the points, but it is not a function of this linear form.
To separate the points of XOR, you'll have to use at least two lines (or other curve shapes). This will require two separate perceptrons. Then, you could use a third perceptron to separate the intermediate results on the basis of sign, as in the sketch below.
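For instance, a minimal sketch of that three-perceptron arrangement in Python, with hand-picked weights (the weights are my own choice for illustration, not learned):

    def step(z):
        return 1 if z > 0 else 0

    def xor(x1, x2):
        h1 = step(x1 + x2 - 0.5)    # first line: fires on (0,1), (1,0), (1,1)
        h2 = step(x1 + x2 - 1.5)    # second line: fires only on (1,1)
        return step(h1 - h2 - 0.5)  # third perceptron combines the intermediate results

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, xor(*x))           # prints 0, 1, 1, 0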
I assume by sigmoid you don't actually mean a sigmoid, but something with a local maximum. Whereas the normal perceptron binary classifier is of the form:
f(x) = (1 if w.x+b>0 else 0)
you could have a function:
f(x) = (1 if |w.x+b|<0.5 else 0)
This certainly would work, but it would be fairly artificial, in that you are effectively tailoring your model to your dataset, which is bad.
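A quick Python sketch showing that such a "band" classifier does handle XOR, with hand-picked rather than learned weights:

    import numpy as np

    w, b = np.array([1.0, 1.0]), -1.0

    def band_classifier(x):
        # fires when the point lies within 0.5 of the line w.x + b = 0
        return 1 if abs(np.dot(w, x) + b) < 0.5 else 0

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, band_classifier(np.array(x, dtype=float)))  # 0, 1, 1, 0: XOR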
The normal perceptron learning rule would almost certainly fail to converge with such a function, though I may be mistaken; see http://en.wikipedia.org/wiki/Perceptron#Separability_and_convergence. You might need to come up with a whole new way to fit the function, which sort of defeats the purpose.
Or you could just use a support vector machine, which is like perceptron, but is able to handle more complicated cases by using the kernel trick.
Old question, but I want to leave my thoughts (anyone correct me if I'm wrong).
I think you've mixed up the concepts of a linear model and the loss or error function.
The perceptron is by definition a linear model, so it defines a line/plane/hyperplane which you can use to separate your classes.
The standard perceptron algorithm takes the sign of the output, giving -1 or 1:
yhat = sign(w * X + w0)
This is fine and will eventually converge if your data is linearly separable.
To improve this you can use a sigmoid to smooth the model's output into the range [-1, 1]:
yhat = -1 + 2*sigmoid(w * X + w0)
mean_squared_error = (Y - yhat)^2
Then use a numerical optimizer like Gradient Descent to minimize the error over your entire dataset. Here w0, w1, w2, ..., wn are your variables.
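A minimal sketch of that setup in Python/NumPy, on toy AND-like data (which is linearly separable); the learning rate and iteration count are arbitrary choices:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    Y = np.array([-1., -1., -1., 1.])          # labels in {-1, 1}
    w, w0, lr = np.zeros(2), 0.0, 0.5

    for _ in range(5000):
        yhat = -1 + 2 * sigmoid(X @ w + w0)    # smooth output in (-1, 1)
        dldz = (yhat - Y) * (1 - yhat**2)      # chain rule through the squared error and sigmoid
        w -= lr * X.T @ dldz / len(X)          # gradient descent on w1 .. wn
        w0 -= lr * dldz.mean()                 # and on the bias w0

    print(np.sign(X @ w + w0))                 # [-1. -1. -1.  1.] once converged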
Now, if your original data is not linearly separable, you can transform it in a way that makes it linearly separable and then apply any linear model. This works because the model is linear in the weights.
This is basically what models like SVMs do under the hood to classify your non-linear data.
PS: I'm learning this stuff too, so experts, don't be mad at me if I said some crap.

Fitting values with polyfit in Matlab

I have made some measurements with tic-toc of X = qr(A) and [Q,R] = qr(A), where A is a random matrix with dimensions n-by-n (n = 100:100:1000).
Now I want to create a function that describes the time measurements I have made. I want the polynomial to be cubic, and I want to use the polyfit function to create it. However, I can't understand what arguments to pass to polyfit. The last argument will be 3 (cubic), but what should the other two arguments be?
n is the first argument, and the measured time is the second. Both must be vectors of the same length.
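In other words, polyfit(n, t, 3), where n holds the matrix sizes and t the corresponding timings. NumPy's polyfit mirrors Matlab's argument order, so a Python sketch (with placeholder timings standing in for your tic-toc measurements) looks like:

    import numpy as np

    n = np.arange(100, 1100, 100, dtype=float)  # matrix sizes, as in n = 100:100:1000
    t = 1e-9 * n**3 + 1e-7 * n**2               # placeholder timings; use your measured values here
    p = np.polyfit(n, t, 3)                     # cubic fit: x values, y values, degree
    print(np.polyval(p, 550.0))                 # evaluate the fitted model at a new size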
