Derivative of a program - transformation

Let us assume you can represent a program as a mathematical function, which is possible. What does the program representation of the first derivative of that function look like? Is there a way to transform a program into its "derivative" form, and does this make sense at all?

Yes it does make sense, it's known as Automatic Differentiation. There are one or two experimental compilers which can do this, for example NAGware's Differentiation Enabled Fortran Compiler Technology. And there are a lot of research papers on the topic. I suggest you get Googling.
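As a rough illustration of what automatic differentiation does, here is a minimal forward-mode sketch in Python using dual numbers. The names Dual, _lift and derivative are made up for this example, and it only supports addition and multiplication; it is not how any particular AD compiler is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative with respect to the input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(f):
    """Return a function that evaluates f'(x) by running f on a dual number."""
    def f_prime(x):
        return f(Dual(float(x), 1.0)).deriv
    return f_prime


def f(x):
    return x * x + 3 * x


f_prime = derivative(f)
print(f_prime(2.0))  # 2*2 + 3 = 7.0
```

Unlike a finite-difference approximation, this propagates exact derivative rules through each operation, so the only error is ordinary floating point rounding.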

First, it only makes sense to try to get the derivative of a pure function (one that does not affect external state and returns the same output for the same input). Second, the type system of many programming languages involves a lot of step functions (e.g. integers), meaning you'd have to get your program to work in terms of continuous functions in order to get a valid first derivative. Third, getting the derivative of any function involves breaking it down and manipulating it symbolically. Thus, you can't get the derivative of a function without knowing what operations it is made of. This could be achieved with reflection.
You could create a derivative approximation function if your programming language supports closures (that is, nested functions and the ability to put functions into variables and return them). Here is a JavaScript example taken from http://en.wikipedia.org/wiki/Closure_%28computer_science%29 :
function derivative(f, dx) {
    return function(x) {
        return (f(x + dx) - f(x)) / dx;
    };
}
Thus, you could say:
function f(x) { return x*x; }
f_prime = derivative(f, 0.0001);
Here, f_prime will approximate function(x) {return 2*x;}
If a programming language implemented higher-order functions and enough algebra, one could implement a real derivative function in it. That would be really cool.

See Lambda the Ultimate discussions on Derivatives and dissections of data types and Derivatives of Regular Expressions

How do you define the mathematical function of a program?
A derivative represents the rate of change of a function. If your function isn't continuous, its derivative will be undefined over most of the domain.

I'm just gonna say that this doesn't make a lot of sense, as a program is much more abstract and "ruleless" than a mathematical function. As a derivative is a measure of the change in output as the input changes, there are certainly some programs where this could apply. However, you'd need to be able to quantify your input/output both in numerical terms.
Since input and output would both be numerical, it's reasonable to assume that your program represents or operates similarly to a mathematical function, or a series of functions. Hence, you can easily represent a derivative, but it would be no different from converting the mathematical derivative of a function to a computer program.

If the program is denoted as a (Schwartz) distribution, then you have some notion of derivative, assuming that test functions model your postcondition (you can still take the limit to get a characteristic function). For instance, the assignment x:=x+1 is associated with the Dirac distribution \delta_{x_0+1}, where x_0 is the initial value of the variable x. However, I have no idea what the computational meaning of \delta_{x_0+1}' is.

I am wondering, what if the program you're trying to "derive" uses some form of heuristics? How can it be derived then?
Half-jokingly, we all know that all real programs use at least a rand().

Related

jax minimization with stochastically estimated gradients

I'm trying to use the bfgs optimizer from tensorflow_probability.substrates.jax and from jax.scipy.optimize.minimize to minimize a function f which is estimated from pseudo-random samples and has a jax.random.PRNGKey as argument. To use this function with the jax/tfp bfgs minimizer, I wrap the function inside a lambda function
seed = 100
key = jax.random.PRNGKey(seed)
fun = lambda x: f(x, key)
result = jax.scipy.optimize.minimize(fun = fun, ...)
What is the best way to update the key when the minimization routine calls the function to be minimized so that I use different pseudo-random numbers in a reproducible way? Maybe a global key variable? If yes, is there an example I could follow?
Secondly, is there a way to make the optimization stop after a certain amount of time, as one could do with a callback in scipy? I could directly use the scipy implementation of bfgs/l-bfgs-b/etc. and use jax only for the estimation of the function and of its gradients, which seems to work. Is there a difference between the scipy, jax.scipy and tfp.jax bfgs implementations?
Finally, is there a way to print the values of the arguments of fun during the bfgs optimization in jax.scipy or tfp, given that f is jitted?
Thank you!
There is no way to do what you're asking with jax.scipy.optimize.minimize, because the minimizer does not offer any means to track changing state between function calls, and does not provide for any inbuilt stochasticity in the optimizer.
If you're interested in stochastic optimization in JAX, you might try stochastic optimization in JAXOpt, which provides a much more flexible set of optimization routines.
Regarding your second question, if you'd like to print values during the course of a jit-compiled optimization or other loop, you can use jax.debug.print.

Generate a Random number in Uppaal

My question is: can I generate a random number in Uppaal?
I would like to generate a number from a range of values. Moreover, I would like to generate not just integers but double values as well.
for example: double [7.25,18.3]
I found this question, which discusses the same thing, and I tried it.
However, I got this error: syntax error unexpected T_SELECT.
It doesn't work. I'm pretty new to the Uppaal world, and I would appreciate any help you can provide.
Regards,
This is a common and misunderstood question in Uppaal.
Simple answer:
double val; // declaration
val = random(18.3-7.25)+7.25; // use in update, works in SMC (Uppaal v4.1)
Verbose answer:
Uppaal supports symbolic analysis as well as statistical analysis, and the treatment and possibilities are radically different. So one has to decide first what kind of analysis is needed. Usually one starts with simple symbolic analysis and then augments it with stochastic features; sometimes stochastic behavior also needs to be checked symbolically.
In symbolic analysis (queries A[], A<>, E<>, E[] etc), random is synonymous with non-deterministic, i.e. if the model contains some "random" behavior, then verification should check all of the possibilities anyway. Therefore such behavior is modelled as non-deterministic choices between edges. It is easy to set up a set of edges over an integer range by using a select statement on the edge, where a temporary variable is declared and its value can be used in guards, synchronization and update. Symbolic analysis supports only integer data types (no floating point types like double) and continuous ranges over clocks (specified by constraints in guards and invariants).
Statistical analysis (via Monte-Carlo simulations, queries like Pr[...](<> p), E[...](max: var), simulate, etc) supports double types and floating point functions like sin, cos, sqrt, random(MAX) (uniform distribution over [0, MAX)), random_normal(mean, dev) etc. in addition to int data types. Clock variables can also be treated as floating point type, except that their derivative is set to 1 by default (can be changed in the invariants which allow ODEs -- ordinary differential equations).
It is possible to create models with floating point operations (including random) and still apply symbolic analysis provided that the floating point variables do not influence/constrain the model behavior, and act merely as a cost function over the state space. Here are systematic rules to achieve this:
a) the clocks used in ODEs must be declared of hybrid clock type.
b) hybrid clock and double type variables cannot appear in guard and invariant constraints. Only ODEs are allowed over the hybrid clocks in the invariant.

error bound in function approximation algorithm

Suppose we have the set of floating point numbers with an "m"-bit mantissa and "e" bits for the exponent. Suppose moreover that we want to approximate a function "f".
From the theory we know that usually a "range reduced function" is used, and the global function value is then derived from it.
For example let x = (sx,ex,mx) (sign exp and mantissa) then...
log2(x) = ex + log2(1.mx) so basically the range reduced function is "log2(1.mx)".
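To make the decomposition concrete, here is a small Python sketch for ordinary doubles. It uses math.frexp to split x into exponent and significand; math.log2 on the reduced argument merely stands in for whatever polynomial or table approximation of log2(1.mx) would actually be used, and the function name log2_range_reduced is made up for this example.

```python
import math


def log2_range_reduced(x):
    """Compute log2(x) as ex + log2(1.mx) for a positive double x."""
    if x <= 0.0:
        raise ValueError("log2 is only defined for positive x")
    # frexp gives x = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(x)
    # Rescale so the significand lies in [1, 2), i.e. the "1.mx" value.
    significand = 2.0 * mantissa
    ex = exponent - 1
    # Stand-in for the range reduced approximation of log2 on [1, 2).
    reduced = math.log2(significand)
    return ex + reduced


print(log2_range_reduced(8.0))  # 3.0
```

Any error made in the reduced value is added to ex unchanged, which is why its effect in ulps of the final result depends on the magnitude of ex + log2(1.mx).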
So far I have implemented reciprocal, square root, log2 and exp2, and recently I've started to work on the trigonometric functions. But I was wondering whether, given a global error bound (ulp error especially), it is possible to derive an error bound for the range reduced function. Is there some study of this kind of problem? Taking log2(x) as an example, I would like to be able to say...
"ok i want log2(x) with k ulp error, to achieve this given our floating point system we need to approximate log2(1.mx) with p ulp error"
Remember that, as I said, we know we are working with floating point numbers, but the format is generic, so it could be the classic F32, but also for example e = 10, m = 8 and so on.
I can't actually find any reference that shows this kind of study. The reference I have (i.e. Muller's book) doesn't treat the topic in this way, so I was looking for some kind of paper or similar. Do you know any reference?
I'm also trying to derive such a bound myself, but it is not easy...
There is a description of current practice, along with a proposed improvement and an error analysis, at https://hal.inria.fr/ensl-00086904/document. The description of current practice appears consistent with the overview at https://docs.oracle.com/cd/E37069_01/html/E39019/z4000ac119729.html, which is consistent with my memory of the most talked about problem being the mod pi range reduction of trigonometric functions.
I think IEEE floating point was a big step forwards just because it standardized things at a time when there were a variety of computer architectures, so lowering the risks of porting code between them, but the accuracy requirements implied by this may have been overkill: for many problems the constraint on the accuracy of the output is the accuracy of the input data, not the accuracy of the calculation of intermediate values.

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control-flow of a program cannot possibly have any effect on its output. For example:
global var_1
global var_2
start program hello(var_3, var_4)
    if (var_2 < 0) then
        save-log-to-disk (var_1, var_3, var_4)
    end-if
    return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem that you stated is undecidable in general,
even for the following very narrow special case:
Given a single routine P(x), where x is a parameter of type integer. Is the output of P(x) independent of the value of x, i.e., does
P(0) = P(1) = P(2) = ...?
We can reduce the following (still undecidable) version of the halting problem to the question above: given a Turing machine M(), does M() never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
    if x == 0:
        return 0
    Run M() for x steps
    if M() has terminated then:
        return 1
    else:
        return 0
Now:
P(0) = P(1) = P(2) = ...
<=>
M() does not terminate.
To see this: if M() does terminate
=> P(x) = 1 for a sufficiently large x
=> P(x) != P(0) = 0
Conversely, if M() never terminates, then P(x) = 0 for every x.
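The construction of P can be sketched in Python if we model a "machine" as a generator function that yields once per step. This modelling choice, and the names make_p, halts_after_three_steps and loops_forever, are assumptions made for the example and not part of the argument above.

```python
def make_p(machine):
    """Build the routine P(x) from the reduction for a given machine M."""
    def p(x):
        if x == 0:
            return 0
        run = machine()
        # Run M for at most x steps.
        for _ in range(x):
            try:
                next(run)
            except StopIteration:
                return 1  # M terminated within x steps
        return 0  # M was still running after x steps
    return p


def halts_after_three_steps():
    for _ in range(3):
        yield


def loops_forever():
    while True:
        yield


p_halt = make_p(halts_after_three_steps)
p_loop = make_p(loops_forever)
print([p_halt(x) for x in range(6)])  # [0, 0, 0, 0, 1, 1]
print([p_loop(x) for x in range(6)])  # [0, 0, 0, 0, 0, 0]
```

Deciding whether p returns the same value for every x would amount to deciding whether the machine ever halts, which is exactly the reduction.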
So, it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like Expression Simplification, Constant propagation) having the side effect of eliminating appearances of such redundant variables.
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer your question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection of a program assignment (or side effect) to a variable value, to a place in the application that consumes that value.
If there is (transitive) dataflow from a program output that you care about (in your example, the printed text stream) to an input you supplied (var_2), then that input "affects" the output. A variable that does not flow from the input to your desired output is useless from your point of view.
If you focus your attention only on the computations involved in the dataflows, and display them, you get what is generally called a "program slice". There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
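A minimal Python sketch of the backward tracing idea, assuming the dependency edges have already been extracted. Here they are written by hand for the hello() example from the question; the node names return_value and log_on_disk and the function backward_slice are hypothetical, and a real tool would derive the edges (including control dependences such as the test on var_2) from the code.

```python
def backward_slice(deps, target):
    """Return every name that can transitively flow into `target`."""
    reached = set()
    worklist = [target]
    while worklist:
        node = worklist.pop()
        for source in deps.get(node, ()):
            if source not in reached:
                reached.add(source)
                worklist.append(source)
    return reached


# Hand-written edges: each key depends on the names in its set.
deps = {
    "return_value": {"var_1", "var_3"},
    # The log call uses var_1, var_3, var_4 and is guarded by var_2.
    "log_on_disk": {"var_1", "var_2", "var_3", "var_4"},
}

print(sorted(backward_slice(deps, "return_value")))  # ['var_1', 'var_3']
```

Anything not in the returned set has no recorded flow to the chosen output, subject to the precision limits discussed in this answer.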
They all suffer from some limitation due to Turing's impossibility proofs as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places that it cannot know the right answer; simply pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and it might incorrectly report that a variable does not affect the answer. (This is called a "false negative".) This occasional error may be satisfactory if the algorithm has some other nice properties, e.g. it runs really fast on millions of lines of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it a lot. See also this page: (http://en.wikipedia.org/wiki/Program_slicing)
You will discover that implementing this is a pretty big effort for any real language. You are probably better off finding a tool framework that does most or all of this for you already.
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
This algorithm only implements a subset of the OP's requirement, and it is horribly inefficient because it requires multiple passes. A garbage collection may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by elimination of the tail of a flow ending in an assignment.
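A rough Python sketch of this repeated elimination, under simplifying assumptions: the program is just a list of (lhs, rhs_variables) assignments plus the variables read by the final result, and deleting an assignment is always safe (no side effects on the right-hand side). The function name eliminate_unused and this representation are made up for the example.

```python
def eliminate_unused(assignments, params, outputs):
    """Repeatedly delete assignments whose left-hand side is never used.

    assignments: list of (lhs, rhs_vars) pairs
    params:      parameter names (always counted as used)
    outputs:     variables read by the final result expression
    """
    assignments = list(assignments)
    while True:
        # A variable is used if it is a parameter or occurs in an expression.
        used = set(params) | set(outputs)
        for _, rhs_vars in assignments:
            used |= set(rhs_vars)
        kept = [(lhs, rhs) for lhs, rhs in assignments if lhs in used]
        if len(kept) == len(assignments):
            return kept  # nothing was deleted in this pass
        assignments = kept


program = [
    ("a", ["x"]),  # a = f(x)
    ("b", ["a"]),  # b = g(a)
    ("c", ["b"]),  # c = h(b), never read
    ("d", ["x"]),  # d = k(x)
]
print(eliminate_unused(program, params=["x"], outputs=["d"]))
# [('d', ['x'])]
```

Each pass removes only the current tail of a dead chain (first c, then b, then a), which is why several passes are needed.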
For my language, the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification; it may not be suitable for other languages. Effectiveness is improved by running the algorithm before inlining, to reduce the cost of inlining unused function applications, and then running it again afterwards, which eliminates parameters of inlined functions.
Just as an example of the utility of the language specification, the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than their effectiveness at achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer who understands the basics of how the compiler operates can leverage this knowledge to help the compiler. The most well known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.

Is there a way to predict unknown function value based on its previous values

I have values returned by unknown function like for example
# this is an easy case - parabolic function
# but in my case the function is really unknown, as it is connected to process execution time
[0, 1, 4, 9]
is there a way to predict next value?
Not necessarily. Your "parabolic function" might be implemented like this:
def mindscrew
  # Hands out the stored values front to back: 0, 1, 4, 9, then "cat", ...
  @nums ||= [0, 1, 4, 9, "cat", "dog", "cheese"]
  @nums.shift
end
You can take a guess, but to predict with certainty is impossible.
You can try a neural network approach. You can find quite a few articles with the Google query "neural network function approximation". Many books are also available, e.g. this one.
If you just want data points
Extrapolation of data outside of known points can be estimated, but you need to accept the potential differences are much larger than with interpolation of data between known points. Strictly, both can be arbitrarily inaccurate, as the function could do anything crazy between the known points, even if it is a well-behaved continuous function. And if it isn't well-behaved, all bets are already off ;-p
There are a number of mathematical approaches to this (that have direct application to computer science) - anything from simple linear algebra to things like cubic splines; and everything in between.
If you want the function
Getting esoteric; another interesting model here is genetic programming; by evolving an expression over the known data points it is possible to find a suitably-close approximation. Sometimes it works; sometimes it doesn't. Not the language you were looking for, but Jason Bock shows some C# code that does this in .NET 3.5, here: Evolving LINQ Expressions.
I happen to have his code "to hand" (I've used it in some presentations); with something like a => a * a it will find it almost instantly, and it should (in theory) be able to find virtually any method, but without any defined maximum run length ;-p It is also possible to get into a dead end (evolutionarily speaking) from which you simply never recover...
Use the Wolfram Alpha API :)
Yes. Maybe.
If you have some input and output values, i.e. in your case [0,1,2,3] and [0,1,4,9], you could use response surfaces (basically function fitting, I believe) to 'guess' the actual function (in your case f(x)=x^2). If you let your guessing function be f(x)=c1*x+c2*x^2+c3, there are algorithms that will determine that c1=0, c2=1 and c3=0 given your input and output, and given the resulting function you can predict the next value.
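As a sketch of one such algorithm, here is an ordinary least-squares fit of f(x)=c1*x+c2*x^2+c3 in plain Python, solving the normal equations exactly with fractions. The helper names fit_quadratic and predict are made up for this example; in practice you would more likely reach for a numerical library.

```python
from fractions import Fraction


def fit_quadratic(xs, ys):
    """Least-squares fit of f(x) = c1*x + c2*x**2 + c3; returns [c1, c2, c3]."""
    # Each data point contributes a row [x, x^2, 1], matching (c1, c2, c3).
    rows = [[Fraction(x), Fraction(x) ** 2, Fraction(1)] for x in xs]
    n = 3
    # Augmented normal equations (A^T A | A^T y).
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * Fraction(y) for r, y in zip(rows, ys))]
           for i in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def predict(coeffs, x):
    c1, c2, c3 = coeffs
    return c1 * x + c2 * x ** 2 + c3


coeffs = fit_quadratic([0, 1, 2, 3], [0, 1, 4, 9])
print(predict(coeffs, 4))  # 16
```

The prediction is only as good as the assumption that the data really follows a quadratic; it needs at least three distinct x values to determine the coefficients.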
Note that most other answers to this question are valid as well. I am just assuming that you want to fit some function to data. In other words, I find your question quite vague; please try to pose your questions as completely as possible!
In general, no... unless you know it's a function of a particular form (e.g. polynomial of some degree N) and there is enough information to constrain the function.
e.g. for a more "ordinary" counterexample (see Chuck's answer) for why you can't necessarily assume n^2 w/o knowing it's a quadratic equation, you could have f(n) = n^4 - 6n^3 + 12n^2 - 6n, which has for n=0,1,2,3,4,5 f(n) = 0,1,4,9,40,145.
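A quick Python check of this counterexample (the function names square and quartic are just labels for this snippet):

```python
def square(n):
    return n ** 2


def quartic(n):
    return n**4 - 6 * n**3 + 12 * n**2 - 6 * n


# The two functions agree for n = 0..3 and diverge from n = 4 onwards.
for n in range(6):
    print(n, square(n), quartic(n))
```

Both candidates reproduce 0, 1, 4, 9, so the first four values alone cannot tell them apart.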
If you do know it's a particular form, there are some options... if the form is a linear combination of basis functions (e.g. f(x) = a + b*cos(x) + c*sqrt(x)) then using least-squares can get you the unknown coefficients for the best fit using those basis functions.
See also this question.
You can apply statistical methods to try to guess the next answer, but that might not work very well if the function is like this one (C):
int evil(void){
    static int e = 0;
    if(50 == e++){
        e = e * 100;
    }
    return e;
}
This function will return nice simple increasing numbers then ... BAM.
That's a hard problem.
You should look into recurrence relations for the special cases where such a task could be possible.

Resources