Generate a random number in Uppaal

My question is: can I generate a random number in Uppaal?
I would like to generate a number from a range of values. Even more, I would like to generate not just integers but double values as well,
for example: double [7.25, 18.3]
I found an earlier question that talked about the same thing and tried it.
However, I got this error: syntax error: unexpected T_SELECT.
It doesn't work. I'm pretty new to the Uppaal world; I would appreciate any help you can provide.
Regards,

This is a common and often misunderstood question about Uppaal.
Simple answer:
double val; // declaration
val = random(18.3-7.25)+7.25; // use in update, works in SMC (Uppaal v4.1)
Verbose answer:
Uppaal supports both symbolic and statistical analysis, and the treatment and possibilities are radically different, so one has to decide first what kind of analysis is needed. Usually one starts with simple symbolic analysis and then augments the model with stochastic features; sometimes the stochastic behavior also needs to be checked symbolically.
In symbolic analysis (queries A[], A<>, E<>, E[] etc.), random is synonymous with non-deterministic, i.e. if the model contains some "random" behavior, then verification should check all of it anyway. Therefore such behavior is modelled as non-deterministic choices between edges. It is easy to set up a set of edges over an integer range by using a select statement on the edge, where a temporary variable is declared and its value can be used in guards, synchronization and update. Symbolic analysis supports only integer data types (no floating point types like double) and continuous ranges over clocks (specified by constraints in guards and invariants).
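For example, the label fields of a single edge could look like this (a minimal sketch; the names i, x and ch and the range int[0,10] are placeholders, not taken from the question):
select:          i : int[0,10]    // binds i non-deterministically to one value in 0..10
guard:           i >= 3           // optionally restricts which choices are enabled
synchronisation: ch[i]!           // i may index a channel array, for instance
update:          x = i            // or be stored in an ordinary integer variable
Symbolic verification then explores every value of i allowed by the guard.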
Statistical analysis (via Monte Carlo simulations; queries like Pr[...](<> p), E[...](max: var), simulate, etc.) supports double types and floating point functions like sin, cos, sqrt, random(MAX) (uniform distribution over [0, MAX)), random_normal(mean, dev) etc. in addition to int data types. Clock variables are also treated as floating point, except that their derivative is 1 by default (this can be changed in invariants, which allow ODEs -- ordinary differential equations).
It is possible to create models with floating point operations (including random) and still apply symbolic analysis, provided that the floating point variables do not influence or constrain the model behavior and act merely as a cost function over the state space. Here are systematic rules to achieve this:
a) the clocks used in ODEs must be declared of hybrid clock type.
b) hybrid clock and double type variables cannot appear in guard and invariant constraints. Only ODEs are allowed over the hybrid clocks in the invariant.
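A minimal sketch of declarations that respect these rules (the names cost, energy and x are placeholders, not from any particular model):
hybrid clock cost;     // its rate may be set by an ODE in an invariant, e.g. cost' == x + 1
double energy = 0.0;   // may be updated on edges, e.g. energy = energy + random(2.0)
// Neither cost nor energy appears in any guard or ordinary invariant constraint,
// so they act purely as observers/costs over the symbolic state space.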

Related

error bound in function approximation algorithm

Suppose we have a set of floating point numbers with an m-bit mantissa and e bits for the exponent. Suppose moreover that we want to approximate a function f.
From the theory we know that usually a "range-reduced function" is used, and the global function value is then derived from it.
For example, let x = (sx, ex, mx) (sign, exponent and mantissa); then
log2(x) = ex + log2(1.mx), so basically the range-reduced function is log2(1.mx).
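For a concrete instance of that reduction: x = 10 is stored as 1.25 * 2^3, i.e. ex = 3 and 1.mx = 1.25, so log2(10) = 3 + log2(1.25) ≈ 3 + 0.3219 = 3.3219, and only log2(1.25), whose argument lies in [1, 2), has to be approximated.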
At present I have implemented reciprocal, square root, log2 and exp2; recently I have started to work on the trigonometric functions. But I was wondering: given a global error bound (ulp error especially), is it possible to derive an error bound for the range-reduced function? Is there some study of this kind of problem? Taking log2(x) as an example, I would like to be able to say:
"OK, I want log2(x) with k ulp error; to achieve this, given our floating point system, we need to approximate log2(1.mx) with p ulp error."
Remember that, as I said, we know we are working with floating point numbers, but the format is generic, so it could be the classic F32, but also for example e = 10, m = 8 and so on.
I can't actually find any reference that shows this kind of study. The references I have (e.g. Muller's book) don't treat the topic in this way, so I was looking for some kind of paper or similar. Do you know of any reference?
I'm also trying to derive such a bound by myself, but it is not easy...
There is a description of current practice, along with a proposed improvement and an error analysis, at https://hal.inria.fr/ensl-00086904/document. The description of current practice appears consistent with the overview at https://docs.oracle.com/cd/E37069_01/html/E39019/z4000ac119729.html, which is consistent with my memory of the most talked about problem being the mod pi range reduction of trigonometric functions.
I think IEEE floating point was a big step forward simply because it standardized things at a time when there was a variety of computer architectures, thereby lowering the risk of porting code between them. But the accuracy requirements it implies may have been overkill: for many problems the constraint on the accuracy of the output is the accuracy of the input data, not the accuracy of the calculation of intermediate values.

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control flow of a program cannot possibly have any effect on its output. For example:
global var_1
global var_2
start program hello(var_3, var_4)
    if (var_2 < 0) then
        save-log-to-disk (var_1, var_3, var_4)
    end-if
    return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem that you stated is undecidable in general, even for the following very narrow special case:
Given a single routine P(x), where x is an integer parameter, is the output of P(x) independent of the value of x, i.e., does
P(0) = P(1) = P(2) = ... hold?
We can reduce the following, still undecidable, version of the halting problem to the question above: given a Turing machine M(), does it never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
    if x == 0:
        return 0
    run M() for x steps
    if M() has terminated then:
        return 1
    else:
        return 0
Now:
P(0) = P(1) = P(2) = ...  =>  M() does not terminate,
because (contrapositive):
M() does terminate
  => P(x) = 1 for a sufficiently large x
  => P(x) != P(0) = 0.
So it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of the values passed to it (or even loop forever, which would most definitely change the return value of the routine ;-)).
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like expression simplification or constant propagation) as having the side effect of eliminating occurrences of such redundant variables.
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer your question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection from a program assignment (or side effect) that gives a variable its value, to a place in the application that consumes that value.
If there is (transitive) dataflow between a program output that you care about (in your example, the printed text stream) and an input you supplied (say, var_2), then that input "affects" the output. A variable that does not flow from the input to your desired output is useless from your point of view.
If you focus your attention only on the computations involved in the dataflows, and display them, you get what is generally called a "program slice". There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
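As a rough illustration of the idea (not any particular book's algorithm, and the dependence graph below is written by hand for the question's example rather than computed from source code):
# Minimal backward-slice sketch: deps maps each value to the values it is computed from.
# An input "affects" an output iff it is reachable from that output in this graph.
deps = {
    "output": ["var_1", "var_3"],                       # return ("Hello " + var_3 + ... + var_1)
    "log_call": ["var_1", "var_3", "var_4", "var_2"],   # the guarded side effect
}

def affects_output(graph, output, var):
    """True if 'var' is (transitively) reachable from 'output' in the dependence graph."""
    seen, stack = set(), [output]
    while stack:
        node = stack.pop()
        if node == var:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

print(affects_output(deps, "output", "var_1"))  # True
print(affects_output(deps, "output", "var_2"))  # False: var_2 only guards a side effect
A real tool computes the graph from assignments, calls and control dependences instead of writing it down by hand; that construction is where all the hard work (and the approximation) lives.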
All such algorithms suffer from limitations due to Turing's impossibility results, as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places where it cannot know the right answer; it simply has to pick one.
If your algorithm chooses to answer "there is no dataflow" in places where it is not sure, then it may miss a valid dataflow and incorrectly report that a variable does not affect the answer. (This is called a "false negative".) This occasional error may be acceptable if the algorithm has some other nice properties, e.g. it runs really fast on millions of lines of code. (The trivial algorithm simply says "no dataflow" everywhere, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it a lot. See also this page: http://en.wikipedia.org/wiki/Program_slicing
You will discover that implementing this is a pretty big effort for any real language. You are probably better off finding a tool framework that already does most or all of this for you.
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, other than as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
This algorithm only implements a subset of the OP's requirement, and it is horribly inefficient because it requires multiple passes. A garbage-collection-style approach may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by eliminating the tail of a flow that ends in an assignment.
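A sketch of that iteration on a toy intermediate form - a list of (target, variables-read) assignments plus a set of output variables; the names are invented, and RHS side effects are ignored (see the note on the language specification below):
# Toy IR: each assignment is (target, vars_read); 'outputs' are the variables the program returns.
def eliminate_unused(assignments, outputs):
    """Repeatedly drop assignments whose target is never read, using the counting scheme above."""
    changed = True
    while changed:
        changed = False
        used = set(outputs)                   # a variable is used if it is an output...
        for _target, reads in assignments:
            used.update(reads)                # ...or read by a surviving assignment
        kept = [(t, reads) for (t, reads) in assignments if t in used]
        if len(kept) != len(assignments):
            assignments, changed = kept, True
    return assignments

prog = [("a", ["x"]), ("b", ["a"]), ("c", ["b"]), ("d", ["x"])]   # d feeds nothing
print(eliminate_unused(prog, outputs={"c"}))  # [('a', ['x']), ('b', ['a']), ('c', ['b'])]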
For my language, the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification; this may not be suitable for other languages. Effectiveness is improved by running the pass before inlining, to reduce the cost of inlining unused function applications, and then running it again afterwards, which eliminates the parameters of inlined functions.
Just as an example of the utility of the language specification: the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool is elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than their effectiveness at achieving a theoretical goal (like removing all unused variables). Simple reductions are useful not only because they are fast and easy to write, but because a programmer who understands the basics of how the compiler operates can leverage that knowledge to help the compiler. The best-known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.

constrained regression with many variables

I have around 200 dummies, and wish to run a constrained OLS regression where I impose that the sum of all coefficients on the dummies is equal to 1.
One option is to type:
constraint define 1 dummy_1+dummy_2 +...+dummy_200=1
cnsreg y x_1 x_2 dummy_1-dummy_200, c(1)
...but typing the constraint out would obviously be very painful.
Is there a way to quickly define such a large constraint? The matrix form would be very quick and straightforward, but after much reading online and in the Stata guides, it is not clear to me how to specify constraints in matrix form, or whether that is even possible.
There are at least two sides to this, how to do it and whether it will work in any statistical sense.
How to do it seems easier than you fear as the difficult bit is just inserting "+" signs between the variable names, and that's string manipulation. Something like
unab myvars : dummy_*
local myvars : subinstr local myvars " " "+", all
mac li
constraint 1 `myvars' = 1
should get you started. The macro list is so you can see what you did, especially if it is not what you want.
Whether it will work for you statistically is outside the scope of this forum, but if that's the only constraint note that it's consistent with all kinds of negative and positive coefficients. Perhaps there are special features of your problem that make it a natural constraint, but my intuition is that such a model will be hard to estimate.
I would take a completely different approach. Such constraints typically occur when trying out a different coding scheme for a set of indicator variables. If that is the case, then I would use Stata's factor variables, combined with margins and its contrast operators.

How does a finite state machine perform division?

I am taking a course on models of computation and currently we are doing finite state machines. One of my tasks is to draw an FSM that performs division by 3; to simplify the model, the machine only accepts numbers that are multiples of 3. I am not sure how exactly this works, especially since I imagine an FSM putting out only single binary values. Could you give examples (division by 2 or 4) or hints on how to approach this?
This is what you need, I think (sorry about the bad picture). The 'E' represents epsilon/lambda/no output. The edge labels denote 'input/output': for each symbol read there is also a corresponding output, which may be lambda (no output).
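To make that concrete, here is a small simulation of such a transducer in Python (states are the remainders modulo 3 and the number is read most significant bit first; it is only an illustration, not the exact machine in the picture):
def divide_by_3(bits):
    """Simulate a 3-state Mealy machine: state = remainder so far, output = quotient bits."""
    state, out = 0, []
    for b in bits:                 # bits of the dividend, most significant first
        value = 2 * state + b      # shift in the next input bit
        out.append(value // 3)     # output bit of the quotient (0 or 1)
        state = value % 3          # next state = the new remainder
    return out, state              # state == 0 iff the input was a multiple of 3

print(divide_by_3([1, 0, 0, 1]))   # 9 / 3: ([0, 0, 1, 1], 0), i.e. quotient 0011 = 3
Each of the three states only needs a transition for input 0 and one for input 1, so the whole machine has six edges.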

Algorithms to represent a set of integers with only one integer

This may not be a programming question, but it's a problem that arose recently at work. Some background: a big C development with a special interest in performance.
I have a set of integers and want to test whether another given integer is a member of it. I would love to implement an algorithm that can check this with a minimal set of algebraic functions, using only a single integer to represent the whole set.
I tried a composite Cantor pairing function, for instance, but with a 30-element set it becomes too complicated, and with performance in mind it makes no sense. I played with some operations, like XORing and negating, but they give me poor membership estimates. Then I tried successions of additions and finally got lost.
Any ideas?
For sets of unsigned long of size 30, the following is one fairly obvious way to do it:
store each set as a sorted array, 30 * sizeof(unsigned long) bytes per set.
to look up an integer, do a few steps of a binary search, followed by a linear search over what remains (sketched below). Profile to figure out how many steps of binary search work best - my wild guess is 2 steps, but you may find otherwise; and of course if you test bsearch and it is fast enough, you can just use it.
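A quick sketch of that lookup, in Python for brevity (the real thing would of course be C, and the cut-over after 2 halving steps is just the guess above, not a measured value):
def member(sorted_vals, x, binary_steps=2):
    """A few halving steps, then a linear scan over the remaining slice."""
    lo, hi = 0, len(sorted_vals)
    for _ in range(binary_steps):           # 30 elements shrink to roughly 8 after 2 steps
        mid = (lo + hi) // 2
        if sorted_vals[mid] <= x:
            lo = mid
        else:
            hi = mid
    return x in sorted_vals[lo:hi]          # linear scan of the small remainder

vals = sorted([3, 7, 11, 19, 23, 42, 57, 64])
print(member(vals, 23), member(vals, 24))   # True False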
So the next question is why you want a big-maths solution, which will tell me what's wrong with this solution other than "it is insufficiently pleasing".
I suspect that any big-math solution will be slower than this. A single arithmetic operation on an N-digit number takes at least linear time in N. A single number to represent a set can't be very much smaller than the elements of the set laid end to end with a separator in between. So even a linear search in the set is about as fast as a single arithmetic operation on a big number. With the possible exception of a Goedel representation, which could do it in one division once you've found the nth prime number, any clever mathematical representation of sets is going to take multiple arithmetic operations to establish membership.
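To illustrate that Goedel-style encoding (purely a toy; the single integer grows very quickly with the size of the universe):
# Encode a set of small non-negative integers as a product of distinct primes;
# membership of n is then a single divisibility test on one (potentially huge) integer.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # enough for elements 0..9 in this toy

def encode(s):
    code = 1
    for n in s:
        code *= PRIMES[n]
    return code

def contains(code, n):
    return code % PRIMES[n] == 0

code = encode({1, 4, 7})                     # 3 * 11 * 19 = 627
print(contains(code, 4), contains(code, 5))  # True False
Even here the membership test costs a division on a number whose size grows with the set, which is the point made above.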
Note also that there are two different reasons you might care about the performance of "look up an integer in a set":
You are looking up lots of different integers in a single set, in which case you might be able to go faster by constructing a custom lookup function for that data. Of course in C that means you need either (a) a simple virtual machine to execute that "function", or (b) runtime code generation, or (c) to know the set at compile time. None of which is necessarily easy.
You are looking up the same integer in lots of different sets (to get a sequence of all the sets it belongs to), in which case you might benefit from a combined representation of all the sets you care about, rather than considering each set separately.
I suppose that, very occasionally, you might be looking up lots of different integers, each in a different set, so that neither of these reasons applies. If that is your case, you can ignore the above.
One good start is to try Bloom Filters.
Basically, it's a probabilistic data structure that gives you no false negatives but some false positives. So when an integer matches a Bloom filter, you still have to check whether it really is in the set, but it can be a big speedup because it greatly reduces the number of sets you have to check.
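A toy Bloom filter in Python, just to show the mechanics (the hash construction, bit-array size and number of hash functions are arbitrary here, not tuned values):
import hashlib

class BloomFilter:
    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = 0                          # a Python int doubles as the bit array

    def _positions(self, item):
        for i in range(self.num_hashes):       # derive k bit positions from a salted hash
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):             # False => definitely absent; True => maybe present
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
for x in (3, 7, 11, 42):
    bf.add(x)
print(bf.might_contain(7), bf.might_contain(8))   # True, and almost certainly False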
If I understood you correctly, a Python example:
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
>>> len_a = len(a)
>>> b = [1]
>>> if len(set(a) - set(b)) < len_a:
...     print('this integer exists in set')
...
this integer exists in set
math base: http://en.wikipedia.org/wiki/Euler_diagram
