Conditioned Slicing in Frama-C - slice

My last question (Understanding Frama-C slicer results) was on a precise example, but as I said, my goal is to know if it is possible to do some conditioned slicing (forward and backward) with Frama-C. Is it possible?
More precisely, I can't obtain a precise slice of this program:
/*@ requires a >= b;
  @ assigns \nothing;
  @ ensures \result == a;
  @*/
int example4_instr1(int a, int b){
int max = a;
if(a < b)
max = b;
return max;
}
Is it possible, by using good parameters/options, to get what I want in this case/in the general case?

As Pascal mentioned in his answer to your previous question, Frama-C's backward and forward slicing are based on the results of an analysis called Value Analysis. This analysis is non-relational; this means that it only keeps information about the numeric range of variables, but not about e.g. the difference between two variables. Thus, it is not able to keep track of your inequality a >= b. This explains why both branches of the test if (a < b) appear to be followed.
Without more information from either the user (but, in this example, nothing that you could write will help the Value Analysis), or another analysis, the backward slicing must consider that the if may or may not be taken. This unfortunately results in a program from which nothing has been sliced away.
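For reference, the slice one would hope to get from a conditioned slicer can be written by hand. The sketch below is not Frama-C output: the name example4_sliced is hypothetical, and the assert merely stands in for the ACSL precondition.

```cpp
#include <cassert>

// Original function from the question, unchanged.
int example4_instr1(int a, int b) {
    int max = a;
    if (a < b)
        max = b;
    return max;
}

// Hand-written conditioned slice (hypothetical name): under the
// precondition a >= b the branch can never be taken, so only
// "return a" remains.
int example4_sliced(int a, int b) {
    assert(a >= b);  // stands in for "requires a >= b"
    return a;
}
```

Both functions agree on every input that satisfies the precondition, which is what a relational analysis would need to prove in order to drop the if.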


how post and pre increment works with multiplication operator? [duplicate]

What are "sequence points"?
What is the relation between undefined behaviour and sequence points?
I often use funny and convoluted expressions like a[++i] = i;, to make myself feel better. Why should I stop using them?
If you've read this, be sure to visit the follow-up question Undefined behavior and sequence points reloaded.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)
C++98 and C++03
This answer is for the older versions of the C++ standard. The C++11 and C++14 versions of the standard do not formally contain 'sequence points'; operations are 'sequenced before' or 'unsequenced' or 'indeterminately sequenced' instead. The net effect is essentially the same, but the terminology is different.
Disclaimer : Okay. This answer is a bit long. So have patience while reading it. If you already know these things, reading them again won't make you crazy.
Pre-requisites : An elementary knowledge of C++ Standard
What are Sequence Points?
The Standard says
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)
Side effects? What are side effects?
Evaluation of an expression produces something and if in addition there is a change in the state of the execution environment it is said that the expression (its evaluation) has some side effect(s).
For example:
int x = y++; //where y is also an int
In addition to the initialization operation the value of y gets changed due to the side effect of ++ operator.
So far so good. Moving on to sequence points. An alternative definition of seq-points given by the comp.lang.c author Steve Summit:
Sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete.
What are the common sequence points listed in the C++ Standard?
Those are:
at the end of the evaluation of full expression (§1.9/16) (A full-expression is an expression that is not a subexpression of another expression.)1
Example :
int a = 5; // ; is a sequence point here
in the evaluation of each of the following expressions after the evaluation of the first expression (§1.9/18) 2
a && b (§5.14)
a || b (§5.15)
a ? b : c (§5.16)
a , b (§5.18) (here a , b is a comma operator; in func(a,a++) , is not a comma operator, it's merely a separator between the arguments a and a++. Thus the behaviour is undefined in that case (if a is considered to be a primitive type))
at a function call (whether or not the function is inline), after the evaluation of all function arguments (if any) which
takes place before execution of any expressions or statements in the function body (§1.9/17).
1 : Note : the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument
2 : The operators indicated are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.
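As a small illustration of the sequence points listed above, here is a sketch that relies on the ones after the left operand of built-in && and of the built-in comma operator. The function names are made up for this example.

```cpp
#include <cassert>

// Built-in && has a sequence point after its left operand, so the
// pointer is tested before it is dereferenced.
bool is_positive(const int* p) {
    return p != 0 && *p > 0;
}

// The built-in comma operator has a sequence point after its left
// operand, so the increment is complete before i is read.
int increment_then_read(int& i) {
    return (++i, i);
}

void sequence_point_demo() {
    int value = 3;
    assert(is_positive(&value));
    assert(!is_positive(0));  // null pointer: *p is never evaluated
    assert(increment_then_read(value) == 4);
}
```

If either operator were overloaded, footnote 2 would apply and these guarantees would no longer hold under the C++03 rules.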
What is Undefined Behaviour?
The Standard defines Undefined Behaviour in Section §1.3.12 as
behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements 3.
Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior.
3 : permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
In short, undefined behaviour means anything can happen from daemons flying out of your nose to your girlfriend getting pregnant.
What is the relation between Undefined Behaviour and Sequence Points?
Before I get into that you must know the difference(s) between Undefined Behaviour, Unspecified Behaviour and Implementation Defined Behaviour.
You must also know that the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
For example:
int x = 5, y = 6;
int z = x++ + y++; //it is unspecified whether x++ or y++ will be evaluated first.
Another example here.
Now the Standard in §5/4 says
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
What does it mean?
Informally it means that between two sequence points a variable must not be modified more than once.
In an expression statement, the next sequence point is usually at the terminating semicolon, and the previous sequence point is at the end of the previous statement. An expression may also contain intermediate sequence points.
From the above sentence the following expressions invoke Undefined Behaviour:
i++ * ++i; // UB, i is modified more than once btw two SPs
i = ++i; // UB, same as above
++i = 2; // UB, same as above
i = ++i + 1; // UB, same as above
++++++i; // UB, parsed as (++(++(++i)))
i = (i, ++i, ++i); // UB, there's no SP between `++i` (right most) and assignment to `i` (`i` is modified more than once btw two SPs)
But the following expressions are fine:
i = (i, ++i, 1) + 1; // well defined (AFAIK)
i = (++i, i++, i); // well defined
int j = i;
j = (++i, i++, j*i); // well defined
Furthermore, the prior value shall be accessed only to determine the value to be stored.
What does it mean? It means if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.
For example in i = i + 1 all the access of i (in L.H.S and in R.H.S) are directly involved in computation of the value to be written. So it is fine.
This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.
Example 1:
std::printf("%d %d", i,++i); // invokes Undefined Behaviour because of Rule no 2
Example 2:
a[i] = i++ // or a[++i] = i or a[i++] = ++i etc
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. So the behaviour is undefined.
Example 3 :
int x = i + i++ ;// Similar to above
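One way to stay clear of both rules is to give every modification its own full-expression. The sketch below shows well-defined rewrites of Examples 2 and 3. The helper names are hypothetical, and the chosen order (read first, increment afterwards) is only one of the meanings the original expressions might have been intended to have.

```cpp
#include <cassert>

// Well-defined replacement for "a[i] = i++": store first, then
// increment in a separate statement.
void store_then_increment(int* a, int& i) {
    a[i] = i;
    ++i;
}

// Well-defined replacement for "int x = i + i++": compute the sum
// from the old value, then increment.
int double_then_increment(int& i) {
    int x = i + i;
    ++i;
    return x;
}

void rewrite_demo() {
    int a[4] = {0, 0, 0, 0};
    int i = 2;
    store_then_increment(a, i);
    assert(a[2] == 2 && i == 3);
    assert(double_then_increment(i) == 6 && i == 4);
}
```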
Follow up answer for C++11 here.
This is a follow-up to my previous answer and contains C++11-related material.
Pre-requisites : An elementary knowledge of Relations (Mathematics).
Is it true that there are no Sequence Points in C++11?
Yes! This is very true.
Sequence Points have been replaced by Sequenced Before and Sequenced After (and Unsequenced and Indeterminately Sequenced) relations in C++11.
What exactly is this 'Sequenced before' thing?
Sequenced Before(§1.9/13) is a relation which is:
Asymmetric
Transitive
between evaluations executed by a single thread and induces a strict partial order1
Formally it means given any two evaluations(See below) A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced 2.
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which3.
[NOTES]
1 : A strict partial order is a binary relation "<" over a set P which is asymmetric, and transitive, i.e., for all a, b, and c in P, we have that:
........(i). if a < b then ¬ (b < a) (asymmetry);
........(ii). if a < b and b < c then a < c (transitivity).
2 : The execution of unsequenced evaluations can overlap.
3 : Indeterminately sequenced evaluations cannot overlap, but either could be executed first.
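To make "indeterminately sequenced" concrete, here is a minimal sketch with two function calls as operands of +. The calls cannot overlap, but the standard does not say which one runs first, so the code below only relies on both having run. All names are invented for this example.

```cpp
#include <cassert>
#include <vector>

// Shared log that records the order in which the calls happened.
std::vector<char>& call_log() {
    static std::vector<char> log;
    return log;
}

int first()  { call_log().push_back('a'); return 1; }
int second() { call_log().push_back('b'); return 2; }

// The two calls are indeterminately sequenced: either may run first,
// but they never interleave, so this is unspecified, not undefined.
int sum_both() {
    return first() + second();
}

void indeterminate_demo() {
    call_log().clear();
    assert(sum_both() == 3);          // the sum does not depend on the order
    assert(call_log().size() == 2);   // both calls ran exactly once
}
```

A program whose result depends on the contents of the log, rather than just its size, would be relying on unspecified behaviour.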
What is the meaning of the word 'evaluation' in context of C++11?
In C++11, evaluation of an expression (or a sub-expression) in general includes:
value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and
initiation of side effects.
Now (§1.9/14) says:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
Trivial example:
int x;
x = 10;
++x;
Value computation and side effect associated with ++x is sequenced after the value computation and side effect of x = 10;
So there must be some relation between Undefined Behaviour and the above-mentioned things, right?
Yes! Right.
In (§1.9/15) it has been mentioned that
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced4.
For example :
int main()
{
int num = 19 ;
num = (num << 3) + (num >> 3);
}
Evaluation of operands of + operator are unsequenced relative to each other.
Evaluation of operands of << and >> operators are unsequenced relative to each other.
4: In an expression that is evaluated more than once during the execution
of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations.
(§1.9/15)
The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator.
That means in x + y the value computation of x and y are sequenced before the value computation of (x + y).
More importantly
(§1.9/15) If a side effect on a scalar object is unsequenced relative to either
(a) another side effect on the same scalar object
or
(b) a value computation using the value of the same scalar object.
the behaviour is undefined.
Examples:
int i = 5, v[10] = { };
void f(int, int);
i = i++ * ++i; // Undefined Behaviour
i = ++i + i++; // Undefined Behaviour
i = ++i + ++i; // Undefined Behaviour
i = v[i++]; // Undefined Behaviour
i = v[++i]; // Well-defined Behavior
i = i++ + 1; // Undefined Behaviour
i = ++i + 1; // Well-defined Behaviour
++++i; // Well-defined Behaviour
f(i = -1, i = -1); // Undefined Behaviour (see below)
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [Note: Value computations and side effects associated with different argument expressions are unsequenced. — end note]
Expressions (5), (7) and (8) do not invoke undefined behaviour. Check out the following answers for a more detailed explanation.
Multiple preincrement operations on a variable in C++0x
Unsequenced Value Computations
Final Note :
If you find any flaw in the post please leave a comment. Power-users (With rep >20000) please do not hesitate to edit the post for correcting typos and other mistakes.
C++17 (N4659) includes a proposal Refining Expression Evaluation Order for Idiomatic C++
which defines a stricter order of expression evaluation.
In particular, the following sentence
8.18 Assignment and compound assignment operators:....
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand.
together with the following clarification
An expression X is said to be sequenced before an expression Y if every
value computation and every side effect associated with the expression X is sequenced before every value
computation and every side effect associated with the expression Y.
make several cases of previously undefined behavior valid, including the one in question:
a[++i] = i;
However several other similar cases still lead to undefined behavior.
In N4140:
i = i++ + 1; // the behavior is undefined
But in N4659
i = i++ + 1; // the value of i is incremented
i = i++ + i; // the behavior is undefined
Of course, using a C++17 compliant compiler does not necessarily mean that one should start writing such expressions.
I am guessing there is a fundamental reason for the change, and that it isn't merely cosmetic to make the old interpretation clearer: that reason is concurrency. An unspecified order of evaluation merely selects one of several possible serial orderings. That is quite different from the "sequenced before" and "sequenced after" orderings, because where no ordering is specified, concurrent evaluation becomes possible, which was not so with the old rules. For example in:
f (a,b)
previously either a was evaluated and then b, or b and then a. Now, a and b can be evaluated with instructions interleaved or even on different cores.
In C99 (ISO/IEC 9899:TC3), which seems absent from this discussion thus far, the following statements are made regarding order of evaluation.
[...]the order of evaluation of subexpressions and the order in which
side effects take place are both unspecified. (Section 6.5 pp 67)
The order of evaluation of the operands is unspecified. If an attempt
is made to modify the result of an assignment operator or to access it
after the next sequence point, the behavior[sic] is undefined.(Section
6.5.16 pp 91)

What is the purpose of this method?

in an interview question, I got asked the following:
What is the purpose of the below method, and how can we rewrite it?
public int question_1(int a, int b)
{
while (a > b)
{
a -= b;
}
return a;
}
At first I thought it was equivalent to a%b, but it is not, since the condition is "while (a > b)" and not "while (a >= b)".
Thanks
Honestly, it's impossible to know the purpose of a method just by reading its implementation, even if we assume that it's bug-free.
But, we can start by documenting its behaviors:
If b is positive:
If a is positive, the method returns the least positive integer that is congruent to a modulo b. (For example, given 15 and 10, it will return 5; given 30 and 10, it will return 10.)
Otherwise, the method returns a.
If b is zero:
If a is positive, the method loops forever.
Otherwise, the method returns a.
If b is negative:
If a ≤ b, the method returns a.
Otherwise, the method's behavior depends on the language, since it will increase a until it's no longer greater than b. If the language defines integer arithmetic using "wraparound" rules, then the method will loop for a long time, then finally return a very negative number (unless b is itself very negative, in which case, depending on the value of a, the function might loop forever).
and given these, we can infer that the behaviors with zero and negative numbers are bizarre enough that the method is probably actually only intended to be used with positive numbers. So its behavior can be summarized as:
If a and b are both positive, then the method returns the least positive integer that is congruent to a modulo b.
If the above inference is correct, then the method can be rewritten as:
public int question_1(int a, int b) {
if (a <= 0 || b <= 0)
throw new IllegalArgumentException();
return (a - 1) % b + 1;
}
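The claimed equivalence is easy to check mechanically. Below is a C++ transcription of both versions (the thread's code is Java, and the function names here are hypothetical), restricted to the positive inputs the rewrite is meant for.

```cpp
#include <cassert>

// The loop from the question, transcribed unchanged.
int question_1_loop(int a, int b) {
    while (a > b) {
        a -= b;
    }
    return a;
}

// Closed form: least positive integer congruent to a modulo b.
// Only meaningful for positive a and b, hence the assert.
int question_1_closed(int a, int b) {
    assert(a > 0 && b > 0);
    return (a - 1) % b + 1;
}
```

For example, both return 5 for (15, 10) and 10 for (30, 10), whereas 30 % 10 would be 0.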
I would guess that its purpose is to compute a%b for positive ints, and that it has a bug.
If I saw this in production, I would have to check the uses of this function to see if question_1(n,n) == n is really correct. If so, I'd add a comment indicating why that is so. Otherwise I'd fix it.
In either case, it could be rewritten to use the % operator instead of a loop. If it's correct, it could be rewritten like this:
public int question_1(int a, int b)
{
if (a>b)
{
a = ((a-1)%b) + 1;
}
return a;
}
This is not the same in its handling of negative numbers, though, so again you'd have to check to make sure that's OK.
The reason I provide this answer when @ruakh has already provided such a carefully considered answer is that this is an interview question, so it's best if you take the opportunity to show how you would approach a problem like this on the job.
You don't really want to give the impression that you would spend a long time and a lot of effort thinking carefully about such a simple problem -- if you have to spend that much effort to solve a simple problem, imagine what you would spend on a big one!
At the same time, you want to demonstrate that you recognize the possible bug, and take the initiative to fix it or to spare future engineers the same task.

abstract inplace mergesort for effective merge sort

I am reading about merge sort in Algorithms in C++ by Robert Sedgewick and have following questions.
static void mergeAB(ITEM[] c, int cl, ITEM[] a, int al, int ar, ITEM[] b, int bl, int br )
{
int i = al, j = bl;
for (int k = cl; k < cl+ar-al+br-bl+1; k++)
{
if (i > ar) { c[k] = b[j++]; continue; }
if (j > br) { c[k] = a[i++]; continue; }
c[k] = less(a[i], b[j]) ? a[i++] : b[j++];
}
}
The characteristic of the basic merge that is worthy of note is that
the inner loop includes two tests to determine whether the ends of the
two input arrays have been reached. Of course, these two tests usually
fail, and the situation thus cries out for the use of sentinel keys to
allow the tests to be removed. That is, if elements with a key value
larger than those of all the other keys are added to the ends of the a
and aux arrays, the tests can be removed, because when the a (b) array
is exhausted, the sentinel causes the next elements for the c array to
be taken from the b (a) array until the merge is complete.
However, it is not always easy to use sentinels, either because it
might not be easy to know the largest key value or because space might
not be available conveniently.
For merging, there is a simple remedy. The method is based on the
following idea: Given that we are resigned to copying the arrays to
implement the in-place abstraction, we simply put the second array in
reverse order when it is copied (at no extra cost), so that its
associated index moves from right to left. This arrangement leads to
the largest element—in whichever array it is—serving as sentinel for
the other array.
My questions on above text
What does the statement "when the a (b) array is exhausted" mean? What is 'a (b)' here?
Why does the author mention that it is not easy to determine the largest key, and how is space related to determining the largest key?
What does the author mean by "Given that we are resigned to copying the arrays"? What does "resigned" mean in this context?
Could someone give a simple example to help me understand the idea described as a simple remedy?
"When the a (b) array is exhausted" is a shorthand for "When either the a array or the b array is exhausted".
The interface is dealing with sub-arrays of a bigger array, so you can't simply go writing beyond the ends of the arrays.
The code copies the data from two arrays into one other array. Since this copy is inevitable, we are 'resigned to copying the arrays' means we reluctantly accept that it is inevitable that the arrays must be copied.
Tricky...that's going to take some time to work out what is meant.
Tangentially: That's probably not the way I'd write the loop. I'd be inclined to use:
int i = al, j = bl, k = cl;   // k is declared here because the trailing loops use it too
for ( ; i <= ar && j <= br; k++)
{
if (a[i] < b[j])
c[k] = a[i++];
else
c[k] = b[j++];
}
while (i <= ar)
c[k++] = a[i++];
while (j <= br)
c[k++] = b[j++];
One of the two trailing loops does nothing. The revised main merge loop has 3 tests per iteration versus 4 tests per iteration for the original algorithm. I've not formally measured it, but the simpler merge loop is likely to be quicker than the original single-loop algorithm.
The first three questions are almost best suited for English Language Learners.
a(b) and b(a)
Sometimes parentheses are used to express two similar phrases at once:
when a (b) is exhausted we copy elements from b (a)
means:
when a is exhausted we copy elements from b,
when b is exhausted we copy elements from a
What is difficult about sentinels
Two annoying things about sentinels are
sometimes your array data may potentially contain every possible value, so there is no value you can use as sentinel that is guaranteed to be bigger than all the values in the array
to use a sentinel instead of checking the index to see if you are done with an array requires that you have room for one extra space in the array to store the sentinel
Resigning
We programmers are never happy to copy (or move) things around and leaving them where they already are is, if possible, better (because we are lazy).
In this version of the merge sort we have already given up on trying not to copy things around... we are resigned to it.
Given that we must copy, we can copy things in the opposite order if we like (and of course use the copy in opposite order) because that is free(*).
(*) is free at this level of abstraction, the cost on some real CPU may be high. As almost always in the performance area YMMV.
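A simple example of the "simple remedy" may help. The sketch below merges two adjacent sorted runs a[l..m] and a[m+1..r] by copying them into a scratch array with the second run reversed, so that the largest element of either run stops the scan of the other. It is an illustration of the idea, not the book's code: the function name is made up, it works on int instead of ITEM, and it reverses with std::reverse after copying rather than reversing during the copy.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Merge the sorted runs a[l..m] and a[m+1..r] back into a[l..r].
void merge_with_reversed_half(std::vector<int>& a, int l, int m, int r) {
    assert(0 <= l && l <= m && m <= r && r < static_cast<int>(a.size()));

    // Copy both runs, then reverse the second one: aux now rises and
    // then falls, with the overall maximum somewhere in the middle.
    std::vector<int> aux(a.begin() + l, a.begin() + r + 1);
    std::reverse(aux.begin() + (m - l + 1), aux.end());

    int i = 0;                                 // scans the first run, left to right
    int j = static_cast<int>(aux.size()) - 1;  // scans the second run, right to left
    for (int k = l; k <= r; ++k) {
        // No end-of-run tests: when one run is used up, its index has
        // moved onto the other run's largest element, which acts as
        // the sentinel.
        a[k] = (aux[j] < aux[i]) ? aux[j--] : aux[i++];
    }
}
```

Merging {1, 4, 7} with {2, 3, 9} stored as {1, 4, 7, 2, 3, 9} (l = 0, m = 2, r = 5) gives the scratch array {1, 4, 7, 9, 3, 2} and the result {1, 2, 3, 4, 7, 9}. Note that this variant of the merge is not stable.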

multiple arithmetic expressions in processing

Ok, so I'm still getting used to the basics of Processing, and I am unsure if this is the correct way to do multiple arithmetic expressions with the same data. Should I be typing each as its own code, or doing it like this?
here is the question;
Write the statements which perform the following arithmetic operations (note: the variable names can be changed). (i) a=50 b=60
c=43 result1 = a+b+c result2=a*b result3 = a/b
here is my code;
short a = 50;
short b = 60;
short c = 43;
int sum = a+b+c; // Subsection i
print (sum);
int sum2 = a*b; // Subsection ii
print (sum2);
int sum3 =a/b; // Subsection iii
print (sum3);
Using the same variable for a in all three expressions, like you're doing, is the right way. This means that if you wanted to change a, b, or c you'd only have to change it in one place.
You didn't mention what language, but there are a couple problems. It's hard to say what your knowledge level is, so I apologize in advance if this is beyond the scope of the assignment.
First, your variables are defined as short but they end up being assigned to int variables. That's implicit typecasting. Granted, short is basically a subset of int in most languages, but you should be aware that you're doing it and implicit typecasting can cause problems. It's slightly bad practice.
Second, your variable names are all called sumX but only one is a sum. That's definitely bad practice. Variable names should be meaningful and represent what they actually are.
Third, your division is dividing two integers and storing the result into an integer. This means that if you're using a strongly typed language you will be truncating the fractional portion of the quotient. You will get 0 as your output: 50 / 60 = 0.8333[...] which when converted to an integer truncates to 0. You may wish to consider using double or float as your data types if your answer is supposed to be accurate.
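The truncation in the third point can be seen in a small sketch. It is written in C++ rather than Processing, on the assumption that integer division behaves the same way for these positive operands; the function names are hypothetical.

```cpp
#include <cassert>

// Integer division: the fractional part of the quotient is discarded.
int truncated_quotient(short a, short b) {
    assert(b != 0);
    return a / b;
}

// Converting one operand to double first keeps the fractional part.
double fractional_quotient(short a, short b) {
    assert(b != 0);
    return static_cast<double>(a) / b;
}
```

With a = 50 and b = 60, the first function returns 0 and the second returns roughly 0.8333.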

Is there any clever way to determine whether a point is in a rectangle?

I want to calculate whether a point, (x,y), is inside a rectangle which is determined by two points, (a,b) and (c,d).
If a<=c and b<=d, then it is simple:
a<=x&&x<=c&&b<=y&&y<=d
However, since it is unknown whether a<=c or b<=d, the code should be
(a<=x&&x<=c||c<=x&&x<=a)&&(b<=y&&y<=d||d<=y&&y<=b)
This code may work, but it is too long. I can write a function and use it, but I wonder if there's shorter way (and should be executed very fast - the code is called a lot) to write it.
One I can imagine is:
((c-x)*(x-a)>=0)&&((d-y)*(y-b)>=0)
Is there more clever way to do this?
(And, is there any good way to iterate from a from c?)
Swap the variables as needed so that a = xmin and b = ymin:
if a > c: swap(a,c)
if b > d: swap(b,d)
a <= x <= c and b <= y <= d
Shorter but slightly less efficient:
min(a,c) <= x <= max(a,c) and min(b,d) <= y <= max(b,d)
As always when optimizing you should profile the different options and compare hard numbers. Pipelining, instruction reordering, branch prediction, and other modern day compiler/processor optimization techniques make it non-obvious whether programmer-level micro-optimizations are worthwhile. For instance it used to be significantly more expensive to do a multiply than a branch, but this is no longer always the case.
I like this:
((c-x)*(x-a)>=0)&&((d-y)*(y-b)>=0)
but with more whitespace and more symmetry:
(c-x)*(a-x) <= 0 && (d-y)*(b-y) <= 0
It's mathematically elegant, and probably the fastest too. You will need to measure to really determine which is the fastest. With modern pipelined processors, I would expect that straight-line code with the minimum number of operators will run fastest.
While sorting the (a, c) and (b, d) pairs as suggested in the accepted answer is probably the best solution in this case, an even better application of this method would probably be to elevate the a <= c and b <= d requirement to the level of a program-wide invariant, i.e. require that all rectangles in your program are created and maintained in this "normalized" form from the very beginning. Thus, inside your point-in-rectangle test function you should simply assert that a <= c and b <= d instead of wasting CPU resources on actually sorting them in every call.
Define intermediary variables i = min(a,c), j = min(b,d), k = max(a,c), l = max(b,d)
Then you only need i<=x && x<=k && j<=y && y<=l.
EDIT: Mind you, efficiency-wise it's probably better to use your "too long" code in a function.
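Putting the answers together, one possible shape for this is to normalize the corners once when the rectangle is created and to keep the containment test itself branch-light. This is only a sketch: the Rect type and both function names are assumptions introduced here, and int coordinates are used for simplicity.

```cpp
#include <algorithm>
#include <cassert>

// Rectangle stored in normalized form: x_min <= x_max, y_min <= y_max.
struct Rect {
    int x_min, y_min, x_max, y_max;
};

// Build a normalized rectangle from two arbitrary corner points
// (a, b) and (c, d); the sorting cost is paid once, here.
Rect make_rect(int a, int b, int c, int d) {
    return Rect{std::min(a, c), std::min(b, d), std::max(a, c), std::max(b, d)};
}

// Containment test (borders included) that only asserts the invariant
// instead of re-sorting the corners on every call.
bool contains(const Rect& r, int x, int y) {
    assert(r.x_min <= r.x_max && r.y_min <= r.y_max);
    return r.x_min <= x && x <= r.x_max && r.y_min <= y && y <= r.y_max;
}
```

Whether this is actually faster than the product form is something only a measurement on the target machine can settle.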
