/*undefined sequence*/ in sliced code from Frama-C

I am trying to slice code using Frama-C.
The source code is
static uint8_T ALARM_checkOverInfusionFlowRate(void)
{
  uint8_T ov;
  ov = 0U;
  if (ALARM_Functional_B.In_Therapy) {
    if (ALARM_Functional_B.Flow_Rate > ALARM_Functional_B.Flow_Rate_High) {
      ov = 1U;
    } else if (ALARM_Functional_B.Flow_Rate >
               ALARM_Functional_B.Commanded_Flow_Rate * div_s32
               (ALARM_Functional_B.Tolerance_Max, 100) +
               ALARM_Functional_B.Commanded_Flow_Rate) {
      ov = 1U;
    } else {
      if (ALARM_Functional_B.Flow_Rate > ALARM_Functional_B.Commanded_Flow_Rate * div_s32(ALARM_Functional_B.Tolerance_Min, 100) + ALARM_Functional_B.Commanded_Flow_Rate) {
        ov = 2U;
      }
    }
  }
  return ov;
}
When I sliced the code using Frama-C, I got the following. I don't know what this “undefined sequence” means.
static uint8_T ALARM_checkOverInfusionFlowRate(void)
{
  uint8_T ov;
  ov = 0U;
  if (ALARM_Functional_B.In_Therapy)
    if ((int)ALARM_Functional_B.Flow_Rate > (int)ALARM_Functional_B.Flow_Rate_High)
      ov = 1U;
    else {
      int32_T tmp_0;
      {
        /*undefined sequence*/
        tmp_0 = div_s32((int)ALARM_Functional_B.Tolerance_Max,100);
      }
      if ((int)ALARM_Functional_B.Flow_Rate > (int)ALARM_Functional_B.Commanded_Flow_Rate * tmp_0 + (int)ALARM_Functional_B.Commanded_Flow_Rate)
        ov = 1U;
      else {
        int32_T tmp;
        {
          /*undefined sequence*/
          tmp = div_s32((int)ALARM_Functional_B.Tolerance_Min,100);
        }
        if ((int)ALARM_Functional_B.Flow_Rate > (int)ALARM_Functional_B.Commanded_Flow_Rate * tmp + (int)ALARM_Functional_B.Commanded_Flow_Rate)
          ov = 2U;
      }
    }
  return ov;
}
Appreciate any help in explaining why this happens.

/* undefined sequence */ in a block simply means that the block has been generated during code normalization at parsing time, but that, with respect to C semantics, there is no sequence point between the statements composing it. For instance, x++ + x++ will be normalized as
{
  /*undefined sequence*/
  tmp = x;
  x ++;
  tmp_0 = x;
  x ++;
  ;
}
Internally, each statement in such a sequence is decorated with the lists of locations that are accessed for writing or reading (use -kernel-debug 1 with -print to see them in the output). The option -unspecified-access, used together with -val, will check that such accesses are correct, i.e. that there is at most one statement inside the sequence that writes to a given location and, if this is the case, that there is no read access to it (except for building the value it is assigned to). In addition, this option does not take care of side effects occurring in a function call inside the sequence. There is a special plug-in for that, but it has not been released yet.
Finally note that since Frama-C Neon, the comment reads only /*sequence*/, which seems to be less daunting for the user. Indeed, the original code may be correct or may show undefined behavior, but syntactic analysis is too weak to decide in the general case. For instance, (*p)++ + (*q)++ is correct as long as p and q do not overlap. This is why the normalization phase only points out the sequences and leaves it up to more powerful analysis plug-ins to check whether there might be an issue.
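To make this concrete, here is a minimal example (the file and variable names are mine) whose only interesting statement contains two side effects with no sequence point between them; pretty-printing it with -print shows the normalized sequence block, and running the value analysis with -val -unspecified-access, as described above, checks the accesses:
/* seq.c (hypothetical name): the expression below writes x twice with no
   sequence point between the writes, so Frama-C's parser normalizes it into
   a block marked "undefined sequence" (just "sequence" since Neon). */
int x = 0;

int main(void) {
  int y = x++ + x++; /* two unsequenced writes to x */
  return y;
}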

Related

How much can we trust the warnings generated by static analysis tools for vulnerability detection?

I am running flawfinder on a set of libraries written in C/C++. I get a lot of warnings generated by flawfinder. My question is: how much can I rely on these warnings? For example, consider the following function from the numpy library (https://github.com/numpy/numpy/blob/4ada0641ed1a50a2473f8061f4808b4b0d68eff5/numpy/f2py/src/fortranobject.c):
static PyObject *
fortran_doc(FortranDataDef def)
{
char *buf, *p;
PyObject *s = NULL;
Py_ssize_t n, origsize, size = 100;
if (def.doc != NULL) {
size += strlen(def.doc);
}
origsize = size;
buf = p = (char *)PyMem_Malloc(size);
if (buf == NULL) {
return PyErr_NoMemory();
}
if (def.rank == -1) {
if (def.doc) {
n = strlen(def.doc);
if (n > size) {
goto fail;
}
memcpy(p, def.doc, n);
p += n;
size -= n;
}
else {
n = PyOS_snprintf(p, size, "%s - no docs available", def.name);
if (n < 0 || n >= size) {
goto fail;
}
p += n;
size -= n;
}
}
else {
PyArray_Descr *d = PyArray_DescrFromType(def.type);
n = PyOS_snprintf(p, size, "'%c'-", d->type);
Py_DECREF(d);
if (n < 0 || n >= size) {
goto fail;
}
p += n;
size -= n;
if (def.data == NULL) {
n = format_def(p, size, def) == -1;
if (n < 0) {
goto fail;
}
p += n;
size -= n;
}
else if (def.rank > 0) {
n = format_def(p, size, def);
if (n < 0) {
goto fail;
}
p += n;
size -= n;
}
else {
n = strlen("scalar");
if (size < n) {
goto fail;
}
memcpy(p, "scalar", n);
p += n;
size -= n;
}
}
if (size <= 1) {
goto fail;
}
*p++ = '\n';
size--;
/* p now points one beyond the last character of the string in buf */
#if PY_VERSION_HEX >= 0x03000000
s = PyUnicode_FromStringAndSize(buf, p - buf);
#else
s = PyString_FromStringAndSize(buf, p - buf);
#endif
PyMem_Free(buf);
return s;
fail:
fprintf(stderr, "fortranobject.c: fortran_doc: len(p)=%zd>%zd=size:"
" too long docstring required, increase size\n",
p - buf, origsize);
PyMem_Free(buf);
return NULL;
}
There are two memcpy() API calls, and flawfinder tells me that:
['vul_fortranobject.c:216: [2] (buffer) memcpy:\\n Does not check for buffer overflows when copying to destination (CWE-120).\\n Make sure destination can always hold the source data.\\n memcpy(p, "scalar", n);']
I am not sure whether the report is true.
To answer your question: static analysis tools (like FlawFinder) can generate a LOT of "false positives".
I Googled to find some quantifiable information for you, and found an interesting article about "DeFP":
https://arxiv.org/pdf/2110.03296.pdf
Static analysis tools are frequently used to detect potential
vulnerabilities in software systems. However, an inevitable problem of
these tools is their large number of warnings with a high false
positive rate, which consumes time and effort for investigating. In
this paper, we present DeFP, a novel method for ranking static analysis warnings.
Based on the intuition that warnings which have
similar contexts tend to have similar labels (true positive or false
positive), DeFP is built with two BiLSTM models to capture the
patterns associated with the contexts of labeled warnings. After that,
for a set of new warnings, DeFP can calculate and rank them according
to their likelihoods to be true positives (i.e., actual
vulnerabilities).
Our experimental results on a dataset of 10
real-world projects show that using DeFP, by investigating only 60% of
the warnings, developers can find
+90% of actual vulnerabilities. Moreover, DeFP improves the state-of-the-art approach 30% in both Precision and Recall.
Apparently, the authors built a neural network to analyze FlawFinder results, and rank them.
I doubt DeFP is a practical "solution" for you. But yes: if you think that specific "memcpy()" warning is a "false positive" - then I'm inclined to agree. It very well could be :)
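For what it's worth, flawfinder works essentially by lexical matching on function names, so a guard around a call does not change its verdict. A small invented example (both functions below are hypothetical) illustrates why warnings like the one on the "scalar" memcpy often turn out to be false positives:
#include <string.h>

/* Hypothetical illustration: a lexical scanner keys on the name "memcpy",
   so both calls below are typically reported with the same CWE-120 warning,
   even though the second one checks the destination size first, much like
   fortran_doc does before its "scalar" memcpy. */
void copy_unchecked(char *dst, const char *src, size_t n) {
    memcpy(dst, src, n);      /* no check: a plausible real finding */
}

int copy_checked(char *dst, size_t dst_size, const char *src, size_t n) {
    if (n > dst_size)
        return -1;            /* bounds check before copying */
    memcpy(dst, src, n);      /* still flagged: likely a false positive */
    return 0;
}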

Performance of F# Array.reduce

I noticed while doing some F# experiments that if I write my own reduce function for Array, it performs much better than the built-in reduce. For example:
type Array with
    static member inline fastReduce f (values : 'T[]) =
        let mutable result = Unchecked.defaultof<'T>
        for i in 0 .. values.Length-1 do
            result <- f result values.[i]
        result
This seems to behave identically to the built-in Array.reduce but is ~2x faster for a simple f.
Is the built-in one more flexible in some way?
By looking at the generated IL code it's easier to understand what's happening.
Using the built-in Array.reduce:
let reducer (vs : int []) : int = Array.reduce (+) vs
This gives the following equivalent C# (reverse-engineered from the IL code using ILSpy):
public static int reducer(int[] vs)
{
    return ArrayModule.Reduce<int>(new Program.BuiltIn.reducer#31(), vs);
}
Array.reduce looks like this:
public static T Reduce<T>(FSharpFunc<T, FSharpFunc<T, T>> reduction, T[] array)
{
    if (array == null)
    {
        throw new ArgumentNullException("array");
    }
    int num = array.Length;
    if (num == 0)
    {
        throw new ArgumentException(LanguagePrimitives.ErrorStrings.InputArrayEmptyString, "array");
    }
    OptimizedClosures.FSharpFunc<T, T, T> fSharpFunc = OptimizedClosures.FSharpFunc<T, T, T>.Adapt(reduction);
    T t = array[0];
    int num2 = 1;
    int num3 = num - 1;
    if (num3 >= num2)
    {
        do
        {
            t = fSharpFunc.Invoke(t, array[num2]);
            num2++;
        }
        while (num2 != num3 + 1);
    }
    return t;
}
Notice that invoking the reducer function f is a virtual call, which the JIT compiler typically struggles to inline.
Compare to your fastReduce function:
let reducer (vs : int []) : int = Array.fastReduce (+) vs
The reverse-engineered C# code:
public static int reducer(int[] vs)
{
    int num = 0;
    for (int i = 0; i < vs.Length; i++)
    {
        num += vs[i];
    }
    return num;
}
This is a lot more efficient, as the virtual call is now gone. It seems that in this case F# inlines both the code for fastReduce and (+).
There's some kind of cut-off in F#: more complex reducer functions won't be inlined. I am unsure of the exact details.
Hope this helps
A side note: Unchecked.defaultof returns null for class types in .NET, such as string. I prefer LanguagePrimitives.GenericZero.
PS. A common trick for the really performance-hungry is to loop towards 0. In F# that doesn't work for for-expressions because of a slight performance bug in how for-expressions are generated. In those cases you can try to implement the loop using tail recursion.

All of the options for replacing an unknown number of characters

I am trying to find an algorithm that, for a string with an unknown number of characters, produces all of the options for replacing some of the characters with stars.
For example, for the string "abc", the output should be:
*bc
a*c
ab*
**c
*b*
a**
***
It is simple enough with a known number of stars: just run through all of the options with for loops. But I'm having difficulty generating all of the options for an arbitrary length.
Every star combination corresponds to a binary number, so you can use a simple loop:
for i = 1 to 2^n-1
where n is the string length, and set stars at the positions of the 1-bits of the binary representation of i.
For example: i = 5 = 101b => *b*
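For concreteness, here is a minimal sketch of this idea in C (the function name and the fixed-size buffer are my own; it assumes the string has fewer than 32 characters so the shift does not overflow):
#include <stdio.h>
#include <string.h>

/* Minimal sketch of the bit-mask idea: each i in [1, 2^n - 1] encodes one
   combination; a 1-bit at position j means "replace character j with a star".
   Assumes n < 32 so that (1u << n) does not overflow. */
void print_star_combinations(const char *s) {
    size_t n = strlen(s);
    char buf[32];
    for (unsigned i = 1; i < (1u << n); i++) {
        for (size_t j = 0; j < n; j++)
            buf[j] = (i & (1u << j)) ? '*' : s[j];
        buf[n] = '\0';
        printf("%s\n", buf);
    }
}

int main(void) {
    print_star_combinations("abc");   /* *bc, a*c, **c, ab*, *b*, a**, *** */
    return 0;
}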
This is basically a binary increment problem.
You can create a vector of integer variables to represent a binary array isStar and for each iteration you "add one" to the vector.
#include <stdbool.h>

bool AddOne(int* isStar, int size) {
    isStar[size - 1] += 1;
    /* propagate the carry from the least significant position */
    for (int i = size - 1; i >= 0; i--) {
        if (isStar[i] > 1) {
            if (i == 0) { return true; }   /* overflow: all combinations done */
            isStar[i] = 0;
            isStar[i - 1] += 1;
        }
    }
    return false;
}
That way you still have the original string available while replacing the characters.
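For illustration, a hypothetical driver for the AddOne routine above could look like this (it assumes the string fits the fixed-size buffers of this sketch):
#include <stdio.h>

/* Prints every combination with at least one star, leaving s untouched. */
void printStarCombinations(const char *s, int size) {
    int isStar[64] = {0};            /* the binary "counter" */
    char buf[65];
    while (!AddOne(isStar, size)) {  /* false until the counter overflows */
        for (int i = 0; i < size; i++)
            buf[i] = isStar[i] ? '*' : s[i];
        buf[size] = '\0';
        puts(buf);
    }
}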
This is a simple binary counting problem, where * corresponds to a 1 and the original letter to a 0. So you could do it with a counter, applying a bit mask to the string, but it's just as easy to do the "counting" in place.
Here's a simple implementation in C++:
(Edit: The original question seems to imply that at least one character must be replaced with a star, so the count should start at 1 instead of 0. Or, in the following, the post-test do should be replaced with a pre-test for.)
#include <iostream>
#include <string>

// A cleverer implementation would implement C++'s iterator protocol.
// But that would cloud the simple logic of the algorithm.
class StarReplacer {
 public:
  StarReplacer(const std::string& s): original_(s), current_(s) {}

  const std::string& current() const { return current_; }

  // returns true unless we're at the last possibility (all stars),
  // in which case it returns false but still resets current to the
  // original configuration.
  bool advance() {
    for (int i = current_.size()-1; i >= 0; --i) {
      if (current_[i] == '*') current_[i] = original_[i];
      else {
        current_[i] = '*';
        return true;
      }
    }
    return false;
  }

 private:
  std::string original_;
  std::string current_;
};

int main(int argc, const char** argv) {
  for (int a = 1; a < argc; ++a) {
    StarReplacer r(argv[a]);
    do {
      std::cout << r.current() << std::endl;
    } while (r.advance());
    std::cout << std::endl;
  }
  return 0;
}

Inserting zeros between the elements of a vector with high performance and speed (preferably using the STL)

I have extracted the raster data of a GeoTIFF image using RasterIO of the GDAL library. Since the image shown by OpenGL needs to have a width and height that are both multiples of 4, I have used this code after extracting the data.
The first switch block evaluates the remainder of RasterXSize (the width) divided by 4; if it is 1, for example, it means we should add 3 columns, i.e. append 3 zeros at the end of each row. This is done by the code:
for ( int i = 1; i <= RasterYSize; i++)
pRasterData.insert(pRasterData.begin()+i*RasterXSize*depthOfPixel+(i-1)*3,3,0);
and the second switch block evaluates the remainder of RasterYSize (the height) divided by 4; if it is 1, for example, it means we should simply add 3 rows at the end of the data, which is done by this code:
pRasterData.insert(pRasterData.end(),3*RasterXSize,0);
This is the whole code that I have used for extracting the data and preparing it to be displayed by OpenGL:
void FilesWorkFlow::ReadRasterData(GDALDataset* poDataset)
{
RasterXSize = poDataset -> GetRasterXSize();
RasterYSize = poDataset -> GetRasterYSize();
RasterCount = poDataset -> GetRasterCount();
CPLErr error = CE_None;
GDALRasterBand *poRasterBand;
poRasterBand = poDataset -> GetRasterBand(1);
eType = poRasterBand -> GetRasterDataType();
BytesPerPixel = GDALGetDataTypeSize(eType) / 8;
depthOfPixel = RasterCount * BytesPerPixel;
pRasterData.resize(RasterXSize * RasterYSize * RasterCount * BytesPerPixel);
error = poDataset -> RasterIO(GF_Read,0,0,RasterXSize,RasterYSize,&pRasterData[0],RasterXSize,RasterYSize,eType,RasterCount,0,0,0,0);
int modRasterXSize = RasterXSize % 4;
switch (modRasterXSize)
{
case 1:
{
for ( int i = 1; i <= RasterYSize; i++)
pRasterData.insert(pRasterData.begin()+i*RasterXSize*depthOfPixel+(i-1)*3,3,0);
RasterXSize = RasterXSize+3;
break;
}
case 2:
{
for ( int i = 1; i <= RasterYSize; i++)
pRasterData.insert(pRasterData.begin()+i*RasterXSize*depthOfPixel+(i-1)*2,2,0);
RasterXSize = RasterXSize+2;
break;
}
case 3:
{
for ( int i = 1; i <= RasterYSize; i++)
pRasterData.insert(pRasterData.begin()+i*RasterXSize*depthOfPixel+(i-1)*1,1,0);
RasterXSize = RasterXSize+1;
break;
}
}
int modRasterYSize = RasterYSize % 4;
switch (modRasterYSize)
{
case 1:
{
pRasterData.insert(pRasterData.end(),3*RasterXSize,0);
RasterYSize = RasterYSize+3;
break;
}
case 2:
{
pRasterData.insert(pRasterData.end(),2*RasterXSize,0);
RasterYSize = RasterYSize+2;
break;
}
case 3:
{
pRasterData.insert(pRasterData.end(),1*RasterXSize,0);
RasterYSize = RasterYSize+1;
break;
}
}
}
The first switch block is where my code gets slow: because I am working with a 16997*15931 image, it takes a lot of time for the program to run through the for loop.
Note that pRasterData is a member variable of the class FilesWorkFlow, and because of the problems I had in passing this variable to the COpenGLControl class written by Brett Fowle on CodeGuru (used in my project with some slight changes), I decided to use vector<unsigned char> instead of unsigned char*.
Now I am wondering: is there any way to implement this part of the code faster using vectors? Is there any way to insert zeros into certain parts of a vector without using for loops and wasting so much time? Something like std::transform? I don't know!
Remember that I'm using MFC in Visual Studio 2010 and it's better for me to use the STL, but if you have other suggestions besides vectors or the STL, I'd be glad to hear them.
The reason it is slow is that the elements of the vector are getting moved multiple times. Think about the elements in the last row of your image: they all have to be moved once for every row of the image. It would be faster to create a whole new image, copying just the pixels you need from the original image and adding zeros where appropriate.
Here's an example:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

void
padColumns(
    std::vector<unsigned char> &old_image,
    size_t old_width,
    size_t new_width
)
{
    size_t height = old_image.size() / old_width;
    assert(old_image.size() == old_width*height);
    std::vector<unsigned char> new_image(new_width * height);
    for (size_t row=0; row!=height; ++row) {
        // copy the original row ...
        std::copy(
            old_image.begin() + row*old_width,
            old_image.begin() + row*old_width + old_width,
            new_image.begin() + row*new_width
        );
        // ... and zero-fill the padding columns at its end
        std::fill(
            new_image.begin() + row*new_width + old_width,
            new_image.begin() + row*new_width + new_width,
            0
        );
    }
    old_image = new_image;
}

Understanding Frama-C slicer results

I'd like to know if it's possible to do some kind of forward conditioned slicing with Frama-C and I'm playing with some examples to understand how one could achieve this.
I've got this simple example which seems to result in an imprecise slice, and I can't understand why. Here is the function I'd like to slice:
int f(int a){
  int x;
  if(a == 0)
    x = 0;
  else if(a != 0)
    x = 1;
  return x;
}
If I use this specification:
/*@ requires a == 0;
  @ ensures \old(a) == a;
  @ ensures \result == 0;
  @*/
then Frama-C returns the following slice (which is precise), using the -slice-return f criterion and f as the entry point:
/*@ ensures \result ≡ 0; */
int f(void){
  int x;
  x = 0;
  return x;
}
But when using this specification:
/*@ requires a != 0;
  @ ensures \old(a) == a;
  @ ensures \result == 1;
  @*/
then all instructions (and annotations) remain, whereas I was expecting this slice to be returned:
/*@ ensures \result ≡ 1; */
int f(void){
  int x;
  x = 1;
  return x;
}
Is the slice imprecise in the second case? If so, what could be the cause?
Regards,
Romain
Edit: I wrote "else if(a != 0) ..." but the problem remains with "else ...".
In Frama-C, the slicing plug-in relies on the result of a preliminary static analysis plug-in called the value analysis.
This value analysis can represent the values for variable a when a == 0 (the set of values is in this case { 0 }), but it has a hard time representing the values of a when it is only known that a != 0. In the latter case, if a is not already known to be positive or negative, the value analysis plug-in needs to approximate the set of values for a. If a were known to be positive, for instance if it were an unsigned int, then the nonzero values could be represented as an interval, but the value analysis plug-in cannot represent “all values of type int except 0”.
If you are willing to change the pre-condition, you can write it in a form that is more easily understood by the value analysis plug-in (together with value analysis option -slevel):
$ cat t.c
/*@ requires a < 0 || a > 0 ;
  @ ensures \old(a) == a;
  @ ensures \result == 0;
  @*/
int f(int a){
  int x;
  if(a == 0)
    x = 0;
  else if(a != 0)
    x = 1;
  return x;
}
$ frama-c -slevel 10 t.c -main f -slice-return f -then-on 'Slicing export' -print
…
/* Generated by Frama-C */
/*@ ensures \result ≡ 0; */
int f(void)
{
  int x;
  x = 1;
  return x;
}
This has no relevance whatsoever to your main question, but your ensures \old(a) == a clause is not doing what you expect. If you pretty-print your source code with option -print, you will see it has been silently transformed into ensures \old(a) == \old(a).
The ACSL language does not permit referring about the value of formal variables in the post-state, mostly because this is meaningless from the point of view of the caller. (The stack frame of the callee is popped after the call terminates.)
