strcmp() return different values for same string comparisons [duplicate] - string-comparison

This question already has an answer here:
Why does strcmp() in a template function return a different value?
(1 answer)
Closed 2 years ago.
char s1[] = "0";
char s2[] = "9";
printf("%d\n", strcmp(s1, s2)); // Prints -9
printf("%d\n", strcmp("0", "9")); // Prints -1
Why do strcmp returns different values when it receives the same parameters ?
Those values are still legal since strcmp's man page says that the return value of strcmp can be less, greater or equal than 0, but I don't understand why they are different in this example.

I assume you are using GCC when compiling this, I tried it on 4.8.4. The trick here is that GCC understands the semantics of certain standard library functions (strcmp being one of them). In your case, the compiler will completely eliminate the second strcmp call, because it knows that the result of strcmpgiven string constants "0" and "9" will be negative, and a standard compatible value (-1) will be used instead of doing the call. It cannot do the same with the first call, because s1 and s2 might have been changed in memory (imagine an interrupt, or multiple threads, etc.).
You can do an experiment to validate this. Add the const qualifier to the arrays to let GCC know that they cannot be changed:
const char s1[] = "0";
const char s2[] = "9";
printf("%d\n", strcmp(s1, s2)); // Now this will print -1 as well
printf("%d\n", strcmp("0", "9")); // Prints -1
You can also look at the assembler output form the compiler (use the -S flag).
The best way to check however is to use -fno-builtin, which disables this optimization. With this option, your original code will print -9 in both cases

The difference is due to the implementation of strcmp. As long as it conforms to the (<0, 0, >0), it shouldn't matter to the developer. You cannot rely on anything else. For all you know, the source code could be determining it should be negative, and randomly generating a negative number to throw you off.

Related

Static Analysis erroneously reports out of bounds access

While reviewing a codebase, I came upon a particular piece of code that triggered a warning regarding an "out of bounds access". After looking at the code, I could not see a way for the reported access to happen - and tried to minimize the code to create a reproducible example. I then checked this example with two commercial static analysers that I have access to - and also with the open-source Frama-C.
All 3 of them see the same "out of bounds" access.
I don't. Let's have a look:
3 extern int checker(int id);
4 extern int checker2(int id);
5
6 int compute(int *q)
7 {
8 int res = 0, status;
9
10 status = checker2(12);
11 if (!status) {
12 status = 1;
13 *q = 2;
14 for(int i=0; i<2 && 0!=status; i++) {
15 if (checker(i)) {
16 res = i;
17 status=checker2(i);
18 }
19 }
20 }
21 if (!status)
22 *q = res;
23 return status;
24 }
25
26 int someFunc(int id)
27 {
28 int p;
29 extern int data[2];
30
31 int status = checker2(132);
32 status |= compute(&p);
33 if (status == 0) {
34 return data[p];
35 } else
36 return -1;
37 }
Please don't try to judge the quality of the code, or why it does things the way it does. This is a hacked, cropped and mutated version of the original, with the sole intent being to reach a small example that demonstrates the issue.
All analysers I have access to report the same thing - that the indexing in the caller at line 34, doing the return data[p] may read via the invalid index "2". Here's the output from Frama-C - but note that two commercial static analysers provide exactly the same assessment:
$ frama-c -val -main someFunc -rte why.c |& grep warning
...
why.c:34:[value] warning: accessing out of bounds index. assert p < 2;
Let's step the code in reverse, to see how this out of bounds access at line 34 can happen:
To end up in line 34, the returned status from both calls to checker2 and compute should be 0.
For compute to return 0 (at line 32 in the caller, line 23 in the callee), it means that we have performed the assignment at line 22 - since it is guarded at line 21 with a check for status being 0. So we wrote in the passed-in pointer q, whatever was stored in variable res. This pointer points to the variable used to perform the indexing - the supposed out-of-bounds index.
So, to experience an out of bounds access into the data, which is dimensioned to contain exactly two elements, we must have written a value that is neither 0 nor 1 into res.
We write into res via the for loop at 14; which will conditionally assign into res; if it does assign, the value it will write will be one of the two valid indexes 0 or 1 - because those are the values that the for loop allows to go through (it is bound with i<2).
Due to the initialization of status at line 12, if we do reach line 12, we will for sure enter the loop at least once. And if we do write into res, we will write a nice valid index.
What if we don't write into it, though? The "default" setup at line 13 has written a "2" into our target - which is probably what scares the analysers. Can that "2" indeed escape out into the caller?
Well, it doesn't seem so... if the status checks - at either line 11 or at line 21 fail, we will return with a non-zero status; so whatever value we wrote (or didn't, and left uninitialised) into the passed-in q is irrelevant; the caller will not read that value, due to the check at line 33.
So either I am missing something and there is indeed a scenario that leads to an out of bounds access with index 2 at line 34 (how?) or this is an example of the limits of mainstream formal verification.
Help?
When dealing with a case such as having to distinguish between == 0 and != 0 inside a range, such as [INT_MIN; INT_MAX], you need to tell Frama-C/Eva to split the cases.
By adding //# split annotations in the appropriate spots, you can tell Frama-C/Eva to maintain separate states, thus preventing merging them before status is evaluated.
Here's how your code would look like, in this case (courtesy of #Virgile):
extern int checker(int id);
extern int checker2(int id);
int compute(int *q)
{
int res = 0, status;
status = checker2(12);
//# split status <= 0;
//# split status == 0;
if (!status) {
status = 1;
*q = 2;
for(int i=0; i<2 && 0!=status; i++) {
if (checker(i)) {
res = i;
status=checker2(i);
}
}
}
//# split status <= 0;
//# split status == 0;
if (!status)
*q = res;
return status;
}
int someFunc(int id)
{
int p;
extern int data[2];
int status = checker2(132);
//# split status <= 0;
//# split status == 0;
status |= compute(&p);
if (status == 0) {
return data[p];
} else
return -1;
}
In each case, the first split annotation tells Eva to consider the cases status <= 0 and status > 0 separately; this allows "breaking" the interval [INT_MIN, INT_MAX] into [INT_MIN, 0] and [1, INT_MAX]; the second annotation allows separating [INT_MIN, 0] into [INT_MIN, -1] and [0, 0]. When these 3 states are propagated separately, Eva is able to precisely distinguish between the different situations in the code and avoid the spurious alarm.
You also need to allow Frama-C/Eva some margin for keeping the states separated (by default, Eva will optimize for efficiency, merging states somewhat aggressively); this is done by adding -eva-precision 1 (higher values may be required for your original scenario).
Related options: -eva-domains sign (previously -eva-sign-domain) and -eva-partition-history N
Frama-C/Eva also has other options which are related to splitting states; one of them is the signs domain, which computes information about sign of variables, and is useful to distinguish between 0 and non-zero values. In some cases (such as a slightly simplified version of your code, where status |= compute(&p); is replaced with status = compute(&p);), the sign domain may help splitting without the need for annotations. Enable it using -eva-domains sign (-eva-sign-domain for Frama-C <= 20).
Another related option is -eva-partition history N, which tells Frama-C to keep the states partitioned for longer.
Note that keeping states separated is a bit costly in terms of analysis, so it may not scale when applied to the "real" code, if it contains several more branches. Increasing the values given to -eva-precision and -eva-partition-history may help, as well as adding # split annotations.
I'd like to add some remarks which will hopefully be useful in the future:
Using Frama-C/Eva effectively
Frama-C contains several plug-ins and analyses. Here in particular, you are using the Eva plug-in. It performs an analysis based on abstract interpretation that reports all possible runtime errors (undefined behaviors, as the C standard puts it) in a program. Using -rte is thus unnecessary, and adds noise to the result. If Eva cannot be certain about the absence of some alarm, it will report it.
Replace the -val option with -eva. It's the same thing, but the former is deprecated.
If you want to improve precision (to remove false alarms), add -eva-precision N, where 0 <= N <= 11. In your example program, it doesn't change much, but in complex programs with multiple callstacks, extra precision will take longer but minimize the number of false alarms.
Also, consider providing a minimal specification for the external functions, to avoid warnings; here they contain no pointers, but if they did, you'd need to provide an assigns clause to explicitly tell Frama-C whether the functions modify such pointers (or any global variables, for instance).
Using the GUI and Studia
With the Frama-C graphical interface and the Studia plug-in (accessible by right-clicking an expression of interest and choosing the popup menu Studia -> Writes), and using the Values panel in the GUI, you can easily track what the analysis inferred, and better understand where the alarms and values come from. The only downside is that, it does not report exactly where merges happen. For the most precise results possible, you may need to add calls to an Eva built-in, Frama_C_show_each(exp), and put it inside a loop to get Eva to display, at each iteration of its analysis, the values contained in exp.
See section 9.3 (Displaying intermediate results) of the Eva user manual for more details, including similar built-ins (such as Frama_C_domain_show_each and Frama_C_dump_each, which show information about abstract domains). You may need to #include "__fc_builtin.h" in your program. You can use #ifdef __FRAMAC__ to allow the original code to compile when including this Frama-C-specific file.
Being nitpicky about the term erroneous reports
Frama-C is a semantic-based tool whose main analyses are exhaustive, but may contain false positives: Frama-C may report alarms when they do not happen, but it should never forget any possible alarm. It's a trade-off, you can't have an exact tool in all cases (though, in this example, with sufficient -eva-precision, Frama-C is exact, as in reporting only issues which may actually happen).
In this sense, erroneous would mean that Frama-C "forgot" to indicate some issue, and we'd be really concerned about it. Indicating an alarm where it may not happen is still problematic for the user (and we work to improve it, so such situations should happen less often), but not a bug in Frama-C, and so we prefer using the term imprecisely, e.g. "Frama-C/Eva imprecisely reports an out of bounds access".

scanf not working as expected in Frama-C

In the program below, function dec uses scanf to read an arbitrary input from the user.
dec is called from main and depending on the input it returns 1 or 0 and accordingly an operation will be performed. However, the value analysis indicates that y is always 0, even after the call to scanf. Why is that?
Note: the comments below apply to versions earlier than Frama-C 15 (Phosphorus, 20170501); in Frama-C 15, the Variadic plugin is enabled by default (and its short name is now -variadic).
Solution
Enable Variadic (-va) before running the value analysis (-val), it will eliminate the warning and the program will behave as expected.
Detailed explanation
Strictly speaking, Frama-C itself (the kernel) only does the parsing; it's up to the plug-ins themselves (e.g. Value/EVA) to evaluate the program.
From your description, I believe you must be using Value/EVA to analyze a program. I do not know exactly which version you are using, so I'll describe the behavior with Frama-C Silicon.
One limitation of ACSL (the specification language used by Frama-C) is that it is not currently possible to specify contracts for variadic functions such as scanf. Therefore, the specifications shipped with the Frama-C standard library are insufficient. You can notice this in the following program:
#include <stdio.h>
int d;
int main() {
scanf("%d", &d);
Frama_C_show_each(d);
return 0;
}
Running frama-c -val file.c will output, among other things:
...
[value] using specification for function scanf
FRAMAC_SHARE/libc/stdio.h:150:[value] warning: no \from part for clause 'assigns *__fc_stdin;' of function scanf
[value] Done for function scanf
[value] Called Frama_C_show_each({0})
...
That warning means that the specification is incorrect, which explains the odd behavior.
The solution in this case is to use the Variadic plug-in (-va, or -va-help for more details), which will specialize variadic calls and add specifications to them, thus avoiding the warning and behaving as expected. Here's the resulting code (-print) after running the Variadic plug-in on the example above:
$ frama-c -va file.c -print
[... lots of definitions from stdio.h ...]
/*# requires valid_read_string(format);
requires \valid(param0);
ensures \initialized(param0);
assigns \result, *__fc_stdin, *param0;
assigns \result
\from (indirect: *__fc_stdin), (indirect: *(format + (0 ..)));
assigns *__fc_stdin
\from (indirect: *__fc_stdin), (indirect: *(format + (0 ..)));
assigns *param0
\from (indirect: *__fc_stdin), (indirect: *(format + (0 ..)));
*/
int scanf_0(char const *format, int *param0);
int main(void)
{
int __retres;
scanf_0("%d",& d);
Frama_C_show_each(d);
__retres = 0;
return __retres;
}
In this example, scanf was specialized to scanf_0, with a proper ACSL annotation. Running EVA on this program will not emit any warnings and produce the expected output:
# frama-c -va file.c -val
...
[value] Done for function scanf_0
[value] Called Frama_C_show_each([-2147483648..2147483647])
...
Note: the GUI in Frama-C 14 (Silicon) does not allow the Variadic plug-in to be enabled (even after ticking it in the Analyses panel), so you must use the command-line in this case to obtain the expected result and avoid the warning. Starting from Frama-C 15 (Phosphorus, to be released in 2017), this won't be necessary: Variadic will be enabled by default and so your example would work from the start.

Why the following code prints garbage values for input strings greater than 128 bytes?

This is a problem of codechef that I recently came across. The answer seems to be right for every test case where the value of input string is less than 128 bytes as it is passing a couple of test cases. For every value greater than 128 bytes it is printing out a large value which seems to be a garbage value.
std::string str;
std::cin>>str;
vector<pair<char,int>> v;
v.push_back(make_pair('C',0));
v.push_back(make_pair('H',0));
v.push_back(make_pair('E',0));
v.push_back(make_pair('F',0));
int i=0;
while(1)
{
if(str[i]=='C')
v['C'].second++;
else if (str[i]=='H')
{
v['H'].second++;
v['C'].second--;
}
else if (str[i]=='E')
{
v['E'].second++;
v['C'].second--;
}
else if (str[i]=='F')
v['F'].second++;
else
break;
i++;
Even enclosing the same code within
/*reading the string values from a file and not console*/
std::string input;
std::ifstream infile("input.txt");
while(getline(infile,input))
{
istringstream in(input);
string str;
in>>str;
/* above code goes here */
}
generates the same result. I am not looking for any solution(s) or hint(s) to get to the right answer as I want to test the correctness of my algorithm. But I want to know why this happens as I am new to vector containers`.
-Regards.
if(str[i]=='C')
v['C'].second++;
You're modifying v[67]
... which is not contained in your vector, and thus either invalid memory or uninitialized
You seem to be trying to use a vector as an associative array. There is already such a structure in C++: a std::map. Use that instead.
With using this v['C'] you actually access the 67th (if 'A' is 65 from ASCII) element of a container having only 4 items. Depending on compiler and mode (debug vs release) you get undefined behavior for the code.
What you probably wanted to use was map i.e. map<char,int> v; instead of vector<pair<char,int>> v; and simple v['C']++; instead of v['C'].second++;

Is it possible to inject values in the frama-c value analyzer?

I'm experimenting with the frama-c value analyzer to evaluate C-Code, which is actually threaded.
I want to ignore any threading problems that might occur und just inspect the possible values for a single thread. So far this works by setting the entry point to where the thread starts.
Now to my problem: Inside one thread I read values that are written by another thread, because frama-c does not (and should not?) consider threading (currently) it assumes my variable is in some broad range, but I know that the range is in fact much smaller.
Is it possible to tell the value analyzer the value range of this variable?
Example:
volatile int x = 0;
void f() {
while(x==0)
sleep(100);
...
}
Here frama-c detects that x is volatile and thus has range [--..--], but I know what the other thread will write into x, and I want to tell the analyzer that x can only be 0 or 1.
Is this possible with frama-c, especially in the gui?
Thanks in advance
Christian
This is currently not possible automatically. The value analysis considers that volatile variables always contain the full range of values included in their underlying type. There however exists a proprietary plug-in that transforms accesses to volatile variables into calls to user-supplied function. In your case, your code would be transformed into essentially this:
int x = 0;
void f() {
while(1) {
x = f_volatile_x();
if (x == 0)
sleep(100);
...
}
By specifying f_volatile_x correctly, you can ensure it returns values between 0 and 1 only.
If the variable 'x' is not modified in the thread you are studying, you could also initialize it at the beginning of the 'main' function with :
x = Frama_C_interval (0, 1);
This is a function defined by Frama-C in ...../share/frama-c/builtin.c so you have to add this file to your inputs when you use it.

Easy way to get NUMBERFMT populated with defaults?

I'm using the Windows API GetNumberFormatEx to format some numbers for display with the appropriate localization choices for the current user (e.g., to make sure they have the right separators in the right places). This is trivial when you want exactly the user default.
But in some cases I sometimes have to override the number of digits after the radix separator. That requires providing a NUMBERFMT structure. What I'd like to do is to call an API that returns the NUMBERFMT populated with the appropriate defaults for the user, and then override just the fields I need to change. But there doesn't seem to be an API to get the defaults.
Currently, I'm calling GetLocaleInfoEx over and over and then translating that data into the form NUMBERFMT requires.
NUMBERFMT fmt = {0};
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
LOCALE_IDIGITS | LOCALE_RETURN_NUMBER,
reinterpret_cast<LPWSTR>(&fmt.NumDigits),
sizeof(fmt.NumDigits)/sizeof(WCHAR));
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
LOCALE_ILZERO | LOCALE_RETURN_NUMBER,
reinterpret_cast<LPWSTR>(&fmt.LeadingZero),
sizeof(fmt.LeadingZero)/sizeof(WCHAR));
WCHAR szGrouping[32] = L"";
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_SGROUPING, szGrouping,
ARRAYSIZE(szGrouping));
if (::lstrcmp(szGrouping, L"3;0") == 0 ||
::lstrcmp(szGrouping, L"3") == 0
) {
fmt.Grouping = 3;
} else if (::lstrcmp(szGrouping, L"3;2;0") == 0) {
fmt.Grouping = 32;
} else {
assert(false); // unexpected grouping string
}
WCHAR szDecimal[16] = L"";
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_SDECIMAL, szDecimal,
ARRAYSIZE(szDecimal));
fmt.lpDecimalSep = szDecimal;
WCHAR szThousand[16] = L"";
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_STHOUSAND, szThousand,
ARRAYSIZE(szThousand));
fmt.lpThousandSep = szThousand;
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
LOCALE_INEGNUMBER | LOCALE_RETURN_NUMBER,
reinterpret_cast<LPWSTR>(&fmt.NegativeOrder),
sizeof(fmt.NegativeOrder)/sizeof(WCHAR));
Isn't there an API that already does this?
I just wrote some code to do this last week. Alas, there does not seem to be a GetDefaultNumberFormat(LCID lcid, NUMBERFMT* fmt) function; you will have to write it yourself as you've already started. On a side note, the grouping string has a well-defined format that can be easily parsed; your current code is wrong for "3" (should be 30) and obviously will fail on more exotic groupings (though this is probably not much of a concern, really).
If all you want to do is cut off the fractional digits from the end of the string, you can go with one of the default formats (like LOCALE_NAME_USER_DEFAULT), then check for the presence of the fractional separator (comma in continental languages, point in English) in the resulting character string, and then chop off the fractional part by replacing it with a null byte:
#define cut_off_decimals(sz, cch) \
if (cch >= 5 && (sz[cch-4] == _T('.') || sz[cch-4] == _T(','))) \
sz[cch-4] = _T('\0');
(Hungarian alert: sz is the C string, cch is character count, including the terminating null byte. And _T is the Windows generic text makro for either char or wchar_t depending on whether UNICODE is defined, only needed for compatibility with Windows 9x/ME.)
Note that this will produce incorrect results for the very odd case of a user-defined format where the third-to-last character is a dot or a comma that has some special meaning to the user other than fractional separator. I have never seen such a number format in my whole life, and hence I conclude that this is good and safe enough.
And of course this won't do anything if the third-to-last character is neither a dot nor a comma.

Resources