Related
I was openbsd bcrypt code and I got warning of unknown attribute bounded at below code snippet:
void SHA256Update(SHA2_CTX *, const void *, size_t)
__attribute__((__bounded__(__string__,2,3)));
I tried to google the attribute bounded but no relevant result was found. I want to port that code to a different platform and if I get the meaning of bounded attribute, I want to use a similar attribute of that platform.
Any suggestion would be appreciated!
The __bounded__ attribute is available in the context of function declarations to enable to determine the length of the memory region pointed by one of the function arguments using the value of another of it's arguments; the first parameter slightly changes the type of the check for different styles of functions.
In this case it augments the type of the second argument to the function with the length specified by the third argument; the __string__ bound style additionally checks that the size argument doesn't come from a sizeof applied to a pointer, as you want the destination's size.
It's only available in the OpenBSD's fork of GCC (see man 1 gcc-local); there also was a short-lived GNU C extension (somewhere between 2000 and 2003) by the same name and for the same purpose, which was a direct type qualifier instead and was usable outside function declarations too, however, AFAIK it was undocumented.
I am refactoring some business rule functions to provide a more generic version of the function.
The functions I am refactoring are:
DetermineWindowWidth
DetermineWindowHeight
DetermineWindowPositionX
DetermineWindowPositionY
All of them do string parsing, as it is a string parsing business rules engine.
My question is what would be a good name for the newly refactored function?
Obviously I want to shy away from a function name like:
DetermineWindowWidthHeightPositionXPositionY
I mean that would work, but it seems unnecessarily long when it could be something like:
DetermineWindowMoniker or something to that effect.
Function objective: Parse an input string like 1280x1024 or 200,100 and return either the first or second number. The use case is for data-driving test automation of a web browser window, but this should be irrelevant to the answer.
Question objective: I have the code to do this, so my question is not about code, but just the function name. Any ideas?
There are too little details, you should have specified at least the parameters and returns of the functions.
Have I understood correctly that you use strings of the format NxN for sizes and N,N for positions?
And that this generic function will have to parse both (and nothing else), and will return either the first or second part depending on a parameter of the function?
And that you'll then keep the various DetermineWindow* functions but make them all call this generic function?
If so:
Without knowing what parameters the generic function has it's even harder to help, but it's most likely impossible to give it a simple name.
Not all batches of code can be described by a simple name.
You'll most likely need to use a different construction if you want to have clear names. Here's an idea, in pseudo code:
ParseSize(string, outWidth, outHeight) {
ParsePair(string, "x", outWidht, outHeight)
}
ParsePosition(string, outX, outY) {
ParsePair(string, ",", outX, outY)
}
ParsePair(string, separator, outFirstItem, outSecondItem) {
...
}
And the various DetermineWindow would call ParseSize or ParsePosition.
You could also use just ParsePair, directly, but I thinks it's cleaner to have the two other functions in the middle.
Objects
Note that you'd probably get cleaner code by using objects rather than strings (a Size and a Position one, and probably a Pair one too).
The ParsePair code (adapted appropriately) would be included in a constructor or factory method that gives you a Pair out of a string.
---
Of course you can give other names to the various functions, objects and parameters, here I used the first that came to my mind.
It seems this question-answer provides a good starting point to answer this question:
Appropriate name for container of position, size, angle
A search on www.thesaurus.com for "Property" gives some interesting possible answers that provide enough meaningful context to the usage:
Aspect
Character
Characteristic
Trait
Virtue
Property
Quality
Attribute
Differentia
Frame
Constituent
I think ConstituentProperty is probably the most apt.
When using the OpenFile function in Go's os package, what exactly is the purpose of the pipe character?
Example:
os.OpenFile("foo.txt", os.O_RDWR|os.O_APPEND, 0660)
Does it serve as a logical OR? If so, does Go choose the first one that is "truthy"?? Being that the constants those flags represent, at the heart of them are just integers written in hexadecimal, when compiled how does Go choose which flag to apply?
After all, if the function call were to go by the largest number, os.O_APPEND would take precedence over all other flags passed in as seen below:
os.O_RDWR == syscall.O_RDWR == 0x2 == 2
os.O_APPEND == syscall.O_APPEND == 0x400 == 1024
os.O_CREATE == syscall.O_CREAT == 0x40 == 64
UPDATE 1
To follow up on the comment below, if I have a bitwise operator calculation using os.O_APPEND|os.O_CREATE will that error if the file exists, or simply create/append as needed?
UPDATE 2
My question is two fold, one to understand the purpose of the bitwise operator, which I understand now is being used more as a bitmask operation; and two, how to use the os.OpenFile() function as a create or append operation. In my playing around I have found the following combination to work best:
file, _ := os.OpenFile("foo.txt", os.O_RDWR|os.O_CREATE|os.O_APPEND, 0660)
file.WriteString("Hello World\n")
file.Sync()
Is this the correct way or is there a more succinct way to do this?
It is a bitwise, not a logical OR.
If you write out the numbers in binary, and assign each a truth value 0/1, and apply the logical OR to each of the bits in place i between the arguments, and then reassemble the result into an integer by binary expansion - that's the | operator.
It is often used in a way that is commonly described as a "bitmask" - you use a bitmask when you want a single int value to represent a (small) set of switches that could be turned on or off. One bit per switch.
You should see in this context, A | B means "all the switches in A that are on, as well as all the switches in B that are on". In your case, the switches define the exact behavior of the file open/creation function, as described by the Go manual. (And probably more in detail by the Unix manpage I linked above).
In a bitmask, constants are typically defined that represent each switch - that's how those O_* constants are determined. Each is an int with exactly one bit set and represents a particular switch. (though, be careful, because sometimes they represent combinations of switches!).
Also:
^A // All of the "switches" not currently on in A
A&^B // All of the "switches" on in A but not on in B
A^B // All of the "switches" on in exactly one of A or B
, etc.
The operator | itself is described in the Go manual here.
It is a bitwise OR operator. Its purpose being used here is to allow for multiple values to be passed as a bitmask. Thus you can combine flags to create a desired result such as using the OpenFile() function to create a file if it does not exist or append to it if it does.
os.Openfile("foo.txt", os.O_RDWR|os.O_CREATE|os.O_APPEND, 0660
The constants being passed as arguments from the os package are assigned values from the syscall package. This package contains low-level operating system independent values.
Package syscall contains an interface to the low-level operating system primitives. The details vary depending on the underlying system, and by default, godoc will display the syscall documentation for the current system. If you want godoc to display syscall documentation for another system, set $GOOS and $GOARCH to the desired system. For example, if you want to view documentation for freebsd/arm on linux/amd64, set $GOOS to freebsd and $GOARCH to arm. The primary use of syscall is inside other packages that provide a more portable interface to the system, such as "os", "time" and "net".
https://golang.org/pkg/syscall/
As noted by #BadZen, a bitwise OR operator, in this case the '|' character, acts at the binary level changing any 0 values to 1's that are not already ones.
You should see in this context, A | B means "all the switches in A that are on, as well as all the switches in B that are on".
By doing this as the function above displays, you are manipulating the behavior of the function to create a file (os.O_CREATE) with the given name of foo.txtor open the file for reading/writing (os.O_RDWR) and any value written to it will be appended (os.O_APPEND). Alternatively you could pass along os.O_TRUNC in order to truncate the file before writing.
The bitwise OR operator allows you a powerful solution to combining different behaviors in order to get the result from the function that you are desiring.
Given the code:
#include <stdlib.h>
#include <stdint.h>
typedef struct { int32_t x, y; } INTPAIR;
typedef struct { int32_t w; INTPAIR xy; } INTANDPAIR;
void foo(INTPAIR * s1, INTPAIR * s2)
{
s2->y++;
s1->x^=1;
s2->y--;
s1->x^=1;
}
int hey(int x)
{
static INTPAIR dummy;
void *p = calloc(sizeof (INTANDPAIR),1);
INTANDPAIR *p1 = p;
INTPAIR *p2a = p;
INTPAIR *p2b = &p1->xy;
p2b->x = x;
foo(p2b,p2a);
int result= p2b->x;
free(p);
return result;
}
#include <stdio.h>
int main(void)
{
for (int i=0; i<10; i++)
printf("%d.",hey(i));
}
Behavior depends upon gcc optimization level, which implies that gcc thinks
this code invokes Undefined Behavior (the definition of "foo" collapses to nothing, but interestingly the definition of "hey" increments the value passed in). I'm not quite sure what if anything it does that runs afoul of the Standard's rules, though.
The code very deliberately and evilly constructs two pointers such that
s2a->y and s2b->x will alias, but the pointers are deliberately constructed in such a way that both identify legitimate potential objects of type INTPAIR. Because code used calloc to get the memory, all field members have legitimate initial defined values of zero. All accesses to the allocated memory are done via an int32_t member of an INTPAIR*.
I can understand why it would make sense for the Standard to forbid aliasing structure fields in this fashion, but I couldn't find anything in the Standard which actually does so. Is gcc operating in Standard-compliant fashion here, or is it violating some clause in the Standard which isn't referenced by Annex J.2 and doesn't use any of the terms I searched for?
UPDATE:
I felt this answer was OK, but not still a little imprecise, and not cut and dry as to what the UB was. After a lot of very interesting discussion and comments I have tried again with a new answer
The right part of the C99 standard is quoted in this answer. I'm copying it here for convenience. The question and several of the answers are quite thorough.
(C99; ISO/IEC 9899:1999 6.5/7:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types 73) or 88):
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of
the object,
a type that is the signed or unsigned type corresponding to the
effective type of the object,
a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.
73) or 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
What is an effective type then? (C99; ISO/IEC 9899:1999 6.5/6:
The effective type of an object for an access to its stored value is the declared type of the object, if any. 87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
87) Allocated objects have no declared type.
So at the line p2b->x = x the object at p+4 becomes of effective type INTPAIR. Is it aligned correctly? If it isn't then Undefined Behavior (UB). But to keep it interesting, assume it is as it must be in this case because of the layout of INTANDPAIR.
By the same analysis there are two 8 byte objects, p2a (s2) at #(p+4) and p2b #p. As your example is demonstrating the 2nd element of p2a and the first of p2b end up being aliased.
In the foo(), the object p2b #p+4 is accessed by the normal method via s1->x. But then the "stored value" of object p2b is also accessed by a side effect of modifying a different object p2a #p. Since this falls under none of the bullets of 6.5/7, it is UB. Note that 6.5/7 says only, so objects shall not be accessed in any other ways.
I think the main distinction is that the "object" in question is the whole structure p2a/s2 and p2b/s1, not the integer members. If you change the argument of the function to take the integers and alias them it works "fine" because the function can't know s1 and s2 alias. For example:
void foo2(int *s1, int *s2)
{
(*s2)++;
(*s1)^=1;
(*s2)--;
(*s1)^=1;
}
...
/*foo(p2b,p2a);*/
foo2((int*)p, (int*)p); /* or p+4 or whatever you want */
This more or less confirms that this is the way GCC chose to interpret things: modifying a member is modifying the whole struct object and that since side effects of modifying one object are not on the listed legal ways to indirectly modify a different object, whee! we can do whatever silly thing we feel like doing.
So whether GCC interprets the ambiguities in standard to decide that by deriving s1 and s2 pointers through different typed pointers and then accessing them constitutes indirectly accessing the memory via different original types via p1 and p or whether it interprets the standard in the way I'm suggesting that "object" s2->y modifies is not just the integer but the s2 object, it is UB either way. Or is GCC just being especially snarky and pointing out that if the standard doesn't very clearly specify the semantics of dynamically allocated yet overlapping objects, it is free to do whatever it wants because by definition it is "undefined".
I don't think at this microscopic level anyone other than the standards body can definitively answer whether this should be UB or not because at this level it requires some "interpretation". The GCC's implementers opinion's seem to favor very aggressive interpretations.
I like Linus's reaction to this whole thing. And it is true, why not just be conservative and let the programmer tell the compiler when it is safe? Very Excellent Linus Rant
My previous answer was lacking, maybe not completely wrong, but the sample program is deliberately designed to sidestep each of the more obvious explicit Undefined Behaviors (UB) dictated by the C99 standard, like 6.5/7. But with both GCC (and Clang) this example demonstrates strict aliasing failure like symptoms under optimization. They appear to be assuming s1->y and s2-x can't alias. So, is the compiler wrong? Is this a loophole in the strict aliasing legalese?
Short answer: No. I wouldn't be surprised if there was a loophole of some kind in the standard, given its complexity. But in this example, creating overlapping objects on the heap is explicitly undefined behavior, and there are several other things happening that the standard does not define.
I think the point of the example is not that it fails - it is obvious that "playing fast and loose" with pointers is a bad idea and relying on corner cases and legalese to prove the compile "wrong" is of little help if the code doesn't work. The key questions are: is GCC wrong? and what in the standard says so.
First, lets look at the obvious strict aliasing rules and how this example is trying to avoid them.
C99 6.5/7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 76)
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
This is the main strict aliasing section. It means that accessing the same memory via two different type pointers is UB. This example sidesteps it by accessing both using INTPAIR pointers in foo().
The key problem with this is that it is talking about accessing the stored value via two different effective types (e.g. pointers). It doesn't talk about accessing via two different objects.
What is being accessed? is it the integer member or the entire object s1 / s2? Is accessing s2->x via s1->y access via "a type compatible with the effective type of the object". I believe an argument can be made that a) the access as a side effect of modifying a different object does not fall under the permissible methods in 6.5/7 and that b) modifying one member of the aggregate transitively modifies the aggregate (*s1 or *s2) also.
Since this is not specified, it is UB, but it is a bit hand-wavy.
How did we get pointers to two overlapping objects? Are the pointer casts leading to them OK? Section 6.3.2.3 contains the rules for casting pointers and the example carefully does not violate any of them. In particular, because p2b is a pointer to INTANDPAIR member xy the alignment is guaranteed to be right, otherwise it would definitely run afoul of 6.3.2.3/7.
Furthermore, &p1->xy is not a problem - it can't be - it is a perfectly legitimate pointer to an INTPAIR. Simply casting pointers and/or taking addresses is safely outside the definition of "access" (3.1/1).
It is clear that the problem comes about by accessing two integer members that overlay each other as different parts of overlapping objects. Any attempt to do this via pointers of different types would clearly run afoul of 6.5/7. If accessed by the same type pointer at the same address, there would be no problem whatsoever. So the only way left that they could alias this way is that if two objects at different addresses overlapped in some fashion.
Obviously this could occur as part of a union, but that is not the case for this example. Type punning through unions may not be UB in C99, but it would be a different question whether a variant of this example could be made misbehave via unions.
The example uses dynamic allocation and casts the resultant void pointer to two different types. Going from from a pointer to an object to void * and back again is valid (6.3.2.3/1). Several other ways of obtaining pointers to objects that would overlap are explicitly UB by the pointer conversion rules of 6.3.2.3, the aliasing rules of 6.5/7, and/or the compatible type rules 6.2.7.
So what else is wrong?
6.2.4 Storage durations of objects
1 An object has a storage duration that determines its lifetime. There are three storage durations: static, automatic, and allocated. Allocated storage is described in 7.20.3
The storage for each of the objects is allocated by calloc() so the duration we want is "allocated". So we check 7.20.3: (emphasis added)
7.20.3 Memory management functions
1 The order and contiguity of storage allocated by successive calls to the calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.
...
2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, 25) and retains its last-stored value throughout its lifetime. 26) If an object is referred to outside of its lifetime, the behavior is undefined.
To avoid UB, the accesses to the two different objects must be to a valid object within its lifetime. You can get a single valid object (or an array) with malloc()/calloc(), but these guarantee that you will receive a pointer disjoint from all other objects. So is the object returned from calloc() p or is it p1? It can't be both.
The UB is triggered by attempting to reuse the same dynamically allocated object to hold two objects that are not disjoint. While calloc() guarantees it will return a pointer to a disjoint object, there is nothing that says it will still work if you then start using parts of the buffer for a 2nd overlapping one. In fact, it even explicitly says it is UB if you access an object outside its lifetime and there is only a single allocation ergo a single lifetime.
Also note:
4. Conformance
In this International Standard, ‘‘shall’’ is to be interpreted as a requirement on an implementation or on a program; conversely, ‘‘shall not’’ is to be interpreted as a prohibition.
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition
of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.
For this to be a compiler error it must fail on a program that only uses constructs explicitly defined. Anything else is outside the safe-harbor and is still undefined, even if it the standard doesn't explicitly state that it is Undefined Behavior.
Looking at the Windows SDK, I found this #define directive for MAKEINTRESOURCEW:
#define MAKEINTRESOURCEW(i) ((LPWSTR)((ULONG_PTR)((WORD)(i))))
Can someone explain to me what the heck that means? For example, what would be the value of MAKEINTRESOURCEW(0)? (1)? (-1)?
The result of this macro will be pointer to long string with value equal to given parameter. You can see it by reading precompiler output (see /P C++ compiler options). All casting is required to compile this macro result, when LP[w]WSTR pointer is required, both in Win32 and x64 configurations.
Some Windows API, like LoadIcon, expect string pointer as their parameter. Possibly, these functions test the pointer value, and if it is less than some maximum, they interpret it as resource index, and not as string (problems of ugly C-style interface). So, this macro allows to pass WORD as string, without changing its value, with appropriate casting.
For the most part, it leaves the value unchanged, but converts it from an int to a pointer so it's acceptable to functions that expect to see a pointer. The intermediate casts widen the input int to the same size as a pointer, while ensuring against it's being sign extended. In case you care, ULONG_PTR is not a "ULONG POINTER" like you might guess -- rather, it's an unsigned long the same size as a pointer. Back before 64-bit programming became a concern, the definition was something like:
#define MAKEINTRESOURCE(i) (LPTSTR) ((DWORD) ((WORD) (i)))
Nowadays, they use ULONG_PTR, which is a 32-bit unsigned long for a 32-bit target, and a 64-bit unsigned long for a 64-bit target.
That's a macro that casts an argument i to a word, then casts that result to a pointer to an unsigned long, then again to a long pointer to a wide-character string.
Like other users said - it just casts an integer into a "pointer to a string".
The reason for this is the following: At the ancient times of Windows 3.0 people tried to be minimalistic as much as possible.
It was assumed that resources in the executable can have either string identifier or integer. Hence when you try to access such a resource - you specify one of the above, and the function distinguish what you meant automatically (by checking if the provided "pointer" looks like a valid pointer).
Since the function could not receive a "variable argument type" - they decided to make it receive LPCTSTR (or similar), whereas the actual parameter passed may be integer.
Another example from Windows API: A pointer to the window procedure. Every window has a window procedure (accessed via GetWindowLong with GWL_WNDPROC flag.
However sometimes it's just an integer which specifies what "kind" of a window is that.
Then there's a CallWindowProc which knows to distinguish those cases.