Pointer aliasing in C++0x / C++11

I'm thinking about (just as an idea) disjoint pointer aliasing in C++0x. I was wondering whether it could be implemented similarly to const correctness - that is, enforced by the compiler. What would be the requirements for such a thing? As this is more of a thought experiment, I'm perfectly happy to look at solutions that destroy legacy code or redefine half the language, and that kind of thing.
What I'd really rather not do is have, say, restrict from C99, where the programmer just promises it. It should be enforced.
I was thinking about making unique_ptr part of the language rather than part of the library. That way, the compiler can perform special optimizations on it, and programmers can still write their own unique pointer classes if they need to.

The Standard C++ Library (including std::unique_ptr) is a part of the language.
Also, conforming programs are not allowed to add declarations and definitions to the namespace std.
Upon seeing an instantiation of std::unique_ptr<T>, the compiler knows everything about the behavior of that instantiation: the behavior is implemented as part of the very language implementation the compiler itself belongs to, and the compiler is free to perform "special optimizations" that follow from the guarantees of the C++ standard.
As an example from the same line of thinking, GCC already does this with a number of standard C99 functions in hosted mode - it may replace a standard function call with an inline instruction sequence or with a call to another function, precisely because GCC knows the exact semantics from the name of the function alone.
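For instance, this classic transformation needs no annotations at all (a minimal, hypothetical C++ snippet; the rewrite described is one GCC is known to perform):

#include <cstdio>

int main() {
    // GCC may lower this call to puts("hello"), precisely because it
    // knows printf's exact semantics from the name alone.
    std::printf("hello\n");
}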

Related

Why is `math.Sin` disallowed in a Go constant?

According to Effective Go, the function math.Sin cannot be used to define a constant because calls to that function must happen at run time.
What is the reasoning behind this limitation? Floating-point consistency? Quirk of the Sin implementation? Something else?
There is support for this sort of thing in other languages. In C, for example: as of version 4.3, GCC supports compile-time calculation of the sine function. (See section "General Optimizer Improvements").
However, as noted in this blog post by Bruce Dawson, this can cause unexpected issues. (See section "Compile-time versus run-time sin").
Is this a relevant concern in Go? Or is this usage restricted for a different reason?
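For concreteness, the GCC feature mentioned above works roughly like this (a minimal C++ sketch; the function name is made up):

#include <cmath>

double one() {
    // With a constant argument, GCC 4.3+ may fold this call at compile
    // time via MPFR and emit a precomputed constant instead of a call.
    return std::sin(1.5707963267948966);
}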
Go doesn't support initializing a constant with the result of a function. Functions are called at runtime, not at compile time. But constants are defined at compile time.
It would be possible to make exceptions for certain functions (like math.Sin for example), but that would make the spec more complicated. The Go developers generally prefer to keep the spec simple and consistent.
Go simply lacks the concept. There is no way to mark a function as pure (meaning its return value depends only on its arguments, and it doesn't alter any mutable state or perform I/O), there is no way for the compiler to infer purity, and there is no attempt to evaluate any expression containing a function call at compile time - because doing so for anything except a pure function of constant arguments would be a source of weird behavior and bugs, and because adding the machinery needed to make it work right would introduce quite a bit of complexity.
Yes, this is a substantial loss, which forces a tradeoff between code with bad runtime behavior, and code which is flat-out ugly. Go partisans will choose the ugly code and tell you that you are a bad human being for not finding it beautiful.
The best thing you have available to you is code generation. The integration of go generate into the toolchain and the provision of a complete Go parser in the standard library makes it relatively easy to munge code at build time, and one of the things that you can do with this ability is create more advanced constant-folding if you so choose. You still get all of the debuggability peril of code generation, but it's something.

Inlining functions in Go

Go has a nice generic implementation of introsort using the generic sort.Interface. But one of the advantages of C++'s std::sort is that you can specify a comparison functor which gets inlined, so unnecessary function calls are avoided.
Can we somehow force the current native Go compilers to inline those sort.Swap and sort.Less calls? I'm not considering gccgo, because it gives us terrible results compared to the native compiler.
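For reference, the C++ behavior the question contrasts with might look like this (a hypothetical snippet, not from the question):

#include <algorithm>
#include <vector>

// Each lambda has its own unique type, so std::sort is instantiated
// specifically for this comparator, and the comparison is typically
// inlined rather than dispatched through an interface.
void sort_descending(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
}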
For Go 1.x the answer is a sound no.
The Go compiler may inline some classes of functions on some platforms in some specific ways. You might even be able to figure out some trickery through which you can make the current compiler inline the sort-"functor" on your architecture, but there is absolutely no guarantee whatsoever that this will hold in future releases, and it may work against you (even break your code, depending on the level of hackery involved) when the compiler changes.
A major release of Go (2.x?) might provide some guarantees about optimizations, but for Go 1.x there's nothing like that. And even if the Go authors wanted to do this, it would be difficult to provide such guarantees for all platforms (remember that Go runs on many architectures).
Even more generally, it is a bad idea to rely on compiler-specific behavior, in any language.

Are there valid "use cases" for undefined behaviour?

I have found a piece of code which has UB, and was told to leave it in the code with a comment that states it is UB. We're using MSVC2012 only.
The code itself has a raw array of Foo objects, then casts that array to char* with reinterpret_cast<char*> and then calls delete casted_array (like this, not delete[]) on it.
Like this:
Foo* foos = new Foo[500];
char* CastedFoos = reinterpret_cast<char*>(foos);
delete CastedFoos;
Per the Standard (5.3.5/3), this is clearly undefined behavior.
Apparently the code does this to avoid having to call the destructors, as an optimisation.
I wondered: are there actually places where leaving UB in the code could be considered valid?
Also, as far as I'm concerned, leaving the above in the code is not smart - am I right?
It depends entirely on your perspective.
Take an extreme example: in C++03, threads were undefined behavior. As soon as you had more than one thread, your program's behavior was no longer defined by the C++ standard.
And yet, most people would say threads are useful.
Of course, multithreading may have been UB according to the C++ standard, but individual compilers didn't treat it as undefined: they provided an additional guarantee that multithreading would work as you'd expect.
When talking about C++ in the abstract, UB has no uses whatsoever. How could it? You don't know what could or would happen.
But in specific applications, specific code compiled by specific compilers to run on specific operating systems, you may sometimes know that a piece of UB is (1) safe, and (2) ends up having some kind of beneficial effect.
The C++ standard defines "undefined behaviour" as follows:
behavior for which this standard imposes no requirements
So if you want your code to be portable to different compilers and platforms, then your code should not depend on undefined behavior, because what the programs (that are produced by different compilers compiling your code) do in these cases may vary.
If you don't care about portability, then you should check if your compiler documents how it behaves under the circumstances of interest. If it doesn't document what it does (and it doesn't have to), beware that the compiler could change what it does without warning between different versions. Also note that its behaviour may be non-deterministic. So for example it could crash 1% of the time, which you may not notice in ad-hoc testing, but will come back and bite you later when it goes into production. So even if you are using one compiler, it may still be a bad idea to depend on undefined behavior.
With regard to your specific example, you can rewrite it to achieve the same effect (reclaiming the memory without calling the destructors) in a way that does not result in undefined behaviour: allocate raw std::aligned_storage to hold the Foo array, use placement new to construct the Foos in that storage, and then, when you want to release the array, deallocate the raw storage without ever invoking the destructors.
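A minimal sketch of that rewrite, assuming a hypothetical Foo with ordinary (non-extended) alignment - the names and sizes here are illustrative, not from the original code:

#include <cstddef>
#include <new>
#include <type_traits>

struct Foo { ~Foo() { /* expensive teardown the hack wants to skip */ } };

constexpr std::size_t N = 500;
using Slot = std::aligned_storage_t<sizeof(Foo), alignof(Foo)>;

Foo* make_foos() {
    Slot* raw = new Slot[N];                  // raw storage; no Foo constructed yet
    Foo* foos = reinterpret_cast<Foo*>(raw);
    for (std::size_t i = 0; i < N; ++i)
        new (foos + i) Foo();                 // placement new: construct in place
    return foos;
}

void drop_foos(Foo* foos) {
    // Reclaims the storage without ever running ~Foo(): well-defined,
    // but leaks whatever resources the Foo objects own.
    delete[] reinterpret_cast<Slot*>(foos);
}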
Of course this is still a terrible design, may cause memory leaks or other problems depending on what Foo::~Foo() was supposed to do, but at least it isn't UB.

How do I force gcc to inline a function?

Does __attribute__((always_inline)) force a function to be inlined by gcc?
Yes.
From the GCC documentation (both v4.1.2 and the latest):
always_inline
Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function even if no optimization level was specified.
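A minimal illustration (the function names are hypothetical):

// Per the documentation quoted above, square() is inlined into f()
// even when compiling without optimization (-O0).
static inline __attribute__((always_inline)) int square(int x) {
    return x * x;
}

int f(int n) {
    return square(n);   // typically compiled as n * n, with no call emitted
}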
It should. I'm a big fan of manual inlining. Sure, used in excess it's a bad thing. But often, when optimizing code, there will be one or two functions that simply have to be inlined or performance goes down the toilet. And frankly, in my experience, C compilers typically do not inline those functions when merely using the inline keyword.
I'm perfectly willing to let the compiler inline most of my code for me. It's only those half dozen or so absolutely vital cases that I really care about. People say "compilers do a good job at this." I'd like to see proof of that, please. So far, I've never seen a C compiler inline a vital piece of code I told it to without using some sort of forced-inline syntax (__forceinline on MSVC, __attribute__((always_inline)) on gcc).
Yes, it will. That doesn't necessarily mean it's a good idea.
According to the gcc optimize options documentation, you can tune inlining with parameters:
-finline-limit=n
By default, GCC limits the size of functions that can be inlined. This flag
allows coarse control of this limit. n is the size of functions that can be
inlined in number of pseudo instructions.
Inlining is actually controlled by a number of parameters, which may be specified
individually by using --param name=value. The -finline-limit=n option sets some
of these parameters as follows:
max-inline-insns-single is set to n/2.
max-inline-insns-auto is set to n/2.
I suggest reading about all the inlining parameters in more detail, and setting them appropriately.
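For example, one might raise the limits like this (the values are purely illustrative):
gcc -O2 -finline-limit=1200 --param max-inline-insns-single=800 file.c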
I want to add here that I have a SIMD math library where inlining is absolutely critical for performance. Initially I marked all functions inline, but the disassembly showed that even for the most trivial operators the compiler would decide to actually call the function. Both MSVC and Clang showed this, with all optimization flags on.
I did as suggested in other posts on SO and added __forceinline for MSVC and __attribute__((always_inline)) for all other compilers. There was a consistent 25-35% improvement in performance in various tight loops, with operations ranging from basic multiplies to sines.
I didn't figure out why they had such a hard time inlining (perhaps templated code is harder?) but the bottom line is: there are very valid use cases for inlining manually and huge speedups to be gained.
If you're curious, this is where I implemented it: https://github.com/redorav/hlslpp
Yes. It will inline the function regardless of any other options set.
One can also use __always_inline. I have been using it for C++ member functions with GCC 4.8.1, but I could not find a good explanation of it in the GCC docs.
Actually the answer is "no". All it means is that the function is a candidate for inlining even with optimizations disabled.

Why doesn't Haskell have symbols (a la ruby) / atoms (a la erlang)?

The two languages where I have used symbols are Ruby and Erlang and I've always found them to be extremely useful.
Haskell does have algebraic datatypes, but I still think symbols would be mighty convenient. An immediate use that springs to mind: since symbols are isomorphic to integers, you can use them where you would use an integer or a string "primary key".
The syntactic sugar for atoms can be minor - :something or <something> is an atom. All atoms are instances of a Type called Atom which derives Show and Eq. You can then use it for more descriptive error codes, for example
type ErrorCode = Atom
type Message = String
data Error = Error ErrorCode Message
loginError = Error :redirect "Please login first"
In this case :redirect is more efficient than using a string ("redirect") and easier to understand than an integer (404).
The benefit may seem minor, but I say it is worth adding atoms as a language feature (or at least a GHC extension).
So why have symbols not been added to the language? Or am I thinking about this the wrong way?
I agree with camccann's answer that it's probably missing mainly because it would have to be baked quite deeply into the implementation and it is of too little use for this level of complication. In Erlang (and Prolog and Lisp) symbols (or atoms) usually serve as special markers and serve mostly the same notion as a constructor. In Lisp, the dynamic environment includes the compiler, so it's partly also a (useful) compiler concept leaking into the runtime.
The problem is the following: symbol interning is impure (it modifies the symbol table). Because we never modify an existing object it is still referentially transparent; however, if implemented naïvely it can lead to space leaks in the runtime. In fact, as currently implemented in Erlang, you can actually crash the VM by interning too many symbols/atoms (the current limit is 2^20, I think), because they can never be garbage collected. It's also difficult to implement in a concurrent setting without a huge lock around the symbol table.
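A language-agnostic sketch of the problem (written in C++, with made-up names): every lookup may mutate a global table, the table needs a lock, and entries are never removed - so they can never be collected.

#include <mutex>
#include <string>
#include <unordered_map>

int intern(const std::string& name) {
    static std::unordered_map<std::string, int> table;  // the symbol table
    static std::mutex m;
    std::lock_guard<std::mutex> lock(m);  // the lock around the symbol table
    auto it = table.find(name);
    if (it == table.end())
        it = table.emplace(name, static_cast<int>(table.size())).first;
    return it->second;                    // interned symbols compare as ints
}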
Both problems can be (and have been) solved, however. For example, see Erlang EEP 20. I use this technique in the simple-atom package. It uses unsafePerformIO under the hood, but only in (hopefully) rare cases. It could still use some help from the GC to perform an optimisation similar to indirection shortening. It also uses quite a few IORefs internally which isn't too great for performance and memory usage.
In summary, it can be done but implementing it properly is non-trivial. Compiler writers always weigh the power of a feature against its implementation and maintenance efforts, and it seems like first-class symbols lose out on this one.
I think the simplest answer is that, of the things Lisp-style symbols (which is where both Ruby and Erlang got the idea, I believe) are used for, in Haskell most are either:
Already done in some other fashion--e.g. a data type with a bunch of nullary constructors, which also behave as "convenient names for integers".
Awkward to fit in--things that exist at the level of language syntax instead of being regular data usually have more type information associated with them, but symbols would have to either be distinct types from each other (nearly useless without some sort of lightweight ad-hoc sum type) or all the same type (in which case they're barely different from just using strings).
Also, keep in mind that Haskell itself is actually a very, very small language. Very little is "baked in", and of the things that are most are just syntactic sugar for other primitives. This is a bit less true if you include a bunch of GHC extensions, but GHC with -XAndTheKitchenSinkToo is not the same language as Haskell proper.
Also, Haskell is very amenable to pseudo-syntax and metaprogramming, so there's a lot you can do even without having it built in. Particularly if you get into TH and scary type metaprogramming and whatever else.
So what it mostly comes down to is that most of the practical utility of symbols is already available from other features, and the stuff that isn't available would be more difficult to add than it's worth.
Atoms aren't provided by the language, but can be implemented reasonably as a library:
http://hackage.haskell.org/package/simple-atom
There are a few other libs on hackage, but this one looks the most recent and well-maintained.
Haskell uses type constructors* instead of symbols so that the set of symbols a function can take is closed, and can be reasoned about by the type system. You could add symbols to the language, but it would put you in the same place that using strings would - you'd have to check all possible symbols against the few with known meanings at runtime, add error handling all over the place, etc. It'd be a big workaround for all the compile-time checking.
The main difference between strings and symbols is interning - symbols are atomic and can be compared in constant time. Both, though, are types with an essentially infinite number of distinct values, which goes against the grain of Haskell's style of specifying arguments and results with finite types.
* I'm more familiar with OCaml than Haskell, so "type constructor" may not be the right term. I mean things like None or Just 3.
An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integral or a string "primary key".
Use Enum instead.
data FileType = GZipped | BZipped | Plain
    deriving Enum

descr ft = ["compressed with gzip",
            "compressed with bzip2",
            "uncompressed"] !! fromEnum ft
