Are there any research papers on formal treatment of RAII and/or safe deallocations in C++?
Take a look at "A Mechanized Semantics for C++ Object Construction and
Destruction, with Applications to Resource Management", which has apparently been submitted to POPL 2012 but, AFAIK, has not yet been peer reviewed.
There is a section specifically on RAII, although it may not prove what you want:
We cannot prove a general result guaranteeing the proper encapsulation
of resources in classes: this is a matter of program verification. We
can, however, prove that in a terminating program every construction
of a subobject is correctly matched by a destruction.
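For concreteness, here is a minimal RAII sketch (the FileHandle class and file name are illustrative, not taken from the paper) of the property in question: construction acquires a resource, and the matching destruction releases it on every terminating path, including exceptional ones.

    #include <cstdio>

    // Illustrative RAII wrapper: the constructor acquires the resource,
    // the destructor releases it on scope exit, normal or exceptional.
    struct FileHandle {
        explicit FileHandle(const char* path) : f_(std::fopen(path, "r")) {}
        ~FileHandle() { if (f_) std::fclose(f_); }
        FileHandle(const FileHandle&) = delete;
        FileHandle& operator=(const FileHandle&) = delete;
    private:
        std::FILE* f_;
    };

    int main() {
        FileHandle h("data.txt");  // construction: resource acquired
        // ... use h; even if this code throws ...
    }                              // destruction: resource released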
Disclaimer: I've only briefly skimmed the paper, and I know almost nothing about formal language semantics.
I was reading an answer over on Software Engineering SE about factory methods and their uses. That answer mentions "cargo cult" programming, which led me to a Wikipedia article, and from there I followed the link to the "deep magic" page here.
On this page I found a very interesting statement:
Any comment that has an effect on the code is magic.
So my question is this. How might one create a comment that has an effect on the code? Are there any examples of this in the wild or is this merely a postulation with no grounding in reality?
In a naïvely interpreted language (one without e.g. a bytecode-compilation stage), comments affect the interpreter's execution time, because reading the code involves I/O on the comments too. With sufficiently precise engineering of time-critical sections for a particular target architecture (or plain bad luck), you could get different execution paths, depending on the presence, absence or length of comments. Which is in itself a reason to stay away from purely interpreted languages… ah, MS-BASIC, how little I miss you!
I have been reading about formal verification, and the basic point is that it requires a formal specification and model to work with. However, many sources classify static analysis as a formal verification technique, and some mention abstract interpretation and its use in compilers.
So I am confused - how can these be formal verification if there is no formal description of the model?
EDIT: A source I found reads:
Static analysis: the abstract semantics is computed automatically from
the program text according to predefined abstractions (that can
sometimes be tailored automatically/manually by the user)
So does that mean it works just on the source code, with no need for a formal specification? That would match what static analysers do.
Also, is static analysis possible without formal verification? E.g. does SonarQube really apply formal methods?
In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.
How can these be formal verification if there is no formal description of the model?
A static analyser will generate control-flow and data-flow information for a piece of code, upon which formal methods can then be applied to verify conformance to the system's/unit's expected design model.
Note that modelling/formal specification is NOT a part of static analysis.
Combined, however, the two are useful in formal verification.
For example, suppose a system is modeled as a Finite State Machine (FSM) with
a pre-defined number of states, defined by a combination of specific values of certain member data, and
a pre-defined set of transitions between states, defined by the list of member functions.
The results of static analysis then help to formally verify that control NEVER flows along a path that is NOT present in the FSM model above.
Also, if a model can be defined simply in terms of type definitions, data flow, and control flow/call graphs, i.e. properties that a static analyser can compute and check, then static analysis by itself is sufficient to formally verify that the code conforms to such a model.
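As a hypothetical sketch of that encoding (the Connection class, its states and its transitions are invented for illustration), such a model might look like this; an analyser that builds the control flow/call graph can then check that no path performs a transition absent from the model:

    #include <cassert>

    // Hypothetical FSM in the style described: states are specific values
    // of member data, transitions are the member functions.
    class Connection {
    public:
        enum class State { Closed, Open, Error };  // pre-defined states

        void open()  { assert(state_ == State::Closed); state_ = State::Open;   }
        void close() { assert(state_ == State::Open);   state_ = State::Closed; }
        void fail()  { state_ = State::Error; }        // any state -> Error

    private:
        State state_ = State::Closed;  // member data defining the state
    };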
NOTE 1. Static analysers are also used to enforce things like coding guidelines and naming conventions, i.e. aspects of the code that cannot affect the program's behavior; that use lies outside formal verification.
NOTE 2. Formal verification, in turn, requires additional steps like 100% dynamic code coverage and the elimination of unused and dead code. These cannot be detected/enforced using a static analyser alone.
Static analysis is highly effective in verifying that a system/unit is implemented using a subset of the language specification to meet goals laid out in the system/unit design.
For example, if it is a design goal to prevent stack memory from exceeding a particular limit, then one could impose a limit on the depth of recursion (or forbid recursive function calls altogether). Static analysis can identify violations of such design goals, and in the absence of any warnings from the static analyser, the system/unit code stands formally verified against those design goals of its respective model.
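As an illustrative sketch (the factorial functions are invented for this example), an analyser that builds the call graph flags the recursive form, while the iterative rewrite has a statically bounded stack frame. (MISRA-C, mentioned below, has exactly such a checkable rule: Rule 17.2 forbids direct and indirect recursion.)

    // Deliberately recursive factorial: the call graph contains
    // factorial -> factorial, so a "no recursion" rule flags it;
    // worst-case stack depth depends on the input n.
    unsigned long long factorial(unsigned n) {
        return n <= 1 ? 1 : n * factorial(n - 1);   // flagged: recursion
    }

    // Iterative rewrite: one fixed-size stack frame, so stack usage
    // is statically bounded regardless of the input.
    unsigned long long factorial_iter(unsigned n) {
        unsigned long long r = 1;
        for (unsigned i = 2; i <= n; ++i) r *= i;
        return r;
    }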
E.g., the MISRA-C standard for automotive software defines a subset of C for use in automotive systems.
MISRA-C:2012 contains
143 rules, each of which is checkable using static program analysis, and
16 "directives", which are more open to interpretation or relate to process.
Static analysis just means "read the source code and possibly complain". (Contrast with "dynamic analysis", meaning "run the program and possibly complain about some execution behavior".)
There are lots of different types of possible static-analysis complaints.
One possible complaint might be,
Your source code does not provably satisfy a formal specification
This complaint would be based on formal verification if the static analyzer had a formal specification which it interpreted "formally", a formal interpretation of the source code, and a trusted theorem prover that could not find a proof of the desired property.
All the other kinds of complaints you might get from a static analyzer are pretty much heuristic opinions; that is, they are based on some informal interpretation of the code (or of the specification, if one even exists).
The "heavy duty" static analyzers such as Coverity etc. have pretty good program models, but they don't tell you that your code meets a specification (they don't even look to see if you have one). At best they only tell you that your code does something undefined according to the language ("dereference a null pointer") and even that complaint isn't always right.
So-called "style checkers" such as MISRA are also static analyzers, but their complaints are essentially "You used a construct that some committee decided was bad form". That's not actually a bug, it is pure opinion.
You can certainly classify static analysis as a kind of formal verification.
how can these be formal verification if there is no formal description of the model?
For static analysis tools, the model is implicit (or in some tools, partly implicit). For example, "a well-formed C++ program will not leak memory, and will not access memory that hasn't been initialized". These sorts of rules can be derived from the language specification, or from the coding standards of a particular project.
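For instance, here is a deliberately broken sketch (the function is invented for illustration) that violates that implicit model; an analyser can complain without the user writing any specification, because the "specification" follows from the language rules themselves:

    #include <cstdlib>

    // Deliberately broken: both defects below follow from language
    // rules alone, so an analyser needs no user-written spec to object.
    int sum_first(int n) {
        int* a = static_cast<int*>(std::malloc(n * sizeof(int)));
        int s = 0;
        for (int i = 0; i < n; ++i)
            s += a[i];   // flagged: read of uninitialized memory
        return s;        // flagged: 'a' is never freed (memory leak)
    }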
Any algorithm you can implement in a HLL you can implement in assembly. On the other hand, there are many algorithms you can implement in assembly which you cannot implement in a HLL. - Randall Hyde
I found this statement in the foreword to a book on assembly. The book is here: https://courses.engr.illinois.edu/ece390/books/artofasm/fwd/fwd.html#109
Does anyone know an example of this type of algorithm?
It's plain wrong.
You can implement any algorithm (in the CS sense of the word) in any Turing-complete programming language.
On the other hand, had he said something like "some algorithms can be implemented very efficiently, and with ease, in assembly, much more so than is possible in most high-level programming languages", his statement would have made sense...
Interesting text, though...
There is a sense in which it is trivially false: in the worst case, you could write an emulator in the HLL and then run the algorithm in there. But that's cheating a bit because now the HLL does not directly implement the algorithm.
A concrete example of what many HLL's can't do (or maybe they can in practice, but it is not guaranteed that they can do it), is directly implementing a XOR linked list. In many languages you just cannot XOR pointers, and/or it wouldn't make sense even if you could (consider garbage collection). Of course you can refer to every node by an integer ID and XOR those, but that's a workaround, not a direct implementation.
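For illustration, C and C++ can express the trick directly by round-tripping pointers through an integer type; a minimal sketch (and exactly the kind of thing a garbage-collected language could not trace):

    #include <cstdint>

    // XOR linked list node: one pointer-sized field stores prev XOR next,
    // which is enough for bidirectional traversal.
    struct Node {
        int            value;
        std::uintptr_t link;   // (uintptr_t)prev ^ (uintptr_t)next
    };

    // Knowing where we came from recovers where we go next:
    // prev ^ (prev ^ next) == next.
    Node* next_node(Node* prev, Node* cur) {
        return reinterpret_cast<Node*>(
            reinterpret_cast<std::uintptr_t>(prev) ^ cur->link);
    }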
HLL's often have trouble implementing unstructured control flow, though many (particularly older) languages offer a goto. That means you may have to jump through hoops to implement a state machine (using a switch in a loop or whatever), instead of letting the state be implied by the program counter.
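A minimal sketch of that workaround (the states are invented for illustration): the explicit state variable stands in for the program counter that unstructured jumps would otherwise encode.

    enum State { Start, Running, Done };

    void run() {
        State s = Start;
        while (s != Done) {
            switch (s) {
                case Start:   /* do setup work */    s = Running; break;
                case Running: /* do the main work */ s = Done;    break;
                case Done:                                        break;
            }
        }
    }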
There are also many algorithms and data structures that rely on operations that don't exist in typical HLL's, for example popcnt or lzcnt, which can again be emulated, but then so can everything.
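For example, a portable emulation of popcnt can look like the sketch below (C++20 also provides std::popcount, and a compiler may or may not collapse the loop into a single instruction):

    #include <cstdint>

    // Portable emulation of the popcnt instruction: clear the lowest
    // set bit until none remain.
    int popcount(std::uint64_t x) {
        int n = 0;
        while (x) {
            x &= x - 1;   // clears the lowest set bit
            ++n;
        }
        return n;
    }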
In case you have strict limitations in terms of memory and/or execution time, you might be forced to use assembly language.
High level languages typically require a run-time library which might be too big to fit into your program memory.
Think of a time-critical driver routine. An interrupt service routine for example. If there are only a few nanoseconds available for the routine, assembly language might be the only viable option.
How about this? You need to write some assembly code in order to set up system registers and tables, but once the setup is done, no CPU instructions are executed (everything is done by the CPU's complex exception-handling mechanisms), and yet the thing is Turing-complete and can "run" programs.
Are fewer checks/less rigorous code analysis required to provide development-environment error feedback and auto-completion for programming languages that are composed largely of human-readable phrases and words (e.g. Python, VB.NET)? This is in contrast to C-style languages, which depend more upon symbols and punctuation for code structure.
I have experience with, and am responsible for, building dozens of language front ends.
Wordy languages and punctuation-heavy languages are generally equally hard to parse and statically analyze.
The folks that define languages of either kind have either been decorating them for decades (e.g., COBOL since 1958) or building sophisticated languages (C++, Scala, Ruby) with both complex syntax and complex name-resolution and type-inference rules; compiler vendors then proceed to add obscure syntax to support the strange things they do, or to provide customer lock-in (e.g., MS "managed C++", DLL declarations, etc.). There's a third problem of lousy definitions: the top languages may have precise rules about how they work, but many languages have sloppy definitions (e.g., PHP), which creates dark corner cases that have to be ironed out by painful experimentation with the actual implementation.
C++ has been our worst, especially with the C++11 committee making a massive recent mess of things. We have full C++ parsers, but are still working on full name resolution for C++11, on top of our C++98 implementation. (The name-resolution code is some 250,000 lines of code, and it's not enough!)
IBM COBOL is a close second; the language is just giant, and there are all sorts of funny name-resolution rules ("an unqualified name can refer to a particular name without qualification if the reference is unambiguous". So, is this name an unambiguous reference in this context?).
Once you get past parsing and name/type resolution, you get into control flow, data flow, points-to analysis, range analysis, call-graph construction, ... which are generally about the same amount of effort as the earlier phases; we get away with less by having really good libraries that support these tasks.
With all this as background analysis, you can start to do "static analysis" of the smart kind that people want.
Another poster noted the challenge of recovering from syntax errors while (emphasis) "continuing to generate meaningful error messages". All I can say to this is "Amen, brother". See this SO answer https://stackoverflow.com/a/6657974/120163 for a discussion of what goes wrong when you have "partial programs", which is essentially what you get when a syntax-error repair guesses at a fix.
From what I've gathered about lock-free programming, it is incredibly hard to do right... and I agree.
Just thinking about some of the problems makes my head hurt. But what I wonder is: why isn't there widespread use of high-level wrappers around lock-free structures (e.g. a lock-free queue and similar)?
For example, Boost has no lock-free library, although one was proposed as far as I know.
I mean, I guess there are a lot of applications where you can't avoid the fact that the critical section is a big part of the load. So what are the reasons? Is it...
Patents - I heard that some stuff related to lock-free programming is patented.
Performance.
Google and Microsoft have internal libraries like that, but none of them are public...
Something else?
So my question is: why are high-level abstractions that use lock-free programming deep down not very popular, while at the same time "regular" multi-threaded programming is "in"?
EDIT: Boost got a lock-free library :)
There are few people who are familiar enough with the field to implement easy-to-use lock-free libraries. Of those few, even fewer publish their work for free, and of those, almost none do the vital additional work to make the library usable (e.g. publish full API docs). They tend to just release a zip file with code in it, which is almost useless. Then of course you also need to find a library written in the language you want to use, that compiles on the platform you're using, and finally, word of the library has to get out so people know it exists.
Patents are an issue, in that they limit what can be offered. There is, for example, to my knowledge no unpatented lock-free singly-linked list. All the skip-list stuff is heavily patented, too.
A hero in this field is Cliff Click, who came up with a lock-free hash, which he has more-or-less placed in the public domain.
You can find my lock-free library here: http://www.liblfds.org
Another is Samy Bahra's Concurrency Kit: http://www.concurrencykit.org
FYI, Microsoft's .NET Framework gained some lock-free classes in .NET 4.0, namely the container classes in the System.Collections.Concurrent namespace, which are:
ConcurrentDictionary
ConcurrentQueue
ConcurrentStack
I've looked into their implementation, and they are relatively fiddly/complex under the hood; they therefore represent a significant amount of design and testing effort (threading issues are, of course, notoriously difficult to test to a high standard).
You can take a look at the libcds C++ library. It is a collection of lock-free containers (stacks, queues, sets and maps) and safe memory reclamation algorithms.
IMHO, regarding C++ (I'm not advanced in other languages): the new C++ standard has just been released, and compiler developers need time to implement its requirements. Today, no compiler supports the C++11 memory model entirely, since it requires significant changes to a compiler's optimization rules. Recently, Microsoft announced support for the atomic operations, the basis of lock-free programming, in the VC++ 11 Developer Preview. That is good news for us. As far as I know, GCC is going to support it in 4.8 (or above).
The second problem is patents. Many interesting lock-free container algorithms are patented, which is a barrier to including them in vendors' libraries.
Third, a central ingredient of lock-free containers is garbage collection (safe memory reclamation). C++ is free of any built-in GC (fortunately). There are a few reclamation algorithms (Hazard Pointers, Pass-the-Buck, epoch-based schemes, and so on), but most of them are patented too.
Fourth, there are not enough tools to prove the correctness of the memory fences applied in a lock-free implementation. I currently know of only one: Relacy (http://www.1024cores.net/home/relacy-race-detector).
I think that in 2-3 years we'll see many production-ready, multi-platform C++ libraries of lock-free containers and algorithms, developed both by vendors and by enthusiasts.
However, in my opinion, our future is hardware transactional memory (HTM). Today AMD, Sun (sorry, Oracle) and Intel (?) are investigating HTM, with very interesting results. Let's wait.
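To make the C++11 point above concrete, here is a minimal sketch (an illustration, not production code) of the push side of a Treiber stack using std::atomic; pop is deliberately omitted, because reclaiming popped nodes safely is exactly the memory-reclamation problem described in the third point.

    #include <atomic>
    #include <utility>

    // Push side of a Treiber stack using C++11 atomics.
    template <typename T>
    class LockFreeStack {
        struct Node { T value; Node* next; };
        std::atomic<Node*> head_{nullptr};
    public:
        void push(T v) {
            Node* n = new Node{std::move(v), head_.load(std::memory_order_relaxed)};
            // On failure, compare_exchange_weak reloads head_ into
            // n->next, so the loop simply retries with the fresh head.
            while (!head_.compare_exchange_weak(n->next, n,
                                                std::memory_order_release,
                                                std::memory_order_relaxed)) {
            }
        }
    };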
There is at least one "lock-free" framework that is somewhat popular: Erlang.
One major problem is that unless one uses an excessive number of memory barriers, it's hard to be certain that one has enough; if one does use an excessive number of memory barriers, performance is likely to be inferior to what one would have gotten using locks.
The biggest problem with locks is not performance, but robustness. If a thread gets waylaid while it holds a lock, the system dies. By contrast, if a thread which is accessing a lock-free data structure gets waylaid, it won't affect other threads' use of that structure. In some situations, a lock-free data structure may be preferable to one using locks, even if its performance is inferior, because one must protect the system from being brought down by a malfunctioning thread. (For example, even if one was prepared to kill off a thread which hit a StackOverflowException without taking down the process, how would one protect against a thread putting a lot of stuff on its stack before calling a method that accesses a lock-protected data structure, such that the lock-guarded method itself hit a stack overflow?) If one uses lock-free data structures, such risks aren't a problem.