is static checker the same as static analyzer? - static-analysis

in some literatures the term "static checker" is used. for example https://dl.acm.org/doi/pdf/10.1145/2872362.2872364. I know what is static analyzer (we use PDG, CFG, AST and ... for analyzing a program before it runs). but what do they mean by static checker? is "static checker" the same as "static analyzer"?

This answer, perhaps, should only be a comment but...
TL;DR; Yes, they are the same thing.
Longer answer:
In software engineering, we love to overload terms - using the same term for several different (sometimes incompatible) meanings.
We also are often non-standard in our terminology... using multiple terms for (broadly) the same thing; eg the following are all (broadly) synonym pairs:
Static Analysis v Static/Source Code Analysis (although the former is, perhaps, broader in scope than just source-code, and can analyse data or models etc)
Static Analyser v Static Checker (although, it could be argued that the former does Static Analysis, the later just Static Code Analysis)
In most instances, I would suggest that the each pair of terms can be used interchangeably, although I would (personally) recommend the left hand term is the "best" to use; the right hand term is a sub-set of the full.
Typically when someone talks about Static (Code) Analysis, they mean the process of running a Static Analyser/Checker on a set of source files, to check for conformance with coding rules (eg MISRA) - this may also measure some metrics.
So, yes, I suggest that (at least in most cases) a Static Analyser is the same thing as a Static Checker.
--
Disclaimer: for the avoidance of doubt, this post offers my personal opinion, and this view does not necessary reflect that of my employer, LDRA Ltd, who produce such Static Analysis tools.

Related

Static analysis vs. symbolic execution in implementation

What is the difference between implementation of static analysis and symbolic execution?
I really like this slide by Julian Cohen's Contemporary Automatic Program Analysis talk. In a nutshell, people like to divide program analysis into two broad categories of static and dynamic analysis. But there is really a broad spectrum of program analysis techniques that range from static to dynamic and manual to fully automatic. Symbolic execution is an interesting technique that falls somewhere in between static and dynamic analysis and is generally applied as a fully automatic approach.
Static analysis is any off-line computation that inspects code and produces opinions about the code quality. You can apply this to source code, to virtual machine code for Java/C#/... virtual machine instruction sets, and even to binary object code. There is no "one" static analysis (although classic compiler control and dataflow often figure prominently as foundation machinery for SA); the term collectively applies to all types of mechanisms that might be used offline.
Symbolic execution is a specific kind of off-line computation that computes some approximation of what the program actually does by constructing formulas representing the program state at various points. It is called "symbolic" because the approximation is usually some kind of formula involving program variables and constraints on their values.
Static analysis may use symbolic execution and inspect the resulting formula. Or it may use some other technique (regular expressions, classic compiler flow analyses, ...) or some combination. But static analysis does not have to use symbolic execution.
Symbolic execution may be used just to show an expected symbolic result of a computation. That isn't static analysis by the above definition because there isn't any opinion formed about how good that result is. Or, the formula may be subjected to analysis, at which point it becomes part of a static analysis. As a practical matter, one may use other program analysis techniques to support symbolic execution ("this formula for variable is propagated to which reads of variable x?" is a question usually answered well by flow analysis).
You may insist that static analysis is any offline computation over your source code, at which point symbolic execution is just a special case. I don't find this definition helpful, because it doesn't discriminate between use cases well enough.

Is static analysis really formal verification?

I have been reading about formal verification and the basic point is that it requires a formal specification and model to work with. However, many sources classify static analysis as a formal verification technique, some mention abstract intepretation and mention its use in compilers.
So I am confused - how can these be formal verification if there is no formal description of the model?
EDIT: A source I found reads:
Static analysis: the abstract semantics is computed automatically from
the program text according to predefined abstractions (that can
sometimes be tailored automatically/manually by the user)
So does it mean it works just on the source code with no need for formal specification? This would be what static analysers do.
Also, is static analysis possible without formal verification? E.g. does SonarQube really perform formal methods?
In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.
How can these be formal verification if there is no formal description of the model?
A static analyser will generate control/data flow of a piece of code, upon which formal methods can then be applied to verify conformance to the system's/unit's expected design model.
Note that modelling/formal-specification is NOT a part of static-analysis.
However combined together, both of these tools are useful in formal verification.
For example if a system is modeled as a Finite State Machine (FSM) with
a pre-defined number of states
defined by a combination of specific values of certain member data.
a pre-defined set of transitions between various states
defined by the list of member functions.
Then the results of static analysis will help in formal verification of the fact that
the control NEVER flows along a path that is NOT present in the above FSM model.
Also, if a model can be simply defined in terms of type-definition, data-flow, control-flow/call-graph, i.e. code-metrics that a static-analyser can verify, then static-analysis itself is sufficient to formally verify that code conforms to such a model.
NOTE1. The yellow region above would be static analysers used to enforce stuff like coding-guidelines and naming-conventions i.e. aspects of code that cannot affect the program's behavior.
NOTE2. The red region above would be formal verification that requires additional steps like 100% dynamic code-coverage, elimination of unused and dead code. These cannot be detected/enforced using a static-analyser.
Static analysis is highly effective in verifying that a system/unit is implemented using a subset of the language specification to meet goals laid out in the system/unit design.
For example, if it is a design goal to prevent the stack memory from exceeding a particular limit, then one could apply a limit on the depth of recursion (or forbid recursive functions calls altogether). Static-analysis is used to identify such violations of design goals.
In the absence of any warnings from the static-analyser,
the system/unit code stands formally verified against such design-goals of its respective model.
eg. MISRA-C standard for Automotive software defines a subset of C for use in automotive systems.
MISRA-C:2012 contains
143 rules - each of which is checkable using static program analysis.
16 "directives" more open to interpretation, or relate to process.
Static analysis just means "read the source code and possibly complain". (Contrast to "dynamic analysis", meaning, "run the program and possibly complain about some execution behavior").
There are lots of different types of possible static-analysis complaints.
One possible complaint might be,
Your source code does not provably satisfy a formal specification
This complaint would be based on formal verification if the static analyzer had a formal specification which it interpreted "formally", a formal interpretation of the source code, and a trusted theorem prover that could not find an appropriate theorem.
All the other kinds of complaints you might get from a static analyzer are pretty much heuristic opinions, that is, they are based on some informal interpretation of the code (or specification if it indeed even exists).
The "heavy duty" static analyzers such as Coverity etc. have pretty good program models, but they don't tell you that your code meets a specification (they don't even look to see if you have one). At best they only tell you that your code does something undefined according to the language ("dereference a null pointer") and even that complaint isn't always right.
So-called "style checkers" such as MISRA are also static analyzers, but their complaints are essentially "You used a construct that some committee decided was bad form". That's not actually a bug, it is pure opinion.
You can certainly classify static analysis as a kind of formal verification.
how can these be formal verification if there is no formal description of the model?
For static analysis tools, the model is implicit (or in some tools, partly implicit). For example, "a well-formed C++ program will not leak memory, and will not access memory that hasn't been initialized". These sorts of rules can be derived from the language specification, or from the coding standards of a particular project.

What levels should static analyzers analyze?

I've noticed that some static analyzers operate on source code, while others operate on bytecode (e.g., FindBugs). I'm sure there are even some that work on object code.
My question is a simple one, what are the advantages and disadvantages of writing different kinds of static analyzers for different levels of analysis?
Under "static analyzers" I'm including linters, bug finders, and even full-blown verifiers.
And by levels of analysis I would include source code, high-level IRs, low-level IRs, bytecode, object code, and compiler plugins that have access to all phases.
These different facets can influence the level at which an analyzer may decide to work:
Designing a static analyzer is a lot of work. It would be a shame not to factor this work for several languages compiled to the same bytecode, especially when the bytecode retains most of the structure of the source program: Java (FindBugs), .NET (various tools related to Code Contracts). In some cases, the common target language was made up for the purpose of analysis although the compilation scheme wasn't following this path.
Related to 1, you may hope that your static analyzer will be a little less costly to write if it works on a normalized version of the program with a minimum number of constructs. When authoring static analyzers, having to write the treatment for repeat until when you have already written while do is a bother. You may structure your analyzer so that several functions are shared for these two cases, but the care-free way to handle this is to translate one to the other, or to translate the source to an intermediate language that only has one of them.
On the other hand as already pointed out in Flash Sheridan's answer, source code contains the most information. For instance, in languages with fuzzy semantics, bugs at the source level may be removed by compilation. C and C++ have numerous "undefined behaviors" where the compiler is allowed to do anything, including generating a program that works accidentally. Fine, you might think, if the bug is not in the executable it's not a problematic bug. But when you ever re-compile the program for another architecture or with the next version of the compiler, the bug may appear again. This is one reason for not doing the analysis after any phase that might potentially remove bugs.
Some properties can only be checked with reasonable precision on compiled code. That includes absence of compiler-introduced bugs as pointed out again by Flash Sheridan, but also worst-case execution time. Similarly, many languages do not let you know what floating-point code does precisely unless you look at the assembly generated by the compiler (this is because existing hardware does not make it convenient for them to guarantee more). The choice is then to write an imprecise source-level analyzer that takes into account all possibilities, or to analyze precisely one particular compilation of a floating-point program, as long as it is understood that it is that precise assembly code that will be executed.
Source code analysis is the most generally useful, of course; sometimes heuristics even need to analyze comments or formatting. But you’re right that even object code analysis can be necessary, e.g., to detect bugs introduced by GCC misfeatures. Thomas Reps, head of GrammaTech and a Wisconsin professor, gave a good talk on this at Stanford a couple of years ago: http://pages.cs.wisc.edu/~reps/#TOPLAS-WYSINWYX.

What are the features of dynamic languages (like Ruby or Clojure) which you are missing in Scala?

What do you lose in practice when you choose a statically-typed language such as Scala (or F#, Haskell, C#) instead of dynamically-typed ones like Ruby, Python, Clojure, Groovy (which have macros or runtime metaprogramming capabilities)? Please consider best statically-typed languages and best (in your opinion) dynamically-typed languages, not the worst ones.
Answers Summary:
Key advantages of dynamic languages like Ruby over statically-typed language like Scala IMHO are:
Quick edit-run cycle (does JavaRebel reduces the gap?)
Currently community of Scala/Lift is much smaller then of Ruby/Rails or Python/Django
Possible to modify type definitions (though motivation or need for that is not very clear)
In principle, you give up being able to ignore what type you're using when it is not clear (in the static context) what the right thing to do is, and that's about it.
Since complex type-checking can be rather time-consuming, you also probably are forced to give up fast on-line metaprogramming.
In practice, with Scala, you give up very little else--and nothing that I particularly care about. You can't inject new methods, but you can compile and run new code. You do have to specify types in function arguments (and the return type with recursive functions), which is slightly annoying if you never make type errors yourself. Since it compiles each command, the Scala REPL isn't as snappy as e.g. the Python shell. And since it uses Java reflection mechanisms, you don't have quite the ease of online inspection that you do with e.g. Python (not without building your own inspection library, anyway).
The choice of which static or dynamic language is more significant than the static/dynamic choice itself. Some dynamic languages have good performance and good tools. Some static languages can be concise, expressive, and incremental. Some languages have few of these qualities, but do have large libraries of proven code.
Dynamic languages tend to have much more flexible type systems. For example, Python lets you inject a new method into an existing classes, or even into a single object.
Many (not all) static languages lack the facility to construct complex literals. For instance, languages like C# and Java cannot easily mimic the following JavaScript { 'request':{'type':'GET', 'path':mypath}, 'oncomplete':function(response) { alert(response.result) } }.
Dynamic languages have very fluid semantics. Python allows import statements, function definitions and class definitions to appear inside functions and if statements.
eval is a staple of most dynamic languages and few static languages.
Higher order programming is easier (in my subjective opinion) in dynamic languages than static languages, due to the awkwardness of having to fully specify the types of function parameters.
This is particulary so with recursive HOP constructs where the type system can really get in the way.
Dynamic language users don't have to deal with covariance and contravariance.
Generic programming comes practically free in dynamic languages.
I'm not sure if you lose anything but simplicity. Static type systems are an additional burden to learn.
I suppose you usually also lose eval, but I never use it, even in dynamic languages.
I find the issue is much more about everything else when it comes to choosing which language to use for a given task. Tooling, culture, libraries are all much more interesting than typing when it comes to solving a problem with a language.
Programming language research, on the other hand, is completely different. :)
Some criticism of Scala has been expressed by Steve Yegge here and here, and by Guido van Rossum, who mainly attacked Scala's type system complexity. They clearly aren't "Scala programmers" though. On the other hand, here's some praise from James Strachan.
My 2 cents...
IMO (strong) statically-typed languages might reduce the amount of necessary testing code, because some of that work will be done by the compiler. On the other hand, if the compiling step is relatively long, it makes it more difficult to do "incremental-style" programming, which in the real life might result in error-prone code that was only tested to pass the compiler.
On the other hand, dynamically-typed languages feel like there is less threshold to change things, that might reduce the responding time from the point of bug-fixing and improvement, and as a result might provide a smoother curve during application development: handling constant flow of small changes is easier/less risky than handling changes which are coming in bug chunks.
For example, for the project where the design is very unclear and is supposed to change often, it might have been easier to use dynamic language than a static one, if it helps reduce interdependencies between different parts. (I don't insist on that one though:) )
I think Scala sits somewhere in between (e.g. you don't have to explicitly specify types of the variables, which might ease up code maintenance in comparison with e.g. C++, but if you end up with the wrong assumption about types, the compiler will remind about it, unlike in PHP where you can write whatever and if you don't have good tests covering the functionality, you are doomed to find it out when everything is live and bleeding). Might be terribly wrong of course :)
In my opinion, the difference between the static and dynamic typing comes down to the style of coding. Although there is structural types in Scala, most of the time the programmer is thinking in terms of the type of the object including cool gadgets like trait. On the other hand, I think Python/Javascript/Ruby programmers think in terms of prototype of the object (list of methods and properties), which is slightly different from types.
For example, suppose there's a family of classes called Vehicle whose subclasses include Plane, Train, and Automobile; and another family of classes called Animal whose subclasses include Cat, Dog, and Horse. A Scala programmer would probably create a trait called Transportation or something which has
def ride: SomeResult
def ride(rider: Someone): SomeResult
as a member, so she can handle both Train and Horse as a means of transportation. A Python programmer would just pass the train object without additional code. At the run time the language figures out that the object supports ride.
The fact that the method invocations are resolved at the runtime allows languages like Python and Ruby to have libraries that redefines the meaning of properties or methods. A good example of that is O/R mapping or XML data binding, in which undefined property name is interpreted to be the field name in a table/XML type. I think this is what people mean by "flexibility."
In my very limited experience of using dynamic languages, I think it's faster coding in them as long as you don't make mistakes. And probably as you or your coworkers get good at coding in dynamic language, they would make less mistakes or start writing more unit tests (good luck). In my limited experience, it took me very long to find simple errors in dynamic languages that Scala can catch in a second. Also having all types at compile time makes refactoring easier.

Are dynamic languages slower than static languages? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Are dynamic languages slower than static languages because, for example, the run-time has to check the type consistently?
No.
Dynamic languages are not slower than static languages. In fact, it is impossible for any language, dynamic or not, to be slower than another language (or faster, for that matter), simply because a language is just a bunch of abstract mathematical rules. You cannot execute a bunch of abstract mathematical rules, therefore they cannot ever be slow(er) or fast(er).
The statement that "dynamic languages are slower than static languages" is not only wrong, it doesn't even make sense. If English were a typed language, that statement wouldn't even typecheck.
In order for a language to even be able to run, it has to be implemented first. Now you can measure performance, but you aren't measuring the performance of the language, you are measuring the performance of the execution engine. Most languages have many different execution engines, with very different performance characteristics. For C, for example, the difference between the fastest and slowest implementations is a factor of 100000 or so!
Also, you cannot really measure the performance of an execution engine, either: you have to write some code to run on that exection engine first. But now you aren't measuring the performance of the execution engine, you are measuring the performance of the benchmark code. Which has very little to do with the performance of the execution engine and certainly nothing to do with the performance of the language.
In general, running well-designed code on well-designed high-performance execution engines will yield about the same performance, independent of whether the language is static or dynamic, procedural, object-oriented or functional, imperative or declarative, lazy or strict, pure or impure.
In fact, I would propose that the performance of a system is solely dependent on the amount of money that was spent making it fast, and completely independent of any particular typing discipline, programming paradigm or language.
Take for example Smalltalk, Lisp, Java and C++. All of them are, or have at one point been, the language of choice for high-performance code. All of them have huge amounts of engineering and research man-centuries expended on them to make them fast. All of them have highly-tuned proprietary commercial high-performance execution engines available. Given roughly the same problem, implemented by roughly comparable developers, they all perform roughly the same.
Two of those languages are dynamic, two are static. Java is interesting, because although it is a static language, most modern high-performance implementations are actually dynamic implementations. (In fact, several modern high-performance JVMs are actually either Smalltalk VMs in disguise, derived from Smalltalk VMs or written by Smalltalk VM companies.) Lisp is also interesting, because although it is a dynamic language, there are some (although not many) static high-performance implementations.
And we haven't even begun talking about the rest of the execution environment: modern mainstream operating systems, mainstream CPUs and mainstream hardware architectures are heavily biased towards static languages, to the point of being actively hostile for dynamic languages. Given that modern mainstream execution environments are pretty much of a worst-case scenario for dynamic languages, it is quite astonishing how well they actually perform and one can only imagine what the performance in a less hostile environment would look like.
All other things being equal, usually, yes.
First you must clarify whether you consider
dynamic typing vs. static typing or
statically compiled languaged vs. interpreted languages vs. bytecode JIT.
Usually we mean
dynamc language = dynamic typing + interpreted at run-time and
static languages = static typing + statically compiled
, but it's not necessary the case.
Type information can help the VM dispatch the message faster than witout type information, but the difference tend to disappear with optimization in the VM which detect monomorphic call sites. See the paragraph "performance consideration" in this post about dynamic invokation.
The debates between compiled vs. interpreted vs. byte-code JIT is still open. Some argue that bytecode JIT results in faster execution than regular compilation because the compilation is more accurate due to the presence of more information collected at run-time. Read the wikipedia entry about JIT for more insight. Interpreted language are indeed slower than any of the two forms or compilation.
I will not argue further, and start a heated discussion, I just wanted to point out that the gap between both tend to get smaller and smaller. Chances are that the performance problem that you might face will not be related to the language and VM but because of your design.
EDIT
If you want numbers, I suggest you look at the The Computer Language Benchmarks. I found it insightful.
At the instruction level current implementations of dynamically typed languages are typically slower than current implementations of statically typed languages.
However that does not necessarily mean that the implementation of a program will be slower in dynamic languages - there are lots of documented cases of the same program being implemented in both a static and dynamic language and the dynamic implementation has turned out to be faster. For example this study (PDF) gave the same problem to programmers in a variety of languages and compared the result. The mean runtime for the Python and Perl implementations were faster than the mean runtime for the C++ and Java implementations.
There are several reasons for this:
1) the code can be implemented more quickly in a dynamic language, leaving more time for optimisation.
2) high level data structures (maps, sets etc) are a core part of most dynamic languages and so are more likely to be used. Since they are core to the language they tend to be highly optimised.
3) programmer skill is more important than language speed - an inexperienced programmer can write slow code in any language. In the study mentioned above there were several orders of magnitude difference between the fastest and slowest implementation in each of the languages.
4) in many problem domains execution speed it dominated by I/O or some other factor external to the language.
5) Algorithm choice can dwarf language choice. In the book "More Programming Pearls" Jon Bentley implemented two algorithms for a problem - one was O(N^3) and implemented in optimised fortran on a Cray1. The other was O(N) and implemented in BASIC on a TRS80 home micro (this was in the 1980s). The TRS80 outperformed the Cray 1 for N > 5000.
Dynamic language run-times only need to check the type occasionally.
But it is still, typically, slower.
There are people making good claims that such performance gaps are attackable, however; e.g. http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html
Themost important factor is to consider the method dispatch algorithm. With static languages each method is typically allocated an index. THe names we see in source are not actually used at runtime and are in source for readaility purposes. Naturally languages like java keep them and make them available in reflection but in terms of when one invokes a method they are not used. I will leave reflection and binding out of this discussion. This means when a method is invoked the runtmne simply uses the offset to lookup a table and call. A dynamic language on the other hand uses the name of the function to lookup a map and then calls said function. A hashmap is always going to be slower than using an index lookup into an array.
No, dynamic languages are not necessarily slower than static languages.
The pypy and psyco projects have been making a lot of progress on building JIT compilers for python that have data-driven compilation; in other words, they will automatically compile versions of frequently called functions specialised for particular common values of arguments. Not just by type, like a C++ template, but actual argument values; say an argument is usually zero, or None, then there will be a specifically compiled version of the function for that value.
This can lead to compiled code that is faster than you'd get out of a C++ compiler, and since it is doing this at runtime, it can discover optimisations specifically for the actual input data for this particular instance of the program.
Reasonable to assume as more things need to be computed in runtime.
Actually, it's difficult to say because many of the benchmarks used are not that representative. And with more sophisticated execution environments, like HotSpot JVM, differences are getting less and less relevant. Take a look at following article:
Java theory and practice: Dynamic compilation and performance measurement

Resources