How to build a static code analysis tool?

How to build a static code analysis tool? - static-analysis

I m in process of understanding and building a static code analysis tool for a proprietary language from a big company. Reason for doing this , I have to review a rather large code base , and a static code analysis would help a lot and they do not have one for the language so far.
I would like to know how does one go about building a static code analysis tool , for e.g. Lint or SpLint for C.
Any books, articles , blogs , sites..etc would help.
Thanks.

I know this is an old post, but the answers don't really seem that satisfactory. This article is a pretty good introduction to the technology behind the static analysis tools, and has several links to examples.
A good book is "Secure Programming with Static Analysis" by Brian Chest and Jacob West.

You need good infrastructrure, such as a parser, a tree builder, tree analyzers, symbol table builders, flow analyzers, and then to get on with your specific task you need to code specific checks for the specific problems of interest to you, using all the infrastructure machinery.
Building all that foundation machinery is actually pretty hard, and it doesn't help you do your specific task. People don't write the operating system for every application they code; why should you build all the infrastructure? Like an OS, it is better if you simply acquire good infrastructure.
People will tell you to lex and yacc. That's kind of like suggesting you use the real time keneral part of the OS; useful, but far from all the infrastructure you really need.
Our DMS Software Reengineering Toolkit provides all the necessary infracture. It has been used to define many language front ends as well as
many tools for such languages.
Such infrastructure would allow you to define your specific nonstandard language relatively quickly, and then get on with your task of coding your special checks.

There is a blog by DeepSource that covers everything one needs to know to build an understanding of static code analysis and equip you with the basic theory and the right tools so that you can write analyzers on your own.
Here’s the link: https://deepsource.io/blog/introduction-static-code-analysis/

Obviously you need a parser for the language. A good high level AST is useful.
You need to enumerate a set of "mistakes" in the language. Without knowing more about the language in question, we can't help here. Examples: unallocated pointers in C, etc.
Combine the AST with the mistakes in #2.

Related

What is the technique behind the code generation feature of UML tools

I am an engineering student, and deciding upon my final year project.
One of the many candidates is an online UML tool with code generation facilities. But I did not take compiler designing classes, so I am not much aware of the code generation techniques.
I want to know about the techniques that I should look to study in order to build something like this. If these techniques are as complicated as writing a compiler, then perhaps I will have to abandon this idea.

Compilation is really the opposite of the kind of code generation you are describing, so I don't think you need to know how to write a compiler.
Code generation can be as simple as combining text strings or using templates, or as complex as using Reflection.Emit to create classes at runtime.
I would start with this Wikipedia article.

The creation of an UML tool is a long term project. You need many to acquire different expertises which can not be known by just one member of the team.
Your academic project is too ambitious.
An easy project which has never been done is to generate code from an activity or state diagram. You should not try to recreate the graphical editor because this is very very complex but only to take the xmi export and generate code from it using a xml parser. This would be a good 6 months project for your thesis :-)

Most UML tools generate source code. The generation is normally quite a bit simpler than a compiler as well. For example, a class diagram will have a collection of data structures representing classes and links between those classes (inheritance). To generate output, you walk through the class objects, and for each you "print" out a representation of that object in the syntax of the target language.

I'm not sure exactly what capabilities your code generation will require, but the UML tools that I have used are not very sophisticated in their code generation.
Tools that I have used simply create files and drop your function names into them with arguments derived from the inputs. This would not require any understanding of compilers. Most of the difficulty would be in the user interface and how you store the data to make code generation easy.

You can just find that here:
http://yuml.me and http://askuml.com

Good examples when teaching refactoring?

I'm running a refactoring code dojo for some coworkers who asked how refactoring and patterns go together, and I need a sample code base. Anyone know of a good starting point that isn't to horrible they can't make heads or tails of the code, but can rewrite their way to something useful?

I would actually suggesting refactoring some of your and your coworkers' code.
There are always places that an existing codebase can be refactored, and the familiarity with the existing code will help make it feel more like a useful thing and less like an exercise. Find something in your company's code to use as an example, if possible.

Here are some codes, both the original and the refactored version, so you can prepare your kata or simply compare the results once the refactoring is performed:
My books have both shorter examples and a longer, actually a book long example. Code is free to download.
VB Code Examples
C# Code Examples
A nice example from Refactoring Workbook
There are a lot of examples on the internet of simple games like Tic-Tac-Toe or Snake that have a lot of smells but are simple enough to start with refactoring.

The first chapter in Martin Fowler "Refactoring" is a good starting point to refactoring. I understood most of the concepts when one of my teachers at school used this example.

What is the general knowledge level of your coworkers?
Something basic as code duplication should be easy to wrap their heads around. Two pieces of (nearly) identical code that can be refactored into a reusable method, class, whatever. Using a (past) example from your own codebase would be good.

I would recommend you to develop a simple example project for a specific requirement.
Then you add one more requirement and make changes to the existing classes . You keep on doing this and show them how you are finding it difficult to make each change when the code is not designed properly. This will make them realize easily because, this is what those ppl will be doing in their day to day work. Make them realize that , if patterns and principles are not followed from beginning, how are they going to end up in mess at the end.
When they realize that,then you start from scratch or refactor the existing messed up code .Now add a requirement and make them realize that it is easy to make a change in the refactored code, so that you need to test only a few classes. One change would not affect others and so on.
You could use the computer ,keyboard and printer class as an example. Add requirements like, you will be wanting the computer to read from mouse , then one more requirement can be like your computer would want to save it in hard disk than printing. Finally your refactored code should be like, your computer class should depend on abstract input device class and output device class. And your keyboard class should inherit from Inputdevice class.

Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin considers refactoring.

I'm loving Refactoring Guru examples.
In there you can find design patterns examples too.

Refactoring is non-functional requirement when code perform correct functionality for which it is designed however difficult to debug, requires more effort to maintain and some performance bottleneck. Refactoring is to change to be easily maintainable, good readability and improve efficiency.
Thus we need to focus on criteria to make code more readable, easy to maintain.
Its obvious that having very large method/function might be difficult to understand.
Class depends on other hundreds of class make thing worst while debugging.
Code should be readable just like reading some workflow.
You can also use tools like sonar which can help you to identify critical criteria such as "Cyclomatic Complexity"
http://www.sonarsource.org/managing-cyclomatic-complexity-to-increase-maintainability/
You ask them to write code them self and check how tool does refactoring.
Apart from that, you can write code in eclipse and there is option available which does refactoring for you...

It's a bit dated (2003), but IBM has several refactoring examples (that work[ed?] in Eclipse) at http://www.ibm.com/developerworks/library/os-ecref/

What tools for migrating programs from a platform A to B

As a pet project, I was thinking about writing a program to migrate applications written in a language A into a language B.
A and B would be object-oriented languages. I suppose it is a very hard task : mapping language constructs that are alike is doable, but mapping libraries concepts will be a very long task.
I was wondering what tools to use, I know this has to do with compilation, but I'm a bit afraid to use Lex and Yacc and all that stuff.
I was thinking of maybe using the Eclipse Modeling Framework, which would help me write models (of application code) transformations in a readable form.
But first I would have to write parsers for creating the models (and also create the metamodel from the language grammar).
Are there tools that exist that would make my task easier?

You can use special transformation tools/languages for that TXL or Stratego/XT.
Also you can have a look and easily try Java to Python and Java to Tcl migrating projects made by me with TXL.
You are right about mapping library concepts. It is rather hard and long task. There are two ways here:
Fully migrate the class library from language A to B
Migrate classes/functions from language A to the corresponding concepts in language B
The approach you will choose depends on your goals and time/resources available. Also in many cases you wont be doing a general A->B migration which will cover all possible cases, you will need just to convert some project/library/etc. so you will see in your particular cases what is better to do with classes/libraries.

I think this is almost impossibly hard, especially as a personal project. But if you are going to do it, don't make life even more difficult for yourself by trying to come up with a general solution. Choose two specific real-life programming languages ind investigate the possibities of converting between them. I think you will be shocked by the number of problems and issues this will expose.

There are some tools for direct migration for some combinations of A and B.
There are a variety of reverse engineering and code generation tools for different languages and platforms. It's fairly rare to see reverse engineering tools which capture all the semantics of the source language, and the semantics of UML are not well defined ( since it's designed to map to different implementation languages, it itself doesn't define a complete execution model for its behavioural representations ), so you're unlikely to be able to reverse engineer and generate code between tools. You may find one tool that does full reverse engineering and full code generation for your A and B languages, and so may be able to get somewhere.
In general you don't use the same idioms on different platforms, so you're more likely to get something which emulates A code on B rather than something which corresponds to a native B solution.

If you want to use Java as the source language(that language you try to convert) than you might use Checkstyle AST(its used to write Rules). It gives you tree structure with every operation done in the source code. This will be much more easier than writing your own paser or using regex.
You can run com.puppycrawl.tools.checkstyle.gui.Main from checkstyle-4.4.jar to launch Swing GUI that parse Java Source Code.

Based on your comment
I'm not sure yet, but I think the source language/framework would be Java/Swing and the target some RIA language like Flex or a Javascript/Ajax framework. – Alain Michel 3 hours ago
Google Web Toolkit might be worth a look.

See this answer: What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

Code Length in IDE ( w/o modeling support ) versus Code Efficiecy in Compilation in Delphi

So - highly hypothetical question and more like discussion about your coding style and practice you use daily.
I will take as example: CodeGear RAD Studio 2009 (sorry to all D7 fans, but Unicode rules).
I have capability to expand/collapse functions/procedures/records and few other complex data structures, but what if code is lengthy?
What makes the task and its accomplishment efficient - the time required to add comments (its req actually) and expand/collapse necessary area or use OMT offered possibilities?
To give example input from myself - I have small app, about 1,5k lines and I do not use Modeling. Is it smart enough or do I lose a lot of time if I need to find some simple references or (event) calls?

If I understand your question correctly, it is a bout finding your way into code (yours or someone elses').
I use Model Maker Code Explorer for browsing through source code (and for refactoring existing code, and creating new code). At EUR 99, it is dead cheap for what it does.
It usually gives me a perfect overview of what I need, and has a nice 'search' interface as well.
If I need more complex searches, I usually use the GExperts (grep) search function: it is blazingly fast, and with good naming of your identifiers, it is usually a breeze to find stuff.

If I understand your question correctly, you want to know what is more efficient:
Use comments and expandable sections.
Use moddeling techniques.
I think it depends on personal style. Modeling can be great, but has dangers of spending too much time creating nice pictures.
We have a large app 500k+ lines. We do not use collapsable sections because we keep our file size acceptable and we have a good file organisation structure. We sometimes use modeling if complex parts are added (class diagrams and state diagrams). And we use lots of comment to explain difficult parts.

If you have Delphi 2009 you can use also the Delphi Class Explorer (in the View menu) in order to see your classes. It seems a little bit cryptic but only for the first 5 minutes. After this you will get used with it.
Also you can use CnPack a very impressive package in order to help you manage your project. Basically, in the IDE appears a new menu called 'CnPack' which has a bunch of wizards to help you find the way out in the source. Some examples:
Uses Cleaner
Procedure List (it gives you the incremental search capability for your procedures - very neat)
Bookmark Browser
etc.

Do you have coding standards? If so, how are they enforced? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I've worked on a couple of projects where we spent a great deal of time discussing and writing elaborate coding standards covering everything from syntax layout to actual best practices. However, I have also found that these are rarely followed to the full extent. Many developers seem to hesitate to reject a code review based on coding standard violations alone. I.e. violations are committed to the repository on a regular basis.
My questions are: Do you have coding standards? What do they cover? Are they followed by everyone? And what do you do (if anything) to make sure everybody is following the standards?
I'm aware that there is a similar question here, but my concern is not so much how you could do it, but how you are actually going about it and what are the perceived benefits?

I've worked in places with barely-followed coding practices, and others where they're close to being enforced - or at least easily checked.
A few suggestions:
The most important thing is to get buy-in to the idea that consistency trumps your personal preferred style. There should be discussion of the coding standard both before and after it's instituted, but no-one should be allowed to just opt out of it.
Code reviews should be mandatory, with the checkin comment including the username of the reviewer. If you're using a suitably powerful SCM, consider not allowing checkins which don't have a valid reviewer name.
There should be a document which everyone knows about laying out the coding standards. With enough detail, you shouldn't get too much in the way of arguments.
Where possible, automate checking of the conventions (via Lint, CheckStyle, FXCop etc) so it's easy for both the committer and the reviewer to get a quick check of things like ordering import/using directives, whitespace etc.
The benefits are:
Primarily consistency - if you make it so that anyone can feel "at home" in any part of the codebase at any time, it gives you more flexibility.
Spreading best practice - if you ban public fields, mutable structs etc then no-one can accidentally plant a time bomb in your code. (At least, not a time bomb that's covered by the standard. There's no coding standard for perfect code, of course :)
EDIT: I should point out that coding standards are probably most important when working in large companies. I believe they help even in small companies, but there's probably less need of process around the standard at that point. It helps when all the developers know each other personally and are all co-located.

Do you have coding standards?
Yes, differs from project to project.
What does it cover?
Code(class, variable, method, constant), SQL naming and formatting convention
Is it being followed by everyone?
Yes, every new entrant in project could be asked to create a demo project following organization coding convention then it gets reviewed. This exercise makes developer feel at ease before starting real job.
And what do you do (if anything) to make sure everybody is following the standard?
Use StyleCop and FxCop to ensure they are religiously followed. It would show up as warning/error if code fails to comply with organization coding convention.
Visual Studio Team system has nice code anlysis and check-In policies which would prevent developers checking in code that does not comply
Hope, it helps
Thanks,
Maulik Modi

We take use of the Eclipse's save actions and formatters. We do have a suggested standard, but nobody is actually enforcing it, so there are some variations on what is actually formatted, and how.
This is something of a nuisance (for me), as various whitespace variations are committed as updates to the SVN repository...

StyleCop does a good job of enforcing coding layout practices and you can write custom rules for it if something isn't covered in the base rules that is important to you.

I think coding standards are very important. There is nothing more frustrating than trying to find the differences between two revisions of a file only to find that the whole file has been changed by someone who reformatted it all. And I know someone is going to say that that sort of practice should be stamped out, but most IDEs have a 'reformat file' feature (Ctrl-K Ctrl-D in Visual Studio, for example), which makes keeping your code layed out nicely much easier.
I've seen projects fail through lack of coding standards - the curly-brace wars at my last company were contributary to a breakdown in the team.
I've found the best coding standards are not the standards made up by someone in the team. I implemented the standards created by iDesign (click here) in our team, which gets you away from any kind of resentment you might get if you try to implement your own 'standard'.
A quick mention of Code Style Enforcer (click here) which is pretty good for highlighting non-compliance in Visual Studio.

I have a combination of personal and company coding standards that are still evolving to some extent. They cover code structure, testing, and various documents describing various bits of functionality.
As it evolves, it is being adopted and interpreted by the rest of my team. Part of what will happen ultimately is that if there is concensus on some parts then those will hold up while other parts may just remain code that isn't necessarily going to be up to snuff.
I think there may be some respect or professional admiration that act as a way of getting people to follow the coding standards where some parts of it become clear after it is applied, e.g. refactoring a function to be more readable or adding tests to some form, with various "light bulb moments" to borrow a phrase from Oprah.
Another part of the benefit is to see how well do others work, what kinds of tips and techniques do they have and how can one improve over time to be a better developer.

I think the best way to look at coding standards is in terms of what you hope to achieve by applying, and the damage that they can cause if mis-applied. For example, I see the following as quite good;
Document and provide unit tests that illustrate all typical scenarios for usage of a given interface to a given routine or module.
Where possible use the following container classes libraries, etc...
Use asserts to validate incoming parameters and results returned (C & C++)
Minimise scope of all variables
Access object members through methods
Use new and delete over malloc and free
Use the prescribed naming conventions
I don't think that enforcing style beyond this is a great idea, as different programmers are efficient using differing styles. Forcing programmers to change style can be counter productive and lead to lost time and reduced quality. Standards should be kept short and easy to understand.

Oh yes, I'm the coding standard police :) I just wrote a simple script to periodically check and fix the code (my coding standard is simple enough to implement that.) I hope people will get the message after seeing all these "coding convention cleanups" messages :)

We have a kind of 'loose' standard. Maybe because of our inability to have agreement upon some of those 'how many spaces to put there and there', 'where to put my open brace, after the statement or on the next line'.
However, as we have main developers for each of the dedicated modules or components, and some additional developers that may work in those modules, we have the following main rule:
"Uphold the style used by the main developer"
So if he wants to do 3 space-indentation, do it yourself also.
It's not ideal as it might require retune your editor settings, but it keeps the peace :-)

Do you have coding standards?
What does it cover?
Yes, it has naming conventions, mandatory braces after if, while ... , no warning allowed, recommendations for 32/64 bits alignment, no magic number, header guards, variables initialization and formatting rules that favor consistency for legacy code.
Is it being followed by everyone?
And what do you do (if anything) to make sure everybody is following the standard?
Mostly, getting the team agreement and a somewhat lightweight coding standard (less than 20 rules) helped us here.
How it is being enforced ?
Softly, we do not have coding standard cop.
Application of the standard is checked at review time
We have template files that provide the standard boilerplate

I have never seen a project fail because of lack of coding standards (or adherence to them), or even have any effect on productivity. If you are spending any time on enforcing them then you are wasting money. There are so many important things to worry about instead (like code quality).
Create a set of suggested standards for those who prefer to have something to follow, but leave it at that.

JTest by ParaSoft is decent for Java.

Our coding standards are listed in our Programmer's Manual so everyone can easily refer to them. They are effective simply because we have buy in from all team members, because people are not afraid to raise standards and style issues during code reviews, and because they allow a certain level of flexibility. If one programmer creates a new file, and she prefers to place the bracket on the same line as an if statement, that sets the standard for that file. Anyone modifying that file in the future must use the same style to keep things consistent.
I'll admit, when I first read the coding standards, I didn't agree with some of them. For instance, we use a certain style for function declarations that looks like this:
static // Scope
void // Type declaration
func(
char al, //IN: Description of al
intl6u hash_table_size, //IN/OUT: Description of hash_table_size
int8u s) //OUT: Description of s
{
<local declarations>
<statements>
}
I had never seen that before, so it seemed strange and foreign to me at first. My gut reaction was, "Well, that's dumb." Now that I've been here a while, I have adjusted to the style and appreciate how I can quickly comprehend the function declaration because everyone does it this way.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio