Open Source Static Code Analysis Models - static-analysis

I am working on a project where I'd like to develop some static source code analysis tools. The source code will be in multiple proprietary languages that interact with one another. So I am looking for a project that defines an abstract Model/AST and can do data-flow analysis, so that I can translate each proprietary language into the Model and analyze the data flow/tree.
Does such a project exist?

Not open source, but designed and proven useful for building tools to handle multiple, complex languages: our DMS Software Reengineering Toolkit.
DMS contains strong parsing machinery (capable of handling difficult languages such as C++) that builds ASTs automatically from just a grammar description, plus libraries to support construction of symbol tables and various kinds of control and data flow analysis.
OP will have to provide grammar and semantic descriptions of his proprietary languages, but I think he is expecting that. If he wants to model flows across the languages, he'll have to organize his flow analyses for the individual languages to be compatible. The fact that DMS uses uniform infrastructure/data structures to support all these activities, even for different languages, will make this easier.
He should not expect a project involving multiple languages to be easy or quick, regardless of the framework he finds. Our intention with DMS was to make this practical.

I think the Object Management Group's (OMG) Specification for the Knowledge Discovery Metamodel (KDM) is kind of in the space you're looking for. (See http://www.omg.org/spec/KDM/). It's part of the Architecture Driven Modernization (ADM) activity at the OMG. KDM has been republished by ISO as ISO/IEC 19506:2012(E).
From the introduction:
This International Standard defines a meta-model for representing existing software assets, their associations, and operational environments, referred to as the Knowledge Discovery Meta-model (KDM).
You'll likely have to do most of the heavy lifting yourself, but at least the metamodel has been provided.

More as a side note: if you are not too interested in syntactic details and have a free choice of platform, you might as well analyze code for a VM, such as .NET bytecode.
There are compilers for C#, F#, C++/CLI, and Visual Basic (most of them, of course, from a well-known, large software company :-) ).
They all compile to bytecode, which can be inspected by tools like Mono.Cecil, which let you construct control flow graphs and the like.
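Mono.Cecil is of course a .NET library, but just to make the idea concrete with something runnable in this answer, here is a rough sketch of the same "walk the VM bytecode and find the basic-block boundaries" exercise against Python's own bytecode, using the standard dis module as a stand-in; the sample function and the crude leader detection are purely illustrative:

    import dis

    def sample(x):
        if x > 0:
            return x * 2
        return -x

    instructions = list(dis.get_instructions(sample))

    # Offsets that start a new basic block: the entry point, every jump target,
    # and every instruction that follows a jump or a return.
    leaders = {instructions[0].offset}
    for idx, ins in enumerate(instructions):
        if "JUMP" in ins.opname:
            leaders.add(ins.argval)  # for jumps, argval is the target offset
        is_block_end = "JUMP" in ins.opname or ins.opname.startswith("RETURN")
        if is_block_end and idx + 1 < len(instructions):
            leaders.add(instructions[idx + 1].offset)

    print("basic-block leaders at offsets:", sorted(leaders))
    for ins in instructions:
        marker = "->" if ins.offset in leaders else "  "
        print(marker, ins.offset, ins.opname, ins.argrepr)

Mono.Cecil gives you similar raw material for CIL assemblies: types, methods, and instruction lists with branch targets, from which a control flow graph can be built.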

Related

Micro-services - decision on technology/platform

Is it good architecture for an application if I use multiple technologies, leveraging the strong points of each?
For example:
encryption in Python,
integration of services in Java, etc.
Or should I stick to one technology, like Java, since I am comfortable with it?
Also, the reason for this question is that I am thinking of developing a new application in which speed is a major concern I am aiming for.
The database I am preferring for now is MongoDB.
Any suggestions on technologies apart from these?
Also, will this approach help in speeding up the application?
Writing the main application in only one language is an easier approach than dividing your application and attempting to write each piece in the language best suited for that task, unless you are fluent in a few languages and the ones chosen are particularly suited to specific groups of tasks that make up parts of the functionality.
Because MongoDB has a Java driver, there's nothing wrong with writing your main application in Java and relying on libraries written in other languages (MongoDB itself is written in C++, C, and JavaScript).
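Just to illustrate the "lean on the driver" pattern (the shape is the same whichever language you pick), here is a minimal sketch using Python's pymongo driver; the connection string, database, and collection names are made-up placeholders:

    from pymongo import MongoClient

    # The application stays in one language; MongoDB (written in C++/C/JavaScript)
    # does its work behind the driver. All names below are placeholders.
    client = MongoClient("mongodb://localhost:27017")
    db = client["appdb"]

    db.users.insert_one({"name": "alice", "role": "admin"})
    admin = db.users.find_one({"role": "admin"})
    print(admin["name"])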
As long as the other components you rely on are well maintained, there's no reason to switch from your preferred language to match what any of your libraries are written in.
If you add artificial intelligence to your program in the future and part of the code is to run on a GPU, you are forced to have a hybrid program; learning a new language along with the details of the underlying algorithms is certainly more of a burden than learning an API.
Decide where to draw the line: what you will write in your preferred language and what will be written by others. It's certainly better to choose libraries and programs you interface with that are written in languages you understand (assuming they are open source). If what you interface with has no source available, it becomes a 'black box' which simply must work; there are occasions when that is acceptable and occasions when there is no choice.

Cooperation between multiple programming languages

I'm a fairly advanced hobby programmer. I consider myself capable at Objective-C, Java, some straight C, Python, and general MVC design.
I've written quite a few programs but they have all been relatively self-contained, using external libraries occasionally.
When reading about larger projects and/or more complicated programs, I hear a lot of talk about "writing one part in X, and writing this part in Y."
Since I have a lack of experience with this, I was wondering if someone could point me in the right direction. What general designs/mechanisms are employed for applications or projects written in more than one language? What is involved in a "scriptable" design?
Thanks for any guidance on the topic!
-Chase
There is no single "right way". A multitude of approaches exist, including the .NET way, where all the languages are hosted inside a common runtime environment with well-specified interoperability constraints, and the good old Unix way, where components are supposed to communicate via pipes or sockets, using simple text-based protocols.
For the latter you can read a classic book: http://en.wikipedia.org/wiki/The_Unix_Programming_Environment
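A tiny sketch of the Unix way in Python: hand plain text to a separate process over a pipe and read the result back. Here sort is just a stand-in for any component that could be written in any language:

    import subprocess

    # Cooperate with another program over a pipe, exchanging plain text.
    # `sort` stands in for any component, written in any language.
    names = "mallory\nalice\nbob\n"
    result = subprocess.run(
        ["sort"],
        input=names,
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout, end="")  # alice, bob, mallory -- one per line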
It depends on what you need to do. For example, if you want to build an online poker game, you would most probably use Java for the application and Flash/Flex for the interface. Java has the power of its libraries, and Flash/Flex is widely available and offers a rich interface.
If you have software that receives input from an online application and produces output on a specific device (a label printer, for example), then your online-facing software (Java/PHP/Python) would best communicate with a specially designed program on the target computer, a program for which I'd use C++ for its technical power, rigor, and speed compared to Java.
The idea is to identify the languages that suit your purpose best. In my opinion it is ideal to use one language for everything; that is why I like Java, as it seems to fit almost anything, although it has something of a reputation for slowness.
I see things roughly this way:
1. Engineered, machine-oriented stuff: C++ (and languages of its kind)
2. Mobile, multifunctional stuff (mainly middleware): Java
3. Online, browser-based stuff: PHP, especially for B2C (people-oriented) applications
4. Python, Ruby, etc. are, from my point of view, somewhere between Java and PHP, but I have never really worked with them, so I cannot give an exact opinion
You can link them together depending on your needs.

Textual Domain-Specific language (DSL) development with Microsoft Visual Studio

I did some searching on developing a DSL in Visual Studio. At first I found that there is a Visualization and Modeling SDK for VS2010, which includes a DSL tool, but it seems to be only for graphical DSL development.
Then I saw some posts saying "Oslo" is a tool for developing textual DSLs, which "was" a Microsoft product; Microsoft no longer supports the tool. http://blogs.msdn.com/b/modelcitizen/archive/2010/09/22/update-on-sql-server-modeling-ctp-repository-modeling-services-quot-quadrant-quot-and-quot-m-quot.aspx
So, if I want to develop a textual DSL, what is the best tool? What do you think of implementing a DSL parser using the F# PowerPack with FsLex and FsYacc?
I am currently developing several external text-based DSLs using FsLex/FsYacc. I had been using a hand-written parser, but I find FsLex/FsYacc much easier to maintain during the design stage.
FsLex/FsYacc are not as sophisticated as ANTLR, but since most DSLs are fairly simple, FsLex/FsYacc are a perfectly sound choice for use within Visual Studio. And keeping DSLs simple is a good thing, since they are intended to be restricted and easy to learn.
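This thread is about the F# tooling, but purely to show how small a typical external DSL really is (which is why a lightweight generator like FsLex/FsYacc is enough), here is a hand-rolled sketch in Python of an invented two-statement mini-language; none of this is FsLex/FsYacc output, it just illustrates the scale of the problem:

    import re

    # Invented mini-language: `set NAME = NUMBER` and `when NAME > NUMBER warn "..."`.
    LINE = re.compile(
        r'set\s+(?P<name>\w+)\s*=\s*(?P<value>\d+)'
        r'|when\s+(?P<var>\w+)\s*>\s*(?P<limit>\d+)\s+warn\s+"(?P<msg>[^"]*)"'
    )

    def parse(source):
        """Turn DSL text into a list of (kind, ...) instructions."""
        program = []
        for lineno, line in enumerate(source.splitlines(), start=1):
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            m = LINE.fullmatch(line)
            if not m:
                raise SyntaxError(f"line {lineno}: cannot parse {line!r}")
            if m.group("name"):
                program.append(("set", m.group("name"), int(m.group("value"))))
            else:
                program.append(("when", m.group("var"), int(m.group("limit")), m.group("msg")))
        return program

    example = '''
    # invented sample script
    set temperature = 80
    when temperature > 70 warn "too hot"
    '''
    print(parse(example))

A generator like FsLex/FsYacc buys you the same thing from a declarative grammar, plus better error handling as the language grows.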
I find Martin Fowler's book to be a good resource, less for the examples and details than as an encyclopedia of DSL ideas. His discussion of usability and other design aspects of DSLs is also worth reading. As Toumas indicated, it does not cover F# or functional languages; Mr. Fowler writes that he lacked the experience in those subjects to bring the book to market in a timely way.
Having praised FsLex/FsYacc, I do still wish someone would write a good ANTLR back-end for F#. :)
-Neil
I am a fan of embedded DSLs, a la
http://lorgonblog.wordpress.com/2010/04/15/using-vs2010-to-edit-f-source-code-and-a-little-logo-edsl/
http://lorgonblog.wordpress.com/2010/04/16/fun-with-turtle-graphics-in-f/
where you just leverage F# syntax with some good function names and possibly other syntactic cleverness (lists, workflows, ...) to get code that "looks like maybe it is another language" but is actually just F#.
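The links above are F#, but the same embedded-DSL trick works in any host language; here is a sketch of the idea in Python, where well-chosen function names make a "program" read like a little turtle language while remaining ordinary Python (all of the names are invented for the sketch):

    import math

    class Turtle:
        def __init__(self):
            self.x, self.y, self.heading = 0.0, 0.0, 0.0
            self.path = [(0.0, 0.0)]

        def run(self, program):
            for step in program:
                step(self)
            return self.path

    def forward(distance):
        def step(t):
            t.x += distance * math.cos(math.radians(t.heading))
            t.y += distance * math.sin(math.radians(t.heading))
            t.path.append((round(t.x, 3), round(t.y, 3)))
        return step

    def turn(degrees):
        def step(t):
            t.heading = (t.heading + degrees) % 360
        return step

    def repeat(times, body):
        def step(t):
            for _ in range(times):
                for s in body:
                    s(t)
        return step

    # "Looks like maybe it is another language", but it is just lists and calls:
    square = [repeat(4, [forward(10), turn(90)])]
    print(Turtle().run(square))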
But yes, for external DSLs, you just need a grammar/parser/etc. tool chain, and FsLex/FsYacc, ANTLR, or FParsec are all possible choices. (I don't have enough experience with any of these to know the trade-offs among them.)
Since making my earlier post, I have also bought and read parts of Terence Parr’s book “Language Implementation Patterns.” It is excellent, though quite a bit more technical than Martin Fowler’s book (with some additional material it could be a “Dragon Book” for the new millennium). The examples are strongly based in Java and ANTLR, but the text is the main thing, so the book is useful regardless of one’s language development environment.
Interestingly, there is little overlap between the two books. Martin Fowler’s book does a good job of covering the design and implementation of basic DSLs, such as those used for specification and configuration, while Terence Parr’s book is more technical and covers the realm extending all the way up through more sophisticated languages and byte-code machines. I recommend both if you can budget for them, otherwise, either is an excellent choice within its given domain.
Martin Fowler has a new book about DSLs. Sadly, it doesn't discuss Microsoft's tooling or functional languages much.
Microsoft no longer supports the graphical tool "Quadrant", but MGrammar is still supported and integrated into SQL Server, right? MGrammar is the "DSL-making language".
Still, I would say that functional languages (read: F#) are the way to go.
This book has a simple example of how to make a DSL with F#: http://www.manning.com/petricek/
and Google also finds many other good references on this topic.
Try MBase, but it is only worth using if your DSL is complicated enough to require an efficient compiler and a PEG grammar. Otherwise FsYacc is more than enough.
Our DMS Software Reengineering Toolkit is designed to handle arbitrary DSLs (I happen to be the architect).
Most people think that if you have a parser, you have enough. That is technically true, in the same sense that if you have transistors, you can build a computer.
In my experience you want a lot more than just a parser: you need ways to build symbol tables so that your generator knows what the meaning of a particular identifier is, means to analyze the specification, ways to easily encode your translation and to apply optimizations to generated results.
DMS provides all these capabilities to support building DSLs. And in that sense, it goes much beyond F#.

Prolog web programming

At work, there was a discussion of using Prolog as the backend for a rules engine on a web app.
How would this get tied into existing systems?
Are there available prolog libraries for other languages allowing the invocation of prolog modules?
For SWI-Prolog, you could look at Thea2, which supports SWRL in Prolog and can also be attached via JPL to external reasoners such as HermiT or Pellet for OWL/SWRL reasoning.
On a personal note, I have used JPL several times in the past to enable web-apps with a SWI-Prolog backend, which works just fine if you intend to program your web app using a language which is executable on a JVM, like Java, Groovy, or Scala, for example. Another alternative would be to hook SWI-Prolog into a C or C++ environment, which I haven't tried for a web-app.
If your web-app is using another development language that doesn't run on a JVM or in C/C++, then this mightn't be the right path for you as it seems to be a bit harder to connect a running SWI-Prolog environment to other language environments. However, that said, we have successfully implemented a SWI-Prolog-to-anything bridge using HTTP before, but this is less than ideal if performance is a necessity.
SWI-Prolog has a perfectly reasonable HTTP server / web framework included.
You could talk to it over HTTP.
There are tools for parsing XML/SGML and JSON, and there is ODBC support.
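From the non-Prolog side, "talk to it over HTTP" is just an ordinary request. In the sketch below, the endpoint URL and the JSON going in and out are invented placeholders; the real contract is whatever handler you write on the SWI-Prolog side:

    import requests

    def check_eligibility(customer):
        # Hypothetical rules endpoint exposed by a SWI-Prolog HTTP server.
        reply = requests.post(
            "http://localhost:8000/rules/eligibility",
            json={"customer": customer},
            timeout=5,
        )
        reply.raise_for_status()
        return reply.json()  # e.g. {"eligible": true, "reasons": [...]}

    if __name__ == "__main__":
        print(check_eligibility({"age": 42, "country": "US"}))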
I'm not sure exactly what you're looking for, but you may want to look into Yield Prolog, which allows embedding Prolog code into programs written in Python, C#, or JavaScript. There is no API involved. I haven't used this myself (yet), but it may be amenable to what you're trying to do.
I guess an important prerequisite for web embedding, especially on the server side, is multi-threading capability of the Prolog system. At least you would probably need this if you want to serve multiple users concurrently.
You can then opt either for a pure solution, where the pages are generated and maintained by Prolog itself, or for a mixed solution, where the Prolog system is only used for some business logic and another programming language is used for presentation and/or storage.
The following wiki comparison table gives an overview of Prolog systems and whether they are multi-threaded and/or support some web programming:
Comparison of Prolog implementations, Operating system and Web-related features
For my own take on this problem, I have set up a little tutorial that shows the use of the Jekejeke Runtime for server-side business logic. The Jekejeke Runtime is quite flexible: you can not only have multiple threads, you can also have multiple knowledge bases.
Jekejeke Runtime, Deployment Methods
The Jekejeke Runtime is, for example, currently used in a production sales system; thanks to some custom read/write locks, it even allows remote hot swap of the knowledge base by an administrator without restarting the web context. Unfortunately there is no write-up on that yet.
Bye
LPA Prolog has been widely used in various commercial web-based applications, most notably within Business Integrity's industry-leading document assembly product, DealBuilder.
LPA provide various architectures for delivering web-based applications, some of which are showcased within the VisiRule section of the LPA website:
www.lpa.co.uk
Clive
Some languages make use of a bridge and provide a library for it; for example, in Python there's PySWIP, which is for single-threaded use, so it doesn't play nicely with web apps. I've found Pengines to be far more versatile.
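For completeness, the PySWIP bridge mentioned above looks roughly like this in a single-threaded script (the facts and the query are invented for illustration):

    from pyswip import Prolog

    # An embedded SWI-Prolog engine driven from a single-threaded Python process.
    prolog = Prolog()
    prolog.assertz("parent(tom, bob)")
    prolog.assertz("parent(bob, ann)")

    # Who are tom's grandchildren? Solutions come back as dicts keyed by variable.
    for solution in prolog.query("parent(tom, Y), parent(Y, Who)"):
        print(solution["Who"])  # -> ann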
Pengines are part of SWI-Prolog; they're a way of querying Prolog via HTTP. There are at least JavaScript, Java, and Python libraries to interface with them, but it's just a bunch of HTTP requests. That makes it easy to distribute, use as a micro-service, or scale horizontally. Although, as anniepoo has attested, a single SWI-Prolog server can handle a decent amount of traffic!
Also, in the case of JavaScript there's Tau Prolog, which runs entirely within JavaScript.
There's a lot of sense in using a logic language as a rule engine.

When should I use a Domain Specific Language? [closed]

I would like some practical guidance on when I should use a Domain Specific Language. I have found resources about advantages and disadvantages, but what kind of project would warrant its use?
It seems like there is a big investment in time to create and maintain a DSL, so in what application space would I get a productivity return on my time investment?
Edit: It seems the most common use of DSLs is for file formats for persisting data state; what about using a DSL for program logic and structure (perhaps code generation)? When is this feasible?
Edit #2: I am mainly asking about when creating a specific DSL is worthwhile. Of course we should use existing DSLs as much as possible to save time.
There are very few good reasons for creating yet another DSL. The world is fat with special-purpose languages.
Think along these lines:
1. Solve the problem with a general-purpose language such as Python, Java, C++, whatever.
2. Optimize that solution to factor out the common features and build a really nice, really elegant, really extensible class library.
3. Optimize that class library to emphasize "orthogonality". Make sure all features work well together, without any problems.
4. If you only need to simplify the syntax, create a scripting wrapper around your nice class library. This is your DSL (see the sketch after this list). For Python, this is easy: it's already a dynamic language. For Java, there are things you can leverage. For C++ it can be a bit of work to build this flexible scripting environment.
5. If you still need further optimization, consider writing a compiler for your DSL.
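A minimal sketch of step 4 in Python: an ordinary class library plus a thin scripting wrapper, so a "script" reads like a DSL while remaining plain Python (the report library and all of its names are invented for illustration):

    # --- the class library (steps 2/3) ---------------------------------------
    class Report:
        def __init__(self, title):
            self.title = title
            self.sections = []

        def add_section(self, heading, body):
            self.sections.append((heading, body))

        def render(self):
            lines = [self.title, "=" * len(self.title)]
            for heading, body in self.sections:
                lines += ["", heading, "-" * len(heading), body]
            return "\n".join(lines)

    # --- the scripting wrapper (the "DSL", step 4) ----------------------------
    _current = None

    def report(title):
        global _current
        _current = Report(title)

    def section(heading, body):
        _current.add_section(heading, body)

    def publish():
        print(_current.render())

    # A "script" in the DSL -- still plain Python, just simplified syntax:
    report("Quarterly Summary")
    section("Sales", "Up 4% quarter over quarter.")
    section("Risks", "Supplier costs are rising.")
    publish()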
The ACM Computing Surveys article When and How to Develop Domain-Specific Languages provides advice on just this topic, as does Martin Fowler's 2010 book Domain-Specific Languages.
Firstly, I would use a DSL when the problem domain you're developing against is a widely known one, and business experts of that domain have already gone to great lengths to build such a DSL, so that you don't have to go to those lengths yourself to solve the problems they have already figured out.
If you're thinking of creating a DSL, I would consider doing so if your business is in a very particular area and you spend the majority of your time focusing on a specific problem domain. If you bounce around doing applications for multiple problem domains, then I wouldn't advise taking that approach.
For example, if your business is solely in building tax applications, it might be a good idea to build a tax-system DSL. This would allow your language not only to be usable by you in your various tax applications, but also to be marketable (usable) by other businesses in your industry that want to do things similar to what you're accomplishing.
Of course, you have to weigh the costs/benefits of building a DSL versus a framework on top of an already existing language.
One situation that comes to mind is when the requirements call for a very high or improbable level of customization/configuration, so you provide a kind of scripting model in the form of a DSL instead.
Take a car assembly "arm", for example: providing a configuration model to support every factory configuration would be impossible (detect this, don't detect that, when this happens do that, ... etc.).
But compiling a new application with specialized logic for each customer is probably not a good way to go either. So in this case, you create a little framework that becomes a kind of DSL, and then for each robotic arm you sell, you write a little app in your DSL and ship it along with the core software, which compiles and runs your DSL scripts. Or better yet, tools to program the DSL are included with the robotic arm so your customers can "program" the arm themselves in the DSL you created.
Real-world examples that come to mind are Yahoo Pipes (you could think of it as a DSL) and the robots.txt directives for automated web crawlers. They may not be full-blown DSLs, but they demonstrate where a DSL might be useful.
Well, someone has to say it, so here goes:
Lisp is regarded by some as the domain specific language for any domain. A well-supported and very extensible DSL at that.
In some cases, fashioning a DSL from Lisp (or a similar language such as Haskell) could actually provide a lot of power with minimal effort, and thus would be quite worthwhile. DSLs do not always need to be big maintenance burdens.
The most obvious is you should definitely use them when the language already exists and is well-supported. The prime examples of this are UIL for Motif-based GUI development and make for software builds.
If you have to make your own, I'd say look for domains where a large amount of effort goes into just specifying things properly, and where your compiler can't really find most errors but a domain-specific compiler could. GUIs are a great example, as most of the work is in setting up the layout, and there are generally lots of ways to make syntactically valid C++ calls that make no sense whatsoever to your underlying GUI system (e.g., trying to embed an entire dialog widget inside a button).
I find UIL in particular a huge gain for GUI development, because a UIL compiler can find errors in a GUI specification that would just look like nice, normal, compilable code to a C++ compiler. The fact that it is well supported means that the code is easy to port between platforms, and even between GUI builders.

Resources