How should I properly implement the core Clojure interfaces? - data-structures

If I'm implementing some data structure in Clojure using deftype, how should I decide which of the core Clojure interfaces to implement?
I was unable to find a comprehensive guide to the various Clojure interfaces; in fact, the only relevant piece of information I was able to find was this question, which is very limited in scope.
What I'm looking for is a list of each of the core Clojure interfaces, with a brief description of what it is and when you should implement it (or if you should even implement it at all!), as well as a list of all the common functions in clojure.core that use it. A description or diagram of the inheritance tree of all these interfaces would also be helpful.

I can't answer all your questions, but I usually start from the Clojure Atlas (http://www.clojureatlas.com/) to find which interfaces to implement.

The excellent book Clojure Applied by Ben Vandgrift and Alex Miller provides detailed information on this exact topic in "Chapter 2. Collect and Organize Your Data", under "Building Custom Collections". A visual diagram of interfaces in clojure.lang is provided on page 39.
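To give a flavor of what that chapter covers, here is a minimal sketch (my own, not taken from the book) of a deftype that opts into three of the core interfaces; seq, count, and get all start working once the corresponding clojure.lang interfaces are implemented:

;; Minimal sketch (not from the book): a fixed-size pair that participates
;; in seq, count, and get by implementing three clojure.lang interfaces.
(deftype Pair [a b]
  clojure.lang.Seqable            ; seq, plus everything built on it (map, reduce, ...)
  (seq [_] (list a b))

  clojure.lang.Counted            ; constant-time count
  (count [_] 2)

  clojure.lang.ILookup            ; get / lookup with a default
  (valAt [this k] (.valAt this k nil))
  (valAt [_ k not-found]
    (case k
      0 a
      1 b
      not-found)))

(def p (Pair. :x :y))
(seq p)            ;=> (:x :y)
(count p)          ;=> 2
(get p 1)          ;=> :y
(get p 2 :missing) ;=> :missing

In practice you usually also want clojure.lang.IPersistentCollection (and often ISeq, Associative, and friends) so that conj, empty, and equality behave; deciding how far down that list to go is exactly the trade-off the book's diagram helps you navigate.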

Related

Is there a Cypher syntax definition anywhere?

I'm looking for a definition of the syntax for the Cypher query language. I tried the docs but they're very vague.
Ideally, I'd like a BNF (or any variant) definition, or one of those "graph" definitions like this or this. Really, anything resembling a formal definition.
What you are looking for will be available in openCypher. Several items will be released as part of the project, one of the first of which is the BNF grammar.
Update 2016-01-30: A first draft of the grammar is now available at https://github.com/opencypher/openCypher/blob/master/grammar.ebnf.
Update 2016-10-17: EBNF and Antlr grammars, TCK, railroad diagrams, and a list of community projects are available at http://www.opencypher.org/#resources
Take a look at the recently announced (Oct 2015) openCypher project. It involves releasing the language specification, among other things.
From the announcement:
1. Cypher reference documentation:
A comprehensive user documentation describing use of the Cypher query language with examples and tutorials.
2. Technology compatibility kit (TCK):
The TCK consists of a number of tests that a software supplier would run in order to self-certify support for a given version of Cypher.
3. Reference implementation:
Distributed under the Apache 2.0 license, the reference implementation is a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool. The first planned deliverable is a parser that will take a Cypher statement and parse it into an AST (abstract syntax tree) representation. The reference implementation complements the documentation and tests by providing working implementations of Cypher – which are permissively licensed – and can be used as examples or as a foundation for one’s own implementation.
4. Cypher language specification:
Licensed under a Creative Commons license, the Cypher language specification is a technical expression of the language syntax to enable parsers to auto-generate the query syntax. A full semantic specification is also planned as a part of the openCypher project.
The same announcement also says that the process is open and that it is possible to submit, review and comment on language proposals.
Update!
Neo4j has changed a lot since this answer was written. In 2017, the simple answer is yes: you can download the grammar files from https://www.opencypher.org/
Below is the old answer, which was accurate in 2014.
As far as I can tell, the only formal definition is in the code. That's the bad news.
The good news is that the code uses a Scala library to do the parsing, which makes the parsing rules look kinda/sorta like BNF. And there's some documentation on how to read it.
Here's a link into a Scala object that defines what a query is.
This general package on GitHub looks to me like it contains all of the Cypher command implementations, and should have everything you're asking for.
Code in this package is written in Scala, and looks like this:
object Query {
  def start(startItems: StartItem*) = new QueryBuilder().startItems(startItems:_*)
  def matches(patterns:Pattern*) = new QueryBuilder().matches(patterns:_*)
  def optionalMatches(patterns:Pattern*) = new QueryBuilder().matches(patterns:_*).makeOptional()
  def updates(cmds:UpdateAction*) = new QueryBuilder().updates(cmds:_*)
  def unique(cmds:UniqueLink*) = new QueryBuilder().startItems(Seq(CreateUniqueStartItem(CreateUniqueAction(cmds:_*))):_*)
  (...)
This matches roughly with the upper-right-hand quadrant of the Cypher refcard. You can sorta see that there can be a start clause, a match clause, and so on. This includes links to other implementation classes (like UpdateAction) which further define the clauses considered update actions.
Make sure to also read How Neo4J Uses Scala's Parser Combinator: Cypher's Internals Part 1 for more information on what's going on here, and the mapping between the Scala classes and what we'd normally consider EBNF. This blog post is old (2011) and the specific code examples it gives shouldn't be trusted, but I think it has good general information on how the implementation works, and what to look for if you want to understand the EBNF behind Cypher.
Disclaimer: I'm not a scala hardcore, YMMV, IANAL, devs please overrule me if I'm wrong.
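If you just want a feel for what a BNF-style definition looks like while the official grammar matures, here is a toy sketch. It is entirely my own illustration, written with the Clojure instaparse library, and it covers only a trivial MATCH ... RETURN fragment, not real Cypher:

;; Toy, illustrative grammar only -- not the real Cypher grammar.
;; Assumes the instaparse library is on the classpath.
(require '[instaparse.core :as insta])

(def toy-cypher
  (insta/parser
    "query         = match-clause <ws> return-clause
     match-clause  = <'MATCH'> <ws> pattern
     pattern       = node (rel node)*
     node          = <'('> ident (<':'> label)? <')'>
     rel           = <'-['> (<':'> label)? <']->'>
     return-clause = <'RETURN'> <ws> ident
     ident         = #'[a-z][a-zA-Z0-9]*'
     label         = #'[A-Z][a-zA-Z0-9]*'
     ws            = #'\\s+'"))

(toy-cypher "MATCH (n:Person)-[:KNOWS]->(m) RETURN m")
;; => a hiccup-style parse tree, roughly:
;;    [:query [:match-clause [:pattern [:node [:ident "n"] [:label "Person"]]
;;                                     [:rel [:label "KNOWS"]]
;;                                     [:node [:ident "m"]]]]
;;            [:return-clause [:ident "m"]]]

The real grammar is of course far larger, but the openCypher EBNF linked above reads in much the same spirit.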
(Michael Hunger answered in a comment, so I can't accept his answer. Here's his answer:)
Cypher uses parboiled as its parser; the parboiled rule DSL is pretty easy to read and understand: https://github.com/neo4j/neo4j/blob/d18583d260a957ab1a14bd27d34eb5625df42bc5/community/cypher/cypher-compiler-2.2/src/main/scala/org/neo4j/cypher/internal/compiler/v2_2/parser/Clauses.scala
None of these seem to work any more.
I don't see anything on the opencypher.org site that looks like a grammar to download.
None of the github links from Michael Hunger work.
I'd really like access to SOME resource where I can learn how to construct queries for functions like avg that allegedly take a list expression as an argument, yet barf at every variant I can come up with.

Translating ecore models (accompanying OCL expressions) to alloy specification

I am looking to see if there is any tool or engine which translates Ecore (meta-)models to an Alloy specification.
If it could do this translation taking the accompanying OCL expressions into account, that would be great :)
Thanks
There are a number of research papers on the topic of translating between Alloy and UML. A quick Google Scholar search for "ocl alloy" returned more than 6000 results. Here are some that seemed the most relevant:
On challenges of model transformation from UML to Alloy
UML2Alloy: A Tool for Lightweight Modelling of Discrete Event Systems
Analysis of model transformations via Alloy
Formal refactoring for UML class diagrams
A research paper that explicitly focuses on translating UML class diagrams annotated with OCL is "Translating between Alloy specifications and UML class diagrams annotated with OCL", by Alcino Cunha, Ana Garis and Daniel Riesco.
You can check out the implementation here. It should be trivially adaptable to Ecore meta-models.
An Eclipse plugin called Lightning allows you to do such transformations. It is currently in its test phase, and should be available online at the end of the week. (I will edit this answer with the update-site link when it is released.)
If you can't wait that long, I might be able to arrange early access for you.
Edit:
The update site is now available: http://lightning.gforge.uni.lu/update-site

How to build a static code analysis tool?

I'm in the process of understanding and building a static code analysis tool for a proprietary language from a big company. The reason for doing this: I have to review a rather large code base, a static code analysis would help a lot, and they do not have such a tool for the language so far.
I would like to know how one goes about building a static code analysis tool, e.g. something like Lint or Splint for C.
Any books, articles, blogs, sites, etc. would help.
Thanks.
I know this is an old post, but the answers don't really seem that satisfactory. This article is a pretty good introduction to the technology behind static analysis tools, and has several links to examples.
A good book is "Secure Programming with Static Analysis" by Brian Chess and Jacob West.
You need good infrastructure, such as a parser, a tree builder, tree analyzers, symbol table builders, flow analyzers, and then to get on with your specific task you need to code specific checks for the specific problems of interest to you, using all the infrastructure machinery.
Building all that foundation machinery is actually pretty hard, and it doesn't help you do your specific task. People don't write the operating system for every application they code; why should you build all the infrastructure? Like an OS, it is better if you simply acquire good infrastructure.
People will tell you to lex and yacc. That's kind of like suggesting you use the real-time kernel part of the OS; useful, but far from all the infrastructure you really need.
Our DMS Software Reengineering Toolkit provides all the necessary infrastructure. It has been used to define many language front ends as well as many tools for such languages.
Such infrastructure would allow you to define your specific nonstandard language relatively quickly, and then get on with your task of coding your special checks.
There is a blog post by DeepSource that covers what you need to know to build an understanding of static code analysis, along with the basic theory and the right tools, so that you can write analyzers on your own.
Here’s the link: https://deepsource.io/blog/introduction-static-code-analysis/
1. Obviously you need a parser for the language. A good high-level AST is useful.
2. You need to enumerate a set of "mistakes" in the language. Without knowing more about the language in question, we can't help here. Examples: unallocated pointers in C, etc.
3. Combine the AST from #1 with the mistakes from #2 (see the sketch below).
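As a tiny illustration of steps 2 and 3 (a sketch in Clojure over a made-up AST shape, since the language in question is unknown): once the parser from step 1 hands you a tree, a check is just a walk that looks for the shapes you have declared to be mistakes.

;; Sketch only: assumes the (hypothetical) parser from step 1 produces
;; nodes shaped like {:op ... :args [...]}, e.g. for the expression "x / 0":
(def ast {:op :div
          :args [{:op :var :name "x"}
                 {:op :lit :value 0}]})

;; Step 2: enumerate "mistakes" as predicates over a single node.
(def mistakes
  {:division-by-zero-literal
   (fn [node]
     (and (= :div (:op node))
          (= 0 (get-in node [:args 1 :value]))))})

;; Step 3: walk the whole AST and report every node a predicate flags.
(defn check [tree]
  (for [node (tree-seq :args :args tree)
        [rule pred] mistakes
        :when (pred node)]
    {:rule rule :node node}))

(check ast)
;; => one finding, tagged :division-by-zero-literal, pointing at the :div node

Everything that makes real tools hard (name resolution, types, data flow) lives behind richer node shapes and smarter predicates, but the parse / enumerate / walk skeleton stays the same.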

What is the technique behind the code generation feature of UML tools

I am an engineering student, and deciding upon my final year project.
One of the many candidates is an online UML tool with code generation facilities. But I did not take compiler design classes, so I am not very familiar with code generation techniques.
I want to know about the techniques that I should look to study in order to build something like this. If these techniques are as complicated as writing a compiler, then perhaps I will have to abandon this idea.
Compilation is really the opposite of the kind of code generation you are describing, so I don't think you need to know how to write a compiler.
Code generation can be as simple as combining text strings or using templates, or as complex as using Reflection.Emit to create classes at runtime.
I would start with this Wikipedia article.
The creation of a UML tool is a long-term project. You need to acquire many different kinds of expertise, which cannot all be held by just one member of the team.
Your academic project is too ambitious.
An easy project which has never been done is to generate code from an activity or state diagram. You should not try to recreate the graphical editor, because that is very, very complex; instead, take the XMI export and generate code from it using an XML parser. This would be a good 6-month project for your thesis :-)
Most UML tools generate source code. The generation is normally quite a bit simpler than a compiler as well. For example, a class diagram will have a collection of data structures representing classes and links between those classes (inheritance). To generate output, you walk through the class objects, and for each you "print" out a representation of that object in the syntax of the target language.
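For instance (a minimal sketch with a made-up in-memory model, not what any particular UML tool actually stores), the "walk the classes and print each one" step can be as small as this:

;; Sketch: a class diagram reduced to plain data, and a generator that
;; walks it and prints a Java-ish skeleton for each class.
(def model
  [{:name "Shape"  :super nil     :fields ["name"]}
   {:name "Circle" :super "Shape" :fields ["radius"]}])

(defn emit-class [{:keys [name super fields]}]
  (str "public class " name
       (when super (str " extends " super))
       " {\n"
       (apply str (for [f fields]
                    (str "    private Object " f ";\n")))
       "}\n"))

(doseq [c model]
  (println (emit-class c)))

Real tools add type mapping, templates, and round-tripping on top, but the core generation loop is essentially this walk.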
I'm not sure exactly what capabilities your code generation will require, but the UML tools that I have used are not very sophisticated in their code generation.
Tools that I have used simply create files and drop your function names into them with arguments derived from the inputs. This would not require any understanding of compilers. Most of the difficulty would be in the user interface and how you store the data to make code generation easy.
You can just find existing online UML tools here:
http://yuml.me and http://askuml.com

What exactly can concrete domains be used to describe?

I have read the formal definition of a "concrete domain", but I still don't quite get it.
Could someone explain it to me in simpler terms, preferably with some examples?
The definition is available in "Reasoning in Description Logics with a Concrete Domain in the Framework of Resolution" by Ullrich Hustadt, Boris Motik, and Ulrike Sattler (page 1, Definition 1).
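As a rough illustration (my own sketch, not taken from that paper): a concrete domain bundles a set of values with a fixed family of predicates over those values, and concepts in the description logic can then constrain feature values through those predicates.

% A concrete domain D is a pair of a value set and predicate names with
% fixed extensions over that set, e.g. the natural numbers with comparisons:
D = (\Delta_D, \Phi_D), \qquad
\Delta_D = \mathbb{N}, \qquad
\Phi_D \supseteq \{\, \geq_n \mid n \in \mathbb{N} \,\} \cup \{ <, = \}

% Illustrative concept in a DL extended with D:
% "a person whose age is at least 18"
\mathit{Adult} \equiv \mathit{Person} \sqcap \exists \mathit{age}.\geq_{18}

In other words, the abstract part of the knowledge base talks about individuals and roles, while the concrete domain lets concepts say things about the numbers, strings, or other data values attached to them.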
I'm not very good with predicate logic myself, but I became intrigued by your question, so I read up and got a fairly good overview of the concept of concrete domains from section 6.2.1 of "The Description Logic Handbook"; it has some examples as well.
I'm a new user and can't post links, but you will find the book by searching for "concrete domains" on Google (look for books.google.com).
I hope it helps; I found the formal definition hard to get as well! The beginning of section 6.2.1 of the book is a good introduction.
Good luck!
