Is there a Cypher syntax definition anywhere? - syntax

I'm looking for a definition of the syntax for the Cypher query language. I tried the docs but they're very vague.
Ideally, I'd like a BNF (or any variant) definition, or one of those "graph" definitions like this or this. Really, anything resembling a formal definition.

What you are looking for will be available in openCypher. Several items will be released as part of the project, one of the first of which is the BNF grammar.
Update 2016-01-30: A first draft of the grammar is now avialable at \https://github.com/opencypher/openCypher/blob/master/grammar.ebnf.
Update: 2016-10-17: EBNF and Antlr grammars, TCK, railroad diagrams, and a list of community projects are available at http://www.opencypher.org/#resources
Take a look at the recently announced (Oct 2015) openCypher project. It involves releasing the language specification, among other things.
From the announcement:
1. Cypher reference documentation:
A comprehensive user documentation describing use of the Cypher query language with examples and tutorials.
2. Technology compatibility kit (TCK):
The TCK consists of a number of tests that a software supplier would run in order to self-certify support for a given version of Cypher.
3. Reference implementation:
Distributed under the Apache 2.0 license, the reference implementation is a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool. The first planned deliverable is a parser that will take a Cypher statement and parse it into an AST (abstract syntax tree) representation. The reference implementation complements the documentation and tests by providing working implementations of Cypher – which are permissively licensed – and can be used as examples or as a foundation for one’s own implementation.
4. Cypher language specification:
Licensed under a Creative Commons license, the Cypher language specification is a technical expression of the language syntax to enable parsers to auto-generate the query syntax. A full semantic specification is also planned as a part of the openCypher project.
The same announcement also says that the process is open and that it is possible to submit, review and comment on language proposals.

Update!
Neo4j has changed a lot since this answer was written. In 2017 the simple answer is yes, you can download the grammar files from https://www.opencypher.org/
Below is the old answer, which was accurate in 2014
As far as I can tell, the only formal definition is in the code. That's the bad news.
The good news is that the code uses a scala library to do the parsing which makes the code rules look kinda/sorta like BNF. And there's some documentation on how to read it.
Here's a link into a scala object that defines what a query is.
This general package on github looks to me like it contains all of the cypher command implementations, and should have everything you're asking for.
Code in this package is written in scala, and looks like this:
object Query {
def start(startItems: StartItem*) = new QueryBuilder().startItems(startItems:_*)
def matches(patterns:Pattern*) = new QueryBuilder().matches(patterns:_*)
def optionalMatches(patterns:Pattern*) = new QueryBuilder().matches(patterns:_*).makeOptional()
def updates(cmds:UpdateAction*) = new QueryBuilder().updates(cmds:_*)
def unique(cmds:UniqueLink*) = new QueryBuilder().startItems(Seq(CreateUniqueStartItem(CreateUniqueAction(cmds:_*))):_*)
(...)
This matches roughly with the upper right hand quadrant of the Cypher refcard. You can sorta see that there can be a start clause, a match clause, and so on. This includes links to other implementation classes (like UpdateAction which further define clauses considered update actions.
Make sure to also read How Neo4J Uses Scala's Parser Combinator: Cypher's Internals Part 1 for more information on what's going on here, and the mapping between the scala classes and what we'd normally consider EBNF. This blog post is old (2011) and the specific code examples it gives shouldn't be trusted, but I think it has good general information on how the implementation works, and what to look for if you want to understand the EBNF behind cypher.
Disclaimer: I'm not a scala hardcore, YMMV, IANAL, devs please overrule me if I'm wrong.

(Michael Hunger answered in a comment, so I can't accept his answer. Here's his answer:)
Cypher uses parboiled as parser, the parboiled rule DSL are pretty easy to read and understand. https://github.com/neo4j/neo4j/blob/d18583d260a957ab1a14bd27d34eb5625df42bc5/community/cypher/cypher-compiler-2.2/src/main/scala/org/neo4j/cypher/internal/compiler/v2_2/parser/Clauses.scala

None of these seem to work any more.
I don't see anything on the opencypher.org site that looks like a grammar to download.
None of the github links from Michael Hunger work.
I'd really like access to SOME resource where I can learn how to construct queries for functions like avg that allegedly take a list expression as an argument, yet barf at every variant I can figure out.

Related

Including ComplexExpr and ComplexFunc classes in Halide API

The ComplexExpr and ComplexFunc classes in the links below seem very convenient to work with complex numbers. Is there a plan to include them into the official Halide API? Or is there a reason why they are not included?
https://github.com/halide/Halide/blob/master/apps/fft/complex.h
https://github.com/halide/Halide/blob/be1269b15f4ba8b83df5fa0ef1ae507017fe1a69/apps/fft/funct.h
Speaking as a Halide developer...
Or is there a reason why they are not included?
We haven't included these historically since we didn't want to bless a particular representation for complex numbers. There are a few valid ways of dealing with them and the headers in question are just one.
Is there a plan to include them into the official Halide API?
We've started talking about packaging some of this type of code into a set of header-only "Halide tools" libraries, so named to avoid the normative implication of calling it something like "stdlib". So as of right now, there is no concrete plan, but the odds are nonzero.
In the meantime, the code is MIT licensed, so you should feel free to use those files, regardless.

Is there nice yaml documentation documenting most of functionality, written without yaml?

Official and most of yaml documentation is written in yaml itself. That is nice demonstration of language power. I get it, that's the main point of this documentation. But documenting language by using that yet unknown language is like solving puzzle. At least for me. Searching in this style of documentation really hard: "which operators can I use for string indentation?" In traditional documentating style one would use chapter say "string indentation". But here, while it's a nice demo, you need to read it all, and understand it all, which is extra subpar if you don't work with yaml daily. And yaml language spec is great, if you want to practice context (free?) grammar definition, but greatly unfit for quick search for basic questions.
My question is. Is there yaml documentation, using traditional structure, documenting most of features, not just very few? One html page, sections and paragraphs? I cannot find one, and I'm always struggling/wasting so much time trying to find something in this style of documentation. And every time I read anything, I feel I'm missing so much information, which is not shown, constantly learning X using not yet explained constructs.

How to build a static code analysis tool?

I m in process of understanding and building a static code analysis tool for a proprietary language from a big company. Reason for doing this , I have to review a rather large code base , and a static code analysis would help a lot and they do not have one for the language so far.
I would like to know how does one go about building a static code analysis tool , for e.g. Lint or SpLint for C.
Any books, articles , blogs , sites..etc would help.
Thanks.
I know this is an old post, but the answers don't really seem that satisfactory. This article is a pretty good introduction to the technology behind the static analysis tools, and has several links to examples.
A good book is "Secure Programming with Static Analysis" by Brian Chest and Jacob West.
You need good infrastructrure, such as a parser, a tree builder, tree analyzers, symbol table builders, flow analyzers, and then to get on with your specific task you need to code specific checks for the specific problems of interest to you, using all the infrastructure machinery.
Building all that foundation machinery is actually pretty hard, and it doesn't help you do your specific task. People don't write the operating system for every application they code; why should you build all the infrastructure? Like an OS, it is better if you simply acquire good infrastructure.
People will tell you to lex and yacc. That's kind of like suggesting you use the real time keneral part of the OS; useful, but far from all the infrastructure you really need.
Our DMS Software Reengineering Toolkit provides all the necessary infracture. It has been used to define many language front ends as well as
many tools for such languages.
Such infrastructure would allow you to define your specific nonstandard language relatively quickly, and then get on with your task of coding your special checks.
There is a blog by DeepSource that covers everything one needs to know to build an understanding of static code analysis and equip you with the basic theory and the right tools so that you can write analyzers on your own.
Here’s the link: https://deepsource.io/blog/introduction-static-code-analysis/
Obviously you need a parser for the language. A good high level AST is useful.
You need to enumerate a set of "mistakes" in the language. Without knowing more about the language in question, we can't help here. Examples: unallocated pointers in C, etc.
Combine the AST with the mistakes in #2.

What tools for migrating programs from a platform A to B

As a pet project, I was thinking about writing a program to migrate applications written in a language A into a language B.
A and B would be object-oriented languages. I suppose it is a very hard task : mapping language constructs that are alike is doable, but mapping libraries concepts will be a very long task.
I was wondering what tools to use, I know this has to do with compilation, but I'm a bit afraid to use Lex and Yacc and all that stuff.
I was thinking of maybe using the Eclipse Modeling Framework, which would help me write models (of application code) transformations in a readable form.
But first I would have to write parsers for creating the models (and also create the metamodel from the language grammar).
Are there tools that exist that would make my task easier?
You can use special transformation tools/languages for that TXL or Stratego/XT.
Also you can have a look and easily try Java to Python and Java to Tcl migrating projects made by me with TXL.
You are right about mapping library concepts. It is rather hard and long task. There are two ways here:
Fully migrate the class library from language A to B
Migrate classes/functions from language A to the corresponding concepts in language B
The approach you will choose depends on your goals and time/resources available. Also in many cases you wont be doing a general A->B migration which will cover all possible cases, you will need just to convert some project/library/etc. so you will see in your particular cases what is better to do with classes/libraries.
I think this is almost impossibly hard, especially as a personal project. But if you are going to do it, don't make life even more difficult for yourself by trying to come up with a general solution. Choose two specific real-life programming languages ind investigate the possibities of converting between them. I think you will be shocked by the number of problems and issues this will expose.
There are some tools for direct migration for some combinations of A and B.
There are a variety of reverse engineering and code generation tools for different languages and platforms. It's fairly rare to see reverse engineering tools which capture all the semantics of the source language, and the semantics of UML are not well defined ( since it's designed to map to different implementation languages, it itself doesn't define a complete execution model for its behavioural representations ), so you're unlikely to be able to reverse engineer and generate code between tools. You may find one tool that does full reverse engineering and full code generation for your A and B languages, and so may be able to get somewhere.
In general you don't use the same idioms on different platforms, so you're more likely to get something which emulates A code on B rather than something which corresponds to a native B solution.
If you want to use Java as the source language(that language you try to convert) than you might use Checkstyle AST(its used to write Rules). It gives you tree structure with every operation done in the source code. This will be much more easier than writing your own paser or using regex.
You can run com.puppycrawl.tools.checkstyle.gui.Main from checkstyle-4.4.jar to launch Swing GUI that parse Java Source Code.
Based on your comment
I'm not sure yet, but I think the source language/framework would be Java/Swing and the target some RIA language like Flex or a Javascript/Ajax framework. – Alain Michel 3 hours ago
Google Web Toolkit might be worth a look.
See this answer: What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

Simple (Dumb) LINQ Provider

How easy would it be to write a dumb LINQ provider that can just use my class definitions (which don't have any object references as properties) and give me the translated SQL. It can assume the name of the properties and the columns to be same as well as the names of the classes and the underlying tables. Can you please give me some pointers.?
It took me about 4 months of fulltime work (8 hours a day) to build a stable, working provider that implements the entire spec of linq. I would say I had a very simple, buggy and unstable version after about three weeks, so if you're just looking for something rough I would say you're probably looking at anything from a week up to two months depending on how good you are and what types of requiements you have.
I must point you to the Wayward blog for this, Matt has written a really good walkthrough on how to implement a linq provider, and even if you're probably not going to be able to copy and paste, it will help you to get to grips with how to think when working. You can find Matt´s walkthrough here: http://blogs.msdn.com/mattwar/archive/2007/07/30/linq-building-an-iqueryable-provider-part-i.aspx . I recommend you go about it the same way Matt does, and extend the expression tree visitor Matt includes in the second part of his tutorial.
Also, when I began working with this, I had so much help from the expression tree visualizer, it really made parsing a whole lot easier once you could see how linq parsed to queries.
Building a provider is really a lot of fun, even if a bit frustrating at times. I wish you all the best of luck!
Give a look to the LINQExtender project, is a toolkit for creating custom LINQ providers.
Another option for giving you a leg up seems to be re-linq which is a framework for creating custom LINQ providers.
Here's the Source code and a nice overview (pdf) of what's involved in writing one.
I’ve written a tutorial series on my blog base on my experience developing a LINQ-to-SQL provider from scratch, starting with the expression tree composition stage (calling the LINQ methods), continuing with the expression visitor, breaking down of the query into components, parsing the where clause, generating the text and parameter and, eventually, compiling the whole thing into IL using the .NET expression namespace.
I’ve seen many incomplete posts that promised to explain how to write a provider, falling very short of the mark, barely scratching the surface and not actually delivering anything remotely executable.
The blog series I’ve written based on my experience has a sample project available for download with the simple provider that covers only the functionality required by the tutorial example. However, it also includes the production version supporting a number of operations (where, join, first, count, top, etc.), subqueries, nested statements, and etc. Additionally, it produces a cleaner SQL than a lot of what I’ve seen from Entities and LINQ-to-SQL. There’s no unnecessary/redundant nesting, wrapping everything in brackets and etc.
For anyone with a good level of abstract thinking, developing such a provider isn’t such a difficult task many set it out to be. I’ve developed one that’s used in production environment in about 3 months of part time work (meaning some evenings and weekends). From the get go it was aimed with performance and tidy SQL in mind – a goal it achieved.
It was a little hard to find the time to publish this material, but I thought – if it may help someone out there, there’s no reason for this experience to go to waste:
How to write a LINQ to SQL provider in C# Part 1 - Introduction
How to write a LINQ to SQL provider in C# Part 2 - Expression Visitor
How to write a LINQ to SQL provider in C# Part 3 - Where Clause Visitor
How to write a LINQ to SQL provider in C# Part 4 - Compiling Expression Trees
I have created a project 'LinqToAnything' which is designed to make it very very easy to implement a (simple) Linq provider.

Resources