Data structure to hold HQL or EJB QL

We need to produce a fairly complex dynamic query builder for retrieving reports on the fly. We're scratching our heads a little on what sort of data structure would be best.
It's really nothing more than holding a list of selectParts, a list of fromParts, a list of where criteria, order by, group by, that sort of thing, for persistence. When we start thinking about joins, especially outer joins, having clauses, and aggregate functions, things start getting a little fuzzy.
We're building it interfaces-first for now and trying to think as far ahead as we can, but we will definitely go through a series of refactorings when we discover limitations in our structures.
I'm posting this question here in the hope that someone has already come up with something we can base it on, or knows of a library that does this. It would be nice to get some tips or a heads-up on potential issues before we dive into implementation next week.

I've done something similar a couple of times in the past. A couple of the bigger things spring to mind.
The where clause is the hardest to get right. If you divide things up into what I would call "expressions" and "predicates" it makes it easier.
Expressions - column references, parameters, literals, functions, aggregates (count/sum)
Predicates - comparisons, like, between, in, is null (predicates have expressions as children, e.g. expr1 = expr2). Then you also have composites such as and/or/not.
The whole where clause, as you can imagine, is a tree with a predicate at the root, with maybe sub-predicates underneath eventually terminating with expressions at the leaves.
To construct the HQL you walk the model (usually depth first). I used a visitor because I needed to walk my models for other reasons, but if you don't have multiple purposes you can build the rendering code right into the model.
e.g. If you had
"where upper(column1) = :param1 AND ( column2 is null OR column3 between :param2 and :param3)"
Then the tree is
Root
- AND
  - Equal
    - Function(upper)
      - ColumnReference(column1)
    - Parameter(param1)
  - OR
    - IsNull
      - ColumnReference(column2)
    - Between
      - ColumnReference(column3)
      - Parameter(param2)
      - Parameter(param3)
Then you walk the tree depth first and merge rendered bits of HQL on the way back up. The upper function for example would expect one piece of child HQL to be rendered and it would then generate
"upper( " + childHql + " )"
and pass that up to its parent. Something like Between expects three child HQL pieces.
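As a rough illustration of the tree and the depth-first rendering described above, here is a minimal Java sketch with the rendering built straight into the model (no visitor). All class names (WhereTreeSketch, FunctionCall, Composite and so on) are made up for this example, and the composite simply wraps its children in parentheses rather than working out operator precedence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WhereTreeSketch {

    /** Anything that can render itself as a fragment of HQL. */
    interface Node { String render(); }

    /** Value-producing nodes: column references, parameters, functions. */
    interface Expression extends Node {}

    /** Boolean-valued nodes: comparisons, is null, between, and/or. */
    interface Predicate extends Node {}

    static final class ColumnReference implements Expression {
        private final String name;
        ColumnReference(String name) { this.name = name; }
        public String render() { return name; }
    }

    static final class Parameter implements Expression {
        private final String name;
        Parameter(String name) { this.name = name; }
        public String render() { return ":" + name; }
    }

    /** A single-argument function such as upper(...): expects one child piece. */
    static final class FunctionCall implements Expression {
        private final String function;
        private final Expression argument;
        FunctionCall(String function, Expression argument) {
            this.function = function;
            this.argument = argument;
        }
        public String render() { return function + "(" + argument.render() + ")"; }
    }

    static final class Equal implements Predicate {
        private final Expression left, right;
        Equal(Expression left, Expression right) { this.left = left; this.right = right; }
        public String render() { return left.render() + " = " + right.render(); }
    }

    static final class IsNull implements Predicate {
        private final Expression operand;
        IsNull(Expression operand) { this.operand = operand; }
        public String render() { return operand.render() + " is null"; }
    }

    /** Between expects three child pieces: the value and both bounds. */
    static final class Between implements Predicate {
        private final Expression value, low, high;
        Between(Expression value, Expression low, Expression high) {
            this.value = value; this.low = low; this.high = high;
        }
        public String render() {
            return value.render() + " between " + low.render() + " and " + high.render();
        }
    }

    /** and/or composite: renders every child first, then merges the pieces. */
    static final class Composite implements Predicate {
        private final String operator;
        private final List<Predicate> children;
        Composite(String operator, Predicate... children) {
            this.operator = operator;
            this.children = Arrays.asList(children);
        }
        public String render() {
            List<String> parts = new ArrayList<>();
            for (Predicate child : children) parts.add(child.render());
            return "(" + String.join(" " + operator + " ", parts) + ")";
        }
    }

    /** Builds the example tree from the text. */
    static Predicate example() {
        return new Composite("and",
            new Equal(new FunctionCall("upper", new ColumnReference("column1")),
                      new Parameter("param1")),
            new Composite("or",
                new IsNull(new ColumnReference("column2")),
                new Between(new ColumnReference("column3"),
                            new Parameter("param2"), new Parameter("param3"))));
    }

    public static void main(String[] args) {
        System.out.println("where " + example().render());
    }
}
```

If you later need to walk the tree for other purposes (validation, collecting parameter names), the render() methods are the part you would move out into a visitor.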
You can then re-use the expression model in the select/group by/order by clauses
You can skip storing the group by if you wish: store only the select, and before constructing the query scan it for aggregates. If there are one or more, copy all the non-aggregate select expressions into the group by.
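A small Java sketch of that group by derivation, assuming each select item already knows whether it is an aggregate. SelectItem and deriveGroupBy are hypothetical names, and a real model would work this out from the expression tree rather than from a boolean flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupBySketch {

    /** Hypothetical select item: rendered HQL plus an "is aggregate" flag. */
    static final class SelectItem {
        final String hql;
        final boolean aggregate;
        SelectItem(String hql, boolean aggregate) {
            this.hql = hql;
            this.aggregate = aggregate;
        }
    }

    /**
     * If the select contains at least one aggregate, the group by is every
     * non-aggregate select expression; otherwise no group by is needed.
     */
    static List<String> deriveGroupBy(List<SelectItem> select) {
        boolean hasAggregate = false;
        List<String> nonAggregates = new ArrayList<>();
        for (SelectItem item : select) {
            if (item.aggregate) {
                hasAggregate = true;
            } else {
                nonAggregates.add(item.hql);
            }
        }
        return hasAggregate ? nonAggregates : new ArrayList<String>();
    }

    public static void main(String[] args) {
        List<SelectItem> select = Arrays.asList(
            new SelectItem("e.department", false),
            new SelectItem("count(e.id)", true));
        System.out.println("group by " + String.join(", ", deriveGroupBy(select)));
    }
}
```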
The from clause is just a list of table references plus zero or more join clauses. Each join clause has a type (inner/left/right) and a table reference. A table reference is a table name plus an optional alias.
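The from clause model could look something like the following Java sketch. The names (FromClauseSketch, TableReference, Join, JoinType) are invented for illustration, and for brevity it holds a single root table reference rather than a list of them.

```java
import java.util.ArrayList;
import java.util.List;

public class FromClauseSketch {

    /** Join types mentioned in the text, with the keyword each one renders to. */
    enum JoinType {
        INNER("inner join"), LEFT("left join"), RIGHT("right join");
        final String hql;
        JoinType(String hql) { this.hql = hql; }
    }

    /** Table (or entity/association path) name plus an optional alias. */
    static final class TableReference {
        final String name;
        final String alias; // may be null
        TableReference(String name, String alias) { this.name = name; this.alias = alias; }
        String render() { return alias == null ? name : name + " " + alias; }
    }

    static final class Join {
        final JoinType type;
        final TableReference target;
        Join(JoinType type, TableReference target) { this.type = type; this.target = target; }
        String render() { return type.hql + " " + target.render(); }
    }

    /** One root table reference followed by zero or more joins. */
    static final class FromClause {
        private final TableReference root;
        private final List<Join> joins = new ArrayList<>();
        FromClause(TableReference root) { this.root = root; }

        FromClause join(JoinType type, TableReference target) {
            joins.add(new Join(type, target));
            return this;
        }

        String render() {
            StringBuilder sb = new StringBuilder("from ").append(root.render());
            for (Join join : joins) {
                sb.append(' ').append(join.render());
            }
            return sb.toString();
        }
    }

    /** Example: an entity with one left join over an association path. */
    static String example() {
        return new FromClause(new TableReference("Employee", "e"))
            .join(JoinType.LEFT, new TableReference("e.department", "d"))
            .render();
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```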
Plus, if you ever get into wanting to parse a query language (or anything really) then I can highly recommend ANTLR. Learning curve is quite steep but there are plenty of example grammars to look at.
HTH.

If you need an EJB-QL parser and data structures, EclipseLink (well, several of its internal classes) has a good one:
JPQLParseTree tree = org.eclipse.persistence.internal.jpa.parsing.jpql.JPQLParser.buildParserFor(" _the_ejb_ql_string_ ").parse();
JPQLParseTree contains all the data.
But generating EJB-QL back from a modified JPQLParseTree is something you have to do yourself.

Related

How to implement conditional relationship clause in cypher for Spring Neo4j query?

I have a cypher query called from Java (Spring Neo4j) that I'm using, which works.
The basic query is straightforward - it asks what nodes lead to another node.
The gist of the challenge is that there are a set (of three) items - contributor, tag, and source - that the query might need to filter its results by.
Two of these - tag and source - bring with them a relationship that needs to be tested. But if these are not provided, then that relationship does not need to be tested.
The Neo4j browser warns me that the query I'm using will be inefficient, for probably obvious reasons:
@Query("MATCH (p:BoardPosition)<-[:PARENT]-(n:BoardPosition)<-[:PARENT*0..]-(c:BoardPosition),
(t:Tag), (s:JosekiSource) WHERE
id(p)={TargetID} AND
({ContributorID} IS NULL or c.contributor = {ContributorID}) AND
({TagID} IS NULL or ((c)-->(t) AND id(t) = {TagID}) ) AND
({SourceID} IS NULL or ((c)-->(s) AND id(s) = {SourceID}) )
RETURN DISTINCT n"
)
List<BoardPosition> findFilteredVariations(@Param("TargetID") Long targetId,
@Param("ContributorID") Long contributorId,
@Param("TagID") Long tagId,
@Param("SourceID") Long sourceId);
(Java multi-line query string concatenation removed from the code for clarity)
It would appear that this would be more efficient if the conditional inclusion of the relationship-test could be in the MATCH clause itself. But that does not appear possible.
Is there some in-cypher way I can improve the efficiency of this?
FWIW: the Neo4j browser warning is:
This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing.
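One direction that might be worth trying, which is not from the original post, is to assemble the Cypher text in Java so that the Tag and JosekiSource patterns are only added when the corresponding ID is supplied, and always attached to c rather than listed as disconnected patterns. The sketch below only builds the string. The class and method names are hypothetical, the {Name} parameter placeholders are copied from the query above, and because an annotation value has to be a constant, the result would need to be run through whatever session or template API the project already uses (not shown here).

```java
public class VariationQuerySketch {

    /**
     * Builds the Cypher text, adding each optional filter only when its ID
     * is present. Every pattern is connected through c, so no disconnected
     * (t:Tag) or (s:JosekiSource) pattern is matched on its own.
     */
    static String buildQuery(Long contributorId, Long tagId, Long sourceId) {
        StringBuilder q = new StringBuilder(
            "MATCH (p:BoardPosition)<-[:PARENT]-(n:BoardPosition)"
            + "<-[:PARENT*0..]-(c:BoardPosition)");
        q.append(" WHERE id(p) = {TargetID}");
        if (contributorId != null) {
            q.append(" AND c.contributor = {ContributorID}");
        }
        if (tagId != null) {
            q.append(" MATCH (c)-->(t:Tag) WHERE id(t) = {TagID}");
        }
        if (sourceId != null) {
            q.append(" MATCH (c)-->(s:JosekiSource) WHERE id(s) = {SourceID}");
        }
        q.append(" RETURN DISTINCT n");
        return q.toString();
    }

    public static void main(String[] args) {
        // Only the tag filter is supplied here.
        System.out.println(buildQuery(null, 42L, null));
    }
}
```

Whether this actually runs faster would have to be checked against the real data with PROFILE or EXPLAIN.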

Linq to sql/xml , to object ,and to dataset comparison

Hello all,
I have read a lot of LINQ articles, but one thing I still don't get is the difference between the LINQ types (LINQ to SQL, LINQ to XML, LINQ to Objects, LINQ to DataSet).
I need a simple comparison to clarify this, especially from a memory point of view.
Thanks
"Linq" stands for "Language Integrated Query" --- basically, it means that the query syntax keywords (from, where, select, etc.) are now officially part of the language.
Now, at the highest level, a query against an array or a database table is doing essentially the same thing --- but the actual mechanics of how the query happens are quite different.
Linq2Sql, Linq2Object et al, are different subsystems which allow very different queries to be expressed using a common syntax.

Clean way to write this query

I'm looking for a clean way to write this Linq query.
Basically I have a collection of objects with ids. Using NHibernate and Linq, I need to check whether the NHibernate entity has a subclass collection in which all the ids from the object collection exist.
If there was just one item this would work:
var objectImCheckingAgainst = ... //irrelevant
where Obj.SubObj.Any(a => a.id == objectImCheckingAgainst.Id)
Now I want to instead somehow pass a list of objectImCheckingAgainst and return true only if the Obj.SubObj collection contains all items in list of objectImCheckingAgainst based on Id.
I like to use GroupJoin for this.
return objectImCheckingAgainst.GroupJoin(Obj.SubObj,
a => a.Id,
b => b.id,
(a, b) => b.Any())
.All(c => c);
I believe this query should be more or less self-explanatory, but essentially, this joins the two collections using their respective ids as keys, then groups those results. Then for each of those groupings, it determines whether any matches exist. Finally, it ensures that all groupings had matches.
A useful alternative that I sometimes use is .Count() == 1 instead of the .Any(). Obviously, the difference there is whether you want to support multiple elements with the same id matching. From your description, it sounded like that either doesn't matter or is enforced by another means. But that's an easy swap, either way.
An important concept in GroupJoin that I know is relevant, but may or may not be obvious, is that the first enumerable (which is to say, the first argument to the extension method, or objectImCheckingAgainst in this example) will have all its elements included in the result, but the second one may or may not. It's not like Join, where the ordering is irrelevant. If you're used to SQL, these are the elementary beginnings of a LEFT OUTER JOIN.
Another way you could accomplish this, somewhat more simply but not as efficiently, would be to simply nest the queries:
return objectImCheckingAgainst.All(c => Obj.SubObj.Any(x => x.id == c.Id));
I say this because it's pretty similar to the example you provided.
I don't have any experience with NHibernate, but I know many ORMs (I believe EF included) will map this to SQL, so efficiency may or may not be a concern. But in general, I like to write LINQ as close to par as I can so it works as well in memory as against a database, so I'd go with the first one I mentioned.
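For the in-memory case, the difference between the nested form and a keyed lookup can be sketched in Java (a streams analogue, not C# and not NHibernate). The class and method names are hypothetical, and plain integer ids stand in for the objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsAllSketch {

    /** Nested form: for every wanted id, scan the whole sub-collection. */
    static boolean nested(List<Integer> wantedIds, List<Integer> subIds) {
        return wantedIds.stream()
            .allMatch(w -> subIds.stream().anyMatch(s -> s.equals(w)));
    }

    /** Keyed form: index the sub-collection once, then look each id up. */
    static boolean keyed(List<Integer> wantedIds, List<Integer> subIds) {
        Set<Integer> index = new HashSet<>(subIds);
        return wantedIds.stream().allMatch(index::contains);
    }

    public static void main(String[] args) {
        List<Integer> sub = Arrays.asList(1, 2, 3);
        System.out.println(nested(Arrays.asList(1, 2), sub)); // true
        System.out.println(keyed(Arrays.asList(1, 4), sub));  // false
    }
}
```

Both return the same answer. The keyed version avoids rescanning the sub-collection for every id, which only matters when the collections are large and evaluated in memory.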
I'm not well versed in LINQ-to-NHibernate, but when using LINQ against any SQL backend it's always important to keep an eye on the generated SQL. I think this where clause...
where Obj.SubObj.All(a => idList.Contains(a.id))
...will produce the best SQL (having an IN statement).
idList is a list of Ids extracted from the list of objectImCheckingAgainst objects.

Non aggregate functions, relational algebra

How can we translate the non-aggregate functions of Structured Query Language into relational algebra expressions? I know how to express the aggregate functions, but what about the non-aggregate ones?
E.g. how can we write the Year() function applied to a date column? Just Year(date)?
select e.name,year(e.dateOfEmployment) from Employees e
Thanks!
(This is a very reasonable question, I don't understand why it should get downvoted.)
The "Relational" in RA means expressing functions as mathematical relations -- using a set-theoretic approach. (It doesn't mean, as often thought, relating one table or datum to another as in "Entity Relational" modelling.) I can't grab a very succinct reference for this off the top of my head, but start here http://en.wikipedia.org/wiki/Binary_relation and follow the links.
How does this get to answer your question in context of a practical RA? Have a look at this:
http://www.dcs.warwick.ac.uk/~hugh/TTM/APPXA.pdf, especially the section Treating Operators as Relations.
See how the relations PLUS and SQRT can be 'applied' (using COMPOSE, which is a shorthand for Natural JOIN and PROJECT) to behave as a function.
For your specific question, you need a relation with two attributes (type Date and Year).
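To make the "function as a relation" idea a bit more concrete, here is a small Java sketch that treats YEAR as a set of (date, year) tuples, joins it to the employees on the date, and projects onto name and year. All names are hypothetical. Conceptually the relation has a tuple for every possible date, but the sketch only materializes the tuples for dates that actually occur.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class YearRelationSketch {

    static final class Employee {
        final String name;
        final LocalDate dateOfEmployment;
        Employee(String name, LocalDate dateOfEmployment) {
            this.name = name;
            this.dateOfEmployment = dateOfEmployment;
        }
    }

    /** One tuple of the two-attribute relation YEAR(date, year). */
    static final class DateYear {
        final LocalDate date;
        final int year;
        DateYear(LocalDate date, int year) { this.date = date; this.year = year; }
    }

    /** Materializes only the YEAR tuples for dates that occur in the input. */
    static List<DateYear> yearRelationFor(List<Employee> employees) {
        Set<LocalDate> dates = new LinkedHashSet<>();
        for (Employee e : employees) dates.add(e.dateOfEmployment);
        List<DateYear> relation = new ArrayList<>();
        for (LocalDate d : dates) relation.add(new DateYear(d, d.getYear()));
        return relation;
    }

    /** Join on the date attribute, then project onto (name, year). */
    static List<String> nameAndYear(List<Employee> employees, List<DateYear> year) {
        List<String> result = new ArrayList<>();
        for (Employee e : employees) {
            for (DateYear t : year) {
                if (e.dateOfEmployment.equals(t.date)) {
                    result.add(e.name + "," + t.year);
                }
            }
        }
        return result;
    }

    static List<String> sample() {
        List<Employee> employees = Arrays.asList(
            new Employee("Ann", LocalDate.of(2015, 3, 1)),
            new Employee("Bob", LocalDate.of(2020, 7, 15)));
        return nameAndYear(employees, yearRelationFor(employees));
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```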

Compound "from" clause in Linq query in Entity Framework

I've been working with Entity Framework for a few weeks now. I have been working with Linq-Objects and Linq-SQL for years. A lot of times, I like to write linq statements like this:
from student in db.Students
from score in student.Scores
where score > 90
select student;
With other forms of linq, this returns distinct students who have at least one score greater than 90. However, in EF this query returns one student for every score greater than 90.
Does anyone know if this behavior can be replicated in unit tests? Is it possible this is a bug in EF?
I don't like that SQL-like syntax (I have no better name for it), especially when you start nesting them.
var students = db.Students.Where(student
=> student.Scores.Any(score => score > 90)
)
.ToList();
This snippet, using the method syntax, does the same thing. I find it far more readable. It's more explicit in the order of operations used.
And as far as I have experienced, EF hasn't yet shown a bug with its selection using method syntax.
Edit
To actually answer your problem:
However, in EF this query returns one student for every score greater than 90.
I think this is due to a JOIN statement used in the final SQL that will be run. This is why I avoid SQL-like syntax: it becomes very hard to differentiate between what you want to retrieve (students) and what you want to filter with (scores).
Much like you would do in SQL, you are joining the data from students and scores, and then running a filtering operation on that collection. It then becomes harder to separate that result back into a collection of students. I think this is the main cause of your issue. It's not a bug per se, but I think EF can only handle it one way.
Alternative solutions to the above:
If it returns one student per score over 90, take the distinct students returned. It should be the same result set.
Use more explicit parentheses () and formatting to nest separate SQL-like statements.
Note: I'm not saying it can't be done with SQL-like syntax. I am well aware that most of this answer is opinion based.
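The join-versus-filter difference described in this answer can be illustrated with an in-memory Java streams analogue. This is not EF or LINQ itself, and the Student class and method names are made up. Flattening to one row per (student, score) pair yields a student once per matching score, while filtering with an "any" test yields each student at most once.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CompoundFromSketch {

    static final class Student {
        final String name;
        final List<Integer> scores;
        Student(String name, List<Integer> scores) {
            this.name = name;
            this.scores = scores;
        }
    }

    /** Join-like: one result per (student, score) pair with score > 90. */
    static List<String> flattened(List<Student> students) {
        return students.stream()
            .flatMap(s -> s.scores.stream()
                .filter(score -> score > 90)
                .map(score -> s.name))
            .collect(Collectors.toList());
    }

    /** Filter-like: each student at most once, if any score is > 90. */
    static List<String> anyOver90(List<Student> students) {
        return students.stream()
            .filter(s -> s.scores.stream().anyMatch(score -> score > 90))
            .map(s -> s.name)
            .collect(Collectors.toList());
    }

    static List<Student> sample() {
        return Arrays.asList(
            new Student("Ann", Arrays.asList(95, 98, 70)),
            new Student("Bob", Arrays.asList(80)),
            new Student("Cy", Arrays.asList(91)));
    }

    public static void main(String[] args) {
        System.out.println(flattened(sample())); // Ann appears twice
        System.out.println(anyOver90(sample())); // Ann appears once
    }
}
```

Taking the distinct values of the flattened result gives the same names as the filter version, which matches the first alternative listed above.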
