How to implement conditional relationship clause in cypher for Spring Neo4j query? - spring-boot

I have a cypher query called from Java (Spring Neo4j) that I'm using, which works.
The basic query is straightfoward - it asks what nodes lead to another node.
The gist of the challenge is that there are a set (of three) items - contributor, tag, and source - that the query might need to filter its results by.
Two of these - tag and source - bring with them a relationship that needs to be tested. But if these are not provided, then that relationship does not need to be tested.
Neo-browser warns me that the query I'm using it will be inefficient, for probably obvious reasons:
#Query("MATCH (p:BoardPosition)<-[:PARENT]-(n:BoardPosition)<-[:PARENT*0..]-(c:BoardPosition),
(t:Tag), (s:JosekiSource) WHERE
id(p)={TargetID} AND
({ContributorID} IS NULL or c.contributor = {ContributorID}) AND
({TagID} IS NULL or ((c)-->(t) AND id(t) = {TagID}) ) AND
({SourceID} IS NULL or ((c)-->(s) AND id(s) = {SourceID}) )
RETURN DISTINCT n"
)
List<BoardPosition> findFilteredVariations(#Param("TargetID") Long targetId,
#Param("ContributorID") Long contributorId,
#Param("TagID") Long tagId,
#Param("SourceID") Long sourceId);
(Java multi-line query string catenation removed from code for clarity)
It would appear that this would be more efficient if the conditional inclusion of the relationship-test could be in the MATCH clause itself. But that does not appear possible.
Is there some in-cypher way I can improve the efficiency of this?
FWIW: the Neo4j browser warning is:
This query builds a cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing.

Related

JPA #Query count AND select

I have a somewhat complicated #Query in a JpaRepository.
I need to get the results of this query in two forms (but not at the same time!):
First, the client asks for a count of the number of results: SELECT COUNT(x.*) FROM my_table x ...
Then later (maybe), they want to see the actual data: SELECT x.* FROM my_table x ...
What follows (the ...) is identical for both queries. Is there any way to combine these so that I don't repeat myself?
I know I could just use the second method, and count the number of elements in the resulting List. However, this adds the overhead of actually fetching all those elements from the database.
I could put the ... in a String constant somewhere, but that kind of separates it from its context (I'd lose IntelliJ's syntax highlighting/error checking)
I can't convert it to a Criteria or Example query, because I need to use PostGIS's geography type. (And these are less readable anyway...)
Any other ideas?
If your worries is about some developer change the COUNT query and forgot to change the SELECT query too, you can create a repository integration test to guarantee the expected result between the two queries.
Another alternative is create a unit test to read the annotation content and verify if the final of these two queries are equal.

Compound "from" clause in Linq query in Entity Framework

I've been working with Entity Framework for a few weeks now. I have been working with Linq-Objects and Linq-SQL for years. A lot of times, I like to write linq statements like this:
from student in db.Students
from score in student.Scores
where score > 90
select student;
With other forms of linq, this returns distinct students who have at least one score greater than 90. However, in EF this query returns one student for every score greater than 90.
Does anyone know if this behavior can be replicated in unit tests? Is it possible this is a bug in EF?
I don't like that SQL-like syntax (I have no better name for it), especially when you start nesting them.
var students = db.Students.Where(student
=> student.Scores.Any(score => score > 90)
)
.ToList();
This snippet, using the method syntax, does the same thing. I find it far more readable. It's more explicit in the order of operations used.
And as far as I have experienced, EF hasn't yet shown a bug with its selection using method syntax.
Edit
To actually answer your problem:
However, in EF this query returns one student for every score greater than 90.
I think is is due to a JOIN statement used in the final SQL that will be run. This is why I avoid SQL-like syntax, because it becomes very hard to differentiate between what you want to retrieve (students) and what you want to filter with (scores).
Much like you would do in SQL, you are joining the data from students and scores, and then running a filtering operation on that collection. It becomes harder to then unseparate that result again into a collection of students. I think this is the main cause of your issue. It's not a bug per sé, but I think EF can only handle it one way.
Alternative solutions to the above:
If it returns one student per score over 90, take the distinct students returned. It should be the same result set.
Use more explicit parentheses () and formatting to nest separate SQL-like statements.
Note: I'm not saying it can't be done with SQL-like syntax. I am well aware that most of this answer is opinion based.

Can IQueryable<> only contain instructions (in the form of expression tree) on how to get the initial sequence, but not

1)
public class Query<T> : IQueryable<T> ...
{
...
public IEnumerator<T> GetEnumerator()
{
return((IEnumerable<T>)this.provider.Execute(this.expression)).GetEnumerator();
}
}
Query<string> someQuery = new Query<string>();
someQuery.Expression holds the expression tree associated with this particular instance of IQueryable and thus describes how to retrieve the initial sequence of items. In the following example initialSet variable actually contains a sequence of strings:
string[] initialSet = { };
var results1 = from x in initialSet
where ...
select ...;
a) Can someQuery also contain a sequence of strings, or can it only contain instructions ( in the form of expression tree ) on how to get this initial sequence of strings from some DB?
b) I assume that even if someQuery actually contains an initial sequence of strings, it is of no use to Where and Select operators, since they won't ever operate on this sequence of strings, but instead their job is only to build queries or request queries to be executed ( by calling IQueryProvider.Execute)? And for this reason someQuery must always contain an expression tree describing how too get the initial sequence of strings, even if someQuery already contains this initial sequence?
Thank you
EDIT:
c) The way I understood your post is that query provider may contain information about describing the table or at least describing particular DB rows which initial query needs to retrieve. But I didn't interpret your answer as saying that query provider may also contain actual elements required by this initial query ( someQuery in our example )?
d) Regardless, I assume even if query provider maintains actual elements, it can only maintain them for initial query? Thus if we apply Linq-to-entity or Linq-to-Sql operators on that initial query, I assume provider will have to query the database. As such, if my assumption are correct then answer to b) would be even if query does contain actual elements, when we call Where on someQuery ( someQuery.Where ),query provider will have to retrieve results from a DB, even if this query provider already contains all the elements of someQuery?
e) I only started learning Linq-to-entities, so my question may be too general, but how does EF handle all of this? In other words, when does ObjectSet<T> returned by some EF API ( such as ObjectContext ), contain actual elements and when does it ( if ever ) contain only logic for retrieving elements from some data source (such as DB)?
f) Also, even if ObjectSet<T> ( returned by say ObjectSet ) does contain actual elements, I assume if we apply Where operator on it ( ObjectSet<T>.Where ), query provider will always have to retrieve results from the DB?
a) You wouldn't normally create a Query<T> yourself - a query provider would. It can choose to include whatever information it wants. It's most likely to just contain the information about what table it's associated with though.
b) That's entirely up to the query provider. As you saw in the other question, the query provider may end up recognizing when it's reached a Query<T> - so it could know to ask the Query<T> for its strings, if that was appropriate.
c) The query provider wouldn't usually contain data itself - but it could do. It's up to the provider.
d) The query provider may notice that it's within a transaction, and that it's already executed a similar context in the query - it may be able to answer the query from within its cache. It's up to the query provider.
e, f) No idea, I've never used Entity Framework in anger.
The answer to almost all of your questions around this is topic is "it's up to the query provider". For details about a particular query provider, you should read the documentation for that provider. That should explain when it will make queries etc. It's unclear what you're really trying to get out of these questions - but if you're after a full implementation to study, there are plenty of open source LINQ providers around. You might want to look at NHibernate for example.

LINQ - Using where or join - Performance difference?

Based on this question:
What is difference between Where and Join in linq?
My question is following:
Is there a performance difference in the following two statements:
from order in myDB.OrdersSet
from person in myDB.PersonSet
from product in myDB.ProductSet
where order.Persons_Id==person.Id && order.Products_Id==product.Id
select new { order.Id, person.Name, person.SurName, product.Model,UrunAdı=product.Name };
and
from order in myDB.OrdersSet
join person in myDB.PersonSet on order.Persons_Id equals person.Id
join product in myDB.ProductSet on order.Products_Id equals product.Id
select new { order.Id, person.Name, person.SurName, product.Model,UrunAdı=product.Name };
I would always use the second one just because it´s more clear.
My question is now, is the first one slower than the second one?
Does it build a cartesic product and filters it afterwards with the where clauses ?
Thank you.
It entirely depends on the provider you're using.
With LINQ to Objects, it will absolutely build the Cartesian product and filter afterwards.
For out-of-process query providers such as LINQ to SQL, it depends on whether it's smart enough to realise that it can translate it into a SQL join. Even if LINQ to SQL doesn't, it's likely that the query engine actually performing the query will do so - you'd have to check with the relevant query plan tool for your database to see what's actually going to happen.
Side-note: multiple "from" clauses don't always result in a Cartesian product - the contents of one "from" can depend on the current element of earlier ones, e.g.
from file in files
from line in ReadLines(file)
...
My question is now, is the first one slower than the second one? Does it build a cartesic product and filters it afterwards with the where clauses ?
If the collections are in memory, then yes. There is no query optimizer for LinqToObjects - it simply does what the programmer asks in the order that is asked.
If the collections are in a database (which is suspected due to the myDB variable), then no. The query is translated into sql and sent off to the database where there is a query optimizer. This optimizer will generate an execution plan. Since both queries are asking for the same logical result, it is reasonable to expect the same efficient plan will be generated for both. The only ways to be certain are to
inspect the execution plans
or measure the IO (SET STATISTICS IO ON).
Is there a performance difference
If you find yourself in a scenario where you have to ask, you should cultivate tools with which to measure and discover the truth for yourself. Measure - not ask.

Data structure to hold HQL or EJB QL

We need to produce a fairly complex dynamic query builder for retrieving reports on the fly. We're scratching our heads a little on what sort of data structure would be best.
It's really nothing more than holding a list of selectParts, a list of fromParts, a list of where criteria, order by, group by, that sort of thing, for persistence. When we start thinking about joins, especially outer joins, having clauses, and aggregate functions, things start getting a little fuzzy.
We're building it up interfaces first for now and trying to think as far ahead as we can, but definitely will go through a series of refactorings when we discover limitations with our structures.
I'm posting this question here in the hopes that someone has already come up with something that we can base it on. Or know of some library or some such. It would be nice to get some tips or heads-up on potential issues before we dive into implementations next week.
I've done something similar couple of times in the past. A couple of the bigger things spring to mind..
The where clause is the hardest to get right. If you divide things up into what I would call "expressions" and "predicates" it makes it easier.
Expressions - column references, parameters, literals, functions, aggregates (count/sum)
Predicates - comparisons, like, between, in, is null (predicates have expression as children, e.g. expr1 = expr2. Then you also having composites such as and/or/not.
The whole where clause, as you can imagine, is a tree with a predicate at the root, with maybe sub-predicates underneath eventually terminating with expressions at the leaves.
To construct the HQL you walk the model (depth first usually). I used a visitor as I need to walk my models for other reasons, but if you don't have multiple purposes you can build the rendering code right into the model.
e.g. If you had
"where upper(column1) = :param1 AND ( column2 is null OR column3 between :param2 and param3)"
Then the tree is
Root
- AND
- Equal
- Function(upper)
- ColumnReference(column1)
- Parameter(param1)
- OR
- IsNull
- ColumnReference(column2)
- Between
- ColumnReference(column3)
- Parameter(param2)
- Parameter(param3)
Then you walk the tree depth first and merge rendered bits of HQL on the way back up. The upper function for example would expect one piece of child HQL to be rendered and it would then generate
"upper( " + childHql + " )"
and pass that up to it's parent. Something like Between expects three child HQL pieces.
You can then re-use the expression model in the select/group by/order by clauses
You can skip storing the group by if you wish by just storing the select and before query construction scan for aggregate. If there is one or more then just copy all the non-aggregate select expressions into the group by.
From clause is just a list of table reference + zero or more join clauses. Each join clause has a type (inner/left/right) and a table reference. Table reference is a table name + optional alias.
Plus, if you ever get into wanting to parse a query language (or anything really) then I can highly recommend ANTLR. Learning curve is quite steep but there are plenty of example grammars to look at.
HTH.
if you need EJB-QL parser and data structures, EclipseLink (well several of it's internal classes) have good one:
JPQLParseTree tree = org.eclipse.persistence.internal.jpa.parsing.jpql.JPQLParser.buildParserFor(" _the_ejb_ql_string_ ").parse();
JPQLParseTree contains the all the data.
but generating EJB-QL back from modified JPQLParseTree is something you have to do yourself.

Resources