Neo4j Spring OGM query for a list of entities always returns distinct results

Let's say I have a graph:
A - follows -> B
A - follows -> C
Now I have a query to get the followers of both B and C (which should return A twice).
MATCH (a)<-[:FOLLOWS]-(followers)
WHERE a.username IN ['B','C']
RETURN followers
If I run this query in the Neo4j Browser, I get 2 records: node A twice. This is correct.
If I run the same query through a Spring repository, I get a list with only one object (A).
So, through a Spring repository, any query for entities behaves as if I had added DISTINCT; there is no difference between the regular query and the DISTINCT one.
If I query for a property of the node, e.g. A.username, I get a list with two duplicate strings (as intended).
Is this behaviour expected?
Why?
Is there a way to query for full entities with duplicates, the same way a Cypher query works in Neo4j itself?

In general this behaviour is correct:
Node A is always the same node and gets mapped to one object. It would not make sense to create the very same object twice.
Your question doesn't say what the query should map to, but assuming it should create a List<A>, this behaviour is correct.
Returning a.username will not map to any entity; it can only be collected in a projection / @QueryResult. Such a result has no concept of identity or equality and is created for every "row" returned in the response.
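The identity behaviour described above can be pictured with a small plain-Java sketch. It does not use Spring Data Neo4j or the OGM; `Person`, `IdentityMappingSketch` and the `{nodeId, username}` row shape are hypothetical stand-ins. Rows that reference the same node id resolve to one shared object, while a projection produces one value per row:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a mapped node entity; not an OGM class.
final class Person {
    final long id;
    final String username;

    Person(long id, String username) {
        this.id = id;
        this.username = username;
    }
}

final class IdentityMappingSketch {
    // Each row is {nodeId, username}, as it might come back from the Cypher query.
    static List<Person> mapAsEntities(List<Object[]> rows) {
        // Identity map: at most one Person instance per node id.
        Map<Long, Person> identityMap = new LinkedHashMap<>();
        for (Object[] row : rows) {
            long id = (Long) row[0];
            identityMap.computeIfAbsent(id, k -> new Person(k, (String) row[1]));
        }
        // Collecting the mapped entities yields each node once.
        return new ArrayList<>(identityMap.values());
    }

    // A projection has no identity: one value is created per returned row.
    static List<String> mapAsProjection(List<Object[]> rows) {
        List<String> result = new ArrayList<>();
        for (Object[] row : rows) {
            result.add((String) row[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two rows, both pointing at node A (id 1), as in the question.
        List<Object[]> rows = List.of(new Object[] {1L, "A"}, new Object[] {1L, "A"});
        System.out.println(mapAsEntities(rows).size());   // 1
        System.out.println(mapAsProjection(rows).size()); // 2
    }
}
```

The `LinkedHashMap` keeps nodes in the order they first appear in the result. If one element per row is really needed, the projection-style path (returning properties per row) is the one that keeps duplicates.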

Related

Errors using Specification<T>: returns one object out of 2 matching the search criteria

When I use a Specification, it returns only one of the 2 objects matching the search criteria, while the builder returns the correct value.
springDatabaseItemRepository.findAll(builder.build(), pageable);
I'm combining 3 conditions. One of them joins all the queries and works. Of the other two, one is a simple query and the other is a join, and they are connected using OR. After processing, only the results of the join are returned to me.
I tried changing the processing and the conditions and reading the documentation, but it didn't help. Also, if I call the join and the simple query separately, each returns the correct results.

Spring JPA paginated query with Join Fetch - Count Query gives fetch error

(Note: all code examples are extremely simple. I know there are other ways to do such simple queries. The problem I am demonstrating, however, is a bigger deal for more complex queries).
There is a known issue with Spring JPA Repositories and paginated queries that I'm really hoping there is a good solution for. In JPQL, it is possible to use JOIN FETCH to specify that I want to eagerly fetch a related entity, rather than doing it lazily. This avoids the N+1 problem, among other things. JOIN FETCH requires that the owner of the association is included in the select clause. Here is a very simple example of the type of query I'm talking about:
@Query("""
SELECT p
FROM Person p
JOIN FETCH p.address
""")
Page<Person> getPeopleAndAddresses(Pageable page);
The problem with this kind of query is the pagination piece. When returning a Page, Spring runs the query I wrote but then also runs a count query to get the total number of records. Spring appears to take my query exactly as written and just replace SELECT p with SELECT COUNT(p). Doing this means that the owner of the JOIN FETCH association is no longer present in the SELECT clause, which makes the JPQL query fail.
The only way I know how to resolve this is to construct the query with separate query and countQuery values, like this:
@Query(value = """
SELECT p
FROM Person p
JOIN FETCH p.address
""", countQuery = """
SELECT COUNT(p)
FROM Person p
""")
Page<Person> getPeopleAndAddresses(Pageable page);
This resolves the JPQL JOIN FETCH error, because the count query no longer contains a JOIN FETCH clause. However, for complex queries with sophisticated JOINs and WHERE clauses, this will lead to excessive code duplication as I will have to write all that logic in two places.
This seems like the kind of issue where there really should be a better solution available. I'm exploring various alternatives, including Specifications (mixed feelings), Blaze Persistence, and others. However, I'm wondering if there is some way in Spring itself to resolve this issue so that the first code example would work without an error?
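Java annotation attribute values may be compile-time constant expressions, including concatenations of `static final String` constants. One way to reduce the duplication described above, sketched under that assumption rather than as a Spring feature, is to keep the shared part of the query in one constant and build both `value` and `countQuery` from it. `PersonQueries` and the `p.active` filter are hypothetical; the block only builds the strings so it runs without Spring, and in a repository the constants would be referenced as `@Query(value = PersonQueries.PAGE_QUERY, countQuery = PersonQueries.COUNT_QUERY)`:

```java
// Hypothetical holder for JPQL pieces. Every field is a compile-time constant,
// so each could be used as an annotation attribute value.
final class PersonQueries {
    private PersonQueries() {}

    // Shared filtering logic, written once (the condition is illustrative).
    static final String FILTER = " WHERE p.active = true";

    // Data query: eagerly fetches the address association.
    static final String PAGE_QUERY =
            "SELECT p FROM Person p JOIN FETCH p.address" + FILTER;

    // Count query: same filter, but no JOIN FETCH, so it is valid JPQL.
    static final String COUNT_QUERY =
            "SELECT COUNT(p) FROM Person p" + FILTER;

    public static void main(String[] args) {
        System.out.println(PAGE_QUERY);
        System.out.println(COUNT_QUERY);
    }
}
```

Plain JOINs that the WHERE clause depends on can go into the shared fragment as well; only the JOIN FETCH stays out of the count query. Note that `JOIN FETCH` is an inner join, so people without an address are excluded from the page but still counted; `LEFT JOIN FETCH` keeps the two consistent.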

Room relational query method with paging

Room 2.4 introduced a new feature, relational query methods in DAOs: you can write a custom query that selects columns from 2 entities, and Room aggregates the result into a Map<TableA, List<TableB>> return type.
I have a fairly complicated query which does a LEFT JOIN with nested queries to return a map of railway stations and their associated rail lines (a many-to-many relationship). I tried making a @Query method return Flow<Map<RailStation, List<RailLine>>>, and it returns the map that I want.
Now I want to go one step further and make it return a Paging 3 PagingSource. As the original type is a Map, I think the paging @Query method should return PagingSource<Int, Map.Entry<RailStation, List<RailLine>>>. (Map.Entry should be the type of a single list item, whereas Map represents the whole query result.) However, the Room annotation processor complained about this line, saying that it cannot handle this type:
[ksp] RailStationDao.kt:130: Not sure how to convert a Cursor to this method's return type (androidx.paging.PagingSource<java.lang.Integer, java.util.Map.Entry<RailStation, java.util.List<RailLine>>>).
So my question is: do the Room annotation processor and Paging 3 support relational query methods with paging? If not, is there any alternative way to achieve the same goal? It seems like the @Relation annotation in Room only supports simple table joins, but in my case I need to write a nested query in the LEFT JOIN clause.
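For reference, the aggregation behind a `Map<TableA, List<TableB>>` return type can be pictured as grouping flat joined rows by the left-hand entity. The plain-Java sketch below is not Room's generated code; `Row` and the string station/line values are hypothetical, and treating a `null` line (a station with no lines from the LEFT JOIN) as an empty list is a choice made for the sketch. It is roughly the work you would have to do yourself if you paged over a flat row type instead of the map:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StationLineGrouping {
    // Hypothetical flat row: one station/line pair per row of the LEFT JOIN.
    // A station without lines arrives with line == null.
    static final class Row {
        final String station;
        final String line;

        Row(String station, String line) {
            this.station = station;
            this.line = line;
        }
    }

    // Groups rows by station, keeping query order, like a Map<Station, List<Line>>.
    static Map<String, List<String>> group(List<Row> rows) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Row row : rows) {
            List<String> lines = result.computeIfAbsent(row.station, k -> new ArrayList<>());
            if (row.line != null) {
                lines.add(row.line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("Central", "Red"),
                new Row("Central", "Blue"),
                new Row("Harbour", null));
        System.out.println(group(rows)); // {Central=[Red, Blue], Harbour=[]}
    }
}
```

If pages are cut over flat rows, one station's lines can be split across two pages, so such a grouping would need to merge entries across page boundaries.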

Elasticsearch: deciding which query should run first

We have a simple web page where the user can provide some input and query the database. We currently use MongoDB but want to migrate to Elasticsearch, since the queries are faster.
There are some required search fields, like start and end date, and some optional ones, like a search string to match an entry, or a parent search string to match parent entries. Parent-child relations are described only through a field containing each entry's ancestor ids.
The question is the following: if both the search string and the parent search string are provided, is there a way to know, before executing the queries, which query should be executed first in order to return results faster?
For example, a specific parent search might match only 2 docs/parent entries, and then we can fetch all children matching the search string. In that case we should execute the parent query first and then the entry query.
One option would be to get the count of both queries and then execute first the one with the smaller count, but isn't this solution worse, since the queries would be executed twice? Once for the count and once for the actual query.
Are there any other options to solve this?
PS: We use Elasticsearch v1.7.
Example
Let's say the user wants to search for all entries matching the following fields.
searchString: type:BLOCK AND name:test
parentSearchString: name:parentTest AND NOT type:BLOCK
This means that we either have to
fetch all entries (parents) matching the parentSearchString and store their ids. Then fetch all entries that match the searchString and also contain any of the parent ids in the ancestors field.
OR
fetch all entries that match the searchString and store all their ancestor ids. Then fetch all entries that match the parentSearchString and whose id is one of those ancestor ids.
Just to clarify, both parent and child entries have exactly the same structure and reside in the same index. We cannot use different indices, since the parent-child relation can be nested up to 10 levels deep, so an entry can be both a parent and a child. An entry looks more or less like:
{
id: "e32452365321",
name: "name",
type: "type",
ancestors: "id1 id2 id3" // stored in node as an array of ids
}
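The two options described above amount to an application-side join. Below is an in-memory Java sketch of both directions, not Elasticsearch client code: `Doc` mirrors the entry shape shown above, the `FOLDER` type is made up, and the `SEARCH`/`PARENT_SEARCH` predicates stand in for the example query strings using exact string matches (an assumption; real query-string matching is analyzed). Both directions should return the same children; they differ in which id set is carried into the second step:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class ApplicationSideJoin {
    // Simplified entry: parents and children share this shape and one "index".
    static final class Doc {
        final String id;
        final String name;
        final String type;
        final List<String> ancestors; // ids of all ancestors, at any depth

        Doc(String id, String name, String type, List<String> ancestors) {
            this.id = id;
            this.name = name;
            this.type = type;
            this.ancestors = ancestors;
        }
    }

    // Stand-ins for "type:BLOCK AND name:test" and "name:parentTest AND NOT type:BLOCK".
    static final Predicate<Doc> SEARCH = d -> "BLOCK".equals(d.type) && "test".equals(d.name);
    static final Predicate<Doc> PARENT_SEARCH = d -> "parentTest".equals(d.name) && !"BLOCK".equals(d.type);

    // Option 1: parent query first (ids only), then children whose ancestors
    // contain one of those ids (a terms filter on "ancestors").
    static Set<String> parentsFirst(List<Doc> index, Predicate<Doc> search, Predicate<Doc> parentSearch) {
        Set<String> parentIds = index.stream()
                .filter(parentSearch)
                .map(d -> d.id)
                .collect(Collectors.toSet());
        return index.stream()
                .filter(search)
                .filter(d -> d.ancestors.stream().anyMatch(parentIds::contains))
                .map(d -> d.id)
                .collect(Collectors.toSet());
    }

    // Option 2: child query first (ancestors only), then parents whose id is one
    // of those ancestor ids (an ids filter), then drop children with no such parent.
    static Set<String> childrenFirst(List<Doc> index, Predicate<Doc> search, Predicate<Doc> parentSearch) {
        List<Doc> children = index.stream().filter(search).collect(Collectors.toList());
        Set<String> ancestorIds = children.stream()
                .flatMap(d -> d.ancestors.stream())
                .collect(Collectors.toSet());
        Set<String> parentIds = index.stream()
                .filter(d -> ancestorIds.contains(d.id))
                .filter(parentSearch)
                .map(d -> d.id)
                .collect(Collectors.toSet());
        return children.stream()
                .filter(d -> d.ancestors.stream().anyMatch(parentIds::contains))
                .map(d -> d.id)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Doc> index = List.of(
                new Doc("p1", "parentTest", "FOLDER", List.of()),
                new Doc("p2", "other", "FOLDER", List.of()),
                new Doc("c1", "test", "BLOCK", List.of("p1")),
                new Doc("c2", "test", "BLOCK", List.of("p2")),
                new Doc("c3", "test", "BLOCK", List.of("p2", "p1")));
        System.out.println(new TreeSet<>(parentsFirst(index, SEARCH, PARENT_SEARCH)));  // [c1, c3]
        System.out.println(new TreeSet<>(childrenFirst(index, SEARCH, PARENT_SEARCH))); // [c1, c3]
    }
}
```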
First of all, I would advise you to upgrade your Elasticsearch version, if possible. A lot has happened since 1.7, and to be honest, I can't tell whether everything written in the following article is valid for such an old version (probably it isn't).
But to your actual question: if I understand you correctly, you are trying to estimate how costly a query is for Elasticsearch? Well, you don't have to. If you provide all 'queries' in one compound query, Elasticsearch will do that for you: https://www.elastic.co/blog/elasticsearch-query-execution-order
Regarding speed, there is one other thing I can mention: calculating scores takes time. So if sorting is not based on the Elasticsearch _score, you want to use boolean filter queries. The same applies if you want to sort only by the _score of parent matches: then you could put the query for children into a filter.
Update
Thanks to your example, I now see the problem. Self-referential parent-child relations are unfortunately not supported by Elasticsearch, so your approach is probably right. You might want to check out the short chapter of the documentation about application-side joins.
So yes, in general, you want to send the second query with the smallest possible number of ids/terms. Getting counts for both queries is not as bad as you might think, because the results are most likely still cached, but does it actually help? If you're going from child to parent, you would have to count the ancestors (field values), not the actual number of documents.
I would argue that the most expensive operation is very often fetching the result source from disk. So whichever way you go, you should probably fetch only what you need in the first query. So your options are:
Fetch only the ids of parent matches, and then use a terms filter on ancestors in the second query.
Or fetch only the ancestors field of child matches, and use an ids filter in your second query.
Unfortunately, I can't help you more than that, since I don't have enough experience comparing the speed of those approaches. My guess would be that an ids filter might be faster in general. But that's just a guess...
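The remark that going from child to parent means counting ancestor values rather than documents can be sketched as follows. It assumes you already have the parent match ids and the `ancestors` values of the child matches (which means both lightweight first queries have run, the trade-off discussed above); `JoinDirection` and its tie-breaking rule are hypothetical, and the number of ids is only a rough proxy for cost, not something Elasticsearch reports:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class JoinDirection {
    enum Direction { PARENTS_FIRST, CHILDREN_FIRST }

    // parentMatchIds: ids of documents matching parentSearchString,
    //   i.e. the terms sent to "ancestors" when going parent -> child.
    // childAncestors: the ancestors field of each document matching searchString;
    //   its distinct values are the ids sent when going child -> parent.
    static Direction choose(List<String> parentMatchIds, List<List<String>> childAncestors) {
        Set<String> distinctAncestors = childAncestors.stream()
                .flatMap(List::stream)
                .collect(Collectors.toSet());
        // Prefer the direction whose second query carries fewer ids (ties go to parents first).
        return parentMatchIds.size() <= distinctAncestors.size()
                ? Direction.PARENTS_FIRST
                : Direction.CHILDREN_FIRST;
    }

    public static void main(String[] args) {
        // 2 parent matches vs. 2 child matches that reference 4 distinct ancestors.
        System.out.println(choose(List.of("p1", "p2"),
                List.of(List.of("a", "b", "p1"), List.of("a", "c")))); // PARENTS_FIRST
    }
}
```

The example in `main` shows why document counts can mislead: both sides have 2 matching documents, but the child side would send 4 ids.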

How to build a query across parent and child object fields in Oracle CRM On Demand?

As part of an integration requirement, I need to query Opportunity records that have been modified after a specific date and time.
Now, Opportunity has a child object called ProductRevenue with a one-to-many relationship. Is there any way I can construct a QueryPage request that will fetch records whose Opportunity fields 'OR' whose child ProductRevenue's fields have been modified after a specific date and time?
I have tried using the SearchSpec argument, but it does not let me query across child object fields.
For example:
ObjOpptyQueryPageInput.ListOfOpportunity.Opportunity.searchspec = "([ModifiedDate] > '01/01/2013 00:00:00') OR ([ProductRevenueData.ModifiedDate] >= '01/01/2013 00:00:00')";
[The above code, written in C#, threw an error saying: The object Opportunity does not have an integration component called ProductRevenueData.ModifiedDate.]
Any help will be greatly appreciated. Thank you.
I have been looking for the answer myself, and here is my understanding, although it is not a solution.
In Web Services 2.0, Oracle says "all parent records matching the parent criteria and only children matching the child criteria are returned."
You actually can define a searchspec on both the parent AND the child, and it does work the way Oracle describes. However, it is probably not the behavior you are looking for. When you do this, you get ALL parents that match the parent searchspec, regardless of whether their children match the child searchspec. However, those parents will only include the child(ren) that match your child searchspec in the query result. So if all you wanted was "parents that have these children" or "these children", you are out of luck, because what you get is "many parents and some of their children."
So even if you post-process the results of two queries, you will have to spend some time on it. :(
By the way, your two separate queries will have to look something like this:
Query 1:
ObjOpptyQueryPageInput.ListOfOpportunity.Opportunity.searchspec = "([ModifiedDate] > '01/01/2013 00:00:00')";
Query 2:
ObjOpptyQueryPageInput.ListOfOpportunity.Opportunity.ListOfProductRevenue.ProductRevenue.searchspec = "([ModifiedDate] >= '01/01/2013 00:00:00')";
Then post-process query 2 to remove all parents that have no children.
Then union that with the results from query 1.
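The filter-and-union step above can be sketched in Java as follows, assuming each query's response has already been deserialized into a simplified `Opportunity` holding its id and the ids of any ProductRevenue children returned with it (the class and field names are hypothetical, not the Web Services types):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class OpportunityMerge {
    // Hypothetical, simplified view of an Opportunity from a QueryPage response.
    static final class Opportunity {
        final String id;
        final List<String> productRevenueIds; // children returned with this parent

        Opportunity(String id, List<String> productRevenueIds) {
            this.id = id;
            this.productRevenueIds = productRevenueIds;
        }
    }

    // query1: parents whose own ModifiedDate matched.
    // query2: parents returned with only the children whose ModifiedDate matched.
    static List<String> unionIds(List<Opportunity> query1, List<Opportunity> query2) {
        Set<String> ids = new LinkedHashSet<>(); // de-duplicates, keeps first-seen order
        for (Opportunity o : query1) {
            ids.add(o.id);
        }
        for (Opportunity o : query2) {
            // Post-process query 2: drop parents that came back without children.
            if (!o.productRevenueIds.isEmpty()) {
                ids.add(o.id);
            }
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        List<Opportunity> q1 = List.of(new Opportunity("o1", List.of()), new Opportunity("o2", List.of()));
        List<Opportunity> q2 = List.of(
                new Opportunity("o1", List.of("r1")),
                new Opportunity("o3", List.of()),
                new Opportunity("o4", List.of("r2")));
        System.out.println(unionIds(q1, q2)); // [o1, o2, o4]
    }
}
```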
From my experience, you will not be able to do this using their V2.0 API (i.e. searchspec). You can do this using the V1.0 API, BUT it will return all parent records matching your criteria plus all related ProductRevenue records, whether they meet the criteria or not. I do something similar and then post-process the data with an XPath predicate filter. The only other option, I think, is 2 separate queries.
I had the same problem and tried many ways to resolve it, but for now you have to deal with the result as returned: you can use DOM, XPath or regular expressions to extract the information you want from the returned result.
In my case I used XPath because it's very fast and easier. Here is a link to the question I posted, with the correct answer:
XPath solution for the parent-child query result
I hope this will fix the problem.
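As an illustration of the XPath post-processing mentioned above, this Java sketch uses the JDK's built-in `javax.xml.xpath` API to keep only the Opportunities that came back with at least one ProductRevenue child. The element names (`ListOfOpportunity`, `Opportunity`, `Id`, `ListOfProductRevenue`, `ProductRevenue`) describe an assumed, simplified, namespace-free response; real responses carry namespaces, which would require a namespace-aware parser and a `NamespaceContext` (or `local-name()` tests) in the expression:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathPostFilter {
    // Opportunities with at least one ProductRevenue child; returns their own Id.
    static final String EXPRESSION = "//Opportunity[ListOfProductRevenue/ProductRevenue]/Id";

    static List<String> idsWithChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(EXPRESSION, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent());
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate response XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ListOfOpportunity>"
                + "<Opportunity><Id>o1</Id><ListOfProductRevenue>"
                + "<ProductRevenue><Id>r1</Id></ProductRevenue>"
                + "</ListOfProductRevenue></Opportunity>"
                + "<Opportunity><Id>o2</Id><ListOfProductRevenue/></Opportunity>"
                + "</ListOfOpportunity>";
        System.out.println(idsWithChildren(xml)); // [o1]
    }
}
```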
