Stored-procedure-style recursive search in Elasticsearch

I am trying to script a recursive search in Elasticsearch.
I know there are search templates, but I cannot find examples for a scenario like the one below:
`Example: father = neo
1. Search the person-index documents for the father attribute.
2. If father = neo, return direct; otherwise (here father = ted):
3. Search for ted now and check whether father = neo; if so, return indirect,
or repeat step 3 until the script finds the ancestor; if it is not found, return not related once father reaches some constant (such as progenitor or ancient).
`
This would save me from having to use a graph database when I have only one relation.
Another scenario would be finding all descendants of "neo".

There is no facility to do exactly what you are describing inside Elasticsearch at the moment.
If the number of ancestral generations is limited and they can be expressed as 1-to-many relationships, you can use multiple has_parent queries.
Alternatively, if it's possible, you can denormalize the data and store names of all ancestors for the given record in a single field. So the record would look like this:
{
"father": "neo",
"ancestors": ["neo", "ted", ... ]
}
Otherwise, you need to do these searches outside of elasticsearch.
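The "outside of Elasticsearch" option can be sketched as a client-side loop that issues one lookup per generation. This is only a minimal illustration under stated assumptions: `lookup_father` is a hypothetical stand-in for a single search against the person index (here backed by an in-memory dict), and `ROOT` is an assumed constant marking the top of the tree.

```python
from typing import Callable, Optional

ROOT = "ancient"  # assumed sentinel marking the top of the family tree

# Hypothetical stand-in for the person index: name -> father.
PEOPLE = {"sam": "ted", "ted": "neo", "neo": "ancient", "bob": "ancient"}


def lookup_father(name: str) -> Optional[str]:
    """Stand-in for one search request; returns None if the person is unknown."""
    return PEOPLE.get(name)


def relation(person: str, ancestor: str,
             fetch: Callable[[str], Optional[str]] = lookup_father,
             max_depth: int = 100) -> str:
    """Walk up the father chain, one lookup per generation."""
    current = fetch(person)
    for depth in range(max_depth):
        if current is None or current == ROOT:
            return "not related"
        if current == ancestor:
            return "direct" if depth == 0 else "indirect"
        current = fetch(current)
    return "not related"  # depth limit reached (guards against cycles in the data)


print(relation("ted", "neo"))  # direct
print(relation("sam", "neo"))  # indirect
print(relation("bob", "neo"))  # not related
```

Each generation costs one round trip, which is why the denormalized `ancestors` field is usually the cheaper option when the data allows it.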

Related

How to create an index for relationships when using apoc.algo.dijkstra in Neo4j

I want to create appropriate indexes on relationships which increase performance when using the apoc.algo.dijkstra algorithm.
My query looks like this
MATCH (a:Waypoint {name: 'nameTwo'}), (b:Waypoint{name: 'nameOne'}) CALL apoc.algo.dijkstra(a, b, 'STREET_A>|STREET_B>', 'distance') yield path as path, weight as distance RETURN path, distance;
I want indexes for the relationship types 'STREET_A' and 'STREET_B'.
I tried to create indexes like this one, but it does not seem to make a performance difference:
CREATE INDEX STREET_A_INDEX IF NOT EXISTS FOR ()-[r:STREET_A]-() ON (r.distance)
This is the result with PROFILE:
[Image: query plan]
Is it at all possible to make apoc.algo.dijkstra more performant via an index?
I don't think a relationship index will help in this case. Indexes are used in Neo4j to find the starting point of a traversal - in your case that would be finding the two Waypoint nodes. From the query plan, it looks like you already have an index that is being used to find those nodes.
Neo4j has what's called "index-free adjacency". This means that a traversal from one node to another following a relationship doesn't use an index, rather the operation is more like following pointers - so a relationship index would not be used in your example.

How to perform a Join query on Elastic Search via springboot/java?

I have a Spring Boot application that interacts with Elasticsearch (or, as it is now known, OpenSearch). It can perform basic operations such as search, index, etc. I used this as my base (although I replaced the high-level client, since it is deprecated), and to perform queries I mostly use the @Query annotation (as described in section 2.2 here, although I have also used QueryBuilders).
Now, I have an interesting use case - I would like to perform 2 queries at the same time. First query would find a file in elastic search that would contain 3 ids. These 3 ids are ids of other files in the same elastic search. The 2nd query would look for these 3 files and finally return them to me. Now, I can easily do it in 2 steps:
1. Have a query to find a file containing 3 ids and return it.
2. Have a second query (a multisearch query can do a bulk search, as I understand) to search for the 3 files using info from the first query.
However, I need them to happen within the same query - so within the same query I need to search for a file containing the 3 ids and then perform a search for these 3 files.
So currently my files in elastic search look like so:
{
"docId": "docId57",
"relatedDocs": [
{
"relatedId": "docId1",
"type": "apple"
},
{
"relatedId": "docId2",
"type": "orange"
},
{
"relatedId": "docId3",
"type": "banana"
}
]
}
and my goal is to have a query that will accept docId57 as an arg (so a method findFilesViaJoin(docId57) or something) and return a list of 3 files: file for docId1, file docId2 and file for docId3.
I know it is possible either via nested queries, child/parent queries or good old SQL queries (via JPA/Hibernate).
I attempted to use all of these and was unsuccessful for reasons described below.
Child/parent queries
For child/parent queries, I attempted to use the DSL with @Query but couldn't quite get it to work, since I don't have solid documentation to refer to (documentation that actually helps with Java rather than curl). After some time I found this and this article. I can maybe figure out how to make it work with child/parent, but neither explains how to do the mapping. If this approach can do what I want, my question is: how do I set up and map parent/child in Spring Boot?
Using SQL queries
For this one, I need to change my setup to use Hibernate. I used this as my base. It works; the only problem is that my SQL queries get ignored. Instead, the search is done based on the method's name, not on the content of @Query, so it is as if I had not used the annotation at all. Using the structure mentioned above, the following method in my app:
@Query("select t from MyModel t where t.docId = ?1")
findByRelatedDocsRelatedId(String id)
will return files that have a relatedId matching the id passed via the method argument id (as opposed to reading the query from @Query, which tells the method to search all docs based on docId). Now, I don't mind using the method name as a query to search for something. But then what would I use @Query for? (Not to mention, how do I create a name that does a join?) It might be that my Hibernate is set up wrong (I had never used it before this week). So the question here is: does anybody have a nice, complete example of Hibernate being used with Elasticsearch that does a join query?
Nested queries
For these queries, I assume that I just need to figure out what to put inside the @Query, but due to limited documentation on how to compose a nested query I didn't manage to get it even remotely working. Any concrete documentation on how to create a DSL nested query would be appreciated.
Any of the ways I described will work for me. I think child/parent seems the best choice (seeing as they seem to have been created for this purpose), but any will do.
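For reference, the two-step fallback mentioned in the question (not the single-query join being asked for) can be sketched as follows. This is a rough illustration only: `fake_search` is a hypothetical stand-in for a real search call and understands just `term` and `terms` clauses on `docId`, and it assumes `docId` is indexed as an exact-match (keyword) field.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

# Hypothetical in-memory stand-in for the index.
INDEX: List[Doc] = [
    {"docId": "docId57", "relatedDocs": [
        {"relatedId": "docId1", "type": "apple"},
        {"relatedId": "docId2", "type": "orange"},
        {"relatedId": "docId3", "type": "banana"}]},
    {"docId": "docId1", "relatedDocs": []},
    {"docId": "docId2", "relatedDocs": []},
    {"docId": "docId3", "relatedDocs": []},
    {"docId": "docId4", "relatedDocs": []},
]


def fake_search(body: Doc) -> List[Doc]:
    """Stand-in for a search request; handles only term/terms on docId."""
    query = body["query"]
    if "term" in query:
        wanted = {query["term"]["docId"]}
    else:
        wanted = set(query["terms"]["docId"])
    return [doc for doc in INDEX if doc["docId"] in wanted]


def find_files_via_join(doc_id: str,
                        search: Callable[[Doc], List[Doc]] = fake_search) -> List[Doc]:
    """Application-side join: fetch the referencing doc, then the referenced docs."""
    # Step 1: find the document that holds the related ids.
    parents = search({"query": {"term": {"docId": doc_id}}})
    if not parents:
        return []
    related_ids = [rel["relatedId"] for rel in parents[0]["relatedDocs"]]
    if not related_ids:
        return []
    # Step 2: fetch all related documents with a single terms query.
    return search({"query": {"terms": {"docId": related_ids}}})


print([doc["docId"] for doc in find_files_via_join("docId57")])
# ['docId1', 'docId2', 'docId3']
```

Wrapping both steps in one method at least gives callers the single `findFilesViaJoin(docId57)`-style entry point, even though two requests are sent.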

elasticsearch: decide which query should run first

We have a simple web page, where the user can provide some input and query the database. We currently use mongodb but want to migrate to elasticsearch, since the queries are faster.
There are some required search fields, like start and end date, and some optional ones, like a search string to match an entry, or a parent search string, to match parent entries. Parent-child relations are just described through fields containing each entry's ancestors ids.
The question is the following: If both search and parent search string are provided, is there a way to know before executing the queries, which query should be executed first, in order to provide results faster and to be more performant?
For example, it could be that a specific parent search results in only 2 docs/parent entries, and then we can fetch all children matching the search string. In that case we should execute the parent query first and then the entry query.
One option would be to get the count of both queries and then execute first the one with the smallest count, but isn't this solution worse, since the queries are going to be executed twice? Once for the count and once for the actual query.
Are there any other options to solve this?
PS. We use elasticsearch v1.7
Example
Let's say the user wants to search for all entries matching the following fields.
searchString: type:BLOCK AND name:test
parentSearchString: name:parentTest AND NOT type:BLOCK
This means that we either have to
fetch all entries (parents) matching the parentSearchString and store their ids. Then, we have to fetch all entries that match the searchString and also have to contain any of the parent ids in the ancestors field.
OR
fetch all entries that match the searchString and store all ancestors ids. Then fetch all entries that match the parentSearchString and their id is one of the ancestors ids.
Just to clarify, both parent and child entries have the exact same structure and reside in the same index. We cannot have different indices, since the parent-child relation can be nested up to 10 levels deep, so an entry can be both a parent and a child. An entry looks more or less like:
{
id: "e32452365321",
name: "name",
type: "type",
ancestors: "id1 id2 id3" // stored in node as an array of ids
}
First of all, I would advise you to upgrade your Elasticsearch version, if possible. A lot has happened since 1.7 and, to be honest, I can't tell whether everything written in the following article is valid for such an old version (it probably isn't).
But to your actual question: hopefully I am understanding you correctly, but are you trying to estimate how costly a query is for Elasticsearch? Well, you don't have to. If you provide all 'queries' in one nested query, Elasticsearch will do that for you: https://www.elastic.co/blog/elasticsearch-query-execution-order
Regarding speed, there is one other thing I can mention: calculating the score takes time. So if sorting is not based on the Elasticsearch _score, you want to use boolean filter queries. This also applies if you want to sort only by the _score of parent matches; then you could put the query for children into a filter.
Update
Thanks to your example, I now see the problem. Self-referential parent-child relations are unfortunately not supported by Elasticsearch, so your approach is probably right. You might want to check out the short chapter of the documentation about application-side joins.
So yes, in general, you want to send the second query with the smallest possible number of ids/terms. Getting counts for both queries is not as bad as you might think, because the results are most likely still cached, but does it actually help? If you're going from child to parent, you would have to count the ancestors (field values), not the actual documents.
I would argue that the most expensive operation is very often fetching the result source from disk. So whichever way you go, you should probably fetch only what you need in the first query. So your options are:
Fetch only the id of parent matches, and then use a terms filter on ancestors in the second query.
Or, fetch only the ancestors field of child matches, and use an id filter in your second query.
Unfortunately, I can't help you more than that, since I don't have enough experience comparing the speed of those approaches. My guess would be that an id filter might be faster in general, but that's just a guess.
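The two options can be sketched as plain application-side joins. The snippet below is only an illustration: it runs against a made-up in-memory list rather than Elasticsearch, the two predicate functions stand in for the example searchString and parentSearchString, and the sample entries are invented.

```python
from typing import Any, Dict, List, Set

Entry = Dict[str, Any]

# Invented sample data with the same shape as the entries in the question.
ENTRIES: List[Entry] = [
    {"id": "p1", "name": "parentTest", "type": "FOLDER", "ancestors": []},
    {"id": "p2", "name": "other", "type": "FOLDER", "ancestors": []},
    {"id": "c1", "name": "test", "type": "BLOCK", "ancestors": ["p1"]},
    {"id": "c2", "name": "test", "type": "BLOCK", "ancestors": ["p2"]},
    {"id": "c3", "name": "misc", "type": "BLOCK", "ancestors": ["p1"]},
]


def is_child_match(e: Entry) -> bool:
    """Stands in for: type:BLOCK AND name:test"""
    return e["type"] == "BLOCK" and e["name"] == "test"


def is_parent_match(e: Entry) -> bool:
    """Stands in for: name:parentTest AND NOT type:BLOCK"""
    return e["name"] == "parentTest" and e["type"] != "BLOCK"


def parent_first(entries: List[Entry]) -> List[Entry]:
    # Query 1: fetch only the ids of parent matches.
    parent_ids: Set[str] = {e["id"] for e in entries if is_parent_match(e)}
    # Query 2: child criteria plus a terms filter on the ancestors field.
    return [e for e in entries
            if is_child_match(e) and parent_ids & set(e["ancestors"])]


def child_first(entries: List[Entry]) -> List[Entry]:
    # Query 1: fetch only the ancestors field of child matches.
    ancestor_ids: Set[str] = set()
    for e in entries:
        if is_child_match(e):
            ancestor_ids.update(e["ancestors"])
    # Query 2: parent criteria plus an id filter.
    return [e for e in entries if is_parent_match(e) and e["id"] in ancestor_ids]


print([e["id"] for e in parent_first(ENTRIES)])  # ['c1']
print([e["id"] for e in child_first(ENTRIES)])   # ['p1']
```

In both directions the size of the intermediate id set (`parent_ids` or `ancestor_ids`) is what determines how large the second request becomes.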

Hierarchical faceting with Lucene/Solr/Elasticsearch where document can have multiple parents

I'm currently evaluating whether to use elasticsearch or solr in a project and moving through the cases that need to be implemented. I found one case on which I couldn't find any documentation which felt a bit strange to me since the case seemed to be quite common to me. The categories are user supplied so I don't know them in advance. Consider the following part of a taxonomy with documents that can have multiple categories:
Root (3)
    Books (2)
        Sci-fi (1)
            DocumentA
        Fantasy (2)
            DocumentA
            DocumentC
    Movies (1)
        Action (1)
            DocumentB
    Games (1)
        Adventure
            DocumentB
In this case DocumentB could be an entry for e.g. Indiana Jones. Normal term hierarchies can be implemented using the path hierarchy tokenizer in solr/elastic, so DocumentC would have 'Root/Books/Fantasy' as category with a path split on '/'.
DocumentB, however, would need to have two paths ('Root/Movies/Action' and 'Root/Games/Adventure'). I thought about dynamically adding one category_n field per path for the document in Elasticsearch with the path hierarchy tokenizer and then doing the category search on all the category_* fields, but I don't know if that would be the right approach. In particular, the document count for the facets is not simple, because the count of a parent node is not the sum of its children (documents can be in multiple child categories and should not be counted more than once).
What would be a good way to implement this in solr/elastic?
Cheers
I ended up using ES and had a category field in which I put every path to the node, so 'Root/Movies/Action' and 'Root/Games/Adventure'. Then I used a path hierarchy tokenizer splitting on / with this field.
ES supports putting multiple paths in that field and searching them. I then used an aggregation with bucketing on the categories, which yielded exactly what I wanted: documents are not counted multiple times if they occur more than once in a branch.
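To see why the counts come out right, here is a small simulation of the idea in plain Python (not Elasticsearch code). `path_hierarchy` mimics what a path hierarchy tokenizer emits for one path, and `facet_counts` mimics a per-bucket document count by collecting document ids in sets; both function names are made up for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Set


def path_hierarchy(path: str, delimiter: str = "/") -> List[str]:
    """Emit every prefix of the path, e.g. Root, Root/Books, Root/Books/Fantasy."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


# The example taxonomy: each document lists every path that leads to it.
DOCS: Dict[str, List[str]] = {
    "DocumentA": ["Root/Books/Sci-fi", "Root/Books/Fantasy"],
    "DocumentB": ["Root/Movies/Action", "Root/Games/Adventure"],
    "DocumentC": ["Root/Books/Fantasy"],
}


def facet_counts(docs: Dict[str, List[str]]) -> Dict[str, int]:
    """Count distinct documents per path prefix (one bucket per token)."""
    buckets: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, paths in docs.items():
        for path in paths:
            for token in path_hierarchy(path):
                buckets[token].add(doc_id)  # a set, so a doc counts once per bucket
    return {token: len(ids) for token, ids in buckets.items()}


counts = facet_counts(DOCS)
print(counts["Root"])                # 3
print(counts["Root/Books"])          # 2
print(counts["Root/Books/Fantasy"])  # 2
```

DocumentA contributes the token Root/Books through both of its paths, yet the Books bucket still reports 2 because buckets count documents rather than token occurrences.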

How to build a query across parent and child object fields in Oracle CRM On Demand?

As part of an integration requirement, I need to query Opportunity records that have been modified after a specific date and time.
Now, Opportunity has a child object called ProductRevenue with a one-to-many relationship. Is there any way I can construct a QueryPage request that will fetch records whose Opportunity fields OR whose child ProductRevenue fields have been modified after a specific date and time?
I have tried using the SearchSpec argument, but it does not let me query across child object fields.
For example:
ObjOpptyQueryPageInput.ListOfOpportunity.Opportunity.searchspec = "([ModifiedDate] > '01/01/2013 00:00:00') OR ([ProductRevenueData.ModifiedDate] >= '01/01/2013 00:00:00')";
[The above code, written in C#, threw an error saying: The object Opportunity does not have an integration component called - ProductRevenueData.ModifiedDate.]
Any help will be greatly appreciated. Thank you.
I have been looking for the answer myself, and here is my understanding although not a solution.
In Web Services 2.0, Oracle says "all parent records matching the parent criteria and only children matching the child criteria are returned."
You actually can define "searchspec" on the parent AND the child, and it does work in the way that Oracle defined. However it is probably not the behavior you are looking for. When you do this what happens is you get ALL parents that match the parent.searchspec regardless of whether its child matches the child.searchspec. However those parents will only have the child(ren) that match your child.searchspec in the query result. So if all you wanted was "parents that have these children" or "these children" you are out of luck. Because what you get is "many parents and some of their children."
So even when you are post-processing with two queries you will have to spend some time. :(
By the way, your two separate queries will have to look something like this:
query 1.
ObjOpptyQueryPageInput.ListOfOpportunity.Opportunity.searchspec = "([ModifiedDate] > '01/01/2013 00:00:00')";
query 2.
ObjOpptyQueryPageInput.ListOfOpportunity.Opportunity.ListOfProductRevenue.ProductRevenue.searchspec = "([ModifiedDate] >= '01/01/2013 00:00:00')";
Then post-process query 2 to take out all parents who have no children.
Then union that with the results from query 1.
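The post-processing described above can be sketched as follows, assuming both result sets have already been parsed into lists of dicts. The `Id` key and the overall record shape are assumptions made for this illustration, not the actual web service response format.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_results(query1: List[Record], query2: List[Record]) -> List[Record]:
    """Union of query 1 with the query 2 parents that still have children."""
    merged: Dict[str, Record] = {}
    # Query 2 returns many parents; keep only those with at least one child left.
    for parent in query2:
        if parent.get("ListOfProductRevenue"):
            merged[parent["Id"]] = parent
    # Add parents matched by their own ModifiedDate, without duplicating ids.
    for parent in query1:
        merged.setdefault(parent["Id"], parent)
    return list(merged.values())


# Invented sample results: A was modified itself, B only has a modified child.
query1 = [{"Id": "A", "ListOfProductRevenue": []}]
query2 = [
    {"Id": "A", "ListOfProductRevenue": []},
    {"Id": "B", "ListOfProductRevenue": [{"Id": "B-1"}]},
    {"Id": "C", "ListOfProductRevenue": []},
]
print(sorted(r["Id"] for r in merge_results(query1, query2)))  # ['A', 'B']
```

Keying the union on the parent id is what prevents an Opportunity that matches both queries from appearing twice.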
From my experience, you will not be able to do this using their V2.0 api (i.e. searchspec). You can perform this using V1.0 api BUT this will return all parent records matching your criteria plus all related ProductRevenue records whether they meet the criteria or not. I do something similar and then post process the data against an xpath predicate filter. The only other option, I think, is 2 separate queries.
I had the same problem, and I tried many ways to resolve it, but for now you have to work with the result that is returned: you can use DOM, XPath or regular expressions to extract the information you want from the returned result.
In my case I used XPath because it's very fast and easier. This is a link to the question I posted, with the correct answer:
Xpath solution for the parent-child query result
I hope this will fix the problem.

Resources