Elasticsearch - Query by parent id

I have three document types with parent/child relationships, mapped as follows. I have omitted the other properties because I don't think they are relevant to the question.
"mappings": {
  "Parent": {
  },
  "Child": {
    "_parent": {
      "type": "Parent"
    },
    "_routing": {
      "required": true
    }
  },
  "GrandChild": {
    "_parent": {
      "type": "Child"
    },
    "_routing": {
      "required": true
    }
  }
}
I'm using the Java API to insert documents into the index. Before indexing a document, existing documents with the same IDs need to be removed to avoid duplicates. The Parent and Child IDs are stored externally. First, all "GrandChild" documents are deleted using a term query on their parent ID (which in this case is the ID of a "Child" document). There are no errors, but the "GrandChild" docs are not deleted.
By running the following term query in Sense (the Chrome plugin), I found that the problem is in the term query itself: it returns no hits. Parent, Child and GrandChild all use the same routing value, which is set to the ID of the Parent. This is the query I tried:
POST /myindex/GrandChild/_search?routing=DFC0E8CD59EBC00EC2DDC9A0FF5D1F2DB272B2449680824CCE60B6864568D498
{
"query" : {
"term" : { "_parent" : "//id/of/doc/of/type/Child" }
}
}
When I try to search for a "Child" document using its parent id, it works. I get a "Child" using the following query.
POST /myindex/Child/_search?routing=DFC0E8CD59EBC00EC2DDC9A0FF5D1F2DB272B2449680824CCE60B6864568D498
{
"query" : {
"term" : { "_parent" : "DFC0E8CD59EBC00EC2DDC9A0FF5D1F2DB272B2449680824CCE60B6864568D498" }
}
}
What could I be doing wrong in the "GrandChild" search?
Any help is appreciated. Thanks.

I got an answer to this question from the Elasticsearch mailing list:
https://groups.google.com/forum/#!topic/elasticsearch/d_aZejNMBD8
There is an open issue for this here:
https://github.com/elasticsearch/elasticsearch/issues/5399
Edit:
As shown in the GitHub issue, the way to perform your query is to prefix the ID with the parent type:
POST /myindex/GrandChild/_search?routing=DFC0E8CD59EBC00EC2DDC9A0FF5D1F2DB272B2449680824CCE60B6864568D498
{
"query" : {
"term" : { "_parent" : "Child#//id/of/doc/of/type/Child" }
}
}
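The prefixing rule above can be sketched in Python. This is a minimal illustration that only builds the request body; the helper name parent_term_query is hypothetical, and the "Type#id" format is the one taken from the GitHub issue referenced above.

```python
import json


def parent_term_query(parent_type: str, parent_id: str) -> dict:
    """Build a term query on _parent using the "<parent type>#<parent id>" form.

    Hypothetical helper: it only assembles the JSON body and does not
    talk to Elasticsearch.
    """
    return {"query": {"term": {"_parent": f"{parent_type}#{parent_id}"}}}


# Same placeholder ID as in the query above.
body = parent_term_query("Child", "//id/of/doc/of/type/Child")
print(json.dumps(body, indent=2))
```

The routing value still has to be passed on the request itself, as in the queries above.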

How to update index mapping with dynamic as strict in future?

I am new to Elasticsearch. I have an index named users with many fields that I already know about, but a few more fields may be added in the future.
So when defining my mapping, I want to include the fields I currently know about with dynamic set to "strict". If I later need to add a new field, how do I update the mapping, and will I have to reindex everything?
I read in the ES documentation that mappings are applied only at index creation time, so I am a little confused about the right way to approach this.
You can always update the mapping later with the put mapping API, even when dynamic is set to strict. Existing data does not need to be reindexed unless you want the newly added field to have a value in older documents that were indexed before the mapping update.
Let's assume you already have an index test with one field, field1, of type keyword. Later you need to add a new field, field2, of type integer. You can do so with the put mapping API as below:
PUT test/_mapping
{
"properties": {
"field2": {
"type": "integer"
}
}
}
After executing the above, check the mapping using
GET test/_mapping
The response includes the new field as well:
{
"test" : {
"mappings" : {
"dynamic" : "strict",
"properties" : {
"field1" : {
"type" : "keyword"
},
"field2" : {
"type" : "integer"
}
}
}
}
}
Inner objects inherit the dynamic setting from their parent object or from the mapping type. In the following example, dynamic mapping is disabled at the type level, so no new top-level fields will be added dynamically.
However, the user.social_networks object enables dynamic mapping, so you can add fields to this inner object.
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic.html
PUT my-index-000001
{
"mappings": {
"dynamic": false,
"properties": {
"user": {
"properties": {
"name": {
"type": "text"
},
"social_networks": {
"dynamic": true,
"properties": {}
}
}
}
}
}
}
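The inheritance rule quoted above can be modelled in a few lines of Python. This is a simplified sketch, not Elasticsearch code: the function name effective_dynamic is hypothetical, and it only walks a mapping dictionary shaped like the one in the example.

```python
def effective_dynamic(mappings: dict, path: str):
    """Return the `dynamic` setting that applies to the object at `path`.

    Walks from the mapping root down the dotted path; an object without its
    own `dynamic` key inherits the value of its closest ancestor.
    """
    current = mappings
    setting = current.get("dynamic", True)  # Elasticsearch defaults to true
    for name in path.split("."):
        current = current["properties"][name]
        setting = current.get("dynamic", setting)
    return setting


mappings = {
    "dynamic": False,
    "properties": {
        "user": {
            "properties": {
                "name": {"type": "text"},
                "social_networks": {"dynamic": True, "properties": {}},
            }
        }
    },
}

print(effective_dynamic(mappings, "user"))                  # False (inherited)
print(effective_dynamic(mappings, "user.social_networks"))  # True (overridden)
```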

Attempting to use Elasticsearch Bulk API when _id is equal to a specific field

I am attempting to bulk insert documents into an index. I need _id to be equal to the value of a specific field that I am inserting. I'm using ES v6.6.
POST productv9/_bulk
{ "index" : { "_index" : "productv9", "_id": "in_stock"}}
{ "description" : "test", "in_stock" : "2001"}
GET productv9/_search
{
"query": {
"match": {
"_id": "2001"
}
}
}
The bulk request runs without any error. However, the search returns no hits. I also have many more documents that I would like to insert in the same manner.
Your bulk request sets _id to the literal string "in_stock" rather than to the value of that field, which is why searching for 2001 finds nothing. I suggest creating an ingest pipeline that sets the _id of each document from the value of its in_stock field.
First create the pipeline:
PUT _ingest/pipeline/set_id
{
  "description" : "Sets the id of the document based on a field value",
  "processors" : [
    {
      "set" : {
        "field": "_id",
        "value": "{{in_stock}}"
      }
    }
  ]
}
Then you can reference the pipeline in your bulk call:
POST productv9/doc/_bulk?pipeline=set_id
{ "index" : {}}
{ "description" : "test", "in_stock" : "2001"}
By calling GET productv9/_doc/2001 you will get your document.
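As an alternative sketch (not part of the pipeline approach above), the _id can also be set on the client while the bulk body is built, so that each action line carries the value of in_stock. The helper name bulk_body and its id_field parameter are hypothetical; the code only produces the newline-delimited JSON and does not send it.

```python
import json


def bulk_body(index: str, docs: list, id_field: str) -> str:
    """Build an NDJSON _bulk body whose _id comes from `id_field` of each doc."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [{"description": "test", "in_stock": "2001"}]
body = bulk_body("productv9", docs, "in_stock")
print(body, end="")
```

Depending on the Elasticsearch version, the document type may also need to be supplied in the URL or in the action line.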

elastic search join query

I am having a hard time doing join queries in Elasticsearch. My use case: given a particular field and its value in a child document, retrieve the parent document.
I have established a parent-child relationship between two documents following the documentation. When I query, I don't get an error, but I don't get any results either. This is the query:
GET my_index/_search
{
"query": {
"has_parent" : {
"parent_type" : "<parent_type>",
"query" : {
"match" : {
"name": "child_name"
}
}
}
}
}
I have loaded the parent and child documents into the same index, "my_index". I created the mapping before loading the documents. My mapping is as follows:
PUT my_index
{
"mappings": {
"doc": {
"properties": {
"parent_join": {
"type": "join",
"relations": {
"<parent_type>": "<child_type>"
}
}
}
}
}
}
I set the routing value to the parent ID when loading the documents.
You said you want to retrieve the parent document based on a field of the child, but your query does the opposite: has_parent returns child documents whose parent matches. You want has_child:
{
"query": {
"has_child" : {
"type" : "<child_type>",
"query" : {
"term" : {
"name" : "child_name"
}
}
}
}
}
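To keep the two directions apart, here is a small Python sketch of both query shapes. The helper names has_child and has_parent are hypothetical wrappers that only build the request body; the placeholder type names are the ones used in the question.

```python
def has_child(child_type: str, query: dict) -> dict:
    """Query that returns PARENT documents with at least one matching child."""
    return {"query": {"has_child": {"type": child_type, "query": query}}}


def has_parent(parent_type: str, query: dict) -> dict:
    """Query that returns CHILD documents whose parent matches."""
    return {"query": {"has_parent": {"parent_type": parent_type, "query": query}}}


# Condition on a field of the child document, as in the question.
child_filter = {"term": {"name": "child_name"}}
body = has_child("<child_type>", child_filter)
print(body)
```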

Elastic Search Term Query Not Matching URL's

I am a beginner with Elasticsearch and have been working on a POC since last week.
My documents have a URL field containing URLs in the following format: "http://www.example.com/foo/navestelre-04-cop".
I cannot define a mapping for the whole object, because every object has different keys apart from the URL.
Here is how I am creating my index:
POST
{
"settings" : {
"number_of_shards" : 5,
"mappings" : {
"properties" : {
"url" : { "type" : "string","index":"not_analyzed" }
}
}
}
}
I am setting my URL field to not_analyzed because I have read that this prevents the field from being tokenized, so I can look for an exact match on it with a term query.
I have also tried the whitespace analyzer, since the URL value does not contain any whitespace characters, but again I cannot get a successful hit.
Below is my term query:
{
"query":{
"constant_score": {
"filter": {
"term": {
"url":"http://www.example.com/foo/navestelre-04-cop"
}
}
}
}
}
I am guessing the problem is somewhere with the analyzers and tokenizers, but I am unable to find a solution. Any help that improves my understanding and gets me to a solution would be great.
Thanks in advance.
You have the right idea, but some small mistakes in your settings request are leading you astray. Here is the corrected index request:
POST /test
{
  "settings": {
    "number_of_shards" : 5
  },
  "mappings": {
    "url_test": {
      "properties": {
        "url": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Notice the added url_test type in the mapping. This lets ES know that your mapping applies to this document type. Also, settings and mappings are separate keys of the root object, so mappings must not be nested inside settings. Because your initial request was malformed, ES ignored the mapping and used the standard analyzer on your field, which is why your term query could not match it. See the ES Mapping docs for details.
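Why an analyzed field defeats the term query can be shown with a rough Python model. Note the assumption: rough_analyze is a crude stand-in that lowercases and splits on non-alphanumeric characters; it is not the real standard analyzer, whose token boundaries differ. All function names here are hypothetical.

```python
import re

URL = "http://www.example.com/foo/navestelre-04-cop"


def rough_analyze(text: str) -> list:
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.

    Assumption: this is NOT the real standard analyzer; it only shows
    that analysis breaks the URL into several smaller terms.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def not_analyzed(text: str) -> list:
    """A not_analyzed field is indexed as one single, unmodified term."""
    return [text]


def term_matches(indexed_terms: list, term: str) -> bool:
    """A term query looks for an exact match among the indexed terms."""
    return term in indexed_terms


print(term_matches(rough_analyze(URL), URL))  # False: only fragments are indexed
print(term_matches(not_analyzed(URL), URL))   # True: the whole URL is one term
```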
We can index two documents to test with:
POST /test/url_test/1
{
"url":"http://www.example.com/foo/navestelre-04-cop"
}
POST /test/url_test/2
{
"url":"http://stackoverflow.com/questions/37326126/elastic-search-term-query-not-matching-urls"
}
And then execute your unmodified search query:
GET /test/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"url": "http://www.example.com/foo/navestelre-04-cop"
}
}
}
}
}
Yields this result:
"hits": [
{
"_index": "test",
"_type": "url_test",
"_id": "1",
"_score": 1,
"_source": {
"url": "http://www.example.com/foo/navestelre-04-cop"
}
}
]

Trouble with has_parent query containing scripted function_score

I have two document types, in a parent-child relationship:
"myParent" : {
"properties" : {
"weight" : {
"type" : "double"
}
}
}
"myChild" : {
"_parent" : {
"type" : "myParent"
},
"_routing" : {
"required" : true
}
}
The weight field is to be used for custom scoring/sorting. This query directly against the parent documents works as intended:
{
"query" : {
"function_score" : {
"script_score" : {
"script" : "_score * doc['weight'].value"
}
}
}
}
However, when trying to do similar scoring for the child documents with a has_parent query, I get an error:
{
"query" : {
"has_parent" : {
"query" : {
"function_score" : {
"script_score" : {
"script" : "_score * doc['weight'].value"
}
}
},
"parent_type" : "myParent",
"score_type" : "score"
}
}
}
The error is:
QueryPhaseExecutionException[[myIndex][3]: query[filtered(ParentQuery[myParent](filtered(function score (ConstantScore(:),function=script[_score * doc['weight'].value], params [null]))->cache(_type:myParent)))->cache(_type:myChild)],from[0],size[10]: Query Failed [failed to execute context rewrite]]; nested: ElasticSearchIllegalArgumentException[No field found for [weight] in mapping with types [myChild]];
It seems like instead of applying the scoring function to the parent and then passing its result to the child, ES is trying to apply the scoring function itself to the child, causing the error.
If I don't use score for score_type, the error doesn't occur, although the results scores are then all 1.0, as documented.
What am I missing here? How can I query these child documents with custom scoring based on a parent field?
I would say this is a bug: it is using the myChild mapping as the default context, even though you are inside a has_parent query. But I'm not sure how easy the bug would be to fix properly.
However, you can work around it by including the type name in the full field name:
curl -XGET "http://localhost:9200/t/myChild/_search" -d'
{
"query": {
"has_parent": {
"query": {
"function_score": {
"script_score": {
"script": "_score * doc[\"myParent.weight\"].value"
}
}
},
"parent_type": "myParent",
"score_type": "score"
}
}
}'
I've opened an issue to see if we can get this fixed: #4914
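The workaround above amounts to qualifying the field name with the parent type inside the script. A minimal Python sketch that builds the same body follows; the helper name parent_scored_children is hypothetical, and the code does not contact Elasticsearch.

```python
def parent_scored_children(parent_type: str, field: str) -> dict:
    """Build a has_parent query that scores children by a parent field.

    The script refers to the field as "<parent type>.<field>", which is
    the workaround described above.
    """
    script = f'_score * doc["{parent_type}.{field}"].value'
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "score_type": "score",
                "query": {"function_score": {"script_score": {"script": script}}},
            }
        }
    }


body = parent_scored_children("myParent", "weight")
script = body["query"]["has_parent"]["query"]["function_score"]["script_score"]["script"]
print(script)
```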
I think the problem is that you are trying to score child documents based on a field in the parent document, and that the function score should really be the other way round.
To solve the problem, my idea would be to store the weight with the child documents, alongside the parent/child relation. Then you would filter for child documents and score them by the weight stored in the child document.
An example:
"myParent" : {
"properties" : {
"name" : {
"type" : "string"
}
}
}
"myChild" : {
"_parent" : {
"type" : "myParent"
},
"_routing" : {
"required" : true
},
"properties": {
"weight" : {
"type" : "double"
}
}
}
Now you could use a has_parent filter to select all child documents that have a certain parent and then score them using the function score:
{
  "query": {
    "filtered": {
      "query": {
        "function_score" : {
          "script_score" : {
            "script" : "_score * doc['weight'].value"
          }
        }
      },
      "filter": {
        "has_parent": {
          "parent_type": "myParent",
          "query": {
            "term": {
              "name": "something"
            }
          }
        }
      }
    }
  }
}
So if the parent documents were blog posts and the children were comments, you could filter on the posts and score the comments by their weight. I doubt that scoring children based on parents is possible, though I might be wrong :)
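The blog post idea can be modelled locally in plain Python, as a rough sketch of what the query is meant to do. The sample posts and comments are made up for illustration, the function name score_children is hypothetical, and the base score of 1.0 is an assumption.

```python
# Made-up sample data: two posts (parents) and three comments (children).
posts = {
    "p1": {"name": "something"},
    "p2": {"name": "other"},
}
comments = [
    {"id": "c1", "parent": "p1", "weight": 0.5},
    {"id": "c2", "parent": "p2", "weight": 9.0},
    {"id": "c3", "parent": "p1", "weight": 2.0},
]


def score_children(parents, children, parent_name):
    """Keep children whose parent has `parent_name`, then rank them by weight.

    The parent condition only filters; the score comes from the child's
    own `weight`, multiplied by an assumed base score of 1.0.
    """
    base_score = 1.0
    hits = [
        (c["id"], base_score * c["weight"])
        for c in children
        if parents[c["parent"]]["name"] == parent_name
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(score_children(posts, comments, "something"))  # [('c3', 2.0), ('c1', 0.5)]
```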
Disclaimer: 1st post to stack overflow...
