Can't get join query in elastic - elasticsearch

I have Elasticsearch version 7.3 and two indexes, profiles and purchases.
Here are their mappings:
\purchases
{
"purchases": {
"mappings": {
"properties": {
"product": {
"type": "keyword"
},
"profile": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"profiles": "purchases"
}
}
}
}
}
}
\profiles
{
"profiles": {
"mappings": {
"properties": {
"user": {
"type": "keyword"
}
}
}
}
}
I added one profile with user:abc, _id:1 and two purchases this way
{
"profile": {"name": "profiles", "parent": "1"},
"product" : "tomato",
}
{
"profile": {"name": "profiles", "parent": "1"},
"product" : "tomato 2",
}
Then I run a search query for purchases:
{
"query": {
"has_parent": {
"parent_type": "profiles",
"query": {
"query_string": {
"query": "user:abc"
}
}
}
}
}
And I get an empty result. What is wrong?

As stated in the documentation of the join datatype, you cannot create parent-child relationships across multiple indices:
The join datatype is a special field that creates parent/child relation within documents of the same index.
If you would like to use the join datatype, you have to model it in one index.
UPDATE
This is how your mapping and the indexing of the documents would look:
PUT profiles-purchases-index
{
"mappings": {
"properties": {
"user":{
"type": "keyword"
},
"product":{
"type": "keyword"
},
"profile":{
"type": "join",
"relations":{
"profiles": "purchases"
}
}
}
}
}
Index parent document:
PUT profiles-purchases-index/_doc/1
{
"user": "abc",
"profile": "profiles"
}
Index child documents:
PUT profiles-purchases-index/_doc/2?routing=1
{
"product": "tomato",
"profile":{
"name": "purchases",
"parent": 1
}
}
PUT profiles-purchases-index/_doc/3?routing=1
{
"product": "tomato 2",
"profile":{
"name": "purchases",
"parent": 1
}
}
Run Query:
GET profiles-purchases-index/_search
{
"query": {
"has_parent": {
"parent_type": "profiles",
"query": {
"match": {
"user": "abc"
}
}
}
}
}
Response:
{
...
"hits" : [
{
"_index" : "profiles-purchases-index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_routing" : "1",
"_source" : {
"product" : "tomato",
"profile" : {
"name" : "purchases",
"parent" : 1
}
}
},
{
"_index" : "profiles-purchases-index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_routing" : "1",
"_source" : {
"product" : "tomato 2",
"profile" : {
"name" : "purchases",
"parent" : 1
}
}
}
]
}
}
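Notice that you have to set the routing parameter when indexing the child documents: a child must be stored on the same shard as its parent, so here the routing value is the parent's _id (1). Please refer to the documentation of the join datatype for details.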
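To make the routing requirement concrete, here is a minimal Python sketch (standard library only) that builds the request path and body for a parent and a child document. The helper names parent_request and child_request are hypothetical; the index, field and relation names are taken from the example above. It only constructs the requests and does not send them to a cluster.

```python
import json

INDEX = "profiles-purchases-index"   # index name from the example above
JOIN_FIELD = "profile"               # name of the join field in the mapping


def parent_request(doc_id, fields):
    """Build (path, body) for a parent document of relation 'profiles'."""
    body = dict(fields)
    body[JOIN_FIELD] = "profiles"    # a parent only needs the relation name
    return f"{INDEX}/_doc/{doc_id}", body


def child_request(doc_id, parent_id, fields):
    """Build (path, body) for a child document of relation 'purchases'."""
    body = dict(fields)
    body[JOIN_FIELD] = {"name": "purchases", "parent": parent_id}
    # The child is routed by the parent's _id so both end up on the same shard.
    return f"{INDEX}/_doc/{doc_id}?routing={parent_id}", body


if __name__ == "__main__":
    requests = (
        parent_request("1", {"user": "abc"}),
        child_request("2", "1", {"product": "tomato"}),
        child_request("3", "1", {"product": "tomato 2"}),
    )
    for path, body in requests:
        print("PUT", path)
        print(json.dumps(body, indent=2))
```

The point of keeping the routing value inside child_request is that a caller cannot forget it or pass a value that differs from the parent id.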

Related

Query with And & OR in Elastic Search

I'm new to Elasticsearch. I have documents in an index with the following mapping:
{
"mappings": {
"properties": {
"age": {
"type": "long"
},
"hobbiles": {
"type": "keyword"
}
}
}
}
Some sample documents:
[{
"_id": "test#domain.com",
"age": 12,
"hobbiles": [{
"name": "Singing",
"level": "begineer"
},
{
"name": "Dancing",
"level": "begineer"
}
]
},
{
"_id": "test1#domain.com",
"age": 7,
"hobbiles": [{
"name": "Coding",
"level": "begineer"
},
{
"name": "Chess",
"level": "begineer"
}
]
},
{
"_id": "test2#domain.com",
"age": 20,
"hobbiles": [{
"name": "Singing",
"level": "begineer"
},
{
"name": "Dancing",
"level": "begineer"
}
]
},
{
"_id": "test3#domain.com",
"age": 21,
"hobbiles": [{
"name": "Coding",
"level": "begineer"
},
{
"name": "Dancing",
"level": "Football"
}
]
}
]
Now I want to fetch documents where id IN (test#domain.com, test1#domain.com) and age is greater than 5, and [optionally] where hobbiles matches Football.
I expect to get three documents. A document should still be returned if the hobby does not match, but documents where the hobby matches should be on top. In other words, the hobby match is optional: if it does not match, I should still get data based on the other clauses.
[test3#domain.com, test#domain.com, test1#domain.com]
test3 is on top because Football matches there; test and test1 are returned because age and id match there.
Tldr;
It can be achieved via bool queries.
Solution
PUT /_bulk
{"index":{"_index":"73935795", "_id":"test#domain.com"}}
{"age":12,"hobbiles":[{"name":"Singing","level":"begineer"},{"name":"Dancing","level":"begineer"}]}
{"index":{"_index":"73935795", "_id":"test1#domain.com"}}
{"age":7,"hobbiles":[{"name":"Coding","level":"begineer"},{"name":"Chess","level":"begineer"}]}
{"index":{"_index":"73935795", "_id":"test2#domain.com"}}
{"age":20,"hobbiles":[{"name":"Singing","level":"begineer"},{"name":"Dancing","level":"begineer"}]}
{"index":{"_index":"73935795", "_id":"test3#domain.com"}}
{"age":21,"hobbiles":[{"name":"Coding","level":"begineer"},{"name":"Dancing","level":"Football"}]}
GET 73935795/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"age": {
"gt": 5
}
}
},
{
"terms": {
"_id": [
"test#domain.com",
"test1#domain.com",
"test3#domain.com"
]
}
}
],
"should": [
{
"query_string": {
"query": "(football) OR (begineer)",
"default_field": "hobbiles.level"
}
}
]
}
}
}
This requires a should clause. Should is equivalent to "OR", so with minimum_should_match set to 1 a document is returned if it satisfies at least one of the conditions in the should array.
For the conditions on id and age I have used a filter clause, which is equivalent to "AND". A filter clause does not contribute to the score of matched documents, so only a match on "hobbiles.level" adds to the score, and such documents are ranked higher.
Query
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"hobbiles.level.keyword": {
"value": "Football"
}
}
},
{
"bool": {
"filter": [
{
"terms": {
"id.keyword": [
"test#domain.com",
"test1#domain.com"
]
}
},
{
"range": {
"age": {
"gt": 5
}
}
}
]
}
}
]
}
}
}
Result
"hits" : [
{
"_index" : "index8",
"_type" : "_doc",
"_id" : "qE06noMBfFiM6spcUTo4",
"_score" : 1.3112575,
"_source" : {
"id" : "test3#domain.com",
"age" : 21,
"hobbiles" : [
{
"name" : "Coding",
"level" : "begineer"
},
{
"name" : "Dancing",
"level" : "Football"
}
]
}
},
{
"_index" : "index8",
"_type" : "_doc",
"_id" : "pE03noMBfFiM6spc4jr2",
"_score" : 0.0,
"_source" : {
"id" : "test#domain.com",
"age" : 12,
"hobbiles" : [
{
"name" : "Singing",
"level" : "begineer"
},
{
"name" : "Dancing",
"level" : "begineer"
}
]
}
},
{
"_index" : "index8",
"_type" : "_doc",
"_id" : "pU03noMBfFiM6spc6DqZ",
"_score" : 0.0,
"_source" : {
"id" : "test1#domain.com",
"age" : 7,
"hobbiles" : [
{
"name" : "Coding",
"level" : "begineer"
},
{
"name" : "Chess",
"level" : "begineer"
}
]
}
}
]
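If you build this request in application code, the structure of the second query can be captured in a small helper. This is a minimal Python sketch using only the standard library; the function name either_or_query is hypothetical, and the field names (id.keyword, age, hobbiles.level.keyword) are simply the ones used in the query above.

```python
import json


def either_or_query(ids, min_age, hobby_level):
    """Match documents that have the hobby level OR satisfy (id AND age)."""
    # Branch 1: scored term match, so hobby matches rank on top.
    hobby_branch = {"term": {"hobbiles.level.keyword": {"value": hobby_level}}}
    # Branch 2: filter-only bool, so it selects documents without adding score.
    id_and_age_branch = {
        "bool": {
            "filter": [
                {"terms": {"id.keyword": list(ids)}},
                {"range": {"age": {"gt": min_age}}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                "minimum_should_match": 1,  # at least one branch has to match
                "should": [hobby_branch, id_and_age_branch],
            }
        }
    }


if __name__ == "__main__":
    body = either_or_query(["test#domain.com", "test1#domain.com"], 5, "Football")
    print(json.dumps(body, indent=2))
```

Keeping the id and age conditions inside a filter is what produces the 0.0 scores in the result above, while the Football match gets a positive score.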

Does Elasticsearch provide highlighting on "copy_to" field in their newer versions?

I used Elasticsearch a few years ago (version 6.4.0), and at that time it could not highlight a "copy_to" field. I would like to know whether this is supported now.
Yes, highlighting can be enabled on a copy_to field in the latest version.
Please check the example below, which I have tried.
Index Mapping:
PUT my-index-000001
{
"mappings": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
Document Index:
PUT my-index-000001/_doc/1
{
"first_name": "John",
"last_name": "Smith"
}
Query:
GET my-index-000001/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
},
"highlight": {
"fields": {
"full_name": {}
}
}
}
Result:
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"first_name" : "John",
"last_name" : "Smith"
},
"highlight" : {
"full_name" : [
"<em>Smith</em>",
"<em>John</em>"
]
}
}
]
Update 1: Search using the copy_to field and highlight the match in a particular field
In the example below, the search happens on the full_name field, which is the copy_to field, and the highlighting happens on the first_name field.
Query:
GET my-index-000001/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
},
"highlight": {
"require_field_match": "false",
"fields": {
"first_name": {}
}
}
}
Result:
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"first_name" : "John",
"last_name" : "Smith"
},
"highlight" : {
"first_name" : [
"<em>John</em>"
]
}
}
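The two requests above differ only in which field is highlighted and in the require_field_match setting. As a minimal Python sketch (standard library only, with a hypothetical helper name search_with_highlight), the request body could be assembled like this:

```python
import json


def search_with_highlight(search_field, text, highlight_field):
    """Build a match query on one field with highlighting on another."""
    body = {
        "query": {
            "match": {search_field: {"query": text, "operator": "and"}}
        },
        "highlight": {"fields": {highlight_field: {}}},
    }
    if highlight_field != search_field:
        # By default only fields targeted by the query are highlighted;
        # turning this off lets the match be highlighted in another field.
        body["highlight"]["require_field_match"] = False
    return body


if __name__ == "__main__":
    print(json.dumps(search_with_highlight("full_name", "John Smith", "first_name"), indent=2))
```

The sketch uses a JSON boolean for require_field_match instead of the string "false" used above.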

has_child and has_parent not returning results

I went through the following links before posting this question:
Elasticsearch has_child returning no results
ElasticSearch 7.3 has_parent/has_child don't return any hits
ES documentation
I created a simple mapping with text_doc as the parent and flag_doc as the child.
{
"doc_index_ap3" : {
"mappings" : {
"properties" : {
"domain" : {
"type" : "keyword"
},
"email_text" : {
"type" : "text"
},
"id" : {
"type" : "keyword"
},
"my_join_field" : {
"type" : "join",
"eager_global_ordinals" : true,
"relations" : {
"text_doc" : "flag_doc"
}
}
}
}
}
}
The query with parent_id works fine & returns 1 doc as expected
GET doc_index_ap3/_search
{
"query": {
"parent_id": {
"type": "flag_doc",
"id":"f0d2cb3c-bf4b-11eb-9f67-93a282921115"
}
}
}
But none of the below queries return any results.
GET doc_index_ap3/_search
{
"query": {
"has_parent": {
"parent_type": "text_doc",
"query": {
"match_all": {
}
}
}
}
}
GET doc_index_ap3/_search
{
"query": {
"has_child": {
"type": "flag_doc",
"query": {
"match_all": {}
}
}
}
}
There must be some issue in the way you have indexed the parent and child documents. Refer to the official documentation to learn more about the parent-child relationship.
Here is a working example using the same index mapping as given in the question above.
Parent document in the text_doc context:
PUT /index-name/_doc/1
{
"domain": "ab",
"email_text": "ab",
"id": "ab",
"my_join_field": {
"name": "text_doc"
}
}
Child document
PUT /index-name/_doc/2?routing=1&refresh
{
"domain": "cs",
"email_text": "cs",
"id": "cs",
"my_join_field": {
"name": "flag_doc",
"parent": "1"
}
}
Search Query:
{
"query": {
"has_parent": {
"parent_type": "text_doc",
"query": {
"match_all": {
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "67731507",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_routing": "1",
"_source": {
"domain": "cs",
"email_text": "cs",
"id": "cs",
"my_join_field": {
"name": "flag_doc",
"parent": "1"
}
}
}
]
Search Query:
{
"query": {
"has_child": {
"type": "flag_doc",
"query": {
"match_all": {}
}
}
}
}
Search Result:
"hits": [
{
"_index": "67731507",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"domain": "ab",
"email_text": "ab",
"id": "ab",
"my_join_field": {
"name": "text_doc"
}
}
}
]
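Note that the two queries name the relation differently: has_parent takes parent_type, while has_child takes type. A minimal Python sketch (standard library only, hypothetical helper names) that keeps this detail in one place:

```python
import json

MATCH_ALL = {"match_all": {}}


def has_parent_query(parent_type, inner_query=None):
    """Return children whose parent of the given relation matches inner_query."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,  # the key is 'parent_type' here
                "query": inner_query or MATCH_ALL,
            }
        }
    }


def has_child_query(child_type, inner_query=None):
    """Return parents that have a child of the given relation matching inner_query."""
    return {
        "query": {
            "has_child": {
                "type": child_type,  # the key is just 'type' here
                "query": inner_query or MATCH_ALL,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(has_parent_query("text_doc"), indent=2))
    print(json.dumps(has_child_query("flag_doc"), indent=2))
```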

How to build simple terms query for nested object?

I have index like this:
PUT job_offers
{
"mappings": {
"properties": {
"location": {
"properties": {
"slug": {
"type": "keyword"
},
"name": {
"type": "text"
}
},
"type": "nested"
},
"experience": {
"properties": {
"slug": {
"type": "keyword"
},
"name": {
"type": "text"
}
},
"type": "nested"
}
}
}
}
I insert this object:
POST job_offers/_doc
{
"title": "Junior Ruby on Rails Developer",
"location": [
{
"slug": "new-york",
"name": "New York"
},
{
"slug": "atlanta",
"name": "Atlanta"
},
{
"slug": "remote",
"name": "Remote"
}
],
"experience": [
{
"slug": "junior",
"name": "Junior"
}
]
}
This query returns 0 documents.
GET job_offers/_search
{
"query": {
"terms": {
"location.slug": [
"remote",
"new-york"
]
}
}
}
Can you explain why? I thought it should return documents where location.slug is remote or new-york.
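Nested queries have a different syntax. Objects of a nested field are indexed as separate hidden documents, so they have to be searched through a nested query with the matching path: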
GET job_offers/_search
{
"query": {
"nested": {
"path": "location",
"query": {
"terms": {
"location.slug": ["remote","new-york"]
}
}
}
}
}
Result:
"hits" : [
{
"_index" : "job_offers",
"_type" : "_doc",
"_id" : "wWjoXnEBs0rCGpYsvUf4",
"_score" : 1.0,
"_source" : {
"title" : "Junior Ruby on Rails Developer",
"location" : [
{
"slug" : "new-york",
"name" : "New York"
},
{
"slug" : "atlanta",
"name" : "Atlanta"
},
{
"slug" : "remote",
"name" : "Remote"
}
],
"experience" : [
{
"slug" : "junior",
"name" : "Junior"
}
]
}
}
]
This returns the entire document whenever any of its location.slug values matches "remote" or "new-york". If you want to get only the matched nested objects, you need to use inner_hits:
GET job_offers/_search
{
"query": {
"nested": {
"path": "location",
"query": {
"terms": {
"location.slug": ["remote","new-york"]
}
},
"inner_hits": {}
}
}
}
Result:
"hits" : [
{
"_index" : "job_offers",
"_type" : "_doc",
"_id" : "wWjoXnEBs0rCGpYsvUf4",
"_score" : 1.0,
"_source" : {
"title" : "Junior Ruby on Rails Developer",
"location" : [
{
"slug" : "new-york",
"name" : "New York"
},
{
"slug" : "atlanta",
"name" : "Atlanta"
},
{
"slug" : "remote",
"name" : "Remote"
}
],
"experience" : [
{
"slug" : "junior",
"name" : "Junior"
}
]
},
"inner_hits" : { --> will give matched nested object
"location" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "job_offers",
"_type" : "_doc",
"_id" : "wWjoXnEBs0rCGpYsvUf4",
"_nested" : {
"field" : "location",
"offset" : 0
},
"_score" : 1.0,
"_source" : {
"slug" : "new-york",
"name" : "New York"
}
},
{
"_index" : "job_offers",
"_type" : "_doc",
"_id" : "wWjoXnEBs0rCGpYsvUf4",
"_nested" : {
"field" : "location",
"offset" : 2
},
"_score" : 1.0,
"_source" : {
"slug" : "remote",
"name" : "Remote"
}
}
]
}
}
}
}
]
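Both nested requests above share the same shape, with inner_hits as the only difference. A minimal Python sketch (standard library only; the helper name nested_terms is hypothetical) that builds either variant:

```python
import json


def nested_terms(path, field, values, inner_hits=False):
    """Build a nested query with a terms clause on path.field."""
    nested = {
        "path": path,
        # Inside a nested query the field is still addressed by its full name.
        "query": {"terms": {f"{path}.{field}": list(values)}},
    }
    if inner_hits:
        # An empty object asks for the matching nested objects with defaults.
        nested["inner_hits"] = {}
    return {"query": {"nested": nested}}


if __name__ == "__main__":
    body = nested_terms("location", "slug", ["remote", "new-york"], inner_hits=True)
    print(json.dumps(body, indent=2))
```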
Also, I see that you are using two fields for the same data with different types. If the data is the same in both fields (name and slug) and only the data type differs, you can use multi-fields for that. From the documentation:
It is often useful to index the same field in different ways for
different purposes. This is the purpose of multi-fields. For instance,
a string field could be mapped as a text field for full-text search,
and as a keyword field for sorting or aggregations:
In that case your mapping becomes:
PUT job_offers
{
"mappings": {
"properties": {
"location": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
},
"type": "nested"
},
"experience": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
},
"type": "nested"
}
}
}
}

How to filter by the size of an array in nested type?

Let's say I have the following type:
{
"2019-11-04": {
"mappings": {
"_doc": {
"properties": {
"labels": {
"type": "nested",
"properties": {
"confidence": {
"type": "float"
},
"created_at": {
"type": "date",
"format": "strict_date_optional_time||date_time||epoch_millis"
},
"label": {
"type": "keyword"
},
"updated_at": {
"type": "date",
"format": "strict_date_optional_time||date_time||epoch_millis"
},
"value": {
"type": "keyword",
"fields": {
"numeric": {
"type": "float",
"ignore_malformed": true
}
}
}
}
},
"params": {
"type": "object"
},
"type": {
"type": "keyword"
}
}
}
}
}
}
And I want to filter by the size/length of the labels array. I've tried the following (as the official docs suggest):
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": "doc['labels'].size > 10"
}
}
}
}
}
}
but I keep getting:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
"doc['labels'].size > 10",
" ^---- HERE"
],
"script": "doc['labels'].size > 10",
"lang": "painless"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "2019-11-04",
"node": "kk5MNRPoR4SYeQpLk2By3A",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
"doc['labels'].size > 10",
" ^---- HERE"
],
"script": "doc['labels'].size > 10",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [labels] in mapping with types []"
}
}
}
]
},
"status": 500
}
I'm afraid that is not possible this way, because labels itself is not a field that Elasticsearch stores or indexes.
doc['fieldname'] only works on leaf fields that Elasticsearch has indexed (more precisely, fields with doc values), and the Query DSL likewise only works on such fields. A nested field is just a container for its sub-fields, so there is nothing for doc['labels'] to look up; that is what the "No field found for [labels] in mapping" error is telling you.
Having said so, I have the below two ways of doing this.
For the sake of simplicity, I've created sample mapping, documents and two possible solutions which may help you.
Mapping:
PUT my_sample_index
{
"mappings": {
"properties": {
"myfield": {
"type": "nested",
"properties": {
"label": {
"type": "keyword"
}
}
}
}
}
}
Sample Documents:
// single field inside 'myfield'
POST my_sample_index/_doc/1
{
"myfield": {
"label": ["New York", "LA", "Austin"]
}
}
// two fields inside 'myfield'
POST my_sample_index/_doc/2
{
"myfield": {
"label": ["London", "Leicester", "Newcastle", "Liverpool"],
"country": "England"
}
}
Solution 1: Using Script Fields (Managing at Application Level)
This is a workaround rather than an exact solution: it does not filter inside Elasticsearch, but it gives you a flag that you can filter on in your service layer or application.
POST my_sample_index/_search
{
"_source": "*",
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
},
"script_fields": {
"label_size": {
"script": {
"lang": "painless",
"source": "params['_source']['myfield'].size() > 1"
}
}
}
}
You will notice that the response contains a separate field label_size with a true or false value.
A sample response is something like below:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_sample_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"myfield" : {
"label" : [
"New York",
"LA",
"Austin"
]
}
},
"fields" : {
"label_size" : [ <---- Scripted Field
false
]
}
},
{
"_index" : "my_sample_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"myfield" : {
"country" : "England",
"label" : [
"London",
"Leicester",
"Newcastle",
"Liverpool"
]
}
},
"fields" : { <---- Scripted Field
"label_size" : [
true <---- True because it has two fields 'label' and 'country'
]
}
}
]
}
}
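Note that only the second document gets true, as myfield there has two fields, i.e. country and label. In this sample myfield is a single object, so size() counts its keys; when the field holds an array of nested objects, as labels does in the question, size() returns the number of array elements. If you only want the docs where label_size is true, that has to be handled in your application layer.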
Solution 2: Reindexing with labels.size using Script Processor
Create a new index as below, with an additional integer field labels_size in which we'd store the size:
PUT my_sample_index_temp
{
"mappings": {
"properties": {
"myfield": {
"type": "nested",
"properties": {
"label": {
"type": "keyword"
}
}
},
"labels_size":{
"type": "integer"
}
}
}
}
Create the below pipeline:
PUT _ingest/pipeline/set_labels_size
{
"description": "sets the value of labels size",
"processors": [
{
"script": {
"source": """
ctx.labels_size = ctx.myfield.size();
"""
}
}
]
}
Use Reindex API to reindex from my_sample_index index
POST _reindex
{
"source": {
"index": "my_sample_index"
},
"dest": {
"index": "my_sample_index_temp",
"pipeline": "set_labels_size"
}
}
Verify the documents in my_sample_index_temp using GET my_sample_index_temp/_search
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_sample_index_temp",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"labels_size" : 1, <---- New Field Created
"myfield" : {
"label" : [
"New York",
"LA",
"Austin"
]
}
}
},
{
"_index" : "my_sample_index_temp",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"labels_size" : 2, <----- New Field Created
"myfield" : {
"country" : "England",
"label" : [
"London",
"Leicester",
"Newcastle",
"Liverpool"
]
}
}
}
]
}
}
Now you can simply use the labels_size field in your query, which is much easier and also more efficient.
Hope this helps!
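Once labels_size is indexed as an integer, the original requirement (more than 10 labels) becomes an ordinary range filter. A minimal Python sketch, using only the standard library; the helper name labels_size_filter is hypothetical and the field name comes from the mapping above:

```python
import json


def labels_size_filter(min_exclusive, field="labels_size"):
    """Build a bool/filter query for documents whose stored size is > min_exclusive."""
    return {
        "query": {
            "bool": {
                # A plain range filter on the precomputed integer field,
                # so no script has to run at search time.
                "filter": [{"range": {field: {"gt": min_exclusive}}}]
            }
        }
    }


if __name__ == "__main__":
    print("GET my_sample_index_temp/_search")
    print(json.dumps(labels_size_filter(10), indent=2))
```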
You can solve it with a custom score approach:
GET 2019-11-04/_search
{
"min_score": 0.1,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": "params['_source']['labels'].length > 10 ? 1 : 0"
}
}
}
]
}
}
}

Resources