Elastic Search shows "Unknown key for a START_OBJECT" exception - elasticsearch

I am sending the following query to elastic search in order to get data which are within the range of the values between the from and to:
{
"range" : {
"variables.value.long" : {
"from" : -1.0E19,
"to" : 9.1E18,
"include_lower" : true,
"include_upper" : true,
"boost" : 1.0
}.
}
}
Despite that elastic search throws the following error:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "Unknown key for a START_OBJECT in [range].",
"line": 2,
"col": 13
}
],
"type": "parsing_exception",
"reason": "Unknown key for a START_OBJECT in [range].",
"line": 2,
"col": 13
},
"status": 400
}
Does anybody know what this error means and why I am getting it?

There is some lack of context here like your mappings or the full query you are running, but this is how a range query should look for your document.
Create index
PUT test_andromachiii
{
"mappings": {
"properties": {
"variables": {
"properties": {
"values": {
"properties": {
"long": {
"type": "double"
}
}
}
}
}
}
}
}
Index document
POST test_andromachiii/_doc
{
"variables": {
"values": {
"long": 9.1E18
}
}
}
Run Query
POST test_andromachiii/_search
{
"query": {
"range": {
"variables.values.long": {
"lte": -1.0E19,
"gte": 9.1E18,
"boost": 1
}
}
}
}
Note lte means lower or equals to, gte greater or equals to.
Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "test_andromachiii",
"_type" : "_doc",
"_id" : "gtGj73cBbr4pOF0Is9my",
"_score" : 1.0,
"_source" : {
"variables" : {
"values" : {
"long" : 9.1E18
}
}
}
}
]
}
}

It looks like you're using version <0.90.4. If that's the case, simply wrap your range in a parent query object:
{
"query":{
"range":{
"variables.value.long":{
"from":-1.0E19,
"to":9.1E18,
"include_lower":true,
"include_upper":true,
"boost":1.0
}
}
}
}
If you're using any newer version than that, note that:
The from, to, include_lower and include_upper parameters have been deprecated in 0.90.4 in favour of gt, gte, lt, and lte.

This error is saying (somewhat cryptically) that you have a key range with an Object value, in a place where that key isn't recognised.
The specific cause here is that your range needs to be part of a higher query key such as (i.e.) the bool query, not part of the main.
Credit: https://discuss.elastic.co/t/unknown-key-for-a-start-object-in-should/140008/3

Related

ElasticSearch Query fields based on conditions on another field

Mapping
PUT /employee
{
"mappings": {
"post": {
"properties": {
"name": {
"type": "keyword"
},
"email_ids": {
"properties":{
"id" : { "type" : "integer"},
"value" : { "type" : "keyword"}
}
},
"primary_email_id":{
"type": "integer"
}
}
}
}
}
Data
POST employee/post/1
{
"name": "John",
"email_ids": [
{
"id" : 1,
"value" : "1#email.com"
},
{
"id" : 2,
"value" : "2#email.com"
}
],
"primary_email_id": 2 // Here 2 refers to the id field of email_ids.id (2#email.com).
}
I need help to form a query to check if an email id is already taken as a primary email?
eg: If I query for 1#email.com I should get result as No as 1#email.com is not a primary email id.
If I query for 2#email.com I should get result as Yes as 2#email.com is a primary email id for John.
As far as i know with this mapping you can not achive what you are expecting.
But, You can create email_ids field as nested type and add one more field like isPrimary and set value of it to true whenever email is primary email.
Index Mapping
PUT employee
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"email_ids": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"value": {
"type": "keyword"
},
"isPrimary":{
"type": "boolean"
}
}
},
"primary_email_id": {
"type": "integer"
}
}
}
}
Sample Document
POST employee/_doc/1
{
"name": "John",
"email_ids": [
{
"id": 1,
"value": "1#email.com"
},
{
"id": 2,
"value": "2#email.com",
"isPrimary": true
}
],
"primary_email_id": 2
}
Query
You need to keep below query as it is and only need to change email address when you want to see if email is primary or not.
POST employee/_search
{
"_source": false,
"query": {
"nested": {
"path": "email_ids",
"query": {
"bool": {
"must": [
{
"term": {
"email_ids.value": {
"value": "2#email.com"
}
}
},
{
"term": {
"email_ids.isPrimary": {
"value": "true"
}
}
}
]
}
}
}
}
}
Result
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.98082924,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.98082924
}
]
}
}
Interpret Result:
Elasticsearch will not return result in boolean like true or false but you can implement it at application level. You can consider value of hits.total.value from result, if it is 0 then you can consider false otherwise true.
PS: Answer is based on ES version 7.10.

Named rescore queries

Named queries help me to identify which part of the query hit.
For normal queries this works perfectly.
However for rescore queries named queries don't show up in the response.
Question: Is this a bug or intentional? Is there a workaround?
Update: I raised a feature request
I attached some code to reproduce the problem:
Set up a test index with a single document
PUT /test
{
"mappings": {
"properties": {
"field1": {
"type": "keyword"
},
"field2": {
"type": "keyword"
}
}
}
}
POST /test/_doc/1
{
"field1": "a",
"field2": "b"
}
"Normal" and rescore query.
GET /test/_search
{
"query": {
"term": {
"field1": {
"value": "a",
"_name": "query_field_1"
}
}
},
"rescore": {
"query": {
"rescore_query": {
"term": {
"field2": {
"value": "b",
"_name": "query_field_2"
}
}
}
},
"window_size": 50
}
}
Response: Only the name of the "normal" query shows up in matched_queries.
That "query_field_2" must have also hit can be ensured by comparing the score with and without the rescore query.
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"field1" : "a",
"field2" : "b"
},
"matched_queries" : [
"query_field_1" <<-----HERE I'D EXPECT query_field_2------
]
}
]
}
}
The naming is perhaps unfortunate but the rescore query just tweaks the scoring and is applied after the query and post_filter phases. So since it's not an actual query, it cannot be _named.
It's certainly worth a feature request though.

How to insert a scripted field using igestion pipeline

So I have two fields in my docs
{
emails: ["", "", ""]
name: "",
}
And I want to have a new field once the docs are indexed called uid which will just contain the concatenated strings of all the emails and the name for every doc.
I am able to get scripted field like that using this GET request on my index _search endpoint
{
"script_fields": {
"combined": {
"script": {
"lang": "painless",
"source": "def result=''; for (String email: doc['emails.keyword']) { result = result + email;} return doc['name'].value + result;"
}
}
}
}
I want to know what my ingest pipeline PUT request body should look like if I want to have the same scripted field indexed with my docs?
Let's say I have the below sample index and sample document.
Sample Source Index
For the sake of understanding, I've created the below mapping.
PUT my_source_index
{
"mappings": {
"properties": {
"email":{
"type":"text"
},
"name":{
"type": "text"
}
}
}
}
Sample Document:
POST my_source_index/_doc/1
{
"email": ["john#gmail.com","doe#outlook.com"],
"name": "johndoe"
}
Just follow the below steps
Step 1: Create Ingest Pipeline
PUT _ingest/pipeline/my-pipeline-concat
{
"description" : "describe pipeline",
"processors" : [
{
"join": {
"field": "email",
"target_field": "temp_uuid",
"separator": "-"
}
},
{
"set": {
"field": "uuid",
"value": "{{name}}-{{temp_uuid}}"
}
},
{
"remove":{
"field": "temp_uuid"
}
}
]
}
Notice that I've made use of Ingest API where I am using three processors while creating the above pipeline which would be executed in sequence:
The first processor is a Join Processor, which concatenates all the email ids and creates temp_uuid.
Second Processor is a Set Processor, I am combining name with temp_uuid.
And in the third step, I am removing the temp_uuid using Remove Processor
Note that I am using - as delimiter between all values. You can feel free to use anything you want.
Step 2: Create Destination Index:
PUT my_dest_index
{
"mappings": {
"properties": {
"email":{
"type":"text"
},
"name":{
"type": "text"
},
"uuid":{ <--- Do not forget to add this
"type": "text"
}
}
}
}
Step 3: Apply Reindex API:
POST _reindex
{
"source": {
"index": "my_source_index"
},
"dest": {
"index": "my_dest_index",
"pipeline": "my-pipeline-concat" <--- Make sure you add pipeline here
}
}
Note how I've mentioned the pipeline while using Reindex API
Step 4: Verify Destination Index:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_dest_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "johndoe",
"uuid" : "johndoe-john#gmail.com-doe#outlook.com", <--- Note this
"email" : [
"john#gmail.com",
"doe#outlook.com"
]
}
}
]
}
}
Hope this helps!

Elasticsearch filter by multiple fields in an object which is in an array field

The goal is to filter products with multiple prices.
The data looks like this:
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
I would like to filter by membershipLevel and price. For example, if I am a silver member and query price range 0-10, the product should not appear, but if I am a gold member, the product "a" should appear. Is this kind of query supported by Elasticsearch?
You need to make use of nested datatype for price and make use of nested query for your use case.
Please see the below mapping, sample document, query and response:
Mapping:
PUT my_price_index
{
"mappings": {
"properties": {
"name":{
"type":"text"
},
"price":{
"type":"nested",
"properties": {
"membershipLevel":{
"type":"keyword"
},
"price":{
"type":"double"
}
}
}
}
}
}
Sample Document:
POST my_price_index/_doc/1
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
Query:
POST my_price_index/_search
{
"query": {
"nested": {
"path": "price",
"query": {
"bool": {
"must": [
{
"term": {
"price.membershipLevel": "Gold"
}
},
{
"range": {
"price.price": {
"gte": 0,
"lte": 10
}
}
}
]
}
},
"inner_hits": {} <---- Do note this.
}
}
}
The above query means, I want to return all the documents having price.price range from 0 to 10 and price.membershipLevel as Gold.
Notice that I've made use of inner_hits. The reason is despite being a nested document, ES as response would return the entire set of document instead of only the document specific to where the query clause is applicable.
In order to find the exact nested doc that has been matched, you would need to make use of inner_hits.
Below is how the response would return.
Response:
{
"took" : 128,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9808291,
"_source" : {
"name" : "a",
"price" : [
{
"membershipLevel" : "Gold",
"price" : "5"
},
{
"membershipLevel" : "Silver",
"price" : "50"
},
{
"membershipLevel" : "Bronze",
"price" : "100"
}
]
},
"inner_hits" : {
"price" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "price",
"offset" : 0
},
"_score" : 1.9808291,
"_source" : {
"membershipLevel" : "Gold",
"price" : "5"
}
}
]
}
}
}
}
]
}
}
Hope this helps!
Let me take show you how to do it, using the nested fields and query and filter context. I will take your example to show, you how to define index mapping, index sample documents, and search query.
It's important to note the include_in_parent param in Elasticsearch mapping, which allows us to use these nested fields without using the nested fields.
Please refer to Elasticsearch documentation about it.
If true, all fields in the nested object are also added to the parent
document as standard (flat) fields. Defaults to false.
Index Def
{
"mappings": {
"properties": {
"product": {
"type": "nested",
"include_in_parent": true
}
}
}
}
Index sample docs
{
"product": {
"price" : 5,
"membershipLevel" : "Gold"
}
}
{
"product": {
"price" : 50,
"membershipLevel" : "Silver"
}
}
{
"product": {
"price" : 100,
"membershipLevel" : "Bronze"
}
}
Search query to show Gold with price range 0-10
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Gold"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "so-60620921-nested",
"_type": "_doc",
"_id": "1",
"_score": 1.0296195,
"_source": {
"product": {
"price": 5,
"membershipLevel": "Gold"
}
}
}
]
Search query to exclude Silver, with same price range
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Silver"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
Above query doesn't return any result as there isn't any matching result.
P.S :- this SO answer might help you to understand nested fields and query on them in detail.
You have to use Nested fields and nested query to archive this: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
Define you Price property with type "Nested" and then you will be able to filter by every property of nested object

access query value from function_score to compute new score

I need to customize ES score. The score function I need to implement is:
score = len(document_term) - len(query_term)
For instance, one of my document in the ES index is :
{
"name": "foobar"
}
And the search query
{
"query": {
"function_score": {
"query": {
"match": {
"name": {
"query": "foo"
}
}
},
"functions": [
{
"script_score": {
"script": {
"source": "doc['name'].value.length() - ?LEN(query_tem)?"
}
}
}
],
"boost_mode": "replace"
}
}
}
The above search should provide a score of 6 - 3 = 3. But I didn't find a solution to get access the value of the query term.
Is it possible to access the value of the query term in a function_score context ?
There is no direct way to do this, however you can achieve that in the below way where you would need to add the query parameters in two different parts of the query.
Before that one important note, you cannot apply the doc['myfield'].value if the field is of type text, instead you would need to have its sibling field created as keyword and refer that in the script, which again I've mentioned below:
Mapping:
PUT myindex
{
"mappings" : {
"properties" : {
"myfield" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
Sample Document:
POST myquery/_doc/1
{
"myfield": "I've become comfortably numb"
}
Query:
POST <your_index_name>/_search
{
"query": {
"function_score": {
"query": {
"match": {
"myfield": "numb"
}
},
"functions": [
{
"script_score": {
"script": {
"source": "return doc['myfield.keyword'].value.length() - params.myquery.length()",
"params": {
"myquery": "numb" <---- Add the query string here as well
}
}
}
}
],
"boost_mode": "replace"
}
}
}
Response:
{
"took" : 558,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 24.0,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 24.0,
"_source" : {
"myfield" : "I've become comfortably numb"
}
}
]
}
}
Hope this helps!

Resources