Debugging Elasticsearch boolean queries

I'm facing an issue with an automatically generated Elasticsearch query. When I run it, six of the seven shards on the index succeed, and the seventh returns this error:
index: "shard"
reason: "ClassCastException[org.elasticsearch.common.mvel2.compiler.BlankLiteral cannot be cast to java.lang.Boolean]"
shard: 1
status: 500
successful: 6
total: 7
How can I figure out where this is coming from, given that the explain endpoint yields nothing at all because the root query is a bool?
The query is as follows:
{
"query": {
"bool": {
"must": [
{
"filtered": {
"query": {
"range": {
"date": {
"lte": "2014-05-21T21:59:59+00:00",
"gte": "2013-01-23T23:00:00+00:00"
}
}
},
"filter": {
"not": {
"terms": {
"idCountry": [
"9999"
]
}
}
}
}
},
{
"filtered": {
"query": {
"nested": {
"path": "reports",
"query": {
"terms": {
"reports.36317.flag": [
"o"
],
"minimum_should_match": 1
}
}
}
},
"filter": {
"nested": {
"path": "reports",
"filter": {
"exists": {
"field": "reports.36317"
}
}
}
}
}
}
]
}
},
"script_fields": {
"idTone": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].idTone.empty ? _source.idLanguage : _source.reports[reportId].idTone",
"params": {
"reportId": "36317"
}
},
"tags": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].tags.empty ? 'none' : _source.reports[reportId].tags",
"params": {
"reportId": "36317"
}
},
"flag": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].flag.empty ? 'O' : _source.reports[reportId].flag",
"params": {
"reportId": "36317"
}
},
"synthesioRank": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].synthesioRank.empty || _source.reports[reportId].synthesioRank == null ? '0' : _source.reports[reportId].synthesioRank",
"params": {
"reportId": "36317"
}
},
"idUserEngagement": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].idUserEngagement == null ? '0' : _source.reports[reportId].idUserEngagement",
"params": {
"reportId": "36317"
}
},
"idStatus": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].idStatus == null ? '0' : _source.reports[reportId].idStatus",
"params": {
"reportId": "36317"
}
}
},
"fields": [
"access",
"content",
"title",
"date",
"geo",
"idItem",
"idSiteType",
"idSite",
"idSource",
"idSourceType",
"idTopic",
"media",
"url",
"idLanguage",
"idDocument",
"idCountry"
]
}

The thrown exception comes from MVEL, the scripting language used for the script_fields in your case.
The fact that only one shard fails suggests that one of your scripted fields fails when it is executed against a specific document stored in that shard.
You could try removing those fields one by one, along with any filter that uses the field, to find the one that fails.
Note: the Explain API is designed to help you understand how the score of a document is computed for a query. It won't help you debug a failing query.
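One way to follow the remove-one-by-one advice is to generate a separate request body per scripted field, so each script runs on its own and the one that makes a shard fail stands out. This is a small variation on removing fields one at a time: isolating each field also catches the case where more than one script is broken. The sketch below only builds the bodies with the standard library; the helper name is hypothetical, and sending the bodies (curl, an HTTP client, an Elasticsearch client) is left out. Keep the original bool query in each body rather than the match_all placeholder used here, since script fields are only computed for the hits that are actually returned.

```python
import copy
import json


def isolate_script_fields(body):
    """Return {field_name: request_body}, each body keeping exactly one script field.

    Hypothetical helper: everything else in the request (query, fields, ...) is
    copied unchanged; only the script_fields section is narrowed.
    """
    variants = {}
    for name, definition in body.get("script_fields", {}).items():
        variant = copy.deepcopy(body)
        variant["script_fields"] = {name: copy.deepcopy(definition)}
        variants[name] = variant
    return variants


# Trimmed copy of the failing request: only two of the six script fields are kept,
# and match_all is a placeholder for the original bool query.
request = {
    "query": {"match_all": {}},
    "script_fields": {
        "idTone": {
            "script": "_source.reports[reportId].empty || _source.reports[reportId].idTone.empty ? _source.idLanguage : _source.reports[reportId].idTone",
            "params": {"reportId": "36317"},
        },
        "flag": {
            "script": "_source.reports[reportId].empty || _source.reports[reportId].flag.empty ? 'O' : _source.reports[reportId].flag",
            "params": {"reportId": "36317"},
        },
    },
    "fields": ["idItem", "date"],
}

for name, variant in isolate_script_fields(request).items():
    # POST each body to /<index>/_search and check _shards.failed in the response.
    print(name, json.dumps(variant))
```

The body whose response still reports a failed shard points at the script to fix.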

Related

Full-text and knn_vector hybrid search for Elasticsearch

I am currently working on a search engine and have started to implement semantic search. I use the Open Distro version of Elasticsearch, and my mapping currently looks like this:
{
"settings": {
"index": {
"knn": true,
"knn.space_type": "cosinesimil"
}
},
"mappings": {
"properties": {
"title": {
"type" : "text"
},
"data": {
"type" : "text"
},
"title_embeddings": {
"type": "knn_vector",
"dimension": 600
},
"data_embeddings": {
"type": "knn_vector",
"dimension": 600
}
}
}
}
For basic knn_vector search I use this:
{
"size": size,
"query": {
"script_score": {
"query": {
"match_all": { }
},
"script": {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
}
}
}
and I've managed to get a kind of hybrid search with this:
{
"size": size,
"query": {
"function_score": {
"query": {
"multi_match": {
"query": query,
"fields": ["data", "title"]
}
},
"script_score": {
"script": {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
}
}
}
}
The problem is that if a document doesn't contain the query word, it is not returned. For example, with the first query, when I search for "trump" (which is not in my dataset), I get documents about social networks and politics. I don't get these results with the hybrid search.
I have tried this:
{
"size": size,
"query": {
"function_score": {
"query": {
"match_all": { }
},
"functions": [
{
"filter" : {
"multi_match": {
"query": query,
"fields": ["data", "title"]
}
},
"weight": 1
},
{
"script_score" : {
"script" : {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
},
"weight": 4
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
but the multi_match part gives a constant score to all matching documents, whereas I want it to rank documents as in a normal full-text query. Any idea how to do this? Or should I use another strategy? Thank you in advance.
With the help of Archit Saxena, here is the solution to my problem:
{
"size": size,
"query": {
"function_score": {
"query": {
"bool": {
"should" : [
{
"multi_match" : {
"query": query,
"fields": ["data", "title"]
}
},
{
"match_all": { }
}
],
"minimum_should_match" : 0
}
},
"functions": [
{
"script_score" : {
"script" : {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
},
"weight": 20
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
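To see why this version also returns documents that don't contain the query word, it helps to add up the scores by hand. Assuming standard function_score semantics (weight multiplies the function's value; with a single function, score_mode sum is just that value; boost_mode sum adds it to the query score) and a constant 1.0 from the match_all clause, the final score is roughly text score + 1.0 + 20 × (cosine_title + cosine_data). The sketch below only does that arithmetic; the BM25 and cosine numbers are made up for illustration.

```python
def hybrid_score(text_score, cos_title, cos_data, weight=20.0, match_all_score=1.0):
    """Approximate final score of the function_score query above.

    text_score: score of the multi_match clause (0.0 when the query word is absent).
    cos_title, cos_data: values of the two cosineSimilarity() calls in the script.
    """
    # bool/should: scores of matching clauses are summed; match_all always matches.
    query_score = text_score + match_all_score
    # Single function, score_mode "sum": its value is weight * script result.
    function_score = weight * (cos_title + cos_data)
    # boost_mode "sum": query score and function score are added together.
    return query_score + function_score


print(hybrid_score(0.0, 0.5, 0.25))  # no text match  -> 16.0
print(hybrid_score(3.0, 0.5, 0.25))  # text match     -> 19.0
```

The weight (20 here) sets how much the vector similarity counts relative to the BM25 score, so it is the main knob to tune for this kind of hybrid ranking.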

How do I get the size of a 'nested' type array through a Painless script in Elasticsearch version 6.7?

I am using Elasticsearch version 6.7. I have the following mapping:
{
"customers": {
"mappings": {
"customer": {
"properties": {
"name": {
"type": "keyword"
},
"permissions": {
"type": "nested",
"properties": {
"entityId": {
"type": "keyword"
},
"entityType": {
"type": "keyword"
},
"permission": {
"type": "keyword"
},
"permissionLevel": {
"type": "keyword"
},
"userId": {
"type": "keyword"
}
}
}
}
}
}
}
}
I want to run a query that shows all customers who have more than 0 permissions. I have tried the following:
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"lang": "painless",
"source": "params._source != null && params._source.permissions != null && params._source.permissions.size() > 0"
}
}
}
}
}
}
But this returns no hits because params._source is null; according to this Stack Overflow post, Painless does not have access to the _source document here. How can I write a Painless script that gives me all customers who have more than 0 permissions?
Solution 1: Using a script query in a must clause
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"lang": "painless",
"source": """
ArrayList st = params._source.permissions;
return st != null && st.size() > 0;
"""
}
}
}
]
}
}
}
Solution 2: Using Exists Query on nested fields
You could simply use an exists query, as below, to get customers who have more than 0 permissions.
Query:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "permissions",
"query": {
"bool": {
"should": [
{
"exists":{
"field": "permissions.permission"
}
},
{
"exists":{
"field": "permissions.entityId"
}
},
{
"exists":{
"field": "permissions.entityType"
}
},
{
"exists":{
"field": "permissions.permissionLevel"
}
}
]
}
}
}
}]
}
}
}
Solution 3: Create a definite structure, but add empty values to the fields
Another alternative would be to ensure that all documents have the fields.
Basically:
Ensure that every document has the permissions nested document.
For customers who don't have any permissions, set the field permissions.permission to 0.
Construct a query that retrieves such documents accordingly.
Below is a sample document for a user who doesn't have permissions:
POST mycustomers/customer/1
{
"name": "john doe",
"permissions": [
{
"entityId" : "null",
"entityType": "null",
"permissionLevel": 0,
"permission": 0
}
]
}
The query in that case would be as simple as this:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "permissions",
"query": {
"range": {
"permissions.permission": {
"gte": 1
}
}
}
}
}
]
}
}
}
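One thing to keep in mind with Solution 3: in the mapping from the question, permissions.permission is a keyword field, so a range query on it compares terms as strings, not as numbers. The sketch below mimics that comparison with Python string ordering (which matches byte order for ASCII values); the helper name and sample values are hypothetical.

```python
def keyword_gte(term, lower):
    """Mimic {"range": {field: {"gte": lower}}} on a keyword field: string comparison."""
    return str(term) >= str(lower)


samples = ["0", "1", "2", "10", "read", "write"]
matches = [value for value in samples if keyword_gte(value, 1)]
print(matches)  # ['1', '2', '10', 'read', 'write']
```

The "0" placeholder is excluded as intended, but any non-numeric permission string also matches "gte": 1; mapping the field as a numeric type avoids relying on string ordering.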
Hope this helps!

Filter by length of nested array

Here is my mapping:
{"field_name": {
"dynamic": "strict",
"properties": {...},
"type": "nested"
}}
I am trying to filter for only the documents that have at least one field_name entry.
I tried:
{"query": {
"bool": {
"filter": [ { "script" : {
"script" : {
"inline": "doc['field_name'].length >= 1",
"lang": "painless"
} } } ]
}
} }
But Elasticsearch complains with No field found for [field_name] in mapping with types [type_name].
I also tried wrapping the previous query in a nested query, but that didn't work either:
{ "nested": {
"path": "field_name",
"query": {
"bool": {
"filter": [ {
"script": {
"script": {
"inline": "doc['field_name'].length >= 1",
"lang": "painless"
}
}
} ]
}
}
} }
This gave the same error as above.
Any ideas?
If all nested objects share a common field, you can use an exists query to check whether each object exists, use score_mode sum to compute the count, and then use script_score to apply the condition you want, as in the code below:
{
"query": {
"function_score": {
"query": {
"nested": {
"path": "field_name",
"query": {
"exists": {
"field": "field_name.same_field"
}
},
"score_mode": "sum"
}
},
"functions": [
{
"script_score": {
"script": {
"source": "_score >= 1 ? 1 : 0"
}
}
}
],
"boost_mode": "replace"
}
},
"min_score": 1
}
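The counting trick relies on each matching nested object contributing a score of 1 (assuming exists behaves as a constant-score query with the default boost) and on score_mode sum adding those scores up per parent document. A small local simulation of that logic, with made-up documents; min_count is a hypothetical parameter showing how the script's threshold generalizes to "at least n objects":

```python
def nested_exists_score(nested_objects, field):
    """Nested exists query with score_mode "sum": 1.0 per object that has the field."""
    return sum(1.0 for obj in nested_objects if obj.get(field) is not None)


def keep_document(nested_objects, field, min_count=1, min_score=1):
    """Mirror the function_score: "_score >= min_count ? 1 : 0" with boost_mode
    "replace", followed by the top-level min_score cut-off."""
    if not any(obj.get(field) is not None for obj in nested_objects):
        # No nested object matches, so the nested query doesn't match the parent.
        return False
    final_score = 1 if nested_exists_score(nested_objects, field) >= min_count else 0
    return final_score >= min_score


docs = {
    "doc1": [{"same_field": "a"}, {"same_field": "b"}],
    "doc2": [],
    "doc3": [{"other_field": "x"}],
}
print([d for d, objs in docs.items() if keep_document(objs, "same_field")])  # ['doc1']
print([d for d, objs in docs.items() if keep_document(objs, "same_field", min_count=3)])  # []
```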
What I ended up doing was adding a field my_array_length when building each document. That way I can simply filter by the value of this field.
A simple approach is to use an exists query for each of the fields:
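A minimal sketch of the precomputed-length approach, assuming my_array_length is mapped as an integer field: compute the length when the document is built, then filter on it with an ordinary range query. The helper names are hypothetical; only the my_array_length field name comes from the answer.

```python
import json


def with_array_length(doc, array_field, length_field="my_array_length"):
    """Return a copy of the document with the array's length stored next to it.

    Missing or null arrays count as length 0.
    """
    enriched = dict(doc)
    enriched[length_field] = len(doc.get(array_field) or [])
    return enriched


def min_length_query(min_length=1, length_field="my_array_length"):
    """Request body keeping documents whose precomputed length is >= min_length."""
    return {"query": {"bool": {"filter": {"range": {length_field: {"gte": min_length}}}}}}


doc = {"field_name": [{"a": 1}, {"a": 2}]}
print(with_array_length(doc, "field_name")["my_array_length"])  # 2
print(json.dumps(min_length_query()))
```

The trade-off is that the length must be kept in sync on every update, but the query itself needs no script at all.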
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"exists": {
"field": "field_name.dynamic"
}
},
{
"exists": {
"field": "field_name.properties"
}
},
{
"exists": {
"field": "field_name.type"
}
}
],
"minimum_should_match": 1
}
}
}
}
}
You define a should clause with minimum_should_match so that only the relevant documents are returned.
See the exists query and bool query documentation.

Selecting documents where a specific field is NULL in Elasticsearch

Could someone please help me add expires_at IS NULL to the ES query below? I looked at the missing filter in the Dealing with Null Values section, but the way I used it (shown at the bottom) makes unexpired documents disappear from the results, so obviously I'm doing something wrong here.
Note: I don't want to use the or query because it is deprecated in 2.0.0-beta1.
QUERY
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"term": {
"order_id": "123"
}
},
{
"term": {
"is_active": 1
}
},
{
"range": {
"expires_at": {
"gt": "2016-07-01T00:00:00+0000"
}
}
}
]
}
}
}
}
}
This is what I'm aiming at:
SELECT * FROM orders
WHERE
order_id = '123' AND
is_active = '1' AND
(expires_at > '2016-07-01T00:00:00+0000' OR expires_at IS NULL)
This is what I did, but unexpired documents don't show up in this case, so it's wrong.
{
"query": {
"filtered": {
"filter": {
"missing": {
"field": "expires_at"
}
},
"query": {
"bool": {
"must": [
......
......
]
}
}
}
}
}
My ES version:
{
"status" : 200,
"name" : "Fan Boy",
"version" : {
"number" : "1.3.4",
"build_hash" : "a70f3ccb52200f8f2c87e9c370c6597448eb3e45",
"build_timestamp" : "2014-09-30T09:07:17Z",
"build_snapshot" : false,
"lucene_version" : "4.9"
},
"tagline" : "You Know, for Search"
}
This should do it:
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"term": {
"order_id": "123"
}
},
{
"term": {
"is_active": 1
}
},
{
"bool": {
"should": [
{
"range": {
"expires_at": {
"gt": "2016-07-01T00:00:00+0000"
}
}
},
{
"filtered": {
"filter": {
"missing": {
"field": "expires_at"
}
}
}
}
]
}
}
]
}
}
}
}
}
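The answer translates the SQL condition A AND B AND (C OR D) into a must list whose last entry is a bool with two should clauses; in query context, a bool that contains only should clauses matches when at least one of them matches. A sketch that builds the same ES 1.x request body in Python so the date can be passed in (the function name is hypothetical; filtered and missing exist only in these older versions):

```python
import json


def orders_query(order_id, expires_after):
    """order_id = X AND is_active = 1 AND (expires_at > date OR expires_at IS NULL)."""
    not_expired_or_null = {
        "bool": {
            "should": [
                {"range": {"expires_at": {"gt": expires_after}}},
                # "missing" matches documents with no value for the field (ES 1.x syntax).
                {"filtered": {"filter": {"missing": {"field": "expires_at"}}}},
            ]
        }
    }
    return {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"order_id": order_id}},
                            {"term": {"is_active": 1}},
                            not_expired_or_null,
                        ]
                    }
                }
            }
        }
    }


print(json.dumps(orders_query("123", "2016-07-01T00:00:00+0000"), indent=2))
```

On Elasticsearch 5.0 and later, where filtered and missing were removed, the second should clause would typically become {"bool": {"must_not": {"exists": {"field": "expires_at"}}}}.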

How to add two values in an Elasticsearch script?

I am trying to create a rows_processed field by adding two fields, src_s_rows and tgt_s_rows, but somehow it is not working: it always gives me 0. Even when I use "script": "(doc['src_s_rows'].value)" instead of "script": "(doc['src_s_rows'].value+doc['tgt_s_rows'].value)", it still gives me 0.
What am I missing? Please help.
GET run_hist/task_hist/_search
{
"fields": [
"THROUGHPUT_ROWS_PER_SEC",
"start_time",
"end_time",
"src_s_rows",
"tgt_s_rows"
],
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_id": "249885850"
}
}
]
}
}
}
},
"filter": {
"script": {
"script": "(doc['end_time'].value-doc['start_time'].value)>minutes*1",
"params": {
"minutes": 60000
}
}
},
"script_fields": {
"total_time_taken": {
"script": "(doc['end_time'].value-doc['start_time'].value)/1000"
},
"rows_processed": {
"script": "(doc['src_s_rows'].value+doc['tgt_s_rows'].value)"
}
},
"size": 10000
}
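In the Elasticsearch 1.x versions this request targets, date fields reach scripts through doc[...] as epoch milliseconds, so end_time minus start_time is a duration in milliseconds: the filter keeps tasks that ran longer than 60000 ms (one minute), and total_time_taken divides by 1000 to get seconds. A plain Python mirror of the three script expressions, with hypothetical values, can help check what each one should return for a given document:

```python
def script_fields(doc):
    """Mirror of the request's scripts for one document (times in epoch milliseconds)."""
    duration_ms = doc["end_time"] - doc["start_time"]
    return {
        # Filter script: (end_time - start_time) > minutes * 1, with minutes = 60000.
        "passes_filter": duration_ms > 60000 * 1,
        # total_time_taken: milliseconds converted to seconds.
        "total_time_taken": duration_ms / 1000,
        # rows_processed: plain sum of the two row counts.
        "rows_processed": doc["src_s_rows"] + doc["tgt_s_rows"],
    }


task = {"start_time": 1_000_000, "end_time": 1_090_000, "src_s_rows": 500, "tgt_s_rows": 480}
print(script_fields(task))
# {'passes_filter': True, 'total_time_taken': 90.0, 'rows_processed': 980}
```

If the real response shows 0 for rows_processed while the stored document has non-zero row counts, that suggests the values are not reaching the script through doc[...].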
Use _source.src_s_rows.value in place of doc['src_s_rows'].value.
Try this:
"script": "(_source.src_s_rows.value+_source.tgt_s_rows.value)"
