Elasticsearch: need to add a must to a bool should query

I have the following query that works as expected:
GET <index_name>/_search
{
"sort": [
{
"irFileCreateTime": {
"order": "desc"
}
}
],
"query": {
"bool": {
"should": [
{
"match": {
"fileId": 46704
}
},
{
"match": {
"fileId": 46706
}
},
{
"match": {
"fileId": 46719
}
}
]
}
}
}
The problem is that I need to further filter the data, but the field I need to filter on is a text field. I have tried many different ways of putting a must match into my query but everything is either malformed or filters out all hits when I know it should only filter out half. How can I add a must match "irStatus":"COMPLETE" to this query? Thanks in advance.

What you're after is a term query on, preferably, the keyword sub-field of irStatus. One thing to keep in mind: as soon as the bool query gains a must clause, its should clauses become optional (minimum_should_match defaults to 0), so it needs to be set back to 1 explicitly to keep the fileId restriction. That is to say:
GET index/_search
{
"sort": [
{
"irFileCreateTime": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"term": {
"irStatus.keyword": {
"value": "COMPLETE"
}
}
}
],
"should": [
{
"match": {
"fileId": 46704
}
},
{
"match": {
"fileId": 46706
}
},
{
"match": {
"fileId": 46719
}
}
]
}
}
}
Assuming your mapping looks something like this:
{
"mappings": {
"properties": {
"irFileCreateTime": {
"type": "date"
},
"fileId": {
"type": "integer"
},
"irStatus": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
The reason it's apparently failing on your end is that "COMPLETE" has been lowercased by the standard analyzer, so the analyzed text field only contains the token complete.
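If you want to verify that, the _analyze API shows what the standard analyzer produces for the value:

GET _analyze
{
  "analyzer": "standard",
  "text": "COMPLETE"
}

It returns a single token, complete, which is why an exact term lookup for COMPLETE only works against the keyword sub-field.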
Alternatively, you could do:
{
"must":[
{
"query_string":{
"query":"irStatus:COMPLETE AND (fileId:(46704 OR 46706 OR 46719))"
}
}
]
}

Related

Need help combining wildcard search with range query within elasticsearch?

I am trying to combine a wildcard with a date range in an Elasticsearch query, but it is not restricting the response based upon the wildcard search. It is returning items which have an incorrect date range.
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"hostName": "*abc*"
}
},
{
"range": {
"requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
]
}
}
]
}
}
}
The index mapping is as follows:
{
"index_history": {
"mappings": {
"applications_datalake": {
"properties": {
"query": {
"properties": {
"term": {
"properties": {
"server": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
},
"index-data-type": {
"properties": {
"attributes": {
"properties": {
"wwnListForServer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"hostName": {
"type": "keyword"
},
"requestDate": {
"type": "date"
},
"requestedBy": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
}
}
You missed the minimum_should_match parameter.
Check this out:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html.
I think your query should look like this:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"hostName": "*abc*"
}
},
{
"range": {
"requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
],
"minimum_should_match" : 2
}
}
]
}
}
}
From the documentation :
You can use the minimum_should_match parameter to specify the number
or percentage of should clauses returned documents must match.
If the bool query includes at least one should clause and no must or
filter clauses, the default value is 1. Otherwise, the default value
is 0.
According to your mappings, you have to call out the fully qualified property for the hostName and requestDate fields. Example:
"wildcard": {
"index-data-type.hostName": {
"value": "..."
}
}
You could also consider reducing your compound queries to just the main bool query, using the must clause, and applying a filter. Example:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"must": [
{
"wildcard": {
"index-data-type.hostName": {
"value": "*abc*"
}
}
}
],
"filter": {
"range": {
"index-data-type.requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
}
}
}
The filter context doesn't contribute to the _score, yet it still reduces your number of hits.
Warning:
Using a leading asterisk (*) in a wildcard query can have a severe performance impact on your queries.
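If the substring search on hostName is a common access pattern, one way to avoid the leading asterisk entirely is an ngram analyzer plus a plain match query. A sketch, assuming you can reindex; the index name index_history_v2 and analyzer name substring_analyzer are made up here, and older versions also expect the type name inside mappings:

PUT index_history_v2
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "substring_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      },
      "analyzer": {
        "substring_analyzer": {
          "tokenizer": "substring_tokenizer",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "hostName": {
        "type": "text",
        "analyzer": "substring_analyzer"
      }
    }
  }
}

GET index_history_v2/_search
{
  "query": {
    "match": {
      "hostName": "abc"
    }
  }
}

Every 3-character fragment of hostName is indexed, so a match query for a short fragment like abc finds *abc* style substrings without any wildcard scan.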

How do I get the size of a 'nested' type array through a Painless script in Elasticsearch version 6.7?

I am using Elasticsearch version 6.7. I have the following mapping:
{
"customers": {
"mappings": {
"customer": {
"properties": {
"name": {
"type": "keyword"
},
"permissions": {
"type": "nested",
"properties": {
"entityId": {
"type": "keyword"
},
"entityType": {
"type": "keyword"
},
"permission": {
"type": "keyword"
},
"permissionLevel": {
"type": "keyword"
},
"userId": {
"type": "keyword"
}
}
}
}
}
}
}
}
I want to run a query that shows all customers who have > 0 permissions. I have tried the following:
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"lang": "painless",
"source": "params._source != null && params._source.permissions != null && params._source.permissions.size() > 0"
}
}
}
}
}
}
But this returns no hits, because params._source is null, as Painless does not have access to the _source document according to this Stack Overflow post. How can I write a Painless script that gives me all customers who have > 0 permissions?
Solution 1: Using Script with must query
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"lang": "painless",
"inline": """
ArrayList st = params._source.permissions;
if(st!=null && st.size()>0)
return true;
"""
}
}
}
]
}
}
}
Solution 2: Using Exists Query on nested fields
You could simply make use of an exists query, something like the one below, to get customers who have > 0 permissions.
Query:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "permissions",
"query": {
"bool": {
"should": [
{
"exists":{
"field": "permissions.permission"
}
},
{
"exists":{
"field": "permissions.entityId"
}
},
{
"exists":{
"field": "permissions.entityType"
}
},
{
"exists":{
"field": "permissions.permissionLevel"
}
}
]
}
}
}
}]
}
}
}
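If the permission field is guaranteed to be populated on every permissions entry, the should list can be trimmed down to a single exists clause (a reduced sketch of the same idea):

POST <your_index_name>/_search
{
  "query": {
    "nested": {
      "path": "permissions",
      "query": {
        "exists": {
          "field": "permissions.permission"
        }
      }
    }
  }
}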
Solution 3: Create a definitive structure but add empty values to the fields
Another alternative would be to ensure that all documents have the fields.
Basically:
Ensure that all documents have the permissions nested document.
For those that have no permissions, just set the field permissions.permission to 0.
Construct a query that retrieves such documents accordingly.
Below would be a sample document for a user who doesn't have permissions:
POST mycustomers/customer/1
{
"name": "john doe",
"permissions": [
{
"entityId" : "null",
"entityType": "null",
"permissionLevel": 0,
"permission": 0
}
]
}
The query in that case would be as simple as this:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "permissions",
"query": {
"range": {
"permissions.permission": {
"gte": 1
}
}
}
}
}
]
}
}
}
Hope this helps!

Score keyword terms query on nested fields in Elasticsearch 6.3

I have a set of keywords (skills in my example) and I would like to retrieve documents which match most of them. The documents should be sorted by how many of the keywords they match. The field I am searching on (skills) is of nested type. The index has the following mapping:
{
"mappings": {
"profiles": {
"properties": {
"id": {
"type": "keyword"
},
"skills": {
"type": "nested",
"properties": {
"level": {
"type": "float"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
}
I tried both a terms query on the keyword field like:
{
"query": {
"nested": {
"path": "skills",
"query": {
"terms": {
"skills.name": [
"python",
"java"
]
}
}
}
}
}
And a boolean query
{
"query": {
"nested": {
"path": "skills",
"query": {
"bool": {
"should": [
{
"terms": {
"skills.name": [
"java"
]
}
},
{
"terms": {
"skills.name": [
"r"
]
}
}
]
}
}
}
}
}
For both queries the maximum score of the returned documents is 1. Thus both return documents that have ANY of the skills, but do not sort them such that those with both skills are on top. The issue seems to be that skills is a nested field.
The second query works if each element of should is a nested query.
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "skills",
"query": {
"terms": {
"skills.name": [
"java"
]
}
}
}
},
{
"nested": {
"path": "skills",
"query": {
"terms": {
"skills.name": [
"r"
]
}
}
}
}
]
}
}
}
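Each top-level should clause now carries its own nested score, so documents matching both skills outrank documents matching only one. If you want the score to be exactly the number of matched skills rather than a TF/IDF-flavoured value, a variation (sketched below) wraps each nested clause in constant_score so every matched skill contributes exactly 1:

{
  "query": {
    "bool": {
      "should": [
        {
          "constant_score": {
            "filter": {
              "nested": {
                "path": "skills",
                "query": { "term": { "skills.name": "java" } }
              }
            }
          }
        },
        {
          "constant_score": {
            "filter": {
              "nested": {
                "path": "skills",
                "query": { "term": { "skills.name": "r" } }
              }
            }
          }
        }
      ]
    }
  }
}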

Filtered bool vs Bool query : elasticsearch

I have two queries in ES. They have different turnaround times on the same set of documents, yet conceptually both are doing the same thing. I have a few doubts:
1- What is the difference between these two?
2- Which one is better to use?
3- If both are the same, why do they perform differently?
1. Filtered bool
{
"from": 0,
"size": 5,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1987112602"
}
},
{
"term": {
"original_sender_address_number": "6870340319"
}
},
{
"range": {
"x_event_timestamp": {
"gte": "2016-07-01T00:00:00.000Z",
"lte": "2016-07-30T00:00:00.000Z"
}
}
}
]
}
}
}
},
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
2. Simple Bool
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
Mapping:
{
"ccp": {
"mappings": {
"type1": {
"properties": {
"original_sender_address_number": {
"type": "string"
},
"called_party_address_number": {
"type": "string"
},
"cause_code": {
"type": "string"
},
"x_event_timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
}
}
}
}
}
Update 1:
I tried a bool/must query and a bool/filter query on the same set of data, but I found some strange behaviour:
1-
The bool/must query is able to find the desired document.
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
2-
While the bool/filter query is not able to find the document. If I remove the second field's condition, it returns the same record, whose cause_code value is 401.
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
Update 2:
Found a solution: the scoring phase of the bool/must query can be suppressed by wrapping it within "constant_score".
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1235235757"
}
},
{
"term": {
"cause_code": "304"
}
}
]
}
}
}
}
}
The record we are trying to match has "called_party_address_number": "1235235757" and "cause_code": "304".
The first one uses the old 1.x query/filter syntax (i.e. filtered queries have been deprecated in favor of bool/filter).
The second one uses the new 2.x syntax but not in a filter context (i.e. you're using bool/must instead of bool/filter). The 2.x-syntax query that is equivalent to your first query (i.e. one that runs in a filter context without score calculation, and is therefore faster) would be this one:
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}

Elasticsearch query not returning results

I have 2 queries; the difference between them is just 1 filter term.
The first query:
GET _search
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*"
}
},
"filter": {
"and": {
"filters": [
{
"term": {
"type": "log"
}
},
{
"term": {
"context.blueprint_id": "adv1"
}
},
{
"term": {
"context.deployment_id": "deploy1"
}
}
]
}
}
}
}
}
returns this result:
{
"_source": {
"level": "info",
"timestamp": "2014-03-24 10:12:41.925680",
"message_code": null,
"context": {
"blueprint_id": "Adv1",
"execution_id": "27efcba7-3a60-4270-bbe2-9f17d602dcef",
"deployment_id": "deploy1"
},
"type": "log",
"#version": "1",
"#timestamp": "2014-03-24T10:12:41.927Z"
}
}
The second query is:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*"
}
},
"filter": {
"and": {
"filters": [
{
"term": {
"type": "log"
}
},
{
"term": {
"context.blueprint_id": "adv1"
}
},
{
"term": {
"context.deployment_id": "deploy1"
}
},
{
"term": {
"context.execution_id": "27efcba7-3a60-4270-bbe2-9f17d602dcef"
}
}
]
}
}
}
}
}
returns empty results.
The difference between them is that in the second query I just added this term:
{
"term": {
"context.execution_id": "27efcba7-3a60-4270-bbe2-9f17d602dcef"
}
}
From the first query's result we can see that there is a document matching that term, but it still does not work.
What am I doing wrong here?
Thanks.
By default, Elasticsearch treats string fields as text and analyzes them (i.e. tokenizes, stems, etc. before indexing). This means you might not be able to find them when searching for their exact content.
You should make sure the mapping for the execution_id field is not analyzed. Start with GET /_mappings and work from there. :)
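Given the filtered/and syntax above, this looks like a 1.x cluster, so the usual fix is a not_analyzed (sub-)field. A sketch of adding a raw sub-field via the put-mapping API; the index name logs and type name log are placeholders, and already-indexed documents need to be reindexed before the new sub-field is populated:

PUT logs/_mapping/log
{
  "log": {
    "properties": {
      "context": {
        "properties": {
          "execution_id": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          }
        }
      }
    }
  }
}

The term filter would then target context.execution_id.raw instead of context.execution_id.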
