Elasticsearch bool query with filter terms and string search returning inconsistent results

I'm running the following query against Elasticsearch, which matches documents on a string search combined with a terms match on a property. When I pass a single term, I get the expected results, but when I add more terms, the results change unexpectedly. Any ideas?
{
"_source": {
"includes": [
"docID"
]
},
"query": {
"bool": {
"must": [
{
"terms": {
"userID": [
1,
2,
71
]
}
},
{
"query_string": {
"query": "**test**",
"fields": [
"attachment.content"
]
}
}
]
}
}
}
If I pass only userID 1 and omit the others, I get the docIDs I expect (e.g. 1, 4, 8), but when I pass all three userIDs, several docIDs are missing from the results (e.g. I get 1, 6, 8, but not 4). I'm using Elasticsearch 6.5.
Hopefully someone understands better than I why this is!
Thanks in advance!

By default, Elasticsearch returns only the first 10 hits. The missing documents may simply be on the next page. You can raise the top-level "size" parameter to a larger number, for example:
{
"size": 30,
"_source": {
"includes": [
"docID"
]
},
"query": {
"bool": {
"must": [
{
"terms": {
"userID": [
1,
2,
71
]
}
},
{
"query_string": {
"query": "**test**",
"fields": [
"attachment.content"
]
}
}
]
}
}
}
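A minimal Python sketch of the same request body, in case it is built programmatically. The helper name `build_search_body` is hypothetical, and the field names are taken from the question. As an optional change that is not part of the answer above, the sketch moves the `userID` terms clause into `filter`, since it is an exact yes/no condition that does not need to contribute to scoring.

```python
import json

def build_search_body(user_ids, text, size=30):
    """Build the search body with an explicit page size (hypothetical helper)."""
    return {
        # Elasticsearch returns only 10 hits when "size" is omitted.
        "size": size,
        "_source": {"includes": ["docID"]},
        "query": {
            "bool": {
                # Full-text part: contributes to the relevance score.
                "must": [
                    {"query_string": {"query": text, "fields": ["attachment.content"]}}
                ],
                # Exact-match part: only restricts which documents qualify.
                "filter": [{"terms": {"userID": list(user_ids)}}],
            }
        },
    }

body = build_search_body([1, 2, 71], "**test**")
print(json.dumps(body, indent=2))
```

If the number of matches can exceed whatever `size` is chosen, paging with `from` and `size` is the usual next step.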

Related

Specify size for each subquery in Elasticsearch

I have a query that is similar to a UNION operation in SQL. What I need is to specify the size of the result set for each index. For example, I want to get 10 records from the first index and 15 records from the second index.
My query:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [{
"match_phrase_prefix": {"userName": "ar" }
}]
}
},
{
"bool": {
"must": [{
"match_phrase_prefix": { "groupName": "ar" }
}]
}
}
]
}
}
}
URL the query is sent to:
http://website.com:9200/user_data,group_data/_search
If you have any thoughts, I'd be very grateful.
Thank you
I don't think you can do that with a simple query.
But you can do it with the Top Hits aggregation, which lets you group result sets by certain fields via a bucket aggregator. Your case should look like:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [{
"match_phrase_prefix": {"userName": "ar" }
}]
}
},
{
"bool": {
"must": [{
"match_phrase_prefix": { "groupName": "ar" }
}]
}
}
]
}
},
"size": 0,
"aggs": {
"10_usernames": {
"top_hits": {
"_source": {
"includes": [ "userName" ]
},
"size" : 10
}
},
"15_groupames": {
"top_hits": {
"_source": {
"includes": [ "groupName" ]
},
"size" : 15
}
}
}
}
The query itself stays the same. Because "size" is set to 0, the "hits" field comes back empty, so you can focus on the "aggregations" field, where you'll see your results.
Hope this is helpful! :D
Ok, thanks for help
Eventually I chose another approach. I use the Multi Search API, which lets you execute several requests at once. My query is:
POST http://website.com:9200/_msearch
{"index": "user_data"}
{"size":10,"query":{"bool":{"must":[{"match_phrase_prefix":{"userName":"##USER_TEXT##"}}]}}}
{"index": "group_data"}
{"size":15,"query":{"bool":{"must":[{"match_phrase_prefix":{"groupName":"##USER_TEXT##"}}]}}}
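The Multi Search body above is newline-delimited JSON: a header line followed by a body line for each search, with a newline after the last line as well. A minimal Python sketch of assembling such a payload follows. The helper names `build_msearch_body` and `prefix_search` are hypothetical, and the search text "ar" is borrowed from the original question in place of the `##USER_TEXT##` placeholder.

```python
import json

def prefix_search(field, text, size):
    """One per-index search body with its own result size."""
    return {
        "size": size,
        "query": {"bool": {"must": [{"match_phrase_prefix": {field: text}}]}},
    }

def build_msearch_body(searches):
    """Serialize (header, body) pairs as newline-delimited JSON."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # e.g. {"index": "user_data"}
        lines.append(json.dumps(body))    # the search to run against that index
    # The payload has to end with a newline character.
    return "\n".join(lines) + "\n"

payload = build_msearch_body([
    ({"index": "user_data"}, prefix_search("userName", "ar", 10)),
    ({"index": "group_data"}, prefix_search("groupName", "ar", 15)),
])
print(payload)
```

The payload would then be sent to the `_msearch` endpoint, typically with the `Content-Type: application/x-ndjson` header. The response contains one entry per search, in the same order as the requests.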

Elasticsearch search in documents with certain values for a field

I have an index whose documents have 5 fields. I have written a search query as follows:
{
"query": {
"query_string": {
"fields": [
"field1.keyword",
"field2.keyword",
"field3.keyword"
],
"query": "*abc*"
}
},
"from": 0,
"size": 1000
}
This works fine, but as a new requirement I have to search only in documents where field4 has one of a given set of values, say (1, 2, 3), and omit the rest of the documents.
I can also obtain the list of field4 values that are to be omitted, since they are present in the DB with skip status.
Please suggest a solution. Thanks in advance.
I suggest using a filter query inside a bool query to match the docs that meet the condition.
{
"query": {
"bool": {
"must": {
"query_string": {
"fields": [
"field1.keyword",
"field2.keyword",
"field3.keyword"
],
"query": "*abc*"
}
},
"filter": {
"terms": {
"field4.keyword": [1, 2, 3]
}
}
}
}
}
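The question also mentions that a list of field4 values to skip is available. A minimal Python sketch covering both directions follows: `filter` with `terms` keeps only the allowed values, while `must_not` with `terms` drops the skipped ones. The helper name `build_field4_search` and the skipped values 4 and 5 are hypothetical. The default field name `field4.keyword` follows the answer above and may need to be plain `field4` if the field is mapped as a number.

```python
import json

SEARCH_FIELDS = ["field1.keyword", "field2.keyword", "field3.keyword"]

def build_field4_search(text, allowed=(), skipped=(), field="field4.keyword", size=1000):
    """Wildcard search restricted by an allow list and/or a skip list."""
    bool_query = {
        "must": {"query_string": {"fields": SEARCH_FIELDS, "query": text}}
    }
    if allowed:
        # Keep only documents whose field value is in the allow list.
        bool_query["filter"] = {"terms": {field: list(allowed)}}
    if skipped:
        # Drop documents whose field value is in the skip list.
        bool_query["must_not"] = {"terms": {field: list(skipped)}}
    return {"query": {"bool": bool_query}, "from": 0, "size": size}

include_body = build_field4_search("*abc*", allowed=[1, 2, 3])
exclude_body = build_field4_search("*abc*", skipped=[4, 5])  # 4 and 5 are made-up values
print(json.dumps(exclude_body, indent=2))
```

Which variant is more practical probably depends on which list is shorter and easier to keep up to date.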

Elastic search bool query

My objective is to find the 10 most recent documents whose message ID is MSG-1013 and whose Severity field is Info. Both conditions must be satisfied, and the text must match exactly. I have tried the search query below, but it does not give me the expected results. What am I doing wrong here?
{
"size": 10,
"query": {
"bool": {
"must": [
{
"match": { "messageId": "MSG-1013" }
},
{
"match": { "Severity": "Info" }
}
]
}
}
}
If I have understood you correctly, you want to find the top 10 (most recent) documents whose "messageId" and "Severity" fields match exactly. I assume you don't need a relevance score, because your ranking seems to be based on the document timestamp or some other date field. For this purpose, you could use the bool filter in combination with a sort.
{
"query": {
"bool": {
"filter": [
{ "term": { "messageId": "MSG-1013" } },
{ "term": { "Severity": "Info" } }
]
}
},
"sort" : [
{ "documentTimestamp" : {"order" : "desc"}}
],
"size": 10
}
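A minimal Python sketch of the same filter-plus-sort body. One assumption needs checking against the actual mapping: `term` compares against the indexed value without analysis, so it matches "MSG-1013" exactly only on a keyword-type field. The sketch therefore targets `messageId.keyword` and `Severity.keyword`, the sub-fields that default dynamic mapping creates for strings in recent versions. These names, the helper name `build_recent_exact_query`, and the `documentTimestamp` sort field are assumptions rather than facts about the asker's index.

```python
import json

def build_recent_exact_query(exact_values, sort_field, size=10):
    """Exact-match filters on several fields, newest documents first."""
    # One term filter per field; filters do not compute a relevance score.
    filters = [{"term": {field: value}} for field, value in exact_values.items()]
    return {
        "query": {"bool": {"filter": filters}},
        "sort": [{sort_field: {"order": "desc"}}],
        "size": size,
    }

body = build_recent_exact_query(
    {"messageId.keyword": "MSG-1013", "Severity.keyword": "Info"},  # assumed field names
    sort_field="documentTimestamp",  # placeholder date field from the answer
)
print(json.dumps(body, indent=2))
```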

"match_phrase" hit with no highlights returned

I have an index that includes the full text of different books belonging to a specific series. Each document represents a different volume in a series, and each volume has a set of nested documents, each corresponding to a section of text in that book. This is the query we are using to get highlights matching a specific phrase within all the books of a given series:
{
"from": 0,
"size": 3,
"query": {
"bool": {
"must": [
{
"nested": {
"query": {
"bool": {
"must": [
{
"match": {
"sections.content.phrase": {
"query": "theory legal",
"type": "phrase",
"slop": X
}
}
}
]
}
},
"path": "sections",
"inner_hits": {
"highlight": {
"order": "score",
"fields": {
"sections.content.phrase": {}
}
},
"_source": {
"include": [
"title",
"id"
]
}
}
}
}
],
"filter": [
{
"term": {
"series": "00410"
}
}
]
}
}
}
Normally this query works fine, but for some series we get hits in books with no highlighted text returned. For example, with the above phrase query, series, and a slop value of 1, we correctly get a single hit for one book in the series: "each allegation of discrimination or each <em>theory</em> of <em>legal</em> recovery not required to be set forth in separate". If we take the same query and raise the slop value to 3, we suddenly get hits in 5 different books, each with no matching highlights found. Not even the original hit from when the slop value was 1 is returned. Why are we getting these results?

More_like_this query with a filter

I have 1702 documents indexed in Elasticsearch. Each has Category as one of its fields, and also a field named SequentialId.
I initially fetched the documents with category 1.1 that fall between document 1 and document 850, as shown below.
POST testucb/docs/_search
{
"size": 1702,
"query": {
"bool": {
"must": [
{"match": {
"Category": "1.1"
}}
],
"filter":[
{
"range":
{
"SequentialId":
{
"gte":1,
"lte":850
}
}
}
]
}
}
}
The above query gave me 834 documents that matched category 1.1. (I have a binary that parses the 834 _ids out of the resulting JSON output.)
My goal now is to feed these 834 _ids into a more_like_this query as a training set for the remaining documents, which are my test set (docs with SequentialId 851 to 1702).
I tried the more_like_this query below with the filter.
POST /testucb/docs/_search
{
"size": 1702,
"fields": [
"SequentialId",
"Category",
"PRIMARY_CONTENT_EN"
],
"query": {
"more_like_this":
{
"fields": [
"PRIMARY_CONTENT_EN"
],
"like":[
<-----------834 _ids goes here ---->
],
"filter":[
{
"range":
{
"SequentialId":
{
"gte":851,
"lte":1702
}
}
}
],
"min_term_freq": 1,
"min_doc_freq": 1,
"max_query_terms": 15,
"min_word_len": 3,
"stop_words": [
],
"boost": 2,
"include":false
}
}
}
I am getting a query parsing exception which says that MLT does not support filter.
I am not sure how I can provide the remaining documents, with SequentialId from 851 to 1702, as my test set.
I hope I am clear about what I am trying to accomplish. Can you please help me with this task? I am new to Elasticsearch.
If you want to run a more_like_this query and filter beforehand, you should use a bool query with a filter clause (Elasticsearch 2.0 and later):
POST /testucb/docs/_search
{
"size": 1702,
"fields": [
"SequentialId",
"Category",
"PRIMARY_CONTENT_EN"
],
"query": {
"bool": {
"must": [
{
"more_like_this": {
"fields": [
"PRIMARY_CONTENT_EN"
],
"like": [
<-----------834 _ids goes here ---->
],
"min_term_freq": 1,
"min_doc_freq": 1,
"max_query_terms": 15,
"min_word_len": 3,
"stop_words": [],
"boost": 2,
"include": false
}
}
],
"filter": {
"range": {
"SequentialId": {
"gte": 851,
"lte": 1702
}
}
}
}
}
}
If you use an older version of Elasticsearch, you should use the filtered query instead.
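A minimal Python sketch of the two wrappers side by side, in case the same code has to target both old and new clusters. The `filtered` query takes a `query` and a `filter` key. It was deprecated in Elasticsearch 2.0 and removed in 5.0, so the legacy branch only applies to older versions. The helper name `wrap_with_filter` is hypothetical, and the single `like` entry is a placeholder, not one of the 834 ids. The inner more_like_this parameters are passed through unchanged and may themselves need adjusting on older versions.

```python
import json

def wrap_with_filter(query, filter_clause, legacy=False):
    """Combine a scoring query with a non-scoring filter."""
    if legacy:
        # Pre-2.0 style: the "filtered" query (removed in Elasticsearch 5.0).
        return {"filtered": {"query": query, "filter": filter_clause}}
    # 2.0 and later: bool query with a filter clause.
    return {"bool": {"must": [query], "filter": filter_clause}}

mlt_query = {
    "more_like_this": {
        "fields": ["PRIMARY_CONTENT_EN"],
        "like": [{"_id": "placeholder-id"}],  # placeholder document reference
        "min_term_freq": 1,
        "min_doc_freq": 1,
        "max_query_terms": 15,
    }
}
test_set_range = {"range": {"SequentialId": {"gte": 851, "lte": 1702}}}

modern_body = {"size": 1702, "query": wrap_with_filter(mlt_query, test_set_range)}
legacy_body = {"size": 1702, "query": wrap_with_filter(mlt_query, test_set_range, legacy=True)}
print(json.dumps(legacy_body, indent=2))
```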
