More_like_this query with a filter - elasticsearch

I have 1702 documents indexed in Elasticsearch. Each document has a Category field and a SequentialId field.
I first fetched the documents with category 1.1 whose SequentialId is between 1 and 850, like this:
POST testucb/docs/_search
{
"size": 1702,
"query": {
"bool": {
"must": [
{"match": {
"Category": "1.1"
}}
],
"filter":[
{
"range":
{
"SequentialId":
{
"gte":1,
"lte":850
}
}
}
]
}
}
}
The query above returned 834 documents that matched category 1.1. (I have a binary that parses the 834 _ids out of the resulting JSON output.)
My goal now is to pass these 834 _ids to a more_like_this query as a training set and run it against the remaining documents, which are my test set (the documents with SequentialId 851 to 1702).
I tried the more_like_this query below with a filter:
POST /testucb/docs/_search
{
"size": 1702,
"fields": [
"SequentialId",
"Category",
"PRIMARY_CONTENT_EN"
],
"query": {
"more_like_this":
{
"fields": [
"PRIMARY_CONTENT_EN"
],
"like":[
<-----------834 _ids goes here ---->
],
"filter":[
{
"range":
{
"SequentialId":
{
"gte":851,
"lte":1702
}
}
}
],
"min_term_freq": 1,
"min_doc_freq": 1,
"max_query_terms": 15,
"min_word_len": 3,
"stop_words": [
],
"boost": 2,
"include":false
}
}
}
I get a query parsing exception saying that MLT does not support filter.
I am not sure how to restrict the query to the remaining documents, with SequentialId from 851 to 1702, as my test set.
I hope it is clear what I am trying to accomplish. Can you please help me with this? I am new to Elasticsearch.

If you want to run a more_like_this query and restrict it with a filter, wrap it in a bool query and put the filter in the bool query's filter clause (Elasticsearch 2.0 or later):
POST /testucb/docs/_search
{
"size": 1702,
"fields": [
"SequentialId",
"Category",
"PRIMARY_CONTENT_EN"
],
"query": {
"bool": {
"must": [
{
"more_like_this": {
"fields": [
"PRIMARY_CONTENT_EN"
],
"like": [
<-----------834 _ids goes here ---->
],
"min_term_freq": 1,
"min_doc_freq": 1,
"max_query_terms": 15,
"min_word_len": 3,
"stop_words": [],
"boost": 2,
"include": false
}
}
],
"filter": {
"range": {
"SequentialId": {
"gte": 851,
"lte": 1702
}
}
}
}
}
}
If you use an older version of Elasticsearch, use the filtered query instead.
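A minimal sketch of building that request body in Python, assuming the 834 training _ids are already available as a list. The helper names (`mlt_clause`, `range_filter`, `bool_filtered`, `legacy_filtered`) and the `{"_index", "_type", "_id"}` form used for each `like` entry are illustrative assumptions, not something taken from the question. The last helper shows the shape of the `filtered` query for older releases.

```python
import json


def mlt_clause(train_ids, index="testucb", doc_type="docs"):
    """more_like_this clause that uses already-indexed documents as examples."""
    return {
        "more_like_this": {
            "fields": ["PRIMARY_CONTENT_EN"],
            # Assumed reference form: each entry points at an indexed document.
            "like": [
                {"_index": index, "_type": doc_type, "_id": str(doc_id)}
                for doc_id in train_ids
            ],
            "min_term_freq": 1,
            "min_doc_freq": 1,
            "max_query_terms": 15,
            "min_word_len": 3,
        }
    }


def range_filter(lo, hi):
    """Range filter on SequentialId, inclusive on both ends."""
    return {"range": {"SequentialId": {"gte": lo, "lte": hi}}}


def bool_filtered(query, flt):
    """Elasticsearch 2.0+: the filter sits next to the query inside bool."""
    return {"bool": {"must": [query], "filter": flt}}


def legacy_filtered(query, flt):
    """Older releases: the same idea expressed with the filtered query."""
    return {"filtered": {"query": query, "filter": flt}}


if __name__ == "__main__":
    train_ids = ["1", "2", "3"]  # placeholder ids, not real data
    body = {
        "size": 1702,
        "query": bool_filtered(mlt_clause(train_ids), range_filter(851, 1702)),
    }
    print(json.dumps(body, indent=2))
```

The point of the split is that the range restriction never ends up inside the more_like_this object, which is what triggered the parsing exception.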

Related

ElasticSearch : how to sort ES document based on document version

Following is my ES query. I want to sort my documents in descending order based on "_version", but I am not sure how to do it.
{
"query":
{
"bool":
{
"must":
[
{
"terms":
{
"streamingSegmentId":
[
"00003319-b7fa-3409-806a-fa3bb5d2be26"
],
"boost": 1
}
},
{
"range":
{
"streamingSegmentStartTime":
{
"from": 1644480000000,
"to": 1647476658447,
"include_lower": true,
"include_upper": false,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"version": true,
"_source":
{
"includes":
[
"errorCount",
"benefitId",
"streamingSegmentStopTime",
"fanoutPublishTimestamp",
"sessionUpdateTime",
"contentSegmentUpdateTime"
],
"excludes":
[]
},
"sort":
[
{
"streamingSegmentStartTime":
{
"order": "asc"
}
},
{
"_version":
{
"order": "desc"
}
}
]
}
I was not able to sort on _version, so I added a timestamp field called fanoutPublishTimestamp and sort my documents on it in descending order. My updated query is below; I use collapse to fetch only the document with the latest timestamp. The problem I now face with this query is that collapse cannot be used together with search_after, which I use to add pagination support to my ES query.
I'm using AWS Elasticsearch, which runs ES 7.10, and only ES 8.1 supports collapse with search_after. Please let me know if anybody has a better solution for this issue.
GET /sessions/_search
{
"size": 2,
"query": {
"bool": {
"must": [
{
"terms": {
"benefitId": [
"PRIME"
],
"boost": 1
}
},
{
"range": {
"streamingSegmentStartTime": {
"from": 1647821557000,
"to": 1647825157000,
"include_lower": true,
"include_upper": false,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"_source": {
"includes": [
"deviceTypeId",
"timeline"
],
"excludes": []
},
"sort": [
{
"streamingSegmentStartTime": {
"order": "asc"
}
},
{
"fanoutPublishTimestamp": {
"order": "desc"
}
}
],
"search_after": [
"1647821557001",
"1647829603837"
],
"collapse": {
"field": "streamingSegmentId"
}
}
As you didn't provide sample documents or explain what exactly is not working, I am assuming the cause is the two sort parameters you are using. If you sort on _version alone, it works (tested locally on my sample documents).
Most likely your other sort criterion, streamingSegmentStartTime, causes some documents with a higher _version to come later in the response. Try removing it and see whether that gives you the expected result.
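To illustrate why the second sort key only acts as a tie-breaker, here is a small Python sketch that mimics multi-key ordering on made-up hits. The documents and the helper names are hypothetical. It only models the ordering semantics and does not talk to Elasticsearch.

```python
# Hypothetical hits; the values are invented for illustration only.
hits = [
    {"id": "a", "streamingSegmentStartTime": 100, "_version": 1},
    {"id": "b", "streamingSegmentStartTime": 200, "_version": 7},
    {"id": "c", "streamingSegmentStartTime": 100, "_version": 3},
]


def sort_two_keys(docs):
    """Start time ascending first; version descending only breaks ties."""
    return sorted(
        docs,
        key=lambda d: (d["streamingSegmentStartTime"], -d["_version"]),
    )


def sort_version_only(docs):
    """Version descending as the single sort key."""
    return sorted(docs, key=lambda d: -d["_version"])


two_keys = [d["id"] for d in sort_two_keys(hits)]
one_key = [d["id"] for d in sort_version_only(hits)]

print(two_keys)  # ['c', 'a', 'b']
print(one_key)   # ['b', 'c', 'a']
```

With both keys, document b has the highest version but still comes last because its start time is larger. With the version as the only key, it comes first.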

Elastic search query is not executed

Hi, I am using Elasticsearch to search for items that are placed in buildings. When I run this query, the items returned are not sorted, even if I change the sort direction. My first impression is that the sort block is not executed at all. Is there something wrong with the query?
{
"from": 0,
"size": 20,
"query": {
"bool": {
"filter": [
{
"terms": {
"buildingsUuid": [
"9caff147-d019-416a-a167-f02bab7334fd"
],
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"sort": [
{
"itemId": {
"order": "desc"
}
}
]
}

Multi match query with terms lookup searching multiple indices elasticsearch 6.x

All,
I am working on building a NEST 6.x query that takes a search term and looks in different fields in different indices.
This is what I have so far, but it is not returning the results I expect.
Please see the details below.
Indices used:
dev-sample-search
user-agents-search
The search should work as follows:
The value in the query field (27921093) is searched against the fields agentNumber, customerName, fileNumber and documentid (these are all analyzed fields).
The search should limit the documents to the agentNumbers that the user sampleuser#gmail.com has access to (sample data for user-agents-search is added below).
agentNumber, customerName, fileNumber, documentid and status are part of the index dev-sample-search.
The status field is defined as a keyword.
The fields in the user-agents-search index are all keywords.
Sample user-agents-search index data:
{
"id": "sampleuser#gmail.com",
"user": "sampleuser#gmail.com",
"agentNumber": [
"123.456.789",
"1011.12.13.14"
]
}
Sample dev-sample-search index data:
{
"agentNumber": "123.456.789",
"customerName": "Bank of america",
"fileNumber":"test_file_1123",
"documentid":"1234456789"
}
GET dev-sample-search/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"multi_match": {
"type": "best_fields",
"query": "27921093",
"operator": "and",
"fields": [
"agentNumber",
"customerName",
"fileNumber",
"documentid^10"
]
}
}
],
"filter": [
{
"bool": {
"must": [
{
"terms": {
"agentNumber": {
"index": "user-agents-search",
"type": "_doc",
"user": "sampleuser#gmail.com",
"path": "agentNumber"
}
}
},
{
"bool": {
"must_not": [
{
"terms": {
"status": {
"value": "pending"
}
}
},
{
"term": {
"status": {
"value": "cancelled"
}
}
},
{
"term": {
"status": {
"value": "app cancelled"
}
}
}
],
"should": [
{
"term": {
"status": {
"value": "active"
}
}
},
{
"term": {
"status": {
"value": "terminated"
}
}
}
]
}
}
]
}
}
]
}
}
}
I see a couple of things that you may want to look at:
In the terms lookup query, "user": "sampleuser#gmail.com" should be "id": "sampleuser#gmail.com".
If at least one should clause inside the filter clause must match, set "minimum_should_match": 1 on the bool query that contains the should clause.
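A hedged Python sketch of the filter part with both suggestions applied. The function names are made up for illustration, and folding the three excluded statuses into a list of term clauses is my own simplification of the query in the question, not something stated in the answer.

```python
import json


def terms_lookup(field, index, doc_id, path, doc_type="_doc"):
    """Terms lookup: read the allowed values from another document, addressed by id."""
    return {
        "terms": {
            field: {"index": index, "type": doc_type, "id": doc_id, "path": path}
        }
    }


def status_filter(allowed, excluded):
    """At least one allowed status must match and no excluded status may match."""
    return {
        "bool": {
            "should": [{"term": {"status": {"value": s}}} for s in allowed],
            "must_not": [{"term": {"status": {"value": s}}} for s in excluded],
            # Stated explicitly so the behaviour does not rely on defaults.
            "minimum_should_match": 1,
        }
    }


def build_access_filter(user_id):
    """Combine the per-user agentNumber lookup with the status restriction."""
    return {
        "bool": {
            "must": [
                terms_lookup("agentNumber", "user-agents-search", user_id, "agentNumber"),
                status_filter(
                    allowed=["active", "terminated"],
                    excluded=["pending", "cancelled", "app cancelled"],
                ),
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_access_filter("sampleuser#gmail.com"), indent=2))
```

The resulting object would go into the "filter" array of the outer bool query, next to the multi_match in "must".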

Elasticsearch bool query with filter terms and string search returning inconsistent results

I'm running the following query against Elasticsearch that matches documents based on a string search and property terms match. When I pass a single term, I get the expected results, but when I add a second term, I don't get the same results. Ideas?
{
"_source": {
"includes": [
"docID"
]
},
"query": {
"bool": {
"must": [
{
"terms": {
"userID": [
1,
2,
71
]
}
},
{
"query_string": {
"query": "**test**",
"fields": [
"attachment.content"
]
}
}
]
}
}
}
If I pass only userID 1, and omit the others, I get the docIDs I expect (i.e. 1,4,8), but when I pass all three userIDs I have several docIDs missing from the results (i.e. 1, 6, 8, but no 4). Using Elasticsearch 6.5.
Hopefully someone understands better than I why this is!
Thanks in advance!
By default, Elasticsearch returns only the first 10 hits, so the missing documents may simply be on the next page. You can raise the size parameter (30 here) to return more:
{
"size": 30,
"_source": {
"includes": [
"docID"
]
},
"query": {
"bool": {
"must": [
{
"terms": {
"userID": [
1,
2,
71
]
}
},
{
"query_string": {
"query": "**test**",
"fields": [
"attachment.content"
]
}
}
]
}
}
}
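If raising size is not enough, the same request can be paged with from and size. The following is a small sketch, assuming the total hit count is already known (for example from the first response). `page_bodies` is a hypothetical helper name.

```python
import copy


def page_bodies(body, total_hits, page_size=10):
    """Yield one request body per page, leaving the original body untouched."""
    for offset in range(0, total_hits, page_size):
        page = copy.deepcopy(body)
        page["from"] = offset      # index of the first hit on this page
        page["size"] = page_size   # maximum number of hits on this page
        yield page


if __name__ == "__main__":
    base = {
        "_source": {"includes": ["docID"]},
        "query": {"bool": {"must": [{"terms": {"userID": [1, 2, 71]}}]}},
    }
    for request in page_bodies(base, total_hits=25, page_size=10):
        print(request["from"], request["size"])
```

For 25 hits and a page size of 10 this yields three requests, starting at offsets 0, 10 and 20.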

elasticsearch: "More like this" combined with additional constraint

I just came across the "more like this" functionality/API. Is it possible to combine the result of more_like_this with an additional search constraint?
I have the following ES query, which works:
POST /h/B/_search
{
"query": {
"more_like_this": {
"fields": [
"desc"
],
"ids": [
"511111260"
],
"min_term_freq": 1,
"max_query_terms": 25
}
}
}
Which returns
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 53,
"max_score": 3.2860293,
"hits": [
...
That is fine, but I need to add a constraint on another field of the underlying documents. That constraint works fine on its own:
POST /h/B/_search
{
"query": {
"bool": {
"must": {
"match": {
"Kind": "Pen"
}
}
}
}
}
I would love to combine the two into one query that states: "Find items similar to items labelled with Pen". I tried the following with a nested query, but it gives me an error back:
POST /h/B/_search
{
"query": {
"more_like_this": {
"fields": [
"desc"
],
"ids": [
"511111260"
],
"min_term_freq": 1,
"max_query_terms": 25
},
"nested": {
"query": {
"bool": {
"must": {
"match": {
"Kind": "Pen"
}
}
}
}
}
}
}
I tried several variants for combining these two search criteria, but so far with no luck.
If someone more experienced could provide a hint, that would be really appreciated.
Thanks
bool queries are used for exactly this purpose. A bool must is essentially the Boolean AND operation. Similarly, you can use bool should for Boolean OR and bool must_not for Boolean NOT.
POST /h/B/_search
{
"query": {
"bool": {
"must": [
{
"more_like_this": {
"fields": [
"desc"
],
"ids": [
"511111260"
],
"min_term_freq": 1,
"max_query_terms": 25
}
},
{
"match": {
"Kind": "Pen"
}
}
]
}
}
}
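The AND/OR/NOT mapping can be sketched as three tiny Python helpers that build bool queries. The names `all_of`, `any_of` and `none_of` are hypothetical, and the explicit `minimum_should_match` in `any_of` is my own addition to make the OR behaviour unambiguous.

```python
import json


def all_of(*clauses):
    """Boolean AND: every clause must match."""
    return {"bool": {"must": list(clauses)}}


def any_of(*clauses):
    """Boolean OR: at least one clause must match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


def none_of(*clauses):
    """Boolean NOT: no clause may match."""
    return {"bool": {"must_not": list(clauses)}}


# The two working queries from the question, as plain dicts.
similar = {
    "more_like_this": {
        "fields": ["desc"],
        "ids": ["511111260"],
        "min_term_freq": 1,
        "max_query_terms": 25,
    }
}
is_pen = {"match": {"Kind": "Pen"}}

body = {"query": all_of(similar, is_pen)}

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

Here `all_of(similar, is_pen)` produces the same bool/must structure as the request above.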
