I have the following search query:
{
"query": {
"match": {
"name": "testlib"
}
}
}
When I run this query I get the three results below. What I want now is to return only one result: the document with the newest #timestamp that doesn't contain version_pre. In this case, that means returning only AV6qvDXDyHw9vNh6Wlpl.
[
{
"_index": "testsoftware",
"_type": "software",
"_id": "AV6qvDXDyHw9vNh6Wlpl",
"_score": 0.2876821,
"_source": {
"#timestamp": "2017-09-21T11:02:15-04:00",
"name": "testlib",
"version_major": 1,
"version_minor": 0,
"version_patch": 1
}
},
{
"_index": "testsoftware",
"_type": "software",
"_id": "AV6qvDF5MtcMTuGknsVs",
"_score": 0.18232156,
"_source": {
"#timestamp": "2017-09-20T17:21:35-04:00",
"name": "testlib",
"version_major": 1,
"version_minor": 0,
"version_patch": 0
}
},
{
"_index": "testsoftware",
"_type": "software",
"_id": "AV6qvDnVyHw9vNh6Wlpn",
"_score": 0.18232156,
"_source": {
"#timestamp": "2017-09-22T13:56:55-04:00",
"name": "testlib",
"version_major": 1,
"version_minor": 0,
"version_patch": 2,
"version_pre": 0
}
}
]
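Use sort (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html) together with an exists query (https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-exists-query.html) inside must_not. Sort in descending order so that the newest document comes first, and keep the original match on name:
{
"size" : 1,
"sort" : [{ "#timestamp" : {"order" : "desc"}}],
"query" : {
"bool": {
"must": {
"match": {
"name": "testlib"
}
},
"must_not": {
"exists": {
"field": "version_pre"
}
}
}
}
}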
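Or even, via query string (the # in the field name must be URL-encoded as %23, otherwise it is read as the start of a URL fragment; on Elasticsearch 5.0 and later, where _missing_ is no longer supported in query strings, use NOT _exists_:version_pre instead):
/_search?sort=%23timestamp:desc&size=1&q=_missing_:version_pre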
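The selection logic can also be checked locally. The following is a minimal Python sketch, not Elasticsearch code: it applies the same three steps (drop documents that have version_pre, sort by #timestamp descending, keep one) to the three hits from the question. The helper name newest_release is hypothetical.

```python
from datetime import datetime

# The three hits from the question, reduced to the fields that matter here.
hits = [
    {"_id": "AV6qvDXDyHw9vNh6Wlpl",
     "_source": {"#timestamp": "2017-09-21T11:02:15-04:00", "version_patch": 1}},
    {"_id": "AV6qvDF5MtcMTuGknsVs",
     "_source": {"#timestamp": "2017-09-20T17:21:35-04:00", "version_patch": 0}},
    {"_id": "AV6qvDnVyHw9vNh6Wlpn",
     "_source": {"#timestamp": "2017-09-22T13:56:55-04:00", "version_patch": 2,
                 "version_pre": 0}},
]


def newest_release(docs):
    """Hypothetical helper mirroring must_not/exists + sort desc + size 1."""
    # must_not exists: keep only documents without a version_pre field
    candidates = [d for d in docs if "version_pre" not in d["_source"]]
    # sort desc on the timestamp, so the newest document comes first
    candidates.sort(
        key=lambda d: datetime.fromisoformat(d["_source"]["#timestamp"]),
        reverse=True,
    )
    # size 1: keep only the first document, if there is one
    return candidates[0]["_id"] if candidates else None


print(newest_release(hits))
```

With an ascending sort the same steps would pick the oldest document instead, which is why the sort order matters here.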
Within an Elasticsearch index I am trying to query by two distinct top-level field values, from the companyName and productName fields, ordered by a generatedDate field, and to include the domainModelId field in the result.
The following SQL query shows all existing values, and I've highlighted the two unique document rows (in this case) by generatedDate:
{
"query": "SELECT companyName, productName, generatedDate FROM nextware_domain_metaservices_domainmodel ORDER BY generatedDate DESC"
}
The response is as follows:
I tried the following:
{
"size":0,
"aggs":
{
"companies":
{
"terms":
{
"field": "companyName.keyword"
},
"aggs":
{
"products":
{
"terms":
{
"field": "productName.keyword"
}
}
}
}
}
}
This returns the correct buckets, as follows:
"aggregations": {
"companies": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NextWare",
"doc_count": 18,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "ProductPortal",
"doc_count": 16
},
{
"key": "Domain",
"doc_count": 2
}
]
}
}
]
}
}
How can I include the value of domainModelId.Id field without a second query?
To include the value of domainModelId.Id, you can add a top_hits sub-aggregation under the products aggregation.
Below is a working example with index data, search query, and search result.
Index Data:
{
"companyName":"NextWare",
"productName":"Domain",
"domainModelId.Id":"i"
}
{
"companyName":"NextWare",
"productName":"Domain",
"domainModelId.Id":"c"
}
{
"companyName":"NextWare",
"productName":"ProductPortal",
"domainModelId.Id":"a"
}
{
"companyName":"NextWare",
"productName":"ProductPortal",
"domainModelId.Id":"b"
}
{
"companyName":"NextWare",
"productName":"ProductPortal",
"domainModelId.Id":"d"
}
{
"companyName":"NextWare",
"productName":"ProductPortal",
"domainModelId.Id":"e"
}
{
"companyName":"NextWare",
"productName":"ProductPortal",
"domainModelId.Id":"f"
}
{
"companyName":"NextWare",
"productName":"ProductPortal",
"domainModelId.Id":"g"
}
{
"companyName":"NextWare",
"productName":"ProductPortal",
"domainModelId.Id":"h"
}
Search Query:
{
"size": 0,
"aggs": {
"companies": {
"terms": {
"field": "companyName.keyword"
},
"aggs": {
"products": {
"terms": {
"field": "productName.keyword"
},
"aggs": {
"top_ids": {
"top_hits": {
"_source": {
"includes": [
"domainModelId.Id"
]
},
"size": 10
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"companies": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NextWare",
"doc_count": 9,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "ProductPortal",
"doc_count": 7,
"top_ids": {
"hits": {
"total": {
"value": 7,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "67049816",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"domainModelId.Id": "a"
}
},
{
"_index": "67049816",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"domainModelId.Id": "b"
}
},
{
"_index": "67049816",
"_type": "_doc",
"_id": "4",
"_score": 1.0,
"_source": {
"domainModelId.Id": "d"
}
},
{
"_index": "67049816",
"_type": "_doc",
"_id": "5",
"_score": 1.0,
"_source": {
"domainModelId.Id": "e"
}
},
{
"_index": "67049816",
"_type": "_doc",
"_id": "6",
"_score": 1.0,
"_source": {
"domainModelId.Id": "f"
}
},
{
"_index": "67049816",
"_type": "_doc",
"_id": "7",
"_score": 1.0,
"_source": {
"domainModelId.Id": "g"
}
},
{
"_index": "67049816",
"_type": "_doc",
"_id": "8",
"_score": 1.0,
"_source": {
"domainModelId.Id": "h"
}
}
]
}
}
},
{
"key": "Domain",
"doc_count": 2,
"top_ids": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "67049816",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"domainModelId.Id": "c"
}
},
{
"_index": "67049816",
"_type": "_doc",
"_id": "9",
"_score": 1.0,
"_source": {
"domainModelId.Id": "i"
}
}
]
}
}
}
]
}
}
]
}
}
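On the client side, the ids then have to be read out of the nested buckets. The following is a minimal Python sketch of that traversal; the function name collect_ids and the trimmed sample response are hypothetical, and it assumes _source carries the literal key "domainModelId.Id" as in the example data above (if the field is stored as an object, the lookup would be nested instead).

```python
def collect_ids(aggregations):
    """Map (company, product) -> list of domainModelId.Id values."""
    result = {}
    for company in aggregations["companies"]["buckets"]:
        for product in company["products"]["buckets"]:
            # top_hits results sit under the name given in the query ("top_ids")
            hits = product["top_ids"]["hits"]["hits"]
            result[(company["key"], product["key"])] = [
                hit["_source"]["domainModelId.Id"] for hit in hits
            ]
    return result


# Trimmed-down sample with the same shape as the search result above.
sample = {
    "companies": {"buckets": [{
        "key": "NextWare",
        "products": {"buckets": [
            {"key": "ProductPortal", "top_ids": {"hits": {"hits": [
                {"_source": {"domainModelId.Id": "a"}},
                {"_source": {"domainModelId.Id": "b"}},
            ]}}},
            {"key": "Domain", "top_ids": {"hits": {"hits": [
                {"_source": {"domainModelId.Id": "c"}},
                {"_source": {"domainModelId.Id": "i"}},
            ]}}},
        ]},
    }]},
}

for key, ids in collect_ids(sample).items():
    print(key, ids)
```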
After searching for some time and reading several answers, I still can't figure out the query for my requirement.
I have a list of document ids, and I need to find which of those documents are older than a specified age.
The scenario I am trying:
There are 10 documents in total, with ids ranging from 1 to 10.
I am trying to fetch documents 1, 2 and 3 if they are more than 7 days old.
If only documents 1 and 2 are more than 7 days old, the query should return only documents 1 and 2 and ignore document 3. Other documents that are more than 7 days old but are not in the id list (1, 2, 3) should not be returned, because I am passing the ids in the query.
Documents in index
{
"took": 391,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "user",
"_type": "test",
"_id": "1",
"_score": 1.0,
"_source": {
"title": "string1",
"publishedDate": "2020-11-13T19:11:13.654Z"
}
},
{
"_index": "user",
"_type": "test",
"_id": "2",
"_score": 1.0,
"_source": {
"title": "string2",
"publishedDate": "2020-08-13T19:11:13.654Z"
}
},
{
"_index": "user",
"_type": "test",
"_id": "3",
"_score": 1.0,
"_source": {
"title": "string3",
"publishedDate": "2020-11-09T19:11:13.654Z"
}
},
{
"_index": "user",
"_type": "test",
"_id": "4",
"_score": 1.0,
"_source": {
"title": "string4",
"publishedDate": "2020-11-02T19:11:13.654Z"
}
}
]
}
}
Below is the query I am trying:
{
"query": {
"bool" : {
"must" : [
{"term" : {"_id" : {"value" : "1"}}},
{"term" : {"_id" : {"value" : "2"}}},
{"term" : {"_id" : {"value" : "3"}}}
],
"filter" : [
{"range" : {"publishedDate" : {"from" : "now-7d","to" : "now",
"include_lower" : true,"include_upper" : true,"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
Ideally it should return documents 1 and 2, as only those two documents match the query, but the query above doesn't return any results.
I think I am doing something wrong in the query.
Can someone please help me with this?
Thanks in advance.
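If you want to retrieve the documents that are at most 7 days old relative to the current date, the query should return only document 1, as document 2 is older than 7 days. Your query returns nothing because the three term clauses inside must all have to match the same document, and no document can have three different values of _id. A terms query matches any of the listed ids instead.
Below is a working example with search query and search result.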
Search Query:
{
"query": {
"bool": {
"must": [
{
"terms": {
"_id": [
"1",
"2",
"3"
]
}
}
],
"filter": [
{
"range": {
"publishedDate": {
"from": "now-7d",
"to": "now",
"include_lower": true,
"include_upper": true,
"boost": 1.0
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "64906019",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"title": "string1",
"publishedDate": "2020-11-13T19:11:13.654Z"
}
}
]
Update 1:
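Your query will also work if you replace the must clause with a should clause, provided you also set minimum_should_match to 1. Because the bool query contains a filter clause, the should clauses are optional by default, so without this setting every document in the date range would match regardless of its _id.
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [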
{
"term": {
"_id": "1"
}
},
{
"term": {
"_id": "2"
}
},
{
"term": {
"_id": "3"
}
}
],
"filter": [
{
"range": {
"publishedDate": {
"from": "now-7d",
"to": "now",
"include_lower": true,
"include_upper": true,
"boost": 1.0
}
}
}
]
}
}
}
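The difference between the two clause types can be illustrated without a cluster. The following is a minimal Python sketch of the matching logic only; the function names are hypothetical, and NOW is an assumed fixed date chosen so that the result is reproducible (a real query evaluates now at request time).

```python
from datetime import datetime, timedelta, timezone

# _id -> publishedDate, taken from the documents in the question.
docs = {
    "1": "2020-11-13T19:11:13.654Z",
    "2": "2020-08-13T19:11:13.654Z",
    "3": "2020-11-09T19:11:13.654Z",
    "4": "2020-11-02T19:11:13.654Z",
}

# Assumption: a fixed "now", only for this illustration.
NOW = datetime(2020, 11, 19, tzinfo=timezone.utc)


def parse(ts):
    # Replace the trailing Z so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def in_last_7_days(ts):
    # Mirrors the range filter from now-7d to now, both bounds inclusive.
    return NOW - timedelta(days=7) <= parse(ts) <= NOW


def must_each_id(ids):
    # must with one term clause per id: a document has to equal every id
    return [doc_id for doc_id, ts in docs.items()
            if all(doc_id == wanted for wanted in ids) and in_last_7_days(ts)]


def any_of_ids(ids):
    # terms (or should with minimum_should_match 1): any listed id may match
    return [doc_id for doc_id, ts in docs.items()
            if doc_id in ids and in_last_7_days(ts)]


print(must_each_id(["1", "2", "3"]))
print(any_of_ids(["1", "2", "3"]))
```

Under the assumed date, only document 1 falls inside the 7-day window, which agrees with the search result shown for the terms query.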
Here, I have an indexed document like:
doc = {
"id": 1,
"content": [
{
"txt": "I",
"time": 0
},
{
"txt": "have",
"time": 1
},
{
"txt": "a book",
"time": 2
},
{
"txt": "do not match this block",
"time": 3
}
]
}
I want to match "I have a book" and get back the matched time values: 0, 1, 2. Does anyone know how to build the index and the query for this situation?
I think "content.txt" should be flattened but "content.time" should be nested?
To match "I have a book" and return the matched time values 0, 1, 2, map content as a nested field and request inner_hits on the nested query.
Below is a working example with index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"content": {
"type": "nested"
}
}
}
}
Search Query:
{
"query": {
"nested": {
"path": "content",
"query": {
"bool": {
"must": [
{
"match": {
"content.txt": "I have a book"
}
}
]
}
},
"inner_hits": {}
}
}
}
Search Result:
"inner_hits": {
"content": {
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 2.5226097,
"hits": [
{
"_index": "64752029",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "content",
"offset": 2
},
"_score": 2.5226097,
"_source": {
"txt": "a book",
"time": 2
}
},
{
"_index": "64752029",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "content",
"offset": 0
},
"_score": 1.5580825,
"_source": {
"txt": "I",
"time": 0
}
},
{
"_index": "64752029",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "content",
"offset": 1
},
"_score": 1.5580825,
"_source": {
"txt": "have",
"time": 1
}
}
]
}
}
}
}
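Inner hits are returned in score order, so the time values arrive as 2, 0, 1 in the result above. The following is a minimal Python sketch of reading them back in array order using the _nested offset; the function name matched_times and the trimmed sample response are hypothetical.

```python
def matched_times(response):
    """Return the time of every matched nested object, in array order."""
    times = []
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"]["content"]["hits"]["hits"]
        # _nested.offset is the position of the object in the content array
        inner = sorted(inner, key=lambda h: h["_nested"]["offset"])
        times.extend(h["_source"]["time"] for h in inner)
    return times


# Trimmed-down sample with the same shape as the search result above.
sample = {"hits": {"hits": [{
    "_id": "1",
    "inner_hits": {"content": {"hits": {"hits": [
        {"_nested": {"field": "content", "offset": 2},
         "_source": {"txt": "a book", "time": 2}},
        {"_nested": {"field": "content", "offset": 0},
         "_source": {"txt": "I", "time": 0}},
        {"_nested": {"field": "content", "offset": 1},
         "_source": {"txt": "have", "time": 1}},
    ]}}},
}]}}

print(matched_times(sample))
```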
Given the following index
PUT /test_index
{
"mappings": {
"properties": {
"field1": {
"type": "text",
"analyzer": "whitespace",
"similarity": "boolean"
},
"field2": {
"type": "text",
"analyzer": "whitespace",
"similarity": "boolean"
}
}
}
}
and the following data
POST /test_index/_bulk?refresh=true
{ "index" : {} }
{ "field1": "foo", "field2": "bar"}
{ "index" : {} }
{ "field1": "foo1 foo2", "field2": "bar1 bar2"}
{ "index" : {} }
{ "field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3"}
for the given Boolean similarity query
POST /test_index/_search
{
"size": 10,
"min_score": 0.4,
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"fuzzy":{
"field1":{
"value":"foo",
"fuzziness":"AUTO",
"boost": 1
}
}
},
{
"fuzzy":{
"field2":{
"value":"bar",
"fuzziness":"AUTO",
"boost": 1
}
}
}
]
}
}
}
}
}
I always receive ["foo1 foo2 foo3", "bar1 bar2 bar3"] as the top hit, even though there is an exact match in the index (the first document):
{
"took": 114,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 3.9999998,
"hits": [
{
"_index": "test_index",
"_type": "_doc",
"_id": "bXw8eXUBCTtfNv84bNPr",
"_score": 3.9999998,
"_source": {
"field1": "foo1 foo2 foo3",
"field2": "bar1 bar2 bar3"
}
},
{
"_index": "test_index",
"_type": "_doc",
"_id": "bHw8eXUBCTtfNv84bNPr",
"_score": 2.6666665,
"_source": {
"field1": "foo1 foo2",
"field2": "bar1 bar2"
}
},
{
"_index": "test_index",
"_type": "_doc",
"_id": "a3w8eXUBCTtfNv84bNPr",
"_score": 2.0,
"_source": {
"field1": "foo",
"field2": "bar"
}
}
]
}
}
I'm aware that Boolean similarity works this way, rewarding as many matched terms as possible, and I know I could do rescoring here, but that is not an option since I don't know how many top N results to fetch.
Are there any other options here? Maybe I could create my own similarity plugin based on Boolean similarity that removes duplicates and keeps only the best-matched token, but I don't know where to start; I only see samples for script and rescore.
Update: Based on the clarification provided in the comments on my earlier answer, I am updating the answer.
The query below returns the expected results. The added term clause matches only the exact term, and its boost of 1.5 lifts exact matches further above the fuzzy matches, whose boost is lowered to 0.5.
{
"min_score": 0.4,
"size":10,
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"fuzzy": {
"field1": {
"value": "foo",
"fuzziness": "AUTO",
"boost": 0.5
}
}
},
{
"term": {
"field1": {
"value": "foo",
"boost": 1.5
}
}
}
]
}
}
}
}
}
And search results
"hits": [
{
"_index": "test_index",
"_type": "_doc",
"_id": "zdMEvHUBlo4-1mHbtvNH",
"_score": 2.0,
"_source": {
"field1": "foo",
"field2": "bar"
}
},
{
"_index": "test_index",
"_type": "_doc",
"_id": "z9MEvHUBlo4-1mHbtvNH",
"_score": 0.99999994,
"_source": {
"field1": "foo1 foo2 foo3",
"field2": "bar1 bar2 bar3"
}
},
{
"_index": "test_index",
"_type": "_doc",
"_id": "ztMEvHUBlo4-1mHbtvNH",
"_score": 0.6666666,
"_source": {
"field1": "foo1 foo2",
"field2": "bar1 bar2"
}
}
]
Another query without the explicit boost of the exact term also returns the expected results
{
"min_score": 0.4,
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"fuzzy": {
"field1": {
"value": "foo",
"fuzziness": "AUTO",
"boost": 0.5
}
}
},
{
"term": {
"field1": {
"value": "foo"
}
}
}
]
}
}
}
}
}
And search result
"hits": [
{
"_index": "test_index",
"_type": "_doc",
"_id": "zdMEvHUBlo4-1mHbtvNH",
"_score": 1.5,
"_source": {
"field1": "foo",
"field2": "bar"
}
},
{
"_index": "test_index",
"_type": "_doc",
"_id": "z9MEvHUBlo4-1mHbtvNH",
"_score": 0.99999994,
"_source": {
"field1": "foo1 foo2 foo3",
"field2": "bar1 bar2 bar3"
}
},
{
"_index": "test_index",
"_type": "_doc",
"_id": "ztMEvHUBlo4-1mHbtvNH",
"_score": 0.6666666,
"_source": {
"field1": "foo1 foo2",
"field2": "bar1 bar2"
}
}
]
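The scores in the question and in both answers can be reproduced with a little arithmetic. The following is a minimal Python sketch, not Lucene code. It assumes that with boolean similarity each matching term contributes its boost once, and that each fuzzy expansion is weighted by 1 - distance / min(len(query), len(term)); that weighting is inferred from the scores shown above rather than taken from the documentation, and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance, which is enough for these terms."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_score(query, text, boost=1.0, max_edits=1):
    # AUTO fuzziness allows one edit for terms of 3 to 5 characters.
    score = 0.0
    for term in set(text.split()):  # whitespace analyzer, each term counted once
        distance = edit_distance(query, term)
        if distance <= max_edits:
            # Assumed weighting of a fuzzy expansion (see lead-in).
            score += boost * (1 - distance / min(len(query), len(term)))
    return score


def term_score(query, text, boost=1.0):
    # Exact term match: contributes the boost, or nothing.
    return boost if query in text.split() else 0.0


def question_score(field1, field2):
    # Two fuzzy clauses with boost 1, as in the question.
    return fuzzy_score("foo", field1) + fuzzy_score("bar", field2)


def answer_score(field1, term_boost=1.5):
    # Fuzzy clause with boost 0.5 plus an exact term clause, as in the answer.
    return fuzzy_score("foo", field1, boost=0.5) + term_score("foo", field1, term_boost)


docs = [("foo", "bar"), ("foo1 foo2", "bar1 bar2"), ("foo1 foo2 foo3", "bar1 bar2 bar3")]
for field1, field2 in docs:
    print(field1, round(question_score(field1, field2), 4), round(answer_score(field1), 4))
```

Under these assumptions, the three near-miss terms in the longest document add up to more than the single exact match in the question's query, while the extra term clause gives the exact document a lead that the fuzzy expansions cannot catch up with.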
Is there a way to query starting from a particular value and get the next n records in Elasticsearch?
For example, I want to get 10 records starting from employee id "ABC_123".
The below query gives an error saying
[terms] query does not support [empId]
GET /_search
{
"from": 0, "size": 10,
"query" : {
"terms" : {
"empId" : "ABC_123"
}
}
}
What can I do about this?
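The error occurs because the terms query expects an array of values (for example ["ABC_123"]), and even then it only matches exact values. To find ids that start with a given value, you can use the prefix query. You can also read more about autocomplete on my blog, which discusses 4 approaches to make it work and their trade-offs.
I used the prefix query on your sample data and got the expected output; below is the step-by-step guide.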
Index mapping (empId is mapped as a keyword field)
{
"mappings": {
"properties": {
"empId": {
"type": "keyword"
}
}
}
}
Index sample docs
{
"empId" : "ABC_1231"
}
{
"empId" : "ABC_1232"
}
{
"empId" : "ABC_1233"
}
{
"empId" : "ABC_1234"
}
and so on
Prefix Search query
{
"from": 0,
"size": 10,
"query": {
"prefix": {
"empId": "ABC_123"
}
}
}
Search result
"hits": [
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"empId": "ABC_1231"
}
},
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"empId": "ABC_1232"
}
},
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"empId": "ABC_1233"
}
},
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "4",
"_score": 1.0,
"_source": {
"empId": "ABC_1234"
}
}
]
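The matching rule of the prefix query on a keyword field can be sketched in a few lines. The following is a minimal Python illustration, not Elasticsearch code; the function name prefix_search and the extra id "XYZ_0001" are hypothetical, and the slice stands in for from and size.

```python
def prefix_search(values, prefix, start=0, size=10):
    """Keep values that begin with the prefix, then apply from/size paging."""
    # A keyword field is not analyzed, so the comparison is case-sensitive.
    matches = [value for value in values if value.startswith(prefix)]
    return matches[start:start + size]


# The sample ids from above plus one hypothetical non-matching id.
emp_ids = ["ABC_1231", "ABC_1232", "ABC_1233", "ABC_1234", "XYZ_0001"]

print(prefix_search(emp_ids, "ABC_123"))
print(prefix_search(emp_ids, "ABC_123", start=2, size=1))
```

Note that all prefix matches receive the same score, so if the order of the returned records matters, the search request needs an explicit sort.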