Return documents starting from ID, sorted by timestamp - elasticsearch

Say I have an index with the following documents:
{
"id": "8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e",
"timestamp": "2022-10-18T00:00:02"
}
{
"id": "0ebeb7b1-dcd0-4b37-a70d-fa7377f07f8c",
"timestamp": "2022-10-18T00:00:03"
}
{
"id": "ea779299-1781-4465-b8a1-53f7b14fbe0c",
"timestamp": "2022-10-18T00:00:01"
}
{
"id": "3624a119-4830-4ec2-a840-f656c048fc5c",
"timestamp": "2022-10-18T00:00:04"
}
I need a search query that returns documents from a specified id, sorted by timestamp up to a limit (say 100). So given the id of 8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e, the following documents will be returned (in this exact order, note that document with id ea779299-1781-4465-b8a1-53f7b14fbe0c is missing because its timestamp is earlier than the document I'm looking for):
{
"id": "8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e",
"timestamp": "2022-10-18T00:00:02"
}
{
"id": "0ebeb7b1-dcd0-4b37-a70d-fa7377f07f8c",
"timestamp": "2022-10-18T00:00:03"
}
{
"id": "3624a119-4830-4ec2-a840-f656c048fc5c",
"timestamp": "2022-10-18T00:00:04"
}
I know how to do this in two queries by first getting the document by its id, and then another query to get all the documents "after" that document's timestamp, but I'm hopeful there's a more efficient way to do this using one single query?
Note that the index is expected to have tens/hundreds of millions of documents, so performance concerns are a factor (I'm unsure what "work" ES is doing under the covers, such as sorting first and then visiting each document to check the id), although the cluster will be sized appropriately.

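You can use the bool query below. match_all inside must matches every document, and the term query inside the should clause boosts the document whose ID matches, so it sorts first by _score and the remaining documents follow in ascending timestamp order. Note that this query does not filter out documents with earlier timestamps; excluding them still requires a range condition on the starting document's timestamp.
If your id field is mapped as keyword, use id in the term query; if it is mapped as both text and keyword, use id.keyword.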
{
"size": 100,
"sort": [
{
"_score": "desc"
},
{
"timestamp": {
"order": "asc"
}
}
],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"should": [
{
"term": {
"id.keyword": {
"value": "8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e"
}
}
}
]
}
}
}
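For comparison, the two-step approach mentioned in the question can be sketched in Python. This is a minimal sketch and not the answerer's method: `build_range_query` and `simulate` are hypothetical helper names, the request body assumes `timestamp` is mapped as a date field, and the simulation runs on an in-memory list rather than against a cluster.

```python
from typing import Dict, List


def build_range_query(start_timestamp: str, limit: int = 100) -> Dict:
    """Build a search body for documents at or after start_timestamp.

    Assumption: `timestamp` is mapped as a date field.
    """
    return {
        "size": limit,
        "sort": [{"timestamp": {"order": "asc"}}],
        "query": {
            "bool": {
                "filter": [{"range": {"timestamp": {"gte": start_timestamp}}}]
            }
        },
    }


def simulate(docs: List[Dict], start_id: str, limit: int = 100) -> List[Dict]:
    """In-memory stand-in for the two steps: look up the start document,
    then keep documents that are not earlier, sorted ascending."""
    start = next(d for d in docs if d["id"] == start_id)
    # ISO-8601 strings in the same format compare correctly as plain strings.
    later = [d for d in docs if d["timestamp"] >= start["timestamp"]]
    return sorted(later, key=lambda d: d["timestamp"])[:limit]


docs = [
    {"id": "8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e", "timestamp": "2022-10-18T00:00:02"},
    {"id": "0ebeb7b1-dcd0-4b37-a70d-fa7377f07f8c", "timestamp": "2022-10-18T00:00:03"},
    {"id": "ea779299-1781-4465-b8a1-53f7b14fbe0c", "timestamp": "2022-10-18T00:00:01"},
    {"id": "3624a119-4830-4ec2-a840-f656c048fc5c", "timestamp": "2022-10-18T00:00:04"},
]
result = simulate(docs, "8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e")
body = build_range_query("2022-10-18T00:00:02")
```

If several documents share the starting timestamp, the starting document is not guaranteed to come first; a secondary sort key would be needed in that case.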

Related

Retrieve documents based on an element inside an array in each document

I am working on an Elasticsearch query which needs to return all the documents based on an attribute inside the first element of an array in the document. Please refer to the document structure below.
User document
{
"name": "Sam",
"age": 20,
"vehicle": [
{
"type": "car",
"capacity": 4,
"registration": {
"date": "20.02.2020",
"plate": null
}
}
]
}
Every document has exactly one element in the vehicle array.
Above is a sample document in my ES index. The requirement is to get all documents that have a null value for the plate attribute (as in the example). I have tried various queries for two days, but none of them succeeded and they all returned errors. What's the solution to this?
My suggestion is to check whether a field exists inside "vehicle" (for example "type"). The query below does this for "vehicle.registration.plate" and matches documents where it has no value. It assumes vehicle is mapped as a nested field.
{
"query": {
"nested": {
"path": "vehicle",
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "vehicle.registration.plate"
}
}
]
}
}
}
}
}
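The same request body can be built from Python, which makes the path and field easy to change. This is only a sketch: `missing_nested_field_query` and `plate_is_missing` are hypothetical helper names, and the query assumes `vehicle` is mapped as a nested field. The local check mirrors the fact that an exists query treats a null value as missing.

```python
from typing import Dict


def missing_nested_field_query(path: str, field: str) -> Dict:
    """Build a nested query matching documents that have an object
    under `path` with no value for `field`."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {"must_not": [{"exists": {"field": field}}]}
                },
            }
        }
    }


def plate_is_missing(doc: Dict) -> bool:
    """Local check on a plain dict: True when any vehicle has a null
    or absent registration plate."""
    return any(
        (v.get("registration") or {}).get("plate") is None
        for v in doc.get("vehicle", [])
    )


sam = {
    "name": "Sam",
    "age": 20,
    "vehicle": [
        {
            "type": "car",
            "capacity": 4,
            "registration": {"date": "20.02.2020", "plate": None},
        }
    ],
}
body = missing_nested_field_query("vehicle", "vehicle.registration.plate")
```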

Elasticsearch ordering by field value which is not in the filter

Can somebody please help me write a query that orders the result items by some field value when this field is not part of the query in the request? I have this query:
{
"_source": [
"ico",
"name",
"city",
"status"
],
"sort": {
"_score": "desc",
"status": "asc"
},
"size": 20,
"query": {
"bool": {
"should": [
{
"match": {
"normalized": {
"query": "idona",
"analyzer": "standard",
"boost": 3
}
}
},
{
"term": {
"normalized2": {
"value": "idona",
"boost": 2
}
}
},
{
"match": {
"normalized": "idona"
}
}
]
}
}
}
The result is sorted alphabetically ascending by the status field. Status contains a few values like [active, canceled, old....] and I need something like a boost for every possible value in the query, e.g. active boost 5, canceled boost 4, old boost 3, and so on. Is it possible to do it? Thanks.
You would need a custom sort using a script to achieve what you want.
I've used a generic match_all query here; you can add your own query logic there, but the solution you are looking for is in the sort section of the query below.
Make sure that status is a keyword type.
Custom Sorting Based on Values
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{ "_score": "desc" },
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"if(params.scores.containsKey(doc['status'].value)) { return params.scores[doc['status'].value];} return 100000;",
"params":{
"scores":{
"active":5,
"old":4,
"cancelled":3
}
}
},
"order":"desc"
}
}
]
}
In the above query, add your own values in the scores section. For example, if you have a status new and want to give it a weight of 6, your scores would be as below:
{
"scores":{
"active":5,
"old":4,
"cancelled":3,
"new":6
}
}
So the documents are first sorted by _score, and documents with equal scores are then ordered by the script sort.
Note that the script sort is desc, as I understand you want to show active documents at the top, followed by other values. Feel free to play around with it.
Hope this helps!
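To see what ordering the two sort keys produce, here is a small in-memory emulation in Python. It is a sketch only: `status_weight` and `sort_hits` are hypothetical names, the hits are made-up sample data, and the weights and the 100000 default are copied from the script above.

```python
from typing import Dict, List

# Weights and default copied from the script sort's params.
SCORES = {"active": 5, "old": 4, "cancelled": 3}
DEFAULT_WEIGHT = 100000


def status_weight(status: str) -> int:
    """Mirror of the painless script: known statuses get their weight,
    anything else gets the default."""
    return SCORES.get(status, DEFAULT_WEIGHT)


def sort_hits(hits: List[Dict]) -> List[Dict]:
    """Sort by _score descending, then by status weight descending."""
    return sorted(
        hits, key=lambda h: (-h["_score"], -status_weight(h["status"]))
    )


# Made-up hits for illustration only.
hits = [
    {"name": "a", "_score": 1.0, "status": "old"},
    {"name": "b", "_score": 1.0, "status": "active"},
    {"name": "c", "_score": 1.0, "status": "unknown"},
    {"name": "d", "_score": 2.0, "status": "cancelled"},
]
ordered = [h["name"] for h in sort_hits(hits)]
```

Here `ordered` is `["d", "c", "b", "a"]`: the higher _score wins first, and among equal scores the unlisted status sorts ahead of active because of the 100000 default. Using a low default instead would push unlisted statuses to the end.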

How to aggregate on the same field:value which are specified in query in elasticsearch

My data in Elasticsearch looks like this: one whole dict with one person id equals one doc, and it contains a list of objects like
{
"dummy_name": "abc",
"dummy_id": "44850642"
}
as shown below. I am querying on the field dummy_id and getting a number of matching results. I want to aggregate on the dummy_id field to get the number of docs for a specific dummy_id, but I am also getting buckets for dummy_id values that are not mentioned in the query itself, because each person contains a list of objects in which dummy_id is present.
{
"person_id": 1234,
"Properties": {
"Property1": [
{
"dummy_name": "abc",
"dummy_id": "44850642"
},
{
},
{
}
]
}
},
{
"person_id": 1235,
.........
}
Query I am using:
{
"query": {
"bool": {
"must": [
{
"match": {
"Properties.Property1.dummy_id": "453041 23234324 124324 "
}
}
]
}
},
"aggregations": {
"group_by_concept": {
"terms": {
"field": "Properties.Property1.dummy_id",
"order": {
"_count": "desc"
},
"size": 10
}
}
}
}
The problem comes from how the data is stored.
For example, in this document:
{
"person_id": 1234,
"Properties": {
"Property1": [
{
"dummy_name": "abc",
"dummy_id": "44850642"
},
{
"dummy_name": "dfg",
"dummy_id": "876468"
},
{
}
]
}
}
The tokens generated for this document would be:
dummy_id tokens - 44850642, 876468. This is how the data is stored in Lucene on the backend.
So when you query for dummy_id:44850642,
you get the document, but the aggregation runs over all terms in the documents matching the query.
As a result, you see buckets for 44850642 as well as 876468.
For more information on how Elasticsearch stores a list of objects, see https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
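The effect can be reproduced without a cluster. The Python sketch below is a simplified model, not Elasticsearch's actual implementation: `flatten_field` and `terms_buckets` are hypothetical names, and the model only imitates how an array of non-nested objects collapses into one list of values per field.

```python
from collections import Counter
from typing import Dict, Iterable, List


def flatten_field(doc: Dict, path: str) -> List:
    """Collect every value along a dotted path, descending into lists,
    roughly like a flattened (non-nested) array of objects."""
    values = [doc]
    for key in path.split("."):
        found = []
        for value in values:
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    found.append(item[key])
        values = found
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def terms_buckets(docs: Iterable[Dict], path: str, wanted: Iterable[str]) -> Counter:
    """Count documents per term over all documents matching `wanted`."""
    wanted_set = set(wanted)
    buckets = Counter()
    for doc in docs:
        terms = set(flatten_field(doc, path))
        if terms & wanted_set:      # the document matches the query
            buckets.update(terms)   # every term in it gets a bucket
    return buckets


doc = {
    "person_id": 1234,
    "Properties": {
        "Property1": [
            {"dummy_name": "abc", "dummy_id": "44850642"},
            {"dummy_name": "dfg", "dummy_id": "876468"},
        ]
    },
}
buckets = terms_buckets([doc], "Properties.Property1.dummy_id", ["44850642"])
```

Querying for 44850642 alone still yields a bucket for 876468, because the bucket step sees every term of the matching document. One option worth exploring is the terms aggregation's `include` parameter, which restricts buckets to a given list of values.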

Needs to return only the matched nested objects with full parent body in Elasticsearch

I am using Elasticsearch version 1.7 in my project. I have an index named colleges, and under this index there is a nested field named courses, like this.
{
"name": "College Name",
"university": "University Name",
"city": 429,
"city_name": "London",
"state": 328,
"state_name": "London",
"courses": [
{
"id": 26,
"degree_name": "Master Of Technology",
"annual_fee": 100000,
"stream": "Engineering",
"degree_id": 9419
},
{
"id": 28,
"degree_name": "Master Of Philosophy",
"annual_fee": 100000,
"stream": "Philosophy",
"degree_id": 9420
}
]
}
I am trying to filter the colleges based on state and on degree_id, which is nested under the courses provided by the college. I want to return the full body (all fields) of the parent object, i.e. colleges, and only those courses which match the query.
The query I wrote to accomplish the task is
{
"_source": false,
"query": {
"bool": {
"must": [
{
"term": {
"state": "328"
}
},
{
"nested": {
"path": "courses",
"query": {
"term": {
"courses.degree_id": 9419
}
}
}
}
]
}
}
}
This query works fine and returns only the nested objects that match the query, but the problem is that I declared "_source": false in the parent object.
If I declare "_source": true, it returns all the nested objects whether they meet the query or not. The other way that works is to list field names in "_source": ["field1", "field2", .... "field100"]. But I have about 50 fields in the parent (colleges) index, so I don't want to list all the field names in _source. Is there any other way to accomplish this without declaring all the field names in _source?

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if I have just 1, I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search with two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
Instead of a match query, you could simply use the terms query to OR between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
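When the list of ids varies per request, the body can be generated. The Python sketch below assumes the field names from the question (`fruit.id`, `fruit.color`); `ids_with_color_boost` and `select` are hypothetical helpers, and the apple entry is made-up sample data. `select` only imitates the must clause locally; the should clause affects scoring, not which documents match.

```python
from typing import Dict, Iterable, List


def ids_with_color_boost(ids: Iterable[str], color: str) -> Dict:
    """Build a bool query: the id must be one of `ids`; a matching
    color only raises the score."""
    return {
        "from": 0,
        "size": 20,
        "query": {
            "bool": {
                "must": [{"terms": {"fruit.id": list(ids)}}],
                "should": [{"match": {"fruit.color": color}}],
            }
        },
    }


def select(fruits: List[Dict], ids: Iterable[str]) -> List[str]:
    """Local imitation of the must clause: keep fruits whose id is listed."""
    allowed = set(ids)
    return [f["name"] for f in fruits if f["id"] in allowed]


fruits = [
    {"name": "banana", "id": "1", "color": "yellow"},
    {"name": "lemon", "id": "2", "color": "yellow"},
    {"name": "apple", "id": "3", "color": "red"},  # made-up extra entry
]
body = ids_with_color_boost(["1", "2"], "yellow")
both = select(fruits, ["1", "2"])
only_banana = select(fruits, ["1"])
```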
A term or terms query is the right way to fetch an exact text or id; a match query analyzes the search text and matches it against the analyzed tokens of the id or text.
Ex:
id = '4'
id = '44'
Depending on how the field is analyzed (for example with an n-gram analyzer), a match query for id = 4 can return both 4 and 44, since 4 matches a token in both. This is where the terms query comes into play.
The same search using a terms query on a keyword field will return 4 only.
So the accepted answer can return wrong results. Use @Rahul's answer. One more thing you need to do: instead of text, index the field as a keyword.
Example of indexing a field as both text and keyword (the mapping is for a flat field; for a nested field, change it accordingly).
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Using @Rahul's answer as-is might not work, because your field might be analysed as text.
id - accesses the text field
id.keyword - accesses the keyword field
So the query would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer can return false results. Please use @Rahul's answer with the corresponding mapping.
