Elasticsearch: sort search results without a term

I want to retrieve all records inside a particular type and sort it using a date field. I am using this code:
{
  "query": {
    "filtered": {
      "filter": {
        "type": {
          "value": "articles"
        }
      }
    }
  },
  "from": 0,
  "size": 10,
  "sort": [
    {
      "date_entered": {
        "order": "desc"
      }
    }
  ]
}
But the output from this query does not appear to be sorted by the specified field. The field date_entered is formatted like this: January 01, 1970 12:00 AM
How can I return all the records within a page, sorted by the date_entered field?
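Most likely date_entered is mapped as a string, so the sort is lexicographic ("April" sorts before "January", and so on) rather than chronological. A sketch of a mapping that stores it as a real date, assuming the type name from the question, a hypothetical index name my_index, and a Joda-style format pattern matching the sample value:

```json
PUT /my_index
{
  "mappings": {
    "articles": {
      "properties": {
        "date_entered": {
          "type": "date",
          "format": "MMMM dd, yyyy hh:mm a"
        }
      }
    }
  }
}
```

After reindexing with a mapping like this, the sort clause from the question should order documents chronologically.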

Related

Elasticsearch: Query to filter out specific documents based on field value and return count

I'm trying to compose a query in Elasticsearch that filters out documents with a specific field value and also returns, as an aggregation, the number of documents that have been filtered out.
What I have so far is below. However, with my solution the documents seem to be filtered out first, and the count is performed only after the filtering, so it is always 0.
{
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "gender": [
              "male"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "removed_docs_count": {
      "filter": {
        "term": {
          "gender": "male"
        }
      }
    }
  }
}
You don't need the query block; the aggs alone will give you the expected count:
{
  "aggs": {
    "removed_docs_count": {
      "filter": {
        "term": {
          "gender": "male"
        }
      }
    }
  }
}
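If you still need the filtered hits and the count of removed documents in the same request, a global aggregation (which ignores the main query) can hold the count instead. A sketch:

```json
{
  "query": {
    "bool": {
      "must_not": [
        { "terms": { "gender": [ "male" ] } }
      ]
    }
  },
  "aggs": {
    "all_docs": {
      "global": {},
      "aggs": {
        "removed_docs_count": {
          "filter": {
            "term": { "gender": "male" }
          }
        }
      }
    }
  }
}
```

Here removed_docs_count counts male documents across the whole index, while the hits themselves stay filtered by the query.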

Elasticsearch query: filter aggregation by number range

I have a query like
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "created_date": {
              "gte": 1801301,
              "lte": 1807061
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "rating": {
      "filters": {
        "filters": {
          "neutral": {
            "match": {
              "rating": 0
            }
          },
          "positive": {
            "match": {
              "rating": 1
            }
          },
          "negative": {
            "match": {
              "rating": 2
            }
          }
        }
      }
    }
  },
  "size": 0
}
The query filters documents by created_date. The date range covers two periods, the current one and the previous one, like data for this month and the previous month. This is needed for other calculations (the original query is much bigger).
The query works, but it calculates the rating over both periods. I need to calculate the rating over a shorter date range: created_date 1804181-1807061.
Is there a way I can do this?
You can use date math in the range:
{
  "range": {
    "created_date": {
      "gte": "now-10d/d",
      "lte": "now/d"
    }
  }
}
I'm thinking this will help you. Let me know if you have any questions.
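Another option, keeping the numeric dates from the question: wrap the rating aggregation in a filter aggregation restricted to the narrower range, so the query still selects both periods while the rating is computed only over the shorter one. A sketch:

```json
{
  "query": {
    "bool": {
      "must": [
        { "range": { "created_date": { "gte": 1801301, "lte": 1807061 } } }
      ]
    }
  },
  "aggs": {
    "recent": {
      "filter": {
        "range": { "created_date": { "gte": 1804181, "lte": 1807061 } }
      },
      "aggs": {
        "rating": {
          "filters": {
            "filters": {
              "neutral": { "match": { "rating": 0 } },
              "positive": { "match": { "rating": 1 } },
              "negative": { "match": { "rating": 2 } }
            }
          }
        }
      }
    }
  },
  "size": 0
}
```

The rating buckets then appear nested under the recent aggregation in the response.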

Elasticsearch avoiding maxClauseCount error in aggregation

I have an index that contains the following documents in Elasticsearch 5.X.
It holds the string of a line of a document file as a single record.
{"file_id":"file0001", "row_id":"1", "text":"(text field to search...)"}
{"file_id":"file0001", "row_id":"2", "text":"(text field to search...)"}
{"file_id":"file0001", "row_id":"3", "text":"(text field to search...)"}
{"file_id":"file0002", "row_id":"1", "text":"(text field to search...)"}
{"file_id":"file0002", "row_id":"2", "text":"(text field to search...)"}
...Millions of documents...
I send the following query to extract the top 500 matching rows for each file:
{
  "_source": [
    "file_id",
    "text"
  ],
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": {
            "regexp": {
              "text": ".*[o2].*"
            }
          }
        }
      },
      "filter": {
        "terms": {
          "file_id": [
            (Thousands of file_ids...)
          ]
        }
      }
    }
  },
  "aggs": {
    "top-docs": {
      "terms": {
        "field": "file_id",
        "size": 5000
      },
      "aggs": {
        "top_file_hits": {
          "top_hits": {
            "size": 500,
            "highlight": {
              "pre_tags": ["<em>"],
              "post_tags": ["</em>"],
              "fields": {
                "text": {}
              }
            }
          }
        }
      }
    }
  }
}
Then the following error is returned.
{
"error" : {
"root_cause" : [
{
"type" : "too_many_clauses",
"reason" : "maxClauseCount is set to 1024"
I consider the aggs process heavy, but I can't think of a way not to use it.
Any ideas?
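The too_many_clauses error usually comes not from the aggregation but from Lucene's 1024-clause limit: the thousands of file_ids in the terms filter get rewritten into individual clauses that exceed it. One hedged workaround, assuming you accept the extra memory and CPU cost of very large boolean queries, is to raise the static limit:

```yaml
# elasticsearch.yml (requires a node restart); the value 10240 is an
# illustrative choice, not a recommendation
indices.query.bool.max_clause_count: 10240
```

Alternatives that avoid raising the limit include splitting the file_id list across several requests, or storing the ids in a separate document and using a terms-lookup filter.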

How to do a max date aggregation over the same document in Elasticsearch?

I have millions of documents with a block like this one:
{
  "useraccountid": 123456,
  "purchases_history": {
    "last_updated": "Sat Apr 27 13:41:46 UTC 2019",
    "purchases": [
      {
        "purchase_id": 19854284,
        "purchase_date": "Jan 11, 2017 7:53:35 PM"
      },
      {
        "purchase_id": 19854285,
        "purchase_date": "Jan 12, 2017 7:53:35 PM"
      },
      {
        "purchase_id": 19854286,
        "purchase_date": "Jan 13, 2017 7:53:35 PM"
      }
    ]
  }
}
I am trying to figure out how I can do something like:
SELECT useraccountid, max(purchases_history.purchases.purchase_date) FROM my_index GROUP BY useraccountid
I only found the max aggregation, but it aggregates over all the documents in the index, which is not what I need. I need to find the max purchase date for each document. I believe there must be a way to iterate over the purchases_history.purchases.purchase_date path of each document to identify the max purchase date, but I really cannot find how to do it (if that is even the best way, of course).
Any suggestion?
I assume that your field useraccountid is unique. You will have to do a terms aggregation with a max aggregation inside it. I can think of this:
"aggs": {
  "unique_user_ids": {
    "terms": {
      "field": "useraccountid",
      "size": 10000 #Default value is 10
    },
    "aggs": {
      "max_date": {
        "max": {
          "field": "purchases_history.purchases.purchase_date"
        }
      }
    }
  }
}
In the aggregation results you'll see each unique user ID and, inside it, their max date.
Note the 10,000 in the size. The terms aggregation is only recommended for returning up to 10,000 results.
If you need more, you can play with the composite aggregation. With that, you can paginate your results and your cluster won't run into performance issues.
I can think of the following if you want to play with composite (note that a metric like max cannot be a composite source, so it goes in a sub-aggregation):
GET /_search
{
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 10000, #Default set to 10
        "sources": [
          { "user_id": { "terms": { "field": "useraccountid" } } }
        ]
      },
      "aggs": {
        "max_date": {
          "max": { "field": "purchases_history.purchases.purchase_date" }
        }
      }
    }
  }
}
After running the query, it will return a field called after_key. With that field you can paginate your results in pages of 10,000 elements. Take a look at the After parameter for the composite aggregation.
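For the next page, the after_key from the previous response is passed back in the after parameter. A sketch, where "last_user" is a placeholder for whatever after_key returned:

```json
GET /_search
{
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 10000,
        "after": { "user_id": "last_user" },
        "sources": [
          { "user_id": { "terms": { "field": "useraccountid" } } }
        ]
      },
      "aggs": {
        "max_date": {
          "max": { "field": "purchases_history.purchases.purchase_date" }
        }
      }
    }
  }
}
```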
Hope this is helpful! :D

Elasticsearch get the last ten added events

I have an index with multiple types, one of these being event and I would like to get the last 10 events sorted by their start date
{
  "from": 0,
  "size": 10,
  "query": {
    "range": {
      "start": {
        "from": "2014-02-25 00:00:01 UTC",
        "to": "2014-03-04 23:59:00 UTC"
      }
    }
  },
  "filter": {
    "and": [
      {
        "type": {
          "value": "event"
        }
      }
    ]
  },
  "sort": [
    {
      "start": {
        "order": "asc"
      }
    }
  ]
}
I have tried variations of the above query but cannot seem to get it working; Elasticsearch does not apply the type filter.
The filter syntax above is correct (the and is not needed).
If you are just interested in events, you might as well query their endpoint directly (like localhost:9200/idx/event/_search).
In fact, if you want to use the type in your query, you have to use the _type field name, with the underscore. Here is an example:
POST /items/_search
{
  "query": {
    "match": {
      "_type": "item"
    }
  }
}
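Putting the pieces together for the original question, one option (assuming a version where the filtered query is still available) is to move the type filter into the query itself, and sort descending so the most recent events come first. A sketch:

```json
{
  "from": 0,
  "size": 10,
  "query": {
    "filtered": {
      "query": {
        "range": {
          "start": {
            "from": "2014-02-25 00:00:01 UTC",
            "to": "2014-03-04 23:59:00 UTC"
          }
        }
      },
      "filter": {
        "type": {
          "value": "event"
        }
      }
    }
  },
  "sort": [
    { "start": { "order": "desc" } }
  ]
}
```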