I have Elasticsearch v6.5 with an index of 300 million documents.
The documents have a field of type keyword, for example {"url": "http:/linkedin.com/435"}. I run the following query to find duplicate values:
{
"size":0,
"aggs":{
"duplicateCount":{
"terms":{
"field":"url",
"min_doc_count":2
}
}
}
}
I got 0 results. I then posted the same test_url value two more times, ran the query again, and it still returned an empty set. What is the reason, and is there any way to overcome the issue?
{
"size":0,
"aggs":{
"duplicateCount":{
"terms":{
"field":"url.keyword",
"min_doc_count":2
}
}
}
}
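You need to add .keyword at the end of the field name, as in the query above. This applies when url is mapped as text with a keyword sub-field, which is what dynamic mapping produces for string values.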
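For reference, here is a sketch of the mapping under which url.keyword exists. It mirrors what Elasticsearch 6.x dynamic mapping generates for a string field; the mapping type name _doc is an assumption, so substitute your own. If url is explicitly mapped as keyword instead, there is no url.keyword sub-field and the original field name is the one to aggregate on.

```json
{
  "mappings": {
    "_doc": {
      "properties": {
        "url": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
```

Note that with ignore_above set to 256, values longer than 256 characters are not indexed into url.keyword, so very long URLs would not show up in the terms aggregation.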
I'm trying to compose a query in Elasticsearch that filters out documents with a specific field value and also returns, as an aggregation, the number of documents that have been filtered out.
What I have so far is below. However, with my solution the documents seem to be filtered out first and the count is performed afterwards, so the count is always 0.
{
"query":{
"bool":{
"must_not":[
{
"terms":{
"gender":[
"male"
]
}
}
]
}
},
"aggs":{
"removed_docs_count":{
"filter":{
"term":{
"gender":"male"
}
}
}
}
}
You don't need the query block; the aggs alone will give you the expected count.
{
"aggs":{
"removed_docs_count":{
"filter":{
"term":{
"gender":"male"
}
}
}
}
}
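If the hits still need to exclude the filtered documents, one alternative (not part of the answer above) is to move the exclusion into post_filter. A post_filter is applied to the hits after the aggregations have been computed, so the filter aggregation still sees the male documents. A minimal sketch, reusing the field and aggregation names from the question:

```json
{
  "post_filter": {
    "bool": {
      "must_not": [
        { "term": { "gender": "male" } }
      ]
    }
  },
  "aggs": {
    "removed_docs_count": {
      "filter": {
        "term": { "gender": "male" }
      }
    }
  }
}
```

Here the hits contain only non-male documents, while removed_docs_count reports how many documents the post_filter removed.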
I use this query to get search hits and the count of hits across multiple indices:
/index1,index2/_search
{
"query":{
"query_string":{
"query":"*"
}
},
"aggs":{
"group_by_index":{
"terms":{
"field":"_index",
"min_doc_count":0
}
}
},
"post_filter":{
"terms":{
"_index":["index1"]
}
},
"sort":{
"my_field":"asc"
}
}
The problem is that if I sort on a field (my_field) that only exists in index1, the aggregation gives me the hit count for index1 only, and not for index2.
I thought the aggregation would work regardless of the sort I specify?
Using Elasticsearch 6.4.
Solved it by setting unmapped_type on the sort field.
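A sketch of what the sort clause looks like with unmapped_type. The value "long" is an assumption, since the question does not say what type my_field has; use the type the field is mapped with in index1. With it set, an index that has no mapping for my_field treats the field as having that type with no values, instead of the shard request failing.

```json
{
  "query": {
    "query_string": { "query": "*" }
  },
  "aggs": {
    "group_by_index": {
      "terms": { "field": "_index", "min_doc_count": 0 }
    }
  },
  "sort": [
    {
      "my_field": {
        "order": "asc",
        "unmapped_type": "long"
      }
    }
  ]
}
```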
I have data in ES which looks like this:
'{"Emp_ID":"12212","Emp_Name":"Jim","Emp_Sal":300,"Dep_Id":22,"Dep_Name":"IT","Dep_Cnt":40}'
'{"Emp_ID":"6874590","Emp_Name":"Joe","Emp_Sal":140,"Dep_Id":66,"Dep_Name":"Admin","Dep_Cnt":20}'
'{"Emp_ID":"32135","Emp_Name":"Jill","Emp_Sal":170,"Dep_Id":66,"Dep_Name":"Admin","Dep_Cnt":20}'
'{"Emp_ID":"43312","Emp_Name":"Andy","Emp_Sal":450,"Dep_Id":22,"Dep_Name":"IT","Dep_Cnt":40}'
'{"Emp_ID":"315609","Emp_Name":"Cody","Emp_Sal":150,"Dep_Id":22,"Dep_Name":"IT","Dep_Cnt":40}'
'{"Emp_ID":"87346","Emp_Name":"Dave","Emp_Sal":500,"Dep_Id":55,"Dep_Name":"hr","Dep_Cnt":10}'
I want to get all the unique departments ordered by Dep_Cnt, for which I wrote the following query:
{
"size":0,
"aggs":{
"by_Dep_Cnt":{
"terms":{
"field":"Dep_Cnt",
"order":{
"_term":"asc"
}
},
"aggs":{
"by_unique_dep_id":{
"terms":{
"field":"Dep_Id"
},
"aggs":{
"tops":{
"top_hits":{
"size":1
}
}
}
}
}
}
}
}
And got expected output of 3 unique departments ordered by Dep_Cnt.
But now my requirement is to get only the top two departments.
How do I modify the query to get only 2 buckets?
What you are looking for is the size parameter of the terms aggregation.
If Dep_Cnt is the number of employees in the department, your documents are one per employee, and every employee is in your index (from your mapping this may be the case), you can just do:
{
"size":0,
"aggs":{
"by_Dep_Id":{
"terms":{
"field":"Dep_Id",
"size": 2
}
}
}
}
This works because, by default, the terms aggregation sorts buckets by document count in descending order, i.e. by the number of documents with a given Dep_Id, which here is the number of employees in that department.
If you are not in this situation:
Your current request does not behave the same way when two departments have the same size (you will get two Dep_Ids in the same Dep_Cnt bucket).
You can group documents by Dep_Id, get the Dep_Cnt using the metric you want (min, max, avg, ...) and sort on this metric:
{
"size":0,
"aggs":{
"by_Dep_Id":{
"terms":{
"field":"Dep_Id",
"size": 2,
"order":{
"avg_Dep_Cnt":"asc"
}
},
"aggs":{
"avg_Dep_Cnt":{
"avg":{
"field":"Dep_Cnt"
}
}
}
}
}
}
NB: I removed the top_hits aggregation since, according to what you explained, you do not need it. If you have extra requirements, just add it back as a sub-aggregation.
I am trying to get an extra field with an aggregation. Below is the query:
GET /iacmpi/_search?_source=false
{
"query": {
"match": {
"Document_Type": "INVOICEDoc"
}
},
"aggs": {
"GroupByCDMInvoiceID": {
"terms":{ "field" : "INVOICE_ID" },
"aggs":{
"LatestVersion":{
"max":{
"field":"DocVersion"
}
}
}
}
}
}
So at the level of the INVOICE_ID aggregation, I need to fetch one more field, 'NAME'. I don't want it in the query part, as that would return all hits and I would have to traverse them to find a match.
Is it possible?
Thanks,
Sameer
I think the top_hits aggregation is what you're looking for.
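A minimal sketch of how that could look for the query in the question. The sub-aggregation name LatestDoc is made up for illustration, and sorting by DocVersion descending is an assumption that the NAME of the latest version is the one wanted. The top-level "size": 0 suppresses the regular hits so only the buckets come back.

```json
{
  "size": 0,
  "query": {
    "match": { "Document_Type": "INVOICEDoc" }
  },
  "aggs": {
    "GroupByCDMInvoiceID": {
      "terms": { "field": "INVOICE_ID" },
      "aggs": {
        "LatestVersion": {
          "max": { "field": "DocVersion" }
        },
        "LatestDoc": {
          "top_hits": {
            "size": 1,
            "sort": [
              { "DocVersion": { "order": "desc" } }
            ],
            "_source": { "includes": ["NAME"] }
          }
        }
      }
    }
  }
}
```

Each INVOICE_ID bucket then carries a single hit under LatestDoc whose _source contains only NAME.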
When I execute the query below, how can I page through the aggs results?
And is there a way to put the aggs results into the hits part of the JSON result?
POST http://myElastic.com/test/e1,e2,e3/_search
{
"aggs":{
"dedup" : {
"terms":{
"field": "id"
},
"aggs":{
"dedup_docs":{
"top_hits":{
"size":1
}
}
}
}
}
}
It took me a while to find this, and I came across several differing answers along the way, so I am posting a new answer for people who follow the same path.
We can partition the results as below:
{
"aggs":{
"group" : {
"terms":{
"field": "id",
"size":5000,
"include": {
"partition": 1,
"num_partitions": 1000
}
},
"aggs":{
"dedup_docs":{
"top_hits":{
"size":1
}
}
}
}
}
}
// size:5000 : return up to 5,000 buckets per request
// num_partitions:1000 : split the terms into 1,000 partitions (pages)
// partition:1 : return the partition with index 1 (indices start at 0)
// To page through all results, repeat the request with partition = 0, 1, 2, ..., 999
// Terms are assigned to partitions by hash, so a partition is not a contiguous range of the sorted results
// size must be at least as large as the biggest partition, otherwise some buckets are dropped
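To choose num_partitions, it helps to know roughly how many distinct values the field has. A sketch using the cardinality aggregation on the same id field; the aggregation name unique_ids is made up for illustration, and the count it returns is approximate.

```json
{
  "size": 0,
  "aggs": {
    "unique_ids": {
      "cardinality": { "field": "id" }
    }
  }
}
```

Dividing the reported cardinality by the number of buckets you want per request gives a starting value for num_partitions; keep size somewhat above that per-request target, because the hash does not spread terms perfectly evenly.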
Based on the issue below on the Elasticsearch GitHub site, I don't think what you are asking for is possible:
https://github.com/elastic/elasticsearch/issues/4915
It seems like a common request, however. Add your own feedback and they may get around to adding it.
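For readers on newer releases: as far as I know, Elasticsearch 6.1 and later include a composite aggregation that supports paging through buckets, which may cover this use case. This is a different technique from the terms aggregation in the question. A sketch under that assumption, reusing the dedup and dedup_docs names; the page size of 1000 is arbitrary:

```json
{
  "size": 0,
  "aggs": {
    "dedup": {
      "composite": {
        "size": 1000,
        "sources": [
          { "id": { "terms": { "field": "id" } } }
        ]
      },
      "aggs": {
        "dedup_docs": {
          "top_hits": { "size": 1 }
        }
      }
    }
  }
}
```

To fetch the next page, repeat the request with an "after" object inside "composite" set to the key of the last bucket returned, and stop when a request returns no buckets.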