Nested count aggregations in Elasticsearch

I have a type in Elasticsearch where each user can post any number of posts (the fields are "userid" and "post"). Now I need the number of users who posted 0 posts, 1 post, 2 posts, and so on. How do I do it? I think it needs some nested aggregations, but I don't know how to proceed. Thanks in advance!

The best way of doing this is to add a separate field that stores the number of posts.
Scripts are not very efficient (the values are re-evaluated each time a query executes), whereas a stored count is indexed properly, which makes queries and aggregations very fast.
Of course, you need to make sure you update this count each time you update the document.

You can use a script in the aggregation:
POST index_name/type_name/_search
{
  "aggs": {
    "group By Post Count": {
      "terms": {
        "script": "doc['post'].size()"
      }
    }
  }
}
Make sure you enable scripting.
Hope this helps you.
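If each post is stored as its own document (an assumption; the question does not say so), another option is a terms aggregation on "userid", with the per-user bucket counts folded into a histogram on the client. A minimal Python sketch of that folding step follows. The function name and the sample buckets are hypothetical; the buckets are only shaped like the "buckets" array of a terms aggregation response.

```python
from collections import Counter


def users_per_post_count(buckets):
    """Map 'number of posts' to 'number of users with that many posts'."""
    return dict(Counter(bucket["doc_count"] for bucket in buckets))


# Made-up sample data: one bucket per userid, doc_count = that user's posts.
sample_buckets = [
    {"key": "u1", "doc_count": 2},
    {"key": "u2", "doc_count": 1},
    {"key": "u3", "doc_count": 2},
]

histogram = users_per_post_count(sample_buckets)
print(histogram)
```

Users with zero posts have no documents under this assumption, so they never show up in the buckets; counting them would need the total number of users from somewhere else.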

Related

Elasticsearch manipulate existing field value to add new field

I am trying to add a new field whose value is a hash of an existing field's value. So I want to do:
my_index.hashedusername(new field) = crc32(my_index.username) (existing field)
For example:
POST _update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": "ctx._source.hashedusername = crc32(ctx._source.username);"
  }
}
Please give me an idea of how to do this.
java.util.zip.CRC32 is not available in the shared Painless API, so mocking that package will be non-trivial -- perhaps even unreasonable.
I'd suggest computing the CRC32 hashes beforehand and only then sending the docs to ES. Alternatively, scroll through all your documents, compute the hash, and bulk-update your documents.
The Painless API was designed to perform comparatively simple tasks, and CRC32 is certainly outside of its purpose.
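As a rough illustration of the "compute the hash, then bulk-update" route, here is a Python sketch that computes the CRC32 on the client and builds the newline-delimited body for a _bulk request. The helper names and the sample ids are hypothetical, and actually sending the body (for example with an HTTP client) is left out.

```python
import json
import zlib


def crc32_of(text):
    """Unsigned CRC32 of the UTF-8 encoded text."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def bulk_update_body(index, docs):
    """Build an NDJSON _bulk body of partial updates.

    docs is an iterable of (doc_id, username) pairs.
    """
    lines = []
    for doc_id, username in docs:
        # Action line, followed by the partial document to merge in.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": {"hashedusername": crc32_of(username)}}))
    return "\n".join(lines) + "\n"


# Hypothetical ids and usernames, e.g. collected while scrolling the index.
body = bulk_update_body("my_index", [("1", "kimchy"), ("2", "elasticsearch")])
print(body)
```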

Using term or terms with one value in Elasticsearch queries

I am querying an Elasticsearch index using the values of a field. Sometimes I have to extract all the documents that have a field set to exactly one value; other times I have to retrieve all the documents that have a field set to one of the values in a list.
The latter use case contains the former. Can I use a single query using the terms construct?
POST /_search
{
  "query": {
    "terms": { "user": ["kimchy", "elasticsearch"] }
  }
}
Or, in cases where I know I need to search for only a single value, is it better to use the term construct?
POST _search
{
  "query": {
    "term": { "user": "kimchy" }
  }
}
Which approach is better regarding performance? Does Elasticsearch perform any optimization if the value in the terms construct is unique?
Thanks to all.
See this link. A terms query is automatically cached, while a term query is not, so the next time you run the same query its took time will be lower. If you need to run the same query again and again, the terms query is a good choice. If not, there is not much difference between the two.
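If you would rather not decide by hand each time, a small client-side helper can pick the construct from the number of values. This is only a sketch: the function name is made up, and whether the distinction is worth keeping depends on the caching behaviour described above.

```python
def user_query(field, values):
    """Build a term query for one value, a terms query for several."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    if len(values) == 1:
        return {"query": {"term": {field: values[0]}}}
    return {"query": {"terms": {field: values}}}


single = user_query("user", ["kimchy"])
several = user_query("user", ["kimchy", "elasticsearch"])
print(single)
print(several)
```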

Nested count queries

I'm looking to add a feature to an existing query. Basically, I run a query that returns, say, 1000 documents. Those documents all have the same structure; only the values of certain fields vary. What I'd like is to not only get the full list as a result, but also count how many results have a field X with the value Y, how many results have the same field X with the value Z, and so on.
Basically, get all the results plus 4 or 5 "counts" that would act like the SQL "group by", in a way.
The point of this is to allow full-text search over all the clients in our database (without filtering), while showing how many of those are active clients, past clients, active prospects, etc.
Is there any way to do this without running additional or separate queries?
EDIT WITH ANSWER:
Aggregations are the way to go. Here's how I did it; it's so straightforward that I had expected much harder work!
{
  "query": {
    "term": {
      "_type": "client"
    }
  },
  "aggregations": {
    "agg1": {
      "terms": {
        "field": "listType.typeRef.keyword"
      }
    }
  }
}
Note that the aggregated field is even a list of terms rather than a single value; that's just how easy it was!
I believe what you are looking for is the aggregation query.
The documentation should be clear enough, but if you struggle please give us your ES query and we will help you from there.
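To make the "results plus counts in one request" idea concrete, here is a Python sketch that builds such a request body and pulls the per-value counts out of a response. The function names, the match query, and the sample response are hypothetical; only the "aggregations" / "buckets" / "key" / "doc_count" layout mirrors what a terms aggregation returns.

```python
def search_with_counts(query, field, agg_name="agg1"):
    """Combine a query with a terms aggregation so hits and counts come back together."""
    return {"query": query, "aggregations": {agg_name: {"terms": {"field": field}}}}


def bucket_counts(response, agg_name="agg1"):
    """Turn the buckets of a terms aggregation into a {value: count} dict."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


request_body = search_with_counts(
    {"match": {"name": "smith"}},  # hypothetical full-text query
    "listType.typeRef.keyword",
)

# Made-up response fragment, for illustration only.
sample_response = {
    "aggregations": {
        "agg1": {
            "buckets": [
                {"key": "active_client", "doc_count": 7},
                {"key": "past_client", "doc_count": 3},
            ]
        }
    }
}
counts = bucket_counts(sample_response)
print(counts)
```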

Query that works on difference of dates

Consider a doc that has createdDate and closedDate. I want to find all docs where (closedDate - createdDate) > 2. I am not able to apply a script in a range field. Any clue how to proceed with this?
I think this may be possible by using scripts, but isn't there any way to do it with a plain query?
Isn't there a way to do something like this:
{
  "range": {
    "date": {
      "gt": "{createdDate} - {closedDate}/d > 2"
    }
  }
}
The only way to do that by query is to index an additional duration field into your JSON document beforehand. Personally, I would store the duration in milliseconds and use filters for queries.
If this is not acceptable, you will have to use script fields, described here and here in the Elasticsearch documentation.
IMO, saving the duration in each document is preferable, especially if you frequently use the duration for further analysis. The additional field does not cost a lot of memory, but it reduces the need for calculations (and is therefore likely to speed up query time). Especially in Elasticsearch, memory shouldn't be a big issue.
Yes, you can do this via a script:
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": "(doc.closedDate.value - doc.createdDate.value)/86400000 > 2"
          }
        }
      ]
    }
  }
}
Note: make sure to enable dynamic scripting in order to try this.
However, it would be best to compute that difference at indexing time and then use a range query on that difference field.
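A sketch of the index-time approach in Python: compute the duration in milliseconds before sending the document, then filter with a plain range query. The field name durationMs, the helper names, and the ISO 8601 date format are assumptions made for the example.

```python
from datetime import datetime, timedelta

MS_PER_DAY = 86_400_000


def with_duration(doc):
    """Return a copy of doc with a durationMs field (closedDate - createdDate)."""
    created = datetime.fromisoformat(doc["createdDate"])
    closed = datetime.fromisoformat(doc["closedDate"])
    enriched = dict(doc)
    # Integer number of whole milliseconds between the two timestamps.
    enriched["durationMs"] = (closed - created) // timedelta(milliseconds=1)
    return enriched


def longer_than_days(days):
    """Range query matching docs whose stored duration exceeds the given days."""
    return {"query": {"range": {"durationMs": {"gt": days * MS_PER_DAY}}}}


doc = with_duration(
    {"createdDate": "2020-01-01T00:00:00", "closedDate": "2020-01-04T00:00:00"}
)
query = longer_than_days(2)
print(doc["durationMs"], query)
```

Storing milliseconds keeps the field a plain number, so the threshold can be expressed in any unit by multiplying on the client.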

Elastic Search Distinct values

I want to know how to get the distinct values of a field in Elasticsearch. I read an article here that shows how to do that with facets, but I have read that facets are deprecated:
http://elasticsearch-users.115913.n3.nabble.com/Getting-Distinct-Values-td3830953.html
Is there any other way to do that? If so, could you tell me how? It's a bit hard to understand solutions like this one: Elastic Search - display all distinct values of an array
Use aggregations:
GET /my_index/my_type/_search?search_type=count
{
  "aggs": {
    "my_fields": {
      "terms": {
        "field": "name",
        "size": 1000
      }
    }
  }
}
You can use the cardinality metric, which returns the number of distinct values rather than the values themselves.
Although the counts returned aren't guaranteed to be 100% accurate, they almost always are for low-cardinality fields, and the precision is configurable via the precision_threshold parameter.
http://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html
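The two answers address slightly different needs: a terms aggregation lists the distinct values, while a cardinality aggregation only estimates how many there are. A Python sketch that builds both request bodies is shown here; the function names, the aggregation names, and the threshold value of 1000 are arbitrary choices for the example.

```python
def distinct_values_body(field, max_values=1000):
    """Terms aggregation: returns up to max_values distinct values with counts."""
    return {
        "size": 0,  # skip the hits, only the aggregation is of interest
        "aggs": {"distinct_values": {"terms": {"field": field, "size": max_values}}},
    }


def distinct_count_body(field, precision_threshold=1000):
    """Cardinality aggregation: returns an approximate count of distinct values."""
    return {
        "size": 0,
        "aggs": {
            "distinct_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


values_body = distinct_values_body("name")
count_body = distinct_count_body("name")
print(values_body)
print(count_body)
```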

Resources