Elasticsearch range with aggs - elasticsearch

I want the average rating of every user document, but it is not working as I expect. Please check the code given below.
curl -XGET 'localhost:9200/mentorz/users/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "avg_rating": {
      "range": {
        "field": "rating",
        "ranges": [{ "from": 3, "to": 19 }]
      }
    }
  }
}'
{
  "_index": "mentorz",
  "_type": "users",
  "_id": "555",
  "_source": {
    "name": "neeru",
    "user_id": 555,
    "email_id": "abc#gmail.com",
    "followers": 0,
    "following": 0,
    "mentors": 0,
    "mentees": 0,
    "basic_info": "api test info",
    "birth_date": 1448451985397,
    "charge_price": 0,
    "org": "cz",
    "located_in": "noida",
    "position": "sw developer",
    "exp": 7,
    "video_bio_lres": "test bio lres url normal signup",
    "video_bio_hres": "test bio hres url normal signup",
    "rating": [5, 4],
    "expertises": [1, 4, 61, 62, 63]
  }
}
This is my user document. I want to filter only those users whose average rating is in the range 3 to 5.

I've made a query using a script; I hope the query below works for you.
GET mentorz/users/_search
{
  "size": 0,
  "aggs": {
    "term": {
      "terms": {
        "field": "user.keyword",
        "size": 100
      },
      "aggs": {
        "NAME": {
          "terms": {
            "field": "rating",
            "size": 10,
            "script": {
              "inline": "float sum = 0; float count = 0; for (int i = 0; i < params['_source']['rating'].size(); i++) { sum += params['_source']['rating'][i]; count++; } float avg = sum / count; if (avg >= 4 && avg <= 5) { avg } else { null }"
            }
          }
        }
      }
    }
  }
}
You can change the desired rating range by adjusting the if condition: if (avg >= 4 && avg <= 5).
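For reference, the averaging-and-filtering logic that the Painless script performs can be sketched client-side in plain JavaScript; the documents below are hypothetical and only mirror the "rating" array shown in the question:

```javascript
// Hypothetical documents mirroring the "rating" array in the question.
const users = [
  { name: "neeru", rating: [5, 4] },
  { name: "other", rating: [1, 2] },
];

// Average of a rating array, as the Painless script computes it.
function avgRating(ratings) {
  const sum = ratings.reduce((acc, r) => acc + r, 0);
  return sum / ratings.length;
}

// Keep only users whose average rating falls inside [min, max].
function filterByAvgRating(docs, min, max) {
  return docs.filter((d) => {
    const avg = avgRating(d.rating);
    return avg >= min && avg <= max;
  });
}

console.log(filterByAvgRating(users, 3, 5).map((d) => d.name)); // [ 'neeru' ]
```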


Elastic search query to find Total Login User count

Total distinct login user cumulative count - column (User ID)
Total accounts searched user cumulative count (Account ID)
I tried the queries below but am not getting the count:
GET /Logstach/_Search_Tracking/_count?q=user:userId
GET /Logstach/_Search_Tracking/_count?q=user:AccountId
This is the output I got:
{
  "count" : 0,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  }
}
Below is a sample document from the index:
"_index" : "ABC",
"_type" : "_doc",
"_id" : "08c28b9c-dd07-47c0-8243-3afc4fe89c08",
"_score" : 1.0,
"_source" : {
  "user" : {
    "userId" : "A123",
    "userCategory" : "Outside",
    "accountId" : "ABC58"
  },
You need to query like this: ...?q=user.userId:A123 and ...?q=user.accountId:ABC58, because both fields are located within the user object:
GET /Logstach/_Search_Tracking/_count?q=user.userId:A123
^
|
add this
GET /Logstach/_Search_Tracking/_count?q=user.accountId:ABC58
^
|
add this
Now, if you want the distinct number of users that have logged in, you need to use the cardinality aggregation:
GET /Logstach/_Search_Tracking/_search
{
  "size": 0,
  "aggs": {
    "distinct_loggedin": {
      "cardinality": {
        "field": "user.userId"
      }
    }
  }
}
Same for the total number of accounts searched:
GET /Logstach/_Search_Tracking/_search
{
  "size": 0,
  "aggs": {
    "total_accounts": {
      "cardinality": {
        "field": "user.accountId"
      }
    }
  }
}
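Note that cardinality gives an approximate distinct count computed server-side. The idea it implements is, in exact form, just the size of a set of unique values; a minimal JavaScript sketch over hypothetical documents shaped like the one above:

```javascript
// Hypothetical documents shaped like the sample document above.
const docs = [
  { user: { userId: "A123", accountId: "ABC58" } },
  { user: { userId: "A123", accountId: "XYZ01" } },
  { user: { userId: "B456", accountId: "ABC58" } },
];

// Exact distinct count over a two-level field path such as "user.userId".
function distinctCount(documents, path) {
  const [outer, inner] = path.split(".");
  return new Set(documents.map((d) => d[outer][inner])).size;
}

console.log(distinctCount(docs, "user.userId"));    // 2
console.log(distinctCount(docs, "user.accountId")); // 2
```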

I would like to combine the duplicate values in Elasticsearch into one and see the results with a different filter

I'm collecting logs through Elasticsearch. The logs are collected as below, for example:
{
  "name" : "John",
  "team" : "IT",
  "startTime" : "21:00",
  "result" : "pass"
},
{
  "name" : "James",
  "team" : "HR",
  "startTime" : "21:04",
  "result" : "pass"
},
{
  "name" : "Paul",
  "team" : "IT",
  "startTime" : "21:05",
  "result" : "pass"
},
{
  "name" : "Jackson",
  "team" : "Marketing",
  "startTime" : "21:30",
  "result" : "fail"
},
{
  "name" : "John",
  "team" : "IT",
  "startTime" : "21:41",
  "result" : "pass"
},
.....and so on
If you run the query below on these collected logs,
GET logData/_search
{
  "size": 0,
  "aggs": {
    "Documents_per_team": {
      "terms": {
        "field": "team"
      }
    }
  }
}
The following results are returned:
"aggregations" : {
  "Documents_per_team" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "IT",
        "doc_count" : 70
      },
      {
        "key" : "Marketing",
        "doc_count" : 55
      },
      {
        "key" : "HR",
        "doc_count" : 11
      }
    ]
  }
}
What I want is to eliminate duplicates when the same name appears multiple times in this result.
[AS-IS]
As shown above, the IT team count is 70.
[The result I want]
If John performed 50 times, Kate performed 10 times, and Paul performed 10 times, the IT team count should be 3 (because there are three IT team members).
Can I get a team-by-team result after removing duplicates?
Thanks
You've got two options:
a cardinality sub-aggregation (straightforward, but approximate and, in very specific/advanced situations, not very scalable)
or a scripted metric aggregation (slower and more verbose, but exact).
Both approaches assume that names are unique within a team; if they're not, you'll need to adjust accordingly. It is also assumed that name is mapped as type keyword, just like team. If not, you'll need to use your_field.keyword instead.
1. Cardinality
{
  "size": 0,
  "aggs": {
    "Documents_per_team": {
      "terms": {
        "field": "team"
      },
      "aggs": {
        "unique_names_per_team": {
          "cardinality": {
            "field": "name"
          }
        }
      }
    }
  }
}
2. Scripted Metric
{
  "size": 0,
  "aggs": {
    "Documents_per_team": {
      "scripted_metric": {
        "init_script": "state.by_department = [:]; state.dept_vs_name = [:];",
        "map_script": """
          def dept = doc['team'].value;
          def name = doc['name'].value;
          def name_already_considered = state.by_department.containsKey(dept) && state.dept_vs_name[dept].containsKey(name);
          if (name_already_considered) {
            return;
          }
          if (state.by_department.containsKey(dept)) {
            state.by_department[dept] += 1;
          } else {
            state.by_department[dept] = 1;
          }
          if (!state.dept_vs_name.containsKey(dept)) {
            // init a new map and add its first member
            state.dept_vs_name[dept] = [name: true];
          } else if (!state.dept_vs_name[dept].containsKey(name)) {
            state.dept_vs_name[dept][name] = true;
          }
        """,
        "combine_script": "return state.by_department",
        "reduce_script": "return states"
      }
    }
  }
}
Note: If you also wish to see the underlying dept vs. name breakdown, you can modify the combine_script to return the whole state, i.e. return state.
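The grouping both options compute, a distinct-name count per team, can be sketched locally in JavaScript using the sample log documents from the question:

```javascript
// Sample log documents from the question (abbreviated).
const logs = [
  { name: "John", team: "IT" },
  { name: "James", team: "HR" },
  { name: "Paul", team: "IT" },
  { name: "Jackson", team: "Marketing" },
  { name: "John", team: "IT" }, // duplicate name: counted once
];

// Count unique names per team, mirroring the scripted-metric logic.
function uniqueMembersPerTeam(docs) {
  const byTeam = {};
  for (const { name, team } of docs) {
    if (!byTeam[team]) byTeam[team] = new Set();
    byTeam[team].add(name);
  }
  return Object.fromEntries(
    Object.entries(byTeam).map(([team, names]) => [team, names.size])
  );
}

console.log(uniqueMembersPerTeam(logs)); // { IT: 2, HR: 1, Marketing: 1 }
```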

Elasticsearch - group by day of week and hour

I need to get some data grouped by day of week and hour, for example:
curl -XGET http://localhost:9200/testing/hello/_search?pretty=true -d '
{
  "size": 0,
  "aggs": {
    "articles_over_time" : {
      "date_histogram" : {
        "field" : "date",
        "interval" : "hour",
        "format": "E - k"
      }
    }
  }
}
'
Gives me this:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2857,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "articles_over_time" : {
      "buckets" : [ {
        "key_as_string" : "Fri - 17",
        "key" : 1391792400000,
        "doc_count" : 6
      },
      ...
      {
        "key_as_string" : "Wed - 22",
        "key" : 1411596000000,
        "doc_count" : 1
      }, {
        "key_as_string" : "Wed - 22",
        "key" : 1411632000000,
        "doc_count" : 1
      } ]
    }
  }
}
Now I need to sum the doc counts that share a value such as "Wed - 22". How can I do this?
Or maybe there is another approach?
The same kind of problem has been solved in this thread.
Adapting the solution to your problem, we need to make a script to convert the date into the hour of day and day of week:
Date date = new Date(doc['date'].value) ;
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE, HH');
format.format(date)
And use it in a query:
{
  "aggs": {
    "perWeekDay": {
      "terms": {
        "script": "Date date = new Date(doc['date'].value); java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE, HH'); format.format(date)"
      }
    }
  }
}
You can try doing a terms aggregation on the "key_as_string" field from the aggregation results using a sub-aggregation.
Hope that helps.
This is because you are using an interval of 'hour', but the date format is by day ('E - k').
Change your interval to 'day', and you'll no longer get separate buckets for 'Wed - 22'.
Or, if you do want per-hour buckets, change your format to include the hour field.
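To illustrate the merging step, here is a small JavaScript sketch (with made-up timestamps) that formats epoch milliseconds into a weekday-plus-hour key and accumulates counts per key; summing buckets that share a key like "Wed - 22" amounts to exactly this:

```javascript
// Made-up timestamps (milliseconds since epoch, interpreted in UTC).
const timestamps = [
  Date.UTC(2014, 8, 24, 22, 5),  // a Wednesday, 22:05
  Date.UTC(2014, 8, 24, 22, 40), // same Wednesday, 22:40
  Date.UTC(2014, 1, 7, 17, 0),   // a Friday, 17:00
];

const days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];

// Bucket key in a shape similar to the 'E - k' format, e.g. "Wed - 22".
// Note: getUTCHours() is 0-23, whereas Joda's 'k' is 1-24.
function bucketKey(ms) {
  const d = new Date(ms);
  return `${days[d.getUTCDay()]} - ${d.getUTCHours()}`;
}

// Sum doc counts per (weekday, hour) key across all weeks.
function countPerWeekdayHour(ts) {
  const counts = {};
  for (const t of ts) {
    const key = bucketKey(t);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

console.log(countPerWeekdayHour(timestamps)); // { 'Wed - 22': 2, 'Fri - 17': 1 }
```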

Show all Elasticsearch aggregation results/buckets and not just 10

I'm trying to list all buckets on an aggregation, but it seems to be showing only the first 10.
My search:
curl -XPOST "http://localhost:9200/imoveis/_search?pretty=1" -d'
{
  "size": 0,
  "aggregations": {
    "bairro_count": {
      "terms": {
        "field": "bairro.raw"
      }
    }
  }
}'
Returns:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 16920,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "bairro_count" : {
      "buckets" : [ {
        "key" : "Barra da Tijuca",
        "doc_count" : 5812
      }, {
        "key" : "Centro",
        "doc_count" : 1757
      }, {
        "key" : "Recreio dos Bandeirantes",
        "doc_count" : 1027
      }, {
        "key" : "Ipanema",
        "doc_count" : 927
      }, {
        "key" : "Copacabana",
        "doc_count" : 842
      }, {
        "key" : "Leblon",
        "doc_count" : 833
      }, {
        "key" : "Botafogo",
        "doc_count" : 594
      }, {
        "key" : "Campo Grande",
        "doc_count" : 456
      }, {
        "key" : "Tijuca",
        "doc_count" : 361
      }, {
        "key" : "Flamengo",
        "doc_count" : 328
      } ]
    }
  }
}
I have much more than 10 keys for this aggregation. In this example I'd have 145 keys, and I want the count for each of them. Is there some pagination on buckets? Can I get all of them?
I'm using Elasticsearch 1.1.0
The size param should be a param of the terms aggregation itself, for example:
curl -XPOST "http://localhost:9200/imoveis/_search?pretty=1" -d'
{
  "size": 0,
  "aggregations": {
    "bairro_count": {
      "terms": {
        "field": "bairro.raw",
        "size": 10000
      }
    }
  }
}'
Use size: 0 in the terms aggregation to get all buckets on ES version 2 and prior.
Setting size: 0 is deprecated from 2.x onwards, due to the memory pressure that high-cardinality field values can inflict on your cluster. You can read more about it in the github issue here.
It is recommended to explicitly set a reasonable value for size, a number between 1 and 2147483647.
How to show all buckets?
{
  "size": 0,
  "aggs": {
    "aggregation_name": {
      "terms": {
        "field": "your_field",
        "size": 10000
      }
    }
  }
}
Note
"size": 10000 gets at most 10000 buckets; the default is 10.
"size": 0 keeps the result's "hits" array empty; by default it contains 10 documents, which we don't need here.
By default, the buckets are ordered by doc_count in decreasing order.
Why do I get a "Fielddata is disabled on text fields by default" error?
Because fielddata is disabled on text fields by default. If you have not explicitly chosen a field type mapping, the default dynamic mapping for string fields applies.
So, instead of writing "field": "your_field", you need to write "field": "your_field.keyword".
If you want to get all unique values without setting a magic number (size: 10000), then use COMPOSITE AGGREGATION (ES 6.5+).
From official documentation:
"If you want to retrieve all terms or all combinations of terms in a nested terms aggregation you should use the COMPOSITE AGGREGATION which allows to paginate over all possible terms rather than setting a size greater than the cardinality of the field in the terms aggregation. The terms aggregation is meant to return the top terms and does not allow pagination."
Implementation example in JavaScript:
const ITEMS_PER_PAGE = 1000;
const body = {
  "size": 0, // Returning only aggregation results: https://www.elastic.co/guide/en/elasticsearch/reference/current/returning-only-agg-results.html
  "aggs": {
    "langs": {
      "composite": {
        "size": ITEMS_PER_PAGE,
        "sources": [
          { "language": { "terms": { "field": "language" } } }
        ]
      }
    }
  }
};
const uniqueLanguages = [];
while (true) {
  const result = await es.search({ body });
  // Composite bucket keys are objects, e.g. { language: "en" }.
  const currentUniqueLangs = result.aggregations.langs.buckets.map(bucket => bucket.key.language);
  uniqueLanguages.push(...currentUniqueLangs);
  const after = result.aggregations.langs.after_key;
  if (after) {
    // continue paginating over unique items
    body.aggs.langs.composite.after = after;
  } else {
    break;
  }
}
console.log(uniqueLanguages);
Increase the size (the 2nd size) to 10000 in your terms aggregation and you will get up to 10000 buckets. By default it is set to 10.
Also, if you want to see the search results, just set the 1st size to 1; you will see 1 document, since ES supports both searching and aggregating in one request.
curl -XPOST "http://localhost:9200/imoveis/_search?pretty=1" -d'
{
  "size": 1,
  "aggregations": {
    "bairro_count": {
      "terms": {
        "field": "bairro.raw",
        "size": 10000
      }
    }
  }
}'

ElasticSearch how to use boost

This query works great, but it's returning too many results. I would like to add the boost function, but I don't know the proper syntax.
$data_string = '{
  "from" : 0, "size" : 100,
  "sort" : [
    { "date" : { "order" : "desc" } }
  ],
  "query": {
    "more_like_this_field" : {
      "thread.title" : {
        "like_text" : "this is a test",
        "min_word_len" : 4,
        "min_term_freq" : 1,
        "min_doc_freq" : 1
      }
    }
  }
}';
Found the solution. It looks like using fuzzy_like_this_field with min_similarity is the way to go.
$data_string = '{
  "from" : 0, "size" : 100,
  "query": {
    "fuzzy_like_this_field" : {
      "thread.title" : {
        "like_text" : "this is a test",
        "min_similarity": 0.9
      }
    }
  }
}';
According to the docs, you just need to add it to the other parameters:
...
"thread.title" : {
  "like_text" : "this is a test",
  "min_word_len" : 4,
  "min_term_freq" : 1,
  "min_doc_freq" : 1,
  "boost": 1.0
}
...
Also, if you have too many docs, you can try to increase the min_term_freq and the min_doc_freq, too.
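For intuition, a query-time boost acts roughly as a multiplier on a clause's score contribution before results are ranked. A toy JavaScript sketch of that effect; the per-field scores and field names below are made up, not produced by Elasticsearch:

```javascript
// Toy per-field relevance scores for two documents (hypothetical values).
const docs = [
  { id: 1, titleScore: 1.0, bodyScore: 3.0 },
  { id: 2, titleScore: 2.0, bodyScore: 1.0 },
];

// Combine field scores, multiplying each by its boost.
function combinedScore(doc, boosts) {
  return doc.titleScore * boosts.title + doc.bodyScore * boosts.body;
}

// Rank document ids by boosted score, highest first.
function rank(documents, boosts) {
  return [...documents]
    .sort((a, b) => combinedScore(b, boosts) - combinedScore(a, boosts))
    .map((d) => d.id);
}

console.log(rank(docs, { title: 1.0, body: 1.0 })); // [ 1, 2 ]  (4.0 vs 3.0)
console.log(rank(docs, { title: 3.0, body: 1.0 })); // [ 2, 1 ]  (6.0 vs 7.0)
```

Raising the boost on the title field flips the ranking, which is the kind of control the "boost" parameter gives you over the more_like_this clause.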
