ElasticSearch aggregation over inner object - elasticsearch

My Elasticsearch index has a type with the following mapping:
{
"index_v1":{
"mappings":{
"fuas":{
"properties":{
"comment":{
"type":"string"
},
"matter":{
"type":"string"
},
"metainfos":{
"properties":{
"department":{
"type":"string"
},
"processos":{
"type":"string"
}
}
}
}
}
}
}
}
In short, the fuas type has two properties, comment and matter, and an inner (not nested) object, metainfos, with the properties department and processos.
I'd like to know which metainfos fields are populated, together with the number of documents in which each one occurs.
Imagine a document doc1 with metainfos: {department: "d1"} and a doc2 with metainfos: {department: "d2", processos: "p1"}.
Then I'd like to get: {department: 2, processos: 1}.
EDIT
Because metainfos is an inner object and ES is schemaless, each document's metainfos object may or may not have any given field populated.
For example, take doc1 with metainfos {field1: 1, field3: 3}, doc2 with metainfos {field2: 1, field4: 5}, and doc3 with metainfos {field1:2, field4: 2, field5: 1}.
I'd like to get: {field1: 2, field2: 1, field3: 1, field4: 2, field5: 1}. I think the main difficulty is how to query for fields whose names I don't know in advance.
I've tested with two documents:
{
"hits":{
"total":2,
"max_score":1.0,
"hits":[
{
"_source":{
"matter":"FUA2",
"comment":null,
"metainfos":[
{
"department":"d1"
}
]
}
},
{
"_source":{
"matter":"FUA1",
"comment":"vcvcvc",
"metainfos":[
{
"department":"d1"
},
{
"processos":"p1"
}
]
}
}
]
}
}
I ran the following query:
curl -XGET 'http://localhost:9201/living_team/fuas/_search?pretty' -d '
{
"size": 0,
"aggregations" : {
"followUpActivity.metainfo.department" : {
"terms" : {
"field" : "metainfos.*"
}
}
}
}
'
The result was:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"followUpActivity.metainfo.department" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
}

You can use the value_count aggregation for this:
{
"size": 0,
"aggs" : {
"dept" : {
"value_count" : { "field" : "metainfos.department" }
},
"proc" : {
"value_count" : { "field" : "metainfos.processos" }
}
}
}
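When the field names under metainfos are not known in advance, one option is to read them from the index mapping first and then generate one value_count aggregation per field. The following is a minimal Python sketch of that idea, assuming the mapping has the shape shown in the question. The helper names inner_fields and value_count_aggs are hypothetical, and the resulting body still has to be sent to the _search endpoint with whatever client you use.

```python
import json

def inner_fields(mapping, index, doc_type, obj):
    """Return the property names of inner object `obj` from a mapping dict."""
    props = mapping[index]["mappings"][doc_type]["properties"]
    return sorted(props[obj]["properties"])

def value_count_aggs(obj, fields):
    """Build a search body with one value_count aggregation per field."""
    return {
        "size": 0,
        "aggs": {
            name: {"value_count": {"field": f"{obj}.{name}"}}
            for name in fields
        },
    }

# Mapping copied from the question.
mapping = {
    "index_v1": {"mappings": {"fuas": {"properties": {
        "comment": {"type": "string"},
        "matter": {"type": "string"},
        "metainfos": {"properties": {
            "department": {"type": "string"},
            "processos": {"type": "string"},
        }},
    }}}}
}

fields = inner_fields(mapping, "index_v1", "fuas", "metainfos")
body = value_count_aggs("metainfos", fields)
print(json.dumps(body, indent=2))
```

Note that value_count counts indexed values rather than documents, so a document holding two department values contributes 2 to that count; on an analyzed string field the counted values may be tokens rather than whole strings.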

You need to map metainfos as a nested field; otherwise the fields of the inner objects are flattened and are not kept "together" per metainfos object.

Related

Elasticsearch Aggregation most common list of integers

I am looking for an Elasticsearch aggregation (and mapping) that returns the most common list of values for a given field.
For example for docs:
{"ToneCurvePV2012": [1,2,3]}
{"ToneCurvePV2012": [1,5,6]}
{"ToneCurvePV2012": [1,7,8]}
{"ToneCurvePV2012": [1,2,3]}
I wish for the aggregation result:
[1,2,3] (since it appears twice).
So far, every aggregation I've tried returns: 1
This is not possible with a plain terms aggregation, because it buckets each array element separately. You need a terms aggregation with a script. Please note that scripted aggregations can hurt cluster performance.
Here, I've used a script that builds a string from the array and uses that string as the bucket key. For an array value like [1,2,3], it produces the string '[1,2,3]', and that key is used for the aggregation.
Below is a sample query that produces the aggregation you expect:
POST index1/_search
{
"size": 0,
"aggs": {
"tone_s": {
"terms": {
"script": {
"source": "def value='['; for(int i=0;i<doc['ToneCurvePV2012'].length;i++){value= value + doc['ToneCurvePV2012'][i] + ',';} value+= ']'; value = value.replace(',]', ']'); return value;"
}
}
}
}
}
Output:
{
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"tone_s" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "[1,2,3]",
"doc_count" : 2
},
{
"key" : "[1,5,6]",
"doc_count" : 1
},
{
"key" : "[1,7,8]",
"doc_count" : 1
}
]
}
}
}
PS: In the aggregation response, the key is returned as a string, not as an array.
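To make the bucketing logic easier to follow, here is a small Python sketch that mirrors what the script does outside Elasticsearch: it renders each list as a string key and counts the keys. The helper name list_key is hypothetical, and the sketch only illustrates the idea; it does not call Elasticsearch.

```python
from collections import Counter

def list_key(values):
    """Mirror the script: render a list as a string such as '[1,2,3]'."""
    return "[" + ",".join(str(v) for v in values) + "]"

# Sample documents from the question.
docs = [
    {"ToneCurvePV2012": [1, 2, 3]},
    {"ToneCurvePV2012": [1, 5, 6]},
    {"ToneCurvePV2012": [1, 7, 8]},
    {"ToneCurvePV2012": [1, 2, 3]},
]

# One bucket per distinct key, as the scripted terms aggregation produces.
buckets = Counter(list_key(d["ToneCurvePV2012"]) for d in docs)
for key, doc_count in buckets.most_common():
    print(key, doc_count)
```

Because the key is a plain string, the client has to parse it back into a list if it needs the individual integers.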

SQL aggregation query corresponding in elasticsearch

I've studied Elasticsearch aggregation queries but couldn't find out whether they support multiple aggregate functions. In other words, I want to know whether Elasticsearch can produce the equivalent of this SQL aggregation query:
SELECT account_no, transaction_type, count(account_no), sum(amount), max(amount) FROM index_name GROUP BY account_no, transaction_type Having count(account_no) > 10
If yes, how?
Thank you.
There are two possible ways to do what you are looking for in ES and I've mentioned them both below.
I've also added sample mapping and sample documents for your reference.
Mapping:
PUT index_name
{
"mappings": {
"mydocs":{
"properties":{
"account_no":{
"type": "keyword"
},
"transaction_type":{
"type": "keyword"
},
"amount":{
"type":"double"
}
}
}
}
}
Sample Documents:
Note that I'm creating only 4 transactions, all for a single customer.
POST index_name/mydocs/1
{
"account_no": "1011",
"transaction_type":"credit",
"amount": 200
}
POST index_name/mydocs/2
{
"account_no": "1011",
"transaction_type":"credit",
"amount": 400
}
POST index_name/mydocs/3
{
"account_no": "1011",
"transaction_type":"cheque",
"amount": 100
}
POST index_name/mydocs/4
{
"account_no": "1011",
"transaction_type":"cheque",
"amount": 100
}
There are two ways to get what you are looking for:
Solution 1: Using Elasticsearch Query DSL
Aggregation Query:
For the Query DSL solution, I've combined the following aggregations:
Terms Aggregation
Sum Aggregation Query (Metric Aggregation)
Max Aggregation Query (Metric Aggregation)
The outline below summarises the query, showing which aggregations are parents and which are siblings:
- Terms Aggregation (for every account_no)
  - Terms Aggregation (for every transaction_type)
    - Sum Aggregation (amount)
    - Max Aggregation (amount)
Below is the actual query:
POST index_name/_search
{
"size": 0,
"aggs": {
"account_no_agg": {
"terms": {
"field": "account_no"
},
"aggs": {
"transaction_type_agg": {
"terms": {
"field": "transaction_type",
"min_doc_count": 2
},
"aggs": {
"sum_amount": {
"sum": {
"field": "amount"
}
},
"max_amount":{
"max": {
"field": "amount"
}
}
}
}
}
}
}
}
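The important part is min_doc_count, which plays the role of the HAVING clause. It is inclusive: "min_doc_count": 2 keeps only the buckets with a count of at least 2 (the same as HAVING count(account_no) > 1), so the equivalent of HAVING count(account_no) > 10 is "min_doc_count": 11. Because it is set on the transaction_type aggregation, it filters on the count per (account_no, transaction_type) group.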
Query Response
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"account_no_agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1011", <---- account_no
"doc_count" : 4, <---- all documents for this account_no
"transaction_type_agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "cheque", <---- transaction_type
"doc_count" : 2, <---- count(account_no) for this group
"sum_amount" : { <---- sum(amount)
"value" : 200.0
},
"max_amount" : { <---- max(amount)
"value" : 100.0
}
},
{
"key" : "credit", <---- another transaction_type
"doc_count" : 2, <---- count(account_no) for this group
"sum_amount" : { <---- sum(amount)
"value" : 600.0
},
"max_amount" : { <---- max(amount)
"value" : 400.0
}
}
]
}
}
]
}
}
}
In the response above, I've added comments showing which part of the SQL query each value corresponds to.
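As a cross-check of how the nested aggregations map onto GROUP BY and HAVING, the same computation can be sketched in plain Python over the four sample documents. This is only an illustration of the semantics, not something that runs against Elasticsearch; the function name group_stats is hypothetical.

```python
from collections import defaultdict

# The four sample documents indexed above.
transactions = [
    {"account_no": "1011", "transaction_type": "credit", "amount": 200},
    {"account_no": "1011", "transaction_type": "credit", "amount": 400},
    {"account_no": "1011", "transaction_type": "cheque", "amount": 100},
    {"account_no": "1011", "transaction_type": "cheque", "amount": 100},
]

def group_stats(rows, min_doc_count):
    """Group by (account_no, transaction_type), then keep groups
    whose size is at least min_doc_count (inclusive, like the terms option)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["account_no"], row["transaction_type"])].append(row["amount"])
    return {
        key: {
            "doc_count": len(amounts),
            "sum_amount": float(sum(amounts)),
            "max_amount": float(max(amounts)),
        }
        for key, amounts in groups.items()
        if len(amounts) >= min_doc_count
    }

stats = group_stats(transactions, min_doc_count=2)
for key, values in stats.items():
    print(key, values)
```

Raising min_doc_count to 3 leaves no groups at all, which matches the inclusive behaviour described above.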
Solution 2: Using Elasticsearch SQL (X-Pack)
If you have the X-Pack SQL Access feature available, you can run the SELECT query almost as-is against the mapping and documents shown above:
Elasticsearch SQL:
POST /_xpack/sql?format=txt
{
"query": "SELECT account_no, transaction_type, sum(amount), max(amount), count(account_no) FROM index_name GROUP BY account_no, transaction_type HAVING count(account_no) > 1"
}
Elasticsearch SQL Result:
account_no |transaction_type| SUM(amount) | MAX(amount) |COUNT(account_no)
---------------+----------------+---------------+---------------+-----------------
1011 |cheque |200.0 |100.0 |2
1011 |credit |600.0 |400.0 |2
Note that I've tested the query in ES 6.5.4.
Hope this helps!

Elasticsearch, query array

Suppose I have the following data:
{"field": [{"type": "A"}, {"type": "B"}]},
{"field": [{"type": "B"}]}
How do you construct a query in Elasticsearch to get the count of all records with a specific field type value, given field is an array?
You can use the Count API, with the following query
Query:
GET /index/index_type/_count
{
"query" : {
"term" : { "field.type" : "A" }
}
}
Response:
{
"count" : <number of docs>,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
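The reason a plain term query on field.type works here is that an array of inner objects is flattened at index time, so each document effectively holds a list of field.type values, and a document matches if any of them equals the term. Below is a rough Python sketch of that matching rule on the sample data. The helper names are hypothetical, and the sketch ignores text analysis: on an analyzed string field the term would have to be the indexed (lowercased) token.

```python
def flatten_field(doc, field, sub):
    """Collect every `sub` value from the objects stored in doc[field]."""
    return [obj[sub] for obj in doc.get(field, []) if sub in obj]

def count_matching(docs, field, sub, value):
    """Count documents where at least one flattened value equals `value`."""
    return sum(1 for d in docs if value in flatten_field(d, field, sub))

# Sample data from the question.
docs = [
    {"field": [{"type": "A"}, {"type": "B"}]},
    {"field": [{"type": "B"}]},
]

print(count_matching(docs, "field", "type", "A"))
print(count_matching(docs, "field", "type", "B"))
```

Each document is counted once even if several of its array elements match, which is also how a document count behaves.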

ElasticSearch how to use boost

This query works great, but it's returning too many results. I would like to add the boost function but I don't know the proper syntax.
$data_string = '{
"from" : 0, "size" : 100,
"sort" : [
{ "date" : {"order" : "desc"} }
],
"query": {
"more_like_this_field" : {
"thread.title" : {
"like_text" : "this is a test",
"min_word_len" : 4,
"min_term_freq" : 1,
"min_doc_freq" : 1
}
}
}
}';
Found the solution. Looks like using fuzzy_like_this_field and min_similarity is the way to go.
$data_string = '{
"from" : 0, "size" : 100,
"query": {
"fuzzy_like_this_field" : {
"thread.title" : {
"like_text" : "this is a test",
"min_similarity": 0.9
}
}
}
}';
According to the docs, you just need to add boost alongside the other parameters:
...
"thread.title" : {
"like_text" : "this is a test",
"min_word_len" : 4,
"min_term_freq" : 1,
"min_doc_freq" : 1,
"boost": 1.0
}
...
Also, if the query returns too many documents, you can try increasing min_term_freq and min_doc_freq.
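For illustration, the request body with boost and the two thresholds can be assembled programmatically instead of as a hand-written string. This is a minimal Python sketch using only the parameters that appear in this thread; the function name mlt_field_query and the threshold values passed in the call are assumptions chosen for the example, and the query type itself is not available in every Elasticsearch version.

```python
import json

def mlt_field_query(field, like_text, boost=1.0,
                    min_word_len=4, min_term_freq=1, min_doc_freq=1):
    """Build a more_like_this_field request body with a boost parameter."""
    return {
        "from": 0,
        "size": 100,
        "query": {
            "more_like_this_field": {
                field: {
                    "like_text": like_text,
                    "min_word_len": min_word_len,
                    "min_term_freq": min_term_freq,
                    "min_doc_freq": min_doc_freq,
                    "boost": boost,
                }
            }
        },
    }

# Illustrative values only: stricter thresholds to narrow the result set.
body = mlt_field_query("thread.title", "this is a test",
                       boost=1.0, min_term_freq=2, min_doc_freq=5)
data_string = json.dumps(body)
print(data_string)
```

Keep in mind that boost changes how matching documents are scored, not how many match, so it is the thresholds that reduce the number of results.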

ElasticSearch doesn't seem to support array lookups

I currently have a fairly simple document stored in ElasticSearch that I generated with an integration test:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "unit-test_project600",
"_type" : "recordDefinition505",
"_id" : "400",
"_score" : 1.0, "_source" : {
"field900": "test string",
"field901": "500",
"field902": "2050-01-01T00:00:00",
"field903": [
"Open"
]
}
} ]
}
}
I would like to filter for specifically field903 and a value of "Open", so I perform the following query:
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "field903": "Open"
        }
      }
    }
  }
}
This returns no results. However, I can use this with other fields and it will return the record:
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "field901": "500"
        }
      }
    }
  }
}
It would appear that I'm unable to search in arrays with ElasticSearch. I have read a few instances of people with a similar problem, but none of them appear to have solved it. Surely this isn't a limitation of ElasticSearch?
I thought that it might be a mapping problem. Here's my mapping:
{
"unit-test_project600" : {
"recordDefinition505" : {
"properties" : {
"field900" : {
"type" : "string"
},
"field901" : {
"type" : "string"
},
"field902" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"field903" : {
"type" : "string"
}
}
}
}
}
However, the ElasticSearch docs indicate that there is no difference between a string or an array mapping, so I don't think I need to make any changes here.
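Try searching for "open" rather than "Open". By default, Elasticsearch applies the standard analyzer when indexing string fields, and the standard analyzer includes a lowercase filter, so the value is indexed as "open". A term filter is not analyzed, so it looks for the exact term "Open" and finds nothing. The filter on field901 works because "500" is unchanged by lowercasing. In my experience, Elasticsearch does search arrays.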
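The mismatch can be reproduced with a toy model in Python. The simple_analyze function below is a rough, hypothetical stand-in for the standard analyzer (it only splits on non-alphanumeric characters and lowercases), and term_matches models a term filter as an exact lookup against the indexed terms. It is a sketch of the behaviour described above, not the real analysis chain.

```python
import re

def simple_analyze(text):
    """Rough stand-in for the standard analyzer: tokenize and lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def index_field(values):
    """Index a single value or an array of values into a set of terms."""
    if not isinstance(values, list):
        values = [values]
    terms = set()
    for value in values:
        terms.update(simple_analyze(value))
    return terms

def term_matches(indexed_terms, term):
    """A term filter is not analyzed: it looks up the exact term."""
    return term in indexed_terms

field903 = index_field(["Open"])  # array field from the question
field901 = index_field("500")     # plain string field from the question

print(term_matches(field903, "Open"))  # False: the index holds "open"
print(term_matches(field903, "open"))  # True
print(term_matches(field901, "500"))   # True: digits are unaffected
```

In this model the array makes no difference: each element is analyzed and indexed like a single value, which is consistent with the mapping not distinguishing strings from arrays of strings.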
