How to have an OR search while aggregating in Elasticsearch?

I have this query, which gets the aggregation of the mentions field for all the tweets with the text "CAT DOG". How can I edit my query so that I get the aggregation for all the tweets containing either "cat" or "dog"?
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {"text" : "CAT DOG"}
}
},
"aggs": {
"tweet": {
"terms": {
"field": "hashtags",
"size" : 50
}
}
}
}

Look at the multi_match query: it has an operator option, which you should set to or, meaning that a document matches if any of the terms match.
{
"size": 0,
"query": {
"multi_match" : {
"query": "CAT DOG",
"fields": [ "text"],
"operator": "or"
}
},
"aggs": {
"tweet": {
"terms": {
"field": "hashtags",
"size" : 50
}
}
}
}
Another option in your case is a bool query with term clauses. Be aware that term queries search for exact terms and do not analyze the input: if the field uses the default analyzer, the indexed tokens are lowercased, so you must lowercase the user input to match them. You are also responsible for splitting the user input into tokens.
{
"size": 0,
"query": {
"bool" : {
"should" : [
{ "term" : { "text": "cat" } },
{ "term" : { "text" : "dog" } }
]
}
},
"aggs": {
"tweet": {
"terms": {
"field": "hashtags",
"size" : 50
}
}
}
}
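The lowercasing and token splitting described for the term-based bool query can be done on the client before the request is built. Here is a minimal Python sketch, assuming a plain whitespace split is close enough to what the index analyzer does for simple words. The helper name build_or_term_query is hypothetical; the field names come from the question.

```python
import json


def build_or_term_query(user_input, field="text", agg_field="hashtags", agg_size=50):
    """Build a bool/should request body from raw user input (illustrative helper)."""
    # Assumption: whitespace split + lowercase approximates the default analyzer.
    tokens = [token.lower() for token in user_input.split()]
    return {
        "size": 0,
        "query": {
            "bool": {
                # One term clause per token; any single match is enough (OR).
                "should": [{"term": {field: token}} for token in tokens],
                "minimum_should_match": 1,
            }
        },
        "aggs": {"tweet": {"terms": {"field": agg_field, "size": agg_size}}},
    }


body = build_or_term_query("CAT DOG")
print(json.dumps(body, indent=2))
```

Input with punctuation or non-ASCII text would need tokenization closer to the real analyzer, so this should be treated as a starting point only.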

You need to use a should clause.
If something is not clear, feel free to ask.
Please pay attention to your Elasticsearch version.

Related

Elasticsearch sub-aggregation with a condition

I have the database table columns like:
ID | Biz Name | License # | Violations | ...
I need to find out those businesses that have more than 5 violations.
I have the following:
{
"query": {
"bool": {
"must": {
"match": {
"violations": {
"query": "MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
},
"must_not": {
"match": {
"violations": {
"query": "NO MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
}
}
},
"aggs" : {
"selected_bizs" :{
"terms" : {
"field" : "Biz Name.keyword",
"min_doc_count": 5,
"size" :1000
},
"aggs": {
"top_biz_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
It seems to be working.
Now I need to find the businesses that have 5 or more violations (like above) and also have 3 or more license #s.
I am not sure how to aggregate this further.
Thanks!
Let's assume that your License # field is defined just like the Biz Name and has a .keyword mapping.
Now, the statement:
find the businesses that have ... 3 or more license #s
can be rephrased as:
aggregate by the business name under the condition that the number of distinct values of the bucketed license IDs is greater than or equal to 3.
With that being said, you can use the cardinality aggregation to get distinct License IDs.
Secondly, the mechanism for "aggregating under a condition" is the handy bucket_selector aggregation, which executes a script to determine whether the currently iterated bucket will be retained in the final aggregation.
Leveraging both of these in tandem would mean:
POST your-index/_search
{
"size": 0,
"query": {
"bool": {
"must": {
"match": {
"violations": {
"query": "MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
},
"must_not": {
"match": {
"violations": {
"query": "NO MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
}
}
},
"aggs": {
"selected_bizs": {
"terms": {
"field": "Biz Name.keyword",
"min_doc_count": 5,
"size": 1000
},
"aggs": {
"top_biz_hits": {
"top_hits": {
"size": 10
}
},
"unique_license_ids": {
"cardinality": {
"field": "License #.keyword"
}
},
"must_have_min_3_License #s": {
"bucket_selector": {
"buckets_path": {
"unique_license_ids": "unique_license_ids"
},
"script": "params.unique_license_ids >= 3"
}
}
}
}
}
}
And that's all there is to it!
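To make the filtering logic concrete, here is a plain-Python emulation of what the terms, cardinality and bucket_selector combination keeps. It is only a sketch of the semantics, not how Elasticsearch executes it: the function name select_businesses and the sample documents are invented for illustration. The real cardinality aggregation is approximate, whereas the set-based count below is exact.

```python
from collections import defaultdict


def select_businesses(docs, min_docs=5, min_licenses=3):
    """Keep businesses with >= min_docs matching docs and >= min_licenses distinct licenses."""
    buckets = defaultdict(list)
    for doc in docs:
        # Equivalent of the terms aggregation on "Biz Name.keyword".
        buckets[doc["Biz Name"]].append(doc)

    selected = {}
    for name, hits in buckets.items():
        # Equivalent of the cardinality aggregation on "License #.keyword".
        unique_license_ids = {doc["License #"] for doc in hits}
        # min_doc_count check, then the bucket_selector script condition.
        if len(hits) >= min_docs and len(unique_license_ids) >= min_licenses:
            selected[name] = {
                "doc_count": len(hits),
                "unique_license_ids": len(unique_license_ids),
            }
    return selected


# Hypothetical documents that already matched the query part.
docs = (
    [{"Biz Name": "A", "License #": lic} for lic in ["L1", "L2", "L3", "L1", "L2"]]
    + [{"Biz Name": "B", "License #": "L9"} for _ in range(5)]
    + [{"Biz Name": "C", "License #": lic} for lic in ["L4", "L5", "L6"]]
)
selected = select_businesses(docs)
print(selected)
```

Business B is dropped because it has only one distinct license, and C is dropped because it has fewer than 5 matching documents.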

How to group events by multiple terms?

How can I group by year and month? My query works if I leave 1 term, for example, Month. But I cannot group by multiple terms.
GET traffic-data/_search?
{
"size":0,
"query": {
"bool": {
"must": [
{ "match": {
"VehiclePlateNumber": "111"
}}
]
} },
"aggs" : {
"years" : {
"terms" : {
"field" : "Year"
},
"aggs" : {
"months" : { "by_month" : { "field" : "Month" } }
}
}
}
}
I think the query in your question is already close. Try this:
GET traffic-data/_search?
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"VehiclePlateNumber": "111"
}
}
]
}
},
"aggs": {
"years": {
"terms": {
"field": "Year",
"size": 100
},
"aggs": {
"months": {
"terms": {
"size": 12,
"field": "Month"
}
}
}
}
}
}
Edit - I am assuming your month is a string keyword field. Let me know if this is not the case (and please include the mappings) and I will revise.
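Once the nested terms aggregation returns, each year bucket carries its own list of month buckets. A small Python sketch of flattening that into (year, month, count) rows follows. The response shape is the usual one for terms aggregations, but the sample values and the helper name flatten_year_month are made up for illustration.

```python
def flatten_year_month(response):
    """Turn nested years -> months buckets into a flat list of (year, month, doc_count)."""
    rows = []
    for year_bucket in response["aggregations"]["years"]["buckets"]:
        for month_bucket in year_bucket["months"]["buckets"]:
            rows.append((year_bucket["key"], month_bucket["key"], month_bucket["doc_count"]))
    return rows


# Hypothetical response fragment; only the parts the helper reads are included.
sample_response = {
    "aggregations": {
        "years": {
            "buckets": [
                {
                    "key": 2019,
                    "doc_count": 3,
                    "months": {
                        "buckets": [
                            {"key": "January", "doc_count": 2},
                            {"key": "March", "doc_count": 1},
                        ]
                    },
                },
                {
                    "key": 2020,
                    "doc_count": 1,
                    "months": {"buckets": [{"key": "January", "doc_count": 1}]},
                },
            ]
        }
    }
}

rows = flatten_year_month(sample_response)
for row in rows:
    print(row)
```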

Elasticsearch aggregations: get unique values with a where clause

I have a query in Elasticsearch to get the unique values of a specific field.
How can I get the unique values using a where clause,
for example where field1:xyz, field2:yzx, etc.?
{
"size": 20,
"aggs" : {
"pf" : {
"terms" : { "field" : "platform" }
}
}}
I think you are looking for aggregations with filters:
{
"size":0, // <-- If you are using ES 2.0 or above, setting size=0 will only return aggregations and no results
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [{
"term": {
"field1": "xyz"
}
}, {
"term": {
"field2": "abc"
}
}]
}
}
}
},
"aggregations": {
"pf": {
"terms": {
"field": "platform"
}
}
}
}
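The filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same where-clause is usually written as a bool query with filter clauses. Below is a minimal Python sketch that builds such a body from a dict of conditions. The helper name unique_values_query is hypothetical; the field names are the ones from this thread.

```python
import json


def unique_values_query(agg_field, where):
    """Build a terms aggregation restricted by exact-match conditions (illustrative helper)."""
    return {
        "size": 0,  # return only the aggregation, no hits
        "query": {
            "bool": {
                # Filter clauses do not affect scoring and must all match (AND).
                "filter": [{"term": {field: value}} for field, value in where.items()]
            }
        },
        "aggs": {"pf": {"terms": {"field": agg_field}}},
    }


body = unique_values_query("platform", {"field1": "xyz", "field2": "abc"})
print(json.dumps(body, indent=2))
```

The term clauses assume field1 and field2 are keyword (not analyzed) fields. For analyzed text fields a match clause would be the closer equivalent.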

Elasticsearch one record for one matching query

I have an Elasticsearch index with a large number of records. There is a username field, and I want to get the latest post of each username by passing comma-separated values, for example:
john,shahid,mike,jolie
I want the latest post for each of these usernames. How can I do this? I can do it by passing one username at a time, but that requires many HTTP requests. I want to do it in one request.
You could use a filtered terms aggregation coupled with a top_hits one in order to achieve what you need:
{
"size": 0,
"query": {
"bool": {
"filter": {
"terms": {
"username": [ "john", "shahid", "mike", "jolie" ]
}
}
}
},
"aggs": {
"usernames": {
"filter": {
"terms": {
"username": [ "john", "shahid", "mike", "jolie" ]
}
},
"aggs": {
"by_username": {
"terms": {
"field": "username"
},
"aggs": {
"top1": {
"top_hits": {
"size": 1,
"sort" : {"created_date" : "desc"}
}
}
}
}
}
}
}
}
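With the terms and top_hits aggregation above, the latest post of each user sits inside that user's bucket. Here is a short Python sketch of pulling them out of the response. The nesting follows the aggregation names used in the query (usernames, by_username, top1); the sample response values and the helper name latest_post_per_user are invented for illustration.

```python
def latest_post_per_user(response):
    """Map each username bucket to the _source of its single top hit."""
    buckets = response["aggregations"]["usernames"]["by_username"]["buckets"]
    latest = {}
    for bucket in buckets:
        hits = bucket["top1"]["hits"]["hits"]
        if hits:  # a bucket always has at least one doc, but stay defensive
            latest[bucket["key"]] = hits[0]["_source"]
    return latest


# Hypothetical response fragment with only the fields the helper reads.
sample_response = {
    "aggregations": {
        "usernames": {
            "doc_count": 3,
            "by_username": {
                "buckets": [
                    {
                        "key": "john",
                        "doc_count": 2,
                        "top1": {"hits": {"hits": [
                            {"_source": {"username": "john", "created_date": "2016-03-02"}}
                        ]}},
                    },
                    {
                        "key": "mike",
                        "doc_count": 1,
                        "top1": {"hits": {"hits": [
                            {"_source": {"username": "mike", "created_date": "2016-01-15"}}
                        ]}},
                    },
                ]
            },
        }
    }
}

latest = latest_post_per_user(sample_response)
print(latest)
```

Usernames with no posts simply do not appear among the buckets, so the caller may want to compare the result keys against the requested list.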
This query returns all the posts of these 4 usernames, sorted by post_date in descending order. You can then process that data to get the result.
{
"sort" : [
{ "post_date" : {"order" : "desc"}}
],
"query" : {
"filtered" : {
"filter" : {
"terms" : {
"username" : ["john","shahid","mike","jolie"]
}
}
}
}
}

Count how many documents have an attribute or are missing that attribute in Elasticsearch

How can I write a single Elasticsearch query that will count how many documents either have a value for a field or are missing that field?
This query successfully counts the docs missing the field:
POST localhost:9200//<index_name_here>/_search
{
"size": 0,
"aggs" : {
"Missing_Field" : {
"missing": { "field": "group_doc_groupset_id" }
}
}
}
This query does the opposite, counting documents NOT missing the field:
POST localhost:9200//<index_name_here>/_search
{
"size": 0,
"aggs" : {
"Not_Missing_Field" : {
"exists": { "field": "group_doc_groupset_id" }
}
}
}
How can I write one that combines both? For example, this yields a syntax error:
POST localhost:9200//<index_name_here>/_search
{
"size": 0,
"aggs" : {
"Missing_Field_Or_Not" : {
"missing": { "field": "group_doc_groupset_id" },
"exists": { "field": "group_doc_groupset_id" }
}
}
}
GET indexname/_search?size=0
{
"aggs": {
"a1": {
"missing": {
"field": "status"
}
},
"a2": {
"filter": {
"exists": {
"field": "status"
}
}
}
}
}
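The request above puts a missing aggregation and a filter aggregation wrapping an exists query side by side, so both counts come back in one response. The following Python sketch builds that body for any field and reads the two doc_count values. The helper names, the aggregation names missing_count and present_count, and the sample numbers are all assumptions made for illustration.

```python
def missing_and_present_body(field):
    """Request body counting docs without and with a value for `field`."""
    return {
        "size": 0,
        "aggs": {
            "missing_count": {"missing": {"field": field}},
            # exists is a query, so it has to be wrapped in a filter aggregation.
            "present_count": {"filter": {"exists": {"field": field}}},
        },
    }


def read_counts(response):
    """Return (missing, present) doc counts from the aggregation response."""
    aggs = response["aggregations"]
    return aggs["missing_count"]["doc_count"], aggs["present_count"]["doc_count"]


body = missing_and_present_body("group_doc_groupset_id")

# Hypothetical response fragment for demonstration.
sample_response = {
    "aggregations": {
        "missing_count": {"doc_count": 12},
        "present_count": {"doc_count": 88},
    }
}
missing, present = read_counts(sample_response)
print(missing, present)
```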
As per the newer recommendation in the Elasticsearch docs:
GET {your_index_name}/_search #or _count, to see just the value
{
"query": {
"bool": {
"must_not": { # here can be also "must"
"exists": {
"field": "{field_to_be_searched}"
}
}
}
}
}
Edit: _count returns the exact number of matching documents. With _search, if there are more than 10k matches, the total is shown as:
"hits" : {
"total" : {
"value" : 10000, # 10k
"relation" : "gte" # Greater than or equal to
}

Resources