Problem: This aggregation returns every tag containing 'Windows', but the match is case sensitive. How can I make it case insensitive?
GET /record_new/_search
{"size":0,
"aggs" : {
"software_tags" : {
"terms" : {
"field" : "software_tags.keyword",
"include" : ".*Windows.*",
"size" : 10000,
"order" : { "_term" : "asc" }
}
}
}
}
Mapping
{
"record_new": {
"mappings": {
"record_new": {
"software_tags": {
"full_name": "software_tags",
"mapping": {
"software_tags": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"fielddata": true
}
}
}
}
}
}
}
Response
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5706542,
"max_score": 0,
"hits": []
},
"aggregations": {
"software_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Bloc-notes (Windows)",
"doc_count": 1
},
{
"key": "Windows CE",
"doc_count": 8
},
{
"key": "Windows CE 5.0",
"doc_count": 2
},
{
"key": "Windows Calculator",
"doc_count": 33
},
{
"key": "Windows Communication Foundation",
"doc_count": 43
},
{
"key": "Windows Contacts",
"doc_count": 1
},
{
"key": "Windows DVD Maker",
"doc_count": 3
},
{
"key": "Windows Defender",
"doc_count": 409
},
{
"key": "Windows Desktop Gadgets",
"doc_count": 14
},
{
"key": "Windows Desktop Update",
"doc_count": 33
},
{
"key": "Windows Display Driver Model",
"doc_count": 64
},
{
"key": "Windows DreamScene",
"doc_count": 5
},
{
"key": "Windows Driver Frameworks",
"doc_count": 1
},
{
"key": "Windows Driver Kit",
"doc_count": 12
},
{
"key": "Windows Driver Model",
"doc_count": 99
},
{
"key": "Windows Easy Transfer",
"doc_count": 3
},
{
"key": "Windows Embedded Automotive",
"doc_count": 1
},
{
"key": "Windows Embedded CE 6.0",
"doc_count": 7
},
{
"key": "Windows Embedded Compact",
"doc_count": 361
},
{
"key": "Windows Embedded Compact 7",
"doc_count": 1
},
{
"key": "Windows Embedded Industry",
"doc_count": 2
},
{
"key": "Windows Essential Business Server 2008",
"doc_count": 2
},
{
"key": "Windows Essentials",
"doc_count": 13
},
{
"key": "Windows Filtering Platform",
"doc_count": 1
},
{
"key": "Windows Firewall",
"doc_count": 588
},
{
"key": "Windows Fundamentals for Legacy PCs",
"doc_count": 21
},
{
"key": "Windows Genuine Advantage",
"doc_count": 60
},
{
"key": "Windows Home Server",
"doc_count": 7
},
{
"key": "Windows Image Acquisition",
"doc_count": 1
},
{
"key": "Windows Insider",
"doc_count": 10
},
{
"key": "Windows Installer",
"doc_count": 562
},
{
"key": "Windows Internal Database",
"doc_count": 2
},
{
"key": "Windows IoT",
"doc_count": 132
},
{
"key": "Windows Live Mail",
"doc_count": 117
},
{
"key": "Windows Live Mesh",
"doc_count": 1
},
{
"key": "Windows Live Messenger",
"doc_count": 1595
},
{
"key": "Windows Live OneCare",
"doc_count": 18
},
{
"key": "Windows Live OneCare Safety Scanner",
"doc_count": 1
},
{
"key": "Windows Live Spaces",
"doc_count": 1
},
{
"key": "Windows Live Toolbar",
"doc_count": 4
},
{
"key": "Windows ME",
"doc_count": 1055
},
{
"key": "Windows Management Instrumentation",
"doc_count": 289
},
{
"key": "Windows Marketplace",
"doc_count": 4
},
{
"key": "Windows Media",
"doc_count": 168
},
{
"key": "Windows Mobile",
"doc_count": 439
},
{
"key": "Windows SideShow",
"doc_count": 1
},
{
"key": "Windows SteadyState",
"doc_count": 6
},
{
"key": "Центр обновления Windows",
"doc_count": 2
}
]
}
}
}
I think you are doing this completely wrong. Searching and getting unique values are different things. How about the following approach?
Note that I used slightly different settings for the aggregation (ordering by _count instead of _term) and added a query.
GET record_new/_search
{
"size": 0,
"query": {
"term": {
"software_tags": {
"value": "windows"
}
}
},
"aggs": {
"software_tags": {
"terms": {
"field": "software_tags.keyword",
"include" : ".*Windows.*",
"size": 10000,
"order": {
"_count": "desc"
}
}
}
}
}
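If include has to remain a regular expression, one workaround is to expand each letter into a character class that covers both cases, since the pattern is matched against the raw keyword values. Below is a minimal Python sketch of that idea. The helper name case_insensitive_pattern is hypothetical, and Python's re module is used only to check the generated pattern locally; for this simple subset it behaves like the anchored regex syntax that include expects.

```python
import json
import re


def case_insensitive_pattern(word):
    """Expand each letter into a two-case character class, e.g. 'w' -> '[wW]'."""
    parts = []
    for ch in word:
        if ch.isalpha() and ch.lower() != ch.upper():
            parts.append("[" + ch.lower() + ch.upper() + "]")
        else:
            # Anything that is not a cased letter is escaped and matched literally.
            parts.append(re.escape(ch))
    # Leading and trailing .* because the pattern must match the whole term.
    return ".*" + "".join(parts) + ".*"


pattern = case_insensitive_pattern("windows")

# Request body with the same shape as the aggregation in the question.
body = {
    "size": 0,
    "aggs": {
        "software_tags": {
            "terms": {
                "field": "software_tags.keyword",
                "include": pattern,
                "size": 10000,
            }
        }
    },
}

# Local sanity check against a few sample keys (the last three are made up).
sample_keys = ["Windows CE", "Bloc-notes (Windows)", "WINDOWS NT", "windows 10", "Linux"]
matched = [k for k in sample_keys if re.fullmatch(pattern, k)]

print(json.dumps(body, indent=2))
print(matched)
```

If reindexing is an option, a lowercase normalizer on the keyword sub-field (in Elasticsearch versions that support normalizers) would avoid the expanded regex altogether; whether that applies depends on the cluster version.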
Related
I am indexing some events and trying to get the unique hours, but the terms aggregation returns a strange response. I have the following query:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"terms": {
"City": [
"Chicago"
]
}
},
{
"range": {
"eventDate": {
"gte": "2018-06-22",
"lte": "2018-06-22"
}
}
}
]
}
},
"aggs": {
"Hours": {
"terms": {
"script": "doc['eventDate'].date.getHourOfDay()"
}
}
}
}
This query produces the following response:
"buckets": [
{
"key": "19",
"doc_count": 12
},
{
"key": "9",
"doc_count": 7
},
{
"key": "15",
"doc_count": 4
},
{
"key": "16",
"doc_count": 4
},
{
"key": "20",
"doc_count": 4
},
{
"key": "12",
"doc_count": 2
},
{
"key": "6",
"doc_count": 2
},
{
"key": "8",
"doc_count": 2
},
{
"key": "10",
"doc_count": 1
},
{
"key": "11",
"doc_count": 1
}
]
I then changed the range to get the events for the past month:
{
"range": {
"eventDate": {
"gte": "2018-05-22",
"lte": "2018-06-22"
}
}
}
and the response I got was
"Hours": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 1319,
"buckets": [
{
"key": "22",
"doc_count": 805
},
{
"key": "14",
"doc_count": 370
},
{
"key": "15",
"doc_count": 250
},
{
"key": "21",
"doc_count": 248
},
{
"key": "16",
"doc_count": 195
},
{
"key": "0",
"doc_count": 191
},
{
"key": "13",
"doc_count": 176
},
{
"key": "3",
"doc_count": 168
},
{
"key": "20",
"doc_count": 159
},
{
"key": "11",
"doc_count": 148
}
]
}
As you can see, the first response contains buckets with keys 6, 8, 9, 10 and 12, but the second one does not. This is very strange, because the documents matched by the first query are a small subset of those matched by the second. Is this a bug, or am I missing something obvious?
Thanks
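A likely explanation, offered as a sketch rather than a confirmed diagnosis: a terms aggregation returns only the top buckets by doc_count (10 by default) unless size is set, and the non-zero sum_other_doc_count (1319) in the second response indicates that more buckets exist but were cut off. Hours 6, 8, 9, 10 and 12 would then still be present in the month-long data, just outside the top 10. The Python sketch below mimics that truncation. The first ten counts come from the response above; the remaining fourteen are invented so that they add up to 1319.

```python
from collections import Counter

# Hour -> doc_count for the month-long range.
month_counts = Counter({
    # Taken from the second response above.
    22: 805, 14: 370, 15: 250, 21: 248, 16: 195,
    0: 191, 13: 176, 3: 168, 20: 159, 11: 148,
    # Invented tail (assumption), chosen to total 1319.
    19: 140, 9: 135, 12: 130, 10: 125, 8: 120, 6: 115, 1: 110,
    2: 105, 4: 100, 5: 100, 7: 80, 17: 40, 18: 10, 23: 9,
})


def terms_agg(counts, size=10):
    """Mimic a terms aggregation: top `size` buckets by doc_count, descending."""
    top = counts.most_common(size)
    other = sum(counts.values()) - sum(c for _, c in top)
    return {
        "buckets": [{"key": str(k), "doc_count": c} for k, c in top],
        "sum_other_doc_count": other,
    }


default = terms_agg(month_counts)         # size left at 10
full = terms_agg(month_counts, size=24)   # one bucket per possible hour

print(default["sum_other_doc_count"])
print(sorted(int(b["key"]) for b in full["buckets"]))

# The corresponding change to the request would be an explicit, larger size.
hours_agg = {"terms": {"script": "doc['eventDate'].date.getHourOfDay()", "size": 24}}
```

With size set to 24 every hour gets its own bucket and sum_other_doc_count drops to 0, which is a quick way to confirm whether truncation is the cause.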
I want to apply a filter after an aggregation. For example, from the aggregation below I want only the buckets whose key contains 'windows'.
Note: we do not want to use include, because it relies on a regular expression, which is slow and cannot ignore case.
Query:
GET /record_new/_search
{"size":0, "aggs" : {
"software_tags" : {
"terms" : {
"field" : "software_tags.keyword",
"size" : 100
}
}
}
}
Response:
{
"took": 77,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5706542,
"max_score": 0,
"hits": []
},
"aggregations": {
"software_tags": {
"doc_count_error_upper_bound": 5514,
"sum_other_doc_count": 581800,
"buckets": [
{
"key": "Microsoft Windows",
"doc_count": 70641
},
{
"key": "Bitcoin",
"doc_count": 35423
},
{
"key": "Linux",
"doc_count": 33230
},
{
"key": "ICQ",
"doc_count": 21934
},
{
"key": "PHP",
"doc_count": 20562
},
{
"key": "Windows XP",
"doc_count": 19720
},
{
"key": "Android (operating system)",
"doc_count": 17774
},
{
"key": "C++",
"doc_count": 14792
},
{
"key": "Pretty Good Privacy",
"doc_count": 14307
},
{
"key": "Tor (anonymity network)",
"doc_count": 14110
}
]
}
}
}
I also tried a filter, but the output is incorrect: it includes Linux as well. I don't know what is happening here.
GET /record_new/_search
{"size":0, "query": {
"constant_score": {
"filter":
{ "term": { "software_tags": "windows" }}
}
}, "aggs" : {
"software_tags" : {
"terms" : {
"field" : "software_tags.keyword",
"size" : 10
}
}
}
}
Output:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 93181,
"max_score": 0,
"hits": []
},
"aggregations": {
"software_tags": {
"doc_count_error_upper_bound": 1640,
"sum_other_doc_count": 171831,
"buckets": [
{
"key": "Microsoft Windows",
"doc_count": 70641
},
{
"key": "Windows XP",
"doc_count": 19720
},
{
"key": "Windows 7",
"doc_count": 12692
},
{
"key": "Linux",
"doc_count": 12311
},
{
"key": "Windows Vista",
"doc_count": 10172
},
{
"key": "Windows NT",
"doc_count": 5417
},
{
"key": "Windows Registry",
"doc_count": 5055
},
{
"key": "Windows 8",
"doc_count": 4829
},
{
"key": "Windows 2000",
"doc_count": 4738
},
{
"key": "Windows 10",
"doc_count": 4611
}
]
}
}
}
Try this query; it should match records with windows in software_tags:
{
"size":0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "software_tags: *windows* AND NOT *linux* AND NOT *<next OS name to exclude>*",
"analyze_wildcard": true
}
}
]
}
}, "aggs" : {
"software_tags" : {
"terms" : {
"field" : "software_tags.keyword",
"size" : 10
}
}
}
}
It might be a bit slower than the usual queries because of the wildcard characters in the query.
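Why Linux shows up in the filtered output can be illustrated with a small simulation. The query selects documents, and the terms aggregation then counts every tag on those documents, so a document tagged with both a Windows tag and Linux still contributes to the Linux bucket. The sample documents and the helper names below are hypothetical, and the substring test only approximates the term query. The sketch mirrors the behaviour locally and ends with a case-insensitive filter applied to the bucket keys after aggregation.

```python
from collections import Counter

# Hypothetical documents (assumption: software_tags holds several tags per document).
docs = [
    {"software_tags": ["Microsoft Windows", "Linux"]},
    {"software_tags": ["Windows XP"]},
    {"software_tags": ["Linux", "PHP"]},
    {"software_tags": ["Microsoft Windows", "Windows 7"]},
]


def matches_query(doc, word):
    """Document-level filter: true if any tag contains the word, ignoring case."""
    return any(word.lower() in tag.lower() for tag in doc["software_tags"])


def terms_agg(documents):
    """Count every tag of every selected document, like a terms aggregation."""
    return Counter(tag for d in documents for tag in d["software_tags"])


def filter_buckets(buckets, word):
    """Keep only buckets whose key contains the word, ignoring case (client-side)."""
    return {k: v for k, v in buckets.items() if word.lower() in k.lower()}


selected = [d for d in docs if matches_query(d, "windows")]
buckets = terms_agg(selected)

print(dict(buckets))                       # still contains a Linux bucket
print(filter_buckets(buckets, "windows"))  # Linux removed after aggregation
```

Excluding documents that mention Linux, as the query above does, removes the Linux bucket but also changes the counts of the Windows buckets those documents contributed to; filtering the bucket keys leaves the counts untouched.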
In the query below, occasionally I receive a "NaN" response (see the response below the query).
I'm assuming that some invalid data occasionally gets into the "amount" field (the one being aggregated). If that is a valid assumption, how can I find the documents with invalid "amount" values so I can troubleshoot them?
If that's not a valid assumption, how do I troubleshoot the occasional "NaN" value being returned?
REQUEST:
POST /_msearch
{
"search_type": "query_then_fetch",
"ignore_unavailable": true,
"index": [
"view-2017-10-22",
"view-2017-10-23"
]
}
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"handling-time": {
"gte": "1508706273585",
"lte": "1508792673586",
"format": "epoch_millis"
}
}
},
{
"query_string": {
"analyze_wildcard": true,
"query": "+page:\"checkout order confirmation\" +pageType:\"d\""
}
}
]
}
},
"aggs": {
"2": {
"date_histogram": {
"interval": "1h",
"field": "time",
"min_doc_count": 0,
"extended_bounds": {
"min": "1508706273585",
"max": "1508792673586"
},
"format": "epoch_millis"
},
"aggs": {
"1": {
"sum": {
"field": "amount"
}
}
}
}
}
}
RESPONSE:
{
"responses": [
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"hits": {
"total": 44587,
"max_score": 0,
"hits": []
},
"aggregations": {
"2": {
"buckets": [
{
"1": {
"value": "NaN"
},
"key_as_string": "1508706000000",
"key": 1508706000000,
"doc_count": 2915
},
{
"1": {
"value": 300203.74
},
"key_as_string": "1508709600000",
"key": 1508709600000,
"doc_count": 2851
},
{
"1": {
"value": 348139.5600000001
},
"key_as_string": "1508713200000",
"key": 1508713200000,
"doc_count": 3197
},
{
"1": {
"value": "NaN"
},
"key_as_string": "1508716800000",
"key": 1508716800000,
"doc_count": 3449
},
{
"1": {
"value": "NaN"
},
"key_as_string": "1508720400000",
"key": 1508720400000,
"doc_count": 3482
},
{
"1": {
"value": 364449.60999999987
},
"key_as_string": "1508724000000",
"key": 1508724000000,
"doc_count": 3103
},
{
"1": {
"value": 334914.68
},
"key_as_string": "1508727600000",
"key": 1508727600000,
"doc_count": 2722
},
{
"1": {
"value": 315368.09000000014
},
"key_as_string": "1508731200000",
"key": 1508731200000,
"doc_count": 2161
},
{
"1": {
"value": 102244.34
},
"key_as_string": "1508734800000",
"key": 1508734800000,
"doc_count": 742
},
{
"1": {
"value": 37178.63
},
"key_as_string": "1508738400000",
"key": 1508738400000,
"doc_count": 333
},
{
"1": {
"value": 25345.68
},
"key_as_string": "1508742000000",
"key": 1508742000000,
"doc_count": 233
},
{
"1": {
"value": 85454.47000000002
},
"key_as_string": "1508745600000",
"key": 1508745600000,
"doc_count": 477
},
{
"1": {
"value": 24102.719999999994
},
"key_as_string": "1508749200000",
"key": 1508749200000,
"doc_count": 195
},
{
"1": {
"value": 23352.309999999994
},
"key_as_string": "1508752800000",
"key": 1508752800000,
"doc_count": 294
},
{
"1": {
"value": 44353.409999999996
},
"key_as_string": "1508756400000",
"key": 1508756400000,
"doc_count": 450
},
{
"1": {
"value": 80129.89999999998
},
"key_as_string": "1508760000000",
"key": 1508760000000,
"doc_count": 867
},
{
"1": {
"value": 122797.11
},
"key_as_string": "1508763600000",
"key": 1508763600000,
"doc_count": 1330
},
{
"1": {
"value": 157442.29000000004
},
"key_as_string": "1508767200000",
"key": 1508767200000,
"doc_count": 1872
},
{
"1": {
"value": 198831.71
},
"key_as_string": "1508770800000",
"key": 1508770800000,
"doc_count": 2251
},
{
"1": {
"value": 218384.08000000002
},
"key_as_string": "1508774400000",
"key": 1508774400000,
"doc_count": 2305
},
{
"1": {
"value": 229829.22000000006
},
"key_as_string": "1508778000000",
"key": 1508778000000,
"doc_count": 2381
},
{
"1": {
"value": 217157.56000000006
},
"key_as_string": "1508781600000",
"key": 1508781600000,
"doc_count": 2433
},
{
"1": {
"value": 208877.13
},
"key_as_string": "1508785200000",
"key": 1508785200000,
"doc_count": 2223
},
{
"1": {
"value": "NaN"
},
"key_as_string": "1508788800000",
"key": 1508788800000,
"doc_count": 2166
},
{
"1": {
"value": 18268.14
},
"key_as_string": "1508792400000",
"key": 1508792400000,
"doc_count": 155
}
]
}
},
"status": 200
}
]
}
You can do a search for <fieldName>:NaN (on numeric fields) to find numbers that are set to NaN.
Obviously, once you find those, you can either fix the root cause of the field being set to NaN, or you can exclude those records from the aggregation by adding a -<fieldName>:NaN to the query.
(It turns out that the input was feeding in some garbage characters once in every few million documents.)
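The effect is easy to reproduce outside Elasticsearch: a single NaN among the summed values turns the whole sum into NaN, which is consistent with only some hourly buckets being affected. The Python sketch below uses made-up amounts, and the hypothetical helper nan_queries merely builds the two query strings mentioned above for a given field name.

```python
import math

# Made-up amounts for two hourly buckets; one value in bucket_a is NaN.
bucket_a = [120.50, 99.99, float("nan"), 45.00]
bucket_b = [120.50, 99.99, 45.00]

sum_a = sum(bucket_a)   # NaN propagates through addition
sum_b = sum(bucket_b)

# Client-side equivalent of excluding the bad records before summing.
clean_sum_a = sum(v for v in bucket_a if not math.isnan(v))


def nan_queries(field):
    """Build the query strings from the answer: find NaN values, or exclude them."""
    return {"find": f"{field}:NaN", "exclude": f"-{field}:NaN"}


print(math.isnan(sum_a), sum_b, clean_sum_a)
print(nan_queries("amount"))
```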
I have this query that calculates the number of events per bucket. How can I calculate the total number of buckets that have a value greater than 0?
GET myindex/_search?
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"PlateNumber": "111"
}
}
]
}
},
"aggs": {
"daily_intensity": {
"date_histogram": {
"field": "Datetime",
"interval": "day"
},
"aggs": {
"count_of_events": {
"value_count": {
"field": "Monthday"
}
}
}
}
}
}
This is the output I get. The answer I expect is 27, because 27 of the buckets have a value greater than 0. I do not need the buckets themselves, only this total.
{
"took": 237,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 98,
"max_score": 0,
"hits": []
},
"aggregations": {
"daily_intensity": {
"buckets": [
{
"key_as_string": "2017-05-01T00:00:00.000Z",
"key": 1493596800000,
"doc_count": 3,
"count_of_events": {
"value": 3
}
},
{
"key_as_string": "2017-05-02T00:00:00.000Z",
"key": 1493683200000,
"doc_count": 1,
"count_of_events": {
"value": 1
}
},
{
"key_as_string": "2017-05-03T00:00:00.000Z",
"key": 1493769600000,
"doc_count": 4,
"count_of_events": {
"value": 4
}
},
{
"key_as_string": "2017-05-04T00:00:00.000Z",
"key": 1493856000000,
"doc_count": 6,
"count_of_events": {
"value": 6
}
},
{
"key_as_string": "2017-05-05T00:00:00.000Z",
"key": 1493942400000,
"doc_count": 0,
"count_of_events": {
"value": 0
}
},
{
"key_as_string": "2017-05-06T00:00:00.000Z",
"key": 1494028800000,
"doc_count": 1,
"count_of_events": {
"value": 1
}
},
{
"key_as_string": "2017-05-07T00:00:00.000Z",
"key": 1494115200000,
"doc_count": 5,
"count_of_events": {
"value": 5
}
},
{
"key_as_string": "2017-05-08T00:00:00.000Z",
"key": 1494201600000,
"doc_count": 6,
"count_of_events": {
"value": 6
}
},
{
"key_as_string": "2017-05-09T00:00:00.000Z",
"key": 1494288000000,
"doc_count": 2,
"count_of_events": {
"value": 2
}
},
{
"key_as_string": "2017-05-10T00:00:00.000Z",
"key": 1494374400000,
"doc_count": 3,
"count_of_events": {
"value": 3
}
},
{
"key_as_string": "2017-05-11T00:00:00.000Z",
"key": 1494460800000,
"doc_count": 0,
"count_of_events": {
"value": 0
}
},
{
"key_as_string": "2017-05-12T00:00:00.000Z",
"key": 1494547200000,
"doc_count": 3,
"count_of_events": {
"value": 3
}
},
{
"key_as_string": "2017-05-13T00:00:00.000Z",
"key": 1494633600000,
"doc_count": 0,
"count_of_events": {
"value": 0
}
},
{
"key_as_string": "2017-05-14T00:00:00.000Z",
"key": 1494720000000,
"doc_count": 1,
"count_of_events": {
"value": 1
}
},
{
"key_as_string": "2017-05-15T00:00:00.000Z",
"key": 1494806400000,
"doc_count": 3,
"count_of_events": {
"value": 3
}
},
{
"key_as_string": "2017-05-16T00:00:00.000Z",
"key": 1494892800000,
"doc_count": 0,
"count_of_events": {
"value": 0
}
},
{
"key_as_string": "2017-05-17T00:00:00.000Z",
"key": 1494979200000,
"doc_count": 1,
"count_of_events": {
"value": 1
}
},
{
"key_as_string": "2017-05-18T00:00:00.000Z",
"key": 1495065600000,
"doc_count": 3,
"count_of_events": {
"value": 3
}
},
{
"key_as_string": "2017-05-19T00:00:00.000Z",
"key": 1495152000000,
"doc_count": 2,
"count_of_events": {
"value": 2
}
},
{
"key_as_string": "2017-05-20T00:00:00.000Z",
"key": 1495238400000,
"doc_count": 1,
"count_of_events": {
"value": 1
}
},
{
"key_as_string": "2017-05-21T00:00:00.000Z",
"key": 1495324800000,
"doc_count": 1,
"count_of_events": {
"value": 1
}
},
{
"key_as_string": "2017-05-22T00:00:00.000Z",
"key": 1495411200000,
"doc_count": 5,
"count_of_events": {
"value": 5
}
},
{
"key_as_string": "2017-05-23T00:00:00.000Z",
"key": 1495497600000,
"doc_count": 16,
"count_of_events": {
"value": 16
}
},
{
"key_as_string": "2017-05-24T00:00:00.000Z",
"key": 1495584000000,
"doc_count": 4,
"count_of_events": {
"value": 4
}
},
{
"key_as_string": "2017-05-25T00:00:00.000Z",
"key": 1495670400000,
"doc_count": 6,
"count_of_events": {
"value": 6
}
},
{
"key_as_string": "2017-05-26T00:00:00.000Z",
"key": 1495756800000,
"doc_count": 1,
"count_of_events": {
"value": 1
}
},
{
"key_as_string": "2017-05-27T00:00:00.000Z",
"key": 1495843200000,
"doc_count": 5,
"count_of_events": {
"value": 5
}
},
{
"key_as_string": "2017-05-28T00:00:00.000Z",
"key": 1495929600000,
"doc_count": 4,
"count_of_events": {
"value": 4
}
},
{
"key_as_string": "2017-05-29T00:00:00.000Z",
"key": 1496016000000,
"doc_count": 5,
"count_of_events": {
"value": 5
}
},
{
"key_as_string": "2017-05-30T00:00:00.000Z",
"key": 1496102400000,
"doc_count": 2,
"count_of_events": {
"value": 2
}
},
{
"key_as_string": "2017-05-31T00:00:00.000Z",
"key": 1496188800000,
"doc_count": 4,
"count_of_events": {
"value": 4
}
}
]
}
}
}
You can use a Bucket Script aggregation together with a Sum Bucket aggregation to achieve this. Try the query below.
GET myindex/_search?
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"PlateNumber": "111"
}
}
]
}
},
"aggs": {
"daily_intensity": {
"date_histogram": {
"field": "Datetime",
"interval": "day"
},
"aggs": {
"count_of_events": {
"value_count": {
"field": "Monthday"
}
},
"check": {
"bucket_script": {
"buckets_path": {
"count": "count_of_events"
},
"script": "return (params.count > 0 ? 1 : 0)"
}
}
}
},
"bucket_count": {
"sum_bucket": {
"buckets_path": "daily_intensity>check"
}
}
}
}
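To sanity-check the number returned in bucket_count, the same calculation can be done client-side on the response. This is a minimal Python sketch, assuming the response has the shape shown in the question. The daily_counts list is copied from the count_of_events values above, and count_non_empty is a hypothetical helper. Of the 31 daily buckets listed, four have a count of 0 (5, 11, 13 and 16 May), so the result for that output is 27.

```python
# count_of_events values for 2017-05-01 .. 2017-05-31, copied from the response.
daily_counts = [3, 1, 4, 6, 0, 1, 5, 6, 2, 3, 0, 3, 0, 1, 3, 0,
                1, 3, 2, 1, 1, 5, 16, 4, 6, 1, 5, 4, 5, 2, 4]

# Rebuild the relevant part of the response structure.
response = {
    "aggregations": {
        "daily_intensity": {
            "buckets": [{"count_of_events": {"value": c}} for c in daily_counts]
        }
    }
}


def count_non_empty(resp):
    """Mirror bucket_script (1 if count > 0, else 0) followed by sum_bucket."""
    buckets = resp["aggregations"]["daily_intensity"]["buckets"]
    checks = [1 if b["count_of_events"]["value"] > 0 else 0 for b in buckets]
    return sum(checks)


print(count_non_empty(response))
```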
I have a problem with a terms aggregation (Elasticsearch 1.7.0). All float values are stored correctly in the database and in the Elasticsearch bulk file, but when I request the data I get values like "2.99000000954".
The strange thing is that when I send a request with "2.99000000954", the engine finds the correct Article for it: the one with the value "2.99".
Please have a look at the mapping, the bulk file and the curl request below:
Mapping (from _plugin/head)
"pvi": {
"include_in_all": false,
"type": "float",
"fields": {
"raw": {
"type": "float"
},
"sort": {
"type": "float"
}
}
}
elastic_bulk_Article_en_XXX.json0
{
"pvi": [
"2.99"
]
}
The curl call
curl -XGET 'http://elasticsearch:9200/entrepriseName_search_index_fr_fr/Article/_search' -d '{"query":{"filtered":{"query":{"match_all":{}},"filter":[]}},"aggs":{"pvi":{"filter":{"query":{"query_string":{"query":"*","fuzzy_prefix_length":1,"fields":["pvi"]}}},"aggs":{"pvi":{"terms":{"field":"pvi","size":25,"order":{"_term":"asc"}}}}}},"size":0}'
The curl call results
{
"aggregations": {
"pvi": {
"doc_count": 1007,
"pvi": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1
},
{
"key": 2.99000000954,
"doc_count": 1
},
{
"key": 3.99000000954,
"doc_count": 6
},
{
"key": 4.98999977112,
"doc_count": 33
},
{
"key": 5.98999977112,
"doc_count": 46
},
{
"key": 6.98999977112,
"doc_count": 11
},
{
"key": 7.98999977112,
"doc_count": 69
},
{
"key": 9.98999977112,
"doc_count": 78
},
{
"key": 12.9899997711,
"doc_count": 107
},
{
"key": 15.9899997711,
"doc_count": 135
},
{
"key": 17.9899997711,
"doc_count": 60
},
{
"key": 19.9899997711,
"doc_count": 158
},
{
"key": 22.9899997711,
"doc_count": 17
},
{
"key": 25.9899997711,
"doc_count": 143
},
{
"key": 27.9899997711,
"doc_count": 2
},
{
"key": 29.9899997711,
"doc_count": 70
},
{
"key": 35.9900016785,
"doc_count": 25
},
{
"key": 39,
"doc_count": 1
},
{
"key": 39.9900016785,
"doc_count": 28
},
{
"key": 49.9900016785,
"doc_count": 12
},
{
"key": 59.9900016785,
"doc_count": 3
},
{
"key": 69.9899978638,
"doc_count": 1
}
]
}
}
},
"query": null,
"checked": "{}"
}
I've found the solution: I changed the datatype from float to long, and everything works!
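The odd-looking keys are what single-precision storage produces. A float field holds 32-bit values, and 2.99 has no exact 32-bit representation, so the nearest representable number becomes visible once it is widened to a double for the aggregation key. The Python sketch below reproduces three of the keys from the response using only the standard library. Note that a long mapping holds whole numbers only, so that fix suits prices kept in integer units such as cents; this is an assumption, since the post does not say how the values were converted, and to_cents is a hypothetical helper.

```python
import struct


def as_float32(value):
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


for price in (2.99, 4.99, 35.99):
    stored = as_float32(price)
    # 12 significant digits, the same precision as the keys in the response.
    print(price, "->", f"{stored:.12g}")


def to_cents(price_text):
    """Convert a non-negative decimal string such as '2.99' to integer cents."""
    units, _, fraction = price_text.partition(".")
    # Pad or cut the fractional part to exactly two digits.
    return int(units) * 100 + int((fraction + "00")[:2])


print(to_cents("2.99"))
```

Integer cents compare and aggregate exactly, at the cost of converting back to a decimal amount when displaying the value.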