New Elasticsearch user here, and I'm having an issue with a terms aggregation.
I have indexed 187 documents with fields like "name", "host", "risk", etc.
The field risk has 5 unique values ("Critical", "High", "Medium", "Low", "Informational").
I am running a terms aggregation like this:
POST http://localhost:9200/{index_name}/_search?size=0
{
"aggs":{
"riskCount":{
"terms":{
"field":"risk.keyword"
}
}
}
}
I was expecting a result stating that I have x of Critical, x of High, etc.
The thing is, I get no buckets returned:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 187,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"riskCount": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}
My Elasticsearch version is 7.12.0. Any ideas?
Edit:
So, here's the mapping:
"findings": {
"mappings": {
"properties": {
"date_uploaded": {
"type": "date"
},
"host": {
"type": "text"
},
"name": {
"type": "text"
},
"risk": {
"type": "text"
}
}
}
}
And here's the document:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 187,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "findings",
"_type": "_doc",
"_id": "f86b6b5b-f09e-4350-9a66-d88a3a78f640",
"_score": 1.0,
"_source": {
"risk": "Informational",
"name": "HTTP Server Type and Version",
"host": "10.10.9.10",
"date_uploaded": "2021-05-07T19:39:10.810663+00:00"
}
}
]
}
}
Since the risk field is of text type and has no keyword sub-field, you need to update your index mapping as follows:
PUT /findings/_mapping
{
"properties": {
"risk": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
Then run the update_by_query API (POST /findings/_update_by_query) so that the existing documents are reindexed and populate the new sub-field.
You don't have any risk.keyword field in your mapping. You need to change your mapping as follows. Just run the following command to update your mapping and create the risk.keyword sub-field:
PUT index-name/_mapping
{
"properties": {
"date_uploaded": {
"type": "date"
},
"host": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"risk": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
Then reindex your data using this command:
POST index-name/_update_by_query
And then your query can be run like this:
{
"aggs":{
"riskCount":{
"terms":{
"field":"risk.keyword"
}
}
}
}
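Both answers come down to the same point: the aggregation targets risk.keyword, and that path does not exist in the original mapping, so there is nothing to bucket. A minimal Python sketch of that check follows. The helper field_type is hypothetical (it is not part of any Elasticsearch client) and only walks "properties" and "fields" entries, which is enough for the mappings shown here.

```python
# Mapping copied from the question: "risk" is a plain text field.
original_mapping = {
    "properties": {
        "date_uploaded": {"type": "date"},
        "host": {"type": "text"},
        "name": {"type": "text"},
        "risk": {"type": "text"},
    }
}

# Mapping after the suggested update: "risk" gains a keyword sub-field.
updated_mapping = {
    "properties": {
        "risk": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    }
}


def field_type(mapping, path):
    """Return the mapped type for a dotted field path, or None if unmapped.

    Hypothetical helper: it follows "properties" (object fields) and
    "fields" (multi-fields) only.
    """
    node = mapping
    for part in path.split("."):
        children = {**node.get("properties", {}), **node.get("fields", {})}
        if part not in children:
            return None
        node = children[part]
    return node.get("type")


print(field_type(original_mapping, "risk.keyword"))  # unmapped before the update
print(field_type(updated_mapping, "risk.keyword"))   # mapped after the update
```

Against a live cluster, the equivalent check would be to read GET /findings/_mapping and look for a "fields" entry under "risk".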
I created an index, tr_logintracker, in Elasticsearch using the mapping below. We are using Elasticsearch version 7.17.
We also tried the integer data type instead of long.
{
"mappings": {
"properties": {
"logintime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"logouttime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"logout": {
"type": "long"
},
"vehicleid": {
"type": "long"
},
"driverid": {
"type": "long"
},
"vehicleownerid": {
"type": "long"
}
}
}
}
The index is created:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "tr_logintracker"
}
A document is inserted into the index, as can be seen using:
{
"query": {
"match_all" : {}
}
}
Response
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "tr_logintracker",
"_type": "_doc",
"_id": "6pJHe4QBPiDyvh1VwkiC",
"_score": 1.0,
"_source": {
"data": {
"vehicleownerid": 17,
"driverid": 21,
"vehicleid": 20,
"logintime": "2022-11-15 18:03:29",
"logout": 0
}
}
}
]
}
}
But when the same document is queried, no results are returned.
Query
{
"query": {
"bool": {
"must": [
{ "match": { "driverid" : 21 }}
]
}
}
}
Response
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
Checking /tr_logintracker/_mapping shows the mapping below, which does not look correct. The second set of entries (the ones under data) appears when we insert the document into the index.
{
"tr_logintracker": {
"mappings": {
"properties": {
"data": {
"properties": {
"driverid": {
"type": "long"
},
"logintime": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"logout": {
"type": "long"
},
"vehicleid": {
"type": "long"
},
"vehicleownerid": {
"type": "long"
}
}
},
"driverid": {
"type": "long"
},
"logintime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"logout": {
"type": "long"
},
"logouttime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"vehicleid": {
"type": "long"
},
"vehicleownerid": {
"type": "long"
}
}
}
}
}
We also tried dynamic mapping, letting the program that inserts the documents create the index. Even in that case, the query does not return the result.
TL;DR
It seems you have two entries for driverid,
data.driverid
driverid
From the document you showed the first time, you have data.driverid: 21.
But you seem to query driverid instead.
Solution
This should work.
{
"query": {
"bool": {
"must": [
{ "match": { "data.driverid" : 21 }}
]
}
}
}
Root Cause
Most likely, you are sending a document like the one below to Elasticsearch:
{
"data": {
"vehicleownerid": 17,
"driverid": 21,
"vehicleid": 20,
"logintime": "2022-11-15 18:03:29",
"logout": 0
}
}
Whereas you should be sending it like this:
{
"vehicleownerid": 17,
"driverid": 21,
"vehicleid": 20,
"logintime": "2022-11-15 18:03:29",
"logout": 0
}
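To make the difference between the two payloads concrete, the sketch below flattens each one into the dotted field paths that a query has to use. It is plain Python with no Elasticsearch client involved, and flatten is a hypothetical helper written for this illustration.

```python
def flatten(source, prefix=""):
    """Flatten a nested document into {dotted_path: value} pairs."""
    flat = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into objects, extending the dotted path.
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


# Payload as it was actually indexed (wrapped in "data").
wrapped = {
    "data": {
        "vehicleownerid": 17,
        "driverid": 21,
        "vehicleid": 20,
        "logintime": "2022-11-15 18:03:29",
        "logout": 0,
    }
}
# Payload that would line up with the explicit mapping.
unwrapped = wrapped["data"]

wrapped_fields = flatten(wrapped)
unwrapped_fields = flatten(unwrapped)

print(sorted(wrapped_fields))    # every path starts with "data."
print(sorted(unwrapped_fields))  # top-level paths such as "driverid"
```

With the wrapped payload only data.driverid exists, so a match on driverid has nothing to hit. It also explains why data.logintime was dynamically mapped as text: the explicit date mapping applies to the top-level logintime only.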
From the match_all result, I can see that what you want to query stays inside the data object. So, instead of matching driverid, it should be data.driverid.
Your query then looks like this:
{
"query": {
"bool": {
"must": [
{ "match": { "data.driverid" : 21 }}
]
}
}
}
I have the following index template
{
"index_patterns": "notificationtiles*",
"order": 1,
"version": 1,
"aliases": {
"notificationtiles": {}
},
"settings": {
"number_of_shards": 5,
"analysis": {
"normalizer": {
"lowercase_normalizer": {
"type": "custom",
"char_filter": [],
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"dynamic": "false",
"properties": {
"id": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"influencerId": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"friendId": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"message": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"type": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"sponsorshipCharityId": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"createdTimestampEpochInMilliseconds": {
"type": "date",
"format": "epoch_millis",
"index": false
},
"updatedTimestampEpochInMilliseconds": {
"type": "date",
"format": "epoch_millis",
"index": false
},
"createdDate": {
"type": "date"
},
"updatedDate": {
"type": "date"
}
}
}
}
with the following query
{
"query": {
"bool": {
"must": [
{
"match": {
"influencerId": "52407710-f7be-49c1-bc15-6d52363144a6"
}
},
{
"match": {
"type": "friend_completed_sponsorship"
}
}
]
}
},
"size": 0,
"aggs": {
"friendId": {
"terms": {
"field": "friendId",
"size": 2
},
"aggs": {
"latest": {
"top_hits": {
"sort": [
{
"createdDate": {
"order": "desc"
}
}
],
"_source": {
"includes": [
"sponsorshipCharityId",
"message",
"createdDate"
]
},
"size": 1
}
}
}
}
}
}
which returns
{
"took": 72,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 12,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"friendId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 7,
"buckets": [
{
"key": "cf750fd8-998f-4dcd-9c88-56b2b6d6fce9",
"doc_count": 3,
"latest": {
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "notificationtiles-1",
"_type": "_doc",
"_id": "416a8e07-fd72-46d4-ade1-b9442ef46978",
"_score": null,
"_source": {
"createdDate": "2020-06-24T17:35:17.816842Z",
"sponsorshipCharityId": "336de13c-f522-4796-9218-f373ff0b4373",
"message": "Contact Test 788826 Completed Sponsorship!"
},
"sort": [
1593020117816
]
}
]
}
}
},
{
"key": "93ab55c5-795f-44b0-900c-912e3e186da0",
"doc_count": 2,
"latest": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "notificationtiles-1",
"_type": "_doc",
"_id": "66913b8f-94fe-49fd-9483-f332329b80dd",
"_score": null,
"_source": {
"createdDate": "2020-06-24T17:57:17.816842Z",
"sponsorshipCharityId": "dbad136c-5002-4470-b85d-e5ba1eff515b",
"message": "Contact Test 788826 Completed Sponsorship!"
},
"sort": [
1593021437816
]
}
]
}
}
}
]
}
}
}
However, I'd like the results to include the latest documents (ordered by createdDate desc), for example the following document
{
"_index": "notificationtiles-1",
"_type": "_doc",
"_id": "68a2a0a8-27aa-4347-8751-d7afccfa797d",
"_score": 1.0,
"_source": {
"id": "68a2a0a8-27aa-4347-8751-d7afccfa797d",
"influencerId": "52407710-f7be-49c1-bc15-6d52363144a6",
"friendId": "af342805-1990-4794-9d67-3bb2dd1e36dc",
"message": "Contact Test 788826 Completed Sponsorship!",
"type": "friend_completed_sponsorship",
"sponsorshipCharityId": "b2db72e6-a70e-414a-bf8b-558e6314e7ec",
"createdDate": "2020-06-25T17:35:17.816842Z",
"updatedDate": "2020-06-25T17:35:17.816876Z",
"createdTimestampEpochInMilliseconds": 1593021437817,
"updatedTimestampEpochInMilliseconds": 1593021437817
}
}
I need to get the 2 latest documents grouped by friendId, with the latest document per friendId. The part that groups by friendId and returns the latest document per friendId works fine. However, I'm unable to sort the index by createdDate desc before the aggregation happens.
Essentially, I'd like to sort the index by createdDate desc before the aggregation takes place. I don't want a parent aggregation on createdDate, since that wouldn't result in unique friendId values. How can that be achieved?
It looks like you need to set the order property of your terms aggregation. By default, buckets are ordered by document count (doc_count), descending. You want them ordered by the most recent createdDate in each bucket instead. To do that, add a max sub-aggregation on createdDate under the terms aggregation, then reference that sub-aggregation's name in the order property of the terms aggregation.
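A sketch of what that could look like, written as a Python function that builds the request body. The sub-aggregation name latest_created is an assumption made for this example; everything else is taken from the query in the question. The second function is only a local emulation of the ordering, run on the friendId and createdDate values quoted in the question, and is not how Elasticsearch computes it.

```python
import json
from collections import defaultdict


def build_request(influencer_id, buckets=2):
    """Build the search body with friendId buckets ordered by newest document."""
    return {
        "size": 0,
        "query": {"bool": {"must": [
            {"match": {"influencerId": influencer_id}},
            {"match": {"type": "friend_completed_sponsorship"}},
        ]}},
        "aggs": {"friendId": {
            "terms": {
                "field": "friendId",
                "size": buckets,
                # Order buckets by the max sub-aggregation instead of doc_count.
                "order": {"latest_created": "desc"},
            },
            "aggs": {
                # Assumed name; it only has to match the key used in "order".
                "latest_created": {"max": {"field": "createdDate"}},
                "latest": {"top_hits": {
                    "sort": [{"createdDate": {"order": "desc"}}],
                    "_source": {"includes": [
                        "sponsorshipCharityId", "message", "createdDate"]},
                    "size": 1,
                }},
            },
        }},
    }


def newest_friends(docs, buckets=2):
    """Local emulation: friendIds ranked by their most recent createdDate."""
    latest = defaultdict(str)
    for doc in docs:
        # ISO-8601 UTC strings in the same format compare correctly as text.
        latest[doc["friendId"]] = max(latest[doc["friendId"]], doc["createdDate"])
    return sorted(latest, key=latest.get, reverse=True)[:buckets]


# Latest createdDate per friendId, as quoted in the question.
docs = [
    {"friendId": "cf750fd8-998f-4dcd-9c88-56b2b6d6fce9",
     "createdDate": "2020-06-24T17:35:17.816842Z"},
    {"friendId": "93ab55c5-795f-44b0-900c-912e3e186da0",
     "createdDate": "2020-06-24T17:57:17.816842Z"},
    {"friendId": "af342805-1990-4794-9d67-3bb2dd1e36dc",
     "createdDate": "2020-06-25T17:35:17.816842Z"},
]

body = build_request("52407710-f7be-49c1-bc15-6d52363144a6")
print(json.dumps(body["aggs"]["friendId"]["terms"], indent=2))
print(newest_friends(docs))
```

With this ordering, the bucket for af342805-1990-4794-9d67-3bb2dd1e36dc ranks first even if it holds a single document, which is the behaviour the question asks for.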
I'm running the following query:
{
"size": 50,
"_source" : ["servername", "silo", "packages.displayname", "packages.displayversion","environment"],
"query": {
"bool": {
"must": {
"match": {
"packages.displayname": "Google Chrome"
}
}
,
"must": {
"type": {
"value": "server"
}
}
}
}
}
But it doesn't fetch any records:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
However, the index/type in question does have records where "packages.displayname" = "Google Chrome". Below is a sample from the index/type:
{
"took": 78,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 994,
"max_score": 1,
"hits": [
{
"_index": "package_conformity-13.02.2019",
"_type": "server",
"_id": "AWjklhaPsoJF1yu58sfg",
"_score": 1,
"_source": {
"environment": "PRD",
"servername": "Zephyr",
"packages": [
{
"displayname": "Google Chrome",
"displayversion": "71.0.3578.80"
},
Here is the index mapping:
{
"package_conformity-13.02.2019": {
"mappings": {
"server": {
"properties": {
"environment": {
"type": "keyword"
},
"farm": {
"type": "keyword"
},
"packages": {
"type": "nested",
"properties": {
"InstallDate": {
"type": "date",
"index": false
},
"InstallLocation": {
"type": "text",
"index": false
},
"comments": {
"type": "text",
"index": false
},
"displayname": {
"type": "keyword"
},
"displayversion": {
"type": "keyword",
"index": false
},
"publisher": {
"type": "text",
"index": false
},
"regkey": {
"type": "keyword",
"index": false
}
}
},
"servername": {
"type": "keyword"
},
"silo": {
"type": "keyword"
},
"timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
}
}
Is there something wrong with the way I'm querying, or with the index structure or content? Please point me in the right direction.
Thanks
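If you want multiple constraints inside your must clause, you need to put them in an array (and not repeat the must keyword multiple times). Also, since packages is mapped as a nested field, the match on packages.displayname has to be wrapped in a nested query. Finally, the constraint on _type should be expressed differently, using a term query. Try this query instead: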
{
"size": 50,
"_source": [
"servername",
"silo",
"packages.displayname",
"packages.displayversion",
"environment"
],
"query": {
"bool": {
"must": [
{
"nested": {
"path": "packages",
"query": {
"match": {
"packages.displayname": "Google Chrome"
}
}
}
},
{
"term": {
"_type": "server"
}
}
]
}
}
}
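To illustrate why repeating the must key is a problem, here is a small Python sketch using only the standard json module. It shows what a generic JSON parser does with the original body; how Elasticsearch itself treats repeated keys depends on the version and its parser settings, so this is an analogy rather than a reproduction of the server's behaviour.

```python
import json

# The bool clause from the question, with "must" repeated.
raw = """
{
  "query": {
    "bool": {
      "must": {"match": {"packages.displayname": "Google Chrome"}},
      "must": {"type": {"value": "server"}}
    }
  }
}
"""

# Python's json module silently keeps the last value for a repeated key,
# so the match clause is lost.
parsed = json.loads(raw)
surviving_must = parsed["query"]["bool"]["must"]


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of dropping repeated keys."""
    keys = [key for key, _ in pairs]
    duplicates = {key for key in keys if keys.count(key) > 1}
    if duplicates:
        raise ValueError(f"duplicate keys: {sorted(duplicates)}")
    return dict(pairs)


try:
    json.loads(raw, object_pairs_hook=reject_duplicates)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(surviving_must)
print(duplicate_error)
```

Putting both clauses in one "must" array, as in the corrected query, avoids the ambiguity altogether.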
I have a mapping for some documents, and term queries against one of the fields fail. I don't understand why:
"mappings":{
"timeslot":{
"properties":{
"FOB_IN":{
"type":"long"
},
"TRIGGER_CODE":{
"type":"long"
},
"FLIGHT_PHASE":{
"type":"long"
},
"REP16_TRIG":{
"type":"long"
},
"fwot":{
"type":"string"
},
"FOB_OUT":{
"type":"long"
},
"FP":{
"type":"long"
},
"FLTNB":{
"type":"string"
},
"Date":{
"format":"strict_date_optional_time||epoch_millis",
"type":"date"
}
}
}
}
I can make a term query against TRIGGER_CODE, for example, and it works fine
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 4.4446826,
"hits": [
{
"_index": "merged-2016-04",
"_type": "timeslot",
"_id": "AVRS8VnirVLwfvMnwpXb",
"_score": 4.4446826,
"_source": {
"Date": "2016-04-03T08:42:44+0000",
"FLIGHT_PHASE": 20,
"TRIGGER_CODE": 4000,
"fwot": "A6-APA"
}
}
]
}
}
Now the same query against fwot fails. What's wrong?
GET merged-2016-04/_search?size=1
{
"query" : {
"term" : { "fwot": "A6-APA"}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
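fwot is an analyzed string field, so the standard analyzer indexes "A6-APA" as the lowercase tokens a6 and apa, and a term query for the exact value "A6-APA" finds nothing. You need fwot to be mapped with "index": "not_analyzed" for that to work, and you need to reindex the data for the change to take effect.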
Here's the complete list of commands for the mapping change and some test data:
PUT /merged-2016-04
{
"mappings": {
"timeslot": {
"properties": {
"FOB_IN": {
"type": "long"
},
"TRIGGER_CODE": {
"type": "long"
},
"FLIGHT_PHASE": {
"type": "long"
},
"REP16_TRIG": {
"type": "long"
},
"fwot": {
"type": "string",
"index": "not_analyzed"
},
"FOB_OUT": {
"type": "long"
},
"FP": {
"type": "long"
},
"FLTNB": {
"type": "string"
},
"Date": {
"format": "strict_date_optional_time||epoch_millis",
"type": "date"
}
}
}
}
}
POST /merged-2016-04/timeslot
{
"Date": "2016-04-03T08:42:44+0000",
"FLIGHT_PHASE": 20,
"TRIGGER_CODE": 4000,
"fwot": "A6-APA"
}
GET merged-2016-04/_search?size=1
{
"query": {
"term": {
"fwot": "A6-APA"
}
}
}
See the Query DSL documentation page for the term query, in particular the note "Why doesn’t the term query match my document", for a detailed explanation.
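The mismatch between an analyzed field and a term query can be sketched in a few lines of Python. The function approx_standard_analyze is a rough stand-in written for this example (lowercase, then split on anything that is not a letter or digit). The real standard analyzer follows Unicode word-segmentation rules, so treat this as an approximation that holds for simple ASCII values like the one in the question.

```python
import re


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer on simple ASCII input."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(indexed_tokens, term):
    """A term query compares the term as-is against the indexed tokens."""
    return term in indexed_tokens


# Tokens indexed for an analyzed string field.
analyzed_tokens = approx_standard_analyze("A6-APA")
# With "index": "not_analyzed", the whole value is indexed as one token.
not_analyzed_tokens = ["A6-APA"]

print(analyzed_tokens)
print(term_matches(analyzed_tokens, "A6-APA"))
print(term_matches(not_analyzed_tokens, "A6-APA"))
```

This also shows why a match query would have worked on the analyzed field: match analyzes the search text with the same analyzer before comparing, whereas term does not.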
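We can also query the keyword sub-field, provided the mapping defines one (the mapping above does not; the default dynamic mapping in Elasticsearch 5.x and later creates it for string values):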
GET merged-2016-04/_search?size=1
{
"query": {
"term": {
"fwot.keyword": "A6-APA"
}
}
}
I have a field named "lang" which contains values such as "en_US", "en_GB", "ru_RU", etc., with this mapping:
"lang": {
"type": "string",
"index": "not_analyzed",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
How can I filter for documents from, e.g., "US"?
One way you can do it is to drop "index": "not_analyzed" from the top-level field and set up a pattern analyzer for that field. Since you already have the "lang.raw" field set up, you'll still be able to get the untouched version for faceting or whatever.
So, to test it I set up an index like this:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"whitespace_underscore": {
"type": "pattern",
"pattern": "[\\s_]+",
"lowercase": false
}
}
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string"
},
"lang": {
"type": "string",
"index_analyzer": "whitespace_underscore",
"search_analyzer": "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
}
}
And added a few docs:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name":"doc1","lang":"en_US"}
{"index":{"_id":2}}
{"name":"doc2","lang":"en_GB"}
{"index":{"_id":3}}
{"name":"doc3","lang":"ru_RU"}
Now I can filter by "US" like this:
POST /test_index/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"lang": "US"
}
}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"name": "doc1",
"lang": "en_US"
}
}
]
}
}
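The reason the term filter on "US" matches can be reproduced locally. The sketch below applies the same pattern, [\s_]+, with Python's re module to the three documents from the bulk request; pattern_analyze and term_filter are hypothetical helpers for this illustration, not Elasticsearch APIs.

```python
import re

PATTERN = r"[\s_]+"  # same pattern as the whitespace_underscore analyzer


def pattern_analyze(text, lowercase=False):
    """Split on the pattern, dropping empty tokens; optionally lowercase."""
    tokens = [token for token in re.split(PATTERN, text) if token]
    return [token.lower() for token in tokens] if lowercase else tokens


# "lang" values from the bulk request, keyed by document id.
docs = {"1": "en_US", "2": "en_GB", "3": "ru_RU"}
index = {doc_id: pattern_analyze(lang) for doc_id, lang in docs.items()}


def term_filter(term):
    """Return ids of documents whose indexed tokens contain the exact term."""
    return [doc_id for doc_id, tokens in index.items() if term in tokens]


print(index)
print(term_filter("US"))
print(term_filter("us"))
```

Because the analyzer is configured with "lowercase": false, the indexed token is "US" and a term filter on "us" would not match.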
And I can still get a list of values with a terms aggregation on "lang.raw":
POST /test_index/_search?search_type=count
{
"aggs": {
"lang_terms": {
"terms": {
"field": "lang.raw"
}
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"lang_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "en_GB",
"doc_count": 1
},
{
"key": "en_US",
"doc_count": 1
},
{
"key": "ru_RU",
"doc_count": 1
}
]
}
}
}
Here is the code I used to test it:
http://sense.qbox.io/gist/ac3f3fd66ea649c0c3a8010241d1f6981a7e012c