I am building visualizations in Kibana for AWS CloudWatch metrics and have run into an issue creating metric tables: Kibana is splitting my fields that contain dashes (instance ID, region, etc.).
Rather than creating a single row for an instance ID such as i-7bb06dzz, it creates 2 rows: i and 7bb06dzz. The "i" row displays the aggregate count of all other fields. If I add a second split on the region, the same thing happens for every set of characters in the region name separated by dashes (us, east, and 1 instead of just us-east-1).
I tried to post a screenshot, but my reputation is not high enough to do so.
Here are my visualization settings:
Metrics: Metric (Count)
Aggregations:
Split Rows: Terms: InstanceID: Top 5: Order by metric:Count
Split Rows: Terms: Region: Top 5: Order by metric:Count
No Advanced Settings have been specified. I was able to get a reasonable-looking list by specifying only InstanceID and excluding the pattern "i". However, that doesn't do me a lot of good when I can't display the region next to it. Both values are indexed as strings and were recorded in Elasticsearch with double quotes around them.
Any recommendations on how to display the fields as intended would be much appreciated.
This is because Elasticsearch "analyzes" the field, splitting it into individual tokens. Logstash will store fields in both the fieldname and fieldname.raw fields; the latter is unanalyzed and will behave as you expect.
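To make the difference concrete, here is a minimal Python sketch of what happens to a dashed value. It only approximates an analyzed field by lowercasing and splitting on non-alphanumeric characters (the real analyzer is more sophisticated), and the function names are made up for illustration.

```python
import re
from typing import List


def approximate_analyzed_terms(value: str) -> List[str]:
    """Rough stand-in for an analyzed string field (assumption: lowercase,
    then split on anything that is not a letter or digit)."""
    return re.findall(r"[a-z0-9]+", value.lower())


def raw_terms(value: str) -> List[str]:
    """An unanalyzed (.raw) field keeps the whole value as one term."""
    return [value]


for sample in ("i-7bb06dzz", "us-east-1"):
    print(sample, approximate_analyzed_terms(sample), raw_terms(sample))
```

A terms aggregation buckets on the terms in the index, which is why the analyzed field yields one row per fragment while the .raw field yields one row per full value.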
I have a document that has a "bag.contents" field (indexed as text with a .keyword derivative) that contains a comma-separated list of the items it contains. Below are some samples:
`Apple, Apple, Apple`
`Apple, Orange`
`Car, Apple` <--
`Orange`
`Bus` <--
`Grape, Car` <--
`Car, Bus` <--
The desired query results should be all documents where there is at least one instance of something other than 'Apple', 'Orange', 'Grape', as per the arrows above.
I'm sure the DSL is a combination of must and not, but after 20 or so iterations I still can't get Elasticsearch to return the correct result set; the closest I get is a result set that doesn't contain any of those 3 things.
It is also worth noting that this field in the original document is a JSON array and Kibana shows it as a single field with the elements as a comma-separated field. I suspect this may be complicating it.
1 - If it is showing up as a single field, it is probably not indexed as an array. Please make sure the document to index is formed properly, i.e., you need it to be
{ "contents": ["apple","orange","grape"]}
and not
{"contents": "apple,orange,grape"}
2 - Regarding the query: if you know all the possible terms when querying, you can form a terms_set query with all the other terms except apple, orange and grape. The terms_set query lets you control the minimum number of matches required (1 in your case).
If you don't know all the possible terms, maybe create a separate field that indexes all the other words minus apple, orange and grape, and query against that field.
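As a rough sketch of point 2, the query body could be built like this in Python. It assumes the field has been indexed as a proper array so that "bag.contents.keyword" holds one term per item, and that the full vocabulary is known up front; the helper name and the sample vocabulary are hypothetical.

```python
import json

# The three items the question wants to ignore.
EXCLUDED = {"Apple", "Orange", "Grape"}


def build_other_items_query(known_terms, field="bag.contents.keyword"):
    """Build a terms_set query matching documents that contain at least one
    known term other than the excluded ones (field name is an assumption)."""
    other_terms = sorted(set(known_terms) - EXCLUDED)
    return {
        "query": {
            "terms_set": {
                field: {
                    "terms": other_terms,
                    # Require at least one of the listed terms to match.
                    "minimum_should_match_script": {"source": "1"},
                }
            }
        }
    }


# Hypothetical vocabulary taken from the samples in the question.
body = build_other_items_query(["Apple", "Orange", "Grape", "Car", "Bus"])
print(json.dumps(body, indent=2))
```

With a minimum of 1 this behaves much like a plain terms query over the same list; terms_set becomes more useful if the required number of matches ever needs to vary.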
Let's say I have documents of the following type, and many indices (hundreds) like that per day (TBs of data per day):
{
"date_time":"07/May/2019:02:37:19 +0000",
"ip":"17.15.15.15",
"status":"200",
"request_url":"https://my.domain.com/some/long/path/to/page",
"response_time": "0.01"
}
The first 3 fields (date_time, ip, status) will be searched and aggregated on a lot, a few times per second or minute.
The last two fields (request_url, response_time) will be searched very infrequently, once a month or even once every few months, but they will be searched.
What would be the most efficient mapping for these requirements?
Obviously, the first 3 fields have to be fully indexed, as doc_type.
What about the last 2, infrequently searched fields? If I totally disable indexing for them by setting "enabled": false (I'd have to move them into a separate object, since AFAIK single fields can't be disabled, only objects), then I will never be able to search them unless I search the _source content. Given the volume of data I have (TBs of data, hundreds of indices, per day), that is going to kill my cluster :)
I'm almost looking for an option like the "frozen" indices - for which indexing structures are loaded into memory on demand. But, I need this on a field-level, not the whole index level. So that only a subset of fields would be "indexed on demand"
Any suggestions?
Thank you!
Marina
We have an Elasticsearch cluster that contains over half a billion documents, each of which has a url field that stores a URL.
The url field mapping currently has the settings:
{
index: not_analyzed
doc_values: true
...
}
We want our users to be able to search URLs, or portions of URLs without having to use wildcards.
For example, taking the URL with path: /part1/user#site/part2/part3.ext
They should be able to bring back a matching document by searching:
part3.ext
user#site
part1
part2/part3.ext
The way I see it, we have two options:
1. Implement an analysed version of this field (which can no longer have doc_values: true) and do match querying instead of wildcards. This would also require using a custom analyser to leverage the pattern tokeniser to make the extracted terms correct (the standard tokeniser would split user#site into user and site).
2. Go through our database and for each document create a new field that is a list of URL parts. This field could have doc_values: true still so would be stored off-heap, and we could do term querying on exact field values instead of wildcards.
My question is this:
Which is better for performance: having a list of variable lengths that has doc_values on, or having an analysed field? (ie: option 1 or option 2) OR is there an option 3 that would be even better yet?!
Thanks for your help!
Your question is about a field where you need doc_values but cannot index it with the keyword analyzer.
You did not mention why you need doc_values, but you did mention that you currently do not search on this field.
So I guess the search field does not have to have the same name: you can copy the field value into another field that is used only for search ( "store": false ). For this new field you can use the pattern analyzer or pattern tokenizer for your use case.
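To illustrate what a pattern-based tokenizer would do with the example path, here is a small Python simulation. It assumes the tokenizer splits on "/" only, and that a search matches when every token of the search string is present in the document (token order is ignored here). The function names are hypothetical and nothing below calls Elasticsearch.

```python
import re


def pattern_tokenize(text, pattern=r"/"):
    """Split on the separator pattern and drop empty tokens
    (assumption: '/' is the only separator)."""
    return [token for token in re.split(pattern, text) if token]


def matches(doc_path, search):
    """True when every token of the search string is a token of the document."""
    doc_tokens = set(pattern_tokenize(doc_path))
    return all(token in doc_tokens for token in pattern_tokenize(search))


path = "/part1/user#site/part2/part3.ext"
print(pattern_tokenize(path))
for search in ("part3.ext", "user#site", "part1", "part2/part3.ext", "user"):
    print(search, matches(path, search))
```

Because "#" is not a separator in this sketch, user#site stays a single token, so a search for just "user" does not match.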
It seems that no-one has actually performance tested the two options, so I did.
I took a sample of 10 million documents and created two new indices:
1. An index with an analysed field that was set up as suggested in the other answer.
2. An index with a string field that would store all permutations of URL segmentation.
I ran an enrichment process over the second index to populate the fields. The field values on the first index were created when I re-indexed the sample data from my main index.
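The enrichment code itself is not shown here, so the following is only a sketch of one plausible reading of "all permutations of URL segmentation": every contiguous run of path segments, joined with "/". The function name is hypothetical.

```python
def url_path_segmentations(path):
    """Return every contiguous run of path segments, joined with '/'.

    Assumption: 'permutations of URL segmentation' means contiguous
    sub-paths, not arbitrary reorderings of the segments.
    """
    segments = [segment for segment in path.split("/") if segment]
    parts = []
    for start in range(len(segments)):
        for end in range(start + 1, len(segments) + 1):
            parts.append("/".join(segments[start:end]))
    return parts


for part in url_path_segmentations("/part1/user#site/part2/part3.ext"):
    print(part)
```

A path with n segments produces n(n+1)/2 values under this reading, so the list grows quadratically with path depth.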
Then I created a set of gatling tests to run against the indices and compared the gatling results and netdata (https://github.com/firehol/netdata) landscape for each.
The results were as follows:
Regarding the netdata landscape: The analysed field showed a spike - although only a small one - on all elastic nodes. The not_analysed list field tests didn't even register.
It is worth mentioning that enriching the list field with URL segmentation permutations bloated the index by about 80% in our case. So there's a trade-off: you never need to do wildcard searches for exact sub-segment matching on URLs, but you'll need a lot more disk to do it.
Update
Don't do this. Go for doc_values. Doing anything with analyzed strings that have a massive number of possible terms will mean massive field data that will eventually not fit in the amount of memory you can allocate to it.
In Kibana, I have an index that looks as follows:
type (String)
value (String)
timestamp (Date)
I would like to have a visualization that shows the most recent value field where the type is equal to "battery", for example.
I would like the visualization to be similar to the "Metric" one, but displaying a string of text instead of a number, of course.
Is this possible with Kibana? If not, how can I get a similar result?
You can use a Data Table visualization.
In the search query you would specify type: "Battery"
In the metric section you would specify Max timestamp
In the Split Rows section you would specify Aggregation=Terms, Field=value, OrderBy=metric:Max timestamp, Order=descending, Size=1
You will have a result that is a table with 1 row and 2 columns, one of which being a value and the other a timestamp
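For reference, the Data Table settings above correspond roughly to the following Elasticsearch request body, sketched here as a Python dict. The aggregation names "by_value" and "latest" are made up for illustration, and the sketch assumes the value field can be used in a terms aggregation (i.e. it is not analyzed).

```python
import json


def latest_value_query(type_name="battery", size=1):
    """Build a request body that buckets on `value`, ordered by the most
    recent timestamp in each bucket (aggregation names are hypothetical)."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"match": {"type": type_name}},
        "aggs": {
            "by_value": {
                "terms": {
                    "field": "value",
                    "size": size,
                    "order": {"latest": "desc"},
                },
                "aggs": {"latest": {"max": {"field": "timestamp"}}},
            }
        },
    }


print(json.dumps(latest_value_query(), indent=2))
```

The single bucket that comes back carries the value as its key and the maximum timestamp as the sub-aggregation result, which matches the 1-row, 2-column table.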
If this does not satisfy your needs, you may look into available Kibana plugins that allow new visualizations (see the list of known plugins) or modify one of them to suit your needs.
I have a problem with the unique count feature.
I get data from Elasticsearch, for example a computer name (PC-01) in a field.
When I use a unique count visualization, Kibana turns "DESKTOP-2D562R2" into "DESKTOP" and "2D562R2" as separate entries.
See this split field:
The data Kibana gets from Elasticsearch looks like this:
The problem is that "2d562r2" and "desktop" show up as two different entries in a Kibana table or in a unique count.
Your field is being analyzed (split into tokens). Change the mapping (or template, depending on how you're creating the indexes) to make this field not_analyzed.
Note that, as a hack, Logstash's default template creates a ".raw" version of string fields that is not analyzed. You could refer to enterys.raw.
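As a hedged sketch of what "make this field not_analyzed" looks like in a mapping, the helper below returns the field definition for older and newer Elasticsearch versions. The field name "computer_name" and the helper itself are hypothetical; the two definitions reflect the string/not_analyzed syntax used before version 5 and the keyword type used from version 5 on.

```python
import json


def exact_value_mapping(es_major_version):
    """Return a field definition that stores the value as a single term."""
    if es_major_version >= 5:
        # From 5.x on, exact-value strings use the keyword type.
        return {"type": "keyword"}
    # Before 5.x, string fields were analyzed unless told otherwise.
    return {"type": "string", "index": "not_analyzed"}


# "computer_name" is a made-up field name for illustration.
mapping = {"properties": {"computer_name": exact_value_mapping(2)}}
print(json.dumps(mapping, indent=2))
```

The mapping of an existing field generally cannot be changed in place, so data already indexed would need to be reindexed to pick this up.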