kibana unique count seeing names with - as different entery - elasticsearch

I have an problem with the unique count feature.
I get data from elasticsearch for example an computer name (PC-01) in a field.
When i want to use a visualisation unique count then kibana makes from "DESKTOP-2D562R2" -> "DESKTOP" and "2D562R2" as a entery.
See this splitted field:
The data kibana gets from elastic search looks like this entery data:
The problem with this is that 2d562r2 and desktop two different "enterys" are in a kibana table or with unique count.

Your field is being analyzed (split into tokens). Change the mapping (or template, depending on how you're creating the indexes) to make this field not_analyzed.
Note that, as a hack, logstash's default template creates a ".raw" version of string fields that is not analyzed. You could refer to enterys.raw.

Related

elasticsearch unknown setting index.include_type_name

I'm in really weird situations, I need to create indexes in elasticsearch that contain typeless fields. I have a rails application that sends any data per second to my elasticsearch. about my architecture, I have to say I use elastic-stack on docker in ubuntu server and use socket to send data's to elk and all of them are the latest version.
In my rails application user could choose datatype for each field but the issues happen when the user want to change the datatype of one field right after it's created, logstash return this error
error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [field] of type [long] in document with id '5e760cac-cafc-4fd0-9e45-1c650967ccd4'. Preview of field's value: '2022-01-18T08:06:30'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"For input string: \"2022-01-18T08:06:30\
I found deadly queue letter plugins to save wrong input in my server after that I think if I could index documents without any type the problem is solved so I start to googling and found Removal of mapping types in elasticsearch documents and I follow instructions which describe in tutorials I get the following error:
unknown setting [index.include_type_name] please check that any required plugins are installed, or check the breaking changes documentation for removed settings
even I put "include_type_name" in the request to send to the elastic noting change I have the latest version of elastic.
I think maybe it's helpful to edit the default elasticsearch template but noting the change. could you please help me with what should I do?
As already mentioned in the comments, Elasticsearch does not support changing the data type of a field without a reindex or creating a new index.
For example, if a field is mapped as a numeric field like integer and the user wants to index a string value in this field, elasticsearch will return an mapping error.
You would need to change the mapping of the index and reindex it or create a entirely new index using the new mapping.
None of this is done automatically by elastic, you would need to deal with this in your application, you could catch the error and implement some logic to create a new index with the new mapping, but this also could lead to other problems as having too many indices in the cluster and query errors when the range of the query include index with the same field with different mappings.
One feature that Elasticsearch has that could help you in some way is the runtime fields, with runtime fields you can query a field that has a specific mapping using a different mapping.
For example, if you have a field that has date values, but was wrongly mapped as a keyword or text field, you could use a runtime field to query it as it was a date field.
But again, this will need that you implement a logic to build those runtime fields and also can lead to other problems, not all the data types are available to runtime fields and runtime fields can impact in the performance.
Another feature that could help you is to use of multi-fields, this, I think, is the closest you got of having a field with multiple data types.
Using multi-fields you could have a field named date with the date type and also a field named date.keyword with the keyword type, you also could have a field name code with the keyword type and a field name code.int with the integer type, you would also need to use the ignore_malformed setting in the mapping so elastic does not reject the entire document in case of mapping errors, just the field with the wrong mapping.
Just keep in mind that when use multi-fields, you will have a different field for each mapping, for example date is a field, date.keyword is another field, this will increase the storage usage.
But again, none of this is done automatically, it needs logic in your application, elasticsearch does not allows you to change the mapping of a field, if your application needs this, you will need to implement something in the application that can work with that limitations of elasticsearch.

elasticsearch - Tag data with lookup table values

I’m trying to tag my data according to a lookup table.
The lookup table has these fields:
• Key- represent the field name in the data I want to tag.
In the real data the field is a subfield of “Headers” field..
An example for the “Key” field:
“Server. (* is a wildcard)
• Value- represent the wanted value of the mentioned field above.
The value in the lookup table is only a part of a string in the real data value.
An example for the “Value” field:
“Avtech”.
• Vendor- the value I want to add to the real data if a combination of field- value is found in an document.
An example for combination in the real data:
“Headers.Server : Linux/2.x UPnP/1.0 Avtech/1.0”
A match with that document in the look up table will be:
Key= Server (with wildcard on both sides).
Value= Avtech(with wildcard on both sides)
Vendor= Avtech
So baisically I’ll need to add a field to that document with the value- “ Avtech”.
the subfields in “Headers” are dynamic fields that changes from document to document.
of a match is not found I’ll need to add to the tag field with value- “Unknown”.
I’ve tried to use the enrich processor , use the lookup table as the source data , the match field will be ”Value” and the enrich field will be “Vendor”.
In the enrich processor I didn’t know how to call to the field since it’s dynamic and I wanted to search if the value is anywhere in the “Headers” subfields.
Also, I don’t think that there will be a match between the “Value” in the lookup table and the value of the Headers subfield, since “Value” field in the lookup table is a substring with wildcards on both sides.
I can use some help to accomplish what I’m trying to do.. and how to search with wildcards inside an enrich processor.
or if you have other idea besides the enrich processor- such as parent- child and lookup terms mechanism.
Thanks!
Adi.
There are two ways to accomplish this:
Using the combination of Logstash & Elasticsearch
Using the only the Elastichsearch Ingest node
Constriant: You need to know the position of the Vendor term occuring in the Header field.
Approach 1
If so then you can use the GROK filter to extract the term. And based on the term found, do a lookup to get the corresponding value.
Reference
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_static.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_streaming.html
Approach 2
Create an index consisting of KV pairs. In the ingest node, create a pipeline which consists of Grok processor and then Enrich it. The Grok would work the same way mentioned in the Approach 1. And you seem to have already got the Enrich part working.
Reference
https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html
If you are able to isolate the sub field within the Header where the Term of interest is present then it would make things easier for you.

Elastic Search - Find document with a conflicting field type

I'm using Elastic Search 5.6.2 with Kibana and I'm currently facing a problem
My documents are indexed on the field timestamp which is normally an integer, however recently somebody has logged a document with a timestamp that is not an integer, and Kibana complains of conflicting type.
The discover panels display nothing and the following errors pop:
Saved "field" parameter is now invalid. Please select a new field.
Discover: "field" is a required parameter
How can I look for the document(s) causing these conflicts so that to find the service creating bad logs ?
The field type (either integer or text/keyword) is not defined on per document basis but rather on per index basis (in the mappings). I guess you are manipulating timeseries data, and you probably have un index per day (or per month or ...).
In Kibana Dev Tools:
List the created indices with GET _cat/indices
For each index (logstash-2017.09.28 in my example) do a GET logstash-2017.09.28/_mapping and check the type of the field in #timestamp
The field type is probably different between indices.
You won't be able to change the field type on created indices. Deleting document won't solve you're problem. The only solution is to drop the index or reindex the whole index with a new field type (in a specific mapping).
To avoid this problem on future indices, the solution is to create an index template with a mapping telling that the field #timestamp is of type date or whatever.

analyzed field vs doc_values: true field

We have an elasticsearch that contains over half a billion documents that each have a url field that stores a URL.
The url field mapping currently has the settings:
{
index: not_analyzed
doc_values: true
...
}
We want our users to be able to search URLs, or portions of URLs without having to use wildcards.
For example, taking the URL with path: /part1/user#site/part2/part3.ext
They should be able to bring back a matching document by searching:
part3.ext
user#site
part1
part2/part3.ext
The way I see it, we have two options:
Implement an analysed version of this field (which can no longer have doc_values: true) and do match querying instead of wildcards. This would also require using a custom analyser to leverage the pattern tokeniser to make the extracted terms correct (the standard tokeniser would split user#site into user and site).
Go through our database and for each document create a new field that is a list of URL parts. This field could have doc_values: true still so would be stored off-heap, and we could do term querying on exact field values instead of wildcards.
My question is this:
Which is better for performance: having a list of variable lengths that has doc_values on, or having an analysed field? (ie: option 1 or option 2) OR is there an option 3 that would be even better yet?!
Thanks for your help!
Your question is about a field where you need doc_values but can not index with keyword-analyzer.
You did not mention why you need doc_values. But you did mention that you currently not search in this field.
So I guess that the name of the search-field do not have to be the same: you can copy the field value in an other field which is only for search ( "store": false ). For this new field you can use the pattern-analyzer or pattern-tokenizer for your use case.
It seems that no-one has actually performance tested the two options, so I did.
I took a sample of 10 million documents and created two new indices:
An index with an analysed field that was setup as suggested in the other answer.
An index with a string field that would store all permutations of URL segmentation.
I ran an enrichment process over the second index to populate the fields. The field values on the first index were created when I re-indexed the sample data from my main index.
Then I created a set of gatling tests to run against the indices and compared the gatling results and netdata (https://github.com/firehol/netdata) landscape for each.
The results were as follows:
Regarding the netadata landscape: The analysed field showed a spike - although only a small one - on all elastic nodes. The not_analysed list field tests didn't even register.
It is worth mentioning that enriching the list field with URL segmentation permutations bloated the index by about 80% in our case. So there's a trade off - you never need to do wildcard searches for exact sub-segment matching on URLs, but you'll need a lot more disk to do it.
Update
Don't do this. Go for doc_values. Doing anything with analyzed strings that have a massive number of possible terms will mean massive field data that will, eventually, never fit in the amount of memory you can allocate it.

Cannot select time field for default index

I'm using kibana-4. Following the documentation here I should be able to create an index by putting this in my elasticsearch.yaml file:
PUT .kibana
{
"index.mapper.dynamic": true
}
I'm not sure I understand how to do this, because a yaml file should not take values formatted like the above block, right?
I noticed that .kibana was a default index, so after inputting it into the kibana console, I was asked to input a time field for the default index. However, the input HTML element is a dropdown that contained no options. Without selecting a time-field option I am not allowed to create a default index. What am I supposed to do? Has anyone else run into a similar problem?
I understand the problem faced by you. Even i faced the same while using Kibana 4 for first time.
Here are 2 possible solutions to your problem:-
1. Input data into elasticsearch which contains a timestamped field. So upon inputting data that field will be directly recognized by Kibana & would be showed to you in the dropdown menu (where you are currently seeing empty).
It is empty because Kibana couldn't recognize the timestamped field from the data inserted by you in elasticsearch.
2. Untick the option of Index contains time-based events which will allow you to just enter your index name & access Kibana.
Note:- while using Option 2 & specifying index name as .kibana you would notice that it doesn't contain any field or data because .kibana doesnt store any data.
I would suggest you to create an index using curl command and insert data in it with or without timestamped field. If inserted data without timestamped field use Option 2 otherwise use Option 1.

Resources