Remove unnecessary fields in ElasticSearch

We are populating Elasticsearch via Logstash. The thing is that I see some unnecessary fields that I'd like to remove, for example:
@version
file
geoip
host
message
offset
tags
Is it possible to do this by defining/extending a dynamic template? If so, how? If not, can we do this via the Logstash configuration?
Your help is much appreciated.

You can remove fields with practically any Logstash filter, via the common remove_field option: when the filter succeeds, it removes the listed fields.
It makes sense to me to use mutate:
filter {
  mutate {
    remove_field => [ "file" ]
  }
}
That said, most of these fields are incredibly useful and really should not be removed.
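If you do decide to drop several of them anyway, note that remove_field accepts a list, so a single mutate can handle all of them. A minimal sketch, reusing some of the field names from the question:
filter {
  mutate {
    # drop the fields you consider noise in one go
    remove_field => [ "file", "offset", "tags" ]
  }
}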

Related

Elasticsearch Dynamic Field Mapping and JSON Dot Notation

I'm trying to write logs to an Elasticsearch index from a Kubernetes cluster. Fluent-bit is being used to read stdout and it enriches the logs with metadata including pod labels. A simplified example log object is
{
  "log": "This is a log message.",
  "kubernetes": {
    "labels": {
      "app": "application-1"
    }
  }
}
The problem is that a few other applications deployed to the cluster have labels of the following format:
{
  "log": "This is another log message.",
  "kubernetes": {
    "labels": {
      "app.kubernetes.io/name": "application-2"
    }
  }
}
These applications are installed via Helm charts and the newer ones are following the label and selector conventions as laid out here. The naming convention for labels and selectors was updated in Dec 2018, seen here, and not all charts have been updated to reflect this.
The end result of this is that depending on which type of label format makes it into an Elastic index first, trying to send the other type in will throw a mapping exception. If I create a new empty index and send in the namespaced label first, attempting to log the simple app label will throw this exception:
object mapping for [kubernetes.labels.app] tried to parse field [kubernetes.labels.app] as object, but found a concrete value
The opposite situation, posting the namespaced label second, results in this exception:
Could not dynamically add mapping for field [kubernetes.labels.app.kubernetes.io/name]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text].
What I suspect is happening is that Elasticsearch sees the periods in the field name as JSON dot notation and is trying to flesh it out as an object. I was able to find this PR from 2015 which explicitly disallows periods in field names; however, it seems to have been reversed in 2016 with this PR. There is also this multi-year thread from 2015-2017 discussing the issue, but I was unable to find anything recent involving the latest versions.
My current thought on moving forward is to standardize the Helm charts we are using so that all of the labels follow the same convention. This seems like a band-aid over the underlying issue, though; I feel like I'm missing something obvious in the configuration of Elasticsearch and its dynamic field mappings.
Any help here would be appreciated.
I opted to use the Logstash mutate filter with the rename option as described here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-rename
The end result looked something like this:
filter {
  mutate {
    rename => {
      '[kubernetes][labels][app]' => '[kubernetes][labels][app.kubernetes.io/name]'
      '[kubernetes][labels][chart]' => '[kubernetes][labels][helm.sh/chart]'
    }
  }
}
Although I've personally never encountered this exact issue, I have had similar problems when I indexed some test data and afterwards changed the structure of the documents being indexed (especially when "unflattening" data structures).
Your interpretation of the error message is correct. When you first index the document
{
  "log": "This is another log message.",
  "kubernetes": {
    "labels": {
      "app.kubernetes.io/name": "application-2"
    }
  }
}
Elasticsearch's dynamic mapping will expand the dots in the field name and map app as an object/structure rather than a concrete value.
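As a rough sketch of what the dynamically generated mapping looks like after that first document (details such as .keyword sub-fields omitted, and the exact types depend on your version and templates), the dots are expanded into nested objects:
"kubernetes": {
  "properties": {
    "labels": {
      "properties": {
        "app": {
          "properties": {
            "kubernetes": {
              "properties": {
                "io/name": { "type": "text" }
              }
            }
          }
        }
      }
    }
  }
}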
When you then try to index the document
{
  "log": "This is a log message.",
  "kubernetes": {
    "labels": {
      "app": "application-1"
    }
  }
}
the previously (dynamically) created mapping has defined the field app as an object with sub-fields, but Elasticsearch now encounters a concrete value, namely "application-1".
I suggest that you set up an index template to define the correct mappings. For the 'outdated' label format, I suggest pre-processing the affected documents, either through an Elasticsearch ingest pipeline or with e.g. Logstash, to get them into the correct format.
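As a minimal sketch of such a template (the template name and index pattern are just placeholders, and the flattened field type requires Elasticsearch 7.3 or later): mapping kubernetes.labels as flattened stores the whole labels object in a single field with keyword-like values, so app and app.kubernetes.io/name no longer fight over the same mapping path.
PUT _template/k8s-logs
{
  "index_patterns": ["k8s-logs-*"],
  "mappings": {
    "properties": {
      "kubernetes": {
        "properties": {
          "labels": { "type": "flattened" }
        }
      }
    }
  }
}
On versions without flattened, the rename approach above (so that only one form of each label ever reaches the index) is the more practical route.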
Hope that helps.

How to define a query timeout for a spring data elastic search query?

My question is more general. Assuming I have a simple query like this to Elasticsearch:
Page<MyEntity> findAll(Pageable pageable);
I want to be able to set a timeout for this query, for instance so that it doesn't hang forever. Although I read the documentation, I didn't see anything clear about how to do it.
Is there any way to do it? A way to set a timeout for Spring Data Elasticsearch queries, so I can make sure that nothing runs for too long?
One way of achieving a timeout in a search request is using the 'timeout' parameter in the query itself, as described here.
Let's assume we want to perform a full-text match query; we can add 'timeout' alongside the query itself:
{
  "timeout": "1ms",
  "query": {
    "match": {
      "description": "This is a fullText test"
    }
  }
}
You will have to use Elasticsearch time units as mentioned here and ship them as String values.
In your case I don't see any way to achieve this using the spring-data-elasticsearch repository alone, but you can add custom functionality to your repository and use the ElasticsearchTemplate with matchAllQuery() (Java Elasticsearch API).
Something like this (I haven't tested it):
nodeEsTemplate.getClient().prepareSearch("test-index")
    .setQuery(QueryBuilders.matchAllQuery())
    .setTimeout(TimeValue.timeValueMillis(1))
    .execute().actionGet();
Here nodeEsTemplate is of type ElasticsearchTemplate, and this assumes you have created a custom findAllWithTimeOut method in your repository class.

Elasticsearch: Fields necessary for map visualization on Kibana

I want to make use of the geoip Logstash plugin to get geolocation info about some IP addresses seen in my logs.
I also want to be able to visualize that info in Kibana.
I am going through a short overview of the process.
What the tutorial does not mention is which geoip.* fields are necessary for producing the map visualizations.
I want to keep only the strictly necessary fields and discard the rest.
Will keeping only geoip.longitude and geoip.latitude do the job?
edit: At this point in time I am just using
{ geoip { source => "my_incoming_ip" } }
in my Logstash filter.
It turns out the following field is necessary for producing the map visualization:
"geoip.location": {
  "lat": 38.7163,
  "lon": -78.1704
}
The others can be omitted (i.e. removed with a mutate filter).
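A sketch of trimming the enrichment down, using the question's source field; the exact geoip sub-field names depend on the GeoIP database in use, and the default logstash-* index template is assumed, since it maps geoip.location as a geo_point, which is what the map visualization needs:
filter {
  geoip {
    source => "my_incoming_ip"
  }
  mutate {
    # keep [geoip][location]; drop the enrichment fields you don't need
    remove_field => [ "[geoip][city_name]", "[geoip][country_name]", "[geoip][continent_code]",
                      "[geoip][region_name]", "[geoip][postal_code]", "[geoip][timezone]" ]
  }
}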

logstash not mapping to values in indices

I have a sample log
2016-12-28 16:40:53.290 [debug] <0.545.0> <<"{\"user_id\":\"79\",\"timestamp\":\"2016-12-28T11:10:26Z\",\"operation\":\"ver3 - Requested for recommended,verified handle information\",\"data\":\"\",\"content_id\":\"\",\"channel_id\":\"\"}">>
for which I have written a logstash grok filter
filter {
  grok {
    match => { "message" => "%{URIHOST} %{TIME} %{SYSLOG5424SD} <%{BASE16FLOAT}.0> <<%{QS}>>" }
  }
}
In http://grokdebug.herokuapp.com/ everything works fine and the values get matched by the filter.
When I push values through this filter into Elasticsearch, nothing gets mapped, and I only get the whole log as-is in the message field.
Please let me know if I am doing something wrong.
Your Kibana screenshot isn't loading, but I'll take a guess: you're capturing patterns, but not naming the data into fields. Here's the difference:
%{TIME}
will look for that pattern in your data. The debugger will show "TIME" as having been parsed, but logstash won't create a field without being asked.
%{TIME:myTime}
will create the field (and you can see it working in the debugger).
You would need to do this for any matched pattern that you would like to save.
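Applied to the pattern from the question, a sketch with every capture named (the field names here are just illustrative picks):
filter {
  grok {
    match => { "message" => "%{URIHOST:log_date} %{TIME:log_time} %{SYSLOG5424SD:level} <%{BASE16FLOAT:pid}.0> <<%{QS:payload}>>" }
  }
}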

How to remove the json. prefix on my elasticsearch field

I use ELK to get some info on my rabbitmq stuff.
Here is my conf on the Logstash side:
json {
  source => "message"
}
But in Kibana I have to prefix all my fields with json.xxx:
json.sender, json.sender.raw, json.programId, json.programId.raw, ...
How can I get rid of this json. prefix in my field names, so that I just have sender, programId, etc.?
Best regards and thanks for your help!
Bonus question: what are all these .raw fields I must use in Kibana?
According to the doc:
By default it will place the parsed JSON in the root (top level) of
the Logstash event, but this filter can be configured to place the
JSON into any arbitrary event field, using the target configuration.
So it feels like your JSON is wrapped in a container named "json", or you're setting the "target" option in Logstash without showing us.
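For comparison, a sketch of the two variants, assuming the target option is indeed being set somewhere that wasn't shown:
filter {
  json {
    source => "message"
    target => "json"    # this produces json.sender, json.programId, ...
  }
}
filter {
  json {
    source => "message"    # no target: sender, programId, ... land at the root of the event
  }
}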
As for ".raw": the default Elasticsearch mapping will analyze the data you put into a field, turning "/var/log/messages" into three terms: [var, log, messages], which can make it hard to search for exact values. To keep you from having to worry about this at the beginning, the default Logstash index template creates a ".raw" version of each string field, which is not analyzed.
You'll eventually make your own mappings, and you can make the original field not_analyzed, so you won't need the .raw versions anymore.
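A sketch of such a mapping, written against the pre-5.x string type that the .raw convention comes from (the field name sender is only an example); with the original field not analyzed, there is no need for a .raw copy:
"properties": {
  "sender": {
    "type": "string",
    "index": "not_analyzed"
  }
}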
