Remove ECS data from metricbeat for smaller documents - elasticsearch

I use the Graphite beat to get Graphite protocol metrics into Elasticsearch.
The metric document is much bigger than the metric data itself (timestamp, value, metric name).
I also get all the ECS data inserted, which I don't need, and I think it will make my queries slower and my documents much bigger.
Can I remove the ECS data somehow in the Metricbeat configuration?

You might be able to use Metricbeat's drop_fields processor, but it may not be able to remove all the fields you specify, since some are added after the processor chain runs.
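A minimal metricbeat.yml sketch of that approach (the field list is illustrative, and note that drop_fields cannot drop @timestamp or type):
processors:
  - drop_fields:
      # illustrative list; some fields (e.g. under agent) may still be re-added after processors run
      fields: ["host", "ecs", "agent"]
      ignore_missing: true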
Acting on the Elasticsearch side, on the other hand, guarantees that you can shape the event the way you like. Also, if you have many Beats deployed, you only need to configure this in a single place.
One way to achieve this is to create an index template for Metricbeat events and attach an ingest pipeline to it.
PUT _index_template/my-template
{
  "index_patterns" : [
    "metricbeat-*"
  ],
  "template" : {
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "metric-lifecycle"
        },
        "codec" : "best_compression",
        "default_pipeline" : "metric-pipeline"
      }
    },
    ...
The metric-pipeline would then simply look like this, removing all the fields listed in the field array:
PUT _ingest/pipeline/metric-pipeline
{
  "processors": [
    {
      "remove": {
        "field": ["agent", "host", "..."]
      }
    }
  ]
}
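Before attaching it to the template, you can dry-run the pipeline against a sample event with the simulate API. The document below is a made-up stand-in, and the "..." placeholder in the pipeline must first be replaced with real field names (or the processor given "ignore_missing": true), otherwise the remove processor will fail:
// hypothetical sample event; only "agent" and "host" exist here to be removed
POST _ingest/pipeline/metric-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2021-06-01T00:00:00Z",
        "graphite": { "example": { "metric": 42 } },
        "agent": { "type": "metricbeat" },
        "host": { "name": "some-host" }
      }
    }
  ]
}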

Related

Conditional indexing in metricbeat using Ingest node pipeline creates a datastream

I am trying to achieve conditional indexing for namespaces in Elasticsearch using ingest node pipelines. I used the pipeline below, but the index that gets created when I add the pipeline in metricbeat.yml is in the form of a data stream.
PUT _ingest/pipeline/sample-pipeline
{
  "processors": [
    {
      "set": {
        "field": "_index",
        "copy_from": "metricbeat-dev",
        "if": "ctx.kubernetes?.namespace==\"dev\"",
        "ignore_failure": true
      }
    }
  ]
}
The expected index name is metricbeat-dev, but the value I am getting in _index is .ds-metricbeat-dev.
This works fine when I test it with one document, but when I implement it in the yml file I get an index name starting with .ds-. Why is this happening?
Update for the template:
{
  "metricbeat" : {
    "order" : 1,
    "index_patterns" : [
      "metricbeat-*"
    ],
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "metricbeat",
          "rollover_alias" : "metricbeat-metrics"
        },
If you have data streams enabled in the index templates, there is the potential to create a data stream. This depends on how you configure the priority: if no priority is mentioned, a legacy index is created, but if the index template enables data streams and has a priority higher than 100, a data stream is created (legacy templates have priority 100, so use a priority above 100 if you do want the index in the form of a data stream).
If it creates a data stream and that is not expected, check whether there is a template pointing to the index you are writing to that has data streams enabled. This was the reason in my case.
I have been working with this for a few months and this is what I have observed.
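One way to check this (on versions with composable templates, 7.8+) is to list the index templates matching your index name and look at each of them; the template name pattern below is just a guess:
// look for a "data_stream" section and the "priority" in each matching template
GET _index_template/metricbeat*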

How to create a map chart with GeoIP mapping?

I'm fairly new to ELK (7.10), and I would like to know how to create a map chart using GeoIP mapping.
I already have logs parsed and one field is "remote_ip" which I want to view on a map chart.
I've seen lots of instructions on how to do this, but most are out of date and do not apply to my version, which is 7.10. I'm using Filebeat/Logstash/Kibana/Elasticsearch.
Could someone show me the high level steps required to do this? Or point me to a detailed guide appropriate to my version? I have no idea how to begin.
I'm assuming those IP addresses are public so they can be geocoded. Since your logs are already indexed, you now need to geocode the existing documents. Here is how to do it.
First, you need to modify your mapping to add a geo_point field, like this:
PUT your-index/_mapping
{
  "properties": {
    "remote_location": {
      "type": "geo_point"
    }
  }
}
Once you've added that new field to your mapping, you can update your index to geocode the IP addresses. For that, you first need to create an ingest pipeline with the geoip processor:
PUT _ingest/pipeline/geoip
{
  "description" : "Geocode IP address",
  "processors" : [
    {
      "geoip" : {
        "field" : "remote_ip",
        "target_field": "remote_location"
      }
    }
  ]
}
Once this ingest pipeline is created you can use it to update your index using the _update_by_query endpoint like this:
POST your-index/_update_by_query?pipeline=geoip
Once the update is over, you can go into Kibana, create an index pattern and then go to Analytics > Maps and create your map.
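If new log documents keep arriving from Logstash into the same index, you can also make that pipeline the index's default pipeline so future documents get geocoded at index time; a minimal sketch reusing the names from above:
// every new document indexed into your-index will now run through the geoip pipeline
PUT your-index/_settings
{
  "index.default_pipeline": "geoip"
}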

Can you set a field to not_analyzed in an auto created index in Elasticsearch?

As part of our AWS infrastructure, I am using an Elasticsearch (7.4) index. We use Terraform to create the domain in AWS Elasticsearch but we don't create the index explicitly. Instead when the first document is posted, the index is auto-created. This worked well, but now I have been given the requirement to have a non analyzed field (user id).
Is there a way to make a field not_analyzed when putting the first document?
If there is not, what are my options to set the field to not_analyzed? Should I do some sort of init/bootstrapping? Maybe there is a way to do it from Terraform. The application is built using Chalice and runs in Lambda. I'm not sure how to do initialization in Lambda in that case. Ideally I would fire this call a single time:
PUT /my_index
{
  "mappings" : {
    "properties" : {
      "user_id" : {
        "type" : "string",
        "index" : "not_analyzed"
      }
    }
  }
}
When restarting the application, this call would be sent again, but I guess it's immutable (PUT).
This might be overkill, but I would consider using the index template feature.
It may look like this:
PUT _index_template/template_1
{
  "index_patterns": [
    "my_template*"
  ],
  "template": {
    "mappings": {
      "properties": {
        "user_id" : {
          "type" : "keyword"
        }
      }
    }
  },
  "priority": 1
}
It can be terraformed using a dedicated provider, which also integrates directly with AWS Elasticsearch using IAM keys.
The first document indexed that way will then also create the index using the given template (provided, of course, that the index name matches the pattern).
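To sanity-check the template, you could index a first document into an index whose name matches the pattern and then inspect the mapping that was created; the index name and document below are hypothetical:
// hypothetical index name matching the my_template* pattern
PUT my_template-test/_doc/1
{
  "user_id": "u-123"
}

// the auto-created index should now map user_id as a keyword
GET my_template-test/_mapping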
This doesn't directly answer your question, but for your problem I would suggest a solution outside Elasticsearch:
Provision a second Lambda function in Terraform that is permitted to run PUT operations against Elasticsearch and whose sole purpose is creating your index.
In Terraform, invoke this lambda function once the domain is created
In other words, perform the bootstrapping mentioned in your question, but move it to a separate lambda function instead of it being mixed into your application lambda.
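For illustration, that bootstrap Lambda could issue a single idempotent index-creation request along these lines (the index name and mapping are placeholders); re-running it returns a resource_already_exists_exception, which the function can treat as success:
// hypothetical index name; adjust the mapping to your schema
PUT /my_index
{
  "mappings": {
    "properties": {
      "user_id": { "type": "keyword" }
    }
  }
}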

How to extract and visualize values from a log entry in OpenShift EFK stack

I have an OKD cluster set up with an EFK stack for logging, as described here. I have never worked with one of the components before.
One deployment logs requests that contain a specific value that I'm interested in. I would like to extract just this value and visualize it with an area map in Kibana that shows the amount of requests and where they come from.
The content of the message field basically looks like this:
[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}
This plz is a German zip code, which I would like to visualize as described.
My problem here is that I have no idea how to extract this value.
A nice first success would be if I could find it with a regexp, but Kibana doesn't seem to work the way I think it does. Following its docs, I expect /\"plz\":\"[0-9]{5}\"/ to deliver the result, but I get 0 hits (the time interval is set correctly). And even if this regexp matched, I would only find the log entries that contain it, not the specific value itself. How do I proceed from here?
I guess I also need an external geocoding service, but at which point would I include it? Or does Kibana itself know how to map zip codes to geometries?
A beginner-friendly step-by-step guide would be perfect, but I could settle for some hints that guide me there.
It would be possible to parse the message field as the document gets indexed into ES, using an ingest pipeline with a grok processor.
First, create the ingest pipeline like this:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    }
  ]
}
Then, when you index your data, you simply reference that pipeline:
PUT plz/_doc/1?pipeline=parse-plz
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
}
And you will end up with a document like the one below, which now has a field called plz with the 12345 value in it:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345"
}
When indexing your document from Fluentd, you can specify a pipeline to be used in the configuration. If you can't or don't want to modify your Fluentd configuration, you can also define a default pipeline for your index that will kick in every time a new document is indexed. Simply run this on your index and you won't need to specify ?pipeline=parse-plz when indexing documents:
PUT index/_settings
{
  "index.default_pipeline": "parse-plz"
}
If you have several indexes, a better approach might be to define an index template instead, so that whenever a new index called project.foo-something is created, the settings are going to be applied:
PUT _template/project-indexes
{
  "index_patterns": ["project.foo*"],
  "settings": {
    "index.default_pipeline": "parse-plz"
  }
}
Now, in order to show that PLZ on a map, you'll first need to find a data set that provides you with geolocations for each PLZ.
You can then add a second processor in your pipeline in order to do the PLZ/ZIP to lat,lon mapping:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.location = params[ctx.plz];",
        "params": {
          "12345": {"lat": 42.36, "lon": 7.33}
        }
      }
    }
  ]
}
Ultimately, your document will look like this and you'll be able to leverage the location field in a Kibana visualization:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345",
  "location": {
    "lat": 42.36,
    "lon": 7.33
  }
}
So to sum it all up, it boils down to just two things:
Create an ingest pipeline to parse documents as they get indexed
Create an index template for all project* indexes whose settings include the pipeline created in step 1

Get elasticsearch indices before specific date

My Logstash service sends the logs to Elasticsearch as daily indices.
elasticsearch {
  hosts => [ "127.0.0.1:9200" ]
  index => "%{type}-%{+YYYY.MM.dd}"
}
Does Elasticsearch provide an API to look up the indices created before a specific date?
For example, how could I get the indices created before 2015-12-15 ?
The only time I really care about which indices were created is when I want to close/delete them using Curator. Curator has age-based filtering built in, if that's also your use case.
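If Curator fits your use case, a sketch of an age filter in a Curator (4+) action file might look like this; the 30-day retention is a placeholder, and you would normally add a pattern filter to scope which indices are considered:
actions:
  1:
    action: delete_indices
    description: "Delete daily indices older than the retention period, based on the date in the index name"
    options:
      ignore_empty_list: True
    filters:
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 30   # placeholder retention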
I think you are looking for the indices query; have a look here.
Here is an example:
GET /_search
{
  "query": {
    "indices" : {
      "query": {
        "term": {"description": "*"}
      },
      "indices" : ["2015-01-*", "2015-12-*"],
      "no_match_query": "none"
    }
  }
}
Each index has a creation_date in its settings.
Since the number of indices is expected to be quite small, there is no such feature as 'searching for indices'. You just get their metadata and filter them inside your app. The creation_date is also available via the _cat API.
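For example, on recent versions the _cat API can list each index together with its creation date, which you can then filter against your cutoff on the client side (column names as per the _cat/indices docs):
// creation.date is epoch millis; creation.date.string is human-readable
GET _cat/indices?v&h=index,creation.date,creation.date.string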
