How to create a map chart with GeoIP mapping? - elasticsearch

I'm fairly new to ELK (7.10), and I would like to know how to create a map chart using GeoIP mapping.
I already have logs parsed and one field is "remote_ip" which I want to view on a map chart.
I've seen lots of instructions on how to do this, but most are out of date and don't apply to my version (7.10). I'm using Filebeat/Logstash/Kibana/Elasticsearch.
Could someone show me the high level steps required to do this? Or point me to a detailed guide appropriate to my version? I have no idea how to begin.

I'm assuming those IP addresses are public, so they can be geocoded. Since your logs are already indexed, you'll need to geocode the existing documents. Here is how to do it.
First, you need to modify your mapping to add a geo_point field, like this:
PUT your-index/_mapping
{
  "properties": {
    "remote_location": {
      "type": "geo_point"
    }
  }
}
Once you've added that new field to your mapping, you can update your index to geocode the IP addresses. For that, you first need to create an ingest pipeline with the geoip processor:
PUT _ingest/pipeline/geoip
{
  "description": "Geocode IP address",
  "processors": [
    {
      "geoip": {
        "field": "remote_ip",
        "target_field": "remote_location"
      }
    }
  ]
}
Once this ingest pipeline is created you can use it to update your index using the _update_by_query endpoint like this:
POST your-index/_update_by_query?pipeline=geoip
Once the update is over, you can go into Kibana, create an index pattern and then go to Analytics > Maps and create your map.
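For completeness, the three steps above can be sketched as plain HTTP calls from Python. This is a minimal sketch: the cluster URL and index name are assumptions, so adjust them to your setup before running `run()` against a real cluster.

```python
# Sketch of the three steps: add a geo_point field to the mapping,
# create the geoip ingest pipeline, then re-run existing documents
# through it with _update_by_query. BASE is an assumed local cluster.
import json
import urllib.request

BASE = "http://localhost:9200"  # assumption: local unsecured cluster

MAPPING_BODY = {"properties": {"remote_location": {"type": "geo_point"}}}

PIPELINE_BODY = {
    "description": "Geocode IP address",
    "processors": [
        {"geoip": {"field": "remote_ip", "target_field": "remote_location"}}
    ],
}

def steps(index="your-index"):
    """Return the (method, path, body) calls in the order to run them."""
    return [
        ("PUT", f"/{index}/_mapping", MAPPING_BODY),
        ("PUT", "/_ingest/pipeline/geoip", PIPELINE_BODY),
        ("POST", f"/{index}/_update_by_query?pipeline=geoip", None),
    ]

def run(index="your-index"):
    """Send the calls to the cluster (requires a running ES instance)."""
    for method, path, body in steps(index):
        req = urllib.request.Request(
            BASE + path,
            data=None if body is None else json.dumps(body).encode(),
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(req) as resp:
            print(method, path, resp.status)
```

The ordering matters: the mapping and pipeline must exist before the update-by-query references them.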

Related

Remove ECS data from metricbeat for smaller documents

I use the graphite beat to get graphite protocol metrics into es.
The metric document is much bigger than the metric data itself (timestamp, value, metric name).
I also get all the ECS data inserted and I think it will make my queries much slower (and my documents much bigger) and I don't need this data.
Can I remove the ECS data somehow in the metricbeat configuration?
You might be able to use Metricbeat's drop_fields processor, but it may not be able to remove all the fields you specify, as some are added after the processor chain runs.
Acting on the ES side instead guarantees that you can change the event source the way you like. Also, if you have many Beats deployed, you only need to configure this in a single place.
One way to achieve this is to create an index template for Metricbeat events and attach an ingest pipeline to it.
PUT _index_template/my-template
{
  "index_patterns": [
    "metricbeat-*"
  ],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "metric-lifecycle"
        },
        "codec": "best_compression",
        "default_pipeline": "metric-pipeline"
      }
    },
    ...
Then the metric-pipeline would simply look like this and remove all the fields listed in the field array:
PUT _ingest/pipeline/metric-pipeline
{
  "processors": [
    {
      "remove": {
        "field": ["agent", "host", "..."]
      }
    }
  ]
}
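The effect of that remove processor on an event can be illustrated with a small sketch. The event fields below are made-up examples of ECS metadata; note that the real processor errors on a missing field unless you also set ignore_missing, whereas this sketch just skips absent fields.

```python
def remove_fields(doc, fields):
    """Mimic the ingest 'remove' processor: drop the listed top-level
    fields from the event. (The real processor fails on absent fields
    unless ignore_missing is true; here we simply skip them.)"""
    return {k: v for k, v in doc.items() if k not in fields}

# Example Metricbeat-style event with ECS metadata we don't need.
event = {
    "@timestamp": "2021-01-01T00:00:00Z",
    "metric_name": "cpu.load",
    "value": 0.42,
    "agent": {"type": "metricbeat"},
    "host": {"name": "node-1"},
}
slim = remove_fields(event, ["agent", "host"])
```

Only the metric itself survives, which is exactly what shrinks the stored documents.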

How to extract and visualize values from a log entry in OpenShift EFK stack

I have an OKD cluster setup with EFK stack for logging, as described here. I have never worked with one of the components before.
One deployment logs requests that contain a specific value that I'm interested in. I would like to extract just this value and visualize it with an area map in Kibana that shows the amount of requests and where they come from.
The content of the message field basically looks like this:
[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}
This plz is a German zip code, which I would like to visualize as described.
My problem here is that I have no idea how to extract this value.
A nice first success would be if I could find it with a regexp, but Kibana doesn't seem to work the way I think it does. Following its docs, I expect /\"plz\":\"[0-9]{5}\"/ to deliver the result, but I get 0 hits (the time interval is set correctly). And even if this regexp matched, it would only find the log entries containing the value, not the specific value itself. How do I go on from here?
I guess I also need an external geocoding service, but at which point would I include it? Or does Kibana itself know how to map zip codes to geometries?
A beginner-friendly step-by-step guide would be perfect, but I could settle for some hints that guide me there.
It would be possible to parse the message field as the document gets indexed into ES, using an ingest pipeline with a grok processor.
First, create the ingest pipeline like this:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    }
  ]
}
Then, when you index your data, you simply reference that pipeline:
PUT plz/_doc/1?pipeline=parse-plz
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
}
And you will end up with a document like the one below, which now has a field called plz with the 12345 value in it:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345"
}
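What the grok pattern does here is essentially a regex capture. A rough Python equivalent of %{POSINT:plz} on the sample message (keep in mind it grabs the first positive integer anywhere in the string, so on messages with other numbers a stricter pattern anchored on "plz" would be safer):

```python
import re

# Rough equivalent of grok's %{POSINT}: a positive integer with no
# leading zero, on a word boundary.
POSINT = re.compile(r"\b[1-9][0-9]*\b")

def parse_plz(message):
    """Mimic the grok processor: add a 'plz' field holding the first
    positive integer found in 'message', if any."""
    m = POSINT.search(message)
    return {"message": message, **({"plz": m.group(0)} if m else {})}

sample = ('[fooServiceClient#doStuff] {"somekey":"somevalue", '
          '"multivalue-key": {"plz":"12345", "foo": "bar"}, '
          '"someotherkey":"someothervalue"}')
doc = parse_plz(sample)
```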
When indexing your document from Fluentd, you can specify a pipeline to be used in the configuration. If you can't or don't want to modify your Fluentd configuration, you can also define a default pipeline for your index that will kick in every time a new document is indexed. Simply run this on your index and you won't need to specify ?pipeline=parse-plz when indexing documents:
PUT index/_settings
{
  "index.default_pipeline": "parse-plz"
}
If you have several indexes, a better approach might be to define an index template instead, so that whenever a new index called project.foo-something is created, the settings are going to be applied:
PUT _template/project-indexes
{
  "index_patterns": ["project.foo*"],
  "settings": {
    "index.default_pipeline": "parse-plz"
  }
}
Now, in order to map that PLZ on a map, you'll first need to find a data set that provides you with geolocations for each PLZ.
You can then add a second processor in your pipeline in order to do the PLZ/ZIP to lat,lon mapping:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.location = params[ctx.plz];",
        "params": {
          "12345": {"lat": 42.36, "lon": 7.33}
        }
      }
    }
  ]
}
Ultimately, your document will look like this and you'll be able to leverage the location field in a Kibana visualization:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345",
  "location": {
    "lat": 42.36,
    "lon": 7.33
  }
}
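The script processor is really just a dictionary lookup. Sketched in Python with the same made-up coordinates (a real table would come from a German zip-code dataset):

```python
# PLZ -> lat/lon lookup table, mirroring the pipeline's script params.
# The coordinates are placeholders, not real geodata.
PLZ_LOCATIONS = {
    "12345": {"lat": 42.36, "lon": 7.33},
}

def add_location(doc):
    """Mimic the painless script: ctx.location = params[ctx.plz],
    leaving the document untouched when the PLZ is unknown."""
    loc = PLZ_LOCATIONS.get(doc.get("plz"))
    if loc is not None:
        doc = {**doc, "location": loc}
    return doc

doc = add_location({"message": "...", "plz": "12345"})
```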
So to sum it all up, it boils down to just two things:
Create an ingest pipeline to parse documents as they get indexed
Create an index template for all project* indexes whose settings include the pipeline created in step 1

Use number field as date in Kibana date_histogram aggregation

I'm trying to use the Visualize feature in Kibana to plot a monthly date_histogram graph that counts the number of messages in my system. The message type has a sent_at field that is stored as a number (epoch time).
Although I can do that just fine with an elasticsearch query
POST /_all/message/_search?size=0
{
  "aggs": {
    "monthly_message": {
      "date_histogram": {
        "field": "sent_at",
        "interval": "month"
      }
    }
  }
}
I ran into a problem in Kibana saying No Compatible Fields: The "myindex" index pattern does not contain any of the following field types: date
Is there a way to get Kibana to use number field as date?
Not to my knowledge. Kibana uses the index mapping in order to find date fields; if no date fields can be found, Kibana won't be able to infer one from the other number fields.
What you can do is add another field called sent_at_date to your mapping, then use the update-by-query API to copy the sent_at field into that new field, and finally recreate your index pattern in Kibana.
It goes basically like this:
# 1. add a new field to your mapping
PUT myindex/_mapping/message
{
  "properties": {
    "sent_at_date": {
      "type": "date"
    }
  }
}

# 2. update all your documents
POST myindex/_update_by_query
{
  "script": {
    "source": "ctx._source.sent_at_date = ctx._source.sent_at"
  }
}
And finally recreate your index pattern in Kibana. You should see a new field called sent_at_date of type date that you can use in Kibana.
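The reason copying the raw number is enough is that date fields accept epoch milliseconds out of the box. A small sketch, assuming sent_at really is epoch milliseconds (if yours is epoch seconds, the date mapping would need an explicit epoch_second format instead):

```python
from datetime import datetime, timezone

def copy_sent_at(doc):
    """Mimic the update-by-query script: copy the numeric sent_at into
    sent_at_date; the new date mapping interprets it as epoch millis."""
    return {**doc, "sent_at_date": doc["sent_at"]}

def as_utc(epoch_ms):
    """How the value reads once it's treated as a date."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

doc = copy_sent_at({"sent_at": 1609459200000})  # 2021-01-01T00:00:00Z
```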

Specifying Field Types Indexing from Logstash to Elasticsearch

I have successfully ingested data using the XML filter plugin from Logstash to Elasticsearch, however all the field types are of the type "text."
Is there a way to manually or automatically specify the correct type?
I found the following technique good for my use:
Logstash can filter the data and convert a field from the default (text) to whatever type you want. The documentation can be found here. The example given in the documentation is:
filter {
  mutate {
    convert => { "fieldname" => "integer" }
  }
}
You add this in the filter section of your /etc/logstash/conf.d/02-... file. The downside of this practice, as I understand it, is that altering data as it enters ES is generally discouraged.
After you do this you will probably run into this problem. If you hit it and your DB is a test DB whose old data you can erase, just DELETE the index so that there is no conflict (for example, a field that was text until now and is suddenly received as a date creates a conflict between old and new data). If you can't erase the old data, read the answer in the link above.
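What the mutate/convert filter does to an event can be sketched like this (the field names and the set of supported target types here are illustrative, not Logstash's full list):

```python
def mutate_convert(event, conversions):
    """Mimic Logstash's mutate 'convert' option: cast the named fields
    to the requested type, leaving all other fields untouched."""
    casts = {
        "integer": int,
        "float": float,
        "string": str,
        # Logstash treats "true"/"t"/"1" (and friends) as true.
        "boolean": lambda v: str(v).lower() in ("true", "t", "1"),
    }
    out = dict(event)
    for field, target in conversions.items():
        if field in out:
            out[field] = casts[target](out[field])
    return out

event = mutate_convert({"fieldname": "42", "other": "x"},
                       {"fieldname": "integer"})
```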
What you want to do is specify a mapping template.
PUT _template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z YYYY"
        }
      }
    }
  }
}
Change the settings to match your needs, such as listing the properties and the types you want them mapped to.
Setting index_patterns is especially important because it tells Elasticsearch how to apply this template. You can set an array of index patterns and use * as a wildcard where appropriate. For example, Logstash's default is to rotate indexes by date; they will look like logstash-2018.04.23, so your pattern could be logstash-*, and any index that matches the pattern will receive the template.
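The wildcard matching works like shell globbing, so it can be sketched with Python's fnmatch (close enough for * patterns; ES index patterns only support *, while fnmatch also understands ? and [...]):

```python
from fnmatch import fnmatch

def template_applies(index_name, patterns):
    """True if any of the template's index_patterns matches the new
    index name; same *-wildcard behavior as Elasticsearch templates."""
    return any(fnmatch(index_name, p) for p in patterns)

hit = template_applies("logstash-2018.04.23", ["logstash-*"])
miss = template_applies("metricbeat-2018.04.23", ["logstash-*"])
```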
If you want to match based on some pattern, then you can use dynamic templates.
Edit: Adding a little update here: if you want Logstash to apply the template for you, here is a link to the settings you'll want to be aware of.

how to add geo_point type data to elasticsearch from logstash?

I would like to add some custom geo search functions to my program (not GeoIP, which translates an IP address into coordinates). How do I get custom lat and lng data into Elasticsearch's geo_point format so that I can visualize it on a Kibana tile map?
So, as you may have found out, there is a (somewhat clunky) solution.
Basically, you need to set the mapping of the geo_point field before you can log data that way (I also used the ES Python module directly instead of logging via Logstash, just to be sure).
So how do you set the correct mapping?
Make sure you use a fresh instance of Elasticsearch (or at least that the mapping for both the index and the type you will use is not set yet).
Run this from Sense (or use the appropriate curl command):
PUT <index_name>
{
  "mappings": {
    "<type_name>": {
      "properties": {
        "timestamp": {
          "type": "date"
        },
        "message": {
          "type": "string"
        },
        "location": {
          "type": "geo_point"
        }
        <etc.>
      }
    }
  }
}
Now you're golden; just make sure that your geo_points are in a format that ES accepts.
More on mapping geo_points here:
ElasticSearch how to setup geo_point
and here:
https://discuss.elastic.co/t/geo-point-logging-from-python-to-elasticsearch/37336
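ES accepts geo_points in a few shapes: an object with lat/lon keys, a "lat,lon" string, or a [lon, lat] array (GeoJSON order, i.e. reversed relative to the string form). A small normalizer sketch covering those three shapes:

```python
def normalize_geo_point(value):
    """Normalize the common geo_point input shapes into a
    {"lat": ..., "lon": ...} object. Note the array form is
    [lon, lat] (GeoJSON order), the reverse of the string form."""
    if isinstance(value, dict):
        return {"lat": float(value["lat"]), "lon": float(value["lon"])}
    if isinstance(value, str):
        lat, lon = value.split(",")
        return {"lat": float(lat), "lon": float(lon)}
    if isinstance(value, (list, tuple)):
        lon, lat = value
        return {"lat": float(lat), "lon": float(lon)}
    raise TypeError("unsupported geo_point shape")

p1 = normalize_geo_point("41.12,-71.34")
p2 = normalize_geo_point([-71.34, 41.12])
```

Mixing up the array order is the classic mistake here, since it silently puts points in the wrong hemisphere rather than failing.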
