ELK Stack - Customize autogenerated field mappings - elasticsearch

I've got a very basic ELK stack setup and I am passing logs to it via syslog. I have used built-in grok patterns to split the logs into fields, but the field mappings are auto-generated by the Logstash elasticsearch output plugin and I am unable to customize them.
For instance, I create a new field by name "dst-geoip" using logstash config file (see below):
geoip {
  database => "/usr/local/share/GeoIP/GeoLiteCity.dat" ### Change me to location of GeoLiteCity.dat file
  source => "dst_ip"
  target => "dst_geoip"
  fields => [ "ip", "country_code2", "country_name", "latitude", "longitude", "location" ]
  add_field => [ "coordinates", "%{[dst_geoip][latitude]},%{[dst_geoip][longitude]}" ]
  add_field => [ "dst_country", "%{[dst_geoip][country_code2]}" ]
  add_field => [ "flow_dir", "outbound" ]
}
I want to assign it the type "geo_point", which I cannot edit from Kibana. Online documentation mentions manually updating the mapping on the respective index using the Elasticsearch APIs, but Logstash generates many indices (one per day). If I update one index, will the mapping stay the same in future indices?

What you're looking for is a "template".
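An index template tells Elasticsearch which mapping to apply to every index whose name matches a pattern, so all future daily Logstash indices pick it up automatically (existing indices are not changed). A rough sketch, assuming daily indices named logstash-YYYY.MM.dd and that you want [dst_geoip][location] mapped as geo_point; the exact body depends on your Elasticsearch version, and older versions expect "template" instead of "index_patterns" plus a mapping type level:
PUT _template/logstash_geoip
{
  "index_patterns": ["logstash-*"],
  "mappings": {
    "properties": {
      "dst_geoip": {
        "properties": {
          "location": { "type": "geo_point" }
        }
      }
    }
  }
}
Alternatively, the Logstash elasticsearch output can install a template for you via its template, template_name and template_overwrite options.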

Related

After adding Prune filter along with KV filter - logs are not going to Elasticsearch

I am learning ELK and doing a POC for my project. I am applying a KV filter to sample integration logs from my project, and since a lot of extra fields show up as a result, I applied a prune filter and whitelisted certain fields. I can see the logs getting printed on the Logstash server, but they are not going to Elasticsearch. If I remove the prune filter, they do reach Elasticsearch. Please advise how to debug this issue further.
filter {
  kv {
    field_split => "{},?\[\]"
    transform_key => "capitalize"
    transform_value => "capitalize"
    trim_key => "\s"
    trim_value => "\s"
    include_brackets => false
  }
  prune {
    whitelist_names => [ "App_version", "Correlation_id", "Env", "Flow_name", "host", "Instance_id", "log_level", "log_thread", "log_timestamp", "message", "patient_id", "status_code", "type", "detail" ]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "mule-%{+YYYY.MM.dd}"
    #user => "elastic"
    #password => "changeme"
  }
  stdout { codec => rubydebug }
}
I also need two more suggestions.
I am also trying to use a grok filter on the initial logs to take the log-level fields (time and log type) from the sample log and send the remaining text to the KV filter. If there is any reference for this, please share it. This is what I have tried, but I am getting a _grokparsefailure. I have passed msgbody to the kv filter with the source option.
grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:loglevel}\s+%{GREEDYDATA:msgbody}" }
  overwrite => [ "msgbody" ]
}
I have message fields inside the sample logs as shown below. When the data goes to Kibana I can see two message field tags, one with the full log and the other with the correct message (highlighted). Will mutate work for this case? Is there any way to rename the full-log field to something else?
[2020-02-10 11:20:07.172] INFO Mule.api [[MuleRuntime].cpuLight.04:
[main-api-test].api-main.CPU_LITE #256c5cf5:
[main-api-test].main-api-main/processors/0/processors/0.CPU_LITE
#378f34b0]: event:00000003 {app_name=main-api-main, app_version=v1,
env=Test, timestamp=2020-02-10T11:20:07.172Z,
log={correlation_id=00000003, patient_id=12345678,
instance_id=hospital, message=Start of System API,
flow_name=main-api-main}}
prune filter error
Your prune filter does not have the @timestamp field in its whitelist_names list. Your output index is date based (%{+YYYY.MM.dd}), and Logstash needs the @timestamp field in the event to extract the date.
I ran your pipeline with your sample message and it worked as expected: with the prune filter the message is sent to Elasticsearch, but it is stored in an index named mule- without any datetime field.
Without the prune filter your message uses the time when Logstash received the event as the @timestamp, since you do not have any date filter to change it.
If you created the index pattern for mule-* with a datetime field like @timestamp, you won't see in Kibana any documents from an index that doesn't have that datetime field.
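A minimal fix, assuming you want to keep the prune filter, is to add @timestamp (and @version, if you use it) to the whitelist; whitelist_names entries are treated as patterns matched against field names:
prune {
  whitelist_names => [ "@timestamp", "@version", "App_version", "Correlation_id", "Env", "Flow_name", "host", "Instance_id", "log_level", "log_thread", "log_timestamp", "message", "patient_id", "status_code", "type", "detail" ]
}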
grok error
Your grok is wrong; you need to escape the square brackets surrounding your timestamp. Kibana has a grok debugger where you can try your patterns.
The following grok works. Move your kv filter to run after the grok, with msgbody as its source.
grok {
  match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\]\s+%{LOGLEVEL:loglevel}\s+%{GREEDYDATA:msgbody}" }
  overwrite => [ "msgbody" ]
}
kv {
  source => "msgbody"
  field_split => "{},?\[\]"
  transform_key => "capitalize"
  transform_value => "capitalize"
  trim_key => "\s"
  trim_value => "\s"
  include_brackets => false
}
Run it with output only to stdout first to see the resulting fields, then adjust your prune filter accordingly.
duplicated message fields
If you put your kv filter after the grok you won't have duplicated message fields: since your kv filter is capitalizing your fields, you will end up with a message field containing your full log and a Message field containing your internal message (Logstash field names are case sensitive).
However, you can rename any field using the mutate filter.
mutate {
  rename => { "message" => "fullLogMessage" }
}

Elastic document's @version not incrementing when updating via Logstash

I want to load the issue data from a JIRA instance into my Elastic Stack on a regular basis. I don't want to create a new Elasticsearch document every time I pull the data from the JIRA API, but instead update the existing document, which means there should only be one document per JIRA issue. When updating, I would expect the @version field to increment automatically, since I set the document_id field of the elasticsearch output plugin.
Currently working setup
Elastic Stack: Version 7.4.0 running on Ubuntu in Docker containers
Logstash Input stage: get the JIRA issue data via http_poller input plugin
Logstash Filter stage: use the split filter plugin to modify the JSON data as needed
Logstash Output stage: pipe the data to Elasticsearch and make it visible in Kibana
Where I am struggling
The data is correctly registered in Elasticsearch and shown in Kibana. As expected, there is one document per issue. However, the document is being overwritten while @version stays at value 1. I assumed that using action => "update", doc_as_upsert => true and document_id => "%{[@metadata][id]}" would be enough to make Elasticsearch realize that it needs to increment the version of the document.
I am also wondering in general whether this is the correct approach to make the JIRA issue data searchable over time. For example, will I be able to find the state of a JIRA ticket at a past @version? Or will the @version value only tell me how often the document was updated, without giving me the individual versions' values?
logstash.conf (certain data was removed and replaced with <> tags)
input {
  http_poller {
    urls => {
      data => {
        method => get
        url => "https://<myjira>.com/jira/rest/api/2/search?<searchJQL>"
        headers => {
          Authorization => "Basic <censored>"
          Accept => "application/json"
          "Content-Type" => "application/json"
        }
      }
    }
    request_timeout => 60
    schedule => { every => "10s" } # low value for debugging
    codec => "json"
  }
}
filter {
  split {
    field => "issues"
    add_field => {
      "key" => "%{[issues][key]}"
      "Summary" => "%{[issues][fields][summary]}"
      "[@metadata][id]" => "%{[issues][id]}" # unique ID of a JIRA issue, the JIRA issue key could also be used
    }
    remove_field => [ "startAt", "total", "maxResults", "expand", "issues" ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => "gsep"
    user => ["<usr>"]
    password => ["<pw>"]
    hosts => ["elasticsearch:9200"]
    action => "update"
    document_id => "%{[@metadata][id]}"
    doc_as_upsert => true
  }
}
Screenshots from Document Data in Kibana
I had to censor some information, but it should not be relevant. On the screenshot you can see that the _id is correctly set, but @version stays at 1. Only this one document exists in Elasticsearch/Kibana for the respective issue/_id.
The @version field comes from Logstash and is just an indicator of the version of your log message format. There is no auto-increment functionality.
Please note that there is also a _version field in Elasticsearch documents.
_version is an automatically incremented value used for optimistic locking in concurrency scenarios.
Just to be clear, Elasticsearch can't give you what you are expecting in terms of versioning out of the box. You can't access a previous version of the same document via _version. There are design patterns for how to implement such a document history in Elasticsearch, but that's a broad question with many answers and out of scope here.
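As a quick check, you can watch _version increase by fetching the document directly (a sketch; the id below is hypothetical, use one of your JIRA issue ids):
GET gsep/_doc/10001
The response contains a "_version" value that goes up every time the update from Logstash actually changes the document (by default, identical updates are treated as no-ops and leave the version unchanged).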

Logstash filter to identify address matches

I have a CSV file with customer addresses. I also have an Elasticsearch index with my own addresses. I use Logstash to import the CSV file. I'd like to use a Logstash filter to check in my index whether the customer address already exists. All I found is the default elasticsearch filter ("Copies fields from previous log events in Elasticsearch to current events"), which doesn't look like the right one to solve my problem. Does another filter exist for this?
Here is my configuration file so far:
input {
  file {
    path => "C:/import/Logstash/customer.CSV"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    columns => [
      "Customer",
      "City",
      "Address",
      "State",
      "Postal Code"
    ]
    separator => ";"
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "customer-unmatched"
  }
  stdout {}
}
You don't normally have access to data already in Elasticsearch while Logstash is processing an event. Consider using an ingest pipeline on an Elasticsearch ingest node instead.
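A rough sketch of that approach, assuming Elasticsearch 7.5 or later and that you have already created and executed an enrich policy (hypothetically named existing-addresses-policy) over your own address index, with the address as the match field (enrich lookups are exact term matches):
PUT _ingest/pipeline/match-customer-address
{
  "processors": [
    {
      "enrich": {
        "policy_name": "existing-addresses-policy",
        "field": "Address",
        "target_field": "matched_address",
        "ignore_missing": true
      }
    }
  ]
}
You can then point the elasticsearch output at it with pipeline => "match-customer-address"; documents that come back with a matched_address field already exist in your own index.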

Remove header fields generated by http input plugin

When I use the http input plugin, Logstash adds the following fields to Elasticsearch:
headers.http_accept
headers.content_type
headers.request_path
headers.http_version
headers.request_method
...
How can I remove all these fields starting with headers.?
Since these are all pathed, they are all hierarchical under [headers] as far as the Logstash configs go. This will probably do wonders for you:
filter {
  mutate {
    remove_field => [ "headers" ]
  }
}
This should drop the [headers][http_accept], [headers][content_type], and so on fields.
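If you would rather keep some headers and drop only specific ones, you can remove the nested fields individually instead (a sketch; list whichever header fields you don't want):
filter {
  mutate {
    remove_field => [ "[headers][http_accept]", "[headers][http_version]", "[headers][request_method]" ]
  }
}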

Plot a Tile map with the ELK stack

I'm trying to create a tile map with Kibana. My Logstash conf file works correctly and generates everything Kibana needs to plot a tile map. This is my Logstash conf:
input {
  file {
    path => "/home/ec2-user/part.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["kilo_bytes_total","ip","session_number","request_number_total","duration_minutes_total","referer_list","filter_match_count_avg","request_number_avg","duration_minutes_avg","kilo_bytes_avg","segment_duration_avg","req_by_minute_avg","segment_mix_rank_avg","offset_avg_avg","offset_std_avg","extrem_interval_count_avg","pf0_avg","pf1_avg","pf2_avg","pf3_avg","pf4_avg","code_0_avg","code_1_avg","code_2_avg","code_3_avg","code_4_avg","code_5_avg","volume_classification_filter_avg","code_classification_filter_avg","profiles_classification_filter_avg","strange_classification_filter_avg"]
  }
  geoip {
    source => "ip"
    database => "/home/ec2-user/logstash-5.2.0/GeoLite2-City.mmdb"
    target => "geoip"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    add_tag => "geoip"
  }
  mutate {
    convert => [ "[geoip][coordinates]", "float" ]
  }
}
output {
  elasticsearch {
    index => "geotrafficip"
  }
}
And this is what it generates:
It looks cool. But when I try to create my tile map, I get this message:
What should I do?
It seems that I must enable the use of dynamic templates somewhere. Should I create a template and add it to my Logstash conf file?
Can anybody give me some feedback? Thanks!
If you look at the Kibana settings for your index, you'll see that you need at least one field with a type of geo_point to be able to get anything on a map.
If you don't already have a geo_point field, you'll need to re-index your data after setting up an appropriate mapping for the geoip.coordinates field. For example: https://stackoverflow.com/a/42004303/2785358
If you are using Elasticsearch 2.3 or later, re-indexing is relatively easy: create a new index with the correct mapping, use the reindex API to copy the data into it, delete the original index, and then reindex back to the original name.
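A rough sketch of those steps (assuming the index is named geotrafficip and geoip.coordinates is the field you want as geo_point; older Elasticsearch versions nest the mapping under a document type):
PUT geotrafficip-fixed
{
  "mappings": {
    "properties": {
      "geoip": {
        "properties": {
          "coordinates": { "type": "geo_point" }
        }
      }
    }
  }
}
POST _reindex
{
  "source": { "index": "geotrafficip" },
  "dest": { "index": "geotrafficip-fixed" }
}
After verifying the copy, delete the original index and, if you want the old name back, reindex again in the other direction (or simply point Kibana at the new index).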
You are using the geoip filter wrong and trying to convert the longitude and latitude to float yourself. Get rid of your mutate filter and change the geoip filter to this:
geoip {
  source => "ip"
  fields => ["latitude","longitude"]
  add_tag => "geoip"
}
This will create the appropriate fields and the required GeoJSON object.
