Getting multiple fields from message in filebeat and logstash - elasticsearch

I am writing logs to a log file from my Django app, and from there I am shipping them to Elasticsearch. Because I also want to split the fields, I am using Logstash between Filebeat and Elasticsearch.
Here is a sample log entry:
2019-03-19 13:39:06 logfile INFO save_data {'field1': None, 'time':
'13:39:06', 'mobile': '9876543210', 'list_item': "[{'item1': 10,
'item2': 'path/to/file'}]", 'response': '{some_complicated_json}}',
'field2': 'some data', 'date': '19-03-2019', 'field3': 'some other
data'}
I tried to write a grok match pattern, but all the fields are going into the message field:
%{TIMESTAMP_ISO8601:temp_date}%{SPACE} %{WORD:logfile} %{LOGLEVEL:level} %{WORD:save_data} %{GREEDYDATA:message}
How can I write a grok match pattern that decomposes the above log entry?

I don't know how you could do this with grok, but the way we do it is with a json processor in an Elasticsearch ingest node pipeline. Something like this:
{
  "my-log-pipeline": {
    "description": "My log pipeline",
    "processors": [{
      "json": {
        "field": "message",
        "target_field": "messageFields"
      }
    }]
  }
}
Then you just need to tell your source (filebeat/logstash) to use this pipeline when ingesting.
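For instance, a minimal sketch of the Filebeat side (assuming Filebeat ships straight to Elasticsearch and the pipeline was registered as my-log-pipeline; the host is a placeholder):
# filebeat.yml (sketch): route events through the ingest pipeline
output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: "my-log-pipeline"   # name of the ingest pipeline defined above
If the events go through Logstash instead, the Logstash elasticsearch output plugin has an equivalent pipeline => "my-log-pipeline" option.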

Related

Create custom grok pattern for message field in elasticsearch

I have a question related to the grok processor.
For example, this is my message field:
{
  "message": "agentId:agent003"
}
I want to grok this, and my output should be something like this:
{
  "message": "agentId:agent003",
  "agentId": "agent003"
}
Could someone help me with how to achieve this? If I am able to do it for one field, I can manage the rest of my fields. Thanks in advance.
This is the pipeline I have created in Elasticsearch:
PUT _ingest/pipeline/dissectpipeline
{
  "description": "split message content",
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": "%{apm_application_message.agentId}:%{apm_application_message.agentId}"
      }
    }
  ]
}
Through Filebeat central management I also added this other config to the Filebeat module:
- pipeline:
    if: ctx.first_char == '{'
    name: '{< IngestPipeline "dissectpipeline" >}'
There is no error from Filebeat and it's working fine, but I am unable to find any field like apm_application_message.agentId in the index.
How can I check whether my pipeline is working or not? Also, if I am doing something wrong, please let me know.
Instead of grok, I'd suggest using the dissect filter, which is more intuitive and easier to use. With the pattern below, the token before the colon becomes the field name and the token after it becomes its value, so agentId:agent003 yields an agentId field containing agent003.
dissect {
  mapping => {
    "message" => "%{?agentId}:%{&agentId}"
  }
}
If you're using Filebeat, there is also the possibility to use the dissect processor:
processors:
  - dissect:
      tokenizer: "%{?agentId}:%{&agentId}"
      field: "message"
      target_prefix: ""
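If you'd rather stay with an ingest pipeline as in your original attempt, a grok processor works too. A minimal sketch (the pipeline name agent-pipeline and the WORD pattern are illustrative assumptions):
PUT _ingest/pipeline/agent-pipeline
{
  "description": "extract agentId from message",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["agentId:%{WORD:agentId}"]
      }
    }
  ]
}
This keeps the original message field intact and adds agentId next to it, matching the desired output shown above.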

How to extract and visualize values from a log entry in OpenShift EFK stack

I have an OKD cluster set up with the EFK stack for logging, as described here. I have never worked with any of the components before.
One deployment logs requests that contain a specific value that I'm interested in. I would like to extract just this value and visualize it with an area map in Kibana that shows the amount of requests and where they come from.
The content of the message field basically looks like this:
[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}
This plz is a German zip code, which I would like to visualize as described.
My problem here is that I have no idea how to extract this value.
A nice first success would be if I could find it with a regexp, but Kibana doesn't seem to work the way I think it does. Following its docs, I expect this /\"plz\":\"[0-9]{5}\"/ to deliver the result, but I get 0 hits (the time interval is set correctly). Even if this regexp matched, I would only find the log entry in which it is contained and not just the specific value. How do I go on from here?
I guess I also need an external geocoding service, but at which point would I include it? Or does Kibana itself know how to map zip codes to geometries?
A beginner-friendly step-by-step guide would be perfect, but I could settle for some hints that guide me there.
It would be possible to parse the message field as the document gets indexed into ES, using an ingest pipeline with a grok processor.
First, create the ingest pipeline like this:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    }
  ]
}
Then, when you index your data, you simply reference that pipeline:
PUT plz/_doc/1?pipeline=parse-plz
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
}
And you will end up with a document like the one below, which now has a field called plz with the 12345 value in it:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345"
}
When indexing your document from Fluentd, you can specify a pipeline to be used in the configuration. If you can't or don't want to modify your Fluentd configuration, you can also define a default pipeline for your index that will kick in every time a new document is indexed. Simply run this on your index and you won't need to specify ?pipeline=parse-plz when indexing documents:
PUT index/_settings
{
  "index.default_pipeline": "parse-plz"
}
If you have several indexes, a better approach might be to define an index template instead, so that whenever a new index called project.foo-something is created, the settings are going to be applied:
PUT _template/project-indexes
{
  "index_patterns": ["project.foo*"],
  "settings": {
    "index.default_pipeline": "parse-plz"
  }
}
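If you can adjust the Fluentd side after all, the fluent-plugin-elasticsearch output also has a pipeline option (assuming your version of that plugin supports it). A rough sketch, with the match pattern and host as placeholders:
<match project.foo.**>
  @type elasticsearch
  host elasticsearch.example.svc   # placeholder host
  port 9200
  pipeline parse-plz               # ingest pipeline created above
</match>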
Now, in order to map that PLZ on a map, you'll first need to find a data set that provides you with geolocations for each PLZ.
You can then add a second processor in your pipeline in order to do the PLZ/ZIP to lat,lon mapping:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.location = params[ctx.plz];",
        "params": {
          "12345": {"lat": 42.36, "lon": 7.33}
        }
      }
    }
  ]
}
Ultimately, your document will look like this and you'll be able to leverage the location field in a Kibana visualization:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345",
  "location": {
    "lat": 42.36,
    "lon": 7.33
  }
}
So to sum it all up, it all boils down to only two things:
Create an ingest pipeline to parse documents as they get indexed
Create an index template for all project* indexes whose settings include the pipeline created in step 1
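One more thing worth noting for the map visualization: the location field needs to be mapped as a geo_point. A minimal sketch of the template from step 2 with that mapping added (ES 7+ legacy template syntax assumed):
PUT _template/project-indexes
{
  "index_patterns": ["project.foo*"],
  "settings": {
    "index.default_pipeline": "parse-plz"
  },
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}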

How can I easily convert logstash filters with grok to ingest pipelines?

Logstash has ingest-convert.sh to convert ingest pipelines to Logstash, but I'd like to convert the other way around.
I have the 23-Tomcat-filters and grok_patterns file.
Is there any smart way that would save me from writing all the JSON files in this ingest pipeline format by hand, based on the current Logstash configuration?
PUT _ingest/pipeline/ujjain
{
  "description": "Pipeline ujjain",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{MONTH}%{SPACE}%{MONTHDAY},%{SPACE}%{YEAR}%{SPACE}%{HOUR}:?%{MINUTE}(?::?%{SECOND})%{SPACE}(?:AM|PM)%{SPACE}%{NOTSPACE:class}%{SPACE}%{NOTSPACE:type_log}%{SPACE}%{WORD:loglevel}:%{SPACE}%{GREEDYDATA:log_text}"]
      }
      ...
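For what it's worth, the grok patterns themselves can usually be carried over verbatim, since the ingest grok processor understands the same pattern syntax as the Logstash grok filter. As a sketch, the Logstash filter corresponding to the pipeline above would be roughly:
filter {
  grok {
    match => {
      "message" => "%{MONTH}%{SPACE}%{MONTHDAY},%{SPACE}%{YEAR}%{SPACE}%{HOUR}:?%{MINUTE}(?::?%{SECOND})%{SPACE}(?:AM|PM)%{SPACE}%{NOTSPACE:class}%{SPACE}%{NOTSPACE:type_log}%{SPACE}%{WORD:loglevel}:%{SPACE}%{GREEDYDATA:log_text}"
    }
  }
}
so the manual work is mostly mapping the remaining Logstash filters (date, mutate, and so on) onto their processor equivalents.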

any elasticsearch datatype matching decimal timestamp?

I have a timestamp from a log file like {"ts" : "1486418325.948487"}.
My infrastructure is Filebeat 5.2.0 --> Elasticsearch 5.2.
I tried mapping "ts" to "date" with format "epoch_second", but writing to ES failed in Filebeat:
PUT /auth_auditlog
{
  "mappings": {
    "auth-auditlogs": {
      "properties": {
        "ts": {
          "type": "date",
          "format": "epoch_second"
        }
      }
    }
  }
}
The Filebeat error message looks like:
WARN Can not index event (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse [ts]","caused_by":{"type":"illegal_argument_exception","reason":"Invalid format: \"1486418325.948487\""}}
I tried "1486418325" and it is OK, so I guess ES doesn't accept a decimal-format timestamp. However, Python's default output timestamp is in this format.
My purpose is to get the type right in Elasticsearch. I want to use ts as the original timestamp in Elasticsearch.
Any solution is welcome except changing the original log data!
Filebeat doesn't have a processor for this kind of thing. You can't replace @timestamp with the one from your log in Filebeat. What you can do is send the events to Logstash and let the date filter parse the epoch:
date {
  # "ts" holds the epoch seconds from the log
  match => ["ts", "UNIX", "UNIX_MS"]
}
The other option would be to use an ingest node. Although I haven't used this myself, it seems it is also able to do the job. Check out the docs here.
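As a rough sketch of that route (assuming the ingest date processor's UNIX format copes with the fractional seconds; the pipeline name parse-ts is just an example):
PUT _ingest/pipeline/parse-ts
{
  "description": "convert decimal epoch seconds in ts to @timestamp",
  "processors": [
    {
      "date": {
        "field": "ts",
        "formats": ["UNIX"],
        "target_field": "@timestamp"
      }
    }
  ]
}
Filebeat would then reference that pipeline in its elasticsearch output, as in the first answer above.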

elasticsearch: define field's order in returned doc

I'm sending queries to Elasticsearch and it responds with an unknown order of fields in its documents.
How can I fix the order in which Elasticsearch returns the fields inside documents?
I mean, I'm sending this query:
{
  "index": "my_index",
  "_source": {
    "includes": ["field1","field2","field3","field14"]
  },
  "size": X,
  "body": {
    "query": {
      // stuff
    }
  }
}
and when it responds, it gives me the fields in the wrong order.
I ultimately want to convert this to CSV, and I want to fix the CSV headers.
Is there something I can do so that I get something like
doc1: {"field1","field2","field3","field14"}
doc2: {"field1","field2","field3","field14"}
...
in the same order as my "_source"?
Thanks for your help.
A document in Elasticsearch is a JSON hash/map and by definition maps are unordered.
One solution around this would be to use Logstash in order to extract the docs from ES using an elasticsearch input and output them to CSV using a csv output. That way you can guarantee that the fields in the CSV file will have the exact same order as specified. Another benefit is that you don't have to write your own boilerplate code to extract from ES and sink to CSV; Logstash does it all for you for free.
The Logstash configuration would look something like this:
input {
  elasticsearch {
    hosts => "localhost"
    query => '{ "query": { "match_all": {} } }'
    size => 100
    index => "my_index"
  }
}
filter {}
output {
  csv {
    fields => ["field1","field2","field3","field14"]
    path => "/path/to/file.csv"
  }
}
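You can then run it with something like this (assuming the config above is saved as es-to-csv.conf):
bin/logstash -f es-to-csv.conf
Each hit pulled by the elasticsearch input becomes one CSV row, with the columns written in exactly the order listed under fields.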
