Using Elasticsearch and Filebeat, how do I only execute my pipeline on certain files?

I only want to run my pipeline on files where the log path contains a certain keyword. How do I do this within the pipeline?
Pipeline (I removed my pattern and patterns as they are not relevant):
{
  "description" : "...",
  "processors": [
    {
      "grok": {
        "if": "ctx['log']['file']['path'].value.contains('keyword')",
        "field": "message"
      }
    }
  ]
}
In Kibana I see I have log.file.path available as metadata, and I just want to run the pipeline if it contains a keyword, but I get a runtime error because of my if statement.
Thanks for your help!
EDIT: I think the problem lies with how I am trying to access the log.file.path field as I don't know how to reference it correctly from here.

You can probably use the Drop processor
https://www.elastic.co/guide/en/elasticsearch/reference/current/drop-processor.html
"drop": {
"if": "ctx.log.file.path.contains('keyword');"
}
You can find more complexe exemples here:
https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest-conditional-complex.html
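For completeness, here is a minimal sketch of what that could look like as a full pipeline (the pipeline id and the grok pattern are just placeholders). Note that in an ingest conditional you reach the field through ctx directly, so there is no .value to call, and the null-safe ?. operator avoids errors on documents that have no log.file.path at all:
PUT _ingest/pipeline/keyword-only
{
  "description": "Only grok documents whose log path contains the keyword",
  "processors": [
    {
      "drop": {
        "if": "ctx.log?.file?.path == null || !ctx.log.file.path.contains('keyword')"
      }
    },
    {
      "grok": {
        "field": "message",
        "patterns": ["%{GREEDYDATA:parsed_message}"]
      }
    }
  ]
}
Keep in mind that the drop processor discards non-matching documents entirely; if you only want to skip the grok step but still index the document, the same condition (without the negation) can be placed on the grok processor's own if instead.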

Related

How to add custom index using ingest node pipeline?

Is it possible to do conditional indexing using ingest node pipelines? I feel this could be done with the script processor, but can someone tell me if this is possible?
I am in a scenario where I have to decide on the better way to do custom indexing. I can specify conditions in the metricbeat.yml/filebeat.yml files to get this done, but is this the best way to do custom indexing? There is no Logstash in my Elastic Stack.
output.elasticsearch:
  indices:
    - index: "metricbeat-dev-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.equals:
        kubernetes.namespace: "dev"
This is how I have implemented custom indexing in Metricbeat/Filebeat right now. I have 20+ namespaces in my Kubernetes cluster. Please suggest whether this could be done with an ingest node pipeline or not.
Yes, you can achieve this with the ingest pipeline set processor. Ingest pipelines support access to metadata fields, so you can read and update the index name through the _index field.
Below is a sample ingest pipeline which updates the index name when the namespace is dev:
[
  {
    "set": {
      "field": "_index",
      "value": "metricbeat-dev",
      "if": "ctx.kubernetes?.namespace == 'dev'"
    }
  }
]
Update 1: To append the agent version to the index name (I have assumed the agent version field is named agent.version):
[
  {
    "set": {
      "field": "_index",
      "value": "metricbeat-dev-{{agent.version}}",
      "if": "ctx.kubernetes?.namespace == 'dev'"
    }
  }
]
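Since you have 20+ namespaces, you don't necessarily need one set processor per namespace: the value of the set processor supports Mustache templates, so the namespace (and agent version) can be interpolated straight into the index name. A sketch, assuming every document carries kubernetes.namespace and agent.version:
[
  {
    "set": {
      "field": "_index",
      "value": "metricbeat-{{kubernetes.namespace}}-{{agent.version}}",
      "if": "ctx.kubernetes?.namespace != null"
    }
  }
]
Note that this sketch does not reproduce the %{+yyyy.MM.dd} date suffix from the Beats configuration; for that you would need something like the date_index_name processor or a script processor.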

Create a custom grok pattern for the message field in Elasticsearch

I have a question related to the grok processor.
For example, this is my message field:
{
  "message": "agentId:agent003"
}
I want to grok this, and my output should be something like this:
{
  "message": "agentId:agent003",
  "agentId": "agent003"
}
Could someone help me with how to achieve this? If I am able to do it for one field, I can manage the rest of my fields. Thanks in advance.
This is the pipeline I have created in Elasticsearch:
PUT _ingest/pipeline/dissectpipeline
{
  "description" : "split message content",
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern" : "%{apm_application_message.agentId}:%{apm_application_message.agentId}"
      }
    }
  ]
}
Filebeat module extra config added via central management:
- pipeline:
    if: ctx.first_char == '{'
    name: '{< IngestPipeline "dissectpipeline" >}'
There is no error from Filebeat, it's working fine, but I am unable to find any field like apm_application_message.agentId in the index.
How can I make sure whether my pipeline is working or not? Also, if I am doing something wrong, please let me know.
Instead of grok, I'd suggest using the dissect filter, which is more intuitive and easier to use. In Logstash it looks like this:
dissect {
  mapping => {
    "message" => "%{?agentId}:%{&agentId}"
  }
}
If you're using Filebeat, there is also the possibility to use the dissect processor:
processors:
  - dissect:
      tokenizer: "%{?agentId}:%{&agentId}"
      field: "message"
      target_prefix: ""
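To answer the "how do I make sure my pipeline is working" part: you can exercise an ingest pipeline without indexing anything by using the simulate API. A quick sketch against the dissectpipeline from the question:
POST _ingest/pipeline/dissectpipeline/_simulate
{
  "docs": [
    { "_source": { "message": "agentId:agent003" } }
  ]
}
The response shows the document exactly as the pipeline would have indexed it, so a missing or misnamed field becomes visible immediately.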

Is there a way to update a document with a Painless script without changing the order of unaffected fields?

I'm using Elasticsearch's Update by Query API to update some documents with a Painless script like this (the actual query is more complicated):
POST ts-scenarios/_update_by_query?routing=test
{
  "query": {
    "term": { "routing": { "value": "test" } }
  },
  "script": {
    "source": """ctx._source.tagIDs = ["5T8QLHIBB_kDC9Ugho68"]"""
  }
}
This works, except that upon reindexing, other fields get reordered, including some classes which are automatically (de)serialized using JSON.NET's type handling. That means a document with the following source before the update:
{
  "routing" : "testsuite",
  "activities" : [
    {
      "$type" : "Test.Models.SomeActivity, Test"
    },
    {
      "$type" : "Test.Models.AnotherActivity, Test",
      "CustomParameter" : 1,
      "CustomSetting" : false
    }
  ]
}
ends up as
{
  "routing" : "testsuite",
  "activities" : [
    {
      "$type" : "Test.Models.SomeActivity, Test"
    },
    {
      "CustomParameter" : 1,
      "CustomSetting" : false,
      "$type" : "Test.Models.AnotherActivity, Test"
    }
  ],
  "tagIDs" : [
    "5T8QLHIBB_kDC9Ugho68"
  ]
}
which JSON.NET can't deserialize. Is there a way I can tell the script (or the Update by Query API) not to change the order of those other fields?
In case it matters, I'm using Elasticsearch OSS version 7.6.1 on macOS. I haven't checked whether an Ingest pipeline would work here, as I'm not familiar with them.
(It turns out I can make the deserialization more flexible by setting the MetadataPropertyHandling property to ReadAhead, as mentioned here. That works, but as mentioned it may hurt performance and there might be other situations where field order matters. Technically, it shouldn't; JSON isn't XML, but there are always edge cases where it does matter.)

How to extract and visualize values from a log entry in OpenShift EFK stack

I have an OKD cluster set up with the EFK stack for logging, as described here. I have never worked with any of the components before.
One deployment logs requests that contain a specific value that I'm interested in. I would like to extract just this value and visualize it with an area map in Kibana that shows the amount of requests and where they come from.
The content of the message field basically looks like this:
[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}
This plz is a German zip code, which I would like to visualize as described.
My problem here is that I have no idea how to extract this value.
A nice first success would be if I could find it with a regexp, but Kibana doesn't seem to work the way I think it does. Following its docs, I expect /\"plz\":\"[0-9]{5}\"/ to deliver the result, but I get 0 hits (the time interval is set correctly). And even if this regexp matched, I would only find the log entries that contain it, not the specific value itself. How do I proceed from here?
I guess I also need an external geocoding service, but at which point would I include it? Or does Kibana itself know how to map zip codes to geometries?
A beginner-friendly step-by-step guide would be perfect, but I could settle for some hints that guide me there.
It would be possible to parse the message field as the document gets indexed into ES, using an ingest pipeline with grok processor.
First, create the ingest pipeline like this:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    }
  ]
}
Then, when you index your data, you simply reference that pipeline:
PUT plz/_doc/1?pipeline=parse-plz
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
}
And you will end up with a document like the one below, which now has a field called plz with the 12345 value in it:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345"
}
When indexing your document from Fluentd, you can specify a pipeline to be used in the configuration. If you can't or don't want to modify your Fluentd configuration, you can also define a default pipeline for your index that will kick in every time a new document is indexed. Simply run this on your index and you won't need to specify ?pipeline=parse-plz when indexing documents:
PUT index/_settings
{
  "index.default_pipeline": "parse-plz"
}
If you have several indexes, a better approach might be to define an index template instead, so that whenever a new index called project.foo-something is created, the settings are going to be applied:
PUT _template/project-indexes
{
  "index_patterns": ["project.foo*"],
  "settings": {
    "index.default_pipeline": "parse-plz"
  }
}
Now, in order to map that PLZ on a map, you'll first need to find a data set that provides you with geolocations for each PLZ.
You can then add a second processor in your pipeline in order to do the PLZ/ZIP to lat,lon mapping:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.location = params[ctx.plz];",
        "params": {
          "12345": {"lat": 42.36, "lon": 7.33}
        }
      }
    }
  ]
}
Ultimately, your document will look like this and you'll be able to leverage the location field in a Kibana visualization:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345",
  "location": {
    "lat": 42.36,
    "lon": 7.33
  }
}
So to sum it all up, it all boils down to only two things:
1. Create an ingest pipeline to parse documents as they get indexed.
2. Create an index template for all project* indexes whose settings include the pipeline created in step 1.
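One extra detail for the map part: Kibana map visualizations need the location field to be mapped as geo_point, and dynamic mapping will not infer that from a plain {lat, lon} object. A sketch extending the template above (depending on your Elasticsearch version, the mappings section of a legacy template may still need a type name):
PUT _template/project-indexes
{
  "index_patterns": ["project.foo*"],
  "settings": {
    "index.default_pipeline": "parse-plz"
  },
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}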

Exclude "'" in pipeline Kibana

I am currently working with Kibana and am running into a problem which I can't solve.
In my source file there is a line that contains double quotes ("), but when I run the script I made for Kibana in Dev Tools, that line is not included and I get an error instead. How can I exclude those characters, or how can the line be included?
I have tried to exclude those characters using a gsub processor, but that doesn't work either.
"%{DATA:Datetime},%{DATA:Elapsed},%{DATA:label},%{DATA:ResponseCode},%{DATA:ResponseMessage},%{DATA:ThreadName},%{DATA:DataType},%{DATA:Success},%{DATA:FailureMessage},%{DATA:Bytes},%{DATA:SentBytes},%{DATA:GRPThreads},%{DATA:AllThreads},%{DATA:URL},%{DATA:Latency},%{DATA:IdleTime},%{GREEDYDATA:Connect}"
That is the grok pattern I'm using.
27-19-2018 12:19:43,8331,OK - Refresh Samenvatting,200,"Number of samples in transaction : 67, number of failing samples : 0",Thread Group 1-1,,true,,550720,137198,1,1,null,8318,5094,270
And this is the line I want to run through it; it goes wrong at the quoted field.
It's best to use custom patterns. Tested on Kibana 6.x, this works for the sample data provided above and can easily be tweaked to work with other sample data.
Custom pattern used:
UNTILNEXTCOMMA ([^,]*)
Grok pattern:
%{DATA:Datetime},%{NUMBER:Elapsed},%{UNTILNEXTCOMMA:label},%{NUMBER:Responsecode},"%{GREEDYDATA:ResponseMessage}",%{UNTILNEXTCOMMA:ThreadName},%{UNTILNEXTCOMMA:DataType},%{UNTILNEXTCOMMA:Success},%{UNTILNEXTCOMMA:FailureMessage},%{NUMBER:Bytes},%{NUMBER:SentBytes},%{NUMBER:GRPThreads},%{NUMBER:AllThreads},%{UNTILNEXTCOMMA:URL},%{NUMBER:Latency},%{NUMBER:IdleTime},%{NUMBER:Connect}
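To use that custom pattern inside an ingest pipeline, the grok processor accepts a pattern_definitions map where UNTILNEXTCOMMA can be registered. A sketch that reuses the pipeline id test from the 5.x example below:
PUT _ingest/pipeline/test
{
  "description": "Test pipeline with a custom pattern",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern_definitions": {
          "UNTILNEXTCOMMA": "([^,]*)"
        },
        "patterns": [
          "%{DATA:Datetime},%{NUMBER:Elapsed},%{UNTILNEXTCOMMA:label},%{NUMBER:Responsecode},\"%{GREEDYDATA:ResponseMessage}\",%{UNTILNEXTCOMMA:ThreadName},%{UNTILNEXTCOMMA:DataType},%{UNTILNEXTCOMMA:Success},%{UNTILNEXTCOMMA:FailureMessage},%{NUMBER:Bytes},%{NUMBER:SentBytes},%{NUMBER:GRPThreads},%{NUMBER:AllThreads},%{UNTILNEXTCOMMA:URL},%{NUMBER:Latency},%{NUMBER:IdleTime},%{NUMBER:Connect}"
        ]
      }
    }
  ]
}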
EDIT1:
Pipeline for Kibana 5.x
PUT _ingest/pipeline/test
{
"description": "Test pipeline",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{DATA:Datetime},%{DATA:Elapsed},%{DATA:label},%{DATA:ResponseCode},\"%{GREEDYDATA:ResponseMessage}\",%{DATA:ThreadName},%{DATA:DataType},%{DATA:Success},%{DATA:FailureMessage},%{DATA:Bytes},%{DATA:SentBytes},%{DATA:GRPThreads},%{DATA:AllThreads},%{DATA:URL},%{DATA:Latency},%{DATA:IdleTime},%{GREEDYDATA:Connect}"
],
"ignore_failure": false
}
}
]
}
Tested using the simulate API and it works:
POST _ingest/pipeline/test/_simulate
{
  "docs" : [
    {
      "_source": {
        "message" : "27-19-2018 12:19:43,8331,OK - Refresh Samenvatting,200,\"Number of samples in transaction : 67, number of failing samples : 0\",Thread Group 1-1,,true,,550720,137198,1,1,null,8318,5094,270"
      }
    }
  ]
}
