Create custom grok pattern for message field in Elasticsearch - elasticsearch

I have a query related to the grok processor.
For example, this is my message field:
{
  "message": "agentId:agent003"
}
I want to grok this, and my output should be something like this:
{
  "message": "agentId:agent003",
  "agentId": "agent003"
}
Could someone help me with how to achieve this? If I am able to do it for one field, I can manage the rest of my fields. Thanks in advance.
This is the pipeline I have created in Elasticsearch:
PUT _ingest/pipeline/dissectpipeline
{
  "description": "split message content",
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": "%{apm_application_message.agentId}:%{apm_application_message.agentId}"
      }
    }
  ]
}
Via Filebeat central management I also added this extra config to the module:
- pipeline:
    if: ctx.first_char == '{'
    name: '{< IngestPipeline "dissectpipeline" >}'
There is no error with my Filebeat, it's working fine, but I am unable to find any field like apm_application_message.agentId in the index.
How can I make sure whether my pipeline is working or not? Also, if I am doing something wrong, please let me know.

Instead of grok, I'd suggest using the dissect filter, which is more intuitive and easier to use.
dissect {
  mapping => {
    "message" => "%{?agentId}:%{&agentId}"
  }
}
If you're using Filebeat, there is also the possibility to use the dissect processor:
processors:
  - dissect:
      tokenizer: "%{?agentId}:%{&agentId}"
      field: "message"
      target_prefix: ""
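If you'd rather keep the ingest pipeline from the question, the same idea works in the Elasticsearch dissect processor, where the reference-key modifiers are * and & (rather than ? and & as in the Logstash filter). A minimal sketch reusing the pipeline name from the question, plus a _simulate call to verify it before involving Filebeat at all:
PUT _ingest/pipeline/dissectpipeline
{
  "description": "split message content",
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": "%{*agentId}:%{&agentId}"
      }
    }
  ]
}

POST _ingest/pipeline/dissectpipeline/_simulate
{
  "docs": [
    { "_source": { "message": "agentId:agent003" } }
  ]
}
If the pattern is correct, the simulated document should contain both the original message and an agentId field with the value agent003.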

Related

Elasticsearch - Enable fulltext search of field

I have run into a brick wall with searching in my logged events. I am using an Elasticsearch solution: Filebeat to load messages from logs into Elasticsearch, and a Kibana front end.
I currently log the message into a field message and the exception stacktrace (if present) into error.message. So a snippet of the logged event may look like:
{
"message": "Thrown exception: CustomException (Exception for testing purposes)"
"error" : {
"message" : "com.press.controller.CustomException: Exception for testing purposes\n at
com.press.controller....<you get the idea at this point>"
}
}
Of course there are other fields like timestamp, but those are not important. What is important is this:
When I search message : customException, I can find the events I logged. When I search error.message : customException, I do not get the events. I need to be able to fulltext search all fields.
Is there a way how to tell elasticsearch to enable the fulltext search in the fields?
And why does the "message" field have it enabled by default? None of my colleagues are aware of any indexing command having been run on the field in the console after deployment, and our privileges do not allow me or other team members to run indexing or analysis commands on any field. So it has to be in the config somewhere.
So far I was unable to find the solution. Please push me in the right direction.
Edit:
The config of the fields is as follows:
We use a modified ECS, and both message fields are declared as
level: core
type: text
in fields.yml.
In Filebeat, the config snippet is as follows:
filebeat.inputs:
  - type: log
    enabled: true
    paths: .....
...
...
processors:
  - rename:
      fields:
        - from: "msg"
          to: "message"
        - from: "filepath"
          to: "log.file.name"
        - from: "ex"
          to: "error.message"
      ignore_missing: true
      fail_on_error: true
logging.level: debug
logging.to_files: true
Due to security requirements, I cannot disclose the full files. Also, I had to write all the snippets by hand, so any misspellings are probably my fault.
Thanks
The problem is with the analyzer associated with your field. By default, text fields in ES use the standard analyzer, which doesn't create separate tokens when the text contains a dot: for example, foo.bar results in just one token, foo.bar, whereas if you want both foo and bar to match in foo.bar, you need to generate two tokens, foo and bar.
What you need is a custom analyzer which creates tokens as above, since your error.message text contains dots. This is what I explain in my example:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": ["replace_dots"]
        }
      },
      "char_filter": {
        "replace_dots": {
          "type": "mapping",
          "mappings": [
            ". => \\u0020"
          ]
        }
      }
    }
  }
}
POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "foo.bar"
}
The above example creates two tokens, foo and bar, and the same should happen for you when you create the index and test it with these APIs.
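For search to actually benefit, the analyzer still has to be attached to the field in the mapping. A minimal sketch, assuming a typeless (7.x+) Elasticsearch and the index name from the example above; note that the analyzer of an already-mapped field cannot be changed in place, so this has to happen on a new index (followed by a reindex if needed):
PUT /my_index/_mapping
{
  "properties": {
    "error": {
      "properties": {
        "message": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}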
Let me know if you face any issue with it.
Elasticsearch indexes all fields by default; since you did not define a mapping, all fields should be indexed by default here.
Also, for your case I doubt whether the data is properly getting into Elasticsearch, as the log doesn't seem to be proper JSON.
Do you see proper logs in Kibana? If yes, please send a sample log/screenshot.
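One quick way to confirm how the fields actually ended up mapped (and therefore which analyzer applies to them) is the get field mapping API. A sketch, with the index name as a placeholder:
GET /my_index/_mapping/field/message,error.message
If error.message comes back as type text with no explicit analyzer, it is being analyzed with the standard analyzer described above.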

Getting multiple fields from message in filebeat and logstash

I am writing logs into a log file from my Django app, and from there I am shipping those logs to Elasticsearch. Because I want to split the fields as well, I am using Logstash between Filebeat and Elasticsearch.
Here is a sample log entry:
2019-03-19 13:39:06 logfile INFO save_data {'field1': None, 'time': '13:39:06', 'mobile': '9876543210', 'list_item': "[{'item1': 10, 'item2': 'path/to/file'}]", 'response': '{some_complicated_json}}', 'field2': 'some data', 'date': '19-03-2019', 'field3': 'some other data'}
I tried to write a grok match pattern, but all the fields end up in the message field:
%{TIMESTAMP_ISO8601:temp_date}%{SPACE} %{WORD:logfile} %{LOGLEVEL:level} %{WORD:save_data} %{GREEDYDATA:message}
How can I write a grok match pattern which decomposes the above log entry?
I don't know how you could do this with grok, but the way we do it is with a json processor in an Elasticsearch ingest node pipeline. Something like this:
{
  "my-log-pipeline": {
    "description": "My log pipeline",
    "processors": [{
      "json": {
        "field": "message",
        "target_field": "messageFields"
      }
    }]
  }
}
Then you just need to tell your source (filebeat/logstash) to use this pipeline when ingesting.
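For Filebeat specifically, one way to do that is the pipeline option on the Elasticsearch output in filebeat.yml. A sketch, with the host as a placeholder and the pipeline name taken from the example above:
output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: "my-log-pipeline"
In Logstash, the elasticsearch output plugin has an equivalent pipeline => "my-log-pipeline" setting.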

Exclude "'" in pipeline Kibana

I am currently working with Kibana and am running into a problem which I can't solve.
In my source file there is a line which includes double quotes (""); however, when I run the script I made for Kibana in Dev Tools, it does not include that line but gives an error. How can I exclude those characters, or how can they be included?
I have tried to exclude those characters using a gsub field, but that doesn't work either.
"%{DATA:Datetime},%{DATA:Elapsed},%{DATA:label},%{DATA:ResponseCode},%{DATA:ResponseMessage},%{DATA:ThreadName},%{DATA:DataType},%{DATA:Success},%{DATA:FailureMessage},%{DATA:Bytes},%{DATA:SentBytes},%{DATA:GRPThreads},%{DATA:AllThreads},%{DATA:URL},%{DATA:Latency},%{DATA:IdleTime},%{GREEDYDATA:Connect}"
That is the grok pattern I'm using.
27-19-2018 12:19:43,8331,OK - Refresh Samenvatting,200,"Number of samples in transaction : 67, number of failing samples : 0",Thread Group 1-1,,true,,550720,137198,1,1,null,8318,5094,270
And this is the line I want to run through it; it goes wrong at the "".
Best to use custom patterns. Tested on Kibana 6.x, and this works for the sample data provided above; it can easily be tweaked to work with other sample data.
Custom pattern used:
UNTILNEXTCOMMA ([^,]*)
Grok pattern:
%{DATA:Datetime},%{NUMBER:Elapsed},%{UNTILNEXTCOMMA:label},%{NUMBER:Responsecode},"%{GREEDYDATA:ResponseMessage}",%{UNTILNEXTCOMMA:ThreadName},%{UNTILNEXTCOMMA:DataType},%{UNTILNEXTCOMMA:Success},%{UNTILNEXTCOMMA:FailureMessage},%{NUMBER:Bytes},%{NUMBER:SentBytes},%{NUMBER:GRPThreads},%{NUMBER:AllThreads},%{UNTILNEXTCOMMA:URL},%{NUMBER:Latency},%{NUMBER:IdleTime},%{NUMBER:Connect}
EDIT1:
Pipeline for Kibana 5.x
PUT _ingest/pipeline/test
{
  "description": "Test pipeline",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{DATA:Datetime},%{DATA:Elapsed},%{DATA:label},%{DATA:ResponseCode},\"%{GREEDYDATA:ResponseMessage}\",%{DATA:ThreadName},%{DATA:DataType},%{DATA:Success},%{DATA:FailureMessage},%{DATA:Bytes},%{DATA:SentBytes},%{DATA:GRPThreads},%{DATA:AllThreads},%{DATA:URL},%{DATA:Latency},%{DATA:IdleTime},%{GREEDYDATA:Connect}"
        ],
        "ignore_failure": false
      }
    }
  ]
}
Tested using _simulate and it works:
POST _ingest/pipeline/test/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "27-19-2018 12:19:43,8331,OK - Refresh Samenvatting,200,\"Number of samples in transaction : 67, number of failing samples : 0\",Thread Group 1-1,,true,,550720,137198,1,1,null,8318,5094,270"
      }
    }
  ]
}
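If you want the UNTILNEXTCOMMA approach in an ingest pipeline as well (rather than only in the Kibana grok debugger), the grok processor accepts custom patterns via pattern_definitions. A sketch along those lines; the pipeline name test_custom is just an example:
PUT _ingest/pipeline/test_custom
{
  "description": "Test pipeline with a custom pattern",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern_definitions": {
          "UNTILNEXTCOMMA": "[^,]*"
        },
        "patterns": [
          "%{DATA:Datetime},%{NUMBER:Elapsed},%{UNTILNEXTCOMMA:label},%{NUMBER:ResponseCode},\"%{GREEDYDATA:ResponseMessage}\",%{UNTILNEXTCOMMA:ThreadName},%{UNTILNEXTCOMMA:DataType},%{UNTILNEXTCOMMA:Success},%{UNTILNEXTCOMMA:FailureMessage},%{NUMBER:Bytes},%{NUMBER:SentBytes},%{NUMBER:GRPThreads},%{NUMBER:AllThreads},%{UNTILNEXTCOMMA:URL},%{NUMBER:Latency},%{NUMBER:IdleTime},%{NUMBER:Connect}"
        ]
      }
    }
  ]
}
It can be verified with the same _simulate request as above, pointed at test_custom instead of test.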

Logstash normalise URL from JSON logs

I have logs in newline-separated JSON like the following:
{
  "httpRequest": {
    "requestMethod": "GET",
    "requestUrl": "/foo/submit?proj=56"
  }
}
Now I need the URL without the dynamic parts, i.e. the first resource (someTenant) and the query parameters, to be added as a field in Elasticsearch; i.e. the expected normalised URL is
"requestUrl": "/{{someTenant}}/submit?{{someParams}}"
I already have the following filter in my Logstash config, but I'm not sure how to apply a sequence of regex operations to a specific field and add the result as a new one.
json {
  source => "message"
}
This way I could aggregate the unique endpoints, even though the URLs in the logs differ due to variable path params and query params.
Since this question is tagged with grok, I will go ahead and assume you can use grok filters.
Use a grok filter to create a new field from the requestUrl field; you can then use the URIPATHPARAM grok pattern to separate the various components of requestUrl as follows:
grok {
  match => { "requestUrl" => "%{URIPATHPARAM:request_data}" }
}
This will produce the following output:
{
  "request_data": [
    [
      "/foo/submit?proj=56"
    ]
  ],
  "URIPATH": [
    [
      "/foo/submit"
    ]
  ],
  "URIPARAM": [
    [
      "?proj=56"
    ]
  ]
}
This can be tested on the Grok Online Debugger.
thanks
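To get the normalised URL the question actually asks for, one option is to copy requestUrl into a new field and run a couple of gsub replacements on it with the mutate filter. A sketch under the assumption that paths look like /tenant/resource?params; the field name normalizedUrl and the {{...}} placeholders are just illustrative:
filter {
  json {
    source => "message"
  }
  mutate {
    # copy the original URL into a new top-level field (assumed name)
    add_field => { "normalizedUrl" => "%{[httpRequest][requestUrl]}" }
  }
  mutate {
    gsub => [
      # replace the first path segment (the tenant) with a fixed placeholder
      "normalizedUrl", "^/[^/]+", "/{{someTenant}}",
      # replace everything from the first ? onwards with a fixed placeholder
      "normalizedUrl", "[?].*$", "?{{someParams}}"
    ]
  }
}
This keeps httpRequest.requestUrl intact and yields a stable value such as /{{someTenant}}/submit?{{someParams}} to aggregate on.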

Modify the content of a field using logstash

I am using Logstash to get data from a SQL database. There is a field called "code" in which the content has this structure:
PO0000001209
ST0000000909
And what I would like to do is to remove the 6 zeros after the letters to get the following result:
PO1209
ST0909
I will put the result in another field called "code_short" and use it for my query in Elasticsearch. I have configured the input and the output in Logstash, but I am not sure how to do it using grok or maybe the mutate filter.
I have read some examples, but I am quite new to this and I am a bit stuck.
Any help would be appreciated. Thanks.
You could use a mutate/gsub filter for this but that will replace the value of the code field:
filter {
  mutate {
    gsub => [
      "code", "000000", ""
    ]
  }
}
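If you want to keep code intact and only populate code_short (as the question asks), a sketch using two mutate blocks; two blocks are needed because, within a single mutate, the copy operation runs after gsub:
filter {
  mutate {
    # duplicate the original field into code_short
    copy => { "code" => "code_short" }
  }
  mutate {
    # strip the run of six zeros from the copy only
    gsub => [ "code_short", "000000", "" ]
  }
}
With this, code stays PO0000001209 and code_short becomes PO1209.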
Another option is to use a grok filter like this:
filter {
  grok {
    match => { "code" => "(?<prefix>[a-zA-Z]+)000000%{INT:suffix}" }
    add_field => { "code_short" => "%{prefix}%{suffix}" }
  }
}
