Earlier I had only one type of log for an index, but recently I changed the log format. Now my grok pattern looks like this:
grok {
match => { "message" => "%{DATA:created_timestamp},%{DATA:request_id},%{DATA:tenant},%{DATA:username},%{DATA:job_code},%{DATA:stepname},%{DATA:quartz_trigger_timestamp},%{DATA:execution_level},%{DATA:facility_name},%{DATA:channel_code},%{DATA:status},%{DATA:current_step_time_ms},%{DATA:total_time_ms},\'%{DATA:error_message}\',%{DATA:tenant_mode},%{GREEDYDATA:channel_src_code},\'%{GREEDYDATA:jobSpecificMetaData}\'" }
match => { "message" => "%{DATA:created_timestamp},%{DATA:request_id},%{DATA:tenant},%{DATA:username},%{DATA:job_code},%{DATA:stepname},%{DATA:quartz_trigger_timestamp},%{DATA:execution_level},%{DATA:facility_name},%{DATA:channel_code},%{DATA:status},%{DATA:current_step_time_ms},%{DATA:total_time_ms},%{DATA:error_message},%{DATA:tenant_mode},%{GREEDYDATA:channel_src_code}" }
}
and sample logs are
2023-01-11 15:16:20.932,edc71ada-62f5-46be-99a4-3c8b882a6ef0,geocommerce,null,UpdateInventoryTask,MQ_TO_EVENTHANDLER,Wed Jan 11 15:16:13 IST 2023,TENANT,null,AMAZON_URBAN_BASICS,SUCCESSFUL,5903,7932,'',LIVE,AMAZON_IN,'{"totalCITCount":0}'
2023-01-11 15:16:29.368,fedca039-e834-4393-bbaa-e1903c3c92e6,bellacasa,null,UpdateInventoryTask,MQ_TO_EVENTHANDLER,Wed Jan 11 15:16:03 IST 2023,TENANT,null,FLIPKART_SMART,SUCCESSFUL,24005,26368,'',LIVE,FLIPKART_SMART,'{"totalCITCount":0}'
2023-01-11 15:16:31.684,762b8b46-2d21-437b-83fc-a1cc40737c84,ishitaknitfab,null,UpdateInventoryTask,MQ_TO_EVENTHANDLER,Wed Jan 11 15:15:48 IST 2023,TENANT,null,FLIPKART_SMART,SUCCESSFUL,41442,43684,'',LIVE,FLIPKART_SMART,'{"totalCITCount":0}'
2023-01-11 15:15:58.739,1416f5f2-a67b-416a-8e38-6bd7de457f6a,kapiva,null,PickingReplanner,MQ_TO_JOBSERVICE,Wed Jan 11 15:15:56 IST 2023,FACILITY,Non Sellable Bengaluru Return,null,SUCCESSFUL,393,2739,Task completed successfully,LIVE,null
2023-01-11 15:15:58.743,1416f5f2-a67b-416a-8e38-6bd7de457f6a,kapiva,null,PickingReplanner,MQ_TO_JOBSERVICE,Wed Jan 11 15:15:56 IST 2023,FACILITY,Delhi Main,null,SUCCESSFUL,371,2743,Task completed successfully,LIVE,null
2023-01-11 15:15:58.744,1416f5f2-a67b-416a-8e38-6bd7de457f6a,kapiva,null,PickingReplanner,MQ_TO_JOBSERVICE,Wed Jan 11 15:15:56 IST 2023,FACILITY,Bengaluru D2C,null,SUCCESSFUL,388,2744,Task completed successfully,LIVE,null
Logstash has to process approximately 150000 events in 5 minutes for this index and approx. 400000 events for the other index.
Now whenever I try to change grok, the CPU usage of the logstash server reaches 100%.
I don't know how to optimize my grok.
Can anyone help me with this?
The first step to improve grok performance would be to anchor the patterns: grok is slow when it fails to match, not when it matches. More details on how much anchoring matters can be found in this blog post from Elastic.
The second step would be to define a custom pattern to use instead of DATA, such as
pattern_definitions => { "NOTCOMMA" => "[^,]*" }
which will prevent DATA from attempting to consume more than one field while the overall match is failing.
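Putting both suggestions together, a sketch of what the optimized filter could look like (the field names are copied from your patterns above; this is a sketch to adapt, not a drop-in replacement):
filter {
  grok {
    # [^,]* stops at the next comma instead of backtracking the way DATA does
    pattern_definitions => { "NOTCOMMA" => "[^,]*" }
    # the more specific pattern (with jobSpecificMetaData) goes first
    match => { "message" => [
      "^%{NOTCOMMA:created_timestamp},%{NOTCOMMA:request_id},%{NOTCOMMA:tenant},%{NOTCOMMA:username},%{NOTCOMMA:job_code},%{NOTCOMMA:stepname},%{NOTCOMMA:quartz_trigger_timestamp},%{NOTCOMMA:execution_level},%{NOTCOMMA:facility_name},%{NOTCOMMA:channel_code},%{NOTCOMMA:status},%{NOTCOMMA:current_step_time_ms},%{NOTCOMMA:total_time_ms},'%{NOTCOMMA:error_message}',%{NOTCOMMA:tenant_mode},%{NOTCOMMA:channel_src_code},'%{GREEDYDATA:jobSpecificMetaData}'$",
      "^%{NOTCOMMA:created_timestamp},%{NOTCOMMA:request_id},%{NOTCOMMA:tenant},%{NOTCOMMA:username},%{NOTCOMMA:job_code},%{NOTCOMMA:stepname},%{NOTCOMMA:quartz_trigger_timestamp},%{NOTCOMMA:execution_level},%{NOTCOMMA:facility_name},%{NOTCOMMA:channel_code},%{NOTCOMMA:status},%{NOTCOMMA:current_step_time_ms},%{NOTCOMMA:total_time_ms},%{NOTCOMMA:error_message},%{NOTCOMMA:tenant_mode},%{GREEDYDATA:channel_src_code}$"
    ] }
  }
}
The ^ and $ anchors make a failed match fail fast, and NOTCOMMA stops each field at the next comma instead of backtracking. Since the messages are essentially comma-separated, the csv or dissect filters may turn out to be cheaper still, but the anchored grok keeps your existing field names.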
Our IIS server generates logs in the following format:
Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
2018-09-13 08:47:52 ::1 GET / - 80 U:papl ::1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/68.0.3440.106+Safari/537.36 - 200 0 0 453
2018-09-13 08:47:52 ::1 GET /api/captcha.aspx rnd=R43YM 80 U:papl ::1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/68.0.3440.106+Safari/537.36 http://localhost/ 200 0 0 36
Now I want to configure Logstash so that it creates separate columns for the IP, the request method (GET or POST), and the page name, which here is /api/captcha.aspx.
But it is creating a single field named "message" in Elasticsearch and storing the whole value in that field.
So what changes should I make in Logstash to create separate columns in Elasticsearch for the IP, the request method (POST/GET), and the page name?
Currently, I am using the following filter:
match => {"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"
With this, only the message field is created and all the values are stored in that single field.
Please help me.
NB: to test your pattern, you can use this site, which allows you to save a lot of time when working with patterns.
The pattern you're using is too long. If you just want the IP, the request method, and the page name, you should only extract what you need. In addition, a shorter pattern will be quicker to execute and more resilient to change.
This filter correctly extracts what you asked for:
match => {"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:PageName}"}
With this pattern and the logs you provided, you can check the result on the site linked above. I also tested the filter with Logstash:
filter {
grok { match => {"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:PageName}"} }
}
output {
stdout { codec => json }
}
With this input:
2018-09-16 04:11:52 W3SVC10 webserver 107.6.166.194 GET /axestrack/homepagedata/ uname=satish34&pwd=3445&panelid=1 80 - 223.188.235.131 HTTP/1.1 Dalvik/1.6.0+(Linux;+U;+Android+4.4.4;+2014818+MIUI/V7.5.2.0.KHJMIDE) - - vehicletrack.biz 200 0 0 730 229 413
I'm getting this result:
{
"client":"107.6.166.194",
"method":"GET",
"#version":"1",
"host":"frsred-0077",
"message":"2018-09-16 04:11:52 W3SVC10 webserver 107.6.166.194 GET /axestrack/homepagedata/ uname=satish34&pwd=3445&panelid=1 80 - 223.188.235.131 HTTP/1.1 Dalvik/1.6.0+(Linux;+U;+Android+4.4.4;+2014818+MIUI/V7.5.2.0.KHJMIDE) - - vehicletrack.biz 200 0 0 730 229 413\r",
"#timestamp":"2018-09-18T08:13:23.539Z",
"PageName":"/axestrack/homepagedata/"
}
My log file
maker model mileage manufacture_year engine_displacement engine_power body_type color_slug stk_year transmission door_count seat_count fuel_type date_created date_last_seen price_eur
ford galaxy 151000 2011 2000 103 None man 5 7 diesel 2015-11-14 18:10:06.838319+00 2016-01-27 20:40:15.46361+00 10584.75
skoda octavia 143476 2012 2000 81 None man 5 5 diesel 2015-11-14 18:10:06.853411+00 2016-01-27 20:40:15.46361+00 8882.31
bmw 97676 2010 1995 85 None man 5 5 diesel 2015-11-14 18:10:06.861792+00 2016-01-27 20:40:15.46361+00 12065.06
skoda fabia 111970 2004 1200 47 None man 5 5 gasoline 2015-11-14 18:10:06.872313+00 2016-01-27 20:40:15.46361+00 2960.77
skoda fabia 128886 2004 1200 47 None man 5 5 gasoline 2015-11-14 18:10:06.880335+00 2016-01-27 20:40:15.46361+00 2738.71
Error is below
[2018-03-23T11:35:20,226][ERROR][logstash.agent]Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::PluginLoadingError", :message=>"Couldn't find any filter plugin named 'txt'. Are you sure this is correct?
My conf file is below
input {
  file {
    path => "/home/elk/data/logklda.txt"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  txt {
    separator => " "
    columns => ["name","type","category","date","error_log"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "logklda"
    document_type => "category"
  }
  stdout {}
}
The txt filter does not exist. From your configuration, it seems you want to use the csv filter. In your case, replacing txt with csv, the configuration would look like this:
csv {
separator => " "
columns => ["name","type","category","date","error_log"]
}
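As a side note, the columns option should list one name per field in the file, and the names in your config don't match the header of the sample log. Assuming you actually want the columns from that header, the filter might look like the sketch below (column names copied from the header line; whether the file is space- or tab-delimited is an assumption you would need to check):
csv {
  # put a real TAB character between the quotes instead of " " if the file is tab-delimited
  separator => " "
  columns => ["maker","model","mileage","manufacture_year","engine_displacement","engine_power","body_type","color_slug","stk_year","transmission","door_count","seat_count","fuel_type","date_created","date_last_seen","price_eur"]
}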
There is no filter called "txt". I think you should check the Logstash filter plugin documentation and correct your code: https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
I created a simple index with a suggest field of the completion type and indexed some city names. For the suggest field I use the German analyzer.
PUT city_de
{
"mappings": {
"city" : {
"properties": {
"name" : {
"type": "text",
"analyzer": "german"
},
"suggest" : {
"type": "completion",
"analyzer": "german"
}
}
}
}
}
The analyzer works fine and searching with umlauts works well. The autocompletion is also perfect. But I ran into an issue when searching for the term wie.
Let's say I have two documents, Wiesbaden and Wien, each with its name as the suggest completion term.
If I search for wie, I expect the cities Wien and Wiesbaden in the response, but unfortunately I get no results. I suppose wie is dropped because of the German analyzer, since searching for wi or wies returns valid responses.
The same happens for the terms was, er, sie and und, which look like German stopwords.
Do I need any additional configuration to also get results when I search for wie or was?
Thanks!
The problem
Searching city names by prefix
"wie" should find "Wien" or "Wiesbaden"
Possible solution approach
For this use case I would suggest using an edge n-gram tokenizer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html) and ASCII folding the terms (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html).
Example
wien
token position start offset end offset
w 0 0 1
wi 1 0 2
wie 2 0 3
wien 3 0 4
wiesbaden
token position start offset end offset
w 0 0 1
wi 1 0 2
wie 2 0 3
wies 3 0 4
...
wiesbaden 8 0 9
Keep in mind that the system has to work in an asymmetric way now: the query should not be run through the edge n-gram analyzer (use the keyword analyzer or a simple search analyzer), but the data in the index has to be.
There are two ways to achieve this:
1.) Specify the search analyzer in the query itself.
2.) Bind the search analyzer to the field in the mapping:
"cities": {
"type": "text",
"fields": {
"autocomplete": {
"type": "text",
"analyzer": "autocomplete_analyzer", <-- index time analyzer
"search_analyzer": "autocomplete_search" <-- search time analyzer
}
}
}
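For completeness, a minimal sketch of the settings block that would define the two analyzers referenced above (the analyzer and filter names are taken from the mapping snippet, the gram sizes are illustrative):
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding", "autocomplete_filter" ]
        },
        "autocomplete_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
The mapping with the cities field from above would go into the same index-creation request, so that autocomplete_analyzer is applied at index time and autocomplete_search at query time.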
Why does the German analyzer not work?
The analyzer is designed for German text: it uses a light stemmer to strip inflectional endings and it removes German stopwords (wie, was, er, sie and und are all on that list).
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#german-analyzer
Here is an example of the typical terms generated by this analyzer:
Hallo hier ist der Text über Wiesbaden und Wien. Es scheint angebracht über Wände und Wandern zu sprechen.
token position start offset end offset
hallo 0 0 5
text 4 19 23
wiesbad 6 29 38
wien 8 43 47
scheint 10 52 59
angebracht 11 60 70
wand 13 76 81
wandern 15 86 93
sprech
If it works on city names, that is just by coincidence.
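You can check this yourself with the _analyze API; a quick sketch (using the stock german analyzer) would be:
POST _analyze
{
  "analyzer": "german",
  "text": "Wiesbaden und Wien"
}
This returns wiesbad and wien; und is dropped as a stopword, and the same happens to wie when it arrives as a search term, which is why the prefix lookup finds nothing.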
How can I format a number in Logstash?
I am using the '%' format expression in Ruby code in the ruby filter plugin, but I get nil as the result. I tried the sprintf and format functions, with the same result.
Below is my code snippet.
ruby {
code => "
event.set( 'positioning', event.get('branch_lat') + ',' + event.get('branch_lon') )
event.set( 'report_datetime', event.get('report_date') + '%04d' % event.get('report_time') )
"
}
As a result, I get the error below in the log.
[2016-10-28T12:31:43,217][ERROR][logstash.filters.ruby ] Ruby exception occurred: undefined method `+' for nil:NilClass
My platform information is below.
[root@elk-analytic logstash]# rpm -qi logstash
Name : logstash
Epoch : 1
Version : 5.0.0
Release : 1
Architecture: noarch
Install Date: Thu 27 Oct 2016 01:26:03 PM JST
Group : default
Size : 198320729
License : ASL 2.0
Signature : RSA/SHA512, Wed 26 Oct 2016 01:57:59 PM JST, Key ID d27d666cd88e42b4
Source RPM : logstash-5.0.0-1.src.rpm
Build Date : Wed 26 Oct 2016 01:10:26 PM JST
Build Host : packer-virtualbox-iso-1474648640
Relocations : /
Packager    : <vagrant@packer-virtualbox-iso-1474648640>
Vendor : Elasticsearch
URL : http://www.elasticsearch.org/overview/logstash/
Summary : An extensible logging pipeline
Description :
An extensible logging pipeline
Added on 2016.10.28 14:32
My goal is to parse the CSV columns below into a timestamp field in Elasticsearch.
Please note that the hour of the time field has a mix of 1-digit and 2-digit values.
date,time
20160204,1000
20160204,935
I tried using the date filter plugin, but it did not work properly and logged the following error.
[2016-10-28T11:00:10,233][WARN ][logstash.filters.date ] Failed parsing date from field {:field=>"report_datetime",
:value=>"20160204 935", :exception=>"Cannot parse \"20160204 935\": Value 93 for hourOfDay must be in the range [0,23]", :config_parsers=>"YYYYMMdd Hmm", :config_locale=>"default=en_US"}
Below is the code snippet that produced the error above.
ruby {
code => "
event.set( 'positioning', event.get('branch_lat') + ',' + event.get('branch_lon') )
event.set( 'report_datetime', event.get('report_date') + ' ' + event.get('report_time') )
"
}
# Set the #timestamp according to report_date and time
date {
"match" => ["report_datetime", "YYYYMMdd Hmm"]
}
I made some modifications and ended up with the code I first posted.
I suggest doing it like this, without any ruby filter:
filter {
# your other filters...
# if the time has only 3 digits (1-digit hour), pad it with a leading zero
if [time] =~ /^\d{3}$/ {
mutate {
add_field => { "report_datetime" => "%{date} 0%{time}" }
}
# otherwise just concat the fields
} else {
mutate {
add_field => { "report_datetime" => "%{date} %{time}" }
}
}
# match date and time
date {
"match" => ["report_datetime", "yyyyMMdd HHmm"]
"target" => "report_datetime"
}
}
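If you would rather keep the ruby filter from the question, a minimal sketch of the zero-padding approach (the field names date and time are taken from the mutate example above; the guard avoids the nil:NilClass error when a field is missing):
ruby {
  code => "
    d = event.get('date')
    t = event.get('time')
    # only build the field when both parts are present
    event.set('report_datetime', d + ' ' + t.to_s.rjust(4, '0')) if d && t
  "
}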