Is there a way to parse log messages using rsyslog config and transform them to structured messages? - rsyslog

I am trying to parse log messages and transform them into structured messages using rsyslog. Is there a way to support such an operation with rsyslog config? I have not yet explored the option of writing a custom parser or message modification plugin for this.
I found that template list properties can do some of this. Is there a way to do the following?
Map 2 fields to a single output name, e.g. "__ts": "2018-09-20 10:18:56.363" (the first 2 fields in the example below). I would rather not use a regex here, as I am looking for a solution that does not depend on the value of the fields; the two fields could be two strings or some other values, not just dates.
Extract what is left of msg after extracting all known fields by position, e.g. "msg": "Unregistering application nameOfAnApiHere with someOtherName with status DOWN".
Is there a way to use local variables to hold the values of fields from msg and use the variables in templates?
Example Log message:
2018-09-20 10:18:56.363 INFO --- [Thread-68] x.y.z.key1Value Unregistering application nameOfAnApiHere with someOtherName with status DOWN
1. rsyslog config template definition
template(name="structure-log-format" type="list") {
    constant(value="{")
    # This only extracts the first field, with the value 2018-09-20.
    # TODO: What is a way to map the first 2 fields to the __ts field?
    property(outname="__ts" name="msg" field.number="1" field.delimiter="32" format="jsonf") constant(value=", ")
    constant(value="\"event\":[{")
    constant(value="\"payload\":{")
    property(outname="_log_" name="syslogtag" format="jsonf") constant(value=", ")
    property(outname="__loglvl" name="msg" field.number="4" field.delimiter="32" format="jsonf") constant(value=", ")
    property(outname="__thread" name="msg" field.number="7" field.delimiter="32" format="jsonf") constant(value=", ")
    property(outname="__key1" name="msg" field.number="8" field.delimiter="32" format="jsonf") constant(value=", ")
    # The following setting includes the full message value, starting from "2018-09-20 ... DOWN".
    # TODO: What is a way to include only the message starting from "Unregistering ... DOWN"?
    property(name="msg" format="jsonf" droplastlf="on")
    constant(value="}")
    constant(value="}]} \n")
}
2. Expected result:
{
    "__ts": "2018-09-20 10:18:56.363",
    "event": [
        {
            "payload": {
                "_log_": "catalina",
                "__loglvl": "INFO",
                "__thread": "Thread-68",
                "__key1": "x.y.z.key1Value",
                "msg": "Unregistering application nameOfAnApiHere with someOtherName with status DOWN"
            }
        }
    ]
}
3. Actual result:
{
    "__ts": "2018-09-20",
    "event": [
        {
            "payload": {
                "_log_": "catalina",
                "__loglvl": "INFO",
                "__thread": "Thread-68",
                "__key1": "x.y.z.key1Value",
                "msg": "2018-09-20 10:18:56.363 INFO 2144 --- [Thread-68] x.y.z.key1Value Unregistering application nameOfAnApiHere with someOtherName with status DOWN"
            }
        }
    ]
}
Thank you.

You can also use regular expressions to match parts of a message. For example, replace your outname="__ts" property with:
property(outname="__ts" name="msg"
regex.expression="([^ ]+ +[^ ]+)"
regex.type="ERE"
regex.submatch="1" format="jsonf")
Here the extended regular expression (ERE) looks for a run of not-a-space characters ([^ ] repeated by the +), followed by one or more spaces, and then another run of not-a-space characters. These 2 words are captured as a submatch by the (), and you select that submatch, counting from 1. The result should be what you want.
You can similarly use a regex for the second requirement, either by counting "words" and spaces again, or with some other, more precise match. Here the regex skips 6 words by putting a repeat count {6} after the word-and-spaces pattern, then captures the rest with (.*). Since there are 2 sets of (), the submatch to keep is now 2, not 1:
property(name="msg"
regex.expression="([^ ]+ +){6}(.*)"
regex.type="ERE"
regex.submatch="2" format="jsonf" droplastlf="on" )

Related

Logstash aggregate filter didn't add a new field into the index; in fact, it didn't create any index after adding the aggregate filter

Dears,
I have created two grok patterns for a single log file. I want to add an existing field to another document if a condition matches. Could you advise me how to add one existing field to another document?
My log input:
INFO [2020-05-21 18:00:17,240][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - ValidationRuleSegmentStops - CarrierCode = AI ServiceNumber = 0531 DeparturePortCode = DEL ArrivalPortCode = CCJ DepartureDateTime = Thu May 21 XXXXX AST 2020 ArrivalDateTime = Thu May 21 XXX
WARN [2020-05-21 18:00:17,242][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - ValidationRuleSegmentStops - Multiple segment stops with departure datetime not set - only one permitted. Message sequence number 374991954 discarded
INFO [2020-05-21 18:00:17,242][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - SensitiveDataFilterHelper - Sensitive Data Filter key is not enabled.
ERROR [2020-05-21 18:00:17,243][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - AbstractMessageHandler - APP_LOGICAL_VALIDATION_FAILURE: comment1 = Multiple segment stops with departure datetime not set - only one permitted. Message sequence number 374991954 discarded
My filter:
filter {
  if [type] == "server" {
    grok {
      match => [ "message", "%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<seg_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+CarrierCode\s+=\s+(?<carrierCode>[A-Z0-9]{2})\s+ServiceNumber\s+=\s+(?<Number>[0-9]{4})\s+DeparturePortCode\s+=\s+(?<DeparturePort>[A-Z]{3})\s+ArrivalPortCode\s+=\s+(?<ArrivalPort>[A-Z]{3})\s+DepartureDateTime\s+=\s+%{DATESTAMP_OTHER:departure_datetime}\s+ArrivalDateTime\s+=\s+%{DATESTAMP_OTHER:arrival_datetime},%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<failed_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+(?<app-logical-error>[A-Z]{3}\_[A-Z]{7}\_[A-Z]{10}\_[A-Z]{7})\:\s+comment1 = Multiple segment stops with\s+%{WORD:direction}\s+datetime not set - only one permitted. Message sequence number\s+%{NUMBER:appmessageid:int}"]
    }
  }
  if [failed_corid] == "%{seg_corid}" {
    aggregate {
      task_id => "%{appmessageid}"
      code => "map['carrierCode'] = [carrierCode]"
      map_action => "create"
      end_of_task => true
      timeout => 120
    }
  }
  mutate { remove_field => ["message"] }
  if "_grokparsefailure" in [tags] { drop {} }
}
output {
  if [type] == "server" {
    elasticsearch {
      hosts => ["X.X.X.X:9200"]
      index => "app-data-%{+YYYY-MM-DD}"
    }
  }
}
My required fields are found in different grok patterns. For example, the carrier code is found in:
%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<seg_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+CarrierCode\s+=\s+(?<carrierCode>[A-Z0-9]{2})\s+ServiceNumber\s+=\s+(?<Number>[0-9]{4})\s+DeparturePortCode\s+=\s+(?<DeparturePort>[A-Z]{3})\s+ArrivalPortCode\s+=\s+(?<ArrivalPort>[A-Z]{3})\s+DepartureDateTime\s+=\s+%{DATESTAMP_OTHER:departure_datetime}\s+ArrivalDateTime\s+=\s+%{DATESTAMP_OTHER:arrival_datetime},
and the failed corid is found in:
%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<failed_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+(?<app-logical-error>[A-Z]{3}\_[A-Z]{7}\_[A-Z]{10}\_[A-Z]{7})\:\s+comment1 = Multiple segment stops with\s+%{WORD:direction}\s+datetime not set - only one permitted. Message sequence number\s+%{NUMBER:appmessageid:int}
We want to merge the carrierCode field into the failed_corid documents. Kindly help us with how to do this in Logstash.
Expected output fields in the documents:
{
"failed_corid": [id-433erfdtert3er]
"carrier_code": AI
}

Scan/Match incorrect input error messages

I am trying to count the correct inputs from the user. An input looks like:
m = "<ex=1>test xxxx <ex=1>test xxxxx test <ex=1>"
The tag ex=1 and the word test have to be connected and in this particular order to count as correct. In case of an invalid input, I want to send the user an error message that explains the error.
I tried to do it as written below:
ex_test_size = m.scan(/<ex=1>test/).size # => 2
test_size = m.scan(/test/).size # => 3
ex_size = m.scan(/<ex=1>/).size # => 3
puts "lack of tags(<ex=1>)" if ex_test_size < ex_size
puts "Lack of the word(test)" if ex_test_size < test_size
I believe this can be written in a better way; the way I wrote it seems prone to errors. How can I make sure that all the errors will be found and shown to the user?
You might use negative lookarounds:
#⇒ ["xxx test", "<ex=1>"]
m.scan(/<ex=1>(?!test).{,4}|.{,4}(?<!<ex=1>)test/).map do |msg|
"<ex=1>test expected, #{msg} got"
end.join(', ')
We scan the string for either <ex=1> not followed by test, or test not preceded by <ex=1>. Also, we grab up to 4 surrounding characters at the violation to make the message more descriptive.
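A self-contained run with the string from the question, for reference (the commented output below is what I would expect; verify on your side):
m = "<ex=1>test xxxx <ex=1>test xxxxx test <ex=1>"

errors = m.scan(/<ex=1>(?!test).{,4}|.{,4}(?<!<ex=1>)test/).map do |msg|
  "<ex=1>test expected, #{msg} got"
end.join(', ')

puts errors
#⇒ <ex=1>test expected, xxx test got, <ex=1>test expected, <ex=1> got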

Pig: my filter doesn't give a result

Hello, I am a newcomer to the Hadoop environment.
I wrote a request to get data from a CSV file.
LoadHomicide = LOAD '/user/admin/Crimes_samples.csv' USING PigStorage('\t') AS
    (Date:chararray, Block:chararray, PrimaryType:chararray, Description:chararray,
     LocationDescription:chararray, Arrest:chararray, Domestic:chararray,
     District:chararray, Year:chararray);
uniq_arrest = FILTER LoadHomicide BY ($5 matches'%FALSE%');
dump uniq_arrest;
I get no error, and the script's log reports success, but the filter returns nothing. The CSV looks like this:
ID","Case Number","Date","Block","IUCR","Primary Type","Description","Location Description","Arrest","Domestic","Beat","District","Ward","Community Area","FBI Code","X Coordinate","Y Coordinate","Year","Updated On","Latitude","Longitude","Location"
0442761,"HZ181379",3/9/16 11:55 PM,"023XX N HAMLIN
AVE","0560","ASSAULT","SIMPLE","APARTMENT","false","false","2525","025",35,"22","08A",1150660,1915214,2016,03/16/2016,41.92,-87.72,"(41.923245915,
-87.721845939)" 10442848,"HZ181470",3/9/16 11:55 PM,"0000X W JACKSON BLVD","1310","CRIMINAL DAMAGE","TO PROPERTY","CTA GARAGE / OTHER
PROPERTY","false","false","0113","001",2,"32","14",1176304,1898987,2016,03/16/2016,41.88,-87.63,"(41.878177799,
-87.628111493)" 10442789,"HZ181391",3/9/16 11:55 PM,"052XX W HURON ST","1150","DECEPTIVE PRACTICE","CREDIT CARD
FRAUD","ALLEY","false","false","1524","015",28,"25","11",1141433,1904126,2016,03/16/2016,41.89,-87.76,"(41.892994741,
-87.756023813)" 10447046,"HZ185157",3/9/16 11:50 PM,"055XX N LINCOLN AVE","0460","BATTERY","SIMPLE","HOTEL
The matches syntax is incorrect. Also, there is no 6th field ($5 refers to the 6th field in the schema; positional notation starts from $0) that has "false" in it. Use the correct field and the right syntax. Assuming the 6th field has "false" in it, this is how you would apply the filter on it:
uniq_arrest = FILTER LoadHomicide BY ($5 matches '.*false.*');
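Equivalently, since the schema names that 6th column Arrest, you could reference it by alias instead of position (a sketch, assuming Arrest is indeed the column holding "false"):
-- same filter, but using the column alias from the schema
uniq_arrest = FILTER LoadHomicide BY (Arrest matches '.*false.*');
dump uniq_arrest;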

`awk`-ing plain text tables produced by pandoc with fields/cells/values that contain spaces

I've run into this problem a number of times, and maybe it's just my unsophisticated technique since I'm still a bit of a novice with the finer points of text processing, but using pandoc to go from html to plain yields pretty tables in the form of:
# IP Address Device Name MAC Address
--- ------------- -------------------------- -------------------
1 192.168.1.3 ANDROID-FFFFFFFFFFFFFFFF FF:FF:FF:FF:FF:FF
2 192.168.1.4 XXXXXXX FF:FF:FF:FF:FF:FF
3 192.168.1.5 -- FF:FF:FF:FF:FF:FF
4 192.168.1.6 -- FF:FF:FF:FF:FF:FF
--- ------------- -------------------------- -------------------
The column headings in this example (and the fields/cells/whatever in others) aren't especially awk-friendly since they contain spaces. There must be some utility (or pandoc option) to add delimiters or otherwise process the table in a smart and simple way to make it easier to use with awk (since the dash ruling hints at the max column width), but I'm fast approaching the limits of my knowledge and have been unable to find any good solutions on my own. I'd appreciate any help, and I'm open to alternate approaches (I just use pandoc since that's what I know).
I've got a solution for you which parses the dash line to get column lengths, then uses that info to divide each line into columns (similar to what #shellter proposed in the comment, but without the need to hardcode values).
First, within the BEGIN block we read the headers line and the dashes line. Then we will grab the column lengths by splitting the dashline and processing it.
BEGIN {
    getline headers
    getline dashline
    col_count = split(dashline, columns, " ")
    for (i=1; i<=col_count; i++)
        col_lens[i] = length(columns[i])
}
Now we have the lengths of each column and you can use that inside the main body.
{
    start = 1
    for (i=start; i<=col_count; i++) {
        col_n = substr($0, start, col_lens[i])
        start = start + col_lens[i] + 1
        printf("column %i: [%s]\n", i, col_n);
    }
}
That seems a little onerous, but it works. I believe this answers your question. To make things a little nicer, I factored out the line parsing into a user defined function. That's convenient because you can now use it on the headers you stored (if you want).
Here's the complete solution:
function parse_line(line, col_lens, col_count) {
    start = 1
    for (i=start; i<=col_count; i++) {
        col_i = substr(line, start, col_lens[i])
        start = start + col_lens[i] + 1
        printf("column %i: [%s]\n", i, col_i)
    }
}
BEGIN {
    getline headers
    getline dashline
    col_count = split(dashline, columns, " ")
    for (i=1; i<=col_count; i++) {
        col_lens[i] = length(columns[i])
    }
    parse_line(headers, col_lens, col_count);
}
{
    parse_line($0, col_lens, col_count);
}
If you put your example table into a file called table and this program into a file called dashes.awk, here's the output (using head -n -1 to drop the final row of dashes):
$ head -n -1 table | awk -f dashes.awk
column 1: [ # ]
column 2: [ IP Address ]
column 3: [ Device Name ]
column 4: [ MAC Address]
column 1: [ 1 ]
column 2: [ 192.168.1.3 ]
column 3: [ ANDROID-FFFFFFFFFFFFFFFF ]
column 4: [ FF:FF:FF:FF:FF:FF]
column 1: [ 2 ]
column 2: [ 192.168.1.4 ]
column 3: [ XXXXXXX ]
column 4: [ FF:FF:FF:FF:FF:FF]
column 1: [ 3 ]
column 2: [ 192.168.1.5 ]
column 3: [ -- ]
column 4: [ FF:FF:FF:FF:FF:FF]
column 1: [ 4 ]
column 2: [ 192.168.1.6 ]
column 3: [ -- ]
column 4: [ FF:FF:FF:FF:FF:FF]
Have a look at pandoc's filter functionality: it allows you to programmatically alter the document without having to parse the table yourself. Probably the simplest option is to use Lua filters, as those require no external program and are fully platform-independent.
Here is a filter which acts on each cell of the table body, ignoring the table header:
function Table (table)
  for i, row in ipairs(table.rows) do
    for j, cell in ipairs(row) do
      local cell_text = pandoc.utils.stringify(pandoc.Div(cell))
      local text_val = changed_cell(cell_text)
      row[j] = pandoc.read(text_val).blocks
    end
  end
  return table
end
where changed_cell could be either a Lua function (Lua has good built-in support for patterns) or a function which pipes the output through awk:
function changed_cell (raw_text)
  return pandoc.pipe('awk', {'YOUR AWK SCRIPT'}, raw_text)
end
The above is a slightly unidiomatic pandoc filter, as filters usually don't act on raw strings but on pandoc AST elements. However, the above should work fine in your case.
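For completeness, a pure-Lua changed_cell could look like the minimal sketch below; the whitespace-trimming behaviour is only an illustrative assumption, not something from the answer above:
function changed_cell (raw_text)
  -- illustrative only: trim leading/trailing whitespace using a Lua pattern
  return (raw_text:gsub("^%s*(.-)%s*$", "%1"))
end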

grok match : parse log file for time only using pattern or match

I want to parse the line below from a log file.
03:34:19,491 INFO [:sm-secondary-17]: DBBackup:106 - The max_allowed_packet value defined in [16M] does not match the value from /etc/mysql/my.cnf [24M]. The value will be used.
After parsing, the output must be:
Time : 03:34:19
LogType : INFO
Message : [:sm-secondary-17]: DBBackup:106 - The max_allowed_packet value defined in [16M] does not match the value from /etc/mysql/my.cnf [24M]. The value will be used.
Ignore : ,491 (comma and 3 digit number).
The grok filter config should look like this to parse the mentioned log:
...
filter {
  grok {
    match => { "message" => "%{TIME:Time},%{NUMBER:ms} %{WORD:LogType} %{GREEDYDATA:Message}" }
    remove_field => [ "ms" ]
  }
}
...
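Applied to the sample line, the resulting event should carry fields roughly like the sketch below (other event metadata omitted; leading whitespace in Message may differ, and the temporary ms field is dropped by remove_field):
Time    => "03:34:19"
LogType => "INFO"
Message => "[:sm-secondary-17]: DBBackup:106 - The max_allowed_packet value defined in [16M] does not match the value from /etc/mysql/my.cnf [24M]. The value will be used."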
