Hi all,
I have created two grok patterns for a single log file. I want to add an existing field to another document when a condition matches. Could you advise me how to do this?
My log input:
INFO [2020-05-21 18:00:17,240][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - ValidationRuleSegmentStops - CarrierCode = AI ServiceNumber = 0531 DeparturePortCode = DEL ArrivalPortCode = CCJ DepartureDateTime = Thu May 21 XXXXX AST 2020 ArrivalDateTime = Thu May 21 XXX
WARN [2020-05-21 18:00:17,242][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - ValidationRuleSegmentStops - Multiple segment stops with departure datetime not set - only one permitted. Message sequence number 374991954 discarded
INFO [2020-05-21 18:00:17,242][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - SensitiveDataFilterHelper - Sensitive Data Filter key is not enabled.
ERROR [2020-05-21 18:00:17,243][appListenerContainer-3][ID:b5ba824c-9f79-4dd4-9d53-1012250ce72a] - AbstractMessageHandler - APP_LOGICAL_VALIDATION_FAILURE: comment1 = Multiple segment stops with departure datetime not set - only one permitted. Message sequence number 374991954 discarded
My filter:
filter {
if [type] == "server" {
grok {
match => [ "message", "%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<seg_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+CarrierCode\s+=\s+(?<carrierCode>[A-Z0-9]{2})\s+ServiceNumber\s+=\s+(?<Number>[0-9]{4})\s+DeparturePortCode\s+=\s+(?<DeparturePort>[A-Z]{3})\s+ArrivalPortCode\s+=\s+(?<ArrivalPort>[A-Z]{3})\s+DepartureDateTime\s+=\s+%{DATESTAMP_OTHER:departure_datetime}\s+ArrivalDateTime\s+=\s+%{DATESTAMP_OTHER:arrival_datetime},%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<failed_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+(?<app-logical-error>[A-Z]{3}\_[A-Z]{7}\_[A-Z]{10}\_[A-Z]{7})\:\s+comment1 = Multiple segment stops with\s+%{WORD:direction}\s+datetime not set - only one permitted. Message sequence number\s+%{NUMBER:appmessageid:int}"]
}
}
if [failed_corid] == "%{seg_corid}" {
aggregate {
task_id => "%{appmessageid}"
code => "map['carrierCode'] = [carrierCode]"
map_action => "create"
end_of_task => true
timeout => 120
}
}
mutate { remove_field => ["message"]}
if "_grokparsefailure" in [tags]{drop {} }
}
output {
if [type] == "server" {
elasticsearch {
hosts => ["X.X.X.X:9200"]
index => "app-data-%{+YYYY-MM-dd}"
}
}
}
My required fields are found in different grok patterns. For example, carrierCode is found in:
%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<seg_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+CarrierCode\s+=\s+(?<carrierCode>[A-Z0-9]{2})\s+ServiceNumber\s+=\s+(?<Number>[0-9]{4})\s+DeparturePortCode\s+=\s+(?<DeparturePort>[A-Z]{3})\s+ArrivalPortCode\s+=\s+(?<ArrivalPort>[A-Z]{3})\s+DepartureDateTime\s+=\s+%{DATESTAMP_OTHER:departure_datetime}\s+ArrivalDateTime\s+=\s+%{DATESTAMP_OTHER:arrival_datetime},
and failed_corid is found in:
%{LOGLEVEL:log-level}\s+\[%{TIMESTAMP_ISO8601:createddatetime}\]\[%{DATA:lstcontainer}\](?<failed_corid>[^\s]+)\s+\-\s+%{WORD:abstract}\s+\-\s+(?<app-logical-error>[A-Z]{3}\_[A-Z]{7}\_[A-Z]{10}\_[A-Z]{7})\:\s+comment1 = Multiple segment stops with\s+%{WORD:direction}\s+datetime not set - only one permitted. Message sequence number\s+%{NUMBER:appmessageid:int}
We want to merge the carrierCode field into the failed_corid documents.
Kindly help us with how to do this in Logstash.
Expected output fields in the documents:
{
  "failed_corid": "id-433erfdtert3er",
  "carrier_code": "AI"
}
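For what it's worth, the usual pattern for this with the aggregate filter is to key both events on the shared correlation ID and use the event API to read and write fields. This is a sketch only, untested against your pipeline; it assumes both log lines carry the same ID (captured as seg_corid and failed_corid by the patterns above), and note that the aggregate filter requires the pipeline to run with a single worker:

```
# event 1: remember carrierCode under the shared correlation ID
if [seg_corid] {
  aggregate {
    task_id => "%{seg_corid}"
    code => "map['carrierCode'] = event.get('carrierCode')"
    map_action => "create"
  }
}
# event 2: copy the remembered value onto the failure document
if [failed_corid] {
  aggregate {
    task_id => "%{failed_corid}"
    code => "event.set('carrier_code', map['carrierCode'])"
    map_action => "update"
    end_of_task => true
    timeout => 120
  }
}
```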
I am trying to count the correct inputs from the user. An input looks like:
m = "<ex=1>test xxxx <ex=1>test xxxxx test <ex=1>"
The tag ex=1 and the word test have to be connected and in this particular order to count as correct. In case of an invalid input, I want to send the user an error message that explains the error.
I tried to do it as written below:
ex_test_size = m.scan(/<ex=1>test/).size # => 2
test_size = m.scan(/test/).size # => 3
ex_size = m.scan(/<ex=1>/).size # => 3
puts "lack of tags(<ex=1>)" if ex_test_size < ex_size
puts "Lack of the word(test)" if ex_test_size < test_size
I believe this can be written in a better way, as my approach seems prone to errors. How can I make sure that all the errors will be found and shown to the user?
You might use negative lookarounds:

m.scan(/<ex=1>(?!test).{,4}|.{,4}(?<!<ex=1>)test/).map do |msg|
  "<ex=1>test expected, #{msg} got"
end.join(', ')

We scan the string for either <ex=1> not followed by test, or test not preceded by <ex=1>. We also grab up to 4 characters around each violation for the more descriptive message; on your input the scan itself returns:

#⇒ ["xxx test", "<ex=1>"]
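Run end to end on the sample input, this is a quick check of the scan:

```ruby
m = "<ex=1>test xxxx <ex=1>test xxxxx test <ex=1>"

# Collect every violation: <ex=1> not followed by test, or test not
# preceded by <ex=1>, with up to 4 surrounding characters for context.
violations = m.scan(/<ex=1>(?!test).{,4}|.{,4}(?<!<ex=1>)test/)
errors = violations.map { |msg| "<ex=1>test expected, #{msg} got" }.join(', ')

puts violations.inspect  # => ["xxx test", "<ex=1>"]
puts errors
```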
Hello, I am a newcomer to the Hadoop environment.
I made a request to load data from a CSV:
LoadHomicide = LOAD '/user/admin/Crimes_samples.csv' USING PigStorage('\t') AS
    (Date:chararray,Block:chararray,PrimaryType:chararray,Description:chararray,
     LocationDescription:chararray,Arrest:chararray,Domestic:chararray,
     District:chararray,Year:chararray);
uniq_arrest = FILTER LoadHomicide BY ($5 matches'%FALSE%');
dump uniq_arrest;
I get no error, and the script's log reports success. The CSV looks like this:
"ID","Case Number","Date","Block","IUCR","Primary Type","Description","Location Description","Arrest","Domestic","Beat","District","Ward","Community Area","FBI Code","X Coordinate","Y Coordinate","Year","Updated On","Latitude","Longitude","Location"
0442761,"HZ181379",3/9/16 11:55 PM,"023XX N HAMLIN AVE","0560","ASSAULT","SIMPLE","APARTMENT","false","false","2525","025",35,"22","08A",1150660,1915214,2016,03/16/2016,41.92,-87.72,"(41.923245915, -87.721845939)"
10442848,"HZ181470",3/9/16 11:55 PM,"0000X W JACKSON BLVD","1310","CRIMINAL DAMAGE","TO PROPERTY","CTA GARAGE / OTHER PROPERTY","false","false","0113","001",2,"32","14",1176304,1898987,2016,03/16/2016,41.88,-87.63,"(41.878177799, -87.628111493)"
10442789,"HZ181391",3/9/16 11:55 PM,"052XX W HURON ST","1150","DECEPTIVE PRACTICE","CREDIT CARD FRAUD","ALLEY","false","false","1524","015",28,"25","11",1141433,1904126,2016,03/16/2016,41.89,-87.76,"(41.892994741, -87.756023813)"
10447046,"HZ185157",3/9/16 11:50 PM,"055XX N LINCOLN AVE","0460","BATTERY","SIMPLE","HOTEL
The matches syntax is incorrect. Also, there is no 6th field ($5 refers to the 6th field in the schema; positional notation starts from $0) that has "false" in it. Use the correct field and the right syntax. Assuming the 6th field has "false" in it, this is how you would apply the filter on it:
uniq_arrest = FILTER LoadHomicide BY ($5 matches '.*false.*');
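Pig's matches performs a full-string match against a Java regular expression, so SQL-style % wildcards are just literal characters and never match; .* is the regex equivalent. The same idea, sketched in Ruby for illustration (anchored Ruby regexes behave the same way here):

```ruby
arrest = "false"  # sample value from the Arrest column

# SQL LIKE wildcards are literal characters in a regex, so this never matches:
puts arrest.match?(/\A%FALSE%\z/)    # => false
# The regex way, matching the whole field:
puts arrest.match?(/\A.*false.*\z/)  # => true
```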
I've run into this problem a number of times, and maybe it's just my unsophisticated technique, as I'm still a bit of a novice with the finer points of text processing, but using pandoc to go from HTML to plain text yields pretty tables in the form of:
# IP Address Device Name MAC Address
--- ------------- -------------------------- -------------------
1 192.168.1.3 ANDROID-FFFFFFFFFFFFFFFF FF:FF:FF:FF:FF:FF
2 192.168.1.4 XXXXXXX FF:FF:FF:FF:FF:FF
3 192.168.1.5 -- FF:FF:FF:FF:FF:FF
4 192.168.1.6 -- FF:FF:FF:FF:FF:FF
--- ------------- -------------------------- -------------------
The column headings in this example (and the fields/cells in others) aren't especially awk-friendly since they contain spaces. There must be some utility (or pandoc option) to add delimiters or otherwise process the output in a smart, simple way that makes it easier to use with awk (since the dash ruling hints at the maximum column width), but I'm fast approaching the limits of my knowledge and have been unable to find a good solution on my own. I'd appreciate any help, and I'm open to alternate approaches (I just use pandoc because that's what I know).
I've got a solution for you which parses the dash line to get the column lengths, then uses that info to divide each line into columns (similar to what @shellter proposed in the comment, but without the need to hardcode values).
First, within the BEGIN block we read the headers line and the dashes line. Then we will grab the column lengths by splitting the dashline and processing it.
BEGIN {
getline headers
getline dashline
col_count = split(dashline, columns, " ")
for (i=1;i<=col_count;i++)
col_lens[i] = length(columns[i])
}
Now we have the lengths of each column and you can use that inside the main body.
{
start = 1
for (i=start;i<=col_count;i++){
col_n = substr($0, start, col_lens[i])
start = start + col_lens[i] + 1
printf("column %i: [%s]\n",i,col_n);
}
}
That seems a little onerous, but it works. I believe this answers your question. To make things a little nicer, I factored out the line parsing into a user defined function. That's convenient because you can now use it on the headers you stored (if you want).
Here's the complete solution:
function parse_line(line, col_lens, col_count){
start = 1
for (i=start;i<=col_count;i++){
col_i = substr(line, start, col_lens[i])
start = start + col_lens[i] + 1
printf("column %i: [%s]\n", i, col_i)
}
}
BEGIN {
getline headers
getline dashline
col_count = split(dashline, columns, " ")
for (i=1;i<=col_count;i++){
col_lens[i] = length(columns[i])
}
parse_line(headers, col_lens, col_count);
}
{
parse_line($0, col_lens, col_count);
}
If you put your example table into a file called table and this program into a file called dashes.awk, here's the output (using head -n -1 to drop the final row of dashes):
$ head -n -1 table | awk -f dashes.awk
column 1: [ # ]
column 2: [ IP Address ]
column 3: [ Device Name ]
column 4: [ MAC Address]
column 1: [ 1 ]
column 2: [ 192.168.1.3 ]
column 3: [ ANDROID-FFFFFFFFFFFFFFFF ]
column 4: [ FF:FF:FF:FF:FF:FF]
column 1: [ 2 ]
column 2: [ 192.168.1.4 ]
column 3: [ XXXXXXX ]
column 4: [ FF:FF:FF:FF:FF:FF]
column 1: [ 3 ]
column 2: [ 192.168.1.5 ]
column 3: [ -- ]
column 4: [ FF:FF:FF:FF:FF:FF]
column 1: [ 4 ]
column 2: [ 192.168.1.6 ]
column 3: [ -- ]
column 4: [ FF:FF:FF:FF:FF:FF]
Have a look at pandoc's filter functionality: it allows you to programmatically alter the document without having to parse the table yourself. Probably the simplest option is to use Lua filters, as those require no external program and are fully platform-independent.
Here is a filter which acts on each cell of the table body, ignoring the table header:
function Table (table)
for i, row in ipairs(table.rows) do
for j, cell in ipairs(row) do
local cell_text = pandoc.utils.stringify(pandoc.Div(cell))
local text_val = changed_cell(cell_text)
row[j] = pandoc.read(text_val).blocks
end
end
return table
end
where changed_cell could be either a lua function (lua has good built-in support for patterns) or a function which pipes the output through awk:
function changed_cell (raw_text)
return pandoc.pipe('awk', {'YOUR AWK SCRIPT'}, raw_text)
end
The above is a slightly unidiomatic pandoc filter, as filters usually don't act on raw strings but on pandoc AST elements. However, it should work fine in your case; you can apply it with pandoc's --lua-filter option (e.g. pandoc input.html -t plain --lua-filter=table-filter.lua, where table-filter.lua is whatever file you save the filter in).
I want to parse the line below from a log file.
03:34:19,491 INFO [:sm-secondary-17]: DBBackup:106 - The max_allowed_packet value defined in [16M] does not match the value from /etc/mysql/my.cnf [24M]. The value will be used.
After parsing, the output must be:
Time : 03:34:19
LogType : INFO
Message : [:sm-secondary-17]: DBBackup:106 - The max_allowed_packet value defined in [16M] does not match the value from /etc/mysql/my.cnf [24M]. The value will be used.
Ignore : ,491 (the comma and 3-digit number).
The grok filter config should look like this to parse the mentioned log:
...
filter {
grok {
match => {"message" => "%{TIME:Time},%{NUMBER:ms} %{WORD:LogType} %{GREEDYDATA:Message}"}
remove_field => [ "ms" ]
}
}
...
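To see what the pattern captures on the sample line, here is an equivalent plain regular expression, illustrated in Ruby. This is not the actual grok expansion; TIME, NUMBER, WORD, and GREEDYDATA are approximated with \d{2}:\d{2}:\d{2}, \d+, \w+, and .*:

```ruby
line = "03:34:19,491 INFO [:sm-secondary-17]: DBBackup:106 - The max_allowed_packet value defined in [16M] does not match the value from /etc/mysql/my.cnf [24M]. The value will be used."

# Time, then the ignored milliseconds, then the log level, then the rest.
fields = line.match(/\A(?<Time>\d{2}:\d{2}:\d{2}),(?<ms>\d+) (?<LogType>\w+) (?<Message>.*)\z/)

puts fields[:Time]     # => 03:34:19
puts fields[:LogType]  # => INFO
puts fields[:Message]  # prints the remainder of the line
```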