Grok not reading a word with a hyphen - Elasticsearch

This is my log line:
2017-09-25 08:58:17,861 p=14774 u=ec2-user | 14774 1506329897.86160: checking for any_errors_fatal
I'm trying to extract the user, but grok only gives me ec2 instead of the full word ec2-user.
Sorry, I'm new to the grok filter.
My current pattern:
%{TIMESTAMP_ISO8601:timestamp} p=%{WORD:process_id} u=%{WORD:user_id}
Current output:
...
...
...
"process_id": [
[
"14774"
]
],
"user_id": [
[
"ec2"
]
]
}

WORD is defined as "\b\w+\b"
See https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns
\b is a word boundary
\w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or "_"
+ means one or more of the previous element. So \w+ means one or more word characters
Note that \w does NOT match -
So to make it work, use this instead of WORD:
(?<user_id>\b[\w\-]+\b)
This does not use the predefined grok patterns but a "raw" regexp
The (?<name>...) syntax is a named capture group; it is used instead of %{...} because this is a raw regexp
\- means a literal - sign
[ ] means a character class. So [\w\-] will match everything \w does, and - as well
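Grok regexes use Oniguruma syntax, which Ruby's own engine (Onigmo, a fork of Oniguruma) shares, including the (?<name>...) named groups. So the difference can be checked in plain Ruby, outside Logstash, as a rough sketch: the snippet applies a WORD-style pattern and the hyphen-friendly character class to the log line from the question. The grok-patterns file linked above also defines USERNAME as [a-zA-Z0-9._-]+, so %{USERNAME:user_id} may be a simpler alternative.

```ruby
line = "2017-09-25 08:58:17,861 p=14774 u=ec2-user | 14774 1506329897.86160: checking for any_errors_fatal"

# WORD is \b\w+\b: \w has no hyphen, so the match stops at "ec2"
word_match = line.match(/u=(?<user_id>\b\w+\b)/)

# [\w\-] adds a literal hyphen to the character class
hyphen_match = line.match(/u=(?<user_id>\b[\w\-]+\b)/)

puts word_match[:user_id]   # prints ec2
puts hyphen_match[:user_id] # prints ec2-user
```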

Input: Allow1-2 : Success
Grok filter: (?:%{GREEDYDATA:Output}?|-)
Result:
{"Output":[["Allow1-2 : Success"]]}

Related

How to match any pattern by ignoring any special character in Logstash?

I am writing a grok pattern for a switch log. I can't figure out how to ignore the "%" character in the log entry %DAEMON-3-SYSTEM_MSG
The complete log is:
Jul 16 21:06:50 %DAEMON-3-SYSTEM_MSG: Un-parsable frequency in /mnt/pss/ntp.drift
This can be done using the plain % character. A not very efficient example:
%%{NOTSPACE:switch_source}: %{GREEDYDATA:switch_message}
Which will set:
{
"switch_source": [
[
"DAEMON-3-SYSTEM_MSG"
]
],
"switch_message": [
[
"Un-parsable frequency in /mnt/pss/ntp.drift"
]
]
}
The percent-sign is not a special character in Oniguruma regex, so you don't have to escape it. When used with %{ and then } later, that's when you run into problems. But your log-snippet doesn't seem to use that pattern.
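The same idea can be tried in Ruby, whose regex syntax also treats a bare % as a literal. In this sketch NOTSPACE and GREEDYDATA are written out by hand as \S+ and .*, which is how grok-patterns defines them; the group names simply mirror the grok field names above.

```ruby
log = "Jul 16 21:06:50 %DAEMON-3-SYSTEM_MSG: Un-parsable frequency in /mnt/pss/ntp.drift"

# A bare % is an ordinary literal character, no escaping needed
pattern = /%(?<switch_source>\S+): (?<switch_message>.*)/

m = log.match(pattern)
puts m[:switch_source]  # prints DAEMON-3-SYSTEM_MSG
puts m[:switch_message] # prints Un-parsable frequency in /mnt/pss/ntp.drift
```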

Regex: match something except within arbitrary delimiters

My string:
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
Using Ruby's regex flavor, I would like to do something like:
a.gsub /regex_pattern/, '_'
And obtain:
"Please_match_spaces_here_<but not here>._Again_match_here_<while ignoring these>"
This should do it:
result = subject.gsub(/\s+(?![^<>]*>)/, '_')
This regex assumes there's nothing tricky like escaped angle brackets. Also be aware that \s matches newlines, TABs and other whitespace characters as well as spaces. That's probably what you want, but you have the option of matching only spaces:
/ +(?![^<>]*>)/
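A quick check of this answer on the string from the question (using the variable a instead of subject). The negative lookahead (?![^<>]*>) rejects any whitespace run that can reach a > without first crossing a <, which is exactly the whitespace inside a <...> pair.

```ruby
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"

# Replace whitespace runs unless a ">" follows before any "<" (i.e. we are inside <...>)
any_whitespace = a.gsub(/\s+(?![^<>]*>)/, '_')

# Variant that only touches literal spaces, not tabs or newlines
spaces_only = a.gsub(/ +(?![^<>]*>)/, '_')

puts any_whitespace
# prints Please_match_spaces_here_<but not here>._Again_match_here_<while ignoring these>
```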
I think this works:
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
pattern = /<(?:(?!<).)*>/
a.gsub(pattern, '')
# => "Please match spaces here . Again match here "
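The bracket pattern from this answer removes the bracketed text rather than protecting it. A different technique, sketched here under the same assumption of no nested or escaped brackets, keeps it: match either a whole <...> chunk or a run of whitespace, and let a block decide what to substitute.

```ruby
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"

# Alternation: bracketed chunks are matched whole and returned unchanged,
# so their inner spaces never reach the \s+ branch.
result = a.gsub(/<[^>]*>|\s+/) { |m| m.start_with?("<") ? m : "_" }

puts result
# prints Please_match_spaces_here_<but not here>._Again_match_here_<while ignoring these>
```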

Logstash using gsub

I would like to use the gsub filter or a ruby code filter to do the following in logstash.
I have a field which is dynamically named eg. P12IP3, P12IP2, P13IP1 etc.
I would like to remove all white space characters in these fields.
However, the following does not seem to work
gsub => ["/(.*)IP(.*)/"," ",""]
I've tried some variations using ruby code filter as well, but could not get it to work. Can someone suggest a solution?
Sample config of what I have tried:
grok {
patterns_dir => "/etc/logstash/patterns"
match => [ "message", "iLO %{BASE16NUM:P16F1} %{HLA_TS_1:ts1} / %{BASE16NUM:P16F2}
%{BASE16NUM:P16F3} :
%{BASE16NUM:P16F4} %{BASE16NUM:P16F5} Browser login : OA
Administrator1 \- \ %{IP_HLA:P16IP1} \( DNS name not found \) \." ]
add_tag => [ "pattern", "16" ]
tag_on_failure => []
}
grok {
patterns_dir => "/etc/logstash/patterns"
match => [ "message", "iLO %{BASE16NUM:P17F1} %{HLA_TS_1:ts1} / %{BASE16NUM:P17F2} %{BASE16NUM:P17F3} :
%{BASE16NUM:P17F4} %{BASE16NUM:P17F5} Browser login : OA
Administrator3 \- \ %{IP_HLA:P17IP1} \( DNS name not found \) \." ]
add_tag => [ "pattern", "17" ]
tag_on_failure => []
}
mutate{
gsub => [
"/(.*)IP(.*)/"," ",""
]
}
Above you can see that there are two IP fields, P16IP1 and P17IP1. I want the gsub mutate filter to process both of them so that all whitespace is removed from the field values.
Here is a sample input; it matches the first pattern (16):
iLO 2 2012 / 31 / 14 13 : 24 : 01 / 2011 12 : 52 1 Browser login : OA Administrator1 - 15 . 33 . 64 . 119 ( DNS name not found ) .
Here the output for the IP field is currently "P16IP1":"15 . 33 . 64 . 119", what I would like is for the output to be "P16IP1":"15.33.64.119"
Removing all whitespace from a string is easy:
"a \t\n\r\fb".gsub(/\s+/, '') # => "ab"
/\s+/ is the regular expression way of saying "one or more whitespace characters". The definition of \s itself is:
/\s/ - A whitespace character: /[ \t\r\n\f\v]/
If you're trying to match lines containing variants on
P12IP2
P01IP1
P99IP9
then you can use a pattern like:
/P\d{2}IP\d/
http://rubular.com/r/MCnY87DkZv
From there you can capture the leading/trailing characters:
/^(.+)P\d{2}IP\d(.+)/
http://rubular.com/r/HmekyYzXcU
If the number between P and IP can be shorter or longer than two digits, adjust the {2} quantifier accordingly (for example {1,3}, or + for any length). See the Regexp documentation for how quantifiers work.
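Putting the two halves of this answer together, here is a minimal Ruby sketch (outside Logstash) that selects fields by name with a P<digits>IP<digits> pattern and strips whitespace from their values. A plain Hash stands in for the event, with values taken from the sample input above; inside a Logstash ruby filter (recent versions) fields would be read and written through event.get and event.set instead. Note that mutate's gsub takes [field name, pattern, replacement] triples with literal field names, so a regex in the field-name slot just names a field that does not exist; listing P16IP1 and P17IP1 explicitly is another option.

```ruby
# Stand-in for a parsed Logstash event; values come from the sample input above
event = {
  "P16F1"  => "2",
  "P16IP1" => "15 . 33 . 64 . 119"
}

# Select fields by *name* (P, digits, IP, digits), then strip whitespace from the value
event.keys.grep(/\AP\d+IP\d+\z/).each do |name|
  event[name] = event[name].gsub(/\s+/, "")
end

puts event["P16IP1"] # prints 15.33.64.119
```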

Regex - matching leading and trailing spaces, spaces between opening and closing brackets and words, but not between words

I apologize if this question has already been answered, but I have searched and cannot find the answer. I am trying to write a regex that matches all leading and trailing spaces and the spaces between each bracket and the adjacent word, but does not match the spaces between words. The following are examples of the data I'm parsing:
[Header]
[ SomeSpace]
[ Some1 More Space 15 ]
First line: no leading or trailing space, no space inside the brackets, and only one word.
Second line: some leading and trailing space, and space after the opening bracket.
Third line: some leading space, space between the words and digits, space after the opening bracket and before the closing bracket, and trailing space.
The closest single regex I've come up with is:
/[^\[\]a-zA-Z\d]/
But I cannot seem to unmatch only the spaces between the words and digits...
The ruby code I currently am using as a workaround is:
line.gsub!(/^\s*/, "")
line.gsub!(/\[/, "")
line.gsub!(/\]/, "")
s = line.gsub!(/^\s*|\s*$/, "")
s = "[" + s + "]\n"
Obviously, not very pretty...
Any help to streamline this into an elegant gsub line is greatly appreciated.
Thanks!
Lee
If I understand your question correctly, you are trying to turn this text
[Header]
[ SomeSpace]
[ Some1 More Space 15 ]
into this:
[Header]
[SomeSpace]
[Some1 More Space 15]
This regex will do the job. The key addition here is the non-greedy ? quantifier on the inner character class. This makes the character class match as little as possible and leaves the trailing space within the brackets (if there is any) for the following greedy \s*.
s/^\s*\[\s*([\w\s]*?)\s*\]\s*$/[$1]/g
Ruby:
line.gsub! /^\s*\[\s*([\w\s]*?)\s*\]\s*$/, '[\\1]'
sed (ugly and probably not very performant; I'm no sed master!):
sed -Ee "s/^ *\[([a-zA-Z0-9 ]+)\] *$/\\1/g" -e "s/^ */[/g" -e "s/ *$/]/g" infile
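A quick run of the Ruby one-liner over a few lines. The exact spacing of the question's examples did not survive, so the inputs below are hypothetical reconstructions with extra spaces added; non-bang gsub is used so each result comes back as a new string. Spaces between words inside the brackets are left as they are.

```ruby
# Hypothetical inputs: the question's examples with assumed extra spaces
lines = ["[Header]", "  [  SomeSpace]", " [ Some1 More Space 15 ]  "]

cleaned = lines.map do |line|
  # Lazy ([\w\s]*?) leaves any space before "]" for the following \s*
  line.gsub(/^\s*\[\s*([\w\s]*?)\s*\]\s*$/, '[\1]')
end

puts cleaned
# prints:
# [Header]
# [SomeSpace]
# [Some1 More Space 15]
```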
Regex to match all extra spaces for replacement:
/(?<=^|\[)\s+|\s+(?=$|\])|(?<=\s)\s+/
The first part will match all leading spaces at the start and inside bracket.
The second part will match all trailing spaces at the end and inside bracket.
The last part matches any whitespace that follows another whitespace character, i.e. the extra spaces in a run of 2 or more, so that only one space remains.
Just replace the matches with empty string.
Test data
[Header]
[ SomeSpace]
[ Some1 More Space 15 ]
[ Super Space ]
[ ]
[ ]
[]
[a]
[a ]
[ a]
[ a ]
[a a]
[a a a a a b] [ dasdasd dsd ]
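Applied line by line in Ruby (so that \s cannot reach across a newline), the replacement looks like this; the inputs are hypothetical, with extra spaces added. The (?<=^|\[) part works because Onigmo allows top-level alternatives of different lengths inside a lookbehind.

```ruby
EXTRA_SPACE = /(?<=^|\[)\s+|\s+(?=$|\])|(?<=\s)\s+/

# Hypothetical inputs with extra spaces
inputs = ["  [  SomeSpace]", " [ Some1   More Space 15 ]  ", "[ ]", "[a a]"]

# Replace every match with an empty string
cleaned = inputs.map { |line| line.gsub(EXTRA_SPACE, "") }

puts cleaned
```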
I don't know about elegant, but the simplest is probably:
line.gsub /^\s*(\[)\s*|\s*(\])\s*$/, '\\1\\2'
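The same kind of hypothetical inputs run through this one-liner. When the first alternative matches, group 2 is unset and \2 expands to an empty string (and vice versa), so each side keeps just its bracket. Like the lazy-quantifier answer, it does not collapse repeated spaces between words.

```ruby
# Hypothetical inputs with extra spaces
inputs = ["[Header]", "  [  SomeSpace]", " [ Some1 More Space 15 ]  "]

# Either "[" with surrounding spaces at line start, or "]" with surrounding spaces at line end
cleaned = inputs.map { |line| line.gsub(/^\s*(\[)\s*|\s*(\])\s*$/, '\1\2') }

puts cleaned
```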

What is the Ruby regex to match a string with at least one period and no spaces?

What is the regex to match a string with at least one period and no spaces?
You can use this:
/^\S*\.\S*$/
It works like this:
^ <-- Starts with
\S <-- Any character except whitespace (notice the upper case; same as [^ \t\r\n\f\v])
* <-- Repeated but not mandatory
\. <-- A period
\S <-- Any character but white spaces
* <-- Repeated but not mandatory
$ <-- Ends here
You can replace \S by [^ ] to work strictly with spaces (not with tabs etc.)
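In Ruby, ^ and $ match at the start and end of each line, not only of the whole string, so a multi-line string passes /^\S*\.\S*$/ as long as one of its lines has a period and no whitespace. If the whole string must qualify, \A and \z are the string anchors. A small comparison:

```ruby
line_anchored   = /^\S*\.\S*$/
string_anchored = /\A\S*\.\S*\z/

samples = ["test.txt", "test with spaces.txt", "has spaces\nfile.txt"]

samples.each do |s|
  # The last sample matches per line ("file.txt") but not as a whole string
  puts "#{s.inspect}: line=#{line_anchored.match?(s)} string=#{string_anchored.match?(s)}"
end
```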
Something like
^[^ ]*\.[^ ]*$
(match any non-spaces, then a period, then some more non-spaces)
No need for a regular expression. Keep it simple:
>> s="test.txt"
=> "test.txt"
>> s["."] and s.count(" ")<1
=> true
>> s="test with spaces.txt"
=> "test with spaces.txt"
>> s["."] and s.count(" ")<1
=> false
Try this:
/^\S*\.\S*$/
