Split Logstash/grok pattern that has international characters - elasticsearch

I'm running into the following issue.
I need to split up URLs to extract values from them. This works fine when everything is in English.
URL = /78965asdvc34/Test/testBasins
Pattern = /%{WORD:org}/(?i)test/%{WORD:name}
I get this in the grok debugger.
{"org":[["78965asdvc34"]],"name":[["testBasins"]]}
If the URL contains international characters, grok does not match them with the pattern above:
/78965asdvc34/Test/浸水Basins
Any thoughts on how to get this to work? The value can be in any language in the logs, and I'm hoping there is a way to extract it.

Have you already tried
/%{WORD:org}/(?i)test/%{GREEDYDATA:name}
From hurb.
Thanks Hurb. GREEDYDATA worked.
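To see why WORD gives up while GREEDYDATA works, here is a minimal sketch in plain Ruby rather than Logstash. It assumes %{WORD} behaves roughly like \w+ and %{GREEDYDATA} like .*, and that \w is ASCII-only, as it is in Ruby; behaviour inside Logstash may differ. The constant names, the name_from helper and the Unicode-aware variant are hypothetical additions, not part of the answer above.

```ruby
# Plain Ruby stand-in for the grok patterns; Logstash itself is not involved.
ASCII_URL = "/78965asdvc34/Test/testBasins"
INTL_URL  = "/78965asdvc34/Test/浸水Basins"

# Approximation of %{WORD}: \w+ (ASCII letters, digits and underscore in Ruby).
WORD_STYLE = %r{/(?<org>\w+)/(?i)test/(?<name>\w+)}
# Approximation of %{GREEDYDATA}: .* (anything up to the end of the line).
GREEDY_STYLE = %r{/(?<org>\w+)/(?i)test/(?<name>.*)}
# Hypothetical narrower option: letters and digits from any script, plus "_".
UNICODE_STYLE = %r{/(?<org>\w+)/(?i)test/(?<name>[\p{L}\p{N}_]+)}

# Returns the captured name, or nil when the pattern does not match.
def name_from(pattern, url)
  m = pattern.match(url)
  m && m[:name]
end

p name_from(WORD_STYLE, ASCII_URL)    # => "testBasins"
p name_from(WORD_STYLE, INTL_URL)     # => nil, because \w does not match 浸
p name_from(GREEDY_STYLE, INTL_URL)
p name_from(UNICODE_STYLE, INTL_URL)
```

GREEDYDATA is the simplest fix, but it also swallows any further path segments. If that matters, a character class along the lines of the last pattern might be worth testing in the grok debugger.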

Related

Grok filters for the below log message

I am trying to write a grok filter pattern for this specific line:
"NOTIFICATION-Interface_IF-asdasdsf01.chn.asdfasp.com/1074_Down"
Can someone help me with this, please?
This is the regex I came up with:
[A-Z]\w+[-][A-Z][a-z]\w+[_]\w+[a-zA-Z][-]\w+[a-zA-Z0-9][.][A-Za-z]\w+[.][A-Za-z]\w+[.][a-z]\w+/[0-9]\d+[_][A-Za-z]\w+
but I want it to match "NOTIFICATION-Interface_IF-asdasdsf01.chn.asdfasp.com/1074_Down" as well as the same line without the chn in the hostname:
NOTIFICATION-Interface_IF-asdasdsf01.asdfasp.com/1074_Down
Thanks in Advance.
Which data do you want to capture?
With a pattern like this, you can capture several fields at once:
%{WORD:type}-%{WORD:interface}-%{GREEDYDATA:client}
This produces:
type = "NOTIFICATION"
interface = "Interface_IF"
client = "asdasdsf01.chn.asdfasp.com/1074_Down"
Don't hesitate to test and refine the pattern using the grok debugger. See also the native patterns that ship with Logstash.
PS: your first regexp can be simplified: ([A-Z]+)\-([A-Za-z]+)_([A-Za-z]+)\-(.*)\/([0-9]+)_([A-Za-z]+). Depending on your delimiters, it can be made simpler still.
You can test it using this link.
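As a rough illustration of the simplified regexp, here is a plain-Ruby sketch with named groups that accepts the hostname with or without the chn part. The group names (type, interface, host, number, state) and the parse_notification helper are hypothetical; the source does not say what 1074 or Down stand for.

```ruby
# Hypothetical field names; rename them to suit your own events.
# [^/]+ takes the whole hostname, however many dot-separated parts it has.
NOTIFICATION = %r{\A(?<type>[A-Z]+)-(?<interface>[A-Za-z]+_[A-Za-z]+)-(?<host>[^/]+)/(?<number>\d+)_(?<state>[A-Za-z]+)\z}

# Returns a hash of the captured fields, or nil when the line does not match.
def parse_notification(line)
  m = NOTIFICATION.match(line)
  m && m.names.to_h { |n| [n, m[n]] }
end

p parse_notification("NOTIFICATION-Interface_IF-asdasdsf01.chn.asdfasp.com/1074_Down")
p parse_notification("NOTIFICATION-Interface_IF-asdasdsf01.asdfasp.com/1074_Down")
```

Matching the hostname as "everything up to the slash" avoids spelling out each dot-separated label, which is what made the original expression fail when chn was missing.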

Pattern failure with grok due a longer integer in a column

I used the grok debugger to get the top format working, and Elasticsearch sees it fine. However, when a log line like the ones below comes in, the event is tagged with "grokparsefailure", which I assume is due to the extra spaces before each integer. Is there a pattern I can use to accept a value of any length in each column?
0000003B 2015-03-14 07:46:14.618 16117 16121
00000DA1 2015-03-14 07:45:54.609 6382 6382
It's also possible to use the built-in Logstash pattern %{SPACE} to match any number of whitespace characters:
%{INT:num1}%{SPACE}%{INT:num2}
One or more spaces between two integers:
%{INT} +%{INT}
I ended up writing custom patterns, since I expected my values to be 4-5 characters long, and loaded them with patterns_dir => "./patterns" in my conf file.
_ID [0-9A-F]{4,5}
_ID2 [0-9A-F]{4,5}
UPDATE:
My solution did not work because the number can be anywhere from 3 to 6 characters long. The easier solution was provided above, and I have marked it as the answer.
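The same idea can be sketched in plain Ruby, with \s+ standing in for "one or more spaces" between columns so that the padding width no longer matters. This is only an approximation of the grok behaviour; the id, date and time group names and the parse_line helper are hypothetical, and the sample lines in the usage are the ones from the question with extra padding added by hand.

```ruby
# \h+ is a run of hex digits; \s+ absorbs however many spaces pad each column.
LOG_LINE = /\A(?<id>\h+)\s+(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?<num1>\d+)\s+(?<num2>\d+)\z/

# Returns a hash of column values, or nil when the line does not match.
def parse_line(line)
  m = LOG_LINE.match(line.strip)
  m && m.named_captures
end

p parse_line("0000003B 2015-03-14 07:46:14.618   16117   16121")
p parse_line("00000DA1 2015-03-14 07:45:54.609    6382    6382")
```

Because the integer columns use \d+ rather than a fixed width such as {4,5}, values of 3 to 6 digits (or any other length) are accepted without further changes.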

Submatching repeating pattern

I am trying to put together a regexp in VBA, but even in Ruby I can't get it right.
The string:
<thead class="thead"><tr><th>FECHA</th><th>ITLUPVALOR</th><th>ITLUPPLAZO</th><th>ITLUP30DIAS</th><th>ITLUP60DIAS</th><th>ITLUP90DIAS</th><th>ITLUP180DIAS</th><th>ITLUP270DIAS</th><th>ITLUP360DIAS</th><th>ITLUP720DIAS</th><th>ITLUP1080DIAS</th><th>ITLUP1440DIAS</th><th>ITLUP1800DIAS</th></tr></thead>
What I have tried:
/(?:<thead class=\"thead\"><tr>)(<th>[^<]+?<\/th>)+(?:<\/tr><\/thead>)/m
The idea here (http://rubular.com/r/BpbPszctTw) was to have 9 submatches instead of one.
What am I missing?
Sorry, but a regex repeating group will only capture the last match in a group. See http://www.regular-expressions.info/captureall.html for more info.
Update: True, but if you let the regex match do the repeating for you, as in the other answer, you can get multiple matches, per http://rubular.com/r/BclU13qWYm ! In other words, accept the other answer, not this one. :-)
With this pattern you can obtain what you want:
/<thead class="thead"><tr>|\G<th>([^<]+)<\/th>/
Just remove the first result.
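Both answers can be checked with a short Ruby sketch. It uses a shortened, three-column version of the header row to keep the example readable; the constant names are hypothetical. The first expression shows that a repeated group keeps only its last iteration, and the second uses the \G pattern with String#scan, dropping the empty first result.

```ruby
# Shortened version of the header row from the question (three columns only).
HEADER_ROW = '<thead class="thead"><tr><th>FECHA</th><th>ITLUPVALOR</th><th>ITLUPPLAZO</th></tr></thead>'

# A repeated capture group only remembers the last thing it matched.
LAST_ONLY = HEADER_ROW.match(/<tr>(<th>[^<]+<\/th>)+<\/tr>/)[1]

# \G anchors each match at the end of the previous one, so scan walks the
# <th> cells one by one. The first match is the <thead><tr> prefix, whose
# capture is nil, so compact removes it.
HEADERS = HEADER_ROW.scan(/<thead class="thead"><tr>|\G<th>([^<]+)<\/th>/).flatten.compact

p LAST_ONLY  # => "<th>ITLUPPLAZO</th>"
p HEADERS    # => ["FECHA", "ITLUPVALOR", "ITLUPPLAZO"]
```

The \G anchor is what keeps the scan from picking up stray <th> cells elsewhere in a larger document: matching stops at the first position where neither alternative applies.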

How do I regex a name and an email out of the 3 major email clients in ruby?

I thought I had it figured out, but it appears that my regex still has quirks in it. Basically I would like to use the same regex pattern to match the following major email clients (Gmail, Yahoo, and regular email):
"Brian Mang" <brian.mang#email.com> -- Case1
Brian Mang (brian.mang#email.com) -- Case2
<brian.mang#email.com> -- Case3
brian.mang#email.com -- Case4
I had the following regex pattern:
/[\W"]*(?<name>.*?)[\"]*?\s*[<(](?<email>\w.*)[>)]/.match(contact)
It works for Cases 1-3, but I can't get it to pick up Case 4. I tried adjusting it, but every change I make breaks the other cases. Any idea what I need to change to make my regex handle all four cases? Thank you.
Try this
[\W"]*(?<name>.*?)[\"]*?\s*[<(]?(?<email>\S+#\S+)[>)]?
See it here on Regexr
I made the classes surrounding the address optional and changed the part that matches the email to \S+#\S+, which means at least one non-whitespace character, followed by a #, then at least one more non-whitespace character.
Since the above version also matches the closing character, you can restrict the part after the # a bit more:
[\W"]*(?<name>.*?)[\"]*?\s*[<(]?(?<email>\S+#[^\s>)]+)[>)]?
see it here on Regexr
Edit: This one works for all four:
[\W"]*(?<name>.*?)[\"]*?\s*[<(]?(?<email>\S+#[^)>]+)[>)]?
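A quick way to check the final pattern is to run it over the four cases in Ruby. The sketch below copies the last regex from the answer verbatim; the CONTACT constant and the parse_contact helper are hypothetical names. The inputs are the four sample strings without the trailing "-- CaseN" annotations, since [^)>]+ also accepts spaces and would otherwise pull that text into the email.

```ruby
# The final pattern from the answer, unchanged.
CONTACT = /[\W"]*(?<name>.*?)[\"]*?\s*[<(]?(?<email>\S+#[^)>]+)[>)]?/

# Returns { name:, email: } or nil when nothing that looks like an address is found.
def parse_contact(contact)
  m = CONTACT.match(contact)
  m && { name: m[:name], email: m[:email] }
end

[
  '"Brian Mang" <brian.mang#email.com>',
  'Brian Mang (brian.mang#email.com)',
  '<brian.mang#email.com>',
  'brian.mang#email.com'
].each { |contact| p parse_contact(contact) }
```

For Cases 3 and 4 the name comes back as an empty string rather than nil, so callers may want to treat an empty name as "not given". If trailing text can follow the address, the earlier variant with [^\s>)]+ is probably the safer choice.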

Ruby RegEx issue

I'm having a problem getting my RegEx to work with my Ruby script.
Here is what I'm trying to match:
http://my.test.website.com/{GUID}/{GUID}/
Here is the RegEx that I've tested and should be matching the string as shown above:
/([-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)/
3 capturing groups:
group 1: ([-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)
group 2: (\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)
group 3: ([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])
Ruby is giving me an error when trying to validate a match against this regex:
empty range in char class: (My RegEx goes here) (SyntaxError)
I appreciate any thoughts or suggestions on this.
You could simplify things a bit by using URI to deal with parsing the URL, \h in the regex, and scan to pull out the GUIDs:
require 'uri'
uri = URI.parse(your_url)
path = uri.path
# \h matches a single hexadecimal digit
guids = path.scan(/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/)
If you need any of the non-path components of the URL then you can easily pull them out of uri.
You might need to tighten things up a bit depending on your data or it might be sufficient to check that guids has two elements.
You have several errors in your RegEx. I am very sleepy now, so I'll just give you a hint instead of a solution:
...[\/\/[0-9a-fA-F]....
the first [ does not belong there. Also, having \/\/ inside [] is unnecessary - you only need each character once inside []. Also,
...[-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}...
is greedy, and includes a period - indeed, includes all chars (AFAICS) that can come after it, effectively swallowing the whole string (when you get rid of other bugs). Consider {2,256}? instead.
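Putting the URI suggestion into a self-contained form, a minimal sketch might look like the following. The sample URL and its GUID values are made up for illustration, and extract_guids and two_guid_path? are hypothetical helper names; the stricter check is one way to "tighten things up" if the path must consist of exactly two GUIDs.

```ruby
require 'uri'

# \h is a hex digit, so this is the usual 8-4-4-4-12 GUID layout.
GUID = /\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/
# Stricter check: the path must be exactly /GUID/GUID/.
TWO_GUID_PATH = %r{\A/#{GUID}/#{GUID}/\z}

# Pulls every GUID-shaped substring out of the URL's path.
def extract_guids(url)
  URI.parse(url).path.scan(GUID)
end

# True only when the path is made of exactly two GUID segments.
def two_guid_path?(url)
  TWO_GUID_PATH.match?(URI.parse(url).path)
end

# Made-up sample URL; substitute your own.
SAMPLE_URL = "http://my.test.website.com/3f2504e0-4f89-11d3-9a0c-0305e82c3301/9b2d7a4e-1c3f-4e5a-8b6d-0f1e2d3c4b5a/"

p extract_guids(SAMPLE_URL)
p two_guid_path?(SAMPLE_URL)  # => true
```

Letting URI handle the scheme and host means the regex only has to describe the GUIDs, which sidesteps the nested character classes that triggered the "empty range in char class" error.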

Resources