Should I always try to avoid grok parse errors? - elasticsearch

I have a field in my logs that I match for as an IP address like this:
grok {
  match => [ "field1", "(?:(?<field2>%{IP})|(%{IP}),\+)" ]
}
Sometimes field1 is blank, so field2 never gets created in the doc.
That's fine, because I don't want this field to be added unless it is a valid IP address.
But when this happens the doc is tagged with _grokparsefailure. Is this bad? Does it mean I'm doing something wrong and should avoid _grokparsefailure?

_grokparsefailure is just a breadcrumb for the maintainer, signifying that some grok rules may need updating. Performance-wise, it means nothing for a rule as short as that one. If you really don't care about misses on a grok statement, you can tell it to tag them some other way (tag_on_failure => ["something"]).
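For example, a minimal sketch of the same filter with the failure tag suppressed (tag_on_failure is a standard grok option; an empty list means no tag is added on a miss):
grok {
  match => [ "field1", "(?:(?<field2>%{IP})|(%{IP}),\+)" ]
  # No failure tag at all on a miss; use something like ["ip_not_present"]
  # instead if you still want a searchable marker.
  tag_on_failure => []
}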

Related

Split Logstash/grok pattern that has international characters

Running into this issue.
I need to split up URLs to get values from them. This works great when it's all English.
URL = /78965asdvc34/Test/testBasins
Pattern = /%{WORD:org}/(?i)test/%{WORD:name}
I get this in the grok debugger.
{"org":[["78965asdvc34"]],"name":[["testBasins"]]}
If I have international characters, grok does not read them with the pattern above.
/78965asdvc34/Test/浸水Basins
Any thoughts how to get this to work? This value can be in any language in the logs, and hopefully there is a way to get it out.
Have you already tried
/%{WORD:org}/(?i)test/%{GREEDYDATA:name}
From hurb.
Thanks Hurb. GREEDYDATA worked.
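Worth spelling out why this works: the stock %{WORD} pattern is \b\w+\b, and \w normally only matches ASCII word characters, so it will not match 浸水, while %{GREEDYDATA} is .* and accepts anything up to the end of the field. A minimal Logstash sketch, assuming the path sits in a field called url_path (a made-up field name for illustration):
filter {
  grok {
    # GREEDYDATA (.*) accepts the non-ASCII characters that WORD (\b\w+\b) rejects
    match => [ "url_path", "/%{WORD:org}/(?i)test/%{GREEDYDATA:name}" ]
  }
}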

ElasticSearch Nest AutoComplete based on words split by whitespace

I have AutoComplete working with ElasticSearch (Nest) and it's fine when the user types in the letters from the beginning of the phrase, but I would like to be able to use a specialized type of auto complete, if possible, that caters for words in a sentence.
To clarify further, my requirement is to be able to "auto complete" like such:
Imagine the full indexed string is "this is some title". When the user types in "th", this comes back as a suggestion with my current code.
I would also like the same thing to be returned if the user types in "som" or "title" or any letters that form a word (word being classified as a string between two spaces or the start/end of the string).
The code I have is:
var result = _client.Search<ContentIndexable>(
    body => body
        .Index(indexName)
        .SuggestCompletion("content-suggest" + Guid.NewGuid(),
            descriptor =>
                descriptor
                    .OnField(t => t.Title.Suffix("completion"))
                    .Text(searchTerm)
                    .Size(size)));
And I would like to see if it would be possible to write something that matches my requirement using SuggestCompletion (and not by doing a match query).
Many thanks,
Update:
This question already has an answer here, but I'm leaving this one up since the title/description is probably a little easier for search engines to find.
The correct solution to this problem can be found here:
Elasticsearch NEST client creating multi-field fields with completion
@Kha, I think it's better to use the NGram Tokenizer.
So you should use this tokenizer when you create the mapping.
If you want more info, or maybe an example, write back.
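For reference, a rough sketch of what such a mapping could look like in plain Elasticsearch JSON (index name, type name, field names and gram sizes are all placeholders, not taken from the question), indexing the title through an nGram analyzer so fragments like "som" or "title" match mid-phrase:
PUT /content-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "title_ngram": { "type": "nGram", "min_gram": 2, "max_gram": 10, "token_chars": ["letter", "digit"] }
      },
      "analyzer": {
        "title_ngram_analyzer": { "type": "custom", "tokenizer": "title_ngram", "filter": ["lowercase"] }
      }
    }
  },
  "mappings": {
    "content": {
      "properties": {
        "title": { "type": "string", "analyzer": "title_ngram_analyzer", "search_analyzer": "standard" }
      }
    }
  }
}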

Logstash grok pattern field not appearing in Kibana

I have recently been investigating ELK as a potential logging/monitoring solution. I have the stack set up and working, and I am starting to filter logs via grok.
Is it possible to have a specific part of your grok pattern appear as a field in Kibana?
For example, take the following pattern:
SAMSLOG %{HOUR}:%{MINUTE}:%{SECOND} \[%{USERNAME:user}\] - %{JAVALOGMESSAGE}
I was hoping (and from what I have read) that "user" would become an available field in Kibana that I am able to search/filter the results on. Have I completely misunderstood, or am I missing a vital link in the chain?
Full Grok pattern:
multiline {
  patterns_dir => "/home/samuel/logstash/grok.patterns"
  pattern => "(^%{SAMSLOG})"
  negate => true
  what => "previous"
}
Thank you,
Sam
Yes, the whole "magic" of logstash is to take the unstructured data and make structured fields from it. So, your basic premise is correct.
What you're missing is that multiline{} is a filter used to combine several input lines into one event; that's basically all it does. The "pattern" option there is only used to identify where a new event should start.
To make fields out of an event, you would need to use the grok{} filter.
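A minimal sketch of that second step, reusing the SAMSLOG pattern and patterns directory from the question:
filter {
  grok {
    patterns_dir => "/home/samuel/logstash/grok.patterns"
    # SAMSLOG contains %{USERNAME:user}, so "user" becomes a field on the event
    # and shows up in Kibana once the event is indexed.
    match => [ "message", "^%{SAMSLOG}" ]
  }
}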

Pattern failure with grok due a longer integer in a column

I have used the grok debugger to get the top format working, and it is being seen fine by Elasticsearch. Eventually, when a log line like the ones below hits, it shoots out a "_grokparsefailure" tag, due (I'm assuming) to the extra space before each integer. Is there a pattern I can use to accept anything, no matter how long or short each column is?
0000003B 2015-03-14 07:46:14.618 16117 16121
00000DA1 2015-03-14 07:45:54.609 6382 6382
It's also possible to use the built-in logstash pattern %{SPACE} to match any number of whitespace characters.
%{INT:num1}%{SPACE}%{INT:num2}
One or more spaces between two integers:
%{INT} +%{INT}
I ended up doing a custom pattern, since I knew my values were between 4 and 5 characters, and then used patterns_dir => "./patterns" in my conf file.
_ID [0-9A-F]{4,5}
_ID2 [0-9A-F]{4,5}
UPDATE: my solution did not work, because the number can be anywhere from 3 to 6 characters. The easier solution was provided above. Marked as answer.
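For completeness, a hedged sketch of a full pattern for the sample lines above, assuming the first column is hex and the timestamp is ISO8601 (field names are illustrative):
filter {
  grok {
    # BASE16NUM matches the hex ID regardless of length; %{SPACE} absorbs the
    # variable-width padding in front of each integer column.
    match => [ "message", "%{BASE16NUM:id}%{SPACE}%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{INT:num1}%{SPACE}%{INT:num2}" ]
  }
}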

Parsing "request" field from AWS ELB logs

I've enabled the access logs for my ELBs on AWS, and we're sending them to a setup of Logstash + Elasticsearch + Kibana.
I'm using Logstash's grok filter to parse the logs into separate fields that I can view and sort in Kibana, and I'm running into difficulty parsing the last field that Amazon gives in those logs, which is the "request" field.
It actually contains 3 parts: the HTTP method, the URL itself and the HTTP version.
How can I separate those 3 into independent fields that I could use?
Thanks,
Benyamin
What about something like this, to replace the last element of your grok filter?
\"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\"
I've never actually administered logstash before, but I pieced this together by looking at the source code for the built-in filters, some of which are evidently built on top of other built-in filters.
https://github.com/elasticsearch/logstash/blob/v1.4.1/patterns/grok-patterns
This pattern should extract three elements: "verb" would capture "GET", "httpversion" would capture the numeric HTTP version, and "request" would capture the rest.
I admit I'm also guessing about the backslashes to escape the double quote literals that are in the message, but that seems like the logical way to include a literal quote to match the ones that ELB puts in the logs. Note that the final double-quote I've shown isn't the closing quote of the filter string expression. That quote would go immediately after the above, since this matches the last thing on each line.
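A sketch of how that fragment could sit in a grok filter, assuming the raw ELB line is in the default message field; only the trailing request portion is matched here, and the pattern is wrapped in single quotes so the literal double quotes don't need escaping, which sidesteps the backslash guesswork mentioned above:
filter {
  grok {
    # GREEDYDATA skips everything before the final quoted "METHOD URL HTTP/x.x" segment.
    match => [ "message", '%{GREEDYDATA}"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}"' ]
  }
}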
