I need to grok a pipe-delimited string of values in a log line; for example:
|NAME=keith|DAY=wednesday|TIME=09:27:423227|DATE=08/06/2019|amount=68.23|currency=USD|etc...
What is the easiest way to do this?
Is there any form of a grok split?
Thanks,
Keith
Your scenario is the perfect use case for Logstash's kv (key-value) filter!
The basic idea behind this filter plugin is to extract key-value pairs that occur in a repetitive pattern like yours.
In this case the field_split character would be the pipe ( | ).
To distinguish keys from values you would set the value_split character to the equal sign ( = ).
Here's a sample but untested filter configuration:
filter {
  kv {
    source      => "your_field_name"
    target      => "kv"
    field_split => "\|"
    value_split => "="
  }
}
Notice how the pipe character in the field_split setting is escaped: since the pipe is a special character in regular expressions, it has to be escaped!
This filter will extract all key-value pairs found in your source field and store them under the target field named "kv" (the name is arbitrary), from which you can access the individual fields.
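For intuition, here's a minimal Ruby sketch of what the filter does conceptually with your sample line (an illustration only, not the plugin's actual implementation):
line = "|NAME=keith|DAY=wednesday|TIME=09:27:423227|DATE=08/06/2019|amount=68.23|currency=USD"

# Split on the pipe, drop the empty leading segment, then split each pair on "=".
kv = line.split("|").reject(&:empty?).map { |pair| pair.split("=", 2) }.to_h
# => {"NAME"=>"keith", "DAY"=>"wednesday", "TIME"=>"09:27:423227",
#     "DATE"=>"08/06/2019", "amount"=>"68.23", "currency"=>"USD"}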
You might want to take a look at the other possible settings of the kv filter to satisfy your needs.
I hope I could help you! :-)
With this example data
GigabitEthernet102/0/0/28 = TLU-46356_CAR_ONE_RIVERO_AUTO_CENTER_PRINCIPAL Traffic (SNMP Traffic) Down (The interface is disconnected: ifOperStatus=down (2) (code: PE058))
I'm trying to extract a substring whenever the "TLU" pattern is found, and create a new field like "TLU-46356".
I'm using a grok pattern like this:
if [logMessage] =~ /TLU-/ {
  grok {
    match => { "logMessage" => 'TLU=(?<TLU>[0-9a-fx]{8})' }
  }
}
But it doesn't work, and the result is a "_grokparsefailure" tag.
Any ideas, please?
I just tested your example in the Grok Debugger in Kibana, and this works for me:
(?<TLU>[0-9a-fx]{5})
TLU= at the beginning doesn't match because your message contains TLU-, not TLU=; but grok picks up named capture groups anyway, so even without the prefix you'll get a field called TLU.
The other part of the problem is that you're trying to match 8 characters (0 to 9, a to f, or x), but in your example the number after TLU- is only 5 characters long. So either the logs have a variable number of characters, which you didn't mention, in which case you should use something like {n,m} to define the length of the string you want to capture; or you can just use * and terminate it with _ like this:
(?<TLU>[0-9a-fx]*)_
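Since grok named captures use the same Oniguruma regex syntax as Ruby, you can sanity-check patterns outside Logstash too; a minimal Ruby sketch (moving the TLU- prefix inside the group is an assumption based on the "TLU-46356" field you described):
msg = "GigabitEthernet102/0/0/28 = TLU-46356_CAR_ONE_RIVERO_AUTO_CENTER_PRINCIPAL Traffic (SNMP Traffic) Down"

# The pattern above captures just the number after TLU-.
msg.match(/(?<TLU>[0-9a-fx]*)_/)[:TLU]      # => "46356"

# Moving the prefix inside the group yields the full token.
msg.match(/(?<TLU>TLU-[0-9a-fx]*)_/)[:TLU]  # => "TLU-46356"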
If there's anything I'm missing, please update the question with additional data and let me know. Good luck!
I'm having difficulty writing grok patterns. Please help.
I have a log line in which the GetIndicatorsByAnalysisProcessIDServlet service method is called, and I want to extract only GetIndicatorsByAnalysisProcess from it; the GetIndicatorsByAnalysisProcess part will not always be the same.
The challenge here, I felt, is truncating the string from the backward direction.
I tried the following:
grok {
  match => ["destinationid", "(?<fieldname>discard.{7})"]
}
but that only counts a fixed number of characters from the start.
If I understand you correctly, you need to have the first word in a variable.
This is achievable via
(?<fieldname>[^\s]*)\s*
with sample output from it
{
"fieldname": [
[
"GetIndicatorsByAnalysisProcessIDServlet"
]
]
}
In case you have various beginnings with optional spaces but exactly the same ending of the sentence, the effective regexp will be different.
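And for the original goal of truncating from the backward direction, you can anchor on the fixed ending instead; a minimal Ruby sketch (grok uses the same Oniguruma regex engine, and treating IDServlet as a fixed suffix is an assumption based on your example):
line = "GetIndicatorsByAnalysisProcessIDServlet service method is called"

# Lazily capture word characters up to the fixed "IDServlet" suffix.
line.match(/(?<fieldname>\w+?)IDServlet\b/)[:fieldname]
# => "GetIndicatorsByAnalysisProcess"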
I'd like to match this content:
[default]
aws_secret_access_key = 69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I
aws_access_key_id = K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF
against this union:
aws_configuration_file_regex = Regexp.union [
/aws_access_key_id\s*=\s*(?<aws_access_key_id>.+)/,
/aws_secret_access_key\s*=\s*(?<aws_secret_access_key>.+)/
]
but it doesn't work as expected, as only the first match is present in the result:
=> #<MatchData
"aws_secret_access_key = 69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I"
aws_secret_access_key:"69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I"
aws_access_key_id:nil>
How to fix that? I'd like to keep the code as short as possible, i.e. no function defines should be present.
The problem is that Regexp.union effectively means match one or the other. It might be best to first match one, then the other.
If you still want to match both in one go (as I see you have differently named groups), you have to concatenate them instead and add the multiline flag:
# note the .*? for concatenation and the //m flag
r = /aws_secret_access_key\s*=\s*(?<aws_secret_access_key>.+).*?aws_access_key_id\s*=\s*(?<aws_access_key_id>.+)/m
foo.match(r) # =>
# #<MatchData
# "aws_secret_access_key = 69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I\n aws_access_key_id = K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF"
# aws_secret_access_key:"69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I\n "
# aws_access_key_id:"K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF">
However, note that this is not the most comprehensive code in the world.
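Alternatively, matching each key separately keeps the code short and avoids the ordering dependency altogether; a minimal sketch, assuming foo holds the file content as above:
keys = %i[aws_access_key_id aws_secret_access_key]
creds = keys.to_h { |key| [key, foo[/#{key}\s*=\s*(.+)/, 1]] }
# => {:aws_access_key_id=>"K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF",
#     :aws_secret_access_key=>"69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I"}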
You want to do something like this:
((aws_access_key_id\s*=\s*(?<aws_access_key_id>.+))|(aws_secret_access_key\s*=\s*(?<aws_secret_access_key>.+)))*
Seems to work on http://rubular.com/
This will collect all the keys and produce a bunch of matches, so you have to sort through them. I highly recommend trying it on rubular.com to see what is happening, and even to tune up the pattern.
I am trying to search names in Elasticsearch.
Consider a name such as kanal-kannan.
Normally we search a name with * na; I tried to search like this:
"/index/party_details/_search?size=200&from=0&q=(first_name_v:kanal-*)"
This results in zero records.
Unless the hyphen character has been dealt with specifically by the analyzer, the two words in your example, kanal and kannan, will be indexed separately, because any non-alpha character is treated by default as a word delimiter.
Have a look at the documentation for Word Delimiter Token Filter and specifically at the type_table parameter.
Here's an example I used to ensure that an email field was correctly indexed:
ft.custom_delimiter = {
  "type": "word_delimiter",
  "split_on_numerics": false,
  "type_table": ["# => ALPHANUM", ". => ALPHANUM", "- => ALPHA", "_ => ALPHANUM"]
};
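For completeness, a hedged sketch of how such a filter might be referenced from a custom analyzer in the index settings, in the same style as above (the analyzer name my_analyzer and the whitespace tokenizer choice are illustrative, not from the original answer):
ft.my_analyzer = {
  "type": "custom",
  "tokenizer": "whitespace",
  "filter": ["lowercase", "custom_delimiter"]
};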
- is a special character that needs to be escaped to be searched literally: \-
If you use the q parameter (that is, the query_string query), the rules of the Lucene Queryparser Syntax apply.
Depending on your analyzer chain, you might not have any - characters in your index; replacing them with a space in your query would work in those cases too.
@l4rd's answer should work properly (I have the same setup). Another option is to map the field with the keyword analyzer to prevent tokenizing at all. Note that the keyword tokenizer wouldn't lowercase anything, so use a custom analyzer with the keyword tokenizer in that case.
I need to read a text file like this,
regular: 12/04/2013, 13/04/2013
extract 'regular' and save it in a variable, and save all the dates in an array. How can I do this?
Based on what you say you tried, would the following do what you want?
data = line.split(/: */)               # => ["regular", "12/04/2013, 13/04/2013"]
customer = data[0]                     # => "regular"
dates_array = data[1].split(/, */)     # => ["12/04/2013", "13/04/2013"]
I used * to match (and eliminate) multiple blanks. I'm assuming here that you don't want the blanks, comma, or the colon (:) separator included in your results. If that's not correct, adjust the regular expressions accordingly.
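Putting it together for a whole file, a minimal sketch (the filename data.txt is an assumption):
# Read each line, split off the label, then split the remainder into dates.
File.foreach("data.txt") do |line|
  customer, rest = line.chomp.split(/: */, 2)
  dates_array = rest.split(/, */)
  puts "#{customer}: #{dates_array.inspect}"  # => regular: ["12/04/2013", "13/04/2013"]
end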