Parsing "request" field from AWS ELB logs - ruby

I've enabled the access logs for my ELBs on AWS, and we're sending them to a setup of logstash + elasticsearch + kibana.
I'm using logstash's grok filter to parse the logs into separate fields that I can view and sort in kibana, and I'm running into difficulty parsing the last field that Amazon gives in those logs, which is the "request" field.
It actually contains three parts: the HTTP method, the URL itself, and the HTTP version.
How can I separate those three into independent fields that I could use?
Thanks
Benyamin

What about something like this, to replace the last element of your grok filter?
\"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\"
I've never actually administered logstash before, but I pieced this together by looking at the source code for the built-in filters, some of which are evidently built on top of other built-in filters.
https://github.com/elasticsearch/logstash/blob/v1.4.1/patterns/grok-patterns
This pattern should extract three elements: "verb" would capture "GET", "httpversion" would capture the numeric HTTP version, and "request" would capture the rest.
I admit I'm also guessing about the backslashes to escape the double-quote literals that are in the message, but that seems like the logical way to match the literal quotes that ELB puts in the logs. Note that the final double quote I've shown isn't the closing quote of the filter string expression; that quote would go immediately after the above, since this matches the last thing on each line.
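For illustration, a minimal sketch of how that could sit in a logstash config. The %{GREEDYDATA:elb_prefix} part is just a placeholder for whatever pattern you already use for the timestamp, ELB name, ports, and timing fields that precede the quoted request:
filter {
    grok {
        # Placeholder prefix; substitute your existing pattern for the
        # ELB fields that come before the quoted request section.
        match => [ "message", "%{GREEDYDATA:elb_prefix} \"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\"" ]
    }
}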

Related

Elastalert whitelist/blacklist not working

So I have a certain query running in (Yelp's) Elastalert and I am trying to filter out logs containing one of several keywords. If I use the any rule type, I get a set of 30 matches for the query I have. When I change the rule type to whitelist:
type: whitelist
compare_key: message
ignore_null: true
whitelist: ["exclude_strings"...]
I still get the same 30 matches, even when I know the message field contains the listed strings. I've also tried changing the compare key and the strings, using strings that exactly match the entire field, and I've changed the formatting to
whitelist:
- "string"
...
and nothing has made a difference. The same thing also happens with the blacklist type.
What am I missing?
After further testing, it turns out that either of the above formats works correctly. The reason I thought it was not working is that I was looking at the hits term in the Elastalert status, when I should have been looking at the matches term. The search returned the same number of hits because the query was the same each time, but the matches count comes not from Elasticsearch but from Elastalert itself.
That is, Elastalert sends the full query to Elasticsearch, and then does the filtering on the returned data based on the whitelist terms. hits will be the same every time, but matches depends on the whitelist. If you set realert to zero, you will see that the number of alerts generated is the same as the number of matches.
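As a concrete sketch, a pared-down blacklist rule file might look like this (the name, index, and strings are placeholders; realert is set to zero as described above so every match produces an alert, and the debug alerter just logs matches):
name: filter-out-keywords
type: blacklist
index: logstash-*
compare_key: message
blacklist:
- "exclude_string_1"
- "exclude_string_2"
realert:
  minutes: 0
alert:
- debug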

elasticsearch - fulltext search for words with special/reserved characters

I am indexing documents that may contain any special/reserved characters in their fulltext body. For example:
"PDF/A is an ISO-standardized version of the Portable Document Format..."
I would like to be able to search for pdf/a without having to escape the forward slash.
How should I analyze my query string, and what type of query should I use?
The default standard analyzer will tokenize a string like that so that "PDF" and "A" become separate tokens. The "A" token may then be dropped by the stop token filter (see Standard Analyzer). So without any custom analyzers, a search will typically match any document containing just "PDF".
You can try creating your own analyzer, modeled on the standard analyzer, that includes a Mapping Char Filter. The idea would be that "PDF/A" gets transformed into something like "pdf_a" at both index and query time, so a simple match query will work just fine. But this is a very simplistic approach; you might want to consider how '/' characters are used in your content and move to slightly more complex regex filters, which are also not perfect solutions.
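A sketch of what such index settings could look like (all names here are made up; the char filter rewrites "/" to "_" before tokenization, so "PDF/A" is indexed and searched as the single token "pdf_a"):
{
  "settings": {
    "analysis": {
      "char_filter": {
        "slash_to_underscore": {
          "type": "mapping",
          "mappings": [ "/ => _" ]
        }
      },
      "analyzer": {
        "slash_analyzer": {
          "type": "custom",
          "char_filter": [ "slash_to_underscore" ],
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}
You would then assign this analyzer to the fulltext field in your mapping so that it is applied at both index and query time.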
Sorry, I completely missed your point about having to escape the character. Can you elaborate on your use case if this turns out to not be helpful at all?
To support queries containing reserved characters, I now use the Simple Query String Query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html).
Since it does not use a full query parser, it is a bit limited (e.g. no field queries like id:5), but it solves the purpose.
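For reference, such a query can be as simple as the following (the field name "body" is only an example):
{
  "query": {
    "simple_query_string": {
      "query": "pdf/a",
      "fields": [ "body" ]
    }
  }
}
The forward slash is not a special character in the simple query string syntax, so it needs no escaping here.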

How to extract jmeter response value

I did read a few responses, but my regular expression extractor is not working.
Mine is a simple case where this is my response:
token.id=AQIC5wM2LY4Sfcz4cOT2RrremxWJmM3llZmPl6k0bP_r5D4.AAJTSQACMDUAAlNLABQtNDI1OTg4NzgxODg5MDM1ODU2NQACUzEAAjI3
I am trying to grab the value using this expression:
token.id="(.*?)"
which is not saving the value into the variable I assigned. My next request, which tries to use the value, fails since nothing is captured.
Can someone let me know what exactly is missing? Thanks.
There are a few problems with your regular expression:
You need to escape the dot between "token" and "id" with a backslash, as it is a special character. See the Literal Characters article for more information.
You don't need the quotation marks, as your response doesn't contain them (does it?)
So your regular expression needs to be amended to token\.id=(.*) (however, I would rather go for something like token\.id=(\w.+)).
You can use the View Results Tree listener in "RegExp Tester" mode to test your regular expressions directly against the response without having to re-run the request.
See the Regular Expressions JMeter documentation chapter and the How to Debug your Apache JMeter Script guide for extended information on the above approaches.
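Putting it together, the Regular Expression Extractor fields might be filled in roughly like this (the reference name and default value are your choice):
Reference Name: token
Regular Expression: token\.id=(.*)
Template: $1$
Match No.: 1
Default Value: TOKEN_NOT_FOUND
The next request would then reference the extracted value as ${token}; a distinct default value makes a failed extraction visible instead of silently passing an empty variable.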

Logstash grok pattern field not appearing in Kibana

I have recently been investigating ELK as a potential logging/monitoring solution. I have the stack set up and working, and I am starting to filter logs via grok.
Is it possible to have a specific part of your grok pattern appear as a field in Kibana?
For example, take the following pattern:
SAMSLOG %{HOUR}:%{MINUTE}:%{SECOND} \[%{USERNAME:user}\] - %{JAVALOGMESSAGE}
I was hoping (and from what I have read, expecting) that "user" would become an available field in Kibana that I could search/filter the results on. Have I completely misunderstood, or am I missing a vital link in the chain?
Full Grok pattern:
multiline {
    patterns_dir => "/home/samuel/logstash/grok.patterns"
    pattern => "(^%{SAMSLOG})"
    negate => true
    what => "previous"
}
Thank you,
Sam
Yes, the whole "magic" of logstash is to take the unstructured data and make structured fields from it. So, your basic premise is correct.
What you're missing is that multiline{} is a filter that is used to combine several input lines into one event; that's basically all it does. The "pattern" field there is used to identify when a new line should be started.
To make fields out of an event, you would need to use the grok{} filter.
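In other words, keep the multiline{} filter to stitch the lines of one event together, and add a grok{} filter after it to actually extract the fields. A sketch using your pattern file and the SAMSLOG pattern from the question:
filter {
    multiline {
        patterns_dir => "/home/samuel/logstash/grok.patterns"
        pattern => "(^%{SAMSLOG})"
        negate => true
        what => "previous"
    }
    grok {
        patterns_dir => "/home/samuel/logstash/grok.patterns"
        match => [ "message", "%{SAMSLOG}" ]
    }
}
With that in place, the user field captured by %{USERNAME:user} should appear as a searchable field in Kibana.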

Yahoo Pipes: filter items in a feed based on words in a text file

I have a pipe that filters an RSS feed and removes any item that contains "stopwords" that I've chosen. Currently I've manually created a filter for each stopword in the pipe editor, but the more logical way is to read these from a file. I've figured out how to read the stopwords out of the text file, but how do I apply the filter operator to the feed, once for every stopword?
The documentation states explicitly that operators can't be applied within the loop construct, but hopefully I'm missing something here.
You're not missing anything - the filter operator can't go in a loop.
Your best bet might be to generate a regex out of the stopwords and filter using that, e.g. generate a string like (word1|word2|word3|...|wordN).
You may have to escape any odd characters. Also, I'm not sure how long a regex can be, so you might have to chunk it over multiple filter rules.
In addition to Gavin Brock's answer, the following Yahoo Pipe filters the feed items (title, description, link, and author) according to multiple stopwords:
Pipes Info
Pipes Edit
Pipes Demo
Inputs
_render=rss
feed=http://example.com/feed.rss
stopwords=word1-word2-word3
