Logstash grok failing - elasticsearch

I am trying to grok a message, but it fails with _grokparsefailure and the log doesn't say what it is failing on. The grok pattern works on https://grokdebug.herokuapp.com/
input {
file {
type => "apache-access"
path => "C:/prdLogs/sent/*"
}
}
filter {
grok {
match => ['message', '%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp} \] "%{WORD:httpmethod} %{NOTSPACE:referrer} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} "-" "%{NOTSPACE:request}" %{QS:UserAgent} %{WORD:httpmethodO} - - HTTP/%{NUMBER:httpversion2} "%{WORD:session}:%{WORD:httpmed}" "-" %{NUMBER:duration}' ]
}
date {
match => [ "raw_timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
target => '@timestamp'
}
}
output {
elasticsearch { hosts => ["111.44.44.44:9200"] }
}
The data looks like:
199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.select.item/?locale=en_GB&dojo.preventCache=1488075524942 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3203
199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.no.recently.used/?locale=en_GB&dojo.preventCache=1488075525483 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3159
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/selector.label.selected/?locale=en_GB&dojo.preventCache=1488075525843 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3600
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/actor.selector.label.remove.all/?locale=en_GB&dojo.preventCache=1488075526305 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3224
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/com.label.filter.objects/?locale=en_GB&dojo.preventCache=1488075526711 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3299
This is actually an Apache access log, but I was unable to use COMBINEDAPACHELOG or COMMONAPACHELOG; both give the same error.
All entries in Elasticsearch are tagged as "_grokparsefailure". I ran Logstash with log.level set to debug but am not seeing any errors in the log.
I am using the latest version of Logstash.
Please advise.
R2 D2 Thanks, I tried this but no joy :(
I created a patterns file and pasted your pattern. I reduced the payload to just "130.39.22.22 - - [23/Feb/2015:10:18:45 +0800]" and used the following filter:
filter {
grok {
patterns_dir => ["c:/logstashconfig/patterns"]
match => ['message', '%{IP:clientip} - - /[%{DATE_CUSTOM:timestamp}/]']
}
date {
match => [ "timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
target => '@timestamp'
}
}
The debug log in logstash:
{
"path" => "C:/prdLogs/sent/test",
"@timestamp" => 2017-03-03T00:06:15.269Z,
"@version" => "1",
"host" => "hkw20012125",
"message" => "130.39.22.22 - - [23/Feb/2015:10:18:45 +0800]\r",
"type" => "apache-access",
"tags" => [
[0] "_grokparsefailure"
]
}
Any ideas? Is it the +0800 at the end of the data?
Thanks.

I think that once you have GREEDYDATA in your pattern, it consumes the rest of the line from that point on.
GREEDYDATA's definition looks like:
GREEDYDATA .* <-- captures everything up to the end of the line
And your grok match should look something like this, if I'm not mistaken:
grok {
match => ['message', '%{IPV4:clientip} - - %{GREEDYDATA:data}']
}
Unless you need the values extracted separately, the above grok should do the trick. I also think the way you're matching the timestamp is wrong. To handle your timestamp, you need the patterns below in your patterns file:
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
YEAR (?>\d\d){1,2}
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
DATE_CUSTOM %{MONTHDAY}[/]%{MONTH}[/]%{YEAR}:%{TIME}
And then you could simply use this within your grok match:
grok {
match => ['message', '%{IPV4:clientip} - - \[%{DATE_CUSTOM:timestamp} %{GREEDYDATA:data}']
}
Now you'll be able to match the timestamp as:
date {
match => [ "timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
target => '@timestamp'
}
Hope this helps!
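One way to sanity-check what the date format needs is to try the equivalent format in another language. The sketch below uses Python's `strptime`, not Logstash, and the mapping of 'dd/MMM/yyyy:HH:mm:ss Z' to `%d/%b/%Y:%H:%M:%S %z` is an assumption made for illustration; `parse_access_timestamp` is a hypothetical helper. DATE_CUSTOM as defined above ends at the seconds, so it may be worth checking whether the captured field still carries the `+0800` offset that the format expects.

```python
from datetime import datetime, timedelta

# Assumed Python equivalent of the 'dd/MMM/yyyy:HH:mm:ss Z' format.
# %b is locale-dependent; "Feb" parses under the default C/English locale.
FMT = "%d/%b/%Y:%H:%M:%S %z"

def parse_access_timestamp(text):
    """Return an aware datetime, or None if the text does not fit FMT."""
    try:
        return datetime.strptime(text, FMT)
    except ValueError:
        return None

# Timestamp as it appears in the log, offset included.
with_offset = parse_access_timestamp("26/Feb/2017:10:18:45 +0800")
# Same timestamp with the offset cut off.
without_offset = parse_access_timestamp("26/Feb/2017:10:18:45")

print(with_offset)
print(without_offset)
```

Logstash's date filter is a separate implementation, so treat this only as a check on what the captured string has to contain for a format ending in a zone token.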

When you have to build your own patterns, start from the left side, go slowly, and use the debugger.
If you test this pattern:
%{IP:clientip} - - \[
it works, but this one:
%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp} \]
doesn't. Comparing your pattern to the input shows that there is no space between the timestamp and the closing bracket.
Changing this part of the pattern to:
%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp}\]
works.
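The same left-to-right check can be reproduced outside the grok debugger. The sketch below is plain Python `re`, not grok: `IP` and the `.*` standing in for GREEDYDATA are simplified assumptions rather than the real pattern definitions, and the sample line is shortened from the question's data.

```python
import re

# Simplified stand-ins for grok patterns (assumptions, not the real definitions).
IP = r"\d{1,3}(?:\.\d{1,3}){3}"
GREEDY = r".*"

line = '199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp HTTP/1.1" 200'

# Build the pattern up from the left, one piece at a time.
steps = {
    "ip + opening bracket": IP + r" - - \[",
    "space before closing bracket": IP + r" - - \[(?P<ts>" + GREEDY + r") \]",
    "no space before closing bracket": IP + r" - - \[(?P<ts>" + GREEDY + r")\]",
}

results = {name: re.match(pattern, line) for name, pattern in steps.items()}

for name, m in results.items():
    print(f"{name}: {'match' if m else 'NO MATCH'}")
```

Only the variant with a literal space before `\]` fails, because the input never contains a space directly followed by a closing bracket.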

Related

symfony/panther is giving "unknown error: net::ERR_NAME_NOT_RESOLVED\n (Session info: headless chrome=107.0.5304.87)" Error

Please help. I get the error below when I run this code:
$client = Client::createChromeClient(null, [
'--headless',
'--no-sandbox',
'--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36',
'--window-size=1200,1100',
'--disable-gpu',
],
["port" => 9080, 'request_timeout_in_ms' => 100000]
);
$client->request('GET', 'https://www.apple.com');
The error I am getting is
unknown error: net::ERR_NAME_NOT_RESOLVED\n (Session info: headless chrome=107.0.5304.87)",
"#0 /var/www/html/tests/php/scraping/panther/vendor/php-webdriver/webdriver/lib/Remote/HttpCommandExecutor.php(385): Facebook\\WebDriver\\Exception\\WebDriverException::throwException()\n#1 /var/www/html/tests/php/scraping/panther/vendor/php-webdriver/webdriver/lib/Remote/RemoteWebDriver.php(598): Facebook\\WebDriver\\Remote\\HttpCommandExecutor->execute()\n#2 /var/www/html/tests/php/scraping/panther/vendor/php-webdriver/webdriver/lib/Remote/RemoteWebDriver.php(257): Facebook\\WebDriver\\Remote\\RemoteWebDriver->execute()\n#3 /var/www/html/tests/php/scraping/panther/vendor/symfony/panther/src/Client.php(532): Facebook\\WebDriver\\Remote\\RemoteWebDriver->get()\n#4 /var/www/html/tests/php/scraping/panther/vendor/symfony/panther/src/Client.php(276): Symfony\\Component\\Panther\\Client->get()\n#5 /var/www/html/tests/php/scraping/panther/index.php(26): Symfony\\Component\\Panther\\Client->request()\n#6 {main}"

400 bad request error when sending hits with Firefox user-agent to GA Measurement Protocol

I'm sending hits to GA Measurement Protocol, and some of them do not make it to the GA. I've noticed that all of them have one thing in common: the user-agent is Firefox, only varying version and device. Some examples:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0
Mozilla/5.0 (Android 10; Mobile; rv:103.0) Gecko/103.0 Firefox/103.0
GA validator is OK with those examples when checking them through the debug mode like this:
https://www.google-analytics.com/debug/collect?v=1&tid=UA-XXXXXXXX-1&t=event&ec=Ecommerce&ea=purchase&pa=purchase&cid=1234567890.1234567890&ni=1&ti=184242&tr=1060&uip=X.X.X.X&ua=Mozilla%2F5.0+%28Windows+NT+10.0%3B+Win64%3B+x64%3B+rv%3A103.0%29+Gecko%2F20100101+Firefox%2F103.0&pr1id=test_1&pr1pr=530&pr1qt=1&pr1ps=1
I get this response:
{
"hitParsingResult": [ {
"valid": true,
"parserMessage": [ ],
"hit": "/debug/collect?v=1..."
} ],
"parserMessage": [ {
"messageType": "INFO",
"description": "Found 1 hit in the request."
} ]
}
But in production, GA responds to the same requests with a 400 Bad Request error without providing any details: "Your client has issued a malformed or illegal request. That’s all we know.".
So what might be wrong with Firefox UA?
UPD: I've managed to make this work by unsetting the 'User-Agent' header when it contains 'Firefox'; the corresponding 'ua' parameter in the payload is then accepted.
if (strpos($requestHeaders['User-Agent'], 'Firefox') !== false) {
unset($requestHeaders['User-Agent']);
}
But it's still unclear what was wrong with such headers in the first place.
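For anyone reproducing this outside PHP, here is a minimal Python sketch of the two pieces involved: encoding the user agent into the `ua` query parameter, and dropping the `User-Agent` header when it mentions Firefox. `build_hit_query` and `strip_firefox_header` are hypothetical helper names, and the parameter set is a trimmed version of the debug URL above. It does not explain the 400 response; it only mirrors the workaround.

```python
from urllib.parse import urlencode

def build_hit_query(user_agent):
    """Encode a minimal Measurement Protocol hit with the UA in the `ua` parameter."""
    params = {
        "v": "1",
        "tid": "UA-XXXXXXXX-1",  # placeholder taken from the question
        "t": "event",
        "cid": "1234567890.1234567890",
        "ua": user_agent,
    }
    return urlencode(params)

def strip_firefox_header(headers):
    """Mirror the PHP workaround: drop User-Agent when it mentions Firefox."""
    cleaned = dict(headers)
    if "Firefox" in cleaned.get("User-Agent", ""):
        del cleaned["User-Agent"]
    return cleaned

ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0"
query = build_hit_query(ua)
headers = strip_firefox_header({"User-Agent": ua, "Accept": "*/*"})

print(query)
print(headers)
```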

logstash geoip parsing using apache_log data failed

I am new to Elasticsearch.
I want to run the geoip filter in Logstash on Apache log data.
Apache log data:
"83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
logstash.conf
input {
tcp {
port => 9900
}
}
filter {
grok {
match => { "message" => "%{IP:clientip}" }
}
geoip {
source => "clientip"
}
}
output {
stdout { }
}
and I got the error below:
Pipeline error {:pipeline_id=>"main", :exception=>#<LogStash::ConfigurationError: GeoIP Filter in ECS-Compatiblity mode requires a `target` when `source` is not an `ip` sub-field, eg. [client][ip]>
....
Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Reload<main>, action_result: false", :backtrace=>nil}
here is my data output:
{
"@timestamp" => 2022-03-09T09:40:28.652491Z,
"clientip" => "83.149.9.216",
"@version" => "1",
"message" => "83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"event" => {
"original" => "83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
}
}
Could you help me solve this problem? Thanks.

How to prevent fake useragent detection in selenium headless?

I am running a scraping bot in headless mode. In headless mode the user agent contains the string "headless". To avoid that, I changed the user agent, but the website detects the fake user agent and blocks the bot. How can I prevent this detection?
I am using Selenium with chromedriver.
Please add these options:
# Pick the user agent that matches the platform you will report.
windows_useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
linux_useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--no-sandbox")
options.add_argument("user-agent=#{linux_useragent}")
options.add_argument("--disable-web-security")
options.add_argument("--disable-xss-auditor")
options.add_option("excludeSwitches", ["enable-automation", "load-extension"])
navigator.platform and navigator.userAgent should match.
If the userAgent is for Windows, navigator.platform should be "Win32".
If the userAgent is for Linux, navigator.platform should be "Linux x86_64".
You can set it like this:
platform = {
windows: "Win32",
linux: "Linux x86_64"
}
driver.execute_cdp("Page.addScriptToEvaluateOnNewDocument", {
"source": "
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
}),
Object.defineProperty(navigator, 'languages', {
get: () => ['en-US', 'en']
}),
Object.defineProperty(navigator, 'platform', {
get: () => \"#{platform[:linux]}\"
})"
})
And of course you need to set navigator.webdriver to undefined.
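The pairing rule can be written as a small lookup so the two values cannot drift apart. This is a sketch in Python rather than the Ruby used above, and `platform_for_user_agent` is a hypothetical helper; it only covers the two cases named in the answer.

```python
# Hypothetical helper: pick the navigator.platform value that fits a user-agent string.
PLATFORM_BY_OS_TOKEN = {
    "Windows NT": "Win32",
    "Linux x86_64": "Linux x86_64",
}

def platform_for_user_agent(user_agent):
    """Return the matching platform string, or None when the OS token is unknown."""
    for token, platform in PLATFORM_BY_OS_TOKEN.items():
        if token in user_agent:
            return platform
    return None

windows_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36")
linux_ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36")

print(platform_for_user_agent(windows_ua))
print(platform_for_user_agent(linux_ua))
```

Returning None for an unrecognised user agent makes a mismatch visible instead of silently falling back to a default platform.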

TinyMCE - IE8 Error

I'm testing my page in a bunch of browsers, and in IE 8 I get the following error:
Webpage error details
User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)
Timestamp: Mon, 20 Sep 2010 20:03:46 UTC
Message: Invalid argument.
Line: 1314
Char: 7
Code: 0
URI: http://192.168.1.93/JS/tiny_mce/tiny_mce_src.js
Any idea on how to fix this? My TinyMCE version is:
majorVersion : '3',
minorVersion : '3.9',
releaseDate : '2010-09-08',
My init is:
tinyMCE.init({
'mode' : 'exact',
'elements' : 'EDITOR',
'auto_focus' : 'EDITOR',
'theme' : 'advanced',
'plugins' : 'safari,save,preview,table,paste,insertdatetime',
'height' : h,
'width' : w,
'cleanup_on_startup' : true,
'fix_list_elements' : true,
'fix_table_elements' : true,
'fix_nesting' : false,
'theme_advanced_layout_manager' : 'SimpleLayout',
'theme_advanced_toolbar_location' : 'top',
'theme_advanced_toolbar_align' : 'left',
forced_root_block : '',
'theme_advanced_buttons1' : 'save, cancel, |, fontselect, fontsizeselect, formatselect, |, backcolor, forecolor, |, selectall, cut, copy, paste, pastetest, pasteword, |, undo, redo',
'theme_advanced_buttons2' : 'anchor, link, unlink, |, bold, italic, underline, strikethrough, sub, sup, |, numlist, bullist, charmap, |, outdent, indent, |, justifyleft, justifycenter, justifyright, justifyfull, |, insertdate, inserttime',
'theme_advanced_buttons3' : 'tablecontrols',
'theme_advanced_font_sizes' : '8pt,9pt,10pt,11pt,12pt,14pt,16pt,17pt,18pt,19pt,20pt,25pt,30pt,35pt,40pt',
'theme_advanced_buttons3_add' : '|, code',
'end' : 'end',
});
My height and width were messed up. The function I used to get h and w worked fine everywhere except IE.
I also had this error in IE8, except that in my case the cause was not specifying a height/width at all. Even though the editor still loaded correctly and adopted the size of the text box it converted, it threw an "invalid argument" exception.
Fixed it by adding this code to the init:
$('#' + textAreaID).tinymce({
...
height: $('#' + textAreaID).outerHeight(),
width: $('#' + textAreaID).outerWidth()
});
(using jQuery syntax)
