logstash geoip parsing using apache_log data failed - filter

I am new to Elasticsearch.
I want to run Apache access-log data through the geoip filter in Logstash.
Sample Apache log line:
"83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
logstash.conf
input {
  tcp {
    port => 9900
  }
}
filter {
  grok {
    match => { "message" => "%{IP:clientip}" }
  }
  geoip {
    source => "clientip"
  }
}
output {
  stdout { }
}
and I got the error below:
Pipeline error {:pipeline_id=>"main", :exception=>#<LogStash::ConfigurationError: GeoIP Filter in ECS-Compatiblity mode requires a `target` when `source` is not an `ip` sub-field, eg. [client][ip]>
....
Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Reload<main>, action_result: false", :backtrace=>nil}
here is my data output:
{
"@timestamp" => 2022-03-09T09:40:28.652491Z,
"clientip" => "83.149.9.216",
"@version" => "1",
"message" => "83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"event" => {
"original" => "83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
}
}
Could you help me solve this problem? Thanks.

Related

symfony/panther is giving "unknown error: net::ERR_NAME_NOT_RESOLVED\n (Session info: headless chrome=107.0.5304.87)" Error

Please help. I get the error below when I run the following code:
$client = Client::createChromeClient(null, [
        '--headless',
        '--no-sandbox',
        '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36',
        '--window-size=1200,1100',
        '--disable-gpu',
    ],
    ["port" => 9080, 'request_timeout_in_ms' => 100000]
);
$client->request('GET', 'https://www.apple.com');
The error I am getting is
unknown error: net::ERR_NAME_NOT_RESOLVED\n (Session info: headless chrome=107.0.5304.87)",
"#0 /var/www/html/tests/php/scraping/panther/vendor/php-webdriver/webdriver/lib/Remote/HttpCommandExecutor.php(385): Facebook\\WebDriver\\Exception\\WebDriverException::throwException()\n#1 /var/www/html/tests/php/scraping/panther/vendor/php-webdriver/webdriver/lib/Remote/RemoteWebDriver.php(598): Facebook\\WebDriver\\Remote\\HttpCommandExecutor->execute()\n#2 /var/www/html/tests/php/scraping/panther/vendor/php-webdriver/webdriver/lib/Remote/RemoteWebDriver.php(257): Facebook\\WebDriver\\Remote\\RemoteWebDriver->execute()\n#3 /var/www/html/tests/php/scraping/panther/vendor/symfony/panther/src/Client.php(532): Facebook\\WebDriver\\Remote\\RemoteWebDriver->get()\n#4 /var/www/html/tests/php/scraping/panther/vendor/symfony/panther/src/Client.php(276): Symfony\\Component\\Panther\\Client->get()\n#5 /var/www/html/tests/php/scraping/panther/index.php(26): Symfony\\Component\\Panther\\Client->request()\n#6 {main}"

400 bad request error when sending hits with Firefox user-agent to GA Measurement Protocol

I'm sending hits to the GA Measurement Protocol, and some of them never reach GA. All of the failing hits have one thing in common: the user agent is Firefox, varying only in version and device. Some examples:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0
Mozilla/5.0 (Android 10; Mobile; rv:103.0) Gecko/103.0 Firefox/103.0
GA validator is OK with those examples when checking them through the debug mode like this:
https://www.google-analytics.com/debug/collect?v=1&tid=UA-XXXXXXXX-1&t=event&ec=Ecommerce&ea=purchase&pa=purchase&cid=1234567890.1234567890&ni=1&ti=184242&tr=1060&uip=X.X.X.X&ua=Mozilla%2F5.0+%28Windows+NT+10.0%3B+Win64%3B+x64%3B+rv%3A103.0%29+Gecko%2F20100101+Firefox%2F103.0&pr1id=test_1&pr1pr=530&pr1qt=1&pr1ps=1
I get this response:
{
  "hitParsingResult": [ {
    "valid": true,
    "parserMessage": [ ],
    "hit": "/debug/collect?v=1..."
  } ],
  "parserMessage": [ {
    "messageType": "INFO",
    "description": "Found 1 hit in the request."
  } ]
}
BUT in production GA responds to the same requests with a 400 Bad Request error and no details: "Your client has issued a malformed or illegal request. That’s all we know.".
So what might be wrong with Firefox UA?
UPD: I've managed to make this work by unsetting the 'User-Agent' header in case it contains 'Firefox' - and the corresponding 'ua' parameter in the payload gets accepted then.
if (strpos($requestHeaders['User-Agent'], 'Firefox') !== false) {
    unset($requestHeaders['User-Agent']);
}
But it's still unclear what was wrong with such headers in the first place.
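For readers who are not on PHP, the workaround above can be sketched in Python. This is only an illustration: `build_hit` is a hypothetical helper, and `UA-XXXXXXXX-1` is the same placeholder as in the question. It keeps the visitor's browser in the `ua` payload parameter and drops the outgoing `User-Agent` header when that header contains `Firefox`. It does not explain why GA rejects the header.

```python
from urllib.parse import urlencode

FIREFOX_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) "
              "Gecko/20100101 Firefox/103.0")


def build_hit(params, request_headers):
    """Return (query_string, headers) for one Measurement Protocol hit.

    Hypothetical helper: mirrors the PHP workaround by removing the
    User-Agent header when it mentions Firefox, leaving `ua` in the payload.
    """
    headers = dict(request_headers)  # copy so the caller's dict is untouched
    if "Firefox" in headers.get("User-Agent", ""):
        del headers["User-Agent"]
    return urlencode(params), headers


query, headers = build_hit(
    {"v": 1, "tid": "UA-XXXXXXXX-1", "t": "event", "ua": FIREFOX_UA},
    {"User-Agent": FIREFOX_UA, "Accept": "*/*"},
)
print(query)    # the ua value is form-encoded, as in the debug URL above
print(headers)  # the User-Agent header is gone, other headers are kept
```

Sending the request itself is left out on purpose, so the sketch runs without network access.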

How to prevent fake useragent detection in selenium headless?

I am running a scraping bot in headless mode. In headless mode the user agent contains the word "headless", so I changed the user agent to hide it. The website detects the fake user agent and blocks the bot. How can I prevent this detection?
I am using Selenium with chromedriver.
Please add these options:
# Pick the user agent that matches the platform you want to present.
windows_useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
linux_useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--no-sandbox")
options.add_argument("user-agent=#{linux_useragent}")
options.add_argument("--disable-web-security")
options.add_argument("--disable-xss-auditor")
options.add_option("excludeSwitches", ["enable-automation", "load-extension"])
navigator.platform and navigator.userAgent must be consistent with each other.
If the user agent is for Windows, navigator.platform should be "Win32".
If the user agent is for Linux, navigator.platform should be "Linux x86_64".
You can set it like this:
platform = {
  windows: "Win32",
  linux: "Linux x86_64"
}
# Keyword argument instead of a braced hash, so the call also works on Ruby 3.
driver.execute_cdp("Page.addScriptToEvaluateOnNewDocument", source: "
  Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined
  }),
  Object.defineProperty(navigator, 'languages', {
    get: () => ['en-US', 'en']
  }),
  Object.defineProperty(navigator, 'platform', {
    get: () => \"#{platform[:linux]}\"
  })")
And of course you need to set navigator.webdriver to undefined, as the script above does.
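The platform rule in this answer (a Windows user agent pairs with "Win32", a Linux one with "Linux x86_64") can be written as a small lookup. This is a minimal sketch in Python rather than the Ruby used above. `platform_for_user_agent` is a hypothetical helper name, and it only covers the two desktop cases the answer mentions. Android user agents also contain "Linux" and would need their own rule.

```python
WINDOWS_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36")
LINUX_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36")


def platform_for_user_agent(user_agent):
    """Return the navigator.platform value that matches a desktop user agent.

    Assumption: only the Windows and Linux desktop cases are handled.
    """
    if "Windows" in user_agent:
        return "Win32"
    if "Linux" in user_agent:
        return "Linux x86_64"
    raise ValueError("no platform mapping for this user agent")


print(platform_for_user_agent(WINDOWS_UA))
print(platform_for_user_agent(LINUX_UA))
```

Deriving the platform from the user agent in one place avoids the two values drifting apart when the user agent is changed later.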

Logstash grok failing

I am trying to grok a message, but it fails with _grokparsefailure and the log does not say what it is failing on. The grok pattern works on https://grokdebug.herokuapp.com/
input {
  file {
    type => "apache-access"
    path => "C:/prdLogs/sent/*"
  }
}
filter {
  grok {
    match => ['message', '%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp} \] "%{WORD:httpmethod} %{NOTSPACE:referrer} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} "-" "%{NOTSPACE:request}" %{QS:UserAgent} %{WORD:httpmethodO} - - HTTP/%{NUMBER:httpversion2} "%{WORD:session}:%{WORD:httpmed}" "-" %{NUMBER:duration}' ]
  }
  date {
    match => [ "raw_timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
    target => '@timestamp'
  }
}
output {
  elasticsearch { hosts => ["111.44.44.44:9200"] }
}
The data looks like:
199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.select.item/?locale=en_GB&dojo.preventCache=1488075524942 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3203
199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.no.recently.used/?locale=en_GB&dojo.preventCache=1488075525483 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3159
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/selector.label.selected/?locale=en_GB&dojo.preventCache=1488075525843 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3600
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/actor.selector.label.remove.all/?locale=en_GB&dojo.preventCache=1488075526305 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3224
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/com.label.filter.objects/?locale=en_GB&dojo.preventCache=1488075526711 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3299
This is actually an Apache access log, but I was unable to use COMBINEDAPACHELOG or COMMONAPACHELOG either. Both give the same error!
All entries in Elasticsearch are tagged "_grokparsefailure". I ran Logstash with log.level set to debug, but I am not seeing any errors in the log.
I am using the latest version of Logstash.
Please advise.
R2 D2 Thanks, I tried this but no joy :(
I created a patterns file and pasted your pattern. I just changed the payload to just "130.39.22.22 - - [23/Feb/2015:10:18:45 +0800]" and the following was my filter:
filter {
  grok {
    patterns_dir => ["c:/logstashconfig/patterns"]
    match => ['message', '%{IP:clientip} - - /[%{DATE_CUSTOM:timestamp}/]']
  }
  date {
    match => [ "timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
    target => '@timestamp'
  }
}
The debug log in logstash:
{
          "path" => "C:/prdLogs/sent/test",
    "@timestamp" => 2017-03-03T00:06:15.269Z,
      "@version" => "1",
          "host" => "hkw20012125",
       "message" => "130.39.22.22 - - [23/Feb/2015:10:18:45 +0800]\r",
          "type" => "apache-access",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
Any ideas? Is it the +0800 at the end of the data?
Thanks.
I think that once you have GREEDYDATA in your pattern, it consumes the rest of the line from the log.
GREEDYDATA's pattern looks like:
GREEDYDATA .* <-- matches everything up to the end of the line
And your grok match should look something like this, if I'm not mistaken:
grok {
  match => ['message', '%{IPV4:clientip} - - %{GREEDYDATA:data}']
}
Unless you need the values extracted separately, the above grok should do the trick for you. I also think the way you're matching the timestamp is wrong. To handle your timestamp you need the patterns below in your patterns file:
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
YEAR (?>\d\d){1,2}
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
DATE_CUSTOM %{MONTHDAY}[/]%{MONTH}[/]%{YEAR}:%{TIME}
And then you could simply use this within your grok match:
grok {
  match => ['message', '%{IPV4:clientip} - - \[%{DATE_CUSTOM:timestamp} %{GREEDYDATA:data}']
}
Now you'll be able to match the timestamp as:
date {
  match => [ "timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
  target => '@timestamp'
}
Hope this helps!
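One thing worth checking with the date filter above: the format ends in ` Z`, so the field being parsed has to include the UTC offset (`+0800`). As written, DATE_CUSTOM appears to stop after the seconds, so the offset may need to be captured as well. The sketch below uses Python's `strptime` as a stand-in for the Logstash date filter, which is an assumption for illustration only. `APACHE_TIME_FORMAT` is the closest Python equivalent of `dd/MMM/yyyy:HH:mm:ss Z`.

```python
from datetime import datetime

# Python counterpart of the Joda-style pattern 'dd/MMM/yyyy:HH:mm:ss Z'.
# %b is locale-dependent; this assumes English month abbreviations.
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_apache_timestamp(text):
    """Parse an Apache access-log timestamp that includes its UTC offset."""
    return datetime.strptime(text, APACHE_TIME_FORMAT)


parsed = parse_apache_timestamp("26/Feb/2017:10:18:45 +0800")
print(parsed.isoformat())

# Without the offset the same format is rejected.
try:
    parse_apache_timestamp("26/Feb/2017:10:18:45")
    offset_required = False
except ValueError:
    offset_required = True
print("offset required:", offset_required)
```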
When you have to build your own patterns, start from the left side, go slowly, and use the debugger.
If you test this pattern:
%{IP:clientip} - - \[
it works, but this one:
%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp} \]
doesn't. Comparing your pattern to the input shows that the input has no space between the timestamp and the closing bracket, while your pattern expects one.
Changing this part of the pattern to:
%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp}\]
works.
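The same left-to-right approach can be tried outside Logstash. The sketch below uses Python's `re` module with rough stand-ins for `%{IP}` and `%{GREEDYDATA}`. These regexes are simplified assumptions and not the real grok definitions, and `LINE` is a shortened version of the sample data. Each step extends the previous pattern, so the first step that returns nothing points at the faulty piece.

```python
import re

LINE = ('199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] '
        '"GET /myapp/app/ HTTP/1.1" 200')

# Simplified stand-ins for %{IP:clientip} and %{GREEDYDATA:raw_timestamp}.
IP = r"(?P<clientip>\d{1,3}(?:\.\d{1,3}){3})"
GREEDY = r"(?P<raw_timestamp>.*)"

# Patterns built up from the left, one piece at a time.
STEPS = [
    ("ip and opening bracket", IP + r" - - \["),
    ("space before ]", IP + r" - - \[" + GREEDY + r" \]"),
    ("no space before ]", IP + r" - - \[" + GREEDY + r"\]"),
]


def try_steps(line):
    """Return the captured fields per step, or None where the step fails."""
    results = {}
    for label, pattern in STEPS:
        match = re.match(pattern, line)
        results[label] = match.groupdict() if match else None
    return results


results = try_steps(LINE)
for label, captured in results.items():
    print(label, "->", captured)
```

Only the step with the extra space fails, which matches the diagnosis above.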

Where did the default Useragent of Cobalt come from?

When I run Cobalt, I can see the user agent in the log:
[0101/000230:INFO:application.cc(690)] User Agent: Mozilla/5.0 (DirectFB; Linux x86_64) Cobalt/4.13031-qa (unlike Gecko) Starboard/1
So where does it come from? Is there a way to change it?
The default user agent is built in the following file:
https://cobalt.googlesource.com/cobalt/+/e9b4b99dab6e774b8b6e63add74c352cc5dd395a/src/cobalt/network/user_agent_string_factory.cc
std::string UserAgentStringFactory::CreateUserAgentString() {
  // Cobalt's user agent contains the following sections:
  //   Mozilla/5.0 (ChromiumStylePlatform)
  //   Cobalt/Version.BuildNumber-BuildConfiguration (unlike Gecko)
  //   Starboard/APIVersion,
  //   Device/FirmwareVersion (Brand, Model, ConnectionType)
  //   Mozilla/5.0 (ChromiumStylePlatform)
  std::string user_agent =
      base::StringPrintf("Mozilla/5.0 (%s)", CreatePlatformString().c_str());
  //   Cobalt/Version.BuildNumber-BuildConfiguration (unlike Gecko)
  base::StringAppendF(&user_agent, " Cobalt/%s.%s-%s (unlike Gecko)",
                      COBALT_VERSION, COBALT_BUILD_VERSION_NUMBER,
                      kBuildConfiguration);
  //   Starboard/APIVersion,
  if (!starboard_version_.empty()) {
    base::StringAppendF(&user_agent, " %s", starboard_version_.c_str());
  }
  //   Device/FirmwareVersion (Brand, Model, ConnectionType)
  if (youtube_tv_info_) {
    base::StringAppendF(
        &user_agent, ", %s_%s_%s/%s (%s, %s, %s)",
        youtube_tv_info_->network_operator.value_or("").c_str(),
        CreateDeviceTypeString().c_str(),
        youtube_tv_info_->chipset_model_number.value_or("").c_str(),
        youtube_tv_info_->firmware_version.value_or("").c_str(),
        youtube_tv_info_->brand.c_str(), youtube_tv_info_->model.c_str(),
        CreateConnectionTypeString().c_str());
  }
  return user_agent;
}
If your SbSystemGetDeviceType() is true for SystemDeviceTypeIsTv() (in file user_agent_string_factory_starboard.cc), you can customize the UA by implementing some fields of SbSystemGetProperty() + some SbSystemGet() functions.
This is a typical example, with each field replaced by its number in the list below:
Mozilla/5.0 (1) Cobalt/11.119147-gold (unlike Gecko) Starboard/8, 2_8_6/5 (3, 4, 7)
where:
1. kSbSystemPropertyPlatformName
2. kSbSystemPropertyNetworkOperatorName
3. kSbSystemPropertyManufacturerName
4. kSbSystemPropertyModelName
5. kSbSystemPropertyFirmwareVersion
6. kSbSystemPropertyChipsetModelNumber
7. SbSystemGetConnectionType()
8. SbSystemGetDeviceType()
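To see how the numbered fields land in the final string, here is a minimal Python re-creation of the formatting done by CreateUserAgentString(). It is a sketch for illustration, not Cobalt code: `build_user_agent` and the dictionary keys are hypothetical names, and the field values are the list numbers from the example above.

```python
def build_user_agent(platform, cobalt_version, build_number, build_config,
                     starboard_version="", tv_info=None):
    """Assemble a user agent using the same format strings as the C++ code."""
    user_agent = "Mozilla/5.0 (%s)" % platform
    user_agent += " Cobalt/%s.%s-%s (unlike Gecko)" % (
        cobalt_version, build_number, build_config)
    if starboard_version:
        user_agent += " %s" % starboard_version
    if tv_info:  # only appended for TV-type devices
        user_agent += ", %s_%s_%s/%s (%s, %s, %s)" % (
            tv_info["network_operator"],
            tv_info["device_type"],
            tv_info["chipset_model_number"],
            tv_info["firmware_version"],
            tv_info["brand"],
            tv_info["model"],
            tv_info["connection_type"],
        )
    return user_agent


# Each value is the list number of the property that supplies it.
example = build_user_agent(
    platform="1", cobalt_version="11", build_number="119147",
    build_config="gold", starboard_version="Starboard/8",
    tv_info={
        "network_operator": "2", "device_type": "8",
        "chipset_model_number": "6", "firmware_version": "5",
        "brand": "3", "model": "4", "connection_type": "7",
    },
)
print(example)
```

Without `tv_info` the function stops after the Starboard part, which is the shape of the user agent in the log line from the question.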

Resources