I am new to Elasticsearch.
I want to run the sample apache_logs data through the geoip filter in Logstash.
Apache log data:
"83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
logstash.conf
input {
  tcp {
    port => 9900
  }
}
filter {
  grok {
    match => { "message" => "%{IP:clientip}" }
  }
  geoip {
    source => "clientip"
  }
}
output {
  stdout { }
}
and I got the error below:
Pipeline error {:pipeline_id=>"main", :exception=>#<LogStash::ConfigurationError: GeoIP Filter in ECS-Compatiblity mode requires a `target` when `source` is not an `ip` sub-field, eg. [client][ip]>
....
Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Reload<main>, action_result: false", :backtrace=>nil}
Here is my event output:
{
"@timestamp" => 2022-03-09T09:40:28.652491Z,
"clientip" => "83.149.9.216",
"@version" => "1",
"message" => "83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
"event" => {
"original" => "83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
}
}
Could you help me solve this problem? Thanks.
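The error message itself names two ways out: give the geoip filter an explicit `target`, or use a `source` that is an `ip` sub-field such as `[client][ip]`. The rule can be sketched in Python as below. This is only a reading of the error text, not the plugin's actual code, and `default_geoip_target` is a hypothetical helper name.

```python
from typing import Optional


def default_geoip_target(source: str) -> Optional[str]:
    """Return the parent field of an `[...][ip]` source, else None.

    Hypothetical helper: it models the rule quoted in the error message,
    under the assumption that the parent field becomes the default target.
    """
    suffix = "[ip]"
    if source.endswith(suffix) and len(source) > len(suffix):
        return source[: -len(suffix)]
    return None  # here a `target` would have to be configured explicitly


print(default_geoip_target("[client][ip]"))  # [client]
print(default_geoip_target("clientip"))      # None
```

With `source => "clientip"` the sketch returns None, which corresponds to the case the error complains about.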
Please help. I am getting an error when I run the code below.
The code is:
$client = Client::createChromeClient(null, [
'--headless',
'--no-sandbox',
'--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36',
'--window-size=1200,1100',
'--disable-gpu',
],
["port" => 9080, 'request_timeout_in_ms' => 100000]
);
$client->request('GET', 'https://www.apple.com');
The error I am getting is:
unknown error: net::ERR_NAME_NOT_RESOLVED\n (Session info: headless chrome=107.0.5304.87)",
"#0 /var/www/html/tests/php/scraping/panther/vendor/php-webdriver/webdriver/lib/Remote/HttpCommandExecutor.php(385): Facebook\\WebDriver\\Exception\\WebDriverException::throwException()\n#1 /var/www/html/tests/php/scraping/panther/vendor/php-webdriver/webdriver/lib/Remote/RemoteWebDriver.php(598): Facebook\\WebDriver\\Remote\\HttpCommandExecutor->execute()\n#2 /var/www/html/tests/php/scraping/panther/vendor/php-webdriver/webdriver/lib/Remote/RemoteWebDriver.php(257): Facebook\\WebDriver\\Remote\\RemoteWebDriver->execute()\n#3 /var/www/html/tests/php/scraping/panther/vendor/symfony/panther/src/Client.php(532): Facebook\\WebDriver\\Remote\\RemoteWebDriver->get()\n#4 /var/www/html/tests/php/scraping/panther/vendor/symfony/panther/src/Client.php(276): Symfony\\Component\\Panther\\Client->get()\n#5 /var/www/html/tests/php/scraping/panther/index.php(26): Symfony\\Component\\Panther\\Client->request()\n#6 {main}"
I'm sending hits to the GA Measurement Protocol, and some of them never reach GA. I've noticed that they all have one thing in common: the user agent is Firefox, varying only in version and device. Some examples:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0
Mozilla/5.0 (Android 10; Mobile; rv:103.0) Gecko/103.0 Firefox/103.0
The GA validator accepts those examples when I check them in debug mode like this:
https://www.google-analytics.com/debug/collect?v=1&tid=UA-XXXXXXXX-1&t=event&ec=Ecommerce&ea=purchase&pa=purchase&cid=1234567890.1234567890&ni=1&ti=184242&tr=1060&uip=X.X.X.X&ua=Mozilla%2F5.0+%28Windows+NT+10.0%3B+Win64%3B+x64%3B+rv%3A103.0%29+Gecko%2F20100101+Firefox%2F103.0&pr1id=test_1&pr1pr=530&pr1qt=1&pr1ps=1
I get this response:
{
"hitParsingResult": [ {
"valid": true,
"parserMessage": [ ],
"hit": "/debug/collect?v=1..."
} ],
"parserMessage": [ {
"messageType": "INFO",
"description": "Found 1 hit in the request."
} ]
}
BUT in production GA responds to the same requests with a 400 Bad Request error and no details: "Your client has issued a malformed or illegal request. That’s all we know."
So what might be wrong with the Firefox UA?
UPD: I've managed to make this work by unsetting the 'User-Agent' header when it contains 'Firefox'; the corresponding 'ua' parameter in the payload is then accepted.
if (strpos($requestHeaders['User-Agent'], 'Firefox') !== false) {
unset($requestHeaders['User-Agent']);
}
But it's still unclear what was wrong with such headers in the first place.
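The workaround separates two places where the user agent can travel: the HTTP 'User-Agent' header and the 'ua' payload parameter. As a rough Python sketch of the second one, the snippet below builds a debug URL with the UA carried only in `ua`, URL-encoded the same way as in the request above. `build_hit` is a hypothetical helper, the property ID is the placeholder from the question, and the sketch does not explain why GA rejects the header.

```python
from urllib.parse import urlencode


def build_hit(user_agent: str) -> str:
    """Build a Measurement Protocol debug URL carrying the UA in `ua`."""
    params = {
        "v": "1",
        "tid": "UA-XXXXXXXX-1",  # placeholder property ID, as in the question
        "t": "event",
        "ec": "Ecommerce",
        "ea": "purchase",
        "cid": "1234567890.1234567890",
        "ua": user_agent,  # UA goes in the payload, not in an HTTP header
    }
    return "https://www.google-analytics.com/debug/collect?" + urlencode(params)


firefox_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) "
    "Gecko/20100101 Firefox/103.0"
)
print(build_hit(firefox_ua))
```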
I am running a scraping bot in headless mode. In headless mode the user agent contains the string "headless", so to avoid that I changed the user agent. The website still detects the fake user agent and blocks the bot. How can I prevent this detection?
I am using Selenium with chromedriver.
Add these options:
windows_useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
linux_useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--no-sandbox")
# linux_useragent must be defined (see above) for this interpolation to work
options.add_argument("user-agent=#{linux_useragent}")
options.add_argument("--disable-web-security")
options.add_argument("--disable-xss-auditor")
options.add_option("excludeSwitches", ["enable-automation", "load-extension"])
navigator.platform and navigator.userAgent should match.
If the userAgent is for Windows, navigator.platform should be "Win32".
If the userAgent is for Linux, navigator.platform should be "Linux x86_64".
You can set it like this:
platform = {
  windows: "Win32",
  linux: "Linux x86_64"
}
driver.execute_cdp("Page.addScriptToEvaluateOnNewDocument", {
  "source": "
    Object.defineProperty(navigator, 'webdriver', {
      get: () => undefined
    }),
    Object.defineProperty(navigator, 'languages', {
      get: () => ['en-US', 'en']
    }),
    Object.defineProperty(navigator, 'platform', {
      get: () => \"#{platform[:linux]}\"
    })"
})
And of course you need to set navigator.webdriver to undefined, as the script above does.
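The matching rule between userAgent and navigator.platform can be written down as a small lookup. A minimal Python sketch, assuming only the two cases named above (Windows and desktop Linux); `platform_for_user_agent` is a hypothetical helper, not part of Selenium.

```python
def platform_for_user_agent(user_agent: str) -> str:
    """Pick the navigator.platform value that is consistent with a UA string."""
    if "Windows" in user_agent:
        return "Win32"
    if "Linux" in user_agent:
        return "Linux x86_64"
    # Other platforms (macOS, Android, ...) are deliberately not covered here.
    raise ValueError("no platform mapping for this user agent")


linux_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
)
print(platform_for_user_agent(linux_ua))  # Linux x86_64
```

Deriving the platform from the user agent in one place avoids overriding one of the two values and forgetting the other.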
I am trying to grok a message, but it fails with _grokparsefailure and the log doesn't say what it is failing on. The grok pattern works on https://grokdebug.herokuapp.com/
input {
file {
type => "apache-access"
path => "C:/prdLogs/sent/*"
}
}
filter {
grok {
match => ['message', '%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp} \] "%{WORD:httpmethod} %{NOTSPACE:referrer} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} "-" "%{NOTSPACE:request}" %{QS:UserAgent} %{WORD:httpmethodO} - - HTTP/%{NUMBER:httpversion2} "%{WORD:session}:%{WORD:httpmed}" "-" %{NUMBER:duration}' ]
}
date {
match => [ "raw_timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
target => '@timestamp'
}
}
output {
elasticsearch { hosts => ["111.44.44.44:9200"] }
}
The data looks like:
199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.select.item/?locale=en_GB&dojo.preventCache=1488075524942 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3203
199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.no.recently.used/?locale=en_GB&dojo.preventCache=1488075525483 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3159
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/selector.label.selected/?locale=en_GB&dojo.preventCache=1488075525843 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3600
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/actor.selector.label.remove.all/?locale=en_GB&dojo.preventCache=1488075526305 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3224
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/com.label.filter.objects/?locale=en_GB&dojo.preventCache=1488075526711 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3299
This is actually an Apache access log, but I was unable to use COMBINEDAPACHELOG or COMMONAPACHELOG; they give the same error.
All entries in Elasticsearch are tagged "_grokparsefailure". I ran Logstash with log.level set to debug, but I am not seeing any errors in the log.
I am using the latest version of Logstash.
Please advise.
R2 D2, thanks, I tried this but no joy :(
I created a patterns file and pasted your pattern. I reduced the payload to just "130.39.22.22 - - [23/Feb/2015:10:18:45 +0800]" and used the following filter:
filter {
grok {
patterns_dir => ["c:/logstashconfig/patterns"]
match => ['message', '%{IP:clientip} - - /[%{DATE_CUSTOM:timestamp}/]']
}
date {
match => [ "timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
target => '@timestamp'
}
}
The debug log in logstash:
{
"path" => "C:/prdLogs/sent/test",
"@timestamp" => 2017-03-03T00:06:15.269Z,
"@version" => "1",
"host" => "hkw20012125",
"message" => "130.39.22.22 - - [23/Feb/2015:10:18:45 +0800]\r",
"type" => "apache-access",
"tags" => [
[0] "_grokparsefailure"
]
}
Any ideas? Is it the +0800 at the end of the data?
Thanks.
I think that once you have GREEDYDATA in your pattern, it consumes the rest of the log line.
The GREEDYDATA pattern is defined as:
GREEDYDATA .* <-- matches everything up to the end of the line
If I'm not mistaken, your grok match should look something like this:
grok {
match => ['message', '%{IPV4:clientip} - - %{GREEDYDATA:data}']
}
Unless you need the values extracted separately, the above grok should do the trick. I also think the way you're matching the timestamp is wrong. To handle your timestamp, you need the patterns below in your patterns file:
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
YEAR (?>\d\d){1,2}
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
DATE_CUSTOM %{MONTHDAY}[/]%{MONTH}[/]%{YEAR}:%{TIME}
And then you could simply use this within your grok match:
grok {
match => ['message', '%{IPV4:clientip} - - \[%{DATE_CUSTOM:timestamp} %{GREEDYDATA:data}']
}
Now you'll be able to match the timestamp as:
date {
match => [ "timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
target => '@timestamp'
}
Hope this helps!
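The date pattern 'dd/MMM/yyyy:HH:mm:ss Z' ends with a space and a UTC offset, so it is worth checking that the field handed to the date filter really contains the "+0800" part. Python's `strptime` is used below as a rough stand-in for the date filter's parser (the format string is my approximation of the pattern above, not something Logstash itself uses), and the month abbreviation assumes an English locale.

```python
from datetime import datetime

# Approximate strptime equivalent of 'dd/MMM/yyyy:HH:mm:ss Z'
APACHE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Timestamp with offset, as it appears between the brackets in the log
parsed = datetime.strptime("26/Feb/2017:10:18:45 +0800", APACHE_FORMAT)
print(parsed.isoformat())  # 2017-02-26T10:18:45+08:00

# The same timestamp without the offset does not satisfy the format
try:
    datetime.strptime("26/Feb/2017:10:18:45", APACHE_FORMAT)
except ValueError as exc:
    print("no offset:", exc)
```

If the grok capture stops before the offset, either the capture or the date pattern would need adjusting so that the two agree.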
When you have to build your own patterns, start from the left side, go slowly, and use the debugger.
If you test this pattern:
%{IP:clientip} - - \[
it works, but this one:
%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp} \]
doesn't. Comparing your pattern to the input shows that there is no space between the timestamp and the closing bracket.
Changing this part of the pattern to:
%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp}\]
works.
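The same left-to-right check can be reproduced outside the grok debugger. The sketch below uses Python's `re` module as a stand-in for grok, with a simplified IP expression in place of %{IP} and `.*` in place of %{GREEDYDATA}; the sample line is a shortened version of the log data in the question.

```python
import re

# Simplified stand-in for %{IP:clientip}; the real grok pattern is stricter
IP = r"(?P<clientip>\d{1,3}(?:\.\d{1,3}){3})"

line = '199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/ HTTP/1.1" 200'

# Original pattern: expects a space before the closing bracket
with_space = re.compile(IP + r" - - \[(?P<raw_timestamp>.*) \]")
# Corrected pattern: no space before the closing bracket
without_space = re.compile(IP + r" - - \[(?P<raw_timestamp>.*)\]")

print(with_space.search(line))  # None

match = without_space.search(line)
print(match.group("clientip"))       # 199.77.22.22
print(match.group("raw_timestamp"))  # 26/Feb/2017:10:18:45 +0800
```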
When I run Cobalt, I can see the user agent in the log:
[0101/000230:INFO:application.cc(690)] User Agent: Mozilla/5.0 (DirectFB; Linux x86_64) Cobalt/4.13031-qa (unlike Gecko) Starboard/1
So where does it come from? Is there a way to change it?
The default user agent is built in the following file:
https://cobalt.googlesource.com/cobalt/+/e9b4b99dab6e774b8b6e63add74c352cc5dd395a/src/cobalt/network/user_agent_string_factory.cc
std::string UserAgentStringFactory::CreateUserAgentString() {
  // Cobalt's user agent contains the following sections:
  //   Mozilla/5.0 (ChromiumStylePlatform)
  //   Cobalt/Version.BuildNumber-BuildConfiguration (unlike Gecko)
  //   Starboard/APIVersion,
  //   Device/FirmwareVersion (Brand, Model, ConnectionType)
  //   Mozilla/5.0 (ChromiumStylePlatform)
  std::string user_agent =
      base::StringPrintf("Mozilla/5.0 (%s)", CreatePlatformString().c_str());
  //   Cobalt/Version.BuildNumber-BuildConfiguration (unlike Gecko)
  base::StringAppendF(&user_agent, " Cobalt/%s.%s-%s (unlike Gecko)",
                      COBALT_VERSION, COBALT_BUILD_VERSION_NUMBER,
                      kBuildConfiguration);
  //   Starboard/APIVersion,
  if (!starboard_version_.empty()) {
    base::StringAppendF(&user_agent, " %s", starboard_version_.c_str());
  }
  //   Device/FirmwareVersion (Brand, Model, ConnectionType)
  if (youtube_tv_info_) {
    base::StringAppendF(
        &user_agent, ", %s_%s_%s/%s (%s, %s, %s)",
        youtube_tv_info_->network_operator.value_or("").c_str(),
        CreateDeviceTypeString().c_str(),
        youtube_tv_info_->chipset_model_number.value_or("").c_str(),
        youtube_tv_info_->firmware_version.value_or("").c_str(),
        youtube_tv_info_->brand.c_str(), youtube_tv_info_->model.c_str(),
        CreateConnectionTypeString().c_str());
  }
  return user_agent;
}
If your SbSystemGetDeviceType() is true for SystemDeviceTypeIsTv() (in file user_agent_string_factory_starboard.cc), you can customize the UA by implementing some fields of SbSystemGetProperty() + some SbSystemGet() functions.
This is a typical example:
Mozilla/5.0 (1) Cobalt/11.119147-gold (unlike Gecko) Starboard/8, 2_8_6/5 (3, 4, 7)
where the digits 1 to 8 in the example stand for:
1. kSbSystemPropertyPlatformName
2. kSbSystemPropertyNetworkOperatorName
3. kSbSystemPropertyManufacturerName
4. kSbSystemPropertyModelName
5. kSbSystemPropertyFirmwareVersion
6. kSbSystemPropertyChipsetModelNumber
7. SbSystemGetConnectionType()
8. SbSystemGetDeviceType()
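To see how the placeholder digits in the example line fall out of the format strings in CreateUserAgentString(), here is a Python re-creation of the same concatenation. It is only a sketch for checking the field order: `create_user_agent` and the dictionary keys are hypothetical names, and none of this is Cobalt code.

```python
def create_user_agent(platform, cobalt_version, build_number, build_config,
                      starboard_version="", tv_info=None):
    """Assemble a UA string in the same order as the C++ factory above."""
    # Mozilla/5.0 (ChromiumStylePlatform)
    ua = f"Mozilla/5.0 ({platform})"
    # Cobalt/Version.BuildNumber-BuildConfiguration (unlike Gecko)
    ua += f" Cobalt/{cobalt_version}.{build_number}-{build_config} (unlike Gecko)"
    # Starboard/APIVersion, appended only when set
    if starboard_version:
        ua += f" {starboard_version}"
    # Device/FirmwareVersion (Brand, Model, ConnectionType), TV devices only
    if tv_info:
        ua += (", {network_operator}_{device_type}_{chipset}/{firmware}"
               " ({brand}, {model}, {connection})").format(**tv_info)
    return ua


# Each value is the placeholder digit used in the example line
tv_info = {
    "network_operator": "2", "device_type": "8", "chipset": "6",
    "firmware": "5", "brand": "3", "model": "4", "connection": "7",
}
example = create_user_agent("1", "11", "119147", "gold", "Starboard/8", tv_info)
print(example)
# Mozilla/5.0 (1) Cobalt/11.119147-gold (unlike Gecko) Starboard/8, 2_8_6/5 (3, 4, 7)
```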