Combining log entries with logstash - elasticsearch

I want to collect and process logs from dnsmasq and I've decided to use ELK. Dnsmasq is used as a DHCP server and as a DNS resolver, and hence it creates log entries for both services.
My goal is to send to Elasticsearch all DNS queries with the requester IP, requester hostname (if available) and requester MAC address. That will allow me to group the requests per MAC address regardless of whether the device IP changed, and to display the hostname.
What I would like to do is the following:
1) Read the entries like:
Mar 30 21:55:34 dnsmasq-dhcp[346]: 3806132383 DHCPACK(eth0) 192.168.0.80 04:0c:ce:d1:af:18 air
2) Temporarily store the relationships:
192.168.0.80 => 04:0c:ce:d1:af:18
192.168.0.80 => air
3) Enrich entries like the one below by adding the MAC address and hostname. If the hostname is empty I would add the MAC address instead.
Mar 30 22:13:05 dnsmasq[346]: query[A] imap.gmail.com from 192.168.0.80
I found a plugin called "memorize" that would allow me to store them, but unfortunately it does not work with the latest version of Logstash.
The versions I'm using:
Elasticsearch 2.3.0
Kibana 4.4.2
Logstash 2.2.2
And the Logstash filter (this is my first attempt with Logstash, so I'm sure the configuration file can be improved):
input {
  file {
    path => "/var/log/dnsmasq.log"
    start_position => "beginning"
    type => "dnsmasq"
  }
}

filter {
  if [type] == "dnsmasq" {
    grok {
      match => [ "message", "%{SYSLOGTIMESTAMP:reqtimestamp} %{USER:program}\[%{NONNEGINT:pid}\]\: ?(%{NONNEGINT:num} )?%{NOTSPACE:action} %{IP:clientip} %{MAC:clientmac} ?(%{HOSTNAME:clientname})?"]
      match => [ "message", "%{SYSLOGTIMESTAMP:reqtimestamp} %{USER:program}\[%{NONNEGINT:pid}\]\: ?(%{NONNEGINT:num} )?%{USER:action}?(\[%{USER:subaction}\])? %{NOTSPACE:domain} %{NOTSPACE:function} %{IP:clientip}"]
      match => [ "message", "%{SYSLOGTIMESTAMP:reqtimestamp} %{USER:program}\[%{NONNEGINT:pid}\]\: %{NOTSPACE:action} %{DATA:data}"]
    }

    if [action] =~ "DHCPACK" {
    } else if [action] == "query" {
    } else {
      drop {}
    }
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
Questions:
1) Is there an alternative to the "memorize" plugin that works with the latest Logstash version? Either another plugin or a different procedure.
2) Shall I downgrade Logstash to a version before 2 (I think the previous one is 1.5.4)? If so, is there any known severe issue or incompatibility with Elasticsearch 2.2.1?
3) Or shall I modify the "memorize" plugin to allow Logstash 2.x (if so, I'll appreciate any pointer on how to start)?

There's no need to repack the memorize plugin for this in my opinion. You can use the aggregate filter to achieve what you want.
...
  # record host/mac in a temporary map
  if [action] =~ "DHCPACK" {
    aggregate {
      task_id => "%{clientip}"
      code => "map['clientmac'] = event['clientmac']; map['clientname'] = event['clientname'];"
      map_action => "create_or_update"
      # timeout set to 48h
      timeout => 172800
    }
  }
  # add host/mac where/when needed
  else if [action] == "query" {
    aggregate {
      task_id => "%{clientip}"
      code => "event['clientmac'] = map['clientmac']; event['clientname'] = map['clientname']"
      map_action => "update"
    }
  }
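Two notes on top of that: the aggregate filter keeps its map in memory, and its documentation recommends running with a single filter worker (-w 1 or pipeline.workers: 1), otherwise events may be processed out of order and miss the map. And for the requirement of falling back to the MAC address when no hostname is known, a minimal sketch (using the field names from the grok patterns above) could follow the second aggregate:
  # sketch: if no hostname was memorized for this client, fall back to the mac address
  if [action] == "query" and ![clientname] and [clientmac] {
    mutate {
      add_field => { "clientname" => "%{clientmac}" }
    }
  }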

So, to use memorize with Logstash > 2.0:
1) Clone the repository.
2) Open the file logstash-filter-memorize.gemspec.
3) Change s.add_runtime_dependency "logstash-core", '>= 1.4.0', '< 2.0.0' to s.add_runtime_dependency "logstash-core", '>= 1.4.0', '< 3.0.0'.
4) Build the plugin via: gem build logstash-filter-memorize.gemspec
5) Install it via: $ bin/logstash-plugin install /path/to/memorize/logstash-filter-memorize-0.9.1.gem
I tried it and it seems to work.

Related

Logstash content-based filtering into multiple indexes

I am currently pulling JSON log files from an S3 bucket. They contain different types of logs in a field called RawLog, along with another value, MessageSourceType (there are more metadata fields that I don't care about). Each line in a file is a separate log, in case that makes a difference.
I currently have these all going into one index, as seen in my config below; however, I ideally want to split them out into separate indexes. For example, if MessageSourceType = Syslog - Linux Host then I need Logstash to extract the RawLog as syslog and place it into an index called logs-syslog, whereas if MessageSourceType = MS Windows Event Logging XML I want it to extract the RawLog as XML and place it in an index called logs-MS_Event_logs.
filter {
  mutate {
    replace => [ "message", "%{message}" ]
  }
  json {
    source => "message"
    remove_field => "message"
  }
}

output {
  elasticsearch {
    hosts => ["http://xx.xx.xx.xx:xxxx","http://xx.xx.xx.xx:xxxx"]
    index => "logs-received"
  }
}
Also, for a bit of context, here is an example of one of the logs:
{"MsgClassTypeId":"3000","Direction":"0","ImpactedZoneEnum":"0","message":"<30>Feb 13 23:45:24 xx.xx.xx.xx Account=\"\" Action=\"\" Aggregate=\"False\" Amount=\"\" Archive=\"True\" BytesIn=\"\" BytesOut=\"\" CollectionSequence=\"825328\" Command=\"\" CommonEventId=\"3\" CommonEventName=\"General Operations\" CVE=\"\" DateInserted=\"2/13/2021 11:45:24 PM\" DInterface=\"\" DIP=\"\" Direction=\"0\" DirectionName=\"Unknown\" DMAC=\"\" DName=\"\" DNameParsed=\"\" DNameResolved=\"\" DNATIP=\"\" DNATPort=\"-1\" Domain=\"\" DomainOrigin=\"\" DPort=\"-1\" DropLog=\"False\" DropRaw=\"False\" Duration=\"\" EntityId=\"" EventClassification=\"-1\" EventCommonEventID=\"-1\" FalseAlarmRating=\"0\" Forward=\"False\" ForwardToLogMart=\"False\" GLPRAssignedRBP=\"-1\" Group=\"\" HasBeenInserted_EMDB=\"False\" HasBeenQueued_Archiving=\"True\" HasBeenQueued_EventProcessor=\"False\" HasBeenQueued_LogProcessor=\"True\" Hash=\"\" HostID=\"44\" IgnoreGlobalRBPCriteria=\"False\" ImpactedEntityId=\"0\" ImpactedEntityName=\"\" ImpactedHostId=\"-1\" ImpactedHostName=\"\" ImpactedLocationKey=\"\" ImpactedLocationName=\"\" ImpactedNetworkId=\"-1\" ImpactedNetworkName=\"\" ImpactedZoneEnum=\"0\" ImpactedZoneName=\"\" IsDNameParsedValue=\"True\" IsRemote=\"True\" IsSNameParsedValue=\"True\" ItemsIn=\"\" ItemsOut=\"\" LDSVERSION=\"1.1\" Login=\"\" LogMartMode=\"13627389\" LogSourceId=\"158\" LogSourceName=\"ip-xx-xx-xx-xx.eu-west-2.computer.internal Linux Syslog\" MediatorMsgID=\"0\" MediatorSessionID=\"1640\" MsgClassId=\"3999\" MsgClassName=\"Other Operations\" MsgClassTypeId=\"3000\" MsgClassTypeName=\"Operations\" MsgCount=\"1\" MsgDate=\"2021-02-13T23:45:24.0000000+00:00\" MsgDateOrigin=\"0\" MsgSourceHostID=\"44\" MsgSourceTypeId=\"88\" MsgSourceTypeName=\"Syslog - Linux Host\" NormalMsgDate=\"2021-02-13T23:45:24.0540000Z\" Object=\"\" ObjectName=\"\" ObjectType=\"\" OriginEntityId=\"0\" OriginEntityName=\"\" OriginHostId=\"-1\" OriginHostName=\"\" OriginLocationKey=\"\" OriginLocationName=\"\" OriginNetworkId=\"-1\" OriginNetworkName=\"\" OriginZoneEnum=\"0\" OriginZoneName=\"\" ParentProcessId=\"\" ParentProcessName=\"\" ParentProcessPath=\"\" PID=\"-1\" Policy=\"\" Priority=\"4\" Process=\"\" ProtocolId=\"-1\" ProtocolName=\"\" Quantity=\"\" Rate=\"\" Reason=\"\" Recipient=\"\" RecipientIdentity=\"\" RecipientIdentityCompany=\"\" RecipientIdentityDepartment=\"\" RecipientIdentityDomain=\"\" RecipientIdentityID=\"-1\" RecipientIdentityTitle=\"\" ResolvedImpactedName=\"\" ResolvedOriginName=\"\" ResponseCode=\"\" Result=\"\" RiskRating=\"0\" RootEntityId=\"9\" Sender=\"\" SenderIdentity=\"\" SenderIdentityCompany=\"\" SenderIdentityDepartment=\"\" SenderIdentityDomain=\"\" SenderIdentityID=\"-1\" SenderIdentityTitle=\"\" SerialNumber=\"\" ServiceId=\"-1\" ServiceName=\"\" Session=\"\" SessionType=\"\" Severity=\"\" SInterface=\"\" SIP=\"\" Size=\"\" SMAC=\"\" SName=\"\" SNameParsed=\"\" SNameResolved=\"\" SNATIP=\"\" SNATPort=\"-1\" SPort=\"-1\" Status=\"\" Subject=\"\" SystemMonitorID=\"9\" ThreatId=\"\" ThreatName=\"\" UniqueID=\"7d4c4ed3-a2fc-44bc-a7ec-0b8b68e7f456\" URL=\"\" UserAgent=\"\" UserImpactedIdentity=\"\" UserImpactedIdentityCompany=\"\" UserImpactedIdentityDomain=\"\" UserImpactedIdentityID=\"-1\" UserImpactedIdentityTitle=\"\" UserOriginIdentity=\"\" UserOriginIdentityCompany=\"\" UserOriginIdentityDepartment=\"\" UserOriginIdentityDomain=\"\" UserOriginIdentityID=\"-1\" UserOriginIdentityTitle=\"\" VendorInfo=\"\" VendorMsgID=\"\" Version=\"\" RawLog=\"02 13 2021 23:45:24 xx.xx.xx.xx <SYSD:INFO> Feb 
13 23:45:24 euw2-ec2--001 metricbeat[3031]: 2021-02-13T23:45:24.264Z#011ERROR#011[logstash.node_stats]#011node_stats/node_stats.go:73#011error making http request: Get \\\"https://xx.xx.xx.xx:9600/\\\": dial tcp xx.xx.xx.xx:9600: connect: connection refused\"","CollectionSequence":"825328","NormalMsgDate":"2021-02-13T23:45:24.0540000Z"}
I am a little unsure of the best way to achieve this and thought you guys might have some suggestions. I have looked into grok and think this may achieve my objective however I'm unsure where to start.
You can do this with conditionals in your filter section and define the target index according to the type of logs you're parsing.
filter {
  ... other filters ...
  if [MsgSourceTypeName] == "Syslog - Linux Host" {
    mutate {
      add_field => {
        "[@metadata][target_index]" => "logs-syslog"
      }
    }
  }
  else if [MsgSourceTypeName] == "MS Windows Event Logging XML" {
    mutate {
      add_field => {
        "[@metadata][target_index]" => "logs-ms_event_log"
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://xx.xx.xx.xx:xxxx","http://xx.xx.xx.xx:xxxx"]
    index => "%{[@metadata][target_index]}"
  }
}
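One thing worth checking: in the sample event, MsgSourceTypeName appears as a key="value" pair inside the message text rather than as a top-level JSON field, so it may need to be extracted before these conditionals can match. A minimal sketch using the kv filter (assuming the default whitespace splitting and double-quoted value handling; the source field name depends on what the json filter leaves behind):
  # sketch: extract MsgSourceTypeName from the key="value" text so the
  # conditionals above have something to match on
  kv {
    source => "message"
    value_split => "="
    include_keys => [ "MsgSourceTypeName", "RawLog" ]
  }
It may also be worth giving [@metadata][target_index] a default value (for example logs-received) in a final else branch, so events matching neither condition still end up in an index.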

ELK - How to use different sources in logstash

I have an ELK installation running so far that I want to use to analyse log files from different sources:
nginx-logs
auth-logs
and so on...
I am using Filebeat to collect content from log files and send it to Logstash with this filebeat.yml:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
    - /var/nginx/example_com/logs/

output.logstash:
  hosts: ["localhost:5044"]
In Logstash I already configured a grok section, but only for nginx logs. This was the only working tutorial I found. So this config receives content from Filebeat, filters it (that's what grok is for?) and sends it to Elasticsearch.
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    patterns_dir => "/etc/logstash/patterns"
    match => { "message" => "%{NGINXACCESS}" }
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
That's the content of the one nginx-pattern file I am referencing:
NGUSERNAME [a-zA-Z\.\#\-\+_%]+
NGUSER %{NGUSERNAME}
NGINXACCESS %{IPORHOST:clientip} (?:-|(%{WORD}.%{WORD})) %{USER:ident} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{QS:forwarder}
But I have trouble understanding how to manage different log data sources, because right now Kibana only displays log content from /var/log and there is no log data from my nginx folder.
What am I doing wrong here?
Since you are running Filebeat, you already have a module available that processes nginx logs: the Filebeat nginx module.
This way you will not need Logstash to process the logs, and you only have to point the output directly to Elasticsearch.
But since you are processing multiple paths with different logs, and because Filebeat does not allow more than one output at a time (Logstash + Elasticsearch), you can set Logstash to only process the logs that do not come from nginx. This way, using the module (which comes with sample dashboards), your logs will flow:
Filebeat -> Logstash (from input plugin to output plugin, without any filtering) -> Elasticsearch
If you really want to process the logs on your own, you are on a good path to finish. But right now all your logs are being processed by the grok pattern, so maybe the problem is with your pattern, which treats nginx and non-nginx logs the same way. You can filter the logs in the filter section with something like this:
# if you are using the module
filter {
  if [fileset][module] == "nginx" {
  }
}
If not, please check the different examples available in the Logstash docs.
Another thing you can try is to add this to your filter. This way, if the grok fails, you will still see the log in Kibana, but with the "_grok_parse_error_nginx_error" failure tag.
grok {
  patterns_dir => "/etc/logstash/patterns"
  match => { "message" => "%{NGINXACCESS}" }
  tag_on_failure => [ "_grok_parse_error_nginx_error" ]
}
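If you prefer to keep parsing in Logstash rather than using the module, another option is to apply the nginx pattern only to events read from the nginx log directory. A sketch, assuming a recent Filebeat that puts the file path in [log][file][path] (older versions use [source]):
filter {
  # only run the nginx pattern on events coming from the nginx log directory
  if [log][file][path] =~ /\/var\/nginx\/example_com\/logs\// {
    grok {
      patterns_dir => "/etc/logstash/patterns"
      match => { "message" => "%{NGINXACCESS}" }
    }
  }
}
Also note that the nginx entry in filebeat.yml points at a bare directory; Filebeat's paths expect file globs, so something like /var/nginx/example_com/logs/*.log may be needed before any nginx data shows up in Kibana at all.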

Logstash Elastic Cloud 401 Unauthorized error

Official logstash elastic cloud module
Official doc for getting started
My logstash.yml looks like:
cloud.id: "Test:testkey"
cloud.auth: "elastic:password"
With two spaces of indentation at the front, no space at the end, and the values inside double quotes.
This is all I have in logstash.yml, nothing else.
And I am getting:
[2018-08-29T12:33:52,112][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"https://myserverurl:12345/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :error=>"Got response code '401' contacting Elasticsearch at URL 'https://myserverurl:12345/'"}
And the my_config_file_name.conf looks like:
input{jdbc{...jdbc here... This works, as I see data in windows console}}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => ["myserverurl:12345"]
    index => "my_index"
    # document_id => "%{brand}"
  }
}
What I am doing is running bin/logstash from the Windows cmd.
It loads data from the database that I have configured in the input section of the conf file and then shows me the error above. I want to index my data from MySQL to Elasticsearch on the cloud. I took the 14-day trial and created a test index for learning purposes, as I will later have to deploy it.
My Pipeline looks like:
- pipeline.id: my_id
  path.config: "./config/conf_file_name.conf"
  pipeline.workers: 1
If the logs won't include sensitive data, I can also provide them.
Basically I want to sync (on a scheduled check) my MySQL data with Elasticsearch on the cloud, i.e. AWS.
The output should be:
elasticsearch {
  hosts => ["https://yourhost:yourport/"]
  user => "elastic"
  password => "password"
  # protocol => https
  # port => "yourport"
  index => "test_index"
  # document_id => "%{table_id}"
}
# lines starting with # are comments
as stated in the Configuring Logstash with Elastic Cloud docs.
The documentation provided while deploying the app does not cover the jdbc config; jdbc also needs a user and password, even if they are defined in the settings file, i.e. logstash.yml.
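For completeness, a minimal sketch of the jdbc side (the connection string, driver path and query below are placeholders, not taken from the question):
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"   # placeholder
    jdbc_user => "db_user"                                         # placeholder
    jdbc_password => "db_password"                                 # placeholder
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"     # placeholder
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * FROM my_table"                          # placeholder
    schedule => "* * * * *"   # run every minute to keep the index in sync
  }
}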
Also, if you created your API key in the web UI you will not be able to get the values needed to configure Logstash. You must use the Dev Tools console found at /app/dev_tools#/console with something like this:
POST /_security/api_key
{
  "name": "logstash"
}
the output of which is something like:
{
  "id": "<id value>",
  "name": "logstash",
  "api_key": "<api key>",
  "encoded": "<encoded api key>"
}
And in your logstash pipeline config you use the values like this:
output {
  elasticsearch {
    cloud_id => "<cloud id>"
    api_key => "<id value>:<api key>"
    data_stream => true
    ssl => true
  }
  stdout { codec => rubydebug }
}
Note the combined "api_key" value separated by ":". Also, you can find the "cloud id" under your "Deployments" menu option.
I had the same issue in my dev environment. After scouring Google for hours, I understood that, by default, X-Pack is installed when you install Logstash. In the doc https://www.elastic.co/guide/en/logstash/current/setup-xpack.html it is stated that:
X-Pack is an Elastic Stack extension that provides security, alerting, monitoring, machine learning, pipeline management, and many other capabilities
As I don't need X-Pack to run in my dev environment while streaming to Elasticsearch, I had to disable it by setting ilm_enabled to false in the output section of my indexing configuration file.
output {
  elasticsearch {
    hosts => [ .. ]
    ilm_enabled => false
  }
}
The link below may help:
https://discuss.opendistrocommunity.dev/t/logstash-oss-with-non-removable-x-pack/655/3

Logstash for Vagrant: Address already in use

I have a Vagrant image in which there is an application; inside the Vagrant image it is reachable on port 2401, and depending on the service you want you call a specific address (e.g. "curl -X GET http://127.0.0.1:2401/provider/ipfix"). To retrieve the output outside the Vagrant machine I have set up port forwarding in the Vagrantfile ("config.vm.network :forwarded_port, guest: 2401, host: 8080"), so running "curl -X GET http://127.0.0.1:8080/provider/ipfix" from the host gives me the same output.
I am now at the phase of installing Logstash. My issue is that when I run Logstash with the config file below I get the error "Address already in use". I also tried to use fields to guide it to the specific output. What workaround would you suggest?
input {
  tcp {
    host => localhost
    port => 8080
    add_field => {
      "field1" => "provider"
      "field2" => "ipfix"
    }
    codec => netflow {
      versions => [10]
      target => ipfix
    }
    type => ipfix
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => "IPFIX-logstash-%{+YYYY.MM.dd}"
  }
}
If I'm reading this right, you're expecting Logstash to use TCP to connect to localhost:8080 to fetch information that it will then process.
That's not what this input does. This creates a listener on 127.0.0.1:8080, so the error message about 'already in use' is quite correct.
Considering you're using curl as an example of fetching this data, I suggest the http_poller plugin is better for what you want.
input {
  http_poller {
    urls => {
      IPFIX => "http://127.0.0.1:8080/provider/ipfix"
    }
    request_timeout => 30
    schedule => { "every" => "5s" }
    tags => [ 'ipfix' ]
  }
}
This will hit the known-working curl URL every 5 seconds with a GET request.
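As a side note (not part of the original answer): Elasticsearch index names must be lowercase, so the IPFIX-logstash-%{+YYYY.MM.dd} index in the question's output would be rejected. A small sketch of the adjusted output:
output {
  stdout { codec => rubydebug }
  elasticsearch {
    # index names must be lowercase in Elasticsearch
    index => "ipfix-logstash-%{+YYYY.MM.dd}"
  }
}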

Logstash not writing output to elasticsearch

The code below is my Logstash conf file. I provide my nginx access log file as input and output to Elasticsearch. I also write the output to a text file, which works fine, but the output is never written to Elasticsearch.
input {
  file {
    path => "filepath"
    start_position => "beginning"
  }
}

output {
  file {
    path => "filepath"
  }
  elasticsearch {
    host => localhost
    port => "9200"
  }
}
I also tried executing the Logstash binary from the command line using the -e option:
input { stdin { } } output { elasticsearch { host => localhost } }
which works fine; I get the output written to Elasticsearch. But in the former case I don't. Help me solve this.
I tried a few things, and I have no idea why your case with just host works; if I try it, I get timeouts. This is the configuration that works for me:
elasticsearch {
  protocol => "http"
  host => "localhost"
  port => "9200"
}
I tried this with Logstash 1.4.2 and Elasticsearch 1.4.4.
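For reference, combining that with the question's config, the output section would look roughly like this (a sketch; the "filepath" placeholder is kept from the question):
output {
  file {
    path => "filepath"
  }
  elasticsearch {
    protocol => "http"
    host => "localhost"
    port => "9200"
  }
}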

Resources