How to use an ingest pipeline with the Logstash Elasticsearch output update feature - elasticsearch

I am using the Logstash Elasticsearch output to publish data to Elasticsearch. Two records (a request and a response) are merged to create a single record. This code is working with no issues:
elasticsearch {
  hosts => [ "localhost:9200" ]
  index => "transactions"
  action => "update"
  doc_as_upsert => true
  document_id => "%{tid}"
  script => '
    if (ctx._source.transaction == "request") {
      ctx._source.status = params.event.get("status");
    } else if (ctx._source.transaction == "response") {
      ctx._source.api = params.event.get("api");
    }
  '
}
Now I am trying to add a new field on top of the above record update, using an ingest pipeline.
PUT _ingest/pipeline/ingest_pipe2
{
  "description" : "describe pipeline",
  "processors" : [
    {
      "set" : {
        "field": "api-test",
        "value": "new"
      }
    }
  ]
}
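(As a side note, a pipeline like this can be checked in isolation with the simulate API before it is referenced from Logstash; the sample document below is made up for illustration:)
POST _ingest/pipeline/ingest_pipe2/_simulate
{
  "docs": [
    { "_source": { "tid": "1", "transaction": "request", "status": "200" } }
  ]
}
The response should show the document's _source with "api-test": "new" added by the set processor.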
This pipeline adds a new field to the incoming event. It works fine with the following output configuration.
elasticsearch {
  hosts => [ "localhost:9200" ]
  index => "transactions"
  pipeline => "ingest_pipe2"
}
The problem is that the Logstash scripted update and the ingest pipeline don't work together:
elasticsearch {
  hosts => [ "localhost:9200" ]
  index => "transactions"
  pipeline => "ingest_pipe2"
  action => "update"
  doc_as_upsert => true
  document_id => "%{tid}"
  script => '
    if (ctx._source.transaction == "request") {
      ctx._source.status = params.event.get("status");
    } else if (ctx._source.transaction == "response") {
      ctx._source.api = params.event.get("api");
    }
  '
}

It is not possible to use an ingest pipeline together with doc_as_upsert: ingest pipelines only run when documents are indexed, and the update API used for scripted updates and upserts does not apply them, so this combination is not supported.
You can find more info here and here
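One possible workaround (a sketch only, not taken from the linked discussions): since the event is still fully available inside Logstash before the output stage, the field that ingest_pipe2 sets can be added with a mutate filter instead, leaving the scripted update/upsert untouched. The field names mirror the ones used above.
filter {
  # Does what the "set" processor in ingest_pipe2 does, but inside Logstash,
  # so no ingest pipeline is needed on the update/upsert path.
  mutate {
    add_field => { "api-test" => "new" }
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "transactions"
    action => "update"
    doc_as_upsert => true
    document_id => "%{tid}"
    script => '
      if (ctx._source.transaction == "request") {
        ctx._source.status = params.event.get("status");
      } else if (ctx._source.transaction == "response") {
        ctx._source.api = params.event.get("api");
      }
    '
  }
}
Depending on how the output combines the script with the upsert document, the script may also need to copy api-test explicitly when the document already exists.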

Related

How to filter data with Logstash before storing parsed data in Elasticsearch

I understand that Logstash is for aggregating and processing logs. I have NGINX logs and have my Logstash config set up as:
filter {
  grok {
    match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"]
    overwrite => [ "message" ]
  }
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
  useragent {
    source => "agent"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "weblogs-%{+YYYY.MM}"
    document_type => "nginx_logs"
  }
  stdout { codec => rubydebug }
}
This would parse the unstructured logs into a structured form of data, and store the data into monthly indexes.
What I discovered is that the majority of the logs were contributed by robots/web-crawlers. In Python I would filter them out with:
browser_names = browser_names[~browser_names.str.\
match('^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]*$', na=False)]
However, I would like to filter them out with Logstash so I can save a lot of disk space on the Elasticsearch server. Is there a way to do that? Thanks in advance!
Thanks LeBigCat for generously giving a hint. I solved this problem by adding the following under the filter:
if [browser_names] =~ /(?i)^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]*$/ {
  drop {}
}
The (?i) flag is for case-insensitive matching.
In your filter you can use the drop filter (https://www.elastic.co/guide/en/logstash/current/plugins-filters-drop.html). As you already have your pattern, it should be pretty fast ;)

Filebeat -> Logstash indexing documents twice

I have Nginx logs being sent from Filebeat to Logstash, which is indexing them into Elasticsearch.
Every entry gets indexed twice: once with the correct grok filter applied, and again with no fields found except for the "message" field.
This is the logstash configuration.
02-beats-input.conf
input {
  beats {
    port => 5044
    ssl => false
  }
}
11-nginx-filter.conf
filter {
  if [type] == "nginx-access" {
    grok {
      patterns_dir => ['/etc/logstash/patterns']
      match => { "message" => "%{NGINXACCESS}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z", "d/MMM/YYYY:HH:mm:ss Z" ]
    }
  }
}
Nginx Patterns
NGUSERNAME [a-zA-Z\.\#\-\+_%]+
NGUSER %{NGUSERNAME}
NGINXACCESS %{IPORHOST:clientip}\s+%{NGUSER:ident}\s+%{NGUSER:auth}\s+\[%{HTTPDATE:timestamp}\]\s+\"%{WORD:verb}\s+%{URIPATHPARAM:request}\s+HTTP/%{NUMBER:httpversion}\"\s+%{NUMBER:response}\s+(?:%{NUMBER:bytes}|-)\s+(?:\"(?:%{URI:referrer}|-)\"|%{QS:referrer})\s+%{QS:agent}
30-elasticsearch-output.conf
output {
  elasticsearch {
    hosts => ["elastic00:9200", "elastic01:9200", "elastic02:9200"]
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
Check your Filebeat configuration!
During setup I had accidentally uncommented and configured the output.elasticsearch section of filebeat.yml.
I then also configured the output.logstash section of the configuration but forgot to comment the elasticsearch output section back out.
This caused one copy of each entry to be sent to Logstash, where it was grok'd, and another to be sent directly to Elasticsearch.
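For reference, the relevant part of filebeat.yml should end up looking roughly like this (a sketch; the host names are placeholders), with only the Logstash output left enabled:
# Leave the Elasticsearch output commented out...
#output.elasticsearch:
#  hosts: ["elastic00:9200"]

# ...and send everything through Logstash instead.
output.logstash:
  hosts: ["logstash-host:5044"]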

Elasticsearch Logstash Filebeat mapping

I'm having a problem with the ELK Stack + Filebeat.
Filebeat is sending Apache-like logs to Logstash, which should be parsing the lines. Elasticsearch should be storing the split data in fields so I can visualize them using Kibana.
Problem:
Elasticsearch receives the logs but stores them in a single "message" field.
Desired solution:
Input:
10.0.0.1 some.hostname.at - [27/Jun/2017:23:59:59 +0200]
ES:
"ip":"10.0.0.1"
"hostname":"some.hostname.at"
"timestamp":"27/Jun/2017:23:59:59 +0200"
My logstash configuration:
input {
  beats {
    port => 5044
  }
}
filter {
  if [type] == "web-apache" {
    grok {
      patterns_dir => ["./patterns"]
      match => { "message" => "IP: %{IPV4:client_ip}, Hostname: %{HOSTNAME:hostname}, - \[timestamp: %{HTTPDATE:timestamp}\]" }
      break_on_match => false
      remove_field => [ "message" ]
    }
    date {
      locale => "en"
      timezone => "Europe/Vienna"
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    useragent {
      source => "agent"
      prefix => "browser_"
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test1"
    document_type => "accessAPI"
  }
}
My Elasticsearch discover output:
I hope there are some ELK experts around who can help me.
Thank you in advance,
Matthias
The grok filter you stated will not work here.
Try using:
%{IPV4:client_ip} %{HOSTNAME:hostname} - \[%{HTTPDATE:timestamp}\]
There is no need to specify the desired names separately in front of the field names (you're not trying to format the message here, but to extract separate fields); just stating the field name after the ':' will lead to the result you want.
Also, use the overwrite option instead of remove_field for message.
More information here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-options
It will look similar to this in the end:
filter {
  grok {
    match => { "message" => "%{IPV4:client_ip} %{HOSTNAME:hostname} - \[%{HTTPDATE:timestamp}\]" }
    overwrite => [ "message" ]
  }
}
You can test grok filters here:
http://grokconstructor.appspot.com/do/match

Dynamic Index in ElasticSearch from Logstash

I have following configuration in logstash whereby I am able to create dynamic "document_type" into ES based on input JSON received:
elasticsearch {
  hosts => ["localhost:9200"]
  index => "queuelogs"
  document_type => "%{action}"
}
Here, "action" is the parameter that I receive in JSON and different document_type gets created as per different action received.
Now I want this to be done same for Index creation, such as following:
elasticsearch {
  hosts => ["localhost:9200"]
  index => "%{logtype}"
  document_type => "%{action}"
}
Here, "logtype" is the parameter that I receive in JSON.
But somehow in ES, it creates index as "%{logtype}" only, not as per actual logtype value .
The input JSON is as following:
{
  "action": "UPLOAD",
  "user": "123",
  "timestamp": "2016 Jun 14 12:00:12",
  "data": {
    "file_id": "2345",
    "file_name": "xyz.pdf"
  },
  "header": {
    "proj_id": "P123",
    "logtype": "httplogs"
  },
  "comments": "Check comments"
}
Here, I tried to generate the index in the following ways:
index => "%{logtype}"
index => "%{header.logtype}"
But in both the cases, Logstash does not replace the actual value of logtype from JSON.
You need to specify it like this:
elasticsearch {
  hosts => ["localhost:9200"]
  index => "%{[header][logtype]}"
  document_type => "%{action}"
}
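The %{[header][logtype]} sprintf reference resolves the nested field, so the sample event above lands in an index named httplogs. If some events might arrive without that field, a guard in the filter section keeps them out of a literal "%{[header][logtype]}" index (a sketch; the fallback name is illustrative):
filter {
  # Give events without header.logtype a default index name.
  if ![header][logtype] {
    mutate {
      add_field => { "[header][logtype]" => "defaultlogs" }
    }
  }
}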

How to move data from one Elasticsearch index to another using the Bulk API

I am new to Elasticsearch. How to move data from one Elasticsearch index to another using the Bulk API?
I'd suggest using Logstash for this, i.e. you use one elasticsearch input plugin to retrieve the data from your index and another elasticsearch output plugin to push the data to your other index.
The Logstash config file would look like this:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "source_index"            <--- the name of your source index
  }
}
filter {
  mutate {
    remove_field => [ "@version", "@timestamp" ]
  }
}
output {
  elasticsearch {
    host => "localhost"
    port => 9200
    protocol => "http"
    manage_template => false
    index => "target_index"            <---- the name of your target index
    document_type => "your_doc_type"   <---- make sure to set the appropriate type
    document_id => "%{id}"
    workers => 5
  }
}
After installing Logstash, you can run it like this:
bin/logstash -f logstash.conf
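Since the question mentions the Bulk API: on newer Elasticsearch versions the same copy can also be done server-side with the Reindex API, which uses bulk requests internally (a minimal sketch; the index names are placeholders):
POST _reindex
{
  "source": { "index": "source_index" },
  "dest": { "index": "target_index" }
}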
