How do I exclude the "fingerprint" field from elasticsearch - elasticsearch

I'm using the fingerprint filter in Logstash to create a fingerprint field that I set to document_id in the elasticsearch output.
Configuration is as follows:
filter {
fingerprint {
method => "SHA1"
key => "KEY"
}
}
output {
elasticsearch {
host => localhost
document_id => "%{fingerprint}"
}
}
This results in a redundant fingerprint field in Elasticsearch that's the same value as _id. How do I prevent this redundant field from being saved to ES?

If you're using logstash 1.5 or higher, you can put your field in metadata and then it will not be sent to elasticsearch as part of the regular message.
Example:
filter {
fingerprint {
...
target => "[#metadata][fingerprint]"
}
}
output {
elasticsearch {
...
document_id => "%{[#metadata][fingerprint]}"
}
}

You could set the document_id as the fingerprint target
filter {
fingerprint {
method => "SHA1"
key => "KEY"
target => "document_id"
}
}

Related

Read a CSV in Logstash level and filter on basis of the extracted data

I am using Metricbeat to get process-level data and push it to Elastic Search using Logstash.
Now, the aim is to categorize the processes into 2 tags i.e the process running is either a browser or it is something else.
I am able to do that statically using this block of code :
input {
beats {
port => 5044
}
}
filter{
if [process][name]=="firefox.exe" or [process][name]=="chrome.exe" {
mutate {
add_field => { "process.type" => "browsers" }
convert => {
"process.type" => "string"
}
}
}
else {
mutate {
add_field => { "process.type" => "other" }
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
# manage_template => false
index => "metricbeatlogstash"
}
}
But when I try to make that if condition dynamic by reading the process list from a CSV, I am not getting any valid results in Kibana, nor a error on my LogStash level.
The CSV config file code is as follows :
input {
beats {
port => 5044
}
file{
path=>"filePath"
start_position=>"beginning"
sincedb_path=>"NULL"
}
}
filter{
csv{
separator=>","
columns=>["processList","IT"]
}
if [process][name] in [processList] {
mutate {
add_field => { "process.type" => "browsers" }
convert => {
"process.type" => "string"
}
}
}
else {
mutate {
add_field => { "process.type" => "other" }
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
# manage_template => false
index => "metricbeatlogstash2"
}
}
What you are trying to do does not work that way in logstash, the events in a logstash pipeline are independent from each other.
The events received by your beats input have no knowledge about the events received by your csv input, so you can't use fields from different events in a conditional.
To do what you want you can use the translate filter with the following config.
translate {
field => "[process][name]"
destination => "[process][type]"
dictionary_path => "process.csv"
fallback => "others"
refresh_interval => 300
}
This filter will check the value of the field [process][name] against a dictionary, loaded into memory from the file process.csv, the dictionary is a .csv file with two columns, the first is the name of the browser process and the second is always browser.
chrome.exe,browser
firefox.exe,browser
If the filter got a match, it will populate the field [process][type] (not process.type) with the value from the second column, in this case, always browser, if there is no match, it will populate the field [process][type] with the value of the fallback config, in this case, others, it will also reload the content of the process.csv file every 300 seconds (5 minutes)

Elasticsearch change field type from filter dissect

I use logstash-logback-encoder to send java log files to logstash, and then to elasticsearch. To parse the message in java log, I use following filter to dissect message
input {
file {
path => "/Users/MacBook-201965/Work/java/logs/oauth-logstash.log"
start_position => "beginning"
codec => "json"
}
}
filter {
if "EXECUTION_TIME" in [tags] {
dissect {
mapping => {
"message" => "%{endpoint} timeMillis:[%{execution_time_millis}] data:%{additional_data}"
}
}
mutate {
convert => { "execution_time_millis" => "integer" }
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "elk-%{+YYYY}"
document_type => "log"
}
stdout {
codec => json
}
}
It dissect the message so I can get value of execution_time_millis. However the data type is string. I created the index using Kibana index pattern. How can I change the data type of execution_time_millis into long?
Here is the sample json message from logback
{
"message":"/tests/{id} timeMillis:[142] data:2282||0:0:0:0:0:0:0:1",
"logger_name":"com.timpamungkas.oauth.client.controller.ElkController",
"level_value":20000,
"endpoint":"/tests/{id}",
"execution_time_millis":"142",
"#version":1,
"host":"macbook201965s-MacBook-Air.local",
"thread_name":"http-nio-8080-exec-7",
"path":"/Users/MacBook-201965/Work/java/logs/oauth-logstash.log",
"#timestamp":"2018-01-04T11:20:20.100Z",
"level":"INFO",
"tags":[
"EXECUTION_TIME"
],
"additional_data":"2282||0:0:0:0:0:0:0:1"
}{
"message":"/tests/{id} timeMillis:[110] data:2280||0:0:0:0:0:0:0:1",
"logger_name":"com.timpamungkas.oauth.client.controller.ElkController",
"level_value":20000,
"endpoint":"/tests/{id}",
"execution_time_millis":"110",
"#version":1,
"host":"macbook201965s-MacBook-Air.local",
"thread_name":"http-nio-8080-exec-5",
"path":"/Users/MacBook-201965/Work/java/logs/oauth-logstash.log",
"#timestamp":"2018-01-04T11:20:19.780Z",
"level":"INFO",
"tags":[
"EXECUTION_TIME"
],
"additional_data":"2280||0:0:0:0:0:0:0:1"
}
Thank you
If you have already indexed the documents, you'll have to reindex the data after changing the datatype of any field.
However, you can use something like this to change the type of millis from string to integer. (long is not supported in this)
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-convert
Also, try defining elasticsearch template before creating index if are going to add multiple index with index names having some regex pattern.Else, you can define your index format beforehand too an then start indexing.

How to split a large json file input into different elastic search index?

The input to logstash is
input {
file {
path => "/tmp/very-large.json"
type => "json"
start_position => "beginning"
sincedb_path => "/dev/null"
}
and sample json file
{"type":"type1", "msg":"..."}
{"type":"type2", "msg":"..."}
{"type":"type1", "msg":"..."}
{"type":"type3", "msg":"..."}
Is it possible to make them feed into different elastic search index, so I can process them easier in the future?
I know if it is possible to assign them with a tag, then I can do something like
if "type1" in [tags] {
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "logstash-type1%{+YYYY.MM.dd}"
flush_size => 50
}
}
How to do similar thing by looking at a specific json field value, e.g. type in my above example?
Even simpler, just use the type field to build the index name like this:
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "logstash-%{type}%{+YYYY.MM.dd}"
flush_size => 50
}
You can compare on any fields. You'll have to first parse your json with the json filter or codec.
Then you'll have a type field to work on, like this:
if [type] == "type1" {
elasticsearch {
...
index => "logstash-type1%{+YYYY.MM.dd}"
}
} else if [type] == "type2" {
elasticsearch {
...
index => "logstash-type2%{+YYYY.MM.dd}"
}
} ...
Or like in Val's answer:
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "logstash-%{type}%{+YYYY.MM.dd}"
flush_size => 50
}

How to move data from one Elasticsearch index to another using the Bulk API

I am new to Elasticsearch. How to move data from one Elasticsearch index to another using the Bulk API?
I'd suggest using Logstash for this, i.e. you use one elasticsearch input plugin to retrieve the data from your index and another elasticsearch output plugin to push the data to your other index.
The config logstash config file would look like this:
input {
elasticsearch {
hosts => "localhost:9200"
index => "source_index" <--- the name of your source index
}
}
filter {
mutate {
remove_field => [ "#version", "#timestamp" ]
}
}
output {
elasticsearch {
host => "localhost"
port => 9200
protocol => "http"
manage_template => false
index => "target_index" <---- the name of your target index
document_type => "your_doc_type" <---- make sure to set the appropriate type
document_id => "%{id}"
workers => 5
}
}
After installing Logstash, you can run it like this:
bin/logstash -f logstash.conf

logstash output to elasticsearch with document_id; what to do when I don't have a document_id?

I have some logstash input where I use the document_id to remove duplicates. However, most input doesn't have a document_id. The following plumbs the actual document_id through, but if it doesn't exist, it gets accepted as literally %{document_id}, which means most documents are seen as a duplicate of each other. Here's what my output block looks like:
output {
elasticsearch_http {
host => "127.0.0.1"
document_id => "%{document_id}"
}
}
I thought I might be able to use a conditional in the output. It fails, and the error is given below the code.
output {
elasticsearch_http {
host => "127.0.0.1"
if document_id {
document_id => "%{document_id}"
}
}
}
Error: Expected one of #, => at line 101, column 8 (byte 3103) after output {
elasticsearch_http {
host => "127.0.0.1"
if
I tried a few "if" statements and they all fail, which is why I assume the problem is having a conditional of any sort in that block. Here are the alternatives I tried:
if document_id <> "" {
if [document_id] <> "" {
if [document_id] {
if "hello" <> "" {
You're close with the conditional idea but you can't place it inside a plugin block. Do this instead:
output {
if [document_id] {
elasticsearch_http {
host => "127.0.0.1"
document_id => "%{document_id}"
}
} else {
elasticsearch_http {
host => "127.0.0.1"
}
}
}
(But the suggestion in one of the other answers to use the uuid filter is good too.)
One way to solve this is to make sure a document_idis always available. You can achieve this by adding a UUID filter in the filter section that would create the document_id field if it is not present.
filter {
if "" in [document_id] {
uuid {
target => "document_id"
}
}
}
Edited per Magnus Bäck's suggestion. Thanks!
Reference : docinfo_fields
For any document added in elasticsearch, the _id is auto-generated if not specified during insert. We can use this same _id later to update/delete/search queries by using docinfo_fields feature.
Example :
filter {
json {
source => "message"
}
elasticsearch {
hosts => "http://localhost:9200/"
user => elastic
password => elastic
query => "..."
docinfo_fields => {
"_id" => "docid"
"_index" => "document_index"
}
}
if ("_elasticsearch_lookup_failure" not in [tags]) {
#... doc update logic ...
}
}
output {
elasticsearch {
hosts => "http://localhost:9200/"
user => elastic
password => elastic
index => "%{document_index}"
action => "update"
doc_as_upsert => true
document_id => "%{docid}"
}
}

Resources