Update an Elasticsearch document with the same ID - elasticsearch

Hi everyone. I'm new to ELK and I have a question about Logstash.
I have some services, and each one produces 4 or 6 logs; that means a doc in Elasticsearch may contain 4 or 6 logs.
I want to read these logs and, if they have the same id, put them into one Elasticsearch doc.
I should point out that every log carries an "id": each request and every log that refers to that request share the same id. Each log also has a specific type.
I want to put together every log that has the same id, grouped by type, like this:
{
  "_id": "123",
  "Type1": {},
  "Type2": [{},{}],
  "Type3": [{},{}],
  "Type4": {}
}
Every log belongs to the same request. Logs of the same type must end up in the same group; in the example above, Type2 is a JSON array holding 2 objects. I want to use Logstash to read every log and classify them this way.
Imagine that our doc currently looks like the JSON below:
{
  "_id": "123",
  "Type1": {},
  "Type2": [{},{}],
  "Type3": {}
}
Now a new log arrives with id 123 and type Type4. The doc must be updated like this:
{
  "_id": "123",
  "Type1": {},
  "Type2": [{},{}],
  "Type3": {},
  "Type4": {}
}
Again, a new log arrives with id 123 and type Type3. The doc is updated like this:
{
  "_id": "123",
  "Type1": {},
  "Type2": [{},{}],
  "Type3": [{},{}],
  "Type4": {}
}
I tried it with a script, but I didn't succeed:
{
  "id": 1,
  "Type2": {}
}
The script is:
input {
  stdin {
    codec => json_lines
  }
}
output {
  elasticsearch {
    hosts => ["XXX.XXX.XXX.XXX:9200"]
    index => "ss"
    document_id => "%{requestId}"
    action => "update" # update if possible instead of overwriting
    document_type => "_doc"
    script_lang => "painless"
    scripted_upsert => true
    script_type => "inline"
    script => 'if (ctx._source.Type3 == null) { ctx._source.Type3 = new ArrayList() } if(!ctx._source.Type3.contains("%{Type3}")) { ctx._source.Type3.add("%{Type3}")}'
  }
}
Now my problem is that this script handles just one type; if it worked for multiple types, what would it look like?
There is one more problem: some logs don't have an id, or they have an id but no type. I still want to get these logs into Elasticsearch; what should I do?

You can have a look at the aggregate filter plugin for Logstash. And since, as you mentioned, some of the logs don't have an id, you can use the fingerprint filter plugin to create one, which you can then use to update the document in Elasticsearch.
E.g:
input {
  stdin {
    codec => json_lines
  }
}
filter {
  fingerprint {
    source => "message"
    target => "[@metadata][id]"
    method => "MURMUR3"
  }
}
output {
  elasticsearch {
    hosts => ["XXX.XXX.XXX.XXX:9200"]
    index => "ss"
    document_id => "%{[@metadata][id]}"
    action => "update" # update if possible instead of overwriting
  }
}
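As for the "multiple types" part of the question: one possible shape (my own sketch, not something from the original answer) is a single Painless script that loops over the known type names and appends whatever the event carries under that name. It assumes the event still has a requestId field for the document id, that each log carries its body in a field named after its type (as in the sample log above), and that every type is simply stored as an array rather than the object-then-array shape shown earlier. The output plugin's script_var_name defaults to "event", so the whole event should be reachable inside the script as params.event:
input {
  stdin {
    codec => json_lines
  }
}
output {
  elasticsearch {
    hosts => ["XXX.XXX.XXX.XXX:9200"]
    index => "ss"
    document_id => "%{requestId}"
    action => "update"
    scripted_upsert => true
    script_lang => "painless"
    script_type => "inline"
    # loop over the possible type names and append the event's payload for that type
    script => '
      for (def t : ["Type1", "Type2", "Type3", "Type4"]) {
        if (params.event[t] != null) {
          if (ctx._source[t] == null) {
            ctx._source[t] = new ArrayList();
          }
          ctx._source[t].add(params.event[t]);
        }
      }
    '
  }
}
Treat this as a starting point rather than a tested config; the aggregate filter mentioned above avoids scripting entirely by merging the events inside Logstash before they reach Elasticsearch.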

Related

CSV response from a GET request in Elasticsearch

I am sending an HTTP GET request to the Elasticsearch server and I want the response to be in CSV format. In Solr we can specify wt=csv; is there any way to do this in Elasticsearch too?
My query is:
http://elasticServer/_search?q=RCE:"some date" OR VENDOR_NAME:"Anuj"&from=0&size=5&sort=@timestamp
After that I want to force the server to return the response in CSV format.
By default, ES supports only two data formats: JSON and YAML. However, if you're open to using Logstash, you can achieve what you want very easily like this:
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    query => 'RCE:"some date" OR VENDOR_NAME:"Anuj"'
    size => 5
  }
}
filter {}
output {
  csv {
    fields => ["field1", "field2", "field3"]
    path => "/path/to/data.csv"
  }
}
Since the elasticsearch input uses scrolling, you cannot specify any sorting. So if sorting is really important to you, you can use the http_poller input instead of the elasticsearch one, like this:
input {
  http_poller {
    urls => {
      es => {
        method => get
        url => 'http://elasticServer/_search?q=RCE:"some date" OR VENDOR_NAME:"Anuj"&from=0&size=5&sort=@timestamp'
        headers => {
          Accept => "application/json"
        }
      }
    }
    codec => "json"
  }
}
filter {}
output {
  csv {
    fields => ["field1", "field2", "field3"]
    path => "/path/to/data.csv"
  }
}
There is an Elasticsearch plugin on GitHub called Elasticsearch Data Format Plugin that should satisfy your requirements.

How to filter {"foo":"bar", "bar": "foo"} with grok to get only the foo field?

I copied
{"name":"myapp","hostname":"banana.local","pid":40161,"level":30,"msg":"hi","time":"2013-01-04T18:46:23.851Z","v":0}
from https://github.com/trentm/node-bunyan and saved it as my logs.json. I am trying to import only two fields (name and msg) into Elasticsearch via Logstash. The problem is that I depend on a sort of filter that I am not able to build. I have successfully imported such a line as a single message, but that is certainly not enough in my real case.
That said, how can I import only name and msg into Elasticsearch? I tested several alternatives using http://grokdebug.herokuapp.com/ to reach a useful filter, with no success at all.
For instance, %{GREEDYDATA:message} will bring in the entire line as a single message, but how can I split it and keep only the name and msg fields?
In the end, I am planning to use this:
input {
  file {
    type => "my_type"
    path => [ "/home/logs/logs.log" ]
    codec => "json"
  }
}
filter {
  grok {
    match => { "message" => "data=%{GREEDYDATA:request}" }
  }
  #### some extra lines here probably
}
output {
  elasticsearch {
    codec => json
    hosts => "http://127.0.0.1:9200"
    index => "indextest"
  }
  stdout { codec => rubydebug }
}
I have just gone through the list of available Logstash filters. The prune filter should match your needs.
Assuming you have installed the prune filter, your config file should look like this:
input {
  file {
    type => "my_type"
    path => [ "/home/logs/logs.log" ]
    codec => "json"
  }
}
filter {
  prune {
    whitelist_names => [
      "@timestamp",
      "type",
      "name",
      "msg"
    ]
  }
}
output {
  elasticsearch {
    codec => json
    hosts => "http://127.0.0.1:9200"
    index => "indextest"
  }
  stdout { codec => rubydebug }
}
Please note that you will want to keep type so that Elasticsearch indexes the document into the correct type. @timestamp is required if you want to view the data in Kibana.

Logstash: update a document in Elasticsearch

I am trying to update a specific field in Elasticsearch through Logstash. Is it possible to update only a set of fields through Logstash?
Please find the code below:
input {
  file {
    path => "/**/**/logstash/bin/*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    type => "multi"
  }
}
filter {
  csv {
    separator => "|"
    columns => ["GEOREFID", "COUNTRYNAME", "G_COUNTRY", "G_UPDATE", "G_DELETE", "D_COUNTRY", "D_UPDATE", "D_DELETE"]
  }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-data-monitor"
    query => "GEOREFID:%{GEOREFID}"
    fields => [["JSON_COUNTRY", "G_COUNTRY"],
               ["XML_COUNTRY", "D_COUNTRY"]]
  }
  if [G_COUNTRY] {
    mutate {
      update => { "D_COUNTRY" => "%{D_COUNTRY}" }
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-data-monitor"
    document_id => "%{GEOREFID}"
  }
}
We are using the above configuration, but with it the fields that arrive with null values end up removing the existing values instead of the null-value update being skipped.
Data comes from 2 different sources: one is an XML file and the other is a JSON file.
XML log format: GEO-1|CD|23|John|892|Canada|31-01-2017|QC|-|-|-|-|-
JSON log format: GEO-1|AS|33|-|-|-|-|-|Mike|123|US|31-01-2017|QC
When one log is added, a new document gets created in the index. When the second log file is read, the existing document should be updated, but only in the first 5 fields if the log file is XML and the last 5 fields if the log file is JSON. Please suggest how to do this in Logstash.
I tried the code above. Can anyone help with how to fix this?
For the Elasticsearch output to do any action other than index you need to tell it to do something else.
elasticsearch {
  hosts => ["localhost:9200"]
  index => "logstash-data-monitor"
  action => "update"
  document_id => "%{GEOREFID}"
}
This should probably be wrapped in a conditional to ensure you're only updating records that need updating (a sketch of such a conditional follows below). There is another option, though: doc_as_upsert.
elasticsearch {
  hosts => ["localhost:9200"]
  index => "logstash-data-monitor"
  action => "update"
  doc_as_upsert => true
  document_id => "%{GEOREFID}"
}
This tells the plugin to insert if it is new, and update if it is not.
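As a hedged illustration of the conditional mentioned above, reusing fields from your own config (the exact test depends on what actually marks a record as needing an update, so adjust it to your data):
output {
  if [G_COUNTRY] or [D_COUNTRY] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logstash-data-monitor"
      action => "update"
      document_id => "%{GEOREFID}"
    }
  }
}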
However, you're attempting to use two inputs to define a document. This makes things complicated. Also, you're not providing both inputs, so I'll improvise. To provide different output behavior, you will need to define two outputs.
input {
  file {
    path => "/var/log/xmlhome.log"
    [other details]
  }
  file {
    path => "/var/log/jsonhome.log"
    [other details]
  }
}
filter { [some stuff] }
output {
  if [path] == '/var/log/xmlhome.log' {
    elasticsearch {
      [XML file case]
    }
  } else if [path] == '/var/log/jsonhome.log' {
    elasticsearch {
      [JSON file case]
      action => "update"
    }
  }
}
Setting it up like this will allow you to change the ElasticSearch behavior based on where the event originated.
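To make the placeholders above concrete, here is a sketch of what the two outputs could look like, reusing the hosts, index, and GEOREFID document id from the question; whether the XML or the JSON side should be the one allowed to create the document is an assumption on my part:
output {
  if [path] == "/var/log/xmlhome.log" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logstash-data-monitor"
      document_id => "%{GEOREFID}"
      action => "update"
      doc_as_upsert => true   # XML events may create the document if it is missing
    }
  } else if [path] == "/var/log/jsonhome.log" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logstash-data-monitor"
      document_id => "%{GEOREFID}"
      action => "update"      # JSON events only update what already exists
    }
  }
}
If the arrival order of the two files is not guaranteed, you may want doc_as_upsert => true on both sides.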

logstash output to elasticsearch with document_id; what to do when I don't have a document_id?

I have some Logstash input where I use the document_id to remove duplicates. However, most input doesn't have a document_id. The following plumbs the actual document_id through, but if it doesn't exist, it gets accepted as literally %{document_id}, which means most documents are seen as duplicates of each other. Here's what my output block looks like:
output {
  elasticsearch_http {
    host => "127.0.0.1"
    document_id => "%{document_id}"
  }
}
I thought I might be able to use a conditional in the output. It fails, and the error is given below the code.
output {
  elasticsearch_http {
    host => "127.0.0.1"
    if document_id {
      document_id => "%{document_id}"
    }
  }
}
Error: Expected one of #, => at line 101, column 8 (byte 3103) after output {
elasticsearch_http {
host => "127.0.0.1"
if
I tried a few "if" statements and they all fail, which is why I assume the problem is having a conditional of any sort in that block. Here are the alternatives I tried:
if document_id <> "" {
if [document_id] <> "" {
if [document_id] {
if "hello" <> "" {
You're close with the conditional idea but you can't place it inside a plugin block. Do this instead:
output {
  if [document_id] {
    elasticsearch_http {
      host => "127.0.0.1"
      document_id => "%{document_id}"
    }
  } else {
    elasticsearch_http {
      host => "127.0.0.1"
    }
  }
}
(But the suggestion in one of the other answers to use the uuid filter is good too.)
One way to solve this is to make sure a document_id is always available. You can achieve this by adding a uuid filter in the filter section that creates the document_id field if it is not present.
filter {
  # only generate an id when the event doesn't already carry one
  if ![document_id] {
    uuid {
      target => "document_id"
    }
  }
}
Edited per Magnus Bäck's suggestion. Thanks!
Reference: docinfo_fields
For any document added to Elasticsearch, the _id is auto-generated if it is not specified during insert. We can reuse this same _id later in update/delete/search queries via the docinfo_fields feature of the elasticsearch filter.
Example:
filter {
  json {
    source => "message"
  }
  elasticsearch {
    hosts => "http://localhost:9200/"
    user => elastic
    password => elastic
    query => "..."
    docinfo_fields => {
      "_id" => "docid"
      "_index" => "document_index"
    }
  }
  if ("_elasticsearch_lookup_failure" not in [tags]) {
    #... doc update logic ...
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200/"
    user => elastic
    password => elastic
    index => "%{document_index}"
    action => "update"
    doc_as_upsert => true
    document_id => "%{docid}"
  }
}

elasticsearch message field value

I am sending JSON messages to Logstash to be indexed by Elasticsearch, and I have managed to set up the dashboard in Kibana. I would like to filter the data by the fields inside the message and cannot figure out how or where to do this. An example of my message:
{"message":"{"pubDate":"2014-02-25T13:09:14",
"scrapeDate":"2014-02-5T13:09:26",
"Id":"78967",
"query":"samsung S5",
"lang":"en"}
Right now it counts all these messages coming in, but I need to be able to filter each message by its own fields, for example Id, lang, or query.
Does this have to be done in the config file, or can it be done in the Kibana interface?
First, I assume your JSON message is:
{
  "pubDate": "2014-02-25T13:09:14",
  "scrapeDate": "2014-02-5T13:09:26",
  "Id": "78967",
  "query": "samsung S5",
  "lang": "en"
}
When you send your message to Logstash, you need to set the codec to json, as shown in the configuration below:
input {
  stdin {
    codec => json
  }
}
output {
  elasticsearch {
    cluster => "abc"
  }
}
Logstash will parse your message into separate fields, like this output:
{
  "pubDate" => "2014-02-25T13:09:14",
  "scrapeDate" => "2014-02-5T13:09:26",
  "Id" => "78967",
  "query" => "samsung S5",
  "lang" => "en",
  "@version" => "1",
  "@timestamp" => "2014-02-26T01:36:15.336Z",
  "host" => "AAAAAAAAAA"
}
When you view this data in Kibana, you can use fieldname:value to query and filter what you need. For example, you can query all messages with lang:en.
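If, unlike above, the JSON arrives embedded as a string inside the message field (which the sample in the question suggests), the same result can be obtained with a json filter instead of the input codec; a minimal sketch:
filter {
  json {
    source => "message"
  }
}
After that, pubDate, Id, query and lang become regular event fields and can be filtered in Kibana with the same fieldname:value syntax.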
