Create a Kibana graph from logstash logs - elasticsearch

I need to create a graph in Kibana based on a specific value.
Here is my raw log line from Logstash:
2016-03-14T15:01:21.061Z Accueil-PC 14-03-2016 16:01:19.926 [pool-3-thread-1] INFO com.github.vspiewak.loggenerator.SearchRequest - id=300,ip=84.102.53.31,brand=Apple,name=iPhone 5S,model=iPhone 5S - Gris sideral - Disque 64Go,category=Mobile,color=Gris sideral,options=Disque 64Go,price=899.0
In this log line, I have the id information "id=300".
In order to create graphs in Kibana using the id value, I want a new field, so I have this grok configuration:
grok {
  match => ["message", "(?<mycustomnewfield>id=%{INT}+)"]
}
With this transformation I get the following JSON:
{
  "_index": "metrics-2016.03.14",
  "_type": "logs",
  "_id": "AVN1k-cJcXxORIbORG7w",
  "_score": null,
  "_source": {
    "message": "{\"message\":\"14-03-2016 15:42:18.739 [pool-1950-thread-1] INFO com.github.vspiewak.loggenerator.SellRequest - id=300,ip=54.226.24.77,email=client951@gmail.com,sex=F,brand=Apple,name=iPad R\\\\xE9tina,model=iPad R\\\\xE9tina - Noir,category=Tablette,color=Noir,price=509.0\\\\r\",\"@version\":\"1\",\"@timestamp\":\"2016-03-14T14:42:19.040Z\",\"path\":\"D:\\\\LogStash\\\\logstash-2.2.2\\\\logstash-2.2.2\\\\bin\\\\logs.logs.txt\",\"host\":\"Accueil-PC\",\"type\":\"metrics-type\",\"mycustomnewfield\":\"300\"}",
    "@version": "1",
    "@timestamp": "2016-03-14T14:42:19.803Z",
    "host": "127.0.0.1",
    "port": 57867
  },
  "fields": {
    "@timestamp": [
      1457966539803
    ]
  },
  "sort": [
    1457966539803
  ]
}
A new field was actually created (the field 'mycustomnewfield'), but inside the message field! As a result I can't see it in Kibana when I try to create a graph. I tried to create a "scripted field" in Kibana, but only numeric fields can be accessed.
Should I create an index in Elasticsearch with a specific mapping to create a new field?

There was actually something wrong with my configuration; I should have pasted the whole configuration with my question. In fact, I'm using Logstash both as a shipper and as a log server. On the server side, I modified the configuration:
input {
  tcp {
    port  => "yyyy"
    host  => "x.x.x.x"
    mode  => "server"
    codec => json # I forgot this option
  }
}
Because the Logstash shipper is actually sending JSON, I need to advise the server about this. Now I no longer have a message field within a message field, and my new field is inserted at the right place.
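As a side note, if the goal is to graph on the id as a number in Kibana, one option (a sketch only; the grok pattern builds on the one from the question, and the mutate step is an assumption, not part of the original setup) is to capture just the numeric part and convert it:
filter {
  grok {
    # capture only the digits after "id=" into mycustomnewfield
    match => ["message", "id=%{INT:mycustomnewfield}"]
  }
  mutate {
    # store the extracted value as an integer so Kibana can aggregate on it
    convert => [ "mycustomnewfield", "integer" ]
  }
}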

Related

Calculate field data size and store to other field at indexing time ElasticSearch 7.17

I am looking for a way to store the size of a field (bytes) in a new field of a document.
I.e. when a document is created with a field message that contains the value hello, I want another field message_size_bytes to be written that, in this example, has the value 5.
I am aware of the possibilities using _update_by_query and _search with script fields, but I have so much data that I do not want to calculate the sizes while querying but at index time.
Is there a possibility to do this using Elasticsearch 7.17 only? I do not have access to the data before it's passed to Elasticsearch.
You can use an ingest pipeline with a script processor.
You can create the pipeline using the command below:
PUT _ingest/pipeline/calculate_bytes
{
  "processors": [
    {
      "script": {
        "description": "Calculate bytes of message field",
        "lang": "painless",
        "source": """
          // length() counts characters; for ASCII content this equals the byte count
          ctx['message_size_bytes'] = ctx['message'].length();
        """
      }
    }
  ]
}
After creating the pipeline, you can use the pipeline name while indexing data, as below (the same can be used in Logstash, Java, or any other client as well):
POST 74906877/_doc/1?pipeline=calculate_bytes
{
  "message": "hello"
}
Result:
"hits": [
{
"_index": "74906877",
"_id": "1",
"_score": 1,
"_source": {
"message": "hello",
"message_size_bytes ": 5
}
}
]
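If passing ?pipeline= on every request is not practical (for example, when the indexing client cannot be changed), one option, sketched here with the index name from the example above, is to attach the pipeline as the index default so it runs on every indexing request:
PUT 74906877/_settings
{
  "index.default_pipeline": "calculate_bytes"
}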

Logstash parsing different line than 1st line as header

I have a sample data:
employee_name,user_id,O,C,E,A,N
Yvette Vivien Donovan,YVD0093,38,19,29,15,36
Troy Alvin Craig,TAC0118,34,40,24,15,34
Eden Jocelyn Mcclain,EJM0952,20,37,48,35,34
Alexa Emma Wood,AEW0655,25,20,18,40,38
Celeste Maris Griffith,CMG0936,36,13,18,50,29
Tanek Orson Griffin,TOG0025,40,36,24,19,26
Colton James Lowery,CJL0436,39,41,27,25,28
Baxter Flynn Mcknight,BFM0761,42,32,28,17,22
Olivia Calista Hodges,OCH0195,37,36,39,38,32
Price Zachery Maldonado,PZM0602,24,46,30,18,29
Daryl Delilah Atkinson,DDA0185,17,43,33,18,25
And logstash config file as:
input {
  file {
    path => "/path/psychometric_data.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    autodetect_column_names => true
    autogenerate_column_names => true
  }
}
output {
  amazon_es {
    hosts => [ "https://xxx-xxx-es-xxx.xx-xx-1.es.amazonaws.com:443" ]
    ssl => true
    region => "ap-south-1"
    index => "psychometric_data"
  }
}
I am expecting the 1st row (i.e. employee_name,user_id,O,C,E,A,N) to become the Elasticsearch field names (header), but I am getting the 3rd row (i.e. Troy Alvin Craig,TAC0118,34,40,24,15,34) as the header, as follows.
{
  "_index": "psychometric_data",
  "_type": "_doc",
  "_id": "md4hm3YB8",
  "_score": 1,
  "_source": {
    "15": "21",
    "24": "17",
    "34": "39",
    "40": "37",
    "@version": "1",
    "@timestamp": "2020-12-25T18:20:00.759Z",
    "message": "Ishmael Mannix Velazquez,IMV0086,22,37,17,21,39\r",
    "path": "/path/psychometric_data.csv",
    "Troy Alvin Craig": "Ishmael Mannix Velazquez",
    "host": "xx-ThinkPad-xx",
    "TAC0118": "IMV0086"
  }
}
What might be the reason for it?
If you set autodetect_column_names to true, then the filter interprets the first line that it sees as the column names. If pipeline.workers is set to more than one, then it is a race to see which thread sets the column names first. Since different workers are processing different lines, this means it may not use the first line. You must set pipeline.workers to 1.
In addition to that, the java execution engine (enabled by default) does not always preserve the order of events. There is a setting pipeline.ordered in logstash.yml that controls that. In 7.9 that keeps event order if and only if pipeline.workers is set to 1.
You do not say which version you are running. For anything from 7.0 (when java_execution became the default) to 7.6, the fix is to disable the java engine using either pipeline.java_execution: false in logstash.yml or --java_execution false on the command line. For any 7.x release from 7.7 onwards, make sure pipeline.ordered is set to auto or true (auto is the default in 7.x). In future releases (8.x perhaps), pipeline.ordered will default to false.
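A sketch of the relevant logstash.yml settings described above (values shown are for a 7.7+ release; adjust to your version):
# logstash.yml
pipeline.workers: 1      # a single worker thread, so the first line seen becomes the header
pipeline.ordered: auto   # preserves event order when pipeline.workers is 1 (auto is the 7.x default)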

Use Kafka Connect to update Elasticsearch field on existing document instead of creating new

I have Kafka set-up running with the Elasticsearch connector and I am successfully indexing new documents into an ES index based on the incoming messages on a particular topic.
However, based on incoming messages on another topic, I need to append data to a field on a specific document in the same index.
Pseudo-schema below:
{
  "_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "uuid": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "title": "A title",
  "body": "A body",
  "created_at": 164584548,
  "views": []
}
^ This document is being created fine in ES based on the data in the topic mentioned above.
However, how do I then add items to the views field using messages from another topic? Like so:
article-view topic schema:
{
  "article_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "user_id": 123456,
  "timestamp": 136389734
}
Instead of simply creating a new document in an article-view index (which I don't even want to have), it should append this to the views field on the article document whose _id equals the article_id from the message.
so the end result after one message would be:
{
  "_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "uuid": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "title": "A title",
  "body": "A body",
  "created_at": 164584548,
  "views": [
    {
      "user_id": 123456,
      "timestamp": 136389734
    }
  ]
}
Using the ES API this is possible with a script, like so:
{
  "script": {
    "lang": "painless",
    "params": {
      "newItems": [{
        "timestamp": 136389734,
        "user_id": 123456
      }]
    },
    "source": "ctx._source.views.addAll(params.newItems)"
  }
}
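For reference, a sketch of how such a script might be sent through the update API (the index name articles is illustrative, not part of the original setup; the document id is reused from the example above):
POST articles/_update/6993e0a6-271b-45ef-8cf5-1c0d0f683acc
{
  "script": {
    "lang": "painless",
    "params": {
      "newItems": [{
        "timestamp": 136389734,
        "user_id": 123456
      }]
    },
    "source": "ctx._source.views.addAll(params.newItems)"
  }
}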
I can generate scripts like above dynamically in bulk, and then use the helpers.bulk function in the ES Python library to bulk update documents this way.
Is this possible with Kafka Connect / Elasticsearch? I haven't found any documentation on Confluent's website explaining how to do this.
It seems like a fairly standard requirement and an obvious thing people would need to do with Kafka and a sink connector like ES.
Thanks!
Edit: Partial updates are possible with write.method=upsert (src)
The Elasticsearch connector doesn't support this. You can update documents in place, but you need to send the full document, not a delta for appending, which I think is what you're after.
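To expand on the edit above, a sketch of a sink connector configuration with that setting (connector name, topic, and URL are illustrative; write.method is the property referenced in the edit, and note it performs partial document updates, not appends to an array):
{
  "name": "es-sink-article-views",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "article-view",
    "connection.url": "http://localhost:9200",
    "key.ignore": "false",
    "write.method": "upsert"
  }
}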

Multiple Logstash Outputs depending from collectd

I'm facing a configuration failure which I can't solve on my own; I tried to find the solution in the documentation, but without luck.
I have a few different hosts which send their metrics via collectd to Logstash. Inside the Logstash configuration I'd like to separate each host and pipe it into its own ES index. When I try to configtest my settings, Logstash throws an error - maybe someone can help me.
The separation should be triggered by the hostname collectd delivers:
[This is an old raw JSON output, so please don't mind the wrong index being set]
{
  "_index": "wv-metrics",
  "_type": "logs",
  "_id": "AVHyJunyGanLcfwDBAon",
  "_score": null,
  "_source": {
    "host": "somefqdn.com",
    "@timestamp": "2015-12-30T09:10:15.211Z",
    "plugin": "disk",
    "plugin_instance": "dm-5",
    "collectd_type": "disk_merged",
    "read": 0,
    "write": 0,
    "@version": "1"
  },
  "fields": {
    "@timestamp": [
      1451466615211
    ]
  },
  "sort": [
    1451466615211
  ]
}
Please see my config:
Input Config (Working so far)
input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd { }
  }
}
Output Config File:
filter {
  if [host] == "somefqdn.com" {
    output {
      elasticsearch {
        hosts => "someip:someport"
        user => logstash
        password => averystrongpassword
        index => "somefqdn.com"
      }
    }
  }
}
Error which is thrown:
root@test-collectd1:/home/username# service logstash configtest
Error: Expected one of #, => at line 21, column 17 (byte 314) after filter {
if [host] == "somefqdn.com" {
output {
elasticsearch
I understand that there's probably a character missing in my config, but I can't locate it.
Thx in advance!
I spot two errors in a quick scan:
First, your output stanza should not be wrapped with a filter{} block.
Second, your output stanza should start with output{} (put the conditional inside):
output {
  if [host] == "somefqdn.com" {
    elasticsearch {
      ...
    }
  }
}
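For completeness, a sketch of the corrected output file using the settings from the question (endpoint, credentials, and index are the question's placeholders; string values such as the user and password should be quoted):
output {
  if [host] == "somefqdn.com" {
    elasticsearch {
      hosts => "someip:someport"
      user => "logstash"
      password => "averystrongpassword"
      index => "somefqdn.com"
    }
  }
}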

Remove Duplicate Fields Used for document_id Before Elasticsearch in Logstash

I wrote my own filter for Logstash and I'm trying to calculate my own document_id, something like this:
docIdClean = "%d %s %s" % [ event["@timestamp"].to_f * 1000, event["type"], event["message"] ]
event["docId"] = Digest::MD5.hexdigest(docIdClean)
And the Logstash configuration looks like this:
output {
  elasticsearch {
    ...
    index => "analysis-%{+YYYY.MM.dd}"
    document_id => "%{docId}"
    template_name => "logstash_per_index"
  }
}
The more or less minor downside is that all documents in Elasticsearch contain _id and docId holding the same value. Since docId is completely pointless, as nobody searches for an MD5 hash, I want to remove it, but I don't know how.
The docId has to exist when the event hits the output, otherwise the output can't refer to it. Therefore, I can't remove it beforehand. Since I can't remove it afterwards, the docId sits there occupying space.
I tried to set the event field _id, but that only causes an exception in Elasticsearch saying that the id of the document is different.
Maybe for explanation, here is one document:
{
  "_index": "analysis-2014.09.16",
  "_type": "access",
  "_id": "022d9055423cdd0756b6cfa06886f866",
  "_score": 1,
  "_source": {
    "@timestamp": "2014-09-16T19:36:31.000+02:00",
    "type": "access",
    "tags": [
      "personalized"
    ],
    "importDate": "2014/09/17",
    "docId": "022d9055423cdd0756b6cfa06886f866"
  }
}
EDIT:
This is about Logstash 1.3
There's nothing you can do about this in Logstash 1.4.
In Logstash 1.5, you can use @metadata fields, which are not passed to Elasticsearch.
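A sketch of what that might look like from Logstash 1.5 onwards, reusing the hashing logic from the question (the ruby filter wrapper is an assumption about how the custom filter could be replaced):
filter {
  ruby {
    code => '
      require "digest/md5"
      doc_id_clean = "%d %s %s" % [ event["@timestamp"].to_f * 1000, event["type"], event["message"] ]
      # store the hash under @metadata so it never reaches Elasticsearch as a field
      event["[@metadata][docId]"] = Digest::MD5.hexdigest(doc_id_clean)
    '
  }
}
output {
  elasticsearch {
    index => "analysis-%{+YYYY.MM.dd}"
    document_id => "%{[@metadata][docId]}"
    template_name => "logstash_per_index"
  }
}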
