How do I replicate the _id and _type of elasticsearch index when dumping data through Logstash - elasticsearch

I have an "Index":samcorp with "type":"sam".
One of them looks like the below :
{
"_index": "samcorp",
"_type": "sam",
"_id": "1236",
"_version": 1,
"_score": 1,
"_source": {
"name": "Sam Smith",
"age": 22,
"confirmed": true,
"join_date": "2014-06-01"
}
}
I want to replicate the same data into a different index named "jamcorp", with the same type and the same id.
I am using Logstash to do it.
With the configuration below, I end up with the wrong ids and type:
input {
elasticsearch {
hosts => ["127.0.0.1:9200"]
index => "samcorp"
}
}
filter {
mutate {
remove_field => [ "@version", "@timestamp" ]
}
}
output {
elasticsearch {
hosts => ["127.0.0.1:9200"]
manage_template => false
index => "jamcorp"
document_type => "%{_type}"
document_id => "%{_id}"
}
}
I've tried all possible combinations, and I get the following output:
{
"_index": "jamcorp",
"_type": "%{_type}",
"_id": "%{_id}",
"_version": 4,
"_score": 1,
"_source": {
"name": "Sam Smith",
"age": 22,
"confirmed": true,
"join_date": "2014-06-01"
}
}
The output I require is:
{
"_index": "jamcorp",
"_type": "sam",
"_id": "1236",
"_version": 4,
"_score": 1,
"_source": {
"name": "Sam Smith",
"age": 22,
"confirmed": true,
"join_date": "2014-06-01"
}
}
Any help would be appreciated. :) Thanks

In your elasticsearch input, you need to set the docinfo parameter to true:
input {
elasticsearch {
hosts => ["127.0.0.1:9200"]
index => "samcorp"
docinfo => true   # <--- add this
}
}
As a result, the @metadata field will be populated with the _index, _type and _id of each document, and you can reuse them in your filters and outputs:
output {
elasticsearch {
hosts => ["127.0.0.1:9200"]
manage_template => false
index => "jamcorp"
document_type => "%{[#metadata][_type]}" <--- use #metadata
document_id => "%{[#metadata][_id]}" <--- use #metadata
}
}
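To check that the metadata is actually carried through, a temporary stdout output can print it; a minimal sketch (the rubydebug codec hides @metadata unless told otherwise):
output {
  stdout { codec => rubydebug { metadata => true } }   # temporarily dump events, including @metadata
}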

Related

I have implemented Kafka as the Logstash input and Elasticsearch as the output. It's working fine in Kibana. I want to filter the data based on statusCode.

This is the Kibana dashboard JSON data. Here I have to filter based on the response statusCode that sits inside the message JSON field.
{
"_index": "rand-topic",
"_type": "_doc",
"_id": "ulF8uH0BK9MbBSR7DPEw",
"_version": 1,
"_score": null,
"fields": {
"#timestamp": [
"2021-12-14T10:27:56.956Z"
],
"#version": [
"1"
],
"#version.keyword": [
"1"
],
"message": [
"{\"requestMethod\":\"GET\",\"headers\":{\"content-type\":\"application/json\",\"user-agent\":\"PostmanRuntime/7.28.4\",\"accept\":\"*/*\",\"postman-token\":\"977fc94b-38c8-4df4-ad73-814871a32eca\",\"host\":\"localhost:5600\",\"accept-encoding\":\"gzip, deflate, br\",\"connection\":\"keep-alive\",\"content-length\":\"44\"},\"body\":{\"category\":\"CAT\",\"noise\":\"purr\"},\"query\":{},\"requestUrl\":\"http://localhost:5600/kafka\",\"protocol\":\"HTTP/1.1\",\"remoteIp\":\"1\",\"requestSize\":302,\"userAgent\":\"PostmanRuntime/7.28.4\",\"statusCode\":200,\"response\":{\"success\":true,\"message\":\"Kafka Details are added\",\"data\":{\"kafkaData\":{\"_id\":\"61b871ac69be37078a9c1a79\",\"category\":\"DOG\",\"noise\":\"bark\",\"__v\":0},\"postData\":{\"category\":\"DOG\",\"noise\":\"bark\"}}},\"latency\":{\"seconds\":0,\"nanos\":61000000},\"responseSize\":193}"]},"sort[1639477676956]}
The expected output looks like this, with the statusCode field extracted from the message field:
{
"_index": "rand-topic",
"_type": "_doc",
"_id": "ulF8uH0BK9MbBSR7DPEw",
"_version": 1,
"_score": null,
"fields": {
"#timestamp": [
"2021-12-14T10:27:56.956Z"
],
"#version": [
"1"
],
"#version.keyword": [
"1"
],
"statusCode": [
200
],
"message": [
"{\"requestMethod\":\"GET\",\"headers\":{\"content-
type\":\"application/json\",\"user-
agent\":\"PostmanRuntime/7.28.4\",\"accept\":\"*/*\",\"postman-
token\":\"977fc94b-38c8-4df4-ad73-
814871a32eca\",\"host\":\"localhost:5600\",\"accept-
encoding\":\"gzip, deflate, br\",\"connection\":\"keep-
alive\",\"content-length\":\"44\"},\"body\":
{\"category\":\"CAT\",\"noise\":\"purr\"},\"query\": {}, \"requestUrl\":\"http://localhost:5600/kafka\",\"protocol\":\"HTTP/1.1\",\"remoteIp\":\"1\",\"requestSize\":302,\"userAgent\":\"PostmanRuntime/7.28.4\",\"statusCode\":200,\"response\":{\"success\":true,\"message\":\"Kafka Details are added\",\"data\":{\"kafkaData\":{\"_id\":\"61b871ac69be37078a9c1a79\",\"category\":\"DOG\",\"noise\":\"bark\",\"__v\":0},\"postData\":{\"category\":\"DOG\",\"noise\":\"bark\"}}},\"latency\":{\"seconds\":0,\"nanos\":61000000},\"responseSize\":193}"
]},"sort": [1639477676956]}
Please help me configure a Logstash filter for statusCode. My current configuration is:
input {
kafka {
topics => ["randtopic"]
bootstrap_servers => "192.168.29.138:9092"
}
}
filter{
mutate {
add_field => {
"statusCode" => "%{[status]}"
}
}
}
output {
elasticsearch {
hosts => ["192.168.29.138:9200"]
index => "rand-topic"
workers => 1
}
}
output {
if [message][0][statusCode] == "200" {
# do something ...
stdout { codec => rubydebug }
}
}
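A minimal filter sketch that could extract statusCode, assuming the message field holds the JSON string shown above (the field name "payload" is a hypothetical choice, not from the original post):
filter {
  json {
    source => "message"   # parse the JSON string held in "message"
    target => "payload"   # hypothetical field to hold the parsed object
  }
  mutate {
    # copy the parsed status code to a top-level field, keeping its numeric type
    copy => { "[payload][statusCode]" => "statusCode" }
    # drop the temporary parsed object if only statusCode is needed
    remove_field => [ "payload" ]
  }
}
With statusCode kept numeric, the output conditional would compare against a number, e.g. if [statusCode] == 200, rather than the string "200".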

Set _Id as update key in logstash elasticsearch

I have an index as below:
{
"_index": "mydata",
"_type": "_doc",
"_id": "PuhnbG0B1IIlyY9-ArdR",
"_score": 1,
"_source": {
"age": 9,
"#version": "1",
"updated_on": "2019-01-01T00:00:00.000Z",
"id": 4,
"name": "Emma",
"#timestamp": "2019-09-26T07:09:11.947Z"
}
}
So my Logstash conf for updating the data is:
input {
jdbc {
jdbc_connection_string => "***"
jdbc_driver_class => "***"
jdbc_driver_library => "***"
jdbc_user => ***
statement => "SELECT * from agedata WHERE updated_on > :sql_last_value ORDER BY updated_on"
use_column_value => true
tracking_column => updated_on
tracking_column_type => "timestamp"
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "mydata"
action => update
document_id => "{_id}"
doc_as_upsert => true
}
stdout { codec => rubydebug }
}
So, when I run this after updating the same row, my expectation is that the existing _id is updated with the changes I made to that row.
But Elasticsearch indexes it as a new document where my _id is the literal string:
"_index": "agesep",
"_type": "_doc",
"_id": ***"%{_id}"***
The duplicate occurs when I use document_id => "%{id}", as:
actual:
{
"_index": "mydata",
"_type": "_doc",
"_id": "BuilbG0B1IIlyY9-4P7t",
"_score": 1,
"_source": {
"id": 1,
"age": 13,
"name": "Greg",
"updated_on": "2019-09-26T08:11:00.000Z",
"#timestamp": "2019-09-26T08:17:52.974Z",
"#version": "1"
}
}
duplicate:
{
"_index": "mydata",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"age": 56,
"#version": "1",
"id": 1,
"name": "Greg",
"updated_on": "2019-09-26T08:18:00.000Z",
"#timestamp": "2019-09-26T08:20:14.561Z"
}
}
How do I get it to use the existing _id and not create a duplicate when I make updates in ES?
My expectation is to update the data in the index based on the _id, not to create a new document for each update.
I suggest using id instead of _id:
document_id => "%{id}"
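A sketch of the output section under that suggestion, reusing the settings from the question (the ES-generated _id never reaches the jdbc pipeline, so the source column id is the stable key here):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mydata"
    action => "update"
    # use the "id" column from the SQL row as the document id,
    # so re-running the pipeline updates the same document instead of adding new ones
    document_id => "%{id}"
    doc_as_upsert => true
  }
  stdout { codec => rubydebug }
}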

logstash extract and move nested fields into new parent field

If in my log I print the latitude and longitude of a given point, how can I capture this information so that it is processed as geospatial data in Elasticsearch?
Below I show an example of a document in Elasticsearch corresponding to a log line:
{
"_index": "memo-logstash-2018.05",
"_type": "doc",
"_id": "DDCARGMBfvaBflicTW4-",
"_version": 1,
"_score": null,
"_source": {
"type": "elktest",
"message": "LON: 12.5, LAT: 42",
"#timestamp": "2018-05-09T10:44:09.046Z",
"host": "f6f9fd66cd6c",
"path": "/usr/share/logstash/logs/docker-elk-master.log",
"#version": "1"
},
"fields": {
"#timestamp": [
"2018-05-09T10:44:09.046Z"
]
},
"highlight": {
"type": [
"#kibana-highlighted-field#elktest#/kibana-highlighted-field#"
]
},
"sort": [
1525862649046
]
}
You can first separate LON and LAT into their own fields as follows:
grok {
match => {"message" => "LON: %{NUMBER:LON}, LAT: %{NUMBER:LAT}"}
}
Once they are separated, you can use the mutate filter to create a parent field around them, like this:
filter {
mutate {
rename => { "LON" => "[location][LON]" }
rename => { "LAT" => "[location][LAT]" }
}
}
let me know if this helps.
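For Elasticsearch to actually treat the result as geospatial, the index mapping (or template) also has to declare location as a geo_point, and a geo_point object expects lowercase lat/lon keys. A variant of the above, as a sketch under those assumptions:
filter {
  grok {
    # capture the coordinates as floats
    match => { "message" => "LON: %{NUMBER:lon:float}, LAT: %{NUMBER:lat:float}" }
  }
  mutate {
    # nest them under "location" using the lat/lon keys geo_point understands
    rename => { "lon" => "[location][lon]" }
    rename => { "lat" => "[location][lat]" }
  }
}
The mapping side would then need something like "location": { "type": "geo_point" } in the index template.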

Test logstash with elasticsearch as input and output

I have configured Logstash with Elasticsearch as the input and output, with parameters as below:
input
{
elasticsearch {
hosts => ["hostname" ]
index => 'indexname'
type => 'type'
user => 'username'
password => 'password'
docinfo => true
query => '{ "query": { "match": { "first_name": "mary" } }}'
}
}
output
{
elasticsearch {
hosts => ["hostname" ]
index => 'indexname'
user => 'username'
password => 'password'
}
}
My indexed data is as below:
PUT person/person/3
{
"first_name" : "mary"
}
PUT person/person/4
{
"first_name" : "mary.m"
}
PUT person/person/5
{
"first_name" : "mary.k"
}
When I run the below query on ES:
GET indexname/_search
{
"query": {
"match": {
"first_name": "mary"
}
}
}
it returns
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "3",
"_score": 0.2876821,
"_source": {
"first_name": "mary"
}
}
]
}
}
Although the Logstash pipeline starts successfully, it does not index the results of this query into ES, even though I used "match": { "first_name": "mary" } as the query in the input section.
Since your ES runs on HTTPS, you need to add ssl => true to your elasticsearch input configuration:
input {
elasticsearch {
hosts => ["hostname" ]
index => 'indexname'
type => 'type'
user => 'username'
password => 'password'
docinfo => true
ssl => true   # <--- add this
query => '{ "query": { "match": { "first_name": "mary" } }}'
}
}
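If the output writes to the same HTTPS cluster, the same flag likely belongs there too, and a stdout output is a quick way to confirm that documents are actually flowing; a sketch reusing the settings from the question:
output {
  elasticsearch {
    hosts => ["hostname"]
    index => 'indexname'
    user => 'username'
    password => 'password'
    ssl => true                    # same HTTPS setting as the input, if applicable
  }
  stdout { codec => rubydebug }    # print each event read from the source index
}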

How to extract feature from the Elasticsearch _source to index

I have used Logstash, Elasticsearch and Kibana to collect logs.
The log file is JSON, which looks like this:
{"_id":{"$oid":"5540afc2cec7c68fc1248d78"},"agentId":"0000000BAB39A520","handler":"SUSIControl","sensorId":"/GPIO/GPIO00/Level","ts":{"$date":"2015-04-29T09:00:00.846Z"},"vHour":1}
{"_id":{"$oid":"5540afc2cec7c68fc1248d79"},"agentId":"0000000BAB39A520","handler":"SUSIControl","sensorId":"/GPIO/GPIO00/Dir","ts":{"$date":"2015-04-29T09:00:00.846Z"},"vHour":0}
and this is the configuration I have used in Logstash:
input {
file {
type => "log"
path => ["/home/data/1/1.json"]
start_position => "beginning"
}
}
filter {
json{
source => "message"
}
}
output {
elasticsearch { embedded => true }
stdout { codec => rubydebug }
}
Then the output in Elasticsearch is:
{
"_index": "logstash-2015.06.29",
"_type": "log",
"_id": "AU5AG7KahwyA2bfnpJO0",
"_version": 1,
"_score": 1,
"_source": {
"message": "{"_id":{"$oid":"5540afc2cec7c68fc1248d7c"},"agentId":"0000000BAB39A520","handler":"SUSIControl","sensorId":"/GPIO/GPIO05/Dir","ts":{"$date":"2015-04-29T09:00:00.846Z"},"vHour":1}",
"#version": "1",
"#timestamp": "2015-06-29T16:17:03.040Z",
"type": "log",
"host": "song-Lenovo-IdeaPad",
"path": "/home/song/soft/data/1/Average.json",
"_id": {
"$oid": "5540afc2cec7c68fc1248d7c"
},
"agentId": "0000000BAB39A520",
"handler": "SUSIControl",
"sensorId": "/GPIO/GPIO05/Dir",
"ts": {
"$date": "2015-04-29T09:00:00.846Z"
},
"vHour": 1
}
}
But the information from the JSON file is all inside _source and not indexed as usable fields,
so I can't use Kibana to analyze it.
Kibana shows "Analysis is not available for object fields",
and these _source fields are object fields.
How do I solve this problem?
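One possible approach, not from the original thread: the flat fields such as agentId and sensorId are already indexed; the "object fields" Kibana complains about are the nested _id.$oid and ts.$date wrappers, so flattening them and parsing the timestamp may make the data analyzable. A sketch, with mongo_id and event_ts as hypothetical field names:
filter {
  json {
    source => "message"
  }
  mutate {
    # flatten the MongoDB-style wrappers into plain scalar fields
    rename => { "[_id][$oid]" => "mongo_id" }   # hypothetical target name
    rename => { "[ts][$date]" => "event_ts" }   # hypothetical target name
    remove_field => [ "_id", "ts", "message" ]
  }
  date {
    # use the log's own timestamp as the event time
    match => [ "event_ts", "ISO8601" ]
    target => "@timestamp"
  }
}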
