Logstash - elasticsearch getting only new data

I want to run a Logstash process that grabs real-time data with a certain value in a field and outputs it to the screen. So far I've come up with this configuration:
input {
  elasticsearch {
    hosts => "localhost"
    user => "logstash"
    password => "logstash"
    size => 100
    query => '{ "query" : { "bool" : { "must" : { "bool" : { "should" : [ {"match": {"field": "value2"}}, {"match": {"field": "value1"}} ] } } } } }'
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
What I've learned from running this config is that:
Logstash outputs the data in batches, with the batch size determined by the size parameter.
There is a delay of a few seconds between batches.
Logstash reads the existing data first.
My question: is there any configuration that can change this so that Logstash only listens for new data and outputs it as soon as it arrives in Elasticsearch? Any help would be appreciated.
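One possible approach, not from the original question and only a sketch: the elasticsearch input plugin has a schedule option, so the query can be re-run periodically and restricted to recent documents with a range clause on a timestamp field. The @timestamp field name and the one-minute window below are assumptions about the data:
input {
  elasticsearch {
    hosts => "localhost"
    user => "logstash"
    password => "logstash"
    size => 100
    # re-run the query every minute (cron syntax)
    schedule => "* * * * *"
    # assumed: documents carry an @timestamp field; only fetch the last minute of data
    query => '{ "query": { "bool": { "filter": [ { "range": { "@timestamp": { "gte": "now-1m" } } } ], "should": [ {"match": {"field": "value1"}}, {"match": {"field": "value2"}} ], "minimum_should_match": 1 } } }'
  }
}
This still polls rather than truly listening, and overlapping windows can return duplicates, so it only approximates real-time tailing.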

Related

how to add auto remove field in logstash filter

I am trying to add a _ttl field in Logstash so that Elasticsearch removes the document after a while; 120 seconds in this case, but that is just for testing.
filter {
  if "drop" in [message] {
    drop { }
  }
  add_field => { "_ttl" => "120s" }
}
but now nothing is logged in elasticsearch.
I have 2 questions.
Where is it logged what is going wrong? Maybe the syntax of the filter is wrong?
How do I add a ttl field to elasticsearch for auto removal?
When you add the filter to logstash.conf with a mutate block, it works:
filter {
  mutate {
    add_field => { "_ttl" => "120s" }
  }
}
POST myindex/_search
{
  "query": {
    "match_all": {}
  }
}
Results:
"hits": [
{
"_index": "myindex",
...................
"_ttl": "120s",
For the other question, I can't really help there. I'm running Logstash as a container, so the logs are read with:
docker logs d492eb3c3d0d
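Not part of the original answer, but a common way to see what Logstash is actually producing (and whether a filter is misbehaving) is to add a stdout output with the rubydebug codec alongside the existing elasticsearch output; every event is then printed to the same log that docker logs shows. The host below is a placeholder:
output {
  elasticsearch { hosts => "localhost:9200" }
  # print each event so it shows up in the container logs for inspection
  stdout { codec => rubydebug }
}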

ElasticSearch: populating ip_range type field via logstash

I'm experimenting with the ip_range field type in Elasticsearch 6.8 (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/range.html) and am struggling to find a way to load IP data into the field properly via Logstash.
I was able to load some sample data via Kibana Dev Tools, but cannot figure out a way to do the same via logstash.
Index definition
PUT test_ip_range
{
  "mappings": {
    "_doc": {
      "properties": {
        "ip_from_to_range": {
          "type": "ip_range"
        },
        "ip_from": {
          "type": "ip"
        },
        "ip_to": {
          "type": "ip"
        }
      }
    }
  }
}
Add sample doc:
PUT test_ip_range/_doc/3
{
  "ip_from_to_range" : {
    "gte" : "<dotted_ip_from>",
    "lte" : "<dotted_ip_to>"
  }
}
Logstash config (reading from DB)
input {
  jdbc {
    ...
    statement => "SELECT ip_from, ip_to, <???> AS ip_from_to_range FROM sample_ip_data"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => "<host>"
    "index" => "test_ip_range"
    "document_type" => "_doc"
  }
}
Question:
How do I get the ip_from and ip_to DB fields into their respective gte and lte parts of ip_from_to_range via the Logstash config?
I know I can also insert the ip range in CIDR notation, but would like to be able to have both options - loading in CIDR notation and loading as a range.
After some trial and error, finally figured out the logstash config.
I had posted about a similar issue here, which finally got me on the right track with the syntax for this use case as well.
input { ... }
filter {
  mutate {
    add_field => {
      "[ip_from_to_range]" => '{
        "gte": "%{ip_from}",
        "lte": "%{ip_to}"
      }'
    }
  }
  json {
    source => "ip_from_to_range"
    target => "ip_from_to_range"
  }
}
output { ... }
Filter parts explained
mutate add_field: creates a new field [ip_from_to_range] whose value is a JSON string ( '{...}' ). It is important to reference the field as [field_name]; otherwise the next step, which parses the string into a JSON object, doesn't work.
json: parses the string representation into a JSON object.
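As a side note, not from the original answer: ip_range fields also accept CIDR strings, so the CIDR option mentioned in the question can be sketched with a plain string value. The cidr column name below is hypothetical:
filter {
  mutate {
    # assumed DB column "cidr", e.g. "192.168.0.0/24"; an ip_range field accepts a CIDR string directly
    add_field => { "ip_from_to_range" => "%{cidr}" }
  }
}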

How to get ElasticSearch output?

I want to add my log document to Elasticsearch and then check the document in Elasticsearch.
Following is the content of the log file:
Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<20130101142543.5828399CCAF#mailserver14.example.com>
Feb 2 06:25:43 mailserver15 postfix/cleanup[21403]: BEF25A72999: message-id=<20130101142543.5828399CCAF#mailserver15.example.com>
Mar 3 06:25:43 mailserver16 postfix/cleanup[21403]: BEF25A72998: message-id=<20130101142543.5828399CCAF#mailserver16.example.com>
I am able to run my Logstash instance with the following Logstash configuration file:
input {
  file {
    path => "/Myserver/mnt/appln/somefolder/somefolder2/testData/fileValidator-access.LOG"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    patterns_dir => ["/Myserver/mnt/appln/somefolder/somefolder2/logstash/pattern"]
    match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    document_id => "test"
    index => "testindex"
    action => "update"
  }
  stdout { codec => rubydebug }
}
I have defined my own grok pattern as:
POSTFIX_QUEUEID [0-9A-F]{10,11}
When I run the Logstash instance, the data is successfully sent to Elasticsearch.
Now I have the index stored in Elasticsearch under testindex, but when I use curl -X GET "localhost:9200/testindex" I get the following output:
{
  "depositorypayin" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1547795277865",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "5TKW2BfDS66cuoHPe8k5lg",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "depositorypayin"
      }
    }
  }
}
This is not what is stored inside the index. I want to query the documents inside the index. Please help. (PS: please forgive me for the typos)
The API you used above only returns information about the index itself (docs here). You need to use the Query DSL to search the documents. The following Match All Query will return all the documents in the index testindex:
curl -X GET "localhost:9200/testindex/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}
'
Actually, I have edited my config file, which looks like this now:
input {
  . . .
}
filter {
  . . .
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "testindex"
  }
}
And now I am able to fetch the data from Elasticsearch using
curl 'localhost:9200/testindex/_search'
I don't know how it works, but it does now.
Can anyone explain why?
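A likely explanation, not part of the original thread: in the first config every event was sent with document_id => "test" and action => "update", so every event targeted the same _id, and an update against a document that does not exist yet fails instead of creating it. Dropping those options falls back to action => "index" with auto-generated ids, so every event becomes its own document. A sketch of the relevant part:
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "testindex"
    # the original config also had:
    #   document_id => "test"   # every event targets the same _id
    #   action => "update"      # an update fails when that _id does not exist yet
    # doc_as_upsert => true would be needed for update to create missing documents
  }
}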

Uncooperative ELK Docker Instance

I have ELK 5.5.1 running in a Docker container, and it'll parse most of my logs, except for the ones that originate from my Spring application. I'm kinda running out of ideas.
I've traced it down to the logstash->elasticsearch pipeline. Filebeat is doing its job, and Logstash is receiving logs from the application in question, based on tailing Logstash's stdout log.
I wiped the docker volume that stores my ELK data clean, and started fresh with filebeat just forwarding the logs in question.
Take a log line like this:
FINEST|8384/0|Service tsoft_spring|17-08-31 14:12:01|2017-08-31 14:12:01.260 INFO 8384 --- [ taskExecutor-2] c.t.s.c.s.a.ConfirmationService : Will not persist empty response notes
Using a very minimal Logstash configuration, it'll wind up being persisted in Elasticsearch:
input {
  beats {
    port => 5044
    ssl => false
  }
}
filter {
  if [message] =~ /tsoft_spring/ {
    grok {
      match => [ "message", "%{GREEDYDATA:logmessage}" ]
    }
  }
}
output {
  stdout { }
  elasticsearch { hosts => ["localhost:9200"] }
}
Using a more complete configuration, the log is just ignored by Elasticsearch, with no _grokparsefailure and no _dateparsefailure tags:
input {
  beats {
    port => 5044
    ssl => false
  }
}
filter {
  if [message] =~ /tsoft_spring/ {
    grok {
      match => [ "message", "%{WORD}\|%{NUMBER}/%{NUMBER}\|%{WORD}%{SPACE}%{WORD}\|%{TIMESTAMP_ISO8601:timestamp}\|%{TIMESTAMP_ISO8601}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}%{NUMBER:pid}%{SPACE}---%{SPACE}%{SYSLOG5424SD:threadname}%{SPACE}%{JAVACLASS:classname}%{SPACE}:%{SPACE}%{GREEDYDATA:logmessage}" ]
    }
    date {
      match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss" ]
    }
  }
}
output {
  stdout { }
  elasticsearch { hosts => ["localhost:9200"] }
}
I've checked that this pattern will parse that line, using http://grokconstructor.appspot.com/do/match#result, and I could've sworn it was working last weekend, but could be my imagination.
Maybe the problem here is not in your grok filter, but in the date match. The resulting year is 0017 instead of 2017. Maybe that's why you can't find the event in ES? Can you try this:
date {
  match => [ "timestamp" , "yy-MM-dd HH:mm:ss" ]
}
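An alternative, not from the original answer, is to capture the second timestamp in the line (the one with a four-digit year and milliseconds) instead of the first, and keep a full yyyy date pattern; the rest of the grok expression is assumed unchanged:
grok {
  # capture the second, four-digit-year timestamp instead of the first one
  match => [ "message", "%{WORD}\|%{NUMBER}/%{NUMBER}\|%{WORD}%{SPACE}%{WORD}\|%{TIMESTAMP_ISO8601}\|%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}%{NUMBER:pid}%{SPACE}---%{SPACE}%{SYSLOG5424SD:threadname}%{SPACE}%{JAVACLASS:classname}%{SPACE}:%{SPACE}%{GREEDYDATA:logmessage}" ]
}
date {
  match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
}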

How to filter {"foo":"bar", "bar": "foo"} with grok to get only the foo field?

I copied
{"name":"myapp","hostname":"banana.local","pid":40161,"level":30,"msg":"hi","time":"2013-01-04T18:46:23.851Z","v":0}
from https://github.com/trentm/node-bunyan and saved it as my logs.json. I am trying to import only two fields (name and msg) to Elasticsearch via Logstash. The problem is that I depend on a sort of filter that I have not been able to put together. I have successfully imported such a line as a single message, but that is certainly not enough in my real case.
That said, how can I import only name and msg to Elasticsearch? I tested several alternatives using http://grokdebug.herokuapp.com/ to reach a useful filter, with no success at all.
For instance, %{GREEDYDATA:message} will bring in the entire line as a single message, but how do I split it and ignore all fields other than name and msg?
In the end, I am planning to use this:
input {
  file {
    type => "my_type"
    path => [ "/home/logs/logs.log" ]
    codec => "json"
  }
}
filter {
  grok {
    match => { "message" => "data=%{GREEDYDATA:request}" }
  }
  #### some extra lines here probably
}
output {
  elasticsearch {
    codec => json
    hosts => "http://127.0.0.1:9200"
    index => "indextest"
  }
  stdout { codec => rubydebug }
}
I have just gone through the list of available Logstash filters. The prune filter should match your need.
Assuming you have installed the prune filter, your config file should look like this:
input {
  file {
    type => "my_type"
    path => [ "/home/logs/logs.log" ]
    codec => "json"
  }
}
filter {
  prune {
    whitelist_names => [
      "@timestamp",
      "type",
      "name",
      "msg"
    ]
  }
}
output {
  elasticsearch {
    codec => json
    hosts => "http://127.0.0.1:9200"
    index => "indextest"
  }
  stdout { codec => rubydebug }
}
Please note that you will want to keep type so that Elasticsearch indexes the document into the correct type. @timestamp is required if you want to view the data in Kibana.
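For illustration only, not from the original answer: with the sample Bunyan line above, the pruned event would look roughly like the rubydebug output below (the @timestamp value is illustrative; by default it is the processing time, unless a date filter maps it from the time field):
{
    "@timestamp" => 2013-01-04T18:46:23.851Z,
          "type" => "my_type",
          "name" => "myapp",
           "msg" => "hi"
}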
