How can I configure a custom field to be aggregatable in Kibana? - elasticsearch

I am new to running the ELK stack. I have Logstash configured to feed my webapp log into Elasticsearch. I am trying to set up a visualization in Kibana that will show the count of unique users, given by the user_email field, which is parsed out of certain log lines.
I am fairly sure that I want to use the Unique Count aggregation, but I can't seem to get Kibana to include user_email in the list of fields which I can aggregate.
Here is my Logstash configuration:
filter {
  if [type] == "wl-proxy-log" {
    grok {
      match => {
        "message" => [
          "(?<syslog_datetime>%{SYSLOGTIMESTAMP}\s+%{YEAR})\s+<%{INT:session_id}>\s+%{DATA:log_message}\s+license=%{WORD:license}\&user=(?<user_email>%{USERNAME}\@%{URIHOST})\&files=%{WORD:files}",
        ]
      }
      break_on_match => true
    }
    date {
      match => [ "syslog_datetime", "MMM dd HH:mm:ss yyyy", "MMM d HH:mm:ss yyyy" ]
      target => "@timestamp"
      locale => "en_US"
      timezone => "America/Los_Angeles"
    }
    kv {
      source => "uri_params"
      field_split => "&?"
    }
  }
}
output {
  elasticsearch {
    ssl => false
    index => "wl-proxy"
    manage_template => false
  }
}
Here is the relevant mapping in Elasticsearch:
{
  "wl-proxy" : {
    "mappings" : {
      "wl-proxy-log" : {
        "user_email" : {
          "full_name" : "user_email",
          "mapping" : {
            "user_email" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }
      }
    }
  }
}
Can anyone tell me what I am missing?
BTW, I am running CentOS with the following versions:
Elasticsearch Version: 6.0.0, Build: 8f0685b/2017-11-10T18:41:22.859Z, JVM: 1.8.0_151
Logstash v.6.0.0
Kibana v.6.0.0
Thanks!

I figured it out. The configuration was correct, AFAICT. The issue was that I simply hadn't refreshed the list of fields in the index in the Kibana UI.
Management -> Index Patterns -> Refresh Field List (the refresh icon)
After doing that, the field began appearing in the list of aggregatable terms, and I was able to create the necessary visualizations.
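If you want to double-check outside of Kibana, a cardinality aggregation against the keyword sub-field is effectively what the Unique Count visualization runs; a minimal sketch, assuming the wl-proxy index and the user_email.keyword multi-field from the mapping above:
curl -X GET "localhost:9200/wl-proxy/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "unique_users": {
      "cardinality": { "field": "user_email.keyword" }
    }
  }
}
'
The analyzed text part of the field is not aggregatable; only the keyword sub-field shows up as aggregatable in the Kibana field list.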

Related

ElasticSearch: populating ip_range type field via logstash

I'm experimenting with the ip_range field type in Elasticsearch 6.8 (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/range.html) and am struggling to find a way to load IP data into the field properly via Logstash.
I was able to load some sample data via Kibana Dev Tools, but cannot figure out a way to do the same via Logstash.
Index definition
PUT test_ip_range
{
  "mappings": {
    "_doc": {
      "properties": {
        "ip_from_to_range": {
          "type": "ip_range"
        },
        "ip_from": {
          "type": "ip"
        },
        "ip_to": {
          "type": "ip"
        }
      }
    }
  }
}
Add sample doc:
PUT test_ip_range/_doc/3
{
  "ip_from_to_range" : {
    "gte" : "<dotted_ip_from>",
    "lte" : "<dotted_ip_to>"
  }
}
Logstash config (reading from DB)
input {
  jdbc {
    ...
    statement => "SELECT ip_from, ip_to, <???> AS ip_from_to_range FROM sample_ip_data"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => "<host>"
    "index" => "test_ip_range"
    "document_type" => "_doc"
  }
}
Question:
How do I get the ip_from and ip_to DB fields into their respective gte and lte parts of ip_from_to_range via the Logstash config?
I know I can also insert the ip range in CIDR notation, but would like to be able to have both options - loading in CIDR notation and loading as a range.
After some trial and error, I finally figured out the Logstash config.
I had posted about a similar issue here, which got me on the right track with the syntax for this use case as well.
input { ... }
filter {
  mutate {
    add_field => {
      "[ip_from_to_range]" => '{
          "gte": "%{ip_from}",
          "lte": "%{ip_to}"
        }'
    }
  }
  json {
    source => "ip_from_to_range"
    target => "ip_from_to_range"
  }
}
output { ... }
Filter parts explained
mutate add_field: creates a new field, [ip_from_to_range], whose value is a JSON string ( '{...}' ). It is important to reference the field as [field_name], otherwise the next step, which parses the string into a JSON object, doesn't work.
json: parses the string representation into a JSON object.
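To verify the range was indexed as expected, a term query with a single address that falls inside the range should match the document. A minimal sketch in Dev Tools syntax; <some_ip_inside_range> is a placeholder for any address between ip_from and ip_to:
GET test_ip_range/_search
{
  "query": {
    "term": {
      "ip_from_to_range": "<some_ip_inside_range>"
    }
  }
}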

Logstash unable to index into elasticsearch because it can't parse date

I am getting a lot of the following errors when running Logstash to index documents into Elasticsearch:
[2019-11-02T18:48:13,812][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"my-index-2019-09-28", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x729fc561>], :response=>{"index"=>{"_index"=>"my-index-2019-09-28", "_type"=>"doc", "_id"=>"BhlNLm4Ba4O_5bsE_PxF", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [timestamp] of type [date] in document with id 'BhlNLm4Ba4O_5bsE_PxF'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2019-09-28 23:32:10.586\" is malformed at \" 23:32:10.586\""}}}}}
It clearly has a problem parsing the date, but I don't see what that problem could be. Below are excerpts from my Logstash config and the Elasticsearch template. I include these because I'm trying to use the timestamp field to build the index name in my Logstash config: I copy timestamp into @timestamp, format that as YYYY-MM-dd, and use the stored metadata to build my index name.
Logstash config:
input {
  stdin { type => stdin }
}
filter {
  csv {
    separator => " " # this is a tab (\t), not just whitespace
    columns => ["timestamp","field1", "field2", ...]
    convert => {
      "timestamp" => "date_time"
      ...
    }
  }
}
filter {
  date {
    match => ["timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS"]
    target => "@timestamp"
  }
}
filter {
  date_formatter {
    source => "@timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}
filter {
  mutate {
    remove_field => [
      "@timestamp",
      ...
    ]
  }
}
output {
  amazon_es {
    hosts => ["my-es-cluster.us-east-1.es.amazonaws.com"]
    index => "my-index-%{[@metadata][date]}"
    template => "my-config.json"
    template_name => "my-index-*"
    region => "us-east-1"
  }
}
Template:
{
  "template" : "my-index-*",
  "mappings" : {
    "doc" : {
      "dynamic" : "false",
      "properties" : {
        "timestamp" : {
          "type" : "date"
        }, ...
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : "12",
      "number_of_replicas" : "0"
    }
  }
}
When I inspect the raw data, it looks just like what the error shows, and that appears to be well formed, so I'm not sure what my issue is.
Here is an example row. It has been redacted, but the problem field is untouched and is the first one:
2019-09-28 07:29:46.454 NA 2019-09-28 07:29:00 someApp 62847957802 62847957802
It turns out the source of the problem was the convert block: Logstash was unable to understand the time format specified in the file. To address this, I renamed the original timestamp field to unformatted_timestamp and applied the date filters I was already using:
filter {
  date {
    match => ["unformatted_timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS"]
    target => "timestamp"
  }
}
filter {
  date_formatter {
    source => "timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}
You are parsing your lines using the csv filter and setting the separator to a space, but your date is also split by a space. This way your first field, named timestamp, only gets the date 2019-09-28, and the time ends up in the field named field1.
You can solve your problem by creating a new field, named date_and_time for example, with the contents of the date and time fields.
csv {
  separator => " "
  columns => ["date","time","field1","field2","field3","field4","field5","field6"]
}
mutate {
  add_field => { "date_and_time" => "%{date} %{time}" }
}
mutate {
  remove_field => ["date","time"]
}
This will create a field named date_and_time with the value 2019-09-28 07:29:46.454. You can now use the date filter to parse this value into the @timestamp field, the default for Logstash.
date {
  match => ["date_and_time", "YYYY-MM-dd HH:mm:ss.SSS"]
}
This will leave you with two fields with the same value, date_and_time and @timestamp. Since @timestamp is the default for Logstash, I would suggest keeping it and removing the date_and_time field that was created before.
mutate {
  remove_field => ["date_and_time"]
}
Now you can create your date-based index using the format YYYY-MM-dd, and Logstash will extract the date from the @timestamp field. Just change the index line in your output to this one:
index => "my-index-%{+YYYY-MM-dd}"

How to get ElasticSearch output?

I want to add my log documents to Elasticsearch and then check the documents in Elasticsearch.
Following is the content of the log file:
Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<20130101142543.5828399CCAF@mailserver14.example.com>
Feb 2 06:25:43 mailserver15 postfix/cleanup[21403]: BEF25A72999: message-id=<20130101142543.5828399CCAF@mailserver15.example.com>
Mar 3 06:25:43 mailserver16 postfix/cleanup[21403]: BEF25A72998: message-id=<20130101142543.5828399CCAF@mailserver16.example.com>
I am able to run my Logstash instance with the following Logstash configuration file:
input {
  file {
    path => "/Myserver/mnt/appln/somefolder/somefolder2/testData/fileValidator-access.LOG"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    patterns_dir => ["/Myserver/mnt/appln/somefolder/somefolder2/logstash/pattern"]
    match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    document_id => "test"
    index => "testindex"
    action => "update"
  }
  stdout { codec => rubydebug }
}
I have defined my own grok pattern as:
POSTFIX_QUEUEID [0-9A-F]{10,11}
When I run the Logstash instance, the data is successfully sent to Elasticsearch.
Now I have the index stored in Elasticsearch under testindex, but when I use curl -X GET "localhost:9200/testindex" I get the following output:
{
  "depositorypayin" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1547795277865",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "5TKW2BfDS66cuoHPe8k5lg",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "depositorypayin"
      }
    }
  }
}
This is not what is stored inside the index. I want to query the documents inside the index. Please help. (PS: please forgive me for the typos)
The API you used above only returns information about the index itself (docs here). You need to use the Query DSL to search the documents. The following Match All Query will return all the documents in the index testindex:
curl -X GET "localhost:9200/testindex/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}
'
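Since your Logstash output sets document_id => "test", every event is written to that same id, so at most one document ends up in the index; a quick way to look at just that document is a search by _id (a minimal sketch, assuming the default port):
curl -X GET "localhost:9200/testindex/_search?q=_id:test"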
Actually, I have edited my config file, which looks like this now:
input {
  . . .
}
filter {
  . . .
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "testindex"
  }
}
And now I am able to fetch the data from Elasticsearch using
curl 'localhost:9200/testindex/_search'
I don't know how it works, but it does now.
Can anyone explain why?

Convert log message timestamp to UTC before storing it in Elasticsearch

I am collecting and parsing Tomcat access-log messages using Logstash, and am storing the parsed messages in Elasticsearch.
I am using Kibana to display the log messages in Elasticsearch.
Currently I am using Elasticsearch 2.0.0, Logstash 2.0.0, and Kibana 4.2.1.
An access-log line looks something like the following:
02-08-2016 19:49:30.669 ip=11.22.333.444 status=200 tenant=908663983 user=0a4ac75477ed42cfb37dbc4e3f51b4d2 correlationId=RID-54082b02-4955-4ce9-866a-a92058297d81 request="GET /pwa/rest/908663983/rms/SampleDataDeployment HTTP/1.1" userType=Apache-HttpClient requestInfo=- duration=4 bytes=2548 thread=http-nio-8080-exec-5 service=rms itemType=SampleDataDeployment itemOperation=READ dataLayer=MongoDB incomingItemCnt=0 outgoingItemCnt=7
The time displayed in the log file (ex. 02-08-2016 19:49:30.669) is in local time (not UTC!)
Here is how I parse the message line:
filter {
  grok {
    match => { "message" => "%{DATESTAMP:logTimestamp}\s+" }
  }
  kv {}
  mutate {
    convert => { "duration" => "integer" }
    convert => { "bytes" => "integer" }
    convert => { "status" => "integer" }
    convert => { "incomingItemCnt" => "integer" }
    convert => { "outgoingItemCnt" => "integer" }
    gsub => [ "message", "\r", "" ]
  }
  grok {
    match => { "request" => [ "(?:%{WORD:method} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpVersion})?)" ] }
    overwrite => [ "request" ]
  }
}
I would like Logstash to convert the time read from the log message ('logTimestamp' field) into UTC before storing it in Elasticsearch.
Can someone assist me with that please?
--
I have added the date filter to my processing, but I had to add a timezone.
filter {
  grok {
    match => { "message" => "%{DATESTAMP:logTimestamp}\s+" }
  }
  date {
    match => [ "logTimestamp", "MM-dd-yyyy HH:mm:ss.SSS" ]
    timezone => "Asia/Jerusalem"
    target => "logTimestamp"
  }
  ...
}
Is there a way to convert the date to UTC without supplying the local timezone, such that Logstash takes the timezone of the machine it is running on?
The motivation behind this question is I would like to use the same configuration file in all my deployments, in various timezones.
That's what the date{} filter is for: to parse a string field containing a date string and replace the [@timestamp] field with that value in UTC.
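On the follow-up about omitting the timezone: when the timezone option is left out, the date filter falls back to the platform default of the machine Logstash runs on. A minimal sketch, reusing the logTimestamp field from the question and assuming the month-first pattern from its edit:
date {
  match  => [ "logTimestamp", "MM-dd-yyyy HH:mm:ss.SSS" ]
  target => "logTimestamp"
  # no timezone option => the local timezone of the Logstash host is used
}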
This can also be done in an ingest processor as follows:
PUT _ingest/pipeline/chage_local_time_to_iso
{
  "processors": [
    {
      "date" : {
        "field" : "my_time",
        "target_field": "my_time",
        "formats" : ["dd/MM/yyyy HH:mm:ss"],
        "timezone" : "Europe/Madrid"
      }
    }
  ]
}
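As a usage sketch (the index name and sample value are made up, and this assumes an Elasticsearch version with ingest pipelines, i.e. 5.0 or later), indexing with the pipeline parameter runs the document through the date processor on the way in:
PUT my-index/_doc/1?pipeline=chage_local_time_to_iso
{
  "my_time": "21/11/2019 18:30:00"
}
You can also test the pipeline without indexing anything via POST _ingest/pipeline/chage_local_time_to_iso/_simulate with a docs array of sample _source documents.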

How to stop logstash from creating a default mapping in ElasticSearch

I am using logstash to feed logs into ElasticSearch.
I am configuring logstash output as:
input {
  file {
    path => "/tmp/foo.log"
    codec => plain {
      format => "%{message}"
    }
  }
}
output {
  elasticsearch {
    #host => localhost
    codec => json {}
    manage_template => false
    index => "4glogs"
  }
}
I notice that, as soon as I start Logstash, it creates a mapping (logs) in ES, as shown below.
{
  "4glogs": {
    "mappings": {
      "logs": {
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "@version": {
            "type": "string"
          },
          "message": {
            "type": "string"
          }
        }
      }
    }
  }
}
How can I prevent logstash from creating this mapping ?
UPDATE:
I have now resolved this error too. "object mapping for [logs] tried to parse as object, but got EOF, has a concrete value been provided to it?"
As John Petrone has stated below, once you define a mapping, you have to ensure that your documents conform to the mapping. In my case, I had defined a mapping of "type: nested" but the output from logstash was a string.
So I removed all codecs ( whether json or plain ) from my logstash config and that allowed the json document to pass through without changes.
Here is my new logstash config ( with some additional filters for multiline logs ).
input {
  kafka {
    zk_connect => "localhost:2181"
    group_id => "logstash_group"
    topic_id => "platform-logger"
    reset_beginning => false
    consumer_threads => 1
    queue_size => 2000
    consumer_id => "logstash-1"
    fetch_message_max_bytes => 1048576
  }
  file {
    path => "/tmp/foo.log"
  }
}
filter {
  multiline {
    pattern => "^\s"
    what => "previous"
  }
  multiline {
    pattern => "[0-9]+$"
    what => "previous"
  }
  multiline {
    pattern => "^$"
    what => "previous"
  }
  mutate {
    remove_field => ["kafka"]
    remove_field => ["@version"]
    remove_field => ["@timestamp"]
    remove_tag => ["multiline"]
  }
}
output {
  elasticsearch {
    manage_template => false
    index => "4glogs"
  }
}
You will need a mapping to store data in Elasticsearch and to search on it - that's how ES knows how to index and search those content types. You can either let logstash create it dynamically or you can prevent it from doing so and instead create it manually.
Keep in mind you cannot change existing mappings (although you can add to them). So first off you will need to delete the existing index. You would then modify your settings to prevent dynamic mapping creation. At the same time you will want to create your own mapping.
For example, this will create the mappings for the logstash data but also restrict any dynamic mapping creation via "strict":
$ curl -XPUT 'http://localhost:9200/4glogs/logs/_mapping' -d '
{
  "logs" : {
    "dynamic": "strict",
    "properties" : {
      "@timestamp": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "@version": {
        "type": "string"
      },
      "message": {
        "type": "string"
      }
    }
  }
}
'
Keep in mind that the index name "4glogs" and the type "logs" need to match what is coming from logstash.
For my production systems I generally prefer to turn off dynamic mapping as it avoids accidental mapping creation.
The following links should be useful if you want to make adjustments to your dynamic mappings:
https://www.elastic.co/guide/en/elasticsearch/guide/current/dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/dynamic-mapping.html
logs in this case is the index_type. If you don't want to create it as logs, specify some other index_type on your elasticsearch element. Every record in elasticsearch is required to have an index and a type. Logstash defaults to logs if you haven't specified it.
There's always an implicit mapping created when you insert records into Elasticsearch, so you can't prevent it from being created. You can create the mapping yourself before you insert anything (via say a template mapping).
The setting manage_template of false just prevents it from creating the template mapping for the index you've specified. You can delete the existing template if it's already been created by using something like curl -XDELETE http://localhost:9200/_template/logstash?pretty
Index templates can help you. Please see this jira for more details. You can create index templates with wildcard support to match an index name and put your default mappings.
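For illustration, a minimal index template sketch along those lines, in the same old-style syntax as the mapping above (the template name is made up, and turning off dynamic mapping via "strict" is an assumption borrowed from the earlier answer):
curl -XPUT 'http://localhost:9200/_template/4glogs_template' -d '
{
  "template": "4glogs*",
  "mappings": {
    "logs": {
      "dynamic": "strict",
      "properties": {
        "@timestamp": { "type": "date", "format": "dateOptionalTime" },
        "@version": { "type": "string" },
        "message": { "type": "string" }
      }
    }
  }
}
'
Any index whose name matches 4glogs* will then pick up this mapping when it is created, which pairs well with manage_template => false in the Logstash output.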
