Ingesting data from Spark to Elasticsearch with index template - elasticsearch

In our existing design we use Logstash to fetch JSON data from Kafka and put it into Elasticsearch.
We also apply an index template mapping when Logstash inserts data into ES. This is done by setting the 'template' property of Logstash's Elasticsearch output plugin, e.g.,
output {
elasticsearch {
template => "elasticsearch-template.json" # template file path
hosts => "localhost:9200"
template_overwrite => true
manage_template => true
codec => plain
}
}
elasticsearch-template.json looks like this:
{
"template" : "logstash-*",
"settings" : {
"index.refresh_interval" : "3s"
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true},
"dynamic_templates" : [ {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256, "doc_values":true}
}
}
}
} ],
"properties" : {
"@version": { "type": "string", "index": "not_analyzed" },
"geoip" : {
"type" : "object",
"dynamic": true,
"properties" : {
"location" : { "type" : "geo_point" }
}
}
}
}
}
}
Now we are going to replace Logstash with Apache Spark, and I want to use an index template in a similar way when inserting data into ES from Spark.
I am using the elasticsearch-spark_2.11 library for this implementation.
Thanks.
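One possible approach, offered as a sketch rather than a confirmed answer: index templates are stored in Elasticsearch itself and are applied when a matching index is created, so the template could be installed once through the REST API before the Spark job writes anything. The example below only builds the PUT request for the _template endpoint and does not send it. The template name "logstash" and the trimmed-down template body are assumptions for illustration; in practice the body would be loaded from elasticsearch-template.json.

```python
import json
import urllib.request


def build_put_template_request(es_url, template_name, template_body):
    """Build (but do not send) a PUT request for the _template endpoint."""
    payload = json.dumps(template_body).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_template/{template_name}",
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical, trimmed-down template; load the real JSON file in practice.
template = {
    "template": "logstash-*",
    "settings": {"index.refresh_interval": "3s"},
}

request = build_put_template_request("http://localhost:9200", "logstash", template)
print(request.get_method(), request.full_url)

# To actually install it, run this once before the Spark job starts:
# urllib.request.urlopen(request)
```

With the template in place, any index the Spark job creates whose name matches logstash-* should pick up the mappings without further configuration on the Spark side.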

Related

Grafana 4 Templating with Elasticsearch 5

Edit: See below for the solution
Currently having an issue with the templating in Grafana - trying to get a dropdown of hostnames from some data I'm feeding in to Elasticsearch via Logstash's Graphite plugin, so I can build a dynamic template in Grafana.
Versions are
Grafana 4.1.2 + Elasticsearch/Logstash 5.2.1
The terms query in Grafana I'm trying to use is as follows as per docs on grafana website - http://docs.grafana.org/features/datasources/elasticsearch/ :
{"find": "terms", "field": "host_name"}
This works fine if the field is a numeric type, e.g. I get results in the template for metric_value, but it doesn't seem to work for text/string fields. I'm wondering whether this is due to the way I'm constructing or ingesting the fields. You can see below how I'm trying to achieve this. Note: I've tried both "keyword" and "text" types for these fields, and neither seems to work.
This is the Logstash input and filter configuration that I'm using, which basically tries to split the Graphite-style metric into separate fields:
input {
graphite {
type => graphite
port => 2003
id => "graphite_input"
}
}
filter {
if [type] == "graphite" {
grok {
match => [ "message", "\Aicinga2\.%{MONGO_WORDDASH:host_name:keyword}\.%{WORD:metric_type:keyword}\.%{NOTSPACE:metric_name:keyword}\.value%{SPACE}%{NUMBER:metric_value:float}%{SPACE}%{POSINT:timestamp:date}" ]
}
}
}
output {
if [type] == "graphite" {
elasticsearch {
index => "graphite-%{+YYYY.MM}"
hosts => ["localhost"]
}
}
}
And an example document I'm indexing (taken from Kibana):
{
"_index": "graphite-2017.02",
"_type": "graphite",
"_id": "XYZdflksdf",
"_score": null,
"_source": {
"@timestamp": "2017-02-21T00:17:16.000Z",
"metric_name": "interface-eth0.snmp-interface.perfdata.eth0_in_discard",
"port": 37694,
"icinga2.XXXYYY.services.interface-eth0.snmp-interface.perfdata.eth0_in_discard.value": 357237,
"@version": "1",
"host": "192.168.1.1",
"metric_type": "services",
"metric_value": 357237,
"message": "icinga2.XXXYYY.services.interface-eth0.snmp-interface.perfdata.eth0_in_discard.value 357237 1487636236",
"type": "graphite",
"host_name": "XXXYYY",
"timestamp": "1487636236"
},
"fields": {
"@timestamp": [
1487636236000
]
},
"sort": [
1487636236000
]
}
I have now solved this problem myself. String fields must be defined as not_analyzed in order to appear in the Grafana dashboard.
Here's an example template you can use.
Note: you'll have to install it manually; Logstash doesn't seem to install it into Elasticsearch for some reason (maybe a bug?).
Install it like so (assuming the path is /etc/logstash/graphite-new.json):
curl -XPUT 'http://localhost:9200/_template/graphite-*' -d@/etc/logstash/graphite-new.json
Template:
{
"template" : "graphite-*",
"settings" : { "index.refresh_interval" : "60s" },
"mappings" : {
"_default_" : {
"_all" : { "enabled" : false },
"dynamic_templates" : [{
"message_field" : {
"match" : "message",
"match_mapping_type" : "string",
"mapping" : { "type" : "string", "index" : "not_analyzed" }
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : { "type" : "string", "index" : "not_analyzed" }
}
}],
"properties" : {
"@timestamp" : { "type" : "date", "format" : "dateOptionalTime" },
"@version" : { "type" : "integer", "index" : "not_analyzed" },
"metric_name" : { "type" : "string", "index" : "not_analyzed" },
"host" : { "type" : "string", "index" : "not_analyzed" },
"host_name" : { "type" : "string", "index" : "not_analyzed" },
"metric_type" : { "type" : "string", "index" : "not_analyzed" }
}
}
}
}
I've still got the template referenced in the Logstash output as well:
if [type] == "graphite" {
elasticsearch {
index => "graphite-%{+YYYY.MM}"
hosts => ["localhost"]
template => "/etc/logstash/graphite-new.json"
}
}
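As a quick sanity check on the point above, the following sketch scans a "properties" mapping and lists the string fields that are not marked not_analyzed, which by the reasoning in this answer would be the ones missing from the Grafana dropdown. The sample mapping is a cut-down illustration, and the analyzed "message" entry is a hypothetical field added only to show a positive hit.

```python
def analyzed_string_fields(properties):
    """Return the names of string fields that are not marked not_analyzed."""
    return sorted(
        name
        for name, spec in properties.items()
        if spec.get("type") == "string" and spec.get("index") != "not_analyzed"
    )


properties = {
    "host_name": {"type": "string", "index": "not_analyzed"},
    "metric_type": {"type": "string", "index": "not_analyzed"},
    "message": {"type": "string"},  # hypothetical analyzed field for illustration
    "metric_value": {"type": "float"},
}

print(analyzed_string_fields(properties))  # ['message']
```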

Logstash elastic search output custom template not working

My logstash config is something like the following
if "user" in [tags] {
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "user-%{+YYYY.MM.dd}"
template => '/path/to/elastic-template.json'
flush_size => 50
}
}
And the JSON template contains these lines:
"fields" : {
"{name}" : {"type": "string", "index" : "analyzed", "omit_norms" : true, "index_options" : "docs"},
"{name}.raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
}
So I assume the .raw field can be used when searching or building visualizations.
However, after I removed the existing index and rebuilt it, I can see the data, but I still cannot find the .raw field in Kibana's Settings, Discover, or Visualize pages.
How do I use the .raw field?
The template you posted isn't even valid JSON. If you want to apply a raw field as in not_analyzed you have to do it like this:
"action" : {
"type" : "string",
"fields" : {
"raw" : {
"index" : "not_analyzed",
"type" : "string"
}
}
}
This will create an action.raw field.
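To illustrate both points, here is a small sketch that builds the multi-field mapping shown above for an arbitrary field name and checks whether a piece of text parses as JSON. The helper names with_raw_subfield and is_valid_json are made up for this example, and the bare fragment passed to is_valid_json is a simplified stand-in for the lines quoted in the question.

```python
import json


def with_raw_subfield(field_name):
    """Build a string mapping with a not_analyzed 'raw' sub-field."""
    return {
        field_name: {
            "type": "string",
            "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
        }
    }


def is_valid_json(text):
    """Return True if the text parses as a complete JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


mapping = with_raw_subfield("action")
print(json.dumps(mapping, indent=2))

# A bare key/value fragment without enclosing braces is not valid JSON.
print(is_valid_json('"fields" : {"raw": {}}'))  # False
print(is_valid_json(json.dumps(mapping)))       # True
```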
I encountered the same issue.
I used ES 5.5.1 and Logstash 5.5.1; below is my template file:
{
"template": "access_log",
"settings": {
"index.refresh_interval" : "5s"
},
"mappings": {
"log": {
"properties":{
"geoip":{
"properties":{
"location" : {
"type" : "geo_point",
"index": "false"
}
}
}
}
}
}
}

Logstash issues in creating index remove .raw field in kibana

I have written a Logstash conf file for reading logs. If I use the default index, that is logstash-*, I can see the .raw field in Kibana. However, if I set a new index in the Logstash conf file like
output{
elasticsearch {
hosts => "localhost"
index => "batchjob-*"}
}
then the .raw field is not created for the new index. Is there a way to solve this? Many thanks.
The raw fields are created by a specific index template that the Logstash elasticsearch output creates in Elasticsearch.
What you can do is simply copy that template to a file named batchjob.json and change the template name to batchjob-* (see below)
{
"template" : "batchjob-*",
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true, "omit_norms" : true},
"dynamic_templates" : [ {
"message_field" : {
"match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "disabled" }
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "disabled" },
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
}
}
}
} ],
"properties" : {
"@timestamp": { "type": "date" },
"@version": { "type": "string", "index": "not_analyzed" },
"geoip" : {
"dynamic": true,
"properties" : {
"ip": { "type": "ip" },
"location" : { "type" : "geo_point" },
"latitude" : { "type" : "float" },
"longitude" : { "type" : "float" }
}
}
}
}
}
}
Then you can modify your elasticsearch output like this:
output {
elasticsearch {
hosts => "localhost"
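index => "batchjob-%{+YYYY.MM.dd}" # "*" is not allowed in an index name; a dated name still matches the batchjob-* template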
template_name => "batchjob"
template => "/path/to/batchjob.json"
}
}
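The copy-and-rename step can also be scripted. The sketch below takes an existing template as a dictionary, returns a copy with a different index pattern, and uses a glob match to check that a concrete index name would fall under the new pattern. The cut-down default template and the sample index name are assumptions for illustration only.

```python
import copy
import fnmatch


def retarget_template(template, new_pattern):
    """Return a copy of the template that applies to a different index pattern."""
    derived = copy.deepcopy(template)
    derived["template"] = new_pattern
    return derived


# Hypothetical, cut-down version of the default Logstash template.
default_template = {
    "template": "logstash-*",
    "settings": {"index.refresh_interval": "5s"},
}

batchjob_template = retarget_template(default_template, "batchjob-*")
print(batchjob_template["template"])

# The index Logstash writes to must match the pattern for the template to apply.
print(fnmatch.fnmatchcase("batchjob-2017.02.21", batchjob_template["template"]))
```

Because the helper works on a deep copy, the original template dictionary is left untouched and can be reused for other patterns.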

Is it possible to set custom mapping for index in logstash but not in elasticsearch?

A Logstash config has input, filter, and output sections.
Is it possible to set a custom mapping in the output section:
output
{ elasticsearch {
}
If it is possible, how do I set it?
With this example:
"mappings" : {
"_default_" : {
"properties" : {
"service" : { "type" : "integer" },
"rule" : { "type" : "integer" },
"ICMP Type" : { "type" : "integer" },
"ICMP Code" : { "type" : "integer" },
"ip_offset" : { "type" : "integer" },
"ip_id" : { "type" : "integer" },
"ip_len" : { "type" : "integer" },
"Confidence Level" : { "type" : "integer" },
"fragments_dropped" : { "type" : "integer" },
"Severity" : { "type" : "integer" },
"serial_num" : { "type" : "integer" },
"during_sec" : { "type" : "integer" },
"Attack info" : {"type": "string", "index" : "not_analyzed" },
"peer gateway" : {"type": "string", "index" : "not_analyzed" }
Logstash comes with a default template that is used when writing documents to elasticsearch.
If you'd like to change the default, you can update your config and pass it the location of a template file.
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-template
You can use the template and template_overwrite settings like this:
elasticsearch {
template => "/tttttttttttt/elasticsearch-logstash-template.json"
index => "logstash-%{+YYYY.MM.dd}"
cluster=>"cluster"
template_overwrite => true
}
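For the mapping in the question, the template file could be generated rather than written by hand. This is a minimal sketch that wraps a "properties" block in a template body and writes it to disk. Only a few of the question's fields are included, the logstash-* pattern is taken from the index setting above, and the output path in the temp directory is a placeholder to be replaced with wherever the 'template' setting points.

```python
import json
import os
import tempfile


def build_template(index_pattern, properties):
    """Wrap a 'properties' mapping in a legacy-style index template body."""
    return {
        "template": index_pattern,
        "mappings": {"_default_": {"properties": properties}},
    }


# A few of the fields from the question; the rest follow the same shape.
properties = {
    "service": {"type": "integer"},
    "rule": {"type": "integer"},
    "Attack info": {"type": "string", "index": "not_analyzed"},
}

template = build_template("logstash-*", properties)

# Hypothetical output location; point the 'template' setting at this path.
template_path = os.path.join(
    tempfile.gettempdir(), "elasticsearch-logstash-template.json"
)
with open(template_path, "w", encoding="utf-8") as handle:
    json.dump(template, handle, indent=2)

print(template_path)
```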

How to map geoip field in logstash with elasticsearch in order to display it in tile map of Kibana4

I'd like to display geoip fields in tile map of Kibana4.
Using the standard / automatic logstash geoip mapping to elasticsearch it all works fine.
However when creating a non-standard geoip field, I am not quite sure how to customize the elasticsearch-template.json in logstash in order to represent this field correctly in elasticsearch so that it can be chosen in Kibana4 for tile map creation.
Sure, customizing the standard template is not the best way - better create a custom template and point to it in elasticsearch output of logstash.conf. I just quickly wanted to check how the mapping has to be defined, so I modified the standard template.
My logstash.conf:
input {
tcp {
port => 514
type => syslog
}
udp {
port => 514
type => syslog
}
}
filter {
# Standard geoip field is automatically mapped by logstash to
# elastic search by using the elasticsearch-template.json file
geoip { source => "host" }
grok {
match => [
"message", "<%{POSINT:syslog_pri}>%{YEAR} %{SYSLOGTIMESTAMP:syslog_timestamp} %{DATA:device} <%{POSINT:status}> %{WORD:activity} %{DATA:inout} \(%{DATA:msg}\) Src:%{IPV4:src} SPort:%{INT:sport} Dst:%{IPV4:dst} DPort:%{INT:dport} IPP:%{INT:ipp} Rule:%{INT:rule} Interface:%{WORD:iface}",
"message", "<%{POSINT:syslog_pri}>%{YEAR} %{SYSLOGTIMESTAMP:syslog_timestamp} %{DATA:device} <%{POSINT:status}> %{WORD:activity} %{DATA:inout} \(%{DATA:msg}\) Src:%{IPV4:src} Dst:%{IPV4:dst} IPP:%{INT:ipp} Rule:%{INT:rule} Interface:%{WORD:iface}",
"message", "<%{POSINT:syslog_pri}>%{YEAR} %{SYSLOGTIMESTAMP:syslog_timestamp} %{DATA:device} <%{POSINT:status}> %{WORD:activity} %{DATA:inout} \(%{DATA:msg}\) Src:%{IPV4:src} Dst:%{IPV4:dst} Type:%{POSINT:type} Code:%{INT:code} IPP:%{INT:ipp} Rule:%{INT:rule} Interface:%{WORD:iface}"
]
}
# Is not mapped automatically by Logstash, so it cannot be
# chosen in Kibana4 for tile map creation
geoip {
source => "src"
target => "src_geoip"
}
}
output {
elasticsearch {
host => "localhost"
protocol => "http"
}
}
My ...logstash-1.4.2\lib\logstash\outputs\elasticsearch\elasticsearch-template.json:
{
"template" : "logstash-*",
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true},
"dynamic_templates" : [ {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
}
}
}
} ],
"properties" : {
"@version": { "type": "string", "index": "not_analyzed" },
"geoip" : {
"type" : "object",
"dynamic": true,
"path": "full",
"properties" : {
"location" : { "type" : "geo_point" }
}
},
"src_geoip" : {
"type" : "object",
"dynamic": true,
"path": "full",
"properties" : {
"location" : { "type" : "geo_point" }
}
}
}
}
}
}
UPDATE: I haven't figured out yet when this JSON file gets applied in Elasticsearch. I followed the hints outlined in this question and copied the JSON file to a config/templates folder in the Elasticsearch directory. After deleting the indices and restarting Elasticsearch, the template was applied successfully.
Anyway, the field "src_geoip.location" still does not show up in the tile map creation form of Kibana4 (only the standard geoip.location field does).
Try setting template_overwrite after editing the template. Re-create the indexes in Kibana after the config change.
output {
elasticsearch {
template_overwrite => "true"
...
}
}
You also need to add objects for the src_geoip object in the index template on your elasticsearch instance. To set the default template for all indexes that match "logstash-netflow-*", execute the following on your elasticsearch instance:
curl -XPUT localhost:9200/_template/logstash-netflow -d '{
"template" : "logstash-netflow-*",
"mappings" : {
"_default_" : {
"_all" : {
"enabled" : false
},
"properties" : {
"@timestamp" : { "index" : "analyzed", "type" : "date" },
"@version" : { "index" : "analyzed", "type" : "integer" },
"src_geoip" : {
"dynamic" : true,
"type" : "object",
"properties" : {
"area_code" : { "type" : "long" },
"city_name" : { "type" : "string" },
"continent_code" : { "type" : "string" },
"country_code2" : { "type" : "string" },
"country_code3" : { "type" : "string" },
"country_name" : { "type" : "string" },
"dma_code" : { "type" : "long" },
"ip" : { "type" : "string" },
"latitude" : { "type" : "double" },
"location" : { "type" : "geo_point" },
"longitude" : { "type" : "double" },
"postal_code" : { "type" : "string" },
"real_region_name" : { "type" : "string" },
"region_name" : { "type" : "string" },
"timezone" : { "type" : "string" }
}
},
"netflow" : { ....snipped......
}
}
}
}}'
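As far as I understand it, the Kibana tile map only offers fields mapped as geo_point, so the key part for a custom geoip target is the type of its location sub-field. The sketch below adds such an object to an existing template dictionary, mirroring the shape of the standard geoip entry. The helper name add_geo_target and the cut-down template are assumptions for illustration.

```python
import copy


def add_geo_target(template, target, doc_type="_default_"):
    """Return a copy of the template with a geo_point location under 'target'."""
    updated = copy.deepcopy(template)
    properties = updated["mappings"][doc_type].setdefault("properties", {})
    properties[target] = {
        "type": "object",
        "dynamic": True,
        "properties": {"location": {"type": "geo_point"}},
    }
    return updated


# Hypothetical, cut-down template containing only the standard geoip object.
base_template = {
    "template": "logstash-*",
    "mappings": {
        "_default_": {
            "properties": {
                "geoip": {
                    "type": "object",
                    "dynamic": True,
                    "properties": {"location": {"type": "geo_point"}},
                }
            }
        }
    },
}

updated_template = add_geo_target(base_template, "src_geoip")
print(sorted(updated_template["mappings"]["_default_"]["properties"]))
```

Existing indices keep their old mapping, so the change only takes effect for indices created after the updated template is installed.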
