Logstash - how to change field types - elasticsearch

Hello guys I am new to Elasticsearch and I have a table called 'american_football_sacks_against_stats'. It consists of three columns.
id
sacks_against_total
sacks_against_yards
1
12
5
2
15
3
...
...
...
The problem is that sacks_against_total and sacks_against_yards aren't 'imported' as integers/longs/floats whatsoever but as a text field and a keyword field. How can I convert them into numbers?
I tried this but its not working:
mutate {
convert => {
"id" => "integer"
"sacks_against_total" => "integer"
"sacks_against_yards" => "integer"
}
}
This is my logstash.conf file:
input {
jdbc {
jdbc_connection_string => "jdbc:postgresql://localhost:5432/sportsdb"
jdbc_user => "user"
jdbc_password => "password"
jdbc_driver_class => "org.postgresql.Driver"
schedule => "*/5 * * * *"
statement => "SELECT * FROM public.american_football_sacks_against_stats"
jdbc_paging_enabled => "true"
jdbc_page_size => "300"
}
}
filter {
mutate {
convert => {
"id" => "integer"
"sacks_against_total" => "integer"
"sacks_against_yards" => "integer"
}
}
}
output {
stdout { codec => "json" }
elasticsearch {
hosts => "http://localhost:9200"
index => "sportsdb"
doc_as_upsert => true #
}
}

This is the solution I was looking for:
https://linuxhint.com/elasticsearch-reindex-change-field-type/
To start you create a input from your index and change the types.
PUT _ingest/pipeline/convert_pipeline
{
“description”: “converts the field sacks_against_total,sacks_against_yards fields to a long from string”,
"processors" : [
{
"convert" : {
"field" : "sacks_against_yards",
"type": "long"
}
},
{
"convert" : {
"field" : "sacks_against_total",
"type": "long"
}
}
]
}
Or in cURL:
curl -XPUT "http://localhost:9200/_ingest/pipeline/convert_pipeline" -H 'Content-Type: application/json' -d'{ "description": "converts the sacks_against_total field to a long from string", "processors" : [ { "convert" : { "field" : "sacks_against_total", "type": "long" } }, {"convert" : { "field" : "sacks_against_yards", "type": "long" } } ]}'
And then reindexing the index into another one
POST _reindex
{
“source”: {
"index": "american_football_sacks_against_stats"
},
"dest": {
"index": "american_football_sacks_against_stats_withLong",
"pipeline": "convert_pipeline"
}
}
Or in cURL:
curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d'{ "source": { "index": "sportsdb" }, "dest": { "index": "sportsdb_finish", "pipeline": "convert_pipeline" }}'

Elasticsearch does not provide the functionality to change types for existing fields.
You can read here for some options:
https://medium.com/#max.borysov/change-field-type-in-elasticsearch-index-2d11bb366517

Related

Replica and shard settings not applied in elasticsearch template

I've added a template like this:
curl -X PUT "e.f.g.h:9200/_template/impression-template" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["impression-%{+YYYY.MM.dd}"],
"settings": {
"number_of_shards": 2,
"number_of_replicas": 2
},
"mappings": {
"_doc": {
"_source": {
"enabled": false
},
"dynamic": false,
"properties": {
"message": {
"type": "object",
"properties": {
...
And I've logstash instance that read events from kafka on write them to ES. Here is my logstash config:
input {
kafka {
topics => ["impression"]
bootstrap_servers => "a.b.c.d:9092"
}
}
filter {
json {
source => "message"
target => "message"
}
}
output {
elasticsearch {
hosts => ["e.f.g.h:9200"]
index => "impression-%{+YYYY.MM.dd}"
template_name => "impression-template"
}
}
But each day I get index with 5 shard and 1 replica (which is default config of ES). How I could fix that so I could get 2 replica and 2 shard?
Not sure you can add index_pattern as my_index-%{+YYYY.MM.dd}, because when you create it and PUT my_index-2019.03.10 it will have empty mapping because it's not recognized. I had same issue, and workaround for this was to set index_pattern as my_index-* and add year suffix to indices which should look like my_index-2017, my_index-2018...
{
"my_index_template" : {
"order" : 0,
"index_patterns" : [
"my_index-*"
],
"settings" : {
"index" : {
"number_of_shards" : "5",
"number_of_replicas" : "1"
}
},...
I took year part from timestamp field (YYYY-MM-dd) to generate year and add it to the end of index name by logstash
grok {
match => [
"timestamp", "(?<index_year>%{YEAR})"
]
}
mutate {
add_field => {
"[#metadata][index_year]" => "%{index_year}"
}
}
mutate {
remove_field => [ "index_year", "#version" ]
}
}
output{
elasticsearch {
hosts => ["localhost:9200"]
index => "my_index-%{[#metadata][index_year]}"
document_id => "%{some_field}"
}
}
After logstash was completed, I've managed to get my_index-2017, my_index-2018 and my_index-2019 indices with 5 shards, and 1 replica and correct mapping as I predefined in my template.

geo_point in Elastic

I'm trying to map a latitude and longitude to a geo_point in Elastic.
Here's my log file entry:
13-01-2017 ORDER COMPLETE: £22.00 Glasgow, 55.856299, -4.258845
And here's my conf file
input {
file {
path => "/opt/logs/orders.log"
start_position => "beginning"
}
}
filter {
grok {
match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z ]+)"}
}
mutate {
convert => { "order_amount" => "float" }
convert => { "order_lat" => "float" }
convert => { "order_long" => "float" }
rename => {
"order_long" => "[location][lon]"
"order_lat" => "[location][lat]"
}
}
}
output {
elasticsearch {
hosts => "localhost"
index => "sales"
document_type => "order"
}
stdout {}
}
I start logstash with /bin/logstash -f orders.conf and this gives:
"#version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true,
"properties"=>{"ip"=>{"type"=>"ip"},
"location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"},
"longitude"=>{"type"=>"half_float"}}}}}}}}
See? It's seeing location as a geo_point. Yet GET sales/_mapping results in this:
"location": {
"properties": {
"lat": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lon": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
Update
Each time I reindex, I stop logstash thenremove the .sincedb from /opt/logstash/data/plugins/inputs/file.... I have also made a brand new log file and I increment the index each time (I'm currently up to sales7).
conf file
input {
file {
path => "/opt/ag-created/logs/orders2.log"
start_position => "beginning"
}
}
filter {
grok {
match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z ]+), (?<order_lat>[0-9.]+), (?<order_long>[-0-9.]+)( - (?<order_failure_reason>[A-Za-z :]+))?" }
}
mutate {
convert => { "order_amount" => "float" }
}
mutate {
convert => { "order_lat" => "float" }
}
mutate {
convert => { "order_long" => "float" }
}
mutate {
rename => { "order_long" => "[location][lon]" }
}
mutate {
rename => { "order_lat" => "[location][lat]" }
}
}
output {
elasticsearch {
hosts => "localhost"
index => "sales7"
document_type => "order"
template_name => "myindex"
template => "/tmp/templates/custom-orders2.json"
template_overwrite => true
}
stdout {}
}
JSON file
{
"template": "sales7",
"settings": {
"index.refresh_interval": "5s"
},
"mappings": {
"sales": {
"_source": {
"enabled": false
},
"properties": {
"location": {
"type": "geo_point"
}
}
}
},
"aliases": {}
}
index => "sales7"
document_type => "order"
template_name => "myindex"
template => "/tmp/templates/custom-orders.json"
template_overwrite => true
}
stdout {}
}
Interestingly, when the geo_point mapping doesn't work (ie. both lat and long are floats), my data is indexed (30 rows). But when the location is correctly made into a geo_point, none of my rows are indexed.
There is two way to do this. First one is creating a template for your mapping to create a correct mapping while indexing you data. Because Elasticseach does not understand what your data type is. You should say it theses things like below.
Firstly, create a template.json file for your mapping structure:
{
"template": "sales*",
"settings": {
"index.refresh_interval": "5s"
},
"mappings": {
"sales": {
"_source": {
"enabled": false
},
"properties": {
"location": {
"type": "geo_point"
}
}
}
},
"aliases": {}
}
After that change your logstash configuration to put this mapping your index :
input {
file {
path => "/opt/logs/orders.log"
start_position => "beginning"
}
}
filter {
grok {
match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z ]+)"}
}
mutate {
convert => { "order_amount" => "float" }
convert => { "order_lat" => "float" }
convert => { "order_long" => "float" }
rename => {
"order_long" => "[location][lon]"
"order_lat" => "[location][lat]"
}
}
}
output {
elasticsearch {
hosts => "localhost"
index => "sales"
document_type => "order"
template_name => "myindex"
template => "/etc/logstash/conf.d/template.json"
template_overwrite => true
}
stdout {}
}
Second option is ingest node feature. I will update my answer for this option but now you can check my dockerized repository. At this example, I used ingest node feature instead of template while parsing location data.

ELK - Kibana doesn't recognize geo_point field

I'm trying to create a Tile map on Kibana, with GEO location points.
For some reason, When I'm trying to create the map, I get the following message on Kibana:
No Compatible Fields: The "logs" index pattern does not contain any of
the following field types: geo_point
My settings:
Logstash (version 2.3.1):
filter {
grok {
match => {
"message" => "MY PATTERN"
}
}
geoip {
source => "ip"
target => "geoip"
add_field => [ "location", "%{[geoip][latitude]}, %{[geoip][longitude]}" ] #added this extra field in case the nested field is the problem
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
hosts => ["localhost:9200"]
index => "logs"
}
}
When log input arrives, I can see it parse it as should and I do get the geoIp data for a given IP:
"geoip" => {
"ip" => "XXX.XXX.XXX.XXX",
"country_code2" => "XX",
"country_code3" => "XXX",
"country_name" => "XXXXXX",
"continent_code" => "XX",
"region_name" => "XX",
"city_name" => "XXXXX",
"latitude" => XX.0667,
"longitude" => XX.766699999999986,
"timezone" => "XXXXXX",
"real_region_name" => "XXXXXX",
"location" => [
[0] XX.766699999999986,
[1] XX.0667
]
},
"location" => "XX.0667, XX.766699999999986"
ElasticSearch (version 2.3.1):
GET /logs/_mapping returns:
{
"logs": {
"mappings": {
"logs": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
"geoip": {
"properties": {
.
.
.
"latitude": {
"type": "double"
},
"location": {
"type": "geo_point"
},
"longitude": {
"type": "double"
}
}
},
"location": {
"type": "geo_point"
}
}
}
}
}
}
Kibana (version 4.5.0):
I do see all the data and everything seems to be fine.
Just when I go to "Visualize" -> "Tile map" -> "From a new search" -> "Geo Coordinates", I get this error message:
No Compatible Fields: The "logs" index pattern does not contain any of the following field types: geo_point
Even tho I see in elasticsearch mapping that the location type is geo_point.
What am I missing?
Found the issue!
I called the index "logs". changed the index name to "logstash-logs" (need logstash-* prefix) and everything started to function!

logstash and elasticsearch geo_point

I am using logstash to input geospatial data from a csv into elasticsearch as geo_points.
The CSV looks like the following:
$ head -5 geo_data.csv
"lon","lat","lon2","lat2","d","i","approx_bearing"
-1.7841,50.7408,-1.7841,50.7408,0.982654,1,256.307
-1.7841,50.7408,-1.78411,50.7408,0.982654,1,256.307
-1.78411,50.7408,-1.78412,50.7408,0.982654,1,256.307
-1.78412,50.7408,-1.78413,50.7408,0.982654,1,256.307
I have create a mapping template that looks like the following:
$ cat map_template.json
{
"template": "base_map_template",
"order": 1,
"settings": {
"number_of_shards": 1
},
{
"mappings": {
"base_map": {
"properties": {
"lon2": { "type" : "float" },
"lat2": { "type" : "float" },
"d": { "type" : "float" },
"appox_bearing": { "type" : "float" },
"location": { "type" : "geo_point" }
}
}
}
}
}
My config file for logstash has been set up as follows:
$ cat map.conf
input {
stdin {}
}
filter {
csv {
columns => [
"lon","lat","lon2","lat2","d","i","approx_bearing"
]
}
if [lon] == "lon" {
drop { }
} else {
mutate {
remove_field => [ "message", "host", "#timestamp", "#version" ]
}
mutate {
convert => { "lon" => "float" }
convert => { "lat" => "float" }
convert => { "lon2" => "float" }
convert => { "lat2" => "float" }
convert => { "d" => "float" }
convert => { "i" => "integer"}
convert => { "approx_bearing" => "float"}
}
mutate {
rename => {
"lon" => "[location][lon]"
"lat" => "[location][lat]"
}
}
}
}
output {
# stdout { codec => rubydebug }
stdout { codec => dots }
elasticsearch {
index => "base_map"
template => "map_template.json"
document_type => "node_points"
document_id => "%{i}"
}
}
I then try and use logstash to input the csv data into elasticsearch as geo_points using the following command:
$ cat geo_data.csv | logstash-2.1.3/bin/logstash -f map.conf
I get the following error:
Settings: Default filter workers: 16
Unexpected character ('{' (code 123)): was expecting double-quote to start field name
at [Source: [B#278e55d1; line: 7, column: 3]{:class=>"LogStash::Json::ParserError", :level=>:error}
Logstash startup completed
....Logstash shutdown completed
What am I missing?
wayward "{" on line 7 of your template file

Logstash/Elasticsearch CSV Field Types, Date Formats and Multifields (.raw)

I've been playing around with getting a tab delimited file into Elasticsearch using the CSV filter in Logstash. Getting the data in was actually incredibly easy, but I'm having trouble getting the field types to come in right when I look at the data in Kibana. Dates and integers continue to come in as strings, so I can't plot by date or do any analysis functions on integers (sum, mean, etc).
I'm also having trouble getting the .raw version of the fields to populate. For example, in device I have data like "HTC One", but when if I do a pie chart in Kibana, it'll show up as two separate groupings "HTC" and "One". When I try to chart device.raw instead, it comes up as a missing field. From what I've read, it seems like Logstash should automatically create a raw version of each string field, but that doesn't seem to be happening.
I've been sifting through the documentation, google and stack, but haven't found a solution. Any ideas appreciated! Thanks.
Config file:
#logstash.conf
input {
file {
path => "file.txt"
type => "event"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
columns => ["userid","date","distance","device"]
separator => " "
}
}
output {
elasticsearch {
action => "index"
host => "localhost"
port => "9200"
protocol => "http"
index => "userid"
workers => 2
template => template.json
}
#stdout {
# codec => rubydebug
#}
}
Here's the template file:
#template.json:
{
"template": "event",
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 0,
"index" : {
"query" : { "default_field" : "userid" }
}
},
"mappings": {
"_default_": {
"_all": { "enabled": false },
"_source": { "compress": true },
"dynamic_templates": [
{
"string_template" : {
"match" : "*",
"mapping": { "type": "string", "index": "not_analyzed" },
"match_mapping_type" : "string"
}
}
],
"properties" : {
"date" : { "type" : "date", "format": "yyyy-MM-dd HH:mm:ss"},
"device" : { "type" : "string", "fields": {"raw": {"type": "string","index": "not_analyzed"}}},
"distance" : { "type" : "integer"}
}
}
}
Figured it out - the template name IS the index. So the "template" : "event" line should have been "template" : "userid"
I found another (easier) way to specify the type of the fields. You can use logstash's mutate filter to change the type of a field. Simply add the following filter after your csv filter to your logstash config
mutate {
convert => [ "fieldname", "integer" ]
}
For details check out the logstash docs - mutate convert

Resources