Logstash and Elasticsearch geo_point

I am using logstash to input geospatial data from a csv into elasticsearch as geo_points.
The CSV looks like the following:
$ head -5 geo_data.csv
"lon","lat","lon2","lat2","d","i","approx_bearing"
-1.7841,50.7408,-1.7841,50.7408,0.982654,1,256.307
-1.7841,50.7408,-1.78411,50.7408,0.982654,1,256.307
-1.78411,50.7408,-1.78412,50.7408,0.982654,1,256.307
-1.78412,50.7408,-1.78413,50.7408,0.982654,1,256.307
I have created a mapping template that looks like the following:
$ cat map_template.json
{
  "template": "base_map_template",
  "order": 1,
  "settings": {
    "number_of_shards": 1
  },
  {
    "mappings": {
      "base_map": {
        "properties": {
          "lon2": { "type" : "float" },
          "lat2": { "type" : "float" },
          "d": { "type" : "float" },
          "appox_bearing": { "type" : "float" },
          "location": { "type" : "geo_point" }
        }
      }
    }
  }
}
My config file for logstash has been set up as follows:
$ cat map.conf
input {
  stdin {}
}
filter {
  csv {
    columns => [
      "lon","lat","lon2","lat2","d","i","approx_bearing"
    ]
  }
  if [lon] == "lon" {
    drop { }
  } else {
    mutate {
      remove_field => [ "message", "host", "@timestamp", "@version" ]
    }
    mutate {
      convert => { "lon" => "float" }
      convert => { "lat" => "float" }
      convert => { "lon2" => "float" }
      convert => { "lat2" => "float" }
      convert => { "d" => "float" }
      convert => { "i" => "integer" }
      convert => { "approx_bearing" => "float" }
    }
    mutate {
      rename => {
        "lon" => "[location][lon]"
        "lat" => "[location][lat]"
      }
    }
  }
}
output {
  # stdout { codec => rubydebug }
  stdout { codec => dots }
  elasticsearch {
    index => "base_map"
    template => "map_template.json"
    document_type => "node_points"
    document_id => "%{i}"
  }
}
I then try to use Logstash to input the CSV data into Elasticsearch as geo_points using the following command:
$ cat geo_data.csv | logstash-2.1.3/bin/logstash -f map.conf
I get the following error:
Settings: Default filter workers: 16
Unexpected character ('{' (code 123)): was expecting double-quote to start field name
at [Source: [B#278e55d1; line: 7, column: 3]{:class=>"LogStash::Json::ParserError", :level=>:error}
Logstash startup completed
....Logstash shutdown completed
What am I missing?

There is a wayward "{" on line 7 of your template file, between the settings block and mappings; remove it and its matching closing brace, and the template will parse.
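A quick way to catch this class of error before Logstash ever runs is to pass the template through any JSON parser. A minimal sketch in Python (the strings below are simplified stand-ins for the real map_template.json):

```python
import json

# Simplified stand-in for the broken template: a stray "{" sits between
# the "settings" block and "mappings", just like line 7 of the real file.
broken = """{
  "template": "base_map_template",
  "settings": { "number_of_shards": 1 },
  {
    "mappings": {}
  }
}"""

# Same template with the stray brace (and its matching close) removed.
fixed = """{
  "template": "base_map_template",
  "settings": { "number_of_shards": 1 },
  "mappings": {}
}"""

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(broken))  # False
print(is_valid_json(fixed))   # True
```

From the shell, `python -m json.tool map_template.json` does the same one-off check in a single line.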

Related

Elasticsearch: load CSV data with context

I have 3M records. The headers are value, type, other_fields.
I need to load the data so that the type is specified as a context for the value in each record. Is there any way to do this with Logstash, or any other options?
val,val_type,id
Sunnyvale it labs, seller, 10223667
For this, I'd use the new CSV ingest processor.
First, create the ingest pipeline to parse your CSV data:
PUT _ingest/pipeline/csv-parser
{
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": [
          "val",
          "val_type",
          "id"
        ]
      }
    },
    {
      "script": {
        "source": """
          def val = ctx.val;
          ctx.val = [
            'input': val,
            'contexts': [
              'type': [ctx.val_type]
            ]
          ]
        """
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}
Then, you can index your documents as follows:
PUT index/_doc/1?pipeline=csv-parser
{
  "message": "Sunnyvale it labs,seller,10223667"
}
After ingestion, the document will look like this:
{
  "val_type": "seller",
  "id": "10223667",
  "val": {
    "input": "Sunnyvale it labs",
    "contexts": {
      "type": [
        "seller"
      ]
    }
  }
}
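To see why the indexed document comes out in that shape, the csv and script processors can be mimicked in plain Python; this is only an illustrative stand-in for the Painless script, not Elasticsearch code:

```python
def simulate_pipeline(message):
    """Mimic the csv processor (split into val/val_type/id) and the
    script processor (wrap val into an input/contexts object)."""
    val, val_type, doc_id = [part.strip() for part in message.split(",")]
    return {
        "val": {"input": val, "contexts": {"type": [val_type]}},
        "val_type": val_type,
        "id": doc_id,
    }

doc = simulate_pipeline("Sunnyvale it labs,seller,10223667")
print(doc["val"])  # {'input': 'Sunnyvale it labs', 'contexts': {'type': ['seller']}}
```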
UPDATE: Logstash solution
Using Logstash, it's also feasible. The configuration file would look something like this:
input {
file {
path => "/path/to/your/file.csv"
sincedb_path => "/dev/null"
start_position => "beginning"
}
}
filter {
csv {
skip_header => true
separator => ","
columns => ["val", "val_type", "id"]
}
mutate {
rename => { "val" => "value" }
add_field => {
"[val][input]" => "%{value}"
"[val][contexts][type]" => "%{val_type}"
}
remove_field => [ "value" ]
}
}
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "your-index"
}
}

Logstash timestamp appearing as text in Elasticsearch

I am using Elasticsearch 7.3.1 and Logstash 7.3.1. I am trying to use one of my fields as the Elasticsearch timestamp via the date filter. The data is being inserted properly, but the type of @timestamp comes out as text. How do I fix this?
My input timestamp is like 1567408605794750813. My code is:
input {
  elasticsearch {
    hosts => "x.x.x.x"
    index => "raw"
    docinfo => true
  }
}
filter {
  mutate {
    convert => {
      "timestamp" => "integer"
    }
  }
  date {
    match => ["timestamp", "UNIX_MS", "ISO8601"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    index => "logs-%{app_name}"
    document_id => "%{[@metadata][_id]}"
  }
}
After running the mapping API, I get:
"@timestamp" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}
Try the below code:
date {
  match => [ "[timestamp]", "UNIX" ]
  target => "[@timestamp]"
}
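One caveat with that answer: the sample value 1567408605794750813 has 19 digits, which looks like epoch nanoseconds, so neither UNIX (seconds) nor UNIX_MS (milliseconds) will interpret it correctly until the value is scaled down first (for example in a ruby filter). A quick sanity check of the scale, as an illustrative sketch rather than Logstash code:

```python
from datetime import datetime, timezone

raw = 1567408605794750813      # 19 digits: epoch nanoseconds
seconds = raw / 1_000_000_000  # scale down to epoch seconds
dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(dt.date())  # 2019-09-02
```

A plausible date only comes out after dividing by one billion; fed to UNIX or UNIX_MS as-is, the value is far out of range.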

geo_point in Elastic

I'm trying to map a latitude and longitude to a geo_point in Elastic.
Here's my log file entry:
13-01-2017 ORDER COMPLETE: £22.00 Glasgow, 55.856299, -4.258845
And here's my conf file
input {
  file {
    path => "/opt/logs/orders.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z ]+)" }
  }
  mutate {
    convert => { "order_amount" => "float" }
    convert => { "order_lat" => "float" }
    convert => { "order_long" => "float" }
    rename => {
      "order_long" => "[location][lon]"
      "order_lat" => "[location][lat]"
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "sales"
    document_type => "order"
  }
  stdout {}
}
I start logstash with /bin/logstash -f orders.conf and this gives:
"#version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true,
"properties"=>{"ip"=>{"type"=>"ip"},
"location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"},
"longitude"=>{"type"=>"half_float"}}}}}}}}
See? It's seeing location as a geo_point. Yet GET sales/_mapping results in this:
"location": {
  "properties": {
    "lat": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "lon": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
},
Update
Each time I reindex, I stop Logstash, then remove the .sincedb from /opt/logstash/data/plugins/inputs/file.... I have also made a brand new log file, and I increment the index each time (I'm currently up to sales7).
conf file
input {
  file {
    path => "/opt/ag-created/logs/orders2.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z ]+), (?<order_lat>[0-9.]+), (?<order_long>[-0-9.]+)( - (?<order_failure_reason>[A-Za-z :]+))?" }
  }
  mutate {
    convert => { "order_amount" => "float" }
  }
  mutate {
    convert => { "order_lat" => "float" }
  }
  mutate {
    convert => { "order_long" => "float" }
  }
  mutate {
    rename => { "order_long" => "[location][lon]" }
  }
  mutate {
    rename => { "order_lat" => "[location][lat]" }
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "sales7"
    document_type => "order"
    template_name => "myindex"
    template => "/tmp/templates/custom-orders2.json"
    template_overwrite => true
  }
  stdout {}
}
JSON file
{
  "template": "sales7",
  "settings": {
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "sales": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  },
  "aliases": {}
}
Interestingly, when the geo_point mapping doesn't work (i.e. both lat and long are floats), my data is indexed (30 rows). But when the location is correctly made into a geo_point, none of my rows are indexed.
There are two ways to do this. The first is to create a template for your mapping, so that the correct mapping is applied when your data is indexed. Elasticsearch cannot infer your data type on its own; you have to tell it these things, as below.
Firstly, create a template.json file for your mapping structure:
{
  "template": "sales*",
  "settings": {
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "sales": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  },
  "aliases": {}
}
After that, change your Logstash configuration to apply this mapping to your index:
input {
  file {
    path => "/opt/logs/orders.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z ]+)" }
  }
  mutate {
    convert => { "order_amount" => "float" }
    convert => { "order_lat" => "float" }
    convert => { "order_long" => "float" }
    rename => {
      "order_long" => "[location][lon]"
      "order_lat" => "[location][lat]"
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "sales"
    document_type => "order"
    template_name => "myindex"
    template => "/etc/logstash/conf.d/template.json"
    template_overwrite => true
  }
  stdout {}
}
The second option is the ingest node feature. I will update my answer for this option, but for now you can check my dockerized repository. In that example, I used the ingest node feature instead of a template while parsing the location data.

ElasticSearch 5.0.0 - error about object name is already in use

I am learning ElasticSearch and have hit a block. I am trying to use Logstash to load a simple CSV into ElasticSearch. This is the data; each row is a postcode, longitude, latitude:
ZE1 0BH,-1.136758103355,60.150855671143
ZE1 0NW,-1.15526666950369,60.1532197533966
I am using the following Logstash conf file to filter the CSV and create a "location" field:
input {
  file {
    path => "postcodes.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["postcode", "lat", "lon"]
    separator => ","
  }
  mutate { convert => { "lat" => "float" } }
  mutate { convert => { "lon" => "float" } }
  mutate { rename => { "lat" => "[location][lat]" } }
  mutate { rename => { "lon" => "[location][lon]" } }
  mutate { convert => { "[location]" => "float" } }
}
output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "postcodes"
  }
  stdout { codec => rubydebug }
}
And I have added the mapping to ElasticSearch using the console in Kibana
PUT postcodes
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "feature": {
      "_all": { "enabled": true },
      "properties": {
        "postcode": { "type": "text" },
        "location": { "type": "geo_point" }
      }
    }
  }
}
I check the mappings for the index using:
GET postcodes/_mapping
{
  "postcodes": {
    "mappings": {
      "feature": {
        "_all": {
          "enabled": true
        },
        "properties": {
          "location": {
            "type": "geo_point"
          },
          "postcode": {
            "type": "text"
          }
        }
      }
    }
  }
}
So this all seems to be correct, based on the documentation and the other questions posted here.
However when i run
bin/logstash -f postcodes.conf
I get an error:
[location] is defined as an object in mapping [logs] but this name is already used for a field in other types
I have tried a number of alternative methods:
Deleted the index, created a template.json, and changed my conf file to add the extra settings:
manage_template => true
template => "postcode_template.json"
template_name =>"open_names"
template_overwrite => true
and this gets the same error.
I have managed to get the data loaded by not supplying a template; however, the data never gets loaded as a geo_point, so you cannot use the Kibana Tile Map to visualise it.
Can anyone explain why I am receiving that error and what method I should use?
Your problem is that you don't have document_type => "feature" on your elasticsearch output. Without that, Logstash creates the documents under the type logs, which is why you are getting this conflict.
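For background, in Elasticsearch 5.x all types within an index share one set of field mappings, so location cannot be an object under the implicit logs type while it is a geo_point under feature. A toy check in Python illustrates the constraint (check_field_compat is a hypothetical helper, not an Elasticsearch API):

```python
def check_field_compat(existing, incoming):
    """Reject a new type's mapping when a field name is already mapped
    to a different kind of field by another type in the same index."""
    for field, kind in incoming.items():
        if field in existing and existing[field] != kind:
            raise ValueError(
                f"[{field}] is defined as {kind} in mapping but this name is "
                f"already used for a {existing[field]} field in other types"
            )
    existing.update(incoming)

index_fields = {"location": "geo_point"}  # from the "feature" type mapping
try:
    # Logstash defaulted to document_type "logs" and sent location as an object
    check_field_compat(index_fields, {"location": "object"})
except ValueError as err:
    print(err)
```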

CSV geodata into elasticsearch as a geo_point type using logstash

Below is a reproducible example of the problem I am having, using the most recent versions of Logstash and Elasticsearch.
I am using logstash to input geospatial data from a csv into elasticsearch as geo_points.
The CSV looks like the following:
$ head simple_base_map.csv
"lon","lat"
-1.7841,50.7408
-1.7841,50.7408
-1.78411,50.7408
-1.78412,50.7408
-1.78413,50.7408
-1.78414,50.7408
-1.78415,50.7408
-1.78416,50.7408
-1.78416,50.7408
I have created a mapping template that looks like the following:
$ cat simple_base_map_template.json
{
  "template": "base_map_template",
  "order": 1,
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "node_points" : {
      "properties" : {
        "location" : { "type" : "geo_point" }
      }
    }
  }
}
and have a logstash config file that looks like the following:
$ cat simple_base_map.conf
input {
  stdin {}
}
filter {
  csv {
    columns => [
      "lon", "lat"
    ]
  }
  if [lon] == "lon" {
    drop { }
  } else {
    mutate {
      remove_field => [ "message", "host", "@timestamp", "@version" ]
    }
    mutate {
      convert => { "lon" => "float" }
      convert => { "lat" => "float" }
    }
    mutate {
      rename => {
        "lon" => "[location][lon]"
        "lat" => "[location][lat]"
      }
    }
  }
}
output {
  stdout { codec => dots }
  elasticsearch {
    index => "base_map_simple"
    template => "simple_base_map_template.json"
    document_type => "node_points"
  }
}
I then run the following:
$cat simple_base_map.csv | logstash-2.1.3/bin/logstash -f simple_base_map.conf
Settings: Default filter workers: 16
Logstash startup completed
....................................................................................................Logstash shutdown completed
However, looking at the index base_map_simple suggests that the documents do not have a location field of type geo_point; instead, location is an object of two doubles, lat and lon.
$ curl -XGET 'localhost:9200/base_map_simple?pretty'
{
  "base_map_simple" : {
    "aliases" : { },
    "mappings" : {
      "node_points" : {
        "properties" : {
          "location" : {
            "properties" : {
              "lat" : {
                "type" : "double"
              },
              "lon" : {
                "type" : "double"
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1457355015883",
        "uuid" : "luWGyfB3ToKTObSrbBbcbw",
        "number_of_replicas" : "1",
        "number_of_shards" : "5",
        "version" : {
          "created" : "2020099"
        }
      }
    },
    "warmers" : { }
  }
}
How would I need to change any of the above files to ensure that the location goes into Elasticsearch as a geo_point type?
Finally, I would like to be able to carry out a nearest neighbour search on the geo_points by using a command such as the following:
curl -XGET 'localhost:9200/base_map_simple/_search?pretty' -d'
{
  "size": 1,
  "sort": {
    "_geo_distance" : {
      "location" : {
        "lat" : 50,
        "lon" : -1
      },
      "order" : "asc",
      "unit": "m"
    }
  }
}'
Thanks
The problem is that in your elasticsearch output you named the index base_map_simple, while in your template the template property is base_map_template, so the template is not applied when the new index is created. The template property needs to match the name of the index being created in order for the template to kick in.
It will work if you simply change the latter to base_map_*, i.e. as in:
{
  "template": "base_map_*",      <--- change this
  "order": 1,
  "settings": {
    "index.number_of_shards": 1
  },
  "mappings": {
    "node_points": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
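The index-name check is an ordinary wildcard match between the template property and the name of the index being created; Python's fnmatch is a reasonable stand-in to see the behaviour:

```python
from fnmatch import fnmatch

index_name = "base_map_simple"

# The original template only applies to an index literally named "base_map_template".
print(fnmatch(index_name, "base_map_template"))  # False: template ignored

# The wildcard pattern covers any index whose name starts with "base_map_".
print(fnmatch(index_name, "base_map_*"))  # True: mapping applied at creation
```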
UPDATE
Make sure to delete the current index as well as the template first, i.e.:
curl -XDELETE localhost:9200/base_map_simple
curl -XDELETE localhost:9200/_template/logstash
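Once the template applies and location is a true geo_point, the _geo_distance sort in the question orders hits by great-circle distance from (50, -1). A haversine sketch in Python shows roughly what that distance computation does (illustrative only; Elasticsearch has its own implementation):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    earth_radius_m = 6_371_000
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# Distance from the query point (50, -1) to the first CSV point (50.7408, -1.7841):
print(round(haversine_m(50, -1, 50.7408, -1.7841) / 1000), "km")
```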
