I'm trying to map a latitude and longitude to a geo_point in Elastic.
Here's my log file entry:
13-01-2017 ORDER COMPLETE: £22.00 Glasgow, 55.856299, -4.258845
And here's my conf file
input {
file {
path => "/opt/logs/orders.log"
start_position => "beginning"
}
}
filter {
grok {
match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z ]+), (?<order_lat>[0-9.]+), (?<order_long>[-0-9.]+)" }
}
mutate {
convert => { "order_amount" => "float" }
convert => { "order_lat" => "float" }
convert => { "order_long" => "float" }
rename => {
"order_long" => "[location][lon]"
"order_lat" => "[location][lat]"
}
}
}
output {
elasticsearch {
hosts => "localhost"
index => "sales"
document_type => "order"
}
stdout {}
}
I start logstash with /bin/logstash -f orders.conf and this gives:
"#version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true,
"properties"=>{"ip"=>{"type"=>"ip"},
"location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"},
"longitude"=>{"type"=>"half_float"}}}}}}}}
See? It's seeing location as a geo_point. Yet GET sales/_mapping results in this:
"location": {
"properties": {
"lat": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lon": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
Update
Each time I reindex, I stop Logstash, then remove the .sincedb from /opt/logstash/data/plugins/inputs/file.... I have also made a brand-new log file, and I increment the index each time (I'm currently up to sales7).
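As an aside, a common way to avoid deleting the .sincedb by hand while testing is to point the file input at a throwaway sincedb path, so Logstash re-reads the file from the beginning on every run; a minimal sketch (using the log path from the conf below):
input {
  file {
    path => "/opt/ag-created/logs/orders2.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"   # don't persist read positions between runs
  }
}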
conf file
input {
file {
path => "/opt/ag-created/logs/orders2.log"
start_position => "beginning"
}
}
filter {
grok {
match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z ]+), (?<order_lat>[0-9.]+), (?<order_long>[-0-9.]+)( - (?<order_failure_reason>[A-Za-z :]+))?" }
}
mutate {
convert => { "order_amount" => "float" }
}
mutate {
convert => { "order_lat" => "float" }
}
mutate {
convert => { "order_long" => "float" }
}
mutate {
rename => { "order_long" => "[location][lon]" }
}
mutate {
rename => { "order_lat" => "[location][lat]" }
}
}
output {
elasticsearch {
hosts => "localhost"
index => "sales7"
document_type => "order"
template_name => "myindex"
template => "/tmp/templates/custom-orders2.json"
template_overwrite => true
}
stdout {}
}
JSON file
{
"template": "sales7",
"settings": {
"index.refresh_interval": "5s"
},
"mappings": {
"sales": {
"_source": {
"enabled": false
},
"properties": {
"location": {
"type": "geo_point"
}
}
}
},
"aliases": {}
}
Interestingly, when the geo_point mapping doesn't work (i.e. both lat and long are mapped as floats), my data is indexed (30 rows). But when the location is correctly made into a geo_point, none of my rows are indexed.
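When rows silently fail to index like this, it usually means the elasticsearch output is rejecting them (typically a mapper_parsing_exception in the Logstash log, e.g. when a geo_point field receives values it cannot parse), so it is worth comparing the document count against the mapping of the new index:
GET sales7/_count
GET sales7/_mapping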
There are two ways to do this. The first is to create a template for your mapping, so that the correct mapping is applied while indexing your data; Elasticsearch does not know on its own what your data types are, so you have to tell it, as shown below.
Firstly, create a template.json file for your mapping structure:
{
"template": "sales*",
"settings": {
"index.refresh_interval": "5s"
},
"mappings": {
"sales": {
"_source": {
"enabled": false
},
"properties": {
"location": {
"type": "geo_point"
}
}
}
},
"aliases": {}
}
After that, change your Logstash configuration to apply this template to your index:
input {
file {
path => "/opt/logs/orders.log"
start_position => "beginning"
}
}
filter {
grok {
match => { "message" => "(?<date>[0-9-]+) (?<order_status>ORDER [a-zA-Z]+): (?<order_amount>£[0-9.]+) (?<order_location>[a-zA-Z ]+), (?<order_lat>[0-9.]+), (?<order_long>[-0-9.]+)" }
}
mutate {
convert => { "order_amount" => "float" }
convert => { "order_lat" => "float" }
convert => { "order_long" => "float" }
}
# rename in a separate mutate block so the converts above run first
# (within a single mutate, rename is executed before convert)
mutate {
rename => {
"order_long" => "[location][lon]"
"order_lat" => "[location][lat]"
}
}
}
output {
elasticsearch {
hosts => "localhost"
index => "sales"
document_type => "order"
template_name => "myindex"
template => "/etc/logstash/conf.d/template.json"
template_overwrite => true
}
stdout {}
}
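Once Logstash has started with this configuration, you can check that the template was installed and that the new index picked up the geo_point mapping:
GET _template/myindex
GET sales/_mapping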
The second option is the ingest node feature. I will update my answer with this option, but for now you can check my dockerized repository. In that example, I used the ingest node feature instead of a template when parsing the location data.
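As a rough sketch of that second option (the pipeline and field names below are illustrative, not taken from the repository): an ingest pipeline can assemble the geo_point from the parsed latitude and longitude, provided the target index maps location as geo_point.
PUT _ingest/pipeline/orders-location
{
  "processors": [
    {
      "set": {
        "field": "location",
        "value": "{{order_lat}},{{order_long}}"
      }
    }
  ]
}
The elasticsearch output in Logstash can then reference it with pipeline => "orders-location".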
Related
I have 3M records; the headers are value, type, other_fields...
I need to load the data as shown below.
I need to specify the type as a context for that value in the record. Is there any way to do this with Logstash, or are there any other options?
val,val_type,id
Sunnyvale it labs, seller, 10223667
For this, I'd use the new CSV ingest processor.
First, create the ingest pipeline to parse your CSV data:
PUT _ingest/pipeline/csv-parser
{
"processors": [
{
"csv": {
"field": "message",
"target_fields": [
"val",
"val_type",
"id"
]
}
},
{
"script": {
"source": """
def val = ctx.val;
ctx.val = [
'input': val,
'contexts': [
'type': [ctx.val_type]
]
]
"""
}
},
{
"remove": {
"field": "message"
}
}
]
}
Then, you can index your documents as follows:
PUT index/_doc/1?pipeline=csv-parser
{
"message": "Sunnyvale it labs,seller,10223667"
}
After ingestion, the document will look like this:
{
"val_type": "seller",
"id": "10223667",
"val": {
"input": "Sunnyvale it labs",
"contexts": {
"type": [
"seller"
]
}
}
}
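Note that the val structure above is the completion-suggester input format, so for the type context to be usable at query time the val field presumably has to be mapped as a completion field with a category context named type before you index; a minimal sketch (the index name is just an example):
PUT index
{
  "mappings": {
    "properties": {
      "val": {
        "type": "completion",
        "contexts": [
          { "name": "type", "type": "category" }
        ]
      }
    }
  }
}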
UPDATE: Logstash solution
Using Logstash, it's also feasible. The configuration file would look something like this:
input {
file {
path => "/path/to/your/file.csv"
sincedb_path => "/dev/null"
start_position => "beginning"
}
}
filter {
csv {
skip_header => true
separator => ","
columns => ["val", "val_type", "id"]
}
mutate {
rename => { "val" => "value" }
add_field => {
"[val][input]" => "%{value}"
"[val][contexts][type]" => "%{val_type}"
}
remove_field => [ "value" ]
}
}
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "your-index"
}
}
There are two conf files used to load data from two JSON files, testOrders and testItems, each containing only one document, into the same index. I am trying to create a parent-child relationship between the two documents.
Below is my conf for testOrders:
input{
file{
path => ["/path_data/testOrders.json"]
type => "json"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
json {
source => "message"
target => "testorders_collection"
remove_field => [ "message" ]
}
ruby {
code => "
event.set('[my_join_field][name]', 'testorders')
"
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "testorder"
document_id => "%{[testorders_collection][eId]}"
routing => "%{[testorders_collection][eId]}"
}
}
Below is the conf for testItems
input{
file{
path => ["/path_to_data/testItems.json"]
type => "json"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
json {
source => "message"
target => "test_collection"
remove_field => [ "message" ]
}
}
filter {
ruby {
code => "
event.set('[my_join_field][name]', 'testItems')
event.set('[my_join_field][parent]', event.get('[test_collection][foreignKeyId]'))
"
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "testorder"
document_id => "%{[test_collection][eId]}"
routing => "%{[test_collection][foreignKeyId]}"
}
}
As expected, Logstash creates one record for testOrders, but it creates two records for testItems, even though there is one JSON document each for testOrders and testItems. One document is created properly with data, but the other is a duplicate with no data. The document that is created without parsed data looks as follows:
{
"_index": "testorder",
"_type": "doc",
"_id": "%{[test_collection][eId]}",
"_score": 1,
"_routing": "%{[test_collection][foreignKeyId]}",
"_source": {
"type": "json",
"#timestamp": "2018-07-10T04:15:58.494Z",
"host": "<hidden>",
"test_collection": null,
"my_join_field": {
"name": "testItems",
"parent": null
},
"path": "/path_to_data/testItems.json",
"#version": "1"
}
}
Defining a mapping relationship in Elasticsearch solved the issue. This is the way to define the relationship:
PUT fulfillmentorder
{
"mappings": {
"doc": {
"properties": {
"my_join_field": {
"type": "join",
"relations": {
"fulfillmentorders": "orderlineitems"
}
}
}
}
}
}
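Note that the relation names in the join mapping must match the values your ruby filters write into my_join_field; with the two configs above, the equivalent mapping for the testorder index would presumably look like this (using the doc type shown in the indexed document):
PUT testorder
{
  "mappings": {
    "doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "testorders": "testItems"
          }
        }
      }
    }
  }
}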
I am learning Elasticsearch and have hit a roadblock. I am trying to use Logstash to load a simple CSV into Elasticsearch. This is the data; each row is a postcode, longitude, latitude:
ZE1 0BH,-1.136758103355,60.150855671143
ZE1 0NW,-1.15526666950369,60.1532197533966
I am using the following logstash conf file to filter the CSV to create a "location" field
input {
file {
path => "postcodes.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
columns => ["postcode", "lat", "lon"]
separator => ","
}
mutate { convert => {"lat" => "float"} }
mutate { convert => {"lon" => "float"} }
mutate { rename => {"lat" => "[location][lat]"} }
mutate { rename => {"lon" => "[location][lon]"} }
mutate { convert => { "[location]" => "float" } }
}
output {
elasticsearch {
action => "index"
hosts => "localhost"
index => "postcodes"
}
stdout { codec => rubydebug }
}
And I have added the mapping to ElasticSearch using the console in Kibana
PUT postcodes
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"feature": {
"_all": { "enabled": true },
"properties": {
"postcode": {"type": "text"},
"location": {"type": "geo_point"}
}
}
}
}
I check the mappings for the index using
GET postcodes/_mapping
{
"postcodes": {
"mappings": {
"feature": {
"_all": {
"enabled": true
},
"properties": {
"location": {
"type": "geo_point"
},
"postcode": {
"type": "text"
}
}
}
}
}
}
So this all seems to be correct, having looked at the documentation and the other questions posted.
However, when I run
bin/logstash -f postcodes.conf
I get an error:
[location] is defined as an object in mapping [logs] but this name is already used for a field in other types
I have tried a number of alternative methods:
I deleted the index, created a template.json, and changed my conf file to have the extra settings:
manage_template => true
template => "postcode_template.json"
template_name =>"open_names"
template_overwrite => true
and this gets the same error.
I have managed to get the data loaded by not supplying a template; however, the data never gets loaded as a geo_point, so you cannot use the Kibana Tile Map to visualise the data.
Can anyone explain why I am receiving that error and what method I should use?
Your problem is that you don't have document_type => "feature" on your elasticsearch output. Without that, it's going to create the object on the type logs, which is why you are getting this conflict.
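A sketch of the adjusted output block, assuming the rest of the configuration stays as posted:
output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "postcodes"
    document_type => "feature"   # must match the "feature" mapping type created in Kibana
  }
  stdout { codec => rubydebug }
}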
I'm trying to create a Tile map on Kibana, with GEO location points.
For some reason, when I try to create the map, I get the following message in Kibana:
No Compatible Fields: The "logs" index pattern does not contain any of
the following field types: geo_point
My settings:
Logstash (version 2.3.1):
filter {
grok {
match => {
"message" => "MY PATTERN"
}
}
geoip {
source => "ip"
target => "geoip"
add_field => [ "location", "%{[geoip][latitude]}, %{[geoip][longitude]}" ] #added this extra field in case the nested field is the problem
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
hosts => ["localhost:9200"]
index => "logs"
}
}
When log input arrives, I can see it being parsed as it should be, and I do get the GeoIP data for a given IP:
"geoip" => {
"ip" => "XXX.XXX.XXX.XXX",
"country_code2" => "XX",
"country_code3" => "XXX",
"country_name" => "XXXXXX",
"continent_code" => "XX",
"region_name" => "XX",
"city_name" => "XXXXX",
"latitude" => XX.0667,
"longitude" => XX.766699999999986,
"timezone" => "XXXXXX",
"real_region_name" => "XXXXXX",
"location" => [
[0] XX.766699999999986,
[1] XX.0667
]
},
"location" => "XX.0667, XX.766699999999986"
ElasticSearch (version 2.3.1):
GET /logs/_mapping returns:
{
"logs": {
"mappings": {
"logs": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
"geoip": {
"properties": {
.
.
.
"latitude": {
"type": "double"
},
"location": {
"type": "geo_point"
},
"longitude": {
"type": "double"
}
}
},
"location": {
"type": "geo_point"
}
}
}
}
}
}
Kibana (version 4.5.0):
I do see all the data and everything seems to be fine.
Just when I go to "Visualize" -> "Tile map" -> "From a new search" -> "Geo Coordinates", I get this error message:
No Compatible Fields: The "logs" index pattern does not contain any of the following field types: geo_point
Even though I see in the Elasticsearch mapping that the location type is geo_point.
What am I missing?
Found the issue!
I had called the index "logs". I changed the index name to "logstash-logs" (it needs the logstash-* prefix) and everything started to work!
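In other words, in the elasticsearch output shown above, the index option ends up as something like this (the date suffix is optional and purely illustrative):
elasticsearch {
  hosts => ["localhost:9200"]
  index => "logstash-logs-%{+YYYY.MM.dd}"   # note the logstash-* prefix
}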
I am using logstash to input geospatial data from a csv into elasticsearch as geo_points.
The CSV looks like the following:
$ head -5 geo_data.csv
"lon","lat","lon2","lat2","d","i","approx_bearing"
-1.7841,50.7408,-1.7841,50.7408,0.982654,1,256.307
-1.7841,50.7408,-1.78411,50.7408,0.982654,1,256.307
-1.78411,50.7408,-1.78412,50.7408,0.982654,1,256.307
-1.78412,50.7408,-1.78413,50.7408,0.982654,1,256.307
I have created a mapping template that looks like the following:
$ cat map_template.json
{
"template": "base_map_template",
"order": 1,
"settings": {
"number_of_shards": 1
},
{
"mappings": {
"base_map": {
"properties": {
"lon2": { "type" : "float" },
"lat2": { "type" : "float" },
"d": { "type" : "float" },
"appox_bearing": { "type" : "float" },
"location": { "type" : "geo_point" }
}
}
}
}
}
My config file for logstash has been set up as follows:
$ cat map.conf
input {
stdin {}
}
filter {
csv {
columns => [
"lon","lat","lon2","lat2","d","i","approx_bearing"
]
}
if [lon] == "lon" {
drop { }
} else {
mutate {
remove_field => [ "message", "host", "#timestamp", "#version" ]
}
mutate {
convert => { "lon" => "float" }
convert => { "lat" => "float" }
convert => { "lon2" => "float" }
convert => { "lat2" => "float" }
convert => { "d" => "float" }
convert => { "i" => "integer"}
convert => { "approx_bearing" => "float"}
}
mutate {
rename => {
"lon" => "[location][lon]"
"lat" => "[location][lat]"
}
}
}
}
output {
# stdout { codec => rubydebug }
stdout { codec => dots }
elasticsearch {
index => "base_map"
template => "map_template.json"
document_type => "node_points"
document_id => "%{i}"
}
}
I then try to use logstash to load the CSV data into Elasticsearch as geo_points using the following command:
$ cat geo_data.csv | logstash-2.1.3/bin/logstash -f map.conf
I get the following error:
Settings: Default filter workers: 16
Unexpected character ('{' (code 123)): was expecting double-quote to start field name
at [Source: [B#278e55d1; line: 7, column: 3]{:class=>"LogStash::Json::ParserError", :level=>:error}
Logstash startup completed
....Logstash shutdown completed
What am I missing?
There's a wayward "{" on line 7 of your template file.
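For reference, a corrected version of the template might look as follows; besides removing the stray brace, this assumes the template pattern should match the base_map index, the mapping type should match the node_points document_type used in the output, and appox_bearing should be the approx_bearing column:
{
  "template": "base_map*",
  "order": 1,
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "node_points": {
      "properties": {
        "lon2": { "type": "float" },
        "lat2": { "type": "float" },
        "d": { "type": "float" },
        "approx_bearing": { "type": "float" },
        "location": { "type": "geo_point" }
      }
    }
  }
}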