How to index geojson file in elasticsearch? - elasticsearch

I am trying to store spatial data in the form of GeoJSON, CSV files and shapefiles into Elasticsearch using Python. I am new to Elasticsearch and, even after following the documentation, I am not able to index it successfully. Any help would be appreciated.
Sample GeoJSON file:
"type": "FeatureCollection",
"features": [
"type": "Feature",
"properties": {
"ID_0": 105,
"ISO": "IND",
"NAME_0": "India",
"ID_1": 1288,
"NAME_1": "Telangana",
"ID_2": 15715,
"NAME_2": "Telangana",
"VARNAME_2": null,
"NL_NAME_2": null,
"HASC_2": "IN.TS.AD",
"CC_2": null,
"TYPE_2": "State",
"ENGTYPE_2": "State",
"VALIDFR_2": "Unknown",
"VALIDTO_2": "Present",
"REMARKS_2": null,
"Shape_Leng": 8.103535,
"Shape_Area": 127258717496
"geometry": {
"type": "Polygon",
"coordinates": [

import geojson
from datetime import datetime
from elasticsearch import Elasticsearch, helpers
def geojson_to_es(gj):
    for feature in gj['features']:
        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature
with open("GeoObs.json") as f:
    gj = geojson.load(f)
    es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])
k = ({
    "_index": "YOUR_INDEX",
    "_source": feature,
} for feature in geojson_to_es(gj))
helpers.bulk(es, k)
with open("GeoObs.json") as f:
    gj = geojson.load(f)
    es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])
This portion of the code loads an external geojson file, then connects to Elasticsearch.
k = ({
"_index": "conflict-data",
"_source": feature,
} for feature in geojson_to_es(gj))
helpers.bulk(es, k)
The parentheses here create a generator expression, which we feed to helpers.bulk(es, k). Remember that _source is, in Elasticsearch terms, the original data as-is, i.e. our raw JSON. _index is just the index in which we want to put our data. You'll see other examples that also set a type such as _doc here. That is part of mapping types, which were deprecated in Elasticsearch 7.x and removed in 8.0.
def geojson_to_es(gj):
    for feature in gj['features']:
        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature
The function geojson_to_es is a generator that produces the features. Instead of returning and finishing, a generator function pauses at the yield keyword and resumes from there on the next call. In this case, we are generating our GeoJSON features. In my code you also see:
date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
feature["properties"]["timestamp"] = int(date.timestamp())
feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
This is just an example of manipulating the data in the JSON before sending it out to Elasticsearch.
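As a rough illustration of that manipulation step in isolation, here is a minimal sketch. It assumes event_date looks like "12-Jan-18" and year is a four-digit string, which is inferred from the "%d-%b-%Y" format string rather than stated in the answer; the helper name normalize_event_date and the sample values are hypothetical.

```python
from datetime import datetime


def normalize_event_date(properties):
    """Return a copy of `properties` with an ISO event_date and an epoch timestamp."""
    # Keep only day and month from event_date, then append the full year.
    day_month = "-".join(properties["event_date"].split("-")[0:2])
    date = datetime.strptime(day_month + "-" + properties["year"], "%d-%b-%Y")
    updated = dict(properties)  # leave the caller's dict untouched
    updated["timestamp"] = int(date.timestamp())  # epoch seconds, local time zone
    updated["event_date"] = date.strftime("%Y-%m-%d")
    return updated


sample = {"event_date": "12-Jan-18", "year": "2018"}  # hypothetical values
result = normalize_event_date(sample)
print(result["event_date"])
```

Unlike the generator above, this version copies the properties instead of mutating them, which makes it easier to test on its own.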
The key is that your mapping file must have a field typed as geo_point or geo_shape. These data types are how Elasticsearch recognizes geo data. Example from my mapping file:
"properties": {
  "geometry": {
    "properties": {
      "coordinates": {
        "type": "geo_point"
      },
      "type": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
That is to say, before uploading your GeoJSON data with Python, you need to create your index, and then apply a mapping file which includes either geo_shape or geo_point using something like:
curl -X PUT "localhost:9200/YOUR_INDEX?pretty"
curl -X PUT "localhost:9200/YOUR_INDEX/_mapping?pretty" -H "Content-Type: application/json" -d @mapping.json
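The mapping excerpt above types geometry.coordinates as geo_point, which fits point data. For polygon features like the one in the question, a geo_shape field on the whole geometry object is probably what is needed. The following is a minimal sketch of generating such a mapping body from Python; the helper name build_geo_mapping is hypothetical, and the field name geometry is an assumption based on the sample data.

```python
import json


def build_geo_mapping(geometry_field="geometry"):
    """Build a mapping body that types one field as geo_shape."""
    return {
        "properties": {
            # geo_shape accepts a GeoJSON geometry object (type + coordinates).
            geometry_field: {"type": "geo_shape"}
        }
    }


# Serialize the mapping; this text could be saved as mapping.json.
mapping_json = json.dumps(build_geo_mapping(), indent=2)
print(mapping_json)
```

The printed JSON could be saved as mapping.json and sent with the mapping request, or passed to the Python client when creating the index.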

You must separate the GeoJson features into (1) geometry and (2) properties/attributes parts. You cannot index GeoJson features and feature collections directly (see documentation), only the geometry part is supported as a field type.
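A minimal sketch of that separation, assuming a standard FeatureCollection and an index whose geometry field is mapped as geo_shape. The helper names feature_to_doc and bulk_actions are hypothetical; the generated dictionaries follow the _index/_source action shape that helpers.bulk accepts, but nothing here talks to Elasticsearch.

```python
def feature_to_doc(feature):
    """Flatten one GeoJSON feature: attributes at top level, geometry kept whole."""
    doc = dict(feature.get("properties") or {})
    # Note: an attribute that is itself named "geometry" would be overwritten.
    doc["geometry"] = feature["geometry"]
    return doc


def bulk_actions(feature_collection, index_name):
    """Yield one bulk action per feature in the collection."""
    for feature in feature_collection["features"]:
        yield {"_index": index_name, "_source": feature_to_doc(feature)}


# Tiny hypothetical collection with a closed polygon ring.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"ISO": "IND", "NAME_1": "Telangana"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        }
    ],
}
actions = list(bulk_actions(sample, "YOUR_INDEX"))
print(actions[0]["_source"]["NAME_1"])
```

The resulting generator could be handed to helpers.bulk(es, ...) in place of the raw features.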
So your final indexable document would look somewhat flattened:
"ID_0": 105,
"ISO": "IND",
"NAME_0": "India",
"ID_1": 1288,
"NAME_1": "Telangana",
"ID_2": 15715,
"NAME_2": "Telangana",
"VARNAME_2": null,
"NL_NAME_2": null,
"HASC_2": "IN.TS.AD",
"CC_2": null,
"TYPE_2": "State",
"ENGTYPE_2": "State",
"VALIDFR_2": "Unknown",
"VALIDTO_2": "Present",
"REMARKS_2": null,
"Shape_Leng": 8.103535,
"Shape_Area": 127258717496,
"geometry": {
"type": "Polygon",
"coordinates": [


elasticsearch filebeat mapper_parsing_exception when using decode_json_fields

I have ECK set up and I'm using Filebeat to ship logs from Kubernetes to Elasticsearch.
I've recently added the decode_json_fields processor to my configuration so that I'm able to decode the JSON that is usually in the message field.
- decode_json_fields:
    fields: ["message"]
    process_array: false
    max_depth: 10
    target: "log"
    overwrite_keys: true
    add_error_key: true
However logs have stopped appearing since adding it.
Example log:
"_index": "filebeat-7.9.1-2020.10.01-000001",
"_type": "_doc",
"_id": "wF9hB3UBtUOF3QRTBcts",
"_score": 1,
"_source": {
"@timestamp": "2020-10-08T08:43:18.672Z",
"kubernetes": {
"labels": {
"controller-uid": "9f3f9d08-cfd8-454d-954d-24464172fa37",
"job-name": "stream-hatchet-cron-manual-rvd"
"container": {
"name": "stream-hatchet-cron",
"image": "<redacted>"
"node": {
"name": ""
"pod": {
"uid": "041cb6d5-5da1-4efa-b8e9-d4120409af4b",
"name": "stream-hatchet-cron-manual-rvd-bh96h"
"namespace": "default"
"ecs": {
"version": "1.5.0"
"host": {
"mac": [],
"hostname": "ip-172-20-32-60",
"architecture": "x86_64",
"name": "ip-172-20-32-60",
"os": {
"codename": "Core",
"platform": "centos",
"version": "7 (Core)",
"family": "redhat",
"name": "CentOS Linux",
"kernel": "4.9.0-11-amd64"
"containerized": false,
"ip": []
"cloud": {
"instance": {
"id": "i-06c9d23210956ca5c"
"machine": {
"type": "m5.large"
"region": "us-east-2",
"availability_zone": "us-east-2a",
"account": {
"id": "<redacted>"
"image": {
"id": "ami-09d3627b4a09f6c4c"
"provider": "aws"
"stream": "stdout",
"message": "{\"message\":{\"log_type\":\"cron\",\"status\":\"start\"},\"level\":\"info\",\"timestamp\":\"2020-10-08T08:43:18.670Z\"}",
"input": {
"type": "container"
"log": {
"offset": 348,
"file": {
"path": "/var/log/containers/stream-hatchet-cron-manual-rvd-bh96h_default_stream-hatchet-cron-73069980b418e2aa5e5dcfaf1a29839a6d57e697c5072fea4d6e279da0c4e6ba.log"
"agent": {
"type": "filebeat",
"version": "7.9.1",
"hostname": "ip-172-20-32-60",
"ephemeral_id": "6b3ba0bd-af7f-4946-b9c5-74f0f3e526b1",
"id": "0f7fff14-6b51-45fc-8f41-34bd04dc0bce",
"name": "ip-172-20-32-60"
"fields": {
"#timestamp": [
"suricata.eve.timestamp": [
In the Filebeat logs I can see the following error:
2020-10-08T09:25:43.562Z WARN [elasticsearch] elasticsearch/client.go:407 Cannot
index event
ext:63737745936, loc:(*time.Location)(nil)}, Meta:null,
Private:file.State{Id:"native::30998361-66306", PrevId:"",
Finished:false, Fileinfo:(*os.fileStat)(0xc001c14dd0),
Offset:539, Timestamp:time.Time{wall:0xbfd7d4a1e556bd72,
ext:916563812286, loc:(*time.Location)(0x607c540)}, TTL:-1,
Type:"container", Meta:map[string]string(nil),
FileStateOS:file.StateOS{Inode:0x1d8ff59, Device:0x10302},
IdentifierName:"native"}, TimeSeries:false}, Flags:0x1,
Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400):
{"type":"mapper_parsing_exception","reason":"failed to parse field
[log.message] of type [keyword] in document with id
'56aHB3UBLgYb8gz801DI'. Preview of field's value: '{log_type=cron,
get text on a START_OBJECT at 1:113"}}
It throws an error because apparently log.message is of type "keyword"; however, this field does not exist in the index mapping.
I thought this may be an issue with "target": "log", so I've tried changing it to something arbitrary like "my_parsed_message", "m_log" or "mlog", and I get the same error for all of them.
{"type":"mapper_parsing_exception","reason":"failed to parse field
[mlog.message] of type [keyword] in document with id
'J5KlDHUB_yo5bfXcn2LE'. Preview of field's value: '{log_type=cron,
get text on a START_OBJECT at 1:217"}}
Elastic version: 7.9.2
The problem is that some of your JSON messages contain a message field that is sometimes a simple string and other times a nested JSON object (like in the case you're showing in your question).
After this index was created, the very first message that was parsed was probably a string and hence the mapping has been modified to add the following field (line 10553):
"mlog": {
"properties": {
"message": {
"type": "keyword",
"ignore_above": 1024
You'll find the same pattern for my_parsed_message (line 10902), my_parsed_logs (line 10742), etc...
Hence the next message that comes with message being a JSON object, like
{"message":{"log_type":"cron","status":"start"}, ...
will not work because it's an object, not a string...
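To check whether your own logs mix the two shapes, something like the following sketch could be run over a sample of raw log lines. It only illustrates the conflict; the helper name field_kinds and the first sample line are hypothetical, and it does not involve Filebeat or Elasticsearch.

```python
import json


def field_kinds(lines, field="message"):
    """Collect the kinds of value a field takes across JSON log lines."""
    kinds = set()
    for line in lines:
        value = json.loads(line).get(field)
        # A dict corresponds to a JSON object; everything else is reported by type name.
        kinds.add("object" if isinstance(value, dict) else type(value).__name__)
    return kinds


lines = [
    '{"message": "job started", "level": "info"}',  # hypothetical string variant
    '{"message": {"log_type": "cron", "status": "start"}, "level": "info"}',
]
kinds = field_kinds(lines)
print(sorted(kinds))
```

More than one kind for the same field means that whichever variant is indexed first fixes the mapping, and the other variant is then rejected.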
Looking at the fields of your custom JSON, it seems you don't really have the control over either their taxonomy (i.e. naming) or what they contain...
If you're serious about searching within those custom fields (which I think you are, since you're parsing the field; otherwise you'd just store the stringified JSON), then I can only suggest that you start figuring out a proper taxonomy to make sure that each field always gets a consistent type.
If all you care about is logging your data, then I suggest simply disabling the indexing of that message field. Another solution is to set dynamic: false in your mapping to ignore those fields, i.e. not modify your mapping.

Data filter works with json data but not with csv data

In this vega chart, if I download and convert the flare-dependencies.json to csv using the following jq command,
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' flare-dependencies.json > flare-dependencies.csv
And change the corresponding data property in the file from:
"name": "dependencies",
"url": "data/flare-dependencies.json",
"transform": [
"type": "formula",
"expr": "treePath('tree', datum.source,",
"as": "treepath",
"initonly": true
"name": "dependencies",
"url": "data/flare-dependencies.csv",
"format": { "type": "csv" },
"transform": [
"type": "formula",
"expr": "treePath('tree', datum.source,",
"as": "treepath",
"initonly": true
The hover effect stops working (the colors do not change when I hover over edges or nodes).
I suspect that the issue is with this section:
"name": "selected",
"source": "dependencies",
"transform": [
"type": "filter",
"expr": "datum.source === active || datum.target === active"
What am I missing? How can I fix this?
JSON data is typed; that is, the file format distinguishes between string and numerical data. CSV data is untyped: all entries are expressed as strings.
The chart specification above requires some fields to be numerical. When you convert the input data to CSV, you must add a format specifier to specify numerical types for the numerical data columns.
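The effect can be reproduced outside Vega. The sketch below uses Python's csv and json modules with made-up rows to show why a strict comparison against a numeric id matches the JSON rows but not the CSV rows until the columns are parsed as numbers; the variable names are illustrative only.

```python
import csv
import io
import json

# The same single link, once as JSON and once as CSV (made-up values).
json_rows = json.loads('[{"source": 2, "target": 5}]')
csv_rows = list(csv.DictReader(io.StringIO("source,target\n2,5\n")))

active = 2  # the hovered node id is a number


def matches(rows):
    """Mimic the filter: keep rows whose source or target equals the active id."""
    return [r for r in rows if r["source"] == active or r["target"] == active]


json_match = matches(json_rows)   # numbers compare equal
csv_match = matches(csv_rows)     # "2" == 2 is False, so nothing matches

# Parsing the columns as numbers restores the behaviour.
parsed_rows = [{key: int(value) for key, value in row.items()} for row in csv_rows]
parsed_match = matches(parsed_rows)
print(len(json_match), len(csv_match), len(parsed_match))
```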
In the case of this chart, you can use the following for the nodes data:
"format": {
"type": "csv",
"parse": { "id": "number", "name": "string", "parent": "number" }
}
And the following for the links data:
"format": {
"type": "csv",
"parse": { "source": "number", "target": "number" }
}

Data from geoJSON API call in Laravel 5.8

I am trying to retrieve data from the API - it returns the format in geoJSON and I am not sure how to actually get the data I want from it.
If I am using the API, I have no issues as it returns JSON format in which I can pull from rather easily.
I am using GuzzleHTTP to make the API call.
I am playing around with learning APIs and I have an interest in weather, so I figured I would work on an application that pulls information from the local weather station and outputs it in a readable format for users in a table.
The code I am currently using is:
$api_call =,LON;
$client = new \GuzzleHttp\Client();
$request = $client->get($api_call);
if ($request->getStatusCode() == 200) {
    $weatherRequest = $request->getBody();
    $requestedWeather = json_decode($weatherRequest);
    $currentweather = $requestedWeather; // THIS IS WHERE I NEED HELP
    return $currentweather;
    return view('currentweather', ["currentweather" => $currentweather]);
}
When I am returning $currentweather and var_dump it to the view, it gives me all the geoJSON data but I don't know how to correctly iterate through the data to pull the information I need.
When I pull from another API it gives a different JSON format which I can just pull like so:
$api_call = https://api.weatherbit.xx/v2.0/current?
$client = new \GuzzleHttp\Client();
$request = $client->get($api_call);
if ($request->getStatusCode() == 200) {
    $weatherRequest = $request->getBody();
    $requestedWeather = json_decode($weatherRequest);
    $currentweather = $requestedWeather->data;
    return $currentweather;
    return view('currentweather', ["currentweather" => $currentweather]);
}
And when I use $currentweather in my view I can pull any data I need with the object property name. I am not sure how to pull the data when it leads off with the @context tag.
The data I want lies in the "properties" part of the geoJSON array and I just can't seem to figure out how to get that in the way I am currently using.
This is my geoJSON array return:
{ "@context": [ "", { "wx": "", "s": "", "geo": "", "unit": "", "@vocab": "", "geometry":
{ "@id": "s:GeoCoordinates", "@type": "geo:wktLiteral" }, "city": "s:addressLocality", "state": "s:addressRegion", "distance": { "@id": "s:Distance", "@type": "s:QuantitativeValue" }, "bearing": { "@type": "s:QuantitativeValue" }, "value": { "@id": "s:value" }, "unitCode":
{ "@id": "s:unitCode", "@type": "@id" }, "forecastOffice": { "@type": "@id" }, "forecastGridData": { "@type": "@id" }, "publicZone": { "@type": "@id" }, "county": { "@type": "@id" } } ], "id": ",xxx", "type": "Feature", "geometry": { "type": "Point", "coordinates": [ xxx, xxx ] }, "properties":
{ "@id": ",xxx", "@type": "wx:Point", "cwa": "xxx", "forecastOffice": "", "gridX": 86, "gridY": 77, "forecast": ",xx/forecast", "forecastHourly": ",xx/forecast/hourly", "forecastGridData": ",xx", "observationStations": ",xx/stations", "relativeLocation":
{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [ xxx, xxx ] }, "properties": { "city": "xxx", "state": "xx", "distance": { "value": xxxx.xxxxxxxxx, "unitCode": "unit:m" }, "bearing": { "value": 150, "unitCode": "unit:degrees_true" } } }, "forecastZone": "", "county": "", "fireWeatherZone": "", "timeZone": "America/New_York", "radarStation": "xxxx" } }
Thanks for your help!
Any member of the JSON object can be accessed via the same name on the object returned by json_decode. Your weatherbit example $requestedWeather->data works because everything is in a member called data. So... $requestedWeather->properties will get you what you want from the API.
You can also pass true as a second argument to json_decode to get back a plain PHP array instead.
$requestedWeather = json_decode($weatherRequest, true);
This is often recommended because JSON allows member names that are not valid PHP object property names (e.g., names containing hyphens).
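For comparison, the same access pattern sketched in Python rather than PHP, where decoded JSON objects are plain dictionaries and odd member names such as @context need no special handling. The response body below is a trimmed, hypothetical stand-in for the API output in the question; the city and state values are placeholders.

```python
import json

# Trimmed, hypothetical response body; the real values are elided in the question.
body = """
{
  "@context": [],
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
  "properties": {
    "gridX": 86,
    "gridY": 77,
    "timeZone": "America/New_York",
    "relativeLocation": {
      "type": "Feature",
      "properties": {"city": "Example City", "state": "XX"}
    }
  }
}
"""

weather = json.loads(body)
properties = weather["properties"]  # everything of interest lives here
city = properties["relativeLocation"]["properties"]["city"]  # nested member access
context = weather["@context"]  # keys with special characters work as-is
print(properties["timeZone"], city)
```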

Loading Numeric data into BigQuery with Avro files created with goavro

I am trying to figure out how to load dollar values into a Numeric column in BigQuery using an Avro file. I am using golang and the goavro package to generate the avro file.
It appears that the appropriate datatype in go to handle money is big.Rat.
BigQuery documentation indicates it should be possible to use Avro for this.
I can see from a few goavro test cases that encoding a *big.Rat into a fixed.decimal type is possible.
I am using a goavro.OCFWriter to encode data using a simple avro schema as follows:
"type": "record",
"name": "MyData",
"fields": [
"name": "ID",
"type": [
"name": "Cost",
"type": [
"type": "fixed",
"size": 12,
"logicalType": "decimal",
"precision": 4,
"scale": 2
I am attempting to Append data with the "Cost" field as follows:
map[string]interface{}{"fixed.decimal": big.NewRat(617, 50)}
This is successfully encoded, but the resulting avro file fails to load into BigQuery:
Err: load Table MyTable Job: {Location: ""; Message: "Error while reading data, error message: The Apache Avro library failed to parse the header with the following error: Missing Json field \"name\": {\"logicalType\":\"decimal\",\"precision\":4,\"scale\":2,\"size\":12,\"type\":\"fixed\"}"; Reason: "invalid"}
So I am doing something wrong here... Hoping someone can point me in the right direction.
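I figured it out. I need to use bytes.decimal instead of fixed.decimal. The error message points at the reason: an Avro fixed type must have a name, and the fixed schema above has none.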
"type": "record",
"name": "MyData",
"fields": [
"name": "ID",
"type": [
"name": "Cost",
"type": [
"type": "bytes",
"logicalType": "decimal",
"precision": 4,
"scale": 2
Then encode similarly
map[string]interface{}{"bytes.decimal": big.NewRat(617, 50)}
And it works nicely!
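For anyone curious what ends up in the file: per the Avro specification, a decimal stored as bytes holds the unscaled integer in two's-complement, big-endian form. The sketch below illustrates that encoding in Python for the value used above (617/50 = 12.34 with scale 2); it is not goavro's implementation, and the helper name decimal_to_avro_bytes is hypothetical.

```python
from fractions import Fraction


def decimal_to_avro_bytes(value, scale):
    """Encode a rational as the unscaled two's-complement big-endian integer."""
    unscaled = value * 10 ** scale
    if unscaled.denominator != 1:
        raise ValueError("value has more fractional digits than the scale allows")
    number = unscaled.numerator
    # One spare bit for the sign, rounded up to whole bytes.
    length = (number.bit_length() + 8) // 8
    return number.to_bytes(length, byteorder="big", signed=True)


encoded = decimal_to_avro_bytes(Fraction(617, 50), scale=2)
print(encoded.hex())
```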

elasticsearch: unable to set geo_shape value using XContentBuilder

I have the following mapping in Elasticsearch. I am able to PUT documents using the Sense plugin, but I am unable to set the geo_shape field value using XContentBuilder. I am getting the following error:
[106]: index [streets], type [street], id [{dc872755-f307-4c5e-93f6-bba9c95791c7}], message [MapperParsingException[failed to parse [shape]]; nested: ElasticsearchParseException[shape must be an object consisting of type and coordinates];]
PUT /streets
"mappings": {
"street": {
"properties": {
"id": {
"type": "string"
"shape": {
"type": "geo_shape",
"tree": "quadtree"
val bulkRequest:BulkRequestBuilder = esClient.prepareBulk()
xb = jsonBuilder().startObject()
xb.field("id", guid)
xb.field("shape", jsonString) // removing this line creates the index OK but without the geo_shape
bulkRequest.add(esClient.prepareIndex("streets", "street", guid).setSource(xb))
//end loop
val bulkResponse:BulkResponse = bulkRequest.execute().actionGet()
"id": "{98b8fd8d-074c-4349-a83b-6e892bf2d0ef}",
"shape": {
"type": "LineString",
"coordinates": [
[-70.81866815832467, 43.12187109162505],
[-70.83054813653018, 43.15917412985851],
[-70.81320737213957, 43.23522269547419],
[-70.90108590067649, 43.28102004268419]
"crs": {
"type": "name",
"properties": {
"name": "EPSG:4326"
Appreciate any feedback?
It might be a bit late for you, but this could help someone facing a similar issue even nowadays.
Following your index mapping for the document streets, we have these properties: id and shape.
In your error message, it's described that:
shape must be an object consisting of type and coordinates
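So for your concrete case, the crs member is just not accepted (I don't know exactly why extra parameters can't be added). Note also that xb.field("shape", jsonString) writes the shape as a JSON string rather than as a nested object, which on its own can trigger this error.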
This is an example for how to add a document into the streets index using CURL:
curl -X POST "localhost:9200/streets/_doc?pretty" -H 'Content-Type: application/json' -d '
"id": 123,
"shape": {
"type": "Polygon",
"coordinates": [
If you need to add a LineString, instead of a Polygon, just change the 'type' attribute from the 'shape'.
I hope this helps people having to add documents with shapes into an ElasticSearch database.
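A minimal sketch of preparing such a document in Python, assuming the shape arrives as a JSON string (as jsonString does in the question). It parses the string into a nested object and drops crs so that only type and coordinates remain; the helper name build_street_doc is hypothetical and nothing is sent to Elasticsearch.

```python
import json


def build_street_doc(guid, shape_json):
    """Build an indexable document whose shape is an object, not a string."""
    shape = json.loads(shape_json)  # parse the string into a nested object
    shape.pop("crs", None)          # keep only type and coordinates
    return {"id": guid, "shape": shape}


shape_json = json.dumps({
    "type": "LineString",
    "coordinates": [
        [-70.81866815832467, 43.12187109162505],
        [-70.83054813653018, 43.15917412985851],
    ],
    "crs": {"type": "name", "properties": {"name": "EPSG:4326"}},
})
doc = build_street_doc("{98b8fd8d-074c-4349-a83b-6e892bf2d0ef}", shape_json)
print(json.dumps(doc, indent=2))
```

The same idea applies in the Scala code: the shape has to reach the builder as structured content rather than as a plain string field.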
