Elastic Kibana selection of points (geo_points) in a complex polygon - elasticsearch

I'm having trouble with what seems like a fairly basic use case, but I'm hitting limitations in Kibana and problems with certain geo data types. It's starting to feel like I'm just approaching it wrong.
I have a relatively large point data set (locations) of type geo_point, with a map and dashboard built. I now want to add a complex AOI (area of interest). I took the shapefile, dissolved it so it became one feature instead of many, converted it to GeoJSON, and uploaded it (to create an index) via the Kibana Maps functionality. I then made it available as a layer, and wanted to simply select it, show a tooltip, and then Filter by Feature. Unfortunately, I then received an error saying, roughly, that the operation would be too large to be posted to the URL - which I understand, as there are over 2 million characters in the GeoJSON.
Instead, I thought I could write the query myself using the pre-indexed shape, following the guidance at: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-shape-query.html
However, it doesn't seem possible to query a geo_point field against a geo_shape this way.
e.g.
GET /locations_index/_search
{
  "query": {
    "geo_point": {
      "geolocation": {
        "relation": "within",
        "indexed_shape": {
          "index": "aoi_index",
          "id": "GYruUnMBfgunZ6kjA8qn",
          "path": "coordinates"
        }
      }
    }
  }
}
Gives an error of:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "no [query] registered for [geo_point]",
        "line" : 3,
        "col" : 18
      }
    ],
    "type" : "parsing_exception",
    "reason" : "no [query] registered for [geo_point]",
    "line" : 3,
    "col" : 18
  },
  "status" : 400
}
Do I need to convert my points index to geo_shape instead of geo_point? Or is there a simpler way?
I note the documentation at https://www.elastic.co/guide/en/elasticsearch/guide/current/filter-by-geopoint.html suggests that I can query with geo_polygon, but I can't see any way of referencing my pre-indexed shape rather than embedding the huge chunk of JSON in the query (as the example shows).
Can anyone point me (even roughly) in the right direction?
Thanks in advance.

Here's how you can utilize indexed_shape - let me know if this is sufficient to get you started.
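In recent Elasticsearch versions (7.x+) the geo_shape query can run against geo_point fields too, so you should not need to reindex your points as geo_shape. A minimal sketch, reusing the field, index, document id, and path from your failing query - the only real changes are the query name (geo_shape, not geo_point) and the relation (intersects, which for points inside a polygon is equivalent to within):

GET /locations_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "geolocation": {
            "relation": "intersects",
            "indexed_shape": {
              "index": "aoi_index",
              "id": "GYruUnMBfgunZ6kjA8qn",
              "path": "coordinates"
            }
          }
        }
      }
    }
  }
}

Wrapping it in a bool filter keeps it unscored and cacheable; do check that path matches the name of the geo_shape field in your uploaded AOI index.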

Related

Elastic/Opensearch: how to create a new document from an _ingest/pipeline

I am working with Elastic/Opensearch and want to create a new document in a different index out of an _ingest/pipeline. I found no help on the web...
All my documents (Filebeat) get parsed and modified at the start by a pipeline, let's say "StartPipeline".
Triggered by information in a field of the incoming document, let's say "Start", I want to store that value in a special way by creating a new document in a different long-term index - with some more information from the triggering document.
I found ways to do this manually from the console (update_by_query / reindex / painless scripts), but it has to be triggered by an incoming document...
Perhaps this is easier to understand - in my head it looks something like this:
PUT _ingest/pipeline/StartPipeline
{
  "description" : "create a document in/to a different index",
  "processors" : [
    {
      "PutNewDoc" : {
        "if": "ctx.FieldThatTriggers == 'start'",
        "index": "DestinationIndex",
        "_id": "123",
        "document": {
          "message": "",
          "script": "start",
          "server": "alpha",
          ...
        }
      }
    }
  ]
}
Does anyone have an idea?
And sorry, I am not a native speaker - I am from Germany.
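For what it's worth, a minimal sketch of the closest built-in mechanism I am aware of: an ingest pipeline cannot emit an additional document, but a set processor can conditionally rewrite the _index metadata field, rerouting the triggering document into a different index. The field name ctx.FieldThatTriggers and the index name are taken from the pseudocode above and are illustrative only:

PUT _ingest/pipeline/StartPipeline
{
  "description": "illustrative sketch: reroute documents whose trigger field is 'start' into a long-term index",
  "processors": [
    {
      "set": {
        "if": "ctx.FieldThatTriggers == 'start'",
        "field": "_index",
        "value": "destination-index"
      }
    }
  ]
}

Note that this moves the document rather than copying it; emitting a second document with extra fields would have to happen outside the pipeline (for example in the application, or via a scheduled reindex).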

Elasticsearch 7 number_format_exception for input value as a String

I have a field in my index with the mapping:
"sequence_number" : {
  "type" : "long",
  "copy_to" : [
    "_custom_all"
  ]
}
and I am using this search query:
POST /my_index/_search
{
  "query": {
    "term": {
      "sequence_number": {
        "value": "we"
      }
    }
  }
}
I am getting the error message:
,"index_uuid":"FTAW8qoYTPeTj-cbC5iTRw","index":"my_index","caused_by":{"type":"number_format_exception","reason":"For input string: \"we\""}}}]},"status":400}
    at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:260) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:238) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1433) ~[elasticsearch-rest-high-level-client-7.1.1.jar:7.1.1]
    at ...
How can I ignore number_format_exception errors, so that the query either returns nothing or skips this particular filter - either is acceptable.
Thanks in advance.
What you are looking for is not possible; ideally, you should have coerce enabled on your numeric fields so that your index doesn't contain dirty data.
The best solution is to handle this in the application that generates the Elasticsearch query: check for a NumberFormatException when searching numeric fields, and reject the query before sending it, since the index doesn't contain the dirty data in the first place.
Edit: Another interesting approach is to validate the query before sending it to ES, using the Validate API as suggested by #prakash. The only drawback is that it adds another network call, but if your application is not latency-sensitive, it can be used as a workaround.
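For reference, a minimal sketch of that Validate API check, reusing the index and query from the question; an unparseable value comes back as "valid": false instead of raising an exception:

GET /my_index/_validate/query
{
  "query": {
    "term": {
      "sequence_number": {
        "value": "we"
      }
    }
  }
}

The response looks roughly like {"_shards": ..., "valid": false}, which the application can use to drop or rewrite the query before running the real search.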

Avoid mapping multiple fields in Elasticsearch

I have the following problem when indexing documents in Elasticsearch: my documents contain fields that are not repeated in other documents, so I end up having a mapping of more than 100,000 elements. Let's see an example:
If I send something like this to an empty index:
{
  "example": {
    "a1": 123,
    "a2": 444,
    "a3": 52566,
    "a4": 7,
    .....
    "aN": 11
  }
}
It will create the following mapping:
{
  "example" : {
    "properties" : {
      "a1" : {
        "type" : "long"
      },
      "a2" : {
        "type" : "long"
      },
      "a3" : {
        "type" : "long"
      },
      "a4" : {
        "type" : "long"
      },
      .....
      "aN" : {
        "type" : "long"
      }
    }
  }
}
Then if I send another document:
{
  "example": {
    "b1": 123,
    "b2": 444,
    "b3": 52566,
    "b4": 7,
    .....
    "bN": 11
  }
}
It will create a mapping twice the size of the one above.
The real object is more complex than this, but the situation I'm in now is that the mapping has become so big that it is killing the server.
How can I address this? Would multi-fields work in this scenario? I have tried several approaches, but nothing seems to work.
Thanks.
It is pretty tough to give you a definitive answer given that we have no idea of your use case, but my initial guess is that if you have a mapping with thousands of fields that have no logical bond, you've probably made some wrong choices about the architecture of your data. Could you tell us why you need thousands of differently named fields for a single document type? As it is, there's not much we can do to point you in the right direction.
If you really want to do so, create a mapping as in the example below:
POST /index_name/_mapping/type_name
{
  "type_name": {
    "enabled": false
  }
}
This gives the required behavior: Elasticsearch will stop creating mappings for fields, and will also stop parsing and indexing the contents of your documents (they remain stored in _source).
See these links to get more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-object-type.html#_enabled_3
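If you still need your known fields to stay searchable and only want to stop new fields from being added to the mapping, the dynamic setting from the first link above may be a better fit than disabling the object entirely. A rough sketch, where index, type, and known_field are placeholders:

POST /index_name/_mapping/type_name
{
  "type_name": {
    "dynamic": false,
    "properties": {
      "known_field": { "type": "long" }
    }
  }
}

With "dynamic": false, unknown fields are kept in _source but are not indexed or searchable, while known_field remains queryable.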

Elasticsearch data representation

I am new to Elasticsearch and thought I would go through the 10-minute walk-through to get started.
But I stumbled upon some very basic doubts here. I am not able to figure out the data representation. For example, the tutorial mentions creating an index:
curl -XPUT http://localhost:9200/shakespeare -d '
{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "speaker" : { "type": "string", "index" : "not_analyzed" },
        "play_name" : { "type": "string", "index" : "not_analyzed" },
        "line_id" : { "type" : "integer" },
        "speech_number" : { "type" : "integer" }
      }
    }
  }
}
';
I understand that this is a JSON string, but beyond that I am not able to understand this representation. I am not getting what _default_ is, what not_analyzed means, and so on.
Is there any standard that needs to be understood about how the data is represented before proceeding with Elasticsearch?
I am totally new to Elasticsearch and would really appreciate being pointed to some information/tutorial that would help me understand how to start learning this technology.
Thanks & Regards
Sunil
I think the main aim of the 10-minute walk-through is to give a quick demo of Kibana, not a full understanding of Elasticsearch (mapping, indexing, etc.).
But if you wish to understand what's happening in that example, you should know how to find your way through the documentation.
Example: the _default_ mapping:
"Often, all types in an index share similar fields and settings. It can be more convenient to specify these common settings in the _default_ mapping, instead of having to repeat yourself every time you create a new type. The _default_ mapping acts as a template for new types. All types created after the _default_ mapping will include all of these default settings, unless explicitly overridden in the type mapping itself."
And for more details about the default mapping, please refer here.
The 10-minute walk-through is for Kibana, running on top of Elasticsearch, and IMHO is not a great place to start when getting to know ES.
Personally, over the last few years I've found these introductions helpful:
http://joelabrahamsson.com/elasticsearch-101/
http://exploringelasticsearch.com/overview.html
Overall the ES documentation is reasonably complete and looks great, but it can be hard for a novice to navigate and find exactly what you need.
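As a side note on the not_analyzed part of the question: it tells Elasticsearch to index the whole string as a single exact value instead of breaking it into tokens, so speaker only matches exact-value lookups. On Elasticsearch 5.x and later the string type is gone and the same intent is expressed with keyword; a rough modern equivalent of the tutorial's mapping (assuming a typeless 7.x+ index) would be:

curl -XPUT http://localhost:9200/shakespeare -H 'Content-Type: application/json' -d '
{
  "mappings" : {
    "properties" : {
      "speaker" : { "type" : "keyword" },
      "play_name" : { "type" : "keyword" },
      "line_id" : { "type" : "integer" },
      "speech_number" : { "type" : "integer" }
    }
  }
}
'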

Is it possible to add filters when performing a GET in elasticsearch?

I have a situation where I want to filter the results not when performing a search, but rather when performing a GET with Elasticsearch. Basically, I have a document with a status field that can indicate the entity is in a "discarded" state. When performing the GET, I need to check the value of this field and exclude the document if its status is indeed "discarded".
I know I can do this using a search with a term query, but what about when using a GET against the index based on document ID?
Update: upon further investigation, it seems the only way to do this is to use percolation or a search. I hope I am wrong - if anyone has any suggestions, I am all ears.
Just to clarify, I am using the Java API.
Thanks
Try something like this:
curl http://domain/my_index/_search -d '{
  "filter": {
    "and": [
      {
        "ids" : {
          "type" : "my_type",
          "values" : ["123"]
        }
      },
      {
        "term" : {
          "discarded" : "false"
        }
      }
    ]
  }
}'
NOTE: you can also use a missing filter if the discarded field does not exist on some docs.
NOTE 2: I don't think this will be markedly slower than a normal get request either...
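A hedged follow-up for anyone on Elasticsearch 2.x or later, where the standalone and filter was removed: the equivalent form wraps the same two conditions in a bool query's filter clause (the ids query also no longer takes a type), roughly:

curl http://domain/my_index/_search -d '{
  "query": {
    "bool": {
      "filter": [
        { "ids": { "values": ["123"] } },
        { "term": { "discarded": "false" } }
      ]
    }
  }
}'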
