Elasticsearch dynamic mapping with geopoint - elasticsearch

We use Elasticsearch to index schemaless data. The thing is that the majority of the entries we want to index contain fields like "longitude", "latitude", "lat" or "long".
What would be the best way to index that data so the field type allows searching with a geo distance filter?
Thanks a lot.
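One option worth sketching for the schemaless case (this is not from the answer below, and the index name and match pattern are hypothetical): if the coordinates can be combined into a single field at ingest time, a dynamic template can map any new field whose name matches a pattern to geo_point, so dynamically mapped documents still end up searchable by distance:
PUT /my_index
{
  "mappings": {
    "mygeopoints": {
      "dynamic_templates": [
        {
          "geopoints": {
            "match": "*geopoint*",
            "mapping": {
              "type": "geo_point"
            }
          }
        }
      ]
    }
  }
}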

I know it's been some time since you posted this, but in case someone stumbles upon it like I did, here are some ways to do it.
In our case, we needed a dynamic radius, so here's the mapping we have:
"mappings": {
"mygeopoints": {
"properties": {
"geopoint": {
"type": "geo_point",
"lat_lon" : true
},
"radius": {
"type": "long"
}
}
}
}
Our documents are indexed using a SQL query that looks like this:
SELECT label, (lat || ',' || lon) as geopoint, radius FROM points;
We're sending the geopoint as a string that contains both latitude and longitude separated by a comma.
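For illustration, a document indexed this way might look like the following (the label and values are made up; geo_point accepts a "lat,lon" string):
PUT /my_index/mygeopoints/1
{
  "label": "Point A",
  "geopoint": "43.5,5.7",
  "radius": 15
}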
To search through the points you can now use the geo_distance filter (note that the array form takes coordinates as [lon, lat]):
"filter" : {
"geo_distance" : {
"geopoint" : [ 5.7, 43.5 ],
"distance" : "15km"
}
}
On our side though, we needed a dynamic radius per document, so we did not find any other solution than using a script filter:
"filter" : {
"script" : {
"script" : "!doc['geopoint'].empty && doc['geopoint'].distanceInKm(43.5,5.7) <= doc['radius'].value"
}
}

Related

Using Regexp Search inside a must bool query vs using must_not bool query

I want to make queries like:
- get all documents containing / not containing "some value" for a given field
- get all documents having a value equal / not equal to "some value" for a given field.
As per my mapping, the fields are of type text with a keyword sub-field, meaning they support both keyword and full-text search:
"myField" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
I was initially using regexp matching like this (the query below is for the not-equals case):
"bool": {
"must":[
{
"regexp": {
"myField.keyword": {
"value": "~(some value)",
"flags": "ALL"
}
}
}
]
}
So basically: ~(word) for not equals, .*word.* for contains, and ~(.*word.*) for not contains.
But then I also came across the must_not bool query. I understand I can add a must_not clause for the not-equals cases alongside the must and should clauses (for boolean AND and OR between other fields) in my bigger bool query, but I'm still not sure about the contains and not-contains searches. Can someone definitively explain what the best practice is here, both in terms of performance and accuracy of the result set returned?
Elasticsearch version used: currently transitioning from v6.3 to v7.1.1.
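Not a definitive answer, but a minimal sketch of the bool-based alternative described above, assuming the same myField mapping (the values are placeholders). A term query on the keyword sub-field covers exact not-equals, and a wildcard query can stand in for the .*word.* contains pattern; both clauses are shown in one must_not only for brevity:
"bool": {
  "must_not": [
    { "term": { "myField.keyword": "some value" } },
    { "wildcard": { "myField.keyword": "*word*" } }
  ]
}
A term lookup is generally cheaper than an equivalent regexp, though patterns with a leading wildcard (like *word*) remain expensive either way.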

ElasticSearch append non matched docs at the end of the search result

Is there any way to append non matched docs at the end of the search result?
I have been working on a project where we need to search docs by geolocation data, but some docs don't have geolocation data available. As a result, these docs are not returned in the search results.
Example mapping:
PUT /my_locations
{
  "mappings": {
    "_doc": {
      "properties": {
        "address": {
          "properties": {
            "city": {
              "type": "text"
            },
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
Data with geo location:
PUT /my_locations/_doc/1
{
  "address": {
    "city": "XYZ",
    "location": {
      "lat": 40.12,
      "lon": -71.34
    }
  }
}
Data without geo location:
PUT /my_locations/_doc/2
{
  "address": {
    "city": "ABC"
  }
}
Is there any way to perform a geo distance query which will select the docs with geolocation data and append the non-geo docs at the end of the result?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-query.html#query-dsl-geo-distance-query
You have two separate queries:
- Get documents within the area.
- Get other documents.
To get both of these in one search would mean all of the documents appear in one result and share ranking. It would be difficult to create a relevancy model which gets the first 9 documents with a location and one without.
But you can just run two queries at once: one for, say, the first 9 documents with a location, and one for documents without any.
Example:
GET my_locations/_msearch
{}
{"size":9,"query":{"geo_distance":{"distance":"200km","address.location":{"lat":40,"lon":-70}}}}
{}
{"size":1,"query":{"bool":{"must_not":[{"exists":{"field":"address.location"}}]}}}

Mapping to limit length of Array datatype in Elasticsearch

I'm trying to create an Elasticsearch mapping which limits the length of an array datatype to x number of items.
mapping = """
{
"mappings": {
"document": {
"properties": {
"pages": {
"type": "text"
}
}
}
}
}
}
"""
In this case, how do I set the "pages" array to have a maximum of 1,000 list items? Also, is there a way to "ignore" insert errors triggered by ES when this limit has been reached?
Elasticsearch has no such limit; you'd have to enforce it in your application.
As for ignoring errors, look at the ignore_malformed option, which many field types support.
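As a minimal sketch of that option (the index and field names are hypothetical), ignore_malformed makes Elasticsearch skip a malformed value for that field instead of rejecting the whole document; note it applies to data-type parsing errors, not to array lengths:
PUT /my_index
{
  "mappings": {
    "document": {
      "properties": {
        "page_count": {
          "type": "long",
          "ignore_malformed": true
        }
      }
    }
  }
}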
Hope this helps!
Thanks Honza!
I had eventually assumed as much. To expand on your answer, here's how I'm inserting/indexing documents now:
data = {
    "_op_type": "update",  # scripted updates need the "update" op type, not "index"
    "_index": "myIndex",
    "_type": "document",
    "_id": item["id"],  # update actions require a document id (field name assumed)
    "script": {
        # only append while the array still holds fewer than 1,000 items
        "inline": "if (ctx._source.pages.length < 1000) { ctx._source.pages.add(params.page); }",
        "params": {
            "page": "{}".format(item["page"])
        }
    }
}
I'm using the script field, combined with the "painless" language, to check the array length before appending the page to the document.
Note, I'm using the Python Elasticsearch library's bulk helper in the above example, which is why you see the "_op_type" field.
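For context, a rough sketch of how such action dicts get passed to the bulk helper (the items iterable and the build_action helper are assumed here, not part of the original snippet):
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()

# build_action(item) returns an action dict shaped like the one above (assumed helper)
actions = (build_action(item) for item in items)

# raise_on_error=False keeps the bulk run going past individual failures
bulk(es, actions, raise_on_error=False)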

Avoid mapping multiple fields in elastic search

I have the following problem when indexing documents in Elasticsearch: my documents contain some fields that are not repeated in other documents, so I end up having a mapping of more than 100,000 elements. Let's see an example:
If I send something like this to an empty index:
{"example":{
"a1":123,
"a2":444,
"a3":52566,
"a4":7,
.....
"aN":11
}
}
It will create the following mapping:
{"example" : {
"properties" : {
"a1" : {
"type" : "long"
},
"a2" : {
"type" : "long"
},
"a3" : {
"type" : "long"
},
"a4" : {
"type" : "long"
},
.....
"aN" : {
"type" : "long"
}
}
}
}
Then if I send another document:
{"example":{
"b1":123,
"b2":444,
"b3":52566,
"b4":7,
.....
"bN":11
}
}
It will create a mapping twice the size of the one above.
The real object is more complex than this, but the situation I'm facing now is that the mapping has grown so big that it is killing the server.
How can I address this? Would multi-fields work in this scenario? I tried several ways but it doesn't seem to work.
Thanks.
It is pretty tough to give you a definitive answer given we have no idea of your use case, but my initial guess is that if you have a mapping of thousands of fields that have no logical bond, you've probably made some wrong choices about the architecture of your data. Could you tell us why you need thousands of fields with different names for a single document type? As it is, there's not much we can do to point you in the right direction.
If you really want to keep going this way, create the mapping as in the example below:
POST /index_name/_mapping/type_name
{
  "type_name": {
    "enabled": false
  }
}
It will give the required behavior: Elasticsearch will stop creating mappings for the fields, and will no longer parse or index the contents of your documents under that type (the documents are still stored in _source).
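As a quick sketch of the resulting behavior (the field values are hypothetical): a document like the one below is still accepted and can be fetched by id from _source, but nothing in it is indexed or searchable:
PUT /index_name/type_name/1
{
  "a1": 123,
  "a2": 444
}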
See these links to get more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-object-type.html#_enabled_3

How to use lucene SpanQuery in ElasticSearch

For my project, I thought of using the span near queries of Elasticsearch, with the constraint that certain tokens may have to be searched with fuzziness. I was able to generate a set of SpanQuery (org.apache.lucene.search.spans.SpanQuery) objects, some with fuzziness enabled, some without. I couldn't figure out how to use this set of SpanQueries in an Elasticsearch spanNearQuery.
Can someone help me out with pointers to samples or docs? And is there any way to construct an ES SpanNearQueryBuilder with fuzziness enabled on some clauses?
You can wrap a fuzzy query into a span query with the span multi-term query:
{
  "span_near": {
    "clauses": [
      { "span_term": { "field": "value1" } },
      {
        "span_multi": {
          "match": {
            "fuzzy": {
              "field": {
                "value": "value2",
                "fuzziness": "AUTO"
              }
            }
          }
        }
      }
    ],
    ...
  }
}
