How to get distance from elasticsearch.net / NEST for a geo_point field - elasticsearch

I would like to get the distance for a geo_point field back in my search results.
I already wrote this request, which gives me the points closest to my search parameters:
ConnectionSettings elasticSettings = new ConnectionSettings(new Uri("http://localhost:9200"));
ElasticClient client = new ElasticClient(elasticSettings);
var searchResults = client.Search<dynamic>(s => s.Index("index1,index2,index3").From(0).Size(10).Query(
q => q.Bool(
b => b.Must(
f => f.GeoDistance(
g => g.Distance(20, DistanceUnit.Kilometers).DistanceType(GeoDistanceType.Arc).Field("geo").Location(lat, lon))))));
I tried a lot of code found on the web but I cannot adapt it to my case.
I just want Elasticsearch to return the distance for each point.
My field in Elasticsearch looks like this (a plain string):
geo 74.875,-179.875
and in another test index it is structured like this (the search doesn't work in that case):
geo {
"lat": 74.875,
"lon": -178.625
}
Can the first or the second format have an impact on the query?
Here is my mapping for the index :
{
"index1": {
"aliases": {},
"mappings": {
"properties": {
"Date": { "type": "date" },
"Value": { "type": "text" },
"geo": { "type": "geo_point" }
}
},
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "4",
"provided_name": "index1",
"creation_date": "1569420798736",
"number_of_replicas": "0",
"uuid": "jqc1RRhxSC2e5yJJX2lyzw",
"version": { "created": "7030199" }
}
}
}
}
I integrated a script field in my query like this:
var searchResults = client.Search<dynamic>(s => s.Index("index").From(0).Size(100).ScriptFields(sf => sf.ScriptField("distance", d => d.Source("if (doc['geo'].size() > 0) { doc['geo'].arcDistance(" + lat + "," + lon + ") }"))).Query(
q => q.Bool(
b => b.Must(
f => f.GeoDistance(
g => g.Distance(20, DistanceUnit.Kilometers).DistanceType(GeoDistanceType.Arc).Field("geo").Location(lat, lon))))));
With this request I get a "200 successful" response, and it seems that it returns the distance but not the other fields; all 100 documents are null.
Valid NEST response built from a successful (200) low level call on
POST: /index1/_search?typed_keys=true
# Audit trail of this API call:
- [1] HealthyResponse: Node: http://localhost:9200/ Took: 00:00:01.0670113
# Request:
{"from":0,"query":{"bool":{"must":[{"geo_distance":
{"distance":"200km","distance_type":"arc","geo":
{"lat":57.123,"lon":-20.876}}}]}},"script_fields":{"distance":{"script":
{"source":"doc['geo'].arcDistance(57.123,-20.876)"}}},"size":100}
# Response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 4,
"successful": 4,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1203,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "index1",
"_type": "_doc",
"_id": "121197",
"_score": 1.0,
"fields": { "distance": [ 198251.11868760435 ] }
},
{
"_index": "index1",
"_type": "_doc",
"_id": "121198",
"_score": 1.0,
"fields": { "distance": [ 197018.831847128 ] }
},
...98 more
]
}
}
Thank you.

You need to use a script field to return the distance:
"script_fields":{
"distance":{
"script":"doc['latlng'].arcDistance(params.lat,params.lng)",
"params":{
"lat":<some value>,
"lng":<some value>
}
}
}
The NEST equivalent (passing lat/lon via params instead of concatenating them into the script source, as your query does, also lets Elasticsearch cache the compiled script):
var scriptFields = new ScriptFields
{
{
"distance", new ScriptField {
Script = new InlineScript( "if(doc['"+field+"'].size() > 0) { doc['"+field+"'].arcDistance(params.lat,params.lon) }")
{
Params=new FluentDictionary<string, object>
{
{ "lat", latitude},
{ "lon", longitude}
}
}
}
}
};
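Here is a minimal sketch of wiring these script fields into the full search call (index and field names taken from your example). Two hedged notes: when script_fields is present, Elasticsearch omits _source from the hits by default, which is likely why your 100 documents come back null, so the sketch re-enables it explicitly; and both of your geo formats (the "lat,lon" string and the lat/lon object) are valid geo_point representations, so either should work as long as the field is mapped as geo_point.
var searchResults = client.Search<dynamic>(s => s
    .Index("index1")
    .From(0)
    .Size(100)
    // script_fields suppresses _source by default; bring it back explicitly
    .Source(src => src.Includes(i => i.Field("*")))
    .ScriptFields(sf => sf
        .ScriptField("distance", d => d
            .Source("if (doc['geo'].size() > 0) { doc['geo'].arcDistance(params.lat, params.lon) }")
            .Params(p => p
                .Add("lat", lat)
                .Add("lon", lon))))
    .Query(q => q
        .Bool(b => b
            .Must(m => m
                .GeoDistance(g => g
                    .Distance(20, DistanceUnit.Kilometers)
                    .DistanceType(GeoDistanceType.Arc)
                    .Field("geo")
                    .Location(lat, lon))))));

// the computed distances arrive under "fields" (in meters), not under _source
foreach (var hit in searchResults.Hits)
{
    var meters = hit.Fields["distance"].As<double[]>()[0];
}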

Related

Change field type in index without reindex

First, I had this index template
GET localhost:9200/_index_template/document
And this is the output:
{
"index_templates": [
{
"name": "document",
"index_template": {
"index_patterns": [
"v*-documents-*"
],
"template": {
"settings": {
"index": {
"number_of_shards": "1"
}
},
"mappings": {
"properties": {
"firstOperationAtUtc": {
"format": "epoch_millis",
"ignore_malformed": true,
"type": "date"
},
"firstOperationAtUtcDate": {
"ignore_malformed": true,
"type": "date"
}
}
},
"aliases": {
"documents-": {}
}
},
"composed_of": [],
"priority": 501,
"version": 1
}
}
]
}
And my data is indexed, for example
GET localhost:9200/v2-documents-2021-11-20/_search
{
"query": {
"bool": {
"should": [
{
"exists": {
"field": "firstOperationAtUtc"
}
}
]
}
}
}
Output is
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "v2-documents-2021-11-20",
"_type": "_doc",
"_id": "9b46d6fe78735274342d1bc539b084510000000455",
"_score": 1.0,
"_source": {
"firstOperationAtUtc": 1556868952000,
"firstOperationAtUtcDate": "2019-05-03T13:35:52.000Z"
}
}
]
}
}
Next, I need to update the mapping for the field firstOperationAtUtc and remove the epoch_millis format:
PUT localhost:9200/_template/document
{
"index_patterns": [
"v*-documents-*"
],
"template": {
"settings": {
"index": {
"number_of_shards": "1"
}
},
"mappings": {
"properties": {
"firstOperationAtUtc": {
"ignore_malformed": true,
"type": "date"
},
"firstOperationAtUtcDate": {
"ignore_malformed": true,
"type": "date"
}
}
},
"aliases": {
"documents-": {}
}
},
"version": 1
}
After that, if I run the previous search request, I still get my indexed data.
But now I need to update the field firstOperationAtUtc and set its value from firstOperationAtUtcDate:
POST localhost:9200/v2-documents-2021-11-20/_update_by_query
{
"script": {
"source": "if (ctx._source.firstOperationAtUtcDate != null) { ctx._source.firstOperationAtUtc = ctx._source.firstOperationAtUtcDate }",
"lang": "painless"
},
"query": {
"match": {
"_id": "9b46d6fe78735274342d1bc539b084510000000455"
}
}
}
After that, if I run the previous search request:
GET localhost:9200/v2-documents-2021-11-20/_search
{
"query": {
"bool": {
"should": [
{
"exists": {
"field": "firstOperationAtUtc"
}
}
]
}
}
}
I get no indexed data:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
But if I search by id, I do get the document with the modified data, although my field is now ignored:
GET localhost:9200/v2-documents-2021-11-20/_search
{
"query": {
"terms": {
"_id": [ "9b46d6fe78735274342d1bc539b084510000000455" ]
}
}
}
Output is
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "v2-documents-2021-11-20",
"_type": "_doc",
"_id": "9b46d6fe78735274342d1bc539b084510000000455",
"_score": 1.0,
"_ignored": [
"firstOperationAtUtc"
],
"_source": {
"firstOperationAtUtc": "2019-05-03T13:35:52.000Z",
"firstOperationAtUtcDate": "2019-05-03T13:35:52.000Z"
}
}
]
}
}
How can I index this data without reindexing? I have billions of documents in the index, and a reindex could cause huge downtime in production.
What you changed is the index template, but not your index mapping. The index template is used only when a new index that matches the name pattern is created.
What you want to do is to modify the actual mapping of your index, like this:
PUT test/_mapping
{
"properties": {
"firstOperationAtUtc": {
"ignore_malformed": true,
"type": "date"
}
}
}
However, this won't be possible and you will get the following error, which makes sense as you cannot modify an existing field mapping.
Mapper for [firstOperationAtUtc] conflicts with existing mapper:
Cannot update parameter [format] from [epoch_millis] to [strict_date_optional_time||epoch_millis]
The only reason why your update by query seemed to work is because you have "ignore_malformed": true in your mapping. Because if you remove that parameter and try to run your update by query again, you'd see the following error:
"type" : "mapper_parsing_exception",
"reason" : "failed to parse field [firstOperationAtUtc] of type [date] in document with id '2'. Preview of field's value: '2019-05-03T13:35:52.000Z'",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "failed to parse date field [2019-05-03T13:35:52.000Z] with format [epoch_millis]",
"caused_by" : {
"type" : "date_time_parse_exception",
"reason" : "date_time_parse_exception: Failed to parse with all enclosed parsers"
}
}
So, to wrap it up, you have two options:
Create a new index with the right mapping and reindex your old index into it, but that doesn't seem like an option for you.
Create a new field in your existing index mapping (e.g. firstOperationAtUtcTime) and discard the use of firstOperationAtUtc
The steps would be:
Modify the index template to add the new field
Modify the actual index mapping to add the new field
Run your update by query by modifying the script to write your new field
In short:
# 1. Modify your index template
# 2. modify your actual index mapping
PUT v2-documents-2021-11-20/_mapping
{
"properties": {
"firstOperationAtUtcTime": {
"ignore_malformed": true,
"type": "date"
}
}
}
# 3. Run update by query again
POST v2-documents-2021-11-20/_update_by_query
{
"script": {
"source": "if (ctx._source.firstOperationAtUtcDate != null) { ctx._source.firstOperationAtUtcTime = ctx._source.firstOperationAtUtcDate; ctx._source.remove('firstOperationAtUtc')}",
"lang": "painless"
},
"query": {
"match": {
"_id": "9b46d6fe78735274342d1bc539b084510000000455"
}
}
}
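Regarding the downtime concern: update by query can run asynchronously and sliced in parallel, so the rewrite does not have to block production traffic. A sketch under default task settings (slices, wait_for_completion and conflicts are standard parameters; tune them for your cluster):
POST v2-documents-2021-11-20/_update_by_query?slices=auto&wait_for_completion=false&conflicts=proceed
{
  "script": {
    "source": "if (ctx._source.firstOperationAtUtcDate != null) { ctx._source.firstOperationAtUtcTime = ctx._source.firstOperationAtUtcDate; ctx._source.remove('firstOperationAtUtc') }",
    "lang": "painless"
  }
}
# the call returns a task id immediately; follow progress with
GET _tasks/<task id>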

How to perform elastic search _update_by_query using painless script - for complex condition

Can you suggest how to update documents based on condition fields (with a script, I guess Painless)?
Its purpose is to add or remove values in the document.
So if I have these input documents:
doc //1st
{
"Tags":["foo"],
"flag":"true"
}
doc //2nd
{
"flag":"true"
}
doc //3rd
{
"Tags": ["goo"],
"flag":"false"
}
And I want to perform something like this:
Update all documents that have "flag=true" with:
Added tags: "me", "one"
Deleted tags: "goo","foo"
So the expected result should be something like:
doc //1st
{
"Tags":["me","one"],
"flag":"true"
}
doc //2nd
{
"Tags":["me","one"],
"flag":"true"
}
doc //3rd
{
"Tags": ["goo"],
"flag":"false"
}
Create mapping:
PUT documents
{
"mappings": {
"document": {
"properties": {
"tags": {
"type": "keyword",
"index": "not_analyzed"
},
"flag": {
"type": "boolean"
}
}
}
}
}
Insert first doc:
PUT documents/document/1
{
"tags":["foo"],
"flag": true
}
Insert the second doc (keep in mind that I specified an empty tags array here, because if a document doesn't have the field at all you need to check in the script whether it exists; a guard for that case is sketched below):
PUT documents/document/2
{
"tags": [],
"flag": true
}
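If some of your real documents lack the tags field entirely, a hedged guard at the top of the update script (same field name as above) creates it before the add/remove loops run:
if (ctx._source.tags == null) {
    ctx._source.tags = [];
}
// ...then continue with the add/remove loops from the script below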
Add third doc:
PUT documents/document/3
{
"tags": ["goo"],
"flag": false
}
And then run _update_by_query, which takes two arrays as params: one for elements to add and one for elements to remove (note that from Elasticsearch 5.6 on, the inline script key is deprecated in favor of source):
POST documents/_update_by_query
{
"script": {
"inline": "for(int i = 0; i < params.add_tags.size(); i++) { if(!ctx._source.tags.contains(params.add_tags[i].value)) { ctx._source.tags.add(params.add_tags[i].value)}} for(int i = 0; i < params.remove_tags.size(); i++) { if(ctx._source.tags.contains(params.remove_tags[i].value)){ctx._source.tags.removeAll(Collections.singleton(params.remove_tags[i].value))}}",
"params": {
"add_tags": [
{"value": "me"},
{"value": "one"}
],
"remove_tags": [
{"value": "goo"},
{"value": "foo"}
]
}
},
"query": {
"bool": {
"must": [
{"term": {"flag": true}}
]
}
}
}
If you then run the following search:
GET documents/_search
you will get the following result (which I think is what you want):
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [{
"_index": "documents",
"_type": "document",
"_id": "2",
"_score": 1,
"_source": {
"flag": true,
"tags": [
"me",
"one"
]
}
},
{
"_index": "documents",
"_type": "document",
"_id": "1",
"_score": 1,
"_source": {
"flag": true,
"tags": [
"me",
"one"
]
}
},
{
"_index": "documents",
"_type": "document",
"_id": "3",
"_score": 1,
"_source": {
"tags": [
"goo"
],
"flag": false
}
}
]
}
}

elasticsearch 5.x : how make a nest match query search

In the previous version of NEST, I knew how to do the equivalent of a basic ES match query with NEST.
I created an example index and mapping:
PUT /base_well
{
"mappings": {
"person": {
"properties": {
"first_name":{
"type": "string"
},
"last_name":{
"type": "string"
},
"age":{
"type": "integer"
}
}
}
}
}
POST /base_well/person
{
"first_name":"Adrien",
"last_name" : "Mopo",
"Age" : 21
}
POST /base_well/person
{
"first_name":"Polo",
"last_name" : "Apou",
"Age" : 36
}
The ES request actually works:
POST /base_well/person/_search
{
"query":
{
"match":{
"first_name":"Adrien"
}
}
}
This Elasticsearch request gives me this answer:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "base_well",
"_type": "person",
"_id": "AVkq9PI5ybdSs0epy_Rb",
"_score": 0.2876821,
"_source": {
"first_name": "Adrien",
"last_name": "Mopo",
"Age": 21
}
}
]
}
}
NEST equivalent that does not work anymore:
public class Person
{
public string first_name {get;set;}
public string last_name { get; set; }
public int Age { get; set; }
}
//nest equivalent does not work anymore
var uri = new Uri("http://localhost:9200");
var setting = new ConnectionSettings(uri);
setting.DisableDirectStreaming(true);
setting.DefaultIndex("base_well");
var Client = new ElasticClient(setting);
var response = Client.Search<Person>(s => s.Query(p => p.Term(q => q.first_name, "Adrien")));
var tooks = response.Took;
var hits = response.Hits;
var total = response.Total;
It gives me 0 document results, 0 hits.
Do you know how to do that in the latest version?
Use a match query rather than a term query. first_name is an analyzed string field, so "Adrien" is indexed as the lowercased token "adrien"; a term query does not analyze its input and looks for the exact term "Adrien", finding nothing, whereas a match query runs the input through the same analyzer before searching:
var response = Client.Search<Person>(s => s.Query(p => p.Match(m => m.Field(f => f.first_name).Query("Adrien"))));

Grouping consecutive documents with Elasticsearch

Is there a way to make Elasticsearch consider sequence-gaps when grouping?
Provided that the following data was bulk-imported to Elasticsearch:
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "1" } }
{ "sequence": 1, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "2" } }
{ "sequence": 2, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "3" } }
{ "sequence": 3, "type": "B" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "4" } }
{ "sequence": 4, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "5" } }
{ "sequence": 5, "type": "A" }
Is there a way to query this data in a way that
the documents with sequence number 1 and 2 go to one output group,
the document with sequence number 3 goes to another one, and
the documents with sequence number 4 and 5 go to a third group?
... considering the fact that the type A sequence is interrupted by a type B item (or any other item that's not type A)?
I would like the result buckets to look something like this (name and value for sequence_group may be different; I'm just trying to illustrate the logic):
"buckets": [
{
"key": "a",
"sequence_group": 1,
"doc_count": 2
},
{
"key": "b",
"sequence_group": 3,
"doc_count": 1
},
{
"key": "a",
"sequence_group": 4,
"doc_count": 2
}
]
There is a good description of the problem and some SQL solution approaches at https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/. I would like to know whether a solution is available for Elasticsearch as well.
We can use a scripted metric aggregation here, which works in a map-reduce fashion. It has distinct phases: init, map, combine and reduce. And the good thing is that the result of each of these can be a list or map, too.
I played around with this a bit.
ElasticSearch version used: 7.1
Creating index:
PUT test
{
"mappings": {
"properties": {
"sequence": {
"type": "long"
},
"type": {
"type": "text",
"fielddata": true
}
}
}
}
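A hedged side note on this mapping: the map script reads doc['type'].value, which requires doc values, and that is why fielddata is enabled on the text field. Mapping type as keyword instead gives the script the same access without fielddata's heap cost, e.g.:
PUT test
{
  "mappings": {
    "properties": {
      "sequence": { "type": "long" },
      "type": { "type": "keyword" }
    }
  }
}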
Bulk indexing: (Note that I removed mapping type 'groupingTest')
POST _bulk
{ "index": { "_index": "test", "_id": "1" } }
{ "sequence": 1, "type": "A" }
{ "index": { "_index": "test", "_id": "2" } }
{ "sequence": 2, "type": "A" }
{ "index": { "_index": "test", "_id": "3" } }
{ "sequence": 3, "type": "B" }
{ "index": { "_index": "test", "_id": "4" } }
{ "sequence": 4, "type": "A" }
{ "index": { "_index": "test", "_id": "5" } }
{ "sequence": 5, "type": "A" }
Query
GET test/_doc/_search
{
"size": 0,
"aggs": {
"scripted_agg": {
"scripted_metric": {
"init_script": """
state.seqTypeArr = [];
""",
"map_script": """
def seqType = doc.sequence.value + '_' + doc['type'].value;
state.seqTypeArr.add(seqType);
""",
"combine_script": """
def list = [];
for(seqType in state.seqTypeArr) {
list.add(seqType);
}
return list;
""",
"reduce_script": """
def fullList = [];
for(agg_value in states) {
for(x in agg_value) {
fullList.add(x);
}
}
// sort numerically by sequence so that e.g. "10_A" does not sort before "2_A"
fullList.sort((a,b) -> Integer.parseInt(a.substring(0, a.indexOf("_"))) - Integer.parseInt(b.substring(0, b.indexOf("_"))));
def result = [];
def item = new HashMap();
for(int i=0; i<fullList.size(); i++) {
def str = fullList.get(i);
def index = str.indexOf("_");
def ch = str.substring(index+1);
def val = str.substring(0, index);
if(item["key"] == null) {
item["key"] = ch;
item["sequence_group"] = val;
item["doc_count"] = 1;
} else if(item["key"] == ch) {
item["doc_count"] = item["doc_count"] + 1;
} else {
result.add(item);
item = new HashMap();
item["key"] = ch;
item["sequence_group"] = val;
item["doc_count"] = 1;
}
}
result.add(item);
return result;
"""
}
}
}
}
And, finally the output:
{
"took" : 21,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"scripted_agg" : {
"value" : [
{
"doc_count" : 2,
"sequence_group" : "1",
"key" : "a"
},
{
"doc_count" : 1,
"sequence_group" : "3",
"key" : "b"
},
{
"doc_count" : 2,
"sequence_group" : "4",
"key" : "a"
}
]
}
}
}
Please note that scripted aggregations have a big impact on query performance, so you might notice some slowness if there is a large number of documents.
You can always do a terms aggregation and then apply a top_hits aggregation to get this:
{
"aggs": {
"types": {
"terms": {
"field": "type"
},
"aggs": {
"groups": {
"top_hits": {
"size": 10
}
}
}
}
}
}

Get specific fields from index in elasticsearch

I have an index in Elasticsearch.
Sample structure:
{
"Article": "Article7645674712",
"Genre": "Genre92231455",
"relationDesc": [
"Article",
"Genre"
],
"org": "user",
"dateCreated": {
"date": "08/05/2015",
"time": "16:22 IST"
},
"dateModified": "08/05/2015"
}
From this index I want to retrieve only selected fields: org and dateModified.
I want the result like this:
{
"took": 265,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 28,
"max_score": 1,
"hits": [
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "3",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "08/05/2015"
}
}
},
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "4",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "10/05/2015"
}
}
}
]
}
}
How do I query Elasticsearch to get only specific selected fields?
You can retrieve only a specific set of fields in the result hits using the _source parameter like this:
curl -XGET localhost:9200/couchrecords/couchbaseDocument/_search?_source=org,dateModified
Or in this format:
curl -XPOST localhost:9200/couchrecords/couchbaseDocument/_search -d '{
"_source": ["doc.org", "doc.dateModified"], <---- you just need to add this
"query": {
"match_all":{} <----- or whatever query you have
}
}'
That's easy. Considering any query of this format:
{
"query": {
...
},
}
You'll just need to add the fields parameter to your query, which in your case results in the following. (Caveat: on Elasticsearch 5.x and later this parameter was renamed stored_fields and returns only fields that are explicitly stored; source filtering, shown next, is the more general approach.)
{
"query": {
...
},
"fields" : ["org","dateModified"]
}
Or with source filtering:
{
"_source" : ["org","dateModified"],
"query": {
...
}
}
Check Elasticsearch source filtering.
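For completeness, a hedged sketch of that source filtering with include/exclude patterns, using the field paths from this question (the excludes pattern is purely illustrative):
GET couchrecords/_search
{
  "_source": {
    "includes": [ "doc.org", "doc.dateModified" ],
    "excludes": [ "doc.internal*" ]
  },
  "query": {
    "match_all": {}
  }
}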
