Elasticsearch 5.x: how to make a NEST match query search

In the previous version of NEST, I knew how to do the equivalent of a basic ES match query.
I created an example index and mapping:
PUT /base_well
{
  "mappings": {
    "person": {
      "properties": {
        "first_name": {
          "type": "string"
        },
        "last_name": {
          "type": "string"
        },
        "age": {
          "type": "integer"
        }
      }
    }
  }
}
POST /base_well/person
{
  "first_name": "Adrien",
  "last_name": "Mopo",
  "Age": 21
}
POST /base_well/person
{
  "first_name": "Polo",
  "last_name": "Apou",
  "Age": 36
}
The ES request that actually works:
POST /base_well/person/_search
{
  "query": {
    "match": {
      "first_name": "Adrien"
    }
  }
}
This Elasticsearch request gives me this answer:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "base_well",
        "_type": "person",
        "_id": "AVkq9PI5ybdSs0epy_Rb",
        "_score": 0.2876821,
        "_source": {
          "first_name": "Adrien",
          "last_name": "Mopo",
          "Age": 21
        }
      }
    ]
  }
}
NEST equivalent that does not work anymore:
public class Person
{
    public string first_name { get; set; }
    public string last_name { get; set; }
    public int Age { get; set; }
}

var uri = new Uri("http://localhost:9200");
var setting = new ConnectionSettings(uri);
setting.DisableDirectStreaming(true);
setting.DefaultIndex("base_well");
var Client = new ElasticClient(setting);
var response = Client.Search<Person>(s => s.Query(p => p.Term(q => q.first_name, "Adrien")));
var tooks = response.Took;
var hits = response.Hits;
var total = response.Total;
It gives me 0 document results, 0 hits.
Do you know how to do this in the latest version?

Use a Match query instead of a Term query; a term query looks up the exact, un-analyzed term, which is why it no longer matches the analyzed (lowercased) value:

var response = Client.Search<Person>(s => s.Query(p => p.Match(m => m.Field(f => f.first_name).Query("Marc"))));

Related

How to get distance from elasticsearch.net / NEST for a geo_point field

I would like my search request to return the distance for a geo_point.
I have already written this request, which gives me the points closest to my search parameters:
ConnectionSettings elasticSettings = new ConnectionSettings(new Uri("http://localhost:9200"));
ElasticClient client = new ElasticClient(elasticSettings);
var searchResults = client.Search<dynamic>(s => s
    .Index("index1,index2,index3")
    .From(0)
    .Size(10)
    .Query(q => q
        .Bool(b => b
            .Must(f => f
                .GeoDistance(g => g
                    .Distance(20, DistanceUnit.Kilometers)
                    .DistanceType(GeoDistanceType.Arc)
                    .Field("geo")
                    .Location(lat, lon))))));
I tried a lot of code found on the web but I cannot adapt it to my case.
I just want Elasticsearch to return the distance for each point.
My field in Elasticsearch looks like this (a simple string):
geo 74.875,-179.875
and in another test index it is structured like this (the search doesn't work like this):
geo {
  "lat": 74.875,
  "lon": -178.625
}
Can the first or the second mapping have an impact on the query?
Here is my mapping for the index:
{
  "index1": {
    "aliases": {},
    "mappings": {
      "properties": {
        "Date": { "type": "date" },
        "Value": { "type": "text" },
        "geo": { "type": "geo_point" }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "1s",
        "number_of_shards": "4",
        "provided_name": "index1",
        "creation_date": "1569420798736",
        "number_of_replicas": "0",
        "uuid": "jqc1RRhxSC2e5yJJX2lyzw",
        "version": { "created": "7030199" }
      }
    }
  }
}
I integrated a script field in my query like this:
var searchResults = client.Search<dynamic>(s => s
    .Index("index")
    .From(0)
    .Size(100)
    .ScriptFields(sf => sf
        .ScriptField("distance", d => d
            .Source("if (doc['geo'].size() > 0) { doc['geo'].arcDistance(" + lat + "," + lon + ") }")))
    .Query(q => q
        .Bool(b => b
            .Must(f => f
                .GeoDistance(g => g
                    .Distance(20, DistanceUnit.Kilometers)
                    .DistanceType(GeoDistanceType.Arc)
                    .Field("geo")
                    .Location(lat, lon))))));
With this request I get a "200 successful" response, and it seems that it returns the distance but not the other fields: the 100 documents are null.
Valid NEST response built from a successful (200) low level call on
POST: /index1/_search?typed_keys=true
# Audit trail of this API call:
- [1] HealthyResponse: Node: http://localhost:9200/ Took: 00:00:01.0670113
# Request:
{"from":0,"query":{"bool":{"must":[{"geo_distance":{"distance":"200km","distance_type":"arc","geo":{"lat":57.123,"lon":-20.876}}}]}},"script_fields":{"distance":{"script":{"source":"doc['geo'].arcDistance(57.123,-20.876)"}}},"size":100}
# Response:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 4,
    "successful": 4,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1203,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "index1",
        "_type": "_doc",
        "_id": "121197",
        "_score": 1.0,
        "fields": { "distance": [ 198251.11868760435 ] }
      },
      {
        "_index": "index1",
        "_type": "_doc",
        "_id": "121198",
        "_score": 1.0,
        "fields": { "distance": [ 197018.831847128 ] }
      },
      ...98 more
    ]
  }
}
Thank you.
You need to use a script field to return the distance:
"script_fields": {
  "distance": {
    "script": "doc['latlng'].arcDistance(params.lat,params.lng)",
    "params": {
      "lat": <some value>,
      "lng": <some value>
    }
  }
}
NEST:
var scriptFields = new ScriptFields
{
    {
        "distance", new ScriptField
        {
            Script = new InlineScript(
                "if (doc['" + field + "'].size() > 0) { doc['" + field + "'].arcDistance(params.lat, params.lon) }")
            {
                Params = new FluentDictionary<string, object>
                {
                    { "lat", latitude },
                    { "lon", longitude }
                }
            }
        }
    }
};
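For intuition, `arcDistance` returns the great-circle distance in meters. A minimal Python sketch using the haversine formula (this is an illustration, not the exact Painless implementation; the coordinates are the two example geo points from the question):

```python
# Hypothetical stand-in for Painless arcDistance(): great-circle
# distance in meters via the haversine formula.
import math

def arc_distance(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    # convert to radians and apply haversine
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

# the two example geo points from the question, roughly 36 km apart
d = arc_distance(74.875, -179.875, 74.875, -178.625)
print(round(d))
```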

Elasticsearch range query

I have created the index by using the following mapping:
PUT test1
{
  "mappings": {
    "type1": {
      "properties": {
        "age": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword",
              "ignore_above": 32766
            }
          }
        }
      }
    }
  }
}
Added the following documents to the index:
PUT test1/type1/1/_create
{
  "age": 50
}
PUT test1/type1/2/_create
{
  "age": 100
}
PUT test1/type1/3/_create
{
  "age": 150
}
PUT test1/type1/4/_create
{
  "age": 200
}
I used the following range query to fetch results:
GET test1/_search
{
  "query": {
    "range": {
      "age": {
        "lte": 150
      }
    }
  }
}
It gives me the following response:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test1",
        "_type": "type1",
        "_id": "2",
        "_score": 1,
        "_source": {
          "age": 100
        }
      },
      {
        "_index": "test1",
        "_type": "type1",
        "_id": "3",
        "_score": 1,
        "_source": {
          "age": 150
        }
      }
    ]
  }
}
The above response does not include the document with age 50; it shows only ages 100 and 150, even though 50 is also less than 150. What is wrong here?
Can anyone help me get a valid result?
In my schema the age field type is text, and I don't want to change it.
How can I get a valid result?
Because the age field type is text, the range query uses lexicographic (alphabetical) order, so the results are correct:
"100" < "150"
"150" = "150"
"50" > "150"
If you are ingesting only numbers into the age field, you should change the field type to a numeric one, or add another inner field as a number, just as you did with the raw inner field.
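The lexicographic comparison behind this can be reproduced in a few lines of Python (illustration only, not ES code):

```python
# A range filter over a text field compares strings character by
# character, not numerically -- exactly like Python string comparison.
values = ["50", "100", "150", "200"]

in_range = [v for v in values if v <= "150"]  # string comparison
print(in_range)                  # ['100', '150'] -- "50" is excluded
print(sorted(values))            # ['100', '150', '200', '50']
print(sorted(values, key=int))   # ['50', '100', '150', '200']
```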
UPDATE: Tested on a local system, and it works.
NOTE: Ideally you would want the mapping to be correct, but if there is no other choice and you are not the person who decides on the mapping, you can still achieve it with the following.
For ES version 6.3 onwards, try this:
GET test1/type1/_search
{
  "query": {
    "bool": {
      "must": {
        "script": {
          "script": {
            "source": "Integer.parseInt(doc['age.raw'].value) <= 150",
            "lang": "painless"
          }
        }
      }
    }
  }
}
Sources to refer to:
https://www.elastic.co/guide/en/elasticsearch/reference/6.3/query-dsl-script-query.html
https://discuss.elastic.co/t/painscript-script-cast-string-as-int/97034
The type for your age field in the mapping is set to text. That is why it does dictionary ordering, where "50" > "150". Please use the long data type instead. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

Elasticsearch wildcard case-sensitive

How to make wildcard case-insensitive?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html
Since version 7.10, the wildcard query supports the case_insensitive (boolean) parameter.
Example of a case-insensitive search:
GET /_search
{
  "query": {
    "wildcard": {
      "my_field": {
        "value": "ki*y",
        "case_insensitive": true
      }
    }
  }
}
Wildcard patterns are not analyzed. The behavior depends on which analyzers you've configured for the field you're searching, but if you're using the default analyzers, a lowercase wildcard query will return case-insensitive results.
Example: post two names into a sample index, one "Sid" and the other "sid".
POST sample/sample
{
  "name": "sid"
}
POST sample/sample
{
  "name": "Sid"
}
Then perform a wildcard query:
GET sample/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "s*"
      }
    }
  }
}
This returns both documents:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "sample",
        "_type": "sample",
        "_id": "AWRPM87Wb6oopELrnEKE",
        "_score": 1,
        "_source": {
          "name": "Sid"
        }
      },
      {
        "_index": "sample",
        "_type": "sample",
        "_id": "AWRPM9tpb6oopELrnEKF",
        "_score": 1,
        "_source": {
          "name": "sid"
        }
      }
    ]
  }
}
But if you perform a wildcard query with "S*", it returns nothing, because the default lowercase token filter stores terms in lowercase: the term "Sid" is stored as "sid" in the inverted index.
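A small Python sketch of why the pattern's case matters (the analyzer lowercases indexed terms, but the wildcard pattern itself is left untouched):

```python
# Crude stand-in for the standard analyzer: tokenize and lowercase.
def analyze(text):
    return [token.lower() for token in text.split()]

# terms as they end up in the inverted index for "Sid" and "sid"
index_terms = analyze("Sid") + analyze("sid")
print(index_terms)                                    # ['sid', 'sid']
print([t for t in index_terms if t.startswith("s")])  # both match
print([t for t in index_terms if t.startswith("S")])  # nothing matches
```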
In my case this is not true; it is case-sensitive by default. I am using ES 7.2.
In your sample the type of the field is "text", not "keyword".
I was looking for the same option for the Node.js client and came across this question, so I'm posting an answer in case it helps someone else.
I had to convert the term to lowercase, and it worked for me: *${term.toLowerCase()}*
Here is the complete function:
async searchUsers(term, from, limit) {
  // `blacklist` is assumed to be defined in the surrounding scope
  const users = await EsClient.search({
    index: 'users',
    type: 'users',
    body: {
      from,
      size: limit,
      query: {
        bool: {
          should: [
            {
              wildcard: {
                email: {
                  value: `*${term.toLowerCase()}*`
                }
              }
            },
            {
              wildcard: {
                'name.keyword': {
                  value: `*${term.toLowerCase()}*`
                }
              }
            }
          ],
          must_not: {
            terms: { _id: blacklist }
          }
        }
      }
    }
  });
  return users;
}

Grouping consecutive documents with Elasticsearch

Is there a way to make Elasticsearch consider sequence-gaps when grouping?
Provided that the following data was bulk-imported to Elasticsearch:
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "1" } }
{ "sequence": 1, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "2" } }
{ "sequence": 2, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "3" } }
{ "sequence": 3, "type": "B" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "4" } }
{ "sequence": 4, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "5" } }
{ "sequence": 5, "type": "A" }
Is there a way to query this data in a way that
the documents with sequence number 1 and 2 go to one output group,
the document with sequence number 3 goes to another one, and
the documents with sequence number 4 and 5 go to a third group?
... considering the fact that the type A sequence is interrupted by a type B item (or any other item that's not type A)?
I would like the result buckets to look something like this (the name and value for sequence_group may differ; I'm just trying to illustrate the logic):
"buckets": [
  {
    "key": "a",
    "sequence_group": 1,
    "doc_count": 2
  },
  {
    "key": "b",
    "sequence_group": 3,
    "doc_count": 1
  },
  {
    "key": "a",
    "sequence_group": 4,
    "doc_count": 2
  }
]
There is a good description of the problem and some SQL solution-approaches at https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/. I would like to know if there is a solution for elasticsearch available as well.
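For reference, this gaps-and-islands grouping is straightforward in plain Python with itertools.groupby, which starts a new group at every change of type:

```python
# Gaps-and-islands grouping: consecutive docs with the same type form
# one group; groupby splits exactly at the points where "type" changes.
from itertools import groupby

docs = [
    {"sequence": 1, "type": "A"},
    {"sequence": 2, "type": "A"},
    {"sequence": 3, "type": "B"},
    {"sequence": 4, "type": "A"},
    {"sequence": 5, "type": "A"},
]

buckets = []
for typ, group in groupby(sorted(docs, key=lambda d: d["sequence"]),
                          key=lambda d: d["type"]):
    group = list(group)
    buckets.append({
        "key": typ.lower(),
        "sequence_group": group[0]["sequence"],
        "doc_count": len(group),
    })

print(buckets)
# [{'key': 'a', 'sequence_group': 1, 'doc_count': 2},
#  {'key': 'b', 'sequence_group': 3, 'doc_count': 1},
#  {'key': 'a', 'sequence_group': 4, 'doc_count': 2}]
```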
We can use a scripted metric aggregation here, which works in a map-reduce fashion. It has several parts: init, map, combine, and reduce. And the good thing is that the result of each of these can be a list or a map.
I played around with this a bit.
Elasticsearch version used: 7.1
Creating index:
PUT test
{
  "mappings": {
    "properties": {
      "sequence": {
        "type": "long"
      },
      "type": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
Bulk indexing (note that I removed the mapping type 'groupingTest'):
POST _bulk
{ "index": { "_index": "test", "_id": "1" } }
{ "sequence": 1, "type": "A" }
{ "index": { "_index": "test", "_id": "2" } }
{ "sequence": 2, "type": "A" }
{ "index": { "_index": "test", "_id": "3" } }
{ "sequence": 3, "type": "B" }
{ "index": { "_index": "test", "_id": "4" } }
{ "sequence": 4, "type": "A" }
{ "index": { "_index": "test", "_id": "5" } }
{ "sequence": 5, "type": "A" }
Query
GET test/_doc/_search
{
  "size": 0,
  "aggs": {
    "scripted_agg": {
      "scripted_metric": {
        "init_script": """
          state.seqTypeArr = [];
        """,
        "map_script": """
          def seqType = doc.sequence.value + '_' + doc['type'].value;
          state.seqTypeArr.add(seqType);
        """,
        "combine_script": """
          def list = [];
          for (seqType in state.seqTypeArr) {
            list.add(seqType);
          }
          return list;
        """,
        "reduce_script": """
          def fullList = [];
          for (agg_value in states) {
            for (x in agg_value) {
              fullList.add(x);
            }
          }
          fullList.sort((a, b) -> a.compareTo(b));
          def result = [];
          def item = new HashMap();
          for (int i = 0; i < fullList.size(); i++) {
            def str = fullList.get(i);
            def index = str.indexOf("_");
            def ch = str.substring(index + 1);
            def val = str.substring(0, index);
            if (item["key"] == null) {
              item["key"] = ch;
              item["sequence_group"] = val;
              item["doc_count"] = 1;
            } else if (item["key"] == ch) {
              item["doc_count"] = item["doc_count"] + 1;
            } else {
              result.add(item);
              item = new HashMap();
              item["key"] = ch;
              item["sequence_group"] = val;
              item["doc_count"] = 1;
            }
          }
          result.add(item);
          return result;
        """
      }
    }
  }
}
And, finally the output:
{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "scripted_agg" : {
      "value" : [
        {
          "doc_count" : 2,
          "sequence_group" : "1",
          "key" : "a"
        },
        {
          "doc_count" : 1,
          "sequence_group" : "3",
          "key" : "b"
        },
        {
          "doc_count" : 2,
          "sequence_group" : "4",
          "key" : "a"
        }
      ]
    }
  }
}
Please note that scripted aggregations have a significant impact on query performance, so you may notice slowness when there is a large number of documents.
You can always do a terms aggregation and then apply a top_hits aggregation to get this:
{
  "aggs": {
    "types": {
      "terms": {
        "field": "type"
      },
      "aggs": {
        "groups": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
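For contrast with the scripted approach, note that a plain terms aggregation buckets by the field value alone, so the two runs of type A would land in a single bucket; sketched in Python:

```python
# A terms aggregation counts by value only, ignoring sequence gaps,
# so the two separate "islands" of type A are merged into one bucket.
from collections import Counter

types = ["A", "A", "B", "A", "A"]  # doc types in sequence order
counts = Counter(types)
print(counts["A"], counts["B"])  # 4 1 -- the two A islands are merged
```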

Updating a property of an object in a list in ElasticSearch document?

I'm fairly new to Elasticsearch. I'm using it in a .NET project with the NEST client. Right now I'm examining ways of handling document updates.
I have a document that looks like this:
public class Class1
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
    public List<Class2> PropList { get; set; }
}
and when I want to add something to PropList, I do it with a script:
client.Update<Class1>(x => x
    .Id(1)
    .Index("index_name")
    .Script("ctx._source.propList += prop")
    .Params(p => p.Add("prop", newProp)));
That works perfectly right now. The problem is when I want to update a property of an object inside propList. The way I'm doing it now is to retrieve the entire document, find the object in the list, update the property, and index the entire document again, which at some point can cause performance issues.
Is there a way of doing this more efficiently? Maybe using scripts or some other way?
Thanks.
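The partial update you want amounts to the loop below (sketched in Python with illustrative field names); running the equivalent logic inside an update script would keep the document server-side instead of round-tripping it to the client:

```python
# Mutate one object inside a nested list by matching a key --
# the kind of logic an update script would run in place.
doc = {
    "prop1": "a",
    "prop2": "b",
    "propList": [
        {"id": 1, "value": "old"},
        {"id": 2, "value": "old"},
    ],
}

def update_item(source, item_id, new_value):
    # find the list element by id and change only its value
    for item in source["propList"]:
        if item["id"] == item_id:
            item["value"] = new_value
            break

update_item(doc, 2, "new")
print(doc["propList"])  # only the matching item changed
```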
I don't know how to set it up with nest, off-hand, but I would go with a parent/child relationship.
As a toy example, I set up an index with this mapping:
PUT /test_index
{
  "mappings": {
    "parent_type": {
      "properties": {
        "num_prop": {
          "type": "integer"
        },
        "str_prop": {
          "type": "string"
        }
      }
    },
    "child_type": {
      "_parent": {
        "type": "parent_type"
      },
      "properties": {
        "child_num": {
          "type": "integer"
        },
        "child_str": {
          "type": "string"
        }
      }
    }
  }
}
then added some data:
POST /test_index/_bulk
{"index":{"_type":"parent_type","_id":1}}
{"num_prop":1,"str_prop":"hello"}
{"index":{"_type":"child_type","_id":1,"_parent":1}}
{"child_num":11,"child_str":"foo"}
{"index":{"_type":"child_type","_id":2,"_parent":1}}
{"child_num":12,"child_str":"bar"}
{"index":{"_type":"parent_type","_id":2}}
{"num_prop":2,"str_prop":"goodbye"}
{"index":{"_type":"child_type","_id":3,"_parent":2}}
{"child_num":21,"child_str":"baz"}
Now if I want to update a child document I can just post a new version:
POST /test_index/child_type/2?parent=1
{
  "child_num": 13,
  "child_str": "bars"
}
(note that I have to provide the parent id so ES can route the request appropriately)
I can also do a partial, scripted update if I want to:
POST /test_index/child_type/3/_update?parent=2
{
  "script": "ctx._source.child_num+=1"
}
We can prove that this worked by searching the child types:
POST /test_index/child_type/_search
...
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "child_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "child_num": 11,
          "child_str": "foo"
        }
      },
      {
        "_index": "test_index",
        "_type": "child_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "child_num": 13,
          "child_str": "bars"
        }
      },
      {
        "_index": "test_index",
        "_type": "child_type",
        "_id": "3",
        "_score": 1,
        "_source": {
          "child_num": 22,
          "child_str": "baz"
        }
      }
    ]
  }
}
Hope this helps. Here is the code I used, plus a few more examples:
http://sense.qbox.io/gist/73f6d2f347a08bfe0c254a977a4a05a68d2f3a8d
