NEST 2.0 with Elasticsearch for GeoDistance always returns all records - elasticsearch

I have the code below, using C# on .NET 4.5 and NEST 2.0 via NuGet. With this distance search, the query always returns every document of my type 'trackpointes': I have 2,790 documents, and the returned count is exactly that. Even with a distance of 1 centimeter it returns all 2,790 documents. My 'trackpointes' type has a location field of type geo_point, with geohash set to true and a geohash_precision of 9.
I am just trying to filter results by distance, without any other search terms, and it returns all 2,790 records regardless of the distance or unit of measurement. So I must be missing something (hopefully small). Any help is appreciated. The NEST examples I can find are a year or two old, and that syntax does not seem to work any more.
double distance = 4.0;
var geoResult = client.Search<TrackPointES>(s => s.From(0).Size(10000).Type("trackpointes")
    .Query(query => query
        .Bool(b => b.Filter(filter => filter
            .GeoDistance(geo => geo
                .Distance(distance, Nest.DistanceUnit.Kilometers).Location(35, -82)))
        )
    )
);
If I use Postman to connect to my ES instance and POST a search with the JSON below, I get 143 documents out of 2,790. So I know the data is right, as that is a realistic result.
{
"query" : {
"filtered" : {
"filter" : {
"geo_distance" : {
"distance" : "4km",
"location" : {
"top_left": {
"lat" : 35,
"lon" : -82
}
}
}
}
}
}
}

Looks like you didn't specify the field in your query. Try this one:
var geoResult = client.Search<Document>(s => s.From(0).Size(10000)
    .Query(query => query
        .Bool(b => b.Filter(filter => filter
            .GeoDistance(geo => geo
                .Field(f => f.Location) //<- this
                .Distance(distance, Nest.DistanceUnit.Kilometers).Location(35, -82)))
        )
    )
);

I forgot to specify the field to search for the location. :( But I am posting here just in case someone else has the same issue and to shame myself into trying harder...
.Field(p => p.location) was the difference in the query.
var geoResult = client.Search<TrackPointES>(s => s.From(0).Size(10000).Type("trackpointes")
    .Query(query => query
        .Bool(b => b.Filter(filter => filter
            .GeoDistance(geo => geo.Field(p => p.location).DistanceType(Nest.GeoDistanceType.SloppyArc)
                .Distance(distance, Nest.DistanceUnit.Kilometers).Location(35, -82)))
        )
    )
);
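To make the fix concrete, here is a minimal Python sketch of the request body a geo_distance filter needs. It only builds the JSON. The helper name geo_distance_query is hypothetical, and the field name "location" is taken from the question. In the query DSL, the field name is the key that holds the point, which is the piece the original NEST call never supplied.

```python
import json

def geo_distance_query(field, lat, lon, distance_km):
    """Build a bool/filter search body with a geo_distance filter on `field`."""
    if not field:
        # Without a field name there is nothing to attach the point to.
        raise ValueError("geo_distance needs the name of a geo_point field")
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{distance_km}km",
                        # The field name is the key that holds the point.
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }

body = geo_distance_query("location", 35, -82, 4)
print(json.dumps(body, indent=2))
```

Comparing this body with the request that NEST actually sends (for example via the client's debug information) is a quick way to spot a missing field.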

Related

How to create a custom analyzer to ignore accents and pt-br stopwords using elasticsearch nest api?

First of all, consider that I am using a "News" class (Noticia, in Portuguese) that has a string field called "Content" (Conteudo, in Portuguese).
public class Noticia
{
public string Conteudo { get; set; }
}
I am trying to create an index that is configured to ignore accents and pt-br stopwords, and to allow up to 40 million characters to be analysed in a highlighted query.
I can create such an index using this code:
var createIndexResponse = client.Indices.Create(indexName, c => c
    .Settings(s => s
        .Setting("highlight.max_analyzed_offset", 40000000)
        .Analysis(analysis => analysis
            .TokenFilters(tokenfilters => tokenfilters
                .AsciiFolding("folding-accent", ft => ft
                )
                .Stop("stoping-br", st => st
                    .StopWords("_brazilian_")
                )
            )
            .Analyzers(analyzers => analyzers
                .Custom("folding-analyzer", cc => cc
                    .Tokenizer("standard")
                    .Filters("folding-accent", "stoping-br")
                )
            )
        )
    )
    .Map<Noticia>(mm => mm
        .AutoMap()
        .Properties(p => p
            .Text(t => t
                .Name(n => n.Conteudo)
                .Analyzer("folding-analyzer")
            )
        )
    )
);
If I test this analyzer using Kibana Dev Tools, I get the result that I want: No accents and stopwords removed!
POST intranet/_analyze
{
"analyzer": "folding-analyzer",
"text": "Férias de todos os funcionários"
}
Result:
{
"tokens" : [
{
"token" : "Ferias",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "funcionarios",
"start_offset" : 19,
"end_offset" : 31,
"type" : "<ALPHANUM>",
"position" : 4
}
]
}
The same (good) results are returned when I use NEST to analyze the text with my folding analyzer (the tokens "Ferias" and "funcionarios" are returned):
var analyzeResponse = client.Indices.Analyze(a => a
    .Index(indexName)
    .Analyzer("folding-analyzer")
    .Text("Férias de todos os funcionários")
);
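For readers who want to see what this analyzer chain does without a cluster, here is a rough Python approximation. It is a sketch, not Elasticsearch's implementation. The regular expression stands in for the standard tokenizer, Unicode decomposition stands in for the asciifolding filter, and STOPWORDS is an assumed three-word stand-in for the much longer "_brazilian_" list.

```python
import re
import unicodedata

# Assumed, tiny stand-in for the "_brazilian_" stop word list.
STOPWORDS = {"de", "todos", "os"}

def fold_accents(token):
    """Approximate ASCII folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def analyze(text):
    """Tokenize, fold accents, then remove stop words, in that order."""
    text = unicodedata.normalize("NFC", text)         # keep accented letters as single characters
    tokens = re.findall(r"\w+", text)                 # rough stand-in for the standard tokenizer
    folded = [fold_accents(t) for t in tokens]        # the "folding-accent" step
    return [t for t in folded if t not in STOPWORDS]  # the "stoping-br" step

# No lowercasing happens, mirroring the filter list in the index settings above.
print(analyze("Férias de todos os funcionários"))  # ['Ferias', 'funcionarios']
```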
However, if I perform a search using the NEST Elasticsearch .NET client, terms like "Férias" (with accent) and "Ferias" (without accent) are being treated as different.
My goal is to perform a query that returns all results, no matter whether the word is Férias or Ferias.
This is the simplified code (C#, NEST) I am using to query Elasticsearch:
var searchResponse = ElasticClient.Search<Noticia>(s => s
    .Index(indexName)
    .Query(q => q
        .MultiMatch(m => m
            .Fields(f => f
                .Field(p => p.Titulo, 4)
                .Field(p => p.Conteudo, 2)
            )
            .Query(termo)
        )
    )
);
And this is the detailed log of the API call associated with searchResponse:
Successful (200) low level call on POST: /intranet/_search?pretty=true&error_trace=true&typed_keys=true
# Audit trail of this API call:
- [1] HealthyResponse: Node: ###NODE ADDRESS### Took: 00:00:00.3880295
# Request:
{"query":{"multi_match":{"fields":["categoria^1","titulo^4","ementa^3","conteudo^2","attachments.attachment.content^1"],"query":"Ferias"}},"size":100}
# Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 13.788051,
"hits" : [
{
"_index" : "intranet",
"_type" : "_doc",
"_id" : "4934",
"_score" : 13.788051,
"_source" : {
"conteudo" : "blablabla ferias blablabla",
"attachments" : [ ],
"categoria" : "Novidades da Biblioteca - DBD",
"publicadaEm" : "2008-10-14T00:00:00",
"titulo" : "INFORMATIVO DE DIREITO ADMINISTRATIVO E LRF - JUL/2008",
"ementa" : "blablabla",
"matriculaAutor" : 900794,
"atualizadaEm" : "2009-02-03T13:44:00",
"id" : 4934,
"indexacaoAtiva" : true,
"status" : "Disponível"
}
}
]
}
}
I have also tried to use multi-fields and Suffix in a query, without success:
.Map<Noticia>(mm => mm
    .AutoMap()
    .Properties(p => p
        .Text(t => t
            .Name(n => n.Conteudo)
            .Analyzer("folding-analyzer")
            .Fields(f => f
                .Text(ss => ss
                    .Name("folding")
                    .Analyzer("folding-analyzer")
                )
            )
(...)
var searchResponse = ElasticClient.Search<Noticia>(s => s
    .Index(indexName)
    .Query(q => q
        .MultiMatch(m => m
            .Fields(f => f
                .Field(p => p.Titulo, 4)
                .Field(p => p.Conteudo.Suffix("folding"), 2)
            )
            .Query(termo)
        )
    )
);
Any clue what I am doing wrong or what I can do to reach my goal?
Thanks a lot in advance!
After a few days I found out what I was doing wrong, and it was all about the mapping.
Here are the steps I took to approach the problem and eventually solve it:
1 - First of all, I opened the Kibana console and found out that only the last of my mapped fields was being assigned my custom analyzer (folding-analyzer).
To test each of your fields you can use the get field mapping API, with a command in Dev Tools like this:
GET /<index>/_mapping/field/<field>
Then you'll be able to see whether your analyzer is being assigned to your field or not.
2 - After that, I discovered that the last field was the only one being assigned my custom analyzer because I was misusing the fluent mapping in two ways:
First, I had to chain my text properties correctly.
Second, I was trying to map another POCO class in another Map<> clause, when I was supposed to use the Object<> clause.
The correct mapping that worked for me looked something like this:
.Map<Noticia>(mm => mm
    .AutoMap()
    .Properties(p => p
        .Text(t => t
            .Name(n => n.Field1)
            .Analyzer("folding-analyzer")
        )
        .Text(t => t
            .Name(n => n.Field2)
            .Analyzer("folding-analyzer")
        )
        .Object<NoticiaArquivo>(o => o
            .Name(n => n.Arquivos)
            .Properties(eps => eps
                .Text(s => s
                    .Name(e => e.NAField1)
                    .Analyzer("folding-analyzer")
                )
                .Text(s => s
                    .Name(e => e.NAField2)
                    .Analyzer("folding-analyzer")
                )
            )
        )
    )
)
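The per-field check from step 1 can also be done in bulk. The sketch below is a hypothetical helper, not part of NEST or Elasticsearch. It walks a mapping's "properties" section and lists the text fields that did not receive the expected analyzer. The sample mapping and its field names are made up for illustration, and are only shaped like what the mapping API returns.

```python
def fields_missing_analyzer(properties, analyzer, prefix=""):
    """Return dotted paths of text fields whose analyzer is not `analyzer`."""
    missing = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "text" and spec.get("analyzer") != analyzer:
            missing.append(path)
        # Object fields carry their own nested "properties".
        if "properties" in spec:
            missing.extend(
                fields_missing_analyzer(spec["properties"], analyzer, path + ".")
            )
    return missing

# Hypothetical mapping, shaped like the "properties" section of a mapping response.
mapping_properties = {
    "field1": {"type": "text"},  # analyzer was not applied
    "field2": {"type": "text", "analyzer": "folding-analyzer"},
    "arquivos": {
        "properties": {
            "naField1": {"type": "text", "analyzer": "folding-analyzer"},
            "naField2": {"type": "text"},  # analyzer was not applied
        }
    },
}

print(fields_missing_analyzer(mapping_properties, "folding-analyzer"))
# ['field1', 'arquivos.naField2']
```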
Finally, it's important to note that when you assign an analyzer using the .Analyzer("analyzerName") clause, you're telling Elasticsearch to use that analyzer both for indexing and for searching.
If you want to use an analyzer only at search time and not at index time, you should use the .SearchAnalyzer("analyzerName") clause.
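In the mapping JSON, these two clauses correspond to the analyzer and search_analyzer parameters of a text field. The sketch below only builds such mapping entries. The helper name text_field is hypothetical, and the field names are borrowed from the question for illustration.

```python
import json

def text_field(analyzer=None, search_analyzer=None):
    """Build the mapping entry for one text field."""
    spec = {"type": "text"}
    if analyzer is not None:
        # Used at index time, and at search time unless overridden.
        spec["analyzer"] = analyzer
    if search_analyzer is not None:
        # Used at search time only.
        spec["search_analyzer"] = search_analyzer
    return spec

properties = {
    "conteudo": text_field(analyzer="folding-analyzer"),
    "titulo": text_field(search_analyzer="folding-analyzer"),
}
print(json.dumps(properties, indent=2))
```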

Dynamic field list for MultiMatch - Nest

We have a requirement to have a search for a document type with a variable/dynamic number of fields being queried against. For one search/type it might be Name and Status. For another, the Description field. The fields to be searched against will be chosen by the user at run time.
Doing this statically appears easy. Something like this searches the Name and Description fields (assume that rootQuery is a valid SearchDescriptor ready for the query):
rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(ff => ff.Name).Field(ff => ff.Description))));
However, we don't want to have a library of static queries to handle the potential permutations if possible. We'd rather do something dynamic like:
foreach (var field in fieldsFromUser) // string list of fields chosen by the user
{
    rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(field))));
}
Is this possible? If so, how?
You can pass the string list of fields directly to .Fields(...)
var searchResponse = client.Search<Document>(s => s
    .Query(q => q
        .MultiMatch(mm => mm
            .Query("query")
            .Fields(new string[] { "field1", "field2", "field3" })
        )
    )
);
which yields
{
  "query": {
    "multi_match": {
      "fields": ["field1", "field2", "field3"],
      "query": "query"
    }
  }
}
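The same idea at the JSON level: the fields array is ordinary data, so it can be assembled at run time from whatever the user selected. This is a minimal Python sketch. The helper name multi_match_query and the optional boosts argument are assumptions for illustration, and the "field^boost" form is the boost syntax that appears in the request logs of multi_match queries.

```python
import json

def multi_match_query(text, fields, boosts=None):
    """Build a multi_match body; `fields` is chosen by the user at run time."""
    if not fields:
        raise ValueError("at least one field is required")
    boosts = boosts or {}
    # "name^4" is the query DSL syntax for boosting a field.
    field_specs = [f"{f}^{boosts[f]}" if f in boosts else f for f in fields]
    return {"query": {"multi_match": {"query": text, "fields": field_specs}}}

user_fields = ["name", "description"]  # e.g. selected in the UI
body = multi_match_query("active", user_fields, boosts={"name": 4})
print(json.dumps(body))
```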

Fos Elastica remove common words(or, and etc..) from search query

Hello, I'm trying to get query results using FOSElasticaBundle with the query below. I can't find a working example for filtering out common words like "and" and "or". If possible, it would also be good for these words not to be highlighted. My attempt so far:
$searchForm = $this->createForm(SearchFormType::class, null);
$searchForm->handleRequest($request);
$matchQuery = new \Elastica\Query\Match();
$matchQuery->setField('_all', $queryString);
$searchQuery = new \Elastica\Query();
$searchQuery->setQuery($matchQuery);
$searchQuery->setHighlight(array(
    "fields" => array(
        "title" => new \stdClass(),
        "content" => new \stdClass()
    ),
    'pre_tags' => ['<strong>'],
    'post_tags' => ['</strong>'],
    'number_of_fragments' => ['0']
));
Thanks in advance ;)
Do you want words like "and" and "or" to be ignored, so that they carry no weight in your search?
If that's the case, you may want to use stop words on your Elasticsearch index.
Here's a reference:
https://www.elastic.co/guide/en/elasticsearch/guide/current/using-stopwords.html
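As a rough illustration of what "stop words on the index" means, the sketch below builds index settings that define a standard analyzer with the built-in English stop word list. It only constructs the settings JSON. The helper name and the analyzer name text_without_stopwords are hypothetical, and how the settings are wired into the FOSElasticaBundle configuration is not shown.

```python
import json

def stopword_settings(analyzer_name, stopwords="_english_"):
    """Index settings defining a standard analyzer that drops stop words."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "standard",
                        # A predefined list such as "_english_", or an explicit list of words.
                        "stopwords": stopwords,
                    }
                }
            }
        }
    }

settings = stopword_settings("text_without_stopwords")  # hypothetical analyzer name
print(json.dumps(settings, indent=2))
```

The fields being searched and highlighted would then need to be mapped with this analyzer for the stop words to be dropped.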

Search result fluctuations

I have a bunch of collections with documents, and I have encountered something strange. When I execute the same request a few times in a row, the result alternates between two values.
It would be fine if the fluctuations were small, but the count of results changes by about 75,000 documents.
So my question is: what is going on?
My request is:
POST mycollection/mytype/_search
{
  "fields": ["timestamp", "bool_field"],
  "filter": {
    "terms": {
      "bool_field": [true]
    }
  }
}
The results go like this:
=> 148866
=> 75381
=> 148866
=> 75381
=> 148866
=> 75381
=> 148866
When the count is 148k, I see some records with bool_field: "False" in Sense.

elasticsearch nest support of filters in functionscore function

I am currently trying to implement a "function_score" query in NEST, with functions that are only applied when a filter matches.
It doesn't look like FunctionScoreFunctionsDescriptor supports adding a filter yet. Is this functionality going to be added any time soon?
Here's a super basic example of what I'd like to be able to implement:
Runs an ES query, with basic scores
Goes through a list of functions, and adds to it the first score where the filter matches
"function_score": {
"query": {...}, // base ES query
"functions": [
{
"filter": {...},
"script_score": {"script": "25"}
},
{
"filter": {...},
"script_score": {"script": "15"}
}
],
"score_mode": "first", // take the first script_score where the filter matches
"boost_mode": "sum" // and add this to the base ES query score
}
I am currently using Elasticsearch v1.1.0, and NEST v1.0.0-beta1 prerelease.
Thanks!
It's already implemented:
_client.Search<ElasticsearchProject>(s => s
    .Query(q => q
        .FunctionScore(fs => fs
            .Functions(
                f => f
                    .ScriptScore(ss => ss.Script("25"))
                    .Filter(ff => ff.Term(t => t.Country, "A")),
                f => f
                    .ScriptScore(ss => ss.Script("15"))
                    .Filter(ff => ff.Term("a", "b")))
            .ScoreMode(FunctionScoreMode.first)
            .BoostMode(FunctionBoostMode.sum))));
Udi's answer didn't work for me. It seems that in the newer version (v2.3, C#) there is no Filter() method on the ScoreFunctionsDescriptor class.
But I found a solution. You can provide an array of IScoreFunction. To do that, you can use new FunctionScoreFunction() directly or use my helper class:
class CustomFunctionScore<T> : FunctionScoreFunction
    where T : class
{
    public CustomFunctionScore(Func<QueryContainerDescriptor<T>, QueryContainer> selector, double? weight = null)
    {
        this.Filter = selector.Invoke(new QueryContainerDescriptor<T>());
        this.Weight = weight;
    }
}
With this class, a filter can be applied this way (this is just an example):
SearchDescriptor<BlobPost> searchDescriptor = new SearchDescriptor<BlobPost>()
    .Query(qr => qr
        .FunctionScore(fs => fs
            .Query(q => q.Bool(b => b.Should(s => s.Match(a => a.Field(f => f.FirstName).Query("john")))))
            .ScoreMode(FunctionScoreMode.Max)
            .BoostMode(FunctionBoostMode.Sum)
            .Functions(
                new[]
                {
                    new CustomFunctionScore<BlobPost>(q => q.Match(a => a.Field(f => f.Id).Query("my_id")), 10),
                    new CustomFunctionScore<BlobPost>(q => q.Match(a => a.Field(f => f.FirstName).Query("john")), 10),
                }
            )
        )
    );
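Whichever NEST version is in use, it can help to know the JSON the client is expected to produce. The sketch below builds a function_score body in which each function is a filter paired with a weight, as in the helper class above. The helper name function_score_query is hypothetical, and the field names in the example filters (firstName, country, a) are illustrative only.

```python
import json

def function_score_query(base_query, filtered_weights, score_mode="first", boost_mode="sum"):
    """Build a function_score body from (filter, weight) pairs."""
    functions = [{"filter": flt, "weight": weight} for flt, weight in filtered_weights]
    return {
        "query": {
            "function_score": {
                "query": base_query,       # base query that produces the initial score
                "functions": functions,    # each function applies only where its filter matches
                "score_mode": score_mode,  # how the matching functions are combined
                "boost_mode": boost_mode,  # how the result combines with the base score
            }
        }
    }

body = function_score_query(
    {"match": {"firstName": "john"}},
    [({"term": {"country": "A"}}, 25), ({"term": {"a": "b"}}, 15)],
)
print(json.dumps(body, indent=2))
```

Comparing this body with the request NEST serializes is a simple way to confirm that the filters ended up inside the functions array.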

Resources