ElasticSearch NEST custom word joiner Analyzer not returning the correct result - elasticsearch

I created an autocomplete filter with ElasticSearch using the NEST API, but I can't get the word joiner to work.
Basically, if I search for something like Transhex, I also want Trans Hex to be returned.
My index looks as follows. I think the WordDelimiter filter might be wrong.
Also, I followed this article (Link). It uses the low-level API, so it is possible that I am doing it completely wrong with the NEST API.
var response = this.Client.CreateIndex(
"company-index",
index => index.Mappings(
ms => ms.Map<CompanyDocument>(m => m.Properties(p => p
.Text(t => t.Name(n => n.CompanyName).Analyzer("auto-complete")
.Fields(ff => ff.Keyword(k => k.Name("keyword")))))))
.Settings(f => f.Analysis(
analysis => analysis
.Analyzers(analyzers => analyzers
.Custom("auto-complete", a => a.Tokenizer("standard").Filters("lowercase", "word-joiner-filter", "auto-complete-filter")))
.TokenFilters(tokenFilter => tokenFilter
.WordDelimiter("word-joiner-filter", t => t.CatenateAll())
.EdgeNGram("auto-complete-filter", t => t.MinGram(3).MaxGram(30))))));
UPDATE
So I updated the analyzer to look as follows. Note that I changed the tokenizer from standard to keyword.
var response = this.Client.CreateIndex(
this.indexName,
index => index.Mappings(
ms => ms.Map<CompanyDocument>(m => m.Properties(p => p
.Text(t => t.Name(n => n.CompanyName).Analyzer("auto-complete")
.Fields(ff => ff.Keyword(k => k.Name("keyword")))))))
.Settings(f => f.Analysis(
analysis => analysis
.Analyzers(analyzers => analyzers
.Custom("auto-complete", a => a.Tokenizer("keyword").Filters("lowercase", "word-joiner-filter", "auto-complete-filter")))
.TokenFilters(tokenFilter => tokenFilter
.WordDelimiter("word-joiner-filter", t => t.CatenateAll())
.EdgeNGram("auto-complete-filter", t => t.MinGram(1).MaxGram(20))))));
The Results will look as follows
Search Keyword : perfect pools
Results
perfect pools -> This is the correct one at the top
EXCLUSIVE POOLS
Perfect Painters
Search Keyword : perfectpools Or PerfectPools
Results
Perfect Hideaways (Pty) Ltd -> this is the wrong one; I would like perfect pools to be displayed here
PERFORMANTA APAC PTY LTD
Perfect Laser Technologies (PTY) LTD

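Use the keyword tokenizer. The standard tokenizer splits the text into two tokens and the filters are then applied to each token separately, so the word delimiter filter never sees both words together and cannot join them.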
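To make the difference concrete, here is a minimal sketch in plain C# that imitates the analysis chain from the question. It does not call NEST or Lucene. AnalyzerSketch and its methods are hypothetical names, and the word delimiter logic is a deliberate simplification (it only splits on non-alphanumeric characters), so treat it as an illustration of the idea rather than the real filter behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnalyzerSketch
{
    // Simplified stand-in for word_delimiter with catenate_all: split a token
    // on non-alphanumeric characters and also emit the parts joined together.
    public static List<string> WordDelimiterCatenateAll(string token)
    {
        var cleaned = new string(token.Select(c => char.IsLetterOrDigit(c) ? c : ' ').ToArray());
        var parts = cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>(parts);
        if (parts.Length > 1)
        {
            result.Add(string.Concat(parts));
        }
        return result;
    }

    // Edge n-grams: prefixes of the token with length minGram..maxGram.
    public static List<string> EdgeNGrams(string token, int minGram, int maxGram)
    {
        var grams = new List<string>();
        for (int len = minGram; len <= Math.Min(maxGram, token.Length); len++)
        {
            grams.Add(token.Substring(0, len));
        }
        return grams;
    }

    // keywordTokenizer = true keeps the whole text as one token; false splits
    // on spaces as a rough approximation of the standard tokenizer.
    // Lowercasing is done up front here, which is equivalent for this input.
    public static HashSet<string> Analyze(string text, bool keywordTokenizer)
    {
        var lowered = text.ToLowerInvariant();
        string[] tokens = keywordTokenizer
            ? new[] { lowered }
            : lowered.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return new HashSet<string>(tokens
            .SelectMany(t => WordDelimiterCatenateAll(t))
            .SelectMany(t => EdgeNGrams(t, 1, 20)));
    }
}
```

With the keyword-style tokenization, AnalyzerSketch.Analyze("Perfect Pools", true) contains the joined term "perfectpools". With the space-splitting variant it does not, because each word reaches the delimiter step on its own.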
UPDATE:
I used a search like this one and it seems to work.
var searchResult = EsClient.Search<CompanyDocument>(q => q
.Index("test_index")
.Type("companydocument")
.TrackScores(true)
.Query(qq =>
{
QueryContainer queryContainer = null;
queryContainer = qq.QueryString(qs => qs.Fields(fs => fs.Field(f => f.CompanyName)).Query("perfectpools").DefaultOperator(Operator.And).Analyzer("auto-complete"));
return queryContainer;
})
.Sort(sort => sort.Descending(SortSpecialField.Score))
.Take(10)
);

Related

ElasticSearch - Search middle of words over multiple fields

I'm trying to retrieve documents that contain a phrase, not necessarily at the start of a word, across multiple document fields.
For example, "ell" should match a document field containing "hello", and this should work on two fields.
I initially went with MultiMatch due to this SO answer. Here was my implementation:
QueryContainer &= Query<VeganItemEstablishmentSearchDto>.MultiMatch(c => c
.Fields(f => f.Field(p => p.VeganItem.Name).Field(v => v.VeganItem.CompanyName))
.Query(query)
.MaxExpansions(2)
.Slop(2)
.Name("named_query")
);
But I found that it would only match "hello" if my search phrase started with the start of the word e.g. it would not match "ello".
So I then changed to QueryString due to this SO answer. My implementation was:
QueryContainer &= Query<VeganItemEstablishmentSearchDto>.QueryString(c => c
.Fields(f => f.Field(p => p.VeganItem.Name).Field(v => v.VeganItem.CompanyName))
.Query(query)
.FuzzyMaxExpansions(2)
.Name("named_query")
);
But I found that was even worse. It didn't search multiple fields, only p.VeganItem.Name and still "ello" was not matching "hello".
How do I use Nest to search for a term that can be in the middle of a word and over multiple document fields?
Wildcard queries are expensive. If you want to control how many characters in the middle of a word can be matched, you can use the n-gram tokenizer instead, which is less expensive and gives you more flexibility.
I've also written a blog post on implementing autocomplete and the trade-offs between its performance and functional requirements.
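As a rough illustration of why n-grams help with middle-of-word matches, the following plain C# sketch lists the n-grams an index-time n-gram step would produce for a single token. NGramSketch is a hypothetical helper, not part of NEST or Elasticsearch, and the gram sizes are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;

public static class NGramSketch
{
    // Emit every substring of the token whose length is between
    // minGram and maxGram, starting at each character position.
    public static List<string> NGrams(string token, int minGram, int maxGram)
    {
        var grams = new List<string>();
        for (int start = 0; start < token.Length; start++)
        {
            for (int len = minGram; len <= maxGram && start + len <= token.Length; len++)
            {
                grams.Add(token.Substring(start, len));
            }
        }
        return grams;
    }
}
```

For "hello" with gram sizes 3 to 4 the output includes "ell" and "ello", so a plain term lookup for either of them can match without a wildcard scan. The price is a larger index, which is why the minimum and maximum gram sizes are worth tuning.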
You will need a wildcard query for this scenario. For more information about wildcard queries check here, and for NEST wildcard queries check here.
A wildcard query in NEST can be written like this:
new QueryContainer[]
{
    Query<VeganItemEstablishmentSearchDto>.Wildcard(w => w
        .Field(v => v.VeganItem.CompanyName)
        .Value(query)),
    Query<VeganItemEstablishmentSearchDto>.Wildcard(w => w
        .Field(p => p.VeganItem.Name)
        .Value(query))
}
You should add an asterisk (*) at the beginning and end of your query.
Please keep in mind that wildcard queries are expensive; you might want to achieve the same result by using a different analyzer in your mapping.
QueryString from this SO answer is what worked for me for multiple fields and the middle of a word. I have not tried Amit's answer yet. I will in the future. This is the quick solution for a beginner:
QueryContainer &= Query<VeganItemEstablishmentSearchDto>
.QueryString(c => c
.Name("named_query")
.Boost(1.1)
.Fields(f => f.Field(p => p.VeganItem.Name).Field(v => v.VeganItem.CompanyName))
.Query($"*{query}*")
.Rewrite(MultiTermQueryRewrite.TopTermsBoost(10))
);
This also works:
QueryContainer = QueryContainer | Query<VeganItemEstablishmentSearchDto>
.MatchPhrase(c => c
.Boost(1.1)
.Field(f => f.VeganItem.Name)
.Query(query)
.Slop(1)
);
QueryContainer = QueryContainer | Query<VeganItemEstablishmentSearchDto>
.MatchPhrase(c => c
.Boost(1.1)
.Field(f => f.VeganItem.CompanyName)
.Query(query)
.Slop(1)
);
var terms = query.ToLower().Split(' ');
foreach (var term in terms)
{
QueryContainer = QueryContainer | Query<VeganItemEstablishmentSearchDto>
.Wildcard(c => c
.Value($"*{term}*")
.Field(f => f.VeganItem.CompanyName)
.Rewrite(MultiTermQueryRewrite.TopTermsBoost(10))
);
QueryContainer = QueryContainer | Query<VeganItemEstablishmentSearchDto>
.Wildcard(c => c
.Value($"*{term}*")
.Field(f => f.VeganItem.Name)
.Rewrite(MultiTermQueryRewrite.TopTermsBoost(10))
);
}

Elasticsearch NEST API, Searching Multiple Indices

If one is searching several indices at the same time, is there any way to say: when searching index A, add this filter, and when searching index B, add a different filter?
For example:
var filters = new List<Func<QueryContainerDescriptor<PropertySearchResult>, QueryContainer>>();
filters.Add(fq => fq.Term(t => t.Field(f => f.PromoterId).Value(user.Id)));
filters.Add(fq => fq.Term(t => t.Field(f => f.SubscriptionId).Value(subscriptionId)));
string indicies = String.Join(",", Utils.SupportedCountries.Select(c => c.Key.ToLower()).ToArray());
var result = await ElasticSearchConfig.GetClient().DeleteByQueryAsync<PropertySearchResult>(u => u
.Index(indicies)
.Query(q => q
.Bool(bq => bq.Filter(filters))));
At the moment, all indices are subject to the same filters, but I would like to vary the filters based on which index is being searched.
Add a term query on the _index field to each of your filters (combined with &&):
.Term("_index", A)
See the documentation for the _index field:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index-field.html

Query one field with multiple values in elasticsearch nest

I have a combination of two queries with Elasticsearch and NEST. The first is a full-text search for a specific term. The second filters on another field, the file path, but it should work for many file paths, and each path could be a partial or a full path. I can query one file path, but I couldn't manage to do it for many. Any suggestions?
Search<SearchResults>(s => s
.Query(q => q
.Match(m => m.Field(f => f.Description).Query("Search_term"))
&& q
.Prefix(t => t.Field(f => f.FilePath).Value("file_Path"))
)
);
To search for more than one path you can use a bool query and put one clause per path in its should section, which acts like a logical OR. Your code should look like this:
Search<SearchResults>(s => s
    .Query(q => q
        .Bool(b => b
            .Should(
                bs => bs.Wildcard(p => p.FilePath, "*file_Pathfile_Path*"),
                bs => bs.Wildcard(p => p.FilePath, "*file_Pathfile_Path*"),
                ....
            ))
        && q.Match(m => m.Field(f => f.Description).Query("Search_term")
    )));
Also, you should use a wildcard query to get results for paths that could be partial or full. For more information, check the official ES documentation on the bool and wildcard queries below:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/bool-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html

Elasticsearch 2.1 - Deprecated search types

According to this link, both scan and count are deprecated.
I am trying to change my queries to reflect this. The count change is easy: just remove the search type and add size=0 to the request. However, I am not 100% sure about the scan change.
Currently I have this query:
var result = ElasticClient.Search<Product>(s => s
.From(0)
.Size(10)
.SearchType(SearchType.Scan)
.Scroll("4s")
.Query
(qu =>
qu.Filtered
(fil =>
fil.Filter
(f =>
f.Bool(b => b.Must(m => m.Term("filedName", "abc")))))));
Am I correct in my understanding that all I need to change is remove the searchtype and add a sort? I.e:
var result = ElasticClient.Search<Product>(s => s
.From(0)
.Size(10)
.Scroll("4s")
.Sort(x => x.OnField("_doc"))
.Query
(qu =>
qu.Filtered
(fil =>
fil.Filter
(f => f.Bool(b => b.Must(m => m.Term("filedName", "abc")))))));
I have seen an enum, SortSpecialField, here, but I am not sure how to actually use it in the sort parameter.
You're correct in your understanding that the change (as you document in your question) to sort by _doc will replace the deprecated Scan searchtype. The SortSpecialField enum is just syntax sugar for sorting by _doc. If you prefer to use it, in NEST 2.0 [only], you can do this:
ElasticClient.Search<Product>(s => s
.From(0)
.Size(10)
.Scroll("4s")
.Sort(x => x.Ascending(SortSpecialField.DocumentIndexOrder))
...

how do i set similarity in nest for elasticsearch on a per field basis

I have not been able to programmatically set the similarity on a field in Elasticsearch using NEST.
Here's an example of how I set up my index. It's within the multifield mapping that I'd like to set the similarity, so I can experiment with things like BM25 similarity
(see the props > multifield section below):
var createInd = client.CreateIndex("myindex", i =>
{
i
.Analysis(a => a.Analyzers(an => an
.Add("nameAnalyzer", nameAnalyzer)
))
.AddMapping<SearchData>(m => m
.MapFromAttributes()
.Properties(props =>
{
props
.MultiField(mf => mf
//title
.Name(s => s.Title)
.Fields(f => f
.String(s => s.Name(o => o.Title).Analyzer("nameAnalyzer"))
.String(s => s.Name(o => o.Title.Suffix("raw")).Index(FieldIndexOption.not_analyzed))
)
);
...
It was just recently made possible with this commit to set the similarity on a string field. You can now do this:
.String(s => s.Name(o => o.Title).Similarity("my_similarity"))
This assumes you have already added the similarity to your index. NEST is lacking a bit of flexibility at the moment for actually configuring similarities. Right now you have to use the CustomSimilaritySettings class. For example:
var bm25 = new CustomSimilaritySettings("my_similarity", "BM25");
bm25.SimilarityParameters.Add("k1", "2.0");
bm25.SimilarityParameters.Add("b", "0.75");
var settings = new IndexSettings();
settings.Similarity = new SimilaritySettings();
settings.Similarity.CustomSimilarities.Add(bm25);
client.CreateIndex("myindex", c => c.InitializeUsing(settings));
It would be nice to be able to do this via the fluent API when creating an index. I am considering sending a PR for this before the 1.0RC release.
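For context on what the k1 and b parameters above control, this is the classic textbook form of the BM25 score. It is given here as background only; the exact formula used by a particular Lucene or Elasticsearch version may differ in details such as the IDF variant or constant factors.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
```

Here f(q_i, D) is the frequency of query term q_i in document D, |D| is the document length and avgdl is the average document length in the index. A larger k1 lets repeated terms keep adding to the score for longer before saturating, and b (between 0 and 1) sets how strongly long documents are penalised.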
