How do I set similarity in NEST for Elasticsearch on a per-field basis?

I have not been able to programmatically set the similarity on a field in Elasticsearch using NEST.
Here's an example of how I set up my index. It's within the multifield mapping that I'd like to set the similarity, so I can experiment with things like BM25 similarity
(see the Properties > MultiField section below):
var createInd = client.CreateIndex("myindex", i =>
{
    i
        .Analysis(a => a.Analyzers(an => an
            .Add("nameAnalyzer", nameAnalyzer)
        ))
        .AddMapping<SearchData>(m => m
            .MapFromAttributes()
            .Properties(props =>
            {
                props
                    .MultiField(mf => mf
                        // title
                        .Name(s => s.Title)
                        .Fields(f => f
                            .String(s => s.Name(o => o.Title).Analyzer("nameAnalyzer"))
                            .String(s => s.Name(o => o.Title.Suffix("raw")).Index(FieldIndexOption.not_analyzed))
                        )
                    );
                ...

Setting the similarity on a string field was only recently made possible, with this commit. You can now do this:
.String(s => s.Name(o => o.Title).Similarity("my_similarity"))
This assumes you have already added the similarity to your index. NEST is lacking a bit of flexibility at the moment for actually configuring similarities. Right now you have to use the CustomSimilaritySettings class. For example:
var bm25 = new CustomSimilaritySettings("my_similarity", "BM25");
bm25.SimilarityParameters.Add("k1", "2.0");
bm25.SimilarityParameters.Add("b", "0.75");
var settings = new IndexSettings();
settings.Similarity = new SimilaritySettings();
settings.Similarity.CustomSimilarities.Add(bm25);
client.CreateIndex("myindex", c => c.InitializeUsing(settings));
It would be nice to be able to do this via the fluent API when creating an index. I am considering sending a PR for this before the 1.0RC release.
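For reference, here is a sketch of the index settings the snippet above is meant to produce, written as a raw create-index request body. The layout follows the Elasticsearch similarity module (a named similarity with a type and its parameters under the index settings). Whether NEST serializes CustomSimilaritySettings to exactly this shape is an assumption and has not been verified here.

```json
{
  "settings": {
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "BM25",
          "k1": 2.0,
          "b": 0.75
        }
      }
    }
  }
}
```

A field mapping then refers to the similarity by its name, my_similarity, which is what the .Similarity("my_similarity") call sets.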

Related

Elasticsearch NEST API, Searching Multiple Indices

If one is searching several indices at the same time, is there any way to say: if searching index A, add this filter, and if searching index B, add a different filter?
For example:
var filters = new List<Func<QueryContainerDescriptor<PropertySearchResult>, QueryContainer>>();
filters.Add(fq => fq.Term(t => t.Field(f => f.PromoterId).Value(user.Id)));
filters.Add(fq => fq.Term(t => t.Field(f => f.SubscriptionId).Value(subscriptionId)));
string indicies = String.Join(",", Utils.SupportedCountries.Select(c => c.Key.ToLower()).ToArray());
var result = await ElasticSearchConfig.GetClient().DeleteByQueryAsync<PropertySearchResult>(u => u
    .Index(indicies)
    .Query(q => q
        .Bool(bq => bq.Filter(filters))));
At the moment, all indices are subject to the same filters, but I would like to vary the filters based on which index is being searched.
Add a term query on the _index field to each of your filters, combined with &&:
.Term("_index", A)
See the documentation for the _index field:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index-field.html
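As a rough sketch of that idea at the query DSL level (this is the raw request body, not NEST code), each index can be paired with its own filter inside a bool query. The index names index_a and index_b, the field names promoterId and subscriptionId, and the values 42 and 7 are placeholders invented for illustration. The sketch also assumes an Elasticsearch version whose bool query supports filter clauses (2.0 or later).

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "filter": [
              { "term": { "_index": "index_a" } },
              { "term": { "promoterId": 42 } }
            ]
          }
        },
        {
          "bool": {
            "filter": [
              { "term": { "_index": "index_b" } },
              { "term": { "subscriptionId": 7 } }
            ]
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
```

A document matches only if it lives in index_a and satisfies the first filter, or lives in index_b and satisfies the second.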

Query one field with multiple values in elasticsearch nest

I have a combination of two queries with Elasticsearch and NEST. The first is a full-text search for a specific term. The second filters on another field, the file path, but it needs to match many file paths, and each path could be a partial or a full path. I can query one file path, but I couldn't manage to do it for many file paths. Any suggestions?
Search<SearchResults>(s => s
    .Query(q => q
        .Match(m => m.Field(f => f.Description).Query("Search_term"))
        && q
        .Prefix(t => t.Field(f => f.FilePath).Value("file_Path"))
    )
);
To search for more than one path, you can use a bool query in Elasticsearch with should clauses, which act like a logical OR. Your code should look like this:
Search<SearchResults>(s => s
    .Query(q => q
        .Bool(b => b
            .Should(
                bs => bs.Wildcard(p => p.FilePath, "*file_Pathfile_Path*"),
                bs => bs.Wildcard(p => p.FilePath, "*file_Pathfile_Path*"),
                ....
            ))
        && q.Match(m => m.Field(f => f.Description).Query("Search_term"))
    )
);
You should also use a wildcard query to match paths that could be either partial or full. For more information, see the official Elasticsearch documentation on the bool query and the wildcard query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/bool-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html
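For comparison, a sketch of the same logic as a raw request body: the match clause is required, and at least one of the wildcard clauses must match. The field names description and filePath assume NEST's default camel-casing of property names, and the two path patterns are placeholders invented for illustration.

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "Search_term" } }
      ],
      "should": [
        { "wildcard": { "filePath": "*first_path*" } },
        { "wildcard": { "filePath": "*second_path*" } }
      ],
      "minimum_should_match": 1
    }
  }
}
```

Without minimum_should_match, should clauses that sit next to a must clause only influence scoring and would not restrict the results. Wildcard queries are matched against the indexed terms, so this assumes filePath is indexed as a single unanalyzed term. Patterns with a leading * can be slow on large indices.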

ElasticSearch NEST custom word joiner Analyzer not returning the correct result

I created an autocomplete filter with Elasticsearch using the NEST API. I can't seem to get the word joiner to work.
Basically, if I search for something like Transhex, I also want to be able to return Trans Hex.
My index looks as follows. I think the WordDelimiter filter might be wrong.
Also, I followed this article: Link. It uses the low-level API, so it is possible that I am doing it completely wrong with the NEST API.
var response = this.Client.CreateIndex(
    "company-index",
    index => index.Mappings(
        ms => ms.Map<CompanyDocument>(m => m.Properties(p => p
            .Text(t => t.Name(n => n.CompanyName).Analyzer("auto-complete")
                .Fields(ff => ff.Keyword(k => k.Name("keyword")))))))
    .Settings(f => f.Analysis(
        analysis => analysis
            .Analyzers(analyzers => analyzers
                .Custom("auto-complete", a => a.Tokenizer("standard").Filters("lowercase", "word-joiner-filter", "auto-complete-filter")))
            .TokenFilters(tokenFilter => tokenFilter
                .WordDelimiter("word-joiner-filter", t => t.CatenateAll())
                .EdgeNGram("auto-complete-filter", t => t.MinGram(3).MaxGram(30))))));
UPDATE
I updated the analyzer to look as follows. Note that I changed the tokenizer from standard to keyword.
var response = this.Client.CreateIndex(
    this.indexName,
    index => index.Mappings(
        ms => ms.Map<CompanyDocument>(m => m.Properties(p => p
            .Text(t => t.Name(n => n.CompanyName).Analyzer("auto-complete")
                .Fields(ff => ff.Keyword(k => k.Name("keyword")))))))
    .Settings(f => f.Analysis(
        analysis => analysis
            .Analyzers(analyzers => analyzers
                .Custom("auto-complete", a => a.Tokenizer("keyword").Filters("lowercase", "word-joiner-filter", "auto-complete-filter")))
            .TokenFilters(tokenFilter => tokenFilter
                .WordDelimiter("word-joiner-filter", t => t.CatenateAll())
                .EdgeNGram("auto-complete-filter", t => t.MinGram(1).MaxGram(20))))));
The results look as follows:
Search keyword: perfect pools
Results:
perfect pools -> this is the correct one, at the top
EXCLUSIVE POOLS
Perfect Painters
Search keyword: perfectpools or PerfectPools
Results:
Perfect Hideaways (Pty) Ltd -> this is the wrong one; I would like to display perfect pools
PERFORMANTA APAC PTY LTD
Perfect Laser Technologies (PTY) LTD
Use the keyword tokenizer. The standard tokenizer splits the text into two tokens and then applies the filters to each of them.
UPDATE:
I used a search like this one, and it seems to work:
var searchResult = EsClient.Search<CompanyDocument>(q => q
    .Index("test_index")
    .Type("companydocument")
    .TrackScores(true)
    .Query(qq =>
    {
        QueryContainer queryContainer = null;
        queryContainer = qq.QueryString(qs => qs.Fields(fs => fs.Field(f => f.CompanyName)).Query("perfectpools").DefaultOperator(Operator.And).Analyzer("auto-complete"));
        return queryContainer;
    })
    .Sort(sort => sort.Descending(SortSpecialField.Score))
    .Take(10)
);

Using which field matched in a multimatch query in a function score

I have a multi-match query that I am running across 5 fields. I am also using a function score to combine various factors into the score. I would like to add a factor so that results that matched on one particular field get a higher score (adding a large number, so that matches on this field always score highest).
I know that I can use highlighting to find out which fields were matched, but how can I access that information in the function score script?
Here's what I have so far (using NEST, but that shouldn't make a difference):
var searchResponse = client.Search<TopicCollection.Topic>(s => s
    .Query(q => q
        .FunctionScore(fs => fs
            .Name("function_score_query")
            .Query(q1 => q1
                .MultiMatch(c => c
                    .Fields(f => f
                        .Field(p => p.field1)
                        .Field(p => p.field2) //...etc
                    )
                    .Query(searchTerm)
                )
            )
            .Functions(fun => fun
                .ScriptScore(ss => ss.Script(sc => sc
                    .Inline(
                        //TODO: add 1000 to normalised _score if match is in field1
                    )))
            ).BoostMode(FunctionBoostMode.Replace)
        )
    ).Highlight(h => h
        .Fields(p => p.AllField())
    )
);

Elasticsearch 2.1 - Deprecated search types

According to this link, both the scan and count search types are deprecated.
I am trying to change my queries to reflect this. The count change is easy: just remove the search type and add size=0 to the request. However, I am not 100% sure about the scan change.
Currently I have this query:
var result = ElasticClient.Search<Product>(s => s
    .From(0)
    .Size(10)
    .SearchType(SearchType.Scan)
    .Scroll("4s")
    .Query(qu => qu
        .Filtered(fil => fil
            .Filter(f => f
                .Bool(b => b.Must(m => m.Term("filedName", "abc")))))));
Am I correct in my understanding that all I need to do is remove the search type and add a sort? I.e.:
var result = ElasticClient.Search<Product>(s => s
    .From(0)
    .Size(10)
    .Scroll("4s")
    .Sort(x => x.OnField("_doc"))
    .Query(qu => qu
        .Filtered(fil => fil
            .Filter(f => f
                .Bool(b => b.Must(m => m.Term("filedName", "abc")))))));
I have seen an enum, SortSpecialField, here, but I am not sure how to actually use it in the sort parameter.
Your understanding is correct: the change you describe in your question, sorting by _doc, replaces the deprecated Scan search type. The SortSpecialField enum is just syntactic sugar for sorting by _doc. If you prefer to use it, in NEST 2.0 [only], you can do this:
ElasticClient.Search<Product>(s => s
    .From(0)
    .Size(10)
    .Scroll("4s")
    .Sort(x => x.Ascending(SortSpecialField.DocumentIndexOrder))
    ...
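At the REST level, the same change looks roughly like the sketch below: a search request body sent with scroll=4s as a query-string parameter and no search_type. The term filter is copied from the question. Using bool with a filter clause in place of the filtered query is an assumption that requires Elasticsearch 2.0 or later.

```json
{
  "size": 10,
  "sort": ["_doc"],
  "query": {
    "bool": {
      "filter": [
        { "term": { "filedName": "abc" } }
      ]
    }
  }
}
```

Each subsequent page is then fetched by passing the _scroll_id from the previous response to the scroll endpoint.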

Resources