How would I recreate the following index using Elasticsearch Nest API?
Here is the json for the index including the mapping:
{
"settings": {
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"trigrams": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trigrams_filter"
]
}
}
}
},
"mappings": {
"data": {
"_all" : {"enabled" : true},
"properties": {
"text": {
"type": "string",
"analyzer": "trigrams"
}
}
}
}
}
Here is my attempt:
var newIndex = client.CreateIndexAsync(indexName, index => index
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Settings(settings => settings
.Add("merge.policy.merge_factor", "10")
.Add("search.slowlog.threshold.fetch.warn", "1s")
.Add("mapping.allow_type_wrapper", true))
.AddMapping<Object>(mapping => mapping
.IndexAnalyzer("trigram")
.Type("string"))
);
The documentation does not mention anything about this?
UPDATE:
Found this post that uses
var index = new IndexSettings()
and then adds Analysis with the string literal json.
index.Add("analysis", #"{json});
Where can one find more examples like this one and does this work?
Creating an index in older versions
There are two main ways that you can accomplish this as outlined in the Nest Create Index Documentation:
Here is the way where you directly declare the index settings as Fluent Dictionary entries. Just like you are doing in your example above. I tested this locally and it produces the index settings that match your JSON above.
var response = client.CreateIndex(indexName, s => s
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Settings(settings => settings
.Add("merge.policy.merge_factor", "10")
.Add("search.slowlog.threshold.fetch.warn", "1s")
.Add("mapping.allow_type_wrapper", true)
.Add("analysis.filter.trigrams_filter.type", "nGram")
.Add("analysis.filter.trigrams_filter.min_gram", "3")
.Add("analysis.filter.trigrams_filter.max_gram", "3")
.Add("analysis.analyzer.trigrams.type", "custom")
.Add("analysis.analyzer.trigrams.tokenizer", "standard")
.Add("analysis.analyzer.trigrams.filter.0", "lowercase")
.Add("analysis.analyzer.trigrams.filter.1", "trigrams_filter")
)
.AddMapping<Object>(mapping => mapping
.Type("data")
.AllField(af => af.Enabled())
.Properties(prop => prop
.String(sprop => sprop
.Name("text")
.IndexAnalyzer("trigrams")
)
)
)
);
Please note that NEST also includes the ability to create index settings using strongly typed classes as well. I will post an example of that later, if I have time to work through it.
Creating index with NEST 7.x
Please also note that in NEST 7.x CreateIndex method is removed. Use Indices.Create isntead. Here's the example.
_client.Indices
.Create(indexName, s => s
.Settings(se => se
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Setting("merge.policy.merge_factor", "10")));
In case people have NEST 2.0, the .NumberOfReplicas(x).NumberOfShards(y) are in the Settings area now so specify within the lamba expression under Settings.
EsClient.CreateIndex("indexname", c => c
.Settings(s => s
.NumberOfReplicas(replicasNr)
.NumberOfShards(shardsNr)
)
NEST 2.0 has a lot of changes and moved things around a bit so these answers are a great starting point for sure. You may need to adjust a little for the NEST 2.0 update.
Small example :
EsClient.CreateIndex("indexname", c => c
.NumberOfReplicas(replicasNr)
.NumberOfShards(shardsNr)
.Settings(s => s
.Add("merge.policy.merge_factor", "10")
.Add("search.slowlog.threshold.fetch.warn", "15s")
)
#region Analysis
.Analysis(descriptor => descriptor
.Analyzers(bases => bases
.Add("folded_word", new CustomAnalyzer()
{
Filter = new List<string> { "icu_folding", "trim" },
Tokenizer = "standard"
}
)
.TokenFilters(i => i
.Add("engram", new EdgeNGramTokenFilter
{
MinGram = 1,
MaxGram = 20
}
)
)
.CharFilters(cf => cf
.Add("drop_chars", new PatternReplaceCharFilter
{
Pattern = #"[^0-9]",
Replacement = ""
}
)
#endregion
#region Mapping Categories
.AddMapping<Categories>(m => m
.Properties(props => props
.MultiField(mf => mf
.Name(n => n.Label_en)
.Fields(fs => fs
.String(s => s.Name(t => t.Label_en).Analyzer("folded_word"))
)
)
)
#endregion
);
In case anyone has migrated to NEST 2.4 and has the same question - you would need to define your custom filters and analyzers in the index settings like this:
elasticClient.CreateIndex(_indexName, i => i
.Settings(s => s
.Analysis(a => a
.TokenFilters(tf => tf
.EdgeNGram("edge_ngrams", e => e
.MinGram(1)
.MaxGram(50)
.Side(EdgeNGramSide.Front)))
.Analyzers(analyzer => analyzer
.Custom("partial_text", ca => ca
.Filters(new string[] { "lowercase", "edge_ngrams" })
.Tokenizer("standard"))
.Custom("full_text", ca => ca
.Filters(new string[] { "standard", "lowercase" } )
.Tokenizer("standard"))))));
For 7.X plus you can use the following code to create an index with Shards, Replicas and with Automapping:
if (!_elasticClient.Indices.Exists(_elasticClientIndexName).Exists)
{
var response = _elasticClient.Indices
.Create(_elasticClientIndexName, s => s
.Settings(se => se
.NumberOfReplicas(1)
.NumberOfShards(shards)
).Map<YourDTO>(
x => x.AutoMap().DateDetection(false)
));
if (!response.IsValid)
{
// Elasticsearch index status is invalid, log an exception
}
}
Related
We have an application with an ElasticSearch backend. I am trying to enable free text search for some fields and aggregate the same fields. The free text search is expecting the fields to be of the text type and the aggregator is expecting them to be of keyword type. Copy_to doesn't seem to be able to copy the keywords to a text field.
This aggregation works well with the keyword type:
var aggs = await _dataService.Search<Vehicle>(s => s
.Size(50)
.Query(q => q.MatchAll())
.Aggregations(a => a
.Terms("stockLocation_city", c => c
.Field(f => f.StockLocation.City)
.Size(int.MaxValue)
.ShowTermDocCountError(false)
)
.Terms("stockLocation_country", c => c
.Field(f => f.StockLocation.Country)
.Size(int.MaxValue)
.ShowTermDocCountError(false)
)
)
);
The schema looks like this:
"stockLocation": {
"type": "object",
"properties": {
"full_text": {
"type": "text",
"analyzer": "custom_analyzer"
},
"city": {
"type": "keyword",
"copy_to":"full_text"
},
"country": {
"type": "keyword",
"copy_to":"full_text"
}
}
}
The query for the full-text search which works with text fields copied to the full_text property:
var qieryDescriptor = query.SimpleQueryString(p => p.Query(searchQuery.FreeText));
And the ElasticClient instantiation:
public ElasticSearchService(ElasticSearchOptions elasticSearchOptions)
{
_elasticSearchOptions = elasticSearchOptions;
var settings = new ConnectionSettings(
_elasticSearchOptions.CloudId,
new Elasticsearch.Net.BasicAuthenticationCredentials(_elasticSearchOptions.UserName, _elasticSearchOptions.Password)
)
.DefaultIndex(_elasticSearchOptions.IndexAliasName)
.DefaultMappingFor<Vehicle>(m => m.IndexName(_elasticSearchOptions.IndexAliasName));
_elasticClient = new ElasticClient(settings);
}
I have looked in the documentation but haven't seen this particular use case anywhere, so I must be doing something wrong.
How can I enable both aggregation and free text search on the same fields?
Cheers
what you are looking for is the "multi-fields" feature:
Multi-Fields
that way you have the same entry in the document and the engine indexes it twice - once as full text and once as keyword.
We have a requirement to have a search for a document type with a variable/dynamic number of fields being queried against. For one search/type it might be Name and Status. For another, the Description field. The fields to be searched against will be chosen by the user at run time.
To do this statically appears easy. Something like this to search in Name and Description fields. (Assume that rootQuery is a valid searchDescriptor ready for the query.
rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(ff => ff.Name).Field(ff => ff.Description))));
However, we don't want to have a library of static queries to handle the potential permutations if possible. We'd rather do something dynamic like:
foreach (var field in string-list-of-fields-from-user)
{
rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(ff => field);
}
Is this possible? If so, how?
You can pass the string list of fields directly to .Fields(...)
var searchResponse = client.Search<Document>(s => s
.Query(q => q
.MultiMatch(mm => mm
.Query("query")
.Fields(new string[] { "field1", "field2", "field3" })
)
)
);
which yields
{
"query": {
"multi_match": {
"fields": ["field1", "field2", "field3"],
"query": "query"
}
}
}
i've read through so much of the documentation, but now getting a bit confused about how to match part of a word in a search. i understand there are many techniques, but most talk about matching the first part of a word. such as 'quick' can match 'quick brown fox'.
well, what if i have the word 'endgame' i'm looking for but the input query is 'game'? i've tried using the standard, keyword, whitespace, etc, tokenizers but i'm not getting it.
i'm sure i'm missing something simple.
update
i was able to implement this with the help of john. here's the implementation using Nest...
var ngramTokenFilter = new NgramTokenFilter
{
MinGram = 2,
MaxGram = 3
};
var nGramTokenizer = new NGramTokenizer
{
MinGram = 2, MaxGram = 3, TokenChars = new List<string>{"letter", "digit"}
};
var nGramAnalyzer = new CustomAnalyzer
{
Tokenizer = "nGramTokenizer",
Filter = new[] { "ngram", "standard", "lowercase" }
};
client.CreateIndex("myindex", i =>
{
i
.Analysis(a => a.Analyzers(an => an
.Add("ngramAnalyer", nGramAnalyzer)
)
.Tokenizers(tkn => tkn
.Add("nGramTokenizer", nGramTokenizer)
)
.TokenFilters(x => x
.Add("ngram", ngramTokenFilter)
)
)
...
and my poco, i'm actually creating a multifield, one not analyzed, and one with my ngram tokenizer analyzer:
pm.Properties(props => props
.MultiField(mf => mf
.Name("myfield")
.Fields(f => f
.String(s => s.Name("myfield").Analyzer("ngramAnalyer"))
.String(s => s.Name("raw").Index(FieldIndexOption.not_analyzed))
)
)
);
I would try an ngram tokenizer: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
The example below is fairly extreme (it creates 2 and 3 letter tokens) but should give you an idea of how it works:
curl -XPUT 'localhost:9200/test' -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"my_ngram_analyzer" : {
"tokenizer" : "my_ngram_tokenizer"
}
},
"tokenizer" : {
"my_ngram_tokenizer" : {
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "3",
"token_chars": [ "letter", "digit" ]
}
}
}
}
}'
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'
Result: FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04
This will allow you to break up your tokens into smaller tokens of a configurable size and search on them. You'll want to play with min_gram and max_gram for your use case.
This can have some memory impact but tends to be a lot faster than a wildcard search that has trailing or leading wildcards (or both). http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html
I am currently trying to implement a "function_score" query in NEST, with functions that are only applied when a filter matches.
It doesn't look like FunctionScoreFunctionsDescriptor supports adding a filter yet. Is this functionality going to be added any time soon?
Here's a super basic example of what I'd like to be able to implement:
Runs an ES query, with basic scores
Goes through a list of functions, and adds to it the first score where the filter matches
"function_score": {
"query": {...}, // base ES query
"functions": [
{
"filter": {...},
"script_score": {"script": "25"}
},
{
"filter": {...},
"script_score": {"script": "15"}
}
],
"score_mode": "first", // take the first script_score where the filter matches
"boost_mode": "sum" // and add this to the base ES query score
}
I am currently using Elasticsearch v1.1.0, and NEST v1.0.0-beta1 prerelease.
Thanks!
It's already implemented:
_client.Search<ElasticsearchProject>(s =>
s.Query(q=>q
.FunctionScore(fs=>fs.Functions(
f=>f
.ScriptScore(ss=>ss.Script("25"))
.Filter(ff=>ff.Term(t=>t.Country, "A")),
f=> f
.ScriptScore(ss=>ss.Script("15"))
.Filter(ff=>ff.Term("a","b")))
.ScoreMode(FunctionScoreMode.first)
.BoostMode(FunctionBoostMode.sum))));
The Udi's answer didn't work for me. It seems that in new version (v 2.3, C#) there's no Filter() method on ScoreFunctionsDescriptor class.
But I found a solution. You can provide an array of IScoreFunction. To do that you can use new FunctionScoreFunction() or use my helper class:
class CustomFunctionScore<T> : FunctionScoreFunction
where T: class
{
public CustomFunctionScore(Func<QueryContainerDescriptor<T>, QueryContainer> selector, double? weight = null)
{
this.Filter = selector.Invoke(new QueryContainerDescriptor<T>());
this.Weight = weight;
}
}
With this class, filter can be applied this way (this is just an example):
SearchDescriptor<BlobPost> searchDescriptor = new SearchDescriptor<BlobPost>()
.Query(qr => qr
.FunctionScore(fs => fs
.Query(q => q.Bool(b => b.Should(s => s.Match(a => a.Field(f => f.FirstName).Query("john")))))
.ScoreMode(FunctionScoreMode.Max)
.BoostMode(FunctionBoostMode.Sum)
.Functions(
new[]
{
new CustomFunctionScore<BlobPost>(q => q.Match(a => a.Field(f => f.Id).Query("my_id")), 10),
new CustomFunctionScore<BlobPost>(q => q.Match(a => a.Field(f => f.FirstName).Query("john")), 10),
}
)
)
);
This may be a silly question, but how do I filter on an empty string in ElasticSearch using Nest. Specifically, how do I recreate the following result:
curl http://localhost:9200/test/event/_search
{
"filter" : { "term" : { "target" : "" }}
}
I've tried:
(f => f
.Term("target", "")
);
which according to ElasticSearch and Nest filtering does not work is treated like a conditionless query and returns everything, while adding a .Strict() throws a DslException:
(f => f
.Strict().Term("target", "")
);
I've also tried .Missing() and .Exists() to no avail.
The relevant section of my _mapping for reference:
{
"event": {
"dynamic": "false",
"properties": {
target": {
"type": "string",
"index": "not_analyzed",
"store": true,
"omit_norms": true,
"index_options": "docs"
}
}
}
}
Any pointers would be greatly appreciated.
As the documentation on NEST and writing queries mentions you can toggle Strict() mode to trigger exceptions if a part of your query turns out to be conditionless but if thats what you really wanted then you were stuck as you've found out.
I just committed a .Verbatim() construct which works exactly like .Strict() but instead of throwing an exception it will take the query as is and render it as specified.
(f => f
.Verbatim()
.Term("target", "")
);
Should thus disable the conditionless query rewrite and insert the query literally as specified.
This will make it in the next version of NEST (so after the current version of 0.12.0.0)
I will just remark that you have to use Verbatim() on every query, not just once on the top.
var searchResults = this.Client.Search<Project>(s => s
.Query(q => q
//.Verbatim() // no, here won't work
.Bool(b => b
.Should(
bs => bs.Match(p => p.Query("hello").Field("name").Verbatim()),
bs => bs.Match(p => p.Query("world").Field("name").Verbatim())
)
)
)
);