How to view the analyzed text when a custom analyzer was used? - elasticsearch

I can't figure out how to test my custom analyzer or view the analyzed output.
Normally I would add a custom analyzer to the index settings when creating the index. The problem in this case is that I'm not using an index (or at least I don't think I am), and I don't know how to register my custom analyzer with the Elasticsearch client.
This is the method I'm currently using to test the analysis step:
public async Task AnalizeField(string analyzer, string textToAnalyze)
{
    var elasticClient = ElasticsearchHelper.DatabaseConnection();
    // No index is given here, so the analyzer name can only resolve to a built-in analyzer.
    var analyzeResponse = await elasticClient.AnalyzeAsync(a => a
        .Analyzer(analyzer)
        .Text(textToAnalyze)
    );
    var result = "";
    // Tokens may be null when the request fails, so check it before reading Count.
    if (analyzeResponse != null && analyzeResponse.Tokens != null && analyzeResponse.Tokens.Count > 0)
    {
        foreach (var token in analyzeResponse.Tokens)
        {
            result += token.Token + " ";
        }
    }
    Console.WriteLine("Analyzing text \"" + textToAnalyze + "\" using the \"" + analyzer + "\" analyzer: " + result);
}

Found it: https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/testing-analyzers.html#_testing_a_custom_analyzer_in_an_index
Testing a custom analyzer in an index
In this example, we’ll add a custom analyzer to an existing index. First, we need to close the index:
client.CloseIndex("analysis-index");
Now, we can update the settings to add the analyzer:
client.UpdateIndexSettings("analysis-index", i => i
    .IndexSettings(s => s
        .Analysis(a => a
            .CharFilters(cf => cf
                .Mapping("my_char_filter", m => m
                    .Mappings("F# => FSharp")
                )
            )
            .TokenFilters(tf => tf
                .Synonym("my_synonym", sf => sf
                    .Synonyms("superior, great")
                )
            )
            .Analyzers(an => an
                .Custom("my_analyzer", ca => ca
                    .Tokenizer("standard")
                    .CharFilters("my_char_filter")
                    .Filters("lowercase", "stop", "my_synonym")
                )
            )
        )
    )
);
And open the index again. Here, we also wait up to five seconds for the status of the index to become green:
client.OpenIndex("analysis-index");
client.ClusterHealth(h => h
    .WaitForStatus(WaitForStatus.Green)
    .Index("analysis-index")
    .Timeout(TimeSpan.FromSeconds(5))
);
With the index open and ready, let’s test the analyzer:
var analyzeResponse = client.Analyze(a => a
    .Index("analysis-index")
    .Analyzer("my_analyzer")
    .Text("F# is THE SUPERIOR language :)")
);
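Applied to the helper from the question, the main change is passing the index that holds the analyzer definition. A minimal sketch, assuming NEST 7.x (where the call lives under client.Indices; in 6.x the equivalent is client.AnalyzeAsync, as in the question). The AnalyzerProbe class and its method name are made up for illustration:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Nest;

public static class AnalyzerProbe
{
    // Runs `text` through an analyzer defined in the settings of `index`
    // and returns the emitted tokens joined by single spaces.
    public static async Task<string> AnalyzeWithIndexAsync(
        IElasticClient client, string index, string analyzer, string text)
    {
        var response = await client.Indices.AnalyzeAsync(a => a
            .Index(index)       // resolve the analyzer name against this index
            .Analyzer(analyzer)
            .Text(text));

        // Return an empty string when the call failed or produced no tokens.
        if (!response.IsValid || response.Tokens == null)
            return string.Empty;

        return string.Join(" ", response.Tokens.Select(t => t.Token));
    }
}
```

Called as AnalyzeWithIndexAsync(client, "analysis-index", "my_analyzer", "F# is THE SUPERIOR language :)"), it should return whatever tokens the custom analyzer emits for that text.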

You could also try installing Cerebro:
https://github.com/lmenezes/cerebro
After you install it, there is an Analysis entry in the menu. From there you can easily use "analyze by field type" or "analyze by analyzer".
This should help.

Related

How to add conditional properties for index creation in elasticsearch nest?

I want to create an index whose properties are added conditionally, similar to how a QueryContainer lets you add filters conditionally.
PropertiesDescriptor<object> ps = new PropertiesDescriptor<object>();
if (condition)
{
    ps.Text(s => s.Name(name[1]));
}
if (condition)
{
    ps.Number(s => s.Name(name[1]));
}
if (!_con.client.Indices.Exists(indexname).Exists)
{
    var createIndexResponse = _con.client.Indices.Create(indexname, index => index
        .Settings(s => s.NumberOfShards(1).NumberOfReplicas(0))
        .Map(m => m.Properties(ps)));
}
But I receive the following error. Can you guide me on how to achieve this?
cannot convert from 'Nest.PropertiesDescriptor<object>' to 'System.Func<Nest.PropertiesDescriptor<object>, Nest.IPromise<Nest.IProperties>>'
You are almost there, just change Properties part to m.Properties(p => ps).
_con.client.Indices.Create(indexname,
index => index.Settings(s => s.NumberOfShards(1).NumberOfReplicas(0)).Map(m=>m.Properties(p => ps)));
Hope that helps.
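The condition can also be evaluated inside the Properties lambda itself, which avoids the separate descriptor variable. A minimal sketch, assuming NEST 7.x; the ConditionalMapping class and the fieldName / isNumeric parameters are hypothetical stand-ins for the conditions in the question:

```csharp
using Nest;

public static class ConditionalMapping
{
    // Creates `indexName` with one field whose type depends on `isNumeric`.
    public static CreateIndexResponse Create(
        IElasticClient client, string indexName, string fieldName, bool isNumeric)
    {
        return client.Indices.Create(indexName, index => index
            .Settings(s => s.NumberOfShards(1).NumberOfReplicas(0))
            .Map(m => m.Properties(p =>
            {
                // The descriptor is mutated in place, so the branches
                // only need to call the relevant method on `p`.
                if (isNumeric)
                    p.Number(n => n.Name(fieldName).Type(NumberType.Long));
                else
                    p.Text(t => t.Name(fieldName));

                // Returning the descriptor satisfies the delegate type
                // named in the compiler error from the question.
                return p;
            })));
    }
}
```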

Adding FunctionScore/FieldValueFactor to a MultiMatch query

We've got a pretty basic query we're using to allow users to provide a query text, and then it boosts matches on different fields. Now we want to add another boost based on votes, but not sure where to nest the FunctionScore in.
Our original query is:
var results = await _ElasticClient.SearchAsync<dynamic>(s => s
.Query(q => q
.MultiMatch(mm => mm
.Fields(f => f
.Field("name^5")
.Field("hobbies^2")
)
.Query(queryText)
)
)
);
If I try to nest in FunctionScore around the MultiMatch, it basically ignores the query/fields, and just returns everything in the index:
var results = await _ElasticClient.SearchAsync<dynamic>(s => s
.Query(q => q
.FunctionScore(fs => fs
.Query(q2 => q2
.MultiMatch(mm => mm
.Fields(f => f
.Field("name^5")
.Field("hobbies^2")
)
.Query(queryText)
)
)
)
)
);
My expectation is that, since I'm not providing any Functions yet, this should do exactly the same thing as the query above. Then, adding functions to the FunctionScore should boost the results accordingly (in my case, boosting on the votes field using just FieldValueFactor).
The documentation around this is a little fuzzy, particularly with certain combinations, like MultiMatch, FunctionScore, and query text. I did find this answer, but it doesn't cover when including query text.
I'm pretty sure it boils down to my still foggy understanding of how Elastic queries work, but I'm just not finding much to cover the (what I would think is a pretty common) scenario of:
A user entering a query
Boosting matches of that query with certain fields
Boosting all results based on the value of a numeric field
Your function_score query is correct. The reason you are not seeing the results you expect is a NEST feature called conditionless queries. A function_score query is considered conditionless when it has no functions, and NEST omits a conditionless query from the serialized form sent in the request.
The easiest way to see this is with a small example:
private static void Main()
{
    var defaultIndex = "my-index";
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var settings = new ConnectionSettings(pool, new InMemoryConnection())
        .DefaultIndex(defaultIndex)
        .DisableDirectStreaming()
        .PrettyJson()
        .OnRequestCompleted(callDetails =>
        {
            if (callDetails.RequestBodyInBytes != null)
            {
                Console.WriteLine(
                    $"{callDetails.HttpMethod} {callDetails.Uri} \n" +
                    $"{Encoding.UTF8.GetString(callDetails.RequestBodyInBytes)}");
            }
            else
            {
                Console.WriteLine($"{callDetails.HttpMethod} {callDetails.Uri}");
            }
            Console.WriteLine();
            if (callDetails.ResponseBodyInBytes != null)
            {
                Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
                    $"{Encoding.UTF8.GetString(callDetails.ResponseBodyInBytes)}\n" +
                    $"{new string('-', 30)}\n");
            }
            else
            {
                Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
                    $"{new string('-', 30)}\n");
            }
        });
    var client = new ElasticClient(settings);
    var queryText = "query text";
    var results = client.Search<dynamic>(s => s
        .Query(q => q
            .FunctionScore(fs => fs
                .Query(q2 => q2
                    .MultiMatch(mm => mm
                        .Fields(f => f
                            .Field("name^5")
                            .Field("hobbies^2")
                        )
                        .Query(queryText)
                    )
                )
            )
        )
    );
}
which emits the following request
POST http://localhost:9200/my-index/object/_search?pretty=true&typed_keys=true
{}
You can disable the conditionless feature by marking a query as Verbatim
var results = client.Search<dynamic>(s => s
.Query(q => q
.FunctionScore(fs => fs
.Verbatim() // <-- send the query *exactly as is*
.Query(q2 => q2
.MultiMatch(mm => mm
.Fields(f => f
.Field("name^5")
.Field("hobbies^2")
)
.Query(queryText)
)
)
)
)
);
This now sends the query
POST http://localhost:9200/my-index/object/_search?pretty=true&typed_keys=true
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "query text",
          "fields": [
            "name^5",
            "hobbies^2"
          ]
        }
      }
    }
  }
}
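For the third item in the question (boosting every result by a numeric field), a field_value_factor function can be added to the same function_score query. A minimal sketch, assuming NEST 7.x and a numeric field named "votes" as described in the question; the Log1P modifier, the Sum boost mode, and the VoteBoostedSearch class name are illustrative choices rather than anything the thread prescribes:

```csharp
using System.Threading.Tasks;
using Nest;

public static class VoteBoostedSearch
{
    // Text relevance from multi_match, plus a bonus derived from "votes".
    public static Task<ISearchResponse<object>> SearchAsync(
        IElasticClient client, string queryText)
    {
        return client.SearchAsync<object>(s => s
            .Query(q => q
                .FunctionScore(fs => fs
                    .Query(q2 => q2
                        .MultiMatch(mm => mm
                            .Fields(f => f
                                .Field("name^5")
                                .Field("hobbies^2"))
                            .Query(queryText)))
                    .Functions(fn => fn
                        .FieldValueFactor(fvf => fvf
                            .Field("votes")
                            // log1p dampens very large vote counts
                            .Modifier(FieldValueFactorModifier.Log1P)
                            // treat documents without the field as 0 votes
                            .Missing(0)))
                    // Sum keeps the text score for documents with 0 votes;
                    // Multiply would zero them out because log1p(0) = 0.
                    .BoostMode(FunctionBoostMode.Sum))));
    }
}
```

Because this query now contains a function, it should no longer count as conditionless by the rule described above, so Verbatim() should not be required here.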

Nest update index settings

I am following the post Creating an index Nest and trying to update my index settings. All runs fine however the html_strip filter is not stripping HTML. My code is
var node = new Uri(_url + ":" + _port);
var settings = new ConnectionSettings(node);
settings.SetDefaultIndex(index);
_client = new ElasticClient(settings);
//to apply filters during indexing use folding to remove diacritics and html strip to remove html
_client.UpdateSettings(
    f => f.Analysis(descriptor => descriptor
        .Analyzers(
            bases => bases
                .Add("folded_word", new CustomAnalyzer
                    {
                        Filter = new List<string> { "icu_folding", "trim" },
                        Tokenizer = "standard"
                    }
                )
        )
        .CharFilters(
            cf => cf.Add("html_strip", new HtmlStripCharFilter())
        )
    )
);
You are getting this error:
Can't update non dynamic
settings[[index.analysis.analyzer.folded_word.filter.0,
index.analysis.char_filter.html_strip.type,
index.analysis.analyzer.folded_word.filter.1,
index.analysis.analyzer.folded_word.type,
index.analysis.analyzer.folded_word.tokenizer]] for open
indices[[my_index]]
Analysis settings cannot be changed on an open index. Close the index first, update the settings, and reopen it afterwards:
client.CloseIndex(..);
client.UpdateSettings(..);
client.OpenIndex(..);
UPDATE
Add the html_strip char filter to your custom analyzer:
.Analysis(descriptor => descriptor
.Analyzers(bases => bases.Add("folded_word",
new CustomAnalyzer
{
Filter = new List<string> { "icu_folding", "trim" },
Tokenizer = "standard",
CharFilter = new List<string> { "html_strip" }
}))
)
Now you can run test to check if this analyzer returns correct tokens:
client.Analyze(a => a.Index(indexName).Text("this <a> is a test <div>").Analyzer("folded_word"));
Output:
this
is
a
test
Hope it helps.
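The snippets in this answer use an old NEST API. With a newer client the same close, update, reopen sequence might look like the sketch below. It assumes NEST 7.x (index operations under client.Indices) and that the analysis-icu plugin providing icu_folding is installed; the HtmlStripAnalyzerSetup class name is made up:

```csharp
using Nest;

public static class HtmlStripAnalyzerSetup
{
    // Returns true only if the settings update and the reopen both succeed.
    public static bool AddFoldedWordAnalyzer(IElasticClient client, string indexName)
    {
        // Analysis settings are not dynamic, so close the index first.
        var closed = client.Indices.Close(indexName);
        if (!closed.IsValid)
            return false;

        var updated = client.Indices.UpdateSettings(indexName, u => u
            .IndexSettings(s => s
                .Analysis(a => a
                    .Analyzers(an => an
                        .Custom("folded_word", ca => ca
                            // html_strip is built in, so it can be referenced by name
                            .CharFilters("html_strip")
                            .Tokenizer("standard")
                            .Filters("icu_folding", "trim"))))));

        // Reopen even when the update failed, so the index does not stay closed.
        var opened = client.Indices.Open(indexName);
        return updated.IsValid && opened.IsValid;
    }
}
```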

example of how to use synonyms in nest

I haven't found a solid example of how to create and use synonyms using NEST for Elasticsearch. If anyone has one, it would be helpful.
My attempt looks like this, but I don't know how to apply it to a field.
var syn = new SynonymTokenFilter
{
Synonyms = new [] { "pink, p!nk => pink", "lil, little", "ke$ha, kesha => ke$ha" },
IgnoreCase = true,
Tokenizer = "standard"
};
client.CreateIndex("myindex", i =>
{
i
.Analysis(a => a.Analyzers(an => an
.Add("fullTermCaseInsensitive", fullTermCaseInsensitive)
)
.TokenFilters(x => x
.Add("synonym", syn)
)
)
...
It's very simple :)
First define the synonym filter; then you can use it in your custom analyzer, where you can also add other types of filters.
Small example:
.Analysis(descriptor => descriptor
.Analyzers(bases => bases
.Add("folded_word", new CustomAnalyzer()
{
Filter = new List<string> { "icu_folding", "trim", "synonym" },
Tokenizer = "standard"
}
)
)
.TokenFilters(i => i
.Add("synonym", new SynonymTokenFilter()
{
SynonymsPath="analysis/synonym.txt",
Format = "Solr"
}
)
)
Then you can use the custom analyzer in the mapping part.
Assuming your fullTermCaseInsensitive analyzer is custom, you need to add your synonym filter to it:
var fullTermCaseInsensitive = new CustomAnalyzer()
{
.
.
.
Filter = new string[] { "syn" }
};
And upon creating your index, you can add a mapping and apply the fullTermCaseInsensitive analyzer to your field(s):
client.CreateIndex("myindex", c => c
.Analysis(a => a
.Analyzers(an => an.Add("fullTermCaseInsensitive", fullTermCaseInsensitive))
.TokenFilters(tf => tf.Add("syn", syn)))
.AddMapping<MyType>(m => m
.Properties(p => p
.String(s => s.Name(t => t.MyField).Analyzer("fullTermCaseInsensitive")))));
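Both answers above use an early NEST API (CreateIndex, AddMapping, String). On a newer client the same idea (define the filter, reference it from a custom analyzer, assign the analyzer to a field) might look like this. It is only a sketch, assuming NEST 7.x; the Artist type, the filter and analyzer names, and the choice of the whitespace tokenizer (so that a term like "p!nk" stays a single token) are all illustrative:

```csharp
using Nest;

// Hypothetical document type, used only for this illustration.
public class Artist
{
    public string Name { get; set; }
}

public static class SynonymIndex
{
    public static CreateIndexResponse Create(IElasticClient client, string indexName)
    {
        return client.Indices.Create(indexName, c => c
            .Settings(s => s
                .Analysis(a => a
                    // 1. Define the synonym token filter.
                    .TokenFilters(tf => tf
                        .Synonym("artist_synonyms", sf => sf
                            .Synonyms("lil, little", "p!nk => pink")))
                    // 2. Reference it from a custom analyzer.
                    .Analyzers(an => an
                        .Custom("artist_name", ca => ca
                            .Tokenizer("whitespace")
                            .Filters("lowercase", "artist_synonyms")))))
            // 3. Apply the analyzer to a field in the mapping.
            .Map<Artist>(m => m
                .Properties(p => p
                    .Text(t => t
                        .Name(n => n.Name)
                        .Analyzer("artist_name")))));
    }
}
```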

Need concrete documentation / examples of building complex index using NEST ElasticSearch library

I would like to use the NEST library's Fluent interface to create an index, which involves setting up custom filters, analyzers, and type mappings. I would like to avoid decorating my classes with NEST-specific annotations.
I have seen the documentation at http://nest.azurewebsites.net/indices/create-indices.html and http://nest.azurewebsites.net/indices/put-mapping.html. This documentation, while showing some examples, is not complete enough to help me figure out how to use the Fluent API to build some complex indexing scenarios.
I have found the tutorial at http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/ to be quite helpful; some code showing how to build the filters, analyzers and mappings in this tutorial via the NEST Fluent interface in place of the straight JSON would be a great answer to this question.
The more specific you can be with your question the better the answers you receive will be. Nevertheless, here is an index that sets up an analyzer (with filter) and tokenizer (EdgeNGram) and then uses them to create an autocomplete index on the Name field of a Tag class.
public class Tag
{
public string Name { get; set; }
}
Nest.IElasticClient client = null; // Connect to ElasticSearch
var createResult = client.CreateIndex(indexName, index => index
.Analysis(analysis => analysis
.Analyzers(a => a
.Add(
"autocomplete",
new Nest.CustomAnalyzer()
{
Tokenizer = "edgeNGram",
Filter = new string[] { "lowercase" }
}
)
)
.Tokenizers(t => t
.Add(
"edgeNGram",
new Nest.EdgeNGramTokenizer()
{
MinGram = 1,
MaxGram = 20
}
)
)
)
.AddMapping<Tag>(tmd => tmd
.Properties(props => props
.MultiField(p => p
.Name(t => t.Name)
.Fields(tf => tf
.String(s => s
.Name(t => t.Name)
.Index(Nest.FieldIndexOption.not_analyzed)
)
.String(s => s
.Name(t => t.Name.Suffix("autocomplete"))
.Index(Nest.FieldIndexOption.analyzed)
.IndexAnalyzer("autocomplete")
)
)
)
)
)
);
There is also a fairly complete mapping example in NEST's unit test project on github.
https://github.com/elasticsearch/elasticsearch-net/blob/develop/src/Tests/Nest.Tests.Unit/Core/Map/FluentMappingFullExampleTests.cs
Edit:
To query the index, do something like the following:
string queryString = ""; // search string
var results = client.Search<Tag>(s => s
.Query(q => q
.Text(tq => tq
.OnField(t => t.Name.Suffix("autocomplete"))
.QueryString(queryString)
)
)
);
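The code in this answer targets a very early NEST release (MultiField, String, and the Text query have since been replaced). As a rough sketch of the same autocomplete setup on a newer client, assuming NEST 7.x: the tokenizer name edge_ngram_1_20, the "autocomplete" sub-field, the standard search analyzer, and the TagAutocomplete class are assumptions made for illustration.

```csharp
using Nest;

public class Tag
{
    public string Name { get; set; }
}

public static class TagAutocomplete
{
    public static CreateIndexResponse CreateIndex(IElasticClient client, string indexName)
    {
        return client.Indices.Create(indexName, c => c
            .Settings(s => s
                .Analysis(a => a
                    .Tokenizers(t => t
                        .EdgeNGram("edge_ngram_1_20", e => e
                            .MinGram(1)
                            .MaxGram(20)))
                    .Analyzers(an => an
                        .Custom("autocomplete", ca => ca
                            .Tokenizer("edge_ngram_1_20")
                            .Filters("lowercase")))))
            .Map<Tag>(m => m
                .Properties(p => p
                    // Exact value on the main field, n-grams on a sub-field.
                    .Keyword(k => k
                        .Name(n => n.Name)
                        .Fields(f => f
                            .Text(tx => tx
                                .Name("autocomplete")
                                .Analyzer("autocomplete")
                                // Do not n-gram the user's input at search time.
                                .SearchAnalyzer("standard")))))));
    }

    public static ISearchResponse<Tag> Search(
        IElasticClient client, string indexName, string prefix)
    {
        return client.Search<Tag>(s => s
            .Index(indexName)
            .Query(q => q
                .Match(mq => mq
                    .Field(f => f.Name.Suffix("autocomplete"))
                    .Query(prefix))));
    }
}
```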
