I am following the post Creating an index Nest and trying to update my index settings. Everything runs fine; however, the html_strip filter is not stripping HTML. My code is:
var node = new Uri(_url + ":" + _port);
var settings = new ConnectionSettings(node);
settings.SetDefaultIndex(index);
_client = new ElasticClient(settings);
// To apply filters during indexing: use folding to remove diacritics and html_strip to remove HTML
_client.UpdateSettings(
    f => f.Analysis(descriptor => descriptor
        .Analyzers(
            bases => bases
                .Add("folded_word", new CustomAnalyzer
                {
                    Filter = new List<string> { "icu_folding", "trim" },
                    Tokenizer = "standard"
                })
        )
        .CharFilters(
            cf => cf.Add("html_strip", new HtmlStripCharFilter())
        )
    )
);
You are getting this error:
Can't update non dynamic settings[[index.analysis.analyzer.folded_word.filter.0, index.analysis.char_filter.html_strip.type, index.analysis.analyzer.folded_word.filter.1, index.analysis.analyzer.folded_word.type, index.analysis.analyzer.folded_word.tokenizer]] for open indices[[my_index]]
Analysis settings cannot be changed on an open index, so close the index first, then update the settings and reopen it afterwards. Have a look:
client.CloseIndex(..);
client.UpdateSettings(..);
client.OpenIndex(..);
UPDATE
Registering the html_strip char filter under CharFilters is not enough on its own; you also have to add it to your custom analyzer:
.Analysis(descriptor => descriptor
    .Analyzers(bases => bases.Add("folded_word",
        new CustomAnalyzer
        {
            Filter = new List<string> { "icu_folding", "trim" },
            Tokenizer = "standard",
            CharFilter = new List<string> { "html_strip" }
        }))
)
Now you can run a test to check whether this analyzer returns the correct tokens:
client.Analyze(a => a.Index(indexName).Text("this <a> is a test <div>").Analyzer("folded_word"));
Output:
this
is
a
test
Hope it helps.
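To see why the char filter has to be referenced by the analyzer, here is a rough, self-contained C# sketch of the analysis order: char filters transform the raw text first, and only then does the tokenizer split it. The regex-based `StripHtml` and `Tokenize` helpers are hypothetical approximations of html_strip and the standard tokenizer, not Elasticsearch's implementations, and the icu_folding/trim filters are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class HtmlStripSketch
{
    // Hypothetical stand-in for the html_strip char filter: replace anything that looks like a tag with a space.
    public static string StripHtml(string text) => Regex.Replace(text, "<[^>]*>", " ");

    // Hypothetical stand-in for the standard tokenizer: split on runs of characters that are not letters or digits.
    public static List<string> Tokenize(string text) =>
        Regex.Split(text, @"[^\p{L}\p{Nd}]+")
             .Where(t => t.Length > 0)
             .ToList();

    // Char filters run on the raw text before the tokenizer sees it.
    public static List<string> Analyze(string text, bool useHtmlStrip) =>
        Tokenize(useHtmlStrip ? StripHtml(text) : text);

    public static void Main()
    {
        const string input = "this <a> is a test <div>";
        Console.WriteLine(string.Join(",", Analyze(input, useHtmlStrip: true)));  // this,is,a,test
        Console.WriteLine(string.Join(",", Analyze(input, useHtmlStrip: false))); // this,a,is,a,test,div
    }
}
```

When the char filter is not part of the analyzer, the tag names `a` and `div` leak into the token stream as ordinary words.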
Related
I can't figure out how to test my custom analyzer or view the analyzed data.
Normally I would add my custom analyzer to the "index settings" when creating the index. The problem in this case is that I'm not using an index (or at least I think I'm not), and I don't know how to add my custom analyzer to the Elasticsearch client.
This is the method which I'm currently using for testing the "analysis" part:
public async Task AnalizeField(string analyzer, string textToAnalyze)
{
    var elasticClient = ElasticsearchHelper.DatabaseConnection();
    var analyzeResponse = await elasticClient.AnalyzeAsync(a => a
        .Analyzer(analyzer)
        .Text(textToAnalyze)
    );
    var result = "";
    if (analyzeResponse != null && analyzeResponse.Tokens.Count > 0)
    {
        foreach (var token in analyzeResponse.Tokens)
        {
            result += token.Token + " ";
        }
    }
    Console.WriteLine("Analyzing text \"" + textToAnalyze + "\" using the \"" + analyzer + "\" analyzer: " + result);
}
Found it: https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/testing-analyzers.html#_testing_a_custom_analyzer_in_an_index
Testing a custom analyzer in an index
In this example, we’ll add a custom analyzer to an existing index. First, we need to close the index:
client.CloseIndex("analysis-index");
Now, we can update the settings to add the analyzer:
client.UpdateIndexSettings("analysis-index", i => i
    .IndexSettings(s => s
        .Analysis(a => a
            .CharFilters(cf => cf
                .Mapping("my_char_filter", m => m
                    .Mappings("F# => FSharp")
                )
            )
            .TokenFilters(tf => tf
                .Synonym("my_synonym", sf => sf
                    .Synonyms("superior, great")
                )
            )
            .Analyzers(an => an
                .Custom("my_analyzer", ca => ca
                    .Tokenizer("standard")
                    .CharFilters("my_char_filter")
                    .Filters("lowercase", "stop", "my_synonym")
                )
            )
        )
    )
);
And open the index again. Here, we also wait up to five seconds for the status of the index to become green:
client.OpenIndex("analysis-index");
client.ClusterHealth(h => h
    .WaitForStatus(WaitForStatus.Green)
    .Index("analysis-index")
    .Timeout(TimeSpan.FromSeconds(5))
);
With the index open and ready, let’s test the analyzer:
var analyzeResponse = client.Analyze(a => a
    .Index("analysis-index")
    .Analyzer("my_analyzer")
    .Text("F# is THE SUPERIOR language :)")
);
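The order in which my_analyzer applies its parts (char filter, then tokenizer, then lowercase, stop and synonym filters) determines which tokens come out. The following self-contained C# sketch walks the sample text through that order. The regex tokenizer, the small stop-word set and the `MyAnalyzerSketch` class are assumptions made for illustration; the real stop filter uses its own default English list and the real synonym filter emits "great" at the same position as "superior".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MyAnalyzerSketch
{
    // Approximation of "my_char_filter": the mapping "F# => FSharp" applied to the raw text.
    static string MapChars(string text) => text.Replace("F#", "FSharp");

    // Approximation of the standard tokenizer: split on anything that is not a letter or digit.
    static IEnumerable<string> Tokenize(string text) =>
        Regex.Split(text, @"[^\p{L}\p{Nd}]+").Where(t => t.Length > 0);

    // Assumed subset of English stop words (not the exact default list).
    static readonly HashSet<string> StopWords = new HashSet<string> { "a", "an", "and", "is", "the", "of", "to" };

    // "superior, great" is an equivalence rule: either term expands to both.
    static readonly Dictionary<string, string[]> Synonyms = new Dictionary<string, string[]>
    {
        ["superior"] = new[] { "superior", "great" },
        ["great"] = new[] { "superior", "great" },
    };

    public static List<string> Analyze(string text) =>
        Tokenize(MapChars(text))                   // char filter first, then tokenizer
            .Select(t => t.ToLowerInvariant())     // "lowercase" filter
            .Where(t => !StopWords.Contains(t))    // "stop" filter
            .SelectMany(t => Synonyms.TryGetValue(t, out var s) ? s : new[] { t }) // "my_synonym" filter
            .ToList();

    public static void Main() =>
        Console.WriteLine(string.Join(" ", Analyze("F# is THE SUPERIOR language :)"))); // fsharp superior great language
}
```

Because the mapping char filter runs before tokenization, "F#" survives as "fsharp" instead of being reduced to a bare "f" by the tokenizer.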
You should try installing Cerebro:
https://github.com/lmenezes/cerebro
After installing it, you will find an Analysis entry in the menu, where you can easily run "analyze by field type" or "analyze by analyzer".
This should help.
I need to search across multiple indices using OIS (Object Initializer Syntax).
I have seen examples of executing a search across multiple indices with the Fluent DSL, but I still do not know how to execute an equivalent search with OIS.
Here is my OIS search (only searching against one index):
var searchResult =
    await _client.LowLevel.SearchAsync<string>(ApplicationsIndexName, "application", new SearchRequest()
    {
        From = (query.PageSize * query.PageNumber) - query.PageSize,
        Size = query.PageSize,
        Query = GetQuery(query),
        Aggregations = GetAggregations()
    });
Which modifications can be done, so I can search across multiple indices?
After some research, I found out how to search across multiple indices:
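var searchResult =
    await _client.LowLevel.SearchAsync<string>(new SearchRequest()
    {
        IndicesBoost = new Dictionary<IndexName, double>
        {
            { "applications", 1.4 },
            { "attachments", 1.4 }
        },
        From = (query.PageSize * query.PageNumber) - query.PageSize,
        Size = query.PageSize,
        Query = GetQuery(query),
        Aggregations = GetAggregations()
    });
Note that IndicesBoost only changes how strongly hits from the listed indices are weighted; it does not restrict the search to those indices. Because this request specifies no index, it is sent to _search and runs against every index in the cluster. To search only applications and attachments, you should be able to pass a comma-separated index list such as "applications,attachments" as the index argument, as in the single-index version above.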
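Both versions of the request page with `From = (PageSize * PageNumber) - PageSize`. Assuming `PageNumber` is 1-based (the question does not say so explicitly), this is the same as `(PageNumber - 1) * PageSize`. A minimal sketch with a hypothetical `FromOffset` helper:

```csharp
using System;

public static class Paging
{
    // Same value as (pageSize * pageNumber) - pageSize, assuming page numbers start at 1.
    public static int FromOffset(int pageNumber, int pageSize)
    {
        if (pageNumber < 1)
            throw new ArgumentOutOfRangeException(nameof(pageNumber), "Page numbers start at 1.");
        return (pageNumber - 1) * pageSize;
    }

    public static void Main()
    {
        Console.WriteLine(FromOffset(1, 10)); // 0: first 10 hits
        Console.WriteLine(FromOffset(3, 10)); // 20: hits 21 to 30
    }
}
```

The paging arithmetic does not change when the search spans several indices; From and Size apply to the merged result list.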
I just found out about NEST. I have already inserted a number of documents into Elasticsearch. Now I want to search the data by my type's subscribeId field. I ran the search through curl and it works just fine, but when I try it using NEST, no results are found.
My curl request, which works:
http://localhost:9200/20160902/_search?q=subscribeId:aca0ca1a-c96a-4534-ab0e-f844b81499b7
My NEST code:
var local = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(local);
var elastic = new ElasticClient(settings);
var response = elastic.Search<IntegrationLog>(s => s
    .Index(DateTime.Now.ToString("yyyyMMdd"))
    .Type("integrationlog")
    .Query(q => q
        .Term(p => p.SubscribeId, new Guid("aca0ca1a-c96a-4534-ab0e-f844b81499b7"))
    )
);
Can someone point out what I did wrong?
A key difference between your curl request and your NEST query is that the former uses a query_string query and the latter a term query. The input to a query_string query undergoes analysis at query time, whilst the input to a term query does not, so depending on how subscribeId is analyzed (or not), you may see different results. Additionally, your curl request searches across all document types within the index 20160902, whereas your NEST query is restricted to the integrationlog type and targets the index named after the current date (DateTime.Now.ToString("yyyyMMdd")), which is 20160902 only if the code runs on that day.
Performing exactly the same query as your curl request in NEST would look like this:
void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var connectionSettings = new ConnectionSettings(pool)
        // set up NEST with the convention to use the type name
        // "integrationlog" for the IntegrationLog
        // POCO type
        .InferMappingFor<IntegrationLog>(m => m
            .TypeName("integrationlog")
        );
    var client = new ElasticClient(connectionSettings);
    var searchResponse = client.Search<IntegrationLog>(s => s
        .Index("20160902")
        // search across all types. Note that documents found
        // will be deserialized into instances of the
        // IntegrationLog type
        .AllTypes()
        .Query(q => q
            // use query_string query
            .QueryString(qs => qs
                .Fields(f => f
                    .Field(ff => ff.SubscribeId)
                )
                .Query("aca0ca1a-c96a-4534-ab0e-f844b81499b7")
            )
        )
    );
}
public class IntegrationLog
{
    public Guid SubscribeId { get; set; }
}
This yields the following request:
POST http://localhost:9200/20160902/_search
{
  "query": {
    "query_string": {
      "query": "aca0ca1a-c96a-4534-ab0e-f844b81499b7",
      "fields": [
        "subscribeId"
      ]
    }
  }
}
This specifies the query_string query in the body of the request, which is analogous to using the q query string parameter to specify the query.
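A rough illustration of why the two queries can disagree: assuming subscribeId is mapped as an analyzed string field using the standard analyzer (an assumption; the mapping is not shown), the GUID is stored as several tokens split at the hyphens. The self-contained C# sketch below uses a regex-based stand-in for that analyzer and simplifies query_string to "any analyzed query token matches" (the default OR operator); `TermVsQueryStringSketch` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TermVsQueryStringSketch
{
    // Rough stand-in for the standard analyzer: lowercase, then split on non-letter/non-digit characters.
    public static List<string> Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
             .Where(t => t.Length > 0)
             .ToList();

    // term query: the input is NOT analyzed, so it must equal one indexed token exactly.
    public static bool TermMatches(List<string> indexedTokens, string input) =>
        indexedTokens.Contains(input);

    // query_string query (simplified): the input IS analyzed, and any resulting token found in the index matches.
    public static bool QueryStringMatches(List<string> indexedTokens, string input) =>
        Analyze(input).Any(indexedTokens.Contains);

    public static void Main()
    {
        const string guid = "aca0ca1a-c96a-4534-ab0e-f844b81499b7";
        var indexed = Analyze(guid); // tokens an analyzed subscribeId field would hold
        Console.WriteLine(string.Join(" | ", indexed)); // aca0ca1a | c96a | 4534 | ab0e | f844b81499b7
        Console.WriteLine($"term: {TermMatches(indexed, guid)}, query_string: {QueryStringMatches(indexed, guid)}"); // term: False, query_string: True
    }
}
```

If subscribeId were instead mapped as not_analyzed, the whole GUID would be a single token and a term query for it would behave differently.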
I have a case where I need a partial match on the first part of some properties (last name and first name) and a partial match on the end of some other properties, and I'm wondering how to add both analyzers.
For example, if I have the first name of "elastic", I can currently search for "elas" and find it. But, if I have an account number of abc12345678, I need to search for "5678" and find all account numbers ending in that, but I can't have a first name search for "stic" find "elastic".
Here's a simplified example of my Person class:
public class Person
{
    public string AccountNumber { get; set; }
    [ElasticProperty(IndexAnalyzer = "partial_name", SearchAnalyzer = "full_name")]
    public string LastName { get; set; }
    [ElasticProperty(IndexAnalyzer = "partial_name", SearchAnalyzer = "full_name")]
    public string FirstName { get; set; }
}
Here's the relevant existing code where I create the index, which currently works great for searching the beginning of a word:
//Set up analyzers on some fields to allow partial, case-insensitive searches.
var partialName = new CustomAnalyzer
{
    Filter = new List<string> { "lowercase", "name_ngrams", "standard", "asciifolding" },
    Tokenizer = "standard"
};
var fullName = new CustomAnalyzer
{
    Filter = new List<string> { "standard", "lowercase", "asciifolding" },
    Tokenizer = "standard"
};
var result = client.CreateIndex("persons", c => c
    .Analysis(descriptor => descriptor
        .TokenFilters(bases => bases.Add("name_ngrams", new EdgeNGramTokenFilter
        {
            MaxGram = 15, //Allow partial match up to 15 characters.
            MinGram = 2, //Allow matches no shorter than 2 characters.
            Side = "front"
        }))
        .Analyzers(bases => bases
            .Add("partial_name", partialName)
            .Add("full_name", fullName))
    )
    .AddMapping<Person>(m => m.MapFromAttributes())
);
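The name_ngrams filter above (Side = "front", MinGram = 2, MaxGram = 15) only emits prefixes of each token, which is why "elas" finds "elastic" while "stic" does not. A self-contained C# sketch of front edge n-gram generation; `EdgeNGramSketch` is a hypothetical helper, not Elasticsearch's implementation.

```csharp
using System;
using System.Collections.Generic;

public static class EdgeNGramSketch
{
    // Front edge n-grams: prefixes of length minGram..maxGram, capped at the token length.
    public static List<string> FrontEdgeNGrams(string token, int minGram, int maxGram)
    {
        var grams = new List<string>();
        for (int len = minGram; len <= Math.Min(maxGram, token.Length); len++)
            grams.Add(token.Substring(0, len));
        return grams;
    }

    public static void Main()
    {
        var grams = FrontEdgeNGrams("elastic", 2, 15);
        Console.WriteLine(string.Join(", ", grams)); // el, ela, elas, elast, elasti, elastic
        Console.WriteLine(grams.Contains("elas"));   // True
        Console.WriteLine(grams.Contains("stic"));   // False
    }
}
```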
It seems like I could add another EdgeNGramTokenFilter, and make the Side = "back", but I don't want the first and last name searches to match back side searches. Can someone provide a way to do that?
Thanks,
Adrian
Edit
For completeness, this is the new decorator on the property that goes with the code in the accepted answer:
[ElasticProperty(IndexAnalyzer = "partial_back", SearchAnalyzer = "full_name")]
public string AccountNumber { get; set; }
You need to declare another analyzer (let's call it partialBack) specifically for matching from the back, but you can definitely reuse the existing edgeNGram token filter, like this:
var partialBack = new CustomAnalyzer
{
    Filter = new List<string> { "lowercase", "reverse", "name_ngrams", "reverse" },
    Tokenizer = "keyword"
};
...
.Analyzers(bases => bases
    .Add("partial_name", partialName)
    .Add("partial_back", partialBack)
    .Add("full_name", fullName))
)
The key here is the double use of the reverse token filter.
The string (abc12345678) is
first lowercased (abc12345678),
then reversed (87654321cba),
then edge-ngramed (87, 876, 8765, 87654, 876543, ...)
and finally the tokens are reversed again (78, 678, 5678, 45678, 345678, ...).
As you can see, the result is that the string is tokenized "from the back", so that a search for 5678 would match abc12345678.
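The four steps above can be reproduced outside Elasticsearch with a short, self-contained C# sketch. It mimics the keyword tokenizer (the whole input is one token) and the lowercase, reverse, front edge n-gram (2 to 15) and reverse filters; `PartialBackSketch` is a hypothetical name and the code only approximates the real filters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartialBackSketch
{
    static string Reverse(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Front edge n-grams: prefixes of length minGram..maxGram, capped at the token length.
    static IEnumerable<string> FrontEdgeNGrams(string token, int minGram, int maxGram)
    {
        for (int len = minGram; len <= Math.Min(maxGram, token.Length); len++)
            yield return token.Substring(0, len);
    }

    // keyword tokenizer -> lowercase -> reverse -> edge n-grams -> reverse
    public static List<string> PartialBack(string text, int minGram = 2, int maxGram = 15) =>
        FrontEdgeNGrams(Reverse(text.ToLowerInvariant()), minGram, maxGram)
            .Select(Reverse)
            .ToList();

    public static void Main()
    {
        var tokens = PartialBack("abc12345678");
        // 78, 678, 5678, 45678, 345678, 2345678, 12345678, c12345678, bc12345678, abc12345678
        Console.WriteLine(string.Join(", ", tokens));
        Console.WriteLine(tokens.Contains("5678")); // True
    }
}
```

Every emitted token is a suffix of the original string, so a full_name search for "5678" hits the indexed token "5678".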
I haven't found a solid example of how to create and use synonyms with NEST for Elasticsearch. If anyone has one, it would be helpful.
My attempt looks like this, but I don't know how to apply it to a field.
var syn = new SynonymTokenFilter
{
    Synonyms = new [] { "pink, p!nk => pink", "lil, little", "ke$ha, kesha => ke$ha" },
    IgnoreCase = true,
    Tokenizer = "standard"
};
client.CreateIndex("myindex", i =>
{
    i
        .Analysis(a => a.Analyzers(an => an
                .Add("fullTermCaseInsensitive", fullTermCaseInsensitive)
            )
            .TokenFilters(x => x
                .Add("synonym", syn)
            )
        )
...
It's very simple :)
You first need to define the synonym filter; then you can use it in your custom analyzer, where you can also add other types of filters.
A small example:
.Analysis(descriptor => descriptor
    .Analyzers(bases => bases
        .Add("folded_word", new CustomAnalyzer()
            {
                Filter = new List<string> { "icu_folding", "trim", "synonym" },
                Tokenizer = "standard"
            }
        )
    )
    .TokenFilters(i => i
        .Add("synonym", new SynonymTokenFilter()
            {
                SynonymsPath = "analysis/synonym.txt",
                Format = "Solr"
            }
        )
    )
)
Then you can use the custom analyzer in the mapping part.
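To make the two rule styles in the question concrete ("a, b => c" versus "a, b"), here is a self-contained C# sketch of how Solr-format synonym rules map single tokens. The `SynonymRulesSketch.Parse` and `Apply` helpers are hypothetical: they treat an arrow rule as "replace each left-hand term with the right-hand terms" and a plain list as "every term expands to the whole list" (as with expand = true), handle only single-word rules, and skip tokenization entirely (a real standard tokenizer would split terms such as p!nk before the synonym filter sees them).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SynonymRulesSketch
{
    static List<string> SplitTerms(string list) =>
        list.Split(',').Select(t => t.Trim()).Where(t => t.Length > 0).ToList();

    // "a, b => c" maps a and b to c; "a, b" (no arrow) maps every term to the whole list.
    public static Dictionary<string, List<string>> Parse(IEnumerable<string> rules, bool ignoreCase)
    {
        var comparer = ignoreCase ? StringComparer.OrdinalIgnoreCase : StringComparer.Ordinal;
        var map = new Dictionary<string, List<string>>(comparer);
        foreach (var rule in rules)
        {
            var sides = rule.Split(new[] { "=>" }, StringSplitOptions.None);
            var lhs = SplitTerms(sides[0]);
            var rhs = sides.Length > 1 ? SplitTerms(sides[1]) : lhs;
            foreach (var term in lhs)
            {
                if (!map.TryGetValue(term, out var targets))
                    map[term] = targets = new List<string>();
                foreach (var t in rhs)
                    if (!targets.Contains(t)) targets.Add(t);
            }
        }
        return map;
    }

    // Tokens with a rule are replaced by their targets; all other tokens pass through unchanged.
    public static List<string> Apply(Dictionary<string, List<string>> map, IEnumerable<string> tokens) =>
        tokens.SelectMany(t => map.TryGetValue(t, out var targets) ? targets : new List<string> { t }).ToList();

    public static void Main()
    {
        var map = Parse(new[] { "pink, p!nk => pink", "lil, little", "ke$ha, kesha => ke$ha" }, ignoreCase: true);
        Console.WriteLine(string.Join(" ", Apply(map, new[] { "P!nk", "lil", "Kesha", "song" }))); // pink lil little ke$ha song
    }
}
```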
Assuming your fullTermCaseInsensitive analyzer is custom, you need to add your synonym filter to it:
var fullTermCaseInsensitive = new CustomAnalyzer()
{
    .
    .
    .
    Filter = new string[] { "syn" }
};
And upon creating your index, you can add a mapping and apply the fullTermCaseInsensitive analyzer to your field(s):
client.CreateIndex("myindex", c => c
    .Analysis(a => a
        .Analyzers(an => an.Add("fullTermCaseInsensitive", fullTermCaseInsensitive))
        .TokenFilters(tf => tf.Add("syn", syn)))
    .AddMapping<MyType>(m => m
        .Properties(p => p
            .String(s => s.Name(t => t.MyField).Analyzer("fullTermCaseInsensitive")))));