How to make a field Keyword type and Text type at the same time to enable aggregations and free-text search simultaneously - Elasticsearch

We have an application with an Elasticsearch backend. I am trying to enable free-text search on some fields and also aggregate on the same fields. The free-text search expects the fields to be of type text, while the aggregation expects them to be of type keyword. copy_to doesn't seem to be able to copy the keywords to a text field.
This aggregation works well with the keyword type:
var aggs = await _dataService.Search<Vehicle>(s => s
    .Size(50)
    .Query(q => q.MatchAll())
    .Aggregations(a => a
        .Terms("stockLocation_city", c => c
            .Field(f => f.StockLocation.City)
            .Size(int.MaxValue)
            .ShowTermDocCountError(false)
        )
        .Terms("stockLocation_country", c => c
            .Field(f => f.StockLocation.Country)
            .Size(int.MaxValue)
            .ShowTermDocCountError(false)
        )
    )
);
The schema looks like this:
"stockLocation": {
"type": "object",
"properties": {
"full_text": {
"type": "text",
"analyzer": "custom_analyzer"
},
"city": {
"type": "keyword",
"copy_to":"full_text"
},
"country": {
"type": "keyword",
"copy_to":"full_text"
}
}
}
The query for the full-text search which works with text fields copied to the full_text property:
var queryDescriptor = query.SimpleQueryString(p => p.Query(searchQuery.FreeText));
And the ElasticClient instantiation:
public ElasticSearchService(ElasticSearchOptions elasticSearchOptions)
{
_elasticSearchOptions = elasticSearchOptions;
var settings = new ConnectionSettings(
_elasticSearchOptions.CloudId,
new Elasticsearch.Net.BasicAuthenticationCredentials(_elasticSearchOptions.UserName, _elasticSearchOptions.Password)
)
.DefaultIndex(_elasticSearchOptions.IndexAliasName)
.DefaultMappingFor<Vehicle>(m => m.IndexName(_elasticSearchOptions.IndexAliasName));
_elasticClient = new ElasticClient(settings);
}
I have looked in the documentation but haven't seen this particular use case anywhere, so I must be doing something wrong.
How can I enable both aggregation and free text search on the same fields?
Cheers

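What you are looking for is the multi-fields feature (see "Multi-Fields" in the Elasticsearch mapping documentation).
That way the document contains the value only once, but the engine indexes it twice: once as full text (the text field you search) and once as a keyword (a sub-field such as city.keyword that you aggregate on).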
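A minimal sketch of the multi-field approach with NEST 7.x, assuming a simplified `Vehicle`/`StockLocation` model like the one in the question, a cluster at `http://localhost:9200`, and a hypothetical index name `vehicles`. Each location field is mapped as `text` (for free-text search) with a `keyword` sub-field named `keyword` (for aggregations), and the aggregation targets the sub-field via NEST's `.Suffix("keyword")`. To use your own analyzer, add `.Analyzer("custom_analyzer")` to each `.Text(...)` once that analyzer is defined in the index settings.

```csharp
using System;
using Nest;

// Simplified versions of the question's Vehicle / StockLocation model.
public class StockLocation
{
    public string City { get; set; }
    public string Country { get; set; }
}

public class Vehicle
{
    public StockLocation StockLocation { get; set; }
}

public static class MultiFieldExample
{
    public static void Main()
    {
        // Hypothetical cluster URL and index name.
        var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("vehicles"));

        // City and Country: "text" for free-text search, plus a "keyword" sub-field for aggregations.
        var createResponse = client.Indices.Create("vehicles", c => c
            .Map<Vehicle>(m => m
                .Properties(p => p
                    .Object<StockLocation>(o => o
                        .Name(v => v.StockLocation)
                        .Properties(op => op
                            .Text(t => t
                                .Name(l => l.City)
                                .Fields(f => f.Keyword(k => k.Name("keyword"))))
                            .Text(t => t
                                .Name(l => l.Country)
                                .Fields(f => f.Keyword(k => k.Name("keyword")))))))));
        if (!createResponse.IsValid)
        {
            Console.WriteLine(createResponse.DebugInformation);
            return;
        }

        var searchResponse = client.Search<Vehicle>(s => s
            // Free-text search runs against the analyzed text fields...
            .Query(q => q
                .SimpleQueryString(sqs => sqs
                    .Fields(f => f
                        .Field(v => v.StockLocation.City)
                        .Field(v => v.StockLocation.Country))
                    .Query("berlin")))
            // ...while the terms aggregations run on the keyword sub-fields.
            .Aggregations(a => a
                .Terms("stockLocation_city", t => t
                    .Field(v => v.StockLocation.City.Suffix("keyword")))
                .Terms("stockLocation_country", t => t
                    .Field(v => v.StockLocation.Country.Suffix("keyword")))));
        if (!searchResponse.IsValid)
        {
            Console.WriteLine(searchResponse.DebugInformation);
            return;
        }

        foreach (var bucket in searchResponse.Aggregations.Terms("stockLocation_city").Buckets)
            Console.WriteLine($"{bucket.Key}: {bucket.DocCount}");
    }
}
```

If the index is created with `AutoMap()` in NEST 7.x instead, `string` properties are inferred as `text` with a `keyword` sub-field by default, so the same `.Suffix("keyword")` aggregation should work without a hand-written mapping.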

Related

Running Query versus a Query Search Query via Nest client?

I am new to Elasticsearch and am writing basic search queries. I want to be able to search the full-text field for a keyword. I understand that this can be done using a query string query, but I am unclear on how to do this with the NEST client.
var searchResponse = client.Search<mdl.Event>(s => s
.Query(q => q
.Match(m => m
.Query(search.Text))
&& q
.DateRange(r => r
.Field(f => f.CreatedTimeStamp)
.GreaterThanOrEquals(search.From)
.LessThanOrEquals(search.To))));
This is the code I have. Basically, I am trying to search for some text between two dates, but I believe the query above is not searching the body of the document. Is there a way I can easily change this query so that it searches the whole body? Or is it already doing that and I'm unaware?
I am searching for events in the cluster. An example of an event might look like:
{
"text": "string",
"includeExecution": true,
"processIds": "string",
"statuses": [
"string"
],
"space": "string",
"from": "2021-09-17T01:40:03.796Z",
"to": "2021-09-17T01:40:03.796Z",
"take": 0,
"skip": 0,
"orderBy": "string",
"orderByDescending": true
}
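In my case, I want to be able to search for the word "string" and have this result come up (because "string" appears in the space field).
Try using the QueryString query like this. That will search for search.Text in all fields of your documents. (Your Match query has no .Field(...), so NEST treats it as conditionless and leaves it out of the request; only the date range is actually sent.)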
var searchResponse = client.Search<mdl.Event>(s => s
.Query(q => q
.QueryString(qs => qs
.Query(search.Text))
&& q
.DateRange(r => r
.Field(f => f.CreatedTimeStamp)
.GreaterThanOrEquals(search.From)
.LessThanOrEquals(search.To))));
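One caveat: query_string parses Lucene query syntax, so raw user input containing, for example, an unbalanced `(` can make the whole request fail. A sketch of a more forgiving variant, assuming NEST 7.x and hypothetical `Event`/`EventSearch` classes standing in for the question's `mdl.Event` and `search` objects: `simple_query_string` ignores invalid syntax instead of rejecting the request, and putting the date range in a `bool` filter keeps it out of relevance scoring.

```csharp
using System;
using Nest;

// Hypothetical stand-ins for the question's mdl.Event document and search parameters.
public class Event
{
    public string Space { get; set; }
    public DateTime CreatedTimeStamp { get; set; }
}

public class EventSearch
{
    public string Text { get; set; }
    public DateTime From { get; set; }
    public DateTime To { get; set; }
}

public static class EventQueries
{
    public static ISearchResponse<Event> SearchEvents(IElasticClient client, EventSearch search) =>
        client.Search<Event>(s => s
            .Query(q => q
                .Bool(b => b
                    // Scored clause: with no fields given, searches the index's default fields (all by default).
                    .Must(m => m
                        .SimpleQueryString(sqs => sqs
                            .Query(search.Text)))
                    // Non-scoring clause: restrict results to the requested time window.
                    .Filter(f => f
                        .DateRange(r => r
                            .Field(e => e.CreatedTimeStamp)
                            .GreaterThanOrEquals(search.From)
                            .LessThanOrEquals(search.To))))));
}
```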

Dynamic field list for MultiMatch - Nest

We have a requirement to have a search for a document type with a variable/dynamic number of fields being queried against. For one search/type it might be Name and Status. For another, the Description field. The fields to be searched against will be chosen by the user at run time.
To do this statically appears easy. Something like this searches the Name and Description fields (assume that rootQuery is a valid search descriptor ready for the query):
rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(ff => ff.Name).Field(ff => ff.Description))));
However, we don't want to have a library of static queries to handle the potential permutations if possible. We'd rather do something dynamic like:
foreach (var field in userSelectedFields) // the string list of fields chosen by the user
{
    rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(field))));
}
Is this possible? If so, how?
You can pass the string list of fields directly to .Fields(...)
var searchResponse = client.Search<Document>(s => s
.Query(q => q
.MultiMatch(mm => mm
.Query("query")
.Fields(new string[] { "field1", "field2", "field3" })
)
)
);
which yields
{
"query": {
"multi_match": {
"fields": ["field1", "field2", "field3"],
"query": "query"
}
}
}
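A sketch of wrapping this in a reusable method, assuming NEST 7.x and a hypothetical `Document` class; the field names arrive at run time as plain strings, exactly as they would from the user's selection. This relies on the same implicit `string[]` to `Fields` conversion used in the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;

// Hypothetical document type for illustration.
public class Document
{
    public string Name { get; set; }
    public string Status { get; set; }
    public string Description { get; set; }
}

public static class DynamicMultiMatch
{
    // userFields: field names chosen at run time, e.g. { "name", "description" }.
    public static ISearchResponse<Document> Search(
        IElasticClient client, string text, IEnumerable<string> userFields) =>
        client.Search<Document>(s => s
            .Query(q => q
                .MultiMatch(mm => mm
                    .Query(text)
                    // string[] converts implicitly to NEST's Fields type.
                    .Fields(userFields.ToArray()))));
}
```

Collecting the fields first matters: calling rootQuery.Query(...) inside a loop, as in the question, replaces the query on each iteration rather than adding to it, so only the last field would be searched.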

reindex while converting a string value of a specific field (present in old index) into a number field value (in the new index)

Could I ask, how could I reindex while converting a 'string' field e.g. "field2": "123.2" (in old index documents) into a float/double number e.g. "field2": 123.2 (intended to be in the new index) ? This post is the closest I could get, but I do not know which function to use for the cast/conversion of a string to a number. I am using ElasticSearch version 2.3.3. Thank you very much for any advice !!!
You could use Logstash to reindex your data and convert the field. Something like the following:
input {
elasticsearch {
hosts => "es.server.url"
index => "old_index"
query => "*"
size => 500
scroll => "5m"
docinfo => true
}
}
filter {
mutate {
convert => { "field2" => "float" }
}
}
output {
elasticsearch {
hosts => "es.server.url"
index => "new_index"
document_type => "%{[@metadata][_type]}"
document_id => "%{[@metadata][_id]}"
}
}
Use Elasticsearch templates to specify the mapping for the new index and specify the field as a double type.
The easiest way to build a template is to use the existing mapping.
GET oldindex/_mapping
POST _template/templatename
{
"template" : "newindex", // this can be a wildcard pattern to match indexes
"mappings": { // this is copied from the response of the previous call
"mytype": {
"properties": {
"field2": {
"type": "double" // change the type
}
}
}
}
}
POST newindex
GET newindex/_mapping
Then use the Elasticsearch _reindex API to move the data from the old index to the new index, parsing the field as a double with an inline script (you may need to enable inline scripting):
POST _reindex
{
"source": {
"index": "oldindex"
},
"dest": {
"index": "newindex"
},
"script": {
"inline": "ctx._source.field2 = ctx._source.field2.toDouble()"
}
}
Edit: Updated to use _reindex endpoint
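The script above uses Groovy's `toDouble()`, which matches the Elasticsearch 2.x default scripting language. From 5.x on, the default is Painless, so the same conversion needs different script source. A sketch with NEST 7.x, assuming a cluster at `http://localhost:9200`, that `newindex` already exists with `field2` mapped as `double`, and that string values of `field2` are parseable numbers (non-string values are passed through unchanged):

```csharp
using System;
using Nest;

public static class ReindexWithConversion
{
    public static void Main()
    {
        // Hypothetical cluster URL; index names follow the answer above.
        var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

        var response = client.ReindexOnServer(r => r
            .Source(s => s.Index("oldindex"))
            .Destination(d => d.Index("newindex"))
            // Painless: convert string values of field2 into doubles before indexing.
            .Script(sc => sc.Source(
                "if (ctx._source.field2 instanceof String) { ctx._source.field2 = Double.parseDouble(ctx._source.field2); }"))
            .WaitForCompletion());

        Console.WriteLine(response.IsValid
            ? $"Reindexed {response.Created} documents"
            : response.DebugInformation);
    }
}
```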

Creating an index Nest

How would I recreate the following index using Elasticsearch Nest API?
Here is the json for the index including the mapping:
{
"settings": {
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"trigrams": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trigrams_filter"
]
}
}
}
},
"mappings": {
"data": {
"_all" : {"enabled" : true},
"properties": {
"text": {
"type": "string",
"analyzer": "trigrams"
}
}
}
}
}
Here is my attempt:
var newIndex = client.CreateIndexAsync(indexName, index => index
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Settings(settings => settings
.Add("merge.policy.merge_factor", "10")
.Add("search.slowlog.threshold.fetch.warn", "1s")
.Add("mapping.allow_type_wrapper", true))
.AddMapping<Object>(mapping => mapping
.IndexAnalyzer("trigram")
.Type("string"))
);
The documentation does not mention anything about this?
UPDATE:
Found this post that uses
var index = new IndexSettings()
and then adds Analysis with the string literal json.
index.Add("analysis", @"{json}");
Where can one find more examples like this one and does this work?
Creating an index in older versions
There are two main ways that you can accomplish this as outlined in the Nest Create Index Documentation:
Here is the way where you directly declare the index settings as Fluent Dictionary entries, just like you are doing in your example above. I tested this locally and it produces index settings that match your JSON above.
var response = client.CreateIndex(indexName, s => s
    .NumberOfReplicas(replicas)
    .NumberOfShards(shards)
    .Settings(settings => settings
        .Add("merge.policy.merge_factor", "10")
        .Add("search.slowlog.threshold.fetch.warn", "1s")
        .Add("mapping.allow_type_wrapper", true)
        .Add("analysis.filter.trigrams_filter.type", "nGram")
        .Add("analysis.filter.trigrams_filter.min_gram", "3")
        .Add("analysis.filter.trigrams_filter.max_gram", "3")
        .Add("analysis.analyzer.trigrams.type", "custom")
        .Add("analysis.analyzer.trigrams.tokenizer", "standard")
        .Add("analysis.analyzer.trigrams.filter.0", "lowercase")
        .Add("analysis.analyzer.trigrams.filter.1", "trigrams_filter")
    )
    .AddMapping<Object>(mapping => mapping
        .Type("data")
        .AllField(af => af.Enabled())
        .Properties(prop => prop
            .String(sprop => sprop
                .Name("text")
                .IndexAnalyzer("trigrams")
            )
        )
    )
);
Please note that NEST also includes the ability to create index settings using strongly typed classes as well. I will post an example of that later, if I have time to work through it.
Creating index with NEST 7.x
Please also note that in NEST 7.x the CreateIndex method was removed. Use Indices.Create instead. Here's an example:
_client.Indices
.Create(indexName, s => s
.Settings(se => se
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Setting("merge.policy.merge_factor", "10")));
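For completeness, here is a sketch of the full trigram index from the question expressed with the NEST 7.x `Indices.Create` API, assuming a cluster at `http://localhost:9200`, a hypothetical index name, and a hypothetical `Data` class with a `Text` property. Two things from the original JSON no longer apply in 7.x: the `string` type is now `text`, and the `_all` field and mapping types (`data`) have been removed.

```csharp
using System;
using Nest;

// Hypothetical POCO for the question's "data" documents.
public class Data
{
    public string Text { get; set; }
}

public static class CreateTrigramIndex
{
    public static void Main()
    {
        var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));
        var indexName = "trigrams-demo"; // hypothetical index name

        var response = client.Indices.Create(indexName, c => c
            .Settings(s => s
                .NumberOfShards(1)
                .NumberOfReplicas(1)
                .Analysis(a => a
                    // ngram token filter producing exactly 3-character grams.
                    .TokenFilters(tf => tf
                        .NGram("trigrams_filter", ng => ng
                            .MinGram(3)
                            .MaxGram(3)))
                    // Custom analyzer: standard tokenizer, then lowercase and trigram filters.
                    .Analyzers(an => an
                        .Custom("trigrams", ca => ca
                            .Tokenizer("standard")
                            .Filters("lowercase", "trigrams_filter")))))
            .Map<Data>(m => m
                .Properties(p => p
                    .Text(t => t
                        .Name(d => d.Text)
                        .Analyzer("trigrams")))));

        Console.WriteLine(response.IsValid ? "Index created" : response.DebugInformation);
    }
}
```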
In case people have NEST 2.0: .NumberOfReplicas(x) and .NumberOfShards(y) are now in the Settings area, so specify them within the lambda expression under Settings.
EsClient.CreateIndex("indexname", c => c
    .Settings(s => s
        .NumberOfReplicas(replicasNr)
        .NumberOfShards(shardsNr)
    )
);
NEST 2.0 has a lot of changes and moved things around a bit so these answers are a great starting point for sure. You may need to adjust a little for the NEST 2.0 update.
Small example (older, pre-2.0 NEST syntax):
EsClient.CreateIndex("indexname", c => c
    .NumberOfReplicas(replicasNr)
    .NumberOfShards(shardsNr)
    .Settings(s => s
        .Add("merge.policy.merge_factor", "10")
        .Add("search.slowlog.threshold.fetch.warn", "15s")
    )
    #region Analysis
    .Analysis(descriptor => descriptor
        .Analyzers(bases => bases
            .Add("folded_word", new CustomAnalyzer()
            {
                Filter = new List<string> { "icu_folding", "trim" },
                Tokenizer = "standard"
            })
        )
        .TokenFilters(i => i
            .Add("engram", new EdgeNGramTokenFilter
            {
                MinGram = 1,
                MaxGram = 20
            })
        )
        .CharFilters(cf => cf
            .Add("drop_chars", new PatternReplaceCharFilter
            {
                Pattern = @"[^0-9]",
                Replacement = ""
            })
        )
    )
    #endregion
    #region Mapping Categories
    .AddMapping<Categories>(m => m
        .Properties(props => props
            .MultiField(mf => mf
                .Name(n => n.Label_en)
                .Fields(fs => fs
                    .String(s => s.Name(t => t.Label_en).Analyzer("folded_word"))
                )
            )
        )
    )
    #endregion
);
In case anyone has migrated to NEST 2.4 and has the same question - you would need to define your custom filters and analyzers in the index settings like this:
elasticClient.CreateIndex(_indexName, i => i
    .Settings(s => s
        .Analysis(a => a
            .TokenFilters(tf => tf
                .EdgeNGram("edge_ngrams", e => e
                    .MinGram(1)
                    .MaxGram(50)
                    .Side(EdgeNGramSide.Front)))
            .Analyzers(analyzer => analyzer
                .Custom("partial_text", ca => ca
                    .Filters(new string[] { "lowercase", "edge_ngrams" })
                    .Tokenizer("standard"))
                .Custom("full_text", ca => ca
                    .Filters(new string[] { "standard", "lowercase" })
                    .Tokenizer("standard"))))));
For 7.x and later, you can use the following code to create an index with shards, replicas and automapping:
if (!_elasticClient.Indices.Exists(_elasticClientIndexName).Exists)
{
var response = _elasticClient.Indices
.Create(_elasticClientIndexName, s => s
.Settings(se => se
.NumberOfReplicas(1)
.NumberOfShards(shards)
).Map<YourDTO>(
x => x.AutoMap().DateDetection(false)
));
if (!response.IsValid)
{
// Elasticsearch index status is invalid, log an exception
}
}

Filter on empty string using ElasticSearch/Nest

This may be a silly question, but how do I filter on an empty string in ElasticSearch using Nest. Specifically, how do I recreate the following result:
curl http://localhost:9200/test/event/_search
{
"filter" : { "term" : { "target" : "" }}
}
I've tried:
(f => f
.Term("target", "")
);
which, according to the "Elasticsearch and NEST filtering" discussion, does not work: it is treated as a conditionless query and returns everything. Adding .Strict() instead throws a DslException:
(f => f
.Strict().Term("target", "")
);
I've also tried .Missing() and .Exists() to no avail.
The relevant section of my _mapping for reference:
{
"event": {
"dynamic": "false",
"properties": {
"target": {
"type": "string",
"index": "not_analyzed",
"store": true,
"omit_norms": true,
"index_options": "docs"
}
}
}
}
Any pointers would be greatly appreciated.
As the documentation on NEST and writing queries mentions, you can toggle Strict() mode to trigger exceptions if part of your query turns out to be conditionless. But if a conditionless-looking query (such as a term on an empty string) is what you really wanted, you were stuck, as you've found out.
I just committed a .Verbatim() construct which works exactly like .Strict() but instead of throwing an exception it will take the query as is and render it as specified.
(f => f
.Verbatim()
.Term("target", "")
);
Should thus disable the conditionless query rewrite and insert the query literally as specified.
This will make it into the next version of NEST (i.e. the release after the current 0.12.0.0).
I will just remark that you have to call Verbatim() on every query, not just once at the top.
var searchResults = this.Client.Search<Project>(s => s
.Query(q => q
//.Verbatim() // not here: this won't work
.Bool(b => b
.Should(
bs => bs.Match(p => p.Query("hello").Field("name").Verbatim()),
bs => bs.Match(p => p.Query("world").Field("name").Verbatim())
)
)
)
);
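The same idea applied to the original empty-string filter, as a sketch for NEST 7.x: the term query sits in a `bool` filter and carries its own `.Verbatim()`. This assumes a hypothetical `Event` class whose `target` field is mapped as `keyword` (the modern equivalent of the question's `not_analyzed` string); on an analyzed `text` field an empty string produces no tokens and would not match.

```csharp
using Nest;

// Hypothetical document type with the question's "target" field (mapped as keyword).
public class Event
{
    public string Target { get; set; }
}

public static class EmptyStringFilter
{
    public static ISearchResponse<Event> FindEmptyTargets(IElasticClient client) =>
        client.Search<Event>(s => s
            .Query(q => q
                .Bool(b => b
                    .Filter(f => f
                        // Without Verbatim(), NEST treats a term on "" as conditionless
                        // and drops it, so the search would match every document.
                        .Term(t => t
                            .Field(e => e.Target)
                            .Value(string.Empty)
                            .Verbatim()))))));
}
```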
