Filter on empty string using ElasticSearch/Nest

This may be a silly question, but how do I filter on an empty string in ElasticSearch using Nest? Specifically, how do I recreate the following result:
curl http://localhost:9200/test/event/_search
{
"filter" : { "term" : { "target" : "" }}
}
I've tried:
(f => f
.Term("target", "")
);
which, according to the question "ElasticSearch and Nest filtering does not work", is treated like a conditionless query and returns everything, while adding .Strict() throws a DslException:
(f => f
.Strict().Term("target", "")
);
I've also tried .Missing() and .Exists() to no avail.
The relevant section of my _mapping for reference:
{
"event": {
"dynamic": "false",
"properties": {
"target": {
"type": "string",
"index": "not_analyzed",
"store": true,
"omit_norms": true,
"index_options": "docs"
}
}
}
}
Any pointers would be greatly appreciated.

As the NEST documentation on writing queries mentions, you can toggle Strict() mode to trigger exceptions if part of your query turns out to be conditionless. But if a conditionless query is what you really wanted, you were stuck, as you've found out.
I just committed a .Verbatim() construct which works exactly like .Strict() but instead of throwing an exception it will take the query as is and render it as specified.
(f => f
.Verbatim()
.Term("target", "")
);
This should disable the conditionless query rewrite and insert the query literally as specified.
This will make it into the next version of NEST (the one after the current version, 0.12.0.0).

Note that you have to use Verbatim() on every query, not just once at the top.
var searchResults = this.Client.Search<Project>(s => s
.Query(q => q
//.Verbatim() // no, here won't work
.Bool(b => b
.Should(
bs => bs.Match(p => p.Query("hello").Field("name").Verbatim()),
bs => bs.Match(p => p.Query("world").Field("name").Verbatim())
)
)
)
);

Related

Running Query versus a Query Search Query via Nest client?

I am new to ElasticSearch and am writing basic search queries. I want to be able to search the full text field for a keyword. I understand that this can be done using query search query, but I am unclear on how this is done using the Nest client.
var searchResponse = client.Search<mdl.Event>(s => s
.Query(q => q
.Match(m => m
.Query(search.Text))
&& q
.DateRange(r => r
.Field(f => f.CreatedTimeStamp)
.GreaterThanOrEquals(search.From)
.LessThanOrEquals(search.To))));
This is the code I have. Basically, I am trying to search for some text between two dates, but I believe the code above is not searching the whole document body. Is there a way I can easily change this query so that it searches the whole body? Or is it already doing that and I'm unaware?
I am searching for events in a cluster. An example of an event might look like:
{
"text": "string",
"includeExecution": true,
"processIds": "string",
"statuses": [
"string"
],
"space": "string",
"from": "2021-09-17T01:40:03.796Z",
"to": "2021-09-17T01:40:03.796Z",
"take": 0,
"skip": 0,
"orderBy": "string",
"orderByDescending": true
}
In my case, I want to be able to search for the word "string" and have this result come up (because "string" exists on space).
Try using the QueryString query like this. That will search for search.Text in all fields of your documents.
var searchResponse = client.Search<mdl.Event>(s => s
.Query(q => q
.QueryString(qs => qs
.Query(search.Text))
&& q
.DateRange(r => r
.Field(f => f.CreatedTimeStamp)
.GreaterThanOrEquals(search.From)
.LessThanOrEquals(search.To))));
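For reference, the NEST query above would be expected to serialize to roughly the following request body, since && combines the two queries into a bool query with two must clauses. This is a sketch under assumptions: the field name createdTimeStamp assumes NEST's default camel-casing of the CreatedTimeStamp property, and the search text and dates are placeholder values. With no field specified, query_string falls back to the index's default field setting, which on recent Elasticsearch versions covers all eligible fields.

```json
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "string" } },
        {
          "range": {
            "createdTimeStamp": {
              "gte": "2021-09-17T00:00:00Z",
              "lte": "2021-09-18T00:00:00Z"
            }
          }
        }
      ]
    }
  }
}
```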

How to make a field Keyword type and Text type at the same time to enable Aggregations and free text search simultaneously

We have an application with an ElasticSearch backend. I am trying to enable free text search for some fields and aggregate the same fields. The free text search is expecting the fields to be of the text type and the aggregator is expecting them to be of keyword type. Copy_to doesn't seem to be able to copy the keywords to a text field.
This aggregation works well with the keyword type:
var aggs = await _dataService.Search<Vehicle>(s => s
.Size(50)
.Query(q => q.MatchAll())
.Aggregations(a => a
.Terms("stockLocation_city", c => c
.Field(f => f.StockLocation.City)
.Size(int.MaxValue)
.ShowTermDocCountError(false)
)
.Terms("stockLocation_country", c => c
.Field(f => f.StockLocation.Country)
.Size(int.MaxValue)
.ShowTermDocCountError(false)
)
)
);
The schema looks like this:
"stockLocation": {
"type": "object",
"properties": {
"full_text": {
"type": "text",
"analyzer": "custom_analyzer"
},
"city": {
"type": "keyword",
"copy_to":"full_text"
},
"country": {
"type": "keyword",
"copy_to":"full_text"
}
}
}
The query for the full-text search which works with text fields copied to the full_text property:
var queryDescriptor = query.SimpleQueryString(p => p.Query(searchQuery.FreeText));
And the ElasticClient instantiation:
public ElasticSearchService(ElasticSearchOptions elasticSearchOptions)
{
_elasticSearchOptions = elasticSearchOptions;
var settings = new ConnectionSettings(
_elasticSearchOptions.CloudId,
new Elasticsearch.Net.BasicAuthenticationCredentials(_elasticSearchOptions.UserName, _elasticSearchOptions.Password)
)
.DefaultIndex(_elasticSearchOptions.IndexAliasName)
.DefaultMappingFor<Vehicle>(m => m.IndexName(_elasticSearchOptions.IndexAliasName));
_elasticClient = new ElasticClient(settings);
}
I have looked in the documentation but haven't seen this particular use case anywhere, so I must be doing something wrong.
How can I enable both aggregation and free text search on the same fields?
Cheers
What you are looking for is the "multi-fields" feature.
That way you have the same entry in the document and the engine indexes it twice: once as full text and once as keyword.
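As a rough sketch of what a multi-fields mapping could look like for the schema in the question: each field is mapped as text for free text search and gets a keyword sub-field for aggregations. The sub-field name "keyword" is an arbitrary choice, the typeless "mappings" layout assumes Elasticsearch 7.x, and custom_analyzer is assumed to be defined in the index settings already.

```json
{
  "mappings": {
    "properties": {
      "stockLocation": {
        "type": "object",
        "properties": {
          "city": {
            "type": "text",
            "analyzer": "custom_analyzer",
            "fields": {
              "keyword": { "type": "keyword" }
            }
          },
          "country": {
            "type": "text",
            "analyzer": "custom_analyzer",
            "fields": {
              "keyword": { "type": "keyword" }
            }
          }
        }
      }
    }
  }
}
```

With a mapping like this, the terms aggregations would target stockLocation.city.keyword and stockLocation.country.keyword, while the free text query runs against stockLocation.city and stockLocation.country. The reverse layout (keyword as the main type with a text sub-field) should work equally well.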

Trying to filter some Elasticsearch results where the field might not exist

I have some data and I'm trying to add an extra filter that will exclude any results where foo.IsMarried == true.
Now, there are heaps of documents that don't have this field. If the field doesn't exist, then I'm assuming that the value is foo.IsMarried = false, so those documents should be included in the result set.
Can anyone provide any clues, please?
I'm also using the .NET 'NEST' nuget client library - so I'll be really appreciative if the answer could be targeting that, but just happy with any answer, really.
Generally, within Elasticsearch, if a boolean field doesn't exist, that doesn't mean its value is false. It could be that there is simply no value for it.
But, based on the assumption you are making in this case - we can check if the field foo.isMarried is explicitly false OR it does not exist in the document itself.
The query presented by Rahul in the other answer does the job. However since you wanted a NEST version of the same, the query can be constructed using the below snippet of code.
// Notice the use of not exists here. If you do not want to check for the 'false' value,
// you can omit the first term filter here. 'T' is the type to which you are mapping your index.
// You should pass the field based on the structure of 'T'.
private static QueryContainer BuildNotExistsQuery()
{
    // Match documents where IsMarried is explicitly false OR the field is absent.
    return new QueryContainerDescriptor<T>().Bool(
        b => b.Should(
            s => s.Term(t => t.Field(f => f.foo.IsMarried).Value(false)),
            s => !s.Exists(ne => ne.Field(f => f.foo.IsMarried))
        )
    );
}
You can trigger the search through the NEST client within your project as shown below.
var result = client.Search<T>(s => s
    .From(0)
    .Size(20)
    .Query(q => BuildNotExistsQuery())
    // other methods that you want to chain go here
);
You can use a should query with the following conditions:
IsMarried = false
IsMarried must not exist
Sample documents:
POST test/person/
{"name": "p1", "IsMarried": false}
POST test/person/
{"name": "p2", "IsMarried": true}
POST test/person/
{"name": "p3"}
Raw DSL query
POST test/person/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"IsMarried": false
}
},
{
"bool": {
"must_not": {
"exists": {
"field": "IsMarried"
}
}
}
}
]
}
}
}
I hope you can convert this raw DSL query to NEST!

Creating an index Nest

How would I recreate the following index using Elasticsearch Nest API?
Here is the json for the index including the mapping:
{
"settings": {
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"trigrams": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trigrams_filter"
]
}
}
}
},
"mappings": {
"data": {
"_all" : {"enabled" : true},
"properties": {
"text": {
"type": "string",
"analyzer": "trigrams"
}
}
}
}
}
Here is my attempt:
var newIndex = client.CreateIndexAsync(indexName, index => index
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Settings(settings => settings
.Add("merge.policy.merge_factor", "10")
.Add("search.slowlog.threshold.fetch.warn", "1s")
.Add("mapping.allow_type_wrapper", true))
.AddMapping<Object>(mapping => mapping
.IndexAnalyzer("trigram")
.Type("string"))
);
The documentation does not mention anything about this?
UPDATE:
Found this post that uses
var index = new IndexSettings()
and then adds Analysis with the string literal json.
index.Add("analysis", @"{json}");
Where can one find more examples like this one and does this work?
Creating an index in older versions
There are two main ways that you can accomplish this as outlined in the Nest Create Index Documentation:
Here is the way where you directly declare the index settings as Fluent Dictionary entries. Just like you are doing in your example above. I tested this locally and it produces the index settings that match your JSON above.
var response = client.CreateIndex(indexName, s => s
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Settings(settings => settings
.Add("merge.policy.merge_factor", "10")
.Add("search.slowlog.threshold.fetch.warn", "1s")
.Add("mapping.allow_type_wrapper", true)
.Add("analysis.filter.trigrams_filter.type", "nGram")
.Add("analysis.filter.trigrams_filter.min_gram", "3")
.Add("analysis.filter.trigrams_filter.max_gram", "3")
.Add("analysis.analyzer.trigrams.type", "custom")
.Add("analysis.analyzer.trigrams.tokenizer", "standard")
.Add("analysis.analyzer.trigrams.filter.0", "lowercase")
.Add("analysis.analyzer.trigrams.filter.1", "trigrams_filter")
)
.AddMapping<Object>(mapping => mapping
.Type("data")
.AllField(af => af.Enabled())
.Properties(prop => prop
.String(sprop => sprop
.Name("text")
.IndexAnalyzer("trigrams")
)
)
)
);
Please note that NEST also includes the ability to create index settings using strongly typed classes as well. I will post an example of that later, if I have time to work through it.
Creating index with NEST 7.x
Please also note that in NEST 7.x the CreateIndex method was removed. Use Indices.Create instead. Here's an example.
_client.Indices
.Create(indexName, s => s
.Settings(se => se
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Setting("merge.policy.merge_factor", "10")));
In case people have NEST 2.0, .NumberOfReplicas(x) and .NumberOfShards(y) are in the Settings area now, so specify them within the lambda expression under Settings.
EsClient.CreateIndex("indexname", c => c
    .Settings(s => s
        .NumberOfReplicas(replicasNr)
        .NumberOfShards(shardsNr)
    )
);
NEST 2.0 has a lot of changes and moved things around a bit so these answers are a great starting point for sure. You may need to adjust a little for the NEST 2.0 update.
Small example:
EsClient.CreateIndex("indexname", c => c
    .NumberOfReplicas(replicasNr)
    .NumberOfShards(shardsNr)
    .Settings(s => s
        .Add("merge.policy.merge_factor", "10")
        .Add("search.slowlog.threshold.fetch.warn", "15s")
    )
    #region Analysis
    .Analysis(descriptor => descriptor
        .Analyzers(bases => bases
            .Add("folded_word", new CustomAnalyzer()
            {
                Filter = new List<string> { "icu_folding", "trim" },
                Tokenizer = "standard"
            })
        )
        .TokenFilters(i => i
            .Add("engram", new EdgeNGramTokenFilter
            {
                MinGram = 1,
                MaxGram = 20
            })
        )
        .CharFilters(cf => cf
            .Add("drop_chars", new PatternReplaceCharFilter
            {
                Pattern = @"[^0-9]",
                Replacement = ""
            })
        )
    )
    #endregion
    #region Mapping Categories
    .AddMapping<Categories>(m => m
        .Properties(props => props
            .MultiField(mf => mf
                .Name(n => n.Label_en)
                .Fields(fs => fs
                    .String(s => s.Name(t => t.Label_en).Analyzer("folded_word"))
                )
            )
        )
    )
    #endregion
);
In case anyone has migrated to NEST 2.4 and has the same question - you would need to define your custom filters and analyzers in the index settings like this:
elasticClient.CreateIndex(_indexName, i => i
.Settings(s => s
.Analysis(a => a
.TokenFilters(tf => tf
.EdgeNGram("edge_ngrams", e => e
.MinGram(1)
.MaxGram(50)
.Side(EdgeNGramSide.Front)))
.Analyzers(analyzer => analyzer
.Custom("partial_text", ca => ca
.Filters(new string[] { "lowercase", "edge_ngrams" })
.Tokenizer("standard"))
.Custom("full_text", ca => ca
.Filters(new string[] { "standard", "lowercase" } )
.Tokenizer("standard"))))));
For 7.X plus you can use the following code to create an index with Shards, Replicas and with Automapping:
if (!_elasticClient.Indices.Exists(_elasticClientIndexName).Exists)
{
var response = _elasticClient.Indices
.Create(_elasticClientIndexName, s => s
.Settings(se => se
.NumberOfReplicas(1)
.NumberOfShards(shards)
).Map<YourDTO>(
x => x.AutoMap().DateDetection(false)
));
if (!response.IsValid)
{
// Elasticsearch index status is invalid, log an exception
}
}

How do && and || work constructing queries in NEST?

According to http://nest.azurewebsites.net/concepts/writing-queries.html, the && and || operators can be used to combine two queries using the NEST library to communicate with Elastic Search.
I have the following query set up:
var ssnQuery = Query<NameOnRecordDTO>.Match(
q => q.OnField(f => f.SocialSecurityNumber).QueryString(nameOnRecord.SocialSecurityNumber).Fuzziness(0)
);
which is then combined with a Bool query as shown below:
var result = client.Search<NameOnRecordDTO>(
body => body.Query(
query => query.Bool(
bq => bq.Should(
q => q.Match(
p => p.OnField(f => f.Name.First)
.QueryString(nameOnRecord.Name.First).Fuzziness(fuzziness)
),
q => q.Match(p => p.OnField(f => f.Name.Last)
.QueryString(nameOnRecord.Name.Last).Fuzziness(fuzziness)
)
).MinimumNumberShouldMatch(2)
) || ssnQuery
)
);
What I think this query means is that if the SocialSecurityNumber matches, or both the Name.First and Name.Last fields match, then the record should be included in the results.
When I execute this query with the following data for the nameOnRecord object used in the calls to QueryString:
{
"socialSecurityNumber":"123456789",
"name" : {
"first":"ryan"
}
}
the results are the person with SSN 123456789, along with anyone with first name ryan.
If I remove the || ssnQuery from the query above, I get everyone whose first name is 'ryan'.
With the || ssnQuery in place and the following query:
{
"socialSecurityNumber":"123456789",
"name" : {
"first":"ryan",
"last": "smith"
}
}
I appear to get the person with SSN 123456789 along with people whose first name is 'ryan' or last name is 'smith'.
So it does not appear that adding || ssnQuery is having the effect that I expected, and I don't know why.
Here is the definition of the index on object in question:
"nameonrecord" : {
"properties": {
"name": {
"properties": {
"name.first": {
"type": "string"
},
"name.last": {
"type": "string"
}
}
},
"address" : {
"properties": {
"address.address1": {
"type": "string",
"index_analyzer": "address",
"search_analyzer": "address"
},
"address.address2": {
"type": "string",
"analyzer": "address"
},
"address.city" : {
"type": "string",
"analyzer": "standard"
},
"address.state" : {
"type": "string",
"analyzer": "standard"
},
"address.zip" : {
"type" : "string",
"analyzer": "standard"
}
}
},
"otherName": {
"type": "string"
},
"socialSecurityNumber" : {
"type": "string"
},
"contactInfo" : {
"properties": {
"contactInfo.phone": {
"type": "string"
},
"contactInfo.email": {
"type": "string"
}
}
}
}
}
I don't think the definition of the address analyzer is important, since the address fields are not being used in the query, but can include it if someone wants to see it.
This was in fact a bug in NEST.
Some background on how NEST translates boolean queries:
NEST allows you to use operator overloading to create verbose bool queries/filters easily, i.e.:
term && term will result in:
bool
  must
    term
    term
A naive implementation of this would rewrite
term && term && term to
bool
  must
    term
    bool
      must
        term
        term
As you can imagine, this becomes unwieldy quite fast as a query grows more complex. NEST can spot these and join them together to become
bool
  must
    term
    term
    term
Likewise, term && term && term && !term simply becomes:
bool
  must
    term
    term
    term
  must_not
    term
Now, if in the previous example you pass in a bool query directly, like so
bool(must=term, term, term) && !term
it would still generate the same query. NEST will also do the same with shoulds when it sees that the boolean descriptors in play ONLY consist of should clauses. This is because the bool query does not quite follow the same boolean logic you expect from a programming language.
To summarize the latter:
term || term || term
becomes
bool
  should
    term
    term
    term
but
term1 && (term2 || term3 || term4) will NOT become
bool
  must
    term1
  should
    term2
    term3
    term4
This is because as soon as a bool query has a must clause, the should clauses start acting as a boosting factor. So with the previous query you could get back results that ONLY contain term1, which is clearly not what you want in the strict boolean sense of the input.
NEST therefore rewrites this query to
bool
  must
    term1
    bool
      should
        term2
        term3
        term4
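Written out as a request body, the rewritten form of term1 && (term2 || term3 || term4) would look roughly like this. The field names and values are hypothetical placeholders, not taken from the question. Because the inner bool has only should clauses, at least one of them has to match, and since the inner bool sits inside must, that requirement is enforced rather than merely boosting the score.

```json
{
  "query": {
    "bool": {
      "must": [
        { "term": { "field1": "value1" } },
        {
          "bool": {
            "should": [
              { "term": { "field2": "value2" } },
              { "term": { "field3": "value3" } },
              { "term": { "field4": "value4" } }
            ]
          }
        }
      ]
    }
  }
}
```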
Now, where the bug came into play is that in your situation you have this:
bool(should=term1, term2, minimum_should_match=2) || term3
NEST identified that both sides of the OR operation only contain should clauses and joined them together, which gives a different meaning to the minimum_should_match parameter of the first bool query.
I just pushed a fix for this, and it will be included in the next release, 0.11.8.0.
Thanks for catching this one!
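To make the difference concrete, here is a sketch of the query shape the question was aiming for: the two name clauses stay wrapped in their own bool so that minimum_should_match applies only to them, and the SSN clause sits next to that bool in an outer should. The field names come from the mapping in the question, the values are the sample data, and fuzziness is left out for brevity. This is an illustration of the intended semantics, not verified output from the fixed NEST version.

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "should": [
              { "match": { "name.first": { "query": "ryan" } } },
              { "match": { "name.last": { "query": "smith" } } }
            ],
            "minimum_should_match": 2
          }
        },
        { "match": { "socialSecurityNumber": { "query": "123456789" } } }
      ]
    }
  }
}
```

The buggy merge instead flattened all three match clauses into a single should list, so minimum_should_match no longer described "both name fields" and the results no longer matched the intended "SSN, or first and last name" logic.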

Resources