Multi-term filter in Elasticsearch (NEST)

I am trying to query documents where a given field can have any of several values. For example, my documents have an "extension" property holding the file's extension, such as .docx, .xls, or .pdf. I want to filter the "extension" property on any number of values, but cannot find the correct syntax. Here is my current query:
desc.Type("entity")
.Routing(serviceId)
.From(pageSize * pageOffset)
.Size(pageSize)
.Query(q => q
.Filtered(f => f
.Query(qq =>
qq.MultiMatch(m => m
.Query(query)
.OnFields(_searchFields)) ||
qq.Prefix(p1 => p1
.OnField("entityName")
.Value(query)) ||
qq.Prefix(p2 => p2
.OnField("friendlyUrl")
.Value(query))
)
.Filter(ff =>
ff.Term("serviceId", serviceId) &&
ff.Term("subscriptionId", subscriptionId) &&
ff.Term("subscriptionType", subscriptionType) &&
ff.Term("entityType", entityType)
)
)
);
P.S. It may be easier to think of it in the inverse, where I send up the file extensions I DON'T want and set up the query to get documents that DON'T have any of the extension values given.

After some discussion, the following raw JSON query should work, and it translates to NEST quite easily:
POST /test/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "VALUE"
}
},
{
"term": {
"subscriptionId": "VALUE"
}
},
{
"term": {
"subscriptionType": "VALUE"
}
},
{
"term": {
"entityType": "VALUE"
}
}
],
"must_not": [
{
"terms": {
"extension": [
"docx",
"doc"
]
}
}
]
}
}
}
}
}
What had to be done:
A bool filter suits this best, because it can combine clauses that must match with clauses that must be excluded.
The must clause holds all the term filters already present in the OP's query.
The must_not clause holds a terms filter listing all the extensions to exclude.
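To make the must / must_not split concrete, here is a rough, language-neutral sketch that assembles the same request body from plain Python dicts. The helper name build_exclusion_query and its parameters are made up for illustration and are not part of NEST or any Elasticsearch client.

```python
import json


def build_exclusion_query(required_terms, excluded_field, excluded_values):
    """Build a filtered query: every required term must match, and the
    excluded field must not hold any of the excluded values."""
    # One term filter per required field/value pair (hypothetical inputs).
    must = [{"term": {field: value}} for field, value in required_terms.items()]
    # A single terms filter lists every value to exclude.
    must_not = [{"terms": {excluded_field: list(excluded_values)}}]
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"bool": {"must": must, "must_not": must_not}},
            }
        }
    }


body = build_exclusion_query(
    {"serviceId": "VALUE", "subscriptionId": "VALUE"},
    "extension",
    ["docx", "doc"],
)
print(json.dumps(body, indent=2))
```

Adding another extension to exclude only means appending to the list; the must clauses stay untouched.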

If you want to return items that match ".doc" OR ".xls" then you want a TERMS query. Here is a sample:
var searchResult = ElasticClient
.Search<SomeESType>(s => s
.Query(q => q
.Filtered(fq => fq
.Filter(f => f
.Terms(t => t.Field123, new List<string> {".doc", ".xls"})
)
)
)
)

Related

Converting JSON to Elastic NEST query doesn't work as intended

I'm trying to convert the following JSON to NEST, but it's not working as intended. It does match the field with the website, but it doesn't match the range, so I get some very old results.
When using Kibana to search, I send this request:
"query": {
"bool": {
"must": [],
"filter": [
{
"bool": {
"should": [
{
"match": {
"domain": "website.com"
}
}
],
"minimum_should_match": 1
}
},
{
"range": {
"#timestamp": {
"gte": "2020-08-03T12:37:07.821Z",
"lte": "2020-08-18T12:37:07.821Z",
"format": "strict_date_optional_time"
}
}
}
],
"should": [],
"must_not": []
}
},
And converted to NEST:
SearchDescriptor<ApacheRequest> Query(SearchDescriptor<ApacheRequest> qc)
{
var query = qc.Query(q =>
q.Bool(b =>
b.Filter(f =>
f.Bool(fb =>
fb.Should(sh =>
sh.Match(ma => ma
.Field(x => x.Domain)
.Query("website.com")
)
)
),
f => f.Range(r => r.GreaterThanOrEquals(timestamp))
)
)
);
return query;
}
As I said, it matches the domain, but not the range. I get results a month back, even though I've tested that my timestamp is correct.
What am I doing wrong?
Ah, I found the issue. I'm not supposed to use .Range(), which builds a numeric range query, but rather .DateRange(), and the range also has to name the field it applies to. Now my query looks like this:
SearchDescriptor<ApacheRequest> Query(SearchDescriptor<ApacheRequest> qc)
{
var query = qc.Query(q =>
q.Bool(b =>
b.Filter(f =>
f.Bool(fb =>
fb.Must(sh =>
sh.Match(ma => ma
.Field(x => x.Domain)
.Query("website.com")
)
)
),
f => f.DateRange(r =>
r.Field(fi => fi.Timestamp).GreaterThanOrEquals(from)
)
)
)
);
return query;
}
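For comparison, the range clause from the Kibana request has three parts: the field name, a lower bound and an upper bound. The sketch below builds that clause from Python datetimes. It is only an illustration; date_range_filter is a hypothetical helper and "timestamp" is a placeholder field name, so substitute whatever your mapping uses.

```python
from datetime import datetime, timedelta, timezone


def date_range_filter(field, start, end):
    """Return a range clause bounding `field` between two datetimes."""
    return {
        "range": {
            field: {
                # ISO 8601 strings with an explicit UTC offset.
                "gte": start.isoformat(),
                "lte": end.isoformat(),
            }
        }
    }


end = datetime(2020, 8, 18, 12, 37, 7, tzinfo=timezone.utc)
start = end - timedelta(days=15)
clause = date_range_filter("timestamp", start, end)  # placeholder field name
print(clause)
```

Note that the NEST query above only sets the lower bound, while the Kibana request sets both; add an upper bound as well if you need the same window.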

Empty String elastic search

I'm using Elasticsearch 6.5.
I need to include an empty-string match in one of the criteria I'm passing:
primaryKey = 1, 2, 3
subKey = "" or subKey = "A", along with a bunch of other criteria.
I've been unable to get the record that has the empty subKey.
I've tried using must_not exists, but it doesn't fetch the record in question.
The query below should return any records that have a primaryKey of 1, 2, or 3 and a subKey of "A" or the empty string, filtered by the date provided. I get all the records except the one where the subKey is blank.
This is what I tried:
{
"size": 200, "from": 0,
"query": {
"bool": {
"must": [{
"bool": {
"should": [{ "terms": {"primaryKey": [1,2,3] }}]
}
},
{
"bool": {
"should": [
{"match": {"subKey": "A"}},
{
"bool" : {
"must_not": [{ "exists": { "field": "subKey"} }]
}
}
]
}
}],
"filter": [{"range": {"startdate": {"lte": "2018-11-01"}}}]
}
}
}
The subKey field is special in that it is actually searched by letter (a 1-gram tokenizer). I don't think that affects anything, but here is the NEST code I have for that index:
new CreateIndexDescriptor("SpecialIndex").Settings(s => s
.Analysis(a => a
.Analyzers(aa => aa
.Custom("subKey_analyzer", ma => ma
.Tokenizer("subKey_tokenizer")
.Filters("lowercase")
)
)
.Tokenizers(ta => ta
.NGram("subKey_tokenizer", t => t
.MinGram(1)
.MaxGram(1)
.TokenChars(new TokenChar[] { TokenChar.Letter, TokenChar.Whitespace })
)
)
)
)
.Mappings(ms => ms
.Map<SpecialIndex>(m => m
.Properties(p => p
.Text(s => s
.Name(x => x.subKey)
.Analyzer("subKey_analyzer")
)
)
));
Any ideas on how to resolve this? Thank you very much!
NOTE: I've seen posts saying this can be done with a filter, using missing. But as you can see from the query, I need the query to do this, not the filter.
I've also tried the following instead of must_not exists:
{
"term": { "subKey": { "value": "" }}
}
but that doesn't work either. I'm thinking I need another tokenizer to get this working.
OK, I managed to fix this by using multi-fields. This is what I did.
I changed the mappings to this:
.Mappings(ms => ms
.Map<SpecialIndex>(m => m
.Properties(p => p
.Text(s => s
.Name(x => x.subKey)
.Fields(ff => ff
.Text(tt => tt
.Name("subKey")
.Analyzer("subKey_analyzer")
)
.Keyword(k => k
.Name("keyword")
.IgnoreAbove(5)
)
)
)
)
));
Then I changed the bool piece of my query to this:
"bool": {
"should": [{
"match": {
"subKey.subKey": {
"query": "A"
}
}
},
{
"term": {
"subKey.keyword": {
"value": ""
}
}
}]
}
What I don't really like about this is that I think Elasticsearch is creating an additional field just to find empty strings in the same field. That doesn't seem ideal.
If anyone has another suggestion, that would be great!
[UPDATE] The NEST implementation needs to use Suffix to access the multi-fields:
.Bool(bb => bb
.Should(bbs => bbs
.Match(m => m.Field(f => f.subKey.Suffix("subKey")).Query(search.subKey)),
bbs => bbs
// Verbatim() keeps NEST from dropping the term query as conditionless because its value is empty
.Term(t => t.Verbatim().Field(f => f.subKey.Suffix("keyword")).Value(string.Empty))))
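A toy model of the two sub-fields may help show why the keyword sub-field is needed. This is only a sketch: one_gram_tokens and keyword_token are hypothetical stand-ins, not Elasticsearch's real tokenizer or keyword implementation. They just mimic the settings in the mapping above (1-grams of letters and whitespace, lowercased, and ignore_above of 5).

```python
def one_gram_tokens(text):
    """Toy stand-in for the 1-gram tokenizer plus lowercase filter:
    keep letters and whitespace, one character per token."""
    return [ch.lower() for ch in text if ch.isalpha() or ch.isspace()]


def keyword_token(text, ignore_above=5):
    """Toy stand-in for a keyword sub-field: the whole value is a single
    token, unless it is longer than ignore_above."""
    return [text] if len(text) <= ignore_above else []


# Compare what each sub-field would index for a few sample values.
for value in ["A", "Ab 1", ""]:
    print(repr(value), one_gram_tokens(value), keyword_token(value))
```

In this model the analyzed sub-field produces no tokens at all for an empty string, so nothing is indexed for a match or term query to find, while the keyword sub-field still stores the empty string as one exact token.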

What replaces TermsExecution.And in NEST 2.3.3 (upgrading from NEST 1.6.2)

We are in the process of upgrading ElasticSearch and NEST from 1.6.2 -> 2.3.3.
What replaces how we do TermsExecution.And in 2.3.3?
How can this be done easily with an unknown number of terms that all need to match? Previously you could just pass in an array.
TermsExecution.And on a terms query should be converted to a bool query with a conjunction of must (or filter, depending on query/filter context) queries, with each query being a term query on an individual value.
For example,
client.Search<dynamic>(s => s
.Query(q => +q
.Term("field", "value1")
&& +q
.Term("field", "value2")
)
);
yields
{
"query": {
"bool": {
"filter": [
{
"term": {
"field": {
"value": "value1"
}
}
},
{
"term": {
"field": {
"value": "value2"
}
}
}
]
}
}
}
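For an unknown number of values, the same body can be generated in a loop. Here is a minimal sketch using plain Python dicts to stand in for the JSON; all_terms_filter is a hypothetical helper name, not a NEST or Elasticsearch API.

```python
def all_terms_filter(field, values):
    """Emit one term clause per value inside bool.filter, so a document
    has to contain every value (the old TermsExecution.And behaviour)."""
    clauses = [{"term": {field: {"value": value}}} for value in values]
    return {"query": {"bool": {"filter": clauses}}}


body = all_terms_filter("field", ["value1", "value2"])
print(body)
```

In NEST the equivalent would be to build one term query per value and combine them, which is what the && operator in the example above does for two values.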

How to use bool query with must clause and filter clause in NEST 2.3.0

The docs say that the filtered query is
Deprecated in 2.0.0-beta1. Use the bool query instead with a must
clause for the query and a filter clause for the filter.
source
Is this a proper use of filter clause?
var result = client.Search<Post>(x => x
.Query(q => q
.Bool(b => b
.Must(m => m
.MultiMatch(mp => mp
.Query(query)
.Fields(f => f
.Fields(f1 => f1.Title, f2 => f2.Body, f3 => f3.Tags))))
.Filter(f => f
.Bool(b1 => b1
.Must(filters)))))); // or filter?
Here, query is a string and filters is a Func<QueryContainerDescriptor<Post>, QueryContainer>[].
The raw JSON request is:
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "javascript",
"fields": [ "title", "body", "tags" ]
}
}
],
"filter": [
{
"bool": {
"must": [
{ "term": { "tags": { "value": "javascript" } } },
{ "term": { "tags": { "value": "ajax" } } },
{ "term": { "tags": { "value": "jquery" } } }
]
}
}
]
}
}
Where you would have used the query part of a filtered query, you now use a bool query must clause; likewise, where you would have used the filter part of a filtered query, you now use a bool query filter clause.
In your case, you have multiple filters that must all be satisfied, so wrapping them as a set of must clauses in a bool query passed to the outer bool filter clause is correct.
In Elasticsearch 2.0, queries and filters were merged into one, with the notion of a query context and a filter context. When wrapped in a bool query filter clause, a query is in a filter context, so relevance scores are not calculated and the result is cacheable.
NEST 2.x aligns with the change in Elasticsearch 2.0 and has queries (QueryContainer, QueryContainerDescriptor<T>, etc.) that can be used in both query and filter contexts.
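The mapping from the old shape to the new one is mechanical, which the following sketch tries to show on the raw JSON (as Python dicts). filtered_to_bool is a hypothetical helper written for this answer, not something Elasticsearch or NEST provides.

```python
def filtered_to_bool(filtered):
    """Rewrite the body of a deprecated `filtered` query as a `bool` query."""
    bool_body = {}
    if "query" in filtered:
        # The scoring part moves to must (query context).
        bool_body["must"] = [filtered["query"]]
    if "filter" in filtered:
        # The non-scoring part moves to filter (filter context).
        bool_body["filter"] = [filtered["filter"]]
    return {"bool": bool_body}


old = {
    "query": {"multi_match": {"query": "javascript", "fields": ["title", "body", "tags"]}},
    "filter": {"bool": {"must": [{"term": {"tags": {"value": "ajax"}}}]}},
}
new = filtered_to_bool(old)
print(new)
```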

A simple AND query with Elasticsearch

I am trying to do a simple query on two specified fields, and the manual and Google are proving to be of little help. The example below should make it pretty clear what I want to do.
{
"query": {
"and": {
"term": {
"name.family_name": "daniel",
"name.given_name": "tyrone"
}
}
}
}
As a bonus question, why does it find "Daniel Tyrone" with "daniel", but NOT if I search for "Daniel"? It behaves like a really weird anti-case-sensitive search.
Edit: Updated, sorry. You need a separate Term object for each field, inside of a Bool query:
{
"query": {
"bool": {
"must" : [
{
"term": {
"name.family_name": "daniel"
}
},
{
"term": {
"name.given_name": "tyrone"
}
}
]
}
}
}
Term queries are not analyzed by Elasticsearch, which makes them case sensitive. A term query says to ES "look for this exact token inside your index, including case and punctuation". Presumably the field was analyzed and lowercased at index time, so the indexed token is "daniel" and the term "Daniel" finds nothing.
If you want case insensitivity, you could define an analyzer with the keyword tokenizer and a lowercase filter. Alternatively, you could use a query that analyzes your text at query time (like a match query).
Edit2: You could also use And or Bool filters.
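The difference between a term lookup and a match lookup can be illustrated with a toy inverted index. This is a deliberately simplified model, assuming an analyzer that only splits on whitespace and lowercases; the function names are invented for the example and none of this is Elasticsearch's actual implementation.

```python
def analyze(text):
    """Toy analyzer: split on whitespace and lowercase each token."""
    return [token.lower() for token in text.split()]


def build_index(docs):
    """Map each indexed token to the ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """Look the value up exactly as given, with no analysis."""
    return index.get(value, set())


def match_query(index, text):
    """Analyze the query text first, then look up each token (OR semantics)."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


index = build_index({1: "Daniel Tyrone"})
print(term_query(index, "daniel"))   # finds document 1
print(term_query(index, "Daniel"))   # finds nothing: no such token was indexed
print(match_query(index, "Daniel"))  # finds document 1: query text is lowercased first
```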
I found a solution for at least multiple text comparisons on the same field:
{
"query": {
"match": {
"name.given_name": {
"query": "daniel tyrone",
"operator": "and"
}
}
}
}
And I found this for multiple fields, is this the correct way?
{
"query": {
"bool": {
"must": [
{
"match": {
"name.formatted": {
"query": "daniel tyrone",
"operator": "and"
}
}
},
{
"match": {
"display_name": "tyrone"
}
}
]
}
}
}
If you are composing the JSON with PHP, these two examples worked for me.
$activeFilters is just a comma-separated string like: 'attractions, limpopo'
$articles = Article::searchByQuery(array(
'match' => array(
'cf_categories' => array(
'query' => $activeFilters,
'operator' =>'and'
)
)
));
// The below code is also working 100%
// Using Query String https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-filter.html
// https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
/* $articles = Article::searchByQuery(array(
'query_string' => array(
'query' => 'cf_categories:attractions AND cf_categories:limpopo'
)
)); */
This worked for me. minimum_should_match is set to 2 because the AND query has two clauses, so both of them must match.
{
"query": {
"bool": {
"should": [
{"term": { "name.family_name": "daniel"}},
{"term": { "name.given_name": "tyrone" }}
],
"minimum_should_match" : 2
}
}
}
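If the number of clauses varies, it may be safer to derive minimum_should_match from the clause list instead of hard-coding it. A small sketch with plain dicts follows; all_of_should is a hypothetical helper name.

```python
def all_of_should(clauses):
    """Wrap clauses in bool.should and require every one of them to match."""
    clauses = list(clauses)
    return {
        "query": {
            "bool": {
                "should": clauses,
                # Tie the threshold to the clause count so the two cannot drift apart.
                "minimum_should_match": len(clauses),
            }
        }
    }


body = all_of_should([
    {"term": {"name.family_name": "daniel"}},
    {"term": {"name.given_name": "tyrone"}},
])
print(body)
```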

Resources