Elasticsearch sorting by nested field in nested array

I'm using Elasticsearch 7.2.1 and NEST 7.2.1.
My data structure is as follows:
{
  "id": "some_id",
  "roles": [
    {
      "name": "role_one_name",
      "members": [
        {
          "id": "member_one_id",
          "name": "member_one_name"
        }
      ]
    },
    {
      "name": "role_two_name",
      "members": [
        {
          "id": "member_two_id",
          "name": "member_two_name"
        }
      ]
    }
  ]
}
I need to implement sorting by a given role name (e.g. role_one_name).
The sort should be performed on members.name (e.g. members[0].name). In my case the members array will always contain one element, but for some roles (omitted in the example) it contains more than one element, so I can't get rid of the nested array.
The algorithm I have in mind:
1. Get the needed role by name.
2. Specify the path to the first element in the members array.
3. Point to the name property to sort on.
I'm new to Elasticsearch, and after a few days of trying I ended up with the following query (which does not work).
var sortFilters = new List<Func<FieldSortDescriptor<T>, FieldSortDescriptor<T>>>();
var sortFieldValue = "role_two_name";
...
sortFilters.Add(o => o.Nested(n => n
.Path(p => p.Roles)
.Filter(f => f
.Term(t => t
.Field(c => c.Roles.First().Name)
.Value(sortFieldValue)) && f
.Nested(n => n
.Path(p => p.Roles.First().Members)
.Query(q => q
.Term(t => t
.Field(f => f.Roles.First().Members.First().Name)))))));
What am I doing wrong?

With the help of my colleagues I managed to solve it.
GET index_name/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "roles.members.name.keyword": {
        "order": "asc",
        "nested": {
          "path": "roles",
          "filter": {
            "term": {
              "roles.name.keyword": {
                "value": "sortFieldValue"
              }
            }
          },
          "nested": {
            "path": "roles.members"
          }
        }
      }
    }
  ]
}
or using NEST:
sortFilters.Add(o => o.Field(f => f.Roles.First().Members.First().Name.Suffix("keyword")));
sortFilters.Add(o => o.Nested(n => n
.Path(p => p.Roles)
.Filter(f => f
.Term(t => t
.Field(q => q.Roles.First().Name.Suffix("keyword"))
.Value(sortFieldValue)
)
)
.Nested(n => n
.Path(p => p.Roles.First().Members)
)
));
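For readers who are not using NEST, the same sort clause can be sketched as a plain request body built in Python. This is only an illustration: the function name build_nested_sort is hypothetical, the field names are copied from the query above, and it assumes that roles and roles.members are both mapped as nested and that name has a keyword sub-field.

```python
import json


def build_nested_sort(role_name, order="asc"):
    """Build a search body sorted by the member name of one role.

    Hypothetical helper: field names follow the query shown above.
    """
    return {
        "query": {"match_all": {}},
        "sort": [
            {
                "roles.members.name.keyword": {
                    "order": order,
                    "nested": {
                        # Outer level: only consider the role with this name.
                        "path": "roles",
                        "filter": {
                            "term": {"roles.name.keyword": {"value": role_name}}
                        },
                        # Inner level: step into that role's members.
                        "nested": {"path": "roles.members"},
                    },
                }
            }
        ],
    }


body = build_nested_sort("role_two_name")
print(json.dumps(body, indent=2))
```

The role filter sits on the outer nested level and the inner nested level only names the path, which mirrors the structure of the working query.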

Related

Converting JSON to Elastic NEST query doesn't work as intended

I'm trying to convert the following JSON to NEST, but it's not working as intended. It does match the field with the website, but it doesn't match the range, so I get some very old results.
When using Kibana to search, I send this request:
"query": {
"bool": {
"must": [],
"filter": [
{
"bool": {
"should": [
{
"match": {
"domain": "website.com"
}
}
],
"minimum_should_match": 1
}
},
{
"range": {
"@timestamp": {
"gte": "2020-08-03T12:37:07.821Z",
"lte": "2020-08-18T12:37:07.821Z",
"format": "strict_date_optional_time"
}
}
}
],
"should": [],
"must_not": []
}
},
And converted to NEST:
SearchDescriptor<ApacheRequest> Query(SearchDescriptor<ApacheRequest> qc)
{
var query = qc.Query(q =>
q.Bool(b =>
b.Filter(f =>
f.Bool(fb =>
fb.Should(sh =>
sh.Match(ma => ma
.Field(x => x.Domain)
.Query("website.com")
)
)
),
f => f.Range(r => r.GreaterThanOrEquals(timestamp))
)
)
);
return query;
}
As I said, it matches the domain, but not the range. I get results a month back, even though I've tested that my timestamp is correct.
What am I doing wrong?
Ah, I found the issue. I'm not supposed to use .Range() but rather .DateRange(). Note that this version also sets the field on the range, which my original call did not. Now my query looks like this:
SearchDescriptor<ApacheRequest> Query(SearchDescriptor<ApacheRequest> qc)
{
var query = qc.Query(q =>
q.Bool(b =>
b.Filter(f =>
f.Bool(fb =>
fb.Must(sh =>
sh.Match(ma => ma
.Field(x => x.Domain)
.Query("website.com")
)
)
),
f => f.DateRange(r =>
r.Field(fi => fi.Timestamp).GreaterThanOrEquals(from)
)
)
)
);
return query;
}
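To check what the NEST descriptor should serialize to, it can help to build the intended request body by hand and compare. The sketch below does that in Python. The helper name build_domain_range_query is hypothetical, the timestamp field is passed in as a parameter because its exact name is an assumption here, and the domain match is placed directly in the filter list instead of inside a nested bool.

```python
import json


def build_domain_range_query(domain, timestamp_field, gte, lte=None):
    """Build a bool/filter body with a domain match and a date range.

    Hypothetical helper; field names are supplied by the caller.
    """
    range_bounds = {"gte": gte}
    if lte is not None:
        range_bounds["lte"] = lte
    return {
        "query": {
            "bool": {
                "filter": [
                    {"match": {"domain": domain}},
                    {"range": {timestamp_field: range_bounds}},
                ]
            }
        }
    }


body = build_domain_range_query(
    "website.com",
    "@timestamp",  # assumed field name
    "2020-08-03T12:37:07.821Z",
    "2020-08-18T12:37:07.821Z",
)
print(json.dumps(body, indent=2))
```

If the JSON that NEST actually sends contains no range clause at all, that usually points at the descriptor rather than at the data.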

Convert elastic search query + aggregations to nest syntax

I need to convert the following query and aggregations, as written in Kibana, to C# NEST syntax.
The main issue is the "harvest_date" sub-aggregation (I need to restrict it to the last 3 months), but I'm also not sure whether the query itself follows best practice.
GET tdnetindex/_search
{
"size": 0,
"aggs": {
"TermsAggregation": {
"terms": {
"field": "database",
"size": 100
},
"aggs": {
"DateHistogramAggregation": {
"date_histogram": {
"field": "harvest_date",
"interval": "month"
}
}
}
}
},
"query": {
"bool": {
"filter": {
"range": {
"harvest_date": {
"gte": "now-3M/M"
}
}
}
}
}
}
What I did so far:
var query = elasticClient.Search<ElasticResponse>(s => s
.Size(0)
.Aggregations(a1 => a1
.Terms("TermsAggregation", t => t
.Field(f => f.DataBase)
.Size(100)
.Aggregations(a2 => a2
.DateHistogram("DateHistogramAggregation", dh => dh
.Field(f => f.HarvestDate)
.Interval(DateInterval.Month)
)
)
)
)
.Query(q => q
.Bool(b => b
.Filter(f => f
.Range(r => r
.GreaterThanOrEquals(....);
)
)
)
)
)
You're almost there; you just need to use .DateRange(r => r...) instead of .Range(r => r...).
For the DateMath expression, you can use the string "now-3M/M" directly, or translate to
DateMath.Now.Subtract("3M").RoundTo(DateMathTimeUnit.Month)
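As a worked example of what "now-3M/M" means as a lower bound: subtract three months from the current time, then round down to the start of that month. The Python sketch below reproduces that arithmetic locally. The function name is hypothetical, and it ignores time zones, so treat it as an illustration of the expression, not of how Elasticsearch evaluates it internally.

```python
from datetime import datetime


def now_minus_months_rounded_down(now, months):
    """Subtract whole months from `now`, then round down to the month start.

    Hypothetical helper mirroring "now-<months>M/M" when used with gte.
    """
    # Count months since year 0 so that year boundaries are handled uniformly.
    total_months = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total_months, 12)
    return datetime(year, month_index + 1, 1)


lower_bound = now_minus_months_rounded_down(datetime(2020, 8, 18, 12, 37, 7), 3)
print(lower_bound.isoformat())
```

For a request sent on 2020-08-18, the filter therefore covers everything from 2020-05-01T00:00:00 onwards, which also lines up with the monthly buckets of the date histogram.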

Empty String elastic search

I'm using Elasticsearch 6.5.
I need to include an empty-string match in one of the criteria I'm passing:
primaryKey = 1, 2, 3
subKey = "" or subKey = "A", along with a bunch of other criteria.
I've been unable to get the record that has the empty subKey.
I've tried using must_not with exists, but it doesn't fetch the record in question.
The query below should return any records that have a primaryKey of 1, 2, or 3 and a subKey of 'A' or the empty string, filtered by the date provided. I get all the records except the one where the subKey is blank.
This is what I've tried:
{
"size": 200, "from": 0,
"query": {
"bool": {
"must": [{
"bool": {
"should": [{ "terms": {"primaryKey": [1,2,3] }}]
}
},
{
"bool": {
"should": [
{"match": {"subKey": "A"}},
{
"bool" : {
"must_not": [{ "exists": { "field": "subKey"} }]
}
}
]
}
}],
"filter": [{"range": {"startdate": {"lte": "2018-11-01"}}}]
}
}
}
The subKey field is special: it's actually searched by letter. I don't think that affects anything, but here is the NEST code I have for that index.
new CreateIndexDescriptor("SpecialIndex").Settings(s => s
.Analysis(a => a
.Analyzers(aa => aa
.Custom("subKey_analyzer", ma => ma
.Tokenizer("subKey_tokenizer")
.Filters("lowercase")
)
)
.Tokenizers(ta => ta
.NGram("subKey_tokenizer", t => t
.MinGram(1)
.MaxGram(1)
.TokenChars(new TokenChar[] { TokenChar.Letter, TokenChar.Whitespace })
)
)
)
)
.Mappings(ms => ms
.Map<SpecialIndex>(m => m
.Properties(p => p
.Text(s => s
.Name(x => x.subKey)
.Analyzer("subKey_analyzer")
)
)
));
Any ideas on how to resolve this? Thank you very much!
NOTE: I've seen posts saying this can be done with a filter, using missing. But as you can see from the query, I need the query to do this, not the filter.
I've also tried the following instead of the must_not exists clause:
{
"term": { "subKey": { "value": "" }}
}
but it doesn't work. I'm thinking I need another tokenizer to get this working.
OK, I managed to fix this by using multi-fields. This is what I did.
I changed the mappings to this:
.Mappings(ms => ms
.Map<SpecialIndex>(m => m
.Properties(p => p
.Text(s => s
.Name(x => x.subKey)
.Fields(ff => ff
.Text(tt => tt
.Name("subKey")
.Analyzer("subKey_analyzer")
)
.Keyword(k => k
.Name("keyword")
.IgnoreAbove(5)
)
)
)
)
));
Then I changed the bool piece of my query to this:
"bool": {
"should": [{
"match": {
"subKey.subKey": {
"query": "A"
}
}
},
{
"term": {
"subKey.keyword": {
"value": ""
}
}
}]
}
What I don't really like about this is that I think Elasticsearch is creating an additional field just to find empty strings in the same field. That really doesn't seem ideal.
If anyone has another suggestion, that would be great!
[UPDATE] The NEST implementation needs to use Suffix to access the multi-fields.
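.Bool(bb => bb
    .Should(
        bbs => bbs.Match(m => m.Field(f => f.subKey.Suffix("subKey")).Query(search.subKey)),
        // Verbatim() keeps the term query even though its value is empty;
        // NEST would otherwise treat it as conditionless and drop it.
        bbs => bbs.Term(t => t.Verbatim().Field(f => f.subKey.Suffix("keyword")).Value(string.Empty))
    )
)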
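The clause that the NEST code is meant to produce can be sketched as a plain request fragment. The Python below is only an illustration: build_subkey_clause is a hypothetical name, the sub-field names subKey.subKey and subKey.keyword come from the mapping above, and the explicit minimum_should_match is an addition that spells out the intended "either one" behaviour.

```python
import json


def build_subkey_clause(letter):
    """Match documents whose subKey contains `letter` or is an empty string.

    Hypothetical helper; sub-field names follow the multi-field mapping above.
    """
    return {
        "bool": {
            "should": [
                # Analyzed sub-field: single-letter n-gram match.
                {"match": {"subKey.subKey": {"query": letter}}},
                # Keyword sub-field: exact match on the empty string.
                {"term": {"subKey.keyword": {"value": ""}}},
            ],
            "minimum_should_match": 1,
        }
    }


clause = build_subkey_clause("A")
print(json.dumps(clause, indent=2))
```

Comparing this output with the JSON that NEST sends is a quick way to confirm that the empty-string term clause was not dropped from the request.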

Multi-term filter in ElasticSearch (NEST)

I am trying to query documents based on a given field having multiple possible values. For example, my documents have an "extension" property, which is the extension type of a file such as .docx, .xls, .pdf, etc. I want to be able to filter my "extension" property on any number of values, but cannot find the correct syntax for this. Here is my current query:
desc.Type("entity")
.Routing(serviceId)
.From(pageSize * pageOffset)
.Size(pageSize)
.Query(q => q
.Filtered(f => f
.Query(qq =>
qq.MultiMatch(m => m
.Query(query)
.OnFields(_searchFields)) ||
qq.Prefix(p1 => p1
.OnField("entityName")
.Value(query)) ||
qq.Prefix(p2 => p2
.OnField("friendlyUrl")
.Value(query))
)
.Filter(ff =>
ff.Term("serviceId", serviceId) &&
ff.Term("subscriptionId", subscriptionId) &&
ff.Term("subscriptionType", subscriptionType) &&
ff.Term("entityType", entityType)
)
)
);
P.S. It may be easier to think of it in the inverse: I send the file extensions I DON'T want, and the query returns documents that DON'T have any of the given extension values.
After discussion, this raw JSON query should work and can be translated to NEST quite easily:
POST /test/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "VALUE"
}
},
{
"term": {
"subscriptionId": "VALUE"
}
},
{
"term": {
"subscriptionType": "VALUE"
}
},
{
"term": {
"entityType": "VALUE"
}
}
],
"must_not": [
{
"terms": {
"extension": [
"docx",
"doc"
]
}
}
]
}
}
}
}
}
What had to be done:
A bool filter suits this best, because it can combine clauses that must match with clauses that must be excluded.
The must clause holds all the term filters that are present in the OP's query.
The must_not clause holds all the extensions that need to be filtered out.
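The same must / must_not structure can be generated from the caller's inputs. The Python sketch below builds only the bool part, so it could sit in the filter of a filtered query as above or, on later Elasticsearch versions where filtered no longer exists, in the filter of a bool query. The helper name build_exclusion_filter and the sample values are hypothetical.

```python
import json


def build_exclusion_filter(required_terms, excluded_extensions):
    """Build a bool filter: every required term must match, and no
    excluded extension may match.

    Hypothetical helper; "extension" is the field name from the question.
    """
    return {
        "bool": {
            "must": [
                {"term": {field: value}} for field, value in required_terms.items()
            ],
            "must_not": [
                {"terms": {"extension": list(excluded_extensions)}}
            ],
        }
    }


bool_filter = build_exclusion_filter(
    {"serviceId": "VALUE", "entityType": "VALUE"},  # placeholder values
    ["docx", "doc"],
)
print(json.dumps(bool_filter, indent=2))
```

A single terms clause inside must_not excludes a document as soon as its extension equals any one of the listed values.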
If you want to return items that match ".doc" OR ".xls", then you want a terms filter. Here is a sample:
var searchResult = ElasticClient
.Search<SomeESType>(s => s
.Query(q => q
.Filtered(fq => fq
.Filter(f => f
.Terms(t => t.Field123, new List<string> {".doc", ".xls"})
)
)
)
)

How to use ElasticSearch Query params (DSL query) for multiple types?

I have been working with Elasticsearch for the last few months, but I still find it difficult when I have to write a complicated query.
I want to run a query that searches multiple "types", where each type is searched with its own "filters", but the search results need to be combined.
For example:
I need to search the "user type" documents that are my friends and, at the same time, the "object type" documents that I like, according to the keyword provided.
OR
A query that has both an "AND" and a "NOT" clause.
Example query:
$options['query'] = array(
'query' => array(
'filtered' => array(
'query' => array(
'query_string' => array(
'default_field' => 'name',
'query' => $this->search_term . '*',
),
),
'filter' => array(
'and' => array(
array(
'term' => array(
'access_id' => 2,
),
),
),
'not' => array(
array(
'term' => array(
'follower' => 32,
),
),
array(
'term' => array(
'fan' => 36,
),
),
),
),
),
),
);
This query is meant to find users with access_id = 2 that do not have a follower with id 32 or a fan with id 36,
but it is not working.
Edit: Modified query
{
"query": {
"filtered": {
"filter": {
"and": [
{
"not": {
"filter": {
"and": [
{
"query": {
"query_string": {
"default_field": "fan",
"query": "*510*"
}
}
},
{
"query": {
"query_string": {
"default_field": "follower",
"query": "*510*"
}
}
}
]
}
}
},
{
"term": {
"access_id": 2
}
}
]
},
"query": {
"field": {
"name": "xyz*"
}
}
}
}
}
After running this query, I get two results: one with follower: "34,518" & fan: "510", and a second with fan: "34". Shouldn't only the second one be in the result?
Any ideas?
You may want to look at the slides of a presentation that I gave this month, which explains the basics of how the query DSL works:
Terms of endearment - the ElasticSearch Query DSL explained
The problem with your query is that your filters are nested incorrectly. The and and not filters are at the same level, but the not filter should be under and:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
"query" : {
"filtered" : {
"filter" : {
"and" : [
{
"not" : {
"filter" : {
"and" : [
{
"term" : {
"fan" : 36
}
},
{
"term" : {
"follower" : 32
}
}
]
}
}
},
{
"term" : {
"access_id" : 2
}
}
]
},
"query" : {
"field" : {
"name" : "keywords to search"
}
}
}
}
}
'
I just tried it with a bool query:
{
"query": {
"bool": {
"must": [
{
"term": {
"access_id": 2
}
},
{
"wildcard": {
"name": "xyz*"
}
}
],
"must_not": [
{
"wildcard": {
"follower": "*510*"
}
},
{
"wildcard": {
"fan": "*510*"
}
}
]
}
}
}
It gives the correct answer,
but I'm not sure whether it should be used like this.
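One way to see why the two queries return different results is to evaluate both conditions by hand. A not wrapped around an and excludes a document only when every inner clause matches, while must_not excludes it as soon as any clause matches. The Python sketch below models this on the two documents described in the question. It is a simplification under stated assumptions: substring containment stands in for the "*510*" wildcard, both documents are assumed to have access_id 2, and the second document's follower value is assumed to be empty.

```python
# The two hits reported in the question (access_id and the empty
# follower value are assumptions made for this illustration).
docs = [
    {"access_id": 2, "follower": "34,518", "fan": "510"},
    {"access_id": 2, "follower": "", "fan": "34"},
]


def contains_510(doc, field):
    """Stand-in for a "*510*" wildcard on one field."""
    return "510" in doc.get(field, "")


def not_of_and(doc):
    """not(and(fan, follower)): excluded only if BOTH fields match."""
    return not (contains_510(doc, "fan") and contains_510(doc, "follower"))


def must_not_any(doc):
    """bool must_not: excluded if EITHER field matches."""
    return not (contains_510(doc, "fan") or contains_510(doc, "follower"))


nested_hits = [d for d in docs if d["access_id"] == 2 and not_of_and(d)]
bool_hits = [d for d in docs if d["access_id"] == 2 and must_not_any(d)]

print(len(nested_hits), "hits with not(and(...))")
print(len(bool_hits), "hits with must_not")
```

Under these assumptions the first document survives not(and(...)) because only its fan field contains 510, which matches the two results reported for the modified query, while must_not removes it.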

Resources