Elasticsearch search with nested properties

I have an Elasticsearch instance that stores articles with various properties. Some articles have no properties at all. There are countless properties with assigned values, all in random order.
The problem is that my search doesn't behave the way I would like. The properties are taken into account, but the order of the properties seems to matter. An entry in the instance may have many properties while the search query asks for only one or two of them, and the queried values can deviate slightly from the stored ones.
The goal is to find the entries that are as similar as possible, regardless of the order of the properties.
Can anyone help me with this?
Elasticsearch instance info (one example hit):
{
"_index" : "articles",
"_type" : "_doc",
"_id" : "fYjaQXkBBdCju4scstN_",
"_score" : 1.0,
"_source" : {
"position" : "400.000",
"beschreibung" : "asc",
"menge" : 24.0,
"einheit" : "St",
"properties" : [
{
"desc" : "Farbe",
"val" : "rot"
},
{
"desc" : "Material",
"val" : "Holz"
},
{
"desc" : "Länge",
"val" : "20 cm"
},
{
"desc" : "Breite",
"val" : "100 km"
}
]
}
}
The nested part of my current query:
[nested] => Array
(
[path] => properties
[query] => Array
(
[0] => Array
(
[0] => Array
(
[bool] => Array
(
[should] => Array
(
[0] => Array
(
[match] => Array
(
[properties.desc] => Farbe
)
)
[1] => Array
(
[match] => Array
(
[properties.val] => rot
)
)
)
)
)
)
[1] => Array
(
[0] => Array
(
[bool] => Array
(
[should] => Array
(
[0] => Array
(
[match] => Array
(
[properties.desc] => Länge
)
)
[1] => Array
(
[match] => Array
(
[properties.val] => 22 cm
)
)
)
)
)
)
[2] => Array
(
[0] => Array
(
[bool] => Array
(
[should] => Array
(
[0] => Array
(
[match] => Array
(
[properties.desc] => Material
)
)
[1] => Array
(
[match] => Array
(
[properties.val] => Holz
)
)
)
)
)
)
)
)

There are two problems in your query leading to strange results:
First, you're using a match query on a text field with a value that contains multiple terms. So when you run
"match": {
"properties.val": "22 cm"
}
Elasticsearch searches for "22" OR "cm" in the properties.val field. I assume you want to match the whole phrase, so you could use match_phrase here, for example. Alternatively, you could store the unit in a field of its own. Another option is to set the operator parameter:
"match": {
"properties.val": {
"query": "20 cm",
"operator": "and"
}
}
But be aware that this doesn't look for the exact phrase. For example, "20 30 cm" would also match, but maybe this suits your case.
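To make the difference between the three variants concrete, here is a toy model in Python. It is only a sketch: it assumes plain whitespace tokenization and lowercasing instead of a real Elasticsearch analyzer, and the function names are made up for illustration.

```python
def tokens(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def match_or(field_value, query):
    """Default `match`: at least one query term occurs in the field."""
    field = set(tokens(field_value))
    return any(t in field for t in tokens(query))


def match_and(field_value, query):
    """`match` with operator "and": every query term occurs, in any position."""
    field = set(tokens(field_value))
    return all(t in field for t in tokens(query))


def match_phrase(field_value, query):
    """`match_phrase`: the query terms occur adjacently and in order."""
    f, q = tokens(field_value), tokens(query)
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


# Compare the three behaviours for the query "20 cm".
for value in ["20 cm", "22 cm", "20 30 cm"]:
    print(value, match_or(value, "20 cm"), match_and(value, "20 cm"),
          match_phrase(value, "20 cm"))
```

In this model "22 cm" is still found by the default match (because of "cm"), "20 30 cm" passes the "and" operator, and only "20 cm" passes the phrase check.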
Second, you're using the should clause at the property level. So you're basically asking for documents with a property that "should have Farbe in its description and should have rot in its value". Because a single matching should clause is enough, that matches all of the following examples:
"properties" : [
{
"desc" : "Farbe",
"val" : "blau"
}
]
"properties" : [
{
"desc" : "Material",
"val" : "rot"
}
]
"properties" : [
{
"desc" : "Farbe",
"val" : "blau"
},
{
"desc" : "Material",
"val" : "rot"
}
]
So you need one bool query (with must or filter clauses) for each property you want to match, and an outer bool query with a should clause for each property. Your query from the question could then look like this:
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "properties",
"query": {
"bool": {
"filter": [
{
"match": {
"properties.desc": "Farbe"
}
}
],
"must": [
{
"match_phrase": {
"properties.val": "rot"
}
}
]
}
}
}
},
{
"nested": {
"path": "properties",
"query": {
"bool": {
"filter": [
{
"match": {
"properties.desc": "Länge"
}
}
],
"must": [
{
"match_phrase": {
"properties.val": "22 cm"
}
}
]
}
}
}
},
{
"nested": {
"path": "properties",
"query": {
"bool": {
"filter": [
{
"match": {
"properties.desc": "Material"
}
}
],
"must": [
{
"match_phrase": {
"properties.val": "Holz"
}
}
]
}
}
}
}
]
}
}
}
Please try it and see whether this gives the desired results.
You could still tweak the query, for example by using minimum_should_match or by defining the score that each matched property contributes.
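Since the query repeats the same nested block per property, it may be easier to generate it. The following Python sketch builds the same structure from a dict of wanted properties; the helper names (property_clause, build_query) and the value 2 for minimum_should_match are assumptions for illustration, not something from the question.

```python
import json


def property_clause(desc, val):
    """One nested clause: desc AND val must match inside the same property."""
    return {
        "nested": {
            "path": "properties",
            "query": {
                "bool": {
                    "filter": [{"match": {"properties.desc": desc}}],
                    "must": [{"match_phrase": {"properties.val": val}}],
                }
            },
        }
    }


def build_query(wanted, minimum_should_match=1):
    """Outer bool: one should clause per wanted property."""
    return {
        "query": {
            "bool": {
                "should": [property_clause(d, v) for d, v in wanted.items()],
                "minimum_should_match": minimum_should_match,
            }
        }
    }


# Hypothetical search: at least two of the three properties have to match.
body = build_query(
    {"Farbe": "rot", "Länge": "22 cm", "Material": "Holz"},
    minimum_should_match=2,
)
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The printed JSON can be sent as the search body. Documents matching more properties collect a higher score, so the most similar entries come first.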

Related

Elasticsearch sorting by nested field in nested array

I'm using Elasticsearch 7.2.1 and NEST 7.2.1.
My data structure is the following:
{
"id" : "some_id",
"roles" : [
{
"name" : "role_one_name",
"members" : [
{
"id" : "member_one_id",
"name" : "member_one_name"
}
]
},
{
"name" : "role_two_name",
"members" : [
{
"id" : "member_two_id",
"name" : "member_two_name"
}
]
}
]
}
The idea is that I need to implement sorting by a given role name (e.g. role_one_name).
Sorting should be performed on members.name (e.g. members[0].name). In my case the members array will always contain one element, but for some roles (omitted in the example) it contains more than one element, so I can't get rid of the nested array.
In my head I have an algorithm:
Get the needed role by name.
Specify the path to the first element in the members array.
Point to the name property to sort on.
I'm a newbie in the Elasticsearch world, and after a few days of trying I came up with the following query (which does not work).
var sortFilters = new List<Func<FieldSortDescriptor<T>, FieldSortDescriptor<T>>>();
var sortFieldValue = "role_two_name";
...
sortFilters.Add(o => o.Nested(n => n
.Path(p => p.Roles)
.Filter(f => f
.Term(t => t
.Field(c => c.Roles.First().Name)
.Value(sortFieldValue)) && f
.Nested(n => n
.Path(p => p.Roles.First().Members)
.Query(q => q
.Term(t => t
.Field(f => f.Roles.First().Members.First().Name)))))));
What am I doing wrong?
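With the help of my colleagues I managed to solve it. In the request below, the literal sortFieldValue stands for the role name to sort by (e.g. role_two_name).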
GET index_name/_search
{
"from": 0,
"size": 20,
"query": {
"match_all": {}
},
"sort": [{
"roles.members.name.keyword": {
"order": "asc",
"nested": {
"path": "roles",
"filter": {
"term": {
"roles.name.keyword": {
"value": "sortFieldValue"
}
}
},
"nested": {
"path": "roles.members"
}
}
}
}
]
}
or using NEST:
sortFilters.Add(o => o.Field(f => f.Roles.First().Members.First().Name.Suffix("keyword")));
sortFilters.Add(o => o.Nested(n => n
.Path(p => p.Roles)
.Filter(f => f
.Term(t => t
.Field(q => q.Roles.First().Name.Suffix("keyword"))
.Value(sortFieldValue)
)
)
.Nested(n => n
.Path(p => p.Roles.First().Members)
)
));
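For readers who are not using NEST, the same sort clause can be assembled in any language. A minimal Python sketch, assuming the field names from the request above (including the .keyword subfields) and a made-up helper name nested_member_sort:

```python
def nested_member_sort(role_name, order="asc"):
    """Sort by roles.members.name, but only within the role called role_name."""
    return [
        {
            "roles.members.name.keyword": {
                "order": order,
                "nested": {
                    # Outer nested level: pick the role by its name.
                    "path": "roles",
                    "filter": {
                        "term": {"roles.name.keyword": {"value": role_name}}
                    },
                    # Inner nested level: descend into that role's members.
                    "nested": {"path": "roles.members"},
                },
            }
        }
    ]


body = {
    "from": 0,
    "size": 20,
    "query": {"match_all": {}},
    "sort": nested_member_sort("role_two_name"),
}
```

The point of the nesting is that the filter belongs to the roles level, while the sort field itself lives one level deeper under roles.members.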

How to get the count of non-null values for a specific field in Elasticsearch

I have an Elasticsearch index and I need the total count of records in which one of the fields ("actual_start") is not null. How can I do this?
I have written this query and I want to append the count of non-null actual_start values to its result:
$params = [
'index' => "appointment_request",
'body' => [
'from' => $request['from'],
'size' => $request['size'],
"query" => [
"bool"=>[
"must" => [
[
"term" => [
"doctor_id" => [
"value" => $request['doctor_id']
]
]
],
[
"match" => [
"book_date" => $request['book_date']
]
],
]
],
]
]
];
Take a look at the Exists query.
Try this:
GET <your-index>/_count
{
"query": {
"exists": {
"field": "actual_start"
}
}
}
You can then read the count value, which gives you the total number of records where actual_start is not null.
You can also replace _count with _search; the total value under hits then gives you the total count (in case you also want the hits).
If you want the opposite (all records where actual_start is null):
GET <your-index>/_count
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "actual_start"
}
}
]
}
}
}
UPDATE
If I understand you correctly, you want to combine your current query with the exists query.
Example:
GET <your-index>/_search
{
"query": {
"bool": {
"must": [
{
<put-your-query-here>
},
{
"exists": {
"field": "actual_start"
}
}
]
}
}
}
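Applied to the query from the question, the combination could look roughly like the Python sketch below. The function name and the sample values (doctor id 7, book date "2021-05-01") are hypothetical; the field names come from the question.

```python
def appointments_with_start(doctor_id, book_date, start=0, size=10):
    """Original term + match clauses, plus an exists clause on actual_start."""
    return {
        "from": start,
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {"term": {"doctor_id": {"value": doctor_id}}},
                    {"match": {"book_date": book_date}},
                    # Only keep records whose actual_start is not null.
                    {"exists": {"field": "actual_start"}},
                ]
            }
        },
    }


body = appointments_with_start(7, "2021-05-01")
```

Sent to _search, this returns the matching hits together with their total. Sending only the "query" part to _count returns just the number.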

Multi-term filter in ElasticSearch (NEST)

I am trying to query documents based on a given field having multiple possible values. For example, my documents have an "extension" property, which is the extension type of a file such as .docx, .xls, .pdf, etc. I want to be able to filter my "extension" property on any number of values, but cannot find the correct syntax to get this functionality. Here is my current query:
desc.Type("entity")
.Routing(serviceId)
.From(pageSize * pageOffset)
.Size(pageSize)
.Query(q => q
.Filtered(f => f
.Query(qq =>
qq.MultiMatch(m => m
.Query(query)
.OnFields(_searchFields)) ||
qq.Prefix(p1 => p1
.OnField("entityName")
.Value(query)) ||
qq.Prefix(p2 => p2
.OnField("friendlyUrl")
.Value(query))
)
.Filter(ff =>
ff.Term("serviceId", serviceId) &&
ff.Term("subscriptionId", subscriptionId) &&
ff.Term("subscriptionType", subscriptionType) &&
ff.Term("entityType", entityType)
)
)
);
P.S. It may be easier to think of it in the inverse, where I send up the file extensions I DON'T want and set up the query to get documents that DON'T have any of the extension values given.
After discussion, here is a raw JSON query that should work and can be translated to NEST quite easily:
POST /test/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "VALUE"
}
},
{
"term": {
"subscriptionId": "VALUE"
}
},
{
"term": {
"subscriptionType": "VALUE"
}
},
{
"term": {
"entityType": "VALUE"
}
}
],
"must_not": [
{
"terms": {
"extension": [
"docx",
"doc"
]
}
}
]
}
}
}
}
}
What had to be done:
A bool query suits this best, because it can hold both the clauses that have to match and the ones that have to be filtered out.
The must clause holds all clauses that are present in the OP's query.
The must_not clause holds all extensions that need to be filtered out.
If you want to return items that match ".doc" OR ".xls", then you want a terms query. Here is a sample:
var searchResult = ElasticClient
.Search<SomeESType>(s => s
.Query(q => q
.Filtered(fq => fq
.Filter(f => f
.Terms(t => t.Field123, new List<string> {".doc", ".xls"})
)
)
)
);
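Both directions (extensions to include and extensions to exclude) can be expressed with one small builder. This is a Python sketch only: it assumes a plain bool query with filter and must_not clauses rather than the filtered wrapper used above, and the helper name and sample values are made up.

```python
def extension_filter(required_terms, include=None, exclude=None):
    """Bool query: exact-match filters plus optional extension whitelist/blacklist."""
    bool_query = {
        "filter": [{"term": {field: value}} for field, value in required_terms.items()]
    }
    if include:
        # terms matches documents whose extension is ANY of the listed values.
        bool_query["filter"].append({"terms": {"extension": list(include)}})
    if exclude:
        # must_not drops documents whose extension is any of the listed values.
        bool_query["must_not"] = [{"terms": {"extension": list(exclude)}}]
    return {"query": {"bool": bool_query}}


# Hypothetical values: keep one service's entities, but no Word documents.
body = extension_filter(
    {"serviceId": "VALUE", "entityType": "VALUE"},
    exclude=["docx", "doc"],
)
```

Passing include=["doc", "xls"] instead gives the "match .doc OR .xls" behaviour described in the last answer.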

Elasticsearch - check if array contains a value

I want to check whether an array field of type long contains some values.
The only way I found is using a script: ElasticSearch Scripting: check if array contains a value
but it is still not working for me:
Query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"script": {
"script": "doc['Commodity'].values.contains(param1)",
"params": {
"param1": 50
}
}
}
}
}
}
But I get 0 hits, although I have this record:
{
"_index" : "aaa",
"_type" : "logs",
"_id" : "2zzlXEOgRtujWiCGtX6s9Q",
"_score" : 1,
"_source" : {
"Commodity" : [
50
],
"Type" : 1,
"SourceId" : "fsd",
"Id" : 123
}
}
Try this instead of that script:
{
"query": {
"filtered": {
"filter": {
"terms": {
"Commodity": [
55,
150
],
"execution": "and"
}
}
}
}
}
For those of you using a recent version of Elasticsearch (7.1.1 when this was written), please note that "filtered" and "execution" have been deprecated and since removed, so @Andrei Stefan's answer may not help anymore.
You can go through the discussion below for alternative approaches.
https://discuss.elastic.co/t/is-there-an-alternative-solution-to-terms-execution-and-on-es-2-x/41089
In the answer written by nik9000 in that discussion, I just replaced "term" with "terms" (in PHP) and it started working with array inputs; AND was applied across the "terms" clauses that I used.
EDIT: Upon request I will post a sample query written in PHP.
'body' => [
'query' => [
'bool' => [
'filter' => [
['terms' => ['key1' => $array1]],
['terms' => ['key2' => $array2]],
['terms' => ['key3' => $array3]],
['terms' => ['key4' => $array4]],
]
]
]
]
key1, key2, key3 and key4 are fields present in my Elasticsearch data, and each one is searched for the values in its respective array. AND is applied between the ['terms' => ['key' => array]] lines.
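One detail worth spelling out: inside a single terms clause the listed values are alternatives (any of them may match), while separate clauses under filter are combined with AND. So to require that an array field contains all given values, one option is one term clause per value. A Python sketch with made-up helper names:

```python
def contains_any(field, values):
    """Matches when the field holds at least one of the values."""
    return {"terms": {field: list(values)}}


def contains_all(field, values):
    """Matches only when the field holds every one of the values."""
    return {"bool": {"filter": [{"term": {field: v}} for v in values]}}


# Using the field and values from the earlier answer.
any_query = {"query": contains_any("Commodity", [55, 150])}
all_query = {"query": contains_all("Commodity", [55, 150])}
```

Here all_query plays the role that "execution": "and" had in the older syntax.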
For those of you who are using ES 6.x, this might help.
Here I am checking whether the user (rennish.joseph@gmail.com) has any of the given orders by passing in an array of orders:
GET user-orders/_search
{
"query": {
"bool": {
"filter": [
{
"terms":{
"orders":["123456","45678910"]
}
},
{
"term":{
"user":"rennish.joseph@gmail.com"
}
}
]
}
}
}

How to use ElasticSearch Query params (DSL query) for multiple types?

I have been working with Elasticsearch for the last few months, but I still find it difficult when I have to build a complicated query.
I want to run a query that searches multiple "types", where each type is searched with its own "filters", but the results need to be combined.
For example:
I need to search the "user type" documents that are my friends and, at the same time, search the "object type" documents that I like, according to the keyword provided.
OR
A query that has both an "AND" and a "NOT" clause.
Example query:
$options['query'] = array(
'query' => array(
'filtered' => array(
'query' => array(
'query_string' => array(
'default_field' => 'name',
'query' => $this->search_term . '*',
),
),
'filter' => array(
'and' => array(
array(
'term' => array(
'access_id' => 2,
),
),
),
'not' => array(
array(
'term' => array(
'follower' => 32,
),
),
array(
'term' => array(
'fan' => 36,
),
),
),
),
),
),
);
This query is meant to find users with access_id = 2 that have neither the follower with id 32 nor the fan with id 36,
but it is not working.
Edit: Modified query
{
"query": {
"filtered": {
"filter": {
"and": [
{
"not": {
"filter": {
"and": [
{
"query": {
"query_string": {
"default_field": "fan",
"query": "*510*"
}
}
},
{
"query": {
"query_string": {
"default_field": "follower",
"query": "*510*"
}
}
}
]
}
}
},
{
"term": {
"access_id": 2
}
}
]
},
"query": {
"field": {
"name": "xyz*"
}
}
}
}
}
Now, after running this query, I am getting two results: one with follower: "34,518" & fan: "510", and a second with fan: "34". But shouldn't only the second one be in the result?
Any ideas?
You may want to look at the slides of a presentation that I gave this month, which explains the basics of how the query DSL works:
Terms of endearment - the ElasticSearch Query DSL explained
The problem with your query is that your filters are nested incorrectly. The and and not filters are at the same level, but the not filter should be under and:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
"query" : {
"filtered" : {
"filter" : {
"and" : [
{
"not" : {
"filter" : {
"and" : [
{
"term" : {
"fan" : 36
}
},
{
"term" : {
"follower" : 32
}
}
]
}
}
},
{
"term" : {
"access_id" : 2
}
}
]
},
"query" : {
"field" : {
"name" : "keywords to search"
}
}
}
}
}
'
I just tried it with a bool query:
{
"query": {
"bool": {
"must": [
{
"term": {
"access_id": 2
}
},
{
"wildcard": {
"name": "xyz*"
}
}
],
"must_not": [
{
"wildcard": {
"follower": "*510*"
}
},
{
"wildcard": {
"fan": "*510*"
}
}
]
}
}
}
It gives the correct answer,
but I'm not sure whether it should be used like this.
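A possible explanation for the difference, sketched as a toy model in Python: a not wrapped around an and only excludes documents where both conditions hold, whereas must_not excludes a document as soon as either condition holds. The model below treats each field as one plain string and uses fnmatchcase in place of a real wildcard query, so it only approximates what Elasticsearch does with indexed terms; the two documents are the ones reported in the question.

```python
from fnmatch import fnmatchcase


def wildcard(doc, field, pattern):
    """Toy wildcard match on the raw field value (missing field = no match)."""
    return fnmatchcase(str(doc.get(field, "")), pattern)


def not_of_and(doc):
    """not(and(fan, follower)): excluded only if BOTH patterns match."""
    return not (wildcard(doc, "fan", "*510*") and wildcard(doc, "follower", "*510*"))


def must_not_each(doc):
    """must_not [fan, follower]: excluded if EITHER pattern matches."""
    return not wildcard(doc, "fan", "*510*") and not wildcard(doc, "follower", "*510*")


docs = [
    {"follower": "34,518", "fan": "510"},
    {"fan": "34"},
]

kept_by_not_of_and = [d for d in docs if not_of_and(d)]
kept_by_must_not = [d for d in docs if must_not_each(d)]
print(len(kept_by_not_of_and), len(kept_by_must_not))
```

In this model the first document survives the not/and version because only its fan field matches, which is consistent with the two results reported earlier, while the must_not version keeps only the second document.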

Resources