ElasticSearch query for range OR missing in array

I'm trying to create a query against the mapping below that finds values whose "period" either matches a specific date or has no value (it is null). Please note that I am working with a third-party database, so I cannot change the mappings. Bear with me if the example data and mappings are large; I've tried to cut away everything nonessential.
{
"EXAMPLE":{
"mappings":{
"company":{
"properties":{
"CompanyData":{
"properties":{
"participantRelations":{
"type":"nested",
"include_in_parent":true,
"properties":{
"participant":{
"type":"nested",
"include_in_parent":true,
"properties":{
"unitNumber":{
"type":"long"
}
},
"organizations":{
"properties":{
"memberData":{
"type":"nested",
"include_in_parent":true,
"properties":{
"attributes":{
"properties":{
"values":{
"properties":{
"period":{
"properties":{
"validFrom":{
"type":"date",
"format":"dateOptionalTime"
},
"validTo":{
"type":"date",
"format":"dateOptionalTime"
}
}
},
"value":{
"type":"string"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Here is some example data. I've translated it from the (Danish) source it comes from, so any slight misspellings are just typos.
{
"company": {
"companyData":{
"participantRelations":[
{
"participant":{
"unitNumber":4003857309
},
"organizations":[
{
"memberData":[
{
"attributtes":[
{
"values":[
{
"value":"chairman",
"period":{
"validFrom":"2014-10-01",
"validTo":"2016-08-11"
}
}
]
},
{
"values":[
{
"value":"generalassembly",
"period":{
"validFrom":"2014-10-01",
"validTo":"2016-08-11"
}
}
]
}
]
}
]
},
{
"memberData":[
{
"attributes":[
{
"values":[
{
"value":"chairman",
"period":{
"validFrom":"2016-08-16",
"validTo":"2017-06-08"
}
},
{
"value":"boardmember",
"period":{
"validFrom":"2017-06-09",
"validTo":null
}
}
]
},
{
"values":[
{
"value":"generalassembly",
"period":{
"validFrom":"2016-08-16",
"validTo":"2017-06-08"
}
},
{
"value":"generalassembly",
"period":{
"validFrom":"2017-06-09",
"validTo":null
}
}
]
}
]
}
]
}
]
}
]
}
}
}
What I want to do is something like the query below, which doesn't quite work: there are cases it cannot handle, for reasons I do not know. It needs to look for any company.participantRelations.organizations.memberData.attributes.values.period.validTo on or after a certain date, OR where the date is null. I know nulls are funky in ES, but the date properties will always be present; validTo is simply set to null if there is no date yet.
Furthermore, it needs to be nested on organizations as well, as I need a specific unitNumber to be present.
{
"query":{
"nested":{
"filter":{
"bool":{
"must":[
{
"nested":{
"filter":{
"bool":{
"must":[
{
"bool":{
"should":[
{
"range":{
"company.companyData.participantRelations.organizations.memberData.attributtes.values.period.validTo":{
"gte":"2017-08-14T15:23:11.011"
}
}
},
{
"missing":{
"field":"company.companyData.participantRelations.organizations.memberData.attributtes.values.period.validTo"
}
}
]
}
}
]
}
},
"path":"company.companyData.participantRelations.organizations.memberData"
}
},
{
"term":{
"company.companyData.participantRelations.participant.unitNumber":4003857309
}
}
]
}
},
"path":"company.companyData.participantRelations"
}
}
}
This query works in two cases:
Where there is only one entry in the list of values, and its validTo date is null.
Where the validTo date is greater than or equal to my date limit.
It does not seem to work if there are two entries, the first of which has a date that is earlier than my limit, and the second entry has a null value (as in the example).
I realize this is kind of convoluted, but with the database I'm querying that is just the way it is. I hope I've simplified it enough for you to get my issue.
Thanks in advance.
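One possible explanation for the failing case, sketched in plain Python rather than against a live cluster: in the mapping above, values is a plain object array inside the nested memberData, so all of its validTo dates end up in one multi-valued field per memberData entry, and null values are not indexed at all. Under that assumption the missing filter only matches when no entry has a date, and the range filter only matches when some entry is on or after the limit. The helper names below are hypothetical and only mimic that behaviour.

```python
from datetime import date


def flattened_values(obj, path):
    """Collect non-null leaf values at a path, merging arrays of plain objects
    into one list (nulls are dropped, mirroring the fact that they are not indexed)."""
    if isinstance(obj, list):
        return [v for item in obj for v in flattened_values(item, path)]
    if not path:
        return [] if obj is None else [obj]
    if not isinstance(obj, dict):
        return []
    return flattened_values(obj.get(path[0]), path[1:])


def range_or_missing(member, limit):
    """Mimic `range gte limit OR missing`, evaluated once per memberData entry."""
    valid_to = flattened_values(member, ["attributes", "values", "period", "validTo"])
    is_missing = not valid_to
    in_range = any(date.fromisoformat(v) >= limit for v in valid_to)
    return is_missing or in_range


limit = date(2017, 8, 14)

# An ended period followed by an open one, as in the second organization above.
ended_then_open = {"attributes": [{"values": [
    {"value": "chairman", "period": {"validFrom": "2016-08-16", "validTo": "2017-06-08"}},
    {"value": "boardmember", "period": {"validFrom": "2017-06-09", "validTo": None}},
]}]}

# A single open period.
only_open = {"attributes": [{"values": [
    {"value": "boardmember", "period": {"validFrom": "2017-06-09", "validTo": None}},
]}]}

print(range_or_missing(ended_then_open, limit))  # False
print(range_or_missing(only_open, limit))        # True
```

In the mixed case the earlier date makes the field "present", so missing no longer applies, while the date itself is below the limit, which matches the behaviour described in the question.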

Related

How do I sort using the best matching nested field or a default in Elasticsearch?

I have a bunch of documents that look like this in my index:
{
"given_name":"John",
"family_name":"Smith",
"email_addresses": [
{
"email_address":"john@gmail.com",
"primary":true
},
{
"email_address":"j.smith@gmail.com",
"primary":false
},
{
"email_address":"jpsmith@gmail.com",
"primary":false
},
{
"email_address":"johnsmith111@gmail.com",
"primary":false
}
]
}
The mapping looks like this:
{
"mappings":{
"properties":{
"given_name":{
"type":"keyword",
"fields":{
"search":{
"type":"search_as_you_type"
}
}
},
"family_name":{
"type":"keyword",
"fields":{
"search":{
"type":"search_as_you_type"
}
}
},
"email_addresses":{
"type":"nested",
"properties":{
"email_address":{
"type":"keyword",
"fields":{
"search":{
"type":"search_as_you_type"
}
}
},
"primary":{
"type":"boolean"
}
}
}
}
}
}
I am running a prefix search on given_name, family_name and email_addresses. This will allow the user to start typing and relevant results from those fields should start returning:
{
"query":{
"bool":{
"should":[
{
"nested":{
"path":"email_addresses",
"query":{
"prefix":{
"email_addresses.email_address.search": {
"value":"j"
}
}
}
}
},
{
"multi_match":{
"query":"j",
"fields":[
"given_name.search",
"family_name.search"
],
"type": "bool_prefix"
}
}
]
}
}
}
I'd like to sort the results of the query above by the best matching email_address in email_addresses if one or more of them match, and otherwise by the email_address in email_addresses where primary is true.
I have looked into a script for sorting, but I didn't find any way in the documentation to access the matched nested child from a script.
Is there any way to achieve this?
To do this, we can use a bool query in the nested sort.
Given we have the following 4 documents:
{
"given_name":"John",
"family_name":"Smith1",
"email_addresses": [
{
"email_address":"someguy50@example.com",
"primary":true
},
{
"email_address":"someguy51@example.com",
"primary":false
},
{
"email_address":"someguy52@gmail.com",
"primary":false
},
{
"email_address":"someguy53gmail.com",
"primary":false
}
]
}
{
"given_name":"John",
"family_name":"Smith2",
"email_addresses": [
{
"email_address":"someguy54@example.com",
"primary":true
},
{
"email_address":"johnsmith@example.com",
"primary":false
},
{
"email_address":"someguy55@gmail.com",
"primary":false
},
{
"email_address":"someguy56gmail.com",
"primary":false
}
]
}
{
"given_name":"John",
"family_name":"Smith3",
"email_addresses": [
{
"email_address":"someguy49@example.com",
"primary":true
},
{
"email_address":"someguy47@example.com",
"primary":false
},
{
"email_address":"someguy48@gmail.com",
"primary":false
},
{
"email_address":"someguy46gmail.com",
"primary":false
}
]
}
{
"given_name":"John",
"family_name":"Smith4",
"email_addresses": [
{
"email_address":"someguy45@example.com",
"primary":true
},
{
"email_address":"someguy44@example.com",
"primary":false
},
{
"email_address":"someguy43@gmail.com",
"primary":false
},
{
"email_address":"someguy42gmail.com",
"primary":false
}
]
}
We can write our query like so:
{
"query":{
"bool":{
"should":[
{
"nested":{
"path":"email_addresses",
"query":{
"prefix":{
"email_addresses.email_address.search":{
"value":"john"
}
}
}
}
},
{
"multi_match":{
"query":"john",
"fields":[
"given_name.search",
"family_name.search"
],
"type":"bool_prefix"
}
}
]
}
},
"sort":[
{
"email_addresses.email_address":{
"order" : "asc",
"nested":{
"path":"email_addresses",
"filter":{
"bool":{
"should":[
{
"prefix":{
"email_addresses.email_address.search":{
"value":"john"
}
}
},
{
"term":{
"email_addresses.primary": true
}
}
]
}
}
}
}
}
]
}
First we do a prefix search on the email_addresses.email_address, given_name and family_name.
Then we sort on the nested email_addresses field as follows:
Sort by the email_addresses.email_address that matches our query.
Sort by the entry where email_addresses.primary = true.
The way this works is that the bool filter in the nested sort decides which email_addresses entries may contribute a sort value: an entry is considered if it matches the prefix query or has primary set to true. A document with a matching address can therefore be sorted by it, and a document without one falls back to its primary address. Note that when a document has both, the ascending sort uses the smallest of the candidate values, so the primary address can win over the match. Documents with no candidate entry at all have no sort value and are placed by the sort's missing-value handling (last by default).
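As a rough way to picture this sort, here is a small pure-Python sketch. It assumes the nested filter keeps every email_addresses entry that matches either should clause and that an ascending sort takes the smallest remaining value; the startswith check is only a stand-in for the prefix query, and the sample documents are trimmed versions of the four above.

```python
def sort_key(doc, prefix):
    """Smallest address among entries that match the prefix or are primary.

    Returns a tuple so that documents without any candidate sort last.
    """
    candidates = [
        entry["email_address"]
        for entry in doc["email_addresses"]
        if entry["email_address"].startswith(prefix) or entry["primary"]
    ]
    value = min(candidates) if candidates else None
    return (value is None, value or "")


docs = [
    {"family_name": "Smith1", "email_addresses": [
        {"email_address": "someguy50@example.com", "primary": True},
        {"email_address": "someguy51@example.com", "primary": False}]},
    {"family_name": "Smith2", "email_addresses": [
        {"email_address": "someguy54@example.com", "primary": True},
        {"email_address": "johnsmith@example.com", "primary": False}]},
    {"family_name": "Smith3", "email_addresses": [
        {"email_address": "someguy49@example.com", "primary": True},
        {"email_address": "someguy47@example.com", "primary": False}]},
    {"family_name": "Smith4", "email_addresses": [
        {"email_address": "someguy45@example.com", "primary": True},
        {"email_address": "someguy44@example.com", "primary": False}]},
]

order = [d["family_name"] for d in sorted(docs, key=lambda d: sort_key(d, "john"))]
print(order)  # ['Smith2', 'Smith4', 'Smith3', 'Smith1']
```

Smith2 comes first because its matching address sorts before every primary address; the other three are ordered by their primary addresses.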

Elasticsearch: When doing an "inner_hit" on nested documents, return all fields of matched offset in the hierarchy

Mapping for document:
{
"mappings": {
"properties": {
"client_classes": {
"type": "nested",
"properties": {
"members": {
"type": "nested",
"properties": {
"phone_nos": {
"type": "nested"
}
}
}
}
}
}
}
}
Data in Document:
{
"client_name":"client1",
"client_classes":[
{
"class_name":"class1",
"members":[
{
"name":"name1",
"phone_nos":[
{
"ext":"91",
"number":"99119XXXX"
},
{
"ext":"04",
"number":"99885XXXX"
}
]
},
{
"name":"name2",
"phone_nos":[
{
"ext":"03",
"number":"99887XXXX"
}
]
}
]
}
]
}
I query for "number" with value "99119XXXX"
{
"query":{
"nested":{
"path":"client_classes.members.phone_nos",
"query":{
"match":{
"client_classes.members.phone_nos.number":"99119XXXX"
}
},
"inner_hits":{}
}
}
}
Result from inner hits:
"inner_hits":{
"client_classes.members.phone_nos":{
"hits":{
"total":{
"value":1,
"relation":"eq"
},
"max_score":0.9808291,
"hits":[
{
"_index":"clients",
"_type":"_doc",
"_id":"1",
"_nested":{
"field":"client_classes",
"offset":0,
"_nested":{
"field":"members",
"offset":0,
"_nested":{
"field":"phone_nos",
"offset":0
}
}
},
"_score":0.9808291,
"_source":{
"ext":"91",
"number":"99119XXXX"
}
}
]
}
}
}
I get the desired matched result hierarchy of all the nested objects, in the inner hit, but I only receive the "offset" value and "field" from these objects. I need the full object of the corresponding offset.
Something like this:
{
"client_name":"client1",
"client_classes":[
{
"class_name":"class1",
"members":[
{
"name":"name1",
"phone_nos":[
{
"ext":"91",
"number":"99119XXXX"
}
]
}
]
}
]
}
I understand that with inner_hit I also get the complete root document, from where I can use the offset values from the innerhit object. But fetching the entire root document could be expensive for our memory, so I only need the result I have shared above.
Is there any such possibility as of now?
I am using elasticsearch 7.7
UPDATE: Added Mapping, result and a slight fix in document
Yes, just add "_source": false at the top level of the request and you'll only get the nested inner hits:
{
"_source": false,
"query":{
"nested":{
"path":"client_classes.members.phone_nos",
"query":{
"match":{
"client_classes.members.phone_nos.number":"99119XXXX"
}
},
"inner_hits":{}
}
}
}
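Each inner hit still carries the `_nested` chain shown in the result earlier, so the position of the match in the hierarchy can be recovered on the client without the root document. A minimal sketch, assuming the hit has already been parsed into a Python dict; the helper names are made up for illustration:

```python
def nested_steps(nested):
    """Unwind the recursive `_nested` object into (field, offset) pairs, outermost first."""
    steps = []
    while nested is not None:
        steps.append((nested["field"], nested["offset"]))
        nested = nested.get("_nested")
    return steps


def as_path(steps):
    """Render the steps as a readable path such as a[0].b[1]."""
    return ".".join(f"{field}[{offset}]" for field, offset in steps)


# The `_nested` part of the inner hit from the result above.
hit_nested = {
    "field": "client_classes",
    "offset": 0,
    "_nested": {
        "field": "members",
        "offset": 0,
        "_nested": {"field": "phone_nos", "offset": 0},
    },
}

print(as_path(nested_steps(hit_nested)))  # client_classes[0].members[0].phone_nos[0]
```

This only gives the offsets; the parent objects' own fields (class_name, name) are not part of the innermost hit's _source.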

How to reduce multiple conditions in ES

The query below uses match_phrase many times. How can I reduce the number of match_phrase clauses? In production, the response from ES for this query is very slow.
GET /logs*/_search
{
"from":0,
"query":{
"bool":{
"filter":[
{
"range":{
"@timestamp":{
"gte":"2020-02-10T11:13:19.7684961Z",
"lte":"2020-02-11T11:13:19.7684961Z"
}
}
}
],
"must":[
{
"bool":{
"must_not":[
{
"match_phrase":{
"message":{
"query":"System32"
}
}
},
{
"match_phrase":{
"message":{
"query":"212.118.14.45"
}
}
},
{
"match_phrase":{
"message":{
"query":" stopped state."
}
}
},
{
"match_phrase":{
"message":{
"query":" running state"
}
}
},
{
"match_phrase":{
"message":{
"query":" Share Name: \\\\*\\DLO-EBackup"
}
}
}
...
{
"match_phrase":{
"message":{
"query":"WFO15Installation"
}
}
},
{
"match_phrase":{
"message":{
"query":"Windows\\SysWOW64"
}
}
},
{
"match_phrase":{
"message":{
"query":"Bitvise"
}
}
}
]
}
}
]
}
},
"size":10,
"sort":[
{
"@timestamp":{
"order":"desc"
}
}
]
}
Thank You!
To begin with, you could move the must_not block inside the filter block to skip score calculation and take advantage of caching. Something like:
"query":{
"bool":{
"filter":[{
"range":{
"@timestamp":{
"gte":"2020-02-10T11:13:19.7684961Z",
"lte":"2020-02-11T11:13:19.7684961Z"
}
}
},
{
"bool": {
"must_not":[{
"match_phrase":{
"message":{
"query":"System32"
}
}
},
{
"match_phrase":{
"message":{
"query":"212.118.14.45"
}
}
},
...
]
}
}],
...
However, as someone already mentioned in the comments, you should optimise your data for searches before indexing your documents into Elasticsearch. A better solution than having so many filters in your query would be to process your data and apply those filters at ingestion time, for example by using the ingest APIs (see the Elastic documentation) or Logstash. E.g., you could evaluate the must_not conditions at index time and store the result in a boolean field (e.g., ignore) on every document, so that you can use that field at query time with a query like this:
"query":{
"bool":{
"filter":[{
"range":{
"@timestamp":{
"gte":"2020-02-10T11:13:19.7684961Z",
"lte":"2020-02-11T11:13:19.7684961Z"
}
}
},
{
"match": {
"ignore": false
}
},
...
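To illustrate the ingestion-time idea, here is a minimal Python sketch of a pre-indexing step that sets the ignore flag. It is not an ingest pipeline or Logstash configuration; the function name is hypothetical, the phrase list holds only a few of the phrases from the question, and a case-insensitive substring check is merely an approximation of what match_phrase does on analyzed text.

```python
# A few of the excluded phrases from the question; the real list is longer.
EXCLUDED_PHRASES = [
    "System32",
    "212.118.14.45",
    "stopped state.",
    "running state",
    "WFO15Installation",
    "Bitvise",
]


def with_ignore_flag(doc, phrases=EXCLUDED_PHRASES):
    """Return a copy of the log document with a boolean `ignore` field.

    `ignore` is True when the message contains any excluded phrase.
    """
    message = doc.get("message", "").lower()
    flagged = any(phrase.lower() in message for phrase in phrases)
    return {**doc, "ignore": flagged}


noisy = with_ignore_flag({"message": "C:\\Windows\\System32\\svchost.exe started"})
useful = with_ignore_flag({"message": "User admin logged in"})
print(noisy["ignore"], useful["ignore"])  # True False
```

Documents prepared this way can be indexed as usual, and the query then needs only the range filter plus the single filter on ignore.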

Can I filter a subarray in Elasticsearch?

I have orders in Elasticsearch, and each order has its order products attached as a subarray. When I'm aggregating prices, I need to be able to filter the order products inside my order documents.
Example of my document in Elastic:
{
"OrderID":4567488,
"projectId":"4",
"Project":"direkt",
"legacy_id":null,
"supporterId":null,
"Origin":"FR",
"orderProducts":[
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"30",
"Price":"26.95"
},
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"15",
"Price":"15.22"
},
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"123",
"Price":"24.55"
}
]
}
How I'm filtering right now:
{
"index":"order_index",
"from":0,
"size":100,
"body":{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"term":{
"orderProducts.brandNo":"30"
}
}
]
}
}
}
}
}
}
What I'm expecting:
{
"OrderID":4567488,
"projectId":"4",
"Project":"direkt",
"legacy_id":null,
"supporterId":null,
"Origin":"FR",
"orderProducts":[
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"30",
"Price":"26.95"
}
]
}
What I'm really getting:
The whole document.
Is it possible to filter subarray data?
UPD.
Yes, these are my schema mappings:
"mappings":{
"order":{
"dynamic_templates":[
{
"strings":{
"mapping":{
"type":"string",
"fields":{
"raw":{
"index":"not_analyzed",
"type":"string"
}
}
},
"match_mapping_type":"string"
}
}
],
"properties":{
"orderProducts":{
"include_in_parent":true,
"properties":{
"OrderProductID":{
"type":"long"
},
"OrderID":{
"type":"long"
},
"brandNo":{
"type":"long"
},
"Price":{
"type":"double"
}
},
"type":"nested"
},
"OrderID":{
"type":"long"
}
}
}
},
All right, after some experiments I discovered that the aggregation can be done like this:
{
"aggs":{
"sales":{
"nested":{
"path":"orderProducts"
},
"aggs":{
"filtered_nestedobjects":{
"filter":{
"bool":{
"must":[
{
"terms":{
"orderProducts.brandNo":[
"30"
]
}
}
]
}
},
"aggs":{
"Quantity":{
"sum":{
"field":"orderProducts.Quantity"
}
}
}
}
}
}
}
}
And the answer to the main question, whether we can filter a subarray in Elasticsearch, is yes. I did it using only inner_hits.
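The inner_hits request itself is not shown above, so the following is only a guess at its shape: a nested query on orderProducts with an empty inner_hits object, built as a plain Python dict and printed as JSON. The function name is hypothetical, and the exact syntax may differ on the older Elasticsearch version implied by the filtered query in the question.

```python
import json


def brand_inner_hits_query(brand_no):
    """Build a search body that returns only the matching orderProducts as inner hits."""
    return {
        "query": {
            "nested": {
                "path": "orderProducts",
                "query": {"term": {"orderProducts.brandNo": brand_no}},
                # An empty object asks for the matching nested entries per hit.
                "inner_hits": {},
            }
        }
    }


body = brand_inner_hits_query(30)
print(json.dumps(body, indent=2))
```

Each returned order then lists the matching products under inner_hits, while the order's own _source still contains the full orderProducts array.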

Setting the "index" property of an Elasticsearch object

Say I have the following mapping of objects:
{"my_type":
{"properties":
{"name":{"type":"string","store":"yes","index":"not_analyzed"},
"more":{"type":"object",
"properties":{"a_known_number":{"type":"long","index":"yes"},
"some_json_object":{"type":"object"}
}
}
}
}
}
I do not know what sub-fields "some_json_object" will have, but I DO know that I only want to store this object and not index any of its sub-fields.
Can I do:
{"my_type":
{"properties":
{"name":{"type":"string","store":"yes","index":"not_analyzed"},
"more":{"type":"object",
"properties":{"a_known_number":{"type":"long","index":"yes"},
"some_json_object":{"type":"object","store":"yes","index":"no"}
}
}
}
}
}
and affect all of the resulting sub-fields?
No, you can't specify the entire "object" as not indexed. However you can use dynamic_templates (http://www.elasticsearch.org/guide/reference/mapping/root-object-type/) to do this:
{
"my_type":{
"properties":{
"name":{
"type":"string",
"store":"yes",
"index":"not_analyzed"
}
},
"dynamic_templates":[
{
"stored_json_object_template":{
"path_match":"some_json_object.*",
"mapping":{
"store":"yes",
"index":"no"
}
}
}
]
}
}
This tells the mapper to map all properties under "some_json_object" as stored but not indexed fields.
Update
Removed type from mapping in order to match all property types (match_path => path_match).
Update 2
If you then create an index:
{
"mappings":{
"my_type":{
"properties":{
"name":{
"type":"string",
"store":"yes",
"index":"not_analyzed"
}
},
"dynamic_templates":[
{
"stored_json_object_template":{
"path_match":"some_json_object.*",
"mapping":{
"store":"yes",
"index":"no"
}
}
}
]
}
}
}
and index an object:
{
"Name":"Henrik",
"some_json_object":{
"string":"string",
"long":12345
}
}
it will then get the following mapping:
{
"testindex":{
"my_type":{
"dynamic_templates":[
{
"stored_json_object_template":{
"mapping":{
"index":"no",
"store":"yes"
},
"path_match":"some_json_object.*"
}
}
],
"properties":{
"name":{
"type":"string",
"index":"not_analyzed",
"store":true,
"omit_norms":true,
"index_options":"docs"
},
"some_json_object":{
"properties":{
"long":{
"type":"long",
"index":"no",
"store":true
},
"string":{
"type":"string",
"index":"no",
"store":true
}
}
}
}
}
}
}
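To see which field paths the template would pick up, here is a small Python approximation. It assumes path_match behaves like a simple wildcard match on the full dotted path, which fnmatchcase imitates closely enough for this pattern; the function name is made up for illustration.

```python
from fnmatch import fnmatchcase

# The dynamic template from the mapping above, reduced to the parts used here.
TEMPLATE = {
    "path_match": "some_json_object.*",
    "mapping": {"store": "yes", "index": "no"},
}


def mapping_for(field_path, template=TEMPLATE):
    """Return the template's mapping if the dotted field path matches, else None."""
    if fnmatchcase(field_path, template["path_match"]):
        return template["mapping"]
    return None


for path in ["some_json_object.string", "some_json_object.long", "name"]:
    print(path, "->", mapping_for(path))
```

The two sub-fields of some_json_object receive the stored, non-indexed mapping, while name is left to its explicit definition, which agrees with the resulting mapping shown above.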
