Elasticsearch: Sorting an array of objects

I am new to elasticsearch.
The documents I have look something like this :
user_name : "test",
piInfo: {
profile: {
word_count: 535,
word_count_message: "There were 535 words in the input.",
processed_language: "en",
personality: [
{
name: "Openness",
category: "personality",
percentile: 0.0015293409544490655,
children: []
},
{
name: "Conscientiousness",
category: "personality",
percentile: 0.6803430984001135,
children: []
}
]
}
}
What I am trying to do is sort users (user_name) by the "percentile" of a given personality trait (for example, "Openness").
What I came up with so far, based on the Elasticsearch documentation on the nested datatype and on sorting within nested objects, is this code:
"query": {
"nested": {
"path": "piInfo.profile.personality",
"query": {
"bool": {
"must":
{ "match": { "piInfo.profile.personality.name": "Openness" }}
}
}
}
},
"sort" : {
"piInfo.profile.personality.percentile" : {
"order" : "asc",
"nested": {
"path": "piInfo.profile.personality",
"filter": {
"match": { "piInfo.profile.personality.name": "Openness" }
}
}
}
}
I got this error:
[nested] nested object under path [piInfo.profile.personality] is not of nested type
That makes sense, because I didn't define a mapping for it. I am taking the data from an API and storing it as is.
Is there a way around that?

You are getting that error because nested queries only work on fields of the nested type. In other words, you need to declare "personality" as type nested in your index mapping before you load any data into it. The default type is object. Making objects nested means Elasticsearch stores them in such a way that the relationship between the fields of each individual object is not lost. If you don't do that, each field is indexed independently, so a name can no longer be tied to its own percentile.
More info here:
Elasticsearch Nested Datatype
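To make the object vs. nested difference concrete, here is a rough Python simulation. It is not Elasticsearch code: the helper names (flatten_object_array, object_match, nested_match) are made up for illustration, and the flattening only approximates how an array of objects is indexed under the default object type.

```python
from collections import defaultdict

def flatten_object_array(field, objects):
    """Approximate the default `object` mapping: one flat value list per sub-field."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)

personality = [
    {"name": "Openness", "percentile": 0.0015293409544490655},
    {"name": "Conscientiousness", "percentile": 0.6803430984001135},
]

# After flattening, a name is no longer linked to its own percentile.
flat = flatten_object_array("personality", personality)

def object_match(flat_doc, name, min_percentile):
    """Flattened semantics: the two conditions may be met by different objects."""
    return (name in flat_doc["personality.name"]
            and any(p >= min_percentile for p in flat_doc["personality.percentile"]))

def nested_match(objects, name, min_percentile):
    """Nested semantics: both conditions must hold inside the same object."""
    return any(o["name"] == name and o["percentile"] >= min_percentile
               for o in objects)

print(object_match(flat, "Openness", 0.5))         # True (a false positive)
print(nested_match(personality, "Openness", 0.5))  # False
```

The false positive comes from "Openness" matching the first object while the percentile condition is satisfied by the second one, which is exactly the cross-object leak that the nested type prevents.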

Related

Elasticsearch - object type, search nested elements

I'm using Elasticsearch version 7.2.0 and have the following type in my mapping:
"status_change": {
"type": "object",
"enabled": false
},
Example of the data inside this field:
"before": {
"status_type": "Status One"
},
"after": {
"status_type": "Status Two"
}
There are various status_types and I am attempting to create a query to identify changes from specified before status_type to specified after status type. I'm struggling to query by these nested elements and the following query fails:
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"nested": {
"path": "status_change",
"query": {
"term": {
"status_change.before": "the_before_status"
}
}
}
}
}
}
}
With:
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "[nested] nested object under path [status_change] is not of nested type"
}
Which is self-explanatory, since status_change is not a 'nested' field. I have tried looking into searching objects by their nested elements, but I have not found a solution.
Is there a way for me to search the nested object elements like this?
Your field is mapped as an object type, not a nested type. The two behave differently, and I'd suggest reading the documentation on both.
If you are using Nested Type
Change your mapping to the below:
"status_change": {
"type": "nested"
}
Once you do that and reindex the documents, your query returns the documents whose status_change.before matches the_before_status, irrespective of the value under status_change.after. Note that the term clause should target the leaf field, status_change.before.status_type, since status_change.before is itself an object.
For further filtering based on status_change.after, add another must clause on that field to return the documents you are looking for.
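A minimal sketch of what the combined before/after query could look like, built as a Python dict so it can be passed to whatever client you use. The function name status_change_query is made up for this example, and it assumes status_type is indexed as an exact (keyword) value so that a term clause matches the whole string.

```python
import json

def status_change_query(before, after):
    """Build a nested query for documents that changed from `before` to `after`.

    Assumes `status_change` is mapped as nested and that `status_type`
    is an exact-value (keyword) field.
    """
    return {
        "query": {
            "nested": {
                "path": "status_change",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"status_change.before.status_type": before}},
                            {"term": {"status_change.after.status_type": after}},
                        ]
                    }
                },
            }
        }
    }

body = status_change_query("Status One", "Status Two")
print(json.dumps(body, indent=2))
```

Both clauses sit inside the same nested query, so they are evaluated against the same status_change object.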
If you are using Object Type
Simply remove the enabled: false setting from your mapping (with it, the field is stored but not indexed, so it cannot be searched) and change your query to the below:
POST <your_index_name>/_search
{
"query": {
"bool": {
"filter": [
{
"match": {
"status_change.before.status_type": "the_before_status"
}
}
]
}
}
}
Hope that helps!

Matching multiple values in same field

I have a "routes" field mapped as long, in which I store an array of values (example 1: [5463, 3452]; example 2: [5467, 3452]). With the following query I want to retrieve only the documents that contain both 5463 and 3452 in the same record:
GET /flight_routes/_search
{
"query": {
"bool": {
"filter": {
"terms": {
"routes": [5463, 3452]
}
}
}
}
}
But it returns documents that match either one of the values. Do I have to migrate the mapping to nested to handle this, or is there another way to do it in the query itself?
You can use the terms_set query with a minimum_should_match_script that returns the number of terms that must match (here, the length of the terms array):
POST /flight_routes/_search
{
"query": {
"terms_set": {
"routes" : {
"terms" : [5463, 3452],
"minimum_should_match_script": {
"source": "params.nb_terms",
"params": {
"nb_terms": 2
}
}
}
}
}
}
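To see why terms returns both records while terms_set returns only one, here is a small Python simulation of the two matching rules. The functions terms_match and terms_set_match are illustrative stand-ins, not Elasticsearch APIs.

```python
def terms_match(doc_values, terms):
    """`terms` semantics: the document matches if ANY listed term is present."""
    return any(t in doc_values for t in terms)

def terms_set_match(doc_values, terms, minimum_should_match):
    """`terms_set` semantics: at least `minimum_should_match` terms must be present."""
    matched = sum(1 for t in set(terms) if t in doc_values)
    return matched >= minimum_should_match

# The two example records from the question, keyed by a made-up document id.
docs = {1: [5463, 3452], 2: [5467, 3452]}
wanted = [5463, 3452]

any_hits = [i for i, routes in docs.items() if terms_match(routes, wanted)]
all_hits = [i for i, routes in docs.items()
            if terms_set_match(routes, wanted, len(wanted))]

print(any_hits)  # [1, 2]
print(all_hits)  # [1]
```

Setting the minimum to the number of requested terms is what turns "any of" into "all of".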

Elasticsearch script query involving root and nested values

Suppose I have a simplified Organization document with nested publication values like so (ES 2.3):
{
"organization" : {
"dateUpdated" : 1395211600000,
"publications" : [
{
"dateCreated" : 1393801200000
},
{
"dateCreated" : 1401055200000
}
]
}
}
I want to find all Organizations that have a publication dateCreated < the organization's dateUpdated:
{
"query": {
"nested": {
"path": "publications",
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['publications.dateCreated'].value < doc['dateUpdated'].value"
}
}
]
}
}
}
}
}
My problem is that when I perform a nested query, the nested query does not have access to the root document values, so doc['dateUpdated'].value is invalid and I get 0 hits.
Is there a way to pass a value into the nested query? Or is my nested approach completely off here? I would like to avoid creating a separate document just for publications if possible.
Thanks.
You cannot access root values from a nested query context, because nested objects are indexed as separate documents. From the documentation:
The nested clause “steps down” into the nested comments field. It no
longer has access to fields in the root document, nor fields in any
other nested document.
You can get the desired results with the copy_to parameter. Another option is include_in_parent or include_in_root, but these may be deprecated in the future, and they increase the index size because every field of the nested type is included in the root document. In this case copy_to is the better choice.
This is a sample index
PUT nested_index
{
"mappings": {
"blogpost": {
"properties": {
"rootdate": {
"type": "date"
},
"copy_of_nested_date": {
"type": "date"
},
"comments": {
"type": "nested",
"properties": {
"nested_date": {
"type": "date",
"copy_to": "copy_of_nested_date"
}
}
}
}
}
}
}
Here every value of nested_date is copied to copy_of_nested_date, so copy_of_nested_date will look something like [1401055200000, 1393801200000, 1221542100000], and you can then use a simple query like this to get the results:
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['rootdate'].value < doc['copy_of_nested_date'].value"
}
}
]
}
}
}
You don't have to change your nested structure, but you do have to reindex the documents after adding copy_to to the publications' dateCreated field.
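Here is a rough Python sketch of why a single comparison on the copied field can answer "does any publication predate dateUpdated". It rests on an assumption worth verifying for your version: that doc['field'].value on a multi-valued field yields the smallest value, because doc values are kept sorted. The helper names are made up, and the comparison is written in the direction the question asks for (dateCreated < dateUpdated).

```python
def copy_to_values(publications):
    """Mimic copy_to: gather every nested dateCreated into one sorted root-level list."""
    return sorted(p["dateCreated"] for p in publications)

def has_publication_before_update(org):
    """True if at least one publication was created before the organization's dateUpdated."""
    copied = copy_to_values(org["publications"])
    if not copied:
        return False
    # ASSUMPTION: doc['copied_field'].value returns the first (smallest) value.
    first_value = copied[0]
    return first_value < org["dateUpdated"]

# The organization document from the question.
org = {
    "dateUpdated": 1395211600000,
    "publications": [
        {"dateCreated": 1393801200000},
        {"dateCreated": 1401055200000},
    ],
}

print(has_publication_before_update(org))  # True
```

If some publication is older than dateUpdated, then the oldest one is too, so comparing against the minimum is enough for an "any" condition.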

Elasticsearch Nested Query matching and/or

For example, I have data containing the following:
{
author: "test",
books: [
{
name: "first book",
cost: 50
},
{
name: "second book",
cost: 100
}
]
}
I want to find the authors for whom ALL books have cost > 40. What would the query for that look like? The books field is mapped as a nested property.
To get author names where at least one book costs more than 40 (in hits), a query like the following would work:
POST http://192.168.0.68:9200/library/Book/_search
{
"fields": ["author"],
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "books",
"filter": {
"range": {
"books.cost": {
"gt": 40
}
}
}
}
}
}
}
}
To require that all books cost more than 40, I had to process the collection of nested objects manually on the client side after getting the response.
I am not sure whether a script could be used to apply the filter to all nested objects.
Reference
document not in nested documents elasticsearch
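One way to express "all books cost more than 40" without client-side processing is to invert the condition: require at least one book, and exclude authors that have any book costing 40 or less. The sketch below is an assumption on my part rather than something tested against this index; it uses the bool query syntax of later Elasticsearch versions instead of the filtered query shown above, and the function names are made up. The matches function simulates the same logic in plain Python.

```python
def all_books_cost_more_than(min_cost):
    """Build a query: has at least one book AND has no book with cost <= min_cost."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"nested": {"path": "books",
                                "query": {"exists": {"field": "books.cost"}}}}
                ],
                "must_not": [
                    {"nested": {"path": "books",
                                "query": {"range": {"books.cost": {"lte": min_cost}}}}}
                ],
            }
        }
    }

def matches(doc, min_cost):
    """Plain-Python equivalent of the query logic above."""
    books = doc["books"]
    has_cheap_book = any(b["cost"] <= min_cost for b in books)
    return len(books) > 0 and not has_cheap_book

# The example document from the question.
doc = {
    "author": "test",
    "books": [
        {"name": "first book", "cost": 50},
        {"name": "second book", "cost": 100},
    ],
}

print(matches(doc, 40))  # True: both books cost more than 40
print(matches(doc, 60))  # False: "first book" costs 50
```

Without the must clause, authors with no books at all would also match, since they trivially have no cheap book.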

ElasticSearch filter by array item

I have the following record in ES:
"authInput" : {
"uID" : "foo",
"userName" : "asdfasdfasdfasdf",
"userType" : "External",
"clientType" : "Unknown",
"authType" : "Redemption_regular",
"uIDExtensionFields" :
[
{
"key" : "IsAccountCreation",
"value" : "true"
}
],
"externalReferences" : []
}
"uIDExtensionFields" is an array of key/value pairs. I want to query ES to find all records where:
"uIDExtensionFields.key" = "IsAccountCreation"
AND "uIDExtensionFields.value" = "true"
This is the filter that I think I should be using but it never returns any data.
GET devdev/authEvent/_search
{
"size": 10,
"filter": {
"and": {
"filters": [
{
"term": {
"authInput.uIDExtensionFields.key" : "IsAccountCreation"
}
},
{
"term": {
"authInput.uIDExtensionFields.value": "true"
}
}
]
}
}
}
Any help you guys could give me would be much appreciated.
Cheers!
UPDATE: With the help of the responses below, here is how I solved my problem:
1. Lowercased the value that I was searching for (changed "IsAccountCreation" to "isaccountcreation").
2. Updated the mapping so that "uIDExtensionFields" is a nested type.
3. Updated my filter to the following:
GET devhilden/authEvent/_search
{
"size": 10,
"filter": {
"nested": {
"path": "authInput.uIDExtensionFields",
"query": {
"bool": {
"must": [
{
"term": {
"authInput.uIDExtensionFields.key": "isaccountcreation"
}
},
{
"term": {
"authInput.uIDExtensionFields.value": "true"
}
}
]
}
}
}
}
}
There are a few things probably going wrong here.
First, as mconlin points out, you probably have a mapping with the standard analyzer for your key field. It'll lowercase the key. You probably want to specify "index": "not_analyzed" for the field.
Secondly, you'll have to use nested mappings for this document structure and specify the key and the value in a nested filter. That's because otherwise, you'll get a match for the following document:
"uIDExtensionFields" : [
{
"key" : "IsAccountCreation",
"value" : "false"
},
{
"key" : "SomeOtherField",
"value" : "true"
}
]
Thirdly, you'll want to use the bool filter's must rather than the and filter, to ensure proper cacheability.
Lastly, you'll want to put your filter in the filtered-query. The top-level filter is for when you want hits to be filtered, but facets/aggregations to not be. That's why it's renamed to post_filter in 1.0.
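The first point is easy to reproduce outside Elasticsearch. The sketch below is a deliberately crude stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase) and for a term lookup; standard_analyze and term_query are illustrative names, not real APIs.

```python
import re

def standard_analyze(text):
    """Crude approximation of the standard analyzer: tokenize, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def term_query(indexed_tokens, term):
    """A term query looks up the exact term; the search text is not analyzed."""
    return term in indexed_tokens

analyzed = standard_analyze("IsAccountCreation")  # ['isaccountcreation']
not_analyzed = ["IsAccountCreation"]              # stored verbatim

print(term_query(analyzed, "IsAccountCreation"))      # False
print(term_query(analyzed, "isaccountcreation"))      # True
print(term_query(not_analyzed, "IsAccountCreation"))  # True
```

This is why the original filter returned nothing: the indexed token was lowercased, but the term in the filter was not. Either lowercase the search term or keep the field not_analyzed.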
Here are a few resources you'll want to check out:
Troubleshooting Elasticsearch searches, for Beginners covers the first two issues.
Managing Relations in ElasticSearch covers nested docs (and parent/child)
all about elasticsearch filter bitsets covers and vs. bool.

Resources