Elasticsearch - object type, search nested elements

I'm using Elasticsearch version 7.2.0 and have the following type in my mapping:
"status_change": {
"type": "object",
"enabled": false
},
Example of the data inside this field:
"before": {
"status_type": "Status One"
},
"after": {
"status_type": "Status Two"
}
There are various status_types, and I am trying to write a query that finds changes from a specified before status_type to a specified after status_type. I'm struggling to query these nested elements, and the following query fails:
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"nested": {
"path": "status_change",
"query": {
"term": {
"status_change.before": "the_before_status"
}
}
}
}
}
}
}
With:
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "[nested] nested object under path [status_change] is not of nested type"
}
This is self-explanatory, since status_change is not a 'nested' field. I have looked into searching objects by their nested elements but have not found a solution.
Is there a way for me to search the nested object elements like this?

Your field is mapped as an object type, not a nested type. The two behave differently, and I'd suggest reading the documentation on both.
If you are using Nested Type
Change your mapping to the below:
"status_change": {
"type": "nested"
}
Once you do that and reindex the documents, your query (with the field path corrected to status_change.before.status_type) returns the documents whose before status matches the_before_status, irrespective of the value under status_change.after.
To filter on status_change.after as well, add another must clause on status_change.after.status_type inside the nested query.
If you are using Object Type
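Remove the enabled: false setting from your mapping (with it set, Elasticsearch keeps the field in _source but does not parse or index it, so it cannot be searched), reindex the documents, and change your query to the below: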
POST <your_index_name>/_search
{
"query": {
"bool": {
"filter": [
{
"match": {
"status_change.before.status_type": "the_before_status"
}
}
]
}
}
}
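To cover the actual goal of the question (a specific before status and a specific after status), here is a rough Python sketch that builds the search body for either mapping choice. The helper name transition_query is made up for illustration, and match_phrase is my own assumption to avoid "Status One" also matching "Status Two" on the shared word; if your status_type fields have a keyword sub-field, a term query on that would be the stricter option.

```python
import json


def transition_query(before, after, nested=False, path="status_change"):
    """Build a search body for one before -> after status transition.

    Hypothetical helper: assumes the field layout from the question
    (<path>.before.status_type and <path>.after.status_type).
    """
    clauses = [
        {"match_phrase": {f"{path}.before.status_type": before}},
        {"match_phrase": {f"{path}.after.status_type": after}},
    ]
    if nested:
        # Nested mapping: both clauses must hold inside the nested query.
        inner = {"nested": {"path": path, "query": {"bool": {"must": clauses}}}}
        return {"query": {"bool": {"filter": [inner]}}}
    # Object mapping: plain dotted paths, no nested wrapper.
    return {"query": {"bool": {"filter": clauses}}}


if __name__ == "__main__":
    print(json.dumps(transition_query("Status One", "Status Two"), indent=2))
```

Pass nested=True only if status_change is actually mapped as nested; otherwise you get the same illegal_state_exception as in the question.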
Hope that helps!

Related

Elasticsearch : Sorting an array of objects

I am new to elasticsearch.
The documents I have look something like this :
user_name : "test",
piInfo: {
profile: {
word_count: 535,
word_count_message: "There were 535 words in the input.",
processed_language: "en",
personality: [
{
name: "Openness",
category: "personality",
percentile: 0.0015293409544490655,
children: []
},
{
name: "Conscientiousness",
category: "personality",
percentile: 0.6803430984001135,
children: []
}
]
}
}
What I am trying to do is sort users (user_name) by the "percentile" of a given personality trait (for example, "Openness").
What I came up with so far, based on the Elasticsearch docs on the nested datatype and on sorting within nested objects, is this code:
"query": {
"nested": {
"path": "piInfo.profile.personality",
"query": {
"bool": {
"must":
{ "match": { "piInfo.profile.personality.name": "Openness" }}
}
}
}
},
"sort" : {
"piInfo.profile.personality.percentile" : {
"order" : "asc",
"nested": {
"path": "piInfo.profile.personality",
"filter": {
"match": { "piInfo.profile.personality.name": "Openness" }
}
}
}
}
I got this error:
[nested] nested object under path [piInfo.profile.personality] is not of nested type
That makes sense, because I didn't map it. I am taking data from an API and storing it as is.
Is there a way around that?
You are getting that error because nested queries only work on nested types. In other words, you need to declare "personality" as data type nested in your index mapping before you load any data into it. The default type is object. Making objects nested means Elasticsearch stores them in such a way that the relationship between the fields in each individual object is not lost. If you don't do that, each field is indexed independently.
More info here:
Elasticsearch Nested Datatype
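As a rough illustration, the mapping and the sort request could be built like this in Python. This is only a sketch: the function names are made up, the mapping uses the typeless 7.x layout (older versions wrap properties in a type name), and the sort syntax is the one from the question.

```python
import json

PATH = "piInfo.profile.personality"


def personality_mapping():
    """Index-creation body that declares personality as nested."""
    return {
        "mappings": {
            "properties": {
                "piInfo": {
                    "properties": {
                        "profile": {
                            "properties": {
                                # Only this field needs an explicit type;
                                # the rest can stay dynamically mapped.
                                "personality": {"type": "nested"}
                            }
                        }
                    }
                }
            }
        }
    }


def sort_by_trait(trait, order="asc"):
    """Search body: keep users having `trait`, sort by its percentile."""
    trait_filter = {"match": {f"{PATH}.name": trait}}
    return {
        "query": {"nested": {"path": PATH, "query": trait_filter}},
        "sort": [
            {
                f"{PATH}.percentile": {
                    "order": order,
                    "nested": {"path": PATH, "filter": trait_filter},
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(personality_mapping(), indent=2))
    print(json.dumps(sort_by_trait("Openness"), indent=2))
```

The mapping has to be in place before indexing, so data already stored with the default object type needs to be reindexed into the new index.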

Elasticsearch: Search in an array of JSONs

I'm using Elasticsearch with the Python library, and I have a problem with the search query when the object becomes a little complex. The objects in my index are built like this:
{
"id" : 120,
"name": bob,
"shared_status": {
"post_id": 123456789,
"text": "This is a sample",
"urls" : [
{
"url": "http://test.1.com",
"displayed_url": "test.1.com"
},
{
"url": "http://blabla.com",
"displayed_url": "blabla.com"
}
]
}
}
Now I want a query that returns this document only if one of the displayed URLs contains the substring "test" and the document has a "text" field. So I wrote this query:
{
"query": {
"bool": {
"must": [
{"exists": {"field": "text"}}
]
}
}
}
But I don't know what query to add for the part about one of the displayed URLs containing the substring "test".
Is that possible? How does iteration over the list work?
If you didn't define an explicit mapping for your schema, Elasticsearch creates a default mapping based on the input data:
urls will be of type object
displayed_url will be of type string, analyzed with the standard analyzer
As you don't need any association between url and displayed_url, the current schema will work fine.
You can use a match query for a full-text match:
GET _search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "text"
}
},
{
"match": {
"urls.displayed_url": "test"
}
}
]
}
}
}
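On the "iteration" part of the question: with the default object type there is nothing to iterate over at query time, because the array of objects is indexed as multi-valued fields under dotted paths. The Python sketch below only imitates that flattening on the sample document; flatten is a made-up helper, not part of any Elasticsearch client.

```python
from collections import defaultdict


def flatten(doc):
    """Collect every leaf value under its dotted path.

    Rough imitation of how object fields (including arrays of objects)
    end up as multi-valued fields in the index.
    """
    fields = defaultdict(list)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Array items share the parent's path; positions are lost.
            for item in value:
                walk(item, path)
        else:
            fields[path].append(value)

    walk(doc, "")
    return dict(fields)


doc = {
    "id": 120,
    "name": "bob",
    "shared_status": {
        "post_id": 123456789,
        "text": "This is a sample",
        "urls": [
            {"url": "http://test.1.com", "displayed_url": "test.1.com"},
            {"url": "http://blabla.com", "displayed_url": "blabla.com"},
        ],
    },
}

fields = flatten(doc)
for path, values in fields.items():
    print(path, values)
```

Note that the paths come out as shared_status.text and shared_status.urls.displayed_url, which is why the full dotted path is needed in the exists and match clauses.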

Elasticsearch script query involving root and nested values

Suppose I have a simplified Organization document with nested publication values like so (ES 2.3):
{
"organization" : {
"dateUpdated" : 1395211600000,
"publications" : [
{
"dateCreated" : 1393801200000
},
{
"dateCreated" : 1401055200000
}
]
}
}
I want to find all Organizations that have a publication dateCreated < the organization's dateUpdated:
{
"query": {
"nested": {
"path": "publications",
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['publications.dateCreated'].value < doc['dateUpdated'].value"
}
}
]
}
}
}
}
}
My problem is that when I perform a nested query, the nested query does not have access to the root document values, so doc['dateUpdated'].value is invalid and I get 0 hits.
Is there a way to pass in a value into the nested query? Or is my nested approach completely off here? I would like to avoid creating a separate document just for publications if necessary.
Thanks.
You cannot access root values from a nested query context, because nested objects are indexed as separate documents. From the documentation:
The nested clause “steps down” into the nested comments field. It no
longer has access to fields in the root document, nor fields in any
other nested document.
You can get the desired results with the copy_to parameter. Another option is include_in_parent or include_in_root, but these might be deprecated in the future, and they increase the index size because every field of the nested type is also included in the root document. In this case copy_to is the better fit.
This is a sample index
PUT nested_index
{
"mappings": {
"blogpost": {
"properties": {
"rootdate": {
"type": "date"
},
"copy_of_nested_date": {
"type": "date"
},
"comments": {
"type": "nested",
"properties": {
"nested_date": {
"type": "date",
"copy_to": "copy_of_nested_date"
}
}
}
}
}
}
}
Here every value of nested_date is copied to copy_of_nested_date, so copy_of_nested_date will look something like [1401055200000,1393801200000,1221542100000], and you can then use a simple query like this to get the results:
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": "doc['rootdate'].value < doc['copy_of_nested_date'].value"
}
}
]
}
}
}
You don't have to change your nested structure, but you do have to reindex the documents after adding copy_to to publications.dateCreated.
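To make the copy_to idea concrete for the question's goal (some publication dateCreated earlier than the organization's dateUpdated), here is a plain Python imitation. Nothing below talks to Elasticsearch; both function names are made up, and the "first value is the smallest" behaviour of .value on a multi-valued numeric field is my understanding of how doc values are ordered, so treat it as an assumption.

```python
def apply_copy_to(org):
    """Imitate copy_to: gather nested dateCreated values at the root."""
    root = dict(org)
    root["copy_of_nested_date"] = sorted(
        pub["dateCreated"] for pub in org.get("publications", [])
    )
    return root


def has_publication_before_update(org):
    """Imitate the root-level script filter on the copied field."""
    root = apply_copy_to(org)
    dates = root["copy_of_nested_date"]
    # Assumption: .value yields the smallest copied date, so comparing it
    # with dateUpdated answers "is ANY publication older than the update?".
    return bool(dates) and dates[0] < root["dateUpdated"]


organization = {
    "dateUpdated": 1395211600000,
    "publications": [
        {"dateCreated": 1393801200000},
        {"dateCreated": 1401055200000},
    ],
}

print(has_publication_before_update(organization))
```

The comparison direction matters: putting the copied nested date on the left and the root date on the right matches the "publication created before the update" condition from the question.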

Elasticsearch Nested Filters being inclusive vs. exclusive

I have an object mapping that uses nested objects (props in our example) in a tag-like fashion.
Each tag can belong to a client/user, and we want to allow our users to run query_string style searches against props.name.
The issue is that when an object has multiple props and one of them matches the filter while others don't, the object is returned. We want the opposite: if any one fails to match, don't return the object, rather than returning it as soon as one matches.
I have posted a comprehensive example here: https://gist.github.com/d2kagw/1c9d4ef486b7a2450d95
Thanks in advance.
I believe what you need here is a flattened list of values, like an array of values. The major difference between an array and nested objects is that the latter "knows" which value of one nested property corresponds to which value of another property in the same nested object. An array of values, on the other hand, flattens the values of a given property, and you lose the association between a client_id and a name. That is, with arrays you have props.client_id = [null, 2] and props.name = ["petlover", "premiumshopper"].
With your nested filter you want to match that string against all values of props.name, meaning ALL nested props.name values of one parent doc need to match. This doesn't happen with nested objects, because the nested documents are separate and are queried separately. If at least one nested document matches, the parent is considered a match.
In other words, a query like "query": "props.name:(carlover NOT petlover)" basically needs to be run against a flattened list of values, just like an array. You need that query run against ["carlover", "petlover"].
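The difference can be imitated in a few lines of Python. This is only a toy model of the matching semantics, with made-up function names and made-up sample props (the real data is in the gist), not how Elasticsearch evaluates query_string internally.

```python
def matches(names, required, forbidden):
    """True if names holds every required term and no forbidden term."""
    names = set(names)
    return required <= names and not (forbidden & names)


def nested_match(props, required, forbidden):
    """Nested semantics: each prop is checked alone, one hit is enough."""
    return any(matches([p["name"]], required, forbidden) for p in props)


def flattened_match(props, required, forbidden):
    """Flattened semantics: check the combined list of names once."""
    return matches([p["name"] for p in props], required, forbidden)


# Hypothetical parent document with two props.
props = [
    {"client_id": 1, "name": "carlover"},
    {"client_id": None, "name": "petlover"},
]

# props.name:(carlover NOT petlover)
print(nested_match(props, {"carlover"}, {"petlover"}))
print(flattened_match(props, {"carlover"}, {"petlover"}))
```

The nested check accepts the parent because the carlover prop on its own contains no petlover, while the flattened check rejects it, which is the behaviour asked for.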
My suggestion is to set "include_in_parent": true on your nested documents (meaning the parent keeps a flattened, array-like list of values) and to change the queries a bit:
for the query_string part, use the flattened properties so the query is matched against the combined list of elements, not element by element.
for the match (or term, see below) and missing parts, use the nested properties, because you can have nulls in there. A missing filter on an array matches only if the whole array is missing, not a single value in it, so the flattened approach used for the query_string does not work here.
optionally, for the integer match I would use term, since the field is an integer rather than a string and is not_analyzed by default.
That said, these are the changes:
{
"mappings" : {
...
"props": {
"type": "nested",
"include_in_parent": true,
...
should (and does) return zero results
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{
"query": {
"query_string": { "query": "props.name:((carlover AND premiumshopper) NOT petlover)" }
}
},
{
"nested": {
"path": "props",
"filter": {
"or": [ { "query": { "match": { "props.client_id": 1 } } }, { "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 1
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{"query": {"query_string": { "query": "props.name:(carlover NOT petlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "match": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 2
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{ "query": {"query_string": { "query": "props.name:(* NOT carlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "term": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } }
]
}
}
}
]
}
}
}
}

Elastic search multiple terms in a dictionary

I have mapping like:
"profile": {
"properties": {
"educations": {
"properties": {
"university": {
"type": "string"
},
"graduation_year": {
"type": "string"
}
}
}
}
}
which holds the education history of people. Each person can have multiple educations. What I want to do is search for people who graduated from "SFU" in "2012". To do that I am using a filtered search:
"filtered": {
"filter": {
"and": [
{
"term": {
"educations.university": "SFU"
}
},
{
"term": {
"educations.graduation_year": "2012"
}
}
]
}
}
But what this query does is find documents that have "SFU" and "2012" anywhere in their educations, so this document would match, which is wrong:
educations[0] = {"university": "SFU", "graduation_year": 2000}
educations[1] = {"university": "UBC", "graduation_year": 2012}
Is there any way I could filter on both terms within each education?
You need to define educations as a nested type and use a nested filter to query it; otherwise Elasticsearch internally flattens the inner objects into a single object and returns the wrong results.
You can refer to these for detailed explanations and samples:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
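To see why the flattening produces the false positive from the question, here is a small Python model of the two behaviours. It is a simulation only (function names are made up, and graduation_year is written as a string to follow the mapping), not Elasticsearch code.

```python
educations = [
    {"university": "SFU", "graduation_year": "2000"},
    {"university": "UBC", "graduation_year": "2012"},
]


def object_match(items, university, year):
    """Default object type: each field becomes one bag of values."""
    universities = {e["university"] for e in items}
    years = {e["graduation_year"] for e in items}
    # The two conditions are checked against separate bags, so they
    # can be satisfied by two different educations.
    return university in universities and year in years


def nested_match(items, university, year):
    """Nested type: both conditions must hold for the same education."""
    return any(
        e["university"] == university and e["graduation_year"] == year
        for e in items
    )


print(object_match(educations, "SFU", "2012"))
print(nested_match(educations, "SFU", "2012"))
```

With the object-style check the person matches even though nobody graduated from SFU in 2012; the nested-style check keeps each university paired with its own year.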

Resources