Score keyword terms query on nested fields in elastichsearch 6.3 - elasticsearch

I have a set of keywords (skills in my example) and I would like to retrieve documents which match most of them. The documents should be sorted by how many of the keywords they match. The field i am searching into (skills) is of nested type. The index has the following mapping:
{
"mappings": {
"profiles": {
"properties": {
"id": {
"type": "keyword"
},
"skills": {
"type": "nested",
"properties": {
"level": {
"type": "float"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
}
I tried both a terms query on the keyword field like:
{
"query": {
"nested": {
"path": "skills",
"query": {
"terms": {
"skills.name": [
"python",
"java"
]
}
}
}
}
}
And a boolean query
{
"query": {
"nested": {
"path": "skills",
"query": {
"bool": {
"should": [
{
"terms": {
"skills.name": [
"java"
]
}
},
{
"terms": {
"skills.name": [
"r"
]
}
}
]
}
}
}
}
}
For both queries the maximum score of the returned documents is 1. Thus both return documents that have ANY of the skills, but do not sort them such those with both skills are on top. The issues seems to be that skills is a nested field.

The second query works if each element of should is a nested query.
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "skills",
"query": {
"terms": {
"skills.name": [
"java"
]
}
}
}
},
{
"nested": {
"path": "skills",
"query": {
"terms": {
"skills.name": [
"r"
]
}
}
}
}
]
}
}
}

Related

Get the count of all the documents including innerHits in elasticsearch

I have an index defined in Elasticsearch which has 3 level of hierarchy relation defined.
aggParent
aggChildL1
aggChildL0
Below is the mapping for that index.
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 0
}
},
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"deviceName": {
"type": "keyword"
},
"agg_relation_type": {
"type": "join",
"relations": {
"aggParent": "aggChildL1",
"aggChildL1": "aggChildL0"
}
}
}
}
}
I have written a query that will return parent documents in the hits and the corresponding children in the innerHits.
Following is the query
{
"size": 1,
"query": {
"bool": {
"should": [
{
"has_child": {
"type": "aggChildL1",
"query": {
"bool": {
"should": [
{
"has_child": {
"type": "aggChildL0",
"query": {
"match": {
"id": "nc1olt5onu1unia"
}
},
"inner_hits": {
}
}
},
{
"bool": {
"must": [
{
"match": {
"id": "nc1olt5onu1unia"
}
},
{
"match": {
"agg_relation_type": "aggChildL1"
}
}
]
}
}
]
}
},
"inner_hits": {
"size": 64,
"sort": [
{
"deviceType": {
"order": "desc"
}
}
]
}
}
},
{
"bool": {
"must": [
{
"match": {
"id": "nc1olt5onu1unia"
}
},
{
"match": {
"agg_relation_type": "aggParent"
}
}
]
}
},
{
"bool": {
"must_not": {
"exists": {
"field": "agg_relation_type"
}
},
"must": [
{
"match": {
"id": "nc1olt5onu1unia"
}
}
]
}
}
]
}
}
}
This query returns a count at the top level with only the count of total aggParent documents.
I need to get the count at the inner hits level as well.
The count of all matching documents at the aggChildL0 level and then the count of all documents that gets loaded at the aggChildL1 level based on the has_child query and then the count of documents that match the filter on the aggChildL1 level.
Similarly the count of all documents that get loaded at aggParent level based on the top most has_child query and then the count of documents that match the filter on the aggParent level.
Basically the total count of all the documents that can be returned with the query.
Is there any way of getting the total count in ES?

ElasticSearch should with nested and bool must_not exists

With the following mapping:
"categories": {
"type": "nested",
"properties": {
"category": {
"type": "integer"
},
"score": {
"type": "float"
}
}
},
I want to use the categories field to return documents that either:
have a score above a threshold in a given category, or
do not have the categories field
This is my query:
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
<id>
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "categories"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
It correctly returns documents both with and without the categories field, and orders the results so the ones I want are first, but it doesn't filter the results having score below the 0.5 threshold.
Great question.
That is because categories is not exactly a field from the elasticsearch point of view[a field on which inverted index is created and used for querying/searching] but categories.category and categories.score is.
As a result categories being not found in any document, which is actually true for all the documents, you observe the result what you see.
Modify the query to the below and you'd see your use-case working correctly.
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
"100"
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [ <----- Note this
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "categories.category"
}
},
{
"exists": {
"field": "categories.score"
}
}
]
}
}
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}

Elasticsearch for index array element

Hi i want to search array element from index using elastic search query
{
"name": "Karan",
"address": [
{
"city": "newyork",
"zip": 12345
},
{
"city": "mumbai",
"zip": 23456
}]
}}
when i am trying to search using match query it does not work
{
"query": {
"bool": {
"must": [
{
"match": {
"address.city": "newyork"
}
}
]
}
}
}
when i access simple feild like "name": "Karan" it works, there is only issue for array element.
Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them:
GET /my_index/blogpost/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "eggs"
}
},
{
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{
"match": {
"comments.name": "john"
}
},
{
"match": {
"comments.age": 28
}
}
]
}
}
}
}
]
}}}
See the docs
The way i followed..
Mapping :
{
"mappings": {
"job": {
"properties": {
"name": {
"type": "text"
},
"skills": {
"type": "nested",
"properties": {
"value": {
"type": "text"
}
}
}
}
}
}
Records
[{"_index":"jobs","_type":"job","_id":"2","_score":1.0,"_source":{"name":"sr soft eng","skills":[{"value": "java"}, {"value": "oracle"}]}},{"_index":"jobs","_type":"job","_id":"1","_score":1.0,"_source":{"name":"sr soft eng","skills":[{"value": "java"}, {"value": "oracle"}, {"value": "javascript"}]}},
search Query
{
"query": {
"nested": {
"path": "skills",
"query": {
"bool": {
"must": [
{ "match": {"skills.value": "java"}}
]
}
}
}
}
}

How to improve inner_hits in Elasticsearch

I have two ES_TYPEs in my_index
user
user_property
One is defined as parent (user) and another as child (user_property)
user_property has following mapping:
PUT /my_index/_mapping/user_property
{
"user_property": {
"properties": {
"name": {
"type": "keyword",
},
"value": {
"type": "keyword"
}
}
}
}
I want to get all users having some properties (say property1, property2) along with their properties value, so to do this I create following query with inner_hits but query response time is exponentially large with inner_hits.
GET /my_index/user/_search
{
"query": {
"bool": {
"must": [
{
"has_child": {
"type": "user_property",
"query": {
"bool": {
"must": [
{
"term": {
"name": "property1"
}
}
]
}
},
"inner_hits": {
"name": "inner_hits_1"
}
}
},
{
"has_child": {
"type": "user_property",
"query": {
"bool": {
"must": [
{
"term": {
"name": "property2"
}
}
]
}
},
"inner_hits": {
"name": "inner_hits_2"
}
}
}
]
}
}
}
Is there any way to reduce this time ?

elastic exists query for nested documents

I have a nested documents as:
"someField": "hello",
"users": [
{
"name": "John",
"surname": "Doe",
"age": 2
}
]
according to this https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html, the above should match:
GET /_search
{
"query": {
"exists" : { "field" : "users" }
}
}
whereas the following should not,
"someField": "hello",
"users": []
but unfortunately both do not match. any ideas?
The example mentioned on the Elasticsearch blog refers to string and array of string types, not for nested types.
The following query should work for you:
{
"query": {
"nested": {
"path": "users",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "users"
}
}
]
}
}
}
}
}
Also, you can refer to this issue for more info, which discusses this usage pattern.
This works for me
GET /type/_search?pretty=true
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "outcome",
"query": {
"exists": {
"field": "outcome.outcomeName"
}
}
}
}
]
}
}
}
With the following index mapping:
{
"index_name": {
"mappings": {
"object_name": {
"dynamic": "strict",
"properties": {
"nested_field_name": {
"type": "nested",
"properties": {
"some_property": {
"type": "keyword"
}
}
}
}
}
}
}
}
I needed to use this query:
GET /index_name/_search
{
"query": {
"nested": {
"path": "nested_field_name",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "nested_field_name.some_property"
}
}
]
}
}
}
}
}
Elasticsearch version 5.4.3
The answer from user3775217 has worked for me but I needed to tweak it to work as expected for must_not. Essentially the bool/must needed to be wrapped around the nested portion of the query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "users",
"query": {
"exists": {
"field": "users"
}
}
}
}
}
]
}
}

Resources