Get the count of all the documents including innerHits in Elasticsearch

I have an index defined in Elasticsearch with a three-level hierarchy of join relations:
aggParent
aggChildL1
aggChildL0
Below is the mapping for that index.
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 0
}
},
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"deviceName": {
"type": "keyword"
},
"agg_relation_type": {
"type": "join",
"relations": {
"aggParent": "aggChildL1",
"aggChildL1": "aggChildL0"
}
}
}
}
}
I have written a query that returns parent documents in the hits and the corresponding children in the inner hits.
The query is as follows:
{
"size": 1,
"query": {
"bool": {
"should": [
{
"has_child": {
"type": "aggChildL1",
"query": {
"bool": {
"should": [
{
"has_child": {
"type": "aggChildL0",
"query": {
"match": {
"id": "nc1olt5onu1unia"
}
},
"inner_hits": {
}
}
},
{
"bool": {
"must": [
{
"match": {
"id": "nc1olt5onu1unia"
}
},
{
"match": {
"agg_relation_type": "aggChildL1"
}
}
]
}
}
]
}
},
"inner_hits": {
"size": 64,
"sort": [
{
"deviceType": {
"order": "desc"
}
}
]
}
}
},
{
"bool": {
"must": [
{
"match": {
"id": "nc1olt5onu1unia"
}
},
{
"match": {
"agg_relation_type": "aggParent"
}
}
]
}
},
{
"bool": {
"must_not": {
"exists": {
"field": "agg_relation_type"
}
},
"must": [
{
"match": {
"id": "nc1olt5onu1unia"
}
}
]
}
}
]
}
}
}
This query returns a total at the top level that counts only the matching aggParent documents.
I need counts at the inner hits levels as well:
the count of all matching documents at the aggChildL0 level;
the count of all documents loaded at the aggChildL1 level through the has_child query, plus the count of documents that match the filter at the aggChildL1 level;
the count of all documents loaded at the aggParent level through the topmost has_child query, plus the count of documents that match the filter at the aggParent level.
In short, I need the total count of all the documents that the query can return.
Is there any way of getting this total count in Elasticsearch?
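One client-side approach, sketched here in Python, is to add up the totals the search response already carries: the top-level hits.total plus the hits.total of every inner_hits block, at any depth. The helper names (total_value, count_inner_hits, count_with_inner_hits) and the sample response are made up for illustration, and the sketch assumes the response has been parsed into a dict. It only sees inner hits attached to the parents actually returned, so with "size": 1 it covers one parent's subtree rather than the whole index.

```python
def total_value(total):
    """Normalize hits.total: a plain int before 7.0, {"value": n, ...} from 7.0 on."""
    if isinstance(total, dict):
        return total.get("value", 0)
    return total or 0


def count_inner_hits(hit):
    """Sum the totals of every inner_hits block under one hit, recursively."""
    count = 0
    for block in hit.get("inner_hits", {}).values():
        hits = block.get("hits", {})
        count += total_value(hits.get("total"))
        for child in hits.get("hits", []):
            count += count_inner_hits(child)
    return count


def count_with_inner_hits(response):
    """Top-level total plus the inner-hit totals of the returned top-level hits."""
    hits = response.get("hits", {})
    count = total_value(hits.get("total"))
    for hit in hits.get("hits", []):
        count += count_inner_hits(hit)
    return count


# Hypothetical, trimmed-down response: 1 parent, 2 aggChildL1, 3 aggChildL0.
sample = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [{
            "_id": "p1",
            "inner_hits": {"aggChildL1": {"hits": {
                "total": {"value": 2, "relation": "eq"},
                "hits": [
                    {"_id": "c1", "inner_hits": {"aggChildL0": {"hits": {
                        "total": {"value": 3, "relation": "eq"}, "hits": []}}}},
                    {"_id": "c2"},
                ],
            }}},
        }],
    }
}

print(count_with_inner_hits(sample))
```

Because an inner hit total can exceed the number of inner hits returned (the inner_hits size caps the latter), the totals are read from the total field rather than by counting list entries.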

Related

ElasticSearch should with nested and bool must_not exists

With the following mapping:
"categories": {
"type": "nested",
"properties": {
"category": {
"type": "integer"
},
"score": {
"type": "float"
}
}
},
I want to use the categories field to return documents that either:
have a score above a threshold in a given category, or
do not have the categories field
This is my query:
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
<id>
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "categories"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
It correctly returns documents both with and without the categories field, and orders the results so the ones I want are first, but it doesn't filter out the results with a score below the 0.5 threshold.
Great question.
That is because categories is not itself a field from Elasticsearch's point of view (that is, a field for which an inverted index is built and used for searching), whereas categories.category and categories.score are.
As a result, the exists query on categories finds nothing in any document, so the must_not clause holds for every document, and that is why documents are returned regardless of their score.
Change the query to the one below, which wraps the exists checks in a nested query inside must_not, and your use case should work correctly.
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
"100"
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "categories.category"
}
},
{
"exists": {
"field": "categories.score"
}
}
]
}
}
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
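The same request body can also be assembled programmatically. The Python sketch below builds it as a plain dict that mirrors the query above; the function name score_or_missing and its parameters are illustrative and not part of any client library.

```python
import json


def score_or_missing(path, category_ids, min_score):
    """Match docs with a qualifying nested entry, or with no nested entries at all."""
    has_qualifying_entry = {
        "nested": {
            "path": path,
            "query": {"bool": {"must": [
                {"terms": {f"{path}.category": category_ids}},
                {"range": {f"{path}.score": {"gte": min_score}}},
            ]}},
        }
    }
    # The exists checks run inside a nested query so they see the nested documents.
    has_any_entry = {
        "nested": {
            "path": path,
            "query": {"bool": {"must": [
                {"exists": {"field": f"{path}.category"}},
                {"exists": {"field": f"{path}.score"}},
            ]}},
        }
    }
    return {"query": {"bool": {
        "should": [has_qualifying_entry, {"bool": {"must_not": [has_any_entry]}}],
        "minimum_should_match": 1,
    }}}


body = score_or_missing("categories", [100], 0.5)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request against the index.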

Elasticsearch for index array element

Hi, I want to search for an array element in an index using an Elasticsearch query. This is the document:
{
"name": "Karan",
"address": [
{
"city": "newyork",
"zip": 12345
},
{
"city": "mumbai",
"zip": 23456
}]
}
When I try to search using a match query, it does not work:
{
"query": {
"bool": {
"must": [
{
"match": {
"address.city": "newyork"
}
}
]
}
}
}
When I access a simple field like "name": "Karan" it works; the issue occurs only with the array element.
Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them:
GET /my_index/blogpost/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "eggs"
}
},
{
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{
"match": {
"comments.name": "john"
}
},
{
"match": {
"comments.age": 28
}
}
]
}
}
}
}
]
}}}
See the docs
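Applied to the document in the question, and assuming the address field is mapped with "type": "nested" (the question does not show its mapping), the search body might be built like this in Python. The helper name nested_match is illustrative only.

```python
import json


def nested_match(path, field, value):
    """Wrap a match query on path.field in a nested query on path."""
    return {"nested": {"path": path, "query": {"match": {f"{path}.{field}": value}}}}


# Assumption: "address" is a nested field in the index mapping.
body = {"query": {"bool": {"must": [nested_match("address", "city", "newyork")]}}}
print(json.dumps(body, indent=2))
```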
The approach I followed:
Mapping:
{
"mappings": {
"job": {
"properties": {
"name": {
"type": "text"
},
"skills": {
"type": "nested",
"properties": {
"value": {
"type": "text"
}
}
}
}
}
}
}
Records:
[{"_index":"jobs","_type":"job","_id":"2","_score":1.0,"_source":{"name":"sr soft eng","skills":[{"value": "java"}, {"value": "oracle"}]}},
{"_index":"jobs","_type":"job","_id":"1","_score":1.0,"_source":{"name":"sr soft eng","skills":[{"value": "java"}, {"value": "oracle"}, {"value": "javascript"}]}}]
Search query:
{
"query": {
"nested": {
"path": "skills",
"query": {
"bool": {
"must": [
{ "match": {"skills.value": "java"}}
]
}
}
}
}
}

Score keyword terms query on nested fields in Elasticsearch 6.3

I have a set of keywords (skills in my example) and I would like to retrieve documents which match most of them. The documents should be sorted by how many of the keywords they match. The field I am searching (skills) is of nested type. The index has the following mapping:
{
"mappings": {
"profiles": {
"properties": {
"id": {
"type": "keyword"
},
"skills": {
"type": "nested",
"properties": {
"level": {
"type": "float"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
}
I tried both a terms query on the keyword field like:
{
"query": {
"nested": {
"path": "skills",
"query": {
"terms": {
"skills.name": [
"python",
"java"
]
}
}
}
}
}
And a boolean query:
{
"query": {
"nested": {
"path": "skills",
"query": {
"bool": {
"should": [
{
"terms": {
"skills.name": [
"java"
]
}
},
{
"terms": {
"skills.name": [
"r"
]
}
}
]
}
}
}
}
}
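For both queries the maximum score of the returned documents is 1. Thus both return documents that have ANY of the skills, but do not sort them so that those with both skills come first. The issue seems to be that skills is a nested field.
The second query works if each element of should is its own nested query. Each nested object holds a single skill name, so inside one nested query an object can match at most one of the should clauses; with a separate nested query per skill, the outer bool adds up the scores of the clauses that match, and documents matching more skills rank higher.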
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "skills",
"query": {
"terms": {
"skills.name": [
"java"
]
}
}
}
},
{
"nested": {
"path": "skills",
"query": {
"terms": {
"skills.name": [
"r"
]
}
}
}
}
]
}
}
}
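Generating one nested clause per keyword can be automated. A minimal Python sketch, assuming the profiles mapping from the question; skills_ranking_query is a hypothetical helper name, not a client library function.

```python
import json


def skills_ranking_query(skills, path="skills", field="name"):
    """Build a bool/should with one nested terms clause per skill."""
    clauses = [
        {"nested": {"path": path, "query": {"terms": {f"{path}.{field}": [skill]}}}}
        for skill in skills
    ]
    # The outer bool sums the scores of the should clauses that match.
    return {"query": {"bool": {"should": clauses}}}


body = skills_ranking_query(["java", "r"])
print(json.dumps(body, indent=2))
```

For the two skills above, this produces the same structure as the hand-written query.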

How to improve inner_hits in Elasticsearch

I have two ES_TYPEs in my_index
user
user_property
One is defined as the parent (user) and the other as the child (user_property).
user_property has the following mapping:
PUT /my_index/_mapping/user_property
{
"user_property": {
"properties": {
"name": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
I want to get all users that have certain properties (say property1 and property2) along with their property values. To do this I created the following query with inner_hits, but the response time grows dramatically once inner_hits is included.
GET /my_index/user/_search
{
"query": {
"bool": {
"must": [
{
"has_child": {
"type": "user_property",
"query": {
"bool": {
"must": [
{
"term": {
"name": "property1"
}
}
]
}
},
"inner_hits": {
"name": "inner_hits_1"
}
}
},
{
"has_child": {
"type": "user_property",
"query": {
"bool": {
"must": [
{
"term": {
"name": "property2"
}
}
]
}
},
"inner_hits": {
"name": "inner_hits_2"
}
}
}
]
}
}
}
Is there any way to reduce this time?

Elasticsearch AND Parens

I'm attempting to do the following with the query dsl but I'll express it as SQL:
(matrices.matrix = 'Matrix1' AND matrices.count = 1) AND (matrices.matrix = 'Matrix2' AND matrices.count >= 0)
So, I need to get docs that have both of these nested docs with these values.
This is the nested document; it sits at the _source level:
"matrices": [
{
"terms": [],
"count": 0,
"matrix": "none"
},
{
"terms": [
"greater"
],
"count": 1,
"matrix": "Matrix1"
}
]
And here is the mapping for the sub-doc:
"matrices": {
"type": "nested",
"include_in_parent": true,
"properties": {
"count": {
"type": "long"
},
"matrix": {
"type": "string"
},
"terms": {
"type": "string"
}
}
}
So, I need to generate a query that will allow me to get docs that match both (matrix = 'none' && count=0) && (matrix = 'Matrix' && count = 1)
Thanks,
So basically you want to retrieve documents that MUST contain two nested documents with the following criteria:
one nested document with matrices.count=0 AND matrices.matrix=none
another nested document with matrices.count=1 AND matrices.matrix=Matrix
Then, with the mapping you have, you can achieve that result using the following query. We use bool/must with two nested queries, each of which matches the criteria for one of the nested documents that must be present.
curl -XPOST localhost:9200/_search -d '{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "matrices",
"query": {
"bool": {
"must": [
{
"term": {
"matrices.count": 0
}
},
{
"term": {
"matrices.matrix": "none"
}
}
]
}
}
}
},
{
"nested": {
"path": "matrices",
"query": {
"bool": {
"must": [
{
"term": {
"matrices.count": 1
}
},
{
"term": {
"matrices.matrix": "matrix"
}
}
]
}
}
}
}
]
}
}
}
}
}'
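The filtered query used above was removed in Elasticsearch 5.0; on newer versions the equivalent is a bool query with a filter clause. A minimal Python sketch of that variant follows. The helper names nested_all and docs_with_all are hypothetical, and the term values simply mirror the ones in the answer.

```python
import json


def nested_all(path, conditions):
    """One nested clause whose term conditions must all hold on the same nested doc."""
    return {"nested": {"path": path, "query": {"bool": {"must": [
        {"term": {f"{path}.{name}": value}} for name, value in conditions.items()
    ]}}}}


def docs_with_all(path, *condition_sets):
    """Require one matching nested doc per condition set, in filter context."""
    return {"query": {"bool": {"filter": [nested_all(path, c) for c in condition_sets]}}}


body = docs_with_all(
    "matrices",
    {"count": 0, "matrix": "none"},
    {"count": 1, "matrix": "matrix"},
)
print(json.dumps(body, indent=2))
```

Every entry in a bool filter list must match, so the two nested clauses are combined with AND, just like the two entries under must in the query above.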
