Elasticsearch - Filter results by inner hits

We have a simple index of entities with the following mapping:
PUT resource/_mapping/entity
{
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"claims": {
"type": "nested",
"properties": {
"claimid": {
"type": "keyword"
},
"priority": {
"type": "short"
},
"visibility": {
"type": "keyword"
}
}
}
}
}
Here's a sample document in the index:
POST resource/entity/
{
"id": "2",
"name": "e2",
"claims": [
{
"claimid": "c1",
"priority": "2",
"visibility": "M",
"reqid" : "2"
},
{
"claimid": "c2",
"priority": "1",
"visibility": "V",
"reqid" : "2"
},
{
"claimid": "c5",
"priority": "3",
"visibility": "H",
"reqid" : "2"
}
]
}
And here is a query that filters documents by a provided set of 'claims.claimid' values, sorts the matching claims by 'claims.priority', selects the one with the highest priority, and returns only its 'claims.visibility', e.g.:
GET resource/entity/_search/
{
"query": {
"nested": {
"path": "claims",
"query": {
"bool": {
"must": [
{
"terms": {
"claims.claimid": [
"c1",
"c4",
"c5"
]
}
}
]
}
},
"inner_hits": {
"sort": [
{
"claims.priority": "asc"
}
],
"size":1,
"_source":{"includes":["claims.visibility"]}
}
}
}
}
And finally, the problem to be solved: how can the query be modified to filter out documents whose highest-priority claim in the inner hits has visibility "H"? Or what other query would return, for the provided claim ids, the visibility of each document's highest-priority claim, but only for documents where that visibility is not "H"?
The catch is that the claims have to be sorted across all visibility values first, and only then can the documents whose top result is "H" be filtered out of the complete list of results.
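To pin down the semantics being asked for, here is a minimal client-side sketch in Python. It is not a Query DSL solution; it only models "take the matching claim with the lowest priority number, then drop the document if that claim is hidden". The function names `top_claim` and `visible_docs` are made up for illustration, and it assumes that a smaller `priority` value means a higher priority, as the `"asc"` sort in the query suggests.

```python
from typing import Iterable, List, Optional, Tuple


def top_claim(doc: dict, claim_ids: Iterable[str]) -> Optional[dict]:
    """Return the matching claim with the smallest priority value, or None."""
    wanted = set(claim_ids)
    matching = [c for c in doc.get("claims", []) if c["claimid"] in wanted]
    # Mirrors inner_hits with sort {"claims.priority": "asc"} and size 1.
    return min(matching, key=lambda c: int(c["priority"]), default=None)


def visible_docs(docs: Iterable[dict], claim_ids: Iterable[str],
                 hidden: str = "H") -> List[Tuple[str, str]]:
    """Return (id, visibility) for docs whose top matching claim is not hidden."""
    wanted = list(claim_ids)
    kept = []
    for doc in docs:
        claim = top_claim(doc, wanted)
        if claim is not None and claim["visibility"] != hidden:
            kept.append((doc["id"], claim["visibility"]))
    return kept


# The sample document from the question (reqid omitted for brevity).
sample_doc = {
    "id": "2",
    "name": "e2",
    "claims": [
        {"claimid": "c1", "priority": "2", "visibility": "M"},
        {"claimid": "c2", "priority": "1", "visibility": "V"},
        {"claimid": "c5", "priority": "3", "visibility": "H"},
    ],
}

print(visible_docs([sample_doc], ["c1", "c4", "c5"]))  # [('2', 'M')]
print(visible_docs([sample_doc], ["c5"]))              # []
```

With claim ids c1, c4, c5 the top claim is c1 ("M"), so the document is kept; with only c5 the top claim is "H" and the document is dropped. Any server-side query would have to reproduce exactly this behaviour.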

Related

Elasticsearch - Return a subset of nested results

I'm on Elasticsearch 7.7 and I'm using the official PHP client to interact with the server.
My issue was somewhat solved here: https://discuss.elastic.co/t/need-to-return-part-of-a-doc-from-a-search-query-filter-is-parent-child-the-way-to-go/64514/2
However, "Types are deprecated in APIs in 7.0+": https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html
Here is my document:
{
"offering_id": "1190",
"account_id": "362353",
"service_id": "20087",
"title": "Quick Brown Mammal",
"slug": "Quick Brown Fox",
"summary": "Quick Brown Fox",
"header_thumb_path": "uploads/test/test.png",
"duration": "30",
"alter_ids": [
"59151",
"58796",
"58613",
"54286",
"51812",
"50052",
"48387",
"37927",
"36685",
"36554",
"28807",
"23154",
"22356",
"21480",
"220",
"1201",
"1192"
],
"premium": "f",
"featured": "f",
"events": [
{
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "boo"
},
{
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "xyz"
},
{
"event_id": "9999",
"start_date": "2020-08-11 11:30:00",
"registration_count": "41",
"description": "test"
}
]
}
Notice that the document may have one or many "events".
Searching based on event data is the most common use case.
For example:
Find events that start before 12pm
Find events with a description of "xyz"
Find events with a start date in the next 10 days
I would like to NOT return any events that didn't match the query!
So, for example, find events with a description of "xyz" for a given service:
{
"query": {
"bool": {
"must": {
"match": {
"events.description": "xyz"
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"service_id": 20087
}
}
]
}
}
}
}
}
I would want the result to look like this:
{
"offering_id": "1190",
"account_id": "362353",
"service_id": "20087",
"title": "Quick Brown Mammal",
"slug": "Quick Brown Fox",
"summary": "Quick Brown Fox",
"header_thumb_path": "uploads/test/test.png",
"duration": "30",
"alter_ids": [
"59151",
"58796",
"58613",
"54286",
"51812",
"50052",
"48387",
"37927",
"36685",
"36554",
"28807",
"23154",
"22356",
"21480",
"220",
"1201",
"1192"
],
"premium": "f",
"featured": "f",
"events": [
{
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "xyz"
}
]
}
However, instead it just returns the ENTIRE document, with all events.
Is it even possible to return only a subset of the data? Maybe with Aggregations?
Right now, we're doing an "extra" round of filtering on the result set in the application (PHP in this case) to strip out event blocks that don't match the desired results.
It would be nice to have Elasticsearch return exactly what's needed instead of doing extra processing on the result to pull out the applicable events.
I thought about restructuring the data around "events" instead, but then I would be duplicating data, since every event would carry the parent offering data too.
This used to be in SQL, where there was a relation instead of having the data nested like this.
A subset of the nested data can be returned using a nested aggregation together with a filter aggregation.
To learn more about these aggregations, refer to the official documentation:
Filter Aggregation
Nested Aggregation
Index Mapping:
{
"mappings": {
"properties": {
"offering_id": {
"type": "integer"
},
"account_id": {
"type": "integer"
},
"service_id": {
"type": "integer"
},
"title": {
"type": "text"
},
"slug": {
"type": "text"
},
"summary": {
"type": "text"
},
"header_thumb_path": {
"type": "keyword"
},
"duration": {
"type": "integer"
},
"alter_ids": {
"type": "integer"
},
"premium": {
"type": "text"
},
"featured": {
"type": "text"
},
"events": {
"type": "nested",
"properties": {
"event_id": {
"type": "integer"
},
"registration_count": {
"type": "integer"
},
"description": {
"type": "text"
}
}
}
}
}
}
Search query:
{
"size": 0,
"aggs": {
"nested": {
"nested": {
"path": "events"
},
"aggs": {
"filter": {
"filter": {
"match": { "events.description": "xyz" }
},
"aggs": {
"total": {
"top_hits": {
"size": 10
}
}
}
}
}
}
}
}
Search result:
"hits": [
{
"_index": "foo21",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "events",
"offset": 1
},
"_score": 1.0,
"_source": {
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "xyz"
}
}
]
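Second method (a nested query with inner_hits; the matching events are returned under each hit's inner_hits section):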
{
"query": {
"bool": {
"must": [
{
"match": {
"service_id": "20087"
}
},
{
"nested": {
"path": "events",
"query": {
"bool": {
"must": [
{
"match": {
"events.description": "xyz"
}
}
]
}
},
"inner_hits": {
}
}
}
]
}
}
}
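On the application side, the inner hits can then replace the full events array. The following is a minimal Python sketch of that step, assuming the response has the usual `hits.hits[].inner_hits.<name>.hits.hits[]._source` shape and that the inner hits keep their default name (the nested path, `events`). The helper name `matched_events` and the trimmed response are made up for illustration.

```python
from typing import List


def matched_events(response: dict, inner_name: str = "events") -> List[dict]:
    """Return each parent _source with its events replaced by the inner hits."""
    results = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(inner_name, {})
        events = [h["_source"] for h in inner.get("hits", {}).get("hits", [])]
        parent = dict(hit.get("_source", {}))  # shallow copy, original untouched
        parent[inner_name] = events
        results.append(parent)
    return results


# Trimmed, hand-written response for illustration only.
example_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {
                    "offering_id": "1190",
                    "events": [
                        {"event_id": "9999", "description": "boo"},
                        {"event_id": "9999", "description": "xyz"},
                        {"event_id": "9999", "description": "test"},
                    ],
                },
                "inner_hits": {
                    "events": {
                        "hits": {
                            "hits": [
                                {"_source": {"event_id": "9999",
                                             "description": "xyz"}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}

for doc in matched_events(example_response):
    print(doc["offering_id"], [e["description"] for e in doc["events"]])
```

Keep in mind that inner_hits returns only the top three nested hits per parent by default, so `size` inside `inner_hits` may need to be raised if a document can have more matching events.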
You can also go through these SO answers:
How to filter nested aggregation bucket?
Returning a partial nested document in ElasticSearch

Elastic Search 6 Nested Query Aggregations

I am new to Elasticsearch queries and aggregations.
I have a nested document with the following mapping:
PUT /company
{
"mappings": {
"data": {
"properties": {
"deptId": {
"type": "keyword"
},
"deptName": {
"type": "keyword"
},
"employee": {
"type": "nested",
"properties": {
"empId": {
"type": "keyword"
},
"empName": {
"type": "text"
},
"salary": {
"type": "float"
}
}}}}}}
I have inserted sample data as follows:
PUT company/data/1
{
"deptId":"1",
"deptName":"HR",
"employee": [
{
"empId": "1",
"empName": "John",
"salary":"1000"
},
{
"empId": "2",
"empName": "Will",
"salary":"2000"
}
]}
PUT company/data/3
{
"deptId":"3",
"deptName":"FINANCE",
"employee": [
{
"empId": "1",
"empName": "John",
"salary":"1000"
},
{
"empId": "2",
"empName": "Will",
"salary":"2000"
},
{
"empId": "3",
"empName": "Mark",
"salary":"4000"
}]
}
How can I construct Query DSL for the following?
The department with the most employees
The employee that is present in the most departments
I am using Elasticsearch 6.2.4.
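The answer to your first question is in this link: nested inner doc count. It starts from a nested query with inner hits (a nested query requires a "query" clause, so match_all is used here):
POST test/_search
{
"query": {
"nested": {
"path": "employee",
"query": {
"match_all": {}
},
"inner_hits": {}
}
}
}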
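This answers your second question; the attached link is also worth reading. A terms aggregation needs a keyword field, so it runs on employee.empId here, because employee.empName is mapped as text. Buckets are ordered by doc_count by default, so (assuming an employee appears at most once per department) the first bucket is the employee present in the most departments.
GET /company/data/_search
{
"size" : 0,
"aggs": {
"employee": {
"nested": {
"path": "employee"
},
"aggs": {
"by_emp_id": {
"terms": {
"field": "employee.empId"
}
}
}
}
}
}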
Read Nested Agg
I hope this gives you what you need.
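As a sanity check for whatever query you end up with, here is a small Python sketch that computes both answers directly from the sample data. It is a client-side model of what the aggregations should return, not Query DSL, and the function names are made up for illustration.

```python
from collections import Counter
from typing import Iterable

# The two sample departments from the question (salary omitted).
departments = [
    {"deptId": "1", "deptName": "HR", "employee": [
        {"empId": "1", "empName": "John"},
        {"empId": "2", "empName": "Will"},
    ]},
    {"deptId": "3", "deptName": "FINANCE", "employee": [
        {"empId": "1", "empName": "John"},
        {"empId": "2", "empName": "Will"},
        {"empId": "3", "empName": "Mark"},
    ]},
]


def largest_department(docs: Iterable[dict]) -> str:
    """Name of the department with the most nested employee objects."""
    return max(docs, key=lambda d: len(d["employee"]))["deptName"]


def departments_per_employee(docs: Iterable[dict]) -> Counter:
    """Count, per empId, how many departments the employee appears in."""
    counts = Counter()
    for doc in docs:
        # A set so an employee listed twice in one department counts once.
        counts.update({e["empId"] for e in doc["employee"]})
    return counts


print(largest_department(departments))              # FINANCE
print(departments_per_employee(departments)["1"])   # 2
print(departments_per_employee(departments)["3"])   # 1
```

On this sample, employees 1 and 2 are tied at two departments each, so a terms aggregation would return both with the same doc_count.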

How to do aggregation on nested objects - Elasticsearch

I'm pretty new to Elasticsearch so please bear with me.
This is part of my document in ES.
{
"source": {
"detail": {
"attribute": {
"Size": ["32 Gb",4],
"Type": ["Tools",4],
"Brand": ["Sandisk",4],
"Color": ["Black",4],
"Model": ["Sdcz36-032g-b35",4],
"Manufacturer": ["Sandisk",4]
}
},
"title": {
"list": [
"Sandisk Cruzer 32gb Usb 32 Gb Flash Drive , Black - Sdcz36-032g"
]
}
}
}
So what I want to achieve is to find the best three or top three hits of the attribute object. For example, if I do a search for "sandisk", I want to get three attributes like ["Size", "Color", "Model"] or whatever attributes based on the top hits aggregation.
So I wrote a query like this:
{
"size": 0,
"aggs": {
"categoryList": {
"filter": {
"bool": {
"filter": [
{
"term": {
"title.list": "sandisk"
}
}
]
}
},
"aggs": {
"results": {
"terms": {
"field": "detail.attribute",
"size": 3
}
}
}
}
}
}
But it seems to be not working. How do I fix this? Any hints would be much appreciated.
This is the _mappings. It is not the complete one, but I guess this would suffice.
{
"catalog2_0": {
"mappings": {
"product": {
"dynamic": "strict",
"dynamic_templates": [
{
"attributes": {
"path_match": "detail.attribute.*",
"mapping": {
"type": "text"
}
}
}
],
"properties": {
"detail": {
"properties": {
"attMaxScore": {
"type": "scaled_float",
"scaling_factor": 100
},
"attribute": {
"dynamic": "true",
"properties": {
"Brand": {
"type": "text"
},
"Color": {
"type": "text"
},
"MPN": {
"type": "text"
},
"Manufacturer": {
"type": "text"
},
"Model": {
"type": "text"
},
"Operating System": {
"type": "text"
},
"Size": {
"type": "text"
},
"Type": {
"type": "text"
}
}
},
"description": {
"type": "text"
},
"feature": {
"type": "text"
},
"tag": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
},
"title": {
"properties": {
"en": {
"type": "text"
}
}
}
}
}
}
}
}
According to the documentation, you can't run a terms aggregation on a field with the text datatype (unless fielddata is enabled on it); the field should have the keyword datatype.
Also, you can't aggregate on the detail.attribute field that way. The detail.attribute field doesn't store any value itself: it is an object datatype (not a nested one, as you have written in the question), which means it is a container for other fields like Size, Brand, etc. So you should aggregate on a concrete field such as detail.attribute.Size, provided it is a keyword field.
Another likely error is that you are running a term query on a text field (what is the datatype of the title.list field?). A term query is meant for keyword fields, while a match query is used to query text fields.
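Putting those points together, a corrected request body could look roughly like the Python sketch below. It assumes each attribute gets a keyword sub-field named `raw`, the same way `detail.tag` has one in the mapping above; that sub-field does not exist on the attributes in the posted mapping, so it is hypothetical, as is the helper name `attribute_terms_body`.

```python
import json


def attribute_terms_body(search_text: str, attribute: str, size: int = 3) -> dict:
    """Build a search body: match on the title, terms agg on one attribute."""
    # Hypothetical keyword sub-field; the posted mapping only has text here.
    field = f"detail.attribute.{attribute}.raw"
    return {
        "size": 0,
        # match (not term) because title.list is assumed to be a text field.
        "query": {"match": {"title.list": search_text}},
        "aggs": {
            "results": {
                "terms": {"field": field, "size": size}
            }
        },
    }


body = attribute_terms_body("sandisk", "Brand")
print(json.dumps(body, indent=2))
```

Note that this returns the top values of one attribute, not the top attribute names; one aggregation per attribute would be needed to compare several of them.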
Here is what I have used for a nested aggs query, minus the actual value names.
The field being aggregated is a keyword (which, as already mentioned, is required) and is part of a nested JSON object:
"STATUS_ID": {
"type": "keyword",
"index": "not_analyzed",
"doc_values": true
},
Query
GET index name/_search?size=200
{
"aggs": {
"panels": {
"nested": {
"path": "nested path"
},
"aggs": {
"statusCodes": {
"terms": {
"field": "nested path.STATUS.STATUS_ID",
"size": 50
}
}
}
}
}
}
Result
"aggregations": {
"panels": {
"doc_count": 12108963,
"statusCodes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O",
"doc_count": 5912218
},
{
"key": "C",
"doc_count": 401586
},
{
"key": "E",
"doc_count": 135628
},
{
"key": "Y",
"doc_count": 3742
},
{
"key": "N",
"doc_count": 1012
},
{
"key": "L",
"doc_count": 719
},
{
"key": "R",
"doc_count": 243
},
{
"key": "H",
"doc_count": 86
}
]
}
}
}

ElasticSearch - Getting paged result in a nested list (nested pagination)

I have the following JSON that describes a country-city (1:n) relation:
{
"country": [
{
"id": 1,
"name": "Country1",
"city": [
{"id": 1, "name": "City1"},
{"id": 2,"name": "City2"}
]
}, {
"id": 2,
"name": "Country2",
"city": [
{"id": 3,"name": "City3"},
{"id": 4,"name": "City4"}
]
}, {
"id": 3,
"name": "Country3",
"city": [
{"id": 5,"name": "City5"},
{"id": 6,"name": "City6"}
]
}
]
}
I have loaded it into an ES index as 3 documents, one per country.
I have mapped the city field as nested:
...
"city": {
"type": "nested",
...
I want to query all cities and get a paged result.
For instance, 3 hits would return City1, City2, City3.
I also want to filter by country name.
I tried
GET /127.0.0.1:9200/country_city/_search
{
"from": 0,
"size": 2,
"fields": [
"city.id", "city.name"
]
}
and
GET /127.0.0.1:9200/country_city/country/_search?_source=false
{
"query": {
"nested": {
"path": "city",
"query": {
"match_all": {}
},
"inner_hits": {
"sort": "city.id",
"from": 0,
"size": 3
}
}
},
"fields": [
"name",
"city.id",
"city.name"
]
}
But the first returned 4 cities instead of 2 (2 countries with 2 cities each).
The second returned all documents (although size is 2 in the request) and, in an inner element, the first 3 cities of each country.
How can I get a page of the nested objects, and then progress to the next page?
This should work
Mappings
{
"mappings": {
"type": {
"properties": {
"country": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"city": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
Query
{
"query": {
"nested": {
"path": "country",
"inner_hits": {},
"query": {
"nested": {
"path": "country.city",
"query": {
"match_all": {}
},
"inner_hits": {
"from": 0,
"size": 1,
"_source": {
"includes": ["country.city.name", "country.city.id"]
}
}
}
}
}
}
}
github bug
source filtering
Thanks
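For context on why the original attempts behave that way: the top-level `from`/`size` page over parent documents (countries), while `from`/`size` inside `inner_hits` page over the nested hits of each parent separately. A single page that runs across all cities therefore needs the cities flattened somewhere. Below is a minimal client-side sketch in Python under that assumption; the helper names are made up, and indexing one document per city would be the server-side equivalent.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional

# The three country documents from the question.
countries = [
    {"id": 1, "name": "Country1", "city": [
        {"id": 1, "name": "City1"}, {"id": 2, "name": "City2"}]},
    {"id": 2, "name": "Country2", "city": [
        {"id": 3, "name": "City3"}, {"id": 4, "name": "City4"}]},
    {"id": 3, "name": "Country3", "city": [
        {"id": 5, "name": "City5"}, {"id": 6, "name": "City6"}]},
]


def iter_cities(docs: Iterable[dict],
                country_name: Optional[str] = None) -> Iterator[dict]:
    """Yield every city, optionally restricted to one country name."""
    for country in docs:
        if country_name is not None and country["name"] != country_name:
            continue
        for city in country["city"]:
            yield {"country": country["name"], **city}


def page_of_cities(docs: Iterable[dict], page: int, size: int,
                   country_name: Optional[str] = None) -> List[dict]:
    """Return one zero-based page of the flattened city list."""
    start = page * size
    return list(islice(iter_cities(docs, country_name), start, start + size))


print([c["name"] for c in page_of_cities(countries, 0, 3)])
print([c["name"] for c in page_of_cities(countries, 1, 3)])
print([c["name"] for c in page_of_cities(countries, 0, 3, "Country2")])
```

The first two calls print City1 to City3 and City4 to City6; the third prints only the two cities of Country2.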

Elastic Search - OR filter with boolean and ids

I'm trying to search through items, some of which might be private.
If an item is private, only friends of the item owner (array item.friends) may see the item.
If it's not private, everyone can see it.
So my logic is:
If the item is not private (is_private=0) OR the user id (4 in my example) is in the array item.friends, the user can see the item.
Still, I get no results. Every item is currently set to is_private=1, so I guess something is wrong with my ids filter.
Any suggestions?
// ---- Mapping
{
"item": {
"properties": {
"name": {
"type": "string"
},
"description": {
"type": "string"
},
"created": {
"type": "date"
},
"location": {
"properties": {
"location": {
"type": "geo_point"
}
}
},
"is_proaccount": {
"type": "integer"
},
"is_given_away": {
"type": "integer"
},
"is_private": {
"type": "integer"
},
"friends": {
"type": "integer",
"index_name": "friend"
}
}
}
}
// ----- Example insert
{
"name": "Test",
"description": "Test",
"created": "2012-02-20T12:21:30",
"location": {
"location": {
"lat": "59.919914",
"lon": "10.753414"
}
},
"is_proaccount": "0",
"is_given_away": "0",
"is_private": 1,
"friends": [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10
]
}
// ----- Query
{
"from": 0,
"size": 30,
"filter": {
"or": [
{
"bool": {
"must": [
{
"term": {
"is_private": 0
}
}
]
}
},
{
"ids": {
"values": [
4
],
"type": "friends"
}
}
]
},
"query": {
"match_all": {}
}
}
The "ids" filter probably does not mean what you think it means: it filters on the document ID (and, optionally, on the document type). To match a value inside the friends array, use a term filter on that field instead.
See http://www.elasticsearch.org/guide/reference/query-dsl/ids-filter.html
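As a rough sketch of the intended logic in Python: `can_see` models the rule locally, and `visibility_filter` builds the equivalent clause with a term filter in place of ids. The clause uses the bool/should form from later Elasticsearch versions rather than the old `or` filter, and it assumes the array is indexed under the name `friends`; the `index_name` setting in the mapping may mean the indexed name is `friend` instead. Both function names are made up for illustration.

```python
def can_see(item: dict, user_id: int) -> bool:
    """True if the item is public or the user is in the owner's friends."""
    return int(item["is_private"]) == 0 or user_id in item.get("friends", [])


def visibility_filter(user_id: int) -> dict:
    """Bool clause: is_private == 0 OR friends contains user_id."""
    return {
        "bool": {
            "should": [
                {"term": {"is_private": 0}},
                # Assumed field name; see the note on index_name above.
                {"term": {"friends": user_id}},
            ],
            "minimum_should_match": 1,
        }
    }


# The example item from the question, reduced to the relevant fields.
private_item = {"name": "Test", "is_private": 1, "friends": list(range(1, 11))}
public_item = {"name": "Open", "is_private": "0", "friends": []}

print(can_see(private_item, 4))    # True: 4 is in friends
print(can_see(private_item, 11))   # False: private and not a friend
print(can_see(public_item, 11))    # True: not private
print(visibility_filter(4))
```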
