i am new to elastic search Query and aggregation.
I have a nested document with the following mapping
PUT /company
{
"mappings": {
`"data": {
"properties": {
"deptId": {
"type": "keyword"
},
"deptName": {
"type": "keyword"
},
"employee": {
"type": "nested",
"properties": {
"empId": {
"type": "keyword"
},
"empName": {
"type": "text"
},
"salary": {
"type": "float"
}
}}}}}}
I have inserted Sample Data as follows
PUT company/data/1
{
"deptId":"1",
"deptName":"HR",
"employee": [
{
"empId": "1",
"empName": "John",
"salary":"1000"
},
{
"empId": "2",
"empName": "Will",
"salary":"2000"
}
]}
PUT company/data/3
{
"deptId":"3",
"deptName":"FINANCE",
"employee": [
{
"empId": "1",
"empName": "John",
"salary":"1000"
},
{
"empId": "2",
"empName": "Will",
"salary":"2000"
},
{
"empId": "3",
"empName": "Mark",
"salary":"4000"
}]
}
How can i Construct a Query DSL for the following
Department with the maximum Employees
Employee that is present in most departments
I am using Elastic Search 6.2.4
Your First Questions answer is in this link nested inner doc count Which Stats
POST test/_search
{
"query": {
"nested": {
"path": "employee",
"inner_hits": {}
}
}
}
This Answers your Second Question there is also reading the link attached.
GET /my_index/blogpost/_search
{
"size" : 0,
"aggs": {
"employee": {
"nested": {
"path": "employee"
},
"aggs": {
"by_name": {
"terms": {
"field": "employee.empName"
}
}
}
}
}
}
Read Nested Agg
I hope this gives you what you need.
Related
First of all, I must say I'm on Elasticsearch 5.6.16
I'm trying to figuring out what's happening here. I have several documents indexed with this mapping (I copied the document directly from Kibana):
{
"_index": "my_index",
"_type": "doc",
"_id": "Outbreak_10346",
"_version": 1,
"_score": 1,
"_source": {
"outbreakId": 10346,
"reference": "XX-AD-2021-00003",
"countryCode": "BE",
"adisNotificationReasonType": {
"code": "TERRESTRIAL"
},
"approximateLocation": false,
"latitude": 50.93766,
"longitude": 3.97156,
"adminZoneLevelOne": {
"zoneId": 40,
"zoneCode": "BE2"
},
"affectedSpecies": [
{
"speciesId": 16703,
"name": "Swine",
"measuringUnit": "ANIMAL",
"casesQuantity": 10,
"deadQuantity": 1,
"susceptibleQuantity": 100,
"isAquatic": false
}
],
"affectedSpeciesTotalSusceptible": 100,
"affectedSpeciesTotalCases": 10
}
}
If I do this query in Kibana:
GET my_index/_search
{
"query": {
"exists": {
"field": "adminZoneLevelOne"
}
}
}
I don't get any results. But if I change the field to any of the others I find the documents.
Also, when I retrieve the documents I can access the adminZoneLevelOne field.
How's this possible? Why Elasticsearch doesn't find any document with that field?
The index mapping for adminZoneLevelOne field is:
"adminZoneLevelOne": {
"type": "nested",
"properties": {
"zoneCode": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "WHITESPACE"
},
"zoneId": {
"type": "long"
}
}
}
And for adisNotificationReasonType that works fine, is:
"adisNotificationReasonType": {
"properties": {
"code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "LOWERCASE_KEYWORD"
}
}
}
Since adminZoneLevelOne is of nested type, you need to use exists query along with the nested query as
{
"query": {
"nested": {
"path": "adminZoneLevelOne",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "adminZoneLevelOne"
}
}
]
}
}
}
}
}
I have a "products" index with the nested field mapping. I perform a search query with nested aggregation and term aggregation by nested object's id. How to get "title" and "slug" properties from nested object in buckets?
PUT /products
{
"mappings": {
"properties": {
"categories": {
"type": "nested",
"properties": {
"id": { "type": "long" },
"title": { "type": "text" },
"slug": { "type": "keyword" }
}
}
}
}
}
POST /products/_doc
{
"name": "Acer Aspire 5 Slim Laptop",
"categories": [
{
"id": 1,
"title": "Laptops",
"slug": "/catalog/laptops"
},
{
"id": 2,
"title": "Ultrabooks",
"slug": "/catalog/ultrabooks"
}
]
}
GET /products/_search
{
"query": {
"match": { "name": "acer" }
},
"aggs": {
"categories": {
"nested": {
"path": "categories"
},
"aggs": {
"id": {"terms": {"field": "categories.id"}}
}
}
}
}
That's a great start!! All you need is to add a top_hits sub-aggregation like this:
GET /products/_search
{
"query": {
"match": {
"name": "acer"
}
},
"aggs": {
"categories": {
"nested": {
"path": "categories"
},
"aggs": {
"id": {
"terms": {
"field": "categories.id"
},
"aggs": {
"hits": {
"top_hits": {
"size": 1,
"_source": ["categories.title", "categories.slug"]
}
}
}
}
}
}
}
}
Elasticsearch 7.7 and I'm using the official php client to interact with the server.
My issue was somewhat solved here: https://discuss.elastic.co/t/need-to-return-part-of-a-doc-from-a-search-query-filter-is-parent-child-the-way-to-go/64514/2
However "Types are deprecated in APIs in 7.0+" https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html
Here is my document:
{
"offering_id": "1190",
"account_id": "362353",
"service_id": "20087",
"title": "Quick Brown Mammal",
"slug": "Quick Brown Fox",
"summary": "Quick Brown Fox"
"header_thumb_path": "uploads/test/test.png",
"duration": "30",
"alter_ids": [
"59151",
"58796",
"58613",
"54286",
"51812",
"50052",
"48387",
"37927",
"36685",
"36554",
"28807",
"23154",
"22356",
"21480",
"220",
"1201",
"1192"
],
"premium": "f",
"featured": "f",
"events": [
{
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "boo"
},
{
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "xyz"
},
{
"event_id": "9999",
"start_date": "2020-08-11 11:30:00",
"registration_count": "41",
"description": "test"
}
]
}
Notice how the object may have one or many "events"
Searching based on event data is the most common use case.
For example:
Find events that start before 12pm
Find events with a description of "xyz"
List find events with a start date in the next 10 days.
I would like to NOT return any events that didn't match the query!
So, for example Find events with a description of "xyz" for a given service
{
"query": {
"bool": {
"must": {
"match": {
"events.description": "xyz"
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"service_id": 20087
}
}
]
}
}
}
}
}
I would want the result to look like this:
{
"offering_id": "1190",
"account_id": "362353",
"service_id": "20087",
"title": "Quick Brown Mammal",
"slug": "Quick Brown Fox",
"summary": "Quick Brown Fox"
"header_thumb_path": "uploads/test/test.png",
"duration": "30",
"alter_ids": [
"59151",
"58796",
"58613",
"54286",
"51812",
"50052",
"48387",
"37927",
"36685",
"36554",
"28807",
"23154",
"22356",
"21480",
"220",
"1201",
"1192"
],
"premium": "f",
"featured": "f",
"events": [
{
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "xyz"
}
]
}
However, instead it just returns the ENTIRE document, with all events.
Is it even possible to return only a subset of the data? Maybe with Aggregations?
Right now, we're doing an "extra" set of filtering on the result set in the application (php in this case) to strip out event blocks that don't match the desired results.
It would be nice to just have elastic give directly what's needed instead of doing extra processing on the result to pull out the applicable event.
Thought about restructuring the data to instead have it based around "events" but then I would be duplicating data since every offering will have the parent data too.
This used to be in SQL, where there was a relation instead of having the data nested like this.
A subset of the nested data can be returned using Nested Aggregations along with Filter Aggregations
To know more about these aggregations refer these official documentation :
Filter Aggregation
Nested Aggregation
Index Mapping:
{
"mappings": {
"properties": {
"offering_id": {
"type": "integer"
},
"account_id": {
"type": "integer"
},
"service_id": {
"type": "integer"
},
"title": {
"type": "text"
},
"slug": {
"type": "text"
},
"summary": {
"type": "text"
},
"header_thumb_path": {
"type": "keyword"
},
"duration": {
"type": "integer"
},
"alter_ids": {
"type": "integer"
},
"premium": {
"type": "text"
},
"featured": {
"type": "text"
},
"events": {
"type": "nested",
"properties": {
"event_id": {
"type": "integer"
},
"registration_count": {
"type": "integer"
},
"description": {
"type": "text"
}
}
}
}
}
}
Search Query :
{
"size": 0,
"aggs": {
"nested": {
"nested": {
"path": "events"
},
"aggs": {
"filter": {
"filter": {
"match": { "events.description": "xyz" }
},
"aggs": {
"total": {
"top_hits": {
"size": 10
}
}
}
}
}
}
}
}
Search Result :
"hits": [
{
"_index": "foo21",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "events",
"offset": 1
},
"_score": 1.0,
"_source": {
"event_id": "9999",
"start_date": "2020-07-01 14:00:00",
"registration_count": "22",
"description": "xyz"
}
}
]
Second Method :
{
"query": {
"bool": {
"must": [
{
"match": {
"service_id": "20087"
}
},
{
"nested": {
"path": "events",
"query": {
"bool": {
"must": [
{
"match": {
"events.description": "xyz"
}
}
]
}
},
"inner_hits": {
}
}
}
]
}
}
}
You can even go through this SO answer:
How to filter nested aggregation bucket?
Returning a partial nested document in ElasticSearch
I'm trying to apply location radius on nested ES query but the nested value is not present all the time causing exception
"[nested] nested object under path [contact.address] is not of nested type"
I tried to check if the property exists then apply filter but nothing worked so far
The mapping is like:
{
"records": {
"mappings": {
"user": {
"properties": {
"user_id": {
"type": "long"
},
"contact": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"location": {
"properties": {
"lat": {
"type": "long"
},
"lng": {
"type": "long"
},
"lon": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
},
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"properties": {
"first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
},
"created_at": {
"type": "date"
}
}
}
}
}
}
and sometimes the records do not have the location or address data which cases problems. sample record:
{
"contact": {
"name": {
"first_name": "Test",
"last_name": "User"
},
"email": "test#user.com",
"address": {}
},
"user_id": 532188
}
here is what i'm trying:
GET records/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "contact.address"
}
},
{
"exists": {
"field": "contact.address.location"
}
}
],
"minimum_should_match": 1,
"should": [
{
"bool": {
"filter": {
"nested": {
"ignore_unmapped": true,
"path": "contact.address",
"query": {
"geo_distance": {
"distance": "50mi",
"contact.address.location": {
"lat": 51.5073509,
"lon": -0.1277583
}
}
}
}
}
}
}
]
}
}
}
You need to define proper mapping with nested datatype to avoid this issue, looks dynamic mapping is creating some issue.
I defined my own mapping with nested datatype and even when I miss, some data in the nested fields, it doesn't complain.
Index def
{
"mappings": {
"properties": {
"user_id": {
"type": "long"
},
"contact": {
"type": "nested"
}
}
}
}
Index sample doc
{
"contact": {
"name": {
"first_name": "raza",
"last_name": "ahmed"
},
"email": "opster#user.com",
"address" :{ --> note empty nested field
}
},
"user_id": 123456
}
Index another doc with data in the nested field
{
"contact": {
"name": {
"first_name": "foo",
"last_name": "bar"
},
"email": "opster#user.com",
"address": {
"location" :{. --> note nested data as well
"lat" : 51.5073509,
"lon" : -0.1277583
}
}
},
"user_id": 123456
}
Index another doc, which doesn't have even empty nested data
{
"contact": {
"name": {
"first_name": "foo",
"last_name": "bar"
},
"email": "opster#user.com"
},
"user_id": 123456
}
Search query using nested field
{
"query": {
"nested": {
"path": "contact", --> note this
"query": {
"bool": {
"must": [
{
"exists": {
"field": "contact.address"
}
},
{
"exists": {
"field": "contact.name.first_name"
}
}
]
}
}
}
}
}
The search result doesn't complain about the docs which don't include the nested doc (query which gives you issues)
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "3",
"_score": 2.0,
"_source": {
"contact": {
"name": {
"first_name": "foo",
"last_name": "bar"
},
"email": "opster#user.com",
"address": { --> note the nested doc
"location": {
"lat": 51.5073509,
"lon": -0.1277583
}
}
},
"user_id": 123456
}
}
We have a simple index of entities with mapping:
PUT resource/_mapping/entity
{
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"claims": {
"type": "nested",
"properties": {
"claimid": {
"type": "keyword"
},
"priority": {
"type": "short"
},
"visibility": {
"type": "keyword"
}
}
}
}
}
Here's a sample document in the index:
POST resource/entity/
{
"id": "2",
"name": "e2",
"claims": [
{
"claimid": "c1",
"priority": "2",
"visibility": "M",
"reqid" : "2"
},
{
"claimid": "c2",
"priority": "1",
"visibility": "V",
"reqid" : "2"
},
{
"claimid": "c5",
"priority": "3",
"visibility": "H",
"reqid" : "2"
}
]
}
And a query to filter documents by provided set of 'claims.claimid', then to sort by 'claims.priority', select the one with highest priority and return only the 'claims.visibility' e.g.:
GET resource/entity/_search/
{
"query": {
"nested": {
"path": "claims",
"query": {
"bool": {
"must": [
{
"terms": {
"claims.claimid": [
"c1",
"c4",
"c5"
]
}
}
]
}
},
"inner_hits": {
"sort": [
{
"claims.priority": "asc"
}
],
"size":1,
"_source":{"includes":["claims.visibility"]}
}
}
}
}
And finally the problem to be solved: how to modify the query to filter out documents having resulted in "H" for visibility with highest priority in inner hits? Or what other query will return a set of documents with visibility of highest priority filtered by provided claim ids, but only those where visibility is not "H"?
A catch here is that we have to sort the documents having all types of visibilities and filter out those with resulting "H" on a complete list of results.