Elasticsearch inner hits on a has_parent query with nested inner hits

I've searched for this and haven't found anything that says whether this is or is not supported. According to Elastic documentation:
"Inner hits can be used by defining an inner_hits definition on a
nested, has_child or has_parent query and filter. "
I want to use inner_hits on a has_parent, nested object. I've tried it as illustrated in the example below. Does anyone know if this is possible?
Example Scenario (I've simplified the data and properties for the purpose of this post)
We store task title & description translations as a nested object in
the parent task. Each nested title has an iso code and a translated
title & description. We distribute child tasks to, in some cases,
thousands of users so it didn't make sense replicating the
title/description into each child object.
Parent Task Example
{
"_id": "parenttask_177448",
"startDate": "2020-05-01T00:00:00",
"endDate": "2020-05-05T00:00:00",
"type": "task",
"taskjoin" : "parenttask",
"priorityId": 1,
"translations": [
{
"title": "This is a test task",
"description": "test",
"localeIsoCode": [
"en-US"
]
},
{
"description": "tester",
"title": "Ceci est une tâche de test",
"localeIsoCode": [
"fr-FR"
]
}
]
}
Children Task(s) Example
{
"_id": "childtask_12345",
"taskSubType": "distributed",
"subtasks": [],
"startDate": "2020-03-19T00:00:00",
"endDate": "2020-03-19T00:00:00",
"taskJoinField": {
"name": "childtask",
"parent": "parenttask_177448"
},
"assignedUserId": 12345,
"assignedUserName": "Bob Jones"
}
Relevant part of the query I'm running that brings back no inner hits results
{
"has_parent": {
"ignore_unmapped": true,
"parent_type": "parenttask",
"query": {
"nested": {
"ignore_unmapped": true,
"inner_hits": {
"name": "innerhits_task",
"_source": {
"includes": [
"title"
]
}
},
"path": "translations",
"query": {
"term": {
"translations.localeIsoCode.keyword": {
"value": "fr-FR"
}
}
},
"boost": 1.1,
"_name": "nested_isocode"
}
},
"score": true,
"boost": 1.1,
"_name": "parent_isocode"
}
}
Relevant Mapping
{
"thinktime_dev_7003_tasks": {
"mappings": {
"properties": {
"assignedUserId": {
"type": "long"
},
"assignedUserName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"taskJoinField": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"parenttask": "childtask"
}
},
"localeIsoCode": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"locationId": {
"type": "long"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"translations": {
"type": "nested",
"properties": {
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"image": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"isPrimary": {
"type": "long"
},
"localeIsoCode": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
I'm getting results back from the child tasks but no inner hits matches. If I move the inner hits to the has_parent, I get all the translations back.
My question is whether nested inner hits inside a has_parent query are possible in Elasticsearch. I'm surprised I couldn't find anyone else trying this, or any examples online, since it seems like a pretty common use case.
Thanks for your help.
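One variation that may be worth testing, offered as an unverified assumption rather than a confirmed answer: the nested inner hits might only be collected when the enclosing has_parent also declares its own inner_hits. The sketch below builds such a query body as a plain Python dict. The helper name build_parent_nested_query and the inner-hit name innerhits_parent are hypothetical; everything else is taken from the query in the question.

```python
import json


def build_parent_nested_query(locale, parent_type="parenttask", path="translations"):
    """Build a has_parent query with inner_hits on both the parent and nested level."""
    nested = {
        "nested": {
            "path": path,
            "query": {
                "term": {f"{path}.localeIsoCode.keyword": {"value": locale}}
            },
            # Inner hits for the matching translation objects (as in the question).
            "inner_hits": {
                "name": "innerhits_task",
                "_source": {"includes": ["title"]},
            },
        }
    }
    return {
        "has_parent": {
            "parent_type": parent_type,
            "query": nested,
            "score": True,
            # Assumption: declaring inner hits here too lets the nested inner
            # hits be reported underneath each matching parent. Untested.
            "inner_hits": {"name": "innerhits_parent", "_source": False},
        }
    }


if __name__ == "__main__":
    print(json.dumps({"query": build_parent_nested_query("fr-FR")}, indent=2))
```

If this behaves as assumed on your version, the translation hits would be expected inside each child hit's innerhits_parent entry rather than at the top level of the hit.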

Related

more_like_this query is not working on a field in a list of objects

This is the mapping of my index. I am searching on the category field of the nested payload field.
{
"mappings": {
"date_detection": false,
"properties": {
"#class": {
"type": "keyword"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"payload": {
"type": "nested",
"properties": {
"#class": {
"type": "keyword"
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"category": {
"type": "keyword"
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
This is my index mapping. Whenever I search with a more_like_this query in Elasticsearch, it does not return any documents. I am searching on a list of objects.
The query is:
{
"query": {
"more_like_this": {
"fields": [
"payload.category"
],
"like": [
"ASSEST"
],
"min_term_freq": 1,
"max_query_terms": 12
}
}
}
It does not return any documents, even though the values are present in Elasticsearch.
I just want to find documents with similar values through a more_like_this query.
However, payload is actually a list of objects, each with a category field, and I need to find similar objects based on that field.
For nested fields, use a nested query:
GET <index-name>/_search
{
"query": {
"nested": {
"path": "payload",
"query": {
"more_like_this": {
"fields": [
"payload.category"
],
"like": [
"assest doc"
],
"min_term_freq": 1,
"max_query_terms": 12
}
}
}
}
}
Also check min_doc_freq:
The minimum document frequency below which the terms will be ignored from the input document. Defaults to 5.
If you have fewer than 5 matching documents, set "min_doc_freq" to 1.
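Putting the two suggestions together, here is a minimal sketch that builds the nested more_like_this body with min_doc_freq lowered. The helper name build_nested_mlt is made up for illustration; the parameters are the ones already used in the queries above.

```python
import json


def build_nested_mlt(path, field, like_text, min_doc_freq=1):
    """Wrap a more_like_this query in a nested query for the given path."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "more_like_this": {
                        "fields": [f"{path}.{field}"],
                        "like": [like_text],
                        "min_term_freq": 1,
                        # Lowered from the default of 5 so that terms found in
                        # only a few documents are not ignored.
                        "min_doc_freq": min_doc_freq,
                        "max_query_terms": 12,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_nested_mlt("payload", "category", "ASSEST"), indent=2))
```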

How to perform nested aggregation in child parent relationship

I am using Elasticsearch 7.11 and have implemented a parent-child relation. One of the main reasons was that my updates are very frequent and a new child can be added under a parent at any time.
My project manages all the computers in the network, and all activity related to the endpoints should be logged for analytics purposes.
My mapping is something like this:
PcInformation -> User
The PC has its own information (the main thing to note is the activationTime), and the user has its department, username, role, etc.
Now I want to get the top departments with respect to PCs and their time.
Say I want to know which departments had the most PCs in 2020.
What I am currently doing is first getting all the PCs through the user relationship, using the has_child query below.
{
"query": {
"bool": {
"filter": [
{
"has_child": {
"type": "user",
"query": {
"nested": {
"path": "user",
"query": {
"match_all": {}
}
}
}
}
},
{
"range": {
"regDate": {
"gte": "2020-04-11",
"lte": "2022-04-30"
}
}
}
]
}
}
}
This returns all the PCs in the specified time range.
Then I perform an aggregation on user first, with a sub-aggregation on the pcConnection data for the time-based aggregation. Now I want to know the name of the department, but this is not in the PC information.
One option is to put the user information in the PC document, but then I would lose the reason I am using the parent-child model.
Is there any way to do this?
Update
The sample mapping:
{
"pcinformation": {
"mappings": {
"properties": {
"_class": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user": {
"type": "nested",
"properties": {
"userGroup": {
"type": "keyword"
},
"userTeam": {
"type": "keyword"
},
"userCode": {
"type": "long"
},
"userName": {
"type": "keyword"
}
}
},
"antivirus": {
"type": "nested",
"properties": {
"datetime": {
"type": "date"
},
"name": {
"type": "keyword"
}
}
},
"cpuId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"domainName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firewall": {
"type": "nested",
"properties": {
"datetime": {
"type": "date"
},
"status": {
"type": "keyword"
}
}
},
"friendlyName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"activationDate": {
"type": "date"
},
"macId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"osArch": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"osType": {
"type": "keyword"
},
"osVersion": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"pcSignature": {
"type": "text"
},
"pcSignatureHash": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"relation": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"infection": [
"user"
]
}
},
"userName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"vm": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
Since this is a parent-child model, I have two records. The first is the PC information:
{
"_index": "pcInformation",
"_type": "_doc",
"_id": "abcd",
"_version": 1,
"_score": 1,
"_source": {
"_class": "stor.doc.pcInformation",
"pcSignatureHash": "abcd",
"pcSignature": "dddd",
"name": "DESKTOP8JGBPB9",
"userName": "Win1064",
"osType": "Windows.10.Enterprise",
"domainName": "DESKTOP8JGBPB9",
"cpuId": "NOCPUID",
"osVersion": "10.0.19042",
"osArch": "32",
"macId": "0800278A763D",
"activationDate": "2021-05-25T08:46:30.510Z",
"vm": "No VM",
"friendlyName": "Windows Defender",
"relation": {
"name": "pcInformation"
}
}
}
The other one is user information.
{
"_index": "pcInformation",
"_type": "_doc",
"_id": "Qw60onkBDTnt1BMJOeq0",
"_version": 1,
"_score": 1,
"_routing": "abcd",
"_source": {
"_class": "stor.doc.pcInformation",
"agent": {
"userCode": 1,
"userGroup":"admin",
"userRole":"manager"
},
"relation": {
"name": "user",
"parent": "abcd"
}
}
}
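A possible starting point, not a verified answer: the children aggregation moves from the matching parent (PC) documents to their child (user) documents, so a terms aggregation can then run on a field of the child. The sketch below builds such a request body as a Python dict. The helper name, the aggregation names, and the default field names (activationDate from the sample mapping, agent.userGroup from the sample user document) are assumptions; substitute whichever field actually holds the department, and note that terms needs a keyword-like field.

```python
import json


def build_department_agg(gte, lte, date_field="activationDate",
                         child_type="user",
                         department_field="agent.userGroup", size=10):
    """Filter parent documents by date, then bucket their children by department."""
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"range": {date_field: {"gte": gte, "lte": lte}}},
        "aggs": {
            "to_users": {
                # Join from the matching parents to children of this relation.
                "children": {"type": child_type},
                "aggs": {
                    "top_departments": {
                        # Placeholder field name: replace with the real one.
                        "terms": {"field": department_field, "size": size}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_department_agg("2020-01-01", "2020-12-31"), indent=2))
```

The bucket counts here would be counts of user documents, not of PCs, so they only equal the number of PCs if each PC has a single user. If the department lives under a field mapped as nested, an extra nested aggregation would likely be needed inside to_users.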

Elasticsearch query for multiple terms

I am trying to create a search query that allows searching by name and type.
I have indexed the values, and my record in Elasticsearch looks like this:
{
"_index": "assets",
"_type": "asset",
"_id": "eAOEN28BcFmQazI-nngR",
"_score": 1,
"_source": {
"name": "test.png",
"mediaType": "IMAGE",
"meta": {
"content-type": "image/png",
"width": 3348,
"height": 1890
},
"createdAt": "2019-12-24T10:47:15.727Z",
"updatedAt": "2019-12-24T10:47:15.727Z"
}
}
So how would I create, for example, a query that finds all assets that have the name "test" and are images?
I tried a multi_match query, but it did not return the correct results:
{
"query": {
"multi_match" : {
"query": "*test* IMAGE",
"type": "cross_fields",
"fields": [ "name", "mediaType" ],
"operator": "and"
}
}
}
The query above returns 0 results, and if I change the operator to "or" it returns all assets of type IMAGE.
Any suggestions would be greatly appreciated. TIA!
EDIT: Added Mapping
Below is the mapping:
{
"assets": {
"aliases": {},
"mappings": {
"properties": {
"__v": {
"type": "long"
},
"createdAt": {
"type": "date"
},
"deleted": {
"type": "date"
},
"mediaType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"meta": {
"properties": {
"content-type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"width": {
"type": "long"
},
"height": {
"type": "long"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"originalName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"updatedAt": {
"type": "date"
}
}
},
"settings": {
"index": {
"creation_date": "1575884312237",
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "nSiAoIIwQJqXQRTyqw9CSA",
"version": {
"created": "7030099"
},
"provided_name": "assets"
}
}
}
}
You are unnecessarily using a wildcard expression for this simple query.
First, change the analyzer on the name field.
The default standard analyzer does not split on ".", so test.png is indexed as a single token and a search for test does not match it. You need a custom analyzer that replaces "." with a space, so that both test and png end up in the inverted index and a search for test finds test.png. The main benefit of doing this is avoiding regex queries, which are very costly.
Below is the updated mapping with a custom analyzer that does this. Update your mapping and re-index all the documents.
{
"aliases": {},
"mappings": {
"properties": {
"__v": {
"type": "long"
},
"createdAt": {
"type": "date"
},
"deleted": {
"type": "date"
},
"mediaType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"meta": {
"properties": {
"content-type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"width": {
"type": "long"
},
"height": {
"type": "long"
}
}
},
"name": {
"type": "text",
"analyzer" : "my_analyzer"
},
"originalName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"updatedAt": {
"type": "date"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"replace_dots"
]
}
},
"char_filter": {
"replace_dots": {
"type": "mapping",
"mappings": [
". => \\u0020"
]
}
}
},
"index": {
"number_of_shards": "1",
"number_of_replicas": "1"
}
}
}
Second, change your query to a bool query as below:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "test"
}
},
{
"match": {
"mediaType.keyword": "IMAGE"
}
}
]
}
}
}
This uses must with two match queries, which means it returns documents only when every clause in must matches.
I already tested this solution by creating the index, inserting a few sample docs, and querying them. Let me know if you need any help.
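To make the effect of the replace_dots char filter concrete, here is a rough simulation in plain Python. It is not how Elasticsearch tokenizes internally: the regular expression is only a simplified stand-in for the standard tokenizer (an assumption for illustration), chosen so that a dot between two alphanumeric runs keeps the token together.

```python
import re

# Simplified stand-in for the standard tokenizer: a dot between two
# alphanumeric runs does not split the token. This is an approximation.
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def default_tokens(text):
    """Tokens roughly as produced without any char filter."""
    return _TOKEN.findall(text)


def tokens_with_dot_filter(text):
    """Tokens after a mapping char filter has turned '.' into a space."""
    return _TOKEN.findall(text.replace(".", " "))


if __name__ == "__main__":
    print(default_tokens("test.png"))          # one token containing the dot
    print(tokens_with_dot_filter("test.png"))  # two separate tokens
```

One thing to keep in mind: my_analyzer as written has no lowercase token filter, so matching on name is case-sensitive unless you add one.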
Did you try best_fields?
{
"query": {
"multi_match" : {
"query": "Will Smith",
"type": "best_fields",
"fields": [ "name", "mediaType" ],
"operator": "and"
}
}
}

my phrase_prefix query does not work for numeric values

my query is pretty simple, it looks like this:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "something_to_search",
"type": "phrase_prefix",
"fields": [
"name",
"id"
...
],
"lenient": true
}
}
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
name is a text field and id is a numeric field. If I search for "Jo" I get people whose names start with "Jo", but if I search for "123" I don't get people whose ids start with "123". If I search for the exact id, I do get a result.
Can someone please tell me how I can also get prefix matches on numeric fields?
My mappings:
{
"people_db": {
"mappings": {
"person": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"street": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"streetNumber": {
"type": "long"
},
"zipCode": {
"type": "long"
}
}
},
"country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "long"
}
}
}
}
}
}
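A possible direction, offered as an assumption rather than a tested answer: a long field is indexed as a number, not as text terms, so there is nothing for a prefix match to work against. A common workaround is to also index the id as a keyword and run a prefix query on that. The sketch below assumes a hypothetical id.keyword sub-field, which does not exist in the mapping shown and would require a mapping change plus a reindex. The helper name is also made up.

```python
import json


def build_name_or_id_prefix_query(text, id_keyword_field="id.keyword"):
    """Match a name prefix via phrase_prefix, or an id prefix via a prefix query."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": text,
                            "type": "phrase_prefix",
                            "fields": ["name"],
                        }
                    },
                    # Hypothetical keyword copy of the numeric id field.
                    {"prefix": {id_keyword_field: {"value": text}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_name_or_id_prefix_query("123"), indent=2))
```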

Sort a nested array and return top 10 in elastic

I have a nested data type in an elastic index and want to sort this ascending for all returned results. I have tried the following:
GET indexname/_search
{
"_source" : ["m_iTopicID", "m_iYear", "m_Companies"],
"query": {
"terms":{
"m_iTopicID": [11,12,13]
}
},
"sort" : [
{
"m_Companies.value" : {
"order" : "asc",
"nested_path" : "m_Companies"
}
}
]
}
The mapping of the index is as follows:
{
"indexname": {
"mappings": {
"topicyear": {
"properties": {
"m_Companies": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_People": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_Places": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_Subtopics": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_fActivation": {
"type": "float"
},
"m_iDocBodyWordCnt": {
"type": "long"
},
"m_iNodeID": {
"type": "long"
},
"m_iTopicID": {
"type": "long"
},
"m_iYear": {
"type": "long"
},
"m_szDocID": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_szDocTitle": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_szGeo1": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_szSourceType": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_szSrcUrl": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_szTopicNames": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
This returns all topics with ID 11, 12 or 13 with a list of m_Companies... but the lists aren't sorted ascending by the value field.
I would then like to only return the top 10 of each list. So the list doesn't return hundreds like currently but just n. If I can't achieve this option I will just obtain the top 10 at the front-end with a javascript splice(0,10) but it would be great if elastic could do this for me.
Thanks in advance.
Since you provided the sort in the main (parent-level) query, it sorts only the parent/root documents. As you might have observed in the results, documents are ordered by their minimum m_Companies.value.
To sort the nested documents within each document, you have to apply the sort inside the nested documents, because m_Companies entries are subdocuments of the parent document. You have to use nested inner_hits and then sort the inner_hits.
This GitHub issue has a very good example of what I am trying to explain: a top-level sort orders only the parent/root documents based on values in the nested documents.
Since you want all the nested documents, you can let the nested query fetch them all with match_all and sort on the value field.
You can use the following query:
{
"_source": ["m_iYear", "m_Companies"],
"query": {
"bool": {
"must": [{
"terms": {
"m_iTopicID": [11, 12, 13]
}
},
{
"nested": {
"path": "m_Companies",
"query": {
"match_all": {}
},
"inner_hits": {
"sort": [{
"m_Companies.value": "asc"
}]
}
}
}
]
}
},
"sort": [{
"m_Companies.value": {
"order": "asc",
"nested_path": "m_Companies"
}
}]
}
Hope this helps,
Thanks
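For the "top 10" part of the question: inner_hits returns only 3 hits per document by default, and it accepts a size parameter. A minimal sketch follows, with the query body built as a Python dict plus a client-side fallback equivalent to the splice(0,10) idea. The helper names are invented for illustration, and m_Companies is left out of _source on the assumption that the sorted entries will be read from inner_hits instead.

```python
import json


def build_sorted_inner_hits_query(topic_ids, path="m_Companies", top_n=10):
    """Nested match_all whose inner hits are sorted ascending and capped at top_n."""
    return {
        "_source": ["m_iTopicID", "m_iYear"],
        "query": {
            "bool": {
                "must": [
                    {"terms": {"m_iTopicID": topic_ids}},
                    {
                        "nested": {
                            "path": path,
                            "query": {"match_all": {}},
                            "inner_hits": {
                                "size": top_n,  # default is 3 if omitted
                                "sort": [{f"{path}.value": "asc"}],
                            },
                        }
                    },
                ]
            }
        },
    }


def top_n_client_side(companies, n=10):
    """Fallback: sort a document's nested list by value and keep the first n."""
    return sorted(companies, key=lambda c: c["value"])[:n]


if __name__ == "__main__":
    print(json.dumps(build_sorted_inner_hits_query([11, 12, 13]), indent=2))
```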
