Elasticsearch dynamic field nested detection

Hi, I'm trying to create an index in Elasticsearch without defining the mapping, so this is what I did:
PUT my_index1/my_type/1
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith",
"age" : "1",
"enabled": false
},
{
"first" : "Alice",
"last" : "White",
"age" : "10",
"enabled": true
}
]
}
When I do this, Elasticsearch creates a mapping for the index, and the result is:
{
"my_index1": {
"mappings": {
"my_type": {
"properties": {
"group": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user": {
"properties": {
"age": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"enabled": {
"type": "boolean"
},
"first": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
Notice that the user property didn't get a type of nested, while the other properties have types assigned by Elasticsearch. Is there a way to do this automatically? The mapping for the user property should look like this:
"user": {
"type": "nested",
"properties": {
"age": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"enabled": {
"type": "boolean"
},
"first": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
The nested type is what is missing. I'm currently using NEST.
Is there a way to define a dynamic mapping that detects whether newly indexed data is nested?

By default, Elasticsearch/Lucene has no concept of inner objects. Therefore, it flattens object hierarchies into a simple list of field names and values.
The above document would be converted internally into a document that looks more like this (see the Nested field type documentation for more details):
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
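To make the flattening concrete, here is a minimal Python sketch. It is not Elasticsearch code: the `flatten` helper is hypothetical and only mimics how an array of objects collapses into per-field value lists. It skips analysis such as lowercasing, so values keep their original case.

```python
def flatten(doc, prefix=""):
    """Collapse inner objects and arrays of objects into dotted field names."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into inner objects, merging values under the same path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}
flat = flatten(doc)
print(flat)
```

After flattening, nothing records that "John" belongs with "Smith" rather than "White". That association is what the nested type keeps.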
There is no elegant answer here. A common approach is to use a dynamic template that maps objects to nested. The side effect is that every field of object type becomes nested:
{
"mappings": {
"dynamic_templates": [
{
"objects": {
"match": "*",
"match_mapping_type": "object",
"mapping": {
"type": "nested"
}
}
}
]
}
}
Another approach is to specify the mapping for the field before inserting data.
PUT <your index>
{
"mappings": {
"properties": {
"user": {
"type": "nested"
}
}
}
}

You can define a dynamic template with your own custom mapping, which is then applied when you index documents into the index.
Here is a step-by-step procedure that automatically maps the user field to the nested type.
First, define a dynamic template for the index as shown below. Its match parameter matches any field whose name fits the pattern user* and maps it to the nested type:
PUT /<index-name>
{
"mappings": {
"dynamic_templates": [
{
"nested_users": {
"match": "user*",
"mapping": {
"type": "nested"
}
}
}
]
}
}
After creating the index with this template, index the documents into it:
POST /<index-name>/_doc/1
{
"group": "fans",
"user": [
{
"first": "John",
"last": "Smith",
"age": "1",
"enabled": false
},
{
"first": "Alice",
"last": "White",
"age": "10",
"enabled": true
}
]
}
Now when you look at the mapping of the index using the Get Mapping API, it is similar to what you expect to see:
GET /<index-name>/_mapping?pretty
{
"index-name": {
"mappings": {
"dynamic_templates": [
{
"nested_users": {
"match": "user*",
"mapping": {
"type": "nested"
}
}
}
],
"properties": {
"group": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user": {
"type": "nested", // note this
"properties": {
"age": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"enabled": {
"type": "boolean"
},
"first": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Or, as @Jacky1205 mentioned, if it is not field-specific, you can use the template below, which maps all fields of object type to the nested type:
{
"mappings": {
"dynamic_templates": [
{
"nested_users": {
"match": "*",
"match_mapping_type": "object",
"mapping": {
"type": "nested"
}
}
}
]
}
}
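The difference between the two templates can be sketched in Python. This is only an approximation under stated assumptions: `detected_type` and `pick_mapping` are hypothetical helpers, `fnmatchcase` stands in for Elasticsearch's simple wildcard matching, and the `username` field is invented for illustration. The one rule taken from Elasticsearch itself is that templates are processed in order and the first match wins.

```python
from fnmatch import fnmatchcase


def detected_type(value):
    """Rough stand-in for the JSON type that dynamic mapping would detect."""
    if isinstance(value, list):
        # For arrays, look at the first non-null element.
        value = next((v for v in value if v is not None), None)
    if isinstance(value, dict):
        return "object"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return None


def pick_mapping(field, value, templates):
    """Return the mapping of the first template that matches, else None."""
    for template in templates:
        (rule,) = template.values()  # each entry is {template_name: rule}
        if "match" in rule and not fnmatchcase(field, rule["match"]):
            continue
        wanted = rule.get("match_mapping_type")
        if wanted is not None and wanted != detected_type(value):
            continue
        return rule["mapping"]
    return None


by_name = [{"nested_users": {"match": "user*", "mapping": {"type": "nested"}}}]
by_type = [{"nested_users": {"match": "*", "match_mapping_type": "object",
                             "mapping": {"type": "nested"}}}]
users = [{"first": "John"}, {"first": "Alice"}]

print(pick_mapping("user", users, by_name))         # name-based template
print(pick_mapping("group", "fans", by_type))       # strings are left alone
print(pick_mapping("username", "jsmith", by_name))  # name-only match also hits scalars
```

The last call shows why the name-only template needs care: user* also matches a plain string field whose name starts with user. Adding match_mapping_type to the template narrows it to objects.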

Related

how do we use and query the keyword field?

When I do
PUT /vehicles/_doc/123
{
"make" : "Honda Civic",
"color" : "Blue",
"from": "Japan",
"size": "Big",
"comment": "deja vu",
"HP" : 250,
"milage" : 24000,
"price": 19300.97
}
It automatically generates the index definition below:
{
"vehicles": {
"aliases": {},
"mappings": {
"properties": {
"HP": {
"type": "long"
},
"color": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"comment": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"from": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"make": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"milage": {
"type": "long"
},
"price": {
"type": "float"
},
"size": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "1",
"provided_name": "vehicles",
"creation_date": "1670864230815",
"number_of_replicas": "1",
"uuid": "etLFicsvSXCpeuFiYCiT0g",
"version": {
"created": "8050299"
}
}
}
}
}
In the index, a field such as color has type text and also a sub-field named keyword. How do we use and query the keyword field?
To query the keyword field, use color.keyword in your query. To query the text part, simply use color as the field name.
text and keyword fields are tokenized and stored differently and are used in different scenarios: a text field is analyzed into individual terms, while a keyword field is indexed as one exact value. This answer is useful for understanding the difference.
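A small Python sketch of that difference, assuming a much simplified analyzer. `analyze_text` and `analyze_keyword` are hypothetical helpers and only approximate what the standard analyzer and a keyword field do:

```python
import re


def analyze_text(value):
    """Very rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_keyword(value):
    """A keyword field keeps the whole value as one exact term."""
    return [value]


make = "Honda Civic"
text_terms = analyze_text(make)        # terms searchable through "make"
keyword_terms = analyze_keyword(make)  # term searchable through "make.keyword"

print(text_terms, keyword_terms)
print("civic" in text_terms)           # a single word matches the text field
print("civic" in keyword_terms)        # but not the keyword field
print("Honda Civic" in keyword_terms)  # only the exact value matches
```

In practice this is why full-text queries such as match usually target the text field, while exact filters, sorting, and aggregations usually target the .keyword sub-field.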

more like this search is not working on field in list object

This is the mapping of my index. I am searching on category, a field of the nested payload field.
{
"mappings": {
"date_detection": false,
"properties": {
"#class": {
"type": "keyword"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"payload": {
"type": "nested",
"properties": {
"#class": {
"type": "keyword"
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"category": {
"type": "keyword"
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
This is my index mapping. Whenever I try to search with a more_like_this query in Elasticsearch, it does not return any objects.
I am searching on a list of objects.
The query is:
{
"query": {
"more_like_this": {
"fields": [
"payload.category"
],
"like": [
"ASSEST"
],
"min_term_freq": 1,
"max_query_terms": 12
}
}
}
It does not return any objects, but the values are present in Elasticsearch.
I just want to find similar object values in Elasticsearch through a more_like_this query.
But payload is actually a list of objects that have the field category, and I need to find similar objects based on it.
For nested fields, use a nested query:
GET <index-name>/_search
{
"query": {
"nested": {
"path": "payload",
"query": {
"more_like_this": {
"fields": [
"payload.category"
],
"like": [
"assest doc"
],
"min_term_freq": 1,
"max_query_terms": 12
}
}
}
}
}
Also look at min_doc_freq:
The minimum document frequency below which the terms will be ignored from the input document. Defaults to 5.
If you have fewer than 5 matching documents, set "min_doc_freq" to 1.
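The effect of min_doc_freq can be illustrated with a small Python sketch. This is not how Elasticsearch implements it (term selection happens inside Lucene); `usable_terms` and the sample documents are hypothetical and only show why a small index can leave no terms to search with:

```python
from collections import Counter


def usable_terms(like_terms, docs, min_doc_freq=5):
    """Keep only the terms that occur in at least min_doc_freq documents."""
    doc_freq = Counter()
    for terms in docs:
        doc_freq.update(set(terms))  # count each term once per document
    return [term for term in like_terms if doc_freq[term] >= min_doc_freq]


# Three tiny hypothetical documents, already split into terms.
docs = [["assest", "doc"], ["assest"], ["other"]]

with_default = usable_terms(["assest"], docs)                  # default of 5
with_one = usable_terms(["assest"], docs, min_doc_freq=1)
print(with_default, with_one)
```

With the default of 5, a term that appears in only two documents is dropped, so the query has nothing left to match on. Lowering min_doc_freq to 1 keeps it.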

Elasticsearch query for multiple terms

I am trying to create a search query that allows searching by name and type.
I have indexed the values, and my record in Elasticsearch looks like this:
{
_index: "assets",
_type: "asset",
_id: "eAOEN28BcFmQazI-nngR",
_score: 1,
_source: {
name: "test.png",
mediaType: "IMAGE",
meta: {
content-type: "image/png",
width: 3348,
height: 1890,
},
createdAt: "2019-12-24T10:47:15.727Z",
updatedAt: "2019-12-24T10:47:15.727Z",
}
}
So how would I create, for example, a query that finds all assets that have the name "test" and are images?
I tried a multi_match query, but it did not return the correct results:
{
"query": {
"multi_match" : {
"query": "*test* IMAGE",
"type": "cross_fields",
"fields": [ "name", "mediaType" ],
"operator": "and"
}
}
}
The query above returns 0 results, and if I change the operator to "or" it returns all the assets of type IMAGE.
Any suggestions would be greatly appreciated. TIA!
EDIT: Added Mapping
Below is the mapping:
{
"assets": {
"aliases": {},
"mappings": {
"properties": {
"__v": {
"type": "long"
},
"createdAt": {
"type": "date"
},
"deleted": {
"type": "date"
},
"mediaType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"meta": {
"properties": {
"content-type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"width": {
"type": "long"
},
"height": {
"type": "long"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"originalName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"updatedAt": {
"type": "date"
}
}
},
"settings": {
"index": {
"creation_date": "1575884312237",
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "nSiAoIIwQJqXQRTyqw9CSA",
"version": {
"created": "7030099"
},
"provided_name": "assets"
}
}
}
}
You are unnecessarily using a wildcard expression for this simple query.
First, change the analyzer on the name field.
You need a custom analyzer that replaces . with a space, because the default standard analyzer doesn't do that. Then a search for test finds test.png, since both test and png are in the inverted index. The main benefit is avoiding wildcard and regex queries, which are very costly.
The updated mapping below uses a custom analyzer that does this. Update your mapping and re-index all the documents.
{
"aliases": {},
"mappings": {
"properties": {
"__v": {
"type": "long"
},
"createdAt": {
"type": "date"
},
"deleted": {
"type": "date"
},
"mediaType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"meta": {
"properties": {
"content-type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"width": {
"type": "long"
},
"height": {
"type": "long"
}
}
},
"name": {
"type": "text",
"analyzer" : "my_analyzer"
},
"originalName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"updatedAt": {
"type": "date"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"replace_dots"
]
}
},
"char_filter": {
"replace_dots": {
"type": "mapping",
"mappings": [
". => \\u0020"
]
}
}
},
"index": {
"number_of_shards": "1",
"number_of_replicas": "1"
}
}
}
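What the custom analyzer does to a file name can be sketched in Python. This is a simplified illustration, not the real analysis chain: `replace_dots` and `my_analyzer` are hypothetical stand-ins named after the settings above, and the regex is only a crude substitute for the standard tokenizer.

```python
import re


def replace_dots(text):
    """Mimic the "replace_dots" mapping char filter: "." becomes a space."""
    return text.replace(".", " ")


def my_analyzer(text):
    """Apply the char filter, then a crude word tokenizer.

    No lowercasing happens here, mirroring the analyzer definition above,
    which lists a tokenizer and a char filter but no token filters.
    """
    return re.findall(r"\w+", replace_dots(text))


tokens = my_analyzer("test.png")
print(tokens)
print(my_analyzer("Test.PNG"))
```

Because the definition has no lowercase filter, an upper-case name such as Test.PNG would keep its case in this sketch. If case-insensitive matching matters, adding a lowercase token filter to the analyzer is worth considering.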
Second, change your query to a bool query as below:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "test"
}
},
{
"match": {
"mediaType.keyword": "IMAGE"
}
}
]
}
}
}
This uses must with two match queries, which means it returns documents only when every clause in must matches.
I tested this solution by creating the index, inserting a few sample docs, and querying them. Let me know if you need any help.
Did you try best_fields?
{
"query": {
"multi_match" : {
"query": "Will Smith",
"type": "best_fields",
"fields": [ "name", "mediaType" ],
"operator": "and"
}
}
}

my phrase_prefix query does not work for numeric values

My query is pretty simple. It looks like this:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "something_to_search",
"type": "phrase_prefix",
"fields": [
"name",
"id"
...
],
"lenient": true
}
}
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
name is a text value and id is a numeric value. If I search for "Jo", I get people whose names start with "Jo", but if I search for "123", I don't get people whose ids start with "123". If I search for the exact id, I do get a result.
Can someone please tell me how to get prefix matching on numeric fields as well?
My mappings:
{
"people_db": {
"mappings": {
"person": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"street": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"streetNumber": {
"type": "long"
},
"zipCode": {
"type": "long"
}
}
},
"country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "long"
}
}
}
}
}
}

Sort a nested array and return top 10 in elastic

I have a nested data type in an elastic index and want to sort this ascending for all returned results. I have tried the following:
GET indexname/_search
{
"_source" : ["m_iTopicID", "m_iYear", "m_Companies"],
"query": {
"terms":{
"m_iTopicID": [11,12,13]
}
},
"sort" : [
{
"m_Companies.value" : {
"order" : "asc",
"nested_path" : "m_Companies"
}
}
]
}
The mapping of the index as follows:
{
"indexname": {
"mappings": {
"topicyear": {
"properties": {
"m_Companies": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_People": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_Places": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_Subtopics": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_fActivation": {
"type": "float"
},
"m_iDocBodyWordCnt": {
"type": "long"
},
"m_iNodeID": {
"type": "long"
},
"m_iTopicID": {
"type": "long"
},
"m_iYear": {
"type": "long"
},
"m_szDocID": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_szDocTitle": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_szGeo1": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_szSourceType": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_szSrcUrl": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_szTopicNames": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
This returns all topics with ID 11, 12 or 13 with a list of m_Companies... but the lists aren't sorted ascending by the value field.
I would then like to return only the top 10 of each list, so that each list contains just n entries instead of the hundreds it returns currently. If I can't achieve this, I will just take the top 10 at the front end with a JavaScript splice(0,10), but it would be great if Elasticsearch could do this for me.
Thanks in advance.
Since you provided the sort at the main/parent query level, it sorts only the parent/root documents. As you might have observed in the results, the documents are sorted by the minimum value of m_Companies.value.
To sort the nested documents within each document, you have to go inside the nested document and apply the sort there, because the m_Companies entries are subdocuments of the parent document. Use nested inner_hits and sort the inner_hits.
This GitHub issue has a very good example of what I am trying to explain: the top-level sort orders only the parent/root documents based on values in the nested documents.
Since you want all the nested documents, let the nested query fetch them all with match_all and sort on the value field.
You can use the following query:
{
"_source": ["m_iYear", "m_Companies"],
"query": {
"bool": {
"must": [{
"terms": {
"m_iTopicID": [11, 12, 13]
}
},
{
"nested": {
"path": "m_Companies",
"query": {
"match_all": {}
},
"inner_hits": {
"sort": [{
"m_Companies.value": "asc"
}]
}
}
}
]
}
},
"sort": [{
"m_Companies.value": {
"order": "asc",
"nested_path": "m_Companies"
}
}]
}
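If the trimming ends up on the client side, as the question suggests as a fallback, it could look like the Python sketch below. The `top_companies` helper and the sample hit are hypothetical. The sketch assumes each hit carries the full m_Companies array in _source.

```python
def top_companies(hit, n=10):
    """Return the n lowest-valued m_Companies entries of one search hit."""
    companies = hit["_source"].get("m_Companies", [])
    return sorted(companies, key=lambda company: company["value"])[:n]


# Hypothetical hit, shaped like one entry of hits.hits in a search response.
hit = {
    "_source": {
        "m_iYear": 2019,
        "m_Companies": [
            {"name": "b", "value": 2.5},
            {"name": "a", "value": 0.5},
            {"name": "c", "value": 1.0},
        ],
    }
}
names = [company["name"] for company in top_companies(hit, n=2)]
print(names)
```

To trim on the server instead, inner_hits also accepts a size option, so the number of sorted nested hits returned per document can be set there.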
Hope this helps,
Thanks

Resources