ElasticSearch indexing with nested collections in document - elasticsearch

I have been wrestling with an issue trying to index a document into a brand new index in Elasticsearch. My document looks something like this:
{
"id": "",
"name": "Process to run batch of steps",
"defaultErrorStep": {
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "General Error Handler",
"type": "ERROR",
"reference": "error",
"onError": "DEFAULT"
},
"startingStep": "one",
"steps": [
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step One",
"type": "CHAIN",
"reference": "one",
"onComplete": "two",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Two",
"type": "CHAIN",
"reference": "two",
"onComplete": "two",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Three",
"type": "BOOLEAN",
"reference": "three",
"onTrue": "four",
"onFalse": "five",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Four",
"type": "LOOP",
"startingStep": "seven",
"steps": [
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Two",
"type": "CHAIN",
"reference": "six",
"onComplete": "seven",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Five",
"type": "FINISH_VOID",
"end": false,
"reference": "seven",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
}
],
"reference": "four",
"onComplete": "five",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Five",
"type": "FINISH",
"end": true,
"reference": "five",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
}
],
"configuration": {
"settings": {
"property-a": "a",
"property-b": "b",
"property-c": "c",
"property-d": "d",
"property-z": "z123"
}
}
}
My issue is that, because the "steps" property is nested and a loop step can itself contain "steps", I run into a field duplication error when trying to index. I think I understand why my document is failing, but I need to index it all the same. When I try to index the document I get the following error:
ElasticsearchException[Elasticsearch exception [type=json_parse_exception, reason=Duplicate field 'type'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@84a0697; line: 1, column: 186]]]
Again, I understand why this is an issue, but I figured I could address it with mappings in my index. I have tried the nested object type, the flattened object type, and even setting index:false on the steps field just to see if I could get the document to go in, but no luck. I suspect this is a simple fix that I just cannot see. Does anyone have any thoughts on what I can try to get this document to index?
I am using Elasticsearch 7.3.1 via the latest Java SDK release. I have bypassed the Java code for now and am just using Postman to send the indexing command, but I still get the same issue.
Below is an example of one of the mappings I have tried.
{
"_source" : {
"enabled": true
},
"properties" : {
"name": {
"type": "text",
"fields": {
"raw":{"type": "keyword"}
}
},
"steps":{
"type":"nested",
"properties":{
"steps":{
"type":"flattened",
"index":false
}
}
},
"configuration.settings":{"type":"flattened"}
}
}
As well as a more explicit mapping to cover the "defaultErrorStep" object.
{
"_source" : {
"enabled": true
},
"properties" : {
"name": {
"type": "text",
"fields": {
"raw":{"type": "keyword"}
}
},
"defaultErrorStep":{
"type":"object",
"properties":{
"id":{"type":"text"},
"name":{"type":"text"},
"type":{"type":"text"},
"reference":{"type":"text"},
"onError":{"type":"text"}
}
},
"steps":{
"type":"nested",
"properties":{
"id":{"type": "text"},
"name":{
"type": "text",
"fields": {
"raw":{"type": "keyword"}
}
},
"type":{"type": "text"},
"reference":{"type": "text"},
"onComplete":{"type": "text"},
"onError":{"type": "text"},
"parameterKeys":{"type": "object"},
"onTrue":{"type": "text"},
"onFalse":{"type": "text"},
"startingStep":{"type": "text"},
"steps":{
"type":"nested",
"properties":{
"id":{"type": "text"},
"name":{
"type": "text",
"fields": {
"raw":{"type": "keyword"}
}
},
"type":{"type": "text"},
"reference":{"type": "text"},
"onComplete":{"type": "text"},
"onError":{"type": "text"},
"parameterKeys":{"type": "object"},
"onTrue":{"type": "text"},
"onFalse":{"type": "text"},
"startingStep":{"type": "text"},
"steps":{
"type": "flattened",
"index":false
},
"end":{"type": "boolean"}
}
},
"end":{"type": "boolean"}
}
},
"configuration.settings":{"type":"flattened"}
}
}
Please also bear in mind that the document outlines a process/workflow of logic, so the structure is key, and I would also say it is valid JSON. In theory the steps property could nest 3, 4, or 10 levels deep if it had to, so ideally I wouldn't want to update the mapping every time a new level is added to the data.
Any help anyone can give me to get this document to index would be much appreciated.
Thanks,
EDIT:
I have since removed my explicit mapping from my index and let dynamic mapping take over, as all my objects fit into the base types dynamic mapping supports. This has been successful: I am able to index the document shown above, with arbitrarily deep nested steps, without any problem. I then tried the same operation with the same document structure using the Java SDK, and it failed with the same duplicate field exception. This suggests to me that the issue is with the Java SDK and not something native to Elasticsearch itself.
Dynamic mapping is the better option in my case as I have no control over how many levels steps could eventually get to.
Has anyone experienced any issues with the SDK behaving differently to the base product?
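One way to narrow this down is to capture the exact JSON body the Java client sends and check it for a key repeated inside a single object, since that is what the Duplicate field 'type' message points at. The sketch below is a hypothetical Python helper (find_duplicate_keys is not part of any Elasticsearch client). It assumes the payload has already been captured as a string and relies only on the standard json module's object_pairs_hook.

```python
import json


def find_duplicate_keys(payload: str) -> list[str]:
    """Return every key that appears more than once within the same JSON object."""
    duplicates: list[str] = []

    def hook(pairs):
        # `pairs` holds all (key, value) pairs of one object, duplicates included.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(payload, object_pairs_hook=hook)
    return duplicates


# "type" appears twice in the outer object; the inner "type" belongs to another object.
captured = '{"type": "LOOP", "steps": [{"type": "CHAIN"}], "type": "x"}'
print(find_duplicate_keys(captured))
```

The same key used in a parent and in a child object is not reported, so a hit here would mean the serializer really is writing the field twice into one object.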

I am running Elasticsearch 7.3.1, and with the following index mapping I am able to create an index with nested types inside a nested type.
PUT new_index_1
{
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"steps": {
"type": "nested",
"properties": {
"steps": {
"type": "flattened",
"index": false
}
}
},
"configuration.settings": {
"type": "flattened"
}
}
}
}
The following index creation also works for me:
PUT new_index_2
{
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"steps": {
"type": "nested",
"properties": {
"steps": {
"type": "nested"
}
}
}
}
}
}
Document indexed:
POST new_index_1/_doc
{
"name": "ajay",
"steps": [
{
"test": "working",
"steps": [
{
"name": "crow"
}
]
}
]
}
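If an explicit mapping is still wanted for a known number of levels, the "steps" part can be generated rather than written by hand. This is only a sketch in Python: build_steps_mapping is a hypothetical helper, and it assumes the same shape as the new_index_1 mapping above (nested levels that end in a non-indexed flattened field).

```python
import json


def build_steps_mapping(nested_levels: int) -> dict:
    """Build a "steps" mapping with `nested_levels` nested levels.

    The innermost "steps" is a non-indexed flattened field, as in new_index_1.
    """
    if nested_levels < 1:
        raise ValueError("nested_levels must be at least 1")
    mapping = {"type": "flattened", "index": False}
    for _ in range(nested_levels):
        # Wrap the current mapping in one more nested level.
        mapping = {"type": "nested", "properties": {"steps": mapping}}
    return mapping


# One nested level reproduces the "steps" section of new_index_1.
body = {"mappings": {"properties": {"steps": build_steps_mapping(1)}}}
print(json.dumps(body, indent=2))
```

The printed body would be sent with PUT to the index name, the same way as the requests above.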

Related

Elasticsearch - nested types vs collapse/aggs

I have a use case where I need to find the latest data based on some fields.
The fields are:
category.name
category.type
createdAt
For example: search for the newest data where category.name = 'John G.' AND category.type = 'A'. I expect the data with ID = 1 where it matches the criteria and is the newest one based on createdAt field ("createdAt": "2022-04-18 19:09:27.527+0200")
The problem is that category.* is a nested field and I can't aggs/collapse these fields because ES doesn't support it.
Mapping:
PUT data
{
"mappings": {
"properties": {
"createdAt": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSSZ"
},
"category": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"analyzer": "keyword"
}
}
},
"approved": {
"type": "text",
"analyzer": "keyword"
}
}
}
}
Data:
POST data/_create/1
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2022-04-18 19:09:27.527+0200",
"approved": "no"
}
POST data/_create/2
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "Max",
"createdAt": "2022-04-10 10:09:27.527+0200",
"approved": "no"
}
POST data/_create/3
{
"category": [
{
"name": "Rick J.",
"level": "B"
}
],
"createdBy": "Rick",
"createdAt": "2022-03-02 02:09:27.527+0200",
"approved": "no"
}
I'm looking for either a search query that can handle that in an acceptable performant way, or a new object design without nested type where I could take advantage of aggs/collapse feature.
Any suggestion will be really appreciated.
About your first question,
For example: search for the newest data where category.name = 'John G.' AND category.type = 'A'. I expect the data with ID = 1 where it matches the criteria and is the newest one based on createdAt field ("createdAt": "2022-04-18 19:09:27.527+0200")
I believe you can do something along those lines:
GET /72088168/_search
{
"query": {
"nested": {
"path": "category",
"query": {
"bool": {
"must": [
{
"match": {
"category.name": "John G."
}
},
{
"match": {
"category.level": "A"
}
}
]
}
}
}
},
"sort": [
{
"createdAt": {
"order": "desc"
}
}
],
"size":1
}
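To make the intent of that query concrete, here is a small in-memory illustration in Python. It is not Elasticsearch client code: newest_match is a hypothetical function that mimics the nested semantics (both conditions must hold on the same category element) with plain equality, then picks the newest createdAt. The documents are the three from the question.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"  # mirrors yyyy-MM-dd HH:mm:ss.SSSZ

docs = {
    "1": {"category": [{"name": "John G.", "level": "A"}, {"name": "Chris T.", "level": "A"}],
          "createdAt": "2022-04-18 19:09:27.527+0200"},
    "2": {"category": [{"name": "John G.", "level": "A"}, {"name": "Chris T.", "level": "A"}],
          "createdAt": "2022-04-10 10:09:27.527+0200"},
    "3": {"category": [{"name": "Rick J.", "level": "B"}],
          "createdAt": "2022-03-02 02:09:27.527+0200"},
}


def newest_match(docs: dict, name: str, level: str):
    """Return the id of the newest document with one category element matching both fields."""
    matches = [
        (datetime.strptime(doc["createdAt"], DATE_FORMAT), doc_id)
        for doc_id, doc in docs.items()
        # Nested semantics: name and level must match within the same element.
        if any(c["name"] == name and c["level"] == level for c in doc["category"])
    ]
    return max(matches)[1] if matches else None


print(newest_match(docs, "John G.", "A"))  # prints 1
```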
For the second matter, it really depends on what you are aiming to do. You could merge category.name and category.level into the same field, so that your document would look like:
{
"category": ["John G. A","Chris T. A"],
"createdBy": "Max",
"createdAt": "2022-04-10 10:09:27.527+0200",
"approved": "no"
}
No more nested needed. Although I agree it feels like using tape to fix your issue.
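A minimal sketch of that transformation, assuming it is applied to each document before indexing. merge_categories is a hypothetical helper, and the single-space separator is just the one used in the example above.

```python
def merge_categories(doc: dict) -> dict:
    """Return a copy of `doc` whose category list holds "name level" strings."""
    merged = dict(doc)
    merged["category"] = [f'{c["name"]} {c["level"]}' for c in doc["category"]]
    return merged


original = {
    "category": [
        {"name": "John G.", "level": "A"},
        {"name": "Chris T.", "level": "A"},
    ],
    "createdBy": "Max",
    "createdAt": "2022-04-10 10:09:27.527+0200",
    "approved": "no",
}
print(merge_categories(original)["category"])  # ['John G. A', 'Chris T. A']
```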

Elastic Search Wildcard query with space failing 7.11

My data is indexed in Elasticsearch 7.11. This is the mapping I got when I added documents directly to my index:
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}
I did not add the keyword part myself and have no idea where it came from.
I am running a wildcard query on this field, but I am unable to get results for search terms containing spaces.
{
"query": {
"bool":{
"should":[
{"wildcard": {"name":"*hello world*"}}
]
}
}
}
I have seen many answers related to not_analyzed, and I have tried setting {"index":"true"} in the mapping, but it did not help. How can I make the wildcard search work in this version of Elasticsearch?
Tried adding the wildcard field
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type" :"wildcard"
}
}
}
And got following response
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
},
"status": 400
}
Adding a sample document to match
{
"_index": "accelerators",
"_type": "_doc",
"_id": "602ec047a70f7f30bcf75dec",
"_score": 1.0,
"_source": {
"acc_id": "602ec047a70f7f30bcf75dec",
"name": "hello world example",
"type": "Accelerator",
"description": "khdkhfk ldsjl klsdkl",
"teamMembers": [
{
"userId": "karthik.r#gmail.com",
"name": "Karthik Ganesh R",
"shortName": "KR",
"isOwner": true
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS",
"isOwner": false
}
],
"sectorObj": [
{
"item_id": 14,
"item_text": "Cross-sector"
}
],
"geographyObj": [
{
"item_id": 4,
"item_text": "Global"
}
],
"technologyObj": [
{
"item_id": 1,
"item_text": "Artificial Intelligence"
}
],
"themeColor": 1,
"mainImage": "assets/images/Graphics/Asset 35.svg",
"features": [
{
"name": "Ideation",
"icon": "Asset 1007.svg"
},
{
"name": "Innovation",
"icon": "Asset 1044.svg"
},
{
"name": "Strategy",
"icon": "Asset 1129.svg"
},
{
"name": "Intuitive",
"icon": "Asset 964.svg"
}
],
"logo": {
"actualFileName": "",
"fileExtension": "",
"fileName": "",
"fileSize": 0,
"fileUrl": ""
},
"customLogo": {
"logoColor": "#B9241C",
"logoText": "EC",
"logoTextColor": "#F6F6FA"
},
"collaborators": [
{
"userId": "muhammed.arif#gmail.com",
"name": "muhammed Arif P T",
"shortName": "MA"
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS"
}
],
"created_date": "2021-02-18T19:30:15.238000Z",
"modified_date": "2021-03-11T11:45:49.583000Z"
}
}
You cannot modify a field mapping once created. However, you can create another sub-field of type wildcard, like this:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type": "text",
"fields": {
"wildcard": {
"type" :"wildcard"
},
"keyword": {
"type" :"keyword",
"ignore_above":256
}
}
}
}
}
When the mapping is updated, you need to reindex your data so that the new field gets indexed, like this:
POST http://localhost:9001/indexname/_update_by_query
And then when this finishes, you'll be able to query on this new field like this:
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.wildcard": "*hello world*"
}
}
]
}
}
}
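As for why the original query on name found nothing: a wildcard query is compared against individual indexed terms, and an analyzed text field stores "hello world example" as separate terms, none of which contains a space. The Python sketch below only imitates that behaviour. simulated_text_terms is a rough, hypothetical stand-in for the default analyzer (lowercase, split on whitespace), and fnmatchcase stands in for wildcard matching.

```python
from fnmatch import fnmatchcase


def simulated_text_terms(value: str) -> list[str]:
    """Rough stand-in for analyzing a text field: lowercase, split on whitespace."""
    return value.lower().split()


def wildcard_matches_any_term(terms: list[str], pattern: str) -> bool:
    """A wildcard pattern has to match a single indexed term in full."""
    return any(fnmatchcase(term, pattern) for term in terms)


name = "hello world example"
pattern = "*hello world*"

# text field: three separate terms, none contains "hello world"
print(wildcard_matches_any_term(simulated_text_terms(name), pattern))  # False
# wildcard or keyword sub-field: the whole value is kept as one term
print(wildcard_matches_any_term([name], pattern))  # True
```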

Separation of hits returned from elastic by nested field value

I have an index of products, and I'm trying to separate the hits returned from Elasticsearch by a nested field value. Here is my shortened index mapping:
{
"mapping": {
"product": {
"properties": {
"id": {
"type": "integer"
},
"model_name": {
"type": "text"
},
"variants": {
"type": "nested",
"properties": {
"attributes": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"product_attribute_id": {
"type": "integer"
},
"value": {
"type": "text"
}
}
},
"id": {
"type": "integer"
},
"product_id": {
"type": "integer"
}
}
}
}
}
}
}
And a product example (the real product has more variants and attributes; I just cut them off):
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1271,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"XL",
"product_attribute_id":36740
}
]
},
{
"id":1272,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"L",
"product_attribute_id":36741
}
]
}
]
}
The field in question is attribute id. Let's say I want to separate products by size attribute - id 29. It would be perfect if the response would look like:
"hits" : [
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1271,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"XL",
"product_attribute_id":36740
}
]
}
]
},
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1272,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"L",
"product_attribute_id":36741
}
]
}
]
}]
I thought about fetching all the variants in the Elasticsearch request and then grouping them by that attribute on the application side, but I don't think that's the most elegant or, above all, the most efficient way.
What are the elastic keywords that I should be interested in?
Thank you in advance for your help.
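For reference, the application-side fallback described above could look roughly like the Python sketch below. split_hits_by_attribute is a hypothetical helper, and it assumes the whole product document has already been fetched. On the Elasticsearch side, inner_hits on a nested query is the feature usually pointed to for returning the matching nested objects with each hit, though whether it yields exactly the shape shown above is an assumption to verify.

```python
def split_hits_by_attribute(product: dict, attribute_id: int) -> list[dict]:
    """Return one copy of the product per variant that carries the given attribute id."""
    hits = []
    for variant in product["variants"]:
        matching = [a for a in variant["attributes"] if a["id"] == attribute_id]
        if matching:
            # Copy the top-level fields, then attach only this variant.
            hit = {key: value for key, value in product.items() if key != "variants"}
            hit["variants"] = [{**variant, "attributes": matching}]
            hits.append(hit)
    return hits


product = {
    "id": 192,
    "model_name": "Some tshirt",
    "variants": [
        {"id": 1271, "product_id": 192, "attributes": [
            {"id": 29, "name": "clothesSize", "value": "XL", "product_attribute_id": 36740}]},
        {"id": 1272, "product_id": 192, "attributes": [
            {"id": 29, "name": "clothesSize", "value": "L", "product_attribute_id": 36741}]},
    ],
}

for hit in split_hits_by_attribute(product, 29):
    print(hit["id"], hit["variants"][0]["id"], hit["variants"][0]["attributes"][0]["value"])
```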

Elasticsearch: Why can't I use "5m" for precision in context queries?

I'm running on Elasticsearch 5.5
I have a document with the following mapping
"mappings": {
"shops": {
"properties": {
"locations": {
"type": "geo_point"
},
"name": {
"type": "keyword"
},
"suggest": {
"type": "completion",
"contexts": [
{
"name": "location",
"type": "GEO",
"precision": "10m",
"path": "locations"
}
]
}
}
}
I'll add a document as follows:
PUT my_index/shops
{
"name":"random shop",
"suggest":{
"input":"random shop"
},
"locations":[
{
"lat":42.38471212,
"lon":-71.12612357
}
]
}
I try to query for the document with the following JSON call:
GET my_shops/_search
{
"suggest": {
"result": {
"prefix": "random",
"completion": {
"field": "suggest",
"size": 5,
"fuzzy": true,
"contexts": {
"location": [{
"lat": 42.38471212,
"lon": -71.12612357,
"precision": "10mi"
}]
}
}
}
}
}
I get the following errors (shown in the original post as an image hosted on discourse.org, which is not reproduced here).
But when I change the "precision" field to an int, I get the intended search results.
I'm confused on two fronts:
1. Why is there a context error? The documentation seems to say that this is OK:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/suggester-context.html
2. Why can't I use string values for the precision values? At the bottom of that page, I see that the precision values can take either distances or numeric values.

Elasticsearch: average count of matching nested documents

I have documents with nested items. The mapping is something like the following:
"document": {
"properties": {
"fieldA": { "type": "integer" },
"items": { "type": "nested",
"properties": {
"is_x": {"type": "boolean"},
"name": {"type": "string"}
}
}
}
}
And a sample document:
document:
fieldA: 123,
...
items:
[
{ "name": "item1", "is_x":true},
{ "name": "item2", "is_x":false},
...
{ "name": "itemn", "is_x":true}
]
I want to get the average count of items per document that have "is_x"=false
One option is to save this value during the indexing, but I would love to know how this can be done during the search itself (search performance is not an issue in this case).
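To pin down the metric being asked for, here is a plain Python illustration that computes it over documents already loaded in memory. It is not a search-time Elasticsearch solution: average_matching_items is a hypothetical helper, and the sample documents are made up for the example.

```python
def average_matching_items(documents: list[dict], wanted: bool = False) -> float:
    """Average, over all documents, of the number of items whose is_x equals `wanted`."""
    if not documents:
        return 0.0
    counts = [
        sum(1 for item in doc.get("items", []) if item.get("is_x") is wanted)
        for doc in documents
    ]
    return sum(counts) / len(counts)


sample = [
    {"fieldA": 123, "items": [{"name": "item1", "is_x": True}, {"name": "item2", "is_x": False}]},
    {"fieldA": 456, "items": [{"name": "item1", "is_x": False}, {"name": "item2", "is_x": False},
                              {"name": "item3", "is_x": True}]},
]
print(average_matching_items(sample))  # (1 + 2) / 2 = 1.5
```

Documents with no matching items still count in the denominator here, which is one reading of "per document". Dividing only by documents that have at least one match would be a different metric.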
