Elasticsearch merge new document with the existing document - elasticsearch

I want to merge a new document with the existing document in Elasticsearch instead of overwriting it. I have the record below in ES:
{
"id": "1",
"student_name": "Rahul",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Started"
}
]
}
I have received another JSON document to process. I need to update the existing document if the id is the same, or otherwise just insert it. If I receive the JSON below,
{
"id": "1",
"address": "Bangalore",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Finished"
},
{
"book_id": "12",
"book_name": "History",
"status": "Started"
}
]
}
I want to have my final document like below:
{
"id": "1",
"student_name": "Rahul",
"address": "Bangalore",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Finished"
},
{
"book_id": "12",
"book_name": "History",
"status": "Started"
}
]
}
So basically I want to merge the new JSON with the existing document, if there is one. That is, for any given key, whether top-level or nested, if it is already in the DB but was not received this time, I have to retain it as it is. If I get a new key I have to add it, and if a key's value has changed I have to update it.
Also, for the array of JSON objects inside the doc: if I get an object with the same id, I have to replace it, but if I get an object with a new id, I need to append it to the array.
I want to understand whether this is possible via ES queries and, if so, how to achieve it. Merging at the application level and then overwriting is one way I can think of, but I want to know if there is a better way.

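You can achieve this with an upsert, i.e. an update request with doc_as_upsert set to true (on Elasticsearch 7 and later the typeless form of the endpoint is POST my-index/_update/1).
The first piece will be indexed as a new document because it doesn't exist yet: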
POST my-index/_doc/1/_update
{
"doc": {
"id": "1",
"student_name": "Rahul",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Started"
}
]
},
"doc_as_upsert": true
}
And the second piece will be merged with the first one because it already exists:
POST my-index/_doc/1/_update
{
"doc": {
"id": "1",
"address": "Bangalore",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Finished"
},
{
"book_id": "12",
"book_name": "History",
"status": "Started"
}
]
},
"doc_as_upsert": true
}
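The document you get after the two commands will be the one you expect. Note that a partial update merges object fields recursively but replaces arrays as a whole, so the books array comes out right here only because the second request carries the complete list: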
GET my-index/_doc/1
=>
{
"id": "1",
"student_name": "Rahul",
"address": "Bangalore",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Finished"
},
{
"book_id": "12",
"book_name": "History",
"status": "Started"
}
]
}
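If the incoming JSON may carry only some of the books, the array has to be merged by book_id before the document is written back. The following is a minimal application-level sketch of that merge, not part of the answer above: the function names merge_documents and merge_lists are hypothetical, and it assumes array elements are matched on the book_id key and that matched elements are merged key by key.

```python
from copy import deepcopy


def merge_lists(old_items, new_items, id_key):
    """Merge two lists of dicts: same id is merged in place, new ids are appended."""
    result = deepcopy(old_items)
    # Remember where each identified element sits in the result list.
    positions = {
        item[id_key]: index
        for index, item in enumerate(result)
        if isinstance(item, dict) and id_key in item
    }
    for item in new_items:
        if isinstance(item, dict) and item.get(id_key) in positions:
            index = positions[item[id_key]]
            result[index] = merge_documents(result[index], item, id_key)
        else:
            result.append(deepcopy(item))
            if isinstance(item, dict) and id_key in item:
                positions[item[id_key]] = len(result) - 1
    return result


def merge_documents(existing, incoming, id_key="book_id"):
    """Return a copy of `existing` with `incoming` merged into it.

    Keys missing from `incoming` are retained, new keys are added,
    nested dicts are merged recursively and lists are merged by `id_key`.
    """
    merged = deepcopy(existing)
    for key, new_value in incoming.items():
        old_value = merged.get(key)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            merged[key] = merge_documents(old_value, new_value, id_key)
        elif isinstance(old_value, list) and isinstance(new_value, list):
            merged[key] = merge_lists(old_value, new_value, id_key)
        else:
            merged[key] = deepcopy(new_value)
    return merged


existing = {
    "id": "1",
    "student_name": "Rahul",
    "books": [{"book_id": "11", "book_name": "History", "status": "Started"}],
}
incoming = {
    "id": "1",
    "address": "Bangalore",
    "books": [
        {"book_id": "11", "status": "Finished"},
        {"book_id": "12", "book_name": "History", "status": "Started"},
    ],
}
merged = merge_documents(existing, incoming)
print(merged)
```

The merged dictionary can then be indexed as a full document. The sketch does not handle concurrent writers; that would need version checks on the Elasticsearch side.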

Related

Elastic Search Wildcard query with space failing 7.11

I have my data indexed in Elasticsearch version 7.11. This is the mapping I got when I added documents directly to my index.
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}
I haven't added the keyword part myself and have no idea where it came from.
I am running a wildcard query on this field, but I am unable to get results for search terms that contain spaces.
{
"query": {
"bool":{
"should":[
{"wildcard": {"name":"*hello world*"}}
]
}
}
}
I have seen many answers related to not_analyzed, and I have tried setting {"index":"true"} in the mapping, but it did not help. How can I make the wildcard search work in this version of Elasticsearch?
I tried changing the field to the wildcard type:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type" :"wildcard"
}
}
}
And got the following response:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
},
"status": 400
}
Here is a sample document that should match:
{
"_index": "accelerators",
"_type": "_doc",
"_id": "602ec047a70f7f30bcf75dec",
"_score": 1.0,
"_source": {
"acc_id": "602ec047a70f7f30bcf75dec",
"name": "hello world example",
"type": "Accelerator",
"description": "khdkhfk ldsjl klsdkl",
"teamMembers": [
{
"userId": "karthik.r#gmail.com",
"name": "Karthik Ganesh R",
"shortName": "KR",
"isOwner": true
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS",
"isOwner": false
}
],
"sectorObj": [
{
"item_id": 14,
"item_text": "Cross-sector"
}
],
"geographyObj": [
{
"item_id": 4,
"item_text": "Global"
}
],
"technologyObj": [
{
"item_id": 1,
"item_text": "Artificial Intelligence"
}
],
"themeColor": 1,
"mainImage": "assets/images/Graphics/Asset 35.svg",
"features": [
{
"name": "Ideation",
"icon": "Asset 1007.svg"
},
{
"name": "Innovation",
"icon": "Asset 1044.svg"
},
{
"name": "Strategy",
"icon": "Asset 1129.svg"
},
{
"name": "Intuitive",
"icon": "Asset 964.svg"
}
],
"logo": {
"actualFileName": "",
"fileExtension": "",
"fileName": "",
"fileSize": 0,
"fileUrl": ""
},
"customLogo": {
"logoColor": "#B9241C",
"logoText": "EC",
"logoTextColor": "#F6F6FA"
},
"collaborators": [
{
"userId": "muhammed.arif#gmail.com",
"name": "muhammed Arif P T",
"shortName": "MA"
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS"
}
],
"created_date": "2021-02-18T19:30:15.238000Z",
"modified_date": "2021-03-11T11:45:49.583000Z"
}
}
You cannot modify a field mapping once created. However, you can create another sub-field of type wildcard, like this:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type": "text",
"fields": {
"wildcard": {
"type" :"wildcard"
},
"keyword": {
"type" :"keyword",
"ignore_above":256
}
}
}
}
}
Once the mapping is updated, you need to reindex your existing data in place so that the new sub-field gets populated, like this:
POST http://localhost:9001/indexname/_update_by_query
And then when this finishes, you'll be able to query on this new field like this:
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.wildcard": "*hello world*"
}
}
]
}
}
}
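The reason the original query finds nothing is that a text field is split into separate tokens at index time, and a wildcard query is compared against each token rather than against the whole string. The sketch below imitates that with plain Python; the analyze function is a crude stand-in (lowercase, split on whitespace) assumed for illustration, not the real analyzer, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def analyze(text):
    # Crude stand-in for an analyzer: lowercase the text and split on whitespace.
    return text.lower().split()


def wildcard_matches_text_field(value, pattern):
    # On an analyzed field the pattern is tested against each token separately.
    return any(fnmatchcase(token, pattern) for token in analyze(value))


def wildcard_matches_whole_value(value, pattern):
    # On a wildcard (or keyword) sub-field the pattern sees the whole original string.
    return fnmatchcase(value, pattern)


name = "hello world example"
pattern = "*hello world*"

print(wildcard_matches_text_field(name, pattern))   # False: no single token contains a space
print(wildcard_matches_whole_value(name, pattern))  # True: the full value contains "hello world"
```

This also suggests why a wildcard query on the existing name.keyword sub-field is another thing worth trying, since that sub-field stores the value unsplit as well.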

Rest Query on the Patient Resource for Finding BOTH/ALL Given Name(s)

How do I search for a person who has BOTH of the given names I provide?
I have the following 2 patients who are "close". Everything (in the HumanName area) is the same except that one of the given names differs.
Note "Apple" vs "Banana".
{
"resourceType": "Bundle",
"id": "269caf66-0ccc-43e7-b9a5-f16f84db0149",
"meta": {
"lastUpdated": "2019-11-20T19:30:26.858917+00:00"
},
"type": "searchset",
"link": [
{
"relation": "self",
"url": "https://localhost:44348/Patient?given=Jingerheimer"
}
],
"entry": [
{
"fullUrl": "https://localhost:44348/Patient/504f6bd3-e9b4-4846-8948-97bf09c70722",
"resource": {
"resourceType": "Patient",
"id": "504f6bd3-e9b4-4846-8948-97bf09c70722",
"meta": {
"versionId": "1",
"lastUpdated": "2019-11-20T19:26:11.005+00:00"
},
"identifier": [
{
"system": "ssn",
"value": "111-11-1111"
},
{
"system": "uuid",
"value": "da55d068e0784b359fa97498a11543c5"
}
],
"name": [
{
"family": "Smith",
"given": [
"John",
"Apple",
"Jingerheimer"
]
}
]
},
"search": {
"mode": "match"
}
},
{
"fullUrl": "https://localhost:44348/Patient/10054ce9-6141-4eca-bc5b-0978f8c8afcb",
"resource": {
"resourceType": "Patient",
"id": "10054ce9-6141-4eca-bc5b-0978f8c8afcb",
"meta": {
"versionId": "1",
"lastUpdated": "2019-11-20T19:26:48.962+00:00"
},
"identifier": [
{
"system": "ssn",
"value": "222-22-2222"
},
{
"system": "uuid",
"value": "52d09f9436d44591816fd229dd139523"
}
],
"name": [
{
"family": "Smith",
"given": [
"John",
"Banana",
"Jingerheimer"
]
}
]
},
"search": {
"mode": "match"
}
}
]
}
One has given names that include "Apple". The other has given names that include "Banana".
This search works fine:
https://localhost:44348/Patient/?given=Jingerheimer
What I have tried is:
https://localhost:44348/Patient/?given=Jingerheimer&given=Apple
but that gives me no results.
Note that omitting "given=Jingerheimer" is not an option, since it filters out a bunch of others.
What I'm trying to express is:
"Has BOTH of the given names I provide"
Your syntax is correct, so I think the server does not handle the search correctly. Can you check the self link for your second search to see if it reflects the search you performed? Does the result Bundle have an OperationOutcome detailing something went wrong? If all that seems okay, you'll need to check your server's code.
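As a way to double-check the request being sent, here is a small sketch that builds the search URL. It relies on the FHIR search convention that a repeated parameter means AND, while comma-separated values in one parameter mean OR; the helper names and_search and or_search are hypothetical, and the base URL is the one from the question.

```python
from urllib.parse import urlencode

BASE_URL = "https://localhost:44348/Patient"


def and_search(names):
    # Repeating the parameter asks for patients matching ALL of the given names.
    return BASE_URL + "?" + urlencode([("given", name) for name in names])


def or_search(names):
    # A comma-separated value asks for patients matching ANY of the given names.
    return BASE_URL + "?" + urlencode({"given": ",".join(names)}, safe=",")


print(and_search(["Jingerheimer", "Apple"]))
print(or_search(["Jingerheimer", "Apple"]))
```

Comparing the first URL with the self link in the returned Bundle shows whether the server dropped or rewrote one of the parameters.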

How to turn an array of object to array of string while reindexing in elasticsearch?

Let's say the source index has a document like this:
{
"name":"John Doe",
"sport":[
{
"name":"surf",
"since":"2 years"
},
{
"name":"mountainbike",
"since":"4 years"
}
]
}
How can I discard the "since" information so that, once reindexed, the document contains only the sport names? Like this:
{
"name":"John Doe",
"sport":["surf","mountainbike"]
}
Note that it would be fine if the resulting field keeps the same name, but it's not mandatory.
I don't know which version of Elasticsearch you're using, but here is a solution based on pipelines, introduced with ingest nodes in ES v5.0.
1) A script processor is used to extract the name from each sub-object and add it to another field (here, sports)
2) The previous sport field is removed with a remove processor
You can use the Simulate pipeline API to test it:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "random description",
"processors": [
{
"script": {
"lang": "painless",
"source": "ctx.sports =[]; for (def item : ctx.sport) { ctx.sports.add(item.name) }"
}
},
{
"remove": {
"field": "sport"
}
}
]
},
"docs": [
{
"_index": "index",
"_type": "doc",
"_id": "id",
"_source": {
"name": "John Doe",
"sport": [
{
"name": "surf",
"since": "2 years"
},
{
"name": "mountainbike",
"since": "4 years"
}
]
}
}
]
}
which outputs the following result:
{
"docs": [
{
"doc": {
"_index": "index",
"_type": "doc",
"_id": "id",
"_source": {
"name": "John Doe",
"sports": [
"surf",
"mountainbike"
]
},
"_ingest": {
"timestamp": "2018-07-12T14:07:25.495Z"
}
}
}
]
}
There may be a better solution, as I've not used pipelines a lot. Alternatively, you could do this with Logstash filters before submitting the documents to your Elasticsearch cluster.
For more information about pipelines, take a look at the reference documentation on ingest nodes.
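Once the simulation looks right, the same processors can be stored as a named pipeline and referenced from a reindex request through dest.pipeline. The sketch below only builds and prints the two request bodies; the pipeline id sport_names and the index names source-index and dest-index are placeholders assumed for illustration.

```python
import json

PIPELINE_ID = "sport_names"  # hypothetical pipeline name

# Body for: PUT _ingest/pipeline/sport_names
pipeline_body = {
    "description": "Keep only the sport names",
    "processors": [
        {
            "script": {
                "lang": "painless",
                "source": "ctx.sports = []; for (def item : ctx.sport) { ctx.sports.add(item.name) }",
            }
        },
        {"remove": {"field": "sport"}},
    ],
}

# Body for: POST _reindex
reindex_body = {
    "source": {"index": "source-index"},  # placeholder index name
    "dest": {"index": "dest-index", "pipeline": PIPELINE_ID},  # placeholder index name
}

print("PUT _ingest/pipeline/" + PIPELINE_ID)
print(json.dumps(pipeline_body, indent=2))
print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```

Every document copied by the reindex request then passes through the pipeline before it is written to the destination index.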

Object Array search support in Elasticsearch

I have an array of objects in Elasticsearch.
I would like to check whether a particular field value appears in the top 2 positions of the array, without using a script.
Imagine my ES data is as follows:
[
{
"_id": "TestID1",
"data": [
{
"name": "Test1",
"priority": 2
},
{
"name": "Test2",
"priority": 3
},
{
"name": "Test3",
"priority": 4
}
]
},
{
"_id": "TestID2",
"data": [
{
"name": "Test3",
"priority": 2
},
{
"name": "Test9",
"priority": 3
},
{
"name": "Test5",
"priority": 4
},
{
"name": "Test10",
"priority": 5
}
]
},
{
"_id": "TestID3",
"data": [
{
"name": "Test1",
"priority": 2
},
{
"name": "Test2",
"priority": 3
},
{
"name": "Test3",
"priority": 6
}
]
}
]
Here I would like to write a query which searches for _Test3_ ONLY within the top 2 elements of the data array.
Searching here should return only the data of
_id: TestID2
because only TestID2 has Test3 in the top 2 elements of the data array.
You will not be able to run such a query directly without using a script. The only solution I can think of is to create a copy of the array field containing only the first 2 elements. You will then be able to search on this field.
You can add an ingest pipeline to trim your array automatically. Note that the script below assumes data always has at least two elements.
PUT /_ingest/pipeline/top2_elements
{
"description": "Create a top2 field containing only the first two values of an array",
"processors": [
{
"script": {
"source": "ctx.top2 = [ctx.data[0], ctx.data[1]]"
}
}
]
}
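The same idea can be checked at the application level before indexing. The sketch below is an illustration under assumptions, not part of the pipeline: add_top2 and has_name_in_top2 are hypothetical helpers, and slicing is used so that arrays with fewer than two elements do not raise an error.

```python
def add_top2(doc, source_field="data", target_field="top2", n=2):
    # Copy the document and store the first n array elements in a separate field.
    enriched = dict(doc)
    enriched[target_field] = list(doc.get(source_field, [])[:n])
    return enriched


def has_name_in_top2(doc, name):
    # Stand-in for a query on top2.name: true if any trimmed element has that name.
    return any(item.get("name") == name for item in doc["top2"])


docs = [
    {"_id": "TestID1", "data": [
        {"name": "Test1", "priority": 2},
        {"name": "Test2", "priority": 3},
        {"name": "Test3", "priority": 4},
    ]},
    {"_id": "TestID2", "data": [
        {"name": "Test3", "priority": 2},
        {"name": "Test9", "priority": 3},
        {"name": "Test5", "priority": 4},
        {"name": "Test10", "priority": 5},
    ]},
    {"_id": "TestID3", "data": [
        {"name": "Test1", "priority": 2},
        {"name": "Test2", "priority": 3},
        {"name": "Test3", "priority": 6},
    ]},
]

matches = [d["_id"] for d in map(add_top2, docs) if has_name_in_top2(d, "Test3")]
print(matches)  # ['TestID2']
```

With the top2 field indexed, an ordinary query on top2.name plays the role of has_name_in_top2.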

How to search exact text in nested document in elasticsearch

I have an index with documents like this:
"_index": "test",
"_type": "products",
"_id": "URpYIFBAQRiPPu1BFOZiQg",
"_score": null,
"_source": {
"currency": null,
"colors": [],
"api": 1,
"sku": 9999227900050002,
"category_path": [
{
"id": "cat00000",
"name": "B1"
},
{
"id": "abcat0400000",
"name": "Cameras & Camcorders"
},
{
"id": "abcat0401000",
"name": "Digital Cameras"
},
{
"id": "abcat0401005",
"name": "Digital SLR Cameras"
},
{
"id": "pcmcat180400050006",
"name": "DSLR Package Deals"
}
],
"price": 1034.99,
"status": 1,
"description": null,
}
I want to search for only the exact text ["Camcorders"] in the category_path field.
I tried a match query, but it returns all the products that have "Camcorders" as part of the text. Can someone help me solve this?
Thanks
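To search in a nested field, use a query like the following. The value is lowercased because a term query is not analyzed, while the tokens indexed for an analyzed field are lowercased by the default analyzer: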
{
"query": {
"term": {
"category_path.name": {
"value": "b1"
}
}
}
}
Hope it helps!
You could also add one more field, raw_name, that is not analyzed (index set to not_analyzed on older versions, or type keyword on 5.x and later) and match against it.
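A rough sketch of that second suggestion, written as request bodies built in Python: it assumes a version that has the keyword type, and it uses a hypothetical raw sub-field under category_path.name rather than a separate raw_name field. The helper exact_name_query is also hypothetical.

```python
import json

# Mapping fragment: keep the analyzed name and add an unanalyzed "raw" sub-field.
mapping_body = {
    "properties": {
        "category_path": {
            "properties": {
                "id": {"type": "keyword"},
                "name": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword"}},
                },
            }
        }
    }
}


def exact_name_query(value):
    # A term query on the keyword sub-field matches only the whole, unmodified value.
    return {"query": {"term": {"category_path.name.raw": {"value": value}}}}


print(json.dumps(mapping_body, indent=2))
print(json.dumps(exact_name_query("Camcorders"), indent=2))
```

With this mapping, the query for "Camcorders" would match only a category whose name is exactly "Camcorders", and not "Cameras & Camcorders". Existing documents have to be reindexed before the sub-field is searchable.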
