Elasticsearch: Adding an element to an array - elasticsearch

I am trying to batch update documents in an Elasticsearch index and want to know how I can achieve the following scenario.
I have to create the document if no document with that primaryKey exists.
I have to add the data to the array in the document if the primary key exists.
For example -
For initial write / if primary key not present.
Document written =
{
PrimaryKey,
DataList: [
{
DataField1: fieldValue1,
DataField2: fieldValue2,
}
]
}
If the document was already present, the entry would have been appended to the list:
{
PrimaryKey,
DataList: [
{
DataField1: fieldValue1,
DataField2: fieldValue2,
},
{
DataField1: fieldValue3,
DataField2: fieldValue4
}
....
]
}
In a batch update, both kinds of primaryKey may be present: some that already have a document in the index, and some that were never added to the index.

I think this example can serve as a basis for your bulk request.
I treated the _id and PrimaryKey as the same thing, because the way to know whether the document exists is through the _id; if it doesn't exist, a new document is created.
I used a script to add items to the list if the document already exists.
Read more about the Update API upsert parameter.
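For reference, the same pattern for a single document with the Update API would look roughly like this (a minimal sketch, assuming the same index and field names as in the example below):
POST my-index-000001/_update/1
{
  "script": {
    "source": "if (ctx._source.DataList == null) { ctx._source.DataList = []; } ctx._source.DataList.addAll(params.DataList);",
    "lang": "painless",
    "params": {
      "DataList": [ { "DataField1": "fieldValue3", "DataField2": "fieldValue4" } ]
    }
  },
  "upsert": {
    "PrimaryKey": "1",
    "DataList": [ { "DataField1": "fieldValue3", "DataField2": "fieldValue4" } ]
  }
}
If document 1 exists, the script appends to DataList; otherwise the upsert document is indexed as-is.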
Mapping
PUT my-index-000001
{
"mappings": {
"properties": {
"PrimaryKey": {
"type": "keyword"
},
"DataField1": {
"type": "nested"
}
}
}
}
POST my-index-000001/_doc/1
{
"PrimaryKeyame": 1,
"DataList": [
{
"DataField1": "fieldValue1",
"DataField2": "fieldValue2"
}
]
}
The bulk request will add items to document 1 and create document 2 (which does not yet exist in the index).
POST _bulk
{ "update" : { "_id" : "1", "_index" : "my-index-000001", "retry_on_conflict" : 3} }
{ "script" : { "source": "if (ctx._source.PrimaryKeyame != null) { ctx._source.DataList.addAll(params.DataList); }", "lang" : "painless", "params": { "PrimaryKeyame": "1", "DataList": [{"DataField1": "fieldValue3","DataField2": "fieldValue4"}]}}, "upsert" : {"PrimaryKeyame": "1", "DataList": [{"DataField1": "fieldValue3","DataField2": "fieldValue4"}]}}
{ "update" : { "_id" : "2", "_index" : "my-index-000001", "retry_on_conflict" : 3} }
{ "script" : { "source": "if (ctx._source.PrimaryKeyame != null) { ctx._source.DataList.addAll(params.DataList); }", "lang" : "painless", "params": { "PrimaryKeyame": "2", "DataList": [{"DataField1": "fieldValue3","DataField2": "fieldValue4"}]}}, "upsert" : {"PrimaryKeyame": "2", "DataList": [{"DataField1": "fieldValue3","DataField2": "fieldValue4"}]}}
Get Documents:
"hits": [
{
"_index": "my-index-000001",
"_id": "1",
"_score": 1,
"_source": {
"PrimaryKeyame": 1,
"DataList": [
{
"DataField1": "fieldValue1",
"DataField2": "fieldValue2"
},
{
"DataField2": "fieldValue4",
"DataField1": "fieldValue3"
}
]
}
},
{
"_index": "my-index-000001",
"_id": "2",
"_score": 1,
"_source": {
"PrimaryKeyame": "2",
"DataList": [
{
"DataField1": "fieldValue3",
"DataField2": "fieldValue4"
}
]
}
}
]
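If you would rather let the script handle the insert case too (instead of supplying a full upsert document), Elasticsearch also supports a scripted_upsert flag. A minimal sketch, assuming the same index and a hypothetical document 3 with made-up field values:
POST _bulk
{ "update" : { "_id" : "3", "_index" : "my-index-000001", "retry_on_conflict" : 3} }
{ "scripted_upsert": true, "script": { "source": "if (ctx._source.DataList == null) { ctx._source.DataList = []; } ctx._source.DataList.addAll(params.DataList);", "lang": "painless", "params": { "DataList": [{"DataField1": "fieldValue5","DataField2": "fieldValue6"}]}}, "upsert": {} }
With scripted_upsert set to true, the script also runs when the document does not exist yet, starting from the (here empty) upsert document.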

Related

How do I get just enough data in a list in Elasticsearch

Say I have a doc in an Elasticsearch index like below:
{
"data": [
{
"color": "RED",
"qty": 3
},
{
"color": "BLACK",
"qty": 1
}, {
"color": "BLUE",
"qty": 0
}
]
}
I just need the color BLACK.
Is there any way to get back just the matching data, like below?
{
"data": [
{
"color": "BLACK",
"qty": 1
}
]
}
You can use a script field to generate a new field that contains only the matching values from the array. Below is a sample query:
{
"_source": {
"excludes": "data"
},
"query": {
"match_all": {}
},
"script_fields": {
"address": {
"script": {
"lang": "painless",
"source": """
List li = new ArrayList();
if(params['_source']['data'] != null)
{
for(p in params['_source']['data'])
{
if( p.color == 'BLACK')
li.add(p);
}
}
return li;
"""
}
}
}
}
Response:
"hits" : [
{
"_index" : "sample1",
"_type" : "_doc",
"_id" : "tUc6338BMCbs63yKTqj_",
"_score" : 1.0,
"_source" : { },
"fields" : {
"address" : [
{
"color" : "BLACK",
"qty" : 1
}
]
}
}
]
Elasticsearch returns the whole document if there is a match on any field. If you want to find the matched nested documents, you can make the data array nested in your index mapping, write a nested query that filters data.color by its value (BLACK in your case), and use inner_hits to retrieve the matched nested documents.
You can use source filtering to retrieve only the fields you want, but it filters by field, not by field value.
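For completeness, a rough sketch of the nested-mapping plus inner_hits approach described above (the index name here is hypothetical; the existing index would have to be re-created with this mapping):
PUT sample1_nested
{
  "mappings": {
    "properties": {
      "data": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "qty": { "type": "integer" }
        }
      }
    }
  }
}
GET sample1_nested/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "data",
      "query": { "term": { "data.color": "BLACK" } },
      "inner_hits": {}
    }
  }
}
Each hit then carries an inner_hits section containing only the nested objects whose color is BLACK.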

search first element of a multivalue text field in elasticsearch

I want to search the first element of an array in Elasticsearch documents, but I can't figure out how to do it.
As a test, I created a new index with fielddata=true, but I still didn't get the response I wanted.
Document
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
Values
name : ["John", "Doe"]
My request
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source": "doc['name'][0]=params.param1",
"params" : {
"param1" : "john"
}
}
}
}
}
}
}
Incoming Response
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
You can use the following script in a search request to return a scripted field:
{
"script_fields": {
"firstElement": {
"script": {
"lang": "painless",
"inline": "params._source.name[0]"
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64391432",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"firstElement": [
"John" <-- note this
]
}
}
]
You can use a Painless script to create a script field that returns a customized value for each document in the results of a query.
You need to use the equality operator '==' to compare two values in the script query; it evaluates to true if the two values are equal and false otherwise.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings":{
"properties":{
"name":{
"type":"text",
"fielddata":true
}
}
}
}
Index data:
{
"name": [
"John",
"Doe"
]
}
Search Query:
{
"script_fields": {
"my_field": {
"script": {
"lang": "painless",
"source": "params['_source']['name'][0] == params.params1",
"params": {
"params1": "John"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"my_field": [
true <-- note this
]
}
}
]
Arrays of objects do not work as you would expect: you cannot query
each object independently of the other objects in the array. If you
need to be able to do this then you should use the nested data type
instead of the object data type.
You can use the script shown in my other answer if you just want to compare the value of the first element of the array to some other value. But based on your comments, it looks like your use case is quite different.
If you want to search the first element of the array, you need to convert your data into nested form. With arrays of objects you can't refer to "the first element" or "the last element" at search time.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"name": {
"type": "nested"
}
}
}
}
Index Data:
{
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
{
"booking_id": 1,
"name": [
{
"first": "Adam Simith",
"second": "John Doe"
}
]
}
{
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
Search Query:
{
"query": {
"nested": {
"path": "name",
"query": {
"bool": {
"must": [
{
"match_phrase": {
"name.first": "John Doe"
}
}
]
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9400072,
"_source": {
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0.9400072,
"_source": {
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
}
]

I want to get all entities from nested JSON data where "ai_id" has the value 0

I have the JSON data below and I want to write a query in Elasticsearch. The query is:
(give me all entities where "ai_id" has the value 0).
The JSON data is:
{
"_index": "try1",
"_type": "_doc",
"_id": "2",
"_score": 1,
"_source": {
"target": {
"br_id": 0,
"an_id": 0,
"ai_id": 0,
"explanation": [
"element 1",
"element 2"
]
},
"process": {
"an_id": 1311,
"pa_name": "micha"
},
"text": "hello world"
}
},
{
"_index": "try1",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"target": {
"br_id": 0,
"an_id": 1,
"ai_id": 1,
"explanation": [
"element 3",
"element 4"
]
},
"process": {
"an_id": 1311,
"pa_name": "luca"
},
"text": "the all People are good"
}
}
]
}
}
I tried this but it doesn't seem to work. Any help would be appreciated.
GET try1/_search
{
"query":{
{ "match_all": { "ai_id": 0}}
}
}
and this did not work either:
GET try1/_search
{
"query": {
"nested" : {
"query" : {
"must" : [
{ "match" : {"ai_id" : 0} }
]
}
}
}
}
Any suggestions are welcome. Thanks.
You need to use a nested query on your target object, like this:
GET /try1/_search
{
"query": {
"nested" : {
"path" : "target",
"query" : {
"bool" : {
"must" : [
{ "match" : {"target.ai_id" : 0} }
]
}
}
}
}
}
Ref. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
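Note that the nested query is only needed if target is actually mapped with the nested type. With the default object mapping, a plain query on the dotted field path is enough:
GET /try1/_search
{
  "query": {
    "match": { "target.ai_id": 0 }
  }
}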

Elasticsearch parent - child mapping: Search in both and highlight

I have the following Elasticsearch 1.6.2 index mappings: parent item and child document. One item can have several documents. Documents are not nested because they contain base64 data (mapper-attachments plugin) and cannot be updated together with an item.
"mappings" : {
"document" : {
"_parent" : {
"type" : "item"
},
"_routing" : {
"required" : true
},
"properties" : {
"extension" : {
"type" : "string",
"term_vector" : "with_positions_offsets",
"include_in_all" : true
}, ...
},
},
"item" : {
"properties" : {
"prop1" : {
"type" : "string",
"include_in_all" : true
}, ...
}
}
I'd like to search in both types but always return items. If there is a match in a document, return the corresponding item. If there is a match in an item, return the item. If both match, return the item.
Is it possible to combine has_child and has_parent searches?
This search only searches in documents and returns items:
{
"query": {
"has_child": {
"type": "document",
"query": {
"query_string":{"query":"her*}
},
"inner_hits" : {
"highlight" : {
"fields" : {
"*" : {}
}
}
}
}
}
}
EXAMPLE
GET index/item/174
{
"_type" : "item",
"_id" : "174",
"_source":{"prop1":"Perjeta construction"}
}
GET index/document/116
{
"_type" : "document",
"_id" : "116",
"_source":{"extension":"pdf","item": {"id":174},"fileName":"construction plan"}
}
__POSSIBLE SEARCH RESULT searching for "constr*"__
{
"hits": {
"total": 1,
"hits": [
{
"_type": "item",
"_id": "174",
"_source": {
"prop1": "Perjeta construction"
},
"highlight": {
"prop1": [
"Perjeta <em>construction<\/em>"
]
},
"inner_hits": {
"document": {
"hits": {
"hits": [
{
"_type": "document",
"_id": "116",
"_source": {
"extension": "pdf",
"item": {
"id": 174
},
"fileName": "construction plan"
},
"highlight": {
"fileName": [
"<em>construction<\/em> plan"
]
}
}
]
}
}
}
}
]
}
}
I can answer my own question "Is it possible to combine has_child and has_parent?" with no.
You should only use one of them at a time on an index.
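What you can do on a single index, though, is put the direct item query and the has_child query into a bool should, so an item is returned if either it or one of its documents matches. A sketch along the lines of the example above (highlighting of the child fields still comes from inner_hits):
GET index/item/_search
{
  "query": {
    "bool": {
      "should": [
        { "query_string": { "query": "constr*" } },
        {
          "has_child": {
            "type": "document",
            "query": { "query_string": { "query": "constr*" } },
            "inner_hits": { "highlight": { "fields": { "*": {} } } }
          }
        }
      ]
    }
  },
  "highlight": { "fields": { "*": {} } }
}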

Grouping consecutive documents with Elasticsearch

Is there a way to make Elasticsearch consider sequence-gaps when grouping?
Provided that the following data was bulk-imported to Elasticsearch:
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "1" } }
{ "sequence": 1, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "2" } }
{ "sequence": 2, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "3" } }
{ "sequence": 3, "type": "B" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "4" } }
{ "sequence": 4, "type": "A" }
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "5" } }
{ "sequence": 5, "type": "A" }
Is there a way to query this data in a way that
the documents with sequence number 1 and 2 go to one output group,
the document with sequence number 3 goes to another one, and
the documents with sequence number 4 and 5 go to a third group?
... considering the fact that the type A sequence is interrupted by a type B item (or any other item that's not type A)?
I would like result buckets to look something like this (name and value for sequence_group may be different - just trying to illustrated the logic):
"buckets": [
{
"key": "a",
"sequence_group": 1,
"doc_count": 2
},
{
"key": "b",
"sequence_group": 3,
"doc_count": 1
},
{
"key": "a",
"sequence_group": 4,
"doc_count": 2
}
]
There is a good description of the problem and some SQL solution approaches at https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/. I would like to know whether a solution is available for Elasticsearch as well.
We can use a Scripted Metric Aggregation here, which works in a map-reduce fashion. It has different parts: init, map, combine, and reduce. Usefully, the result of each of these can be a list or a map.
I played around a bit on this.
Elasticsearch version used: 7.1
Creating index:
PUT test
{
"mappings": {
"properties": {
"sequence": {
"type": "long"
},
"type": {
"type": "text",
"fielddata": true
}
}
}
}
Bulk indexing: (Note that I removed mapping type 'groupingTest')
POST _bulk
{ "index": { "_index": "test", "_id": "1" } }
{ "sequence": 1, "type": "A" }
{ "index": { "_index": "test", "_id": "2" } }
{ "sequence": 2, "type": "A" }
{ "index": { "_index": "test", "_id": "3" } }
{ "sequence": 3, "type": "B" }
{ "index": { "_index": "test", "_id": "4" } }
{ "sequence": 4, "type": "A" }
{ "index": { "_index": "test", "_id": "5" } }
{ "sequence": 5, "type": "A" }
Query
GET test/_doc/_search
{
"size": 0,
"aggs": {
"scripted_agg": {
"scripted_metric": {
"init_script": """
state.seqTypeArr = [];
""",
"map_script": """
def seqType = doc.sequence.value + '_' + doc['type'].value;
state.seqTypeArr.add(seqType);
""",
"combine_script": """
def list = [];
for(seqType in state.seqTypeArr) {
list.add(seqType);
}
return list;
""",
"reduce_script": """
def fullList = [];
for(agg_value in states) {
for(x in agg_value) {
fullList.add(x);
}
}
fullList.sort((a,b) -> a.compareTo(b));
def result = [];
def item = new HashMap();
for(int i=0; i<fullList.size(); i++) {
def str = fullList.get(i);
def index = str.indexOf("_");
def ch = str.substring(index+1);
def val = str.substring(0, index);
if(item["key"] == null) {
item["key"] = ch;
item["sequence_group"] = val;
item["doc_count"] = 1;
} else if(item["key"] == ch) {
item["doc_count"] = item["doc_count"] + 1;
} else {
result.add(item);
item = new HashMap();
item["key"] = ch;
item["sequence_group"] = val;
item["doc_count"] = 1;
}
}
result.add(item);
return result;
"""
}
}
}
}
And, finally the output:
{
"took" : 21,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"scripted_agg" : {
"value" : [
{
"doc_count" : 2,
"sequence_group" : "1",
"key" : "a"
},
{
"doc_count" : 1,
"sequence_group" : "3",
"key" : "b"
},
{
"doc_count" : 2,
"sequence_group" : "4",
"key" : "a"
}
]
}
}
}
Please note that a scripted metric aggregation has a significant impact on query performance, so you might notice some slowness if there is a large number of documents.
You can always do a terms aggregation and then apply a top_hits aggregation to get this.
{
"aggs": {
"types": {
"terms": {
"field": "type"
},
"aggs": {
"groups": {
"top_hits": {
"size": 10
}
}
}
}
}
}
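If you go this route, you can sort the hits inside each bucket by sequence so they come back in order; note, however, that a plain terms aggregation puts all documents of a type into one bucket and does not split on gaps:
{
  "size": 0,
  "aggs": {
    "types": {
      "terms": { "field": "type" },
      "aggs": {
        "groups": {
          "top_hits": {
            "size": 10,
            "sort": [ { "sequence": { "order": "asc" } } ]
          }
        }
      }
    }
  }
}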
