Update large set of documents without knowing _id - elasticsearch

I would like to update a large set of documents in Elasticsearch at once.
One document looks like this:
{
"_index": "vue_storefront_magento_1_1587559712",
"_type": "product",
"_id": "1123",
"_version": 56,
"_score": 7.7135754,
"_source": {
"sku": "381735",
"score": "1"
},
"fields": {
"updated_at": [
1589880769000
]
},
"highlight": {
"type_id": [
"#kibana-highlighted-field#configurable#/kibana-highlighted-field#"
],
"sku": [
"#kibana-highlighted-field#381735#/kibana-highlighted-field#"
]
}
}
I have a JSON file that contains the data I want to update. It has no _id field, only an SKU. I want to use this JSON to build the update request to Elasticsearch.
[
{ "sku": 381735, "score": 2 },
{ "sku": 381736, "score": 3 },
{ "sku": 381737, "score": 4 }
]
I would like to update all of the score fields based on the SKU field in the _source.
Is this possible? I already looked at the update by query API but can't figure it out :-/
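One possible approach with the update by query API is to pass the SKU-to-score pairs as script parameters and let a Painless script look up each document's new score by its own sku. The sketch below only builds the request bodies in Python and sends nothing. The function name build_update_by_query and the batch size are hypothetical. It also assumes that sku and score are stored as strings (as in the "sku": "381735" and "score": "1" of the sample _source) and that the sku field can be matched with a terms query under the index mapping.

```python
import json


def build_update_by_query(records, batch_size=1000):
    """Yield one _update_by_query request body per batch of records.

    Assumption: sku and score are stored as strings in _source, as in the
    sample document, so the numeric values from the JSON file are converted.
    """
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        scores = {str(r["sku"]): str(r["score"]) for r in batch}
        yield {
            # Select only the documents whose sku appears in this batch.
            "query": {"terms": {"sku": list(scores)}},
            "script": {
                "lang": "painless",
                # Look up the new score using the document's own sku.
                "source": "ctx._source.score = params.scores[ctx._source.sku]",
                "params": {"scores": scores},
            },
        }


records = [
    {"sku": 381735, "score": 2},
    {"sku": 381736, "score": 3},
    {"sku": 381737, "score": 4},
]
bodies = list(build_update_by_query(records))
print(json.dumps(bodies[0], indent=2))
```

Each body would be sent as a POST to the index's _update_by_query endpoint. Batching keeps the terms list and the script parameters small when the JSON file is large.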

Related

How to perform filter on aggregation results in elastic search?

I have an Elasticsearch index that contains a field on which I want to perform a wildcard query. The field's value is duplicated across many documents, so I want to first use an aggregation to get the unique values for that field and then run the wildcard query on those values. Is there a way to run a query on aggregation results in Elasticsearch?
I believe you can find the results you need by collapsing your search results rather than using your strategy of first obtaining the aggregation results and then running a wildcard query.
Adding a working example with index data (with the default mapping), search query and search result.
Index Data:
{
"role": "example",
"number": 1
}
{
"role": "example",
"number": 2
}
{
"role": "example123",
"number": 1
}
Search Query:
{
"query": {
"wildcard": {
"role": "example*"
}
},
"collapse": {
"field": "role.keyword"
}
}
Search Result:
"hits": [
{
"_index": "72724517",
"_id": "1",
"_score": 1.0,
"_source": {
"role": "example",
"number": 1
},
"fields": {
"role.keyword": [
"example"
]
}
},
{
"_index": "72724517",
"_id": "3",
"_score": 1.0,
"_source": {
"role": "example123",
"number": 1
},
"fields": {
"role.keyword": [
"example123"
]
}
}
]

Elasticsearch - Delete query among nested object

I'm new to Elasticsearch, and I cannot work out how to write a delete query.
Here is an example of a document in myIndex:
{
"_index": "myIndex",
"_type": "_doc",
"_id": "IPc5kn8Bq7SuVr5qM9dq",
"_score": 1,
"_source": {
"code": "1234567",
"matches": [
{
"hostname": "hostnameA.com",
"url": "https://www.hostnameA.com/...."
},
{
"hostname": "hostnameB.com",
"url": "https://www.hostnameB.com/...."
},
{
"hostname": "hostnameC.com",
"url": "https://www.hostnameC.com/...."
},
{
"hostname": "hostnameD.com",
"url": "https://www.hostnameD.com/...."
}
]
}
}
Let's say this index contains 10k documents.
I would like a query that removes every item from the matches array whose hostname is equal to hostnameC.com and keeps all the others.
Does anyone have an idea how to do this?
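Removing array items is an update rather than a delete, so one way to approach it is the update by query API with a Painless script that filters the array in place. The sketch below builds such a request body in Python. The function name build_remove_matches is hypothetical. The query part assumes the default dynamic mapping, where matches is a plain object array and hostname has a keyword sub-field. If matches is mapped as nested, the term query would need to be wrapped in a nested query.

```python
import json


def build_remove_matches(hostname):
    """Build an _update_by_query body that drops matching array items."""
    return {
        # Assumption: default dynamic mapping with a .keyword sub-field.
        "query": {"term": {"matches.hostname.keyword": hostname}},
        "script": {
            "lang": "painless",
            # Remove every entry whose hostname equals the parameter and
            # leave the remaining entries untouched.
            "source": "ctx._source.matches.removeIf(m -> m.hostname == params.hostname)",
            "params": {"hostname": hostname},
        },
    }


body = build_remove_matches("hostnameC.com")
print(json.dumps(body, indent=2))
```

The body would be sent as a POST to myIndex/_update_by_query. Restricting the query to documents that contain the hostname avoids rewriting the other documents in the index.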

Need Elasticsearch guidance for searching on an array of chemical compounds

I have a list of products and an array of chemical compounds for each product, e.g. ['Sodium', 'Sodium bicarbonate', .....]. In this example, 'Sodium' and 'Sodium bicarbonate' are two different values that can be searched on independently, which complicates things, so the text and keyword field criteria did not help.
I need some guidance on the best way to handle these arrays of strings within Elasticsearch while retaining Elasticsearch's indexing magic. I appreciate any help you can provide.
FYI
I'm currently using Elasticsearch 6.3
You can use the multi-match query, which builds on the match query to allow multi-field queries.
Adding a working example with index data, search query, and search result.
Index Data:
{
"product": "product1",
"compounds": [
"Sodium",
"Sodium bicarbonate"
]
}
{
"product": "product2",
"compounds": [
"Sodium"
]
}
{
"product": "product3",
"compounds": [
"Sodium bicarbonate"
]
}
{
"product": "product4",
"compounds": [
"Chlorine"
]
}
Search Query:
{
"query": {
"multi_match" : {
"query": "Sodium AND Sodium bicarbonate",
"fields": [ "compounds", "compounds.keyword" ]
}
}
}
Search Result:
"hits": [
{
"_index": "65513968",
"_type": "_doc",
"_id": "1",
"_score": 1.0897084,
"_source": {
"product": "product1",
"compounds": [
"Sodium",
"Sodium bicarbonate"
]
}
},
{
"_index": "65513968",
"_type": "_doc",
"_id": "3",
"_score": 1.0659102,
"_source": {
"product": "product3",
"compounds": [
"Sodium bicarbonate"
]
}
},
{
"_index": "65513968",
"_type": "_doc",
"_id": "2",
"_score": 0.7032229,
"_source": {
"product": "product2",
"compounds": [
"Sodium"
]
}
}
]
You can use the terms query if you want to return documents that contain one or more exact terms in a field.
A unique list of chemical compounds
To find the unique list of chemical compounds, you can use the terms aggregation.
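As a minimal sketch of the terms query, the Python snippet below builds a request body that matches whole compound names. The function name build_terms_query is hypothetical, and the compounds.keyword sub-field is assumed to exist, as it does with the default dynamic mapping used in the multi-match example.

```python
import json


def build_terms_query(compounds, field="compounds.keyword"):
    """Build a search body matching documents with any of the exact values."""
    return {"query": {"terms": {field: list(compounds)}}}


body = build_terms_query(["Sodium bicarbonate", "Chlorine"])
print(json.dumps(body, indent=2))
```

Against the four sample documents, this should match product1, product3, and product4 but not product2, because the keyword field compares whole values and 'Sodium' is not the same value as 'Sodium bicarbonate'.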
{
"size": 0,
"aggs": {
"compounds": {
"terms": {
"field": "compounds.keyword"
}
}
}
}
Result:
"aggregations": {
"compounds": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Sodium",
"doc_count": 2
},
{
"key": "Sodium bicarbonate",
"doc_count": 2
},
{
"key": "Chlorine",
"doc_count": 1
}
]
}
}

Elasticsearch: Transpose and aggregate the data

I am using ES 6.5. When I fetch the required messages, I have to transpose and aggregate them. See the example for more details.
Messages retrieved (2 messages retrieved for this example):
{
"_index": "index_name",
"_type": "data",
"_id": "data_id",
"_score": 5.0851293,
"_source": {
"header": {
"id": "System_20190729152502239_57246_16667",
"creationTimestamp": "2019-07-29T15:25:02.239Z"
},
"messageData": {
"messageHeader": {
"date": "2019-06-03",
"mId": "1000",
"mDescription": "TEST"
},
"messageBreakDown": [
{
"category": "New",
"subCategory": "Sub",
"messageDetails": [
{
"Amount": 5.30
}
]
}
]
}
}
},
{
"_index": "index_name",
"_type": "data",
"_id": "data_id",
"_score": 5.09512,
"_source": {
"header": {
"id": "System_20190729152502239_57246_16667",
"creationTimestamp": "2019-07-29T15:25:02.239Z"
},
"messageData": {
"messageHeader": {
"date": "2019-06-03",
"mId": "1000",
"mDescription": "TEST"
},
"messageBreakDown": [
{
"category": "Old",
"subCategory": "Sub",
"messageDetails": [
{
"Amount": 4.30
}
]
}
]
}
}
}
Now I am looking for a query to post to ES that will transpose the data and group by category and subcategory.
If you check the messages, they have the same header.id (which is the main search criterion). Within this header.id, one message is for category New and the other for category Old (messageData.messageBreakDown is an array, and the category value is inside it).
So ideally, in the output, both messages belong to the same mId, which has a New price and an Old price.
How can I aggregate to get the desired result?
The final output message can contain only the desired fields, e.g. date, mId, mDescription, New price, and Old price (both in one output).
UPDATE:
Below is the mapping:
{"index_name":{"mappings":{"data":{"properties":{"header":{"properties":{"id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"creationTimestamp":{"type":"date"}}},"messageData":{"properties":{"messageBreakDown":{"properties":{"category":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"messageDetails":{"properties":{"Amount":{"type":"float"}}},"subCategory":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"messageHeader":{"properties":{"mDescription":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"mId":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"date":{"type":"date"}}}}}}}}}}
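Given this mapping, one way to sketch the grouping is a chain of terms aggregations with a sum at the bottom, so that each mId bucket contains one price per category and subcategory. The Python snippet below only builds the request body. The function name build_pivot_aggregation and the aggregation names (by_mid, by_category, by_sub_category, price) are hypothetical. Because messageBreakDown is mapped as a plain object rather than nested, the sketch assumes each document carries a single breakdown entry, as in the two sample messages. With several entries per document, the categories and amounts would be mixed together.

```python
import json

BREAKDOWN = "messageData.messageBreakDown"


def build_pivot_aggregation(header_id):
    """Build a search body that sums Amount per mId, category, subcategory."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"term": {"header.id.keyword": header_id}},
        "aggs": {
            "by_mid": {
                "terms": {"field": "messageData.messageHeader.mId.keyword"},
                "aggs": {
                    "by_category": {
                        "terms": {"field": BREAKDOWN + ".category.keyword"},
                        "aggs": {
                            "by_sub_category": {
                                "terms": {"field": BREAKDOWN + ".subCategory.keyword"},
                                "aggs": {
                                    "price": {
                                        "sum": {"field": BREAKDOWN + ".messageDetails.Amount"}
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


body = build_pivot_aggregation("System_20190729152502239_57246_16667")
print(json.dumps(body, indent=2))
```

The New and Old prices then appear as sibling buckets under the same mId bucket. Fields such as date and mDescription are not part of this sketch and would have to be added separately, for example with a top_hits sub-aggregation.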

How to search exact text in nested document in elasticsearch

I have an index like this:
"_index": "test",
"_type": "products",
"_id": "URpYIFBAQRiPPu1BFOZiQg",
"_score": null,
"_source": {
"currency": null,
"colors": [],
"api": 1,
"sku": 9999227900050002,
"category_path": [
{
"id": "cat00000",
"name": "B1"
},
{
"id": "abcat0400000",
"name": "Cameras & Camcorders"
},
{
"id": "abcat0401000",
"name": "Digital Cameras"
},
{
"id": "abcat0401005",
"name": "Digital SLR Cameras"
},
{
"id": "pcmcat180400050006",
"name": "DSLR Package Deals"
}
],
"price": 1034.99,
"status": 1,
"description": null
}
I want to search only for the exact text ["Camcorders"] in the category_path field.
I tried a match query, but it returns all the products that have "Camcorders" as part of the text. Can someone help me solve this?
Thanks
To search in a nested field, use a query like the following:
{
"query": {
"term": {
"category_path.name": {
"value": "b1"
}
}
}
}
Hope it helps!
You could add one more nested field, raw_name, that is not analyzed (not_analyzed) and match against it.
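The raw_name suggestion could look roughly like the sketch below, which builds a mapping fragment and a matching query in Python. It assumes category_path is mapped as nested and that each entry is indexed with a raw_name copy of its name. The function name build_exact_name_query is hypothetical. The keyword type shown here is the equivalent in Elasticsearch 5 and later of a string field with "index": "not_analyzed" in older versions.

```python
import json

# Hypothetical mapping fragment: raw_name holds the unanalyzed category name.
mapping = {
    "properties": {
        "category_path": {
            "type": "nested",
            "properties": {
                "id": {"type": "keyword"},
                "name": {"type": "text"},
                "raw_name": {"type": "keyword"},
            },
        }
    }
}


def build_exact_name_query(name):
    """Build a nested term query against the unanalyzed raw_name field."""
    return {
        "query": {
            "nested": {
                "path": "category_path",
                "query": {"term": {"category_path.raw_name": name}},
            }
        }
    }


body = build_exact_name_query("Cameras & Camcorders")
print(json.dumps(body, indent=2))
```

A term query on an unanalyzed field compares the whole stored value, so it would match the full name "Cameras & Camcorders" exactly, but not the single word "Camcorders" inside it.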
