Can I bulk delete documents without specifying a routing key?

I'm using python-elasticsearch (https://elasticsearch-py.readthedocs.io/en/master/) to bulk insert documents on an ES cluster with multiple shards. I'm using a routing key during insertion, but I don't have access to that key on deletion.
Is there a way to hit all shards? Something like routing=*. If I don't specify a routing key, ES is not able to find the document.
This request:
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1", "routing": "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } } // Fails because routing is missing
gives me
{
"took" : 1334,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"delete" : {
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"result" : "not_found",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 404
}
}
]
}
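For what it's worth, a hedged sketch rather than a confirmed answer: there is no routing=* wildcard for the bulk API. The delete action accepts the same routing parameter as the index action, so if the routing value can be recomputed or stored alongside the document id it can be passed on deletion; if it really is unavailable, _delete_by_query with an ids query reaches the documents anyway, because the search phase is broadcast to all shards. The ids below are just illustrative:
POST _bulk
{ "delete" : { "_index" : "test", "_id" : "1", "routing" : "1" } }

POST test/_delete_by_query
{
  "query": {
    "ids": {
      "values": ["1", "2"]
    }
  }
}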

Related

How can we delete multiple documents at once in Elasticsearch that belong to different indexes?

I know we have the delete_by_query API that can do this job, but I'm looking for a solution using the bulk API. I tried to follow https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html but it only works when we delete documents from one index. What about when we have more than one index, e.g. comma-separated indexes like index1,index2?
You can build the request like this:
PUT _bulk
{ "delete" : { "_index" : "products", "_id" : 1 } }
{ "delete" : { "_index" : "idx_movies", "_id" : 1 } }
This deletes the same _id in two different indices.
The response will be as follows (if the doc is found it will be deleted, otherwise the result will be not_found):
{
"took" : 6,
"errors" : false,
"items" : [
{
"delete" : {
"_index" : "products",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1,
"status" : 200
}
},
{
"delete" : {
"_index" : "idx_movies",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "not_found",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1012,
"_primary_term" : 2,
"status" : 404
}
}
]
}
You can use the Bulk API with different index names as shown below:
POST _bulk
{ "delete" : { "_index" : "index1", "_id" : "1" } }
{ "delete" : { "_index" : "index1", "_id" : "2" } }
{ "delete" : { "_index" : "index2", "_id" : "3" } }
{ "delete" : { "_index" : "index3", "_id" : "4" } }
Similarly, you can achieve this using the Elasticsearch Java client or any other language client as well.

Null field in Elasticsearch needs to be replaced

How can I replace "build_duration" : "null" with the value 21600000 in Elasticsearch?
DevTools > Console
GET myindex/_search
{
"query": {
"term": {
"build_duration": "null"
}
}
}
Output:
{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 9.658761,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "40324749",
"_score" : 9.658761,
"_source" : {
"build_duration" : "null",
"build_end_time" : "2021-05-20 04:00:36",
"build_requester" : "daniel.su",
"build_site" : "POL",
"build_id" : "40324749",
"#version" : "1"
}
}
]
}
}
With the below query I was able to replace the field value.
POST /myindex/_update/mydocid
{
"doc" : {
"build_duration": "21600000"
}
}
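If other documents also hold the string "null" (the search above matched only one), a hedged alternative to updating each id by hand is _update_by_query with the same term query and a small script; the "21600000" value simply mirrors the single-doc update above:
POST myindex/_update_by_query
{
  "query": {
    "term": {
      "build_duration": "null"
    }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.build_duration = params.value",
    "params": {
      "value": "21600000"
    }
  }
}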

How do I apply reindex to new data values through filters?

This is the basic_data (example) output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"#version" : "1",
"message" : "hello",
"#timestamp" : "2021-05-13T02:50:05.962Z"
}
},
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"#version" : "1",
"message" : "python,
"#timestamp" : "2021-05-13T02:50:05.947Z"
}
First of all, out of the various field values, only the message values have been extracted (code example below).
GET 0513_final_test_instgram/_search?_source=message&filter_path=hits.hits._source
{
"hits" : {
"hits" : [
{
"_source" : {
"message" : "hello"
}
},
{
"_source" : {
"message" : "python"
}
I got to know about reindex, which stores documents into a new index.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
However, even after reading the documentation I don't know how to do it.
My 0513 attempt code:
POST _reindex
{
"source": {
"index": "0513_final_test_instgram"
},
"dest": {
"index": "new_data_index"
}
}
How do I use reindex to store only the extracted message values in a new index?
Update: attempt from the comments
Output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"message" : "hello"
}
},
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"message" : "python"
}
}
You simply need to specify which fields you want to reindex into the new index:
POST _reindex
{
"source": {
"index": "0513_final_test_instgram",
"_source": ["message"]
},
"dest": {
"index": "new_data_index"
}
}
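Since the question title mentions filters: as a hedged addition to the answer above, the reindex source can also take a query so that only matching documents are copied into the new index; the match on "python" below is just an illustrative filter, not something from the original answer:
POST _reindex
{
  "source": {
    "index": "0513_final_test_instgram",
    "_source": ["message"],
    "query": {
      "match": {
        "message": "python"
      }
    }
  },
  "dest": {
    "index": "new_data_index"
  }
}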

Problem with creating roles in open-distro for elasticsearch

I have 2 roles that are assigned to one user. In the first role, I include the field name for documents which have _id 1 and 2:
{
"index_permissions": [
{
"index_patterns": [
"test"
],
"dls": "{\n \"terms\": {\n \"_id\": [ \"1\", \"2\"] \n }\n}\n\n",
"fls": [
"name"
],
"masked_fields": [],
"allowed_actions": [
"get",
"crud"
]
}
],
"tenant_permissions": [],
"cluster_permissions": [
"*"
]
}
and in the second role, I include the field job_description for the document which has _id 3:
{
"index_permissions": [
{
"index_patterns": [
"test"
],
"dls": "{\n \"terms\": {\n \"_id\": [\"3\"] \n }\n}\n",
"fls": [
"job_description"
],
"masked_fields": [],
"allowed_actions": []
}
],
"tenant_permissions": [],
"cluster_permissions": []
}
When I try to get data from the index, it shows me job_description and name in all documents:
{
"took" : 237,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"name" : "John",
"job_description" : "Systems administrator and Linux specialist"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"name" : "John",
"job_description" : "Systems administrator and Linux specialist"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.0,
"_source" : {
"name" : "John",
"job_description" : "Systems administrator and Linux specialist"
}
}
]
}
}
but I want to see only name in the first two records and only job_description in the third document, like this:
{
"took" : 237,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"name" : "John",
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"name" : "John",
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.0,
"_source" : {
"job_description" : "Systems administrator and Linux specialist"
}
}
]
}
}
Does anyone know how to do it?
DLS and FLS do not work in conjunction like that.
DLS is used to return only a subset of the search response based on the DLS query, whereas FLS is used to include or exclude certain fields from the search response returned from Elasticsearch.
All the DLS queries are combined (OR condition) and similarly all FLS input is combined (AND condition) for a user that has multiple such configurations.
In your case, you have two DLS queries and two FLS lists. The two DLS queries work as OR conditions, so documents matching doc_id 1, 2 or 3 are returned. Similarly, both name and job_description are returned.
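To illustrate the previous point (a sketch, not from the original answer): with both roles attached, the effective document-level filter is roughly a bool/should of the two DLS queries, while the visible fields are the combination of the two FLS lists (name and job_description), which is why all three documents come back with both fields:
{
  "bool": {
    "should": [
      { "terms": { "_id": ["1", "2"] } },
      { "terms": { "_id": ["3"] } }
    ]
  }
}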

Using index sorting by default in _search

I am using Elasticsearch 7.6 and the index sorting feature, which was introduced in 6.0.
What I am looking to do is a GET /myindice/_search without specifying sort, and get documents back based on the index sorting settings I have specified for my index, NOT insertion order.
My index, as per the docs:
PUT twitter
{
"settings" : {
"index" : {
"sort.field" : "date",
"sort.order" : "desc"
}
},
"mappings": {
"properties": {
"date": {
"type": "date"
}
}
}
}
PUT twitter/_doc/a
{
"date": "2015-01-01"
}
PUT twitter/_doc/b
{
"date": "2016-01-01"
}
PUT twitter/_doc/c
{
"date": "2017-01-01"
}
My initial thought is that
GET twitter/_search
should return docs c, b and a.
Instead, I get the following:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "a",
"_score" : 1.0,
"_source" : {
"date" : "2015-01-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "b",
"_score" : 1.0,
"_source" : {
"date" : "2016-01-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "c",
"_score" : 1.0,
"_source" : {
"date" : "2017-01-01"
}
}
]
}
}
The documentation isn't clear on this particular subject, and all the example queries there use sort:
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/index-modules-index-sorting.html
Am I required to specify the sort order in the GET query (hence repeating the sort already specified as the index sort)?
Thanks in advance to any diligent soul who could help me.
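A hedged note, in case it helps: as far as I understand, index sorting controls how documents are ordered inside each segment (so that a search whose sort matches the index sort can terminate early); it does not make _search return sorted hits by default. Without an explicit sort, hits are still ordered by _score (all 1.0 here), so the sort does have to be repeated in the query, e.g.:
GET twitter/_search
{
  "sort": [
    { "date": "desc" }
  ]
}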
