ES 1.5 Delete By Query API not working - elasticsearch

I am using an old version of Elasticsearch: 1.5.
Problem: I need to delete a lot of documents, from a few hundred thousand up to a few million. I have all the information about the records, including their _ids, so an array of _ids is what I want to use.
Scale problem: I previously ran these deletions one at a time in a loop, but ES behaves inconsistently when many operations are issued in quick succession, so I decided to look for a bulk delete.
I am trying to use the delete by query API.
The docs state:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}
'
What I'm doing:
curl -XDELETE 'http://localhost:9200/my_index/logs/_query' -d '{
"query" : {
"terms" : { "_id" : ["AVTD6fhLAn35BG25xbZz", "AVTD6fhLAn35BG25xbaC"] }
}
}
'
The response is:
{
"found":false,
"_index":"my_index",
"_type":"logs",
"_id":"_query",
"_version":1,
"_shards":{"total":2, "successful":1, "failed":0}
}
It does not remove any of the documents. How do I make it work and actually delete these records?

I'm not sure about the delete by query API in Elasticsearch 1.5. It seems that Elasticsearch did not interpret your request as a query: judging by "_id":"_query" in the response you posted, it treated _query as a document ID and looked for a document with that ID.
What you can do instead is use the Bulk API, as documented here:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/docs-bulk.html
As in the example on that page, you can do:
curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
...
Create a file with any name ("requests" here) containing one delete action per line. Every line, including the last one, must end with a newline character.
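For the array-of-_ids case in the question, the request file can be generated rather than written by hand. The following is a minimal Python sketch, not part of the original answer: the function names `chunked` and `build_bulk_delete_body` and the batch size are illustrative assumptions. It only builds the newline-delimited payload; sending it is left to the curl command above.

```python
import json


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_bulk_delete_body(index, doc_type, ids):
    """Return an NDJSON string with one bulk delete action per id.

    Every line, including the last, is terminated by a newline,
    which is what the bulk endpoint expects.
    """
    actions = (
        {"delete": {"_index": index, "_type": doc_type, "_id": doc_id}}
        for doc_id in ids
    )
    return "".join(json.dumps(action) + "\n" for action in actions)


if __name__ == "__main__":
    ids = ["AVTD6fhLAn35BG25xbZz", "AVTD6fhLAn35BG25xbaC"]
    # Hypothetical batch size; tune it to what the cluster handles comfortably.
    for batch in chunked(ids, 1000):
        print(build_bulk_delete_body("my_index", "logs", batch), end="")
```

Writing each batch to its own file and posting the files one after another keeps every individual bulk request at a bounded size.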

Related

Search particular document id in all available indices of Elasticsearch

Is it possible to search for a particular document ID across all available indices? /_all/_search/ returns all documents, but when I tried /_all/_search/?q=<MYID> or /_all/_search/_id/<MYID> I did not get any documents.
If Elasticsearch does not support this, how can we achieve it? The use case is a centralized logging system based on Logstash and Elasticsearch, with multiple indices for the different running services.
You can use the terms query for this. Use _all to search across all indices.
Here is the request I used:
curl -XGET "http://localhost:9200/_all/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"terms": {
"_id": [
"4ea288f192e2c8b6deb3cee00d7b873b",
"dcc2b9c4fb6d14b2d41dbc5fee801af3"
]
}
}
}'
_id is the ID of the document.
You can also use the multi get API.
You will need to pass the index name for each document; it won't work across all indices.
GET /_mget
{
"docs" : [
{
"_index" : "index1",
"_id" : "1"
},
{
"_index" : "index2",
"_id" : "1"
}
]
}
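Because every entry in the multi get body has to name its index, the body is easiest to generate from the index/ID pairs you already know. A minimal Python sketch, assuming a hypothetical helper name `build_mget_body` and a mapping of index name to IDs as input:

```python
import json


def build_mget_body(ids_by_index):
    """Build an _mget request body from a mapping of index name -> list of ids."""
    docs = [
        {"_index": index, "_id": doc_id}
        for index, ids in ids_by_index.items()
        for doc_id in ids
    ]
    return {"docs": docs}


if __name__ == "__main__":
    body = build_mget_body({"index1": ["1"], "index2": ["1"]})
    # The printed JSON can be used as the body of GET /_mget.
    print(json.dumps(body, indent=2))
```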

In Elasticsearch, how to get the "_id" value of a document by providing unique content present in the document

I have a few documents ingested in Elasticsearch. A sample document is shown below.
"_index": "author_index",
"_type": "_doc",
"_id": "cOPf2wrYBik0KF", --Automatically generated by Elasticsearch after ingestion
"_score": 0.13956004,
"_source": {
"author_data": {
"author": "xyz",
"author_id": "123", -- This is the unique id for each document
"publish_year" : "2016"
}
}
Is there a way to get the auto-generated _id by sending author_id from the High Level REST Client?
I tried researching solutions, but all of them only fetch the document using its _id. I need the reverse operation.
Expected output: cOPf2wrYBik0KF
SearchHit provides access to basic information such as the index, document ID and score of each search hit, so with the Search API you can do it this way in Java:
String index = hit.getIndex();
String id = hit.getId();
Or something like this:
SearchResponse searchResponse =
client.prepareSearch().setQuery(matchAllQuery()).get();
for (SearchHit hit : searchResponse.getHits()) {
String yourId = hit.getId();
}
See: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search.html#java-rest-high-search-response
You can use source filtering. Since you are only interested in the _id, you can turn off _source retrieval entirely. The _source parameter also accepts one or more wildcard patterns to control which parts of the _source are returned (https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-request-source-filtering.html):
GET /author_index/_search
{
"_source" : false,
"query" : {
"term" : { "author_data.author_id" : "123" }
}
}
Another approach also returns the _id. The stored_fields parameter applies to fields that are explicitly marked as stored in the mapping, which is off by default and generally not recommended:
GET /author_index/_search
{
"stored_fields" : ["author_data.author_id", "_id"],
"query" : {
"term" : { "author_data.author_id" : "123" }
}
}
Output for both of the above queries:
"hits" : [
{
"_index" : "author_index",
"_type" : "_doc",
"_id" : "cOPf2wrYBik0KF",
"_score" : 6.4966354
}
]
More details here: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-request-stored-fields.html
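With either query, the ID sits in each entry of the response's hits.hits array. As a minimal sketch of pulling it out on the client side (in Python rather than the Java client used above; `extract_ids` and the trimmed sample response are illustrative assumptions, not part of the original answers):

```python
def extract_ids(search_response):
    """Return the _id of every hit in a parsed search response (a dict)."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_id"] for hit in hits]


if __name__ == "__main__":
    # Trimmed response shaped like the output shown above.
    sample_response = {
        "hits": {
            "hits": [
                {
                    "_index": "author_index",
                    "_type": "_doc",
                    "_id": "cOPf2wrYBik0KF",
                    "_score": 6.4966354,
                }
            ]
        }
    }
    print(extract_ids(sample_response))
```

If author_id is truly unique, the returned list holds at most one element.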

How to index same doc in different indices with different routing

I need to index the same document into different indices with a different routing value for each.
The underlying problem is calculating complex aggregations over payment information from the perspective of both the payer and the collector, for example "payments made / received in the last 15 days, grouped by status".
I was wondering how we can achieve this using the Elasticsearch bulk API.
Is it possible to do this without duplicating the document in the NDJSON? Something like this, for example:
POST _bulk
{ "index" : { "_index" : "test_1", "_id" : "1", "routing": "1234" } }
{ "index" : { "_index" : "test_2", "_id" : "1", "routing": "5678" } }
{ "field1" : "value1" }
I looked through the documentation but couldn't find anything that explains this.
Using only the bulk API, you'll need to repeat the document for each index.
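The repetition does not have to be written by hand. Here is a minimal Python sketch that emits one action line plus one copy of the document per target; the helper name `build_bulk_index_body` and the list of (index, routing) pairs are assumptions made for illustration:

```python
import json


def build_bulk_index_body(document, doc_id, targets):
    """Return NDJSON that indexes `document` once per (index, routing) target.

    Each action line is followed by its own copy of the document source.
    """
    lines = []
    for index, routing in targets:
        action = {"index": {"_index": index, "_id": doc_id, "routing": routing}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(document))
    # The bulk body must end with a trailing newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    body = build_bulk_index_body(
        {"field1": "value1"},
        "1",
        [("test_1", "1234"), ("test_2", "5678")],
    )
    print(body, end="")
```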
Another way is to bulk-index the documents into the first index and then use the Reindex API to create the second index with a different routing value for each document.
POST _bulk
{ "index" : { "_index" : "test_1", "_id" : "1", "routing": "1234" } }
{ "field1" : "value1", "routing2": "5678" }
You can then reindex into the second index using the second routing value (which you need to store in the document somehow):
POST _reindex
{
"source": {
"index": "test_1"
},
"dest": {
"index": "test_2"
},
"script": {
"source": "ctx._routing = ctx._source.routing2",
"lang": "painless"
}
}
That way, you only send the data once through the bulk API, which takes roughly half the time of sending every document twice, and the Reindex API then copies all the data internally (i.e. without the added network latency of sending a potentially big payload again).

Can fielddata_fields be used in mget request?

I am trying to get the fielddata from a not_analyzed field in a Multi Get query. It works fine in _search queries.
This is what I've tried with no luck:
curl -XGET "http://es:9200/articles/article/_mget/?pretty&fielddata_fields=url" -d '{"ids" : ["5763197951"]}'
curl -XGET "http://es:9200/articles/article/_mget/?pretty" -d '{"fielddata_fields": ["url"], "ids" : ["5763197951"]}'
curl -XGET "http://es:9200/articles/article/_mget/?pretty" -d '{"docs" : [{"_id" : "5763197951", "fielddata_fields": ["url"]}]}'
It looks like fielddata_fields is completely ignored, since I always get this result:
{
"docs" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "5763197951",
"_version" : 1,
"found" : true
} ]
}
I'm running ES version 1.4.4 with JVM: 1.8.0_31
Edit: I just tried the above with a test database running ES 2.2.2 with the same results...

Return document on update elasticsearch

Let's say I'm updating user data:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"doc" : {
"name" : "new_name"
},
"fields": ["_source"]
}'
Here's an example of what I get back when I perform an update:
{
"_index" : "test",
"_type" : "type1",
"_id" : "1",
"_version" : 4
}
How do I perform an update that returns the document as it is after the update?
The documentation is a little misleading about returning fields when performing an Elasticsearch update. It actually uses the same approach as the Index API: the parameter is passed on the URL, not as a field in the update body.
In your case you would submit:
curl -XPOST 'localhost:9200/test/type1/1/_update?fields=_source' -d '{
"doc" : {
"name" : "new_name"
}
}'
In my testing on Elasticsearch 1.2.1, it returns something like this:
{
"_index":"test",
"_type":"testtype",
"_id":"1",
"_version":9,
"get": {
"found":true,
"_source": {
"user":"john",
"body":"testing update and return fields",
"name":"new_name"
}
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html
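In a response shaped like the one above, the updated document is nested under get._source. A minimal Python sketch of reading it from the parsed response; `updated_source` is a hypothetical helper name, and the sample dict simply mirrors the response shown in this answer:

```python
def updated_source(update_response):
    """Return the post-update _source from a parsed update response.

    Returns None when the response carries no "get" section, e.g. when
    the fields parameter was not passed on the URL.
    """
    return update_response.get("get", {}).get("_source")


if __name__ == "__main__":
    sample_response = {
        "_index": "test",
        "_type": "testtype",
        "_id": "1",
        "_version": 9,
        "get": {
            "found": True,
            "_source": {
                "user": "john",
                "body": "testing update and return fields",
                "name": "new_name",
            },
        },
    }
    print(updated_source(sample_response)["name"])
```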
