Elasticsearch type data deletion - elasticsearch

Is it possible to delete all the documents of a particular type in an Elasticsearch index?
- Does it affect the type mapping too?
- I want to retain the mapping for that type.
I am using Elasticsearch 2.2.

I found a related answer here. The following content is taken directly from that answer.
You can use the delete-by-query plugin for that. Here's an example:
We create an index with two types and add some documents:
POST /_bulk
{"index":{"_index":"mammals","_type":"people"}}
{"tag_line":"I am Mike"}
{"index":{"_index":"mammals","_type":"people"}}
{"tag_line":"I am Hanna"}
{"index":{"_index":"mammals","_type":"people"}}
{"tag_line":"I am Bert"}
{"index":{"_index":"mammals","_type":"animals"}}
{"tag_line":"I am a dog"}
{"index":{"_index":"mammals","_type":"animals"}}
{"tag_line":"I am a cat"}
When we query for all documents, we get 5 results:
GET /mammals/_search?size=0
{
"query": {
"match_all": {}
}
}
Now we can delete all documents of the type "animals":
DELETE /mammals/animals/_query
{
"query": {
"match_all": {}
}
}
This will only work when the delete-by-query plugin is installed.
When we search once again for all documents, we only get 3 results as the animals are gone.
P.S.: This plugin exists in the 2.x versions but not in 5.x, where delete-by-query is part of core as the _delete_by_query API. I believe this deletion does not affect the mapping, because it only deletes individual documents.
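To make the version difference concrete, here is a minimal Python sketch that only builds the request (it does not send anything). The helper name `delete_type_request` is hypothetical, and the 5.x path with a type segment is an assumption based on how typed indices were addressed in 5.x and 6.x; check it against the documentation for your version.

```python
import json

# Body shared by both variants: match every document of the addressed type.
MATCH_ALL = {"query": {"match_all": {}}}


def delete_type_request(index, doc_type, major_version):
    """Return (method, path, body) for deleting every document of one type.

    Hypothetical helper for illustration; it does not contact a cluster.
    """
    body = json.dumps(MATCH_ALL)
    if major_version < 5:
        # 2.x: endpoint provided by the delete-by-query plugin.
        return "DELETE", f"/{index}/{doc_type}/_query", body
    # 5.x and later (assumption): core delete-by-query API.
    return "POST", f"/{index}/{doc_type}/_delete_by_query", body
```

In both cases only documents are removed, so running `GET /mammals/_mapping` afterwards should still list the "animals" mapping.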

Related

How to delete data from a specific index in elasticsearch after a certain period?

I have an index in Elasticsearch that is populated with JSON documents, each carrying a timestamp.
I want to delete data from that index.
curl -XDELETE http://localhost:9200/index_name
The command above deletes the whole index. My requirement is to delete certain data after a period of time (for example, after one week). Can I automate the deletion process?
I tried deleting with Curator.
But I think it deletes whole indices based on their timestamp, not data within an index. Can Curator be used to delete data within an index?
I would be glad to know whether any of the following would work:
Can curl be automated to delete data from an index after a period?
Can Curator be automated to delete data from an index after a period?
Is there another way, such as a Python script, to do the job?
References are taken from the official site of elasticsearch.
Thanks a lot in advance.
You can use the DELETE BY QUERY API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
Basically it will delete all the documents matching the provided query:
POST twitter/_delete_by_query
{
"query": {
"match": {
"message": "some message"
}
}
}
But the recommended approach is to create a separate index per period (per day, for example) and use Curator to drop the old ones periodically, based on their age:
...
logs_2019.03.11
logs_2019.03.12
logs_2019.03.13
logs_2019.03.14
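Curator handles this selection for you, but the underlying logic is simple enough to sketch in Python. The following is an illustration only, not Curator's implementation: the function name `expired_indices` and the `logs_` prefix with a `YYYY.MM.DD` suffix are assumptions taken from the index names listed above.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, keep_days, prefix="logs_"):
    """Return the daily indices whose date suffix is older than `keep_days`.

    Names that do not follow the <prefix>YYYY.MM.DD pattern are ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so this is not a daily index
        if day < cutoff:
            expired.append(name)
    return expired
```

Each returned name can then be dropped as a whole with `DELETE /<index_name>`, which is much cheaper than deleting the same documents one query at a time.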
Simple example using Delete By Query API:
POST index_name/_delete_by_query
{
"query": {
"bool": {
"filter": {
"range": {
"timestamp": {
"lte": "2019-06-01 00:00:00.0",
"format": "yyyy-MM-dd HH:mm:ss.S"
}
}
}
}
}
}
This will delete records whose "timestamp" field (the date/time, stored within the record, at which the event occurred) is at or before the given value. You can run the same query as a search first to get a count of what will be deleted.
GET index_name/_search
{
"size": 1,
"query": {
-- as above --
Also it is nice to use offset dates
"lte": "now-30d",
which would delete all records older than 30 days.
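Since the question also asks about Python scripting, here is a minimal sketch of how such a purge could be automated with only the standard library. The function names, the default field name `timestamp`, and the base URL are assumptions for illustration; the script would be scheduled externally, for example from cron.

```python
import json
import urllib.request


def build_purge_body(max_age="30d", field="timestamp"):
    """Delete-by-query body matching documents at least `max_age` old."""
    return {
        "query": {
            "bool": {
                "filter": {"range": {field: {"lte": f"now-{max_age}"}}}
            }
        }
    }


def purge_old_documents(base_url, index, max_age="30d", field="timestamp"):
    """POST the purge body to <base_url>/<index>/_delete_by_query.

    Returns the decoded JSON response of the cluster.
    """
    request = urllib.request.Request(
        url=f"{base_url}/{index}/_delete_by_query",
        data=json.dumps(build_purge_body(max_age, field)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (requires a reachable cluster, so it is left commented out):
# result = purge_old_documents("http://localhost:9200", "index_name", "7d")
# print(result.get("deleted"))
```

Keeping the body builder separate from the HTTP call makes it easy to print and review the query before anything is deleted.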
You can always delete single documents by using the HTTP request method DELETE.
To find the IDs you want to delete, you need to query your data first, probably with a range filter/query on your timestamp.
Since you are interacting with the REST API, you can do this with Python or any other language. There is also a Java client if you prefer a more direct API.

ElasticSearch - Delete documents by specific field

This seemingly simple task is not well-documented in the ElasticSearch documentation:
We have an ElasticSearch instance with an index that has a field in it called sourceId. What API call would I make to first, GET all documents with 100 in the sourceId field (to verify the results before deletion) and then to DELETE same documents?
You probably need to make two API calls here: the first to view the count of documents, the second to perform the deletion.
The query is the same in both calls, but the endpoints are different. I'm also assuming that sourceId is of type keyword.
Query to Verify
POST <your_index_name>/_search
{
"size": 0,
"query": {
"term": {
"sourceId": "100"
}
}
}
Execute the above term query and note the hits.total value in the response.
Remove the "size": 0 from the above query if you want to see the full documents in the response.
Once you have the details, you can perform the deletion using the same query, as shown below. Note the different endpoint, though.
Query to Delete
POST <your_index_name>/_delete_by_query
{
"query": {
"term": {
"sourceId": "100"
}
}
}
Once you execute the delete by query, check the deleted field in the response. It should show the same number.
I've used a term query, but you can also use a match query or any complex bool query. Just make sure that the query is correct.
Hope it helps!
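The verify-then-delete flow above can be sketched in Python as follows. This is an illustration under assumptions: `send(method, path, body)` is a hypothetical caller-supplied function that performs the HTTP request and returns the decoded JSON, and the sketch uses the `_count` endpoint in place of `_search` with "size": 0, because the shape of hits.total differs between versions.

```python
def delete_verified(send, index, query):
    """Count the matches, delete them, and report whether the numbers agree.

    `send(method, path, body)` is supplied by the caller and must return the
    decoded JSON response of the cluster.
    Returns a tuple (expected, deleted, numbers_match).
    """
    body = {"query": query}
    # Step 1: how many documents match the query?
    expected = send("POST", f"/{index}/_count", body)["count"]
    # Step 2: delete with exactly the same query.
    deleted = send("POST", f"/{index}/_delete_by_query", body)["deleted"]
    return expected, deleted, expected == deleted
```

A call could look like `delete_verified(send, "<your_index_name>", {"term": {"sourceId": "100"}})`. The two numbers can legitimately differ if documents are indexed or removed between the two calls.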
Delete all the documents of an index without deleting the mapping and settings:
POST /my_index/_delete_by_query?conflicts=proceed&pretty
{
"query": {
"match_all": {}
}
}
See: https://opster.com/guides/elasticsearch/search-apis/elasticsearch-delete-by-query/

ElasticSearch: Using match_phrase for all fields

As a user of ElasticSearch 5, I have been using something like this to search for a given phrase in all fields:
GET /my_index/_search
{
"query": {
"match_phrase": {
"_all": "this is a phrase"
}
}
}
Now, the _all field is going away, and match_phrase does not seem to work like query_string, where you can simply use something like this to run a search for all fields:
"query": {
"query_string": {
"query": "word"
}
}
What is the alternative for an exact phrase search across all fields, without using the _all field, from version 6.0?
I have many fields per document so specifying all of them in the query is not really a solution for me.
You can find the answer in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
It says:
Use a custom field and the mapping copy_to parameter
So you have to define a custom field in the mapping and copy all the other fields into it.
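With many fields per document, the copy_to entries do not have to be written by hand. Below is a minimal Python sketch that adds copy_to to every text field of a mapping's "properties" section and builds the matching phrase query. The helper names and the target field name `all_text` are hypothetical, and the sketch only handles top-level fields.

```python
import copy


def with_catch_all(properties, target="all_text"):
    """Return a copy of `properties` where every text field copies into `target`.

    `properties` is the "properties" section of an index mapping.
    """
    result = copy.deepcopy(properties)
    for spec in result.values():
        if spec.get("type") == "text":
            spec["copy_to"] = target
    # The catch-all field itself is an ordinary text field.
    result[target] = {"type": "text"}
    return result


def phrase_query(phrase, target="all_text"):
    """match_phrase query against the catch-all field instead of _all."""
    return {"query": {"match_phrase": {target: phrase}}}
```

Because copy_to is applied at index time, documents that were indexed before the mapping change need to be reindexed before they show up in searches on the new field.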

Get elasticsearch indices before specific date

My logstash service sends the logs to elasticsearch as daily indices.
elasticsearch {
hosts => [ "127.0.0.1:9200" ]
index => "%{type}-%{+YYYY.MM.dd}"
}
Does Elasticsearch provide an API to look up the indices created before a specific date?
For example, how could I get the indices created before 2015-12-15?
The only time I really care about what indexes are created is when I want to close/delete them using curator. Curator has "age" type features built in, if that's also your use case.
I think you are looking for the Indices Query; have a look here.
Here is an example:
GET /_search
{
"query": {
"indices" : {
"query": {
"term": {"description": "*"}
},
"indices" : ["2015-01-*", "2015-12-*"],
"no_match_query": "none"
}
}
}
Each index has a creation_date field.
Since the number of indices is supposed to be quite small there's no such feature as 'searching for indices'. So you just get their metadata and filter them inside your app. The creation_date is also available via _cat API.
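Filtering the metadata inside your app could look like the following Python sketch. It assumes rows shaped like the JSON output of `GET /_cat/indices?h=index,creation.date&format=json`, where creation.date is a string holding epoch milliseconds; whether that output format is available depends on your version, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def indices_created_before(cat_rows, cutoff):
    """Names of the indices whose creation date is earlier than `cutoff`.

    `cat_rows` is a list of dicts with the keys "index" and "creation.date"
    (epoch milliseconds, as a string). `cutoff` is a timezone-aware datetime.
    """
    cutoff_ms = cutoff.timestamp() * 1000
    return sorted(
        row["index"]
        for row in cat_rows
        if int(row["creation.date"]) < cutoff_ms
    )


# Example cutoff for the question above:
# indices_created_before(rows, datetime(2015, 12, 15, tzinfo=timezone.utc))
```

Note that this compares the creation time of the index, which for daily Logstash indices is normally the same day as the date in the index name, but need not be (for example after a reindex).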

How to update multiple documents that match a query in elasticsearch

I have documents which at first contain only the "url" (analyzed) and "respsize" (not_analyzed) fields. I want to update the documents that match a url and add a new field, "category".
That is,
doc1 at first:
{
"url":"http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
"respsize":"500"
}
I have external data, and I know that "stackoverflow.com" belongs to category 10,
so I need to update the doc to look like this:
{
"url":"http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
"respsize":"500",
"category":"10"
}
Of course, I will do this for all documents whose url field contains "stackoverflow.com",
and I need to update each doc only once. Because the category of a url does not change, there is no need to update it again.
I think I need to use the _update API with the _version number to check this, but I can't compose the DSL query.
EDIT
I ran this and it looks like it works fine:
But the documents are not changed.
Although the query result looks right, the new field is not added to the docs. Do I need a refresh or something similar?
You could use the update by query plugin to do just that. The idea is to select all documents without a category whose url matches a certain string, and add the category you wish.
curl -XPOST 'localhost:9200/webproxylog/_update_by_query' -H "Content-Type: application/json" -d '
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"url": "stackoverflow.com"
}
},
{
"missing": {
"field": "category"
}
}
]
}
}
}
},
"script" : "ctx._source.category = \"10\";"
}'
After running this, all your documents with url: stackoverflow.com that don't have a category, will get category: 10. You can run the same query again later to fix new stackoverflow.com documents that have been indexed in the meantime.
Also make sure to enable scripting in elasticsearch.yml and restart ES:
script.inline: on
script.indexed: on
In the script, you're free to add as many fields as you want, e.g.
...
"script" : "ctx._source.category1 = \"10\"; ctx._source.category2 = \"20\";"
UPDATE
ES 2.3 now features the update by query functionality. You can still use the above query exactly as is and it will work (except that filtered and missing are deprecated, but still working ;).
That all sounds great, but just to add to @Val's answer: Update By Query is available from Elasticsearch 2.x but not in earlier versions. In our case we're using 1.4 for legacy reasons and there is no chance of upgrading in the foreseeable future, so another solution is the Update by query plugin provided here: https://github.com/yakaz/elasticsearch-action-updatebyquery
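For readers on newer versions, where the deprecated filtered and missing queries are no longer available, the same request can be expressed with a bool query and a must_not exists clause. The Python sketch below only builds the request body; the function name is hypothetical, and the script format (Painless with a "source" key and params) is an assumption that fits recent versions, while older 5.x releases used "inline" in place of "source".

```python
def build_category_update(url_term, category):
    """Body for POST <index>/_update_by_query that tags uncategorised docs.

    Selects documents whose url matches `url_term` and that have no
    "category" field yet, then sets the field from a script parameter.
    """
    return {
        "query": {
            "bool": {
                "filter": [{"term": {"url": url_term}}],
                # Replacement for the deprecated "missing" filter.
                "must_not": [{"exists": {"field": "category"}}],
            }
        },
        "script": {
            "lang": "painless",
            "source": "ctx._source.category = params.category",
            "params": {"category": category},
        },
    }
```

`build_category_update("stackoverflow.com", "10")` yields the body to send to `webproxylog/_update_by_query`. Because already categorised documents are excluded by the query, rerunning it updates each document only once.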
