Delete all search response documents - elasticsearch

Using NEST for Elasticsearch, I am trying to delete an exact number of documents (the oldest I can find) from my index. My mapped object has a TimeStamp field. The only way I have managed to make this work is to search for these documents, then loop over every hit with a foreach, passing each hit's ID to the delete API and removing them one by one:
var searchResponseAsc = client.Search<MyPersonalObject>(s => s
    .Sort(sd => sd.Ascending(e => e.TimeStamp))
    .Take(NumberOfDocumentsToBeDeleted));
foreach (IHit<MyPersonalObject> hit in searchResponseAsc.Hits) {
    client.DeleteByQuery<MyPersonalObject>(dbq => dbq
        .Index(IndexName)
        .Query(q => q.Ids(s => s.Values(hit.Id))));
}
Is there a way to call the delete API on a bulk of search response or even better to nest the search query directly into the delete query?
Any tips would be much appreciated!

You can embed an Elasticsearch search query in the delete by query API:
POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}
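For the original case of deleting a fixed number of the oldest documents, one option is to collect the hit IDs from the sorted search and pass them all to a single delete-by-query request through an `ids` query. The Python sketch below only builds the request body; the function name `build_ids_delete_body` and the sample IDs are illustrative assumptions, not part of NEST or Elasticsearch.

```python
import json


def build_ids_delete_body(hit_ids):
    """Build a _delete_by_query body that matches exactly the given document IDs."""
    # An `ids` query matches documents by their _id, so one request
    # can replace the per-hit delete loop.
    return {"query": {"ids": {"values": list(hit_ids)}}}


# Hypothetical IDs, as they might come back from the sorted search.
body = build_ids_delete_body(["a1", "b2", "c3"])
print(json.dumps(body))
```

The resulting body would be sent as `POST <index>/_delete_by_query`.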

Related

Filtering DSL Query Search - Elasticsearch

I was reading a few articles and documents about query context and filter context, and learned that it is best to use filter context when you do not need full-text search or when scoring does not matter. In my case, I want to return the logs that contain an ID, so I realized I should use a filter context instead of a query context. Besides full-text search and scoring, is there a hard rule that defines when to use one over the other?
So I went from my original DSL search query:
GET /customer-simulation-es-app-logs*/_search
{
  "query": {
    "match": {
      "_id": "mJvG0nkBiU3wk_hrEd-8"
    }
  }
}
to the filter context:
GET /customer-simulation-es-app-logs*/_search
{
  "query": {
    "bool": {
      "filter": [
        {"match": {"_id": "mJvG0nkBiU3wk_hrEd-8"}}
      ]
    }
  }
}
Since I want to use NEST to perform the search, I took this approach:
[HttpGet("GetAll/{_id}")]
public async Task<EsSource> GetAll(String _id)
{
    var response = await _elasticClient.SearchAsync<EsSource>(s => s
        .Index("customer-simulation-es-app-logs*")
        .Query(q => q
            .Bool(b => b
                .Filter(f => f
                    .Match(m => m.Field("_id").Query(_id))))));
    return response?.Documents?.FirstOrDefault();
}
Would this be the correct way to do a filter context using NEST?
That would be the correct way to issue a query with only a filter context. Some additional points that might help:
A term-level query on the _id field, such as a term query, should suffice, as there is no analysis chain involved.
If you know the index that contains the document, the get API would be a better option. Given that a wildcard index pattern is being used, though, the index might not be known.
NEST has convenient operator overloads on queries that make writing bool queries more succinct. The final query can be written as:
var response = await _elasticClient.SearchAsync<EsSource>(s => s
    .Index("customer-simulation-es-app-logs*")
    .Query(q => +q
        .Match(m => m
            .Field("_id")
            .Query(_id)
        )
    )
);
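The first point above, a term-level query on `_id` inside a filter clause, corresponds to a request body like the one built below. This is a sketch in Python that only constructs the JSON; `build_id_filter_body` is a hypothetical helper name, and the ID is the one from the question.

```python
import json


def build_id_filter_body(doc_id):
    """Build a search body that looks up one document ID in filter context."""
    # Clauses under bool.filter are not scored, and `term` skips text analysis.
    return {"query": {"bool": {"filter": [{"term": {"_id": doc_id}}]}}}


print(json.dumps(build_id_filter_body("mJvG0nkBiU3wk_hrEd-8"), indent=2))
```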

How to delete data from a specific index in elasticsearch after a certain period?

I have an index in Elasticsearch that holds JSON documents indexed with a timestamp.
I want to delete data from that index.
curl -XDELETE http://localhost:9200/index_name
The above command deletes the whole index. My requirement is to delete certain data after a time period (for example, after 1 week). Can I automate the deletion process?
I tried to delete using curator.
But I think it deletes whole indices created by timestamp, not data within an index. Can curator be used to delete data within an index?
I would be glad to know whether any of the following would work:
Can curl be automated to delete data from an index after a period?
Can curator be automated to delete data from an index after a period?
Is there any other way, like Python scripting, to do the job?
References are taken from the official Elasticsearch site.
Thanks a lot in advance.
You can use the DELETE BY QUERY API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
Basically it will delete all the documents matching the provided query:
POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}
However, the recommended approach is to create separate indices for different periods (daily, for example) and use curator to drop them periodically, based on their age:
...
logs_2019.03.11
logs_2019.03.12
logs_2019.03.13
logs_2019.03.14
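To illustrate the time-based index approach, the sketch below picks out which daily indices have aged past a retention window, using the `logs_YYYY.MM.DD` naming shown above. It is written in Python and is not how curator itself is implemented; the function name, the seven-day retention and the fixed `today` value are assumptions for the example.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days, prefix="logs_"):
    """Return the names whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of the time-based indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


names = ["logs_2019.03.11", "logs_2019.03.12", "logs_2019.03.13", "logs_2019.03.14"]
# With today = 2019-03-20 and 7 days of retention, the cutoff is 2019-03-13.
print(expired_indices(names, date(2019, 3, 20), 7))
```

Each name returned could then be dropped as a whole with `DELETE /<index_name>`, which is much cheaper than deleting individual documents.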
Simple example using Delete By Query API:
POST index_name/_delete_by_query
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "lte": "2019-06-01 00:00:00.0",
            "format": "yyyy-MM-dd HH:mm:ss.S"
          }
        }
      }
    }
  }
}
This will delete records whose "timestamp" field, the date/time (within the record) at which they occurred, is on or before the given date. One can first run the same query as a search to get a count of what will be deleted.
GET index_name/_search
{
  "size": 1,
  "query": {
    -- as above --
It is also convenient to use date math offsets:
"lte": "now-30d",
which would delete all records older than 30 days.
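To automate this from a script, as the question asks, the request can be built in code and run on a schedule (from cron, for example). Below is a minimal Python sketch using only the standard library. The host `http://localhost:9200`, the index name and the `timestamp` field are taken from the examples in this thread; the helper names are assumptions. The final call is left commented out so that nothing is deleted by accident.

```python
import json
import urllib.request


def build_age_delete_body(field, older_than):
    """Build a _delete_by_query body for documents older than a date-math offset."""
    return {"query": {"bool": {"filter": {"range": {field: {"lte": older_than}}}}}}


def build_delete_request(base_url, index, body):
    """Prepare (but do not send) the POST request for _delete_by_query."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_delete_by_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_age_delete_body("timestamp", "now-30d")
request = build_delete_request("http://localhost:9200", "index_name", body)
print(request.get_method(), request.full_url)
# To actually run the deletion against a live cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["deleted"])
```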
You can always delete single documents by using the HTTP DELETE request method.
To find the IDs you want to delete, you need to query your data, probably with a range filter/query on your timestamp.
As you are interacting with the REST API, you can do this with Python or any other language. There is also a Java client if you prefer a more direct API.

ElasticSearch - Delete documents by specific field

This seemingly simple task is not well-documented in the ElasticSearch documentation:
We have an ElasticSearch instance with an index that has a field in it called sourceId. What API call would I make to first, GET all documents with 100 in the sourceId field (to verify the results before deletion) and then to DELETE same documents?
You probably need to make two API calls here: the first to view the count of documents, the second to perform the deletion.
The query is the same in both; only the endpoints differ. I'm also assuming sourceId is of type keyword.
Query to Verify
POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "term": {
      "sourceId": "100"
    }
  }
}
Execute the above term query and take note of hits.total in the response.
Remove "size": 0 from the above query if you want to view the full documents in the response.
Once you have the details, you can go ahead and perform the deletion using the same query, as shown below; note the different endpoint.
Query to Delete
POST <your_index_name>/_delete_by_query
{
  "query": {
    "term": {
      "sourceId": "100"
    }
  }
}
Once you execute the delete by query, check the deleted field in the response. It should show the same number.
I've used a term query here; however, you can also use a match query or a more complex bool query. Just make sure that the query is correct.
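The verify-then-delete check can also be scripted by comparing the two responses. The Python sketch below works on already-parsed response dictionaries; the helper names and the sample responses are made up for illustration. It accounts for `hits.total` being a plain number in older Elasticsearch versions and an object with a `value` key from 7.0 onward.

```python
def total_hits(search_response):
    """Read hits.total, which is an int before 7.0 and an object from 7.0 on."""
    total = search_response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


def deletion_matches(search_response, delete_response):
    """True when delete-by-query removed as many documents as the search counted."""
    return total_hits(search_response) == delete_response["deleted"]


# Hypothetical responses, trimmed to the fields used here.
search_response = {"hits": {"total": {"value": 3, "relation": "eq"}, "hits": []}}
delete_response = {"deleted": 3, "failures": []}
print(deletion_matches(search_response, delete_response))
```

Note that from 7.0 the reported total is capped at 10,000 by default unless `track_total_hits` is set to true in the search request, so for larger result sets the comparison needs that setting.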
Hope it helps!
POST /my_index/_delete_by_query?conflicts=proceed&pretty
{
  "query": {
    "match_all": {}
  }
}
Delete all the documents of an index without deleting the mapping and settings:
See: https://opster.com/guides/elasticsearch/search-apis/elasticsearch-delete-by-query/

Elastic search: Delete by query isn't working

Introduction
I'm using Elasticsearch (v5.x) and trying to delete documents by query.
My index is called "data". The documents are stored in a hierarchical structure. Document URLs are built with this pattern:
https://server.ip/data/{userid}/{document-id}
So, let's say the user with ID '1' has two documents stored ('1', '2'). Their direct URLs will be:
https://server.ip/data/1/1
https://server.ip/data/1/2
Target
Now, what I'm trying to do is delete the user from the system (the user and their stored documents).
The only way that worked for me is to send an HTTP DELETE request for each document URL, like this:
DELETE https://server.ip/data/1/1
DELETE https://server.ip/data/1/2
This works. But with this solution I have to call delete multiple times. I want to delete all the documents in one call, so this solution is rejected.
My first try was to send an HTTP DELETE request to
https://server.ip/data/1
Unfortunately, it does not work (error code 400).
My second try was to use the _delete_by_query function. Each document that I store contains a UserId field holding the user ID. So, I tried to write a delete query that removes all documents in the 'data' index whose UserId field has the value 1 ('UserId'==1):
POST https://server.ip/data/_delete_by_query
{
  "query": {
    "match": {
      "UserId": "1"
    }
  }
}
This did not work either. The response was HTTP error code 400 with this body:
{
  "error": {
    "root_cause": [
      {
        "type": "invalid_type_name_exception",
        "reason": "Document mapping type name can't start with '_'"
      }
    ],
    "type": "invalid_type_name_exception",
    "reason": "Document mapping type name can't start with '_'"
  },
  "status": 400
}
Do you know how to solve these problems? Or do you have an alternative solution?
Thank you!
I assume you have document_type defined in your Logstash configuration, within output > elasticsearch, something like this:
output {
  elasticsearch {
    index => "1"
    document_type => "1type"
    hosts => "localhost"
  }
  stdout {
    codec => rubydebug
  }
}
Hence you could simply delete all the documents which have the same type:
curl -XDELETE https://server.ip/data/1/1type
OR try something like this if you're willing to use delete by query:
POST https://server.ip/data/_delete_by_query?q=UserId:1
This could be an absolute gem of a source. Hope it helps!

How can I perform an Elasticsearch Multisearch with only suggesters?

I need to return suggestions from 4 separate suggesters, across two separate indices.
I am currently doing this by sending two separate requests to Elasticsearch (one for each index) and combining the results in my application. Obviously this does not seem ideal when the Multisearch API is available.
From playing with the Multisearch API I am able to combine these suggestion requests into one and it correctly retrieves results from all 4 completion suggesters from both indexes.
However, it also automatically performs a match_all query on the chosen indices. I can of course minimize the impact of this by setting searchType to count but the results are worse than the two separate curl requests.
It seems that no matter what I try I cannot prevent the Multisearch API from performing some sort of query over each index.
e.g.
{
  index: 'users',
  type: 'user'
},
{
  suggest: {
    users_suggest: {
      text: term,
      completion: {
        size: 5,
        field: 'users_suggest'
      }
    }
  }
},
{
  index: 'photos',
  type: 'photo'
},
{
  suggest: {
    photos_suggest: {
      text: term,
      completion: {
        size: 5,
        field: 'photos_suggest'
      }
    }
  }
}
A request like the above, which clearly omits the query: {} part of the multisearch request, still performs a match_all query and returns everything in the index.
Is there any way to prevent the query taking place so that I can simply get the combined completion suggesters results? Or is there another way to search multiple suggesters on multiple indices in one query?
Thanks in advance
Set size to 0 in every request so that only suggestions are returned, with no hits:
{
  "size": 0,
  "suggest": {}
}
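Applied to the multi search request from the question, each body line gets a size of 0 next to its suggester. The Python sketch below assembles the newline-delimited body that `_msearch` expects; the function name and the sample term are illustrative assumptions, while the index and field names are taken from the question.

```python
import json


def build_suggest_msearch(term, targets, size=5):
    """Build an NDJSON _msearch body with one suggest-only search per target.

    `targets` is a list of (index, suggest_field) pairs.
    """
    lines = []
    for index, field in targets:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps({
            "size": 0,  # return no hits, only suggestions
            "suggest": {
                field: {"text": term, "completion": {"field": field, "size": size}}
            },
        }))
    # The multi search API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_suggest_msearch(
    "jo", [("users", "users_suggest"), ("photos", "photos_suggest")]
)
print(payload, end="")
```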
