Bulk delete elasticsearch - elasticsearch

I am using Elasticsearch 2.2.
Here is the document count:
curl 'xxxxxxxxx:9200/_cat/indices?v'
yellow open app 5 1 28019178 5073 11.4gb 11.4gb
In the "app" index we have two types of document.
"log"
"syslog"
Now I want to delete all the documents under the type "syslog".
Hence, I tried using the following command:
curl -XDELETE "http://xxxxxx:9200/app/syslog"
But I am getting the following error:
No handler found for uri [/app/syslog]
I have installed the delete-by-query plugin as well. Is there any way I can do a bulk delete operation?
For now, I am deleting records by fetching the id:
curl -XDELETE "http://xxxxxx:9200/app/syslog/A121312"
It took around 5 minutes for me to delete 10000 records, and I have more than 1000000 docs which need to be deleted. Please help.
[EDIT 1]
I ran the query below to delete the syslog-type docs:
curl -XDELETE 'http://xxxxxx:9200/app/syslog/_query' -d'
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
}
}'
And the result is below:
{"found":false,"_index":"app","_type":"syslog","_id":"_query","_version":1,"_shards":{"total":2,"successful":1,"failed":0}}
I used a query to get this message from the index:
{
"_index" : "app",
"_type" : "syslog",
"_id" : "AVckPMQnKYIebrQhF556",
"_score" : 1.0,
"_source" : {
"message" : "some test message",
"#version" : "1",
"#timestamp" : "2016-09-13T15:49:04.562Z",
"type" : "syslog",
"host" : "1.2.3.4",
"priority" : 0,
"severity" : 0,
"facility" : 0,
"facility_label" : "kernel",
"severity_label" : "Emergency"
}
}
[EDIT 2]
The delete-by-query plugin is listed as installed:
sudo /usr/share/elasticsearch/bin/plugin list
Installed plugins in /usr/share/elasticsearch/plugins/node1:
- delete-by-query

I had a similar problem after filling Elasticsearch with 77 million unwanted documents over the last couple of days. Setting a timeout in the query is your friend, as mentioned here. curl also has a parameter to increase its own timeout (-m 3600):
curl --request DELETE \
--url 'http://127.0.0.1:9200/nadhled/tree/_query?timeout=60m' \
--header 'content-type: application/json' \
-m 3600 \
--data '{"query":{
"filtered":{
"filter":{
"range":{
"timestamp":{
"lt":1564826247
},
"timestamp":{
"gt":1564527660
}
}
}
}
}
}'
I know this is not exactly your bulk delete, but I found this page during my research, so I am posting it here. Hope it helps you too.

In the latest Elasticsearch (5.2), you can use _delete_by_query:
curl -XPOST "http://localhost:9200/index/type/_delete_by_query" -d'
{
"query":{
"match_all":{}
}
}'
The delete-by-query API is new and should still be considered
experimental. The API may change in ways that are not backwards
compatible
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
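Applied to the index and type from the question, the call would look roughly like this (a sketch, assuming the cluster is upgraded to 5.x, since _delete_by_query is not part of core 2.2):
curl -XPOST "http://xxxxxx:9200/app/syslog/_delete_by_query" -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'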

I would suggest that you rather create a new index and reindex only the documents you want to keep (a sketch of this follows at the end of this answer).
But if you want to use delete by query, you should use this:
curl -XDELETE 'http://xxxxxx:9200/app/syslog/_query' -d'
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
}
}'
but then you'll still be left with the type's mapping.
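A minimal sketch of the reindex alternative, assuming Elasticsearch 2.3+ where the _reindex API is available (the target index name app_v2 is purely illustrative): copy only the "log" documents into a fresh index, then drop the old one once you have verified the copy.
curl -XPOST 'http://xxxxxx:9200/_reindex' -d'
{
"source": {
"index": "app",
"type": "log"
},
"dest": {
"index": "app_v2"
}
}'
curl -XDELETE 'http://xxxxxx:9200/app'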

Related

Search particular document id in all available indices of Elasticsearch

Is there any possibility to search for a particular document id in all available indices? /_all/_search/ returns all documents, but I tried it as /_all/_search/?q=<MYID> or
/_all/_search/_id/<MYID>
but I'm not getting any documents.
If Elasticsearch does not support this, how can we achieve this task? The use case is a centralized log system based on Logstash and Elasticsearch, with multiple indices for different running services.
You can use the terms query for this. Use _all to search on all indices. Please refer here.
Here is the request I used:
curl -XGET "http://localhost:9200/_all/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"terms": {
"_id": [
"4ea288f192e2c8b6deb3cee00d7b873b",
"dcc2b9c4fb6d14b2d41dbc5fee801af3"
]
}
}
}'
_id is the id of the document
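As a side note, the dedicated ids query expresses the same lookup; a minimal sketch using the same placeholder ids:
curl -XGET "http://localhost:9200/_all/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"ids": {
"values": [
"4ea288f192e2c8b6deb3cee00d7b873b",
"dcc2b9c4fb6d14b2d41dbc5fee801af3"
]
}
}
}'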
You can use the multi get API.
You will need to pass the index name; it won't work across all indices at once:
GET /_mget
{
"docs" : [
{
"_index" : "index1",
"_id" : "1"
},
{
"_index" : "index2",
"_id" : "1"
}
]
}
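For consistency with the curl examples above, the same multi get request can be sent like this (index1 and index2 are the placeholder names from the snippet):
curl -XGET "http://localhost:9200/_mget" -H 'Content-Type: application/json' -d'
{
"docs" : [
{ "_index" : "index1", "_id" : "1" },
{ "_index" : "index2", "_id" : "1" }
]
}'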

Document Count in keyword buckets from list in document as aggregation in Elasticsearch

The situation:
I am a beginner with Elasticsearch and cannot wrap my head around how to use aggregations to get what I need.
I have documents with the following structure:
{
...
"authors" : [
{
"name" : "Bob",
"#type" : "Person"
}
],
"resort": "Politics",
...
}
I want to use an aggregation to get the documents count for every author. Since there may be more than one author for some documents, these documents should be counted for every author individually.
What I've tried:
Since the terms aggregation worked with the resort field, I tried using it with authors or the name field inside, but I always got no buckets at all. For this I used the following curl request:
curl -X POST 'localhost:9200/news/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
"_source": false,
"aggs": {
"author_agg": { "terms": {"field": "authors.keyword" } }
}
}'
I concluded that the terms aggregation doesn't work with fields that are contained in a list.
Next I thought about the nested aggregation, but the documentation says it is a
single bucket aggregation
so not what I am looking for. Because I ran out of ideas I tried it anyway, but got the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I found this answer and tried to use it for my data. I had the following request:
curl -X GET "localhost:9200/news/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"nest": {
"nested": {
"path": "authors"
},
"aggs": {
"authorname": {
"terms" : {
"field": "name.keyword"
}
}
}
}
}
}'
which gave me the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I searched for how to make my path nested using mappings, but I couldn't find out how to accomplish that. I don't even know if this actually makes sense or not.
So how can I aggregate the documents into buckets based on a key that lies in elements of a list inside the documents?
Maybe this question has been answered somewhere else and I'm just not able to state my problem in the right way, since I'm still confused by all the new information. Thank you in advance for your help.
I finally solved my problem:
The idea of making the mapping of the authors key nested was totally right. But unfortunately Elasticsearch does not let you change a field from non-nested to nested directly, because all documents containing this field would have to be re-indexed. So you have to go the following way:
Create a new index with a custom mapping. Here we go into the document type _doc, into its properties and then into the documents' field authors. There we set the type to nested.
curl -X PUT "localhost:9200/new_index?pretty" -H 'Content-Type: application/json' -d'
{
"mappings": {
"_doc" : {
"properties" : {
"authors": { "type": "nested" }
}
}
}
}'
Then we reindex our dataset, setting the destination to our newly created index. This indexes the data from the old index into the new index, essentially copying the pure data while picking up the new mapping (since settings and mappings are not copied this way).
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}'
Now we can run the nested aggregation here to sort the documents into buckets based on the authors:
curl -X GET 'localhost:9200/new_index/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"authors": {
"nested": {
"path": "authors"
},
"aggs": {
"authors_by_name": {
"terms": { "field": "authors.name.keyword" }
}
}
}
}
}'
I don't know of a way to rename indices, but you can simply delete the old index and then repeat the described procedure to create another new index with the name of the old one and the custom mapping; a sketch of that round trip follows.
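A minimal sketch of that round trip, reusing the commands above (old_index and new_index are placeholders, and this assumes you can tolerate old_index being briefly unavailable):
curl -X DELETE "localhost:9200/old_index"
curl -X PUT "localhost:9200/old_index?pretty" -H 'Content-Type: application/json' -d'
{
"mappings": {
"_doc" : {
"properties" : {
"authors": { "type": "nested" }
}
}
}
}'
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
"source": { "index": "new_index" },
"dest": { "index": "old_index" }
}'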

If I store a date in Elasticsearch using a timestamp (e.g. 1428956853627), can I still query that record by day, without the time value?

For example, if I have a mapping with a date field like:
$ curl -XGET http://localhost:9200/testing/blog/_mapping
{"testing":{
"mappings":{
"blog":{
"properties":{
"posted":{
"type":"date",
"format":"dateOptionalTime"
},
"title":{
"type":"string"
}
}
}
}
}
}
and I then insert a record like:
$ curl -XPUT http://localhost:9200/testing/blog/1 -d \
'{"title": "Elastic search", "posted": 1428956853627}'
{"_index":"testing","_type":"blog","_id":"1","_version":1,"created":true}
using the timestamp corresponding to Mon Apr 13 15:27:33 CDT 2015, is there some way to query that back out by "plain old" date? For instance, if I want to see posts that were posted on 4/13/15, I try:
$ curl -XGET http://localhost:9200/testing/blog/_search?pretty -d \
'{"query":{"filtered": {"filter": {"term": {"posted": "2015-04-13"}}}}}'
and get back no hits:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Even if I do include the timestamp in the query, I still don't get back my record:
$ curl -XGET http://localhost:9200/testing/blog/_search?pretty -d \
'{"query": {"filtered": {"filter": {"term": {"posted": "2015-04-13T15:27:33"}}}}}'
I thought that at least, since the mapping declared "posted" as a date, I could retrieve it by range via:
$ curl -XGET http://localhost:9200/testing/blog/_search?pretty -d \
'{"query": {"filtered": {"filter": {"range": {"posted": {"gt": "2014-04-13", "lt": "2014-04-15"}}}}}}'
But even that doesn't work. The only way I can seem to query back the entry by date is by using the exact same timestamp (long) value that I originally passed in:
$ curl -XGET http://localhost:9200/testing/blog/_search?pretty -d \
'{"query": {"filtered": {"filter": {"term": {"posted": 1428956853627}}}}}'
But that's not really that useful... do I have to convert all of my dates to strings before inserting them into Elasticsearch? Ideally I'd like to be able to save with higher fidelity, including the times in case I need them, but still search for 2014-04-13. Can Elasticsearch do that?
The date range query is the way to go.
{
"query": {
"filtered": {
"filter": {
"range": {
"posted": {
"gt": "2015-04-13",
"lt": "2015-04-15"
}
}
}
}
}
}
The date you provided is from the year 2015, not 2014 as used in your range query. That is why the range query is not working.
Also, ES is accepting this time format only because it is covered by the formats specified under dateOptionalTime.
Do check here if you need to see the usable formats.
If you need other formats to be accepted, you need to mention the additional ones in the mapping.
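For example, additional formats can be listed in the mapping separated by ||; a minimal sketch (the index name testing2 and the extra yyyy/MM/dd format are purely illustrative):
curl -XPUT http://localhost:9200/testing2 -d '
{
"mappings": {
"blog": {
"properties": {
"posted": {
"type": "date",
"format": "dateOptionalTime||yyyy/MM/dd"
},
"title": { "type": "string" }
}
}
}
}'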

Percolate not returning results as expected

We're trying to set up and use percolate, but we aren't quite getting results as expected.
First, I register a few queries:
curl -XPUT 'localhost:9200/index-234234/.percolator/query1' -d '{
"query" : {
"range" : {
"price" : { "gte": 100 }
}
}
}'
curl -XPUT 'localhost:9200/index-234234/.percolator/query2' -d '{
"query" : {
"range" : {
"price" : { "gte": 200 }
}
}
}'
And then, when I try to match it against 150, which should ideally match only query1, instead it matches both queries:
curl -XGET 'localhost:9200/index-234234/message/_percolate' -d '{
"doc" : {
"price" : 150
}
}'
{"took":4,"_shards":{"total":5,"successful":5,"failed":0},"total":2,"matches":[{"_index":"index-234234","_id":"query1"},{"_index":"index-234234","_id":"query2"}]}
Any pointers as to why this is happening would be much appreciated.
The problem is that you are registering your percolator queries prior to setting up the mapping for the document. The percolator then has to register the query without a defined mapping, which can be an issue, particularly for range queries.
You should start over by deleting the index and then running this mapping command first:
curl -XPOST localhost:9200/index-234234 -d '{
"mappings" : {
"message" : {
"properties" : {
"price" : {
"type" : "long"
}
}
}
}
}'
Then execute your previous commands (register the two percolator queries and then percolate one document) and you will get the following correct response:
{"took":3,"_shards":{"total":5,"successful":5,"failed":0},"total":1,"matches":[{"_index":"index-234234","_id":"query1"}]}
You may find this discussion from a couple of years ago helpful:
http://grokbase.com/t/gg/elasticsearch/124x6hq4ev/range-query-in-percolate-not-working
Not a solution, but this works for me (without knowing why):
Register both percolator queries
Do the _percolate request (it returns your result: "total": 2)
Register both percolator queries again (both are now at version 2)
Do the _percolate request again (it returns the right result: "total": 1)

No query registered for [match]

I'm working through some examples in the ElasticSearch Server book and trying to write a simple match query
{
"query" : {
"match" : {
"displayname" : "john smith"
}
}
}
This gives me the error:
{\"error\":\"SearchPhaseExecutionException[Failed to execute phase [query],
....
SearchParseException[[scripts][4]: from[-1],size[-1]: Parse Failure [Failed to parse source
....
QueryParsingException[[kb.cgi] No query registered for [match]]; }
I also tried
{
"match" : {
"displayname" : "john smith"
}
}
as per examples on http://www.elasticsearch.org/guide/reference/query-dsl/match-query/
EDIT: I think the remote server I'm using is not the latest 0.20.5 version because using "text" instead of "match" seems to allow the query to work
I've seen a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
It appears the remote server I'm using is not the latest 0.20.5 version of Elasticsearch; consequently, the "match" query is not supported - instead it is "text", which works.
I came to this conclusion after seeing a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
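For reference, the equivalent query using the older "text" syntax (the earlier name of what later became the match query) looks like this:
{
"query" : {
"text" : {
"displayname" : "john smith"
}
}
}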
Your first query looks fine, but perhaps the way you use it in the request is not correct. Here is a complete example that works:
curl -XDELETE localhost:9200/test-idx
curl -XPUT localhost:9200/test-idx -d '{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string", "index": "analyzed"
}
}
}
}
}
'
curl -XPUT localhost:9200/test-idx/doc/1 -d '{
"name": "John Smith"
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
curl "localhost:9200/test-idx/_search?pretty=true" -d '{
"query": {
"match" : {
"name" : "john smith"
}
}
}
'
echo
