Deleting documents in multiple indices from Elasticsearch

I'm trying to delete all documents of a certain type across multiple indices (the documents are created by Logstash, so there is an index for each day).
I've tried this:
DELETE _all/_query?q=type:iss
The result looks something like:
{
"_indices": {
"logstash-2014.01.18": {
"_shards": {
"total": 5,
"successful": 0,
"failed": 5
}
},
"logstash-2014.01.18": {
"_shards": {
"total": 5,
"successful": 2,
"failed": 3
}
},
...
}
}
Every time I run it, I get a different number of successes/failures in each index. The first query above initially seemed to work. If I look in elasticsearch-head and Kibana, it seems like at least some of the documents have been deleted. However, if I then query for them:
POST _search {"query":{"match":{"type":"iis"}}}
or
GET _search?q=type:iis
it still returns all results. I don't believe this is a caching problem, as I've done everything possible to ensure that isn't the case (cleared browser data, restarted Elasticsearch/the server, etc.).
I also tried:
DELETE _all/iis/_query {"query":{"match_all":{}}}
Again I get the inconsistent success/failure results but it does seem to have deleted documents when I run the search queries again. It only seems to be deleting a few every time though.
Why is this so inconsistent and what can I do to get this working consistently?
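The per-index shard counts in a response like the one above can be tallied to see exactly which indices reported failed shards on a given run. This is a minimal Python sketch, assuming the response has already been parsed into a dict (for example with json.loads) and has the 1.x delete-by-query shape shown above; the helper names and the index name "logstash-example" are hypothetical. It only reports failures; it does not explain their cause.

```python
import json
from typing import Any


def failed_shards_by_index(response: dict[str, Any]) -> dict[str, int]:
    """Return {index_name: failed_shard_count} for every index that reported failures.

    Assumes each entry under "_indices" carries its own "_shards" block,
    as in the delete-by-query response shown above.
    """
    failures: dict[str, int] = {}
    for index_name, result in response.get("_indices", {}).items():
        failed = result.get("_shards", {}).get("failed", 0)
        if failed:
            failures[index_name] = failed
    return failures


def fully_succeeded(response: dict[str, Any]) -> bool:
    """True only if every shard of every listed index reported success."""
    for result in response.get("_indices", {}).values():
        shards = result.get("_shards", {})
        if shards.get("successful", 0) != shards.get("total", 0):
            return False
    return True


if __name__ == "__main__":
    # Same shape as the response above; "logstash-example" is a placeholder name.
    sample = {
        "_indices": {
            "logstash-2014.01.18": {"_shards": {"total": 5, "successful": 0, "failed": 5}},
            "logstash-example": {"_shards": {"total": 5, "successful": 2, "failed": 3}},
        }
    }
    print(json.dumps(failed_shards_by_index(sample)))
    print(fully_succeeded(sample))
```

Running this after each attempt makes the run-to-run variation visible per index rather than only in aggregate.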

Related

Returning an empty result for a nested bool query on fields that don't have data

I'm doing the following query:
The ns.ns field is configured (both its mapping and settings were set up successfully), but there is no source data for this field, and I get an empty result from Elasticsearch. Is that right? I mean, without data this query would return an empty result, is that correct? I'm still learning ES, thanks for the help.
As you mentioned, the ns field is mapped as type nested, so the index already exists; therefore, running the search query will not return an "index_not_found_exception".
The search API returns search hits that match the query defined in the request.
When you run the search query from the question, you get the following response:
{
"took": 17,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
The response provides the following information about the search request:
took – how long it took Elasticsearch to run the query, in milliseconds
timed_out – whether or not the search request timed out
_shards – how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped
max_score – the score of the most relevant document found
hits.total.value – how many matching documents were found
In the response above, hits.hits is an empty array ([]). hits.hits is the array of documents that match your search query; since no documents have been indexed here, no documents can match, so the search returns no hits.
In the response above, max_score is null because no documents matched. The _score in Elasticsearch is a way of determining how relevant a match is to the query; refer to this ES documentation to learn more about how scoring works in ES.
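The distinction drawn in this answer (an empty hits.hits with no failed shards simply means nothing matched) can be checked in code. This is a minimal Python sketch, assuming the response has been parsed into a dict; the helper names are hypothetical. It accepts both the object form of hits.total shown above and the plain-integer form used by older versions (as in the other responses on this page).

```python
from typing import Any


def total_hits(response: dict[str, Any]) -> int:
    """Return hits.total as an int.

    Newer versions return {"value": N, "relation": ...}; older 1.x/2.x
    responses return a plain integer. A "gte" relation means N is a lower bound.
    """
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def search_was_complete(response: dict[str, Any]) -> bool:
    """True if the search did not time out and no shard failed."""
    shards = response["_shards"]
    return not response["timed_out"] and shards.get("failed", 0) == 0


def describe(response: dict[str, Any]) -> str:
    """One-line interpretation of a search response."""
    if not search_was_complete(response):
        return "incomplete search: an empty result may not mean 'no matches'"
    count = total_hits(response)
    if count == 0:
        # max_score is null (None after json.loads) when nothing matched.
        return "search completed; no documents matched"
    return f"search completed; {count} documents matched"
```

Applied to the response above, describe() returns "search completed; no documents matched".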

Getting a different order of documents after upgrading from ES 1.4 to ES 2.3

I ran the query curl localhost:9200/tweets/user/_search?size=25 on ES 1.4.2 and got the following result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"hits": {
"total": 294633,
"max_score": 1,
"hits": [
...
with a list of documents.
When I ran the same query on ES 2.3.0, I got the same number of hits, but the documents returned were completely different.
What could be the reason?
The documentation says the order will be random:
This will apply a constant score (default of 1) to all documents. It will perform the same as the above query, and all documents will be returned randomly like before, they’ll just have a score of one instead of zero.
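Because every hit gets the same constant score, nothing in the query pins down their relative order. If a repeatable order is needed, one common option (not mentioned in the original answer) is an explicit sort. This is a minimal Python sketch that builds a request body equivalent to ?size=25 plus an optional ascending sort; the field names created_at and tweet_id are hypothetical, and for a fully deterministic order the last sort field should be unique per document.

```python
import json
from typing import Any, Sequence


def match_all_body(size: int, sort_fields: Sequence[str] = ()) -> dict[str, Any]:
    """Build a match_all search body, optionally with an explicit ascending sort.

    Without a sort, all hits share one constant score, so their order is not
    guaranteed to stay the same across versions or clusters.
    """
    body: dict[str, Any] = {"size": size, "query": {"match_all": {}}}
    if sort_fields:
        # Put a field that is unique per document last so no ties remain.
        body["sort"] = [{field: {"order": "asc"}} for field in sort_fields]
    return body


if __name__ == "__main__":
    # Hypothetical field names; substitute fields that exist in your mapping.
    print(json.dumps(match_all_body(25, ["created_at", "tweet_id"]), indent=2))
```

The printed JSON would be sent as the body of a search request against tweets/user/_search.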

Inconsistent doc count

Hi, I am running Elasticsearch 1.5.2.
I indexed 6,761,727 documents in one of my indexes.
When I run the following query:
GET myindex/mytype/_search
{
"size": 0
}
The hits.total count keeps alternating between two values:
"hits": {
"total": 6761727,
"max_score": 0,
"hits": []
}
and
"hits": {
"total": 6760368,
"max_score": 0,
"hits": []
}
No matter how many times I run the query, the count goes back and forth between the two.
I searched around a bit and found that the primary and replica shards don't seem to have exactly the same number of docs. If I use preference=_primary, the doc count returned is correct.
What is the easiest way to check which shard is the culprit and fix it without re-indexing everything?
Set the replica count to 0 for that index
PUT /my_index/_settings
{
"index": {
"number_of_replicas": 0
}
}
Wait until GET /_cat/shards/my_index?v no longer lists any replicas for that index, then set number_of_replicas back to its original value.
This deletes all the replicas for that index; restoring the original replica count then makes fresh copies of the primaries.
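To answer the "which shard is the culprit" part before resetting replicas, the output of GET /_cat/shards/my_index?v can be compared copy by copy. This is a minimal Python sketch; it assumes the header row printed by the v flag contains the columns index, shard, prirep, state and docs, and the helper names are hypothetical. Counts can also differ briefly while indexing or refreshes are in progress, so a mismatch on a live index is a hint rather than proof.

```python
from collections import defaultdict


def doc_counts_by_shard(cat_output: str) -> dict[tuple[str, int], dict[str, list[int]]]:
    """Group the doc counts of STARTED shard copies by (index, shard number).

    Uses the header row added by the `v` flag to locate columns. Copies that
    are not STARTED (e.g. UNASSIGNED) have no doc count and are skipped.
    """
    lines = [line for line in cat_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    groups = defaultdict(lambda: {"p": [], "r": []})
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        if row.get("state") != "STARTED" or "docs" not in row:
            continue
        key = (row["index"], int(row["shard"]))
        groups[key][row["prirep"]].append(int(row["docs"]))
    return dict(groups)


def mismatched_shards(cat_output: str) -> list[tuple[str, int]]:
    """Return (index, shard) pairs where a replica's doc count differs from the primary's."""
    mismatches = []
    for key, copies in doc_counts_by_shard(cat_output).items():
        primary = copies["p"]
        if primary and any(count != primary[0] for count in copies["r"]):
            mismatches.append(key)
    return sorted(mismatches)
```

A result such as [("my_index", 1)] would point at shard 1 of my_index as the one whose replica has drifted from its primary.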

Quick check whether an Elasticsearch index will return search hits

We're running queries against a few indices on an Elasticsearch server, like so:
curl -XGET 'http://es-server:9200/logstash-2015.01.28,logstash-2015.01.27/_search?pretty' -d @/a_query_file_in_json_format
Works great most of the time, and we can parse the results we need.
However, when the indices are in a bad state (maybe there's been a lag in indexing, or some shards are acting up), the query above returns no results, and it's impossible to know whether that's because there are no matching records or because the index is unstable in some way.
I've been looking at the Elasticsearch indices recovery API but am a bit overwhelmed. Are there some queries I can run that will give a yes/no answer to 'can a search against these indices be relied upon at the moment?'
You have multiple ways to get this information.
1) You can use the cluster health API at the indices level, like this:
GET _cluster/health/my_index?level=indices
This outputs the status of the cluster, along with the status and shard information for the index my_index:
{
"cluster_name": "elasticsearch_thomas",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 5,
"active_shards": 5,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 5,
"indices": {
"my_index": {
"status": "yellow",
"number_of_shards": 5,
"number_of_replicas": 1,
"active_primary_shards": 5,
"active_shards": 5,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 5
}
}
}
2) If you want a less verbose answer, or want to filter on specific information only, you can rely on the _cat API, which lets you customize the output. Note, however, that the output is no longer JSON.
For example, if you want only the name and health status of the indices, the following request will do the trick:
GET _cat/indices/my_index?h=index,health&v
which outputs:
index health
my_index yellow
Note that the column headers are shown only because of the verbose flag (v GET parameter in the previous request).
To get a complete list of the available columns, use the help parameter:
GET _cat/indices?help
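Either output can be turned into the yes/no answer the question asks for. This is a minimal Python sketch with hypothetical helper names: it flags an index as unreliable if it is missing from the cluster health response or has status red (at least one primary shard is not active), and it reads the h=index,health text output shown above. Note that shard health says nothing about indexing lag, the other failure mode mentioned in the question.

```python
from typing import Any


def unreliable_indices(health: dict[str, Any], wanted: list[str]) -> list[str]:
    """Return the indices from `wanted` that a search cannot fully cover right now.

    `health` is the parsed JSON of GET _cluster/health/<indices>?level=indices.
    A "yellow" index still has all of its primaries active, so its data is
    searchable; only missing or "red" indices are flagged.
    """
    per_index = health.get("indices", {})
    flagged = []
    for name in wanted:
        info = per_index.get(name)
        if info is None or info.get("status") == "red":
            flagged.append(name)
    return flagged


def parse_cat_health(cat_output: str) -> dict[str, str]:
    """Map index name to health, from the text of GET _cat/indices?h=index,health&v."""
    lines = [line for line in cat_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    health_by_index = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        health_by_index[row["index"]] = row.get("health", "")
    return health_by_index
```

An empty list from unreliable_indices() corresponds to "yes, a search against these indices covers all their primary shards right now".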

Verify database is indexed by Elasticsearch

I'm using Elasticsearch to index and search my db...
How can I verify that the database is indexed?
If I use the following command:
curl -XGET 'http://localhost:9200/_search?q=whatever'
the results are:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
How should these results be interpreted?
You can get the list of indices in your cluster with the following command. If your index appears in the list, it has been created.
curl -XGET 'localhost:9200/_cat/indices?v&pretty'
To check whether your index contains any documents, search it with this command (by default it returns the total hit count and the first 10 documents):
curl -XGET 'localhost:9200/INDEX_NAME/_search?pretty'
In the output you posted:
"_shards" : {
"total" : the number of shards the search was run against (6 here)
}
"hits" : {
"total" : the number of documents that matched your search for the keyword whatever (0 here)
}
To check whether your index exists, you can also run:
curl -XGET 'http://localhost:9200/_aliases?pretty=true'
which lists every index (with any aliases it has), so you can check whether yours is there.
The command you used searches for the keyword "whatever" across all indices, but it did not find anything. That is why you get this output: the search itself succeeded (shown by "took":4, "timed_out":false, "_shards":{"total":6,"successful":6,"failed":0}) but it found no matches (shown by "hits":{"total":0,"max_score":null,"hits":[]}).
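The two checks above (does the index exist, and does it hold documents) can be combined by reading the _cat/indices output. This is a minimal Python sketch with hypothetical helper names; it assumes the request GET _cat/indices?h=index,docs.count&v, where restricting the columns with h is a choice made here to keep parsing simple.

```python
from typing import Optional


def parse_doc_counts(cat_output: str) -> dict[str, Optional[int]]:
    """Map index name to docs.count, from GET _cat/indices?h=index,docs.count&v.

    An index that prints no doc count (for example a closed one) maps to None.
    """
    lines = [line for line in cat_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    counts: dict[str, Optional[int]] = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        raw = row.get("docs.count")
        counts[row["index"]] = int(raw) if raw is not None else None
    return counts


def index_status(cat_output: str, index_name: str) -> str:
    """Summarize whether `index_name` exists and whether it holds any documents."""
    counts = parse_doc_counts(cat_output)
    if index_name not in counts:
        return "missing"
    count = counts[index_name]
    if count is None:
        return "unknown doc count"
    return "empty" if count == 0 else f"{count} documents"
```

A "missing" result means the index was never created; "empty" means it exists but nothing has been indexed into it yet.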
