Elasticsearch (cURL REST API request) not giving expected results

I use this command, which should match all documents:
curl -XGET 'localhost:9200/users/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": { "match_all": {} }
}
'
I get this response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
But I'm 99.9% sure I have documents on that index. If I am right, why isn't it showing the matches? If I am wrong, how can I confirm this?

You should be able to determine what's happening if you know (a) where all your documents are being stored and (b) what the server thinks the 'users' index actually is.
For the first question, you can hit the _cat/indices endpoint to see how many documents you have in each index (the "docs.count" column):
$ curl -XGET 'http://localhost:9200/_cat/indices?v'
health status index        pri rep docs.count docs.deleted store.size pri.store.size
green  open   query          1   0          0            0       159b           159b
green  open   some_index     1   0         54            0     24.7kb         24.7kb
green  open   autocomplete   1   0          0            0       159b           159b
green  open   test_index     2   0      10065         4824      7.9mb          7.9mb
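If you would rather check the count for one index programmatically than read the table by eye, the text output can be parsed. This is only a minimal sketch: the function name `docs_count_by_index` is made up for illustration, the sample text is the output shown above, and it assumes the `?v` header row is present.

```python
def docs_count_by_index(cat_indices_text):
    """Map index name -> docs.count from `_cat/indices?v` text output."""
    lines = [ln for ln in cat_indices_text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    index_col = header.index("index")
    count_col = header.index("docs.count")
    counts = {}
    for line in lines[1:]:
        cells = line.split()
        # Rows that stop before the docs.count column are skipped.
        if len(cells) > count_col:
            counts[cells[index_col]] = int(cells[count_col])
    return counts


# Sample copied from the answer above; replace with your own server's output.
cat_indices_sample = """\
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open query 1 0 0 0 159b 159b
green open some_index 1 0 54 0 24.7kb 24.7kb
green open autocomplete 1 0 0 0 159b 159b
green open test_index 2 0 10065 4824 7.9mb 7.9mb
"""

counts = docs_count_by_index(cat_indices_sample)
# .get() returns None when the index is not listed at all.
print(counts.get("users"))
print(counts.get("test_index"))
```

A result of `None` for `users` would mean no index of that name is listed, which points towards an alias or a typo in the index name.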
For the second question, check the aliases defined on your server. It's possible that "users" has been defined as an alias to an index that doesn't have any documents, or it's possible that a filtered alias has been defined on that index with a filter that is excluding all of your documents (many aliases have date-related filters that will exclude all documents outside of a very specific date range). To check for the presence of aliases you can use
$ curl -XGET 'http://localhost:9200/_aliases?pretty=true'
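Once you have the `_aliases` response, something like the following could tell you whether "users" is an alias and whether it carries a filter. It is a sketch only: `indices_behind` is a hypothetical helper, and the sample response (including the `users_v1` index and the `active` field) is invented to show the shape of the data, not taken from your server.

```python
def indices_behind(name, aliases_response):
    """Return {index: alias_definition} for every index that has `name` as an alias."""
    matches = {}
    for index, body in aliases_response.items():
        alias_defs = body.get("aliases", {})
        if name in alias_defs:
            matches[index] = alias_defs[name]
    return matches


# Hypothetical response: "users" is a filtered alias on an index called "users_v1".
aliases_sample = {
    "users_v1": {"aliases": {"users": {"filter": {"term": {"active": True}}}}},
    "some_index": {"aliases": {}},
}

found = indices_behind("users", aliases_sample)
for index, definition in found.items():
    kind = "filtered" if "filter" in definition else "unfiltered"
    print(index, kind)
```

An empty result means "users" is not an alias, so it is either a real index or does not exist.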

Related

Interpreting the output of elasticsearch GET query

The output of the curl -X GET "localhost:9200/_cat/shards?v" is as follows:
index      shard prirep state      docs store  ip        node
test_index 1     p      STARTED    0    283b   127.0.0.1 Deepaks-MacBook-Pro-2.local
test_index 1     r      UNASSIGNED 0
test_index 1     r      UNASSIGNED 0
test_index 0     p      STARTED    1    12.5kb 127.0.0.1 Deepaks-MacBook-Pro-2.local
test_index 0     r      UNASSIGNED 0
test_index 0     r      UNASSIGNED 0
And the output of the query curl -X GET "localhost:9200/test_index/_search?size=1000" | json_pp is as follows:
{
"_shards" : {
"failed" : 0,
"skipped" : 0,
"successful" : 2,
"total" : 2
},
"hits" : {
"hits" : [
{
"_id" : "101",
"_index" : "test_index",
"_score" : 1,
"_source" : {
"in_stock" : -4,
"name" : "pizza maker",
"prize" : 10
},
"_type" : "_doc"
}
],
"max_score" : 1,
"total" : {
"relation" : "eq",
"value" : 1
}
},
"timed_out" : false,
"took" : 2
}
MY QUESTION: As you can see from the output of the first command, only primary shard 0 of test_index holds any data. Why, then, does the successful key inside _shards have the value 2?
Also, there is only 1 document, so why is the total key inside _shards 2?
test_index has two primary shards and four unassigned replica shards (probably because you have a single node). Since a primary shard is a partition of your index, a document can only be stored in a single primary shard, in your case primary shard 0.
total: 2 means that the search was run over the two primary shards of your index (i.e. 100% of the data) and successful: 2 means that all primary shards responded. So you know you can trust the response to have searched over all your test_index data.
There's nothing wrong here.
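To make the distinction concrete: `_shards` counts the shards that were searched, while `hits.total` counts the matching documents. A rough sketch of reading both from a response follows; the helper names are made up, the sample is a trimmed copy of the response in the question, and the bare-integer form of `hits.total` is the one older responses use.

```python
def hit_count(search_response):
    """Number of matching documents, for either shape of hits.total."""
    total = search_response["hits"]["total"]
    # Newer responses wrap the count in an object; older ones use a bare integer.
    return total["value"] if isinstance(total, dict) else total


def all_shards_responded(search_response):
    """True when every shard that was searched answered without failure."""
    shards = search_response["_shards"]
    return shards["failed"] == 0 and shards["successful"] == shards["total"]


# Trimmed copy of the response from the question (the hits array is omitted).
response = {
    "_shards": {"failed": 0, "skipped": 0, "successful": 2, "total": 2},
    "hits": {"total": {"relation": "eq", "value": 1}, "max_score": 1, "hits": []},
}

print(hit_count(response))             # 1 document matched
print(all_shards_responded(response))  # True: both primary shards answered
```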

Elasticsearch timeout doesn't work when searching

Elasticsearch version (bin/elasticsearch --version):5.2.2
JVM version (java -version): 1.8.0_121
OS version (uname -a if on a Unix-like system): opensuse
I run a search with curl -XGET 'localhost:9200/_search?pretty&timeout=1ms'
Part of the response is:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 208,
"successful" : 208,
"failed" : 0
},
"hits" : {
"total" : 104429,
"max_score" : 1.0,
"hits" :
...
The took time is 5 ms, but the timeout is set to 1 ms. Why is "timed_out" false rather than true?
Thanks
The timeout is per searched shard (looks like 208 in your case), while the took is for the entire query. On a per shard level you are within the limit. The documentation has some additional information on when you will hit timed_out and more caveats.
Try a more expensive query (leading wildcard, fuzziness, ...); I guess you should then hit the (per-shard) limit.
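As an illustration of that suggestion, a request body combining a short timeout with a leading-wildcard query might be built like this. Treat it as a sketch: the field name `message` and the pattern are placeholders, not something from your mapping, and whether the per-shard limit is actually hit still depends on your data.

```python
import json


def search_body(field, pattern, timeout="1ms"):
    """Build a search body with a timeout and a (deliberately slow) wildcard query."""
    return {
        "timeout": timeout,
        "query": {"wildcard": {field: {"value": pattern}}},
    }


def timed_out(search_response):
    """Read the timed_out flag from a search response."""
    return bool(search_response.get("timed_out", False))


# "message" and "*error*" are placeholders; use a text field from your own index.
body = search_body("message", "*error*")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with `-d`; if `timed_out` comes back true, the hits returned may be partial.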

Elasticsearch: deleting an index and recreating the same index results in incorrect data

I use elasticsearch-2.3.2
I created my index http://localhost:9200/github_inactivusr-2017.03.21
The command
curl http://localhost:9200/github_inactivusr-2017.03.21/_search
indicates I have a total of 7650 entries in my index
{
"took" : 40,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7650,
"max_score" : 1.0,
...
}
I delete this index
curl -X DELETE http://localhost:9200/github_inactivusr-2017.03.21
I do get the message
{"acknowledged":true}
When I execute
curl http://localhost:9200/_cat/indices
I get
red open stats-new_format_membership-2016.08.12 5 1
red open stats-json3-jira-users-2017.07.13 5 1
yellow open github_activusr-2017.03.21 5 1 80495 0 16.6mb 16.6mb
yellow open github_activusr-2017.07.24 5 1 34697 0 9.3mb 9.3mb
The index github_inactivusr-2017.03.21 is no longer listed
I then recreate the index "github_inactivusr-2017.03.21" (exactly the same name and same mapping) again with 2550 entries
However, when I use the command curl http://localhost:9200/github_inactivusr-2017.03.21/_search,
I still get a total of 7650 entries.
After recreating the index, if I execute the command :
curl http://localhost:9200/_cat/indices
I get
red open stats-new_format_membership-2016.08.12 5 1
red open stats-json3-jira-users-2017.07.13 5 1
yellow open github_activusr-2017.03.21 5 1 80495 0 16.6mb 16.6mb
yellow open github_activusr-2017.07.24 5 1 34697 0 9.3mb 9.3mb
yellow open github_inactivusr-2017.03.21 5 1 7650 0 1.6mb 1.6mb
It is as if the index was not properly removed. Even if I stop and restart elasticsearch before recreating the index, I get this behaviour.
It is as if there were a cache or something similar that does not get rid of the data.
Does anyone have any idea?
I found the root cause.
The problem was related to the logstash configuration file that I used.
It was reinjecting the same data three times, which explains the 7650 (2550 * 3).
Sorry for taking up your time, and thanks @Val
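For anyone hitting a similar symptom, a quick sanity check is whether the indexed count is an exact multiple of the number of documents you expected. A tiny sketch, with a made-up helper name:

```python
def duplication_factor(indexed, expected):
    """Return how many times the expected set was indexed, or None if not an exact multiple."""
    if expected <= 0 or indexed % expected != 0:
        return None
    return indexed // expected


print(duplication_factor(7650, 2550))  # 3
```

An exact multiple like this suggests the ingestion pipeline sent the same data several times, rather than the old index surviving the delete.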

How to assign shards which are currently unassigned?

Problem: I've started five Elasticsearch nodes, but only 66.84% of the data is available in Kibana. When I check the cluster health with localhost:9200/_cluster/health?pretty=true, I get the following information:
{
"cluster_name" : "A2A",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 4,
"active_primary_shards" : 612,
"active_shards" : 613,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 304,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 66.8484187568157
}
Also, all my indices are red, except for the .kibana index.
A small part of the list:
red open logstash-2015.11.08 5 0 47256 668 50.5mb 50.5mb
red open logstash-2015.11.09 5 0 46540 1205 50.4mb 50.4mb
red open logstash-2015.11.06 5 0 65645 579 69.2mb 69.2mb
red open logstash-2015.11.07 5 0 62733 674 66.4mb 66.4mb
green open .kibana 1 1 2 0 19.7kb 9.8kb
red open logstash-2015.11.11 5 0 49254 1272 53mb 53mb
red open logstash-2015.11.12 5 0 50885 466 53.6mb 53.6mb
red open logstash-2015.11.10 5 0 49174 1288 52.6mb 52.6mb
red open logstash-2016.04.12 5 0 92508 585 104.8mb 104.8mb
red open logstash-2016.04.13 5 0 95120 279 107.2mb 107.2mb
I've tried to fix the problem with curl -XPUT 'localhost:9200/_settings' -d ' {"index.routing.allocation.disable_allocation": false}' but it doesn't work!
Does anyone have an idea how to assign my shards?
If you need any other information, please ask and I will try to provide it.
Have you seen this answer? https://stackoverflow.com/a/23816954/1834331
You could also try restarting elasticsearch first: service elasticsearch restart.
Otherwise, just try reallocating the shards manually (as your indices have 5 shards, run the command once per shard number, 0 through 4):
curl -XPOST 'http://localhost:9200/_cluster/reroute?pretty' -d '{ "commands" : [ {
"allocate" : {
"index" : "logstash-2015.11.08",
"shard" : 0,
"node" : "SOME_NODE_HERE",
"allow_primary" : true
}
} ] }'
You can check the nodes with unassigned shards using: curl -s localhost:9200/_cat/shards | grep UNASS
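If you want the unassigned shards as data rather than grep output (for example, to feed them into a reroute call), the same text can be parsed. This is a sketch under assumptions: `unassigned_shards` is a made-up name, the sample rows are invented for illustration, and it relies on the default column order of `_cat/shards` (index, shard, prirep, state, ...).

```python
def unassigned_shards(cat_shards_text):
    """Return (index, shard, prirep) for each UNASSIGNED row of `_cat/shards` output."""
    rows = []
    for line in cat_shards_text.strip().splitlines():
        cells = line.split()
        # Default columns begin with: index shard prirep state
        if len(cells) >= 4 and cells[3] == "UNASSIGNED":
            rows.append((cells[0], int(cells[1]), cells[2]))
    return rows


# Invented sample rows; the node name "node-a" is a placeholder.
cat_shards_sample = """\
logstash-2015.11.08 0 p STARTED 9000 10mb 127.0.0.1 node-a
logstash-2015.11.08 1 p UNASSIGNED
logstash-2015.11.08 2 p UNASSIGNED
"""

for index, shard, prirep in unassigned_shards(cat_shards_sample):
    print(index, shard, prirep)
```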
If shards are stuck unassigned, they can be allocated manually. For example:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands": [{
"allocate": {
"index": "logstash-2015.11.07",
"shard": 4,
"node": "Frederick Slade",
"allow_primary": true
}
}]
}'
See the Cluster Reroute documentation, including warnings on the use of allow_primary.
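Since an index with 5 shards has shard numbers 0 through 4, the reroute body for a whole index could be generated instead of typed five times. The sketch below only builds the JSON; `reroute_body` is a hypothetical helper, the node name is a placeholder, and it uses the `allocate` command exactly as it appears in the answers above. Later Elasticsearch releases replaced `allocate` with more specific commands, so check the Cluster Reroute documentation for your version before sending anything.

```python
import json


def reroute_body(index, number_of_shards, node, allow_primary=True):
    """Build a _cluster/reroute body with one allocate command per shard number."""
    commands = [
        {
            "allocate": {
                "index": index,
                "shard": shard,
                "node": node,
                # Forcing a primary can lose data; see the documentation warning.
                "allow_primary": allow_primary,
            }
        }
        for shard in range(number_of_shards)  # shard numbers are zero-based
    ]
    return {"commands": commands}


# "SOME_NODE_HERE" is a placeholder for one of your own node names.
reroute = reroute_body("logstash-2015.11.07", 5, "SOME_NODE_HERE")
print(json.dumps(reroute, indent=2))
```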

Elasticsearch: Inconsistent number of shards in stats & cluster APIs

I uploaded the data to my single node cluster and named the index as 'gequest'.
When I GET from http://localhost:9200/_cluster/stats?human&pretty, I get:
"cluster_name" : "elasticsearch",
"status" : "yellow",
"indices" : {
"count" : 1,
"shards" : {
"total" : 5,
"primaries" : 5,
"replication" : 0.0,
"index" : {
"shards" : {
"min" : 5,
"max" : 5,
"avg" : 5.0
},
"primaries" : {
"min" : 5,
"max" : 5,
"avg" : 5.0
},
"replication" : {
"min" : 0.0,
"max" : 0.0,
"avg" : 0.0
}
}
}
When I do GET on http://localhost:9200/_stats?pretty=true
"_shards" : {
"total" : 10,
"successful" : 5,
"failed" : 0
}
Why is the total number of shards inconsistent between the two reports? Why does the stats API report 10 total shards, and how can I track down the other 5?
From the results it is likely that you have a single Elasticsearch node running and created an index with the default settings (which create 5 primary shards and one replica of each). Since there is only one node running, Elasticsearch is unable to assign the replica shards anywhere (it will never assign the primary and a replica of the same shard to the same node).
The _cluster/stats API gives information about the cluster, including its current state. Your result shows the cluster status as "yellow", indicating that all the primary shards are allocated but not all replicas have been allocated or initialized. So it counts only the 5 allocated shards.
The _stats API gives information about the indices in the cluster, including how many shards each index should have in total, replicas included. Since your index needs a total of 10 shards (5 primaries and 5 replicas, as specified when the index was created), the stats report total 10 and successful 5. The 5 unassigned replicas are not counted as successful but, as the failed value of 0 in your output shows, they are not reported as failures either.
Use http://localhost:9200/_cat/shards to see the overall shard status.
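The arithmetic behind the two numbers can be sketched as follows. This is a simplification with a made-up function name: it only models the rule that copies of the same shard never share a node, and ignores every other allocation rule (disk thresholds, allocation filtering, and so on).

```python
def shard_accounting(primaries, replicas_per_primary, data_nodes):
    """Estimate total, assignable and unassigned shard copies for one index."""
    copies_per_shard = 1 + replicas_per_primary
    total = primaries * copies_per_shard
    # Copies of the same shard never share a node, so each shard can place
    # at most one copy per data node.
    assignable = primaries * min(copies_per_shard, data_nodes)
    return {"total": total, "assignable": assignable, "unassigned": total - assignable}


# 5 primaries, 1 replica each, on a single node: the case in this question.
print(shard_accounting(5, 1, 1))
```

With these inputs the total is 10 and only 5 copies can be placed, which matches the 10 reported by _stats and the 5 reported by _cluster/stats.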
