How to find execution time of elasticsearch query - elasticsearch

I am using elasticsearch to get the fast result from a rails app. I want to know how much time a particular query took to get executed? Is there any tool where I can find and compare execution time so that I can optimize query?

The 'took' attribute in the response object is the execution time in milliseconds. For example:
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}

Related

Elasticsearch deleted document reappears using logstash

I am running ES on single node cluster for development.
I am deleting a document using delete api from kibana. It is deleted for a second and immediately reappears. Any help would be appreciated
Here is api command I use:
DELETE test/_doc/12345
{
"_index" : "test",
"_type" : "_doc",
"_id" : "12345",
"_version" : 231,
"result" : "deleted",
"_shards" : {
"total" : 3,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 899,
"_primary_term" : 1
}
GET test/_count
{
"count" : 3,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
Immediately deleted doc is re-indexed
GET test/_count
{
"count" : 4,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
According to the documentation:
...If clean_run is set to true, this value will be ignored and
sql_last_value will be set to Jan 1, 1970
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_state
That may explain why all your data are added each 10 minutes. Remove the clean_run and test again or check if the _version filed is updated.
I found that it was an data issue. my logstash jdbc statement checks for modificationdate greater than sql_last_value. and scheduler is set to run every 10 seconds. The reappeared documents have modificationdate in the future, changing it to current date solved the problem

Elasticsearch max of field combined with a unique field

I have an index with two fields:
name: uuid
version: long
I now only want to count the documents (on a very large index [1 million+ entries]) where the version of the name is the highest. For e.g. a query on an index with the following documents:
{name="a", version=1}
{name="a", version=2}
{name="a", version=3}
{name="b", version=1}
... would return:
count=2
Is this somehow possible? I can not find a solution for this particular problem.
You are effectively describing a count of distinct names, which you can do with a cardinality aggregation.
Request:
GET test1/_search
{
"aggs" : {
"distinct_count" : {
"cardinality" : {
"field" : "name.keyword"
}
}
},
"size": 0
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"distinct_count" : {
"value" : 2
}
}
}

Count the number of duplicates in elasticsearch

I have an application inserting a numbered sequence of logs into elasticsearch.
Under certain conditions, after stopping my application, I find that in elasticsearch there are more logs than I have actually generated.
This simple aggregation helped me find out that a few duplicates are present:
curl /logstash-*/_search?pretty -d '{
size: 0,
aggs: {
msgnum_terms: {
terms: {
field: "msgnum.raw",
min_doc_count: 2,
size: 0
}
}
}
}'
msgnum is the field containing the numeric sequence. Normally it should be unique and the resulting doc_counts never exceed 1. Instead I get something like:
{
"took" : 33,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 100683,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"msgnum_terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "4097",
"doc_count" : 2
}, {
"key" : "4099",
"doc_count" : 2
...
...
...
}, {
"key" : "5704",
"doc_count" : 2
} ]
}
}
}
How can I count the exact number of duplicates in order to make sure that they are the only cause of mismatch between number of generated log lines and number of hits in elasticsearch?

Elastic Search Index Status

I am trying to setup a scripted reindex operation as suggested in: http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
To go with the suggestion of creating a new index, aliasing then deleting the old index I would need to have a way to tell when the indexing operation on the new index was complete. Ideally via the REST interface.
It has 80 million rows to index and can take a few hours.
I can't find anything helpful in the docs..
You can try with _stats : http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-stats.html
Eg :
{
"_shards" : {
"total" : 10,
"successful" : 5,
"failed" : 0
},
"_all" : {
"primaries" : {
"docs" : {
"count" : 0,
"deleted" : 0
},
"store" : {
"size_in_bytes" : 575,
"throttle_time_in_millis" : 0
},
"indexing" : {
"index_total" : 0,
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
I think, you can compare _all.total.docs.count and _all.total.indexing.index_current

ElasticSearch count returned result

I want to count number of document returned as a result of a query with size limit. For example, I run following query:
curl -XGET http://localhost:9200/logs_-*/a_logs/_search?pretty=true -d '
{
"query" : {
"match_all" : { }
},
"size" : 5,
"from" : 8318
}'
and I get:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 159,
"successful" : 159,
"failed" : 0
},
"hits" : {
"total" : 8319,
"max_score" : 1.0,
"hits" : [ {
....
Total documents matching my query are 8319, but I fetched at max 5. Only 1 document was returned since I queried "from" 8318.
In the response, I do not know how many documents are returned. I want to write a query such that the number of documents being returned are also present in some field. Maybe some facet may help, but I could not figure out. Kindly help.
Your query :
{
"query" : {
"match_all" : { }
},
=> Means that you ask all your data
"size" : 5,
=> You want to display only 5 results
"from" : 8318
=> You start from the 8318 records
ElasticSearch respons :
....
"hits" : {
"total" : 8319,
...
=> Elastic search told you that there is 8319 results in his index.
You ask him all the result and you start from the 8318.
8319 - 8318 = 1 So you have 1 result.
Try by removing the from.
Looking through the documentation, it's not clear how to make the query return this -- if indeed the API supports it. If you just want to have the count of the returned hits, the easiest way seems to be to actually count them yourself after parsing the response.

Resources