How to run a benchmark in Elasticsearch? - elasticsearch

I want to run a benchmark test, and I followed the Elasticsearch documentation.
/bin/elasticsearch --node.bench true
curl -XPUT 'localhost:9200/_bench/?pretty=true' -d '{
"name": "my_benchmark",
"competitors": [ {
"name": "my_competitor",
"requests": [ {
"query": {
"match": { "_all": "a*" }
}
} ]
} ]
}'
But I get this exception:
{ "error" : "InvalidIndexNameException[[_bench] Invalid index name
[_bench], must not start with '_']", "status" : 400 }
What am I doing wrong?

I checked a freshly downloaded Elasticsearch 1.4.2 and got the same result as you.
Then I compiled Elasticsearch from the master branch on GitHub and it worked. For your curl I got the following reply:
{
"status" : "COMPLETE",
"errors" : [ ],
"competitors" : {
"my_competitor" : {
"summary" : {
"nodes" : [
"Cable" ],
"total_iterations" : 5,
"completed_iterations" : 5,
"total_queries" : 5000,
"concurrency" : 5,
"multiplier" : 1000,
"avg_warmup_time" : 0.0,
"statistics" : {
"min" : 1,
"max" : 1,
"mean" : 1.0,
"qps" : 1000.0,
"std_dev" : 0.0,
"millis_per_hit" : 0.0,
"percentile_10" : 1.0,
"percentile_25" : 1.0,
"percentile_50" : 1.0,
"percentile_75" : 1.0,
"percentile_90" : 1.0,
"percentile_99" : 1.0
}
}
}
}
}
And for a simpler curl I got:
curl -XGET 'localhost:9200/_bench?pretty'
{ }
So it seems that this functionality is so experimental that it's not yet included in any stable release.
I guess the [master] reference in the documentation header was not a coincidence ;)
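For completeness, here is roughly how building master and starting a bench-enabled node looked for me (a sketch only; the repository URL and the exact Maven output path may differ between snapshots, so treat both as assumptions):
# build Elasticsearch from the master branch (requires a JDK and Maven)
git clone https://github.com/elasticsearch/elasticsearch.git
cd elasticsearch
mvn clean package -DskipTests
# unpack the snapshot distribution and start a node with benchmarking enabled
tar -xzf target/releases/elasticsearch-*-SNAPSHOT.tar.gz -C /tmp
/tmp/elasticsearch-*-SNAPSHOT/bin/elasticsearch --node.bench true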

Related

No exact match for RANGE query for a specific time

Question
Why does the Elasticsearch range query not exactly match the time "2017-11-30T13:23:23.063657+11:00"? Kindly suggest whether there is a mistake in the query or whether this behaviour is expected.
Query
curl -XGET 'https://hostname/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"range" : {
"time" : {
"gte": "2017-11-30T13:23:23.063657+11:00",
"lte": "2017-11-30T13:23:23.063657+11:00"
}
}
}
}
'
The only document expected to match is shown below.
{
"_index": "***",
"_source": {
"time": "2017-11-30T13:23:23.063657+11:00",
"log_level": "INFO",
"log_time": "2017-11-30 13:23:23,042"
},
"fields": {
"time": [
1512008603063
]
}
}
Result
However, it matched multiple records whose times are close to the requested one.
"hits" : {
"total" : 11,
"max_score" : 1.0,
"hits" : [ {
"_index" : "***",
"_score" : 1.0,
"_source" : {
"time" : "2017-11-30T13:23:23.063612+11:00",
"log_level" : "INFO",
"log_time" : "2017-11-30 13:23:23,016"
}
}, {
"_index" : "core-apis-non-prod.97d5f1ee-a570-11e6-b038-02dc30517283.2017.11.30",
"_score" : 1.0,
"_source" : {
"time" : "2017-11-30T13:23:23.063722+11:00",
"log_level" : "INFO",
"log_time" : "2017-11-30 13:23:23,046"
}
}
...
Elasticsearch uses Joda-Time for parsing dates, and your problem is that Joda-Time only stores date/time values down to the millisecond.
From the docs:
The library internally uses a millisecond instant which is identical
to the JDK and similar to other common time representations. This
makes interoperability easy, and Joda-Time comes with out-of-the-box
JDK interoperability.
This means that the last 3 digits of the fractional seconds (the microseconds) are not taken into account when parsing the date. The values
2017-11-30T13:23:23.063612+11:00
2017-11-30T13:23:23.063657+11:00
2017-11-30T13:23:23.063722+11:00
are all interpreted as:
2017-11-30T13:23:23.063+11:00
And the corresponding epoch time is 1512008603063 for all these values.
You can see this too by adding explain to the query like this:
{
"query": {
"range" : {
"time" : {
"gte": "2017-11-30T13:23:23.063657+11:00",
"lte": "2017-11-30T13:23:23.063657+11:00"
}
}
},
"explain": true
}
That is basically the reason all those documents match your query.
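If you really need an exact match down to the microsecond, one workaround is to also compare against the original string value. This is only a sketch: it assumes your mapping also indexes the raw string, e.g. as a time.keyword sub-field, which may not be the case for a plain date field:
curl -XGET 'https://hostname/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "time": { "gte": "2017-11-30T13:23:23.063657+11:00", "lte": "2017-11-30T13:23:23.063657+11:00" } } },
        { "term": { "time.keyword": "2017-11-30T13:23:23.063657+11:00" } }
      ]
    }
  }
}
'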

Elasticsearch range with aggs

I want the average rating of every user document, but it is not working as I expect. Please check the code given below.
curl -XGET 'localhost:9200/mentorz/users/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "aggs" : {
    "avg_rating" : {
      "range" : {
        "field" : "rating",
        "ranges" : [ { "from" : 3, "to" : 19 } ]
      }
    }
  }
}';
{
  "_index" : "mentorz",
  "_type" : "users",
  "_id" : "555",
  "_source" : {
    "name" : "neeru",
    "user_id" : 555,
    "email_id" : "abc#gmail.com",
    "followers" : 0,
    "following" : 0,
    "mentors" : 0,
    "mentees" : 0,
    "basic_info" : "api test info",
    "birth_date" : 1448451985397,
    "charge_price" : 0,
    "org" : "cz",
    "located_in" : "noida",
    "position" : "sw developer",
    "exp" : 7,
    "video_bio_lres" : "test bio lres url normal signup",
    "video_bio_hres" : "test bio hres url normal signup",
    "rating" : [ 5, 4 ],
    "expertises" : [ 1, 4, 61, 62, 63 ]
  }
}
This is my user document. I want to filter only those users whose average rating falls in the range from 3 to 5.
Updated Answer
I've made a query using a script; I hope the query below works for you.
GET mentorz/users/_search
{
"size": 0,
"aggs": {
"term": {
"terms": {
"field": "user.keyword",
"size": 100
},
"aggs": {
"NAME": {
"terms": {
"field": "rating",
"size": 10,
"script": {
"inline": "float var=0;float count=0;for(int i = 0; i < params['_source']['rating'].size();i++){var=var+params['_source']['rating'][i];count++;} float avg = var/count; if(avg>=4 && avg<=5) {avg}else{null}"
}
}
}
}
}
}
}
You can change the desired rating range by changing the if condition "if(avg>=4 && avg<=5)".
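As an alternative that avoids reading _source in a script, you can compute an avg sub-aggregation per user and keep only the buckets whose average falls in the desired range with a bucket_selector pipeline aggregation. This is a sketch rather than a tested query: it assumes Elasticsearch 2.0+ and that the user name is indexed as name.keyword:
GET mentorz/users/_search
{
  "size": 0,
  "aggs": {
    "per_user": {
      "terms": {
        "field": "name.keyword",
        "size": 100
      },
      "aggs": {
        "avg_rating": {
          "avg": { "field": "rating" }
        },
        "rating_in_range": {
          "bucket_selector": {
            "buckets_path": { "avgRating": "avg_rating" },
            "script": "params.avgRating >= 3 && params.avgRating <= 5"
          }
        }
      }
    }
  }
}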

Elasticsearch: remove/update field inside nested object

{
"_index" : "test",
"_type" : "test",
"_id" : "1212",
"_version" : 5,
"found" : true,
"_source" : {
"count" : 42,
"list_data" : [ {
"list_id" : 11,
"timestamp" : 1469125397
}, {
"list_id" : 122,
"timestamp" : 1469050965
} ]
}
}
This is my document schema; list_data is a nested object. I have a requirement to update/delete a particular field inside list_data. I am able to update the count field using a Groovy script.
$ curl -XPOST 'localhost:9200/index/type/1212/_update?pretty' -d '
{
"script" : "ctx._source.count = 41"
}'
But I don't know how to update the nested object.
For example, I want to add this to list_data:
{
"list_id" : 121,
"timestamp" : 1469050965
}
and my document should change to:
{
"_index" : "test",
"_type" : "test",
"_id" : "1212",
"_version" : 6,
"found" : true,
"_source" : {
"count" : 41,
"list_data" : [ {
"list_id" : 11,
"timestamp" : 1469125397
}, {
"list_id" : 121,
"timestamp" : 1469050965
}, {
"list_id" : 122,
"timestamp" : 1469050965
} ]
}
}
and if I perform a delete based on list_id = 122, my record should look like:
{
"_index" : "test",
"_type" : "test",
"_id" : "1212",
"_version" : 7,
"found" : true,
"_source" : {
"count" : 41,
"list_data" : [ {
"list_id" : 11,
"timestamp" : 1469125397
}, {
"list_id" : 121,
"timestamp" : 1469050965
}]
}
}
To add a new element to your nested field you can proceed like this:
$ curl -XPOST 'localhost:9200/index/type/1212/_update?pretty' -d '
{
"script" : "ctx._source.list_data += newElement",
"params": {
"newElement": {
"list_id" : 121,
"timestamp" : 1469050965
}
}
}'
To remove an existing element from your nested field list, you can proceed like this:
$ curl -XPOST 'localhost:9200/index/type/1212/_update?pretty' -d '
{
"script" : "ctx._source.list_data.removeAll{it.list_id == remove_id}",
"params": {
"remove_id" : 122
}
}'
I was getting the error [UpdateRequest] unknown field [params] because I was using the latest version of Elasticsearch, 7.9.0 (7.9.0 was the latest when this answer was written); it seems the syntax has changed a bit.
The following should work for newer versions of Elasticsearch:
$ curl -XPOST 'localhost:9200/<index-name>/_update/1212' -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.list_data.removeIf(list_item -> list_item.list_id == params.remove_id);",
"params": {
"remove_id": 122
}
}
}'
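On these newer versions, adding an element follows the same nested script/params pattern (a sketch, assuming Painless and the document from the question; newElement is just a parameter name chosen for the example):
$ curl -XPOST 'localhost:9200/<index-name>/_update/1212' -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "ctx._source.list_data.add(params.newElement)",
    "params": {
      "newElement": {
        "list_id": 121,
        "timestamp": 1469050965
      }
    }
  }
}'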
I don't know why, but I find that
ctx._source.list_data.removeAll{it.list_id == remove_id}
doesn't work. Instead, I use removeIf like this:
ctx._source.list_data.removeIf{list_item -> list_item.list_id == remove_id}
where list_item can be any arbitrary name.
What worked for me were the instructions in the following link; perhaps it is a matter of the ES version.

has_child queries fail if queryName is set

I am using Elasticsearch v0.90.5 and trying to set a query name (_name) for a has_child query as described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-request-named-queries-and-filters.html
But the has_child queries fail if the query name is set. They also fail if it is set on a wrapping query. You can find the curl recreation here: https://gist.github.com/hmrizin/9645816
"query": {
"has_child": {
"query": {
"term": {
"postid": "p1"
}
},
"child_type": "post",
"_name": "somename"
}
}
and
"query": {
"bool": {
"should": {
"has_child": {
"query": {
"term": {
"postid": "p2"
}
},
"child_type": "post"
}
},
"_name": "somename"
}
}
The error that both cause is the same:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 4,
"failed" : 1,
"failures" : [ {
"index" : "twtest",
"shard" : 1,
"status" : 500,
"reason" : "ElasticSearchIllegalStateException[has_child filter hasn't executed properly]"
} ]
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ ]
}
}
What am I missing?
Having just run this with v1.0.1, it worked as-is without any changes.
However, I downloaded v0.90.5 and got the error that you posted unless I removed the _name. It looks like you came across a bug; in the very next release, v0.90.6, the class to blame (HasChildFilter) was refactored into other code.
v0.90.5 package: org.elasticsearch.index.search.child
v0.90.6 package: org.elasticsearch.index.search.child
The bug also existed in v0.90.4 (I verified), which is when the feature was added. Unfortunately, I think the solution to your problem is to upgrade your Elasticsearch installation to at least v0.90.6. Based on its release notes, they may not even have been aware of the error they fixed, but I was able to verify that it is fixed in that release.

ElasticSearch how to use boost

This query works great, but it's returning too many results. I would like to add the boost function but I don't know the proper syntax.
$data_string = '{
"from" : 0, "size" : 100,
"sort" : [
{ "date" : {"order" : "desc"} }
],
"query": {
"more_like_this_field" : {
"thread.title" : {
"like_text" : "this is a test",
"min_word_len" : 4,
"min_term_freq" : 1,
"min_doc_freq" : 1
}
}
}
}';
Found the solution. Looks like using fuzzy_like_this_field and min_similarity is the way to go.
$data_string = '{
"from" : 0, "size" : 100,
"query": {
"fuzzy_like_this_field" : {
"thread.title" : {
"like_text" : "this is a test",
"min_similarity": 0.9
}
}
}
}';
According to the docs, you just need to add it to the other parameters:
...
"thread.title" : {
"like_text" : "this is a test",
"min_word_len" : 4,
"min_term_freq" : 1,
"min_doc_freq" : 1,
"boost": 1.0
}
...
Also, if you are getting too many documents back, you can try increasing min_term_freq and min_doc_freq, too.
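Putting it together, the original more_like_this_field query with a boost added would look roughly like this (the boost value of 2.0 is arbitrary; tune it to your needs):
$data_string = '{
  "from" : 0, "size" : 100,
  "sort" : [
    { "date" : {"order" : "desc"} }
  ],
  "query": {
    "more_like_this_field" : {
      "thread.title" : {
        "like_text" : "this is a test",
        "min_word_len" : 4,
        "min_term_freq" : 1,
        "min_doc_freq" : 1,
        "boost": 2.0
      }
    }
  }
}';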
