ElasticSearch: how to use boost

This query works great, but it's returning too many results. I would like to add a boost, but I don't know the proper syntax.
$data_string = '{
"from" : 0, "size" : 100,
"sort" : [
{ "date" : {"order" : "desc"} }
],
"query": {
"more_like_this_field" : {
"thread.title" : {
"like_text" : "this is a test",
"min_word_len" : 4,
"min_term_freq" : 1,
"min_doc_freq" : 1
}
}
}
}';

Found the solution. Looks like using fuzzy_like_this_field and min_similarity is the way to go.
$data_string = '{
"from" : 0, "size" : 100,
"query": {
"fuzzy_like_this_field" : {
"thread.title" : {
"like_text" : "this is a test",
"min_similarity": 0.9
}
}
}
}';

According to the docs, you just need to add it to the other parameters:
...
"thread.title" : {
"like_text" : "this is a test",
"min_word_len" : 4,
"min_term_freq" : 1,
"min_doc_freq" : 1,
"boost": 1.0
}
...
Also, if you get too many docs, you can try increasing min_term_freq and min_doc_freq.
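A boost only scales the relevance score of the clause. It does not change which documents match, and because this query sorts by date the score does not affect ordering either. That means the number of hits is governed by the frequency thresholds. Below is a minimal Python sketch, assuming you build the request body in code instead of writing the JSON string by hand (the PHP equivalent would be json_encode). The helper name `build_mlt_body` is hypothetical:

```python
import json


def build_mlt_body(like_text, min_word_len=4, min_term_freq=1,
                   min_doc_freq=1, boost=1.0, size=100):
    """Build a more_like_this_field request body (hypothetical helper).

    Higher min_term_freq / min_doc_freq values drop more terms from the
    generated query (the answer's suggestion for fewer hits); boost only
    scales the score of documents that already match.
    """
    return {
        "from": 0,
        "size": size,
        "sort": [{"date": {"order": "desc"}}],
        "query": {
            "more_like_this_field": {
                "thread.title": {
                    "like_text": like_text,
                    "min_word_len": min_word_len,
                    "min_term_freq": min_term_freq,
                    "min_doc_freq": min_doc_freq,
                    "boost": boost,
                }
            }
        },
    }


body = build_mlt_body("this is a test", min_doc_freq=5, boost=2.0)
data_string = json.dumps(body)  # plays the role of the PHP $data_string
print(data_string)
```

Generating the body this way keeps the JSON valid when you add or remove parameters such as boost.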

Related

Elasticsearch range with aggs

I want the average rating of every user document, but my query doesn't work as expected. Please check the code below.
curl -XGET 'localhost:9200/mentorz/users/_search?pretty' -H 'Content-Type: application/json' -d'
{"aggs" : {"avg_rating" : {"range" : {"field" : "rating","ranges" : [{ "from" : 3, "to" : 19 }]}}}}';
{ "_index" : "mentorz", "_type" : "users", "_id" : "555", "_source" : { "name" : "neeru", "user_id" : 555,"email_id" : "abc#gmail.com","followers" : 0,
"following" : 0, "mentors" : 0, "mentees" : 0, "basic_info" : "api test info",
"birth_date" : 1448451985397,"charge_price" : 0,"org" : "cz","located_in" : "noida", "position" : "sw developer", "exp" : 7, "video_bio_lres" : "test bio lres url normal signup","video_bio_hres" : "test bio hres url normal signup", "rating" : [ 5 ,4], "expertises" : [ 1, 4, 61, 62, 63 ] }
This is my user document. I want to filter only those users whose average rating is in the range 3 to 5.
Update Answer
I've written a query that uses a script; I hope it works for you.
GET mentorz/users/_search
{
"size": 0,
"aggs": {
"term": {
"terms": {
"field": "user.keyword",
"size": 100
},
"aggs": {
"NAME": {
"terms": {
"field": "rating",
"size": 10,
"script": {
"inline": "float var=0;float count=0;for(int i = 0; i < params['_source']['rating'].size();i++){var=var+params['_source']['rating'][i];count++;} float avg = var/count; if(avg>=4 && avg<=5) {avg}else{null}"
}
}
}
}
}
}
}
You can adjust the accepted rating range by changing the if condition "if(avg>=4 && avg<=5)". For the 3 to 5 range from the question, use "if(avg>=3 && avg<=5)".
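The script averages each document's `rating` array and keeps the value only when the average falls inside the range. Here is the same logic as a plain Python sketch, which can help you check which users you expect to match before debugging the script itself. The helper names are hypothetical, the bounds come from the question, and the second user is a made-up counterexample:

```python
def average(values):
    """Mean of a list of numbers; None for an empty list."""
    return sum(values) / len(values) if values else None


def in_rating_range(doc, low=3.0, high=5.0):
    """True when the average of doc['rating'] lies in [low, high]."""
    avg = average(doc.get("rating", []))
    return avg is not None and low <= avg <= high


users = [
    {"user_id": 555, "name": "neeru", "rating": [5, 4]},         # sample from the question
    {"user_id": 556, "name": "hypothetical", "rating": [1, 2]},  # made-up counterexample
]
matching = [u["user_id"] for u in users if in_rating_range(u)]
print(matching)
```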

ElasticSearch aggregation over inner object

In my ES I've a schema type like this:
{
"index_v1":{
"mappings":{
"fuas":{
"properties":{
"comment":{
"type":"string"
},
"matter":{
"type":"string"
},
"metainfos":{
"properties":{
"department":{
"type":"string"
},
"processos":{
"type":"string"
}
}
}
}
}
}
}
}
In short, the fuas type has two properties, comment and matter, and an inner (not nested) object, metainfos, with several properties such as department and processos.
I'd like to know which metainfos fields are populated and how many times each one occurs.
Imagine a document doc1 with metainfos: {department: "d1"} and a doc2 with metainfos: {department: "d2", processos: "p1"}.
Then I'd like to get: {department: 2, processos: 1}.
EDIT
Since metainfos is an inner object and ES is schemaless, each document's metainfos object may or may not have any given field populated.
For example, doc1's metainfos is {field1: 1, field3: 3}, doc2's metainfos is {field2: 1, field4: 5} and doc3's metainfos is {field1: 2, field4: 2, field5: 1}.
I'd like to get: {field1: 2, field2: 1, field3: 1, field4: 2, field5: 1}. I think the main problem is how to ask for fields I don't know exist.
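The desired output can be expressed precisely as a per-field count over every document's metainfos object. Below is a small Python sketch of that computation. It runs client-side over `_source` objects and is not an Elasticsearch feature. It accepts metainfos either as a single object or as an array of objects, since the test documents use the array form:

```python
from collections import Counter


def count_metainfo_fields(sources):
    """Count how often each metainfos field is populated across documents."""
    counts = Counter()
    for source in sources:
        metainfos = source.get("metainfos") or []
        if isinstance(metainfos, dict):  # single inner object
            metainfos = [metainfos]
        for obj in metainfos:
            # Count only fields that actually carry a value
            counts.update(key for key, value in obj.items() if value is not None)
    return dict(counts)


docs = [
    {"metainfos": {"field1": 1, "field3": 3}},
    {"metainfos": {"field2": 1, "field4": 5}},
    {"metainfos": {"field1": 2, "field4": 2, "field5": 1}},
]
result = count_metainfo_fields(docs)
print(result)
```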
I've tested with two documents:
{
"hits":{
"total":2,
"max_score":1.0,
"hits":[
{
"_source":{
"matter":"FUA2",
"comment":null,
"metainfos":[
{
"department":"d1"
}
]
}
},
{
"_source":{
"matter":"FUA1",
"comment":"vcvcvc",
"metainfos":[
{
"department":"d1"
},
{
"processos":"p1"
}
]
}
}
]
}
}
I've tested this with this command:
curl -XGET 'http://localhost:9201/living_team/fuas/_search?pretty' -d '
{
"size": 0,
"aggregations" : {
"followUpActivity.metainfo.department" : {
"terms" : {
"field" : "metainfos.*"
}
}
}
}
'
The results were:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"followUpActivity.metainfo.department" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
}
You can use the value_count aggregation for this:
{
"size": 0,
"aggs" : {
"dept" : {
"value_count" : { "field" : "metainfos.department" }
},
"proc" : {
"value_count" : { "field" : "metainfos.processos" }
}
}
}
You need to use nested fields, as otherwise your inner fields are not seen "together" in the metainfos object.
See here: ElasticSearch aggregation over inner object
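value_count needs concrete field names, so one way around the unknown-field problem is to read them from the index mapping first and then generate one value_count aggregation per property. The mapping response has the shape shown in the question. This Python sketch assumes the mapping has already been fetched and parsed into a dict. The helper names are hypothetical, and the index and type names default to the ones in the question:

```python
def metainfo_fields(mapping, index="index_v1", doc_type="fuas"):
    """List the property names under metainfos in a mapping dict."""
    props = mapping[index]["mappings"][doc_type]["properties"]
    return sorted(props.get("metainfos", {}).get("properties", {}))


def build_value_count_aggs(fields):
    """Build a search body with one value_count aggregation per metainfos field."""
    return {
        "size": 0,
        "aggs": {
            name: {"value_count": {"field": "metainfos." + name}}
            for name in fields
        },
    }


mapping = {
    "index_v1": {"mappings": {"fuas": {"properties": {
        "comment": {"type": "string"},
        "matter": {"type": "string"},
        "metainfos": {"properties": {
            "department": {"type": "string"},
            "processos": {"type": "string"},
        }},
    }}}}
}
fields = metainfo_fields(mapping)
body = build_value_count_aggs(fields)
print(body)
```

Because the mapping grows as new fields are indexed, regenerating the aggregations from it picks up fields you did not know about in advance.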

How to benchmark executing in Elasticsearch?

I want to execute a benchmark test, so I followed the Elasticsearch documentation:
/bin/elasticsearch --node.bench true
curl -XPUT 'localhost:9200/_bench/?pretty=true' -d '{
"name": "my_benchmark",
"competitors": [ {
"name": "my_competitor",
"requests": [ {
"query": {
"match": { "_all": "a*" }
}
} ]
} ]
}'
But I get this exception:
{ "error" : "InvalidIndexNameException[[_bench] Invalid index name
[_bench], must not start with '_']", "status" : 400 }
What am I doing wrong?
I checked a freshly downloaded Elasticsearch 1.4.2 and got the same result as you.
Then I built Elasticsearch from the master branch on GitHub and it worked. For your curl request I got the following reply:
{
"status" : "COMPLETE",
"errors" : [ ],
"competitors" : {
"my_competitor" : {
"summary" : {
"nodes" : [
"Cable" ],
"total_iterations" : 5,
"completed_iterations" : 5,
"total_queries" : 5000,
"concurrency" : 5,
"multiplier" : 1000,
"avg_warmup_time" : 0.0,
"statistics" : {
"min" : 1,
"max" : 1,
"mean" : 1.0,
"qps" : 1000.0,
"std_dev" : 0.0,
"millis_per_hit" : 0.0,
"percentile_10" : 1.0,
"percentile_25" : 1.0,
"percentile_50" : 1.0,
"percentile_75" : 1.0,
"percentile_90" : 1.0,
"percentile_99" : 1.0
}
}
}
}
}
And for a simpler curl request I got:
curl -XGET 'localhost:9200/_bench?pretty'
{ }
So it seems that this functionality is so experimental that it's not yet included in any stable release.
I guess the [master] label in the documentation header was not a coincidence ;)

ElasticSearch doesn't seem to support array lookups

I currently have a fairly simple document stored in ElasticSearch that I generated with an integration test:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "unit-test_project600",
"_type" : "recordDefinition505",
"_id" : "400",
"_score" : 1.0, "_source" : {
"field900": "test string",
"field901": "500",
"field902": "2050-01-01T00:00:00",
"field903": [
"Open"
]
}
} ]
}
}
I would like to filter for specifically field903 and a value of "Open", so I perform the following query:
{
query: {
filtered: {
filter: {
term: {
field903: "Open",
}
}
}
}
}
This returns no results. However, I can use this with other fields and it will return the record:
{
query: {
filtered: {
filter: {
term: {
field901: "500",
}
}
}
}
}
It would appear that I'm unable to search in arrays with ElasticSearch. I have read a few instances of people with a similar problem, but none of them appear to have solved it. Surely this isn't a limitation of ElasticSearch?
I thought that it might be a mapping problem. Here's my mapping:
{
"unit-test_project600" : {
"recordDefinition505" : {
"properties" : {
"field900" : {
"type" : "string"
},
"field901" : {
"type" : "string"
},
"field902" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"field903" : {
"type" : "string"
}
}
}
}
}
However, the ElasticSearch docs indicate that there is no difference between a string or an array mapping, so I don't think I need to make any changes here.
Try searching for "open" rather than "Open". By default, Elasticsearch uses the standard analyzer when indexing fields, and the standard analyzer includes a lowercase filter, as described in the example here. In my experience, Elasticsearch does search arrays.
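The term filter does not analyze its input, but the indexed value was analyzed. The sketch below makes this concrete with a deliberately simplified stand-in for the standard analyzer: it splits on word characters and lowercases the result. The real analyzer does more, so treat this as an approximation only. It also shows why `"500"` on field901 matched: lowercasing leaves it unchanged.

```python
import re


def simple_standard_analyze(text):
    """Rough approximation of the standard analyzer: tokenize, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Tokens stored in the inverted index for field903 = ["Open"];
# every array element is analyzed the same way as a single value.
indexed_tokens = set()
for value in ["Open"]:
    indexed_tokens.update(simple_standard_analyze(value))

term_hit = "Open" in indexed_tokens    # term filter uses the text as-is
lower_hit = "open" in indexed_tokens   # lowercased term matches
number_hit = "500" in set(simple_standard_analyze("500"))  # field901 case
print(indexed_tokens, term_hit, lower_hit, number_hit)
```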

ElasticSearch - Statistical facets on the length of a list

I have the following sample mapping:
{
"book" : {
"properties" : {
"author" : { "type" : "string" },
"title" : { "type" : "string" },
"reviews" : {
"properties" : {
"url" : { "type" : "string" },
"score" : { "type" : "integer" }
}
},
"chapters" : {
"include_in_root" : 1,
"type" : "nested",
"properties" : {
"name" : { "type" : "string" }
}
}
}
}
}
I would like to get a facet on the number of reviews, i.e. the length of the "reviews" array.
For instance, in plain words the results I need are: "100 documents with 10 reviews, 20 documents with 5 reviews, ..."
I'm trying the following statistical facet:
{
"query" : {
"match_all" : {}
},
"facets" : {
"stat1" : {
"statistical" : {"script" : "doc['reviews.score'].values.size()"}
}
}
}
but it keeps failing with:
{
"error" : "SearchPhaseExecutionException[Failed to execute phase [query_fetch], total failure; shardFailures {[mDsNfjLhRIyPObaOcxQo2w][facettest][0]: QueryPhaseExecutionException[[facettest][0]: query[ConstantScore(NotDeleted(cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter#a2a5984b)))],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: PropertyAccessException[[Error: could not access: reviews; in class: org.elasticsearch.search.lookup.DocLookup]
[Near : {... doc[reviews.score].values.size() ....}]
^
[Line: 1, Column: 5]]; }]",
"status" : 500
}
How can I achieve my goal?
ElasticSearch version is 0.19.9.
Here is my sample data:
{
"author" : "Mark Twain",
"title" : "The Adventures of Tom Sawyer",
"reviews" : [
{
"url" : "amazon.com",
"score" : 10
},
{
"url" : "www.barnesandnoble.com",
"score" : 9
}
],
"chapters" : [
{ "name" : "Chapter 1" }, { "name" : "Chapter 2" }
]
}
{
"author" : "Jack London",
"title" : "The Call of the Wild",
"reviews" : [
{
"url" : "amazon.com",
"score" : 8
},
{
"url" : "www.barnesandnoble.com",
"score" : 9
},
{
"url" : "www.books.com",
"score" : 5
}
],
"chapters" : [
{ "name" : "Chapter 1" }, { "name" : "Chapter 2" }
]
}
It looks like you are using curl to execute your query and this curl statement looks like this:
curl localhost:9200/my-index/book -d '{....}'
The problem here is that because you are using apostrophes to wrap the body of the request, you need to escape every apostrophe it contains. So, your script should become:
{"script" : "doc['\''reviews.score'\''].values.size()"}
or
{"script" : "doc[\"reviews.score\"].values.size()"}
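Escaping by hand inside a single-quoted shell string is easy to get wrong. If a program generates the request body instead, the JSON encoder escapes the inner double quotes for you. Here is a minimal Python illustration:

```python
import json

script = 'doc["reviews.score"].values.size()'
facet_body = {"statistical": {"script": script}}
encoded = json.dumps(facet_body)
print(encoded)  # the double quotes around reviews.score come out escaped
```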
The second issue is that, from your description, it looks like you are looking for a histogram facet or a range facet rather than a statistical facet. So, I would suggest trying something like this:
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"key_script" : "doc[\"reviews.score\"].values.size()",
"value_script" : "doc[\"reviews.score\"].values.size()",
"interval" : 1
}
}
}
}'
The third problem is that the script in the facet will be called for every single record in the result list, and if you have a lot of results it might take a really long time. So, I would suggest indexing an additional field called number_of_reviews, populated by your client with the number of reviews. Then your query would simply become:
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"field" : "number_of_reviews",
"interval" : 1
}
}
}
}'
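The number_of_reviews field has to be filled in by the client before indexing. Below is a Python sketch of that step, using the two sample books from the question, trimmed to the fields needed here. It also computes locally the buckets that an interval-1 histogram over that field should report for these two documents. The field name number_of_reviews comes from the answer; `add_review_count` is a hypothetical helper:

```python
from collections import Counter


def add_review_count(book):
    """Return a copy of the book with number_of_reviews set from its reviews."""
    enriched = dict(book)
    enriched["number_of_reviews"] = len(book.get("reviews", []))
    return enriched


books = [
    {"title": "The Adventures of Tom Sawyer",
     "reviews": [{"score": 10}, {"score": 9}]},
    {"title": "The Call of the Wild",
     "reviews": [{"score": 8}, {"score": 9}, {"score": 5}]},
]
to_index = [add_review_count(b) for b in books]

# Expected histogram buckets (interval 1): number of reviews -> document count
expected_buckets = Counter(b["number_of_reviews"] for b in to_index)
print(sorted(expected_buckets.items()))
```

Computing the count once at index time replaces a per-document script call at query time with a plain numeric field lookup.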
