Query to get n random items from the top 100 items in Elasticsearch

I need to write an Elasticsearch query that returns 12 random items from the top 100 sorted items.
I tried something like this, but I can only get the top 12 items, not 12 random ones.
The query I used:
GET product/_search
{
"sort": [
{
"DateAdded": {
"order": "desc"
}
}
],
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"term": {
"definitionName": {
"value": "ABC"
}
}
},
{
"range": {
"price": {
"gt": 0
}
}
}
]
}
},
"functions": [
{
"random_score": {
"seed": 314159265359
}
}
]
}
},
"size": 12
}
Can anybody tell me where I am going wrong? (I am a beginner at writing Elasticsearch queries.)
Thanks in advance.

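EDIT: this does not work as written. window_size only recalculates the score of the top X results, and rescore cannot be combined with a sort on any field other than _score.
Also:
you need to set "track_scores" to true at the top level.
The correct syntax is: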
"rescore": {
"window_size": 10,
"query": {
"score_mode": "max", // or total, multiply, avg, min
"rescore_query": {
"bool": {
"should": [
{
//your query here - you can use a function or a script score too
}
]
}
},
"query_weight": 0.7,
"rescore_query_weight": 1.2
}
}
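For reference, here is a minimal Python sketch that builds a syntactically valid version of that rescore clause, assuming Elasticsearch 7.x. random_score is a function inside function_score, not a standalone query, so it is wrapped in a function_score query (which defaults to match_all). query_weight 0 drops the original score, so the order within the window depends only on the random score. The helper name random_rescore is hypothetical, and the match_all main query is a placeholder. Rescoring only takes effect when hits are sorted by _score, so this helps only if the main query's score already produces the "top 100" order you want.

```python
import json


def random_rescore(window_size, seed, field="_seq_no"):
    """Rescore clause that shuffles the top `window_size` hits at random.

    Hypothetical helper. random_score must sit inside function_score;
    query_weight=0 discards the original score inside the window.
    """
    return {
        "window_size": window_size,
        "query": {
            "rescore_query": {
                "function_score": {
                    "random_score": {"seed": seed, "field": field}
                }
            },
            "query_weight": 0,
            "rescore_query_weight": 1,
            "score_mode": "total",
        },
    }


body = {
    "size": 12,
    # Placeholder main query: it must rank by _score, because rescore is
    # rejected when an explicit sort other than _score is present.
    "query": {"match_all": {}},
    "rescore": random_rescore(100, 314159265359),
}
print(json.dumps(body, indent=2))
```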
OK, I understand better now.
Indeed, you have to sort by date (top 100) and rescore with a random function (read https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-request-body.html#request-body-search-post-filter).
It should be something like:
{
"sort": [
{
"DateAdded": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"term": {
"definitionName": {
"value": "ABC"
}
}
},
{
"range": {
"price": {
"gt": 0
}
}
}
]
}
},
"size": 12,
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"random_score": {
"seed": 314159265359
}
}
}
}
}
}
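Elasticsearch rejects rescore when an explicit sort other than _score is present, so one workable alternative is to fetch the 100 newest matches and pick 12 of them in the client. Below is a minimal Python sketch of that approach. The helper names top_n_request_body and sample_hits are hypothetical, and sending the body (to GET product/_search) is left to whatever client you use. The conditions go in a bool filter because the hits are sorted by DateAdded, so relevance scores are not needed. This matches the same documents as bool/must.

```python
import random


def top_n_request_body(n=100):
    """Body for GET product/_search: the n newest matching products.

    Field names (DateAdded, definitionName, price) are from the question.
    """
    return {
        "size": n,
        "sort": [{"DateAdded": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"definitionName": "ABC"}},
                    {"range": {"price": {"gt": 0}}},
                ]
            }
        },
    }


def sample_hits(response, k=12, seed=None):
    """Return up to k distinct hits chosen at random from a search response.

    A fixed seed makes the choice reproducible; seed=None gives a
    different selection on every call.
    """
    hits = response["hits"]["hits"]
    rng = random.Random(seed)
    return rng.sample(hits, min(k, len(hits)))
```

Usage: pass the parsed JSON response of the top-100 request to sample_hits(response), or sample_hits(response, seed=...) when the same user should see a stable selection.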

Related

Aggregation does not respect the size parameter passed in an ES query

My ES query looks like this. I am trying to get the average rating of the first 9 hits (from 0, size 9), but ES is averaging over all the matching records.
GET review/analytics/_search
{
"_source": "r_id",
"from": 0,
"size": 9,
"query": {
"bool": {
"filter": [
{
"terms": {
"b_id": [
236611
]
}
},
{
"range": {
"r_date": {
"gte": "1970-01-01 05:30:00",
"lte": "2019-08-13 17:13:17",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
},
{
"terms": {
"s_type": [
"aggregation",
"organic",
"survey"
]
}
},
{
"bool": {
"must_not": [
{
"terms": {
"s_id": [
392
]
}
}
]
}
},
{
"term": {
"status": 2
}
},
{
"bool": {
"must_not": [
{
"terms": {
"ba_id": []
}
}
]
}
}
]
}
},
"sort": [
{
"featured": {
"order": "desc"
}
},
{
"r_date": {
"order": "desc"
}
}
],
"aggs": {
"avg_rating": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
},
"avg_rating1": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
}
}
}
The query result shows a doc_count of 43, whereas I want it to be 9 so that I can calculate the average correctly. I have specified the size above. The hits of the query seem to be correct, but the aggregation result is not.
from and size have no impact on the aggregations. They only define how many documents will be returned in the hits.hits array.
Aggregations always run on the whole document set selected by whatever query is in your query section.
If you know the IDs of the "first" nine documents, you can add a terms query in your query so that only those 9 documents are selected and so that the average rating is only computed on those 9 documents.
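A minimal Python sketch of that two-step approach follows. Take the IDs from the hits of the first (from/size) request, then send a second request whose query is restricted to those IDs with a terms query on _id, so the aggregation sees only those documents. The helper names are hypothetical. The client-side alternative assumes rtng is added to _source (the question only requests r_id).

```python
def ids_from_hits(response):
    """IDs of the hits returned by the first (from/size) request."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def avg_rating_body(doc_ids):
    """Second request: average non-zero rtng over exactly these documents.

    size 0 because only the aggregation result is needed.
    """
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"terms": {"_id": list(doc_ids)}}]}},
        "aggs": {
            "avg_rating": {
                "filter": {"bool": {"must_not": [{"term": {"rtng": 0}}]}},
                "aggs": {"rtng": {"avg": {"field": "rtng"}}},
            }
        },
    }


def avg_rating_client_side(response):
    """Alternative: average the non-zero ratings of the returned hits locally.

    Assumes "rtng" is included in _source; returns None if nothing is rated.
    """
    ratings = [
        hit["_source"]["rtng"]
        for hit in response["hits"]["hits"]
        if hit["_source"].get("rtng", 0) != 0
    ]
    return sum(ratings) / len(ratings) if ratings else None
```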

How to get 3 random search results in an Elasticsearch query

I have an Elasticsearch query that returns records within a range of publishedDate values:
{
query : {
bool: {
filter: [
],
must: {
range: {
publishedDate: {
gte: "2018-11-01",
lte: "2019-03-30"
}
}
}
}
},
from: 0,
size: 3,
}
I need to show 3 random results every time I send this query.
The Elasticsearch documentation mentions that I can send a seed to get random results.
Following the documentation, I updated my query to:
{
"query" : {
"bool": {
"filter": [
],
"must": {
"range": {
"publishedDate": {
"gte": "2018-11-01",
"lte": "2019-03-30"
}
}
}
},
"function_score": {
"functions": [
{
"random_score": {
"seed": "123123123"
}
}
]
}
},
"from": 0,
"size": 3
}
But it is not working (the error says the query is malformed). Can anyone suggest how to correct this query so that it returns 3 random search results?
If you just need random results returned, you could restructure the query to be similar to the following:
{
"query": {
"function_score": {
"query": {
"range": {
"publishedDate": {
"gte": "2018-11-01",
"lte": "2019-03-30"
}
}
},
"boost": "5",
"random_score": {},
"boost_mode": "multiply"
}
},
"from": 0,
"size": 3
}
Modified from the Elastic documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
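The request in the question was malformed because function_score and bool were siblings under "query". The bool query has to go inside function_score.query instead. Below is a minimal Python sketch that builds the corrected body, assuming Elasticsearch 7.x. random_results_body is a hypothetical helper. Without a seed, the order changes on every request. With a seed, 7.x expects a field as well, and _seq_no is the choice the function_score reference suggests. boost_mode "replace" is used here instead of the answer's "multiply" so that only the random score counts.

```python
def random_results_body(gte, lte, size=3, seed=None):
    """function_score query returning `size` random docs in a publishedDate range."""
    random_score = {}  # no seed: a new random order on every request
    if seed is not None:
        # Reproducible order: pair the seed with a field (assumes 7.x).
        random_score = {"seed": seed, "field": "_seq_no"}
    return {
        "from": 0,
        "size": size,
        "query": {
            "function_score": {
                # The original bool query, moved inside function_score.
                "query": {
                    "bool": {
                        "must": {
                            "range": {"publishedDate": {"gte": gte, "lte": lte}}
                        }
                    }
                },
                "random_score": random_score,
                "boost_mode": "replace",  # keep only the random score
            }
        },
    }
```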

Filtered bool vs. bool query in Elasticsearch

I have two queries in ES. They have different response times on the same set of documents, although conceptually both do the same thing. I have a few questions:
1- What is the difference between these two?
2- Which one is better to use?
3- If both are the same, why do they perform differently?
1. Filtered bool
{
"from": 0,
"size": 5,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1987112602"
}
},
{
"term": {
"original_sender_address_number": "6870340319"
}
},
{
"range": {
"x_event_timestamp": {
"gte": "2016-07-01T00:00:00.000Z",
"lte": "2016-07-30T00:00:00.000Z"
}
}
}
]
}
}
}
},
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
2. Simple Bool
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
Mapping:
{
"ccp": {
"mappings": {
"type1": {
"properties": {
"original_sender_address_number": {
"type": "string"
},
"called_party_address_number": {
"type": "string"
},
"cause_code": {
"type": "string"
},
"x_event_timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
}
}
}
}
}
Update 1:
I tried a bool/must query and a bool/filter query on the same set of data, but I found some strange behaviour:
1-
The bool/must query finds the desired document:
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
2-
The bool/filter query, however, does not find the document. If I remove the second condition, it finds the same record, whose cause_code value is 401:
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
Update 2:
I found a way to suppress the scoring phase of a bool/must query by wrapping it in "constant_score":
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1235235757"
}
},
{
"term": {
"cause_code": "304"
}
}
]
}
}
}
}
}
The record we are trying to match has "called_party_address_number": "1235235757" and "cause_code": "304".
The first one uses the old 1.x query/filter syntax (i.e. filtered queries have been deprecated in favor of bool/filter).
The second one uses the new 2.x syntax but not in a filter context (i.e. you're using bool/must instead of bool/filter). The query with 2.x syntax which is equivalent to your first query (i.e. which runs in a filter context without score calculation = faster) would be this one:
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
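The migration described above is mechanical enough to sketch. Below is a minimal Python sketch (filtered_to_bool is a hypothetical helper) that rewrites a 1.x filtered query into the 2.x+ bool form. The filtered "query" part becomes bool.must, which is scored, and the "filter" part becomes bool.filter, which is not scored. Other queries are passed through unchanged.

```python
def filtered_to_bool(query):
    """Rewrite a 1.x {"filtered": {...}} query into the 2.x+ bool form."""
    if "filtered" not in query:
        return query  # nothing to convert
    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = filtered["query"]  # scored part
    if "filter" in filtered:
        bool_query["filter"] = filtered["filter"]  # non-scoring part
    return {"bool": bool_query}


# Example: a shortened version of the first query in the question.
old = {
    "filtered": {
        "filter": {
            "bool": {
                "must": [{"term": {"called_party_address_number": "1987112602"}}]
            }
        }
    }
}
new = filtered_to_bool(old)
```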

Elasticsearch must_not filter does not work with a large number of values

I have the following query, which includes some filters:
{
"from": 0,
"query": {
"function_score": {
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"idpais": [
115
]
}
},
{
"term": {
"tipo": [
1
]
}
}
],
"must_not": [
{
"term": {
"idregistro": [
5912471,
3433876,
9814443,
11703069,
6333176,
8288242,
9924922,
6677850,
11852501,
12530205,
4703469,
12776479,
12287659,
11823679,
12456304,
12777457,
10977614,
...
]
}
}
]
}
},
"query": {
"bool": {
"should": [
{
"match_phrase": {
"area": "Coordinator"
}
},
{
"match_phrase": {
"company": {
"boost": 5,
"query": "IBM"
}
}
},
{
"match_phrase": {
"topic": "IT and internet stuff"
}
},
{
"match_phrase": {
"institution": {
"boost": 5,
"query": "University of my city"
}
}
}
]
}
}
}
},
"script_score": {
"params": {
"idpais": 115,
"idprovincia": 0,
"relationships": []
},
"script_id": "ScoreUsuarios"
}
}
},
"size": 24,
"sort": [
{
"_script": {
"order": "desc",
"script_id": "SortUsuarios",
"type": "number"
}
}
]
}
The must_not filter has a large number of values to exclude (around 200), but Elasticsearch seems to ignore them and includes those documents in the result set. If I set only a few values (10 to 20), Elasticsearch applies the must_not filter.
Is there some restriction on the number of values in a filter? Is there a way to exclude a large number of documents from the query?
The term query matches a single value; to pass a list of values, use the terms query instead. The syntax is shown below; put the same terms clause in your must_not filter:
{
"query": {
"terms": {
"field_name": [
"VALUE1",
"VALUE2"
]
}
}
}
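Applied to the question, this means the single-value fields get term clauses with scalars, and the whole exclusion list goes into one terms clause under must_not. Below is a minimal Python sketch (exclusion_filter is a hypothetical helper) that builds that bool. The must/must_not shape is valid both as a 1.x bool filter (inside filtered, as in the question) and as a 2.x+ bool query.

```python
def exclusion_filter(idpais, tipo, excluded_ids):
    """bool clause: keep idpais/tipo matches, drop every listed idregistro."""
    return {
        "bool": {
            "must": [
                {"term": {"idpais": idpais}},  # term takes a single value
                {"term": {"tipo": tipo}},
            ],
            "must_not": [
                # terms takes a list: one clause for all excluded ids
                {"terms": {"idregistro": list(excluded_ids)}},
            ],
        }
    }
```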

How to boost a field based on a condition in Elasticsearch

I have a query structured like this:
{
"sort": {},
"query": {
"bool": {
"should": [
{
"match_phrase": {
"user_categories": "Grant Writing"
}
},
{
"match_phrase": {
"user_agencies": "Census"
}
},
{
"match_phrase": {
"user_agencies": "MDA"
}
},
{
"match_phrase": {
"user_agencies": "OSD"
}
}
]
}
},
"size": 500,
"from": 0
}
Suppose this returns a list of 10 users.
I need the user with the agency 'Census' to be the first one in the search results (i.e. boost the results that have Census as agency). How can I do this?
The following will do it. I converted some of the match_phrase queries to match queries, since they contain only single terms:
{
"sort": {},
"query": {
"bool": {
"should": [
{
"match_phrase": {
"user_categories": "Grant Writing"
}
},
{
"match": {
"user_agencies": {
"query": "Census",
"boost": 3
}
}
},
{
"match": {
"user_agencies": {
"query": "MDA"
}
}
},
{
"match": {
"user_agencies": {
"query": "OSD"
}
}
}
]
}
},
"size": 500,
"from": 0
}
You should boost at query time and give a big boost to documents with "Census" in the agency field. If the boost is high enough, a document matching "Census" will always be on top, regardless of the values for the other fields.
{
"sort": {},
"query": {
"bool": {
"should": [
{
"match_phrase": {
"user_categories": "Grant Writing"
}
},
{
"match_phrase": {
"user_agencies": {
"query": "Census",
"boost": 10
}
}
},
{
"match_phrase": {
"user_agencies": "MDA"
}
},
{
"match_phrase": {
"user_agencies": "OSD"
}
}
]
}
},
"size": 500,
"from": 0
}
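When the list of agencies comes from user input, the same should clauses can be generated. Below is a minimal Python sketch (agency_boost_query is a hypothetical helper) that uses the long form of match_phrase, {"field": {"query": ..., "boost": ...}}, so that only the preferred agency's clause carries a boost. The value 10 follows the answer above and is a starting point to tune, not a guaranteed threshold.

```python
def agency_boost_query(category, agencies, preferred, boost=10, size=500):
    """bool/should query where matches on the preferred agency weigh more."""
    should = [{"match_phrase": {"user_categories": category}}]
    for agency in agencies:
        clause = {"query": agency}
        if agency == preferred:
            clause["boost"] = boost  # raise it if these matches do not reach the top
        should.append({"match_phrase": {"user_agencies": clause}})
    return {"from": 0, "size": size, "query": {"bool": {"should": should}}}


q = agency_boost_query("Grant Writing", ["Census", "MDA", "OSD"], preferred="Census")
```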
