Elasticsearch: apply conditions in a query based on the result count - elasticsearch

Is there any way in Elasticsearch to achieve the following kind of behaviour?
"Apply the first condition; if no results are found, apply the next condition, and so on."
I know the basics of ES queries. I know this can be done by querying repeatedly based on the previous results, but I want to do it in a single query for the sake of time and efficiency.
Here is my current query:
GET _search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 28.6143519,
"lon": -81.50773
},
"bottom_right": {
"lat": 28.3479859,
"lon": -81.22977
}
}
}
}
]
}
}
}
},
"size": 10,
"from": 0,
"sort": {
"search_score": {
"order": "desc"
}
}
}
What I want is this: if the query returns zero results, it should search again with a larger set of lat/lon bounds. I can do this by re-querying Elasticsearch, but that would be inefficient.
Is this possible in Elasticsearch?
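One possible approach, offered here as a sketch rather than a confirmed answer, is to send several progressively wider bounding boxes in a single multi-search (`_msearch`) request and use the first response that has hits. Note that `_msearch` still runs every query, so it saves round trips rather than work. The helper names (`widen_box`, `build_msearch_lines`, `first_with_hits`) and the widening factors are hypothetical, and the body uses the `bool`/`filter` form instead of the older `filtered` query.

```python
import json

def widen_box(box, factor):
    """Scale a bounding box around its centre by `factor` (1.0 = unchanged)."""
    top, left = box["top_left"]["lat"], box["top_left"]["lon"]
    bottom, right = box["bottom_right"]["lat"], box["bottom_right"]["lon"]
    c_lat, c_lon = (top + bottom) / 2, (left + right) / 2
    half_h = (top - bottom) / 2 * factor
    half_w = (right - left) / 2 * factor
    return {
        "top_left": {"lat": c_lat + half_h, "lon": c_lon - half_w},
        "bottom_right": {"lat": c_lat - half_h, "lon": c_lon + half_w},
    }

def build_msearch_lines(box, factors, size=10):
    """Return the NDJSON lines (header, body, header, body, ...) for _msearch."""
    lines = []
    for factor in factors:
        body = {
            "query": {"bool": {"filter": [
                {"geo_bounding_box": {"location": widen_box(box, factor)}}
            ]}},
            "size": size,
            "sort": {"search_score": {"order": "desc"}},
        }
        lines.append(json.dumps({}))  # empty header: use the index from the URL
        lines.append(json.dumps(body))
    return lines

def first_with_hits(responses):
    """Pick the first (narrowest) response that returned any hits, else None."""
    for response in responses:
        if response["hits"]["hits"]:
            return response
    return None
```

The request body would be `"\n".join(lines) + "\n"`, and the `responses` array of the reply comes back in the same order as the searches, so the first non-empty entry corresponds to the tightest box that matched.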

Related

How to sort before aggregation in search query?

Here is what I am trying to achieve. Let's say that my origin location is Chicago, IL. I get over 5,000 listings from Chicago if I search listings.
I want to get the nearby cities from these listings, so I decided to use aggregations in the search query. This is an example of the output: [{Illinois {[{Ottawa} {Chicago} ... ]}}.
However, this does not give me the nearby ones first. I believe this is because the listings (when 5,000 are matched) are not sorted by distance. The default aggregation size is 10, so if the nearby listings are far down the list, they will not show in the result.
I have tried to use sort outside of the aggregations, but it made no difference.
"sort": [
{
"_geo_distance": {
"distance_type": "plane",
"location": {
"lat": 41.8781136,
"lon": -87.6297982
},
"mode": "min",
"order": "asc",
"unit": "mi"
}
}
]
So, I believe I need to use sort inside of the aggregation but I am not sure how I could do this with this nested aggregation.
My listing model has these fields (but nothing about the distance, so I am pretty sure I need to get that by using _geo_distance):
Listing = {
state: "Illinois"
city: "Chicago"
...
}
Here is my search query:
"aggs": {
"states": {
"aggs": {
"cities": {
"terms": {
"field": "city.keyword"
}
}
},
"terms": {
"field": "state.keyword"
}
}
},
"query": {
"bool": {
"filter": [
{
"geo_distance": {
"distance": "100mi",
"distance_type": "plane",
"location": {
"lat": 41.8910166,
"lon": -87.6198982
}
}
}
...(rest of the search query)
I do not want to sort the buckets after the aggregation. I want to sort the listings by distance first and then apply the aggregation, so that the nearby cities appear in the result. Any help, please?
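One option that may get close, sketched under assumptions: a `terms` aggregation can order its buckets by a single-value metric sub-aggregation, so each bucket could carry a `min` of a scripted distance to the origin and be ordered by it ascending. The ordering decides which buckets make it into the top `size`, so the nearest cities would be the ones kept. The helper name `nearest_first_terms`, the sub-aggregation name `min_distance`, and the assumption that `location` is a `geo_point` field are mine and not part of the question.

```python
def nearest_first_terms(field, lat, lon, size=10, sub_aggs=None):
    """Build a terms aggregation whose buckets are ordered by minimum distance."""
    aggs = {
        "min_distance": {
            "min": {
                "script": {
                    "lang": "painless",
                    # Distance in metres from the origin to each document's location
                    "source": "doc['location'].arcDistance(params.lat, params.lon)",
                    "params": {"lat": lat, "lon": lon},
                }
            }
        }
    }
    if sub_aggs:
        aggs.update(sub_aggs)
    return {
        "terms": {"field": field, "size": size, "order": {"min_distance": "asc"}},
        "aggs": aggs,
    }

# Origin taken from the sort clause in the question (Chicago, IL)
origin = (41.8781136, -87.6297982)
cities = nearest_first_terms("city.keyword", *origin)
states = nearest_first_terms("state.keyword", *origin, sub_aggs={"cities": cities})
request_aggs = {"states": states}
```

`request_aggs` would replace the `aggs` section of the search body; the top-level `sort` does not influence which buckets an aggregation returns.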

ElasticSearch - score boosting using scripting

We have a specific use-case for our ElasticSearch instance: we store documents which contain proper names, dates of birth, addresses, ID numbers, and other related info.
We use a name-matching plugin which overrides the default scoring of ES and assigns a relevancy score between 0 and 1 based on how closely the name matches.
What we need to do is boost that score by a certain amount if other fields match. I have started to read up on ES scripting to achieve this. I need assistance on the script part of the query. Right now, our query looks like this:
{
"size":100,
"query":{
"bool":{
"should":[
{"match":{"Name":"John Smith"}}
]
}
},
"rescore":{
"window_size":100,
"query":{
"rescore_query":{
"function_score":{
"doc_score":{
"fields":{
"Name":{"query_value":"John Smith"},
"DOB":{
"function":{
"function_score":{
"script_score":{
"script":{
"lang":"painless",
"params":{
"query_value":"01-01-1999"
},
"inline":"if **<HERE'S WHERE I NEED ASSISTANCE>**"
}
}
}
}
}
}
}
}
},
"query_weight":0.0,
"rescore_query_weight":1.0
}
}
The Name field will always be required in a query and is the basis for the score, which is returned in the default _score field; for ease of demonstration, we'll just add one additional field, DOB, which if matched, should boost the score by 0.1. I believe I'm looking for something along the lines of if(query_value == doc['DOB'].value add 0.1 to _score), or something along these lines.
So, what would be the correct syntax to be entered into the inline row to achieve this? Or, if the query requires other syntax revision, please advise.
EDIT #1 - it's important to highlight that our DOB field is a text field, not a date field.
Splitting to a separate answer as this solves the problem differently (i.e. - by using script_score as OP proposed instead of trying to rewrite away from scripts).
Assuming the same mapping and data as the previous answer, a scripted version of the query might look like the following:
POST /employee/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"functions": [
{
"script_score": {
"script": {
"source": "double boost = 0.0; if (params['_source']['State'] == 'FL') { boost += 0.1; } if (params['_source']['DOB'] == '1965-05-24') { boost += 0.3; } return boost;",
"lang": "painless"
}
}
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"query_weight": 0,
"rescore_query_weight": 1
}
}
}
Two notes about the script:
The script uses params['_source'][field_name] to access the document, which is the only way to get access to text fields. This is significantly slower because it requires loading the document source from disk, though the penalty might not be too bad in the context of a rescore. You could instead use doc[field_name].value if the field were an aggregatable type, such as keyword, date, or something numeric.
DOB here is compared directly to a string. This is possible because we're using the _source field, and the JSON for the documents has the dates specified as strings. This is somewhat brittle, but will likely do the trick.
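To make the boost arithmetic concrete, here is a plain-Python mirror of the Painless script, run against the three sample documents from this thread. It only illustrates the logic and is not something Elasticsearch executes; the function name `script_boost` is hypothetical.

```python
def script_boost(source):
    """Mirror of the Painless script: add fixed boosts for matching fields."""
    boost = 0.0
    if source.get("State") == "FL":
        boost += 0.1
    if source.get("DOB") == "1965-05-24":  # plain string comparison against _source
        boost += 0.3
    return boost

docs = [
    {"Name": "John Smith", "State": "NY", "DOB": "1970-01-01"},
    {"Name": "John C. Reilly", "State": "CA", "DOB": "1965-05-24"},
    {"Name": "Will Ferrell", "State": "FL", "DOB": "1967-07-16"},
]
boosts = {doc["Name"]: script_boost(doc) for doc in docs}
```

With score_mode: sum and boost_mode: sum, each of these boosts is added to the name-match score produced by the function score's inner query.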
Assuming static weights per additional field, you can accomplish this without using scripting (though you may need to use script_score for any more complex weighting). To solve your issue of directly adding to a document's original score, your rescoring query will need to be a function score query that:
Composes queries for additional fields in a should clause for the function score's main query (i.e. - will only produce scores for documents matching at least one additional field)
Uses one function per additional field, with the filter set to select documents with some value for that field, and a weight to specify how much the score should increase (or some other scoring function if desired)
Mapping (as template)
Adding a State and DOB field for sake of example (making sure multiple additional fields contribute to the score correctly)
PUT _template/employee_template
{
"index_patterns": ["employee"],
"settings": {
"number_of_shards": 1
},
"mappings": {
"_doc": {
"properties": {
"Name": {
"type": "text"
},
"State": {
"type": "keyword"
},
"DOB": {
"type": "date"
}
}
}
}
}
Sample data
POST /employee/_doc/_bulk
{"index":{}}
{"Name": "John Smith", "State": "NY", "DOB": "1970-01-01"}
{"index":{}}
{"Name": "John C. Reilly", "State": "CA", "DOB": "1965-05-24"}
{"index":{}}
{"Name": "Will Ferrell", "State": "FL", "DOB": "1967-07-16"}
Query
EDIT: Updated the query to include the original query in the new function score in an attempt to compensate for custom scoring plugins.
A few notes about the query below:
Setting the rescorer's score_mode: max is effectively a replace here, since the newly computed function score should always be greater than or equal to the original score
query_weight and rescore_query_weight are both set to 1 so that the two scores are compared on equal scales during the score_mode: max comparison
In the function_score query:
score_mode: sum will add together all the scores from functions
boost_mode: sum will add the sum of the functions to the score of the query
POST /employee/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
],
"filter": {
"bool": {
"should": [
{
"term": {
"State": "CA"
}
},
{
"range": {
"DOB": {
"lte": "1968-01-01"
}
}
}
]
}
}
}
},
"functions": [
{
"filter": {
"term": {
"State": "CA"
}
},
"weight": 0.1
},
{
"filter": {
"range": {
"DOB": {
"lte": "1968-01-01"
}
}
},
"weight": 0.3
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"score_mode": "max",
"query_weight": 1,
"rescore_query_weight": 1
}
}
}
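As a rough check of how these settings combine, the following Python sketch reproduces the arithmetic for the sample documents. It assumes the inner query of the function score yields the same score as the first-pass query, and the first-pass score of 0.8 used in the usage note is made up for illustration; the function names are hypothetical.

```python
def function_score_sum(query_score, state, dob):
    """Return (query score plus summed function weights, whether the filter matched)."""
    weights = 0.0
    if state == "CA":
        weights += 0.1
    if dob <= "1968-01-01":  # ISO dates compare correctly as strings
        weights += 0.3
    return query_score + weights, weights > 0

def rescored(original, state, dob, query_weight=1.0, rescore_query_weight=1.0):
    """Combine first-pass and rescore scores the way score_mode: max does."""
    rescore_score, matched = function_score_sum(original, state, dob)
    if not matched:
        # Documents that do not match the rescore query keep the weighted original.
        return query_weight * original
    return max(query_weight * original, rescore_query_weight * rescore_score)
```

For a first-pass score of 0.8, `rescored(0.8, "CA", "1965-05-24")` adds both weights, `rescored(0.8, "FL", "1967-07-16")` adds only the DOB weight, and `rescored(0.8, "NY", "1970-01-01")` leaves the score unchanged.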

Elasticsearch geo distance query range

Hi, for a project I am trying to return users from Elasticsearch who are between 2km and 4km away from the searching user.
I use the query below:
{
"size": 1000,
"from": 0,
"_source": "user_id",
"query":{
"bool":{
"must_not": {
"terms": {
"user_id": []
}
},
"filter":[
{
"geo_distance_range":{
"from":"2km",
"to": "4km",
"location":{
"lon":-122.4194,
"lat":37.7749
}
}
}
]
}
}
}
This query type has been removed in Elasticsearch 6.3, which is the version I am using.
Can anyone please help me solve this use case in Elasticsearch 6.3? Aggregations only return the number of users in the range, but I want to return the complete results for all users in the range.
I can't test this, but it seems reasonable that you should be able to combine must and must_not clauses with geo_distance:
"query": {
"bool": {
"must_not": [
{
"terms": {
"user_id": []
}
},
{
"geo_distance": {
"distance": "2km",
"location": [-122.4194, 37.7749]
}
}
],
"must": {
"geo_distance": {
"distance": "4km",
"location": [-122.4194, 37.7749]
}
}
}
}
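The ring described above (within 4km but not within 2km) can also be built programmatically. This is a minimal sketch that only constructs the request body; the function name `ring_query` is hypothetical, and the outer distance is placed in a `filter` clause on the assumption that scoring by distance is not needed.

```python
def ring_query(lat, lon, inner, outer, excluded_ids=()):
    """Match documents farther than `inner` but within `outer` of the point."""
    point = {"lat": lat, "lon": lon}
    # Each excluded condition is its own clause in the must_not array.
    must_not = [{"geo_distance": {"distance": inner, "location": point}}]
    if excluded_ids:
        must_not.append({"terms": {"user_id": list(excluded_ids)}})
    return {
        "query": {
            "bool": {
                "filter": [{"geo_distance": {"distance": outer, "location": point}}],
                "must_not": must_not,
            }
        }
    }

body = ring_query(37.7749, -122.4194, "2km", "4km")
```

Passing `excluded_ids=[...]` adds the `user_id` exclusion from the original query only when there is something to exclude.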

Elasticsearch - search across multiple indices with conditional decay function

I'm trying to search across multiple indices with one query, but only apply the gaussian decay function to a field that exists on one of the indices.
I'm running this through elasticsearch-api gem, and that portion works just fine.
Here's the query I'm running in marvel.
GET episodes,shows,keywords/_search?explain
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "AWESOME SAUCE",
"type": "most_fields",
"fields": [ "title", "summary", "show_title"]
}
},
"functions": [
{ "boost_factor": 2 },
{
"gauss": {
"published_at": {
"scale": "4w"
}
}
}
],
"score_mode": "multiply"
}
},
"highlight": {
"pre_tags": ["<span class='highlight'>"],
"post_tags": ["</span>"],
"fields": {
"summary": {},
"title": {},
"description": {}
}
}
}
The query works great for the episodes index because it has the published_at field for the gauss func to work its magic. However, when run across all indices, it fails for shows and keywords (still succeeds for episodes).
Is it possible to run a conditional gaussian decay function if the published_at field exists or on the single episodes index?
I'm willing to explore alternatives (i.e. run separate queries for each index and then merge the results), but thought a single query would be the best in terms of performance.
Thanks!
You can add a filter so that the Gaussian decay function is applied only to a subset of documents:
{
"filter": {
"exists": {
"field": "published_at"
}
},
"gauss": {
"published_at": {
"scale": "4w"
}
}
}
For docs that don't have the field you can return a score of 0:
{
"filter": {
"missing": {
"field": "published_at"
}
},
"script_score": {
"script": "0"
}
}
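For intuition about what the filtered gauss function contributes, here is a small Python sketch of the Gaussian decay formula as documented for the function score query, with the default decay of 0.5 and the "4w" scale expressed as 28 days. The helper names and the representation of published_at as a day number are assumptions made for illustration.

```python
import math

def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Gaussian decay multiplier for a value `distance` away from the origin."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))

def decay_if_present(doc, now_days, scale_days=28.0):
    """Apply the decay only when the document has a published_at value (in days)."""
    if "published_at" not in doc:
        return None  # the filtered function does not apply to this document
    return gauss_decay(now_days - doc["published_at"], scale_days)
```

A document published exactly one scale (28 days) from the origin gets a multiplier equal to the decay value, and documents from indices without published_at are simply skipped by this function, which is what the exists filter achieves.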
In newer Elasticsearch versions you should use the script_score query instead; the function_score query is being deprecated.

Elasticsearch: Limit filtered query to 5 items per type per day

I'm using Elasticsearch to gather data for the front page of my event portal. The current query is as follows:
{
"query": {
"function_score": {
"filter": {
"and": [
{
"geo_distance": {
"distance": "50km",
"location": {
"lat": 50.78,
"lon": 6.08
},
"_cache": true
}
},
{
"or": [
{
"and": [
{
"term": {
"type": "event"
}
},
{
"range": {
"datetime": {
"gt": "now"
}
}
}
]
},
{
"not": {
"term": {
"type": "event"
}
}
}
]
}
]
},
"functions": [
...
]
}
}
}
So basically it returns everything within a 50km distance that is either a future event or an item of another type. Other types could be status, photo, video, soundcloud, etc. All of these items have a datetime field and a parent field that says which account the item belongs to. After the filter there are some functions that score the objects based on their distance and age.
Now my question:
Is there a way to filter the query to get only the first (or even better highest scored) 5 items per type per account per day?
So currently I have accounts which upload 20 images at the same time. This is too much to display on the frontpage.
I thought about using filter scripts in a post_filter, but I am not very familiar with this topic.
Any ideas?
many thanks in advance
DTFagus
I solved it this way:
"aggs": {
"byParent": {
"terms": {
"field": "parent_id"
},
"aggs": {
"byType": {
"terms": {
"field": "type"
},
"aggs": {
"perDay": {
"date_histogram" : {
"field" : "datetime",
"interval": "day"
},
"aggs": {
"topHits": {
"top_hits": {
"size": 5,
"_source": {
"include": ["path"]
}
}
}
}
}
}
}
}
}
}
Unfortunately there is no pagination for aggregations (or other way around: the pagination of the query is not used). So I will get the paginated query results and the aggregation of all hits and intersect the arrays in js. Does not sound very efficient but I currently have no better idea. Anyone?
The only way around this I see would be to index all data into two indices. One containing all data and one with only the top 5 per day per type per account. This would be less time consuming to query but more time and storage consuming while indexing :/
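If the limiting ends up being done client-side, the grouping itself is straightforward. The following Python sketch keeps the highest-scored hits per account, type, and day; it assumes each hit carries `parent_id`, `type`, and an ISO 8601 `datetime` string in `_source`, and the function name `top_per_group` is hypothetical.

```python
from collections import defaultdict

def top_per_group(hits, limit=5):
    """Keep the `limit` highest-scored hits per (parent, type, day)."""
    groups = defaultdict(list)
    for hit in hits:
        src = hit["_source"]
        day = src["datetime"][:10]  # assumes ISO 8601, e.g. 2014-06-01T12:00:00
        groups[(src["parent_id"], src["type"], day)].append(hit)
    kept = []
    for group in groups.values():
        group.sort(key=lambda h: h["_score"], reverse=True)
        kept.extend(group[:limit])
    # Restore a global ordering by score for display.
    kept.sort(key=lambda h: h["_score"], reverse=True)
    return kept
```

This trims each page of results after the fact, so a page may end up shorter than the requested size when one account dominates it.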
You can limit the number of results returned by your query using the "size" parameter. If you set size to 5, you will get the first 5 results returned by your query.
Check the documentation http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/pagination.html
Hope this helps!

Resources