Elasticsearch find documents based on result of a main query

I want to search for documents based on a field in the result of a main query.
For example, let's say that my documents contain only two fields:
userId
geopoint
I need a query that returns the document of a specific userId plus the documents of users who are around that user's geopoint.
I couldn't find a way to do this in one query, so for now I am making two queries (one to retrieve the user's document and one to retrieve the users around that geopoint).
Thanks
UPDATE 1
The first query:
GET users/_search
{
"query": {
"term": {
"userId": "10250000075114"
}
}
}
Then I make the second query for users around it
GET users/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"must_not": {
"term": {
"userId": "10250000075114"
}
}
}
},
"functions": [
{
"gauss": {
"rank": {
"origin": "0.8",
"offset": "0.05",
"scale": "0.1"
}
}
},
{
"gauss": {
"startPoint": {
"origin": "32.547484,34.95457",
"offset": "5km",
"scale": "10km"
}
}
},
{
"script_score": {
"script": "_score"
}
}
]
}
}
}
Where the startPoint in the second query is the startPoint result of the first

What you are looking for is a sub-query (as found in an RDBMS), but Elasticsearch does not support sub-queries.
However, you can filter on your user IDs and then find the users around only those users; please refer to the bool query documentation for more information and examples.
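Since there is no sub-query, the two-step flow from the question can at least be wrapped in a few helpers. The following is a minimal Python sketch, not a definitive implementation: it only builds request bodies and reads a response, and it assumes that startPoint is stored in _source as a "lat,lon" string. The helper names (user_query, start_point_of, nearby_query) are hypothetical, and the rank and script_score functions from the question are left out for brevity.

```python
from typing import Any, Dict


def user_query(user_id: str) -> Dict[str, Any]:
    """Step 1: body that fetches the document of one user."""
    return {"query": {"term": {"userId": user_id}}}


def start_point_of(response: Dict[str, Any]) -> str:
    """Read startPoint from the first hit of a standard _search response.

    Assumption: startPoint is stored in _source as a "lat,lon" string.
    """
    hits = response["hits"]["hits"]
    if not hits:
        raise LookupError("user not found")
    return hits[0]["_source"]["startPoint"]


def nearby_query(user_id: str, origin: str) -> Dict[str, Any]:
    """Step 2: score all other users by distance from the given origin."""
    return {
        "query": {
            "function_score": {
                # Exclude the user we started from.
                "query": {"bool": {"must_not": {"term": {"userId": user_id}}}},
                "functions": [
                    {
                        "gauss": {
                            "startPoint": {
                                "origin": origin,
                                "offset": "5km",
                                "scale": "10km",
                            }
                        }
                    }
                ],
            }
        }
    }


# Hypothetical response, trimmed to the fields this sketch reads.
fake_response = {
    "hits": {
        "hits": [
            {"_source": {"userId": "10250000075114", "startPoint": "32.547484,34.95457"}}
        ]
    }
}
origin = start_point_of(fake_response)
second_body = nearby_query("10250000075114", origin)
```

The bodies returned by user_query and nearby_query would be sent to users/_search one after the other with whatever client you already use.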

Related

Deduplicate and perform composite aggregation on deduced result

I have an index in Elasticsearch that contains daily transaction data. Each document has mainly the following fields:
TxnId, Status, TxnType, userId
Two documents can have the same TxnId.
I'm looking for a query that aggregates over status and TxnType for unique TxnIds. Basically, I'm looking for something like: select unique txnIds from user_table group by status, txnType.
I have an ES query that deduplicates on TxnId, and another ES query that performs a composite aggregation on status and txnType. I want to do both in a single query.
I tried the collapse feature, and I also tried the cardinality and dedup features, but the query does not give the correct output:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"streamSource": 3
}
}
]
}
},
"collapse": {
"field": "txnId"
},
"aggs": {
"buckets": {
"composite": {
"size": 30,
"sources": [
{
"status": {
"terms": {
"field": "status"
}
}
},
{
"txnType": {
"terms": {
"field": "txnType"
}
}
}
]
}
}
}
}
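One likely reason the query above does not deduplicate is that collapse is applied to the top hits only and does not affect aggregations. A possible alternative, sketched here in Python as a request-body builder, is to put a cardinality sub-aggregation on txnId under the composite aggregation. This is only a sketch under assumptions: the function name and the unique_txns aggregation name are hypothetical, cardinality counts are approximate, and a txnId that appears with two different statuses would be counted once in each bucket.

```python
from typing import Any, Dict


def unique_txn_counts_body(stream_source: int, page_size: int = 30) -> Dict[str, Any]:
    """Composite buckets on (status, txnType), each with a distinct txnId count."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {"streamSource": stream_source}}]}},
        "aggs": {
            "buckets": {
                "composite": {
                    "size": page_size,
                    "sources": [
                        {"status": {"terms": {"field": "status"}}},
                        {"txnType": {"terms": {"field": "txnType"}}},
                    ],
                },
                # Sub-aggregation: approximate number of distinct txnIds per bucket.
                "aggs": {"unique_txns": {"cardinality": {"field": "txnId"}}},
            }
        },
    }


body = unique_txn_counts_body(3)
```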

update and retrieve in a single query elasticsearch

I want to update the status field from "FAILED" to "IN_PROGRESS" for all the docs in one of my Elasticsearch indices that match the query below, and retrieve the updated docs.
{
"query": {
"bool": {
"must": {
"match": { "status": "FAILED" }
},
"filter": [
{
"range": {
"count": { "gte": "2" }
}
},
{
"range": {
"updated": { "gte": "now-2h" }
}
}
]
}
}
}
I know I can achieve this with two queries (update_by_query to update and a GET to retrieve all the updated docs). The problem is that I want to update and retrieve all the updated docs in a single query.
Is there an efficient way to do this in a single query?
You can use the query below with "_source": false, which returns only the _id of each matching document.
POST multiapi/_search
{
"_source": false,
"query": {
"term": {
"status.keyword": {
"value": "FAILED"
}
}
}
}
From the response you can collect all the _ids and pass them to the ids query below.
POST multiapi/_update_by_query
{
"query": {
"ids": {
"values": ["M1BbcX4Bo1YkEVbN1wG1","NFBbcX4Bo1YkEVbN3gHm"]
}
},
"script": {
"source": "ctx._source['status'] = 'IN_PROGRESS'"
}
}
Also, if your index holds a large set of documents, use search_after to retrieve more than 10k documents.
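The two steps in this answer can be glued together with a little client-side code. The Python sketch below only shows the glue: it extracts the _ids from a standard search response and builds the _update_by_query body. The helper names are hypothetical, and passing the new status through script params is my own variation on the inline string used above.

```python
from typing import Any, Dict, List


def ids_from_search(response: Dict[str, Any]) -> List[str]:
    """Collect the _id of every hit in a standard _search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def update_status_body(ids: List[str], new_status: str) -> Dict[str, Any]:
    """Body for _update_by_query that sets status on exactly these documents."""
    return {
        "query": {"ids": {"values": ids}},
        "script": {
            "lang": "painless",
            # The value is passed as a parameter instead of being inlined.
            "source": "ctx._source.status = params.status",
            "params": {"status": new_status},
        },
    }


# Hypothetical response, trimmed to the fields this sketch reads.
fake_response = {
    "hits": {"hits": [{"_id": "M1BbcX4Bo1YkEVbN1wG1"}, {"_id": "NFBbcX4Bo1YkEVbN3gHm"}]}
}
updated_ids = ids_from_search(fake_response)
update_body = update_status_body(updated_ids, "IN_PROGRESS")
```

Because the ids are collected before the update, the same list identifies the updated documents afterwards, for example in a follow-up ids query.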

ElasticSearch - score boosting using scripting

We have a specific use-case for our ElasticSearch instance: we store documents which contain proper names, dates of birth, addresses, ID numbers, and other related info.
We use a name-matching plugin which overrides the default scoring of ES and assigns a relevancy score between 0 and 1 based on how closely the name matches.
What we need to do is boost that score by a certain amount if other fields match. I have started to read up on ES scripting to achieve this. I need assistance on the script part of the query. Right now, our query looks like this:
{
"size":100,
"query":{
"bool":{
"should":[
{"match":{"Name":"John Smith"}}
]
}
},
"rescore":{
"window_size":100,
"query":{
"rescore_query":{
"function_score":{
"doc_score":{
"fields":{
"Name":{"query_value":"John Smith"},
"DOB":{
"function":{
"function_score":{
"script_score":{
"script":{
"lang":"painless",
"params":{
"query_value":"01-01-1999"
},
"inline":"if **<HERE'S WHERE I NEED ASSISTANCE>**"
}
}
}
}
}
}
}
}
},
"query_weight":0.0,
"rescore_query_weight":1.0
}
}
The Name field will always be required in a query and is the basis for the score, which is returned in the default _score field; for ease of demonstration, we'll just add one additional field, DOB, which if matched, should boost the score by 0.1. I believe I'm looking for something along the lines of if(query_value == doc['DOB'].value add 0.1 to _score), or something along these lines.
So, what would be the correct syntax to be entered into the inline row to achieve this? Or, if the query requires other syntax revision, please advise.
EDIT #1 - it's important to highlight that our DOB field is a text field, not a date field.
Splitting to a separate answer as this solves the problem differently (i.e. - by using script_score as OP proposed instead of trying to rewrite away from scripts).
Assuming the same mapping and data as the previous answer, a scripted version of the query might look like the following:
POST /employee/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"functions": [
{
"script_score": {
"script": {
"source": "double boost = 0.0; if (params['_source']['State'] == 'FL') { boost += 0.1; } if (params['_source']['DOB'] == '1965-05-24') { boost += 0.3; } return boost;",
"lang": "painless"
}
}
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"query_weight": 0,
"rescore_query_weight": 1
}
}
}
Two notes about the script:
The script uses params['_source'][field_name] to access the document, which is the only way to get access to text fields. This is significantly slower because it requires reading documents directly from disk, though the penalty might not be too bad in the context of a rescore. You could instead use doc[field_name].value if the field were an aggregatable type, such as keyword, date, or a numeric type.
DOB here is compared directly to a string. This is possible because we're using the _source field, and the JSON for the documents has the dates specified as strings. This is somewhat brittle, but will likely do the trick.
Assuming static weights per additional field, you can accomplish this without using scripting (though you may need to use script_score for any more complex weighting). To solve your issue of directly adding to a document's original score, your rescoring query will need to be a function score query that:
Composes queries for additional fields in a should clause for the function score's main query (i.e. - will only produce scores for documents matching at least one additional field)
Uses one function per additional field, with the filter set to select documents with some value for that field, and a weight to specify how much the score should increase (or some other scoring function if desired)
Mapping (as template)
Adding a State and DOB field for sake of example (making sure multiple additional fields contribute to the score correctly)
PUT _template/employee_template
{
"index_patterns": ["employee"],
"settings": {
"number_of_shards": 1
},
"mappings": {
"_doc": {
"properties": {
"Name": {
"type": "text"
},
"State": {
"type": "keyword"
},
"DOB": {
"type": "date"
}
}
}
}
}
Sample data
POST /employee/_doc/_bulk
{"index":{}}
{"Name": "John Smith", "State": "NY", "DOB": "1970-01-01"}
{"index":{}}
{"Name": "John C. Reilly", "State": "CA", "DOB": "1965-05-24"}
{"index":{}}
{"Name": "Will Ferrell", "State": "FL", "DOB": "1967-07-16"}
Query
EDIT: Updated the query to include the original query in the new function score in an attempt to compensate for custom scoring plugins.
A few notes about the query below:
Setting the rescorer's score_mode: max is effectively a replace here, since the newly computed function score should always be greater than or equal to the original score
query_weight and rescore_query_weight are both set to 1 so that the two scores are compared on equal scales during the score_mode: max comparison
In the function_score query:
score_mode: sum will add together all the scores from functions
boost_mode: sum will add the sum of the functions to the score of the query
POST /employee/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
],
"filter": {
"bool": {
"should": [
{
"term": {
"State": "CA"
}
},
{
"range": {
"DOB": {
"lte": "1968-01-01"
}
}
}
]
}
}
}
},
"functions": [
{
"filter": {
"term": {
"State": "CA"
}
},
"weight": 0.1
},
{
"filter": {
"range": {
"DOB": {
"lte": "1968-01-01"
}
}
},
"weight": 0.3
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"score_mode": "max",
"query_weight": 1,
"rescore_query_weight": 1
}
}
}
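The score arithmetic described in the notes before this query can be checked with a few lines of Python. This is only a model of the combination rules, not Elasticsearch code. It assumes that the inner query reproduces the original name score, and it covers only documents that match at least one function, which the filter in the rescore query guarantees. The function names are hypothetical.

```python
from typing import Iterable


def function_score(query_score: float, matched_weights: Iterable[float]) -> float:
    """Model of score_mode: sum followed by boost_mode: sum."""
    # score_mode: sum adds up the weights of the functions whose filter matched.
    functions_total = sum(matched_weights)
    # boost_mode: sum adds that total to the score of the inner query.
    return query_score + functions_total


def rescored(original: float, rescore_score: float,
             query_weight: float = 1.0, rescore_query_weight: float = 1.0) -> float:
    """Model of the rescorer's score_mode: max on the weighted scores."""
    return max(original * query_weight, rescore_score * rescore_query_weight)


# A document with name score 0.8 that matches both the State and the DOB filter.
name_score = 0.8
boosted = function_score(name_score, [0.1, 0.3])
final = rescored(name_score, boosted)
```

Because the boosted score is never lower than the name score, taking the maximum always picks the boosted value, which is why the notes call it "effectively a replace".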

Elasticsearch - Aggregations on part of bool query

Say I have this bool query:
"bool" : {
"should" : [
{ "term" : { "FirstName" : "Sandra" } },
{ "term" : { "LastName" : "Jones" } }
],
"minimum_should_match" : 1
}
meaning I want to match all the people with first name Sandra OR last name Jones.
Now, is there any way that I can perform an aggregation on all the documents that matched the first term only?
For example, I want to get all of the unique values of "Prizes" that anybody named Sandra has. Normally I'd just do:
"query": {
"match": {
"FirstName": "Sandra"
}
},
"aggs": {
"Prizes": {
"terms": {
"field": "Prizes"
}
}
}
Is there any way to combine the two so I only have to perform a single query which returns all of the people with first name Sandra or last name Jones, AND an aggregation only on the people with first name Sandra?
Thanks a lot!
Use post_filter.
Please refer to the following query. post_filter makes sure that your bool should clause doesn't affect the scope of your aggregation.
Aggregations are scoped by the main query, but they are unaffected by post_filter.
{
"from": 0,
"size": 20,
"aggs": {
"filtered_lastname": {
"filter": {
"query": {
"match": {
"FirstName": "sandra"
}
}
},
"aggs": {
"prizes": {
"terms": {
"field": "Prizes",
"size": 10
}
}
}
}
},
"post_filter": {
"bool": {
"should": [{
"term": {
"FirstName": "Sandra"
}
}, {
"term": {
"LastName": "Jones"
}
}],
"minimum_should_match": 1
}
}
}
Running a filter inside the aggs before aggregating on prizes lets you achieve your desired use case.
Thanks
Hope this helps
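The query in this answer wraps the aggregation filter in a "query" element, which is an older syntax. In more recent Elasticsearch versions the filter aggregation takes the query clause directly. A minimal Python sketch of the same request in that form follows, together with a helper that reads the bucket keys from the response. The function names and the sandra_only aggregation name are hypothetical, and using term on FirstName assumes the field is indexed so that the exact value "Sandra" matches.

```python
from typing import Any, Dict, List


def sandra_or_jones_body() -> Dict[str, Any]:
    """Hits: Sandra OR Jones (post_filter). Aggregation: Sandra only."""
    return {
        "from": 0,
        "size": 20,
        "aggs": {
            "sandra_only": {
                # The filter aggregation takes the query clause directly.
                "filter": {"term": {"FirstName": "Sandra"}},
                "aggs": {"prizes": {"terms": {"field": "Prizes", "size": 10}}},
            }
        },
        # post_filter narrows the returned hits without touching aggregations.
        "post_filter": {
            "bool": {
                "should": [
                    {"term": {"FirstName": "Sandra"}},
                    {"term": {"LastName": "Jones"}},
                ],
                "minimum_should_match": 1,
            }
        },
    }


def prize_values(response: Dict[str, Any]) -> List[str]:
    """Unique Prizes values for people named Sandra, read from the buckets."""
    buckets = response["aggregations"]["sandra_only"]["prizes"]["buckets"]
    return [bucket["key"] for bucket in buckets]


body = sandra_or_jones_body()
```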

Elasticsearch - search across multiple indices with conditional decay function

I'm trying to search across multiple indices with one query, but only apply the gaussian decay function to a field that exists on one of the indices.
I'm running this through elasticsearch-api gem, and that portion works just fine.
Here's the query I'm running in marvel.
GET episodes,shows,keywords/_search?explain
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "AWESOME SAUCE",
"type": "most_fields",
"fields": [ "title", "summary", "show_title"]
}
},
"functions": [
{ "boost_factor": 2 },
{
"gauss": {
"published_at": {
"scale": "4w"
}
}
}
],
"score_mode": "multiply"
}
},
"highlight": {
"pre_tags": ["<span class='highlight'>"],
"post_tags": ["</span>"],
"fields": {
"summary": {},
"title": {},
"description": {}
}
}
}
The query works great for the episodes index because it has the published_at field for the gauss func to work its magic. However, when run across all indices, it fails for shows and keywords (still succeeds for episodes).
Is it possible to run a conditional gaussian decay function if the published_at field exists or on the single episodes index?
I'm willing to explore alternatives (i.e. run separate queries for each index and then merge the results), but thought a single query would be the best in terms of performance.
Thanks!
You can add a filter so that the Gaussian decay function is applied only to a subset of documents:
{
"filter": {
"exists": {
"field": "published_at"
}
},
"gauss": {
"published_at": {
"scale": "4w"
}
}
}
For docs that don't have the field you can return a score of 0:
{
"filter": {
"missing": {
"field": "published_at"
}
},
"script_score": {
"script": "0"
}
}
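Put together, the filtered function slots into the functions array of the original query. The Python sketch below builds such a request body under a few assumptions: weight is used in place of boost_factor, the function names are hypothetical, and the highlight section is omitted. A function whose filter does not match a document is skipped for that document. Whether this also avoids the failure on indices that have no mapping for published_at is not confirmed here and should be tested.

```python
from typing import Any, Dict, List


def decay_functions(date_field: str = "published_at", scale: str = "4w") -> List[Dict[str, Any]]:
    """A constant boost for every document, plus a decay only where the field exists."""
    return [
        {"weight": 2},  # assumption: weight stands in for boost_factor
        {
            "filter": {"exists": {"field": date_field}},
            "gauss": {date_field: {"scale": scale}},
        },
    ]


def search_body(text: str) -> Dict[str, Any]:
    """function_score query over the three text fields from the question."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        "type": "most_fields",
                        "fields": ["title", "summary", "show_title"],
                    }
                },
                "functions": decay_functions(),
                "score_mode": "multiply",
            }
        }
    }


body = search_body("AWESOME SAUCE")
```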
In newer Elasticsearch versions you have to use the script_score query; the function_score query is being deprecated.
