Complex aggregations with Elasticsearch

Suppose this is my Elasticsearch structure:
{
"_index": "my_index",
"_type": "person",
"_id": "ID",
"_source": {
...DATA...
}
}
{
"_index": "my_index",
"_type": "result",
"_id": "ID",
"_source": {
"personID": "personID",
"date": "timestamp",
"result": "integer",
"speciality": "categoryID"
}
}
I would like to get the 10 most "influential" people based on:
number of competitions in the last 30 days
number of competitions in the last year
competition results in the last 30 days
number of different specialities in the last 30 days
I'm thinking about using _score, but I don't know how to influence the score using values aggregated from the documents of type "result". This is what I'm trying to achieve:
POST my_index/_search?search_type=dfs_query_then_fetch
{
"size": 10,
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"term": {
"_type": {
"value": "person"
}
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": {
"query": {
// competitions in the last 30 days
},
"aggs": {
// count
}
},
"factor": 1
},
"weight": 0.1
}
]
}
}
}
Is this possible with just 1 request?
Is this a good approach?
Any tip on what to look at is appreciated.
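One way to reason about the ranking, before deciding how to express it in Elasticsearch, is to compute the four metrics per person from the "result" documents and combine them with weights. The sketch below does this client-side in Python. The weights, the helper names influence_scores and top_people, and the use of a plain sum for "results" are assumptions made for illustration; the question does not specify them.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical weights for the four criteria listed in the question.
WEIGHTS = {"count_30d": 0.1, "count_1y": 0.05, "results_30d": 0.5, "specialities_30d": 0.2}


def influence_scores(results, now, weights=WEIGHTS):
    """Return {personID: score} from documents shaped like the "result" type."""
    month_ago = now - timedelta(days=30)
    year_ago = now - timedelta(days=365)
    stats = defaultdict(lambda: {"count_30d": 0, "count_1y": 0,
                                 "results_30d": 0, "specialities": set()})
    for doc in results:
        s = stats[doc["personID"]]
        if doc["date"] >= year_ago:
            s["count_1y"] += 1
        if doc["date"] >= month_ago:
            s["count_30d"] += 1
            s["results_30d"] += doc["result"]          # assumption: results are summed
            s["specialities"].add(doc["speciality"])   # distinct specialities only
    return {
        person: (weights["count_30d"] * s["count_30d"]
                 + weights["count_1y"] * s["count_1y"]
                 + weights["results_30d"] * s["results_30d"]
                 + weights["specialities_30d"] * len(s["specialities"]))
        for person, s in stats.items()
    }


def top_people(results, now, n=10):
    """Return the n personIDs with the highest combined score."""
    scores = influence_scores(results, now)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

The same four metrics could be obtained from Elasticsearch by aggregating the "result" documents per personID; the combination step would then run on the aggregated buckets instead of on raw documents.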

Related

elasticsearch - get intermediate scores within 'function_score'

Here's my index
POST /blogs/1
{
"name" : "learn java",
"popularity" : 100
}
POST /blogs/2
{
"name" : "learn elasticsearch",
"popularity" : 10
}
My search query:
GET /blogs/_search
{
"query": {
"function_score": {
"query": {
"match": {
"name": "learn"
}
},
"script_score": {
"script": {
"source": "_score*(1+Math.log(1+doc['popularity'].value))"
}
}
}
}
}
which returns:
[
{
"_index": "blogs",
"_type": "1",
"_id": "AW5fxnperVbDy5wjSDBC",
"_score": 0.58024323,
"_source": {
"name": "learn elastic search",
"popularity": 100
}
},
{
"_index": "blogs",
"_type": "1",
"_id": "AW5fxqmL8cCMCxtBYOyC",
"_score": 0.43638366,
"_source": {
"name": "learn java",
"popularity": 10
}
}
]
Problem: I need to return an extra field in the results that gives me the raw score (just TF/IDF, without taking popularity into account).
Things I have explored: script_fields (which doesn't give access to _score at fetch time).
The problem is the way you are querying: function_score overwrites the _score value. If you move the script into a sort instead, _score is left unchanged and can be returned by the same query.
You can try querying this way:
{
"query": {
"match": {
"name": "learn"
}
},
"sort": [
{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "_score*(1+Math.log(1+doc['popularity'].value))"
},
"order": "desc"
}
},
"_score"
]
}
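With a sort on both the script and _score, each hit in the response should carry a "sort" array whose values follow the order of the sort clauses, so the boosted value and the raw score can be read side by side. A minimal Python sketch, assuming the response has already been parsed into a dict; split_scores and the sample hit values are hypothetical:

```python
def split_scores(hit):
    """Return the boosted and raw scores of a hit sorted by [script, _score]."""
    boosted, raw = hit["sort"][:2]  # same order as the sort clauses in the request
    return {"id": hit["_id"], "boosted": boosted, "raw": raw}


# Hypothetical hit, shaped like one entry of hits.hits in a search response.
sample_hit = {"_id": "doc-1", "_source": {"name": "learn java"}, "sort": [1.5, 0.25]}
print(split_scores(sample_hit))
```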

Elastic scripted aggregation of nested key/value pairs

I want to have a scripted aggregation of key/value pairs in a nested array in Elasticsearch. An example of the documents returned is as follows:
"hits": [
{
"_index": "testdan",
"_type": "year",
"_id": "AVtXirjYuoFS95t7pfkg",
"_score": 1,
"_source": {
"m_iYear": 2006,
"m_iTopicID": 11,
"m_People": [
{
"name": "Petrovic, Rade",
"value": 3.70370364
},
{
"name": "D. Kirovski",
"value": 3.70370364
}
]
}},
{
"_index": "testdan",
"_type": "year",
"_id": "AVtXirjYuoFS95t7pfkg",
"_score": 1,
"_source": {
"m_iYear": 2007,
"m_iTopicID": 11,
"m_People": [
{
"name": "Petrovic, Rade",
"value": 6.70370364
},
{
"name": "D. Kirovski",
"value": 2.70370364
}
]
}}
]
I would like to aggregate an average value for each person in m_People across all documents, as follows:
Petrovic, Rade = (3.70370364 + 6.70370364) / 2 = 5.20
D. Kirovski = (3.70370364 + 2.70370364) / 2 = 3.20
The divisor for the average should be the number of years in which that name appears. A name may not appear in every year, for instance.
If this is more difficult because people have no unique IDs, I plan to add an ID for each person. How would you script this so that, instead of returning all people and looping through them at the front end, I get just an array of people and their averages?
You may be able to achieve this sort of aggregation by utilizing Kibana Scripted Fields. See the examples section. This assumes you are using Elasticsearch 5.0 as the scripting language is Painless.
You can achieve this with a nested aggregation pretty easily (assuming m_People is mapped as a nested field). For each year, we're aggregating on the people's names and then computing the average value for each of them.
{
"size": 0,
"aggs": {
"years": {
"terms": {
"field": "m_iYear"
},
"aggs": {
"names": {
"nested": {
"path": "m_People"
},
"aggs": {
"names": {
"terms": {
"field": "m_People.name"
},
"aggs": {
"average": {
"avg": {
"field": "m_People.value"
}
}
}
}
}
}
}
}
}
}
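As a cross-check of the numbers an aggregation should produce, the per-person average over the two sample documents can be computed directly. This Python sketch only mirrors the sample data from the question; average_per_person is a hypothetical helper. It averages across all years, which corresponds to running the nested names/average aggregation without the outer per-year buckets.

```python
from collections import defaultdict

# Sample documents copied from the question (only the relevant fields).
docs = [
    {"m_iYear": 2006, "m_People": [{"name": "Petrovic, Rade", "value": 3.70370364},
                                   {"name": "D. Kirovski", "value": 3.70370364}]},
    {"m_iYear": 2007, "m_People": [{"name": "Petrovic, Rade", "value": 6.70370364},
                                   {"name": "D. Kirovski", "value": 2.70370364}]},
]


def average_per_person(documents):
    """Average each person's value over the documents in which they appear."""
    values = defaultdict(list)
    for doc in documents:
        for person in doc["m_People"]:
            values[person["name"]].append(person["value"])
    # Dividing by len(v) uses the number of years in which the name appears.
    return {name: sum(v) / len(v) for name, v in values.items()}


for name, avg in average_per_person(docs).items():
    print(f"{name}: {avg:.4f}")
```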

Query for: how many elements of an array match a document attribute in Elasticsearch

I've many documents having an attribute that is an array of values like these:
{
"_index": "myindex",
"_type": "mytype",
"_id": "myid1",
"_source": {
"tags": [
"devid",
"batman",
"obama"
]
}
},
{
"_index": "myindex",
"_type": "mytype",
"_id": "myid2",
"_source": {
"tags": [
"devid",
"superman"
]
}
}
I have an array of elements like: ["devid", "batman", "pippo"]
I want to get all the documents matching at least one element of the array, sorted by how many elements are matched.
For example, I expect that myid1 will have a higher score than myid2.
How can I do this?
At the moment I'm "stuck" here:
{
"query": {
"function_score": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"tags": ["devid", "batman", "pippo"]
}
}
}
}
}
}
}
It only filters by terms and sets 1 as score to both.
I'm a noob with Elasticsearch; any hint is welcome!
Using the terms query instead of the filter would give documents that match more terms a higher score.
Example:
{
"query": {
"terms": {
"tags": [
"devid",
"batman",
"pippo"
]
}
}
}
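The ordering the question asks for can be illustrated outside Elasticsearch by counting, per document, how many of the requested tags it contains. This Python sketch uses the two sample documents from the question; rank_by_matches is a hypothetical helper, and it only models the expected ordering, not the actual relevance score Elasticsearch would compute.

```python
def rank_by_matches(documents, wanted):
    """Return (doc_id, match_count) pairs for documents matching at least one tag."""
    wanted = set(wanted)
    ranked = []
    for doc_id, tags in documents.items():
        matches = len(wanted.intersection(tags))
        if matches:  # keep only documents that match at least one element
            ranked.append((doc_id, matches))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


documents = {
    "myid1": ["devid", "batman", "obama"],
    "myid2": ["devid", "superman"],
}
print(rank_by_matches(documents, ["devid", "batman", "pippo"]))
# → [('myid1', 2), ('myid2', 1)]
```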

Boosting results based on selected types in elasticsearch

I have different types indexed in Elasticsearch, and I want to boost results for some selected types. What should I do?
I could use a type filter in a boosting query, but the type filter allows only one type per filter. I need results to be boosted on the basis of multiple types.
Example:
I have Person, Event, and Location data indexed in Elasticsearch, where Person, Location, and Event are my types.
I am searching for the keyword 'London' in all types, but I want Person and Event records to be boosted over Location records.
How can I achieve this?
One way to get the desired functionality is to wrap your query inside a bool query and use the should clause to boost certain documents.
Small example:
POST test/person
{
"title": "london elise moore"
}
POST test/event
{
"title" : "london is a great city"
}
Without boost:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "london"
}
}
]
}
}
}
With the following response:
"hits": {
"total": 2,
"max_score": 0.2972674,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "AVVx621GYvUb9aQn6r5X",
"_score": 0.2972674,
"_source": {
"title": "london elise moore"
}
},
{
"_index": "test",
"_type": "event",
"_id": "AVVx63LrYvUb9aQn6r5Y",
"_score": 0.26010898,
"_source": {
"title": "london is a great city"
}
}
]
}
And now with the added should clause:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "london"
}
}
],
"should": [
{
"term": {
"_type": {
"value": "event",
"boost": 2
}
}
}
]
}
}
}
Which gives back the following response:
"hits": {
"total": 2,
"max_score": 1.0326607,
"hits": [
{
"_index": "test",
"_type": "event",
"_id": "AVVx63LrYvUb9aQn6r5Y",
"_score": 1.0326607,
"_source": {
"title": "london is a great city"
}
},
{
"_index": "test",
"_type": "person",
"_id": "AVVx621GYvUb9aQn6r5X",
"_score": 0.04235228,
"_source": {
"title": "london elise moore"
}
}
]
}
You could even leave out the extra boost in the should clause, because a matching should clause boosts the result anyway :)
Hope this helps!
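Since the question asks about boosting several types at once, the same bool/should pattern can be extended with one term clause per type. The Python sketch below only builds the request body as a dict; build_boosted_query is a hypothetical helper, and the "title" field and boost value are taken from the example above rather than from the asker's real mapping.

```python
import json


def build_boosted_query(keyword, boosted_types, boost=2):
    """Build a bool query: must match the keyword, should match any boosted type."""
    should = [
        {"term": {"_type": {"value": type_name, "boost": boost}}}
        for type_name in boosted_types
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": keyword}}],
                "should": should,  # optional clauses: matching ones raise the score
            }
        }
    }


print(json.dumps(build_boosted_query("london", ["person", "event"]), indent=2))
```

Documents of type location still match through the must clause; they simply receive no extra score from the should clauses.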
I see two ways of doing that, but both use scripts.
1. Using sorting
POST c1_1/_search
{
"from": 0,
"size": 10,
"sort": [
{
"_script": {
"order": "desc",
"type": "number",
"script": "double boost = 1; if(doc['_type'].value == 'Person') { boost *= 2 }; if(doc['_type'].value == 'Event') { boost *= 3}; return _score * boost; ",
"params": {}
}
},
{
"_score": {}
}
],
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*",
"default_operator": "and"
}
}
],
"minimum_should_match": "1"
}
}
}
2. Using function score
POST c1_1/_search
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*",
"default_operator": "and"
}
}
],
"minimum_should_match": "1"
}
},
"script_score": {
"script": "_score * (doc['_type'].value == 'Person' || doc['_type'].value == 'Event'? 2 : 1)"
}
}
}
}

How to get total score specific to each row

I need an Elasticsearch GET query that shows the total score of each student, obtained by adding up the marks that student earned in all subjects. Instead, I am getting the total score of all students across every subject.
GET /testindex/testindex/_search
{
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
}
}
},
"aggs": {
"total": {
"sum": {
"script" : "doc['physics'].value + doc['maths'].value + doc['chemistry'].value"
}
}
}
}
Output
{
....
"hits": [
{
"_index": "testindex",
"_type": "testindex",
"_id": "1",
"_score": 1,
"_source": {
"personalDetails": {
"name": "viswa",
"age": "33"
},
"marks": {
"physics": 18,
"maths": 5,
"chemistry": 34
},
"remarks": [
"hard working",
"intelligent"
]
}
},
{
"_index": "testindex",
"_type": "testindex",
"_id": "2",
"_score": 1,
"_source": {
"personalDetails": {
"name": "bob",
"age": "13"
},
"marks": {
"physics": 48,
"maths": 45,
"chemistry": 44
},
"remarks": [
"hard working",
"intelligent"
]
}
}
]
},
"aggregations": {
"total": {
"value": 194
}
}
}
Expected output:
I would like to get the total marks earned across subjects for each student, rather than the total for all students.
What changes do I need to make to the query to achieve this?
{
"query": {
"filtered": {
"query": {
"match_all": {}
}
}
},
"aggs": {
"student": {
"terms": {
"field": "personalDetails.name",
"size": 10
},
"aggs": {
"total": {
"sum": {
"script": "doc['physics'].value + doc['maths'].value + doc['chemistry'].value"
}
}
}
}
}
}
Be careful, though: for the student terms aggregation you need a field that uniquely identifies each student (a personal ID, for example). The _id itself might work, but you need to store it.
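The expected per-student totals for the two sample documents can be worked out by hand, which also shows why the bucket key must be unique. This Python sketch mirrors the sample data from the question; totals_by_student is a hypothetical helper that groups by name, just as the terms aggregation above groups by personalDetails.name.

```python
# Sample data copied from the question's output (only the relevant fields).
students = [
    {"name": "viswa", "marks": {"physics": 18, "maths": 5, "chemistry": 34}},
    {"name": "bob", "marks": {"physics": 48, "maths": 45, "chemistry": 44}},
]


def totals_by_student(records):
    """Sum all marks per name; records sharing a name end up in one bucket."""
    totals = {}
    for record in records:
        totals[record["name"]] = totals.get(record["name"], 0) + sum(record["marks"].values())
    return totals


totals = totals_by_student(students)
print(totals)                 # → {'viswa': 57, 'bob': 137}
print(sum(totals.values()))   # → 194, the single total the original query returned
```

If two different students were both named "bob", their marks would be merged into one total, which is the reason for grouping on a unique field instead.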
