Elasticsearch results ORDERed BY and GROUPed BY - sorting

I have this mapping in my Elasticsearch index:
{
"my_index": {
"mappings": {
"vehicules": {
"properties": {
"name": {
"type": "text"
},
"category_id": {
"type": "integer"
},
"price": {
"type": "float"
}
}
}
}
}
}
In this index, I've inserted some demo data:
+--------------+-------------+-------+
| Name | Category ID | Price |
+--------------+-------------+-------+
| Car 1 | 1 | 1500 |
| Car 2 | 1 | 4000 |
| Car 3 | 1 | 2500 |
| Motorcycle 1 | 2 | 3000 |
| Motorcycle 2 | 2 | 1400 |
| Motorcycle 3 | 2 | 2700 |
| Truck 1 | 3 | 19000 |
| Truck 2 | 3 | 15000 |
+--------------+-------------+-------+
I would like to sort all the products by price, ascending, and group the results by category. The categories themselves have to be sorted by the lowest price among their products, which gives:
{
"2": [ <= category containing the lowest price overall (1400)
{
"name": "Motorcycle 2",
"price": 1400
},
{
"name": "Motorcycle 3",
"price": 2700
},
{
"name": "Motorcycle 1",
"price": 3000
}
],
"1": [
{
"name": "Car 1",
"price": 1500
},
{
"name": "Car 3",
"price": 2500
},
{
"name": "Car 2",
"price": 4000
}
],
"3": [
{
"name": "Truck 2",
"price": 15000
},
{
"name": "Truck 1",
"price": 19000
}
]
}
Is it possible to get this kind of result, or something close to it, with ES? I'm a complete beginner with ES and I've tried many different queries in the Kibana Dev Tools, without success.
I think I found a query that gives the desired result. I'm not sure it's fully optimized, but it works.
GET my_index/my_type/_search
{
"size": 0,
"aggs": {
"grouped_by_cat": {
"terms": {
"field": "category_id",
"order": {
"min_price_aggs": "asc"
}
},
"aggs": {
"min_price_aggs": {
"min": {
"field": "price"
}
},
"list_top_hits": {
"top_hits": {
"_source": {
"includes": [
"name",
"price"
]
},
"sort": [
{
"price": {
"order": "asc"
}
}
],
"size": 10000
}
}
}
}
}
}
Does this query seem correct to you?

You can achieve the above result with the following query:
GET my_index/vehicles/_search
{
"size": 0,
"aggs" : {
"category_id_aggs": {
"terms" : {
"field" : "category_id"
},
"aggs": {
"data": {
"top_hits": {
"size": 10,
"sort": [{"price":"asc"}],
"_source": {
"includes" : ["name","price"]
}
}
}
}
}
}
}
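Both queries return buckets rather than the flat `{category: [items]}` shape shown in the question, so some reshaping is needed on the client. As far as I know, a `terms` aggregation without an `order` clause returns buckets by document count, so the sketch below assumes the first query (ordered by `min_price_aggs`) and its aggregation names `grouped_by_cat` and `list_top_hits`. The function name and the hand-written sample response are illustrative assumptions, not output captured from a real cluster.

```python
from typing import Any, Dict, List


def group_hits_by_category(response: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Reshape a terms + top_hits response into {category_id: [source, ...]}.

    Bucket order is kept as returned by Elasticsearch, so the ordering by
    minimum price done in the query is preserved (dicts keep insertion order).
    """
    grouped: Dict[str, List[Dict[str, Any]]] = {}
    for bucket in response["aggregations"]["grouped_by_cat"]["buckets"]:
        hits = bucket["list_top_hits"]["hits"]["hits"]
        grouped[str(bucket["key"])] = [hit["_source"] for hit in hits]
    return grouped


# Hand-written stand-in for a response (assumption: trimmed to the fields used here).
sample_response = {
    "aggregations": {
        "grouped_by_cat": {
            "buckets": [
                {
                    "key": 2,
                    "list_top_hits": {"hits": {"hits": [
                        {"_source": {"name": "Motorcycle 2", "price": 1400}},
                        {"_source": {"name": "Motorcycle 3", "price": 2700}},
                    ]}},
                },
                {
                    "key": 1,
                    "list_top_hits": {"hits": {"hits": [
                        {"_source": {"name": "Car 1", "price": 1500}},
                    ]}},
                },
            ]
        }
    }
}

grouped = group_hits_by_category(sample_response)
```

With the second query, the same function would need the names `category_id_aggs` and `data` instead.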

Related

Elastic Search Apply Filter To Aggregated data

I have an index with user locations. I want to pick the latest location for each user and then apply a somewhat complex filter to those latest locations. I managed to pick the latest locations using an aggregation with a sort, but I cannot find a way to apply a filter to them afterwards.
Neither filter nor post_filter produces the expected results, because they are applied to hits, not to the aggregated locations. I saw some comments about the bucket script aggregation, but the available examples are very simple.
Is there a way to do this in Elasticsearch? It is a very easy query to write in SQL. Any help is appreciated!
Sample ES query to aggregate locations:
GET locations/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
}
}
},
"aggs": {
"users": {
"terms": {
"field": "user_id"
},
"aggs": {
"userLocations": {
"top_hits": {
"sort": [{
"created": {
"order": "desc"
}
}],
"size": 1
}
}
}
}
}
}
Sample SQL query (oversimplified, not based on actual data):
SELECT subquery.* FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY UserId ORDER BY Created DESC) AS "RowNumber",
UserId,
Latitude,
Longitude,
Active,
OrganizationId
FROM Locations
) subquery
WHERE subquery.RowNumber = 1 AND Latitude < 10 AND Longitude > 0 AND Active = 1 AND OrganizationId = 123
Sample mapping:
{
"mappings": {
"properties": {
"user_id": {
"type": "integer"
},
"location": {
"type": "geo_point"
},
"location_source_type": {
"type": "integer"
},
"created": {
"type": "date"
},
"active": {
"type": "integer"
},
"org_id": {
"type": "integer"
}
}
}
}
Sample data:
user_id location location_source_type created active org_id
1 (0,0) 1 2020-01-01 1 123
1 (0,0) 1 2020-02-01 1 123
1 (9,9) 1 2020-03-01 1 123
2 (8,8) 1 2020-04-01 1 123
Expected result:
1 (9,9) 1 2020-03-01 1 123
2 (8,8) 1 2020-04-01 1 123
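To make the intended semantics concrete, here is a minimal client-side sketch of "latest row per user, then filter", run on the sample data above. It is not an Elasticsearch feature, only an illustration of the logic the SQL expresses; the function names and the choice of filter fields (`org_id`, `active`) are assumptions made for the example.

```python
from datetime import date

# The sample data from above, as plain dictionaries.
rows = [
    {"user_id": 1, "location": (0, 0), "location_source_type": 1,
     "created": date(2020, 1, 1), "active": 1, "org_id": 123},
    {"user_id": 1, "location": (0, 0), "location_source_type": 1,
     "created": date(2020, 2, 1), "active": 1, "org_id": 123},
    {"user_id": 1, "location": (9, 9), "location_source_type": 1,
     "created": date(2020, 3, 1), "active": 1, "org_id": 123},
    {"user_id": 2, "location": (8, 8), "location_source_type": 1,
     "created": date(2020, 4, 1), "active": 1, "org_id": 123},
]


def latest_per_user(rows):
    """Keep only the most recent row for each user_id."""
    latest = {}
    for row in rows:
        current = latest.get(row["user_id"])
        if current is None or row["created"] > current["created"]:
            latest[row["user_id"]] = row
    return list(latest.values())


def filter_latest(rows, org_id, active=1):
    """Apply the filter AFTER picking the latest row, as the SQL does."""
    return [
        row for row in latest_per_user(rows)
        if row["org_id"] == org_id and row["active"] == active
    ]


result = filter_latest(rows, org_id=123)
```

The key point is the order of operations: filtering first and then taking the latest row could return an older location for a user whose newest location does not match.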
This is an example of the filter I want to apply to the aggregated data:
"post_filter": {
"bool": {
"should": [{
"bool": {
"must": [
{ "term": { "location_source_type": 1 }},
{ "term" : { "org_id" : "123" }},
{
"range": { "created": { "lte": "2020-01-01T00:00:00" } }
}
],
"filter": [{
"geo_distance": {
"distance": "1000km",
"location": {
"lat": 9,
"lon": 9
}
}
}]
}
}]
}
}

How to filter and aggregate nested objects over the last 12 months with Elasticsearch?

I have documents with the following mapping:
{
"id": {"type": "integer"},
"owner": {"type": "object"},
"company_id": {"type": "integer"},
"summary": {"type": "object"},
"create_date": {"type": "date"},
}
So basically I want to filter id of owner and 12 months from now based on create_date. And then perform aggregate on keys inside summary objects.
Example of data I have:
id | owner | company_id | summary | create_date
01 | {"id": 1, "name": "x"} | 1 | {"data1": 2, "data2": 5, "data3": 6} | "2020-09-22T01:04:17.852112Z"
02 | {"id": 2, "name": "y"} | 2 | {"data1": 2, "data2": 5, "data4": 6} | "2020-09-17T04:11:45.851231Z"
03 | {"id": 3, "name": "z"} | 3 | {"data1": 0, "data2": 4, "data3": 6} | "2019-02-02T12:19:27.852121Z"
The output I want:
month-year | aggregate of summary keys
09-2020 (any indicator/format of month and year) |{"data1":1, "data2": 5, "data3": 6, "data4": 6}
Here I want the average of every key inside the summary object, for each month of the last 12 months.
GET data/_search
{
"size": 0, // <====== No hits are returned, only the aggregations
"query": {
"bool": {
"filter": [
{
"range": {
"create_date": {
"gte": "now-12M" // <========== 'M' means months, 'now' is the current timestamp
}
}
},
{
"term": {
"owner.id": 4
}
}
]
}
},
"aggs": {
"NAME": { //<====== Any name you choose for this aggregation
"terms": { // <============ Groups documents by the field and returns a count per group
"field": "summary.v1",
"size": 10 // <==== How many buckets to return
}
}
}
}
I added some explanatory comments to the query. Other important topics to read about:
Queries & Filters
Terms Aggregation
Edit: Use the aggregation part below in the existing query if you need monthly averages.
"aggs": {
"monthly_grouping": {
"date_histogram": {
"field": "create_date",
"interval": "month",
"missing": "0"
},"aggs": {
"average_V1": {
"avg": {
"field": "summary.v1"
}
},
"average_V2": { //<===== Add other fields in the same way if required
"avg": {
"field": "summary.v2"
}
}
}
}
}
Read about the date histogram aggregation in the Elasticsearch documentation.
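Since one `avg` sub-aggregation is needed per summary key, the aggregation body can be generated rather than written by hand. The sketch below is one way to do that, assuming the key names from the question's sample data (`data1` to `data4`); the function name is hypothetical and the `missing` setting is left out for brevity. It keeps `"interval": "month"` as in the answer; I believe newer Elasticsearch versions expect `calendar_interval` instead, so adjust for your version.

```python
from typing import Any, Dict, Iterable


def build_monthly_avg_aggs(summary_keys: Iterable[str],
                           date_field: str = "create_date") -> Dict[str, Any]:
    """Build a date_histogram with one avg sub-aggregation per summary key."""
    sub_aggs = {
        f"average_{key}": {"avg": {"field": f"summary.{key}"}}
        for key in summary_keys
    }
    return {
        "monthly_grouping": {
            # One bucket per calendar month of the date field.
            "date_histogram": {"field": date_field, "interval": "month"},
            "aggs": sub_aggs,
        }
    }


# Keys taken from the sample documents in the question.
aggs = build_monthly_avg_aggs(["data1", "data2", "data3", "data4"])
```

The resulting dictionary can be placed under the `"aggs"` key of the search body, next to the `query` that filters on owner and date range.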

elastic Aggregation one result

I want to group by a property and get only the cheapest item of each group in the result.
Given the following data in an Elasticsearch index:
Name | price | type
bear | 15 | animal
bal | 4 | toy
duck | 10 | animal
bear | 13 | animal
doll | 16 | toy
dog | 20 | animal
I would like the following result:
Name | price | type
duck | 10 | animal
bal | 4 | toy
I've tried to get such a result with the following query:
{
"aggregations": {
"aggregation_1": {
"terms": {
"field": "type.keyword",
"order": {
"price_min": "desc"
},
"size": 5
},
"aggregations": {
"price_min": {
"min": {
"field": "price"
}
}
}
}
},
"size": 10
}
But that query returns all items. Is an aggregation the wrong way to get what I want?
You want to do it like this, i.e. for each product type, find the hit with the lowest price:
{
"size": 0,
"aggregations": {
"aggregation_1": {
"terms": {
"field": "type.keyword",
"size": 5
},
"aggregations": {
"cheapest": {
"top_hits": {
"size": 1,
"sort": {
"price": "asc"
}
}
}
}
}
}
}
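As a sanity check on what this aggregation computes, the same "cheapest hit per type" logic can be sketched client-side on the sample rows from the question. This is plain Python for illustration only, not how Elasticsearch executes the aggregation; the function name is an assumption.

```python
# Sample rows from the question: (name, price, type).
rows = [
    ("bear", 15, "animal"),
    ("bal", 4, "toy"),
    ("duck", 10, "animal"),
    ("bear", 13, "animal"),
    ("doll", 16, "toy"),
    ("dog", 20, "animal"),
]


def cheapest_per_type(rows):
    """Return {type: (name, price, type)} holding the lowest-priced row per type."""
    cheapest = {}
    for name, price, kind in rows:
        # Replace the stored row only when a strictly cheaper one is found.
        if kind not in cheapest or price < cheapest[kind][1]:
            cheapest[kind] = (name, price, kind)
    return cheapest


result = cheapest_per_type(rows)
```

In the query, the `terms` aggregation plays the role of the grouping by type, and `top_hits` with `size: 1` sorted by ascending price plays the role of keeping the cheapest row.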

multiple words act as single word in search - Elasticsearch

I have an issue with multi-word tags such as social media, two words, or tag with many spaces: the score is multiplied for each word in the search query.
How can I search for two words as a single term, instead of getting a different score when searching for two and for two words?
Here is a visual representation of the current scores:
+-----------------------+-------+
| search | score |
+-----------------------+-------+
| two | 2.76 |
| two words | 5.53 |
| tag with many spaces | 11.05 |
| singleword | 2.76 |
+-----------------------+-------+
Here is a visual representation of what I want:
+-----------------------+-------+
| search | score |
+-----------------------+-------+
| two | 2.76 |
| two words | 2.76 |
| tag with many spaces | 2.76 |
| singleword | 2.76 |
+-----------------------+-------+
There are multiple tags in each document. The search input is split into tags on commas in PHP and turned into a query like the one below.
Assuming a document has multiple tags, including two words and singleword, this would be the search query:
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"tags.name": "two words"
}
},
{
"match": {
"tags.name": "singleword"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "tags.votes"
}
}
],
"boost_mode": "multiply"
}
}
The score is different when searching for two instead of two words.
Here is what the result looks like when searching for two words:
{
"_index": "index",
"_type": "type",
"_id": "u10q42cCZsbFNf1W0Tdq",
"_score": 4.708793,
"_source": {
"url": "example.com",
"title": "title of the document",
"description": "some description of the document",
"popularity": 9,
"tags": [
{
"name": "two words",
"votes": 1
},
{
"name": "singleword",
"votes": 1
},
{
"name": "othertag",
"votes": 1
},
{
"name": "random",
"votes": 1
}
]
}
}
Here is the result when searching for two instead of two words:
{
"_index": "index",
"_type": "type",
"_id": "u10q42cCZsbFNf1W0Tdq",
"_score": 3.4481666,
"_source": {
"url": "example.com",
"title": "title of the document",
"description": "some description of the document",
"popularity": 9,
"tags": [
{
"name": "two words",
"votes": 1
},
{
"name": "singleword",
"votes": 1
},
{
"name": "othertag",
"votes": 1
},
{
"name": "random",
"votes": 1
}
]
}
}
Here is the mapping (for the tags specifically):
"tags": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"votes": {
"type": "long"
}
}
}
I have tried searching with "\"two words\"" and "*two words*", but it made no difference.
Is it possible to achieve this?
You should match against the non-analyzed string and switch to a term query.
Can you try:
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"term": {
"tags.name.keyword": "two words"
}
},
{
"term": {
"tags.name.keyword": "singleword"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "tags.votes"
}
}
],
"boost_mode": "multiply"
}
}
With your current implementation, a match query for "two words" is analyzed into the tokens "two" and "words", and both are searched for in your tags. Documents with the tag "two words" therefore match both tokens and get a higher score.
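Since the question mentions that the search input is split on commas before the query is built, here is a minimal sketch of that step producing `term` clauses on `tags.name.keyword`. It is written in Python rather than PHP, the function name is hypothetical, and the rest of the body simply mirrors the query above. Keep in mind that `term` queries on a keyword field match the exact stored value, so casing and inner spacing have to match the indexed tag.

```python
from typing import Any, Dict


def build_tag_query(raw_tags: str) -> Dict[str, Any]:
    """Turn 'two words, singleword' into a function_score query with term clauses."""
    # Split on commas, trim surrounding whitespace and drop empty entries.
    tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
    should = [{"term": {"tags.name.keyword": tag}} for tag in tags]
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"should": should}},
                "functions": [
                    {"field_value_factor": {"field": "tags.votes"}}
                ],
                "boost_mode": "multiply",
            }
        }
    }


body = build_tag_query("two words, singleword")
```

Each tag becomes exactly one clause regardless of how many words it contains, which is what keeps a multi-word tag from matching once per word.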

Using filters to count values in Kibana / Visualize?

(I am quite new to the ELK stack and may be asking something obvious...)
I have documents describing customer information, with data such as name, address, age, etc.
Not all of these fields are always present, and I would like to know the number of documents in which each one is filled.
If the data looks like:
PUT customers
{
"mappings": {
"customer": {
"properties": {
"id": {
"type": "integer"
},
"category": {
"type": "keyword"
},
"email": {
"type": "text"
},
"age": {
"type": "integer"
},
"address": {
"type": "text"
}
}
}
}
}
POST _bulk
{"index":{"_index":"customers","_type":"customer"}}
{"id":"1","category":"aa","email":"sam#test.com"}
{"index":{"_index":"customers","_type":"customer"}}
{"id": "2", "category" : "aa", "age": "5"}
{"index":{"_index":"customers","_type":"customer"}}
{"id": "3", "category" : "aa", "email": "bob#test.com", "age": "36"}
{"index":{"_index":"customers","_type":"customer"}}
{"id": "4", "category" : "bb", "email": "kim#test.com", "age": "42", "address": "london"}
The idea is to have a data table like this in Kibana Visualize:
+----------+-------+-------+-----+---------+
| category | total | email | age | address |
+----------+-------+-------+-----+---------+
| aa | 3 | 2 | 2 | 0 |
| bb | 1 | 1 | 1 | 1 |
+----------+-------+-------+-----+---------+
(e.g. we have 3 customers in category "aa"; among them 2 gave their email, 2 gave their age, and none gave their address)
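The numbers in that table can be checked against the bulk data with a small client-side sketch. It only illustrates the counting logic (a field counts as filled when it is present and not null, roughly what an `exists` filter tests); the function name is an assumption and nothing here talks to Elasticsearch or Kibana.

```python
from collections import defaultdict

# The four documents from the bulk request above.
docs = [
    {"id": "1", "category": "aa", "email": "sam#test.com"},
    {"id": "2", "category": "aa", "age": "5"},
    {"id": "3", "category": "aa", "email": "bob#test.com", "age": "36"},
    {"id": "4", "category": "bb", "email": "kim#test.com", "age": "42",
     "address": "london"},
]


def count_filled(docs, fields=("email", "age", "address")):
    """Per category: total documents and how many have each field filled."""
    table = defaultdict(lambda: {"total": 0, **{field: 0 for field in fields}})
    for doc in docs:
        row = table[doc["category"]]
        row["total"] += 1
        for field in fields:
            # Present and not null counts as filled.
            if doc.get(field) is not None:
                row[field] += 1
    return dict(table)


table = count_filled(docs)
```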
I can work out how to do that with a query like:
POST /customers/_search?size=0
{
"aggs": {
"category": {
"terms": {
"field": "category"
},
"aggs": {
"count_email": {
"filter": {
"exists": {
"field": "email"
}
}
},
"count_age": {
"filter": {
"exists": {
"field": "age"
}
}
},
"count_address": {
"filter": {
"exists": {
"field": "address"
}
}
}
}
}
}
}
But I can't find how to do that in Kibana Visualize.
Should I use scripted fields? JSON input? How? Is there a better way?
Thanks for your advice.
In the UI I was able to split the rows using a terms aggregation on the category keyword field.
Below is a URL to get you started.
It creates a data table with a count metric and splits the rows by the category keyword term.
http://localhost:5601/app/kibana#/visualize/create?type=table&indexPattern=customers&_g=()&_a=(filters:!(),linked:!f,query:(query_string:(analyze_wildcard:!t,query:'*')),uiState:(vis:(params:(sort:(columnIndex:!n,direction:!n)))),vis:(aggs:!((enabled:!t,id:'1',params:(),schema:metric,type:count),(enabled:!t,id:'2',params:(field:category.keyword,order:desc,orderBy:_term,size:2),schema:bucket,type:terms)),listeners:(),params:(perPage:10,showMeticsAtAllLevels:!f,showPartialRows:!f,showTotal:!f,sort:(columnIndex:!n,direction:!n),totalFunc:sum),title:'CategoryTable',type:table))

Resources