How to sort data in elastic search based on the filter data - elasticsearch

I am relatively new to Elasticsearch. My data is stored in Elasticsearch in the following form:
[{
"name": "user1",
"city": [{
"name": "city1",
"count": 18
},{
"name": "city2",
"count": 15
},{
"name": "city3",
"count": 10
},{
"name": "city4",
"count": 5
}]
},{
"name": "user2",
"city": [{
"name": "city2",
"count": 2
},{
"name": "city5",
"count": 5
},{
"name": "city6",
"count": 8
},{
"name": "city8",
"count": 15
}]
},{
"name": "user3",
"city": [{
"name": "city1",
"count": 2
},{
"name": "city5",
"count": 5
},{
"name": "city7",
"count": 28
},{
"name": "city2",
"count": 1
}]
}]
What I am trying to do is find the users who have "city2" in their city list and order them by the "count" of "city2".
Here is the query I have tried:
{
"sort": [{
"city.count": {
"order" : "desc"
}
}],
"query": {
"bool": {
"must": [
{"match": {"city.name": "city2"}}
]
}
}
}
I cannot figure out how to do the sort part.
The sort considers the "count" values of all the cities in each matching document, but I want the ordering to be based only on the "count" of "city2".
Any help would be appreciated. Thanks in advance.

Since the field city is an object and not a nested object, what you are trying to achieve won't be possible. The reason is that when you define a field as object, Elasticsearch flattens the values of each object field into an array. So,
"city": [
{
"name": "city1",
"count": 18
},
{
"name": "city2",
"count": 15
},
{
"name": "city3",
"count": 10
},
{
"name": "city4",
"count": 5
}
]
is indexed as :
"city.name" : ["city1", "city2", "city3", "city4"]
"city.count": [18, 15, 10, 5]
As you can see, because of the way Elasticsearch indexes the object, the relation between each city and its count is lost.
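A rough way to picture this flattening is the plain-Python sketch below. The helper `flatten_object_array` is hypothetical and only imitates the effect described above; it is not how Lucene actually stores the values.

```python
from collections import defaultdict


def flatten_object_array(field, objects):
    """Imitate how an array of plain objects ends up as per-key value lists."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            # Each inner key becomes its own multi-valued field.
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


cities = [
    {"name": "city1", "count": 18},
    {"name": "city2", "count": 15},
    {"name": "city3", "count": 10},
    {"name": "city4", "count": 5},
]

flat = flatten_object_array("city", cities)
print(flat["city.name"])
print(flat["city.count"])
```

After flattening, nothing in `flat` records that 15 belonged to "city2", which is why a sort on "city.count" cannot be restricted to one city.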
So, whenever you want to preserve that relation, you should define the field as a nested type.
{
"city": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"count": {
"type": "long"
}
}
}
}
Sorting can then be achieved with a nested sort on this field, filtered to "city2":
{
"sort": [
{
"city.count": {
"order": "desc",
"mode": "avg",
"nested": {
"path": "city",
"filter": {
"match": {
"city.name": "city2"
}
}
}
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"city.name": "city2"
}
}
]
}
}
}
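To make the effect of the nested sort concrete, here is a small plain-Python emulation of what the sort value per document would be. It is only a sketch under assumptions: the function `nested_sort_value` is hypothetical, the sample data is shortened from the question, and the real computation happens inside Elasticsearch.

```python
from statistics import mean

# Sort modes that reduce the filtered nested values to a single sort value.
MODES = {"min": min, "max": max, "avg": mean, "sum": sum}


def nested_sort_value(doc, path, field, filter_name, mode="avg"):
    """Reduce the values of `field` over nested objects whose name matches."""
    values = [obj[field] for obj in doc[path] if obj["name"] == filter_name]
    return MODES[mode](values) if values else None


users = [
    {"name": "user2", "city": [{"name": "city2", "count": 2},
                               {"name": "city8", "count": 15}]},
    {"name": "user3", "city": [{"name": "city7", "count": 28},
                               {"name": "city2", "count": 1}]},
    {"name": "user1", "city": [{"name": "city1", "count": 18},
                               {"name": "city2", "count": 15}]},
]

# Only nested objects named "city2" contribute to the sort value.
ranked = sorted(
    users,
    key=lambda doc: nested_sort_value(doc, "city", "count", "city2"),
    reverse=True,
)
ranked_names = [doc["name"] for doc in ranked]
print(ranked_names)
```

Because each user has at most one "city2" entry, the chosen mode makes no difference here; it only matters when several nested objects pass the filter.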

Reaching your goal will be a little complex.
First, your query says that you want to get the docs with "city2" in them. Since at least one of the elements in the array "city" matches, the whole document will be returned.
The problem is that you only want to return the count for city2, not for all of them. This is where the complex part comes.
There are plenty of paths you can follow:
Change your index design. Instead of having an array of users, have one document per user with all their info, including the cities they have visited. The "I only want one element from the array" problem will still be there, but you will only have to fight with one array at a time instead of n.
You can use Painless to bring back only the count of that particular city, but it would involve a lot of scripting. Don't trust the name: Painless can be very painful.
You can bring back all the elements and do the filtering within your code. For example, if you use the Python Elasticsearch Client, you can execute the query, return all the objects and select only the wanted elements with Python.
Don't consider using the Terms aggregation. It would bring back the total counting of all the cities, without having the relationship with each user. And this is not what you want to do.
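The client-side option from the list above could look roughly like the following sketch. It works on a hand-written dictionary shaped like a search response (`hits.hits[]._source`) rather than calling a cluster, and the helper names `city_count` and `users_by_city_count` are made up for illustration.

```python
def city_count(source, city_name):
    """Return the count for one city in a user's document, or None if absent."""
    for city in source["city"]:
        if city["name"] == city_name:
            return city["count"]
    return None


def users_by_city_count(response, city_name):
    """Keep users that have the city and order them by that city's count."""
    rows = []
    for hit in response["hits"]["hits"]:
        count = city_count(hit["_source"], city_name)
        if count is not None:
            rows.append((hit["_source"]["name"], count))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hand-written stand-in for a search response; not real cluster output.
response = {"hits": {"hits": [
    {"_source": {"name": "user3", "city": [{"name": "city7", "count": 28},
                                           {"name": "city2", "count": 1}]}},
    {"_source": {"name": "user1", "city": [{"name": "city1", "count": 18},
                                           {"name": "city2", "count": 15}]}},
    {"_source": {"name": "user2", "city": [{"name": "city2", "count": 2},
                                           {"name": "city8", "count": 15}]}},
]}}

ordered = users_by_city_count(response, "city2")
print(ordered)
```

This only scales while the full result set fits comfortably in memory on the client, since the ordering is done after all hits have been fetched.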
Hope this is helpful, and sorry there isn't a straightforward solution :(

Related

Group by terms and get count of nested array property?

I would like to get the count from a document series where an array item matches some value.
I have documents like these:
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED",
"Timer": 10
},{
"State": "PENDING",
"Timer": 5
}]
}
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED",
"Timer": 5
},{
"State": "PENDING",
"Timer": 2
}]
}
{
"Name": "martin",
"Todos": [{
"State": "COMPLETED",
"Timer": 15
},{
"State": "PENDING",
"Timer": 10
}]
}
I would like to count how many documents have at least one Todo in the COMPLETED state, grouped by Name.
So from the above I would need to get:
jason: 2
martin: 1
Usually I do this with a terms aggregation on the Name, and another sub-aggregation for other items:
"aggs": {
"statistics": {
"terms": {
"field": "Name"
},
"aggs": {
"test": {
"filter": {
"bool": {
"must": [{
"match_phrase": {
"SomeProperty.keyword": {
"query": "THEVALUE"
}
}
}
]
}
},
But I am not sure how to do this here, as the items are in an array.
Elasticsearch has no problem with arrays because in fact it flattens them by default:
Arrays of inner object fields do not work the way you may expect. Lucene has no concept of inner objects, so Elasticsearch flattens object hierarchies into a simple list of field names and values.
So a query like the one you posted will do. I would use a term query for the keyword datatype, though:
POST mytodos/_search
{
"size": 0,
"aggs": {
"by name": {
"terms": {
"field": "Name"
},
"aggs": {
"how many completed": {
"filter": {
"term": {
"Todos.State": "COMPLETED"
}
}
}
}
}
}
}
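As a sanity check on what this aggregation should return, the same counting can be imitated in plain Python over the three sample documents. This is only a sketch of the expected result; `count_completed_by_name` is a hypothetical helper and no Elasticsearch call is involved.

```python
docs = [
    {"Name": "jason", "Todos": [{"State": "COMPLETED", "Timer": 10},
                                {"State": "PENDING", "Timer": 5}]},
    {"Name": "jason", "Todos": [{"State": "COMPLETED", "Timer": 5},
                                {"State": "PENDING", "Timer": 2}]},
    {"Name": "martin", "Todos": [{"State": "COMPLETED", "Timer": 15},
                                 {"State": "PENDING", "Timer": 10}]},
]


def count_completed_by_name(documents):
    """Bucket documents by Name, counting those with any COMPLETED todo."""
    counts = {}
    for doc in documents:
        # Flattened view: all Todos.State values of the document together.
        states = {todo["State"] for todo in doc["Todos"]}
        matched = 1 if "COMPLETED" in states else 0
        counts[doc["Name"]] = counts.get(doc["Name"], 0) + matched
    return counts


completed = count_completed_by_name(docs)
print(completed)
```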
I am assuming your mapping looks something like this:
PUT mytodos/_mappings
{
"properties": {
"Name": {
"type": "keyword"
},
"Todos": {
"properties": {
"State": {
"type": "keyword"
},
"Timer": {
"type": "integer"
}
}
}
}
}
The example documents that you posted will be transformed internally into something like this:
{
"Name": "jason",
"Todos.State": ["COMPLETED", "PENDING"],
"Todos.Timer": [10, 5]
}
However, if you need to query on Todos.State and Todos.Timer together, for example to filter for "COMPLETED" items with Timer > 10 only, it will not be possible with such a mapping, because Elasticsearch forgets the link between the fields of object array items.
In this case you would need to use something like the nested datatype for such arrays, and query them with the special nested query.
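The difference can be illustrated with a small plain-Python sketch. The two functions below are hypothetical stand-ins for how matching behaves with a flattened object mapping versus a nested mapping; they are not Elasticsearch code.

```python
def matches_flattened(doc, state, min_timer):
    """Object mapping: State and Timer are checked independently of each other."""
    states = [todo["State"] for todo in doc["Todos"]]
    timers = [todo["Timer"] for todo in doc["Todos"]]
    return state in states and any(timer > min_timer for timer in timers)


def matches_nested(doc, state, min_timer):
    """Nested mapping: both conditions must hold on the same array item."""
    return any(
        todo["State"] == state and todo["Timer"] > min_timer
        for todo in doc["Todos"]
    )


martin = {"Name": "martin", "Todos": [{"State": "COMPLETED", "Timer": 15},
                                      {"State": "PENDING", "Timer": 10}]}

# "PENDING with Timer > 10": no single item satisfies both conditions,
# but the flattened view still matches (PENDING from one item, 15 from another).
flattened_hit = matches_flattened(martin, "PENDING", 10)
nested_hit = matches_nested(martin, "PENDING", 10)
print(flattened_hit, nested_hit)
```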
Hope that helps!

Extract record from multiple arrays based on a filter

I have documents in Elasticsearch with the following structure:
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)"
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, the detail records are split across 5 arrays (price, max_occupancy, type, availability, size), and the fields of the same record share the same index position in all 5 arrays. I want to extract the element whose max_occupancy is greater than or equal to 2 (if there is no record with 2, take one with 3; if there is none with 3, take one with 4, and so on) and that has the lowest price, and place the result into a new JSON object like the following:
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": "€ 125",
"max_occupancy": "2",
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
Basically, the result should contain the extracted record (in this case the second element of each array) together with the general information (the fields "last_updated" and "country").
Is it possible to extract such a result from elastic search? What kind of query do I need to perform?
Could someone suggest the best approach?
My best approach: go nested with the Nested datatype.
Besides making querying easier, it is easier to read and understand the connections between the objects that are currently scattered across different arrays.
Yes, if you choose this approach you will have to edit your mapping and re-index all of your data.
What would the mapping look like? Something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "keyword"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "keyword"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "keyword"
},
"availability": {
"type": "long"
},
"size": {
"type": "keyword"
}
}
}
}
}
}
EDIT: New document structure (containing nested documents) -
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
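Converting an existing document from the parallel-array layout to this nested layout before re-indexing could be done along these lines. This is a minimal sketch: `to_nested_document` and `RECORD_FIELDS` are names invented here, and it assumes all five arrays have the same length.

```python
# The five parallel arrays that together describe one record per index position.
RECORD_FIELDS = ("price", "max_occupancy", "type", "availability", "size")


def to_nested_document(source):
    """Zip the parallel arrays into a list of record objects under 'records'."""
    columns = [source[field] for field in RECORD_FIELDS]
    records = [dict(zip(RECORD_FIELDS, row)) for row in zip(*columns)]
    # Keep the general fields (last_updated, country, ...) untouched.
    doc = {key: value for key, value in source.items() if key not in RECORD_FIELDS}
    doc["records"] = records
    return doc


old_source = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "price": ["€ 139", "€ 125", "€ 120", "€ 108"],
    "max_occupancy": [2, 2, 1, 1],
    "type": ["Type 1", "Type 1 - (Tag)", "Type 2", "Type 2 (Tag)"],
    "availability": [10, 10, 10, 10],
    "size": ["26 m²", "35 m²", "47 m²", "31 m²"],
}

new_doc = to_nested_document(old_source)
print(new_doc["records"][1])
```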
Now it is easier to query for any specific condition with a Nested Query and Inner Hits. For example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
Conditions are: country is Italia AND max_occupancy >= 2.
Inner hits: sort by price in ascending order and return only the first result.
Hope you find it useful.
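For comparison, the selection that this query is meant to perform can be written out in plain Python. This is a sketch, not the query itself: `cheapest_record` and `parse_price` are hypothetical helpers, and the price is parsed to a number here, whereas a sort on a string price field compares text.

```python
def parse_price(price):
    """Turn a price string such as '€ 125' into a number for comparison."""
    return float(price.replace("€", "").strip())


def cheapest_record(doc, min_occupancy=2):
    """Pick the lowest-priced record with max_occupancy >= min_occupancy."""
    candidates = [r for r in doc["records"] if r["max_occupancy"] >= min_occupancy]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: parse_price(r["price"]))
    # Merge the general fields with the selected record.
    return {"last_updated": doc["last_updated"], "country": doc["country"], **best}


doc = {
    "last_updated": "2017-10-25T18:33:51.434706",
    "country": "Italia",
    "records": [
        {"price": "€ 139", "max_occupancy": 2, "type": "Type 1",
         "availability": 10, "size": "26 m²"},
        {"price": "€ 125", "max_occupancy": 2, "type": "Type 1 - (Tag)",
         "availability": 10, "size": "35 m²"},
        {"price": "€ 120", "max_occupancy": 1, "type": "Type 2",
         "availability": 10, "size": "47 m²"},
        {"price": "€ 108", "max_occupancy": 1, "type": "Type 2 (Tag)",
         "availability": 10, "size": "31 m²"},
    ],
}

result = cheapest_record(doc)
print(result)
```

If prices can have different numbers of digits, a string sort would put "€ 99" after "€ 125", so storing a numeric price field alongside the display string may be worth considering.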

Item variants in ElasticSearch

What is the best way to model item variants in Elasticsearch and retrieve only 1 item per variant group?
For example, let's say I have the following items:
[{
"sku": "abc-123",
"group": "abc",
"color": "red",
"price": 10
},
{
"sku": "def-123",
"group": "def",
"color": "red",
"price": 10
},
{
"sku": "abc-456",
"group": "abc",
"color": "black",
"price": 20
}
]
The first item and the last one are in the same group, so I want only to return one of them if I query for items below the price of 20 (for example), but with the best hit score.
Feel free to suggest documents design and queries accordingly.
If your mapping uses the Nested datatype, you can use this query to retrieve them:
GET index/type/_search
{
"size": 2000,
"_source": false,
"query": {
"bool": {
"filter": {
"nested": {
"path": "childs",
"query": {
"bool": {
"filter": {
"term": {
"childs.group.keyword": "abc"
}
}
}
},
"inner_hits": {}
}
}
}
}
}
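If the items stay as separate flat documents instead, keeping one item per group can also be done on the client after the search. The sketch below is that client-side step only, not what the nested query above does; `best_per_group` is a hypothetical helper and the `_score` values are invented for illustration.

```python
def best_per_group(hits):
    """Keep only the highest-scoring hit for each variant group."""
    best = {}
    for hit in hits:
        group = hit["_source"]["group"]
        if group not in best or hit["_score"] > best[group]["_score"]:
            best[group] = hit
    # Return the survivors in descending score order, like a result list.
    return sorted(best.values(), key=lambda hit: hit["_score"], reverse=True)


# Scores are made up; real values depend on the query.
hits = [
    {"_score": 1.2, "_source": {"sku": "abc-123", "group": "abc",
                                "color": "red", "price": 10}},
    {"_score": 0.9, "_source": {"sku": "def-123", "group": "def",
                                "color": "red", "price": 10}},
    {"_score": 1.5, "_source": {"sku": "abc-456", "group": "abc",
                                "color": "black", "price": 20}},
]

unique_skus = [hit["_source"]["sku"] for hit in best_per_group(hits)]
print(unique_skus)
```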

Elastic Search design for nested of nested data

I intend to use Elasticsearch as my primary datastore, and my documents contain nested-within-nested data like this. Events has 3 levels of nested data.
{
"Date": "2015-10-21",
"Hour": "7",
"Minute": "15-29",
"Domain": "abc.com",
"Processed_at": "10/23/2015 9:47 UTC",
"Events": [
{
"Name": "visit",
"Count": "188",
"Attributes_Aggregations": [
{
"Name": "price",
"Value_Aggregations": [
{
"Value": "$125",
"Count": "188",
"Unique_Users": [
{
"ID": "CL_2135514566_1427476812_392007750_2004930118",
"Count": "38"
},
{
"ID": "CL_2135514566_1427476812_392007750_2004930119",
"Count": "32"
},
....
]
},
....
]
},
{
"Name": "color",
"Value_Aggregations": [
{
"Value": "red",
"Count": "188",
"Unique_Users": [
{
"ID": "CL_2135514566_1427476812_392007750_2004930118",
"Count": "38"
}
]
}
]
},
...
]
},
{
"Name": "order_created",
"Count": "159",
"Attributes_Aggregations": [
{
"Name": "price",
"Value_Aggregations": [
{
"Value": "$125",
"Count": "159",
"Unique_Users": [
{
"ID": "CL_2135514566_1427476812_392007750_2004930122",
"Count": "32"
},
....
]
}
]
},
]
},
]
}
I considered using a parent/child relationship structure, but according to the Elasticsearch documentation, queries across this many parent/child levels will become slow.
Is there any other way to design the document to best fit Elasticsearch?
My queries will filter on all the keys of the document; range and count queries will also be used.
You can use nested queries if you define the structure as nested in the mapping, as explained in the Elasticsearch documentation for the nested datatype and the nested query. I'm not sure why nafas didn't mention this. Queries will be quite nasty to write, though.
Elasticsearch is a great tool, but there is a major drawback with nested data: ES flattens arrays of objects, so if you query the nested info it returns all of them.
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
For example, if you query for Unique_Users.Count=38 it will return:
{
"ID": "CL_2135514566_1427476812_392007750_2004930118",
"Count": "38"
},
{
"ID": "CL_2135514566_1427476812_392007750_2004930119",
"Count": "32"
}
because this particular array (Unique_Users) has a field Count that matches 38.

Elasticsearch - bump individual result to the top

I'm working with Elasticsearch. I have an array of documents, and I'm trying to sort documents by the property price, except that I'd like a particular document to be the first result no matter what.
The below is what I'm using as my "sort" array as my attempt to order documents by ID 1213, and then all following documents ordered by price descending.
[
{
"id": {
"mode": "max",
"order": "desc",
"nested_filter": {
"term": {
"id": 1213
}
},
"missing": "_last"
}
},
{
"price": {
"order": "asc"
}
}
]
This doesn't appear to be working, though—document 1213 doesn't appear first. What am I doing wrong here?
As an example—the ideal returned result:
[{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price": 4},
{"id": 5923, "name": "Yellow Sunglasses", "price": 18}]
Instead, I get:
[{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price": 4},
{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 5923, "name": "Yellow Sunglasses", "price": 18}]
As others have already asked, what is the reason for the nested_filter?
There are many possible ways to do what you need. Here is one that fits the simple requirements you have mentioned so far:
{
"query" : {
"custom_filters_score" : {
"query" : {
"match_all" : {}
},
"filters" : [
{
"filter" : {
"term" : {
"id" : "1213"
}
},
"boost" : 2
}
]
}
},
"sort" : [
"_score",
"price"
]
}
The assumption here is that your query is simple, like the match_all query, and does not affect the scores in any way. If you have something more complicated, you can try wrapping it in a constant_score query so that it does not affect the scores. Ideally, you get the document set you want with all documents having the same score, and then the custom_filters_score query boosts the score of the document you want. You can do this for any number of documents by adding further filters or, if the documents should be treated equally, by using a terms filter. In the end, sort by the score and then by the price.
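The ordering this approach aims for can be checked with a few lines of plain Python. The sketch assumes every document starts with the same base score of 1.0 and that a matching filter multiplies that score by its boost; `boosted_score` is a hypothetical helper, not part of any client library.

```python
def boosted_score(doc, base_score=1.0, boosted_id=1213, boost=2.0):
    """Assumed behaviour: a matching filter multiplies the base score by boost."""
    return base_score * boost if doc["id"] == boosted_id else base_score


docs = [
    {"id": 1000, "name": "Green Sunglasses", "price": 2},
    {"id": 1031, "name": "Purple Sunglasses", "price": 4},
    {"id": 1213, "name": "Blue Sunglasses", "price": 12},
    {"id": 5923, "name": "Yellow Sunglasses", "price": 18},
]

# Sort by score descending first, then by price ascending as the tie-breaker.
ranked = sorted(docs, key=lambda doc: (-boosted_score(doc), doc["price"]))
ranked_ids = [doc["id"] for doc in ranked]
print(ranked_ids)
```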
In this case you need to use function_score to modify the score of each doc.
{
"query": {
"function_score": {
"functions": [
{
"filter": {
"term": {
"id": "1213"
}
},
"weight": 1
},
{
"script_score": {
"script": "(1 / doc['price'].value)"
}
}
],
"score_mode": "sum",
"boost_mode" : "replace",
"query" : {
//YOUR QUERY GOES HERE
}
}
}
}
Explanation:
{
"script_score": {
"script": "(1 / doc['price'].value)"
}
}
Compute a score based on price; it gives a value below 1 as long as the price is above 1. The higher the price, the smaller the score, so sorting by score yields ascending price order. If you want to switch to descending, just replace it with
"script": "(1 - (1 / doc['price'].value))"
{
"filter": {
"term": {
"id": "1213"
}
},
"weight": 1
}
This gives any doc with "id" = 1213 an extra score of 1. The final score is the sum of those 2 functions.
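To see how the two functions combine, the arithmetic can be reproduced in plain Python. This sketch assumes "score_mode": "sum" adds the two function results and "boost_mode": "replace" discards the query's own score; `combined_score` is a hypothetical helper.

```python
def combined_score(doc, pinned_id=1213, weight=1.0):
    """Sum of the filter weight (if the id matches) and the 1/price script score."""
    filter_part = weight if doc["id"] == pinned_id else 0.0
    price_part = 1 / doc["price"]  # assumes price is never 0
    return filter_part + price_part


docs = [
    {"id": 1000, "name": "Green Sunglasses", "price": 2},
    {"id": 1031, "name": "Purple Sunglasses", "price": 4},
    {"id": 1213, "name": "Blue Sunglasses", "price": 12},
    {"id": 5923, "name": "Yellow Sunglasses", "price": 18},
]

# Results are returned by descending score.
ranked = sorted(docs, key=combined_score, reverse=True)
ranked_ids = [doc["id"] for doc in ranked]
print(ranked_ids)
```

Document 1213 scores 1 + 1/12, which stays above every other document only while the other prices are at least 1; a price below 1 would produce a script score above 1 and could outrank it.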
