Elasticsearch design for nested-of-nested data

I intend to use Elasticsearch as my primary datastore, and my documents contain nested-of-nested data like the following. Events has 3 levels of nested data.
{
"Date": "2015-10-21",
"Hour": "7",
"Minute": "15-29",
"Domain": "abc.com",
"Processed_at": "10/23/2015 9:47 UTC",
"Events": [
{
"Name": "visit",
"Count": "188",
"Attributes_Aggregations": [
{
"Name": "price",
"Value_Aggregations": [
{
"Value": "$125",
"Count": "188",
"Unique_Users": [
{
"ID": "CL_2135514566_1427476812_392007750_2004930118",
"Count": "38"
},
{
"ID": "CL_2135514566_1427476812_392007750_2004930119",
"Count": "32"
},
....
]
},
....
]
},
{
"Name": "color",
"Value_Aggregations": [
{
"Value": "red",
"Count": "188",
"Unique_Users": [
{
"ID": "CL_2135514566_1427476812_392007750_2004930118",
"Count": "38"
}
]
}
]
},
...
]
},
{
"Name": "order_created",
"Count": "159",
"Attributes_Aggregations": [
{
"Name": "price",
"Value_Aggregations": [
{
"Value": "$125",
"Count": "159",
"Unique_Users": [
{
"ID": "CL_2135514566_1427476812_392007750_2004930122",
"Count": "32"
},
....
]
}
]
}
]
}
]
}
I considered using a parent/child relationship structure, but according to the Elasticsearch documentation, parent/child queries with this many levels become slow.
Is there any other way to design the document so that it fits Elasticsearch best?
My queries will filter on all the keys of the document, and will also use range and count.

You can use nested queries if you define the structure as nested in the mapping, as explained here and here. I'm not sure why nafas didn't mention this. Queries will be quite nasty to write, though.
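A minimal Python sketch of what such a mapping and a multi-level nested query could look like for the Events document in the question. The field types (keyword, integer, date), the subset of mapped fields, and the helper name `event_attribute_query` are assumptions for illustration. What it shows is that every level of nesting becomes its own `nested` query, and each one uses the full dotted path from the document root.

```python
# Hypothetical mapping: every array-of-objects level is mapped as "nested"
# so each inner object keeps its own field pairing. Types are assumptions;
# Elasticsearch coerces numeric strings such as "188" into integer fields.
UNIQUE_USERS = {"type": "nested", "properties": {
    "ID": {"type": "keyword"},
    "Count": {"type": "integer"},
}}
VALUE_AGGREGATIONS = {"type": "nested", "properties": {
    "Value": {"type": "keyword"},
    "Count": {"type": "integer"},
    "Unique_Users": UNIQUE_USERS,
}}
ATTRIBUTES_AGGREGATIONS = {"type": "nested", "properties": {
    "Name": {"type": "keyword"},
    "Value_Aggregations": VALUE_AGGREGATIONS,
}}
MAPPING = {"mappings": {"properties": {
    "Date": {"type": "date"},
    "Domain": {"type": "keyword"},
    "Events": {"type": "nested", "properties": {
        "Name": {"type": "keyword"},
        "Count": {"type": "integer"},
        "Attributes_Aggregations": ATTRIBUTES_AGGREGATIONS,
    }},
}}}

# Full dotted paths of each nested level.
EVENTS = "Events"
ATTRS = EVENTS + ".Attributes_Aggregations"
VALUES = ATTRS + ".Value_Aggregations"
USERS = VALUES + ".Unique_Users"


def event_attribute_query(event, attribute, value, min_user_count):
    """Hypothetical helper: documents where one event has one attribute value
    with at least one user whose Count >= min_user_count.
    Each nested level is a separate nested query wrapped in its parent."""
    users = {"nested": {
        "path": USERS,
        "query": {"range": {USERS + ".Count": {"gte": min_user_count}}},
    }}
    values = {"nested": {
        "path": VALUES,
        "query": {"bool": {"must": [{"term": {VALUES + ".Value": value}}, users]}},
    }}
    attributes = {"nested": {
        "path": ATTRS,
        "query": {"bool": {"must": [{"term": {ATTRS + ".Name": attribute}}, values]}},
    }}
    return {"query": {"nested": {
        "path": EVENTS,
        "query": {"bool": {"must": [{"term": {EVENTS + ".Name": event}}, attributes]}},
    }}}
```

Serialising the dict returned by `event_attribute_query("visit", "price", "$125", 30)` with `json.dumps` gives a request body for `_search`. The four levels of wrapping are exactly why these queries get "nasty to write".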

Elasticsearch is a great tool, but it has a major drawback with nested data: ES flattens arrays of objects (unless they are mapped with the nested type), so if you query the nested info, you get all of the objects back.
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
For example, if you query for Unique_Users.Count=38, it will return
{
"ID": "CL_2135514566_1427476812_392007750_2004930118",
"Count": "38"
},
{
"ID": "CL_2135514566_1427476812_392007750_2004930119",
"Count": "32"
}
because this particular array (Unique_Users) has a Count field that matches 38.
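The effect can be illustrated with a small Python model. This is a simplification, not Elasticsearch's actual indexing code: a plain object array is treated as one flat list of values per sub-field, so two conditions can be satisfied by two different users, while nested semantics require both conditions to hold inside the same object. The function names are hypothetical.

```python
def flatten(objects):
    """Simplified model of object-field indexing: one value list per sub-field."""
    flat = {}
    for obj in objects:
        for field, value in obj.items():
            flat.setdefault(field, []).append(value)
    return flat


def object_match(objects, **conditions):
    # Flattened semantics: each condition only needs SOME value in its list.
    flat = flatten(objects)
    return all(value in flat.get(field, []) for field, value in conditions.items())


def nested_match(objects, **conditions):
    # Nested semantics: all conditions must hold within one single object.
    return any(all(obj.get(f) == v for f, v in conditions.items()) for obj in objects)


unique_users = [
    {"ID": "CL_2135514566_1427476812_392007750_2004930118", "Count": "38"},
    {"ID": "CL_2135514566_1427476812_392007750_2004930119", "Count": "32"},
]
```

Asking for the user ending in `...119` with Count "38" matches under the flattened model (the ID comes from one user, the count from the other), but not under the nested model.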

Related

Filter documents out of the facet count in enterprise search

We use Enterprise Search indexes to store items that can be tagged by multiple tenants,
e.g.:
[
{
"id": 1,
"name": "document 1",
"tags": [
{ "company_id": 1, "tag_id": 1, "tag_name": "bla" },
{ "company_id": 2, "tag_id": 1, "tag_name": "bla" }
]
}
]
I'm looking for a way to retrieve all documents with only the tags of company 1.
This request:
{
"query": "",
"facets": {
"tags": {
"type": "value"
}
},
"sort": {
"created": "desc"
},
"page": {
"size": 20,
"current": 1
}
}
comes back with:
...
"facets": {
"tags": [
{
"type": "value",
"data": [
{
"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
},
{
"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
}
]
}
]
}
...
Can I modify the request so that no tags with "company_id" = 2 are returned?
I have a solution that strips the extra data from the results after they are retrieved, but I'm looking for a better one.
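For reference, the post-processing workaround described in the question could look roughly like this in Python. It is a sketch of the client-side approach only, not a server-side fix, and it assumes (as in the response shown above) that each facet `value` is a JSON-encoded tag object. The function name is hypothetical.

```python
import json


def keep_company_facets(response, company_id):
    """Return the facets with only the tag values belonging to company_id.
    Assumes each facet value is a JSON string such as the ones in the response."""
    filtered = {}
    for name, facet_list in response.get("facets", {}).items():
        filtered[name] = []
        for facet in facet_list:
            data = [
                entry for entry in facet["data"]
                if json.loads(entry["value"]).get("company_id") == company_id
            ]
            # Copy the facet, replacing only its data list.
            filtered[name].append({**facet, "data": data})
    return filtered
```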

Cannot sort search results within nested objects in Elasticsearch 7

I want to sort objects in ascending order but the sort doesn't work.
Here is the sort query:
"sort":[
{
"category.position": {
"order":"asc",
"mode":"min",
"nested": {
"path": "category",
"filter": {
"term": {"category_category_id":42} }
}
}
}]
And here are the objects below.
"name": "Yeti",
"category": [
{
"category_id": 42,
"name": "Raamiga",
"position": 3
}
],
"name": "Venus",
"category": [
{
"category_id": 42,
"name": "Raamiga",
"position": 4
}
],
Please, help! Many thanks in advance!
Solved. There was a typo: it must be "category.category_id" instead of "category_category_id".
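One way to guard against this kind of typo is to build the sort clause in code and check that every field lives under the nested path. The helper below is a hypothetical Python sketch that produces the same structure as the sort query in the question, with the corrected field name.

```python
def nested_sort(field, path, filter_field, filter_value, order="asc", mode="min"):
    """Build a sort clause on a nested field, restricted by a term filter.
    Rejects field names that are not inside the nested path, which catches
    typos like "category_category_id" instead of "category.category_id"."""
    prefix = path + "."
    for name in (field, filter_field):
        if not name.startswith(prefix):
            raise ValueError(f"{name!r} is not inside nested path {path!r}")
    return [{field: {
        "order": order,
        "mode": mode,
        "nested": {"path": path, "filter": {"term": {filter_field: filter_value}}},
    }}]
```

`nested_sort("category.position", "category", "category.category_id", 42)` yields the working clause, while passing `"category_category_id"` raises an error instead of silently sorting on a field that does not exist.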

How to sort data in elastic search based on the filter data

I am relatively new to Elasticsearch. I have data stored in Elasticsearch like this:
[{
"name": "user1",
"city": [{
"name": "city1",
"count": 18
},{
"name": "city2",
"count": 15
},{
"name": "city3",
"count": 10
},{
"name": "city4",
"count": 5
}]
},{
"name": "user2",
"city": [{
"name": "city2",
"count": 2
},{
"name": "city5",
"count": 5
},{
"name": "city6",
"count": 8
},{
"name": "city8",
"count": 15
}]
},{
"name": "user3",
"city": [{
"name": "city1",
"count": 2
},{
"name": "city5",
"count": 5
},{
"name": "city7",
"count": 28
},{
"name": "city2",
"count": 1
}]
}]
What I am trying to do is find the users who have "city2" in their city list and order them by the "count" of "city2".
Here is the query I have tried:
{
"sort": [{
"city.count": {
"order" : "desc"
}
}],
"query": {
"bool": {
"must": [
{"match": {"city.name": "city2"}}
]
}
}
}
I can't figure out how to do the sort part.
The sort considers the "count" values of all the cities, but I want the order to be based only on the "count" of "city2".
Any kind of help would be appreciated. Thanks in advance.
Since the field city is an object and not a nested object, what you are trying to achieve won't be possible. The reason is that when you define a field as object, Elasticsearch flattens each of the object's fields into an array of values. So,
"city": [
{
"name": "city1",
"count": 18
},
{
"name": "city2",
"count": 15
},
{
"name": "city3",
"count": 10
},
{
"name": "city4",
"count": 5
}
]
is indexed as:
"city.name" : ["city1", "city2", "city3", "city4"]
"city.count": [18, 15, 10, 5]
As you can see, because of the way Elasticsearch indexes the object, the relation between each city and its count is lost.
So whenever you want to preserve this relation, you should define the field with the nested type.
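The consequence for sorting can be reproduced with a small Python model of the question's data. It is a simplification, not Elasticsearch internals: it flattens each user's cities as shown above and then applies the descending sort on the multi-valued `city.count`, where Elasticsearch's default sort mode for `desc` is `max`. The function names are hypothetical.

```python
# Data from the question: (city name, count) pairs per user.
USERS = {
    "user1": [("city1", 18), ("city2", 15), ("city3", 10), ("city4", 5)],
    "user2": [("city2", 2), ("city5", 5), ("city6", 8), ("city8", 15)],
    "user3": [("city1", 2), ("city5", 5), ("city7", 28), ("city2", 1)],
}


def flatten_cities(cities):
    """Simplified model of object-field indexing: one value list per sub-field."""
    return {
        "city.name": [name for name, _ in cities],
        "city.count": [count for _, count in cities],
    }


def flattened_desc_order(users, city):
    # Users whose flattened city.name contains `city`, sorted by city.count desc.
    # With mode "max" the key is the largest count of ANY city, not of `city`.
    hits = [u for u, cs in users.items() if city in flatten_cities(cs)["city.name"]]
    return sorted(hits, key=lambda u: max(flatten_cities(users[u])["city.count"]), reverse=True)


def intended_desc_order(users, city):
    # What the question asks for: sort by the count paired with `city` only.
    paired = {u: dict(cs)[city] for u, cs in users.items() if city in dict(cs)}
    return sorted(paired, key=paired.get, reverse=True)
```

For "city2" the flattened order is user3, user1, user2 (keys 28, 18, 15), while the intended order is user1, user2, user3 (counts 15, 2, 1).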
{
"city": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"count": {
"type": "long"
}
}
}
}
Sorting then can be achieved by using this nested field.
{
"sort": [
{
"city.count": {
"order": "desc",
"mode": "avg",
"nested": {
"path": "city",
"filter": {
"match": {
"city.name": "city2"
}
}
}
}
}
],
"query": {
"bool": {
"must": [
{
"nested": {
"path": "city",
"query": {
"match": {
"city.name": "city2"
}
}
}
}
]
}
}
}
Reaching your goal will be a little complex.
First, your query says that you want to get the docs with "city2" in them. Since at least one of the elements in the array "city" matches, the whole document will be returned.
The problem is that you only want to return the count for city2, not for all of them. This is where the complex part comes.
There are plenty of paths you can follow:
Change your index design. Instead of having an array of users, have one document per user with all their info, including the cities they have visited. The "I only want 1 element from the array" problem will still be there, but you will only have to fight with one array at a time, instead of n.
You can use Painless to bring back only the count of that particular city, but it would involve a lot of scripting. Don't trust the name: Painless is very Painful.
You can bring back all the elements and do the filtering in your code. For example, if you use the Python Elasticsearch Client, you can execute the query, return all the objects, and select only the wanted elements with Python.
Don't consider using the Terms aggregation. It would bring back the total counting of all the cities, without having the relationship with each user. And this is not what you want to do.
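A minimal sketch of the client-side option in plain Python. It assumes `hits` is the list found under `["hits"]["hits"]` of a search response, with each hit carrying a user document in `_source` shaped like the ones in the question. No client call is made here, and the function name is hypothetical.

```python
def sort_hits_by_city_count(hits, city):
    """Keep only the count of `city` for each hit and sort users by it, descending.
    Hits without that city are dropped."""
    rows = []
    for hit in hits:
        source = hit["_source"]
        counts = [c["count"] for c in source.get("city", []) if c["name"] == city]
        if counts:
            rows.append({"name": source["name"], "count": counts[0]})
    rows.sort(key=lambda row: row["count"], reverse=True)
    return rows
```

This keeps the index design unchanged at the cost of transferring every city of every matching user.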
Hope this is helpful and sorry we can't get a straight-forward solution :(

Extract record from multiple arrays based on a filter

I have documents in ElasticSearch with the following structure :
"_source": {
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": [
"€ 139",
"€ 125",
"€ 120",
"€ 108"
],
"max_occupancy": [
2,
2,
1,
1
],
"type": [
"Type 1",
"Type 1 - (Tag)",
"Type 2",
"Type 2 (Tag)"
],
"availability": [
10,
10,
10,
10
],
"size": [
"26 m²",
"35 m²",
"47 m²",
"31 m²"
]
}
}
Basically, the details of each record are split across 5 arrays (price, max_occupancy, type, availability, size), and the fields of the same record have the same index position in all 5 arrays. I want to extract the record whose max_occupancy is greater than or equal to 2 (if there is no record with 2, take one with 3; if there is no 3, take one with 4, and so on) and that has the lowest price, and place the result into a new JSON object like the following:
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
}
Basically, the result should contain the extracted record (in this case, the element at the second position of every array) plus the general information (fields "last_updated" and "country").
Is it possible to extract such a result from elastic search? What kind of query do I need to perform?
Could someone suggest the best approach?
My best approach: go nested with Nested Datatype
Besides easier querying, it is easier to read and understand the connections between objects that are currently scattered across different arrays.
Yes, if you decide on this approach, you will have to edit your mapping and re-index all your data.
What would the mapping look like? Something like this:
{
"mappings": {
"properties": {
"last_updated": {
"type": "date"
},
"country": {
"type": "keyword"
},
"records": {
"type": "nested",
"properties": {
"price": {
"type": "keyword"
},
"max_occupancy": {
"type": "long"
},
"type": {
"type": "keyword"
},
"availability": {
"type": "long"
},
"size": {
"type": "keyword"
}
}
}
}
}
}
EDIT: New document structure (containing nested documents):
{
"last_updated": "2017-10-25T18:33:51.434706",
"country": "Italia",
"records": [
{
"price": "€ 139",
"max_occupancy": 2,
"type": "Type 1",
"availability": 10,
"size": "26 m²"
},
{
"price": "€ 125",
"max_occupancy": 2,
"type": "Type 1 - (Tag)",
"availability": 10,
"size": "35 m²"
},
{
"price": "€ 120",
"max_occupancy": 1,
"type": "Type 2",
"availability": 10,
"size": "47 m²"
},
{
"price": "€ 108",
"max_occupancy": 1,
"type": "Type 2 (Tag)",
"availability": 10,
"size": "31 m²"
}
]
}
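Re-indexing means converting each old document into this shape. A minimal Python sketch of that conversion, assuming all five parallel arrays always have the same length. The function name and the length check are illustrative additions, not part of the answer.

```python
# Names of the index-aligned arrays in the original documents.
PARALLEL_FIELDS = ["price", "max_occupancy", "type", "availability", "size"]


def to_nested_records(source, fields=PARALLEL_FIELDS):
    """Zip index-aligned arrays into one "records" list of objects.
    Fields not listed in `fields` (e.g. last_updated, country) are copied as-is."""
    lengths = {len(source[f]) for f in fields}
    if len(lengths) != 1:
        raise ValueError(f"parallel arrays differ in length: {sorted(lengths)}")
    doc = {k: v for k, v in source.items() if k not in fields}
    columns = [source[f] for f in fields]
    # zip(*columns) walks all arrays in lockstep, one record per index.
    doc["records"] = [dict(zip(fields, row)) for row in zip(*columns)]
    return doc
```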
Now it's easier to query for any specific condition with the Nested Query and Inner Hits. For example:
{
"_source": [
"last_updated",
"country"
],
"query": {
"bool": {
"must": [
{
"term": {
"country": "Italia"
}
},
{
"nested": {
"path": "records",
"query": {
"bool": {
"must": [
{
"range": {
"records.max_occupancy": {
"gte": 2
}
}
}
]
}
},
"inner_hits": {
"sort": {
"records.price": "asc"
},
"size": 1
}
}
}
]
}
}
}
Conditions are: Italia AND max_occupancy >= 2.
Inner hits: sort by price ascending order and get the first result.
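To check what the inner hit should be, here is a plain-Python model of the same selection (filter `max_occupancy >= 2`, order by price, take one) applied to the restructured document, merged with the top-level fields into the shape the question asks for. The price parser is a hypothetical helper that assumes prices look like "€ 125". It compares prices as numbers, whereas sorting a keyword field such as `records.price` inside Elasticsearch compares strings, so "€ 99" would sort after "€ 125"; storing the price as a number would avoid that.

```python
def parse_price(price):
    # Hypothetical helper: assumes prices formatted like "€ 125".
    return float(price.replace("€", "").strip())


def cheapest_record(doc, min_occupancy=2):
    """Cheapest nested record with max_occupancy >= min_occupancy, merged with
    the document's top-level fields; None if no record qualifies."""
    candidates = [r for r in doc["records"] if r["max_occupancy"] >= min_occupancy]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: parse_price(r["price"]))
    return {"last_updated": doc["last_updated"], "country": doc["country"], **best}
```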
Hope you'll find it useful.

Elastic Search Grouped Queries

I'm indexing an array of key-value pairs. The key is always a UUID and the value is a user-entered value. I've been crawling through the documentation, but I can't figure out exactly how to query in this scenario. Example schema:
{
"id": 1,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the red card" },
{ "key": "23a2dd23108", "value": "purple balloons" }
]
},
{
"id": 2,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the blue card" },
{ "key": "23a2dd23108", "value": "purple balloons" }
]
}
I would like to query:
{ "term": { "owner_id": 1 },
{ "term": { "values.key": "23a2dd23108" }, "match": { "values.value": "purple" } },
{ "term": { "values.key": "k3kfa23rewf" }, "match": { "values.value": "blue" } }
So that the record with ID 2 is returned. Any suggestions?
I think you need to use nested documents here.
That way, you will be able to build a bool query with a must clause containing a term query on owner_id, and two more must clauses, each containing a nested query that combines a term query on values.key with a match query on values.value.
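The query described above can be sketched as a Python dict builder, assuming `values` is mapped as `nested`, with `values.key` as a keyword field and `values.value` as a text field (both assumptions). The second helper is a rough local stand-in for the nested semantics, approximating `match` as whole-word containment without any analysis; both function names are hypothetical.

```python
def key_value_query(owner_id, pairs):
    """Bool query: owner_id must match, and for each (key, text) pair a single
    nested "values" object must have that key AND match that text."""
    must = [{"term": {"owner_id": owner_id}}]
    for key, text in pairs:
        must.append({"nested": {
            "path": "values",
            "query": {"bool": {"must": [
                {"term": {"values.key": key}},
                {"match": {"values.value": text}},
            ]}},
        }})
    return {"query": {"bool": {"must": must}}}


def matches(doc, owner_id, pairs):
    # Local approximation: each pair must be satisfied by ONE values entry.
    if doc["owner_id"] != owner_id:
        return False
    return all(
        any(v["key"] == key and text in v["value"].split() for v in doc["values"])
        for key, text in pairs
    )
```

With the pairs ("23a2dd23108", "purple") and ("k3kfa23rewf", "blue"), only the document with ID 2 satisfies the model, which is the result the question asks for.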
Does it help?
