Facets for nested tree with ElasticSearch

I'm new to ElasticSearch, so I need some help with it.
I have a query to search for products which can belong to many categories. Categories are combined in a nested tree.
Example data:
categories: [
  {
    id: 1,
    name: 'First category',
    categories: [
      {
        id: 12,
        name: 'First subcategory'
      },
      {
        id: 13,
        name: 'Second subcategory'
      }
    ]
  },
  {
    id: 2,
    name: 'Second category'
  }
],
products: [
  {
    id: 1,
    name: 'First product',
    categories_ids: [2, 12]
  },
  {
    id: 2,
    name: 'Second product',
    categories_ids: [1]
  }
]
Besides the search results I need to get the categories tree including the number of search results in each category (excluding categories without any search results).
For the above example it should be:
First category (2)
First subcategory (1)
Second category (1)
Can someone explain how to do this using ElasticSearch's aggregations?
Thanks.

I had a similar need and used nested objects. Here is the relevant thread:
How to narrow down the current aggregation context to a specific scope within set of documents returned from Filter Aggregation?

I think you are looking for something like this (note that a filter has to be its own sub-aggregation wrapping the terms aggregation; it can't sit next to terms inside the same aggregation):
{
  "aggs": {
    "category_agg": {
      "terms": {
        "field": "category_name"
      },
      "aggs": {
        "sub_category_agg": {
          "filter": {
            "term": {
              "sub_category": "First subcategory"
            }
          },
          "aggs": {
            "sub_category_terms": {
              "terms": {
                "field": "sub_category"
              }
            }
          }
        }
      }
    }
  }
}
Apply or omit the filters as needed, and make sure the fields you aggregate on (category_name and sub_category in this example) are not_analyzed.
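For reference, a mapping with not_analyzed fields might look like the sketch below. The index and type names here are assumptions; not_analyzed applies to string fields in Elasticsearch 1.x/2.x, while on 5.x and later the equivalent is the keyword field type:

```json
PUT /catalog
{
  "mappings": {
    "product": {
      "properties": {
        "category_name": { "type": "string", "index": "not_analyzed" },
        "sub_category":  { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```

With this mapping, the terms aggregations bucket on the exact category names instead of their analyzed tokens.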

Related

Elasticsearch - Sort query based on collapse results

I'm trying to group/stack items based on their SKU.
Currently, when sorting from high to low, an item that's being sold for both $10 and $1 shows up as the $1 item first (because it's also sold for $10, it gets placed at the front of the array, of course). The sorting should only respect the lowest_price of that specific SKU.
Is there a way to sort based on the lowest_price of every SKU and return only a single item per SKU?
If the results from the collapse could be used as a variable for the sorting, this could be solved, but I haven't been able to find out how this works.
My item object looks like this:
{
  itemId: String,
  sku: String,
  price: Number
}
This is my query:
let itemsPerPage = 25;
let searchQuery = {
  from: itemsPerPage * page,
  size: itemsPerPage,
  _source: ['itemId'],
  sort: [{ 'sale.price': 'desc' }],
  query: {
    bool: {
      must: [],
      must_not: []
    }
  },
  collapse: {
    field: 'sku',
    inner_hits: [{
      name: 'lowest_price',
      size: 1,
      _source: ['itemId'],
      sort: [{
        'price': 'asc'
      }]
    }]
  }
};
You need to add a top-level sort alongside collapse.
Example:
GET /test/_search
{
  "query": {
    "function_score": {
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "must": [
                {
                  "match": {
                    "job_status": "SUCCESS"
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "collapse": {
    "field": "run_id.keyword"
  },
  "sort": [
    {
      "#timestamp": {
        "order": "desc"
      }
    }
  ]
}
This may solve your issue.
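Applied to the query from the question, one way to sketch this: with a top-level sort on price ascending, field collapsing represents each SKU group by its top-sorted (i.e. lowest-priced) document. The index name is an assumption, field names are taken from the question, and sale.price is assumed to be the same field as price:

```json
GET /items/_search
{
  "from": 0,
  "size": 25,
  "_source": ["itemId"],
  "sort": [
    { "price": { "order": "asc" } }
  ],
  "collapse": {
    "field": "sku"
  }
}
```

This returns one item per SKU, ordered ascending by each SKU's lowest price. Note that simply flipping the sort to desc would make the highest-priced document the representative; ordering groups high-to-low by their lowest price typically requires indexing a per-SKU lowest_price field on every document and sorting on that instead.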

Return unique results in elasticsearch

I have a use case in which I have data like
{
  name: "John",
  parentid: "1234",
  filter: {a: '1', b: '3', c: '4'}
},
{
  name: "Tim",
  parentid: "2222",
  filter: {a: '2', b: '1', c: '4'}
},
{
  name: "Mary",
  parentid: "1234",
  filter: {a: '1', b: '3', c: '5'}
},
{
  name: "Tom",
  parentid: "2222",
  filter: {a: '1', b: '3', c: '1'}
}
expected results:
bucket: [{
  key: "2222",
  hits: [{
    name: "Tom" ...
  },
  {
    name: "Tim" ...
  }]
},
{
  key: "1234",
  hits: [{
    name: "John" ...
  },
  {
    name: "Mary" ...
  }]
}]
I want to return unique documents by parentid. I could use a top_hits aggregation, but I don't know how to paginate the buckets. Since parentid values are more likely to differ than to repeat, the bucket array would be large, and I want to show all of the buckets, paginated.
There is no direct way of doing this, but you can follow these steps to get the desired result.
Step 1. You should know all parentid values. You can obtain them with a simple terms aggregation on the parentid field; you will get only the list of parentid values, not the documents matching them. In the end you will have a smaller array than you are currently expecting.
{
  "aggs": {
    "parentids": {
      "terms": {
        "field": "parentid",
        "size": 0
      }
    }
  }
}
size: 0 is required to return all buckets.
OR
If you already know list of all parentid then you can directly move to step 2.
Step 2. Fetch related documents by filtering documents by parentid and here you can apply pagination.
{
  "from": 0,
  "size": 20,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "parentid": "2222"
        }
      }
    }
  }
}
from and size are used for pagination, so you can loop through each parentid in the list and fetch all of its related documents.
If you are just looking for all names grouped by parent id, you can use below query:
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "parent": {
      "terms": {
        "field": "parentid",
        "size": 0
      },
      "aggs": {
        "NAME": {
          "terms": {
            "field": "name",
            "size": 0
          }
        }
      }
    }
  },
  "size": 0
}
If you want the entire document grouped by parentid, it will be a 2-step process as explained by Sumit above, and you can use pagination there.
Aggregation doesn't give you access to all documents/document-ids in the agg result, so this will have to be a 2 step process.

Elasticsearch searching and sorting across 2 models

I have 2 models: Products and Skus, where a Product has one or more Skus, and a Sku belongs to exactly one Product. They have the following columns:
Product: id, title, content, category_id
Sku: id, product_id, price
I'd like to be able to display 48 products per page across various search and sort configurations, but I'm having trouble translating this to elasticsearch.
For example, it's not clear to me how I would search on title while sorting the relevant results by the lowest-priced Sku for each Product. I've tried a few different things, and closest has been to index everything as belonging to the Sku, then searching like so:
size: '48',
aggs: {
  group_by_product: {
    terms: { field: 'product_id' }
  }
},
filter: {
  and: [{
    bool: {
      must: { range: { price: { gte: 0, lte: 50 } } }
    }
  }, {
    bool: {
      must: { terms: { category_id: [1, 2, 3, 4, 5, 6] } }
    }
  }]
},
query: {
  fuzzy_like_this: {
    fields: ['title', 'content'],
    like_text: 'Chair',
    fuzziness: 1
  }
}
But this gives 48 matching Skus, many of which belong to the same Product, so my pagination is off if I try to combine them after the search.
What would be the best way to handle this use case?
Update
Trying with the nested method, using the following structure:
{
  size: '48',
  query: {
    bool: {
      should: {
        fuzzy_like_this: {
          fields: ['title'],
          like_text: 'chair',
          fuzziness: 1
        }
      },
      must: {
        nested: {
          path: 'skus',
          query: {
            bool: {
              must: { range: { price: { gte: 0, lte: 100 } } }
            }
          }
        }
      }
    }
  },
  sort: {
    _score: 'asc',
    'skus.price': {
      nested_path: 'skus',
      nested_filter: {
        range: { 'skus.price': { gte: 0, lte: 100 } }
      },
      order: 'asc',
      mode: 'min'
    }
  }
}
This is likely closer, but I'm still not sure how to format it. The above gives products ordered by price, but it seems to completely disregard the search field.
Since paginating aggregation results is not possible, and the approach of including the skus inside the product is a good one, I would go with nested objects, depending on the query requirements.
As an example query:
GET /product/test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": {
              "query": "whatever",
              "fuzziness": 1,
              "prefix_length": 3
            }
          }
        },
        {
          "nested": {
            "path": "skus",
            "query": {
              "range": {
                "skus.price": {
                  "gte": 11,
                  "lte": 50
                }
              }
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "skus.price": {
        "nested_path": "skus",
        "order": "asc",
        "mode": "min"
      }
    }
  ]
}
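For the nested query and sort above to work, the skus field needs a nested mapping on the product document. A sketch, using the index and type names from the example query (the field types are assumptions):

```json
PUT /product
{
  "mappings": {
    "test": {
      "properties": {
        "title":   { "type": "string" },
        "content": { "type": "string" },
        "skus": {
          "type": "nested",
          "properties": {
            "price": { "type": "double" }
          }
        }
      }
    }
  }
}
```

Without the nested type, the skus objects are flattened into the parent document and the nested query would be rejected.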

Trouble with Elasticsearch nested query & date calculations

I'm having trouble writing a query to query users with active events.
The short setup is I have users whom have events with start dates and end dates. Given a specific date, I need to know which users do NOT have active events on that day. Events are indexed as nested objects as they have their own models.
So here is some data
[
  {
    id: 1,
    name: 'MyUser',
    events: [
      { id: 1, start: 02/01/2016, end: 02/05/2016 },
      { id: 2, start: 02/09/2016, end: 02/10/2016 }
    ]
  },
  {
    id: 2,
    name: 'MyUser2',
    events: [
      { id: 3, start: 02/02/2016, end: 02/04/2016 }
    ]
  },
  {
    id: 3,
    name: 'MyUser3',
    events: []
  }
]
The mapping looks like this:
'events' => [
  'type' => 'nested',
  'properties' => [
    'start' => [
      'type' => 'date',
      'format' => 'yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'
    ],
    'end' => [
      'type' => 'date',
      'format' => 'yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'
    ]
  ]
],
So for an example query of 02/08/2016 I need to show all users as free; for 02/04/2016, only user 3; and for 02/08/2016, only users 1 and 3.
My query currently looks like this:
{
  "filtered": {
    "filter": {
      "bool": {
        "should": [
          {
            "term": {
              "events_count": 0
            }
          },
          {
            "nested": {
              "path": "events",
              "query": {
                "bool": {
                  "must_not": [
                    {
                      "range": {
                        "events.start": {
                          "lte": "2016-02-08"
                        }
                      }
                    },
                    {
                      "range": {
                        "events.end": {
                          "gte": "2016-02-08"
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}
I indexed events_count separately because I had already given up on mixing missing with nested objects; it just didn't work as expected.
Actual Problem:
The trouble with this is matching the start and end dates together; currently User1 matches the start criterion lte $search_date when it shouldn't.
The logic I'm trying to write is: WHEN events.start < $search_date AND events.end > $search_date, consider it a match.
What actually seems to happen is that the start and end conditions are evaluated as separate logic, so if start < $search_date it is considered a match, even if end < $search_date.
You need to wrap your range queries within another bool query and a must clause (the equivalent of SQL AND).
must_not will exclude all the documents which match any of its queries.
So rather than having
must_not => range_queries
make it like so:
must_not => bool => must => range_queries
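Using the date from the question, the corrected nested part might look like this sketch (field names as in the mapping above):

```json
{
  "nested": {
    "path": "events",
    "query": {
      "bool": {
        "must_not": [
          {
            "bool": {
              "must": [
                { "range": { "events.start": { "lte": "2016-02-08" } } },
                { "range": { "events.end": { "gte": "2016-02-08" } } }
              ]
            }
          }
        ]
      }
    }
  }
}
```

Now a user is excluded only when a single event satisfies both conditions at once, i.e. the event actually spans the search date.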

How to do an SQL like "group by" an indexed field in Elastic Search?

How can I do an SQL-like GROUP BY statement on a '_search' query in Elasticsearch?
I basically need to:
1 - Filter a bunch of items using multiple filters, queries etc. Done
2 - Put these results into buckets of unique category_id. 'category_id' is currently mapped as a 'float' field of the item document type. I also need to display one of the items matching the above filters from each bucket.
3 - Paginate through these buckets
Note: Item count: 1 Million, Unique category_id count: 60,000
I would like to get all documents of the 'items' type grouped by a field called . In the results I would like to get a list of all unique 'category_id' values and a single item in each category (the first or any item, doesn't matter) inside this group. I'd like to be able to use "from" and "size" to paginate through these results.
For example, if I had data to the effect of:
id:1, category_id: 1, color:'blue',
id:2, category_id: 1, color:'red',
id:3, category_id: 1, color:'red',
id:4, category_id: 2, color:'blue',
id:5, category_id: 2, color:'red',
id:6, category_id: 3, color:'blue',
id:7, category_id: 3, color:'blue',
id:8, category_id: 3, color:'blue',
For example, I want to get all items that have the color 'red', grouped by category_id, and get back data to the effect of:
category_id: 1
{
item: { id:2, category_id: 1, color:'red'}
},
category_id: 2
{
item: { id:5, category_id: 2, color:'red'}
}
This is what I have so far, but it doesn't get the correct top hit, and I don't think it allows multiple filters and queries, or pagination.
GET swap/item/_search
{
  "size": 0,
  "aggs": {
    "color_filtered_items": {
      "filter": {
        "and": [
          {
            "terms": {
              "color": ["red"]
            }
          }
        ]
      },
      "aggs": {
        "group_by_cat_id": {
          "terms": {
            "field": "category_id",
            "size": 10
          },
          "aggs": {
            "items": {
              "top_hits": {
                "_source": {
                  "include": ["name", "id", "category_id", "color"]
                },
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
Hacks, workaround, changes to data storage suggestions welcome. Any help greatly appreciated.
Thank you all :)
The following should work, assuming that you don't want a number-range-based aggregation for category_id.
Also, you can't paginate aggregated results, but you can control the size per aggregation.
{
  "aggs": {
    "itemsAgg": {
      "terms": {
        "field": "items",
        "size": 10
      },
      "aggs": {
        "categoryAgg": {
          "terms": {
            "field": "category_id",
            "size": 10
          }
        }
      }
    }
  }
}
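To control which document comes back per bucket (the "correct top hit" concern from the question), a sort inside top_hits picks the representative deterministically. A sketch building on the query from the question; the color filter is moved into the query so the terms aggregation only sees matching documents (sorting by id is an assumption, any sortable field works):

```json
GET swap/item/_search
{
  "size": 0,
  "query": {
    "terms": { "color": ["red"] }
  },
  "aggs": {
    "group_by_cat_id": {
      "terms": {
        "field": "category_id",
        "size": 10
      },
      "aggs": {
        "items": {
          "top_hits": {
            "size": 1,
            "sort": [{ "id": { "order": "asc" } }],
            "_source": {
              "include": ["name", "id", "category_id", "color"]
            }
          }
        }
      }
    }
  }
}
```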