Getting aggregated results with selected facet - elasticsearch

Not sure if this is possible, but I'm running into the following issue:
While on a category page with no facet selected, I run a query with some aggregations on my facets.
For example, on the "ladies shoes" page I run a query with "gender=ladies" and "category=shoes" as filters, which gives me all the results I want. There is also an aggregation on "brand" that returns all the brands. However, it also contains brands with a count of 0, since they don't match the "ladies shoes" criteria. But since no facet is selected, I can simply hide them so the user won't see them.
So far, so good.
Now, when I run a query for "ladies shoes from Nike" (brand=nike as a filter), I get the same list of aggregations, but now every brand except Nike has a count of 0. I can't simply hide them, because we want to let users filter on multiple (available) brands.
What would be the best approach, using as few queries as possible?

For multi-select faceting, as in your example, Elasticsearch has a very handy feature: post_filter.
The post_filter is applied to the search hits at the very end of a
search request, after aggregations have already been calculated.
All you need to do is move your Nike brand filter into the post_filter of the query, like this:
{
  "query": {
    ...
  },
  "aggs": {
    ...
  },
  "post_filter": {
    "term": { "brand": "Nike" }
  }
}
This lets Elasticsearch calculate the aggregations over all brands and only afterwards filter the hits down to the selected brand.
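A minimal sketch of the same idea in Python. It only builds the request body as a plain dict (no client call) and assumes gender, category and brand are keyword (not analyzed) fields as named in the question; the helper name multi_select_query and the aggregation name brands are hypothetical. It uses a terms post_filter so that several brands can be selected at once.

```python
def multi_select_query(base_filters, selected_brands, brand_field="brand"):
    """Build a search body whose brand facet ignores the brand selection."""
    body = {
        # Page-level constraints (gender, category) narrow both hits and aggregations.
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in base_filters.items()]
            }
        },
        # Calculated before post_filter, so every brand on the page keeps its count.
        "aggs": {"brands": {"terms": {"field": brand_field}}},
    }
    if selected_brands:
        # Applied to the hits only, after the aggregations have been calculated.
        body["post_filter"] = {"terms": {brand_field: list(selected_brands)}}
    return body


body = multi_select_query({"gender": "ladies", "category": "shoes"}, ["Nike"])
```

Keep in mind that post_filter is ignored by every aggregation in the request, not just the brand one. If you add other facets that should respect the brand selection, wrap those aggregations in a filter aggregation carrying the brand filter.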

Related

Filtering collapsed results in Elasticsearch

I have an elasticsearch index containing documents that represent entities at a given point in time. When an entity changes state, a new document is created with a timestamp. When I need to get the current state of all entities, I can do the following:
GET https://127.0.0.1:9200/myindex/_search
{
  "collapse": {
    "field": "entity_id"
  },
  "sort": [{
    "timestamp": {
      "order": "desc"
    }
  }]
}
However, I would like to further filter the result of the collapse. When an entity is deleted, I create a new document that includes an is_deleted flag along with the timestamp in a nested metadata field. I would like to extend the query above to filter out deleted entities entirely. Simply filtering out documents where entity_metadata.is_deleted is true does not work, because the result then just contains the last document for that entity_id from before it was marked as deleted. How can I filter the results after the collapse to exclude tombstoned entities?
Instead of an is_deleted flag, I would suggest adding a deleted_date field with the date of the deletion to all documents of that entity. Then, when you look at a document, comparing its date with its deleted_date tells you whether the entity was live or deleted at that date.
It would also let you query for:
all documents that don't have a deleted_date field (i.e. not deleted) and
all documents that have a deleted_date before/after a given date.
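A minimal Python sketch of both queries, building only the request body. It assumes deleted_date is a top-level date field written to every document of a deleted entity, as suggested above (how you back-fill it is up to you), and that timestamp and entity_id are mapped as in the question; the helper name live_entities_query and the as_of parameter are hypothetical.

```python
def live_entities_query(as_of=None):
    """Latest document per entity, skipping entities deleted (as of `as_of`)."""
    if as_of is None:
        # Entities that were never deleted carry no deleted_date at all.
        query = {"bool": {"must_not": [{"exists": {"field": "deleted_date"}}]}}
    else:
        query = {
            "bool": {
                # Only consider states recorded up to the given date...
                "filter": [{"range": {"timestamp": {"lte": as_of}}}],
                # ...and keep entities never deleted, or deleted only after that date.
                "should": [
                    {"bool": {"must_not": [{"exists": {"field": "deleted_date"}}]}},
                    {"range": {"deleted_date": {"gt": as_of}}},
                ],
                "minimum_should_match": 1,
            }
        }
    return {
        "query": query,
        # Collapse runs on the already-filtered hits, so deleted entities never appear.
        "collapse": {"field": "entity_id"},
        "sort": [{"timestamp": {"order": "desc"}}],
    }


current = live_entities_query()
historical = live_entities_query("2020-01-01")
```

Because the deletion date sits on every document of the entity, the filter removes the whole entity before collapsing, instead of leaving its last pre-deletion document behind.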

preserving UI in post filter aggregated faceted search

I'm moving a SQL Server product catalog over to Elasticsearch and want to preserve how the UI currently lets the user navigate the options. I am using aggregations with a post filter, but cannot get the selected option's siblings to show up in the aggregations.
An example of what I am trying to achieve is from the Elastic docs:
GET /cars/transactions/_search
{
  "size": 0,
  "query": {
    "match": {
      "make": "ford"
    }
  },
  "post_filter": {
    "term": {
      "color": "green"
    }
  },
  "aggs": {
    "all_colors": {
      "terms": { "field": "color" }
    }
  }
}
So the user has clicked on the green option: the returned documents show only green Ford cars, but the aggregation lists all the colors available for Ford with their counts, which can be added to the UI.
All of this is fine. But there are many makes of car other than Ford. If I added a 'makes' aggregation, this query would only return Ford in the aggregation list. Since I build the navigation UI dynamically from the returned results, there would be no way to place all the other makes of car into the UI unless I queried Elasticsearch many times, which I don't want to do.
If I changed the query to a match_all and moved the query into the post filter, I would get the full list of car makes in the aggregation, but the counts would always be global counts from the match_all query and would not reflect the drill-down.
Is it possible to do this with Elasticsearch? I've gone through the documentation several times and tried many different query formats, but nothing has produced quite the right results so far.
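One common way to get per-facet drill-down counts in a single request is to put every selection in the post_filter and wrap each facet's terms aggregation in a filter aggregation that applies all selections except that facet's own. A minimal Python sketch that only builds the request body, assuming make and color are keyword (not analyzed) fields; the helper name faceted_query and the aggregation names make_facet, color_facet and values are hypothetical.

```python
def faceted_query(selected, facet_fields):
    """selected maps field -> chosen value, e.g. {"make": "ford", "color": "green"}."""

    def clauses(skip=None):
        # Term clauses for every selection, optionally leaving one facet out.
        return [{"term": {field: value}} for field, value in selected.items() if field != skip]

    aggs = {}
    for field in facet_fields:
        # Each facet is counted under every selection except its own, so the
        # sibling options (other makes, other colors) keep drill-down counts.
        aggs[f"{field}_facet"] = {
            "filter": {"bool": {"filter": clauses(skip=field)}},
            "aggs": {"values": {"terms": {"field": field}}},
        }
    return {
        "query": {"match_all": {}},
        # All selections narrow the hits, but only after the aggregations run.
        "post_filter": {"bool": {"filter": clauses()}},
        "aggs": aggs,
    }


body = faceted_query({"make": "ford", "color": "green"}, ["make", "color"])
```

Here the make facet is counted over green cars of every make and the color facet over Ford cars of every color, while the hits are green Fords only.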

how can I sort ElasticSearch result properly

For example, I have the following two documents with the fields Id and Name (the name field is analyzed):
1,jack-in-box
2,box
When I query for "box", I get both documents, but I only want document 2, or at least want document 2 ranked above document 1.
How can I write this query?
I know that doc1 was tokenized into jack, in and box, so searching for box returns doc1. My current solution is to create another field called name_not_analyzed that is not analyzed. But I wonder whether there is a better way to solve this at query time, so that I don't have to reindex. Thanks in advance!
As @jgr pointed out in a comment, doc2 should already rank above doc1 by default, because a match in a shorter field scores higher (field-length normalization). That is not the case if you use your own ranking algorithm, a constant score query, or only a filter, which gives every document a score of 1.
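To see why field length matters, here is a rough worked example of BM25, the default similarity since Elasticsearch 5.0 (older versions used TF/IDF, which also favours shorter fields). It assumes the analyzer splits jack-in-box into three tokens and uses the default parameters k1=1.2 and b=0.75. Actual Elasticsearch scores will not match these numbers exactly (Lucene versions differ in constant factors and store field lengths lossily), but the ordering is the same.

```python
import math


def bm25(tf, doc_len, avg_len, n_docs, doc_freq, k1=1.2, b=0.75):
    # Inverse document frequency in the form Lucene's BM25 similarity uses.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: fields longer than average shrink the tf component.
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / (tf + norm)


# Assumed token lists for the two example documents.
docs = {1: ["jack", "in", "box"], 2: ["box"]}
avg_len = sum(len(toks) for toks in docs.values()) / len(docs)
doc_freq = sum("box" in toks for toks in docs.values())
scores = {
    doc_id: bm25(toks.count("box"), len(toks), avg_len, len(docs), doc_freq)
    for doc_id, toks in docs.items()
}
print(scores)  # doc 2 scores higher than doc 1
```

Both documents contain "box" once and share the same IDF, so the only difference is the length term, which favours the one-token field.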
Now if you only want doc2, you could use scripting
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "_source.name.toLowerCase()=='box'"
        }
      }
    }
  }
}
The script reads the name from _source and lowercases it, so BOX, Box, etc. also match.
Hope this helps!

How to exclude large number of IDs from an Elastic Search query

I'm working on an app similar to Tinder. In Elasticsearch I have a collection of about half a million users and their locations. Whenever a user opens the app to search for nearby users, I run an Elasticsearch query over that collection. The query is fairly complex: it takes into account not only the location but also how active the user is and how many photos they have.
What I struggle with is how to exclude from the query the users that the current user has already swiped through. A naive implementation would be to maintain a nested array of user IDs in every user document in the index and exclude based on that. But since every user makes tens of thousands of swipes, that array could grow very large, so it's not a scalable solution.
Is there a way to exclude a large number of entities from an Elasticsearch query by their IDs without hurting performance?
Use the lookup feature of the Terms query: Terms lookup mechanism
When it’s needed to specify a terms filter with a lot of terms it can be beneficial to fetch those term values from a document in an index. A concrete example would be to filter tweets tweeted by your followers. Potentially the amount of user ids specified in the terms filter can be a lot. In this scenario it makes sense to use the terms filter’s terms lookup mechanism.
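A sketch of the terms lookup approach, building only the request body in Python. It assumes a separate, hypothetical index swipes holding one document per user (keyed by that user's id) with a swiped_ids array, and a hypothetical user_id field in the users index; base_query stands in for your existing query. On recent versions the lookup takes index, id and path (older versions also required type), and on 7.x and later the number of terms it may fetch is capped by the index.max_terms_count setting (65,536 by default), which matters if a user can swipe tens of thousands of profiles.

```python
def exclude_swiped(base_query, current_user_id, lookup_index="swipes", path="swiped_ids"):
    """Wrap base_query so users already swiped by current_user_id are excluded."""
    return {
        "query": {
            "bool": {
                "must": [base_query],
                "must_not": [
                    {
                        "terms": {
                            # Field in the users index holding each user's id (assumed).
                            "user_id": {
                                # Elasticsearch fetches the ids to exclude from this document.
                                "index": lookup_index,
                                "id": current_user_id,
                                "path": path,
                            }
                        }
                    }
                ],
            }
        }
    }


body = exclude_swiped({"match_all": {}}, "user-42")
```

The swipe list then lives in one document per user and is read only for the user who is searching, instead of being stored on every user document.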
You can try adding the ids filter to a bool/must_not clause of your complex query and see how it performs.
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            ...   <--- your other "must" constraints
          ],
          "must_not": [
            {
              "ids": {
                "values": [ "id1", "id2", "id3" ]   <--- your list of ids to exclude
              }
            }
          ]
        }
      }
    }
  }
}

Exclude setting on integer field in term query

My documents contain an integer array field storing the ids of the tags that describe them. Given a specific tag id, I want to extract a list of the tags that most frequently occur together with it.
I can solve this by combining a terms aggregation on the tag id field with a term filter on the same field, but the list I get back obviously always starts with the tag id I provide: every document matching my filter has that tag, so it comes first in the list.
I thought of using the exclude setting to avoid creating the problematic bucket, but since I'm dealing with an integer field, that does not seem to be possible. This query
{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": "00001"
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "exclude": "00001"
      }
    }
  }
}
returns an error saying that Aggregation [tags] cannot support the include/exclude settings as it can only be applied to string values.
Is it possible to avoid getting back this bucket?
As of Elasticsearch 1.4, this is a shortcoming of ES itself.
After the community proposed this change, the functionality has been added and will be included in Elasticsearch 1.5.0.
It's supposed to be fixed as of version 1.5.0.
Look at this: https://github.com/elasticsearch/elasticsearch/pull/7727
Until the fix is released, my workaround is to have the aggregation use a script instead of accessing the field directly, and to have that script return the value as a string.
Works well and without measurable performance loss.
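If upgrading is not an option, another workaround (a different one from the script-based approach above) is to drop the bucket on the client: ask the terms aggregation for one more bucket than you need and filter out the tag you queried for. A minimal Python sketch over the aggregation's bucket list; the helper name top_co_tags and the sample buckets are made up for illustration.

```python
def top_co_tags(buckets, selected_tag, size):
    """Drop the selected tag's own bucket and keep the next `size` buckets.

    `buckets` is the terms aggregation's bucket list, requested with size + 1.
    """
    return [b for b in buckets if b["key"] != selected_tag][:size]


# Hypothetical buckets from a terms aggregation run with "size": 4.
buckets = [
    {"key": 1, "doc_count": 40},
    {"key": 7, "doc_count": 12},
    {"key": 3, "doc_count": 9},
    {"key": 5, "doc_count": 4},
]
co_tags = top_co_tags(buckets, selected_tag=1, size=3)
```

Since every matching document contains the selected tag, its bucket has the highest possible count, and removing at most one bucket from size + 1 always leaves at least size buckets.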
