Elasticsearch multi-select faceting - filter

Lets imagine we have the following documents:
curl -XPUT 'http://localhost:9200/multiselect/demo/1' -d '{
"title": "One",
"tags": ["tag1"],
"keywords": ["keyword1"]
}'
curl -XPUT 'http://localhost:9200/multiselect/demo/2' -d '{
"title": "Two",
"tags": ["tag2"],
"keywords": ["keyword2"]
}'
If we do the query:
curl -XGET '
{
"post_filter": {
"and": [
{
"terms": {
"tags": [
"tag1",
"tag2"
]
}
},
{
"terms": {
"keywords": [
"keyword1"
]
}
}
]
},
"aggs": {
"tagFacet": {
"aggs": {
"aggs": {
"terms": {
"field": "tags",
"size": 0
}
}
},
"filter": {
"terms": {
"keywords": [
"keyword1"
]
}
}
},
"keywordFacet": {
"aggs": {
"aggs": {
"terms": {
"field": "keywords",
"size": 0
}
}
},
"filter": {
"terms": {
"tags": [
"tag1",
"tag2"
]
}
}
}
}
}
'
We will have a document "One" and a list of facets: tag1 - 1, keyword1 - 1, keyword2 - 0 and tag2 - 1, but actually the last one tag2 should not be there, because we don't have anything for the keyword2 in our filter (and the facets).
The question is, is there are any possibility to get facets without tag2, and not to make 2 requests.
Let me know, if you need a better explanation, but I guess the basic idea should be clear.
PS. Some better explanation of the following pattern you can find out here: https://gist.github.com/mattweber/1947215; it's the same thing, and it have the same issue described here.

Related

Elasticsearch unexpected results when sorting against deeply nested attributes

I'm trying to perform some sorting based on the attributes of a document's deeply nested children.
Let's say we have an index filled with publisher documents. A publisher has a collection of books, and
each book has a title, a published flag, and a collection of genre scores. A genre_score represents how well
a particular book matches a particular genre, or in this case a genre_id.
First, let's define some mappings (for simplicity, we will only be explicit about the nested types):
curl -XPUT 'localhost:9200/book_index' -d '
{
"mappings": {
"publisher": {
"properties": {
"books": {
"type": "nested",
"properties": {
"genre_scores": {
"type": "nested"
}
}
}
}
}
}
}'
Here are our two publishers:
curl -XPUT 'localhost:9200/book_index/publisher/1' -d '
{
"name": "Best Books Publishing",
"books": [
{
"name": "Published with medium genre_id of 1",
"published": true,
"genre_scores": [
{ "genre_id": 1, "score": 50 },
{ "genre_id": 2, "score": 15 }
]
}
]
}'
curl -XPUT 'localhost:9200/book_index/publisher/2' -d '
{
"name": "Puffin Publishers",
"books": [
{
"name": "Published book with low genre_id of 1",
"published": true,
"genre_scores": [
{ "genre_id": 1, "score": 10 },
{ "genre_id": 4, "score": 10 }
]
},
{
"name": "Unpublished book with high genre_id of 1",
"published": false,
"genre_scores": [
{ "genre_id": 1, "score": 100 },
{ "genre_id": 2, "score": 35 }
]
}
]
}'
And here is the final definition of our index & mappings...
curl -XGET 'localhost:9200/book_index/_mappings?pretty=true'
...
{
"book_index": {
"mappings": {
"publisher": {
"properties": {
"books": {
"type": "nested",
"properties": {
"genre_scores": {
"type": "nested",
"properties": {
"genre_id": {
"type": "long"
},
"score": {
"type": "long"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"published": {
"type": "boolean"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
Now suppose we want to query for a list of publishers, and have them sorted by those who books performing
well in a particular genre. In other words, sort the publishers by the genre_score.score of one of their books
for the target genre_id.
We might write a search query like this...
curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
{
"size": 5,
"from": 0,
"sort": [
{
"books.genre_scores.score": {
"order": "desc",
"nested_path": "books.genre_scores",
"nested_filter": {
"term": {
"books.genre_scores.genre_id": 1
}
}
}
}
],
"_source":false,
"query": {
"nested": {
"path": "books",
"query": {
"bool": {
"must": []
}
},
"inner_hits": {
"size": 5,
"sort": []
}
}
}
}'
Which correctly returns the Puffin (with a sort value of [100]) first and Best Books second (with a sort value of [50]).
But suppose we only want to consider books for which published is true. This would change our expectation to have Best Books first (with a sort of [50]) and Puffin second (with a sort of [10]).
Let's update our nested_filter and query to the following...
curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
{
"size": 5,
"from": 0,
"sort": [
{
"books.genre_scores.score": {
"order": "desc",
"nested_path": "books.genre_scores",
"nested_filter": {
"bool": {
"must": [
{
"term": {
"books.genre_scores.genre_id": 1
}
}, {
"term": {
"books.published": true
}
}
]
}
}
}
}
],
"_source": false,
"query": {
"nested": {
"path": "books",
"query": {
"term": {
"books.published": true
}
},
"inner_hits": {
"size": 5,
"sort": []
}
}
}
}'
Suddenly, our sort values for both publishers has become [-9223372036854775808].
Why does adding an additional term to our nested_filter in the top-level sort have this impact?
Can anyone provide some insight as to why this behavior is happening? And additionally, if there are any viable solutions to the proposed query/sort?
This occurs in both ES1.x and ES5
Thanks!

Facet by objects(tags) in an array

I am running into a query problem with ElasticSearch.
We have objects that looks like this:
{
"id":"1234",
"tags":[
{ "tagName": "T1", "tagValue":"V1"},
{ "tagName": "T2", "tagValue":"V2"},
{ "tagName": "T3", "tagValue":"V3"}
]
}
{
"id":"5678",
"tags":[
{ "tagName": "T1", "tagValue":"X1"},
{ "tagName": "T2", "tagValue":"X2"}
]
}
And I would like to get a list of tagValues for tagName=T1, which is "V1" and "X1".
I tried
{
"filter": {
"bool": {
"must": [
{
"term":{
"tags.tagName": "T1"
}
}
]
}
},
"facets": {
"TagValues":{
"filter": {
"term": {
"tags.tagName": "T1"
}
},
"terms": {
"field": "tags.tagValue",
"size": 30
}
}
}
}
It seems like it's returning all tagValues from all tags "T1", "T2", and "T3".
Can someone please help me with this query? How can I get faceted list for objects that's in an array?
Any help would be appreciated.
Thank you,
The main idea is to use the nested type for your tags field. Here is the mapping you should use:
curl -XPUT localhost:9200/mytags -d '{
"mappings": {
"mytag": {
"properties": {
"id": {
"type": "string"
},
"tags": {
"type": "nested",
"properties": {
"tagName": {
"type": "string",
"index": "not_analyzed"
},
"tagValue": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
Then you can reindex your data and run a query like the one below, which will first filter only the document containing a tagName whose value is T1 and then using aggregations (don't use facets anymore as they are deprecated), you can again select only those tags whose tagName is T1 and then retrieve the associated tagValue fields. This will get you the expected V1 and X1 values.
curl -XPOST localhost:9200/mytags/mytag/_search -d '{
"size": 0,
"query": {
"filtered": {
"filter": {
"nested": {
"path": "tags",
"query": {
"term": {
"tags.tagName": "T1"
}
}
}
}
}
},
"aggs": {
"tags": {
"nested": {
"path": "tags"
},
"aggs": {
"values": {
"filter": {
"term": {
"tags.tagName": "T1"
}
},
"aggs": {
"values": {
"terms": {
"field": "tags.tagValue"
}
}
}
}
}
}
}
}'

elasticsearch size parameter not work

With size parameter or not this query always return 10 document(total doc:12678)
Somehow it ignores size parameter, even size eqauls to 2 it returns 10 docs again
POST webproxylog/_search
{
"from": 0, "size": 100,
"query": {
"filtered": {
"filter": {
"terms": {
"category": [
"-1",
"0"
]
}
}
}
},
"sort": [
{
"respsize": {
"order": "desc"
}
}
]
}
You should use POST instead of GET when sending the query in the HTTP payload. Some HTTP clients do not send a payload when using GET.
The following will get you 100 results:
curl -XPOST localhost:9200/webproxylog/_search -d '{
"from": 0, "size": 100,
"query": {
"filtered": {
"filter": {
"terms": {
"category": [
"-1",
"0"
]
}
}
}
},
"sort": [
{
"respsize": {
"order": "desc"
}
}
]
}'

Elasticsearch multi-select facet functionality with child aggregation

Given the following data:
curl -XPUT 'http://localhost:9200/products/'
curl -XPOST 'http://localhost:9200/products/product/_mapping' -d '{
"product": {
"_parent": {"type": "product_group"}
}
}'
curl -XPUT 'http://localhost:9200/products/product_group/1' -d '{
"title": "Product 1"
}'
curl -XPOST localhost:9200/products/product/1?parent=1 -d '{
"height": 190,
"width": 120
}'
curl -XPOST localhost:9200/products/product/2?parent=1 -d '{
"height": 120,
"width": 100
}'
curl -XPOST localhost:9200/products/product/3?parent=1 -d '{
"height": 110,
"width": 120
}'
Child aggregation on product results in the following facets:
Height
110 (1)
120 (1)
190 (1)
Width
120 (2)
100 (1)
If I now filter on height 190, what I would like is to have the height aggregation excluded from the filter so the results would be:
Height
110 (1)
120 (1)
190 (1)
Width
120 (1)
This is solvable with filter aggregation, but I'm not sure if it works or how the syntax is when using parent - child relations.
See http://distinctplace.com/2014/07/29/build-zappos-like-products-facets-with-elasticsearch/
What I've tried so far:
curl -XGET 'http://localhost:9200/products/product_group/_search?pretty=true' -d '{
"filter": {
"has_child": {
"type": "product",
"filter": {
"term": {"height": 190}
},
"inner_hits": {}
}
},
"aggs": {
"to-products": {
"children": {"type": "product"},
"aggs": {
"height": {
"filter": {"match_all": {}},
"aggs": {
"height": {
"terms": {"field": "height", "size": 10}
}
}
},
"width": {
"filter": {
"and": [{"terms": { "height": [190]}}]
},
"aggs": {
"width": {
"terms": {"field": "width", "size": 10}
}
}
}
}
}
}
}
'
I don't fully understand your question, but If you want to have multiple aggregation inside child aggregation, you have to append parent type name before every field in aggregation.
here is modified query,
curl -XPOST "http://localhost:9200/products/product_group/_search?pretty=true" -d'
{
"size": 0,
"filter": {
"has_child": {
"type": "product",
"filter": {
"term": {
"product.height": 190
}
},
"inner_hits": {}
}
},
"aggs": {
"to-products": {
"children": {
"type": "product"
},
"aggs": {
"height": {
"filter": {
"match_all": {}
},
"aggs": {
"height": {
"terms": {
"field": "product.height",
"size": 10
}
}
}
},
"width": {
"filter": {
"and": [
{
"terms": {
"product.height": [
190
]
}
}
]
},
"aggs": {
"width": {
"terms": {
"field": "product.width",
"size": 10
}
}
}
}
}
}
}
}'
It wasn't mentioned anywhere in documentation, which is confusing to many users, I guess they treat child aggregation same as nested aggregation so same way to aggregate.

Trying to extract a leaf field from Elasticsearch

I have an object in elasticsearch which resembles something like this:
{
"text": "something something something",
"entities": { "hashtags":["test","test123"]}
}
The problem is that not each document has the entities attribute set. So I want to write a query which:
must contain a keyword in the text field
must have the entities field
extracts the entities.hashtag field
I'm trying to extract a leaf field using following query, the problem is I still get documents which don't have an entities field.
For the second part of the question, I was wondering: How do I only extract the entities.hashtags field? I tried something like "fields": ["entities.hashtags"] but it didn't work.
{
"size": 2000,
"query": {
"filtered": {
"query": {
"match_all": {
}
},
"filter": {
"bool": {
"must": [{
"term": {
"text": "something"
}
},
{
"missing": {
"field": "entities",
"existence": true
}
}]
}
}
}
}
}
This seems to do what you want, if I'm understanding you correctly. A "term" filter on the "text" field and an "exists" filter on the "entities" field filters the docs, and a "terms" aggregation on "entities.hashtags" extracts the values. I'll just post the full example I used:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1
}
}
PUT /test_index/doc/1
{
"text": "something something something",
"entities": { "hashtags": ["test","test123"] }
}
PUT /test_index/doc/2
{
"text": "another doc",
"entities": { "hashtags": ["testagain","testagain123"] }
}
PUT /test_index/doc/3
{
"text": "doc with no entities"
}
POST /test_index/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{ "term": { "text": "something" } },
{ "exists": { "field": "entities" } }
]
}
}
}
},
"aggs": {
"hashtags": {
"terms": {
"field": "entities.hashtags"
}
}
}
}
...
{
"took": 35,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"hashtags": {
"buckets": [
{
"key": "test",
"doc_count": 1
},
{
"key": "test123",
"doc_count": 1
}
]
}
}
}

Resources