Filtering facets results in nested element with ElasticSearch - elasticsearch

I have this mapping:
products: {
product: {
properties: {
id: {
type: "long"
},
name: {
type: "string"
},
tags: {
dynamic: "true",
properties: {
tagId: {
type: "long"
},
tagType: {
type: "long"
}
}
}
}
}
}
I want to create a facet on tag ids, but with tag-type filtering.
I need the filter to only apply on the facet and not the query results.
So here's my request:
{
"from": 0,
"size": 10,
"facets": {
"tags": {
"terms": {
"field": "tags.tagId",
"size": 10
},
"facet_filter": {
"terms": {
"tags.tagType": [
"11",
"19"
]
}
}
}
},
"query": {
"match_all": {}
}
}
The facet filtering does not seem to affect the faceting.
Any ideas?

The filter is applied to the documents, that is, to the parent entity in your example. This means you are filtering the documents on which the facet is built by tags.tagType. Therefore every tag of every document that has a matching tags.tagType value is used to build the facet, which is not what you want.
This is the use case for nested documents. You can have a look at this nice article too.
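To illustrate the difference, here is a small simulation in plain Python. It does not call ElasticSearch; the sample documents and the function names are made up for illustration. It only mimics how a filter behaves when tags is a plain object (the filter selects whole documents) versus when tags is nested (the filter is evaluated per tag object).

```python
from collections import Counter

# Hypothetical sample documents shaped like the mapping in the question.
docs = [
    {"id": 1, "tags": [{"tagId": 100, "tagType": 11}, {"tagId": 200, "tagType": 5}]},
    {"id": 2, "tags": [{"tagId": 200, "tagType": 5}, {"tagId": 300, "tagType": 19}]},
    {"id": 3, "tags": [{"tagId": 400, "tagType": 7}]},
]
wanted_types = {11, 19}


def facet_flattened(docs, wanted_types):
    """Plain object mapping: the filter picks whole documents,
    then every tagId of those documents is counted."""
    counts = Counter()
    for doc in docs:
        if any(tag["tagType"] in wanted_types for tag in doc["tags"]):
            counts.update({tag["tagId"] for tag in doc["tags"]})
    return dict(counts)


def facet_nested(docs, wanted_types):
    """Nested mapping: the filter is checked against each tag object,
    so only tagIds whose own tagType matches are counted."""
    counts = Counter()
    for doc in docs:
        counts.update(
            {tag["tagId"] for tag in doc["tags"] if tag["tagType"] in wanted_types}
        )
    return dict(counts)


print(facet_flattened(docs, wanted_types))
print(facet_nested(docs, wanted_types))
```

In the flattened case tagId 200 is counted even though its own tagType is 5, which matches the behaviour described in the question. With a nested mapping only the tags of type 11 or 19 contribute.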

Related

OpenSearch / ElasticSearch index mappings

I have a system that ingests multiple scores for events, and we use OpenSearch (previously Elasticsearch) to get the averages.
For example, an input would be similar to:
// event 1
{
id: "foo1",
timestamp: "some-iso8601-timestamp",
scores: [
{ name: "arbitrary-name-1", value: 80 },
{ name: "arbitrary-name-2", value: 55 },
{ name: "arbitrary-name-3", value: 30 },
]
}
// event 2
{
id: "foo2",
timestamp: "some-iso8601-timestamp",
scores: [
{ name: "arbitrary-name-1", value: 90 },
{ name: "arbitrary-name-2", value: 65 },
{ name: "arbitrary-name-3", value: 40 },
]
}
The score names are arbitrary and subject to change from time to time.
We ultimately would like to query the data to get the average score values:
[
{ name: "arbitrary-name-1", value: 85 },
{ name: "arbitrary-name-2", value: 60 },
{ name: "arbitrary-name-3", value: 35 },
]
However, the only way we have been able to achieve this so far has been to insert multiple documents, one for each score name/value pair in each event. This seems wasteful. The search in place currently is to group the documents by score name and timestamp intervals, then to perform a weighted average of the scores in each bucket.
Is there a way the data can be inserted to allow this query pattern to take place by only adding one document into opensearch per event/record (rather than one document per score per event/record)? How might that look?
Thanks!
Is this what you were trying to do?
I got a bit confused.
DELETE /71397606
PUT /71397606
{
"mappings": {
"properties": {
"id": {
"type": "text"
},
"scores": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"value": {
"type": "long"
}
}
},
"timestamp": {
"type": "text"
}
}
}
}
POST /_bulk
{"index":{"_index":"71397606"}}
{"id":"foo1","timestamp":"some-iso8601-timestamp","scores":[{"name":"arbitrary-name-1","value":80},{"name":"arbitrary-name-2","value":55},{"name":"arbitrary-name-3","value":30}]}
{"index":{"_index":"71397606"}}
{"id":"foo2","timestamp":"some-iso8601-timestamp","scores":[{"name":"arbitrary-name-1","value":90},{"name":"arbitrary-name-2","value":65},{"name":"arbitrary-name-3","value":40}]}
{"index":{"_index":"71397606"}}
{"id":"foo2","timestamp":"some-iso8601-timestamp","scores":[{"name":"arbitrary-name-1","value":85},{"name":"arbitrary-name-x","value":65},{"name":"arbitrary-name-y","value":40}]}
GET /71397606/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"nested": {
"nested": {
"path": "scores"
},
"aggs": {
"pername": {
"terms": {
"field": "scores.name",
"size": 10
},
"aggs": {
"avg": {
"avg": {
"field": "scores.value"
}
}
}
}
}
}
}
}
P.S.
If not, could you give an example?
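As a sanity check, here is a rough sketch in plain Python of what the nested terms + avg aggregation above should compute for the three bulk-indexed documents. It does not talk to OpenSearch; the function name average_scores is hypothetical and the data is copied from the bulk request.

```python
from collections import defaultdict

# The three documents from the bulk request above (only the scores matter here).
events = [
    {"id": "foo1", "scores": [
        {"name": "arbitrary-name-1", "value": 80},
        {"name": "arbitrary-name-2", "value": 55},
        {"name": "arbitrary-name-3", "value": 30},
    ]},
    {"id": "foo2", "scores": [
        {"name": "arbitrary-name-1", "value": 90},
        {"name": "arbitrary-name-2", "value": 65},
        {"name": "arbitrary-name-3", "value": 40},
    ]},
    {"id": "foo2", "scores": [
        {"name": "arbitrary-name-1", "value": 85},
        {"name": "arbitrary-name-x", "value": 65},
        {"name": "arbitrary-name-y", "value": 40},
    ]},
]


def average_scores(events):
    """Group every nested score by name and average the values,
    mirroring a terms bucket per name with an avg sub-aggregation."""
    totals = defaultdict(lambda: [0, 0])  # name -> [sum, count]
    for event in events:
        for score in event["scores"]:
            bucket = totals[score["name"]]
            bucket[0] += score["value"]
            bucket[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


for name, avg in sorted(average_scores(events).items()):
    print(name, avg)
```

Each score is treated as its own nested object, so one document per event is enough and the score names can stay arbitrary.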

Filter document on items in an array ElasticSearch

I am using ElasticSearch to search through documents. However, I need to make sure the current user is able to see those documents. Each document is tied to a community, in which the user may belong.
Here is the mapping for my Document:
export const mapping = {
properties: {
amazonId: { type: 'text' },
title: { type: 'text' },
subtitle: { type: 'text' },
description: { type: 'text' },
createdAt: { type: 'date' },
updatedAt: { type: 'date' },
published: { type: 'boolean' },
communities: { type: 'nested' }
}
}
I'm currently saving the ids of the communities the document belongs to in an array of strings. Ex: ["edd05cd0-0a49-4676-86f4-2db913235371", "672916cf-ee32-4bed-a60f-9a7c08dba04b"]
Currently, when I filter a query with {term: { communities: community.id } }, it returns all the documents, regardless of the communities it's tied to.
Here's the full query:
{
index: 'document',
filter_path: { filter: {term: { communities: community.id } } },
body: {
sort: [{ createdAt: { order: 'asc' } }]
}
}
This is the following result based on the community id of "b7d28e7f-7534-406a-981e-ddf147b5015a". NOTE: This is a return from my graphql, so the communities on the document are actual full objects after resolving the hits from the ES query.
"hits": [
{
"title": "The One True Document",
"communities": [
{
"id": "edd05cd0-0a49-4676-86f4-2db913235371"
},
{
"id": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
}
]
},
{
"title": "Boring Document 1",
"communities": []
},
{
"title": "Boring Document 2",
"communities": []
},
{
"title": "Unpublished",
"communities": [
{
"id": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
}
]
}
]
When I attempt to map the communities as {type: 'keyword', index: 'not_analyzed'} I receive an error that states, [illegal_argument_exception] Could not convert [communities.index] to boolean.
So do I need to change my mapping, my filter, or both? Searching around the docs for 6.6, I see that terms needs the non_analyzed mapping.
UPDATE --------------------------
I updated the communities mapping to be a keyword as suggested below. However, I still received the same result.
I updated my query to the following (using a community id that has documents):
query: { index: 'document',
body:
{ sort: [ { createdAt: { order: 'asc' } } ],
from: 0,
size: 5,
query:
{ bool:
{ filter:
{ term: { communities: '672916cf-ee32-4bed-a60f-9a7c08dba04b' } } } } } }
Which gives me the following results:
{
"data": {
"communities": [
{
"id": "672916cf-ee32-4bed-a60f-9a7c08dba04b",
"feed": {
"documents": {
"hits": []
}
}
}
]
}
}
Appears that my filter is working too well?
Since you are storing the ids of communities, you should make sure that the ids don't get analyzed. For this, communities should be of type keyword. Second, you want to store an array of community ids, since a document can belong to multiple communities. To do this you don't need to make the field nested. Nested has an altogether different use case.
To store values as an array, pass the field as an array when indexing; for consistency, do so even when there is only a single value.
You need to change the mapping and the way you are indexing values for the communities field.
1. Update mapping as below:
PUT my_index
{
"mappings": {
"_doc": {
"properties": {
"amazonId": {
"type": "text"
},
"title": {
"type": "text"
},
"subtitle": {
"type": "text"
},
"description": {
"type": "text"
},
"createdAt": {
"type": "date"
},
"updatedAt": {
"type": "date"
},
"published": {
"type": "boolean"
},
"communities": {
"type": "keyword"
}
}
}
}
}
2. Adding a document to index:
PUT my_index/_doc/1
{
"title": "The One True Document",
"communities": [
"edd05cd0-0a49-4676-86f4-2db913235371",
"672916cf-ee32-4bed-a60f-9a7c08dba04b"
]
}
3. Filtering by community id:
GET my_index/_doc/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"communities": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
}
}
]
}
}
}
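If the user belongs to several communities, the same idea can be extended with a terms filter. The sketch below only builds the request body as a Python dict; the function name community_filter is hypothetical, and it assumes communities is mapped as keyword as in step 1.

```python
import json


def community_filter(community_ids):
    """Build a search body that keeps only documents tied to at least
    one of the given community ids (assumes a keyword 'communities' field)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"communities": list(community_ids)}}
                ]
            }
        }
    }


body = community_filter([
    "edd05cd0-0a49-4676-86f4-2db913235371",
    "672916cf-ee32-4bed-a60f-9a7c08dba04b",
])
print(json.dumps(body, indent=2))
```

The resulting dict can be passed as the body of the search request in whichever client you use.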
Nested Field approach
1. Mapping:
PUT my_index_2
{
"mappings": {
"_doc": {
"properties": {
"amazonId": {
"type": "text"
},
"title": {
"type": "text"
},
"subtitle": {
"type": "text"
},
"description": {
"type": "text"
},
"createdAt": {
"type": "date"
},
"updatedAt": {
"type": "date"
},
"published": {
"type": "boolean"
},
"communities": {
"type": "nested"
}
}
}
}
}
2. Indexing document:
PUT my_index_2/_doc/1
{
"title": "The One True Document",
"communities": [
{
"id": "edd05cd0-0a49-4676-86f4-2db913235371"
},
{
"id": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
}
]
}
3. Querying (using a nested query):
GET my_index_2/_doc/_search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "communities",
"query": {
"term": {
"communities.id.keyword": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
}
}
}
}
]
}
}
}
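You may notice that I used communities.id.keyword and not communities.id. The id sub-field is not declared in the mapping above, so it is mapped dynamically; with the default dynamic mapping a string becomes a text field with a keyword sub-field, and a term query should target the un-analyzed keyword sub-field. To understand the reason for this in more detail, go through this.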

Elasticsearch with nested objects query

I have an index with a nested mapping.
I want to perform a query that will return the following: give me all the documents where each word in the search term appears in one or more of the nested documents.
Here is the index:
properties: {
column_values_index_as_objects: {
type: "nested",
properties: {
value: {
ignore_above: 256,
type: 'keyword',
fields: {
word_middle: {
analyzer: "searchkick_word_middle_index",
type: "text"
},
analyzed: {
term_vector: "with_positions_offsets",
type: "text"
}
}
}
}
}
}
Here is the latest query I tried:
nested: {
path: "column_values_index_as_objects",
query: {
bool: {
must: [
{
match: {
"column_values_index_as_objects.value.analyzed": {
query: search_term,
boost: 10 * boost_factor,
operator: "or",
analyzer: "searchkick_search"
}
}
}
For example, if I search for the words 'food and water', I want each word to appear in at least one nested document.
The current search returns the document even if only one of the words exists.
Thanks for the help!
Update:
As Cristoph suggested, the solution works. Now I have the following problem.
Here is my index:
properties: {
name: {
type: "text"
},
column_values_index_as_objects: {
type: "nested",
properties: {
value: {
ignore_above: 256,
type: 'keyword',
fields: {
word_middle: {
analyzer: "searchkick_word_middle_index",
type: "text"
},
analyzed: {
term_vector: "with_positions_offsets",
type: "text"
}
}
}
}
}
}
The query I want to perform is this: if I search for 'my name is guy', it should return all the documents where all the words are found, whether in the nested documents or in the name field.
For example, I could have a document with the value 'guy' in the name field and the other words in the nested documents.
In order to do this, I usually split the terms and generate a request like this (foo:bar is another criterion on another field):
{
"bool": {
"must": [
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "food",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "and",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "water",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"term": {
"foo": "bar"
}
}
]
}
}
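The splitting step can be sketched as follows. This is only an illustration that builds the request as a Python dict: the function name build_all_words_query, its parameters and the numeric boost default are hypothetical, and the field and analyzer names are taken from the question. The optional top_level_fields argument is my own assumption of how the updated question (a word may also be found in name) could be handled, by wrapping each word in a bool/should.

```python
PATH = "column_values_index_as_objects"
FIELD = PATH + ".value.analyzed"


def build_all_words_query(search_term, top_level_fields=(), boost=10.0,
                          analyzer="searchkick_search"):
    """Return a bool/must query with one clause per word, so that every
    word has to match somewhere (possibly in different nested documents)."""
    clauses = []
    for word in search_term.split():
        nested = {
            "nested": {
                "path": PATH,
                "query": {
                    "match": {
                        FIELD: {"query": word, "boost": boost, "analyzer": analyzer}
                    }
                },
            }
        }
        if top_level_fields:
            # Assumption: the word may match either a nested document
            # or one of the given top-level fields (for example "name").
            alternatives = [nested] + [
                {"match": {field: word}} for field in top_level_fields
            ]
            clauses.append(
                {"bool": {"should": alternatives, "minimum_should_match": 1}}
            )
        else:
            clauses.append(nested)
    return {"bool": {"must": clauses}}


query = build_all_words_query("my name is guy", top_level_fields=["name"])
print(len(query["bool"]["must"]))
```

Extra criteria such as the foo:bar term can be appended to the must list afterwards.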

Perform nested sort without inner_hits in ElasticSearch

I need some help querying records from ElasticSearch (1.7.3). We get a list of the evaluations performed and display only the last evaluation done, as shown below:
evaluation: [
{
id: 2,
breaches: null
},
{
id: 6,
breaches: null
},
{
id: 7,
breaches: null
},
{
id: 15,
breaches: null
},
{
id: 18,
breaches: [
"rule_one",
"rule_two",
"rule_three"
]
},
{
id: 19,
breaches: [
"rule_one",
"rule_two",
"rule_three"
]
}
]
Now we need to query records on the basis of the latest evaluation performed, that is, to query only the last object of the evaluation array. We found that inner_hits supports sorting and limiting the nested records. So we wrote a query that sorts by evaluation id in descending order and limits the size to 1, as shown below:
{
"query": {
"bool": {
"must": {
"nested": {
"path": "evaluation",
"query": {
"bool": {
"must": {
"term": {
"evaluation.breaches": "rule_one"
}
}
}
},
"inner_hits": {
"sort": {
"evaluation.id": {
"order": "desc"
}
},
"size": 1
}
}
}
}
}
}
Please find the mapping below:
evaluation: {
type: "nested",
properties: {
id: {
type: "long"
},
breaches: {
type: "string"
}
}
}
We tried sorting the records but it did not work. Can you suggest some other way to search on just the last object of the nested records?
Thanks.

Elastic Search Nested Aggregation size

I have a nested aggregation that is only returning 10 results. I want it to return 1000 results. However, I'm not sure where to specify the size. My mapping looks like this (it's in YAML but is processed to JSON, don't worry about that):
mappings:
  datainfo:
    properties:
      filterValues:
        type: string
      metadata:
        properties:
          isPrimary:
            type: boolean
          name:
            index: not_analyzed
            type: string
          source:
            enabled: false
            type: object
          type:
            index: not_analyzed
            type: string
          val:
            index: not_analyzed
            type: string
        type: nested
      source:
        enabled: false
        type: object
      title:
        type: string
My query looks something like
{
"query": "<some query>",
"aggs": {
"series": {
"nested": { "path": "metadata" },
"aggs": {
"val": {
"terms": { "field": "metadata.val" },
"aggs": {
"type": {
"terms": { "field": "metadata.name" }
}
}
}
}
}
}
}
Where do I put a "size" field in order to make this return X results? It currently only returns 10.
To specify the number of hits returned by the query, you can use size, or size together with from for paging:
{
"query": "<some query>",
"size": 1000,
"aggs": {
"series": {
"nested": { "path": "metadata" },
"aggs": {
"val": {
"terms": { "field": "metadata.val" },
"aggs": {
"type": {
"terms": { "field": "metadata.name" }
}
}
}
}
}
}
}
To control how many hits are returned inside each aggregation bucket, you can use the Top Hits Aggregation (example from link):
{
"aggs": {
"top-tags": {
"terms": {
"field": "tags",
"size": 3
},
"aggs": {
"top_tag_hits": {
"top_hits": {
"sort": [
{
"last_activity_date": {
"order": "desc"
}
}
],
"_source": {
"includes": [
"title"
]
},
"size" : 100
}
}
}
}
}
}
One recommended approach if you only need the aggregated results is to specify a size of 0 for the query results which eliminates the first fetch and consequently performs better.
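Note that the number of buckets a terms aggregation returns is governed by the size parameter of that terms aggregation itself (10 by default), not by the top-level size. A minimal sketch, assuming the field names from the question; the function name nested_terms_body and its arguments are hypothetical, and the query part is left to the caller because the original query is not shown.

```python
import json


def nested_terms_body(bucket_size=1000, query=None):
    """Build a search body whose terms aggregations return up to
    bucket_size buckets each; top-level size 0 skips the hits."""
    body = {
        "size": 0,
        "aggs": {
            "series": {
                "nested": {"path": "metadata"},
                "aggs": {
                    "val": {
                        "terms": {"field": "metadata.val", "size": bucket_size},
                        "aggs": {
                            "type": {
                                "terms": {
                                    "field": "metadata.name",
                                    "size": bucket_size,
                                }
                            }
                        },
                    }
                },
            }
        },
    }
    if query is not None:
        body["query"] = query
    return body


print(json.dumps(nested_terms_body(), indent=2))
```

Whether the inner terms aggregation also needs the larger size depends on how many names you expect per value; it is set here only for illustration.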
