ElasticSearch - issue with sub term aggregation with array fields - elasticsearch

I have the two following documents:
{
"title":"The Avengers",
"year":2012,
"casting":[
{
"name":"Robert Downey Jr.",
"category":"Actor",
},
{
"name":"Chris Evans",
"category":"Actor",
}
]
}
and:
{
"title":"The Judge",
"year":2014,
"casting":[
{
"name":"Robert Downey Jr.",
"category":"Producer",
},
{
"name":"Robert Duvall",
"category":"Actor",
}
]
}
I would like to perform aggregations, based on two fields : casting.name and casting.category.
I tried with a TermsAggregation based on casting.name field, with a subaggregation, which is another TermsAggregation based on the casting.category field.
The problem is that for the "Chris Evans" entry, ElasticSearch set buckets for ALL categories (Actor, Producer) whereas it should set only 1 bucket (Actor).
It seems that there is a cartesian product between all casting.category occurences and all casting.name occurences.
It behaves like this with array fields (casting), whereas I don't have the problem with simple fields (as title, or year).
I also tried to use nested aggregations, but maybe not properly, and ElasticSearch throws an error telling that casting.category is not a nested field.
Any idea here?

Elasticsearch will flatten the nested objects, so internally you will get:
{
"title":"The Judge",
"year":2014,
"casting.name": ["Robert Downey Jr.","Robert Duvall"],
"casting.category": ["Producer", "Actor"]
}
if you want to keep the relationship you'll need to use either nested objects or a parent child relationship
To do a nested mapping you'd need to do something like this:
"mappings": {
"movies": {
"properties": {
"title" : { "type": "string" },
"year" : { "type": "integer" },
"casting": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"category": { "type": "string" }
}
}
}
}
}

Related

Can anyone help me - how to use arrays in opensearch?

I put an object with some field and i wanna figure out how to mapping the index to handle and show the values like elasticsearch. I dunno why opensearch separate to individual fields the values. Both app has the same index mappings but the display is different for something.
I tried to map the object type set to nested but nothing changes
PUT test
{
"mappings": {
"properties": {
"szemelyek": {
"type": "nested",
"properties": {
"szam": {
"type": "integer"
},
"nev": {
"type": "text"
}
}
}
}
}
}

Kibana - missing text highlighting for multi-field mapping

I am experimenting with ECS - Elastic Common Schema.
We need to highlight text search for the field error.stack_trace . This field is a multi-field mapped defined here
I just did a simple test running Elasticsearch and Kibana 7.17.4 one field defined as multi-field and one with single field.
PUT simple-index-01
{
"mappings": {
"properties": {
"stack_trace01": { "type": "text" },
"stack_trace02": {
"fields": {
"text": {
"type": "text"
}
},
"type": "wildcard"
}
}
}
}
POST simple-index-01/_doc
{
"#timestamp" : "2022-06-07T08:21:05.000Z",
"stack_trace01": "java.lang.NullPointerException: null",
"stack_trace02": "java.lang.NullPointerException: null"
}
Is it a Kibana expected behavior not to highlight multi-fields?
wildcard type will be not available to search using full text query as mentioned in documentaion (it is part of keyword type family):
The wildcard field type is a specialized keyword field for
unstructured machine-generated content you plan to search using
grep-like wildcard and regexp queries.
So when you try below query it will not return result and this is the reason why it is not highlghting your stack_trace02 field in discover.
POST simple-index-01/_search
{
"query": {
"match": {
"stack_trace02": "null"
}
}
}
But below query will give result:
{
"query": {
"wildcard": {
"stack_trace02": {
"value": "*null*"
}
}
}
}
You can create index mapping something like below and your parent type field should text type:
PUT simple-index-01
{
"mappings": {
"properties": {
"stack_trace01": {
"type": "text"
},
"stack_trace02": {
"fields": {
"text": {
"type": "wildcard"
}
},
"type": "text"
}
}
}
}
You can now use stack_trace02.wildcard when you want to search wildcard type of query.
There is already open issue on similar behaviour but it is not for wildcard type.

Multiple Paths in Nested Queries

I'm cross-posting this from the elasticsearch forums (https://discuss.elastic.co/t/multiple-paths-in-nested-query/96851/1)
Below is an example, but first I’ll tell you about my use case, because I’m not sure if this is a good approach. I’m trying to automatically index a large collection of typed data. What this means is I’m trying to generate mappings and queries on those mappings all automatically based on information about my data. A lot of my data is relational, and I’m interested in being able to search accross the relations, thus I’m also interested in using Nested data types.
However, the issue is that many of these types have on the order of 10 relations, and I’ve got a feeling its not a good idea to pass 10 identical copies of a nested query to elasticsearch just to query 10 different nested paths the same way. Thus, I’m wondering if its possible to instead pass multiple paths into a single query? Better yet, if its possible to search over all fields in the current document and in all its nested documents and their fields in a single query. I’m aware of object fields, and they’re not a good fit because I want to retrive some data of matched nested documents.
In this example, I create an index with multiple nested types and some of its own types, upload a document, and attempt to query the document and all its nested documents, but fail. Is there some way to do this without duplicating the query for each nested document, or is that actually a performant way to do this? Thanks
PUT /my_index
{
"mappings": {
"type1" : {
"properties" : {
"obj1" : {
"type" : "nested",
"properties": {
"name": {
"type":"text"
},
"number": {
"type":"text"
}
}
},
"obj2" : {
"type" : "nested",
"properties": {
"color": {
"type":"text"
},
"food": {
"type":"text"
}
}
},
"lul":{
"type": "text"
},
"pucci":{
"type": "text"
}
}
}
}
}
PUT /my_index/type1/1
{
"obj1": [
{ "name":"liar", "number":"deer dog"},
{ "name":"one two three", "number":"you can call on me"},
{ "name":"ricky gervais", "number":"user 123"}
],
"obj2": [
{ "color":"red green blue", "food":"meatball and spaghetti"},
{ "color":"orange", "food":"pineapple, fish, goat"},
{ "color":"none", "food":"none"}
],
"lul": "lul its me user123",
"field": "one dog"
}
POST /my_index/_search
{
"query": {
"nested": {
"path": ["obj1", "obj2"],
"query": {
"query_string": {
"query": "ricky",
"all_fields": true
}
}
}
}
}

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags since then ES will look for documents' product/category field, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to" but I it probably affects index size:
DELETE /test
PUT /test
{
"mappings": {
"product" : {
"properties": {
"tags": { "type": "string", "copy_to": "ptags" },
"ptags": { "type": "string" }
}
},
"category" : {
"properties": {
"tags": { "type": "string", "copy_to": "ctags" },
"ctags": { "type": "string" }
}
}
}
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
And you can query one of fields or many of them:
GET /test/product,category/_search
{
"query": {
"term": {
"ptags": {
"value": "x"
}
}
}
}
GET /test/product,category/_search
{
"query": {
"multi_match": {
"query": "x",
"fields": [ "ctags", "ptags" ]
}
}
}

Indexing a field for both phrase searching and partial matches

I am creating an index on an object, and wanting to be able to do both full phrase searches as well as partial matches. The type is called "deponent", and a simplified index creation is shown below:
{
"deponent": {
"properties": {
"name": {
"type": "multi_field",
"fields": {
"name": {
"type": "string"
},
"full": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs",
"include_in_all": false
}
}
}
}
}
}
The intent of this is to index the values in the "name" field twice: once where the individual words within the field are not broken up (name.full) and once where the words are broken up (name.name).
I have a document which has been indexed whose name field is set to "Dr. Danny Watson". I would expect the following behaviors to occur when executing a term query (whose query string is not analyzed according to the documentation):
When searching name.full using "Dr. Danny Watson", the record
should be returned
When searching name.full using "Watson", the record should not be returned
When searching name.name using "Dr. Danny Watson", the record should not be returned
When searching name.name using "Watson", the record should be returned
The queries for the four points above:
1 - works as expected (returns the record)
{
"query" : {
"term": {
"name.full": {
"value": "Dr. Danny Watson"
}
}
}
}
2 - works as expected (does not return the record)
{
"query" : {
"term": {
"name.full": {
"value": "Watson"
}
}
}
}
3 - works as expected (does not return the record)
{
"query" : {
"term": {
"name.name": {
"value": "Dr. Danny Watson"
}
}
}
}
4 - does NOT work as expected - the record is not returned
{
"query" : {
"term": {
"name.name": {
"value": "Watson"
}
}
}
}
So it seems my understanding of something is flawed. What am I missing?
You don't need to call the field "name.name". The multi-field with the original name is used as the default, so you should use just "name" for that.
Also it's always good to make sure the index and search analyzers are in order (so for instance both your indexed terms and the search term are changed to lower case).

Resources