How do I do a terms aggregation by concatenating two arrays? - elasticsearch

I have an Elasticsearch mapping that looks like this:
"product": {
"properties": {
"attributes": {
"type": "keyword",
"normalizer": "lowercase"
},
"skus": {
"type": "nested",
"properties": {
"attributes": {
"type": "keyword",
"normalizer": "lowercase"
}
}
}
}
}
I'm trying to do a terms aggregation on both the field attributes and the field skus.attributes by concatenating them but I haven't figured out how. Both fields are simple string arrays. This is as far as I've gotten:
{
"query": {
"match_all": {}
},
"aggregations": {
"unique_attrs": {
"terms": {
"field": "attributes"
}
}
}
}
Of course, I could reindex my data in a way that there would be another field that contains a concatenation of the values of both fields but that seem right.

As mentioned on the Elasticsearch Forums: https://discuss.elastic.co/t/combining-nested-and-non-nested-aggregations/82583 it recommends merging them using a copy_to mapping when indexing the data.

Related

Elasticsearch - Using copy_to on the fields of a nested type

Elastic version 7.17
Below I've pasted a simplified version of my mappings which represent a nested object structure. One top-level-object will have one or more second-level-object. A second-level-object will have one or more third-level-object. Fields field_a, field_b, and field_c on third-level-object are all related to each other so I'd like to copy them into a single field that can be partial matched against. I've done this on a lot of attributes at the top-level-object level, so I know it works.
{
"mappings": {
"_doc": { //one top level object
"dynamic": "false",
"properties": {
"second-level-objects": { //one or more second level objects
"type": "nested",
"dynamic": "false",
"properties": {
"third-level-objects": { //one or more third level objects
"type": "nested",
"dynamic": "false",
"properties": {
"my_copy_to_field": { //should have the values from field_a, field_b, and field_c
"type": "text",
"index": true
},
"field_a": {
"type": "keyword",
"index": false,
"copy_to": "my_copy_to_field"
},
"field_b": {
"type": "long",
"index": false,
"copy_to": "my_copy_to_field"
},
"field_c": {
"type": "keyword",
"index": false,
"copy_to": "my_copy_to_field"
},
"field_d": {
"type": "keyword",
"index": true
}
}
}
}
}
}
}
}
}
However, when I run a nested query against that my_copy_to_field I get no results because the field is never populated, even though I know my documents have data in the 3 fields with copy_to. If I perform a nested query against field_d which is not part of the copied info I get the expected results, so it seems like there's something going on with nested (or double-nested in my case) usage of copy_to that I'm overlooking. Here is my query which returns nothing:
GET /my_index/_search
{
"query": {
"nested": {
"inner_hits": {},
"path": "second-level-objects",
"query": {
"nested": {
"inner_hits": {},
"path": "second-level-objects.third-level-objects",
"query": {
"bool": {
"should": [
{"match": {"second-level-objects.third-level-objects.my_copy_to_field": "my search value"}}
]
}
}
}
}
}
}
}
I've tried adding include_in_root:true to the third-level-objects, but that didn't make any difference. If I could just get the field to populate with the copied data then I'm sure I can work through getting the query working. Is there something I'm missing about using copy_to with nested fields?
Additionally, when I view my data in Kibana -> Discover, I see second-level-objects as an available "Nested" field, but I don't see anything for third-level-objects, even though KQL recognizes it as a field. Is that symptomatic of an issue?
You must add complete path nested, like this:
"field_a": {
"type": "keyword",
"copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
},
"field_b": {
"type": "long",
"copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
},
"field_c": {
"type": "keyword",
"copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
}

nested terms aggregation on object containing a string field

I like to run a nested terms aggregation on string field which is inside an object.
Usually, I use this query
"terms": {
"field": "fieldname.keyword"
}
to enable fielddata
But I am unable to do that for a nested document like this
{
"nested": {
"path": "objectField"
},
"aggs": {
"allmyaggs": {
"terms": {
"field": "objectField.fieldName.keyword"
}
}
}
}
The above query is just returning an empty buckets array
Is there a way this can be done without enabling field-data by default during index mapping.
Since that will take a large heap memory and I have already loaded a huge data without it
document mapping
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text"
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
sample document
{
"productname": [
{
"productlineseqno": "1.58",
"iscontrolleddrug": "false",
"productlinename": "Consultations",
"productlinedescription": "Consultations",
"isprescribable": "false",
"invoiceitemname": "invoice name"
}
]
}
Fixed
By changing the mapping to enable field data
Nested query is used to access nested fields similarly nested aggregation is needed to aggregation on nested fields
{
"aggs": {
"fieldname": {
"nested": {
"path": "objectField"
},
"aggs": {
"fields": {
"terms": {
"field": "objectField.fieldname.keyword",
"size": 10
}
}
}
}
}
}
EDIT1:
If you are searching for productname.invoiceitemname.keyword then it will give empty bucket as no field exists with that name.
You need to define your mapping like below
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text",
"fields":{ --> note
"keyword":{
"type":"keyword"
}
}
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
}
}
}
Fields
It is often useful to index the same field in different ways for
different purposes. This is the purpose of multi-fields. For instance,
a string field could be mapped as a text field for full-text search,
and as a keyword field for sorting or aggregations:
When mapping is not explicitly provided, keyword fields are created by default. If you are creating your own mapping(which you need to do for nested type), you need to provide keyword fields in mapping, wherever you intend to use them

Aggregating over _field_names in elasticsearch 5

I'm trying to aggregate over field names in ES 5 as described in Elasticsearch aggregation on distinct keys But the solution described there is not working anymore.
My goal is to get the keys across all the documents. Mapping is the default one.
Data:
PUT products/product/1
{
"param": {
"field1": "data",
"field2": "data2"
}
}
Query:
GET _search
{
"aggs": {
"params": {
"terms": {
"field": "_field_names",
"include" : "param.*",
"size": 0
}
}
}
}
I get following error: Fielddata is not supported on field [_field_names] of type [_field_names]
After looking around it seems the only way in ES > 5.X to get the unique field names is through the mappings endpoint, and since cannot aggregate on the _field_names you may need to slightly change your data format since the mapping endpoint will return every field regardless of nesting.
My personal problem was getting unique keys for various child/parent documents.
I found if you are prefixing your field names in the format prefix.field when hitting the mapping endpoint it will automatically nest the information for you.
PUT products/product/1
{
"param.field1": "data",
"param.field2": "data2",
"other.field3": "data3"
}
GET products/product/_mapping
{
"products": {
"mappings": {
"product": {
"properties": {
"other": {
"properties": {
"field3": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"param": {
"properties": {
"field1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"field2": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
Then you can grab the unique fields based on the prefix.
This is probably because setting size: 0 is not allowed anymore in ES 5. You have to set a specific size now.
POST _search
{
"aggs": {
"params": {
"terms": {
"field": "_field_names",
"include" : "param.*",
"size": 100 <--- change this
}
}
}
}

Autocomplete functionality using elastic search

I have an elastic search index with following documents and I want to have an autocomplete functionality over the specified fields:
mapping: https://gist.github.com/anonymous/0609b1d110d91dceb9a90faa76d1d5d4
Usecase:
My query is of the form prefix type eg "sta", "star", "star w" .."start war" etc with an additional filter as tags = "science fiction". Also there queries could match other fields like description, actors(in cast field, not this is nested). I also want to know which field it matched to.
I investigated 2 ways for doing that but non of the methods seem to address the usecase above:
1) Suggester autocomplete:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-suggesters-completion.html
With this it seems I have to add another field called "suggest" replicating the data which is not desirable.
2) using a prefix filter/query:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-prefix-filter.html
this gives the whole document back not the exact matching terms.
Is there a clean way of achieving this, please advise.
Don't create mapping separately, insert data directly into index. It will create default mapping for that. Use below query for autocomplete.
GET /netflix/movie/_search
{
"query": {
"query_string": {
"query": "sta*"
}
}
}
I think completion suggester would be the cleanest way but if that is undesirable you could use aggregations on name field.
This is a sample index(I am assuming you are using ES 1.7 from your question
PUT netflix
{
"settings": {
"analysis": {
"analyzer": {
"prefix_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim",
"edge_filter"
]
},
"keyword_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim"
]
}
},
"filter": {
"edge_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
}
}
},
"mappings": {
"movie":{
"properties": {
"name":{
"type": "string",
"fields": {
"prefix":{
"type":"string",
"index_analyzer" : "prefix_analyzer",
"search_analyzer" : "keyword_analyzer"
},
"raw":{
"type": "string",
"analyzer": "keyword_analyzer"
}
}
},
"tags":{
"type": "string", "index": "not_analyzed"
}
}
}
}
}
Using multi-fields, name field is analyzed in different ways. name.prefix is using keyword tokenizer with edge ngram filter
so that string star wars can be broken into s, st, sta etc. but while searching, keyword_analyzer is used so that search query does not get broken into multiple small tokens. name.raw will be used for aggregation.
The following query will give top 10 suggestions.
GET netflix/movie/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"tags": "sci-fi"
}
},
"query": {
"match": {
"name.prefix": "sta"
}
}
}
},
"size": 0,
"aggs": {
"unique_movie_name": {
"terms": {
"field": "name.raw",
"size": 10
}
}
}
}
Results will be something like
"aggregations": {
"unique_movie_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "star trek",
"doc_count": 1
},
{
"key": "star wars",
"doc_count": 1
}
]
}
}
UPDATE :
You could use highlighting for this purpose I think. Highlight section will get you the whole word and which field it matched. You can also use inner hits and highlighting inside it to get nested docs also.
{
"query": {
"query_string": {
"query": "sta*"
}
},
"_source": false,
"highlight": {
"fields": {
"*": {}
}
}
}

Elasticsearch: sorting Spanish double names alphabetically

I am doing an Elasticsearch query and I want the results ordered alphabetically by last name. My problem: the last names are all Spanish double names, and ES doesn't order them the way I would like it.
I would prefer the order to be:
Batres Rivera
Batrín Chojoj
Fion Morales
Lopez Giron
Martinez Castellanos
Milán Casanova
This is my query:
{
"query": {
"match_all": {}
},
"sort": [
{
"Last Name": {
"order": "asc"
}
}
]
}
The order that I get with this is:
Batres Rivera
Batrín Chojoj
Milán Casanova
Martinez Castellanos
Fion Morales
Lopez Giron
So it is not sorting by the first string, but by either of both (Batres, Batrín, Casanova, Castellanos, Fion, Giron).
If I try additionally
{
"order": "asc",
"mode": "max"
}
then I get:
Batrín Chojoj
Lopez Giron
Martinez Castellanos
Milán Casanova
Fion Morales
Batres Rivera
All the fields are indexed by default, I checked with
curl -XGET localhost/my_index/_mapping
and I get back
my_index: {
my_type: {
properties: {
FirstName: {
type: string
}LastName: {
type: string
}MiddleName: {
type: string
}
...
}
}
}
Does anyone know how to make the results to be ordered to be ordered alphabetically by the beginning string of the last name?
Thanks!
The problem is that your LastName field is analyzed, so the string Batres Rivera is indexed as a multi-value field with two terms: batres and rivera. But this isn't like an ordered array, it's more like a "bag of values". So when you try to sort on the field, it chooses one of the terms (the min or max) and sorts on that.
What you need to do is to store the LastName as a single term (Batres Rivera) for sorting purposes, by mapping the field as
{ "type": "string", "index": "not_analyzed"}
Obviously you can't then use that field for search purposes: you wouldn't be able to search for rivera and match on that field.
The way to support both searching and sorting is to use multi-fields: ie index the same value in two ways, one for searching and one for sorting.
In 0.90.* the syntax for multi-fields is:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "multi_field",
"fields": {
"LastName": {
"type": "string"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
In 1.0.* the multi_field type has been removed and now any core field type supports sub-fields as follows:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
So you can use the LastName field for searching, and the LastName.raw field for sorting:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
"query": {
"match": {
"LastName": "rivera"
}
},
"sort": "LastName.raw"
}'
Language specific sorting
You should also look at using the ICU analysis plugin to sort using the Spanish sort order (or collation). This is a bit more complex but is worth using:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"settings": {
"analysis": {
"analyzer": {
"folding": {
"type": "custom",
"tokenizer": "icu_tokenizer",
"filter": [
"icu_folding"
]
},
"es_sorting": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"spanish"
]
}
},
"filter": {
"spanish": {
"type": "icu_collation",
"language": "es"
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "string",
"analyzer": "folding",
"fields": {
"raw": {
"type": "string",
"analyzer": "es_sorting"
}
}
}
}
}
}
}'
We create a folding analyzer which we'll use for the LastName field, which will analyze a string like Muñoz Rivera into the two terms munoz (without the ~) and rivera. So a user can search for munoz or muñoz and either will match.
Then we create the es_sorting analyzer which indexes the proper sort order for muñoz rivera (lowercased) in Spanish.
Searching would be done in the same way:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
"query": {
"match": {
"LastName": "rivera"
}
},
"sort": "LastName.raw"
}'
We need to know how you are indexing the name.
Please check this discussion link.
http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-way-to-search-terms-lower-cased-td932996.html
This should be very helpful for your case. This depends on your mapping settings. What analyzer you use for the name field.
Need your mapping definition to decide on a proper solution.

Resources