ElasticSearch Terms Aggregation Order Case Insensitive

I am trying to sort the buckets of a terms aggregation in Elasticsearch case-insensitively. Here is the field mapping:
'brandName' => [
    'type' => 'string',
    'analyzer' => 'english',
    'index' => 'analyzed',
    'fields' => [
        'raw' => [
            'type' => 'string',
            'index' => 'not_analyzed'
        ]
    ]
]
Note that this data structure is a PHP array.
And the aggregation looks like this:
'aggregations' => [
    'brands' => [
        'terms' => [
            'field' => 'brandName.raw',
            'size' => 0,
            'order' => ['_term' => 'asc']
        ]
    ]
]
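This works, but the resulting buckets are in case-sensitive lexicographical order, so brand names starting with an uppercase letter sort before those starting with a lowercase one.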
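To see why `_term` ordering looks wrong, here is a small PHP illustration. It assumes that `_term` ordering compares terms byte by byte, which is what PHP's `strcmp` does, so every uppercase initial sorts before every lowercase one. The brand names are made up.

```php
<?php
// Hypothetical brand names as they might sit in the not_analyzed
// brandName.raw sub-field, with their original case preserved.
$terms = ['apple', 'Zara', 'bose', 'Nike'];

// Byte-wise comparison: 'A'-'Z' (65-90) sort before 'a'-'z' (97-122).
$caseSensitive = $terms;
usort($caseSensitive, 'strcmp');

// Case-insensitive comparison: the order the question is after.
$caseInsensitive = $terms;
usort($caseInsensitive, 'strcasecmp');

echo implode(', ', $caseSensitive), PHP_EOL;   // Nike, Zara, apple, bose
echo implode(', ', $caseInsensitive), PHP_EOL; // apple, bose, Nike, Zara
```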
I found some documentation explaining how to do this, but it is in the context of sorting the hits, not the aggregation buckets.
I tried it anyway. Here is the analyzer I created:
'analysis' => [
    'analyzer' => [
        'case_insensitive_sort' => [
            'tokenizer' => 'keyword',
            'filter' => [ 'lowercase' ]
        ]
    ]
]
And here is the updated field mapping, with a new sub-field called "sort" using the analyzer.
'brandName' => [
    'type' => 'string',
    'analyzer' => 'english',
    'index' => 'analyzed',
    'fields' => [
        'raw' => [
            'type' => 'string',
            'index' => 'not_analyzed'
        ],
        'sort' => [
            'type' => 'string',
            'index' => 'not_analyzed',
            'analyzer' => 'case_insensitive_sort'
        ]
    ]
]
And here's the updated aggregation portion of my query:
'aggregations' => [
    'brands' => [
        'terms' => [
            'field' => 'brandName.raw',
            'size' => 0,
            'order' => ['brandName.sort' => 'asc']
        ]
    ]
]
This generates the following error: Invalid term-aggregator order path [brandName.sort]. Unknown aggregation [brandName].
Am I close? Can this kind of aggregation bucket sorting be done?

The short answer is that this kind of advanced sorting on aggregations is not yet supported and there is an open issue that is tackling this (slated for v2.0.0).
There are two other points worth mentioning here:
Since the brandName.sort sub-field is declared as not_analyzed, it's contradictory to also set an analyzer on it.
The error you're getting is because the order part can only refer to sub-aggregation names, not field names (brandName.sort is a field name).
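Until that is supported, one workaround (my assumption, not something the answer proposes) is to keep `'size' => 0` so all buckets come back, then re-sort them in PHP. The sketch assumes the response has already been decoded into an array and that the buckets sit under `aggregations.brands.buckets`, each with a `key` and a `doc_count`, which is the usual shape of a terms aggregation response. The sample data and the function name `sortBucketsCaseInsensitive` are hypothetical.

```php
<?php
/**
 * Re-sort terms-aggregation buckets case-insensitively on the client.
 * Keys that differ only by case fall back to a byte-wise comparison
 * so the result is deterministic.
 */
function sortBucketsCaseInsensitive(array $buckets): array
{
    usort($buckets, function (array $a, array $b): int {
        $cmp = strcasecmp($a['key'], $b['key']);
        return $cmp !== 0 ? $cmp : strcmp($a['key'], $b['key']);
    });
    return $buckets;
}

// Hypothetical decoded response for the 'brands' aggregation.
$response = [
    'aggregations' => [
        'brands' => [
            'buckets' => [
                ['key' => 'Nike', 'doc_count' => 4],
                ['key' => 'Zara', 'doc_count' => 2],
                ['key' => 'apple', 'doc_count' => 7],
                ['key' => 'bose', 'doc_count' => 1],
            ],
        ],
    ],
];

$sorted = sortBucketsCaseInsensitive($response['aggregations']['brands']['buckets']);
echo implode(', ', array_column($sorted, 'key')), PHP_EOL; // apple, bose, Nike, Zara
```

`strcasecmp` only folds ASCII letters; for brand names with accents or other non-ASCII characters, a locale-aware comparison such as the intl extension's `Collator` would likely be a better fit.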

Related

elasticsearch 7, boost by integer value

I'm trying to boost a search by the "created" field (an integer timestamp), but I always run into:
"{"error":{"root_cause":[{"type":"parsing_exception","reason":"Unknown key for a START_OBJECT in [script].","line":1,"col":181}],"type":"parsing_exception","reason":"Unknown key for a START_OBJECT in [script].","line":1,"col":181},"status":400}"
Without the 'script' part, the query works fine, but I'm running out of ideas on how to write this script correctly. Any ideas?
return [
    'index' => 'articles_' . $this->system,
    'body' => [
        'size' => $this->size,
        'from' => $this->start,
        'sort' => [
            $this->order => 'desc',
        ],
        'query' => [
            'query_string' => [
                'query' => $this->term,
                'fields' => ['title^5', 'caption^3', 'teaser^2', 'content'],
                'analyze_wildcard' => true,
            ],
            'script' => [
                'script' => [
                    'lang' => 'painless',
                    'source' => "doc['#created'].value / 100000",
                ],
            ],
        ],
    ],
];
EDIT: Updated query, but still running into "{"error":{"root_cause":[{"type":"parsing_exception","reason":"[query_string] malformed query, expected [END_OBJECT] but found [FIELD_NAME]","line":1,"col":171}],"type":"parsing_exception","reason":"[query_string] malformed query, expected [END_OBJECT] but found [FIELD_NAME]","line":1,"col":171},"status":400}"
Script is not a standalone attribute; it should be part of a bool query. When you have multiple clauses, they should go in must/should/filter under bool.
'body' => [
    'size' => $this->size,
    'from' => $this->start,
    'sort' => [
        $this->order => 'desc'
    ],
    'query' => [
        'bool' => [
            'must' => [
                [
                    'query_string' => [
                        'query' => $this->term,
                        'fields' => ['title^5', 'caption^3', 'teaser^2', 'content'],
                        'analyze_wildcard' => true
                    ]
                ],
                [
                    'script' => [
                        'script' => [
                            'lang' => 'painless',
                            'source' => "doc['#created'].value / 100000"
                        ]
                    ]
                ]
            ]
        ]
    ]
]
The above may have bracket syntax issues (I couldn't test it), but the query structure is correct.
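One PHP-specific detail matters for any `bool` clause: `json_encode` turns an array with string keys into a JSON object and a zero-indexed list into a JSON array. Elasticsearch expects each query object to have a single query type as its key, so two different queries under one `must` need to be wrapped as a list. This is likely where an `expected [END_OBJECT] but found [FIELD_NAME]` error, like the one in the edit, comes from. The sketch only shows the encoding; the query bodies are placeholders.

```php
<?php
$queryString = ['query_string' => ['query' => 'foo']];
$script      = ['script' => ['script' => ['source' => 'true']]];

// Two query types as keys of ONE array -> a single JSON object with two keys,
// which Elasticsearch rejects as a malformed query.
$asObject = ['must' => $queryString + $script];

// Each clause in its own array inside a list -> a JSON array of two queries.
$asList = ['must' => [$queryString, $script]];

echo json_encode($asObject), PHP_EOL;
// {"must":{"query_string":{"query":"foo"},"script":{"script":{"source":"true"}}}}
echo json_encode($asList), PHP_EOL;
// {"must":[{"query_string":{"query":"foo"}},{"script":{"script":{"source":"true"}}}]}
```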
...
'query' => [
    'function_score' => [
        'query' => [
            'query_string' => [
                'query' => $this->term,
                'fields' => ['title^10', 'caption^8', 'teaser^5', 'content'],
                'analyze_wildcard' => true,
            ],
        ],
        'script_score' => [
            'script' => [
                'lang' => 'expression',
                'source' => "_score + (doc['created'] / 10000000000000)",
            ],
        ],
    ],
],
This was my solution in the end. Sadly, I only found it in the Elasticsearch documentation later. But you really have to divide the timestamp by a large factor so that it doesn't completely overpower the best matches.
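A quick way to see why the divisor has to be so large: the sketch below evaluates the same expression as the script, `_score + created / divisor`, for two made-up documents. It assumes `created` holds epoch milliseconds (the post doesn't say whether it is seconds or milliseconds), and `scriptValue` is a hypothetical helper. Note that `function_score` then combines the script result with the query score according to `boost_mode`, which defaults to `multiply`, so the final Elasticsearch score is not just this number.

```php
<?php
/**
 * Evaluates the same expression as the script_score above:
 * _score + created / divisor. Assumes $createdMs is epoch milliseconds.
 */
function scriptValue(float $score, int $createdMs, float $divisor): float
{
    return $score + $createdMs / $divisor;
}

// Hypothetical documents: a strong match from 2015 and a weak match from 2020.
$old = ['score' => 5.0, 'created' => 1420070400000]; // 2015-01-01T00:00:00Z
$new = ['score' => 1.0, 'created' => 1577836800000]; // 2020-01-01T00:00:00Z

foreach (['1e13' => 1e13, '1e10' => 1e10] as $label => $divisor) {
    printf(
        "divisor %s: old %.4f, new %.4f\n",
        $label,
        scriptValue($old['score'], $old['created'], $divisor),
        scriptValue($new['score'], $new['created'], $divisor)
    );
}
// divisor 1e13: old 5.1420, new 1.1578     -> relevance still decides
// divisor 1e10: old 147.0070, new 158.7837 -> recency decides
```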

How to include a mapped field's sub-field in the results in Elasticsearch?

For aggregations, I have a raw value of my field, but I can't access this value in my query results. For example, I have the brand Tommy Hilfiger, and its raw value tommy-hilfiger is in brand.keyword. How can I include this value in the search results?
'body' => [
    'settings' => [
        'analysis' => [
            'filter' => [
                'remove_spaces_inside' => [
                    'type' => 'pattern_replace',
                    'pattern' => '\\s+',
                    'replacement' => ' '
                ],
                'convert_spaces' => [
                    'type' => 'pattern_replace',
                    'pattern' => '\\s+',
                    'replacement' => '-'
                ],
            ],
            'char_filter' => [
                'convert_amp' => [
                    'type' => 'pattern_replace',
                    'pattern' => '&',
                    'replacement' => 'and'
                ]
            ],
            'analyzer' => [
                'slug' => [
                    'char_filter' => ['convert_amp'],
                    'tokenizer' => 'keyword',
                    'filter' => ['trim', 'lowercase', 'asciifolding', 'remove_spaces_inside', 'convert_spaces']
                ],
                'format' => [
                    'char_filter' => ['convert_amp'],
                    'tokenizer' => 'keyword',
                    'filter' => ['trim', 'remove_spaces_inside']
                ]
            ]
        ]
    ],
    'mappings' => [
        'my_type' => [
            'properties' => [
                'brand' => [
                    'type' => 'string',
                    'fields' => [
                        'keyword' => [
                            'type' => 'string',
                            'analyzer' => 'slug',
                            'index_options' => 'docs',
                        ]
                    ]
                ]
            ]
        ]
    ]
]
Update:
In my case, I store the brand in two fields: the default "Tommy Hilfiger" for full-text search, and a formatted keyword (slug) "tommy-hilfiger" for exact search. I can aggregate data by the slug, but I can't get this field back from my query. For example, this query returns all records with the brand Tommy Hilfiger, but only the default values, not the slug.
'body' => [
    '_source' => [
        'brand',
        'brand.keyword'
    ],
    'query' => [
        'bool' => [
            'must' => [
                [
                    'terms' => [
                        'brand.keyword' => [
                            'tommy-hilfiger',
                        ]
                    ]
                ]
            ]
        ]
    ]
]
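The thread has no answer for this one. For context: `_source` holds the original JSON document as it was indexed, so a sub-field produced by the mapping (like `brand.keyword`) never appears in it, and the `slug` analyzer's output is an indexed term rather than a stored value. One workaround (an assumption on my part, not from the thread) is to compute the same slug in PHP, either at indexing time so it can be stored as a regular field, or after reading the results. The `slugify` function below is a hypothetical approximation of the `slug` analyzer; it skips `asciifolding`, so it only matches the analyzer for ASCII input.

```php
<?php
/**
 * Hypothetical client-side approximation of the 'slug' analyzer:
 *   char_filter convert_amp -> keyword tokenizer -> trim -> lowercase
 *   -> (asciifolding skipped) -> remove_spaces_inside -> convert_spaces
 */
function slugify(string $value): string
{
    $value = str_replace('&', 'and', $value);    // convert_amp char filter
    $value = trim($value);                       // trim filter
    $value = strtolower($value);                 // lowercase (ASCII only here)
    $value = preg_replace('/\s+/', ' ', $value); // remove_spaces_inside
    return preg_replace('/\s+/', '-', $value);   // convert_spaces
}

echo slugify('Tommy Hilfiger'), PHP_EOL;     // tommy-hilfiger
echo slugify('  Marks & Spencer '), PHP_EOL; // marks-and-spencer
```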

Multi match and highlighting in elasticsearch

When I match a single field in the query, highlighting in Elasticsearch works fine.
When I try to use:
$params = [
    'index' => 'my_index',
    'type' => 'articles',
    'body' => [
        'from' => '0',
        'size' => '10',
        'query' => [
            'bool' => [
                'must' => [
                    'match' => [ 'content' => 'what I want to search' ]
                ]
            ]
        ],
        'highlight' => [
            'pre_tags' => ['<mark>'],
            'post_tags' => ['</mark>'],
            'fields' => [
                'content' => [ 'fragment_size' => 150, 'number_of_fragments' => 3 ]
            ]
        ],
    ]
];
everything works, but when I try to match on multiple fields, the search still works correctly while the highlighting disappears.
'match' => [ 'content' => 'what I want to search' ],
'match' => [ 'type' => 1 ]
Do you know how to get highlighting to work when I want to search two different fields with two different queries?
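The snippet in the question has a plain PHP problem that likely explains the missing highlights: an array literal with the same key twice keeps only the last value, without any warning. So the `content` match probably never reaches Elasticsearch, the query only matches on `type`, and there is nothing in `content` to highlight. A minimal demonstration:

```php
<?php
// The two clauses from the question, written as one array literal.
// PHP keeps only the LAST value for a repeated key, with no warning.
$must = [
    'match' => ['content' => 'what I want to search'],
    'match' => ['type' => 1],
];

var_dump(count($must));                       // int(1)
echo json_encode(['must' => $must]), PHP_EOL; // {"must":{"match":{"type":1}}}
```

Putting the `type` condition in its own clause (for example under a `bool` `filter`), or making `must` a list of separate `match` arrays, keeps both conditions in the request.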
Try this:
$params = [
    'index' => 'my_index',
    'type' => 'articles',
    'body' => [
        'from' => '0',
        'size' => '10',
        'query' => [
            'bool' => [
                'must' => [
                    'match' => [ 'content' => 'what I want to search' ]
                ],
                'filter' => [
                    'term' => [ 'type' => 1 ]
                ]
            ]
        ],
        'highlight' => [
            'pre_tags' => ['<mark>'],
            'post_tags' => ['</mark>'],
            'fields' => [
                'content' => [ 'fragment_size' => 150, 'number_of_fragments' => 3 ]
            ]
        ],
    ]
];

Ignore Results Outside of Distance Range

I am working with Elasticsearch for an application that deals with "posts". I currently have it working with a geo_point so that it returns all posts ordered by distance from the end-user. While this works, I also need to add one more aspect to the system.
Posts can be paid for. For instance, if I pay for my post and choose "Local" as the area range, then the post should only be shown to end-users who are 20 miles away or less.
I have a field on my index named spotlight_range. Is there a way to write a query that ignores all records where spotlight_range = 'Local' and the distance is > 20 miles? I need to do this for several different spotlight ranges. For instance, Regional may be 100 miles or less, etc.
My current query looks like this:
$params = [
    'index' => 'my_index',
    'type' => 'posts',
    'size' => 25,
    'from' => 0,
    'body' => [
        'sort' => [
            '_geo_distance' => [
                'post_location' => [
                    'lat' => '44.4759',
                    'lon' => '-73.2121'
                ],
                'order' => 'asc',
                'unit' => 'mi'
            ]
        ],
        'query' => [
            'filtered' => [
                'query' => [
                    'match_all' => []
                ],
                'filter' => [
                    'geo_distance' => [
                        'distance' => '100mi',
                        'post_location' => [
                            'lat' => '44.4759',
                            'lon' => '-73.2121'
                        ]
                    ]
                ]
            ]
        ]
    ]
];
My index is set up with the following fields:
'id' => ['type' => 'integer'],
'title' => ['type' => 'string'],
'description' => ['type' => 'string'],
'price' => ['type' => 'integer'],
'shippable' => ['type' => 'boolean'],
'username' => ['type' => 'string'],
'post_location' => ['type' => 'geo_point'],
'post_location_string' => ['type' => 'string'],
'is_spotlight' => ['type' => 'boolean'],
'spotlight_range' => ['type' => 'string'],
'created_at' => ['type' => 'date', 'format' => 'yyyy-MM-dd HH:mm:ss'],
'updated_at' => ['type' => 'date', 'format' => 'yyyy-MM-dd HH:mm:ss']
My end goal is not specifically to search for distance < X and range = Y, but rather to filter posts out for all range types based on distances I specify. The search should return ALL range types, but filter out anything beyond my specified distance for each range type, based on the user's lat/lon passed into the query.
I have been looking for a solution to this online without much luck.
I would add a circle geo_shape to the document, centered on post_location and with a radius corresponding to the spotlight_range since you know both information at indexing time. That way you can encode into each post its corresponding "reach".
...
'post_location' => ['type' => 'geo_point'],
'spotlight_range' => ['type' => 'string'],
'reach' => ['type' => 'geo_shape'], // <-- add this
So a "local" document would look something like this once indexed:
{
  "spotlight_range": "local",
  "post_location": {
    "lat": 42.1526,
    "lon": -71.7378
  },
  "reach" : {
    "type" : "circle",
    "coordinates" : [-71.7378, 42.1526],
    "radius" : "20mi"
  }
}
Then the query would feature another geo_shape centered on the user's location with the chosen radius and would only retrieve documents whose reach intersects the circle shape in the query.
$params = [
    'index' => 'my_index',
    'type' => 'posts',
    'size' => 25,
    'from' => 0,
    'body' => [
        'sort' => [
            '_geo_distance' => [
                'post_location' => [
                    'lat' => '44.4759',
                    'lon' => '-73.2121'
                ],
                'order' => 'asc',
                'unit' => 'mi'
            ]
        ],
        'query' => [
            'filtered' => [
                'query' => [
                    'match_all' => []
                ],
                'filter' => [
                    'geo_shape' => [
                        'reach' => [
                            'relation' => 'INTERSECTS',
                            'shape' => [
                                'type' => 'circle',
                                'coordinates' => [-73.2121, 44.4759],
                                'radius' => '20mi'
                            ]
                        ]
                    ]
                ]
            ]
        ]
    ]
];
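One thing worth checking before relying on this: two circles intersect whenever the distance between their centres is at most the sum of their radii. With a 20mi `reach` on the post and a 20mi circle in the query, a post up to roughly 40mi away would still match. If the intent is "the user must be inside the post's reach", querying with the user's location as a `point` shape rather than a circle would avoid that. The PHP sketch below checks the arithmetic with a haversine distance; the Earth radius, the helper names `haversineMiles` and `reachMatches`, and the sample coordinates are my own assumptions, and Elasticsearch's own distance calculation may differ slightly.

```php
<?php
const EARTH_RADIUS_MI = 3958.8; // mean Earth radius in miles (assumption)

/** Great-circle distance in miles between two lat/lon points (haversine). */
function haversineMiles(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return EARTH_RADIUS_MI * 2 * atan2(sqrt($a), sqrt(1 - $a));
}

/**
 * The query circle (around the user) and the reach circle (around the post)
 * intersect when the centre distance is at most the sum of the radii.
 * A query radius of 0 corresponds to querying with a point.
 */
function reachMatches(
    float $userLat, float $userLon, float $queryRadiusMi,
    float $postLat, float $postLon, float $reachMi
): bool {
    return haversineMiles($userLat, $userLon, $postLat, $postLon)
        <= $queryRadiusMi + $reachMi;
}

// A hypothetical "local" post (20mi reach) about 30mi north of the user.
$user = [44.4759, -73.2121];
$post = [44.9099, -73.2121];

var_dump(reachMatches($user[0], $user[1], 20, $post[0], $post[1], 20)); // bool(true)
var_dump(reachMatches($user[0], $user[1], 0, $post[0], $post[1], 20));  // bool(false)
```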

Elasticsearch aggregations multifaceted navigation, excluding facets by group

I'm pretty new to ES and have been trying to implement faceted navigation in the same way most of the large e-commerce stores do.
Part of my products mapping looks like this:
'name' => [
    'type' => 'string',
    'analyzer' => 'english_analyzer',
],
'range' => [
    'type' => 'string',
    'analyzer' => 'english_analyzer',
],
'filters' => [
    'type' => 'nested',
    'properties' => [
        'name' => [
            'type' => 'string',
            'index' => 'not_analyzed',
        ],
        'values' => [
            'type' => 'object',
            'properties' => [
                'name' => [
                    'type' => 'string',
                    'index' => 'not_analyzed',
                ]
            ]
        ]
    ]
],
As you can see I'm storing the facets in "filters" as a nested object.
In my search query I can then add this aggregation to my query:
'aggs' => [
    'facets' => [
        'nested' => ['path' => 'filters'],
        'aggs' => [
            'filters' => [
                'terms' => [
                    'field' => 'filters.name', //facet group name
                    'size' => 0
                ],
                'aggs' => [
                    'id' => [
                        'terms' => [
                            'field' => 'filters.id' //facet group ID
                        ]
                    ],
                    'values' => [
                        'terms' => [
                            'field' => 'filters.values.name', //facet name
                            'size' => 0
                        ],
                        'aggs' => [
                            'id' => [
                                'terms' => [
                                    'field' => 'filters.values.id' //facet id
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ]
    ]
]
This gives me a nice list of facets to put in my navigation. I can apply the selected facets to my document results in two ways: using the "post filter" or the "filtered query".
The post filter is applied after the aggregations have been calculated, so it gives me document counts regardless of which facets the user has selected. In contrast, the filtered query calculates facet counts based on the selected facets; however, it hides facets with no matching documents.
What I need to do - and what most of the big e-commerce stores do - is a hybrid of the 2. I want the facet counts to be based on what is selected but ignore any facets within the same group.
If I have these facets:
Colours:
    Red (1)
    Blue (2)
    Green (3)
Brands:
    Audi (1)
    Ford (2)
    BMW (3)
If somebody selects Blue, the counts for Red and Green should stay the same, but the counts for brands would be affected.
I have found a similar question on Stack Overflow:
ElasticSearch aggregation: exclude one filter per aggregation
From what I can gather, I need to provide a predefined list of facets (from my relational DB) and add them to my aggregations. So I have a manual list of my facet groups, and I add a filter bucket (https://www.elastic.co/guide/en/elasticsearch/guide/current/_filter_bucket.html) to each of these. Within each filter, I need to add a bool query containing all the user-selected facets, leaving out any facets that belong to that group.
So now I have a huge list of aggregations, grouped/bucketed by their facet group, each of these has a filter containing a bool query which may have a dozen selected fields. The query now is so large it probably would not fit on one page if I posted it here!
This to me just seems like a crazy addition to my query considering what I had to do before was almost what I needed. Is this the ONLY way I can achieve this?
Any help is greatly appreciated, I hope my question is clear enough.
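The approach described in the question can at least be generated rather than written by hand. Below is a rough PHP sketch: for each facet group it builds a `filter` aggregation whose `bool` contains the selected facets of every other group, then runs a nested terms aggregation for that group's values inside it. Assumptions to flag: values selected within one group are OR-ed (a `terms` query) and groups are AND-ed; the function and aggregation names (`facetClause`, `buildFacetAggs`, `facet_<group>`, `nested_filters`, `group`, `values`) are hypothetical; the field names come from the mapping in the question.

```php
<?php
/**
 * Hypothetical helper: a nested query matching products that have one of
 * $values in facet group $group (field names from the question's mapping).
 */
function facetClause(string $group, array $values): array
{
    return [
        'nested' => [
            'path'  => 'filters',
            'query' => [
                'bool' => [
                    'must' => [
                        ['term'  => ['filters.name' => $group]],
                        // Assumption: values within one group are OR-ed.
                        ['terms' => ['filters.values.name' => array_values($values)]],
                    ],
                ],
            ],
        ],
    ];
}

/**
 * Hypothetical helper: one filter aggregation per facet group. Each group's
 * counts are narrowed by the selections in every OTHER group, never its own.
 *
 * @param string[]                $groups   all facet group names (e.g. from the DB)
 * @param array<string, string[]> $selected group name => selected value names
 */
function buildFacetAggs(array $groups, array $selected): array
{
    $aggs = [];
    foreach ($groups as $group) {
        $clauses = [];
        foreach ($selected as $otherGroup => $values) {
            // Cast: PHP turns numeric-string array keys into ints.
            if ((string) $otherGroup !== $group && $values !== []) {
                $clauses[] = facetClause((string) $otherGroup, $values);
            }
        }
        $aggs['facet_' . $group] = [
            // No selections in other groups: count over every matching document.
            'filter' => $clauses === []
                ? ['match_all' => new \stdClass()] // encodes as {} rather than []
                : ['bool' => ['must' => $clauses]],
            'aggs' => [
                'nested_filters' => [
                    'nested' => ['path' => 'filters'],
                    'aggs'   => [
                        'group' => [
                            'filter' => ['term' => ['filters.name' => $group]],
                            'aggs'   => [
                                'values' => [
                                    'terms' => ['field' => 'filters.values.name', 'size' => 0],
                                ],
                            ],
                        ],
                    ],
                ],
            ],
        ];
    }
    return $aggs;
}

// The example from the question: the user has selected Blue.
$aggs = buildFacetAggs(['Colours', 'Brands'], ['Colours' => ['Blue']]);
echo json_encode($aggs['facet_Colours']['filter']), PHP_EOL; // {"match_all":{}}
```

In a real request, `$aggs` would go under `aggs`, and a `post_filter` built from all selections (for example a `bool` `must` of `facetClause()` for every group) would narrow the hits without changing these counts. `'size' => 0` mirrors the question; it only means "all buckets" in older Elasticsearch versions.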
