Elasticsearch Collapsing Based on Text Similarity


I'm working with Elasticsearch and trying to come up with a way to filter a text field based on a phrase. I have basic searching working, but I also want to collapse "similar" results rather than listing each of them separately.
For example, given objects with the following text content:
Buy 1 car get one car free until March
Buy 1 car get one car free until April
50% off your car insurance when you buy through us
Get 50% off your oven
If I search for car, I'd expect 2 results:
50% off your car insurance [...]
EITHER the 1st or the 2nd one (with both showing in inner_hits)
I've tried to do this using collapse on the content field, but that only collapses exact matches.
'query' => [
    'match' => [
        'content' => 'car',
    ],
],
'collapse' => [
    'field' => 'content',
    'inner_hits' => [
        'name' => 'recently_seen_on',
        'size' => 3,
        'sort' => [['seen_on' => 'desc']],
    ],
],
I've also tried adding a similarity property to the content field, but I couldn't figure out whether it's possible to collapse using that.
I also came across this https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-significanttext-aggregation.html but when I tried something similar I got 0 results. I set the content type to keyword in the mappings:
[
    'content' => ['type' => 'keyword'],
]
And then using:
'query' => [
    'match' => [
        'content' => 'car',
    ],
],
'aggs' => [
    'keywords' => [
        'significant_text' => [
            'field' => 'content',
            'filter_duplicate_text' => true,
        ],
    ],
],
Is it possible to achieve something like this without manually adding a field that groups documents based on their content?

Related

Elastic Search - sort on should and must

I have this problem: I want to sort on two matches, and my code currently looks like this (it's a PHP array):
$data = [
    'query' => [
        'bool' => [
            'must' => [
                'multi_match' => [
                    'query' => $q,
                    'fields' => ['Fulltext'],
                    'type' => 'cross_fields',
                    'operator' => 'and'
                ]
            ],
            'should' => [
                'match' => [
                    'Title' => $q,
                ]
            ]
        ]
    ],
    'aggregations' => [
        'categorys' => [
            'terms' => [
                'field' => 'Categorys.value',
                'size' => $category_size
            ]
        ]
    ],
    'size' => $product_size,
    'sort' => [
        'Popular' => [
            'order' => 'desc'
        ]
    ]
];
The problem with this sort is that I want to sort by "Popular", but the results are not sorted first on the match in "should" and then on the match in "must". So my question is:
How can I sort first by the match score, and then sort by the Popular field as a second criterion?
I want to search products in such a way that a query match in Title is more important than a match in the Fulltext field.
So suppose I get 10 results based on Title whose scores are higher than the next 10 results based on Fulltext, but 3 results from Title and 2 from Fulltext have Popular greater than 0. Then those matches should be shown and sorted by (_score, Popular) if Popular is greater than zero; otherwise the sort should be based on (_score) alone.
Can anybody help me with this question?

Elastic Search - Order / Scoring for document of the same user

I have a question about how I can accomplish something.
I have my search algorithm for user documents ready.
I get the list of documents, but I don't want the list to contain runs of consecutive documents from the same user.
Eg:
doc1: user-1
doc2: user-2
doc3: user-2
doc4: user-3
doc5: user-4
Change to:
doc1: user-1
doc2: user-2
doc4: user-3
doc5: user-4
doc3: user-2
Kind of sorting/randomising...
Any tips, ideas for what I can search?
Or much better, some examples.
I'm quite new to elastic search. The documentation about custom-scoring or ordering is great but not giving me the right answer.
Thanks a million
Stefan
Update 18.08.:
As requested, here is my current query.
'query' => [
    'filtered' => [
        'query' => [
            'bool' => [
                'must' => [
                    'multi_match' => [
                        'query' => $q,
                        'fields' => ['title^6', 'description^1', 'tags^3']
                    ]
                ],
                'should' => [
                    [
                        'match' => [
                            'isTopDocument' => [
                                'query' => 'true',
                                'boost' => 2,
                            ]
                        ]
                    ], [
                        'range' => [
                            'online_start' => [
                                'boost' => 1.8,
                                'gte' => 'now-7d/d'
                            ]
                        ]
                    ], [
                        'range' => [
                            'online_start' => [
                                'boost' => 1.4,
                                'gte' => 'now-14d/d'
                            ]
                        ]
                    ], [
                        // This is to include all available jobs, at least one should must be true if a must is set
                        // https://www.elastic.co/guide/en/elasticsearch/guide/current/bool-query.html#_controlling_precision
                        'range' => [
                            'online_start' => [
                                'gte' => 'now-61d/d'
                            ]
                        ]
                    ]
                ]
            ]
        ],
        'filter' => [
            'bool' => [
                // Some term filters
                'should' => $filter_should,
                'must' => $filter_must,
            ]
        ]
    ]
],
'size' => $perPage,
'from' => $from

Even if you find a search trick to score this particular use-case, you probably want to consider just post-processing the search results to get what you need.
Just loop through the list, keeping a reference to the previous user seen, and if you see the same user in the next result, just remove it from the results, and append it to the end of the list.
Generally speaking you'll get your "shuffled" users as desired, with an occasional pileup of your most prolific user at the very end of the list.
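The loop described above could be sketched roughly as follows. This is only an illustration: spreadUsers is a hypothetical helper name, and the hit shape (an array with 'id' and 'user' keys) is an assumption, not the shape of a real Elasticsearch response.

```php
<?php
/**
 * Walk the hits in order. A hit whose user equals the user of the
 * previously kept hit is deferred and appended at the end of the list.
 *
 * @param array  $hits    list of hits; each hit is an array (assumed shape)
 * @param string $userKey key that holds the user identifier (assumed name)
 */
function spreadUsers(array $hits, string $userKey = 'user'): array
{
    $kept = [];
    $deferred = [];
    $previousUser = null;

    foreach ($hits as $hit) {
        if ($previousUser !== null && $hit[$userKey] === $previousUser) {
            // Same user as the previous result: move it to the end.
            $deferred[] = $hit;
            continue;
        }
        $kept[] = $hit;
        $previousUser = $hit[$userKey];
    }

    return array_merge($kept, $deferred);
}

// The example from the question.
$hits = [
    ['id' => 'doc1', 'user' => 'user-1'],
    ['id' => 'doc2', 'user' => 'user-2'],
    ['id' => 'doc3', 'user' => 'user-2'],
    ['id' => 'doc4', 'user' => 'user-3'],
    ['id' => 'doc5', 'user' => 'user-4'],
];
$ordered = spreadUsers($hits);
// Order of ids: doc1, doc2, doc4, doc5, doc3
```

Because deferred hits are simply appended, several documents from one prolific user can still end up next to each other at the tail of the list, as noted above.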

Elasticsearch aggregations multifaceted navigation, excluding facets by group

I'm pretty new to ES and have been trying to implement faceted navigation in the same way most of the large e-commerce stores do.
Part of my products mapping looks like this:
'name' => [
    'type' => 'string',
    'analyzer' => 'english_analyzer',
],
'range' => [
    'type' => 'string',
    'analyzer' => 'english_analyzer',
],
'filters' => [
    'type' => 'nested',
    'properties' => [
        'name' => [
            'type' => 'string',
            'index' => 'not_analyzed',
        ],
        'values' => [
            'type' => 'object',
            'properties' => [
                'name' => [
                    'type' => 'string',
                    'index' => 'not_analyzed',
                ]
            ]
        ]
    ]
],
As you can see I'm storing the facets in "filters" as a nested object.
In my search query I can then add this aggregation to my query:
'aggs' => [
    'facets' => [
        'nested' => ['path' => 'filters'],
        'aggs' => [
            'filters' => [
                'terms' => [
                    'field' => 'filters.name', // facet group name
                    'size' => 0
                ],
                'aggs' => [
                    'id' => [
                        'terms' => [
                            'field' => 'filters.id' // facet group ID
                        ]
                    ],
                    'values' => [
                        'terms' => [
                            'field' => 'filters.values.name', // facet name
                            'size' => 0
                        ],
                        'aggs' => [
                            'id' => [
                                'terms' => [
                                    'field' => 'filters.values.id' // facet id
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ]
    ]
]
This gives me a nice list of facets to stick in my navigation. I can apply selected facets to my document results in 2 ways: Using the "post filter" or the "filtered query"
The post filter applies the aggregations after the query and so gives me document counts regardless of what facets have been selected by the user. In contrast the filtered query calculates facet counts based on selected facets, however it hides facets with no matching documents.
What I need to do - and what most of the big e-commerce stores do - is a hybrid of the 2. I want the facet counts to be based on what is selected but ignore any facets within the same group.
If I have these facets:
Colours:
Red (1)
Blue (2)
Green (3)
Brands:
Audi (1)
Ford (2)
BMW (3)
If somebody selects Blue, the counts should remain the same for Red and Green, but the counts for Brands would be affected.
I have found a similar question on Stack Overflow:
ElasticSearch aggregation: exclude one filter per aggregation
From what I can gather, I need to provide a predefined list of facet groups (from my relational DB) and add them to my aggregations. So I have a manual list of my facet groups, and I add a filter bucket (https://www.elastic.co/guide/en/elasticsearch/guide/current/_filter_bucket.html) to each of these. Within each filter, I need to add a bool query that contains all the user-selected facets, leaving out any facets that belong to that same group.
So now I have a huge list of aggregations, grouped/bucketed by their facet group, each of these has a filter containing a bool query which may have a dozen selected fields. The query now is so large it probably would not fit on one page if I posted it here!
This to me just seems like a crazy addition to my query considering what I had to do before was almost what I needed. Is this the ONLY way I can achieve this?
Any help is greatly appreciated, I hope my question is clear enough.
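For illustration, the per-group filter buckets described above could be generated rather than written by hand. The following is a rough sketch under several assumptions: buildGroupAggregations is a hypothetical helper, the selected facets are passed in as one ready-made filter clause per group, and the term clause on a colour field in the usage example is a placeholder (with the nested mapping above, the real clauses would be nested filters).

```php
<?php
/**
 * Build one filter-bucket aggregation per facet group. Each bucket is
 * filtered by the selections made in every OTHER group, so selections
 * within a group do not change that group's own counts.
 *
 * @param array $groups   facet group names, e.g. loaded from the database
 * @param array $selected map of group name => one ready-made filter clause
 * @param array $facetAgg sub-aggregation to run inside each bucket
 */
function buildGroupAggregations(array $groups, array $selected, array $facetAgg): array
{
    $aggs = [];
    foreach ($groups as $group) {
        $clauses = [];
        foreach ($selected as $selectedGroup => $clause) {
            if ((string) $selectedGroup === (string) $group) {
                continue; // skip selections that belong to this group
            }
            $clauses[] = $clause;
        }

        // With nothing selected in other groups, match every document.
        $filter = $clauses === []
            ? ['match_all' => new stdClass()]
            : ['bool' => ['must' => $clauses]];

        $aggs[$group] = ['filter' => $filter, 'aggs' => $facetAgg];
    }

    return $aggs;
}

// Usage: the user has selected "Blue" in the Colours group.
$selected = ['Colours' => ['term' => ['colour' => 'Blue']]]; // placeholder clause
$facetAgg = ['values' => ['terms' => ['field' => 'filters.values.name']]];
$aggs = buildGroupAggregations(['Colours', 'Brands'], $selected, $facetAgg);
```

In this sketch the Colours bucket is left unfiltered while the Brands bucket is restricted to Blue, which matches the counting behaviour described in the example above. The query still grows with the number of groups, but the PHP that builds it stays short.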

ElasticSearch Terms Aggregation Order Case Insensitive

I am trying to sort the buckets of a terms aggregation in elasticsearch case-insensitive. Here is the field mapping:
'brandName' => [
    'type' => 'string',
    'analyzer' => 'english',
    'index' => 'analyzed',
    'fields' => [
        'raw' => [
            'type' => 'string',
            'index' => 'not_analyzed'
        ]
    ]
]
Note that this data structure here is for PHP.
And the aggregation looks like this:
'aggregations' => [
    'brands' => [
        'terms' => [
            'field' => 'brandName.raw',
            'size' => 0,
            'order' => ['_term' => 'asc']
        ]
    ]
]
This works, but the resulting buckets are in lexicographical order.
I found some interesting docs here that explained how to do this, but it is in the context of sorting the hits, not the aggregations buckets.
I tried it anyway. Here is the analyzer I created:
'analysis' => [
    'analyzer' => [
        'case_insensitive_sort' => [
            'tokenizer' => 'keyword',
            'filter' => ['lowercase']
        ]
    ]
]
And here is the updated field mapping, with a new sub-field called "sort" using the analyzer.
'brandName' => [
    'type' => 'string',
    'analyzer' => 'english',
    'index' => 'analyzed',
    'fields' => [
        'raw' => [
            'type' => 'string',
            'index' => 'not_analyzed'
        ],
        'sort' => [
            'type' => 'string',
            'index' => 'not_analyzed',
            'analyzer' => 'case_insensitive_sort'
        ]
    ]
]
And here's the updated aggregation portion of my query:
'aggregations' => [
    'brands' => [
        'terms' => [
            'field' => 'brandName.raw',
            'size' => 0,
            'order' => ['brandName.sort' => 'asc']
        ]
    ]
]
This generates the following error: Invalid term-aggregator order path [brandName.sort]. Unknown aggregation [brandName].
Am I close? Can this kind of aggregation bucket sorting be done?

The short answer is that this kind of advanced sorting on aggregations is not yet supported; there is an open issue tackling it (slated for v2.0.0).
There are two other points worth mentioning here:
The brandName.sort sub-field is declared as not_analyzed, which contradicts setting an analyzer on it at the same time.
The error you're getting arises because the order part can only refer to sub-aggregation names, not field names (brandName.sort is a field name).
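Until such sorting is supported on the server side, one possible workaround is to re-sort the returned buckets in PHP. This is a different technique from the analyzer approach in the question, and only a sketch: sortBucketsCaseInsensitive is a hypothetical helper, the sample buckets are hand-written, and it assumes all buckets are returned in one response.

```php
<?php
/**
 * Sort terms-aggregation buckets by key, ignoring case.
 * strcasecmp() only folds ASCII letters, so keys with non-ASCII
 * characters may need a different comparison.
 */
function sortBucketsCaseInsensitive(array $buckets): array
{
    usort($buckets, function (array $a, array $b): int {
        return strcasecmp((string) $a['key'], (string) $b['key']);
    });

    return $buckets;
}

// Hand-written sample in the shape of terms-aggregation buckets.
$buckets = [
    ['key' => 'BMW', 'doc_count' => 3],
    ['key' => 'audi', 'doc_count' => 1],
    ['key' => 'Ford', 'doc_count' => 2],
];
$sorted = sortBucketsCaseInsensitive($buckets);
// Order of keys: audi, BMW, Ford
```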

elasticsearch returning unique values

I have a field named 'category'.
I can return the list of possible categories by just doing:
$searchParams['body']['aggs']['category']['terms']['field'] = 'category';
But I want to search inside that field and return only categories matching my query.
Example list:
Pizza
Apple pie
Orange pie
Cupcake
Burger
I want to search for "pie" and have the following result:
Apple pie
Orange pie
There are more than 200 categories. I want to do this the elasticsearch way, not using MySQL as the search.
Thanks for all the help :)

Aggregations operate in the "scope" of the query. So if you execute a search query for "pie", the aggregation will only see (and aggregate) "pie" documents.
$query = [
    'index' => 'my_index',
    'type' => 'my_type',
    'search_type' => 'count', // <-- Note search_type = count, to ignore search hits
    'body' => [
        'query' => [
            'match' => [
                'category' => 'pie'
            ]
        ],
        'aggs' => [
            'category' => [
                'terms' => [
                    'field' => 'category'
                ]
            ]
        ]
    ]
];
$results = $client->search($query);
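Reading the matching categories back out of the response might look like the sketch below. bucketKeys is a hypothetical helper, and the $results array is a hand-written sample in the shape of a terms aggregation response, not real output.

```php
<?php
/**
 * Return the bucket keys of a named terms aggregation, or an empty
 * array if the aggregation is missing from the response.
 */
function bucketKeys(array $results, string $aggName): array
{
    $buckets = $results['aggregations'][$aggName]['buckets'] ?? [];

    return array_column($buckets, 'key');
}

// Hand-written sample response (the doc_count values are made up).
$results = [
    'aggregations' => [
        'category' => [
            'buckets' => [
                ['key' => 'Apple pie', 'doc_count' => 4],
                ['key' => 'Orange pie', 'doc_count' => 2],
            ],
        ],
    ],
];
$categories = bucketKeys($results, 'category');
```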
