How to add "context" to Elastic Search suggestions - elasticsearch

I'm building an enterprise social network.
I want to suggest people to add as friends, based on their title.
For example, the value can be: developer, blogger, singer, barber, bartender ...
My users are stored in Elasticsearch, and their titles are saved in the field 'title'.
The current mapping is:
title: {
  type: 'text',
  analyzer: 'autocomplete_analyzer',
  search_analyzer: 'autocomplete_analyzer_search'
}
and the query is:
should: [
  {
    match: {
      title: {
        query: user.title,
        minimum_should_match: '90%',
        boost: 2
      }
    }
  }
]
and the analyzer definitions are:
indexConfig: {
  settings: {
    analysis: {
      analyzer: {
        autocomplete_analyzer: {
          tokenizer: 'autocomplete_tokenizer',
          filter: ['lowercase', 'asciifolding']
        },
        autocomplete_analyzer_search: {
          tokenizer: 'lowercase',
          filter: ['asciifolding']
        },
        phrase_analyzer: {
          tokenizer: 'standard',
          filter: ['lowercase', 'asciifolding', 'fr_stop', 'fr_stemmer', 'en_stop', 'en_stemmer']
        },
        derivative_analyzer: {
          tokenizer: 'standard',
          filter: ['lowercase', 'asciifolding', 'derivative_filter', 'fr_stop', 'fr_stemmer', 'en_stop', 'en_stemmer']
        }
      },
      tokenizer: {
        autocomplete_tokenizer: {
          type: 'edge_ngram',
          min_gram: 2,
          max_gram: 20,
          token_chars: ['letter', 'digit']
        }
      },
      filter: {
        derivative_filter: {
          type: 'word_delimiter',
          generate_word_parts: true,
          catenate_words: true,
          catenate_numbers: true,
          catenate_all: true,
          split_on_case_change: true,
          preserve_original: true,
          split_on_numerics: true,
          stem_english_possessive: true
        },
        en_stop: {
          type: 'stop',
          stopwords: '_english_'
        },
        en_stemmer: {
          type: 'stemmer',
          language: 'light_english'
        },
        fr_stop: {
          type: 'stop',
          stopwords: '_french_'
        },
        fr_stemmer: {
          type: 'stemmer',
          language: 'light_french'
        }
      }
    }
  }
}
I tested it, and the relevance is very good, but not enough users are matched, because of the '90%' criterion.
A quick and dirty solution is of course to lower this threshold to 50%.
However, if I do that, I suppose Elasticsearch will match titles based on how many letters they have in common, rather than on how close the titles really are.
For example, if my user is a 'barber', Elasticsearch might suggest 'bartender', because they have in common: b, a, r, e, r.
Hence, I have two questions:
1 - Is my assumption correct?
2 - What can I do to make my title search more relevant?

The problem with your search is the following: it uses autocomplete_analyzer, which basically creates a huge index with a lot of edge n-grams.
For bartender these would be something like ba, bar, bart, etc.
As you can see, barber produces quite similar n-grams, which is what creates the false match.
Regarding your questions: if you lower minimum_should_match you will indeed get more results, but only because the matching procedure then settles for partial token overlap, so your assumption is correct.
To increase relevance, I would recommend using another analyzer, since this n-gram analyzer is usually suitable only for autosuggest functionality, which isn't your use case. There are several choices, from keeping it simple with the keyword analyzer to the whitespace one.
What matters even more is constructing the query properly. For example, if the user searches for a partial title, e.g. bar, you may use a prefix query. However, if you're matching full titles only (e.g. developer or bartender), it is more important to just normalize the title field properly, e.g. with a lowercase analyzer and some stemming, as in the sketch below.
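A minimal sketch of that advice, under the assumption of a hypothetical normalized_title analyzer and a query shape that are not from this thread. The title field is analyzed into whole normalized tokens instead of edge n-grams:
title: {
  type: 'text',
  analyzer: 'normalized_title' # whole normalized tokens, no edge n-grams
}
# In the settings, a hypothetical analyzer reusing the filters already defined above:
analyzer: {
  normalized_title: {
    tokenizer: 'standard',
    filter: ['lowercase', 'asciifolding', 'en_stemmer']
  }
}
# Query: full-token match for complete titles, plus an optional prefix clause
# for partial input while the user is typing ('bar' is a placeholder):
should: [
  { match: { title: { query: user.title, boost: 2 } } },
  { prefix: { title: { value: 'bar' } } }
]
With this setup, barber and bartender are indexed as different whole tokens and no longer cross-match, so the minimum_should_match workaround can be dropped entirely.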

Related

Elasticsearch multiple suggestions with more advanced cases like matching prefix in the middle of a sentence

My use case: I have a search bar where the user can type a query. I want to show multiple types of search suggestions to the user, in addition to a regular query suggestion. For example, the suggestions can include company sectors, companies, and schools.
This is currently implemented using completion suggesters, with the following query (this is code from our Ruby implementation, but I believe you should be able to understand it easily):
{
  _source: '',
  suggest: {
    text: query_from_the_user, # User query like "sec" to find "security" related matches
    'school_names': {
      completion: {
        field: 'school_names_suggest',
      },
    },
    'companies': {
      completion: {
        field: 'company_name.suggest',
      },
    },
    'sectors': {
      completion: {
        field: sector_field_based_on_current_language(I18n.locale),
        # uses 'company_sector.french.suggest' when the user browses in french
      },
    },
  },
}
Here are my mappings (this is written in Ruby as well, but I believe it shouldn't be too hard to mentally convert it to Elasticsearch JSON config):
indexes :company_name, type: 'text' do
  indexes :suggest, type: 'completion'
end
indexes :company_sector, type: 'object' do
  indexes :french, type: 'text' do
    indexes :suggest, type: 'completion'
  end
  indexes :english, type: 'text' do
    indexes :suggest, type: 'completion'
  end
end
indexes :school_names_suggest, type: 'completion'
# sample indexed JSON
{
  company_name: "Christian Dior Couture",
  company_sector: {
    english: 'Milk sector',
    french: 'Secteur laitier'
  },
  school_names_suggest: ['Télécom ParisTech', 'Ecole Centrale Paris']
}
The problem is that the suggester is not powerful enough: it cannot autocomplete from a prefix in the middle of a sentence, and it cannot provide additional results after a perfect match. Here are some scenarios that I need to capture with my ES implementation.
CASE 1 - Matching by prefix in the middle of a sentence
# documents
[{ company_name: "Christian Dior Couture" }]
# => A search term "Dior" should return this document because it matches by prefix on the second word
CASE 2 - Provide results even after a perfect match
# documents
[
{ company_name: "Crédit Agricole" },
{ company_name: "Crédit Agricole Pyrénées Gascogne" },
]
# => A search term "Crédit Agricole" should return both documents (using the current implementation it only returns "Crédit Agricole"
Can I implement this using suggesters in Elasticsearch? Or do I need to fall back to multiple searches that take advantage of the new search-as-you-type data type with a query, as mentioned in the docs?
I am using elasticsearch 7.1 on AWS and the Ruby driver (gem elasticsearch-7.3.0)
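No answer is recorded for this thread, but both cases line up with the search_as_you_type data type the question mentions (available from Elasticsearch 7.2, one minor version above the 7.1 cluster in use). The following is a sketch under that assumption, with a hypothetical companies index, not a tested solution:
# Hypothetical mapping (PUT /companies), requires ES 7.2+:
{
  "mappings": {
    "properties": {
      "company_name": { "type": "search_as_you_type" }
    }
  }
}
# Hypothetical query (GET /companies/_search). With type bool_prefix the last
# term is matched as a prefix, so "Dior" matches in the middle of
# "Christian Dior Couture" (CASE 1), and "Crédit Agricole" keeps matching the
# longer "Crédit Agricole Pyrénées Gascogne" document (CASE 2):
{
  "query": {
    "multi_match": {
      "query": "Dior",
      "type": "bool_prefix",
      "fields": [
        "company_name",
        "company_name._2gram",
        "company_name._3gram"
      ]
    }
  }
}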

Elasticsearch search result relevance issue

Why does the match query return less relevant results first? I have an index field named normalized. Its mapping is:
normalized: {
  type: "text",
  analyzer: "autocomplete"
}
The settings for this field are:
analysis: {
  filter: {
    autocomplete_filter: {
      type: "edge_ngram",
      min_gram: "1",
      max_gram: "20"
    }
  },
  analyzer: {
    autocomplete: {
      filter: [
        "lowercase",
        "asciifolding",
        "autocomplete_filter"
      ],
      type: "custom",
      tokenizer: "standard"
    }
  }
}
As far as I know, this produces ASCII-folded, lowercase edge n-gram tokens, e.g. MOUSE = m, mo, mou, mous, mouse.
The problem is that a request like:
{
  'query': {
    'bool': {
      'must': {
        'match': {
          'normalized': 'simag'
        }
      }
    }
  }
}
returns results like:
"siman siman service"
"mgr simona simunkova simiki"
"Siman - SIMANS"
"simunek simunek a simunek"
.....
But there is no SIMAG, even though it contains all the letters of the searched phrase.
How can I achieve that the most relevant results are the words containing all the letters of the query, ranked before tokens that contain only some of them?
I hope somebody understands what I need.
Thanks.
PS: I am not sure, but what about this query:
{
  'query': {
    'bool': {
      'should': [
        { 'term': { 'normalized': 'simag' } },
        { 'match': { 'normalized': 'simag' } }
      ]
    }
  }
}
Does it make sense compared to the previous code?
Please note that the match query is analyzed, which means the same analyzer that was used at index time for the field mentioned in your query is also used at query time.
In your case, you applied the autocomplete analyzer to your normalized field, and as you mentioned, it generates the tokens below for MOUSE:
MOUSE = m, mo, mou, mous, mouse.
In the same way, if you search for mouse using a match query on that field, the search is expanded into the query strings m, mo, mou, mous, mouse. Hence results containing words like mousee or mouser also come back: the tokens created at index time match the tokens generated from the search term.
Read more about the match query on the Elastic site (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html); the first line itself explains your search results:
match queries accept text/numerics/dates, analyzes them, and
constructs a query
If you want to go deeper and understand how your search query matches the documents and how the score is computed, use the explain API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
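A common fix, sketched here as an assumption rather than something stated in the answer, is to apply the edge-n-gram analyzer at index time only and a plain analyzer at search time, so the query term stays whole:
# Hypothetical mapping change: n-grams at index time only.
normalized: {
  type: "text",
  analyzer: "autocomplete",     # index time: m, mo, mou, mous, mouse, ...
  search_analyzer: "standard"   # query time: the whole term "simag" only
}
With that change, SIMAG's indexed grams include simag itself, while siman only produces up to sima and siman, so only documents actually containing the full query prefix match.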

Elasticsearch & X-Pack: how to get vertices/connections from nested documents

I just started using X-Pack for Elasticsearch and want to connect vertices from a nested document type. However, looking for documentation on this hasn't got me anywhere.
What I have is an index of documents that have person names/IDs as nested documents (one document can have many persons, and one person can be related to many documents). The desired result is graph data with connections between persons.
Does anyone have a clue or can tell me if this is even possible?
Part of my mappings:
mappings: {
  legend: {
    properties: {
      persons: {
        type: 'nested',
        properties: {
          id: {
            type: 'string',
            index: 'not_analyzed'
          },
          name: {
            type: 'string',
            index: 'not_analyzed'
          }
        }
      }
    }
  }
}
And here is my Graph API query, which of course doesn't work, because I don't know how to handle the "name" field of the nested "persons" field:
POST sagenkarta_v3/_xpack/_graph/_explore
{
  "controls": {
    "use_significance": true,
    "sample_size": 20000,
    "timeout": 2000
  },
  "vertices": [
    {
      "field": "persons.name"
    }
  ],
  "connections": {
    "vertices": [
      {
        "field": "persons.name"
      }
    ]
  }
}
Thanks in advance!
This question was discussed here:
https://discuss.elastic.co/t/elasticsearch-x-pack-how-to-get-vertices-connections-from-nested-documents/88709
Quote from Mark_Harwood, Elastic Team Member:
Unfortunately Graph does not support nested documents but you can use
copy_to in your mappings to put the person data in an indexed field in
the containing root document.
I can see that you have the classic problem of
"computers-want-IDs-but-people-want-labels" and have both these
values. In Graph (and arguably the rest of Kibana too) I suggest you
use tokens that combine IDs for uniqueness' sake and names for
readability by humans.
The copy_to and IDs-and-labels tips are part of the modelling
suggestions in my elasticon talk this year:
https://www.elastic.co/elasticon/conf/2017/sf/getting-your-data-graph-ready
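To make the copy_to tip concrete, here is a minimal sketch in the same pre-5.x mapping style as the question; the persons_all field name and the combined ID-plus-label token are illustrative assumptions, not code from the discussion:
mappings: {
  legend: {
    properties: {
      # Hypothetical root-level field that Graph can explore:
      persons_all: {
        type: 'string',
        index: 'not_analyzed'
      },
      persons: {
        type: 'nested',
        properties: {
          # Combined "id|name" token per person, e.g. "42|Alice Smith",
          # copied up into the containing root document:
          id_and_name: {
            type: 'string',
            index: 'not_analyzed',
            copy_to: 'persons_all'
          }
        }
      }
    }
  }
}
# The explore request then uses "field": "persons_all" in its vertices.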

How to approach non-latin characters in ElasticSearch autocompletion with Mongoosastic?

Autocompletion is working fine using es.search({size: 0, suggest: ...}) with a completion mapping on a field that can have non-Latin diacritics (accented characters like â, ê, etc.).
I am creating the mappings using mongoosastic. I need to be able to use something like asciifolding for suggestions, or to add an additional field to the response.
I have these fields:
name, which is the one with diacritics.
nameSearch, which is the name latinized (no diacritics/accented characters).
What I need is to keep completion suggestions running on name but treat a the same as â (and the other way around).
In the response I need name, not nameSearch.
I stumbled on this problem again, this time without mongoosastic. The answer is to add a settings section to the index creation request (in mongoosastic you can add it when using custom mappings):
settings: {
analysis: {
analyzer: {
folding: {
tokenizer: 'standard',
filter: ['lowercase', 'custom_asciifolding'],
},
},
filter: {
custom_asciifolding: {
type: 'asciifolding',
preserve_original: true,
},
},
},
}
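For the completion suggester to see the folded tokens, the analyzer also has to be attached to the completion field itself; a minimal sketch of how that might look with the name field from the question (the exact layout is an assumption):
name: {
  type: 'completion',
  analyzer: 'folding',        # fold â -> a when indexing suggestions
  search_analyzer: 'folding'  # and fold the typed prefix the same way
}
Because the custom_asciifolding filter sets preserve_original: true, both the accented and the folded forms are indexed, so suggestions still come back with the original name values.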

Implementing ElasticSearch custom filter for all queries

I'm trying to upgrade from Elasticsearch 0.90.3 to 2.0 and running into some issues. The people who originally set up and configured ES are no longer available, so I'm going about this with very little knowledge of how it works.
They configured ES 0.90.3 to use ElasticSearch-ServiceWrapper and Tire; beyond that there are only a couple of small configuration changes.
For the most part, the upgrade went smoothly. I replaced the setup information in the cap deploy process to install ES 2.0 instead of 0.90.3, and the service is coming up. However, I cannot get the partial matching that was taking place before to work. I need to set up a standard filter, applied to all searches, that searches all fields using partial matches. I've done tons of Google searches, and this is the closest I can come up with, but it still isn't returning partial matches:
index:
  settings:
    analysis:
      filter:
        autocomplete_filter:
          type: edge_ngram
          min_gram: 2
          max_gram: 32
      analyzer:
        autocomplete:
          type: custom
          tokenizer: standard
          filter: [ lowercase, autocomplete_filter ]
mappings:
  access_point_status:
    properties:
      text:
        type: string
        analyzer: autocomplete
        search_analyzer: standard
I was hoping not to need to replace Tire, as that would make this upgrade much more involved, but if the problem lies within the queries rather than the setup, I will go down that road. This is a sample query that is not returning the desired results:
curl -X GET 'http://localhost:9200/access_point_status/_search?from=0&size=100&pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": {
              "query": "1925",
              "type": "phrase_prefix"
            }
          }
        }
      ]
    }
  },
  "sort": [ { "name": "asc" } ],
  "filter": { "term": { "domain": "domain_1" } },
  "size": 100,
  "from": 0
}'
Thanks
So I've found most of the issues. The indexes were being created by Tire and data_tables using a different mapping, which couldn't be overwritten once created.
I created these filters and then applied them to the fields:
index:
  analysis:
    filter:
      edge_ngram_filter:
        type: edge_ngram
        min_gram: 2
        max_gram: 32
        side: front
    analyzer:
      character_only:
        type: custom
        tokenizer: standard
        filter: [ lowercase, edge_ngram_filter ]
      special_character:
        type: custom
        tokenizer: keyword
        filter: [ lowercase, edge_ngram_filter ]
And I'm matching about 95% of the things I would hope to with:
curl -X GET 'http://localhost:9200/access_point_status/_search?from=0&size=100&pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "prefix": {
            "_all": "bsap-"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "name": "asc"
    }
  ],
  "filter": {
    "term": {
      "domain": "domain_1"
    }
  },
  "size": 100,
  "from": 0
}'
The only things I'm still missing: special characters don't match, and uppercase characters are not matched either. I've tried several types of queries; query_string doesn't seem to match any partials. Does anyone have thoughts on other queries?
I need to match things like MAC addresses, IPs, and combined text/number fields with -_,. as separators.
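One hedged guess on the uppercase failures: prefix is a term-level query, so its input is not analyzed, and an uppercase BSAP- can never match the lowercased terms in the index. A match query, by contrast, runs an analyzer over the input, so something along these lines may behave better (whether _all picks up the custom analyzers depends on how _all is configured, so treat this as a starting point, not a confirmed fix):
# match analyzes the input, lowercasing BSAP- before it hits the index:
curl -X GET 'http://localhost:9200/access_point_status/_search?pretty' -d '
{
  "query": {
    "match": {
      "_all": "BSAP-"
    }
  }
}'
Because the fields were indexed with edge n-grams, the lowercased query token can then match the indexed partial terms, which prefix skips entirely; that difference explains the case sensitivity.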
