Getting disordered sort results via Elasticsearch

I'm a beginner with Elasticsearch. I have a list of articles with an articleReferenceName property, and I'm trying to sort them alphabetically by articleReferenceName, but they are not sorted correctly, maybe because the articles are not indexed correctly. Can someone help me fix the filter configuration and figure out the problem, please? I think I missed some French filter.
This is the YAML configuration of the analyzer and filter definitions:
elasticsearch:
    synonyms_file: "%es_synonyms_file%"
    # https://gist.github.com/dadoonet/2146038
    # http://obtao.com/blog/2013/10/configure-elasticsearch-on-an-efficient-way/
    settings:
        number_of_shards: 5
        number_of_replicas: 1
        index:
            mapping.total_fields.limit: 10000
            max_result_window: 500000
            analysis:
                analyzer:
                    francais_synonym:
                        type: custom
                        tokenizer: standard
                        filter: [ lowercase, custom_synonyms, asciifolding, fr_stopwords, fr_snowball, elision, worddelimiter ]
                    francais_search:
                        type: custom
                        tokenizer: standard
                        filter: [ lowercase, asciifolding, fr_stopwords, fr_snowball, elision, worddelimiter ]
                    starts_with:
                        tokenizer: keyword
                        filter: lowercase
                    starts_with_numeric:
                        tokenizer: keyword
                        filter: [ lowercase, worddelimiter_numeric ]
                    full_text:
                        tokenizer: keyword
                        filter: [ lowercase, asciifolding ]
                    regions:
                        tokenizer: keyword
                        filter: [ lowercase, worddelimiter_regions ]
                filter:
                    fr_stopwords:
                        type: stop
                        stopwords: [_french_]
                    fr_snowball:
                        type: snowball
                        language: French
                    fr_stemmer:
                        type: stemmer
                        name: french
                    elision:
                        type: elision
                        articles: [ l, m, t, qu, n, s, j, d ]
                    worddelimiter:
                        type: word_delimiter
                    worddelimiter_regions:
                        type: word_delimiter
                        generate_word_parts: false
                        split_on_case_change: false
                        split_on_numerics: false
                        stem_english_possessive: false
                    custom_synonyms:
                        type: synonym
                        synonyms_path: "%es_synonyms_file%"
                    worddelimiter_numeric:
                        type: word_delimiter
                        generate_word_parts: false
                        generate_number_parts: false
                        catenate_numbers: true
                        split_on_case_change: false
                        split_on_numerics: false
                        stem_english_possessive: false
                        preserve_original: true
    article:
        mappings:
            article:
                _source:
                    enabled: true
                properties:
                    id:
                        type: integer
                    articleReferenceName:
                        type: text
                        analyzer: francais_synonym
                        search_analyzer: francais_search
                    aggs:
                        type: object
                        properties:
                            articleReferenceName:
                                type: text
                                index: not_analyzed
                                fielddata: true
PS: I will accept any edit that improves this question.

In fact, I found a solution to my problem: to sort on a field or perform aggregations on it, based on the docs we need to set the type to keyword, which makes the mapping look like this:
PUT /my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "articleReferenceName": { "type": "keyword" }
      }
    }
  }
}
By the way, the main reason for using the index: not_analyzed option is to add the field value to the index unchanged, as a single term. This is the default for all fields that support this option except string fields. not_analyzed fields are usually used with term-level queries for structured search.
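On recent Elasticsearch versions, a common alternative is to keep full-text search on the field and sort on a keyword sub-field. A minimal sketch building on the mapping above (the raw sub-field name is an assumption):
PUT /my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "articleReferenceName": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}

GET /my_index/_search
{
  "query": { "match_all": {} },
  "sort": [ { "articleReferenceName.raw": "asc" } ]
}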

Related

Elasticsearch multiple suggestions with more advanced cases like matching prefix in the middle of a sentence

My use case: I have a search bar where the user can type a query. I want to show multiple types of search suggestions to the user, in addition to a regular query suggestion. For example, there can be company sector, company, and school suggestions.
This is currently implemented using completion suggesters and the following suggest request (this is code from our Ruby implementation, but I believe you should be able to understand it easily):
{
  _source: '',
  suggest: {
    text: query_from_the_user, # User query like "sec" to find "security" related matches
    'school_names': {
      completion: {
        field: 'school_names_suggest',
      },
    },
    'companies': {
      completion: {
        field: 'company_name.suggest',
      },
    },
    'sectors': {
      completion: {
        field: sector_field_based_on_current_language(I18n.locale),
        # uses 'company_sector.french.suggest' when the user browses in french
      },
    },
  },
}
Here are my mappings (this is written in Ruby, but I believe it shouldn't be too hard to mentally convert to the Elasticsearch JSON config):
indexes :company_name, type: 'text' do
  indexes :suggest, type: 'completion'
end
indexes :company_sector, type: 'object' do
  indexes :french, type: 'text' do
    indexes :suggest, type: 'completion'
  end
  indexes :english, type: 'text' do
    indexes :suggest, type: 'completion'
  end
end
indexes :school_names_suggest, type: 'completion'

# sample indexed JSON
{
  company_name: "Christian Dior Couture",
  company_sector: {
    english: 'Milk sector',
    french: 'Secteur laitier'
  },
  school_names_suggest: ['Télécom ParisTech', 'Ecole Centrale Paris']
}
The problem is that the suggester is not powerful enough: it cannot autocomplete based on the middle of a sentence, and it does not provide additional results after a perfect match. Here are some scenarios that I need to capture with my ES implementation.
CASE 1 - Matching by prefix in the middle of a sentence
# documents
[{ company_name: "Christian Dior Couture" }]
# => A search term "Dior" should return this document because it matches by prefix on the second word

CASE 2 - Provide results even after a perfect match
# documents
[
  { company_name: "Crédit Agricole" },
  { company_name: "Crédit Agricole Pyrénées Gascogne" },
]
# => A search term "Crédit Agricole" should return both documents (the current implementation only returns "Crédit Agricole")
Can I implement this using suggesters in Elasticsearch? Or do I need to fall back to multiple searches that take advantage of the new search-as-you-type data type, using a query as mentioned in the docs?
I am using Elasticsearch 7.1 on AWS and the Ruby driver (gem elasticsearch-7.3.0).
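For reference, a minimal sketch of the search-as-you-type fallback mentioned above (it requires Elasticsearch 7.2+, while the question mentions 7.1; the index name is an assumption). The bool_prefix multi_match matches terms in any position of the name, which covers both cases:
PUT /companies
{
  "mappings": {
    "properties": {
      "company_name": { "type": "search_as_you_type" }
    }
  }
}

GET /companies/_search
{
  "query": {
    "multi_match": {
      "query": "Dior",
      "type": "bool_prefix",
      "fields": [
        "company_name",
        "company_name._2gram",
        "company_name._3gram"
      ]
    }
  }
}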

How to add "context" to Elastic Search suggestions

I'm building an enterprise social network.
I want to suggest people to add as friends, based on their title.
For example, the value can be: developer, blogger, singer, barber, bartender...
My users are saved in Elasticsearch; their titles are saved in the field 'title'.
The current mapping is:
title: {
  type: 'text',
  analyzer: 'autocomplete_analyzer',
  search_analyzer: 'autocomplete_analyzer_search'
}
and the query is:
should: [
  {
    match: {
      title: {
        query: user.title,
        minimum_should_match: '90%',
        boost: 2
      }
    }
  }
]
and the analyzer definitions are:
indexConfig: {
  settings: {
    analysis: {
      analyzer: {
        autocomplete_analyzer: {
          tokenizer: 'autocomplete_tokenizer',
          filter: ['lowercase', 'asciifolding']
        },
        autocomplete_analyzer_search: {
          tokenizer: 'lowercase',
          filter: ['asciifolding']
        },
        phrase_analyzer: {
          tokenizer: 'standard',
          filter: ['lowercase', 'asciifolding', 'fr_stop', 'fr_stemmer', 'en_stop', 'en_stemmer']
        },
        derivative_analyzer: {
          tokenizer: 'standard',
          filter: ['lowercase', 'asciifolding', 'derivative_filter', 'fr_stop', 'fr_stemmer', 'en_stop', 'en_stemmer']
        }
      },
      tokenizer: {
        autocomplete_tokenizer: {
          type: 'edge_ngram',
          min_gram: 2,
          max_gram: 20,
          token_chars: ['letter', 'digit']
        }
      },
      filter: {
        derivative_filter: {
          type: 'word_delimiter',
          generate_word_parts: true,
          catenate_words: true,
          catenate_numbers: true,
          catenate_all: true,
          split_on_case_change: true,
          preserve_original: true,
          split_on_numerics: true,
          stem_english_possessive: true
        },
        en_stop: {
          type: 'stop',
          stopwords: '_english_'
        },
        en_stemmer: {
          type: 'stemmer',
          language: 'light_english'
        },
        fr_stop: {
          type: 'stop',
          stopwords: '_french_'
        },
        fr_stemmer: {
          type: 'stemmer',
          language: 'light_french'
        }
      }
    }
  }
}
I tested it, and the relevance is very good, but not enough users are matched, because of the '90%' criterion.
A quick and dirty solution is of course to lower this criterion to 50%.
However, if I do that, I suppose Elasticsearch will match titles based on the letters they share rather than on the actual proximity between titles.
For example, if my user is a 'barber', Elasticsearch might suggest 'bartender', because they have in common: b, a, r, e, r.
Hence, I have two questions:
1 - Is my assumption correct?
2 - What can I do to make my title search more relevant?
The problem with your search is the following: it uses autocomplete_analyzer, which basically creates a huge index with a lot of n-grams.
For bartender, the edge n-grams would be something like ba, bar, bart, etc.
As you can see, barber produces quite similar n-grams, which is what makes the match.
Regarding your questions: if you lower minimum_should_match you will get more results, but only because the matching procedure then allows more partial matches.
To increase relevance, I would recommend using another analyzer, since this n-gram analyzer is usually suitable only for autosuggest functionality, which isn't your case. There are several choices, from keeping it simple to the keyword analyzer, or the whitespace one.
What matters more is to construct the query properly. For example, if the user searches for a partial title, e.g. bar, you may use a prefix query. However, if you're searching by full match only (e.g. developer or bartender), it is more important to normalize the title field properly, e.g. to use a lowercase analyzer with some stemming.
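A minimal sketch of that advice (the index name and analyzer name are assumptions): normalize the whole title into a single lowercased term, then use a prefix query for partial input or a match query for full titles:
PUT /users
{
  "settings": {
    "analysis": {
      "analyzer": {
        "title_normalizer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "title_normalizer" }
    }
  }
}

GET /users/_search
{
  "query": {
    "prefix": { "title": "bar" }
  }
}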

Elastic search - Search by alphabet characters A-Z

I would like to ask how to filter specific data with Elasticsearch by a single character A-Z.
So, for example, I have the data:
Orange
Apple
Ancient
Axe
I would like to get all results which start with (not contain) a character, for example "A". So the results are:
Apple
Ancient
Axe
I found here that I should create a new analyzer analyzer_startswith and set it up like this. What am I doing wrong? Right now I'm getting 0 results.
Elastica .yml config
fos_elastica:
    clients:
        default: noImportantInfo
    indexes:
        bundleName:
            client: default
            finder: ~
            settings:
                index:
                    analysis:
                        analyzer:
                            analyzer_startswith:
                                type: custom
                                tokenizer: keyword
                                filter:
                                    - lowercase
            types:
                content:
                    properties:
                        id:
                            type: integer
                        elasticaPriority:
                            type: integer
                        title:
                            type: string
                            analyzer: another_custom_analyzer
                            fields:
                                raw:
                                    type: string
                                    index: not_analyzed
                        title_ngram:
                            type: string
                            analyzer: analyzer_startswith
                            property_path: title
Thank you
You could use the prefix query for this; see https://www.elastic.co/guide/en/elasticsearch/reference/5.5/query-dsl-prefix-query.html:
GET /_search
{
  "query": {
    "prefix": { "user": "ki" }
  }
}
Thanks, I used prefix and it's working.
I set the index to not_analyzed and used prefix to find the first character of the string.
title_ngram:
    type: string
    property_path: title
    index: not_analyzed
Is there any other way to apply a standard search to my "title_ngram" now? Because I would like to search by a single character and also do full-text search in "title_ngram".
Try this one
GET /content/_search
{
  "query": {
    "match": {
      "title": "A"
    }
  },
  "sort": "title.raw"
}
For more information, refer to the link below:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-fields.html
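If you need both behaviours in a single request, one option (a sketch based on the mapping above; note that with index: not_analyzed the prefix is case-sensitive, so it must match the stored casing) is to combine them in a bool query:
GET /content/_search
{
  "query": {
    "bool": {
      "should": [
        { "prefix": { "title_ngram": "A" } },
        { "match": { "title": "Apple" } }
      ],
      "minimum_should_match": 1
    }
  },
  "sort": "title.raw"
}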

How to approach non-latin characters in ElasticSearch autocompletion with Mongoosastic?

Autocompletion is working fine using es.search({size: 0, suggest: ...}) with a completion mapping on a field that can have non-Latin diacritics (accented characters like â, ê, etc.).
I am creating the mappings using mongoosastic. I need to be able to use something like asciifolding for suggestions, or to add an additional field to the response.
I have those fields:
name, which is the one with diacritics.
nameSearch, which is the latinized name (no diacritics/accented characters).
What I need is to keep running completion suggestions on name but treat a the same as â (and the other way around).
In the response I need name, not nameSearch.
I stumbled on this problem again, this time without mongoosastic. The answer is to include a settings field in the index-creation request (in mongoosastic you can add it when using custom mappings):
settings: {
  analysis: {
    analyzer: {
      folding: {
        tokenizer: 'standard',
        filter: ['lowercase', 'custom_asciifolding'],
      },
    },
    filter: {
      custom_asciifolding: {
        type: 'asciifolding',
        preserve_original: true,
      },
    },
  },
}
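To tie this together, the completion field can reference that analyzer directly, so a and â index the same suggestion entries while the returned document still carries the accented name. A sketch (the index and field names are assumptions):
PUT /people
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "custom_asciifolding"]
        }
      },
      "filter": {
        "custom_asciifolding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "completion",
        "analyzer": "folding"
      }
    }
  }
}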

How to match "prefix" and not whole string in elasticsearch?

I have indexed documents, each with a field: "CodeName" that has values like the following:
document 1 has CodeName: "AAA01"
document 2 has CodeName: "AAA02"
document 3 has CodeName: "AAA03"
document 4 has CodeName: "BBB02"
When I try to use a match query on the field:
query: {
  "match": {
    "CodeName": "AAA"
  }
}
I expect to get results for "AAA01" and "AAA02", but instead I am getting an empty array. When I pass in "AAA01" (I type in the whole thing), I get a result. How do I make it match more generically? I tried using "prefix" instead of "match" and got the same problem.
The mapping for "CodeName" is a "type": "string".
I expect to get results for "AAA01" and "AAA02"
This is not what Elasticsearch expects. ES breaks your string into tokens using the tokenizer that you specify. If you didn't specify any tokenizer/analyzer, the default standard tokenizer splits words on spaces, hyphens, etc. In your case, the tokens are stored as "AAA01", "AAA02", and so on. There is no term "AAA", and hence you don't get any results back.
To fix this, you can use the match_phrase_prefix query or set the type of the match query to phrase_prefix. Try this code:
"query": {
"match_phrase_prefix": {
"CodeName": "AAA"
}
}
OR
"query": {
"match": {
"CodeName": {
"query": "AAA",
"type": "phrase_prefix"
}
}
}
Here is the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html. Also pay attention to the max_expansions parameter, as this query can be slow sometimes depending upon your data.
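For reference, a sketch of tuning that parameter (50 is the default limit on how many terms the final prefix may expand to):
"query": {
  "match_phrase_prefix": {
    "CodeName": {
      "query": "AAA",
      "max_expansions": 50
    }
  }
}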
Note that for this technique you should go with the default mapping. You don't need to use nGram.
As far as I know, you should first of all index your data using a tokenizer of type nGram.
You can check the details in the documentation.
From the comments:
I'm familiar with the Symfony way of using Elasticsearch, and we are using it like this:
indexes:
    search:
        client: default
        settings:
            index:
                analysis:
                    analyzer:
                        custom_index_analyzer:
                            type: custom
                            tokenizer: nGram
                            filter: [lowercase, kstem]
                    tokenizer:
                        nGram:
                            type: nGram
                            min_gram: 2
                            max_gram: 20
        types:
            skill:
                mappings:
                    skill.name:
                        search_analyzer: custom_index_analyzer
                        index_analyzer: custom_index_analyzer
                        type: string
                        boost: 1
