Elasticsearch - Search by alphabet characters A-Z

I would like to ask how to filter data in Elasticsearch by a single character A-Z.
For example, given this data:
Orange
Apple
Ancient
Axe
I would like to get all results which start with (not merely contain) a given character, for example "A". So the results are:
Apple
Ancient
Axe
I found here that I should create a new analyzer, analyzer_startswith, and set it up like this. What am I doing wrong? Right now I'm getting 0 results.
Elastica .yml config
fos_elastica:
    clients:
        default: noImportantInfo
    indexes:
        bundleName:
            client: default
            finder: ~
            settings:
                index:
                    analysis:
                        analyzer:
                            analyzer_startswith:
                                type: custom
                                tokenizer: keyword
                                filter:
                                    - lowercase
            types:
                content:
                    properties:
                        id:
                            type: integer
                        elasticaPriority:
                            type: integer
                        title:
                            type: string
                            analyzer: another_custom_analyzer
                            fields:
                                raw:
                                    type: string
                                    index: not_analyzed
                        title_ngram:
                            type: string
                            analyzer: analyzer_startswith
                            property_path: title
Thank you

You could use the prefix query for this; see https://www.elastic.co/guide/en/elasticsearch/reference/5.5/query-dsl-prefix-query.html:
GET /_search
{
    "query": {
        "prefix": { "user": "ki" }
    }
}
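Applied to the mapping in the question, the request would look something like this (a sketch; it assumes the title_ngram field analyzed with the lowercase keyword analyzer_startswith, and the bundleName/content index and type from the config above; note that prefix queries are not analyzed, so pass the prefix in lowercase):
GET /bundleName/content/_search
{
    "query": {
        "prefix": { "title_ngram": "a" }
    }
}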

Thanks, I used prefix and it's working.
I set index to not_analyzed and used prefix to find the first character of the string.
title_ngram:
    type: string
    property_path: title
    index: not_analyzed
Is there any other way to apply a standard search to my "title_ngram" now? I would like to search by a single character but also run a full-text search against "title_ngram".

Try this one:
GET /content/_search
{
    "query": {
        "match": {
            "title": "A"
        }
    },
    "sort": "title.raw"
}
For more information, see the multi-fields documentation:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-fields.html
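If you need both behaviours in one request, a possible approach (a sketch, assuming the title/title_ngram fields from the question) is a bool query combining a prefix query on the unanalyzed field with a match query on the analyzed one:
GET /content/_search
{
    "query": {
        "bool": {
            "should": [
                { "prefix": { "title_ngram": "a" } },
                { "match": { "title": "ancient history" } }
            ],
            "minimum_should_match": 1
        }
    }
}
Documents matching either clause are returned, and documents matching both rank higher.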

Related

Atlas Search Index partial match

I have a test collection with these two documents:
{ _id: ObjectId("636ce11889a00c51cac27779"), sku: 'kw-lids-0009' }
{ _id: ObjectId("636ce14b89a00c51cac2777a"), sku: 'kw-fs66-gre' }
I've created a search index with this definition:
{
    "analyzer": "lucene.standard",
    "searchAnalyzer": "lucene.standard",
    "mappings": {
        "dynamic": false,
        "fields": {
            "sku": {
                "type": "string"
            }
        }
    }
}
If I run this aggregation:
[{
    $search: {
        index: 'test',
        text: {
            query: 'kw-fs',
            path: 'sku'
        }
    }
}]
Why do I get 2 results? I only expected the one with sku: 'kw-fs66-gre' 😬
During indexing, the standard analyzer breaks the string "kw-lids-0009" into 3 tokens [kw][lids][0009], and similarly tokenizes "kw-fs66-gre" as [kw][fs66][gre]. When you query for "kw-fs", the same analyzer tokenizes the query as [kw][fs], and so Lucene matches both documents, as both have the [kw] token in the index.
To get the behavior you're looking for, you should index the sku field as type autocomplete and use the autocomplete operator in your $search stage instead of text; a sketch follows below.
You're still getting 2 results because of the tokenization, i.e., you're still matching on [kw] in two documents. If you search for "fs66", you'll get a single match only. Results are scored based on relevance; they are not filtered. You can add {$project: {score: { $meta: "searchScore" }}} to your pipeline and see the difference in score between the matching documents.
If you are looking to get exact matches only, you can look at using the keyword analyzer, or a custom analyzer that strips the dashes, so you deal with a single token per field and not 3.
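To illustrate, a minimal sketch of the autocomplete approach (the 'test' index name comes from the question; the edgeGram tokenization parameters are assumptions to tune for your data). First the index definition:
{
    "mappings": {
        "dynamic": false,
        "fields": {
            "sku": {
                "type": "autocomplete",
                "tokenization": "edgeGram",
                "minGrams": 2,
                "maxGrams": 15
            }
        }
    }
}
and then the aggregation, swapping text for the autocomplete operator:
[{
    $search: {
        index: 'test',
        autocomplete: {
            query: 'kw-fs',
            path: 'sku'
        }
    }
}]
Note, as discussed above, that results are ranked rather than filtered; the document with the closer prefix match scores higher.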

Elasticsearch multiple suggestions with more advanced cases like matching prefix in the middle of a sentence

My use case: I have a search bar where the user can type a query. I want to show multiple types of search suggestions to the user in addition to a regular query suggestion. For example, in the screenshot (not reproduced here) there were company sector, company, and school suggestions.
This is currently implemented using completion suggesters and the following mappings (this is code from our Ruby implementation, but I believe you should be able to understand it easily):
{
    _source: '',
    suggest: {
        text: query_from_the_user, # User query like "sec" to find "security" related matches
        'school_names': {
            completion: {
                field: 'school_names_suggest',
            },
        },
        'companies': {
            completion: {
                field: 'company_name.suggest',
            },
        },
        'sectors': {
            completion: {
                field: sector_field_based_on_current_language(I18n.locale),
                # uses 'company_sector.french.suggest' when the user browses in French
            },
        },
    },
}
Here are my mappings (this is also written in Ruby, but I believe it shouldn't be too hard to mentally convert this to Elasticsearch JSON config):
indexes :company_name, type: 'text' do
    indexes :suggest, type: 'completion'
end
indexes :company_sector, type: 'object' do
    indexes :french, type: 'text' do
        indexes :suggest, type: 'completion'
    end
    indexes :english, type: 'text' do
        indexes :suggest, type: 'completion'
    end
end
indexes :school_names_suggest, type: 'completion'
# sample indexed JSON
{
    company_name: "Christian Dior Couture",
    company_sector: {
        english: 'Milk sector',
        french: 'Secteur laitier'
    },
    school_names_suggest: ['Télécom ParisTech', 'Ecole Centrale Paris']
}
The problem is that the suggestions are not powerful enough: they cannot autocomplete based on the middle of a sentence, and they stop providing additional results after a perfect match. Here are some scenarios that I need to capture with my ES implementation:
CASE 1 - Matching by prefix in the middle of a sentence
# documents
[{ company_name: "Christian Dior Couture" }]
# => A search term "Dior" should return this document because it matches by prefix on the second word
CASE 2 - Provide results even after a perfect match
# documents
[
    { company_name: "Crédit Agricole" },
    { company_name: "Crédit Agricole Pyrénées Gascogne" },
]
# => A search term "Crédit Agricole" should return both documents (the current implementation only returns "Crédit Agricole")
Can I implement this using suggesters in Elasticsearch? Or do I need to fall back to multiple searches that take advantage of the new search-as-you-type data type, using a query as mentioned in the docs?
I am using Elasticsearch 7.1 on AWS and the Ruby driver (gem elasticsearch-7.3.0).
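For reference, the search-as-you-type fallback mentioned in the question would look roughly like the sketch below (the companies index name is hypothetical, and the search_as_you_type field type was introduced in Elasticsearch 7.2, so slightly newer than the 7.1 cluster mentioned). The field type indexes shingle subfields (._2gram, ._3gram), and a multi_match query of type bool_prefix treats the last term as a prefix, which covers both matching "Dior" mid-sentence and returning results beyond a perfect match:
PUT /companies
{
    "mappings": {
        "properties": {
            "company_name": { "type": "search_as_you_type" }
        }
    }
}

GET /companies/_search
{
    "query": {
        "multi_match": {
            "query": "Dior",
            "type": "bool_prefix",
            "fields": [
                "company_name",
                "company_name._2gram",
                "company_name._3gram"
            ]
        }
    }
}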

What is the difference between a `text` field with a `keyword` analyzer and a `keyword` field in Elasticsearch?

properties: {
    keyword: {
        type: "keyword",
        fields: {
            text: { type: "text", analyzer: "keyword" }
        }
    }
}
If I create an index with this mapping, what is the difference between keyword and keyword.text?
Both are the same. The keyword type/analyzer, as per the documentation, accepts whatever text it is given and outputs the exact same text as a single term.
If the intention is an exact match, the keyword type should be preferred. If you need to customise it (e.g. case-insensitive search), a custom analyzer can be used to modify it.
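As an aside, keyword fields also accept a normalizer, which covers the case-insensitive scenario without a text multi-field; a minimal sketch (index and field names are made up):
PUT /my_index
{
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {
                    "type": "custom",
                    "filter": [ "lowercase" ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "code": {
                "type": "keyword",
                "normalizer": "lowercase_normalizer"
            }
        }
    }
}
Values are lowercased both at index time and at query time, but the field still behaves as a single term for exact matching, sorting, and aggregations.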

Getting disordered sort results via Elasticsearch

I'm a beginner with Elasticsearch. I have a list of articles with an articleReferenceName property, and I'm trying to sort them alphabetically by articleReferenceName, but they are not sorted correctly, maybe because the articles are not indexed correctly... Can someone help me fix the filter configuration and figure out the problem, please? I think I missed some French filter.
This is the YAML configuration of the analyzer and filter definitions:
elasticsearch:
    synonyms_file: "%es_synonyms_file%"
    # https://gist.github.com/dadoonet/2146038
    # http://obtao.com/blog/2013/10/configure-elasticsearch-on-an-efficient-way/
    settings:
        number_of_shards: 5
        number_of_replicas: 1
        index:
            mapping.total_fields.limit: 10000
            max_result_window: 500000
            analysis:
                analyzer:
                    francais_synonym:
                        type: custom
                        tokenizer: standard
                        filter: [ lowercase, custom_synonyms, asciifolding, fr_stopwords, fr_snowball, elision, worddelimiter ]
                    francais_search:
                        type: custom
                        tokenizer: standard
                        filter: [ lowercase, asciifolding, fr_stopwords, fr_snowball, elision, worddelimiter ]
                    starts_with:
                        tokenizer: keyword
                        filter: lowercase
                    starts_with_numeric:
                        tokenizer: keyword
                        filter: [ lowercase, worddelimiter_numeric ]
                    full_text:
                        tokenizer: keyword
                        filter: [ lowercase, asciifolding ]
                    regions:
                        tokenizer: keyword
                        filter: [ lowercase, worddelimiter_regions ]
                filter:
                    fr_stopwords:
                        type: stop
                        stopwords: [ _french_ ]
                    fr_snowball:
                        type: snowball
                        language: French
                    fr_stemmer:
                        type: stemmer
                        name: french
                    elision:
                        type: elision
                        articles: [ l, m, t, qu, n, s, j, d ]
                    worddelimiter:
                        type: word_delimiter
                    worddelimiter_regions:
                        type: word_delimiter
                        generate_word_parts: false
                        split_on_case_change: false
                        split_on_numerics: false
                        stem_english_possessive: false
                    custom_synonyms:
                        type: synonym
                        synonyms_path: "%es_synonyms_file%"
                    worddelimiter_numeric:
                        type: word_delimiter
                        generate_word_parts: false
                        generate_number_parts: false
                        catenate_numbers: true
                        split_on_case_change: false
                        split_on_numerics: false
                        stem_english_possessive: false
                        preserve_original: true
article:
    mappings:
        article:
            _source:
                enabled: true
            properties:
                id:
                    type: integer
                articleReferenceName:
                    type: text
                    analyzer: francais_synonym
                    search_analyzer: francais_search
                aggs:
                    type: object
                    properties:
                        articleReferenceName:
                            type: text
                            index: not_analyzed
                            fielddata: true
PS: I will accept any edit that improves this question.
In fact, I found a solution to my problem. To sort or use aggregations, based on the docs, we need to set the field type to keyword so that the mapping looks like this:
PUT /my_index
{
    "mappings": {
        "_doc": {
            "properties": {
                "articleReferenceName": { "type": "keyword" }
            }
        }
    }
}
By the way, the main reason for using the index: not_analyzed option is to add the field value to the index unchanged, as a single term. This is the default for all fields that support this option except for string fields. not_analyzed fields are usually used with term-level queries for structured search.
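With that keyword mapping in place, the sort clause itself is straightforward; a quick sketch of the search request:
GET /my_index/_search
{
    "sort": [
        { "articleReferenceName": "asc" }
    ]
}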

How to match a "prefix" and not the whole string in Elasticsearch?

I have indexed documents, each with a field: "CodeName" that has values like the following:
document 1 has CodeName: "AAA01"
document 2 has CodeName: "AAA02"
document 3 has CodeName: "AAA03"
document 4 has CodeName: "BBB02"
When I try to use a match query on the field:
query: {
    "match": {
        "CodeName": "AAA"
    }
}
I expect to get results for "AAA01" and "AAA02", but instead I am getting an empty array. When I pass in "AAA01" (I type in the whole thing), I get a result. How do I make it match more generically? I tried using "prefix" instead of "match" and I get the same problem.
The mapping for "CodeName" is a "type": "string".
I expect to get results for "AAA01" and "AAA02"
This is not how Elasticsearch behaves by default. ES breaks your string into tokens using the tokenizer that you specify. If you didn't specify any tokenizer/analyzer, the default standard tokenizer splits words on spaces, hyphens, etc. In your case, the tokens are stored as "AAA01", "AAA02" and so on. There is no term "AAA" in the index, and hence you don't get any results back.
To fix this, you can use a match_phrase_prefix query, or set the type of the match query to phrase_prefix. Try this code:
"query": {
"match_phrase_prefix": {
"CodeName": "AAA"
}
}
OR
"query": {
"match": {
"CodeName": {
"query": "AAA",
"type": "phrase_prefix"
}
}
}
Here is the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html. Also pay attention to the max_expansions parameter, as this query can sometimes be slow depending on your data.
Note that for this technique you should go with the default mapping; you don't need to use nGram.
As far as I know, first of all you should index your data using a tokenizer of type nGram.
You can check the details in the documentation.
From the comments:
I'm familiar with the Symfony way of using Elasticsearch, and we are using it like this:
indexes:
    search:
        client: default
        settings:
            index:
                analysis:
                    analyzer:
                        custom_index_analyzer:
                            type: custom
                            tokenizer: nGram
                            filter: [ lowercase, kstem ]
                    tokenizer:
                        nGram:
                            type: nGram
                            min_gram: 2
                            max_gram: 20
        types:
            skill:
                mappings:
                    skill.name:
                        search_analyzer: custom_index_analyzer
                        index_analyzer: custom_index_analyzer
                        type: string
                        boost: 1
