elasticsearch do not analyze field

elasticsearch do not analyze field - elasticsearch

I use Elasticsearch (2.4) and I have an index with a field that is, in theory, analyzed on index step. But, in practice, it's not analyzed. I think I miss something, but what ?
The complete index definition :
{
"test_index": {
"aliases": {},
"mappings": {
"users": {
"properties": {
"name": {
"type": "string",
"analyzer": "my_analyser"
},
"id": {
"type": "long"
}
}
}
},
"settings": {
"index": {
"index_directly": "1",
"number_of_shards": "1",
"cron_limit": "50",
"creation_date": "1496150121337",
"analysis": {
"analyzer": {
"standard": {
"type": "standard",
"max_token_length": "255",
"stopwords": ""
},
"my_analyser": {
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"token_chars": [
"letter",
"digit"
],
"min_gram": "3",
"type": "ngram",
"max_gram": "3"
}
}
},
"fields": {
"name": {
"type": "text"
}
},
"number_of_replicas": "0",
"uuid": "lmwPFWoISlC2knZZn2nNZQ",
"version": {
"created": "2040599"
}
}
},
"warmers": {}
}
}
A simple document to index :
{
"id": 0,
"name": "John"
}
The result :
{
"_index": "test_index",
"_type": "users",
"_id": "0",
"_version": 1,
"found": true,
"_source": {
"id": 0,
"name": "John"
}
}
What I am expecting :
{
"_index": "test_index",
"_type": "users",
"_id": "0",
"_version": 1,
"found": true,
"_source": {
"id": 0,
"name": [
"Joh",
"ohn"
]
}
}
I have other fields on this index, and I want my custom analyzer just on name field.

Your analyzer won't affect the _source object, it only impacts the result terms that are stored in index and used for search

Related

Assign a higher score to matches containing the search query at an earlier position in elasticsearch

This question is similar to my other question enter link description here which Val answered.
I have an index containing 3 documents.
{
"firstname": "Anne",
"lastname": "Borg",
}
{
"firstname": "Leanne",
"lastname": "Ray"
},
{
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
When I search for "Ann", I would like elastic to return all 3 of these documents (because they all match the term "Ann" to a degree). BUT, I would like Leanne Ray to have a lower score (relevance ranking) because the search term "Ann" appears at a later position in this document than the term appears in the other two documents.
Here are my index settings...
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"token_chars": [
"letter",
"digit",
"custom"
],
"custom_token_chars": "'-",
"min_gram": "1",
"type": "ngram",
"max_gram": "2"
}
}
}
},
"mappings": {
"properties": {
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"middlename": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"copy_to": [
"full_name"
]
},
"full_name": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
The following query brings back the expected documents, but attributes a higher score to Leanne Ray than to Anne Borg.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "Ann",
"fields": ["full_name"]
}
},
"should": {
"match": {
"full_name": "Ann"}
}
}
}
}
Here are the results...
"hits": [
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "2",
"_score": 6.6333585,
"_source": {
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
},
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "1",
"_score": 6.142234,
"_source": {
"firstname": "Leanne",
"lastname": "Ray"
}
},
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "3",
"_score": 6.079495,
"_source": {
"firstname": "Anne",
"lastname": "Borg"
}
}
Using an ngram token filter and an ngram tokenizer together seems to fix this problem...
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"ngram"
],
"tokenizer": "ngram"
}
}
}
},
"mappings": {
"properties": {
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"middlename": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"full_name": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
}
}
}
The same query brings back the expected results with the desired relative scoring. Why does this work? Note that above, I am using an ngram tokenizer with a lowercase filter and the only difference here is that I am using an ngram filter instead of the lowercase filter.
Here are the results. Notice that Leanne Ray scored lower than both Anne Borg and Anne M Stone, as desired.
"hits": [
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "3",
"_score": 4.953257,
"_source": {
"firstname": "Anne",
"lastname": "Borg"
}
},
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "2",
"_score": 4.87168,
"_source": {
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
},
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "1",
"_score": 1.0364896,
"_source": {
"firstname": "Leanne",
"lastname": "Ray"
}
}
By the way, this query also brings back a whole lot of false positive results when the index contains other documents as well. It's not such a problem becasuethese false positives have very low scores relative to the scores of the desirable hits. But still not ideal. For example, if I add {firstname: Gideon, lastname: Grossma} to the document, the above query will bring back that document in the result set as well - albeit with a much lower score than the documents containing the string "Ann"

The answer is the same as in the linked thread. Since you're ngraming all the indexed data, it works the same way with Ann as with Anne, You'll get the exact same response (see below), with different scores, though:
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "5Jr-DHIBhYuDqANwSeiw",
"_score" : 4.8442974,
"_source" : {
"firstname" : "Anne",
"lastname" : "Borg"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "5pr-DHIBhYuDqANwSeiw",
"_score" : 4.828779,
"_source" : {
"firstname" : "Anne",
"middlename" : "M",
"lastname" : "Stone"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "5Zr-DHIBhYuDqANwSeiw",
"_score" : 0.12874341,
"_source" : {
"firstname" : "Leanne",
"lastname" : "Ray"
}
}
]
UPDATE
Here is a modified query that you can use to check for parts (i.e. ann vs anne). Again, the casing makes no difference here, since the analyzer lowercases everything before indexing.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "ann",
"fields": [
"full_name"
]
}
},
"should": [
{
"match_phrase_prefix": {
"firstname": {
"query": "ann",
"boost": "10"
}
}
},
{
"match_phrase_prefix": {
"lastname": {
"query": "ann",
"boost": "10"
}
}
}
]
}
}
}

How to use an ngram and edge ngram tokenizer together in elasticsearch index?

I have an index containing 3 documents.
{
"firstname": "Anne",
"lastname": "Borg",
}
{
"firstname": "Leanne",
"lastname": "Ray"
},
{
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
When I search for "Anne", I would like elastic to return all 3 of these documents (because they all match the term "Anne" to a degree). BUT, I would like Leanne Ray to have a lower score (relevance ranking) because the search term "Anne" appears at a later position in this document than the term appears in the other two documents.
Initially, I was using an ngram tokenizer. I also have a generated field in my index's mapping called "full_name" that contains the firstname, middlename and lastname strings. When I searched for "Anne", all 3 documents are in the result set. However, Anne M Stone has the same score as Leanne Ray. Anne M Stone should have a higher score than Leanne.
To address this, I changed my ngram tokenizer to an edge_ngram tokenizer. This had the effect of completely leaving out Leanne Ray from the result set. We would like to keep this result in the result set - because it still contains the query string - but with a lower score than the other two better matches.
I read somewhere that it may be possible to use the edge ngram filter alongside an ngram filter in the same index. If so, how should I recreate my index to do so? Is there a better solution?
Here are the initial index settings.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"token_chars": [
"letter",
"digit",
"custom"
],
"custom_token_chars": "'-",
"min_gram": "3",
"type": "ngram",
"max_gram": "4"
}
}
}
},
"mappings": {
"properties": {
"contact_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"middlename": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"copy_to": [
"full_name"
]
},
"full_name": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
And here is my query
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "Anne",
"fields": [
"full_name"
]
}
}
]
}
}
}
This brought back these results
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 0.59604377,
"hits": [
{
"_index": "contacts_15",
"_type": "_doc",
"_id": "3",
"_score": 0.59604377,
"_source": {
"firstname": "Anne",
"lastname": "Borg"
}
},
{
"_index": "contacts_15",
"_type": "_doc",
"_id": "1",
"_score": 0.5592099,
"_source": {
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
},
{
"_index": "contacts_15",
"_type": "_doc",
"_id": "2",
"_score": 0.5592099,
"_source": {
"firstname": "Leanne",
"lastname": "Ray"
}
}
]
}
If I instead use an edge ngram tokenizer, this is what the index's settings look like...
{
"settings": {
"max_ngram_diff": "10",
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": ["edge_ngram_tokenizer"]
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"token_chars": [
"letter",
"digit",
"custom"
],
"custom_token_chars": "'-",
"min_gram": "2",
"type": "edge_ngram",
"max_gram": "10"
}
}
}
},
"mappings": {
"properties": {
"contact_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"middlename": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"copy_to": [
"full_name"
]
},
"full_name": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
and that same query brings back this new result set...
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.5131824,
"hits": [
{
"_index": "contacts_16",
"_type": "_doc",
"_id": "3",
"_score": 1.5131824,
"_source": {
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
},
{
"_index": "contacts_16",
"_type": "_doc",
"_id": "1",
"_score": 1.4100108,
"_source": {
"firstname": "Anne",
"lastname": "Borg"
}
}
]
}

You can keep using ngram (i.e. first solution) but then you need to change your query to improve the relevance. The way it works is that you add a boosted multi_match query in a should clause to increase the score of documents whose first or last name match exactly with the input:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Anne",
"fields": [
"full_name"
]
}
}
],
"should": [
{
"multi_match": {
"query": "Anne",
"fields": [
"firstname",
"lastname"
],
"boost": 10
}
}
]
}
}
}
This query would bring Anne Borg and Anne M Stone before Leanne Ray.
UPDATE
Here is how I arrived at the results.
First I created a test index with the exact same settings/mappings as you have added to your question:
PUT test
{ ... copy/pasted mappings/settings ... }
Then I added the three sample documents you provided:
POST test/_doc/_bulk
{"index":{}}
{"firstname":"Anne","lastname":"Borg"}
{"index":{}}
{"firstname":"Leanne","lastname":"Ray"}
{"index":{}}
{"firstname":"Anne","middlename":"M","lastname":"Stone"}
Finally, if you run my query above, you get the following results, which is exactly what you expect (look at the scores):
{
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 5.1328206,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "4ZqbDHIBhYuDqANwQ-ih",
"_score" : 5.1328206,
"_source" : {
"firstname" : "Anne",
"lastname" : "Borg"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "45qbDHIBhYuDqANwQ-ih",
"_score" : 5.0862665,
"_source" : {
"firstname" : "Anne",
"middlename" : "M",
"lastname" : "Stone"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "4pqbDHIBhYuDqANwQ-ih",
"_score" : 0.38623023,
"_source" : {
"firstname" : "Leanne",
"lastname" : "Ray"
}
}
]
}
}

Search across _all field in Elastic and return results with highlighting

I am using Elastic 5.4 and wanted to query across index containing documents of multiple types.(type a and type b). Below are example documents in the index:
Documents:
{
"_index": "test",
"_type": "a",
"_id": "1",
"_source": {
"id": "1",
"name": "john-usa-soccer",
"class": "5",
"lastseen": "2017-07-05",
"a_atts": {
"lastname": "tover",
"hobby": "soccer",
"country": "usa"
}
}
}
{
"_index": "test",
"_type": "b",
"_id": "2",
"_source": {
"id": "2",
"name": "john-usa",
"class": "5",
"lastseen": "2017-07-05",
"b_atts": {
"lastname": "kaml",
"hobby": "baseball",
"country": "usa"
}
}
}
Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer"
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "ngram",
"min_gram": "3",
"max_gram": "3",
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"a": {
"dynamic_templates": [
{
"strings": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "text",
"analyzer": "my_ngram_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"suggest": {
"type": "completion",
"analyzer": "simple"
},
"analyzer1": {
"type": "text",
"analyzer": "simple"
},
"analyzer2": {
"type": "text",
"analyzer": "standard"
}
}
}
}
}
]
},
"b": {
"dynamic_templates": [
{
"strings": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "text",
"analyzer": "my_ngram_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"suggest": {
"type": "completion",
"analyzer": "simple"
},
"analyzer1": {
"type": "text",
"analyzer": "simple"
},
"analyzer2": {
"type": "text",
"analyzer": "standard"
}
}
}
}
}
]
}
}
}
My query is to search all documents which contain 'john' across any of the fields in any type and highlight the fields where the match was found. This query is constructed as per Elastic documentation. My Schema mappings has ngram_analyzer configured as analyzer instead of default analyzer for all fields of type string in the schema.
Query: http://localhost:9200/student/_search
{
"query": {
"bool": {
"should": [
{ "match": { "_all": "john"} }
]
}
},
"highlight": {
"fields": {
"name": {
"require_field_match": false
},
"a_atts.lastname":{
"require_field_match": false
},
"a_atts.hobby":{
"require_field_match": false
},
"a_atts.country":{
"require_field_match": false
}
}
}
}
Response:
{
"took": 79,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.17669111,
"hits": [
{
"_index": "student",
"_type": "a",
"_id": "AV1WjBeYEZrDBYsdGMtY",
"_score": 0.17669111,
"_source": {
"name": "john-usa-soccer",
"class": "5",
"lastseen": "2017-07-05",
"a_atts": {
"lastname": "tover",
"hobby": "soccer",
"country": "usa"
}
}
},
{
"_index": "student",
"_type": "b",
"_id": "AV1WjHFxEZrDBYsdGMtZ",
"_score": 0.17669111,
"_source": {
"name": "john-usa",
"class": "5",
"lastseen": "2017-07-05",
"b_atts": {
"lastname": "kaml",
"hobby": "baseball",
"country": "usa"
}
}
}
]
}
}
However, executing the above query against an index, returns documents matched with their _source content but not highlight field. It is missing the following:
"highlight": {
"name": [
"<em>john</em>-usa-soccer"
]
}
How can I return highlight in the results?

I got highlighter to work by following the answer provided in this link.
"highlight": {
"fields": {
"*": {}
},
"require_field_match": false
}

Synonym analyzer not working

Here are my settings:
{
"countries": {
"aliases": {},
"mappings": {
"country": {
"properties": {
"countryName": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1472140045116",
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms_path": "synonym.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "whitespace"
}
}
},
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "7-fKyD9aR2eG3BwUNdadXA",
"version": {
"created": "2030599"
}
}
},
"warmers": {}
}
}
My synonym.txt file is in the config folder inside the main elasticsearch folder.
Here is my query:
query: {
query_string: {
fields: ["countryName"],
default_operator: "AND",
query: searchInput,
analyzer: "synonym"
}
}
The words in synonym.txt are: us, u.s., united states.
So this doesn't work. What's interesting is that search works as normal, except for when I enter any of the words in the synonym.txt file. So for example, when I usually type in us into the search, I would get results. With this analyzer, us doesn't give me anything.
I've done close and open to my ES server, and still it doesn't work.
EDIT
An example of a document:
{
"_index": "countries",
"_type": "country",
"_id": "57aabeb80057405968de152b",
"_score": 1,
"_source": {
"countryName": "United States"
}
Example of searchInput (this is coming from the front-end):
united states
EDIT #2:
Here is my updated index config file:
{
"countries": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"number_of_shards": "5",
"creation_date": "1472219634083",
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms_path": "synonym.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "whitespace"
}
}
},
"country": {
"properties": {
"countryName": {
"type": "string",
"analyzer": "synonym"
},
"number_of_replicas": "1",
"uuid": "50ZwpIVFTqeD_rJxlmd59Q",
"version": {
"created": "2030599"
}
}
},
"warmers": {}
}
}
}
}
When I try adding documents, and doing a search on said documents, the synonym analyzer does not work for me.
EDIT #3
Here are 2 documents in the index:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [{
"_index": "stocks",
"_type": "stock",
"_id": "2",
"_score": 1,
"_source": {
"countryName": "United States"
}
}, {
"_index": "stocks",
"_type": "stock",
"_id": "1",
"_score": 1,
"_source": {
"countryName": "Canada"
}
}]
}
}

You are close, but I suggest reading thoroughly this section from the documentation to understand better this functionality.
As a solution:
PUT /countries
{
"mappings": {
"country": {
"properties": {
"countryName": {
"type": "string",
"analyzer": "synonym"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms_path": "synonym.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"lowercase",
"synonym"
],
"tokenizer": "whitespace"
}
}
}
}
}
You need to delete the index and create it again with the mapping above.
Then use this query:
"query": {
"query_string": {
"fields": [
"countryName"
],
"default_operator": "AND",
"query": "united states"
}
}

Have you deleted/created the index after pushing the txt ?
I think you should remove the "synonyms": "" if you are using "synonyms_path"

Unexpected (case-insensitive) string sorting in Elasticsearch

I have a list of console platforms that I'm sorting in Elasticsearch.
Here is the mapping for the "name" field:
{
"name": {
"type": "multi_field",
"fields": {
"name": {
"type": "string",
"index": "analyzed"
},
"sort_name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
When I execute the following query
{
"query": {
"match_all": {}
},
"sort": [
{
"name.sort_name": { "order": "asc" }
}
],
"fields": ["name"]
}
I get these results:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 17,
"max_score": null,
"hits": [
{
"_index": "platforms",
"_type": "platform",
"_id": "1393602489",
"_score": null,
"fields": {
"name": "GameCube"
},
"sort": [
"GameCube"
]
},
{
"_index": "platforms",
"_type": "platform",
"_id": "1393602490",
"_score": null,
"fields": {
"name": "Gameboy Advance"
},
"sort": [
"Gameboy Advance"
]
},
{
"_index": "platforms",
"_type": "platform",
"_id": "1393602498",
"_score": null,
"fields": {
"name": "Nintendo 3DS"
},
"sort": [
"Nintendo 3DS"
]
},
...remove for brevity ...
{
"_index": "platforms",
"_type": "platform",
"_id": "1393602493",
"_score": null,
"fields": {
"name": "Xbox 360"
},
"sort": [
"Xbox 360"
]
},
{
"_index": "platforms",
"_type": "platform",
"_id": "1393602502",
"_score": null,
"fields": {
"name": "Xbox One"
},
"sort": [
"Xbox One"
]
},
{
"_index": "platforms",
"_type": "platform",
"_id": "1393602497",
"_score": null,
"fields": {
"name": "iPhone/iPod"
},
"sort": [
"iPhone/iPod"
]
}
]
}
Everything is sorted as expected except the iPhone/iPod result is at the end (instead of after GameBoy Advance) - why does the / in the name have an effect on the sorting?
Thanks

Okay so I discovered the reason wasn't anything to do with the /. ES will sort by capital letters then lower case letters.
I added a custom analyzer to the settings of the index creation:
{
"analysis": {
"analyzer": {
"sortable": {
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
}
Then in the field mapping I added 'analyzer': 'sortable' to the sort_name multi field.

Use Normalizer with keyword to handle the sort
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-normalizers.html#analysis-normalizers
PUT index_name
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": ["quote"],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
Search query may be modified like this
{
"query": {
"match_all": {}
},
"sort": [
{
"name.sort_name": { "order": "asc" }
}
],
"fields": "name.keyword"
}

According to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-normalizers.html (ElasticSearch 7.16) ...
Elasticsearch ships with a lowercase built-in normalizer.
So you can define an additional field (in the example below named "lowersortable"):
PUT /myindex/_mapping
{
"properties": {
"myproperty": {
"type": "text",
"fields": {
"lowersortable": {
"type": "keyword",
"normalizer": "lowercase"
}
}
}
}
}
... and use this field myproperty.lowersortable for sorting in the search query.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

elasticsearch do not analyze field - elasticsearch

Your analyzer won't affect the _source object, it only impacts the result terms that are stored in index and used for search

Related

Assign a higher score to matches containing the search query at an earlier position in elasticsearch

How to use an ngram and edge ngram tokenizer together in elasticsearch index?

Search across _all field in Elastic and return results with highlighting

Synonym analyzer not working

Unexpected (case-insensitive) string sorting in Elasticsearch

Categories

Resources