Elasticsearch index tokenizer keyword not working

I have an index with fields like:
"Id": { "type": "string", "analyzer": "string_lowercase" } // a GUID, for example
In elasticsearch.yml:
index:
  analysis:
    analyzer:
      string_lowercase:
        tokenizer: keyword
        filter: lowercase
But filtering like this
{
  "filter": {
    "term": {
      "Id": "2c4294c2-ca84-4f69-b648-8a014ff6e55d"
    }
  }
}
does not match the whole GUID value, only its parts ("2c4294c2", "ca84", ...).
Interestingly, it works properly on another machine with the same configuration.

You can't add a custom analyzer through elasticsearch.yml; custom analyzers are defined through the REST API when the index is created. For your requirement, this is the required command:
PUT <index name>
{
  "settings": {
    "analysis": {
      "analyzer": {
        "string_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
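To verify the analyzer emits a single lowercased token instead of GUID fragments, you can run a sample value through the _analyze API (a quick sanity check; the sample GUID is just a placeholder, and the Id field must still reference string_lowercase in the mapping, as in the question):
GET <index name>/_analyze
{
  "analyzer": "string_lowercase",
  "text": "2C4294C2-CA84-4F69-B648-8A014FF6E55D"
}
If the settings are applied, this returns exactly one token, the whole GUID in lowercase, rather than the dash-separated parts the standard analyzer would produce.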

Related

How to use a custom analyser on specific elasticsearch documents

Suppose I have a custom analyser that I want to use only on documents whose entity_type is table. How would I go about that?
Document I want to match:
{
  ... other keys
  "_source": {
    "entity_type": "table" // <-- I want to match this and use the custom analyser on this entire document
  }
}
Custom analyser (currently set as the default, but I want it to affect only tables):
elasticsearch.indices.create(
    index="myIndex",
    body={
        "settings": {
            "analysis": {
                "char_filter": {
                    "underscore_to_dash": {
                        "type": "mapping",
                        "mappings": ["_ => -"],
                    }
                },
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                        "char_filter": ["underscore_to_dash"],
                    }
                },
            },
        },
    },
)
An analyzer can be applied only to a specific field, not conditionally per document. So in your case it might make sense to have two fields: one analyzed with the custom analyser for documents with "entity_type": "table", and another for all other docs, as sketched below.
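One way to lay that out (a sketch only; the index and field names my_index, content, and content_table, and the analyzer name table_analyzer, are made up for illustration):
PUT my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "underscore_to_dash": { "type": "mapping", "mappings": ["_ => -"] }
      },
      "analyzer": {
        "table_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"],
          "char_filter": ["underscore_to_dash"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "content_table": { "type": "text", "analyzer": "table_analyzer" }
    }
  }
}
At index time, write the text of documents with "entity_type": "table" into content_table and everything else into content; at search time, query both fields (for example with a multi_match) so either variant can match.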

How to add case insensitive search along with pattern analyser in elasticsearch

In my Elasticsearch I have added the following analyser for the field below, where I store skills as a comma-separated string.
"skills": "Java,Engineer(IT, Non-IT),python"
I want to index each string separated by a comma. A user may search for Java, java, or JAVA; the skills can be in any case, so the search should be case-insensitive. That is, if the search is for "java", it should return records containing Java, java, JaVa, etc.
This is the analyser I am using. What changes do I need to make so the search returns matches irrespective of case?
{
  "analysis": {
    "analyzer": {
      "pattern_analyzers": {
        "tokenizer": "custom_pattern_tokenizer",
        "lowercase": true
      }
    },
    "tokenizer": {
      "custom_pattern_tokenizer": {
        "pattern": ",(?![^(]*\\))",
        "type": "pattern"
      }
    }
  }
}
NOTE: I am using elasticsearch version 2.4
Try adding the lowercase token filter to your analyzer instead of the lowercase flag (a custom analyzer has no lowercase setting):
"analysis": {
"analyzer": {
"pattern_analyzers": {
"tokenizer": "custom_pattern_tokenizer",
"filter": ["lowercase"]
}
},
"tokenizer": {
"custom_pattern_tokenizer": {
"pattern": ",(?![^(]*\))",
"type": "pattern"
}
}
}
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/analysis-lowercase-tokenfilter.html
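To confirm the behaviour (a sketch; myindex is a placeholder, and ES 2.x accepts this request-body form of _analyze), run the sample skills string through the fixed analyzer:
GET myindex/_analyze
{
  "analyzer": "pattern_analyzers",
  "text": "Java,Engineer(IT, Non-IT),python"
}
The negative lookahead keeps the comma inside the parentheses from splitting, and the filter lowercases every token, so the expected tokens are java, engineer(it, non-it), and python.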

title.folded not working in elasticsearch

I am new to Elasticsearch. I have created an index and used the following analyser
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
while creating the index. The problem is that when I query
title.folded = "string to be searched" I am unable to get results for some data that is present in the index, and if I don't use it I get results but then accent folding does not work. What could be the problem?
You only configured that analyzer in the index settings, but you also have to apply it to specific fields in the mapping. See the example in the analyzer docs, or the sketch below.
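A minimal sketch of what that mapping could look like (assuming a pre-5.x index to match the question's era: the type name my_type is made up, and string is the old field type; on current versions use text instead):
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "folded": {
              "type": "string",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}
With this mapping, title.folded exists as a real sub-field analyzed with folding, so queries against it can match accented and unaccented forms alike.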

Elasticsearch keyword and lowercase and aggregation

I have previously stored some fields with the mapping "keyword", but they are case sensitive.
To solve this, it is possible to use an analyzer, such as
{
  "index": {
    "analysis": {
      "analyzer": {
        "keyword_lowercase": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
with the mapping
{
  "properties": {
    "field": {
      "type": "string",
      "analyzer": "keyword_lowercase"
    }
  }
}
But then the terms aggregation does not work:
Caused by: java.lang.IllegalArgumentException: Fielddata is disabled on text fields by default. Set fielddata=true on [a] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
It works with type=keyword, but it seems type=keyword does not allow an analyzer.
How do I index it as a lowercase keyword but still make it possible to use aggregation without setting fielddata=true?
If you're using ES 5.2 or above, you can now leverage normalizers for keyword fields. Simply declare your index settings and mappings like this and you're good to go:
PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "keyword_lowercase": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "field": {
          "type": "keyword",
          "normalizer": "keyword_lowercase"
        }
      }
    }
  }
}
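Because the field stays type keyword, doc values remain enabled and a terms aggregation works without fielddata (a sketch reusing the names above; note the bucket keys come back lowercased, since the normalizer is applied at index time):
GET index/_search
{
  "size": 0,
  "aggs": {
    "distinct_values": {
      "terms": { "field": "field" }
    }
  }
}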

Remove index filter settings on an existing index with documents

I've created an index on my elasticsearch server with the following settings:
PUT /myindex
{
  "settings": {
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
After adding a lot of documents, I've updated my index settings using the following request:
PUT /myindex/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": ["asciifolding"]
        }
      }
    }
  }
}
and removed the lowercase filter, but it seems all the documents in that index are still indexed with lowercase filtering. Do I have to reindex all my documents (sigh), or is there a way to tell Elasticsearch to re-apply the new filter settings to existing documents?
You need to reindex: the underlying Lucene index segments are immutable, so tokens that were already written never change. If you have a recent ES version, the Reindex API will help you: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html (see the sketch below); otherwise you have to use search & scroll or just refetch the data from the original source.
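A minimal sketch of the Reindex API call, assuming you first create a fresh index (named myindex_v2 here for illustration) with the corrected analysis settings:
POST _reindex
{
  "source": { "index": "myindex" },
  "dest": { "index": "myindex_v2" }
}
Once the copy finishes, point an alias (or your application) at myindex_v2; the documents are re-analyzed on the way in, so the new filter chain takes effect.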
