title.folded not working in Elasticsearch

I am new to Elasticsearch. I have created an index and used the following analyzer
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
while creating the index. The problem is that when I search with
title.folded = "string to be searched" I am unable to get results for some data that is present in the index, and if I don't use it I get the results, but then the accent folding does not work. What could be the problem?

You only configured that analyzer in the index settings, but you also have to apply it to specific fields in the mapping. See the example in the analyzer docs.
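A minimal sketch of such a mapping, assuming the field is called title, a hypothetical index name myindex, and a mapping type doc (pre-5.x string syntax, matching the question):

PUT /myindex/_mapping/doc
{
  "properties": {
    "title": {
      "type": "string",
      "fields": {
        "folded": {
          "type": "string",
          "analyzer": "folding"
        }
      }
    }
  }
}

With this multi-field in place, queries on title go through the standard analyzer, while queries on title.folded go through the folding analyzer, so accented and unaccented input match the same documents.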

Related

Search for parts of a string in _id field in an existing elasticSearch index

Hi,
I am working with an existing Elasticsearch index, trying to search for a string in the _id field.
The _id in this index consists of two concatenated strings, and I need to be able to search for the second part of that string.
After reading the documentation I found that I should probably use ngram to search for a substring, but I can't make this work properly.
I found an example online from someone who was trying to do the same, so I updated my index with the following:
PUT /"myIndex"
{"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"partial_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"partial": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"partial_filter"
]
}
}
}
}}
And then tried to add this:
PUT /"index"/_mapping/type2
{
"type2": {
"properties": {
"_id": {
"type": "string",
"analyzer": "partial"
}
}
}
}
That gives me an exception: "Rejecting mapping update to [bci_report_provider_s_dev-"myIndex"] as the final mapping would have more than 1 type: [type2, bci-report]"
How can I resolve this, and is there another way to be able to do a partial search on the _id field?
Thanks a lot in advance!
Bjørn Olav Berg

Remove index filter settings on an existing index with documents

I've created an index on my elasticsearch server with the following settings:
PUT /myindex
{
  "settings": {
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
After adding a lot of documents, I've updated my index settings using the following request:
PUT /myindex/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [ "asciifolding" ]
        }
      }
    }
  }
}
and removed the lowercase filter, but it seems that all documents in that index are still indexed with lowercase filtering. Do I have to reindex all my documents (sigh), or is there a way to tell elasticsearch to re-apply my new filter settings to existing documents?
You need to reindex; the underlying Lucene index segments are immutable. If you have a fresh ES version, this API will help you: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html. Otherwise you have to use search & scroll, or just refetch the data from the original source.
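A minimal sketch of that API, assuming you first recreate the target index (here a made-up name, myindex_v2) with the new analysis settings:

POST /_reindex
{
  "source": { "index": "myindex" },
  "dest": { "index": "myindex_v2" }
}

Once the copy completes, you can delete the old index and point your application (or an alias) at the new one.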

How to implement case sensitive search in elasticsearch?

I have a field in my indexed documents where I need the search to be case sensitive. I am using the match query to fetch the results.
An example of my data document is:
{
  "name": "binoy",
  "age": 26,
  "country": "India"
}
Now when I give the following query:
{
  "query": {
    "match": {
      "name": "Binoy"
    }
  }
}
It gives me a match for "binoy" against "Binoy". I want the search to be case sensitive. It seems that by default elasticsearch is case insensitive. How do I make the search case sensitive in elasticsearch?
In the mapping you can define the field as not_analyzed.
curl -X PUT "http://localhost:9200/sample" -d '{
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'
echo
curl -X PUT "http://localhost:9200/sample/data/_mapping" -d '{
  "data": {
    "properties": {
      "name": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}'
Now if you index and search as usual, the field won't be analyzed, which gives you exact, case sensitive matching.
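For example, with the mapping above, a search only hits the exact casing that was indexed (a sketch, following the same curl style):

curl -X GET "http://localhost:9200/sample/data/_search" -d '{
  "query": {
    "match": { "name": "Binoy" }
  }
}'

Because name is not_analyzed, this matches documents whose name is exactly "Binoy", not ones containing "binoy".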
It depends on the mapping you have defined for your field name. If you haven't defined any mapping, elasticsearch treats it as a string and uses the standard analyzer (which lower-cases the tokens) to generate tokens. Your query uses the same analyzer for search, so matching is done on lower-cased input. That's why "Binoy" matches "binoy".
To solve this, you can define a custom analyzer without the lowercase filter and use it for your field name. You can define the analyzer as below:
"analyzer": {
"casesensitive_text": {
"type": "custom",
"tokenizer": "standard",
"filter": ["stop", "porter_stem" ]
}
}
You can define the mapping for name as below
"name": {
"type": "string",
"analyzer": "casesensitive_text"
}
Now you can do the search on name.
Note: the analyzer above is for example purposes; you may need to change it as per your needs.
Have your mapping like:
PUT /whatever
{
  "settings": {
    "analysis": {
      "analyzer": {
        "mine": {
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "mine"
        }
      }
    }
  }
}
meaning, no lowercase filter for that custom analyzer.
Here is the full index template which worked for my Elasticsearch 5.6:
{
  "template": "logstash-*",
  "settings": {
    "analysis": {
      "analyzer": {
        "case_sensitive": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["stop", "porter_stem"]
        }
      }
    },
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "fluentd": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "case_sensitive": {
              "type": "text",
              "analyzer": "case_sensitive"
            }
          }
        }
      }
    }
  }
}
As you can see, the logs come from Fluentd and are saved into a time-based index logstash-*. To make sure I can still execute wildcard queries on the message field, I put a multi-field mapping on it. Wildcard/analyzed queries can be done on the message field, and case sensitive ones on the message.case_sensitive field.
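A case sensitive search then targets the sub-field with an ordinary match query (a sketch; the search text is made up):

GET /logstash-*/_search
{
  "query": {
    "match": { "message.case_sensitive": "Error" }
  }
}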

Updating analyzer within ElasticSearch settings

I'm using Sense (Chrome plugin) and I've managed to set up an analyzer, and it is working correctly. If I issue a GET (/media/_settings) on the settings, the following is returned:
{
  "media": {
    "settings": {
      "index": {
        "creation_date": "1424971612982",
        "analysis": {
          "analyzer": {
            "folding": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_shards": "5",
        "uuid": "ks98Z6YCQzKj-ng0hU7U4w",
        "version": {
          "created": "1040499"
        },
        "number_of_replicas": "1"
      }
    }
  }
}
I am trying to update it by doing the following:
1. Closing the index
2. Issuing this PUT command (removing a filter):
PUT /media/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}
3. Opening the index
But when the settings come back, the filter is not removed. Can you not update an analyzer once you've created it?
Short answer: No.
Longer answer. From the ES docs:
"Although you can add new types to an index, or add new fields to a
type, you can’t add new analyzers or make changes to existing fields.
If you were to do so, the data that had already been indexed would be
incorrect and your searches would no longer work as expected."
The best way is to create a new index and move your data over. Some clients have helpers to do this for you, but it's not part of the standard Java client.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
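One common shape for that move (a sketch; the index and alias names here are made up, and it assumes your application already searches through an alias rather than the concrete index name): create media_v2 with the corrected analysis settings, reindex into it, then swap the alias atomically:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "media_v1", "alias": "media_current" } },
    { "add": { "index": "media_v2", "alias": "media_current" } }
  ]
}

Because the swap is atomic, searches never see an empty or half-built index.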

Elasticsearch index tokenizer keyword not working

I have an index with fields like:
"Id":{"type":"string","analyzer":"string_lowercase"} // guid for example
In elasticsearch.yml:
index:
  analysis:
    analyzer:
      string_lowercase:
        tokenizer: keyword
        filter: lowercase
But filtering like this
{
  "filter": {
    "term": {
      "Id": "2c4294c2-ca84-4f69-b648-8a014ff6e55d"
    }
  }
}
is not working for the whole GUID value, only for parts of it ("2c4294c2", "ca84", ...).
Interestingly, on another machine it works properly with the same configuration.
You can't add a custom analyzer through elasticsearch.yml. There is a REST API for adding a custom analyzer. For your requirement, below is the required command:
PUT <index name>
{
  "settings": {
    "analysis": {
      "analyzer": {
        "string_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  }
}
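On recent versions you can verify the effect with the _analyze API, which should return the whole GUID as a single lower-cased token (a sketch, with a placeholder index name):

GET /myindex/_analyze
{
  "analyzer": "string_lowercase",
  "text": "2c4294c2-CA84-4f69-b648-8a014ff6e55d"
}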
