Can I use stopwords, case insensitivity and punctuation removal with the Elasticsearch keyword data type? - elasticsearch

As we know, for the keyword data type we have to use a normalizer, but I am getting an error when I try to use a stopword filter in a normalizer. Is there any other way to add stopwords to a normalizer for the keyword data type?
It works for lowercase but not for stopwords.
Can anyone help me out?
I just want a pointer or some sample code...
Or do I have to convert the keyword data type into text?
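For context, a normalizer may only contain character filters and a small set of token filters (lowercase, asciifolding, and a few others); the stop filter requires tokenization and is not allowed, which is why the error appears. A minimal sketch of a normalizer that does work (the index and field names here are made up for illustration):

```json
PUT my-index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      }
    }
  }
}
```

Stopword removal implies splitting the value into tokens, so, as the question suspects, it generally requires a text field with a custom analyzer rather than a keyword field.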

Related

How to value exact match higher than term frequency in elasticsearch?

I have an index that has several title fields.
main_title,
sub_titles,
preferred_titles
etc.
These text fields also each have a suggest sub-field where I run a custom analyzer that uses an edge n-gram tokenizer so we can search as we type.
I would like to value exact match over term frequency. And exact match in main_title is worth more than exact match in preferred_titles.
Anyone know how I can achieve this? Thanks in advance.
I have tried a bool query with a multi_match query in the must clause. The multi_match is cross_fields with no field boosts attached and the operator 'and'.
I have both the text fields and the suggest fields in the should clause. Each text field is in a match query with a boost and the operator 'and'. Each suggest field is in a match_phrase query with a boost. The issue is that several boosts are added on top of the scores and I end up with very inflated scores.
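One possible sketch of the structure described above (field names and boost values are illustrative, not a tuned solution): wrapping each should clause in constant_score makes each boost contribute a fixed amount instead of multiplying an already TF-weighted score, which can help with the score-inflation problem.

```json
GET titles-index/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "the search text",
          "type": "cross_fields",
          "operator": "and",
          "fields": ["main_title", "sub_titles", "preferred_titles"]
        }
      },
      "should": [
        { "constant_score": { "filter": { "match_phrase": { "main_title": "the search text" } }, "boost": 3 } },
        { "constant_score": { "filter": { "match_phrase": { "preferred_titles": "the search text" } }, "boost": 1 } }
      ]
    }
  }
}
```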

How can I use standard SQL on text fields in Elasticsearch without using the special Elasticsearch SQL operators?

I would like to create an SQL query on a text field (not keyword), for example the "name" field, and send that query to the Elasticsearch server.
My problem is that I need to use standard SQL (not the MATCH and QUERY operators, which are special to Elasticsearch SQL) on text fields.
When I tried the JDBC driver, and when I tried the high-level Java client with the LIKE operator, I got the following error:
"No keyword/multi-field defined exact matches for [name]; define one or use MATCH/QUERY instead"
I also tried Elasticsearch's translate API, but even there I couldn't use the LIKE operator on text fields, only on keyword fields.
Does anyone have a solution for me? I want to use the LIKE operator on text fields instead of the full-text operators that are unique to Elasticsearch SQL.
Please check this documentation. It clearly states that this is not possible:
One significant difference between LIKE/RLIKE and the full-text search
predicates is that the former act on exact fields while the latter
also work on analyzed fields. If the field used with LIKE/RLIKE
doesn’t have an exact not-normalized sub-field (of keyword type)
Elasticsearch SQL will not be able to run the query. If the field is
either exact or has an exact sub-field, it will use it as is, or it
will automatically use the exact sub-field even if it wasn’t
explicitly specified in the statement.
If you still want to use a text field then you need to enable a multi-field, as mentioned here. Or you can try enabling fielddata on the text field, but I am not sure whether that will work with SQL or not.
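A minimal sketch of the multi-field mapping the answer refers to (index name assumed; the "name" field is taken from the question). With an exact keyword sub-field in place, Elasticsearch SQL will automatically use it for LIKE, per the documentation quoted above:

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}
```

After that, a plain standard-SQL query such as SELECT name FROM "my-index" WHERE name LIKE 'foo%' should run without the error, since the exact sub-field now exists.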

Practical usage of keyword analyzer

What is the scenario where you would need a mapping with the keyword analyzer, compared to marking the field as not_analyzed with doc values turned on?
From the Elasticsearch documentation, it seems that if a field is not analyzed, it is better to turn doc_values on for it.
The documentation also states specifically: "Note, when using mapping definitions, it might make more sense to simply mark the field as not_analyzed".
I am a bit confused as to why the keyword analyzer would ever be used.
According to a core committer, both are equivalent.
That wouldn't be the case for the keyword tokenizer, though, which can be combined with other filters (lowercase, etc.) and thus participate in many different ways of tokenizing your input.
I had an exact use case for the keyword analyzer: analyzers allow you to add character filters in addition to the common token filters, and that's where they are useful.
I had a field containing numeric values, including whitespace. The numbers could be Persian, e.g. ۱۲۳۴۵۶۷۸۹, or English, so I needed a char filter to normalize them to English digits without any other tokenizing. When I marked the field not_analyzed I was unable to use my char filter.
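A sketch of that setup, assuming a mapping char filter for the Persian digits combined with the keyword tokenizer so that no other tokenizing happens (index and analyzer names are illustrative):

```json
PUT numbers-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "persian_to_english_digits": {
          "type": "mapping",
          "mappings": [
            "۰ => 0", "۱ => 1", "۲ => 2", "۳ => 3", "۴ => 4",
            "۵ => 5", "۶ => 6", "۷ => 7", "۸ => 8", "۹ => 9"
          ]
        }
      },
      "analyzer": {
        "digit_normalizer": {
          "type": "custom",
          "char_filter": ["persian_to_english_digits"],
          "tokenizer": "keyword"
        }
      }
    }
  }
}
```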

Search by ignore value case checking

In my index I have inserted field values without changing their case (upper or lower); for example, in my Elasticsearch document a field name contains the value Hello World. I have made the name field not_analyzed for exact matching. But in that case, when I search for hello world, the document is not returned by Elasticsearch, probably due to case sensitivity. I have tried a term query and a match query but had no luck.
Please suggest if there is a way.
Thanks
The only way you can do this in Elasticsearch is by analyzing the field and using token filters. There is a lowercase token filter available that you should use, but this can't really be done on the fly like in SQL, where you wrap the field to be queried in something like LOWER().
To get the effect you desire I would use something like the keyword tokenizer with the lowercase token filter. If you set this analyzer as the default analyzer for indexing and searching, then your searches will be case insensitive too.
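A sketch of that analyzer, registered under the reserved name default so it applies to both indexing and search (the index name is a placeholder):

```json
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

With this in place, Hello World is indexed as the single token hello world, and a query for hello world is lowercased the same way at search time, so the two match.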

Is it possible to set a custom analyzer to not tokenize in elasticsearch?

I want to treat the field of one of the indexed items as one big string even though it might have whitespace. I know how to do this by setting a non-custom field to be 'not-analyzed', but what tokenizer can you use via a custom analyzer?
The only tokenizers I see on elasticsearch.org are:
Edge NGram
NGram
Keyword
Letter
Lowercase
Standard
Whitespace
Pattern
UAX URL Email
Path Hierarchy
None of these do what I want.
The Keyword tokenizer is what you are looking for.
The Keyword tokenizer doesn't really tokenize at all:
when searching, it emits the entire query string as a single token, making text queries behave like a term query.
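A sketch of a custom analyzer built on the keyword tokenizer, which keeps the whole field value (whitespace included) as one token while still letting you attach filters later (the index, analyzer, and field names are made up for illustration):

```json
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whole_string": {
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "big_string_field": {
        "type": "text",
        "analyzer": "whole_string"
      }
    }
  }
}
```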
The issue I run into is that I want to add filters and then search for indexed keywords in a long text (keyword assignment). I would say there's no tokenizer that can do this, and the normalizer can't accept the necessary filters. The workaround for me is to prepare the text before feeding it to Elasticsearch.
