Is there a way to make Elasticsearch case-insensitive without altering the existing documents?

Does Elasticsearch allow us to query documents case-insensitively? Or should I normalize their case before indexing? Or is there some setting I should apply to the whole index to make it case-insensitive?
Can you clarify this, please?

By default, string fields are case-insensitive to search, because dynamic mapping indexes them as analyzed text (lowercased at index time) with a keyword subfield.
Try the following:
PUT myindex/doc/1
{
  "name": "TEST"
}
GET myindex/_mapping
It should return:
{
  "myindex": {
    "mappings": {
      "doc": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
Now if you query with the below, it will return a match, because the match query runs against the analyzed text field (notice the mapping: text plus a keyword subfield):
POST myindex/_search
{
  "query": {
    "match": {
      "name": "test"
    }
  }
}
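Conversely, an exact term query on the auto-generated keyword subfield is case-sensitive (a small sketch using the name.keyword subfield from the dynamic mapping above):
POST myindex/_search
{
  "query": {
    "term": {
      "name.keyword": "test"
    }
  }
}
This returns nothing, while querying for "TEST" (the value exactly as indexed) returns the document.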
Now, if you explicitly map the field as keyword only, searches on it will be case-sensitive. Try the below and see; it will return no results.
PUT myindex/_mapping/doc
{
  "properties": {
    "name2": {
      "type": "keyword"
    }
  }
}
PUT myindex/doc/1
{
  "name2": "TEST"
}
POST myindex/_search
{
  "query": {
    "match": {
      "name2": "test"
    }
  }
}
TL;DR: use the default dynamic mapping or the text type; if you map the field as keyword only, searches on it will be case-sensitive.
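As a side note, if you are on Elasticsearch 7.10 or later, term-level queries accept a case_insensitive flag, which gives case-insensitive exact matching on a keyword field without reindexing anything (a minimal sketch, reusing myindex and the keyword-mapped name2 field from above):
POST myindex/_search
{
  "query": {
    "term": {
      "name2": {
        "value": "test",
        "case_insensitive": true
      }
    }
  }
}
This finds the "TEST" document even though name2 is a plain keyword field.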

Update "keyword" to "text" field type of an index for inexact words matching in elasticsearch

My index currently returns the following _mapping output:
{
  "myindex": {
    "mappings": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
I tried to update it with the below PUT request on the index, but I still get the above output from _mapping:
{
  "_doc": {
    "properties": {
      "city": { "type": "text" }
    }
  }
}
I am not able to query with inexact words because the field's type is "keyword". For the query below, the actual value in the record is "Mumbai":
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "city": {
            "query": "Mumbi",
            "minimum_should_match": "10%"
          }
        }
      }
    }
  }
}
The mapping below (the one shared in the question) stores 'city' as text and 'city.keyword' as a keyword:
{
  "myindex": {
    "mappings": {
      "properties": {
        "city": {
          "type": "text", // ==========> stores city as text
          "fields": {
            "keyword": {
              "type": "keyword", // =========> stores city.keyword as a keyword
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
Yours is a use case for fuzzy search, not minimum_should_match.
Elastic blog post on fuzzy search: https://www.elastic.co/blog/found-fuzzy-search
Try the query below. With "fuzziness": "AUTO", query terms of 3-5 characters may differ by one edit (longer terms by two), so "mubai", which is one insertion away from "mumbai", still matches:
{
  "query": {
    "match": {
      "city": {
        "query": "mubai",
        "fuzziness": "AUTO"
      }
    }
  }
}
minimum_should_match
Minimum number of clauses that must match for a document to be returned.
It signifies a percentage of clauses, not a percentage of the string's characters. Go through this documentation to frame the query so it returns the expected results; a query built on a wrong assumption will return misleading results.
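To illustrate (a minimal sketch; the index and a matching document are assumed): the match query below is analyzed into three term clauses, and "minimum_should_match": "67%" requires at least two of those three clauses to match (67% of 3, rounded down to 2). The constraint is on clauses, never on characters:
POST myindex/_search
{
  "query": {
    "match": {
      "city": {
        "query": "navi mumbai city",
        "minimum_should_match": "67%"
      }
    }
  }
}
A document whose city field contains only "mumbai city" would still match. "Mumbi" in the earlier query is a single misspelled term, i.e. one clause, so no minimum_should_match setting can make it match "Mumbai".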

Difference between keyword and text in Elasticsearch

Can someone explain the difference between keyword and text in Elasticsearch, with an example?
keyword type:
If you define a field to be of type keyword, like this:
PUT products
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}
then when you run a search query on this field, you have to supply the whole value (a keyword search), hence keyword field.
POST products/_doc
{
  "name": "washing machine"
}
When you execute a search like this:
GET products/_search
{
  "query": {
    "match": {
      "name": "washing"
    }
  }
}
it will not match any docs. You have to search with the whole value, "washing machine", as shown below.
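For instance, this query does match, because a match query analyzes its input with the field's analyzer, and for a keyword field the string is left intact (a sketch against the products index above):
GET products/_search
{
  "query": {
    "match": {
      "name": "washing machine"
    }
  }
}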
The text type, on the other hand, is analyzed, so you can search using individual tokens from the field value as well as the whole value:
PUT products
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text"
        }
      }
    }
  }
}
and the search:
GET products/_search
{
  "query": {
    "match": {
      "name": "washing"
    }
  }
}
will return matching documents.
You can check this for more details: keyword vs. text.
The primary difference between the text datatype and the keyword datatype is that text fields are analyzed at the time of indexing, and keyword fields are not.
What that means is, text fields are broken down into their individual terms at indexing to allow for partial matching, while keyword fields are indexed as is.
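You can observe this with the _analyze API (a small sketch; standard is the default analyzer for text fields, while the keyword analyzer mirrors how a keyword field indexes a value):
GET _analyze
{
  "analyzer": "standard",
  "text": "washing machine"
}
This returns the two tokens washing and machine. The same request with "analyzer": "keyword" returns the single token washing machine.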
Keyword Mapping
"channel": {
  "type": "keyword"
},
"product_image": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
Along with the other advantages of the keyword type in Elasticsearch, one more is that you can store any data type inside of it, be it string, numeric, date, etc.:
PUT /demo-index/
{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" }
    }
  }
}
POST /demo-index/_doc
{
  "name": "2021-02-21"
}
POST /demo-index/_doc
{
  "name": 100
}
POST /demo-index/_doc
{
  "name": "Jhon"
}
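Each of these values is indexed as an exact string, so it can be fetched back with a term query on its string form (a sketch against the demo-index above; the numeric 100 is indexed as the string "100"):
GET /demo-index/_search
{
  "query": {
    "term": {
      "name": "100"
    }
  }
}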

How to search an Elasticsearch index by partial text of a field in the indexed document?

I have an Elasticsearch index where I keep certain data. Each document in the index has a field named file_name inside a nested document, so a doc looks like:
{
  ...
  "file_data": {
    "file_name": "sample_filename_acp24_20180223_1222.json"
  }
  ...
}
I want my search to return the above document if I search for sample, filename, acp24, 20180223, and the like.
So far I have tried the following analyzer and full-text search queries, but it still doesn't return the above doc when I search for acp24 or 20180223.
Index Mapping
{
  "index_name": {
    "mappings": {
      "type": {
        "properties": {
          "file_data": {
            "type": "nested",
            "properties": {
              "file_name": {
                "type": "text",
                "analyzer": "keyword_analyzer",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Analyzer
{
  "analysis": {
    "analyzer": {
      "keyword_analyzer": {
        "type": "pattern",
        "pattern": "\\W|_",
        "lowercase": true
      }
    }
  }
}
Search Query
{
  "query": {
    "match_phrase_prefix": {
      "_all": {
        "query": "20180223",
        "analyzer": "keyword_analyzer"
      }
    }
  }
}
Any help on how to achieve this is very much appreciated. I have spent so many hours with this and still couldn't find a solution.
If I understand right, you could use the wildcard query:
POST /my_index/_search
{
  "query": {
    "wildcard": {
      "file_data.file_name": {
        "wildcard": "sample_*filename_acp24*",
        "boost": 2.0
      }
    }
  }
}
(tested with Elasticsearch 6.1; the syntax might need changes for other versions)
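One caveat, going by the mapping in the question: file_data is a nested field, and queries on fields inside a nested object normally need to be wrapped in a nested query. Since the keyword_analyzer above splits the file name on non-word characters and underscores, a plain match on any fragment should then work (a sketch reusing index_name from the question):
POST index_name/_search
{
  "query": {
    "nested": {
      "path": "file_data",
      "query": {
        "match": {
          "file_data.file_name": "acp24"
        }
      }
    }
  }
}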

Elasticsearch: using mapping to index one field different ways

I have an index with the following mapping:
{
  "hosts": {
    "mappings": {
      "host": {
        "properties": {
          "dn": {
            "type": "keyword",
            "fields": {
              "fqdn": {
                "type": "text"
              }
            }
          },
          "hostname": {
            "type": "text"
          },
          .....
        }
      }
    }
  }
}
where my intention is to be able to get exact matches on 'dn' and full-text search on 'fqdn'. What happens in practice is that searches on 'dn' work fine, but searches on 'fqdn' always return no documents.
E.g.
{"query": {"term": {"dn": "ps346256.uoa.auckland.ac.nz" } } }
returns one document but
{"query": {"match": {"fqdn": "ps346256" } } }
returned none.
What am I missing?
fqdn should be referenced as dn.fqdn as shown below:
{"query": {"match": {"dn.fqdn": "ps346256" } } }
Find the reference for multi-fields here.
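For example, both fields can be combined in a single bool query (a sketch building on the fix above; dn is matched exactly, while dn.fqdn is searched as analyzed text):
POST hosts/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "dn": "ps346256.uoa.auckland.ac.nz" } },
        { "match": { "dn.fqdn": "ps346256" } }
      ]
    }
  }
}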
Hope this helps!

How to force a terms filter to ignore stopwords?

I have an Elasticsearch index with a bunch of fields, some of which I want to use along with the default stopword list. On the other hand, I have a username field which should return results for users called the, be, etc.
Of course, when I run the following query:
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "username": [
            "be"
          ]
        }
      }
    }
  }
}
nothing is returned. I have seen various solutions for changing the standard analyzer's stopword handling, but am struggling to find how I would do it for this one field only. Thanks for any pointers.
You can do it like the following: add a custom analyzer that doesn't use stopwords, and then explicitly set this analyzer on just those fields where you want stopwords to be searchable (like your username field).
PUT /stopwords
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english": {
          "type": "english",
          "stopwords": "_none_"
        }
      }
    }
  },
  "mappings": {
    "text": {
      "properties": {
        "title": {
          "type": "string"
        },
        "content": {
          "type": "string"
        },
        "username": {
          "type": "string",
          "analyzer": "my_english"
        }
      }
    }
  }
}
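This answer uses the mapping syntax of an older Elasticsearch release; on recent versions the same idea holds, with string replaced by text and the mapping type level removed (a minimal sketch under that assumption):
PUT /stopwords
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english": {
          "type": "english",
          "stopwords": "_none_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content": { "type": "text" },
      "username": {
        "type": "text",
        "analyzer": "my_english"
      }
    }
  }
}
Because username is analyzed without stopword removal, the token be stays in the index, and the terms filter from the question can find it.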
