elk's elastic search dsl case sensitive - elasticsearch

I'm doing an Elasticsearch Query DSL query on ELK such as:
{
"query": {
"wildcard": {
"url.path": {
"value": "*download*",
"boost": 1,
"rewrite": "constant_score"
}
}
}
}
but it seems is case sensitive (so show only info with "download", not "Download" or "DOWNLOAD").
i.e. is case sensitive.
can I disable this? and search case insensitive?
Version used: 7.9.1

The below query will help you perform case-insensitive search as it will fetch results for *download, *Download and *DOWNLOAD. You may replace with your index and with the field you would like to perform this search.
Search Query
GET /<my-index>/_search
{
"query" : {
"bool" : {
"must" : {
"query_string" : {
"query" : "*download",
"fields": ["<field1>"]
}
}
}
}
}
If you wish to perform the same search on multiple fields, you can add the same in list.
Search on multiple fields
GET /<my-index>/_search
{
"query" : {
"bool" : {
"must" : {
"query_string" : {
"query" : "*download",
"fields": ["<field1>","<field2>","field3>"]
}
}
}
}
}

There is a case_insensitive parameter available for wildcard query, but it was introduced in Elasticsearch 7.10.0, so you need to upgrade if you are still on 7.9.1.
If you can upgrade to 7.10.0 or higher:
Ideally, in index mapping field should use wildcard type:
{
"mappings": {
"properties": {
"url.path": {
"type": "wildcard"
}
}
}
}
Then a wildcard query with case insensitivity enabled will find all the variants ("download", "DOWNLOAD", "download", etc)
{
"query": {
"wildcard": {
"url.path": {
"value": "*download*",
"boost": 1,
"rewrite": "constant_score",
"case_insensitive": true
}
}
}
}
If you must remain at 7.9.1:
Define your mapping in such a way that Elasticsearch treats the field contents as lowercase. The following will mimic wildcard type (it's a keyword, so only one token) indexed as lowercase.
{
"mappings": {
"properties": {
"url": {
"type": "text",
"analyzer": "lowercase-keyword"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"lowercase-keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
}
The query, without the case_insensitive parameter which is unsupported in this version:
{
"query": {
"wildcard": {
"url": {
"value": "*download*",
"boost": 1,
"rewrite": "constant_score"
}
}
}
}
Example results (note that searching for "*download*" and "*DoWnLoAd*" with both work in the same way):
{
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1.0,
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "PtbQe3wByTvslqtrs7Cn",
"_score": 1.0,
"_source": {
"url": "http://example.com/download"
}
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "P9bQe3wByTvslqtrvbDt",
"_score": 1.0,
"_source": {
"url": "http://example.com/Download"
}
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "QNbQe3wByTvslqtrzbDw",
"_score": 1.0,
"_source": {
"url": "http://example.com/DOWNLOAD"
}
}
]
}
}

You can use case_insensitive parameter for wildcard query. This parameter was introduced in 7.10.0 version
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"url": {
"properties": {
"path": {
"type": "wildcard"
}
}
}
}
}
}
Index Data:
{
"url":{
"path":"xx/download"
}
}
Search Query:
{
"query": {
"wildcard": {
"url.path": {
"value": "*Download*",
"boost": 1,
"rewrite": "constant_score",
"case_insensitive": false
}
}
}
}
Search Result:
No results will be there when you are searching for *Download* or *DOWNLOAD*
Update:
You can use the wildcard query with "case_insensitive": true parameter
Adding a sample index data, search query, and search result
Index Data:
{
"url": {
"path": "download"
}
}
{
"url": {
"path": "DOWNLOAD"
}
}
{
"url": {
"path": "Download"
}
}
Search Query:
{
"query": {
"wildcard": {
"url.path": {
"value": "*DOWNLOAD*",
"boost": 1,
"rewrite": "constant_score",
"case_insensitive": true
}
}
}
}
Search Result:
"hits": [
{
"_index": "67210888",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"url": {
"path": "download"
}
}
},
{
"_index": "67210888",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"url": {
"path": "Download"
}
}
},
{
"_index": "67210888",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"url": {
"path": "DOWNLOAD"
}
}
}
]

Related

has_child and has_parent not returning results

I went through the following links before pasting the ques
Elasticsearch has_child returning no results
ElasticSearch 7.3 has_parent/has_child don't return any hits
ES documentation
I created a simple mapping with text_doc as the parent and flag_doc as the child.
{
"doc_index_ap3" : {
"mappings" : {
"properties" : {
"domain" : {
"type" : "keyword"
},
"email_text" : {
"type" : "text"
},
"id" : {
"type" : "keyword"
},
"my_join_field" : {
"type" : "join",
"eager_global_ordinals" : true,
"relations" : {
"text_doc" : "flag_doc"
}
}
}
}
}
}
The query with parent_id works fine & returns 1 doc as expected
GET doc_index_ap3/_search
{
"query": {
"parent_id": {
"type": "flag_doc",
"id":"f0d2cb3c-bf4b-11eb-9f67-93a282921115"
}
}
}
But none of the below queries return any results.
GET doc_index_ap3/_search
{
"query": {
"has_parent": {
"parent_type": "text_doc",
"query": {
"match_all": {
}
}
}
}
}
GET doc_index_ap3/_search
{
"query": {
"has_child": {
"type": "flag_doc",
"query": {
"match_all": {}
}
}
}
}
There must be some issue in the way you have indexed the parent and child documents. Refer to this official documentation, to know more about parent-child relationship
Adding a working example using the same index mapping as given in the question above
Parent document in the text_doc context
PUT /index-name/_doc/1
{
"domain": "ab",
"email_text": "ab",
"id": "ab",
"my_join_field": {
"name": "text_doc"
}
}
Child document
PUT /index-name/_doc/2?routing=1&refresh
{
"domain": "cs",
"email_text": "cs",
"id": "cs",
"my_join_field": {
"name": "flag_doc",
"parent": "1"
}
}
Search Query:
{
"query": {
"has_parent": {
"parent_type": "text_doc",
"query": {
"match_all": {
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "67731507",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_routing": "1",
"_source": {
"domain": "cs",
"email_text": "cs",
"id": "cs",
"my_join_field": {
"name": "flag_doc",
"parent": "1"
}
}
}
]
Search Query:
{
"query": {
"has_child": {
"type": "flag_doc",
"query": {
"match_all": {}
}
}
}
}
Search Result:
"hits": [
{
"_index": "67731507",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"domain": "ab",
"email_text": "ab",
"id": "ab",
"my_join_field": {
"name": "text_doc"
}
}
}
]

Elasticsearch query to return part of words searched for

I would like to know how I can return "thanks" or "thanking" if I search for "thank"
Currently I have a multi-match query which returns only occurrences of "thank" like "thank you" but not "thanksgiving" or "thanks". I am using ElasticSearch 7.9.1
query: {
bool: {
must: [
{match: {accountId}},
{
multi_match: {
query: "thank",
type: "most_fields",
fields: ["text", "address", "description", "notes", "name"],
}
}
],
filter: {match: {type: "personaldetails"}}
}
},
Also is it possible to combine the multimatch query with a queryString on one of the fields (say description, where I would do a querystring search only on description and a phrase match on other fields)
{ "query": {
"query_string": {
"query": "(new york city) OR (big apple)",
"default_field": "content"
}
}
}
Any input is appreciated.
thanks
You can use edge_ngrma tokenizer that first breaks text down into
words whenever it encounters one of a list of specified characters,
then it emits N-grams of each word where the start of the N-gram is
anchored to the beginning of the word.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 5,
"max_gram": 20,
"token_chars": [
"letter",
"digit"
]
}
}
},
"max_ngram_diff": 50
},
"mappings": {
"properties": {
"notes": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "standard" // note this
}
}
}
}
Index Data:
{
"notes":"thank"
}
{
"notes":"thank you"
}
{
"notes":"thanks"
}
{
"notes":"thanksgiving"
}
Search Query:
{
"query": {
"multi_match" : {
"query": "thank",
"fields": [ "notes", "name" ]
}
}
}
Search Result:
"hits": [
{
"_index": "65511630",
"_type": "_doc",
"_id": "1",
"_score": 0.1448707,
"_source": {
"notes": "thank"
}
},
{
"_index": "65511630",
"_type": "_doc",
"_id": "3",
"_score": 0.1448707,
"_source": {
"notes": "thank you"
}
},
{
"_index": "65511630",
"_type": "_doc",
"_id": "2",
"_score": 0.12199639,
"_source": {
"notes": "thanks"
}
},
{
"_index": "65511630",
"_type": "_doc",
"_id": "4",
"_score": 0.06264679,
"_source": {
"notes": "thanksgiving"
}
}
]
To combine multi-match query with query string, use the below query:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "thank",
"fields": [
"notes",
"name"
]
}
},
"should": {
"query_string": {
"query": "(new york city) OR (big apple)",
"default_field": "content"
}
}
}
}
}

Elasticsearch exact match query (not fuzzy)

I have the following query over a string field:
const query = {
"query": {
"match_phrase": {
name: "ron"
}
},
"from": 0,
"size": 10
};
these are the names I have in the database
1. "ron"
2. "ron martin"
3. "ron ron"
4. "ron howard"
the result of this query is very fuzzy, all the of the rows are returned instead of row number 1 only.
It's like it is performing "contains" instead of "equals".
Thanks
In your case, all the documents are returning, because all the documents have ron in them.
If you want that only the exact field should match, then you need to add keyword subfield to the name field. This uses the keyword analyzer instead of the standard analyzer (notice the ".keyword" after name field). Try out this below query -
Index Mapping:
{
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
Index Data:
{
"name":"ron"
}
{
"name":"ron martin"
}
{
"name":"ron ron"
}
{
"name":"ron howard"
}
{
"name": "john howard"
}
Search Query:
{
"query": {
"match_phrase": {
"name.keyword": "ron"
}
},
"from": 0,
"size": 10
}
Search Result:
"hits": [
{
"_index": "64982377",
"_type": "_doc",
"_id": "1",
"_score": 1.2039728,
"_source": {
"name": "ron"
}
}
]
Update 1:
Based on the comments below, if you want to search for both exact match and fuzzy match (according to your requirement), then you can use multi_match query.
Search Query:
{
"query": {
"multi_match": {
"query": "howard",
"fields": [
"name",
"name.keyword"
],
"type": "phrase"
}
}
}
Search Result:
"hits": [
{
"_index": "64982377",
"_type": "_doc",
"_id": "4",
"_score": 0.83740485,
"_source": {
"name": "ron howard"
}
},
{
"_index": "64982377",
"_type": "_doc",
"_id": "5",
"_score": 0.83740485,
"_source": {
"name": "john howard"
}
}
]

Elastic search query for name / value pair columns pull

We have one document in elastic search with multiple sections of name/value pair and we want to fetch value's only based on name column value.
"envelopeData": {
"envelopeName": "Bills",
"details": {
"detail": [
{
"name": "UC_CORP",
"value": "76483"
},
{
"name": "UC_CYCLE",
"value": "V"
}
We are expecting only 76483 as result based on name equals to UC_CORP
If the field envelopeData.details.detail is nested type then you can perform a match query for the desired name on the nested path and can use inner_hits to get just the value.
Map the field envelopeData.details.detail as nested(if not nested):
PUT stackoverflow
{
"mappings": {
"_doc": {
"properties": {
"envelopeData.details.detail": {
"type": "nested"
}
}
}
}
}
then you can perform the following query to get value using inner_hits:
GET stackoverflow/_search
{
"_source": "false",
"query": {
"nested": {
"path": "envelopeData.details.detail",
"query": {
"match": {
"envelopeData.details.detail.name.keyword": "UC_CORP"
}
},
"inner_hits": {
"_source": "envelopeData.details.detail.value"
}
}
}
}
which outputs:
{
"_index": "stackoverflow",
"_type": "_doc",
"_id": "W5GUW2gB3GnGVyg-Sf4T",
"_score": 0.6931472,
"_source": {},
"inner_hits": {
"envelopeData.details.detail": {
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "stackoverflow",
"_type": "_doc",
"_id": "W5GUW2gB3GnGVyg-Sf4T",
"_nested": {
"field": "envelopeData.details.detail",
"offset": 0
},
"_score": 0.6931472,
"_source": {
"value": "76483" -> Outputs value only
}
}
]
}
}
}
}

boosting along with prefix query with all

I want to be able to prefix query on EACH of search terms found in any field, and I would like to be able to have highlighting. I formulated a query which seems to work. Now, I want to update query so that matches in one of the fields yields a higher score than matches in the other fields.
For example I index the following data (this is just a sample, in my real data there are many more fields than just the two):
PUT /my_index/my_type/abc124
{
"title" : "blah",
"description" : "golf"
}
PUT /my_index/my_type/abc123
{
"title" : "blah golf",
"description" : "course"
}
PUT /my_index/my_type/abc125
{
"title" : "blah golf tee",
"description" : "course"
}
Then I can query as mentioned with a query like:
POST my_index/my_type/_search
{
"query": {
"bool": {
"must": [
{
"prefix": {
"_all" : "gol"
}
},
{
"prefix": {
"_all": "bla"
}
}
]
}
},
"highlight":{
"fields":{
"*":{}
}
}
}
Which produces the result:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 4,
"successful": 4,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.4142135,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "abc125",
"_score": 1.4142135,
"_source": {
"title": "blah golf tee",
"description": "course"
},
"highlight": {
"title": [
"<em>blah</em> <em>golf</em> tee"
]
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "abc124",
"_score": 1.4142135,
"_source": {
"title": "blah",
"description": "golf"
},
"highlight": {
"description": [
"<em>golf</em>"
],
"title": [
"<em>blah</em>"
]
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "abc123",
"_score": 1.4142135,
"_source": {
"title": "blah golf",
"description": "course"
},
"highlight": {
"title": [
"<em>blah</em> <em>golf</em>"
]
}
}
]
}
}
How can I modify the scoring using function_score or other means so that I can score matches on title field higher than other fields? Do I need to change the query to multi-match instead of using _all? Any suggestions would be appreciated.
Regards,
LT
Try adding to your bool query a should section which would give a higher score to the whole query if any of the statements in the should match (and it's not mandatory for those to match for the query to return results).
For example, try this:
POST my_index/my_type/_search
{
"query": {
"bool": {
"must": [
{
"prefix": {
"_all": "gol"
}
},
{
"prefix": {
"_all": "bla"
}
}
],
"should": [
{
"prefix": {
"title": {
"value": "gol",
"boost": 3
}
}
},
{
"prefix": {
"title": {
"value": "bla",
"boost": 3
}
}
}
]
}
},
"highlight": {
"fields": {
"*": {}
}
}
}

Resources