Elasticsearch wildcard case-sensitive - elasticsearch

How to make wildcard case-insensitive?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html

Since version 7.10, the wildcard query supports a boolean case_insensitive parameter.
Example of a case-insensitive search:
GET /_search
{
"query": {
"wildcard": {
"my_field": {
"value": "ki*y",
"case_insensitive": true
}
}
}
}
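For client code, the same request body can be assembled programmatically. The following is a minimal sketch in Python; the helper name wildcard_query is hypothetical (it is not part of any Elasticsearch client), and the resulting dictionary would still have to be sent to a 7.10+ cluster with whatever client you use.

```python
import json


def wildcard_query(field, pattern, case_insensitive=True):
    """Build the body of a wildcard search request (hypothetical helper).

    The case_insensitive flag is only understood by Elasticsearch 7.10+.
    """
    return {
        "query": {
            "wildcard": {
                field: {
                    "value": pattern,
                    "case_insensitive": case_insensitive,
                }
            }
        }
    }


body = wildcard_query("my_field", "ki*y")
print(json.dumps(body, indent=2))
```

On older clusters the flag is not available, so the pattern has to be adapted to the way the field was indexed instead.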

Wildcard queries are not analyzed, so the outcome depends on the analyzer configured for the field you are searching. With the default (standard) analyzer, terms are lowercased at index time, so a lowercase wildcard pattern matches values regardless of their original case.
Example: index two names in a sample index, "Sid" and "sid".
POST sample/sample
{
"name" : "sid"
}
POST sample/sample
{
"name" : "Sid"
}
Then perform a wildcard query:
GET sample/_search
{
"query": {
"wildcard": {
"name": {
"value": "s*"
}
}
}
}
This returns both documents:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "sample",
"_type": "sample",
"_id": "AWRPM87Wb6oopELrnEKE",
"_score": 1,
"_source": {
"name": "Sid"
}
},
{
"_index": "sample",
"_type": "sample",
"_id": "AWRPM9tpb6oopELrnEKF",
"_score": 1,
"_source": {
"name": "sid"
}
}
]
}
}
But a wildcard query on "S*" returns nothing, because the default analyzer's lowercase token filter stores terms in lowercase: "Sid" is stored as "sid" in the inverted index, while the wildcard pattern itself is not analyzed.
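The behaviour can be reproduced without a cluster. The sketch below is a simplification, not Elasticsearch's implementation: index_terms stands in for index-time analysis (lowercasing for an analyzed text field, no change for a keyword field), and Python's fnmatchcase stands in for the case-sensitive wildcard match against stored terms. Both function names are made up for this illustration.

```python
from fnmatch import fnmatchcase


def index_terms(values, lowercase):
    """Return the terms that would end up in the inverted index."""
    return [v.lower() if lowercase else v for v in values]


def wildcard(pattern, terms):
    """Match the pattern against stored terms as-is (case-sensitive, not analyzed)."""
    return [t for t in terms if fnmatchcase(t, pattern)]


names = ["Sid", "sid"]
text_terms = index_terms(names, lowercase=True)      # analyzed "text" field
keyword_terms = index_terms(names, lowercase=False)  # "keyword" field

print(wildcard("s*", text_terms))     # both documents: each is stored as "sid"
print(wildcard("S*", text_terms))     # nothing: no stored term starts with "S"
print(wildcard("s*", keyword_terms))  # only the value that was lowercase to begin with
```

The last line also shows why the same query looks case-sensitive on a keyword field: the stored terms keep their original case.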

In my case this is not true: it is case-sensitive by default (I am using ES 7.2).
In your sample the type of the field is "text", not "keyword".

I was looking for the same option for the Node.js client and came across this question, so I am posting this as an answer in case it helps someone else.
I had to convert the term to lowercase, and this worked for me: *${term.toLowerCase()}*
Here is the complete function:
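async searchUsers(term, from, limit) {
  // Wildcard patterns are not analyzed, so lowercase the term to match lowercased index terms.
  const pattern = `*${term.toLowerCase()}*`;
  const users = await EsClient.search({
    index: 'users',
    type: 'users',
    body: {
      from,
      size: limit,
      query: {
        bool: {
          should: [
            { wildcard: { email: { value: pattern } } },
            // name.keyword only matches if its values are stored in lowercase.
            { wildcard: { 'name.keyword': { value: pattern } } }
          ],
          must_not: {
            // blacklist: array of user IDs to exclude, defined elsewhere.
            terms: { _id: blacklist }
          }
        }
      }
    }
  });
  return users;
}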

Related

Elasticsearch query showing weird behavior : bug?

To sum up things quickly, we are using Elasticsearch 6.8.4 and have documents with fields such as "statutPublicOuInterne" (public or internal state) or "identifiant" (identifier).
I cannot share the whole JSON (_source) for security reasons (corporate restrictions), but it looks like the following:
"_source": {
"dateCreation": "2020-11-05T16:31:28.404+01:00",
"dateDerModif": "2020-11-05T16:31:49.183+01:00",
"contenu": { ... }
"langue": "fr",
"observations": null,
"statutPublicOuInterne": "enAttenteTraitementCommissionTask",
"identifiant": "SFB-20201105-ELUH",
(...)
}
Some of the "statutPublicOuInterne" can have values such as "enAttenteTraitementCommissionTask" or "enCoursTraitementCommissionTask".
1st question: for some reason, when I search for statutPublicOuInterne=enCoursTraitementCommissionTask, it doesn't work, but if I search for statutPublicOuInterne=enCoursTraitementCommission (without "Task"), it works! That seems so weird to me and I really can't explain it.
2nd question: if I assume I need to search without the "Task" at the end, then searching for statutPublicOuInterne=enCoursTraitementCommission works but statutPublicOuInterne=enAttenteTraitementCommission doesn't work! (nor does statutPublicOuInterne=enAttenteTraitementCommissionTask work)
The query is as follows:
{
"query": {
"bool" : {
"must" : [
{
"match" : {
"statutPublicOuInterne" : {
"query" : "enAttenteTraitementCommission"
}
}
}
]
}
}
}
I just can't understand why it doesn't find anything, because if I search for this document with its "identifiant" field, then it works:
{
"query": {
"bool" : {
"must" : [
{
"match" : {
"identifiant" : {
"query" : "SFB-20201105-ELUH"
}
}
}
]
}
}
}
The response is:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 2.0283146,
"hits": [
{
"_index": "some-index",
"_type": "demandes",
"_id": "SFB-20201105-ELUH",
"_score": 2.0283146,
"_source": {
"dateCreation": "2020-11-05T16:31:28.404+01:00",
"dateDerModif": "2020-11-05T16:31:49.183+01:00",
"contenu": { ... }
"langue": "fr",
"observations": null,
"statutPublicOuInterne": "enAttenteTraitementCommissionTask",
"identifiant": "SFB-20201105-ELUH",
(...)
}
}
]
}
}
We can clearly see "statutPublicOuInterne": "enAttenteTraitementCommissionTask" in the response.
Am I missing something?
Many thanks in advance for your help!
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"statutPublicOuInterne": {
"type": "text"
}
}
}
}
Index Data:
{
"dateCreation": "2020-11-05T16:31:28.404+01:00",
"dateDerModif": "2020-11-05T16:31:49.183+01:00",
"langue": "fr",
"observations": null,
"statutPublicOuInterne": "enAttenteTraitementCommissionTask",
"identifiant": "SFB-20201105-ELUH"
}
Search Query:
{
"query": {
"bool": {
"must": [
{
"match": {
"statutPublicOuInterne": {
"query": "enAttenteTraitementCommissionTask"
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "64700803",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"dateCreation": "2020-11-05T16:31:28.404+01:00",
"dateDerModif": "2020-11-05T16:31:49.183+01:00",
"langue": "fr",
"observations": null,
"statutPublicOuInterne": "enAttenteTraitementCommissionTask",
"identifiant": "SFB-20201105-ELUH"
}
}
]

how to make proper query to select by ID and later update using elastic search?

I am very new to ES and I am trying to figure out some things.
I ran a basic query like this:
GET _search
{
"query": {
"match_all": {}
}
}
and I got this...
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 768,
"successful": 768,
"failed": 0
},
"hits": {
"total": 456,
"max_score": 1,
"hits": [
{
"_index": "sometype_1",
"_type": "sometype",
"_id": "12312321312312",
"_score": 1,
"_source": {
"readModel": {
"id": "asdfqwerzcxv",
"status": "active",
"hidden": false
},
"model": {
"id": "asdfqwerzcxv",
"content": {
"objectId": "421421312312",
"message": "hello world",
..... //the rest of the object...
So right now I want to get the object with id asdfqwerzcxv and I did this:
GET _search
{
"query": {
"match" : {
"id" :"asdfqwerzcxv"
}
}
}
But of course it is not working... I also tried spelling out the whole path, like this:
GET _search
{
"query": {
"match" : {
"_source" :{
"readModel" : {
"id": "asdfqwerzcxv"
}
}
}
}
}
But no luck...
Is there a way to do this? Could someone help me?
Thanks
You need to use the fully qualified field name (readModel.id rather than id). Try this:
GET _search
{
"query": {
"match" : {
"readModel.id": "asdfqwerzcxv"
}
}
}
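To see which names are queryable for a nested document, it can help to flatten the _source into dotted paths. This is a small illustrative sketch in Python; flatten is a hypothetical helper, and the sample document only contains the fields visible in the question.

```python
def flatten(source, prefix=""):
    """Map a nested _source object to {"dotted.field.name": value}."""
    fields = {}
    for key, value in source.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, path))
        else:
            fields[path] = value
    return fields


source = {
    "readModel": {"id": "asdfqwerzcxv", "status": "active", "hidden": False},
    "model": {
        "id": "asdfqwerzcxv",
        "content": {"objectId": "421421312312", "message": "hello world"},
    },
}

fields = flatten(source)
print(sorted(fields))
```

There is no top-level field called id in the flattened view, only readModel.id and model.id, which is why the first query in the question finds nothing.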

Elastic Query Filters Challenge

I have the following query, generating a Top 100 sellers for a given supplier ID, running against a sales index that looks up the product skus for the given supplier in an index of product_skus. This works well.
query = {
size: 0,
query: {
bool: {
filter: [
{
constant_score: {
filter: {
terms: {
sku: {
index: "product_skus",
type: "product",
id: supplier_id,
path: "skus"
}
}
}
}
}
],
must_not: []
}
},
aggs: {
unit_sum: {
terms: {
field: "sku",
size: 100,
order: {
one: "desc"
}
},
aggs: {
one: {
sum: {
field: "units"
}
}
}
}
}
}
Now I have a scenario where a given user needs to have their access restricted to a subset of the supplier's SKUs. I am trying to work out the best way to tackle this. I am leaning towards keeping another index of the SKUs a user can access and doing a second lookup, but I can't quite get my head around the query logic.
In simple terms for example; if in the above query, for supplier 1 we return products [A,B,C,D,E]
and user John should only see the results based on products [A,C,E]
How would I go about writing the query to do this? Is it as simple as adding in a should clause after the filter inside the bool?
Thanks in advance!
Routing is probably what you need in this situation, as your scenario allows you to route by user. As an additional bonus of organizing your data by routing value, queries that specify a routing value perform better. Why? Because with routing, the request is sent only to the shard(s) holding the relevant data instead of to every shard of the index.
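To make the shard-selection idea concrete, here is a minimal sketch in Python. It is an assumption-laden model, not Elasticsearch code: zlib.crc32 stands in for the hash function Elasticsearch actually uses, the modulo step is a simplification of the real shard formula, and the helpers shard_for, index_doc and search are invented for this example.

```python
import zlib

NUM_PRIMARY_SHARDS = 5


def shard_for(routing, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a shard from the routing value (crc32 is a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def index_doc(shards, routing, doc):
    """Store the document on the single shard selected by its routing value."""
    shards[shard_for(routing)].append(doc)


def search(shards, routing):
    """Look only at the shard selected by the routing value."""
    return shards[shard_for(routing)]


shards = [[] for _ in range(NUM_PRIMARY_SHARDS)]
index_doc(shards, "123", {"supplierId": 123, "path": "some/path"})

print(shard_for("123"))
print(search(shards, "123"))
```

Because several routing values can hash to the same shard, routing narrows where a search runs but does not filter documents by itself; a filter on the user or supplier is still needed in the query if results must be restricted.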
What would it look like in your case? Let's have a look with a simple mapping, and a product that should only be accessed with an id 123:
The mapping of product_skus (modify as needed):
PUT product_skus
{
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "1"
}
},
"mappings": {
"product": {
"_routing": {
"required": true
},
"properties": {
"supplierId":{
"type": "integer"
}, "path":{
"type": "string"
}
}
}
}
}
Now let's put a product in the index type (notice the routing):
POST product_skus/product?routing=123
{
"supplierId": 123,
"path": "some/path"
}
And finally two requests and their output using the routing:
GET product_skus/_search?routing=123
{
"query": {
"match_all": {}
}
}
Output:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "product_skus",
"_type": "product",
"_id": "AVrMHzgx28yun46LEMYm",
"_score": 1,
"_routing": "123",
"_source": {
"supplierId": 123,
"path": "some/path"
}
}
]
}
}
Second query:
GET product_skus/_search?routing=124
{
"query": {
"match_all": {}
}
}
Output:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
This is only a simple example; you might want to check the documentation for more information:
The _routing field
Another routing example
An example of routing with fields of the type
In addition the following shows that only one shard is used with routing:
GET product_skus/_search_shards?routing=123
Output:
{
"nodes": {
"1sMKtN6aQ9yyOsTjknWyQA": {
"name": "1sMKtN6",
"ephemeral_id": "X-V2QGTwTmqUFQb1B6KIUw",
"transport_address": "127.0.0.1:9300",
"attributes": {}
}
},
"shards": [
[
{
"state": "STARTED",
"primary": true,
"node": "1sMKtN6aQ9yyOsTjknWyQA",
"relocating_node": null,
"shard": 0,
"index": "product_skus",
"allocation_id": {
"id": "1MMkFaALRxm1N-x8J8AGhg"
}
}
]
]
}
See the search shards API for more details.

ElasticSearch - Match (email value) returns wrong registers

I'm using match to search for a specific email, but the result is wrong: the match query also returns similar values. If the email exists, it appears at the top of the results, but when it does not exist, I get results from the same domain.
Here is my query:
{
"query": {
"match" : {
"email" : "placplac#xxx.net"
}
}
}
This email doesn't exist in my data, but the query returns values like banana#xxx.net, ronyvon#xxx.net, etc.
How can I force it to return a result only if the value is exactly equal to the query?
Thanks in advance.
You need to put "index":"not_analyzed" on the "email" field. That way, the only terms that are queried against are the exact values that have been stored to that field (as opposed to the case with the standard analyzer, which is the default used if no analyzer is listed).
To illustrate, I set up a simple mapping with the email field not analyzed, and added two simple docs:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"doc": {
"properties": {
"email": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
PUT /test_index/doc/1
{"email": "placplac#xxx.net"}
PUT /test_index/doc/2
{"email": "placplac#nowhere.net"}
Now your match query will return only the document that matches the query exactly:
POST /test_index/_search
{
"query": {
"match" : {
"email" : "placplac#xxx.net"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"email": "placplac#xxx.net"
}
}
]
}
}
Here is the code I used:
http://sense.qbox.io/gist/12763f63f2a75bf30ff956c25097b5955074508a
PS: What you actually probably want here is a term query or even term filter, since you don't want any analysis on the query text. So maybe something like:
POST /test_index/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"email": "placplac#xxx.net"
}
}
}
}
}
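The difference between the analyzed and the not_analyzed field can be sketched in a few lines of Python. The analyze function below is only a rough approximation of the standard analyzer (lowercase, then split on anything that is not a letter, digit or dot), and match_analyzed models the match query's default behaviour of accepting a document when any query token matches. All function names and the sample documents are made up for this illustration.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split into tokens."""
    return [t for t in re.split(r"[^a-z0-9.]+", text.lower()) if t]


def match_analyzed(query, docs):
    """A document matches if it shares at least one token with the query."""
    query_tokens = set(analyze(query))
    return [d for d in docs if query_tokens & set(analyze(d))]


def match_not_analyzed(query, docs):
    """With a not_analyzed field, only the exact stored value matches."""
    return [d for d in docs if d == query]


docs = ["banana#xxx.net", "ronyvon#xxx.net", "placplac#nowhere.net"]
query = "placplac#xxx.net"

print(analyze(query))                   # the address is split into two tokens
print(match_analyzed(query, docs))      # every document shares one of those tokens
print(match_not_analyzed(query, docs))  # no exact value, so no hits
```

The shared domain token is what pulls in the unrelated addresses in the analyzed case.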

Is there any method in Elastic Search to get result in case of misspelling?

I want to know if it's possible to search the data when the query is misspelled, the way Google does.
Currently this query returns thousands of results:
{
"query": {
"query_string": {
"query": "obama"
}
}
}
but when I change it to:
{
"query": {
"query_string": {
"query": "omama"
}
}
}
(with "obama" replaced by "omama"), there is no result. Is it possible to get results when the spelling is wrong?
I think what you are looking for is the fuzzy query:
{
"query": {
"fuzzy": {
"field_name" : "omama"
}
}
}
If you are running this against a single field, you can use the fuzzy_like_this_field query (only available in old Elasticsearch versions; it was removed in 2.0):
{
"fuzzy_like_this_field" : {
"name.first" : {
"like_text" : "omama",
"max_query_terms" : 12
}
}
}
You can also check phonetic matching:
https://github.com/elasticsearch/elasticsearch-analysis-phonetic
Simply use a fuzzy query (see the documentation):
{
"query": {
"fuzzy": {
"name": "omama"
}
}
}
You should get your result:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 2.7917595,
"hits": [
{
"_index": "test",
"_type": "obama",
"_id": "D_ovfcHkQwODdftWM4_z1Q",
"_score": 2.7917595,
"_source": {
"name": "obama"
}
}
]
}
}
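Fuzzy matching is based on edit distance between the query term and the indexed terms. The sketch below, in Python, uses plain Levenshtein distance (Elasticsearch also counts a transposition of adjacent characters as a single edit by default, which is not modelled here) and the length-based AUTO fuzziness rule from the current documentation. The function names and the list of indexed terms are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Allowed edits by term length: 0-2 chars exact, 3-5 one edit, longer two."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(term, indexed_terms):
    """Return indexed terms within the allowed edit distance of the query term."""
    limit = auto_fuzziness(term)
    return [t for t in indexed_terms if edit_distance(term, t) <= limit]


print(edit_distance("omama", "obama"))
print(fuzzy_match("omama", ["obama", "osaka", "panama"]))
```

"omama" is one substitution away from "obama", so it falls within the single edit allowed for a five-character term.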
