My data is stored in Elasticsearch in the format below:
{
"_index": "wallet",
"_type": "wallet",
"_id": "5dfcbe0a6ca963f84470d852",
"_score": 0.69321066,
"_source": {
"email": "test20011#gmail.com",
"wallet": "test20011#operatorqa2.akeodev.com",
"countryCode": "+91",
"phone": "7916318809",
"name": "test20011"
}
},
{
"_index": "wallet",
"_type": "wallet",
"_id": "5dfcbe0a6ca9634d1c70d856",
"_score": 0.69321066,
"_source": {
"email": "test50011#gmail.com",
"wallet": "test50011#operatorqa2.akeodev.com",
"countryCode": "+91",
"phone": "3483330496",
"name": "test50011"
}
},
{
"_index": "wallet",
"_type": "wallet",
"_id": "5dfcbe0a6ca96304b370d857",
"_score": 0.69321066,
"_source": {
"email": "test110021#gmail.com",
"wallet": "test110021#operatorqa2.akeodev.com",
"countryCode": "+91",
"phone": "2744697207",
"name": "test110021"
}
}
The records should not be found if we use the query below:
{
"query": {
"bool": {
"should": [
{
"match": {
"wallet": {
"query": "operatorqa2.akeodev.com",
"operator": "and"
}
}
},
{
"match": {
"email": {
"query": "operatorqa2.akeodev.com",
"operator": "and"
}
}
}
]
}
}
}
The record should be found if I pass the query below:
{
"query": {
"bool": {
"should": [
{
"match": {
"wallet": {
"query": "test20011#operatorqa2.akeodev.com",
"operator": "and"
}
}
},
{
"match": {
"email": {
"query": "test20011#operatorqa2.akeodev.com",
"operator": "and"
}
}
}
]
}
}
}
I have created the index with the email and wallet fields.
Users search by email or wallet, and I cannot tell whether the string a user sends is an email or a wallet, which is why I am using bool with should clauses.
A record should be found when the user sends the full email address or the full wallet address.
Please help me find a solution.
As mentioned by the other community members, when asking questions like this you should specify the version of Elasticsearch you are using and also provide the mapping.
Starting with Elasticsearch version 5, with default mappings you would only need to change your query to run against the exact version of the field rather than the analyzed version. By default, Elasticsearch maps strings to a multi-field of type text (analyzed, for full-text search) and keyword (not analyzed, for exact-match search). In your query you would then query against the <fieldname>.keyword fields:
{
"query": {
"bool": {
"should": [
{
"match": {
"wallet.keyword": "test20011#operatorqa2.akeodev.com"
}
},
{
"match": {
"email.keyword": "test20011#operatorqa2.akeodev.com"
}
}
]
}
}
}
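For reference, with dynamic mapping the two fields would look roughly like this (a sketch; check your real mapping with GET wallet/_mapping, and note that on 5.x/6.x a document type wraps the properties):
{
  "properties": {
    "email": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 }
      }
    },
    "wallet": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 }
      }
    }
  }
}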
If you are on an Elasticsearch version prior to 5, change the index property from analyzed to not_analyzed and re-index your data.
Mapping snippet:
{
"email": {
"type" "string",
"index": "not_analyzed"
}
}
Your query still would not need the and operator. It will look identical to the query I posted above, except that you have to query against the email and wallet fields rather than email.keyword and wallet.keyword.
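For example, a sketch of the pre-5 query, assuming both fields are mapped as not_analyzed as shown above:
{
  "query": {
    "bool": {
      "should": [
        { "match": { "wallet": "test20011@operatorqa2.akeodev.com" } },
        { "match": { "email": "test20011@operatorqa2.akeodev.com" } }
      ]
    }
  }
}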
I can recommend the following blog post from Elastic on that topic: Strings are dead, long live strings!
As I don't have the mapping of your index, I am assuming you are using the ES defaults (you can fetch them using the get mapping API); in that case the wallet and email fields are defined as text with the default analyzer, which is the standard analyzer.
This analyzer does not recognize these values as email addresses and creates three tokens for test50011@operatorqa2.akeodev.com, which you can check using the analyze API:
http://localhost:9200/_analyze?text=test50011@operatorqa2.akeodev.com&tokenizer=standard
{
"tokens": [
{
"token": "test50011",
"start_offset": 0,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "operatorqa2",
"start_offset": 10,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "akeodev.com",
"start_offset": 22,
"end_offset": 33,
"type": "<ALPHANUM>",
"position": 3
}
]
}
What you need here is a custom analyzer for emails using the uax_url_email tokenizer, which is designed for email and URL fields. This generates a single, proper token for test50011@operatorqa2.akeodev.com, as shown below:
http://localhost:9200/_analyze?text=test50011@operatorqa2.akeodev.com&tokenizer=uax_url_email
{
"tokens": [
{
"token": "test50011#operatorqa2.akeodev.com",
"start_offset": 0,
"end_offset": 33,
"type": "<EMAIL>",
"position": 1
}
]
}
Now, as you can see, it does not split test50011@operatorqa2.akeodev.com, so when you search using the same query it generates the same token, and ES works on token-to-token matching.
Let me know if you need any help; it's very simple to set up and use.
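For example, the index could be set up roughly like this (a sketch; the analyzer name email_analyzer and the lowercase filter are illustrative choices, and on versions before 7 a document type would wrap the properties):
PUT /wallet
{
  "settings": {
    "analysis": {
      "analyzer": {
        "email_analyzer": {
          "type": "custom",
          "tokenizer": "uax_url_email",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": { "type": "text", "analyzer": "email_analyzer" },
      "wallet": { "type": "text", "analyzer": "email_analyzer" }
    }
  }
}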
Related
I have the following query with terms, which works fine:
{
"query": {
"terms": {
"130": [
"jon#domain.com",
"mat#domain.com"
]
}
}
}
Found 2 docs.
But now I would like to build a similar query with match (I want to find all users in a domain). I've tried the following query without any result:
{
"query": {
"match": {
"130": {
"query":"#domain.com"
}
}
}
}
Found 0 docs. Why??
Field 130 has the following mapping:
"130":{"type":"text","analyzer":"whitespace","fielddata":true}
If you are using a whitespace analyzer, the generated token will be:
{
"tokens": [
{
"token": "jon#domain.com",
"start_offset": 0,
"end_offset": 14,
"type": "word",
"position": 0
}
]
}
So the terms query matches the above token, since terms returns documents that contain one or more exact terms in a provided field. The match query, however, analyzes the search text with the same whitespace analyzer, producing the single token @domain.com, which does not match the indexed token jon@domain.com, so it gives 0 results.
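You can verify this with the analyze API; the whitespace analyzer keeps the query text as a single token:
POST /_analyze
{
  "analyzer": "whitespace",
  "text": "@domain.com"
}
This returns the single token @domain.com, whereas the indexed token is jon@domain.com.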
Instead, you could use the standard analyzer (which is the default one), which will generate the following tokens:
{
"tokens": [
{
"token": "jon",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "domain.com",
"start_offset": 4,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
}
]
}
You can even go with the uax_url_email tokenizer, which is like the standard tokenizer except that it recognizes URLs and email addresses as single tokens.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"130": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
Index Data:
{
"130":"jon#domain.com"
}
Search Query:
{
"query": {
"match": {
"130": {
"query": "#domain.com"
}
}
}
}
Search Result (the match query analyzes @domain.com with the standard analyzer into the single token domain.com, which matches a token generated from jon@domain.com):
"hits": [
{
"_index": "65121147",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"130": "jon#domain.com"
}
}
]
Below is the query to get an exact match:
GET courses/_search
{
"query": {
"term" : {
"name.keyword": "Anthropology 230"
}
}
}
I need to find both Anthropology 230 and Anthropology 250.
How can I get the exact match?
You can check and try match, match_phrase, or match_phrase_prefix.
Using match,
GET courses/_search
{
"query": {
"match" : {
"name" : "Anthropology 230"
}
},
"_source": "name"
}
Using match_phrase,
GET courses/_search
{
"query": {
"match_phrase" : {
"name" : "Anthropology"
}
},
"_source": "name"
}
OR using regexp,
GET courses/_search
{
"query": {
"regexp" : {
"name" : "Anthropology [0-9]{3}"
}
},
"_source": "name"
}
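Or using match_phrase_prefix, where the last term in the phrase is treated as a prefix; a sketch that would match both Anthropology 230 and Anthropology 250:
GET courses/_search
{
  "query": {
    "match_phrase_prefix" : {
      "name" : "Anthropology 2"
    }
  },
  "_source": "name"
}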
The mistake you are making is using the term query on the keyword field: neither of them is analyzed, which means the query tries to find the exact search string in the inverted index.
What you should do instead is query a text field, which you will have anyway if you have not defined an explicit mapping. I am assuming that is the case, since your query mentions .keyword, which gets created automatically when you don't define a mapping.
Now you can just use the match query below, which is analyzed and uses the standard analyzer; it splits the text into tokens (among other things, on whitespace), so the tokens anthropology, 230 and 250 are generated from your two sample docs.
A simple and efficient query that brings back both docs:
{
"query": {
"match" : {
"name" : "Anthropology 230"
}
}
}
And search result
"hits": [
{
"_index": "matchterm",
"_type": "_doc",
"_id": "1",
"_score": 0.8754687,
"_source": {
"name": "Anthropology 230"
}
},
{
"_index": "matchterm",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {
"name": "Anthropology 250"
}
}
]
The reason the above query matches both docs is that it creates two tokens, anthropology and 230, and anthropology occurs in both documents.
You should definitely read about the analysis process, and you can also try the analyze API to see the tokens generated for any text.
Analyze API output for your text
POST http://{{hostname}}:{{port}}/{{index-name}}/_analyze
{
"analyzer": "standard",
"text": "Anthropology 250"
}
{
"tokens": [
{
"token": "anthropology",
"start_offset": 0,
"end_offset": 12,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "250",
"start_offset": 13,
"end_offset": 16,
"type": "<NUM>",
"position": 1
}
]
}
Assuming you may have more 'Anthropology nnn' items, this should do what you need:
"query":{
"bool":{
"must":[
{"term": {"name.keyword":"Anthropology 230"}},
{"term": {"name.keyword":"Anthropology 250"}},
]
}
}
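Alternatively, a terms query against the keyword field expresses the same OR semantics more compactly (a sketch):
{
  "query": {
    "terms": {
      "name.keyword": [ "Anthropology 230", "Anthropology 250" ]
    }
  }
}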
I don't see any difference between term and match in filter:
POST /admin/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"partnumber": "j1knd"
}
}
]
}
}
}
And the result also contains partnumbers that do not match exactly, e.g.: "52527.J1KND-H"
Why?
Term queries are not analyzed, meaning whatever you send is used as-is to match the tokens in the inverted index, while match queries are analyzed: the same analyzer that was applied to the field at index time is applied to the query text, and documents are matched accordingly.
Read more about term query and match query. As mentioned in the match query:
Returns documents that match a provided text, number, date or boolean
value. The provided text is analyzed before matching.
You can also use the analyze API to see the tokens generated for a particular field.
Tokens generated by the standard analyzer for the text 52527.J1KND-H:
POST /_analyze
{
"text": "52527.J1KND-H",
"analyzer" : "standard"
}
{
"tokens": [
{
"token": "52527",
"start_offset": 0,
"end_offset": 5,
"type": "<NUM>",
"position": 0
},
{
"token": "j1knd",
"start_offset": 6,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "h",
"start_offset": 12,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 2
}
]
}
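You can also run the analyze call against a specific field, so that the analyzer configured for that field is picked up (a sketch, assuming the index from the question is named admin):
POST /admin/_analyze
{
  "field": "partnumber",
  "text": "52527.J1KND-H"
}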
The token output above explains why you are getting partnumbers that are not exact matches, e.g. "52527.J1KND-H". Taking your example, here is how you can make it work.
Index mapping
{
"mappings": {
"properties": {
"partnumber": {
"type": "text",
"fields": {
"raw": {
"type": "keyword" --> note this
}
}
}
}
}
}
Index docs
{
"partnumber" : "j1knd"
}
{
"partnumber" : "52527.J1KND-H"
}
Search query to return only the exact match
{
"query": {
"bool": {
"filter": [
{
"term": {
"partnumber.raw": "j1knd" --> note `.raw` in field
}
}
]
}
}
Result (note the _score of 0.0: filter clauses do not contribute to scoring):
"hits": [
{
"_index": "so_match_term",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"partnumber": "j1knd"
}
}
]
I have a query like this (I've removed the sorting part because it doesn't matter):
GET _search
{
"query": {
"multi_match": {
"query": "somethi",
"fields": [ "title", "content"],
"fuzziness" : "AUTO",
"prefix_length" : 0
}
}
}
When running this I'm getting results like this:
"hits": [
{
"_index": "test_index",
"_type": "article",
"_id": "2",
"_score": 0.083934024,
"_source": {
"title": "Matching something abc",
"content": "This is a piece of content",
"categories": [
{
"name": "B",
"weight": 4
}
]
},
"sort": [
4,
0.083934024,
"article#2"
]
},
{
"_index": "test_index",
"_type": "article",
"_id": "3",
"_score": 0.18436861,
"_source": {
"title": "Matching something abc",
"content": "This is a piece of content containing something",
"categories": [
{
"name": "C",
"weight": 3
}
]
},
"sort": [
3,
0.18436861,
"article#3"
]
},
...
So there is no problem getting what is expected. However, I noticed that if I remove one more letter from the query, leaving someth instead, Elasticsearch won't return any results.
This is quite strange to me. It seems multi_match does partial matching but somehow requires a minimum number of characters. Similarly, if I put, for example, omethin in the query I get results, but using only omethi I get none.
Is there any setting for the minimum number of characters in queries, or would I need to rewrite my query to achieve what I want? I would like to run a match on multiple fields (in the query above, the title and content fields) that allows partial matching together with fuzziness.
You get this behaviour because you have the "fuzziness": "AUTO" parameter set, which for a term of more than 5 characters allows a maximum of two edits. In general, the fuzziness parameter tells Elasticsearch to find all terms within a maximum of two changes, where a change is the insertion, deletion or substitution of a single character; with fuzziness it is not possible to allow more than two changes. That is why somethi and omethin (each two edits away from the indexed token something) still match, while someth and omethi (three edits away) do not.
If you need partial matching, you could configure your index with an Edge NGram analyzer and apply it to your title and content fields. You can easily test how it works:
Create an index with the following mapping:
PUT http://127.0.0.1:9200/test
{
"settings": {
"analysis": {
"analyzer": {
"edge_ngram_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
}
}
And run this query:
curl -X POST \
'http://127.0.0.1:9200/test/_analyze?pretty=true' \
-d '{
"analyzer" : "edge_ngram_analyzer",
"text" : ["something"]
}'
As a result you'll get:
{
"tokens": [
{
"token": "so",
...
},
{
"token": "som",
...
},
{
"token": "some",
...
},
{
"token": "somet",
...
},
{
"token": "someth",
...
},
{
"token": "somethi",
...
},
{
"token": "somethin",
...
},
{
"token": "something",
...
}
]
}
And these are the tokens you'll get during a search with the edge_ngram_analyzer. With min_gram and max_gram you can configure the minimum/maximum number of characters in a gram.
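Note that the settings above only define the analyzer; to take effect it also has to be applied to the fields. A sketch of such a mapping (search_analyzer is set to standard so that the search text itself is not split into n-grams; on older versions a document type is needed in the path):
PUT http://127.0.0.1:9200/test/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "edge_ngram_analyzer",
      "search_analyzer": "standard"
    },
    "content": {
      "type": "text",
      "analyzer": "edge_ngram_analyzer",
      "search_analyzer": "standard"
    }
  }
}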
If you need to handle cases like omething (a missing letter at the beginning), try the same with the NGram analyzer, as sketched below.
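A minimal sketch of the corresponding tokenizer definition, replacing the one in the settings above (the gram sizes are illustrative and kept close together because newer versions limit the difference between max_gram and min_gram via the index.max_ngram_diff setting):
"tokenizer": {
  "my_tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 4,
    "token_chars": [ "letter", "digit" ]
  }
}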
I'm having trouble executing the following request against Elasticsearch v2.2.0. If I remove the filter property (and its contents, of course), I get my entity back (only one exists). With the filter clause in place, I just get 0 results, but no error. The same happens if I remove the email filter and/or the name filter. Am I doing something wrong with this request?
Request
GET http://localhost:9200/my-app/my-entity/_search?pretty=1
{
"query": {
"filtered" : {
"query": {
"match_all": {}
},
"filter": {
"and": [
{
"term": {
"email": "my.email#email.com"
}
},
{
"term": {
"name": "Test1"
}
}
]
}
}
}
}
Existing Entity
{
"email": "my.email#email.com",
"name": "Test1"
}
Mapping
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
},
"term": {
"type": "long"
}
}
Since the email field is analyzed with no custom analyzer, the standard analyzer gets applied to it, and the value is split into tokens.
Read about Standard Tokenizer here.
You can use the command below to see how my.email@email.com gets tokenized:
curl -XGET "http://localhost:9200/_analyze?tokenizer=standard" -d "my.email@email.com"
This will generate the following output:
{
"tokens": [
{
"token": "my.email", ===> Notice this
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "email.com", ===> Notice this
"start_offset": 9,
"end_offset": 17,
"type": "<ALPHANUM>",
"position": 2
}
]
}
If you want full, exact-match search, you need to make the field not_analyzed. Study how to create a not_analyzed field here. Note that the same applies to the name field: the standard analyzer lowercases Test1 to test1 at index time, so the term filter on name does not match either.
{
"email": {
"type": "string",
"index": "not_analyzed"
}
}
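After re-indexing with that mapping, the whole address is kept as a single token and a term filter on email matches it exactly; a sketch (assuming name gets the same not_analyzed treatment):
GET http://localhost:9200/my-app/my-entity/_search?pretty=1
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "email": "my.email@email.com" }
      }
    }
  }
}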
Hope it is clear