Elasticsearch match query in array

I have the following query with terms, and it works fine.
{
"query": {
"terms": {
"130": [
"jon#domain.com",
"mat#domain.com"
]
}
}
}
Found 2 docs.
but now I would like to build a similar query with match (I want to find all users in the domain). I've tried the following query without any result:
{
"query": {
"match": {
"130": {
"query":"#domain.com"
}
}
}
}
Found 0 docs. Why??
Field 130 has the following mapping:
"130":{"type":"text","analyzer":"whitespace","fielddata":true}

If you are using the whitespace analyzer, then the token generated will be:
{
"tokens": [
{
"token": "jon#domain.com",
"start_offset": 0,
"end_offset": 14,
"type": "word",
"position": 0
}
]
}
So the terms query will match the above token, since it returns documents that contain one or more exact terms in a provided field. The match query, however, analyzes "@domain.com" with the same whitespace analyzer, producing the single token "@domain.com", which does not equal the indexed token "jon@domain.com", so it gives 0 results.
Instead, you should use the standard analyzer (which is the default one), which will generate the following tokens:
{
"tokens": [
{
"token": "jon",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "domain.com",
"start_offset": 4,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
}
]
}
You can even use the uax_url_email tokenizer, which is like the standard tokenizer except that it recognizes URLs and email addresses as single tokens.
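For example, a quick check with the analyze API (a sketch; output details may vary by version) should return the address as a single token of type <EMAIL>:
POST /_analyze
{
  "tokenizer": "uax_url_email",
  "text": "jon@domain.com"
}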
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"130": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
Index Data:
{
"130":"jon#domain.com"
}
Search Query:
{
"query": {
"match": {
"130": {
"query": "#domain.com"
}
}
}
}
Search Result:
"hits": [
{
"_index": "65121147",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"130": "jon#domain.com"
}
}
]

Related

Elasticsearch: how to return the document with the exact word searched and not all documents that contain that word in a sentence?

I have a field (type text) named 'description'.
I have 3 documents.
doc1 description = "test"
doc2 description = "test dsc"
doc3 description = "2021 test desc"
CASE 1 - if I search "test" I want only doc1
CASE 2 - if I search "test dsc" I want only doc2
CASE 3 - if I search "2021 test desc" I want only doc3
But now only CASE 3 is working.
For example, CASE 1 is not working. If I try this query I get all 3 documents:
GET /myindex/_search
{
"query": {
"match" : {
"Description" : "test"
}
}
}
thanks
You are getting all three documents in your search because, by default, Elasticsearch uses the standard analyzer for text type fields. This will tokenize "2021 test desc" into:
{
"tokens": [
{
"token": "2021",
"start_offset": 0,
"end_offset": 4,
"type": "<NUM>",
"position": 0
},
{
"token": "test",
"start_offset": 5,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "desc",
"start_offset": 10,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 2
}
]
}
Therefore, it will return all the documents that match any of the above tokens.
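For reference, you can reproduce this token list with the analyze API (a sketch):
POST /_analyze
{
  "analyzer": "standard",
  "text": "2021 test desc"
}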
If you want to search for the exact term, you need to update your index mapping.
You can index the same field in multiple ways, i.e. by using multi fields.
PUT /myindex/_mapping
{
"properties": {
"description": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
And then reindex the data. After this, you will be able to query the "description" field as text and the "description.raw" field as keyword.
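As a sketch, one way to re-apply the new sub-field to documents that are already indexed is the update-by-query API (index name taken from the question; adjust as needed):
POST /myindex/_update_by_query?conflicts=proceed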
Search Query:
{
"query": {
"match": {
"description.raw": "test dsc"
}
}
}
Search Result:
"hits": [
{
"_index": "67777521",
"_type": "_doc",
"_id": "2",
"_score": 0.9808291,
"_source": {
"description": "test dsc"
}
}
]

Elasticsearch term vs match

I have to write a search query on 2 conditions:
timestamp
directory
When I am using match in the search query like below:
{
"query":{
"bool":{
"must":{
"match":{
"directory":"/user/ayush/test/error/"
}
},
"filter":{
"range":{
"#timestamp":{
"gte":"2020-08-25 01:00:00",
"lte":"2020-08-25 01:30:00",
"format":"yyyy-MM-dd HH:mm:ss"
}
}
}
}
}
}
In the filter result I am getting records with directory
/user/ayush/test/error/
/user/hive/
/user/
but when I am using term like below
{
"query":{
"bool":{
"must":{
"term":{
"directory":"/user/ayush/test/error/"
}
},
"filter":{
"range":{
"#timestamp":{
"gte":"2020-08-25 01:00:00",
"lte":"2020-08-25 01:30:00",
"format":"yyyy-MM-dd HH:mm:ss"
}
}
}
}
}
}
I am not getting any results, not even for the directory value /user/ayush/test/error/.
The match query analyzes the input string and constructs more basic
queries from that.
The term query matches exact terms.
Refer to these resources for more details:
SO question on Term vs Match query
https://discuss.elastic.co/t/term-query-vs-match-query/14455
elasticsearch match vs term query
The field value /user/ayush/test/error/ is analyzed as follows:
POST /_analyze
{
"analyzer" : "standard",
"text" : "/user/ayush/test/error/"
}
The tokens generated are:
{
"tokens": [
{
"token": "user",
"start_offset": 1,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "ayush",
"start_offset": 6,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "test",
"start_offset": 12,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "error",
"start_offset": 17,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 3
}
]
}
Index data:
{ "directory":"/user/ayush/test/error/" }
{ "directory":"/user/ayush/" }
{ "directory":"/user" }
Search Query using Term query:
The term query does not apply any analyzer to the search term, so it will only look for that exact term in the inverted index. So to search for the exact term, you need to use directory.keyword or change the mapping of the field.
{
"query": {
"term": {
"directory.keyword": {
"value": "/user/ayush/test/error/",
"boost": 1.0
}
}
}
}
Search Result for Term query:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"directory": "/user/ayush/test/error/"
}
}
]
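If you would rather change the mapping instead of relying on the directory.keyword sub-field, here is a sketch of an explicit keyword mapping (for a new index; the name is a placeholder, and existing data would need to be reindexed into it):
PUT /my_index_v2
{
  "mappings": {
    "properties": {
      "directory": { "type": "keyword" }
    }
  }
}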

Elasticsearch match vs. term in filter

I don't see any difference between term and match in filter:
POST /admin/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"partnumber": "j1knd"
}
}
]
}
}
}
And the result also contains part numbers that are not exact matches, e.g. "52527.J1KND-H".
Why?
Term queries are not analyzed, meaning whatever you send is used as-is to match the tokens in the inverted index, while match queries are analyzed with the same analyzer that was applied to the field at index time, and documents are matched accordingly.
Read more about the term query and the match query. As mentioned in the match query documentation:
Returns documents that match a provided text, number, date or boolean
value. The provided text is analyzed before matching.
You can also use the analyze API to see the tokens generated for a particular field.
Tokens generated by the standard analyzer for the text 52527.J1KND-H:
POST /_analyze
{
"text": "52527.J1KND-H",
"analyzer" : "standard"
}
{
"tokens": [
{
"token": "52527",
"start_offset": 0,
"end_offset": 5,
"type": "<NUM>",
"position": 0
},
{
"token": "j1knd",
"start_offset": 6,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "h",
"start_offset": 12,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 2
}
]
}
The above explains why you are also getting part numbers that are not exact matches, e.g. "52527.J1KND-H". Taking your example, here is how you can make it work.
Index mapping
{
"mappings": {
"properties": {
"partnumber": {
"type": "text",
"fields": {
"raw": {
"type": "keyword" --> note this
}
}
}
}
}
}
Index docs
{
"partnumber" : "j1knd"
}
{
"partnumber" : "52527.J1KND-H"
}
Search query to return only the exact match
{
"query": {
"bool": {
"filter": [
{
"term": {
"partnumber.raw": "j1knd" --> note `.raw` in field
}
}
]
}
}
Result
"hits": [
{
"_index": "so_match_term",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"partnumber": "j1knd"
}
}
]

Getting data without matching the full string in an Elasticsearch query

My data is stored in Elasticsearch in the below format:
{
"_index": "wallet",
"_type": "wallet",
"_id": "5dfcbe0a6ca963f84470d852",
"_score": 0.69321066,
"_source": {
"email": "test20011#gmail.com",
"wallet": "test20011#operatorqa2.akeodev.com",
"countryCode": "+91",
"phone": "7916318809",
"name": "test20011"
}
},
{
"_index": "wallet",
"_type": "wallet",
"_id": "5dfcbe0a6ca9634d1c70d856",
"_score": 0.69321066,
"_source": {
"email": "test50011#gmail.com",
"wallet": "test50011#operatorqa2.akeodev.com",
"countryCode": "+91",
"phone": "3483330496",
"name": "test50011"
}
},
{
"_index": "wallet",
"_type": "wallet",
"_id": "5dfcbe0a6ca96304b370d857",
"_score": 0.69321066,
"_source": {
"email": "test110021#gmail.com",
"wallet": "test110021#operatorqa2.akeodev.com",
"countryCode": "+91",
"phone": "2744697207",
"name": "test110021"
}
}
The record should not be found if we are using the below query:
{
"query": {
"bool": {
"should": [
{
"match": {
"wallet": {
"query": "operatorqa2.akeodev.com",
"operator": "and"
}
}
},
{
"match": {
"email": {
"query": "operatorqa2.akeodev.com",
"operator": "and"
}
}
}
]
}
}
}
The record should be found if I am passing the below query:
{
"query": {
"bool": {
"should": [
{
"match": {
"wallet": {
"query": "test20011#operatorqa2.akeodev.com",
"operator": "and"
}
}
},
{
"match": {
"email": {
"query": "test20011#operatorqa2.akeodev.com",
"operator": "and"
}
}
}
]
}
}
}
I have created the index on the email and wallet fields.
Users search data by email or wallet, and I am not sure whether the string sent by the user is an email or a wallet, so I am using bool.
The record should be found if a user sends the full email address or full wallet address.
Please help me find a solution.
As mentioned by the other community members, when asking questions like this you should specify the version of Elasticsearch you are using and also provide the mapping.
Starting with Elasticsearch version 5 with default mappings you would only need to change your query to query against the exact version of the field rather than the analyzed version. By default Elasticsearch maps strings to a multi-field of type text (analyzed, for full-text search) and keyword (not-analyzed, for exact match search). In your query you would then query against the <fieldname>.keyword-fields:
{
"query": {
"bool": {
"should": [
{
"match": {
"wallet.keyword": "test20011#operatorqa2.akeodev.com"
}
},
{
"match": {
"email.keyword": "test20011#operatorqa2.akeodev.com"
}
}
]
}
}
}
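For reference, the default dynamic mapping that Elasticsearch generates for such string fields looks roughly like this (shown here for the wallet field):
"wallet": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}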
If you are on an Elasticsearch version prior to version 5, change the index-property from analyzed to not_analyzed and re-index your data.
Mapping snippet:
{
"email": {
"type" "string",
"index": "not_analyzed"
}
}
Your query would again not need to use the and-operator. It would look identical to the query posted above, except that you query against the email and wallet fields instead of email.keyword and wallet.keyword.
I can recommend you the following blog post from Elastic related to that topic: Strings are dead, long live strings!
As I don't have the mapping of your index, I am assuming you are using the ES defaults (you can get the mapping using the mapping API); in that case, the wallet and email fields would be defined as text with the default analyzer, which is the standard analyzer.
This analyzer doesn't recognize this text as an email address and creates three tokens for test50011@operatorqa2.akeodev.com, which you can check using the analyze API:
http://localhost:9200/_analyze?text=test50011@operatorqa2.akeodev.com&tokenizer=standard
{
"tokens": [
{
"token": "test50011",
"start_offset": 0,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "operatorqa2",
"start_offset": 10,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "akeodev.com",
"start_offset": 22,
"end_offset": 33,
"type": "<ALPHANUM>",
"position": 3
}
]
}
What you need here is a custom analyzer for emails using the uax_url_email tokenizer, which is meant for email (and URL) fields. This generates a single, proper token for test50011@operatorqa2.akeodev.com, as shown below:
http://localhost:9200/_analyze?text=test50011@operatorqa2.akeodev.com&tokenizer=uax_url_email
{
"tokens": [
{
"token": "test50011#operatorqa2.akeodev.com",
"start_offset": 0,
"end_offset": 33,
"type": "<EMAIL>",
"position": 1
}
]
}
Now, as you can see, it's not splitting test50011@operatorqa2.akeodev.com; hence when you search with the same query, it will generate the same token, and ES matches token to token.
Let me know if you need any help; it's very simple to set up and use.
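A rough sketch of such a setup (assuming Elasticsearch 7+; the analyzer name is a placeholder, and you would create a new index and reindex your data into it):
PUT /wallet_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "email_analyzer": {
          "type": "custom",
          "tokenizer": "uax_url_email",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": { "type": "text", "analyzer": "email_analyzer" },
      "wallet": { "type": "text", "analyzer": "email_analyzer" }
    }
  }
}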

how to search a document containing a substring

I have the following document with this (partial) mapping:
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
I'm trying to query for documents containing "success":"0" through the following DSL query:
{
"query": {
"bool": {
"must": {
"regexp": {
"message": ".*\"success\".*0.*"
}
}
}
}
}
but I don't get any results, whereas if I perform the following DSL:
{
"query": {
"bool": {
"must": {
"regexp": {
"message": ".*\"success\""
}
}
}
}
}
I do get some documents back, e.g.:
{"data":"[{\"appVersion\":\"1.1.1\",\"installationId\":\"any-ubst-id\",\"platform\":\"aaa\",\"brand\":\"Dalvik\",\"screenSize\":\"xhdpi\"}]","executionTime":"0","flags":"0","method":"aaa","service":"myService","success":"0","type":"aservice","version":"1"}
What's wrong with my query?
The text field message uses the standard analyzer, which tokenizes the input string into tokens.
If we analyze the string "success":"0" using the standard analyzer, we get these tokens:
{
"tokens": [
{
"token": "success",
"start_offset": 2,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "0",
"start_offset": 12,
"end_offset": 13,
"type": "<NUM>",
"position": 1
}
]
}
So you can see that the colons, double quotes, etc. are removed. And since the regexp query is applied to each token separately, it will not match your query.
But if we use message.keyword, which has the keyword field type, the string is not analyzed and is kept as it is:
{
"tokens": [
{
"token": """ "success":"0" """,
"start_offset": 0,
"end_offset": 15,
"type": "word",
"position": 0
}
]
}
So if we use the below query, it should work:
{
"query": {
"regexp": {
"message.keyword": """.*"success".*0.*"""
}
}
}
But another problem is that the message.keyword field is set to "ignore_above": 256, so this field will not index any string longer than 256 characters.
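If your messages can be longer than that, one option (a sketch; pick a limit that fits your data, and note that already-indexed documents need to be reindexed to pick up the change) is a larger ignore_above on the keyword sub-field:
"message": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 2048
    }
  }
}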
