Elasticsearch match vs. term in filter - elasticsearch

I don't see any difference between term and match in filter:
POST /admin/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"partnumber": "j1knd"
}
}
]
}
}
}
And the result contains not exactly matched partnumbers too, e.g.: "52527.J1KND-H"
Why?

Term queries are not analyzed and mean whatever you send will be used as it is to match the tokens in the inverted index, while match queries are analyzed and the same analyzer applied on the fields, which is used at index time and accordingly matches the document.
Read more about term query and match query. As mentioned in the match query:
Returns documents that match a provided text, number, date or boolean
value. The provided text is analyzed before matching.
You can also use the analyze API to see the tokens generated for a particular field.
Tokens generated by standard analyzer on 52527.J1KND-H text.
POST /_analyze
{
"text": "52527.J1KND-H",
"analyzer" : "standard"
}
{
"tokens": [
{
"token": "52527",
"start_offset": 0,
"end_offset": 5,
"type": "<NUM>",
"position": 0
},
{
"token": "j1knd",
"start_offset": 6,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "h",
"start_offset": 12,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 2
}
]
}
Above explain to you why you are getting the not exactly matched partnumbers too, e.g.: "52527.J1KND-H", I would take your example and how you can make it work.
Index mapping
{
"mappings": {
"properties": {
"partnumber": {
"type": "text",
"fields": {
"raw": {
"type": "keyword" --> note this
}
}
}
}
}
}
Index docs
{
"partnumber" : "j1knd"
}
{
"partnumber" : "52527.J1KND-H"
}
Search query to return only the exact match
{
"query": {
"bool": {
"filter": [
{
"term": {
"partnumber.raw": "j1knd" --> note `.raw` in field
}
}
]
}
}
Result
"hits": [
{
"_index": "so_match_term",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"partnumber": "j1knd"
}
}
]
}

Related

Elasticsearch : how to return the document with the exact word searched and not all documents that contain that word in an sentence?

I have field (type text) named 'description'
I have 3 documents.
doc1 description = "test"
doc2 description = "test dsc"
doc3 description = "2021 test desc"
CASE 1- if i search "test" i want only doc1
CASE 2- if i search "test dsc" i want only doc2
CASE 3- if i search "2021 test desc" i want only doc3
But now only CASE 3 is working
For example CASE1 not working .If i try this query i have all 3 document
GET /myindex/_search
{
"query": {
"match" : {
"Description" : "test"
}
}
}
thanks
You are getting all three documents in your search because by default elasticsearch uses a standard analyzer, for the text type field. This will tokenize "2021 test desc" into
{
"tokens": [
{
"token": "2021",
"start_offset": 0,
"end_offset": 4,
"type": "<NUM>",
"position": 0
},
{
"token": "test",
"start_offset": 5,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "desc",
"start_offset": 10,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 2
}
]
}
Therefore, it will return all the documents that match any of the above tokens.
If you want to search for the exact term you need to update your index mapping.
You can update the mapping, by indexing the same field in multiple ways i.e by using multi fields.
PUT /_mapping
{
"properties": {
"description": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
And then reindex the data again. After this, you will be able to query using the "description" field as of text type and "description.raw" as of keyword type
Search Query:
{
"query": {
"match": {
"description.raw": "test dsc"
}
}
}
Search Result:
"hits": [
{
"_index": "67777521",
"_type": "_doc",
"_id": "2",
"_score": 0.9808291,
"_source": {
"description": "test dsc"
}
}
]

elasticsearch match query in array

I have follow query with terms, that works fine.
{
"query": {
"terms": {
"130": [
"jon#domain.com",
"mat#domain.com"
]
}
}
}
Found 2 docs.
but now i would like to build similar query with match (want to find all users in domain). I've tried follow query without any result
{
"query": {
"match": {
"130": {
"query":"#domain.com"
}
}
}
}
Found 0 docs. Why??
Field 130 has follow mapping:
"130":{"type":"text","analyzer":"whitespace","fielddata":true}
If you are using a whitespace analyzer, then the token generated will be :
{
"tokens": [
{
"token": "jon#domain.com",
"start_offset": 0,
"end_offset": 14,
"type": "word",
"position": 0
}
]
}
So terms query will match with the above token as it returns documents that contain one or more exact terms in a provided field, but match query will give 0 results
Instead, you should use a standard analyzer (which is the default one), which will generate the following tokens:
{
"tokens": [
{
"token": "jon",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "domain.com",
"start_offset": 4,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
}
]
}
You can even go through the uax_url_email tokenizer which is like the standard tokenizer except that it recognizes URLs and email addresses as single tokens.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"130": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
Index Data:
{
"130":"jon#domain.com"
}
Search Query:
{
"query": {
"match": {
"130": {
"query": "#domain.com"
}
}
}
}
Search Result:
"hits": [
{
"_index": "65121147",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"130": "jon#domain.com"
}
}
]

Elasticsearch term vs match

I have to write a search query on 2 condition.
timestamp
directory
When I am using match in search query like below
{
"query":{
"bool":{
"must":{
"match":{
"directory":"/user/ayush/test/error/"
}
},
"filter":{
"range":{
"#timestamp":{
"gte":"2020-08-25 01:00:00",
"lte":"2020-08-25 01:30:00",
"format":"yyyy-MM-dd HH:mm:ss"
}
}
}
}
}
}
In the filter result I am getting records with directory
/user/ayush/test/error/
/user/hive/
/user/
but when I am using term like below
{
"query":{
"bool":{
"must":{
"term":{
"directory":"/user/ayush/test/error/"
}
},
"filter":{
"range":{
"#timestamp":{
"gte":"2020-08-25 01:00:00",
"lte":"2020-08-25 01:30:00",
"format":"yyyy-MM-dd HH:mm:ss"
}
}
}
}
}
}
I am not getting any results not even with directory value /user/ayush/test/error/
The match query analyzes the input string and constructs more basic
queries from that.
The term query matches exact terms.
Refer these blogs to get detailed information :
SO question on Term vs Match query
https://discuss.elastic.co/t/term-query-vs-match-query/14455
elasticsearch match vs term query
The field value /user/ayush/test/error/ is analyzed as follows :
POST/_analyze
{
"analyzer" : "standard",
"text" : "/user/ayush/test/error/"
}
The tokens generated are:
{
"tokens": [
{
"token": "user",
"start_offset": 1,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "ayush",
"start_offset": 6,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "test",
"start_offset": 12,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "error",
"start_offset": 17,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 3
}
]
}
Index data:
{ "directory":"/user/ayush/test/error/" }
{ "directory":"/user/ayush/" }
{ "directory":"/user" }
Search Query using Term query:
The term query does not apply any analyzers to the search term, so will only look for that exact term in the inverted index. So to search for the exact term, you need to use directory.keyword OR change the mapping of field.
{
"query": {
"term": {
"directory.keyword": {
"value": "/user/ayush/test/error/",
"boost": 1.0
}
}
}
}
Search Result for Term query:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"directory": "/user/ayush/test/error/"
}
}
]

how to search a document containing a substring

I have the following document with this (partial) mapping:
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
I'm trying to perform a query for document containing "success":"0" through the following DSL query:
{
"query": {
"bool": {
"must": {
"regexp": {
"message": ".*\"success\".*0.*"
}
}
}
}
}
but I don't get any result, whereas if I perform the following DSL:
{
"query": {
"bool": {
"must": {
"regexp": {
"message": ".*\"success\""
}
}
}
}
}
I'm returned some document! I.e.
{"data":"[{\"appVersion\":\"1.1.1\",\"installationId\":\"any-ubst-id\",\"platform\":\"aaa\",\"brand\":\"Dalvik\",\"screenSize\":\"xhdpi\"}]","executionTime":"0","flags":"0","method":"aaa","service":"myService","success":"0","type":"aservice","version":"1"}
What's wrong with my query?
The text field message uses standard analyzer which tokenize the input string and convert it to tokens.
If we analyze the string "success":"0" using standard analyzer we will get these tokens
{
"tokens": [
{
"token": "success",
"start_offset": 2,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "0",
"start_offset": 12,
"end_offset": 13,
"type": "<NUM>",
"position": 1
}
]
}
So you can see that colon double quotes etc are removed. And since regexp query applied on each token it will not match your query.
But if we use message.keyword which has field type keyword. it is not analyzed thus keep the string as it is.
{
"tokens": [
{
"token": """ "success":"0" """,
"start_offset": 0,
"end_offset": 15,
"type": "word",
"position": 0
}
]
}
So if we use the below query it should work
{
"query": {
"regexp": {
"message.keyword": """.*"success".*0.*"""
}
}
}
But another problem is you have set message.keyword field settings to "ignore_above": 256 So This field will ignore any string longer than 256 characters.

No results returned for filtered Elasticsearch query

I'm having trouble executing the following request against Elasticsearch v2.2.0. If I remove the filter property (and contents, of course), I get my entity back (only one exists). With the filter clause in place, I just get 0 results, but no error. Same if I remove the email filter and/or the name filter. Am I doing something wrong with this request?
Request
GET http://localhost:9200/my-app/my-entity/_search?pretty=1
{
"query": {
"filtered" : {
"query": {
"match_all": {}
},
"filter": {
"and": [
{
"term": {
"email": "my.email#email.com"
}
},
{
"term": {
"name": "Test1"
}
}
]
}
}
}
}
Existing Entity
{
"email": "my.email#email.com",
"name": "Test1"
}
Mapping
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
},
"term": {
"type": "long"
}
}
Since email field is analyzed with no custom analyzer, Standard Analyzer will get applied to it and it will split into tokens.
Read about Standard Tokenizer here.
You can use below command to see how my.email#email.com is getting tokenized.
curl -XGET "http://localhost:9200/_analyze?tokenizer=standard" -d "my.email#email.com".
This will generate following output.
{
"tokens": [
{
"token": "my.email", ===> Notice this
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "email.com", ===> Notice this
"start_offset": 9,
"end_offset": 17,
"type": "<ALPHANUM>",
"position": 2
}
]
}
If you want full or exact search you need to make it not_analyzed. Study about how to create a not_analyzed field here.
{
"email": {
"type": "string",
"index": "not_analyzed"
}
}
Hope it is clear

Resources