Elastic Multimatch Query doesn't match document - elasticsearch

I am querying elastic (v6.7) for items that match the phrase "x-ray" with the query below:
POST item/_search
{
"query": {
"bool": {
"must": {
"multi_match": {
"type": "phrase_prefix",
"query": "X-Ray",
"fields": [
"mpn",
"product_description",
"manufacturer_name"
],
"operator": "and",
"analyzer": "standard"
}
}
}
}
}
The result set is empty.
I have item documents that contain the phrase "x-ray". For example if I query:
GET items/_doc/3e4a2d80-9d5e-11e7-a6c5-6ddf18575461
It returns:
{
"_index": "items",
"_type": "_doc",
"_id": "3e4a2d80-9d5e-11e7-a6c5-6ddf18575461",
"_version": 1,
"_seq_no": 7605,
"_primary_term": 1,
"found": true,
"_source": {
"manufacturer_name": "GE",
"var_pricing": 0,
"on_hand": 1,
...
"product_description": "Portable X-Ray w/Fuji CR Reader", <----This should be a match!
"project_id": null,
"user_id": "12",
"quote_items": [],
"parentCategory": [
0
]
}
}
If I run a query on a freshly installed version of elastic (v7.3) where I add three documents like so:
POST product/_bulk
{"index":{"_id":1001}}
{"name":"x-ray Machine","price":152000,"in_stock":38,"sold":47,"tags":["Alcohol","Wine"],"description":"x-ray machine for x-rays","is_active":true,"created":"2004\/05\/13"}
{"index":{"_id":1002}}
{"name":"X-Ray film","price":99,"in_stock":10,"sold":430,"tags":[],"description":"just some x-ray film","is_active":true,"created":"2007\/10\/14"}
{"index":{"_id":1003}}
{"name":"Table","price":2500,"in_stock":24,"sold":215,"tags":[],"description":"could be used for an x-ray table","is_active":true,"created":"2000\/11\/17"}
Then query with:
POST product/_search
{
"query": {
"bool": {
"must": {
"multi_match": {
"type": "phrase_prefix",
"query": "X-Ray",
"fields": [
"name",
"description"
],
"operator": "and",
"analyzer": "standard"
}
}
}
}
}
All three items are returned:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 31.876595,
"hits" : [
{
"_index" : "product",
"_type" : "default",
"_id" : "1001",
"_score" : 31.876595,
"_source" : {
"name" : "x-ray Machine",
"price" : 152000,
"in_stock" : 38,
"sold" : 47,
"tags" : [
"Alcohol",
"Wine"
],
"description" : "x-ray machine for x-rays",
"is_active" : true,
"created" : "2004/05/13"
}
},
{
"_index" : "product",
"_type" : "default",
"_id" : "1002",
"_score" : 27.347116,
"_source" : {
"name" : "X-Ray film",
"price" : 99,
"in_stock" : 10,
"sold" : 430,
"tags" : [ ],
"description" : "just some x-ray film",
"is_active" : true,
"created" : "2007/10/14"
}
},
{
"_index" : "product",
"_type" : "default",
"_id" : "1003",
"_score" : 25.889376,
"_source" : {
"name" : "Table",
"price" : 2500,
"in_stock" : 24,
"sold" : 215,
"tags" : [ ],
"description" : "could be used for an x-ray table",
"is_active" : true,
"created" : "2000/11/17"
}
}
]
}
}
What gives?
I used the explain API to get some more insight but all it says is that there isn't a match:
POST items/_doc/3e4a2d80-9d5e-11e7-a6c5-6ddf18575461/_explain
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"type": "phrase_prefix",
"query": "X-Ray",
"fields": [
"product_description",
"mpn",
"manufacturer_name"
],
"operator": "and",
"analyzer": "standard"
}}
]
}
}
}
Returns:
{
"_index": "items",
"_type": "_doc",
"_id": "3e4a2d80-9d5e-11e7-a6c5-6ddf18575461",
"matched": false,
"explanation": {
"value": 0,
"description": "Failure to meet condition(s) of required/prohibited clause(s)",
"details": [
{
"value": 0,
"description": "no match on required clause (((+product_description:x +product_description:ray) | (+mpn:x +mpn:ray) | (+manufacturer_name:x +manufacturer_name:ray)))",
"details": [
{
"value": 0,
"description": "No matching clause",
"details": []
}
]
},
{
"value": 0,
"description": "no match on required clause (MatchNoDocsQuery(\"Type list does not contain the index type\"))",
"details": [
{
"value": 0,
"description": "MatchNoDocsQuery(\"Type list does not contain the index type\") doesn't match id 12556",
"details": []
}
]
}
]
}
}
Not much changes when I change the analyzer to whitespace or keyword either.

(This is not an answer, but I could not fit all of this in a comment.)
I am not sure you need to set an analyzer on the query if you intend to match X-Ray as a whole.
Look at this:
POST _analyze
{
"analyzer": "standard",
"text":"X-Ray"
}
and the response is
{
"tokens" : [
{
"token" : "x",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "ray",
"start_offset" : 2,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 1
}
]
}
So your search term X-Ray became x and ray. Is this what you intended?
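To compare how the three analyzers mentioned in the question treat the hyphen, here is a rough Python sketch. It only approximates the behaviour: the real standard analyzer follows Unicode text segmentation rules, and the function names are made up for this example.

```python
import re

def standard_like(text):
    # Rough stand-in for the standard analyzer: keep runs of ASCII letters
    # and digits, drop everything else, then lowercase each token.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def whitespace_like(text):
    # The whitespace analyzer splits on whitespace only and does not lowercase.
    return text.split()

def keyword_like(text):
    # The keyword analyzer emits the whole input as one token.
    return [text]

description = "Portable X-Ray w/Fuji CR Reader"
for name, analyze in [
    ("standard", standard_like),
    ("whitespace", whitespace_like),
    ("keyword", keyword_like),
]:
    print(name, analyze(description))
```

Only the standard-like version breaks X-Ray into two tokens, which matches the _analyze output above.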

I determined that my problem came from the mappings: the field is configured with a custom analyzer (which itself uses the standard analyzer) and with search_analyzer set to standard, so the standard analyzer was being applied all the time.
Here is the mapping:
GET items/_mapping
shows
...
"manufacturer_name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
}
},
"analyzer": "my_search_analyzer",
"search_analyzer": "standard"
},
...
This is the same for the other two fields I was querying.
The lesson here:
If you are having issues with search, check the mappings to see whether custom analyzers have been set for the fields involved.
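One way to make that check less manual is to scan the properties section of the mapping for fields whose index-time and search-time analyzers differ. The sketch below is only an illustration: analyzer_mismatches is a made-up helper, the notes field is hypothetical, and it assumes that a field without an explicit analyzer uses standard (an index-level default analyzer would change that).

```python
def analyzer_mismatches(properties):
    """Return {field: (index_analyzer, search_analyzer)} for fields where they differ."""
    found = {}
    for field, spec in properties.items():
        # Assumption: no explicit analyzer means the standard analyzer.
        index_analyzer = spec.get("analyzer", "standard")
        # Without search_analyzer, the index-time analyzer is used for searching too.
        search_analyzer = spec.get("search_analyzer", index_analyzer)
        if index_analyzer != search_analyzer:
            found[field] = (index_analyzer, search_analyzer)
    return found

properties = {
    # Copied from the mapping shown above.
    "manufacturer_name": {
        "type": "text",
        "fields": {"raw": {"type": "keyword", "normalizer": "lowercase_normalizer"}},
        "analyzer": "my_search_analyzer",
        "search_analyzer": "standard",
    },
    # Hypothetical field with default analysis, for contrast.
    "notes": {"type": "text"},
}

print(analyzer_mismatches(properties))
```

Only top-level fields are inspected here; sub-fields under "fields" and object properties would need a recursive walk.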

Related

Elasticsearch filter sub object before search

Let's say I have a document like this:
{
"id": 6,
"name": "some name",
"users": [
{
"id": 1,
"name": "User1",
"isEnabled": false
},
{
"id": 2,
"name": "User2",
"isEnabled": false
},
{
"id": 3,
"name": "User3",
"isEnabled": true
}
]
}
What I need is to return that document when a user searches for the name some name, but I also want to filter out all users that are not enabled, and if there are no enabled users, omit the document entirely.
I tried to use filters like this:
{
"query": {
"bool": {
"must": {
"match": {
"name": "some name"
}
},
"filter": {
"term": {
"users.isEnabled": true
}
}
}
}
}
but in that case I get the document with all of its users, whether or not each user is enabled. Is there a way to do this? I could filter all of that out in code after getting the data from Elasticsearch, but that can break pagination if I remove documents without enabled users from the result set.
I'm a bit new to Elasticsearch and so far I can't find how to do it. Thank you in advance!
Elasticsearch will return the whole document if there is any match. If you update your mapping and make the users array nested, you can achieve this by using inner hits. This is a basic example mapping that works:
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"users": {
"type": "nested"
}
}
}
}
If you send a query like the following, the response will contain id and name from the parent document, along with inner_hits holding the users that match your isEnabled query.
{
"_source": ["id", "name"],
"query": {
"bool": {
"must": [
{
"match": {
"name": "some name"
}
},
{
"nested": {
"path": "users",
"query": {
"term": {
"users.isEnabled": {
"value": true
}
}
},
"inner_hits": {}
}
}
]
}
}
}
This is an example response
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9375811,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9375811,
"_source" : {
"name" : "some name",
"id" : 6
},
"inner_hits" : {
"users" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.540445,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "users",
"offset" : 2
},
"_score" : 1.540445,
"_source" : {
"id" : 3,
"name" : "User3",
"isEnabled" : true
}
}
]
}
}
}
}
]
}
}
Then you can do the mapping in the application.
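That application-side step could look roughly like the Python sketch below. It assumes the response has already been parsed into a dict with the shape shown above; enabled_users_view is a made-up name, not part of any client library.

```python
def enabled_users_view(response):
    """Rebuild each hit as its parent fields plus only the users found in inner_hits."""
    documents = []
    for hit in response["hits"]["hits"]:
        inner = (
            hit.get("inner_hits", {})
            .get("users", {})
            .get("hits", {})
            .get("hits", [])
        )
        doc = dict(hit["_source"])  # id and name from the parent document
        doc["users"] = [inner_hit["_source"] for inner_hit in inner]
        documents.append(doc)
    return documents

# Trimmed copy of the example response above.
response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {"name": "some name", "id": 6},
                "inner_hits": {
                    "users": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "users", "offset": 2},
                                    "_source": {"id": 3, "name": "User3", "isEnabled": True},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(enabled_users_view(response))
```

Keep in mind that inner_hits returns only the top three matches per parent by default, so a "size" inside "inner_hits" is probably needed when a document can have more enabled users than that.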

Elasticsearch English stemming not working correctly

I've added an english stemmer analyzer and filter to our query but it doesn't seem to be working correctly with plurals stemming from 'y' => 'ies'.
For example, when I search 'raspberry' the results never include 'raspberries' and so on.
I've tried both english and minimal_english but I still get the same result.
Here's the analyzer and settings:
analysis: {
analyzer: {
custom_analyzer: {
type: "custom",
tokenizer: "standard",
filter: ["lowercase", "english_stemmer"],
},
},
filter: {
english_stemmer: {
type: "stemmer",
language: "english",
},
},
},
}
What am I doing wrong?
Although english should work for the example you mentioned, you can also use porter_stem instead, which is equivalent to stemmer with language english.
porter_stem in action:
POST /_analyze
{
"tokenizer": "standard",
"filter": ["porter_stem"],
"text": ["raspberry", "raspberries"]
}
Response of above request:
{
"tokens" : [
{
"token" : "raspberri",
"start_offset" : 0,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "raspberri",
"start_offset" : 10,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 101
}
]
}
You can see that both raspberry and raspberries are tokenized to raspberri. Therefore searching for raspberry will also match raspberries and vice versa.
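As a rough illustration of why both forms collapse to the same token, here is a Python sketch of just the two Porter rules involved: step 1a (IES -> I) and step 1c (Y -> I when the stem contains a vowel). It is not the full algorithm and not what Elasticsearch actually executes; plural_y_rules and has_vowel are names made up for this example.

```python
VOWELS = set("aeiou")

def has_vowel(stem):
    # Simplification: the Porter algorithm also counts "y" as a vowel when it
    # follows a consonant; that case is ignored here.
    return any(ch in VOWELS for ch in stem)

def plural_y_rules(word):
    word = word.lower()
    if word.endswith("ies"):
        # Step 1a: IES -> I
        return word[:-3] + "i"
    if word.endswith("y") and has_vowel(word[:-1]):
        # Step 1c: Y -> I, only when the remaining stem contains a vowel
        return word[:-1] + "i"
    return word

for word in ["raspberry", "raspberries"]:
    print(word, "->", plural_y_rules(word))
```

Both words end up as raspberri, the same token the _analyze response shows.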
Make sure that the field you are indexing and searching is mapped with custom_analyzer as its analyzer (per the settings in your question).
Working example:
Mapping:
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"english_stemmer"
]
}
},
"filter": {
"english_stemmer": {
"type": "stemmer",
"language": "english"
}
}
}
},
"mappings": {
"properties": {
"field1": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
Indexing:
PUT test/_doc/1
{
"field1": "raspberries"
}
PUT test/_doc/2
{
"field1": "raspberry"
}
Search:
GET test/_search
{
"query": {
"match": {
"field1": {
"query": "raspberry"
}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.18232156,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.18232156,
"_source" : {
"field1" : "raspberries"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.18232156,
"_source" : {
"field1" : "raspberry"
}
}
]
}
}
You can also have a look at other stemmer kstem.
Unfortunately, porter_stem doesn't always work, e.g. virus and viruses. Someone suggested snowball - but I haven't tried it yet...

Filter nested objects in ElasticSearch 6.8.1

I couldn't find any answers on how to do a simple thing in Elasticsearch 6.8: I need to filter nested objects.
Index
{
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "1"
}
},
"mappings": {
"human": {
"properties": {
"cats": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"breed": {
"type": "text"
},
"colors": {
"type": "integer"
}
}
},
"name": {
"type": "text"
}
}
}
}
}
Data
{
"name": "iridakos",
"cats": [
{
"colors": 1,
"name": "Irida",
"breed": "European Shorthair"
},
{
"colors": 2,
"name": "Phoebe",
"breed": "european"
},
{
"colors": 3,
"name": "Nino",
"breed": "Aegean"
}
]
}
I want to select the human with name="iridakos" and the cats whose breed contains 'European' (ignoring case).
Only two cats should be returned.
A million thanks for helping.
For nested datatypes, you need to use nested queries.
Elasticsearch always returns the entire document in the response. Note that with the nested datatype, every item in the list is treated as a document in its own right.
Hence, if in addition to the entire document you also want to know the exact hits, you need to use the inner_hits feature.
The query below should help.
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "iridakos"
}
},
{
"nested": {
"path": "cats",
"query": {
"match": {
"cats.breed": "european"
}
},
"inner_hits": {}
}
}
]
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.74455214,
"hits" : [
{
"_index" : "my_cat_index",
"_type" : "_doc",
"_id" : "1", <--- The document that hit
"_score" : 0.74455214,
"_source" : {
"name" : "iridakos",
"cats" : [
{
"colors" : 1,
"name" : "Irida",
"breed" : "European Shorthair"
},
{
"colors" : 2,
"name" : "Phoebe",
"breed" : "european"
},
{
"colors" : 3,
"name" : "Nino",
"breed" : "Aegean"
}
]
},
"inner_hits" : { <---- Note this
"cats" : {
"hits" : {
"total" : {
"value" : 2, <---- Count of nested doc hits
"relation" : "eq"
},
"max_score" : 0.52354836,
"hits" : [
{
"_index" : "my_cat_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "cats",
"offset" : 1
},
"_score" : 0.52354836,
"_source" : { <---- First Nested Document
"breed" : "european"
}
},
{
"_index" : "my_cat_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "cats",
"offset" : 0
},
"_score" : 0.39019167,
"_source" : { <---- Second Document
"breed" : "European Shorthair"
}
}
]
}
}
}
}
]
}
}
Note how the inner_hits section appears in the response; that is where you find the exact hits.
Hope this helps!
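Since the inner hits in the response above carry only breed in their _source, one way to recover the full cat objects is to use each inner hit's _nested.offset as an index into the parent's cats array. The Python sketch below assumes a response parsed into a dict with that shape; matching_cats is a made-up helper name.

```python
def matching_cats(response):
    """Map each parent _id to the full cat objects that matched the nested query."""
    result = {}
    for hit in response["hits"]["hits"]:
        cats = hit["_source"]["cats"]
        inner = hit["inner_hits"]["cats"]["hits"]["hits"]
        # offset is the position of the matching nested object in the cats array
        offsets = sorted(inner_hit["_nested"]["offset"] for inner_hit in inner)
        result[hit["_id"]] = [cats[offset] for offset in offsets]
    return result

# Trimmed copy of the response above.
sample = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {
                    "name": "iridakos",
                    "cats": [
                        {"colors": 1, "name": "Irida", "breed": "European Shorthair"},
                        {"colors": 2, "name": "Phoebe", "breed": "european"},
                        {"colors": 3, "name": "Nino", "breed": "Aegean"},
                    ],
                },
                "inner_hits": {
                    "cats": {
                        "hits": {
                            "hits": [
                                {"_nested": {"field": "cats", "offset": 1}},
                                {"_nested": {"field": "cats", "offset": 0}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(matching_cats(sample))
```

Sorting the offsets returns the cats in their original array order rather than in score order.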
You could use something like this:
{
"query": {
"bool": {
"must": [
{ "match": { "name": "iridakos" }},
{ "match": { "cats.breed": "European" }}
]
}
}
}
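To search on a cat's breed, you can use dot notation. Note that this only works when cats is a plain object field; when it is mapped as nested, as in the question, the field has to be queried through a nested query.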

How to do an exact match query in ElasticSearch?

I want to do an exact match query to an ElasticSearch index,
I have the following data -
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.21110919,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.21110919,
"_source" : {
"id" : 1,
"name" : "test"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.160443,
"_source" : {
"id" : 2,
"name" : "test two"
}
}
]
}
}
I want to query the field name.
I am trying to search for the name test,
but it returns both documents.
The expected result is only document 1.
Mapping is as follows -
{
"test" : {
"mappings" : {
"properties" : {
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
I tried the following -
GET /test/_search
{
"query": {
"bool": {
"must": {
"term" : {
"name": "test"
}
}
}
}
}
GET /test/_search
{
"query": {
"match": {
"name": "test"
}
}
}
In addition to the answer I linked in the comments, I would suggest defining the name field as:
{
"name":{
"type": "text",
"fields":{
"keyword":{
"type": "keyword"
}
}
}
}
and then query on the name.keyword field whenever you require an exact (case-sensitive) match, and on name when you want a partial match, such as a search on first name only.
It looks like you are using the text datatype for your name field, which splits test two into two tokens, test and two. It therefore matches your search query test: a match query is analyzed with the same analyzer, and the resulting tokens are matched against the document tokens present in the inverted index.
Solution using your example
Index def
{
"mappings": {
"properties": {
"name": {
"type": "keyword" --> note use of `keyword` type
}
}
}
}
Index your sample docs
{
"name" : "test two"
}
{
"name" : "test"
}
Search query same as yours
{
"query": {
"match": {
"name": "test"
}
}
}
Search results as you want
"hits": [
{
"_index": "so_key",
"_type": "_doc",
"_id": "1",
"_score": 0.6931471,
"_source": {
"name": "test"
}
}
]
Important note: you can use the analyze API to see how your data is indexed. For example,
using the standard (default) analyzer on the text field:
POST _analyze
{
"text": "test two",
"analyzer" : "standard" --> Change analyzer to keyword and see diff
}
Tokens
{
"tokens": [
{
"token": "test",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "two",
"start_offset": 5,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 1
}
]
}
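To see why the field type changes the result, here is a toy inverted index in Python. It is a sketch under simplifying assumptions: the text case is approximated by lowercasing and splitting on whitespace, the keyword case stores the value unchanged as a single term, and build_index is a made-up helper.

```python
from collections import defaultdict

def build_index(docs, as_keyword):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, name in docs.items():
        # keyword: one term per value; text: one term per word (approximation)
        terms = [name] if as_keyword else name.lower().split()
        for term in terms:
            index[term].add(doc_id)
    return index

docs = {1: "test", 2: "test two"}
text_index = build_index(docs, as_keyword=False)
keyword_index = build_index(docs, as_keyword=True)

print("text   :", sorted(text_index.get("test", set())))
print("keyword:", sorted(keyword_index.get("test", set())))
```

With the text-style index the term test points at both documents; with the keyword-style index it points only at document 1, because document 2 is stored under the single term test two.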

Elasticsearch Array (Label/Tag) Querying

I really think that what I'm trying to do is fairly simple. I'm simply trying to query for N tags. A clear example of this was asked and answered over at "Elasticsearch: How to use two different multiple matching fields?". Yet, that solution doesn't seem to work for the latest version of ES (more likely, I'm simply doing it wrong).
To show the current data and to demonstrate a working query, see below:
{
"query": {
"filtered": {
"filter": {
"terms": {
"Price": [10,5]
}
}
}
}
}
Here are the results for this. As you can see, 5 and 10 are showing up (this demonstrates that basic queries do work):
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 6,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [ {
"_index" : "labelsample",
"_type" : "entry",
"_id" : "AVLGnGMYXB5vRcKBZaDw",
"_score" : 1.0,
"_source" : {
"Category" : [ "Medium Signs" ],
"Code" : "a",
"Name" : "Sample 1",
"Timestamp" : 1.455031083799152E9,
"Price" : "10",
"IsEnabled" : true
}
}, {
"_index" : "labelsample",
"_type" : "entry",
"_id" : "AVLGnGHHXB5vRcKBZaDF",
"_score" : 1.0,
"_source" : {
"Category" : [ "Small Signs" ],
"Code" : "b",
"Name" : "Sample 2",
"Timestamp" : 1.45503108346191E9,
"Price" : "5",
"IsEnabled" : true
}
}, {
"_index" : "labelsample",
"_type" : "entry",
"_id" : "AVLGnGILXB5vRcKBZaDO",
"_score" : 1.0,
"_source" : {
"Category" : [ "Medium Signs" ],
"Code" : "c",
"Name" : "Sample 3",
"Timestamp" : 1.455031083530215E9,
"Price" : "10",
"IsEnabled" : true
}
}, {
"_index" : "labelsample",
"_type" : "entry",
"_id" : "AVLGnGGgXB5vRcKBZaDA",
"_score" : 1.0,
"_source" : {
"Category" : [ "Medium Signs" ],
"Code" : "d",
"Name" : "Sample 4",
"Timestamp" : 1.4550310834233E9,
"Price" : "10",
"IsEnabled" : true
}
}]
}
}
As a side note: the following bool query gives the exact same results:
{
"query": {
"bool": {
"must": [{
"terms": {
"Price": [10,5]
}
}]
}
}
}
Notice Category...
Let's simply copy/paste Category into a query:
{
"query": {
"filtered": {
"filter": {
"terms": {
"Category" : [ "Medium Signs" ]
}
}
}
}
}
This gives the following gem:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 6,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Again, here's the bool query version that gives the same 0-hit result:
{
"query": {
"bool": {
"must": [{
"terms": {
"Category" : [ "Medium Signs" ]
}
}]
}
}
}
In the end, I definitely need something similar to "Category" : [ "Medium Signs", "Small Signs" ] working (in concert with other label queries and minimum_should_match as well-- but I can't even get this bare-bones query to work).
I have zero clue why this is. I pored over the docs for hours, trying everything I could see. Do I need to look into debugging various encodings? Is my syntax archaic?
The problem here is that Elasticsearch is analyzing and tokenizing the Category field, while the terms filter expects an exact match. One solution here is to add a raw field to Category inside your entry mapping:
PUT labelsample
{
"mappings": {
"entry": {
"properties": {
"Category": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"Code": {
"type": "string"
},
"Name": {
"type": "string"
},
"Timestamp": {
"type": "date",
"format": "epoch_millis"
},
"Price": {
"type": "string"
},
"IsEnabled": {
"type": "boolean"
}
}
}
}
}
...and filter on the raw field:
GET labelsample/entry/_search
{
"query": {
"filtered": {
"filter": {
"terms": {
"Category.raw" : [ "Medium Signs" ]
}
}
}
}
}
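On the "is my syntax archaic?" point: the filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0, where a bool query with a filter clause takes its place (similarly, string with not_analyzed became the keyword type in 5.0). If the cluster is on one of those newer versions, a request body can be translated mechanically. The Python sketch below shows the idea; filtered_to_bool is a made-up helper and handles only the simple shape used in this thread.

```python
def filtered_to_bool(body):
    """Rewrite {"query": {"filtered": ...}} into the equivalent bool query."""
    filtered = body["query"]["filtered"]
    bool_query = {"filter": filtered["filter"]}
    if "query" in filtered:
        # The scoring part of a filtered query becomes a must clause.
        bool_query["must"] = filtered["query"]
    return {"query": {"bool": bool_query}}

old_body = {
    "query": {
        "filtered": {
            "filter": {
                "terms": {"Category.raw": ["Medium Signs"]}
            }
        }
    }
}

print(filtered_to_bool(old_body))
```

The filter clause keeps the non-scoring behaviour of the original filtered query.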
