Fuzziness behavior on a match_phrase query - elasticsearch

A few days ago I ran into this "problem". I was running a match_phrase query on my index. Everything worked as expected until I ran the same search with multi-word nouns (before, I was using single-word nouns, e.g. "university"). I made one misspelling and the search returned nothing; if I removed a word (say, the one that was spelled correctly), the search found the document.
Here is the example I made:
The settings
PUT index1
{
"mappings": {
"myType": {
"properties": {
"field1": {
"type": "string",
"analyzer": "standard"
}
}
}
}
}
POST index1/myType/1
{
"field1": "Commercial Banks"
}
Case 1: Single noun search
GET index1/myType/_search
{
"query": {
"match": {
"field1": {
"type": "phrase",
"query": "comersial",
"fuzziness": "AUTO"
}
}
}
}
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.19178303,
"hits": [
{
"_index": "index1",
"_type": "myType",
"_id": "1",
"_score": 0.19178303,
"_source": {
"field1": "Commercial Banks"
}
}
]
}
}
Case 2: Multiple noun search
GET index1/myType/_search
{
"query": {
"match": {
"field1": {
"type": "phrase",
"query": "comersial banks",
"fuzziness": "AUTO"
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
So, in the second case, why am I not finding the document with the match_phrase query? Is there something I am missing?
These results make me doubt what I thought I knew. Am I using fuzzy search incorrectly? I'm not sure whether this is a bug or whether I'm the one misunderstanding the behavior.
Many thanks in advance for reading my question; I hope you can help me with this.

Fuzziness is not supported in phrase queries.
Currently, ES is silent about it, i.e. it allows you to specify the parameter but doesn't warn you that it is not supported. A pull request (#18322) (related to issue #7764) exists that will remedy this problem. Once it is merged into ES 5, this query will error out.
In the breaking changes document for 5.0, we can see that this won't be supported:
The multi_match query will fail if fuzziness is used for cross_fields, phrase or phrase_prefix type. This parameter was undocumented and silently ignored before for these types of multi_match.
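If you need fuzzy matching on a multi-word query, one workaround (a sketch against the same index1 example from the question, not an official substitute for match_phrase) is a plain match query with fuzziness and operator set to and, which requires every term to match fuzzily but ignores word order and proximity:

GET index1/myType/_search
{
  "query": {
    "match": {
      "field1": {
        "query": "comersial banks",
        "operator": "and",
        "fuzziness": "AUTO"
      }
    }
  }
}

If term order and adjacency matter, a span_near query wrapping span_multi clauses around individual fuzzy queries can approximate a fuzzy phrase, at a noticeably higher query cost.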

Related

How do I use the whitespace analyzer correctly?

I am currently having an issue where I cannot search for UUIDs in my logs. For instance, I have a field named "log" that contains a full log line, for example:
"log": "time=\"2022-10-10T07:46:00Z\" level=info msg=\"message to endpoint (outgoing)\" message=\"{8503fb5a-3899-4305-8480-6ddc0f5df296 2022-10-10T09:45:59+02:00}\"\n",
I want to get this log in elastic search, and via Postman I send this:
{
"query": {
"match": {
"log": {
"analyzer": "whitespace",
"query": "8503fb5a-3899-4305-8480-6ddc0f5df296"
}
}
},
"size": 50,
"from": 0
}
As a response I get:
{
"took": 930,
"timed_out": false,
"num_reduce_phases": 2,
"_shards": {
"total": 581,
"successful": 581,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
But when I search for "8503fb5a" alone, I get the expected results. This means the dashes are still causing issues, but I thought using the whitespace analyzer would fix this? Am I doing something wrong?
These are the fields I have.
You are not required to use the whitespace analyzer here.
You have two options to search for the entire UUID.
First, you can use a match query with operator set to and:
{
"query": {
"match": {
"log":{
"query": "8503fb5a-3899-4305-8480-6ddc0f5df296",
"operator": "and"
}
}
}
}
Second, you can use a match_phrase query, which searches for the exact phrase:
{
"query": {
"match_phrase": {
"log": "8503fb5a-3899-4305-8480-6ddc0f5df296"
}
}
}
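The root cause here is an analyzer mismatch rather than the query type: with the default standard analyzer, the UUID is split on the dashes at index time, so a search-time whitespace analyzer produces one long token that never exists in the index. You can verify this with the _analyze API (a sketch; run it against your own cluster):

POST _analyze
{
  "analyzer": "standard",
  "text": "8503fb5a-3899-4305-8480-6ddc0f5df296"
}

This returns the separate tokens 8503fb5a, 3899, 4305, 8480, and 6ddc0f5df296, which is why the match query with operator and (or match_phrase) above succeeds while a single whitespace-analyzed token does not.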

Elasticsearch query with fuzziness AUTO not working as expected

From the Elasticsearch documentation regarding fuzziness:
AUTO
Generates an edit distance based on the length of the term. Low and high distance arguments may be optionally provided AUTO:[low],[high]. If not specified, the default values are 3 and 6, equivalent to AUTO:3,6 that make for lengths:
0..2
Must match exactly
3..5
One edit allowed
>5
Two edits allowed
However, when I am trying to specify low and high distance arguments in the search query the result is not what I am expecting.
I am using Elasticsearch 6.6.0 with the following index mapping:
{
"fuzzy_test": {
"mappings": {
"_doc": {
"properties": {
"description": {
"type": "text"
},
"id": {
"type": "keyword"
}
}
}
}
}
}
Inserting a simple document:
{
"id": "1",
"description": "hello world"
}
And the following search query:
{
"size": 10,
"timeout": "30s",
"query": {
"match": {
"description": {
"query": "helqo",
"fuzziness": "AUTO:7,10"
}
}
}
}
I assumed that fuzziness:AUTO:7,10 would mean that for the input term with length <= 6 only documents with the exact match will be returned. However, here is a result of my query:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.23014566,
"hits": [
{
"_index": "fuzzy_test",
"_type": "_doc",
"_id": "OQtUu2oBABnEwrgM3Ejr",
"_score": 0.23014566,
"_source": {
"id": "1",
"description": "hello world"
}
}
]
}
}
This is strange, but it seems that this bug exists only in Elasticsearch 6.6.0. I've tried 6.4.2 and 6.6.2, and both of them work just fine.
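For reference, the AUTO:[low],[high] thresholds described in the quoted documentation can be written down as a small Python sketch (the function name is mine, not part of Elasticsearch):

```python
def auto_fuzziness(term, low=3, high=6):
    """Edit distance Elasticsearch allows for a term under AUTO:low,high.

    Terms shorter than `low` must match exactly (0 edits),
    lengths from `low` up to `high - 1` allow one edit,
    and `high` or longer allow two edits.
    The defaults low=3, high=6 correspond to plain AUTO.
    """
    n = len(term)
    if n < low:
        return 0
    if n < high:
        return 1
    return 2
```

Under AUTO:7,10, the five-character term "helqo" falls below the low threshold, so only exact matches should be returned; that is exactly the behavior the question expected, and what the fixed versions (6.4.2, 6.6.2) deliver.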

fuzzy query in elasticsearch

I followed this tutorial.
I tried with 3 documents and it worked, but after I added 200 documents, just like in the tutorial ("text" and id), the search stopped returning results.
For example, I have:
{ "index": { "_id": 237 }}
{ "text": "EMCO" }
When I run my search:
GET /weef/dicoMot/_search
{
  "query": {
    "fuzzy": {
      "text": "EMCO"
    }
  }
}
I get this:
{
  "took": 36,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
Any suggestions?
You are using the standard analyzer, which includes the "lowercase" token filter,
so "EMCO" is indexed as "emco".
There are two ways to solve this problem:
Use the lowercase term to search and get the expected result:
GET /weef/dicoMot/_search
{
  "query": {
    "fuzzy": {
      "text": "emco"
    }
  }
}
Update the index analyzer to remove the lowercase filter.
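For the second option, the index settings would look something like this (a sketch; the analyzer name is illustrative, and the index must be recreated or reindexed for the change to take effect):

PUT /weef
{
  "settings": {
    "analysis": {
      "analyzer": {
        "no_lowercase": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": []
        }
      }
    }
  },
  "mappings": {
    "dicoMot": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "no_lowercase"
        }
      }
    }
  }
}

Note that without the lowercase filter, every search against this field becomes case-sensitive, so lowercasing the search term (the first option) is usually the simpler fix.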

Elastic Search fulltext search query and filters

I want to perform a full-text search, but I also want to apply one or more filters. This is the simplified structure of my document when searching with /things/_search?q=*foo*:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "things",
"_type": "thing",
"_id": "63",
"_score": 1,
"fields": {
"name": [
"foo bar"
],
"description": [
"this is my description"
],
"type": [
"inanimate"
]
}
}
]
}
}
This works well enough, but how do I combine filters with a query? Let's say I want to search for "foo" in an index with multiple documents, but I only want those with type == "inanimate".
This is my attempt so far:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*foo*"
}
},
"filter": {
"bool": {
"must": {
"term": { "type": "inanimate" }
}
}
}
}
}
}
When I remove the filter part, it returns an accurate set of document hits. But with this filter-definition it does not return anything, even though I can manually verify that there are documents with type == "inanimate".
Since you have not defined an explicit mapping, the type field is analyzed, while a term query looks for an exact match. You need to set "index": "not_analyzed" on the type field, and then your query will work.
This will give you correct documents
{
"query": {
"match": {
"type": "inanimate"
}
}
}
but this is not the real solution; you need to define an explicit mapping, as I said.
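The explicit mapping mentioned above would look something like this (a sketch using the pre-5.x string type that matches the question's era; existing documents must be reindexed after the mapping change):

PUT /things
{
  "mappings": {
    "thing": {
      "properties": {
        "type": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

With type indexed as a single unanalyzed token, the original term filter on "inanimate" matches exactly as written.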

How to request a single document by _id via alias?

Is it possible to request a single document by its id by querying an alias, provided that all keys across all indices in the alias are unique (it is an external guarantee)?
From Elasticsearch 5.1 the query looks like:
GET /my_alias_name/_search/
{
"query": {
"bool": {
"filter": {
"term": {
"_id": "AUwNrOZsm6BwwrmnodbW"
}
}
}
}
}
Yes, querying an alias spanning multiple indices works the same way as querying a single index.
Just do this query over the alias:
POST my_alias_name/_search
{
"filter":{
"term":{"_id": "AUwNrOZsm6BwwrmnodbW"}
}
}
EDIT: GET operations are not real searches and can't be done on aliases spanning multiple indexes. So the following query is in fact not permitted:
GET my_alias_name/my_type/AUwNrOZsm6BwwrmnodbW
Following the Elasticsearch 8.2 docs, you can retrieve a single document using the GET API:
GET my-index-000001/_doc/0
The 7.2 version docs suggest:
GET /_search
{
"query": {
"ids" : {
"values" : ["1", "4", "100"]
}
}
}
The response should look like this:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "indexwhatever",
"_type": "_doc",
"_id": "anyID",
"_score": 1.0,
"_source": {
"field1": "value1",
"field2": "value2"
}
}
]
}
}
In case you want to find a document by some application-level id field with curl:
curl -X GET 'localhost:9200/_search?q=id:42&pretty'
