Fuzzy query in Elasticsearch

I followed this tutorial.
I tried it with 3 documents and it worked, but after I added 200 documents in the same "text and id" format as the tutorial, my search no longer returns anything.
For example, I have { "index": { "_id": 237 }} { "text": "EMCO"}
When I run my search as:
GET /weef/dicoMot/_search {"query": { "fuzzy": {
"text": "EMCO" }}}
I got this:
{ "took": 36, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0},
"hits": { "total": 0, "max_score": null, "hits": [] }}
Any suggestions?

You are using the standard analyzer, which includes the "lowercase" token filter,
so "EMCO" is indexed as "emco". The fuzzy query is a term-level query, so the search text "EMCO" is not analyzed before it is compared with the indexed term.
There are two ways to solve this problem:
1. Search with the lowercase keyword to get the result:
GET /weef/dicoMot/_search {"query": { "fuzzy": {
"text": "emco" }}}
2. Update the index analyzer so that it does not use the lowercase filter.
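To see why the uppercase search misses, here is a minimal Python sketch, not Elasticsearch code. It assumes plain Levenshtein distance and the edit budget used by "fuzziness": "AUTO" (0 edits up to 2 characters, 1 edit for 3 to 5, 2 edits beyond that). The helper names levenshtein and auto_fuzziness are made up for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed for a term of this length under the AUTO setting."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


# str.lower() stands in for the lowercase token filter applied at index time.
indexed_term = "EMCO".lower()

for search_term in ("EMCO", "emco"):
    distance = levenshtein(search_term, indexed_term)
    allowed = auto_fuzziness(search_term)
    print(search_term, "distance:", distance, "allowed:", allowed,
          "match:", distance <= allowed)
```

Every character of "EMCO" differs in case from "emco", so each one counts as a substitution and the term falls outside the edit budget, while the lowercase search term is an exact match.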


How do I use the whitespace analyzer correctly?

I am currently having an issue where I cannot search for UUIDs in my logs. For instance, I have a field named "log" that contains a full log entry, for example:
"log": "time=\"2022-10-10T07:46:00Z\" level=info msg=\"message to endpoint (outgoing)\" message=\"{8503fb5a-3899-4305-8480-6ddc0f5df296 2022-10-10T09:45:59+02:00}\"\n",
I want to find this log in Elasticsearch, and via Postman I send this:
{
"query": {
"match": {
"log": {
"analyzer": "whitespace",
"query": "8503fb5a-3899-4305-8480-6ddc0f5df296"
}
}
},
"size": 50,
"from": 0
}
As a response I get:
{
"took": 930,
"timed_out": false,
"num_reduce_phases": 2,
"_shards": {
"total": 581,
"successful": 581,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
But when I search for "8503fb5a" alone, I get the wanted results. This suggests the dashes are still causing issues, but I thought using the whitespace analyzer would fix this. Am I doing something wrong?
These are the fields I have.
You are not required to use the whitespace analyzer.
You have two options for searching for the entire UUID.
First, you can use a match query with the operator set to "and":
{
"query": {
"match": {
"log":{
"query": "8503fb5a-3899-4305-8480-6ddc0f5df296",
"operator": "and"
}
}
}
}
Second, you can use a match_phrase query, which searches for the terms as an exact phrase:
{
"query": {
"match_phrase": {
"log": "8503fb5a-3899-4305-8480-6ddc0f5df296"
}
}
}
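The following Python sketch illustrates why the whitespace analyzer in the query finds nothing while the two options above do. It assumes the "log" field was indexed with the default standard analyzer, and it approximates that analyzer very roughly as "lowercased runs of letters and digits". The function names standard_like, whitespace and contains_phrase are invented for this example.

```python
import re


def standard_like(text):
    """Rough stand-in for the standard analyzer: alphanumeric runs, lowercased."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def whitespace(text):
    """Stand-in for the whitespace analyzer: split on whitespace only."""
    return text.split()


def contains_phrase(tokens, phrase):
    """True if the phrase tokens appear consecutively, in order, in tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


log = ('time="2022-10-10T07:46:00Z" level=info msg="message to endpoint (outgoing)" '
       'message="{8503fb5a-3899-4305-8480-6ddc0f5df296 2022-10-10T09:45:59+02:00}"\n')
uuid = "8503fb5a-3899-4305-8480-6ddc0f5df296"

indexed_tokens = standard_like(log)
indexed_terms = set(indexed_tokens)

# Whitespace analysis keeps the UUID as one term, which was never indexed.
print(whitespace(uuid), uuid in indexed_terms)

# Analyzing the query like the field yields five terms that are all indexed
# (match with operator "and") and that are adjacent (match_phrase).
query_terms = standard_like(uuid)
print(query_terms, all(term in indexed_terms for term in query_terms))
print(contains_phrase(indexed_tokens, query_terms))
```

The same reasoning explains why searching for "8503fb5a" alone succeeds: that fragment is one of the indexed terms.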

I want to use a wildcard query on a url field in Elasticsearch. I am using Elasticsearch 2.3.0.

My index looks like this:
GET pibtest1/_search
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 1,
"hits": [
{
"_index": "pibtest1",
"_type": "SearchTech",
"_id": "_update",
"_score": 1,
"_source": {
"script": "ctx._source.remove(\"wiki_collection\")"
}
},
{
"_index": "pibtest1",
"_type": "SearchTech",
"_id": "http://www.searchtechnologies.com/bundles/jquery?v=gOdOgfykTFJnypePAvGweyMPwl-krhx8ntIhefPKelg1",
"_score": 1,
"_source": {
"extension": {
"X-Parsed-By": "org.apache.tika.parser.DefaultParser",
"Content-Encoding": "ISO-8859-1",
"resourceName": "http://www.searchtechnologies.com/bundles/jquery?v=gOdOgfykTFJnypePAvGweyMPwl-krhx8ntIhefPKelg1"
},
"keywords": "keywords-NOT-PROVIDED",
"default_collection": true,
"wiki_collection": false,
"description": "description-NOT-PROVIDED",
"connectorSpecific": {
"discoveredBy": "http://www.searchtechnologies.com/",
"xslt": "false",
"pathFromSeed": "E",
"md5": "OKTGVLEWTE5V4PWXUBM2RK3KMQ"
},
"title": "Title-NOT-PROVIDED",
"url": "http://www.searchtechnologies.com/bundles/jquery?v=gOdOgfykTFJnypePAvGweyMPwl-krhx8ntIhefPKelg1",
"remove": "wiki_collection",
"UD": "http://www.searchtechnologies.com/bundles/jquery?v=gOdOgfykTFJnypePAvGweyMPwl-krhx8ntIhefPKelg1",
Now I want to use a wildcard query to search for the URLs that match a pattern (e.g. http://www.searchtechnologies.com/bundles).
This is my wildcard query:
GET pibtest1/_search
{
"query": {
"wildcard": {
"url": {
"value": "http://www.searchtechnologies.com/bundles*"
}
}
}
}
I am using the "*" wildcard, which matches any character sequence, but I am not getting any results. My output looks like this:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
I want my results to include the URLs that match the "http://www.searchtechnologies.com/bundles" pattern. Any help would be appreciated.
Based on the comments, your url field is an analyzed field. So when you insert data, it is tokenized as ["www.searchtechnologies.com", "v", "jquery", "gOdOgfykTFJnypePAvGweyMPwl", ...]. A wildcard query is compared against individual terms, so your query won't match this field.
1. Delete your index.
2. Insert a mapping that specifies the url field as not analyzed: {"index":"not_analyzed"}
3. Insert your data.
4. Run the wildcard query.
If you don't want to delete your index because of the downtime, check: https://www.elastic.co/blog/changing-mapping-with-zero-downtime
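As a rough illustration of the steps above, the Python sketch below builds a mapping body for the "SearchTech" type with a not_analyzed url field, then mimics term-by-term wildcard matching with the standard library's fnmatchcase. The token list for the analyzed case is hypothetical (loosely based on the tokens listed in the answer), and fnmatchcase only approximates the wildcard query, since it also gives "[" a special meaning.

```python
import json
from fnmatch import fnmatchcase

# Mapping body for Elasticsearch 2.x: keep the whole URL as a single term.
mapping = {
    "mappings": {
        "SearchTech": {
            "properties": {
                "url": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}
print(json.dumps(mapping, indent=2))

url = ("http://www.searchtechnologies.com/bundles/"
       "jquery?v=gOdOgfykTFJnypePAvGweyMPwl-krhx8ntIhefPKelg1")
pattern = "http://www.searchtechnologies.com/bundles*"

# Hypothetical terms produced when the field is analyzed.
analyzed_terms = ["http", "www.searchtechnologies.com", "bundles", "jquery",
                  "v", "godogfyktfjnypepavgweympwl", "krhx8ntihefpkelg1"]
# With not_analyzed, the complete URL is the only term.
not_analyzed_terms = [url]


def wildcard_hits(terms, pattern):
    """Return the terms that match the pattern as a whole."""
    return [term for term in terms if fnmatchcase(term, pattern)]


print(wildcard_hits(analyzed_terms, pattern))      # no single term matches
print(wildcard_hits(not_analyzed_terms, pattern))  # the full URL matches
```

The pattern has to match an entire term, which is why none of the short tokens qualify while the single unanalyzed URL does.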

How to get a new field in results for the searched query in Elasticsearch?

I'm using Elasticsearch 2.3.1. Below is my query:
GET pibtest1/_search?q=white
{
"size": 1,
"fields": ["U", "UE", "UD", "T"]
}
I get the following result after running the above query:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 85,
"max_score": 0.15116164,
"hits": [
{
"_index": "pibtest1",
"_type": "SearchTech",
"_id": "1",
"_score": 0.15116164,
"fields": {
"UE": [
"Some value1"
],
"U": [
"Some value2"
],
"T": [
"Some value3"
],
"UD": [
"Some value4"
]
}
}
]
}
}
As you can see in the results, Elasticsearch doesn't provide any information about the query that was searched. In my case, the query is "white". So is there any way to get the searched query ("white") in the result? For example, I would like to get something like this in the result:
"query": "white"
I checked the explain API of Elasticsearch. It does provide the details of how the score is computed, but it doesn't explicitly contain any field for the searched query. Thank you everyone.
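No answer is recorded for this question here. One possible workaround, sketched in Python under the assumption that the caller still has the query string it sent, is to attach that string to the parsed response on the client side. The function name attach_query and the sample response are made up for this illustration.

```python
def attach_query(response, query_text):
    """Return a copy of the parsed search response with the query text added."""
    enriched = dict(response)  # shallow copy, so the original dict is not modified
    enriched["query"] = query_text
    return enriched


# Trimmed-down stand-in for the response shown in the question.
response = {
    "took": 2,
    "timed_out": False,
    "hits": {"total": 85, "max_score": 0.15116164, "hits": []},
}

result = attach_query(response, "white")
print(result["query"])
```

This keeps the search request unchanged and only decorates what the client hands on to the rest of the application.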

Fuzziness behavior on a match_phrase query

A few days ago I ran into this "problem". I was running a match_phrase query against my index. Everything worked as expected until I ran the same search with a multi-word noun (before that I was using single-word nouns, e.g. university). With one word misspelled, the search did not work (not found); if I removed a word (say, the one that was spelled correctly), the search worked (found).
Here are the examples I made:
The settings
PUT index1
{
"mappings": {
"myType": {
"properties": {
"field1": {
"type": "string",
"analyzer": "standard"
}
}
}
}
}
POST index1/myType/1
{
"field1": "Commercial Banks"
}
Case 1: Single noun search
GET index1/myType/_search
{
"query": {
"match": {
"field1": {
"type": "phrase",
"query": "comersial",
"fuzziness": "AUTO"
}
}
}
}
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.19178303,
"hits": [
{
"_index": "index1",
"_type": "myType",
"_id": "1",
"_score": 0.19178303,
"_source": {
"field1": "Commercial Banks"
}
}
]
}
}
Case 2: Multiple noun search
GET index1/myType/_search
{
"query": {
"match": {
"field1": {
"type": "phrase",
"query": "comersial banks",
"fuzziness": "AUTO"
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
So, in the second case, why am I not finding the document when performing the match_phrase query? Is there something I am missing?
These results make me doubt what I know.
Am I using the fuzzy search incorrectly? I'm not sure whether this is a problem or whether I'm the one who does not understand the behavior.
Many thanks in advance for reading my question. I hope you can help me with this.
Fuzziness is not supported in phrase queries.
Currently, ES is silent about it, i.e. it lets you specify the parameter but doesn't warn you that it is not supported. A pull request (#18322), related to issue #7764, exists that will remedy this problem. Once it is merged into ES 5, this query will return an error.
In the breaking changes document for 5.0, we can see that this won't be supported:
The multi_match query will fail if fuzziness is used for cross_fields, phrase or phrase_prefix type. This parameter was undocumented and silently ignored before for these types of multi_match.
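If a typo-tolerant phrase search is still needed, one commonly suggested alternative is to combine span_near with span_multi clauses that wrap fuzzy queries. The Python sketch below only builds such a request body and has not been run against the cluster from the question. The helper name fuzzy_phrase is made up, and the lowercase-and-split step is an assumed stand-in for the field's analyzer, since span and fuzzy queries work on terms and do not analyze their input.

```python
import json


def fuzzy_phrase(field, text, fuzziness="AUTO"):
    """Build a span_near body whose clauses are fuzzy matches on each term."""
    terms = text.lower().split()  # stand-in for analysis of the query text
    clauses = [
        {"span_multi": {"match": {"fuzzy": {field: {"value": term,
                                                    "fuzziness": fuzziness}}}}}
        for term in terms
    ]
    # slop 0 and in_order True ask for adjacent terms in the given order.
    return {"query": {"span_near": {"clauses": clauses,
                                    "slop": 0,
                                    "in_order": True}}}


body = fuzzy_phrase("field1", "comersial banks")
print(json.dumps(body, indent=2))
```

The printed body could be sent to GET index1/myType/_search in place of the match query with "type": "phrase".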

How to filter out elements from an array that don't match the query?

Stack Overflow won't let me post that much example code, so I put it on gist.
So I have this index
with this mapping
here is a sample document I insert into newly created mapping
this is my query
GET products/paramSuggestions/_search
{
"size": 10,
"query": {
"filtered": {
"query": {
"match": {
"paramName": {
"query": "col",
"operator": "and"
}
}
}
}
}
}
This is the unwanted result I get from the previous query:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.33217794,
"hits": [
{
"_index": "products",
"_type": "paramSuggestions",
"_id": "1",
"_score": 0.33217794,
"_source": {
"productName": "iphone 6",
"params": [
{
"paramName": "color",
"value": "white"
},
{
"paramName": "capacity",
"value": "32GB"
}
]
}
}
]
}
}
And finally the wanted result, i.e. how I want the query result to look:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.33217794,
"hits": [
{
"_index": "products",
"_type": "paramSuggestions",
"_id": "1",
"_score": 0.33217794,
"_source": {
"productName": "iphone 6",
"params": [
{
"paramName": "color",
"value": "white"
}
]
}
}
]
}
}
What should the query look like to achieve the wanted result, with the array field filtered down to the items that match the query? In other words, all non-matching array items should not appear in the final result.
The final result is the _source document that you indexed. There is no feature that lets you mask field elements of your document out of the Elasticsearch response.
That said, depending on your goal, you can look into how Highlighters and Suggesters identify result terms matching the query, or possibly, roll-your-own client-side masking using info returned from setting "explain": true in your query.
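A minimal sketch of the roll-your-own, client-side masking idea follows, in Python. Because the mapping lives in the gist and is not shown here, the sketch assumes that a case-insensitive prefix test on paramName is a good enough stand-in for how "col" matched "color". The function name mask_params is invented for this example.

```python
import copy


def mask_params(response, query_text):
    """Return a copy of the response keeping only params whose name matches."""
    masked = copy.deepcopy(response)  # leave the original response untouched
    needle = query_text.lower()
    for hit in masked["hits"]["hits"]:
        source = hit["_source"]
        source["params"] = [
            param for param in source.get("params", [])
            if param["paramName"].lower().startswith(needle)  # assumed match rule
        ]
    return masked


# Trimmed-down version of the unwanted result from the question.
response = {
    "hits": {
        "total": 1,
        "hits": [{
            "_id": "1",
            "_source": {
                "productName": "iphone 6",
                "params": [
                    {"paramName": "color", "value": "white"},
                    {"paramName": "capacity", "value": "32GB"},
                ],
            },
        }],
    }
}

masked = mask_params(response, "col")
print(masked["hits"]["hits"][0]["_source"]["params"])
```

The matching rule lives in one place, so it can be swapped for whatever comparison mirrors the real analyzer.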
