Exact match in Elasticsearch after incorporating hunspell filter

We have added the hunspell filter to our Elasticsearch instance. Nothing fancy...
{
  "index" : {
    "analysis" : {
      "tokenizer" : {
        "comma" : {
          "type" : "pattern",
          "pattern" : ","
        }
      },
      "filter": {
        "en_GB": {
          "type": "hunspell",
          "language": "en_GB"
        }
      },
      "analyzer" : {
        "comma" : {
          "type" : "custom",
          "tokenizer" : "comma"
        },
        "en_GB": {
          "filter": [
            "lowercase",
            "en_GB"
          ],
          "tokenizer": "standard"
        }
      }
    }
  }
}
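For reference, the _analyze API shows what the en_GB analyzer emits for a given input (a sketch; my_index stands in for our index, and the JSON body form assumes ES 5+):
GET /my_index/_analyze
{
  "analyzer": "en_GB",
  "text": "lace lacy"
}
If both terms map to the same stem, that explains the equal-score matches described below.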
Now, though, we seem to have lost the built-in facility to do exact-match queries using quotation marks. Searching for "lace", for example, also runs an equal-score search for "lacy". I understand this is rather the point of including hunspell, but I would like to be able to force exact matches by using quotes.
I am doing boolean queries for this, by the way, along these lines (in Java):
"bool" : {
"must" : {
"query_string" : {
"query" : "\"lace\"",
"fields" :
...
or (Postman, direct to port 9200):
{
  "query" : {
    "query_string" : {
      "query" : "\"lace\"",
      "fields" :
      ....
Is this possible? I'm guessing this might be something we would do in the tokenizer, but I'm not quite sure where to start.

You will not be able to handle this at the tokenizer level, but you can tweak the configuration at the mapping level to use multi-fields: keep a copy of the same field that is not analyzed, and use that copy in queries to support your use case.
You can update your mappings like the following:
"mappings": {
"desc": {
"properties": {
"labels": {
"type": "string",
"analyzer": "en_GB",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
Then modify your query to search on the raw field instead of the analyzed field:
{
  "query": {
    "bool": {
      "must": [{
        "query_string": {
          "default_field": "labels.raw",
          "query": "lace"
        }
      }]
    }
  }
}
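If you want the quotes themselves to trigger the exact-match behaviour, query_string also has a quote_field_suffix parameter that routes quoted text to a sub-field; a minimal sketch, assuming the labels/labels.raw multi-field above (check availability on your version):
{
  "query": {
    "query_string": {
      "fields": ["labels"],
      "quote_field_suffix": ".raw",
      "query": "\"lace\""
    }
  }
}
With this, unquoted terms are analyzed against labels, while quoted phrases go to labels.raw.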
Hope this helps
Thanks

Related

How can I use query_string to match both nested and non-nested fields at the same time?

I have an index with a mapping something like this:
"email" : {
"type" : "nested",
"properties" : {
"from" : {
"type" : "text",
"analyzer" : "lowercase_keyword",
"fielddata" : true
},
"subject" : {
"type" : "text",
"analyzer" : "lowercase_keyword",
"fielddata" : true
},
"to" : {
"type" : "text",
"analyzer" : "lowercase_keyword",
"fielddata" : true
}
}
},
"textExact" : {
"type" : "text",
"analyzer" : "lowercase_standard",
"fielddata" : true
}
I want to use query_string to search for matches in both the nested and the non-nested field at the same time, e.g.
email.to:foo@example.com AND textExact:bar
But I can't figure out how to write a query that will search both fields at once. The following doesn't work, because query_string searches do not return nested documents:
"query": {
"query_string": {
"fields": [
"textExact",
"email.to"
],
"query": "email.to:foo#example.com AND textExact:bar"
}
}
I can write a separate nested query, but that will only search against nested fields. Is there any way I can use query_string to match both nested and non-nested fields at the same time?
I am using Elasticsearch 6.8. Cross-posted on the Elasticsearch forums.
Nested documents can only be queried with the nested query.
You can follow one of the two approaches below.
1. Combine a nested query and a normal query in a must clause, which works like an "and" across the different queries.
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "email",
            "query": {
              "term": {
                "email.to": "foo@example.com"
              }
            }
          }
        },
        {
          "match": {
            "textExact": "bar"
          }
        }
      ]
    }
  }
}
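If you would rather keep the query_string syntax from the question, a query_string can also be used inside the nested query; a sketch of the same combination:
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "email",
            "query": {
              "query_string": {
                "query": "email.to:foo@example.com"
              }
            }
          }
        },
        {
          "query_string": {
            "query": "textExact:bar"
          }
        }
      ]
    }
  }
}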
2. copy_to
The copy_to parameter allows you to copy the values of multiple fields into a group field, which can then be queried as a single field.
{
  "mappings": {
    "properties": {
      "textExact": {
        "type": "text"
      },
      "to_email": {
        "type": "keyword"
      },
      "email": {
        "type": "nested",
        "properties": {
          "to": {
            "type": "keyword",
            "copy_to": "to_email" --> copies to the non-nested field
          },
          "from": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Query
{
  "query": {
    "query_string": {
      "fields": [
        "textExact",
        "to_email"
      ],
      "query": "to_email:foo@example.com AND textExact:bar"
    }
  }
}
Result
"_source" : {
"textExact" : "bar",
"email" : [
{
"to" : "sdfsd#example.com",
"from" : "a#example.com"
},
{
"to" : "foo#example.com",
"from" : "sdfds#example.com"
}
]
}

Match fails in Elasticsearch

I have the following index, in which I index e-mail addresses.
PUT _myindex
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "email" : {
          "type" : "pattern_capture",
          "preserve_original" : true,
          "patterns" : [
            "^(.*?)@",
            "(\\w+(?=.*@))"
          ]
        }
      },
      "analyzer" : {
        "email" : {
          "tokenizer" : "uax_url_email",
          "filter" : [ "lowercase", "email", "unique" ]
        }
      }
    }
  },
  "mappings": {
    "emails": {
      "properties": {
        "email": {
          "type": "text",
          "analyzer": "email"
        }
      }
    }
  }
}
My e-mails have the following form: "example.elastic@yahoo.com". When I index them, they get analysed into example.elastic@yahoo.com, example.elastic, elastic, and example.
When I run a match query
GET _myindex/_search
{
  "query": {
    "match": {
      "email": "example.elastic@yahoo.com"
    }
  }
}
or use example, elastic, or Elastic as the query string, it works and retrieves results. But the problem is that when I search for "example.elastic.blabla@yahoo.com", it also returns the same results. What can be the problem?
Using a term query instead of a match query will solve this.
The reason is that the match query applies the analyzer to the search term and will therefore match what is stored in the index, while the term query does not apply any analysis to the search term, so it will only look for that exact term in the index.
Ref: https://stackoverflow.com/a/23151332/6546289
GET _myindex/_search
{
  "query": {
    "term": {
      "email": "example.elastic@yahoo.com"
    }
  }
}
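To see why the match query behaves this way, you can inspect what the email analyzer produces for the longer address; a quick sketch with the _analyze API:
GET _myindex/_analyze
{
  "analyzer": "email",
  "text": "example.elastic.blabla@yahoo.com"
}
The output will include tokens like example and elastic, which overlap with the tokens indexed for example.elastic@yahoo.com; since match analyzes its input, those shared tokens produce the same hits.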

Undesired Stopwords in Elasticsearch

I am using Elasticsearch 6. This is the query:
PUT /semtesttest
{
  "settings": {
    "index" : {
      "analysis" : {
        "filter": {
          "my_stop": {
            "type": "stop",
            "stopwords_path": "analysis1/stopwords.csv"
          },
          "synonym" : {
            "type" : "synonym",
            "synonyms_path" : "analysis1/synonym.txt"
          }
        },
        "analyzer" : {
          "my_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["synonym", "my_stop"]
          }
        }
      }
    }
  },
  "mappings": {
    "all_questions": {
      "dynamic": "strict",
      "properties": {
        "kbaid": {
          "type": "integer"
        },
        "answer": {
          "type": "text"
        },
        "question": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
PUT /semtesttest/all_questions/1
{
  "question": "this is hippie"
}
GET /semtesttest/all_questions/_search
{
  "query": {
    "fuzzy": { "question": { "value": "hippie", "fuzziness": 2 } }
  }
}
GET /semtesttest/all_questions/_search
{
  "query": {
    "fuzzy": { "question": { "value": "this is", "fuzziness": 2 } }
  }
}
In synonym.txt I have:
this, that, money => sainai
In stopwords.csv I have:
hello
how
are
you
The first GET ('hippie') returns nothing; only the second GET ('this is') returns results.
What is the problem? It looks like the stop word "this is" is being filtered out in the first query, but I have specified my stop words explicitly.
fuzzy is a term-level query. It is not going to analyze the input, so your query was looking for the exact term this is (applying some fuzzy fun).
So you either want to build a query off those two terms, or use a full-text query instead. If fuzziness is important, I think the only full-text query that supports it is match:
GET /semtesttest/all_questions/_search?pretty
{
  "query": {
    "match": { "question": { "query": "this is", "fuzziness": 2 } }
  }
}
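For the first option, building a query off the two individual terms could look like the sketch below. Note that fuzzy still skips analysis, so the values must be the terms as they exist in the index; with your synonym filter, this was rewritten to sainai at index time:
GET /semtesttest/all_questions/_search
{
  "query": {
    "bool": {
      "must": [
        { "fuzzy": { "question": { "value": "sainai", "fuzziness": 2 } } },
        { "fuzzy": { "question": { "value": "is", "fuzziness": 2 } } }
      ]
    }
  }
}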
If matching phrases is important, you may want to look at this answer and work with span queries.
This might also help you see how your analyzer is being used (on Elasticsearch 6 the _analyze parameters go in the request body):
GET /semtesttest/_analyze
{
  "field": "question",
  "text": "this is"
}

How do I search for partial accented keyword in elasticsearch?

I have the following elasticsearch settings:
"settings": {
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":["lowercase", "asciifolding"]
}
}
}
}
}
The above works fine for the following keywords:
Beyoncé
Céline Dion
The above data is stored in Elasticsearch as beyonce and celine dion respectively.
I can search for Celine or Celine Dion without the accent and get the same results. However, the moment I search for Céline, I don't get any results. How can I configure Elasticsearch to search for partial keywords with the accent?
The query body looks like:
{
  "track_scores": true,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": ["name"],
            "type": "phrase",
            "query": "Céline"
          }
        }
      ]
    }
  }
}
and the mapping is
"mappings" : {
"artist" : {
"properties" : {
"name" : {
"type" : "string",
"fields" : {
"orig" : {
"type" : "string",
"index" : "not_analyzed"
},
"simple" : {
"type" : "string",
"analyzer" : "analyzer_keyword"
}
},
}
I would suggest this mapping and then go from there:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "analyzer_keyword"
        }
      }
    }
  }
}
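With this mapping, a plain match query should apply the same lowercase/asciifolding at query time; a sketch, assuming the index is named test as in the mapping:
GET /test/_search
{
  "query": {
    "match": {
      "name": "Céline"
    }
  }
}
Both Céline and Celine should then analyze to celine and find the document.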
Confirm that the same analyzer is getting used at query time. Here are some possible reasons why that might not be happening:
- you specify a separate analyzer at query time on purpose that does not perform similar analysis (see the sketch after this list for pinning the analyzer explicitly)
- you are using a term or terms query, to which no analyzer is applied (see the Term Query docs and the section titled "Why doesn't the term query match my document?")
- you are using a query_string query (e.g. see Simple Query String Query); I have found that when multiple fields with different analyzers are specified, I have needed to split them into separate queries and set the analyzer parameter explicitly (working with version 2.0)
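For the first point, full-text queries such as match accept an analyzer parameter, so you can pin the query-time analysis explicitly; a sketch against the name field above:
{
  "query": {
    "match": {
      "name": {
        "query": "Céline",
        "analyzer": "analyzer_keyword"
      }
    }
  }
}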

Bidirectional match in Elasticsearch

I've indexed a list of terms, and now I want to query for some of them.
Say that I've indexed 'dog food', 'red dog', 'dog', 'food', 'cats'.
How do I create an exact bidirectional match query? That is, when searching for 'dog', I want to get only the term dog and not the other terms (because they don't match back).
One primitive solution I thought of is indexing the terms with their length (word-wise) and then, when searching with a query of length X, restricting it to terms of length X. But that seems overcomplicated.
Create a custom analyzer to lowercase and normalize your search terms. This would be your index:
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "my_analyzer_keyword" : {
          "type" : "custom",
          "tokenizer" : "keyword",
          "filter" : [
            "asciifolding",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings" : {
    "your_type" : {
      "properties" : {
        "name" : {
          "type" : "string",
          "analyzer" : "my_analyzer_keyword"
        }
      }
    }
  }
}
So if you have indexed 'dog' and a user types in Dog, dog, or DOG, it will match only dog; 'dog food' won't be brought back.
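For example, a match query on that field runs the same my_analyzer_keyword analysis on the input, so the whole query string is compared as one lowercased, folded term (a sketch; your_index is a placeholder):
GET /your_index/_search
{
  "query": {
    "match": {
      "name": "DOG"
    }
  }
}
This returns only the document whose name analyzed to exactly dog.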
Just set your field's index property to not_analyzed, and have your query use a term filter to search for the text.
As per Evaldas' suggestion, find below a more complete solution that also keeps the original value indexed with the standard analyzer but adds a sub-field with a lowercased version of the terms:
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword_lowercase_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "asset": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "case_ignore": {
              "type": "string",
              "analyzer": "my_keyword_lowercase_analyzer"
            }
          }
        }
      }
    }
  }
}
POST /test/asset/1
{
  "name": "dog"
}
POST /test/asset/2
{
  "name": "dog food"
}
POST /test/asset/3
{
  "name": "red dog"
}
GET /test/asset/_search
{
  "query": {
    "match": {
      "name.case_ignore": "Dog"
    }
  }
}
