Search results ordered by search-text-length/match length - elasticsearch

I have this simple mapping:
PUT testindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "edgeNGram"]
        }
      },
      "filter": {
        "ngram": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "ngram_analyzer"
        }
      }
    }
  }
}
With these values:
PUT testindex/test/1
{"name" : "Power"}
PUT testindex/test/2
{"name" : "Pow"}
PUT testindex/test/3
{"name" : "PowerMax"}
PUT testindex/test/4
{"name" : "PowerRangers"}
And searched this:
GET testindex/test/_search
{
  "query": {
    "match": {
      "name": "Po"
    }
  }
}
And got:
PowerRangers
Power
Pow
PowerMax
All with the same score of 0.2876821
Clearly, the closest result to "Po" is "Pow", which I would expect to be returned first; but it isn't.
How should I modify my mapping to get this behavior?

I think script-based sorting is the solution, but it comes with a performance drawback; see the Elasticsearch documentation on script-based sorting for more about this. The query you can use is this:
GET testindex/test/_search
{
  "query": {
    "match": {
      "name": "Po"
    }
  },
  "sort": {
    "_script": {
      "script": "_source['name'].length()",
      "type": "number",
      "order": "asc"
    }
  }
}
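If script-based sorting turns out to be too slow, an alternative sketch (assuming you can reindex; the name_length field here is hypothetical and not part of the original mapping) is to store the length at index time and sort on it directly:
PUT testindex/_mapping/test
{
  "properties": {
    "name_length": { "type": "integer" }
  }
}
PUT testindex/test/2
{"name" : "Pow", "name_length" : 3}
GET testindex/test/_search
{
  "query": {
    "match": {
      "name": "Po"
    }
  },
  "sort": [
    { "name_length": "asc" }
  ]
}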

Related

Elasticsearch : using fuzzy search to find abbreviations

I have indexed textual articles that mention company names, like apple and lemonade, and I am trying to search for these companies using their abbreviations, like APPL and LMND. However, fuzzy search is giving other results; for example, searching for LMND returns "land", which is mentioned in the text, but it does not return "lemonade", whichever parameters I tried.
First question
Is fuzzy search a suitable solution for this kind of search?
Second question
What would be good parameter value ranges to support my problem?
UPDATE
I have tried a synonym filter:
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonyms_filter": {
            "type": "synonym",
            "synonyms": [
              "apple,APPL",
              "lemonade,LMND"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonyms_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "transcript_data": {
        "properties": {
          "words": {
            "type": "nested",
            "properties": {
              "word": {
                "type": "text",
                "search_analyzer": "synonym_analyzer"
              }
            }
          }
        }
      }
    }
  }
}
and for the search I used:
{
  "_source": false,
  "query": {
    "nested": {
      "path": "transcript_data.words",
      "query": {
        "match": {
          "transcript_data.words.word": "lmnd"
        }
      }
    }
  }
}
but it's not working.
I believe that the best option for you is to use synonyms; they do exactly what you need.
I'll leave an example and a link to an article explaining some details.
PUT teste
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonyms_filter": {
            "type": "synonym",
            "synonyms": [
              "apple,APPL",
              "lemonade,LMND"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonyms_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "transcript_data": {
        "properties": {
          "words": {
            "type": "nested",
            "properties": {
              "word": {
                "type": "text",
                "analyzer": "synonym_analyzer"
              }
            }
          }
        }
      }
    }
  }
}
POST teste/_bulk
{"index":{}}
{"transcript_data": {"words":{"word":"apple"}}}
GET teste/_search
{
  "query": {
    "nested": {
      "path": "transcript_data.words",
      "query": {
        "match": {
          "transcript_data.words.word": "appl"
        }
      }
    }
  }
}
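To verify the synonym expansion, the _analyze API shows how a query term is tokenized by the analyzer defined above (a quick sanity check, not part of the original answer):
GET teste/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": "APPL"
}
This should return both an appl and an apple token, which is why the match query above can find the indexed document.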

How to search aggregations in ES?

I have a books index which contains an array of tags (with both text/keyword types). I'd like to offer autocomplete for tags, so users type "ro" and it returns "romance" or "rock and roll".
Here's my mapping:
/books {
  ...
  tags: {
    type: 'text',
    fields: {
      keyword: { type: 'keyword' }
    }
  }
}
Example book
{ name: "foo", tags: ['romance', 'story', 'fiction'] }
My aggregation for tags:
{
  size: 0,
  aggregations: {
    options: {
      terms: {
        field: 'tags.keyword',
        size: 20
      }
    }
  }
}
How can I only get all distinct tags that match "ro"?
Simply try:
GET book/_search
{
  "query": {
    "prefix": {
      "tags.keyword": "ro"
    }
  },
  "size": 0,
  "aggs": {
    "options": {
      "terms": {
        "field": "tags.keyword",
        "size": 20
      }
    }
  }
}
But for your use case I suggest building a custom analyzer with an edge_ngram filter, like this:
"tags": {
"type": "text",
"analyzer": "english_custom",
"fields": {
"suggester": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
},
"keyword":{
"type": "keyword" }
}
The autocomplete analyzer should be something like this:
{"filter":{
....
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 8
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
EDIT:
You could also play with the include clause in the terms aggregation:
GET /_search
{
  "aggs": {
    "tags": {
      "terms": {
        "field": "tags.keyword",
        "include": "ro.*"
      }
    }
  }
}
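The two ideas can also be combined (a sketch based on the queries above: the prefix query limits which documents are aggregated, while include limits which terms become buckets; note that include takes a regular expression that must match the whole term and is case-sensitive):
GET book/_search
{
  "size": 0,
  "query": {
    "prefix": {
      "tags.keyword": "ro"
    }
  },
  "aggs": {
    "options": {
      "terms": {
        "field": "tags.keyword",
        "size": 20,
        "include": "ro.*"
      }
    }
  }
}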

Wildcard / regexp in a phrase which has space

Create an index:
Here I am using edge_ngram:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "keyword",
          "fields": {
            "raw": {
              "type": "text",
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}
POST my_index/my_type/1
{
  "text": "2 #Quick Foxes lived and died"
}
POST my_index/my_type/2
{
  "text": "2 #Quick Foxes lived died"
}
Now when we search
GET my_index/my_type/_search
{
  "query": {
    "query_string": {
      "default_operator": "AND",
      "query": "f* d*",
      "fields": ["text.raw"]
    }
  }
}
Only ID 2 should be listed, but nothing is returned.
When you try this:
GET my_index/my_type/_search
{
  "query": {
    "query_string": {
      "default_operator": "AND",
      "query": "f* d*",
      "fields": ["text"]
    }
  }
}
It returns both.
If we have an index with a huge amount of data and we want to search with wildcards, how should we do it?
A single keyword works, but if we add phrases like the one mentioned in the example, it won't give any proper result.
To build a regular expression you can use these websites:
Generate a regex expression here: http://buildregex.com/
and test your string against the generated expression here: https://regex101.com/
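Keep in mind that Elasticsearch regexp and wildcard queries run against individual terms, so to apply one expression across a phrase that contains spaces you need an un-analyzed (keyword) field. A rough sketch against the keyword text field from the mapping above (the pattern is only an illustration; regular expressions here are case-sensitive and must match the entire field value):
GET my_index/my_type/_search
{
  "query": {
    "regexp": {
      "text": ".*[Ff].* [Dd].*"
    }
  }
}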

Elasticsearch unordered partial phrase matching with ngram

Maybe I am going down the wrong route, but I am trying to set up Elasticsearch to do partial phrase matching that matches parts of words, in any order, within a sentence.
Eg. I have the following input
test name
tester name
name test
namey mcname face
test
And I hope to do a search for "test name" (or "name test") and have all of these returned (hopefully sorted in order of score). I can do partial searches, and I can also do out-of-order searches, but I am not able to combine the two. I am sure this is a very common issue.
Below is my Settings
{
  "myIndex": {
    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "mynGram": {
              "type": "nGram",
              "min_gram": "2",
              "max_gram": "5"
            }
          },
          "analyzer": {
            "custom_analyser": {
              "filter": [
                "lowercase",
                "mynGram"
              ],
              "type": "custom",
              "tokenizer": "my_tokenizer"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "type": "nGram",
              "min_gram": "2",
              "max_gram": "5"
            }
          }
        }
      }
    }
  }
}
My mapping
{
  "myIndex": {
    "mappings": {
      "myIndex": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            },
            "analyzer": "custom_analyser"
          }
        }
      }
    }
  }
}
And my query
{
  "query": {
    "bool": {
      "must": [{
        "match_phrase": {
          "name": {
            "query": "test name",
            "slop": 5
          }
        }
      }]
    }
  }
}
Any help would be greatly appreciated.
Thanks in advance
Not sure if you found your solution - I bet you did, because this is such an old post - but I was on the hunt for the same thing and found this: Query-Time Search-as-you-type.
Look up slop.
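For reference, a minimal sketch of the query-time approach described there, applied to the mapping from the question (match_phrase_prefix treats the last term as a prefix, and the slop allows the words to match out of order; the parameter values are only examples):
GET myIndex/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": {
        "query": "name test",
        "slop": 5,
        "max_expansions": 50
      }
    }
  }
}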

Elasticsearch phrase suggester prefix phonetic differences

I was wondering if there is any way for the phrase suggester to correct prefix spelling mistakes that come from phonetic differences.
Elasticsearch 5.1.2
Testing in Kibana 5.1.2
For Example:
Instead of "circus" someone wrote "sircus", or instead of "coding" someone wrote "koding".
The funny thing is that instead of "phrase" you can write "frase" and get a suggestion.
Here is my setup.
Settings:
PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "suggests_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "shingle_filter"
          ],
          "type": "custom"
        },
        "reverse": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["standard", "reverse"]
        }
      },
      "filter": {
        "shingle_filter": {
          "min_shingle_size": 2,
          "max_shingle_size": 5,
          "type": "shingle"
        }
      }
    }
  },
  "mappings": {
    "test_type": {
      "properties": {
        "suggest_field": {
          "type": "text",
          "analyzer": "suggests_analyzer",
          "fields": {
            "reverse": {
              "type": "text",
              "analyzer": "reverse"
            }
          }
        }
      }
    }
  }
}
Some documents:
POST test_index/test_type/_bulk
{"index":{}}
{ "suggest_field": "phrase"}
{"index":{}}
{ "suggest_field": "Circus"}
{"index":{}}
{ "suggest_field": "Coding"}
Querying:
POST test_index/_search
{
  "suggest": {
    "text": "sircus",
    "simple_phrase": {
      "phrase": {
        "field": "suggest_field",
        "max_errors": 0.9,
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        },
        "direct_generator": [{
          "field": "suggest_field",
          "suggest_mode": "always"
        }, {
          "field": "suggest_field.reverse",
          "suggest_mode": "always",
          "pre_filter": "reverse",
          "post_filter": "reverse"
        }]
      }
    }
  }
}
Also, I repeat the following steps a few times (between 5 and 10) without changing anything:
delete the index
put the index, settings & mappings
add the documents
query ("codign")
Sometimes I get suggestions and sometimes I don't. Is there any explanation for this?
Try setting "prefix_length": 0 in the direct_generator.
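Applied to the query above, that would look something like this (only the first direct_generator is changed; prefix_length defaults to 1, which requires the first character of a candidate to match the input, so lowering it to 0 lets the suggester correct the first letter, e.g. "sircus" to "circus"):
POST test_index/_search
{
  "suggest": {
    "text": "sircus",
    "simple_phrase": {
      "phrase": {
        "field": "suggest_field",
        "max_errors": 0.9,
        "direct_generator": [{
          "field": "suggest_field",
          "suggest_mode": "always",
          "prefix_length": 0
        }, {
          "field": "suggest_field.reverse",
          "suggest_mode": "always",
          "pre_filter": "reverse",
          "post_filter": "reverse"
        }]
      }
    }
  }
}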
