Elasticsearch phrase suggester not removing spaces from text

Elasticsearch suggester not removing unwanted spaces.
Query used:
POST /_search
{
  "_source": false,
  "suggest": {
    "text": "mega polis",
    "simple_phrase": {
      "phrase": {
        "field": "address.phonetic",
        "size": 5,
        "confidence": 1,
        "max_errors": 3,
        "gram_size": 2,
        "analyzer": "trigram",
        "direct_generator": [
          {
            "suggest_mode": "always",
            "field": "address.phonetic",
            "size": 10,
            "prefix_length": 0
          }
        ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
I have indexed the term megapolis, but if I enter the search text mega polis it is not corrected to megapolis.
I have used an ngram analyzer with min_shingle_size=2 and max_shingle_size=3.
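For context, a trigram analyzer matching those shingle sizes would be defined roughly like this (a sketch following the phrase suggester documentation; the exact definition is not shown here):

"settings": {
  "analysis": {
    "filter": {
      "shingle": {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 3
      }
    },
    "analyzer": {
      "trigram": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "shingle"]
      }
    }
  }
}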

Have you found a good solution to this?
I'm not sure if this is the best way, but in my case a simple solution was to switch to the term suggester. To keep the full input as a single term I used the keyword analyzer. I guess your trigram analyzer will do the trick as well, and it's probably preferable when dealing with longer input texts.
GET myindex/_search
{
  "suggest": {
    "did_you_mean": {
      "text": "orddelings feil",
      "term": {
        "analyzer": "keyword",
        "field": "title"
      }
    }
  }
}

Related

add fuzziness to elasticsearch query

I have a query for an autocomplete/suggestions index that looks like this:
{
  "size": 10,
  "query": {
    "multi_match": {
      "query": "'"+search_text+"'",
      "type": "bool_prefix",
      "fields": [
        "company_name",
        "company_name._2gram",
        "company_name._3gram"
      ]
    }
  }
}
This query works exactly as I want it to. However I want to add fuzziness:"AUTO" to this query. I read the documentation and tried adding it like this:
{
  "size": 10,
  "query": {
    "multi_match": {
      "query": {
        "fuzzy": {
          "value": "'"+search_text+"'",
          "fuzziness": "AUTO"
        }
      },
      "type": "bool_prefix",
      "fields": [
        "company_name",
        "company_name._2gram",
        "company_name._3gram"
      ]
    }
  }
}
But I get this error:
```
"type": "parsing_exception",
"reason": "[multi_match] unknown token [START_OBJECT] after [query]",
```
This is causing my query not to work.
There is no need to add a fuzzy query. To add fuzziness to a multi_match query you need to add the fuzziness property, as described here:
Since you are using bool_prefix as the type of the multi_match query, it creates a match_bool_prefix query on each field, which analyzes its input and constructs a bool query from the terms. Each term except the last is used in a term query; the last term is used in a prefix query.
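In other words, applied to the query in the question, fuzziness goes at the same level as type and fields rather than being wrapped in a fuzzy object:

{
  "size": 10,
  "query": {
    "multi_match": {
      "query": "'"+search_text+"'",
      "type": "bool_prefix",
      "fields": [
        "company_name",
        "company_name._2gram",
        "company_name._3gram"
      ],
      "fuzziness": "AUTO"
    }
  }
}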
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
  "mappings": {
    "properties": {
      "company_name": {
        "type": "search_as_you_type",
        "max_shingle_size": 3
      },
      "serviceTitle": {
        "type": "search_as_you_type",
        "max_shingle_size": 3
      },
      "services": {
        "type": "search_as_you_type",
        "max_shingle_size": 3
      }
    }
  }
}
Index Data:
{
  "company_name": "sequencing how shingles are actually used"
}
Search Query:
{
  "size": 10,
  "query": {
    "multi_match": {
      "query": "sequensing how shingles",
      "type": "bool_prefix",
      "fields": [
        "company_name",
        "company_name._2gram",
        "company_name._3gram"
      ],
      "fuzziness": "auto"
    }
  }
}
Search Result:
"hits": [
{
"_index": "65153201",
"_type": "_doc",
"_id": "1",
"_score": 1.5465959,
"_source": {
"company_name": "sequencing how shingles are actually used"
}
}
]
If you want to query just sequensing and get the above document, then you need to change the type of the multi_match query from bool_prefix to another type according to your use case (with bool_prefix, the last term is run as a prefix query, to which fuzziness is not applied).
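For example, a sketch using best_fields instead (note that fuzziness is not supported for the phrase and cross_fields types):

{
  "size": 10,
  "query": {
    "multi_match": {
      "query": "sequensing",
      "type": "best_fields",
      "fields": [
        "company_name"
      ],
      "fuzziness": "AUTO"
    }
  }
}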

Elasticsearch Context Suggester geo context - boost without filtering?

I'm creating a completion suggester with a geo context (Elastic 5.x).
mapping...
"suggest": {
"type": "completion",
...
"contexts": [
{
"name": "geoloc",
"type": "geo",
"precision": 3,
"path": "geolocation"
}
]
When I query this, I'd like it not to filter by the geo context, but only boost results that are within the geohash. It works great to filter by a single geohash, or to filter by a lower precision and then boost a higher precision within that original filter, like this:
GET /my-index/_search
{
  "suggest": {
    ...
    "completion": {
      "field": "suggest",
      "size": "10",
      "contexts": {
        "geoloc": [
          {
            "lat": 44.8214564,
            "lon": -93.475399,
            "precision": 1
          },
          {
            "lat": 44.8214564,
            "lon": -93.475399,
            "boost": 2
          }
        ]
      }
    }
  }
}
However, I can't get it to only boost on a single geo context without filtering.
When I submit the following query, it filters and boosts:
GET /my-index/_search
{
  "suggest": {
    ...
    "completion": {
      "field": "suggest",
      "size": "10",
      "contexts": {
        "geoloc": [
          {
            "lat": 44.8214564,
            "lon": -93.475399,
            "boost": 2
          }
        ]
      }
    }
  }
}
Is what I'm trying to do just not supported, or am I missing something?
Thanks!
Jason
Just ran into this issue as well.
The solution I came up with through trial and error was to use a category context to first filter to all my documents. Say you had added a category named "all" to your documents; you could then do this:
GET /my-index/_search
{
  "suggest": {
    ...
    "completion": {
      "field": "suggest",
      "size": "10",
      "contexts": {
        "category": ["all"],
        "geoloc": [
          {
            "lat": 44.8214564,
            "lon": -93.475399,
            "precision": 2,
            "boost": 2
          }
        ]
      }
    }
  }
}
When this is done, it seems to select everything with the "all" category and then boost the results within the specified precision level to the top.
Using Elastic 6.*
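For reference, a sketch of the mapping this relies on: both contexts declared on the completion field, and every document indexed with the "all" category (the "category" context name is an assumption; the geo context matches the question):

"suggest": {
  "type": "completion",
  "contexts": [
    {
      "name": "category",
      "type": "category"
    },
    {
      "name": "geoloc",
      "type": "geo",
      "precision": 3,
      "path": "geolocation"
    }
  ]
}

Each document then needs "category": ["all"] in its suggest contexts at index time.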

Autocomplete functionality using elastic search

I have an Elasticsearch index with the following documents, and I want autocomplete functionality over the specified fields:
mapping: https://gist.github.com/anonymous/0609b1d110d91dceb9a90faa76d1d5d4
Use case:
My queries are of prefix type, e.g. "sta", "star", "star w", ... "star war", etc., with an additional filter of tags = "science fiction". These queries could also match other fields like description and actors (in the cast field; note that this is nested). I also want to know which field was matched.
I investigated two ways of doing this, but neither method seems to address the use case above:
1) Suggester autocomplete:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-suggesters-completion.html
With this it seems I have to add another field called "suggest" that replicates the data, which is not desirable.
2) using a prefix filter/query:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-prefix-filter.html
This gives the whole document back, not the exact matching terms.
Is there a clean way of achieving this? Please advise.
Don't create the mapping separately; insert data directly into the index and it will create a default mapping. Use the query below for autocomplete.
GET /netflix/movie/_search
{
  "query": {
    "query_string": {
      "query": "sta*"
    }
  }
}
I think the completion suggester would be the cleanest way, but if that is undesirable you could use aggregations on the name field.
This is a sample index (I am assuming you are using ES 1.7, from your question):
PUT netflix
{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim",
            "edge_filter"
          ]
        },
        "keyword_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "movie": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "prefix": {
              "type": "string",
              "index_analyzer": "prefix_analyzer",
              "search_analyzer": "keyword_analyzer"
            },
            "raw": {
              "type": "string",
              "analyzer": "keyword_analyzer"
            }
          }
        },
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Using multi-fields, the name field is analyzed in different ways. name.prefix uses the keyword tokenizer with an edge ngram filter, so that the string star wars can be broken into s, st, sta, etc. While searching, keyword_analyzer is used instead, so that the search query does not get broken into multiple small tokens. name.raw will be used for aggregation.
The following query will give top 10 suggestions.
GET netflix/movie/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "tags": "sci-fi"
        }
      },
      "query": {
        "match": {
          "name.prefix": "sta"
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "unique_movie_name": {
      "terms": {
        "field": "name.raw",
        "size": 10
      }
    }
  }
}
Results will be something like
"aggregations": {
"unique_movie_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "star trek",
"doc_count": 1
},
{
"key": "star wars",
"doc_count": 1
}
]
}
}
UPDATE:
I think you could use highlighting for this purpose. The highlight section will get you the whole word and the field it matched. You can also use inner hits, with highlighting inside, to get nested docs as well.
{
  "query": {
    "query_string": {
      "query": "sta*"
    }
  },
  "_source": false,
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}
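For the nested cast field mentioned in the question, a sketch combining a nested query with inner_hits and highlighting (cast.name is an assumed subfield name):

{
  "query": {
    "nested": {
      "path": "cast",
      "query": {
        "match_phrase_prefix": {
          "cast.name": "sta"
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": {
            "cast.name": {}
          }
        }
      }
    }
  }
}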

Find concatenate words in Elasticsearch

Say I have indexed this data
song: {
  title: "laser game"
}
but the user is searching for
lasergame
How would you go about mapping/indexing/querying for this?
This is kind of a tricky problem.
1) I guess the most effective way might be to use a compound token filter, with a word list made up of words you think users might concatenate.
"settings": {
"analysis": {
"analyzer": {
"concatenate_split": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"myFilter"
]
}
},
"filter": {
"myFilter": {
"type": "dictionary_decompounder",
"word_list": [
"laser",
"game",
"lean",
"on",
"die",
"hard"
]
}
}
}
}
After applying this analyzer, lasergame will be split into laser and game along with lasergame itself, which will give you results that contain any of those words.
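A sketch of wiring that analyzer onto the title field and querying it (the songs index and song type names are assumptions):

"mappings": {
  "song": {
    "properties": {
      "title": {
        "type": "string",
        "analyzer": "concatenate_split"
      }
    }
  }
}

GET /songs/song/_search
{
  "query": {
    "match": {
      "title": "lasergame"
    }
  }
}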
2) Another approach could be concatenating the whole title using a pattern_replace char filter that replaces all the spaces.
{
  "index": {
    "analysis": {
      "char_filter": {
        "my_pattern": {
          "type": "pattern_replace",
          "pattern": "\\s+",
          "replacement": ""
        }
      },
      "analyzer": {
        "custom_with_char_filter": {
          "tokenizer": "standard",
          "char_filter": ["my_pattern"]
        }
      }
    }
  }
}
You need to use multi-fields with this approach. With this pattern, laser game will be indexed as lasergame and your query will work (see the mapping sketch below).
The problem here is that laser game play will be indexed as lasergameplay, and a search for lasergame won't return anything, so you might want to consider using a prefix or wildcard query for this.
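The multi-field setup could look roughly like this (a sketch; title keeps the standard tokens while title.concatenated gets the space-stripped ones):

"mappings": {
  "song": {
    "properties": {
      "title": {
        "type": "string",
        "fields": {
          "concatenated": {
            "type": "string",
            "analyzer": "custom_with_char_filter"
          }
        }
      }
    }
  }
}

A multi_match over title and title.concatenated then covers both forms of the query.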
3) This might not make sense, but you could also use a synonym filter if you think users often concatenate certain words; a sketch follows.
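The synonym variant maps the concatenations you expect onto their split forms; used in the search analyzer, it expands lasergame into laser game at query time (the filter name is an assumption):

"filter": {
  "concat_synonyms": {
    "type": "synonym",
    "synonyms": [
      "lasergame => laser game"
    ]
  }
}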
Hope this helps!
The easiest solution would be using nGrams. That's a base to start working with and can be tweaked to meet your needs. Here you go:
Mappings
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "myAnalyzer": {
          "type": "custom",
          "tokenizer": "nGram",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "sample": {
      "properties": {
        "myField": {
          "type": "string",
          "analyzer": "myAnalyzer"
        }
      }
    }
  }
}
Test document
PUT /test/sample/1
{
  "myField": "laser game"
}
Query
GET /test/_search
{
  "query": {
    "match": {
      "myField": "lasergame"
    }
  }
}
Results
{
  "took": 47,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2161999,
    "hits": [
      {
        "_index": "test",
        "_type": "sample",
        "_id": "1",
        "_score": 0.2161999,
        "_source": {
          "myField": "laser game"
        }
      }
    ]
  }
}
This analyzer will create lots of ngrams in your index, such as la, las, lase, ..., gam, game, etc. Both lasergame and laser game will produce many similar tokens and will find your document as you'd expect.
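You can check what the analyzer actually produces with the _analyze API (body-based syntax on ES 5+; older versions take the analyzer and text as query parameters instead):

GET /test/_analyze
{
  "analyzer": "myAnalyzer",
  "text": "lasergame"
}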

ElasticSearch: Attempting to get spelling suggestion on proper names

Before I begin, let me just say that I'm no Elasticsearch expert, but I am currently tasked with tweaking some analyzers to get spelling suggestions working better in a couple of different situations. I've seen examples of people doing spelling suggestions on proper names, so I know it must be possible, but I've been at this for a couple of days now and I must be missing something, because Elasticsearch doesn't seem to recognize the name I'm looking for. Can you please help me figure this out? Thanks in advance!
Here's the analyzer I'm using for index as well as search:
"full_text": {
"filter": [
"lowercase",
"asciifolding",
],
"type": "custom",
"tokenizer": "keyword"
},
The following query should demonstrate that the field is tokenized into one long keyword, which is what I want:
{
  "query": {
    "match": {
      "_all": "combine 5"
    }
  },
  "script_fields": {
    "terms": {
      "script": "doc[field].values",
      "params": {
        "field": "my_field"
      }
    }
  }
}
...and it outputs something like this, which shows how the field is being tokenized. Looks good:
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 75,
"max_score": 0.58574116,
"hits": [
{
"_index": "my_index",
"_type": "thing",
"_id": "1",
"_score": 0.58574116,
"fields": {
"terms": [
[
"combine 5"
]
]
}
}
}
}
... but when I do a suggest query, it doesn't suggest the term, even though the input is just off by a space.
{
  "query": {
    "match": {
      "_all": "combine 5"
    }
  },
  "suggest": {
    "suggest-0": {
      "term": {
        "field": "_all",
        "size": 5
      },
      "text": "combine5"
    }
  }
}
Which returns a bunch of documents and this suggestion:
"suggest": {
"suggest-0": [
{
"text": "combine5",
"offset": 0,
"length": 8,
"options": [
{
"text": "combined",
"score": 0.875,
"freq": 15
},
{
"text": "combine",
"score": 0.85714287,
"freq": 17
}
]
}
]
}
Note that if I change the spelling suggestion to work just on the field that contains the text, it does suggest it, but not when I'm using _all. Is there a way to get the words in a specific field to be suggested when suggesting against _all?
I'm not sure this qualifies as exactly the answer I was looking for, but I ended up solving this by adding a field to the document containing the keyword value I was looking for ("combine5"), so now it is registered as a word. If I suggest on that field, or on _all, the word is suggested. It's also found in queries against _all.
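For what it's worth, a sketch of that workaround (the keyword_terms field name is hypothetical): index the concatenated form as its own field value so it becomes a term in the index, after which it shows up in suggestions against that field or _all:

PUT /my_index/thing/1
{
  "my_field": "combine 5",
  "keyword_terms": "combine5"
}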
