I have just started using Elasticsearch and am stuck on the following use case.
I am using the completion suggester in Elasticsearch with fuzziness set to AUTO to return city suggestions. Each city name in the completion field has a weight based on its popularity. The problem is the ordering of the fuzzy results.
For example, if the user types "dilh", I want "delhi" to rank above "digha" or "dighwara" because of its popularity, i.e. the weights assigned to the cities.
Right now "digha", "dighwara", "Dihira" etc. come above more relevant cities like "delhi" or "dalhousie". Since the edit distance is the same, can anyone tell me how to configure this so that the order follows the weights of the cities?
Attaching sample request:
{
  "suggest": {
    "loc-suggest2": {
      "prefix": "dilh",
      "completion": {
        "field": "suggestedNames",
        "size": 20,
        "fuzzy": {
          "fuzziness": "AUTO"
        }
      }
    }
  }
}
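One possible workaround, sketched here in Python, is to re-sort the returned options on the client by their stored weight. As far as I recall from the completion suggester documentation, fuzzy suggestions that share a longer prefix with the input are scored higher, which would explain why "digha" (sharing "di") outranks "delhi" (sharing only "d"). The sketch assumes each document also stores its weight in a `weight` field in `_source`; that field name, the helper name and the sample weights are made up for illustration.

```python
from typing import Any


def rerank_by_weight(response: dict, suggester: str, weight_field: str = "weight") -> list:
    """Collect all options of one suggester and sort them by stored weight, highest first."""
    options = [
        option
        for entry in response.get("suggest", {}).get(suggester, [])
        for option in entry.get("options", [])
    ]

    def stored_weight(option: dict) -> Any:
        # Assumption: the popularity weight is also kept in _source under `weight_field`.
        return option.get("_source", {}).get(weight_field, 0)

    return sorted(options, key=stored_weight, reverse=True)


# Hand-written response in the shape the suggest API returns; the weights are invented.
sample_response = {
    "suggest": {
        "loc-suggest2": [
            {
                "text": "dilh",
                "offset": 0,
                "length": 4,
                "options": [
                    {"text": "digha", "_score": 6.0, "_source": {"weight": 3}},
                    {"text": "delhi", "_score": 4.0, "_source": {"weight": 90}},
                    {"text": "dalhousie", "_score": 4.0, "_source": {"weight": 40}},
                ],
            }
        ]
    }
}

ranked = rerank_by_weight(sample_response, "loc-suggest2")
print([option["text"] for option in ranked])
```

Because the suggester only returns `size` options, this only reorders what comes back; requesting a larger `size` than you display gives the re-sort more candidates to work with.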
I want to implement address autocompletion using Elasticsearch.
The current approach I am investigating is based on the search_as_you_type field type.
Consider these two addresses:
3543JN Carl Zellerhof 8 Utrecht (3543JN is postcode)
1234JN The Street 3543 Utrecht
It is important to prioritize some address parts over others. For instance, the postcode should have more weight than the house number, e.g. when a user types 3543, the first address should come first in the search results.
I see two solutions here:
1. Combine the address into one string and assign weight based on position within the combined string.
2. Search on multiple fields (the weight can then be adjusted per field, but this seems more complex to me: how do I ensure the same address part is not matched several times?).
I am leaning towards the one-string solution, but the implementation below gives both addresses the same weight for the search query 3543.
Please advise how to implement this.
(It is also desirable to allow some fuzziness)
UPD:
It seems that adding a postcode field to the multi_match fields gives me what I want. Are there any disadvantages to this approach?
The index:
{
  "mappings": {
    "properties": {
      "search": {
        "type": "search_as_you_type"
      }
    }
  }
}
The search query:
{
  "query": {
    "multi_match": {
      "query": "3543",
      "type": "bool_prefix",
      "fields": [
        "search",
        "search._2gram",
        "search._3gram"
      ]
    }
  }
}
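The approach from the update (adding a postcode field to the multi_match fields) could look roughly like the sketch below. It assumes a separate `postcode` field exists in the mapping as a text or search_as_you_type field; that field, the boost value and the helper name are assumptions, not part of the index shown above. The `^` suffix is the per-field boost syntax of multi_match, and `fuzziness` is applied to all terms except the final prefix term.

```python
import json


def build_address_query(text: str, postcode_boost: float = 3.0, fuzziness: str = "AUTO") -> dict:
    """Build a bool_prefix multi_match that favours hits on the (assumed) postcode field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                # Fuzziness affects the complete terms, not the trailing prefix term.
                "fuzziness": fuzziness,
                "fields": [
                    f"postcode^{postcode_boost}",  # hypothetical boosted field
                    "search",
                    "search._2gram",
                    "search._3gram",
                ],
            }
        }
    }


address_query = build_address_query("3543")
print(json.dumps(address_query, indent=2))
```

The boost is something to tune against real data; a value that is too high can push a weak postcode match above a strong street match.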
We have a field title of type search_as_you_type:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "search_as_you_type"
      }
    }
  }
}
and when we search with
{
  "query": {
    "match_phrase_prefix": {
      "title": "red"
    }
  }
}
we get duplicate results:
red car
red icecream
red car
This is because we have documents with the same title value.
Is there a way to indicate that the results must have distinct values?
You can check whether a terms aggregation on your title field works with search_as_you_type by following the example given in this SO answer. You can also check this blog, which explains how to get unique values from Elasticsearch.
Also, make sure the documents that appear in your results really are the same document, and not different documents that happen to have the same values.
Edit: As discussed in the comments, the completion suggester turned out to be more useful in this case, as it can deal with duplicates, and it solved the issue.
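For reference, the completion suggester has a `skip_duplicates` option that filters out suggestions with identical text coming from different documents. A minimal sketch of such a request body follows; it assumes a completion field named `title_suggest`, which is hypothetical and not part of the mapping shown in the question.

```python
import json


def build_distinct_suggest(prefix: str, field: str = "title_suggest", size: int = 5) -> dict:
    """Build a completion suggest request that drops duplicate suggestion texts."""
    return {
        "suggest": {
            "title-suggest": {
                "prefix": prefix,
                "completion": {
                    "field": field,  # assumed completion field, not search_as_you_type
                    "size": size,
                    "skip_duplicates": True,
                },
            }
        }
    }


distinct_suggest = build_distinct_suggest("red")
print(json.dumps(distinct_suggest, indent=2))
```

Skipping duplicates makes the suggester inspect more candidates, so it can be slower than the default behaviour.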
I have an existing query that provides postcode suggestions, shown below (I have hard-coded the postcode as T0L):
{
  "suggest": {
    "suggestions": {
      "text": "T0L",
      "completion": {
        "field": "postcode.suggest"
      }
    }
  }
}
This works fine, but it also returns results where the city is null. So I need to keep only the addresses where the city is not null.
I followed this solution and prepared the following query:
{
  "query": {
    "constant_score": {
      "filter": {
        "exists": {
          "field": "city"
        }
      }
    }
  },
  "suggest": {
    "suggestions": {
      "text": "T0L",
      "completion": {
        "field": "postcode.suggest"
      }
    }
  }
}
Unfortunately this does not return the addresses whose postcode contains T0L; instead I get results whose postcode starts with A1X. So I believe it is querying for all addresses where the city is present and ignoring the completion suggester. Can you please tell me where the mistake is, or how to write this correctly?
There is no way to filter suggestions with a query at query time, because the completion suggester uses an FST (a special in-memory data structure that is built at index time) for lightning-fast lookups.
But you can change your mapping and add a context to your suggester. The basic idea is that the context is also filled in at index time, along with the completion field, and can therefore be used at query time in the suggest request.
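A minimal sketch of that idea, under some assumptions: the completion field is a top-level field called `postcode_suggest` (rather than the `postcode.suggest` sub-field from the question), and a category context named `has_city` records at index time whether the document has a city. Both names and the "yes"/"no" values are made up for illustration.

```python
import json
from typing import Optional

CONTEXT_NAME = "has_city"  # hypothetical category context


def build_mapping() -> dict:
    """Mapping with a completion field that carries a category context."""
    return {
        "mappings": {
            "properties": {
                "city": {"type": "keyword"},
                "postcode_suggest": {
                    "type": "completion",
                    "contexts": [{"name": CONTEXT_NAME, "type": "category"}],
                },
            }
        }
    }


def build_document(postcode: str, city: Optional[str]) -> dict:
    """Document body whose suggestion is tagged with whether a city is present."""
    doc = {
        "postcode_suggest": {
            "input": [postcode],
            "contexts": {CONTEXT_NAME: ["yes" if city else "no"]},
        }
    }
    if city:
        doc["city"] = city
    return doc


def build_suggest(prefix: str) -> dict:
    """Suggest request restricted to suggestions indexed with a city."""
    return {
        "suggest": {
            "suggestions": {
                "prefix": prefix,
                "completion": {
                    "field": "postcode_suggest",
                    "contexts": {CONTEXT_NAME: ["yes"]},
                },
            }
        }
    }


print(json.dumps(build_suggest("T0L"), indent=2))
```

Existing documents would need to be reindexed after the mapping change, since contexts are stored with the suggestion at index time.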
I have a requirement for custom scoring on a name. To keep it simple, let's say that if I search for 'Smith' against the names in the index, the logic should be:
if input = exact 'Smith' then score = 100%
else
if input = phonetic match then
score = <depending upon fuzziness match of input with name>%
end if
end if;
I'm able to search documents with a fuzziness of 1 but I don't know how to give it custom score depending upon how fuzzy it is. Thanks!
Update:
I went through a post that had the same requirement as mine, and it mentioned that the person solved it by using native scripts. My question still remains: how do I actually get a score based on the similarity distance, so that it can be used in such a script?
The post for reference:
https://discuss.elastic.co/t/fuzzy-query-scoring-based-on-levenshtein-distance/11116
The text to look for in the post:
"For future readers I solved this issue by creating a custom score query and
writing a (native) script to handle the scoring."
You can implement this search logic using the function_score query (docs here).
Here is a possible example:
{
  "query": {
    "function_score": {
      "query": {
        "match": { "input": "Smith" }
      },
      "boost": "5",
      "functions": [
        {
          "filter": { "match": { "input.keyword": "Smith" } },
          "weight": 23
        }
      ]
    }
  }
}
In this example the mapping indexes the input field both as text and as keyword (input.keyword is for exact matches). Documents that match the term "Smith" exactly get a higher score than the other documents matched by the main query (a plain match in the example, but in your case it would be the query with fuzziness).
You can control the strength of this effect by tuning the weight parameter.
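The question also asks how a percentage could be derived from the edit distance itself. The sketch below is plain client-side Python, not an Elasticsearch script: it computes the Levenshtein distance and turns it into a 0 to 100 score, with an exact (case-insensitive) match scoring 100. The function names and the normalisation by the longer string's length are my own choices, and it measures edit distance only, not phonetic similarity.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def name_score(query: str, name: str) -> float:
    """Score in percent: 100 for an exact match, lower as the edit distance grows."""
    q, n = query.lower(), name.lower()
    longest = max(len(q), len(n))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - levenshtein(q, n)) / longest


for candidate in ["Smith", "Smyth", "Smithe", "Jones"]:
    print(candidate, round(name_score("Smith", candidate), 1))
```

The same arithmetic could be applied to the hits returned by a fuzzy query to re-rank them after the search.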
I'm new to Elasticsearch and have an index with lots of articles in it. I use 3 main fields: title, snippet and date. I want to find the most common key-phrases or keywords in the title field for a specific date. I was hoping someone could provide an example of how to do this, or at least point me in the right direction.
Many Thanks!
I think you are looking for the terms aggregation. Try something like this:
{
  "query": {
    "match": {
      "date": {
        "query": "your_date"
      }
    }
  },
  "size": 0,
  "aggs": {
    "common_words": {
      "terms": {
        "field": "title",
        "size": 10
      }
    }
  }
}
The most common words appear at the top, as the buckets are ordered by document count. Note that a terms aggregation on a text field only works if fielddata is enabled for that field.
If you are looking for phrases, you may have to analyze your title field accordingly. You can map title with multiple analyzers, e.g. the standard analyzer for common words and a shingle analyzer for common phrases.
You might also want to look into the significant terms aggregation if you want to find something unusual.
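The multiple-analyzer idea could be sketched as follows. The analyzer and filter names (`title_shingle_analyzer`, `title_shingles`), the `title.shingles` sub-field and the shingle sizes are assumptions for illustration. Fielddata is enabled on the sub-field so that the terms aggregation can run on it, which can be memory-hungry on large indices.

```python
import json


def build_phrase_mapping() -> dict:
    """Index settings and mapping with a shingle-analyzed sub-field of title."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "title_shingles": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": 3,
                        "output_unigrams": False,  # keep only 2- and 3-word phrases
                    }
                },
                "analyzer": {
                    "title_shingle_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "title_shingles"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "date": {"type": "date"},
                "title": {
                    "type": "text",
                    "fields": {
                        "shingles": {
                            "type": "text",
                            "analyzer": "title_shingle_analyzer",
                            "fielddata": True,  # needed for terms aggs on text
                        }
                    },
                },
            }
        },
    }


def build_top_phrases_query(day: str, size: int = 10) -> dict:
    """Terms aggregation over the shingle sub-field for documents of one date."""
    return {
        "query": {"match": {"date": {"query": day}}},
        "size": 0,
        "aggs": {
            "common_phrases": {"terms": {"field": "title.shingles", "size": size}}
        },
    }


print(json.dumps(build_top_phrases_query("your_date"), indent=2))
```

Swapping `terms` for `significant_terms` in the aggregation would surface phrases that are unusually frequent on that date rather than simply the most frequent ones.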