I am in the process of learning Elasticsearch and am having a hard time matching certain cases.
For example, I have the product name "SkyProdigy 130" and I am trying to write a query that will match this product name when someone types "sky prodigy". Similarly, for the manufacturer "Magpul" I would like to get a match even if someone types "mag pul", etc.
I have managed to make this work with a fuzzy query, but I am looking for a more organic way to achieve this through analyzers and correct mappings.
Can someone recommend the best approach for this case?
I'm looking for a convenient way to search for words related to a term. For example, if I search for the word "washer", I should also get related search terms like "dryer", with a lower score than the washer results. That is, the washer documents must appear first, followed by the dryer documents. How can I implement this?
You need to build a synonym dictionary. Fortunately, there are now machine learning models, such as word2vec (a neural network model), that can do this. You can try using the open-source gensim package for this.
The input to the model would be lots of text (info, articles) that contains the words washer and dryer. Once you have trained on this, you can find the words closest to "washer" and use them as a synonym-like dictionary.
At query time, look up this dictionary and expand the query, giving the synonyms a lower weight/boost than the actual term.
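A minimal sketch of the query-time expansion step, assuming the related terms have already been extracted (e.g. from a trained word2vec model) into a plain Python dictionary. The `related_terms` mapping, the `description` field name and the 0.3 boost are hypothetical placeholders, not values from the thread. The function builds an Elasticsearch `bool` query body in which the original term carries a higher boost than its related terms:

```python
from typing import Dict, List


def expand_query(term: str, related: Dict[str, List[str]],
                 field: str = "description", synonym_boost: float = 0.3) -> dict:
    """Build a bool query: the original term at full weight, related terms at a lower boost."""
    # The user's own term always gets the highest weight.
    should = [{"match": {field: {"query": term, "boost": 1.0}}}]
    # Related terms (hypothetically learned offline with word2vec) get a lower
    # boost, so documents about them tend to rank below documents about the term.
    for synonym in related.get(term, []):
        should.append({"match": {field: {"query": synonym, "boost": synonym_boost}}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


# Hypothetical dictionary, e.g. produced from a trained word2vec model.
related_terms = {"washer": ["dryer"]}
body = expand_query("washer", related_terms)
```

Note that boosts scale scores rather than enforce a strict order, so a strongly matching dryer document can still outrank a weakly matching washer document; the boost value usually needs tuning against real data.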
I'm trying to work out how to write a LIKE-style query with Elasticsearch.
Let's say I have a firstname field and I want to find every record whose first name contains "ma".
I've tried several things, but none of them work. Here is a list:
{"match": {"text": ".*ma.*"}}
{"match": {"text": "*ma*"}}
{"match": {"text": {"query": "ma", "fuzziness": "AUTO", "prefix_length": 1}}}
Do you have an idea of how to do this, or what I am missing?
You might look into using the N-Gram tokenizer to split your documents' tokens up into their substrings.
This will allow you to search against the index with the "partial" matches you're describing.
Bear in mind that this will affect how your documents are tokenized for search, so if you are using other types of analysis for other parts of your application, you may want to create additional fields for your N-Gram tokenized values (or even create a separate index for them).
As a rule of thumb, always try to optimize your index for the queries you want to perform, rather than trying to solve your search problems at query time.
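A minimal sketch of that advice, assuming Elasticsearch 7 or later (typeless mappings) and a `firstname` field; the tokenizer and analyzer names and the gram sizes are hypothetical. Following the suggestion to keep n-grams in an additional field, the grams live in a `firstname.ngram` sub-field, and the query uses `"operator": "and"` so that every gram of the search fragment must be present:

```python
import json

ngram_index = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "firstname_ngram_tokenizer": {
                    "type": "ngram",
                    # By default max_gram - min_gram may be at most 1
                    # (index.max_ngram_diff); raise that setting for wider ranges.
                    "min_gram": 2,
                    "max_gram": 3,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "firstname_ngram": {
                    "type": "custom",
                    "tokenizer": "firstname_ngram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "firstname": {
                "type": "text",
                # Separate sub-field so the main field keeps its normal analysis.
                "fields": {
                    "ngram": {"type": "text", "analyzer": "firstname_ngram"}
                },
            }
        }
    },
}


def like_query(fragment: str) -> dict:
    """Approximate LIKE '%fragment%' by requiring all of the fragment's n-grams."""
    return {"query": {"match": {"firstname.ngram": {"query": fragment, "operator": "and"}}}}


print(json.dumps(like_query("ma"), indent=2))
```

This is an approximation: with `min_gram` set to 2, a single-character fragment produces no grams and matches nothing, and the grams of a longer fragment only need to occur somewhere in the name, not necessarily contiguously.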
There are use cases where I would really like to know which term was matched in which field by my search. I would like to use this information to show the user on my web page which field caused the hit. I would also like to know which term played a part in the hit. In my case it is a database identifier, so I would take the matched term (an ID), fetch the corresponding database record and display useful information to the user.
I currently know of two ways: highlighting and the explain API. However, the first requires stored values, which seems unnecessary. The second is meant for debugging only and is rather expensive, so I wouldn't want it to run with every query.
I don't know of any other way, which is confusing: the highlighting algorithms need the information I want anyway, so can't I just get it somehow?
On a related note, I would also be interested in the opposite case: which term did not produce a hit at all? This information would allow for features like "terms that didn't match your query", as Google sometimes shows (where the words in question are displayed in grey strikethrough).
Thanks for any hints!
What I'm trying to accomplish on a high level is an autocomplete input field which queries both customers and orders on multiple fields, with customers ranking higher for customer name searches.
It seems to me that there are various ways to approach this problem with the tools that elasticsearch provides.
The way I have approached this is to use multi_match queries of the phrase_prefix type so that partial queries work across multiple fields.
For example, "bo" should return matches for "Bob Smith" as well as "Adam Boss". I'm indexing fullname as a separate field from firstname and lastname, so that "adam boss" will return a valid prefix match as well.
In addition, I'd like to boost customer results. I'm trying to do that with a boost parameter on the multi_match, but it doesn't seem to work the way I'd expect.
What would be a straightforward way to tackle this problem?
One of the challenges I'm facing with the elasticsearch docs is that it's not always clear which properties and features apply to which others. For example, the multi_match documentation doesn't talk about using a custom boost, other than on a field-level.
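A minimal sketch of the approach described in the question, assuming customers and orders live in two indices hypothetically named `customers` and `orders` and are searched together (e.g. `GET /customers,orders/_search`). The field names come from the question; the boost values are illustrative. Field-level boosts use the `^N` suffix inside `fields`, and the top-level `indices_boost` search parameter is one way to weight hits from one index above another:

```python
def autocomplete_query(text: str) -> dict:
    """multi_match of type phrase_prefix across the name fields."""
    return {
        # Hypothetical index names: hits from "customers" are boosted above "orders".
        "indices_boost": [{"customers": 2.0}, {"orders": 1.0}],
        "query": {
            "multi_match": {
                "query": text,
                "type": "phrase_prefix",
                # Field-level boost: matches on fullname weigh three times as much.
                "fields": ["fullname^3", "firstname", "lastname"],
            }
        },
    }


body = autocomplete_query("bo")
```

Keeping the index-level weighting in `indices_boost` leaves the `multi_match` itself responsible only for how well each field matches the typed prefix.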
I think the best approach is the completion suggester in Elasticsearch (v0.90.3+). See the following for a real use case:
http://www.elasticsearch.org/blog/you-complete-me/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
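A minimal sketch of the completion suggester, assuming a recent Elasticsearch version (the request syntax has changed since 0.90, so check the reference for your version). The `suggest` field name and the `name-suggest` suggestion name are hypothetical. Because the suggester matches prefixes of each input, every word-suffix of the name is indexed as a separate input so that "bo" can match "Adam Boss" as well as "Bob Smith":

```python
# Mapping: a dedicated field of type "completion".
completion_mapping = {
    "mappings": {
        "properties": {
            "suggest": {"type": "completion"}
        }
    }
}


def suggest_document(fullname: str) -> dict:
    """Document body whose suggest inputs are all word-suffixes of the name."""
    words = fullname.split()
    # "Adam Boss" -> ["Adam Boss", "Boss"], so a prefix of the last name also matches.
    inputs = [" ".join(words[i:]) for i in range(len(words))]
    return {"fullname": fullname, "suggest": {"input": inputs}}


def suggest_request(prefix: str) -> dict:
    """Search body asking the completion suggester for names starting with prefix."""
    return {"suggest": {"name-suggest": {"prefix": prefix, "completion": {"field": "suggest"}}}}


body = suggest_request("bo")
```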
I am attempting to create a query to exactly match on a few fields, such as account_id and from_addresses (which is an array), while also fuzzy matching on another field such as message_content. What is the best way to do this?
I have tried a Bool query with a few must and should parameters but can't seem to get it working.
I believe what you want is to use filters, more specifically an AND filter. That is, you query message_content, but filter by account_id and from_addresses.
I don't know which library you are using, so I can't really provide any code examples.
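The `and` filter mentioned above was deprecated in Elasticsearch 2.0 (and later removed) in favour of the `filter` clause of a `bool` query. A minimal sketch of the request body as a plain Python dict, independent of any client library, assuming `account_id` and `from_addresses` are mapped as `keyword` fields so that `term`/`terms` match them exactly:

```python
from typing import List


def search_messages(account_id: str, from_addresses: List[str], text: str) -> dict:
    """Fuzzy full-text match on message_content, exact filters on the other fields."""
    return {
        "query": {
            "bool": {
                # Scored part: fuzzy matching on the message body.
                "must": [
                    {"match": {"message_content": {"query": text, "fuzziness": "AUTO"}}}
                ],
                # Filter context: exact, non-scoring restrictions (all must hold).
                "filter": [
                    {"term": {"account_id": account_id}},
                    # Matches if from_addresses contains ANY of the given values.
                    {"terms": {"from_addresses": from_addresses}},
                ],
            }
        }
    }


body = search_messages("42", ["alice@example.com"], "helo wrld")
```

If every listed address must be present, use one `term` clause per address inside `filter` instead of a single `terms` clause.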