Predictive autosuggest logic - full-text-search

I would like to implement predictive autosuggest on my website. I already use Solr to improve search performance. After two days of research, my understanding is that Solr has no built-in package or support for predictive suggestions like those in Amazon or Flipkart search. Can anybody advise me on a simple approach to implementing predictive suggestions,
or on which technologies support this type of search suggestion?
The expected workflow is as follows.
If the user searches for the string "samsung", the autosuggest should show grouped suggestions like these:
samsung in Mobile
samsung in Television
samsung in Laptop
and so on

You're describing "filtered search" (via autosuggest). You can determine which filters to offer using Solr facets.
Assuming "Mobile", "Television" and "Laptop" are all values in a Solr field called category:
Run a query for samsung with rows=0 and request a terms facet on category.
You'll get back a frequency-ordered list of the categories whose documents match samsung.
Display these categories as filtered search options (via autosuggest) if you decide the result count is high enough.
When a suggestion is chosen, run a second query for samsung filtered by the chosen category (e.g. q=samsung&fq=category:Mobile&rows=10).
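The steps above can be sketched roughly as follows. This only builds the two request URLs and turns a facet response into suggestion strings; the Solr URL, the core name products and the min_hits threshold are assumptions for illustration, not something from the question. It also assumes Solr's default JSON facet layout, a flat [value, count, value, count, ...] list.

```python
from urllib.parse import urlencode

# Assumed Solr endpoint and core name; adjust for your installation.
SOLR_SELECT = "http://localhost:8983/solr/products/select"


def facet_suggestion_url(term, field="category"):
    """Step 1: match the term, return no documents, facet on the category field."""
    params = {
        "q": term,
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 1,
    }
    return SOLR_SELECT + "?" + urlencode(params)


def suggestions_from_facets(term, facet_counts, min_hits=5):
    """Step 2-3: turn the flat facet list into "term in Category" suggestions.

    min_hits is a hypothetical cut-off for "result count is high enough".
    """
    pairs = zip(facet_counts[0::2], facet_counts[1::2])
    return [f"{term} in {value}" for value, count in pairs if count >= min_hits]


def filtered_search_url(term, category, field="category", rows=10):
    """Step 4: rerun the query, restricted to the chosen category with fq."""
    params = {"q": term, "fq": f'{field}:"{category}"', "rows": rows}
    return SOLR_SELECT + "?" + urlencode(params)
```

Here facet_counts would be the list found under facet_counts.facet_fields.category in the Solr response; fetching it over HTTP is left out to keep the sketch small.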

Related

Elasticsearch multiple score fields

Maybe a dumb question: is it possible to have multiple score fields?
I use a custom score based on a function_score query. This score is displayed to the user to show how much each document matches his/her preferences. So far so good.
But! The user should be able to filter the documents and (of course) sort them not only by the custom relevance (how much each document matches his/her preferences) but also by the common relevance - how much each document matches the filter criteria.
So my first idea was to store the score calculated by the function_score query in a custom field, but that does not seem to be supported.
Or am I completely wrong and I should use another approach?
I took a different approach: when the user applies a filter, I run the query without the function_score percolation and sort by the score calculated by ES. Then I take all IDs from the result page and run a percolation query with these IDs to get the custom "matching score". It does not seem to cause a noticeable slowdown.
Anyway, I welcome any feedback.
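A rough sketch of that two-pass idea, as request bodies only. It assumes the custom score can be recomputed by a function_score query restricted with an ids query; the poster's actual mechanism may differ. The functions list and the matching_score key are placeholders, not taken from the post.

```python
def relevance_query(user_query, filters, page=0, size=10):
    """First pass: plain relevance with filters; ES sorts by _score by default."""
    return {
        "from": page * size,
        "size": size,
        "query": {"bool": {"must": [user_query], "filter": filters}},
    }


def matching_score_query(doc_ids, functions):
    """Second pass: recompute the custom score for the IDs on the current page only."""
    return {
        "size": len(doc_ids),
        "_source": False,
        "query": {
            "function_score": {
                "query": {"ids": {"values": doc_ids}},
                "functions": functions,  # placeholder for the real scoring functions
                "boost_mode": "replace",
            }
        },
    }


def attach_matching_scores(first_hits, second_hits):
    """Keep the first-pass order and add the custom score to each hit."""
    custom = {hit["_id"]: hit["_score"] for hit in second_hits}
    return [{**hit, "matching_score": custom.get(hit["_id"])} for hit in first_hits]
```

Because the second request only covers one page of IDs, its cost stays bounded by the page size, which fits the observation that the slowdown is not noticeable.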

How to do basic semantic search using ElasticSearch?

I want to implement basic semantic search in my product search service which uses ElasticSearch.
I am fairly new to ML and NLP, and implementing a full contextual search using models trained on search analytics will take a significant amount of time.
I am thinking of solving a few basic use cases first, such as "yellow shirt" or "green jacket", where yellow/green is essentially a color filter that I can apply to the keywords shirt/jacket.
How can I implement this kind of query parsing? I know querqy is one tool.
Should I be doing something with spaCy?
How about this silly logic:
Query ElasticSearch with "yellow jacket"
Search the tokens (yellow and jacket) in the facets/aggs, if they match then apply that facet as a filter and send a search query to ES again.
So basically "yellow" will be found in the "color" facet and I will apply the color=yellow filter.
The downside of this approach is that I'll have to send two queries to ES (two network calls), and I think pagination will go haywire.
Please suggest good ways to approach this problem based on your experience. Thanks in advance.
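One possible variation on the logic above, sketched under an assumption: if the known facet values (e.g. all colors) are loaded once and cached in memory, the token lookup needs no extra call to ES, and a single filtered query is sent. The facet_values dictionary and the name text field are hypothetical.

```python
def split_query(query, facet_values):
    """Separate tokens that match a known facet value from free-text keywords.

    facet_values maps a field name to a set of lowercase values, for example
    loaded once from a terms aggregation and cached (an assumption).
    """
    keywords = []
    filters = {}
    for token in query.lower().split():
        field = next((f for f, values in facet_values.items() if token in values), None)
        if field is None:
            keywords.append(token)
        else:
            filters.setdefault(field, []).append(token)
    return " ".join(keywords), filters


def build_search_body(query, facet_values, text_field="name"):
    """Build one ES request: keywords are scored, facet matches become filters."""
    keywords, filters = split_query(query, facet_values)
    must = {"match": {text_field: keywords}} if keywords else {"match_all": {}}
    return {
        "query": {
            "bool": {
                "must": [must],
                "filter": [{"terms": {f: v}} for f, v in filters.items()],
            }
        }
    }
```

For example, build_search_body("yellow jacket", {"color": {"yellow", "green"}}) searches for jacket with a color filter on yellow. Since only one query is sent, pagination works on that query as usual. This only handles single-token values; multi-word values would need a smarter matcher.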

Multiple field autocomplete with index type boost

What I'm trying to accomplish on a high level is an autocomplete input field which queries both customers and orders on multiple fields, with customers ranking higher for customer name searches.
It seems to me that there are various ways to approach this problem with the tools that elasticsearch provides.
The way I have approached this is to use multi_match queries with the phrase_prefix type in order to get partial queries to work across multiple fields.
For example, "bo" should return matches for "Bob Smith" as well as "Adam Boss". I'm indexing fullname as a separate field from firstname and lastname, so that "adam boss" will return a valid prefix match as well.
In addition, I'd like to boost customer results - trying to do that with a boost param on the multi_match, but that doesn't seem to be working the way I'd expect it to.
What would be a straightforward way to tackle this problem?
One of the challenges I'm facing with the elasticsearch docs is that it's not always clear which properties and features apply to which others. For example, the multi_match documentation doesn't talk about using a custom boost, other than on a field-level.
I think the best way is to use the completion suggester of ES (v0.90.3+). See the following for a real use case:
http://www.elasticsearch.org/blog/you-complete-me/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

Lucene.NET: Query or Filter?

It is my understanding that documents are found based on a query, and that result is then filtered by the filter.
The query is the only thing that will affect the score/relevance of a document.
Would there be any performance (caching) improvements if I use queries for criteria that should influence relevance, and filters for criteria that shouldn't?
Here is my situation. I have a lot of products, and the website will often search for products by category or manufacturer. I was thinking about using queries for that as that will bring the products down to a smaller subset which can be cached. I can then filter my results by product specifications. Should I use filters for specifications? That way we can filter based on an already cached (by lucene) subset of products (category or manufacturer).
Using filters also does not affect the returned score, whereas additional terms in a query do. You should use filters, for example, if a user picks a certain category from a list of available categories offered as facets:
Category : Electricals
Query Terms : DSLR Camera
The resulting scores (relevancy) are based on the query terms rather than on a hit on the category.
The difference between a filter and a query is mostly that a filter is exact. If you filter on brand=... then you will only get that exact brand. If you query on it, you will get the brand and possibly other results that also match your query.
So the question is, do you want an exact filter, or is it just for relevance?
Filtering provides a mechanism to further restrict the results of a query and provide a possible performance gain if the same query is run multiple times.
We mostly use filters for security - this would provide performance gains as results of the query are cached.

Lucene - Searching several terms in different fields

I have a Lucene index which populates from a database. I store/index some fields and then add a FullText field in which I index the contents of all the other fields, so I can do a general search.
Now let's say I have a document with the following two fields:
fld1 - "Samsung releases a new 22'' LCD screen"
fld2 - "Sony Ericsson phone's batteries explode"
If a user searches for "Samsung phone", he probably just wants news about Samsung phones, not a document with info about a Samsung screen and a Sony phone, but when searching the FullText field I will get this as a valid result.
Is there a nice way to handle this?
I've thought of indexing with some separator and then doing a SpanNotQuery, so the FullText field would have these contents:
"Samsung releases a new 22'' LCD screen MYLUCENESEPARATOR Sony Ericsson phone's batteries explode", and then doing a SpanNotQuery with MYLUCENESEPARATOR as the non-spanning term.
Is this a good solution? Does it scale well with more than two terms? I fear it would be a performance killer. Is there a better way to achieve this?
If the number of fields is limited, you can put the two description strings in two different fields. Then you can use MultiFieldQueryParser to search on these fields. Since these are two separate fields, with the AND operator the document will match only if both terms appear in the same field.
Let's take your example.
fld1 - "Samsung releases a new 22'' LCD screen"
fld2 - "Sony Ericsson phone's batteries explode"
If these are indexed in separate fields fld1 & fld2, your query becomes
(+fld1:samsung +fld1:phone) (+fld2:samsung +fld2:phone)
Multifield query helps you to construct such queries easily so that you don't need to repeat a query for multiple fields.
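To make the shape of that query concrete, here is a small sketch that builds the query string by hand rather than relying on any particular parser behavior. The helper name is hypothetical, and lowercasing the terms assumes the fields were indexed with a lowercasing analyzer.

```python
def per_field_and_query(terms, fields):
    """Build one group per field in which every term is required (+).

    The groups are optional relative to each other, so a document matches
    when at least one single field contains all of the terms.
    Assumes plain word terms with no query-syntax characters to escape.
    """
    clauses = []
    for field in fields:
        required = " ".join(f"+{field}:{term.lower()}" for term in terms)
        clauses.append(f"({required})")
    return " ".join(clauses)


print(per_field_and_query(["Samsung", "phone"], ["fld1", "fld2"]))
```

For the example document, neither group is satisfied: fld1 has samsung but not phone, and fld2 has phone but not samsung, so the document is not returned. The same function scales to more terms or fields without changing the approach.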

Resources