It is my understanding that documents are found based on a query, and that the result is then filtered by the filter.
The query is the only thing that will affect the score/relevance of a document.
Would there be any performance (caching) improvements if I used queries for criteria that matter for relevance and filters for criteria that don't?
Here is my situation. I have a lot of products, and the website will often search for products by category or manufacturer. I was thinking about using queries for that, as that will bring the products down to a smaller subset which can be cached. I can then filter my results by product specifications. Should I use filters for specifications? That way we can filter based on an already cached (by Lucene) subset of products (category or manufacturer).
Using filters also does not affect the returned score, whereas additional terms in a query do. You should use filters, for example, if a user picks a certain category from a list of available categories shown as facets:
Category : Electricals
Query Terms : DSLR Camera
The resulting scores (relevancy) are based on the query terms rather than on a hit on the category.
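To make the category example concrete, here is a minimal sketch of how such a request could look in Elasticsearch's query DSL, built as a plain Python dict. The field names `name` and `category` are assumptions for illustration, not taken from the post, and the `bool` query with a `filter` clause assumes Elasticsearch 2.0 or later.

```python
import json


def build_product_search(query_terms, category):
    """Score on the free-text terms; restrict by category without scoring."""
    return {
        "query": {
            "bool": {
                # Scored clause: this is what determines relevance.
                # "name" is a hypothetical field.
                "must": [{"match": {"name": query_terms}}],
                # Filter clause: a yes/no match that does not change the score.
                # "category" is a hypothetical keyword field.
                "filter": [{"term": {"category": category}}],
            }
        }
    }


body = build_product_search("DSLR Camera", "Electricals")
print(json.dumps(body, indent=2))
```

Swapping the category only changes the filter clause, so the scores for "DSLR Camera" are computed the same way in every category.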
The difference between filter and query is mostly that a filter is exact. If you filter on brand=... then you will only get that exact brand. If you query on it, you will get the brand and possibly other results that also match your query.
So the question is, do you want an exact filter, or is it just for relevance?
Filtering provides a mechanism to further restrict the results of a query, and it can provide a performance gain if the same query is run multiple times.
We mostly use filters for security - this would provide performance gains as results of the query are cached.
Maybe a dummy question: is it possible to have multiple score fields?
I use a custom score based on a function_score query. This score is displayed to the user to show how much each document matches his/her preferences. So far so good.
But! The user should be able to filter the documents and (of course) sort them not only by the custom relevance (how much each document matches his/her preferences) but also by the common relevance - how much each document matches the filter criteria.
So my first idea was to place the score calculated by the function_score query in a custom field, but that does not seem to be supported.
Or am I completely wrong and I should use another approach?
I took a different approach: if the user applies a filter, I run the query without function_score percolation, use the score calculated by ES, and sort by it. Then I take all IDs from the result page and run a percolation query with these IDs to get the custom "matching score". It does not seem to cause a noticeable slowdown.
Anyway, I welcome any feedback.
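For readers who want to picture the two-pass idea, here is a rough sketch of the two request bodies as Python dicts. It is not the poster's code: the post does not show how the percolation step was written, so this sketch swaps in a plain `function_score` query restricted by an `ids` query. The field name `description` and the `preference_functions` argument are hypothetical.

```python
from typing import Dict, List


def first_pass(text: str, filters: List[Dict], size: int = 10) -> Dict:
    """Pass 1: plain query, so Elasticsearch's own _score orders the page."""
    return {
        "size": size,
        "_source": False,  # only the IDs are needed for the second pass
        "query": {
            "bool": {
                "must": [{"match": {"description": text}}],  # hypothetical field
                "filter": filters,
            }
        },
    }


def second_pass(ids: List[str], preference_functions: List[Dict]) -> Dict:
    """Pass 2: compute the custom 'matching score' for the page's IDs only."""
    return {
        "size": len(ids),
        "query": {
            "function_score": {
                "query": {"ids": {"values": ids}},
                "functions": preference_functions,
                # Discard the ids-query score; keep only the function result.
                "boost_mode": "replace",
            }
        },
    }


page_ids = ["7", "12", "31"]  # would come from the hits of the first pass
prefs = [{"filter": {"term": {"color": "red"}}, "weight": 3}]  # hypothetical
print(second_pass(page_ids, prefs))
```

The client then joins the two responses by document ID, keeping the order from the first pass and attaching the score from the second.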
I am new to Elasticsearch. I would like to know if the following steps are how people typically use ES to build a search engine.
Use Elasticsearch to get a list of qualified documents/results based on a user's input.
Build and use a search ranking model to sort this list.
Use this sorted list as the output of the search engine to the user.
I would probably add a few steps
Think about your information model.
What kinds of documents are you indexing?
What are the important fields and what field types are they?
What fields should be shown in the search result?
All this becomes part of your mapping
Index documents
Are the underlying data changing or can you index it just once?
How are you detecting new documents/deletes/updates?
This will be included in your connectors, which can be set up in multiple ways, for example using the Documents API
A bit of trial and error to sort out your ranking model
Depending on your use case, the default ranking may be enough.
Have a look at the Search API to try out different ranking.
Use the search result list to present the results to the end user
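As a rough illustration of the mapping and search steps above, here is a sketch of an index definition and a search body as Python dicts. All field names are made up for the example, and the typeless `mappings.properties` layout assumes Elasticsearch 7 or later.

```python
# Hypothetical information model: the field names are illustrative only.
index_definition = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # full-text searchable, scored
            "category": {"type": "keyword"},  # exact values, good for filters
            "updated_at": {"type": "date"},   # helps when detecting updates
        }
    }
}


def search_body(user_input, fields_to_show):
    """Default ranking on the title; return only the fields shown in results."""
    return {
        "_source": fields_to_show,
        "query": {"match": {"title": user_input}},
    }


print(search_body("wireless headphones", ["title", "category"]))
```

Starting from the default ranking like this makes it easy to see whether a custom ranking model is needed at all.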
We understand a user's behavior by analyzing the tags he usually searches for.
Now we need to give higher precedence for such tags for these users. I would like to know how we can achieve this using Elasticsearch in an elegant manner.
Well the best approach for this would be to
Analyse the behavior of the user
See which keywords are of interest to him
Maintain one document per user in another index which has all these keywords.
On the searches for that user, boost the occurrence of these keywords using a function_score query
You can use a terms filter inside a boost function to achieve this. Add the boost function under functions in the function_score query
In the terms filter, you can point to this user's document and get the values dynamically
Use a custom filter key so that the constructed cache key won't eat too much memory
With this approach, you can avoid lots of code paths in client code.
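The steps above can be sketched roughly as follows, using a terms lookup inside a `function_score` function. The index name `user_profiles`, the fields `keywords`, `tags`, and `title`, and the boost value are all assumptions. The syntax shown is the one used by recent Elasticsearch versions (`weight`, and a lookup with `index`/`id`/`path`); older versions differ, and the custom cache key mentioned above is not shown.

```python
def personalised_search(text, user_id, boost=2.0):
    """Boost documents whose tags appear in the user's stored keyword list."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},  # hypothetical field
                "functions": [
                    {
                        # Terms lookup: the values are read at query time from
                        # the per-user document, not sent by the client.
                        "filter": {
                            "terms": {
                                "tags": {
                                    "index": "user_profiles",  # assumed index
                                    "id": user_id,
                                    "path": "keywords",  # assumed field
                                }
                            }
                        },
                        "weight": boost,
                    }
                ],
                # Multiply the text score by the weight when the filter matches.
                "boost_mode": "multiply",
            }
        }
    }


print(personalised_search("running shoes", "user-42"))
```

Because the lookup happens inside Elasticsearch, the client only needs to pass the user's ID with each search.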
I have a search query such as "match":{"product name":"glasses car"} so that I am looking for glasses or cars (or both).
But I want to exclude the docs that have google glass or google car in them. So how would I filter them out?
I could use a bool query and use must_not for google, but then I will lose the scoring and get a constant score.
You should use a filtered query with a filter to exclude words like google and keep your main search as the query.
The filter does not affect scoring, only the query part of the filtered query does.
You can find a small example on the Filtered Query page in the Elasticsearch documentation.
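A hedged sketch of the same idea as a Python dict follows. Note that the `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0, so the sketch uses the `bool` equivalent instead. There, `must_not` clauses run in filter context, so the score still comes only from the `must` clause. The field name `product name` is taken from the question.

```python
import json


def glasses_or_cars_without_google():
    """Score on 'glasses car' while excluding documents that mention google."""
    return {
        "query": {
            "bool": {
                # Scored part: the original match query from the question.
                "must": [{"match": {"product name": "glasses car"}}],
                # Exclusion part: removes hits, contributes nothing to _score.
                "must_not": [{"match": {"product name": "google"}}],
            }
        }
    }


print(json.dumps(glasses_or_cars_without_google(), indent=2))
```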
I need to facet inside n documents which are selected like
... ORDER BY something DESC LIMIT 100
Is that possible with Solr? How?
This is a total hack, but here goes...
Do your initial query, and get your results back.
Construct a new query, like so:
http://localhost:8080/solr/select/?q=id%3A123+OR+id%3A456...(keep OR-ing them up)...&facet=true&facet.field=something
where you concatenate all of your ids into a new query using OR. Then, when you facet on your field, the facet summary will only apply to those results.
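Building that URL by hand gets tedious, so here is a small sketch that assembles it from the IDs returned by the first query. The helper name `facet_over_ids` is made up for illustration. The base URL and parameters mirror the example above.

```python
from urllib.parse import urlencode


def facet_over_ids(base_url, ids, facet_field):
    """Build a Solr select URL that facets only over the given document IDs."""
    # OR the IDs together so the result set is exactly the first query's page.
    q = " OR ".join(f"id:{doc_id}" for doc_id in ids)
    params = {"q": q, "facet": "true", "facet.field": facet_field}
    # urlencode escapes ':' as %3A and spaces as '+'.
    return f"{base_url}?{urlencode(params)}"


url = facet_over_ids("http://localhost:8080/solr/select/", [123, 456], "something")
print(url)
```

With a LIMIT of 100 the query stays short, but the URL grows with every ID, so this only suits small result pages.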
AFAIK no, that's not supported / implemented. Facets aren't really meant to be "stats" but a guidance to the end-user. Picture yourself browsing a faceted interface and seeing facets change whenever you change sort order or paging. Faceted browsing would be useless if it worked like that.
I think this would be a nice feature for the StatsComponent though.
I think this is possible with results grouping (now in trunk!):
http://wiki.apache.org/solr/FieldCollapsing
... the only problem is that you can set only one 'facet.field' (i.e. group.field)
But the great thing is that you get scored facets!