filtering out docs that have a word from a stop list - elasticsearch

I have a search query such as "match":{"product name":"glasses car"}, so I am looking for glasses or cars (or both).
But I want to exclude the docs that mention Google Glass or a Google car. How would I filter them out?
I could use a bool query and use must_not for google, but then I will lose the scoring and get a constant score.

You should use a filtered query with a filter to exclude words like google and keep your main search as the query.
The filter does not affect scoring, only the query part of the filtered query does.
You can find a small example on the Filtered Query page in the ElasticSearch documentation.
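For what it's worth, here is a rough sketch of the request body in Python. Note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, so this sketch uses the bool query that replaced it: the must clause carries the scored search, and must_not carries the exclusion. must_not clauses run in filter context, so they do not contribute to the score and the relevance ranking from the match query is kept. The helper name build_query is made up for illustration; the field name "product name" and the term google are taken from the question.

```python
import json


def build_query(search_text, excluded_terms, field="product name"):
    """Build a request body that scores on `search_text` and drops any
    document whose `field` matches one of `excluded_terms`.

    Hypothetical helper: it only assembles a dict, it does not talk to a cluster.
    """
    return {
        "query": {
            "bool": {
                # Scored part: contributes to _score as usual.
                "must": [{"match": {field: search_text}}],
                # Exclusion part: filter context, no effect on _score.
                "must_not": [{"match": {field: term}} for term in excluded_terms],
            }
        }
    }


body = build_query("glasses car", ["google"])
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of a _search request with whatever client you use.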

Related

Elasticsearch multiple score fields

Maybe a dummy question: is it possible to have multiple score fields?
I use a custom score based on a function_score query. This score is displayed to the user to show how much each document matches his/her preferences. So far so good.
But! The user should be able to filter the documents and (of course) sort them not only by the custom relevance (how much each document matches his/her preferences) but also by the common relevance - how much each document matches the filter criteria.
So my first idea was to place the score calculated by the function_score query in a custom field, but it does not seem to be supported.
Or am I completely wrong and I should use another approach?
I took a different approach: when the user applies a filter, I run the query without the function_score percolation, use the score calculated by ES, and sort by it. Then I take all IDs from the result page and run a percolation query with these IDs to get the custom "matching score". It does not seem to cause a noticeable slowdown.
Anyway, I welcome any feedback.

Return a list of search results with results related to user first with ElasticSearch or Neo4j

I'm trying to choose a database/search engine to return a list of results which shows any results the user has a relationship with first, then others after. This is similar to the way Facebook works: you search for a business name, and the ones you have liked appear first, then the others.
I've seen this question, which is similar to what I need, but I believe it only shows results for that user: How can ElasticSearch be used to implement social search?
Is this possible with either ElasticSearch, Neo4j or anything else?
Elasticsearch can certainly do this.
Results are returned from Elasticsearch based on the score, which basically means the better the match the bigger the score.
You could use the "bool" query to specify your query as a "must" and then the user match as a "should". Optionally you might want to add a "boost" to the should query so it scores highest if matched.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
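A minimal sketch of that bool query as a Python dict, under some assumptions: the field names name and liked_by, the helper name, and the boost value of 10 are all invented for illustration and would need to match your own mapping.

```python
import json


def build_user_first_query(search_text, user_id, boost=10.0):
    """Match every business by name, but score the ones this user has
    liked higher so they sort to the top.

    Hypothetical field names: `name` (text) and `liked_by` (list of user ids).
    """
    return {
        "query": {
            "bool": {
                # Required: decides which documents are returned at all.
                "must": [{"match": {"name": search_text}}],
                # Optional: documents matching this get extra score.
                "should": [
                    {"term": {"liked_by": {"value": user_id, "boost": boost}}}
                ],
            }
        }
    }


print(json.dumps(build_user_first_query("coffee shop", "user-42"), indent=2))
```

Because the should clause is optional when a must clause is present, businesses the user has not liked are still returned, just with a lower score.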

haystack elasticsearch facet but disregard filters on this query

Using haystack solr I can do this:
# facet by category but disregard any category filters on this query
squeryset = squeryset.facet('{!ex=category}category')
which will give me facets for the category but ignore any category filters
Now how do I do the same query using elasticsearch?
It's for when someone queries for something with a specific category, I can show the counts for the other categories they did not select.
I was looking for the same thing. See http://demo.fullscale.co/multiselect/ for a working demo to do this in ES. It is however not supported by Haystack. I'm working on that to make it work.
Paul
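For anyone sending the request to Elasticsearch directly rather than through Haystack, one common way to get this behaviour is to put the category restriction in post_filter instead of in the query. Aggregations are computed over the documents matched by the query, and post_filter is applied to the hits only afterwards, so the category counts ignore the selected category. The sketch below is a raw request body, not a Haystack API; the field names title and category and the helper name are assumptions.

```python
import json


def build_faceted_search(text, selected_category=None):
    """Return a search body whose category counts ignore the category filter.

    Hypothetical field names: `title` (text) and `category` (keyword).
    """
    body = {
        "query": {"match": {"title": text}},
        # Counted over everything the query matches, before post_filter.
        "aggs": {"categories": {"terms": {"field": "category"}}},
    }
    if selected_category is not None:
        # Narrows the returned hits only, not the aggregation above.
        body["post_filter"] = {"term": {"category": selected_category}}
    return body


print(json.dumps(build_faceted_search("camera", "electricals"), indent=2))
```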

Using matching document original score in filter script for custom filters score query

I want to use "custom filters score" query and use filters to control the score of resulting documents.
I want a way to use the document's original score as computed by ElasticSearch, and then use that score to calculate the final score of the document, which matches the given filters.
Something like "_docScore * 50/100" as a script for a filter, where "_docScore" is the original score of a document that matches the filter.
How to achieve this in ElasticSearch?
Any help is greatly appreciated.
Regards & Thanks,
Aditya.
Documents in a filtered query would be unranked and have the same score.
http://www.elasticsearch.org/guide/reference/query-dsl/custom-score-query/
But you can use a custom score query together with a filtered query and use a script to calculate a score based on the document values. This was added in 0.90, I believe.
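In later Elasticsearch versions the custom score and custom filters score queries were replaced by function_score, where a script_score script can read the original score as _score. A rough sketch of the "_docScore * 50/100" idea in that form follows; the field names, the helper name, and the percent parameter are assumptions for illustration only.

```python
import json


def build_scaled_score_query(search_text, filter_field, filter_value, percent=50):
    """Scale the original score of documents matching a filter.

    Hypothetical helper and field names; `title` is an assumed text field.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": search_text}},
                "functions": [
                    {
                        # The function only applies to documents matching this filter.
                        "filter": {"term": {filter_field: filter_value}},
                        "script_score": {
                            "script": {
                                # _score is the score computed by the inner query.
                                "source": "_score * params.percent / 100.0",
                                "params": {"percent": percent},
                            }
                        },
                    }
                ],
                # Use the script's value as the final score for matching documents.
                "boost_mode": "replace",
            }
        }
    }


print(json.dumps(build_scaled_score_query("camera", "brand", "acme"), indent=2))
```

How documents that match none of the filters are scored under boost_mode replace is worth checking against the documentation for your version. If all you need is a fixed scaling factor, a function with "weight": 0.5 and the default multiply mode is simpler: matching documents get half their original score and the rest keep theirs.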

Lucene.NET: Query or Filter?

It is my understanding that documents are found based on a query, and that result is then narrowed down by the filter.
The query is the only thing that will affect the score/relevance of a document.
Would there be any performance (caching) improvement if I use queries for the criteria that should affect relevance, and filters for the criteria that should not?
Here is my situation. I have a lot of products, and the website will often search for products by category or manufacturer. I was thinking about using queries for that as that will bring the products down to a smaller subset which can be cached. I can then filter my results by product specifications. Should I use filters for specifications? That way we can filter based on an already cached (by lucene) subset of products (category or manufacturer).
Using filters also does not affect the returned score, whereas additional terms in a query do. You should use filters, for example, if a user picks a certain category from a list of available categories shown as facets:
Category : Electricals
Query Terms : DSLR Camera
The resulting scores (relevancy) are based on the query terms rather than on a hit on the category.
The difference between a filter and a query is mostly that a filter is exact. If you filter on brand=... then you will only get that exact brand. If you query on it, you will get the brand and possibly other results that also match your query.
So the question is, do you want an exact filter, or is it just for relevance?
Filtering provides a mechanism to further restrict the results of a query and provide a possible performance gain if the same query is run multiple times.
We mostly use filters for security - this would provide performance gains as results of the query are cached.

Resources