i have a doubt about solr possibilities. i need to do a request with special issues:
i need first: promoted records with all the terms typed by the user (ordered randomly).
second: promoted records with any term typed by the user (ordered randomly).
third: promoted records found by the stemming search (ordered randomly).
fourth: promoted records found by the phonetic search (randomly).
fifth: free records ordered alphabeticly (having all or any term typed by the user).
these results need to be paginated.
is it possible to do it in the same request?
After finding out that random ordering is support in solr via:
<fieldType name="random" class="solr.RandomSortField" />
<field name="random" type="random" indexed="true" stored="false"/>
those queries will be possible but NOT in one query
Although one could use the facet and facet.query feature, but this only returns the count ... not the docs.
I would setup a separate advertising index instead of the normal way to implement 'advertising' with the elevation component
promoted records with all the terms typed by the user (ordered randomly)
a simple AND query against the advertising index
promoted records with any term typed by the user (ordered randomly)
a simple OR query against the advertising index
promoted records found by the stemming search (ordered randomly).
normal search (with stemming) in the advertisment index.
promoted records found by the phonetic search (randomly).
you'll need to transform the query and the terms via your own phonetic transformation to do that. so you'll have a special field phonetic_text and you'll need to query this via
q=phonetic_text:"U R G8" (which means: you are great ;-))
free records ordered alphabeticly (having all or any term typed by the user).
again normal search via "AND" or "OR" with the sort parameter
Related
Let's say I have two texts:
Text 1 - "The fox has been living in the wood cabin for days."
Text 2 - "The wooden hammer is a dangerous weapon."
And I would like to search for the word "wood", without it matching me "wooden hammer". How would I do that in Elastic Search or nest?
Term query is used for exact matches search. However it's not recommended to use it against text fields, the following quote from term query documentation:
To better search text fields, the match query also analyzes your
provided search term before performing a search. This means the match
query can search text fields for analyzed tokens rather than an exact
term.
The term query does not analyze the search term. The term query only
searches for the exact term you provide. This means the term query may
return poor or no results when searching text fields.
The problem with text exact matches, as described in the Term query documentation:
By default, Elasticsearch changes the values of text fields as part of
analysis. This can make finding exact matches for text field values
difficult.
So, the documents data is modified (i.e., analyzed) before indexing. This depends on the index mapping definition for each field, defaults to the default index analyzer, or the standard analyzer.
But the default standard analyzer will not change the token "Wooden" to "Wood", this might happen if you used stemming for this field.
This means, if you don't use a different analyzer or stemming, querying with "Wood" shouldn't match "Wooden" token.
To summarize: Indexed data is modified/analyzed before indexing (based on the field mapping definition). Match query analyze the search query, while Term query doesn't analyze the search query. So you have to properly chose the field mapping and the search query to better suit your use case
For some use cases, like storing email addressed, phone numbers or keyword fields that always have the same value, consider using the Keyword type, which is suitable for exact matches in these use cases. However, ES recommends:
Avoid using keyword fields for full-text search. Use the text field
type instead.
So for better visibility and practical solution for your use case, it's better to elaborate more the field mapping you use and what you want to achieve.
I'm trying to improve search on my service but get stuck on complex queries.
I need to match some documents by terms but return only documents that contains all of provided terms in any order and contains only this terms.
So for example, lets take movie titles:
"Jurassic Park"
"Lost World: Jurassic Park"
"Jurassic Park III"
When I type "Park Jurassic" I want only first document to be returned because it contains both words and nothing more.
This is silly example of complex problem but I've simplified it.
I tried with terms queries, match etc but I don't know how to check if entire field was matched.
So in short it must match all tokens in any order.
Field is mapped as text and also as keyword.
You tested the terms set query?
Returns documents that contain a minimum number of exact terms in a
provided field.
The terms_set query is the same as the terms query, except you can
define the number of matching terms required to return a document.
Being new to ElasticSearch, need help in my understanding.
What I read about term vs match query is that term query is used for exact match and match query is used when we are searching for a term and want result based on a relevancy score.
But if we already defined a mapping for a field as a keyword, why anyone has to decide upon between term vs match, wouldn't it be always a term query in case mapping is defined as a keyword?
What are the use cases where someone will make a match query on the keyword mapping field?
The same confusion is vice versa.
A text field will be analyzed (transformed, split) to generate N tokens, and the keyword itself will become a token with no transformations. At the end, you have N tokens referencing a document.
Then.
By doing a match query, you will treat your query as a text as well, by analyzing it before performing the matching (transforming it), and the term will not.
You can create a field with a term mapping, but then perform a match query on top of it (for example if you want to be case insensitive), and you can create a text mapping for a n-gram and perform a term query to match exactly what you're asking for.
We have an ES index which has a field which stores its data as an array. In this field, we include the original text, plus text without any punctuation, special characters, etc. The problem is, when searching on the field, the multiple values appears to be skewing the score.
For example, if we search on the term 'up', the document which has the array ['up, up and away', 'up up and away'] is scoring higher with a multi_match (we are using because we may search more than one field) than the document with the array as simply ['up'].
In the end, I guess what I am looking for is a score that emulates calculating a score for each item in the array and returning me the highest. I believe in this case, comparing 'up' to 'Up' and 'Up, Up and Away' will give me a higher score for 'Up'.
With my research, I believe I may need to do custom scoring on this field...? If that is true, am I looking at "score_mode": "max" as what I want?
I think you slightly over-engineered your index. You don't need to create duplicate fields for the same information and remove punctuation, lowercase fields yourself.
I'd recommend you to read what are elasticsearch token filters and how to create multiple analyzers for the same field.
For your exact use case, if you provided a document sample, it would certainly help. But in any case looking at what you are dealing with - index your array of strings with default analyzer and with a custom one that you'll build yourself. Then you can use the same field, but with different analyzers (differently processed text) to control your score.
I am using Haystack with ElasticSearch and I would like to perform boosts that don't just boost a term in general, but instead boost a term only when it is found on a specific field.
For instance, on my UserIndex, I would like to prioritize (boost) search results where the user is marked as active. is_active is a BooleanField on the index model. I know how to filter so that I only fetch active users, but how can I boost active users but not outright filter out inactive users? I could apply a boost to the field in UserIndex, but that doesn't seem like it would work without some way other than an outright filter to search against that BooleanField (since otherwise there are no search terms that the field boost would affect). I could apply a boost to the SearchQuerySet, but the boost() function takes a string which appears to just be a straight-up search term, and you cannot specify a field for that term to occur in.
I might be able to solve that issue in isolation with order_by, but I have a bunch of other complex boosts I want to do:
I want to be able to boost matching users if they have IDs in a list specified by the application at runtime (this is so I can boost users relative to the context of the page where the search button was pressed). I could simply boost a search term containing the user's ID, but then if that number was coincidentally in another field, it would boost that field too and thus give very strange results.
I want to be able to boost the searching user's friends. I currently have the list of every user's friends in a MultiValueField on the search index model. I want to pass the searching user's ID in with the search query, and boost any users in the index who have the searching user's ID in their friends list. Again, I have the same problem as above -- I can boost the ID, but I can't specify that I only want to boost the occurrence of that ID in that specific field.
I have a second BooleanField I want to boost by, similar to is_active but boosted by a smaller amount.
All of this is easy-ish if I can boost by a combination of a term and a field, but it seems very hard if I can only boost a term and not a field.
The only thing I have been able to think of so far is basically a hack: instead of BooleanFields, use CharFields with magic strings in them. Then boost those magic strings as search terms, and count on nobody accidentally using the magic strings in their inputted text. Likewise, instead of raw ids in my MultiValueFields, use ids prepended with magic strings. This is awkward, fragile and potentially buggy given that the behavior of the ElasticSearch standard tokenizer may be unpredictable given nonsensical "magic strings".
Another option I considered was using a Raw input type and adding ElasticSearch-specific syntax, but usage of Raw with ElasticSearch is almost entirely undocumented and the ElasticSearch boosting documentation itself is very thin.
Is there any way to solve this that does not involve mangling my index data in such a fashion?
In your mapping you could add:
"is_active":{
"type":"boolean",
"boost":10.0
}
and
"friends":{
"type":"int",
"index":"not_analyzed",
"boost":5.0
}
And then wrap your original query in a boolean query with a MUST on your original query and a SHOULD on is_active:true and SHOULD on friends:1234