Excluding specific fields from consideration in a multi_match query - elasticsearch

We use the fields property of a multi_match query to include certain fields by wildcard. For example, we often want to construct a query that searches only the content (as opposed to title and other fields) across the various types in our system. So we include '*.content' in the fields property of our query. However, there are some cases where we might want to exclude one or more types from the search. So I'm looking for something like an exclude_fields property that would exclude those specific fields from consideration.
I realize that one way to achieve this would be to not use the wildcard and explicitly add 'Foo.content, Bar.content, ...' to the fields property, but this would be very cumbersome in our system.
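If the explicit list is the fallback, it can at least be generated rather than typed by hand. The following is a minimal sketch, assuming the application can enumerate its type names; `ALL_TYPES`, `content_fields`, `content_query`, and the type name `Baz` are hypothetical and not part of the original system.

```python
# Hypothetical list of type names; a real system might read these from the index mapping.
ALL_TYPES = ["Foo", "Bar", "Baz"]


def content_fields(all_types, excluded_types=()):
    """Return explicit '<Type>.content' field names, skipping excluded types."""
    excluded = set(excluded_types)
    return [f"{t}.content" for t in all_types if t not in excluded]


def content_query(text, excluded_types=()):
    """Build a multi_match request body that searches content fields only."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": content_fields(ALL_TYPES, excluded_types),
            }
        }
    }


body = content_query("hello", excluded_types=["Bar"])
print(body["query"]["multi_match"]["fields"])
```

This keeps the exclusion logic in application code, so the query itself only ever contains concrete field names.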

Related

Control which multi fields are queried by default

I have a preexisting index that contains field mappings and is currently being queried by many applications. I would like to add additional ways for the data to be queried, specifically, support full text search via analysis. Multi-fields seemed like the obvious way to do this, but I found that adding new multi-fields actually changes the existing query behavior.
For example, I have an "id" field that is a keyword. Applications are already using this field to query on. After I add a new multi-field, like "txt" (using the standard analyzer), new documents can be found by querying with just a partial value match. Values for "id" look like this: "123-abc" so now a query with just "abc" will match when querying against the "id" field. This is not how it worked previously (the keyword only field would require the entire value "123-abc").
Ideally, the top-level "id" field would be keyword only, and if a "full text" search was required, the query would need to specify "id.txt". So my question is... is there a way to disable multi-fields and require that the query explicitly set a sub field when needed?
My only other thought on how to solve this was to use copy_to so that these fields are completely distinct, but that is a bit more work, and there are many fields that would require it.
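For reference, the desired setup could be sketched as below. The sub-field name `txt`, the standard analyzer, and the sample value `123-abc` come from the question; the Python variable names are illustrative only.

```python
# Mapping fragment: "id" stays a keyword, and "id.txt" is an analyzed sub-field.
mapping = {
    "properties": {
        "id": {
            "type": "keyword",
            "fields": {
                "txt": {"type": "text", "analyzer": "standard"},
            },
        }
    }
}

# Exact-value lookup against the keyword field: needs the entire value.
exact_query = {"query": {"term": {"id": "123-abc"}}}

# Full-text lookup that names the sub-field explicitly.
full_text_query = {"query": {"match": {"id.txt": "abc"}}}
```

Queries that name `id` or `id.txt` explicitly, as here, make it unambiguous which field is searched.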

Avoid part of a string search in elasticsearch

I have a scenario where I want to search for 'bank of india', but the documents retrieved include hits for 'reserve bank of india', 'state bank of india', etc. Basically, the named entity in the search string is also part of other named entities.
What are the ways to avoid it in elasticsearch?
If you map your entity field as keyword instead of text, you will no longer get those partial matches. keyword treats the whole string as a single unit (which suits named entities), while text treats each word as a unit and the field as a bag of words, so the query looks for the most word matches regardless of order or whether all of the words are present. Some queries can require order (match_phrase) or require all words to match (the minimum_should_match parameter), but with the keyword mapping strategy I prefer the term query. Does that make sense?
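The three options mentioned in this answer can be sketched as request bodies. This is only an illustration: the field name `entity` and the helper names are assumptions, and the first helper presumes the field is mapped as keyword while the other two presume a text mapping.

```python
FIELD = "entity"  # hypothetical field name


def exact_entity_query(name):
    """term query: matches only when the keyword field equals `name` exactly."""
    return {"query": {"term": {FIELD: name}}}


def phrase_query(name):
    """match_phrase on a text field: the words must appear in this order."""
    return {"query": {"match_phrase": {FIELD: name}}}


def all_words_query(name):
    """match on a text field, requiring every word via minimum_should_match."""
    return {
        "query": {
            "match": {FIELD: {"query": name, "minimum_should_match": "100%"}}
        }
    }


print(exact_entity_query("bank of india"))
```

Note that a phrase query for 'bank of india' still matches 'reserve bank of india', because the phrase occurs inside the longer name; only the keyword and term combination rules out the longer entities.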

Elasticsearch 7 - prevent fields from being searchable

I know I can prevent fields from being indexed by setting the enabled mapping to false. This does work as expected but I am concerned that some of these fields will be needed in the future.
In my use case, I am searching for a product SKU of t-shirt-small-red and while ES7 does return the correct results, it also returns everything else as I am indexing the created_at and updated_at fields with dates 2020-02-08 00:00:00.
At least for now, I have no use for searching these within my app so I would like a way to exclude these from any search while keeping them indexed for future use. I am guessing I may want to perform filtering or aggregation on these in the future.
I know I can limit the search to just a single field but that does not work for this either. I need the search to work across every field apart from these 2 date fields.
As one of the comments suggested, you can leave them out of the fields part of the query. I was not using fields before, so in practice this means specifying all the other fields explicitly.
Additionally, I found that mapping these fields as type date also ensured they did not show up as false positives in the search.
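Listing every field by hand can be avoided by deriving the list from the mapping. A minimal sketch, assuming the mapping's properties are available as a Python dict; `searchable_fields` and the `sku` and `name` fields are hypothetical, while `created_at` and `updated_at` are the two date fields from the question.

```python
def searchable_fields(properties, skip_types=("date",), skip_names=()):
    """List top-level field names, leaving out the given types and names."""
    return [
        name
        for name, spec in properties.items()
        if spec.get("type") not in skip_types and name not in skip_names
    ]


# Hypothetical mapping properties for a product index.
properties = {
    "sku": {"type": "keyword"},
    "name": {"type": "text"},
    "created_at": {"type": "date"},
    "updated_at": {"type": "date"},
}

body = {
    "query": {
        "multi_match": {
            "query": "t-shirt-small-red",
            "fields": searchable_fields(properties),
        }
    }
}
print(body["query"]["multi_match"]["fields"])
```

The date fields stay indexed and remain available for filtering or aggregation; they are simply never named in the search.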

How to query all fields individually with ElasticSearch

As I understand it, ElasticSearch searches on the magic _all field by default. The problem with this seems to be that if a field uses a different index analyzer, the analyzed data from this field is not searched.
I've had success with searching on the fields ['domain', '_all'] but I really need to avoid having to manually specify each field which was analyzed differently. I see fields supports wildcards but seemingly not '*' on its own. I could do a*, b*, c*, d*, etc., but this seems a tad inefficient.
The special field "_all" has been discontinued; per the official documentation, copy_to can be used instead. This approach lets you create a combined field (managed by Elasticsearch) that receives copies of the values from other fields, mimicking an _all search.
However, there is an alternative approach: use multi_match and provide wildcard field names as part of the query. This works much like the earlier mechanism of searching the "_all" field.
{"multi_match":{"query":"java","fields":["*"]}}
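The copy_to alternative mentioned in this answer could look like the sketch below. The field names `title`, `body`, and `all_text` are hypothetical; only the `copy_to` mapping parameter and the wildcard multi_match query come from the answer.

```python
# Mapping fragment: values of "title" and "body" are also copied into "all_text".
mapping = {
    "properties": {
        "title": {"type": "text", "copy_to": "all_text"},
        "body": {"type": "text", "copy_to": "all_text"},
        "all_text": {"type": "text"},
    }
}

# Search the combined field, in the spirit of the old _all field.
all_text_query = {"query": {"match": {"all_text": "java"}}}

# The wildcard alternative from the answer, wrapped in a full request body.
wildcard_query = {"query": {"multi_match": {"query": "java", "fields": ["*"]}}}
```

The combined field is analyzed with its own analyzer, so per-field analysis is not preserved there, whereas the wildcard query searches each field with that field's own analysis.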

Haystack boosting based on specific value in specific field

I am using Haystack with ElasticSearch and I would like to perform boosts that don't just boost a term in general, but instead boost a term only when it is found on a specific field.
For instance, on my UserIndex, I would like to prioritize (boost) search results where the user is marked as active. is_active is a BooleanField on the index model. I know how to filter so that I only fetch active users, but how can I boost active users but not outright filter out inactive users? I could apply a boost to the field in UserIndex, but that doesn't seem like it would work without some way other than an outright filter to search against that BooleanField (since otherwise there are no search terms that the field boost would affect). I could apply a boost to the SearchQuerySet, but the boost() function takes a string which appears to just be a straight-up search term, and you cannot specify a field for that term to occur in.
I might be able to solve that issue in isolation with order_by, but I have a bunch of other complex boosts I want to do:
I want to be able to boost matching users if they have IDs in a list specified by the application at runtime (this is so I can boost users relative to the context of the page where the search button was pressed). I could simply boost a search term containing the user's ID, but then if that number was coincidentally in another field, it would boost that field too and thus give very strange results.
I want to be able to boost the searching user's friends. I currently have the list of every user's friends in a MultiValueField on the search index model. I want to pass the searching user's ID in with the search query, and boost any users in the index who have the searching user's ID in their friends list. Again, I have the same problem as above -- I can boost the ID, but I can't specify that I only want to boost the occurrence of that ID in that specific field.
I have a second BooleanField I want to boost by, similar to is_active but boosted by a smaller amount.
All of this is easy-ish if I can boost by a combination of a term and a field, but it seems very hard if I can only boost a term and not a field.
The only thing I have been able to think of so far is basically a hack: instead of BooleanFields, use CharFields with magic strings in them. Then boost those magic strings as search terms, and count on nobody accidentally using the magic strings in their inputted text. Likewise, instead of raw ids in my MultiValueFields, use ids prepended with magic strings. This is awkward, fragile and potentially buggy given that the behavior of the ElasticSearch standard tokenizer may be unpredictable given nonsensical "magic strings".
Another option I considered was using a Raw input type and adding ElasticSearch-specific syntax, but usage of Raw with ElasticSearch is almost entirely undocumented and the ElasticSearch boosting documentation itself is very thin.
Is there any way to solve this that does not involve mangling my index data in such a fashion?
In your mapping you could add:
"is_active": {
    "type": "boolean",
    "boost": 10.0
}
and
"friends": {
    "type": "integer",
    "index": "not_analyzed",
    "boost": 5.0
}
Then wrap your original query in a bool query, with a must clause for your original query and should clauses for is_active:true and friends:1234.
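The wrapped query might be built as in the sketch below. It is raw Elasticsearch query DSL rather than a Haystack API, and it deliberately swaps in query-time boosts on the should clauses instead of the mapping-level boosts shown above; `boosted_query` and its parameters are hypothetical names.

```python
def boosted_query(original_query, user_id, active_boost=10.0, friend_boost=5.0):
    """Wrap a query so matches are required, while active users and friends rank higher."""
    return {
        "query": {
            "bool": {
                # The original query must match; this alone decides inclusion.
                "must": [original_query],
                # should clauses only add to the score and never exclude a hit.
                "should": [
                    {"term": {"is_active": {"value": True, "boost": active_boost}}},
                    {"term": {"friends": {"value": user_id, "boost": friend_boost}}},
                ],
            }
        }
    }


body = boosted_query({"match": {"text": "alice"}}, user_id=1234)
print(body)
```

Because each boost is tied to a term on one specific field, an ID that happens to appear in another field has no effect on the score.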
