How to get unanalyzed text in ES for RegEx or WildCard queries? - elasticsearch

With a text field, doc['my_text_field'] in a script shows the analyzed tokens. As a result, regex and wildcard queries cannot be formulated in a reasonable way against it.
The text is, however, available in its original form when adding a scripted field via params._source.my_text_field.
How do I get the unanalyzed text, like params._source.my_text_field, for a normal Elasticsearch query?
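One approach, assuming you control the mapping, is a multi-field: keep the analyzed text field and add a keyword sub-field, then point the wildcard or regexp query at the sub-field, which holds the original, unanalyzed string. A minimal sketch (the index name, sub-field name, and pattern below are placeholders):

PUT /my_index
{
  "mappings": {
    "properties": {
      "my_text_field": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}

GET /my_index/_search
{
  "query": {
    "wildcard": {
      "my_text_field.raw": { "value": "*fox*" }
    }
  }
}

In scripts, doc['my_text_field.raw'] then exposes the same original value that params._source.my_text_field does, without loading _source.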

Related

How to search for an exact word in a text in Elasticsearch

Let's say I have two texts:
Text 1 - "The fox has been living in the wood cabin for days."
Text 2 - "The wooden hammer is a dangerous weapon."
And I would like to search for the word "wood" without it matching "wooden hammer". How would I do that in Elasticsearch or NEST?
The term query is used for exact-match searches. However, it's not recommended against text fields; the following quote is from the term query documentation:
To better search text fields, the match query also analyzes your provided search term before performing a search. This means the match query can search text fields for analyzed tokens rather than an exact term.
The term query does not analyze the search term. The term query only searches for the exact term you provide. This means the term query may return poor or no results when searching text fields.
The problem with text exact matches, as described in the Term query documentation:
By default, Elasticsearch changes the values of text fields as part of analysis. This can make finding exact matches for text field values difficult.
So, the document's data is modified (i.e., analyzed) before indexing. How it is modified depends on the index mapping definition for each field, which defaults to the index's default analyzer, or else the standard analyzer.
But the default standard analyzer will not change the token "Wooden" to "Wood"; that might happen if you used stemming for this field.
This means that if you don't use a different analyzer or stemming, querying with "Wood" shouldn't match the "Wooden" token.
To summarize: indexed data is modified/analyzed before indexing (based on the field mapping definition). The match query analyzes the search query, while the term query doesn't. So you have to choose the field mapping and the search query that best suit your use case.
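A quick hedged illustration of that difference (the index and field names are made up): with the standard analyzer, "The wooden hammer" is indexed as the lowercased tokens [the, wooden, hammer]. A term query with the original casing finds nothing, because the search term is not analyzed:

GET /my_index/_search
{
  "query": { "term": { "my_text_field": "Wooden" } }
}

while a match query analyzes "Wooden" down to "wooden" first, and does match:

GET /my_index/_search
{
  "query": { "match": { "my_text_field": "Wooden" } }
}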
For some use cases, like storing email addresses, phone numbers, or other fields that always hold the same literal value, consider using the keyword type, which is suitable for exact matches. However, the ES documentation recommends:
Avoid using keyword fields for full-text search. Use the text field type instead.
So, for a concrete and practical solution to your use case, it would be better to elaborate on the field mapping you use and what you want to achieve.
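For the email/phone case above, a sketch of such a keyword mapping and an exact lookup (the index, field, and value are illustrative):

PUT /users
{
  "mappings": {
    "properties": {
      "email": { "type": "keyword" }
    }
  }
}

GET /users/_search
{
  "query": { "term": { "email": "jane.doe@example.com" } }
}

Because keyword fields are not analyzed, the term query matches only documents whose email value is exactly that string.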

Search by ignoring value case

In my index I have inserted fields without changing the case of their values (upper case or lower case); for example, in my Elasticsearch document, a field name contains the value Hello World. I have made the name field not_analyzed for exact matching. But in that case, when I search for hello world, this document isn't returned by Elasticsearch, which might be due to case sensitivity. I have tried using a term query and a match query but haven't had any luck.
Please suggest, if there is a way.
Thanks
The only way you can do this in Elasticsearch is by analyzing the field and using token filters. There is a lowercase token filter available that you should use, but this can't really be done on the fly as in SQL, where you wrap the field to be queried in something like LOWER().
To get the effect you want, I would use something like the keyword tokenizer with the lowercase token filter. If you set this analyzer as the default for both indexing and searching, then your searches will be case insensitive too.
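A sketch of that setup using current mapping syntax (the index and analyzer names are made up; the field name follows the question):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "keyword_lowercase"
      }
    }
  }
}

With this mapping, "Hello World" is indexed as the single token "hello world", and a match query for hello world (analyzed with the same analyzer at search time) finds it regardless of case.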

Elasticsearch autocomplete and searching against multiple term fields

I'm integrating Elasticsearch into an asset tracking application. When I set up the mapping initially, I envisioned the 'brand' field being a single-term field like 'Hitachi' or 'Ford'. Instead, I'm finding that the brand field in the actual data contains multiple terms like "MB 7 A/B", "B-7", or even "Brush Bull BB72X".
I have an autocomplete component setup now that I configured to do autocomplete against an edgeNGram field, and perform the actual search against an nGram field. It's completely useless the way I set it up because users expect the search results to be restricted to what the autocomplete matches.
Any suggestions on the best way to set up my mapping to support autocomplete and subsequent searches against a multiple-term field like this? I'm considering a terms query against a keyword field, or possibly a match query with 'and' as the operator. I also have to deal with hyphens, as in "B-7".
You can use the phrase suggester. The suggesters guide is here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters.html
and the phrase suggester guide is here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html
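A minimal phrase-suggest request, assuming an index with a brand field (the index name, suggestion name, and input text are illustrative; the docs recommend running the phrase suggester against a shingled field for better candidates):

POST /assets/_search
{
  "suggest": {
    "brand-suggest": {
      "text": "brush bul",
      "phrase": {
        "field": "brand",
        "size": 3
      }
    }
  }
}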

Is it possible to set a custom analyzer to not tokenize in Elasticsearch?

I want to treat the field of one of the indexed items as one big string even though it might have whitespace. I know how to do this by setting a non-custom field to be 'not_analyzed', but what tokenizer can you use via a custom analyzer?
The only tokenizer items I see on elasticsearch.org are:
Edge NGram
Keyword
Letter
Lowercase
NGram
Standard
Whitespace
Pattern
UAX URL Email
Path Hierarchy
None of these do what I want.
The Keyword tokenizer is what you are looking for.
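You can see its behavior with the _analyze API; the keyword tokenizer emits the entire input, whitespace included, as a single token:

POST /_analyze
{
  "tokenizer": "keyword",
  "text": "New York City"
}

This returns one token, "New York City", rather than three.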
The keyword tokenizer doesn't really do any tokenization: when searching, it'll tokenize the entire query string into a single token, making text queries behave like a term query.
The issue I run into is that I want to add filters and then search for indexed keywords in a long text (keyword assignment). I would say there's no tokenizer that can do this, and the normalizer can't accept the necessary filters. The workaround for me is to prepare the text before feeding it to Elasticsearch.

Solr: How to search for a full match on a text field? Is there a hidden equal() operator?

It is quite simple to describe:
q=mydynamicfield_txt:"video"
I want hits only when mydynamicfield is exactly "video".
The other way round: how do I suppress hits where "video" is only part of the field (like "home video")?
Is this supported in Solr 3.1 out of the box, or do I have to add my own special brackets like "SOLRSTARTSOLR video SOLRENDSOLR" to my index, to later retrieve my term between "START" and "END"? A kind of manual regex anchoring.
This is a PITA because it needs special handling in the index/GUI and breaks highlighting.
What is the way to go?
regards
Peter
(=PA=)
One solution is to create an untokenized (keyword-analyzed) field and search within it; your entire text will be a single distinct token in the Solr index.
Another solution is to write a filter that reads the token count from the index and compares it to the number of query tokens, i.e., filter out entities where doc_tokens > query_tokens, assuming all query tokens matched.
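A sketch of the first approach in schema.xml (the type and field names are made up), pairing the keyword tokenizer with a lowercase filter so the whole field value becomes one token:

<fieldType name="string_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="mydynamicfield_exact" type="string_exact" indexed="true" stored="true"/>

A query like q=mydynamicfield_exact:"video" then only hits documents whose entire field value is "video".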
