The post title is pretty much what I want to ask:
Which Azure Cognitive Search analyzer is equivalent to the Elasticsearch keyword analyzer?
I want to store composite values separated by spaces that should always be searched together, such as:
Cap TC
Sis B
Act A
Act B
Act C
Using the Java API I could find the corresponding keyword analyzer:
LexicalAnalyzerName.KEYWORD
Then I also found the documentation: https://learn.microsoft.com/en-us/azure/search/index-add-custom-analyzers#built-in-analyzers
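For reference, here is a minimal sketch of an index definition (REST API) that assigns the built-in keyword analyzer to a field; the service name, index name, field names, and api-version are placeholders of my own, not from the original post:

POST https://[service-name].search.windows.net/indexes?api-version=2023-11-01
{
  "name": "codes-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    {
      "name": "code",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "keyword"
    }
  ]
}

With the keyword analyzer, the whole field value, e.g. "Cap TC", is indexed as a single token, so the two parts are always searched together. This is the same analyzer the Java constant LexicalAnalyzerName.KEYWORD refers to.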
Related
I understand that Elasticsearch avoids combining fuzzy search with prefix matching, and that it doesn't natively support such a feature because of its complexity. However, we have a directory search system that relies solely on Elasticsearch as a black-box search engine, and we need the following logic:
For example, say the search terms are "Michael Pierce Chem". We want to support full-text search on the first two terms (with a match query), and for the last term we want to apply fuzzy matching first and then a prefix match, so that "Chem" matches "chemistry", "chen", and even "YouTube Chen" thanks to full-text support.
Please give me some advice on the implementation and design. The current stack is a Node.js web app with Elasticsearch.
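A minimal sketch of one possible approach, assuming a single name field (the index and field names here are hypothetical): use a fuzzy match for the leading terms, and approximate "fuzzy or prefix" on the last term with two should clauses, since Elasticsearch cannot apply fuzziness and prefix expansion in one clause:

GET /directory/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": { "query": "Michael Pierce", "fuzziness": "AUTO" } } }
      ],
      "should": [
        { "match": { "name": { "query": "Chem", "fuzziness": "AUTO" } } },
        { "match_phrase_prefix": { "name": "Chem" } }
      ],
      "minimum_should_match": 1
    }
  }
}

Your Node.js app would split the input and route the last term into the should clauses. The match_bool_prefix query may also be worth a look: it treats the final term as a prefix and can apply fuzziness to the earlier terms.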
I am studying the various analyzers that Elasticsearch provides.
I am running tests to build something like the Netflix search.
When I type in the Netflix search (after login), results are returned after each keyup.
I noticed that Netflix matches the typed letters anywhere within a title that contains them.
For instance:
If I type "a", the results include: A man, Captain America, The Alien...
Assuming that Netflix searches only in the title (to keep my example simple), results are returned even when the typed letter is in the middle of the text, like the "a" in "Captain America".
Are they probably using the "NGram Tokenizer", or do they use some other analyzer to achieve this behavior?
I know that "shingle" is good for autocomplete, but it does not match letters in the middle of a word.
What is the best analyzer configuration to make search behave like Netflix's?
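A sketch of what such an index could look like, using an ngram tokenizer on the title field; the names and gram sizes are just example choices, and on recent versions index.max_ngram_diff must allow the gap between min_gram and max_gram:

PUT /movies
{
  "settings": {
    "index.max_ngram_diff": 2,
    "analysis": {
      "tokenizer": {
        "title_ngram": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 3,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "title_analyzer": {
          "type": "custom",
          "tokenizer": "title_ngram",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "title_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}

The search_analyzer is set to standard so the query text itself is not split into ngrams; typing "a" then matches any title containing an "a", such as "Captain America".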
I am using Elasticsearch's built-in simple analyzer (https://www.elastic.co/guide/en/elasticsearch/reference/1.7/analysis-simple-analyzer.html), which uses the lowercase tokenizer, and the text "apple 8 IS Awesome" is tokenized as:
"apple",
"is",
"awesome"
You can clearly see that it fails to tokenize the number 8, so if I search for 8, my message will not appear in the results.
I went through all the analyzers available with ES but couldn't find a suitable one that matches my requirement.
How can I tokenize all the words as well as numbers using a custom or built-in ES analyzer?
Your question is about the simple analyzer, but you link to very old documentation. Try:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html
As Val told you, you are probably looking for the standard analyzer.
If you want to see the difference, try the analyze API:
http://localhost:9200/_analyze?analyzer=simple&text=apple%208%20IS%20Awesome
http://localhost:9200/_analyze?analyzer=standard&text=apple%208%20IS%20Awesome
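On recent Elasticsearch versions the query-string form of _analyze has been removed; the equivalent JSON request looks like this (a sketch of the standard-analyzer call):

POST /_analyze
{
  "analyzer": "standard",
  "text": "apple 8 IS Awesome"
}

The standard analyzer returns the tokens apple, 8, is, awesome, i.e. the number 8 is kept.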
Hi, I was wondering whether there is any analyzer in Elasticsearch that identifies the grammar of the text (nouns, verbs, etc.).
For example, when the user searches for "fast smartphone", Elasticsearch should put more emphasis on "smartphone" than on "fast". So I would like Elasticsearch to return results in the following order:
1) docs where both words "fast smartphone" match
2) docs where only "smartphone" matches
3) docs where only "fast" matches. Or maybe docs with only "fast" should never be returned at all, since the user is mainly looking for smartphones.
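Elasticsearch has no built-in part-of-speech analyzer, so the grammatical role has to come from outside, for example a client-side tagger. Once you know which term is the noun, a bool query gives exactly the ordering above; the index and field names here are hypothetical:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "smartphone" } }
      ],
      "should": [
        { "match": { "title": "fast" } }
      ]
    }
  }
}

Documents matching both terms score highest, documents matching only "smartphone" come next, and documents matching only "fast" are never returned, because the must clause requires the noun.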
I am confused about when stemmers are used in Elasticsearch.
In the Dealing with Human Language / Reducing Words to Their Root Form section, I see that stemmers are used to strip words down to their root forms. This led me to believe that stemmers were used as a token filter in an analyzer.
But a token filter only filters tokens; it does not actually reduce words to their root forms.
So, where are stemmers used?
In fact, you can do stemming with a token filter in an analyzer. That is exactly how stemming works in ES. Have a look at the documentation for Stemmer Token Filter.
ES also provides the Snowball Analyzer, which is a convenient analyzer to use for stemming.
Otherwise, if there is a different type of stemming you would like to use, you can always build your own Custom Analyzer. This gives you complete control over the stemming solution that works best for you, as discussed here in the guide.
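As a sketch, a custom analyzer wiring the stemmer token filter into the analysis chain could look like this; the index, analyzer, and filter names are placeholders:

PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      },
      "analyzer": {
        "stemmed_english": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stemmer"]
        }
      }
    }
  }
}

Running "running jumps" through this analyzer with _analyze yields the root forms run and jump, which shows that the token filter itself performs the reduction.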
Hope this helps!