Using Nest, how to mimic an _all field that includes ngram tokens? - elasticsearch

I believe it is impossible for the _all field to contain ngram tokens. How can I mimic this behavior?
I have 7 types of entities, each with about 10 fields. Of those 70 total fields, about 15 must support partial search (using an ngram index analyzer). All fields will use the same search analyzer.
Is copy_to supported in Nest? I don't see it. If so, can different fields have different analyzers?
My thinking so far: If copy_to is supported, all fields I want to search would be copied to a single field, one per type, called "aggregate". The search query would specify a multifield search which included each of these aggregate fields.

The _all field can in fact contain ngram tokens: you can define both the search and index analyzers for the _all field. Please see my previous question, "Set analyzers for _all field with NEST". However, you will need to pull the NEST source and compile it to get this functionality, as it is not in the NEST 1.0.0-beta1 release on NuGet.
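As a rough sketch of what that configuration might look like at the REST level, the Python function below builds an index-creation body as a plain dict. It assumes Elasticsearch 1.x mapping syntax (index_analyzer and search_analyzer on _all); the type name "person", the analyzer names, the filter name, and the gram sizes are all hypothetical, and nothing here reflects the NEST API itself.

```python
import json


def build_all_ngram_settings(min_gram=3, max_gram=11):
    """Build an index-creation body whose _all field is indexed with ngrams.

    Assumes Elasticsearch 1.x syntax. The type, analyzer, and filter names
    are made up for illustration.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Token filter that emits substrings of each token.
                    "partial_filter": {
                        "type": "nGram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    # Used at index time: produces ngram tokens.
                    "partial_index": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "partial_filter"],
                    },
                    # Used at search time: whole lowercase tokens only.
                    "full_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    },
                },
            }
        },
        "mappings": {
            "person": {
                "_all": {
                    "index_analyzer": "partial_index",
                    "search_analyzer": "full_search",
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_all_ngram_settings(), indent=2))
```

Keeping the search analyzer free of the ngram filter means the user's input is matched as typed against the indexed ngrams, rather than being split into ngrams itself.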

Related

What's the difference between Search-as-you-type datatype and Edge NGram Tokenizer?

I can't understand the difference between giving a field the Search-as-you-type datatype, using an Edge NGram Tokenizer in an analyzer, and adding an index_prefixes parameter. It seems to me that they all do the same job.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-prefixes.html
edge_ngram is a tokenizer, which means it kicks in at indexing time to tokenize your input data. There is also an edge_ngram token filter. Both are similar but work at different levels.
search_as_you_type is a field type which contains a few sub-fields, one of which is called _index_prefix and which leverages edge n-grams (according to the documentation linked above, through an edge ngram token filter).
So basically, the edge n-gram technique you see in the edge_ngram tokenizer documentation was reused when they decided to add the new search_as_you_type field type.
Rafiqul is correct that search_as_you_type is built using edge_ngram, but it also incorporates the concept of shingles. Shingles are sets of consecutive words, which allow search_as_you_type to better handle multi-word queries.
Note that search_as_you_type requires the words to be in the order entered, which suits known entities like movie titles better than free-form documents.
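To make the two building blocks concrete, here is a small pure-Python illustration of what edge n-grams and shingles are. It is only a sketch of the concepts, not how Elasticsearch implements them; the function names and default sizes are hypothetical.

```python
def edge_ngrams(token, min_gram=1, max_gram=5):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def shingles(text, size=2):
    """Return word n-grams (shingles) made of `size` consecutive words."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


if __name__ == "__main__":
    print(edge_ngrams("quick", 1, 3))      # ['q', 'qu', 'qui']
    print(shingles("quick brown fox", 2))  # ['quick brown', 'brown fox']
```

Edge n-grams let a partially typed word match the start of an indexed word, while shingles preserve word order across adjacent words, which is why the order of the typed words matters.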

How Elasticsearch multi matching with _all work?

I wanted to know how multi matching with _all works. Let's say I have the following query:
"multi_match": {
  "query": x,
  "type": "phrase",
  "fields": "_all"
}
Does it search all available fields for the particular phrase and return a record only if the phrase exists in all fields? What if some of the fields have it and others do not?
The _all field is just a field that concatenates all your fields into one big string and then analyzes it in the standard way: if no analyzer is defined for it, the standard analyzer for text is used. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
You can exclude individual fields from _all when defining your mapping, using the 'include_in_all' parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/include-in-all.html
So does it make sense to use a phrase query on the concatenation of all your fields? Probably not. I would say that multi_match lets you achieve goals similar to the _all field: you can search multiple fields in one query. But when using the _all field you can just use a 'match' query.
The _all field (which is removed in 6.0) indexes all the values from your JSON document, whichever field they appeared in.
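The two options can be sketched side by side as request bodies built in Python. The helper names are hypothetical, and the field names passed in are whatever your own mapping defines. With type "phrase", multi_match runs a phrase match against each listed field separately, so a document should be returned when at least one of the fields contains the phrase; it does not need to appear in all of them.

```python
def match_all_field(text):
    """Single `match` query against the catch-all `_all` field."""
    return {"query": {"match": {"_all": text}}}


def multi_match_phrase(text, fields):
    """Phrase query evaluated against each listed field separately."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "phrase",
                "fields": list(fields),
            }
        }
    }


if __name__ == "__main__":
    print(match_all_field("field engineer"))
    print(multi_match_phrase("field engineer", ["title", "summary"]))
```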

Favor exact matches over ngram matches in ElasticSearch when mapping

I have partial matching of words working with ngrams. How can I modify the mapping to always favor exact matches over ngram tokens? I do not want to modify the query. One search box will search multiple types, each with their own fields.
For example, let's say I'm searching job titles: one person has a title of "field engineer", the other a title of "engine technician". If a user searches for "engine", I'd want ES to return the latter as more relevant.
I'm using this mapping almost verbatim: https://stackoverflow.com/a/19874785/978622
Exception: I'm using an ngram with a min of 3 and a max of 11 instead of an edge ngram.
Is it possible to apply a boost/function score to an analyzer? If so I'll apply both the "full_name" and "partial_name" analyzers to my index as well and boost the first.
Edit: I'm using ElasticSearch 1.1.1 and Nest 1.0.0 beta
I don't believe there is any way to apply boosting to an analyzer as you're suggesting.
One thing you can try is to use the multi field type in your mapping. You could then apply your partial_name analyzer to one version of the field, and your full_name analyzer to the other version.
With this mapping, you could query both fields differently, but combined (perhaps in a bool query), and apply a boost to the query that is being conducted on the full_name analyzed field.
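A minimal sketch of that idea, written as Python dicts for the mapping and the query body. It assumes Elasticsearch 1.x syntax (string fields with a "fields" block), reuses the full_name and partial_name analyzer names from the question, and invents the field name "title", the sub-field name "partial", and the boost value.

```python
def build_title_mapping():
    """Mapping for a hypothetical `title` field with an ngram sub-field."""
    return {
        "properties": {
            "title": {
                "type": "string",
                # Whole-word tokens for exact matches.
                "analyzer": "full_name",
                "fields": {
                    "partial": {
                        "type": "string",
                        # Ngram tokens at index time, whole words at search time.
                        "index_analyzer": "partial_name",
                        "search_analyzer": "full_name",
                    }
                },
            }
        }
    }


def build_boosted_query(text, exact_boost=5.0):
    """Bool query that scores exact-token matches above ngram matches."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": {"query": text, "boost": exact_boost}}},
                    {"match": {"title.partial": {"query": text}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(build_boosted_query("engine"))
```

For the query "engine", the title "engine technician" would be expected to match both clauses, while "field engineer" should match only the ngram clause, so the former should score higher. The boost value is a tuning knob, not a recommended setting.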

How to query all fields individually with ElasticSearch

As I understand it, ElasticSearch searches on the magic _all field by default. The problem with this seems to be that if a field uses a different index analyzer, the analyzed data from this field is not searched.
I've had success with searching on the fields ['domain', '_all'] but I really need to avoid having to manually specify each field which was analyzed differently. I see fields supports wildcards but seemingly not '*' on its own. I could do a*, b*, c*, d* etc. but this seems a tad inefficient.
The special field "_all" is discontinued, and the copy_to parameter can be used instead, as per the official documentation. This approach lets you create a computed field (managed by Elasticsearch) into which data from other fields is copied, to mimic an _all search.
However, there is an alternative approach: use multi_match with wildcard field names as part of the query. This works much like the earlier mechanism of searching the "_all" field.
{"multi_match":{"query":"java","fields":["*"]}}
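For the copy_to route, a minimal sketch of the mapping and a matching query, again as Python dicts. It assumes the typeless mapping syntax of recent Elasticsearch versions with "text" fields; the helper names, the source field names, and the target field name "aggregate" are hypothetical.

```python
def build_copy_to_mapping(source_fields, target="aggregate"):
    """Mapping in which every source field is copied into one target field."""
    properties = {
        name: {"type": "text", "copy_to": target} for name in source_fields
    }
    # The target is an ordinary field and can be given its own analyzer.
    properties[target] = {"type": "text"}
    return {"mappings": {"properties": properties}}


def build_match_query(text, field="aggregate"):
    """Plain match query against the combined field."""
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    print(build_copy_to_mapping(["domain", "title"]))
    print(build_match_query("java"))
```

Because the target field is analyzed on its own, the values copied into it are analyzed with the target's analyzer rather than with the analyzers of the source fields.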

Elasticsearch analyser only being used when I specify the field of the search

I have an analyser called autocomplete_analyser defined on a field name. When I run the query
http://localhost:9200/courses/course/_search?q=name:dav&pretty=true
it runs the analyser and returns the correct results. When I run
http://localhost:9200/courses/course/_search?q=dav&pretty=true
it does not.
How can I make ES run the analyser without me specifying the fields being searched on?
I need to use this analyser across a number of fields, so it's important that I can search all of them.
By default, query_string queries are applied to the _all field, which has its own analyzer.
You can define a specific analyzer for the _all field using the Put Mapping API.
Does it help?
