How does Elasticsearch multi_match with _all work? - elasticsearch

I wanted to know how multi matching with _all works. Let's say I have the following query:
"multi_match": {
  "query": x,
  "type": "phrase",
  "fields": "_all"
}
Does it search all available fields for the particular phrase and return a record if the phrase exists in all fields? What if some of the fields have it and others do not?

The _all field is just a field that concatenates the values of all your other fields into one big string and then analyzes it in the usual way; if no analyzer is defined, the standard analyzer is used, as for any text field. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
You can exclude individual fields from _all when defining your mapping with the 'include_in_all' parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/include-in-all.html
So with "fields": "_all", multi_match queries a single field, and a document matches when the phrase occurs in that concatenated text; the phrase does not have to appear in every source field. Does it make sense to use a phrase query on the concatenation of all your fields? Rather not. multi_match lets you achieve a similar goal to the _all field, since you can search multiple fields in one query, whereas if you do query the _all field you can just use a plain 'match' query.
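To make the difference concrete, here is a minimal sketch of the two request bodies, written as Python dicts and printed as JSON. The field names title and body and the search phrase are hypothetical, and the first body assumes an index on which _all is still enabled (pre-6.0). It uses match_phrase rather than match only because the question is about phrases.

```python
import json

phrase = "quick brown fox"  # hypothetical search phrase

# Phrase search against the single concatenated _all field
# (assumes _all is enabled on the index, i.e. a pre-6.0 index).
all_query = {"query": {"match_phrase": {"_all": phrase}}}

# Phrase search against explicitly listed fields. With type "phrase",
# multi_match runs a match_phrase on each field and scores by the best one.
# "title" and "body" are made-up field names.
fields_query = {
    "query": {
        "multi_match": {
            "query": phrase,
            "type": "phrase",
            "fields": ["title", "body"],
        }
    }
}

for body in (all_query, fields_query):
    print(json.dumps(body, indent=2))
```

With the second form a document is returned as soon as any one of the listed fields contains the phrase; it does not need to be present in all of them.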

The _all field (deprecated in 6.0, where it can no longer be enabled for newly created indices) indexes all the values from your JSON document, whatever field they appeared in.

Related

Elasticsearch Filtering woes, uppercase vs lowercase field filtering

I have a field in my index called "status" of keyword type.
When I try to filter with {"term": {"status": "Publish"}}, it returns no hits.
When I try to filter with {"term": {"status": "publish"}}, it returns the correct result.
This would be one thing if the statuses were input as lowercase, but they're actually uppercase.
My Kibana GET returns the products with "status": "Publish".
I also remember inserting the statuses with uppercase values. So why can I only filter by lowercase?
The big caveat, and I know it's suspicious to do this, is that I attempted to add the "status" mapping after the item indices were already created. That's my main suspect right now as to why this is happening.
Does anyone know why the filtering only works with lowercase values when the actual value is uppercase?
The standard analyzer is the default analyzer for text fields if no analyzer is specified, so Publish gets indexed as publish.
If you had not explicitly defined a mapping before indexing, dynamic mapping created status as a text field with a keyword sub-field, so you need to add .keyword to the field name. The status.keyword sub-field indexes the value as-is instead of running it through the standard analyzer (notice the ".keyword" after the status field).
The term query does not apply any analyzer to the search term, so it only looks for that exact term in the inverted index. To search for the exact term, you need to use status.keyword OR change the mapping of the field.
{
  "query": {
    "term": {
      "status.keyword": "Publish"
    }
  }
}
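The mechanics can be illustrated with a toy model in Python. This is not how Elasticsearch is implemented: standard_like_analyze is a crude stand-in for the standard analyzer, and the indexed_terms dict is a hypothetical miniature of the inverted index, assuming status was dynamically mapped as text with a keyword sub-field.

```python
import re

def standard_like_analyze(text):
    # Crude approximation of the standard analyzer: split on anything
    # that is not a letter or digit, then lowercase each token.
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

value = "Publish"

# Toy inverted index for one document (hypothetical structure).
indexed_terms = {
    "status": set(standard_like_analyze(value)),  # text field: analyzed
    "status.keyword": {value},                    # keyword sub-field: kept verbatim
}

def term_query(field, term):
    # A term query does not analyze its input: it is an exact lookup.
    return term in indexed_terms[field]

print(term_query("status", "Publish"))
print(term_query("status", "publish"))
print(term_query("status.keyword", "Publish"))
```

Only the lowercase term exists under status, which is why the capitalized filter finds nothing there but succeeds against status.keyword.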

What on earth does the Elasticsearch _all field contain?

I have been using Elasticsearch at work but have been confused by the _all field for quite a long time. The documentation says that
The _all field is a special catch-all field which concatenates the
values of all of the other fields into one big string, using space as
a delimiter, which is then analyzed and indexed, but not stored
But do these "all fields" include fields that are not analyzed, or not even indexed?
If anyone knows the answer, please kindly tell me, thanks in advance.
The _all field is a field which concatenates the values of all of the other fields into one big string, using space as a delimiter, which is then analyzed and indexed, but not stored. This means that it can be searched, but not retrieved.
The _all field allows you to search for values in documents without knowing which field contains the value.
Example
Suppose you have indexed the document below:
{
  "first_name": "sunder",
  "last_name": "r",
  "date_of_birth": "1996-03-20"
}
The _all field generated for this document will then contain the terms
[ "sunder", "r", "1996", "03", "20" ]
because the concatenated string is analyzed before it is indexed; every value, including the date, is treated as a string. (The _all field is just a text field, and accepts the same parameters that other string fields accept, including analyzer, term_vectors, index_options, and store.)
The _all field is not present in the _source field, and it is not stored by default.
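As a rough illustration of that flattening step, the following Python sketch joins the values with spaces and tokenizes the result. standard_like_analyze is only an approximation of the standard analyzer, written for this example; real analysis handles many more cases.

```python
import re

def standard_like_analyze(text):
    # Crude approximation of the standard analyzer: split on anything
    # that is not a letter or digit, then lowercase each token.
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

doc = {"first_name": "sunder", "last_name": "r", "date_of_birth": "1996-03-20"}

# Every value is treated as a string and joined with a space.
all_text = " ".join(str(value) for value in doc.values())

# The joined string is analyzed into the terms that end up in _all.
all_terms = standard_like_analyze(all_text)
print(all_terms)
```

The date contributes three separate string terms here, whereas the date_of_birth field itself is indexed as a date.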
Note
The _all field is deprecated in ES 6.0.0.
_all may no longer be enabled for indices created in 6.0+; use a custom field and the mapping copy_to parameter instead.
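A minimal sketch of that copy_to replacement, written as Python dicts and printed as JSON. The catch-all field name all_text is made up for this example, and the mapping is shown in the typeless 7.x layout as an assumption; on 6.x the properties block sits under a mapping type name.

```python
import json

# Hypothetical index mapping: each source field copies its value
# into a custom catch-all field named "all_text".
mapping = {
    "mappings": {
        "properties": {
            "first_name": {"type": "text", "copy_to": "all_text"},
            "last_name": {"type": "text", "copy_to": "all_text"},
            "all_text": {"type": "text"},
        }
    }
}

# Searching the custom field then plays the role that searching _all did.
search = {"query": {"match": {"all_text": "sunder"}}}

print(json.dumps(mapping, indent=2))
print(json.dumps(search, indent=2))
```

Unlike _all, the custom field only receives the fields you explicitly point at it, and it can be given its own analyzer.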

Favor exact matches over ngram matches in ElasticSearch when mapping

I have partial matching of words working with ngrams. How can I modify the mapping to always favor exact matches over ngram tokens? I do not want to modify the query. One search box will search multiple types, each with their own fields.
For example, let's say I'm searching job titles: one person has a title of "field engineer", the other a title of "engine technician". If a user searches for "engine", I'd want ES to return the latter as more relevant.
I'm using this mapping almost verbatim: https://stackoverflow.com/a/19874785/978622
-Exception: I'm using an ngram with min of 3 and max of 11 instead of edge ngram
Is it possible to apply a boost/function score to an analyzer? If so I'll apply both the "full_name" and "partial_name" analyzers to my index as well and boost the first.
Edit: I'm using ElasticSearch 1.1.1 and Nest 1.0.0 beta
I don't believe there is any way to apply boosting to an analyzer as you're suggesting.
One thing you can try is to use the multi field type in your mapping. You could then apply your partial_name analyzer to one version of the field, and your full_name analyzer to the other version.
With this mapping, you could query both fields differently, but combined (perhaps in a bool query), and apply a boost to the query that is being conducted on the full_name analyzed field.
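A minimal sketch of such a combined query, built as a Python dict. The field names title (analyzed with full_name) and title.partial (analyzed with partial_name) are hypothetical, as is the helper exact_first_query and the boost value of 5.0, which would need tuning against real data.

```python
import json

def exact_first_query(text, exact_field, partial_field, exact_boost=5.0):
    # Both clauses are optional ("should"); a document matching the
    # exact field gets the extra boost on top of any ngram match.
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {exact_field: {"query": text, "boost": exact_boost}}},
                    {"match": {partial_field: {"query": text}}},
                ]
            }
        }
    }

body = exact_first_query("engine", "title", "title.partial")
print(json.dumps(body, indent=2))
```

In the job-title example, "engine technician" would match both clauses for the search "engine", while "field engineer" would match only the ngram clause, so the former should score higher.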

Using Nest, how to mimic an _all field that includes ngram tokens?

I believe it is impossible for the _all field to contain ngram tokens. How can I mimic this behavior?
I have 7 types of entities, each with about 10 fields. Of those 70 total fields, about 15 must support partial search (using an ngram index analyzer). All fields will use the same search analyzer.
Is copy_to supported in Nest? I don't see it. If so, can different fields have different analyzers?
My thinking so far: If copy_to is supported, all fields I want to search would be copied to a single field, one per type, called "aggregate". The search query would specify a multifield search which included each of these aggregate fields.
The _all field can in fact contain nGram tokens. You have the ability to define both the search and index analyzers for the _all field. Please see my previous question, "Set analyzers for _all field with NEST". However, you will need to pull the source for NEST and compile it to get this functionality, as it is not in the NEST 1.0.0-beta1 release on NuGet.

ElasticSearch: How to specify specific fields to search at?

Right now in my mapping, I am setting "include_in_all" to true, which means all the fields are included in the _all field.
However, instead of wasting space by putting everything in the _all field, I want to specify the specific fields to search when querying (taking into account the boost scores in the mapping).
How do I create a query that tells Elastic Search to only look at specific fields (not just one) and take into account the boosting I gave them in my mapping?
Start with a multi_match query. It allows you to query multiple fields, giving them different weights, and it's usually the way to go when you have a search box.
{
  "multi_match" : {
    "query" : "this is a test",
    "fields" : [ "subject^2", "message" ]
  }
}
The query_string query is more powerful but more dangerous too, since its input is parsed and a malformed query can break it. Use it only if you need it.
You don't need to keep data in the _all field to query for a field.
You can use query_string or bool queries to search over multiple fields.
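For comparison, here is a sketch of the query_string and bool alternatives over the same two fields, built as Python dicts. The field names subject and message and the boost of 2 are taken from the multi_match example above; the variable names are made up for this example.

```python
import json

text = "this is a test"

# query_string accepts the same "field^boost" syntax in its fields list,
# but the query text itself is parsed as query syntax.
query_string_body = {
    "query": {
        "query_string": {
            "query": text,
            "fields": ["subject^2", "message"],
        }
    }
}

# bool/should with one match clause per field; the boost is set per clause.
bool_body = {
    "query": {
        "bool": {
            "should": [
                {"match": {"subject": {"query": text, "boost": 2}}},
                {"match": {"message": {"query": text}}},
            ]
        }
    }
}

print(json.dumps(query_string_body, indent=2))
print(json.dumps(bool_body, indent=2))
```

These are not score-for-score equivalents of multi_match: a bool query adds up the scores of its matching should clauses, while the default multi_match takes the best-matching field, so ranking can differ.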
