What on earth does the Elasticsearch _all field contain?

I have been using Elasticsearch at work, but I have been confused by the _all field for quite a long time. The documentation says:
The _all field is a special catch-all field which concatenates the
values of all of the other fields into one big string, using space as
a delimiter, which is then analyzed and indexed, but not stored
But do these "other fields" include fields that are not analyzed, or not even indexed?
If anyone knows the answer, please kindly tell me, thanks in advance.

The _all field is a field which concatenates the values of all of the other fields into one big string, using space as a delimiter, which is then analyzed and indexed, but not stored. This means that it can be searched, but not retrieved.
The _all field allows you to search for values in documents without knowing which field contains the value.
Example
Suppose you have indexed the document below:
{
  "first_name": "sunder",
  "last_name": "r",
  "date_of_birth": "1996-03-20"
}
The field values of this document are then concatenated into the _all field as the string
"sunder r 1996-03-20"
which is analyzed into the terms sunder, r, 1996, 03 and 20 and then indexed. (The _all field is just a text field, and accepts the same parameters that other string fields accept, including analyzer, term_vectors, index_options, and store.)
The _all field is not part of the _source field and is not stored by default. Before 6.0 it is enabled by default; from 6.0 on it is disabled by default.
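To make the example above concrete, here is a rough Python sketch of the two steps (concatenate, then analyze). It is only an illustration under stated assumptions: build_all_field and rough_tokens are hypothetical helpers, and the regular expression is a crude stand-in for the standard analyzer, not Elasticsearch's actual analysis chain.

```python
import re

def build_all_field(doc):
    # Hypothetical helper: join every field value into one
    # space-delimited string, as the _all description says.
    return " ".join(str(value) for value in doc.values())

def rough_tokens(text):
    # Crude approximation of analysis (assumption): lowercase the text
    # and keep runs of letters and digits as separate terms.
    return re.findall(r"[a-z0-9]+", text.lower())

doc = {
    "first_name": "sunder",
    "last_name": "r",
    "date_of_birth": "1996-03-20",
}

all_text = build_all_field(doc)     # the concatenated string
all_terms = rough_tokens(all_text)  # the terms that would be indexed
print(all_text)
print(all_terms)
```

Note that the joined string still contains the hyphens of the date; the date only falls apart into separate terms at the analysis step.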
Note
The _all field is deprecated as of Elasticsearch 6.0.0.
_all may no longer be enabled for indices created in 6.0+; use a custom field and the mapping copy_to parameter instead.
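As a sketch of the copy_to replacement mentioned in the note, the mapping can be written out as a plain Python dict. Assumptions: the catch-all field name full_text and the helper copy_to_mapping are made up for illustration, and the body uses the typeless mapping layout of 7.x (6.x would need a type name level under "mappings").

```python
import json

def copy_to_mapping(fields, target="full_text"):
    """Build a mapping in which every given field is copied into `target`.

    `fields` maps field name -> field type. `target` is a hypothetical
    name for the custom catch-all field.
    """
    properties = {
        name: {"type": field_type, "copy_to": target}
        for name, field_type in fields.items()
    }
    # The catch-all itself is an ordinary text field.
    properties[target] = {"type": "text"}
    return {"mappings": {"properties": properties}}

mapping = copy_to_mapping({"first_name": "text", "last_name": "text"})

# Searching the custom field then works like searching _all used to.
query = {"query": {"match": {"full_text": "sunder r"}}}

print(json.dumps(mapping, indent=2))
```

Unlike _all, only the fields you explicitly list are copied, so you decide what ends up in the catch-all.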

Related

How does Elasticsearch multi_match with _all work?

I wanted to know how multi_match with _all works. Let's say I have the following query:
"multi_match": {
  "query": x,
  "type": "phrase",
  "fields": "_all"
}
Does it search all available fields for the particular phrase and return a record only if the phrase exists in all fields? What if some of the fields have it and others do not?
The _all field is just a field that concatenates all your fields into one big string and then analyzes it in the standard way, using the standard analyzer for text unless another one is defined. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
You can exclude individual fields from _all when defining your mapping, with the include_in_all parameter. https://www.elastic.co/guide/en/elasticsearch/reference/current/include-in-all.html
So does it make sense to use a phrase query on the concatenation of all your fields? Probably not. I would say that multi_match lets you achieve goals similar to the _all field: you can search multiple fields in one query. But when using the _all field you can just use a match query.
The _all field (deprecated in 6.0 and removed in 7.0) indexes all the values from your JSON document, whatever field they appear in.
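The two approaches can be put side by side as request bodies, written here as Python dicts. This is a sketch: the field names title and body and both helper functions are hypothetical. As far as I know, a multi_match of type phrase runs a phrase match against each listed field separately, so a document is returned when at least one of the fields contains the phrase; it does not have to appear in all of them.

```python
def match_all_field(text):
    # Single match query against the concatenated _all field
    # (only meaningful on versions where _all exists and is enabled).
    return {"query": {"match": {"_all": text}}}

def multi_match_phrase(text, fields):
    # Phrase search over an explicit list of fields (hypothetical names).
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "phrase",
                "fields": list(fields),
            }
        }
    }

all_query = match_all_field("quick brown fox")
explicit_query = multi_match_phrase("quick brown fox", ["title", "body"])
print(all_query)
print(explicit_query)
```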

How does Lucene index not_analyzed fields?

I know that for an analyzed field, Lucene tokenizes the value and then stores the tokens in an inverted index for searching. But how does Lucene index not_analyzed fields? I don't believe it is still an inverted index. Is it a B-tree or a hash?
Not-analyzed fields are stored in the inverted index the same way analyzed fields are; they are simply... not analyzed. This means the field value is not tokenized or otherwise modified before being indexed.
So if your not_analyzed field contains the value New York, that value goes into the inverted index unmodified and untokenized, and you can still search for the documents containing that exact value. It's somewhat similar to having an analyzed field whose analyzer is the keyword analyzer.
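A toy model may help here. The sketch below is plain Python, not Lucene: index_docs is a hypothetical helper, and lowercasing plus whitespace splitting is an assumed stand-in for real analysis. The point is only that both kinds of field end up in the same term-to-documents structure; what differs is which terms go in.

```python
from collections import defaultdict

def index_docs(docs, analyzed):
    """Build a toy inverted index: term -> set of document ids."""
    inverted = defaultdict(set)
    for doc_id, value in docs.items():
        if analyzed:
            # Stand-in for analysis: lowercase and split on whitespace.
            terms = value.lower().split()
        else:
            # not_analyzed: the whole value is indexed as a single term.
            terms = [value]
        for term in terms:
            inverted[term].add(doc_id)
    return inverted

docs = {1: "New York", 2: "New Jersey"}
analyzed_index = index_docs(docs, analyzed=True)
exact_index = index_docs(docs, analyzed=False)

print(dict(analyzed_index))
print(dict(exact_index))
```

In the analyzed index the term new points at both documents, while the not_analyzed index only has the exact values New York and New Jersey as terms.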

Using Nest, how to mimic an _all field that includes ngram tokens?

I believe it is impossible for the _all field to contain ngram tokens. How can I mimic this behavior?
I have 7 types of entities, each with about 10 fields. Of those 70 total fields, about 15 must support partial search (using an ngram index analyzer). All fields will use the same search analyzer.
Is copy_to supported in Nest? I don't see it. If so, can different fields have different analyzers?
My thinking so far: If copy_to is supported, all fields I want to search would be copied to a single field, one per type, called "aggregate". The search query would specify a multifield search which included each of these aggregate fields.
The _all field can in fact contain nGram tokens. You can define both the search and index analyzers for the _all field; please see my previous question, "Set analyzers for _all field with NEST". However, you will need to pull the source for NEST and compile it to get this functionality, as it is not in the NEST 1.0.0-beta1 release on NuGet.
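For the copy_to idea from the question, here is a sketch of what the index body could look like, written as a Python dict rather than NEST code. Assumptions: the names ngram_tokenizer, ngram_index, name and description are hypothetical, the gram sizes are arbitrary, and the layout follows the typeless mappings and text type of recent versions rather than the 1.x era of the question. copy_to copies the original field value, so each source field keeps its own analyzer and the target field ("aggregate") applies its own.

```python
def ngram_index_body(partial_fields, target="aggregate"):
    """Index body: an ngram index analyzer on one copy_to target field."""
    settings = {
        "analysis": {
            "tokenizer": {
                # Gram sizes are illustrative only.
                "ngram_tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3}
            },
            "analyzer": {
                "ngram_index": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }
    # Each source field is analyzed normally and also copied to `target`.
    properties = {
        name: {"type": "text", "copy_to": target} for name in partial_fields
    }
    # The target uses ngrams at index time and a plain analyzer at search time.
    properties[target] = {
        "type": "text",
        "analyzer": "ngram_index",
        "search_analyzer": "standard",
    }
    return {"settings": settings, "mappings": {"properties": properties}}

body = ngram_index_body(["name", "description"])
print(body["mappings"]["properties"]["aggregate"])
```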

Stored fields in Elasticsearch

For some types, such as numbers and dates, the documentation specifies that store defaults to no, but also that the field can still be retrieved from the JSON.
It's confusing. Does this mean _source?
Is there a way to not store a field at all, and just have it indexed and searchable?
None of the field types are stored by default. Only the _source field is. That means you can always get back what you sent to the search engine. Even if you ask for specific fields, Elasticsearch parses the _source field for you and gives you back those fields.
You can disable _source if you want, but then you can only retrieve the fields that you explicitly stored in your mapping.
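As an illustration of that last point, here is a sketch of a mapping with _source disabled and one explicitly stored field, written as Python dicts. Assumptions: the field names title and body and the helper stored_only_mapping are hypothetical, and the layout follows recent versions (typeless mappings, text type, stored_fields in the search body).

```python
def stored_only_mapping(stored_fields, unstored_fields):
    """Mapping with _source disabled; only `stored_fields` are retrievable."""
    properties = {
        name: {"type": "text", "store": True} for name in stored_fields
    }
    # These fields are indexed and searchable but cannot be fetched back.
    properties.update({name: {"type": "text"} for name in unstored_fields})
    return {
        "mappings": {
            "_source": {"enabled": False},
            "properties": properties,
        }
    }

mapping = stored_only_mapping(["title"], ["body"])

# Search on the unstored field, but ask only for the stored one back.
search = {
    "query": {"match": {"body": "lucene"}},
    "stored_fields": ["title"],
}
print(mapping)
print(search)
```

Here body is indexed and searchable without being stored anywhere, which is what the question asks for, at the cost of losing the original document.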

Field not searchable in ES?

I created an index myindex in elasticsearch, loaded a few documents into it. When I visit:
localhost:9200/myindex/mytype/1023
I noticed that my particular index has the following metadata for mappings:
mappings: {
  mappinggroupname: {
    properties: {
      Aproperty: {
        type: string
      }
      Bproperty: {
        type: string
      }
    }
  }
}
Is there some way to add "store": "yes" and "index": "analyzed" without having to reload/reindex all the documents?
Note that when I view a single document...
i.e. localhost:9200/myindex/mytype/1023
I can see that the _source field contains all the fields of that document, and when I go to the "Browser" section of the head plugin, all the columns appear correct and correspond to my field names. So why is "store" not showing up in the metadata? I can even perform a _search on them.
What is the difference between "store": true and the fact that I can see all my fields and values after indexing my documents as described above?
Nope, no way! That's how your documents got indexed in the underlying Lucene index. The only way to change it is to reindex them all!
You see all those fields because you are looking at the content of the special _source field, which Elasticsearch stores in Lucene by default. You are not storing each field separately, but you do have the source document that you originally indexed, via _source: a single field that contains the whole document.
Generally the _source field is enough; you don't usually need to configure every field as stored.
Also, for string fields the default is "index": "analyzed" when nothing else is specified. That means those fields are indexed, and analyzed with the standard analyzer unless the mapping says otherwise. Therefore, as far as I can see from your mapping, those two fields should be indexed and thus searchable.
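If you do end up changing how fields are indexed, the usual route is to create a new index with the desired mapping and copy the documents over. A minimal sketch of the copy step, assuming a version that has the _reindex API and that _source is enabled on the old index; the destination name myindex_v2 and the helper reindex_body are made up for illustration:

```python
import json

def reindex_body(source_index, dest_index):
    # Body for POST _reindex: copy every document from the source
    # index into the destination index (which has the new mapping).
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }

body = reindex_body("myindex", "myindex_v2")
print(json.dumps(body, indent=2))
```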
