What is use of fields in Elasticsearch? - elasticsearch

There is a type parameter which its value text, and again under the fields there is another type parameter which its value text. I don't understand what english and another are meaning. Can you tell me what is fields parameter's functionality and how can we use or benefit from them simply?
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"text": {
"type": "text",
"fields": {
"english": {
"type": "text",
"analyzer": "english"
},
"another": {
"type": "keyword",
"analyzer": "standard"
}
}
}
}
}
}
}

This structure is called a multi-field.
The idea behind that is to be able to apply a different analysis pipeline to each sub-fields. In your case, the text.english sub-field will be analyzed using the english analyzer and the text.another sub-field will be analyzed by the standard analyzer.
The beauty of it is that your document only needs to have a text field and then in your queries you'll be able to reference the text, text.english and text.another sub-fields transparently.
# Your document
{
"text": "The nice dog runs after the cat"
}
After indexing this document,
in your text.english sub-field you're going to have the following tokens: nice, dog, run, after, cat
in your text.another sub-field you're going to have the following tokens: the, nice, dog, runs, after, the, cat
finally in your top-level text field you're going to have the same tokens as in text.another because the standard analyzer is also the default one when no analyzer is specified on a text field.

Related

How to conditionally apply an analyzer at index time to a field that could be one of many languages?

I have documents with a field (e.g. input_text) that contains a string that could be one of 20 odd languages. I have another field that has the short form of the language (e.g. lang)
I want to conditionally apply an analyzer at index time to the text field dependent on what the language is as detected from the language field.
I eventually want a Kibana dashboard with a single word cloud of the most common words in the text field (ie in multiple languages) but only words that have been stemmed and tokenized with stop words removed.
Is there a way to do this?
The elasticsearch documents suggest using multiple fields for each language and then specifying an analyzer for the appropriate field, but I can't do this as there are 20 some languages and this would overload my nodes.
There is no way to achieve what you want in Elasticsearch (applying analyzer to field A based on the value of field B).
I would recommend to create one index per language, and then create an index alias that groups all those indices and query against it.
PUT lang_de
{
"mappings": {
"properties": {
"input_text": {
"type": "text",
"analyzer": "german"
}
}
}
}
PUT lang__en
{
"mappings": {
"properties": {
"input_text": {
"type": "text",
"analyzer": "english"
}
}
}
}
POST _aliases
{
"actions": [
{
"add": {
"index": "lang_*",
"alias": "lang"
}
}
]
}

Elasticsearch: Cannot Get Results From Keyword Field

I have a field that is mapped as text and which includes another field mapped as keyword (fields keyword). I insert data and ensure it can retrieved using any query. However, when I query the additional field (mapped as keyword), I cannot find any data at all.
Here is the example (simplified):
POST people/_mapping/_doc
{
"properties": {
"name": {
"type": "text"
},
"bio": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
And here is a query:
POST people/_search
{
"query": {
"match": {
"bio.keyword": "Portugal"
}
}
}
Same happens regardless of the casing (Portugal vs portugal). What is the reason for this behavior?
In elasticseach, if you have a text type field say description.
One value of description is: He likes dog but hate cat.
The revert index for this field is: He/like/dog/but/hate/cat
And there is another 'keyword' field which is description.keyword, which is exactly He likes dog but hate cat.
The keyword field requires 100% match.
Got it:
Keyword fields are only searchable by their exact value
References: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html

Elasticsearch copy_to not working on keyword field

I am trying to copy two fields onto a third field, which should have the type 'keyword' (because I want to be able to aggregate by it, and do not need to perform a full-text search)
PUT /test/_mapping/_doc
{
"properties": {
"first": {
"copy_to": "full_name",
"type": "keyword"
},
"last": {
"copy_to": "full_name",
"type": "keyword"
},
"full_name": {
"type": "keyword"
}
}
}
I then post a new document:
POST /test/_doc
{
"first": "Bar",
"last": "Foo"
}
And query it using the composite field full_name:
GET /test2/_search
{
"query": {
"match": {
"full_name": "Bar Foo"
}
}
}
And no hits are returned.
If the type of the composite field full_name were text then it works as expected and described in the docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
Is it not possible to copy onto a keyword-type field?
The problem is that you use match query - When you index your docs you use keyword type which according to the ES documentation are "...only searchable by their exact value."
However when you query that field you use match query which is using the standard analyzer which, among other stuff, also does lower-casing which causes your terms to not match nothing.
You have few options I can think of in this case:
Change the field type to text which will perform the same analysis as the match query.
Create a custom field type with custom analyzer which will perform lower casing
Don't query more than a single term at a time and use term query instead of match
It seems that the type of destination field of copy_to must be text type.

Using both term and match query on same text field?

I have an index with a text field.
"state": {
"type": "text"
}
Now suppose there are two data.
"state": "vail"
and
"state": "eagle vail"
For one of my requirements,
- I need to do a term level query, such that if I type "vail", the search results should only return states with "vail" and not "eagle vail".
But another requirement for different search on the same index,
- I need to do a match query for full text search, such that if I type "vail", "eagle vail" should display as well.
So my question is, how do I do both term level and full text search in this field, as for doing a term level query, I would have to set it as "keyword" type such that it wont be analyzed.
You can use "multi-field" feature to achieve this. Here is a mapping:
{
"mappings": {
"my_type": {
"properties": {
"state": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
In this case state will act as text field (tokenized) whereas state.raw will be keyword (single-token). When indexing a document you should only set state. state.raw will be created automatically.

How to not-analyze in ElasticSearch?

I've got a field in an ElasticSearch field which I do not want to have analyzed, i. e. it should be stored and compared verbatim. The values will contain letters, numbers, whitespace, dashes, slashes and maybe other characters.
If I do not give an analyzer in my mapping for this field, the default still uses a tokenizer which hacks my verbatim string into chunks of words. I don't want that.
Is there a super simple analyzer which, basically, does not analyze? Or is there a different way of denoting that this field shall not be analyzed?
I only create the index, I don't do anything else. I can use analyzers like "english" for other fields which seems to be built-in names for pre-configured analyzers. Is there a list of other names? Maybe there's one fitting my needs (namely doing nothing with the input).
This is my mapping currently:
{
"my_type": {
"properties": {
"my_field1": { "type": "string", "analyzer": "english" },
"my_field2": { "type": "string" }
}
}
}
my_field1 is language-dependent; this seems to work. my_field2 shall be verbatim. I'd like to give an analyzer there which simply does not do anything.
A sample value for my_field2 would be "B45c 14/04".
"my_field2": {
"properties": {
"title": {
"type": "string",
"index": "not_analyzed"
}
}
}
Check you here, https://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-core-types.html, for further info.
This is no longer true due to the removal of the string (replaced by keyword and text) type as described here. Instead you should use keyword type with "index": true | false.
For Example OLD:
{
"foo": {
"type" "string",
"index": "not_analyzed"
}
}
becomes NEW:
{
"foo": {
"type" "keyword",
"index": true
}
}
This means the field is indexed but as it is typed as keyword not analyzed implicitly. If you would like to have the field analyzed, you need to use text type.
keyword analyser can be also used.
// don't actually use this, use "index": "not_analyzed" instead
{
"my_type": {
"properties": {
"my_field1": { "type": "string", "analyzer": "english" },
"my_field2": { "type": "string", "analyzer": "keyword" }
}
}
}
As noted here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html, it makes more sense to mark those fields as not_analyzed.
But keyword analyzer can be useful when it is set by default for whole index.
UPDATE: As it said in comments, string is no longer supported in 5.X

Resources