Prevent Elasticsearch from splitting on a specific character while indexing

I have a field with values such as 170726-001, 170726-002, 170726-003, and it appears that these values get split into 170726 and 00N. This affects the relevance of my search results when I search for 170726-001 as a keyword using a Query String Query.
How do I prevent Elasticsearch from splitting the value on the - character when indexing?

With the help of @filip-cordas and other comments I updated my index as shown below. It uses the keyword type instead of the default text type. Doing it on the index like this saves me from having to specify my_field.keyword in the search.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "keyword",
          "index": true
        }
      }
    }
  }
}
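With that mapping in place, a Query String Query along these lines should match the full value without it being split on the hyphen. This is only a sketch: the index and field names mirror the example above, and the value is quoted so the query parser treats it as a single exact phrase.
GET my_index/_search
{
  "query": {
    "query_string": {
      "default_field": "my_field",
      "query": "\"170726-001\""
    }
  }
}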

Related

Elasticsearch: why can't find by `term` query but can find by `match` query?

I am using Elasticsearch 11 to query text.
I have the query below, but it doesn't return any documents.
POST /_search
{
  "query": {
    "term": {
      "metric_name": {
        "value": "ConsumedReadCapacityUnits",
        "boost": 1.0
      }
    }
  }
}
Then I changed it to a match query, as below, which does find the matching document:
POST /_search
{
  "query": {
    "match": {
      "metric_name": "ConsumedReadCapacityUnits"
    }
  }
}
Based on the term query docs, a term query matches an exact term, and ConsumedReadCapacityUnits is the exact value stored in metric_name, so why doesn't the term query return anything?
A match query analyzes the search term using the standard analyzer (if no other analyzer is specified) and then matches the analyzed terms against the terms stored in the inverted index. By default, a text field uses the standard analyzer if no analyzer is specified. For example, SchooL gets analyzed to school.
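You can see this analysis step for yourself with the _analyze API; this quick sketch returns a single lowercased token, school:
POST _analyze
{
  "analyzer": "standard",
  "text": "SchooL"
}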
A term query returns documents that contain an exact term in the provided field. If you have not defined an explicit index mapping, you need to add .keyword to the field name; the .keyword sub-field uses the keyword analyzer instead of the standard analyzer.
As mentioned in the comments above, the mapping type of the metric_name field (which holds ConsumedReadCapacityUnits) is text, so you can make the term query work by updating your index mapping.
If you want to store the metric_name field as both text and keyword, you can update your index mapping as shown below to use multi fields:
PUT /_mapping
{
  "properties": {
    "metric_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}
Then reindex the data again. After this, you will be able to run term queries against the "metric_name.keyword" field (keyword type) and full-text queries against "metric_name" (text type).
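For example, once the multi-field is in place and the data has been reindexed, a term query like this sketch should return the document:
POST /_search
{
  "query": {
    "term": {
      "metric_name.keyword": {
        "value": "ConsumedReadCapacityUnits"
      }
    }
  }
}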
Alternatively, you can create a new index with the mapping below:
{
  "mappings": {
    "properties": {
      "metric_name": {
        "type": "keyword"
      }
    }
  }
}
And then index the data into this new index.
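One way to move the existing documents over is the _reindex API; a minimal sketch, assuming the old index is called old_index and the new one new_index:
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}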

Elastic query bool must match issue

Below is the query part of an Elastic GET API call run via the command line inside an OpenShift pod. I get both matching and non-matching elements in the fetch of 2000 documents. How can I limit the result to only the matching elements?
I specifically want to get only documents matching {\"kubernetes.container_name\":\"xyz\"}.
Any suggestions will be appreciated.
-d ' {\"query\": { \"bool\" :{\"must\" :{\"match\" :{\"kubernetes.container_name\":\"xyz\"}},\"filter\" : {\"range\": {\"#timestamp\": {\"gte\": \"now-2m\",\"lt\": \"now-1m\"}}}}},\"_source\":[\"#timestamp\",\"message\",\"kubernetes.container_name\"],\"size\":2000}'"
For exact matches there are two things you need to do:
Make use of term queries
Ensure that the field is of the keyword datatype
The text datatype goes through an analysis phase.
For example, if your data is This is a beautiful day, then during ingestion a text field would break the sentence down into tokens, lowercase them ([this, is, a, beautiful, day]) and add them to the inverted index. This happens via the standard analyzer, which is the default analyzer applied to text fields.
So when you query, the analyzer is applied again at query time and Elasticsearch checks whether the resulting words are present in the respective documents. As a result you see documents appearing even without an exact match.
In order to do an exact match, you need to make use of a keyword field, as it does not go through the analysis phase.
What I'd suggest is to create a keyword sibling field for the text field, in the manner below, and then re-ingest all the data:
Mapping:
PUT my_sample_index
{
  "mappings": {
    "properties": {
      "kubernetes": {
        "type": "object",
        "properties": {
          "container_name": {
            "type": "text",
            "fields": {          <--- Note this
              "keyword": {       <--- This is the container_name.keyword field
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}
Note that I'm assuming you are making use of object type.
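You can confirm the difference with the _analyze API. The first call below uses the text field's analyzer and splits the sample value into tokens; the second uses the built-in keyword analyzer, which keeps it as a single term. The sample text is only illustrative:
POST my_sample_index/_analyze
{
  "field": "kubernetes.container_name",
  "text": "xyz-abc"
}

POST _analyze
{
  "analyzer": "keyword",
  "text": "xyz-abc"
}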
Request Query:
POST my_sample_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "kubernetes.container_name.keyword": {
              "value": "xyz"
            }
          }
        }
      ]
    }
  }
}
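And to keep the time window and source filtering from the original command, the range filter can sit alongside the term clause. This is a sketch that reuses the field names exactly as they appear in the question:
POST my_sample_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "kubernetes.container_name.keyword": "xyz"
          }
        }
      ],
      "filter": {
        "range": {
          "#timestamp": {
            "gte": "now-2m",
            "lt": "now-1m"
          }
        }
      }
    }
  },
  "_source": ["#timestamp", "message", "kubernetes.container_name"],
  "size": 2000
}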
Hope this helps!

Elasticsearch - Making a field aggregatable but not searchable

My Elasticsearch data has a large number of fields that I don't need to search by, but I would like to get aggregations like percentiles, median, count, avg, etc. on these fields.
Is there a way to disable searchability of a field but still keep it aggregatable?
Most fields are indexed by default, which makes them searchable. If you want to make a field non-searchable, all you have to do is set its index param to false while leaving doc_values enabled.
As per the Elastic documentation:
All fields which support doc values have them enabled by default.
So you need not explicitly set "doc_values": true for such fields.
For example:
{
  "mappings": {
    "_doc": {
      "properties": {
        "only_agg": {
          "type": "keyword",
          "index": false
        }
      }
    }
  }
}
If you try to search on the only_agg field in the example above, Elasticsearch will throw an exception with a reason like:
Cannot search on field [only_agg] since it is not indexed.
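The field remains aggregatable, though; for instance, a terms aggregation like this sketch works (my_index is just a placeholder for whatever index holds the mapping above):
POST my_index/_search
{
  "size": 0,
  "aggs": {
    "only_agg_values": {
      "terms": {
        "field": "only_agg"
      }
    }
  }
}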
Yeah, take a look at doc_values:
https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html

Can we mandate Elasticsearch to treat all numeric fields as double?

I am using dynamic mapping while indexing my data. For example,
{ "a" : 10 }
will create the mapping for the field as long. The second time I index data, the value may be a double, e.g. { "a" : 10.10 }, but since the mapping is already defined as long, it will be indexed as long. The only way to fix this is to define the mapping in advance, which I don't want to do for various reasons.
So my question: is there a way I can mandate Elasticsearch to treat all numeric fields as double?
You can use a dynamic mapping template: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
If a value would be mapped as long, map the field to double instead:
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "double"
            }
          }
        }
      ]
    }
  }
}
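A quick way to check that the template kicks in (the document and field name are just an illustration): index a document with an integer value and then look at the generated mapping, which should show the field as double.
PUT my_index/my_type/1
{
  "a": 10
}

GET my_index/_mapping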

Unanalyzed fields on Kibana

I need help correcting a Kibana field. When I try to visualize the fields, it shows me the following warning:
Careful! The field contains analyzed strings. Analyzed strings are highly unique and can use a lot of memory to visualize. Values such as foo-bar will be broken into foo and bar. See Core Mapping Types for more information on setting this field as not analyzed.
Elasticsearch's default dynamic mapping is to analyze any string field (break the field into tokens; for instance, aaa_bbb_ccc will be broken down into aaa, bbb and ccc).
If you do not want this behavior, you must change the mapping settings before any document is pushed into the index.
You have two options to do that:
Change the mapping for a particular index using the mapping API, either statically or dynamically (dynamic means the mapping will also be applied to fields that do not yet exist in the index)
Change the behavior of any index matching a name pattern, using the template API
This example shows a template that changes the mapping for any index whose name matches the given prefix, marking every string field in every type as not_analyzed and making sure timestamp is mapped as a date (useful when the timestamp is represented as a number of seconds since 1970):
{
  "template": "myindciesprefix*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        {
          "timestamp_field": {
            "match": "timestamp",
            "mapping": {
              "type": "date"
            }
          }
        }
      ]
    }
  }
}
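A body like this is registered with the (legacy) index template API; here is a trimmed sketch with only the strings rule, and the template name is just an example:
PUT _template/not_analyzed_strings
{
  "template": "myindciesprefix*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}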
Really, you don't have a problem; it is only an informational message. But if you don't want analyzed fields, then when you build your index in Elasticsearch you must indicate that the field is not analyzed.
