Auto mapping in Elasticsearch NEST client using underscores

Elasticsearch recommends using underscores in field names.
I'm using the NEST client and I have the following type:
public class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
The NEST client offers a feature called auto mapping that can automatically infer the correct mappings from the properties of a POCO. If I use this feature, I get:
"employee": {
"properties": {
"firstName": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"lastName": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
}
}
But these field names do not conform to that naming convention. There is another feature for defining my own mappings using attributes, but I don't want to specify the name manually for each field. So is there a way to configure the client to combine words with underscores by default?
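For reference, the mapping above is what an auto-map call along these lines produces (index name and client setup are assumptions, not from the question):

// A minimal sketch: let NEST infer the mapping for Employee
// from its POCO properties.
var createIndexResponse = client.CreateIndex("employees", c => c
    .Mappings(ms => ms
        .Map<Employee>(m => m.AutoMap())
    )
);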

You can change the default field name inference from camel casing to snake casing through DefaultFieldNameInferrer(Func&lt;string, string&gt;) on ConnectionSettings.
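For example, a minimal sketch (the toSnakeCase helper is illustrative, not part of NEST):

// Map POCO property names to snake_case field names,
// e.g. FirstName -> first_name.
Func<string, string> toSnakeCase = s =>
    string.Concat(s.Select((c, i) =>
        i > 0 && char.IsUpper(c)
            ? "_" + char.ToLowerInvariant(c)
            : char.ToLowerInvariant(c).ToString()));

var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
    .DefaultFieldNameInferrer(toSnakeCase);
var client = new ElasticClient(settings);

With this in place, auto mapping emits first_name and last_name instead of firstName and lastName.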


Keyword field created automatically without any mapping in Entity class

My Elasticsearch version is 7.6.2 and my spring-boot-starter-data-elasticsearch version is 2.2.0.
Due to some dependencies I am not upgrading ES to the latest version.
The problem I am facing is that the ES index is sometimes created with .keyword fields and sometimes with just plain text fields.
I am not able to find out why this is happening. I read that every text field also gets a keyword sub-field, but why is it not always created?
My entity class:
@Setter
@Getter
@Document(indexName = "myindex", createIndex = true, shards = 4)
public class MyIndex {

    @Field(type = FieldType.Keyword)
    private String place;

    @Field(type = FieldType.Text)
    private String name;

    @Id
    private String dynamicId = UUID.randomUUID().toString();

    public MyIndex() {}
}
Mapping in ES:
{
  "mappings": {
    "myindex": {
      "properties": {
        "place": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "dynamicId": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
Sometimes it is created as below for the same entity class:
{
  "mappings": {
    "myindex": {
      "properties": {
        "place": {
          "type": "keyword"
        },
        "name": {
          "type": "text"
        },
        "dynamicId": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
With the entity definition shown, when Spring Data Elasticsearch creates the index and writes the mapping, you will get the mapping shown in your second example, with these values for the properties:
{
  "properties": {
    "place": {
      "type": "keyword"
    },
    "name": {
      "type": "text"
    }
  }
}
If you want to have a nested keyword property in Spring Data Elasticsearch, you have to define it on the entity with the corresponding annotation, for example:
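A hedged sketch using @MultiField/@InnerField (annotation names as in Spring Data Elasticsearch; exact attributes may differ between versions):

// name is mapped as text with a "keyword" sub-field, matching the
// text-plus-keyword layout from the first mapping shown above.
@MultiField(
    mainField = @Field(type = FieldType.Text),
    otherFields = { @InnerField(suffix = "keyword", type = FieldType.Keyword) }
)
private String name;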
Please notice: the @Id property is not mapped explicitly but will be dynamically mapped on first indexing of a document.
The mapping in the first case and the part in the second where a String is mapped as
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
is the default value that Elasticsearch uses when a document is indexed with a text field that was not mapped before - see the docs about dynamic mapping.
So your second example shows the mapping of an index that was created by Spring Data Elasticsearch and where some documents have been indexed.
The first one would be created by Elasticsearch if some other application creates the index and writes data into it. It could also be that the index was created outside your application; on application startup no mapping would then be written, because the index already exists. So you should review the way your indices are created.

Elasticsearch match string with spaces, colons, dashes exactly

I'm using Elasticsearch 6.8 and trying to write a query in a Python notebook. Here is the mapping used for the index I'm working with:
{ "mapping": { "news": { "properties": { "dateCreated": { "type": "date", "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis" }, "itemId": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "market": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "timeWindow": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } } }
I'm trying to search for an exact string like "[2020-08-16 10:00:00.0,2020-08-16 11:00:00.0]" in the "timeWindow" field (which is a "text" type, not a "date" field), and also to select by market="en-us" (market is a "text" field too). This string has spaces, colons, commas, and a lot of whitespace characters, and I don't know how to write the right query.
At the moment I have this query:
res = es.search(index='my_index',
                doc_type='news',
                body={
                    'size': size,
                    'query': {
                        "bool": {
                            "must": [
                                {
                                    "simple_query_string": {
                                        "query": "[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]",
                                        "default_operator": "and",
                                        "minimum_should_match": "100%"
                                    }
                                },
                                {"match": {"market": "en-us"}}
                            ]
                        }
                    }
                })
The problem is that it doesn't match my "simple_query_string" for the timeWindow string exactly (I understand that this string gets tokenized and split into parts like "2020", "08", "17", "00", "01", etc., and each token is analyzed separately), and I'm getting different timeWindow values that I want to exclude, like
['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
'[2020-08-17 00:05:00.0,2020-08-17 01:05:00.0]'
...
'[2020-08-17 00:50:00.0,2020-08-17 01:50:00.0]'
'[2020-08-17 00:55:00.0,2020-08-17 01:55:00.0]'
'[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]']
Is there a way to do what I want?
UPD (and answer):
My current query uses "term" and "timeWindow.keyword"; this combination allows me to do an exact search for a string with spaces and other whitespace characters:
res = es.search(index='msn_click_events', doc_type='news', body={
    'size': size,
    'query': {
        "bool": {
            "must": [
                {
                    "term": {
                        "timeWindow.keyword": tw
                    }
                },
                {"match": {"market": "en-us"}}
            ]
        }
    }
})
And this query selects only the right timeWindow values (strings):
['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
'[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]'
'[2020-08-17 02:00:00.0,2020-08-17 03:00:00.0]'
...
'[2020-08-17 22:00:00.0,2020-08-17 23:00:00.0]'
'[2020-08-17 23:00:00.0,2020-08-18 00:00:00.0]']
On your timeWindow field you need a keyword, i.e. exact, search, but you are using a full-text query; as you defined this field as a text field, you guessed correctly: it gets analyzed at index time, hence you are not getting the correct results.
If you are using dynamic mapping, a .keyword sub-field is generated for each text field in the mapping, so you can simply use timeWindow.keyword in your query and it will work.
If you have defined your mapping explicitly, then you need to add a keyword field to store the timeWindow, reindex the data, and use that keyword field in the query to get the expected results.
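A hedged sketch of that second approach with the Python client (index, type, and field names taken from the question; calls as in elasticsearch-py 6.x):

# Add a keyword sub-field to the existing timeWindow text field.
es.indices.put_mapping(
    index='my_index',
    doc_type='news',
    body={
        'properties': {
            'timeWindow': {
                'type': 'text',
                'fields': {
                    'keyword': {'type': 'keyword', 'ignore_above': 256}
                }
            }
        }
    }
)

# Existing documents must be reindexed before the sub-field is
# populated; an update_by_query with no script re-indexes them in place.
es.update_by_query(index='my_index', conflicts='proceed')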

Avoid creating dual mappings from logstash

I notice that Logstash creates an extra "keyword" field in the index mapping for every string field that it extracts from the log files and sends to Elasticsearch.
There are many fields that I've removed completely with the prune plugin, but there are other fields that I don't want to remove completely, yet I also don't need a *.keyword for them.
Is there a way to have Logstash create *.keyword fields only for some fields and not others? Specifically, is there a way to give Logstash a whitelist of fields that it is OK to create *.keyword fields for, and skip everything else?
(using Elasticsearch 6.x)
I think you need to change the mapping of the desired fields. The mapping page shows the default text type mapping:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_mapping_changes.html
I tried to set a field without a keyword sub-field and it worked, except that you couldn't aggregate on that field (I tried a terms aggregation) even if you set index: true in the mapping. I might have missed something, but I think this is where you should start.
The solution I'm working with for now is dynamic templates.
I can map some fields to text plus a keyword sub-field, and others to just a keyword. For example:
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "match_my_custom_fields": {
            "match_mapping_type": "string",
            "match": "custom_prefix_*",
            "mapping": {
              "type": "text",
              "fields": {
                "raw": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      ],
      "properties": {
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  }
}
This way, everything beginning with custom_prefix_ will have both a text and a keyword field, and everything else will just have a keyword field.
Of course, I somehow broke the geoip.geo_point that was being emitted from the geoip logstash plugin, and now my map visualizations won't work, so I need to figure out how to restore that.
EDIT: Got geo_point working again, see the "geoip" prop

How can I force a float casting on elasticsearch?

I have an Elasticsearch index with this mapping:
{
  "book": {
    "mappings": {
      "educational": {
        "properties": {
          "price": {
            "type": "float"
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
Now I can index a document with a string instead of a float:
{
  "title": "Test",
  "price": "120.99"
}
The value of price will be returned as a string when I retrieve this document later, despite the fact that the mapping says it should be a float.
I know that the price will still be indexed as a float even though it is presented as a string, but is there a way to force a cast of the field to a float, for better coherence in the data?
Internally the field will be stored as a float when coercion is used. However, the original document will not be changed, which means the original JSON will still contain the field as a string.
You could use a convert processor in an ingest pipeline to change the string to a float before the document is indexed.
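A minimal sketch (the pipeline id is illustrative):

PUT _ingest/pipeline/price-to-float
{
  "description": "Cast price to float before indexing",
  "processors": [
    { "convert": { "field": "price", "type": "float" } }
  ]
}

PUT book/educational/1?pipeline=price-to-float
{
  "title": "Test",
  "price": "120.99"
}

After the pipeline runs, the stored _source contains "price": 120.99 as a number rather than a string.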

Increase ignore_above property in elastic search

I am using Elasticsearch with NEST 5.0.
I need to increase ignore_above from the default of 256 to 512.
Can I do this with attribute mappings?
Otherwise, how can I do this using the fluent API?
You can use [Keyword(IgnoreAbove = 512)] to change the value:
[ElasticsearchType(Name = "mytype")]
class mytype
{
    // default: mapped to text (full-text search) with a keyword
    // sub-field (exact search) that gets ignore_above: 256
    public string defaultString { get; set; }

    // plain keyword type
    [Keyword]
    public string keywordType { get; set; }

    // keyword with a larger ignore_above
    [Keyword(IgnoreAbove = 512)]
    public string longKeyword { get; set; }

    // text type, full-text search
    [Text]
    public string textType { get; set; }

    // stored but not searchable
    [Text(Index = false)]
    public string textTypeNotSearchable { get; set; }
}
This is the index created in Elasticsearch for the above type:
"defaultString": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"keywordType": {
"type": "keyword"
},
"longKeyword": {
"type": "keyword",
"ignore_above": 512
},
"textType": {
"type": "text"
},
"textTypeNotSearchable": {
"type": "text",
"index": false
}
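The question also asks about the fluent API; here is a hedged NEST 5.x sketch of the same setting (index name and client setup are assumptions):

// Fluent mapping: auto-map the POCO, then override longKeyword
// with ignore_above = 512.
var createIndexResponse = client.CreateIndex("myindex", c => c
    .Mappings(ms => ms
        .Map<mytype>(m => m
            .AutoMap()
            .Properties(p => p
                .Keyword(k => k
                    .Name(n => n.longKeyword)
                    .IgnoreAbove(512)
                )
            )
        )
    )
);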
