Avoid creating dual mappings from Logstash to Elasticsearch

I notice that Logstash creates an extra "keyword" field in the index mapping for every string field that it extracts from the log files and sends to Elasticsearch.
There are many fields that I've removed completely with the prune plugin, but there are others that I don't want to remove entirely; I just don't need a *.keyword for them.
Is there a way to have Logstash only create *.keyword fields for some fields and not others? Specifically, is there a way for Logstash to have a whitelist of fields that it is OK to create *.keyword fields for, and not do it for anything else?
(using Elasticsearch 6.x)

I think you need to change the mapping of the desired fields. The mapping page shows the default text type mapping:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_mapping_changes.html
I tried to set a field without a keyword sub-field and it worked, except you couldn't aggregate on that field (I tried a terms aggregation) even if you set index: true in the mapping. I might have missed something, but I think this is where you should start.
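To make that concrete, here is a minimal sketch (the index and field names are made up) of mapping a string as text only, with no .keyword sub-field, on 6.x. The aggregation failure above is expected: a terms aggregation on a plain text field requires fielddata: true, which is disabled by default because it is heavy on heap.
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "message": {
          "type": "text"
        }
      }
    }
  }
}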

The solution I'm working with for now is dynamic templates.
I can map some fields to text plus a keyword sub-field, and others to keyword only. For example:
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "match_my_custom_fields": {
            "match_mapping_type": "string",
            "match": "custom_prefix_*",
            "mapping": {
              "type": "text",
              "fields": {
                "raw": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      ],
      "properties": {
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  }
}
This way, everything beginning with custom_prefix_ will have a text field plus a raw keyword sub-field, and everything else will just be a keyword.
Of course, I initially broke the geoip geo_point that was being emitted by the Logstash geoip plugin, and my map visualizations stopped working, so I needed to figure out how to restore that.
EDIT: Got geo_point working again; see the "geoip" property in the mapping above.
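To get Logstash-created indices to pick up such a mapping on 6.x, one approach is to register it as an index template matching the Logstash index pattern. A minimal sketch, assuming the default logstash-* pattern and the doc type used above (the template name and order are my own choices; a higher order lets it override the template Logstash installs):
PUT _template/selective_keywords
{
  "index_patterns": ["logstash-*"],
  "order": 1,
  "mappings": {
    "doc": {
      "dynamic_templates": [
        // the two templates from the mapping above go here
      ]
    }
  }
}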

Related

Elasticsearch - Using copy_to on the fields of a nested type

Elasticsearch version 7.17
Below I've pasted a simplified version of my mappings, which represent a nested object structure. One top-level-object will have one or more second-level-objects. A second-level-object will have one or more third-level-objects. Fields field_a, field_b, and field_c on third-level-object are all related to each other, so I'd like to copy them into a single field that can be partially matched against. I've done this on a lot of attributes at the top-level-object level, so I know it works.
{
  "mappings": {
    "_doc": { // one top-level object
      "dynamic": "false",
      "properties": {
        "second-level-objects": { // one or more second-level objects
          "type": "nested",
          "dynamic": "false",
          "properties": {
            "third-level-objects": { // one or more third-level objects
              "type": "nested",
              "dynamic": "false",
              "properties": {
                "my_copy_to_field": { // should have the values from field_a, field_b, and field_c
                  "type": "text",
                  "index": true
                },
                "field_a": {
                  "type": "keyword",
                  "index": false,
                  "copy_to": "my_copy_to_field"
                },
                "field_b": {
                  "type": "long",
                  "index": false,
                  "copy_to": "my_copy_to_field"
                },
                "field_c": {
                  "type": "keyword",
                  "index": false,
                  "copy_to": "my_copy_to_field"
                },
                "field_d": {
                  "type": "keyword",
                  "index": true
                }
              }
            }
          }
        }
      }
    }
  }
}
However, when I run a nested query against that my_copy_to_field, I get no results because the field is never populated, even though I know my documents have data in the three fields with copy_to. If I perform a nested query against field_d, which is not part of the copied info, I get the expected results, so it seems there's something going on with nested (or double-nested, in my case) usage of copy_to that I'm overlooking. Here is my query, which returns nothing:
GET /my_index/_search
{
  "query": {
    "nested": {
      "inner_hits": {},
      "path": "second-level-objects",
      "query": {
        "nested": {
          "inner_hits": {},
          "path": "second-level-objects.third-level-objects",
          "query": {
            "bool": {
              "should": [
                { "match": { "second-level-objects.third-level-objects.my_copy_to_field": "my search value" } }
              ]
            }
          }
        }
      }
    }
  }
}
I've tried adding include_in_root:true to the third-level-objects, but that didn't make any difference. If I could just get the field to populate with the copied data then I'm sure I can work through getting the query working. Is there something I'm missing about using copy_to with nested fields?
Additionally, when I view my data in Kibana -> Discover, I see second-level-objects as an available "Nested" field, but I don't see anything for third-level-objects, even though KQL recognizes it as a field. Is that symptomatic of an issue?
You must use the complete nested path in copy_to, like this:
"field_a": {
"type": "keyword",
"copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
},
"field_b": {
"type": "long",
"copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
},
"field_c": {
"type": "keyword",
"copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
}
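One caveat worth adding: copy_to happens at index time, so changing the mapping will not backfill documents that are already indexed. A reindex into an index with the corrected mapping (the index names below are placeholders) is needed before the nested query will return hits:
POST _reindex
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index_v2" }
}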

Elasticsearch match string with spaces, colons, dashes exactly

I'm using Elasticsearch 6.8 and trying to write a query in a Python notebook. Here is the mapping used for the index I'm working with:
{ "mapping": { "news": { "properties": { "dateCreated": { "type": "date", "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis" }, "itemId": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "market": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "timeWindow": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } } }
I'm trying to search for an exact string like "[2020-08-16 10:00:00.0,2020-08-16 11:00:00.0]" in the "timeWindow" field (which is a "text" type, not a "date" field), and also to select by market="en-us" (market is a "text" field too). This string has spaces, colons, commas, and a lot of whitespace characters, and I don't know how to write the right query.
At the moment I have this query:
res = es.search(index='my_index',
                doc_type='news',
                body={
                    'size': size,
                    'query': {
                        "bool": {
                            "must": [
                                {
                                    "simple_query_string": {
                                        "query": "[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]",
                                        "default_operator": "and",
                                        "minimum_should_match": "100%"
                                    }
                                },
                                {"match": {"market": "en-us"}}
                            ]
                        }
                    }
                })
The problem is that it doesn't match my "simple_query_string" for the timeWindow string exactly (I understand that this string gets tokenized and split into parts like "2020", "08", "17", "00", "01", etc., and each token is analyzed separately), and I'm getting different values for timeWindow that I want to exclude, like:
['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
'[2020-08-17 00:05:00.0,2020-08-17 01:05:00.0]'
...
'[2020-08-17 00:50:00.0,2020-08-17 01:50:00.0]'
'[2020-08-17 00:55:00.0,2020-08-17 01:55:00.0]'
'[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]']
Is there a way to do what I want?
UPD (and answer):
My current query uses "term" and "timeWindow.keyword"; this combination allows me to do an exact search for a string with spaces and other whitespace characters:
res = es.search(index='msn_click_events', doc_type='news', body={
    'size': size,
    'query': {
        "bool": {
            "must": [
                {
                    "term": {
                        "timeWindow.keyword": tw
                    }
                },
                {"match": {"market": "en-us"}}
            ]
        }
    }
})
And this query selects only the right timeWindow values (strings):
['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
'[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]'
'[2020-08-17 02:00:00.0,2020-08-17 03:00:00.0]'
...
'[2020-08-17 22:00:00.0,2020-08-17 23:00:00.0]'
'[2020-08-17 23:00:00.0,2020-08-18 00:00:00.0]']
On your timeWindow field you need a keyword (i.e. exact) search, but you are using a full-text query. As you defined this field as a text field, and as you already guessed, it gets analyzed at index time, hence you are not getting the correct results.
If you are using dynamic mapping, then a .keyword sub-field is generated for each text field in the mapping, so you can simply use timeWindow.keyword in your query and it will work.
If you have defined your own mapping, then you need to add a keyword field to store the timeWindow, reindex the data, and use that keyword field in the query to get the expected results.
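As a sketch of that second case on 6.8 (index and type names taken from the question): a keyword sub-field can be added to the existing text field with a mapping update, but documents indexed before the change still need a reindex or _update_by_query before the sub-field is populated:
PUT msn_click_events/_mapping/news
{
  "properties": {
    "timeWindow": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}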

Use Completion Suggester for all the fields across the Elasticsearch types

I am implementing the Completion Suggester in my application, and here is my requirement:
I am using Elasticsearch 5.5.3 (which supports multiple types). I have around 10 types in my Elasticsearch index, and each type has around 10 string fields. What I want to do is make a single search box that completes the phrase (from any field of those 10 types) as the user starts typing, using the completion suggester. What could be the best approach? Is using the _all field a good idea?
Yes, that's perfectly doable using a "custom all" field of type completion.
First, create the index with all the types and make sure to copy each field into a custom field of type completion:
PUT my_index
{
  "mappings": {
    "type1": {
      "properties": {
        "field1": {
          "type": "text",
          "copy_to": "my_all"
        },
        "field2": {
          "type": "text",
          "copy_to": "my_all"
        },
        "my_all": {
          "type": "completion"
        }
      }
    },
    "type2": {
      "properties": {
        "field1": {
          "type": "text",
          "copy_to": "my_all"
        },
        "field2": {
          "type": "text",
          "copy_to": "my_all"
        },
        "my_all": {
          "type": "completion"
        }
      }
    }
  }
}
Then, you'd query the completion data like this (i.e. without specifying any mapping type and using the common my_all field):
POST my_index/_search
{
  "suggest": {
    "my-suggest": {
      "prefix": "bla",
      "completion": {
        "field": "my_all"
      }
    }
  }
}
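To sanity-check the setup, indexing a document into one of the types should make its string values suggestable through the shared my_all field (the sample values here are made up):
PUT my_index/type1/1
{
  "field1": "black forest cake",
  "field2": "blackberry pie"
}
With that document indexed, the prefix "bla" in the suggest query above should return both values as completion options.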

How to set Elasticsearch index mapping as not_analyzed for all the fields

I want my Elasticsearch index to match the exact value for all the fields. How do I map my index as "not_analyzed" for all the fields?
I'd suggest making use of multi-fields in your mapping (which is the default behavior if you aren't creating the mapping yourself, i.e. dynamic mapping).
That way you can switch between traditional full-text search and exact-match searches when required.
Note that for exact matches, you need the keyword datatype plus a term query. Sample examples are provided in the links I've specified.
Hope it helps!
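For illustration, an exact match with a term query against the default .keyword sub-field might look like this (the products index and color field are borrowed from the next answer's example; the value "dark red" is made up):
GET products/_search
{
  "query": {
    "term": {
      "color.keyword": "dark red"
    }
  }
}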
You can use a dynamic_templates mapping for this. By default, Elasticsearch maps string fields as text with index: true, like below:
{
  "products2": {
    "mappings": {
      "product": {
        "properties": {
          "color": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
As you see, it also creates a keyword field as a multi-field. This keyword field is indexed but not analyzed like text. If you want to drop this default behaviour, you can use the configuration below for the index while creating it:
PUT products
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "product": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "index": false
            }
          }
        }
      ]
    }
  }
}
After doing this, the index mapping will look like below:
{
  "products": {
    "mappings": {
      "product": {
        "dynamic_templates": [
          {
            "strings": {
              "match_mapping_type": "string",
              "mapping": {
                "type": "keyword",
                "index": false
              }
            }
          }
        ],
        "properties": {
          "color": {
            "type": "keyword",
            "index": false
          },
          "type": {
            "type": "keyword",
            "index": false
          }
        }
      }
    }
  }
}
Note: I don't know your use case, but you can use the multi-field feature as mentioned by @Kamal. Otherwise, you cannot search on the not-analyzed (index: false) fields at all. Also, you can use the dynamic_templates mapping to keep some fields analyzed.
Please check the documentation for more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
Also, I explained this behaviour in an article, but it is in Turkish. You can check its example code samples with Google Translate if you want.
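One last hedge: if "not_analyzed" was meant as "searchable as an exact value" rather than "not indexed at all", a variant of the same dynamic template that keeps the default index: true is probably closer to what the question asks. A sketch, assuming the same single-type product index:
PUT products
{
  "mappings": {
    "product": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      ]
    }
  }
}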

Dynamic Mapping for an object field that unwraps the parent path

I am evaluating whether Elasticsearch can meet the needs of a new system I'm building. It looks amazing, so I'm really hopeful I can figure out a mapping strategy that works.
In this system, administrators can define fields to be associated with documents dynamically. So a given type (in the Elasticsearch sense of the word) can have any number of fields, whose names I do not know ahead of time. And each field can be of any type: int, date, string, etc.
An example document may look like:
{
  "name": "bob",
  "age": 22,
  "title": "Vice Intern",
  "tagline": "Ask not what your company can do for you, but..."
}
Notice that there are 2 string fields. Awesome. My problem though is that I want the "tagline" to be analyzed, but I do not want "title" to be analyzed.
Remember I don't know the names of these fields ahead of time. And there could be multiple fields of each type. So there could be 10 string fields of various names, 3 of which should be analyzed and 7 of which should not.
Another requirement I have is that the name the administrator gives the field should also be what they can search by. So, for example, if they want to find all the Vice Interns who have something to say, the lucene query may be:
+title:"Vice Intern" +tagline:"company"
So my thought was that I could define a dynamic mapping. Since I don't know the names of the fields ahead of time, it seems like a great approach. The key though is coming up with a way of differentiating string fields that should be analyzed and ones that shouldn't be!
I thought, hey, I'll just put all the fields that need analyzing into a nested object, like this:
{
  "name": "bob",
  "age": 22,
  "title": "Vice Intern",
  "textfields": {
    "tagline": "Ask not what your company can do for you, but...",
    "somethingelse": "lorem ipsum"
  }
}
Then, in my dynamic mapping, I have a way of mapping those fields differently:
{
  "mytype": {
    "dynamic_templates": {
      "nested_textfields": {
        "match": "textfields",
        "match_mapping_type": "string",
        "mapping": {
          "index": "analyzed",
          "analyzer": "default"
        }
      }
    }
  }
}
I know that isn't right; I actually need some kind of nested mapping. But no matter, because if I understand it correctly, even if I got that working, it would mean those fields are searched for (via Lucene syntax) like this:
+title:"Vice Intern" +textfields.tagline:"company"
And I don't want the "textfields" prefix. Since I'm the one providing the textfields object that wraps the text fields, I know that the fields within it are still uniquely named across the entire document.
I thought of using a pattern match instead. So instead of wrapping them in a "textfields" object, I could prefix them, like "textfield_tagline". But when doing that, the {name} token in the dynamic mapping includes the prefix, I don't see a way to just pull out the "*" portion.
Any solution which gets me the necessary behavior is a correct answer, even if it involves nesting mapping information into the documents themselves (can you do that? I've seen something like that, I think...).
EDIT:
I've attempted the following dynamic template. I'm trying to use index_name to remove the 'textfields.' prefix in the index. This dynamic template just doesn't seem to match, though, because after putting a document and looking at the mapping I see no analyzer specified.
{
  "mytype": {
    "dynamic_templates": [
      {
        "textfields": {
          "path_match": "textfields.*",
          "match_mapping_type": "string",
          "mapping": {
            "type": "string",
            "index": "analyzed",
            "analyzer": "default",
            "index_name": "{name}",
            "fields": {
              "sort": {
                "type": "string",
                "index": "not_analyzed",
                "index_name": "{name}_sort"
              }
            }
          }
        }
      }
    ]
  }
}
I was able to duplicate the results that you asked for specifically with the following index creation (with mappings), document, and search query. The type does vary a bit, but it serves the purpose of the example.
Index Settings
PUT http://localhost:9200/sandbox
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "mytype": {
      "dynamic_templates": [
        {
          "indexedfields": {
            "path_match": "indexedfields.*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "analyzer": "default",
              "index_name": "{name}",
              "fields": {
                "sort": {
                  "type": "string",
                  "index": "not_analyzed",
                  "index_name": "{name}_sort"
                }
              }
            }
          }
        },
        {
          "textfields": {
            "path_match": "textfields.*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed",
              "index_name": "{name}"
            }
          }
        },
        {
          "strings": {
            "path_match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
Document
PUT http://localhost:9200/sandbox/mytype/1
{
  "indexedfields": {
    "hello": "Hello world",
    "message": "The great balls of the world are on fire"
  },
  "textfields": {
    "username": "User Name",
    "projectname": "Project Name"
  }
}
Search
POST http://localhost:9200/sandbox/mytype/_search
{
  "query": {
    "query_string": {
      "query": "message:\"great balls\""
    }
  },
  "filter": {
    "query": {
      "query_string": {
        "query": "username:\"User Name\""
      }
    }
  },
  "from": 0,
  "size": 10,
  "sort": []
}
The search returns the following response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.19178301,
    "hits": [
      {
        "_index": "sandbox",
        "_type": "mytype",
        "_id": "1",
        "_score": 0.19178301,
        "_source": {
          "indexedfields": {
            "hello": "Hello world",
            "message": "The great balls of the world are on fire"
          },
          "textfields": {
            "username": "User Name",
            "projectname": "Project Name"
          }
        }
      }
    ]
  }
}
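One detail worth knowing when adapting this: dynamic templates are evaluated in order and the first matching template wins, which is why the catch-all "strings" template with path_match: "*" comes last. To confirm which template each field actually matched, inspect the generated mapping:
GET http://localhost:9200/sandbox/_mapping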
