I would like to disable all the "raw" fields that are created in Elasticsearch by logstash-forwarder. So if I have a field as "host" logstash-forwarder won't create a "host.raw" field. But I need a general solution for all the string fields.
I have my string fields as "not_analyzed" so having raw fields has no point and just a duplicate of the data.
I tried to remove "fields" part of the mapping below but it's added back after the first log message. The closest thing I could achieve was to add the following mapping but that still creates empty raw fields:
curl -XPUT 'localhost:9200/myindex/' -d '{
"mappings": {
"_default_": {
"dynamic_templates" : [ {
"string_fields" : {
"mapping" : {
"index" : "not_analyzed",
"type" : "string",
"fields" : {
"raw" : {
"ignore_above" : 0,
"index" : "not_analyzed",
"type" : "string"
}
}
},
"match" : "*",
"match_mapping_type" : "string"
}
} ],
"_all": { "enabled": false }
}
}
}'
So how can I disable these fields?
Related
In my case, NIFI will receive data from syslog firewall, then after transformation sends JSON to ELASTIC. This is my first contact with ELASTICSEARCH
{
"LogChain" : "Corp01 input",
"src_ip" : "162.142.125.228",
"src_port" : "61802",
"dst_ip" : "177.16.1.13",
"dst_port" : "6580",
"timestamp_utc" : 1646226066899
}
In Elasticsearch automatically created Index with such types
{
"mt-firewall" : {
"mappings" : {
"properties" : {
"LogChain" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"dst_ip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"dst_port" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"src_ip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"src_port" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"timestamp_utc" : {
"type" : "long"
}
}
}
}
}
How to change type fields in Elasticsearch?
"src_ip": type "ip"
"dst_ip": type "ip"
"timestamp_utc": type "data"
You can change or configure field type using Mapping in Elasticsearch and some of the way i have given below:
1. Explicit Index Mapping
Here, you will define index mapping by your self with all the required field and specific type of field before indexing any document to Elasticsearch.
PUT /my-index-000001
{
"mappings": {
"properties": {
"src_ip": { "type": "ip" },
"dst_ip": { "type": "ip" },
"timestamp_utc": { "type": "date" }
}
}
}
2. Dyanamic Template:
Here, you will provide dynamic template while creating index and based on condition ES will map field with specific data type like if field name end with _ip then map field as ip type.
PUT my-index-000001/
{
"mappings": {
"dynamic_templates": [
{
"strings_as_ip": {
"match_mapping_type": "string",
"match": "*ip",
"runtime": {
"type": "ip"
}
}
}
]
}
}
Update 1:
If you want to update mapping in existing index then it is not recommndate as it will create data inconsistent.
You can follow bellow steps:
Use Reindex API to copy data to temp index.
Delete your original index.
define index with one of the above one method with index mapping.
Use Reindex API to copy data from temp index to original index (newly created index with Mapping)
Assuming I have the following index structure:
{
"title": "Early snow this year",
"body": "After a year with hardly any snow, this is going to be a serious winter",
"source": [
{
"name":"CNN",
"details": {
"site": "cnn.com"
}
},
{
"name":"BBC",
"details": {
"site": "bbc.com"
}
}
]
}
and I have a bool query to try and retrieve this document here:
{
"query": {
"bool" : {
"must" : {
"query_string" : {
"query" : "snow",
"fields" : ["title", "body"]
}
},
"filter": {
"bool": {
"must" : [
{ "term" : {"source.name" : "bbc"}},
{ "term" : {"source.details.site" : "BBC.COM"}}
]
}
}
}
}
}'
But it is not working with zero hits, how should I modify my query? It is only working if I remove the { "term" : {"source.details.site" : "BBC.COM"}}.
Here is the mapping:
{
"news" : {
"mappings" : {
"article" : {
"properties" : {
"body" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"source" : {
"properties" : {
"details" : {
"properties" : {
"site" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
You are doing a term query on "source.details.site". Term query means that the value you provide will not be analysed at query time. If you are using default mapping then source.details.site will be lowercased. Now when you query it with term and "BBC.COM", "BBC.COM" will not be analysed and ES is trying to match "BBC.COM" with "bbc.com" (because it was lowercased at index time) and result is false.
You can use match instead of term to get it analysed. But its better to use term query on your keyword field, it you know in advance the exact thing that would have been indexed. Term queries have good advantage of caching from ES side and it is faster than match queries.
You should clean your data at index time as you will write once and read always. So anything like "/", "http" should be removed if you are not losing the semantics. You can achieve this from your code while indexing or you can create custom analysers in your mapping. But do remember that custom analysers won't work on keyword field. So, if you try to achieve this on ES side, you wont be able to do aggregations on that field without enabling field data, that should be avoided. We have an experimental support for normalisers in latest update, but as it is experimental, don't use it in production. So in my opinion you should clean the data in your code.
I am trying to stop elasticsearch from analyzing some fields in my documents.
I posted this mapping:
{
"properties" : {
"f1" : {
"index" : "not_analyzed",
"include_in_all" : false,
"type" : "string"
},
"f2" : {
"index" : "not_analyzed",
"include_in_all" : false,
"type" : "string"
},
"f3" : {
"index" : "not_analyzed",
"include_in_all" : false,
"type" : "string"
}
}
}
Then I ping the mapping endpoint and it doesn't tell me if those fields
are analyzed or not:
{
"myindex" : {
"mappings" : {
"mytype" : {
"properties" : {
"f1" : {
"type" : "keyword",
"include_in_all" : false
},
"f2" : {
"type" : "keyword",
"include_in_all" : false
},
"f3" : {
"type" : "keyword",
"include_in_all" : false
}
}
}
}
}
}
In the examples I have seen querying _mapping API seems to tell what fields are analyzed or not.
In elasticsearch 5.0 and later there's a new way of separating analyzed and non-analyzed content:
Strings are dead, long live strings!
Keyword datatype
But, to summarize:
keyword is not analyzed
text is analyzed
and the index property that had 3 values: "no","analyzed","not-analyzed" is now simplified to just "yes" and "no"
My logstash config is something like the following
if "user" in [tags] {
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "user-%{+YYYY.MM.dd}"
template => '/path/to/elastic-template.json'
flush_size => 50
}
}
And the json template contains the lines
"fields" : {
"{name}" : {"type": "string", "index" : "analyzed", "omit_norms" : true, "index_options" : "docs"},
"{name}.raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
}
So I assume the .raw can be used when searching or generating the visualization.
However, I removed the existing index and rebuild again, I can see the data, but I still cannot find the .raw field either Kibana's settings, discover or visualize
How to use the .raw field?
The template you posted isn't even valid JSON. If you want to apply a raw field as in not_analyzed you have to do it like this:
"action" : {
"type" : "string",
"fields" : {
"raw" : {
"index" : "not_analyzed",
"type" : "string"
}
}
}
This will create a action.raw field.
I encountered same issue.
I used ES5.5.1 and logstash 5.5.1, below is my template file
{
"template": "access_log",
"settings": {
"index.refresh_interval" : "5s"
},
"mappings": {
"log": {
"properties":{
"geoip":{
"properties":{
"location" : {
"type" : "geo_point",
"index": "false"
}
}
}
}
}
}
}
I create an index with a mapping which contains a field not_analyzed with command below and index a document with next command.
curl -XPUT localhost:9200/twitter -d '{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1
},
"mappings": {
"tweet" : {
"properties" : {
"message" : { "type" : "string",
"index": "not_analyzed"}
}
}
}
}'
curl -XPOST 'http://localhost:9200/twitter/tweet?' -d '{
"user" : "kimchy",
"postDate" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
'
I checked to mappings with http://localhost:9200/twitter/_mapping?pretty=true and it outputs:
{
"twitter" : {
"mappings" : {
"tweet" : {
"properties" : {
"message" : {
"type" : "string",
"index" : "not_analyzed"
},
"post_date" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"user" : {
"type" : "string"
}
}
}
}
}
}
Finally when I search with this http://localhost:9200/twitter/tweet/_search?pretty=1&q=trying it finds the indexed document. Is it normal? I thought it should not find it unless I search the complete text "trying out Elasticsearch".
not_analyzed means that it's not doing tokenizing/other analysis to index the values, but it does still store the full value in Elasticsearch and it can be used as an exact match in a terms query. The field value is still getting included/analyzed into the _all field and indexed there so that it's searchable.
You need to set "include_in_all": false or "index": "no" to disable that.
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html for more information.