Converting nginx access log bytes to number in Kibana4 - elasticsearch

I would like to create a visualization of the sum of bytes sent using the data from my nginx access logs. When trying to create a "Metric" visualization, I can't use the bytes field as a sum because it is a string type.
And I'm not able to change it under settings.
How do I go about changing this field type to a number/bytes type?
Here is my logstash config for nginx access logs
filter {
  if [type] == "nginx-access" {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
    geoip {
      source => "clientip"
    }
    useragent {
      source => "agent"
      target => "useragent"
    }
  }
}
Since Logstash creates these indices automatically, I'm guessing I need to change it here.
I tried adding
mutate {
  convert => { "bytes" => "integer" }
}
But it doesn't seem to make a difference.

Field types are configured using mappings, which are defined at the index level and can hardly be changed once the index exists. With Logstash a new index is created every day, so if you want to change these mappings, either wait for the next day or delete the current index if you can.
By default these mappings are generated automatically by Elasticsearch depending on the syntax of the indexed JSON document and the applied Index Templates:
# Type String
{"bytes":"123"}
# Type Integer
{"bytes":123}
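To make the difference concrete, here is a deliberately simplified Python model (an illustration, not Elasticsearch code) of how dynamic mapping picks a type from a JSON value, together with a rough analogue of the mutate/convert fix. One detail worth knowing: dynamic mapping maps whole JSON numbers to `long`, not `integer`, so `integer` only comes from an explicit mapping or template. The function names are hypothetical, and date detection, dynamic templates and version differences (older releases used `string` where newer ones use `text`) are ignored.

```python
import json

def guess_dynamic_type(value):
    """Very simplified model of Elasticsearch dynamic mapping for one value.

    Assumptions: ignores date/numeric detection, dynamic templates and
    index templates; "text" is the 5.x+ name (older releases used "string").
    """
    if isinstance(value, bool):   # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, int):
        return "long"             # whole JSON numbers are mapped as long
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    return "object" if isinstance(value, dict) else "unsupported"

def convert_to_integer(event, field):
    """Rough analogue of mutate { convert => { field => "integer" } }."""
    out = dict(event)  # leave the original event untouched
    if isinstance(out.get(field), str):
        out[field] = int(out[field])
    return out

doc = json.loads('{"bytes":"123"}')
print(guess_dynamic_type(doc["bytes"]))          # the quoted value is a string
fixed = convert_to_integer(doc, "bytes")
print(json.dumps(fixed), guess_dynamic_type(fixed["bytes"]))
```

Because the mapping is fixed when an index is created, converted events only get a numeric `bytes` field in indices created after the change.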
In the end there are 2 solutions:
Tune Logstash, to make it generate an integer and let Elasticsearch guess the field type → Use the mutate/convert filter
Tune Elasticsearch, to force the field bytes for the document type nginx-access to be of type integer → Use Index Template:
Index Template API:
PUT _template/logstash-nginx-access
{
  "order": 1,
  "template": "logstash-*",
  "mappings": {
    "nginx-access": {
      "properties": {
        "bytes": {
          "type": "integer"
        }
      }
    }
  }
}
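If you prefer to install the template from a script rather than from a console, the standard-library sketch below only builds the HTTP request for the same body. The host `http://localhost:9200` is an assumption; the `Content-Type` header is required from Elasticsearch 6.0 onward and harmless on older releases. Note that `"template"` is the pre-6.0 key; 6.0+ expects `"index_patterns"`.

```python
import json
import urllib.request

def template_request(base_url, name, body):
    """Build, but do not send, a PUT _template/<name> request."""
    return urllib.request.Request(
        url=f"{base_url}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Same body as the template above ("template" is the pre-6.0 key)
nginx_template = {
    "order": 1,
    "template": "logstash-*",
    "mappings": {
        "nginx-access": {"properties": {"bytes": {"type": "integer"}}}
    },
}

req = template_request("http://localhost:9200", "logstash-nginx-access", nginx_template)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs a reachable cluster.
```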

Related

ElasticSearch: populating ip_range type field via logstash

I'm experimenting with the ip_range field type in Elasticsearch 6.8 (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/range.html) and am struggling to find a way to load IP data into the field properly via Logstash.
I was able to load some sample data via Kibana Dev Tools, but cannot figure out a way to do the same via logstash.
Index definition
PUT test_ip_range
{
  "mappings": {
    "_doc": {
      "properties": {
        "ip_from_to_range": {
          "type": "ip_range"
        },
        "ip_from": {
          "type": "ip"
        },
        "ip_to": {
          "type": "ip"
        }
      }
    }
  }
}
Add sample doc:
PUT test_ip_range/_doc/3
{
  "ip_from_to_range": {
    "gte": "<dotted_ip_from>",
    "lte": "<dotted_ip_to>"
  }
}
Logstash config (reading from DB)
input {
  jdbc {
    ...
    statement => "SELECT ip_from, ip_to, <???> AS ip_from_to_range FROM sample_ip_data"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => "<host>"
    "index" => "test_ip_range"
    "document_type" => "_doc"
  }
}
Question:
How do I get the ip_from and ip_to DB fields into the respective gte and lte parts of ip_from_to_range via the Logstash config?
I know I can also insert the ip range in CIDR notation, but would like to be able to have both options - loading in CIDR notation and loading as a range.
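Both options describe the same thing: an ip_range field accepts either an object with gte/lte bounds or a CIDR string such as "192.168.0.0/16". A from/to range stored in a database does not always line up with a single CIDR block, though, which is why the gte/lte form is the more general one. The Python sketch below (hypothetical helper names, standard `ipaddress` module only) converts in both directions.

```python
import ipaddress

def cidr_to_range(cidr):
    """Expand a CIDR block into the equivalent gte/lte bounds."""
    net = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
    return {"gte": str(net.network_address), "lte": str(net.broadcast_address)}

def range_to_cidrs(ip_from, ip_to):
    """Cover an arbitrary from/to range with the fewest CIDR blocks."""
    first, last = ipaddress.ip_address(ip_from), ipaddress.ip_address(ip_to)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]

print(cidr_to_range("192.168.0.0/24"))
print(range_to_cidrs("10.0.0.1", "10.0.0.10"))  # not a single block
```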
After some trial and error, finally figured out the logstash config.
I had posted about a similar issue here, which finally got me on the right track with the syntax for this use case as well.
input { ... }
filter {
  mutate {
    add_field => {
      "[ip_from_to_range]" =>
      '{
        "gte": "%{ip_from}",
        "lte": "%{ip_to}"
      }'
    }
  }
  json {
    source => "ip_from_to_range"
    target => "ip_from_to_range"
  }
}
output { ... }
Filter parts explained
mutate add_field: creates a new field [ip_from_to_range] whose value is a JSON string ('{...}'). It is important to write the field name as [field_name]; otherwise the next step, which parses the string into a JSON object, doesn't work.
json: parses the string representation into a JSON object.
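The two filter steps can be mirrored on a plain Python dict to show what happens to the event (a sketch only; the function name is hypothetical and this is not how Logstash runs internally). Step one produces a string, step two replaces it with a parsed object, which is what the ip_range mapping expects.

```python
import json

def add_ip_range(event):
    """Mirror the mutate + json filter steps on a plain dict."""
    out = dict(event)
    # Step 1, like mutate/add_field: interpolate the values into a JSON string
    out["ip_from_to_range"] = '{"gte": "%s", "lte": "%s"}' % (out["ip_from"], out["ip_to"])
    # Step 2, like json { source => ... target => ... }: parse it in place
    out["ip_from_to_range"] = json.loads(out["ip_from_to_range"])
    return out

event = add_ip_range({"ip_from": "10.0.0.1", "ip_to": "10.0.0.255"})
print(event["ip_from_to_range"])
```

In plain Python you would simply build the dict directly; the string round trip is kept here only to match the shape of the Logstash config.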

Logstash/Elasticsearch keep auto-mapping geoip to object instead of geo_point

I have some logs in the following format (I changed the IPs from public to private, but you get the idea):
192.168.0.1 [20/Nov/2019:16:09:28 +0000] GET /some_path HTTP/1.1 200 2 2
192.168.0.2 [20/Nov/2019:16:09:28 +0000] GET /some_path HTTP/1.1 200 2 2
I then grok these logs using the following pattern:
grok { match => { "message" => "%{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] %{WORD:method} %{URIPATHPARAM:request} %{DATA:httpversion} %{NUMBER:response} %{NUMBER:duration}" } }
geoip { source => "clientip" }
On my output section, I have the following code:
else if "host.name" in [host][name] { # if statement with the hostname
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "mms18-%{+YYYY.MM.dd}"
    user => "admin-user"
    password => "admin-password"
  }
}
The problem is that when I go to Kibana, geoip.location is mapped as an object, and I cannot use it in a map dashboard.
Since the index name changes daily, I cannot manually apply the correct geoip mapping, because I would have to do it every day.
One solution I thought that partially solves the problem is removing the date from the index in Logstash output, so it has a constant index of "mms18" and then using this on Kibana management console:
PUT mms18
{
  "mappings": {
    "properties": {
      "geoip": {
        "properties": {
          "location": { "type": "geo_point" }
        }
      }
    }
  }
}
However, this is not ideal, since I want to keep the option of showing all the indexes with their respective dates, and then choosing which ones to delete.
Is there any way that I can achieve the correct mapping while also preserving the indexes with their dates?
Any help would be appreciated.
Use an index template (with an index_patterns value like "mms18-*", so that it matches the daily index names) that maps geoip.location as a geo_point.
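A sketch of such a template body, plus a quick check that the wildcard really matches the daily names produced by index => "mms18-%{+YYYY.MM.dd}". The body assumes Elasticsearch 7.x typeless mappings and would be sent with PUT _template/<name>, where the name is your choice. Python's `fnmatch` is only an approximation of Elasticsearch's simple `*` wildcards. A template only applies to indices created after it exists, so the current day's index keeps its object mapping.

```python
import fnmatch
import json

def geo_point_template(pattern):
    """Legacy _template body (7.x typeless style) mapping geoip.location as geo_point."""
    return {
        "index_patterns": [pattern],
        "mappings": {
            "properties": {
                "geoip": {"properties": {"location": {"type": "geo_point"}}}
            }
        },
    }

def pattern_matches(pattern, index_name):
    # fnmatch approximates Elasticsearch's "*" wildcard matching
    return fnmatch.fnmatchcase(index_name, pattern)

daily_index = "mms18-2019.11.20"  # a name produced by "mms18-%{+YYYY.MM.dd}"
print(pattern_matches("mms18-*", daily_index))
print(pattern_matches("mms-*", daily_index))
print(json.dumps(geo_point_template("mms18-*"), indent=2))
```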

reindex while converting a string value of a specific field (present in old index) into a number field value (in the new index)

Could I ask, how could I reindex while converting a 'string' field e.g. "field2": "123.2" (in old index documents) into a float/double number e.g. "field2": 123.2 (intended to be in the new index) ? This post is the closest I could get, but I do not know which function to use for the cast/conversion of a string to a number. I am using ElasticSearch version 2.3.3. Thank you very much for any advice !!!
You could use Logstash to reindex your data and convert the field. Something like the following:
input {
  elasticsearch {
    hosts => "es.server.url"
    index => "old_index"
    query => "*"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}
filter {
  mutate {
    convert => { "field2" => "float" }
  }
}
output {
  elasticsearch {
    hosts => "es.server.url"
    index => "new_index"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}
Use Elasticsearch templates to specify the mapping for the new index and specify the field as a double type.
The easiest way to build a template is to use the existing mapping.
GET oldindex/_mapping
POST _template/templatename
{
  "template" : "newindex", // this can be a wildcard pattern to match indexes
  "mappings": { // this is copied from the response of the previous call
    "mytype": {
      "properties": {
        "field2": {
          "type": "double" // change the type
        }
      }
    }
  }
}
POST newindex
GET newindex/_mapping
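The "copy the existing mapping and change the type" step can be scripted. The sketch below assumes the 2.x response shape of GET <index>/_mapping ({index: {"mappings": {type: {...}}}}), uses hypothetical names throughout, and only rewrites top-level fields. It replaces the whole field definition rather than just its "type", so string-only options such as an analyzer don't carry over to the numeric field.

```python
import copy
import json

def template_from_mapping(mapping_response, source_index, pattern, field, new_type):
    """Build a pre-6.0 _template body from a GET <index>/_mapping response."""
    mappings = copy.deepcopy(mapping_response[source_index]["mappings"])
    for type_mapping in mappings.values():
        properties = type_mapping.get("properties", {})
        if field in properties:
            # Replace the whole definition so string-only options
            # (analyzer, index, ...) do not leak into the numeric field.
            properties[field] = {"type": new_type}
    return {"template": pattern, "mappings": mappings}

# Hypothetical GET oldindex/_mapping response, for illustration only
old_mapping = {
    "oldindex": {
        "mappings": {
            "mytype": {
                "properties": {
                    "field1": {"type": "string"},
                    "field2": {"type": "string", "index": "not_analyzed"},
                }
            }
        }
    }
}

body = template_from_mapping(old_mapping, "oldindex", "newindex", "field2", "double")
print(json.dumps(body, indent=2))
```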
Then use the Elasticsearch _reindex API to move the data from the old index to the new one and parse the field as a double with an inline script (you may need to enable inline scripting):
POST _reindex
{
  "source": {
    "index": "oldindex"
  },
  "dest": {
    "index": "newindex"
  },
  "script": {
    "inline": "ctx._source.field2 = ctx._source.field2.toDouble()"
  }
}
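The per-document transformation the script performs can be sketched in Python as follows (illustrative only; in a real _reindex the work happens inside Elasticsearch, and the function name is hypothetical). One design difference: this version lets documents without field2 pass through unchanged, whereas the one-line script above assumes every document has the field.

```python
def convert_source(source, field="field2"):
    """Return a copy of _source with `field` parsed as a float.

    Mirrors the intent of ctx._source.field2 = ctx._source.field2.toDouble();
    documents without the field (or with a non-string value) pass through.
    """
    converted = dict(source)
    value = converted.get(field)
    if isinstance(value, str):
        converted[field] = float(value)  # "123.2" becomes 123.2
    return converted

old_docs = [{"field1": "a", "field2": "123.2"}, {"field1": "b"}]
new_docs = [convert_source(doc) for doc in old_docs]
print(new_docs)
```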
Edit: Updated to use _reindex endpoint

Why doesn't elasticsearch allow changes to an indexed data?

I tried to convert some of the fields in previously indexed data from string to integer. But when I ran Logstash again, the fields didn't get converted (I only checked in Kibana). Why can't I make changes to already indexed data, and if I can't, how can I make the required changes to my index?
I've only been making changes in logstash. Here is a piece of logstash.conf:
input {
  file {
    type => "movie"
    path => "C:/TestLogs/Test5.txt"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => {"message" => "(?<Movie_Name>[\w.\-\']*)\s(?<Rating>[\d.]+)\s(?<No. Of Downloads>\d+)\s(?<No. of views>\d+)" }
  }
  mutate {
    convert => {"Rating" => "float"}
    convert => {"No. of Downloads" => "integer"}
    convert => {"No. of views" => "integer"}
  }
}
Elasticsearch uses Lucene at its core for indexing and storing data. Lucene writes data into read-only (immutable) segments, which is why it is not possible to change the structure of data that is already stored in Elasticsearch. You can update documents with new values, but you cannot change the structure of an entire index.
If you want to change the mappings, i.e. the data structure, you have to create a new index with the new mapping and store the data there.
This is of course not that easy if Elasticsearch is the master of the data: you have to create a new index with the new mapping, read the data from the old index and write it into the new one. You can do this using the scan and scroll approach.
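The scan and scroll copy is essentially a paging loop: fetch a page from the old index, transform it, bulk-write it to the new index, and stop when a page comes back empty. The Python sketch below keeps the Elasticsearch calls abstract: `first_page`, `next_page` and `write_batch` are hypothetical hooks standing in for the initial scroll search, the follow-up scroll requests and a bulk index request, and the demo uses in-memory stand-ins instead of a cluster.

```python
from typing import Callable, List, Optional, Tuple

# (scroll_id, hits) as returned by one scroll round trip
Page = Tuple[Optional[str], List[dict]]

def copy_with_scroll(
    first_page: Callable[[], Page],
    next_page: Callable[[str], Page],
    write_batch: Callable[[List[dict]], None],
    transform: Callable[[dict], dict] = lambda doc: doc,
) -> int:
    """Drain a scroll cursor page by page, writing transformed documents."""
    copied = 0
    scroll_id, hits = first_page()
    while hits:  # an empty page means the scroll is exhausted
        write_batch([transform(doc) for doc in hits])
        copied += len(hits)
        if scroll_id is None:
            break
        scroll_id, hits = next_page(scroll_id)
    return copied

# In-memory stand-ins for the old index, paged at most two documents at a time
old_pages = [[{"Rating": "7.5"}, {"Rating": "8.1"}], [{"Rating": "6.0"}], []]

def fake_first() -> Page:
    return "page-1", old_pages[0]

def fake_next(scroll_id: str) -> Page:
    i = int(scroll_id.split("-")[1])
    return f"page-{i + 1}", old_pages[i]

new_index: List[dict] = []
count = copy_with_scroll(
    fake_first, fake_next, new_index.extend,
    transform=lambda doc: {**doc, "Rating": float(doc["Rating"])},
)
print(count, new_index)
```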
If you want to make this transparent to the application reading from elasticsearch you can use an alias:
At first the index name is data_v1 and the alias is data:
data -> data_v1
Then you create a new index: data_v2 with the new mapping. Read all data from data_v1 and store it in data_v2. Having done this, change the alias to point to data_v2
data -> data_v2
To switch the alias, use the 'remove' and 'add' actions. Sending both actions in a single request makes the switch atomic, so the alias never points at no index in between:
POST /_aliases
{
  "actions": [
    { "remove": { "alias": "data", "index": "data_v1" } },
    { "add": { "alias": "data", "index": "data_v2" } }
  ]
}
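The point of the alias is that readers never name data_v1 or data_v2 directly. The toy Python model below (a hypothetical class, not an Elasticsearch client) illustrates that indirection and the all-or-nothing application of a list of remove/add actions. As a simplification, each alias points at exactly one index here, whereas real aliases can span several indices.

```python
class AliasRegistry:
    """Toy in-memory model of index aliases; not an Elasticsearch client."""

    def __init__(self):
        self.indices = {}  # index name -> list of documents
        self.aliases = {}  # alias name -> index name

    def put_index(self, name, docs):
        self.indices[name] = list(docs)

    def update_aliases(self, actions):
        # Work on a copy and swap it in at the end, so readers never observe
        # the state between "remove" and "add" (all-or-nothing).
        staged = dict(self.aliases)
        for action in actions:
            op, spec = next(iter(action.items()))
            alias, index = spec["alias"], spec["index"]
            if op == "remove":
                if staged.get(alias) != index:
                    raise KeyError(f"alias {alias} does not point at {index}")
                del staged[alias]
            elif op == "add":
                if index not in self.indices:
                    raise KeyError(f"no such index {index}")
                staged[alias] = index
            else:
                raise ValueError(f"unsupported action {op}")
        self.aliases = staged

    def search(self, alias):
        # Application code only ever uses the alias name
        return self.indices[self.aliases[alias]]


registry = AliasRegistry()
registry.put_index("data_v1", [{"Rating": "7.5"}])  # old mapping: string
registry.put_index("data_v2", [{"Rating": 7.5}])    # new mapping: float
registry.update_aliases([{"add": {"alias": "data", "index": "data_v1"}}])
before = registry.search("data")
registry.update_aliases([
    {"remove": {"alias": "data", "index": "data_v1"}},
    {"add": {"alias": "data", "index": "data_v2"}},
])
after = registry.search("data")
print(before, after)
```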

How to stop logstash from creating a default mapping in ElasticSearch

I am using logstash to feed logs into ElasticSearch.
I am configuring logstash output as:
input {
  file {
    path => "/tmp/foo.log"
    codec => plain {
      format => "%{message}"
    }
  }
}
output {
  elasticsearch {
    #host => localhost
    codec => json {}
    manage_template => false
    index => "4glogs"
  }
}
I notice that as soon as I start Logstash, it creates a mapping (logs) in ES as below.
{
  "4glogs": {
    "mappings": {
      "logs": {
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "@version": {
            "type": "string"
          },
          "message": {
            "type": "string"
          }
        }
      }
    }
  }
}
How can I prevent logstash from creating this mapping ?
UPDATE:
I have now resolved this error too. "object mapping for [logs] tried to parse as object, but got EOF, has a concrete value been provided to it?"
As John Petrone has stated below, once you define a mapping, you have to ensure that your documents conform to the mapping. In my case, I had defined a mapping of "type: nested" but the output from logstash was a string.
So I removed all codecs (whether json or plain) from my Logstash config, and that allowed the JSON document to pass through unchanged.
Here is my new Logstash config (with some additional filters for multiline logs).
input {
  kafka {
    zk_connect => "localhost:2181"
    group_id => "logstash_group"
    topic_id => "platform-logger"
    reset_beginning => false
    consumer_threads => 1
    queue_size => 2000
    consumer_id => "logstash-1"
    fetch_message_max_bytes => 1048576
  }
  file {
    path => "/tmp/foo.log"
  }
}
filter {
  multiline {
    pattern => "^\s"
    what => "previous"
  }
  multiline {
    pattern => "[0-9]+$"
    what => "previous"
  }
  multiline {
    pattern => "^$"
    what => "previous"
  }
  mutate {
    remove_field => ["kafka"]
    remove_field => ["@version"]
    remove_field => ["@timestamp"]
    remove_tag => ["multiline"]
  }
}
output {
  elasticsearch {
    manage_template => false
    index => "4glogs"
  }
}
You will need a mapping to store data in Elasticsearch and to search on it - that's how ES knows how to index and search those content types. You can either let logstash create it dynamically or you can prevent it from doing so and instead create it manually.
Keep in mind you cannot change existing mappings (although you can add to them). So first off you will need to delete the existing index. You would then modify your settings to prevent dynamic mapping creation. At the same time you will want to create your own mapping.
For example, this will create the mappings for the logstash data but also restrict any dynamic mapping creation via "strict":
$ curl -XPUT 'http://localhost:9200/4glogs/logs/_mapping' -d '
{
  "logs" : {
    "dynamic": "strict",
    "properties" : {
      "@timestamp": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "@version": {
        "type": "string"
      },
      "message": {
        "type": "string"
      }
    }
  }
}
'
Keep in mind that the index name "4glogs" and the type "logs" need to match what is coming from logstash.
For my production systems I generally prefer to turn off dynamic mapping as it avoids accidental mapping creation.
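Conceptually, "dynamic": "strict" makes Elasticsearch reject a document that introduces a field the mapping does not declare, instead of adding that field to the mapping. The toy Python check below models only that rule; the function and exception names are hypothetical, and it ignores arrays, multi-fields and value types, so it is not how Elasticsearch validates documents internally.

```python
class StrictMappingError(Exception):
    """Raised when a document contains a field the mapping does not declare."""

def check_strict(doc, properties, path=""):
    """Reject undeclared fields, roughly like "dynamic": "strict".

    Recurses into object fields that declare their own "properties".
    """
    for key, value in doc.items():
        full_name = path + key
        if key not in properties:
            raise StrictMappingError(f"field [{full_name}] is not in the strict mapping")
        sub_properties = properties[key].get("properties")
        if sub_properties is not None and isinstance(value, dict):
            check_strict(value, sub_properties, full_name + ".")

# The properties from the "logs" mapping created with curl
LOGS_PROPERTIES = {
    "@timestamp": {"type": "date", "format": "dateOptionalTime"},
    "@version": {"type": "string"},
    "message": {"type": "string"},
}

check_strict({"@timestamp": "2015-01-01T00:00:00Z", "message": "hello"}, LOGS_PROPERTIES)
try:
    check_strict({"message": "hello", "kafka": {"topic": "platform-logger"}}, LOGS_PROPERTIES)
except StrictMappingError as err:
    print(err)  # the undeclared "kafka" field is rejected
```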
The following links should be useful if you want to make adjustments to your dynamic mappings:
https://www.elastic.co/guide/en/elasticsearch/guide/current/dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/dynamic-mapping.html
logs in this case is the index_type. If you don't want to create it as logs, specify some other index_type on your elasticsearch element. Every record in elasticsearch is required to have an index and a type. Logstash defaults to logs if you haven't specified it.
There's always an implicit mapping created when you insert records into Elasticsearch, so you can't prevent it from being created. You can create the mapping yourself before you insert anything (via say a template mapping).
The setting manage_template of false just prevents it from creating the template mapping for the index you've specified. You can delete the existing template if it's already been created by using something like curl -XDELETE http://localhost:9200/_template/logstash?pretty
Index templates can help you. Please see this jira for more details. You can create index templates with wildcard support to match an index name and put your default mappings.
