Why doesn't elasticsearch allow changes to an indexed data? - elasticsearch

I tried to convert some of the fields in a previously indexed data from string to integer. But when i ran logstash again, the fields didn't get converted (checked in Kibana only). Why can't i make changes to an already indexed data and if not, how can i make the required changes to my index?
I've only been making changes in logstash. Here is a piece of logstash.conf:
input {
file {
type => "movie"
path => "C:/TestLogs/Test5.txt"
start_position => "beginning"
}
}
filter {
grok {
match => {"message" => "(?<Movie_Name>[\w.\-\']*)\s(?<Rating>[\d.]+)\s(?<No. Of Downloads>\d+)\s(?<No. of views>\d+)" }
}
mutate {
convert => {"Rating" => "float"}
convert => {"No. of Downloads" => "integer"}
convert => {"No. of views" => "integer"}
}
}

Elasticsearch is using Lucene at its core for indexing and storing data. Lucene uses a read-only datastructure to store data and that's the reason why it is not possible to change data structures for data that is already stored in elasticsearch. It is possible to update the documents with new values, but not to change the structure for an entire index.
If you want to change the mappings, i.e. the data structure then you have to create a new index with a new mapping and store it there.
This of course is not that easy if elasticsearch is the master of the data. To do this you have to create a new index with a new mapping and read data from the old index and put it into the new index. You can do this by using the Scan and Scroll approach.
If you want to make this transparent to the application reading from elasticsearch you can use an alias:
At first the index name is data_v1 and the alias is data:
data -> data_v1
Then you create a new index: data_v2 with the new mapping. Read all data from data_v1 and store it in data_v2. Having done this, change the alias to point to data_v2
data -> data_v2
To change aliases you can use the 'remove' and 'add' functions:
POST /_aliases
{
"actions": [
{ "remove": {
"alias": "items",
"index": "items_v1"
}}
]
}
POST /_aliases
{
"actions": [
{ "add": {
"alias": "items",
"index": "items_v2"
}}
]
}

Related

Logstash/Elasticsearch JDBC document_id vs document_type?

So im trying to wrap my head around the document_type vs document_id when using the JDBC importer from logstash and exporting to elasticsearch.
I finally wrapped my head around indexes. But lets pretend im pulling from a table of sensor data (like temp/humidity/etc...) that has sensor id's...temps/ humidity (weather related data) with time recorded. (So it's a big table)
And I want to keep polling the database every X so often.
What would document_type vs document_id be in this instance, this is going to be stored (or whatever you want to call it) against 1 index.
The document_type vs document_id confuses me, especially in regards to JDBC importer.
If I set document_id to say my primary key, won't it get over-written each time? So i'll just have 1 document of data each time? (which seems pointless)
The jdbc plugin will create a JSON document with one field for each column. So to keep consistent with your example, if you had that data it would be imported as a document that looks like this:
{
"sensor_id": 567,
"temp": 90,
"humidity": 6,
"timestamp": "{time}",
"#timestamp": "{time}" // auto-created field, the time Logstash received the document
}
You were right when you said that if you set document_id to your primary key, it would get overwritten. You can disregard document_id unless you want to update existing documents in Elasticsearch, which I don't imagine you would want to do with this type of data. Let Elasticsearch generate the document id for you.
Now let's talk about document_type. If you want to set the document type, you need to set the type field in Logstash to some value (which will propagate into Elasticsearch). So the type field in Elasticsearch is used to group similar documents. If all of the documents in your table that you're importing with the jdbc plugin are of the same type (they should be!), you can set type in the jdbc input like this...
input {
jdbc {
jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
jdbc_user => "mysql"
parameters => { "favorite_artist" => "Beethoven" }
schedule => "* * * * *"
statement => "SELECT * from songs where artist = :favorite_artist"
...
type => "weather"
}
}
Now, in Elasticsearch you can take advantage of the type field by setting a mapping for that type. For example you might want:
PUT my_index
{
"mappings": {
"weather": {
"_all": { "enabled": false },
"properties": {
"sensor_id": { "type": "integer" },
"temp": { "type": "integer" },
"humidity": { "type": "integer" },
"timestamp": { "type": "date" }
}
}
}
}
Hope this helps! :)

reindex while converting a string value of a specific field (present in old index) into a number field value (in the new index)

Could I ask, how could I reindex while converting a 'string' field e.g. "field2": "123.2" (in old index documents) into a float/double number e.g. "field2": 123.2 (intended to be in the new index) ? This post is the closest I could get, but I do not know which function to use for the cast/conversion of a string to a number. I am using ElasticSearch version 2.3.3. Thank you very much for any advice !!!
You could use Logstash to reindex your data and convert the field. Something like the following:
input {
elasticsearch {
hosts => "es.server.url"
index => "old_index"
query => "*"
size => 500
scroll => "5m"
docinfo => true
}
}
filter {
mutate {
convert => { "fieldname" => "long" }
}
}
output {
elasticsearch {
host => "es.server.url"
index => "new_index"
index_type => "%{[#metadata][_type]}"
document_id => "%{[#metadata][_id]}"
}
}
Use Elasticsearch templates to specify the mapping for the new index and specify the field as a double type.
The easiest way to build a template is to use the existing mapping.
GET oldindex/_mapping
POST _template/templatename
{
"template" : "newindex", // this can be a wildcard pattern to match indexes
"mappings": { // this is copied from the response of the previous call
"mytype": {
"properties": {
"field2": {
"type": "double" // change the type
}
}
}
}
}
POST newindex
GET newindex/_mapping
Then use the elasticsearch _reindex API to move the data from the old index to the new index and parse the field as a double using an inline scripting (you may need to enable inline scripting)
POST _reindex
{
"source": {
"index": "oldindex"
},
"dest": {
"index": "newindex"
},
"script": {
"inline": "ctx._source.field2 = ctx._source.field2.toDouble()"
}
}
Edit: Updated to use _reindex endpoint

No effect of “size” in ‘query’ while reindexing in elasticsearch

I have been using logstash to migrate a index to another. I have recently tried to reindex certain amount of data from large dataset in local environment. So I tried using following configuration for migration:
input{
elasticsearch{
hosts=>"localhost:9200"
index=>"old_indexindex"
query=>'{"query":{"match_all":{}},"size":10 }'
}
}filter{
mutate{
remove_field=>[
"#version",
"#timestamp"
]
}
}output{
elasticsearch{
hosts=>"localhost:9200"
index=>"new_index"
document_type=>"contact"
manage_template=>false
document_id=>"%{contactId}"
}
}
But this reindexes all the documents in old_index to new_index, where as , I was expecting just 10 documents to be reindexed in new_index.
Am I missing some concept using logstash with elasticsearch?
The elasticsearch input doesn't make a conventional search, but does a scan/scroll search type instead. This means that all data will be retrieved from the index and the role of the size parameter just serves to define how much data will be fetched during each scroll, not how much data will be fetched altogether.
Also, note that the size parameter in the query itself has no effect. You need to use the size parameter of the elasticsearch input and not specify it in the query.
input{
elasticsearch{
hosts=> "localhost:9200"
index=> "old_index"
query=> '*'
size => 10 <--- size goes here
}
}
That being said, if you're running ES 2.3 or later, there's a way to achieve what you desire using the Reindex API, like this:
POST /_reindex
{
"size": 10,
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}

Converting nginx access log bytes to number in Kibana4

I would like to create a visualization of the sum of bytes sent using the data from my nginx access logs. When trying to create a "Metric" visualization, I can't use the bytes field as a sum because it is a string type.
And I'm not able to change it under settings.
How do I go about changing this field type to a number/bytes type?
Here is my logstash config for nginx access logs
filter {
if [type] == "nginx-access" {
grok {
match => { "message" => "%{NGINXACCESS}" }
}
geoip {
source => "clientip"
}
useragent {
source => "agent"
target => "useragent"
}
}
}
Since each logstash index is being created as an index, I'm guess I need to change it here.
I tried adding
mutate {
convert => { "bytes" => "integer" }
}
But it doesn't seem to make a difference.
Field types are configured using mappings, which is configured at the index level and can hardly change. With Logstash, as a new index is created everyday, so if you wan't to change these mappings either wait for the next day or delete the current index if you can.
By default these mappings are generated automatically by Elasticsearch depending on the syntax of the indexed JSON document and the applied Index Templates:
# Type String
{"bytes":"123"}
# Type Integer
{"bytes":123}
In the end there are 2 solutions:
Tune Logstash, to make it generate an integer and let Elasticsearch guess the field type → Use the mutate/convert filter
Tune Elasticsearch, to force the field bytes for the document type nginx-access to be of type integer → Use Index Template:
Index Template API:
PUT _template/logstash-nginx-access
{
"order": 1,
"template": "logstash-*",
"mappings": {
"nginx-access": {
"properties": {
"bytes": {
"type": "integer"
}
}
}
}
}

How to stop logstash from creating a default mapping in ElasticSearch

I am using logstash to feed logs into ElasticSearch.
I am configuring logstash output as:
input {
file {
path => "/tmp/foo.log"
codec =>
plain {
format => "%{message}"
}
}
}
output {
elasticsearch {
#host => localhost
codec => json {}
manage_template => false
index => "4glogs"
}
}
I notice that as soon as I start logstash it creates a mapping ( logs ) in ES as below.
{
"4glogs": {
"mappings": {
"logs": {
"properties": {
"#timestamp": {
"type": "date",
"format": "dateOptionalTime"
},
"#version": {
"type": "string"
},
"message": {
"type": "string"
}
}
}
}
}
}
How can I prevent logstash from creating this mapping ?
UPDATE:
I have now resolved this error too. "object mapping for [logs] tried to parse as object, but got EOF, has a concrete value been provided to it?"
As John Petrone has stated below, once you define a mapping, you have to ensure that your documents conform to the mapping. In my case, I had defined a mapping of "type: nested" but the output from logstash was a string.
So I removed all codecs ( whether json or plain ) from my logstash config and that allowed the json document to pass through without changes.
Here is my new logstash config ( with some additional filters for multiline logs ).
input {
kafka {
zk_connect => "localhost:2181"
group_id => "logstash_group"
topic_id => "platform-logger"
reset_beginning => false
consumer_threads => 1
queue_size => 2000
consumer_id => "logstash-1"
fetch_message_max_bytes => 1048576
}
file {
path => "/tmp/foo.log"
}
}
filter {
multiline {
pattern => "^\s"
what => "previous"
}
multiline {
pattern => "[0-9]+$"
what => "previous"
}
multiline {
pattern => "^$"
what => "previous"
}
mutate{
remove_field => ["kafka"]
remove_field => ["#version"]
remove_field => ["#timestamp"]
remove_tag => ["multiline"]
}
}
output {
elasticsearch {
manage_template => false
index => "4glogs"
}
}
You will need a mapping to store data in Elasticsearch and to search on it - that's how ES knows how to index and search those content types. You can either let logstash create it dynamically or you can prevent it from doing so and instead create it manually.
Keep in mind you cannot change existing mappings (although you can add to them). So first off you will need to delete the existing index. You would then modify your settings to prevent dynamic mapping creation. At the same time you will want to create your own mapping.
For example, this will create the mappings for the logstash data but also restrict any dynamic mapping creation via "strict":
$ curl -XPUT 'http://localhost:9200/4glogs/logs/_mapping' -d '
{
"logs" : {
"dynamic": "strict",
"properties" : {
"#timestamp": {
"type": "date",
"format": "dateOptionalTime"
},
"#version": {
"type": "string"
},
"message": {
"type": "string"
}
}
}
}
'
Keep in mind that the index name "4glogs" and the type "logs" need to match what is coming from logstash.
For my production systems I generally prefer to turn off dynamic mapping as it avoids accidental mapping creation.
The following links should be useful if you want to make adjustments to your dynamic mappings:
https://www.elastic.co/guide/en/elasticsearch/guide/current/dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/dynamic-mapping.html
logs in this case is the index_type. If you don't want to create it as logs, specify some other index_type on your elasticsearch element. Every record in elasticsearch is required to have an index and a type. Logstash defaults to logs if you haven't specified it.
There's always an implicit mapping created when you insert records into Elasticsearch, so you can't prevent it from being created. You can create the mapping yourself before you insert anything (via say a template mapping).
The setting manage_template of false just prevents it from creating the template mapping for the index you've specified. You can delete the existing template if it's already been created by using something like curl -XDELETE http://localhost:9200/_template/logstash?pretty
Index templates can help you. Please see this jira for more details. You can create index templates with wildcard support to match an index name and put your default mappings.

Resources