Convert existing field mapping to geoip - elasticsearch

I have already parsed a log file using Logstash and indexed it into Elasticsearch. I have a field called IP that is currently mapped as a string. I want to convert the existing mapping in Elasticsearch to geoip without running Logstash again. I have a few million records in Elasticsearch with this field, and I want to convert the mapping of IP from string to geoip in all of them.

I'm afraid you still have to use Logstash for this, because geoip is a Logstash filter and Elasticsearch doesn't have access to the GeoIP database by itself.
Fear not, though: you won't need to re-run Logstash on the raw log lines. You can simply re-index your ES documents using an elasticsearch input plugin and an elasticsearch output plugin, with the geoip filter in between to transform the IP field into a geoip one.
Since you can't change the mapping of your current IP field from string to geo_point, we need to make sure your index is ready to ingest GeoIP data. First, check with the following command whether your index mapping already contains a geoip field (which would have been created by Logstash using its predefined standard logstash-* template):
curl -XGET localhost:9200/logstash-xyz/_mapping
If you see a geoip field in the output of the above command, then you're good to go. Otherwise, we first need to create the geoip field with the type geo_point:
curl -XPUT localhost:9200/logstash-xyz/_mapping/your_type -d '{
  "your_type": {
    "properties": {
      "geoip": {
        "type": "object",
        "dynamic": true,
        "properties": {
          "ip": {
            "type": "ip",
            "doc_values": true
          },
          "location": {
            "type": "geo_point",
            "doc_values": true
          },
          "latitude": {
            "type": "float",
            "doc_values": true
          },
          "longitude": {
            "type": "float",
            "doc_values": true
          }
        }
      }
    }
  }
}'
Now your mapping is ready to receive GeoIP data. So, next we create a Logstash configuration file called geoip.conf that looks like this:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "logstash-xyz"
  }
}
filter {
  mutate {
    remove_field => [ "@version", "@timestamp" ]
  }
  geoip {
    # the field containing the IP string
    source => "IP"
  }
}
output {
  elasticsearch {
    host => "localhost"
    port => 9200
    protocol => "http"
    manage_template => false
    index => "logstash-xyz"
    document_id => "%{id}"
    workers => 1
  }
}
After setting the correct values (host and index), you can run this with bin/logstash -f geoip.conf. Once it has finished, your documents should contain a new field called geoip with the GeoIP information.
Going forward, I suggest you add the geoip filter directly to your normal Logstash configuration.
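The mapping check from the first step can also be scripted instead of read by eye. The following is a minimal sketch, not part of the original answer: it assumes the pre-6.x response shape of GET /<index>/_mapping (index, then "mappings", then type, then "properties"), and the function name and sample response are hypothetical.

```python
def has_geoip_geo_point(mapping_response, index_name):
    """Return True if any type in the index maps geoip.location as geo_point.

    Assumed response shape (pre-6.x, with mapping types):
    {index: {"mappings": {type: {"properties": {...}}}}}
    """
    mappings = mapping_response.get(index_name, {}).get("mappings", {})
    for type_mapping in mappings.values():
        geoip = type_mapping.get("properties", {}).get("geoip", {})
        location = geoip.get("properties", {}).get("location", {})
        if location.get("type") == "geo_point":
            return True
    return False


# Hypothetical response, trimmed to the relevant part.
sample = {
    "logstash-xyz": {
        "mappings": {
            "your_type": {
                "properties": {
                    "IP": {"type": "string"},
                    "geoip": {
                        "properties": {"location": {"type": "geo_point"}}
                    },
                }
            }
        }
    }
}

print(has_geoip_geo_point(sample, "logstash-xyz"))
```

If the function returns False for the real response, the PUT mapping step above is needed before re-indexing.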

Related

How to transfer data to Elasticsearch via Logstash using an analyzer?

I have the Logstash config file below. Elasticsearch is reading my data as "a b", whereas I want it to read it as "ab". I found that I need to use not_analyzed for my sscat field, and max_shingle_size and min_shingle_size for products, to get the best result.
Should I use not_analyzed for the products field as well? Will that give a better result?
How should I fill my my_id_analyzer to actually use the analyzer on different fields?
How should I connect the template with logstash config file?
input {
  file {
    path => "path"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["Index", "Category", "Scat", "Sscat", "Products", "Measure", "Price", "Description", "Gst"]
  }
  mutate { convert => ["Index", "float"] }
  mutate { convert => ["Price", "float"] }
  mutate { convert => ["Gst", "float"] }
}
output {
  elasticsearch {
    hosts => "host"
    user => "elastic"
    password => "pass"
    index => "masterdb"
  }
}
I also have a template that can do it for all the future files that I upload:
curl -u user:pass -XPUT "host/_template/logstash-id" -d '{
  "template": "logstash-*",
  "settings": {
    "analysis": {
      "analyzer": {
        "my_id_analyzer": {
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": { "type": "string", "analyzer": "my_id_analyzer" }
    }
  }
}'
You can use not_analyzed when creating the mapping so that the text doesn't get analyzed, together with ignore_above to skip values longer than a maximum length.
Alternatively, in Elasticsearch 5.x and later, declare the field type as keyword instead of text.
As for connecting the template to Logstash: why do you need this? Once the template exists in Elasticsearch, any new index whose name matches the template pattern follows its definition, and you can start indexing.
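To see why the choice matters, here is a toy model of the two behaviours. It is only an illustration, not how Elasticsearch is implemented: the whitespace split is a crude stand-in for a real analyzer, and both function names are hypothetical.

```python
def analyzed_terms(value):
    # Crude stand-in for an analyzed string field:
    # lowercase the value and split it on whitespace.
    return value.lower().split()


def not_analyzed_terms(value, ignore_above=None):
    # not_analyzed (or keyword) keeps the whole value as one term.
    # With ignore_above set, longer values are not indexed at all.
    if ignore_above is not None and len(value) > ignore_above:
        return []
    return [value]


print(analyzed_terms("a b"))
print(not_analyzed_terms("a b"))
print(not_analyzed_terms("a b", ignore_above=2))
```

An exact-match query for the full value only finds a document when the full value was indexed as a single term, which is the not_analyzed case.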

How to parse date in elasticsearch 5.x and Filebeat

I am using elasticsearch 5.x and Filebeat and want to know if there is a way of parsing date(timestamp) directly in filebeat (don't want to use logstash). I am using json.keys_under_root: true and it works great, but the problem is that timestamp (on us) is recognised as string. All of the other fields were automatically recognised as correct types only this one isn't.
How can I map it as date?
You can use Filebeat with the ES Ingest Node feature to parse your timestamp field and apply the value to the @timestamp field.
You would set up a simple pipeline in Elasticsearch that applies a date to incoming events. The date processor requires a formats list; ISO8601 below is only a placeholder, so replace it with the layout your timestamp actually uses.
PUT _ingest/pipeline/my-pipeline
{
  "description" : "parse timestamp and update @timestamp",
  "processors" : [
    {
      "date" : {
        "field" : "timestamp",
        "target_field" : "@timestamp",
        "formats" : ["ISO8601"]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "error.message",
        "value": "{{ _ingest.on_failure_message }}"
      }
    }
  ]
}
Then, in Filebeat, configure the elasticsearch output to push data to your new pipeline:
output.elasticsearch:
  hosts: ["http://localhost:9200"]
  pipeline: my-pipeline
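To make the pipeline's effect on a single event concrete, the sketch below mimics it locally in Python. It is an approximation under stated assumptions: the timestamp is assumed to be ISO 8601 (the real layout is not given in the question), apply_pipeline is a hypothetical name, and Python's error text will differ from the message Elasticsearch stores.

```python
from datetime import datetime


def apply_pipeline(doc):
    """Mimic the pipeline: parse `timestamp` into `@timestamp`, then drop
    `timestamp`. On failure, keep the document and record the error."""
    result = dict(doc)
    try:
        # Assumption: ISO 8601 input, e.g. 2017-03-01T10:15:30.123456+00:00
        parsed = datetime.fromisoformat(result["timestamp"])
        result["@timestamp"] = parsed.isoformat()
        del result["timestamp"]
    except (KeyError, ValueError) as exc:
        # Counterpart of the on_failure handler that sets error.message.
        result["error"] = {"message": str(exc)}
    return result


ok = apply_pipeline(
    {"timestamp": "2017-03-01T10:15:30.123456+00:00", "msg": "hello"}
)
bad = apply_pipeline({"timestamp": "not a date", "msg": "hello"})
print(ok)
print(bad)
```

Documents that fail to parse are still indexed, so searching for the error.message field is a quick way to find events whose timestamp did not match the configured format.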

Elasticsearch - change field from not_analyzed to analyzed

Is it possible to change an existing field from not_analyzed to analyzed?
If not, what can I do in order to keep all my documents in the store?
I cannot delete the mappings (because then all documents will be gone) and I need that old field to be analyzed.
You cannot modify an existing field, however, you can either create another field or add a sub-field to your not_analyzed field.
I'm going for the latter solution. So first, add a new sub-field to your existing field, like this:
curl -XPUT localhost:9200/index/_mapping/type -d '{
  "properties": {
    "your_field": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "sub": {
          "type": "string"
        }
      }
    }
  }
}'
Above, we've added the sub-field called your_field.sub (which is analyzed) to the existing your_field (which is not_analyzed).
Next, we'll need to populate that new sub-field. If you're running ES 2.3 or later, you can use the Update By Query API, which re-indexes every document in place so that it picks up the new mapping (the Reindex API cannot be used here because it refuses to write into the index it is reading from):
curl -XPOST 'localhost:9200/index/_update_by_query?conflicts=proceed'
Otherwise, you can simply use the following Logstash configuration, which will re-index your data in order to populate the new sub-field:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "index"
    docinfo => true
  }
}
filter {
  mutate {
    remove_field => [ "@version", "@timestamp" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => false
    index => "%{[@metadata][_index]}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}
You can also use what is known as multi-field mapping, which lets you keep more than one mapping for a single field and query against whichever one fits:
https://www.elastic.co/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html

datetime parse in ELK

I am trying to parse a log using the ELK stack. The following is a sample log line:
2015-12-11 12:05:24+0530 [process] INFO: process 0.24.5 started
I am using the following grok filter:
grok {
  match => {"message" => "(?m)%{TIMESTAMP_ISO8601:processdate}\s+\[%{WORD:name}\]\s+%{LOGLEVEL:loglevel}"}
}
and my Elasticsearch mapping is:
{
  "properties": {
    "processdate": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss+SSSS"
    },
    "name": {"type": "string"},
    "loglevel": {"type": "string"}
  }
}
But while loading into Elasticsearch, I am getting the error below:
"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [processdate]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2015-12-11 12:05:39+0530\" is malformed at \" 12:05:39+0530\""}}}}, :level=>:warn}
How do I change it to a proper date format? I have already set what I believe is the correct date format in Elasticsearch.
Update: localhost:9200/log
{"log":{"aliases":{},"mappings":{"filelog":{"properties":{"processdate":{"type":"date","format":"yyyy-MM-dd' 'HH:mm:ssZ"},"loglevel":{"type":"string"},"name":{"type":"string"}}}},"settings":{"index":{"creation_date":"1458218007417","number_of_shards":"5","number_of_replicas":"1","uuid":"_7ffuioZS7eGBbFCDMk7cw","version":{"created":"2020099"}}},"warmers":{}}}
The error you're getting means that your date format is wrong. Fix your date format like this, i.e. use Z (timezone) at the end instead of +SSSS (fraction of seconds):
{
  "properties": {
    "processdate": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ssZ"
    },
    "name": {"type": "string"},
    "loglevel": {"type": "string"}
  }
}
Also, as discussed earlier, your elasticsearch output plugin is missing the document_type setting. Configure it like this in order to use your custom filelog mapping type (otherwise the default logs type is used and your custom mapping never kicks in):
output {
  elasticsearch {
    hosts => ["172.16.2.204:9200"]
    index => "log"
    document_type => "filelog"
  }
}
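As a quick sanity check outside Elasticsearch, the same distinction can be reproduced in Python. This is only an analogy: Python's strptime directives are not Joda patterns, but %z plays the same role as Joda's Z in that it consumes a numeric UTC offset such as +0530.

```python
from datetime import datetime, timedelta

# The sample timestamp from the log line in the question.
raw = "2015-12-11 12:05:24+0530"

# %z consumes the trailing +0530 as a UTC offset, just as Z does in
# the corrected mapping format.
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S%z")

print(parsed.isoformat())
print(parsed.utcoffset() == timedelta(hours=5, minutes=30))
```

The trailing +0530 is an offset from UTC, not a fraction of a second, which is why the pattern has to end with a timezone token.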

upload csv with logstash to elasticsearch with new mappings

I have a CSV file which I'm trying to upload to ES using Logstash. My conf file is as follows:
input {
  file {
    path => ["filename"]
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["name1", "name2", "name3", ...]
    separator => ","
  }
}
filter {
  mutate {
    remove_field => ["name31", "name32", "name33"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "newindex"
    template_overwrite => true
    document_type => "newdoc"
    template => "template.json"
  }
}
My template file looks like the following:
{
  "mappings": {
    "newdoc": {
      "properties": {
        "name1": {
          "type": "integer"
        },
        "name2": {
          "type": "float"
        },
        "name3": {
          "format": "dateOptionalTime",
          "type": "date"
        },
        "name4": {
          "index": "not_analyzed",
          "type": "string"
        },
        ....
      }
    }
  },
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "template": "newindex"
}
When I try to overwrite the default mapping, I get a 400 error, even when I only try to write one line:
failed action with response of 400, dropping action: ["index", + ...
What can be the problem? Everything works fine if I don't overwrite the mapping but that is not a solution for me. I'm using Logstash 1.5.1 and Elasticsearch 1.5.0 on Red Hat.
Thanks
You should send your mapping to Elasticsearch before loading any data into it.
You don't strictly need to create the index before running Logstash, since Logstash creates it if it doesn't exist yet, but it's better to create your own mapping before running your conf file. That gives you more control over your field types, etc. Here is a simple tutorial on how to import CSV into Elasticsearch using Logstash: http://freefilesdl.com/how-to-connect-logstash-to-elasticsearch-output
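One way to create the mapping up front is to create the index together with its mapping in a single request. The sketch below only builds that request and does not send it. It assumes an Elasticsearch 1.x-style body with a mapping type, a node at http://localhost:9200, and a hypothetical helper name; the index, type, and field names are taken from the question.

```python
import json
from urllib.request import Request


def build_create_index_request(base_url, index, doc_type, properties):
    """Build (but do not send) a PUT /<index> request that creates the
    index with an explicit mapping for one document type."""
    body = {"mappings": {doc_type: {"properties": properties}}}
    return Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request(
    "http://localhost:9200",  # assumption: local node
    "newindex",
    "newdoc",
    {
        "name1": {"type": "integer"},
        "name2": {"type": "float"},
        "name4": {"type": "string", "index": "not_analyzed"},
    },
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; it is left out here so the
# sketch has no side effects.
```

With the index created this way, the template and template_overwrite settings in the Logstash output are no longer what determines the field types.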
