How to parse date in elasticsearch 5.x and Filebeat - elasticsearch

I am using Elasticsearch 5.x and Filebeat and want to know if there is a way of parsing the date (timestamp) directly in Filebeat (I don't want to use Logstash). I am using json.keys_under_root: true and it works great, but the problem is that the timestamp (in µs) is recognised as a string. All of the other fields were automatically recognised as the correct types; only this one isn't.
How can I map it as a date?

You can use Filebeat with the Elasticsearch Ingest Node feature to parse your timestamp field and apply the value to the @timestamp field.
You would set up a simple pipeline in Elasticsearch that applies a date processor to incoming events.
PUT _ingest/pipeline/my-pipeline
{
  "description" : "parse timestamp and update @timestamp",
  "processors" : [
    {
      "date" : {
        "field" : "timestamp",
        "target_field" : "@timestamp"
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "error.message",
        "value": "{{ _ingest.on_failure_message }}"
      }
    }
  ]
}
Then in Filebeat configure the elasticsearch output to push data to your new pipeline.
output.elasticsearch:
  hosts: ["http://localhost:9200"]
  pipeline: my-pipeline
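One thing to note: depending on your Elasticsearch version, the date processor may also require a formats array describing the layout of your timestamp field, so add one if the pipeline creation complains. You can also dry-run the pipeline before wiring up Filebeat by using the simulate API; a minimal sketch (the sample timestamp value is just a placeholder):
POST _ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "timestamp": "2017-01-01T12:00:00.000Z"
      }
    }
  ]
}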

Related

Combine two indices into a third index in Elasticsearch using Logstash

I have two indices:
employee_data
{ "code": 1, "name": "xyz", "city": "Mumbai" }
transaction_data
{ "code": 1, "Month": "June", "payment": 78000 }
I want a third index like this:
join_index
{ "code": 1, "name": "xyz", "city": "Mumbai", "Month": "June", "payment": 78000 }
How is this possible? I am trying this in Logstash:
input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data,transaction_data"
    query => '{ "query": { "match": { "code": 1 } } }'
    scroll => "5m"
    docinfo => true
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "join1"
  }
}
You can use the elasticsearch input on employees_data.
In your filters, use the elasticsearch filter on transaction_data:
input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data"
    query => '{ "query": { "match_all": { } } }'
    sort => "code:desc"
    scroll => "5m"
    docinfo => true
  }
}
filter {
  elasticsearch {
    hosts => "localhost"
    index => "transaction_data"
    query => "code:\"%{[code]}\""
    fields => {
      "Month" => "Month",
      "payment" => "payment"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "join1"
  }
}
And send your new documents to your third index with the elasticsearch output.
You'll have three Elasticsearch connections and the result can be a little slow, but it works.
You don't need Logstash to do this, Elasticsearch itself supports that by leveraging the enrich processor.
First, you need to create an enrich policy (use the smallest index; let's say it's employees_data):
PUT /_enrich/policy/employee-policy
{
  "match": {
    "indices": "employees_data",
    "match_field": "code",
    "enrich_fields": ["name", "city"]
  }
}
Then you can execute that policy in order to create an enrichment index
POST /_enrich/policy/employee-policy/_execute
When the enrichment index has been created and populated, the next step requires you to create an ingest pipeline that uses the above enrich policy/index:
PUT /_ingest/pipeline/employee_lookup
{
  "description" : "Enriching transactions with employee data",
  "processors" : [
    {
      "enrich" : {
        "policy_name": "employee-policy",
        "field" : "code",
        "target_field": "tmp",
        "max_matches": "1"
      }
    },
    {
      "script": {
        "if": "ctx.tmp != null",
        "source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
      }
    }
  ]
}
Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:
POST _reindex
{
  "source": {
    "index": "transaction_data"
  },
  "dest": {
    "index": "join1",
    "pipeline": "employee_lookup"
  }
}
After running this, the join1 index will contain exactly what you need, for instance:
{
  "_index" : "join1",
  "_type" : "_doc",
  "_id" : "0uA8dXMBU9tMsBeoajlw",
  "_score" : 1.0,
  "_source" : {
    "code": 1,
    "name": "xyz",
    "city": "Mumbai",
    "Month": "June",
    "payment": 78000
  }
}
As far as I know, this cannot be done using the Elasticsearch APIs alone. To handle it, you need to set a unique ID for the documents that are related; for example, the code field you mentioned in your question would be a good ID. You can then reindex the first index into the third one and use the Update API to merge in documents read from the second index, updating them by their IDs in the third index. I hope this helps.
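For what it's worth, here is a minimal sketch of that approach, using the index and field names from the question (the single _update call shown at the end would have to be repeated, e.g. via the Bulk API, for every document read from transaction_data):
POST _reindex
{
  "source": { "index": "employee_data" },
  "dest": { "index": "join_index" },
  "script": { "source": "ctx._id = ctx._source.code" }
}

POST join_index/_update/1
{
  "doc": { "Month": "June", "payment": 78000 }
}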

ELASTICSEARCH - Include date automatically without a predefined date field

Is it possible to include a "date and time" field in a document that Elasticsearch receives, without it being previously defined?
The date and time should correspond to the moment the JSON is received by Elasticsearch.
This is the mapping:
{
  "mappings": {
    "properties": {
      "entries": { "type": "nested" }
    }
  }
}
Can something be defined in the mapping so that Elasticsearch includes the current date automatically?
What you can do is define an ingest pipeline that automatically adds a date field when your documents are indexed.
First, create a pipeline like this (_ingest.timestamp is a built-in field that you can access):
PUT _ingest/pipeline/add-current-time
{
  "description" : "automatically add the current time to the documents",
  "processors" : [
    {
      "set" : {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
Then when you index a new document, you need to reference the pipeline, like this:
PUT test-index/_doc/1?pipeline=add-current-time
{
  "my_field": "test"
}
After indexing, the document would look like this:
GET test-index/_doc/1
=>
{
  "@timestamp": "2020-08-12T15:48:00.000Z",
  "my_field": "test"
}
UPDATE:
Since you're using index templates, it's even easier because you can define a default pipeline to be run for each indexed document.
In your index templates, you need to add this to the index settings:
{
  "order": 1,
  "index_patterns": [
    "attom"
  ],
  "aliases": {},
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1",
      "default_pipeline": "add-current-time" <--- add this
    }
  },
  ...
Then you can keep indexing documents without referencing the pipeline; it will be applied automatically.
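If you are not using index templates, the same setting can also be applied when creating the index; a minimal sketch (the index name my-index is just a placeholder):
PUT my-index
{
  "settings": {
    "index.default_pipeline": "add-current-time"
  }
}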
"value": "{{{_ingest.timestamp}}}"
Source

datetime parse in ELK

I am trying to parse a log using the ELK stack. The following is my sample log:
2015-12-11 12:05:24+0530 [process] INFO: process 0.24.5 started
I am using the following grok pattern:
grok {
  match => { "message" => "(?m)%{TIMESTAMP_ISO8601:processdate}\s+\[%{WORD:name}\]\s+%{LOGLEVEL:loglevel}" }
}
and my Elasticsearch mapping is:
{
  "properties": {
    "processdate": {
      "type": "date",
      "format" : "yyyy-MM-dd HH:mm:ss+SSSS"
    },
    "name": { "type" : "string" },
    "loglevel": { "type" : "string" }
  }
}
But while loading into Elasticsearch I am getting the error below:
"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [processdate]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2015-12-11 12:05:39+0530\" is malformed at \" 12:05:39+0530\""}}}}, :level=>:warn}
How do I modify this to a proper date format? I have added the proper date format in Elasticsearch.
Update: here is the output of localhost:9200/log
{
  "log": {
    "aliases": {},
    "mappings": { "filelog": { "properties": {
      "processdate": { "type": "date", "format": "yyyy-MM-dd' 'HH:mm:ssZ" },
      "loglevel": { "type": "string" },
      "name": { "type": "string" }
    }}},
    "settings": { "index": {
      "creation_date": "1458218007417",
      "number_of_shards": "5",
      "number_of_replicas": "1",
      "uuid": "_7ffuioZS7eGBbFCDMk7cw",
      "version": { "created": "2020099" }
    }},
    "warmers": {}
  }
}
The error you're getting means that your date format is wrong. Fix your date format like this, i.e. use Z (timezone) at the end instead of +SSSS (fraction of seconds):
{
  "properties": {
    "processdate": {
      "type": "date",
      "format" : "yyyy-MM-dd HH:mm:ssZ"
    },
    "name": { "type" : "string" },
    "loglevel": { "type" : "string" }
  }
}
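If you want to sanity-check the corrected format before reloading your data, you can try it against a scratch index first; a rough sketch (the index name log_test is hypothetical, the type and field mirror your mapping):
PUT log_test
{
  "mappings": {
    "filelog": {
      "properties": {
        "processdate": { "type": "date", "format": "yyyy-MM-dd HH:mm:ssZ" }
      }
    }
  }
}

PUT log_test/filelog/1
{ "processdate": "2015-12-11 12:05:24+0530" }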
Also, according to our earlier exchange, your elasticsearch output plugin is missing the document_type setting and should be configured like this instead in order to make use of your custom filelog mapping type (otherwise the default logs type is being used and your custom mapping type is not kicking in):
output {
  elasticsearch {
    hosts => ["172.16.2.204:9200"]
    index => "log"
    document_type => "filelog"
  }
}

Dump from MySQL to Elasticsearch

We are using Logstash to dump data from MySQL to Elasticsearch. I am trying to dump the list of all payments against a userId (this will be my _id for the _type).
The Elasticsearch mapping looks like this:
{
  "Users": {
    "properties": {
      "userId": {
        "type": "long"
      },
      "payment": {
        "properties": {
          "paymentId": {
            "type": "long"
          }
        }
      }
    }
  }
}
The SQL table has userId and paymentId columns.
Which filter should I use to get the JSON output that I can feed to Elasticsearch?
Use the jdbc Logstash input plugin.
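To expand on that a little, here is a rough sketch of such a pipeline, assuming a MySQL table named payments with userId and paymentId columns (driver path, connection string, credentials and index name are all placeholders you would need to adapt):
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    statement => "SELECT userId, paymentId FROM payments"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "users"
    # the jdbc input lowercases column names by default, hence userid
    document_id => "%{userid}"
  }
}
Note that grouping several payments per user into the nested payment object from your mapping would additionally require something like Logstash's aggregate filter.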

Convert existing field mapping to geoip

I have already parsed a log file using Logstash and put it into Elasticsearch. I have a field called IP, and it is currently mapped as a string. I want to convert the existing mapping in Elasticsearch to geoip without running Logstash again. I have a few million records in Elasticsearch with this field, and I want to convert the mapping of IP from string to geoip in all of them.
I'm afraid you still have to use Logstash for this because geoip is a Logstash filter and Elasticsearch doesn't have access to the GeoIP database by itself.
Fear not, though: you won't need to re-run Logstash on the raw log lines. You can simply re-index your ES documents using an elasticsearch input plugin and an elasticsearch output plugin, with the geoip filter tacked in between to transform the IP field into the geoip one.
Since you can't modify the mapping of your current IP field from string to geo_point, we need to make sure your index is ready to ingest GeoIP data. First, check with the following command whether your index already contains a geoip field in its mapping (it would have been created by Logstash using its predefined standard logstash-* template).
curl -XGET localhost:9200/logstash-xyz/_mapping
If you see a geoip field in the output of the above command, then you're good to go. Otherwise, we first need to create the geoip field with the type geo_point:
curl -XPUT localhost:9200/logstash-xyz/_mapping/your_type -d '{
  "your_type": {
    "properties": {
      "geoip": {
        "type": "object",
        "dynamic": true,
        "properties": {
          "ip": {
            "type": "ip",
            "doc_values": true
          },
          "location": {
            "type": "geo_point",
            "doc_values": true
          },
          "latitude": {
            "type": "float",
            "doc_values": true
          },
          "longitude": {
            "type": "float",
            "doc_values": true
          }
        }
      }
    }
  }
}'
Now your mapping is ready to receive GeoIP data. So, next we create a Logstash configuration file called geoip.conf that looks like this:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "logstash-xyz"
  }
}
filter {
  mutate {
    remove_field => [ "@version", "@timestamp" ]
  }
  geoip {
    source => "IP" <--- the field containing the IP string
  }
}
output {
  elasticsearch {
    host => "localhost"
    port => 9200
    protocol => "http"
    manage_template => false
    index => "logstash-xyz"
    document_id => "%{id}"
    workers => 1
  }
}
And then after setting the correct values (host + index), you can run this with bin/logstash -f geoip.conf. After running this, your documents should contain a new field called geoip with the GeoIP information.
Going forward, I suggest you add the geoip filter directly to your normal Logstash configuration.
