I have the following CSV file
tstp,voltage_A_real,voltage_B_real,voltage_C_real #header not present in actual file
2000-01-01 00:00:00,2535.53,-1065.7,-575.754
2000-01-01 01:00:00,2528.31,-1068.67,-576.866
2000-01-01 02:00:00,2528.76,-1068.49,-576.796
2000-01-01 03:00:00,2530.12,-1067.93,-576.586
2000-01-01 04:00:00,2531.02,-1067.56,-576.446
2000-01-01 05:00:00,2533.28,-1066.63,-576.099
2000-01-01 06:00:00,2535.53,-1065.7,-575.754
2000-01-01 07:00:00,2535.53,-1065.7,-575.754
....
I am trying to insert the data into Elasticsearch through Logstash, and I have the following Logstash config:
input {
  file {
    path => "path_to_csv_file"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => [
      "tstp",
      "Voltage_A_real",
      "Voltage_B_real",
      "Voltage_C_real"
    ]
    separator => ","
  }
  date {
    match => [ "tstp", "yyyy-MM-dd HH:mm:ss" ]
  }
  mutate {
    convert => ["Voltage_A_real", "float"]
    convert => ["Voltage_B_real", "float"]
    convert => ["Voltage_C_real", "float"]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
    action => "index"
    index => "temp_load_index"
  }
}
My output from rubydebug when I run logstash -f conf_file -v is
{
    "message" => "2000-02-18 16:00:00,2532.38,-1067,-576.238",
    "@version" => "1",
    "@timestamp" => "2000-02-18T21:00:00.000Z",
    "path" => "path_to_csv",
    "host" => "myhost",
    "tstp" => "2000-02-18 16:00:00",
    "Voltage_A_real" => 2532.38,
    "Voltage_B_real" => -1067.0,
    "Voltage_C_real" => -576.238
}
However, I see only 2 events in Kibana when I look at the dashboard, and both have the current datetime stamp rather than timestamps from the year 2000, which is the range of my data. Could someone please help me figure out what is happening?
A sample Kibana object is as follows:
{
  "_index": "temp_load_index",
  "_type": "logs",
  "_id": "myid",
  "_score": null,
  "_source": {
    "message": "2000-04-02 02:00:00,2528.76,-1068.49,-576.796",
    "@version": "1",
    "@timestamp": "2016-09-27T05:15:29.753Z",
    "path": "path_to_csv",
    "host": "myhost",
    "tstp": "2000-04-02 02:00:00",
    "Voltage_A_real": 2528.76,
    "Voltage_B_real": -1068.49,
    "Voltage_C_real": -576.796,
    "tags": [
      "_dateparsefailure"
    ]
  },
  "fields": {
    "@timestamp": [
      1474953329753
    ]
  },
  "sort": [
    1474953329753
  ]
}
When you open Kibana, it usually shows you only the events from the last 15 minutes, according to the @timestamp field. So you need to set the time filter to the appropriate time range (cf. the documentation), in your case using the absolute option and starting at 2000-01-01.
Or you can put the parsed timestamp in another field (for example original_tst), so that the @timestamp added by Logstash will be kept:
date {
  match => [ "tstp", "yyyy-MM-dd HH:mm:ss" ]
  target => "original_tst"
}
Related
I've been trying to configure a Logstash pipeline whose input type is snmptrap, along with yamlmibdir. Here's the code:
input {
  snmptrap {
    host => "abc"
    port => 1062
    yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
  }
}
filter {
  mutate {
    gsub => ["message", "^\"{", "{"]
    gsub => ["message", "}\"$", "}"]
    gsub => ["message", "[\\]", ""]
  }
  json { source => "message" }
  split {
    field => "message"
    target => "evetns"
  }
}
output {
  elasticsearch {
    hosts => "xyz"
    index => "logstash-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
and the result shown in Kibana (JSON format)
{
  "_index": "logstash-2019.11.18-000001",
  "_type": "_doc",
  "_id": "Y_5zjG4B6M9gb7sxUJwG",
  "_version": 1,
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2019-11-21T05:33:07.675Z",
    "tags": [
      "_jsonparsefailure"
    ],
    "1.11.12.13.14.15": "teststring",
    "message": "#<SNMP::SNMPv1_Trap:0x244bf33f @enterprise=[1.2.3.4.5.6], @timestamp=#<SNMP::TimeTicks:0x196a1590 @value=55>, @varbind_list=[#<SNMP::VarBind:0x21f5e155 @name=[1.11.12.13.14.15], @value=\"teststring\">], @specific_trap=99, @source_ip=\"xyz\", @agent_addr=#<SNMP::IpAddress:0x5a5c3c5f @value=\"xC0xC1xC2xC3\">, @generic_trap=6>",
    "host": "xyz"
  },
  "fields": {
    "@timestamp": [
      "2019-11-21T05:33:07.675Z"
    ]
  },
  "sort": [
    1574314387675
  ]
}
As you can see, the message field is an array, so how can I get all the fields inside the array, and also be able to select these fields to display in Kibana?
ps1. I still get the _jsonparsefailure tag if I select type 'Table' in the expanded document.
ps2. Even though I'm using gsub to remove '\' from the expected JSON result, why do I still get a result with '\'?
I have been told that, by using a Logstash pipeline, I can re-create a log format (i.e. JSON) when entering it into Elasticsearch, but I don't understand how to do it.
Current Logstash config (I took the below from Google, not for any particular reason):
/etc/logstash/conf.d/metrics-pipeline.conf
input {
  beats {
    port => 5044
    client_inactivity_timeout => "3600"
  }
}
filter {
  if [message] =~ />/ {
    dissect {
      mapping => {
        "message" => "%{start_of_message}>%{content}"
      }
    }
    kv {
      source => "content"
      value_split => ":"
      field_split => ","
      trim_key => "\[\]"
      trim_value => "\[\]"
      target => "message"
    }
    mutate {
      remove_field => ["content","start_of_message"]
    }
  }
}
filter {
  if [system][process] {
    if [system][process][cmdline] {
      grok {
        match => {
          "[system][process][cmdline]" => "^%{PATH:[system][process][cmdline_path]}"
        }
        remove_field => "[system][process][cmdline]"
      }
    }
  }
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    hosts => "1.2.1.1:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
I have a couple of log files located at:
/root/logs/File.log
/root/logs/File2.log
The log format there is:
08:26:51,753 DEBUG [ABC] (default-threads - 78) (1.2.3.4)(368)>[TIMESTAMP:Wed Sep 11 08:26:51 UTC 2019],[IMEI:03537],[COMMAND:INFO],[GPS STATUS:true],[INFO:true],[SIGNAL:false],[ENGINE:0],[DOOR:0],[LON:90.43],[LAT:23],[SPEED:0.0],[HEADING:192.0],[BATTERY:100.0%],[CHARGING:1],[O&E:CONNECTED],[GSM_SIGNAL:100],[GPS_SATS:5],[GPS POS:true],[FUEL:0.0V/0.0%],[ALARM:NONE][SERIAL:01EE]
In Kibana, by default, it shows like this:
https://imgshare.io/image/stackflow.I0u7S
https://imgshare.io/image/jsonlog.IHQhp
"message": "21:33:42,004 DEBUG [LOG] (default-threads - 100) (1.2.3.4)(410)>[TIMESTAMP:Sat Sep 07 21:33:42 UTC 2019],[TEST:123456],[CMD:INFO],[STATUS:true],[INFO:true],[SIGNAL:false],[ABC:0],[DEF:0],[GHK:1111],[SERIAL:0006]"
but I want to get it like below:
"message": {
"TIMESTAMP": "Sat Sep 07 21:33:42 UTC 2019",
"TEST": "123456",
"CMD":INFO,
"STATUS":true,
"INFO":true,
"SIGNAL":false,
"ABC":0,
"DEF":0,
"GHK":0,
"GHK":1111
}
Can this be done? If yes, how?
Thanks
With the if [message] =~ />/ condition, the filters will only apply to messages containing a >. The dissect filter will split the message at the >. The kv filter will apply a key-value transformation to the second part of the message, removing the []. The mutate remove_field removes the extra fields.
filter {
  if [message] =~ />/ {
    dissect {
      mapping => {
        "message" => "%{start_of_message}>%{content}"
      }
    }
    kv {
      source => "content"
      value_split => ":"
      field_split => ","
      trim_key => "\[\]"
      trim_value => "\[\]"
      target => "message"
    }
    mutate {
      remove_field => ["content","start_of_message"]
    }
  }
}
Result, using the provided log line:
{
  "@version": "1",
  "host": "YOUR_MACHINE_NAME",
  "message": {
    "DEF": "0",
    "TIMESTAMP": "Sat Sep 07 21:33:42 UTC 2019",
    "TEST": "123456",
    "CMD": "INFO",
    "SERIAL": "0006]\r",
    "GHK": "1111",
    "INFO": "true",
    "STATUS": "true",
    "ABC": "0",
    "SIGNAL": "false"
  },
  "@timestamp": "2019-09-10T09:21:16.422Z"
}
In addition to doing the filtering with if [message] =~ />/, you can also do the comparison on the path field, which is set by the file input plugin. Also, if you have multiple file inputs, you can set the type field and use that one instead; see https://stackoverflow.com/a/20562031/6113627.
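For example, a minimal sketch of that alternative (the type name "device_log" is just a placeholder I made up; the paths are the ones from your question):
input {
  file {
    path => ["/root/logs/File.log", "/root/logs/File2.log"]
    type => "device_log"    # tag every event coming from these files
  }
}
filter {
  if [type] == "device_log" {     # or: if [path] =~ /File\.log/
    dissect {
      mapping => { "message" => "%{start_of_message}>%{content}" }
    }
    # ... kv and mutate as in the filter above ...
  }
}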
I am attempting to output the _id metadata field from ES into a CSV file using Logstash.
{
  "_index": "data",
  "_type": "default",
  "_id": "vANfNGYB9XD0VZRJUFfy",
  "_version": 1,
  "_score": null,
  "_source": {
    "vulnid": "CVE-2018-1000060",
    "product": [],
    "year": "2018",
    "month": "02",
    "day": "09",
    "hour": "23",
    "minute": "29",
    "published": "2018-02-09T18:29:02.213-05:00",
  },
  "sort": [
    1538424651203
  ]
}
My Logstash output configuration is:
output {
  csv {
    fields => [ "_id", "vulnid", "published" ]
    path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}
I get output:
,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
But I would like to get:
vANfNGYB9XD0VZRJUFfy,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
How do I output the metadata _id into the CSV file?
It does not matter whether I specify the field as "_id", "@_id", or "@id".
When we query ES we have to enable docinfo => true. By default it is false.
input {
  elasticsearch {
    hosts => [ your hosts ]
    index => "ti"
    query => '{your query}'
    size => 1000
    scroll => "1s"
    docinfo => true
    schedule => "14 * * * *"
  }
}
Well, Logstash is not able to get the "_id" field from your input, because you must not have set the option docinfo to true.
docinfo helps to include Elasticsearch document information such as the index, type, _id, etc. Please have a look here for more info: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html#plugins-inputs-elasticsearch-docinfo
Use your input plugin as follows:
input {
  elasticsearch {
    hosts => "hostname"
    index => "yourIndex"
    query => '{ "query": { "query_string": { "query": "*" } } }'  # optional
    size => 500      # optional
    scroll => "5m"   # optional
    docinfo => true
  }
}
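With docinfo enabled, the document metadata ends up under [@metadata] (assuming the default docinfo_target), so a sketch of the csv output referencing it could look like this (field names taken from the question):
output {
  csv {
    # "_id" on its own stays empty because the id lives in the event metadata;
    # fields under [@metadata] are only written out when referenced explicitly
    fields => [ "[@metadata][_id]", "vulnid", "published" ]
    path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}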
My data has dates in the format yyyy-MM-dd, e.g. "2015-10-12".
My Logstash date filter is as below:
input {
  file {
    path => "/etc/logstash/immport.csv"
    codec => multiline {
      pattern => "^S*"
      negate => true
      what => "previous"
    }
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    autodetect_column_names => true
    skip_empty_columns => true
  }
  date {
    match => ["start_date", "yyyy-MM-dd"]
    target => "start_date"
  }
  mutate {
    rename => {"start_date" => "[study][startDate]"}
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => ["elasticsearch-5-6:9200"]
    index => "immport12"
    document_type => "dataset"
    template => "/etc/logstash/immport-mapping.json"
    template_name => "mapping_template"
    template_overwrite => true
  }
  stdout { codec => rubydebug }
}
However, my ES instance is not able to parse it and I'm getting the following error:
"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [study.startDate]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2012-04-17T00:00:00.000Z\" is malformed at \"T00:00:00.000Z\""}}}}}
Sample Data Row
][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"immport_2017_12_02", :_type=>"dataset", :_routing=>nil}, 2017-12-20T08:55:45.367Z 878192e51991 SDY816,HEPSV_COHORT: Participants that received Heplisav,,,2012-04-17,,10.0,Systems Biology Analysis of the response to Licensed Hepatitis B Vaccine (HEPLISAV) in specific cell subsets (see companion studies SDY299 and SDY690),Interventional,http://www.immport.org/immport-open/public/study/study/displayStudyDetail/SDY816,,Interventional,Vaccine Response,Homo sapiens,Cell,DNA microarray], :response=>{"index"=>{"_index"=>"immport_2017_12_02", "_type"=>"dataset", "_id"=>"AWBzIsBPov62ZQtaldxQ", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [study.startDate]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2012-04-17T00:00:00.000Z\" is malformed at \"T00:00:00.000Z\""}}}}}
I want Logstash to output the date in the format yyyy-MM-dd, without a timestamp.
Mapping template:
"startDate": {
"type": "date",
"format": "yyyy-MM-dd"
},
I tried this on my machine with reference to your Logstash conf file and it worked fine.
My Logstash conf file:
input {
  file {
    path => "D:\testdata\stack.csv"
    codec => multiline {
      pattern => "^S*"
      negate => true
      what => "previous"
    }
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    autodetect_column_names => true
    skip_empty_columns => true
  }
  date {
    match => ["dob", "yyyy-MM-dd"]
    target => "dob"
  }
  mutate {
    rename => {"dob" => "[study][dob]"}
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => ["localhost:9200"]
    index => "stack"
  }
  stdout { codec => rubydebug }
}
CSV file :
id,name,rollno,dob,age,gender,comments
1,hatim,88,1992-07-30,25,male,qsdsdadasd asdas das dasd asd asd asd as dd sa d
2,hatim,89,1992-07-30,25,male,qsdsdadasd asdas das dasd asd asd asd as dd sa d
Elasticsearch document after indexing :
{
  "_index": "stack",
  "_type": "doc",
  "_id": "wuBTeGABQ7gwBQSQTX1q",
  "_score": 1,
  "_source": {
    "path": """D:\testdata\stack.csv""",
    "study": {
      "dob": "1992-07-29T18:30:00.000Z"
    },
    "@timestamp": "2017-12-21T09:06:52.465Z",
    "comments": "qsdsdadasd asdas das dasd asd asd asd as dd sa d",
    "gender": "male",
    "@version": "1",
    "host": "INMUCHPC03284",
    "name": "hatim",
    "rollno": "88",
    "id": "1",
    "message": "1,hatim,88,1992-07-30,25,male,qsdsdadasd asdas das dasd asd asd asd as dd sa d\r",
    "age": "25"
  }
}
And everything worked perfectly. See if this example might help you with something.
The issue was that I had changed the Logstash mapping template name to a new name but did not delete the old template, so the index was still pointing to the old template.
Once I deleted the old template
curl -XDELETE 'http://localhost:9200/_template/test_template'
it worked. So whenever we use a new template, it's required to delete the old template first and then process the records.
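For reference, a quick way to check which templates are currently installed before re-running Logstash (the second request assumes the template_name used in the output block above):
# list all installed index templates
curl -XGET 'http://localhost:9200/_template?pretty'
# or inspect just the one referenced by template_name
curl -XGET 'http://localhost:9200/_template/mapping_template?pretty'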
I'm trying to index some data from a file into Elasticsearch by using Logstash.
If I'm not using the date filter to replace the @timestamp, everything works very well, but when I'm using the filter I do not get all the data.
I can't figure out why there is a difference between the Logstash command line and Elasticsearch in the @timestamp value.
Logstash conf
filter {
  mutate {
    replace => {
      "type" => "dashboard_a"
    }
  }
  grok {
    match => [ "message", "%{DATESTAMP:Logdate} \[%{WORD:Severity}\] %{JAVACLASS:Class} %{GREEDYDATA:Stack}" ]
  }
  date {
    match => [ "Logdate", "dd-MM-yyyy hh:mm:ss,SSS" ]
  }
}
Logstash Command line trace
{
    "@timestamp" => "2014-08-26T08:16:18.021Z",
    "message" => "26-08-2014 11:16:18,021 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r",
    "@version" => "1",
    "host" => "bts10d1",
    "path" => "D:\\ElasticSearch\\logstash-1.4.2\\Dashboard_A\\Log_1\\6.log",
    "type" => "dashboard_a",
    "Logdate" => "26-08-2014 11:16:18,021",
    "Severity" => "DEBUG",
    "Class" => "com.fnx.snapshot.mdb.SnapshotMDB",
    "Stack" => " - SnapshotMDB Ctor is called\r"
}
ElasticSearch result
{
  "_index": "logstash-2014.08.28",
  "_type": "dashboard_a",
  "_id": "-y23oNeLQs2mMbyz6oRyew",
  "_score": 1,
  "_source": {
    "@timestamp": "2014-08-28T14:31:38.753Z",
    "message": "15:07,565 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r",
    "@version": "1",
    "host": "bts10d1",
    "path": "D:\\ElasticSearch\\logstash-1.4.2\\Dashboard_A\\Log_1\\6.log",
    "type": "dashboard_a",
    "tags": ["_grokparsefailure"]
  }
}
Please make sure all your logs are in the same format!
You can see in the Logstash command line trace that the log line is
26-08-2014 11:16:18,021 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r
But in Elasticsearch the log is
15:07,565 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r
The two logs have different times and their formats are not the same. The second one does not have any date information, so it causes the grok filter parsing error (hence the _grokparsefailure tag). Go check the original logs, or provide a sample of the original logs for more discussion if all of them are supposed to be in the same format.
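As a quick way to see how many lines are affected, you could route the events that failed parsing to a separate file. This is only a sketch: the /tmp/grok_failures.log path and the localhost host are placeholders, not from your config, and the else branch stands in for whatever elasticsearch output you already use.
output {
  # dump lines that failed the grok pattern so you can inspect their format
  if "_grokparsefailure" in [tags] {
    file { path => "/tmp/grok_failures.log" }
  } else {
    elasticsearch { host => "localhost" }   # your existing elasticsearch output
  }
}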