Elasticsearch treats strings as date - elasticsearch

I'm trying to use elasticsearch in pair with logstash and use it for storing my exim logs.
Particularly, I would like to extract message id field from log files to simplify searching on it:
grok {
match => [
"#message", "%{DATE} %{TIME} %{HOSTNAME:msgid} %{GREEDYDATA:details}"
]
}
mutate {
gsub => [
"msgid","[\\\:-]",""
]
}
As elasticsearch tries to parse as Date every string, that contains symbols like : / or -, I'm replacing them with mutate filter.
Unfortunately, even filtered msg id is not accepted by elasticsearch and the question is why?
[2013-12-24 21:32:32,823][DEBUG][action.bulk ] [Piledriver]
[logstash 2013.12.24][0] failed to execute bulk item (index) index
{[logstash-2013.12.24][exim][_7-j53yZRzmARuYsJEfgIA],
source[{"message":"<22>Dec 24 21:32:31 host exim[15691]:
2013-12-24 21:32:31 1VvWmN-000453-Fz Completed",
"#version":"1",
"#timestamp":"2013-12-24T21:32:31.000+03:00",
"type":"exim",
"host":"192.168.169.228",
"syslog_pri":"22",
"syslog_program":"exim",
"syslog_pid":"15691",
"received_at":"2013-12-24 18:32:31 UTC",
"received_from":"192.168.169.228",
"syslog_severity_code":6,
"syslog_facility_code":2,
"syslog_facility":"mail",
"syslog_severity":"informational",
"#source_host":"host",
"#message":"2013-12-24 21:32:31 1VvWmN-000453-Fz Completed",
"msgid":"1VvWmN000453Fz"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [msgid]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:613)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:466)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:516)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:460)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:353)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:402)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:156)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field [1VvWmN000453Fz], tried both date format [dateOptionalTime], and timestamp number with locale []
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:487)
at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:424)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:194)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:390)
... 12 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "1VvWmN000453Fz" is malformed at "VvWmN000453Fz"
at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:481)
... 15 more

Please share your elasticsearch Mapping if you have applied it on your index.
If not then share the default mapping that is getting created once you create the index.
Also you could try giving default mapping as String to msgid

Related

what is the reason for this logstash error("error"=>{"type"=>"illegal_argument_exception", "reason"=>"mapper)

bellow is my filebeat config and I added a logId :
- type: log
fields:
source: 'filebeat2'
logID: debugger
fields_under_root: true
enabled: true
paths:
- /var/log/path/*
and below is my output section of logstash conf :
if "debugger" in [logID] and ("" not in [Exeption]) {
elasticsearch {
user => ""
password => ""
hosts => ["https://ip:9200"]
index => "debugger"
}
}
and I put some log files in path(10 files) and I randomely got this error in logstash-plain.log :
{"index"=>{"_index"=>"debugger", "_type"=>"_doc", "_id"=>"9-DmvoIBPs8quoIM7hCa",
"status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"mapper
[request.dubugeDate] cannot be changed from type [text] to [long]"}}}}
and also this :
"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field
[debug.registrationDate] of type [long] in document with id 'Bt_YvoIBPs8quoIMXfwd'.
Preview of field's value: '2022-08-1707:37:08.256'", "caused_by"=>
{"type"=>"illegal_argument_exception", "reason"=>"For input string: \"2022-08-
1707:37:08.256\""}}}}}
can anybody help me ?
Look like, in the first case, in the index mapping, your field request.dubugeDate defined as long, and you try to ingest some string data.
In the second case the field debug.registrationDate find mapping, defined as long, and you try to ingest string (date).
You can check the mapping of your index with GET /YOUR_INDEX/_mapping command from the Kibana or same via curl

How to resolve parsing error for CSV file in Logstash

I am using Filebeat to send a CSV file to Logstash and then up to Kibana, however I am getting a parsing error when the CSV file is picked up by Logstash.
This is the contents of the CSV file:
time version id score type
May 6, 2020 # 11:29:59.863 1 2 PPy_6XEBuZH417wO9uVe _doc
The logstash.conf:
input {
beats {
port => 5044
}
}
filter {
csv {
separator => ","
columns =>["time","version","id","index","score","type"]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "%{[#metadata][beat]}-%{[#metadata][version]}-%{+YYYY.MM.dd}"
}
}
Filebeat.yml:
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
# Change to true to enable this input configuration.
enabled: true
# Paths that should be crawled and fetched. Glob based paths.
paths:
- /etc/test/*.csv
#- c:\programdata\elasticsearch\logs\*
and the error in Logstash:
[2020-05-27T12:28:14,585][WARN ][logstash.filters.csv ][main] Error parsing csv {:field=>"message", :source=>"time,version,id,score,type,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}
[2020-05-27T12:28:14,586][WARN ][logstash.filters.csv ][main] Error parsing csv {:field=>"message", :source=>"\"May 6, 2020 # 11:29:59.863\",1,2,PPy_6XEBuZH417wO9uVe,_doc,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}
I do get some data in Kibana but not what I want to see.
I have managed to get it to work locally. the mistakes I have noticed so far were:
Using ES reserved fields like #timestamp, #version, and more.
The timestamp was not in ISO8601 format. It had an # sign in the middle.
Your filter set the separator to , but your CSV real separator is "\t".
According to the error you can see it is trying to also work on your titles line, I suggest you remove it from the CSV or use the skip_header option.
Below is the logstash.conf file I used:
input {
file {
path => "C:/work/elastic/logstash-6.5.0/config/test.csv"
start_position => "beginning"
}
}
filter {
csv {
separator => ","
columns =>["time","version","id","score","type"]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "csv-test"
}
}
The CSV file I used:
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
From my Kibana:

Grok filter for logstash to match a specific value from a log file

I have the following log:
2018-10-30 11:47:52 INFO 30464 SMS-MT [cid:300038] [queue-msgid:bb7a195d-fb23-42ae-bbfa-d2dcda405af9] [smpp-msgid:j.11082.639364178944.#MARKET SETU] [status:ESME_ROK] [prio:1] [dlr:NO_SMSC_DELIVERY_RECEIPT_REQUESTED] [validity:none] [from:2323232] [to:23232132312] [content:'#MARKET SETUP\nadsadadadadasdasdadaasdada mo ang:\nC jean_rivera\n--Mag reply ng A-C']
I've created a grok filter based on pattern in logstash so I can parse the log the way I want. And I have this:
%{DATESTAMP:Timestamp} %{LOGLEVEL:Level} %{BASE10NUM:Pid} %{USERNAME:SMS_TYPE} %{CID:CID} %{GREEDYDATA:Message}
I'm trying to create a GROK patter that will match 300038, which is the number coming after cid:. The syntax is always the same, [cid:number]. What I have now is:
CID (\[cid:[0-9]{6}\])
but that results into:
"CID": [
[
"[cid:300038]"
]
],
and I only want to match the 300038, without the [cid:] part
I have noticed that there are more than single space character between LOG and pid, you can match all of them using \s*.
To match just a number from [cid:300038] you can use custom pattern, \[cid:(?<CID>[0-9]{1,})\] this will match cid of any length, not just 6 digits.
Your pattern will become,
%{DATESTAMP:Timestamp} %{LOGLEVEL:Level}\s*%{BASE10NUM:Pid} %{USERNAME:SMS_TYPE} \[cid:(?<CID>[0-9]{1,})\] %{GREEDYDATA:Message}
Use
%{DATESTAMP:Timestamp} %{LOGLEVEL:Level} %{BASE10NUM:Pid} %{USERNAME:SMS_TYPE} \[cid:(?<CID>[0-9]{6})\] %{GREEDYDATA:Message}

Logstash Grok Parser not working for error logs

I am trying to parse error logs using Logstash to capture few fields especially errormessage. But unable to capture errormessage in Logstash. Below is the actual error message and parser which I wrote
12345 http://google.com 2017-04-17 09:02:43.065 ERROR 10479 --- [http-nio-8052-exec-2] com.utilities.TokenUtils : Error
org.xml.SAXParseException: An invalid XML character (Unicode: 0xe) was found in the value of attribute "ID" and element is "saml".
at org.apache.parsers.DOMParser.parse(Unknown Source)
at org.apache.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at com.utilities.TokenUtils.validateSignature(TokenUtils.java:99)
Parser:
`%{NOTSPACE:stnum}\s*%{NOTSPACE:requestURL}\s*%{TIMESTAMP_ISO8601:log_timestamp}\s*%{LOGLEVEL:loglevel}\s*%{NUMBER:pid}\s*---\s*\[(?<thread>[A-Za-z0-9-]+)\]\s*%{DATA:class}\s*:\s%{NOTSPACE:level}\s*(?<errormessage>.[^\n]*).[^\n]*`
I am trying to capture this message from the log:
org.xml.SAXParseException: An invalid XML character (Unicode: 0xe) was found in the value of attribute "ID" and element is "saml".
Which logstash parser you are using? Please provide while conf file which can give us more info. Here's the sample to parse exception type from your logs (Using grok filter).
filter {
grok {
match => ["message", "%{DATA:errormessage} %{GREEDYDATA:EXTRA}"]
}
}

fluent-plugin-elasticsearch: "Could not push log to Elasticsearch" error with "error"=>{"type"=>"mapper_parsing_exception"}

When I am injecting data collected by Fluentd to Elasticsearch using fluent-plugin-elasticsearch, some data caused the following error:
2017-04-09 23:47:37 +0900 [error]: Could not push log to Elasticsearch: {"took"=>3, "errors"=>true, "items"=>[{"index"=>{"_index"=>"logstash-201704", "_type"=>"ruby", "_id"=>"AVtTLz_cUzkwT9CQCxrH", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [message]", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:27"}}}}, .....]}
It seems that elasticsearch banned the data for error failed to parse [message] and Can't get text on a START_OBJECT at 1:27. but I cannot see what data is sent to Elasticsearch and what's wrong.
Any ideas?
fluent-plugin-elasticsearch uses _bulk API to sending data. I put the request-dumping code on /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-api-5.0.4/lib/elasticsearch/api/actions/bulk.rb as following:
def bulk(arguments={})
...
payload = body
end
$log.info([method, path, params, payload].inspect) # <=== here ($log is global logger of fluentd)
perform_request(method, path, params, payload).body
And I found the request sent to Elasticsearch was as following:
POST /_bulk
{"index":{"_index":"logstash-201704","_type":"ruby"}}
{"level":"INFO","message":{"status":200,"time":{"total":46.26,"db":33.88,"view":12.38},"method":"PUT","path":"filtered","params":{"time":3815.904,"chapter_index":0},"response":[{}]},"node":"main","time":"2017-04-09T14:39:06UTC","tag":"filtered.console","#timestamp":"2017-04-09T23:39:06+09:00"}
The problem is message field contains JSON object, although this field is mapped as analyzed string on Elasticsearch.

Resources