Why parse failure on Log-stash occurs while field type is the same as before with no change? - elasticsearch

The logstash log file says:
"tags"=>["_grokparsefailure"]}, "status_code"]}>, #data={"#version"=>"1", "#timestamp"=>"2016-09-24T08:00:54.894Z", "path"=>"/var/log/nginx/access.log", "host"=>"sample-com", "remote_addr"=>"127.0.0.1", "remote_user"=>"-", "date"=>"05/Sep/2016:10:03:01 +0000", "method"=>"GET", "uri_path"=>"/accounts", "version"=>"HTTP/1.1", "status_code"=>"200", "body_byte_sent"=>419, "referer"=>"-", "user_agent"=>"python-requests/2.4.3 CPython/2.7.9 Linux/3.16.0-4-amd64", "request_time"=>6.161, "auth_type"=>"Bearer", "client_id"=>"beta",
"web_client_ip"=>"172.*.131.177", "response_json"=>{"_links"=>{"applications"=>{"href"=>"/applications"}, "menus"=>{"href"=>"/menus"}, "messages"=>{"href"=>"/messages"}, "numbers"=>{"href"=>"/numbers"}, "self"=>{"href"=>"/accounts"}}, "account_status"=>"active", "creation_date"=>"2016-06-07 09:25:18", "credit"=>{"balance"=>#<BigDecimal:367dbf49,'0.19819267E4',8(12)>, "currency"=>"usd"}, "email"=>"*#gmail.com",
"id"=>"677756yt7557", "lastname"=>"Qurbani", "name"=>"M", "notifications"=>{"black_list"=>{"uids"=>[]}, "settings"=>{"email"=>{"low_credit"=>true, "new_feature"=>true, "receive_f"=>true, "send_f"=>true, "voice"=>true}, "language"=>"en", "push_notif"=>{"low_credit"=>true, "new_feature"=>true, "receive_f"=>true, "send_f"=>true, "voice"=>true}, "sms"=>{"low_credit"=>true, "new_feature"=>true, "receive_f"=>true, "send_f"=>true, "voice"=>true}}}, "phone"=>"+9****", "status"=>"inactive", "verification_status"=>{"email"=>"unverified", "phone"=>"verified"}}, "request_json"=>{}, "tags"=>["_grokparsefailure"]}, #metadata_accessors=#<LogStash::Util::Accessors:0x6ec6acbe #store={"path"=>"/var/log/nginx/access.log"}, #lut={"[path]"=>[{"path"=>"/var/log/nginx/access.log"}, "path"]}>,
#cancelled=false>], :response=>{"create"=>{"_index"=>"logstash-api-2016.09.24", "_type"=>"logs", "_id"=>"AVdbNisZCijYhuqEamFy", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [response_json.credit]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"unknown property [balance]"}}}}, :level=>:warn}
Here I have a log like below in credit section:
"credit": {"balance": 0.0, "currency": "usd"}
I have removed all the indices from Elasticsearch, and I didn't found any .sincedb* in home or elsewhere to remove logstash DB.
Why this error happens when I don't actually have a change in balance value? What is the reason for that?
After restarting Logstash it does not aggregate data from log files!
I removed all since_dbs_* from /var/lib/logstash/ and said to start tailing from the beginning position in Logstash configuration.
Now the below error is raised:
object mapping for [response_json.credit] tried to parse field [credit] as object, but found a concrete value
It seems that sometimes credit is sent as a scalar value and sometimes as an object with two fields!
EDIT1:
2 different credit fields with different data has been posted to one credit in Elasticsearch. So I tried to rename these fields and remove the credit from both configs in logstash, so for now I have:
add_field => {"first_credit" => "%{[response_json.credit]}"}
remove_field => ["response_json.credit"]
New fields get added, but the value is literally %{[response_json.credit]} and field is not removed so error happens again. I want to get the value of credit and put it inside of first credit and remove the credit itself. I even tried the below:
add_field => {"first_credit" => "%{[response_json][credit]}"}
remove_field => ["response_json.credit"]
What I'm doing wrong?
EDIT:2
I have noticed that one file access.log has a credit field with different values.
One credit is numeric: 2.99
The other credit is a JSON: {"currency": "usd", "balance": 2.99}
I used the below logstash configuration to solve the problem and save them all as a string in ES:
if ([response_json][credit]) {
mutate {
add_field => {"new_credit" => "%{[response_json][credit]}"}
remove_field => [ "[response_json][credit]" ]
}
}
It gives the below error:
"new_credit"=>"{\"balance\":3.102,\"currency\":\"usd\"}", "tags"=>["_grokparsefailure"]},
#metadata_accessors=#<LogStash::Util::Accessors:0x46761362 #store={"path"=>"/var/log/nginx/access.log.1"},
#lut={"[path]"=>[{"path"=>"/var/log/nginx/access.log.1"}, "path"]}>,
#cancelled=false>], :response=>{"create"=>{"_index"=>"logstash-api-2016.09.27", "_type"=>"logs", "_id"=>"AVdqrION3CJVjhZgZcnl", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [new_credit]", "caused_by"=>{"type"=>"number_format_exception", "reason"=>"For input string: \"{\"balance\":3.102,\"currency\":\"usd\"}\""}}}}, :level=>:warn

From looking at your log, "credit"=>{"balance"=>#<BigDecimal:367dbf49,'0.19819267E4',8(12)>, I think this issue may be related.
If you check the Elasticsearch mapping of your index at {elasticsearch:ip}:9200/logstash-api-2016.09.24/_mapping, I bet that the balance field has an Integer mapping. If there was initially an integer mapping, any value that is not an integer (for example, an object) will fail.
You can resolve this by creating an index template that specifies balance as a float. If you choose to do this, ensure that you delete the old index or create a new one, as existing mappings cannot be modified.
You could also ensure that balance is always the same data type in the source of the logs.
Or you could add a mutate filter and convert the balance field to your desired data type.
Check out your mapping and let me know if my theory is right. :)
EDIT:
The code block you just sent me will have exactly the same problem as before - object credit and int credit will be stored in the same field. The following will store credit[balance] (an int) and int credit in the same field called new_credit, which should be mapped to an Integer.
if ([response_json][credit][balance]) {
mutate {
add_field => {"new_credit" => "%{[response_json][credit][balance}"}
remove_field => [ "[response_json][credit]" ]
}
}
else {
mutate {
add_field => {"new_credit" => "%{[response_json][credit]}"}
remove_field => [ "[response_json][credit]" ]
}
}

Related

Add extra value to field before sending to elasticsearch

I'm using logstash, filebeat and grok to send data from logs to my elastisearch instance. This is the grok configuration in the pipe
filter {
grok {
match => {
"message" => "%{SYSLOGTIMESTAMP:messageDate} %{GREEDYDATA:messagge}"
}
}
}
This works fine, the issue is that messageDate is in this format Jan 15 11:18:25 and it doesn't have a year entry.
Now, i actually know the year these files were created in and i was wondering if it is possible to add the value to the field during the process, that is, somehow turn Jan 15 11:18:25 into 2016 Jan 15 11:18:25 before sending to elasticsearch (obviously without editing the files, which i could do and even with ease but it'll be a temporary fix to what i have to do and not a definitive solution)
I have tried googling if it was possible but no luck...
Valepu,
The only way to modify the data from a field is using the ruby filter:
filter {
ruby {
code => "#your code here#"
}
}
For more information like...how to get,set field values, here is the link:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html
If you have a separate field for date as a string, you can use logstash date plugin:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
If you don't have it as a separate field (as in this case) use this site to construct your own grok pattern:
http://grokconstructor.appspot.com/do/match
I made this to preprocess the values:
%{YEAR:yearVal} %{MONTH:monthVal} %{NUMBER:dayVal} %{TIME:timeVal} %{GREEDYDATA:message}
Not the most elegant I guess, but you get the values in different fields. Using this you can create your own date field and parse it with date filter so you will get a comparable value or you can use these fields by themselves. I'm sure there is a better solution, for example you could make your own grok pattern and use that, but I'm gonna leave some exploration for you too. :)
By reading thoroughly the grok documentation i found what google couldn't find for me and which i apparently missed the first time i read that page
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-add_field
Using the add_field and remove_field options i managed to add the year to my date, then i used the date plugin to send it to logstash as a timestamp. My filter configuration now looks like this
filter {
grok {
match => {
"message" => "%{SYSLOGTIMESTAMP:tMessageDate} %{GREEDYDATA:messagge}"
add_field => { "messageDate" => "2016 %{tMessageDate}" }
remove_field => ["tMessageDate"]
}
}
date {
match => [ "messageDate", "YYYY MMM dd HH:mm:ss"]
}
}
And it worked fine

elastic stack : i need set Time Filter field name with another field

i need read messages(content is logs) from rabbitMq by logstash and then send that to elasticsearch for make visualize monitoring in kibana. so i wrote input for read from rabbitmq in logstash like this:
input {
rabbitmq {
queue => "testLogstash"
host => "localhost"
}
}
and i wrote output configuration for store in elasticsearch in logstash like this:
output {
elasticsearch{
hosts => "http://localhost:9200"
index => "d13-%{+YYYY.MM.dd}"
}
}
Both of them are placed in myConf.conf
In the content of each message, there is a Json that contains the fields like this:
{
"mDate":"MMMM dd YYYY, HH:mm:ss.SSS"
"name":"test name"
}
But there are two problems. First, there is no date field in the field of creating a new index(Time Filter field name). Second, I use the same timestamp as the default #timestamp, this field will not be displayed in the build type of graphs. I think the reason for this is because of the data type of the field. The field is of type date, but the string is considered.
i try to convert value of field to date by mutate in logstash config like this:
filter {
mutate {
convert => { "mdate" => "date" }
}
}
Now, two questions arise:
1- Is this the problem? If yes What is the right solution to fix it?
2- My main need is to use the time when messages are entered in the queue, not when Logstash takes them. What is the best solution?
If you don't specify a value for #timestamp, you should get the current system time when elasticsearch indexes the document. With that, you should be able to see items in kibana.
If I understand you correctly, you'd rather use you mDate field for #timestamp. For this, use the date{} filter in logstash.

logstash add_field conversion issue

I am using logstash version 5.0.2. Parsing a file which holds a filename as one of the fields which is parsed by logstash grok filter, but for visualization i needed the file number to identify each file. So I added new field through mutate filter add_field checking the filename in [message].
if 'filename_1' in [message] {
mutate { add_field => { "file_no" => "13" } }
mutate {convert => [ "file_no", "float" ] }
}
If i check the parsing through stdin/stdout (rubydebug codec) filterit shows the file_no field is converted properly. but if I send the logstash output to elasticsearch kibana shows conflict in data type of that field.
there I am able to see file_no.keyword(as string) and file_no(as conflict), with error as:
Mapping conflict! A field is defined as several types (string, integer,
etc) across the indices that match this pattern. You may still be able to use
these conflict fields in parts of Kibana, but they will be unavailable for
functions that require Kibana to know their type. Correcting this issue will
require reindexing your data
I have converted the added filed so why is is still being sent to elasticsearch as string not sure.
any help would be great.
When tried converting the field there is not option of number in Kibana. The source logfile being monitored doesn't have this number to parse it directly as an integer with %{PATTERN_FOR_NUMBER:number_variable:int} otherwise this could have been easier

Is it possible to index only the matched log lines of grok in Logstash?

I'm having a log file, which actually has INFOs' and ERRORs. So I tried to match only the needful INFOs by using the grok filter. So this is how my log lines look like. Few of them from the file.
And this is how my grok look like in my logstash conf:
grok {
patterns_dir => ["D:/elk_stack_for_ideabiz/elk_from_chamith/ELK_stack/logstash-5.1.1/bin/patterns"]
match => {
"message" => [
"^TID\: \[0\] \[AM\] \[%{LOGTIMESTAMPTWO:logtimestamp}]%{REQUIREDDATAFORAPP:app_message}",
"^TID\: \[0\] \[AM\] \[%{LOGTIMESTAMPTWO:logtimestamp}]%{REQUIREDDATAFORRESPONSESTATUS:response_message}"
]
}
}
The pattern seems to be working fine. I could provide the pattern if required.
I've got two questions. One is I wanted only the grok matched lines to be sent to the index, and prevent Logstash from indexing the non-matched ones and the second is to prevent Logstash from showing the message in every single ES record.
I tried using the overwrite as such under the match but still no luck:
overwrite => [ "message" ]
All in all what I need to see in my indice are the messages (app_message, response_message from the above match), which should match the above two conditions. Where as now, all the lines are getting indexed.
Is it possible do something like this? Or does Logstash index all of them by default?
Where am I going wrong? Any help could be appreciated.
Indexing only the lines you want is pretty straightforward.
By default, if grok fails to match anything it will add the tag _grokparsefailure.
Now what you want to do is check for that tag and then use the drop filter.
if "_grokparsefailure" in [tags] {
drop { }
}
sysadmin1138 mentioned that you can select between different grok filters by adding tag_on_failure => ['_INFOFailure' ]
For your sencond question you can simply remove the field. Every filter has a remove_field option but I use the mutate filter to indicate that I am manipulating the message.
mutate {
remove_field => ["message"]
}

Logstash Dynamic Index From Document Field Fails

I still face problems to figure out, how to tell Logstash to send a dynamic index, based on a document field. Furthermore, this Field must be transformed in order to get the "real" index at the very end.
Given, that there is a field "time" (which is a UNIX Timestamp). This Field gets already transformed with a "date" Filter to a DateTime Object for Elastic.
Additionally, it should server as index (YYYYMM). The index should NOT be derived from #Timestamp, which is not touched.
Example:
{...,"time":1453412341,...}
Shall go to the Index: 201601
I use the following Config:
filter {
date {
match => [ "time", "UNIX" ]
target => "time"
timezone => "Europe/Berlin"
}
}
output {
elasticsearch {
index => "%{time}%{+YYYYMM}"
document_type => "..."
document_id => "%{ID}"
hosts => "..."
}
}
Sadly, its not working. Any idea, how to achieve that?
Thanks a lot!
The "%{+YYYYMM}" says to use the date values from #timestamp. If you want an index named after the YYYYMM in %{time}, you need to make a string out of that date field and then reference that string in the output stanza. There might be a mutate{} that would do it, or drop into ruby{}.
In most installations, you want to set #timestamp to the event's value. The default of logstash's own time is not very useful (imagine if your events were delayed by an hour during processing). If you did that, then %{+YYYYMM}" would work just fine.
This is caused because the index name is created based on UTC time by default.

Resources