Logstash Dynamic Index From Document Field Fails - elasticsearch

I still can't figure out how to tell Logstash to write to a dynamic index based on a document field. Furthermore, this field must be transformed in order to get the "real" index name at the very end.
Say there is a field "time" (a UNIX timestamp). This field is already transformed with a "date" filter into a DateTime object for Elasticsearch.
Additionally, it should serve as the index name (YYYYMM). The index should NOT be derived from @timestamp, which is left untouched.
Example:
{...,"time":1453412341,...}
shall go to the index 201601.
I use the following config:
filter {
  date {
    match => [ "time", "UNIX" ]
    target => "time"
    timezone => "Europe/Berlin"
  }
}
output {
  elasticsearch {
    index => "%{time}%{+YYYYMM}"
    document_type => "..."
    document_id => "%{ID}"
    hosts => "..."
  }
}
Sadly, it's not working. Any idea how to achieve that?
Thanks a lot!

The "%{+YYYYMM}" says to use the date values from #timestamp. If you want an index named after the YYYYMM in %{time}, you need to make a string out of that date field and then reference that string in the output stanza. There might be a mutate{} that would do it, or drop into ruby{}.
In most installations, you want to set #timestamp to the event's value. The default of logstash's own time is not very useful (imagine if your events were delayed by an hour during processing). If you did that, then %{+YYYYMM}" would work just fine.
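A minimal sketch of the ruby{} route, assuming the Logstash 5.x+ event API; the [@metadata][index_month] name is only an illustration:
filter {
  date {
    match => [ "time", "UNIX" ]
    target => "time"
    timezone => "Europe/Berlin"
  }
  # Format the parsed "time" field as a plain YYYYMM string. @metadata fields
  # are not indexed, so nothing extra ends up in the stored document.
  # Note: strftime here formats in UTC, just like %{+YYYYMM} would; adjust if
  # the Berlin offset matters around month boundaries.
  ruby {
    code => "event.set('[@metadata][index_month]', event.get('time').time.strftime('%Y%m'))"
  }
}
output {
  elasticsearch {
    index => "%{[@metadata][index_month]}"
  }
}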

This happens because the index name is built from UTC time by default.

Related

elastic stack: I need to set the Time Filter field name with another field

I need to read messages (whose content is logs) from RabbitMQ with Logstash and then send them to Elasticsearch to build monitoring visualizations in Kibana. So I wrote the input to read from RabbitMQ in Logstash like this:
input {
  rabbitmq {
    queue => "testLogstash"
    host => "localhost"
  }
}
and I wrote the output configuration to store the data in Elasticsearch like this:
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "d13-%{+YYYY.MM.dd}"
  }
}
Both of them are placed in myConf.conf
In the content of each message there is a JSON document that contains fields like this:
{
  "mDate": "MMMM dd YYYY, HH:mm:ss.SSS",
  "name": "test name"
}
But there are two problems. First, there is no date field to choose when creating a new index pattern (Time Filter field name). Second, if I use mDate the same way as the default @timestamp, the field cannot be used in time-based graphs. I think the reason is the data type of the field: it should be a date, but it is treated as a string.
I tried to convert the value of the field to a date with mutate in my Logstash config like this:
filter {
  mutate {
    convert => { "mdate" => "date" }
  }
}
Now, two questions arise:
1- Is this the problem? If yes, what is the right solution to fix it?
2- My main need is to use the time when messages are put into the queue, not when Logstash picks them up. What is the best solution?
If you don't specify a value for @timestamp, you should get the current system time when Elasticsearch indexes the document. With that, you should be able to see items in Kibana.
If I understand you correctly, you'd rather use your mDate field for @timestamp. For this, use the date{} filter in Logstash.
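A minimal sketch of that, assuming the RabbitMQ payload is already decoded into fields (the rabbitmq input defaults to a JSON codec) and that mDate really arrives in the literal pattern shown above:
filter {
  date {
    # Parse mDate with the pattern from the question and write the result into
    # @timestamp (the date filter's default target), so Kibana can use it as
    # the Time Filter field. Set the locale option if the month names are not
    # in English.
    match => [ "mDate", "MMMM dd YYYY, HH:mm:ss.SSS" ]
  }
}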

Logstash doc_as_upsert cross index in Elasticsearch to eliminate duplicates

I have a logstash configuration that uses the following in the output block in an attempt to mitigate duplicates.
output {
  if [type] == "usage" {
    elasticsearch {
      hosts => ["elastic4:9204"]
      index => "usage-%{+YYYY-MM-dd-HH}"
      document_id => "%{[@metadata][fingerprint]}"
      action => "update"
      doc_as_upsert => true
    }
  }
}
The fingerprint is calculated from a SHA1 hash of two unique fields.
This works when Logstash sees the same doc in the same index, but since the command that generates the input data doesn't have a reliable rate at which different documents appear, Logstash will sometimes insert duplicate docs into a different date-stamped index.
For example, the command that Logstash runs to get the input generally returns the last two hours of data. However, since I can't definitively tell when a doc will appear/disappear, I run the command every fifteen minutes.
This is fine when the duplicates occur within the same hour. However, when the hour or day date stamp rolls over and the document still appears, Elasticsearch/Logstash thinks it's a new doc.
Is there a way to make the upsert work cross index? These would all be the same type of doc, they would simply apply to every index that matches "usage-*"
A new index is an entirely new keyspace and there's no way to tell ES to not index two documents with the same ID in two different indices.
However, you could prevent this by adding an elasticsearch filter to your pipeline which would look up the document in all indices and if it finds one, it could drop the event.
Something like this would do (note that usages would be an alias spanning all usage-* indices):
filter {
  elasticsearch {
    hosts => ["elastic4:9204"]
    index => "usages"
    query => "_id:%{[@metadata][fingerprint]}"
    fields => {"_id" => "other_id"}
  }
  # if the document was found, drop this one
  if [other_id] {
    drop {}
  }
}
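If you don't have the alias yet, a one-off request like the following would create it (the alias and index pattern names are just the ones assumed in this example). Note that the wildcard is expanded when the request runs, so newly created usage-* indices need to get the alias as well, e.g. via an index template:
POST /_aliases
{
  "actions": [
    { "add": { "index": "usage-*", "alias": "usages" } }
  ]
}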

Why does a parse failure occur in Logstash while the field type is the same as before with no change?

The logstash log file says:
"tags"=>["_grokparsefailure"]}, "status_code"]}>, #data={"#version"=>"1", "#timestamp"=>"2016-09-24T08:00:54.894Z", "path"=>"/var/log/nginx/access.log", "host"=>"sample-com", "remote_addr"=>"127.0.0.1", "remote_user"=>"-", "date"=>"05/Sep/2016:10:03:01 +0000", "method"=>"GET", "uri_path"=>"/accounts", "version"=>"HTTP/1.1", "status_code"=>"200", "body_byte_sent"=>419, "referer"=>"-", "user_agent"=>"python-requests/2.4.3 CPython/2.7.9 Linux/3.16.0-4-amd64", "request_time"=>6.161, "auth_type"=>"Bearer", "client_id"=>"beta",
"web_client_ip"=>"172.*.131.177", "response_json"=>{"_links"=>{"applications"=>{"href"=>"/applications"}, "menus"=>{"href"=>"/menus"}, "messages"=>{"href"=>"/messages"}, "numbers"=>{"href"=>"/numbers"}, "self"=>{"href"=>"/accounts"}}, "account_status"=>"active", "creation_date"=>"2016-06-07 09:25:18", "credit"=>{"balance"=>#<BigDecimal:367dbf49,'0.19819267E4',8(12)>, "currency"=>"usd"}, "email"=>"*#gmail.com",
"id"=>"677756yt7557", "lastname"=>"Qurbani", "name"=>"M", "notifications"=>{"black_list"=>{"uids"=>[]}, "settings"=>{"email"=>{"low_credit"=>true, "new_feature"=>true, "receive_f"=>true, "send_f"=>true, "voice"=>true}, "language"=>"en", "push_notif"=>{"low_credit"=>true, "new_feature"=>true, "receive_f"=>true, "send_f"=>true, "voice"=>true}, "sms"=>{"low_credit"=>true, "new_feature"=>true, "receive_f"=>true, "send_f"=>true, "voice"=>true}}}, "phone"=>"+9****", "status"=>"inactive", "verification_status"=>{"email"=>"unverified", "phone"=>"verified"}}, "request_json"=>{}, "tags"=>["_grokparsefailure"]}, #metadata_accessors=#<LogStash::Util::Accessors:0x6ec6acbe #store={"path"=>"/var/log/nginx/access.log"}, #lut={"[path]"=>[{"path"=>"/var/log/nginx/access.log"}, "path"]}>,
#cancelled=false>], :response=>{"create"=>{"_index"=>"logstash-api-2016.09.24", "_type"=>"logs", "_id"=>"AVdbNisZCijYhuqEamFy", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [response_json.credit]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"unknown property [balance]"}}}}, :level=>:warn}
Here is what the credit section looks like in my log:
"credit": {"balance": 0.0, "currency": "usd"}
I have removed all the indices from Elasticsearch, and I didn't find any .sincedb* files in the home directory or elsewhere to reset the Logstash state.
Why does this error happen when the balance value hasn't actually changed? What is the reason for it?
After restarting Logstash, it does not pick up data from the log files!
I removed all since_dbs_* files from /var/lib/logstash/ and told the file input to start tailing from the beginning position in the Logstash configuration.
Now the below error is raised:
object mapping for [response_json.credit] tried to parse field [credit] as object, but found a concrete value
It seems that sometimes credit is sent as a scalar value and sometimes as an object with two fields!
EDIT1:
Two different credit fields with different data have been posted to one credit field in Elasticsearch. So I tried to rename these fields and remove credit in the Logstash configs, so for now I have:
add_field => {"first_credit" => "%{[response_json.credit]}"}
remove_field => ["response_json.credit"]
The new field gets added, but its value is literally %{[response_json.credit]}, and the original field is not removed, so the error happens again. I want to take the value of credit, put it into first_credit, and remove credit itself. I even tried the following:
add_field => {"first_credit" => "%{[response_json][credit]}"}
remove_field => ["response_json.credit"]
What am I doing wrong?
EDIT2:
I have noticed that a single file, access.log, contains credit fields with different kinds of values.
One credit is numeric: 2.99
The other credit is a JSON: {"currency": "usd", "balance": 2.99}
I used the Logstash configuration below to solve the problem and save them all as strings in ES:
if ([response_json][credit]) {
  mutate {
    add_field => {"new_credit" => "%{[response_json][credit]}"}
    remove_field => [ "[response_json][credit]" ]
  }
}
It gives the below error:
"new_credit"=>"{\"balance\":3.102,\"currency\":\"usd\"}", "tags"=>["_grokparsefailure"]},
@metadata_accessors=#<LogStash::Util::Accessors:0x46761362 @store={"path"=>"/var/log/nginx/access.log.1"},
@lut={"[path]"=>[{"path"=>"/var/log/nginx/access.log.1"}, "path"]}>,
@cancelled=false>], :response=>{"create"=>{"_index"=>"logstash-api-2016.09.27", "_type"=>"logs", "_id"=>"AVdqrION3CJVjhZgZcnl", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [new_credit]", "caused_by"=>{"type"=>"number_format_exception", "reason"=>"For input string: \"{\"balance\":3.102,\"currency\":\"usd\"}\""}}}}, :level=>:warn
From looking at your log ("credit"=>{"balance"=>#<BigDecimal:367dbf49,'0.19819267E4',8(12)>), I think this issue is related to the mapping of the balance field.
If you check the Elasticsearch mapping of your index at {elasticsearch:ip}:9200/logstash-api-2016.09.24/_mapping, I bet that the balance field has an Integer mapping. If there was initially an integer mapping, any value that is not an integer (for example, an object) will fail.
You can resolve this by creating an index template that specifies balance as a float. If you choose to do this, ensure that you delete the old index or create a new one, as existing mappings cannot be modified.
You could also ensure that balance is always the same data type in the source of the logs.
Or you could add a mutate filter and convert the balance field to your desired data type.
Check out your mapping and let me know if my theory is right. :)
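If you go the template route, here is a rough sketch against the 2.x-era _template API (the template name is only illustrative, and the logs type name is taken from your log output); it only covers the float mapping suggested above:
PUT /_template/logstash-api
{
  "template": "logstash-api-*",
  "mappings": {
    "logs": {
      "properties": {
        "response_json": {
          "properties": {
            "credit": {
              "properties": {
                "balance": { "type": "float" }
              }
            }
          }
        }
      }
    }
  }
}
Remember that a template only applies to indices created after it exists, so delete or roll over the current index first.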
EDIT:
The code block you just sent will have exactly the same problem as before: an object credit and a plain numeric credit will be stored in the same field. The following will store credit[balance] and the plain numeric credit in the same field called new_credit, which can then get a single numeric mapping.
if ([response_json][credit][balance]) {
  mutate {
    add_field => {"new_credit" => "%{[response_json][credit][balance]}"}
    remove_field => [ "[response_json][credit]" ]
  }
}
else {
  mutate {
    add_field => {"new_credit" => "%{[response_json][credit]}"}
    remove_field => [ "[response_json][credit]" ]
  }
}

Elasticsearch converting a string to number

I am new to Elasticsearch and am just starting out with the ELK stack. I am collecting key-value type logs in Logstash and passing them to an index in Elasticsearch. I am using the kv filter plugin in Logstash, so all the fields are of string type by default.
When I try to perform aggregation like avg or sum on a numeric field in Elasticsearch, I am getting an Exception: ClassCastException[org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData]
When I check the mappings in the index, all the fields except the timestamp ones are marked as string.
Please tell me how to overcome this issue as I have many numeric fields in my log events for aggregation.
Thanks,
Keerthana
You could set explicit mappings for those fields (see e.g. Change default mapping of string to "not analyzed" in Elasticsearch for some guidance), but it's easier to just convert those fields to integers in Logstash using the mutate filter:
mutate {
  convert => ["name-of-field", "integer"]
}
Then Elasticsearch will do a better job at guessing the best data type for your field(s).
(See also Data type conversion using logstash grok.)
In the latest Logstash the syntax is as follows:
filter {
  mutate {
    convert => { "fieldname" => "integer" }
  }
}
You can visit this link for more detail: https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-convert

How to set time in log as main @timestamp in elasticsearch

I'm using Logstash to index some old log files into my Elastic DB.
I need Kibana/Elastic to use the timestamp from within the log file as the main @timestamp.
I'm using the grok filter in the following way:
%{TIMESTAMP_ISO8601:@timestamp}
yet Elasticsearch sets the time of indexing as the main @timestamp and not the timestamp written in the log line.
Any idea what I am doing wrong here?
Thanks
Use the date filter to set the @timestamp field. Extract the timestamp in whatever format it's in into a separate (temporary) field, e.g. timestamp, and feed it to the date filter. In your case you'll most likely be able to use the special ISO8601 timestamp format token.
filter {
  date {
    match => ["timestamp", "ISO8601"]
    remove_field => ["timestamp"]
  }
}
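Putting that together with the grok line from the question, a minimal sketch (the temporary field name timestamp is just the one suggested above; keep whatever else your existing pattern matches):
filter {
  grok {
    # Capture the log's own timestamp into a temporary field instead of
    # writing into @timestamp directly.
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}" }
  }
  date {
    # Parse the temporary field; the date filter writes to @timestamp by default.
    match => ["timestamp", "ISO8601"]
    remove_field => ["timestamp"]
  }
}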
