Kafka S3 Confluent connector - decimals dumped as 1.2345E9 - apache-kafka-connect

I'm using the Kafka S3 sink connector from Confluent to send JSON to S3.
When dumping a timestamp (1234567890.1234), it gets formatted as 1.2345678901234E9.
Is it possible to just have it dumped as-is? My connector configuration is:
topics: my-topic
rotate.schedule.interval.ms: 60000
s3.bucket.name: my-bucket
s3.compression.type: gzip
storage.class: io.confluent.connect.s3.storage.S3Storage
format.class: io.confluent.connect.s3.format.json.JsonFormat
value.converter: org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable: false
key.converter: org.apache.kafka.connect.converters.ByteArrayConverter
key.converter.schemas.enable: false
partitioner.class: io.confluent.connect.storage.partitioner.TimeBasedPartitioner
partition.field.name: timestamp
path.format: "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH"
locale: en-US
timezone: UTC
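For reference, this is roughly what lands in S3 today versus what the question is asking for (a minimal illustration reusing the value from the question; the field name timestamp is taken from the partitioner config above and may differ in the real records):

current output: {"timestamp":1.2345678901234E9}
desired output: {"timestamp":1234567890.1234}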

Related

Elasticsearch/Kibana shows the wrong timestamp

I transfer logfiles with Filebeat to Elasticsearch.
The data is analyzed with Kibana.
Now to my problem:
Kibana does not show the timestamp from the logfile.
Kibana shows the time of the transmission in @timestamp.
I want to show the timestamp from the logfile in Kibana.
But the timestamp in the logfile is overwritten.
Where is my mistake?
Does anyone have a solution for my problem?
Here is an example from my logfile and my Filebeat config.
{"#timestamp":"2022-06-23T10:40:25.852+02:00","#version":1,"message":"Could not refresh JMS Connection]","logger_name":"org.springframework.jms.listener.DefaultMessageListenerContainer","level":"ERROR","level_value":40000}
## Filebeat configuration
## https://github.com/elastic/beats/blob/master/deploy/docker/filebeat.docker.yml
#
filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false
filebeat.autodiscover:
  providers:
    # The Docker autodiscover provider automatically retrieves logs from Docker
    # containers as they start and stop.
    - type: docker
      hints.enabled: true
filebeat.inputs:
  - type: filestream
    id: pls-logs
    paths:
      - /usr/share/filebeat/logs/*.log
    parsers:
      - ndjson:
processors:
  - add_cloud_metadata: ~
output.elasticsearch:
  hosts: ['http://elasticsearch:9200']
  username: elastic
  password:
## HTTP endpoint for health checking
## https://www.elastic.co/guide/en/beats/filebeat/current/http-endpoint.html
#
http.enabled: true
http.host: 0.0.0.0
Thanks for any support!
Based upon the question, one potential option would be to use Filebeat processors. You could write the initial @timestamp value to another field, like event.ingested, using the following script:
# Script to move the timestamp to the event.ingested field
- script:
    lang: javascript
    id: init_format
    source: >
      function process(event) {
        var fieldTest = event.Get("@timestamp");
        event.Put("event.ingested", fieldTest);
      }
And then the last processor you write could move that event.ingested field back to @timestamp using the following timestamp processor:
# Set @timestamp back to the original event time preserved in event.ingested
- timestamp:
    field: event.ingested
    layouts:
      - '2006-01-02T15:04:05Z'
      - '2006-01-02T15:04:05.999Z'
      - '2006-01-02T15:04:05.999-07:00'
    test:
      - '2019-06-22T16:33:51Z'
      - '2019-11-18T04:59:51.123Z'
      - '2020-08-03T07:10:20.123456+02:00'
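Putting the two pieces together, the processors section of the filebeat.yml above could look roughly like this (a sketch, assuming the script runs before the timestamp processor and keeping the add_cloud_metadata entry from the original config):

processors:
  - add_cloud_metadata: ~
  # Preserve the original @timestamp in event.ingested
  - script:
      lang: javascript
      id: init_format
      source: >
        function process(event) {
          event.Put("event.ingested", event.Get("@timestamp"));
        }
  # Restore @timestamp from the preserved value
  - timestamp:
      field: event.ingested
      layouts:
        - '2006-01-02T15:04:05Z'
        - '2006-01-02T15:04:05.999Z'
        - '2006-01-02T15:04:05.999-07:00'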

How can I read and decode AVRO messages from Kafka along with their associated kafka key using Benthos?

I am using Benthos to read AVRO-encoded messages from Kafka whose kafka_key metadata field is set to an AVRO-encoded payload as well. The schemas of these AVRO-encoded payloads are stored in Schema Registry, and Benthos has a schema_registry_decode processor for decoding them. I'm looking to produce an output JSON message for each Kafka message containing two fields: one called content with the decoded AVRO message, and the other called metadata with the various metadata fields collected by Benthos, including the decoded kafka_key payload.
It turns out that one can achieve this using a branch processor like so:
input:
  kafka:
    addresses:
      - localhost:9092
    consumer_group: benthos_consumer_group
    topics:
      - benthos_input
pipeline:
  processors:
    # Decode the message
    - schema_registry_decode:
        url: http://localhost:8081
    # Populate output content field
    - bloblang: |
        root.content = this
    # Decode kafka_key metadata payload and populate output metadata field
    - branch:
        request_map: |
          root = meta("kafka_key")
        processors:
          - schema_registry_decode:
              url: http://localhost:8081
        result_map: |
          root.metadata = meta()
          root.metadata.kafka_key = this
output:
  stdout: {}
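With this pipeline, each Kafka message produces a JSON document shaped roughly like the one below (an illustration only; the content and key fields are hypothetical, and the exact metadata keys depend on what the kafka input collects):

{
  "content": { "example_field": "decoded message payload" },
  "metadata": {
    "kafka_key": { "example_key_field": "decoded key payload" },
    "kafka_topic": "benthos_input"
  }
}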

Filtering Filebeat input with or without Logstash

In our current setup we use Filebeat to ship logs to an Elasticsearch instance. The application logs are in JSON format and it runs in AWS.
For some reason AWS decided to prefix the log lines in a new platform release, and now the log parsing doesn't work.
Apr 17 06:33:32 ip-172-31-35-113 web: {"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}
Before it was simply:
{"#timestamp":"2020-04-17T06:33:32.691Z","#version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}
The question is whether we can avoid using Logstash to convert the log lines into the old format. If not, how do I drop the prefix, and which filter is the best choice for this?
My current Filebeat configuration looks like this:
filebeat.inputs:
  - type: log
    paths:
      - /var/log/web-1.log
    json.keys_under_root: true
    json.ignore_decoding_error: true
    json.overwrite_keys: true
    fields_under_root: true
    fields:
      environment: ${ENV_NAME:not_set}
      app: myapp
cloud.id: "${ELASTIC_CLOUD_ID:not_set}"
cloud.auth: "${ELASTIC_CLOUD_AUTH:not_set}"
I would try to leverage the dissect and decode_json_fields processors:
processors:
  # first ignore the preamble and only keep the JSON data
  - dissect:
      tokenizer: "%{?ignore} %{+ignore} %{+ignore} %{+ignore} %{+ignore}: %{json}"
      field: "message"
      target_prefix: ""
  # then parse the JSON data
  - decode_json_fields:
      fields: ["json"]
      process_array: false
      max_depth: 1
      target: ""
      overwrite_keys: false
      add_error_key: true
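Applied to the prefixed sample line above, the dissect step leaves the raw JSON in the json field, and decode_json_fields then expands those keys into event fields, roughly like this (an abbreviated illustration):

# after dissect, the json field holds the raw JSON string (abbreviated):
json: '{"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''", ...}'
# after decode_json_fields, those keys become event fields, e.g.:
"@timestamp": "2020-04-17T06:33:32.691Z"
"level": "INFO"
"logger_name": "org.springframework.boot.web.embedded.tomcat.TomcatWebServer"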
There is a plugin in Logstash called the JSON filter that parses the raw JSON log line from a field called "message" (for instance):
filter {
  json {
    source => "message"
  }
}
If you do not want to include the beginning part of the line, use the dissect filter in Logstash. It would be something like this:
filter {
  dissect {
    mapping => {
      "message" => "%{}: %{message_without_prefix}"
    }
  }
}
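Combining the two, a sketch of a filter section that first strips the prefix and then parses the remaining JSON could look like this (the field name message_without_prefix is simply carried over from the example above):

filter {
  # strip everything up to and including ": " from the raw line
  dissect {
    mapping => {
      "message" => "%{}: %{message_without_prefix}"
    }
  }
  # parse the remaining JSON payload
  json {
    source => "message_without_prefix"
  }
}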
Maybe these two features are available in Filebeat as well, but in my experience I prefer working with Logstash when parsing/manipulating logging data.

Can Filebeat convert log lines to JSON output without Logstash in the pipeline?

We have standard log lines in our Spring Boot web applications (non-JSON).
We need to centralize our logging and ship the logs to Elasticsearch as JSON.
(I've heard that later versions can do some transformation.)
Can Filebeat read the log lines and wrap them as JSON? I guess it could append some metadata as well; there is no need to parse the log line.
Expected output:
{timestamp : "", beat: "", message: "the log line..."}
I have no code to show, unfortunately.
Filebeat supports several outputs, including Elasticsearch.
The config file filebeat.yml can look like this:
# filebeat options: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-reference-yml.html
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/../file.err.log
    processors:
      - drop_fields:
          # Prevent failure of Logstash (https://www.elastic.co/guide/en/beats/libbeat/current/breaking-changes-6.3.html#custom-template-non-versioned-indices)
          fields: ["host"]
      - dissect:
          # tokenizer syntax: https://www.elastic.co/guide/en/logstash/current/plugins-filters-dissect.html
          tokenizer: "%{} %{} [%{}] {%{}} <%{level}> %{message}"
          field: "message"
          target_prefix: "spring boot"
    fields:
      log_type: spring_boot
output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "filebeat_internal"
  password: "YOUR_PASSWORD"
Well, it seems to do this by default. This is my result when I tried it locally reading log lines; it wraps them exactly as I wanted.
{
  "@timestamp": "2019-06-12T11:11:49.094Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.2.4"
  },
  "message": "the log line...",
  "source": "/Users/myusername/tmp/hej.log",
  "offset": 721,
  "prospector": {
    "type": "log"
  },
  "beat": {
    "name": "my-macbook.local",
    "hostname": "my-macbook.local",
    "version": "6.2.4"
  }
}
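For reference, a minimal filebeat.yml along these lines is enough to reproduce that default wrapping locally (a sketch; the console output is used here only so the wrapped JSON can be inspected directly):

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /Users/myusername/tmp/hej.log
# print events to stdout to inspect the wrapped JSON
output.console:
  pretty: true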

Confluent Kafka-connect-JDBC connector showing hexadecimal data in the Kafka topic

I'm trying to copy the data from a table in an Oracle DB and put it into a Kafka topic. I've used the following JDBC source connector configuration for that:
name=JDBC-DB-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.password = *******
connection.url = jdbc:oracle:thin:@1.1.1.1:1111/ABCD
connection.user = *****
table.types=TABLE
query= select * from (SELECT * FROM JENNY.WORKFLOW where ID = '565231')
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
mode=timestamp+incrementing
incrementing.column.name=ID
timestamp.column.name=MODIFIED
topic.prefix=workflow_data12
poll.interval.ms=6000
timestamp.delay.interval.ms=60000
transforms=createKey
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.createKey.fields=ID
So far so good. I'm able to get the data into my Kafka topic, but the output looks like the following:
key - {"ID":"\u0001"}
value - {"ID":"\u0001","MODIFIED":1874644537368}
You can observe that my key "ID" is being printed in hexadecimal format, even though I'm using Avro in my JDBC properties file.
(I'm using kafka-avro-console consumer to view the data on the command line)
(And the column "ID" is of type "NUMBER" in the oracle db.)
Could anyone help point out whether I'm missing some property needed to print the data properly in Avro format?
Thanks in advance!!
Add this property to your .properties file, e.g. before query:
numeric.mapping=best_fit
A detailed explanation can be found here.
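For illustration, the relevant part of the source connector's .properties file would then look roughly like this (only the surrounding lines from the original config are repeated):

name=JDBC-DB-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
# map NUMERIC/NUMBER columns to the best-fitting primitive Connect type
# (int/long/double) instead of the byte-backed Decimal logical type
numeric.mapping=best_fit
query= select * from (SELECT * FROM JENNY.WORKFLOW where ID = '565231')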
