Renaming a field with a special character using a Single Message Transform in Kafka Connect - apache-kafka-connect

I'm using an SMT in the Splunk Sink Connector and having trouble renaming a field. I need to change the name of a field from, for example, "metric1" to "metric_name:cpu.usr".
This is how my connector configuration looks:
SPLUNK_SINK_CONNECTOR_CONFIG='{
"name": "splunk_sink_connector_1",
"config": {
"connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
"tasks.max": "1",
"splunk.indexes": "'"$SPLUNK_INDEX"'",
"topics":"metrics_for_splunk",
"splunk.hec.uri": "'"$SPLUNK_HEC_URI"'",
"splunk.hec.token": "'"$SPLUNK_HEC_TOKEN"'",
"splunk.hec.raw": "true",
"offset.flush.interval.ms": 1000,
"splunk.hec.json.event.formatted": "true",
"transforms": "renameField,HoistField,insertTS,convertTS,insertEvent",
"transforms.renameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.renameField.renames": "metric1:"metric_name:cpu.usr"",
"transforms.HoistField.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.HoistField.field": "fields",
"transforms.insertTS.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.insertTS.timestamp.field": "message_timestamp",
"transforms.convertTS.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.convertTS.format": "yyyy-MM-dd hh:mm",
"transforms.convertTS.target.type": "string",
"transforms.convertTS.field": "message_timestamp",
"transforms.insertEvent.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.insertEvent.static.field": "event",
"transforms.insertEvent.static.value": "metric"
}}'
When I try to run this connector, I get an error:
Connector configuration is invalid and contains the following 1 error(s):\nInvalid value [metric1:metric_name:cpu.usr] for configuration renames: Invalid rename mapping: metric1:metric_name:cpu.usr\n
If I run this connector without the renameField SMT, everything works without a hitch.
I understand that the problem is the ":" character in the field name. I've tried wrapping metric_name:cpu.usr like this:
"transforms.renameField.renames": "metric1:'metric_name:cpu.usr'"
and like this
"transforms.renameField.renames": "metric1:'"metric_name:cpu.usr"'"
and like this
"transforms.renameField.renames": "metric1:"metric_name:cpu.usr""
and used the escape character \ before the ":":
"transforms.renameField.renames": "metric1:metric_name\:cpu.usr"
with no positive effect.
Is it possible at all to use this SMT for renaming if I have a special character in the name?
Or maybe there is some handy workaround?
It seems that this use case should be rather common, but I haven't found anything on the web.
I will be very grateful for any advice.
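One possible workaround, in case building a custom plugin is an option: ReplaceField parses its renames list by splitting on ":", so a small custom SMT can perform the rename without that restriction. Below is a minimal sketch, assuming schemaless (Map-valued) records and a self-built plugin dropped onto the worker's plugin.path; the package, class name, and hard-coded field names are illustrative, not an existing library class.

// Hypothetical custom SMT (illustrative names): renames "metric1" to
// "metric_name:cpu.usr" on schemaless Map values, avoiding ReplaceField's
// reserved ':' separator.
package com.example.smt;

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

public class RenameMetricField<R extends ConnectRecord<R>> implements Transformation<R> {

    private static final String OLD_NAME = "metric1";
    private static final String NEW_NAME = "metric_name:cpu.usr";

    @Override
    public R apply(R record) {
        Object value = record.value();
        if (!(value instanceof Map)) {
            return record; // this sketch only handles schemaless records
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> original = (Map<String, Object>) value;
        if (!original.containsKey(OLD_NAME)) {
            return record;
        }
        Map<String, Object> updated = new HashMap<>(original);
        updated.put(NEW_NAME, updated.remove(OLD_NAME));
        // Keep everything else as-is; value schema stays null for schemaless data
        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                null, updated, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // no options in this sketch; names are hard-coded
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}

Once packaged on the plugin path, it would be wired in with "transforms.renameField.type": "com.example.smt.RenameMetricField" in place of the ReplaceField entry.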

Related

Numeric Timestamp in Kafka Oracle JDBC Connector

I'm currently trying to set up a JDBC Connector with the goal of reading data from an Oracle DB and pushing it to a Kafka topic using Kafka Connect. I wanted to use the "timestamp" mode:
timestamp: use a timestamp (or timestamp-like) column to detect new and modified rows. This assumes the column is updated with each write, and that values are monotonically incrementing, but not necessarily unique.
https://docs.confluent.io/kafka-connectors/jdbc/current/source-connector/source_config_options.html#mode
{
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"dialect.name" : "OracleDatabaseDialect",
"connection.url": "XXX",
"connection.user" : "XXX",
"connection.password" : "XXX",
"mode" : "timestamp",
"quote.sql.identifiers": "never"
"timestamp.column.name" : "LAST_UPDATE_DATE",
"query" : "select a1.ID, a1.LAST_UPDATE_DATE, b1.CODE
from a1
left join b1 on a1.ID = b1.ID
...
}
My problem is that the timestamp column on the database is defined as NUMBER(15), e.g. 20221220145930000. The connector appends a where clause to my defined query like
where a1.LAST_UPDATE_DATE > :1 and a1.LAST_UPDATE_DATE < :2 order by a1.LAST_UPDATE_DATE asc
This leads to an error message: ORA-00932: inconsistent datatypes: expected NUMBER got TIMESTAMP
Unfortunately, the database is not under my control (proprietary software). I have only read permissions.
Is there a possibility to set the timestamp type in this connector? I already tried using the to_timestamp() function directly in the SQL statement and an SMT (TimestampConverter), without success.

Summing a parsed field in Grafana Loki

I am trying to sum the count of rows (row_count) being inserted/updated according to my process logs, which look similar to the following line:
{"function": "Data Processing Insert", "module": "Data Processor", "environment": "DEV", "level": "INFO", "message": "Number of rows inserted", "time": "2022-04-29T09:07:02.735Z", "epoch_time": 1651223222.735133, "row_count": "8089"}
I'm able to build the filter to get those lines but haven't been able to perform calculations on row_count. How would I go about doing that?
The Grafana community channel came through. To do what I'm looking for, I had to use unwrap:
sum_over_time({__aws_cloudwatch_log_group="/aws/batch/job"} | json | message=~".+inserted" | unwrap row_count [1h])

ActiveMQ jolokia gives different message response depending on environment

I have to get (not consume) part of a message that is in a queue. I reused a bash script that was suggested as an answer here, using /api/jolokia/: ActiveMQ Jolokia API How can I get the full Message Body
The part of the response I am interested in is the MsgId in value:text:
"request": {
"mbean": "org.apache.activemq:brokerName=MyBrokerName,destinationName=MyQueueName,destinationType=Queue,type=Broker",
"type": "exec",
"operation": "browseMessages()"
},
"value": [
{
"jMSCorrelationIDAsBytes": [],
***some other objects here ***
"text": "<?xml version=\"1.0\"?>\r\n<RepositoryOperationRq xmlns=\"http://www.ACORD.org/\">\r\n <MsgId>xxx28bab-e62c-4dbc-a2aa-xxx</MsgId>\r\n <CreationDtTime>2020-01-01T11:11:11-11:00</CreationDtTime>\r\n
There is no problem on the DEV env ActiveMQ, but when I tried to do the same on the UAT env ActiveMQ there is no value:text object in the response at all, and some other objects' values are different, like:
"connectionControl": false
and
"connectionControl": "false"
I thought it might be because of the maxDepth parameter, so I increased it. Unfortunately, when I set maxDepth=5 I got this error:
"error_type": "java.lang.IllegalStateException",
"error": "java.lang.IllegalStateException : Error while extracting next from org.apache.activemq.broker.region.cursors.FilePendingMessageCursor#3bb9ace4",
"status": 500
and the whole ActiveMQ broker stopped receiving any messages; I had to force restart it. The ActiveMQ configs should be the same on both envs, and the version is 5.13.3. Do you know why that text object is missing?
I think the difference here is down to the content of the messages in each environment. The browseMessages operation simply returns the messages in the corresponding destination (e.g. MyQueueName).
If the message is not a javax.jms.TextMessage then it won't have the text field. If a property is false instead of "false", that just means the property value was a boolean rather than a String.
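To illustrate the point about message types, here is a small sketch that does the same browse with a plain JMS QueueBrowser; the broker URL and class name are assumptions. Only TextMessage instances expose the body that Jolokia shows as the text field.

// Illustrative JMS browse: mirrors what Jolokia's browseMessages() exposes.
// Only javax.jms.TextMessage carries a text body, so non-text messages on the
// UAT broker would explain the missing "text" object.
import java.util.Enumeration;
import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.QueueBrowser;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class BrowseCheck {
    public static void main(String[] args) throws Exception {
        // Broker URL is an assumption; adjust to the DEV/UAT broker
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        QueueBrowser browser = session.createBrowser(session.createQueue("MyQueueName"));

        Enumeration<?> messages = browser.getEnumeration();
        while (messages.hasMoreElements()) {
            Message message = (Message) messages.nextElement();
            if (message instanceof TextMessage) {
                // The text body is what shows up as value:text via Jolokia
                System.out.println(((TextMessage) message).getText());
            } else {
                System.out.println("Non-text message: " + message.getJMSMessageID());
            }
        }
        connection.close();
    }
}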

Fluent-bit - Splitting json log into structured fields in Elasticsearch

I am trying to find a way in the Fluent-bit config to tell/enforce ES to store plain JSON formatted logs (the log bit below that comes from Docker stdout/stderr) in a structured way. For example, apart from (or along with) storing the log as a plain JSON entry under the log field, I would like to store each property as its own individual field.
The documentation for Filters and Parsers is really poor and not clear. On top of that, the forward input doesn't have a "parser" option. I tried the json/docker/regex parsers but with no luck. My regex is here if I have to use regex. I'm currently using ES (7.1), Fluent-bit (1.1.3) and Kibana (7.1), not Kubernetes.
If anyone can direct me to an example or provide one, it would be much appreciated.
Thanks
{
"_index": "hello",
"_type": "logs",
"_id": "T631e2sBChSKEuJw-HO4",
"_version": 1,
"_score": null,
"_source": {
"#timestamp": "2019-06-21T21:34:02.000Z",
"tag": "php",
"container_id": "53154cf4d4e8d7ecf31bdb6bc4a25fdf2f37156edc6b859ba0ddfa9c0ab1715b",
"container_name": "/hello_php_1",
"source": "stderr",
"log": "{\"time_local\":\"2019-06-21T21:34:02+0000\",\"client_ip\":\"-\",\"remote_addr\":\"192.168.192.3\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\":\"200\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.001\",\"http_referrer\":\"-\",\"http_user_agent\":\"curl/7.38.0\",\"request_id\":\"91835d61520d289952b7e9b8f658e64f\"}"
},
"fields": {
"#timestamp": [
"2019-06-21T21:34:02.000Z"
]
},
"sort": [
1561152842000
]
}
My current conf:
[SERVICE]
Flush 5
Daemon Off
Log_Level debug
Parsers_File parsers.conf
[INPUT]
Name forward
Listen 0.0.0.0
Port 24224
[OUTPUT]
Name es
Match hello_*
Host elasticsearch
Port 9200
Index hello
Type logs
Include_Tag_Key On
Tag_Key tag
The solution is as follows.
[SERVICE]
Flush 5
Daemon Off
Log_Level debug
Parsers_File parsers.conf
[INPUT]
Name forward
storage.type filesystem
Listen my_fluent_bit_service
Port 24224
[FILTER]
Name parser
Parser docker
Match hello_*
Key_Name log
Reserve_Data On
Preserve_Key On
[OUTPUT]
Name es
Host my_elasticsearch_service
Port 9200
Match hello_*
Index hello
Type logs
Include_Tag_Key On
Tag_Key tag
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
# Command | Decoder | Field | Optional Action
# =============|==================|=================
Decode_Field_As escaped_utf8 log do_next
Decode_Field_As json log
You can use the Fluent Bit Nest filter for that purpose; please refer to the following documentation:
https://docs.fluentbit.io/manual/filter/nest

How to suppress aws lambda cli output

I want to use the aws lambda update-function-code command to deploy the code of my function. The problem is that the AWS CLI always prints out some information after deployment. That output contains sensitive information, such as environment variables and their values. That is not acceptable, as I'm going to use public CI services and I don't want that info to become available to anyone. At the same time, I don't want to solve this by redirecting everything from the AWS command to /dev/null, for example, because then I would lose information about errors and exceptions, which would make it harder to debug if something went wrong. What can I do here?
p.s. SAM is not an option, as it will force me to switch to another framework and completely change the workflow I'm using.
You could target the output you'd like to suppress by replacing those values with jq.
For example, if you had output from the CLI command like below:
{
"FunctionName": "my-function",
"LastModified": "2019-09-26T20:28:40.438+0000",
"RevisionId": "e52502d4-9320-4688-9cd6-152a6ab7490d",
"MemorySize": 256,
"Version": "$LATEST",
"Role": "arn:aws:iam::123456789012:role/service-role/my-function-role-uy3l9qyq",
"Timeout": 3,
"Runtime": "nodejs10.x",
"TracingConfig": {
"Mode": "PassThrough"
},
"CodeSha256": "5tT2qgzYUHaqwR716pZ2dpkn/0J1FrzJmlKidWoaCgk=",
"Description": "",
"VpcConfig": {
"SubnetIds": [],
"VpcId": "",
"SecurityGroupIds": []
},
"CodeSize": 304,
"FunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:my-function",
"Handler": "index.handler",
"Environment": {
"Variables": {
"SomeSensitiveVar": "value",
"SomeOtherSensitiveVar": "password"
}
}
}
You might pipe that to jq and replace values only if the keys exist:
aws lambda update-function-code <args> | jq '
if .Environment.Variables.SomeSensitiveVar? then .Environment.Variables.SomeSensitiveVar = "REDACTED" else . end |
if .Environment.Variables.SomeOtherSensitiveVar? then .Environment.Variables.SomeOtherSensitiveVar = "REDACTED" else . end'
You know which data is sensitive and will need to set this up appropriately. You can see an example of what data is returned in the CLI docs, and the API docs are also helpful for understanding what the structure can look like.
Lambda environment variables show up everywhere and cannot be considered private.
If your environment variables are sensitive, you could consider using AWS Secrets Manager.
In a nutshell:
create a secret in the secret store. It has a name (public) and a value (secret, encrypted, with proper user access control)
Allow your lambda to access the secret store
In your lambda env, store the name of your secret, and tell your lambda to get the corresponding value at runtime (see the sketch after this list)
bonus: password rotation is made super easy, as you don't even have to update your lambda config anymore
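A minimal sketch of that pattern with the AWS SDK for Java v2, assuming the environment variable MY_SECRET_NAME (an illustrative name) holds only the secret's name:

// Sketch only: the Lambda environment stores the secret's name, never its value;
// the value is fetched from Secrets Manager at runtime.
import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest;

public class SecretLookup {
    public static String loadSecret() {
        String secretName = System.getenv("MY_SECRET_NAME"); // public name, not the value
        try (SecretsManagerClient client = SecretsManagerClient.create()) {
            GetSecretValueRequest request = GetSecretValueRequest.builder()
                    .secretId(secretName)
                    .build();
            return client.getSecretValue(request).secretString(); // the actual secret value
        }
    }
}

The Lambda's execution role also needs secretsmanager:GetSecretValue permission on that secret for this to work.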
