Kafka Connect: how to make the schema namespace agnostic to the database name (Spring)

My Environment
MySQL (5.7): we have multiple schemas, and the naming convention is {application_name}_{env}.
Example: consider two apps, app1 and app2.
Dev environment: the database names would be app1_dev and app2_dev.
QA environment: the database names would be app1_qa and app2_qa.
Debezium (0.8.3): the plugin used to capture changes (CDC) from the MySQL binlog.
The connector configuration is:
{
"name": "connector-1",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "184054",
"database.server.name": "dbserver1",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.inventory",
"decimal.handling.mode": "double",
"snapshot.mode": "when_needed",
"table.whitelist":"{database_name}.account",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"transforms" : "setSchema",
"transforms.setSchema.type" : "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.setSchema.schema.name" : "com.test.Account"
}
}
Spring Java application: I am using a Kafka consumer (@KafkaListener) to read the change events produced by Debezium.
I provide the .avsc files and use the Gradle Avro plugin to generate the classes.
Schema from the Dev environment:
{
"type":"record",
"name":"Accounts",
"namespace":"com.test",
"fields":[
{
"name":"before",
"type":[
"null",
{
"type":"record",
"name":"Value",
"namespace":"dbserver1.app1_dev.account",
"fields":[
{
"name":"id",
"type":"long"
}
],
"connect.name":"dbserver1.app1_dev.account.Value"
}
],
"default":null
},
{
"name":"after",
"type":[
"null",
"dbserver1.app1_dev.account.Value"
],
"default":null
},
{
"name":"source",
"type":{
"type":"record",
"name":"Source",
"namespace":"io.debezium.connector.mysql",
"fields":[
{
"name":"version",
"type":[
"null",
"string"
],
"default":null
},
{
"name":"name",
"type":"string"
},
{
"name":"server_id",
"type":"long"
},
{
"name":"ts_sec",
"type":"long"
},
{
"name":"gtid",
"type":[
"null",
"string"
],
"default":null
},
{
"name":"file",
"type":"string"
},
{
"name":"pos",
"type":"long"
},
{
"name":"row",
"type":"int"
},
{
"name":"snapshot",
"type":[
{
"type":"boolean",
"connect.default":false
},
"null"
],
"default":false
},
{
"name":"thread",
"type":[
"null",
"long"
],
"default":null
},
{
"name":"db",
"type":[
"null",
"string"
],
"default":null
},
{
"name":"table",
"type":[
"null",
"string"
],
"default":null
},
{
"name":"query",
"type":[
"null",
"string"
],
"default":null
}
],
"connect.name":"io.debezium.connector.mysql.Source"
}
},
{
"name":"op",
"type":"string"
},
{
"name":"ts_ms",
"type":[
"null",
"long"
],
"default":null
}
],
"connect.name":"com.test.Account"
}
Issue:
Since my database schemas are dynamic, i.e. they end with an environment suffix, the schema generated in each environment has a different namespace.
Dev: dbserver1.app1_dev.account
QA: dbserver1.app1_qa.account
Because of the different namespaces, the classes generated from the Dev schema cannot deserialize the QA events, so code built against the Dev schema won't work in QA.
I want to make sure the namespace is consistent across all environments.

Use the org.apache.kafka.connect.transforms.SetSchemaMetadata SMT; see https://github.com/a0x8o/kafka/blob/master/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/SetSchemaMetadata.java
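For example, the transform can be applied to both the value and the key schema so that the registered schema names stay constant across environments. This is only a sketch based on the config above; the key schema name com.test.AccountKey is an assumption, so adjust it to your own naming:
"transforms": "setValueSchema,setKeySchema",
"transforms.setValueSchema.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.setValueSchema.schema.name": "com.test.Account",
"transforms.setKeySchema.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Key",
"transforms.setKeySchema.schema.name": "com.test.AccountKey"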

Related

Is there a better/faster way to insert data into a database from an external API in Laravel?

I am currently getting data from an external API for use in my Laravel API. I have everything working, but I feel like it is slow.
I'm getting the data from the API with Http::get('url'), and that part is fast. It is only when I start looping through the data and making edits that things slow down.
I don't need all the data, but it would still be nice to edit it before entering it into the database, as things aren't very consistent. I also have a few columns that use the data and some logic to build new columns, so that each app/site doesn't need to do it.
I am saving to the database in each foreach iteration with the Eloquent Model::updateOrCreate() method, which works, but these JSON files can easily be 6000 lines long or more, so it obviously takes time to loop through each set, modify values, and then save to the database each time. There usually aren't more than 200 or so entries, but it still takes time. I will probably eventually switch to the new upsert() method to make fewer queries to the database (a rough sketch of that is further below). Running on my localhost it currently takes about a minute and a half, which just seems way too long.
Here is a shortened version of how I was looping through the data.
$json = json_decode($contents, true);
$features = $json['features'];
foreach ($features as $feature) {
    // Get ID
    $id = $feature['id'];

    // Get primary condition data
    $geometry = $feature['geometry'];
    $properties = $feature['properties'];

    // Get secondary geometry data
    $geometryType = $geometry['type'];
    $coordinates = $geometry['coordinates'];

    Model::updateOrCreate(
        ['id' => $id],
        [
            'coordinates'   => $coordinates,
            'geometry_type' => $geometryType,
        ]
    );
}
Most of what I'm doing to the data behind the scenes before it goes into the database is cleaning up text strings, but there is some logic to normalize or prep the data for websites and apps.
Is there a more efficient way to get the same result? This will ultimately be used in a scheduler and run on an interval.
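For reference, here is a rough sketch of the batched approach I have in mind with upsert() (assuming Laravel 8.10+, where upsert() is available; the chunk size of 500 is arbitrary):
$rows = [];
foreach ($features as $feature) {
    $rows[] = [
        'id'            => $feature['id'],
        // json_encode() assumes a plain text/json column; drop it if the model casts coordinates to an array
        'coordinates'   => json_encode($feature['geometry']['coordinates']),
        'geometry_type' => $feature['geometry']['type'],
    ];
}

// One bulk query per chunk instead of one query per row.
foreach (array_chunk($rows, 500) as $chunk) {
    Model::upsert($chunk, ['id'], ['coordinates', 'geometry_type']);
}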
Example Data structure from API documentation
{
"$schema": "http://json-schema.org/draft-04/schema#",
"additionalProperties": false,
"properties": {
"features": {
"items": {
"additionalProperties": false,
"properties": {
"attributes": {
"type": [
"object",
"null"
]
},
"geometry": {
"additionalProperties": false,
"properties": {
"coordinates": {
"items": {
"items": {
"type": "number"
},
"type": "array"
},
"type": "array"
},
"type": {
"type": "string"
}
},
"required": [
"coordinates",
"type"
],
"type": "object"
},
"properties": {
"additionalProperties": false,
"properties": {
"currentConditions": {
"items": {
"properties": {
"additionalData": {
"type": "string"
},
"conditionDescription": {
"type": "string"
},
"conditionId": {
"type": "integer"
},
"confirmationTime": {
"type": "integer"
},
"confirmationUserName": {
"type": "string"
},
"endTime": {
"type": "integer"
},
"id": {
"type": "integer"
},
"sourceType": {
"type": "string"
},
"startTime": {
"type": "integer"
},
"updateTime": {
"type": "integer"
}
},
"required": [
"id",
"userName",
"updateTime",
"startTime",
"conditionId",
"conditionDescription",
"confirmationUserName",
"confirmationTime",
"sourceType",
"endTime"
],
"type": "object"
},
"type": "array"
},
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"nameId": {
"type": "string"
},
"parentAreaId": {
"type": "integer"
},
"parentSubAreaId": {
"type": "integer"
},
"primaryLatitude": {
"type": "number"
},
"primaryLongitude": {
"type": "number"
},
"primaryMP": {
"type": "number"
},
"routeId": {
"type": "integer"
},
"routeName": {
"type": "string"
},
"routeSegmentIndex": {
"type": "integer"
},
"secondaryLatitude": {
"type": "number"
},
"secondaryLongitude": {
"type": "number"
},
"secondaryMP": {
"type": "number"
},
"sortOrder": {
"type": "integer"
}
},
"required": [
"id",
"name",
"nameId",
"routeId",
"routeName",
"primaryMP",
"secondaryMP",
"primaryLatitude",
"primaryLongitude",
"secondaryLatitude",
"secondaryLongitude",
"sortOrder",
"parentAreaId",
"parentSubAreaId",
"routeSegmentIndex",
"currentConditions"
],
"type": "object"
},
"type": {
"type": "string"
}
},
"required": [
"type",
"geometry",
"properties",
"attributes"
],
"type": "object"
},
"type": "array"
},
"type": {
"type": "string"
}
},
"required": [
"type",
"features"
],
"type": "object"
}
Second, related question.
Since this is being updated on an interval, I have it updating and creating records from the JSON data, but is there an efficient way to delete old records that are no longer in the JSON file? I currently get an array of current ids, compare them to the new ids, and then loop through each and delete them. There has to be a better way.
I have no idea what to say to your first question, but regarding the second one you could try something like this:
SomeModel::query()->whereNotIn('id', $newIds)->delete();
You can collect $newIds during the first loop.
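Put together with the main loop, that could look roughly like this (a sketch using the same column names as in the question):
$newIds = [];
foreach ($features as $feature) {
    $newIds[] = $feature['id'];
    // ... updateOrCreate()/upsert() as above ...
}

// Remove rows whose ids no longer appear in the feed.
SomeModel::query()->whereNotIn('id', $newIds)->delete();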

How to use XPATH to return an array of values based on a condition that a specific property exists in a JSON object array

How can I use XPATH to return an array based on the existence of a specific property?
Below is a section of my JSON file. Under "root" there are a number of array objects and SOME of them contain the property "detection". I would like to retrieve the "service_name" of each array object ONLY IF the object array (under root) contains the property "detection".
e.g., "service_name": "IPS" should be returned
but for the example below, the service_name should NOT be returned because property "detection" is not present
Finally, is there a way to combine the above query into one, in order to return an array of values "service_name" and "detection" together, based on the same condition?
My current Power Automate Set Variable command is:
xpath(xml(variables('varProductsRoot')), '//detection | //service_name')
and unfortunately it returns ALL service_names, even if the component they belong to does not contain the "detection" property.
Below is a sample of my JSON file I am trying to parse
{
"root": {
"fg": [
{
"product_name": "fg",
"remediation": {
"type": "package",
"packages": [
{
"service": "ips",
"service_name": "IPS",
"description": "Detects and Blocks attack attempts",
"kill_chain": {
"step": "Exploitation"
},
"link": "https://fgd.fnet.com/updates",
"minimum_version": "22.414"
}
]
},
"detection": {
"attackid": [
51006,
50825
]
}
}
],
"fweb": [
{
"product_name": "fWeb",
"remediation": {
"type": "package",
"packages": [
{
"service": "waf",
"service_name": "Web App Security",
"description": "Detects and Blocks attack attempts",
"kill_chain": {
"step": "Exploitation"
},
"link": "https://fgd.fnet.com/updates",
"minimum_version": "0.00330"
}
]
},
"detection": {
"signature_id": [
"090490119",
"090490117"
]
}
}
],
"fcl": [
{
"product_name": "fcl",
"remediation": {
"type": "package",
"packages": [
{
"service": "vuln",
"service_name": "Vulnerability",
"description": "Detects and Blocks attack attempts",
"kill_chain": {
"step": "Delivery"
},
"link": "https://fgd.fnet.com/updates",
"minimum_version": "1.348"
}
]
},
"detection": {
"vulnid": [
69887,
2711
]
}
},
{
"product_name": "fcl",
"remediation": {
"type": "package",
"packages": [
{
"service": "ob-detect",
"service_name": "ob Detection",
"kill_chain": {
"step": "sm/SOAR"
},
"link": "https://www.fgd.com/services",
"minimum_version": "1.003"
}
]
}
}
],
"fss": [
{
"product_name": "fss",
"remediation": {
"type": "package",
"packages": [
{
"service": "ips",
"service_name": "IPS",
"description": "Detects and Blocks attack attempts",
"kill_chain": {
"step": "Exploitation"
},
"link": "https://fgd.fnet.com/updates",
"minimum_version": "22.414"
}
]
}
}
],
"fadc": [
{
"product_name": "fADC",
"remediation": {
"type": "package",
"packages": [
{
"service": "ips",
"service_name": "IPS",
"description": "Detects and Blocks attack attempts",
"kill_chain": {
"step": "Exploitation"
},
"link": "https://fgd.fnet.com/updates",
"minimum_version": "22.414"
}
]
},
"detection": {
"ips_rulename": [
"Error.Log.Remote.Code.Execution",
"Server.cgi-bin.Path.Traversal"
]
}
},
{
"product_name": "fADC",
"remediation": {
"type": "package",
"packages": [
{
"service": "waf",
"service_name": "Web App Security",
"description": "Detects and Blocks attack attempts",
"kill_chain": {
"step": "Exploitation"
},
"link": "https://fgd.fnet.com/updates",
"minimum_version": "1.00038"
}
]
},
"detection": {
"sigid": [
1002017267,
1002017273
]
}
}
],
"fsm": [
{
"product_name": "fsm",
"remediation": {
"type": "package",
"packages": [
{
"service": "ioc",
"service_name": "IOC",
"kill_chain": {
"step": "sm/SOAR"
},
"link": "https://www.fgd.com/services",
"minimum_version": "0.02355"
}
]
}
}
]
}
}
Thank you in advance,
Nikos
This will work for you. I've broken it up into three steps for ease ...
Step 1
This contains your JSON as you provided. The variable is defined as an Object.
Step 2
Initialise a string variable that contains the following expression ...
xml(variables('JSON'))
... which (as you know) will convert the JSON to XML.
Step 3
This is an Array variable that will extract the values of all service_name elements where the detection element exists, using the following expression ...
xpath(xml(variables('XML')), '//detection/..//service_name/text()')
Result
Voila! You have your values in an array.
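For the follow-up about returning service_name and detection together in one call, an XPath union along these lines may work (an untested sketch; the xpath() function should accept the | union operator):
xpath(xml(variables('XML')), '//*[detection]//service_name/text() | //*[detection]/detection//text()')
The first branch keeps only the service_name values from components that have a detection property; the second returns the text of the detection ids for those same components.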

Any idea how to do custom supportedCookingModes in Alexa discovery?

I'm trying to return a Discovery Response, but supportedCookingModes only seems to accept standard values, and only in the format ["OFF","BAKE"], not custom values as the documentation indicates. Any idea how to specify custom values?
{
"event": {
"header": {
"namespace": "Alexa.Discovery",
"name": "Discover.Response",
"payloadVersion": "3",
"messageId": "asdf"
},
"payload": {
"endpoints": [
{
"endpointId": "asdf",
"capabilities": [
{
"type": "AlexaInterface",
"interface": "Alexa.Cooking",
"version": "3",
"properties": {
"supported": [
{
"name": "cookingMode"
}
],
"proactivelyReported": true,
"retrievable": true,
"nonControllable": false
},
"configuration": {
"supportsRemoteStart": true,
"supportedCookingModes": [
{
"value": "OFF"
},
{
"value": "BAKE"
},
{
"value": "CUSTOM",
"customName": "FANCY_NANCY_MODE"
}
]
}
}
]
}
]
}
}
}
Custom cooking modes are brand specific, and this functionality is not yet publicly available. I recommend choosing one of the existing cooking modes:
https://developer.amazon.com/en-US/docs/alexa/device-apis/cooking-property-schemas.html#cooking-mode-values

ElasticSearch indexing with nested collections in document

I have been wrestling with an issue trying to index a document into a brand new index in ElasticSearch. My document looks something like this:
{
"id": "",
"name": "Process to run batch of steps",
"defaultErrorStep": {
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "General Error Handler",
"type": "ERROR",
"reference": "error",
"onError": "DEFAULT"
},
"startingStep": "one",
"steps": [
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step One",
"type": "CHAIN",
"reference": "one",
"onComplete": "two",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Two",
"type": "CHAIN",
"reference": "two",
"onComplete": "two",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Three",
"type": "BOOLEAN",
"reference": "three",
"onTrue": "four",
"onFalse": "five",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Four",
"type": "LOOP",
"startingStep": "seven",
"steps": [
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Two",
"type": "CHAIN",
"reference": "six",
"onComplete": "seven",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Five",
"type": "FINISH_VOID",
"end": false,
"reference": "seven",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
}
],
"reference": "four",
"onComplete": "five",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
},
{
"id": "d44fdeae-80ff-4509-8504-9dfbd7284631",
"name": "Step Five",
"type": "FINISH",
"end": true,
"reference": "five",
"onError": "DEFAULT",
"parameterKeys": {
"param-a": "value-a",
"param-b": "value-b",
"param-c": "value-c"
}
}
],
"configuration": {
"settings": {
"property-a": "a",
"property-b": "b",
"property-c": "c",
"property-d": "d",
"property-z": "z123"
}
}
}
My issue is that, due to the nested structure of the "steps" property and its ability to contain loop objects with their own "steps" inside, I run into a field-duplication issue when trying to index. I understand (I think) why my document is failing, but I need to index it all the same. When I try to index the document I get the following error:
ElasticsearchException[Elasticsearch exception [type=json_parse_exception, reason=Duplicate field 'type'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@84a0697; line: 1, column: 186]]]
Again, I understand why this is an issue, but I figured I could address it with mappings in my index. I have tried the nested object type, the flattened object type, and even setting index:false on the steps field just to see if I could get the document in, but no luck. I know this is going to be a simple fix somewhere that I just cannot see, but does anyone have any thoughts on what I can try to get this document to index?
I am using Elasticsearch 7.3.1 via the latest Java SDK release. I have bypassed the Java code for now and am just using Postman to send the indexing command, but I still get the same issue.
Below is an example of one of the mappings I have tried.
{
"_source" : {
"enabled": true
},
"properties" : {
"name": {
"type": "text",
"fields": {
"raw":{"type": "keyword"}
}
},
"steps":{
"type":"nested",
"properties":{
"steps":{
"type":"flattened",
"index":false
}
}
},
"configuration.settings":{"type":"flattened"}
}
}
As well as a more explicit mapping to cover the "defaultErrorStep" object.
{
"_source" : {
"enabled": true
},
"properties" : {
"name": {
"type": "text",
"fields": {
"raw":{"type": "keyword"}
}
},
"defaultErrorStep":{
"type":"object",
"properties":{
"id":{"type":"text"},
"name":{"type":"text"},
"type":{"type":"text"},
"reference":{"type":"text"},
"onError":{"type":"text"}
}
},
"steps":{
"type":"nested",
"properties":{
"id":{"type": "text"},
"name":{
"type": "text",
"fields": {
"raw":{"type": "keyword"}
}
},
"type":{"type": "text"},
"reference":{"type": "text"},
"onComplete":{"type": "text"},
"onError":{"type": "text"},
"parameterKeys":{"type": "object"},
"onTrue":{"type": "text"},
"onFalse":{"type": "text"},
"startingStep":{"type": "text"},
"steps":{
"type":"nested",
"properties":{
"id":{"type": "text"},
"name":{
"type": "text",
"fields": {
"raw":{"type": "keyword"}
}
},
"type":{"type": "text"},
"reference":{"type": "text"},
"onComplete":{"type": "text"},
"onError":{"type": "text"},
"parameterKeys":{"type": "object"},
"onTrue":{"type": "text"},
"onFalse":{"type": "text"},
"startingStep":{"type": "text"},
"steps":{
"type": "flattened",
"index":false
},
"end":{"type": "boolean"}
}
},
"end":{"type": "boolean"}
}
},
"configuration.settings":{"type":"flattened"}
}
}
Please also bear in mind that the document outlines a process/workflow of logic, so the structure is key, and as far as I can tell it is valid JSON. In theory the steps property could nest 3, 4, or 10 levels deep if it had to, so ideally I wouldn't want to update the mapping every time a new level is added to the data.
Any help anyone can give me to get this document to index would be much appreciated.
Thanks,
EDIT:
I have since removed my explicit mapping from my index and let dynamic mapping take over, as all my objects fit the base types that dynamic mapping supports. This has been successful, and I am able to index the document shown above with infinitely nested steps, no problem. I then tried the same operation with the same document structure using the Java SDK, and it failed with the same duplicate-field exception. This indicates to me that the issue is with the Java SDK and not something native to Elasticsearch itself.
Dynamic mapping is the better option in my case as I have no control over how many levels steps could eventually get to.
Has anyone experienced any issues with the SDK behaving differently to the base product?
I am running Elasticsearch 7.3.1, and with the following index mapping I am able to create an index with nested types inside a nested type.
PUT new_index_1
{
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"steps": {
"type": "nested",
"properties": {
"steps": {
"type": "flattened",
"index": false
}
}
},
"configuration.settings": {
"type": "flattened"
}
}
}
}
The following index creation also works for me:
PUT new_index_2
{
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"steps": {
"type": "nested",
"properties": {
"steps": {
"type": "nested"
}
}
}
}
}
}
Document Indexed
POST new_index_1/_doc
{
"name": "ajay",
"steps": [
{
"test": "working",
"steps": [
{
"name": "crow"
}
]
}
]
}
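For what it's worth, a quick way to sanity-check the nested mapping is a nested query against an inner field (a sketch against new_index_1, using the field names from the sample document above):
GET new_index_1/_search
{
  "query": {
    "nested": {
      "path": "steps",
      "query": {
        "match": { "steps.test": "working" }
      }
    }
  }
}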

CloudFormation AWS: connect RDS to subnets

I am trying to build a CloudFormation template, but I am having trouble connecting my Oracle RDS instance to my two subnets.
The resource in my template currently looks like this:
"3DCFDB": {
"Type": "AWS::RDS::DBInstance",
"Properties": {
"DBInstanceClass": "db.t2.micro",
"AllocatedStorage": "20",
"Engine": "oracle-se2",
"EngineVersion": "12.1.0.2.v13",
"MasterUsername": {
"Ref": "user"
},
"MasterUserPassword": {
"Ref": "password"
}
},
"Metadata": {
"AWS::CloudFormation::Designer": {
"id": "*"
}
},
"DependsOn": [
"3DEXPSUBPU",
"3DSUBPRI"
]
}
What property am I supposed to add to connect my RDS instance to the two subnets?
If I understand correctly, you need to create a resource of type AWS::RDS::DBSubnetGroup, and then inside your AWS::RDS::DBInstance you can refer to the subnet group with something similar to this:
"3DCFDB": {
"Type": "AWS::RDS::DBInstance",
"Properties": {
"DBInstanceClass": "db.t2.micro",
"AllocatedStorage": "20",
"Engine": "oracle-se2",
"EngineVersion": "12.1.0.2.v13",
"DBSubnetGroupName": {
"Ref": "DBsubnetGroup"
},
"MasterUsername": {
"Ref": "user"
},
"MasterUserPassword": {
"Ref": "password"
}
},
"Metadata": {
"AWS::CloudFormation::Designer": {
"id": "*"
}
},
"DependsOn": [
"3DEXPSUBPU",
"3DSUBPRI"
]
},
"DBsubnetGroup": {
"Type" : "AWS::RDS::DBSubnetGroup",
...
...
}
More info can be found here
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-rds-dbsubnet-group.html
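For completeness, the subnet group resource might look something like this (a sketch; it assumes 3DEXPSUBPU and 3DSUBPRI are the two subnet resources referenced in your DependsOn):
"DBsubnetGroup": {
    "Type": "AWS::RDS::DBSubnetGroup",
    "Properties": {
        "DBSubnetGroupDescription": "Subnets for the Oracle RDS instance",
        "SubnetIds": [
            { "Ref": "3DEXPSUBPU" },
            { "Ref": "3DSUBPRI" }
        ]
    }
}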
