How to add new table to Debezium MySQL connector? - apache-kafka-connect

I have Debezium MySQL connector:
{
"name": "debezium_mysql",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "***",
"database.port": "3306",
"database.user": "kafkaconnect",
"database.password": "${file:/connect-credentials.properties:mysql_pass}",
"database.server.name": "mysql",
"heartbeat.interval​.ms": 5000,
"snapshot.mode": "when_needed",
"database.include.list": "compare_sports",
"table.include.list": "compare_sports.matches,compare_sports.games",
"database.history.kafka.topic": "mysql_compare_sports_history",
"database.history​.kafka.recovery​.poll.interval.ms": 5000,
"database.history.kafka.bootstrap.servers": "***:9092",
"include.schema.changes": "false",
"transforms": "extractInt",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field": "id"
}
}
And I want to add a new table (which has existed for a long time) from another database in the same MySQL instance. After adding it to the include list, I get this error:
Encountered change event for table new_database.new_table whose schema isn't known to this connector
I've tried creating a new connector with snapshot.mode: initial, but then only a new mysql_history topic is created and no new_database.new_table topic appears.
What should I do to add a new table from a new database to the existing connector?
Thanks

You can create a new connector, set snapshot.mode to initial, and specify a different database.server.id.
Since the table already exists, you should also include a snapshot select override, because the default snapshot implementation selects all rows from the existing table while holding a lock, which prevents writers from writing to the database. You definitely don't want that.
{
...
"snapshot.mode"="initial",
"database.server.id": "different",
"snapshot.select.statement.overrides":"new-table",
"snapshot.select.statement.new-table":"select * .. where <>"
...
}
Alternatively, if you want to keep the same connector name, you have to stop the connector and manually clear its offset from Kafka Connect's internal offsets topic (the one configured by offset.storage.topic, typically named connect-offsets). Then, when the connector is recreated with the same name, it will start fresh and snapshot according to the snapshot configuration you have provided.
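If you go the manual route, the usual recipe is roughly the following (a sketch, assuming the offsets topic is named connect-offsets, the connector name is debezium_mysql as in the config above, and the kcat/kafkacat CLI is available; verify the exact key by consuming the topic first):
# 1. Find the key Debezium stores its offset under (format: ["<connector-name>",{"server":"<database.server.name>"}])
kcat -b ***:9092 -t connect-offsets -C -f 'Key: %k Value: %s\n'
# 2. With the connector stopped, produce a NULL (tombstone) value for that key to clear the offset
echo '["debezium_mysql",{"server":"mysql"}]|' | kcat -b ***:9092 -t connect-offsets -P -K '|' -Z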
For more such cases, you can go through this blog post (disclosure: I'm the author).

Since Debezium 1.6 you should be able to send a signal to trigger an ad-hoc snapshot of the newly added table.
https://debezium.io/documentation/reference/1.7/configuration/signalling.html
https://debezium.io/blog/2021/05/06/debezium-1-6-alpha1-released/
Sending a signal is done by inserting a record into a dedicated signaling table:
INSERT INTO myschema.debezium_signal VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["schema1.table1", "schema1.table2"]}')
To make this feature available, you should first create this table:
CREATE TABLE debezium_signal (id VARCHAR(42) PRIMARY KEY, type VARCHAR(32) NOT NULL, data VARCHAR(2048) NULL);
* syntax may differ between database types
Also the signal table must be added among the captured data collections, i.e., included in the table.include.list parameter.
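Concretely, the connector config then points at that table via the signal.data.collection property and captures the table itself, roughly like this (a sketch; the schema/table names are illustrative and follow the examples above and the question's config):
{
...
"signal.data.collection": "compare_sports.debezium_signal",
"table.include.list": "compare_sports.matches,compare_sports.games,compare_sports.debezium_signal,new_database.new_table",
...
}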

Related

Is there a way to link the auto-generated created_by field to users (create a relation) rather than having it related to the admin Users?

I'm new to Strapi. I tried creating a new collection and updating a value using Postman against my endpoint.
The problem I'm having is that the "created_by" field seems to be auto-generated and will not allow me to update it with a created user's credentials/id; it always picks the admin id.
I'm lost on this: how can you relate the "created_by" field to your defined users rather than to the admin table?
You shouldn't use "created_by" field.
You need a relation field between users and collection
Such as :
Relation with User (from: users-permissions)
Please check this "many to one relation" example.
So projectroot\api\bird\models\bird.settings.json will have the following lines:
"user": {
"via": "birds",
"plugin": "users-permissions",
"model": "user"
},
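With that relation in place, you set the owner explicitly when creating an entry, for example by sending the related user's id in the request body (a sketch; the endpoint, field value, and id are illustrative and depend on your collection):
POST /birds
{
"name": "Sparrow",
"user": "5f3c9a2e1c9d440000a1b2c3"
}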

Using numerics as type in Elasticsearch

I am going to store transaction logs in Elasticsearch. I am new to the ELK stack and not sure how I should implement this. My transaction prints log lines sequentially (upserts), and instead of logging these to a file I want to store them in Elasticsearch and later query the logs by the transactionId I have created.
Normally the URI for querying will be
/bookstore/books/_search
but in my case it must be like
/transactions/transactionId/_search
because I don't want to store the lines as an array attached to a single transaction record. But I am not sure whether creating a new type at the beginning of every transaction is good practice; I am not even sure if it is possible.
Can you give advice about storing this transaction data in Elasticsearch?
If you want to query with a URI like /transactions/transactionId/_search, that means you are planning to create a new type every time a new transactionId comes in. Apart from this being a bad design, it is not even possible to have more than one type in an index since version 6.x, and types have been removed completely since version 7.x.
One work-around is to use the transactionId itself as the document ID at creation time. Then you can get the log associated with one transactionId by querying GET transactions/_doc/<transactionId> (read about the length restrictions on document IDs, though). But this causes another issue: there can be multiple logs for the same transaction, so each log entry having the same id would simply overwrite the previous entry.
The best solution here will be to change how you query those records.
For this you can put transactionId as one of the fields in the JSON body, along with perhaps a created timestamp at insertion time (let ES create the documents with auto-generated ids), and then query all logs associated with a transaction like:
POST transactions/_search
{
"sort": [
{
"createdDate": {
"order": "asc"
}
}
],
"query":{
"bool":{
"must":[
{
"term":{
"transactionId.keyword":"<transaction id>"
}
}
]
}
}
}
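A log entry indexed for that query could then look something like this (the field names match the query above; the values and the message field are illustrative):
POST transactions/_doc
{
"transactionId": "txn-42",
"createdDate": "2021-10-05T12:00:00Z",
"message": "step 3: payment authorized"
}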
Hope this helps.

Laravel 5 Eloquent model dynamically generated at runtime

I am looking to make some sort of "GenericModel" class extending Eloquent's Model class, that can load database configuration (like connection, table name, primary key column) as well as relationships at runtime based on a configuration JSON file.
My reasons for wanting this are as follows: I'm going to have a lot of database tables and thus a lot of models, but most don't really have any complicated logic behind them. I've developed a generic CRUD API and front-end interface to interact with them. Each model has a "blueprint" JSON file associated with it that describes things like its attributes and relationships. This lets me automatically generate, say, a view to create a new model and it knows what attributes I need to fill in, what input elements to use, what to label them, which are mandatory, how to validate, whether to check for uniqueness, etc. without ever needing code specific to that model. Here's an example, project.json:
{
"db_table": "projects",
"primary_key": "projectid",
"display_attr": "title", // Attribute to display when picking row from list, etc
"attributes": {
"projectid": { // Attribute name matches column name
"display": "ID", // Display this to user instead of db column name
"data_type": "integer" // Can be integer, string, numeric, bool...
},
"title": {
"data_type": "string",
"unique": true // Check for uniqueness when validating field
},
"customer": {
"data_type": "integer", // Data type of local key, matches customer PK
"relationship": { // Relationship to a different model
"type": "manytoone",
"foreign_model": "customer"
},
"user": "autocomplete" // User input element/widget to use, queries customer model for matches as user types
},
"description": {
"data_type": "string",
"user": "textarea" // Big string, use <textarea> for user input
"required": false // Can be NULL/empty, default true
}
},
"views": {
"table": [ // Show only these attributes when viewing table
"customer",
"title"
],
"edit_form": [ // Show these when editing
"customer",
"title",
"description"
],
...
}
}
This works extremely well on the front end, I don't need any more information than this to describe how my models behave. Problem is I feel like I just end up writing this all over again in most of my Model classes and it seems much more natural to have them just pull information from the blueprint file as well. This would result in the information being in one place rather than two, and would avoid extra effort and possible mistakes when I change a database table and only need to update one file to reflect it.
I'd really just like to be able to do something like GenericModel::blueprint('project.json')->find($id) and get a functioning "project" instance. Is this possible, or even advisable? Or is there a better way to do this?
Have you looked at Migrations (was Schema Builder)? It allows you to programmatically build models (from JSON if necessary).
Then you could leverage Eloquent for your queries...
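As for the GenericModel::blueprint('project.json')->find($id) idea itself, a minimal sketch could look like this (assuming a recent Laravel 5.x where Model::setTable() and Model::setKeyName() are available; it only covers table, key, and an optional hypothetical connection key, not relationships or validation, and models hydrated from query results may need the same treatment):
<?php
use Illuminate\Database\Eloquent\Model;

class GenericModel extends Model
{
    // Load table name, primary key and (optionally) connection from a blueprint JSON file at runtime
    public static function blueprint($file)
    {
        $bp = json_decode(file_get_contents($file), true);

        $instance = new static();
        $instance->setTable($bp['db_table']);
        $instance->setKeyName($bp['primary_key']);
        if (isset($bp['connection'])) {
            $instance->setConnection($bp['connection']);
        }

        return $instance;
    }
}

// Usage: $project = GenericModel::blueprint('project.json')->newQuery()->find($id);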

Changing data in every document

I am working on an application that has messages, and I want to store all the messages. But my problem is that the message has a "from" with a first name and last name, which could change. So if, for example, my JSON was
{
"subject": "Hello!",
"message": "Hello there",
"from": {
"user_id": 1,
"firstname": "George",
"lastname": "Lastgeorge"
}
}
The user could potentially change their last name or even their first name, which would basically require looping over every record in Elasticsearch and updating every one with that user_id.
Is there a better way to go about doing this?
I feel you should use parent mapping.
Keep the user info as parent with userID as key.
/index/userinfo/userID
{
"name" : "George",
"last" : "Lastgeorge"
}
Next, you need to maintain each chat message as a child document and map the parent to the userinfo type.
This way, whenever you want to make a change to the user information, you simply make the change in the userinfo type.
With this feature in place, you can search your logs based on user information, or search users based on chat records.
Link - http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child.html
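For reference, with the legacy _parent mapping that guide describes (pre-6.x Elasticsearch; newer versions use a join field instead), the setup would look roughly like this (a sketch; the index and type names are illustrative):
PUT /chat
{
"mappings": {
"userinfo": {},
"message": {
"_parent": { "type": "userinfo" }
}
}
}

PUT /chat/userinfo/1
{ "name": "George", "last": "Lastgeorge" }

PUT /chat/message/100?parent=1
{ "subject": "Hello!", "message": "Hello there" }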

MongoDB schema for JSON feed

Looking for a little guidance in setting up a MongoDB schema. Here's the scenario:
I'm creating a save-bookmark feature for people. In the DB, all I need to store is the username, a title, and a link. From this, I would need to create a service that outputs JSON and queries a particular bookmark or a person's entire feed. Which of the two setups makes more sense from both an implementation and a performance standpoint?
A) Each bookmark is its own object:
{
"_id": ObjectId("abcd1234"),
"username": "Choy",
"title": "This is my first link",
"url": "http://www.google.com"
},
{
"_id": ObjectId("abcd1234"),
"username": "Choy",
"title": "This is my second link",
"url": "http://www.bing.com"
}
B) Each user is its own object:
{
"_id": "Choy",
"bookmarks": {
"abcd1234": {
"title": "This is my first link",
"url": "http://www.google.com"
},
"abcd12345": {
"title": "This is my second link",
"url": "http://www.bing.com"
}
}
}
Initially (A) made more sense to me, as I could easily query a specific bookmark, update it, and remove it. But from the application point of view, (B) would be easier when I want to list all the bookmarks for a person, as I could just do a findOne(username) on the _id instead of having to iterate through each record after doing a find(username), convert to an array, and then to JSON (which I believe is a bit memory intensive).
On the other hand, (B) makes adding a new bookmark an extra step, as I would have to get the record, push the new bookmark into it, and then save.
When you have a has-a relation in MongoDB, it is usually the best decision to embed the data in the object which owns it.
Your goal is to fulfill the needs of the user with as few searches as possible, because every single document lookup costs time. When you don't need all the bookmarks from a user but only specific ones, you can always use the dot notation to reach into objects and retrieve subsets of fields.
Aggregation instead of relation is also useful when you delete or rename a user. MongoDB can't do auto-cascade like SQL databases, so you have to deal with any orphaned data yourself in that case. But when the user document is self-contained, this won't be a problem.
So I would recommend going with solution (B).
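For example, with solution (B) you can read a single bookmark via dot notation and add a new one in place without the read-modify-write cycle mentioned above (a sketch in the mongo shell; the collection name users, the new bookmark key, and its values are illustrative, and updateOne requires shell 3.2+, older shells use update()):
// read only one bookmark from the user document
db.users.findOne({ "_id": "Choy" }, { "bookmarks.abcd1234": 1 })

// add a bookmark in place, no need to fetch and re-save the whole document
db.users.updateOne(
{ "_id": "Choy" },
{ "$set": { "bookmarks.abcd12346": { "title": "This is my third link", "url": "http://www.yahoo.com" } } }
)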
