I have a Clickhouse table that uses a Kafka engine.
However, I want to modify the kafka broker list of the table.
Is this possible? There seems to be no documentation.
This can be done by deleting and recreating the table that uses the Kafka engine. Because a table with this engine does not store data but only receives it, the operation should not negatively affect the service.
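A minimal sketch of that drop-and-recreate, assuming a hypothetical table named kafka_queue with placeholder columns, topic and broker addresses (capture the real definition with SHOW CREATE TABLE first):

    -- Placeholder names throughout; take the real column list and settings
    -- from SHOW CREATE TABLE kafka_queue before dropping anything.
    DROP TABLE kafka_queue;

    CREATE TABLE kafka_queue
    (
        ts      DateTime,
        payload String
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'new-broker-1:9092,new-broker-2:9092',
             kafka_topic_list  = 'events',
             kafka_group_name  = 'clickhouse-events',
             kafka_format      = 'JSONEachRow';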
Related
I am migrating some IoT data from MySQL to CnosDB, and then replicating some of the anomalies as well as the change/delta data to ClickHouse for further analysis.
Currently I have to do a full replication (similar to a full backup). Is there an easy way to replicate just the change data from CnosDB to ClickHouse? FYI, I am using Kafka in the middle to stream the data.
I have Debezium in a container, capturing all changes to PostgreSQL database records. In addition, I have a Kafka container to store the topic messages. Finally, I have a JDBC sink container to write all changes to another database.
These three containers are working as expected, taking snapshots of the existing data in specific tables and streaming new changes, which are reflected in the destination database.
I have figured out that during this streaming the PostgreSQL WAL keeps growing. To overcome this, I enabled the following property on the source connector so that the retrieved log segments can be cleared:
"heartbeat.interval.ms": 1000
Now the PostgreSQL WAL is cleared on every heartbeat as the retrieved changes are flushed. But even though the changes are committed into the secondary database, the Kafka topics remain the same size.
Is there any way or property in the sink connector that will force Kafka to delete committed messages?
Consumers have no control over topic retention.
You may edit the topic config directly to reduce the retention time, but then your consumer must read the data within that time.
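For example, the retention can be lowered programmatically through the Kafka AdminClient; a rough sketch, where the broker address, topic name and the one-hour retention value are all placeholders:

    // Hedged sketch: lower retention.ms for one topic via the AdminClient.
    // Broker address, topic name and the retention value are placeholders.
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class ShrinkRetention {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "dbserver1.public.mytable");
                Collection<AlterConfigOp> ops = List.of(
                        new AlterConfigOp(new ConfigEntry("retention.ms", "3600000"), AlterConfigOp.OpType.SET));
                admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
            }
        }
    }

Keep in mind that retention-based deletion happens on the broker's normal log-cleanup cycle; it does not remove messages the moment the sink commits them.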
I'm using Confluent Platform 3.3 to pull data from an Oracle database. Once the data has been pushed to the Kafka server, the retrieved data should be deleted from the database.
Is there any way to do this? Please suggest.
There is no default way of doing this with Kafka.
How are you reading your data from the database, using Kafka Connect, or with custom code that you wrote?
If the latter is the case, I'd suggest implementing the delete in your code: collect the IDs once Kafka has confirmed the send, and batch-delete them regularly.
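A rough sketch of that option in Java, assuming a hypothetical events table with an id primary key and a made-up topic name; rows are queued for deletion only after the broker has acknowledged the send:

    // Sketch only: "oracle-events", "events" and the id column are assumptions.
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.concurrent.ConcurrentLinkedQueue;

    public class SendThenDelete {
        // IDs of rows whose Kafka send has been acknowledged and that can now be deleted.
        private final ConcurrentLinkedQueue<Long> confirmedIds = new ConcurrentLinkedQueue<>();

        public void send(KafkaProducer<String, String> producer, long rowId, String payload) {
            producer.send(new ProducerRecord<>("oracle-events", payload), (metadata, exception) -> {
                if (exception == null) {
                    confirmedIds.add(rowId);   // queue for deletion only after the broker confirms
                }
            });
        }

        // Run this on a schedule to remove the confirmed rows in one batch.
        public void deleteConfirmed(Connection db) throws Exception {
            try (PreparedStatement stmt = db.prepareStatement("DELETE FROM events WHERE id = ?")) {
                Long id;
                while ((id = confirmedIds.poll()) != null) {
                    stmt.setLong(1, id);
                    stmt.addBatch();
                }
                stmt.executeBatch();
            }
        }
    }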
Alternatively, you could write a small job that reads your Kafka topic with a different consumer group than your actual target system and deletes rows based on the records it pulls from the topic. If you run this job every few minutes or hours, you can keep up with the sent data as well.
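And a hedged sketch of that clean-up job, again with made-up topic, group, connection and table names, and assuming the record key carries the source row's primary key:

    // Sketch only: all names and the key-equals-row-id assumption are illustrative.
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class SentRowCleaner {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "cleanup-job");   // separate group, does not disturb the real consumer
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 Connection db = DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass");
                 PreparedStatement delete = db.prepareStatement("DELETE FROM events WHERE id = ?")) {

                consumer.subscribe(Collections.singletonList("oracle-events"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                    for (ConsumerRecord<String, String> record : records) {
                        delete.setLong(1, Long.parseLong(record.key()));  // key assumed to hold the row id
                        delete.addBatch();
                    }
                    if (!records.isEmpty()) {
                        delete.executeBatch();
                        consumer.commitSync();   // advance offsets only after the deletes succeeded
                    }
                }
            }
        }
    }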
I am using Phoenix Query Server for read/write operations on an HBase table. Since HBase supports ACID properties, do I need to manage transactions externally from the Phoenix server? Transactions are currently in beta, and I am not sure about them.
I want to set up an Elasticsearch cluster using the multicast feature. One node is an external Elasticsearch node and the other is a node client (client property set to true, so it does not hold data).
This node client is created using Spring Data Elasticsearch. I want to index data from a PostgreSQL database into the external Elasticsearch node. I have indexed data using the JDBC river plugin.
But I want to know: is there any application I can use to index data from PostgreSQL instead of the river plugin?
It is possible to do this in realtime, although it requires writing a dedicated Postgres->ES gateway and using some Postgres-specific features. I've written about it here: http://haltcondition.net/2014/04/realtime-postgres-elasticsearch/
The principle is actually pretty simple; the complexity of the method I have come up with is due to handling corner cases such as multiple gateways running and gateways becoming unavailable for a while. In short, my solution is:
Attach a trigger to all tables of interest that copies the updated row IDs to a temporary table (a SQL sketch of this part follows the list).
The trigger also emits an async notification that a row has been updated.
A separate gateway (mine is written in Clojure) attaches to the Postgres server and listens for notifications. This is the tricky part, as not all Postgres client drivers support async notifications (there is a new experimental JDBC driver that does, which is what I use).
On update the gateway reads, transforms and pushes the data to Elasticsearch.
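A minimal SQL sketch of the trigger and notification steps above, with a plain staging table standing in for the temporary table; the documents table, its id column, the channel name and the other identifiers are placeholders:

    -- Placeholder names: "documents", "id", "es_sync" and "pending_es_updates".
    CREATE TABLE IF NOT EXISTS pending_es_updates (
        row_id  bigint,
        tbl     text,
        queued  timestamptz DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION queue_es_update() RETURNS trigger AS $$
    BEGIN
        -- Remember which row changed so the gateway can pick it up later.
        INSERT INTO pending_es_updates (row_id, tbl) VALUES (NEW.id, TG_TABLE_NAME);
        -- Wake any listening gateway immediately.
        PERFORM pg_notify('es_sync', TG_TABLE_NAME || ':' || NEW.id::text);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER documents_es_sync
    AFTER INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE PROCEDURE queue_es_update();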
In my experiments this model is capable of sub-second updates to Elasticsearch after a Postgres row insert/update. Obviously this will vary in the real world though.
There is a proof-of-concept project with Vagrant and Docker test frameworks here: https://bitbucket.org/tarkasteve/postgres-elasticsearch-realtime