I have a Kafka cluster running on AWS MSK, with Kafka producer and consumer Go clients running in Kubernetes. The producer is responsible for sending the stream of data to Kafka. I need help solving the following problems:
1. Let's say there is a code change in the producer and I have to redeploy it in Kubernetes. How can I do that? Since the data is generated continuously, I cannot simply stop the already running producer and deploy the updated one; I would lose the data produced during the update.
2. Sometimes the client crashes due to a panic in the Go code, but since it is running as a pod, Kubernetes restarts it. I cannot tell whether that is a good thing or a bad thing.
Thanks
For your first question, I would suggest using a rolling update for your Deployment in the cluster.
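For illustration, a minimal sketch of such a Deployment (all names and the image are placeholders); pair it with a graceful shutdown in the producer (flush pending messages on SIGTERM) so nothing in flight is dropped:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-producer              # placeholder name
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0             # never stop an old pod before its replacement is Ready
      maxSurge: 1                   # start one extra pod during the rollout
  selector:
    matchLabels:
      app: kafka-producer
  template:
    metadata:
      labels:
        app: kafka-producer
    spec:
      terminationGracePeriodSeconds: 30   # time for the producer to flush on SIGTERM
      containers:
      - name: producer
        image: registry.example.com/kafka-producer:v2   # placeholder image
```

With maxUnavailable: 0, Kubernetes only terminates an old pod after its replacement reports Ready, so there is always a producer running during the rollout.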
For the second question, that is the general behavior of Deployments in Kubernetes. I could imagine an external monitoring solution that removes your application, or stops routing requests to it, when it panics.
It would help if you could explain why exactly you need that kind of behavior.
We are going to work on a Spring Boot application that will be deployed on two ECS containers to support a clustered environment. The application accepts requests and drops messages into SQS. Another flow in the same application picks messages from the queue and processes them. Since the same application runs on two different servers in the cluster, I am not sure which server will pick a message from the queue. How can I make sure that only one server picks up each message? It could be either server.
Ordinary SQS queues do not even guarantee that a message appears only once on the queue - see the AWS docs on standard SQS queues.
Using a reasonable value for the visibility timeout - the period during which a message cannot be seen by other consumers - relative to the time it takes to consume a message should solve this.
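As a sketch with the AWS SDK for Java v2 (the queue URL and the 60-second timeout are placeholders), the visibility timeout hides a received message from the other server, and deleting it after successful processing is what prevents redelivery:

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class QueueWorker {
    public static void main(String[] args) {
        // Placeholder queue URL
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue";

        try (SqsClient sqs = SqsClient.create()) {
            ReceiveMessageRequest receive = ReceiveMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .maxNumberOfMessages(1)
                    .waitTimeSeconds(20)     // long polling
                    .visibilityTimeout(60)   // message stays hidden from the other server while we work
                    .build();

            for (Message m : sqs.receiveMessage(receive).messages()) {
                process(m.body());           // must finish well inside the visibility window
                // Deleting the message prevents the other server from ever seeing it
                sqs.deleteMessage(DeleteMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .receiptHandle(m.receiptHandle())
                        .build());
            }
        }
    }

    private static void process(String body) {
        System.out.println("processing: " + body);
    }
}
```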
Alternatively, you can use an SQS FIFO queue, but it's much slower and can, in my experience, get stuck on a corrupt message.
I have a Java program to run in Apache Flink on AWS, and I want real-time communication through a websocket. How can I integrate a serverless websocket with Apache Flink in Java?
Thank you
Flink is designed to help you process and move data continuously between storage or streaming solutions. It is not intended to serve websockets directly, and would not work well doing so, for these reasons:
When submitting a job, the runtime serializes your logic and moves it to other TaskManager instances so that it can parallelize them. These can be on another machine entirely. Now, if you were intending to serve a websocket from that code, it has just moved elsewhere!
TaskManagers can be stopped and restarted (scaling event, recovering from a checkpoint/savepoint, etc.). That's when your websocket connection would be cut.
Also, the Flink planner can decide that your source functions need to be read twice if it helps the processing. This means that your websockets would need to maintain a history of the messages received and make sure each one is sent once to every operator instance.
That being said, you can have a webserver manage the websocket, piping messages back and forth to a Kafka topic, which Flink can then operate on.
Since you're talking about AWS, I suggest you learn about their WebSocket API Gateway service. I believe it can be connected easily with Kinesis, which Flink can read from and write to easily.
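To illustrate the Kafka variant of that pattern, here is a minimal Flink job sketch (broker address, topic, and group id are placeholders) that consumes the websocket-fed topic:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WebsocketFeedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The webserver managing the websocket writes each frame to this topic;
        // "ws-messages" and the broker address are placeholders.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("ws-messages")
                .setGroupId("flink-ws-consumer")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "websocket feed")
           .print(); // replace with your actual processing

        env.execute("websocket-via-kafka");
    }
}
```

Because Kafka retains the messages, Flink can restart, rescale, or re-read the source without the websocket clients noticing, which sidesteps all three problems above.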
The current flow of the project I'm working on involves pushing to a local Kafka using the ruby-kafka gem.
Now the need has arisen to add a producer for a remote Kafka and to duplicate the messages there as well.
I'm looking for a better way than calling Kafka.new(...) twice...
Could you please help me - do you happen to have any ideas?
Another approach to consider would be writing the data once from your application, and then asynchronously replicating the messages from one Kafka cluster to the other. There are multiple ways of doing this, including Apache Kafka's MirrorMaker, Confluent's Replicator, Uber's uReplicator, etc.
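For example, a minimal MirrorMaker 2 configuration (cluster aliases and bootstrap addresses are placeholders) that mirrors every topic from the local cluster to the remote one, started with bin/connect-mirror-maker.sh mm2.properties:

```properties
# mm2.properties - cluster aliases and bootstrap addresses are placeholders
clusters = local, remote
local.bootstrap.servers = localhost:9092
remote.bootstrap.servers = remote-broker:9092

# replicate local -> remote only
local->remote.enabled = true
local->remote.topics = .*
remote->local.enabled = false
```

Your application keeps a single Kafka.new(...) pointed at the local cluster, and the replication process handles the remote copy.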
Disclaimer: I work for Confluent.
I have a consumer that reads messages and writes them to a time-series database. We have multiple instances of the time-series database running as a cluster on multiple physical machines.
Our plan is to deploy the consumer on Kubernetes so I can scale it up with load balancing if I need more instances; they all point to the single time-series service that is running.
Now I'm running into an issue: if I have 5 instances consuming the same topic, they all work individually, meaning each one receives the message payload and saves it, just as a single instance would.
What we want is this: if one consumer is busy, the message should go to the next free instance, rather than being consumed by every running instance. To scale or load-balance, I want the consumers to behave like a normal load-balanced application, the way a Spring Boot app works when scaled on Kubernetes.
So is there a way to make the consumers load-balance so that each message is processed by only one of them - whether the 1st, 2nd, or 3rd - the way a normal app behind a load balancer works?
If anyone has ideas about this: how will it behave, and what kind of output will we get if we do this with a Kafka Spring Boot application?
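For what it's worth, Kafka's consumer groups provide exactly this load balancing: if all instances subscribe with the same group.id, each partition of the topic is assigned to exactly one instance, so every message is processed once. A minimal Spring Kafka sketch (the topic name and group id are hypothetical, and spring-kafka is assumed on the classpath):

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class TimeSeriesConsumer {

    // All 5 instances use the same groupId, so Kafka assigns each partition
    // to exactly one instance; a message is therefore handled by one pod only.
    @KafkaListener(topics = "measurements", groupId = "ts-writer")
    public void consume(String message) {
        // write to the time-series database here
        System.out.println("processing: " + message);
    }
}
```

Note that parallelism is capped by the partition count: with 5 instances consuming a 3-partition topic, 2 instances will sit idle.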
I'm starting with Kafka, and I need to capture the inserts in a specific Oracle table and send the new records through Kafka as they happen. I have no control over the database, so in principle Debezium is excluded. How can I do this without using triggers?
I've made a producer that reads data from Oracle with a Java program in Eclipse, but that makes constant requests to the database. I use Java to simulate an ETL with the consumer.
PS: I work with Windows, but that's secondary.
If I understand your problem correctly, you are trying to route inserts from Kafka to an Oracle database. There are a few possibilities:
1. Implement a Kafka consumer that makes an insert as soon as a message arrives on your Kafka cluster. You could reuse your Java code here - just remove the polling part (see the sketch at the end of this answer).
2. If you have Kafka deployed in a cloud environment and are using it as a service (AWS MSK), you have the option of handling the events there. Again, you can use a Java program or write a Python script to make the inserts.
I would also like to understand your throughput requirements - whether you really need Kafka as a distributed messaging system, or whether a simple AWS SQS queue would work just fine. If you can use SQS, things would be straightforward: you create a queue and write a listener in Python or Java (boto3 is an excellent Python library for working with SQS).
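To make the first option concrete, here is a minimal sketch (broker address, topic, table, and connection details are all placeholders) of a Kafka consumer inserting each record into Oracle via JDBC:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OracleSinkConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder
        props.put("group.id", "oracle-sink");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection db = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//db-host:1521/ORCL", "user", "password")) { // placeholders
            consumer.subscribe(List.of("events"));
            PreparedStatement insert =
                    db.prepareStatement("INSERT INTO events_table (payload) VALUES (?)");
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    insert.setString(1, record.value());
                    insert.executeUpdate(); // one insert per consumed message
                }
            }
        }
    }
}
```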