The default setup with kafka-streams and spring-boot-actuator includes the state of the stream threads in the health check endpoint. This is great, but if my application uses a state store, and that store is still in the process of restoring, the application still reports as being Up. When I then try to interact with the store, I get one of the following errors about the stream thread not yet being in the RUNNING state:
Cannot get state store <storename> because the stream thread is STARTING, not RUNNING
Cannot get state store <storename> because the stream thread is PARTITIONS_ASSIGNED, not RUNNING
Is there any way I can include this stream thread status in the health checks, so that I can prevent requests from being routed to it until the state store is available again?
Related
Based on Confluent's suggestion here, before we query the store we have to wait for it to become queryable, but in the meantime we are still receiving requests (HTTP, for example) even though the application cannot service them.
Wouldn't it be preferable for this waiting to happen during boot?
Is there a way to wait until the state is "RUNNING" before the service is up and running?
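One approach, sketched below: expose the KafkaStreams state through a custom Spring Boot Actuator HealthIndicator that reports DOWN until the instance reaches RUNNING, so a load balancer or readiness probe stops routing requests while the store is restoring. This assumes your application exposes its KafkaStreams instance as a bean (adjust to your wiring):

```java
import org.apache.kafka.streams.KafkaStreams;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Reports DOWN until the KafkaStreams instance reaches RUNNING, i.e.
// until rebalancing/restoration has finished and the state stores are
// queryable again.
@Component
public class StreamsReadinessIndicator implements HealthIndicator {

    private final KafkaStreams streams; // assumed to be available as a bean

    public StreamsReadinessIndicator(KafkaStreams streams) {
        this.streams = streams;
    }

    @Override
    public Health health() {
        KafkaStreams.State state = streams.state();
        return state == KafkaStreams.State.RUNNING
                ? Health.up().withDetail("streamsState", state.name()).build()
                : Health.down().withDetail("streamsState", state.name()).build();
    }
}
```

With an indicator like this in place, the actuator health endpoint returns 503 while the threads are still in STARTING or PARTITIONS_ASSIGNED, which keeps traffic away until the stores are queryable.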
Starting from Substrate's Offchain Worker recipe, which leverages the Substrate http module, I'm trying to handle HTTP responses that are delivered as streams (basically interfacing a pub/sub mechanism with a chain through a custom pallet).
Non-stream responses are perfectly handled as-is, and reflecting them on-chain with signed transactions works for me, as advertised in the docs.
However, when the responses are streams (meaning the HTTP requests never complete), I only see the stream data logs in my terminal when I shut down the Substrate node. Trying to reflect each received chunk as a signed transaction doesn't work either: again, I see my logs only on node shutdown, and the transaction is never sent (which makes sense, since the node is down).
Is there an existing pattern for this use case? Is there a way to have the stream observed in the background (not in the offchain worker runtime)?
Actually, would it be good practice to keep the worker instance running indefinitely for this HTTP request? (In my configuration the HTTP request is sent only once, via a command-queue scheme in the pallet storage that gets cleaned at each block import.)
I've been using Flink and Kinesis Analytics recently.
I have a stream of data, and I also need a cache to be shared with the stream.
To share the cache data with the Kinesis stream, it's connected to a broadcast stream.
The cache source extends SourceFunction and implements ProcessingTimeCallback. It fetches the data from DynamoDB every 300 seconds and broadcasts it to the next stream, which uses a KeyedBroadcastProcessFunction.
But after adding the broadcast stream (in the previous version I had no cache and was using a KeyedProcessFunction for the Kinesis stream), when I execute it in Kinesis Analytics, it keeps restarting about every 1000 seconds without any exception!
I have no configuration with this value, and the job runs fine in between restarts!
Could anybody help me figure out what the issue could be?
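For reference, a minimal self-contained sketch of the wiring described above; the inline sources, String types, and key selector are placeholders standing in for the real Kinesis source and the DynamoDB-polling source:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastCacheJob {

    // Descriptor for the broadcast (cache) state; String types are placeholders.
    static final MapStateDescriptor<String, String> CACHE = new MapStateDescriptor<>(
            "cache", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-ins for the real Kinesis source and the DynamoDB-polling source.
        DataStream<String> events = env.fromElements("a", "b", "c");
        BroadcastStream<String> cacheUpdates = env.fromElements("cache-row").broadcast(CACHE);

        events.keyBy(e -> e)
              .connect(cacheUpdates)
              .process(new KeyedBroadcastProcessFunction<String, String, String, String>() {
                  @Override
                  public void processElement(String value, ReadOnlyContext ctx,
                                             Collector<String> out) {
                      // Read-only access to the broadcast cache is available here.
                      out.collect(value);
                  }

                  @Override
                  public void processBroadcastElement(String value, Context ctx,
                                                      Collector<String> out) throws Exception {
                      // Every parallel instance keeps (and checkpoints) a full
                      // copy of this broadcast state.
                      ctx.getBroadcastState(CACHE).put("latest", value);
                  }
              })
              .print();

        env.execute("broadcast-cache-sketch");
    }
}
```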
My first thought is to wonder if this might be related to checkpointing. Do you have access to the server logs? Flink's logging should make it somewhat clear what's causing the restart.
The reason why I suspect checkpointing is that it occurs at predictable times (and with a long timeout), and using broadcast state can put a lot of pressure on checkpointing. Each parallel instance will checkpoint a full copy of the broadcast state.
Broadcast state has to be kept on-heap, so another possibility is that you are running out of memory.
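If it is checkpointing, the interval, timeout, and failure tolerance are the settings to look at first: with the default tolerance of zero, a single failed (for example, timed-out) checkpoint restarts the job. A sketch with illustrative values follows; note that Kinesis Analytics manages much of this configuration for you, so check the service-level settings as well:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoint every 5 minutes and treat a checkpoint that takes longer than
// 10 minutes as failed. By default a failed checkpoint restarts the job;
// raising the tolerance keeps the job running through transient failures.
env.enableCheckpointing(300_000);
env.getCheckpointConfig().setCheckpointTimeout(600_000);
env.getCheckpointConfig().setTolerableCheckpointFailureNumber(3);
```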
Given a recently started Kafka Streams app, how can one reliably determine that it has reached the "RUNNING" state? This is in the context of a test program that launches one or more streams apps and needs to wait until they are running before submitting test messages.
I know about the .setStateListener method, but I'm wondering if there is a way of detecting this state from outside the app process. I thought it might be exposed as a JMX metric, but I couldn't find one in VisualVM.
The state listener method is the way to go. There is no other out-of-the-box way to achieve what you want.
That said, you can do the following for example:
1. Expose a simple "health check" (or "running yes/no check") in your Kafka Streams application, e.g. via a REST endpoint (use whatever REST tooling you are familiar with); a sketch of this option appears at the end of this answer.
2. The health check can be based on Kafka Streams' built-in state listener, which you already know about.
3. Your test program can then remotely query the health check endpoints of your various Kafka Streams applications to determine when all of them are up and running.
Of course, you can use other ways to communicate readiness of a Kafka Streams application. The REST endpoint idea in (1) is just one example.
You can also let the Kafka Streams application write its readiness status into a Kafka topic, and your test program will subscribe to that topic to determine when all apps are ready.
Another option would be to provide a custom JMX metric in your Kafka Streams apps that your test program can then access.
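Here's a minimal sketch of option (1), assuming `streams` is your already-configured KafkaStreams instance; the JDK's built-in HTTP server is used only to keep the example self-contained (note that setStateListener must be called before start):

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.streams.KafkaStreams;

// Flip a readiness flag from the built-in state listener...
AtomicBoolean ready = new AtomicBoolean(false);
streams.setStateListener((newState, oldState) ->
        ready.set(newState == KafkaStreams.State.RUNNING));
streams.start();

// ...and expose it over a bare-bones HTTP endpoint for the test program.
HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
server.createContext("/health", exchange -> {
    byte[] body = (ready.get() ? "UP" : "DOWN").getBytes();
    exchange.sendResponseHeaders(ready.get() ? 200 : 503, body.length);
    exchange.getResponseBody().write(body);
    exchange.close();
});
server.start();
```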
Kafka's state store (RocksDB) is fault tolerant. How can the data of a store that is no longer functioning be restored from the changelog?
The restoration of all built-in storage engines in the Kafka Streams API is fully automated.
Further details are described at http://docs.confluent.io/current/streams/developer-guide.html#fault-tolerant-state-stores, some of which I quote here:
In order to make state stores fault-tolerant (e.g., to recover from machine crashes) as well as to allow for state store migration without data loss (e.g., to migrate a stateful stream task from one machine to another when elastically adding or removing capacity from your application), a state store can be continuously backed up to a Kafka topic behind the scenes. We sometimes refer to this topic as the state store’s associated changelog topic or simply its changelog. In the case of a machine failure, for example, the state store and thus the application’s state can be fully restored from its changelog. You can enable or disable this backup feature for a state store, and thus its fault tolerance.
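So restoration is automatic as long as changelogging is enabled for the store, which it is by default. For completeness, a sketch of where that toggle lives in the Streams DSL; the topic and store names are placeholders:

```java
import java.util.Collections;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder builder = new StreamsBuilder();

// Changelogging is on by default. withLoggingEnabled lets you override the
// changelog topic's configuration; withLoggingDisabled turns the backup
// (and with it the store's fault tolerance) off entirely.
builder.table("input-topic",
        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("my-store")
                .withLoggingEnabled(Collections.emptyMap()));
```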