Kafka state store not available in distributed environment - spring-boot

I have a business application with the following versions:
spring-boot (2.2.0.RELEASE)
spring-kafka (2.3.1.RELEASE)
spring-cloud-stream-binder-kafka (2.2.1.RELEASE)
spring-cloud-stream-binder-kafka-core (3.0.3.RELEASE)
spring-cloud-stream-binder-kafka-streams (3.0.3.RELEASE)
We have around 20 batches. Each batch uses 6-7 topics to handle the business, and each service has its own state store to maintain the status of the batch, i.e. whether it is running or idle.
We are using the code below to query the store:
@Autowired
private InteractiveQueryService interactiveQueryService;

public ReadOnlyKeyValueStore<String, String> fetchKeyValueStoreBy(String storeName) {
    while (true) {
        try {
            log.info("Waiting for state store");
            return new ReadOnlyKeyValueStoreWrapper<>(interactiveQueryService.getQueryableStore(storeName,
                    QueryableStoreTypes.<String, String> keyValueStore()));
        } catch (final IllegalStateException e) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e1) {
                e1.printStackTrace();
            }
        }
    }
}
When we deploy the application on one instance (a Linux machine) everything works fine. When we deploy it on two instances we see the following:
The state store is available on one instance but not on the other.
When a request is processed by the instance that has the state store, everything is fine.
If a request lands on the instance that does not have the state store, the application waits indefinitely in the while loop (code snippet above).
While that instance is waiting indefinitely, if we kill the other instance the code above returns the store and processing works perfectly.
No clue what we are missing.

When you have multiple Kafka Streams processors running with interactive queries, the code that you showed above will not work the way you expect. It only returns results if the keys that you are querying are on the same server. In order to fix this, you need to add the property spring.cloud.stream.kafka.streams.binder.configuration.application.server: <server>:<port> on each instance. Make sure to change the server and port to the correct ones on each instance. Then you have to write code similar to the following:
org.apache.kafka.streams.state.HostInfo hostInfo = interactiveQueryService.getHostInfo("store-name",
        key, keySerializer);

if (interactiveQueryService.getCurrentHostInfo().equals(hostInfo)) {
    // query from the store that is locally available
} else {
    // query from the remote host
}
Please see the reference docs for more information.
Here is sample code that demonstrates this approach.
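As a rough, unofficial sketch of that pattern (not the sample referenced above): it assumes a String-keyed store, that application.server is set as described, and that every instance exposes a hypothetical REST endpoint /state/{store}/{key} serving values from its local store, so a lookup for a key owned by another instance can be forwarded there.

import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.state.HostInfo;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.stream.binder.kafka.streams.InteractiveQueryService;
import org.springframework.web.client.RestTemplate;

public class StateStoreLookup {

    @Autowired
    private InteractiveQueryService interactiveQueryService;

    private final RestTemplate restTemplate = new RestTemplate();

    public String queryByKey(String storeName, String key) {
        // Ask the binder which instance hosts this key (requires application.server to be set).
        HostInfo hostInfo = interactiveQueryService.getHostInfo(storeName, key, new StringSerializer());

        if (interactiveQueryService.getCurrentHostInfo().equals(hostInfo)) {
            // The key is owned by this instance: query the local store directly.
            ReadOnlyKeyValueStore<String, String> store = interactiveQueryService.getQueryableStore(
                    storeName, QueryableStoreTypes.<String, String>keyValueStore());
            return store.get(key);
        }

        // The key is owned by another instance: forward the lookup to that host.
        // The /state/{store}/{key} endpoint is an assumption; expose whatever RPC you prefer.
        String url = String.format("http://%s:%d/state/%s/%s", hostInfo.host(), hostInfo.port(), storeName, key);
        return restTemplate.getForObject(url, String.class);
    }
}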

Related

handle shutdown of Java AWS Lambda

Is there a way to hook into a Lambda's shutdown? I am opening a database connection and want to keep it open, but I want to make sure it gets closed when the Lambda is terminated.
You are probably interested in an event that is fired when the Lambda instance is being killed, not when a single invocation ends, right? There is an option for each, though I doubt they'll really help you.
You can either use the context method getRemainingTimeInMillis() (the link is for Node.js, but it is similar in other programming languages) to find out when the current invocation of your Lambda function times out. This might be helpful for cleaning things up, or for using the time of your Lambda function to the full extent. I don't recommend cleaning up your database connections at the end of each invocation, because then you won't reuse them for future invocations, which slows down your Lambda function. But if you're okay with that, go for it. Remember that this only works as long as your function is running: as soon as you have returned a response, you can't perform any cleanup operations, because your Lambda function goes into a 'sleep mode'. You need to do this before you return something.
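For illustration only, a minimal handler using getRemainingTimeInMillis() (assumes the standard aws-lambda-java-core Context; the handler class and the 2-second threshold are made up):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class CleanupAwareHandler implements RequestHandler<String, String> {

    @Override
    public String handleRequest(String input, Context context) {
        // ... do the actual work here ...

        // getRemainingTimeInMillis() reports how close this invocation is to its timeout.
        // Any cleanup has to happen before returning: once the response is sent, the
        // execution environment is frozen and no further code runs.
        if (context.getRemainingTimeInMillis() < 2_000) {
            // e.g. close the database connection instead of risking a hard timeout
        }
        return "done";
    }
}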
Alternatively, you can make use of the Extensions API. It offers a shutdown phase and triggers an extension with a Shutdown event. However, since an extension sits beside your function (and not within your function code), I'm not sure whether you'd have a chance to clean up any database connections with this approach... See also Lambda Execution Environment for additional information.
Assuming you have a pooled connection for a warm Lambda, you can register a shutdown hook to close the DB connection or release any other resources; you only have 500 ms to perform this task.
import com.zaxxer.hikari.HikariDataSource;
import javax.sql.DataSource;

class EnvironmentConfig {

    private static final Object lock = new Object();
    private static volatile boolean shutdownRegistered;
    private static volatile HikariDataSource ds;

    private void registerShutdownHook() {
        if (!shutdownRegistered) {
            synchronized (lock) {
                if (!shutdownRegistered) {
                    // Close the pool when the execution environment shuts down.
                    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                        if (ds != null) {
                            ds.close();
                        }
                    }));
                    EnvironmentConfig.shutdownRegistered = true;
                }
            }
        }
    }

    public DataSource dataSource() {
        HikariDataSource _ds = EnvironmentConfig.ds;
        if (_ds == null) {
            synchronized (lock) {
                _ds = EnvironmentConfig.ds;
                if (_ds == null) {
                    _ds = new HikariDataSource();
                    // TODO: set connection props
                    EnvironmentConfig.ds = _ds;
                    registerShutdownHook();
                }
            }
        }
        return _ds;
    }
}
You can reference the datasource anywhere to get a singleton copy, which will create the instance and register the shutdown hook.
Your shutdown hook could do other tasks, provided it does them quickly, or you can register more than one hook; just don't go nuts with how many threads you're registering.
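For example, a small sketch of that usage (the repository class and table name are made up):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

public class OrderRepository {

    // Same warm pool on every invocation; the shutdown hook registered above
    // closes it when the execution environment is recycled.
    private final DataSource ds = new EnvironmentConfig().dataSource();

    public int countOrders() throws Exception {
        try (Connection conn = ds.getConnection();
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM orders")) {
            rs.next();
            return rs.getInt(1);
        }
    }
}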
No, you can't hook into the shutdown of a Lambda execution context.
Lambda handles that on its own and decides if and when to re-use or destroy execution contexts.
You'll probably have to rely on the connections timing out on their own.

How to track java method calls and get alerts in Dynatrace?

I have code similar to the one below. Every time a DB lock appears, I want to get an alert in Dynatrace that creates a problem, so that I can see it on the dashboard and possibly also get an email notification. The DB lock would appear if the update count is greater than 1.
private int removeDBLock(DataSource dataSource) {
    int updateCount = 0;
    final Timestamp lastAllowedDBLockTime = new Timestamp(System.currentTimeMillis() - (5 * 60 * 1000));
    final String query = format(RELEASE_DB_CHANGELOCK, lastAllowedDBLockTime.toString());
    try (Connection connection = dataSource.getConnection();
         Statement stmt = connection.createStatement()) {
        updateCount = stmt.executeUpdate(query);
        if (updateCount > 0) {
            log.error("Stale DB Lock found. Locks Removed Count is {} .", updateCount);
        }
    } catch (SQLException e) {
        log.error("Error while trying to find and remove Db Change Lock. ", e);
    }
    return updateCount;
}
I tried using the event API (mentioned here) to trigger an event on my host, and was successful in raising a problem alert on my dashboard:
https://www.dynatrace.com/support/help/dynatrace-api/environment-api/events/post-event/?request-parameters%3C-%3Ejson-model=json-model
but this would mean injecting an API call into my code just for monitoring, and may lead to more external dependencies and hence more chance of failure.
I also tried creating a custom service detection by adding the class containing this method, and the method itself, to the custom service. But I do not know how to link this to an alert or an event that creates a problem on the dashboard.
Are there any best practices or solutions for how I can do this in Dynatrace? Any leads would be helpful.
I would take a look at Custom Services for Java, which will cause invocations of the method to be monitored in more detail.
Maybe you can extract a method which actually throws the exception and an outer method which handles it, as sketched below. Then it should be possible to alert on the exception.
There are also some more ways to configure the service via settings, e.g. to raise an error based on a return value directly.
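A hedged sketch of that refactoring (the exception type and method names are made up): the inner method throws when a stale lock was removed, the outer method keeps today's behaviour, and the Dynatrace custom service is pointed at the inner method so the exception shows up as a failed service call you can alert on.

import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical exception raised purely so the monitored method has a visible failure.
class StaleDbLockFoundException extends RuntimeException {
    final int removedCount;

    StaleDbLockFoundException(int removedCount) {
        super("Stale DB lock found, removed " + removedCount + " lock(s)");
        this.removedCount = removedCount;
    }
}

class DbLockMonitor {

    // Configure this method as the Dynatrace custom service entry point; every
    // StaleDbLockFoundException is then recorded as a failure of that service.
    int releaseStaleLocks(Statement stmt, String query) throws SQLException {
        int updateCount = stmt.executeUpdate(query);
        if (updateCount > 0) {
            throw new StaleDbLockFoundException(updateCount);
        }
        return updateCount;
    }

    // The existing caller handles the exception and keeps the current logging behaviour.
    int removeDbLock(Statement stmt, String query) throws SQLException {
        try {
            return releaseStaleLocks(stmt, query);
        } catch (StaleDbLockFoundException e) {
            System.err.println(e.getMessage()); // or log.error(...)
            return e.removedCount;
        }
    }
}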
See also documentation:
https://www.dynatrace.com/support/help/how-to-use-dynatrace/transactions-and-services/configuration/define-custom-services/
https://www.dynatrace.com/support/help/technology-support/application-software/java/configuration-and-analysis/define-custom-java-services/

Kafka Streams - The state store may have migrated to another instance

I'm writing a basic application to test the Interactive Queries feature of Kafka Streams. Here is the code:
public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();

    KeyValueBytesStoreSupplier waypointsStoreSupplier = Stores.persistentKeyValueStore("test-store");
    StoreBuilder waypointsStoreBuilder = Stores.keyValueStoreBuilder(waypointsStoreSupplier, Serdes.ByteArray(), Serdes.Integer());

    final KStream<byte[], byte[]> waypointsStream = builder.stream("sample1");
    final KStream<byte[], TruckDriverWaypoint> waypointsDeserialized = waypointsStream
            .mapValues(CustomSerdes::deserializeTruckDriverWaypoint)
            .filter((k, v) -> v.isPresent())
            .mapValues(Optional::get);

    waypointsDeserialized.groupByKey().aggregate(
            () -> 1,
            (aggKey, newWaypoint, aggValue) -> {
                aggValue = aggValue + 1;
                return aggValue;
            },
            Materialized.<byte[], Integer, KeyValueStore<Bytes, byte[]>>as("test-store").withKeySerde(Serdes.ByteArray()).withValueSerde(Serdes.Integer())
    );

    final KafkaStreams streams = new KafkaStreams(builder.build(), new StreamsConfig(createStreamsProperties()));

    streams.cleanUp();
    streams.start();

    ReadOnlyKeyValueStore<byte[], Integer> keyValueStore = streams.store("test-store", QueryableStoreTypes.keyValueStore());

    KeyValueIterator<byte[], Integer> range = keyValueStore.all();
    while (range.hasNext()) {
        KeyValue<byte[], Integer> next = range.next();
        System.out.println(next.value);
    }

    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
protected static Properties createStreamsProperties() {
    final Properties streamsConfiguration = new Properties();
    streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "random167");
    streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "client-id");
    streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    streamsConfiguration.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, Serdes.Integer().getClass().getName());
    //streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10000);
    return streamsConfiguration;
}
So my problem is, every time I run this I get this same error:
Exception in thread "main" org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, test-store, may have migrated to another instance.
I'm running only 1 instance of the application, and the topic I'm consuming from has only 1 partition.
Any idea what I'm doing wrong?
Looks like you have a race condition. The Kafka Streams javadoc for KafkaStreams::start() says:
Start the KafkaStreams instance by starting all its threads. This function is expected to be called only once during the life cycle of the client.
Because threads are started in the background, this method does not block.
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html
You're calling streams.store() immediately after streams.start(), but I'd wager that you're in a state where it hasn't initialized fully yet.
Since this code appears to be just for testing, add a Thread.sleep(5000) or something in there and give it a go (this is not a solution for production; see the sketch below for a less arbitrary wait). Depending on your input rate into the topic, that'll probably give the store a bit of time to start filling up with events so that your KeyValueIterator actually has something to process/print.
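A sketch of that less arbitrary wait (assuming the streams instance and imports from the question; note that even in RUNNING state a rebalance can still move a store, so a guard around store() as shown in the answer further below remains useful):

// Call this between streams.start() and streams.store(...).
static void waitUntilRunning(final KafkaStreams streams) throws InterruptedException {
    // state() reflects the client lifecycle; local stores only become queryable once it is RUNNING.
    while (streams.state() != KafkaStreams.State.RUNNING) {
        Thread.sleep(100);
    }
}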
Probably not applicable to the OP, but it might help others:
When trying to retrieve a KTable's store, make sure the KTable's topic exists first or you'll get this exception.
In my case, I had failed to add the StoreBuilder before consuming the store.
Typically this happens for two reasons:
1. The local KafkaStreams instance is not yet ready (i.e., not yet in runtime state RUNNING, see Run-time Status Information) and thus its local state stores cannot be queried yet.
2. The local KafkaStreams instance is ready (e.g. in runtime state RUNNING), but the particular state store was just migrated to another instance behind the scenes. This may notably happen during the startup phase of a distributed application or when you are adding/removing application instances.
https://docs.confluent.io/platform/current/streams/faq.html#handling-invalidstatestoreexception-the-state-store-may-have-migrated-to-another-instance
The simplest approach is to guard against InvalidStateStoreException when calling KafkaStreams#store():
// Example: Wait until the store of type T is queryable. When it is, return a reference to the store.
public static <T> T waitUntilStoreIsQueryable(final String storeName,
                                              final QueryableStoreType<T> queryableStoreType,
                                              final KafkaStreams streams) throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, queryableStoreType);
        } catch (InvalidStateStoreException ignored) {
            // store not yet ready for querying
            Thread.sleep(100);
        }
    }
}
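Applied to the code in the question, the direct streams.store(...) call would then become something like:

ReadOnlyKeyValueStore<byte[], Integer> keyValueStore = waitUntilStoreIsQueryable(
        "test-store", QueryableStoreTypes.<byte[], Integer>keyValueStore(), streams);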

Hibernate sessions fetch stale data when using async controllers and long polling

I have a problem with async controllers in Grails. Consider the following controller:
@Transactional(readOnly=true)
class RentController {

    def myService
    UserProperties props

    def beforeInterceptor = {
        this.props = fetchUserProps()
    }

    //..other actions

    @Transactional
    def rent(Long id) {
        //check some preconditions here, calling various service methods...
        if (!allOk) {
            render status: 403, text: 'appropriate.message.key'
            return
        }
        //now we long poll because most of the time the result will be
        //success within a couple of seconds
        AsyncContext ctx = startAsync()
        ctx.timeout = 5 * 1000 * 60 + 5000
        ctx.start {
            try {
                //wait for external service to confirm - can take a long time or even time out
                //save appropriate domain objects if successful
                //placeRental is also marked with @Transactional (if that makes any difference)
                def result = myService.placeRental()
                if (result.success) {
                    render text: "OK", status: 200
                } else {
                    render status: 400, text: "rejection.reason.${result.rejectionCode}"
                }
            } catch (Throwable t) {
                log.error "Rental process failed", t
                render text: "Rental process failed with exception ${t?.message}", status: 500
            } finally {
                ctx.complete()
            }
        }
    }
}
The controller and service code appear to work fine (though the above code is simplified), but they will sometimes cause a database session to get 'stuck in the past'.
Let's say I have a UserProperties instance whose property accountId is updated from 1 to 20 somewhere else in the application while a rent action is waiting in the async block. As the async block eventually terminates one way or another (it may succeed, fail or time out), the app will sometimes get a stale UserProperties instance with accountId: 1. If I refresh the updated user's properties page, I will see accountId: 1 about 1 time per 10 refreshes, while the rest of the time it will be 20 - and this is on my development machine where no one else is accessing the application (though the same behaviour can be observed in production). My connection pool also holds 10 connections, so I suspect there may be a correlation here.
Other strange things will happen - for example, I will get StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) from actions doing something as simple as render (UserProperties.list() as JSON) - after the response had already rendered (successfully, apart from the noise in the logs) and despite the action being annotated with @Transactional(readOnly=true).
A stale session doesn't seem to appear every time and so far our solution was to restart the server every evening (the app has few users for now), but the error is annoying and the cause was hard to pinpoint. My guess is that a DB transaction doesn't get committed or rolled back because of the async code, but GORM, Spring and Hibernate have many nooks and crannies where things could get stuck.
We're using Postgres 9.4.1 (9.2 on a dev machine, same problem), Grails 2.5.0, Hibernate plugin 4.3.8.1, Tomcat 8, Cache plugin 1.1.8, Hibernate Filter plugin 0.3.2 and the Audit Logging plugin 1.0.1 (other stuff too, obviously, but this feels like it could be relevant). My datasource config contains:
hibernate {
    cache.use_second_level_cache = true
    cache.use_query_cache = false
    cache.region.factory_class = 'org.hibernate.cache.ehcache.SingletonEhCacheRegionFactory'
    singleSession = true
    flush.mode = 'manual'
    format_sql = true
}
Grails bug. And a nasty one: everything seems OK until your app starts acting funny in completely unrelated parts of the app.

Test Hector Spring Cassandra connection

In my Java-Spring based web app I'm connecting to Cassandra DB using Hector and Spring.
The connection works just fine but I would like to be able to test the connection.
So if I intentionally provide a wrong host to CassandraHostConfigurator I get an error:
ERROR connection.HConnectionManager: Could not start connection pool for host <myhost:myport>
Which is ok of course. But how can I test this connection?
If I define the connection programmatically (and not via the Spring context) it is clear, but via the Spring context it is not really clear how to test it.
Can you think of an idea?
Since I could not come up with, nor find, a satisfying answer, I decided to define my connection programmatically and use a simple query:
private ColumnFamilyResult<String, String> readFromDb(Keyspace keyspace) {
    ColumnFamilyTemplate<String, String> template = new ThriftColumnFamilyTemplate<String, String>(keyspace, tableName,
            StringSerializer.get(), StringSerializer.get());
    // It doesn't matter if the column actually exists or not, since we only check the
    // connection. In case of connection failure an exception is thrown;
    // otherwise something comes back.
    return template.queryColumns("some_column");
}
And my test checks that the returned object is not null.
Another way that works fine:
public boolean isConnected() {
    List<KeyspaceDefinition> keyspaces = null;
    try {
        keyspaces = cluster.describeKeyspaces();
    } catch (HectorException e) {
        return false;
    }
    return (!CollectionUtils.isEmpty(keyspaces));
}
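For instance, a minimal JUnit 4 check built on that method (CassandraHealthCheck is a hypothetical wrapper class holding the Hector Cluster and exposing isConnected()):

import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class CassandraConnectionTest {

    @Test
    public void clusterIsReachable() {
        // isConnected() returns false when describeKeyspaces() throws a HectorException,
        // e.g. because the configured host or port is wrong.
        assertTrue(new CassandraHealthCheck().isConnected());
    }
}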
