We are deploying our Spring Boot applications in OpenShift.
Currently we are trying to run a potentially long-running task (a database migration) before the web context is fully set up.
It is especially important that the app does not accept REST requests or process messages before the migration has fully run.
See the following minimal example:
// DemoApplication.java
@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}
// MigrationConfig.java
@Configuration
@Slf4j
public class MigrationConfig {
    @PostConstruct
    public void run() throws InterruptedException {
        log.info("Migration...");
        // long running task
        Thread.sleep(10000);
        log.info("...Migration");
    }
}
// Controller.java
@RestController
public class Controller {
    @GetMapping("/test")
    public String test() {
        return "test";
    }
}
// MessageHandler.java
@EnableBinding(Sink.class)
public class MessageHandler {
    @StreamListener(Sink.INPUT)
    public void handle(String message) {
        System.out.println("Received: " + message);
    }
}
This works fine so far: the configuration class is processed before the app responds to requests.
What we are worried about, however, is OpenShift's readiness probe: currently we use an Actuator health endpoint to check whether the application is up and running.
If the migration takes a long time, OpenShift might stop the container, potentially leaving us with an inconsistent state in the database.
Does anybody have an idea how we could communicate that the application is starting while preventing the REST controllers and message handlers from running?
Edit
There are multiple ways of blocking incoming REST requests; @martin-frey suggested a servlet filter.
The larger problem for us is the stream listener. We use Spring Cloud Stream to listen to a RabbitMQ queue.
I added an example handler in the code above.
Do you have any suggestions on how to "pause" that?
What about a servlet filter that knows about the state of the migration? That way you should be able to handle any inbound request and return a response code of your liking. There would also be no need to block any request handlers until the system is fully up.
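A rough sketch of such a filter (the MigrationState type and its isDone() method are illustrative assumptions, not part of the question; they stand for whatever flag your migration code flips once it has finished):

import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

// Hypothetical state holder, set to "done" by the migration code
interface MigrationState {
    boolean isDone();
}

@Component
public class MigrationGateFilter extends OncePerRequestFilter {

    private final MigrationState migrationState;

    public MigrationGateFilter(MigrationState migrationState) {
        this.migrationState = migrationState;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain filterChain) throws ServletException, IOException {
        // Let the health endpoint through so the readiness probe still gets an answer
        if (!migrationState.isDone() && !request.getRequestURI().startsWith("/actuator/health")) {
            response.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE, "Migration in progress");
            return;
        }
        filterChain.doFilter(request, response);
    }
}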
I think your app pod can run unaffected if you set a large enough initialDelaySeconds for the initialization of your application. [0][1]
readinessProbe:
  httpGet:
    path: /_status/healthz
    port: 8080
  initialDelaySeconds: 10120
  timeoutSeconds: 3
  periodSeconds: 30
  failureThreshold: 100
  successThreshold: 1
Additionally, I recommend setting up the liveness probe with the same condition (but with more time than the readiness probe's value); then your pods can be recovered automatically if the application has not come up within initialDelaySeconds.
[0] https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes
[1] https://docs.openshift.com/container-platform/latest/dev_guide/application_health.html
How about adding an init container whose only role is the DB migration, without the application?
Then another container serves the application. But be careful when deploying the application with more than one replica: the replicas will all execute the init container at the same time if you are using a Deployment.
If you need multiple replicas, you might want to consider StatefulSets instead.
Such database migrations are best handled by switching to a Recreate deployment strategy and doing the migration as a mid lifecycle hook. At that point there are no instances of your application running, so it can be done safely. If you can't have downtime, then you need to be able to switch the application to some offline or read-only mode against a copy of your database while doing the migration.
Don't keep the context busy doing a long task in @PostConstruct. Instead, start the migration as a fully asynchronous task and let Spring build the rest of the context in the meantime. At the end of the task, complete some shared Future with success or failure. Wrap the controller in a proxy (this can be facilitated with AOP, for example) where every method except the health check tries to get the value from that Future within a timeout. If it succeeds, the migration is done and all calls are available; if not, reject the call. Your proxy serves as a gate, allowing only the part of the API that is critical to stay available while the migration is going on. The rest of it may simply respond with 503, indicating the service is not ready yet. Those 503 responses could even be improved by measuring and averaging the time the migration typically takes and returning that value in a Retry-After header.
And with the MessageHandler you can do essentially the same thing: wait for the result of the Future in the handle method (provided message handlers are allowed to hang indefinitely). Once the result is set, message handling will proceed from that moment on.
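As an illustration of that gate idea (a sketch only; the MigrationGate name, its methods, and the use of CompletableFuture.runAsync are my assumptions, not code from the answer):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import javax.annotation.PostConstruct;
import org.springframework.stereotype.Component;

@Component
public class MigrationGate {

    private final CompletableFuture<Void> done = new CompletableFuture<>();

    @PostConstruct
    public void startMigration() {
        // Kick the migration off asynchronously so context startup is not blocked
        CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(10_000); // stands in for the real migration work
                done.complete(null);
            } catch (Exception e) {
                done.completeExceptionally(e);
            }
        });
    }

    // Used by the REST gate/proxy: wait briefly, answer 503 while this returns false
    public boolean isDone(long timeout, TimeUnit unit) {
        try {
            done.get(timeout, unit);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Used by the message handler: block until the migration has finished
    public void awaitCompletion() throws ExecutionException, InterruptedException {
        done.get();
    }
}

The @StreamListener method would then call migrationGate.awaitCompletion() before processing a message, and the proxy or filter around the controllers would call isDone(...) and answer with 503 (plus an optional Retry-After header) while it returns false.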
Related
I am using the Netty server for a Spring Boot application. Is there any way to monitor the Netty server queue size, so that we know when the queue is full and the server is not able to accept any new requests? Also, does the Netty server do any logging when the queue is full or it is unable to accept a new request?
Netty does not have any logging for that purpose, but I implemented a way to find the pending tasks and added some logging according to your question; here is a sample log from my local run.
You can find all the code here: https://github.com/ozkanpakdil/spring-examples/tree/master/reactive-netty-check-connection-queue
The code is fairly self-explanatory, but NettyConfigure is what actually does the Netty configuration in the Spring Boot environment. At https://github.com/ozkanpakdil/spring-examples/blob/master/reactive-netty-check-connection-queue/src/main/java/com/mascix/reactivenettycheckconnectionqueue/NettyConfigure.java#L46 you can see how many pending tasks are in the queue. DiscardServerHandler may show you how to discard requests when the limit is reached. You can use JMeter for the test; here is the JMeter file: https://github.com/ozkanpakdil/spring-examples/blob/master/reactive-netty-check-connection-queue/PerformanceTestPlanMemoryThread.jmx
If you want to handle the Netty limit, you can do it like the code below:
@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
    totalConnectionCount.incrementAndGet();
    if (!ctx.channel().isWritable()) { // means we hit the max limit of netty
        System.out.println("I suggest we should restart or put a new server to our pool :)");
    }
    super.channelActive(ctx);
}
You should check https://stackoverflow.com/a/49823055/175554 for handling the limits, and here is another explanation of isWritable: https://stackoverflow.com/a/44564482/175554
One more extra: I added the actuators to the project; http://localhost:8080/actuator/metrics/http.server.requests is nice to check too.
I have an external requirement to provide an endpoint that tells the load balancer whether to send traffic to my app. Much like the Kubernetes "readiness" probe, but it has to use a certain format and path, so I can't just give them the Actuator health endpoint.
In the past I've used the HealthEndpoint and called health(), but that doesn't work for reactive apps. Is there a more flexible way to see if the app is "UP"? At this level I don't care if it's reactive or servlet, I just want to know what Spring Boot says about the app.
I haven't found anything like this; most articles talk about calling /actuator/health, but that isn't what I need.
Edit:
Just a bit more detail: I have to return a certain string, "NS_ENABLE", if it's good. There are certain conditions where I return "NS_DISABLE", so I can't just return nothing, which would normally make sense.
Also, I really like how Spring Boot does the checking for me. I'd rather not re-implement all those checks.
Edit 2: My final solution
The answers below got me very far along, even though neither was my final solution, so I wanted to give a hint about my final understanding.
It turns out that HealthEndpoint works for reactive apps just as well as for servlet apps; you just have to wrap the call in a Mono.
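For illustration, a minimal sketch of that idea (the endpoint path, the Mono.fromCallable wrapping, and the scheduler choice are my assumptions, not the asker's exact code):

import org.springframework.boot.actuate.health.HealthEndpoint;
import org.springframework.boot.actuate.health.Status;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

@RestController
public class LoadBalancerStatusController {

    private final HealthEndpoint healthEndpoint;

    public LoadBalancerStatusController(HealthEndpoint healthEndpoint) {
        this.healthEndpoint = healthEndpoint;
    }

    @GetMapping("/lb/status") // hypothetical path required by the load balancer
    public Mono<String> status() {
        // health() may block, so run it off the event loop
        return Mono.fromCallable(() -> healthEndpoint.health().getStatus())
                .subscribeOn(Schedulers.boundedElastic())
                .map(status -> Status.UP.equals(status) ? "NS_ENABLE" : "NS_DISABLE");
    }
}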
How do we define the health of any web server?
We look at how our dependent services are doing: we check the status of Redis, MySQL, MongoDB, Elasticsearch, and other databases. This is what the Actuator does internally.
The Actuator checks the status of the different databases and, based on that, returns Up/Down.
You can implement your own methods that check the health of the dependent services.
Whether Redis is healthy or not can be checked using the PING command.
MySQL can be verified using a SELECT 1 command, or by running some query that should always succeed, like SHOW TABLES.
Similarly, you can implement a health check for other services. If you find that all required services are up, you can declare the app up, otherwise down.
What about shutdown triggers? Whenever your server receives a shutdown signal, then no matter what the state of your dependent services is, you should always report down, so that upstream won't send calls to this instance. A sketch combining these ideas follows below.
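A minimal sketch of such a custom indicator (assuming a JdbcTemplate is available for the MySQL check; the Redis ping is omitted for brevity and all names here are illustrative, not from the answer):

import java.util.concurrent.atomic.AtomicBoolean;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class DependentServicesHealthIndicator implements HealthIndicator {

    private final JdbcTemplate jdbcTemplate;
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    public DependentServicesHealthIndicator(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @EventListener(ContextClosedEvent.class)
    public void onShutdown() {
        // On a shutdown signal, always report down so upstream stops calling this instance
        shuttingDown.set(true);
    }

    @Override
    public Health health() {
        if (shuttingDown.get()) {
            return Health.down().withDetail("reason", "shutting down").build();
        }
        try {
            // "SELECT 1" should always succeed if MySQL is reachable
            jdbcTemplate.queryForObject("SELECT 1", Integer.class);
            return Health.up().build();
        } catch (Exception e) {
            return Health.down().withDetail("mysql", e.toString()).build();
        }
    }
}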
Edit
The health of the entire Spring app can be checked programmatically by autowiring one or more beans from the Actuator module.
@Controller
public class MyHealthController {

    @Autowired
    private HealthEndpoint healthEndpoint;

    @GetMapping("health")
    @ResponseBody
    public Health health() {
        return healthEndpoint.health();
    }
}
There are other beans related to the health check; we can autowire the required ones. Some of those beans provide the health of their respective component, and we can combine the health of each component using a HealthAggregator to get the final Health. All registered health indicator components can be accessed via the HealthIndicatorRegistry.
@Controller
public class MyHealthController {

    @Autowired
    private HealthAggregator healthAggregator;

    @Autowired
    private HealthIndicatorRegistry healthIndicatorRegistry;

    @GetMapping("health")
    @ResponseBody
    public Health health() {
        Map<String, Health> health = new HashMap<>();
        for (Entry<String, HealthIndicator> entry : healthIndicatorRegistry.getAll().entrySet()) {
            health.put(entry.getKey(), entry.getValue().health());
        }
        return healthAggregator.aggregate(health);
    }
}
NOTE: Reactive components have their own health indicators. Useful classes are ReactiveHealthIndicatorRegistry, ReactiveHealthIndicator, etc.
A simple solution is to write your own health endpoint instead of depending on Spring.
Spring Boot provides production-ready endpoints, but if they don't satisfy your purpose, write your own endpoint. It will just return "UP" in the response; if the service is down, it will not return anything.
Here is the Spring Boot documentation on writing reactive health indicators. Follow the guide; it should be enough for your use case.
It also documents how to expose the liveness and readiness state of your application.
https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#reactive-health-indicators
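For reference, a minimal custom reactive indicator along the lines of that documentation could look like this (the checkDownstream() call is a placeholder assumption for a real non-blocking check):

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.ReactiveHealthIndicator;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Mono;

@Component
public class DownstreamHealthIndicator implements ReactiveHealthIndicator {

    @Override
    public Mono<Health> health() {
        return checkDownstream()
                .map(ok -> ok ? Health.up().build() : Health.down().build())
                .onErrorResume(ex -> Mono.just(Health.down().withDetail("error", String.valueOf(ex)).build()));
    }

    // Placeholder for a real non-blocking check against a dependent service
    private Mono<Boolean> checkDownstream() {
        return Mono.just(true);
    }
}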
1st Question:
So I am using Spring Eureka and the DistributedCommandBus, set up via the following:
public CommandRouter springCloudCommandRouter(DiscoveryClient discoveryClient, Registration localServiceInstance) { ... }

public CommandBusConnector springHttpCommandBusConnector(@Qualifier("localSegment") CommandBus localSegment, RestOperations restOperations, Serializer serializer) { ... }

public DistributedCommandBus springCloudDistributedCommandBus(CommandRouter commandRouter, CommandBusConnector commandBusConnector) { ... }
My question for this part is: how can I show that this is working? I have two K8s pods running the above code and have seen one run the @CommandHandler and the other run the @EventSourcingEvent, but I did not see anything in the logs to give any indication that it is using the bus.
I just want to be able to show that it is "working", as I have been asked to do so.
The Eureka part is working, and I can see all the info on said dashboard.
Edit: removed the 2nd question to ask in another thread.
To keep my answer focused, I'll only provide a suggestion for your first question, which can be summarized as:
How can I show that my DistributedCommandBus, set up with Eureka, is actually routing commands to different instances?
I would suggest setting up some logging around this.
That way, you could log when the message is dispatched from Node 1 and when it is handled by Node 2.
Ideal for this would be to register the LoggingInterceptor as a MessageHandlerInterceptor and MessageDispatchInterceptor.
To do so, you will have to register it on the DistributedCommandBus, but also on the "local segment" CommandBus. The DistributedCommandBus will be in charge of dispatching it and thus calling the LoggingInterceptor upon dispatching. The local segment/CommandBus is in charge of providing the command to a Command Handler in the right JVM and as such will call the LoggingInterceptor upon handling.
The sole downside to this is that the LoggingInterceptor only becomes both a handler and a dispatch interceptor as of Axon Framework release 4.2.
Thus, for now, you will have to make do with it being only a handler interceptor.
However, this would suffice as well, as the LoggingInterceptor will only log upon handling the command.
This would then only occur on the Node which actually handles the command.
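A rough sketch of what that registration could look like inside your existing configuration class (the bean name and the use of SimpleCommandBus for the local segment are assumptions based on a typical Axon/Spring Cloud setup, not code from the question):

import org.axonframework.commandhandling.CommandBus;
import org.axonframework.commandhandling.SimpleCommandBus;
import org.axonframework.messaging.interceptors.LoggingInterceptor;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;

@Bean
@Qualifier("localSegment")
public CommandBus localSegment() {
    SimpleCommandBus localSegment = SimpleCommandBus.builder().build();
    // Logs every command as it is handled on this node.
    // Note: new LoggingInterceptor<>() assumes Axon 4.2+; on earlier 4.x versions the class is not generic.
    localSegment.registerHandlerInterceptor(new LoggingInterceptor<>());
    return localSegment;
}

On Axon 4.2+ you could additionally register it as a dispatch interceptor on the DistributedCommandBus via registerDispatchInterceptor, so the dispatching node logs as well.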
Hope this helps!
I start a WireMock server in an integration test.
The IT passes locally, BUT some cases fail on the Jenkins server; the error is:
localhost:8089 failed to respond; nested exception is org.apache.http.NoHttpResponseException: localhost:8089 failed to respond
I tried adding sleep(3000) in my test, which fixes the issue, but I don't know the root cause, so the workaround is not a good idea.
I also tried using @AutoConfigureWireMock(port = 8089) instead of WireMockServer to start the WireMock server, which also fixes the problem, BUT I don't know how to apply further configuration to the WireMock server when using the annotation @AutoConfigureWireMock(port = 8089).
Here is my code to start the WireMock server; any suggestions on how to fix the NoHttpResponseException?
@ContextConfiguration(
        initializers = ConfigFileApplicationContextInitializer.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
class BaseSpec extends Specification {

    @Shared
    WireMockServer wireMockServer

    def setupSpec() {
        wireMockServer = new WireMockServer(options().port(PORT).jettyHeaderBufferSize(12345)
                .notifier(new ConsoleNotifier(new Boolean(System.getenv("IT_WIREMOCK_LOG") ?: 'false')))
                .extensions(new ResponseTemplateTransformer(true)))
        wireMockServer.start()
    }
Apache HttpClient suffers from NoHttpResponseException from time to time. This is a very old problem.
Anyway, I guess in your case the problem might be caused by restarting the WireMock server between tests, while at the same time Apache HttpClient pools HTTP connections and tries to reuse them between tests. If this is the case, there are two solutions:
Disable pooling of HTTP connections in your tests. This makes sense because it's considered normal for the WireMock server to be restarted during test execution. Alternatively, craft your WireMock stubs to always send a "Connection": "close" header; the outcome will be the same.
Switch from Apache HttpClient to Square's OkHttp. OkHttp, although it pools HTTP connections by default, is always able to gracefully recover from situations like a stale connection. Unfortunately, the library from Apache is not as smart.
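For the first option, a minimal sketch of an Apache HttpClient built without connection reuse (whether your application actually picks this bean up depends on how the client is wired; this is just one way to disable reuse in tests):

import org.apache.http.client.HttpClient;
import org.apache.http.impl.NoConnectionReuseStrategy;
import org.apache.http.impl.client.HttpClientBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Primary;

@Bean
@Primary
public HttpClient noReuseTestClient() {
    // Every request gets a fresh connection, so a restarted WireMock server
    // cannot leave a stale pooled connection behind.
    return HttpClientBuilder.create()
            .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
            .build();
}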
Correct; as already written by G. Demecki, it's not related to WireMock.
It's related to your application server, which is calling WireMock. Today it's common to reuse a connection to improve performance in a microservice infrastructure, so a Connection: close header, a request-scoped client, etc. is not useful.
Check the Apache HTTP client, httpclient-4.5.2 PoolingHttpClientConnectionManager documentation:
The handling of stale connections was changed in version 4.4. Previously, the code would check every connection by default before re-using it. The code now only checks the connection if the elapsed time since the last use of the connection exceeds the timeout that has been set. The default timeout is set to 2000ms
Each time a WireMock endpoint is destroyed and a new one is created for a new test class, it takes 2 seconds until your application detects that the previous connection is broken and a new one has to be opened.
If you don't wait those 2 seconds, such a NoHttpResponseException can be thrown, depending on when the last check happened.
So a Thread.sleep(2000) looks ugly, but it's not so bad, as long as we know why it is required.
Each time a WireMock endpoint is destroyed (because the WireMock server is restarted between tests) and a new one is created for a new test, it takes 2 seconds (as stated in the documentation) until the application detects that the previous HTTP connection is broken and a new one has to be opened.
The solution is to simply override the default keep-alive connection behaviour for every stub using .withHeader("Connection", "close"). Something like:
givenThat(get("/endpoint_path")
        .withHeader("Authorization", equalTo(authHeader))
        .willReturn(
                ok()
                        .withBody(body)
                        .withHeader(HttpHeaders.CONNECTION, "close")
        )
);
It is also possible to do it globally using a transformer:
public class NoKeepAliveTransformer extends ResponseDefinitionTransformer {

    @Override
    public ResponseDefinition transform(Request request,
                                        ResponseDefinition responseDefinition,
                                        FileSource files,
                                        Parameters parameters) {
        return ResponseDefinitionBuilder
                .like(responseDefinition)
                .withHeader(CONNECTION, "close")
                .build();
    }

    @Override
    public String getName() {
        return "keep-alive-disabler";
    }
}
Then this transformer has to be registered when you create the WireMock server:
new WireMockServer(
        options()
                .port(port)
                .extensions(NoKeepAliveTransformer.class)
)
The solution that worked for us in this situation was just adding a retry handler to the Apache client:
@Configuration
public class FeignTestConfig {

    @Bean
    @Primary
    public HttpClient testClient() {
        return HttpClientBuilder.create().setRetryHandler((exception, executionCount, context) -> {
            if (executionCount > 3) {
                return false;
            }
            return exception instanceof org.apache.http.NoHttpResponseException
                    || exception instanceof SocketException;
        }).build();
    }
}
SocketException is there as well, because sometimes that exception is thrown instead of NoHttpResponseException.
We have a Spring application where a Redis cache has been implemented along with a MySQL database. We are using the Redis cache to store temporary values for server-side validations instead of hitting the database every time, since hitting the database on every call reduces system performance.
Now to explain my problem: while hitting the Spring Boot Actuator endpoints,
if my Redis cache server suddenly stops, we would like to know how to get a notification that the Redis cache server is down. So we need a solution / example Java application that gets this notification using a Redis cache listener context or anything like that.
Redis doesn't work that way. In fact, no remote service will notify your application that it's down. Usually, it's the other way round: If the service you're consuming is accessed with a more or less sophisticated client, you might take advantage of the client's features.
Asynchronous clients that run I/O or monitoring threads can help here. More specifically, it depends on the client you're using with Spring Boot and Redis. Jedis is a plain client that reacts on a request basis. Lettuce allows you to register a RedisConnectionStateListener that is called on specific connection events, such as connected/disconnected:
RedisClient redisClient = …;

redisClient.addListener(new RedisConnectionStateListener() {

    @Override
    public void onRedisConnected(RedisChannelHandler<?, ?> redisChannelHandler) {
    }

    @Override
    public void onRedisDisconnected(RedisChannelHandler<?, ?> redisChannelHandler) {
    }

    @Override
    public void onRedisExceptionCaught(RedisChannelHandler<?, ?> redisChannelHandler, Throwable throwable) {
    }
});
When using Spring Data Redis, retrieving the RedisClient from LettuceConnectionFactory might be a bit tricky as it is a private field. Hence it requires reflection.
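One hedged way to avoid hard-coding the private field name is to scan LettuceConnectionFactory for a field of the Lettuce client type (the field layout differs between Spring Data Redis versions, so treat this purely as a sketch that may need adjusting):

import java.lang.reflect.Field;
import io.lettuce.core.AbstractRedisClient;
import io.lettuce.core.RedisClient;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.util.ReflectionUtils;

public final class LettuceClientExtractor {

    private LettuceClientExtractor() {
    }

    // Looks for any field on LettuceConnectionFactory whose type is a Lettuce client
    // and returns it, so listeners can be registered on it.
    public static RedisClient extractClient(LettuceConnectionFactory factory) {
        for (Field field : LettuceConnectionFactory.class.getDeclaredFields()) {
            if (AbstractRedisClient.class.isAssignableFrom(field.getType())) {
                ReflectionUtils.makeAccessible(field);
                Object client = ReflectionUtils.getField(field, factory);
                if (client instanceof RedisClient) {
                    return (RedisClient) client;
                }
            }
        }
        throw new IllegalStateException("No Lettuce client field found on LettuceConnectionFactory");
    }
}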