Fail-fast behavior for Eureka client - spring-boot

There seems to be no common solution to the following problem, so I am trying to approach it from another side. The infrastructure consists of Spring Boot microservices with Eureka, Zuul, Config and Admin servers acting as the service mesh. All microservices run inside Docker containers on Kubernetes. Kubernetes monitors each application's health check (liveness/readiness probes) and redeploys it when the health check stays in the DOWN state longer than the liveness probe timeout.
The problem is the following: sometimes a microservice doesn't get the correct Eureka server address after redeployment. Service discovery registration fails, but the microservice keeps running with health check 'UP', and the dependent microservices cannot find it.
The microservices are interdependent, and the failure of one causes a cascading failure of all microservices that depend on it. I don't use Hystrix for several reasons, and it would not solve my problem anyway: data missing from the failed microservice simply disables all functionality of the dependent microservices.
Question: Is it possible to configure something like 'fail-fast' behavior for the Eureka client without writing a custom HealthIndicator? The actuator health check should be in the 'DOWN' state until the Eureka client gets a 204 successful registration response from Eureka.
Here is an example of how I fixed it in code. The behavior is simple: the health check goes down 'forever' once the timeout for a successful Eureka registration is exceeded, at startup and/or at runtime. The main goal is that Kubernetes redeploys the microservice when the liveness probe timeout is exceeded.
@Component
public class CustomHealthIndicator implements HealthIndicator {
    private static final Logger logger = LoggerFactory.getLogger(CustomHealthIndicator.class);
    private static final int HEALTH_CHECK_DOWN_LIMIT_MIN = 15;

    @Autowired
    @Qualifier("eurekaClient")
    private EurekaClient eurekaClient;

    // Point in time after which a missing 'UP' status is reported as DOWN
    private LocalDateTime healthCheckDownTimeLimit = getHealthCheckDownLimit();

    @Override
    public Health health() {
        int errCode = registeredInEureka();
        return errCode != 0
                ? Health.down().withDetail("Eureka registration fails", errCode).build()
                : Health.up().build();
    }

    // Returns 0 while registered (or still within the grace period), otherwise an error code
    private int registeredInEureka() {
        int status = 0;
        if (isStatusUp()) {
            healthCheckDownTimeLimit = getHealthCheckDownLimit();
        } else if (LocalDateTime.now().isAfter(healthCheckDownTimeLimit)) {
            logger.error("Exceeded {} min. limit for getting 'UP' state in Eureka", HEALTH_CHECK_DOWN_LIMIT_MIN);
            status = HttpStatus.GONE.value();
        }
        return status;
    }

    private boolean isStatusUp() {
        return eurekaClient.getInstanceRemoteStatus().compareTo(InstanceInfo.InstanceStatus.UP) == 0;
    }

    private LocalDateTime getHealthCheckDownLimit() {
        return LocalDateTime.now().plus(HEALTH_CHECK_DOWN_LIMIT_MIN, ChronoUnit.MINUTES);
    }
}
Is it possible to do the same by just configuring Spring components?
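The grace-period logic in the indicator above can be separated from Spring and Eureka, which makes it easy to test. The following is only a minimal sketch under assumptions: RegistrationDeadline and its check method are hypothetical names, not part of Spring Cloud Netflix, and the current time is passed in as a Supplier so a test can control it.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical helper: reports unhealthy once no UP status has been seen for longer than the limit. */
final class RegistrationDeadline {
    private final Supplier<Instant> now;   // injected time source (assumption, for testability)
    private final Duration limit;
    private Instant deadline;

    RegistrationDeadline(Supplier<Instant> now, Duration limit) {
        this.now = now;
        this.limit = limit;
        this.deadline = now.get().plus(limit);
    }

    /** Call on every health probe with the current registration state; returns true while healthy. */
    synchronized boolean check(boolean registeredUp) {
        if (registeredUp) {
            deadline = now.get().plus(limit); // registered: restart the grace period
            return true;
        }
        return !now.get().isAfter(deadline);  // not registered: healthy only within the grace period
    }
}
```

A HealthIndicator could then delegate to check(...) and map false to Health.down(); that wiring is left out here.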

Related

Load balancing problems with Spring Cloud Kubernetes

We have Spring Boot services running in Kubernetes and are using the Spring Cloud Kubernetes Load Balancer functionality with RestTemplate to make calls to other Spring Boot services. One of the main reasons we have this in place is historical: previously we ran our services in EC2 using Eureka for service discovery, and after the migration we kept the Spring discovery client/client-side load balancing in place (updating dependencies etc. for it to work with the Spring Cloud Kubernetes project).
We have a problem: when one of the target pods goes down, requests fail for a period of time with java.net.NoRouteToHostException, i.e. the Spring load balancer is still trying to send to that pod.
So I have a few questions on this:
Shouldn't the target instance get removed automatically when this happens? So it might happen once but after that, the target pod list will be repaired?
Or if not is there some other configuration we need to add to handle this - eg retry / circuit breaker, etc?
A more general question is what benefit does Spring's client-side load balancing bring with Kubernetes? Without it, our service would still be able to call other services using Kubernetes built-in service / load-balancing functionality and this should handle the issue of pods going down automatically. The Spring documentation also talks about being able to switch from POD mode to SERVICE mode (https://docs.spring.io/spring-cloud-kubernetes/docs/current/reference/html/index.html#loadbalancer-for-kubernetes). But isn't this service mode just what Kubernetes does automatically? I'm wondering if the simplest solution here isn't to remove the Spring Load Balancer altogether? What would we lose then?
An update on this: we had the spring-retry dependency in place, but the retry was not working as by default it only works for GETs and most of our calls are POST (but OK to call again). Adding the configuration spring.cloud.loadbalancer.retry.retryOnAllOperations: true fixed this, and hence most of these failures should be avoided by the retry using an alternative instance on the second attempt.
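The effect of that flag can be illustrated without Spring. The sketch below is a simplified model, not the Spring Cloud LoadBalancer API: RetryingBalancer, its constructor flag and its call method are hypothetical names, and instances are picked in plain round-robin order. It retries on the next instance only when the operation is a GET or when retries are enabled for all operations.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a client-side load balancer with retry. */
final class RetryingBalancer {
    private final List<String> instances;
    private final boolean retryOnAllOperations;
    private int next = 0;

    RetryingBalancer(List<String> instances, boolean retryOnAllOperations) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = List.copyOf(instances);
        this.retryOnAllOperations = retryOnAllOperations;
    }

    /** Sends the request to instances in round-robin order, retrying only if the operation is retryable. */
    <T> T call(String httpMethod, Function<String, T> request, int maxAttempts) {
        boolean retryable = retryOnAllOperations || "GET".equals(httpMethod);
        int attempts = retryable ? maxAttempts : 1;
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            String target = instances.get(Math.floorMod(next++, instances.size()));
            try {
                return request.apply(target);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next instance
            }
        }
        throw new IllegalStateException("all attempts failed", last);
    }
}
```

With the flag off, a POST to a dead pod fails on the first attempt; with it on, the second attempt reaches the other instance.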
We have also added a RetryListener that clears the load balancer cache for the service on certain connection exceptions:
@Configuration
public class RetryConfig {
    private static final Logger logger = LoggerFactory.getLogger(RetryConfig.class);

    // Need to use bean factory here as can't autowire LoadBalancerCacheManager -
    // it's set to 'autowireCandidate = false' in LoadBalancerCacheAutoConfiguration
    @Autowired
    private BeanFactory beanFactory;

    @Bean
    public CacheClearingLoadBalancedRetryFactory cacheClearingLoadBalancedRetryFactory(ReactiveLoadBalancer.Factory<ServiceInstance> loadBalancerFactory) {
        return new CacheClearingLoadBalancedRetryFactory(loadBalancerFactory);
    }

    // Extension of the default bean that defines a retry listener
    public class CacheClearingLoadBalancedRetryFactory extends BlockingLoadBalancedRetryFactory {
        public CacheClearingLoadBalancedRetryFactory(ReactiveLoadBalancer.Factory<ServiceInstance> loadBalancerFactory) {
            super(loadBalancerFactory);
        }

        @Override
        public RetryListener[] createRetryListeners(String service) {
            RetryListener cacheClearingRetryListener = new RetryListener() {
                @Override
                public <T, E extends Throwable> boolean open(RetryContext context, RetryCallback<T, E> callback) { return true; }

                @Override
                public <T, E extends Throwable> void close(RetryContext context, RetryCallback<T, E> callback, Throwable throwable) {}

                @Override
                public <T, E extends Throwable> void onError(RetryContext context, RetryCallback<T, E> callback, Throwable throwable) {
                    logger.warn("Retry for service {} picked up exception: context {}, throwable class {}", service, context, throwable.getClass());
                    // Only connection-level failures indicate a stale instance list
                    if (throwable instanceof ConnectTimeoutException || throwable instanceof NoRouteToHostException) {
                        try {
                            LoadBalancerCacheManager loadBalancerCacheManager = beanFactory.getBean(LoadBalancerCacheManager.class);
                            Cache loadBalancerCache = loadBalancerCacheManager.getCache(CachingServiceInstanceListSupplier.SERVICE_INSTANCE_CACHE_NAME);
                            if (loadBalancerCache != null) {
                                boolean result = loadBalancerCache.evictIfPresent(service);
                                logger.warn("Load Balancer Cache evictIfPresent result for service {} is {}", service, result);
                            }
                        } catch (Exception e) {
                            logger.error("Failed to clear load balancer cache", e);
                        }
                    }
                }
            };
            return new RetryListener[] { cacheClearingRetryListener };
        }
    }
}
Are there any issues with this approach? Could something like this be added to the built in functionality?
Shouldn't the target instance get removed automatically when this
happens? So it might happen once but after that the target pod list
will be repaired?
To resolve this issue, use the readiness and liveness probes in Kubernetes.
The readiness probe checks a health endpoint of your application at a configured interval. If the check fails, Kubernetes marks the pod as not ready to accept traffic, so no traffic goes to that pod (replica).
The liveness probe restarts your container if the check fails, so the pod comes up again; once Kubernetes gets a 200 response from the app, it marks the pod as ready to accept traffic.
You can create a simple endpoint in the application that responds with 200 or 204 as needed.
Read more at : https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
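As a rough illustration of such an endpoint, here is a minimal sketch that uses only the JDK's built-in com.sun.net.httpserver.HttpServer. The HealthEndpoint class, the /healthz path and the choice of 204/503 are assumptions made for this example; in a Spring Boot application the Actuator health endpoint would normally serve this purpose.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.function.BooleanSupplier;

/** Hypothetical probe endpoint built on the JDK HTTP server. */
final class HealthEndpoint {

    /** Maps the health flag to an HTTP status: 204 (no content) when healthy, 503 otherwise. */
    static int statusFor(boolean healthy) {
        return healthy ? 204 : 503;
    }

    /** Starts a server that answers requests to /healthz with the status from statusFor. */
    static HttpServer start(int port, BooleanSupplier healthy) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/healthz", exchange -> {
            // -1 tells the server that no response body will be sent
            exchange.sendResponseHeaders(statusFor(healthy.getAsBoolean()), -1);
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

The readiness and liveness probes would then point at the assumed /healthz path on the container port.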
Make sure your applications use the Kubernetes Service to talk to each other.
Application 1 > Kubernetes service of App 2 > Application 2 PODs
To enable load balancing based on Kubernetes Service name use the
following property. Then load balancer would try to call application
using address, for example service-a.default.svc.cluster.local
spring.cloud.kubernetes.loadbalancer.mode=SERVICE
The most typical way to use Spring Cloud LoadBalancer on Kubernetes is
with service discovery. If you have any DiscoveryClient on your
classpath, the default Spring Cloud LoadBalancer configuration uses it
to check for service instances. As a result, it only chooses from
instances that are up and running. All that is needed is to annotate
your Spring Boot application with @EnableDiscoveryClient to enable
K8s-native Service Discovery.
References : https://stackoverflow.com/a/68536834/5525824

Spring Boot microservices - dependency

There are two microservices deployed with Docker Compose. A dependency between the services is defined in the Docker Compose file by the depends_on property. Is it possible to achieve the same effect implicitly, inside the Spring Boot application?
Let's say microservice 1 depends on microservice 2, which means microservice 1 doesn't boot up before microservice 2 is healthy or registered on the Eureka server.
By doing some research, I found a solution to the problem.
Spring Retry resolves the dependency on Spring Cloud Config Server. The Maven dependency spring-retry should be added to pom.xml, and the properties below to the .properties file:
spring.cloud.config.fail-fast=true
spring.cloud.config.retry.max-interval=2000
spring.cloud.config.retry.max-attempts=10
The following configuration class is used to resolve dependency on other microservices.
@Configuration
@ConfigurationProperties(prefix = "depends-on")
@Data
@Log
public class DependsOnConfig {
    private List<String> services;
    private Integer periodMs = 2000;
    private Integer maxAttempts = 20;

    @Autowired
    private EurekaClient eurekaClient;

    // Blocks startup until every listed service is registered, or fails after maxAttempts rounds
    @Bean
    public void dependentServicesRegisteredToEureka() throws Exception {
        if (services == null || services.isEmpty()) {
            log.info("No dependent services defined.");
            return;
        }
        log.info("Checking if dependent services are registered to eureka.");
        int attempts = 0;
        while (!services.isEmpty()) {
            services.removeIf(this::checkIfServiceIsRegistered);
            TimeUnit.MILLISECONDS.sleep(periodMs);
            if (maxAttempts.intValue() == ++attempts)
                throw new Exception("Max attempts exceeded.");
        }
    }

    private boolean checkIfServiceIsRegistered(String service) {
        try {
            eurekaClient.getNextServerFromEureka(service, false);
            log.info(service + " - registered.");
            return true;
        } catch (Exception e) {
            log.info(service + " - not registered yet.");
            return false;
        }
    }
}
The list of services that the current microservice depends on is defined in the .properties file:
depends-on.services[0]=service-id-1
depends-on.services[1]=service-id-2
The bean dependentServicesRegisteredToEureka is not initialized until all services from the list are registered with Eureka. If needed, the annotation @DependsOn("dependentServicesRegisteredToEureka") can be added to beans or components to prevent them from being initialized before dependentServicesRegisteredToEureka.
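The polling loop itself does not need Spring or Eureka. The sketch below is a variation on the approach above, not the same code: DependencyWaiter and awaitAll are hypothetical names, the registration check is passed in as a Predicate, the input list is copied rather than mutated, and no sleep happens after the last round.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper that waits for a set of dependencies to become available. */
final class DependencyWaiter {

    /**
     * Polls until every service passes the check or maxAttempts rounds have run.
     * Returns the services that are still missing (empty when all are registered).
     */
    static List<String> awaitAll(List<String> services, Predicate<String> isRegistered,
                                 int maxAttempts, long periodMs) throws InterruptedException {
        List<String> pending = new ArrayList<>(services); // copy, so the caller's list is untouched
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            pending.removeIf(isRegistered);
            if (pending.isEmpty()) {
                return pending;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(periodMs); // wait before the next round
            }
        }
        return pending;
    }
}
```

A caller can treat a non-empty result as a startup failure, for example by throwing an exception that names the missing services.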

Feign Client throws HystrixTimeoutException even though the underlying request is successful

I have a feign client like this with endpoints to two APIs from PROJECT-SERVICE
@FeignClient(name = "PROJECT-SERVICE", fallbackFactory = ProjectServiceFallbackFactory.class)
public interface ProjectServiceClient {
    @GetMapping("/api/projects/{projectKey}")
    public ResponseEntity<Project> getProjectDetails(@PathVariable("projectKey") String projectKey);

    @PostMapping("/api/projects")
    public ResponseEntity<Project> createProject(@RequestBody Project project);
}
I'm using those clients like this:
@Service
public class MyService {
    @Autowired
    private ProjectServiceClient projectServiceClient;

    public void doSomething() {
        // Some code
        ResponseEntity<Project> projectResponse = projectServiceClient.getProjectDetails(projectKey);
        // Some more code
    }

    public void doSomethingElse() {
        // Some code
        ResponseEntity<Project> projectResponse = projectServiceClient.createProject(projectToBeCreated);
        // Some more code
    }
}
My problem is that most of the time (around 60% of the time), one of these Feign calls results in a HystrixTimeoutException.
I initially thought there could be a problem in the downstream micro service (PROJECT-SERVICE in this case), but that is not the case. In fact, when getProjectDetails() or createProject() is called, the PROJECT-SERVICE actually does the job and returns a ResponseEntity<Project> with status 200 and 201 respectively, but my fallback is activated with the HystrixTimeoutException.
I'm trying in vain to find what might be causing this issue.
I, however, have this in my main application configuration:
feign.hystrix.enabled=true
feign.client.config.default.connect-timeout=5000
feign.client.config.default.read-timeout=60000
Can anyone point me towards a solution?
Thanks,
Sriram Sridharan
Hystrix's timeout is not tied to Feign's. Hystrix has a default execution timeout of 1 second. Configure this timeout to be slightly longer than Feign's, so that HystrixTimeoutException is not thrown before the desired timeout is reached. Like so:
feign.client.config.default.connect-timeout=5000
feign.client.config.default.read-timeout=5000
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=6000
Doing so lets the FeignException caused by Feign's 5-second timeout be thrown first, so the fallback is triggered by the actual Feign failure rather than by a premature HystrixTimeoutException.
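The interaction between the two timeouts can be reproduced without Hystrix or Feign. The following is only a sketch with hypothetical names (OuterTimeoutDemo, callWithFallback, slowCall): an outer wrapper waits for a fixed time and falls back when that wait expires, even though the wrapped call would have completed successfully.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical model of a wrapper timeout (like Hystrix's) around a slower inner call (like Feign's). */
final class OuterTimeoutDemo {

    /** Runs the call on another thread and waits at most outerTimeoutMs, returning the fallback otherwise. */
    static String callWithFallback(Supplier<String> call, long outerTimeoutMs, String fallback) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(call);
        try {
            return future.get(outerTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback; // the wrapper gave up; the underlying call may still finish successfully
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the underlying call itself failed
        }
    }

    /** Simulates a downstream request that succeeds after the given delay. */
    static String slowCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "201 Created";
    }
}
```

Calling callWithFallback(() -> slowCall(300), 50, "fallback") yields the fallback although the call itself succeeds, which mirrors a 1-second Hystrix default wrapped around a slower request; raising the outer timeout above the inner duration returns the real response.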

Docker swarm springboot and eureka service discoverer not working

I am currently working on swarmifying our Spring Boot microservice back end, which uses a Eureka service discoverer. The first problem was making sure the service discoverer doesn't pick the ingress IP address but the IP address from the overlay network instead. After some searching I found a post that suggests the following Eureka client configuration:
@Configuration
@EnableConfigurationProperties
public class EurekaClientConfig {
    private ConfigurableEnvironment env;

    public EurekaClientConfig(final ConfigurableEnvironment env) {
        this.env = env;
    }

    @Bean
    @Primary
    public EurekaInstanceConfigBean eurekaInstanceConfigBean(final InetUtils inetUtils) throws IOException {
        final String hostName = System.getenv("HOSTNAME");
        String hostAddress = null;
        // Look for the network interface address whose host name matches HOSTNAME
        final Enumeration<NetworkInterface> networkInterfaces = NetworkInterface.getNetworkInterfaces();
        for (NetworkInterface netInt : Collections.list(networkInterfaces)) {
            for (InetAddress inetAddress : Collections.list(netInt.getInetAddresses())) {
                if (hostName.equals(inetAddress.getHostName())) {
                    hostAddress = inetAddress.getHostAddress();
                    System.out.printf("Inet used: %s", netInt.getName());
                }
                System.out.printf("Inet %s: %s / %s\n", netInt.getName(), inetAddress.getHostName(), inetAddress.getHostAddress());
            }
        }
        if (hostAddress == null) {
            throw new UnknownHostException("Cannot find ip address for hostname: " + hostName);
        }
        final int nonSecurePort = Integer.valueOf(env.getProperty("server.port", env.getProperty("port", "8080")));
        final EurekaInstanceConfigBean instance = new EurekaInstanceConfigBean(inetUtils);
        instance.setHostname(hostName);
        instance.setIpAddress(hostAddress);
        instance.setNonSecurePort(nonSecurePort);
        System.out.println(instance);
        return instance;
    }
}
After deploying the new discoverer I got the correct result: the service discoverer had the correct overlay IP address.
To understand the next step, here is some information about the environment we run this Docker swarm on. We currently have 2 droplets, one for development and the other for production. For now we are only swarmifying the development server. Production hasn't been touched in months.
The next step is to deploy a discovery client Spring Boot application that connects to the correct service discoverer and also uses the overlay IP address instead of the ingress one. But when I build the application, it always connects to our production service discoverer, which runs outside the Docker swarm on the other droplet. I can see the application being deployed on the swarm, but the Eureka dashboard on the production server shows that it connects there.
The second problem is that the application also has the EurekaClientConfig you see above, but it is ignored. Even the log statements within the method are not executed when the application starts up.
Here is the configuration from the discovery client application:
eureka:
  client:
    serviceUrl:
      defaultZone: service-discovery_service:8761/eureka
    enabled: false
  instance:
    instance-id: ${spring.application.name}:${random.value}
    prefer-ip-address: true
spring:
  application:
    name: account-service
I assume that you can use defaultZone to point at the correct service discoverer, but I could be wrong.
Just don't use a Eureka service discoverer; use something else, like Traefik. It is a much easier solution.

Spring Boot JPA and HikariCP maintaining active connections

Brief:
Is there a way to ensure that a connection to the database is returned to the pool?
Not-brief:
Data flow:
I have some long-running tasks that can be sent to the server in large-volume bursts.
For each request, the start of the submission is recorded in the DB. The request is then sent off for processing.
On failure or success, the result is recorded after the task has completed.
The issue is that from the moment the submission is recorded until the long-running task finishes, the request holds an "active" connection from the connection pool. This could potentially use up a pool of any size if the burst is large enough.
I am using Spring Boot with the following structure:
Controller - responds at "/" and has the "service" autowired.
Service - contains all the JPA repositories and @Transactional methods to interact with the database.
Whenever the first service method call is made from the controller, it opens an active connection and doesn't release it until the controller method returns.
So, is there a way to return the connection to the pool after each service method?
Here is the service class in total:
@Service
@Slf4j
class SubmissionService {
    @Autowired
    CompanyRepository companyRepository
    @Autowired
    SubmissionRepository submissionRepository
    @Autowired
    FailureRepository failureRepository
    @Autowired
    DataSource dataSource

    @Transactional(readOnly = true)
    public Long getCompany(String apiToken) {
        if (!apiToken) {
            return null
        }
        return companyRepository.findByApiToken(apiToken)?.id
    }

    @Transactional
    public void successSubmission(Long id) {
        log.debug("updating submission ${id} to success")
        def submissionInstance = submissionRepository.findOne(id)
        submissionInstance.message = "successfully analyzed."
        submissionInstance.success = true
        submissionRepository.save(submissionInstance)
    }

    @Transactional
    public long createSubmission(Map properties) {
        log.debug("creating submission ${properties}")
        dataSource.pool.logPoolState()
        def submissionInstance = new Submission()
        // Copy only the entries that match a property of Submission
        for (key in properties.keySet()) {
            if (submissionInstance.hasProperty(key)) {
                submissionInstance."${key}" = properties.get(key)
            }
        }
        submissionInstance.company = companyRepository.findOne(properties.companyId)
        submissionRepository.save(submissionInstance)
        return submissionInstance.id
    }

    @Transactional
    public Long failureSubmission(Exception e, Object analysis, Long submissionId) {
        // Track the failures
        log.debug("updating submission ${submissionId} to failure")
        def submissionInstance
        if (submissionId) {
            submissionInstance = submissionRepository.findOne(submissionId)
            submissionRepository.save(submissionInstance)
        }
        def failureInstance = new Failure(submission: submissionInstance, submittedJson: JsonOutput.toJson(analysis), errorMessage: e.message)
        failureRepository.save(failureInstance)
        return failureInstance.id
    }
}
It turns out that @M.Deinum was on the right track. Spring Boot's JPA auto-configuration turns on open-entity-manager-in-view handling (it registers an OpenEntityManagerInViewInterceptor) if the application property spring.jpa.open-in-view is set to true, which it is by default. I found this in the JPA configuration source.
After setting this to false, the database session was no longer held for the whole request, and my problems went away.
