My Pivotal Cloud Foundry app is crashing often during health checks - spring-boot

I have created a Spring Boot integration app and deployed it to a Pivotal Cloud Foundry (PCF) environment. It works for a couple of days and then starts to crash randomly. I checked the PCF logs and found this information about the crash:
OUT App instance exited with guid 3c348d47-48c4-403f-950a-29af1efa551d
payload: {"instance"=>"e2122543-214f-4806-62c7-00e1", "index"=>2,
"reason"=>"CRASHED", "exit_description"=>"Instance became unhealthy: Failed
to make HTTP request to '/health' on port 8080: timed out after 1.00
seconds", "crash_count"=>1, "crash_timestamp"=>1511959503256098495,
"version"=>"10cea919-d490-460d-83d6-5132c96ef781"}
My CPU utilization is low, and my memory is not leaking.
Information about the application deployed in PCF:
The Spring Boot integration app connects to IBM MQ queues, polls for messages, and then calls a couple of web services.
There is also another application, Service Bus, which makes the health check call to the PCF application to check whether the PCF app is available. If Service Bus finds the PCF app available, requests are routed to PCF; otherwise they are processed at the Service Bus end itself.
Please let me know how to find the root cause of the crash and fix it.
Thanks in advance. Please let me know if you need further details.

I have changed the health check type from http to port in the manifest.yml file.
The configuration change in the manifest file is as follows:
health-check-type: port
Now the app is not crashing; it is working fine. Hope this helps.
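For context, a minimal manifest.yml sketch with this setting in place might look like the following; the application name and resource values are placeholders, not from the original deployment:

applications:
- name: my-integration-app   # placeholder name
  memory: 1G                 # placeholder value
  # Use a TCP port check instead of the default HTTP /health check,
  # so a slow /health response no longer marks the instance unhealthy.
  health-check-type: port

Note that a port check only verifies that the app accepts TCP connections on its port, so it is less strict than the HTTP check it replaces.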

Related

How to retry indefinitely while connecting a Spring Boot application to Consul

We use Consul by HashiCorp for configuration management in our Spring Boot application. The Consul agent is occasionally unavailable when our app starts, and it is inconvenient that the app fails when Consul isn't available to provide configuration.
I read about Consul Retry https://cloud.spring.io/spring-cloud-static/spring-cloud-consul/1.2.3.RELEASE/single/spring-cloud-consul.html#spring-cloud-consul-retry
Although that is a good solution for retrying, is there any solution in which the application retries indefinitely until it connects to Consul?
We deploy this app on Tomcat. Is there a better solution than consul-retry for reconnection? Should we try reloading the app on Tomcat whenever it fails?
Can anyone tell me the best practice for reconnection in this scenario?
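The spring-cloud-consul retry support has no documented "retry forever" switch, but one common workaround is to set the maximum attempts very high. A sketch, assuming the spring.cloud.consul.retry.* properties from the linked documentation; the values themselves are illustrative:

spring:
  cloud:
    consul:
      retry:
        initial-interval: 1000     # first back-off, in milliseconds
        max-interval: 30000        # upper bound on the back-off
        multiplier: 1.5            # exponential back-off factor
        max-attempts: 2147483647   # effectively retry indefinitely

With a bounded max-interval, setting max-attempts to Integer.MAX_VALUE behaves like an indefinite retry, without reloading the app on Tomcat.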

[Ignite] Problem upgrading Ignite 2.7 services to 2.10: services always getting cancelled?

I completed the code upgrade and removed all deprecated calls from my service code.
This is a Java Spring application, and all services come up (as reported in the log files) when the cluster of 5 nodes starts up.
However, when I try to get a service proxy to each service, all services get cancelled because ServiceDeploymentTask attempts to redeploy them. The redeploy fails: it cancels all of the services and fails to restart them. This can be demonstrated with both a thick and a thin client.
Why is Ignite trying to redeploy the services? (And why don't the services restart?)
Is there something I'm missing from the move to Ignite 2.10?
Finally, why does a Java Thin Client create a NODE_JOIN event?
Thanks in advance.
Greg

How to restart a Kubernetes pod when logs show a RabbitMQ connectivity issue

I have a Spring Boot 2 standalone application (not a REST service) which connects to RabbitMQ and processes messages. The application is deployed in Kubernetes. It works well, but when RabbitMQ stays down a little longer, I see a 60-second heartbeat exception in the logs and the connection eventually drops, even if RabbitMQ comes back up after some time:
Automatic retry connection to broker by spring-rabbitmq
https://www.rabbitmq.com/heartbeats.html
I tried to manage the issue by increasing the number of retries, as described here: https://stackoverflow.com/questions/45385119/how-configure-timeouts-retries-or-max-attempts-in-differents-queues-with-spring
But after the retries are exhausted, the issue still occurs.
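For reference, the retry settings referenced above are Spring Boot's standard spring.rabbitmq.listener.simple.retry.* properties; the values below are illustrative only:

spring:
  rabbitmq:
    listener:
      simple:
        retry:
          enabled: true            # wrap listeners in a retry interceptor
          max-attempts: 10         # give up after this many deliveries
          initial-interval: 5000   # milliseconds between attempts

Note that these retries apply to message processing; connection recovery after a broker outage is handled separately by Spring AMQP.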
How can I have Kubernetes reboot or delete and recreate the pod when this issue appears in the logs?
The easiest way is to use Spring Boot Actuator, which provides a /actuator/health endpoint. (Note that recent versions also add /actuator/health/liveness and /actuator/health/readiness.)
You can assign the endpoint to the livenessProbe property in Kubernetes; the pod will then be restarted automatically when necessary. You can also parameterize what counts as your app being down, if necessary.
See the docs:
Kubernetes liveness probe
Spring actuator health
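As a sketch, the livenessProbe in the deployment's container spec might look like the following; the port and timing values are assumptions, and the /actuator/health/liveness path requires Spring Boot 2.3+ with the probes enabled:

livenessProbe:
  httpGet:
    path: /actuator/health/liveness   # or /actuator/health on older Spring Boot
    port: 8080                        # assumed server port
  initialDelaySeconds: 60   # give the app time to start
  periodSeconds: 10         # probe every 10 seconds
  failureThreshold: 3       # restart after 3 consecutive failures

By default the liveness group does not include external dependencies such as RabbitMQ, so if you want lost broker connectivity to trigger a restart, you would need to add the corresponding health indicator to that group (and accept the restart churn that implies).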

Spring Boot microservices don't work with IntelliJ IDEA

I am creating a Spring Boot microservice project with IntelliJ IDEA.
Currently I have developed three separate Spring Boot REST services: a customer service, a vehicle service, and a Spring Cloud Config server. The config server points to a GitHub repository.
The issue is that these projects sometimes take more than 10 minutes to start, and sometimes don't run at all, giving the error message "failed to check application readystate intellij attached provider for the vm is not found". I have no idea why this happens.
There are two possible causes:
1. IntelliJ IDEA and the Spring application are running in different JVMs.
There is a bug report for IntelliJ IDEA about this:
https://youtrack.jetbrains.com/issue/IDEA-210665
Here is a short summary:
By default, IntelliJ IDEA uses a local JMX connector for retrieving the Spring Boot actuator endpoints' data. However, it can be impossible to get the local JMX connector address via the attach API if the Spring Boot application and IntelliJ IDEA are run by different JVMs. In this case, add the following lines to the VM options of your Spring Boot run configuration:
-Dcom.sun.management.jmxremote.port={some_port}
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
As mentioned in the official Oracle documentation, this configuration is insecure. Any remote user who knows (or guesses) your port number and host name will be able to monitor and control your Java applications and platform.
2. Prolonged time to retrieve local hostname
You can check that time using inetTester. Normally it should take only a few milliseconds to complete. If it takes a long time, you can add the hostname returned by inetTester to the /etc/hosts file like this:
127.0.0.1 localhost winsky
::1 localhost winsky

Eureka First Discovery & Config Client Retry with Docker Compose

We have three Spring Boot applications:
Eureka Service
Config Server
Simple Web Service making use of Eureka and Config Server
I've set up the services to use Eureka First Discovery, i.e. the simple web service finds out about the config server from the Eureka service.
When started separately (either locally or as individual Docker images) everything is OK, i.e. the config server is started after the discovery service is running, and the simple web service is started once the config server is running.
When docker-compose is used to start the services, they obviously start at the same time and essentially race to get up and running. This isn't a blocker: we've added failFast: true and retry values to the simple web service, and the Docker container restarts, so the simple web service will eventually come up at a time when the discovery service and config server are both running. But this doesn't feel optimal.
The unexpected behaviour we noticed was the following:
The simple web service reattempts a number of times to connect to the discovery service. This is sensible and expected.
At the same time, the simple web service attempts to contact the config server. Because it cannot contact the discovery service, it falls back to retrying a config server on localhost, e.g. the logs show retries going to http://localhost:8888. This wasn't expected.
The simple web service eventually connects to the discovery service, but the logs show it still tries to reach the config server at http://localhost:8888. Again, this wasn't ideal.
Three questions/observations:
Is it a sensible strategy for the config client to fall back to trying localhost:8888 when it has been configured to use discovery to find the config server?
When the Eureka connection is established, shouldn't the retry mechanism switch to the config server endpoint indicated by Eureka? Setting higher/longer retry intervals and periods for the config server connection is pointless in this case, as it will never connect while it's looking at localhost, so we're better off just failing fast.
Are there any properties that can override this behaviour? (A configuration sketch follows the repo link below.)
I've created a sample github repo that demonstrates this behaviour:
https://github.com/KramKroc/eurekafirstdiscovery/tree/master
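For reference, a minimal bootstrap.yml for the simple web service under this setup might look like the following. The property names come from Spring Cloud Config's discovery-first bootstrap and retry documentation; the service id, host names, and retry values are assumptions for illustration:

spring:
  application:
    name: simple-web-service       # placeholder name
  cloud:
    config:
      fail-fast: true              # fail fast and let the container restart
      discovery:
        enabled: true              # discovery-first: look up the config server...
        service-id: configserver   # ...in Eureka under this id (assumed)
      retry:
        initial-interval: 2000
        max-interval: 10000
        max-attempts: 20
eureka:
  client:
    serviceUrl:
      defaultZone: http://discovery:8761/eureka/   # docker-compose service name assumed

As observed in the question, the localhost:8888 fallback comes from the config client's default spring.cloud.config.uri, which is used whenever discovery has not (yet) produced a config server instance.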