Do WebSockets trigger IOException: Too many open files? - spring

I have a Spring Boot (version 2.2.1.RELEASE) application. The application has a scheduled task (call it Task-A) that makes a lot of requests to a third-party API that is occasionally down. The application also has an open WebSocket, so that clients can check the real-time status of some process. The WebSocket has the following configuration:
@Configuration
@EnableWebSocketMessageBroker
@EnableWebSocket
class WebSocketConfig : WebSocketMessageBrokerConfigurer {
    override fun registerStompEndpoints(registry: StompEndpointRegistry) {
        registry.addEndpoint("/ws/activity")
            .setAllowedOrigins("origin-from-where-connection-to-socket-comes.com")
            .withSockJS()
    }
}
And there is a second scheduled task (call it Task-B) that writes information for clients into the socket every 5 seconds:
@Component
class ChargersScheduled @Autowired constructor(
    private val processMonitor: ProcessMonitor,
    private val messagingTemplate: SimpMessagingTemplate
) {
    @Scheduled(fixedDelay = 5000)
    fun getSchedulersActivity() {
        messagingTemplate.convertAndSend("/web-socket/activity", processMonitor.checkActivity())
    }
}
At some point Task-A starts to throw IOException: Too many open files, and a minute or two later the logs start to fill with:
o.a.t.u.n.Acceptor : Socket accept failed
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:461)
at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:73)
at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:95)
at java.lang.Thread.run(Thread.java:748)
Is that caused by clients attempting to connect to the WebSocket, by clients making plain requests to the server, or perhaps both? Aside from increasing the ulimit (already done), what is the way to mitigate the problem? For now I had to restart the application, since it hung as if it were under a DDoS attack.

You should increase the file descriptor limit. ulimit by itself is not enough; you have to apply changes in some other files as well.
See this link.
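If you want to confirm where the descriptors are actually going before (or after) raising the limit, a small sketch like the one below can log the JVM's open file descriptor count, for example from the same scheduler that runs Task-A. This assumes a Unix JVM where the OperatingSystemMXBean is a com.sun.management.UnixOperatingSystemMXBean; the class name is only illustrative:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdMonitor {

    // Logs how many file descriptors the JVM currently holds versus the limit.
    // Calling this periodically (e.g. from a @Scheduled method) shows whether
    // Task-A's HTTP calls or the WebSocket sessions are leaking descriptors.
    public static void logOpenFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
            System.out.printf("open fds: %d / max: %d%n",
                    unixOs.getOpenFileDescriptorCount(),
                    unixOs.getMaxFileDescriptorCount());
        }
    }
}

If the count climbs steadily while Task-A runs against the unreachable API, the HTTP client is the likely leak; if it jumps with each new browser session, look at the WebSocket side.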

Related

How to fix "NoHttpResponseException" when running Wiremock on jenkins?

I start a WireMock server in an integration test.
The IT passes on my local machine BUT some cases fail on the Jenkins server; the error is
localhost:8089 failed to respond; nested exception is org.apache.http.NoHttpResponseException: localhost:8089 failed to respond
I tried adding sleep(3000) in my test, which fixes the issue, but I don't know the root cause, so the workaround is not a good idea.
I also tried using @AutoConfigureWireMock(port=8089) instead of WireMockServer to start the WireMock server, which also fixes the problem, BUT I don't know how to configure the WireMock server through the annotation @AutoConfigureWireMock(port=8089).
Here is my code to start the WireMock server; any suggestion to fix the "NoHttpResponseException"?
@ContextConfiguration(
        initializers = ConfigFileApplicationContextInitializer.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
class BaseSpec extends Specification {
    @Shared
    WireMockServer wireMockServer

    def setupSpec() {
        wireMockServer = new WireMockServer(options().port(PORT).jettyHeaderBufferSize(12345)
                .notifier(new ConsoleNotifier(new Boolean(System.getenv("IT_WIREMOCK_LOG") ?: 'false')))
                .extensions(new ResponseTemplateTransformer(true)))
        wireMockServer.start()
    }
}
Apache HttpClient suffers from NoHttpResponseException from time to time. This is a very old problem.
Anyway, I guess in your case the problem might be caused by restarting the WireMock server between tests, and at the same time, Apache HttpClient pools HTTP connections and tries to reuse them between tests. If this is the case, there are two solutions:
Disable pooling of HTTP connections in your tests (see the sketch below). This makes sense because it's considered normal that the WireMock server can be restarted during test execution. Alternatively, craft your WireMock stubs to always send "Connection": "close" among the headers. The outcome will be the same.
Switch from Apache HttpClient to Square OkHttp. OkHttp, although it pools HTTP connections by default, is always able to gracefully recover from situations like a stale connection. Unfortunately, the library from Apache is not so smart.
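For the first option, a minimal sketch of a test-only client that never reuses connections could look like this (assuming the Apache HttpClient 4.x builder API; the bean name and the way you wire it into your tests are illustrative):

import org.apache.http.client.HttpClient;
import org.apache.http.impl.NoConnectionReuseStrategy;
import org.apache.http.impl.client.HttpClientBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class NoReuseHttpClientTestConfig {

    // Every request gets a fresh connection, so a WireMock restart between
    // tests can never leave a stale pooled connection behind.
    @Bean
    @Primary
    public HttpClient noReuseTestClient() {
        return HttpClientBuilder.create()
                .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
                .build();
    }
}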
Correct; as already written by G. Demecki, it's not related to WireMock.
It's related to your application server, which is calling WireMock. Today it is common to reuse a connection to improve performance in a microservice infrastructure, so a Connection: close header, a request-scoped client, etc. are not useful.
Check the Apache HttpClient (httpclient-4.5.2) PoolingHttpClientConnectionManager documentation:
The handling of stale connections was changed in version 4.4. Previously, the code would check every connection by default before re-using it. The code now only checks the connection if the elapsed time since the last use of the connection exceeds the timeout that has been set. The default timeout is set to 2000ms
Each time a WireMock endpoint was destroyed and a new one was created for a new test class, it takes 2 seconds until your application detects that the previous connection is broken and a new one has to be opened.
If you don't wait those 2 seconds, such a NoHttpResponseException can be thrown, depending on when the last check happened.
So a Thread.sleep(2000); looks ugly, but it's not so bad, as long as we know why it is required.
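If you would rather not sleep, another option (a sketch against the HttpClient 4.4+ API; the 100 ms value is just an example) is to lower the stale-connection check interval on the pooling connection manager used by your test client, so a broken connection to a restarted WireMock is detected almost immediately instead of after the default 2000 ms window:

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class EagerStaleCheckClientFactory {

    public static HttpClient create() {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        // Re-validate any connection that has been idle for more than 100 ms
        // before reusing it, instead of the default 2000 ms.
        cm.setValidateAfterInactivity(100);
        return HttpClientBuilder.create()
                .setConnectionManager(cm)
                .build();
    }
}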
Each time a WireMock endpoint is destroyed (because the WireMock server is restarted between tests) and a new one is created for a new test, it takes 2 seconds (as stated in the documentation) until the application detects that the previous HTTP connection is broken and a new one has to be opened.
The solution is to simply override the default keep-alive connection behaviour for every stub using .withHeader("Connection", "close"). Something like:
givenThat(get("/endpoint_path")
.withHeader("Authorization", equalTo(authHeader))
.willReturn(
ok()
.withBody(body)
.withHeader(HttpHeaders.CONNECTION, "close")
)
)
It is also possible to do it globally using a transformer:
public class NoKeepAliveTransformer extends ResponseDefinitionTransformer {

    @Override
    public ResponseDefinition transform(Request request,
                                        ResponseDefinition responseDefinition,
                                        FileSource files,
                                        Parameters parameters) {
        return ResponseDefinitionBuilder
                .like(responseDefinition)
                .withHeader(CONNECTION, "close")
                .build();
    }

    @Override
    public String getName() {
        return "keep-alive-disabler";
    }
}
Then this transformer has to be registered when you create the WireMock server:
new WireMockServer(
options()
.port(port)
.extensions(NoKeepAliveTransformer.class)
)
The solution that worked for us in this situation was simply adding a retry handler to the Apache client:
@Configuration
public class FeignTestConfig {

    @Bean
    @Primary
    public HttpClient testClient() {
        return HttpClientBuilder.create().setRetryHandler((exception, executionCount, context) -> {
            if (executionCount > 3) {
                return false;
            }
            return exception instanceof org.apache.http.NoHttpResponseException
                    || exception instanceof SocketException;
        }).build();
    }
}
SocketException is there as well, because sometimes that exception is thrown instead of NoHttpResponseException.

Readiness probe during Spring context startup

We are deploying our Spring Boot applications in OpenShift.
Currently we are trying to run a potentially long-running task (a database migration) before the web context is fully set up.
It is especially important that the app does not accept REST requests or process messages before the migration has fully run.
See the following minimal example:
// DemoApplication.java
@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

// MigrationConfig.java
@Configuration
@Slf4j
public class MigrationConfig {
    @PostConstruct
    public void run() throws InterruptedException {
        log.info("Migration...");
        // long running task
        Thread.sleep(10000);
        log.info("...Migration");
    }
}

// Controller.java
@RestController
public class Controller {
    @GetMapping("/test")
    public String test() {
        return "test";
    }
}

// MessageHandler.java
@EnableBinding(Sink.class)
public class MessageHandler {
    @StreamListener(Sink.INPUT)
    public void handle(String message) {
        System.out.println("Received: " + message);
    }
}
This works fine so far: the configuration class is processed before the app responds to requests.
What we are worried about, however, is OpenShift's readiness probe: currently we use an actuator health endpoint to check whether the application is up and running.
If the migration takes a long time, OpenShift might stop the container, potentially leaving us with an inconsistent state in the database.
Does anybody have an idea how we could communicate that the application is starting, while preventing the REST controller or message handlers from running?
Edit
There are multiple ways of blocking incoming REST requests; @martin-frey suggested a servlet filter.
The larger problem for us is the stream listener. We use Spring Cloud Stream to listen to a RabbitMQ queue.
I added an exemplary handler in the example above.
Do you have any suggestions on how to "pause" that?
What about a servlet filter that knows about the state of the migration? That way you should be able to handle any inbound request and return a response code of your liking. There would also be no need to block any request handlers until the system is fully up.
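A minimal sketch of such a filter (the class name, the static flag and the health-check path are illustrative; the flag would be flipped by whatever component runs the migration):

import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class MigrationGateFilter extends OncePerRequestFilter {

    // Set to true by the migration runner once it has finished.
    public static final AtomicBoolean MIGRATION_DONE = new AtomicBoolean(false);

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        // Let the health endpoint through so the readiness probe can still be answered.
        if (MIGRATION_DONE.get() || request.getRequestURI().startsWith("/actuator/health")) {
            chain.doFilter(request, response);
        } else {
            response.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE, "Migration in progress");
        }
    }
}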
I think your app pod can run without being affected if you set a large enough initialDelaySeconds for the initialization of your application. [0][1]
readinessProbe:
  httpGet:
    path: /_status/healthz
    port: 8080
  initialDelaySeconds: 10120
  timeoutSeconds: 3
  periodSeconds: 30
  failureThreshold: 100
  successThreshold: 1
Additionally, I recommend setting up the liveness probe with the same condition (but with more time than the readiness probe's value); then you can implement automated recovery of your pods if the application fails even after initialDelaySeconds.
[0] https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes
[1] https://docs.openshift.com/container-platform/latest/dev_guide/application_health.html
How about adding an init container whose only role is the DB migration, without the application?
Then another container serves the application. But be careful when deploying the application with more than 1 replica: the replicas will also execute the init container at the same time if you are using a Deployment.
If you need multiple replicas, you might want to consider StatefulSets instead.
Such database migrations are best handled by switching to a Recreate deployment strategy and doing the migration as a mid-lifecycle hook. At that point there are no instances of your application running, so it can be done safely. If you can't have downtime, then you need to be able to switch the application to some offline or read-only mode against a copy of your database while doing the migration.
Don't keep the context busy doing a long task in @PostConstruct. Instead, start the migration as a fully asynchronous task and allow Spring to build the rest of the context in the meantime. At the end of the task, just complete some shared Future with success or failure. Wrap the controller in a proxy (this can be facilitated with AOP, for example) where every method except the health check tries to get the value from that same future within a timeout. If it succeeds, the migration is done and all calls are available; if not, reject the call. Your proxy would serve as a gate, allowing only the part of the API that is critical to stay available while the migration is going on. The rest of it may simply respond with 503, indicating that the service is not ready yet. Those 503 responses can potentially be improved by measuring and averaging the time the migration typically takes and returning this value in a Retry-After header.
And with the MessageHandler you can do essentially the same thing: wait for the result of the future in the handle method (provided message handlers are allowed to hang indefinitely). Once the result is set, it will proceed with message handling from that moment on.
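A rough sketch of that gate (all names, the trigger point and the timeout are illustrative; the AOP proxy around the controllers and the message handler would both call awaitMigration() before doing real work):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import javax.annotation.PostConstruct;

import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ResponseStatusException;

@Component
public class MigrationGate {

    // Completed (successfully or exceptionally) when the asynchronous migration finishes.
    private final CompletableFuture<Void> migrationDone = new CompletableFuture<>();

    @PostConstruct
    public void startMigration() {
        // The context is not blocked: the heavy work runs on a separate thread.
        CompletableFuture.runAsync(() -> {
            // ... long running migration ...
        }).whenComplete((ok, ex) -> {
            if (ex != null) {
                migrationDone.completeExceptionally(ex);
            } else {
                migrationDone.complete(null);
            }
        });
    }

    // Called by the controller proxy and the message handler before doing any real work.
    public void awaitMigration(long timeout, TimeUnit unit) {
        try {
            migrationDone.get(timeout, unit);
        } catch (TimeoutException e) {
            throw new ResponseStatusException(HttpStatus.SERVICE_UNAVAILABLE, "Migration still running");
        } catch (Exception e) {
            throw new IllegalStateException("Migration failed", e);
        }
    }
}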

How can I make an MDB deploy last on my WildFly?

What is happening to me is that the MDB receives messages and tries to process them even though my server has not started completely.
Any idea how to solve this?
You can find out whether your server startup is complete with one of the following two techniques:
Use a ServletContextListener: once your application deployment is complete, the server calls the ServletContextListener.contextInitialized method (see the sketch after the code below).
Use MBean support from WildFly: you can query the MBean via WildFly's JMX interface and figure out whether the server state is 'started'. But mind you, your code would be tied to WildFly only in this case.
Once you decide on the option to determine the server startup state, you need to check for it in your MDB's @PostConstruct method and go ahead only if the server has started.
@MessageDriven(...)
public class MyMdb implements MessageListener {

    @PostConstruct
    public void init() {
        // check if server has started here
        // if the server is not started, sleep and re-check again
    }

    public void onMessage(Message message) {
    }
}
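For the first option, the listener side could be as simple as this sketch (the class name and the static flag are illustrative; the MDB's @PostConstruct would poll StartupListener.isStarted() in its sleep-and-recheck loop):

import java.util.concurrent.atomic.AtomicBoolean;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class StartupListener implements ServletContextListener {

    private static final AtomicBoolean STARTED = new AtomicBoolean(false);

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Called by the server once the web application deployment is complete.
        STARTED.set(true);
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        STARTED.set(false);
    }

    public static boolean isStarted() {
        return STARTED.get();
    }
}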

Spring STOMP over WebSockets not scheduling heartbeats

We have a Spring over WebSockets connection that we're passing a CONNECT frame:
CONNECT\naccept-version:1.2\nheart-beat:10000,10000\n\n\u0000
Which the handler acknowledges, starts a new session, and then returns:
CONNECTED
version:1.2
heart-beat:0,0
However, we want the heart-beats so we can keep the WebSocket open. We're not using SockJS.
I stepped through the Spring Message Handler:
StompHeaderAccessor [headers={simpMessageType=CONNECT, stompCommand=CONNECT, nativeHeaders={accept-version=[1.2], heart-beat=[5000,0]}, simpSessionAttributes={}, simpHeartbeat=[J@5eba717, simpSessionId=46e855c9}]
After it gets the heart-beat (native header), it sets what looks like a memory address: simpHeartbeat=[J@5eba717, simpSessionId=46e855c9}]
Of note, after the broker authenticates:
Processing CONNECT session=46e855c9 (the sessionId here is different than simpSessionId)?
When running earlier TRACE debugging I saw a notice "Scheduling heartbeat..." or something to that effect...though I'm not seeing it now?
Any idea what's going on?
Thanks
I have found the explanation in the documentation:
SockJS Task Scheduler stats from thread pool of the SockJS task scheduler which is used to send heartbeats. Note that when heartbeats are negotiated on the STOMP level the SockJS heartbeats are disabled.
Are SockJS heartbeats different than STOMP heart-beats?
Starting with Spring 4.2 you can have full control, from the server side, of the heartbeat negotiation outcome using STOMP over SockJS with the built-in SimpleBroker:
public class WebSocketConfigurer extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        ThreadPoolTaskScheduler te = new ThreadPoolTaskScheduler();
        te.setPoolSize(1);
        te.setThreadNamePrefix("wss-heartbeat-thread-");
        te.initialize();
        config.enableSimpleBroker("/")
                /**
                 * Configure the value for the heartbeat settings. The first number
                 * represents how often the server will write or send a heartbeat.
                 * The second is how often the client should write. 0 means no heartbeats.
                 * <p>By default this is set to "0, 0" unless the {@link #setTaskScheduler
                 * taskScheduler} is configured, in which case the default becomes "10000,10000"
                 * (in milliseconds).
                 * @since 4.2
                 */
                .setHeartbeatValue(new long[]{heartbeatServer, heartbeatClient})
                .setTaskScheduler(te);
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint(.....)
                .setAllowedOrigins(....)
                .withSockJS();
    }
}
Yes, SockJS heartbeats are different. They are fundamentally the same thing, but their purpose in the SockJS protocol is to ensure that the connection doesn't look like it's "dead", in which case proxies could close it proactively. More generally, a heartbeat allows each side to detect connectivity issues proactively and clean up resources.
When using STOMP over SockJS at the transport layer, there is no need to have both, which is why the SockJS heartbeats are turned off if STOMP heartbeats are in use. However, you're not using SockJS here.
You're not showing any configuration but my guess is that you're using the built-in simple broker which does not automatically send heartbeats. When configuring it you will see an option to enable heartbeats and you also need to set a task scheduler.
@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // ...
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        registry.enableStompBrokerRelay(...)
                .setTaskScheduler(...)
                .setHeartbeat(...);
    }
}
We had the same problem with Spring, WebSockets, STOMP and Spring Session: no heartbeats, and the Spring session could expire while the WebSocket wasn't receiving messages on the server side. We ended up enabling STOMP heartbeats from the browser every 20000 ms and adding SimpMessageType.HEARTBEAT to the Spring sessionRepositoryInterceptor matches, to keep the Spring session's last-accessed time updated on STOMP heartbeats even when no messages arrive. We had to use AbstractSessionWebSocketMessageBrokerConfigurer as a base to get the built-in Spring session and WebSocket session binding (Spring manual, second example). In the official example the Spring session is updated on inbound WebSocket CONNECT/MESSAGE/SUBSCRIBE/UNSUBSCRIBE messages, but not on heartbeats, which is why we needed to re-configure two things: enable at least inbound heartbeats, and adjust the Spring session to react to WebSocket heartbeats.
public class WebSocketConfig extends AbstractSessionWebSocketMessageBrokerConfigurer<ExpiringSession> {

    @Autowired
    SessionRepositoryMessageInterceptor sessionRepositoryInterceptor;

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        sessionRepositoryInterceptor.setMatchingMessageTypes(EnumSet.of(SimpMessageType.CONNECT,
                SimpMessageType.MESSAGE, SimpMessageType.SUBSCRIBE,
                SimpMessageType.UNSUBSCRIBE, SimpMessageType.HEARTBEAT));
        config.setApplicationDestinationPrefixes(...);
        config.enableSimpleBroker(...)
                .setTaskScheduler(new DefaultManagedTaskScheduler())
                .setHeartbeatValue(new long[]{0, 20000});
    }
}
Another way we tried was re-implementing some of the SessionRepositoryMessageInterceptor functionality to update the Spring session's last-accessed time on outbound WebSocket messages, plus maintaining a WebSocket-session-to-Spring-session map via listeners, but the code above did the trick.

SockJS receive stomp messages from spring websocket out of order

I am trying to stream time-series data using the Spring Framework SimpMessagingTemplate (the default STOMP implementation) to broadcast messages to a topic that the SockJS client is subscribed to. However, the messages are received out of order. The server is single-threaded and messages are sent in ascending order by their timestamps; the client somehow receives the messages out of order.
I am using the latest release versions of both stompjs and the Spring Framework (4.1.6.RELEASE).
Looks like there is a built-in striped executor, so just enable it:
@Override
protected void configureMessageBroker(MessageBrokerRegistry registry) {
    // ...
    registry.setPreservePublishOrder(true);
}
https://docs.spring.io/spring/docs/current/spring-framework-reference/web.html#websocket-stomp-ordered-messages
Found the root cause of this issue. The messages were being sent in the "correct" order from the application implementation perspective (i.e., convertAndSend() was called in one thread, or at least in a thread-safe fashion). However, Spring's WebSocket support uses a reactor-tcp implementation which processes the messages on the clientOutboundChannel from a thread pool. Thus the messages can be written to the TCP socket in a different order than they arrived. When I configured the WebSocket to limit the clientOutboundChannel to 1 thread, the order was preserved.
This problem is not in SockJS but is a limitation of the current Spring WebSocket design.
It's a Spring WebSocket design problem. To receive messages in a valid order you have to set the corePoolSize of the WebSocket client channels to 1.
@Configuration
@EnableWebSocketMessageBroker
public class WebSocketMessageBrokerConfiguration extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void configureClientOutboundChannel(ChannelRegistration registration) {
        registration.taskExecutor().corePoolSize(1);
    }

    @Override
    public void configureClientInboundChannel(ChannelRegistration registration) {
        registration.taskExecutor().corePoolSize(1);
    }
}
UPDATE
Please see @Jason's answer. Spring 5.1 has setPreservePublishOrder() to order the messages based on their client ID.
I experienced this issue as well. I don't like to limit my thread pool size to 1, as this causes overhead in my application. Instead, I used a StripedExecutorService to process messages coming in and out of my application. This type of executor service guarantees ordered processing of messages for tasks that have the same stripe. For me, I use the WebSocket session ID as the stripe. Register this executor via ChannelRegistration.taskExecutor() on your inbound, broker, and outbound channels, and this will guarantee ordered messages. Choose your stripe wisely.
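The striping idea itself is small enough to sketch without any particular library (this is not the Spring channel registration itself, just the core of a striped executor: hash the stripe key, here the WebSocket session ID, onto a fixed set of single-threaded executors so that tasks with the same stripe run in submission order):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StripedExecutor {

    private final ExecutorService[] stripes;

    public StripedExecutor(int stripeCount) {
        stripes = new ExecutorService[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            // One thread per stripe preserves ordering within a stripe,
            // while different stripes still run in parallel.
            stripes[i] = Executors.newSingleThreadExecutor();
        }
    }

    public void execute(String stripeKey, Runnable task) {
        int index = Math.floorMod(stripeKey.hashCode(), stripes.length);
        stripes[index].execute(task);
    }
}

Tasks submitted with the same session ID always land on the same single-threaded executor and therefore keep their order, while overall throughput still scales with the number of stripes.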
