Correct usage of LoadbalanceRSocketClient with Spring's RSocketRequester - spring-messaging

I'm trying to understand the correct configuration and usage pattern of LoadbalanceRSocketClient in a context of SpringBoot application (RSocketRequester).
I have two RSocket server backends (SpringBoot, RSocket messaging) running and configuring the RSocketRequester on a client side like this:
List<LoadbalanceTarget> servers = new ArrayList<>();
for (String url: backendUrls) {
HttpClient httpClient = HttpClient.create()
.baseUrl(url)
.secure(ssl ->
ssl.sslContext(SslContextBuilder.forClient().trustManager(InsecureTrustManagerFactory.INSTANCE)));
servers.add(LoadbalanceTarget.from(url, WebsocketClientTransport.create(httpClient, url)));
}
// RSocketRequester.Builder is autowired by Spring boot
RSocketRequester requester = builder
.setupRoute("/connect")
.setupData("test")
//.rsocketConnector(connector -> connector.reconnect(Retry.fixedDelay(60, Duration.ofSeconds(1))))
.transports(Flux.just(servers), new RoundRobinLoadbalanceStrategy());
Once configured, the requester is being used repeatedly form the timer loop, as following:
#Scheduled(fixedDelay = 10000, initialDelay = 1000)
public void timer() {
requester.route("/foo").data(Data).send().block();
}
It works - client starts, connects to one of the servers and pushes messages to it. If I kill the server that clients connected to, client reconnects to another server on the next timer event. If I start first server again and kill a second one though, client doesn't connect anymore and the following exeption is observed on a client side:
java.util.concurrent.CancellationException: Pool is exhausted
at io.rsocket.loadbalance.RSocketPool.select(RSocketPool.java:202) ~[rsocket-core-1.1.0.jar:na]
at io.rsocket.loadbalance.LoadbalanceRSocketClient.lambda$fireAndForget$0(LoadbalanceRSocketClient.java:49) ~[rsocket-core-1.1.0.jar:na]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:125) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:220) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1784) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.MonoZip$ZipCoordinator.signal(MonoZip.java:251) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.MonoZip$ZipInner.onNext(MonoZip.java:336) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1784) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.MonoCallable.subscribe(MonoCallable.java:61) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Mono.subscribe(Mono.java:3987) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.MonoZip.subscribe(MonoZip.java:128) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Mono.subscribe(Mono.java:3987) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Mono.block(Mono.java:1678) ~[reactor-core-3.4.0.jar:3.4.0]
I suspect that I'm either not configuring the requester correctly or not using it properly. Would appreciate any hints as documentation and tests are seems to be pretty thin in this area.
Ideally I would want a client to transparently switch to any next available server upon server/connectivity failure. Right now re-connection attempt seems to be happening only on the next call to timer() method, which is not ideal as client needs to handle incoming messages from the server. Another thing I observed is that even so "/foo" is a FnF route, unless I do block() after a send() server never receives the call.

Update Endpoints List Continuously
LoadbalanceClient is designed to be integrated with the Discovery service which is responsible for keeping a List of alive Instances. That said if one of the services disappears from the cluster, the Discovery service updates its List of available Instances.
On the other hand, to implement client-side loadblancing, we have to know the list of available services in the cluster. It is obvious, that to setup loadbalancing, we can retrieve the list of services and supply it to the Loadbalancer API.
ReactiveDiscoveryClient discoveryClient = ...
Mono<List<LoadbalanceTarget>> serversMono = discoveryClient
.getInstances(serviceGroupName)
.map(si -> {
HttpClient httpClient = HttpClient.create()
.baseUrl(si.getUri())
.secure(ssl -> ssl.sslContext(
SslContextBuilder.forClient()
.trustManager(InsecureTrustManagerFactory.INSTANCE)
));
return LoadbalanceTarget.from(si.getUri(), WebsocketClientTransport.create(httpClient, "/rsocket")));
})
.collectList()
// RSocketRequester.Builder is autowired by Spring boot
RSocketRequester requester = builder
.setupRoute("/connect")
.setupData("test")
.transports(serversMono.flux(), new RoundRobinLoadbalanceStrategy());
However, imagine that we are in a fully distributed environment, and now every service that disappears and appears again - runs on the absolutely new host and port (e.g. kubernates cluster which does not stick to a particular IP address). That said, Loadbalancing has to consider such a scenario and to avoid dead nodes in the pool, it removes unhealthy nodes from the pool completely.
Now, if all the nodes disappeared and appeared after some time, they are not included in the pool anymore (and if the Flux, which provides updates is completed, effectively, the pool is exhausted because no new update will come in from the Flux<List<LodbalanceTarget>>).
However, the nodes register themselves into the Discovery service and become available for observation. All that said we have to periodically pull info from the Discovery service to be up to date and update pool state continuously
ReactiveDiscoveryClient discoveryClient = ...
Flux<List<LoadbalanceTarget>> serversFlux = discoveryClient
.getInstances(serviceGroupName)
.map(si -> {
HttpClient httpClient = HttpClient.create()
.baseUrl(si.getUri())
.secure(ssl -> ssl.sslContext(
SslContextBuilder.forClient()
.trustManager(InsecureTrustManagerFactory.INSTANCE)
));
return LoadbalanceTarget.from(si.getUri(), WebsocketClientTransport.create(httpClient, "/rsocket")));
})
.collectList()
.repeatWhen(f -> f.delayElements(Duration.ofSeconds(1))) // <- continuously retrieve new List of ServiceInstances
// RSocketRequester.Builder is autowired by Spring boot
RSocketRequester requester = builder
.setupRoute("/connect")
.setupData("test")
.transports(servers, new RoundRobinLoadbalanceStrategy());
With such a setup, the RSocketPool will not be exhausted if all the nodes disappear from the cluster, because the Flux<List<LoadbalanceTraget>> has not completed yet and may provide new updates eventually.
Note, the implementation is smart enough to keep active nodes on every update from the discovery service. That said if there is such a service instance in the pool, you will not get 2 connections at the same time.
Side note on reconnect feature
You may notice, that RSocketConnector provides such a great feature called .reconnect. At first glance, it may seem that the usage of reconnect will keep your connection up and running infinitely. Unfortunately, that is not true. The .reconnect feature is designed to keep your Mono<RSocket> reusable with cache semantic, which means that you may create a #Bean Mono<RSocket> ... and autowire it in a various place and subscribe multiple times without worrying that the result RSocket instance will be different on every Mono<RSocket>.subscribe. On the other hand, .reconnect, if given RSocket becomes disconnected (e.g. lost connection case) the next subscription to such a Mono<RSocket> will resistible a new RSocket only once for all concurrent .subscribe calls.
Though it sounds useful feature, in RSocketPool we do not rely on it much and use Mono<RSocket> only once to resolve and cache an instance of RSocket inside RSocketPool. That said if such RSocket will be disconnected, we will not be trying to subscribe to the given Mono<RSocket> again (we assume, that set up host and port will be changed)

For the question around FnF, this is part of the Rx model. Without a subscribe the event doesn't happen. You are free to call an API returning a Mono without side effects before the subscribe, any other behaviour is a bug.
/**
* Perform a Fire-and-Forget interaction via {#link RSocket#fireAndForget(Payload)}. Allows
* multiple subscriptions and performs a request per subscriber.
*/
Mono<Void> fireAndForget(Mono<Payload> payloadMono);
If you call this method once, and then subscribe 3 times on the result it will execute it 3 times.

Oleh, I tried what you suggested and it works to some extent, although I still can't quite get the behavior I need.
What I want to do is:
Client connects to a single (random) backend at a time
If backend or connectivity to the backend fails, client should try to connect to the next available backend.
I guess I can't use RoundRobinLoadbalanceStrategy as it connects the client to all available backends. Should I use WeightedLoadbalanceStrategy instead? Or should discoveryClient abstraction only return a single server every time - but that no longer would be a 'pool' client, right?
Perhaps I should re-think by approach in general. I have a few dozens of thousands of clients so I want to balance the load on the back end - spread it across multiple instances of the backend, so each client randomly connects to one instance of the backend but is capable of re-connecting to another instance, if instance it conneced to fails. I assume that this is not a good idea to connect all clients to every backend instance at the same time, but maybe I'm wrong?

Related

MassTransit timeouts under load on .NETFramework under IIS

Under load in production we receive "RabbitMQ.Client.Exceptions.ConnectFailureException" connection failed and "MassTransit.RequestTimeoutException" timeout waiting for response. The consumer does receive the message and send it back. It's like the web app isn't listening, or unable to accept the connection.
We're running an ASP.NET web application ( not MVC ) on .NET Framework 4.6.2 on Windows Server 2019 on IIS. We're using MassTransit 7.0.4. In production, under load, we can get some exceptions dealing with sockets on RabbitMQ or timeouts from masstransit. It's difficult to reproduce them in Dev. RabbitMQ is in a mirror, it seems to happen once we turn on a high-load service that bumps from 140 message/sec to 250 message/sec.
I have a few questions about the code architecture, and then if anyone else is running into these kinds of timeout issues.
Questions:
Should I have static scope for the IBusControl? IE, should it be static inside Global asax? And does it matter at all if it's a singleton underneath?
Should I create a new IBusControl and start it per request ( maybe stick it in Application BeginRequest ). Would that make a difference?
Would adding another worker process affect the total number of open connections I'm able to make -- If this is a resource issue ( exhausting threads, connections or some resource ).
Exceptions:
MassTransit.RequestTimeoutException
Timeout Waiting for response
Stacktrace:
System.Runtime.ExceptionServices.ExceptionDispathInfo.Throw
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
MassTransit.Clients.ResponseHandlerConnectionHandle`1+<GetTask>d_11.MoveNext
System.Threading.ExecutionContext.RunInternal
RabbitMQ.Client.Exceptions.ConnectFailureException
Connection failed
Statcktrace:
RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail
RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily
RabbitMQ.Client.Impl.SocketFrameHandler..ctor
RabbitMQ.Client.ConnectionFactory.CreateFrameHandler
RabbitMQ.Client.EndPointResolverExtensions.SelectOne
RabbitMQ.Client.ConnectionFactory.CreateConnection
How Our Code Works ( overview )
Static IBusControl that is instantiated the first time someone tries to produce a message. The whole connection and send code is a little large to put in here ( connection factory and other metric classes, but below are the interesting parts ).
Static IBusControl B;
B = Bus.Factory.CreateUsingRabbitMq(x =>
{
hostAddress = host.HostAddress;
x.Host(new Uri(host.HostAddress), h =>
{
h.Username(host.UserName);
h.Password(host.Password);
});
x.Durable = false;
x.SetQueueArgument("x-message-ttl", 600000);
});
B.Start(new TimeSpan(0, 0, 10));
// Then send the Actual Messages
// Generic with TRequest and TResponse : class BaseMessage
// Pulling the code out of a few different classes
string serviceAddressString = string.Format("{0}/{1}?durable={2}", HostAddress, ChkMassTransit.QueueName(typeof(TRequest), typeof(TResponse)), false ? "true" : "false");
Uri serviceAddress = new Uri(serviceAddressString);
RequestTimeout rt = RequestTimeout.After(0, 0, 0, 0, timeout.Value);
IRequestClient<TRequest> reqClient = B.CreateRequestClient<TRequest>(serviceAddress, rt);
var v = reqClient.GetResponse<TResponse>(request, sendInfo.CT, sendInfo.RT);
if ( v.Wait(timeoutMS) ) { /*do some stuff*/ }
First, I find your lack of async disturbing. Using Wait or anything like it on TPL-based code is a recipe for death and destruction, pain and suffering, dogs and cats living together, etc.
Yes, you should have a single bus instance that is started when the application starts. Since you're doing request/response, set AutoStart = true on the bus configurator to make sure it's all warmed up and ready.
Never, no, one bus only!
Each bus instance only has a single connection, so you shouldn't see any resource issues related to capacity on RabbitMQ.
MassTransit 7.0.4 is really old, you might consider the easy upgrade 7.3.1 and see if that improves things for you. It's the last version of the v7 codebase available.

How to better correlate Spring Integration TCP Inbound and Outbound Adapters within the same application?

I currently have a Spring Integration application which is utilizing a number of TCP inbound and outbound adapter combinations for message handling. All of these adapter combinations utilize the same single MessageEndpoint for request processing and the same single MessagingGateway for response sending.
The MessageEndpoint’s final output channel is a DirectChannel that is also the DefaultRequestChannel of the MessageGateway. This DirectChannel utilizes the default RoundRobinLoadBalancingStrategy which is doing a Round Robin search for the correct Outbound Adapter to send the given response through. Of course, this round robin search does not always find the appropriate Outbound Adapter on first search and when it doesn’t it logs accordingly. Not only is this producing a large amount of unwanted logging but it also raises some performance concerns as I anticipate several hundred inbound/outbound adapter combinations existing at any given time.
I am wondering if there is a way in which I can more closely correlate the inbound and outbound adapters in a way that there is no need for the round robin processing and each response can be sent directly to the corresponding outbound adapter? Ideally, I would like this to be implemented in a way that the use of a single MessageEndpoint and single MessageGateway can be maintained.
Note: Please limit solutions to those which use the Inbound/Outbound Adapter combinations. The use of TcpInbound/TcpOutboundGateways is not possible for my implementation as I need to send multiple responses to a single request and, to my knowledge, this can only be done with the use of inbound/outbound adapters.
To add some clarity, below is a condensed version of the current implementation described. I have tried to clear out any unrelated code just to make things easier to read...
// Inbound/Outbound Adapter creation (part of a service that is used to dynamically create varying number of inbound/outbound adapter combinations)
public void configureAdapterCombination(int port) {
TcpNioServerConnectionFactory connectionFactory = new TcpNioServerConnectionFactory(port);
// Connection Factory registered with Application Context bean factory (removed for readability)...
TcpReceivingChannelAdapter inboundAdapter = new TcpReceivingChannelAdapter();
inboundAdapter.setConnectionFactory(connectionFactory);
inboundAdapter.setOutputChannel(context.getBean("sendFirstResponse", DirectChannel.class));
// Inbound Adapter registered with Application Context bean factory (removed for readability)...
TcpSendingMessageHandler outboundAdapter = new TcpSendingMessageHandler();
outboundAdapter.setConnectionFactory(connectionFactory);
// Outbound Adapter registered with Application Context bean factory (removed for readability)...
context.getBean("outboundResponse", DirectChannel.class).subscribe(outboundAdapter);
}
// Message Endpoint for processing requests
#MessageEndpoint
public class RequestProcessor {
#Autowired
private OutboundResponseGateway outboundResponseGateway;
// Direct Channel which is using Round Robin lookup
#Bean
public DirectChannel outboundResponse() {
return new DirectChannel();
}
// Removed additional, unrelated, endpoints for readability...
#ServiceActivator(inputChannel="sendFirstResponse", outputChannel="sendSecondResponse")
public Message<String> sendFirstResponse(Message<String> message) {
// Unrelated message processing/response generation excluded...
outboundResponseGateway.sendOutboundResponse("First Response", message.getHeaders().get(IpHeaders.CONNECTION_ID, String.class));
return message;
}
// Service Activator that puts second response on the request channel of the Message Gateway
#ServiceActivator(inputChannel = "sendSecondResponse", outputChannel="outboundResponse")
public Message<String> processQuery(Message<String> message) {
// Unrelated message processing/response generation excluded...
return MessageBuilder.withPayload("Second Response").copyHeaders(message.getHeaders()).build();
}
}
// Messaging Gateway for sending responses
#MessagingGateway(defaultRequestChannel="outboundResponse")
public interface OutboundResponseGateway {
public void sendOutboundResponse(#Payload String payload, #Header(IpHeaders.CONNECTION_ID) String connectionId);
}
SOLUTION:
#Artem's suggestions in the comments/answers below seem to do the trick. Just wanted to make a quick note about how I was able to add a replyChannel to each Outbound Adapter on creation.
What I did was create two maps that are being maintained by the application. The first map is populated whenever a new Inbound/Outbound adapter combination is created and it is a mapping of ConnectionFactory name to replyChannel name. The second map is a map of ConnectionId to replyChannel name and this is populated on any new TcpConnectionOpenEvent via an EventListener.
Note that every TcpConnectionOpenEvent will have a ConnectionFactoryName and ConnectionId property defined based on where/how the connection is established.
From there, whenever a new request is received I use theses maps and the 'ip_connectionId' header on the Message to add a replyChannel header to the Message. The first response is sent by manually grabbing the corresponding replyChannel (based on the value of the replyChannel header) from the application's context and sending the response on that channel. The second response is sent via Spring Integration using the replyChannel header on the message as Artem describes in his responses.
This solution was implemented as a quick proof of concept and is just something that worked for my current implementation. Including this to hopefully jumpstart other viewer's own implementations/solutions.
Well, I see now your point about round-robin. You create many similar TCP channel adapters against the same channels. In this case it is indeed hard to distinguish one flow from another because you have a little control over those channels and their subscribers.
On of the solution would be grate with Spring Integration Java DSL and its dynamic flows: https://docs.spring.io/spring-integration/reference/html/dsl.html#java-dsl-runtime-flows
So, you would concentrate only on the flows and won't worry about runtime registration. But since you are not there and you deal just with plain Java & Annotations configuration, it is much harder for you to achieve a goal. But still...
You may be know that there is something like replyChannel header. It is taken into an account when we don't have a outputChannel configured. This way you would be able to have an isolated channel for each flow and the configuration would be really the same for all the flows.
So,
I would create a new channel for each configureAdapterCombination() call.
Propagate this one into that method for replyChannel.subscribe(outboundAdapter);
Use this channel in the beginning of your particular flow to populate it into a replyChannel header.
This way your processQuery() service-activator should go without an outputChannel. It is going to be selected from the replyChannel header for a proper outbound channel adapter correlation.
You don't need a #MessagingGateway for such a scenario since we don't have a fixed defaultRequestChannel any more. In the sendFirstResponse() service method you just take a replyChannel header and send a newly created message manually. Technically it is exactly the same what you try to do with a mentioned #MessagingGateway.
For Java DSL variant I would go with a filter on the PublishSubscribeChannel to discard those messages which don't belong to the current flow. Anyway it is a different story.
Try to figure out how you can have a reply channel per flow when you configure particular configureAdapterCombination().

How to connect queues to a ZeroMQ PUB/SUB

Consider the following:
a set of 3 logical services: S1, S2 and S3
two instances of each service are running, so we have the following processes: S1P1, S1P2, S2P1, S2P2, S3P1, S3P2
a ZeroMQ broker running in a single process and reachable by all service processes
A logical service, let's say S1, publishes a message M1 that is of interest to logical services S2 and S3. Only one process of each logical service must receive M1, so let's say S2P1 and S3P2.
I have tried the following, but without success:
broker thread 1 is running a XSUB/XPUB proxy
broker thread 2 is running a ROUTER/DEALER proxy with the ROUTER connected to the XPUB socket and subscribed to everything (for logical S1)
broker thread 3 is running a ROUTER/DEALER proxy with the ROUTER connected to the XPUB socket and subscribed to everything (for logical S2)
broker thread 4 is running a ROUTER/DEALER proxy with the ROUTER connected to the XPUB socket and subscribed to everything (for logical S3)
each logical service process is running a REP socket thread connected to the broker DEALER socket
I figured that the XSUB/XPUB proxy would give me publish/subscribe semantics and that the ROUTER/DEALER proxies would introduce a competition between the REP sockets for the messages sent by the XSUB/XPUB proxy.
How can I combine ZeroMQ sockets to accomplish this?
Update1
I know "without success" isn't helpful, I've tried different configurations and got different errors. The latest configuration I tried is the following:
(XSUB proxy=> XPUB) => (SUB copyLoop=> REQ) => (ROUTER proxy=> DEALER) => REP
The copyLoop goes like this:
public void start() {
context = ZMQ.context(1);
subSocket = context.socket(ZMQ.SUB);
subSocket.connect(subSocketUrl);
subSocket.subscribe("".getBytes());
reqSocket = context.socket(ZMQ.REQ);
reqSocket.connect(reqSocketUrl);
while (!Thread.currentThread().isInterrupted()) {
final Message msg = receiveNextMessage();
resendMessage(msg);
}
}
private Message receiveNextMessage() {
final String header = subSocket.recvStr();
final String entity = subSocket.recvStr();
return new Message(header, entity);
}
private void resendMessage(Message msg) {
reqSocket.sendMore(msg.getKey());
reqSocket.send(msg.getData(), 0);
}
The exception I get is the following:
java.lang.IllegalStateException: Cannot send another request
at zmq.Req.xsend(Req.java:51) ~[jeromq-0.3.4.jar:na]
at zmq.SocketBase.send(SocketBase.java:613) ~[jeromq-0.3.4.jar:na]
at org.zeromq.ZMQ$Socket.send(ZMQ.java:1206) ~[jeromq-0.3.4.jar:na]
at org.zeromq.ZMQ$Socket.sendMore(ZMQ.java:1189) ~[jeromq-0.3.4.jar:na]
at com.xyz.messaging.zeromq.SubReqProxyConnector.resendMessage(SubReqProxyConnector.java:47) ~[classes/:na]
at com.xyz.messaging.zeromq.SubReqProxyConnector.start(SubReqProxyConnector.java:35) ~[classes/:na]
I'm running JeroMQ 0.3.4, Oracle Java 8 JVM and Windows 7.
You seem to be adding in some complexity with your ROUTER connection - you should be able to do everything connected directly to your publisher.
The error you're currently running into is that REQ sockets have a strict message ordering pattern - you are not allowed to send() twice in a row, you must send/receive/send/receive/etc (likewise, REP sockets must receive/send/receive/send/etc). From what it looks like, you're just doing send/send/send/etc on your REQ socket without ever receiving a response. If you don't care about a response from your peer, then you must receive and discard it or use DEALER (or ROUTER, but DEALER makes more sense in your current diagram).
I've created a diagram of how I would accomplish this architecture below - using your basic process structure.
Broker T1 Broker T2 Broker T3 Broker T4
(PUB*)------>(*SUB)[--](DEALER*) -->(*SUB)[--](DEALER*) -->(*SUB)[--](DEALER*)
|_____________________||____| || | ||
|_____________________||_______________________||____| ||
|| || ||
========================|| ==================|| ===========||=
|| || || || || ||
|| || || || || ||
|| || || || || ||
(REP*) (REP*) (REP*) (REP*) (REP*) (REP*)
S1P1 S1P2 S2P1 S2P2 S3P1 S3P2
So, the main difference is that I've ditched your (SUB copyLoop=> REQ) step. Whether you choose XPUB/XSUB vs PUB/SUB is up to you, but I would tend to start simpler unless you currently want to make use of the extra features of XPUB/XSUB.
Obviously this diagram doesn't deal with how information enters your broker, where you currently show an XSUB socket - that's out of scope for the information you've provided thus far, presumably you're able to receive information into your broker successfully already so I won't deal with that.
I assume your broker threads that are dedicated to each service are making intelligent choices on whether to send the message to their service or not? If so, then your choice of having them subscribed to everything should work fine, otherwise more intelligent subscription setups might be necessary.
If you're using a REP socket on your service processes, then the service process must take that message and deal with it asynchronously, never communicating back any details about that message to the broker. It must then respond to each message with an acknowledgement (like "RECEIVED") so that it follows the strict receive/send/receive/send pattern for REP sockets.
If you want any other type of communication about how the service handles that message sent back to the broker, REP is no longer the appropriate socket type for your service processes, and DEALER may no longer be the correct socket type for your broker. If you want some form of load balancing so that you send to the next open service process, you'll need to use ROUTER/REQ and have each service indicate its availability and have the broker hold on to the message until the next service process says its available by sending results back. If you want some other type of message handling, you'll have to indicate what that is so a suitable architecture can be proposed.
Clearly I got mixed up with a few elements:
Sockets have the same API whether you're using it as a client-side socket (Socket.connect) or a server-side socket (Socket.bind)
Sockets have the same API regardless of the type (e.g. Socket.subscribe should not be called on a PUSH socket)
Some socket types require a send/receive response loop (e.g. REQ/REP)
Some nuances in communication patterns (PUSH/PULL vs ROUTER/DEALER)
The difficulty (impossiblity?) in debugging a ZeroMQ setup
So a big thanks to Jason for his incredibly detailed answer (and awesome diagram!) that pointed me to the right direction.
I ended up with the following design:
broker thread 1 is running a fan-out XSUB/XPUB proxy on bind(localhost:6000) and bind(localhost:6001)
broker thread 2 is running a queuing SUB/PUSH proxy on connect(localhost:6001) and bind(localhost:6002); broker threads 3 and 4 use a similar design with different bind port numbers
message producers connect to the broker using a PUB socket on connect(localhost:6000)
message consumers connect to the broker queuing proxy using a PULL socket on connect(localhost:6002)
On top of this service-specific queuing mechanism, I was able to add a similar service-specific fan-out mechanism rather simply:
broker thread runs a SUB/PUB proxy on connect(localhost:6001) and bind(localhost:6003)
message producers still connect to the broker using a PUB socket on connect(localhost:6000)
message consumers connect to the broker fan-out proxy using a SUB socket on connect(localhost:6003)
This has been an interesting ride.

Websocket best practice for groups chat / one websocket for all groups or one websocket per group?

I have to implement a chat application using websocket, users will chat via groups, there can be thousands of groups and a user can be in multiple groups. I'm thinking about 2 solutions:
[1] for each group chat, I create a websocket endpoint (using camel-atmosphere-websocket), users in the same group can subscribe to the group endpoint and send/receive message over that endpoint. it means there can be thousands of websocket endpoints. Client side (let's say iPhone) has to subscribes to multiple wbesocket endpoints. is this a good practice?
[2] I just create one websocket endpoint for all groups. Client side just subscribes to this endpoint and I manage the messages distribution myself on server: get group members, pick the websocket of each member from list of connected websockets then write the message to each member via websocket.
Which solution is better in term of performance and easy to implement on both client and server?
Thanks.
EDIT 2015-10-06
I chose the second approach and did a test with jetty websocket client, I use camel atmosphere websocket on server side. On client side, I create websocket connections to server in threads. There was a problem with jetty that I can just create around 160 websocket connections (it means around 160 threads). The result is that I almost see no difference when the number of clients increases from 1 to 160.
Yes, 160 is not a big number, but I think I will do more test when I actually see the performance problem, for now, I'm ok with second approach.
If you are interested in the test code, here it is:
http://www.eclipse.org/jetty/documentation/current/jetty-websocket-client-api.html#d0e22545
I think second approach will be better to use for performance. I am using the same for my application, but it is still in testing phase so can't comment about the real time performance. Now its running for 10-15 groups and working fine. In my app, there is similar condition like you in which user can chat based on group. I am handling the the group creation on server side using node.js. Here is the code to create group, but it is for my app specific condition. Just pasting here for the reference. Getting homeState and userId from front-end. Creating group based on the homeState. This code is only for example, it won't work for you. To improve performance you can use clustering.
this.ConnectionObject = function(homeState, userId, ws) {
this.homeState = homeState;
this.userId = userId;
this.wsConnection = ws;
},
this.createConnectionEntry = function(homeState, userId,
ws) {
var connObject = new ws.thisRefer.ConnectionObject(homeState, userId,
ws);
var connectionEntryList = null;
if (ws.thisRefer.connectionMap[homeState] != undefined) {
connectionEntryList = ws.thisRefer.connectionMap[homeState];
} else {
connectionEntryList = new Array();
}
connectionEntryList.push(connObject);
console.log(connectionEntryList.length);
ws.thisRefer.connectionMap[homeState] = connectionEntryList;
ws.thisRefer.connecteduserIdMap[userId] = "";
}
Browsers implement a restriction on the numbers of websocket that can be opened by the same tab. You can't rely on being able to create as many connection as possible. Go for solution #2

How can I implement fixed socket count on proxy->server connection?

I read netty proxy example, (https://github.com/netty/netty/tree/master/example/src/main/java/io/netty/example/proxy )
and I have two requirement.
I want to use fixed-count connection on proxy->server.
On proxy example, proxy->server conn. count equals client->proxy conn. count.
It may be too many.
When client->proxy connection ends, proxy->server connection has to be keep alived
And when new client->proxy connection established, reuse proxy->server connections.
How can it be implemented?
The first requirement can be realized rather easily by using a DefaultChannelGroup to store your channels. Assuming that the ChannelHandler which is accepting incoming connections is a singleton, then you can use the following code.
// initialize channelgroup in your singleton handler
ChannelGroup ALL_CONNECTIONS = new DefaultChannelGroup(GlobalEventExecutor.INSTANCE);
...
#Override
public synchronized void channelActive(ChannelHandlerContext ctx) throws Exception
{
if(ALL_CONNECTIONS.size() > 100){
ctx.channel().close();// dont accept further connections
}else{
ALL_CONNECTIONS.add(ctx.channel());
// do whatever logic.
}
}
I think you are thinking of "connection pooling" for the second requirement. If so, its not a great idea I think. Since, when a new client "connects" to your server, it is always a new connection since it is coming from outside of your network. However I am not sure of this and someone with more knowledge can answer.
Both what your need, i think, is a client with connection pool.
Both HttpComponents and AsyncHttpClient support pooling, You could have a look at the codes in AsyncHttpClient which also have a netty based implementation.

Resources