Can Netty efficiently handle scores of outgoing connections as a client?

I'm creating a client-server relationship whereby a single client will be connected to an arbitrary number of servers using persistent TCP connections. The actual number of servers is as yet undetermined, but the design goal is to shoot for 1000.
I found an example using direct Java NIO that nearly completely matches my mental model of how this could work:
http://drdobbs.com/jvm/184406242
In general, it opens all of the channels and registers them with a single thread monitoring a java.nio.channels.Selector. The use of the Selector, in particular, is what allows this to scale far better than the standard thread-per-channel approach.
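For reference, the core of that single-Selector approach looks roughly like this (a minimal sketch, not taken from the article; serverAddresses is a hypothetical list of the ~1000 endpoints):

import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

Selector selector = Selector.open();
for (InetSocketAddress address : serverAddresses) {
    SocketChannel channel = SocketChannel.open();
    channel.configureBlocking(false);
    channel.connect(address);
    channel.register(selector, SelectionKey.OP_CONNECT);
}
while (true) {
    selector.select();  // one thread blocks here for all channels
    for (SelectionKey key : selector.selectedKeys()) {
        if (key.isConnectable()) {
            ((SocketChannel) key.channel()).finishConnect();
            key.interestOps(SelectionKey.OP_READ);
        } else if (key.isReadable()) {
            // read from key.channel() and dispatch to your handler
        }
    }
    selector.selectedKeys().clear();
}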
I would rather use a (slightly) higher-level socket framework like Netty than direct Java NIO. Unfortunately, I have not been able to determine how Netty would handle a case like this. That is, the examples and discussions I've found all tend to center on the server side and accepting scores of concurrent connections.
But what about doing this from the client side? If I create a large number of channels and just wait on their events, how is Netty going to handle this at the back-end?

This isn't a direct answer to your question but I hope it is helpful nonetheless. Below, I describe a way for you to determine the answer that you are looking for. This is something that I recently did myself for an upcoming project.
Compared to OIO (old blocking I/O), the asynchronous nature of the Netty framework and NIO will indeed provide much better memory and CPU usage characteristics for your application. The way buffers are handled in Netty will also be of benefit, as it will help you avoid copying byte buffers. The point is that all of the thread-pool and NIO details are handled for you, allowing you to focus on your business logic. You mentioned the NIO Selector, and you will benefit from it; the nice thing about Netty is that you get those benefits without having to worry about the implementation yourself, because it is already done for you.
My understanding of the client side is that it is very similar to the server side and should provide you with commensurate performance gains (as long as your business logic doesn't introduce any performance issues).
My advice would be to throw together a prototype that more or less does what you want. Leave out any time-consuming details and just add in the basic Netty handlers you need to make something that works.
Then I would use JMeter to drive your client and apply load to the server and client. Using something like JConsole or JVisualVM will show you the performance characteristics of the client and server under load; you could also try JProbe. You can add a listener in JMeter that reports throughput. I would advise running JMeter in server mode, the client on another machine, and the server on yet another. This is a bit of up-front work, but if you decide to move forward you will have these tools ready to go for further testing as you proceed.
I suspect a decent Netty implementation that doesn't introduce any extraneous, poorly performing components will give you the performance characteristics you are looking for, but the only way to know for sure is to measure the system under the expected load.
You need to define what the expected load looks like and the desired performance characteristics under such load. Given these inputs you can measure your system to find out if it will meet your expectations. I personally don't think anyone can tell you if it will behave in the desired manner. You have to measure it. It's the only reliable way to know if the system can meet your needs.
I would rather use a (slightly) higher-level socket framework like Netty than direct Java NIO.
This is the right approach. You could try implementing your own NIO server and client, but why do that when you have the benefit of a highly refined framework at your fingertips already?

Netty will use up to x worker threads to handle the work for you. Each worker thread has one Selector that Channels are registered with. The number of workers is configurable and defaults to 2 * the number of CPU cores.
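For instance, in the Netty 3.x client API the optional third constructor argument caps the worker thread count, and with it the number of underlying Selectors (a sketch; the count of 4 is arbitrary):

int workerCount = 4;  // instead of the default 2 * Runtime.getRuntime().availableProcessors()
ChannelFactory factory = new NioClientSocketChannelFactory(
        Executors.newCachedThreadPool(),  // boss threads: perform the connects
        Executors.newCachedThreadPool(),  // worker threads: one Selector each
        workerCount);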

As you can see in the example from Netty's documentation (http://netty.io/docs/stable/guide/html/#start.9), you can control exactly the number of worker threads (meaning the number of underlying Selectors) on the client side.
Netty solves a number of issues that are very hard to handle simply, such as combining NIO with SSL, and it ships with a lot of default encoders/decoders for zip compression, etc.
I started using Netty a few weeks ago and it was quite fast to get into. (I recommend downloading the project with all the example code inside; there is a lot of documentation in it that cannot be found at the URL above.)
// Boss threads handle the connects; worker threads (each with its own
// Selector) handle the subsequent I/O.
ChannelFactory factory = new NioClientSocketChannelFactory(
        Executors.newCachedThreadPool(),
        Executors.newCachedThreadPool());

ClientBootstrap bootstrap = new ClientBootstrap(factory);
bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
    public ChannelPipeline getPipeline() {
        return Channels.pipeline(new TimeClientHandler());
    }
});
bootstrap.setOption("tcpNoDelay", true);
bootstrap.setOption("keepAlive", true);
bootstrap.connect(new InetSocketAddress(host, port));
Good luck,
Renaud

Related

What would be the right ZMQ Pattern?

I am trying to build a ZeroMQ pattern where:
1. There can be many clients connecting to a single server endpoint.
2. The server will distribute incoming client tasks to the available workers (mapped to the number of cores on the server).
3. These tasks are long-running (on the order of hours) and need to perform a lot of local I/O.
4. During each task execution (iteration) there will be data/messages (potentially on the order of [GB]s) sent back and forth between the client and the server worker.
5. Client and server workers need to know if there are failures/errors on the peer side, so that they can recover (retry) or shut down gracefully and try again later.
Based on the above, I presume that the ROUTER/DEALER pattern would be useful. PUB/SUB is ruled out, as I need to know if the peer fails.
I have tried various combinations of the ROUTER/DEALER pattern, but I am unable to ensure that multiple messages from a client reach the same worker within an iteration. I understand that I need to implement a broker/forwarder/device that routes the incoming messages to the right recipient/handler/worker, but I am unable to map the frontend and backend sockets in the broker. I am looking at the Majordomo pattern, but I would guess there is a simpler broker model that could just route the messages to the assigned worker (without really getting into services).
I am looking for some examples, if there are any or any guidance on what I may be missing. I am trying to build this in Golang.
Q : "What would be the right ZMQ Pattern?"
Based on the complex composition of all the requirements posted under items 1 - 5, I dare say that The Right choice would be NOT to use a single one of the standard, built-in, trivial primitive ZeroMQ Communication Archetype Patterns, but rather to create a multi-layered, application-specific composition of a (M + N + 1 hot-standby, robust-enough?) (self-resilient?) Signalling-Messaging infrastructure that covers all of your current application-level requirements (and is possibly extensible for any future ones), like the one depicted here for a far simpler distributed-computing use-case, where just a trivial remote-SigKILL was implemented.
Yes, the best approach is to create (and maintain) your own formalised signalling that the application level can handle and interact across -- for example, heart-beating for detecting dead worker(s), plus permitting failed jobs to be re-instated right upon detected failure (most probably re-located and/or re-scheduled: with resources not statically pre-mapped, but chosen wherever physically most feasible at the moment of re-instating, so even more telemetry signalling will help you decide about re-instating such failed micro-jobs).
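As a rough illustration of such application-level heart-beating (a sketch only, using the JeroMQ Java binding; the endpoint, interval and message format are all assumptions, not a prescribed protocol):

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

try (ZContext ctx = new ZContext()) {
    ZMQ.Socket worker = ctx.createSocket(SocketType.DEALER);
    worker.connect("tcp://broker-host:5555");   // hypothetical broker endpoint

    ZMQ.Poller poller = ctx.createPoller(1);
    poller.register(worker, ZMQ.Poller.POLLIN);
    long lastBeat = System.currentTimeMillis();
    while (!Thread.currentThread().isInterrupted()) {
        poller.poll(250);                        // stay responsive between beats
        if (poller.pollin(0)) {
            String msg = worker.recvStr();       // real task / control traffic
        }
        if (System.currentTimeMillis() - lastBeat > 1000) {
            worker.send("HEARTBEAT");            // the formalised liveness signal
            lastBeat = System.currentTimeMillis();
        }
    }
}
// The server side would keep a last-seen timestamp per worker identity and
// re-schedule the job once a few heartbeats in a row have been missed.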
ZeroMQ is a fabulous framework for exactly such complex signalling and messaging hierarchies, so your System Architect's imagination is the only ceiling in this concept.
ZeroMQ will take care of the rest and do all the hard work nicely and easily.

How To Scale Spring Reactive Stack Using Netty

I am starting a project using the Spring WebFlux reactive stack, which by default uses Reactor Netty as the server. Please correct me if I'm wrong, but I read that Netty can only have as many event loops as there are processors on the instance.
This means that if a request gets blocked for a second (which should not be the use case, I know, just for example), we would only be able to get at most 1 transaction per second if there is only 1 processor on the instance.
I am wondering how scalable Netty is compared to a servlet container like Tomcat. What are the pros and cons of using Netty vs Tomcat?
I also want to know the ways to optimize Netty configurations to make sure it is production ready.
This means that if a request gets blocked for a second (which should not be the use case, I know, just for example)
The whole purpose of this stack is to scale hugely on a limited amount of resources (here, threads). This is all built on the critical requirement that every step is asynchronous and non-blocking.
So your "just for example" doesn't make any sense. Yes, if you block for one second, that CPU will only process that single request during that second. Doing so is also completely wrong in this model, and everything in the stack is made to help you avoid blocking.
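If some step truly cannot be made non-blocking, the reactive stack's answer is to quarantine it on a scheduler meant for blocking work rather than stall an event loop. A minimal Reactor sketch (blockingLookup() is a hypothetical legacy call, not a real API):

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

// Wrap the blocking call and shift it off the Netty event loop onto
// Reactor's bounded pool intended for blocking tasks.
Mono<String> result = Mono.fromCallable(() -> blockingLookup())
        .subscribeOn(Schedulers.boundedElastic());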

How to get data without polling?

This is more of a theoretical question.
Well, imagine that I have two programs that work simultaneously. The main one only does something when it receives a flag set to true from a secondary program. So, this main program has a function that keeps asking the secondary one for the value of the flag, and when it gets true, it does something.
What I learned at college is that polling is the simplest way of doing that. But when I started working as a developer, coworkers told me that this method generates some overhead or wastes computation by asking for a value every so often.
I tried to come up with some ideas for doing this in a different way and searched the internet for something like this, but didn't find a useful approach.
I read about interrupts and passive approaches that make the main program get that data only when informed by the secondary program. But how does that happen? The main program would still need a function to check for the interrupt, right? So wouldn't it end up the same as before?
What could I do differently?
There is no magic...
No program can guess when it has new information to read; what you can do is decide between two approaches:
A -> asks -> B
A <- is informed <- B
When to use each? It depends on many other factors, such as:
1. How fast does the data need to be delivered from the moment it is generated? As fast as possible, or can it accumulate for a while?
2. How fast is the data generated?
3. How many simultaneous clients are requesting data from the same server?
4. What type of data are you dealing with? Persistent? Fast-changing?
If you are building something like a stock analyzer, where you need to ask for stock prices every second (and they will also change every second), the approach you mentioned may be the best.
If you are writing a chat-based app like WhatsApp, where you need to check if there is a new message for the client and most of the time there won't be, publish/subscribe may be the best.
But all of this is a very superficial look into a high-impact architecture decision; it is not possible to get the best answer by looking at just one factor.
What I want to show is that
coworkers told me that this method generates some overhead or wastes computation
is not a correct statement in general. It may be in some particular scenario, but overhead will always exist in distributed systems.
The typical way to prevent polling is by using the Publish/Subscribe pattern.
Your client program will subscribe to the server program, and when an event occurs, the server program will publish it to all of its subscribers for them to handle however they need to.
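In its simplest in-process form, the pattern is just a callback registry. A minimal Java sketch (all names here are illustrative, not from any particular library):

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

interface FlagListener {
    void onFlagRaised();  // called the moment the flag becomes true
}

class FlagPublisher {
    private final List<FlagListener> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(FlagListener listener) {
        subscribers.add(listener);
    }

    // Invoked by the secondary program when the event occurs; the main
    // program never has to ask for the value, so there is no polling loop.
    void raiseFlag() {
        for (FlagListener listener : subscribers) {
            listener.onFlagRaised();
        }
    }
}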
If you flip the direction of the requests, you end up with something closer to a standard web API: your main program (left in your example) would be a server listening for requests, and the secondary program would be a client hitting an endpoint on the server to trigger an event.
There are many ways to accomplish this in every language, and it doesn't have to be tied to TCP/IP requests.
Well, in most languages you won't implement things at such a low level. But theoretically speaking, there are different waiting strategies; you are describing active waiting (busy-waiting), which can easily eat up your CPU.
Most languages provide libraries that let you run a process as a service that waits passively and is triggered when a request comes in.

Is there any way to have an asynchronous client and server in ZeroMQ?

Is there any way to have an asynchronous client and server in ZeroMQ, using the same TCP-port and many sockets?
I already tried the ROUTER/ROUTER pattern, but with no luck.
The plan is to have an asymmetric connection with send and receive patterns among processors, so a Processor-entity will be both a Client and a Server at the same time.
Is there any way to have an asynchronous client and server in ZeroMQ, using the same TCP-port and many sockets?
Yes, there is.
As a preventive step, in other words before getting into trouble, it is best to review the main conceptual differences in [ ZeroMQ hierarchy in less than a five seconds ] or other posts and discussions here.
The Yes above means one-.bind()-many-.connect()-s: a composition that still uses just one <transport-class>://<a-class-specific-address>, which for the tcp:// transport-class on IPv4 means one tcp://A.B.C.D:port# occupied by this whole 1:MANY-sockets-composition.
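A minimal sketch of that one-.bind()-many-.connect()-s composition, using the JeroMQ Java binding (the port number and payloads are arbitrary):

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

try (ZContext ctx = new ZContext()) {
    ZMQ.Socket server = ctx.createSocket(SocketType.ROUTER);
    server.bind("tcp://*:5555");            // the single .bind() side

    ZMQ.Socket peerA = ctx.createSocket(SocketType.DEALER);
    peerA.connect("tcp://localhost:5555");  // first of MANY .connect()-s
    ZMQ.Socket peerB = ctx.createSocket(SocketType.DEALER);
    peerB.connect("tcp://localhost:5555");  // same single tcp:// port

    peerA.send("hello");
    byte[] identity = server.recv();        // ROUTER prepends the sender identity
    String payload = server.recvStr();
    server.sendMore(identity);              // route the reply back to peerA only
    server.send("world");
}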
For obvious reasons, more complex compositions, like many-.bind()-s-many-.connect()-s, are possible where feasible, given that both the ZeroMQ infrastructure topology options and the socket-"in-band"-message-routing features are set up and used for smart decisions about the actual message-flow mechanics.

Vertx worker verticle pool for jdbc

I'm new to Vert.x and I would like to implement a pool of worker verticles to make database queries using BoneCP. However, I'm a little bit confused about how to 'call' them to work and how to share the BoneCP connection pool between them.
I saw in Vert.x's DeploymentManager source that the start(Future) method is called synchronously and that the verticle is then kept in memory until undeployed. After the start method completes, what's the correct way of calling methods on the worker verticle? If I deploy many instances of the verticle (using DeploymentOptions.setInstances()), will Vert.x do load balancing between them?
I saw that Vert.x comes with a JDBC client and a worker pool, but it supports only a limited set of datatypes because it uses the EventBus and serializes all data returned by the database. I need to work with many different datatypes (including dates, BigDecimals and binary objects), and I would like to avoid serialization as much as possible, instead making the queries in the worker verticle, processing the results, and returning an object via a Future or AsyncResult (I believe this is done on-heap, so no serialization is needed; is this correct?).
Please help me to sort out all these questions :) I will appreciate a lot if you give me examples of how can I make this work!
Thanks!
I'll try to answer your questions one by one.
how to 'call' them to work
You call your worker verticles using the EventBus. That's the proper way to communicate with them. Please see this example:
https://github.com/vert-x3/vertx-examples/blob/master/core-examples/src/main/java/io/vertx/example/core/verticle/worker/MainVerticle.java#L27
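A minimal sketch of that interaction (assuming Vert.x 3.8+; the address "db.query" and the handler body are illustrative, not taken from the linked example):

import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class DbWorkerVerticle extends AbstractVerticle {
    @Override
    public void start() {
        // Worker verticles may block, so the JDBC call can live here.
        vertx.eventBus().consumer("db.query", msg ->
                msg.reply("result-for-" + msg.body()));
    }
}

// Deployment plus a call site somewhere else in the application:
Vertx vertx = Vertx.vertx();
vertx.deployVerticle(DbWorkerVerticle.class.getName(),
        new DeploymentOptions().setWorker(true).setInstances(4));
vertx.eventBus().request("db.query", "SELECT 1",
        reply -> System.out.println(reply.result().body()));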
how to share the BoneCP connection pool between them.
Don't. Instead, create a small connection pool for each instance. Otherwise, sharing one pool will cause unexpected behavior.
config.setMinConnectionsPerPartition(1);
config.setMaxConnectionsPerPartition(5);
config.setPartitionCount(1);
will Vert.x do load balancing between them
No. That's the reason @Jochen Bedersdorfer and I suggest using the EventBus. You can have a direct reference to your worker verticle, as you suggested, but then you're stuck with a 1:1 configuration.
return an object via a Future or AsyncResult (I believe this is done on-heap, so no serialization is needed; is this correct?)
This is correct. But again, you're then stuck with a 1:1 mapping, which in terms of performance is a lot worse than serialization (which uses buffers).
If you still do need something like that, maybe you shouldn't use worker verticles at all, but something like .executeBlocking:
https://github.com/vert-x3/vertx-examples/blob/master/core-examples/src/main/java/io/vertx/example/core/execblocking/ExecBlockingExample.java#L25
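A sketch of that approach inside an ordinary (non-worker) verticle; User and loadUser() are hypothetical stand-ins for whatever blocking JDBC work you need, and the result is handed back as a plain on-heap object:

// Runs the blocking body on Vert.x's internal worker pool and delivers
// the result back on the event loop, with no serialization involved.
vertx.<User>executeBlocking(promise -> {
    User user = loadUser(userId);  // hypothetical blocking JDBC call
    promise.complete(user);        // plain Java object, stays on-heap
}, res -> {
    if (res.succeeded()) {
        System.out.println("Loaded " + res.result());
    }
});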
In your start(...) method, register event listeners with the event bus, as this is how you interact with verticles (worker or not).
Yes, if you deploy many instances, Vert.x will use round-robin to send messages to those instances.
For what you describe, Vert.x might not be the best fit, since it works best with asynchronous I/O.
You might be better off using standard Java concurrency tools to manage the load, i.e. Executor and friends.
