Amazon DynamoDB max client connections? - amazon-ec2

When creating an AmazonDynamoDBClient in the Java API, we can specify the maximum connection pool size using setMaxConnections on ClientConfiguration. Is there a hard or recommended limit on this? For example, although the default limit is 50 connections, a Linux client should be able to sustain 5,000 open connections; will Amazon allow this?
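For reference, this is roughly how we create the client today; a minimal sketch, where the credentials provider and the value of 500 are just placeholders:

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;

    public class DynamoPoolExample {
        public static void main(String[] args) {
            // Raise the HTTP connection pool above the default of 50.
            ClientConfiguration config = new ClientConfiguration();
            config.setMaxConnections(500); // placeholder value under test

            AmazonDynamoDBClient client = new AmazonDynamoDBClient(
                    new DefaultAWSCredentialsProviderChain(), config);
            System.out.println("Created client with larger pool: " + client);
        }
    }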
If there is a maximum limit, does it apply only to a single client instance? What about several machines using DynamoDB through the same account: will they share a connection limit?
Thanks!

Since connections to Amazon DynamoDB are HTTP(S) based, the number of open connections is really bounded by your client's TCP limits. I highly doubt there is a limit on Amazon's end at all, as the service is load balanced to scale more or less without bound.
Naturally, the exception is your read and write capacity limits. Note that they want you to contact them if you will exceed a certain number of capacity units, which depends on your region.
You've probably already read them, but the limits of DynamoDB are documented here:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
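Incidentally, exceeding those capacity limits surfaces client-side as a throttling exception rather than a connection problem. A minimal sketch of catching it with the Java SDK; the table and key here are hypothetical:

    import java.util.Collections;

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
    import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

    public class ThrottleExample {
        public static void main(String[] args) {
            AmazonDynamoDBClient client = new AmazonDynamoDBClient();
            try {
                client.getItem(new GetItemRequest()
                        .withTableName("my-table") // hypothetical table
                        .withKey(Collections.singletonMap("id", new AttributeValue("42"))));
            } catch (ProvisionedThroughputExceededException e) {
                // You hit your read capacity limit, not a connection limit;
                // back off and retry (the SDK already retries a few times internally).
                System.err.println("Throttled: " + e.getErrorMessage());
            }
        }
    }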

Related

Azure Redis Cache GET throughput per client connection

https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-planning-faq The document mentions throughput numbers for GETs, but multiple client connections are possible, and there is also a limit based on the pricing tier.
Question: is the stated number of "GET requests per second" per client connection, or is it the total measured after creating the maximum possible number of connections to the Redis cache and running GET operations from each client?
That's the total GETs/second regardless of the number of connections. I believe we tested with 50 connections. With lower numbers of connections, you may hit bottlenecks in the throughput of client instances or network connections before you hit the limits of the server.
We always recommend benchmarking throughput with your application's actual architecture and workload to find the actual cache capabilities for your use case: https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices-performance
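If it helps, a crude benchmark along those lines can be as simple as the sketch below, assuming the Jedis client; the host, key, thread count, and duration are placeholders, and a real test should follow the guidance in the link above:

    import java.util.concurrent.atomic.AtomicLong;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPool;
    import redis.clients.jedis.JedisPoolConfig;

    public class GetThroughputTest {
        public static void main(String[] args) throws InterruptedException {
            JedisPoolConfig poolConfig = new JedisPoolConfig();
            poolConfig.setMaxTotal(50); // one pooled connection per worker thread
            JedisPool pool = new JedisPool(poolConfig, "my-cache.example.net", 6379);

            AtomicLong ops = new AtomicLong();
            long end = System.currentTimeMillis() + 10_000; // 10-second window

            Runnable worker = () -> {
                while (System.currentTimeMillis() < end) {
                    try (Jedis jedis = pool.getResource()) {
                        jedis.get("test-key"); // hypothetical key
                        ops.incrementAndGet();
                    }
                }
            };

            Thread[] threads = new Thread[50];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(worker);
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join();
            }
            System.out.println("GETs/second: " + ops.get() / 10);
        }
    }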

Microservices - Connection Pooling when connecting to a single legacy database

I am working on developing microservices for a monolithic application using Spring Boot + Spring Cloud + Spring JDBC.
Currently, the application connects to a single database through a Tomcat JNDI connection pool.
We have a constraint here: we cannot change the database architecture at this point in time, for various reasons such as the large number of DB objects, tight dependencies with other systems, etc.
So we have isolated the microservices based on application features. My concern is that if each microservice has its own connection pool, the number of connections to the database will multiply with every service and instance we add.
Currently, I am thinking of two solutions:
1. Calculate the number of connections currently used by each application feature and arrive at max/min connection parameters per service. This is a very tedious process, and we don't have any mechanism to get the connection count per application feature.
2. Develop a data microservice with a single connection pool, which receives query objects from the other microservices, runs the queries against the database, and returns the result sets to the callers.
I am not sure whether the second approach is a best practice in a microservices architecture.
Can you please suggest any other standard approaches that could help in the current situation?
It's all about the tradeoffs.
Calculating the number of connections used by each application feature and arriving at max/min connection parameters per service:
Cons: as you said, some profiling and guesswork is needed to reach the sweet spot for the number of connections per app feature.
Pros: unlike the second approach, you avoid the performance overhead of an extra hop.
Developing a data microservice with a single connection pool that receives query objects from the other microservices, runs them against the database, and returns the result sets:
Pros: minimal work upfront.
Cons: one more layer, and in turn one more point of failure. Performance will also degrade, because every call has to go through serialization -> HTTP(S) network latency -> deserialization -> (the JDBC work, which is part of either approach) -> serialization -> HTTP(S) network latency -> deserialization. (In your case this cost may be negligible, but if every millisecond counts in your service, it is a huge deciding factor.)
In my opinion, I wouldn't split the application layer alone until I have analyzed my domains and my datastores.
This is a good read: http://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/
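If you do go with the first approach, applying the numbers you arrive at is the cheap part; a minimal sketch assuming HikariCP as the pool, where the URL, credentials, and sizes are placeholders:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class ServicePool {
        static HikariDataSource createPool() {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:mysql://legacy-db:3306/app"); // hypothetical legacy DB
            config.setUsername("svc_orders");                     // hypothetical service account
            config.setPassword("secret");
            config.setMaximumPoolSize(10); // per-service cap derived from profiling
            config.setMinimumIdle(2);      // keep a couple warm, release the rest
            return new HikariDataSource(config);
        }
    }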
I am facing a similar dilemma at my work and I can share the conclusions we have reached so far.
There is no silver bullet at the moment, so:
1 - Calculating the number of connections, by dividing the total desired number of connections across the instances of each microservice, works well if your microservices don't need to scale elastically in a drastic way.
2 - Not having a pool at all and letting connections be opened on demand. This is what function-as-a-service platforms (like AWS Lambda) typically do. It reduces the total number of open connections, but the downside is that you lose performance, since opening a connection on the fly is expensive.
You could implement some sort of topic that lets each service know, via a listener, when the number of instances changes, and have it update its total connection count accordingly, but that is a complex solution and goes against the microservice principle that you should not change the configuration of a service after it has started running.
Conclusion: I would calculate the number if the microservice tends not to grow in scale, and go without a pool if it needs to scale elastically and exponentially; in the latter case, make sure a retry is in place (something like the sketch below) in case a connection is not obtained on the first attempt.
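A minimal sketch of such a retry, assuming a plain JDBC DataSource; the attempt count and backoff values are placeholders:

    import java.sql.Connection;
    import java.sql.SQLException;

    import javax.sql.DataSource;

    public class ConnectionRetry {
        static Connection getWithRetry(DataSource ds, int maxAttempts) throws SQLException {
            SQLException last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    return ds.getConnection();
                } catch (SQLException e) {
                    last = e; // remember the failure and back off before the next try
                    try {
                        Thread.sleep(100L << attempt); // 100ms, 200ms, 400ms, ...
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
            throw last;
        }
    }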
There is an interesting grey area here, awaiting a better way of controlling connection pools in microservices.
In the meantime, and to make the problem even more interesting, I recommend reading the article About Pool Sizing from HikariCP: https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing
The ideal number of concurrent connections to a database is actually smaller than most people think.
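If I remember that article correctly, its rule of thumb is connections = (core_count * 2) + effective_spindle_count, so a 4-core database server with a single disk would start at around (4 * 2) + 1 = 9 connections in total, shared across everything that talks to it.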

Max connection pool size and autoscaling group

In Sequelize.js you can configure the max connection pool size (the default is 5). I don't know how to handle this configuration, as I work on an autoscaling platform in AWS.
The Aurora DB cluster on r3.2xlarge allows 2,000 max connections per read replica (you can check this by running SELECT @@MAX_CONNECTIONS;).
The problem is that I don't know the right configuration for each server hosted on our EC2 instances. What should the max connection pool size be when I don't know how many servers the autoscaling group will launch? Normally, the DB MAX_CONNECTIONS value should be divided by the number of connection pools (one per server), but I don't know how many servers will be running in the end.
Our concurrent user count is estimated to be between 50,000 and 75,000 at our release date.
Does anyone have prior experience with this kind of situation?
It has been 6 weeks since you asked, but since I got involved in this recently I thought I would share my experience.
The answer varies based on how the application works and performs, plus the characteristics of the application under load for the given instance type.
1) You want your pool size to be greater than the expected number of simultaneous queries running on your host.
2) You never want a situation where (number of clients) * (pool size) approaches your max connection limit.
Remember, though, that the number of simultaneous queries is generally lower than the number of simultaneous web requests, since most code takes a connection, runs a query, and releases it.
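As a worked example against the 2,000-connection Aurora cap from the question (the instance counts here are hypothetical): 50 app servers with a pool size of 20 each can hold at most 50 * 20 = 1,000 connections, or 50% of the cap; at 100 servers you would be at the cap, so either the pool size or the autoscaling maximum has to come down.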
So you would need to model your application to understand the actual queries (and their volume) generated by your 75K users. This is likely a lot LESS than 75K db queries per second.
You can then construct a script - we used JMeter - and run a test to simulate performance. One of the things we did during our test was to increase the pool size and watch the difference in performance. We used a deliberately large number (100) after establishing a baseline and found that it made a difference. We then dropped it down until it started making a difference again; in our case that was 15, so I set it to 20.
This was against a t2.micro as our app server. If I change the servers to something bigger, this value will likely go up.
Please note that you pay a cost on application startup when you set a higher number... and you also incur some overhead on your server to keep those idle connections open, so making the pool larger than you need isn't good.
Hope this helps.

How many concurrent connections can MarkLogic server process?

Is there an upper limit on how many concurrent connections that MarkLogic can process? For example, ASP.NET is restricted to processing 10 requests concurrently, regardless of infrastructure and hardware. Is there a similar restriction in MarkLogic server? If not, are there any benchmarks that give some indication as to how many connections a typical instance can handle?
Given a large enough budget there is no practical limit on the number of concurrent connections.
The basic limit is the application server thread count, although excess requests will also pile up in the backlog queue. According to groups.xsd, each application server is limited to at most 256 threads. The backlog seems to have no maximum, but most operating systems will silently cap it at something between 256 and 4,096. So, depending on whether or not you count the backlog, a single app server on a single host could have 256-4,352 concurrent connections.
After that, you can use multiple app servers and add hosts to the cluster, with a load balancer if necessary. Most operating systems impose a limit of around 32,000-64,000 open sockets per host, but there is no hard limit on the number of hosts or app servers. Eventually request ids might become a problem, but those are 64-bit numbers, so there is a lot of headroom.
Of course none of this guarantees that your CPU, memory, disk, and network can keep up with the demand. That is a separate problem, and highly application-specific.

Max concurrent connections to Amazon load balancer

My testing shows that the Amazon load balancer resets connections with its instances when it has about 10k concurrent connections going into it. Is that a limit of the Amazon load balancer? If not, is there a setting for it? I need to support up to 1M concurrent connections for my testing.
Thanks,
Sean Nguyen
The ELB should scale way beyond that, but you need to be testing from multiple test clients that appear to come from unique source IPs. This causes the ELB to spawn multiple load balancer instances behind the scenes (which can be detected by DNS lookups). This is explained in the white paper that RightScale published:
http://blog.rightscale.com/2010/04/01/benchmarking-load-balancers-in-the-cloud/
Note that it takes a little while for ELB resources to scale out, so tests need to run for 20 minutes or more.
You also need to be sure that you have enough resources behind the load balancer. EC2 instances (as shown in the white paper mentioned above) seem to hit a throughput limit of around 100k packets per second, which limits the number of concurrent connections that can be served (bear in mind the overhead of TCP and HTTP). You will need a lot of instances to cope with 1M concurrent connections, and I'm not sure at what point you will hit the limit of ELB itself; in RightScale's test they only reached 19k.
Also, be clear about exactly what you mean by 1M concurrent connections: do you mean total keep-alive connections (assuming keep-alive is enabled), or 1M transactions per second?
