Max connection pool size and autoscaling group - amazon-ec2

In Sequelize.js you can configure the max connection pool size (the default is 5). I don't know how to choose this value, because our servers run on an autoscaling platform in AWS.
The Aurora DB cluster on r3.2xlarge allows 2000 max connections per read replica (you can check that by running SELECT @@MAX_CONNECTIONS;).
The problem is that I don't know the right configuration for each server hosted on our EC2 instances. What should the max connection pool size be when I don't know how many servers the autoscaling group will launch? Normally the DB MAX_CONNECTIONS value should be divided by the number of connection pools (one per server), but I don't know how many servers will end up being instantiated.
We estimate between 50,000 and 75,000 concurrent users at our release date.
Does anyone have experience with this kind of situation?

It has been 6 weeks since you asked, but since I got involved in this recently I thought I would share my experience.
The answer varies based on how the application works and performs, plus the characteristics of the application under load on your instance type.
1) You want your pool size to be greater than the number of simultaneous queries you expect to be running on your host.
2) You never want a situation where (number of app servers * pool size) approaches your max connection limit.
Remember, though, that the number of simultaneous queries is generally lower than the number of simultaneous web requests, since most code grabs a connection, runs a query, and releases it.
So you would need to model your application to understand the actual queries (and how many of them) your 75K users would generate. This is likely far fewer than 75K DB queries per second.
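To make that concrete with purely made-up numbers: if 75,000 concurrent users each trigger a request every 10 seconds on average, that is about 7,500 requests per second; if each request runs 2 queries of roughly 5 ms each, then by Little's law only about 7,500 * 2 * 0.005 ≈ 75 queries are in flight at any instant across the whole fleet. Your real numbers will differ, which is exactly why you measure.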
You can then build a script - we used JMeter - and run a test to simulate the load. One of the things we did during our test was to increase the pool size and observe the difference in performance. We used a large number (100) after establishing a baseline and found that the number made a difference. We then dropped it down until lowering it started to make a difference; in our case that was 15, so I set it to 20.
This was against a t2.micro as our app server. If we move to bigger servers, this value will likely go up.
Please note that you pay a cost on application startup when you set a higher number, and you also incur some overhead on your server to keep those idle connections open, so making the pool larger than you need isn't good.
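For what it's worth, here is a minimal Sequelize pool configuration sketch. The names and numbers are placeholder assumptions, not recommendations - derive them from your own load test and from your Auto Scaling group's maximum size:

// Sketch only: assumes ~2000 max DB connections and an Auto Scaling group capped at 40 instances,
// keeping ~25% headroom for other clients (admin tools, migrations, etc.).
const Sequelize = require('sequelize');

const MAX_DB_CONNECTIONS = 2000;  // from SELECT @@MAX_CONNECTIONS
const MAX_APP_INSTANCES = 40;     // the Auto Scaling group's max size (assumption)
const POOL_MAX = Math.floor((MAX_DB_CONNECTIONS * 0.75) / MAX_APP_INSTANCES); // ~37 per server

const sequelize = new Sequelize(process.env.DB_NAME, process.env.DB_USER, process.env.DB_PASS, {
  host: process.env.DB_HOST,
  dialect: 'mysql',
  pool: {
    max: POOL_MAX, // hard cap per app server
    min: 2,        // keep a couple of warm connections to avoid connect latency
    idle: 10000    // ms an idle connection is kept before being released
  }
});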
Hope this helps.

Related

Microservices - Connection Pooling when connecting to a single legacy database

I am working on developing microservices for a monolithic application using Spring Boot + Spring Cloud + Spring JDBC.
Currently, the application connects to a single database through a Tomcat JNDI connection pool.
We have a constraint here: we cannot change the database architecture at this point in time, for various reasons such as the large number of DB objects, tight dependencies with other systems, etc.
So we have split the microservices based on application features. My concern is that if each microservice has its own connection pool, the number of connections to the database can multiply quickly.
Currently, I am thinking of two solutions:
To calculate the number of connections currently used by each application feature and arrive at max/min connection params per service - which is a very tedious process, and we don't have any mechanism to get the connection count per app feature.
To develop a data-microservice with a single connection pool which gets the query object from other MS, triggers the query to the database and returns the resultset object to the caller.
Not sure whether the second approach is a best practice in a microservices architecture.
Can you please suggest any other standard approaches that can be helpful in the current situation?
It's all about the tradeoffs.
To calculate the number of connections that is being used currently by each application feature and arriving at max/min connection params per service.
Cons: As you said, some profiling and guesswork is needed to reach the sweet spot for connections per app feature.
Pros: Unlike the second approach, you avoid the extra performance overhead.
To develop a data-microservice with a single connection pool which gets the query object from other MS, triggers the query to the database and returns the resultset object to the caller.
Pros: Minimal work upfront.
Cons: One more layer, and in turn one more point of failure. Performance will degrade because you have to deal with serialization -> HTTP(S) network latency -> deserialization -> (the JDBC fun stuff, which is part of either approach) -> serialization -> HTTP(S) network latency -> deserialization. (In your case this performance cost may be negligible, but if every millisecond counts in your service, then this is a huge deciding factor.)
In my opinion, I wouldn't split the application layer alone until I had analyzed my domains and my datastores.
This is a good read: http://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/
I am facing a similar dilemma at my work and I can share the conclusions we have reached so far.
There is no silver bullet at the moment, so:
1 - Calculating the number of connections by dividing the total desired number of connections across the microservice instances works well if your microservices don't need to scale drastically and elastically.
2 - Not having a pool at all and letting connections be opened on demand. This is what is typically done on serverless function platforms (like AWS Lambda). It reduces the total number of open connections, but the downside is that you lose performance, because opening connections on the fly is expensive.
You could implement some sort of topic that lets your services know, via a listener, that the number of instances has changed, and then update the total connection count, but that is a complex solution and goes against the microservice principle that you should not change a service's configuration after it has started running.
Conclusion: I would calculate the number if the microservice tends not to grow in scale, and go without a pool if it does need to grow elastically and exponentially; in the latter case, make sure a retry is in place for when a connection is not obtained on the first attempt.
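To make option 1 concrete with made-up numbers: if the database allows 400 connections and you reserve 100 of them for batch jobs and admin tools, and you expect at most 6 services * 5 instances = 30 pools, then each pool gets at most floor(300 / 30) = 10 connections.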
There is an interesting grey area here, awaiting a better way of controlling pools of connections in microservices.
Incidentally, and to make the problem even more interesting, I recommend reading the article "About Pool Sizing" from HikariCP: https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing
The ideal number of concurrent connections to a database is actually smaller than most people think.
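For reference, the rule of thumb quoted in that article (borrowed from the PostgreSQL project) is roughly pool size = (CPU core count * 2) + effective spindle count, which for a typical database server works out to a few dozen connections rather than hundreds.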

Total amount of sessions per user for oracle cluster of 4 nodes

We have Oracle 11g Enterprise 64-bit running as a cluster of 4 nodes.
There is a user with a limit of 96 sessions_per_user. We thought the total session limit for this user would be 4 nodes * 96 = 384 sessions. But in reality it is no more than about 180 sessions. After approximately 180 sessions are opened, we get errors:
ORA-12850: Could not allocate slaves on all specified instances: 4 needed, 3 allocated
ORA-12801: error signaled in parallel query server P004, instance 3599
ORA-02391: exceeded simultaneous SESSIONS_PER_USER limit
The question is: why is the total limit only about 180 sessions? Why is it not 4 * 96?
We would greatly appreciate your answer.
Although I can't find it documented, a quick test implies you are correct that the maximum total number of sessions is equal to SESSIONS_PER_USER * Number of Nodes. However, that will only be true if the sessions are balanced evenly across the nodes. Each instance still enforces that limit.
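It is also worth noting that the ~180 sessions you observe is close to 2 * 96 = 192, which would be consistent with the sessions landing on only two of the four nodes rather than being spread evenly.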
Check the service you are connecting to, and if that service is available on all nodes. Run these commands to look at the preferred nodes and the actual running nodes. It's possible that there was a failure, a service migrated to one node, and never migrated back.
# Preferred nodes:
srvctl config service -d $your_db_name
# Running nodes:
srvctl status service -d $your_db_name
Or possibly the connections are hard-wired to a specific instance. This is usually a mistake, but sometimes it is necessary for things like running the PL/SQL debuggers. Run this query to see where your parallel sessions are spawning:
select inst_id, gv$session.* from gv$session;
Also check the parameter PARALLEL_FORCE_LOCAL and make sure it is not set to true:
select value from gv$parameter where name = 'parallel_force_local';
Or perhaps there's an issue with counting the number of sessions. The number of sessions is frequently more than the requested degree of parallelism. For example, if the query sorts or hashes Oracle will double the number of parallel sessions, one set to produce the rows and one set to consume the rows. Are you sure of the number of parallel sessions being requested?
Also, in my tests, when I ran a parallel query without enough SESSIONS_PER_USER, it simply downgraded my query. I'm not sure why your database is throwing an error. (Perhaps you've got parallel queuing and a timeout set?)
Lastly, it looks like you are using an extremely high degree of parallelism. Are you sure that you need hundreds of parallel processes?
Chances are there are a lot of other potential issues I haven't thought of. Parallelism and RAC are complicated.

What are good Dropwizard database config defaults when using Oracle in production?

What settings from the following reference would be sensible for well behaved connection pools connecting to a large Oracle production database where both connection setup times and typical queries can be relatively long (more than a few seconds...).
http://www.dropwizard.io/manual/configuration.html#database
Similar tips on thin driver specific config properties (below) that are worth using would be useful too.
http://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html
I'm particularly interested in Dropwizard 7 and Oracle 11gR2 with the JDBC thin driver (ojdbc6.jar), but generally applicable tips would be great :)
Regarding the max size of your connection pool, you need to be careful not to use too high a number, because it can have a large negative impact on the database. The default value for maxSize is 100, which is a reasonable number. There is a golden rule that says the total number of connections to Oracle shouldn't be more than 20 times the number of CPU threads on the database server (usually each core has two threads). Also, establishing brand-new connections to Oracle is expensive, so depending on your needs you may want to consider a value higher than 10 for minSize. It's always best to keep a small number of always-busy connections to the database.
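To put that rule into numbers (purely illustrative): a database host with 16 cores and 2 hardware threads per core has 32 threads, so the rule caps you at roughly 20 * 32 = 640 connections in total across every client of that database; if 8 application instances share it, that leaves about 640 / 8 = 80 as an upper bound for each pool's maxSize.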

WebSphere JDBC Connection Pool advice

I am having a hard time understanding what is happening in our WebSphere 7 on AIX environment. We have a JDBC Datasource that has a connection pool with a Min/Max of 1/10.
We are running a Performance Test with HP LoadRunner and when our test finishes we gather the data for the JDBC connection pool.
The max pool size shows as 10, the average pool size shows as 9, and the percent used is 12%. With just this info, would you make any changes or keep things the same? The pool size grows from 1 to 9 during our test, but it says it is only 12% used overall. The final question: every time our test is in the last 15 minutes before stopping, we see an average wait time of 1.8 seconds and an average thread wait of 0.5, yet the percent used is still around 10%. FYI, the last 15 minutes of our test do not add additional users or load; it's steady.
Can anyone provide any clarity or recommendations on whether we should make any changes? Thanks!
First, I'm not an expert in this, so take this for whatever it's worth.
You're looking at WebSphere's PMI data, correct? PercentUsed is "Average percent of the pool that is in use." The pool size includes connections that were created, but not all of those will be in-use at any point in time. See FreePoolSize, "The number of free connections in the pool".
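In concrete terms, an average pool size of 9 at 12% used works out to roughly 9 * 0.12 ≈ 1 connection actually busy at any given moment, on average.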
Based on just that, I'd say your pool is large enough for the load you gave it.
Your decreasing performance at the end of the test, though, does seem to indicate a performance bottleneck of some sort. Have you isolated it enough to know for certain that it's in database access? If so, can you tell if your database server, for instance, may be limiting things?

Are Clustered WebLogic JDBC DataSource settings per node or per cluster?

I have a WebLogic 9.2 Cluster which runs 2 managed server nodes. I have created a JDBC Connection Pool which I have targeted at All servers in the cluster. I believe this will result in the physical creation of connection pools on each of the 2 managed servers (although please correct me if I'm wrong)?
Working on this assumption I have also assumed that the configuration attributes of the Connection Pool e.g. Min/ Max Size etc are per managed server rather than per cluster. However I am unsure of this and can't find anything which confirms or denies it in the WebLogic documentation.
Just to be clear here's an example:
I create connection-pool-a with the following settings and target it at All servers in the cluster:
Initial Capacity: 30
Maximum Capacity: 60
Are these settings applied:
Per managed server - i.e. each node has an initial capacity of 30 and max of 60 connections.
Across the cluster - i.e. the initial number of connections across all managed servers is 30 rising to a maximum of 60.
In some other way I haven't considered?
I ask as this will obviously have a significant effect on the total number of connections being made to the Database and I'm trying to figure out how best to size the connection pools given the constraints of our Database.
Cheers,
Edd
1. Per managed server - i.e. each node has an initial capacity of 30 and max of 60 connections.
It is per server in the Cluster.
I cannot find the documentation right now, but the reason I know this is that when our DBA monitored actual DB sessions, as each managed server started up, the number of open connections incremented by the value of "Initial Capacity" for that data source.
Say Initial Capacity = 10 for the Cluster, which has Server A and B.
When both are starting up, we would first see 10 open (but inactive) sessions on the DB, then 20.
At the database, using Oracle for example, there is a limiting value set on the DB user's profile which caps the total number of open sessions the WebLogic user can hold.
WebLogic's ability to target resources to a Cluster is intended to help keep settings consistent across a large number of application servers. The resource settings are per server so whenever you bump up the connections for a DS that is used by a cluster, you would want to multiply it by the maximum number of WebLogic servers running at any time (This isn't always the same as the number of members in the cluster).
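Applied to your example: with 2 managed servers each hosting connection-pool-a, the database would see 2 * 30 = 60 connections shortly after both nodes start, growing to at most 2 * 60 = 120 connections under load, so size your database limits for 120 rather than 60.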
