WebSphere JDBC Connection Pool advice - jdbc

I am having a hard time understanding what is happening in our WebSphere 7 on AIX environment. We have a JDBC Datasource that has a connection pool with a Min/Max of 1/10.
We are running a Performance Test with HP LoadRunner and when our test finishes we gather the data for the JDBC connection pool.
The Max Pool sizes shows as 10, the Avg pool size shows as 9, the Percent Used is 12%. With just this info would you make any changes or keep things the same? The pool size is growing from 1 to 9 during our test but it says its only 12% used overall. The final question is everytime our test is in the last 15 min before stopping we see an Avg Wait time of 1.8 seconds and avg thread wait of .5 but the percent used is still 10%. FYI, the last 15 min of our test does not add additional users or load its steady.
Can anyone provide any clarity or recommendations on if we should make any changes? thx!

First, I'm not an expert in this, so take this for whatever it's worth.
You're looking at WebSphere's PMI data, correct? PercentUsed is "Average percent of the pool that is in use." The pool size includes connections that were created, but not all of those will be in-use at any point in time. See FreePoolSize, "The number of free connections in the pool".
Based on just that, I'd say your pool is large enough for the load you gave it.
Your decreasing performance at the end of the test, though, does seem to indicate a performance bottleneck of some sort. Have you isolated it enough to know for certain that it's in database access? If so, can you tell if your database server, for instance, may be limiting things?

Related

JMeter sending less requests than expected

I'm using jmeter to generate a performance test, to keep things short and straight i read the initial data from a json file, i have a single thread group in which after reading the data i randomize certain values to prevent data duplication when i need it, then i'm passing the final data to the endpoint using variables, this will end up in a json body that is recieved by the endpoint and it will basically generate a new transaction in the database. Also i added a constant timer to add a 7 seconds delay between requests, with a test duration of 10 minutes and no ramp up, i calculated the requests per second like this:
1 minute has 60 seconds and i have a delay of 7 seconds per request then it's logical to say that every minute i'm sending approximately 8.5 requests per minute, this is my calculation (60/7) = 8.5 now if the test lasts for 10 minutes then i multiply (8.5*10) = 85 giving me a total of 85 transactions in 10 minutes, so i should be able to see that exact same amount of transactions created in the database after the test completes.
This is true when i'm running 10-20-40 users, after the load test run i query the db and i get the exact same number of transaction however, as i increase the users in the thread group this doesn't happen anymore, for example if i set 1000 users i should be able to generate 8500 transactions in 10 minutes, but this is not the case, the db only creates around 5.1k transactions.
What is happening, what is wrong? Why it initially works as expected and as i increase the users it doesn't? I can provide more information if needed. Please help.
There could be 2 possible reasons for this:
You discovered your application bottleneck. When you add more users the application response time increases therefore throughput decreases. There is a term called saturation point which stands for the maximum performance of the system, if you go beyond this point - the system will respond slower and you will get less TPS than initially. From the application under test side you should take a look into the following areas:
It might be the case your application simply lacks resources (CPU, RAM, Network, etc.), make sure that it has enough headroom to operate using i.e. JMeter PerfMon Plugin
Your application middleware (application server, database, load balancer, etc.) are not properly set up for the high loads. Identify your application infrastructure stack and make sure to follow performance tuning guidelines for each component
It is also possible that your application code needs optimization, you can detect the most time/resource consuming functions, largest objects, slowest DB queries, idle times, etc. using profiling tools
JMeter is not sending requests fast enough
Just like for the application under test check that JMeter machine(s) have enough resources (CPU, RAM, etc.)
Make sure to follow JMeter Best Practices
Consider going for Distributed Testing
Can you please check once CPU and Memory utilization(RAM and java heap utilization) of jmeter load generator while running jemter for 1000 users? If it is higher or reaching to max then it may affect requests/sec. Also just to confirm requests/sec from Jmeter side, can you please add listener in Jmeter script to track Hit/sec or TPS?
This will also be true(8.5K requests in 10 mins test duration) if your API response time is 1 second and also you have provided enough ramp-up time for those 1000 users.
So possible reason is:
You did not provide enough ramp-up time for 1000 users.
Your API average response time is more than 1 second while you performing tests for 1000 users.
Possible workarounds:
First, try to measure the API response time for 1 user.
Then calculate accordingly that how many users you need to reach 8500 requests in 10 mins. Use this formula:
TPS* max response time in second
Give proper ramp-up time for 1000 users. Check this thread to understand how you should calculate ramp-up time.
Check that your load generator is able to generate 1000 users without any memory or health (i.e CPU usage) issues. If requires, try to use distributed architecture.

Jmeter tps adjustment

Do we need to adjust Throughput given by jmeter, to find out the actual tps of the system
For eg : I am getting 100 tps for concurrent 250 users. This ran for 10 hrs. Can I come to a conclusion like my software can handle 100 transactions per second. Or else do I need to do some adjustment and need to get a value. Why i am asking this because when load started, system will take sometime to perform in adequate level (warm up time). If so how to do this. Please help me to understand this.
By default JMeter sends requests as fast as it can, the main factor which are affecting TPS rate are:
number of threads (virtual users) - this you can define in Thread Group
your application response time - this is not something you can control
Ideally when you increase number of threads the number of TPS should increase by the same factor, i.e. if you have 250 users and getting 100 tps you should get 200 tps for 500 users. If this is not the case - these 500 users are beyond saturation point and your application bottleneck is somewhere between 250 and 500 users (if not earlier).
With regards to "warm up" time - the recommended approach of conducting the load is doing it gradually, this way you will allow your application to get prepared to increasing load, warm up caches, let JIT compiler/optimizer to go their work, etc. Moreover this way you will be able to correlate the increasing load with increasing/decreasing throughput, response time, number of errors, etc. while having 250 users released at once doesn't tell the full story. See
The system warmup period varies from one system to the other. Warm up period is where configurations are cached, different libraries are initialized (eg. Builder.init()) and other initial functions that usually don't happen for subsequent calls. If you study results of the load test, there is a slow period at the very beginning. For most systems, it could be as small as 5 to 10 minutes. These values could be even negligible if the test is as long as 10 hours. But then again, average calculation can be effected if the results give extremely low values at the start (it always depend on the jump from initial warming up period to normal operations).
As per jmeter configurations this thread may explain the configuration. How to exclude warmup time from JMeter summary?

Max connection pool size and autoscaling group

In Sequelize.js you should configure the max connection pool size (default 5). I don't know how to deal with this configuration as I work on an autoscaling platform in AWS.
The Aurora DB cluster on r3.2xlarge allows 2000 max connections per read replica (you can get that by running SELECT ##MAX_CONNECTIONS;).
The problem is I don't know what should be the right configuration for each server hosted on our EC2s. What should be the right max connection pool size as I don't know how many servers will be launched by the autoscaling group? Normally, the DB MAX_CONNECTIONS value should be divided by the number of connection pools (one by server), but I don't know how many server will be instantiated at the end.
Our concurrent users count is estimated to be between 50000 and 75000 concurrent users at our release date.
Did someone get previous experience with this kind of situation?
It has been 6 weeks since you asked, but since I got involved in this recently I thought I would share my experience.
The answer various based on how the application works and performs. Plus the characteristics of the application under load for the instance type.
1) You want your pool size to be > than the expected simultaneous queries running on your host.
2) You never want your a situation where number of clients * pool size approaches your max connection limit.
Remember though that simultaneous queries is generally less than simultaneous web requests since most code uses a connection to do a query and then releases it.
So you would need to model your application to understand the actual queries (and amount) that would happen for your 75K users. This is likely a lot LESS than 75K/second db queries a second.
You then can construct a script - we used jmeter - and run a test to simulate performance. One of the items we did during our test was to increase the pool higher and see the difference in performance. We actually used a large number (100) after doing a baseline and found the number made a difference. We then dropped it down until it start making a difference. In our case it was 15 and so I set it to 20.
This was against t2.micro as our app server. If I change the servers to something bigger, this value likely will go up.
Please note that you pay a cost on application startup when you set a higher number...and you also incur some overhead on your server to keep those idle connections so making larger than you need isn't good.
Hope this helps.

What's a sensible basic OLTP configuration for Postgres?

We're just starting to investigate using Postgres as the backend for our system which will be used with an OLTP-type workload: > 95% (possibly >99%) of the transactions will be inserting 1 row into 4 separate tables, or updating 1 row. Our test machine is running 9.5.6 (using out-of-the-box config options) on a modest cloud-hosted Windows VM with a 4-core i7 processor, with a conventional 7200 RPM disk. This is much, much slower than our targeted production hardware, but useful right now for finding bottlenecks in our basic design.
Our initial tests have been pretty discouraging. Although the insert statements themselves run fairly quickly (combined execution time is around 2ms), the overall transaction time is around 40ms, due to the commit statement taking 38 ms. Furthermore, during a simple 3-minute load test (5000 transactions), we're only seeing about 30 transactions per second, with pgbadger reporting 3 minutes spent in "commit" (38 ms avg.), and the next highest statements being the inserts at 10 (2ms) and 3 (0.6 ms) respectively. During this test, the cpu on the postgres instance is pegged at 100%
The fact that the time spent in commit is equal to the elapsed time of the test tells me the that not only is commit serialized (unsurprising, given the relatively slow disk on this system), but that it is consuming a cpu during that duration, which surprises me. I would have assumed before the fact that if we were i/o bound, we would be seeing very low cpu usage, not high usage.
In doing a bit of reading, it would appear that using Asynchronous Commits would solve a lot of these issues, but with the caveat of data loss on crashes/immediate shutdown. Similarly, grouping transactions together into a single begin/commit block, or using multi-row insert syntax improves throughput as well.
All of these options are possible for us to employ, but in a traditional OLTP application, none of them would be (you need to have fast, atomic, synchronous transactions). 35 transactions per second on a 4-core box would have unacceptable 20 years ago on other RDBMs running on much slower hardware than this test machine, which makes me think that we're doing this wrong, as I'm sure Postgres is capable of handling much higher workloads.
I've looked around but can't find some common-sense config options that would serve as starting points for tuning a Postgres instance. Any suggestions?
If COMMIT is your time hog, that probably means:
Your system honors the FlushFileBuffers system call, which is as it should be.
Your I/O is miserably slow.
You can test this by setting fsync = off in postgresql.conf – but don't ever do this on a production system. If that improves performance a lot, you know that your I/O system is very slow when it actually has to write data to disk.
There is nothing that PostgreSQL (or any other reliable database) can improve here without sacrificing data durability.
Although it would be interesting to see some good starting configs for OLTP workloads, we've solved our mystery of the unreasonably high CPU during the commits. Turns out it wasn't Postgres at all, it was Windows Defender constantly scanning the Postgres data files. The team that set up our VM that was hosting the test server didn't understand that we needed a backend configuration as opposed to a user configuration.

JMeter JDBC database testing - Max Wait (ms)

What is the best practice for Max Wait (ms) value in JDBC Connection Configuration?
JDBC
I am executing 2 types of tests:
20 loops for each number of threads - to get max Throupught
30min runtime for each number of Threads - to get Response time
With Max Wait = 10000ms I can execute JDBC request with 10,20,30,40,60 and 80 Threads without an error. With Max Wait = 20000ms I can go higher and execute with 100, 120, 140 Threads without an error. It seems to be logical behaviour.
Now question.
Can I increase Max Wait value as desired? Is it correct way how to get more test results?
Should I stop testing and do not increase number of Threads if any error occur in some Report? I got e.g. 0.06% errors from 10000 samples. Is this stop for my testing?
Thanks.
Everything depends on what your requirements are and how you defined performance baseline.
Can I increase Max Wait value as desired? Is it correct way how to get more test results?
If you are OK with higher response times and the functionality should be working, then you can keep max time as much as you want. But, practically, there will be the threshold to response times (like, 2 seconds to perform a login transaction), which you define as part of your performance SLA or performance baseline. So, though you are making your requests successful by increasing max time, eventually it is considered as failed request due to high response time (by crossing threshold values)
Note: Higher response times for DB operations eventually results in higher response times for web applications (or end users)
Should I stop testing and do not increase number of Threads if any error occur in some Report?
Same applies to error rates as well. If SLA says, some % error rate is agreed, then you can consider that the test is meeting SLA or performance baseline if the actual error rate is less that that. eg: If requirements says 0% error rate, then 0.1% is also considered as failed.
Is this stop for my testing?
You can stop the test at whatever the point you want. It is completely based on what metrics you want to capture. From my knowledge, It is suggested to continue the test, till it reaches a point where there is no point in continuing the test, like error rate reached 99% etc. If you are getting error rate as 0.6%, then I suggest to continue with the test, to know the breaking point of the system like server crash, response times reached to unacceptable values, memory issues etc.
Following are some good references:
https://www.nngroup.com/articles/response-times-3-important-limits/
http://calendar.perfplanet.com/2011/how-response-times-impact-business/
difference between baseline and benchmark in performance of an application
https://msdn.microsoft.com/en-us/library/ms190943.aspx
https://msdn.microsoft.com/en-us/library/bb924375.aspx
http://searchitchannel.techtarget.com/definition/service-level-agreement
This setting maps to DBCP -> BasicDataSource -> maxWaitMillis parameter, according to the documentation:
The maximum number of milliseconds that the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception, or -1 to wait indefinitely
It should match the relevant setting of your application database configuration. If your goal is to determine the maximum performance - just put -1 there and the timeout will be disabled.
In regards to Is this stop for my testing? - it depends on multiple factors like what application is doing, what you are trying to achieve and what type of testing is being conducted. If you test database which orchestrates nuclear plant operation than zero error threshold is the only acceptable. And if this is a picture gallery of cats, this error level can be considered acceptable.
In majority of cases performance testing is divided into several test executions like:
Load Testing - putting the system under anticipated load to see if it capable to handle forecasted amount of users
Soak Testing - basically the same as Load Testing but keeping the load for a prolonged duration. This allows to detect e.g. memory leaks
Stress testing - determining boundaries of the application, saturation points, bottlenecks, etc. Starting from zero load and gradually increasing it until it breaks mentioning the maximum amount of users, correlation of other metrics like Response Time, Throughput, Error Rate, etc. with the increasing amount of users, checking whether application recovers when load gets back to normal, etc.
See Why ‘Normal’ Load Testing Isn’t Enough article for above testing types described in details.

Resources