we are getting a weird issue and we are not able to identify what can be the root cause. Our service is written using Sprint boot and we are connecting to MariaDB through JPA.
If we do performance test of our service directly hitting the service end point (returning list of records) we get 300ms performance for 40 tps load
If we do performance test with the same load 40 tps through an experience api which consumes our service through feign client we get performance of 5 seconds
Interestingly from our service logs the difference we see above is while opening the db connection and executing query in our service. We are confused why with two different ways of hitting the same service the db performance differs. Has anyone faced similar issue before or has any suggestions to debug?
Related
I have a simple rest endpoint that executes Postgres procedure.
This procedure returns the current state of device.
For example:
20 devices.
Client app connect to API and make 20 responses to that endpoint every second.
For x clients there are x*20 requests.
For 2 clients 40 requests.
It causes a big cpu load on Postgres server only if there are many clients and/or many devices.
I didn’t create it but I need to redesign it.
How to limit concurrent queries to db only for it? It would be a hot fix.
My second idea is to create background worker that executes queries only one in the same time. Then the endpoint fetches data from memory.
I would try the simple way first. Try to reduce
the amount of database connections in the pool OR
the amount of working threads in the build-in Tomcat.
More flexible option would be to put the logic behind a thread pool limiting the amount of working threads. It is not trivial, if the Spring context and database is used inside a worker. Take a look on a Spring annotation #Async.
Offtopic: The solution we are discussing here looks like a workaround. The discussed solution alone will most probably increase the throughput only by factor 2 maybe 3. It is not JEE conform and it will be most probably not very stable. It is better to refactor the application avoiding such a problem. Another option would be to buy a new database server.
Update: JEE compliant solution would be to implement some sort of bulkhead pattern. It will limit the amount of concurrent running requests and reject it, if the some critical number is reached. The server application answers with "503 Service Unavailable". The client application catches this status and retries a second later (see "exponential backoff").
I have deployed a simple Spring boot app in Google App Engine Flexible. The app. has two APIs, one to add the user data into the DB (xxx.appspot.com/add) the other to get all the user data from the DB (xxx.appspot.com/all).
I wanted to see how GAE scales for the load, hence used JMeter to create a load with 100 user concurrency ramped up in 10 seconds and calls these two APIs in half a second delay, forever. While it runs fine for sometime (with just one instance), it starts to fail after 30 seconds or so with a "java.net.SocketException" or "The server responded with a status of 502".
After this error, when I try to access the same API from the browser, it displays,
Error: Server Error
The server encountered a temporary error and could not complete your
request. Please try again in 30 seconds.
The service is back to normal after 30 mins or so, and whenever the load test happens it repeats the same behavior as mentioned above. I expect GAE to auto-scale based on the load coming in to handle it without any down time (using multiple instances), instead it just crashes or blocks the service (without any information in the log). My app.yaml configuration is,
runtime: java
env: flex
service: hello-service
automatic_scaling:
min_num_instances: 1
max_num_instances: 10
I am a bit stuck with this one, Any help would be greatly appreciated. Thanks in advance.
The solution was to increase the resource configuration, details below.
Given that I did not set a resource parameter, it defaulted to the pre-defined values for both CPU and Memory. In this case, the default
memory was set at 0.6GB. App Engine Flex instances uses about 0.4GB
for overhead processes. Given Java is known to consume higher memory, there is a
great likelihood that the overhead processes consumed more than the
approximate 0.4GB value. Now instances in App Engine are restarted due
to a variety of reasons including optimization due to memory use. This
explains why your instances went off and it shows Tomcat is starting
up (they got restarted) and ends up in 502 error due to the nginx is
not able to complete the request. Fixing the above may lessen if not completely eliminate the 502s.
After I have specified the resources attribute and increased the configuration in app.yaml 502 error seems to be gone.
I'm experiencing an issue with an asp.net core application that virtually hangs when started under load. Once started, the application can handle the load no problem. It feels like initialization is happening for every concurrent request. Is anyone else experiencing similar behavior?
test scenario:
start a test application that hits the service with 50 simultaneous tasks
start the service
notice the requests getting started, but extremely long delays before any start to finish and most will time out
As a temporary work around, I created middleware that throttles subsequent requests until the very first finishes. This effectively lets asp.net mvc initialize before processing the bulk of the requests. This particular app is asp.net core 1.1 (web api) with EF core.
When using a real database located halfway across the country, I experience a good 900ms delay for the first request to my ASP.NET Core WebAPI. This is because it needs to get a connection pool established for use with connecting to said database, and I do not eagerly create a connection pool when running the service. Instead, it gets initialized lazily when I request a connection via a connection factory that is registered as a singleton in the services container.
Perhaps you're experiencing the same type of delay as you stated you're using Entity Framework Core, which is presumably backed by SQL Server. Is a connection pool with the database being initiated lazily as part of this initial request?
Try making a controller that does not have any dependencies that returns a vanilla 200 OK. Assuming you do not have global filters that have expensive services to hydrate, you should see the baseline initialization performance of your web service.
I have a single node elastic search server running on ec2. I want to do some load testing using search requests with random search queries. I am using JMeter for load testing with two different approaches -
HTTP Client - When I test using these clients with 10k/20k/50k of requests, it works fine.
ES Transport Client - This works fine with approx 2k of requests.
Here are the steps I have followed -
Instantiating client on every run and close it once the test finished.
Once client instantiates, I start the jmeter sampling and send the search request.
After this run, stops the sampling.
I am getting No Node Available Exception after 2k of request with transport client.
ES Server is running with 3g of memory and have given 6g of memory to load tester.
Please help me if there is some config modification required and if I am not using the correct approach to test the load.
Thanks in Advance.
What kind of responses are you getting from the http test? Have you verified you are getting valid responses for all 10~50k requests? It might be perhaps your cluster cannot take on the load you're putting on it for either test. Since TransportClient is more intimately coupled to the ES server, you will explicitly see errors that come back from TransportClient, but if you're simply sending requests via HTTP without validating the response, it's easy to miss any issues.
Although, before taking a stab in the dark like I just did, I would also check to see what kind of QPS you are getting using the HTTP method vs the TC method, what your CPU/memory look like throughout both tests, what the response times look like, etc. It helps to monitor the health of your system throughout the process to detect any symptoms that might help explain the cause.
I am creating a service which receives some data from mobile phones and saves it to the database.
The phone is sending the data every 250 ms. As I noticed that the delay for data storing is increasing I tried to run WireShark and write a log as well.
I noticed that the web requests from mobile phone are being made without the delay (checked with WireShark), but the in the service log I noticed the request is received every second and a half or almost two seconds.
Does anyone know where could be the problem or the way to test and determine the cause of such delay?
I am creating a service with WCF (webHttpBinding) and the database is MS SQL.
By the way the log stores the time of http request and also the time of writing data to the database. As mentioned above the request is received every 1.5 - 2 seconds and after that it takes 50 ms to store data to the database.
Thanks!
My first guess after reading the question was that maybe you are submitting data so fast, the database server is hitting a write-contention lock (e.g. AutoNumber fields?)
If your database platform is SQL Server, take a look at http://www.sql-server-performance.com/articles/per/lock_contention_nolock_rowlock_p1.aspx
Anyway please post more information about the overall architecture of the system... what softwares/platforms are used at what parts etc...
Maybe there is some limitation in the connection imposed by the service provider?
What happens if you (for testing) don't write to the database and just log the page hits in the server log with timestamp?
Check that you do not have any tracing running on the web services, this can really kill perf.