SELECT on a one-row table takes seconds - performance

I am experiencing very poor performance in my web application: trivial HTTP requests take dozens of seconds to process. Tracing through the application code, I discovered that the majority of the time is spent executing the first DB query, even when it is as simple as a SELECT on a single-row, single-column table. This happens for every HTTP request, regardless of the query performed. After this first pathological DB interaction the remaining queries run smoothly.
I am using Hibernate on top of an Oracle DB (via JDBC).
It is not a connection pool problem, since I am successfully using Hibernate-c3p0; nor does it seem related to Oracle itself, because every query returns immediately when run directly against the DB.
Furthermore, the Hibernate SessionFactory is correctly created only once, at application startup, and concurrency is not a factor, since all tests were done with a single user.
Finally, my DB's IP address resolves correctly via the application server's /etc/hosts, so DNS-related issues can also be ruled out (the DB and app server run on two distinct virtual machines).
I do not know what to look for. Any help?

This sounds like your session factory object is being spun up lazily by the first query. I generally initialize the session factory at application startup instead, because the user can see this slowdown when it happens on the first query. Doing it up front moves the cost out of the request path entirely.
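For illustration, a minimal sketch of what eager initialization could look like in a plain servlet container (the listener and class names are my own, not from the question):

// Sketch only: build the SessionFactory (and its c3p0 pool) when the web app
// starts, so the first HTTP request does not pay the bootstrap cost.
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

@WebListener
public class HibernateStartupListener implements ServletContextListener {

    private static SessionFactory sessionFactory;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Reads hibernate.cfg.xml and opens the connection pool up front.
        sessionFactory = new Configuration().configure().buildSessionFactory();
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        if (sessionFactory != null) {
            sessionFactory.close();
        }
    }

    public static SessionFactory getSessionFactory() {
        return sessionFactory;
    }
}

Any code that needs a session can then call HibernateStartupListener.getSessionFactory() instead of triggering a lazy build from inside a request.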

Related

Limit concurrent queries in Spring JPA

I have a simple REST endpoint that executes a Postgres procedure.
This procedure returns the current state of a device.
For example:
20 devices.
A client app connects to the API and makes 20 requests to that endpoint every second.
For x clients there are x*20 requests.
For 2 clients, 40 requests.
This causes a heavy CPU load on the Postgres server once there are many clients and/or many devices.
I didn't create this design, but I need to redesign it.
How can I limit concurrent queries to the DB for this endpoint alone? That would be a hot fix.
My second idea is to create a background worker that executes only one query at a time; the endpoint would then fetch the data from memory.
I would try the simple way first. Try to reduce
- the number of database connections in the pool, OR
- the number of worker threads in the built-in Tomcat.
A more flexible option would be to put the logic behind a thread pool that limits the number of worker threads. This is not trivial if the Spring context and the database are used inside a worker. Take a look at Spring's @Async annotation.
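For illustration, a rough sketch of that @Async route with a bounded executor (the bean name, pool sizes, and class names are made-up examples):

// Sketch only: route device queries through a small, bounded executor so
// that only a fixed number of them hit Postgres at the same time.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {
    @Bean(name = "deviceExecutor")
    public Executor deviceExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(4);    // at most 4 queries run concurrently
        executor.setMaxPoolSize(4);
        executor.setQueueCapacity(100); // further requests wait here
        executor.initialize();
        return executor;
    }
}

@Service
class DeviceStateService {
    @Async("deviceExecutor")
    public CompletableFuture<String> fetchState(long deviceId) {
        // The stored-procedure call would go here.
        return CompletableFuture.completedFuture("state-of-" + deviceId);
    }
}

With core and max pool size set to 4, at most four procedure calls reach Postgres at once, regardless of how many requests arrive.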
Off-topic: the solution we are discussing here looks like a workaround. On its own it will most probably increase throughput only by a factor of 2, maybe 3. It is not JEE-compliant, and it will most probably not be very stable. It would be better to refactor the application so the problem does not arise. Another option would be to buy a new database server.
Update: a JEE-compliant solution would be to implement some sort of bulkhead pattern. It limits the number of concurrently running requests and rejects further ones once a critical threshold is reached. The server application answers with "503 Service Unavailable"; the client application catches this status and retries a moment later (see "exponential backoff").
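A minimal sketch of such a bulkhead, using a plain Semaphore in a Spring MVC controller (the endpoint path and the limit of 10 are assumptions for the example):

// Minimal bulkhead sketch: at most MAX_CONCURRENT requests run the query;
// the rest are rejected with 503 so clients can back off and retry.
import java.util.concurrent.Semaphore;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
class DeviceStateController {

    private static final int MAX_CONCURRENT = 10;
    private final Semaphore bulkhead = new Semaphore(MAX_CONCURRENT);

    @GetMapping("/device-state")
    public ResponseEntity<String> deviceState() {
        if (!bulkhead.tryAcquire()) {
            // Critical threshold reached: reject instead of queueing.
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                                 .body("Try again later");
        }
        try {
            return ResponseEntity.ok(queryProcedure()); // the Postgres call
        } finally {
            bulkhead.release();
        }
    }

    private String queryProcedure() {
        return "device state"; // placeholder for the actual procedure call
    }
}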

Azure Functions - Java CosmosClientBuilder slow on initial connection

We're using Azure Cloud Functions with the Java SDK and connect to Cosmos DB using the following Java API:
CosmosClient client = new CosmosClientBuilder()
.endpoint("https://my-cosmos-project-xyz.documents.azure.com:443/")
.key(key)
.consistencyLevel(ConsistencyLevel.SESSION)
.buildClient();
This buildClient() call establishes the connection to Cosmos DB, which takes 2 to 3 seconds.
The subsequent database queries using that client are fast;
only this first setup of the connection is quite slow.
We keep the CosmosClient in a static variable so we can reuse it across the multiple HTTP requests that hit our function.
But once the function goes cold (Azure shuts it down after a few minutes unused), the static variable is lost, and the connection has to be re-established when the function starts up again.
Is there a way to make this initial connection to Cosmos DB faster?
Or do you think we need to increase the time a function stays online, if we need faster response times?
This is expected behavior; see https://youtu.be/McZIQhZpvew?t=850.
The first request a client makes needs to go through a warm-up step. This warm-up consists of fetching the account information, container information, and routing and partitioning information, in order to know where to route requests (as you experienced, further requests do not incur this extra latency). Hence the importance of maintaining a singleton instance.
In some Functions plans (Consumption), instances get de-provisioned when there is no activity, in which case any existing instance of the client is destroyed; when a new instance is provisioned, your first request pays this warm-up cost.
There is currently no workaround I'm aware of in the Java SDK, but this should not affect your P99 latency, since it's just the first request on a cold client.
Hope this and the video help explain the reason.
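To make the singleton advice concrete, here is a minimal sketch of a client shared across all invocations of one function instance (reading the endpoint and key from environment variables is an assumption for the example):

// Sketch of the shared-client pattern: build the CosmosClient once per
// function instance and reuse it across invocations.
import com.azure.cosmos.ConsistencyLevel;
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;

public final class CosmosClientHolder {

    // Initialized once when the class is first used; the JVM guarantees
    // thread-safe, one-time initialization of static fields.
    private static final CosmosClient CLIENT = new CosmosClientBuilder()
            .endpoint(System.getenv("COSMOS_ENDPOINT"))
            .key(System.getenv("COSMOS_KEY"))
            .consistencyLevel(ConsistencyLevel.SESSION)
            .buildClient();

    private CosmosClientHolder() {}

    public static CosmosClient get() {
        return CLIENT;
    }
}

This does not remove the warm-up, but it guarantees you pay it at most once per provisioned instance.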

Best way to initialize initial connection with a server for REST calls?

I've been building some apps that connect to a SQL backend. I use ajax calls to hit WebMethods, a WebAPI, etc.
I notice that the first call to the SQL backend retrieves the data fairly slowly. I can only assume this is because it must negotiate credentials before retrieving the data. It probably caches this somewhere, and thus any calls made afterwards come back very fast.
I'm wondering if there's an ideal, or optimal way, to initialize this connection.
My thought was to make a simple GET call right when the page loads (grabbing something very small, like a single entry). I probably wouldn't be using the returned data in any useful way, other than to ensure that any calls afterwards come back faster.
Is this an okay way to approach fixing the initial delay? I'd love to hear how others handle this.
Cheers!
There are a number of reasons why your first call could be slower than subsequent ones:
- Depending on your server platform, code may be compiled when first executed.
- You may not have an active DB connection in your connection pool.
- The database may not have cached indices or data before the first call.
- Some VM platforms may take a while to allocate sufficient resources to your server if it has been idle for a while.
One way I deal with these types of issues on the server side is to add startup code to my web service that fetches data likely to be used by many callers when the service first initializes (lookup tables, user credential tables, and so on).
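As an illustration of that startup prefetch in Java (JdbcTemplate, Spring, and the table name are assumptions on my part; the same idea works on any stack):

// Sketch: warm the pool and DB caches at service startup so the first
// real request does not pay for connection setup or cold caches.
import javax.annotation.PostConstruct;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
class StartupWarmup {

    private final JdbcTemplate jdbc;

    StartupWarmup(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @PostConstruct
    void warmUp() {
        // Forces the pool to open a connection and primes DB caches.
        jdbc.queryForObject("SELECT 1", Integer.class);
        // Prefetch data many callers will need; table name is illustrative.
        jdbc.queryForList("SELECT code, label FROM lookup_codes");
    }
}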
If you only control the client, consider that you may well wish to monitor server health anyway (I use the open-source monitoring platform Zabbix; there are also many commercial web-based monitoring solutions). Exercising the server outside of end-user code is probably better than making an extra GET call from a page that an end user has loaded.

JDBC connection pool manager

We're in the process of rewriting a web application in Java, coming from PHP. I think, but I'm not really sure, that we might run into problems with regard to connection pooling. The application itself is multi-tenant and uses a combination of "separate database" and "separate schema".
For every Postgres database server instance, there can be more than one database (named schemas_XXX), each holding more than one schema (where a schema is a tenant). On signup, one of two things can happen:
- A new tenant schema is created in the highest-numbered schemas_XXX database.
- The signup process sees that that database has been fully allocated and creates a new schemas_XXX+1 database. In this new database, the tenant schema is created.
All tenants are known via a central registry (also a Postgres database). When a session is established the registry will resolve the host, database and schema of the tenant and a database session is established for that HTTP request.
Now, the problem I think I'm seeing here is twofold:
- A JDBC connection pool is defined when the application starts, meaning all databases (host + database) must be known at startup. This conflicts with the signup process.
- As I write this, we have ~20 database servers with ~1000 databases each, for a total of ~100k tenant schemas. Given those numbers, I would need 20*1000 data sources for every instance of the application. I'm assuming that all pools are also, at one time or another, started. I'm not sure how many resources a pool allocates, but it must be a non-trivial amount for 20,000 pools.
So, is it even feasible to assume that a connection pool can be used for this?
For the first problem, I guess a pool with JMX support could be used, so that we create a new data source if and when a new schemas_XXX database is created. The larger issue is the huge number of pools. For that, I guess, some sort of pool manager is needed that can shut down a pool with no open connections (and start one again on demand). I have not found anything that supports this.
What options do I have? Or should I just bite the bullet and fall back to an out-of-process connection pooler such as PgBouncer, establishing a plain JDBC connection per request, similar to how we handle it now with PHP?
A few things:
- A Connection pool need not be instantiated only at application start-up. You can create or destroy them whenever you want.
- You obviously don't want to eagerly create one Connection pool per database or schema and keep it open at all times. You'd need to keep at least 20,000 or 100,000 Connections open if you did, a nonstarter even before you get to the non-Connection resources used by the DataSources.
- If, as is likely, requests for Connections for a particular tenant tend to cluster, you might consider lazily, dynamically instantiating pools, and destroying them after some timeout if they've not handled a request for a while.
Good luck!
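To sketch that last point, lazily created pools with idle eviction could look roughly like this, assuming HikariCP for the pools and a Guava cache for the timeout logic (both are illustrative choices, not requirements):

// Sketch: create a pool per database lazily; evict and close it after a
// period of inactivity. HikariCP and Guava are assumed libraries here.
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class TenantPoolManager {

    // Close a pool when it is evicted from the cache.
    private final RemovalListener<String, HikariDataSource> closeOnEvict =
            notification -> notification.getValue().close();

    // One pool per JDBC URL, created on first use, dropped after 15 idle minutes.
    private final LoadingCache<String, HikariDataSource> pools =
            CacheBuilder.newBuilder()
                    .expireAfterAccess(15, TimeUnit.MINUTES)
                    .removalListener(closeOnEvict)
                    .build(CacheLoader.from(this::createPool));

    public DataSource poolFor(String jdbcUrl) {
        // Note: Guava evicts lazily, during cache activity; schedule
        // pools.cleanUp() if idle pools must be closed promptly.
        return pools.getUnchecked(jdbcUrl);
    }

    private HikariDataSource createPool(String jdbcUrl) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(jdbcUrl);
        config.setMaximumPoolSize(5); // keep per-database pools small
        return new HikariDataSource(config);
    }
}

The central registry lookup would produce the JDBC URL for the tenant, and poolFor(url) would hand back a live pool, creating it only on first use.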

Proper way to handle ResultSet retrieved by JDBC (Converted to XML vs Validated Online)

My Java application processes a lot of information parsed from XML. For some methods, I need to validate some of that info against an SQL database on another machine. I am using JDBC; currently, for each validation, I call a DB handling method that opens a connection and returns the result set to validate. I am not sure I am taking the best design option. This seems naive and expensive to me, and I am wondering if there are better practices.
Is there a proper way to keep the connection open for the whole application run time, so that no time is wasted opening a connection for every validation iteration (I might have hundreds of thousands)?
Or should I build another application that retrieves all the needed tables and converts them to XML files saved on my machine? My application would then parse them like the rest of its input, with better access and performance.
I am open to any better suggestion.
Of course you can use one DB connection for the whole life of the application; use the Singleton pattern. If your program runs for a long time and does not use the database for a while, it may lose the DB connection (some kind of timeout on network equipment, etc.). For such cases you can use a DB pool. Such a pool should cope with longer periods of inactivity and can also give you separate connections for separate threads.
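A minimal sketch of that singleton-plus-pool idea, assuming HikariCP (the JDBC URL is a placeholder, not from the question):

// Sketch: one shared, pooled DataSource for the whole application run.
// The pool replaces stale connections after network timeouts, which a
// single raw Connection cannot do.
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public final class Database {

    private static final DataSource POOL;

    static {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://dbhost/validation"); // placeholder URL
        config.setMaximumPoolSize(10);
        POOL = new HikariDataSource(config);
    }

    private Database() {}

    // Borrow per validation (or per batch); the pool handles reconnects.
    public static Connection getConnection() throws SQLException {
        return POOL.getConnection();
    }
}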
I don't think your solution of converting the data into XML is a good one. Why do you want to convert it to XML? How often does the data change? How big is the database you want to copy? How would you synchronize your local copy with the database? A local copy in XML files adds too many problems. If the data is small, then maybe you can read it at the start of your program, save it in some data structures, and use that to verify the other data; it could even go into SQLite or another small database. But I would go this way only if the singleton or DB pool performance is really weak.
