Running out of SQL connections with Quarkus and hibernate-reactive-panache

I've got a Quarkus app which uses hibernate-reactive-panache to run some queries, then processes the results and returns JSON via a REST call.
For each REST call, 5 DB queries are run; the last one loads about 20k rows:
public Uni<GraphProcessor> loadData(GraphProcessor graphProcessor){
    return myEntityRepository.findByDateLeaving(graphProcessor.getSearchDate())
        .select().where(graphProcessor::filter)
        .onItem().invoke(graphProcessor::onNextRow).collect().asList()
        .onItem().invoke(g -> log.info("loadData - end"))
        .replaceWith(graphProcessor);
}
//In myEntityRepository
public Multi<MyEntity> findByDateLeaving(LocalDate searchDate){
    LocalDateTime startDate = searchDate.atStartOfDay();
    return MyEntity.find("#MyEntity.findByDate",
        Parameters.with("startDate", startDate)
            .map()).stream();
}
This all works fine for the first 4 times but on the 5th call I get
11:12:48:070 ERROR [org.hibernate.reactive.util.impl.CompletionStages:121] (147) HR000057: Failed to execute statement [$1select <ONE OF THE QUERIES HERE>]: $2could not load an entity: [com.mycode.SomeEntity#1]: java.util.concurrent.CompletionException: io.vertx.core.impl.NoStackTraceThrowable: Timeout
at <16 internal lines>
io.vertx.sqlclient.impl.pool.SqlConnectionPool$1PoolRequest.lambda$null$0(SqlConnectionPool.java:202) <4 internal lines>
at io.vertx.sqlclient.impl.pool.SqlConnectionPool$1PoolRequest.lambda$onEnqueue$1(SqlConnectionPool.java:199) <15 internal lines>
Caused by: io.vertx.core.impl.NoStackTraceThrowable: Timeout
I've checked https://quarkus.io/guides/reactive-sql-clients#pooled-connection-idle-timeout and configured
quarkus.datasource.reactive.idle-timeout=1000
That itself did not make a difference.
I then added
quarkus.datasource.reactive.max-size=10
I was able to run 10 REST calls before getting the timeout again. With a pool setting of max-size=20 I was able to run it 20 times. So it does look like each REST call uses up a SQL connection and does not release it again.
Is there something that needs to be done to manually release the connection or is this simply a bug?
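The observation above (N successful calls for a pool of size N) is what a leak looks like from the outside. Here is a minimal sketch of that reasoning, assuming a hypothetical TinyPool class that stands in for the real reactive pool. It is not the Vert.x pool API, and it fails immediately when empty instead of waiting for a timeout:

```java
import java.util.concurrent.Semaphore;

public class PoolLeakDemo {
    /** Hypothetical stand-in for a connection pool with a fixed max size. */
    static final class TinyPool {
        private final Semaphore permits;

        TinyPool(int maxSize) {
            permits = new Semaphore(maxSize);
        }

        /** Returns true if a connection was available (a real pool would wait up to a timeout). */
        boolean acquire() {
            return permits.tryAcquire();
        }

        void release() {
            permits.release();
        }
    }

    /** Simulates calls that never give their connection back; returns how many succeeded. */
    static int leakyCalls(int maxSize, int calls) {
        TinyPool pool = new TinyPool(maxSize);
        int succeeded = 0;
        for (int i = 0; i < calls; i++) {
            if (pool.acquire()) {
                succeeded++; // connection is never released
            }
        }
        return succeeded;
    }

    /** Simulates calls that always release their connection; returns how many succeeded. */
    static int wellBehavedCalls(int maxSize, int calls) {
        TinyPool pool = new TinyPool(maxSize);
        int succeeded = 0;
        for (int i = 0; i < calls; i++) {
            if (pool.acquire()) {
                try {
                    succeeded++;
                } finally {
                    pool.release();
                }
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println(leakyCalls(10, 15));       // 10
        System.out.println(wellBehavedCalls(10, 15)); // 15
    }
}
```

If the number of successful calls tracks max-size exactly, as it does here for the leaky variant, the connections are most likely not being returned to the pool.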

The problem was with using @Blocking on a reactive REST method.
See https://github.com/quarkusio/quarkus/issues/25138 and https://quarkus.io/blog/resteasy-reactive-smart-dispatch/ for more information.
So if you have a REST method that returns e.g. Uni or Multi, do NOT use @Blocking on it. I initially had to add it because I received an exception telling me that the thread cannot block. This was due to some CPU-intensive calculations. Adding @Blocking made that exception go away (in dev mode; another problem popped up in native mode) but caused this SQL pool issue.
The real solution was to use emitOn to change the thread for the CPU-intensive method:
.emitOn(Infrastructure.getDefaultWorkerPool())
.onItem().transform(processor::cpuIntensiveMethod)
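For readers who want to see the thread hand-off in isolation, here is a rough JDK-only analogue. It uses CompletableFuture.thenApplyAsync rather than Mutiny, so it only illustrates the idea of running a downstream stage on a worker executor. The class name, the "worker-0" thread name and the body of cpuIntensiveMethod are assumptions made up for the sketch:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadDemo {
    /** Hypothetical CPU-heavy step; stands in for processor::cpuIntensiveMethod. */
    static long cpuIntensiveMethod(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /** Runs the heavy step on the given worker and returns the name of the thread that ran it. */
    static String runOnWorker(ExecutorService worker, int n) {
        return CompletableFuture.completedFuture(n)
                // The stage below executes on `worker`, not on the calling thread.
                .thenApplyAsync(value -> {
                    cpuIntensiveMethod(value);
                    return Thread.currentThread().getName();
                }, worker)
                .join();
    }

    public static void main(String[] args) {
        ExecutorService worker =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "worker-0"));
        try {
            System.out.println(runOnWorker(worker, 1000)); // worker-0
        } finally {
            worker.shutdown();
        }
    }
}
```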

Related

Reactive Quarkus app behaving differently when run as Java or native

I have a reactive Quarkus app with hibernate-panache-reactive. The problem is it behaves differently when I run it as a Java app or a native app.
The app
1) loads a lot of data from a MySQL DB via hibernate-panache-reactive
2) builds a graph based on the data loaded
3) runs some time consuming algorithm on the graph
4) loads some more data from the DB based on the results returned from 3)
So initially the code looked something like this:
GraphProcessor graphProcessor = createInitialProcessor();
return Uni.createFrom().item(graphProcessor)
    // 1) loading of initial data
    .onItem().transformToUni(this::loadDataViaPanaceReactive1)
    .onItem().transformToUni(this::loadDataViaPanaceReactive2)
    .onItem().transformToUni(this::loadDataViaPanaceReactive3)
    // 2) building of graph
    .onItem().transform(graphProcessor::processLoadedData)
    .onItem().invoke(graphProcessor::loadingComplete) //sync
    // 3) running time consuming algorithm on graph
    .onItem().transformToMulti(this::runTimeConsumingTask)
    .onItem().invoke(this::prepareDBQueries)
    // 4) load more data from DB
    .onItem().transformToUniAndConcatenate(this::loadMoreData1)
    .onItem().transformToUniAndConcatenate(this::loadMoreData2)
    .onItem().transformToUniAndConcatenate(this::transformToPublicForm)
    .onFailure().invoke(log::error);
That worked fine when run as a Java app, but when I tried to run it as a native app it first complained that the computations in 2) and 3) were taking too long and were blocking the calling thread.
I fixed that by using
.emitOn(Infrastructure.getDefaultWorkerPool())
Between 1 and 2
This time I got another error
java.lang.IllegalStateException: HR000069: Detected use of the
reactive Session from a different Thread than the one which was used
to open the reactive Session - this suggests an invalid integration;
original thread: 'vert.x-eventloop-thread-0' current Thread:
'vert.x-eventloop-thread-1'
I've fixed that by inserting
.emitOn(Infrastructure.getDefaultExecutor())
between 3 and 4.
GraphProcessor graphProcessor = createInitialProcessor();
return Uni.createFrom().item(graphProcessor)
    // 1) loading of initial data
    .onItem().transformToUni(this::loadDataViaPanaceReactive1)
    .onItem().transformToUni(this::loadDataViaPanaceReactive2)
    .onItem().transformToUni(this::loadDataViaPanaceReactive3)
    // 2) building of graph
    .emitOn(Infrastructure.getDefaultWorkerPool()) // Required for native mode
    .onItem().transform(graphProcessor::processLoadedData)
    .onItem().invoke(graphProcessor::loadingComplete)
    // 3) running time consuming algorithm on graph
    .onItem().transformToMulti(this::runTimeConsumingTask)
    .onItem().invoke(this::prepareDBQueries)
    .emitOn(Infrastructure.getDefaultExecutor()) // Required for native mode
    // 4) load more data from DB
    .onItem().transformToUniAndConcatenate(this::loadMoreData1)
    .onItem().transformToUniAndConcatenate(this::loadMoreData2)
    .onItem().transformToUniAndConcatenate(this::transformToPublicForm)
    .onFailure().invoke(log::error);
That worked when run in native mode but now when I run it in Java I get the same exception (Detected use of the
reactive Session from a different Thread than the one which was used
to open the reactive Session)
The emitOn(Infrastructure.getDefaultExecutor()) should have switched back to the original thread.
The odd thing is also that this exception is not thrown every time I hit the app.
So what am I doing wrong here? What is the best way to handle time consuming tasks and then having to do some more DB queries after?
You could use .runSubscriptionOn(Executor) but I would need to switch back to the original thread for part 4 again.
Thanks for your help.

ElasticSearch randomly fails when running tests

I have a test ElasticSearch box (2.3.0) and my tests that are using ES are failing in random order which is really frustrating (failed with All shards failed exception).
Looking at the elastic_search.log file it only showed me this
[2017-05-04 04:19:15,990][DEBUG][action.search.type ] [es-testing-1] All shards failed for phase: [query]
RemoteTransportException[[es-testing-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]];
Caused by: [derp_test][[derp_test][3]] IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]]
at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:993)
at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:814)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:641)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:618)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:369)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Any idea what's going on? So far my research only told me this is most likely due to corrupt translog -- but I don't think deleting translog will help because the test drops the test index for every namespace
ES test box has 3.5GB RAM and it's using 2.5GB heap size, CPU usage is quite normal during the test (peaked at 15%)
To clarify: when I said failing test, I meant error with the weird exception as mentioned above (not failing test due to incorrect value). I did manual refresh after every insert/update operation so value is correct.
After investigating the ElasticSearch log file (at DEBUG level) and the source code, it turned out that after an index is created, its shards enter the RECOVERING state, and sometimes my test tried to query ElasticSearch while the shards were not yet active, hence the exception.
The fix is simple: after creating an index, wait until the shards are active using the setWaitForActiveShards function. To be more paranoid I also added setWaitForYellowStatus.
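When the client's blocking health call is not an option (for example when the test talks to the box over plain HTTP), the same "wait until ready" idea can be sketched as a bounded polling loop. This is a generic illustration, not the Elasticsearch API. The `ready` check is a hypothetical hook that you would back with your own shard or cluster health query:

```java
import java.util.function.BooleanSupplier;

public class AwaitReady {
    /**
     * Polls `ready` until it returns true or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean awaitTrue(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Hypothetical readiness check that succeeds on the third poll.
        boolean ok = awaitTrue(() -> ++attempts[0] >= 3, 1000, 10);
        System.out.println(ok + " after " + attempts[0] + " polls");
    }
}
```

Unlike a fixed sleep, the loop returns as soon as the condition holds and fails loudly (returns false) when it never does.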
It is recommended to use ESIntegTestCase for integration tests.
ESIntegTestCase has helper methods, such as ensureGreen and refresh, that make sure Elasticsearch is ready before the test continues, and it lets you configure node settings for the test.
If you use a standalone Elasticsearch instance as a test box, it may cause various problems:
like your exception, which suggests the shards for index derp_test are still recovering.
even after you have indexed your data, an immediate search can fail, because the cluster needs a flush or refresh first
...
Most of these problems can be worked around by using Thread.sleep to wait for some time :), but that is a bad way to do it.
Try manually refreshing your indices after inserting the data and before performing a query to ensure the data is searchable.
Either:
As part of the index request - https://www.elastic.co/guide/en/elasticsearch/reference/2.3/docs-index_.html#index-refresh
Or separately - https://www.elastic.co/guide/en/elasticsearch/reference/2.3/indices-refresh.html
There could be another reason. I had the same problem with my Elasticsearch unit tests. At first I thought the root cause was somewhere in .NET Core or NEST or elsewhere outside my code, because the tests ran successfully in Debug mode (when debugging tests) but failed randomly in Release mode (when running tests).
After a lot of investigation and trial and error, I found that the root cause (in my case) was concurrency, in other words a race condition.
The tests run concurrently, and I used to recreate and seed my index (initializing and preparing) in the test class constructor, which executes at the beginning of every test. Since the tests ran concurrently, race conditions were likely to happen and make my tests fail.
Here is the initialization code that caused the tests to fail randomly (in Release mode):
public BaseElasticDataTest(RootFixture fixture)
    : base(fixture)
{
    ElasticHelper = fixture.Builder.Build<ElasticDataProvider<FakePersonIndex>>();
    deleteFakeIndex();
    createFakeIndex();
    fillFakeIndexData();
}
The code above used to run concurrently for every test. I fixed the problem by executing the initialization code only once per test class (once for all the test cases inside the test class), and the problem went away.
Here is my fixed test class constructor code:
static bool initialized = false;
public BaseElasticDataTest(RootFixture fixture)
    : base(fixture)
{
    ElasticHelper = fixture.Builder.Build<ElasticDataProvider<FakePersonIndex>>();
    if (!initialized)
    {
        deleteFakeIndex();
        createFakeIndex();
        fillFakeIndexData();
        //for concurrency
        System.Threading.Thread.Sleep(100);
        initialized = true;
    }
}
Hope it helps
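One caveat on the fix above: a plain static flag checked without a lock can still let two constructors run the setup at the same time if they start together. Here is a minimal sketch of a guarded once-only initialization, written in Java rather than C# and using made-up names (OnceInit, ensureInitialized, countSetupRuns), not the answer's actual classes:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OnceInit {
    private boolean initialized = false;

    /** Runs `setup` at most once, even when called from several threads at the same time. */
    synchronized void ensureInitialized(Runnable setup) {
        if (!initialized) {
            setup.run();
            initialized = true;
        }
    }

    /** Starts `threads` concurrent callers and returns how many times the setup actually ran. */
    static int countSetupRuns(int threads) {
        OnceInit guard = new OnceInit();
        AtomicInteger runs = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> guard.ensureInitialized(runs::incrementAndGet));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println(countSetupRuns(8)); // 1
    }
}
```

Because the method is synchronized, late callers also block until the setup has finished, so no fixed sleep is needed to let the first caller complete.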

Can MAX_UTILIZATION for PROCESSES reached cause "Unable to get managed connection" Exception?

A JBoss 5.2 application server log was filled with thousands of the following exception:
Caused by: javax.resource.ResourceException: Unable to get managed connection for jdbc_TestDB
at org.jboss.resource.connectionmanager.BaseConnectionManager2.getManagedConnection(BaseConnectionManager2.java:441)
at org.jboss.resource.connectionmanager.TxConnectionManager.getManagedConnection(TxConnectionManager.java:424)
at org.jboss.resource.connectionmanager.BaseConnectionManager2.allocateConnection(BaseConnectionManager2.java:496)
at org.jboss.resource.connectionmanager.BaseConnectionManager2$ConnectionManagerProxy.allocateConnection(BaseConnectionManager2.java:941)
at org.jboss.resource.adapter.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:96)
... 9 more
Caused by: javax.resource.ResourceException: No ManagedConnections available within configured blocking timeout ( 30000 [ms] )
at org.jboss.resource.connectionmanager.InternalManagedConnectionPool.getConnection(InternalManagedConnectionPool.java:311)
at org.jboss.resource.connectionmanager.JBossManagedConnectionPool$BasePool.getConnection(JBossManagedConnectionPool.java:689)
at org.jboss.resource.connectionmanager.BaseConnectionManager2.getManagedConnection(BaseConnectionManager2.java:404)
... 13 more
I've stripped off the first part of the exception, which is basically our internal JDBC wrapper code which tries to get a DB connection from the pool.
Looking at the Oracle DB side I ran the query:
select resource_name, current_utilization, max_utilization, limit_value
from v$resource_limit
where resource_name in ('sessions', 'processes');
This produced the output:
RESOURCE_NAME  CURRENT_UTILIZATION  MAX_UTILIZATION  LIMIT_VALUE
processes      1387                 1500             1500
sessions       1434                 1586             2272
Given that the PROCESSES limit of 1500 was reached, would this cause the JBoss exceptions we experienced? I've also been investigating the possibility of connection leaks, but haven't found any evidence of that so far.
What is the recommended course of action here? Is simply increasing the limit a valid solution?
Usually, when max_utilization reaches the processes limit, the listener will refuse new connections to the database. You can see the related errors in the alert log. To solve this on the database side, increase the processes parameter.
Hmm, strange. Is it possible that exception wrapping in JBoss hides the original error? You should get some SQL exception whose text starts with ORA-. Maybe your JDBC wrapper does not handle errors properly.
The recommended actions are to:
check the configured size of the connection pool against the processes and sessions Oracle startup parameters.
check Oracle's view v$session, especially the columns STATUS, LAST_CALL_ET, SQL_ID, PREV_SQL_ID.
translate sql_id (prev_sql_id) into sql_text via v$sql.
if your application has a connection leak, sql_id and prev_sql_id might point you to the place in your source code where a connection was last used (i.e. where it was leaked).

Why is Parse.Cloud.httpRequest failing non-deterministically on a cloud method?

I am writing a cloud method that makes 2 Parse.Cloud.httpRequest calls, one nested inside the other. However, this method seems to fail with alarming frequency, about 1 in 5 tries. Each time the error is:
Request failed with response code 500
{"uuid":"bc75e304-8964-30f9-c9d5-92fabf02f624","status":500,"error":{"code":-1,"error":"Request timed out"},"headers":{},"text":"{\"code\":124,\"error\":\"Request timed out\"}","cookies":{}}
I looked up code 124, and it corresponds to
Timeout 124 Error code indicating that the request timed out on the server. Typically this indicates that the request is too expensive to run.
I am only running a couple of REST requests per minute, and a run of the method does not exceed 3 seconds. I checked the same calls via REST and there are never any problems.
What's the cause for this problem and can I fix it by upgrading my parse account?

Timeout error trying to lock table in h2

I get the following error in a specific scenario: a different thread is populating a lot of users via the bulk upload operation while I try to view the list of all users on a different web page. The list query throws the following timeout error. Is there a way to set this timeout so that I can avoid the error?
Env: h2 (latest), Hibernate 3.3.x
Caused by: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "USER"; SQL statement:
[50200-144]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:327)
at org.h2.message.DbException.get(DbException.java:167)
at org.h2.message.DbException.get(DbException.java:144)
at org.h2.table.RegularTable.doLock(RegularTable.java:482)
at org.h2.table.RegularTable.lock(RegularTable.java:416)
at org.h2.table.TableFilter.lock(TableFilter.java:139)
at org.h2.command.dml.Select.queryWithoutCache(Select.java:571)
at org.h2.command.dml.Query.query(Query.java:257)
at org.h2.command.dml.Query.query(Query.java:227)
at org.h2.command.CommandContainer.query(CommandContainer.java:78)
at org.h2.command.Command.executeQuery(Command.java:132)
at org.h2.server.TcpServerThread.process(TcpServerThread.java:278)
at org.h2.server.TcpServerThread.run(TcpServerThread.java:137)
at java.lang.Thread.run(Thread.java:619)
at org.h2.engine.SessionRemote.done(SessionRemote.java:543)
at org.h2.command.CommandRemote.executeQuery(CommandRemote.java:152)
at org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:96)
at org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:342)
at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)
at org.hibernate.loader.Loader.getResultSet(Loader.java:1808)
at org.hibernate.loader.Loader.doQuery(Loader.java:697)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:259)
at org.hibernate.loader.Loader.doList(Loader.java:2228)
... 125 more
Yes, you can change the lock timeout. The default is relatively low: 1 second (1000 ms).
In many cases the problem is that another connection has locked the table, and using multi-version concurrency also solves the problem (append ;MVCC=true to the database URL).
EDIT: MVCC=true param is no longer supported, because since h2 1.4.200 it's always true for a MVStore engine, which is a default engine.
I faced much the same problem, and using the parameter "MVCC=true" solved it. You can find more explanations about this parameter in the H2 documentation here: http://www.h2database.com/html/advanced.html#mvcc
I'd like to suggest that if you are getting this error, then perhaps you should not be using a transaction on your bulk database operation. Consider instead doing a transaction on each individual update: does it make sense to think of an entire bulk import as a transaction? Probably not. If it does, then yes, MVCC=true or a bigger lock timeout is a reasonable solution.
However, I think for most cases, you are seeing this error because you are trying to perform a very long transaction - in other words you are not aware that you are performing a really long transaction. This was certainly the case for myself and I simply took more care on how I was writing records (either using no transactions or using smaller transactions) and the lock timeout issue was resolved.
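As a rough illustration of the "smaller transactions" idea, the sketch below splits a bulk import into fixed-size batches so that each batch can be committed on its own and table locks are held only briefly. The class name, the batch size and the printed "commit" step are placeholders. The actual begin/insert/commit calls depend on your persistence layer and are left as a comment:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchedImport {
    /** Splits `rows` into consecutive batches of at most `batchSize` elements. */
    static <T> List<List<T>> chunks(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("a", "b", "c", "d", "e");
        for (List<String> batch : chunks(users, 2)) {
            // Placeholder: begin a transaction, insert every row in `batch`, then commit.
            System.out.println("commit batch " + batch);
        }
    }
}
```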
For those having this issue with integration tests (i.e. server is accessing the h2 db and an integration test is accessing the db before calling the server, to prepare the test), adding a 'commit' to the script executed before the test makes sure that the data are in the database before calling the server (without MVCC=true - which I find is a bit 'weird' if it is not enabled by default).
I had MVCC=true in my connection string but was still getting the error above. Adding ;DEFAULT_LOCK_TIMEOUT=10000;LOCK_MODE=0 solved the problem.
I got this issue with the PlayFramework
JPAQueryException occured : Error while executing query from
models.Page where name = ?: Timeout trying to lock table "PAGE"
It ended up being an infinite loop of sorts, because I had a
@Before
without an unless, which caused the function to repeatedly call itself. The fix was:
@Before(unless="getUser")
Working with DBUnit, H2 and Hibernate - same error, MVCC=true helped, but I would still get the error for any tests following deletion of data. What fixed these cases was wrapping the actual deletion code inside a transaction:
Transaction tx = session.beginTransaction();
...delete stuff
tx.commit();
From a 2020 user, see reference
Basically, the reference says:
Sets the lock timeout (in milliseconds) for the current session. The default value for this setting is 1000 (one second).
This command does not commit a transaction, and rollback does not affect it. This setting can be appended to the database URL: jdbc:h2:./test;LOCK_TIMEOUT=10000
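A small sketch of both ways to apply the setting from Java: appending it to the JDBC URL, as in the quoted reference, or issuing H2's SET LOCK_TIMEOUT statement on an open connection. The class and method names are made up for the example, and setLockTimeout assumes you already have an H2 connection from your own data source:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class H2LockTimeout {
    /** Appends the LOCK_TIMEOUT setting (in milliseconds) to an H2 JDBC URL. */
    static String withLockTimeout(String url, int millis) {
        return url + ";LOCK_TIMEOUT=" + millis;
    }

    /** Sets the lock timeout for the current session on an already-open H2 connection. */
    static void setLockTimeout(Connection connection, int millis) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute("SET LOCK_TIMEOUT " + millis);
        }
    }

    public static void main(String[] args) {
        System.out.println(withLockTimeout("jdbc:h2:./test", 10000));
        // prints jdbc:h2:./test;LOCK_TIMEOUT=10000
    }
}
```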

Resources