I am using the Spring Data JDBC (version 1.1.6) method CrudRepository.findAllById to load entities for a large number of IDs. The underlying connection points to a Postgres database. Invoking the method raises a PSQLException:
2020-05-28 05:58:35,260 WARN com.zaxxer.hikari.pool.ProxyConnection [task-2] HikariPool-1 - Connection org.postgresql.jdbc.PgConnection#1224f39f marked as broken because of SQLSTATE(08006), ErrorCode(0)
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:358)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:159)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:109)
...
Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 137525
at org.postgresql.core.PGStream.sendInteger2(PGStream.java:275)
at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1553)
at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1876)
at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1439)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
The exception seems to be ultimately caused by a 16-bit limit on the number of values in the SELECT ... IN (...) clause that Spring Data JDBC generates when findAllById is used.
Am I supposed to partition the list of IDs myself? Shouldn't the CrudRepository.findAllById handle this correctly in a fashion compatible with the underlying database dialect?
Am I supposed to partition the list of IDs myself?
Yes, assuming this kind of query makes sense in the first place.
Spring Data JDBC currently creates a straightforward select ... where id in (..) query, which in turn is limited by the capabilities of the underlying database/JDBC driver.
With the apparent limit being ~2^16 for Postgres, there doesn't seem to be much need for special handling in Spring Data JDBC, since looking up that many IDs in a single select seems rare enough to justify some manual coding.
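For illustration, a minimal sketch of such manual coding, assuming a CrudRepository<Account, Long> called repository (the entity, repository and chunk size are placeholders); each chunk stays well below the driver's 2-byte parameter-count limit:

import java.util.ArrayList;
import java.util.List;
import org.springframework.data.repository.CrudRepository;

// Hypothetical helper: split the ID list into chunks and issue one
// findAllById per chunk, concatenating the results.
static List<Account> findAllByIdInChunks(CrudRepository<Account, Long> repository,
                                         List<Long> ids, int chunkSize) {
    List<Account> result = new ArrayList<>();
    for (int from = 0; from < ids.size(); from += chunkSize) {
        List<Long> chunk = ids.subList(from, Math.min(from + chunkSize, ids.size()));
        repository.findAllById(chunk).forEach(result::add);
    }
    return result;
}

// Usage: e.g. chunks of 10,000 IDs per generated SELECT ... IN (...)
List<Account> accounts = findAllByIdInChunks(repository, allIds, 10_000);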
Related
The Spring Boot application that I am working on:
polls 1000 messages from table X [this table X is populated by another service s1];
from each message it gets the account number and queries table Y for additional information about the account.
I am using Spring Integration to poll messages from table X, and for reading the additional account information I am planning to use Spring JDBC.
We are expecting about 10k messages every day.
Is the above approach, querying table Y for each message, a good one?
No, it is indeed not. If all of that data is in the same database, consider writing a proper SELECT that joins those tables in a single query performed by that source polling channel adapter.
Another approach is to implement a stored procedure which will do that job for you and return all of the needed data: https://docs.spring.io/spring-integration/reference/html/jdbc.html#stored-procedures.
However, if the memory needed to handle that number of records at once is a limitation in your environment, or you don't care how fast all of them are processed, then an integration flow with parallel processing of the split polling result is indeed OK. For that goal you can use a JdbcOutboundGateway as a service in your flow instead of playing with a plain JdbcTemplate: https://docs.spring.io/spring-integration/reference/html/jdbc.html#jdbc-outbound-gateway
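For illustration only, a rough sketch of the first option using a JdbcPollingChannelAdapter (the table and column names table_x, table_y, account_no, processed are invented, and setMaxRows is assumed to be available in your Spring Integration version); the adapter runs one joined SELECT per poll instead of one extra query per message:

import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.jdbc.JdbcPollingChannelAdapter;

@Bean
public MessageSource<Object> pollingSource(DataSource dataSource) {
    // One joined SELECT per poll; each row already carries the account details from table Y.
    JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(dataSource,
            "SELECT x.id, x.payload, x.account_no, y.account_name, y.account_status "
          + "FROM table_x x JOIN table_y y ON y.account_no = x.account_no "
          + "WHERE x.processed = 0");
    // Mark the polled rows so they are not picked up again on the next poll.
    adapter.setUpdateSql("UPDATE table_x SET processed = 1 WHERE id IN (:id)");
    adapter.setMaxRows(1000); // poll at most 1000 messages at a time
    return adapter;
}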
I have a Spring Boot WebSocket server up and running as a WebRTC signaling server.
This server has to log some data to the database, based on the messages relayed from/to the server via different client sockets.
While trying a conference call from the client side with more than 2 peers, and running the signaling server in the debugger with breakpoints, the scenario completed successfully: the database was updated as expected and the conference call took place.
But if I run the server without debugging and breakpoints, I get an SQL error:
Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1; statement executed: HikariProxyPreparedStatement#1623454983 wrapping com.mysql.cj.jdbc.ClientPreparedStatement:
delete from my_table where my_table_id='601cbe2b-6af2-4e82-a69c-52b92d30686c'; nested exception is org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1; statement executed: HikariProxyPreparedStatement#1623454983 wrapping com.mysql.cj.jdbc.ClientPreparedStatement:
delete from my_table where my_table_id='601cbe2b-6af2-4e82-a69c-52b92d30686c'
I am using Spring Data JPA and calling the save method on every message received by the WebSocket, after populating the entity data and all its nested lists and related entity objects, in order to keep the data of the call flow.
conferenceRepository.save(conference);
I think the error is caused by queries running concurrently against the database, in random order, while the data they act upon is not yet there.
In debug mode I take time to move from one breakpoint to another, which gives the data time to be persisted.
But I am not totally sure of the problem.
Is there an optimal way to handle concurrent database calls and updates and make sure the data is preserved and persisted properly in the database for concurrent WebSocket-related messages?
Enabling bulk insert in Informix involves setting the environment variable IFX_USEPUT to 1 (the default value is 0). When used from a server-side JDBC driver, this has to be set in the JDBC URL. What are the implications of turning it on for all connections (for example, configuring a connection pool where every connection has this property set to 1)?
In other words, why is the property turned off by default?
IFX_USEPUT is off by default because of a few implications in how it speeds up batched inserts. It enables faster insertion by skipping server-side data validation. This in turn means that if you attempt to insert, say, a double into what the database has stored as an integer, your data will most likely end up incorrect in the database.
As long as you match your data types correctly (setInt, setDate, etc.) to the database schema, this is safe. Later versions of the JDBC driver also have better client-side checks to ensure you don't corrupt the data by accident. It's just not at the point where one would enable it by default.
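For illustration, a hedged sketch of a batched insert with IFX_USEPUT enabled in the URL (host, port, server name, credentials, table and columns are all invented); the point is simply that each setXxx call matches the declared column type:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// IFX_USEPUT=1 is appended to the connection URL, so the driver uses bulk inserts
// and skips the server-side validation described above.
String url = "jdbc:informix-sqli://dbhost:9088/stores_demo:"
           + "INFORMIXSERVER=ol_informix;IFX_USEPUT=1";
try (Connection con = DriverManager.getConnection(url, "informix", "secret")) {
    con.setAutoCommit(false); // do the bulk insert inside an explicit transaction
    try (PreparedStatement ps = con.prepareStatement(
            "INSERT INTO orders (order_id, customer_name, order_date) VALUES (?, ?, ?)")) {
        for (OrderRow row : rows) {           // OrderRow/rows are placeholders
            ps.setInt(1, row.id);             // INTEGER column -> setInt
            ps.setString(2, row.customer);    // VARCHAR column -> setString
            ps.setDate(3, row.orderDate);     // DATE column    -> setDate (java.sql.Date)
            ps.addBatch();
        }
        ps.executeBatch();
    }
    con.commit();
}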
I just want to load some files into an Azure DB.
I am using the "Microsoft SQL Server" DB Type for the connection.
The problem is that when I insert more than 10,000 rows, I sometimes (90% of the time) get an error:
Exception in component tMSSqlOutput_5
java.sql.BatchUpdateException: I/O Error: Connection reset
at net.sourceforge.jtds.jdbc.JtdsStatement.executeBatch(JtdsStatement.java:1091)
at dev_storch.extractgc_child2_0_1.extractGC_child2.tFileInputDelimited_5Process(extractGC_child2.java:28852)
at dev_storch.extractgc_child2_0_1.extractGC_child2.tFileList_6Process(extractGC_child2.java:32386)
at dev_storch.extractgc_child2_0_1.extractGC_child2.tFileList_5Process(extractGC_child2.java:31540)
at dev_storch.extractgc_child2_0_1.extractGC_child2.tMSSqlRow_1Process(extractGC_child2.java:30657)
at dev_storch.extractgc_child2_0_1.extractGC_child2.tLoop_2Process(extractGC_child2.java:30440)
at dev_storch.extractgc_child2_0_1.extractGC_child2.tFileList_4Process(extractGC_child2.java:29664)
at dev_storch.extractgc_child2_0_1.extractGC_child2.tJava_3Process(extractGC_child2.java:34020)
at dev_storch.extractgc_child2_0_1.extractGC_child2.tMSSqlInput_1Process(extractGC_child2.java:33593)
at dev_storch.extractgc_child2_0_1.extractGC_child2.tFTPConnection_2Process(extractGC_child2.java:33154)
[FATAL]: dev_storch.extractgc_child2_0_1.extractGC_child2 - tMSSqlOutput_5 I/O Error: Connection reset
[FATAL]: dev_storch.extractgc_child2_0_1.extractGC_child2 - tMSSqlRow_7 Invalid state, the Connection object is closed.
But when the volume of data inserted is lower, I don't receive any error.
My configuration looks like this:
A tMSSQLConnection. Then I have some components that load files from a folder and insert them into a table.
The error comes at the tMSSQLOutput.
The rest of the job just fills in logs.
I tried changing the batch size and not using a shared DB connection, but it doesn't work.
I tried with a generic JDBC component and it seems to work every time. But I don't want to use the generic JDBC components, because on the output components we cannot choose the column DB type (but maybe someone knows how that is possible):
MSSQL:
Generic JDBC:
Thank you in advance...
Here is one solution that may work for you:
Be aware that the batch size must be lower than or equal to the limit of parameter markers authorized by the JDBC driver (generally 2000) divided by the number of columns.
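For example (the column count is just an illustration): if the target table has 40 columns and the driver accepts roughly 2000 parameter markers per statement, the batch size should not exceed about 2000 / 40 = 50 rows.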
I am trying to connect to Vertica through JDBI (JDBC) to get a huge result set.
I followed the JDBI documentation and added this to the DAO:
@SqlQuery("<query>")
@Mapper(ResultRow.StreamMapper.class)
@FetchSize(chunkSizeInRows)
public Iterable<List<Object>> getStreamingResultSet(@Define("query") String query);
But it seems like it's loading the entire data set into memory instead of streaming it.
I've been looking at streaming result sets from JDBI, and came across this question. The answer is on the SQL Object Queries documentation page:
because the method returns a java.util.Iterator it loads results lazily
So in this case, the Iterable<List<Object>> should be an Iterator<List<Object>> (I assume JDBI can convert a database row to a List<Object>).
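A sketch of the corrected DAO method (the interface name is made up; ResultRow.StreamMapper, chunkSizeInRows and the annotations are the same JDBI SQL Object pieces used in the question):

// Returning Iterator instead of Iterable lets JDBI hand rows back lazily
// as the underlying cursor is advanced.
public interface StreamingDao {

    @SqlQuery("<query>")
    @Mapper(ResultRow.StreamMapper.class)
    @FetchSize(chunkSizeInRows)
    Iterator<List<Object>> getStreamingResultSet(@Define("query") String query);
}

Iterate the result while the handle is still open, since rows are fetched from the cursor as you iterate rather than loaded up front.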