Flink Streaming JdbcSink Exception Handling

I am using JdbcSink to insert processed events into a Postgres DB.
Occasionally, I receive bad records from the source stream, and they fail to insert into the database (java.sql.BatchUpdateException) because they violate some table constraints.
I could obviously pass the events through a Flink filter operator first, but the filter would become complex code that has to check every possible failure combination. Instead of a filter, I would like to catch the BatchUpdateException thrown by the JdbcSink, log it, and continue processing other events.
I have had no luck finding a way to catch the BatchUpdateException from JdbcSink.
Has someone tried doing similar with success?

I took a quick look at the code, and didn't see an obvious solution. You could extend the JdbcOutputFormat class and override the attemptFlush method, and then clone the JdbcSink class and modify your version to use your output format class.
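Roughly, the override would look like the snippet below (a sketch only; the exact class and method signatures vary across flink-connector-jdbc versions, and LOG is whatever logger you use):

@Override
protected void attemptFlush() throws SQLException {
    try {
        // normal batched flush performed by the connector
        super.attemptFlush();
    } catch (java.sql.BatchUpdateException e) {
        // log and drop the failing batch instead of failing the job;
        // note this discards every record in the batch, not just the bad one
        LOG.warn("Skipping batch that violated a table constraint", e);
    }
}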


Hibernate Listener: difference between EventType.POST_INSERT and EventType.POST_COMMIT_INSERT

Request
I need to execute a piece of code after every insert into some tables.
I don't want to edit every part of the application looking for all the places where the insertion is executed; I would like to use a Hibernate listener instead.
Question
I don't understand the difference between the event types POST_INSERT and POST_COMMIT_INSERT. Both refer to the listener PostInsertEventListener, but I would like to know which of them I should use and what the differences are. Both seem to be executed after the record is written to the database; maybe one of them is only executed in some special case?
Other details
I can't (and I don't want to) use a SQL trigger on the tables.
You would be better off using an org.hibernate.Interceptor or a JPA @PostPersist listener. The event listener infrastructure is designed for much more sophisticated use cases than yours.
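A minimal @PostPersist sketch (the entity and listener names here are made up) could look like this:

import javax.persistence.Entity;
import javax.persistence.EntityListeners;
import javax.persistence.PostPersist;

public class InsertAuditListener {

    @PostPersist
    public void afterInsert(Object entity) {
        // called once the entity has been inserted
        System.out.println("Inserted: " + entity);
    }
}

// Attach the listener to the entities you care about:
@Entity
@EntityListeners(InsertAuditListener.class)
class AuditedEntity {
    // @Id, fields, getters/setters ...
}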

Can't serialize due to concurrent operations: memgraph

I am performing a mix of queries (reads/writes/updates/deletes) against a single Memgraph instance.
To do this I am using the Java client by Neo4j; all the APIs I am currently using are the sync APIs from the driver.
The nature of the queries in my case is such that I can execute them concurrently with no side effects. For better performance I am firing the queries in parallel. The error I am getting is for a CREATE operation where I am creating an edge between two nodes. This is consistent: I have run the same setup multiple times, and every time all queries go through except that it crashes when it reaches this create-edge stage.
Query for reference:
OPTIONAL MATCH (node1) WHERE id(node1) = $nodeId1
OPTIONAL MATCH (node2) WHERE id(node2) = $nodeId2
CREATE (node1)-[:KNOWS]->(node2)
I am not able to find any documentation about this error. Please point me to a relevant document, or to a workaround with which I can ask Memgraph to put a query on hold when the same objects are being operated on by another query.
One approach I am considering is simply to retry any such failed queries, but I am looking for a cleaner approach.
P.S. I was running the same setup on Neo4j earlier and did not encounter any problems with it.
Yep, in the case of this error, the code should retry the query. An equivalent issue can happen in Neo4j, but since Memgraph is more optimistic about locking, the error may occur more often. In general, the correct approach is to implement error handling for this case.
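As a sketch, a manual retry with the Neo4j Java driver (4.x) could look like the following. Whether the conflict surfaces as a TransientException may depend on your driver/server versions, so adjust the caught type to what you actually observe:

import org.neo4j.driver.Session;
import org.neo4j.driver.Values;
import org.neo4j.driver.exceptions.TransientException;

// session obtained from your Driver as usual
String query = "OPTIONAL MATCH (node1) WHERE id(node1) = $nodeId1 "
             + "OPTIONAL MATCH (node2) WHERE id(node2) = $nodeId2 "
             + "CREATE (node1)-[:KNOWS]->(node2)";

int attempts = 0;
while (true) {
    try {
        session.run(query, Values.parameters("nodeId1", 1L, "nodeId2", 2L)).consume();
        break; // success
    } catch (TransientException e) {
        if (++attempts >= 5) {
            throw e; // give up after a few retries
        }
        // optionally sleep/back off before the next attempt
    }
}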

Batching stores transparently

We are using the following frameworks and versions:
jOOQ 3.11.1
Spring Boot 2.3.1.RELEASE
Spring 5.2.7.RELEASE
I have an issue where some of our business logic is divided into logical units that look as follows:
Request containing a user transaction is received
This request contains various information, such as the type of transaction, which products are part of this transaction, what kind of payments were done, etc.
These attributes are then stored individually in the database.
In code, this looks approximately as follows:
TransactionRecord transaction = transactionRepository.create();
transaction.create(creationCommand);
In Transaction#create (which runs transactionally), something like the following occurs:
storeTransaction();
storePayments();
storeProducts();
// ... other relevant information
A given transaction can have many different types of products and attributes, all of which are stored. Many of these attributes result in UPDATE statements, while some may result in INSERT statements - it is difficult to fully know in advance.
For example, the storeProducts method looks approximately as follows:
products.forEach(product -> {
    ProductRecord record = productRepository.findProductByX(...);
    if (record == null) {
        record = productRepository.create();
        record.setX(...);
        record.store();
    } else {
        // do something else
    }
});
If the products are new, they are INSERTed. Otherwise, other calculations may take place. Depending on the size of the transaction, this single user transaction could obviously result in up to O(n) database calls/roundtrips, and even more depending on what other attributes are present. In transactions where a large number of attributes are present, this may result in upwards of hundreds of database calls for a single request (!). I would like to bring this down as close as possible to O(1) so as to have more predictable load on our database.
Naturally, batch and bulk inserts/updates come to mind here. What I would like to do is to batch all of these statements into a single batch using jOOQ, and execute after successful method invocation prior to commit. I have found several (SO Post, jOOQ API, jOOQ GitHub Feature Request) posts where this topic is implicitly mentioned, and one user groups post that seemed explicitly related to my issue.
Since I am using Spring together with jOOQ, I believe my ideal solution (preferably declarative) would look something like the following:
@Batched(100) // batch size as parameter, potentially
@Transactional
public void createTransaction(CreationCommand creationCommand) {
    // all inserts/updates above are added to a batch and executed on successful invocation
}
For this to work, I imagine I'd need to manage a scoped (ThreadLocal/Transactional/Session scope) resource which can keep track of the current batch such that:
1. Prior to entering the method, an empty batch is created if the method is @Batched,
2. A custom DSLContext (perhaps extending DefaultDSLContext) that is made available via DI has a ThreadLocal flag which keeps track of whether any current statements should be batched or not, and if so,
3. Intercept the calls and add them to the current batch instead of executing them immediately.
However, step 3 would necessitate rewriting a large portion of our code from the (IMO) relatively readable:
records.forEach(record -> {
    record.setX(...);
    // ...
    record.store();
});
to:
userObjects.forEach(userObject -> {
    dslContext.insertInto(...).values(userObject.getX(), ...).execute();
});
which would defeat the purpose of having this abstraction in the first place, since the second form can also be rewritten using DSLContext#batchStore or DSLContext#batchInsert. IMO however, batching and bulk insertion should not be up to the individual developer and should be able to be handled transparently at a higher level (e.g. by the framework).
I find the readability of the jOOQ API to be an amazing benefit of using it; however, it does not seem (as far as I can tell) to lend itself very well to interception/extension for cases such as these. Is it possible, with the jOOQ 3.11.1 (or even the current) API, to get behaviour similar to the former, with transparent batch/bulk handling? What would this entail?
EDIT:
One possible but extremely hacky solution that comes to mind for enabling transparent batching of stores would be something like the following:
1. Create a RecordListener and add it as a default to the Configuration whenever batching is enabled.
2. In RecordListener#storeStart, add the query to the current transaction's batch (e.g. in a ThreadLocal<List>).
3. The AbstractRecord has a changed flag which is checked (org.jooq.impl.UpdatableRecordImpl#store0, org.jooq.impl.TableRecordImpl#addChangedValues) prior to storing. Resetting this (and saving it for later use) makes the store operation a no-op.
4. Lastly, upon successful method invocation but prior to commit:
- Reset the changed flags of the respective records to the correct values
- Invoke org.jooq.UpdatableRecord#store, this time without the RecordListener or while skipping the storeStart method (perhaps using another ThreadLocal flag to check whether batching has already been performed).
As far as I can tell, this approach should work in theory. Obviously, it's extremely hacky and prone to breaking, since it depends on library internals (and possibly reflection) that may change at any time.
Does anyone know of a better way, using only the public jOOQ API?
jOOQ 3.14 solution
You've already discovered the relevant feature request #3419, which will solve this on the JDBC level starting from jOOQ 3.14. You can either use the BatchedConnection directly, wrapping your own connection to implement the below, or use this API:
ctx.batched(c -> {
    // Make sure all records are attached to c, not ctx, e.g. by fetching from c.dsl()
    records.forEach(record -> {
        record.setX(...);
        // ...
        record.store();
    });
});
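If you'd rather wrap the connection yourself, a minimal sketch (assuming jOOQ 3.14+, where BatchedConnection lives in org.jooq.tools.jdbc; check the constructor overloads and dialect for your setup) would be:

import java.sql.Connection;
import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;
import org.jooq.tools.jdbc.BatchedConnection;

// connection is your ordinary JDBC connection
try (Connection batched = new BatchedConnection(connection, 100)) {
    DSLContext batchedCtx = DSL.using(batched, SQLDialect.POSTGRES);
    // Records fetched/attached through batchedCtx will have consecutive, identical
    // store() statements buffered and executed as JDBC batches on flush/close.
}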
jOOQ 3.13 and before solution
For the time being, until #3419 is implemented (it will be, in jOOQ 3.14), you can implement this yourself as a workaround. You'd have to proxy a JDBC Connection and PreparedStatement and ...
... intercept all:
Calls to Connection.prepareStatement(String), returning a cached proxy statement if the SQL string is the same as for the last prepared statement, or batch execute the last prepared statement and create a new one.
Calls to PreparedStatement.executeUpdate() and execute(), and replace those by calls to PreparedStatement.addBatch()
... delegate all:
Calls to other API, such as e.g. Connection.createStatement(), which should flush the above buffered batches, and then call the delegate API instead.
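As a very rough sketch of that proxying idea (using java.lang.reflect.Proxy to avoid implementing the full JDBC interfaces; it only merges consecutive statements with identical SQL and ignores generated keys and real update counts):

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public final class BatchingConnection implements InvocationHandler {

    private final Connection delegate;
    private String lastSql;
    private PreparedStatement lastStatement;

    private BatchingConnection(Connection delegate) {
        this.delegate = delegate;
    }

    public static Connection wrap(Connection delegate) {
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                new BatchingConnection(delegate));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Reuse (and batch into) the previous statement if the SQL is identical
        if ("prepareStatement".equals(method.getName())
                && args != null && args.length == 1 && args[0] instanceof String) {
            String sql = (String) args[0];
            if (!sql.equals(lastSql)) {
                flush();
                lastSql = sql;
                lastStatement = delegate.prepareStatement(sql);
            }
            return wrapStatement(lastStatement);
        }
        // Any other connection API (createStatement(), commit(), close(), ...)
        // first flushes the pending batch, then hits the real connection
        flush();
        return method.invoke(delegate, args);
    }

    private Object wrapStatement(PreparedStatement ps) {
        return Proxy.newProxyInstance(
                PreparedStatement.class.getClassLoader(),
                new Class<?>[] { PreparedStatement.class },
                (p, m, a) -> {
                    String name = m.getName();
                    if ("executeUpdate".equals(name) || "execute".equals(name)) {
                        ps.addBatch();
                        // real update counts are only known once the batch runs
                        return "execute".equals(name) ? Boolean.FALSE : Integer.valueOf(0);
                    }
                    if ("close".equals(name)) {
                        return null; // keep the statement open for the next identical SQL
                    }
                    return m.invoke(ps, a);
                });
    }

    private void flush() throws SQLException {
        if (lastStatement != null) {
            lastStatement.executeBatch();
            lastStatement.close();
            lastStatement = null;
            lastSql = null;
        }
    }
}

Something along these lines is essentially what the BatchedConnection in jOOQ 3.14 does for you, so upgrading is the cleaner route once you can.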
I wouldn't recommend hacking your way around jOOQ's RecordListener and other SPIs, I think that's the wrong abstraction level to buffer database interactions. Also, you will want to batch other statement types as well.
Do note that by default, jOOQ's UpdatableRecord tries to fetch generated identity values (see Settings.returnIdentityOnUpdatableRecord), which is something that prevents batching. Such store() calls must be executed immediately, because you might expect the identity value to be available.
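If identity fetching is what blocks batching in your case, it can be turned off via Settings; a small sketch, assuming you construct the DSLContext yourself (with Spring Boot you would customise the auto-configured Configuration instead):

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.conf.Settings;
import org.jooq.impl.DSL;

Settings settings = new Settings().withReturnIdentityOnUpdatableRecord(false);
DSLContext ctx = DSL.using(connection, SQLDialect.POSTGRES, settings);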

Spring-integration: keep a context for a Message throught a chain

I am using Spring Integration, and I have messages that go through an int:chain with multiple elements: int:service-activator, int:transformer, etc. In the end, a message is sent to another app's REST endpoint. There is also an errorHandler that saves any Exception to a text file.
For administration purposes, I would like to keep some information about what happened in the chain (e.g. "this DB call returned this", "during this transformation, this rule was applied", etc.). This would be equivalent to a log file, but bound to a Message. Of course there is already a logger, but in the end I need to create (either after the REST call is made, or when an error occurs) a file for this specific Message with that data.
I was wondering if there was some kind of "context" for the Message that I could access from any part of the chain, and where I could store things. I didn't find anything in the official documentation, but I'm not really sure what to look for.
I've been thinking about putting it all in the Message itself, but:
It's an immutable object, so I would need to rebuild it each time I want to add something to its headers (or the payload).
I wouldn't be able to retrieve any new data from the error handler in case of an Exception, because it receives the original message.
I can't really add it to the payload object because some native transformers/service-activators use it directly (and that would also mean rewriting a lot of code ...).
I've also been thinking about some kind of "thread-bound" bean that would act as a context for each Message, but I see too many problems arising from this.
Maybe I'm wrong about some of these ideas. Anyway, I just need a way to keep data through multiple elements of a Spring Integration chain and also be able to access it in the error handler.
Add a header, e.g. a map or list, and add to it in each stage.
The framework does something similar when message history is enabled.
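A minimal sketch of that idea in a POJO service activator (the "auditTrail" header name and the class are made up):

import java.util.ArrayList;
import java.util.List;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.messaging.Message;

public class AuditingService {

    @SuppressWarnings("unchecked")
    public Message<?> handle(Message<?> message) {
        // headers are immutable, so copy any trail built by earlier stages into a new list
        List<String> trail = new ArrayList<>();
        List<String> existing = message.getHeaders().get("auditTrail", List.class);
        if (existing != null) {
            trail.addAll(existing);
        }
        trail.add("DB call returned X"); // whatever this stage wants to record

        // MessageBuilder copies the payload and headers, then overrides the header
        return MessageBuilder.fromMessage(message)
                .setHeader("auditTrail", trail)
                .build();
    }
}

In the error flow, the failed message carries whatever the header contained as of the failing stage, so the error handler can read the same header when writing its file.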

Spring Batch exception Handling

I am currently working with Spring Batch for the first time. In Spring Batch I've set the commit interval to 1000, which gave me better performance, but now I have trouble identifying the corrupt or failing item. We need to send a mail update with the record line or item number together with the exception data.
I tried an item listener, chunk listener, step listener and job listener, but I am not able to figure out how to get this information from the execution/listener context while generating the mail in the job listener. I am able to get information about the exception, but not able to track which record caused the issue or its position within the chunk.
For example, if I have 1000 lines in a file or DB and a commit interval of 100, and item 165 has an issue, I need to get the line number 165 in some listener so I can attach it to the context and populate the logging info, giving a quick turnaround time to fix the issue before reprocessing.
I searched but couldn't find a suggestion or idea. I believe this is a common problem whenever the commit interval is greater than 1. Please suggest a better way to handle it.
Thanks in advance
You'll want to perform the checks that can cause an issue in the processor, and create an error item out of them which will get persisted to its own table/file. Some errors are unavoidable, and unfortunately you'll need to do manual debugging within that chunk.
Edit:
To find the commit range, you would need to preserve order. If you are using a FlatFileItemReader, it will store the line number for you if your POJO implements ItemCountAware. If running against a DB, you'll want to make sure the query preserves order, with an ORDER BY on the unique index. Then you'll be able to track the chunk down by checking the READ_COUNT in the BATCH_STEP_EXECUTION table.
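A small sketch of the ItemCountAware part (the item class name is a placeholder):

import org.springframework.batch.item.ItemCountAware;

public class InputLine implements ItemCountAware {

    private int itemNumber;

    @Override
    public void setItemCount(int count) {
        // called by the reader with this item's (1-based) position in the input
        this.itemNumber = count;
    }

    public int getItemNumber() {
        return itemNumber;
    }

    // other fields, getters/setters ...
}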
You can enable skipping. After a chunk fails due to a skippable exception, Spring Batch processes each item of the chunk again in its own transaction; this is how it detects the item that caused the exception.
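A sketch of that configuration in Java config (Spring Batch 4.x style; the step name, item types, and skipped exception type are placeholders you would adapt):

import org.springframework.batch.core.SkipListener;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ImportStepConfig {

    @Bean
    public Step importStep(StepBuilderFactory steps,
                           ItemReader<InputLine> reader,
                           ItemProcessor<InputLine, OutputLine> processor,
                           ItemWriter<OutputLine> writer) {
        return steps.get("importStep")
                .<InputLine, OutputLine>chunk(100)        // commit interval
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .faultTolerant()
                .skip(Exception.class)                    // narrow to your real exception types
                .skipLimit(10)
                .listener(new SkipListener<InputLine, OutputLine>() {
                    @Override
                    public void onSkipInRead(Throwable t) {
                        // reading failed; log t
                    }
                    @Override
                    public void onSkipInProcess(InputLine item, Throwable t) {
                        // this exact item failed in the processor; stash it for the mail report
                    }
                    @Override
                    public void onSkipInWrite(OutputLine item, Throwable t) {
                        // this exact item failed in the writer
                    }
                })
                .build();
    }
}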
