I am using Spring Batch to execute a batch that creates some objects in the database, creates a file from these objects and then sends the file to an FTP server.
Thus, I have 2 steps: one that reads the configuration from the DB, inserts into the DB and creates the file; the second sends the file to the FTP server.
The problem is when there is a problem with the FTP server, I can't rollback the transaction (to cancel the new inserts into the DB).
How can I configure my Job to use just one transaction over the different steps?
This is a bad idea due to the transactional nature of Spring Batch.
IMHO a simple solution would be to mark the data saved in step 1 with a token generated when the job starts and, if your FTP upload fails, move to a cleanup step that deletes all data with that token.
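As a rough sketch of that idea (the step and bean names are made up, not from the original post), the job could route to a cleanup step only when the upload step fails:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ExportJobConfig {

    @Bean
    public Job exportJob(JobBuilderFactory jobs, Step createFileStep, Step ftpUploadStep, Step cleanupStep) {
        // Step 1 tags every insert with a token (e.g. taken from a job parameter);
        // the cleanup step deletes the rows carrying that token if the upload fails.
        return jobs.get("exportJob")
                .start(createFileStep)
                .next(ftpUploadStep).on("FAILED").to(cleanupStep)
                .from(ftpUploadStep).on("*").end()
                .end()
                .build();
    }
}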
I agree with bellabax: this is a bad idea.
But I wouldn't add a 3rd cleanup step, because this step may also fail, leaving the data not rolled back.
You could mark the inserted entries with a flag that indicates the entries have not yet been sent to the FTP server.
The 3rd step would switch the flag to indicate that these entries have been sent to the FTP server.
Then you just need a cron/batch/4th cleaning step/whatever that would remove all entries that haven't been sent to the FTP server.
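A minimal sketch of that flag-flipping 3rd step, assuming a JDBC-backed table; the table and column names (exported_item, sent_to_ftp, batch_token) are invented for the example:

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.jdbc.core.JdbcTemplate;

public class MarkSentTasklet implements Tasklet {

    private final JdbcTemplate jdbcTemplate;
    private final String batchToken;

    public MarkSentTasklet(JdbcTemplate jdbcTemplate, String batchToken) {
        this.jdbcTemplate = jdbcTemplate;
        this.batchToken = batchToken;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // Flip the flag only after the FTP upload step has succeeded
        jdbcTemplate.update(
                "UPDATE exported_item SET sent_to_ftp = 1 WHERE batch_token = ?", batchToken);
        return RepeatStatus.FINISHED;
    }
}

The cron/cleanup job then simply deletes everything still flagged sent_to_ftp = 0 after a grace period.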
I am currently working on a batch processor, and the issue we are facing is that while the batch processor is reading files, it could restart unexpectedly in the middle of reading. This breaks the full flow, because when the BP resumes reading it may read a file that is already saved in the database, causing a duplicate key exception.
So I have been told to implement a solution where, when the BP runs into a duplicate key exception, it should read the file from bottom to top, and when it runs into a duplicate key exception again, it should move to the next file.
I am looking for advice/guidance on how to implement/code this solution.
A correctly configured Spring Batch job (persistent job repository + chunk-oriented step) would allow you to restart that kind of failed job without any issue.
In fact, the read count will be saved in the database and used in a restart scenario. No data would be written to the database in case of a chunk failure (the transaction will be rolled back). So upon restart, the job would resume reading from the last save point and save new data.
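For illustration only, here is a minimal chunk-oriented step in Java config; the item type, reader and writer are placeholders (a state-aware reader such as FlatFileItemReader stores its read position in the execution context, which is what makes a restart resume from the last committed chunk):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ImportStepConfig {

    // Placeholder domain type, purely for the sketch
    public static class FileRecord { }

    @Bean
    public Step importStep(StepBuilderFactory steps,
                           ItemReader<FileRecord> reader,   // e.g. a FlatFileItemReader (saves its state)
                           ItemWriter<FileRecord> writer) { // e.g. a JdbcBatchItemWriter
        return steps.get("importStep")
                // each chunk is one transaction; on failure it is rolled back,
                // and the committed read count stays in the job repository
                .<FileRecord, FileRecord>chunk(100)
                .reader(reader)
                .writer(writer)
                .build();
    }
}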
One of my NiFi nodes/instances is refusing to reconnect to the cluster
Proposed flow is not inheritable by the flow controller and cannot completely replace the current flow due to: Proposed Flow does not contain a Connection with ID 4d2c4e9d-0176-1000-0000-0000310c611f but this instance has data queued in that connection, updateId=307]
Without going into why this happened, how can I recover from this error? Even if I overwrite the flow.xml.gz file, it will refuse to accept it because it knows that there is data queued for that connection.
Can I flush / delete that data somehow?
I have tried to delete/move:
flow.xml.gz
flowfile_repository
content_repository
database_repository
But I get the same error on startup. Where does NiFi track that connection 4d2c4e9d-0176-1000-0000-0000310c611f had data on this NiFi node?
Deleting (back it up first) the flow.xml.gz file should fix it.
Make sure that you are actually moving/deleting the right flow.xml.gz file, since it may not be in the default location.
So check the actual location of the flow file in $NIFI_HOME/conf/nifi.properties; look for nifi.flow.configuration.file (by default it points to ./conf/flow.xml.gz). Then delete that one (back it up first) and the node should be able to reconnect.
I am working on a design where I need to move files from one storage account to another storage account, and after, let's say, a week, delete those files.
Once a file is successfully moved, I can either send a message to Event Hub or write a record into a SQL DB.
For deletion of the files I have two approaches:
Polling
Poll the SQL DB entries daily, check the last modified timestamp and delete the file.
Update the SQL DB entry for the file to reflect that the file is deleted.
Event Based
Send a message to Event Grid as soon as the file is deleted.
However, I am not able to figure out how to wait for 1 week before I delete a file. If I had to delete the file immediately, I could do it upon receiving the message.
Have you considered using Service Bus queues with the scheduled messages feature? Service Bus queues/topics may be a better fit for a delayed processing requirement.
https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-sequencing#scheduled-messages
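For example, a minimal sketch with the Azure Service Bus Java SDK: schedule a "delete this file" message one week out and let a consumer perform the deletion when the message arrives (connection string, queue name and message body are placeholders):

import java.time.OffsetDateTime;
import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusMessage;
import com.azure.messaging.servicebus.ServiceBusSenderClient;

public class ScheduleFileDeletion {
    public static void main(String[] args) {
        ServiceBusSenderClient sender = new ServiceBusClientBuilder()
                .connectionString(System.getenv("SERVICEBUS_CONNECTION")) // placeholder
                .sender()
                .queueName("file-deletions")                              // placeholder
                .buildClient();

        // The body carries enough information for the consumer to locate and delete the file
        ServiceBusMessage message = new ServiceBusMessage("container/path/to/file.csv");

        // Service Bus keeps the message invisible until the scheduled time, one week from now
        sender.scheduleMessage(message, OffsetDateTime.now().plusDays(7));
        sender.close();
    }
}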
I have a spring-batch job scanning the SFTP server at a given interval. When it finds a new file, it starts the processing.
It works fine for most cases, but there is one case when it doesn't work:
User starts uploading a new file to the SFTP server
Batch job checks the server and finds a new file
It starts processing it
But since the file is still being uploaded, during processing it encounters an unexpected end of input block, and an error occurs.
How can I check that the file was fully uploaded to the SFTP server before the batch job starts processing it?
Locking files while uploading / Upload to temporary file name
You may have an automated system monitoring a remote folder and you want to prevent it from accidentally picking up a file that has not finished uploading yet. As the majority of SFTP and FTP servers (WebDAV being an exception) do not support file locking, you need to prevent the automated system from picking up the file by other means.
Common workarounds are:
Upload a “done” file once an upload of data files finishes and have the automated system wait for the “done” file before processing the data files. This is an easy solution, but won’t work in a multi-user environment.
Upload data files to a temporary (“upload”) folder and move them atomically to the target folder once the upload finishes.
Upload data files under a distinct temporary name, e.g. with a .filepart extension, and rename them atomically once the upload finishes. Have the automated system ignore the .filepart files.
Got from here
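Building on the last workaround, here is a tiny sketch of the consuming side using Spring Integration's SFTP filters, assuming the uploader writes files under a .filepart name first (the suffix and bean name are assumptions):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.sftp.filters.SftpRegexPatternFileListFilter;

@Configuration
public class CompletedUploadFilterConfig {

    @Bean
    public SftpRegexPatternFileListFilter completedFilesOnly() {
        // Accept only remote files whose names do NOT end in ".filepart",
        // i.e. ignore uploads that are still in progress
        return new SftpRegexPatternFileListFilter("^(?!.*\\.filepart$).+$");
    }
}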
We had a similar problem. Our solution was to configure the spring-batch cron trigger to run the job every 10 minutes (though we could have configured it for 5 minutes, as the file transfer was taking less than 3 minutes), and then read/process all the files created more than 10 minutes earlier. We assume the FTP operation completes within 3 minutes. This gave us some additional flexibility, such as when the spring-batch app was down, etc.
For example, if the batch job triggered at 10:20 AM, we read all the files that were created before 10:10 AM; likewise, the job that runs at 10:30 reads all the files created before 10:20.
Note: once read, you need to either delete the files or move them to a history folder to avoid duplicate reads.
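A rough sketch of that age check, assuming the job can list the landed files through java.nio (the 10-minute cut-off mirrors the description above):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StableFileSelector {

    // Return only files last modified more than 10 minutes ago,
    // i.e. files whose transfer is assumed to be complete.
    public static List<Path> selectStableFiles(Path inboundDir) throws IOException {
        Instant cutoff = Instant.now().minus(Duration.ofMinutes(10));
        try (Stream<Path> files = Files.list(inboundDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        try {
                            return Files.getLastModifiedTime(p).toInstant().isBefore(cutoff);
                        } catch (IOException e) {
                            return false; // skip files we cannot stat
                        }
                    })
                    .collect(Collectors.toList());
        }
    }
}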
I have a Spring Batch integration where multiple servers are polling a single file directory. This causes a problem where a file can be picked up and processed by more than one server. I have attempted to add a nio-lock onto the file once a server has got it, but this locks the file for processing, so it can't read the contents of the file.
Is there a spring batch/integration solution to this problem or is there a way to rename the file as soon as it is picked up by a node?
Consider using FileSystemPersistentAcceptOnceFileListFilter with a shared MetadataStore: http://docs.spring.io/spring-integration/reference/html/system-management-chapter.html#metadata-store
So, only one instance of your application will be able to pick up a file.
Even if we find a solution for the nio-lock, you should understand that a lock means "do not touch until freed". Therefore, when one instance has done its work, another one is ready to pick up the file. I guess that isn't your goal.
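A minimal sketch of that filter wired to a shared metadata store; Redis is just one possible backing store here, and the key prefix is a placeholder:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.integration.file.filters.FileSystemPersistentAcceptOnceFileListFilter;
import org.springframework.integration.redis.metadata.RedisMetadataStore;

@Configuration
public class SharedAcceptOnceFilterConfig {

    @Bean
    public RedisMetadataStore metadataStore(RedisConnectionFactory connectionFactory) {
        // Every node points at the same Redis instance, so the "seen files" state is shared
        return new RedisMetadataStore(connectionFactory);
    }

    @Bean
    public FileSystemPersistentAcceptOnceFileListFilter acceptOnceFilter(RedisMetadataStore store) {
        // Only the first node to list a given file will be allowed to process it
        return new FileSystemPersistentAcceptOnceFileListFilter(store, "batch-files:");
    }
}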