Some questions about the purpose of ItemStreamReader and ItemStreamWriter in Spring Batch

I'm studying Spring Batch.
I'm using ItemReader and ItemWriter in my Spring Batch project.
However, the biggest problem in my project is that all of the data-reading logic sits in the constructor, without paging.
That seemed like unusual and improper usage to me.
So I read the Spring Batch documentation and found ItemStreamReader and ItemStreamWriter.
I thought it might improve my project to move the data-reading logic into the open and update methods in order to add paging.
However, the documentation only talks about the ExecutionContext, so I'm not sure whether putting paged data-reading logic in open or update is appropriate.
Is it okay to use the open and update methods to read paged data?

The open / close methods of ItemStream are only called once, to initialize and dispose of any resources used by the reader, so they are not suitable for reading pages.
I would recommend using one of the provided paging item readers, or extending them if necessary (see AbstractPagingItemReader and AbstractPaginatedDataItemReader), rather than creating a paging item reader from scratch.
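For illustration, here is a minimal sketch of extending AbstractPaginatedDataItemReader. The MyRecord type, RecordDao and its findPage method are hypothetical placeholders, and the exact base-class API can differ between Spring Batch versions:

    import java.util.Iterator;
    import java.util.List;

    import org.springframework.batch.item.data.AbstractPaginatedDataItemReader;

    // Sketch of a paging reader built on AbstractPaginatedDataItemReader.
    public class PagedRecordItemReader extends AbstractPaginatedDataItemReader<MyRecord> {

        private final RecordDao recordDao;

        public PagedRecordItemReader(RecordDao recordDao) {
            this.recordDao = recordDao;
            setName("pagedRecordItemReader"); // key prefix used in the ExecutionContext
            setPageSize(100);
        }

        @Override
        protected Iterator<MyRecord> doPageRead() {
            // 'page' and 'pageSize' are maintained by the base class;
            // returning an empty iterator signals the end of input.
            List<MyRecord> records = recordDao.findPage(page, pageSize);
            return records.iterator();
        }
    }

    // Placeholder domain type and DAO used only by this sketch.
    class MyRecord {
    }

    interface RecordDao {
        List<MyRecord> findPage(int page, int pageSize);
    }

The base class keeps track of the read position in the ExecutionContext, so on a restart the reader can resume where it left off without any extra code in the subclass.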

Related

What's the best practice for NSPersistentContainer newBackgroundContext?

I'm familiarizing myself with NSPersistentContainer. I wonder whether it's better to spawn an instance of the private context with newBackgroundContext every time I need to insert/fetch some entities in the background, or to create one private context, keep it, and use it for all background tasks throughout the lifetime of the app.
The documentation also offers the convenience method performBackgroundTask. Just trying to figure out the best practice here.
I generally recommend one of two approaches. (There are other setups that work, but these are two that I have used, and tested and would recommend.)
The Simple Way
You read from the viewContext and you write to the viewContext, using only the main thread. This is the simplest approach and avoids a lot of the multithreading issues that are common with Core Data. The problem is that disk access happens on the main thread, and if you are doing a lot of it, it can slow down your app.
This approach is suitable for small, lightweight applications. Any app that has fewer than a few thousand total entities and no bulk changes at once would be a good candidate. A simple todo list would be a good example.
The Complex Way
The complex way is to only read from the viewContext on the main thread and do all your writing using performBackgroundTask inside a serial queue. Every block inside performBackgroundTask refetches any managed objects that it needs (using objectIDs), and all managed objects that it creates are discarded at the end of the block. Each performBackgroundTask block is transactional, and saveContext is called at the end of the block. A fuller description can be found here: NSPersistentContainer concurrency for saving to core data
This is a robust and functional core-data setup that can manage data at any reasonable scale.
The problem is that you must always make sure that the managed objects are from the context you expect and are accessed on the correct thread. You also need a serial queue to make sure you don't get write conflicts. And you often need to use a fetchedResultsController to make sure entities are not deleted while you are holding pointers to them.

The best way to store restart information in spring-batch readers, processors, writers and tasklets

Currently I'm designing my first batch application with Spring Batch, using several tasklets and my own readers, writers and processors, primarily doing input data checks and TIFF-file handling (split, merge, etc.) depending on the input data, i.e. document metadata with the accompanying image files. I want to store and use restart information persisted in the batch_step_execution_context table of the Spring Batch job repository. Unfortunately I did not find many examples of where and how best to do this. I want to make the application restartable so that it can continue, after error correction, at the point where it left off.
What I have done so far, and verified that in case of an exception the step information has been persisted:
Implemented ItemStream in a CustomItemWriter, using update() and open() to store and regain information to/from the step execution context, e.g. executionContext.putLong("count", count). This works well.
Used StepListeners and found that the context information written in beforeStep() has been persisted. This also works.
I would appreciate help that gives or points to some examples, a "restart tutorial", or sources explaining how to do this in readers, processors, writers and tasklets. Does it make sense in readers and processors? I'm aware that handling restart information may also depend on the commit interval, restartable flags, etc.
Remark: Maybe I need a deeper understanding of Spring Batch concepts beyond what I have read and tried so far. Hints in that regard are also welcome. I consider myself intermediate level, lacking the details to make my application use some of the comforts of Spring Batch.
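To illustrate the first point, here is a minimal sketch of a restartable writer, assuming the Spring Batch 4.x write(List) signature; the "count" key and the Document type are placeholders for this example:

    import java.util.List;

    import org.springframework.batch.item.ExecutionContext;
    import org.springframework.batch.item.ItemStream;
    import org.springframework.batch.item.ItemStreamException;
    import org.springframework.batch.item.ItemWriter;

    // Sketch of a writer that stores restart information via ItemStream callbacks.
    public class CustomItemWriter implements ItemWriter<Document>, ItemStream {

        private static final String COUNT_KEY = "count";
        private long count;

        @Override
        public void open(ExecutionContext executionContext) throws ItemStreamException {
            // On a restart, the previously committed value is restored here.
            count = executionContext.containsKey(COUNT_KEY) ? executionContext.getLong(COUNT_KEY) : 0L;
        }

        @Override
        public void update(ExecutionContext executionContext) throws ItemStreamException {
            // Called before each chunk commit; the value ends up in
            // BATCH_STEP_EXECUTION_CONTEXT together with the chunk's transaction.
            executionContext.putLong(COUNT_KEY, count);
        }

        @Override
        public void close() throws ItemStreamException {
            // Release any resources held by the writer (none in this sketch).
        }

        @Override
        public void write(List<? extends Document> items) throws Exception {
            // Actual metadata/image handling would go here.
            count += items.size();
        }
    }

    // Placeholder domain type for the sketch.
    class Document {
    }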

Clearing and freeing memory

I am developing a Windows application using C# .NET. It is in fact a plug-in that is installed into a DBMS. The purpose of this plug-in is to read all the records (a record is an object) in the DBMS that match the provided criteria and transfer them to my local file system as XML files. My problem is related to memory usage. Everything works fine, but each time I read a record it occupies memory, and after a certain point the plug-in stops working because it runs out of memory.
I am dealing with around 10k-20k records (objects). Are there any memory-related methods in C# to free the memory of each record as soon as it has been written to the XML file? I tried all the basic memory-handling methods like clear(), flush(), gc(), and finalize(), but with no success.
Please consider the following:
A record is an object; I cannot change this and use other, more efficient data structures.
Each time I read a record I write it to XML, and repeat this again and again.
C# is a garbage-collected language. Therefore, to reclaim the memory used by an object, you need to make sure all references to that object are removed so that it becomes eligible for collection. Specifically, this means you should remove the objects from any data structures that hold references to them once you're done with them.
If you get a little more specific about what type of data structures you're using we can probably give a more specific answer.
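As an illustration of the principle (sketched here in Java for consistency with the other examples, but the same idea applies in C#): once nothing references a processed record any more, the garbage collector can reclaim it on its next run. The RecordExporter, Record type and writeToXml method below are placeholders:

    import java.util.List;

    // Illustration: as long as a collection still references processed records,
    // the garbage collector cannot reclaim them. Dropping the reference after
    // writing makes them collectable.
    public class RecordExporter {

        public void export(List<Record> pendingRecords) {
            while (!pendingRecords.isEmpty()) {
                // Remove the record from the list so the only remaining
                // reference is this local variable.
                Record record = pendingRecords.remove(pendingRecords.size() - 1);
                writeToXml(record);
                // 'record' goes out of scope at the end of this iteration;
                // nothing references it any more, so the GC may reclaim it.
            }
        }

        private void writeToXml(Record record) {
            // placeholder for the actual XML serialization
        }

        // Placeholder record type for the sketch.
        static class Record {
        }
    }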

Cache Management with Numerous Similar Database Queries

I'm trying to introduce caching into an existing server application because the database is starting to become overloaded.
Like many server applications we have the concept of a data layer. This data layer has many different methods that return domain model objects. For example, we have an employee data access object with methods like:
findEmployeesForAccount(long accountId)
findEmployeesWorkingInDepartment(long accountId, long departmentId)
findEmployeesBySearch(long accountId, String search)
Each method queries the database and returns a list of Employee domain objects.
Obviously, we want to try and cache as much as possible to limit the number of queries hitting the database, but how would we go about doing that?
I see a couple possible solutions:
1) We create a cache for each method call. E.g. for findEmployeesForAccount we would add an entry with a key account-employees-accountId. For findEmployeesWorkingInDepartment we could add an entry with a key department-employees-accountId-departmentId, and so on. The problem I see with this is that when we add a new employee to the system, we need to ensure that we add it to every cached list where appropriate, which seems hard to maintain and error-prone.
2) We create a more generic query for findEmployeesForAccount (with more joins and/or queries because more information will be required). For other methods, we use findEmployeesForAccount and remove entries from the list that don't fit the specified criteria.
I'm new to caching so I'm wondering what strategies people use to handle situations like this? Any advice and/or resources on this type of stuff would be greatly appreciated.
I've been struggling with the same question myself for a few weeks now... so consider this a half-answer at best. One bit of advice that has been working out well for me is to use the Decorator Pattern to implement the cache layer. For example, here is an article detailing this in C#:
http://stevesmithblog.com/blog/building-a-cachedrepository-via-strategy-pattern/
This allows you to literally "wrap" your existing data access methods without touching them. It also makes it easy to swap out the cached version of your DAL for the direct-access version at runtime (which can be useful for unit testing).
I'm still struggling to manage my cache keys, which seem to spiral out of control when there are numerous parameters involved. Inevitably, something ends up not being properly cleared from the cache and I have to resort to heavy-handed ClearAll() approaches that just wipe out everything. If you find a solution for cache key management, I would be interested, but I hope the decorator pattern layer approach is helpful.
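To make the decorator idea concrete, here is a minimal Java sketch (the linked article uses C#). The EmployeeDao interface, the Employee type, and the key scheme are assumptions based on the methods described in the question, covering two of them for brevity:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical interface matching a subset of the methods in the question.
    interface EmployeeDao {
        List<Employee> findEmployeesForAccount(long accountId);
        List<Employee> findEmployeesWorkingInDepartment(long accountId, long departmentId);
    }

    // Placeholder domain type for the sketch.
    class Employee {
    }

    // Decorator: wraps the real DAO and adds per-method caching without touching it.
    // The key scheme mirrors option 1 from the question (one entry per method + parameters).
    class CachingEmployeeDao implements EmployeeDao {

        private final EmployeeDao delegate;
        private final Map<String, List<Employee>> cache = new ConcurrentHashMap<>();

        CachingEmployeeDao(EmployeeDao delegate) {
            this.delegate = delegate;
        }

        @Override
        public List<Employee> findEmployeesForAccount(long accountId) {
            return cache.computeIfAbsent("account-employees-" + accountId,
                    key -> delegate.findEmployeesForAccount(accountId));
        }

        @Override
        public List<Employee> findEmployeesWorkingInDepartment(long accountId, long departmentId) {
            return cache.computeIfAbsent("department-employees-" + accountId + "-" + departmentId,
                    key -> delegate.findEmployeesWorkingInDepartment(accountId, departmentId));
        }

        // Heavy-handed invalidation, as mentioned above: clear everything
        // whenever an employee is added or changed.
        void clearAll() {
            cache.clear();
        }
    }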

Spring batch: chaining reader.processor, reader for the same initial data?

I am new to Spring Batch and am researching the technology for a background processing project. I have gone through the docs but I'm not sure they answer my question. I need to chain the following for the same "stream" of data: read it in, validate/process it, and re-read it for new data (basically piping the same data through multiple readers sandwiched by a processor). I am not sure I am expressing myself well, but maybe it is clear.
I know I can do multiple reads, but I'm not sure whether injecting the processor in between is viable.
Any ideas, opinions, etc.? Thanks.
Instead of trying to do multiple reads, you should consider chaining item processors after a single read and before the final write.
The reason for this suggestion is that chaining item processors allows you to take input, act upon it, transform it, and pass it along to the next processor in the chain.
Take a look at the section 6.3.1. Chaining ItemProcessors in the Spring Batch documentation for more information and some simple examples.
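As a minimal sketch of what that chaining can look like with CompositeItemProcessor (the Order type and the two processors are hypothetical):

    import java.util.Arrays;

    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.support.CompositeItemProcessor;

    // Chains two processors so items from a single reader are validated,
    // then transformed, before reaching the writer.
    public class ProcessorChainConfig {

        public CompositeItemProcessor<Order, Order> compositeProcessor() throws Exception {
            ItemProcessor<Order, Order> validator = order -> {
                // returning null filters the item out of the chunk
                return order.isValid() ? order : null;
            };
            ItemProcessor<Order, Order> enricher = order -> {
                // transform the item before it reaches the writer
                order.markProcessed();
                return order;
            };

            CompositeItemProcessor<Order, Order> composite = new CompositeItemProcessor<>();
            composite.setDelegates(Arrays.asList(validator, enricher));
            composite.afterPropertiesSet();
            return composite;
        }
    }

    // Placeholder domain type for the sketch.
    class Order {
        private boolean valid = true;
        private boolean processed;

        boolean isValid() { return valid; }
        void markProcessed() { processed = true; }
    }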
In Spring Batch terms, "piping the same data through multiple readers sandwiched by a processor" sounds like you need a job with several steps chained one after another.
If, in turn, you want to feed the items (messages) obtained in one step to the next reader/processor/writer for further processing, then you might be building a message-driven application, and Spring Integration might be a more natural choice to achieve that goal.
