Currently I have multiple ItemReaders for individual database queries. I want to get all of that information into a single object. Is there a type of ItemReader which can do this for me? I'd like the Processor and Writer to handle that object after the reader has read it in as a single object. So basically I need to create an object from a group of ItemReaders and use it from that point on as my only data source to process and write.
One thing to note is that I cannot change the existing ItemReaders due to it being a larger part of a project.
Even if you cannot change the existing code, I would avoid using multiple readers and instead look at using an ORM framework like Hibernate in a brand-new reader.
Alternatively, create a new reader using JdbcCursorItemReader and just join the 3 (or more) tables together for your driving cursor and map the result set to your custom object. If the joins will cause the cursor to return multiple rows per object, I would first suggest going back and looking at Hibernate. If Hibernate isn't an option, you could extend the SingleItemPeekableItemReader to iterate through the cursor until you've fully built your objects.
Example using SingleItemPeekableItemReader.
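As a rough sketch of that peeking approach, assuming the delegate is a JdbcCursorItemReader over the joined tables that returns one CustomerRow per result-set row (CustomerRow and CustomerAggregate are made-up types here, not Spring Batch classes):

```java
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.support.SingleItemPeekableItemReader;

/**
 * Hypothetical aggregating reader: groups consecutive cursor rows that share
 * the same customer id into one CustomerAggregate. In a real step, remember to
 * register the underlying cursor reader as a stream so it gets opened/closed.
 */
public class AggregatingCustomerReader implements ItemReader<CustomerAggregate> {

    private final SingleItemPeekableItemReader<CustomerRow> delegate;

    public AggregatingCustomerReader(ItemReader<CustomerRow> rowReader) {
        this.delegate = new SingleItemPeekableItemReader<>();
        this.delegate.setDelegate(rowReader); // e.g. a JdbcCursorItemReader over the joined tables
    }

    @Override
    public CustomerAggregate read() throws Exception {
        CustomerRow row = delegate.read();
        if (row == null) {
            return null; // end of the cursor
        }
        CustomerAggregate aggregate = new CustomerAggregate(row.getCustomerId());
        aggregate.add(row);

        // Keep consuming rows while the next row still belongs to the same customer.
        CustomerRow next = delegate.peek();
        while (next != null && next.getCustomerId().equals(row.getCustomerId())) {
            aggregate.add(delegate.read());
            next = delegate.peek();
        }
        return aggregate;
    }
}
```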
Related
I am looking to read data from multiple tables (in different databases), aggregate the results, and create a final result set. In my case, each query returns a list of objects. I searched the web many times and found nothing other than Spring Batch How to read multiple table (queries) as Reader and write it as flat file write, but that only returns a single object.
Is there any way to do this? A working sample would help a lot.
Example -
One query gives List of Departments - from Oracle DB
One query gives List of Employees - from Postgres
Now I want to build the Employee/Department relationship, send the combined object to the processor for a further lookup against MongoDB, and then send the final object to the writer.
The question should rather be "how to join three tables from three different databases and write the result in a file". There is no built-in reader in Spring Batch that reads from multiple tables. You either need to create a custom reader, or decompose the problem at hand into tasks that can be implemented using Spring Batch tasklet/chunk-oriented steps.
I believe you can use the driving query pattern in a single chunk-oriented step. The reader reads employee items, then a processor enriches each item with 1) the department from Postgres and 2) other info from MongoDB. This should work for small/medium datasets. If you have a lot of data, you can use partitioning to parallelize things and improve performance.
Another option, if you want to avoid a query per item, is to load all departments into a cache, for example (I guess there should be fewer departments than employees), and enrich items from the cache rather than with individual queries to the db.
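A minimal sketch of such an enriching processor, assuming hypothetical Employee, Department, and EnrichedEmployee types and a department cache preloaded from Postgres (for example in a step listener or an earlier tasklet):

```java
import java.util.Map;
import org.springframework.batch.item.ItemProcessor;

/**
 * Enriches each Employee read by the driving query with its Department.
 * Employee, Department and EnrichedEmployee are hypothetical domain types;
 * the cache is assumed to be preloaded from Postgres before the step runs.
 */
public class EmployeeEnrichmentProcessor implements ItemProcessor<Employee, EnrichedEmployee> {

    private final Map<Long, Department> departmentCache;

    public EmployeeEnrichmentProcessor(Map<Long, Department> departmentCache) {
        this.departmentCache = departmentCache;
    }

    @Override
    public EnrichedEmployee process(Employee employee) {
        // Cache lookup instead of one query per item.
        Department department = departmentCache.get(employee.getDepartmentId());
        // A further lookup against MongoDB could be chained here, or done in a
        // second processor combined via CompositeItemProcessor.
        return new EnrichedEmployee(employee, department);
    }
}
```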
One advantage of Document DBs like Couchbase is schemaless entities. It gives me freedom to add new attributes within the document without any schema change.
Using Couchbase's JsonObject and JsonDocument, my code remains generic for CRUD operations and needs no modification whenever a new attribute is added to the document. Refer to this example, where no entities are created.
However, if I follow the usual Spring Data approach of creating entity classes, I do not take full advantage of this flexibility. I will end up changing code whenever I add a new attribute to my document.
Is there an approach to have a generic entity using Spring Data? Or is Spring Data not really suitable for schemaless DBs? Or is my understanding incorrect?
I would argue the opposite is true.
One way or another, if you introduce a new field, you have to handle the existing data that doesn't have that field.
Either you update all your documents to include that field. That is what schema-based stores basically force you to do.
Or you leave your store as it is and let your application handle that issue. With Spring Data you have some nice and obvious ways to handle that in a consistent fashion, e.g. by having a default value in the entity or handling that in a listener.
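For illustration, a minimal Spring Data Couchbase entity along those lines; the field names here are made up:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.couchbase.core.mapping.Document;

/**
 * Illustrative entity: documents written before "loyaltyTier" existed
 * simply come back with the default value, so no migration is needed.
 */
@Document
public class Customer {

    @Id
    private String id;

    private String name;

    // Attribute added later; older documents that lack this field
    // are mapped with the default below.
    private String loyaltyTier = "STANDARD";

    // getters and setters omitted for brevity
}
```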
I have a requirement where I will receive a flat file from a vendor and I need to read the records and insert/update/delete them in my DB table. I get an action flag from the vendor indicating whether I need to insert, update, or delete that particular record. The flat file will contain a huge number of records, and I do not want to do manual steps like checking the action flag for every record [by overriding the write() method of ItemWriter and looping over the items in the chunk], constructing SQL manually, and using JdbcTemplate to do the DB operation for every record.
Can I achieve this using JdbcBatchItemWriter? Is there a way to set the SQL for every record in the chunk so that Spring Batch will do a batch update? How can the ItemPreparedStatementSetter be invoked in that case?
Since your choice is at the record level, take a look at the ClassifierCompositeItemWriter (http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/support/ClassifierCompositeItemWriter.html). That ItemWriter implementation takes a Classifier implementation that it uses to determine which ItemWriter to use. From there, you can configure one ItemWriter that does inserts, one for updates, and one for deletes. Each record will be funneled through to the correct instance and assuming your delegates are JdbcBatchItemWriters, you'll get the same batching you normally do (one batch for inserts, one for updates, and one for deletes).
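A rough sketch of that wiring; VendorRecord and getActionFlag() are placeholders for your file record type, and insertWriter/updateWriter/deleteWriter are assumed to be three separately configured JdbcBatchItemWriter beans:

```java
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ClassifierCompositeItemWriter;
import org.springframework.classify.Classifier;

/**
 * Routes each record to the insert, update, or delete writer based on its
 * action flag. VendorRecord is a placeholder for the file's record type.
 */
public class ActionClassifier implements Classifier<VendorRecord, ItemWriter<? super VendorRecord>> {

    private final ItemWriter<VendorRecord> insertWriter;
    private final ItemWriter<VendorRecord> updateWriter;
    private final ItemWriter<VendorRecord> deleteWriter;

    public ActionClassifier(ItemWriter<VendorRecord> insertWriter,
                            ItemWriter<VendorRecord> updateWriter,
                            ItemWriter<VendorRecord> deleteWriter) {
        this.insertWriter = insertWriter;
        this.updateWriter = updateWriter;
        this.deleteWriter = deleteWriter;
    }

    @Override
    public ItemWriter<? super VendorRecord> classify(VendorRecord record) {
        if ("I".equals(record.getActionFlag())) {
            return insertWriter;   // JdbcBatchItemWriter configured with the INSERT statement
        }
        if ("U".equals(record.getActionFlag())) {
            return updateWriter;   // JdbcBatchItemWriter configured with the UPDATE statement
        }
        return deleteWriter;       // JdbcBatchItemWriter configured with the DELETE statement
    }
}

// Wiring sketch:
// ClassifierCompositeItemWriter<VendorRecord> writer = new ClassifierCompositeItemWriter<VendorRecord>();
// writer.setClassifier(new ActionClassifier(insertWriter, updateWriter, deleteWriter));
```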
I am using Spring Batch 2.1.9.RELEASE
I need to configure a job step which reads data from a MySQL DB, processes it, and writes it back to MySQL. I want to do it in chunks.
I considered using JdbcCursorItemReader but the SQL is a complex one. I need to fetch data from three other tables to create the actual SQL to use in the reader.
But if I use a custom ItemReader with JdbcTemplate/NamedParameterJdbcTemplate, how can I make sure the step processes the data in chunks? I am not using JPA/DAOs.
Many thanks,
In Spring Batch, data is normally processed in chunks; the easy way is to declare a commit-interval in the step definition; see Configuring a Step.
Another way to define a custom chunk policy is to implement your own CompletionPolicy.
To answer your question: use the Driving Query Based ItemReader pattern to read from the main table and build the complex object (reading from the other tables), define a commit-interval, and use the standard read/process/write step pattern.
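A rough sketch of such a driving-query reader, with a hypothetical MyComplexObject, MyComplexObjectRowMapper, and placeholder SQL; in 2.1.x the commit-interval itself is typically declared on the chunk element of the XML step definition, and this reader plugs in as the step's reader:

```java
import java.util.Collections;
import java.util.Iterator;
import javax.sql.DataSource;
import org.springframework.batch.item.ItemReader;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

/**
 * Driving-query style reader sketch: the driving query loads only the keys of
 * the main table, and each read() builds the full object with the extra lookups.
 * MyComplexObject, MyComplexObjectRowMapper and the SQL are placeholders.
 */
public class DrivingQueryItemReader implements ItemReader<MyComplexObject> {

    private final NamedParameterJdbcTemplate jdbcTemplate;
    private Iterator<Long> ids;

    public DrivingQueryItemReader(DataSource dataSource) {
        this.jdbcTemplate = new NamedParameterJdbcTemplate(dataSource);
    }

    @Override
    public MyComplexObject read() {
        if (ids == null) {
            // Driving query: only the ids from the main table.
            ids = jdbcTemplate.queryForList(
                    "SELECT id FROM main_table",
                    Collections.<String, Object>emptyMap(),
                    Long.class).iterator();
        }
        if (!ids.hasNext()) {
            return null; // end of data; the commit-interval drives the chunking
        }
        Long id = ids.next();
        // Build the complex object by joining the other tables for this id.
        return jdbcTemplate.queryForObject(
                "SELECT ... FROM main_table m JOIN other_table o ON ... WHERE m.id = :id",
                Collections.singletonMap("id", id),
                new MyComplexObjectRowMapper());
    }
}
```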
I hope I was clear, English is not my language.
I am building a data access layer for a DB. What data structure is recommended for passing and returning a collection?
I use a list of data access objects mapped to the db tables.
I'm not sure what language you're using, but in general, there are tradeoffs of simplicity vs extensibility.
If you return the DataSet directly, you have now coupled yourself to database specific classes. This leaves little room for extension - what if you allow access to files or to other types of data sources? But, it is also very simple. This is the recordset pattern and C#/VB provide a lot of built-in support for this. The GUI layer can access the recordset and easily manipulate the data. This works well for simple applications.
On the other hand, you can wrap the datasets in a custom object, and provide gateway methods (see the Gateway pattern http://martinfowler.com/eaaCatalog/gateway.html). This method is more complex, but provides a lot more extensibility. In a larger application, when you need to separate the business logic, data logic, and GUI logic, this is a more robust way to go.
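For example, a minimal gateway sketch (Customer and the method names are illustrative):

```java
import java.util.List;

/**
 * Minimal gateway sketch: callers work with this interface and never see
 * DataSets, ResultSets or SQL. Customer and the method names are illustrative.
 */
public interface CustomerGateway {

    List<Customer> findAll();

    Customer findById(long id);

    void insert(Customer customer);

    void update(Customer customer);

    void delete(long id);
}

// A JDBC-backed implementation maps rows to Customer objects internally and
// could later be swapped for a file-based or ORM-backed implementation
// without touching the callers.
```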
For larger enterprise applications, you can look into using Object Relational Mapping tools (ORM). They help to automatically map java objects to database tables. They hide a lot of the painful SQL details. Frameworks such as Spring provide excellent support for ORMs.
I tend to use arrays of objects, so that I can disconnect the DAO from the business logic.
You can store the data in the DAO as a dataset, for example, and give callers an easy way to queue additions before doing an update, so they can pass in information for modification operations and then commit all the changes in one shot.
I prefer that the user can't add/modify the structure themselves, as it makes it harder to determine what must be changed in the database.
By initially returning an array, callers can then display what is in the database.
Then, as the presentation layer makes changes, the DAO can be updated by the controller. By having a loose coupling the entire system becomes more flexible, as you can change the DAO from a dataset to something else, and the rest of the application doesn't care.
There are two choices that are the most generic.
The first way to look at a ResultSet is as a List of Maps, where each Map represents a row in the ResultSet. The keys are the columns listed in the SELECT clause; the values are the database values.
The second way to look at a ResultSet is as a Map of Lists, where each List represents a column in the ResultSet. The Map keys are the columns listed in the SELECT clause; the values are the Lists of database values.
If you don't want to do full-blown ORM, these can carry you a long way.
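As an illustration, a minimal sketch of the first representation, a List of Maps built straight from a JDBC ResultSet:

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Builds the "List of Maps" view: one Map per row, keyed by column label. */
public final class ResultSetMapper {

    public static List<Map<String, Object>> toListOfMaps(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columnCount = meta.getColumnCount();
        List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<String, Object>();
            for (int i = 1; i <= columnCount; i++) {
                row.put(meta.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```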