Data Migration using Spring

We are beginning the process of re-architecting the systems within our company.
One of the key components of the work is a new data model which better meets our requirements.
A major part of the initial phase of the work is to design and build a data migration tool.
This will take data from one or more existing systems and migrate it to the new model.
Some requirements:
Transformation of data to the new model
Enrichment of data, with default values or according to business rules
Integration with existing systems to pull data
Integration with Salesforce CRM which is being introduced into the company.
Logging and notification about failures
Within the Spring world, which is the best Spring project to use as the underlying framework for such a data migration tool?
My initial thoughts are to look at implementing the tool using Spring Integration.
This would:
Through the XML or DSL, allow the high-level data flow to be seen, understood, and edited (possibly using a visual tool such as an STS plugin). Being able to view the high-level flow in such a way is a big advantage.
Provide connectors to work with different data sources.
Allow transformer components to be built to migrate data formats.
Provide routers to route the data in the new model to endpoints which connect with the target systems.
However, are there other Spring projects, such as Spring Data or Spring Batch, which are a better match for the requirements?
Very much appreciate feedback and ideas.

I would certainly start with Spring Integration, which exposes a bare-bones implementation of the Enterprise Integration Patterns that are at the core of most, if not all, of the requirements you listed.
It is also an exceptionally good problem-modelling tool, which helps you better understand the problem and then envision its implementation as one cohesive integration flow.
Later on, once you have a clear understanding of how things are working, it would be quite simple to take it to the next level by introducing the other frameworks you mentioned/tagged, such as Spring Cloud Data Flow and Spring Cloud Stream.
Overall this question is rather broad, so consider following the pointers above, get started, and then raise more concrete questions.
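To make that concrete, here is a rough sketch, under a lot of assumptions, of what such a migration flow could look like with the Spring Integration Java DSL: a polled JDBC source, a transformer/enricher applying a default value, and a router sending records either towards Salesforce or towards the new core model. The table, columns, default value and the two placeholder handlers are all invented for illustration and are not part of the original question or answer.

```java
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.config.EnableIntegration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.jdbc.JdbcPollingChannelAdapter;
import org.springframework.messaging.MessageHandler;

@Configuration
@EnableIntegration
public class MigrationFlowConfig {

    @Bean
    public IntegrationFlow customerMigrationFlow(DataSource legacyDataSource) {
        // Poll not-yet-migrated rows from an existing system (table/columns are made up).
        JdbcPollingChannelAdapter source = new JdbcPollingChannelAdapter(
                legacyDataSource, "SELECT * FROM legacy_customer WHERE migrated = 0");

        return IntegrationFlows
                .from(source, c -> c.poller(Pollers.fixedDelay(30000)))
                .split() // one message per row
                // Enrichment: apply a default value where the legacy data has none.
                .<Map<String, Object>, Map<String, Object>>transform(row -> {
                    row.putIfAbsent("COUNTRY", "GB");
                    return row;
                })
                // Routing: send each record to the system that owns it in the new model.
                .<Map<String, Object>, String>route(row -> (String) row.get("TARGET_SYSTEM"), r -> r
                        .subFlowMapping("SALESFORCE", sf -> sf.handle(salesforceHandler()))
                        .subFlowMapping("CORE", core -> core.handle(coreModelHandler())))
                .get();
    }

    @Bean
    public MessageHandler salesforceHandler() {
        // Placeholder for a real Salesforce outbound adapter/gateway.
        return message -> System.out.println("to Salesforce: " + message.getPayload());
    }

    @Bean
    public MessageHandler coreModelHandler() {
        // Placeholder for a JDBC/JPA outbound adapter writing the new model.
        return message -> System.out.println("to new model: " + message.getPayload());
    }
}
```

Roughly the same structure can be expressed in the XML namespace instead, which is what keeps the high-level flow visible and editable.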

Related

Can Spring Batch be used for data processing or is it only an ETL tool?

I'm trying to utilize Spring Batch in one of my projects, as there is another project that is based on Spring Batch.
However, the more I read, the more I realize that Spring Batch is nothing like Apache Beam or MapReduce; it is only used for transferring the SAME data from one place to another with some type mapping like varchar -> string.
However, the task at hand requires some processing: not only type mapping and conversion but also aggregations and data structuring.
Can Spring Batch be used for data processing, or is it only an ETL tool?
Well, I disagree with the point that Spring Batch is only used for transferring the SAME data from one place to another with some type mapping like varchar -> string.
I have worked with this technology for 4 years and have witnessed the framework grow a lot.
Spring Batch is well capable of processing data, mapping, required conversions and data aggregations; it can definitely be used for data processing.
Being an open-source technology, you will get a lot of material to read about it, and forums like Stack Overflow have a ton of FAQs around it.
For scaling and parallel processing there are various architectures in Spring Batch which will help enhance your performance.
You can find further details here:
SPRING_BATCH_SCALING_AND_PARALLELING
If you want to monitor your jobs then you can use Spring Cloud Data Flow.
Monitoring can also be done with AppDynamics.
Refer to this blog:
MONITOR_SPRING_BATCH_JOB_WITH_APP_DYNAMICS
Another advantage of using Spring Batch is that you have a lot of standard predefined reader, processor and writer types, which support sources like files, databases, streams, etc.
On top of this, as it is a Java-based framework, you can do all the stuff that can be done with Java.
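For example, a typical chunk-oriented step built from those predefined pieces looks roughly like the sketch below (Spring Batch 4.x style with @EnableBatchProcessing). The Person type, the CSV file name and the SQL are hypothetical placeholders, not something taken from this thread.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing
public class ImportJobConfig {

    @Bean
    public Step importPeopleStep(StepBuilderFactory steps, DataSource dataSource) {
        // Predefined reader: parse a delimited file into Person objects.
        FlatFileItemReader<Person> reader = new FlatFileItemReaderBuilder<Person>()
                .name("personReader")
                .resource(new FileSystemResource("people.csv"))
                .delimited()
                .names("firstName", "lastName")
                .targetType(Person.class)
                .build();

        // The processor is plain Java: conversions, enrichment, business rules, aggregation...
        ItemProcessor<Person, Person> processor = person -> {
            person.setLastName(person.getLastName().toUpperCase());
            return person;
        };

        // Predefined writer: batched inserts via named parameters mapped from the bean.
        JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriterBuilder<Person>()
                .dataSource(dataSource)
                .sql("INSERT INTO person (first_name, last_name) VALUES (:firstName, :lastName)")
                .beanMapped()
                .build();

        return steps.get("importPeopleStep")
                .<Person, Person>chunk(100)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}

// Hypothetical domain type used above.
class Person {
    private String firstName;
    private String lastName;
    public String getFirstName() { return firstName; }
    public void setFirstName(String v) { this.firstName = v; }
    public String getLastName() { return lastName; }
    public void setLastName(String v) { this.lastName = v; }
}
```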
I hope this helps.
The write-up below is incorrect, because it compares apples to oranges:
However, the more I read, the more I realize that Spring Batch is nothing like Apache Beam or MapReduce; it is only used for transferring the SAME data from one place to another with some type mapping like varchar -> string.
Unlike Apache Beam or MapReduce, Spring Batch is not an engine but a programming framework. A programming framework usually consists of two major components: code structure guidelines and APIs.
So the only restriction on a Java developer is to follow the Spring Batch program structure guidelines; usage of the Spring Batch APIs is optional. Although the model is Read -> Process -> Write, a Java developer is free to write whatever logic he or she wishes in these components; only imagination limits what could go into them. Further, one artifact can be integrated with another artifact.
So I reiterate: Spring Batch is a programming framework, not an engine or pre-configured software like Hadoop, so that comparison is apples to oranges.
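To illustrate (a sketch only, with made-up column names): an ItemProcessor is just a Java interface, so the Process part of Read -> Process -> Write can hold whatever logic you like, from simple enrichment to calls into a rules engine or a Lucene index.

```java
import java.math.BigDecimal;
import java.util.Map;

import org.springframework.batch.item.ItemProcessor;

// Arbitrary business logic living inside the "Process" slot of a chunk step.
public class OrderEnrichmentProcessor implements ItemProcessor<Map<String, Object>, Map<String, Object>> {

    @Override
    public Map<String, Object> process(Map<String, Object> row) {
        BigDecimal net = new BigDecimal(row.get("NET_AMOUNT").toString());
        // Any Java code can go here: lookups, aggregation state, free-text searches...
        row.put("GROSS_AMOUNT", net.multiply(new BigDecimal("1.20"))); // made-up VAT rule
        // Returning null filters the item out of the chunk entirely.
        return net.signum() >= 0 ? row : null;
    }
}
```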
See this - Spring Tips: Spring Batch and Apache Kafka
As I have already said, a Java developer can develop any kind of program while staying within the program structure limitations; the logic being written has no bounds!
To say it one more time: Spring Batch is not an ETL tool like Informatica or Pentaho, but a programming framework using Java and Spring. A developer can be as creative as he or she wants to be.
I once developed a real-time data-matching job that needed free-text search capabilities using Apache Lucene, by fitting my code into the Spring Batch model.
Spring Batch (SB) gives us all three - E, T and L.
However, we have to decide whether or not to use SB. It is again a quantitative decision as to whether an individual or team really needs to learn it if they don't already know it; you need to evaluate the ROI (return on investment). If it is only E, or T, or L, there might be other, simpler solutions.
If we are talking about Java only, and just one of these three, SB is not required. But again, when it comes to simplicity (if you know SB), scalability, monitoring, and transaction-managed parallel processing, all of these come hand-in-hand with SB out of the box.

Implement a custom Spring Data Repository for a non-supported database

I want to implement a Spring Data repository for a database which is not currently supported (hypothetical question; no need to ask about the database).
How is this possible, and where can I find an example of that?
The short answer is "yes, definitely". One of Spring Data's main intentions is to unify access to different data storage technologies under the same API style. So you can implement a Spring Data adapter for any database, as long as it is worth implementing a Java connector for that database (which is definitely possible for the majority of databases).
The long answer would take several blog posts or even a small book :-) But let me just highlight a couple of points. Each of the existing Spring Data modules exposes one (or both) of the following API flavours:
Imperative, in the form of various template classes (e.g. RedisTemplate). This is mostly for databases that don't have a query language, only a programmatic API. So you're just wrapping your database's API in a template class and you're done.
Declarative, in the form of so-called declarative repositories: a quite sophisticated mechanism for matching annotations on method signatures, or the method signatures themselves, to a database's native queries. Luckily the spring-data-commons module provides a lot of scaffolding and common infrastructure code for this, so you just need to fill in the gaps for your specific data storage mechanism. You can look at the slide deck from my conference talk, where I explained at a high level how a particular Spring Data module generates real implementations of repositories based on user declarations. Or you can just go into any of the existing modules and look at the source code; the most interesting parts there are usually the RepositoryFactory and QueryLookupStrategy implementations.
That is an extremely simplified view of the Spring Data concepts. To get more detailed information and explanations of the core principles, I'd suggest reading the spring-data-commons reference documentation and having a look at the spring-data-keyvalue project, which is a good starting point for implementing a Spring Data module for key-value storages.
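As a very rough, hedged sketch of that declarative side: a custom module largely boils down to subclassing RepositoryFactorySupport from spring-data-commons. MyDbClient and SimpleMyDbRepository below are hypothetical placeholders for the driver wrapper and the base repository implementation you would write for your database.

```java
import org.springframework.data.repository.core.EntityInformation;
import org.springframework.data.repository.core.RepositoryInformation;
import org.springframework.data.repository.core.RepositoryMetadata;
import org.springframework.data.repository.core.support.AbstractEntityInformation;
import org.springframework.data.repository.core.support.RepositoryFactorySupport;

public class MyDbRepositoryFactory extends RepositoryFactorySupport {

    private final MyDbClient client;

    public MyDbRepositoryFactory(MyDbClient client) {
        this.client = client;
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T, ID> EntityInformation<T, ID> getEntityInformation(Class<T> domainClass) {
        // Tells Spring Data how to read ids and detect "new" entities.
        return (EntityInformation<T, ID>) new AbstractEntityInformation<T, Object>(domainClass) {
            @Override
            public Object getId(T entity) {
                return null; // a real module would resolve this via annotations/reflection
            }

            @Override
            public Class<Object> getIdType() {
                return Object.class;
            }
        };
    }

    @Override
    protected Object getTargetRepository(RepositoryInformation information) {
        // The instance that backs the CRUD methods of every declared repository interface.
        return new SimpleMyDbRepository<>(client, information.getDomainType());
    }

    @Override
    protected Class<?> getRepositoryBaseClass(RepositoryMetadata metadata) {
        return SimpleMyDbRepository.class;
    }

    // For query methods you would additionally override getQueryLookupStrategy(..)
    // and translate parsed method names/annotations into your database's native queries.
}

// Hypothetical low-level client for the unsupported database.
interface MyDbClient { }

// Hypothetical base implementation shared by all repositories of the module.
class SimpleMyDbRepository<T, ID> {
    SimpleMyDbRepository(MyDbClient client, Class<?> domainType) { }
}
```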

Project structure and configuration for microservices

Please ignore English grammar.
For learning purposes I want to create a microservices project in Spring. I downloaded some sample projects and now I have a very basic idea of microservices, but I am confused about how to start my own project.
I want to implement the following simple use case.
In my database I have three tables, Product, ProductStock and Order, and I want to write a microservice for each table.
The Product microservice will have endpoints for CRUD operations.
The ProductStock microservice will only have update-stock and check-stock endpoints.
The Order microservice will only have an order-posting operation.
I created a multi-module Maven project and now I have the following questions.
1: Is creating a multi-module Maven project the only way to create a microservices project?
2: I am using Hibernate, so in which module (microservice) do I create the model classes? I need the model classes in every module (microservice). (The model classes are Product, ProductStock and Order.)
3: Where do I set the Hibernate configuration?
Even though this question is way too broad, I'll try to answer it as well as I can:
A multi-module project is not the only way (and I would even say not a recommended way for separate services). Usually you have a completely separate Maven project for each service.
Every service has to have its own data model and entity classes. Services should never share any entities, and should not access the same databases/schemas. They can use the same database server with different schemas.
In every service that uses Hibernate.
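To make those answers concrete, here is a minimal sketch of what one such service could look like, assuming Spring Boot 2.x with Spring Data JPA (javax.persistence). All names are made up, and in a real project each type would live in its own file inside that service's own Maven project; ProductStock and Order would get their own, separate services with their own entities, repositories and Hibernate/datasource settings (typically in each service's own application.properties).

```java
// product-service: its own Maven project, own database schema, own Hibernate config.
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
public class ProductServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(ProductServiceApplication.class, args);
    }
}

@Entity
class Product {
    @Id
    @GeneratedValue
    private Long id;
    private String name;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

interface ProductRepository extends JpaRepository<Product, Long> {
}

@RestController
@RequestMapping("/products")
class ProductController {

    private final ProductRepository repository;

    ProductController(ProductRepository repository) {
        this.repository = repository;
    }

    @GetMapping
    public List<Product> all() {
        return repository.findAll();
    }

    @PostMapping
    public Product create(@RequestBody Product product) {
        return repository.save(product);
    }
}
```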
The microservice architecture is not a trivial area, so I would suggest that you start with some theory first. One of the books which is often referred to here and there is Building Microservices by Sam Newman. I highly recommend reading it, or at least a part of it. This is the theoretical part.
Then, for some hands-on experience, you may want to clone/fork the PiggyMetrics project. It is an educational project, but at the same time it contains quite a lot of patterns and advanced material.
After that you will be able to answer these questions yourself, although there will be much more to ask ;-)
Good luck!

When to use default Spring Data REST behavior?

I recently worked on a project which uses Spring Data REST with Spring Boot. While it is GREAT to harness the power of Spring Data REST and build a powerful web service in no time, I have come to regret one thing: how tightly coupled the "presentation" layer (JSON returns) is to the underlying data structure.
Sure, I have used Projections and ResourceProcessors to manipulate the JSON, but that still does not completely sever ties with the database structure.
I want to introduce Controllers to the project, to integrate some of the "old" ways of building a web service in Spring. But how should I draw the line? I don't want to eradicate Spring Data REST from my project.
I am sure many of you have faced similar decisions, so any advice would be most appreciated!
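For what it's worth, one common middle ground is a @RepositoryRestController: it is registered under Spring Data REST's base path next to the generated endpoints, but its handler methods return hand-shaped DTOs instead of entities (unlike @RestController, such methods need an explicit @ResponseBody). The sketch below uses hypothetical names and a placeholder value.

```java
import org.springframework.data.rest.webmvc.RepositoryRestController;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.ResponseBody;

// Hypothetical DTO, deliberately decoupled from the JPA entity and table layout.
class OrderSummaryDto {
    public Long id;
    public String status;
}

@RepositoryRestController
public class OrderSummaryController {

    @GetMapping("/orders/{id}/summary")
    @ResponseBody
    public OrderSummaryDto summary(@PathVariable Long id) {
        // In a real project, load the entity (or a projection) here and map
        // only the fields the client should see.
        OrderSummaryDto dto = new OrderSummaryDto();
        dto.id = id;
        dto.status = "PLACED"; // placeholder value
        return dto;
    }
}
```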

Spring Integration as embedded alternative to standalone ESB

Does anybody have experience with the Spring Integration project as an embedded ESB?
I'm highly interested in use cases such as:
Reading files from a directory on a scheduled basis
Getting data from a JDBC data source
Modularity and the possibility to start/stop/redeploy a module on the fly (e.g. one module can scan a directory on a scheduled basis, another can run a query against a JDBC data source, etc.)
Repeat/retry policies
UPDATE:
I found answers to all my questions except "Getting data from a JDBC data source". Is it technically possible?
Remember, "ESB" is just a marketing term designed to sell more expensive software, it's not a magic bullet. You need to consider the specific jobs you need your software to do, and pick accordingly. If Spring Integration seems to fit the bill, I wouldn't be too concerned if it doesn't look much like an uber-expensive server installation.
The Spring Integration JDBC adapters are available in 2.0, and we just released GA last week. Here's the relevant section from the reference manual: http://static.springsource.org/spring-integration/docs/latest-ga/reference/htmlsingle/#jdbc
This link describes the FileSucker approach with Spring Integration. Read up on your Enterprise Integration Patterns for more info, I think.
I rather think you need to do a bit more investigation yourself, or try out a couple of your use cases. Then we can discuss what's good and bad.
JDBC Adapters appear to be a work in progress.
Even if there is no specific adapter available, remember that Spring Integration is a thin wrapper around POJOs. You'll be able to access JDBC in any component e.g. your service activators.
See here for a solution based on a polling inbound channel adapter too.
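For completeness, here is a hedged sketch of that polling JDBC inbound adapter idea, written with today's Java DSL (the answers above predate it and would have used the int-jdbc XML namespace). The table and column names are invented, and the final handler is a stand-in for whatever service activator or POJO you plug in.

```java
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.config.EnableIntegration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.jdbc.JdbcPollingChannelAdapter;

@Configuration
@EnableIntegration
public class JdbcPollingConfig {

    @Bean
    public IntegrationFlow jdbcPollingFlow(DataSource dataSource) {
        JdbcPollingChannelAdapter source = new JdbcPollingChannelAdapter(
                dataSource, "SELECT * FROM inbox WHERE processed = 0");
        // Mark rows as read so they are not picked up by the next poll.
        source.setUpdateSql("UPDATE inbox SET processed = 1 WHERE id IN (:id)");

        return IntegrationFlows
                .from(source, c -> c.poller(Pollers.fixedDelay(10000)))
                // Each poll delivers a List<Map<String, Object>> of rows downstream,
                // where a service activator (or any POJO) can do whatever it likes with them.
                .handle(message -> System.out.println("polled rows: " + message.getPayload()))
                .get();
    }
}
```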
