In data integration architectures, which component is responsible for communicating with data sources? - etl

There are two approaches to data integration, data warehousing and virtual integration, each of which is described by a figure:
So which component is responsible for communicating with data sources?
In virtual integration, it is the wrapper.
In a data warehouse, it is the extractor, or more broadly the ETL (extract-transform-load) process.
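To make the difference concrete, here is a rough sketch in plain Java, not a definitive design. It assumes a hypothetical SourceSystem interface standing in for any data source; the names Wrapper and EtlJob and the trim/lower-case transformation are invented for illustration. The wrapper contacts the source at query time, while the ETL job contacts it ahead of time and copies the transformed records into the warehouse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical stand-in for any external data source (database, file, web API).
interface SourceSystem {
    List<String> readAll();
}

// Virtual integration: the wrapper talks to the source at query time
// and returns only the matching records. Nothing is copied in advance.
class Wrapper {
    private final SourceSystem source;

    Wrapper(SourceSystem source) {
        this.source = source;
    }

    List<String> query(Predicate<String> condition) {
        List<String> result = new ArrayList<>();
        for (String record : source.readAll()) {
            if (condition.test(record)) {
                result.add(record);
            }
        }
        return result;
    }
}

// Data warehouse: the ETL job talks to the source ahead of time.
// Later queries run against the returned warehouse copy, not the source.
class EtlJob {
    static List<String> run(SourceSystem source) {
        List<String> warehouse = new ArrayList<>();
        for (String record : source.readAll()) {                     // extract
            String cleaned = record.trim().toLowerCase(Locale.ROOT); // transform
            warehouse.add(cleaned);                                  // load
        }
        return warehouse;
    }
}
```

In both cases exactly one component touches the source: the wrapper in the first, the extract step of the ETL job in the second.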

Related

Logstash vs Spring Cloud Data Flow: which one is suitable for data preprocessing?

I'm using Spring Boot along with Elasticsearch to build a search system for my website.
I have some data that I need to push into Elasticsearch. This data (a product, for example) must be processed first: it is passed to another microservice that filters the JSON, adds some fields for better search results, does some calculations, and returns the object I want to store. Is it possible to do this with Logstash, or do I need to use Spring Cloud Data Flow?
What I want to do:
1. Save a product (product service).
2. Log the saved product or stream it.
3. Process it before storage (another service).
4. Save the document (Elasticsearch server).
Thanks in advance.
Obviously it depends on various factors, but I can offer some insight into Spring Cloud Data Flow from a technical standpoint.
If you want to build a streaming pipeline in which your filtering apps are connected through a messaging system that carries the data from one processing step to the next, you can check out Spring Cloud Data Flow.
Spring Cloud Data Flow (along with the underlying frameworks, Spring Cloud Stream and Spring Cloud Task) provides operational benefits for managing your streaming pipelines, but it may not make sense if you don't need a data pipeline with a messaging system. In that case, you could stick to a simple Spring Boot app that does the whole filtering step. As soon as you start distributing these applications and coupling them loosely through a messaging system, Spring Cloud Data Flow becomes handy.
Please check out the SCDF guide, with its features and recipes, to learn what SCDF can offer and to choose what fits your case.
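To make the "process it before storage" step concrete, here is a minimal sketch, not a definitive implementation. As far as I know, Spring Cloud Stream's functional programming model lets you expose a plain java.util.function.Function as a bean and bind it to the messaging system, so the processing logic itself can be written without any Spring imports, as below. The class name ProductEnricher and the field names (name, internalNotes, searchName) are made up for illustration; the real filtering and calculations are whatever your service needs.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class ProductEnricher {
    // Hypothetical processing step: takes the saved product and returns
    // the document that should be stored in the search index.
    static final Function<Map<String, Object>, Map<String, Object>> ENRICH = product -> {
        // Work on a copy so the incoming product is left untouched.
        Map<String, Object> doc = new HashMap<>(product);

        // Filter: drop a field that should not be searchable (assumed name).
        doc.remove("internalNotes");

        // Add a derived field for better search results (assumed name).
        Object name = doc.getOrDefault("name", "");
        doc.put("searchName", name.toString().toLowerCase(Locale.ROOT));

        return doc;
    };
}
```

The same function can be called directly from a simple Spring Boot app, or placed as the processor between a source and a sink once you move to a messaging-based pipeline.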

Should microservices connected with Axon share the Axon Framework related tables?

I am starting a project where I want to have multiple services that communicate with each other using Axon Server.
I have more than one service with the following stack:
Spring Boot 2.3.0.RELEASE (with starters: Data, JPA, web, mysql)
Axon Spring Boot Starter 4.2.1
Each of the services uses a different schema in the MySQL server.
When I start a Spring Boot service with Axon Framework activated, some tables for tokens, sagas, etc. are created in the database schema of each application.
I have two questions:
1. In the architecture that I am trying to build, should I have only one database for all the "Axon-enabled" services, so that the sagas, tokens, events, etc. are in only one place?
2. If so, can anyone provide an example of how to configure a custom EntityManagerProvider to keep the database of the service separate from the database of Axon?
I assume each of your microservices models a sub-domain. Since the events do model a (sub)domain, along with aggregates, entities and value objects, I very much favor keeping the Axon-related schemas separated, most likely along with the databases/schemas corresponding to each service. I would, thus, prefer a modeling-first approach when considering such technical options.
It is what we're currently doing in our microservices ecosystem.
There is at least one more technical reason to go with the same schema (one per sub-domain, that is) for both the Axon assets and the application-specific assets. It was pointed out to me by my colleague Marian. If you use (or will use) Event Sourcing, and thus reconstruct the state of an aggregate by fetching and applying all past events that resulted from handling commands, then you will most likely need transactions that encompass this fetching as well as the command-handling code, which might in turn trigger (through events) writes to your microservice-specific database.
Axon can require five tables, depending on how you use it.
These are:
1. The Event table.
2. The Snapshot Event table.
3. The Token table.
4. The Saga table.
5. The Association Value Entry table.
When using Axon Server, tables 1 and 2 will not be created since Axon Server is the storage solution for events and snapshots.
When not using Axon Server, I would indeed suggest having a dedicated datasource for these.
Table 3, which services the TokenStore, should be as close as possible to your Query Models. The tokens record how far a given EventProcessor has progressed in handling events. As these EventProcessors typically service the projectors that create your query models, keeping them together is sensible from a transactional perspective.
Tables 4 and 5 are both required for Sagas. The "Saga table" stores the serialized sagas, whereas the "Association Value Entry table" carries the association values between events and sagas so that the framework can load the right sagas. I'd store these either in a dedicated database or along with the other tables of the given (micro)service.

Spring Boot distributed transaction

We need to find the best way to handle distributed transaction management in our microservices architecture.
Here is the problem statement.
We have one composite microservice that interacts with two underlying atomic microservices (each meant for a specific purpose), and each of those has a separate database. We can consider these two microservices to be:
STUDENT_SERVICE (STU_DB)
TEACHER_SERVICE (TEACHR_DB)
The use case in the composite service is that a user (an administrator) can assign a teacher to a student for a specific course, etc.
I wonder how we can handle this in one transaction, since each service (STUDENT_SERVICE and TEACHER_SERVICE) has a separate DB and everything should happen in one transaction that either commits or rolls back.
Since those two services are separate, I don't see JTA being of help, as it is meant for the case where both applications (services) are deployed on the same application server.
I have ruled out JTA for the reason mentioned above.
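// Pseudocode
class CompositeService {
    void assignStaff(Request request) {
        // txn start
        updateStudentServiceApi(request);   // remote call to STUDENT_SERVICE (STU_DB)
        updateTeacherServiceApi(request);   // remote call to TEACHER_SERVICE (TEACHR_DB)
        // txn end: both updates must commit, or both must roll back
    }
}
The system should be in a consistent state after the API call executes.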
This is a tricky question, even if that is not obvious at first sight.
The functionality you are asking for is generally regarded as an anti-pattern in a microservice architecture.
A microservice architecture is, in general, a distributed system. Transactions in distributed systems are hard (see https://martin.kleppmann.com/2015/09/26/transactions-at-strange-loop.html). Your application consists of two services.
JTA is a Java API for ACID-style transactions. ACID transactions usually require locks to be held in the databases. When the transaction spans multiple services (two in your case), a failure of one service can block processing in the other. In that case you lose the advantages of the microservice architecture: loose coupling and independence of the services. You can end up building a distributed monolith (see the nice article at https://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/).
By the way, there are several discussions on the topic of transactions in microservices here on Stack Overflow. Just search, or check for example:
Distributed transactions in microservices
Transactions in microservices
Transactions across REST microservices?
What are your options?
(Disclaimer: I'm a developer for http://narayana.io, and the options presented are from the perspective of Java EE and Narayana. There could be other projects providing similar functionality. Also, even though Narayana integrates nicely with Spring, you will possibly need to handle some integration issues.)
1. You really need to run an ACID-style transaction in your project, i.e. you insist on the transactional behaviour you describe. Then you need to span the transaction over the services. If the services communicate over REST, you can consider, for example, Narayana REST-AT (http://jbossts.blogspot.com/2011/03/rest-cloud-and-transactions.html; start by looking into the quickstart at https://github.com/jbosstm/quickstart/tree/master/rts).
2. You relax your requirements for atomicity. Then you can consider a transaction model that relaxes consistency (you are fine with being eventually consistent), for example LRA (https://github.com/eclipse/microprofile-lra/blob/master/spec/src/main/asciidoc/microprofile-lra-spec.adoc). (Unfortunately the spec and implementation are still not ready, but a PoC can be run on the current state.)
3. You want to use a completely different approach to transaction processing. Then you can look into event sourcing. You would deploy e.g. Apache Kafka and send events for updates to the event store. Each service reads those events and updates its DB independently.
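To illustrate the idea behind the relaxed-consistency option, here is a minimal sketch of compensation in plain Java. It is not the LRA or Narayana API; ServiceCall and CompositeAssignment are hypothetical names invented for this example. Each remote call is paired with an action that undoes it, and when a later call fails, the calls that already completed are compensated in reverse order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client for one remote service: an action plus the call that undoes it.
interface ServiceCall {
    void execute() throws Exception;
    void compensate();
}

class CompositeAssignment {
    // Runs the calls in order. If one fails, the calls that already completed
    // are compensated in reverse order. Returns true when every call succeeded.
    static boolean run(List<ServiceCall> calls) {
        List<ServiceCall> completed = new ArrayList<>();
        for (ServiceCall call : calls) {
            try {
                call.execute();
                completed.add(call);
            } catch (Exception e) {
                for (int i = completed.size() - 1; i >= 0; i--) {
                    completed.get(i).compensate();
                }
                return false;
            }
        }
        return true;
    }
}
```

In your case the first call would update STUDENT_SERVICE and the second TEACHER_SERVICE. Note that between the first update and its compensation other clients can see the intermediate state, which is exactly the consistency you give up. A real implementation would also have to cope with a compensation that itself fails (retries, idempotent undo operations), which this sketch leaves out.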

Azure alternative to Spring Cloud Data Flow process

I'm looking for the Azure alternative to the Data Flow model of source-processor-sink.
I want the three entities to be separate microservices, and I want to use messaging as the link between them.
Basically, the source app takes data from another service and sends it to the processor, while the processor app acts on it and sends the relevant notification/alert to the sink.
I'm aware I can use RabbitMQ for the messaging, but I need to know which one would be better in Azure: Service Bus topics or Event Hubs? And how can I use them?
At the moment, there isn't a Spring Cloud Stream binder implementation for Azure Event Hubs.
Until we have one, neither the out-of-the-box apps nor custom apps can be built as messaging-microservice apps on top of it, where Spring Cloud Stream provides the programming model and Spring Cloud Data Flow lets you orchestrate the individual microservices into a data pipeline (i.e., source-processor-sink) via the DSL or the drag-and-drop GUI.
Microsoft was exploring a binder implementation in the past; it might end up in the Azure Spring Boot project. Feel free to open an issue on their backlog.

Does Spring XD support in-memory processing in Hadoop?

From the Spring XD site:
Features
Data from anywhere, to anywhere
Data-driven apps require refined and consolidated data at scale.
Spring XD’s stream and batch workflow lets you build pipelines to
consume data from various endpoints and consolidate them in Hadoop,
in-memory data grids such as Redis or GemFire, and virtually any data
store.
Spring XD is the data ingestion layer. If you want to use Spring XD to sink your data into GemFire, Redis, or JDBC and do your in-memory processing there, Spring XD gives you the tools to do so out of the box. For example, you could use Spring XD to mine the Twitter stream, take all those tweets, and sink them into GemFire, where you would then do the processing. Spring XD is meant to be the ingestion layer as opposed to the computation layer.
