How to implement a multi-tenant database for Spring Cloud Data Flow

We would like to implement a multi-tenant solution for SCDF in which each tenant may have its own task definitions and so on. Ideally we want only a single SCDF server, as opposed to setting up an SCDF server for each tenant.
Is this possible, or is the only way to achieve isolation of the data between tenants to have separate Data Flow server instances?

What you're attempting here is not possible today. You'd have to provision SCDF for each tenant. On cloud platforms like Kubernetes or Cloud Foundry this is the recommended approach anyway, because you can access-control the tenants through "namespace" and "org/space" isolation respectively. On that foundation, the platforms provide a more robust separation through RBAC assignments for each user in the tenant.
A bit more background on why this is the case today: SCDF and the Task/Job repositories are coupled, in the sense that the Dashboard and the other client tools interact with the same datasource to provide a consistent UX for monitoring and managing the data pipelines centrally. Even with the recent multi-platform backend support for Tasks, you're still expected to use a common datasource in the current design.
All that said, we are looking into an improvement that would allow users to have a database with schemas prefixed with an identifier [see: spring-cloud/spring-cloud-dataflow#2048]. With that in place, it would be possible to filter task/job executions by that identifier and likewise track them as isolated units of operation within a single SCDF instance.
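To make the general idea concrete, here is a rough, hypothetical sketch (this is not how SCDF works today; the TenantRoutingDataSource and CURRENT_TENANT names are made up for illustration) of routing JDBC traffic to a tenant-specific schema with Spring's AbstractRoutingDataSource:

    // Hypothetical illustration only -- not part of SCDF. Routes JDBC calls to a
    // tenant-specific DataSource (e.g. one schema per tenant) based on a thread-local key.
    import java.util.Map;
    import javax.sql.DataSource;
    import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

    public class TenantRoutingDataSource extends AbstractRoutingDataSource {

        // Thread-local holder for the current tenant identifier (made-up helper).
        public static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

        public TenantRoutingDataSource(Map<Object, Object> tenantDataSources, DataSource fallback) {
            setTargetDataSources(tenantDataSources);   // e.g. "tenantA" -> DataSource pointing at tenantA's schema
            setDefaultTargetDataSource(fallback);
            afterPropertiesSet();                      // resolves the target DataSources
        }

        @Override
        protected Object determineCurrentLookupKey() {
            // Whatever identifies the tenant for the current request or task execution.
            return CURRENT_TENANT.get();
        }
    }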
However, it may not scale well for cloud deployments. Each tenant isolation boundary, for instance a "namespace" in Kubernetes, needs enough resources (CPU/memory/disk) to handle multiple tenant deployments of task/batch apps. If you don't autoscale the resource capacity, you'll run into deployment failures.
Perhaps you could describe your requirements in more detail, so we can understand why this would still be useful. Please also share how you plan to design the resource allocation in the underlying deployment platform. Feel free to comment on #2048.

Related

Running multiple Quarkus instances on one machine

I have an application separated into various OSGi bundles which run on a single Apache Karaf instance. However, I want to migrate to a microservice framework because
Apache Karaf is pretty tough to set up due to its dependency mechanism and
I want to be able to bring the application later to the cloud (AWS, GCloud, whatever)
I did some research, had a look at various frameworks and concluded that Quarkus might be the right choice due to its container-based approach, the performance and possible cloud integration opportunities.
Now, I am struggling at one point and I haven't found a solution so far, but maybe I also have a misunderstanding here: my plan is to migrate almost every OSGi bundle of my application into a separate microservice. That way, I would be able to scale horizontally only the services for which this is necessary, and I could also update/deploy them separately without having to restart the whole application. Thus, I assume that every service needs to run in a separate Quarkus instance. However, Quarkus does not seem to support this out of the box?! Instead I would need to create a separate configuration for each Quarkus instance.
Is this really the way to go? How can the services discover each other? And is there a way for a service A to communicate with a service B not only via REST calls, but also by using objects, classes and methods of service B, with service A declaring a dependency on service B?
Thanks a lot for any ideas on this!
I think you are mixing some points between microservices and OSGi-based applications. With microservices you usually have an independent process running each microservice, which can be deployed on the same or on other machines. Because of that you can scale as you said and gain benefits. But the communication model is not process-to-process. It has to use a different approach, and it is highly recommended that you use a standard integration mechanism: you can use REST, JSON-RPC, SOAP, or queues or topics for event-driven communication. Through these mechanisms you invoke the 'other' service's operations just as you do in OSGi, but through a different interface: instead of a local invocation you do a remote invocation.
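As a rough sketch of what that remote invocation can look like in Quarkus (assuming the MicroProfile REST Client extension is on the classpath; the "service-b" config key, the /greetings path and the GreetingClient/GreetingConsumer names are made up for illustration, and newer Quarkus versions use jakarta.* instead of javax.* packages):

    // The REST client interface: calls to it go over HTTP to service B.
    import javax.enterprise.context.ApplicationScoped;
    import javax.inject.Inject;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;
    import org.eclipse.microprofile.rest.client.inject.RestClient;

    @Path("/greetings")
    @RegisterRestClient(configKey = "service-b")  // base URL comes from configuration, e.g.
                                                  // quarkus.rest-client."service-b".url=http://service-b:8080
    public interface GreetingClient {

        @GET
        String greet();
    }

    // Service A code that used to call an OSGi service now injects the REST client proxy.
    @ApplicationScoped
    class GreetingConsumer {

        @Inject
        @RestClient
        GreetingClient greetingClient;

        String consume() {
            return greetingClient.greet();   // remote invocation instead of a local method call
        }
    }

On Kubernetes, the configured base URL would typically just be the service's DNS name, which is one simple form of service discovery.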
Service discovery is something you can do with just virtual IPs, accessing other services through a common DNS name and a load balancer, or using Kubernetes DNS if you go with Kubernetes as your platform. You could also use a central configuration service, or let each service register itself in a central registry. There are already plenty of different flavours of solutions to tackle this complexity.
Also, and more importantly, you will have to be aware of your new complexities (some of which you already have):
Contract versioning and design
Synchronous or asynchronous communication between services.
How to deal with security at the boundary of the services / do I even need security in most of my services, or do I just need information about the user's identity?
Increased maintenance cost and redundant boilerplate code for common features (here Quarkus helps you a lot with its extensions and MicroProfile compatibility).
...
Deciding to go with microservices is not an easy decision and not one that should be taken in a single step. My recommendation is that you analyse your application domain and check whether your design is suited to microservices (in terms of separation of concerns and model cohesion), and extract small parts of your OSGi platform into microservices. Otherwise you will mostly be forced to make changes in your service interfaces, which, because of the service-to-service contract dependency, is more difficult than changing a method and a few invocations.

Spring Data Flow and GCP Pub/Sub

I'm building an event-driven microservice architecture, which is supposed to be Cloud agnostic (as much as possible). Since this is initially going in GCP and I don't want to spend a long time in configurations and all that, I was going to use GCP's Pub/Sub directly for the event queue and would take care of other Cloud implementations later, but then I came across Spring Cloud Dataflow, which seemed nice because these are Spring Boot microservices and I needed a way to orchestrate them.
Does Spring Cloud Data Flow support Pub/Sub as its event queue?
Would it make my life easier in terms of configuration and setup going that path, rather than choosing a non native broker?
It'd be useful first to unpack Spring Cloud Stream's "binder abstraction": by building on this framework, you get a portable event-driven streaming application that can run locally on your laptop or on any cloud of your choice, against the message broker you want.
Learn more about the binder abstraction here. Here are all the available binder implementations to choose from. Google Pub/Sub is an option, and its binder is maintained by Google here.
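To illustrate the portability, here is a minimal Spring Cloud Stream application in the functional style; the only broker-specific piece is the binder dependency on the classpath (e.g. the Google Pub/Sub binder, spring-cloud-gcp-pubsub-stream-binder), not the code:

    // Broker-agnostic Spring Cloud Stream processor: the same code works with Pub/Sub,
    // Kafka, RabbitMQ, etc., depending only on which binder is on the classpath.
    import java.util.function.Function;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    public class UppercaseProcessorApplication {

        public static void main(String[] args) {
            SpringApplication.run(UppercaseProcessorApplication.class, args);
        }

        // The function name drives the default binding names
        // (uppercase-in-0 / uppercase-out-0), which map to broker destinations.
        @Bean
        public Function<String, String> uppercase() {
            return payload -> payload.toUpperCase();
        }
    }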
Now, let's talk about Spring Cloud Data Flow (SCDF). Once you have built the streaming applications, you can use SCDF to design and create a data pipeline made of such applications. There's also the option to mix in and reuse the collection of utility applications that we build, maintain, and release. The utility applications can be packaged with the Google Pub/Sub binder or other binders. More details here.
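A pipeline can be defined with the stream DSL from the dashboard or shell, or programmatically; the snippet below is a sketch using the Java DSL from spring-cloud-dataflow-rest-client (the stream name and server URL are placeholders, and the exact API surface can differ between SCDF versions):

    // Define and deploy a simple "http | log" pipeline against a running SCDF server.
    import java.net.URI;
    import org.springframework.cloud.dataflow.rest.client.DataFlowTemplate;
    import org.springframework.cloud.dataflow.rest.client.dsl.Stream;

    public class CreateStreamExample {

        public static void main(String[] args) {
            DataFlowTemplate dataFlow = new DataFlowTemplate(URI.create("http://localhost:9393"));

            Stream httpToLog = Stream.builder(dataFlow)
                    .name("http-to-log")
                    .definition("http | log")   // apps previously registered in SCDF, bound via the configured binder
                    .create()
                    .deploy();
        }
    }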
When you deploy the data pipeline, SCDF will resolve and download the individual applications and deploy them natively on platforms like Kubernetes or Cloud Foundry. We have users doing the same on a variety of cloud infrastructure (VMs, bare metal, EC2, Rackspace, etc.), including DIY platforms.
As well as automating the deployment of the applications, SCDF automates the configuration setup based on naming conventions derived from the combination of stream/task and application names. So when the apps bootstrap, they will have automatically received the connection configuration (from SCDF), as well as the destination/topic to connect to, along with the other metadata needed to reason about a collection of apps as a "stream" or a "task/batch" data pipeline. This allows you to monitor and manage the pipelines centrally.
Lastly, there's the native ability in SCDF to rolling-upgrade/rolling-downgrade one or many applications in a data pipeline without impacting the upstream or downstream consumers in production. More details here. There's a webinar recording (the demo starts at ~41:25) on how to do this with CI/CD automation.

Multi-tenancy in Kyma

Could you please help me to understand how we can achieve multi-tenancy using Kyma?
If we want to migrate our existing cloud applications, we need to support multi-tenancy, as they all support it.
There is no special support for multi-tenancy in Kyma. It is up to the application developer to decide what pattern to use. You can have multi-tenancy on the infrastructure layer (a Kubernetes cluster per customer) or on the application layer (a single instance but separate storage per customer). A separate cluster per customer is the default solution for enterprise customers; for smaller customers, you usually share infrastructure.
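For the application-layer option, one common pattern (not something Kyma prescribes, just an illustration) is row-level isolation with a tenant discriminator column, for example via Hibernate filters; the entity, column and filter names below are made up, and the @ParamDef syntax shown is the Hibernate 5 style:

    // Every query against CustomerOrder is constrained to the current tenant once
    // the filter is enabled for the session handling that tenant's request.
    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Filter;
    import org.hibernate.annotations.FilterDef;
    import org.hibernate.annotations.ParamDef;

    @Entity
    @FilterDef(name = "tenantFilter", parameters = @ParamDef(name = "tenantId", type = "string"))
    @Filter(name = "tenantFilter", condition = "tenant_id = :tenantId")
    public class CustomerOrder {

        @Id
        private Long id;

        @Column(name = "tenant_id")
        private String tenantId;
    }

    // Per request, before running queries (entityManager is assumed to be available):
    //   entityManager.unwrap(org.hibernate.Session.class)
    //           .enableFilter("tenantFilter")
    //           .setParameter("tenantId", currentTenantId);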

How is state handled in Go Cloud?

In Terraform we get a state file, and CloudFormation also has a notion of a working state. How does Go Cloud handle state? Do we have to create it ourselves?
For more info on Go Cloud
https://github.com/google/go-cloud
https://godoc.org/github.com/google/go-cloud
Terraform wants to solve the problem of managing and provisioning Cloud services.
Go Cloud wants to solve the problem of using Cloud services in application code.
So, they work well together. For example, the Go Cloud sample guestbook app (https://github.com/google/go-cloud/tree/master/samples/guestbook) uses Terraform to provision the resources needed to run the app on various Cloud providers; the application code in the sample has a small amount of provider-specific setup code, but the application logic itself is provider-agnostic.
go-cloud:
The Go Cloud Project is an initiative that will allow application developers to seamlessly deploy cloud applications on any combination of cloud providers. It does this by providing stable, idiomatic interfaces for common uses like storage and databases. Think database/sql for cloud products.
Terraform:
Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.
So with go-cloud you could create a tool like Terraform; for now, it provides generic APIs for:
Unstructured binary (blob) storage
Variables that change at runtime (configuration)
Connecting to MySQL databases
Server startup and diagnostics: request logging, tracing, and health checking

How can you scale a Spring Boot application?

I understand that Spring Boot has a built-in Tomcat server (or Jetty) which facilitates rapid development. But what do you do when you need to scale out your application because traffic has increased?
As pointed out in the comments, there is no silver bullet here: it depends on your infrastructure, and there are several tools out there to help you; you only need to choose what works best for you.
For load balancing you can either choose something like Nginx or leave it to Spring Cloud, which also has a lot of other handy features for scaling/clustering.
Scaling shouldn't be very hard, because Spring Boot runs on its own embedded server.
Some tools that help with scaling/clustering:
Spring Boot app:
If you are going to scale, your app has to be near-stateless (e.g. you cannot have a plain scheduled task or something like that, because when you scale to x instances it gets executed x times; see the sketch after this list).
You can use the Spring Cloud projects for extra features like service discovery and other goodies that make scaling easier (e.g. when you spin up a new instance, it can easily get its config from a config server, 'register' itself to ease load balancing between services, get cluster-like behaviour, etc.).
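Regarding the scheduled-task caveat above, one common way to keep a job from firing on every replica is a distributed lock; here is a sketch assuming the ShedLock library (shedlock-spring plus shedlock-provider-jdbc-template) is on the classpath, with made-up job and lock names:

    // Only the instance that acquires the lock actually runs the job, so scaling to
    // N replicas does not mean N executions.
    import javax.sql.DataSource;
    import net.javacrumbs.shedlock.core.LockProvider;
    import net.javacrumbs.shedlock.provider.jdbctemplate.JdbcTemplateLockProvider;
    import net.javacrumbs.shedlock.spring.annotation.EnableSchedulerLock;
    import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.scheduling.annotation.EnableScheduling;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Configuration
    @EnableScheduling
    @EnableSchedulerLock(defaultLockAtMostFor = "10m")
    class SchedulingConfig {

        // ShedLock stores lock rows in a shared table (named "shedlock" by default).
        @Bean
        LockProvider lockProvider(DataSource dataSource) {
            return new JdbcTemplateLockProvider(new JdbcTemplate(dataSource));
        }
    }

    @Component
    class NightlyReportJob {

        @Scheduled(cron = "0 0 2 * * *")
        @SchedulerLock(name = "nightlyReport", lockAtLeastFor = "5m")
        void run() {
            // generate the report once per night, cluster-wide
        }
    }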
Infrastructure and containers:
Docker is a no-brainer here to handle easy launching of your applications and their replicas, if needed. If you have the resources, you can go further and use Kubernetes, but it all depends on the use case.
Use several servers (nodes), so that loads are easily distributed and you can survive the failure of one of them.
Nginx for load balancing is pretty straightforward if you don't already have something set up with Spring Cloud.
Database:
You really do NOT want to go with MySQL here, because it cannot scale as well as your Spring apps. You can choose something like Cassandra or Redis, but that would mean restructuring your data model. Maybe the least painful transition from MySQL to something NoSQL that scales is MongoDB (IMHO, Cassandra performs better).
Logging:
This can be a nightmare, but Spring also has a solution for it. Check out Zipkin and Spring Cloud Sleuth.
Also, there are a lot of resources out there that talk about architecture in general and about how you need to change your mindset when trying to run distributed services.
Hope this helps.
Update 2021-02-23
Today, Kubernetes is pretty much the de facto standard when we talk about scaling, and it is preferred because of the rich set of features you can leverage: you can focus your app purely on business domain logic and remove things like Spring Cloud for service discovery. If you can use a managed public cloud offering like EKS or GKE, you are better off, since you don't have to manage the clusters yourself.
It provides autoscaling and built-in health checks. Starting with Spring Boot 2.4, you have many added benefits for running Spring Boot on K8s, like dedicated health-check endpoints for liveness and readiness probes, graceful shutdown, etc.
On the database side, aim for something that is managed and scales easily, such as AWS Aurora.
An important thing to mention when managing Spring Boot services at scale is configuration management. A very useful solution that you can use out of the box is Consul. It enables you to hot-reload configuration, which matters when you have 50 services that would otherwise need a restart just to change one boolean flag. Depending on how big your application is, startup can be costly in terms of time as well as CPU/memory resources.
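As a sketch of what that hot reload looks like in code (assuming spring-cloud-starter-consul-config is on the classpath; the property key and controller are made up for illustration):

    // The @RefreshScope bean is rebuilt with the new value when the property changes in
    // Consul's KV store, so the flag can be flipped without restarting the service.
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.cloud.context.config.annotation.RefreshScope;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    @RefreshScope
    public class FeatureFlagController {

        // Hypothetical property key stored in Consul.
        @Value("${features.new-checkout-enabled:false}")
        private boolean newCheckoutEnabled;

        @GetMapping("/checkout/mode")
        public String checkoutMode() {
            return newCheckoutEnabled ? "new" : "legacy";
        }
    }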
