What is a well-documented caching strategy pattern for a microservice architecture dealing with legacy systems? - spring-boot

I'm building a microservices architecture that should deal with:
Direct database access
Call to external legacy services
I can think of two caching strategies, but I can't figure out which is best, considering that I will have no control over what other people do across the layers.
Caching at application level (@Cacheable)
I only provide a caching feature that everyone can use, setting spring.cache.redis.key-prefix to the microservice name to limit key conflicts.
PRO: most flexible way
CONS:
No control over the cache except for maximum space: people would just create new cache entries
No control over cache invalidation: we don't know what kind of data is actually stored, so if, for example, a legacy system needs to be reloaded, we cannot selectively empty the affected cache keys
Possible redundancy: since caching is at the application layer, several microservices could end up storing roughly the same data. While I can control the database (one MS should own its own DB, or at least a subset of tables), I can't guarantee the same for the legacy SOAP layer
Caching at service layer (connectors)
I don't provide a caching feature; instead I provide custom SOAP connectors that do or do not cache responses based on a configuration that I provide (this could also be a blacklist/whitelist)
PROS:
cache is controlled
easy to invalidate
CONS:
need to update connectors each time a cache policy changes
dependency between development and architecture
Edit: I need suggestions about the theoretical approach, not about a specific technology.

I suppose you should build different microservices (APIs) to deal with different sets of responsibilities. For example, you can have one microservice that deals with the legacy services and another that deals with the database. For these two microservices to communicate, you can use a message broker such as Apache Kafka (or RabbitMQ, or Hazelcast as a cost-effective option).
Communication between these two microservices can be event-driven as well.
Once you have decided this, you can finalize where to place your cache.
You will need to place the cache at the application level rather than the service level if there is a UI where you are showing these values.
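As a rough illustration of the connector-level strategy described in the question, here is a minimal sketch in plain Java. It assumes a whitelist of cacheable operations that comes from central configuration. CachingConnector, the legacyCall function and the operation names are hypothetical, and an in-memory map stands in for whatever cache store (Redis, for example) is actually used.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical connector wrapper: only whitelisted operations are cached.
class CachingConnector {
    private final String system;                   // name of the legacy system, used as key prefix
    private final Set<String> cacheableOperations; // whitelist, assumed to come from central config
    private final BiFunction<String, String, String> legacyCall; // stands in for the real SOAP call
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingConnector(String system,
                     Set<String> cacheableOperations,
                     BiFunction<String, String, String> legacyCall) {
        this.system = system;
        this.cacheableOperations = Set.copyOf(cacheableOperations);
        this.legacyCall = legacyCall;
    }

    // Calls the legacy system, caching the response only for whitelisted operations.
    String call(String operation, String request) {
        if (!cacheableOperations.contains(operation)) {
            return legacyCall.apply(operation, request);
        }
        String key = system + ":" + operation + ":" + request;
        return cache.computeIfAbsent(key, k -> legacyCall.apply(operation, request));
    }

    // Drops every cached response of this connector, e.g. after the legacy system is reloaded.
    void invalidateAll() {
        cache.clear();
    }
}
```

If the whitelist is read from external configuration, a cache policy change may not need a new connector release, and a reload of a legacy system only needs invalidateAll() on the connectors for that system.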

Related

How do I access data that my microservice does not own?

I have a microservice that needs some data it does not own. It needs a read-only cache of data that is owned by another service. I am looking for guidance on how to implement this.
I don't want my microservice to call another microservice. Too much of the data is used in a join for this to be successful. In addition, I don't want my service to be dependent on another service (which may be dependent on another ...).
Currently, I am publishing an event to a queue. My service then subscribes and maintains a copy of the data. I am having problems staying in sync with the source system. Plus, our DBAs are complaining about data duplication. I don't see a lot of information on this topic.
Is there a pattern for this? What is it called?
First of all, there are a couple of ways to share data, and you mention two of them.
One service calls another service to get the data when it is required. This is good because you get up-to-date data and the consuming service needs no extra management. The problem is that if you call it too often, the other service's performance may suffer.
Another solution is to maintain a local copy of that data in the consuming service using a pub/sub mechanism.
Depending on your requirements and architecture, you can keep it in the consuming service's actual database or in some type of cache (a persisted cache).
The downside here is consistency. In a distributed architecture you will not get strong consistency; you have to rely on eventual consistency.
Another solution, depending on your requirements, is to move the tables that need to be joined into a separate service. It depends on your use case.
If you still want consistency, then instead of having the first service update the data and then publish, create a mediator component that calls the two services synchronously. Here things get complicated, as you are now trying to implement a transaction over a distributed system.
One more point: building a product around a microservice architecture is not only a technical move. As an organization and as a team, you need to understand that what works in a monolith does not work the same way in microservices. The DBAs need to understand this too: in microservices, duplication of data across schemas (and of other things, such as code) is preferred over reusability.
Last but not least, if it is always necessary to call another service to get data, it is worth checking the service boundaries as well. Sometimes services need to be merged because the business functionality has to stay together.
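One common way to keep a local copy in sync under eventual consistency is to have the owning service put an increasing version number on each event, and to have the consumer ignore anything that is not newer than what it already holds. The sketch below assumes such a version field exists. CustomerChanged and CustomerReadModel are made-up names, and an in-memory map stands in for the consuming service's table or persisted cache.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical event published by the owning service.
// 'version' is assumed to increase with every change to the same customer.
final class CustomerChanged {
    final String customerId;
    final long version;
    final String name;

    CustomerChanged(String customerId, long version, String name) {
        this.customerId = customerId;
        this.version = version;
        this.name = name;
    }
}

// Read-only local copy kept by the consuming service.
final class CustomerReadModel {
    private final Map<String, CustomerChanged> latestById = new ConcurrentHashMap<>();

    // Applies the event only if it is newer than the stored one.
    // Returns false for duplicates and for out-of-order (stale) events.
    boolean apply(CustomerChanged event) {
        AtomicBoolean applied = new AtomicBoolean(false);
        latestById.compute(event.customerId, (id, current) -> {
            if (current == null || event.version > current.version) {
                applied.set(true);
                return event;
            }
            return current;
        });
        return applied.get();
    }

    // Looks up the locally known name, if any event for this customer has arrived.
    Optional<String> nameOf(String customerId) {
        return Optional.ofNullable(latestById.get(customerId)).map(c -> c.name);
    }
}
```

Because redelivered and reordered events are simply dropped, the subscriber can replay the queue from an earlier point to repair a copy that has drifted, without corrupting newer data.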

Why does each microservice get its own database?

It seems that in the traditional microservice architecture, each service gets its own database with a different understanding of the data (described here). Sometimes it is considered permissible for databases to duplicate data. For instance, the "Users" service might know essentially everything about a user, whereas the "Posts" service might just store primary keys and usernames (so that the author of a post can have their name displayed, for instance). This page talks about eventual consistency, sources of truth, and other related concepts when data is duplicated. I understand that microservice architectures sometimes include a shared database, but most places I look suggest that this is a rare strategy.
As for why each service typically gets its own database, all I've seen so far is "so that each service owns its own resources," but I'm not convinced that a) the service layer in any way "owns" the persisted resources accessed through the database to begin with, or that b) services even need to own the resources they require rather than accessing necessary subsets of the master resources through a shared database.
So what are some of the justifications that each service in a microservice architecture should get its own database?
There are a few reasons why it does make sense to use a separate database per micro-service. Some of them are:
Scaling
Splitting your domain into micro-services is fine. You can scale a particular micro-service up on its deployed web server on demand, or scale it out as needed. That is obviously one of the benefits of using micro-services. More importantly, you can have micro-service-1 running on, for example, 10 servers because its traffic demands it, while micro-service-2 requires only 1 web server, so you deploy it on 1 server. The good thing is that you control this and can manage your computing resources in order to save money, as cloud providers are not cheap.
Considering this, what about the database?
If you have one database for multiple services you could not do this. You could not scale the databases individually as they would be on one server.
Data partitioning to reduce size
As you split your domain into micro-services, each with its own database, you automatically split the amount of data stored in each database. Ideally, this lets you use smaller database servers with less computing power and/or RAM.
In general, paying for multiple small servers is cheaper than paying for one large one.
So in this case you could make use of this fact and save some resources as well.
If a database that has already been split by domain still holds a large amount of data, techniques like data sharding or data partitioning could additionally be applied, but this is another topic.
Which db technology fits the business requirement
This is a very important argument for having multiple databases. It allows you to pick the database technology that best fits your business requirement in order to get the best performance or use out of it. For example, a specific micro-service might have read-heavy operations with very complex filter options and a full-text search requirement. Using Elasticsearch in this case would be a good choice. Another micro-service might use SQL Server because it requires SQL-specific features such as transactional behavior. If for some reason you have one database for all services, you are stuck with that particular database technology, which might not perform well for those requirements. It is a compromise for sure.
Developer discipline
If for some reason a couple of micro-services share a database, you need to deal with the human factor. The developers would need the discipline not to cross domains and access or modify another micro-service's data (tables, collections, etc.), which is hard to achieve and control. In large organisations with a lot of developers this can be a serious problem. With a hard/physical split this is not an issue.
Summary
There are some arguments for having a database per micro-service but also some against it. In general, the guidelines and suggestions for micro-services are to keep each micro-service autonomous together with its data so that, in the ideal case, it can work independently (this is not always the case). It is definitely a compromise, as is using micro-services in general. As always, the rule is the rule, but there are exceptions to it. A micro-services architecture is flexible and very dependent on your domain needs and requirements. If you and your team identify that it makes sense to merge multiple micro-service databases into one, and that doing so solves a lot of your problems, then go for it.
Microservices
Microservices advocate design constraints where each service is developed, deployed and scaled independently. This philosophy is only possible if you have a database per service. How can I continue my business if I have a DB failure, and what steps can I take to mitigate this? The DB is an essential part of any enterprise application. I agree there are a number of challenges when each service has its own database.
Why Independent database?
Unlike other approaches, this approach not only keeps your code base clean and extendable, it also truly removes the single point of failure in your business. To achieve this, services can sometimes hold duplicated data as well, as long as each service stays autonomous, and services can only be autonomous if there is a database per service.
From a business point of view, let's take an eCommerce application. You have microservices like Booking, Order, Payment, Recommendation, Search and so on. The database is shared. What happens if the DB is down? All your services are down, and there is no point in using a microservices architecture other than having a clean code base.
If each service has its own database, I don't mind if my recommendation service is not working: I can still search and book the order, and I haven't lost the customer. That's the whole point.
It comes with costs and challenges, but in the long run it pays off.
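To make the point about a failing recommendation service concrete, here is a minimal sketch of degrading gracefully when an optional dependency is unavailable. Fallbacks.callOrFallback is a hypothetical helper, not part of any library; a real system would more likely combine timeouts with a circuit breaker.

```java
import java.util.function.Supplier;

// Hypothetical helper for calls to optional services.
final class Fallbacks {
    private Fallbacks() {
    }

    // Runs the remote call and returns the fallback value if it fails with a runtime exception.
    static <T> T callOrFallback(Supplier<T> remoteCall, T fallback) {
        try {
            return remoteCall.get();
        } catch (RuntimeException e) {
            // The optional dependency is down; the caller continues with a default.
            return fallback;
        }
    }
}
```

A page that shows recommendations could then ask for them with an empty list as the fallback, so search and ordering keep working while the recommendation service or its database is down.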
SQL / NoSQL
Each service has its own needs. To get the best performance I can use SQL for the payment service (transactions) and I can (and should) use NoSQL for the recommendation service. A shared database wouldn't help me in this case. With modern cloud architecture patterns like CQRS, Event Sourcing and materialized views, we sometimes use two different databases for the same service to get the performance out of it.
Again, database per service is not only about resources or how much data a service should own; we really have to see the bigger picture. Yes, there are certain practices around how much data and duplication is good or bad, but that's another debate.
Hope that helps !

Distributed caching with nhibernate orm

I am trying to implement caching in my application.
We are using an Oracle database and an ASP.NET Web API to serve data to the UI.
API calls take too long, so we are thinking of implementing caching. Our code is deployed on 2 servers behind load balancers.
How should caching be implemented?
What I am planning to implement is this:
There should be a service API on any server, and this API will store all data in memory. The UI will call our existing API, and the hit can go to any node; this API will then get the data from the new API (the cache) and serve it to the UI.
Is this architecture correct for distributed caching?
Can anyone share their experience or guidance on implementation?
You might want to check NCache. Being a distributed caching solution, it provides first-class support for sharing cache data between multiple clients, because the cache process runs autonomously outside the address space of any one application.
For your case, every web server in your load-balanced web farm will be a client of NCache and have direct access to the cache servers. All the web servers, being clients of a central caching solution, will see the same cache data through simple-to-use NCache APIs. Any modification through insert, update or delete cache operations will be immediately observable to all the web servers.
The intelligence driving NCache allows for a seamless behind-the-scenes handling of all the tasks of storing and distributing the cache data among multiple cache server nodes on which the cache instance is distributed.
Furthermore, all the caching operations are completely independent of the framework used for database content retrieval and can be applied equally well with NHibernate, EF, EF Core and, of course, ADO.NET.
You can find more information about how to integrate NCache into your web farm environment and much more by using the following link:
http://www.alachisoft.com/resources/docs/ncache/admin-guide/ncache-architecture.html

Authorisation in microservices - how to approach domain object or entity level access control using ACL?

I am currently building a microservices-based system on Java Spring Cloud. Some microservices use PostgreSQL and some of them MongoDB. REST and JMS are used for communication. The plan is to use SSO and OAuth2 for authentication.
The challenge I am facing is that authorisation has to be done at the domain object/entity level. This means some kind of ACL (Access Control List) is needed. The best practice for this kind of architecture is to avoid something like this and have coarse-grained security, probably at the application/service layer level in every microservice, but unfortunately that is not possible.
My final idea is to use Spring Security ACL and keep the ACL tables in a database shared between all microservices. The database would be accessed only by Spring infrastructure or through the Spring API. The DB schema looks stable and is unlikely to change. In this case I would simply break the rule about sharing a DB between microservices.
I considered different kinds of distributed solutions but rejected them:
One microservice with the ACL, accessed via REST - The problem is too many HTTP calls and performance degradation. I would have to extend Spring Security ACL to replace DB access with REST calls
ACL in every microservice for its own entities - Sounds quite reasonable, but imagine a case with read models of entities synchronised to other microservices, or the same entity existing in different bounded contexts (different microservices). The ACLs can become really unmanageable and a source of errors.
One microservice with ACL tables that are synchronised to other microservices as a read model. The problem is that there is no support in Spring Security ACL for MongoDB. I have seen some custom solutions on GitHub, and yes, it is doable. But when creating a new entity I have to create a record in the microservice that owns the ACL, and it is then asynchronously synchronised as a read model to the microservice owning the entity. It does not sound like an easy solution
Choose some URL-based access control on the API gateway. But I would have to modify Spring Security ACL somehow. The API gateway would have to know too much about other services. The granularity of access control is bound to the granularity of the REST API. Maybe I cannot foresee all the consequences and other problems that this approach would bring
In the end, the solution with the shared DB that I mentioned is my favorite. Actually it was the first one I disqualified, because it is a “shared” database. But after going through the possibilities, it seemed to me that this is the only one that would work. There is some additional complexity if I want to use some kind of caching, because a distributed cache would be needed.
I could really use some advice and opinions on how to approach the architecture, because this is really tricky and a lot of things can go wrong here.
Many thanks,
Lukas
I don't have a full and clear picture of your authorization requirements.
I'm assuming a correlation between authenticated users and domain object/entity permissions.
One option to consider is to define user attributes corresponding to your domain object/entity permissions, and implement an Attribute-based Access Control (ABAC) policy.
The attributes are tied to and stored with the user's identity in your repository, and retrieved when performing your authentication.
I think nowadays a Google Zanzibar based approach would be best suited for this.
This ties services closer to each other, because every ACL-related request must talk to the Zanzibar service to evaluate permissions, but Google's paper on Zanzibar describes really well how they solved the problems of latency and eventual consistency (the "new enemy" problem in this case).
This is pretty much the "Shared Database" approach, but with a problem-specific way of storing the data.
OSS implementations exist; see SpiceDB (which supports CockroachDB as a backend) or Ory Keto, for example.
A shared DB is the best option, with two data sources: RO and RW. RO is for regular usage and RW is for creating and modifying ACLs. We can also think of storing the ACL in an index server for faster lookup. A final point on speed: structure the ACL data so that it is easy to access, so that fewer transactions are needed. ACL-based data access in particular has this caveat. In a microservices approach, the way to access data subject to an ACL is to first get the data and then filter it based on the ACL
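The "first get the data, then filter it based on the ACL" step can be sketched as follows. This is only an illustration under assumptions: AclFilter is a made-up class, the ACL is reduced to read permission per entity id, and an in-memory map stands in for the read-only (RO) data source. It does not use the Spring Security ACL API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical read-side ACL check backed by a map: entity id -> user ids allowed to read.
final class AclFilter {
    private final Map<String, Set<String>> readersByEntityId;

    AclFilter(Map<String, Set<String>> readersByEntityId) {
        this.readersByEntityId = readersByEntityId;
    }

    // An entity without an ACL entry is treated as not readable (deny by default).
    boolean canRead(String userId, String entityId) {
        return readersByEntityId.getOrDefault(entityId, Set.of()).contains(userId);
    }

    // Keeps only the already fetched entity ids that the user may read, preserving order.
    List<String> visibleTo(String userId, List<String> fetchedEntityIds) {
        return fetchedEntityIds.stream()
                .filter(id -> canRead(userId, id))
                .collect(Collectors.toList());
    }
}
```

Filtering after the fetch is simple, but it makes paging awkward and reads rows that are then thrown away, which is one reason to keep the ACL lookup itself as cheap as possible.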

Java EE App Design

I am writing a Java EE application which is supposed to consume SAP BAPIs/RFC using JCo and expose them as web-services to other downstream systems. The application needs to scale to huge volumes in scale of tens of thousands and thousands of simultaneous users.
I would like to have suggestions on how to design this application so that it can meet the required volume.
It's good that you are thinking of scalability right from the design phase. Martin Abbott and Michael Fisher (of PayPal/eBay fame) lay out a framework called AKF Scale for scaling web apps. The main principle is to scale your app along 3 axes.
X-axis: Cloning of services/ data such that work can be easily distributed across instances. For a web app, this implies ability to add more web servers (clustering).
Y-axis: separation of work responsibility, action or data. So for example in your case, you could have different API calls on different servers.
Z-Axis: separation of work by customer or requester. In your case you could say, requesters from region 1 will access Server 1, requesters from region 2 will access Server 2, etc.
Design your system so that you can follow all 3 above if you need to. But when you initially deploy, you may not need to use all three methods.
You can check out the book "The Art of Scalability" by the above authors. http://amzn.to/oSQGHb
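As a small illustration of the Z-axis split described above, requests can be routed by the requester's region. The sketch below is an assumption-laden example: RegionRouter, the region names and the server names are all made up, and in practice this decision usually lives in a load balancer or gateway rather than in application code.

```java
import java.util.Map;

// Hypothetical Z-axis router: picks a server based on who is asking.
final class RegionRouter {
    private final Map<String, String> serverByRegion;
    private final String defaultServer;

    RegionRouter(Map<String, String> serverByRegion, String defaultServer) {
        this.serverByRegion = Map.copyOf(serverByRegion);
        this.defaultServer = defaultServer;
    }

    // Returns the server assigned to the region, or the default for unknown regions.
    String serverFor(String region) {
        return serverByRegion.getOrDefault(region, defaultServer);
    }
}
```

Each server behind such a router can still be cloned (X-axis) or split by API (Y-axis) later, which is why the three axes combine well.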
A final answer is not possible, but based on the information you provided this does not seem to be a problem as long as your application is stateless so that it only forwards requests to SAP and returns the responses. In this case it does not maintain any state at all. If it comes to e.g. asynchronous message handling, temporary database storage or session state management it becomes more complex. If this is true and there is no need to maintain state you can easily scale-out your application to dozens of application servers without changing your application architecture.
In my experience this is not necessarily the case when it comes to SAP integration, think of a shopping cart you want to fill based on products available in SAP. You may want to maintain this cart in your application and only submit the final cart to SAP. Otherwise you end up building an e-commerce application inside your backend.
Most important is that you reduce CPU utilization in your application to avoid a 'too-large' cluster and to reduce all kinds of I/O wherever possible, e.g. small SOAP messages to reduce network I/O.
Furthermore, I recommend designing a proper abstraction layer on top of JCo, including the JCO.PoolManager for connection pooling. You may also need a well-thought-out authorization concept if you work with a connection pool managed by only one technical user.
Just some (not well structured) thoughts...

Resources