Actor granularity in Azure Service Fabric vs Project Orleans

Take a simple example: I have a service that has 1,000,000 users and each user has some profile information. I want to manage CRUD operations on this profile information using actors.
In Project Orleans, my understanding is that I would have one grain per user, so 1,000,000 virtual grains of the same actor type (that would only be created if used), and each grain would manage the profile information of a single user stored in its state. As my users grow, so do the number of grains.
In Service Fabric, if I'm interpreting the documentation right, it would work slightly differently. I would have a stateful actor type that manages CRUD operations for all users, and for scalability I would partition the actor, giving each partition responsibility for a subset of user data. Given the partition options, I can't see an obvious way to implement it in the same fine-grained way as Project Orleans.
I really like the approach in Project Orleans. Each actor handles the data for a single user, and scalability is obvious (more users equals more grains). The memory model is also simple: a single actor gets hydrated on demand with its small quantity of state.
The Service Fabric implementation seems slightly more complicated. Each actor deals with a set of users, and for scalability I have to decide in advance how many partitions to create, as this can't be modified later. As for the memory model, the amount of data managed by each actor grows as the number of users grows.
So my question is: Is my understanding correct that actors in Service Fabric are simply more coarse grained than Project Orleans?
Update
Thanks for the answers. My mistake was thinking that a partition contained a single actor instance that would hold and manage the state for all actor IDs within the partition. This is totally wrong. Michiel points out that a partition contains a number of actor instances, one per actor ID. Therefore the actors can be implemented in the same way as in Project Orleans. This makes far more sense now, thanks.

An ActorType is actually hosted in a Service. That service is partitioned. Each partition will hold a number of instances of your ActorType (according to the ranges and partition count that you specify).
Using the API you can get hold of an Actor instance (you do not have to explicitly create one):
var actor = ActorProxy.Create<IActorType>(new ActorId("some id"), "fabric:/application");
In Orleans, your grains are spread out over silos without being bundled into partitions, so Orleans can move a single instance to a different silo if it wants to. In Service Fabric this is all done at the partition level, so all instances in a partition are moved together.

I don't know much about Project Orleans but I think you may have confused the notion of an actor and an actor type within Service Fabric.
An actor is an instance of an actor type - the relationship is similar to that of classes and objects in object-oriented languages.
In your case you'd have a single actor type for users e.g. UserActor but then you'd have many actor instances of that type. Those actor instances are the ones that hold state and are partitioned and distributed.
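To make that concrete, here's a minimal sketch of what such a per-user actor could look like with the Reliable Actors .NET API. The IUserActor/UserActor names and the UserProfile type are invented for the example:

using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Runtime;

// Hypothetical profile payload for this example.
public class UserProfile
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public interface IUserActor : IActor
{
    Task<UserProfile> GetProfileAsync();
    Task SetProfileAsync(UserProfile profile);
}

// One instance of this type exists per ActorId (i.e. per user),
// much like one grain per user in Orleans.
[StatePersistence(StatePersistence.Persisted)]
internal class UserActor : Actor, IUserActor
{
    public UserActor(ActorService actorService, ActorId actorId)
        : base(actorService, actorId) { }

    // Assumes the profile has been set; a real actor would use TryGetStateAsync.
    public Task<UserProfile> GetProfileAsync() =>
        StateManager.GetStateAsync<UserProfile>("profile");

    public Task SetProfileAsync(UserProfile profile) =>
        StateManager.SetStateAsync("profile", profile);
}

A caller then addresses a specific user with ActorProxy.Create<IUserActor>(new ActorId(userId), "fabric:/application"), exactly like the snippet above, and Service Fabric activates the instance on demand - one small piece of state per user.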

Related

Is it possible to replicate tables from multiple databases in Google Cloud?

The company that I work at uses a microservices architecture with the 'database per service' pattern. This pattern makes it harder to query based on data from multiple services, since each service has its own database. Imagine a service for managing your products and one for managing stock. You would have to somehow combine the data from both services to query for products based on stock.
I know that event sourcing and API composition are potential solutions to the problem, but I was wondering if it is possible to continuously replicate specific tables from the product and stock databases based on database transaction logs. Wouldn't this be much simpler than, say, implementing an event-based solution like event sourcing? One service that I am working with contains a lot of domain events, which would make implementing and maintaining an event-based solution rather complex.
Another reason why I am considering looking at the problem from a different angle is that there is a lot of data. In-memory joins with, say, API composition will most likely be slow.
To sum it all up, I would like to know if it is possible to continuously replicate specific tables from different databases into one database.
The technologies that my company uses are primarily Spring Framework and PostgreSQL.
I would step back and ask why you have microservices (including why you have multiple databases). This is because it's quite easy to make choices that are superficially easy but which achieve that ease by negating the reason you had the microservices to begin with, and in such a situation, it may in fact be easier to just not do microservices.
For example, you might be doing microservices because you want to be able to have the team maintaining your product service be able to make changes without coordinating with the stock service or vice versa. By setting up a direct replication of a table from service A's database into service B's database, you essentially require many changes service A might want to make to that table to be coordinated with service B. It's perhaps less operationally coupled than unifying the services into a monolith, but in terms of developer velocity, you're giving up a fair amount.
Alternatively, if the rationale is to allow one service to be down (failures, maintenance, releases: doesn't matter) without taking the others down, a replication which guarantees strong consistency implies that taking service B's database down prevents service A from updating its database (because if you allowed service A to update its database in that situation, you couldn't have strong consistency).
Rather than direct replication, it might make sense to use change data capture (e.g. with Debezium) to publish a stream of changes from the transaction logs (e.g. to Kafka). The critical difference from logical replication is that the consumer can, for instance, choose to ignore updates to columns it doesn't care about: the stock service might include details like where things are stocked in a warehouse, for instance, which is data you don't need for answering a query like "show me the products in this category which are in stock". This can be a nice middle ground between going full event-sourcing and other approaches.
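As a rough illustration (not a prescription), here's what such a consumer might look like in C# with the Confluent.Kafka client, assuming Debezium's default event envelope; the topic name, column names, and connection details are all invented:

using System.Text.Json;
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    GroupId = "product-stock-view",
    AutoOffsetReset = AutoOffsetReset.Earliest
};

using var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("stockdb.public.stock_levels"); // Debezium's server.schema.table naming

while (true)
{
    var result = consumer.Consume();
    if (result.Message.Value is null) continue; // Kafka tombstone, skip

    using var change = JsonDocument.Parse(result.Message.Value);

    // Debezium puts the row's new state under payload.after; pick out only the
    // columns this read model needs and ignore the rest (e.g. warehouse location).
    var after = change.RootElement.GetProperty("payload").GetProperty("after");
    if (after.ValueKind == JsonValueKind.Null) continue; // delete event, handle separately

    var productId = after.GetProperty("product_id").GetInt64();
    var quantity = after.GetProperty("quantity").GetInt32();

    // ... upsert (productId, quantity) into the product service's own read model ...
}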

How can I divide one database into multiple databases?

I want to decompose my application to adopt a microservices architecture, and I will need to come up with a solid strategy to split my database (MySQL) into multiple small databases (MySQL) aligned with my applications.
TL;DR: It depends on the scenario and on what each service will do.
Although there is no clear answer to this, since it really depends on your needs and on what each service should do, you can come up with a general starting point (assuming you don't need to keep the existing database type).
Let's assume you have a monolithic application for an e-commerce site, and you want to split it into smaller services, each one with its own database.
The approach you could use is to create services that handle parts of the website: for example, one service for user authentication, one for orders, one for products, one for invoices, and so on...
Now, each service will have its own database, and here comes another question: which database should a specific service have? One of the advantages of this kind of architecture is that each service can have its own kind of database. For example, the products service could have a non-relational database such as MongoDB, since all it does is fetch details about products, so you don't have to manage any relations.
The orders service, on the other hand, could have a relational database, since you want to keep a relation between an order and the invoice for that order. But wait, invoices are handled by the invoice service, so how can you keep the relation between these two without sharing a database? Well, that's one of the "issues" of this approach: you have to keep services independent while also letting them communicate with each other. How can you do this? There is no clear answer here either... One approach could be to pass all invoice details to the orders service as well; another is to store just the invoice ID when saving the order and later retrieve the invoice via an API call to the invoice service (sketched below); or you could pass only the relevant invoice details to an API endpoint in the order service that stores them in a dedicated table (since most of the time you don't need the entire object), etc... The possibilities are endless...
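For instance, the "save the invoice ID, fetch on demand" option might look roughly like this sketch; the record shapes and the /invoices/{id} endpoint are invented:

using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

// The order keeps only a reference to the invoice, not the invoice itself.
public record Order(Guid Id, Guid CustomerId, Guid InvoiceId);
public record Invoice(Guid Id, decimal Total, DateTime IssuedAt);

public class InvoiceClient
{
    private readonly HttpClient _http; // base address points at the invoice service

    public InvoiceClient(HttpClient http) => _http = http;

    // Resolve the invoice through the invoice service's API instead of a DB join.
    public Task<Invoice?> GetForOrderAsync(Order order) =>
        _http.GetFromJsonAsync<Invoice>($"/invoices/{order.InvoiceId}");
}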

Distributed database design style for microservice-oriented architecture

I am trying to convert a monolithic application into a microservice-oriented architecture. On the back end I am using the Spring and Spring Boot frameworks; on the front end, Angular 2; and PostgreSQL as the database.
My confusion is this: when I design the database as distributed, splitting by functionality gives me about 5 databases - a vertical partition. I would then implement inter-microservice communication to achieve the full functionality.
The other way I am considering is to horizontally partition the current structure. My domain is educational universities, so half of the universities would go in one DB and the rest in another, with services deployed across two regions (one per set of universities).
I have currently decided to continue with the latter approach. I am new to this kind of architectural task, and a beginner in the microservice and distributed-database world. Would someone confirm that my approach will solve my issue? Can I continue with my second approach - horizontal partitioning of the databases according to domain object?
Can I continue with my second approach - horizontal partitioning of the databases according to domain object?
Temporarily, yes - if partitioning that way lets you scale your current system to meet your needs.
Now let's think about why you want to move to microservices as a development style in the first place:
Small components - easier to manage
Independently deployable - continuous delivery
Multiple languages
Code organized around business capabilities
and so on...
When moving to microservices, you should not have multiple services reading directly from each other's databases; that makes them tightly coupled.
One service should be completely ignorant of how another service designed its internal structure.
Now, if you want to move towards microservices and take full advantage of them, you should partition vertically, as you say, and have the services talk to each other.
Also, while moving towards microservices you will run into lots and lots of other problems. I tried to compile how one should start with microservices at this link.
How to separate services that read data from the same table:
First, a dummy example: we have three services - Order, Shipping, and Customer - all different microservices.
These are the ways in which multiple services may require data from the same table:
One service needs to read data from another service for things like validation.
The Order and Shipping services might need some data from the Customer service to complete their operations.
E.g. while placing an order, the caller hits the Order service API with a customer ID, and the Order service may need to validate whether it's a valid customer.
One approach: database-level exposure - not recommended - use the same customer table, which binds the Order service to the Customer service's implementation.
Another approach: call the other service to get the data.
Variation 1: call the Customer service to check whether the customer exists, fetch some customer data such as the name, and save it in the Order service.
Variation 2: do not validate while placing the order; on the OrderPlaced event, check asynchronously with the Customer service, validate, and update the state of the order if required.
I recommend calling the other service to get the data, choosing the variation based on the consistency you want (see the sketch below).
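A minimal sketch of Variation 1, with invented endpoint and type names and error handling kept to the bare minimum:

using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

// Hypothetical DTO returned by the Customer service.
public record CustomerSummary(Guid Id, string Name);

public class OrderPlacementHandler
{
    private readonly HttpClient _customerApi; // base address points at the Customer service

    public OrderPlacementHandler(HttpClient customerApi) => _customerApi = customerApi;

    public async Task PlaceOrderAsync(Guid customerId)
    {
        // Variation 1: validate the customer synchronously before accepting the order.
        var response = await _customerApi.GetAsync($"/customers/{customerId}");
        if (!response.IsSuccessStatusCode)
            throw new InvalidOperationException("Not a valid customer.");

        var customer = await response.Content.ReadFromJsonAsync<CustomerSummary>();

        // Save the order along with the denormalized customer name, so later
        // reads in the Order service don't need another cross-service call.
        // SaveOrder(customerId, customer!.Name);
    }
}

Variation 2 would move this check out of the request path and into an asynchronous handler for the OrderPlaced event instead.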
In some use cases you want a single transaction spanning data from multiple services.
E.g. deleting a customer: you might want all of that customer's orders to be deleted as well.
In this case you need to deal with eventual consistency: one service raises an event and the other service reacts accordingly.
If this answers your question, great; otherwise, specify in what kind of scenario multiple services need to call another service.
If it's still not solved, you can email me at puneetjindal.11#gmail.com and I will answer.
I have currently decided to continue with the latter approach.
If you want horizontal scalability (scaling for an increasingly large number of client connections) for your database, you may be better off with a technology that was designed to work as a scalable, distributed system, such as CockroachDB or a NoSQL database. CockroachDB, for example, has built-in data sharding and replication and lets you grow by adding server nodes as required.
when I design the database as distributed, splitting by functionality gives me about 5 databases
This sounds like you have the right general idea - split by domain functionality. Here's a link to a previous answer regarding general DB design with microservices.
In the microservices world, each microservice owns a set of functionalities and the data manipulated by those functionalities. If a microservice needs data owned by another microservice, it cannot go directly to the database maintained/owned by the other microservice; rather, it calls an API exposed by the other microservice.
Now, regarding the placement of data, there are various options - you can store data owned by a microservice in a NoSQL database like MongoDB, DynamoDB, Cassandra (it really depends on the microservice's use-case) OR you can have a different table for each micro-service in a single instance of a SQL database. BUT remember, if you choose a single instance of a SQL Database with multiple tables, then there would be no joins (basically no interaction) between tables owned by different microservices.
I would suggest you start small and then think about database scaling issues when the usage of the system grows.

MongoDB strategy for multi-company web app

I am developing a web app in Meteor, with Mongo, that will run in the cloud. Each user must belong to a company.
Each company can only access its own data.
Each user can access their own data and some data shared with other users of the same company.
Imagine 1,000 companies with 100 users each: performance and security could get very bad if I use one MongoDB database for the whole app.
So, because Mongo is "schema-less and database-less", I think I can define 1,000 DBs, say db_0001, db_0002, ..., with identically named collections, say tasks, messages, ..., so the app can be efficient and more secure (the same code for every company, and isolation of data).
Also, on the hosting side (for example with DigitalOcean), I think it's easier to distribute the DBs if they are already atomized.
Is this a good approach? Or should I not worry about it and let the hosting do this job?
Any thoughts are welcome.
You are currently only looking at one side of the coin. That's fine to start with.
Think about how you are going to display that data and what queries that translates to. Do thorough due diligence on all the potential queries. For example, how often would user/getbyid be called, and how often would you have to show a user their info and their relationships with other users? What other metadata would be required besides user info - would you have to perform a join to get it, or is it stored as an embedded document? Which fields are you going to search and sort by most? Which types of data are write-heavy and which are read-heavy?
Now let's get back to your database sharding approach. It's great that you are thinking ahead of time on this front rather than having to rewrite the component later. Data volume/storage does not worry me here. How many concurrent users the application will have, and what the primary use cases are, should be the first places to look when thinking about scale.
Additionally, you need to understand the nature of the business and the projected growth. Is it Instagram-type hyper growth, or something more predictable? A big Mongo cluster can handle thousands of concurrent read/write requests (assuming your design and queries are optimized), so that does not bother me. If you want to keep it flexible, MongoDB has a sharding mechanism: you can shard on a key and it takes care of all the fancy stuff for you (see the sketch below).
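To make the contrast concrete - the question is about Meteor/Node, but the idea is the same in any driver - here is a sketch using the MongoDB .NET driver; the database names, the companyId shard key, and the connection string are all assumptions:

using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");

// Option A (the question's idea): one database per company, same collection names.
IMongoCollection<BsonDocument> TasksFor(string companyId) =>
    client.GetDatabase($"db_{companyId}").GetCollection<BsonDocument>("tasks");

// Option B (letting Mongo do the work): one shared database whose collections are
// sharded on the tenant key, so companies spread across the cluster automatically.
// Requires a sharded cluster with sharding enabled for the "app" database.
var admin = client.GetDatabase("admin");
await admin.RunCommandAsync<BsonDocument>(new BsonDocument
{
    { "shardCollection", "app.tasks" },
    { "key", new BsonDocument("companyId", "hashed") }
});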
MongoDB gives you eventual consistency if you enable reads from secondaries (look up MongoDB and the CAP theorem); if you have a high-volume, business-critical app, be careful, because you can read out-of-date results.
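For example, with the .NET driver that trade-off is an explicit per-collection choice (names are again invented):

using MongoDB.Bson;
using MongoDB.Driver;

var db = new MongoClient("mongodb://localhost:27017").GetDatabase("app");

// Reads may hit a secondary and return slightly stale data (higher read throughput).
var eventuallyConsistent = db.GetCollection<BsonDocument>("tasks")
    .WithReadPreference(ReadPreference.SecondaryPreferred);

// Reads always go to the primary and see the latest acknowledged writes.
var upToDate = db.GetCollection<BsonDocument>("tasks")
    .WithReadPreference(ReadPreference.Primary);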
As far as hosting is concerned, DO is fine, but always have a backup in another region to maintain geographic redundancy, so that if a region goes down (hello, AWS!) you have something to fall back on.
Good luck on your project!

Microservices - Maintaining Multiple Data stores, initial data load etc

Regarding the granularity of microservices, I have read about the two-pizza rule, services that can be developed in two weeks, and so on. In the case studies of Amazon, Netflix, and Gilt we hear about hundreds of services. While the service granularity makes sense, what is still not clear to me is the data stores of each of these microservices. Won't there be just too many data stores if each service stores/maintains its own data? It might be the same logical entity, like a product or a customer, that is sliced, with the relevant portion/attributes stored/maintained by a corresponding microservice: there could be a service that maintains basic customer information and another that maintains additional customer information, such as subscription information or interests.
A couple of questions come to mind around the data stores:
Will this not be a huge maintenance issue in terms of backups, restores, etc.?
How is the initial data populated into these stores? Are there any best practices around this? Organisations are bound to have huge volumes of customer or product data, and it will most likely be mastered in other systems.
How does this approach of multiple data stores impact the 'omni-channel' approach, which implies getting a single view of all data? Organizations might have data consolidation initiatives going on to achieve the same.
Edit: Edited the subject a bit
1. Will this not be a huge maintenance issue in terms of backups, restores, etc.?
From your point of view, yes it will. At the end of the day you will not have just one database server to back up but tens or hundreds of them. But most people - at least that is what we do - use a cloud database service to get rid of all this maintenance effort.
2. How is the initial data populated into these stores? Are there any best practices around this? Organisations are bound to have huge volumes of customer or product data, and it will most likely be mastered in other systems.
I am not sure there is a single best way, but we created a client to read the data from the legacy system, convert and split it into the parts each microservice owns, and push those parts to the microservices by consuming their services. We used message queues to be sure about the health of the migration (a rough sketch is below).
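A rough sketch of that kind of migration client; the legacy row shape, the queue names, and RabbitMQ as the broker are all assumptions, not necessarily what we used:

using System.Text.Json;
using RabbitMQ.Client;

// Invented shape of a row read from the legacy system.
public record LegacyCustomer(long Id, string Name, string Email, string Subscription);

public class LegacyMigrationClient
{
    private readonly IModel _channel; // an open RabbitMQ channel

    public LegacyMigrationClient(IModel channel) => _channel = channel;

    public void Migrate(LegacyCustomer row)
    {
        // Split one legacy record into the slices each microservice owns and push
        // each slice onto that service's import queue; the queues give visibility
        // into the health of the migration.
        Publish("customer-core-import", new { row.Id, row.Name, row.Email });
        Publish("customer-subscription-import", new { row.Id, row.Subscription });
    }

    private void Publish(string queue, object payload)
    {
        var body = JsonSerializer.SerializeToUtf8Bytes(payload);
        _channel.BasicPublish(exchange: "", routingKey: queue,
                              basicProperties: null, body: body);
    }
}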
3. How does this approach of multiple data stores impact the 'omni-channel' approach, which implies getting a single view of all data?
Well, I don't know what "omni-channel" is, so I can't answer that.
Lastly, you mentioned logical entities shared between services. The hardest part of implementing microservices is defining what each service will provide, and while doing that you should carefully examine the data needs of each service; services should share as little as possible, e.g. only entity IDs. At least that is what we are doing.
