I have a database that I would like to be used by n applications.
This database sits behind a webservice, so all CRUD operations call the respective webservice methods.
I will use a ticket based application as an example, although I'd imagine this could be expanded to most types of applications.
Let's say Site A is a site where tickets and events can be displayed and sold. Site A also allows authorized and authenticated users to add/remove events and tickets.
Let's say we also have Site B - Site B can only display and sell tickets and events. It cannot add or remove tickets and events.
Both sites are using the same database and webservice.
My question is - Is this a viable approach that will scale well? Is the single database a wise approach?
I don't understand what the difference is between "sell event" and "add event". The typical approaches for database scaling are:
Separate read and write calls. Write to single DB, read from multiple replicas
Separate entities to different databases. For example, store events in one database and tickets in another one.
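The first approach (separate read and write calls) can be sketched roughly as below. This is only an illustration: the ReadWriteRouter class is hypothetical, and plain strings stand in for real database connections.

```python
import itertools


class ReadWriteRouter:
    """Send writes to one primary and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def for_write(self):
        # All writes go to the single primary database.
        return self.primary

    def for_read(self):
        # Reads rotate over the replicas (simple round-robin).
        return next(self._replicas)


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.for_write())  # connection to use for INSERT/UPDATE/DELETE
print(router.for_read())   # connection to use for SELECT
```

Because the websites only talk to the webservice, a router like this could be added inside the webservice later without the sites noticing.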
A single database is a fine solution for a lot of applications. My suggestion is not to spend too much on scaling at the beginning of your project, but to keep in mind some ways to scale it if required. It is nice that you have a single front end to your database: in the future you can add some logic there (like DB replication etc.) and the websites will keep using the same API without changes.
I am new to Sitecore and am just trying to understand its architecture/design. I am curious how the intranet and internet servers communicate, and how data flows between these two layers in on-prem and AWS EC2 environments. I have searched the web extensively and couldn't find an appropriate explanation.
I would really appreciate it if anyone could help me understand.
When you publish from CM, it puts a record in the EventQueue table in the web database.
All CD servers poll the EventQueue table for updates and then proceed.
By default, this polling happens every 2 seconds.
In short, they communicate via events in the database(s). Note: This is very simplified but seeing it this way helped me understand how the events work and troubleshoot issues.
For example, when publishing an item, the publisher (running on CM or on a dedicated role) reads its data from the master database and writes it to the web database. When done, it raises an event by writing a row in the EventQueue table in the web database. The CD server(s) pick up this event and clear the corresponding caches etc., causing a reload of that data from the web database.
All Sitecore databases have the EventQueue table, and events go to the table in different databases depending on the type of event. An event is basically just a class name and a set of serialized data. Events can be raised "locally" and "globally", indicating whether several instances should pick up the event. Think of a scenario where you have two CD servers sharing one web database: both CDs would have to pick up the event.
To keep track of which events have been processed, an "EQSTAMP" value is stored in the Properties table. It's named [database]_EQSTAMP_[InstanceName]. It's therefore essential that no two Sitecore instances share the same instance name. If not set, Sitecore will make an instance name by combining the hostname and IIS site name. The decimal value of this timestamp corresponds to the hexadecimal Stamp column in the EventQueue table.
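To make the stamp bookkeeping concrete, here is a minimal Python sketch. It is not Sitecore code: the QueuedEvent class and the helper functions are hypothetical, the event names are made up, and it assumes the Stamp column is displayed as an 8-byte big-endian hex value.

```python
from dataclasses import dataclass


@dataclass
class QueuedEvent:
    stamp: int        # ever-increasing value, like the Stamp column
    event_type: str   # stand-in for the event class name


def stamp_to_hex(value: int) -> str:
    """Decimal EQSTAMP value -> hex form as shown for the Stamp column."""
    return "0x" + value.to_bytes(8, "big").hex().upper()


def hex_to_stamp(text: str) -> int:
    """Hex Stamp value (with or without 0x prefix) -> decimal EQSTAMP."""
    return int(text, 16)


def pending_events(queue, last_stamp):
    """Events an instance has not processed yet, oldest first."""
    return sorted((e for e in queue if e.stamp > last_stamp),
                  key=lambda e: e.stamp)


queue = [
    QueuedEvent(666425, "cache-clear"),
    QueuedEvent(666428, "publish-finished"),
    QueuedEvent(666430, "cache-clear"),
]
print(stamp_to_hex(666428))
print([e.stamp for e in pending_events(queue, last_stamp=666427)])
```

When troubleshooting, a conversion like this lets you compare an instance's EQSTAMP property with the newest rows in the EventQueue table to see how far behind that instance is.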
Normally, you should never have to play with these tables yourself, but I find it good to have some insight into how they work and to keep an eye on them. They can grow in size and cause some issues. The CleanupEventQueue scheduled task is responsible for removing old processed events from the EventQueue tables. You may want to adjust the scheduling of this agent if your EventQueue grows too large between cleanups.
Note: This is the most common way of communication between the servers. Later versions of Sitecore have other techniques as well, such as Rebus.
The article "Event Queues. Why? How? When?" explains it in detail, and it also describes the pitfalls of using this mechanism in real life.
Please also be aware that the Sitecore.Link project is a good place to learn more about Sitecore functionality.
It accumulates Sitecore knowledge all around the web.
Thanks.
I have one database with one domain, but there are three websites that use this database. I want to publish my second website against the same database. Is that possible?
You might want to make sure that you're not violating the terms of service of the company that is hosting your database. Having many outside domains hitting an inside database may put stress on that server that the company is not counting on, or eat up more bandwidth than is allotted for that machine.
That said, if you set up some type of data-layer web service that you can connect to, then your other domains are not hitting the database directly. They do essentially the same thing, but in a more orderly fashion with predictable database calls. This may not be what you're looking for, but if set up correctly it could make developing against your database much easier.
It seems that in the traditional microservice architecture, each service gets its own database with a different understanding of the data (described here). Sometimes it is considered permissible for databases to duplicate data. For instance, the "Users" service might know essentially everything about a user, whereas the "Posts" service might just store primary keys and usernames (so that the author of a post can have their name displayed, for instance). This page talks about eventual consistency, sources of truth, and other related concepts when data is duplicated. I understand that microservice architectures sometimes include a shared database, but most places I look suggest that this is a rare strategy.
As for why each service typically gets its own database, all I've seen so far is "so that each service owns its own resources," but I'm not convinced that a) the service layer in any way "owns" the persisted resources accessed through the database to begin with, or that b) services even need to own the resources they require rather than accessing necessary subsets of the master resources through a shared database.
So what are some of the justifications that each service in a microservice architecture should get its own database?
There are a few reasons why it does make sense to use a separate database per micro-service. Some of them are:
Scaling
Splitting your domain into micro-services is fine. You can scale a particular micro-service up on its web server on demand, or scale it out as needed. That is obviously one of the benefits of using micro-services. More importantly, you can have micro-service-1 running on, for example, 10 servers because its traffic demands it, while micro-service-2 requires only 1 web server, so you deploy it on 1 server. The good thing is that you control this and can manage your computing resources in order to save money, as cloud providers are not cheap.
Considering this what about the database?
If you have one database for multiple services you could not do this. You could not scale the databases individually as they would be on one server.
Data partitioning to reduce size
As you split your domain into micro-services, each with its own database, you automatically split the amount of data stored in each database. Ideally, this lets you use smaller database servers with less computing power and/or RAM.
In general, paying for multiple small servers is cheaper than paying for one large one.
So in this case you could make use of this fact and save some resources as well.
If a database that is already split by domain still holds a large amount of data, techniques like data sharding or data partitioning could be applied additionally, but this is another topic.
Which db technology fits the business requirement
This is a very important argument for having multiple databases. It allows you to pick the database technology that best fits your business requirement in order to get the best performance or usage out of it. For example, a specific micro-service might have read-heavy operations with very complex filter options and a full-text search requirement. Using Elasticsearch in this case would be a good choice. Another micro-service might use SQL Server because it requires SQL-specific features such as transactional behavior. If for some reason you have one database for all services, you are stuck with that particular database technology, which might not perform well for those requirements. It is a compromise for sure.
Developer discipline
If for some reason a couple of micro-services share their database, you need to deal with the human factor. The developers would need to be disciplined enough not to cross domains and access or modify another micro-service's data (tables, collections, etc.), which would be hard to achieve and control. In large organisations with a lot of developers this could be a serious problem. With a hard/physical split this is not an issue.
Summary
There are some arguments for having a database per micro-service but also some against it. In general, the guidelines and suggestions for micro-services are to keep each micro-service and its data autonomous so that, in the ideal case, it can work independently (this is not always the case). It is definitely a compromise, as is using micro-services in general. As always, the rule is the rule, but there are exceptions to it. Micro-services architecture is flexible and very dependent on your domain needs and requirements. If you and your team identify that it makes sense to merge multiple micro-service databases into one and that it solves a lot of your problems, then go for it.
Microservices
Microservices advocate design constraints where each service is developed, deployed and scaled independently. This philosophy is only possible if you have a database per service. Ask yourself: how can I continue my business if I have a DB failure, and what steps can I take to mitigate this? The DB is an essential part of any enterprise application. I agree there are a number of challenges when each service has its own database.
Why Independent database?
Unlike other approaches, this approach not only keeps your code base clean and extendable, but also truly removes the single point of failure in your business. To achieve this, services can sometimes hold duplicated data, as long as each service is autonomous, and services can only be autonomous if there is a database per service.
From a business point of view, let's take an eCommerce application. You have microservices like Booking, Order, Payment, Recommendation, Search and so on. The database is shared. What happens if the DB is down? All your services are down, and there is no point in using a microservices architecture other than having a clean code base.
If each service has its own database, I don't mind if my recommendation service is not working: I can still search and book the order, and I haven't lost the customer. That's the whole point.
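Here is a rough Python sketch of that idea. All names are hypothetical, and plain functions stand in for calls to the real services: the page still renders when the recommendation service (or its database) is down, because it is treated as an optional dependency.

```python
def render_product_page(product_id, fetch_product, fetch_recommendations):
    # Core dependency: if this fails, the error propagates to the caller.
    product = fetch_product(product_id)
    try:
        # Optional dependency with its own database.
        recommendations = fetch_recommendations(product_id)
    except ConnectionError:
        # Degrade gracefully: show the page without recommendations.
        recommendations = []
    return {"product": product, "recommendations": recommendations}


def fetch_product(product_id):
    return {"id": product_id, "name": "Concert ticket"}


def broken_recommendations(product_id):
    raise ConnectionError("recommendation DB is down")


page = render_product_page(7, fetch_product, broken_recommendations)
print(page)
```

With a shared database, the fetch_product call would fail in the same outage, so there would be nothing left to degrade to.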
It comes with costs and challenges, but in the long run it pays off.
SQL / NoSQL
Each service has its own needs. To get the best performance I can use SQL for the payment service (transactions) and I can (I should) use NoSQL for the recommendation service. A shared database wouldn't help me in this case. In modern cloud architectures that use patterns like CQRS, Event Sourcing and Materialized Views, we sometimes use 2 different databases for the same service to get the best performance out of it.
Again, database per service is not only about resources or how much data a service should own; we really have to see the bigger picture. Yes, there are certain practices regarding how much data and duplication is good or bad, but that's another debate.
Hope that helps !
I am now trying to design the database for my microservice-oriented application in a distributed way. My application is related to the management of universities. I have different universities, say A, B and C. Each university has separate users who work with its business data. I am planning to design separate databases for separate universities for storing their user data. So each university has its own database for its users and one additional database for its application tables. If I have 2 universities, then I have 2 user-details DBs and 2 other DBs for application tables.
My confusion is that, when I search for database designs, I only see the approach of keeping one common database for storing all users (here, one DB for all users of all universities). So every user is mixed within one database.
If I use a separate database for each university, is it possible to support a distributed DB architecture pattern and the microservice-oriented standard? Or do I need to keep one DB for all users?
How can I find out which method is appropriate for a microservice / distributed database design pattern?
Actually, there could be multiple solutions and no single solution is best; the best solution is the one that is appropriate for your product's requirements.
I think it would be a better idea to go with separate databases for each of your clients (universities), to keep the data isolated even if something goes wrong. Also, with a single shared database, the database could grow so huge over time that it becomes a problem to configure and manage separate backups, cleanups, etc. for individual clients.
Now, with separate databases there comes the challenge of managing distributed transactions across databases, as you don't know which part is going to fail among many. To manage that, you may have to implement a message/event-driven mechanism across all your microservices and ensure consistency.
Regarding the message/event mechanism, here is a simple use case scenario. Suppose there are two services, "A" (user-registration) and "B" (email-service):
"A" registers a user temporarily and publishes an event requesting a confirmation email.
The message goes to the message broker.
The message is received by "B".
The confirmation email is sent to the user.
The user confirms the email to "B".
"B" publishes a user-confirmation event to the broker.
"A" receives the confirmation event and the process is completed.
The above is the best-case scenario; problems can still happen in between, even with the broker itself.
You have to go deep into this if you think you need it.
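The registration flow between "A" and "B" can be sketched in a few lines of Python. This is a toy, in-process illustration only: the Broker class, the topic names and both service classes are hypothetical, and a real message broker would add persistence, retries and failure handling that are left out here.

```python
from collections import defaultdict


class Broker:
    """Tiny in-memory stand-in for a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._subscribers[topic]:
            handler(payload)


class RegistrationService:  # service "A", owns the user data
    def __init__(self, broker):
        self.broker = broker
        self.users = {}
        broker.subscribe("email.confirmed", self.on_confirmed)

    def register(self, email):
        self.users[email] = "pending"  # temporary registration
        self.broker.publish("confirmation.requested", {"email": email})

    def on_confirmed(self, event):
        self.users[event["email"]] = "active"  # process completed


class EmailService:  # service "B", owns the outgoing mail
    def __init__(self, broker):
        self.broker = broker
        self.outbox = []
        broker.subscribe("confirmation.requested", self.on_requested)

    def on_requested(self, event):
        self.outbox.append(event["email"])  # "send" the confirmation email

    def confirm(self, email):
        # Called when the user clicks the confirmation link.
        self.broker.publish("email.confirmed", {"email": email})


broker = Broker()
service_a = RegistrationService(broker)
service_b = EmailService(broker)
service_a.register("student@example.edu")
service_b.confirm("student@example.edu")
print(service_a.users)
```

Note that neither service touches the other's data; they only react to events, which is what lets each of them keep its own database.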
Some links that may help.
http://how-to-implement-a-microservice-event-driven-architecture-with-spring-cloud-stre
A Guide to Transactions Across Microservices
I don't think this is a valid design. Using a database per client is a multi-tenant architecture practice, while a database per microservice is a microservice architecture practice. You are mixing the two up.
If you use a microservice architecture, you should design it around bounded contexts, where each context has its own database, to achieve the main rule of microservices: autonomy.
I am designing a review analysis platform with a microservices architecture.
The application works as follows:
all product reviews retrieved from ecommerce-site-a ( site-a ) as an excel file
reviews are uploaded to system with excel
Analysis agent can list all reviews, edit some of them, delete or approve
Analysis agent can export all reviews for site-a
Automated regexp based checks are applied for each review on upload and editing.
I have 3 microservices.
Reviews: Responsible for Review Crud operations plus operations similar to approve/reject..
Validations: Responsible for defining and applying validation rules on review.
Export/Import: Export service exports huge files given site name (like site-a)
The problem is that at some point the validation service needs to get all reviews for site-a, apply validation rules and generate errors if there are any. I know that sharing database schemas and entities breaks the microservices architecture.
One possible solution is:
Whenever the validation service requires reviews for a site, it sends a request to the gateway, the gateway forwards the request to the Reviews service, and the response is returned.
Two possible drawbacks of this approach are:
The validation service knows about the gateway. Does that introduce a dependency?
If I have 1 billion reviews for a site, getting all reviews via a REST request may be a problem (or not: I can make paginated requests from the validation service to the gateway).
So what is the best practice for sharing huge data between microservices without
sharing entities
and duplicating data?
I read lot about using messaging queues but I think in my case it is not good to use messaging queue to share gigabytes of data.
Edit 1: Could using data stores with a REST API, instead of sharing entities, be a solution? Assume I am using MongoDB: instead of sharing my entity object between microservices, I can use a REST interface for Mongo (http://restheart.org/) and query data whenever needed.
Your problem here is not "sharing huge data", but rather the boundaries you choose to separate your micro services based on.
I can tell from your requirements that the 3 micro services you chose to separate (Reviews, Validations, Import/Export) are actually operating on the same context and business domain .. which is Reviews.
I would encourage you to reconsider your design decision and consider Reviews, as a single micro service, that handles all reviews operations and logic as a black box.
I assume that reviews are independent from each other and that validating a review therefore requires only that review, and no other reviews.
You don't want to share entities, which rules out things like shared databases, Hadoop clusters or data stores like Redis. You also don't want to duplicate data, thereby ruling out plain file copies or trigger-based replication on database level.
In summary, I'd say your aim should be a stream. Let the Validator request everything from Reviews about Site A, not in one bulk response, but as a stream of single reviews or small packages of reviews.
The Validator can now process the reviews one after the other, at constant memory and processor consumption. To gain performance, you can run multiple instances of the Validator that pull different, disjoint pieces of the stream at the same time. Similarly, you can create multiple instances of the Reviews microservice if one alone wouldn't be able to answer the pulls fast enough.
The Validator does not persist this stream, it produces only the errors and stores or sends them somewhere; this should fulfill your no-sharing no-duplication requirements.
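As a rough illustration of the streaming idea, here is a small Python sketch. Everything in it is hypothetical: the function names, the review shape and the regular expression are made up, and an in-memory list stands in for the Reviews service. In a real setup stream_reviews would be a paginated or chunked call over the network.

```python
import re
from itertools import islice


def stream_reviews(reviews, batch_size):
    """Yield reviews in small batches instead of one bulk response."""
    iterator = iter(reviews)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical validation rule: reviews must not contain links.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def validate_stream(batches):
    """Consume one batch at a time and yield only the errors."""
    for batch in batches:
        for review in batch:
            if LINK_PATTERN.search(review["text"]):
                yield {"review_id": review["id"], "error": "contains a link"}


site_a_reviews = [
    {"id": 1, "text": "Great product"},
    {"id": 2, "text": "Buy cheaper at http://spam.example"},
    {"id": 3, "text": "Arrived late"},
]
for error in validate_stream(stream_reviews(site_a_reviews, batch_size=2)):
    print(error)
```

Only one batch is held in memory at a time, and the Validator keeps nothing but the errors, so no review data is stored twice.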