Wisdom of merging 100s of Oracle instances into one instance

Our application runs on the web; it is mostly an inquiry tool but also handles some transactions. We host the Oracle database. The app has always had a separate instance of Oracle for each customer. A customer is a company which pays us to provide our service to the company's employees, typically 10,000-25,000 employees per customer. We intend to have several hundred customers. We do a major release every few years, and migrating to that new release is challenging: we might have a team at the customer site for a couple of weeks, explaining new functionality and setting up the driving data to suit that customer.
We're considering going multi-client, putting all our customers into a single shared Oracle 11g instance on a big honkin' Windows Server 2008 server -- in order to reduce costs. I'm wondering if that's advisable.
There are some advantages to having separate instances for each customer. Tell me if these are bogus, please. Listed in my rough guess at decreasing order of importance:
Our customers MyCorp and YourCo can be migrated separately when breaking changes are made to the schema. (With multi-client, we'd be migrating 300+ customers overnight!?!)
MyCorp's data can be easily backed up and (!!!) restored, without affecting other customers.
MyCorp's data is securely separated from their competitor YourCo's data, without depending on developers to get the code right and/or DBAs getting the configuration right.
Multiple instances are lower risk, because a disaster with one customer (someone accidentally doubles everyone's salary and the error is discovered after pay day) doesn't affect other customers. A disaster that affected ALL our customers (whoops, new DBA, and suddenly every participant has the same SSN!?!) might put our company under.
Having one instance on one server presents a single point of failure, with our entire customer base out of business if a hurricane knocks the building over. Multiple instances on multiple servers permits geographic dispersion: no catastrophe will affect too large a proportion of our customers, and the unaffected servers in other regions can take on the load of the failed servers.
Performance is better because the database is smaller (10,000 vs 2,000,000 rows in ~50 tables).
If MyCorp's offices are (mostly) in just one region, then MyCorp's instance can be geographically co-located there, so network lag doesn't hurt performance. We can provide better service to global clients for the same reason.
If MyCorp wants to take their database in-house, then we can easily export their instance to get MyCorp their data.
Load-balancing is easier because instances can be placed on different servers (this is with a web farm).
When a DEV or QA instance is needed, it's easier to clone the real instance and anonymize the data, because there's much less data.
Because they're small enough, developers can have their own instance running locally, so they can work on code while waiting at the airport and while in-flight, without fighting VPN hassles.
Q1: What are other advantages of separate instances?
We are contemplating changing the database schema and merging all of our customers into one Oracle instance, running on one hefty server.
Here are advantages of the multi-client instance approach, most important first (my WAG). Please snipe if these are bogus:
Less work for the DBAs, since they only need to maintain one instance instead of hundreds. Less DBA work translates to lower costs, our main motive for this change.
With just one instance, the DBAs can do a better job of optimizing performance. They'll have time to add appropriate indexes and review our SQL.
It will be easier for developers to debug & enhance the application, because there is only one schema and one app (there might be dozens of schema versions if there are hundreds of instances, with a different version of the app for each version of the schema). This reduces costs too. The alternative is having to start every debug session with (1) What version is this customer running and (2) Let's struggle to recreate the corresponding development environment, code and database. (We need a Virtual Machine that includes the code AND database instance for each patch and release!)
Licensing Oracle is cheaper because it's priced per server irrespective of heft (or something -- I don't know anything about the subject).
The database becomes a viable persistent store for web session data, because there is just one instance.
Some database operations are easier with one multi-client instance, like finding a participant when they're hazy about which customer they (or their spouse, maybe) works for: all the names are in one table. Reporting across customers is straightforward.
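For example, the "hazy participant" lookup becomes a single query. A minimal sketch, assuming a shared participant table keyed by customer_id (table and column names are hypothetical):

SELECT customer_id, first_name, last_name
FROM participant
WHERE UPPER(last_name) = 'SMITH';
-- one statement searches every customer's people; with an instance
-- per customer, this would mean querying hundreds of databases in turn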
Q2: What are other advantages of having multiple clients in one instance?
Q3: Which approach do you think is better (why)? Instance per customer, or all customers in one instance?
I'm concerned that having one multi-client instance makes migration near-impossible, and that's a deal killer...
... unless there is a compromise solution, like having two multi-client instances, the old and the new. In that case, we would design cross-instance solutions for finding participants, reporting, etc., so customers could go from one multi-client instance to the next without anything breaking.
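To make the compromise concrete, a cross-instance participant search could be a UNION ALL over a database link (a sketch only; the link and table names are made up):

SELECT customer_id, last_name FROM participant
UNION ALL
SELECT customer_id, last_name FROM participant@old_instance;
-- where the link is created once with something like:
-- CREATE DATABASE LINK old_instance
--   CONNECT TO app_user IDENTIFIED BY secret USING 'OLD_TNS_ALIAS';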

Unless you are using Oracle XE (the limited, free edition) having one database per server will get very expensive very quickly, even if you're buying single core, single CPU boxes. Having several databases per server is inefficient, because each database incurs an overhead of CPU and RAM usage. Tuning is more difficult, because contention is harder to diagnose.
So, as well as being easier to administer, a single big server ought to work out cheaper than lots of discrete little servers (no guarantees, no money back!). Make sure you buy the biggest, fastest chips you can and as much RAM as you have free slots. Those are things which give you better performance without affecting your licensing costs.
Consider the Partitioning option, if you can afford it. This will address your concerns regarding backup and recovery, because each partition can have its own tablespace. So (given partitioning by client_id) it becomes possible to backup or restore an individual client's data without affecting the other clients. We can even export and import individual partitions. I'm surprised by David's observation that Partition pruning didn't work with VPD. But I haven't tried this combo, so I'll take his word for it.
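A minimal sketch of list partitioning by client, with one tablespace per partition (all names hypothetical; Partitioning is an extra-cost option on top of Enterprise Edition):

CREATE TABLE participant (
  client_id      NUMBER       NOT NULL,
  participant_id NUMBER       NOT NULL,
  last_name      VARCHAR2(80)
)
PARTITION BY LIST (client_id) (
  PARTITION p_mycorp VALUES (101) TABLESPACE ts_mycorp,
  PARTITION p_yourco VALUES (102) TABLESPACE ts_yourco
);
-- one client's slice can then be exported on its own, e.g. with
-- Data Pump: expdp ... TABLES=app.participant:p_mycorp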
The one thing you might lose from consolidation is the ability to support different clients on different versions of your application. However, this is not necessarily a bad thing. As you observe, maintaining several hundred customers will be a lot easier if you forgo individualised versions of the application. If you do need to offer some bespoke features - even if you just want to beta test some functionality with an individual client - then have a look at Edition-Based Redefinition in 11gR2: it is a really nifty feature. Also it is available for all Oracle licenses, not just Enterprise.
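A rough sketch of the moving parts of Edition-Based Redefinition, assuming an editions-enabled schema and hypothetical names:

CREATE EDITION release_2 AS CHILD OF ora$base;
ALTER USER app_owner ENABLE EDITIONS;        -- one-time step per schema
ALTER SESSION SET EDITION = release_2;
-- editionable objects (PL/SQL, views, synonyms) recompiled now are
-- seen only by sessions using release_2; everyone else still runs
-- the ora$base versions
CREATE OR REPLACE PROCEDURE calc_benefit AS
BEGIN
  NULL;  -- new implementation would go here
END;
/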

When you say 'separate instances', are you talking about one instance with multiple schemas on it? Or do you really mean multiple instances running on a single machine? There is little reason to run multiple instances on a single machine, as opposed to running multiple schemas on a single instance -- each schema would still have its own set of tables, indexes, etc.
Anyways, I don't have a full answer, but one thing to keep in mind is the licensing costs of Oracle, and how that can affect what the optimal solution is.
According to the Oracle store,
Oracle Standard Edition One is $5,800.00 / Processor (where on x86, a processor is a socket, and you can go up to 2 sockets)
Oracle Standard Edition is $17,500.00 / Processor (where on x86, a processor is a socket, and you can go up to 4 sockets)
Oracle Enterprise Edition is $47,500.00 / Processor (where on x86, a processor is 2 CORES - so you have to effectively double that price for quad-core CPUs)
So if, for example, you need 8 quad core CPUs to handle 100 customers, licensing that on a single database is VASTLY more expensive than having 4 separate databases, each having 2 quad core CPUs, each running 25 customers.
8 quad core CPUs requires enterprise edition, and would have a list price of 16 x $47,500 = $760,000. 4 machines, each running standard edition one, and each with 2 quad-core CPUS, would have a list price of 8 x $5,800.00 = $46,400 - a factor of 16 difference. Now, keep in mind that no one pays list price for enterprise edition, but there is still a huge difference to consider.
If you don't have a huge need for database operations across clients, and you don't need enterprise edition features, and you need this level of CPU power (or expect to grow to need this level of CPU power), the licensing costs are going to be a huge downside of the one-instance approach.

It may be worth researching Salesforce; the buzzword you're looking for is "multi-tenant architecture".
This makes a good read:
http://blog.dayspring-tech.com/2009/02/forcecom-multitenant-architecture-under-the-covers/
It's a good example because Salesforce do use an Oracle db under the covers.

Good question, glad to see you are considering all the alternatives. Lots of good points but I will stick to just addressing one.
I was the DBA for a hosted application and the developers decided to use Oracle Virtual Private Database feature for this.
The application was constructed with intention of customers sharing a pool of app servers for load balancing and a single database schema on the back end.
Before VPD, we had a Java class that tacked "where customer_id=?" or "and customer_id=?" onto every query right before it went to the database, so the customer would only see their own data. To implement this with VPD, upon login to the DB the app would set a variable in the application context, which the VPD policies then used to allow the session to see only its own records. So yes, you have to code it up right and assign VPD policies to tables, and also trust that Oracle holds up their end of the bargain.
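For reference, the moving parts looked roughly like this (a simplified sketch, not our production code; all names are made up):

-- policy function: returns the predicate Oracle appends to queries
CREATE OR REPLACE FUNCTION customer_policy (
  p_schema IN VARCHAR2,
  p_table  IN VARCHAR2
) RETURN VARCHAR2 AS
BEGIN
  RETURN 'customer_id = SYS_CONTEXT(''app_ctx'', ''customer_id'')';
END;
/
-- attach the policy to a table
BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP',
    object_name     => 'PARTICIPANT',
    policy_name     => 'participant_by_customer',
    function_schema => 'APP',
    policy_function => 'customer_policy');
END;
/
-- at login, the app sets the context value the predicate reads,
-- calling DBMS_SESSION.SET_CONTEXT('app_ctx', 'customer_id', :id)
-- from the trusted package named in CREATE CONTEXT app_ctx USING ...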
So was it good for us? In theory it was nice to offload the SQL predicate handling to something outside our application, but in practice the advantages didn't outweigh the disadvantages.
Once we had dozens of clients in one database, an upgrade meant they all had to be upgraded at the same time. We had lots of tugs-of-war with customers that didn't want to upgrade for whatever reason, or wanted to do their own QA on the new versions.
We entertained the old instance/new instance approach for upgrades, but migrating data was risky and the associated downtime did not make customers happy. We did roll our own procedure that would step through tables and export data... but it was certainly not as easy as a quickie Export or Data Pump job.
We also had issues with VPD predicate analysis when it came to Partitioning. As with a lot of Oracle features, they may work OK on their own, but once you combine them with other features things get unpredictable. For us, partitions not related to the current customer_id weren't getting eliminated, because the predicate analysis was coming too late in the processing of the SQL statement. We worked around it by changing from static to dynamic VPD policies, but our time spent parsing shot up.
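That static-to-dynamic change is just the policy_type parameter on the policy registration (a sketch, reusing the hypothetical names above):

BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP',
    object_name     => 'PARTICIPANT',
    policy_name     => 'participant_by_customer',
    policy_function => 'customer_policy',
    policy_type     => DBMS_RLS.DYNAMIC);   -- was DBMS_RLS.STATIC
END;
/
-- DYNAMIC re-runs the policy function for every statement, which is
-- why our parse overhead went up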
So after all that what is my take on it? I would have spent the time making sure our app made good use of bind variables and continued with the old mechanism that added customer_id to the SQL statement.

Oracle is made to handle that kind of load.
My question: what do you do when you have a thousand customers, and then, say, ten thousand?
Do you still keep separate instances/schema?
I doubt anyone will do that. I have worked at a place where each client had a separate database, as well as a copy at a central place.
Change management becomes a headache: you'd have to maintain very good records of which client/company is on which database revision, schema, app version, and all those things. That would become a piece of software in itself.
I'd suggest designing the software around a SaaS model, which will allow you easy maintenance and the same database/schema for all users.
For reliability, you can still use clustering - Oracle RAC.

I've had to consider the same decision a few times. In our case we use MySQL, so there is no licensing cost associated with running each customer in a separate database.
The benefits of running each of our customers on a separate database have been great. We have a script that lets us move a customer's entire instance to any server to balance load. The script merely copies over the database, copies over any custom files, spins up the application, and sets up our routing system to send users to the new instance. The whole process takes just a few minutes.
Database changes can take a very long time on large MySQL databases. Since all our clients have their own database, we are able to keep all of our datasets small. Backups are also very fast.
Our development instances behave the same way, so this method allows us to run a variety of database schemas simultaneously as we develop and test new features. We often work with customers to have them try out a new feature before we deploy it to the rest of our instances. The one rule that we stick to (in order to avoid a few of the drawbacks you mention) is that all clients must be within one version of each other. Maintaining more than a couple of versions across clients would create a huge overhead.
Facebook took the same approach when they started their company. Each school that they launched at had a separate database and they were able to set up new instances very quickly. The primary reason they finally consolidated their database was that they wanted to enable users to communicate between schools.
If not for potential cost issues I would definitely encourage you to stick with the separate database approach.

Is it possible to replicate tables from multiple databases in Google Cloud?

The company that I work at uses a microservices architecture with the 'database per service' pattern. This pattern makes it harder to query based on data from multiple services, since each service has its own database. Imagine a service for managing your products and one for managing stock. You would have to somehow combine the data from both services to query for products based on stock.
I know that event sourcing and API composition are potential solutions to the problem, but I was wondering if it is possible to continuously replicate specific tables from the product and stock databases based on database transaction logs. Wouldn't this be much simpler than, say, implementing an event-based solution like event sourcing? One service that I am working with contains a lot of domain events, which would make implementing and maintaining an event-based solution rather complex.
Another reason why I am considering looking at the problem from a different angle is that there is a lot of data. In-memory joins with, say, API composition will most likely be slow.
To sum it all up, I would like to know if it is possible to continuously replicate specific tables from different databases into one database.
The technologies that my company uses are primarily Spring Framework and PostgreSQL.
I would step back and ask why you have microservices (including why you have multiple databases). This is because it's quite easy to make choices that are superficially easy but which achieve that ease by negating the reason you had the microservices to begin with, and in such a situation, it may in fact be easier to just not do microservices.
For example, you might be doing microservices because you want to be able to have the team maintaining your product service be able to make changes without coordinating with the stock service or vice versa. By setting up a direct replication of a table from service A's database into service B's database, you essentially require many changes service A might want to make to that table to be coordinated with service B. It's perhaps less operationally coupled than unifying the services into a monolith, but in terms of developer velocity, you're giving up a fair amount.
Alternatively, if the rationale is to allow one service to be down (failures, maintenance, releases: doesn't matter) without taking the others down, a replication which guarantees strong consistency implies that taking service B's database down prevents service A from updating its database (because if you allowed service A to update its database in that situation, you couldn't have strong consistency).
Rather than direct replication, it might make sense to use change data capture (e.g. with Debezium) to publish a stream of changes from the transaction logs (e.g. to Kafka). The critical difference from logical replication is that the consumer can, for instance, choose to ignore updates to columns it doesn't care about: the stock service might include details like where things are stocked in a warehouse, for instance, which is data you don't need for answering a query like "show me the products in this category which are in stock". This can be a nice middle ground between going full event-sourcing and other approaches.
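For contrast, the direct approach the question asks about does exist natively: PostgreSQL's logical replication can continuously copy just the chosen tables. A minimal sketch, assuming wal_level = logical on the publisher and hypothetical names:

-- on the stock service's database (publisher)
CREATE PUBLICATION stock_pub FOR TABLE stock;

-- on the consolidated database (subscriber); a table named stock
-- with a matching definition must already exist here
CREATE SUBSCRIPTION stock_sub
  CONNECTION 'host=stock-db dbname=stock user=replicator password=secret'
  PUBLICATION stock_pub;

The coupling trade-offs described above apply in full to this setup: schema changes to the published table now concern both services.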

How to keep design docs in sync across per-user databases

I am building an app, for example, todoapp.
Features:
Offline-first (couchdb + pouchdb)
Multi-user
CouchDB options: require_valid_user, couch_peruser.
Ok, each user has a private database. But how can I validate docs on post/put?
design/validate_doc_update must be in every user db (userdb-{hex}).
How can I put it there, and later update it everywhere at once? Sync? A third database to replicate from? How can I replicate to all the user databases?
You have three basic options, which are all quite similar:
Set up continuous replications between a prototype database and each user's database. For large numbers of users, and low frequency of updates, this could amount to a lot of overhead for little gain.
Trigger a one-off replication between a prototype database and each user's database, every time an update occurs. This requires you to know when an update occurs in your application, and to manually handle any replication failures that may occur. This overhead may be pretty small in simple scenarios.
Have your application update the design docs every time they change, for each user database. This is sort of a manual sync option.
Which option you choose is really up to you, and is a matter of trade-offs.
If you would ever want the option to update only a subset of users (say, a beta-testing group), then options 2 or 3 are going to be your best bets.

Mongodb strategy for Multi-Company web app

I am developing a web app in Meteor, with Mongo, that will run in the cloud. Each user must belong to a Company.
Each Company can only access its own data.
Each user can access their own data and some data shared with other users of the same company.
Imagine 1,000 companies and 100 users per company: performance and security could get very bad if I use one MongoDB database for the whole app.
So, because Mongo is "schema-less and database-less", I think I can define 1,000 dbs, let's say db_0001, db_0002, ..., with the same collection names, let's say tasks, messages, ..., so the app can be efficient and more secure (same code for every Company and isolation of data).
Also, on the hosting side (let's say, for example, with Digital Ocean), I think it's easier to distribute the dbs if they are already atomized.
Is this a good approach? Or should I not worry about it and let the hosting do this job?
Any thoughts are welcome.
You are currently only looking at one side of the coin. That's fine to start with.
Think about how you are going to display that data and what queries that translates to. Do thorough due diligence on all the potential queries. For example, how often would user/getbyid be called, and how often would you have to show a user their info and their relationships with other users? What other metadata would be required besides user info, and would you have to perform a join to get that data, or is it stored as an embedded document? What fields are you going to be searching and sorting by most? Which types of data are write-heavy, and which are read-heavy?
Now let's get back to your database sharding approach. It's great that you are thinking ahead on this front rather than having to rewrite your component later. Data volume/storage does not worry me here. How many concurrent users the application will have, and what the primary use cases are, should be the first things to look at when thinking about scale.
Additionally, you need to understand the nature of the business and project its growth. Is it Instagram-type hyper growth, or something more predictable? A big Mongo cluster can handle thousands of concurrent read/write requests (assuming your design and queries are optimized), so that does not bother me. If you want to keep it flexible, MongoDB has a sharding mechanism: you can shard on a key and it takes care of all the fancy stuff for ya.
MongoDB gives you eventual consistency (look up MongoDB and the CAP theorem) if you enable reads from secondaries; if you have a high-volume, business-critical app, you need to be careful because you can read out-of-date results.
As far as hosting is concerned, DO is fine but always have a backup in another region to maintain geographic redundancy so in case if a region goes down (Hello AWS!) you have something to fall back on.
Good luck on your project!

Disadvantages of consolidating databases?

In an organization that has two applications each with its own Oracle database instance, what are the disadvantages of consolidating the two databases into one database with two schemas?
Backups and replicating the database are bigger and slower, probably. What else?
Some background:
The two databases are the "gold source" for their respective data. Each is critical to the operation of the organization, and each is actually used by several applications, tools, and reports (but each database is principally "owned" by one application). The need to join data across the databases, to relate entities in one to entities in the other, comes up frequently. For this reason there are DB links connecting the two, and some cross-database materialized views to help with performance. There is an effort underway to reduce data duplication, and these materialized views are under discussion. Some in the organization want to phase out DB links and materialized views and introduce more web services to make the data available across applications. My concern is that there are too many situations that require complex joins of data across the two databases, so services that expose the data won't perform. Another approach for reducing DB links and materialized views is to consolidate the schemas into one database, but I want to make sure I'm not forgetting any critical disadvantages to that approach.
In a single consolidated database, you will lose some flexibility from a DBA point of view:
A database obviously can have only one version (10.2.0.5, for example), which means that upgrades and patches will affect all schemas -- this may be a bad thing if multiple vendor apps have mismatched version requirements.
Similarly, some administrative tasks (restore database A to point in time t) may be more complicated with a single database.
Overall, you will have fewer administration tasks (a single backup, single patching...) but each task will be more critical, since it will have a global effect.
On the development side, beware of namespace collisions: some features are global over a single database, for example:
directories,
public synonyms,
database links,
schemas.
This means that you will have some work to do if you want to consolidate two databases that have public synonyms with the same name pointing to two different things.
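For example (hypothetical names), the second statement below fails, because public synonyms share a single database-wide namespace:

CREATE PUBLIC SYNONYM customers FOR appa.customers;
CREATE PUBLIC SYNONYM customers FOR appb.customers;
-- ORA-00955: name is already used by an existing object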
Could have something to do with licence costs - scaling up vs. scaling out.
The biggest concern I would have is that all your code will need to be rewritten, or at least reviewed, to account for the new database and schemas. This could introduce new bugs. I don't know how Oracle handles references to different databases, so I'll use an example of what I mean using SQL Server syntax. If I were joining two tables on the same server in different databases, my select would be something like this:
SELECT a.field1, b.field2
FROM database1.dbo.table1 a
JOIN database2.dbo.table2 b
ON a.myid = b.myFK
To go with your new consolidated idea, you would want to write:
SELECT a.field1, b.field2
FROM schema1.table1 a
JOIN schema2.table2 b
ON a.myid = b.myFK
You will need to be especially careful with any tables that now have the same name in both databases; this could cause some sneaky bugs.
Note these are not difficult changes, but all SQL hitting your database would have to be examined to see if it still works, and adjusted if not.
I'm not sure just putting them in the same database would do it, either. You might need to consolidate some tables to avoid duplication across applications. (In this case, add fields to reference the old id numbers, for things people are used to looking up by id, such as a person_id that may appear on old paperwork, so they can still be researched.) This is a fairly major rewrite, with all the attendant possibilities of making things worse through new bugs.
If you go down this path, I highly recommend that you read a book on refactoring databases before you decide how to design.
It's hard to tell from just the information provided. Big in the DB world would be 100 GB or more, so two DBs would be 200 GB. If both DBs are not bigger than 100 GB, then size should not be a huge factor in the decision; replication and sync can be done on changes only, and backups should not be a big difference (again, this depends on specifics such as when backups are done, whether downtime is possible, or whether backups run during non-peak times).
Other than that, the other factors are:
naming collisions in database objects such as keys, foreign key names, table names, etc.; some renaming of tables and stored procedure names will be needed too.

One database or many?

I am developing a website that will manage data for multiple entities. No data is shared between entities, but they may be owned by the same customer. A customer may want to manage all their entities from a single "dashboard". So should I have one database for everything, or keep the data separated into individual databases?
Is there a best practice? What are the positives/negatives of having:
a database for the entire site (each entity has a "customerID", data has an "entityID")
a database for each customer (data has an "entityID")
a database for each entity (the relation of database to customer is kept outside the database)
Multiple databases seems like it would have better performance (fewer rows and joins) but may eventually become a maintenance nightmare.
Personally, I prefer separate databases, specifically a database for each entity. I like this approach for the following reasons:
Smaller = faster regarding the queries.
Queries are simpler.
No risk of ever accidentally displaying one customer's data to another.
One database could pose a performance bottleneck as it gets large (as the number of entities increases). You get a sort of built-in horizontal scalability with one database per entity.
Easy data clean up as customers or entities are removed.
Sure it'll take more time to upgrade the schema, but in my experience modifications are fairly uncommon once you deploy and additions are trivial.
I think this is hard to answer without more information.
I lean on the side of one database. Properly coded business objects should prevent you from forgetting clientId in your queries.
The type of database you are using and how it scales might help you make your decision.
For schema changes down the road, it seems one database would be easier from a maintenance perspective - you have one place to make them.
What about backup and restore? Could you experience a customer wanting to restore a backup for one of their entities?
This is a fairly normal scenario in multi-tenant SaaS applications. Both approaches have their pros and cons. Search for best practices for multi-tenant SaaS (software as a service) and you will find tons of stuff to ponder.
Check out this article on Microsoft's site. I think it does a nice job of laying out the different costs and benefits associated with multi-tenant designs. Also look at the Multi-tenancy article on Wikipedia. There are many trade-offs, and your best match greatly depends on what type of product you are developing.
One good argument for keeping them in separate databases is that it's easier to scale (you can simply have multiple installations of the server, with the client databases distributed across the servers).
Another argument is that once you are logged in, you don't need to add an extra where check (for client ID) in each of your queries.
So, a master DB backed by multiple DBs, one per client, may be a better approach.
If the client would ever need to restore only a single entity from a backup and leave the others in their current state, then maintenance will be much easier with each entity in a separate database. If they can be backed up and restored together, then it may be easier to maintain the entities in a single database.
I think you have to go with the most realistic scenario and not necessarily what a customer "may" want to do in the future. If you are going to market that feature (i.e. seeing all your entities in one dashboard), then you have to either find a solution (maybe have the dashboard pull from multiple databases) or use a single database for the whole app.
IMHO, having the data for multiple clients in the same database just seems like a bad idea. You'll have to remember to always filter your queries by clientID.
It also depends on your RDBMS, e.g.:
With SQL Server, databases are cheap.
With Oracle, it is easy to partition tables by customer ("customerID"), so a single large database can run as fast as a small database for each customer.
However, whichever you choose, try to hide it at a low level in your data access code.
Do you plan to have your code deployed to multiple environments?
If so, then try to keep it within one database and have all table references prefixed with a namespace from a configuration file.
The single database option would make the maintenance much easier.
