both the inheritance and the mapping - spring

Different tenants/customers of a Spring Boot/Hibernate application want to store slightly different data fields for their employees. About 60% of the entity's fields are the same and 40% differ. There will be no queries covering multiple tenants; only one tenant is queried at a time. Typically, employees will be searched for by customer-specific fields, and single employees will be updated.
A fellow developer recommends that you use joined table inheritance mapping.
What do you think about their recommendation (both the inheritance and the mapping)? Which other options are there? How would you approach this situation?
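For reference, a minimal sketch of what that recommendation would look like in JPA/Hibernate; the entity and field names here are invented, and the imports assume jakarta.persistence (javax.persistence on older Spring Boot versions):

import jakarta.persistence.*;

// Shared base: the EMPLOYEE table holds the ~60% of fields common to all tenants.
@Entity
@Inheritance(strategy = InheritanceType.JOINED)
class Employee {
    @Id
    @GeneratedValue
    Long id;

    String name;       // common field
    String department; // common field
}

// One subclass, and therefore one extra joined table, per tenant variant.
// Hibernate joins TENANT_A_EMPLOYEE to EMPLOYEE on the shared primary key.
@Entity
class TenantAEmployee extends Employee {
    String badgeColor;   // tenant-specific field
    Integer parkingSlot; // tenant-specific field
}

One consequence worth weighing: with JOINED, every search by a customer-specific field filters on the subclass table and joins back to the base table on the primary key. That is usually cheap for single-employee lookups, but it is one join more than alternatives such as single-table inheritance or a schema per tenant would need.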

Related

What is the most performant way to add/remove join table associations for a bi-directional many-to-many association of complex entities?

Just to be clear, I'm NOT asking how to get things WORKING, but what sort of approach will have the best PERFORMANCE.
The reason I'm so focused on performance is because I'm dealing with 2 very complex and hierarchical entities. To give you a little background, the program is for tracking attendance, so the 2 entities in question are Locations (where people attend) and Reasons (whether people can attend and why). These 2 entities are in a bi-directional many-to-many relationship, each has multiple thousands of records in the database, and they are completely independent from each other aside from the mapping table that contains the associations. There can be anywhere from 1 to arbitrarily many Reasons a person may attend a Location, and anywhere from 1 to arbitrarily many Locations a Reason can be used at.
E.g. a Reason of 'person is in security group RED' is used to grant access to locations 'Lab A', 'Lab B', and 'Lab C', resulting in three entries in the join table, each with 'person is in security group RED' on the Reason side of the relationship and one of the lab Locations on the other side. Conversely, Location 'Lab A' may also admit other Reasons, such as 'security group BLUE' and 'training event X', and will thus have additional records in the join table pointing to those Reasons.
Both entities are complex in and of themselves and have many cascading associations of their own. E.g. the Reason entity has a 3-tiered cascading relationship structure of its own to maintain, and the Location entity has many other associations it's concerned with, completely independent from Reasons. I stress this to emphasize my concern about the performance impact of always pulling in all of an entity's associations just to update the join table entries.
Most of the time when a Location or Reason is being updated, the update will be completely independent from the associations they have with one another, so I don't want to worry about the Location <--> Reason join table for normal operations. However, the links between Locations and Reasons are crucial to the program's primary functions, and thus adding and removing these associations is just as crucial. So when it comes to adding/removing these associations, what is the most performant way of doing so?
I want to avoid the cascade behaviors if I can, because I don't want unrelated changes made to these entities to trigger a complex cascade save. E.g. when Location 'X' changes, I don't want to bring in its many Reasons and all their 3-tiered cascaded dependencies when I'm not changing anything related to Location 'X's Reasons at the time. I also don't want to bring in all of Location 'X's Reasons (and their dependencies) just to add/remove an entry in the Location <--> Reason join table for that Location (and conversely the same argument for the Reason).
I was wondering if there was a way to target the Location <--> Reason join table association directly when I want to add/remove such associations, but otherwise keep it pretty light when dealing with regular operations on the entities.
I.e. I still want to get basic details about the Reasons associated with a Location when I go to change other details about the Location, without cascading the save operation of that Location to its Reasons (and thus to each Reason's downstream associations), and vice versa. And when it comes time to add or remove entries in the join table for these entities, I don't want those entities to be saved directly, because then I'd have to worry about all their other data and associations getting updated.
If such a setup isn't possible with Hibernate and its nature of tracking entities rather than their associations, which association setup for these entities would have the best performance? Keep in mind that each entity already has many other cascading operations to deal with, that most operations on these entities will NOT touch the join table between them at all, and that only the targeted add/remove operations will specifically need to update JUST the join table between the Location and the Reason and nothing else.
First of all, you almost always don't want a real many-to-many association. There will come a time when you want to track additional data, like a timestamp, in these join tables, and that is when you would have to rewrite your logic. So I suggest you always try to model the join table as an entity with two many-to-one associations and, on the inverse sides, one-to-many associations. The entity for the join table has a composite id consisting of the ids of the many-to-one associations.
This enables you to manage entries manually by doing persist/merge/remove, or by using update/delete DML queries. This will perform best, and the only "downside" is that you might do work upfront that you possibly don't need, because it might in fact be a real many-to-many, although I highly doubt that.
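A minimal sketch of that shape, assuming plain JPA annotations (jakarta.persistence; javax.persistence on older stacks) and that the existing Location and Reason entities use Long ids; the class and field names are invented:

import jakarta.persistence.*;
import java.io.Serializable;
import java.util.Objects;

// Composite key: just the two foreign keys.
@Embeddable
class LocationReasonId implements Serializable {
    Long locationId;
    Long reasonId;

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LocationReasonId)) return false;
        LocationReasonId other = (LocationReasonId) o;
        return Objects.equals(locationId, other.locationId)
                && Objects.equals(reasonId, other.reasonId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(locationId, reasonId);
    }
}

// The join table modelled as a first-class entity. Adding or removing an
// association is a persist/remove of this one small row and never cascades
// into Location's or Reason's other associations.
@Entity
class LocationReason {
    @EmbeddedId
    LocationReasonId id = new LocationReasonId();

    @ManyToOne(fetch = FetchType.LAZY)
    @MapsId("locationId") // fills id.locationId from the association
    Location location;

    @ManyToOne(fetch = FetchType.LAZY)
    @MapsId("reasonId") // fills id.reasonId from the association
    Reason reason;
}

To add a link you only need two entityManager.getReference() proxies (no SELECT of either side) and a persist() of the new LocationReason; to remove one, a remove() by composite id or a bulk "delete from LocationReason" DML query. Neither path loads the 3-tiered dependency graphs of the two sides.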

Managing many companies with Oracle

My company is storing its data in an Oracle database.
It offers services to several other companies, each of them having thousands of customers placing orders.
For historical reasons, all of the data are in the same schema and, for example, all orders are in the same table, say ORDERS.
It now has performance problems, and I'd like to take this opportunity to pursue two goals:
eradicate the performance problems
separate companies' data for security concerns.
My initial thought was to use one schema for each company we serve:
schema A with a table named ORDERS containing all of the orders of company A customers
schema B with a table named ORDERS containing all of the orders of company B customers
etc.
but we have another concern: we also have some "super-companies" which can manage data for several companies. For example: "supercompany" managing orders for both companies A and B.
With this approach, is there any means to manage these "supercompanies"? I thought of declaring another schema (let's say SUPERCOMPANY) having a synonym which refers to the union of the ORDERS tables from A and B, but:
Is there any performance problem in having a synonym reference a union? Are the tables' indexes used?
What about INSERT? How do we target the appropriate table if "supercompany" wants to add an order for a customer which belongs to company A?
Should we rather use another solution, like keeping one big database and having schemas reference the appropriate partition of the big ORDERS table? (I don't even know if this is possible.)
schema DB containing the data, with a huge ORDERS table partitioned by company
schema A having a synonym referencing DB.ORDERS#partitionA
schema B having a synonym referencing DB.ORDERS#partitionB
schema SUPERCOMPANY having a synonym referencing DB.ORDERS or DB.ORDERS#partitionA and ORDERS#partitionB
It doesn't sound good to me because we shouldn't directly target partitions, should we?
I still have hope, and I'm sure Oracle has solutions for such problems since it is a major player in relational databases.
What would be your approach?
Thanks for reading.
It sounds as though partitioning would be a good fit for your performance issue.
For the security issue you could look at Virtual Private Database which is designed for exactly this type of scenario. I don’t think you would even need synonyms (you’d probably need views if you went this route) as you would set up a policy such that depending on the user account you connected as, Oracle would apply the appropriate filter automatically to all affected queries.
You might look at using services too, to give more options for monitoring performance by company.
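To make the VPD idea concrete from the application side, here is a hedged JDBC sketch. It assumes a policy has already been created on ORDERS whose policy function filters on SYS_CONTEXT('USERENV', 'CLIENT_IDENTIFIER'); the connection details and column names are made up:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class VpdExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//db-host:1521/ORCL", "app_user", "secret")) {

            // Tag the session with the tenant. The pre-existing VPD policy reads
            // this back via SYS_CONTEXT('USERENV','CLIENT_IDENTIFIER') and appends
            // the matching predicate to every query touching ORDERS.
            try (CallableStatement cs =
                     conn.prepareCall("{call DBMS_SESSION.SET_IDENTIFIER(?)}")) {
                cs.setString(1, "COMPANY_A");
                cs.execute();
            }

            // An ordinary query: Oracle adds the company filter itself, so the
            // application never mentions the tenant in its SQL.
            try (PreparedStatement ps = conn.prepareStatement(
                     "SELECT order_id, order_date FROM orders");
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " " + rs.getDate(2));
                }
            }
        }
    }
}

A "supercompany" session would then just set an identifier that the policy function maps to several companies, which sidesteps the synonym-over-union question entirely.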

Multi-tenant database. One collection or one db per tenant?

For a multi-tenancy architecture for a web application using a document-oriented database I can see two conceivable options:
Having one database per tenant, and the collections logically separate different kinds of object.
Having one collection per tenant, with all user data stored in one database and some kind of flag or object type identifier on each record.
Have there been any studies or has any documentation been produced regarding these two options and the differences between them?
Is there a particular standard or good reason why someone designing a web application which allows multiple users to store vastly different kinds of data would choose one over the other?
Aside from speed/efficiency issues, are there any other things to be said about this that would influence the decision?
EDIT I'm aware some of the terminology might be database specific, so for all wondering I am specifically referring to MongoDB.
I wouldn't want tenant-specific collections. In my application, I usually hard-code collection names, in the same way I'd hard-code table names if I were using SQL tables. There'd be one comments collection that stores all comments for a blog. I would not want to deal with collection names like comments_tenant_1 and comments_tenant_2, because (1) that feels error-prone, (2) it would make the application code more complicated (collection names would have to be replaced with functions that compute the collection name), and (3) the number of collections in a single database could grow huge, which would make a list of all collections look daunting; also, MongoDB isn't built for having very many collections (see the link David B posted in the comment below your question, https://docs.mongohq.com/use-cases/multi-tenant.html).
However, database names aren't coupled to application data structures, and you can grant permissions on databases (but not on single collections). So one database per tenant could be reasonable, as could a per-document tenant_id field in a single database for all tenants (see the above-mentioned link).
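For comparison, a small sketch of both layouts with the MongoDB Java driver (4.x sync API); the database, collection, and field names are illustrative:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class TenantAccess {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {

            // Option A: one database per tenant. The collection name stays
            // hard-coded; only the database name varies, and MongoDB permissions
            // can be granted per database.
            MongoCollection<Document> perTenantDb =
                    client.getDatabase("tenant_42").getCollection("comments");
            perTenantDb.find().forEach(doc -> System.out.println(doc.toJson()));

            // Option B: one shared database, with a tenant_id field on every
            // document that every query must filter on.
            MongoCollection<Document> shared =
                    client.getDatabase("app").getCollection("comments");
            shared.find(Filters.eq("tenant_id", 42))
                  .forEach(doc -> System.out.println(doc.toJson()));
        }
    }
}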

solr: More than one entity in DataImportHandler

I need to know the recommended solution when I want to index my Solr data using multiple queries and entities.
I ask because I have to add new fields to the schema.xml configuration, and depending on the entity (query) there should be different field definitions.
query_one = "select * from car"
query_two = "select * from user"
Tables car and user have different fields, so I should account for this little fact in my schema.xml config (when I prepare the field definitions).
Maybe some of you create a new Solr instance for that kind of problem?
I found something which is called MultiCore. Is it the right solution for my problem?
Thanks
Solr does not stop you from hosting multiple entities in a single collection.
You can define the fields for both entities and have them hosted within the same collection.
You would need to have an identifier to distinguish the entities if you want to filter the results per entity (see the sketch below).
If your collections are small, or there is a relationship between the User and Car, it might be helpful to host them within the same collection.
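To illustrate the identifier approach, here is a minimal SolrJ sketch. It assumes a core named catalog hosting both entities and a hypothetical discriminator field entity_type on every document; the query fields are made up as well:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class EntityFilterExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/catalog").build()) {

            SolrQuery query = new SolrQuery("color:red");
            // The discriminator field keeps the two entities apart at query time.
            query.addFilterQuery("entity_type:car");

            QueryResponse response = solr.query(query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc);
            }
        }
    }
}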
For Solr Multicore, check this answer:
Solr Multicore is basically a setup that allows Solr to host multiple cores.
Each of these cores can host a completely different set of unrelated entities.
You can have a separate core for each table as well.
E.g. if you have collections for Documents, People, and Stocks, which are completely unrelated entities, you would want to host them in different collections.
A multicore setup would allow you to:
Host unrelated entities separately so that they don't impact each other
Have a different configuration for each core with different behavior
Perform activities on each core differently (update data, load, reload, replication)
Keep the size of each core in check and configure caching accordingly
It's more a matter of preference and requirements.
The main question for you is whether people will search for cars and users together. If not (they are different domains), you can setup multiple collections/cores. If they are going to be used together (e.g. a search for something that shows up in both cars and people), you may want to merge them into one index.
If you do use a single collection for both types, you may want to set up dedicated request handlers returning different sets of fields and possibly tuning the searches (sketched below). You can see an example of doing that (and a bit more) in the multilingual example from my book.
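Selecting such a dedicated handler from Java is a one-liner with SolrJ; this sketch assumes a hypothetical /cars handler has been declared in solrconfig.xml with its own default field list and filters:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class RequestHandlerExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/catalog").build()) {

            SolrQuery query = new SolrQuery("color:red");
            // Route the request to the hypothetical /cars handler, whose
            // solrconfig.xml definition pins the car-specific fields and filters.
            query.setRequestHandler("/cars");

            System.out.println(solr.query(query).getResults());
        }
    }
}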

Loose coupling among objects within oracle schema

I am building an information service that manages Suppliers.
The suppliers are used by our billing system, tender system and sales system.
Though 60% of the attributes of a supplier are unique to each system, the remaining 40% of the Supplier attributes are shared across the systems.
My objective is to build a flexible system, so that a change to one individual system's data does not impact the other systems. For example, if I need to take certain tables offline to upgrade them, it should not impact the rest of the systems that need supplier information.
What is the best way of achieving this? Should all the different context-specific attributes live in one schema, but be deployed on different tablespaces?
Also, reads and updates may happen more for one set of attributes than for another. How should I logically represent them via one model, but deploy them in such a fashion that they can evolve independently?
Thank you.
First, tablespaces are a means of controlling the storage characteristics of segments; they won't help with regard to avoiding impact from changes.
I recommend you create separate child tables for each set of attributes, each with a 1:1 referential integrity constraint to a parent table. e.g.
SUPPLIERS (supplier_id PK, common attributes...)
SUPPLIER_BILLING_INFO (supplier_id PK, billing attributes...) + FK to SUPPLIERS
SUPPLIER_TENDER_INFO (supplier_id PK, tender attributes...) + FK to SUPPLIERS
SUPPLIER_SALES_INFO (supplier_id PK, sales attributes...) + FK to SUPPLIERS
Obviously they'll need to live in one instance. Whether you put them in one schema or in separate schemas is up to you.
Changes to one system should have no impact on other systems, as long as they don't all refer to all the tables (i.e. the Billing system should never access SUPPLIER_TENDER_INFO).
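If the systems consume these tables through JPA/Hibernate, that 1:1 layout maps naturally with a shared primary key. A hedged sketch, with invented attribute names and jakarta.persistence imports assumed (only the billing child is shown; the tender and sales tables follow the same pattern):

import jakarta.persistence.*;

// Parent table holding the shared attributes.
@Entity
@Table(name = "SUPPLIERS")
class Supplier {
    @Id
    @Column(name = "supplier_id")
    Long id;

    String name; // common attributes...
}

// Billing-specific attributes in their own table, keyed by the supplier's PK.
@Entity
@Table(name = "SUPPLIER_BILLING_INFO")
class SupplierBillingInfo {
    @Id
    @Column(name = "supplier_id")
    Long supplierId;

    // Derives this row's id from the Supplier it belongs to (the FK of the answer).
    @MapsId
    @OneToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "supplier_id")
    Supplier supplier;

    String bankAccount; // billing attributes...
}

Each system then maps only SUPPLIERS plus its own *_INFO entity, so a change to, say, the tender table never enters the billing system's model.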
This sounds like a very difficult question that can't be easily answered here. But I can think of a few tricks that might help you with some of your issues. It is possible to make huge changes to your data and still keep the system online.
DBMS_REDEFINITION allows you to change your table structure while other people are still using the table (although it looks very complicated).
Partitioning also allows you to change part of your table without affecting other users. For example, you can truncate just one of the partitions of a table. Partitioning also allows you to use different physical structures for the same table. For example, one partition could use a tablespace with a small block size (good for writing), and another partition could use a tablespace with a larger block size (good for reading).
