I am using Oracle. I have two tables:
Cat (cat_ID, cat_name, cat_age, cat_strength)
Dog (dog_ID, dog_name, dog_age)
Sometimes I need to get all pets into one query result. I was thinking of creating an animal_seq sequence that can be used by both Cat and Dog tables so that they never have duplicate IDs across tables and then when joined can be easily searched/queried whatever.
Is this bad practice? If so, why? Are there better ways to design the tables (e.g. just one Animal table, or multiple inheritance)? Personally, I try to avoid inheritance due to the performance issues of joins.
Having a single sequence is totally acceptable and safe. Oracle ensures that multiple reads from a sequence always return unique values, so you shouldn't have any problems with duplicate keys.
If the schemas of CAT and DOG are truly distinct, and any additional animal entity will also be distinct, I would keep separate tables. If you are going to maintain the same information about cats, dogs, monkeys, etc., I would recommend putting them into a single ANIMAL table. You'll have to give more information about the application/database for us to know what to recommend.
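To make the shared-sequence idea concrete, here is a minimal sketch in Python with the standard sqlite3 module. SQLite has no sequences, so a plain Python counter stands in for Oracle's animal_seq.NEXTVAL; the pet names and values are made up for illustration.

```python
import itertools
import sqlite3

# Stand-in for Oracle's animal_seq.NEXTVAL: one counter shared by both tables.
animal_seq = itertools.count(start=1)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cat (cat_id INTEGER PRIMARY KEY, cat_name TEXT, "
    "cat_age INTEGER, cat_strength INTEGER)"
)
conn.execute(
    "CREATE TABLE dog (dog_id INTEGER PRIMARY KEY, dog_name TEXT, dog_age INTEGER)"
)

# Every insert draws its ID from the same counter, so IDs never collide across tables.
conn.execute("INSERT INTO cat VALUES (?, ?, ?, ?)", (next(animal_seq), "Tom", 3, 7))
conn.execute("INSERT INTO dog VALUES (?, ?, ?)", (next(animal_seq), "Rex", 5))
conn.execute("INSERT INTO cat VALUES (?, ?, ?, ?)", (next(animal_seq), "Felix", 2, 4))

# All pets in one result set; the ID alone identifies a row across both tables.
all_pets = conn.execute(
    """
    SELECT cat_id AS pet_id, 'cat' AS kind, cat_name AS name, cat_age AS age FROM cat
    UNION ALL
    SELECT dog_id, 'dog', dog_name, dog_age FROM dog
    ORDER BY pet_id
    """
).fetchall()

for pet in all_pets:
    print(pet)
```

The UNION ALL only lines up the columns the two tables have in common, which is one reason the other answers suggest a shared ANIMAL table instead.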
With your current design, what if you wanted to record birds or rabbits or any other animal? You would have to create a table for each type.
I would say use the KISS (Keep It Simple, Stupid) principle: have one table and join it to another table called ANIMAL_TYPE (animal_type_id, animal_type_name). That way you can make sure the IDs are not duplicated, and you can extend the design should you want to record other animal types.
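A rough sketch of that single-table design, again using Python's sqlite3 rather than Oracle. The column types, the sample rows, and the rabbit example are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE animal_type (
        animal_type_id   INTEGER PRIMARY KEY,
        animal_type_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE animal (
        animal_id      INTEGER PRIMARY KEY,
        animal_type_id INTEGER NOT NULL REFERENCES animal_type(animal_type_id),
        name           TEXT NOT NULL,
        age            INTEGER
    );
    INSERT INTO animal_type VALUES (1, 'cat'), (2, 'dog');
    INSERT INTO animal VALUES (1, 1, 'Tom', 3), (2, 2, 'Rex', 5);
    -- Recording a new kind of animal is a new row, not a new table.
    INSERT INTO animal_type VALUES (3, 'rabbit');
    INSERT INTO animal VALUES (3, 3, 'Thumper', 1);
    """
)

# One query returns every pet together with its type name.
pets = conn.execute(
    """
    SELECT a.animal_id, t.animal_type_name, a.name
    FROM animal a
    JOIN animal_type t ON t.animal_type_id = a.animal_type_id
    ORDER BY a.animal_id
    """
).fetchall()
print(pets)
```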
I think you are looking for a base table, animal. Then you have two sub classes cat and dog. Such a design will help you when you add information such as "owner" of that animal, or "animal observation" or whatever the purpose of your application is.
-- column types below are illustrative; adjust them to your data
create table animal(
   animal_id   number        not null
  ,animal_type char(1)       not null  -- discriminator column, for example C for cat, D for dog
  ,name        varchar2(100)
  ,age         number
  ,primary key(animal_id)
);
create table cat(
   animal_id    number not null
  ,cat_strength number
  ,primary key(animal_id)
  ,foreign key(animal_id) references animal(animal_id)
);
create table dog(
   animal_id number not null
   -- dog specific attributes here
  ,primary key(animal_id)
  ,foreign key(animal_id) references animal(animal_id)
);
As you can see, I've moved up common attributes to the base table animal, while keeping the specific attributes for the subclasses in the sub class tables.
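Here is a small runnable sketch of how the base table and sub class tables could be queried, using Python's sqlite3 as a stand-in for Oracle. The column types and sample rows are assumptions, and the dog table carries only the key because no dog specific attributes were given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE animal (
        animal_id   INTEGER PRIMARY KEY,
        animal_type TEXT NOT NULL CHECK (animal_type IN ('C', 'D')),
        name        TEXT,
        age         INTEGER
    );
    CREATE TABLE cat (
        animal_id    INTEGER PRIMARY KEY REFERENCES animal(animal_id),
        cat_strength INTEGER
    );
    CREATE TABLE dog (
        animal_id INTEGER PRIMARY KEY REFERENCES animal(animal_id)
    );
    INSERT INTO animal VALUES (1, 'C', 'Tom', 3), (2, 'D', 'Rex', 5);
    INSERT INTO cat VALUES (1, 7);
    INSERT INTO dog VALUES (2);
    """
)

# "All pets" needs no join at all: the common attributes live in the base table.
all_pets = conn.execute(
    "SELECT animal_id, animal_type, name, age FROM animal ORDER BY animal_id"
).fetchall()

# Cat specific attributes cost one join on the shared primary key.
cats = conn.execute(
    "SELECT a.name, c.cat_strength FROM animal a JOIN cat c ON c.animal_id = a.animal_id"
).fetchall()

# A sub class row without a matching base row is rejected by the foreign key.
try:
    conn.execute("INSERT INTO cat VALUES (99, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(all_pets, cats, orphan_rejected)
```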
I have searched the net but was not able to find an answer for my case. I am migrating a project to Doctrine.
What is the correct way to link an entity to another entity that contains all the "families" of a project?
The families can be for instance :
"project_status" : status1, status2, status3
"countries" : en, us, cn ...
"tags" : tag1, tag2, ...
All these values are stored in the same table in the database, and a single entity handles them.
So now I have an entity that can have, for example, several countries or tags.
In the database I have one text field for the countries and one text field for the tags, and I store the IDs of each tag or family value inside these fields.
So let's say that I have one entity called "family" and one entity called "myEntity".
What is the best way to do this?
OK, maybe I have found one way, in fact.
Instead of using my usual text field in my entity table to store the IDs of the families, I will use a join table.
So let's say that I have one table with the different lists in my project (country, file_status, and other possible lists). I will then have one entity for this table, "familyEntity".
I will create a table for the cases in which one entityA can have several values of the same family (country, for instance). In that case, I will have a many-to-many association between familyEntity and entityA for that family.
If I have another entityB that uses the family "status" with several possible values, I will also have a many-to-many association, using the same table for the association.
In the other cases, in which only one value is possible for one family, I will have a many-to-one association.
I don't know if it is the right way, but that is how it occurred to me. This solution seems OK for primary keys that are not compound.
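At the table level, the join table idea could look roughly like this. It is a sketch in Python's sqlite3, not a Doctrine mapping; the table names my_entity and my_entity_family and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE family (
        family_id    INTEGER PRIMARY KEY,
        family_name  TEXT NOT NULL,   -- which list the value belongs to
        family_value TEXT NOT NULL
    );
    CREATE TABLE my_entity (
        entity_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        status_id INTEGER REFERENCES family(family_id)  -- many-to-one: a single status
    );
    -- Join table replacing the text field that used to hold a list of IDs.
    CREATE TABLE my_entity_family (
        entity_id INTEGER NOT NULL REFERENCES my_entity(entity_id),
        family_id INTEGER NOT NULL REFERENCES family(family_id),
        PRIMARY KEY (entity_id, family_id)
    );
    INSERT INTO family VALUES
        (1, 'project_status', 'status1'),
        (2, 'countries', 'en'),
        (3, 'countries', 'us'),
        (4, 'tags', 'tag1');
    INSERT INTO my_entity VALUES (10, 'demo', 1);
    INSERT INTO my_entity_family VALUES (10, 2), (10, 3), (10, 4);
    """
)

# All countries of entity 10, resolved through the join table.
countries = [
    row[0]
    for row in conn.execute(
        """
        SELECT f.family_value
        FROM my_entity_family ef
        JOIN family f ON f.family_id = ef.family_id
        WHERE ef.entity_id = ? AND f.family_name = 'countries'
        ORDER BY f.family_value
        """,
        (10,),
    )
]
print(countries)
```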
I have a use case where I need to model reference data for e.g. different flavors of ice cream. Say I have 50 flavors of ice cream :-
20 attributes e.g. freezing-temp, creaminess will be shared across all flavors
every flavor of ice cream would have 20-30 attributes that will not be shared with other flavors e.g. :-
Strawberry ice cream might track tartness, fruit percentage etc.
Chocolate ice cream might track bitterness, cocoa level etc.
How would I model this data neatly in a database model, purely from a storage / retrieval point of view?
The options I can think of :-
1. One table per flavor. This will need 50 tables, and each table will have 20 columns that will overlap with each other, and another 20-30 attributes that will be unique to the flavor.
   Pros : models the data of each flavor quite well
   Cons : column overlap and large number of tables needed
2. One table for all flavors. This will only need 1 table, but will require 1000+ columns most of which would be empty.
   Pros : models the data of ice cream in general, quite well
   Cons : large number of columns and large amount of 'wasted' space
3. One key-value table for all flavors, with flavor Id, attribute name and attribute value.
   Pros : simplest to create and insert data
   Cons : harder to extract, not really a data model per se, difficult to form constraints for attributes, or for attributes related to other attributes
Never store a value in the wrong type.
Whatever design you choose, make sure that values are stored in their natural format. Use NUMBER, DATE, VARCHAR2, CLOB, XMLTYPE, CLOB (IS JSON), TIMESTAMP, etc. Trying to cram everything in a string will cause many problems. You lose validation, convenience, performance, and type safety.
For example, here is a common type safety problem. Imagine this simple query to find ice cream that is more than 25% fruit:
select *
from ice_cream_flavor_attribute
where attribute_name = 'Fruit Percentage'
and attribute_value > 25;
Do you see the bug? Do you see how the same query, with the same data, may work one day and fail the next with ORA-01722: invalid number?
It's difficult to write a query that forces Oracle to evaluate conditions in a specific order. Re-ordering the predicates won't help (99.9% of the time). Adding an inline view won't help (99.9% of the time). Using a CASE statement will work but not 100% of the time. Using hints will work but is tricky. Using an inline view and a ROWNUM is my preferred way of solving the problem but it looks odd and is difficult to understand.
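The failure mode can be simulated outside the database. The sketch below is plain Python, not Oracle: the two functions stand for the two orders in which the optimizer might evaluate the predicates, and float() stands in for the implicit string-to-number conversion on attribute_value. The sample rows are made up.

```python
# Everything is stored as a string, as in a naive key-value table.
rows = [
    {"attribute_name": "Fruit Percentage", "attribute_value": "30"},
    {"attribute_name": "Fruit Percentage", "attribute_value": "10"},
    {"attribute_name": "Texture", "attribute_value": "smooth"},
]


def name_first(rows):
    """Check the name before converting the value: 'smooth' is never converted."""
    return [
        r for r in rows
        if r["attribute_name"] == "Fruit Percentage"
        and float(r["attribute_value"]) > 25
    ]


def value_first(rows):
    """Convert the value before checking the name: fails on 'smooth'."""
    return [
        r for r in rows
        if float(r["attribute_value"]) > 25
        and r["attribute_name"] == "Fruit Percentage"
    ]


safe_result = name_first(rows)

try:
    value_first(rows)
    failed = False
except ValueError:  # the Python analogue of ORA-01722: invalid number
    failed = True

print(safe_result, failed)
```

Same data, same logical condition, and whether it works depends only on evaluation order, which in the database is the optimizer's choice and not yours.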
If you must use an Entity Attribute Value model (and if you have more than 1000 attributes it may be unavoidable), at least use the right types.
Don't worry about space - a null column uses at most 1 byte.
Don't worry about complaints like "but then our queries are more complicated, we always need to know which column to use!" - realistically there is almost nothing useful you can do with a value without knowing its type. Every time you read or write a value you must already be thinking about the type.
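As a sketch of what "the right types" could look like in an EAV table, here is one value column per type, with a check that exactly one of them is filled. This uses Python's sqlite3 as a stand-in for Oracle; the column names number_value, string_value and date_value and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ice_cream_flavor_attribute (
        flavor         TEXT NOT NULL,
        attribute_name TEXT NOT NULL,
        number_value   REAL,
        string_value   TEXT,
        date_value     TEXT,  -- ISO-8601 text here; Oracle would use DATE
        PRIMARY KEY (flavor, attribute_name),
        -- exactly one of the typed columns must be filled in
        CHECK ((number_value IS NOT NULL)
             + (string_value IS NOT NULL)
             + (date_value   IS NOT NULL) = 1)
    );
    INSERT INTO ice_cream_flavor_attribute (flavor, attribute_name, number_value)
        VALUES ('Strawberry', 'Fruit Percentage', 30), ('Lemon', 'Fruit Percentage', 10);
    INSERT INTO ice_cream_flavor_attribute (flavor, attribute_name, string_value)
        VALUES ('Strawberry', 'Texture', 'smooth');
    """
)

# The comparison is numeric, so no string is ever converted to a number.
fruity = conn.execute(
    """
    SELECT flavor FROM ice_cream_flavor_attribute
    WHERE attribute_name = 'Fruit Percentage' AND number_value > 25
    """
).fetchall()

# A row that fills two typed columns at once violates the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO ice_cream_flavor_attribute "
        "(flavor, attribute_name, number_value, string_value) "
        "VALUES ('Chocolate', 'Cocoa Level', 70, '70')"
    )
    mixed_types_rejected = False
except sqlite3.IntegrityError:
    mixed_types_rejected = True

print(fruity, mixed_types_rejected)
```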
I'd have one table with all the common attributes, then another for the non-shared attributes. For example:
CREATE TABLE ICE_CREAM_FLAVOR
(FLAVOR VARCHAR2(100) PRIMARY KEY,
FREEZING_TEMP NUMBER,
CREAMINESS NUMBER,
ETC VARCHAR2(25),
BLAH NUMBER);
CREATE TABLE ICE_CREAM_FLAVOR_ATTRIBUTE
(ID_ICF_ATTRIBUTE NUMBER, -- should be populated by an insert trigger
FLAVOR VARCHAR2(100)
NOT NULL
REFERENCES ICE_CREAM_FLAVOR(FLAVOR),
ATTRIBUTE_NAME VARCHAR2(25),
ATTRIBUTE_VALUE VARCHAR2(100));
Your mileage may vary.
Share and enjoy.
I would suggest creating three different tables:
Ice Cream Flavor: stores all the flavors of ice cream. This is the icecream_flavor_master table. If you have 50 flavors, it will hold 50 rows, such as Strawberry, Chocolate, etc.
Ice Cream Attributes: stores all the attributes of ice cream. This is the icecream_attribute_master table. If you have 50 attributes, it will hold 50 rows, such as tartness, bitterness, fruit percentage, cocoa level, etc.
Ice Cream Flavor Attributes: stores the primary keys of icecream_flavor_master and icecream_attribute_master, to relate each flavor to its attributes.
Let me know if you need further information.
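A minimal sketch of those three tables in Python's sqlite3. The master table names come from the answer; the relation table name icecream_flavor_attribute, the column names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE icecream_flavor_master (
        flavor_id   INTEGER PRIMARY KEY,
        flavor_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE icecream_attribute_master (
        attribute_id   INTEGER PRIMARY KEY,
        attribute_name TEXT NOT NULL UNIQUE
    );
    -- Relation table: which attributes apply to which flavor.
    CREATE TABLE icecream_flavor_attribute (
        flavor_id    INTEGER NOT NULL REFERENCES icecream_flavor_master(flavor_id),
        attribute_id INTEGER NOT NULL REFERENCES icecream_attribute_master(attribute_id),
        PRIMARY KEY (flavor_id, attribute_id)
    );
    INSERT INTO icecream_flavor_master VALUES (1, 'Strawberry'), (2, 'Chocolate');
    INSERT INTO icecream_attribute_master VALUES
        (1, 'tartness'), (2, 'fruit percentage'), (3, 'bitterness'), (4, 'cocoa level');
    INSERT INTO icecream_flavor_attribute VALUES (1, 1), (1, 2), (2, 3), (2, 4);
    """
)

# Attributes tracked for Strawberry, resolved through the relation table.
strawberry_attributes = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.attribute_name
        FROM icecream_flavor_attribute fa
        JOIN icecream_flavor_master f ON f.flavor_id = fa.flavor_id
        JOIN icecream_attribute_master a ON a.attribute_id = fa.attribute_id
        WHERE f.flavor_name = 'Strawberry'
        ORDER BY a.attribute_name
        """
    )
]
print(strawberry_attributes)
```

Note that this only records which attributes a flavor has; storing the values themselves would still need typed value columns on the relation table.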
You might be able to group flavors into classes of flavors, ones that share certain attributes. This lends itself to classes and subclasses that extend other classes.
If you want to do ER modeling on this, look up "generalization/specialization" on the web. Some websites will call this a feature of "Extended ER modeling" or EER.
If you want to design relational tables to implement the ER design, look into two patterns: Single Table Inheritance and Class Table Inheritance.
https://stackoverflow.com/tags/single-table-inheritance/info
https://stackoverflow.com/tags/class-table-inheritance/info
Also, look into Martin Fowler's treatment on this subject on the web, or in one of his textbooks.
This is what big vendors do for huge data volumes in ECM (enterprise content management), where you have a quite similar scenario (many custom classes with custom attributes of various types, some of which may be shared):
One key-value table for all flavors, with flavor Id, attribute name and attribute value.
They use one key-value table per type (string, number, date etc.).
For performance optimization, they allow you to define dedicated tables for individual attributes, in order to keep each index small and not crowded with other attributes.
Dedicated tables make sense for:
Massive usage (having many rows)
Bad histograms (like flags)
Otherwise the Oracle index can be defeated, and a full table scan ends up being the fastest access path, which would be really bad.
So think about performance early when you have huge amounts of data.
My question may seem rather general, but the only answers I have found so far are from SO itself. I have a table of customer information with 47 fields, some of which are optional. I would like to split that table into two: customer_info and customer_additional_info. One of its columns stores a file in byte format. Is there any advantage to splitting the table? I have read that the JOIN will slow down query execution. What are the pros and cons of splitting a table into two?
I don't see much advantage in splitting the table unless some of the columns are very infrequently accessed and fairly large. There's a theoretical advantage to keeping rows small as you're going to get more of them in a cached block, and you improve the efficiency of a full table scan and of the buffer cache. Based on that I'd be wary of storing this file column in the customer table if it was more than a very small size.
Other than that, I'd keep it in a single table.
I can think of only 2 arguments in favor of splitting the table:
If all the columns in customer_additional_info are related, you could potentially get the benefit of additional declarative data integrity that you couldn't get with a single table. For instance, let's say your additional table was CustomerAddress. Your business logic may dictate that a customer address is optional, but once you have a customer Zip code, the AddressL1, City and State become required fields. You could set these columns to NOT NULL if they lived in a CustomerAddress table. You couldn't do that if they lived directly in the customer table.
If you were doing some object-relational mapping and you had a customer class with many subclasses and you didn't want to use Single Table Inheritance. Sometimes STI creates problems when you have similar properties of various subclasses that require different storage layouts. Since all subclasses have to use the same table, you might have name clashes. The alternative is Class Table Inheritance, where you have a table for the superclass and an additional table for each subclass. This is a similar scenario to the one you described in your question.
As for cons, the join makes things harder and slower. You also run the risk of accidentally creating a one-to-many relationship, i.e. you create two addresses in the CustomerAddress table and now you don't know which one is valid.
EDIT:
Let me explain the declarative ref integrity point further.
If your business rules are such that a customer address is optional, and you embed addressL1, addressL2, City, State, and Zip in your customer table, you would need to make each of these fields Nullable. That would allow someone to insert a customer with a City but no state. You could write a table level check constraint to cover this situation. But that isn't as easy as simply setting the AddressL1, City, State and Zip columns in the CustomerAddress table not nullable. To be clear, I am NOT advocating using the multi-table approach. However you asked for Pros and Cons, and I'm just pointing out this aspect falls on the pro side of the ledger.
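To illustrate the point, here is a sketch of the two-table variant in Python's sqlite3 (a stand-in for Oracle; table names, column types and sample data are assumptions). The NOT NULL columns enforce the "all or nothing" address rule, and making customer_id the primary key of the address table guards against the accidental one-to-many mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_address (
        -- primary key on customer_id: at most one address row per customer
        customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
        address_l1  TEXT NOT NULL,
        address_l2  TEXT,
        city        TEXT NOT NULL,
        state       TEXT NOT NULL,
        zip         TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO customer_address
        VALUES (1, '1 Main St', NULL, 'Springfield', 'IL', '62701');
    """
)


def try_insert(sql, params):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A city without a state is rejected by the NOT NULL constraint.
city_without_state = try_insert(
    "INSERT INTO customer_address (customer_id, address_l1, city, zip) "
    "VALUES (?, ?, ?, ?)",
    (2, "2 Oak Ave", "Shelbyville", "62565"),
)

# A second address for the same customer is rejected by the primary key.
second_address = try_insert(
    "INSERT INTO customer_address VALUES (?, ?, ?, ?, ?, ?)",
    (1, "9 Elm Rd", None, "Springfield", "IL", "62702"),
)

# The address stays optional: Bob simply has no row in customer_address.
customers_without_address = conn.execute(
    """
    SELECT c.name
    FROM customer c
    LEFT JOIN customer_address a ON a.customer_id = c.customer_id
    WHERE a.customer_id IS NULL
    """
).fetchall()

print(city_without_state, second_address, customers_without_address)
```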
I second what David Aldridge said, I'd just like to add a point about the file column (presumably BLOB)...
BLOBs are stored up to approx. 4000 bytes in-line (that is, in the row itself). If a BLOB is used rarely, you can specify DISABLE STORAGE IN ROW to store it out-of-line, removing the "cache pollution" without the need to split the table.
But whatever you do, measure the effects on realistic amounts of data before you make the final decision.
I am working on a web application where I have 90 fields for a Person class, divided into family details, education details, personal details, etc.
I want a separate form for each group: for example, the family details form has father name, mother name, siblings, etc., and so on for the others.
I want a separate table for each group of details, with a common reference ID across all tables.
My question is: how many bean classes should I write? With one bean class, can I map from multiple forms to multiple tables?
class PersonRegister {
    private Long iD;
    private String emailID;
    private String password;
    // ... (remaining fields elided)
} // used for registration
Once a user is logged in, I need to maintain his/her details.
Either
class person{
}
or
class PersonFamilyDetails{}
class PersonEducationDetails{}
etc
Which way do software development standards recommend?
Don't go overboard. I believe that in your case a single but very wide table (i.e. one with a lot of columns) would be the most efficient and the simplest from a maintenance perspective. The only thing to keep in mind is to query only the necessary subset of columns/fields when loading lots of rows. Otherwise you'll be fetching kilobytes of unnecessary data that the particular use case does not need.
Unfortunately Hibernate doesn't have direct support for that. When designing a mapping for Person, you'll end up with a huge class and, even worse, Hibernate will always fetch all simple columns (and many-to-one relationships). You can however overcome this problem either by creating several views in the database containing only a subset of columns or by having several Java classes mapping to the same table but only to a subset of columns.
Splitting your database model into several tables is beneficial only if your schema is not normalized. E.g. when storing a sibling's first name and last name, you may wish to have a separate Sibling table, so that the next time some other family member is entered you can reuse the same row. This makes the database smaller and might be faster when searching by sibling.
Your question comes down to database normalization, as described in-depth by Boyce and Codd, see
http://en.wikipedia.org/wiki/Database_normalization.
The main advantage of database normalization is avoiding modification anomalies. In your case, if you have one table that stores, for each person, e.g. father-firstname and father-lastname, and you have multiple people with the same father, this data will be duplicated. When you discover a typo in the father-lastname, you could fix it for one sibling and miss it for the next.
In this simplified case, database design best practices would call for normalizing into a separate table with father-id, father-firstname and father-lastname, with your person table having a many-to-one relation to it.
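A small sketch of that normalized layout, using Python's sqlite3. The table and column names and the sample people are made up; the point is only that the typo is fixed in one row and every sibling sees the correction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE father (
        father_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        father_id  INTEGER REFERENCES father(father_id)  -- many persons, one father
    );
    INSERT INTO father VALUES (1, 'John', 'Smyth');       -- typo in the last name
    INSERT INTO person VALUES (1, 'Ann', 1), (2, 'Ben', 1);
    -- The typo is corrected once, in a single row.
    UPDATE father SET last_name = 'Smith' WHERE father_id = 1;
    """
)

siblings = conn.execute(
    """
    SELECT p.first_name, f.last_name
    FROM person p
    JOIN father f ON f.father_id = p.father_id
    ORDER BY p.person_id
    """
).fetchall()
print(siblings)
```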
For one-to-one relations, e.g. person->personeducationdetails, there's some debate. In the original definition of 1st Normal Form, every optional field would be normalized by putting it in its own table. This was later weakened by introducing 'null' in relational databases, see http://en.wikipedia.org/wiki/First_normal_form#cite_note-CoddRule-12. But still, if a whole set of columns could be null at the same time, you put them in a separate table with a one-to-one relation.
E.g. if you don't know a person's educationdetails, all of its related fields are null, so you had better split them off into a separate table, and simply not have a personeducationdetails record for that person.
So, this is a bit complicated: I have two tables, say cats and dogs.
They are in a many-to-many relationship (could be called friendships or whatever), so that Doctrine automatically creates a table cats_dogs for me with the appropriate fields. (that is rowid, cat_id, dog_id per default.)
Now, imagine I have a third table, award, where I want to award one of these friendships. Here I therefore need a field that references one row in cats_dogs. However, since this table does not really exist between my models, (Doctrine handles it for me) what would be the most elegant solution for this?
In the end, I want in my award model two fields, a cat and a dog, who need to be in a friendship.
I am using the annotation driver.
What stops you from manually creating the m:n table instead of having doctrine do it for you?
Doctrine's aim is to map objects from an E/R schema and to make access to object connections easier. Therefore I believe that the table cats_dogs automatically provided by Doctrine is necessary as it is. It is concise and serves its purpose, i.e. it provides a list of all dogs of a cat or, vice versa, all the cats of a dog.
Thus, I can conclude that it is preferable to create a third entity (besides Cat and Dog) named Award which provides a one-to-one relationship with Cat and another one-to-one relationship with Dog. Making it consistent with the cats_dogs table is only up to you, and is not a Doctrine task by default. E.g., you can use some cascade persist option.
I believe that this is the most effective solution with Doctrine.
As a final remark, consider that each table should map a specific relationship between one or more entities, and in fact the table cats_dogs represents the friendship relationships, while the table Award will represent the awarded relationship between two friends.
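If you would rather have the database keep awards consistent with friendships, one option at the schema level is a composite foreign key from award to cats_dogs. The sketch below is plain SQL run through Python's sqlite3, not a Doctrine mapping, and it assumes (cat_id, dog_id) is unique in cats_dogs; the award columns and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE cats (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE cats_dogs (
        cat_id INTEGER NOT NULL REFERENCES cats(id),
        dog_id INTEGER NOT NULL REFERENCES dogs(id),
        PRIMARY KEY (cat_id, dog_id)   -- assumed unique per friendship
    );
    CREATE TABLE award (
        id     INTEGER PRIMARY KEY,
        title  TEXT NOT NULL,
        cat_id INTEGER NOT NULL,
        dog_id INTEGER NOT NULL,
        -- the awarded pair must exist as a friendship
        FOREIGN KEY (cat_id, dog_id) REFERENCES cats_dogs(cat_id, dog_id)
    );
    INSERT INTO cats VALUES (1, 'Tom'), (2, 'Felix');
    INSERT INTO dogs VALUES (1, 'Rex');
    INSERT INTO cats_dogs VALUES (1, 1);
    INSERT INTO award VALUES (1, 'Best friends', 1, 1);
    """
)

# Felix and Rex are not friends, so this award is rejected.
try:
    conn.execute("INSERT INTO award VALUES (2, 'Not friends', 2, 1)")
    non_friends_rejected = False
except sqlite3.IntegrityError:
    non_friends_rejected = True

awards = conn.execute("SELECT title, cat_id, dog_id FROM award").fetchall()
print(awards, non_friends_rejected)
```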