Transformation from UML class diagram to object-relational (SQL-99) model - Oracle

I am trying to study object-relational databases, and I am having a really hard time finding information on them and understanding the concept. I have found only a few examples, probably because English is not my first language.
I want to be able to make an object-relational database design model in the form of a UML class diagram and then create the object-relational (SQL-99) tables in Oracle, and I don't know how to do that.
Here is an example of a navigational model (I think):

Navigational models are a legacy of obsolete database technologies that have been supplanted by the relational model:
The prehistoric hierarchical model allowed records to be grouped in a tree, so every record has a pointer to its related records. Navigation goes top-down and was, as far as I know, unidirectional.
The less prehistoric network model was an alternative in which navigation between records was more powerful and not necessarily limited to tree structures. It gained some popularity because it could relate records very efficiently at a time when relational databases were not very performant on smaller computers (remember that Oracle 5 had a table locking mechanism!).
Both were based on fixed, structured records, and the navigation paths needed to be fixed up front. This is the past. Don't go there if you want to learn modern ORM or NoSQL. The translation from a UML class diagram to a database model is straightforward. If you need to subdivide large classes into smaller ones (like Employee and Address), it means that your classes are too large. Fix it in the original UML model: decompose the classes into smaller classes.
UML allows you to make associations navigable (but this is not useful if you target a relational database in the end). There are then a couple of techniques that allow you to easily build the tables from that model, using for example identity fields, foreign key mapping (for one-to-one or one-to-many) or association table mapping (for many-to-many). But there are full articles and books on these techniques, and it would take too long to develop them here.
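To make this concrete, here is a minimal sketch, not a definitive recipe, of how a small fragment of such a diagram (say a Department class associated one-to-many with an Employee class that contains an Address) could be expressed with Oracle's object-relational features: object types, object tables and REFs. All type, table and attribute names below are invented for illustration; check the Oracle object-relational documentation for the exact options in your version.

CREATE TYPE department_t AS OBJECT (
  dept_id   NUMBER,
  dept_name VARCHAR2(50)
);
/
CREATE TYPE address_t AS OBJECT (
  street VARCHAR2(100),
  city   VARCHAR2(50),
  zip    VARCHAR2(10)
);
/
CREATE TYPE employee_t AS OBJECT (
  emp_id  NUMBER,
  name    VARCHAR2(100),
  address address_t,         -- the UML composition stays inside the class as a nested object type
  dept    REF department_t   -- the one-to-many association mapped as an object reference
);
/
CREATE TABLE departments OF department_t (dept_id PRIMARY KEY);

CREATE TABLE employees OF employee_t (
  emp_id PRIMARY KEY,
  SCOPE FOR (dept) IS departments   -- restrict the REF to rows of the departments table
);

A many-to-many association would instead get its own association table (or a nested table of REFs), which is exactly the association table mapping mentioned above.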
P.S.: you need to improve your English. The French literature on the topic is too sparse; I can tell from my own experience. ;-)

Related

Identify data warehouse design methodologies in the following diagram

Can someone help me identify the top-down, bottom-up, and hybrid data warehouse design methodologies, as mentioned here in Wikipedia, in the following diagram? I am interested in understanding how the diagram differs depending on each design methodology.
The diagram is too generic to enable identification of a methodology. Further, the Wikipedia article is surprisingly out of date.
There are four mainstream DW methodologies in common use today - Dimensional (Kimball), 3NF (Inmon), Data Vault (Linstedt) and Anchor Modelling (Rönnbäck). All could be represented within that diagram.
The issue of top-down or bottom-up in this article is centred around data marts. There is no requirement that marts are stored in a separate database, or even in a DBMS. In the context of your diagram they might exist in either the data warehouse or the analysis tool. In any case, the diagram does not give any indication of what came first, so you can't infer an approach.
In order to identify the methodology (Kimball, etc.) that was used to design the warehouse you'd need to see its data model. It would be immediately apparent from the model.
To identify the order in which components were delivered you'd need to see some sort of timeline, project plan, etc.

Products database design: Entity-Attribute-Value (EAV) model, NoSQL, or other alternative models?

What is the best model for designing and creating a products database if I'm planning on having many products in the database: the Entity-Attribute-Value (EAV) model (by using Magento), a NoSQL database, or another alternative model?
There are pros and cons for both models:
Pros for EAV (Entity-Attribute-Value):
Facilitates a generic architecture (easy to add or remove attributes)
Relatively easy to implement
Cons for EAV
Slow, resource consuming
Not scalable
Pros for NoSQL
Fast, with increased performance; it is easy to get all the needed information, since it will usually all be stored in the same document.
Easy to scale
Cons for NoSQL
It is hell to implement when the time comes to manage generic attributes.
In conclusion, I would suggest choosing NoSQL; the biggest pro that I see is scalability. (A minimal EAV layout is sketched below for comparison.)
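For reference, here is a minimal illustrative EAV sketch (all table and column names are invented); the query at the end shows why reads become expensive as the number of attributes per product grows.

CREATE TABLE product (
  product_id INT PRIMARY KEY,
  sku        VARCHAR(64) NOT NULL
);

CREATE TABLE attribute_def (
  attribute_id INT PRIMARY KEY,
  attr_name    VARCHAR(64) NOT NULL,   -- e.g. 'color', 'weight'
  data_type    VARCHAR(16) NOT NULL    -- e.g. 'varchar', 'decimal'
);

CREATE TABLE product_attribute_value (
  product_id   INT REFERENCES product (product_id),
  attribute_id INT REFERENCES attribute_def (attribute_id),
  attr_value   VARCHAR(255),           -- every value stored as text, so casting and validation fall on the application
  PRIMARY KEY (product_id, attribute_id)
);

-- Reassembling one product takes one row (and effectively one join) per attribute,
-- which is where the slowness and resource consumption come from:
SELECT d.attr_name, pav.attr_value
FROM product_attribute_value pav
JOIN attribute_def d ON d.attribute_id = pav.attribute_id
WHERE pav.product_id = 42;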

Is a fact table in normalized or de-normalized form?

I did a bit of R&D on fact tables, to find out whether they are normalized or de-normalized.
I came across some findings that left me confused.
According to Kimball:
Dimensional models combine normalized and denormalized table structures. The dimension tables of descriptive information are highly denormalized with detailed and hierarchical roll-up attributes in the same table. Meanwhile, the fact tables with performance metrics are typically normalized. While we advise against a fully normalized with snowflaked dimension attributes in separate tables (creating blizzard-like conditions for the business user), a single denormalized big wide table containing both metrics and descriptions in the same table is also ill-advised.
The other finding, which I also think is OK, is by fazalhp at GeekInterview:
The main idea of a DW is de-normalizing the data for faster access by the reporting tool... so if you're building a DW, 90% of it has to be de-normalized, and of course the fact table has to be de-normalized...
So my question is: are fact tables normalized or de-normalized? Whichever it is, how and why?
From the point of view of relational database design theory, dimension tables are usually in 2NF and fact tables anywhere between 2NF and 6NF.
However, dimensional modelling is a methodology unto itself, tailored to:
one use case, namely reporting
mostly one basic type (pattern) of query
one main user category -- business analyst, or similar
row-store RDBMSs like Oracle, SQL Server, Postgres ...
one independently controlled load/update process (ETL); all other clients are read-only
There are other DW design methodologies out there, like
Inmon's -- data structure driven
Data Vault -- data structure driven
Anchor modelling -- schema evolution driven
The main thing is not to mix up database design theory with a specific design methodology. You may look at a certain methodology from the perspective of database design theory, but you have to study each methodology separately.
Most people working with a data warehouse are familiar with transactional RDBMSs and apply various levels of normalization, so those concepts are used to describe working with a star schema. What they're really doing is trying to get you to unlearn all those normalization habits. This can get confusing because there is a tendency to focus on what "not" to do.
The fact table(s) will probably be the most normalized, since they usually contain just numeric values along with various IDs for linking to dimensions. The key with fact tables is how granular you need to get with your data. An example for purchases could be specific line items by product in an order, or data aggregated at a daily, weekly or monthly level.
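To make the "IDs plus measures" point concrete, here is an illustrative (not prescriptive) star schema sketch at line-item grain; all table and column names are invented.

CREATE TABLE dim_date (
  date_key         INT PRIMARY KEY,
  full_date        DATE,
  month_name       VARCHAR(10),
  calendar_quarter INT,
  calendar_year    INT            -- hierarchical roll-up attributes denormalized into one table
);

CREATE TABLE dim_product (
  product_key  INT PRIMARY KEY,
  product_name VARCHAR(100),
  category     VARCHAR(50),
  brand        VARCHAR(50)        -- also denormalized
);

CREATE TABLE fact_sales (
  date_key    INT REFERENCES dim_date (date_key),
  product_key INT REFERENCES dim_product (product_key),
  order_id    INT,                -- degenerate dimension identifying the order
  order_line  INT,
  quantity    INT,
  amount      DECIMAL(12,2),      -- nothing but keys and measures, hence effectively normalized
  PRIMARY KEY (order_id, order_line)
);

A coarser grain (daily, weekly, monthly) would simply drop order_id/order_line and pre-aggregate the measures.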
My suggestion is to keep searching and studying how to design a warehouse based on your needs. Don't aim for high normal forms; think more about the reports you want to generate and the analysis capabilities you want to give your users.

Star vs Snowflake schema in data warehousing?

Currently, I'm involved in a warehouse-based intelligent transaction analysis banking system featuring customer churn behavior, fraud detection and CRM analysis. We've been using Oracle as the database, and it's completely a data warehousing project with data mining algorithms used for the analysis.
We have records of about 1000 customers of a bank. For modeling, is it better to use the star schema, the snowflake schema, or the constellation schema? I know the basic difference between the star and snowflake schemas: normalization of the dimension tables occurs in the snowflake schema (a.k.a. snowflaking), which may be problematic for joins in the case of a large database.
So, which schema would be better for my case? Answers from experienced programmers involved in data warehousing are very welcome!
Thanks in advance!
In brief, my assumption going into a project like this would be that a star schema is appropriate. I might modify that if a dimension appeared to be getting too large to full scan efficiently and queries against it could be meaningfully improved by snowflaking, unless that dimension joined to the fact table on a partitioning key (because of the difficulty of applying partition pruning to a predicate placed on a snowflaked dimension).
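For anyone new to the distinction, here is a small illustrative sketch of the same customer dimension in star and snowflake form (names invented); the snowflaked version is the one that costs the extra joins discussed above.

-- Star: one denormalized dimension table.
CREATE TABLE dim_customer_star (
  customer_key  INT PRIMARY KEY,
  customer_name VARCHAR(100),
  branch_name   VARCHAR(100),
  region_name   VARCHAR(100)   -- branch and region folded into the same row
);

-- Snowflake: the same dimension normalized into a chain of tables.
CREATE TABLE dim_region (
  region_key  INT PRIMARY KEY,
  region_name VARCHAR(100)
);

CREATE TABLE dim_branch (
  branch_key  INT PRIMARY KEY,
  branch_name VARCHAR(100),
  region_key  INT REFERENCES dim_region (region_key)
);

CREATE TABLE dim_customer_snowflake (
  customer_key  INT PRIMARY KEY,
  customer_name VARCHAR(100),
  branch_key    INT REFERENCES dim_branch (branch_key)
);

A report grouped by region reads one table in the star version and three joined tables in the snowflake version.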

Anyone know anything about OLAP Internals?

I know a bit about database internals. I've actually implemented a small, simple relational database engine before, using ISAM structures on disk and BTree indexes and all that sort of thing. It was fun, and very educational. I know that I'm much more cognizant about carefully designing database schemas and writing queries now that I know a little bit more about how RDBMSs work under the hood.
But I don't know anything about multidimensional OLAP data models, and I've had a hard time finding any useful information on the internet.
How is the information stored on disk? What data structures comprise the cube? If a MOLAP model doesn't use tables, with columns and records, then... what? Especially in highly dimensional data, what kinds of data structures make the MOLAP model so efficient? Do MOLAP implementations use something analogous to RDBMS indexes?
Why are OLAP servers so much better at processing ad hoc queries? The same sorts of aggregations that might take hours to process in an ordinary relational database can be processed in milliseconds in an OLAP cube. What are the underlying mechanics of the model that make that possible?
I've implemented a couple of systems that mimicked what OLAP cubes do, and here are a couple of things we did to get them to work.
The core data was held in an n-dimensional array, all in memory, and all the keys were implemented via hierarchies of pointers to the underlying array. In this way we could have multiple different sets of keys for the same data. The data in the array was the equivalent of the fact table; often it would hold only a couple of pieces of data (in one instance, price and number sold).
The underlying array was often sparse, so once it was created we used to remove all the blank cells to save memory - lots of hardcore pointer arithmetic but it worked.
As we had hierarchies of keys, we could quite easily write routines to drill down or up a hierarchy. For instance, we would access a year of data by going through the month keys, which in turn mapped to days and/or weeks. At each level we would aggregate the data as part of building the cube, which made calculations much faster.
We didn't implement any kind of query language, but we did support drill-down on all axes (up to 7 in our biggest cubes), and that was tied directly to the UI, which the users liked.
We implemented the core stuff in C++, but these days I reckon C# could be fast enough, although I'd worry about how to implement the sparse arrays.
Hope that helps; it sounds an interesting project.
The book Microsoft SQL Server 2008 Analysis Services Unleashed spells out some of the particularities of SSAS 2008 in decent detail. It's not quite a "here's exactly how SSAS works under the hood", but it's pretty suggestive, especially on the data structure side. (It's not quite as detailed/specific about the exact algorithms.) Here are a few of the things I, as an amateur in this area, gathered from the book; this is all about SSAS MOLAP:
Despite all the talk about multi-dimensional cubes, fact table (aka measure group) data is still, to a first approximation, ultimately stored in basically 2D tables, one row per fact. A number of OLAP operations seem to ultimately consist of iterating over rows in 2D tables.
The data is potentially much smaller inside MOLAP than inside a corresponding SQL table, however. One trick is that each unique string is stored only once, in a "string store". Data structures can then refer to strings in a more compact form (by string ID, basically). SSAS also compresses rows within the MOLAP store in some form. This shrinking I assume lets more of the data stay in RAM simultaneously, which is good.
Similarly, SSAS can often iterate over a subset of the data rather than the full dataset. A few mechanisms are in play:
By default, SSAS builds a hash index for each dimension/attribute value; it thus knows "right away" which pages on disk contain the relevant data for, say, Year=1997.
There's a caching architecture where relevant subsets of the data are stored in RAM separate from the whole dataset. For example, you might have cached a subcube that has only a few of your fields, and that only pertains to the data from 1997. If a query is asking only about 1997, then it will iterate only over that subcube, thereby speeding things up. (But note that a "subcube" is, to a first approximation, just a 2D table.)
If you have predefined aggregates, then these smaller subsets can also be precomputed at cube processing time, rather than merely computed/cached on demand. (A rough relational analogy is sketched after this list.)
SSAS fact table rows are fixed size, which presumably helps in some form. (In SQL, in contrast, you might have variable-width string columns.)
The caching architecture also means that, once an aggregation has been computed, it doesn't need to be refetched from disk and recomputed again and again.
These are some of the factors in play in SSAS anyway. I can't claim that there aren't other vital things as well.
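As a rough relational analogy for those precomputed aggregations (an analogy of the effect only, not of how SSAS physically stores them), imagine maintaining a summary table at a coarser grain, reusing the illustrative fact_sales and dim_date tables sketched in the fact table answer above:

CREATE TABLE agg_sales_year_product AS
SELECT d.calendar_year,
       f.product_key,
       SUM(f.amount)   AS total_amount,
       SUM(f.quantity) AS total_quantity
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
GROUP BY d.calendar_year, f.product_key;

-- A query such as "sales by product in 1997" can now read this small aggregate
-- instead of scanning every detail row, which is the effect that the caching and
-- aggregation machinery described above is aiming for.
SELECT product_key, total_amount
FROM agg_sales_year_product
WHERE calendar_year = 1997;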
