I have a few questions regarding dimensional modeling:
While designing a dimensional model from an existing OLTP system, do we reuse the OLTP table structures in the dimensional model? For example, I have a customer table in OLTP that I want to include in my dimensional model. Can I use the same table structure when designing the customer dimension table, or can I change it?
Can dimension tables reference each other? For example, in my OLTP I have EMP and DEPT tables, where EMP references DEPT. If I choose these two tables to be part of the dimensional model, is it necessary to put an FK constraint on the EMP dimension table?
Now, about bridge tables: suppose in my OLTP I have STORE and DEPT tables and a bridge table STORE_DEPT that joins STORE with DEPT, meaning each store can contain multiple departments, which is recorded in this bridge table. If I want to create dimension tables for STORE and DEPT in my dimensional model, do I need to include this bridge table in the model as well?
Thanks in advance for your help.
No, the dimensional model is usually very different from the OLTP schema. You'll want to read about star schemas.
I would say that dimensions are usually independent of each other. I wouldn't start by having them refer to each other. If you think they need to, then redesign.
STORE and DEPT sound like they should be part of a LOCATION dimension. I still see no need for a bridge or JOIN.
You sound like you're trying to design your first star schema. It might be a good idea to seek out some training or guidance.
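As a concrete illustration of the points above, a minimal star schema for this scenario might look like the following sketch (all table and column names are invented for the example): STORE and DEPT collapse into one denormalized location dimension, so no bridge table is needed, and dimensions never reference each other — only the fact table references dimensions.

```sql
-- Hypothetical star schema: names are illustrative, not prescriptive.
-- One denormalized location dimension replaces STORE, DEPT and STORE_DEPT.
CREATE TABLE dim_location (
    location_key INTEGER PRIMARY KEY,  -- surrogate key
    store_name   VARCHAR(100),
    store_city   VARCHAR(100),
    dept_name    VARCHAR(100)          -- one row per store/department pair
);

CREATE TABLE dim_customer (
    customer_key     INTEGER PRIMARY KEY,  -- surrogate key, not the OLTP id
    customer_name    VARCHAR(100),
    customer_segment VARCHAR(50)
);

-- The fact table references the dimensions; the dimensions do not
-- reference each other.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL,
    location_key INTEGER NOT NULL REFERENCES dim_location (location_key),
    customer_key INTEGER NOT NULL REFERENCES dim_customer (customer_key),
    sale_amount  DECIMAL(12, 2)
);
```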
Having a bit of trouble getting my head round this.
I have three models - Sectors, Industries, Companies.
Companies is the viewable resource, and companies are organised into Sectors and Industries.
Sectors contain Industries. Industries contain companies.
Previously this was achieved by a table column containing comma-separated values of the industry and sector IDs - tacky, I know.
I'm now using a pivot table (company_industry) along with a bi-directional 'belongsToMany' relationship between the company and industry models.
That works fine for a single-tier organising system. But when I come to add Sectors as a parent of Industries, that's when my brain explodes.
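For reference, the structure described so far can be sketched as follows (table names are taken from the question; the sector_id column on industries is a hypothetical addition showing one common way to attach the parent tier, since each industry belongs to exactly one sector):

```sql
-- Sketch of the schema described above; sector_id on industries is a
-- hypothetical addition for the parent tier.
CREATE TABLE sectors (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE industries (
    id        INTEGER PRIMARY KEY,
    sector_id INTEGER REFERENCES sectors (id),  -- industry belongs to one sector
    name      VARCHAR(100)
);

CREATE TABLE companies (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(100)
);

-- Pivot table for the many-to-many company/industry relationship.
CREATE TABLE company_industry (
    company_id  INTEGER REFERENCES companies (id),
    industry_id INTEGER REFERENCES industries (id),
    PRIMARY KEY (company_id, industry_id)
);
```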
I wonder if anyone recognises this problem and can share with me a good resource to explain a best practice resolution.
Thank you kindly.
Could you please explain whether it is possible to create an analysis with only one fact table?
I have one fact table in the physical and business layers. It has all the columns I need.
I've tried to create an analysis: I added the months column to the horizontal axis and sum(sale_num) to the vertical axis from the fact table, and expected to see a chart, but nothing happened, and the query OBI generates doesn't have any GROUP BY.
Yes you can, but you have to stick to the ground rules of dimensional analytics: facts contain measures, dimensions contain everything else, and facts do NOT contain attributes!
You simply model one logical fact and one logical dimension on your physical table. If you don't do anything weird, you don't even need to alias the physical table; it becomes the source of both your logical fact and your logical dimension.
As long as you stick to the basic rules of dimensional modeling everything will work fine.
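Assuming a physical table with a month column and a sale_num measure (names taken from the question; the table name is a placeholder), the query OBIEE should generate once the logical fact and logical dimension are modelled correctly is roughly:

```sql
-- Roughly the query OBIEE should generate once sale_num is a measure
-- on the logical fact and month is an attribute on the logical dimension.
SELECT month,
       SUM(sale_num) AS total_sales
FROM   sales_fact          -- hypothetical physical table name
GROUP BY month
ORDER BY month;
```

If the generated query has no GROUP BY, that usually means the column was not modelled as an aggregated measure in the business layer.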
I have large unpartitioned tables in the database (100 GB+), and to improve performance I am thinking about partitioning them, or maybe just their indexes. Data comes in on a regular basis and is selected by date, so I think range partitioning by month of the creation date would be a good option.
I am reading about Oracle table and index partitioning, and it looks quite promising.
But I have two questions, for which I cannot find answers (I think my Google skills are failing me).
First one is:
What are the risks and disadvantages of creating partitioned tables and indexes in Oracle, particularly on such large and active tables? Is there anything I should know about?
Second:
How do I partition an existing, unpartitioned table or index?
Besides the outage (see below) needed to partition your data, the main risk I see is that if you partition your table and indexes with local indexes, performance will not be great for queries that don't rely on the partition key (the date). But you can use global indexes in that case and get back to similar performance.
The simplest way to create a partitioned table from an unpartitioned one, by far, is to use create table as select with a new name and all the partition storage details, drop the unpartitioned table, and rename the new table as the old one. Obviously, this requires careful preparation, and an outage that can last a few minutes :)
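Sketched in Oracle syntax, assuming a table named ORDERS with a CREATED_DATE column (all names and partition boundaries are placeholders for the example):

```sql
-- Hypothetical example: rebuild ORDERS as a range-partitioned table
-- via CREATE TABLE AS SELECT. Names and boundaries are placeholders.
CREATE TABLE orders_part
PARTITION BY RANGE (created_date) (
    PARTITION p_2023_01 VALUES LESS THAN (DATE '2023-02-01'),
    PARTITION p_2023_02 VALUES LESS THAN (DATE '2023-03-01'),
    PARTITION p_max     VALUES LESS THAN (MAXVALUE)
)
AS SELECT * FROM orders;

-- Outage window: swap the tables, then recreate indexes, constraints
-- and grants on the new table.
DROP TABLE orders;
RENAME orders_part TO orders;
```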
I did a bit of R&D on fact tables, to determine whether they are normalized or denormalized.
I came across some findings that confused me.
According to Kimball:
Dimensional models combine normalized and denormalized table structures. The dimension tables of descriptive information are highly denormalized with detailed and hierarchical roll-up attributes in the same table. Meanwhile, the fact tables with performance metrics are typically normalized. While we advise against a fully normalized with snowflaked dimension attributes in separate tables (creating blizzard-like conditions for the business user), a single denormalized big wide table containing both metrics and descriptions in the same table is also ill-advised.
The other finding, which I also think is reasonable, is by fazalhp at GeekInterview:
The main funda of DW is de-normalizing the data for faster access by the reporting tool...so if ur building a DW ..90% it has to be de-normalized and off course the fact table has to be de normalized...
So my question is, are fact tables normalized or de-normalized? If any of these then how & why?
From the point of view of relational database design theory, dimension tables are usually in 2NF and fact tables anywhere between 2NF and 6NF.
However, dimensional modelling is a methodology unto itself, tailored to:
one use case, namely reporting
mostly one basic type (pattern) of a query
one main user category -- business analyst, or similar
row-store RDBMSs like Oracle, SQL Server, Postgres ...
one independently controlled load/update process (ETL); all other clients are read-only
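As a concrete illustration of the "normalized fact, denormalized dimension" pattern discussed here (the schema is invented for the example): the dimension flattens its roll-up hierarchy into one wide table, while the fact table holds only foreign keys and measures.

```sql
-- Denormalized dimension: the hierarchy is flattened into one wide table.
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  VARCHAR(100),
    brand_name    VARCHAR(100),   -- would be a separate table in 3NF
    category_name VARCHAR(100)    -- would be two joins away in 3NF
);

-- Normalized fact: nothing but foreign keys and numeric measures.
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL,
    product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
    quantity    INTEGER,
    amount      DECIMAL(12, 2)
);
```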
There are other DW design methodologies out there, like
Inmon's -- data structure driven
Data Vault -- data structure driven
Anchor modelling -- schema evolution driven
The main thing is not to mix up database design theory with a specific design methodology. You may look at a certain methodology through the database design theory perspective, but you have to study each methodology separately.
Most people working with a data warehouse are familiar with transactional RDBMSs and apply various levels of normalization, so those concepts are used to describe working with a star schema. What they're really doing is trying to get you to unlearn all those normalization habits. This can get confusing because there is a tendency to focus on what "not" to do.
The fact table(s) will probably be the most normalized, since they usually contain just numerical values along with various IDs for linking to dimensions. The key with fact tables is how granular you need to get with your data. For purchases, an example would be specific line items by product in an order, or aggregates at a daily, weekly, or monthly level.
My suggestion is to keep searching and studying how to design a warehouse based on your needs. Don't look to get to high levels of normalized forms. Think more about the reports you want to generate and the analysis capabilities to give your users.
I've created a table on Vertica, and I want to create an index on that table. I can't see how to create an index on Vertica, though. Is it possible? If so, how can I do that?
Vertica's speed is hinged on using columnar projections, not indexes. Please see:
https://my.vertica.com/docs/6.1.x/HTML/index.htm#12037.htm
So, in fact, Vertica doesn't have the ability to create an index. You will have to use a projection to achieve good performance.
kimbo's answer is correct.
I try to explain it to people a few ways. But basically, the table itself is a construct, like a view. Unlike in traditional databases, the table itself isn't saved to disk and then indexed in different ways; projections handle the sorting, indexing, layout on disk, and so on.
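For example, instead of creating an index you would create a projection sorted to match your query pattern, roughly like this sketch (table and column names are invented for the example):

```sql
-- Hypothetical Vertica projection: data stored sorted by sale_date, so
-- date-range queries read only the relevant portion of the column files.
CREATE PROJECTION sales_by_date AS
SELECT sale_date, store_id, amount
FROM   sales
ORDER BY sale_date
SEGMENTED BY HASH(store_id) ALL NODES;
```

A query filtering on sale_date would then use this projection, much as a range scan would use an index in a row-store database.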
I also use an analogy of a deck of cards. A table can be considered a deck of cards, and you ask for particular hands. Projections are like particular shuffles: some may be sorted by suit, some by face value. Which projection (in this analogy, which shuffle) gets used depends on what you ask for.