Datamart modelling fact table: indicators as columns, or rows with one 'indicator' column - business-intelligence

I am modelling a datamart and have multiple measures (indicators) and dimensions.
When modelling the fact table, is it better to give each indicator its own column, or to have a single column that identifies the indicator, effectively creating a dimension of indicators?
Please share your opinions on when to choose each option.

Dimensional modelling aims for each fact table to represent a business process where you take measurements, with each measurement stored as its own column. Each measure is separately named, the aim being that users can drag it onto their BI tool's report without having to go off to another table to work out which measure they're looking at.
The Kimball Group doesn't normally recommend the approach where you create a measure-type dimension and produce a 'generic' fact. It makes the fact table bigger (one row for each measurement) and makes calculations between measurements in a single measurement event (fact) more difficult.
Where would this end? You could feasibly have one fact that represents all measurements, from all your facts. This might be easier to model and load into, and might be exactly what you need in your situation, but it doesn't make it easier to report from, and wouldn't be called a dimensional model.
The situation in which Kimball suggests this would be an acceptable technique, however, is when you could have hundreds of potential measurements, but only a few would apply to any particular fact.
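For illustration, here is a minimal sketch of the two approaches; the table and column names are hypothetical, not taken from the question:

-- Measures as columns: one row per measurement event
CREATE TABLE sales_fact (
    date_key      INT,
    product_key   INT,
    store_key     INT,
    qty_sold      INT,
    revenue       DECIMAL(12,2),
    discount      DECIMAL(12,2)
);

-- Measure-type dimension with a 'generic' fact: one row per measurement
CREATE TABLE measure_dim (
    measure_key   INT,
    measure_name  VARCHAR(50)    -- 'qty_sold', 'revenue', 'discount', ...
);

CREATE TABLE generic_fact (
    date_key      INT,
    product_key   INT,
    store_key     INT,
    measure_key   INT,
    measure_value DECIMAL(12,2)
);

In the second design, comparing revenue to discount for the same sale means joining or pivoting the fact back onto itself, which is why it is usually reserved for the sparse case where only a few of many possible measurements apply to any row.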

Related

Should I use multiple fact tables for each grain or just aggregate from lowest grain?

Fairly new to data warehouse design and star schemas. We have designed a fact table which stores various measures about Memberships; our grain is daily, and some of the measures in this table are things like qty sold new, qty sold renewing, qty active, and qty cancelled.
My question is this: the business will want to see the measures at other grains such as monthly, quarterly, yearly, etc. Would the typical approach here be to aggregate the day-level data for whatever time period is needed, or would you recommend creating separate fact tables for the "key" time periods in our business requirements, e.g. monthly, quarterly, yearly? I have read some mixed information on this, which is mainly why I'm seeking others' views.
Some of what I read had people embedding a hierarchy in the fact table to designate different grains, identified via a "level" type column. Quite a few people advised against that, and it didn't seem right to me either; those advising against it were suggesting separate fact tables per grain. To be honest, I don't see why we wouldn't just aggregate from the daily entries we have. What benefits would we get from a fact table for each grain, other than perhaps some slight performance improvements?
Each DataMart will have its own "perspective", which may require an aggregated fact grain.
Star schema modeling is a "top-down" process, where you start from a set of questions or use cases and build a schema that makes those questions easy to answer, not a "bottom-up" process where you start with the source data and work out the schema design from there.
You may end up with multiple data marts that share the same granular fact table but need to aggregate it in different ways, either for performance, or to have a grain at which to calculate and store a measure that only makes sense at the aggregated level.
E.g.
SalesFact (store, day, customer, product, quantity, price, cost)
and
StoreSalesFact (store, week, revenue, payroll_expense, last_year_revenue)
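As a rough sketch of how the aggregated fact can be derived from the granular one (assuming the column names above and a hypothetical calendar dimension, DayDim, that maps days to weeks):

-- Roll the daily-grain sales fact up to a weekly, per-store grain
INSERT INTO StoreSalesFact (store, week, revenue)
SELECT s.store,
       d.week,
       SUM(s.quantity * s.price) AS revenue
FROM   SalesFact s
JOIN   DayDim d ON d.day = s.day
GROUP BY s.store, d.week;

Measures that only exist at the weekly grain, such as payroll_expense or last_year_revenue, would be loaded from other sources or calculated separately, which is the point of giving them their own aggregated fact.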

How to assign two or more time series identifier columns in Vertex AI Tabular Forecasting

I was wondering if it is possible to have more than one time series identifier column in the model? Let's assume I'd like to create a forecast at a product and store level (which the documentation suggests should be possible).
If I select product as the series identifier, the only options left for store are either a covariate or an attribute, and neither is applicable in this scenario.
Would concatenating product and store and using the individual product and store code values for that concatenated ID as attributes be a solution? It doesn't feel right, but I can't see any other option - am I missing something?
Note: I understand that this feature of Vertex AI is currently in preview and that because of that the options may be limited.
There isn't an alternative way to assign two or more Time Series Identifiers in the Forecasting Model on Vertex AI. The Forecasting model is in the "Preview" product launch stage, as you are aware, and as a consequence the options are limited. Please refer to this doc for more information about best practices for preparing data to train the forecasting model.
As a workaround, the two columns can be concatenated and the Time Series Identifier assigned to that concatenated column, as you mentioned in the question. This way, the concatenated column carries more contextual information into the training of the model.
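A minimal sketch of that workaround in SQL, preparing the training data with a single concatenated identifier while keeping the originals available as categorical attributes (all table and column names here are assumptions):

-- Build one time series identifier from product and store;
-- keep the individual codes so they can be used as attributes
SELECT CONCAT(product_id, '_', store_id) AS series_id,
       product_id,
       store_id,
       sale_date,
       units_sold
FROM   daily_sales;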
Just to follow up on Vishal's (correct) answer in case someone is looking this up in the future.
Yes, concatenating is the only option for now, as there can only be one time series identifier (I would hope this changes in the future). Having said that, I've experimented with adding the individual identifiers to the data as categorical attributes, and it actually works pretty well. This way I have forecasts generated at a product/store level, but I can aggregate all the forecasts for individual products, and the results are not far off from models trained on aggregated data (obviously that will depend on the demand classification and the selected optimisation method, among other factors).
Also, an interesting observation: when you include things like product descriptions, you can classify them either as categorical or as text. I wasn't able to find in the documentation whether the model uses only unigrams (which is what the column statistics in the console would suggest) or a range of n-grams, but it is definitely something you would want to experiment with on your data. My dataset actually showed better accuracy when the categorical classification was used, which is a bit counter-intuitive, as it feels like redundant information, though it's hard to tell since the documentation isn't very detailed. This is likely to be specific to my data set, so as I said, make sure you experiment with yours.

Is it possible to create Analysis in Oracle BI OBIEE based on only one table?

Could you please explain whether it is possible to create an Analysis with only one fact table?
I have one fact table in the physical and business layers. It has all the columns I need.
I tried to create an analysis: I added the months column to the horizontal axis and sum(sale_num) to the vertical axis from the fact table and expected to see a chart, but nothing happened, and the query that OBI performs doesn't have any GROUP BY.
Yes you can but you have to stick to the ground rules of dimensional analytics: Facts contain measures. Dimensions contain everything else. Facts do NOT contain attributes!
You simply model one logical fact and one logical dimension on your physical table. If you don't do weird things you don't even need to alias the physical table. It becomes the source of both your logical fact and logical dimension.
As long as you stick to the basic rules of dimensional modeling everything will work fine.
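In other words, once the single physical table is the source of both a logical fact (the measure) and a logical dimension (the attributes), the query OBIEE generates should take roughly this shape; the column names come from the question, the table name is assumed:

-- Expected shape of the generated query: measure aggregated,
-- dimension attribute grouped
SELECT   month,
         SUM(sale_num) AS sale_num
FROM     sales_table
GROUP BY month;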

Dimensional Modeling Created/Modified Date/Person

What is the best practice for including Created By, Created Timestamp, Modified By, Modified Timestamp into a dimensional model?
The first two never change. The last two will change slowly for some data elements but rapidly for other data elements. However, I'd prefer a consistent approach so that reporting users become familiar with it.
Assume that I really only care about the most recent value; I don't need history.
Is it best to put them into a dimension knowing that, for highly-modified data, that dimension is going to change often? Or, is it better to put them into the fact table, treating the unchanging Created information much the same way a sales order number becomes a degenerate dimension?
In my answer I will assume that these ADDITIONAL columns do NOT define the validity of the dimensional record, and that you are talking about a Slowly Changing Dimension type 1.
So we are in fact talking about dimensional metadata here, about who / which process created or modified the dimensional row.
I would always put this kind of metadata in the dimension because:
- It is related to changes in the dimension, and these changes happen independently of the fact table.
- In general it is advised to keep fact tables as small as possible. If your fact table referenced 5 dimensions, carrying this metadata on the fact would add 5*4=20 extra columns, which would seriously bloat it and impact performance.
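A minimal sketch of that layout, with the audit columns carried on the type 1 dimension and kept off the fact (table and column names are hypothetical):

CREATE TABLE customer_dim (
    customer_key       INT,
    customer_name      VARCHAR(100),
    -- audit metadata, overwritten in place (SCD type 1)
    created_by         VARCHAR(50),
    created_timestamp  TIMESTAMP,
    modified_by        VARCHAR(50),
    modified_timestamp TIMESTAMP
);

CREATE TABLE sales_fact (
    date_key      INT,
    customer_key  INT,   -- audit columns live on the dimension, not here
    product_key   INT,
    sales_amount  DECIMAL(12,2)
);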

Algorithm to organize table into many tables to have less cells?

I'm not really trying to compress a database; this is more of a logical problem. Is there any algorithm that will take a data table with lots of columns and repeated data and find a way to organize it into many tables with IDs, such that in total there are as few cells as possible and these tables can then be joined with a query to replicate the original one?
I don't care about any particular database engine or language. I just want to see if there is a logical way of doing it. If you post code, I like C# and SQL, but you can use any language.
I don't know of any automated algorithms, but what you really need to do is heavily normalize your database. This means looking at your actual functional dependencies and breaking things off wherever it makes sense.
The problem with trying to do this in a computer program is that it isn't always clear whether your current set of stored data represents all possible cases. You can't just look at the number of distinct values either: it makes little sense to break booleans off into their own table just because they have only two values, for example, and that is only the tip of the iceberg.
I think that at this point, nothing is going to beat good ol' patient, hand-crafted normalization. This is something to do by hand. Any possible computer algorithm will either make a total mess of things or make you define the relationships such that you might as well do it all yourself.
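To illustrate what that hand-crafted normalization looks like (all table and column names here are made up), a wide table with repeated customer data is split on a functional dependency and the original shape is rebuilt with a join:

-- Original wide table with repeated customer data:
-- orders_wide(order_id, customer_name, customer_city, product, amount)

CREATE TABLE customers (
    customer_id   INT,
    customer_name VARCHAR(100),
    customer_city VARCHAR(100)
);

CREATE TABLE orders (
    order_id    INT,
    customer_id INT,
    product     VARCHAR(100),
    amount      DECIMAL(12,2)
);

-- Reconstruct the original wide table with a join
SELECT o.order_id, c.customer_name, c.customer_city, o.product, o.amount
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id;

Deciding that customer_name determines customer_city (and so belongs in its own table) is exactly the judgement call an automated tool can't reliably make from the stored data alone.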
