I have created a level-based hierarchy (Logical Dimension), and created both physical joins and logical joins. Now, when I create an analysis using the hierarchy and some measures from the joined fact table, it does not return any values, and when I go further and check the physical query of the analysis, the physical query shows only the dimension table, not the joined fact table. What could be the reason for this?
Related
Could you please explain whether it is possible to create an analysis with only one fact table?
I have one fact table in the physical and business layers. It has all the columns I need.
I tried to create an analysis: I added the months column to the horizontal axis and sum(sale_num) to the vertical axis from the fact table and expected to see a chart, but nothing happened, and the query that OBI performs doesn't have any GROUP BY.
Yes, you can, but you have to stick to the ground rules of dimensional analytics: facts contain measures, dimensions contain everything else. Facts do NOT contain attributes!
You simply model one logical fact and one logical dimension on top of your physical table. If you don't do anything weird, you don't even need to alias the physical table; it becomes the source of both your logical fact and your logical dimension.
As long as you stick to the basic rules of dimensional modeling everything will work fine.
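As a rough illustration only, once the months attribute lives in the logical dimension and sum(sale_num) is an aggregated measure in the logical fact, both sourced from the same physical table, the physical SQL that gets generated should look something like the sketch below. The table name sales and column names month_name and sale_num are assumptions based on the question, not the real object names.

-- Hypothetical physical SQL once the fact/dimension split is modeled correctly.
SELECT
    t.month_name,
    SUM(t.sale_num) AS sale_total
FROM
    sales t
GROUP BY
    t.month_name;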
I did a bit of R&D on fact tables, to find out whether they are normalized or de-normalized.
I came across some findings which left me confused.
According to Kimball:
Dimensional models combine normalized and denormalized table structures. The dimension tables of descriptive information are highly denormalized with detailed and hierarchical roll-up attributes in the same table. Meanwhile, the fact tables with performance metrics are typically normalized. While we advise against a fully normalized approach with snowflaked dimension attributes in separate tables (creating blizzard-like conditions for the business user), a single denormalized big wide table containing both metrics and descriptions in the same table is also ill-advised.
The other finding, which I also think is reasonable, is by fazalhp at GeekInterview:
The main idea of a DW is de-normalizing the data for faster access by the reporting tool... so if you're building a DW, 90% of it has to be de-normalized, and of course the fact table has to be de-normalized...
So my question is: are fact tables normalized or de-normalized? If either, then how and why?
From the point of view of relational database design theory, dimension tables are usually in 2NF and fact tables anywhere between 2NF and 6NF.
However, dimensional modelling is a methodology unto itself, tailored to:
one use case, namely reporting
mostly one basic type (pattern) of a query
one main user category -- business analyst, or similar
row-store RDBMSs like Oracle, SQL Server, Postgres ...
one independently controlled load/update process (ETL); all other clients are read-only
There are other DW design methodologies out there, like
Inmon's -- data structure driven
Data Vault -- data structure driven
Anchor modelling -- schema evolution driven
The main thing is not to mix up database design theory with a specific design methodology. You may look at a certain methodology through a database design theory perspective, but you have to study each methodology separately.
Most people working with a data warehouse are familiar with transactional RDBMSs and apply various levels of normalization, so those concepts are used to describe working with a star schema. What they're doing is trying to get you to unlearn all those normalization habits. This can get confusing because there is a tendency to focus on what "not" to do.
The fact table(s) will probably be the most normalized, since they usually contain just numerical values along with various ids for linking to dimensions. The key with fact tables is how granular you need to get with your data. An example for purchases could be specific line items by product in an order, or data aggregated at a daily, weekly, or monthly level.
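For example, a minimal star-schema sketch along these lines might keep the fact table narrow and normalized at line-item grain while the dimension carries the flattened roll-up attributes. All table and column names here are made up for illustration.

-- Denormalized dimension: hierarchy attributes flattened into one table (hypothetical names).
CREATE TABLE dim_product (
    product_key   NUMBER PRIMARY KEY,
    product_name  VARCHAR2(100),
    subcategory   VARCHAR2(100),
    category      VARCHAR2(100)
);

-- Normalized fact at line-item grain: only foreign keys and measures.
CREATE TABLE fact_purchase_line (
    date_key      NUMBER,
    product_key   NUMBER REFERENCES dim_product (product_key),
    order_id      NUMBER,
    quantity      NUMBER,
    amount        NUMBER(12,2)
);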
My suggestion is to keep searching and studying how to design a warehouse based on your needs. Don't aim for high levels of normalization. Think more about the reports you want to generate and the analysis capabilities you want to give your users.
I am planning to do a project implementing all aggregation operations in HBase, but I don't know how difficult it is. I have only 6 months to complete the project. Should I go forward with it? I am planning to do it in Java. I know that there are already some aggregation functions, but there is no INNER JOIN-like query support right now, and that is the type of query I am planning to implement. I don't know whether this is a realistic idea or a blunder.
I think technically we should distinguish two types of joins:
a) One small table + one big table. By a small table I mean a table which can be cached in the memory of each node without seriously affecting cluster operation. In this case a join using a coprocessor should be possible by putting the small table in a hash map, iterating over the node-local part of the big table's data, and producing join results that way. In Hive's terms this is called a "map" join (a HiveQL sketch follows below): http://www.facebook.com/note.php?note_id=470667928919.
b) Two big tables. I do not think it is viable to get this to production quality in a short time frame. I would say that such functionality is the realm of MPP databases and a serious part of their IP.
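For reference, case (a) is what Hive's map-join hint expresses: the small table is loaded into memory on each mapper and the big table is streamed against it. This is only a sketch, and the table and column names (fact_big, dim_small, region_id, etc.) are invented for the example.

-- Hive map join: dim_small is held in memory, fact_big is streamed.
SELECT /*+ MAPJOIN(d) */
    f.order_id,
    d.region_name,
    f.amount
FROM fact_big f
JOIN dim_small d
    ON f.region_id = d.region_id;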
It is definitely harder in HBase than doing it in an RDBMS or a different Hadoop technology like Pig or Hive.
I know that when joining across multiple tables, performance is dependent upon the order in which they are joined. What factors should I consider when joining tables?
Most modern RDBMSs optimize the query based upon which tables are joined, the indexes used, table statistics, and so on. They rarely, if ever, differ in their final execution plan based upon the order of the joins in the query.
SQL is designed to be declarative; you specify what you want, not (in most cases) how to get it. While there are things like index hints that can allow you to direct the optimizer to use or avoid specific indexes, by and large you can leave that work to the engine and be about the business of writing your queries.
In the end, running different versions of your queries within SQL Server Management Studio and viewing the actual execution plans is the only way to tell if order can truly make a difference.
As far as I know, the join order has no effect on query performance. The query engine will parse the query and execute it in the way it believes is the most efficient. If you want, try writing the query using different join orders and look at the execution plan. They should be the same.
See this article: http://sql-4-life.blogspot.com/2009/03/order-of-inner-joins.html
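As a quick illustration, the two formulations below are the same logical query written with the joins in different orders; on a modern optimizer they should normally produce the same execution plan, which you can confirm by comparing the plans in Management Studio. The tables (orders, customers, products) are hypothetical.

-- Join order 1: customers first, then products.
SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN products  p ON p.product_id  = o.product_id;

-- Join order 2: products first, then customers.
SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN products  p ON p.product_id  = o.product_id
JOIN customers c ON c.customer_id = o.customer_id;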
We're looking at using Oracle hierarchical queries to model potentially very large tree structures (potentially infinitely wide, and 30+ levels deep). My understanding is that hierarchical queries provide a way to write recursively joining SQL, but that they do not provide any real performance enhancement over manually writing an equivalent query... is this the case? What sort of experiences have people had, performance-wise, with Oracle hierarchical queries?
Well, the short answer is that without the hierarchical extension (CONNECT BY) you couldn't write a recursive query at all. You could programmatically issue many queries which were recursively linked.
The rule of thumb with everything database-related, especially Oracle, is that if you can produce your result in a single query it will almost always be faster than doing it programmatically.
My experience has been with much smaller sets, so I can't speak for how well hierarchical queries will perform on large sets.
When doing these tree retrievals, you typically have these options:
Query everything and assemble the tree on the client side.
Perform one query for each level of the tree, building on what you know you need from the previous query's results.
Use the built-in features Oracle provides (START WITH, CONNECT BY PRIOR).
Doing it all in the database will reduce unnecessary round trips or wasteful queries that pull too much data.
Try partitioning the data within your hierarchical table and then limiting the partitions included in the query.
-- Range-partition the hierarchy table so a query can be limited to one partition.
CREATE TABLE loopy
(key NUMBER, key_hier NUMBER, info VARCHAR2(100), part NUMBER)
PARTITION BY RANGE (part)
(
  PARTITION low  VALUES LESS THAN (1000),
  PARTITION mid  VALUES LESS THAN (10000),
  PARTITION high VALUES LESS THAN (MAXVALUE)
);

-- Walk the hierarchy while scanning only the "mid" partition.
SELECT info
FROM loopy PARTITION (mid)
START WITH key = <some value>
CONNECT BY PRIOR key = key_hier;
The interesting problem now becomes your partitioning strategy. Oracle provides several options.
I've seen claims that using CONNECT BY can be slow, but compared to what? There isn't really another option except building a result set using recursive PL/SQL calls (slower) or doing it on your client side.
You could try separating your data into a mapping table (the hierarchy definition) and a lookup table (the display data) and then joining them back together. I wouldn't expect much of a gain, assuming you are getting the hierarchy data from indexed fields, but it's worth a try.
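A rough sketch of that split, with made-up table names: node_map holds only the parent/child keys, node_info holds the display attributes, and the hierarchy walk runs against the narrow mapping table before the display data is joined back. The :root_key bind variable stands in for whatever starting key you use.

-- Walk the narrow mapping table, then join the display data back on (hypothetical tables).
SELECT
    m.key,
    i.info
FROM
    (SELECT key
     FROM   node_map
     START WITH key = :root_key
     CONNECT BY PRIOR key = key_hier) m
JOIN
    node_info i ON i.key = m.key;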
Have you tried it using CONNECT BY yet? I'm a big fan of trying different variations.