Many-to-many: from relational design to dimensional design (ETL)

This is my database design:
I made a data warehouse with a two-fact-table design, as shown:
I want to add an Order dimension, but the problem is that there is a bridge table (Order Details) between the Product table and the Order table. For reference, the Order table contains 830 rows and the Order Details table contains 2,155 rows.
How can I create an Order dimension?
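One common dimensional pattern (an assumption here, not something stated in the post) is to treat the bridge table itself as the fact table at order-line grain: each Order Details row becomes one fact row that references both an Order dimension and a Product dimension. The minimal SQLite sketch below uses hypothetical table and column names (Orders, Products, OrderDetails, DimOrder, DimProduct, FactOrderLine) and toy data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical OLTP source: OrderDetails is the bridge between Orders and Products.
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE OrderDetails (
    OrderID INTEGER REFERENCES Orders(OrderID),
    ProductID INTEGER REFERENCES Products(ProductID),
    UnitPrice REAL, Quantity INTEGER,
    PRIMARY KEY (OrderID, ProductID)
);
INSERT INTO Orders VALUES (1, '1997-01-01'), (2, '1997-01-02');
INSERT INTO Products VALUES (10, 'Chai'), (20, 'Chang');
INSERT INTO OrderDetails VALUES (1, 10, 18.0, 2), (1, 20, 19.0, 1), (2, 10, 18.0, 5);
""")

# Dimensional target: one DimOrder row per order, one fact row per order line.
conn.executescript("""
CREATE TABLE DimOrder (OrderKey INTEGER PRIMARY KEY, OrderID INTEGER UNIQUE, OrderDate TEXT);
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, ProductID INTEGER UNIQUE, ProductName TEXT);
CREATE TABLE FactOrderLine (
    OrderKey INTEGER REFERENCES DimOrder(OrderKey),
    ProductKey INTEGER REFERENCES DimProduct(ProductKey),
    Quantity INTEGER, SalesAmount REAL
);
INSERT INTO DimOrder (OrderID, OrderDate) SELECT OrderID, OrderDate FROM Orders;
INSERT INTO DimProduct (ProductID, ProductName) SELECT ProductID, ProductName FROM Products;
-- The bridge rows become fact rows; natural keys are swapped for surrogate keys.
INSERT INTO FactOrderLine (OrderKey, ProductKey, Quantity, SalesAmount)
SELECT o.OrderKey, p.ProductKey, d.Quantity, d.UnitPrice * d.Quantity
FROM OrderDetails d
JOIN DimOrder o ON o.OrderID = d.OrderID
JOIN DimProduct p ON p.ProductID = d.ProductID;
""")

def row_counts():
    """Row count of each dimensional table."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in ("DimOrder", "DimProduct", "FactOrderLine")}

print(row_counts())  # {'DimOrder': 2, 'DimProduct': 2, 'FactOrderLine': 3}
```

With the real data, this layout would give an Order dimension of 830 rows and a fact table of 2,155 rows, one per Order Details row. If the orders carry no descriptive attributes worth a separate table, the order number can instead be kept as a degenerate dimension column directly on the fact table.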

Related

How can I add a column from one table to another in ClickHouse?

I have a table with the following columns: (id, col1, col2). I need to add col3 from a temporary table with columns (id, col3), so that after the operation table 1 is (id, col1, col2, col3). After this, I drop the temporary table. How can this be done in ClickHouse?
I know of an approach that uses the Join table engine. However, Join table data is stored in memory and I have memory restrictions. How can I achieve the same result without creating an in-memory table?
There is no magic in this realm. And the spoon does exist.
You can still use that approach, but on subsets of the data, making many smaller updates piece by piece (for example, 10% of the rows at a time).
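A minimal sketch of the piece-by-piece idea, assuming a MergeTree target table t1, a source table tmp, an integer id column (assumed UInt64) and col3 of type String; all of these names and types are hypothetical. The script only generates ClickHouse SQL and does not connect to a server. Each batch loads one hash bucket (id % batches) into a Join-engine table, fills col3 with joinGet inside an ALTER TABLE ... UPDATE mutation, and drops the Join table again, so only about 1/batches of tmp is held in memory at a time. On Replicated tables, ClickHouse may additionally require allow_nondeterministic_mutations = 1 for a mutation that calls joinGet.

```python
def batched_backfill_sql(target="t1", source="tmp", join_table="tmp_join",
                         key="id", column="col3", column_type="String",
                         batches=10):
    """Return ClickHouse statements that copy `column` from `source` into
    `target` one hash bucket of `key` at a time (all names hypothetical)."""
    stmts = [
        f"ALTER TABLE {target} ADD COLUMN IF NOT EXISTS {column} {column_type}",
        # Make each ALTER ... UPDATE wait for completion, so the Join table
        # is not dropped while a mutation still needs it.
        "SET mutations_sync = 1",
    ]
    for b in range(batches):
        bucket = f"{key} % {batches} = {b}"
        stmts += [
            f"DROP TABLE IF EXISTS {join_table}",
            # In-memory lookup table holding only this bucket of the source rows.
            f"CREATE TABLE {join_table} ({key} UInt64, {column} {column_type}) "
            f"ENGINE = Join(ANY, LEFT, {key})",
            f"INSERT INTO {join_table} SELECT {key}, {column} FROM {source} WHERE {bucket}",
            f"ALTER TABLE {target} UPDATE {column} = "
            f"joinGet('{join_table}', '{column}', {key}) WHERE {bucket}",
        ]
    # Clean up the lookup table and, as in the question, the temporary table.
    stmts += [f"DROP TABLE IF EXISTS {join_table}", f"DROP TABLE {source}"]
    return stmts


if __name__ == "__main__":
    print(";\n".join(batched_backfill_sql()) + ";")
```

Because SET only affects the current session, the generated statements should be run in a single clickhouse-client session. Raising batches lowers peak memory at the cost of more mutations.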

How do I link two Parse tables on a common field?

I have a simple relationship between two Parse tables (objects).
Both contain a 'name' string column.
All I need to do is create a join between the two tables on the 'name' column.
For example:
Table A (main) <--> Table B (ancillary)
where
'A.name' == 'B.name'.
Table A is the driving table: all criteria focuses on Table A.
Table B is the ancillary table that has a particular field (column) needed for Table A's result.
Note: the tables don't have any formal relation with each other.
What is the correct Parse syntax to allow this to happen?
...or must I make two queries instead of one merged query?
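If it does come down to two queries, the merge itself is a small client-side hash join. A minimal sketch, assuming both result sets have already been fetched as lists of dicts (the Parse fetch calls are not shown) and that extra is a hypothetical name for Table B's needed column:

```python
def attach_field(rows_a, rows_b, key="name", field="extra"):
    """Copy `field` from Table B rows onto Table A rows with the same `key`.

    Builds a dict from B once, then does one lookup per A row; A rows
    with no match get None. If B has duplicate keys, the first one wins.
    """
    lookup = {}
    for row in rows_b:
        lookup.setdefault(row[key], row[field])
    return [{**row, field: lookup.get(row[key])} for row in rows_a]


table_a = [{"name": "x", "score": 1}, {"name": "y", "score": 2}]
table_b = [{"name": "x", "extra": "E"}]
print(attach_field(table_a, table_b))
```

Table A stays the driving table: every A row is returned once, in its original order, whether or not B has a match.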

Comparing data in two tables is taking too long

I need to query table1 to find all orders and their created dates (the key is order number and date).
In table2 (the key is also order number and date), I need to check whether the order exists for a given date.
For this, I am scanning table1 and, for each record, checking whether it exists in table2. Is there a better way to do this?
In this situation, where your key is identical for both tables, it makes sense to have a single table that stores the data for both Table 1 and Table 2. That way you can do a single scan over your data and know straight away whether the data exists for both criteria.
Even more so, if you want to use this data in MapReduce, you would simply scan that single table. If you only want the relevant rows, you can define a filter on the Scan. For example, if rows that are absent from Table 2 simply have no Table 2 columns, a ColumnPrefixFilter on the Table 2 column prefix returns only the rows that exist in both.
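To illustrate the single-table layout, here is a small in-memory model (plain Python, not the HBase client API): each row key combines order number and date, Table 1 and Table 2 data live under hypothetical column prefixes t1: and t2:, and the filter keeps only rows with at least one t2: column, mirroring how a scan with a column-prefix filter skips rows that have no matching column.

```python
# Hypothetical merged table: row key "order|date" -> {column: value}.
table = {
    "1001|2013-01-05": {"t1:created": "2013-01-05T09:00", "t2:status": "shipped"},
    "1002|2013-01-05": {"t1:created": "2013-01-05T10:30"},
    "1003|2013-01-06": {"t1:created": "2013-01-06T08:15", "t2:status": "open"},
}


def rows_in_both(table, prefix="t2:"):
    """Row keys (in sorted order, as a scan returns them) that have at
    least one column starting with `prefix`."""
    return [row_key for row_key, cols in sorted(table.items())
            if any(col.startswith(prefix) for col in cols)]


print(rows_in_both(table))
```

One pass over the merged table answers the question for every key, instead of one Table 2 lookup per Table 1 row.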
If, however, you do need to keep this data in two separate tables, you could pre-split both tables with the same region boundaries. This helps with the query you are aiming for (load all rows in Table 1 when the row exists in Table 2), because it then becomes essentially a map-side join. You could define multiple inputs in your MapReduce job, and since the region boundaries are the same, the splits will be such that each mapper gets the corresponding rows from both tables. You would probably need to implement your own multiple-input InputFormat for that (the MultiTableInputFormat class recently introduced in 0.96 does not seem to do that map-side join).
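The map-side join works because each mapper reads two key-sorted inputs covering the same key range, so a single sequential merge pass replaces a lookup per Table 1 row. A minimal sketch of that merge step in plain Python (a sort-merge join over (key, value) pairs with unique, sorted keys; not Hadoop or HBase code):

```python
def merge_join(left, right):
    """Yield (key, left_value, right_value) for keys present in both inputs.

    Both inputs must be sorted by key with unique keys, as rows within a
    region are; each input is read exactly once.
    """
    left, right = iter(left), iter(right)
    l = next(left, None)
    r = next(right, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left, None)      # key only in the left input: skip it
        elif l[0] > r[0]:
            r = next(right, None)     # key only in the right input: skip it
        else:
            yield l[0], l[1], r[1]    # key in both: emit the joined row
            l = next(left, None)
            r = next(right, None)


table1_split = [("a", 1), ("b", 2), ("d", 4)]
table2_split = [("b", "x"), ("c", "y"), ("d", "z")]
print(list(merge_join(table1_split, table2_split)))
```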

Can I truncate the Magento catalog_product_entity_int table?

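In my Magento database, the catalog_product_entity_int table is very large (about 500 MB), and because of this, performance is low.
How can we reduce its size? Can this table be truncated the way we can truncate the log tables?
You cannot truncate the catalog_product_entity_int table: it holds the product attribute values themselves (every integer-typed attribute, such as status and visibility), so truncating it would wipe that data for every product.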
In Magento database, an entity can have several tables that share the same prefix.
For example, the product entity has the catalog_product_entity table for its main data and several other tables prefixed with “catalog_product_” such as catalog_product_entity_int, catalog_product_entity_media_gallery, catalog_product_entity_text and so on.
To store the data more efficiently, product details are stored separately depending on their data types.
When the value of the data is an integer type, it’s saved in the catalog_product_entity_int table, and when its type is an image, it’s saved in the catalog_product_entity_media_gallery table.
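A simplified model of this layout in SQLite shows why the _int table cannot simply be emptied: a product's integer attributes exist only there. Real Magento tables have additional columns, and the attribute IDs and values below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog_product_entity (entity_id INTEGER PRIMARY KEY, sku TEXT);
CREATE TABLE eav_attribute (attribute_id INTEGER PRIMARY KEY, attribute_code TEXT, backend_type TEXT);
-- One value table per data type, keyed by (entity, attribute, store).
CREATE TABLE catalog_product_entity_int (entity_id INTEGER, attribute_id INTEGER, store_id INTEGER, value INTEGER);
CREATE TABLE catalog_product_entity_varchar (entity_id INTEGER, attribute_id INTEGER, store_id INTEGER, value TEXT);

INSERT INTO catalog_product_entity VALUES (1, 'ABC-001');
INSERT INTO eav_attribute VALUES (71, 'name', 'varchar'), (96, 'status', 'int'), (102, 'visibility', 'int');
INSERT INTO catalog_product_entity_int VALUES (1, 96, 0, 1), (1, 102, 0, 4);
INSERT INTO catalog_product_entity_varchar VALUES (1, 71, 0, 'Blue Shirt');
""")


def product_attributes(entity_id, store_id=0):
    """Collect one product's attribute values from the typed value tables."""
    rows = conn.execute("""
        SELECT a.attribute_code, v.value FROM catalog_product_entity_int v
        JOIN eav_attribute a ON a.attribute_id = v.attribute_id
        WHERE v.entity_id = ? AND v.store_id = ?
        UNION ALL
        SELECT a.attribute_code, v.value FROM catalog_product_entity_varchar v
        JOIN eav_attribute a ON a.attribute_id = v.attribute_id
        WHERE v.entity_id = ? AND v.store_id = ?
    """, (entity_id, store_id, entity_id, store_id)).fetchall()
    return dict(rows)


print(product_attributes(1))
```

Emptying catalog_product_entity_int in this model leaves only the name, which is the loss the answer warns about.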
If you want to truncate tables for performance, you can truncate the tables below; I think you have a lot of data in them:
log_customer
log_visitor
log_visitor_info
log_url_info
log_quote
report_viewed_product_index
report_compared_product_index
catalog_compare_item
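A small helper that emits the corresponding MySQL statements for the tables listed above. The optional prefix parameter is an assumption for installs configured with a table prefix, and FOREIGN_KEY_CHECKS is switched off only in case one of these tables is referenced by a foreign key in your schema (MySQL refuses to TRUNCATE a referenced InnoDB table otherwise).

```python
# Tables from the list above.
CLEANUP_TABLES = [
    "log_customer",
    "log_visitor",
    "log_visitor_info",
    "log_url_info",
    "log_quote",
    "report_viewed_product_index",
    "report_compared_product_index",
    "catalog_compare_item",
]


def truncate_sql(prefix=""):
    """MySQL statements that empty the cleanup tables (prefix is hypothetical)."""
    return (["SET FOREIGN_KEY_CHECKS = 0"]
            + [f"TRUNCATE TABLE `{prefix}{name}`" for name in CLEANUP_TABLES]
            + ["SET FOREIGN_KEY_CHECKS = 1"])


print(";\n".join(truncate_sql()) + ";")
```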

DB project - improving performance with relationships

I have two tables; let's call them TableA and TableB. One record in TableA is related to one or more records in TableB. But among those TableB records there is also one special record for each TableA record (for example, the one with the lowest ID), and I want quick access to that special one. Data in both tables is not deleted; it is a kind of history that is rarely cleared. What is the best way to do this in terms of performance?
I thought of:
1) A two-way relationship, but it will affect insert performance.
2) Design another table with FK_TableA as the primary key (exactly one TableB record is "special" for each TableA record) and FK_TableB as the second column, and then create a view.
3) Design another table with (FK_TableA, FK_TableB) as the primary key, make FK_TableA unique, and then create a view.
I'm open for all other ideas :)
4) I'd consider an indexed view to hide the JOIN and the row restriction.
This is similar to your options 2 and 3, but the DB engine will maintain it for you. With a new table, you'll either compromise data integrity or have to manage the data via triggers.
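SQLite has no indexed views, so this sketch instead shows option 2 maintained by a trigger, which is exactly the extra work the answer mentions. SpecialB, trg_special and ASpecial are hypothetical names: the trigger records the first TableB row inserted for each TableA row, and the view hides the join. "First inserted" equals "lowest ID" here only because TableB ids keep growing, which holds since rows are not deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE TableA (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE TableB (
    id INTEGER PRIMARY KEY,
    a_id INTEGER NOT NULL REFERENCES TableA(id),
    payload TEXT
);
-- Option 2: at most one special TableB row per TableA row.
CREATE TABLE SpecialB (
    a_id INTEGER PRIMARY KEY REFERENCES TableA(id),
    b_id INTEGER NOT NULL UNIQUE REFERENCES TableB(id)
);
-- The first TableB row for an A row wins; later rows hit the primary key and are ignored.
CREATE TRIGGER trg_special AFTER INSERT ON TableB
BEGIN
    INSERT OR IGNORE INTO SpecialB (a_id, b_id) VALUES (NEW.a_id, NEW.id);
END;
-- The view hides the join, so readers just query ASpecial.
CREATE VIEW ASpecial AS
SELECT a.id AS a_id, a.label, b.id AS b_id, b.payload
FROM TableA a
JOIN SpecialB s ON s.a_id = a.id
JOIN TableB b ON b.id = s.b_id;
""")

conn.executemany("INSERT INTO TableA (id, label) VALUES (?, ?)",
                 [(1, "first"), (2, "second")])
conn.executemany("INSERT INTO TableB (a_id, payload) VALUES (?, ?)",
                 [(1, "x"), (1, "y"), (2, "z"), (1, "w")])

print(conn.execute("SELECT a_id, b_id, payload FROM ASpecial ORDER BY a_id").fetchall())
```

Each insert into TableB pays for one extra primary-key probe in SpecialB, while reading the special row becomes a single lookup by a_id.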
