I am using GoldenGate to replicate a table from one DB to multiple DBs. The challenging part is that in one DB the table should be replicated in full (all table columns), but in the rest of the DBs the table needs to be only partially replicated, i.e. just a few columns, not all of them.
Is it possible to have a column exception at the replication level?
I know it is possible at the extract level, but that doesn't fit my scenario.
COLSEXCEPT is an EXTRACT parameter only; it cannot be used on the replication side.
For tables with a large number of columns, COLSEXCEPT can help by excluding a few columns instead of listing all of them in the extract parameter file.
You need to solve this on the REPLICAT side by mapping the necessary columns to the target table using COLMAP. I don't think USEDEFAULTS will work here, since you mentioned that you need only a few columns (does that mean the table structure is different from SOURCE to TARGET?).
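For illustration, a REPLICAT parameter file along these lines maps only the columns the target copy needs; the group, schema, table, and column names below are assumptions, not taken from the question:

-- replicat parameter file (sketch only)
REPLICAT rslim
USERIDALIAS gg_target
MAP src.orders, TARGET rpt.orders_slim,
  COLMAP (
    order_id   = order_id,
    order_date = order_date,
    status     = status
  );

Source columns not listed in COLMAP are simply not applied to that target, while the full-copy database can keep a plain MAP with USEDEFAULTS.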
After googling for a while, I am posting this question here since I was not able to find this problem posted anywhere.
Our application has a table with 274 columns (no LOB or LONG RAW columns), and over a period of 8 years the table has accumulated chained rows, so any full table scan hurts performance.
When we dug deeper we found that approximately 50 columns are not used anywhere in the application and so could be dropped right away. The challenge is that the application would have to undergo many code changes to achieve this, and we have also exposed the underlying data as a service that is consumed by other applications. So code changes are not an option for now.
Another option we considered was making these 50 columns virtual columns that are always NULL; then we would only need to change the table-loading procs and everything else would stay as is. But I need expert advice on whether adding virtual columns to the table would avoid constructing chained rows again. Will this solution work for the given problem?
Thanks
Rammy
Oracle only stores up to 255 columns per row piece. For tables with more than 255 columns it splits each row into multiple row pieces, which can end up in separate blocks; the documentation covers this under row chaining.
Your table has 274 columns, so you have chained rows because of the inherent table structure rather than the amount of space the data takes up. Making fifty columns always NULL won't change that.
So, if you want to eliminate the chained rows you really need to drop the columns. Of course you don't want to change all that application code, so what you can try is the following (a SQL sketch is given after these steps):
rename the table
drop the columns you don't want any more
create a view using the original table name and include NULL columns in the view's projection to match the original table structure.
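For example (the table and column names here are invented for the sketch):

-- 1. rename the real table out of the way
ALTER TABLE wide_table RENAME TO wide_table_base;

-- 2. drop the roughly 50 unused columns
ALTER TABLE wide_table_base DROP (unused_col1, unused_col2);   -- repeat for the rest

-- 3. recreate the original name as a view that exposes the dropped columns as NULLs
CREATE OR REPLACE VIEW wide_table AS
SELECT t.*,
       CAST(NULL AS VARCHAR2(50)) AS unused_col1,   -- match the original datatypes
       CAST(NULL AS NUMBER)       AS unused_col2
FROM   wide_table_base t;

Queries against the original name keep working; DML that touches the dropped columns would need attention (e.g. INSTEAD OF triggers), since those view columns are expressions.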
I am from the SQL data warehouse world, where I generate dimension and fact tables from a flat feed. In general data warehouse projects we divide the feed into fact and dimension tables.
I am completely new to Hadoop and I came to know that I can build a data warehouse in Hive. I am familiar with using GUIDs, which I think are applicable as primary keys in Hive. So, is the strategy below the right way to load fact and dimension tables in Hive?
Load the source data into a Hive table; let's say Sales_Data_Warehouse.
Generate the dimensions from Sales_Data_Warehouse, e.g.:
SELECT New_Guid(), Customer_Name, Customer_Address From Sales_Data_Warehouse
When all dimensions are done then load the fact table like
SELECT New_Guid() AS Fact_Key, Customer.Customer_Key, Store.Store_Key ...
FROM Sales_Data_Warehouse source
JOIN Customer_Dimension Customer
  ON source.Customer_Name = Customer.Customer_Name
 AND source.Customer_Address = Customer.Customer_Address
JOIN Store_Dimension Store
  ON Store.Store_Name = source.Store_Name
JOIN Product_Dimension Product
  ON ...
Is this the way I should load my fact and dimension tables in Hive?
Also, in general warehouse projects we need to update dimension attributes (e.g. Customer_Address changes to something else) or, rarely, update a fact table foreign key. So how can I have an INSERT-plus-UPDATE load in Hive, like a Lookup in SSIS or a MERGE statement in T-SQL?
We still get the benefits of dimensional models on Hadoop and Hive. However, some features of Hadoop require us to slightly adapt the standard approach to dimensional modelling.
The Hadoop file system is immutable: we can only append data, not update it. As a result we can only append records to dimension tables (while Hive has added an update feature and transactions, these still seem to be rather buggy). Slowly Changing Dimensions therefore become the default behaviour on Hadoop. To get the latest version of a record in a dimension table we have three options. First, we can create a view that retrieves the latest record using windowing functions. Second, we can have a compaction service running in the background that recreates the latest state. Third, we can store our dimension tables in mutable storage, e.g. HBase, and federate queries across the two types of storage.
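As a rough illustration of the first option, a view like the one below keeps only the newest version of each dimension row; the table and column names (customer_dim, customer_id, load_ts) are invented for the sketch:

CREATE VIEW customer_dim_current AS
SELECT customer_key, customer_id, customer_name, customer_address
FROM (
  SELECT d.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY load_ts DESC) AS rn   -- newest row per business key
  FROM customer_dim d
) v
WHERE rn = 1;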
The way data is distributed across HDFS makes it expensive to join data. In a distributed relational database (MPP) we can co-locate records with the same primary and foreign keys on the same node in the cluster, which makes it relatively cheap to join very large tables: no data needs to travel across the network to perform the join. This is very different on Hadoop and HDFS, where tables are split into big chunks and distributed across the nodes of the cluster, and we have no control over how individual records and their keys are spread around. As a result, joins between two very large tables on Hadoop are quite expensive, because data has to travel across the network.
We should therefore avoid joins where possible. For a large fact table and a dimension table we can de-normalise the dimension directly into the fact table. For two very large transaction tables we can nest the records of the child table inside the parent table and flatten the data out at run time. We can use SQL extensions such as array_agg in BigQuery/Postgres etc. to handle multiple grains in a fact table.
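For example, the de-normalisation route could simply bake the dimension attributes into the fact table at load time (a hypothetical CTAS; all names are invented):

CREATE TABLE sales_fact_denorm AS
SELECT f.fact_key,
       f.sale_amount,
       c.customer_name,        -- dimension attributes copied onto each fact row
       c.customer_address
FROM   sales_fact f
JOIN   customer_dim c
  ON   f.customer_key = c.customer_key;

The join cost is paid once during the load instead of in every downstream query.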
I would also question the usefulness of surrogate keys: why not use the natural key? Performance with complex compound keys may be an issue, but otherwise surrogate keys are not really useful, and I never use them.
Oracle uses logical blocks as the basic unit for storing data. Can a single block contain rows from two different tables?
Yes it can. Tables belonging to the same cluster can have rows within the same data block. That is the basic idea of a cluster: to keep related data as close together as possible, so that when you join the tables there is little extra work to do because the data is effectively joined already, and both logical and physical I/Os are reduced.
See https://docs.oracle.com/database/121/CNCPT/tablecls.htm#CNCPT608.
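A small sketch of an index cluster, following the pattern in that documentation (the cluster, table, and column names here are invented):

CREATE CLUSTER emp_dept_cluster (department_id NUMBER(4)) SIZE 512;

CREATE INDEX emp_dept_cluster_idx ON CLUSTER emp_dept_cluster;

CREATE TABLE departments_c (
  department_id   NUMBER(4) PRIMARY KEY,
  department_name VARCHAR2(30)
) CLUSTER emp_dept_cluster (department_id);

CREATE TABLE employees_c (
  employee_id   NUMBER(6) PRIMARY KEY,
  last_name     VARCHAR2(25),
  department_id NUMBER(4)
) CLUSTER emp_dept_cluster (department_id);

Rows from both tables that share the same department_id end up in the same cluster blocks.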
I have a client that mostly runs calculations on a single column over many rows of a table (a different column each time), which is the classic use case for a columnar DB.
The problem is that he is using Oracle, so what I thought of doing was to build a set of cluster tables where each table has just one column besides the PK, and in this way let him work in a pseudo-columnar model.
What are your thoughts on the subject?
Will it even work as expected, or am I just forcing the solution here?
Thanks,
Daniel
I didn't test that approach in the end, but I did achieve close to columnar ("vertical") performance using a sorted hash cluster table.
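For reference, a sorted hash cluster follows the pattern below (documented Oracle syntax, but the entity/measure names are invented here): rows hash on the key and are kept sorted on the SORT columns, so scanning one narrow table per measure stays cheap.

CREATE CLUSTER measure_cluster (
  entity_id NUMBER,
  sample_ts NUMBER SORT )
  HASHKEYS 10000
  HASH IS entity_id
  SIZE 256;

CREATE TABLE measure_a (
  entity_id NUMBER,
  sample_ts NUMBER SORT,
  value_a   NUMBER )
  CLUSTER measure_cluster (entity_id, sample_ts);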
I have a question regarding the Oracle COPY command:
Is it possible to copy data between databases (where the structure is the same) and honour the relationships in one go, without(!) writing procedures?
To be more precise:
Table B refers (via B.FK) to table A (A.PK) by a foreign key (B.FK -> A.PK; no relationship information is stored in the DB itself). The keys are generated by a sequence, which is used to create the PKs for all tables.
So how do I copy tables A and B while keeping the relationship intact and using the target DB's sequence to generate new primary keys for the copied data (I cannot reuse the "original" PK values, as they might already be used in the same table for a different dataset)?
I doubt that the COPY command can handle this situation, but what is the way to achieve the desired behaviour?
Thanks
Matthias
Oracle has several different ways of moving data from one database to another, of which the SQL*Plus COPY command is the most basic and the least satisfactory. Writing your own replication routine (as @OldProgrammer suggests) isn't much better.
You're using 11g, so move into the 21st century by using the built-in Streams functionality.
There is no way to synchronize sequences across databases. There is a workaround, which is explained by the inestimable Tom Kyte.
I generally prefer DB links, and then use SQL INSERT statements to copy the data over.
In your scenario, first insert the data of table A over the DB link, and then table B. If you try it the other way round, you will get an error.
For info on DB links, you can check this link: http://docs.oracle.com/cd/B28359_01/server.111/b28310/ds_concepts002.htm
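A hedged sketch of that approach; the link, account, table, and column names are assumptions, and re-keying the rows from the target sequence would still need an old-to-new key mapping on top of this:

CREATE DATABASE LINK src_db
  CONNECT TO app_user IDENTIFIED BY "app_password"
  USING 'SRC_TNS_ALIAS';

-- parent table first ...
INSERT INTO a (pk, some_col)
SELECT pk, some_col FROM a@src_db;

-- ... then the child, so that the FK values it carries already exist in the target
INSERT INTO b (pk, fk, other_col)
SELECT pk, fk, other_col FROM b@src_db;

COMMIT;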