Can Oracle materialized views be used to join multiple related tables (linked by foreign keys) into a larger, denormalized table that is refreshed instantaneously?
From what I have read so far, it appears that joins are not allowed when using fast refresh.
Is my assumption wrong that I can do this sort of thing with Oracle materialized views?
Assuming all the tables are local (i.e. you are not trying to replicate the data from a remote database and do the joins all in one step), the restrictions you need to be aware of are listed in the Data Warehousing Guide, not the replication manual. The specific set of restrictions depends on the Oracle version but you should be able to create a fast-refreshable denormalized view of your data.
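Roughly, a fast-refreshable join materialized view needs materialized view logs created WITH ROWID on each base table, and the ROWID of every joined table in the select list. A minimal sketch, with made-up ORDERS and CUSTOMERS tables (check the restrictions for your exact version):

CREATE MATERIALIZED VIEW LOG ON customers WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON orders    WITH ROWID;

CREATE MATERIALIZED VIEW orders_denorm_mv
  REFRESH FAST ON COMMIT
AS
SELECT o.ROWID AS o_rowid,   -- ROWIDs of all joined tables are required for fast refresh
       c.ROWID AS c_rowid,
       o.order_id,
       o.order_date,
       c.customer_id,
       c.customer_name
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id;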
I am trying to understand whether there is any such concept in Oracle Database.
Let's say I have two databases, Database_A and Database_B.
Database_A has schema_A; is there a way I can attach this schema to Database_B?
What I mean is: if there is a job populating TABLE_A in schema_A, I want to see a read-only view of it in Database_B. We are trying to split a big Oracle database into two smaller databases, we have a vast amount of PL/SQL code, and we are trying to minimize the refactoring here.
Sharding might be what you're looking for. The schemas and tables will still logically exist on all databases, but you can arrange for the data to be physically stored in specific databases. There might be a way to set up shardspaces, tablespaces, and user default tablespaces so that each schema's data is automatically stored in a specific database.
But I haven't actually used sharding. From what I've read, it seems to be designed for massive distributed OLTP systems, and it is likely complicated to administer. I'd guess this feature isn't worth the hassle unless you have petabytes of data.
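If you do go down that road, the rough idea (hedged, since I have not used it myself) is that each table is declared as sharded and partitioned by a sharding key across a tablespace set you have created beforehand; something along these lines, with invented names:

CREATE SHARDED TABLE customers
( cust_id   NUMBER NOT NULL,
  cust_name VARCHAR2(100),
  CONSTRAINT customers_pk PRIMARY KEY (cust_id)
)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO
TABLESPACE SET ts_shard_1;  -- ts_shard_1 is assumed to exist in the shard configuration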
Which is better (performance-wise and operationally in the long run) for maintaining loaded data: managed or external tables?
And by maintaining, I mean that these tables will have the following operations frequently, on a daily basis:
Selects that use partitions most of the time, though for some queries they are not used.
Deletes of specific records, not the whole partition (for example, a problem is found in some columns and the rows need to be deleted and inserted again). I am not sure this is supported for normal tables unless they are transactional.
Most important, the need to merge files frequently, maybe twice a day, to combine small files and reduce the number of mappers. I know CONCATENATE is available on managed tables and INSERT OVERWRITE on external ones; which one costs less?
It depends on your use case. External tables are recommended when the data is shared across multiple applications, for example when Pig or another tool is used alongside Hive to process the data; in that kind of scenario external tables are mainly recommended. They are used when you are mainly reading data.
With managed tables, on the other hand, Hive has complete control over the data. You can still convert any external table to managed and vice versa:
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
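And, assuming the same property toggle works on your Hive version, the reverse conversion back to a managed table:

alter table table_name SET TBLPROPERTIES('EXTERNAL'='FALSE');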
Since in your case you are modifying the data frequently, it is better for Hive to have total control over the data. In this scenario it is recommended to use managed tables.
Apart from that, managed tables are more secure than external tables, because external table data can be accessed by anyone with access to the files. With managed tables you can implement Hive-level security, which provides better control; with external tables you will have to implement HDFS-level security.
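Regarding the file-merge part of your question: on a managed table stored as ORC (or RCFile) you can merge the small files of a partition in place with CONCATENATE, while for an external table a common workaround is to rewrite the partition onto itself. A rough sketch with made-up table, column, and partition names:

-- managed ORC table: merge small files of one partition in place
ALTER TABLE sales PARTITION (load_date='2023-01-01') CONCATENATE;

-- external table: rewrite the partition onto itself to compact the files
INSERT OVERWRITE TABLE sales_ext PARTITION (load_date='2023-01-01')
SELECT col1, col2, col3
FROM   sales_ext
WHERE  load_date='2023-01-01';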
You can refer to the links below, which give a few pointers on the considerations:
External Vs Managed tables comparison
We have transaction tables in Oracle, and for reporting purposes we need this data transferred in real time to a flat Oracle table in another database. Report performance is great when it runs against this flat table.
Currently we are using GoldenGate for replication to the other database and a materialized view to populate the flat table, but due to some problems we need to switch to another way of populating/maintaining it. What options do we have?
It is a pretty basic requirement, but the solutions I can find are geared toward batch processing. Please also mention any other solutions you feel would better serve this purpose. Changing the target database to something else is also an option, as there may be more such reports coming.
We are trying to migrate Oracle tables to Hive and process them there.
Currently the tables in Oracle have primary key, foreign key, and unique key constraints.
Can we replicate the same in Hive?
We are doing some analysis on how to implement it.
Hive indexing was introduced in Hive 0.7.0 (HIVE-417) and removed in Hive 3.0 (HIVE-18448); please read the comments in that Jira. The feature turned out to be useless in Hive: such indexes were too expensive for big data. RIP.
As of Hive 2.1.0 (HIVE-13290), Hive includes support for non-validated primary and foreign key constraints. Because these constraints are not validated, an upstream system needs to ensure data integrity before the data is loaded into Hive. The constraints are useful for tools that generate ER diagrams and queries, and they also serve as self-documentation: you can easily find out what is supposed to be the PK if the table declares such a constraint.
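For illustration (table and column names are made up), the constraints are declared as DISABLE NOVALIDATE, i.e. purely informational:

CREATE TABLE dept (
  dept_id   INT,
  dept_name STRING,
  PRIMARY KEY (dept_id) DISABLE NOVALIDATE
);

CREATE TABLE emp (
  emp_id   INT,
  emp_name STRING,
  dept_id  INT,
  PRIMARY KEY (emp_id) DISABLE NOVALIDATE,
  CONSTRAINT emp_dept_fk FOREIGN KEY (dept_id)
    REFERENCES dept(dept_id) DISABLE NOVALIDATE
);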
In an Oracle database, unique, PK, and FK constraints are backed by indexes, so they work fast and are really useful. But that is not how Hive works, nor what it was designed for.
A quite normal scenario is that you have loaded a very big file of semi-structured data into HDFS. Building an index on it is too expensive, and without an index the only way to check for a PK violation is to scan all the data. Normally you cannot enforce constraints in big data systems. An upstream process can take care of data integrity and consistency, but that does not guarantee you will never end up with a PK violation in some big Hive table loaded from different sources.
Some file storage formats like ORC have internal lightweight "indexes" to speed up filtering and enable predicate push-down (PPD), but no PK or FK constraints are implemented with such indexes. That cannot be done because a Hive table normally consists of many such files, and the files can even have different schemas. Hive was created for petabytes, you can process petabytes in a single run, the data can be semi-structured, and the files can have different schemas. Hadoop does not support random writes, which adds further complication and cost if you want to rebuild indexes.
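To show what those lightweight indexes look like in practice (names are made up): you store the table as ORC, optionally add a Bloom filter on the columns you filter on, and let predicate push-down skip stripes that cannot match:

SET hive.optimize.ppd=true;          -- enable predicate push-down
SET hive.optimize.index.filter=true; -- use ORC file/stripe metadata for filtering

CREATE TABLE events_orc (
  event_id BIGINT,
  user_id  BIGINT,
  event_ts TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ('orc.bloom.filter.columns'='user_id');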
I am converting Sybase stored procedures into Oracle views. This is what they want, and it is not my first choice. My question is:
Are there any limitations in Oracle views? Total number of columns? Can you create indexes on views? Since I will be creating subviews and joining views within views, are there any limitations on how many layers of views you can have?
Thanks
Are there any limitations in oracle views?
Total number of columns?
Are there any limitations on how many layers of views you can have?
All covered here: http://docs.oracle.com/database/121/REFRN/GUID-685230CF-63F5-4C5A-B8B0-037C566BDA76.htm#REFRN0043
The number of columns in a view has the same limits as the number of columns in a table.
Can you create indexes for views?
No, you cannot. However, you can create a materialized view, and that can be indexed.
You're limited to 1000 columns in a table. It wouldn't shock me if there was a similar limit for views. But if you're creating a view with 1000 columns, you're probably doing something very wrong.
You cannot create indexes on views but you can create indexes on the underlying tables that queries against views can use. You can index materialized views since, as the name implies, they materialize the data into a separate structure. But then you have to deal with refreshing the materialized view on commit, which adds overhead to transactions, or tolerate stale data and refresh the materialized view on some schedule.
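A hedged sketch of that materialized-view option, with invented names; because the materialized view is a real segment, an ordinary index on it works:

CREATE MATERIALIZED VIEW report_mv
  BUILD IMMEDIATE
  REFRESH FORCE ON DEMAND  -- or ON COMMIT, with the extra transaction overhead noted above
AS
SELECT customer_id,
       TRUNC(order_date) AS order_day,
       SUM(amount)       AS total_amount
FROM   orders
GROUP  BY customer_id, TRUNC(order_date);

CREATE INDEX report_mv_ix ON report_mv (customer_id, order_day);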
There is no limit to the number of layers of views you can have. Depending on the Oracle version, the complexity of the views, and things like the presence of constraints, you can end up with queries that either force Oracle to do extra work (i.e. joining in additional tables in view layers that your end query doesn't actually need) or that are too complex for the optimizer to find a decent plan.
There aren't any significant limitations on the complexity of views.
The Oracle cost-based optimizer does a decent (if inscrutable) job of figuring out how to carry out queries to views using the indexes of the underlying tables.
If that performance isn't enough, you might consider looking into materialized views.
The usual way of doing the kind of project you're doing is to get the views working, then use EXPLAIN PLAN to do the optimization.
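For reference, a minimal EXPLAIN PLAN round trip against one of the layered views (the view name is invented):

EXPLAIN PLAN FOR
SELECT *
FROM   customer_orders_v
WHERE  customer_id = 42;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);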