So that I can use direct path for data loading, can I turn off a cluster temporarily until the data is loaded and then turn everything back on?
I'm guessing this question is related to your other question about using SQL*Loader direct path. I believe the restriction on using direct path in SQL*Loader is that the table must not be clustered. If the table you are inserting data into is not clustered, you can use direct path whether your Oracle instance is clustered or not.
So, if your table is not clustered, you should be able to use direct path loading without turning Oracle clustering off. If your table is clustered, you are out of luck: converting it to a non-clustered table and then clustering it again after the data is loaded would negate any performance gain from the direct path load.
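If you are not sure whether a given table is clustered, the data dictionary will tell you; a quick check (the table name below is a placeholder):

SELECT table_name, cluster_name
  FROM user_tables
 WHERE table_name = 'MY_TABLE';  -- a NULL cluster_name means the table is not clustered

If cluster_name comes back NULL, direct path loading should be available for that table.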
You are mixing two completely different concepts:
- a database cluster and
- a table cluster.
A database cluster (RAC) provides scalability and high availability, while a table cluster determines how and where the data is physically stored. Turning off RAC will not help with table clusters.
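To make the distinction concrete, here is a minimal sketch of a table cluster (all names are made up); it is purely a storage arrangement and is completely independent of RAC:

CREATE CLUSTER emp_dept_cluster (deptno NUMBER(2)) SIZE 1024;
CREATE INDEX emp_dept_cluster_idx ON CLUSTER emp_dept_cluster;

CREATE TABLE dept_c (deptno NUMBER(2) PRIMARY KEY, dname VARCHAR2(30))
  CLUSTER emp_dept_cluster (deptno);

CREATE TABLE emp_c (empno NUMBER(4) PRIMARY KEY, ename VARCHAR2(30), deptno NUMBER(2))
  CLUSTER emp_dept_cluster (deptno);

Tables created this way share data blocks keyed on deptno, and that storage layout is what blocks SQL*Loader direct path; turning RAC on or off changes nothing here.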
I'm trying to understand whether any such concept exists in Oracle Database.
Let's say I have two databases, Database_A & Database_B.
Database_A has schema_A. Is there a way I can attach this schema to Database_B?
What I mean by this is: if there is a job populating TABLE_A in schema_A, I can see that data as a read-only view in Database_B. We are trying to split a big Oracle database into two smaller databases, we have a vast amount of PL/SQL code, and we are trying to minimize the refactoring here.
Sharding might be what you're looking for. The schemas and tables will still logically exist on all databases, but you can arrange for the data to be physically stored in specific databases. There might be a way to set up shardspaces, tablespaces, and user default tablespaces so that each schema's data is automatically stored in a specific database.
But I haven't actually used sharding. From what I've read, it seems to be designed for massive distributed OLTP systems, and it is likely complicated to administer. I'd guess this feature isn't worth the hassle unless you have petabytes of data.
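For reference only, and untested on my part: the DDL for a sharded table looks roughly like the sketch below. The tablespace-set, table, and column names are made up, and it assumes a shard catalog and the shard databases are already configured.

CREATE TABLESPACE SET ts_set_a USING TEMPLATE (DATAFILE SIZE 100M);

CREATE SHARDED TABLE table_a
( id      NUMBER PRIMARY KEY,
  payload VARCHAR2(100)
)
PARTITION BY CONSISTENT HASH (id)
PARTITIONS AUTO
TABLESPACE SET ts_set_a;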
I need to replicate an Oracle database one way. The data in the replica will be read-only.
Due to some limitations I cannot use Oracle Streams, GoldenGate, or other commercial solutions.
What other options do we have to accomplish that?
Materialized views over database link might be one option.
Or, perhaps you could even consider exporting & importing data (using Data Pump utilities).
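A rough sketch of the materialized view approach (the link, user, and table names are placeholders, and a fast refresh would additionally need a materialized view log on the source table):

CREATE DATABASE LINK src_link
  CONNECT TO repl_user IDENTIFIED BY some_password
  USING 'SRC_DB_TNS_ALIAS';

CREATE MATERIALIZED VIEW mv_table_a
  REFRESH FORCE ON DEMAND
  AS SELECT * FROM table_a@src_link;

-- refresh periodically, e.g. from a DBMS_SCHEDULER job
BEGIN
  DBMS_MVIEW.REFRESH('MV_TABLE_A');
END;
/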
Which is better (performance-wise, and operationally in the long run) for maintaining loaded data: managed or external tables?
And by maintaining, I mean that these tables will have the following operations frequently, on a daily basis:
Selects that use partitions most of the time, though some queries do not use them.
Deletes of specific records, not the whole partition (for example, a problem is found in some columns and those rows need to be deleted and inserted again). I am not sure this is supported for normal tables, unless transactional tables are used.
Most important, the need to merge files frequently, maybe twice a day, to merge small files and end up with fewer mappers. I know CONCATENATE is available on managed tables and INSERT OVERWRITE on external ones; which one costs less?
It depends on your use case. External tables are recommended when the data is shared across multiple applications, for example when Pig or another tool processes the data alongside Hive; in that kind of scenario external tables are mainly recommended. They are used when you are mainly reading data.
With managed tables, on the other hand, Hive has complete control over the data. You can still convert a table between external and managed:
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
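and, if I remember correctly, the reverse direction (external back to managed) is the same statement with the property flipped:

alter table table_name SET TBLPROPERTIES('EXTERNAL'='FALSE');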
Since you are making frequent modifications to the data, it is better for Hive to have total control over it, so in this scenario managed tables are recommended.
Apart from that, managed tables are more secure than external tables, because external table data can be accessed by anyone who can reach the underlying files. With managed tables you can implement Hive-level security, which provides better control; with external tables you have to implement HDFS-level security.
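On the file-merge question, here is a hedged sketch of both options (table, partition, and column names are made up). For a managed ORC table, CONCATENATE merges small files within a partition; for an external table, rewriting the partition onto itself also compacts it:

-- managed ORC table: merge small files in one partition
ALTER TABLE sales_managed PARTITION (load_dt='2017-06-01') CONCATENATE;

-- external table: rewriting the partition compacts its files
INSERT OVERWRITE TABLE sales_external PARTITION (load_dt='2017-06-01')
SELECT col1, col2, col3
FROM sales_external
WHERE load_dt='2017-06-01';

For ORC, CONCATENATE is usually the cheaper of the two, since it merges at the stripe level instead of re-reading and rewriting every row.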
You can refer to the link below, which gives a few pointers on these considerations:
External Vs Managed tables comparison
For now, the only way I know to expand segments/hosts in Greenplum is to use the gpexpand utility. However, as far as I know, gpexpand stops the master server for quite a while during the early expansion phase, and it locks the table that is currently being redistributed. I just want to know whether there is any way for Greenplum to keep working normally (no stop, no locked tables) while expanding segments/hosts. Thanks!
No, Greenplum must stop during the expansion phase but after it adds more nodes/segments, the redistribution of data can be done while users are active in the database.
Alternatively, Pivotal HDB (based on Apache HAWQ) does have dynamic virtual segments that you can even control at the query level. The optimizer controls how many segments are used for a query based on the cost of the query but you can also provide more segments to really leverage the resources available in the cluster.
At work we are thinking of moving from Oracle to a NoSQL database, so I have to run some tests on Cassandra and MongoDB. I have to move a lot of tables to the NoSQL database, and the idea is to keep the data synchronized between these two platforms.
So I created a simple procedure that runs selects against the Oracle DB and inserts into Mongo. Some of my colleagues pointed out that there may be an easier (and more professional) way to do it.
Has anybody had this problem before? How did you solve it?
If your goal is to copy your existing structure from Oracle to a NoSQL database, then you should probably reconsider the move in the first place. By doing that you lose the benefits one gets from going to a non-relational data store.
A good first step would be to take a long look at your existing structure and determine how it can be modified to have a positive impact on your application. Additionally, consider a hybrid system at the same time. Cassandra is great for a lot of things, but if you need a relational system and are already using a lot of Oracle functionality, it likely makes sense for most of your database to stay in Oracle, while moving the pieces that require frequent writes and would benefit from a different structure to Mongo or Cassandra.
Once you've made the decisions about your structure, I would suggest writing scripts or programs, or adding a module to your existing app, to write the data in the new format to the new data store. That will give you the most fine-grained control over every step in the process, which, in a large system-wide architectural change, is something I would want to have.
You can also consider using components of the Hadoop ecosystem to perform this kind of ETL task. For that you need to model your Cassandra DB according to the requirements.
The steps could be to migrate your Oracle table data to HDFS (preferably using Sqoop) and then write a MapReduce job to transform that data and insert it into the Cassandra data model.