What is the use of disable operation in hbase? - hadoop

I know it is to disallow anyone from performing any operation on a table, when a schema change is going to be made.
> disable ‘table_name’
But I want more clarification on it. Why should we disallow others to perform any operation on it? Is it just because wrong and unexpected results would be given when a query is made while a schema change is undergoing...!

HBase is a strictly consistent NoSQL database in case of reads and writes.
So achieving consistency is very important for HBase during DB operations.
HBase demands disabling table in case of altering schema changes and dropping tables.
HBase doesn't have a protocol to tell all the regions to update the schema changes online. So we need to disable the table before alter it.
HBase table drop is two step procedure:
Closing all the regions. i.e disable the table
Dropping them. i.e drop the table.
So We must disable all operations except a few operations like list, is_enabled, is_disabled etc... on the table before dropping it.

Related

Does creating index in Oracle locks the table for reads?

If we specify ONLINE in the CREATE INDEX statement, the table isn't locked during creation of the index. Without ONLINE keyword it isn't possible to perform DML operations on the table. But is the SELECT statement possible on the table meanwhile? After reading the description of CREATE INDEX statement it still isn't clear to me.
I ask about this, because I wonder if it is similar to PostgreSQL or SQL Server:
In PostgreSQL writes on the table are not possible, but one can still read the table - see the CREATE INDEX doc > CONCURRENTLY parameter.
In SQL Server writes on the table are not possible, and additionally if we create a clustered index reads are also not possible - see the CREATE INDEX doc > ONLINE parameter.
Creating an index does NOT block other users from reading the table. In general, almost no Oracle DDL commands will prevent users from reading tables.
There are some DDL statements that can cause problems for readers. For example, if you TRUNCATE a table, other users who are in the middle of reading that table may get the error ORA-08103: Object No Longer Exists. But that's a very destructive change that we would expect to cause problems. I recently found a specific type of foreign key constraint that blocked reading the table, but that was likely a rare bug. I've caused a lot of production problems while adding objects, but so far I've never seen adding an index prevent users from reading the table.

Retain privileges when dropping objects in Oracle

It occurred to me that I have a fundamental issue with respect to privileges.. Anyone who is granted access to my data warehouse, will be given privileges to objects in the reporting schema. However, whenever we drop objects, those privileges are lost.
The fundamental requirements that should be met with the approach are:
Indexes not populated during load of data (dropped, disabled?) to avoid populating while inserting
Retain existing privileges.
What do you guys think is the best approach based on the requirements above?
For requirement 1: depending on the version of Oracle you're running, you may be able to alter the indexes as invisible. Making indexes invisible will cause the optimizer to ignore them, but it can come in handy because you can simply make them visible again after whatever operation you're performing. If that won't work, you could alter them unusable instead. More info here: https://oracle-base.com/articles/11g/invisible-indexes-11gr1
For requirement 2: Once an object is dropped, the privileges are dropped along with it. There's not really any straightforward way to retain the grants as they are when an object is dropped, however, you could use a number of different methods to "save" the privileges when a table is dropped. These are just some ideas to get you going, not a guaranteed method of success.
Method 1: Using Triggers and DBMS_SCHEDULER to issue the grants. Triggers can be very powerful, and if you create a trigger that is set to run when a table of a specific name is created under a specific schema, you can use DBMS_SCHEDULER to run a job that will issue the missing grants.
Method 2: Per Littlefoot's suggestion, you can save the grant statements in a SQL script and run it manually every time the table is created (or create a trigger for it!)
Method 3: Work with the business and implement a process wherein the table does not need to be dropped, and instead is altered to fit business needs. To use this method, you'll have to understand why the object is being dropped in the first place. Is a drop really necessary to accomplish the desired outcome? I've seen teams request that tables be dropped when they really just wanted the tables to be truncated. If this is one of those scenarios, truncating instead of dropping will let you keep the object and its grants intact.
In any scenario, you'll also want to make sure that you are managing permissions via roles whenever possible, rather than issuing grants to individual users/schemas. Utilizing roles will make managing permissions a lot easier in just about any scenario.
If you DROP an object, the grants are gone. However:
Indexes not populated during load of data (dropped, disabled?) to avoid
populating while inserting
Retain existing privileges.
Here is one common approach. There are others. If you have partitioning there are better ways.
ALTER INDEX my_index1 UNUSABLE;
ALTER INDEX my_index2 UNUSABLE;
...
ALTER INDEX my_indexn UNUSABLE;
TRUNCATE TABLE my_table_with_n_indexes; -- OPTIONAL (depends if you need to start empty)
INSERT /*+ APPEND */ INTO my_table_with_n_indexes; -- Do your load here. APPEND hint optional, depending on what you are doing
ALTER INDEX my_index1 REBUILD;
ALTER INDEX my_index2 REBUILD;
...
ALTER INDEX my_indexn REBUILD;

Dynamic Audit Trigger

I want to keep logs of all tables into 1 single log table. Suppose if any DML operation is going on any table inside DB. Than that should be logged in 1 single tables.
But there should be a dynamic trigger which will not hard coded the column names for every table.
Is there any solution for this.
Regards,
Somdutt Harihar
"Is there any solution for this"
No. This is not how databases work. Strongly enforced data structures is what they do, and that applies to audit tables just as much as transaction tables.
The reason is quite clear: the time you save not writing audit code specific to each transactional table is the time you will spend writing a query to retrieve the audit records. The difference is, when you're trying to get the audit records out you will have your boss standing over your shoulder demanding to know when you can tell them what happened to the payroll records last month. Or asking how long it will take you to produce that report for the regulators, are you trying to make the company look like a bunch of clowns? You get the picture. This is not where you want to be.
Also, the performance of a single table to store all the changes to all the tables in the database? That is going to be so slow, you have no idea.
The point is, we can generate the auditing code. It is easy to write some SQL which interrogates the data dictionary and produces DDL for the target tables and triggers to populate those tables.
In fact it gets even easier in 11.2.0.4 and later because we can use FLASHBACK DATA ARCHIVE (formerly Oracle Total Recall) to build and maintain such journalling functionality automatically, and query it automatically with the as of syntax. Find out more.
Okay, so technically there is a solution. You could have a trigger on each table which executes some dynamic PL/SQL to interrogate the data dictionary and assembles a piece of JSON which you stuff into your single table. The single table could be partitioned by day range and sub-partitioned by table name (assuming you have licensed the Partitioning option) to mitigate the performance of querying it.
But that is extremely complex. Running dynamic PL/SQL for every DML statement will have a bad effect on performance, which the users will notice. And this still doesn't solve the fundamental problem of retrieving the audit trail when you need it.
To audit DML actions on any table just enable such audit by using following code:
audit insert table, update table, delete table;
All actions with tables will then be logged to sys.dba_audit_object table.
Audit will only log timestamp, user, host and other params, not exact copies of new or old rows.

How to safely update hive external table

I have an external hive table and I would like to refresh the data files on a daily basis. What is the recommended way to do this?
If I just overwrite the files, and if we are unlucky enough to have some other hive queries to execute in parallel against this table, what will happen to those queries? Will they just fail? Or will my HDFS operations fail? Or will they block until the queries complete?
If availability is a concern and space isn't an issue, you can do the following:
Make a synonym for the external table. Make sure all queries use this synonym when accessing the table.
When loading new data, load it to a new table with a different name.
When the load is complete, point the synonym to the newly loaded table.
After an appropriate length of time (long enough for any running queries to finish), drop the previous table.
First of all.. if you are accessing any table it may have two types of locks:
exclusive(if data is getting added) and shared(if data is getting read)..
so if you insert overwrite and add data into the table then at that time if you access the table with other queries, they wont get executed because there will be an exclusive lock on it and once the insert overwrite query completes then you may access the table.
Please refer to the following link:
https://cwiki.apache.org/confluence/display/Hive/Locking

Can you using joins with direct path inserts?

I have tried to find examples but they are all simple with a single where clause. Here is the situation. I have a bunch of legacy data transferred from another database. I also have the "good" tables in that same database. I need to transfer (data-conversion) data from the legacy tables to thew tables. Because this is a different set of tables the data-conversion requires complex joins to put the old data into the new tables correctly.
So, old tables old data.
New tables must have the old data but it requires lots of joins to get that old data into the new tables correctly.
Can I use direct path with lots of joins like this? INSERT SELECT (lots of joins)
Does direct path apply to tables that are already on the same database (transfer between tables)? Is it only for loading tables from say a text file?
Thank you.
The query in your SELECT can be as complex as you'd like with a direct-path insert. The direct-path refers only to the destination table. It has nothing to do with the way that data is read or processed.
If you're doing a direct-path insert, you're asking Oracle to insert the new data above the high water mark of the table so you bypass the normal code that reuses space in existing blocks for new rows to be inserted. It also has to block other inserts since you can't have the high water mark of the table change during a direct-path insert. This probably isn't a big deal if you've got a downtime window in which to do the load but it would be quite problematic if you wanted the existing tables to be available for other applications during the load.
No, on the contrary, it means you need to do a backup after a NOLOGGING load, not that you can't backup the database.
Allow me to elaborate a bit. Normally, when you do DML in Oracle, the before images of the changes you are are making get logged in UNDO, and all the changes (including the UNDO changes) are first written to REDO. This is how Oracle manages transactions, instance recovery, and database recovery. If a transaction is aborted or rolled back, Oracle uses the information in UNDO to undo the changes your transaction made. If the instance crashes, then on instance restart, Oracle will use the information in REDO and UNDO to recover up to the last committed transaction. First, Oracle will read the REDO and roll forward, then, use UNDO to roll back all the transactions that were not committed at the time of the crash. In this way, Oracle is able to recover up to the last committed transaction.
Now, when you specify an APPEND hint on an insert statement, Oracle will execute the INSERT with direct load. This means that data is loaded into brand new, never before used blocks, from above the highwater mark. Because the blocks being loaded are brand new, there is no "before image", so, Oracle can avoid writing UNDO, which improves performance. If the database is in NOARCHIVELOG mode, then Oracle will also not write REDO. On a database in ARCHIVELOG mode, Oracle will still write REDO, unless, before you do the insert /*+ append */, you set the table to NOLOGGING, (i.e. alter table tab_name nologging;). In that case, REDO logging is disabled for the table. However, this is where you could run into backup/recovery implications. If you do a NOLOGGING direct load, and then you suffer a media failure, and the datafile containing the segment with the nologging operation is restored from a backup taken before the nologging load, then the redo log will not contain the changes required to recover that segment. So, what happens? Well, when you do a NOLOGGING load, Oracle writes extent invaldation records to the redo log, instead of the actual changes. Then, if you use that redo in recovery, those data blocks will be marked logically corrupt. Any subsequent queries against that segment will get an ORA-26040 error.
So, how to avoid this? Well, you should always take a backup imediately following any NOLOGGING direct load. If you restore/recover from a backup taken after the nologging load, there is no problem, because the data will be in the datablocks in the file that was restored.
Hope that's clear,
-Mark
Yes, there should not be any arbitrary limits on query complexity.
If you do
insert /*+ APPEND */ into target_table select .... from source1, source2..., sourceN where
It should work fine. Consider though, that the performance of the load will be limited by the performance of that query, so, be sure it's well-tuned, if you're expecting good performance.
Finally, consider whether setting NOLOGGING on the target table would improve performance significantly. But, also consider the backup recovery implications, if you decide to implement NOLOGGING.
Hope that helps,
-Mark

Resources