How to stop concurrent writing in Delta Lake External Table? - azure-databricks

A typical EXTERNAL table, such as in Oracle, doesn't allow Insert/Update operations. But a Databricks EXTERNAL Delta table does allow Update/Insert operations. I see this as a flaw. Is there any way to stop that? Example -
CREATE TABLE employee
USING DELTA
PARTITIONED BY (date)
LOCATION '/mnt/ADLS/employee'
This will create an External table named employee that loads data from /mnt/ADLS/employee. Now if I create another External table, say employee_new, over the same location and do an INSERT, it will get reflected in the actual employee table. Is there any way to stop this?
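For illustration, a minimal sketch of the scenario described above; the table name employee_new and the staging table employee_updates are hypothetical:
CREATE TABLE employee_new
USING DELTA
LOCATION '/mnt/ADLS/employee'

-- Rows written through employee_new land in the same Delta files,
-- so they are also visible when querying employee.
-- employee_updates is a hypothetical staging table.
INSERT INTO employee_new SELECT * FROM employee_updates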

Related

AUTO-Synchronize a table based on view - ORACLE DATABASES

I want to ask you if there is a solution to auto-synchronize a table, e.g., every minute, based on a view created in Oracle.
This view uses data from another table. I've created a trigger, but I noticed a big slowdown in the database whenever a user updates a column or inserts a row.
Furthermore, I've tried to create a scheduled job on the specified table (which I wanted to be synchronized with the view), but we don't have the privilege to do this.
Is there any other way to keep data updated between the table and the view?
PS: I'm using Toad for Oracle v12.9.0.71
A materialized view in Oracle is a database object that contains the results of a query. It can be a local copy of data located remotely, or be used to create a summary table based on aggregations of a table's data. Materialized views that store data based on remote tables are also known as snapshots.
Example:
SQL> CREATE MATERIALIZED VIEW mv_emp_pk
REFRESH FAST START WITH SYSDATE
NEXT SYSDATE + 1/48
WITH PRIMARY KEY
AS SELECT * FROM emp@remote_db;
You can use a cron job or DBMS_JOB to schedule a snapshot refresh.
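If you prefer an explicit schedule instead of the NEXT clause, here is a minimal sketch using DBMS_SCHEDULER; the job name and the one-minute interval are assumptions:
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'REFRESH_MV_EMP_PK',   -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_MVIEW.REFRESH(''MV_EMP_PK'', ''F''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=1',   -- refresh every minute
    enabled         => TRUE);
END;
/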

How to use external table in hive?

Can anyone please explain why and where we use external tables in Hive?
Please explain a scenario so it's easy to understand.
We use an external table when the underlying dataset pointed to by the Hive table is shared by many consumers, e.g. MapReduce jobs, Pig, etc., and a managed table when the dataset pointed to by the Hive table is used only by the Hive application.
Actually, in Hive a managed table has full control of its dataset: if you drop a managed table, the dataset is also deleted from the Hive warehouse (/user/hive/warehouse) in HDFS, but in the case of an external table, when you drop the table the dataset is not deleted from HDFS.
For example, suppose you have a 50 GB dataset. If you create multiple copies of it for different purposes, it will simply take more space, so the better option is to use an external table: when you drop the table the dataset is not deleted, and it can still be used by any other application, such as Pig or anything else.
As a rule of thumb: use an external table if you plan to work with the data not only from Hive but from other frameworks as well. Otherwise make it internal.
The only difference between an External and a Managed table in Hive is the drop table / drop partition behavior. For a Managed table it will drop the data as well; for an External table the data remains untouched in the table/partition location.
Use External in most cases. An external table allows you to change the table definition easily, and you can also create several tables on top of the same location.
Use a Managed table if the table is temporary/intermediate and the data should be deleted to free space.
A managed table can be converted to external and vice versa using
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
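To illustrate the point above about several tables on top of the same location, a minimal sketch; the table name, columns, and HDFS path are assumptions:
-- External table over an existing shared dataset; dropping it leaves the files in place.
CREATE EXTERNAL TABLE employee_ext (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/shared/employee';

-- Converting it back to managed would let DROP TABLE delete the files:
-- ALTER TABLE employee_ext SET TBLPROPERTIES('EXTERNAL'='FALSE');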

How to safely update hive external table

I have an external hive table and I would like to refresh the data files on a daily basis. What is the recommended way to do this?
If I just overwrite the files, and we are unlucky enough to have other Hive queries executing in parallel against this table, what will happen to those queries? Will they just fail? Will my HDFS operations fail? Or will they block until the queries complete?
If availability is a concern and space isn't an issue, you can do the following:
Make a synonym for the external table. Make sure all queries use this synonym when accessing the table.
When loading new data, load it to a new table with a different name.
When the load is complete, point the synonym to the newly loaded table.
After an appropriate length of time (long enough for any running queries to finish), drop the previous table.
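Hive has no synonym object, so a view is the closest equivalent for this pattern. A minimal sketch, assuming hypothetical tables daily_data_v1/daily_data_v2 and a view daily_data that all queries use:
-- One-time setup: all queries go through the view (the "synonym").
CREATE VIEW daily_data AS SELECT * FROM daily_data_v1;

-- Daily refresh: load the new files into a fresh external table...
CREATE EXTERNAL TABLE daily_data_v2 (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/daily_new';

-- ...then repoint the view once the load is complete.
ALTER VIEW daily_data AS SELECT * FROM daily_data_v2;

-- After running queries have had time to finish, drop the old table.
DROP TABLE daily_data_v1;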
First of all, if you are accessing any table it may have two types of locks:
exclusive (if data is getting added) and shared (if data is getting read).
So if you run an insert overwrite to add data to the table, and at that time you access the table with other queries, they won't get executed because there will be an exclusive lock on it; once the insert overwrite query completes, you can access the table again.
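To see this in practice you can inspect the locks while the overwrite is running; a small sketch (the table names are assumptions, and this requires Hive's lock manager / concurrency support to be enabled):
-- In one session, the daily refresh:
INSERT OVERWRITE TABLE my_table SELECT * FROM my_staging_table;

-- In another session, inspect the locks taken on the table:
SHOW LOCKS my_table;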
Please refer to the following link:
https://cwiki.apache.org/confluence/display/Hive/Locking

oracle synchronize 2 tables

I have the following scenario and need to solve it in ORACLE:
Table A is on a DB-server
Table B is on a different server
Table A will be populated with data.
Whenever something is inserted into Table A, I want to copy it to Table B.
Table B has nearly the same columns, but sometimes I just want to take the
content of 2 columns from Table A, concatenate it, and save it to Table B.
I am not very familiar with Oracle, but after researching on Google,
some say that you can do it with triggers or views. How would you do it?
So in general, there is a table which will be populated, and its content
should be copied to a different table.
This is the solution I have come up with so far:
create public database link
other_db
connect to
user
identified by
pw
using 'tns-entry';
CREATE TRIGGER modify_remote_my_table
AFTER INSERT ON my_table
BEGIN INSERT INTO ....?
END;
/
How can I select the latest row that was inserted?
If the databases of these two tables are on two different servers, then you will need a database link (db link) created in the Table A schema so that it can access (read/write) the Table B data over the db link.
Step 1: Create a database link in the Table A server's DB pointing to the Table B server's DB.
Step 2: Create a trigger on Table A which inserts the data into Table B using the database link. You can customize (concatenate the values) inside the trigger before inserting into Table B.
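A minimal sketch of such a trigger, reusing the database link from the question; the remote table name table_b and the column names (col1, col2, combined_col) are assumptions:
CREATE OR REPLACE TRIGGER modify_remote_my_table
AFTER INSERT ON my_table
FOR EACH ROW
BEGIN
  -- :NEW holds the row that was just inserted, so there is no need
  -- to select the "latest" row separately.
  INSERT INTO table_b@other_db (combined_col)
  VALUES (:NEW.col1 || '-' || :NEW.col2);
END;
/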
This link should help you
http://searchoracle.techtarget.com/tip/How-to-create-a-database-link-in-Oracle
Yes, you can do this with triggers. But there may be a few disadvantages:
What if database B is not available? -> You need exception handling in your trigger.
What if database B was not available for 2 hours? You inserted data into database A which is now missing in database B. -> You have to do crazy things like temporarily inserting it into a cache table in database A.
Performance. The performance of inserting a lot of data will be ugly: each time you insert data, Oracle will start the PL/SQL engine to insert the data into the remote database.
Maybe you could think about using MViews (Materialized Views) to replicate the data via the database link. Later you can build your queries so that they access tables from database B and add the required data from database A by joining the MViews.
You can also use fast refresh to replicate the data (almost) in real time.
From the perspective of an Oracle database admin this makes a lot more sense than the trigger approach.
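A small sketch of that approach; the table name table_a, the link name to_db_a, and the refresh interval are assumptions:
-- On database A (the master): a materialized view log enables fast refresh.
CREATE MATERIALIZED VIEW LOG ON table_a WITH PRIMARY KEY;

-- On database B: replicate table A's data over the database link.
CREATE MATERIALIZED VIEW mv_table_a
REFRESH FAST START WITH SYSDATE
NEXT SYSDATE + 1/1440      -- refresh roughly every minute
WITH PRIMARY KEY
AS SELECT * FROM table_a@to_db_a;
Queries on database B can then join mv_table_a with local tables and do the concatenation at query time.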
You can also try the code linked below. Database links are considered rather insecure, Oracle's own options have licences associated with them these days, and some of the other options are deprecated as well.
https://gist.github.com/anonymous/e3051239ba401e416565cdd912e0de8c
It uses ORA_ROWSCN to sync tables across two different Oracle databases.

DDL sync informatica

I have a question: I have a table (say tableA) in a database (say dbA) and I need to mirror tableA as another table (say tableB) in another database (say dbB).
I know this can be done via a (materialised) view or via Informatica. But my problem is that I need to sync the DDL as well. For example, if a column is added to tableA, the column should automatically be reflected in tableB.
Can this be done in any way directly via Oracle or Informatica?
(Or will I have to write a procedure to sync the tables on the basis of all_tab_cols?)
Yes, you could:
(1) create another database as a logical standby database with Data Guard, or
(2) use Oracle Streams.
I would use (2) if you just need a single table in the other database, or (1) if you need an entire schema (or more).