Hive copy schema without partitions or remove partitioning - hadoop

I'm trying to create a table and copy another table's schema like so:
CREATE TABLE IF NOT EXISTS new_table LIKE old_table;
When I do this, because old_table is a partitioned, external table, new_table ends up partitioned as well. I don't need or want new_table to be partitioned; I essentially just want the column definitions. There are a lot of them, and I have to do this kind of thing often in my pipeline, so I'm essentially being lazy because I don't want a huge mess of column definitions spattered throughout my script.
Can I either copy the table schema and ignore partitioning, or can I at least remove partitioning once new_table is created? I have managed to find a way to drop partitions, but not to remove partitioning altogether.

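You could use a CREATE TABLE AS SELECT query to create new_table without partitions. The partition columns of old_table become ordinary columns, and a filter that matches no rows keeps the new table empty:
CREATE TABLE IF NOT EXISTS new_table AS SELECT * FROM old_table WHERE 1 = 0;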
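To confirm that the copy really is unpartitioned, the table metadata can be inspected. This is only a minimal sketch; it assumes new_table was created by the statement above and uses no names beyond those in the question.

```sql
-- For a partitioned table, DESCRIBE FORMATTED prints a
-- "# Partition Information" section. For new_table that section
-- should be missing, and the former partition columns of old_table
-- should appear in the ordinary column list.
DESCRIBE FORMATTED new_table;

-- Shows the full DDL Hive stored for the new table, including the
-- storage format it picked, which may differ from old_table.
SHOW CREATE TABLE new_table;
```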

Related

Bulk update in Oracle12c

I need to update a column (all rows) in a table that has 150 million records.
Creating a duplicate table with the updates and dropping the previous table would be the best way, but there is no disk space available to hold the duplicate table.
So how can I perform the update in less time? The table is partitioned.
I am using Oracle 12c.
The cleanest approach is NOT to update the table, but to create a new table that contains the updated column. For instance, say I needed to replace a column called old_value with the max of that value; instead of updating old_table, one does:
create table new_table as select foo, bar, max(old_value) over () as old_value from old_table;
drop table old_table;
rename new_table to old_table;
If you need even more speed, you can create the table with a parallel query and NOLOGGING, which generates very little redo and almost no undo. More details can be found here: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:6407993912330
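A parallel, NOLOGGING version of that rebuild might look roughly like the sketch below. The degree of parallelism (8) is an arbitrary assumption, foo, bar and old_value are the placeholder columns from the answer, and NOLOGGING has no effect if the database runs in FORCE LOGGING mode.

```sql
-- Build the replacement table with a direct-path, parallel CTAS.
CREATE TABLE new_table
  NOLOGGING
  PARALLEL 8
AS
SELECT /*+ PARALLEL(o, 8) */
       o.foo,
       o.bar,
       MAX(o.old_value) OVER () AS old_value  -- same max on every row
FROM   old_table o;

-- Swap the tables. Indexes, constraints, grants and triggers on
-- old_table are lost here and have to be recreated on the new table.
DROP TABLE old_table;
RENAME new_table TO old_table;

-- Return the table to normal logging and serial execution.
ALTER TABLE old_table LOGGING;
ALTER TABLE old_table NOPARALLEL;
```

Because the NOLOGGING load cannot be recovered from redo, taking a backup afterwards is a sensible precaution.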

Add partition to hive table with no data

I am trying to create a Hive table that has the same columns as another (partitioned) table. I use the following query:
CREATE TABLE destTable STORED AS PARQUET AS select * from srcTable where 1=2;
Apparently I cannot use 'PARTITIONED BY(col_name)' because destTable must not be partitioned. But I want to mention that destTable should be partitioned by a column (same as srcTable) before I add data to it.
Is there a way to do that?
As you mentioned, the target of a CREATE TABLE AS SELECT cannot be a partitioned table, so there is no way to do this directly. The target also cannot be an external table.
In this situation you will need to create a temporary "staging_table" (un-partitioned and a Hive-managed table) to hold the data.
Step 1: Transfer everything from srcTable to the staging_table
Step 2: Create a partitioned destTable and do:
INSERT OVERWRITE TABLE destTable PARTITION(xxxx)
SELECT * FROM staging_table;
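The two steps could be sketched as follows. The column names (id, name) and the partition column (dt) are hypothetical stand-ins for whatever srcTable actually contains, and dynamic partitioning is assumed so that each row lands in the partition named by its own dt value.

```sql
-- Step 1: copy everything into an unpartitioned, Hive-managed staging table.
CREATE TABLE staging_table STORED AS PARQUET AS
SELECT * FROM srcTable;

-- Step 2: create the partitioned destination with explicit columns.
-- The partition column is declared in PARTITIONED BY, not in the column list.
CREATE TABLE destTable (
  id   BIGINT,
  name STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

-- Let Hive derive the partition from the data.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE destTable PARTITION (dt)
SELECT id, name, dt
FROM staging_table;
```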
Hope this helps.

Oracle, Spring JDBC, the fastest way how to delete data from table

I use JdbcTemplate's batchUpdate to insert a large amount of data into DB tables.
What is the fastest way to delete it?
I use
DELETE FROM tableName WHERE ID >= MIN_ID
where MIN_ID is the startValue of the sequence = 1000000
Note: I want to delete starting from MIN_ID and keep the data with IDs below 1000000.
Is there a better approach?
What best practices should one follow?
Note: I use an Oracle DB.
If you want to empty the entire table (because every row has an ID at or above MIN_ID, the start value), you can use TRUNCATE TABLE tableName;
If you want to delete a big part of the table, it is usually cheaper to create new_table as select * from tableName where nvl(ID,-1) < MIN_ID; and swap the names. Don't forget about swapping indexes and recreating constraints and other objects. Of course, if someone can insert into the table while the swap is in progress, you need to either lock the table or implement some consistency mechanism that takes care of data inserted during the swap.
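A minimal sketch of the copy-and-swap variant, assuming MIN_ID is 1000000 as in the question. The names tableName_keep and tableName_old are hypothetical, and the sketch assumes nothing writes to the table while it runs.

```sql
-- Copy only the rows to keep. NVL treats rows with a NULL ID as
-- "below MIN_ID", so they are kept as well.
CREATE TABLE tableName_keep AS
SELECT *
FROM   tableName
WHERE  NVL(ID, -1) < 1000000;

-- Swap the names.
RENAME tableName TO tableName_old;
RENAME tableName_keep TO tableName;

-- Indexes, constraints, grants and triggers are not copied by
-- CREATE TABLE AS SELECT. Recreate them on the new table before
-- dropping the old one.
DROP TABLE tableName_old;
```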

Oracle Create Table as Select * from Another_Table same table space

I didn't design the DB so don't judge me on this.
I have a log table that is receiving A LOT of entries. I only need to keep a day or so of data in this log table. My initial thought was:
In a single transaction:
1. rename the log table
2. create the original log table from the renamed log table
3. commit the trx and life goes on
The second time this happens I drop the renamed table and do it all over again. This will run as an Oracle job once a day.
The original question:
Would anyone know if I specify a table space name in table #1 like so:
create table "my_user"."first_table" (pkid number, full_name varchar2(50)) nologging tablespace "my_custom_tablespace";
Then I do something like:
create table second_table as select * from first_table where 1=2 -- because I only want the structure
Will my second_table be in the same tablespace?
Thanks in advance for your help.
If you are on Enterprise Edition with partitioning, then a simpler solution is to go with an interval partitioned table, with one partition per day. Then truncate the partitions when you don't need them.
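As a rough illustration of the interval-partitioned option: the sketch below adds a hypothetical created_at column to the two columns from the question, since interval partitioning needs a date or number key, and the table name and dates are placeholders.

```sql
-- One partition per day, created automatically as rows arrive.
CREATE TABLE log_table (
  pkid       NUMBER,
  full_name  VARCHAR2(50),
  created_at DATE NOT NULL
)
PARTITION BY RANGE (created_at)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(
  -- Interval partitioning needs at least one ordinary range partition.
  PARTITION p_start VALUES LESS THAN (DATE '2020-01-01')
);

-- Empty the partition that holds a given day, without knowing the
-- system-generated partition name.
ALTER TABLE log_table TRUNCATE PARTITION FOR (DATE '2020-01-02');
```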
If not, then go with two tables, a synonym to point to the 'current' one that is being inserted into, and a view that selects from a union of the two tables. The nightly job would truncate the 'old' table and switch the synonym to make it the 'new' one.
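The two-table variant might be set up along these lines. All object names (log_a, log_b, log_current, log_all) are hypothetical; the columns are the ones from the question.

```sql
-- Two identical tables that take turns receiving inserts.
CREATE TABLE log_a (pkid NUMBER, full_name VARCHAR2(50));
CREATE TABLE log_b (pkid NUMBER, full_name VARCHAR2(50));

-- Writers insert through the synonym; readers query the view.
CREATE OR REPLACE SYNONYM log_current FOR log_a;

CREATE OR REPLACE VIEW log_all AS
SELECT * FROM log_a
UNION ALL
SELECT * FROM log_b;

-- Nightly job: empty the table that is not being written to,
-- then point the synonym at it. The next night, do the same with
-- the roles of log_a and log_b reversed.
TRUNCATE TABLE log_b;
CREATE OR REPLACE SYNONYM log_current FOR log_b;
```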

Oracle how to delete from a table except few partitions data

I have a big table with a lot of data, split into multiple partitions. I want to keep a few partitions as they are but delete the rest of the data from the table. I searched for a similar question and couldn't find one on Stack Overflow. What is the best way to write a query in Oracle to achieve this?
It is easy to delete data from a specific partition: this statement clears down all the data for February 2012:
delete from t23 partition (feb2012);
A quicker method is to truncate the partition:
alter table t23 truncate partition feb2012;
There are two potential snags here:
Oracle won't let us truncate partitions if we have foreign keys referencing the table.
The operation invalidates any partitioned Indexes so we need to rebuild them afterwards.
Also, it's DDL, so no rollback.
If we never again want to store data for that month we can drop the partition:
alter table t23 drop partition feb2012;
The problem arises when we want to zap multiple partitions and we don't fancy all that typing. We cannot parameterise the partition name, because it's an object name, not a variable (no quotes). That leaves only dynamic SQL.
As you want to remove most of the data but retain the partition structure, truncating the partitions is the best option. Remember to disable any referencing integrity constraints first (and to reinstate them afterwards).
declare
    stmt varchar2(32767);
begin
    for lrec in ( select partition_name
                  from user_tab_partitions
                  where table_name = 'T23'
                  and partition_name like '%2012'
                )
    loop
        stmt := 'alter table t23 truncate partition '
                    || lrec.partition_name
                  ;
        dbms_output.put_line(stmt);
        execute immediate stmt;
    end loop;
end;
/
You should definitely run the loop first with execute immediate call commented out, so you can see which partitions your WHERE clause is selecting. Obviously you have a back-up and can recover data you didn't mean to remove. But the quickest way to undertake a restore is not to need one.
Afterwards run this query to see which partitions you should rebuild:
select ip.index_name, ip.partition_name, ip.status
from user_indexes i
join user_ind_partitions ip
on ip.index_name = i.index_name
where i.table_name = 'T23'
and ip.status = 'UNUSABLE';
You can automate the rebuild statements in a similar fashion.
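One way to automate the rebuilds, sketched on the assumption that the table is still T23 and that every unusable index partition should simply be rebuilt in place:

```sql
declare
    stmt varchar2(32767);
begin
    -- Same query as above: unusable index partitions on T23.
    for lrec in ( select ip.index_name, ip.partition_name
                  from user_indexes i
                  join user_ind_partitions ip
                    on ip.index_name = i.index_name
                  where i.table_name = 'T23'
                  and ip.status = 'UNUSABLE'
                )
    loop
        stmt := 'alter index ' || lrec.index_name
                    || ' rebuild partition ' || lrec.partition_name;
        dbms_output.put_line(stmt);
        execute immediate stmt;
    end loop;
end;
/
```

As with the truncate loop, running it once with the execute immediate call commented out shows which statements would be issued.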
"I am thinking of copying the data of partitions I need into a temp table and truncate the original table and copy back the data from temp table to original table."
That's another way of doing things. With exchange partition it might be quite quick. It might also be slower. It depends on things like foreign keys and indexes, and the ratio of zapped partitions to retained ones. If performance is important and/or you need to undertake this operation regularly, then you should benchmark the various options and see what works best for you.
You must be very careful when dropping a partition from a partitioned table. Partitioned tables are usually used for big tables, and if (and only if) you have a global index on the table, dropping a partition makes that global index invalid. You then have to rebuild the global index on a big table, which is a disaster.
To minimise the side effects on queries against the table in this scenario, I first delete the records in the partition so that it is empty, then with
ALTER TABLE table_name DROP PARTITION partition_name UPDATE GLOBAL INDEXES;
I drop the empty partition without invalidating the global index.
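Put together, the sequence described above might look like this. It reuses the t23 table and feb2012 partition from the earlier answer purely as placeholders.

```sql
-- 1. Empty the partition with ordinary DML. The global index is
--    maintained row by row, so it stays usable.
delete from t23 partition (feb2012);
commit;

-- 2. Drop the now-empty partition and ask Oracle to keep the
--    global indexes valid as part of the same operation.
alter table t23 drop partition feb2012 update global indexes;
```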

Resources