How to duplicate a partitioned table - oracle

I am constructing a test environment. Oracle 11g is my database. My goal is to place 80 million records in this database. I will start with 1 million records, which will be loaded into the first partition of a partitioned table. Is there a way to duplicate the initial partition to create 80 partitions, for a grand total of 80 million records? The constraint is that this process should take no longer than two hours to generate the 80 million records.

After loading the first partition, follow this principle:
INSERT /*+ APPEND */ INTO my_table (partition_column, col1, col2, col3, ...)
SELECT n.lvl, m.col1, m.col2, m.col3, ...
FROM my_table m
CROSS JOIN (SELECT LEVEL + 1 AS lvl FROM dual CONNECT BY LEVEL < 80) n;
Note that the level generator must come from DUAL and be cross-joined to the table: writing CONNECT BY LEVEL directly against a multi-row table multiplies the row set at every level instead of producing 79 clean copies. With lvl running from 2 to 80, each copy of the original 1 million rows lands in a new partition, giving 80 million rows in total, and the APPEND hint requests a faster direct-path insert.
The partition_column is just an assumption about your actual partitioning scheme; you may have to change some values in the SELECT list so the new records are put into different partitions. It also helps to turn off constraints and indexes during this insert, as sketched below.
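A minimal sketch of disabling and restoring them, assuming a constraint named my_table_pk and an index named my_table_ix (both names are illustrative):
ALTER TABLE my_table DISABLE CONSTRAINT my_table_pk;  -- no per-row constraint checks during the load
ALTER INDEX my_table_ix UNUSABLE;                     -- stop maintaining the index row by row

-- run the big INSERT ... SELECT here

ALTER INDEX my_table_ix REBUILD;                      -- one bulk rebuild beats 79 million incremental updates
ALTER TABLE my_table ENABLE CONSTRAINT my_table_pk;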

Related

Why is reading from one table in Oracle slower than from another table in the same database?

I am doing a simple Select col1, col2, col22 from Table1 order by col1, and the same Select statement against Table2: Select col1, col2, col22 from Table2 order by col1.
I use the Pentaho ETL tool to replicate data from Oracle 19c to SQL Server. Reading from Table1 is much, much slower than reading from Table2. Both have almost the same number of columns and almost the same number of rows, and both exist in the same schema. Table1 is being read at 10 rows per second while Table2 is being read at 1000 rows per second.
What can cause this slowness?
Are the indexes the same on the two tables? It's possible Oracle is using a fast full index scan (like a skinny version of the table) if an index covers all the relevant columns in one table, or may be using a full index scan to pre-sort by COL1. Check the execution plans to make sure the statements are using the same access methods:
explain plan for select ...;
select * from table(dbms_xplan.display);
Are the table segment sizes the same? Even when the data is the same, a table can occasionally carry a lot of wasted space; for example, if the table used to contain a billion rows, then 99.9% of the rows were deleted but the table was never rebuilt. Compare the segment sizes with a query like this:
select segment_name, sum(bytes)/1024/1024 mb
from user_segments
where segment_name in ('TABLE1', 'TABLE2')
group by segment_name;
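If one segment turns out to be far larger than its data justifies, a sketch of reclaiming the space (the index name is illustrative; MOVE locks the table briefly and leaves its indexes unusable until rebuilt):
alter table table1 move;          -- rebuilds the segment, compacting out the dead space
alter index table1_ix rebuild;    -- every index on the moved table must then be rebuilt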
It depends on many factors.
The first things I would check are the table indexes:
select
uic.table_name,
uic.index_name,
utc.column_name
from USER_TAB_COLUMNS UTC,
USER_IND_COLUMNS UIC
where utc.table_name = uic.table_name
and utc.column_name = uic.column_name
and utc.table_name in ('TABLE1', 'TABLE2')
order by 1, 2, 3;

Joins on two large tables using UDF in Hive - performance is too slow

I have two tables in Hive. One has around 2 million records and the other has 14 million records. I am joining these two tables, and I am also applying a UDF in the WHERE clause. The JOIN operation is taking too much time.
I have tried running the query many times, but it runs for around 2 hours with the reducer stuck at 70%, and after that I get the exception "java.io.IOException: No space left on device" and the job gets killed.
I have tried to set the parameters as below:
set mapreduce.task.io.sort.mb=256;
set mapreduce.task.io.sort.factor=100;
set mapreduce.map.output.compress=true;
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
set mapred.child.java.opts=-Xmx1024m;
My Query -
insert overwrite table output select col1, col2, name1, name2, col3, col4,
t.zip, t.state from table1 m join table2 t ON (t.state=m.state and t.zip=m.zip)
where matchStrings(concat(name1,'|',name2))>=0.9;
The above query runs with 8 mappers and 2 reducers.
Can someone please suggest what I should do to improve performance?
That exception probably indicates that you do not have enough space in the cluster for the temporary files created by the query you are running. You should try adding more disk space to the cluster, or reducing the number of rows being joined by using subqueries to filter the rows from each table first.
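A sketch of that second suggestion, assuming rows with NULL join keys can be discarded up front (replace the filters with whatever actually eliminates rows in your data):
insert overwrite table output
select col1, col2, name1, name2, col3, col4, t.zip, t.state
from (select * from table1 where state is not null and zip is not null) m
join (select * from table2 where state is not null and zip is not null) t
  on (t.state = m.state and t.zip = m.zip)
where matchStrings(concat(name1, '|', name2)) >= 0.9;
Anything discarded inside the subqueries never reaches the shuffle, which is what is filling the disks.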

Hive: Creating smaller table from big table

I currently have a Hive table that has 1.5 billion rows. I would like to create a smaller table (using the same table schema) with about 1 million rows from the original table. Ideally, the new rows would be randomly sampled from the original table, but getting the top 1M or bottom 1M of the original table would be ok, too. How would I do this?
As climbage suggested earlier, Hive's built-in sampling methods are probably your best bet.
INSERT OVERWRITE TABLE my_table_sample
SELECT * FROM my_table
TABLESAMPLE (1000000 ROWS) t;
This syntax was introduced in Hive 0.11. If you are running an older version of Hive, you'll be confined to the PERCENT syntax, like so:
INSERT OVERWRITE TABLE my_table_sample
SELECT * FROM my_table
TABLESAMPLE (1 PERCENT) t;
You can change the percentage to match your specific sample size requirements.
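If you need the sample to be genuinely random rather than whichever blocks the sampler happens to pick, a rough sketch is to keep each row with probability about 1M/1.5B; the resulting row count will only be approximately 1 million:
INSERT OVERWRITE TABLE my_table_sample
SELECT * FROM my_table
WHERE rand() <= 0.0007;  -- ~0.07% of 1.5 billion rows is roughly 1 million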
You can define a new table with the same schema as your original table.
Then use INSERT OVERWRITE TABLE <tablename> <select statement>
The SELECT statement will need to query your original table, using LIMIT to get only 1M results.
Alternatively, this query will pull out the top 1M rows and write them into a new table:
CREATE TABLE new_table_name AS
SELECT col1, col2, col3, ...
FROM original_table
-- add a WHERE clause here if you want to filter
LIMIT 1000000;

Wrong index is chosen by Oracle

I have a problem with indexing in Oracle. I will try to explain my problem with an example, as follows.
I have a table TABLE1 with columns A, B, C, D
and another table TABLE2 with columns A, B, C, E, F, H.
I have created these indexes for TABLE1:
IX_1 A
IX_2 A,B
IX_3 A,C
IX_4 A,B,C
I have created these indexes for TABLE2:
IY_1 A,B,C
IY_2 A
When I run a query similar to this:
SELECT * FROM TABLE1 T1, TABLE2 T2
WHERE T1.A = T2.A
and look at the explain plan, it is not using IX_1 or IY_2; it is taking IX_4 and IY_1.
Why is this not picking the right index?
EDITED:
Can anyone also help me understand the difference between INDEX RANGE SCAN, INDEX UNIQUE SCAN, and INDEX SKIP SCAN?
I guess SKIP SCAN means a column in a composite index is skipped by Oracle, but I have no idea about the others!
The main benefit of indexes is that you can select a few rows from a table without scanning the entire table.
If you ask for too many rows (say 30% of the table, though it depends on many things), the engine will prefer to scan the entire table instead.
That is because reading a row through an index carries overhead: some index blocks are read first, and only then the table blocks.
In your case, in order to join tables T1 and T2, Oracle needs all the rows from those tables. Reading the full index as well would be a useless operation, adding unnecessary cost.
UPDATE: A step further: if you run
SELECT T1.B, T2.B FROM TABLE1 T1, TABLE2 T2
WHERE T1.A = T2.A
Oracle will probably use the covering indexes (IX_2 on TABLE1 and IY_1 on TABLE2; IY_2 does not contain column B), because it does not need to read anything from the tables themselves: the values of T1.B and T2.B are already in the indexes.
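To verify which access path is actually chosen (the plan output varies with version and statistics):
explain plan for
  select t1.b, t2.b from table1 t1, table2 t2 where t1.a = t2.a;
select * from table(dbms_xplan.display);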

Oracle: Identifying duplicates in a table without an index

When I try to create a unique index on a large table, I get a unique constraint error. The unique index in this case is a composite key of 4 columns.
Is there an efficient way to identify the duplicates, other than:
select col1, col2, col3, col4, count(*)
from Table1
group by col1, col2, col3, col4
having count(*) > 1
The explain plan for the query above shows a full table scan with an extremely high cost, and I just want to find out whether there is another way.
Thanks!
Try creating a non-unique index on these four columns first, as sketched below. That will take O(n log n) time, but will also reduce the time needed to perform the select to O(n log n).
You're in a bit of a bind here -- any way you slice it, the entire table has to be read at least once. The naïve algorithm runs in O(n²) time, unless the query optimizer is clever enough to build a temporary index/table.
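A minimal sketch of that suggestion (the index name is illustrative):
create index tab1_dup_ix on Table1 (col1, col2, col3, col4);
-- the GROUP BY ... HAVING COUNT(*) > 1 query can then be answered
-- from the index alone, without touching the table blocks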
You can use the EXCEPTIONS INTO clause to trap the duplicated rows.
If you don't already have an EXCEPTIONS table create one using the provided script:
SQL> @$ORACLE_HOME/rdbms/admin/utlexcpt.sql
Now you can attempt to create a unique constraint, like this:
alter table Table1
add constraint tab1_uq UNIQUE (col1, col2, col3, col4)
exceptions into exceptions
/
This will fail but now your EXCEPTIONS table contains a list of all the rows whose keys contain duplicates, identified by ROWID. That gives you a basis for deciding what to do with the duplicates (delete, renumber, whatever).
edit
As others have noted you have to pay the cost of scanning the table once. This approach gives you a permanent set of the duplicated rows, and ROWID is the fastest way of accessing any given row.
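For instance, a sketch of pulling back the offending rows through those ROWIDs (assuming the default EXCEPTIONS table created by utlexcpt.sql, whose ROWID column is named ROW_ID):
select t.*
from Table1 t
where t.rowid in (select row_id from exceptions);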
Since there is no index on those columns, that query has to do a full table scan - there really is no other way, unless one or more of those columns is already indexed.
You could create the index as a non-unique index and then run the query to identify the duplicate rows (which should be very fast once the index exists). But I doubt the combined time of creating the non-unique index and then running the query would be any less than just running the query without the index.
In effect, you need to look for a duplicate of every single row in the table, and there is no way to do that efficiently without an index.
I don't think there is a quicker way, unfortunately.
