How to avoid insertion of duplicate records into Database using sqlldr? - oracle

My table has no constraints to prevent duplicate data. How can I ensure that no duplicate rows are inserted into the table when using the 'sqlldr' utility? Do I have to change the control file, the parameters, or something else?

You can't do it with sqlldr alone. One way is to remove the duplicates after the load finishes:
DELETE FROM table_name a
WHERE a.rowid > ANY (
    SELECT b.rowid
    FROM table_name b
    WHERE a.col1 = b.col1
      AND a.col2 = b.col2
);
Or you could use external tables for loading the data and just remove duplicates during the load using a query with distinct or group by clauses, or using a merge statement.
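A rough sketch of the external-table route, in case it helps. The names staging_ext, data_dir and data.csv are made up for illustration, and the two-column comma-separated layout is an assumption; adjust them to your file. The directory object data_dir must already exist and point at the folder holding the file.

```sql
-- Hypothetical external table over the flat file that sqlldr would have loaded.
CREATE TABLE staging_ext (
  col1 VARCHAR2(50),
  col2 VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('data.csv')
)
REJECT LIMIT UNLIMITED;

-- Collapse duplicates inside the file, and skip rows already in the target.
MERGE INTO table_name t
USING (SELECT DISTINCT col1, col2 FROM staging_ext) s
ON (t.col1 = s.col1 AND t.col2 = s.col2)
WHEN NOT MATCHED THEN
  INSERT (col1, col2) VALUES (s.col1, s.col2);
```

The DISTINCT handles duplicates within the file, while the ON clause handles rows that were loaded earlier. Rows with NULL in col1 or col2 never compare equal, so they would need separate handling.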

Related

How to make insert statement re-runnable?

I need to add the following two insert statements:
insert into table1(schema, table_name, table_alias)
values ('ref_owner','test_table_1','tb1');
insert into table1(schema, table_name, table_alias)
values ('dba_owner','test_table_2','tb2');
The question is: how can I make those two insert statements re-runnable? That is, if they are run again, they should throw a "row exists" error or something along those lines.
Additional notes:
1. I've seen examples of MERGE in Oracle, however, those only use two tables to match records. In this case I'm only using a single table.
2. The table does not have any primary, unique or foreign keys - only check constraints on one of the columns.
Any help is highly appreciated.
You can use a MERGE statement, as follows:
MERGE INTO table1 t1
USING (SELECT 'ref_owner' AS SCHEMA_NAME, 'test_table_1' AS TABLE_NAME, 'tb1' AS ALIAS_NAME FROM DUAL
       UNION ALL
       SELECT 'dba_owner', 'test_table_2', 'tb2' FROM DUAL) d
ON (t1.SCHEMA = d.SCHEMA_NAME AND
    t1.TABLE_NAME = d.TABLE_NAME)
WHEN NOT MATCHED THEN
  INSERT (SCHEMA, TABLE_NAME, TABLE_ALIAS)
  VALUES (d.SCHEMA_NAME, d.TABLE_NAME, d.ALIAS_NAME);
Best of luck.
You should have a primary key, especially when you want to check for duplicate records and data integrity.
Provide a primary key for your table, or, if you somehow do not want to do that, create a unique constraint for all of the columns in the table, so no duplicate rows are possible.
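For the table in the question, the unique-constraint option could look roughly like this. The constraint name table1_uq is just a placeholder, and the statement assumes the table does not already contain duplicate rows (otherwise adding the constraint fails).

```sql
-- Hypothetical constraint name; covers all three columns from the question.
ALTER TABLE table1
  ADD CONSTRAINT table1_uq UNIQUE (schema, table_name, table_alias);

-- First run succeeds.
INSERT INTO table1 (schema, table_name, table_alias)
VALUES ('ref_owner', 'test_table_1', 'tb1');

-- Running the same statement again is rejected with
-- ORA-00001: unique constraint violated.
INSERT INTO table1 (schema, table_name, table_alias)
VALUES ('ref_owner', 'test_table_1', 'tb1');
```

This gives the "row exists" style error asked for, whereas the MERGE approach skips existing rows silently.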

updating a table in VFP

There are two tables in my database. I am trying to update a column in table2 by setting it equal to one of the columns in table1. I've already looked at the answer to "visual foxpro - need to update table from another table"
and tried to apply it in my code, but I keep getting a syntax error on UPDATE table2. Why?
Here is what I have.
ALTER TABLE table2;
ADD COLUMN base2 B(8,2);
UPDATE table2
WHERE table2.itemid=table1.itemid from table1;
SET table2.base2=table1.base;
The simplest syntax is:
update table2 from table1 where table2.itemid = table1.itemid ;
set table2.base2 = table1.base
You could also add more fields to update separated by commas, i.e.
... set table2.base2 = table1.base, table2.this = table1.that
Using 'standard' VFP language syntax and RELATED Tables, you could quite easily do the following:
USE Table1 IN 0 EXCLUSIVE
SELECT Table1
INDEX ON ID TAG ID && Create Index on ID field
USE Table2 IN 0
SELECT Table2
SET RELATION TO ID INTO Table1
REPLACE ALL Table2.ID WITH Table1.ID FOR !EMPTY(Table2.ID)
You might want to spend some time looking over the free, on-line tutorial videos at: Learn Visual Foxpro # garfieldhudson.com
The videos named:
* Building a Simple Application - Pt. 5
and
* Q&A: Using Related Tables In A Report
Both discuss using VFP's language to work with Related Tables
Good Luck
Use join
Update table2 b
Join table1 a on b.itemid = a.itemid
Set b.base2 = a.base

How to populate columns of a new hive table from multiple existing tables?

I have created a new table in hive (T1) with columns c1,c2,c3,c4. I want to populate data into this table by querying from other existing tables(T2,T3).
E.g c1 and c2 come from a query run on T2 & the other columns c3 and c4 come from a query run on T3.
Is this possible in Hive? I have done a lot of research but am still unable to find a solution.
Didn't something like this work?
create table T1 as
select t2.c1, t2.c2, t3.c3, t3.c4 from (some query against T2) t2 JOIN (some query against T3) t3
Obviously replace JOIN with whatever is needed. I assume some join between T2 and T3 is possible or else you wouldn't be putting their columns alongside each other in T1.
According to the hive documentation, you can use the following syntax to insert data:
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Be careful that:
Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
So, I would JOIN the two existing tables and then insert only the needed values into the target table by adjusting the SELECT list. Alternatively, creating a temporary table might give you more control over the data. Just remember to handle NULLs, as stated in the official documentation. This is just an idea; there are probably other ways to achieve what you need, but it could be a good place to start.
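As a sketch of that idea: the column id below is an assumed join key that is not in the question, so replace it with whatever actually relates T2 and T3. The SELECT list supplies a value for every column of T1, in order, as the documentation requires.

```sql
-- id is a hypothetical key shared by T2 and T3.
INSERT INTO TABLE T1
SELECT a.c1, a.c2, b.c3, b.c4
FROM (SELECT id, c1, c2 FROM T2) a
JOIN (SELECT id, c3, c4 FROM T3) b
  ON a.id = b.id;
```

If some rows of T2 have no match in T3, a LEFT OUTER JOIN would keep them and fill c3 and c4 with NULL.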

Dynamically load partitions in Hive with predicate pushdown

I have a very large table in Hive, from which we need to load a subset of partitions. It looks something like this:
CREATE EXTERNAL TABLE table1 (
col1 STRING
) PARTITIONED BY (p_key STRING);
I can load specific partitions like this:
SELECT * FROM table1 WHERE p_key = 'x';
with p_key being the key on which table1 is partitioned. If I hardcode it directly in the WHERE clause, it's all good. However, I have another query which calculates which partitions I need. It's more complicated than this, but let's define it simply as:
SELECT DISTINCT p_key FROM table2;
So now I should be able to construct a dirty query like this:
SELECT * FROM table1
WHERE p_key IN (SELECT DISTINCT p_key FROM table2);
Or written as an inner join:
SELECT t1.* FROM table1 t1
JOIN (SELECT DISTINCT p_key FROM table2) t2 ON t1.p_key = t2.p_key
However, when I run this, it takes long enough to make me believe it's doing a full table scan. In the explain output for the above queries, I can also see that the result of the DISTINCT operation is used in the reducer, not the mapper, meaning it would be impossible for the mapper to know which partitions should be loaded. Granted, I'm not fully familiar with Hive explain output, so I may be overlooking something.
I found this page: MapJoin and Partition Pruning on the Hive wiki, and the corresponding ticket indicates it was released in version 0.11.0. So I should have it.
Is it possible to do this? If so, how?
I'm not sure how to help with MapJoin, but in the worst case you could dynamically generate a second query with something like this (each key is wrapped in quotes, since p_key is a STRING):
SELECT concat('SELECT * FROM table1 WHERE p_key IN (',
concat_ws(',', collect_set(concat("'", p_key, "'"))),
')')
FROM table2;
Then execute the resulting statement. Because the partition keys are now literals, the query processor should be able to prune the unneeded partitions.

sybase insert from another database in same server

I am trying to take all the extra data from one database and insert it into another.
I want to avoid hard-coding column names and hard-code only the table name. However, some tables have system-generated fields, such as an id, that carry no essential data but would cause integrity issues if copied. How can I insert only the needed columns and omit those fields, given that the names of the columns to omit also change? I can't do a full insert, only add the extra data.
So far I have come up with this:
while 1=1
begin
if exists(select 1 from db1.table1 not in (select * from db2.table1)
begin
insert into db2.table1 (columns) select (columns) from db1.table1
end
if(rowCount=0)
break
end
Please advise how I can optimize this to get the least possible hard-coding.
I have intentionally left out the primary key part, as the query is large.
If you want to do something like:
insert into TAB
select * from TAB2
or
insert into TAB
select col1,col2 from TAB2
or
insert into TAB (col1,col2)
select * from TAB2
where TAB and TAB2 have a different number or type of columns, it's not possible, because it will generate an error.
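What does work is naming the columns on both sides and filtering out rows that already exist in the target. A minimal sketch, assuming col1 and col2 are the non-generated columns you want to copy (both names are placeholders) and that both tables are owned by the default owner, so the db..table shorthand applies:

```sql
-- col1 and col2 are placeholder column names; the system-generated id is
-- simply left out of both column lists.
insert into db2..table1 (col1, col2)
select s.col1, s.col2
from db1..table1 s
where not exists (
    select 1
    from db2..table1 t
    where t.col1 = s.col1
      and t.col2 = s.col2
)
```

This runs as a single set-based statement, so the while loop is not needed. The column list still has to be written out (or generated from the system catalog), since the insert cannot skip columns on its own.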
