ORACLE: Identify and update invalid duplicate records - oracle

I need some help.
I have a table as below.
+---------------+------------------+---------------+-------+
| ITEM_NO | ITEM_DESCRIPTION | ITEM_CATEGORY | ERROR |
+---------------+------------------+---------------+-------+
| TestItem10001 | TestItem10001 | Cat1 | |
| TestItem10001 | TestItem10001 | Cat2 | |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10003 | TestItem10003 | Cat3 | |
+---------------+------------------+---------------+-------+
My requirement is that the same ITEM_NO cannot have different ITEM_CATEGORY values.
In the table above, TestItem10001 has two different categories, Cat1 and Cat2, which is invalid. In such a case, I want to update the ERROR column with an error string like this:
+---------------+------------------+---------------+------------------+
| ITEM_NO | ITEM_DESCRIPTION | ITEM_CATEGORY | ERROR |
+---------------+------------------+---------------+------------------+
| TestItem10001 | TestItem10001 | Cat1 | |
| TestItem10001 | TestItem10001 | Cat2 | INVALID CATEGORY |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10003 | TestItem10003 | Cat3 | |
+---------------+------------------+---------------+------------------+
Please suggest how this can be achieved in a clean and inexpensive way, as the real table will have millions of records.
Thank you in advance.
EDIT1:
Create and inserts as requested in comments.
CREATE TABLE STAGING_TABLE
(
"ITEM_NO" VARCHAR2(1000 BYTE),
"ITEM_DESCRIPTION" VARCHAR2(1000 BYTE),
"ITEM_CATEGORY" VARCHAR2(1000 BYTE),
"ERROR" VARCHAR2(1000 BYTE)
);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10001','TestItem10001','Cat1',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10001','TestItem10001','Cat2',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10002','TestItem10002','Cat3',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10002','TestItem10002','Cat3',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10003','TestItem10003','Cat3',null);

Since it does not matter which row of the duplicates is marked as invalid, you can use this query:
SELECT ITEM_NO, MIN(ITEM_CATEGORY) MIN_ITEM_CATEGORY
FROM STAGING_TABLE
GROUP BY ITEM_NO
which returns the minimum ITEM_CATEGORY for each ITEM_NO, as the source of a MERGE INTO statement:
MERGE INTO STAGING_TABLE s
USING (
SELECT ITEM_NO, MIN(ITEM_CATEGORY) MIN_ITEM_CATEGORY
FROM STAGING_TABLE
GROUP BY ITEM_NO
) t
ON (t.ITEM_NO = s.ITEM_NO AND t.MIN_ITEM_CATEGORY <> s.ITEM_CATEGORY)
WHEN MATCHED THEN
UPDATE SET s.ERROR = 'INVALID CATEGORY'
See the demo.
Results:
> ITEM_NO | ITEM_DESCRIPTION | ITEM_CATEGORY | ERROR
> :------------ | :--------------- | :------------ | :---------------
> TestItem10001 | TestItem10001 | Cat1 | null
> TestItem10001 | TestItem10001 | Cat2 | INVALID CATEGORY
> TestItem10002 | TestItem10002 | Cat3 | null
> TestItem10002 | TestItem10002 | Cat3 | null
> TestItem10003 | TestItem10003 | Cat3 | null
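For anyone who wants to try the rule without an Oracle instance, the same idea (keep the minimum ITEM_CATEGORY per ITEM_NO and flag every row with a different category) can be sketched with Python's built-in sqlite3 module. SQLite has no MERGE statement, so this sketch swaps in a correlated UPDATE. The in-memory database and the helper name flag_invalid_categories are assumptions made for illustration; they are not part of the answer above, and nothing here says how the Oracle MERGE will perform on millions of rows.

```python
import sqlite3


def flag_invalid_categories(conn):
    """Flag rows whose category differs from the minimum category of their item.

    Stand-in for the Oracle MERGE: SQLite lacks MERGE, so a correlated UPDATE is used.
    """
    conn.execute(
        """
        UPDATE staging_table
           SET error = 'INVALID CATEGORY'
         WHERE item_category <> (
               SELECT MIN(t.item_category)
                 FROM staging_table t
                WHERE t.item_no = staging_table.item_no)
        """
    )


# Hypothetical in-memory copy of the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_table ("
    "item_no TEXT, item_description TEXT, item_category TEXT, error TEXT)"
)
conn.executemany(
    "INSERT INTO staging_table VALUES (?, ?, ?, NULL)",
    [
        ("TestItem10001", "TestItem10001", "Cat1"),
        ("TestItem10001", "TestItem10001", "Cat2"),
        ("TestItem10002", "TestItem10002", "Cat3"),
        ("TestItem10002", "TestItem10002", "Cat3"),
        ("TestItem10003", "TestItem10003", "Cat3"),
    ],
)

flag_invalid_categories(conn)
flagged = conn.execute(
    "SELECT item_no, item_category FROM staging_table WHERE error IS NOT NULL"
).fetchall()
print(flagged)
```

Only the TestItem10001 / Cat2 row ends up flagged, which matches the results table above.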

You can use the SQL below to identify the invalid rows (it selects them; it does not update the ERROR column):
select item_no, item_description, item_category, 'INVALID CATEGORY'
from (
select count(item_no) over (partition by item_no order by item_category) item_cnt,
count(item_no) over (partition by item_no, item_category order by item_no) category_cnt,
st.*
from staging_table st)
where item_cnt <> category_cnt


Querying HIVE Metadata

I need to query the following table and view information from my Apache HIVE cluster:
Each row needs to contain the following:
TABLE SCHEMA
TABLE NAME
TABLE DESCRIPTION
COLUMN NAME
COLUMN DATA TYPE
COLUMN LENGTH
COLUMN PRECISION
COLUMN SCALE
NULL OR NOT NULL
PRIMARY KEY INDICATOR
This can be easily queried from most RDBMS (metadata tables/views), but I am struggling to find much information about the equivalent metadata tables/views in HIVE.
Please help :)
This information is available from the Hive metastore. The below example query is for a MySQL-backed metastore (Hive version 1.2).
SELECT
DBS.NAME AS TABLE_SCHEMA,
TBLS.TBL_NAME AS TABLE_NAME,
TBL_COMMENTS.TBL_COMMENT AS TABLE_DESCRIPTION,
COLUMNS_V2.COLUMN_NAME AS COLUMN_NAME,
COLUMNS_V2.TYPE_NAME AS COLUMN_DATA_TYPE_DETAILS
FROM DBS
JOIN TBLS ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID
JOIN
(
SELECT DISTINCT TBL_ID, TBL_COMMENT
FROM
(
SELECT TBLS.TBL_ID TBL_ID, TABLE_PARAMS.PARAM_KEY, TABLE_PARAMS.PARAM_VALUE, CASE WHEN TABLE_PARAMS.PARAM_KEY = 'comment' THEN TABLE_PARAMS.PARAM_VALUE ELSE '' END TBL_COMMENT
FROM TBLS JOIN TABLE_PARAMS
ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID
) TBL_COMMENTS_INTERNAL
) TBL_COMMENTS
ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID;
Sample output:
+--------------+----------------------+-----------------------+-------------------+------------------------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_DESCRIPTION | COLUMN_NAME | COLUMN_DATA_TYPE_DETAILS |
+--------------+----------------------+-----------------------+-------------------+------------------------------+
| default | temp003 | This is temp003 table | col1 | string |
| default | temp003 | This is temp003 table | col2 | array<string> |
| default | temp003 | This is temp003 table | col3 | array<string> |
| default | temp003 | This is temp003 table | col4 | int |
| default | temp003 | This is temp003 table | col5 | decimal(10,2) |
| default | temp004 | | col11 | string |
| default | temp004 | | col21 | array<string> |
| default | temp004 | | col31 | array<string> |
| default | temp004 | | col41 | int |
| default | temp004 | | col51 | decimal(10,2) |
+--------------+----------------------+-----------------------+-------------------+------------------------------+
Metastore tables referred to in the query:
DBS: Details of databases/schemas.
TBLS: Details of tables.
COLUMNS_V2: Details about columns.
SDS: Details about storage.
TABLE_PARAMS: Details about table parameters (key-value pairs)

Update in oracle with joining two table

I have the two tables below. I need to update Table1.Active_flag to Y where Table2.Reprocess_Flag is N.
Table1
+--------+--------------+--------------+--------------+-------------+
| Source | Subject_area | Source_table | Target_table | Active_flag |
+--------+--------------+--------------+--------------+-------------+
| a | CUSTOMER | ADS_SALES | ADS_SALES | N |
| b | CUSTOMER | ADS_PROD | ADS_PROD | N |
| CDW | SALES | CD_SALES | CD_SALES | N |
| c | PRODUCT | PD_PRODUCT | PD_PRODUCT | N |
| d | PRODUCT | PD_PD1 | PD_PD1 | N |
| e | ad | IR_PLNK | IR_PLNK | N |
+--------+--------------+--------------+--------------+-------------+
Table2
+--------+--------------+--------------+--------------+----------------+
| Source | Subject_area | Source_table | Target_table | Reprocess_Flag |
+--------+--------------+--------------+--------------+----------------+
| a | CUSTOMER | ADS_SALES | ADS_SALES | N |
| b | CUSTOMER | ADS_PROD | ADS_PROD | N |
| CDW | SALES | CD_SALES | CD_SALES | N |
| c | PRODUCT | PD_PRODUCT | PD_PRODUCT | Y |
| d | PRODUCT | PD_PD1 | PD_PD1 | Y |
| e | ad | IR_PLNK | IR_PLNK | N |
+--------+--------------+--------------+--------------+----------------+
Use all three columns in a single select statement.
UPDATE hdfs_cntrl SET active_flag = 'Y'
where (source,subject_area ,source_table ) in ( select source,subject_area ,source_table from proc_cntrl where Reprocess_Flag = 'N');
Updating one table based on data in another table is almost always best done with the MERGE statement.
Assuming source is a unique key in table2:
merge into table1 t1
using table2 t2
on (t1.source = t2.source)
when matched
then update set t1.active_flag = 'Y'
where t2.reprocess_flag = 'N'
;
If you are not familiar with the MERGE statement, read about it - it's just as easy to learn as UPDATE and INSERT and DELETE, it can do all three types of operations in a single statement, it is much more flexible and, in some cases, more efficient (faster).
merge into table1 t1
using table2 t2
on (t1.source = t2.source and t1.Subject_area = t2.Subject_area and t1.Source_table = t2.Source_table and t1.Target_table = t2.Target_table and t2.Reprocess_Flag = 'N')
when matched then update set
t1.Active_flag = 'Y';
UPDATE hdfs_cntrl SET active_flag = 'Y' where source in ( select source from proc_cntrl where Reprocess_Flag = 'N') and subject_area in (select subject_area from proc_cntrl where Reprocess_Flag = 'N') and source_table in (select target_table from proc_cntrl where Reprocess_Flag = 'N')
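To see which rows the multi-column IN form touches on the sample data, here is a small sketch using Python's built-in sqlite3 module, which also accepts row-value comparisons of the form (a, b, c) IN (SELECT ...). The table names table1 and table2 follow the question; the in-memory database is an assumption for illustration and stands in for Oracle.

```python
import sqlite3

# Key columns shared by both sample tables in the question.
ROWS = [
    ("a", "CUSTOMER", "ADS_SALES", "ADS_SALES"),
    ("b", "CUSTOMER", "ADS_PROD", "ADS_PROD"),
    ("CDW", "SALES", "CD_SALES", "CD_SALES"),
    ("c", "PRODUCT", "PD_PRODUCT", "PD_PRODUCT"),
    ("d", "PRODUCT", "PD_PD1", "PD_PD1"),
    ("e", "ad", "IR_PLNK", "IR_PLNK"),
]
# Reprocess_Flag per source, as listed in Table2.
REPROCESS = {"a": "N", "b": "N", "CDW": "N", "c": "Y", "d": "Y", "e": "N"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (source TEXT, subject_area TEXT, "
    "source_table TEXT, target_table TEXT, active_flag TEXT)"
)
conn.execute(
    "CREATE TABLE table2 (source TEXT, subject_area TEXT, "
    "source_table TEXT, target_table TEXT, reprocess_flag TEXT)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, 'N')", ROWS)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?, ?)",
    [row + (REPROCESS[row[0]],) for row in ROWS],
)

# The three key columns are compared together, as one tuple per row.
conn.execute(
    """
    UPDATE table1
       SET active_flag = 'Y'
     WHERE (source, subject_area, source_table) IN (
           SELECT source, subject_area, source_table
             FROM table2
            WHERE reprocess_flag = 'N')
    """
)

active = [
    row[0]
    for row in conn.execute(
        "SELECT source FROM table1 WHERE active_flag = 'Y' ORDER BY rowid"
    )
]
print(active)
```

Comparing the columns as one tuple means a row is updated only when all three values come from the same Table2 row. Three independent IN lists do not give that guarantee in general, because each column could match a different Table2 row.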

insert id number only in sql

I have a SQL Server table like this
+----+-----------+------------+
| id | account | date |
+----+-----------+------------+
| | John | 2/6/2016 |
| | John | 2/6/2016 |
| | John | 4/6/2016 |
| | John | 4/6/2016 |
| | Andi | 5/6/2016 |
| | Steve | 4/6/2016 |
+----+-----------+------------+
I want to fill in the id column like this:
+-----------+-----------+------------+
| id | account | date |
+-----------+-----------+------------+
| 020616001 | John | 2/6/2016 |
| 020616002 | John | 2/6/2016 |
| 040616001 | John | 4/6/2016 |
| 040616002 | John | 4/6/2016 |
| 050616001 | Andi | 5/6/2016 |
| 040616003 | Steve | 4/6/2016 |
+-----------+-----------+------------+
I want to generate the id from the date like this: 02 + 06 + 16 (from the date) + 001 = 020616001. If several rows have the same date, the counter increases by 1.
I have tried, but without success.
I want to do this in Oracle SQL Developer.
Can someone help me?
Thanks.
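Try the SQL below with the given data. It is written for SQL Server 2012.
-- style 103 parses dd/mm/yyyy text; style 3 formats the date as dd/mm/yy
-- the counter restarts per date, matching the expected output (Steve gets 040616003)
select replace(convert(varchar(8), convert(date, t.[date], 103), 3), '/', '')
+ right('00' + convert(varchar(3), row_number() over (partition by t.[date] order by t.[date])), 3) as ID,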
t.account,
t.date
from (values ('John','2/6/2016'),
('John','2/6/2016'),
('John','4/6/2016'),
('John','4/6/2016'),
('Andi','5/6/2016'),
('Steve','4/6/2016'))T(account,[date])
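Update your table with a statement like the one below. A window function cannot appear directly in a SET clause, so the row number is computed in a CTE first (your_table is a placeholder name):
with numbered as (
select id, replace(convert(varchar(8), convert(date, [date], 103), 3), '/', '')
+ right('00' + convert(varchar(3), row_number() over (partition by [date] order by [date])), 3) as new_id
from your_table
)
update numbered set id = new_id;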
MySQL
For now I can only give you the logic for the date part (the 020616 prefix, with a fixed 001 suffix).
I still have to work out the +1 increment for repeated dates; I will let you know once I have it.
insert into table_name(id)
select concat
(
if(length (day(current_date))>1,day(current_date),Concat(0,day(current_date))),
if(length (month(current_date))>1,month(current_date),Concat(0,month(current_date))),
(right(year(current_date),2)),'001'
)as id
You cannot convert your dates column to the datetime type in the normal way, because the values are in dd/mm/yyyy format.
Try this:
declare @t table(account varchar(50),dates varchar(20))
insert into @t values
('John','2/6/2016')
,('John','2/6/2016')
,('John','4/6/2016')
,('John','4/6/2016')
,('Andi','5/6/2016')
,('Steve','4/6/2016')
-- split the d/m/yyyy text into day, month and two-digit year
;With CTE as
(select * , SUBSTRING(dates,0,charindex('/',dates)) dd
,SUBSTRING(stuff(dates,1,charindex('/',dates),''),0, charindex('/',stuff(dates,1,charindex('/',dates),''))) MM
,right(dates,2) yy
from @t
)
-- number the rows within each date
,CTE1 as
(
select *
,ROW_NUMBER()over(partition by yy,mm,dd order by yy,mm,dd)rn from cte c
)
-- left-pad day and month to 2 digits and the counter to 3 digits
select *, REPLICATE('0',2-len(dd))+cast(dd as varchar(2))
+REPLICATE('0',2-len(MM))+cast(MM as varchar(2))
+yy+REPLICATE('0',3-len(rn))+cast(rn as varchar(3)) as id
from cte1
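The numbering rule from the question (day, month and two-digit year, followed by a three-digit counter that restarts for each date) can also be checked outside the database. The following is a minimal Python sketch of that rule, assuming the dates are d/m/yyyy text as in the sample table; the function name assign_ids is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (account, date as d/m/yyyy text).
ROWS = [
    ("John", "2/6/2016"),
    ("John", "2/6/2016"),
    ("John", "4/6/2016"),
    ("John", "4/6/2016"),
    ("Andi", "5/6/2016"),
    ("Steve", "4/6/2016"),
]


def assign_ids(rows):
    """Return (id, account, date) tuples; each id is ddmmyy plus a per-date counter."""
    counters = defaultdict(int)  # how many rows have been seen so far for each date
    result = []
    for account, date_text in rows:
        day = datetime.strptime(date_text, "%d/%m/%Y")
        counters[day] += 1
        new_id = f"{day:%d%m%y}{counters[day]:03d}"
        result.append((new_id, account, date_text))
    return result


ids = [row[0] for row in assign_ids(ROWS)]
print(ids)
```

The counter is keyed on the date alone, so Steve's row on 4/6/2016 receives 040616003, as in the expected table. Keying it on account and date together would restart his counter at 001.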

How to use ResultSet to fetch the ID of the record

I have got a table with name table_listnames whose structure is given below
mysql> desc table_listnames;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.04 sec)
It has got sample data as shown
mysql> select * from table_listnames;
+----+------------+
| id | name |
+----+------------+
| 6 | WWW |
| 7 | WWWwww |
| 8 | WWWwwws |
| 9 | WWWwwwsSSS |
| 10 | asdsda |
+----+------------+
5 rows in set (0.00 sec)
I have a requirement: if the name is not found in the table, I need to insert it; otherwise I do nothing.
I am achieving it this way:
String sql = "INSERT INTO table_listnames (name) SELECT name FROM (SELECT ?) AS tmp WHERE NOT EXISTS (SELECT name FROM table_listnames WHERE name = ?) LIMIT 1";
pst = dbConnection.prepareStatement(sql);
pst.setString(1, salesName);
pst.setString(2, salesName);
pst.executeUpdate();
Is it possible to get the id of the record with the given name in this case?

LISTAGG function with two columns

I have one table like this (report)
--------------------------------------------------
| user_id | Department | Position | Record_id |
--------------------------------------------------
| 1 | Science | Professor | 1001 |
| 1 | Maths | | 1002 |
| 1 | History | Teacher | 1003 |
| 2 | Science | Professor | 1004 |
| 2 | Chemistry | Assistant | 1005 |
--------------------------------------------------
I'd like to have the following result
---------------------------------------------------------
| user_id | Department+Position |
---------------------------------------------------------
| 1 | Science,Professor;Maths, ; History,Teacher |
| 2 | Science, Professor; Chemistry, Assistant |
---------------------------------------------------------
That means I need to preserve the empty space as ' ' as you can see in the result table.
I know how to use the LISTAGG function for a single column, but I can't figure out how to apply it to two columns at the same time. Here is my query:
SELECT user_id, LISTAGG(department, ';') WITHIN GROUP (ORDER BY record_id)
FROM report
Thanks in advance :-)
It just requires judicious use of concatenation within the aggregation:
select user_id
, listagg(department || ',' || coalesce(position, ' '), '; ')
within group ( order by record_id )
from report
group by user_id
i.e. aggregate the concatenation of department, a comma and position, replacing position with a space if it is NULL.
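To make the shape of the result concrete, here is a plain-Python sketch of the same aggregation on the sample rows. It mimics what the query computes and is not Oracle's LISTAGG; the function name aggregate_positions is hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (user_id, department, position, record_id).
REPORT = [
    (1, "Science", "Professor", 1001),
    (1, "Maths", None, 1002),
    (1, "History", "Teacher", 1003),
    (2, "Science", "Professor", 1004),
    (2, "Chemistry", "Assistant", 1005),
]


def aggregate_positions(rows):
    """Join 'department,position' pairs per user, ordered by record_id."""
    # Sort by user, then record_id, so groupby sees each user's rows together.
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))
    return {
        user_id: "; ".join(
            # A missing position becomes a single space, like coalesce(position, ' ').
            f"{dept},{pos if pos is not None else ' '}"
            for _, dept, pos, _ in group
        )
        for user_id, group in groupby(ordered, key=lambda r: r[0])
    }


aggregated = aggregate_positions(REPORT)
print(aggregated)
```

For user 1 this yields "Science,Professor; Maths, ; History,Teacher", with the empty position preserved as a single space.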
