Identifying duplicate row sets - Oracle

I have a table that looks like the following:
create table Testing(
  inv_num  varchar2(100),
  po_num   varchar2(100),
  line_num varchar2(100)
)
with the following data:
Insert into Testing (INV_NUM,PO_num,line_num) values ('19782594','P0254836',1);
Insert into Testing (INV_NUM,PO_num,line_num) values ('19782594','P0254836',1);
Insert into Testing (INV_NUM,PO_num,line_num) values ('19968276','P0254836',1);
Insert into Testing (INV_NUM,PO_num,line_num) values ('19968276','P0254836',1);
What I'm trying to do is identify the multiple items within the table that have the same po_num but different inv_num values.
I have tried this:
SELECT T1.inv_num,
       T1.po_num,
       T1.line_num,
       count(*) over (partition by T1.inv_num) myRecords
FROM   testing T1
WHERE  T1.po_num = 'P0254836'
GROUP BY T1.inv_num,
         T1.po_num,
         T1.line_num
ORDER BY T1.inv_num
but this does not give me the desired end result.
I would like to end with the following.
| INV_NUM  | PO_NUM   | LINE_NUM | MYRECORDS |
|----------|----------|----------|-----------|
| 19782594 | P0254836 | 1        | 1         |
| 19782594 | P0254836 | 1        | 1         |
| 19968276 | P0254836 | 1        | 2         |
| 19968276 | P0254836 | 1        | 2         |
Where am I going wrong? I would really like to identify the change in inv_num for that PO.
Please be aware this is part of a much larger project and I have only picked a small subset to show here.

Updated:
SELECT
inv_num
, po_num
, line_num
, DENSE_RANK() OVER (ORDER BY inv_num) "MyRecords"
FROM (
SELECT
po_num
, inv_num
, line_num
, COUNT(line_num) OVER (PARTITION BY po_num, inv_num ORDER BY NULL) cnt
FROM testing
)
WHERE cnt > 1;
returns
| INV_NUM | PO_NUM | LINE_NUM | MYRECORDS |
|----------|----------|----------|-----------|
| 19782594 | P0254836 | 1 | 1 |
| 19782594 | P0254836 | 1 | 1 |
| 19968276 | P0254836 | 1 | 2 |
| 19968276 | P0254836 | 1 | 2 |

Maybe this helps:
SELECT inv_num,
       po_num,
       line_num,
       DENSE_RANK() OVER (ORDER BY inv_num) AS rn
FROM   testing
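If the goal is also to flag the POs whose invoice number changes, a hedged variation on the same query (the inv_count alias is my own, not from the original answer) counts distinct invoices per PO alongside the ranking:

-- Sketch only: rows with inv_count > 1 belong to a PO that appears
-- with more than one distinct invoice number.
SELECT inv_num,
       po_num,
       line_num,
       DENSE_RANK() OVER (ORDER BY inv_num)               AS rn,
       COUNT(DISTINCT inv_num) OVER (PARTITION BY po_num) AS inv_count
FROM   testing;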

Related

How can I use a FOR loop to sum balance by date?

I have two tables.
One table is tbl_ledger, with columns:
| balance | eff_date   |
| ------- | ---------- |
| 200$    | 2 FEB 2008 |
| 500$    | 2 FEB 2008 |
| 250$    | 3 FEB 2008 |
| 150$    | 5 FEB 2008 |
The other table is tbl_ledger_input, with columns balance and eff_date:
| balance | eff_date   |
| ------- | ---------- |
| 700$    | 2 FEB 2008 |
| 250$    | 3 FEB 2008 |
| 150$    | 5 FEB 2008 |
| ...     | ...        |
I want to insert into tbl_ledger_input the SUM(balance) from tbl_ledger for each eff_date, using a FOR loop.
As you've already seen in the comments, a loop isn't the best choice for that. Row-by-row processing is slow; you won't notice any difference if you're dealing with small data sets (as in this example), though.
If you want to do it in PL/SQL (using a loop), a simple option is a cursor FOR loop:
SQL> begin
2 for cur_r in (select eff_date, sum(balance) balance
3 from tbl_ledger
4 group by eff_date
5 )
6 loop
7 insert into tbl_ledger_input (balance, eff_date)
8 values (cur_r.balance, cur_r.eff_date);
9 end loop;
10 end;
11 /
PL/SQL procedure successfully completed.
SQL> select * From tbl_ledger_input;
BALANCE EFF_DATE
---------- -----------
700 02-FEB-2008
250 03-FEB-2008
What you should really be doing, though, is a single set-based insert:
SQL> truncate table tbl_ledger_input;
Table truncated.
SQL> insert into tbl_ledger_input (balance, eff_date)
2 select sum(balance), eff_date
3 from tbl_ledger
4 group by eff_date;
2 rows created.
SQL> select * From tbl_ledger_input;
BALANCE EFF_DATE
---------- -----------
700 02-FEB-2008
250 03-FEB-2008
SQL>
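If PL/SQL is a hard requirement but per-row inserts are a concern, a bulk-bind sketch along these lines is a middle ground (this is not part of the original answer; it assumes tbl_ledger_input has exactly the columns balance and eff_date, in that order):

DECLARE
  -- collection matching tbl_ledger_input (balance, eff_date)
  TYPE t_rows IS TABLE OF tbl_ledger_input%ROWTYPE;
  l_rows t_rows;
BEGIN
  SELECT SUM(balance), eff_date
    BULK COLLECT INTO l_rows
    FROM tbl_ledger
   GROUP BY eff_date;

  -- one bulk DML statement instead of one insert per fetched row
  FORALL i IN 1 .. l_rows.COUNT
    INSERT INTO tbl_ledger_input VALUES l_rows(i);
END;
/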

Group by a TIMESTAMP's date in Oracle

I am trying to group by a timestamp's date in Oracle. So far I have used TO_CHAR, but I need another way. I tried the following:
SELECT d.summa,
       d.FILIAL_CODE,
       to_char(d.DATE_ACTION, 'YYYY-MM-DD')
FROM   table1 d
WHERE  d.action_id = 2
AND    d.date_action BETWEEN to_date('01.01.2020', 'dd.mm.yyyy') AND to_date('01.03.2020', 'dd.mm.yyyy')
GROUP BY to_char(d.DATE_ACTION, 'YYYY-MM-DD')
table1
---------------------------------------------------
summa     | filial_code | date_action
---------------------------------------------------
100000.00 | 2100        | 2016-09-13 11:04:32
320000.12 | 3200        | 2016-09-12 21:04:58
400000.00 | 2100        | 2016-09-13 15:12:45
510000.12 | 3200        | 2016-09-15 09:30:58
---------------------------------------------------
I need it like below:
-------------------------------------------
summa     | filial_code | date_action
-------------------------------------------
500000.00 | 2100        | 2016-09-13
320000.12 | 3200        | 2016-09-12
510000.12 | 3200        | 2016-09-15
-------------------------------------------
But I need something other than the TO_CHAR function. I tried TRUNC, but I could not get it to work.
Using TRUNC should actually convert it to a date and remove the time part, but you also need to handle your other columns. Either group by them or use an aggregation function:
SELECT SUM(d.summa) AS summa,
       d.FILIAL_CODE,
       TRUNC(d.DATE_ACTION) AS date_action
FROM   table1 d
WHERE  d.action_id = 2
AND    d.date_action BETWEEN to_date('01.01.2020', 'dd.mm.yyyy')
                         AND to_date('01.03.2020', 'dd.mm.yyyy')
GROUP BY TRUNC(d.DATE_ACTION), d.FILIAL_CODE
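As a quick standalone illustration of what TRUNC does to a timestamp (a check against DUAL, not tied to table1):

-- TRUNC on a TIMESTAMP returns a DATE with the time portion set to midnight
SELECT TRUNC(TIMESTAMP '2016-09-13 11:04:32') AS d
FROM   dual;
-- -> 2016-09-13 00:00:00 (how it displays depends on NLS_DATE_FORMAT)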

Sets From a Single Table, Grouped By a Column

I have a table:
+-------+-------+----------+
| GROUP | State | Priority |
+-------+-------+----------+
| 1 | MI | 1 |
| 1 | IA | 2 |
| 1 | CA | 3 |
| 1 | ND | 4 |
| 1 | AZ | 5 |
| 2 | IA | 2 |
| 2 | NJ | 1 |
| 2 | NH | 3 |
And so on...
How do I write a query that makes all the sets of the states by group, in priority order? Like so:
+-------+--------------------+
| GROUP | SET |
+-------+--------------------+
| 1 | MI |
| 1 | MI, IA |
| 1 | MI, IA, CA |
| 1 | MI, IA, CA, ND |
| 1 | MI, IA, CA, ND, AZ |
| 2 | NJ |
| 2 | NJ, IA |
| 2 | NJ, IA, NH |
+-------+--------------------+
This is similar to my question here, and I've tried to modify that solution, but I'm just a forty-watt bulb and it's a sixty-watt problem...
This problem actually looks simpler than the answer to the question you linked, which is an excellent solution to that problem. Nevertheless, this uses the same hierarchical queries, with connect by.
If it is the case that priority is always a continuous sequence of numbers, this will work:
SELECT t.grp, level, ltrim(SYS_CONNECT_BY_PATH(state,','),',') as "set"
from t
start with priority = 1
connect by priority = prior priority + 1
and grp = prior grp
However, if that's not always true, we would need row_number() to define the sequence based on the order of priority (which need not be consecutive integers):
with t2 AS
(
select t.*, row_number()
over ( partition by grp order by priority) as rn from t
)
SELECT t2.grp, ltrim(SYS_CONNECT_BY_PATH(state,','),',') as "set"
from t2
start with rn = 1
connect by rn = prior rn + 1
and grp = prior grp
I realize this has already been answered, but I wanted to see if I could do this using ANSI standard syntax. "connect by" is an Oracle-only feature; the following will work on multiple databases:
WITH
-- ASET is just setting up the sample dataset
aset AS
(SELECT 1 AS grp, 'MI' AS state, 1 AS priority FROM DUAL
UNION ALL
SELECT 1 AS grp, 'IA', 2 FROM DUAL
UNION ALL
SELECT 1 AS grp, 'CA', 3 FROM DUAL
UNION ALL
SELECT 1 AS grp, 'ND', 4 FROM DUAL
UNION ALL
SELECT 1 AS grp, 'AZ', 5 FROM DUAL
UNION ALL
SELECT 2 AS grp, 'IA', 2 FROM DUAL
UNION ALL
SELECT 2 AS grp, 'NJ', 1 FROM DUAL
UNION ALL
SELECT 2 AS grp, 'NH', 3 FROM DUAL),
bset AS
-- In BSET we convert the ASET records into comma separated values
( SELECT grp, LISTAGG( state, ',' ) WITHIN GROUP (ORDER BY priority) AS set1
FROM aset
GROUP BY grp),
cset ( grp
, set1
, set2
, pos ) AS
-- CSET breaks our comma separated values up into multiple rows
-- Each row adding the next CSV value
(SELECT grp AS grp
, set1 AS set1
, SUBSTR( set1 || ',', 1, INSTR( set1 || ',', ',' ) - 1 ) AS set2
, 1 AS pos
FROM bset
UNION ALL
SELECT grp AS grp
, set1 AS set1
, SUBSTR( set1 || ','
, 1
, INSTR( set1 || ','
, ','
, 1
, pos + 1 )
- 1 ) AS set2
, pos + 1 AS pos
FROM cset
WHERE INSTR( set1 || ','
, ','
, 1
, pos + 1 ) > 0)
SELECT grp, set2
FROM cset
ORDER BY grp, pos;
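For completeness, a shorter hedged sketch of the same idea: recurse over the rows themselves instead of splitting a pre-built CSV. It assumes the same t table (grp, state, priority) used in the connect by answer, and consecutive priorities starting at 1 within each group:

WITH rset (grp, priority, set1) AS
  (SELECT grp, priority, CAST(state AS VARCHAR2(4000))
     FROM t
    WHERE priority = 1
   UNION ALL
   -- append the next-priority state of the same group to the running list
   SELECT a.grp, a.priority, r.set1 || ',' || a.state
     FROM rset r
     JOIN t a
       ON a.grp = r.grp
      AND a.priority = r.priority + 1)
SELECT grp, set1 AS "set"
  FROM rset
 ORDER BY grp, priority;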

Aggregating several columns in Oracle SQL

Having a difficult time phrasing this question. Let me know if there's a better title.
I have a query that produces data like this:
+----------+----------+----------+----------+----------+
| KEY | FEB_GRP1 | JAN_GRP1 | FEB_GRP2 | JAN_GRP2 |
+----------+----------+----------+----------+----------+
| 50840992 | 1 | 1 | 0 | 0 |
| 50840921 | 0 | 1 | 1 | 0 |
| 50848995 | 0 | 0 | 0 | 0 |
+----------+----------+----------+----------+----------+
Alternatively, I can produce data like this:
+----------+------+------+
| KEY | JAN | FEB |
+----------+------+------+
| 50840992 | <50 | ~<50 |
| 50840921 | <50 | <50 |
| 50848995 | ~<50 | ~<50 |
| 50840885 | <50 | <50 |
+----------+------+------+
Where <50 should be counted as "group 1" and ~<50 should be counted as "group 2".
And I want it to be like this:
+-------+------+------+
| MONTH | GRP1 | GRP2 |
+-------+------+------+
| JAN | 2 | 0 |
| FEB | 1 | 1 |
+-------+------+------+
I can already get JAN_GRP1_SUM just by summing JAN_GRP1, but I want that to just be a data point, not a column itself.
My query (generates the first diagram):
SELECT *
FROM (
SELECT KEY,
CASE WHEN "FEB-1-2016" = '<50' THEN 1 ELSE 0 END AS FEB_GRP1,
CASE WHEN "FEB-1-2016" != '<50' THEN 1 ELSE 0 END AS FEB_GRP2,
CASE WHEN "JAN-1-2016" = '<50' THEN 1 ELSE 0 END AS JAN_GRP1,
CASE WHEN "JAN-1-2016" != '<50' THEN 1 ELSE 0 END AS JAN_GRP2
FROM MY_TABLE);
Your data model doesn't make much sense, but from what you've shown you can do:
select 'JAN' as month,
count(case when "JAN-1-2016" = '<50' then 1 end) as grp1,
count(case when "JAN-1-2016" != '<50' then 1 end) as grp2
from my_table
union all
select 'FEB' as month,
count(case when "FEB-1-2016" = '<50' then 1 end) as grp1,
count(case when "FEB-1-2016" != '<50' then 1 end) as grp2
from my_table;
That doesn't scale well - if you have more months you need to add another union branch for each one.
If your query is based on a view or a previously calculated summary then it will probably be much easier to go back to the original data.
If you are stuck with this then another possible approach, which might be more manageable if you actually have more than two months to look at, could be to unpivot the data:
select *
from my_table
unpivot(value for month in ("JAN-1-2016" as date '2016-01-01',
"FEB-1-2016" as date '2016-02-01') --, etc. for other months
);
and then aggregate that:
select to_char(month, 'MON', 'NLS_DATE_LANGUAGE=ENGLISH') as month,
count(case when value = '<50' then 1 end) as grp1,
count(case when value != '<50' then 1 end) as grp2
from (
select *
from my_table
unpivot(value for month in ("JAN-1-2016" as date '2016-01-01',
"FEB-1-2016" as date '2016-02-01') --, etc. for other months
)
)
group by month;
Still not pretty, and Oracle is doing pretty much the same thing under the hood I think, but there are fewer case expressions to create and maintain - the drudge part is the unpivot pairs. You might need to include the year in the month field, depending on the range of data you have.
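If the data does span more than one year, a hedged variation on the query above keeps the year in the grouping key (same assumed column names as before):

select to_char(month, 'YYYY-MM') as month,
       count(case when value = '<50'  then 1 end) as grp1,
       count(case when value != '<50' then 1 end) as grp2
from (
  select *
  from my_table
  unpivot(value for month in ("JAN-1-2016" as date '2016-01-01',
                              "FEB-1-2016" as date '2016-02-01') --, etc. for other months
  )
)
group by to_char(month, 'YYYY-MM');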

What is more efficient: several inserts vs a single insert with union

I have a large table (~6M rows, 41 cols) in Postgresql as follows:
id | answer1 | answer2 | answer3 | ... | answer40
1 | xxx | yyy | null | ... | null
2 | xxx | null | null | ... | null
3 | xxx | null | zzz | ... | aaa
Note that there are many empty columns in every row, and I only want the values that have data.
I want to normalize it to get this:
id | answers
1 | xxx
1 | yyy
2 | xxx
3 | xxx
3 | zzz
...
3 | aaa
The question is: which is more efficient/faster, several inserts or a single insert with many unions?
Option 1
create table new_table as
select id, answer1 from my_table where answer1 is not null
union
select id, answer2 from my_table where answer2 is not null
union
select id, answer3 from my_table where answer3 is not null
union ...
Option 2
create table new_table as select id, answer1 from my_table where answer1 is not null;
insert into new_table select id, answer2 from my_table where answer2 is not null;
insert into new_table select id, answer3 from my_table where answer3 is not null;
...
Option 3: is there a better way to do this?
Option 2 should be faster. (Note also that UNION removes duplicate rows while the separate inserts do not, so UNION ALL would be the like-for-like comparison in option 1.)
Wrap all the statements in a begin-commit block to save the time spent on individual commits.
For faster selects, make sure that the columns being filtered on (e.g. where answer1 is not null) have indexes.
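A minimal sketch of that advice in Postgres, assuming the my_table / new_table names from the question (the answer1..answer40 list is abbreviated):

BEGIN;
CREATE TABLE new_table AS
  SELECT id, answer1 AS answers FROM my_table WHERE answer1 IS NOT NULL;
INSERT INTO new_table
  SELECT id, answer2 FROM my_table WHERE answer2 IS NOT NULL;
INSERT INTO new_table
  SELECT id, answer3 FROM my_table WHERE answer3 IS NOT NULL;
-- ... repeat through answer40 ...
COMMIT;

-- Optional: a partial index per filtered column can help the selects when
-- most values are null, e.g.
-- CREATE INDEX ON my_table (answer2) WHERE answer2 IS NOT NULL;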
