Remove duplicate values from a LISTAGG in Oracle

I have used LISTAGG to concatenate and list books along with the names of their supplementary books.
SELECT DISTINCT SUBSTR(LISTAGG(',-'||B1.BOOK_NO||','||B1.BOOK_NAME||','||A.AUTHOR_NAME||',-'||B2.BOOK_NO||','||B2.BOOK_NAME) WITHIN GROUP (ORDER BY B2.BOOK_NO),2)
FROM BOOK_LIST B1
INNER JOIN AUTHORS A ON A.AUTHOR_NO=B1.AUTHOR_NO
INNER JOIN SUPPLEMENTARY B2 ON B2.BOOK_NO = B1.BOOK_SUP_NO
WHERE B1.SEQ = 123;
But because each main book has several supplementary books, I get the main book name repeated.
Is there a way to remove the duplicate main book name and number?
My output looks like this:
-99,Anders Carlson ,-109,John Stuart,-99,Anders Carlson ,-47,James Anderson
Here the value 99 is repeated; I want it to appear only once.
Desired Output:
-99,Anders Carlson ,-109,John Stuart,-47,James Anderson
DB data:
Book_list:
NO | MAIN_BOOK_NO | MAIN_BOOK_NAME | BOOK_SUP_NO | AUTHOR_NO
1 | 12 | xyz | 5 | 2
2 | 22 | abc | 7 | 4
Authors:
NO | AUTHOR_NO | AUTHOR_NAME
1 | 2 | Alex
2 | 3 | Leonard
3 | 4 | Benjamin
Supplementary:
NO | BOOK_NO | BOOK_NAME
1 | 5 | ABC
2 | 5 | XYZ
3 | 7 | LMN
4 | 7 | DEF
5 | 7 | NEW
The output should look like this:
NAME
12,xyz,Alex,-5,ABC,-5,XYZ
22,abc,Benjamin,-7,LMN,-7,DEF,-7,NEW
and so on for the rest of the data in the table.

If I understand you correctly, you need to append the list of supplementary books to the main book, so you're actually after something like:
SELECT B1.MAIN_BOOK_NO||','||B1.MAIN_BOOK_NAME||','||A.AUTHOR_NAME||',-'||
LISTAGG(B2.BOOK_NO||','||B2.BOOK_NAME, ',-') WITHIN GROUP (ORDER BY B2.BOOK_NO)
FROM BOOK_LIST B1
INNER JOIN AUTHORS A ON A.AUTHOR_NO=B1.AUTHOR_NO
INNER JOIN SUPPLEMENTARY B2 ON B2.BOOK_NO = B1.BOOK_SUP_NO
WHERE B1.SEQ = 123
GROUP BY B1.MAIN_BOOK_NO, B1.MAIN_BOOK_NAME, A.AUTHOR_NAME;
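To sanity-check the grouping logic against the sample data from the question, here is a minimal Python sketch using the standard-library sqlite3 module. SQLite has no LISTAGG, so the join runs in SQL and the per-book concatenation is done with itertools.groupby. The column `id` stands in for the NO column, and ordering the supplementary books by it is an assumption that happens to reproduce the desired output.

```python
import sqlite3
from itertools import groupby


def load_sample(conn: sqlite3.Connection) -> None:
    """Create the three tables from the question and insert the sample rows."""
    conn.executescript(
        """
        CREATE TABLE book_list (id INTEGER, main_book_no INTEGER, main_book_name TEXT,
                                book_sup_no INTEGER, author_no INTEGER);
        CREATE TABLE authors (id INTEGER, author_no INTEGER, author_name TEXT);
        CREATE TABLE supplementary (id INTEGER, book_no INTEGER, book_name TEXT);
        INSERT INTO book_list VALUES (1, 12, 'xyz', 5, 2), (2, 22, 'abc', 7, 4);
        INSERT INTO authors VALUES (1, 2, 'Alex'), (2, 3, 'Leonard'), (3, 4, 'Benjamin');
        INSERT INTO supplementary VALUES
            (1, 5, 'ABC'), (2, 5, 'XYZ'), (3, 7, 'LMN'), (4, 7, 'DEF'), (5, 7, 'NEW');
        """
    )


def book_lists(conn: sqlite3.Connection) -> list[str]:
    """One line per main book: 'main_no,main_name,author,-sup_no,sup_name,-...'."""
    rows = conn.execute(
        """
        SELECT b1.main_book_no, b1.main_book_name, a.author_name,
               b2.book_no, b2.book_name
        FROM book_list b1
        JOIN authors a ON a.author_no = b1.author_no
        JOIN supplementary b2 ON b2.book_no = b1.book_sup_no
        ORDER BY b1.main_book_no, b2.id
        """
    ).fetchall()
    lines = []
    # Grouping on the main-book columns emits each main book exactly once,
    # which is what GROUP BY + LISTAGG achieves in Oracle.
    for (main_no, main_name, author), grp in groupby(rows, key=lambda r: r[:3]):
        supplements = ",-".join(f"{r[3]},{r[4]}" for r in grp)
        lines.append(f"{main_no},{main_name},{author},-{supplements}")
    return lines


conn = sqlite3.connect(":memory:")
load_sample(conn)
for line in book_lists(conn):
    print(line)
```

The key point is the same as in the Oracle query: the main book and author are grouping columns, so they appear once per group, and only the supplementary entries are aggregated.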

See if this works
select T1.MAIN_BOOK_NO, T1.MAIN_BOOK_NAME, LISTAGG(',-'||T1.BOOK_NO||','||T1.BOOK_NAME) WITHIN GROUP (order by T1.BOOK_NO)
from
(
SELECT B1.MAIN_BOOK_NO, B1.MAIN_BOOK_NAME, B2.BOOK_NO, B2.BOOK_NAME
FROM BOOK_LIST B1
INNER JOIN AUTHORS A ON A.AUTHOR_NO=B1.AUTHOR_NO
INNER JOIN SUPPLEMENTARY B2 ON B2.BOOK_NO = B1.BOOK_SUP_NO
WHERE B1.SEQ = 123
group by B1.MAIN_BOOK_NO, B1.MAIN_BOOK_NAME, B2.BOOK_NO, B2.BOOK_NAME
order by B2.BOOK_NO
) T1
group by T1.MAIN_BOOK_NO, T1.MAIN_BOOK_NAME;
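When only the concatenated string is available, the repeated entries can also be removed after the fact. A minimal Python sketch, assuming every entry starts with '-', entries are separated by ',-', and no book or author name itself contains ',-'. (On Oracle 19c or later, LISTAGG(DISTINCT ...) removes duplicates inside the aggregate itself.)

```python
def dedupe_entries(listagg_output: str, sep: str = ",-") -> str:
    """Drop repeated entries from a ',-'-separated string, keeping first occurrences."""
    has_prefix = listagg_output.startswith("-")
    # Strip the leading '-' of the first entry so every entry splits on the same separator.
    body = listagg_output[1:] if has_prefix else listagg_output
    entries = body.split(sep)
    # dict.fromkeys keeps insertion order (Python 3.7+) while dropping duplicates.
    unique = list(dict.fromkeys(entries))
    return ("-" if has_prefix else "") + sep.join(unique)


raw = "-99,Anders Carlson ,-109,John Stuart,-99,Anders Carlson ,-47,James Anderson"
print(dedupe_entries(raw))
```

Applied to the output from the question, this yields the desired string with a single 99 entry.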

Related

TABLE ACCESS FULL in Oracle execution plan

I have been tasked with working out the SELECT statement behind this explain plan:
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN RIGHT ANTI | |
| 2 | VIEW | VW_NSO_1 |
| 3 | HASH JOIN RIGHT SEMI| |
| 4 | TABLE ACCESS FULL | PART |
| 5 | TABLE ACCESS FULL | ORDERS |
| 6 | TABLE ACCESS FULL | CUSTOMER |
------------------------------------------
I can work out the SELECT statement for Ids 0-5, but what does line 6 mean?
This is what I have managed to figure out so far:
select *
from customer c join orders o
on c.custkey = o.custkey
where o_totalprice
not in
(select p_retailprice
from part p join orders o
on orders.o_custkey >= 0 and 0.1*o_totalprice >= 0)
I can't see where the last line (Id 6) comes into play.
Your query is:
select *
from customer c join orders o
on c.custkey = o.custkey
where o_totalprice
not in
(select p_retailprice
from part p join orders o
on orders.o_custkey >= 0 and 0.1*o_totalprice >= 0)
And your explain plan is
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN RIGHT ANTI | |
| 2 | VIEW | VW_NSO_1 |
| 3 | HASH JOIN RIGHT SEMI| |
| 4 | TABLE ACCESS FULL | PART |
| 5 | TABLE ACCESS FULL | ORDERS |
| 6 | TABLE ACCESS FULL | CUSTOMER |
------------------------------------------
In your case, this is what happens:
You are getting all the records from both customer and orders that match the condition based on the custkey field.
Your predicate restricts the output to rows where o_totalprice (by the way, for readability you should qualify which table this column comes from; I assume it is the orders table) is not in the set returned by the subquery.
The subquery returns all values of p_retailprice from the join between part and orders on orders.o_custkey >= 0 and 0.1*o_totalprice >= 0.
With this in mind, the CBO (cost-based optimizer) is:
Accessing the CUSTOMER table (line 6) with a TABLE ACCESS FULL, which makes sense because you are selecting all columns from the table and probably have no index on custkey.
Performing a HASH JOIN RIGHT SEMI (line 3) between PART and ORDERS. In general, a semi join is used for an IN or EXISTS clause, and for each row the join stops as soon as the first match satisfies the IN or EXISTS condition.
The HASH JOIN RIGHT ANTI of line 1 implements the anti join (NOT IN): the optimizer has unnested the subquery into the view VW_NSO_1 (line 2), and that view is then joined to the CUSTOMER table in line 6.
You are filtering only on the right-hand table of the join (ORDERS), which is why the access paths reflect that.
This is just an overview of your execution plan and the reasons why the CBO is using those access paths.
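To make the semi join and anti join steps concrete, here is a toy Python sketch of both hash joins: a hash set is built from one input and probed with the other. It is a simplification, not Oracle's implementation, and it assumes no NULLs (in SQL, `x NOT IN (subquery)` is never true when the subquery returns a NULL, which this sketch does not model). The customer rows are made up for illustration.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

Row = TypeVar("Row")


def hash_semi_join(outer: Iterable[Row], inner_keys: Iterable[Hashable],
                   key: Callable[[Row], Hashable]) -> List[Row]:
    """Rows of `outer` with at least one match in the inner input (IN / EXISTS)."""
    build = set(inner_keys)  # build phase: hash the inner input once
    # Probe phase: membership is a yes/no test, so each outer row is returned
    # at most once no matter how many inner rows match it.
    return [row for row in outer if key(row) in build]


def hash_anti_join(outer: Iterable[Row], inner_keys: Iterable[Hashable],
                   key: Callable[[Row], Hashable]) -> List[Row]:
    """Rows of `outer` with no match at all (NOT IN / NOT EXISTS, assuming no NULLs)."""
    build = set(inner_keys)
    return [row for row in outer if key(row) not in build]


customers = [{"custkey": 1, "name": "A"}, {"custkey": 2, "name": "B"}, {"custkey": 3, "name": "C"}]
order_custkeys = [1, 1, 3]  # customer 1 has two orders
with_orders = hash_semi_join(customers, order_custkeys, key=lambda c: c["custkey"])
without_orders = hash_anti_join(customers, order_custkeys, key=lambda c: c["custkey"])
print([c["name"] for c in with_orders])     # ['A', 'C']
print([c["name"] for c in without_orders])  # ['B']
```

Note that customer A is returned once by the semi join even though it has two orders; an ordinary inner join would return it twice.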

Oracle 11g insert into select from a table with duplicate rows

I have one table that I need to split into several other tables.
The main table is really just a staging (transitional) table.
I dump data from Excel into it (from 5k to 200k rows) and then, using INSERT INTO ... SELECT, split it into the correct tables (five different tables).
However, the latest dataset my client sent has records with duplicate values.
The primary key for my table is usually ENI. But even this value is duplicated, because the same company can be both a customer and a service provider, so it has two different records that share the same ENI.
What I have so far:
I found a script that uses MERGE and modified it to find rows with the same ENI and assign them all the same main_id:
|Main_id| ENI | company_name| Type
| 1 | 1864 | JOHN | C
| 2 | 351485 | JOEL | C
| 3 | 16546 | MICHEL | C
| 2 | 351485 | JOEL J. | S
| 1 | 1864 | JOHN E. E. | C
Main_id: primary key used by the main DB
ENI: unique company number
Type: 'C' - CUSTOMER, 'S' - SERVICE PROVIDER
In some cases the duplicates have the same type, as with id 1.
There are several other columns...
What I need:
Insert one row for each main_id that my other script has already sorted out, and set a flag on the others to mark that they were not inserted. I can't delete any data; I'll need to send this information to the customer to validate.
Or is this simply not possible, and I should go back to good old Excel?
Edit: as asked in a comment below, here is an example:
|Main_id| ENI | company_name| Type| RANK|
| 1 | 1864 | JOHN | C | 1 |
| 2 | 351485 | JOEL | C | 1 |
| 3 | 16546 | MICHEL | C | 1 |
| 2 | 351485 | JOEL J. | S | 2 |
| 1 | 1864 | JOHN E. E. | C | 2 |
RANK: 1864 appears twice, so the first one found gets 1, the second gets 2, and so on. I tried using
RANK() OVER (PARTITION BY MAIN_ID ORDER BY ENI)
RANK() OVER (PARTITION BY company_name ORDER BY ENI)
Thanks to TEJASH, I was able to come up with this solution:
MERGE INTO TABLEA S
USING (Select ROWID AS ID,
row_number() Over(partition by eni order by eni, type) as RANK_DUPLICATED
From TABLEA
) T
ON (S.ROWID = T.ID)
WHEN MATCHED THEN UPDATE SET S.RANK_DUPLICATED= T.RANK_DUPLICATED;
As far as I understand your problem, you just need to identify the duplicates based on two columns. You can do this with an analytic function as follows:
Select t.*,
row_number() Over(partition by main_id, eni order by company_name) as rnk
From your_table t
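The ROW_NUMBER approach can be tried end to end with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). This is a minimal sketch: the table names `staging` and `target` and the `not_inserted` flag column are hypothetical stand-ins, and the UPDATE by rowid plays the role of the MERGE ... ON (S.ROWID = T.ID) from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging (main_id INTEGER, eni INTEGER, company_name TEXT, type TEXT,
                          rank_duplicated INTEGER, not_inserted INTEGER DEFAULT 0);
    CREATE TABLE target (main_id INTEGER, eni INTEGER, company_name TEXT, type TEXT);
    INSERT INTO staging (main_id, eni, company_name, type) VALUES
        (1, 1864, 'JOHN', 'C'), (2, 351485, 'JOEL', 'C'), (3, 16546, 'MICHEL', 'C'),
        (2, 351485, 'JOEL J.', 'S'), (1, 1864, 'JOHN E. E.', 'C');
    """
)

# 1. Number the rows inside each (main_id, eni) group, as in the answer's query.
ranks = conn.execute(
    """
    SELECT rowid,
           ROW_NUMBER() OVER (PARTITION BY main_id, eni ORDER BY company_name) AS rnk
    FROM staging
    """
).fetchall()

# 2. Store the rank on each row (the MERGE-by-ROWID step).
conn.executemany("UPDATE staging SET rank_duplicated = ? WHERE rowid = ?",
                 [(rnk, rid) for rid, rnk in ranks])

# 3. Insert only the first row of each group; flag the rest instead of deleting them.
conn.execute(
    """
    INSERT INTO target (main_id, eni, company_name, type)
    SELECT main_id, eni, company_name, type FROM staging WHERE rank_duplicated = 1
    """
)
conn.execute("UPDATE staging SET not_inserted = 1 WHERE rank_duplicated > 1")
conn.commit()

print(conn.execute("SELECT company_name FROM target ORDER BY main_id").fetchall())
print(conn.execute("SELECT company_name FROM staging WHERE not_inserted = 1").fetchall())
```

Nothing is deleted from the staging table, so the flagged rows remain available to send to the customer for validation.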

Hierarchical Data Fetch in Spring and Hibernate

I have two tables, Account and Group, which both contain hierarchical data.
Example:
(Just for reference, I am using PostgreSQL.)
Group
|------|----------|-------------------|
| id | name | parent_group_id |
|------|----------|-------------------|
| 1 | Group1 | null |
| 2 | Group2 | 1 |
| 3 | Group3 | 2 |
| 4 | Group4 | 1 |
|------|----------|-------------------|
Account
|----|----------|----------|
| id | name | group_id |
|----|----------|----------|
| 1 | Account1 | 1 |
| 2 | Account2 | 1 |
| 3 | Account3 | 2 |
| 4 | Account4 | 3 |
| 5 | Account5 | 4 |
|----|----------|----------|
This account and group hierarchy can be many levels deep. I want to fetch all groups and accounts in an efficient way using Spring and Hibernate.
I want the output to look like this:
{"name":"Group1","groups":[{"name":"Group4","groups":[],"accounts":[{"name":"Account5"}]},{"name":"Group2","groups":[{"name":"Group3","groups":[],"accounts":[{"name":"Account4"}]}],"accounts":[{"name":"Account3"}]}],"accounts":[{"name":"Account2"},{"name":"Account1"}]}
I have checked some articles, but they don't handle recursion (a group inside a group, and so on).
This is the perfect use case for Blaze-Persistence.
Blaze-Persistence is a query builder on top of JPA which supports many of the advanced DBMS features on top of the JPA model. To model CTEs or recursive CTEs, which is what you need here, you first need to introduce a CTE entity that models the result type of the CTE.
@CTE
@Entity
public class GroupCTE {
@Id Integer id;
}
A query for this could look like the following
List<Group> groups = criteriaBuilderFactory.create(entityManager, Group.class)
.withRecursive(GroupCTE.class)
.from(Group.class, "g1")
.bind("id").select("g1.id")
.where("g1.parent").isNull()
.unionAll()
.from(Group.class, "g2")
.innerJoinOn(GroupCTE.class, "cte")
.on("cte.id").eqExpression("g2.parent.id")
.end()
.bind("id").select("g2.id")
.end()
.from(Group.class, "g")
.fetch("accounts", "groups")
.where("g.id").in()
.from(GroupCTE.class, "c")
.select("c.id")
.end()
.getResultList();
This renders to SQL looking like the following
WITH RECURSIVE GroupCTE(id) AS (
SELECT g1.id
FROM Group g1
WHERE g1.parent_group_id IS NULL
UNION ALL
SELECT g2.id
FROM Group g2
INNER JOIN GroupCTE cte ON g2.parent_group_id = cte.id
)
SELECT *
FROM Group g
LEFT JOIN Account a ON a.group_id = g.id
LEFT JOIN Group gsub ON gsub.parent_group_id = g.id
WHERE g.id IN (
SELECT c.id
FROM GroupCTE c
)
You can find out more about recursive CTEs in the documentation: https://persistence.blazebit.com/documentation/core/manual/en_US/index.html#recursive-ctes
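The recursive CTE itself, plus the step that turns flat rows into the nested output, can be tried outside Spring with a minimal Python sketch on SQLite, which also supports WITH RECURSIVE. This is not Blaze-Persistence or Hibernate: the table names `grp` and `account` are assumptions (GROUP is a reserved word, so the original name would need quoting), the account ids are taken as 1 to 5, and the tree is assembled in application code.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT, parent_group_id INTEGER);
    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
    INSERT INTO grp VALUES (1, 'Group1', NULL), (2, 'Group2', 1), (3, 'Group3', 2), (4, 'Group4', 1);
    INSERT INTO account VALUES (1, 'Account1', 1), (2, 'Account2', 1), (3, 'Account3', 2),
                               (4, 'Account4', 3), (5, 'Account5', 4);
    """
)

# The recursive CTE walks the hierarchy from the roots downwards,
# however many levels deep it goes.
groups = conn.execute(
    """
    WITH RECURSIVE group_cte(id) AS (
        SELECT id FROM grp WHERE parent_group_id IS NULL
        UNION ALL
        SELECT g.id FROM grp g JOIN group_cte c ON g.parent_group_id = c.id
    )
    SELECT g.id, g.name, g.parent_group_id
    FROM grp g WHERE g.id IN (SELECT id FROM group_cte)
    """
).fetchall()
accounts = conn.execute("SELECT name, group_id FROM account ORDER BY id").fetchall()

# Build the nested structure in memory from the flat rows.
nodes = {gid: {"name": name, "groups": [], "accounts": []} for gid, name, _ in groups}
roots = []
for gid, _, parent in groups:
    if parent is None:
        roots.append(nodes[gid])
    else:
        nodes[parent]["groups"].append(nodes[gid])
for name, group_id in accounts:
    if group_id in nodes:
        nodes[group_id]["accounts"].append({"name": name})

print(json.dumps(roots[0]))
```

The order of sibling groups and accounts in the printed JSON may differ from the example in the question, which does not specify an ordering.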

Joining tables with same column names - ORACLE

I am using Oracle.
I am currently working on two tables which both have the same column names. Is there any way to combine the two tables as they are?
Simple example to show what I mean:
TABLE 1:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| a | 1 | w |
| b | 2 | x |
TABLE 2:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| c | 3 | y |
| d | 4 | z |
RESULT THAT I WANT:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| a | 1 | w |
| b | 2 | x |
| c | 3 | y |
| d | 4 | z |
Any help would be greatly appreciated. Thank you in advance!
You can use the union set operator to get the result of two queries as a single result set:
select column1, column2, column3
from table1
union all
select column1, column2, column3
from table2
union on its own implicitly removes duplicates; union all preserves them.
The column names don't need to be the same; you just need the same number of columns, with the same data types, in the same order.
(This is not what is usually meant by a join, so the title of your question is a bit misleading; I'm basing this on the example data and output you showed.)
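A quick way to see the difference between UNION and UNION ALL is Python's built-in sqlite3 module. This is a minimal sketch; SQLite is used only because it needs no server, and the table and column names table1, table2, column1..column3 are taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (column1 TEXT, column2 INTEGER, column3 TEXT);
    CREATE TABLE table2 (column1 TEXT, column2 INTEGER, column3 TEXT);
    INSERT INTO table1 VALUES ('a', 1, 'w'), ('b', 2, 'x');
    INSERT INTO table2 VALUES ('c', 3, 'y'), ('d', 4, 'z');
    """
)

# Stacking the two tables gives the four rows from the question.
stacked = conn.execute(
    """
    SELECT column1, column2, column3 FROM table1
    UNION ALL
    SELECT column1, column2, column3 FROM table2
    ORDER BY column1
    """
).fetchall()
print(stacked)

# Combining table1 with itself shows the difference: UNION drops the repeated rows.
self_all = conn.execute("SELECT * FROM table1 UNION ALL SELECT * FROM table1").fetchall()
self_union = conn.execute("SELECT * FROM table1 UNION SELECT * FROM table1").fetchall()
print(len(self_all), len(self_union))  # 4 2
```

Because UNION has to find duplicates, it typically does extra sorting or hashing work; UNION ALL is the natural choice when the inputs cannot overlap or duplicates should be kept.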

Sum of the grouped distinct values

This is a bit hard to explain in words ... I'm trying to calculate a sum of grouped distinct values in a matrix. Let's say I have the following data returned by a SQL query:
------------------------------------------------
| Group | ParentID | ChildID | ParentProdCount |
| A | 1 | 1 | 2 |
| A | 1 | 2 | 2 |
| A | 1 | 3 | 2 |
| A | 1 | 4 | 2 |
| A | 2 | 5 | 3 |
| A | 2 | 6 | 3 |
| A | 2 | 7 | 3 |
| A | 2 | 8 | 3 |
| B | 3 | 9 | 1 |
| B | 3 | 10 | 1 |
| B | 3 | 11 | 1 |
------------------------------------------------
There's some other data in the query, but it's irrelevant. ParentProdCount is specific to the ParentID.
Now, I have a matrix in the MS Report Designer in which I'm trying to calculate a sum for ParentProdCount (grouped by "Group"). If I just add the expression
=Sum(Fields!ParentProdCount.Value)
I get a result of 20 for Group A and 3 for Group B, which is incorrect. The correct values should be 5 for group A and 1 for group B. This wouldn't happen if ChildID weren't involved, but I have to use some other child-specific data in the same matrix.
I tried to nest FIRST() and SUM() aggregate functions but apparently it's not possible to have nested aggregation functions, even when they have scopes defined.
I'm pretty sure there is some way to calculate the grouped distinct sum without needing to create another SQL query. Anyone got an idea how to do that?
OK, I sorted this out by adding a ROW_NUMBER() function to my SQL query:
SELECT [Group], ParentID, ROW_NUMBER() OVER (PARTITION BY ParentID ORDER BY ChildID ASC) AS Position, ChildID, ParentProdCount FROM [Table]
and then I replaced the SSRS SUM function with
=SUM(IIF(Fields!Position.Value = 1, Fields!ParentProdCount.Value, 0))
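The ROW_NUMBER trick can be checked outside SSRS with a minimal sketch in Python's sqlite3 (window functions need SQLite 3.25 or later). The table name `prod` and the column name `grp` (GROUP is a reserved word) are assumptions; the SUM over CASE plays the role of the SSRS IIF expression, and the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prod (grp TEXT, parent_id INTEGER, child_id INTEGER, parent_prod_count INTEGER)"
)
rows = (
    [("A", 1, c, 2) for c in range(1, 5)]
    + [("A", 2, c, 3) for c in range(5, 9)]
    + [("B", 3, c, 1) for c in range(9, 12)]
)
conn.executemany("INSERT INTO prod VALUES (?, ?, ?, ?)", rows)

# Only the first child row of each parent (position = 1) contributes its count.
query = """
    SELECT grp,
           SUM(parent_prod_count) AS naive_sum,
           SUM(CASE WHEN position = 1 THEN parent_prod_count ELSE 0 END) AS distinct_sum
    FROM (
        SELECT grp, parent_prod_count,
               ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY child_id) AS position
        FROM prod
    )
    GROUP BY grp
    ORDER BY grp
"""
for grp, naive, distinct in conn.execute(query):
    print(grp, naive, distinct)
# A 20 5
# B 3 1
```

The naive column reproduces the wrong totals from the question (20 and 3), while the filtered sum gives the expected 5 and 1.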
Put a grouping on ParentID and use a summation over that group,
e.g.:
if the group on ParentID is named "ParentIDGroup",
then
the column sum of ParentProdCount = SUM(Fields!ParentProdCount.Value,"ParentIDGroup")
