Oracle display value replacement of flattened, delimited foreign key values

I'm working on a data export for a painfully denormalized COTS product and am hung up on how to plug display values into my selection for columns that contain a delimited string of foreign keys.
Assume the following sets of data as an example.
DEPARTMENTS table:
Key  Value
---  ----------------------
1    Finance
2    Human Resources
3    Public Affairs
4    Information Technology
PERSONNEL table:
PK   FName  LName  Departments
---  -----  -----  ------------------------
111  Marty  Graw   1|~*~|3|~*~|
222  Rick   Shaw   2|~*~|4|~*~|
333  Jean   Poole  4|~*~|2|~*~|3|~*~|1|~*~|
Desired output from select:
FName  LName  Departments
-----  -----  ----------------------------------------------------------------
Marty  Graw   Finance, Public Affairs
Rick   Shaw   Human Resources, Information Technology
Jean   Poole  Information Technology, Human Resources, Public Affairs, Finance
I've found examples of how to deal with delimited strings but nothing that really seems to fit this particular scenario. Ideally I'd like to figure out how I could do it without having to create functions etc. as my permissions are pretty limited.

This will not preserve the original order of the IDs, but if that's not important then this will work (both strings are padded with the delimiter so every ID, including the first and last, is matched as a whole token):
select DISTINCT
       p.fname
      ,p.lname
      ,LISTAGG(d.value, ', ')
         WITHIN GROUP (ORDER BY d.value)
         OVER (PARTITION BY p.pk) AS departments_list
from personnel p
left join departments d
  on INSTR('|~*~|' || p.departments || '|~*~|'
          ,'|~*~|' || d.key || '|~*~|') > 0;
SQL Fiddle: http://sqlfiddle.com/#!4/d292e/3/0
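To see why the join condition works, here it is evaluated by hand for Marty's row (departments = '1|~*~|3|~*~|'); this is only an illustration against dual:
-- Marty's padded department string is '|~*~|1|~*~|3|~*~||~*~|'.
-- Key 1 matches at position 1, key 3 at position 7, key 2 not at all:
SELECT INSTR('|~*~|' || '1|~*~|3|~*~|' || '|~*~|', '|~*~|1|~*~|') AS match_1  -- 1
     , INSTR('|~*~|' || '1|~*~|3|~*~|' || '|~*~|', '|~*~|3|~*~|') AS match_3  -- 7
     , INSTR('|~*~|' || '1|~*~|3|~*~|' || '|~*~|', '|~*~|2|~*~|') AS match_2  -- 0
FROM dual;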
EDIT
If you really need them listed in the same order as the IDs, you can use this variant:
select DISTINCT
       p.fname
      ,p.lname
      ,LISTAGG(d.value, ', ')
         WITHIN GROUP (ORDER BY INSTR('|~*~|' || p.departments || '|~*~|'
                                     ,'|~*~|' || d.key || '|~*~|'))
         OVER (PARTITION BY p.pk) AS departments_list
from personnel p
left join departments d
  on INSTR('|~*~|' || p.departments || '|~*~|'
          ,'|~*~|' || d.key || '|~*~|') > 0;
http://sqlfiddle.com/#!4/d292e/4

Oracle SQL query returning all of the values instead of limiting the values

Full disclosure: this is part of a homework question, but I have tried 6 different versions and I am stuck.
I am trying to find 1 manager every time the query runs, i.e. I put the department id in and 1 name pops out. Currently, I get all the names, multiple times. I have tried nesting with an '=', not nesting, union, intersection, etc. I can get the manager id with a basic query; I just can't get the name. The current version looks like this:
select e.ename
from .emp e
where d.managerid in (select unique d.managerid
from works w, .dept d, emp e1
where d.did=1 and e1.eid=w.eid and d.did=w.did );
I realize it's probably a really basic mistake that I am not seeing - any ideas?
It's not clear what you mean by getting 1 manager every time: should it be a different manager each time, or the same one?
Let's go through your query:
You select all employees from emp whose manager id is in the subquery's result set.
That subquery gets all the managers for dept=1; the rest of the tables and conditions have no influence on the result set.
I think did is the primary key of table dept. If so, your query may be rewritten as:
select e.ename
from emp e
where d.managerid in (select unique d.managerid
from dept d
where d.did=1);
But this query returns all employees, not the manager of dept=1.
If you need the manager, you should select the employee who is the manager. If eid is the primary key of the employee table, and managerid is an id from the employee table, you need something like:
select e.ename
from emp e
where e.eid in (select unique d.managerid
                from dept d
                where d.did=1);
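An equivalent join form, assuming dept.managerid references emp.eid (the schema is inferred from the question):
-- Fetch the name of the manager of department 1 directly via a join:
select e.ename
from emp e
join dept d on d.managerid = e.eid
where d.did = 1;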

Removing duplicate columns in one row while merging corresponding column values

Apologies if this seems simple to some; I am still in the (very) early stages of learning!
Basically I've got a database table that has multiple users (USER_ID), each with a corresponding access name (NAME). The problem is, some users have multiple access names, meaning when the data is pulled, there are duplicates in the USER_ID column.
I need to remove the duplicates in the USER_ID column and join the corresponding access names in the NAME column, so each user only takes up 1 row and no data is lost.
The current SQL query I'm using is:
select Table1.user_id, Table2.name
from Table1
inner join Table2
on Table1.role_id = Table2.role_id
An example of what this would return:
USER_ID | NAME
--------+----------------
Tim     | Level_1 Access
John    | Level 2 Access
Tim     | Level 2 Access
Mark    | Level 3 Access
Tim     | Level 3 Access
Ideally, I would remove the duplicates for Tim and display as following:
USER_ID | NAME
--------+-----------------------------------------------
Tim     | Level_1 Access, Level 2 Access, Level 3 Access
John    | Level 2 Access
Mark    | Level 3 Access
Thanks in advance for any help regarding this and sorry if something similar has been asked before!
Use GROUP_CONCAT with SEPARATOR (this is MySQL syntax):
SELECT Table1.user_id, GROUP_CONCAT(Table2.name SEPARATOR ',') AS Ename
FROM Table1
INNER JOIN Table2 ON Table1.role_id = Table2.role_id
GROUP BY Table1.user_id
In Oracle, LISTAGG does the same job; note the GROUP BY, which the aggregate form of LISTAGG requires here:
select Table1.user_id, LISTAGG(Table2.name, ', ') WITHIN GROUP (ORDER BY Table2.name) as Name
from Table1
inner join Table2
on Table1.role_id = Table2.role_id
group by Table1.user_id

Where two or more values match a condition?

I have been asked this question:
You list county names and the surnames of the representatives if the representatives in the counties have the same surname.
and I have the following tables:
***REPRESENTATIVE***
REPI  SURNAME   FIRSTNAME  COUNTY  CONS
----  --------  ---------  ------  ----
R100  Gorege    Larry      kent    CON1
R101  shneebly  john       kent    CON2
R102  shneebly  steve      kent    CON3
I can't seem to figure out the correct way to ask Oracle to display a surname that exists more than twice where the surnames are in the same county.
I know how to ask WHERE something = something, but that doesn't ask what I want to know.
It sounds like you want to use the HAVING clause after doing a GROUP BY:
SELECT surname, county, count(*)
FROM you_table
GROUP BY surname, county
HAVING count(*) > 1;
If you really mean "more than twice" as you wrote, you'd want HAVING count(*) > 2, but then none of your sample data would be returned.
In words, this SQL statement says:
Group the data into buckets by surname and county. Each distinct combination of surname and county is a separate bucket.
Count the number of rows in each bucket
Return those buckets where there are at least two rows
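If you also need the full rows rather than just the counts, one sketch (the table name representative is assumed from the sample) is to join back to the grouped result:
SELECT r.repi, r.surname, r.firstname, r.county, r.cons
FROM representative r
JOIN (SELECT surname, county
      FROM representative
      GROUP BY surname, county
      HAVING COUNT(*) > 1) dup
  ON r.surname = dup.surname
 AND r.county = dup.county;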

How to optimize a select from several tables with millions of rows

I have the following tables (Oracle 10g):
catalog (
id NUMBER PRIMARY KEY,
name VARCHAR2(255),
owner NUMBER,
root NUMBER REFERENCES catalog(id)
...
)
university (
id NUMBER PRIMARY KEY,
...
)
securitygroup (
id NUMBER PRIMARY KEY
...
)
catalog_securitygroup (
catalog REFERENCES catalog(id),
securitygroup REFERENCES securitygroup(id)
)
catalog_university (
catalog REFERENCES catalog(id),
university REFERENCES university(id)
)
Catalog: 500 000 rows, catalog_university: 500 000, catalog_securitygroup: 1 500 000.
I need to select any 50 rows from catalog with a specified root, ordered by name, for the current university and current securitygroup. Here is the query:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Where 100 is some catalog, 200 some securitygroup, and 300 some university. This query returns 50 rows out of ~170,000 in 3 minutes.
But the next query returns those rows in 2 seconds:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c
WHERE c.root = 100
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
I built the following indexes: (catalog.id, catalog.name, catalog.owner), (catalog_securitygroup.catalog, catalog_securitygroup.index), (catalog_university.catalog, catalog_university.university).
Plan for first query (using PLSQL Developer):
http://habreffect.ru/66c/f25faa5f8/plan2.jpg
Plan for second query:
http://habreffect.ru/f91/86e780cc7/plan1.jpg
What are the ways to optimize the query I have?
The indexes that can be useful and should be considered deal with
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
So the following fields are candidates for indexes:
c: id, root
cs: catalog, securitygroup
cu: catalog, university
So, try creating
(catalog_securitygroup.catalog, catalog_securitygroup.securitygroup)
and
(catalog_university.catalog, catalog_university.university)
EDIT:
I missed the ORDER BY - these fields should also be considered, so
(catalog.name, catalog.id)
might be beneficial (or some other composite index that could be used for sorting and the conditions - possibly (catalog.root, catalog.name, catalog.id))
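As concrete DDL those suggestions might look like this (the index names are illustrative):
CREATE INDEX cs_cat_sg        ON catalog_securitygroup (catalog, securitygroup);
CREATE INDEX cu_cat_univ      ON catalog_university (catalog, university);
CREATE INDEX cat_root_name_id ON catalog (root, name, id);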
EDIT2
Although another answer has been accepted, I'll provide some more food for thought.
I have created some test data and run some benchmarks.
The test cases are minimal in terms of record width (in catalog_securitygroup and catalog_university the primary keys are (catalog, securitygroup) and (catalog, university)). Here is the number of records per table:
test=# SELECT (SELECT COUNT(*) FROM catalog), (SELECT COUNT(*) FROM catalog_securitygroup), (SELECT COUNT(*) FROM catalog_university);
 ?column? | ?column? | ?column?
----------+----------+----------
   500000 |  1497501 |   500000
(1 row)
The database is PostgreSQL 8.4 (default Ubuntu install); the hardware is an i5 with 4 GB RAM.
First I rewrote the query to
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root < 50
AND cs.catalog = c.id
AND cu.catalog = c.id
AND cs.securitygroup < 200
AND cu.university < 200
ORDER BY c.name
LIMIT 50 OFFSET 100
Note: the conditions are turned into less-than comparisons to maintain a comparable number of intermediate rows (the above query would return 198,801 rows without the LIMIT clause).
If run as above, without any extra indexes (save for PKs and foreign keys), it runs in 556 ms on a cold database (this is actually an indication that I oversimplified the sample data somehow; I would be happier if I had 2-4 s here without resorting to less-than operators).
This brings me to my point: any straight query that only joins and filters (a certain number of tables) and returns only a certain number of records should run under 1 s on any decent database, without the need to use cursors or to denormalize data (one of these days I'll have to write a post on that).
Furthermore, if a query returns only 50 rows and does simple equality joins with restrictive equality conditions, it should run much faster still.
Now let's see what happens if I add some indexes. The biggest potential in queries like this is usually the sort order, so let me try that:
CREATE INDEX test1 ON catalog (name, id);
This brings execution time on the query down to 22 ms on a cold database.
And that's the point: if you are trying to get only a page of data, you should only get a page of data, and execution times of queries such as this on normalized data with proper indexes should be less than 100 ms on decent hardware.
I hope I didn't oversimplify the case to the point of no comparison (as I stated before, some simplification is present, as I don't know the cardinality of the relationships between catalog and the many-to-many tables).
So, the conclusion is:
if I were you I would not stop tweaking the indexes (and the SQL) until the performance of the query goes below 200 ms, as a rule of thumb.
only if I found an objective explanation why it can't go below that value would I resort to denormalisation and/or cursors, etc...
First, I assume that your university and securitygroup tables are rather small. You posted the sizes of the large tables, but it's really the other sizes that are part of the problem.
Your problem comes from the fact that you can't join the smallest tables first. Your join order should be from small to large. But because your mapping tables don't include a securitygroup-to-university table, you can't join the smallest ones first. So you wind up starting with one or the other, joining to a big table, then to another big table, and only then, with that large intermediate result, do you get to a small table.
If you always have current_univ, current_secgrp, and root as inputs, you want to use them to filter as soon as possible. The only way to do that is to change your schema somewhat. In fact, you can leave the existing tables in place if you have to, but you'll be adding to the space used with this suggestion.
You've normalized the data very well. That's great for speed of update... not so great for querying. We denormalize to speed up querying (that's the whole reason for data warehouses; ok, that and history). Build a single mapping table with the following columns:
Univ_id, SecGrp_ID, Root, Catalog_id. Make it an index-organized table with the first three columns leading the primary key.
Now when you query that index with the three input values, you'll finish the index scan with a complete list of allowable catalog ids; then it's just a single join to the catalog table to get the item details and you're off and running.
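A sketch of that table as DDL (names are illustrative; catalog_id is included in the primary key so the index-organized table's key stays unique, since several catalogs can share the same first three values):
CREATE TABLE catalog_access (
  univ_id    NUMBER NOT NULL,
  secgrp_id  NUMBER NOT NULL,
  root       NUMBER NOT NULL,
  catalog_id NUMBER NOT NULL,
  CONSTRAINT catalog_access_pk
    PRIMARY KEY (univ_id, secgrp_id, root, catalog_id)
) ORGANIZATION INDEX;
-- A query for one page can then scan this index on the three leading
-- columns and join the resulting catalog_id list back to catalog.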
The Oracle cost-based optimizer makes use of all the information that it has to decide what the best access paths are for the data and what the least costly methods are for getting that data. So below are some random points related to your question.
The first three tables that you've listed all have primary keys. Do the other tables (catalog_university and catalog_securitygroup) also have primary keys on them? A primary key defines a column or set of columns that are non-null and unique, and it is very important in a relational database.
Oracle generally enforces a primary key by generating a unique index on the given columns. The Oracle optimizer is more likely to make use of a unique index if one is available, as it is likely to be more selective.
If possible, an index that contains unique values should be declared as unique (CREATE UNIQUE INDEX ...); this provides the optimizer with more information.
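For example (the index name is illustrative, and it assumes each catalog/university pair appears only once):
CREATE UNIQUE INDEX catalog_university_uq
  ON catalog_university (catalog, university);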
The additional indexes that you have provided are no more selective than the existing indexes. For example, the index on (catalog.id, catalog.name, catalog.owner) is unique but is less useful than the existing primary key index on (catalog.id). If a query is written to select on the catalog.name column, it is possible to do an index skip scan, but this starts being costly (and may not even be possible in this case).
Since you are trying to select based on the catalog.root column, it might be worth adding an index on that column. This would mean that the relevant rows could be found quickly in the catalog table. The timing of the second query could be a bit misleading: it might be taking 2 seconds to find 50 matching rows from catalog, but these could easily be the first 50 rows in the catalog table... finding 50 that match all your conditions might take longer, and not just because you need to join to other tables to get them. I would always use CREATE TABLE AS SELECT without restricting on rownum when trying to performance tune. With a complex query I generally care about how long it takes to get all the rows back... a simple select with rownum can be misleading.
Everything about Oracle performance tuning is about giving the optimizer enough information and the right tools (indexes, constraints, etc.) to do its job properly. For this reason it's important to gather optimizer statistics using something like DBMS_STATS.GATHER_TABLE_STATS(). Indexes should have their statistics gathered automatically in Oracle 10g or later.
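For example (the table name is a placeholder for your own):
BEGIN
  -- Gather fresh statistics for the catalog table in the current schema
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'CATALOG');
END;
/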
Somehow this grew into quite a long answer about the Oracle optimizer. Hopefully some of it answers your question. Here is a summary of what was said above:
Give the optimizer as much information as possible, e.g. if an index is unique then declare it as such.
Add indexes on your access paths.
Find the correct timings for queries without limiting by rownum. It will always be quicker to find the first 50 M&Ms in a jar than to find the first 50 red M&Ms.
Gather optimizer stats
Add unique/primary keys on all tables where they exist.
The use of rownum as written is wrong and causes all the rows to be processed. It will process all the rows, assign them all a row number, and then find those between 0 and 50. What you want to look for in the explain plan is COUNT STOPKEY rather than just COUNT.
The query below should be an improvement as it will only get the first 50 rows... but there is still the issue of the joins to look at too:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c
WHERE c.root = 100
ORDER BY name
) cc
where rownum <= 50
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Also, assuming this is for a web page or something similar, maybe there is a better way to handle this than just running the query again to get the data for the next page.
Try declaring a cursor. I don't know Oracle, but in SQL Server it would look like this:
declare @result table (
    id   numeric,
    name varchar(255)
);

declare __dyn_select_cursor cursor local scroll dynamic for
--Select
select distinct
    c.id, c.name
from [catalog] c
inner join catalog_university u
    on u.catalog = c.id
    and u.university = 300
inner join catalog_securitygroup s
    on s.catalog = c.id
    and s.securitygroup = 200
where c.root = 100
order by c.name

--Cursor
declare @id   numeric;
declare @name varchar(255);
declare @maxrowscount int;

open __dyn_select_cursor;
fetch relative 1 from __dyn_select_cursor into @id, @name;
set @maxrowscount = 50;

while (@@fetch_status = 0 and @maxrowscount <> 0)
begin
    insert into @result values (@id, @name);
    set @maxrowscount = @maxrowscount - 1;
    fetch next from __dyn_select_cursor into @id, @name;
end

close __dyn_select_cursor;
deallocate __dyn_select_cursor;

--Select temp, final result
select
    id,
    name
from @result;

Select all rows from SQL based upon existence of multiple rows (sequence numbers)

Let's say I have table data similar to the following:
123456  John  Doe  1  Green   2001
234567  Jane  Doe  1  Yellow  2001
234567  Jane  Doe  2  Red     2001
345678  Jim   Doe  1  Red     2001
What I am attempting to do is isolate only the records for Jane Doe, based upon the fact that she has more than one row in this table (more than one sequence number).
I cannot isolate based upon ID, names, colors, years, etc...
The number 1 in the sequence tells me that is the first record, and I need to be able to display that record as well as the number 2 record -- the change record.
The table is called users, and the fields are called ID, fname, lname, seq_no, color, date. How would I write the code to select only records that have more than one row in this table? For example:
I want the query to display this only based upon the existence of the multiple rows:
234567  Jane  Doe  1  Yellow  2001
234567  Jane  Doe  2  Red     2001
In PL/SQL
First, to find the IDs for records with multiple rows you would use:
SELECT ID FROM table GROUP BY ID HAVING COUNT(*) > 1
So you could get all the records for all those people with
SELECT * FROM table WHERE ID IN (SELECT ID FROM table GROUP BY ID HAVING COUNT(*) > 1)
If you know that the second sequence ID will always be "2" and that the "2" record will never be deleted, you might find something like:
SELECT * FROM table WHERE ID IN (SELECT ID FROM table WHERE SequenceID = 2)
to be faster, but you'd better be sure the requirements are guaranteed to be met in your database (and you would want a compound index on (SequenceID, ID)).
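As DDL, using the question's actual column names (the index name is illustrative):
CREATE INDEX users_seq_id ON users (seq_no, id);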
Try something like the following. It's a single table scan, as opposed to 2 like the others (partitioning by ID rather than name, since two different people could share a name):
SELECT * FROM (
  SELECT t1.*, COUNT(*) OVER (PARTITION BY t1.id) mycount FROM users t1
)
WHERE mycount > 1;
A self-join works too, with either JOIN syntax or the join condition in the WHERE clause.
JOIN:
SELECT u1.ID, u1.fname, u1.lname, u1.seq_no, u1.color, u1.date
FROM users u1 JOIN users u2 ON (u1.ID = u2.ID AND u2.seq_no = 2)
WHERE:
SELECT u1.ID, u1.fname, u1.lname, u1.seq_no, u1.color, u1.date
FROM users u1, users u2
WHERE
  u1.ID = u2.ID AND
  u2.seq_no = 2
Check out the HAVING clause for a summary query. You can specify things like
HAVING COUNT(*) >= 2
and so forth.
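A complete sketch against the question's users table (assuming duplicates are defined by ID, as above):
-- IDs that appear on more than one row:
SELECT id
FROM users
GROUP BY id
HAVING COUNT(*) >= 2;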
