Is it possible to count and group by comma-delimited values in an Oracle database table? Here is some sample table data:
id | user | title |
1 | foo | a,b,c |
2 | bar | a,d |
3 | tee | b |
The expected result would be:
title | count
a | 2
b | 2
c | 1
d | 1
I wanted to use concat like this:
SELECT a.title FROM Account a WHERE concat(',', a.title, ',') LIKE 'a' OR concat(',', a.title, ',') LIKE 'b' ... GROUP BY a.title?
But I'm getting invalid number of arguments on concat. The title values are predefined, therefore I don't mind if I have to list all of them in the query. Any help is greatly appreciated.
This uses simple string functions and recursive subquery factoring, and may be faster than using regular expressions and correlated joins:
Oracle Setup:
CREATE TABLE account ( id, "user", title ) AS
SELECT 1, 'foo', 'a,b,c' FROM DUAL UNION ALL
SELECT 2, 'bar', 'a,d' FROM DUAL UNION ALL
SELECT 3, 'tee', 'b' FROM DUAL;
Query:
WITH positions ( title, start_pos, end_pos ) AS (
SELECT title,
1,
INSTR( title, ',', 1 )
FROM account
UNION ALL
SELECT title,
end_pos + 1,
INSTR( title, ',', end_pos + 1 )
FROM positions
WHERE end_pos > 0
),
items ( item ) AS (
SELECT CASE end_pos
WHEN 0
THEN SUBSTR( title, start_pos )
ELSE SUBSTR( title, start_pos, end_pos - start_pos )
END
FROM positions
)
SELECT item,
COUNT(*)
FROM items
GROUP BY item
ORDER BY item;
Output:
ITEM | COUNT(*)
:--- | -------:
a | 2
b | 2
c | 1
d | 1
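As a cross-check of the logic outside the database, here is a minimal Python sketch of the same position-walking idea. The function name split_items is made up for illustration; str.find plays the role of INSTR and slicing plays the role of SUBSTR.

```python
from collections import Counter

def split_items(title, sep=","):
    """Yield the items of a delimited string by walking separator positions."""
    start = 0
    while True:
        end = title.find(sep, start)   # like INSTR: position of the next separator
        if end == -1:                  # no separator left: the rest is the last item
            yield title[start:]
            return
        yield title[start:end]         # like SUBSTR(title, start_pos, end_pos - start_pos)
        start = end + 1

# Sample titles from the question
titles = ["a,b,c", "a,d", "b"]
counts = Counter(item for title in titles for item in split_items(title))
for item in sorted(counts):
    print(item, counts[item])
```

For the sample data this prints the same four rows as the query output above.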
Split titles to rows and count them.
SQL> with test (id, title) as
2 (select 1, 'a,b,c' from dual union all
3 select 2, 'a,d' from dual union all
4 select 3, 'b' from dual
5 ),
6 temp as
7 (select regexp_substr(title, '[^,]', 1, column_value) val
8 from test cross join table(cast(multiset(select level from dual
9 connect by level <= regexp_count(title, ',') + 1
10 ) as sys.odcinumberlist))
11 )
12 select val as title,
13 count(*)
14 From temp
15 group by val
16 order by val;
TITLE COUNT(*)
-------------------- ----------
a 2
b 2
c 1
d 1
SQL>
If titles aren't that simple, then modify REGEXP_SUBSTR (add + sign) in line #7, e.g.
SQL> with test (id, title) as
2 (select 1, 'Robin Hood,Avatar,Star Wars Episode III' from dual union all
3 select 2, 'Mickey Mouse,Avatar' from dual union all
4 select 3, 'The Godfather' from dual
5 ),
6 temp as
7 (select regexp_substr(title, '[^,]+', 1, column_value) val
8 from test cross join table(cast(multiset(select level from dual
9 connect by level <= regexp_count(title, ',') + 1
10 ) as sys.odcinumberlist))
11 )
12 select val as title,
13 count(*)
14 From temp
15 group by val
16 order by val;
TITLE COUNT(*)
------------------------------ ----------
Avatar 2
Mickey Mouse 1
Robin Hood 1
Star Wars Episode III 1
The Godfather 1
SQL>
I have an 'ITEMS' table like below:
ITEM_NO ITEM_NAME
1 Book
2 Pen
3 Sticky Notes
4 Ink
5 Corrector
6 Ruler
In another 'EMP_ITEMS' table I have the below:
EMPLOYEE ITEMS_LIST
John 1,2
Mikel 5
Sophia 2,3,6
William 3,4
Daniel null
Michael 6
The output has to be like this:
EMPLOYEE ITEMS_LIST ITEM_NAME
John 1,2 Book,Pen
Mikel 5 Corrector
Sophia 2,3,6 Pen,Sticky Notes,Ruler
William 3,4 Sticky Notes,Ink
Daniel null null
Michael 6 Ruler
I used the below query:
SELECT e.EMPLOYEE,e.ITEMS_LIST, LISTAGG(i.ITEM_NAME, ',') WITHIN GROUP (ORDER BY i.ITEM_NAME) ITEM_DESC
FROM EMP_ITEMS e
INNER JOIN ITEMS i ON i.ITEM_NO = e.ITEMS_LIST
GROUP BY e.EMPLOYEE,e.ITEMS_LIST;
But there is an error:
ORA-01722: invalid number
But there is an error: ORA-01722: invalid number
That is because your ITEMS_LIST is a string made up of digits and commas, not a list of numbers, and you are comparing a single item number against that whole string. Oracle implicitly tries to convert the string to a number for the comparison, which fails for values such as '1,2'.
Instead, treat it as a string and look for sub-string matches. To do this you will need to surround both strings with the delimiter character and check whether one is a substring of the other:
Oracle 11g R2 Schema Setup:
CREATE TABLE Items ( ITEM_NO, ITEM_NAME ) As
SELECT 1, 'Book' FROM DUAL UNION ALL
SELECT 2, 'Pen' FROM DUAL UNION ALL
SELECT 3, 'Sticky Notes' FROM DUAL UNION ALL
SELECT 4, 'Ink' FROM DUAL UNION ALL
SELECT 5, 'Corrector' FROM DUAL UNION ALL
SELECT 6, 'Ruler' FROM DUAL;
CREATE TABLE emp_items ( EMPLOYEE, ITEMS_LIST ) AS
SELECT 'John', '1,2' FROM DUAL UNION ALL
SELECT 'Mikel', '5' FROM DUAL UNION ALL
SELECT 'Sophia', '3,2,6' FROM DUAL UNION ALL
SELECT 'William', '3,4' FROM DUAL UNION ALL
SELECT 'Daniel', null FROM DUAL UNION ALL
SELECT 'Michael', '6' FROM DUAL;
Query 1:
SELECT e.employee,
e.items_list,
LISTAGG( i.item_name, ',' )
WITHIN GROUP (
ORDER BY INSTR( ','||e.items_list||',', ','||i.item_no||',' )
) AS item_names
FROM emp_items e
LEFT OUTER JOIN
items i
ON ( ','||e.items_list||',' LIKE '%,'||i.item_no||',%' )
GROUP BY e.employee, e.items_list
Results:
| EMPLOYEE | ITEMS_LIST | ITEM_NAMES |
|----------|------------|------------------------|
| John | 1,2 | Book,Pen |
| Mikel | 5 | Corrector |
| Daniel | (null) | (null) |
| Sophia | 3,2,6 | Sticky Notes,Pen,Ruler |
| Michael | 6 | Ruler |
| William | 3,4 | Sticky Notes,Ink |
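To see why both strings are wrapped in the delimiter before matching, here is a small Python sketch of the same test. The helper name contains_item and the list '10,11' are hypothetical and only serve to show the false positive that a bare substring match would give.

```python
def contains_item(items_list, item_no, sep=","):
    """True if item_no is a whole element of the delimited string items_list."""
    if items_list is None:           # mirrors the NULL row, which matches nothing
        return False
    return f"{sep}{item_no}{sep}" in f"{sep}{items_list}{sep}"

print("1" in "10,11")                # naive substring test: True (a false positive)
print(contains_item("10,11", 1))     # False: ',1,' does not occur in ',10,11,'
print(contains_item("3,2,6", 2))     # True:  ',2,' occurs in ',3,2,6,'
```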
I need to post-process an Oracle dataset in order to find the number of heat waves.
By definition, a heat wave occurs when the data value is greater than a threshold at least two consecutive times.
For example, given threshold=20 and the sequence
23 31 32 17 16 23 16 21 22 18
there are 2 heat waves:
{23,31,32} and {21,22}
and the length of the longest one is 3 (the size of the largest subset).
My input dataset consists of several sequences; a sample input result set is:
-----------------------------
| ID | DAY | VALUE |
-----------------------------
| 100 | 1/1/17 | 20 |
| 100 | 2/1/17 | 21 |
| 200 | 1/1/17 | 12 |
| 200 | 2/1/17 | 24 |
| ... ... ...
In other words, I have one sequence per ID and I need to output something like this:
-----------------------
| ID | #heat waves |
-----------------------
| 100 | 3 |
| 200 | 1 |
Here is the current version of my stored procedure:
create or replace PROCEDURE sp (
p_query IN VARCHAR2,
cursor_ out sys_refcursor
) AS
processed processed_data_table := processed_data_table();
c sys_refcursor;
BEGIN
OPEN c FOR p_query;
processed.EXTEND;
processed(processed.count) := processed_data_obj();
fetch c INTO processed(processed.count).ID,
processed(processed.count).DAY, processed(processed.count).VALUE;
while c%found LOOP
processed.EXTEND;
processed(processed.count) := processed_data_obj();
fetch c INTO processed(processed.count).ID,
processed(processed.count).DAY, processed(processed.count).VALUE;
END loop;
CLOSE c;
processed.TRIM;
-- HERE I NEED TO PROCESS processed TABLE AND STORE RESULT IN output TABLE
OPEN cursor_ FOR
SELECT *
FROM TABLE( output);
END sp;
Could anyone help me with a solution?
Thanks
In Oracle 12c, use MATCH_RECOGNIZE:
select id, count(*) "# of heatwaves" from series_data
match_recognize ( partition by id
order by day
one row per match
after match skip past last row
pattern ( over_threshold{2,} )
define
over_threshold as value > 20 )
group by id
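For readers who want to check the counting rule independently of MATCH_RECOGNIZE, here is a minimal Python sketch of what the pattern over_threshold{2,} expresses: maximal runs of at least two consecutive values above the threshold. The function name heat_waves and its parameters are assumptions for illustration; the input is the example sequence from the question.

```python
from itertools import groupby

def heat_waves(values, threshold=20, min_len=2):
    """Return the lengths of all runs of at least min_len consecutive values above threshold."""
    lengths = []
    # groupby splits the sequence into maximal runs of over/not-over values
    for over, run in groupby(values, key=lambda v: v > threshold):
        n = len(list(run))
        if over and n >= min_len:
            lengths.append(n)
    return lengths

waves = heat_waves([23, 31, 32, 17, 16, 23, 16, 21, 22, 18])
print(len(waves), max(waves, default=0))   # 2 3
```

The single 23 in the middle is above the threshold but is not counted, because a run of length one does not satisfy the "at least two consecutive" rule.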
UPDATE: Also show longest heat wave for each series
To get the longest heatwave in each series, we have to introduce a MEASURES clause to the MATCH_RECOGNIZE, as below:
select id,
max(heatwave_length) "longest heatwave",
count(distinct heatwave_number) "# of heatwaves"
from series_data
match_recognize ( partition by id
order by day
measures
FINAL COUNT(*) as heatwave_length,
MATCH_NUMBER() heatwave_number
all rows per match
after match skip past last row
pattern ( over_threshold{2,} )
define
over_threshold as value > 20 )
group by id
order by id;
Full example with data:
with series_data ( id, day, value ) as
( SELECT 100, date '2017-01-01', 23 from dual union all
SELECT 100, date '2017-01-02', 31 from dual union all
SELECT 100, date '2017-01-03', 32 from dual union all
SELECT 100, date '2017-01-04', 44 from dual union all
SELECT 100, date '2017-01-05', 16 from dual union all
SELECT 100, date '2017-01-06', 23 from dual union all
SELECT 100, date '2017-01-07', 16 from dual union all
SELECT 100, date '2017-01-08', 21 from dual union all
SELECT 100, date '2017-01-09', 22 from dual union all
SELECT 100, date '2017-01-10', 18 from dual union all
SELECT 200, date '2017-01-01', 23 from dual union all
SELECT 200, date '2017-01-02', 31 from dual union all
SELECT 200, date '2017-01-03', 32 from dual union all
SELECT 200, date '2017-01-04', 17 from dual union all
SELECT 200, date '2017-01-05', 16 from dual union all
SELECT 200, date '2017-01-06', 23 from dual union all
SELECT 200, date '2017-01-07', 16 from dual union all
SELECT 200, date '2017-01-08', 21 from dual union all
SELECT 200, date '2017-01-09', 22 from dual union all
SELECT 200, date '2017-01-10', 22 from dual union all
SELECT 200, date '2017-01-11', 6 from dual union all
SELECT 200, date '2017-01-12', 22 from dual union all
SELECT 200, date '2017-01-13', 22 from dual )
select id,
max(heatwave_length) "longest heatwave",
count(distinct heatwave_number) "# of heatwaves"
from series_data
match_recognize ( partition by id
order by day
measures
FINAL COUNT(*) as heatwave_length,
MATCH_NUMBER() heatwave_number
all rows per match
after match skip past last row
pattern ( over_threshold{2,} )
define
over_threshold as value > 20 )
group by id
order by id;
Results:
ID longest heatwave # of heatwaves
----- -------------- --------------
100 4 2
200 3 3
I have two tables. TABLE_A and TABLE_B.
Both tables have a column that stores the creation user, but the column has a different name in each table.
My goal is to get a count of the records each user has created across both tables.
That is, I want to combine the results of the two queries below under a few conditions: a user name should not be repeated, and for users who have created records in both tables, the count should be the sum of both.
SELECT A.CREATION_USER_A AS "USER",
COUNT(*)
FROM TABLE_A A
GROUP BY A.CREATION_USER_A;
SELECT B.CREATION_USER_B AS "USER",
COUNT(*)
FROM TABLE_B B
GROUP BY B.CREATION_USER_B;
For example,
USER_A has created 2 records in TABLE_A,
USER_B has created 3 records in TABLE_B and
USER_C has created 4 records in TABLE_A and 3 records in TABLE_B.
So the output should look like this:
| USER | COUNT |
| USER_A | 2 |
| USER_B | 3 |
| USER_C | 7 |
I have written a query which does this, but it performs really badly.
SELECT A.CREATION_USER_A AS "USER",
(COUNT(A.CREATION_USER_A)+(SELECT COUNT(CREATION_USER_B) FROM TABLE_B WHERE CREATION_USER_B = A.CREATION_USER_A)) AS "COUNT"
FROM TABLE_A A
GROUP BY A.CREATION_USER_A
UNION
SELECT B.CREATION_USER_B,
COUNT(B.CREATION_USER_B)
FROM TABLE_B B
WHERE B.CREATION_USER_B NOT IN (SELECT CREATION_USER_A FROM TABLE_A)
GROUP BY B.CREATION_USER_B;
Please suggest a way to get this done.
You can simply build a set given by the union (keeping duplicates) of all the records in your tables, and then count the records grouping by creation user:
Building some sample data:
create table table_a(id, creation_user_a) as (
select 1, 'USER_A' from dual union all
select 1, 'USER_A' from dual union all
select 1, 'USER_C' from dual union all
select 1, 'USER_C' from dual union all
select 1, 'USER_C' from dual union all
select 1, 'USER_C' from dual
);
create table table_b(id, creation_user_b) as (
select 1, 'USER_B' from dual union all
select 1, 'USER_B' from dual union all
select 1, 'USER_B' from dual union all
select 1, 'USER_C' from dual union all
select 1, 'USER_C' from dual union all
select 1, 'USER_C' from dual
)
The query:
select count(1), creation_user
from ( /* the union of all the records from table_a and table_b */
select creation_user_a as creation_user from table_a
union all /* UNION ALL keeps duplicates */
select creation_user_B from table_b
)
group by creation_user
order by creation_user
The result:
2 USER_A
3 USER_B
7 USER_C
The explain plan:
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12 | 96 | 8 (25)| 00:00:01 |
| 1 | SORT ORDER BY | | 12 | 96 | 8 (25)| 00:00:01 |
| 2 | HASH GROUP BY | | 12 | 96 | 8 (25)| 00:00:01 |
| 3 | VIEW | | 12 | 96 | 6 (0)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | TABLE ACCESS FULL| TABLE_A | 6 | 48 | 3 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL| TABLE_B | 6 | 48 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
An alternative to Aleksej's answer (more complicated and possibly slower, so you'd need to test both to check) is to use a full outer join to join the two grouped queries, like so:
WITH table_a AS (SELECT 'USER_A' creation_user_a, 10 val FROM dual UNION ALL
SELECT 'USER_A' creation_user_a, 20 val FROM dual UNION ALL
SELECT 'USER_C' creation_user_a, 30 val FROM dual UNION ALL
SELECT 'USER_C' creation_user_a, 40 val FROM dual UNION ALL
SELECT 'USER_C' creation_user_a, 50 val FROM dual UNION ALL
SELECT 'USER_C' creation_user_a, 60 val FROM dual),
table_b AS (SELECT 'USER_B' creation_user_b, 10 val FROM dual UNION ALL
SELECT 'USER_B' creation_user_b, 20 val FROM dual UNION ALL
SELECT 'USER_B' creation_user_b, 30 val FROM dual UNION ALL
SELECT 'USER_C' creation_user_b, 40 val FROM dual UNION ALL
SELECT 'USER_C' creation_user_b, 50 val FROM dual UNION ALL
SELECT 'USER_C' creation_user_b, 60 val FROM dual)
-- end of mimicking your tables with data in them. See the SQL below:
SELECT COALESCE(a.creation_user_a, b.creation_user_b) "USER",
nvl(a.cnt_a, 0) + nvl(b.cnt_b, 0) total_records
FROM (SELECT creation_user_a,
COUNT(*) cnt_a
FROM table_a
GROUP BY creation_user_a) a
FULL OUTER JOIN (SELECT creation_user_b,
COUNT(*) cnt_b
FROM table_b
GROUP BY creation_user_b) b ON a.creation_user_a = b.creation_user_b
ORDER BY "USER";
USER TOTAL_RECORDS
------ -------------
USER_A 2
USER_B 3
USER_C 7
Thank you for helping me guys. I have found a simpler and more efficient solution. It works.
SELECT CREATION_USER, SUM(TOTAL_COUNT) TOTAL_COUNT FROM
(SELECT /*+ PARALLEL */ A.CREATION_USER_A CREATION_USER,
COUNT(A.CREATION_USER_A) TOTAL_COUNT
FROM TABLE_A A
GROUP BY A.CREATION_USER_A
UNION ALL -- keeps both rows when a user has the same count in each table; plain UNION would merge them
SELECT /*+ PARALLEL */ B.CREATION_USER_B CREATION_USER,
COUNT(B.CREATION_USER_B) TOTAL_COUNT
FROM TABLE_B B
GROUP BY B.CREATION_USER_B)
GROUP BY CREATION_USER;
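One detail worth checking whenever per-table counts are combined and then summed: UNION removes duplicate rows, so a user who happens to have the same count in both tables contributes only once, whereas UNION ALL keeps both rows. The Python sketch below uses made-up counts (not taken from the question) to show the difference; a set stands in for UNION and list concatenation for UNION ALL.

```python
from collections import Counter

# Hypothetical per-table counts; USER_C happens to have 3 rows in each table.
counts_a = [("USER_A", 2), ("USER_C", 3)]
counts_b = [("USER_B", 3), ("USER_C", 3)]

def total(rows):
    """Sum the counts per user, like SUM(TOTAL_COUNT) ... GROUP BY CREATION_USER."""
    totals = Counter()
    for user, n in rows:
        totals[user] += n
    return dict(totals)

union_all = counts_a + counts_b            # keeps both ('USER_C', 3) rows
union = set(counts_a) | set(counts_b)      # set union drops the duplicate row

print(total(union_all)["USER_C"])   # 6
print(total(union)["USER_C"])       # 3
```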
Using Oracle, I'm looking to do the following query, but I'd like to know if there is a more "intelligent" way to do it.
Select * from Sales Sales1
left join Sales Sales2 on Sales2.val = Sales1.val
left join Sales Sales3 on Sales3.val = Sales2.val
left join Sales Sales4 on Sales4.val = Sales3.val
left join Sales Sales5 on Sales5.val = Sales4.val
...
Here's what my sample data might look like
customer number | acct | start balance | open date | prev account
a               | 1    | 100           | 01-01-15  | b-1
b               | 1    | 80            | 03-04-14  |
c               | 2    | 200           | 04-11-14  | c-1
c               | 1    | 150           | 06-12-15  |
d               | 1    | 600           | 08-16-15  |
e               | 3    | 400           | 12-19-15  | e-2
e               | 2    | 150           | 10-21-14  | e-1
e               | 1    | 100           | 01-18-13  |
And a result set would look like this:
Customer | start | open | prevStart_01 | prevOpen_01 | prevStart_02 | prevOpen_02
a-1 | 100| 01-01-15| 80 | 03-04-14 | |
c-2 | 200| 04-11-14| 150 | 06-11-14 | |
e-3 | 400| 12-19-15| 150 | 10-21-14 | 100| 01-18-13
As you can see, I need to keep joining another record of sales based upon the result, and I need to keep doing so until I return an empty result set. My current scenario is running the query and seeing whether there are values in sales5, sales6, sales7, and so on.
Whenever you have to self-join an unknown number of times, you should be thinking CONNECT BY. Your particular need here isn't so straightforward, but CONNECT BY is still the key element of the solution.
In the SQL below, the mockup_data subfactor is just to give me some data. You'd use your actual table.
The idea is to search your data for "root" records, that is, records that are not the prev_account of any other record. You start with those and CONNECT BY to get all their previous accounts, however many there are. Then you PIVOT to get them all into columns.
One caveat: an Oracle SQL statement cannot have an arbitrary (data-driven) number of columns. The number must be known when the SQL is parsed. Therefore, in your PIVOT clause, you need to specify the maximum number of "levels" you'll support, so that Oracle knows how many columns the result set could have.
Here's the SQL.
WITH
mockup_data as (
SELECT
'a' customer_Number, 1 acct, 100 start_balance, to_date('01-01-15','MM-DD-YY') open_date, 'b-1' prev_account from dual union all
SELECT 'b' ,1, 80, to_date('03-04-14','MM-DD-YY'), null from dual union all
SELECT 'c' ,2, 200, to_date('04-11-14','MM-DD-YY'), 'c-1' from dual union all
SELECT 'c' ,1, 150, to_date('06-12-15','MM-DD-YY'), null from dual union all
SELECT 'd' ,1, 600, to_date('08-16-15','MM-DD-YY'), null from dual union all
SELECT 'e' ,3, 400, to_date('12-19-15','MM-DD-YY'), 'e-2' from dual union all
SELECT 'e' ,2, 150, to_date('10-21-14','MM-DD-YY'), 'e-1' from dual union all
SELECT 'e' ,1, 100, to_date('01-18-13','MM-DD-YY'), null from dual ),
data_with_roots AS
(SELECT d.*,
CASE
WHEN (SELECT COUNT (*)
FROM mockup_data d2
WHERE d2.prev_account = d.customer_number || '-' || d.acct) = 0 THEN
'Y'
ELSE
'N'
END
is_root
FROM mockup_data d),
hierarchy AS
(SELECT CONNECT_BY_ROOT (customer_number) customer_number,
CONNECT_BY_ROOT (acct) acct,
CONNECT_BY_ROOT (start_balance) start_balance,
CONNECT_BY_ROOT (open_date) open_date,
start_balance prev_start_balance,
open_date prev_open_date,
LEVEL - 1 lvl
FROM data_with_roots d
CONNECT BY customer_number || '-' || acct = PRIOR prev_account
START WITH is_root = 'Y'),
previous_only AS
(SELECT *
FROM hierarchy
WHERE lvl >= 1)
SELECT *
FROM previous_only PIVOT (MAX (prev_start_balance) AS prev_start, MAX (prev_open_date) AS prev_open
FOR lvl
IN (1 AS "01", 2 AS "02", 3 AS "03", 4 AS "04", 5 AS "05" -- etc... as many levels as you need to support
));
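The root-finding and chain-walking steps can also be sketched outside SQL. The Python below is only an illustration of the logic, not a replacement for the query: the dictionary layout and the name previous_chain are assumptions, the rows are the sample data from the question, and the sketch assumes the prev_account links contain no cycles.

```python
# "customer-acct" -> (start_balance, open_date, prev_account), from the sample data
accounts = {
    "a-1": (100, "01-01-15", "b-1"),
    "b-1": (80, "03-04-14", None),
    "c-2": (200, "04-11-14", "c-1"),
    "c-1": (150, "06-12-15", None),
    "d-1": (600, "08-16-15", None),
    "e-3": (400, "12-19-15", "e-2"),
    "e-2": (150, "10-21-14", "e-1"),
    "e-1": (100, "01-18-13", None),
}

def previous_chain(key):
    """Follow prev_account links from key and collect (balance, open_date) per level."""
    chain = []
    prev = accounts[key][2]
    while prev is not None:
        balance, opened, next_prev = accounts[prev]
        chain.append((balance, opened))
        prev = next_prev
    return chain

# Roots are accounts that no other account names as its prev_account.
referenced = {prev for _, _, prev in accounts.values() if prev is not None}
roots = [key for key in accounts if key not in referenced]

# Keep only roots that have at least one previous account (lvl >= 1 in the SQL).
result = {root: previous_chain(root) for root in roots if previous_chain(root)}
for root, chain in result.items():
    print(root, chain)
```

Unlike the PIVOT, the list per root can have any length, which is exactly the part that SQL cannot express without a fixed maximum number of levels.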