How to count or sum distinct values when there is a risk of intersection? - oracle

Imagine I have a table with people and their features:
group Name red_hair tall blue_eyes programmer
1 Mark 1 1 0 1
1 Sean 1 0 1 0
1 Lucas 1 1 1 1
2 Linda 0 1 1 1
I would like to count how many people of specific sets of features are in every group. In other words, I would like to make some bins without counting a person multiple times.
There are 2^4 (16) possible combinations of those features, but I don't need that many.
For example, if a person has red hair, I don't care whether he or she has blue eyes or is a programmer. This person goes to the red hair bin of this group.
If a person is a programmer, I don't care whether he or she is tall, but I don't want to count people who are already in the red hair bin, because I have already counted them.
So I have a priority:
Red-haired people are counted first
Programmers second
People with blue eyes third
Expected result of this dataset:
group red_hair_persons programmers blue_eyes_persons
1 3 0 0
2 0 1 0
when I do this:
select group, count(case when red_hair = 1 then name end) as red_hair,
count(case when programmer = 1 and red_hair = 0 then name end) as programmers
from table
group by group
I fear that there would be some intersections. Or the logic with CASES would be so complex I could drown in it.
Am I right?
If so how could I avoid them? Maybe I am doing everything wrong and there is a better way to do what I want to. I have an enormous table with many features in it and I don't want to screw up.

Here's how I understood it:
with test (cgroup, name, red_hair, tall, blue_eyes, programmer) as
  (select 1, 'mark' , 1, 1, 0, 1 from dual union all
   select 1, 'sean' , 1, 0, 1, 0 from dual union all
   select 1, 'lucas', 1, 1, 1, 1 from dual union all
   select 2, 'linda', 0, 1, 1, 1 from dual
  ),
priority as
  (select t.*,
          case when red_hair = 1 then 'A'
               when programmer = 1 then 'B'
               when blue_eyes = 1 then 'C'
               else 'D'
          end priority
   from test t
  )
select cgroup,
       sum(case when priority = 'A' then 1 else 0 end) red_hair,
       sum(case when priority = 'B' then 1 else 0 end) programmer,
       sum(case when priority = 'C' then 1 else 0 end) blue_eyes,
       sum(case when priority = 'D' then 1 else 0 end) other
from priority
group by cgroup;
CGROUP RED_HAIR PROGRAMMER BLUE_EYES OTHER
---------- ---------- ---------- ---------- ----------
1 3 0 0 0
2 0 1 0 0
The priority CTE assigns every person to exactly one priority bucket, taking the first feature that matches in the order red hair, programmer, blue eyes.
The final select then counts the people in each bucket per group (using SUM + CASE).
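To see the later branches of the CASE actually being used, here is a minimal sketch of the same idea run from Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in for Oracle, a table name `people` that I made up, and one extra hypothetical row ('nina') that is not in the question's data.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the CASE logic itself is standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (cgroup INT, name TEXT, red_hair INT, "
    "tall INT, blue_eyes INT, programmer INT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "mark", 1, 1, 0, 1),
        (1, "sean", 1, 0, 1, 0),
        (1, "lucas", 1, 1, 1, 1),
        (2, "linda", 0, 1, 1, 1),
        (2, "nina", 0, 0, 1, 0),  # hypothetical row: blue eyes only
    ],
)

# The first matching WHEN wins, so each person gets exactly one bin.
rows = conn.execute("""
    WITH binned AS (
        SELECT cgroup,
               CASE WHEN red_hair = 1   THEN 'red_hair'
                    WHEN programmer = 1 THEN 'programmer'
                    WHEN blue_eyes = 1  THEN 'blue_eyes'
                    ELSE 'other'
               END AS bin
        FROM people
    )
    SELECT cgroup,
           SUM(CASE WHEN bin = 'red_hair'   THEN 1 ELSE 0 END),
           SUM(CASE WHEN bin = 'programmer' THEN 1 ELSE 0 END),
           SUM(CASE WHEN bin = 'blue_eyes'  THEN 1 ELSE 0 END),
           SUM(CASE WHEN bin = 'other'      THEN 1 ELSE 0 END)
    FROM binned
    GROUP BY cgroup
    ORDER BY cgroup
""").fetchall()

for row in rows:
    print(row)
```

Because the bin is decided once per row, adding another feature to the priority list only means adding one more WHEN branch in the right position.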

With a little bit of simple math involved in the conditional aggregation:
select "group",
sum("red_hair") red_hair_persons,
sum((1 - "red_hair") * "programmer") programmers,
sum((1 - "red_hair") * (1 - "programmer") * "blue_eyes") blue_eyes_persons
from tablename
group by "group"
See the demo.
Results:
> group | RED_HAIR_PERSONS | PROGRAMMERS | BLUE_EYES_PERSONS
> ----: | ---------------: | ----------: | ----------------:
> 1 | 3 | 0 | 0
> 2 | 0 | 1 | 0
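A quick way to convince yourself that the multiplication trick never double counts is to enumerate every combination of flags. This is only a sketch in Python, assuming each flag is strictly 0 or 1 (NULLs or other values would break the arithmetic); the helper name `bins` is mine.

```python
from itertools import product


def bins(red_hair, programmer, blue_eyes):
    """Return the three 0/1 terms that the arithmetic query sums up."""
    return (
        red_hair,
        (1 - red_hair) * programmer,
        (1 - red_hair) * (1 - programmer) * blue_eyes,
    )


# Evaluate the terms for all 2^3 combinations of the three flags.
all_terms = [bins(*flags) for flags in product((0, 1), repeat=3)]

# A person contributes to at most one bin, whatever the flags are.
max_bins_per_person = max(sum(terms) for terms in all_terms)
print(max_bins_per_person)
```

Each factor of the form (1 - flag) switches the term off as soon as a higher-priority flag is set, which is the same thing the ordered CASE does.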

Related

Sum all columns in a row

How can I get a total results of all my rows? (for ORACLE)
SELECT
NAME,
SUM(CASE WHEN ASSIST_1 = 'YES' THEN 1 END) WEEK1,
SUM(CASE WHEN ASSIST_2 = 'YES' THEN 1 END) WEEK2,
SUM(CASE WHEN ASSIST_3 = 'YES' THEN 1 END) WEEK3
FROM TABLE_NAME
GROUP BY NAME;
I get these results:
Name week1 week2 week3
Anne 1 2 3
Bob 3 1 0
Charlie 4 5 1
I want this result:
Anne 1 2 3
Bob 3 1 0
Charlie 4 5 1
Total 8 8 4
How can I get a total results of all my rows?
By using the rollup() extension of the group by clause. Something like this (just an example):
-- sample of data from your question
with t1(uname, c1, c2, c3) as(
select 'Anne' , 1, 2, 3 from dual union all
select 'Bob' , 3, 1, 0 from dual union all
select 'Charlie', 4, 5, 1 from dual
)
-- actual query
select case grouping(uname)
when 0 then uname
else 'Total' end
as uname1
, sum(c1) as c1
, sum(c2) as c2
, sum(c3) as c3
from t1
group by rollup(uname)
order by grouping(uname)
Result:
UNAME1 C1 C2 C3
------- ---------- ---------- ----------
Anne 1 2 3
Bob 3 1 0
Charlie 4 5 1
Total 8 8 4
4 rows selected.
Use UNION ALL with SUM values
WITH t1(name, week1, week2, week3) AS
( SELECT 'Anne', 1, 2, 3 FROM dual
UNION ALL
SELECT 'Bob', 3, 1, 0 FROM dual
UNION ALL
SELECT 'Charlie', 4, 5, 1 FROM dual
),
s AS
(SELECT 'Total' name,
SUM(week1) week1,
SUM(week2) week2,
SUM(week3) week3
FROM t1
)
SELECT * FROM t1
UNION ALL
SELECT * FROM s;
Result:
NAME WEEK1 WEEK2 WEEK3
Anne 1 2 3
Bob 3 1 0
Charlie 4 5 1
Total 8 8 4
Based on Nicholas's answer:
SELECT
CASE GROUPING(NAME) WHEN 0 THEN NAME ELSE 'Total' END AS NAME,
SUM(WEEK1) AS WEEK1,
SUM(WEEK2) AS WEEK2,
SUM(WEEK3) AS WEEK3
FROM (
SELECT
NAME,
SUM(CASE WHEN ASSIST_1 = 'YES' THEN 1 END) WEEK1,
SUM(CASE WHEN ASSIST_2 = 'YES' THEN 1 END) WEEK2,
SUM(CASE WHEN ASSIST_3 = 'YES' THEN 1 END) WEEK3
FROM TABLE_NAME
GROUP BY NAME
)
GROUP BY ROLLUP(NAME)
ORDER BY GROUPING(NAME);
This gives the result:
Anne 1 2 3
Bob 3 1 0
Charlie 4 5 1
Total 8 8 4

BI Answers COUNT the number in column?

Here is what the data looks like.
Name P_ID NUM
A P1 3
A P2 1
B P3 1
B P4 1
C P5 2
D P7 1
In BI Answers I want the result to look like this:
Name NUM_OF_1 NUM_OF_2 NUM_OF_3 SUM
A 1 0 1 2
B 2 0 0 2
C 0 1 0 1
D 1 0 0 1
The column NUM_OF_N is the number of occurrences of the number N within a 'name' group.
If you are looking for a SQL query then you can try the following pivot:
SELECT Name,
SUM(CASE WHEN NUM = 1 THEN 1 ELSE 0 END) AS NUM_OF_1,
SUM(CASE WHEN NUM = 2 THEN 1 ELSE 0 END) AS NUM_OF_2,
SUM(CASE WHEN NUM = 3 THEN 1 ELSE 0 END) AS NUM_OF_3,
COUNT(*) AS "SUM"
FROM yourTable
GROUP BY Name
Tim has it nailed in terms of SQL. In terms of pure OBI development, though, you should put that logic into logical (measure) columns in your RPD, so that the BI Server treats them as measures and you can use them with all the usual functionality such as drill and aggregation.

Oracle Connect By seems to produce too many rows

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0
I expect I'm just missing something, but if I run this query without the "connect by", I get 2 rows. When I add "connect by level <= 4", I would expect to get each of those 2 rows 4 times. The actual result is different.
Can anyone help me understand what's happening here? I'm not looking for a solution that only repeats each row 4 times - I've already got that. I'm just looking to understand what's happening and why.
with alpha as (
select 1 as id
from dual
),
beta as (
select 1 as alpha_id,
1 as beta_no
from dual
union all
select 1 as alpha_id,
2 as beta_no
from dual
)
select a.id,
b.beta_no,
level as the_level
from alpha a
inner join beta b
on b.alpha_id = a.id
connect by level <= 4
order by a.id,
b.beta_no,
level
;
ID BETA_NO THE_LEVEL
1 1 1
1 1 2
1 1 2
1 1 3
1 1 3
1 1 3
1 1 3
1 1 4
1 1 4
1 1 4
1 1 4
1 1 4
1 1 4
1 1 4
1 1 4
1 2 1
1 2 2
1 2 2
1 2 3
1 2 3
1 2 3
1 2 3
1 2 4
1 2 4
1 2 4
1 2 4
1 2 4
1 2 4
1 2 4
1 2 4
30 rows selected
Many thanks to mathguy. The second link he provided in the answer below had exactly what I was looking for. Specifically:
with t as (select 1 as id from dual union all
           select 2 from dual)
--
select id, level
     , prior id
     , sys_connect_by_path(id, '=>') as cpath
from t
connect by level <= 3;
ID LEVEL PRIORID CPATH
---------- ---------- ---------- --------------------------------------------------
1 1 =>1
1 2 1 =>1=>1
1 3 1 =>1=>1=>1
2 3 1 =>1=>1=>2
2 2 1 =>1=>2
1 3 2 =>1=>2=>1
2 3 2 =>1=>2=>2
2 1 =>2
1 2 2 =>2=>1
1 3 1 =>2=>1=>1
2 3 1 =>2=>1=>2
2 2 2 =>2=>2
1 3 2 =>2=>2=>1
2 3 2 =>2=>2=>2
14 rows selected.
It's clear to me from that example, but I'd be hard-pressed to succinctly put it into words.
With no condition other than "level <= 4", every row from the original table, view etc. (from the join, in this case) will produce two rows at level 2, then four more rows at level 3, and 8 more at level 4. "Connect by" is essentially a succession of joins, and you are doing cross joins if you have no condition with the PRIOR operator.
You probably want to add "and prior a.id = a.id". This will lead to Oracle complaining about cycles (because Oracle decides a cycle is reached when it sees the same values in the columns subject to PRIOR). That, in turn, is solved by adding a third condition, usually "and prior sys_guid() is not null".
(Edited; the original answer made reference to NOCYCLE, which is not needed when using the "prior sys_guid() is not null" approach.)
This has been discussed recently on OTN: https://community.oracle.com/thread/3999985
Same question discussed here: https://community.oracle.com/thread/2526535
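The "succession of cross joins" description can be checked by counting rows. The sketch below is not Oracle code: it assumes SQLite (through Python's sqlite3 module) and emulates CONNECT BY LEVEL <= 4 without any PRIOR condition as a recursive CTE in which every row at one level is joined to every source row at the next level. The names `src` and `walk` are made up for the illustration.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")

# src plays the role of the two joined rows (beta_no 1 and 2) from the question.
rows = conn.execute("""
    WITH RECURSIVE
    src(id, beta_no) AS (VALUES (1, 1), (1, 2)),
    walk(id, beta_no, lvl) AS (
        -- With no START WITH, every source row is a root at level 1.
        SELECT id, beta_no, 1 FROM src
        UNION ALL
        -- With no PRIOR condition, every source row is a child of every row.
        SELECT s.id, s.beta_no, w.lvl + 1
        FROM walk w CROSS JOIN src s
        WHERE w.lvl < 4
    )
    SELECT id, beta_no, lvl FROM walk
""").fetchall()

rows_per_level = Counter(lvl for _, _, lvl in rows)
print(len(rows), sorted(rows_per_level.items()))
```

The level totals double each time (2, 4, 8, 16), which adds up to the 30 rows shown in the question, with each beta_no appearing 1, 2, 4 and 8 times.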
To illustrate Mathguy's answer, you are missing some predicates out of your CONNECT BY clause:
with alpha as (
select 1 as id
from dual
),
beta as (
select 1 as alpha_id,
1 as beta_no
from dual
union all
select 1 as alpha_id,
2 as beta_no
from dual
)
select a.id,
b.beta_no,
level as the_level
from alpha a
inner join beta b
on b.alpha_id = a.id
connect by level <= 4
AND PRIOR a.id = a.id
AND PRIOR b.beta_no = b.beta_no
AND PRIOR sys_guid() IS NOT NULL
order by a.id,
b.beta_no,
LEVEL;
ID BETA_NO THE_LEVEL
---------- ---------- ----------
1 1 1
1 1 2
1 1 3
1 1 4
1 2 1
1 2 2
1 2 3
1 2 4
An alternative would be to use the recursive with clause:
with alpha as (
select 1 as id
from dual
),
beta as (
select 1 as alpha_id,
1 as beta_no
from dual
union all
select 1 as alpha_id,
2 as beta_no
from dual
),
multiply (id, beta_no, rn) AS (SELECT a.id,
b.beta_no,
1 rn
FROM alpha a
INNER JOIN beta b
ON a.id = b.alpha_id
UNION ALL
SELECT ID,
beta_no,
rn + 1
FROM multiply
WHERE rn + 1 <= 4)
SELECT ID,
beta_no,
rn AS the_level
FROM multiply
order by id,
beta_no,
rn;
ID BETA_NO THE_LEVEL
---------- ---------- ----------
1 1 1
1 1 2
1 1 3
1 1 4
1 2 1
1 2 2
1 2 3
1 2 4

add a new column for unique ID in hive table

I have a table in Hive with two columns, session_id and duration_time, like this:
session_id duration
1 14
1 10
1 20
1 10
1 12
1 16
1 8
2 9
2 6
2 30
2 22
I want to add a new column with a unique ID that increments whenever the session_id changes or the duration_time is greater than 15.
I want the output to look like this:
session_id duration unique_id
1 14 1
1 10 1
1 20 2
1 10 2
1 12 2
1 16 3
1 8 3
2 9 4
2 6 4
2 30 5
2 22 6
Any ideas how to do that in HiveQL?
Thanks!
SQL tables represent unordered sets. You need a column specifying the ordering of the values, because you seem to care about the ordering. This could be an id column or a created-at column, for instance.
You can do this using a cumulative sum:
select t.*,
sum(case when duration > 15 or seqnum = 1 then 1 else 0 end) over
(order by ??) as unique_id
from (select t.*,
row_number() over (partition by session_id order by ??) as seqnum
from t
) t;
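Here is a minimal sketch of that cumulative sum on the question's data, so the numbers can be checked. It assumes SQLite 3.25 or later (for window functions) via Python's sqlite3 module rather than Hive, and it assumes a hypothetical ordering column `seq` in place of the `??` placeholders, since the question's table does not show one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "seq" is a hypothetical ordering column; the question's table has none.
conn.execute("CREATE TABLE t (seq INTEGER, session_id INTEGER, duration INTEGER)")
data = [(1, 14), (1, 10), (1, 20), (1, 10), (1, 12), (1, 16), (1, 8),
        (2, 9), (2, 6), (2, 30), (2, 22)]
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i, s, d) for i, (s, d) in enumerate(data, start=1)],
)

# A row starts a new id when it is the first of its session or is longer than 15;
# the running sum of those flags is the unique_id.
rows = conn.execute("""
    SELECT session_id, duration,
           SUM(CASE WHEN duration > 15 OR seqnum = 1 THEN 1 ELSE 0 END)
               OVER (ORDER BY seq) AS unique_id
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY seq) AS seqnum
          FROM t) AS numbered
    ORDER BY seq
""").fetchall()

unique_ids = [r[2] for r in rows]
print(unique_ids)
```

Without a real ordering column the result is not well defined, because the running sum depends entirely on the order in which rows are visited.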

Get Hierarchy level and all node references on Oracle

I have been reading about CONNECT BY and CTEs in Oracle, but I can't come up with a solution. I don't know how to use CONNECT BY properly for my needs, and recursive CTEs in Oracle are limited to 2 branches (one UNION ALL), while I'm using 3 branches.
In SQL Server it was fairly easy after I found this article. I only added another UNION ALL to return all node references.
What I'm trying to do is take a hierarchy like this:
Code|Father
1 |NULL
2 |1
3 |2
And this should return me:
Node|Father|Level|JumpsToFather
1 |1 |1 |0
2 |1 |2 |1
2 |2 |2 |0
3 |1 |3 |2
3 |2 |3 |1
3 |3 |3 |0
Note: Yes I need to return a reference to themselves counting as zero jumps on the hierarchy
Here is a solution using a recursive CTE. I used lvl as column header since level is a reserved word in Oracle. You will see other differences in terminology as well. I use "parent" for the immediately higher level and "ancestor" for >= 0 steps (to accommodate your requirement of showing a node as its own ancestor). I used an ORDER BY clause to cause the output to match yours; you may or may not need the rows ordered.
Your question stimulated me to read again, in more detail, about hierarchical queries, to see if this can be done with them instead of recursive CTEs. I already know it can be done with SYS_CONNECT_BY_PATH, but using substr on that just to retrieve the top level of a hierarchical path is not satisfying at all; there must be a better way. (If that were the only way to do it with hierarchical queries, I would definitely go the recursive CTE route if it is available.) I will add the hierarchical query solution here if I can find a good one.
with h ( node, parent ) as (
select 1 , null from dual union all
select 2 , 1 from dual union all
select 3 , 2 from dual
),
r ( node , ancestor, steps ) as (
select node , node , 0
from h
union all
select r.node, h.parent, steps + 1
from h join r
on h.node = r.ancestor
)
select node, ancestor,
1+ (max(steps) over (partition by node)) as lvl, steps
from r
where ancestor is not null
order by lvl, steps desc;
NODE ANCESTOR LVL STEPS
---------- ---------- ---------- ----------
1 1 1 0
2 1 2 1
2 2 2 0
3 1 3 2
3 2 3 1
3 3 3 0
Added: Hierarchical query solution
OK - found it. Please test both solutions to see which performs better; from tests on a different setup, recursive CTE was quite a bit faster than hierarchical query, but that may depend on the specific situation. ALSO: recursive CTE works only in Oracle 11.2 and above; the hierarchical solution works with older versions.
I added a bit more test data to match Anatoliy's.
with h ( node, parent ) as (
select 1 , null from dual union all
select 2 , 1 from dual union all
select 3 , 2 from dual union all
select 4 , 2 from dual union all
select 5 , 4 from dual
)
select node,
connect_by_root node as ancestor,
max(level) over (partition by node) as lvl,
level - 1 as steps
from h
connect by parent = prior node
order by node, ancestor;
NODE ANCESTOR LVL STEPS
---------- ---------- ---------- ----------
1 1 1 0
2 1 2 1
2 2 2 0
3 1 3 2
3 2 3 1
3 3 3 0
4 1 3 2
4 2 3 1
4 4 3 0
5 1 4 3
5 2 4 2
5 4 4 1
5 5 4 0
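As a cross-check, the recursive CTE from the first half of this answer can be run against the same five-node data to confirm that it returns the same 13 rows. This is a sketch only: it assumes SQLite 3.25 or later through Python's sqlite3 module instead of Oracle, so it needs WITH RECURSIVE and uses VALUES in place of the selects from dual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    WITH RECURSIVE
    h(node, parent) AS (
        VALUES (1, NULL), (2, 1), (3, 2), (4, 2), (5, 4)
    ),
    r(node, ancestor, steps) AS (
        -- Anchor: every node is its own ancestor at 0 steps.
        SELECT node, node, 0 FROM h
        UNION ALL
        -- Step: move from the current ancestor to that ancestor's parent.
        SELECT r.node, h.parent, r.steps + 1
        FROM h JOIN r ON h.node = r.ancestor
    )
    SELECT node, ancestor,
           1 + MAX(steps) OVER (PARTITION BY node) AS lvl,
           steps
    FROM r
    WHERE ancestor IS NOT NULL
    ORDER BY node, ancestor
""").fetchall()

for row in rows:
    print(row)
```

The recursion stops on its own once the ancestor becomes NULL at the root, because a NULL never matches the join condition.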
Thanks for the question; I spent an hour writing this:
with t as ( select code, parent, level l
from (select 1 as code, NULL as parent from dual union
select 2 , 1 from dual union
select 3 , 2 from dual
-- add some more data for demo case
union
select 4 , 2 from dual union
select 5 , 4 from dual
)
start with parent is null
connect by prior code = parent )
select code, (select code
from t t1
where l = ll
and rownum = 1
start with t1.code = main_t.code
connect by prior t1.parent = t1.code
) parent,
l code_level,
jumps
from (
select distinct t.*, l-level jumps, level ll
from t
connect by level <= l
) main_t
order by code, parent
As you can see, I added some more data to test my SQL. Here is the output:
CODE PARENT CODE_LEVEL JUMPS
---------- ---------- ---------- ----------
1 1 1 0
2 1 2 1
2 2 2 0
3 1 3 2
3 2 3 1
3 3 3 0
4 1 3 2
4 2 3 1
4 4 3 0
5 1 4 3
5 2 4 2
5 4 4 1
5 5 4 0
13 rows selected

Resources