SUM with distinct multiple lines - oracle

DB: ORACLE
Hi guys. I am constructing a query and I have the following situation:
My table
---------------------------------------
Risk | Risk Factor | Control
---------------------------------------
RK 1 | RF 1 | Control 1
RK 1 | RF 1 | Control 2
RK 2 | RF 3 | Control 1
---------------------------------------
So I'd like to count how many risk factors I have per risk, and how many controls I have per risk.
Result
--------------------------------------
Risk | SUM RF | SUM Control
--------------------------------------
RK 1 | 1 | 2
RK 2 | 1 | 1
--------------------------------------
Does anyone know how to solve this?
Kind Regards
I tried a simple SUM. I created a view that holds the relation between Risk Factor and Control, and then joined it to the risk table, for example:
SELECT RK.NAME,
SUM(CASE WHEN RFC.RISKFACTOR IS NOT NULL THEN 1 ELSE 0 END) SUM_RK,
SUM(CASE WHEN RFC.CONTROL IS NOT NULL THEN 1 ELSE 0 END) SUM_CONTROL
FROM T_RISK RK
JOIN V_RF_CONTROL RFC
ON RFC.RELATIONID = RK.RISKID
GROUP BY RK.NAME

You don't need to sum here - you just need to count the distinct values:
SELECT RK.NAME,
COUNT(DISTINCT RFC.RISKFACTOR) SUM_RK,
COUNT(DISTINCT RFC.CONTROL) SUM_CONTROL
FROM T_RISK RK
JOIN V_RF_CONTROL RFC ON RFC.RELATIONID = RK.RISKID
GROUP BY RK.NAME
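To see why COUNT(DISTINCT ...) gives the expected result while the SUM(CASE ...) attempt does not, here is a minimal sketch using SQLite through Python's sqlite3 module rather than Oracle. Assumptions: the view V_RF_CONTROL is modelled as a plain table, and the column layout and sample rows are reconstructed from the question's example table.

```python
import sqlite3

def risk_counts(conn):
    """Distinct risk factors and distinct controls per risk (the answer's approach)."""
    query = """
        SELECT rk.name,
               COUNT(DISTINCT rfc.riskfactor) AS sum_rk,
               COUNT(DISTINCT rfc.control)    AS sum_control
        FROM t_risk rk
        JOIN v_rf_control rfc ON rfc.relationid = rk.riskid
        GROUP BY rk.name
        ORDER BY rk.name
    """
    return conn.execute(query).fetchall()

def naive_sums(conn):
    """The SUM(CASE ...) attempt: it counts joined rows, not distinct values."""
    query = """
        SELECT rk.name,
               SUM(CASE WHEN rfc.riskfactor IS NOT NULL THEN 1 ELSE 0 END),
               SUM(CASE WHEN rfc.control IS NOT NULL THEN 1 ELSE 0 END)
        FROM t_risk rk
        JOIN v_rf_control rfc ON rfc.relationid = rk.riskid
        GROUP BY rk.name
        ORDER BY rk.name
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_risk (riskid INTEGER PRIMARY KEY, name TEXT);
    -- Stand-in for the view V_RF_CONTROL (hypothetical layout)
    CREATE TABLE v_rf_control (relationid INTEGER, riskfactor TEXT, control TEXT);
    INSERT INTO t_risk VALUES (1, 'RK 1'), (2, 'RK 2');
    INSERT INTO v_rf_control VALUES
        (1, 'RF 1', 'Control 1'),
        (1, 'RF 1', 'Control 2'),
        (2, 'RF 3', 'Control 1');
""")

print(risk_counts(conn))  # [('RK 1', 1, 2), ('RK 2', 1, 1)]
print(naive_sums(conn))   # [('RK 1', 2, 2), ('RK 2', 1, 1)]
```

RF 1 appears on two joined rows for RK 1, so the SUM version counts it twice; DISTINCT collapses the duplicates.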


Complex Networks in Hive - Optimization Code

I have a problem optimizing my Hive code.
I have a huge table as follows:
Customer_id Product_id Date Value
1 1 02/28 100.0
1 2 02/02 120.0
1 3 02/10 144.0
2 2 02/15 120.0
2 3 02/28 144.0
... ... ... ...
I want to create a complex network where I link the products through the buyers. The graph does not have to be directed and I have to count the number of links between them.
In the end I need this:
Product_x Product_y amount
1 2 1
1 3 1
2 3 2
Can anyone help me with this?
I need an optimized way to do this; a self-join of the table is not the solution. I really need an efficient approach here =/
-- purchases is a placeholder for your table name (TABLE itself is a reserved word in Hive)
CREATE TABLE X AS
SELECT
a.product_id as product_x,
b.product_id as product_y,
count(*) as amount
FROM purchases as a
JOIN purchases as b
ON a.customer_id = b.customer_id
WHERE a.product_id < b.product_id
GROUP BY a.product_id, b.product_id;
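To check the query's logic against the sample data, here is a small sketch that runs the same self-join in SQLite via Python's sqlite3 module instead of Hive. The table name purchases and the column name dt are hypothetical stand-ins. It only demonstrates correctness on five rows; it says nothing about how the join performs on a huge Hive table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, product_id INTEGER, dt TEXT, value REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, 1, "02/28", 100.0),
        (1, 2, "02/02", 120.0),
        (1, 3, "02/10", 144.0),
        (2, 2, "02/15", 120.0),
        (2, 3, "02/28", 144.0),
    ],
)

# Pair every two products bought by the same customer. The condition
# a.product_id < b.product_id keeps each undirected edge once and drops self-pairs.
edges = conn.execute("""
    SELECT a.product_id AS product_x,
           b.product_id AS product_y,
           COUNT(*)     AS amount
    FROM purchases a
    JOIN purchases b ON a.customer_id = b.customer_id
    WHERE a.product_id < b.product_id
    GROUP BY a.product_id, b.product_id
    ORDER BY product_x, product_y
""").fetchall()

print(edges)  # [(1, 2, 1), (1, 3, 1), (2, 3, 2)]
```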

SQL sample groups

I have a sqlite database that I can read as:
In [42]: df = pd.read_sql("SELECT * FROM all_vs_all", engine)
In [43]:
In [43]: df.head()
Out[43]:
user_data user_model \
0 037d05edbbf8ebaf0eca#172.16.199.165 037d05edbbf8ebaf0eca#172.16.199.165
1 037d05edbbf8ebaf0eca#172.16.199.165 060210bf327a3e3b4621#172.16.199.33
2 037d05edbbf8ebaf0eca#172.16.199.165 1141259bd36ba65bef02#172.21.44.180
3 037d05edbbf8ebaf0eca#172.16.199.165 209627747e2af1f6389e#172.16.199.181
4 037d05edbbf8ebaf0eca#172.16.199.165 303a1aff4ab6e3be82ab#172.21.112.182
score Class time_secs model_name bin_id
0 0.283141 0 1514764800 Flow 0
1 0.999300 1 1514764800 Flow 0
2 1.000000 1 1514764800 Flow 0
3 0.206360 1 1514764800 Flow 0
4 1.000000 1 1514764800 Flow 0
As the table is too big, rather than reading the full table I select a random subset of rows.
This can be done very quickly with:
random_query = "SELECT * FROM all_vs_all WHERE abs(CAST(random() AS REAL))/9223372036854775808 < %f AND %s" % (ratio, time_condition)
df = pd.read_sql(random_query, engine)
The problem is that for each triplet [user_data, user_model, time_secs] I want to get all the rows containing that triplet. Each triplet appears 1 or 2 times.
One possible way is to first sample a random set of triplets and then get all the rows that have one of the selected triplets, but this seems to be too slow.
Is there an efficient way to do it?
EDIT: If I could load all the data in pandas I would have done something like:
import numpy as np
import pandas as pd

selected_groups = []
for _, group in df.groupby(['user_data', 'user_model', 'time_secs']):
    # keep each whole group with probability `ratio`
    if np.random.uniform(0, 1) < ratio:
        selected_groups.append(group)
res = pd.concat(selected_groups)
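One possible approach, not taken from the original thread, is to sample on a deterministic hash of the triplet instead of on random(): every row of a triplet then gets the same pseudo-random number, so the whole group is kept or dropped together in a single pass, without a join. SQLite has no built-in hash function, so this sketch registers a hypothetical Python function, triplet_fraction, through sqlite3's create_function. The CRC32-based mapping is an assumption, and on a very large table the cost of calling a Python function per row would need to be measured.

```python
import sqlite3
import zlib

def triplet_fraction(user_data, user_model, time_secs):
    """Map a triplet to a stable number in [0, 1); identical triplets map identically."""
    key = f"{user_data}|{user_model}|{time_secs}".encode("utf-8")
    return zlib.crc32(key) / 2**32

def sample_groups(conn, ratio):
    """Return every row of roughly `ratio` of the (user_data, user_model, time_secs) groups."""
    conn.create_function("triplet_fraction", 3, triplet_fraction)
    query = """
        SELECT * FROM all_vs_all
        WHERE triplet_fraction(user_data, user_model, time_secs) < ?
    """
    return conn.execute(query, (ratio,)).fetchall()

# Small hypothetical table: some triplets appear twice, as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_vs_all (user_data TEXT, user_model TEXT, time_secs INTEGER, score REAL)")
rows = []
for i in range(50):
    triplet = (f"u{i}", f"m{i % 7}", 1514764800 + i)
    rows.append(triplet + (0.1,))
    if i % 2 == 0:
        rows.append(triplet + (0.9,))  # second row for the same triplet
conn.executemany("INSERT INTO all_vs_all VALUES (?, ?, ?, ?)", rows)

sample = sample_groups(conn, 0.5)
print(len(sample), "of", len(rows), "rows sampled")
```

Because the hash is deterministic, the same ratio always returns the same sample; salting the key string would be one way to draw a different sample.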
A few sample joins and SQL queries:
Currently admitted:
Select p.patient_no, p.pat_name,p.date_admitted,r.room_extension,p.date_discharged FROM Patient p JOIN room r ON p.room_location = r.room_location where p.date_discharged IS NULL ORDER BY p.patient_no,p.pat_name,p.date_admitted,r.room_extension,p.date_discharged;
Vacant rooms:
SELECT r.room_location, r.room_accomadation, r.room_extension FROM room r where r.room_location NOT IN (Select p.room_location FROM patient p where p.date_discharged IS NULL AND p.room_location IS NOT NULL) ORDER BY r.room_location, r.room_accomadation, r.room_extension;
No charges yet:
SELECT p.patient_no, p.pat_name, COALESCE(b.charge, 0.00) charge FROM patient p LEFT JOIN billed b on p.patient_no = b.patient_no WHERE p.patient_no NOT IN (SELECT patient_no FROM billed) ORDER BY p.patient_no, p.pat_name, charge;
Highest and lowest salaries:
SELECT phy_id,phy_name, salary FROM physician where salary in (SELECT MAX(salary) FROM physician) UNION
SELECT phy_id,phy_name, salary FROM physician where salary in (SELECT MIN(salary) FROM physician) ORDER BY phy_id,phy_name, salary;
Items consumed by each patient:
select p.pat_name, i.discription, count(i.item_code) as item_count from patient p join billed b on p.patient_no = b.patient_no join item i on b.item_code = i.item_code group by p.patient_no, p.pat_name, i.item_code, i.discription order by ..
Patients who have not received treatment:
SELECT p.patient_no,p.pat_name FROM patient p where p.patient_no NOT IN (SELECT t.patient_no FROM treats t)
ORDER BY p.patient_no,p.pat_name;
Two highest-paid physicians:
SELECT phy_id, phy_name, date_of_joining, salary FROM physician
ORDER BY salary DESC, phy_id LIMIT 2;
Total charges over 200:
SELECT patient_no, SUM(charge) AS total_charge FROM billed GROUP BY patient_no HAVING SUM(charge) > 200 ORDER BY patient_no, total_charge;
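A quick way to sanity-check the "over 200" query is to run it on a few rows. This sketch uses SQLite via Python's sqlite3 module, and the billed rows are hypothetical, made up for illustration only. Repeating the aggregate in HAVING, rather than referring to the total_charge alias, is the portable form, since some databases do not accept a SELECT alias there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE billed (patient_no INTEGER, item_code TEXT, charge REAL)")
# Hypothetical sample rows, for illustration only
conn.executemany(
    "INSERT INTO billed VALUES (?, ?, ?)",
    [(1, "A", 150.0), (1, "B", 75.0), (2, "A", 150.0), (3, "C", 120.0), (3, "D", 90.0)],
)

# Group per patient, then keep only groups whose summed charge exceeds 200.
over_200 = conn.execute("""
    SELECT patient_no, SUM(charge) AS total_charge
    FROM billed
    GROUP BY patient_no
    HAVING SUM(charge) > 200
    ORDER BY patient_no, total_charge
""").fetchall()

print(over_200)  # [(1, 225.0), (3, 210.0)]
```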

Select different count from same table

I have the table T_LOCATION_DATA on Oracle DB as follows:
Person_ID | Location | Role
----------------------------
101 Delhi Manager
102 Mumbai Employee
103 Noida Manager
104 Mumbai Employee
105 Noida Employee
106 Delhi Manager
107 Mumbai Manager
108 Delhi Employee
109 Mumbai Employee
Another table is T_STATUS, with the following data:
Person_ID | Status
-------------------
101 Active
102 Active
103 Inactive
104 Active
105 Active
106 Inactive
107 Active
108 Active
109 Inactive
I am trying to count the active Employees and Managers, grouped by location, in a single query, so that the result looks like this:
Location | MANAGER COUNT | EMPLOYEE COUNT
Delhi 1 1
Mumbai 1 2
Noida 0 1
I am trying the following query, but without success:
select location, count (a.person_id) as MANAGER COUNT,
count (b.person_id) as EMPLOYEE COUNT
from T_LOCATION_DATA a,T_LOCATION_DATA b
where a.person_id in (select person_id from t_status where status='Active')
... and I get lost here
Can someone guide me on this please?
From your data, I would query like this:
SELECT
Location,
COUNT(CASE WHEN Role='Manager' THEN 1 END) as count_managers,
COUNT(CASE WHEN Role='Employee' THEN 1 END) as count_employees,
COUNT(*) count_everyone
FROM
t_location_data l
INNER JOIN
t_status s
ON
l.person_id = s.person_id AND
s.status = 'Active'
GROUP BY location
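The query can be run as-is against the question's sample data. This sketch uses SQLite through Python's sqlite3 module rather than Oracle, and adds an ORDER BY only to make the output order stable. Note that with this data Mumbai has two active employees (102 and 104), so its employee count is 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_location_data (person_id INTEGER, location TEXT, role TEXT);
    CREATE TABLE t_status (person_id INTEGER, status TEXT);
    INSERT INTO t_location_data VALUES
        (101, 'Delhi', 'Manager'), (102, 'Mumbai', 'Employee'),
        (103, 'Noida', 'Manager'), (104, 'Mumbai', 'Employee'),
        (105, 'Noida', 'Employee'), (106, 'Delhi', 'Manager'),
        (107, 'Mumbai', 'Manager'), (108, 'Delhi', 'Employee'),
        (109, 'Mumbai', 'Employee');
    INSERT INTO t_status VALUES
        (101, 'Active'), (102, 'Active'), (103, 'Inactive'),
        (104, 'Active'), (105, 'Active'), (106, 'Inactive'),
        (107, 'Active'), (108, 'Active'), (109, 'Inactive');
""")

# Conditional aggregation: each CASE yields 1 for a matching role and NULL otherwise,
# and COUNT only counts the non-NULL values.
result = conn.execute("""
    SELECT l.location,
           COUNT(CASE WHEN l.role = 'Manager'  THEN 1 END) AS count_managers,
           COUNT(CASE WHEN l.role = 'Employee' THEN 1 END) AS count_employees,
           COUNT(*) AS count_everyone
    FROM t_location_data l
    INNER JOIN t_status s
        ON l.person_id = s.person_id AND s.status = 'Active'
    GROUP BY l.location
    ORDER BY l.location
""").fetchall()

for row in result:
    print(row)
# ('Delhi', 1, 1, 2)
# ('Mumbai', 1, 2, 3)
# ('Noida', 0, 1, 1)
```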
Differences to your SQL:
We dump the awful old join syntax (SELECT * FROM a,b WHERE a.id=b.id); please always use a JOIN b ON a.id = b.id.
We join in the status table, but we only really want the active rows, which is why I put that condition as another clause in the ON. I could have put it in a WHERE; with an INNER JOIN it makes no difference. With an OUTER JOIN it can make a big difference: if you write a LEFT JOIN b ON a.id = b.id WHERE b.status = 'Active', the WHERE converts the LEFT JOIN back to INNER JOIN behaviour, unless you write a WHERE clause like WHERE b.status = 'Active' OR b.id IS NULL, and that's just ugly. If the comparison to a constant goes in the ON clause, you can skip the OR ... IS NULL ugliness.
We group by location, but we don't necessarily count everything. If we count the result of CASE WHEN role = 'Manager' THEN ..., the CASE WHEN produces a 1 for a manager and NULL for a non-manager (I didn't specify an ELSE; this is how CASE WHEN is designed to behave in that scenario). The value doesn't have to be 1 either; it could be 'a', Role, or anything else that is non-null. COUNT counts anything non-null as 1 and NULL as 0. The following are thus equivalent; pick whichever one makes more sense to you:
COUNT(CASE WHEN Role='Employee' THEN 1 END) as count_employees,
COUNT(CASE WHEN Role='Employee' THEN 'a' END) as count_employees,
COUNT(CASE WHEN Role='Employee' THEN role END) as count_employees,
COUNT(CASE WHEN Role='Employee' THEN role ELSE null END) as count_employees,
SUM(CASE WHEN Role='Employee' THEN 1 ELSE 0 END) as count_employees,
They all work as counts, but in the SUM case you really do have to use 1 and 0 if you want the output to be a count. Actually, the 0 is optional, as SUM ignores NULLs (but as mathguy points out below, if you leave out ELSE 0, the SUM method produces NULL rather than 0 when there are no matching items. Whether this helps or hinders you is a decision for you alone to make.)
It wasn't clear to me whether managers are also employees. To me they are; maybe not to you. I added a COUNT(*) that literally counts everyone at the location. If count_employees + count_managers != count_everyone, there is another role, neither manager nor employee, in the table. Pick your poison.
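Both behaviours described above, the ON-versus-WHERE filter on a LEFT JOIN and SUM returning NULL without ELSE 0, can be checked in a few lines. This sketch uses SQLite via Python's sqlite3 module rather than Oracle (both follow standard SQL semantics for these cases) and a cut-down copy of the question's Noida rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_location_data (person_id INTEGER, location TEXT, role TEXT);
    CREATE TABLE t_status (person_id INTEGER, status TEXT);
    INSERT INTO t_location_data VALUES
        (103, 'Noida', 'Manager'), (105, 'Noida', 'Employee');
    INSERT INTO t_status VALUES (103, 'Inactive'), (105, 'Active');
""")

# Filter in ON: the LEFT JOIN keeps every location row; a non-matching status is NULL.
in_on = conn.execute("""
    SELECT l.person_id, s.status
    FROM t_location_data l
    LEFT JOIN t_status s ON l.person_id = s.person_id AND s.status = 'Active'
    ORDER BY l.person_id
""").fetchall()

# Filter in WHERE: rows with status NULL or 'Inactive' are discarded,
# so the LEFT JOIN behaves like an INNER JOIN.
in_where = conn.execute("""
    SELECT l.person_id, s.status
    FROM t_location_data l
    LEFT JOIN t_status s ON l.person_id = s.person_id
    WHERE s.status = 'Active'
    ORDER BY l.person_id
""").fetchall()

# No active managers in Noida: COUNT gives 0, SUM without ELSE gives NULL (None).
counts = conn.execute("""
    SELECT COUNT(CASE WHEN l.role = 'Manager' THEN 1 END),
           SUM(CASE WHEN l.role = 'Manager' THEN 1 END),
           SUM(CASE WHEN l.role = 'Manager' THEN 1 ELSE 0 END)
    FROM t_location_data l
    JOIN t_status s ON l.person_id = s.person_id AND s.status = 'Active'
""").fetchone()

print(in_on)     # [(103, None), (105, 'Active')]
print(in_where)  # [(105, 'Active')]
print(counts)    # (0, None, 0)
```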
This COUNT/SUM(CASE WHEN...) pattern is really useful for turning data around - a PIVOT operation. It takes a column of data:
Manager
Employee
Manager
And turns it into two columns, for the count values:
Manager Employee
2 1
You can extend it as many times as you like. If you have 10 roles, write 10 CASE WHENs, and the result will have 10 columns, each holding a grouped count. The data is pivoted from a row-wise representation to a column-wise one.
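When the list of roles is long or changes, the CASE WHEN columns can be generated rather than typed. This is a sketch only: it runs in SQLite via Python's sqlite3 module, the helper name pivot_counts is hypothetical, and it uses the question's T_LOCATION_DATA rows without the status filter. Oracle also has a dedicated PIVOT clause, which this sketch does not use.

```python
import sqlite3

def pivot_counts(conn, roles):
    """Build one COUNT(CASE WHEN ...) column per role and group by location.

    Role values are bound as parameters, and the column aliases are positional
    (role_0, role_1, ...), so no user-supplied text is pasted into the SQL.
    """
    columns = ",\n".join(
        f"COUNT(CASE WHEN role = ? THEN 1 END) AS role_{i}" for i in range(len(roles))
    )
    query = f"SELECT location, {columns} FROM t_location_data GROUP BY location ORDER BY location"
    return conn.execute(query, roles).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_location_data (person_id INTEGER, location TEXT, role TEXT);
    INSERT INTO t_location_data VALUES
        (101, 'Delhi', 'Manager'), (102, 'Mumbai', 'Employee'),
        (103, 'Noida', 'Manager'), (104, 'Mumbai', 'Employee'),
        (105, 'Noida', 'Employee'), (106, 'Delhi', 'Manager'),
        (107, 'Mumbai', 'Manager'), (108, 'Delhi', 'Employee'),
        (109, 'Mumbai', 'Employee');
""")

result = pivot_counts(conn, ["Manager", "Employee"])
print(result)  # [('Delhi', 2, 1), ('Mumbai', 1, 3), ('Noida', 1, 1)]
```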

Oracle Query - to sum up the balances on some criteria

Can someone please help me solve this? I need to write an Oracle query to sum up the balances by REF and REFERENCE_ID.
Here are the criteria:
MultiValue can be 1, 2, 3, ... and each MultiValue can have any number of SubValues, from 1 to n.
The PROPERTY field value is only populated on the first row of each MultiValue set (e.g. MultiValue=1, SubValue=1),
and we have to apply that same PROPERTY value to the whole MultiValue set,
e.g. for MultiValue=1 PROPERTY=BALANCE, for MultiValue=2 PROPERTY=INTEREST, etc. Then we need to sum up the balances
by REF and REFERENCE_ID.
I also need to keep the BALANCE, INTEREST and PENALTY amounts separate.
Here is some sample data. Any help is appreciated. Thanks in advance.
Here is the sample output for the first two IDs:
You can fill in the missing property values with an analytic first_value() function call (or max, min, etc.):
select reference_id, multivalue, subvalue, code,
first_value(property) over (partition by reference_id, multivalue order by subvalue) as property,
ref, amount
from your_table;
REFERENCE_ID MULTIVALUE SUBVALUE CODE PROPERTY R AMOUNT
-------------- ---------- ---------- ---------- -------- - ----------
BILL121220PBD8 1 1 10001 BALANCE a 1061.08
BILL121220PBD8 1 2 10001 BALANCE b 5395.89
BILL121220PBD8 1 3 10001 BALANCE c 4043.07
BILL121220PBD8 1 4 10001 BALANCE d 4100.22
BILL121220R2HL 2 1 10001 INTEREST e 60487.88
BILL121220R2HL 2 2 10001 INTEREST e 60487.88
BILL121220R2HL 2 3 10001 INTEREST f 526631.51
...
Then you can use that as a subquery (as an inline view or CTE) to form the basis for your grouping:
select reference_id as bill_reference, property, ref as repay_ref,
sum(amount) as repay_amount
from (
select reference_id, multivalue, subvalue, code,
first_value(property) over (partition by reference_id, multivalue order by subvalue) as property,
ref, amount
from your_table
)
group by reference_id, property, ref
order by reference_id, property, ref;
BILL_REFERENCE PROPERTY R REPAY_AMOUNT
-------------- -------- - ------------
BILL121220PBD8 BALANCE a 1061.08
BILL121220PBD8 BALANCE b 5395.89
BILL121220PBD8 BALANCE c 4043.07
BILL121220PBD8 BALANCE d 4100.22
BILL121220R2HL INTEREST e 120975.76
BILL121220R2HL INTEREST f 526631.51
...
(I retyped the reference IDs from your image to make up the test data, but couldn't be bothered to retype the individual refs too. That's one of the reasons text is preferred over images in questions.)
Are you trying to aggregate the amount by REF or REFERENCE_ID? If so, you should use a GROUP BY clause.
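Here is a small runnable version of the fill-then-group approach. It uses SQLite (3.25 or later, for window functions) via Python's sqlite3 module rather than Oracle, and the rows are hypothetical, since the original sample data was only available as an image. The window is ordered by subvalue so that the SubValue = 1 row, the one carrying PROPERTY, is reliably the "first" row; without an ORDER BY, the row that FIRST_VALUE picks within a partition is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE your_table (
    reference_id TEXT, multivalue INTEGER, subvalue INTEGER,
    property TEXT, ref TEXT, amount REAL)""")
# Hypothetical rows: PROPERTY is only filled on SubValue = 1 of each MultiValue set.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("BILL_A", 1, 1, "BALANCE", "a", 100.0),
        ("BILL_A", 1, 2, None, "a", 50.0),
        ("BILL_A", 2, 1, "INTEREST", "b", 10.0),
        ("BILL_A", 2, 2, None, "b", 5.0),
    ],
)

totals = conn.execute("""
    SELECT reference_id, property, ref, SUM(amount) AS repay_amount
    FROM (
        SELECT reference_id, ref, amount,
               -- copy the SubValue = 1 PROPERTY onto every row of the MultiValue set
               FIRST_VALUE(property) OVER (
                   PARTITION BY reference_id, multivalue ORDER BY subvalue
               ) AS property
        FROM your_table
    )
    GROUP BY reference_id, property, ref
    ORDER BY reference_id, property, ref
""").fetchall()

print(totals)  # [('BILL_A', 'BALANCE', 'a', 150.0), ('BILL_A', 'INTEREST', 'b', 15.0)]
```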
For example, if you want to sum up the amount for each type of REF:
select REF, SUM(AMOUNT)
from BALANCE_TBL
group by REF

displaying the top 3 rows

In the school assignment I'm working on, I need to display the 3 criminals with the most crimes, but I'm having a few problems.
Here's the code I have so far, and its output:
`Select Last, First, Count(Crime_ID)
From Criminals Natural Join crimes
Group by Last, First, Criminal_ID
order by Count(Crime_Id) Desc`
`LAST FIRST COUNT(CRIME_ID)
--------------- ---------- ---------------
Panner Lee 2
Sums Tammy 1
Statin Penny 1
Dabber Pat 1
Mansville Nancy 1
Cat Tommy 1
Phelps Sam 1
Caulk Dave 1
Simon Tim 1
Pints Reed 1
Perry Cart 1
11 rows selected `
I've been toying around with ROWNUM, but when I include it in the SELECT it won't run because of my GROUP BY. And if I put ROWNUM in the GROUP BY, it just separates everything back out.
I just want to display the top 3 with the most crimes, which is weird because only 1 guy has more than 1 crime. Theoretically, more criminals would be added to the database, but these are the tables given in the assignment.
select *
from
( Select Last, First, Count(Crime_ID)
From Criminals Natural Join crimes
Group by Last, First, Criminal_ID
order by Count(Crime_Id) Desc )
where ROWNUM <= 3;
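The key idea in the answer is to sort inside a subquery and only then filter on row position outside it. SQLite has no ROWNUM, so this sketch (run via Python's sqlite3 module) uses ROW_NUMBER(), which Oracle also supports; on Oracle 12c and later, FETCH FIRST 3 ROWS ONLY is another option. The table layouts are simplified assumptions, the names come from the question's output, and the last/first tie-breakers are added so that the third place among the many one-crime criminals is deterministic, which the ROWNUM version does not guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE criminals (criminal_id INTEGER PRIMARY KEY, last TEXT, first TEXT);
    CREATE TABLE crimes (crime_id INTEGER PRIMARY KEY, criminal_id INTEGER);
""")
names = [("Panner", "Lee"), ("Sums", "Tammy"), ("Statin", "Penny"), ("Dabber", "Pat"),
         ("Mansville", "Nancy"), ("Cat", "Tommy"), ("Phelps", "Sam"), ("Caulk", "Dave"),
         ("Simon", "Tim"), ("Pints", "Reed"), ("Perry", "Cart")]
conn.executemany("INSERT INTO criminals (last, first) VALUES (?, ?)", names)
# One crime each, plus a second crime for Panner Lee (criminal_id 1), as in the question's output.
crime_owners = list(range(1, len(names) + 1)) + [1]
conn.executemany("INSERT INTO crimes (criminal_id) VALUES (?)", [(c,) for c in crime_owners])

top3 = conn.execute("""
    SELECT last, first, crime_count
    FROM (
        SELECT last, first, crime_count,
               ROW_NUMBER() OVER (ORDER BY crime_count DESC, last, first) AS rn
        FROM (
            SELECT last, first, COUNT(crime_id) AS crime_count
            FROM criminals NATURAL JOIN crimes
            GROUP BY criminal_id, last, first
        )
    )
    WHERE rn <= 3
    ORDER BY rn
""").fetchall()

print(top3)  # [('Panner', 'Lee', 2), ('Cat', 'Tommy', 1), ('Caulk', 'Dave', 1)]
```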
