Hi, I have a dataframe like the one below, with thousands of IDs. Each ID has sub ids within it, as shown. The sub ids may change daily: a new sub id may be added, or an existing sub id may be lost.
I need to create 2 new columns that flag whenever a sub id is added or lost.
In the format below you can see that on the 12th a new sub id ('d') is added,
and on the 13th an existing sub id ('c') is lost.
I want to create a new column/flag to track these sub ids. Can you please help me with this?
I am using Python 3.5. Thanks
Sample format for one ID:
ID Sub Id Date is_new
1 a 3/11/2016 0
1 b 3/11/2016 0
1 c 3/11/2016 0
1 a 3/12/2016 0
1 b 3/12/2016 0
1 c 3/12/2016 0
1 d 3/12/2016 1
1 a 3/13/2016 0
1 b 3/13/2016 0
1 d 3/13/2016 0
The query below indicates when a sub id is added or deleted. Hope this helps.
Get the min and max update date per ID; I put these in a temp table named min_max.
If a row's update date equals the min or max date for its ID, mark is_mindte or is_maxdte as 1.
The lag and lead functions get the previous and next row for the same sub id, partitioned by ID and sub id and ordered by update date.
Put everything in a subquery (table s).
If the update date is not the start or end date for the ID, the sub id can have been added (is_mindte=0) or deleted (is_maxdte=0).
If the is_added column is null, the sub id was added on that date (is_added is null); if the is_deleted column is null, the sub id is missing from the next update date, so it is flagged as removed on its last date (is_deleted is null).
select s.id,
s.subid,
s.upddate,
(case when is_mindte=0 and is_added is null
then 1 else 0 end ) is_new,
(case when is_maxdte=0 and is_deleted is null
then 1 else 0 end) is_removed
from (
with min_max as
(select id,min(upddate) mindate,max(upddate) maxdate
from myTable
group by id)
select t.id,
t.subid,
t.upddate,
case when t.upddate=m.mindate
then 1 else 0 end is_mindte,
case when t.upddate=m.maxdate
then 1 else 0 end is_maxdte,
lag(t.subid) over (partition by t.id, t.subid order by t.upddate) is_added,
lead(t.subid) over (partition by t.id, t.subid order by t.upddate) is_deleted
from myTable t, min_max m
where t.id=m.id) s
order by s.id,
s.upddate,
s.subid
sample result:
ID SUBID UPDDATE IS_NEW IS_REMOVED
1 a 2016-03-11T00:00:00Z 0 0
1 b 2016-03-11T00:00:00Z 0 0
1 c 2016-03-11T00:00:00Z 0 0
1 a 2016-03-12T00:00:00Z 0 0
1 b 2016-03-12T00:00:00Z 0 0
1 c 2016-03-12T00:00:00Z 0 1
1 d 2016-03-12T00:00:00Z 1 0
1 a 2016-03-13T00:00:00Z 0 0
1 b 2016-03-13T00:00:00Z 0 0
1 d 2016-03-13T00:00:00Z 0 0
2 a 2016-03-11T00:00:00Z 0 0
2 b 2016-03-11T00:00:00Z 0 0
2 c 2016-03-11T00:00:00Z 0 0
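Since the question mentions Python, here is a minimal plain-Python sketch of the same flags. It is not a translation of the query above: it compares the set of sub ids on each date with the sets on the previous and next dates for the same ID, so a sub id that disappears and later returns would be flagged again, whereas the lag/lead query only flags the first appearance. The function name flag_changes and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Rows as (ID, Sub Id, Date) tuples, copied from the sample for ID 1.
rows = [
    (1, "a", "3/11/2016"), (1, "b", "3/11/2016"), (1, "c", "3/11/2016"),
    (1, "a", "3/12/2016"), (1, "b", "3/12/2016"), (1, "c", "3/12/2016"),
    (1, "d", "3/12/2016"),
    (1, "a", "3/13/2016"), (1, "b", "3/13/2016"), (1, "d", "3/13/2016"),
]


def flag_changes(rows):
    """Return (id, subid, date, is_new, is_removed) tuples.

    is_new is 1 when the sub id was absent on the previous date for that ID.
    is_removed is 1 when it is absent on the next date, so the flag lands on
    the last date the sub id was seen (as in the sample result above).
    """
    # Collect the set of sub ids seen for each ID on each date.
    seen = defaultdict(lambda: defaultdict(set))
    for id_, subid, date in rows:
        day = datetime.strptime(date, "%m/%d/%Y").date()
        seen[id_][day].add(subid)

    flagged = []
    for id_ in sorted(seen):
        days = sorted(seen[id_])
        for i, day in enumerate(days):
            # The first date has no previous set and the last has no next set,
            # so nothing is flagged as new or removed there.
            prev_ids = seen[id_][days[i - 1]] if i > 0 else None
            next_ids = seen[id_][days[i + 1]] if i + 1 < len(days) else None
            for subid in sorted(seen[id_][day]):
                is_new = int(prev_ids is not None and subid not in prev_ids)
                is_removed = int(next_ids is not None and subid not in next_ids)
                flagged.append((id_, subid, day.isoformat(), is_new, is_removed))
    return flagged


result = flag_changes(rows)
for row in result:
    print(row)
```

The flagged tuples could then be written back to the dataframe as two extra columns.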
I have a table that contains one or more records for each item. Each item can contain multiple sub-items (boards) and so the Itemid is often replicated with each record showing the division category (a number) that the Item/sub-item combo resides in:
ItemId Board# Division
142585109 0 6
142585114 0 3
142585116 0 1
142585120 0 4
142585197 0 5
142585197 2 4
142585197 3 3
142585197 5 6
142585197 8 1
142585294 0 4
142585317 0 1
I want to update the table and aggregate all of the division values (as a comma separated string) in a new field in this table, something like:
ItemId Board# AggDivisions
142585109 0 6
142585114 0 3
142585116 0 1
142585120 0 4
142585197 0 1,3,4,5,6
142585294 0 4
142585317 0 1
I used a ListAgg query to do the aggregation, which works correctly, but when I tried to incorporate it into an update query I ended up with multiple duplicates in the aggregated field for each record.
Here is my update attempt:
update itemtable dd
set aggregateddivisions = (SELECT Listagg(division, ',') within GROUP (ORDER BY division)
FROM itemtable ev
WHERE ev.itemid = dd.itemid
)
where exists (select 1
from itemtable ev
where ev.itemid = dd.itemid
);
How can I update the table with the aggregated list of values from the same table without ending up with duplicates?
I want to merge two columns (Sender and Receiver) into one, get the count of each transaction Type per number, and then join another table on the merged Sender/Receiver number.
Sender Receiver Type Amount Date
773787639 777611388 1 300 2/1/2019
773631898 776806843 4 450 8/20/2019
773761571 777019819 6 369 2/11/2019
774295511 777084440 34 1000 1/22/2019
774263079 776816905 45 678 6/27/2019
774386894 777202863 12 2678 2/10/2019
773671537 777545555 14 38934 9/29/2019
774288117 777035194 18 21 4/22/2019
774242382 777132939 21 1275 9/30/2019
774144715 777049859 30 6309 7/4/2019
773911674 776938987 10 3528 5/1/2019
773397863 777548054 15 35892 7/6/2019
776816905 772345091 6 1234 7/7/2019
777035194 775623065 4 453454 7/20/2019
Second Table
Mobile_number Age
773787639 34
773787632 23
774288117 65
I am trying to get a table like this:
Sender/Receiver Type_1 Type_4 Type_12...... Type_45 Age
773787639 3 2 0 0 23
773631898 1 0 1 2 56
773397863 2 2 0 0 65
772345091 1 1 0 3 32
OK, I have seen your old question; you just need an inner join in the subquery, as follows:
SELECT
SenderReceiver,
COUNT(CASE WHEN Type = 1 THEN 1 END) AS Type_1,
COUNT(CASE WHEN Type = 2 THEN 1 END) AS Type_2,
COUNT(CASE WHEN Type = 3 THEN 1 END) AS Type_3,
...
COUNT(CASE WHEN Type = 45 THEN 1 END) AS Type_45,
Age -- changes here
FROM
( SELECT sr.SenderReceiver, sr.Type, st.Age from -- changes here
(SELECT Sender AS SenderReceiver, Type FROM yourTable
UNION ALL
SELECT Receiver, Type FROM yourTable) sr
join <second_table> st on st.Mobile_number = sr.SenderReceiver -- changes here
) t
GROUP BY
SenderReceiver,
Age; -- changes here
The changes made to your previous query are marked with the comment -- changes here.
Please replace <second_table> with the actual name of your table.
Cheers!!
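To see the shape of that query in action, here is a small sketch that runs a trimmed version of it against an in-memory SQLite database from Python. The table names transactions and subscribers, and the four transaction rows, are made up for illustration; only Type_1 and Type_4 are counted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (Sender INTEGER, Receiver INTEGER, Type INTEGER);
    CREATE TABLE subscribers (Mobile_number INTEGER, Age INTEGER);
    -- Illustrative rows only, not the data from the question.
    INSERT INTO transactions VALUES
        (773787639, 777611388, 1),
        (774288117, 777035194, 18),
        (777035194, 774288117, 4),
        (773787639, 774288117, 4);
    INSERT INTO subscribers VALUES
        (773787639, 34), (773787632, 23), (774288117, 65);
""")

# UNION ALL stacks senders and receivers into one column, the join attaches
# Age, and each COUNT(CASE ...) counts only the rows of one Type.
query = """
    SELECT sr.SenderReceiver,
           COUNT(CASE WHEN sr.Type = 1 THEN 1 END) AS Type_1,
           COUNT(CASE WHEN sr.Type = 4 THEN 1 END) AS Type_4,
           st.Age
    FROM (SELECT Sender AS SenderReceiver, Type FROM transactions
          UNION ALL
          SELECT Receiver, Type FROM transactions) sr
    JOIN subscribers st ON st.Mobile_number = sr.SenderReceiver
    GROUP BY sr.SenderReceiver, st.Age
    ORDER BY sr.SenderReceiver
"""
counts = conn.execute(query).fetchall()
print(counts)  # [(773787639, 1, 1, 34), (774288117, 0, 2, 65)]
```

Because the join is an inner join, numbers that do not appear in the second table are dropped from the result, and numbers in the second table with no transactions do not appear either.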
I just want to count duplicated date values in a column of my table. My table looks like this:
VISIT:
ID_VISIT FK_PATIENT DATEA
0 1 20160425
1 2 20160425
2 3 20160426
I tried these:
SELECT VISIT.DATEA, COUNT(VISIT.DATEA) as numberOfDate FROM VISIT
SELECT VISIT.DATEA, COUNT(VISIT.DATEA) as numberOfDate FROM VISIT GROUP BY numberOfDate
but I only got this:
DATEA NUMBEROFDATE
20160502 1
20160430 1
20160503 1
20160501 1
20160429 1
20160425 1
20160425 1
20160425 1
20160428 1
20160504 1
but I want to get this:
DATEA NUMBEROFDATE
20160502 1
20160430 1
20160503 1
20160501 1
20160429 1
20160425 3
20160428 1
20160504 1
Group by the column you want to be unique. Aggregate functions like count() then apply to each group separately:
SELECT DATEA, COUNT(DATEA) as numberOfDate
FROM VISIT
GROUP BY DATEA
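As a quick check, the same query can be run against the three sample VISIT rows from the question using Python's built-in sqlite3 module. This is only a sketch: the column types and the added ORDER BY are assumptions, and the original database is presumably not SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VISIT (ID_VISIT INTEGER, FK_PATIENT INTEGER, DATEA TEXT);
    INSERT INTO VISIT VALUES
        (0, 1, '20160425'),
        (1, 2, '20160425'),
        (2, 3, '20160426');
""")

# One output row per distinct DATEA, with the number of visits on that date.
rows = conn.execute("""
    SELECT DATEA, COUNT(DATEA) AS numberOfDate
    FROM VISIT
    GROUP BY DATEA
    ORDER BY DATEA
""").fetchall()
print(rows)  # [('20160425', 2), ('20160426', 1)]
```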
I come from a MySQL background and am having problems with the following query:
SELECT DISTINCT agenda.idagenda AS "ID_SERVICE", agenda.name AS "ID_SERVICE_NAME", specialities.id AS "ID_DEPARTMENT", specialities.name AS "ID_DEPARTMENT_NAME",
supervisor.clients_waiting AS "CWaiting",
(CASE WHEN supervisor.clients_resent_waiting_area IS NULL THEN 0 ELSE supervisor.clients_resent_waiting_area END) AS "CWaiting_Resent_Area",
supervisor.clients_attending AS "CAttending",
supervisor.clients_attended AS "CAttended",
(SELECT MAX(ROUND((SYSDATE-core.supervisor_time_data.time_attending)*86400)) FROM dual) AS "MTA",
(SELECT MAX(ROUND((SYSDATE-core.supervisor_time_data.time_waiting)*86400)) FROM dual) AS "MTE",
(SELECT SUM(SYSDATE-supervisor_time_data.time_attending)*86400 FROM dual)/(SELECT supervisor.clients_attending FROM dual) AS "TMA",
(SELECT SUM(SYSDATE-supervisor_time_data.time_waiting)*86400 FROM dual)/(SELECT supervisor.clients_waiting FROM dual) AS "TME",
supervisor.tme_accumulated AS "TME_ACCUMULATED",
supervisor.tma_accumulated AS "TMA_ACCUMULATED",
(CASE WHEN agenda.alarm_cee IS NULL THEN 0 ELSE agenda.alarm_cee END) AS "ALARM_CEE",
(CASE WHEN agenda.alarm_mte IS NULL THEN 0 ELSE agenda.alarm_mte END) AS "ALARM_MTE",
(CASE WHEN agenda.alarm_mta IS NULL THEN 0 ELSE agenda.alarm_mta END) AS "ALARM_MTA",
(CASE WHEN agenda.alarm_tme IS NULL THEN 0 ELSE agenda.alarm_tme END) AS "ALARM_TME"
FROM CORE.supervisor
LEFT JOIN CORE.supervisor_time_data ON supervisor_time_data.id_service = supervisor.id_service
LEFT JOIN CORE.agenda ON supervisor.id_service = agenda.id
LEFT JOIN CORE.specialities ON agenda.idspeciality = specialities.id
WHERE supervisor.booked_or_sequential = 1
GROUP BY agenda.idagenda, agenda.name, supervisor.id_service, specialities.id, specialities.name, supervisor.clients_waiting, supervisor.clients_resent_waiting_area,
supervisor.clients_attending, supervisor.clients_attended,
supervisor_time_data.time_attending, supervisor_time_data.time_waiting,
supervisor.tme_accumulated, supervisor.tma_accumulated, agenda.alarm_cee, agenda.alarm_mte,agenda.alarm_mta,agenda.alarm_tme;
It should return two records, but instead it's returning four. ID_SERVICE is returning 3 records with the same value.
"ID_SERVICE" "ID_SERVICE_NAME" "ID_DEPARTMENT" "ID_DEPARTMENT_NAME" "CWaiting" "CWaiting_Resent_Area" "CAttending" "CAttended" "MTA" "MTE" "TMA" "TME" "TME_ACCUMULATED" "TMA_ACCUMULATED" "ALARM_CEE" "ALARM_MTE" "ALARM_MTA" "ALARM_TME"
"DR" "DR" 1 "SECUENCIALES" 1 0 1 1 5504 5504 21 109 0 0 0 0
"DR" "DR" 1 "SECUENCIALES" 1 0 1 1 1590 1590.000000000000000000000000000000000002 21 109 0 0 0 0
"DR" "DR" 1 "SECUENCIALES" 1 0 1 1 21 109 0 0 0 0
"TRAU" "TRAU" 1 "SECUENCIALES" 1 0 0 0 1567 1567.000000000000000000000000000000000002 0 0 0 0 0 0
What am I doing wrong?
Thanks
You seem to be including all the columns you're interested in in the group by, and not aggregating properly; possibly you've got to this point by trial and error as you tried to resolve errors about columns not being grouped. You also don't need the sub-selects in the column expressions.
Untested as we don't have your tables or raw data but it looks like you want something like:
SELECT agenda.idagenda AS "ID_SERVICE",
agenda.name AS "ID_SERVICE_NAME",
specialities.id AS "ID_DEPARTMENT",
specialities.name AS "ID_DEPARTMENT_NAME",
supervisor.clients_waiting AS "CWaiting",
NVL(supervisor.clients_resent_waiting_area, 0) AS "CWaiting_Resent_Area",
supervisor.clients_attending AS "CAttending",
supervisor.clients_attended AS "CAttended",
MAX(ROUND((SYSDATE - supervisor_time_data.time_attending)*86400)) AS "MTA",
MAX(ROUND((SYSDATE - supervisor_time_data.time_waiting)*86400)) AS "MTE",
SUM(SYSDATE - supervisor_time_data.time_attending)*86400
/ supervisor.clients_attending AS "TMA",
SUM(SYSDATE - supervisor_time_data.time_waiting)*86400
/ supervisor.clients_waiting AS "TME",
supervisor.tme_accumulated AS "TME_ACCUMULATED",
supervisor.tma_accumulated AS "TMA_ACCUMULATED",
NVL(agenda.alarm_cee, 0) AS "ALARM_CEE",
NVL(agenda.alarm_mte, 0) AS "ALARM_MTE",
NVL(agenda.alarm_mta, 0) AS "ALARM_MTA",
NVL(agenda.alarm_tme, 0) AS "ALARM_TME"
FROM CORE.supervisor
LEFT JOIN CORE.supervisor_time_data
ON supervisor_time_data.id_service = supervisor.id_service
LEFT JOIN CORE.agenda ON supervisor.id_service = agenda.id
LEFT JOIN CORE.specialities ON agenda.idspeciality = specialities.id
WHERE supervisor.booked_or_sequential = 1
GROUP BY agenda.idagenda, agenda.name, supervisor.id_service, specialities.id,
specialities.name, supervisor.clients_waiting,
supervisor.clients_resent_waiting_area, supervisor.clients_attending,
supervisor.clients_attended, supervisor.tme_accumulated,
supervisor.tma_accumulated, agenda.alarm_cee,
agenda.alarm_mte,agenda.alarm_mta,agenda.alarm_tme;
So specifically, supervisor_time_data.time_waiting and supervisor_time_data.time_attending don't need to be in the group by, as they are only used inside aggregate functions. Grouping by them is what split each service into one row per time value.
I've replaced your case checks with nvl just because it's shorter; case is fine though if you prefer that.
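The effect of grouping by a column that is also being aggregated can be reproduced on a cut-down example. The sketch below uses Python's sqlite3 module with two simplified tables; the wait_seconds column is a hypothetical stand-in for the SYSDATE - time_waiting arithmetic, and the values are borrowed from the MTA/MTE columns of the question's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supervisor (id_service TEXT, clients_waiting INTEGER);
    CREATE TABLE supervisor_time_data (id_service TEXT, wait_seconds INTEGER);
    INSERT INTO supervisor VALUES ('DR', 1), ('TRAU', 1);
    INSERT INTO supervisor_time_data VALUES
        ('DR', 5504), ('DR', 1590), ('TRAU', 1567);
""")

base = """
    SELECT s.id_service, MAX(t.wait_seconds) AS mte
    FROM supervisor s
    LEFT JOIN supervisor_time_data t ON t.id_service = s.id_service
    GROUP BY s.id_service{extra}
    ORDER BY s.id_service, mte
"""

# Grouping by the time column as well gives one group per time value,
# so MAX() has nothing to collapse.
over_grouped = conn.execute(base.format(extra=", t.wait_seconds")).fetchall()

# Grouping only by the service lets MAX() pick one value per service.
grouped = conn.execute(base.format(extra="")).fetchall()

print(over_grouped)  # [('DR', 1590), ('DR', 5504), ('TRAU', 1567)]
print(grouped)       # [('DR', 5504), ('TRAU', 1567)]
```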
I have a data set like this
id subid date(in yyyymmdd) time(in hh24miss) count1 count2
80013727 20000000431 20120429 001500 0 0
80013727 20000000431 20120429 003000 0 0
80013729 20000000432 20120429 001500 0 0
80013729 20000000432 20120429 003000 0 0
80013728 20000000435 20120429 001500 0 0
80013728 20000000435 20120429 003000 0 0
As you can see, time is in 15-minute increments. I want to output the result set like below.
id Date subid 00:00:00-00:14:59 00:15:00-00:29:59
80013727 20120429 20000000431 0 0
80013729 20120429 20000000432 0 0
As you can see, all the data related to id 80013727 is shown in one row instead of 2 for the date 20120429.
Please tell me how to achieve this.
The header row can be printed once using dbms_output.put_line.
Hi, here are the answers to your questions:
Oracle version 10.2g.
For a unique id, subid, date combination, count1 and count2 need to be shown in one row,
instead of the 4 rows that can be seen in the topmost result set.
80013727 20000000431 20120429 has 2 rows for different times (i.e. 001500, 003000).
I need to show
80013727 20000000431 20120429 count1(from 1st row),count1(from 2nd row)
80013727 20000000431 20120429 count2(from 1st row),count2(from 2nd row)
Obviously you have simplified your data and your output structure. I'm guessing you'll end up with 96 count columns (although I'm not going that far either).
with cte as
( select * from your_table )
select id
, subid
, date
, type
, sum(c01) as "00:00:00-00:14:59"
, sum(c02) as "00:15:00-00:29:59"
, sum(c96) as "23:45:00-23:59:59"
from (
select id
, subid
, date
, 'C1' type
, case when time between 0 and 899 then count1 else 0 end as c01
, case when time between 900 and 1799 then count1 else 0 end as c02
, case when time between 85500 and 86399 then count1 else 0 end as c96
from cte
union all
select id
, subid
, date
, 'C2' type
, case when time between 0 and 899 then count2 else 0 end as c01
, case when time between 900 and 1799 then count2 else 0 end as c02
, case when time between 85500 and 86399 then count2 else 0 end as c96
from cte
)
group by id, subid, date, type
order by id, subid, date, type
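So, this uses a subquery factoring expression (the WITH clause) to select only once from your table. It uses case to assign counts to a specific time column on the basis of a range of seconds, which assumes time holds seconds since midnight; an hh24miss value such as 003000 would have to be converted first. There are two queries, one for each output line (count1 and count2).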
The sum() calls may be unnecessary; it's not clear from your data whether you have more than one record in each time slot.
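The bucketing logic can also be sketched outside the database. The plain-Python example below converts an hh24miss string to seconds since midnight, derives the 15-minute slot from it, and sums count1 and count2 into 96 columns per id, subid, date and count type. It assumes the time value marks the start of its slot; the helper names and the non-zero counts in the usage rows are made up for illustration.

```python
from collections import defaultdict


def slot_index(hh24miss):
    """Map an hh24miss string such as '001500' to a 15-minute slot (0..95)."""
    hours = int(hh24miss[0:2])
    minutes = int(hh24miss[2:4])
    seconds = int(hh24miss[4:6])
    return (hours * 3600 + minutes * 60 + seconds) // 900


def clock(total_seconds):
    """Format seconds since midnight as hh:mm:ss."""
    return "%02d:%02d:%02d" % (
        total_seconds // 3600, total_seconds % 3600 // 60, total_seconds % 60)


def slot_label(index):
    """Column heading for a slot, e.g. '00:15:00-00:29:59' for slot 1."""
    start = index * 900
    return clock(start) + "-" + clock(start + 899)


def pivot(rows):
    """rows are (id, subid, date, time, count1, count2) tuples.

    Returns {(id, subid, date, 'C1' or 'C2'): [96 summed counts]}.
    """
    table = defaultdict(lambda: [0] * 96)
    for id_, subid, date, time, count1, count2 in rows:
        i = slot_index(time)
        table[(id_, subid, date, "C1")][i] += count1
        table[(id_, subid, date, "C2")][i] += count2
    return dict(table)


# Illustrative rows: same keys as the question, counts invented.
sample = [
    ("80013727", "20000000431", "20120429", "001500", 3, 1),
    ("80013727", "20000000431", "20120429", "003000", 5, 2),
]
pivoted = pivot(sample)
c1 = pivoted[("80013727", "20000000431", "20120429", "C1")]
print(slot_label(1), c1[1])
print(slot_label(2), c1[2])
```

As in the query, the sums only matter if more than one record can fall into the same slot.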