Data:
ID PARENT_ID
1 [null]
2 1
3 1
4 2
Desired result:
ID CHILD_AT_ANY_LEVEL
1 2
1 3
1 4
2 4
I've tried SYS_CONNECT_BY_PATH, but I don't understand how to convert its result into an "inline view" that I can JOIN with the main table.
select connect_by_root(id) id, id child_at_any_level
from table
where level <> 1
connect by prior id = parent_id;
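To make the traversal in that query easier to follow, here is a minimal Python sketch of the same idea: start from every row and walk down through its children, emitting one (ancestor, descendant) pair per step. The function name descendant_pairs and the tuple layout are assumptions made for illustration, and the sketch assumes the hierarchy has no cycles.

```python
from collections import defaultdict

def descendant_pairs(rows):
    """Return sorted (ancestor_id, descendant_id) pairs from (id, parent_id) rows."""
    # Map each parent to its direct children.
    children = defaultdict(list)
    for node_id, parent_id in rows:
        if parent_id is not None:
            children[parent_id].append(node_id)

    pairs = []
    for root, _ in rows:
        # Start a walk from every row, like CONNECT BY without START WITH.
        stack = list(children[root])
        while stack:
            node = stack.pop()
            pairs.append((root, node))      # node sits somewhere below root
            stack.extend(children[node])    # keep descending
    return sorted(pairs)

# Data from the question: (ID, PARENT_ID).
rows = [(1, None), (2, 1), (3, 1), (4, 2)]
print(descendant_pairs(rows))
```

Skipping the root itself in each walk corresponds to the `level <> 1` filter in the query.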
Hi, I have a dataframe like the one below, with thousands of IDs. Each ID has a list of sub IDs, as shown. The sub IDs can change daily: a new sub ID may be added, or an existing sub ID may be lost.
I need to create two new columns that flag whenever a sub ID is added or lost.
In the sample below, you can see that a new sub ID 'd' is added on the 12th,
and an existing sub ID ('c') is lost on the 13th.
I want to create a new column/flag to track these sub IDs. Can you please help me with this?
I am using Python 3.5. Thanks
Sample format for one ID:
ID Sub Id Date is_new
1 a 3/11/2016 0
1 b 3/11/2016 0
1 c 3/11/2016 0
1 a 3/12/2016 0
1 b 3/12/2016 0
1 c 3/12/2016 0
1 d 3/12/2016 1
1 a 3/13/2016 0
1 b 3/13/2016 0
1 d 3/13/2016 0
The query below indicates when a sub ID is added or deleted. Hope this helps.
Get the min and max update date per ID; I put these in a CTE named min_max.
If a row's update date equals the min or max date, set the matching flag (is_mindte or is_maxdte) to 1.
The lag and lead functions get the previous and next occurrence of each sub ID, partitioned by ID and sub ID and ordered by update date.
Put everything in a subquery (table s).
Only a row whose update date is not the first date for its ID (is_mindte=0) can count as added, and only a row whose update date is not the last date for its ID (is_maxdte=0) can count as deleted.
If is_added is null on such a row, the sub ID was added on that date; if is_deleted is null, the sub ID was deleted as of the next update date.
select s.id,
s.subid,
s.upddate,
(case when is_mindte=0 and is_added is null
then 1 else 0 end ) is_new,
(case when is_maxdte=0 and is_deleted is null
then 1 else 0 end) is_removed
from (
with min_max as
(select id,min(upddate) mindate,max(upddate) maxdate
from myTable
group by id)
select t.id,
t.subid,
t.upddate,
case when t.upddate=m.mindate
then 1 else 0 end is_mindte,
case when t.upddate=m.maxdate
then 1 else 0 end is_maxdte,
lag(t.subid) over (partition by t.id, t.subid order by t.upddate) is_added,
lead(t.subid) over (partition by t.id, t.subid order by t.upddate) is_deleted
from myTable t, min_max m
where t.id=m.id) s
order by s.id,
s.upddate,
s.subid
sample result:
ID SUBID UPDDATE IS_NEW IS_REMOVED
1 a 2016-03-11T00:00:00Z 0 0
1 b 2016-03-11T00:00:00Z 0 0
1 c 2016-03-11T00:00:00Z 0 0
1 a 2016-03-12T00:00:00Z 0 0
1 b 2016-03-12T00:00:00Z 0 0
1 c 2016-03-12T00:00:00Z 0 1
1 d 2016-03-12T00:00:00Z 1 0
1 a 2016-03-13T00:00:00Z 0 0
1 b 2016-03-13T00:00:00Z 0 0
1 d 2016-03-13T00:00:00Z 0 0
2 a 2016-03-11T00:00:00Z 0 0
2 b 2016-03-11T00:00:00Z 0 0
2 c 2016-03-11T00:00:00Z 0 0
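Since the question asks about Python, the same rule can be sketched without SQL. This is a minimal plain-Python version under stated assumptions: rows are (id, subid, date) tuples, dates are ISO strings so that they sort chronologically, and the name flag_changes is made up for illustration. It mirrors the query: a sub ID is new on its first date unless that is the ID's first date, and removed on its last date unless that is the ID's last date.

```python
from collections import defaultdict

def flag_changes(rows):
    """rows: (id, subid, date) tuples; returns (id, subid, date, is_new, is_removed)."""
    id_dates = defaultdict(set)    # id -> every update date seen for that id
    pair_dates = defaultdict(set)  # (id, subid) -> dates on which that pair appears
    for id_, subid, date in rows:
        id_dates[id_].add(date)
        pair_dates[(id_, subid)].add(date)

    flagged = []
    for id_, subid, date in sorted(rows, key=lambda r: (r[0], r[2], r[1])):
        first_seen = min(pair_dates[(id_, subid)])
        last_seen = max(pair_dates[(id_, subid)])
        # New: first appearance of the sub ID, but not on the ID's first date.
        is_new = int(date == first_seen and date != min(id_dates[id_]))
        # Removed: last appearance of the sub ID, but not on the ID's last date.
        is_removed = int(date == last_seen and date != max(id_dates[id_]))
        flagged.append((id_, subid, date, is_new, is_removed))
    return flagged

rows = [
    (1, "a", "2016-03-11"), (1, "b", "2016-03-11"), (1, "c", "2016-03-11"),
    (1, "a", "2016-03-12"), (1, "b", "2016-03-12"), (1, "c", "2016-03-12"),
    (1, "d", "2016-03-12"),
    (1, "a", "2016-03-13"), (1, "b", "2016-03-13"), (1, "d", "2016-03-13"),
    (2, "a", "2016-03-11"), (2, "b", "2016-03-11"), (2, "c", "2016-03-11"),
]
for row in flag_changes(rows):
    print(row)
```

As in the query, the removal flag lands on the last row where the sub ID is still present (c on the 12th), not on the date where it is missing.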
I'm working in Hive,
and I have a dataset like this:
client_id date nb_pts
1 2016-06-01 1
1 2016-06-02 3
1 2016-06-03 4
2 2016-06-01 2
2 2016-06-02 3
I need to output for each client, the difference between current nb_pts and previous nb_pts.
So my output should be :
client_id date nb_pts nb_pts_per_row
1 2016-06-01 1 1 (1-0)
1 2016-06-02 3 2 (3-1)
1 2016-06-03 4 1 (4-3)
2 2016-06-01 2 2 (2-0)
2 2016-06-02 3 1 (3-2)
I've tried to use the LAG function in Hive:
SELECT client_id, date, nb_pts,
nb_pts - (LAG(nb_pts, 1, 0) OVER (PARTITION BY client_id ORDER BY date ROWS 1 PRECEDING)) as nb_pts_per_row
FROM MyTable
But the validation failed. It says:
Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: Expecting left window frame boundary for function LAG((TOK_TABLE_OR_COL nb_pts), 1, 0) org.apache.hadoop.hive.ql.parse.WindowingSpec$WindowSpec#27a007cd as LAG_window_0 to be unbounded.
EDIT (SOLUTION):
It works without ROWS 1 PRECEDING; LAG already takes the offset as its second argument, so it needs no window frame clause:
SELECT client_id, date, nb_pts,
nb_pts - (LAG(nb_pts, 1, 0) OVER (PARTITION BY client_id ORDER BY date)) as nb_pts_per_row
FROM MyTable
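For readers who want to check the expected numbers, the behaviour of LAG(nb_pts, 1, 0) within each client partition can be sketched in Python. This is only an illustration of the semantics, not Hive code; the function name diff_from_previous is an assumption, and dates are taken to be ISO strings so that a plain sort orders them.

```python
def diff_from_previous(rows):
    """rows: (client_id, date, nb_pts) tuples; returns rows with nb_pts_per_row appended."""
    previous = {}  # client_id -> nb_pts of that client's previous row
    result = []
    # Sorting by (client_id, date) stands in for PARTITION BY ... ORDER BY.
    for client_id, date, nb_pts in sorted(rows):
        # previous.get(client_id, 0) plays the role of LAG(nb_pts, 1, 0).
        result.append((client_id, date, nb_pts, nb_pts - previous.get(client_id, 0)))
        previous[client_id] = nb_pts
    return result

rows = [
    (1, "2016-06-01", 1), (1, "2016-06-02", 3), (1, "2016-06-03", 4),
    (2, "2016-06-01", 2), (2, "2016-06-02", 3),
]
for row in diff_from_previous(rows):
    print(row)
```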
I come from a MySQL background and am having problems with the following query:
SELECT DISTINCT agenda.idagenda AS "ID_SERVICE", agenda.name AS "ID_SERVICE_NAME", specialities.id AS "ID_DEPARTMENT", specialities.name AS "ID_DEPARTMENT_NAME",
supervisor.clients_waiting AS "CWaiting",
(CASE WHEN supervisor.clients_resent_waiting_area IS NULL THEN 0 ELSE supervisor.clients_resent_waiting_area END) AS "CWaiting_Resent_Area",
supervisor.clients_attending AS "CAttending",
supervisor.clients_attended AS "CAttended",
(SELECT MAX(ROUND((SYSDATE-core.supervisor_time_data.time_attending)*86400)) FROM dual) AS "MTA",
(SELECT MAX(ROUND((SYSDATE-core.supervisor_time_data.time_waiting)*86400)) FROM dual) AS "MTE",
(SELECT SUM(SYSDATE-supervisor_time_data.time_attending)*86400 FROM dual)/(SELECT supervisor.clients_attending FROM dual) AS "TMA",
(SELECT SUM(SYSDATE-supervisor_time_data.time_waiting)*86400 FROM dual)/(SELECT supervisor.clients_waiting FROM dual) AS "TME",
supervisor.tme_accumulated AS "TME_ACCUMULATED",
supervisor.tma_accumulated AS "TMA_ACCUMULATED",
(CASE WHEN agenda.alarm_cee IS NULL THEN 0 ELSE agenda.alarm_cee END) AS "ALARM_CEE",
(CASE WHEN agenda.alarm_mte IS NULL THEN 0 ELSE agenda.alarm_mte END) AS "ALARM_MTE",
(CASE WHEN agenda.alarm_mta IS NULL THEN 0 ELSE agenda.alarm_mta END) AS "ALARM_MTA",
(CASE WHEN agenda.alarm_tme IS NULL THEN 0 ELSE agenda.alarm_tme END) AS "ALARM_TME"
FROM CORE.supervisor
LEFT JOIN CORE.supervisor_time_data ON supervisor_time_data.id_service = supervisor.id_service
LEFT JOIN CORE.agenda ON supervisor.id_service = agenda.id
LEFT JOIN CORE.specialities ON agenda.idspeciality = specialities.id
WHERE supervisor.booked_or_sequential = 1
GROUP BY agenda.idagenda, agenda.name, supervisor.id_service, specialities.id, specialities.name, supervisor.clients_waiting, supervisor.clients_resent_waiting_area,
supervisor.clients_attending, supervisor.clients_attended,
supervisor_time_data.time_attending, supervisor_time_data.time_waiting,
supervisor.tme_accumulated, supervisor.tma_accumulated, agenda.alarm_cee, agenda.alarm_mte,agenda.alarm_mta,agenda.alarm_tme;
It should return two records, but instead it's returning four. Three of them have the same ID_SERVICE value.
"ID_SERVICE" "ID_SERVICE_NAME" "ID_DEPARTMENT" "ID_DEPARTMENT_NAME" "CWaiting" "CWaiting_Resent_Area" "CAttending" "CAttended" "MTA" "MTE" "TMA" "TME" "TME_ACCUMULATED" "TMA_ACCUMULATED" "ALARM_CEE" "ALARM_MTE" "ALARM_MTA" "ALARM_TME"
"DR" "DR" 1 "SECUENCIALES" 1 0 1 1 5504 5504 21 109 0 0 0 0
"DR" "DR" 1 "SECUENCIALES" 1 0 1 1 1590 1590.000000000000000000000000000000000002 21 109 0 0 0 0
"DR" "DR" 1 "SECUENCIALES" 1 0 1 1 21 109 0 0 0 0
"TRAU" "TRAU" 1 "SECUENCIALES" 1 0 0 0 1567 1567.000000000000000000000000000000000002 0 0 0 0 0 0
What am I doing wrong?
Thanks
You seem to be including every column you're interested in in the GROUP BY, and not aggregating properly; possibly you got to this point by trial and error while trying to resolve errors about columns not being grouped. You also don't need the sub-selects in the column expressions.
Untested as we don't have your tables or raw data but it looks like you want something like:
SELECT agenda.idagenda AS "ID_SERVICE",
agenda.name AS "ID_SERVICE_NAME",
specialities.id AS "ID_DEPARTMENT",
specialities.name AS "ID_DEPARTMENT_NAME",
supervisor.clients_waiting AS "CWaiting",
NVL(supervisor.clients_resent_waiting_area, 0) AS "CWaiting_Resent_Area",
supervisor.clients_attending AS "CAttending",
supervisor.clients_attended AS "CAttended",
MAX(ROUND((SYSDATE - supervisor_time_data.time_attending)*86400)) AS "MTA",
MAX(ROUND((SYSDATE - supervisor_time_data.time_waiting)*86400)) AS "MTE",
SUM(SYSDATE - supervisor_time_data.time_attending)*86400
/ supervisor.clients_attending AS "TMA",
SUM(SYSDATE - supervisor_time_data.time_waiting)*86400
/ supervisor.clients_waiting AS "TME",
supervisor.tme_accumulated AS "TME_ACCUMULATED",
supervisor.tma_accumulated AS "TMA_ACCUMULATED",
NVL(agenda.alarm_cee, 0) AS "ALARM_CEE",
NVL(agenda.alarm_mte, 0) AS "ALARM_MTE",
NVL(agenda.alarm_mta, 0) AS "ALARM_MTA",
NVL(agenda.alarm_tme, 0) AS "ALARM_TME"
FROM CORE.supervisor
LEFT JOIN CORE.supervisor_time_data
ON supervisor_time_data.id_service = supervisor.id_service
LEFT JOIN CORE.agenda ON supervisor.id_service = agenda.id
LEFT JOIN CORE.specialities ON agenda.idspeciality = specialities.id
WHERE supervisor.booked_or_sequential = 1
GROUP BY agenda.idagenda, agenda.name, supervisor.id_service, specialities.id,
specialities.name, supervisor.clients_waiting,
supervisor.clients_resent_waiting_area, supervisor.clients_attending,
supervisor.clients_attended, supervisor.tme_accumulated,
supervisor.tma_accumulated, agenda.alarm_cee,
agenda.alarm_mte,agenda.alarm_mta,agenda.alarm_tme;
So specifically, supervisor_time_data.time_waiting and supervisor_time_data.time_attending don't need to be in the GROUP BY, because they are only used inside aggregate functions. Grouping by them is what produced one output row per time value.
I've replaced your case checks with nvl just because it's shorter; case is fine though if you prefer that.
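The effect of grouping by a column that is also being aggregated can be shown with a small Python sketch. The rows are hypothetical: a service name paired with a seconds value borrowed from the MTA column of the question's output, and max_by is a helper invented for this illustration.

```python
from collections import defaultdict

# Hypothetical joined rows: (id_service, seconds_attending).
joined = [("DR", 5504), ("DR", 1590), ("TRAU", 1567)]

def max_by(rows, key):
    """Group rows by key(row) and return the max seconds value per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[1])
    return {k: max(v) for k, v in groups.items()}

# Grouping by the service AND the time value: every row is its own group,
# so MAX has nothing to collapse and the service repeats.
too_fine = max_by(joined, key=lambda r: (r[0], r[1]))

# Grouping by the service only: MAX reduces each service to one row.
per_service = max_by(joined, key=lambda r: r[0])

print(len(too_fine), per_service)
```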
I need to write a query that returns every unique page number together with the max version number of that page.
Here is an example of the data that I'd query:
DocumentID PageNumber Version
1 1 1
1 2 1
1 2 2
1 3 1
1 3 2
1 3 3
And here is what I would need to get returned in my query
DocumentID PageNumber Version
1 1 1
1 2 2
1 3 3
Not sure how to finish this:
var pages = from p in dc.Pages where p.DocumentID == 1 && ...
I think this is what you're trying to achieve:
var results =
from p in dc.Pages
where p.DocumentID == 1
group p by p.PageNumber into g
select new
{
PageNumber = g.Key,
MaxVersion = g.Max(x => x.Version)
};
This query may help you (DISTINCT is not needed, because GROUP BY already returns one row per page):
select DocumentID, PageNumber, max(Version) as Version from table
group by DocumentID, PageNumber
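The group-then-max pattern used in these answers can be checked against the sample data with a short Python sketch. The function name latest_versions and the tuple layout (DocumentID, PageNumber, Version) are assumptions for illustration only.

```python
def latest_versions(pages, document_id):
    """Return (DocumentID, PageNumber, max Version) for one document, sorted by page."""
    latest = {}  # PageNumber -> highest Version seen so far
    for doc_id, page_number, version in pages:
        if doc_id != document_id:
            continue
        latest[page_number] = max(version, latest.get(page_number, version))
    return [(document_id, page, version) for page, version in sorted(latest.items())]

# Sample data from the question.
pages = [(1, 1, 1), (1, 2, 1), (1, 2, 2), (1, 3, 1), (1, 3, 2), (1, 3, 3)]
print(latest_versions(pages, 1))
```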