Aggregated column use in Hive Query - hadoop

My Hive table (tab1) structure:
people_id,time_spent,group_type
1,234,a
2,540,b
1,332,a
2,112,b
Below is the query I am trying to execute, but I get an error ("Not yet supported place for UDAF 'sum'"):
select people_id, sum(case when group_type='a' then time_spent else 0 end) as a_time, sum(pow(a_time,2)) as s_sq_a_time,sum(case when group_type='b' then time_spent else 0 end) as b_time, sum(pow(b_time,2)) as s_sq_b_time from tab1 group by people_id;
Is it possible to refer to an aggregated column from the same SELECT statement in Hive?
I have also looked at the link below, but it didn't work:
http://grokbase.com/t/hive/user/095tpdkrgz/built-in-aggregate-function-standard-deviation#

Set an alias for the table name and use the table alias when accessing the columns.
E.g.
select startstation, count(tripid) as a
from 201508_trip_data as t
group by t.startstation
Note that t is the alias for the table, and I've used t.startstation to access the column.

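You'll have to use a derived table to refer to a_time and b_time. Hive does not let you refer to a column alias defined in the same SELECT list, so compute the sums in an inner query and reference them from the outer one: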
select people_id, a_time, b_time,
pow(a_time,2) as s_sq_a_time,
pow(b_time,2) as s_sq_b_time
from (
select people_id,
sum(case when group_type='a' then time_spent else 0 end) as a_time,
sum(case when group_type='b' then time_spent else 0 end) as b_time
from tab1 group by people_id
) t1
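A minimal way to try the derived-table approach locally is Python's built-in sqlite3 module with the sample rows from the question. This is a sketch under assumptions: SQLite stands in for Hive, and a_time * a_time replaces pow(a_time, 2) because SQLite's math functions are an optional build feature. The outer query keeps people_id so each result row can be matched to a person.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table tab1 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (people_id INTEGER, time_spent INTEGER, group_type TEXT)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [(1, 234, "a"), (2, 540, "b"), (1, 332, "a"), (2, 112, "b")],
)

# The inner query computes the aggregates; the outer query can then refer
# to their aliases (a_time, b_time) like ordinary columns.
query = """
SELECT people_id, a_time, b_time,
       a_time * a_time AS s_sq_a_time,  -- stands in for pow(a_time, 2)
       b_time * b_time AS s_sq_b_time
FROM (
    SELECT people_id,
           SUM(CASE WHEN group_type = 'a' THEN time_spent ELSE 0 END) AS a_time,
           SUM(CASE WHEN group_type = 'b' THEN time_spent ELSE 0 END) AS b_time
    FROM tab1
    GROUP BY people_id
) t1
ORDER BY people_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```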

Related

FOR LOOP Statement in BigQuery

I have data in two tables:
Table activity:
User_ID Event_Time Cmd
AMsySZb9GPcL 1512125190721078 1
AMsySZYQ-lAI 1512118629594674 0
AMsySZZMlPzD 1512125736366076 1
....
Table behaviour:
User_ID Event_Time
AMsySZZFezm 1512145788526664
AMsySZb9GPcL 1512125190721078
AMsySZY5YcTa 1512143509733637
AMsySZYQ-lAI 1512118629594674
AMsySZZMlPzD 1512125736366076
....
User_ID is type STRING, Event_Time is type INTEGER.
Step 1: The basic SELECT statement I am making now is:
SELECT activity.User_ID, activity.Event_Time FROM activity WHERE Cmd=1
Step 2: Then I would like to get data from behaviour table, but only for Users from Step 1 and only where behaviour.Event_Time is before activity.Event_Time.
For example:
From Step 1 I got User_ID='AMsySZb9GPcL' and I need:
SELECT behaviour.User_ID, behaviour.Event_Time
FROM behaviour
WHERE User_ID='AMsySZb9GPcL' AND activity.Event_Time >= behaviour.Event_Time
The problem is that I have to do the same for every User_ID from Step 1. I am not sure whether SQL supports this, but I need something like a FOR LOOP.
You don't need a FOR LOOP for this. When you deal with SQL of any sort, think in terms of set-based operations: you can process all your users in one shot using the power of JOINs.
The query below is for BigQuery Standard SQL:
#standardSQL
SELECT
activity.User_ID User_ID,
activity.Event_Time activity_Time,
behaviour.Event_Time behaviour_Time
FROM `project.dataset.activity` activity
JOIN `project.dataset.behaviour` behaviour
ON activity.User_ID = behaviour.User_ID
AND activity.Event_Time >= behaviour.Event_Time
WHERE Cmd = 1
You can test and play with the query above using dummy data from your example:
#standardSQL
WITH `project.dataset.activity` AS (
SELECT 'AMsySZb9GPcL' User_ID, 1512125190721078 Event_Time, 1 Cmd UNION ALL
SELECT 'AMsySZYQ-lAI', 1512118629594674, 0 UNION ALL
SELECT 'AMsySZZMlPzD', 1512125736366076, 1
), `project.dataset.behaviour` AS (
SELECT 'AMsySZZFezm' User_ID, 1512145788526664 Event_Time UNION ALL
SELECT 'AMsySZb9GPcL', 1512125190721078 UNION ALL
SELECT 'AMsySZY5YcTa', 1512143509733637 UNION ALL
SELECT 'AMsySZYQ-lAI', 1512118629594674 UNION ALL
SELECT 'AMsySZZMlPzD', 1512125736366076
)
SELECT
activity.User_ID User_ID,
activity.Event_Time activity_Time,
behaviour.Event_Time behaviour_Time
FROM `project.dataset.activity` activity
JOIN `project.dataset.behaviour` behaviour
ON activity.User_ID = behaviour.User_ID
AND activity.Event_Time >= behaviour.Event_Time
WHERE Cmd=1
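The join can also be checked outside BigQuery. The sketch below assumes SQLite (through Python's sqlite3 module) as a stand-in engine and loads the sample rows from the question into plain tables instead of the `project.dataset` CTEs.

```python
import sqlite3

# In-memory SQLite database as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (User_ID TEXT, Event_Time INTEGER, Cmd INTEGER)")
conn.execute("CREATE TABLE behaviour (User_ID TEXT, Event_Time INTEGER)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [
        ("AMsySZb9GPcL", 1512125190721078, 1),
        ("AMsySZYQ-lAI", 1512118629594674, 0),
        ("AMsySZZMlPzD", 1512125736366076, 1),
    ],
)
conn.executemany(
    "INSERT INTO behaviour VALUES (?, ?)",
    [
        ("AMsySZZFezm", 1512145788526664),
        ("AMsySZb9GPcL", 1512125190721078),
        ("AMsySZY5YcTa", 1512143509733637),
        ("AMsySZYQ-lAI", 1512118629594674),
        ("AMsySZZMlPzD", 1512125736366076),
    ],
)

# One set-based JOIN replaces the per-user loop: every activity row with
# Cmd = 1 is paired with all behaviour rows of the same user at or before it.
rows = conn.execute("""
    SELECT activity.User_ID AS User_ID,
           activity.Event_Time AS activity_Time,
           behaviour.Event_Time AS behaviour_Time
    FROM activity
    JOIN behaviour
      ON activity.User_ID = behaviour.User_ID
     AND activity.Event_Time >= behaviour.Event_Time
    WHERE Cmd = 1
""").fetchall()
for row in rows:
    print(row)
```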

ORA-00907 Missing Right Parenthesis (Oracle)

I was previously holding my data in SharePoint. At that time, the query below ran fine:
SELECT Nz(Abs(Sum(sales_route="Sales Mailbox")),0) AS AcceptDirect
FROM tblQuotesNew AS t1;
Now that I have moved my data to Oracle (but still retrieving it via Access), I get the error ORA-00907 Missing Right Parenthesis.
Can anyone suggest how I can modify the code above so that it is acceptable to Oracle?
Thanks in advance
I think your query counts the number of rows where sales_route is 'Sales Mailbox', which can simply be written as:
select count(*) as AcceptDirect
from tblQuotesNew
where sales_route = 'Sales Mailbox';
If you want counts for different routes in the same query, you can do something like this:
select count(case when sales_route = 'Sales Mailbox' then 1 end) as AcceptDirect,
count(case when sales_route = 'XYZ' then 1 end) as XYZ
from tblQuotesNew
where sales_route in ('Sales Mailbox', 'XYZ');
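Conditional counting can be sketched in SQLite via Python's sqlite3 module. The rows below are hypothetical sample data; the point is that COUNT skips NULLs, and a CASE without ELSE returns NULL for rows that don't match, so each COUNT only counts its own route.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblQuotesNew (sales_route TEXT)")
# Hypothetical sample rows; only the sales_route column matters here.
conn.executemany(
    "INSERT INTO tblQuotesNew VALUES (?)",
    [("Sales Mailbox",), ("Sales Mailbox",), ("XYZ",), ("Other",)],
)

# CASE with no ELSE yields NULL for non-matching rows, and COUNT ignores NULLs.
accept_direct, xyz = conn.execute("""
    SELECT COUNT(CASE WHEN sales_route = 'Sales Mailbox' THEN 1 END) AS AcceptDirect,
           COUNT(CASE WHEN sales_route = 'XYZ' THEN 1 END) AS XYZ
    FROM tblQuotesNew
    WHERE sales_route IN ('Sales Mailbox', 'XYZ')
""").fetchone()
print(accept_direct, xyz)  # 2 1
```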

oracle procedure cursor query when case statement

CURSOR BULKUPDATE IS
SELECT SUM(B.ACCOUNT_BALANCE) AS ACCOUNT_BALANCE,C.CIF AS CIF_ID FROM _ACCOUNTS_STAGING2 B JOIN _RELATION_STAGING2 C
ON B.ACCOUNT_IDENTIFICATION_NUMBER = C.ACCOUNT_IDENTIFICATION_NUMBER AND B.SOURCEID=C.SOURCEID JOIN _CUSTOMER_STAGING2 A ON A.CIF=C.CIF AND A.SOURCEID=C.SOURCEID WHERE C.ROLE_ON_ACCOUNT IN
(Select Rollonaccount From _Roleaccount_Master Where Aggregatebalance='Y')
And upper(B.Scheme_Type) In (Select Scheme_Type From _Schema_Type_Master Where
Depository_Account = 'Y') Group By C.Cif;
Rec_Bulkupdate Bulkupdate%Rowtype;
I am using this query to sum account balances for different CIFs and sources. I want to calculate four different types of sum based on _Schema_Type_Master. For example, I now want to check current_account='Y' instead of Depository_Account='Y':
_ACCOUNTS_STAGING2 B JOIN _RELATION_STAGING2 C
ON B.ACCOUNT_IDENTIFICATION_NUMBER = C.ACCOUNT_IDENTIFICATION_NUMBER AND B.SOURCEID=C.SOURCEID JOIN _CUSTOMER_STAGING2 A ON A.CIF=C.CIF AND A.SOURCEID=C.SOURCEID WHERE C.ROLE_ON_ACCOUNT IN
(Select Rollonaccount From _Roleaccount_Master Where Aggregatebalance='Y')
And upper(B.Scheme_Type) In (Select Scheme_Type From _Schema_Type_Master Where
current_account='Y') Group By C.Cif;
Rec_Bulkupdate Bulkupdate%Rowtype;
Is there any way to do this, or do I need to write four different cursors?
You can remove the Depository_Account='Y' and current_account='Y' conditions and use CASE in the SELECT instead. For the CASE expressions to see those flag columns, _Schema_Type_Master has to be joined into the main query (for example on upper(B.Scheme_Type) matching its Scheme_Type) rather than used only inside the IN subquery:
SELECT SUM(CASE WHEN Depository_Account = 'Y' THEN B.ACCOUNT_BALANCE ELSE 0 END) AS DEPOSITORY_ACCOUNT_BALANCE,
SUM(CASE WHEN current_account = 'Y' THEN B.ACCOUNT_BALANCE ELSE 0 END) AS CURRENT_ACCOUNT_BALANCE
Then add the rest of your query. You will get two separate columns: the sum for depository accounts and the sum for current accounts.
If the filter on Depository_Account='Y' and current_account='Y' is still required, combine the two conditions in the WHERE clause with OR:
AND (Depository_Account='Y' or current_account='Y')
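Here is a small sketch of the single-query idea, run in SQLite through Python's sqlite3 module. The tables accounts and schema_type_master, their columns and their data are hypothetical simplifications of the staging and master tables; the uppercase codes in the master table are an assumption suggested by the upper(B.Scheme_Type) call in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-ins for the staging and master tables.
conn.execute("CREATE TABLE accounts (cif TEXT, scheme_type TEXT, account_balance REAL)")
conn.execute(
    "CREATE TABLE schema_type_master "
    "(scheme_type TEXT, depository_account TEXT, current_account TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("C1", "sba", 100.0), ("C1", "caa", 50.0), ("C2", "caa", 25.0), ("C2", "tda", 10.0)],
)
conn.executemany(
    "INSERT INTO schema_type_master VALUES (?, ?, ?)",
    [("SBA", "Y", "N"), ("CAA", "N", "Y"), ("TDA", "N", "N")],
)

# Joining the master table makes its flag columns visible to the CASE
# expressions, so one query returns one column per kind of sum.
rows = conn.execute("""
    SELECT a.cif,
           SUM(CASE WHEN m.depository_account = 'Y' THEN a.account_balance ELSE 0 END)
               AS depository_account_balance,
           SUM(CASE WHEN m.current_account = 'Y' THEN a.account_balance ELSE 0 END)
               AS current_account_balance
    FROM accounts a
    JOIN schema_type_master m ON upper(a.scheme_type) = m.scheme_type
    WHERE m.depository_account = 'Y' OR m.current_account = 'Y'
    GROUP BY a.cif
    ORDER BY a.cif
""").fetchall()
for row in rows:
    print(row)
```

Adding two more SUM(CASE ...) columns in the same way would cover all four kinds of sum in one cursor.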

Reference parent query column in subquery (Oracle)

How can I reference a column outside of a subquery using Oracle? I specifically need to use it in the WHERE clause of the subquery.
Basically I have this:
SELECT Item.ItemNo, Item.Group
FROM Item
LEFT OUTER JOIN (SELECT Attribute.Group, COUNT(1) CT
FROM Attribute
WHERE Attribute.ItemNo=12345) A ON A.Group = Item.Group
WHERE Item.ItemNo=12345
I'd like to change WHERE Attribute.ItemNo=12345 to WHERE Attribute.ItemNo=Item.ItemNo in the subquery, but I can't figure out if this is possible. I keep getting "ORA-00904: 'Item'.'ItemNo': Invalid Identifier"
EDIT:
Ok, this is why I need this kind of structure:
I want to be able to get a count of the "Error" records (where the item is missing a value) and the "OK" records (where the item has a value).
The way I have set it up in the fiddle returns the correct data. I think I might just end up filling in the value in each of the subqueries, since this would probably be the easiest way. Sorry if my data structures are a little convoluted. I can explain if need be.
My tables are:
create table itemcountry(
itemno number,
country nchar(3),
imgroup varchar2(10),
imtariff varchar2(20),
exgroup varchar2(10),
extariff varchar2(20) );
create table itemattribute(
attributeid varchar2(10),
tariffgroup varchar2(10),
tariffno varchar2(10) );
create table icav(
itemno number,
attributeid varchar2(10),
value varchar2(10) );
and my query so far is:
select itemno, country, imgroup, imtariff, im.error "imerror", im.ok "imok", exgroup, extariff, ex.error "exerror", ex.ok "exok"
from itemcountry
left outer join (select sum(case when icav.itemno is null then 1 else 0 end) error, sum(case when icav.itemno is not null then 1 else 0 end) ok, tariffgroup, tariffno
from itemattribute ia
left outer join icav on ia.attributeid=icav.attributeid
where (icav.itemno=12345 or icav.itemno is null)
group by tariffgroup, tariffno) im on im.tariffgroup=imgroup and imtariff=im.tariffno
left outer join (select sum(case when icav.itemno is null then 1 else 0 end) error, sum(case when icav.itemno is not null then 1 else 0 end) ok, tariffgroup, tariffno
from itemattribute ia
left outer join icav on ia.attributeid=icav.attributeid
where (icav.itemno=12345 or icav.itemno is null)
group by tariffgroup, tariffno) ex on ex.tariffgroup=exgroup and extariff=ex.tariffno
where itemno=12345;
It's also set up in a SQL Fiddle.
You can do it in a correlated sub-query, but not in a derived table that you join to. In your case I don't see any need to: you can put the condition in the join instead.
select i.itemno, i.group
from item i
left outer join ( select group, itemno
from attribute b
group by group, itemno ) a
on a.group = i.group
and i.itemno = a.itemno
where i.itemno = 12345
The optimizer is built to deal with this sort of situation, so utilise it!
I've replaced the count(1) with a GROUP BY, as you need to group by all columns that aren't aggregated.
I'm assuming that your actual query is more complicated than this, as with the columns you're selecting this is probably equivalent to
select itemno, group
from item
where itemno = 12345
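The join-condition rewrite from this answer can be tried in SQLite via Python's sqlite3 module. The tables and rows are hypothetical, grp stands in for the reserved word group, and a.ct is selected so the per-item count is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; "grp" stands in for the question's "group" column,
# which is a reserved word.
conn.execute("CREATE TABLE item (itemno INTEGER, grp TEXT)")
conn.execute("CREATE TABLE attribute (itemno INTEGER, grp TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(12345, "G1"), (67890, "G1")])
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?)",
    [(12345, "G1"), (12345, "G1"), (67890, "G1")],
)

# The derived table cannot see i.itemno, so it groups by itemno as well and
# the correlation moves into the ON clause.
rows = conn.execute("""
    SELECT i.itemno, i.grp, a.ct
    FROM item i
    LEFT OUTER JOIN (SELECT grp, itemno, COUNT(*) AS ct
                     FROM attribute
                     GROUP BY grp, itemno) a
      ON a.grp = i.grp
     AND a.itemno = i.itemno
    WHERE i.itemno = 12345
""").fetchall()
print(rows)  # [(12345, 'G1', 2)]
```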
You could also write your sub-query with an analytic function instead, something like count(*) over (partition by group).
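A sketch of the analytic variant, again in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The attribute rows and the grp column name are hypothetical. Unlike GROUP BY, COUNT(*) OVER keeps every row and attaches the size of its group to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; "grp" replaces the reserved word "group".
conn.execute("CREATE TABLE attribute (itemno INTEGER, grp TEXT)")
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?)",
    [(12345, "G1"), (12345, "G1"), (12345, "G2"), (67890, "G1")],
)

# Each row keeps its identity and gains the row count of its grp partition.
rows = conn.execute("""
    SELECT itemno, grp, COUNT(*) OVER (PARTITION BY grp) AS ct
    FROM attribute
    ORDER BY grp, itemno
""").fetchall()
for row in rows:
    print(row)
```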
As an aside, using a keyword as a column name (in this case group) is A Bad Idea™. It can cause a lot of confusion: as you can see from the code above, there are a lot of groups in there.
So, based on your SQL Fiddle, which I've added to the question, I think you're looking for something like the following, which doesn't look much better. I suspect that, given time, I could make it simpler. On another side note, explicitly lower-casing queries is never worth the hassle it causes. I've followed your naming convention, though.
with sub_query as (
select count(*) - count(icav.itemno) as error
, count(icav.itemno) as ok
, min(itemno) over () as itemno
, tariffgroup
, tariffno
from itemattribute ia
left outer join icav
on ia.attributeid = icav.attributeid
group by icav.itemno
, tariffgroup
, tariffno
)
select ic.itemno, ic.country, ic.imgroup, ic.imtariff
, sum(im.error) as "imerror", sum(im.ok) as "imok"
, ic.exgroup, ic.extariff
, sum(ex.error) as "exerror", sum(ex.ok) as "exok"
from itemcountry ic
left outer join sub_query im
on ic.imgroup = im.tariffgroup
and ic.imtariff = im.tariffno
and ic.itemno = im.itemno
left outer join sub_query ex
on ic.exgroup = ex.tariffgroup
and ic.extariff = ex.tariffno
and ic.itemno = ex.itemno
where ic.itemno = 12345
group by ic.itemno, ic.country
, ic.imgroup, ic.imtariff
, ic.exgroup, ic.extariff
;
You can put WHERE attribute.itemno=item.itemno inside the subquery. You are going to filter the data anyway, and filtering the data inside the subquery is usually faster too.
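This works when the subquery is correlated, for example a scalar subquery in the SELECT list; an inline view in the FROM clause, as in the question, cannot see the outer alias, which is where the ORA-00904 comes from. A sketch in SQLite via Python's sqlite3 module, with hypothetical tables and data (grp replaces group):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (itemno INTEGER, grp TEXT)")
conn.execute("CREATE TABLE attribute (itemno INTEGER, grp TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(12345, "G1"), (67890, "G2")])
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?)",
    [(12345, "G1"), (12345, "G1"), (67890, "G2")],
)

# A scalar subquery in the SELECT list is logically evaluated once per outer
# row, so its WHERE clause may reference the outer alias i.
rows = conn.execute("""
    SELECT i.itemno, i.grp,
           (SELECT COUNT(*) FROM attribute a
             WHERE a.itemno = i.itemno AND a.grp = i.grp) AS ct
    FROM item i
    WHERE i.itemno = 12345
""").fetchall()
print(rows)  # [(12345, 'G1', 2)]
```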

Complex SQL query

There is a customer table. I want to get the active and inactive counts in one query. How can I do this?
SELECT count(*) as ACTIVE,
count(*) as INACTIVE
FROM V_CUSTOMER
WHERE STATUS='a' AND STATUS='i'
We can use a CASE expression to translate the two values of STATUS:
SELECT
sum(case when STATUS = 'a' then 1 else 0 end) as ACTIVE
, sum(case when STATUS = 'd' then 1 else 0 end) as DEACTIVE
FROM V_CUSTOMER
There is no need for a WHERE clause unless there is a large number of records with other values for STATUS, in which case use OR instead of AND (a single row's STATUS cannot be both values at once, so the AND version matches no rows):
WHERE STATUS='a' OR STATUS='d'
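The CASE approach, and why the original AND filter returns nothing, can be sketched in SQLite via Python's sqlite3 module. The STATUS values below are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V_CUSTOMER (STATUS TEXT)")
# Hypothetical statuses: two active, one deactivated, one other value.
conn.executemany("INSERT INTO V_CUSTOMER VALUES (?)", [("a",), ("a",), ("d",), ("x",)])

# Each SUM adds 1 only for rows whose STATUS matches its CASE branch.
active, deactive = conn.execute("""
    SELECT SUM(CASE WHEN STATUS = 'a' THEN 1 ELSE 0 END) AS ACTIVE,
           SUM(CASE WHEN STATUS = 'd' THEN 1 ELSE 0 END) AS DEACTIVE
    FROM V_CUSTOMER
    WHERE STATUS = 'a' OR STATUS = 'd'
""").fetchone()
print(active, deactive)  # 2 1

# The question's AND filter can never be true for a single row.
(and_count,) = conn.execute(
    "SELECT COUNT(*) FROM V_CUSTOMER WHERE STATUS = 'a' AND STATUS = 'd'"
).fetchone()
print(and_count)  # 0
```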
Try using group by:
SELECT count(*), STATUS FROM V_CUSTOMER
Where STATUS='a' OR STATUS='d'
GROUP BY STATUS
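The GROUP BY variant returns one row per status instead of one row with two columns. A sketch in SQLite via Python's sqlite3 module, with hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V_CUSTOMER (STATUS TEXT)")
# Hypothetical statuses.
conn.executemany("INSERT INTO V_CUSTOMER VALUES (?)", [("a",), ("a",), ("d",), ("x",)])

# GROUP BY yields one (count, status) row per matching STATUS value.
rows = conn.execute("""
    SELECT COUNT(*), STATUS FROM V_CUSTOMER
    WHERE STATUS = 'a' OR STATUS = 'd'
    GROUP BY STATUS
    ORDER BY STATUS
""").fetchall()
print(rows)  # [(2, 'a'), (1, 'd')]
```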
SELECT count(decode(status,'a',1)) as ACTIVE,
count(decode(status,'d',1)) as DEACTIVE
FROM V_CUSTOMER
WHERE STATUS='a' or STATUS='d'
I think you'll need something like this:
select Status, count(*) from V_Customer
where STATUS='a' or STATUS='d'
group by STATUS
This will give you the number of records per status.
