Hive query counts of fields where fields are populated - hadoop

I have a huge Hive table consisting of ten product fields, date fields for the purchases, and an identifier. The product fields are named like prod1, prod2, ... , prod10 and refer to the last ten products purchased. For most IDs, we don't have purchase history all the way back to ten products.
I'd like to construct a distribution of population rates for each of the prod<X> fields, to show the breakdown of purchase history across the entire dataset.
Currently, I'm running a bash script that runs ten consecutive queries against the table like:
hive -e "select count(1) from db.tbl where prod<X> != '';"
... and saving the output to a file. This seems clunky and inefficient. Is there a better way to specify Hive counts on a range of fields with a range of field conditions? I've tried to come up with a strategy using groupby or even mapping a range of fields, but can't quite wrap my head around specifying the != '' condition for each field.
Thanks in advance for any direction.

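-- One pass over the table; each sum counts the rows where that field is populated.
-- No group by, so the result is a single row of totals for the whole dataset.
select
sum(case when prod1 != '' then 1 else 0 end) as prod1_cnt,
sum(case when prod2 != '' then 1 else 0 end) as prod2_cnt,
sum(case when prod3 != '' then 1 else 0 end) as prod3_cnt,
sum(case when prod4 != '' then 1 else 0 end) as prod4_cnt,
sum(case when prod5 != '' then 1 else 0 end) as prod5_cnt,
sum(case when prod6 != '' then 1 else 0 end) as prod6_cnt,
sum(case when prod7 != '' then 1 else 0 end) as prod7_cnt,
sum(case when prod8 != '' then 1 else 0 end) as prod8_cnt,
sum(case when prod9 != '' then 1 else 0 end) as prod9_cnt,
sum(case when prod10 != '' then 1 else 0 end) as prod10_cnt
from db.tbl;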
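Since the ten expressions differ only in the column name, the query text can also be generated instead of typed out. The following is a minimal sketch, not a Hive-tested script: build_count_query is a hypothetical helper, the prod<X> naming is taken from the question, and SQLite's in-memory engine stands in for Hive purely to show the query shape on three columns.

```python
import sqlite3

def build_count_query(table, n_fields):
    """Build one query that counts non-empty prod1..prodN values in a single scan."""
    # Hypothetical helper: column names follow the prod<X> pattern from the question.
    cols = ",\n  ".join(
        f"sum(case when prod{i} != '' then 1 else 0 end) as prod{i}_cnt"
        for i in range(1, n_fields + 1)
    )
    return f"select\n  {cols}\nfrom {table}"

# Tiny stand-in table (SQLite, not Hive) with three product fields.
conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (id integer, prod1 text, prod2 text, prod3 text)")
conn.executemany(
    "insert into tbl values (?, ?, ?, ?)",
    [(1, "a", "b", "c"), (2, "a", "b", ""), (3, "a", "", "")],
)

query = build_count_query("tbl", 3)
counts = conn.execute(query).fetchone()  # one row, one count per field
print(query)
print(counts)
```

The generated text for ten fields, build_count_query("db.tbl", 10), could then be passed to a single hive -e call in place of the ten separate queries.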

Related

CASE statement returning the same value regardless of input

I have a CASE expression that returns 2 regardless of the value of v_trcdate. v_trcdate is supposed to come from a table value, but I've also tried entering the value manually and still get the same result.
Declare v_trcdate DATE := '10-JAN-23';
(CASE WHEN v_accesstype = 'Y' THEN
(CASE WHEN TRUNC(sysdate) < v_trcdate THEN 1 WHEN TRUNC(sysdate) > v_trcdate THEN 2 END)
ELSE 3 END) AS PRICE

query group by with having and get results in two different columns

I have a table like this:
Name   Source  ended_status  date        Environment
House  DC      1             2019/10/03  Pro
Cat    DC2     1             2019/10/05  Pro
Pen    DC      1             2019/10/03  Pro
Pen    DC      0             2019/11/07  Pre
I would like to get:
Source  Environment  Ended_Status_with_1  Ended_Status_with_2
DC      Pro          2                    0
DC      Pre          1                    0
DC2     Pro          1                    0
So the rows must be grouped by Source and Environment, and I need to count the rows with Ended_Status 1 and the rows with Ended_Status 2, with both counts on the same line.
How could I do that?
I can write a query grouped by ended_status, but I can't get the two counts onto the same line.
Many thanks and sorry for my poor English!
Alternatively, try the query below, which uses Oracle's DECODE:
select source,environment,sum(decode(ended_status,1,1,0)) ended_status_with_1,
sum(decode(ended_status,2,1,0)) ended_status_with_2 from mytable
group by source,environment
You can use conditional aggregation:
select
source,
environment,
sum(case when ended_status = 1 then 1 else 0 end) ended_status_with_1,
sum(case when ended_status = 2 then 1 else 0 end) ended_status_with_2
from mytable
group by
source,
environment
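To see the conditional aggregation at work, here is a small sketch that loads the four sample rows from the question and runs the same query. It assumes SQLite (through Python's sqlite3 module) as a stand-in for the asker's database, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (name text, source text, ended_status integer,"
    " date text, environment text)"
)
# Sample rows copied from the question.
rows = [
    ("House", "DC", 1, "2019/10/03", "Pro"),
    ("Cat", "DC2", 1, "2019/10/05", "Pro"),
    ("Pen", "DC", 1, "2019/10/03", "Pro"),
    ("Pen", "DC", 0, "2019/11/07", "Pre"),
]
conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", rows)

result = conn.execute(
    """
    select source, environment,
           sum(case when ended_status = 1 then 1 else 0 end) as ended_status_with_1,
           sum(case when ended_status = 2 then 1 else 0 end) as ended_status_with_2
    from mytable
    group by source, environment
    order by source, environment
    """
).fetchall()
for row in result:
    print(row)
```

With these sample rows, the DC/Pre group comes out as 0 and 0, because its only row has ended_status 0.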

How to get the sum?

I have encountered a problem with my query in Oracle SQL. I don't know how to get the sum in this query:
select call_type, channel
,count (case when status="no answer" then 1 end else 0) as cnt_no_answer
,count (case when status="answered" then 1 end else 0) as cnt_answer
from app_account.cc_call;
Please help me. Thanks!
To get the numbers of answered and not-answered records, use sum instead of count. To get the number of all records that are either answered or not answered, use count(status). To get the count of all records, including those whose status is null, use count(*). Strings need single quotes, not double quotes. The CASE expression must close with END, placed after the ELSE branch.
EDITED (there were too many ENDs):
select call_type, channel
, sum(case when status='no answer' then 1 else 0 end) as cnt_no_answer
, sum(case when status='answered' then 1 else 0 end) as cnt_answer
, count(status) as cnt_all_stated
, count(*) as cnt_all_records
from app_account.cc_call
group by call_type, channel;
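The difference between the aggregates is easy to check on a few rows. This is a minimal sketch under assumptions: SQLite via Python's sqlite3 stands in for Oracle, and the four sample rows (including one with a NULL status) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table cc_call (call_type text, channel text, status text)")
# Invented sample data: two answered, one not answered, one with no status.
conn.executemany(
    "insert into cc_call values (?, ?, ?)",
    [
        ("in", "phone", "answered"),
        ("in", "phone", "answered"),
        ("in", "phone", "no answer"),
        ("in", "phone", None),
    ],
)

row = conn.execute(
    """
    select
      count(case when status = 'answered' then 1 else 0 end) as count_with_else_0,
      count(case when status = 'answered' then 1 end)        as count_without_else,
      sum(case when status = 'answered' then 1 else 0 end)   as sum_with_else_0,
      count(status)                                          as cnt_all_stated,
      count(*)                                               as cnt_all_records
    from cc_call
    """
).fetchone()
print(row)
```

COUNT counts every non-NULL value, so with ELSE 0 it counts all four rows; only the SUM form, or COUNT without an ELSE branch, yields the two answered calls.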
Try this again; I edited it:
SELECT call_type, channel,
sum (CASE WHEN status='no answer' THEN 1 ELSE 0 END) AS cnt_no_answer,
sum (CASE WHEN status='answered' THEN 1 ELSE 0 END) AS cnt_answer
FROM app_account.cc_call
GROUP BY call_type, channel;
Check this out:
SELECT CALL_TYPE,
CHANNEL,
COUNT (CASE WHEN UPPER(STATUS) = UPPER('no answer') THEN 1 ELSE NULL END)
AS CNT_NO_ANSWER,
COUNT (CASE WHEN UPPER(STATUS) = UPPER('answered') THEN 1 ELSE NULL END)
AS CNT_ANSWER
FROM APP_ACCOUNT.CC_CALL
GROUP BY CALL_TYPE, CHANNEL;
Why use END twice?
SELECT Call_type, Channel
-- COUNT only counts non-NULL values, so the CASE has no ELSE branch
,COUNT(CASE WHEN status='no answer' THEN 1 END) AS Cnt_no_answer
,COUNT(CASE WHEN status='answered' THEN 1 END) AS Cnt_answer
FROM App_account.cc_call
GROUP BY call_type, channel;
I think there may be an error in the table declaration. Please give the table structure. Also, you want a SUM but you are COUNTING there; I don't think that is an error so much as a lack of overview. Please give some more details.

Advanced MySQL query with ORDER BY

I join two tables to combine the data, as shown below:
SELECT name, price, MIN(price) AS minprice
FROM c, cp
WHERE c.id = cp.id
GROUP BY id
ORDER BY minprice = 0, minprice ASC
For example:
id  name         price
1   apple        0
1   green apple  20
2   orange       10
3   strawberry   0
In the data above, the minprice of group 1 is 0, but I don't want zero to be taken as the minimum price. Adding a HAVING minprice > 0 condition gives the wrong result, because I want my result to look like this:
2   orange       10
1   green apple  20
3   strawberry   0
Is it possible?
Here is the answer:
SELECT
    (
        SELECT name
        FROM yourtable
        WHERE price = _inner._MIN AND id = _inner.id LIMIT 1
    ) AS _NAME,
    _inner._MIN
FROM
    (
        SELECT id, IFNULL(MIN(NULLIF(price, 0)),0) AS _MIN
        FROM yourtable
        GROUP BY id
    ) AS _inner
where yourtable is the name of your table.
MIN(NULLIF(price, 0)) calculates the minimum value while ignoring zeros.
IFNULL(<...>,0) just means that we get a real zero instead of NULL in the result.
LIMIT 1 covers the case where several items share the same id and price but have different names. I think you can safely remove it if that cannot happen.
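The MIN(NULLIF(price, 0)) step can be tried on the sample data from the question. This is a sketch under assumptions: it runs on SQLite via Python's sqlite3 rather than MySQL (both provide NULLIF and IFNULL), and the outer ORDER BY is borrowed from the asker's own query to push the zero-priced group to the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (id integer, name text, price integer)")
# Sample rows copied from the question.
conn.executemany(
    "insert into yourtable values (?, ?, ?)",
    [(1, "apple", 0), (1, "green apple", 20), (2, "orange", 10), (3, "strawberry", 0)],
)

rows = conn.execute(
    """
    select id, _min
    from (
        -- NULLIF turns 0 into NULL so MIN skips it; IFNULL restores 0
        -- for groups that contain nothing but zero prices.
        select id, ifnull(min(nullif(price, 0)), 0) as _min
        from yourtable
        group by id
    ) as _inner
    order by _min = 0, _min
    """
).fetchall()
print(rows)
```

Group 1 now reports 20 instead of 0, and group 3, which only has a zero price, sorts last.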

T-SQL Use Table Variable or Sum Against Parent Table

The scenario is this: I am creating a log table that will end up being quite large, and I want to create a status table that queries the log table over different date ranges and sums the results into multiple total fields.
I plan on writing this as a stored procedure, but my question is whether I would get the best performance by reading all my records from the log table into a temp table before doing the sum operations.
I.e., I have this table:
SummaryValues
90DayValues
60DayValues
30DayValues
14DayValues
7DayValues
1DayValues
Would it be logical to take all values for the previous 90 days and insert them into a table variable before calculating the sums for the 6 fields in my summary table, or would it be just as fast to execute 6 sum statements against the log table?
Sometimes you are better off reading into a temp table first; sometimes not. It makes sense if you have multiple passes of processing on the same data.
However, if you want "last 90 days", "last 60 days", etc., then it can be done in one query.
Reading the question again, I'd just run one query and calculate all values in one go, and not bother with any intermediate tables:
SELECT
    Stuff,
    SUM(CASE WHEN dayDiff <= 90 THEN SomeValue ELSE 0 END) AS SumValue90,
    SUM(CASE WHEN dayDiff <= 60 THEN SomeValue ELSE 0 END) AS SumValue60,
    SUM(CASE WHEN dayDiff <= 30 THEN SomeValue ELSE 0 END) AS SumValue30
FROM
    (
    SELECT
        Stuff,
        SomeValue, -- must be selected here so the outer SUMs can reference it
        DATEDIFF(day, SomeData, GETDATE()) AS dayDiff
    FROM
        Mytable
    WHERE
        ...
    ) foo
GROUP BY
    ...
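The single-pass idea can be sketched end to end on a toy table. Everything here is an assumption made for illustration: the log table and its columns are invented, SQLite (via Python's sqlite3) stands in for SQL Server, julianday() arithmetic replaces DATEDIFF, and a fixed reference date replaces GETDATE() so the result is reproducible.

```python
import sqlite3

REFERENCE_DATE = "2024-03-31"  # fixed stand-in for "today"

conn = sqlite3.connect(":memory:")
conn.execute("create table log (stuff text, some_date text, some_value integer)")
conn.executemany(
    "insert into log values (?, ?, ?)",
    [
        ("a", "2024-03-30", 1),     # 1 day before the reference date
        ("a", "2024-02-15", 10),    # 45 days before
        ("a", "2024-01-16", 100),   # 75 days before
        ("a", "2023-11-01", 1000),  # 151 days before, filtered out by the WHERE
    ],
)

row = conn.execute(
    """
    select stuff,
           sum(case when day_diff <= 90 then some_value else 0 end) as sum_value_90,
           sum(case when day_diff <= 60 then some_value else 0 end) as sum_value_60,
           sum(case when day_diff <= 30 then some_value else 0 end) as sum_value_30
    from (
        select stuff, some_value,
               cast(julianday(?) - julianday(some_date) as integer) as day_diff
        from log
        where some_date >= date(?, '-90 days')  -- read only the widest window once
    ) foo
    group by stuff
    """,
    (REFERENCE_DATE, REFERENCE_DATE),
).fetchone()
print(row)
```

The inner query touches the log table once, limited to the widest window, and the outer CASE expressions split that one result set into the narrower windows.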
