hive - inserting rows for different column value

I'm not sure how to summarize this question in the title, so here is an example.
I have a Hive table which contains two columns, ID and Date:
ID Date
31 01-01-2017
31 01-02-2017
31 01-03-2017
123 01-01-2017
123 01-01-2017
...
I would like to add a third column to this table, Hour, as shown below:
ID Date Hour
31 01-01-2017 00
31 01-01-2017 01
31 01-01-2017 02
31 01-01-2017 03
31 01-01-2017 04
...
31 01-01-2017 23
31 01-02-2017 00
31 01-02-2017 01
...
Basically, for every existing row, I would like to generate 24 rows with an Hour column holding the values 00 to 23.
Can this be achieved using Hive?
Thank you so much.

You could create a temporary table containing the values 0 to 23 and cross join it with the table you have. Alternatively, you can define a CTE with the values 0 to 23 and cross join with that.
An example:
with temp as (
select 0 hour union all
select 1 hour union all
select 2 hour union all
select 3 hour union all
select 4 hour union all
select 5 hour union all
select 6 hour union all
select 7 hour union all
select 8 hour union all
select 9 hour union all
select 10 hour union all
select 11 hour union all
select 12 hour union all
select 13 hour union all
select 14 hour union all
select 15 hour union all
select 16 hour union all
select 17 hour union all
select 18 hour union all
select 19 hour union all
select 20 hour union all
select 21 hour union all
select 22 hour union all
select 23 hour
)
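-- your_table stands in for the real table name (TABLE itself is a reserved word)
select t.*, lpad(cast(temp.hour as string), 2, '0') as hour
from your_table t cross join temp
You can also insert the result into a table to persist it. Hope it helps.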
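To make the effect of the cross join concrete, here is a minimal sketch in Python rather than Hive. The sample rows come from the question; the variable names are only illustrative, and the zero-padded hour strings are an assumption based on the desired output.

```python
from itertools import product

# Sample rows from the question: (ID, Date)
rows = [(31, "01-01-2017"), (31, "01-02-2017"), (123, "01-01-2017")]

# Counterpart of the 24-row `temp` CTE, zero-padded as in the desired output
hours = [f"{h:02d}" for h in range(24)]

# Cross join: every input row is paired with every hour value
expanded = [(row_id, day, hour) for (row_id, day), hour in product(rows, hours)]

for record in expanded[:3]:
    print(record)
```

Each input row yields exactly 24 output rows, which is why no join condition is needed.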

Related

Adding a row to a resultset if it does not exist using Oracle Model Clause

Given the table
D V
--------------
2019-03-02, 13
2019-10-28, 12
2019-11-22, 34
2020-01-18, 21
2020-04-11, 39
I want to add a record with date 2019-12-31 replicating the last one partitioning by year
2019-03-02, 13
2019-10-28, 12
2019-11-22, 34
2019-12-31, 34 <<
2020-01-18, 21
2020-04-11, 39
2020-12-31, 39 <<
How can this be made using the Model Clause? I cannot even figure out where to start.

After reading through examples of the MODEL clause, studying its syntax, and trying to work out how to insert rows into a model, I concluded that solving your question with a MODEL clause is not easy (and possibly impossible).
However, if you are willing to use a method suited to the problem (rather than forcing the MODEL clause into something it was not really designed for), you can use a recursive subquery factoring clause:
WITH bounds (d, v, next_d) AS (
  SELECT d,
         v,
         LEAD(d, 1, SYSDATE) OVER ( ORDER BY d )
  FROM   table_name
UNION ALL
  SELECT ADD_MONTHS(TRUNC(d + INTERVAL '1' DAY, 'YY'), 12) - INTERVAL '1' DAY,
         v,
         next_d
  FROM   bounds
  WHERE  ADD_MONTHS(TRUNC(d + INTERVAL '1' DAY, 'YY'), 12) - INTERVAL '1' DAY
           < next_d
)
SEARCH DEPTH FIRST BY d SET d_order
SELECT d, v
FROM   bounds;
Which, for your sample data:
CREATE TABLE table_name (D, V) AS
SELECT DATE '2019-03-02', 13 FROM DUAL UNION ALL
SELECT DATE '2019-10-28', 12 FROM DUAL UNION ALL
SELECT DATE '2019-11-22', 34 FROM DUAL UNION ALL
SELECT DATE '2020-01-18', 21 FROM DUAL UNION ALL
SELECT DATE '2020-04-11', 39 FROM DUAL
Outputs:
D                    V
-------------------  --
2019-03-02 00:00:00  13
2019-10-28 00:00:00  12
2019-11-22 00:00:00  34
2019-12-31 00:00:00  34
2020-01-18 00:00:00  21
2020-04-11 00:00:00  39
2020-12-31 00:00:00  39
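The same gap-filling logic can be sketched procedurally. This is a rough Python illustration under stated assumptions, not a translation of the Oracle query: `fill_year_ends` and `next_year_end` are hypothetical helper names, and the `until` parameter stands in for SYSDATE, which the query uses as the upper bound for the last row. Because of that bound, running the query in a later year would also add later year-end rows after the last record.

```python
from datetime import date

def next_year_end(d):
    # Dec 31 of d's year, or of the following year when d is already Dec 31
    end = date(d.year, 12, 31)
    return end if d < end else date(d.year + 1, 12, 31)

def fill_year_ends(rows, until):
    """rows: (date, value) pairs sorted by date; `until` plays the role of SYSDATE."""
    result = []
    for i, (d, v) in enumerate(rows):
        result.append((d, v))
        # Counterpart of LEAD(d, 1, SYSDATE): the next row's date, or the upper bound
        next_d = rows[i + 1][0] if i + 1 < len(rows) else until
        candidate = next_year_end(d)
        while candidate < next_d:
            # Repeat the current value on each year end that falls before the next row
            result.append((candidate, v))
            candidate = next_year_end(candidate)
    return result

rows = [
    (date(2019, 3, 2), 13),
    (date(2019, 10, 28), 12),
    (date(2019, 11, 22), 34),
    (date(2020, 1, 18), 21),
    (date(2020, 4, 11), 39),
]
filled = fill_year_ends(rows, until=date(2021, 6, 1))
for d, v in filled:
    print(d.isoformat(), v)
```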

Identify Most Recent Record on Yearly Snapshot by partition of Arrange Id

I have a scenario like the one below and want to set an indicator based on Arrange Id and Login Date. If a user logs in to the website multiple times in a calendar year, the most recent record should be set to 'Y' and the others to 'N'. I also need to set the indicator as in the bottom two rows: for example, 1121221 was last accessed in the previous year on 12/13/2017, so that row needs 'Y', and if the user accessed the site in the immediately following year (1/12/2018), that row gets 'Y' as well.
[Image of the sample data]

Here's one option. What does it do?
- The TEST CTE contains some sample rows. Note ARRANGE_ID = 999, which has dates from 2017 and 2019; those years are not consecutive, so the date in 2019 should get the indicator 'N'. You didn't say, though, what should happen if there is yet another date in 2019: would both of them get 'N', or would the max login date still get a 'Y'?
- The INTER CTE uses the MAX analytic function to find the maximum login date for the year, and the LAG analytic function to return the previous login date (so that I can check whether the years are consecutive).
- The final query uses CASE to decide whether a row satisfies the conditions that make the indicator 'Y'.
Here you go:
with test (arrange_id, login_date) as
  (select 234, date '2017-02-18' from dual union all
   select 234, date '2017-04-13' from dual union all
   select 234, date '2017-11-14' from dual union all
   select 234, date '2018-01-14' from dual union all
   select 234, date '2018-09-11' from dual union all
   select 234, date '2019-04-02' from dual union all
   select 234, date '2019-05-18' from dual union all
   select 112, date '2017-02-23' from dual union all
   select 112, date '2017-12-13' from dual union all
   select 112, date '2018-01-12' from dual union all
   select 999, date '2017-01-01' from dual union all
   select 999, date '2017-05-25' from dual union all
   select 999, date '2019-01-01' from dual
  ),
inter as
  (select arrange_id,
          login_date,
          max(login_date) over
            (partition by arrange_id, extract (year from login_date)) maxdate,
          lag(login_date) over (partition by arrange_id order by login_date) prev_date
   from test
  )
select arrange_id,
       login_date,
       case when login_date = maxdate and
                 extract(year from login_date) - extract(year from prev_date) <= 1 then 'Y'
            else 'N'
       end indicator
from inter
order by arrange_id, login_date;
ARRANGE_ID LOGIN_DATE I
---------- ---------- -
112 02/23/2017 N
112 12/13/2017 Y -- Y because it is MAX in 2017
112 01/12/2018 Y -- Y because it is MAX in 2018 and 2018 follows 2017
234 02/18/2017 N
234 04/13/2017 N
234 11/14/2017 Y -- Y because it is MAX in 2017
234 01/14/2018 N
234 09/11/2018 Y -- Y because it is MAX in 2018 and 2018 follows 2017
234 04/02/2019 N
234 05/18/2019 Y -- Y because it is MAX in 2019 and 2019 follows 2018
999 01/01/2017 N
999 05/25/2017 Y -- Y because it is MAX in 2017
999 01/01/2019 N -- N because it is MAX in 2019, but 2019 doesn't follow 2017
13 rows selected.
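For readers who want to trace the rule row by row, here is a minimal Python sketch of the same logic. `flag_logins` is a hypothetical helper name; the handling of a missing previous login mirrors what LAG does in the query (a NULL makes the comparison fail, so the row gets 'N'), which is an assumption about the intended behaviour rather than something the question states.

```python
from datetime import date
from itertools import groupby

def flag_logins(logins):
    """logins: iterable of (arrange_id, login_date); returns (arrange_id, login_date, 'Y' or 'N')."""
    out = []
    for arrange_id, group in groupby(sorted(logins), key=lambda r: r[0]):
        dates = [d for _, d in group]
        # Counterpart of MAX(login_date) OVER (PARTITION BY arrange_id, year)
        max_per_year = {}
        for d in dates:
            max_per_year[d.year] = max(d, max_per_year.get(d.year, d))
        prev = None  # counterpart of LAG(login_date)
        for d in dates:
            is_max = d == max_per_year[d.year]
            # No previous login behaves like NULL in the query: the test fails
            consecutive = prev is not None and d.year - prev.year <= 1
            out.append((arrange_id, d, "Y" if is_max and consecutive else "N"))
            prev = d
    return out

logins = [
    (234, date(2017, 2, 18)), (234, date(2017, 4, 13)), (234, date(2017, 11, 14)),
    (234, date(2018, 1, 14)), (234, date(2018, 9, 11)), (234, date(2019, 4, 2)),
    (234, date(2019, 5, 18)), (112, date(2017, 2, 23)), (112, date(2017, 12, 13)),
    (112, date(2018, 1, 12)), (999, date(2017, 1, 1)), (999, date(2017, 5, 25)),
    (999, date(2019, 1, 1)),
]
flags = flag_logins(logins)
for arrange_id, login_date, indicator in flags:
    print(arrange_id, login_date.strftime("%m/%d/%Y"), indicator)
```

One consequence worth noting: an ARRANGE_ID whose first year contains a single login gets 'N' for that row, because there is no previous login to compare against.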

How to get one query about sales set on default date?

I have two tables, T_TEST and T_DEFAULT_DATE. T_TEST contains date and amount, and T_DEFAULT_DATE contains just P_DATE.
First table T_TEST:
DATE AMOUNT
-------- ----------
01.01.99 77
16.02.99 59
01.01.00 12
15.01.00 32
01.02.00 144
15.02.00 320
16.02.00 521
01.03.00 98
15.03.00 76
16.03.00 33
01.01.01 65
15.01.01 78
01.02.01 95
15.02.01 39
16.02.01 97
02.02.02 63
07.03.02 75
And second table T_DEFAULT_DATE:
P_DATE
--------
16.02.01
What I want is to get three values in a single query:
1. the amount of sales achieved on the same day last year (- 12 months)
2. the amount of sales for the whole past year (based on table T_DEFAULT_DATE)
3. the amount (sum) for the whole month (default month: 01.02.2001 - 28.02.2001)
Expected output is:
P_SDLY P_LY P_MS
------ ---- ----
521    1236 231
I tried add_months(t_default_date.p_date, -12), but I didn't get the expected result. Please help.

You can try something like this, assuming that your fields are stored in date columns.
with t_test(date_, amount) as
(
  select to_date('01.01.99', 'dd.mm.rr'), 77 from dual union all
  select to_date('16.02.99', 'dd.mm.rr'), 59 from dual union all
  select to_date('01.01.00', 'dd.mm.rr'), 12 from dual union all
  select to_date('15.01.00', 'dd.mm.rr'), 32 from dual union all
  select to_date('01.02.00', 'dd.mm.rr'), 144 from dual union all
  select to_date('15.02.00', 'dd.mm.rr'), 320 from dual union all
  select to_date('16.02.00', 'dd.mm.rr'), 521 from dual union all
  select to_date('01.03.00', 'dd.mm.rr'), 98 from dual union all
  select to_date('15.03.00', 'dd.mm.rr'), 76 from dual union all
  select to_date('16.03.00', 'dd.mm.rr'), 33 from dual union all
  select to_date('01.01.01', 'dd.mm.rr'), 65 from dual union all
  select to_date('15.01.01', 'dd.mm.rr'), 78 from dual union all
  select to_date('01.02.01', 'dd.mm.rr'), 95 from dual union all
  select to_date('15.02.01', 'dd.mm.rr'), 39 from dual union all
  select to_date('16.02.01', 'dd.mm.rr'), 97 from dual union all
  select to_date('02.02.02', 'dd.mm.rr'), 63 from dual union all
  select to_date('07.03.02', 'dd.mm.rr'), 75 from dual
),
t_default_date(p_date) as
(
  select to_date('16.02.01', 'dd.mm.rr') from dual
)
select sum(
         case
           when date_ between add_months(trunc(p_date, 'yyyy'), -12)
                          and trunc(p_date, 'yyyy')-1
           then amount
           else 0
         end
       ) as year,
       sum( decode (date_, add_months(p_date, -12), amount, 0) ) as day,
       sum( case
              when date_ between
                     trunc(p_date, 'MM') and
                     last_day(p_date)
              then amount
              else
                0
            end
       ) as month
from t_test
inner join t_default_date on (date_ between add_months(trunc(p_date, 'yyyy'), -12) and last_day(p_date) );
YEAR DAY MONTH
---------- ---------- ----------
1236 521 231
This uses add_months to get exactly "one year ago" on the calendar; if you need "365 days ago" instead (the two differ when a leap day falls in between), consider using something like p_date - 365.
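As a cross-check of the three conditional sums, here is a small Python sketch over the same sample data. The helper names are hypothetical, and `add_months` is only a rough stand-in for Oracle's ADD_MONTHS: it clamps the day to the end of the target month but does not reproduce Oracle's rule of mapping a month-end date to a month-end date.

```python
import calendar
from datetime import date, timedelta

def add_months(d, months):
    """Rough stand-in for Oracle ADD_MONTHS: shift by whole months, clamping the day."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    last = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last))

def sales_summary(rows, p_date):
    same_day_last_year = add_months(p_date, -12)
    month_start = p_date.replace(day=1)
    month_end = p_date.replace(day=calendar.monthrange(p_date.year, p_date.month)[1])
    p_sdly = sum(a for d, a in rows if d == same_day_last_year)    # same day last year
    p_ly = sum(a for d, a in rows if d.year == p_date.year - 1)    # whole previous year
    p_ms = sum(a for d, a in rows if month_start <= d <= month_end)  # whole default month
    return p_sdly, p_ly, p_ms

t_test = [
    (date(1999, 1, 1), 77), (date(1999, 2, 16), 59), (date(2000, 1, 1), 12),
    (date(2000, 1, 15), 32), (date(2000, 2, 1), 144), (date(2000, 2, 15), 320),
    (date(2000, 2, 16), 521), (date(2000, 3, 1), 98), (date(2000, 3, 15), 76),
    (date(2000, 3, 16), 33), (date(2001, 1, 1), 65), (date(2001, 1, 15), 78),
    (date(2001, 2, 1), 95), (date(2001, 2, 15), 39), (date(2001, 2, 16), 97),
    (date(2002, 2, 2), 63), (date(2002, 3, 7), 75),
]
p_date = date(2001, 2, 16)
summary = sales_summary(t_test, p_date)
print(summary)

# Subtracting 365 days lands on a different day because 2000 has a leap day
print(add_months(p_date, -12), p_date - timedelta(days=365))
```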

how to convert dates to week numbers

I need to report on a week-by-week basis, but the week numbers should start from the first day of the month.
Here is my sample data:
report_date Vol
01 nov 2014 23
03 nov 2014 34
16 nov 2014 56
30 nov 2014 44
Desired output
Week no Vol
1 57
2 56
3 0
4 44
I hope that's clear.
Thanks

Since your desired output includes "zero" rows as well, and assuming you'd like this report to work across multiple months:
WITH sample_data AS
  (SELECT DATE '2014-11-01' AS report_date, 23 AS vol FROM DUAL
   UNION ALL SELECT DATE '2014-11-03', 34 FROM DUAL
   UNION ALL SELECT DATE '2014-11-16', 56 FROM DUAL
   UNION ALL SELECT DATE '2014-11-30', 44 FROM DUAL)
,weeks AS
  (SELECT report_month
         ,TO_CHAR(LEVEL) AS week_no
   FROM (SELECT DISTINCT
                TRUNC(report_date,'MM') AS report_month
         FROM sample_data)
   -- one row per week of each month; the PRIOR conditions keep each month's rows separate
   CONNECT BY LEVEL <= TO_NUMBER(TO_CHAR(LAST_DAY(report_month),'W'))
          AND PRIOR report_month = report_month
          AND PRIOR SYS_GUID() IS NOT NULL)
SELECT TO_CHAR(weeks.report_month,'Month') AS "Month"
      ,weeks.week_no AS "Week no"
      ,NVL(sum(sample_data.vol),0) AS "Vol"
FROM weeks
LEFT JOIN sample_data
  ON weeks.report_month = TRUNC(report_date,'MM')
 AND weeks.week_no = to_char(report_date,'W')
GROUP BY weeks.report_month, weeks.week_no
ORDER BY weeks.report_month, weeks.week_no;
We determine the number of weeks in each month of the source data by applying the 'W' format to LAST_DAY, and we use a hierarchical query (CONNECT BY LEVEL <= n) to generate one row for each week in each month. The PRIOR conditions restrict each hierarchy to its own month, and LEVEL (rather than ROWNUM) restarts the week number at 1 for every month, so the query also works when the data spans several months.
The expected output should be:
Month Week no Vol
======== ======= ===
November 1 57
November 2 0
November 3 56
November 4 0
November 5 44

select to_char(report_date, 'W'), sum(vol)
from your_table
group by to_char(report_date, 'W');
The W format element is the week of the month (1-5), where week 1 starts on the first day of the month and ends on the seventh.
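The week-of-month rule described above is simple integer arithmetic, so it can be sketched outside the database as well. The following Python illustration uses hypothetical helper names and fills in the empty weeks, as the first answer does; it assumes the 'W' definition quoted above (days 1-7 are week 1, days 8-14 are week 2, and so on).

```python
import calendar
from collections import defaultdict
from datetime import date

def week_of_month(d):
    # Days 1-7 -> week 1, days 8-14 -> week 2, ... (same rule as the 'W' format element)
    return (d.day - 1) // 7 + 1

def weekly_volumes(rows):
    """rows: (report_date, vol) pairs; returns (year, month, week_no, total_vol) tuples."""
    totals = defaultdict(int)
    months = set()
    for d, vol in rows:
        totals[(d.year, d.month, week_of_month(d))] += vol
        months.add((d.year, d.month))
    report = []
    for year, month in sorted(months):
        last_day = calendar.monthrange(year, month)[1]
        weeks_in_month = (last_day - 1) // 7 + 1
        # Emit every week of the month, using 0 for weeks without data
        for week in range(1, weeks_in_month + 1):
            report.append((year, month, week, totals.get((year, month, week), 0)))
    return report

sample = [
    (date(2014, 11, 1), 23),
    (date(2014, 11, 3), 34),
    (date(2014, 11, 16), 56),
    (date(2014, 11, 30), 44),
]
report = weekly_volumes(sample)
for row in report:
    print(row)
```

Under this rule 16 Nov falls in week 3 and 30 Nov in week 5, which matches the five-row output of the first answer rather than the four rows in the question.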

Get Gap between time range

In the WORK_TIME column of my database table (EMP_WORKS), I have the records below.
WORK_TIME
19:03:00
20:00:00
21:02:00
21:54:00
23:04:00
00:02:00
I want to create a database view from this data. For that, I need to get the gap between consecutive times, as below.
WORK_TIME GAP
19:03:00 -
20:00:00 00:57:00 (Gap between 19:03:00 and 20:00:00)
21:02:00 01:02:00 (Gap between 20:00:00 and 21:02:00)
21:54:00 00:52:00 (Gap between 21:02:00 and 21:54:00)
23:04:00 01:10:00 (Gap between 21:54:00 and 23:04:00)
00:02:00 00:58:00 (Gap between 23:04:00 and 00:02:00)
How can I do this?

This query will get you the differences in hours:
SELECT
work_time,
( work_time - LAG(work_time) OVER (ORDER BY work_time) ) * 24 AS gap
FROM emp_works
Example on SQL Fiddle returns this:
WORK_TIME GAP
November, 07 2012 19:03:00+0000 (null)
November, 07 2012 20:00:00+0000 0.95
November, 07 2012 21:02:00+0000 1.033333333333
November, 07 2012 21:54:00+0000 0.866666666667
November, 07 2012 23:04:00+0000 1.166666666667
November, 08 2012 00:02:00+0000 0.966666666667

First, you will need a primary key in the table containing the DATE/TIME field.
I have represented the gap in hours between the two times. You can scale the figure to represent minutes, days, or whatever you need.
SELECT
  TO_CHAR(A.WORK_TIME,'HH24:MI:SS') WORK_FROM,
  TO_CHAR(B.WORK_TIME,'HH24:MI:SS') WORK_TO,
  ROUND(24*(B.WORK_TIME-A.WORK_TIME),2) GAP
FROM
  sample A,
  sample B
WHERE A.ID+1 = B.ID(+)
If consecutive primary key values can differ by more than 1 (that is, the key itself has gaps), you will need to look up the next ID dynamically, like this:
SELECT
  TO_CHAR(A.WORK_TIME,'HH24:MI:SS') WORK_FROM,
  TO_CHAR(B.WORK_TIME,'HH24:MI:SS') WORK_TO,
  ROUND(24*(B.WORK_TIME-A.WORK_TIME),2) GAP
FROM
  sample A,
  sample B
WHERE B.ID = (SELECT MIN(C.ID) FROM sample C WHERE C.ID > A.ID)

Judging by the desired result in your question, you want to see a time interval. I also assume that the WORK_TIME column is of the DATE datatype and includes a date part; otherwise, subtracting the previous WORK_TIME value from 00:02:00 would give a negative result.
create table Work_times(
  work_time
) as
(
  select to_date('01.01.2012 19:03:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
  select to_date('01.01.2012 20:00:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
  select to_date('01.01.2012 21:02:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
  select to_date('01.01.2012 21:54:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
  select to_date('01.01.2012 23:04:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
  select to_date('02.01.2012 00:02:00', 'dd.mm.yyyy hh24:mi:ss') from dual
)
/
Table created

select to_char(t.work_time, 'hh24.mi.ss') work_time
     , (t.work_time -
        lag(t.work_time) over(order by WORK_TIME)) day(1) to second(0) Res
  from work_times t
;
WORK_TIME RES
--------- -------------------------------------------------------------------------------
19.03.00
20.00.00 +0 00:57:00
21.02.00 +0 01:02:00
21.54.00 +0 00:52:00
23.04.00 +0 01:10:00
00.02.00 +0 00:58:00
6 rows selected
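If the column really held only a time of day, the midnight rollover would have to be handled explicitly. The following Python sketch illustrates that case under a stated assumption: the values are already in chronological order, and a time smaller than its predecessor means the clock moved into the next day. `gaps` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def gaps(times):
    """times: 'HH:MM:SS' strings in chronological order, possibly crossing midnight."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    result = [(times[0], None)]  # the first row has no predecessor, like LAG returning NULL
    for prev, cur, label in zip(parsed, parsed[1:], times[1:]):
        delta = cur - prev
        if delta < timedelta(0):
            # Assumption: a smaller time of day means we rolled over to the next day
            delta += timedelta(days=1)
        result.append((label, str(delta)))
    return result

work_times = ["19:03:00", "20:00:00", "21:02:00", "21:54:00", "23:04:00", "00:02:00"]
work_gaps = gaps(work_times)
for work_time, gap in work_gaps:
    print(work_time, gap if gap is not None else "-")
```

When the column carries a real date part, as assumed in the answer above, plain subtraction is enough and no rollover rule is needed.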
