How to query on a specific date and time range using Hive query language, taking input from the user? - hadoop

I have a table in a Hive database.
The table is partitioned by year, month, and day.
My query looks something like this:
select entity1,entity2
from table_t alias1
INNER JOIN tab_roll.cha alias2
ON alias1.sid = alias2.sid
INNER JOIN net_roll.net alias3
ON alias2.id=alias3.id
where event= 'unknown'
and day >= 10 and day < 12
and month >= 5 and month < 11
and year = 2014
Now I want to get results between, say, mm-dd-yyyy HH:MM:SS and mm-dd-yyyy HH:MM:SS. How should I do that?
Is it possible to have a pop-up where the user chooses the date/time range?
I don't know if this helps, but the data has about 500 million rows.
Thanks

I think BETWEEN should work for you. To optimize this, you can also index that column.
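One way to turn a user-supplied range into a filter is sketched below in Python. It is only an illustration under stated assumptions: the input format follows the mm-dd-yyyy HH:MM:SS pattern from the question, the column name event_ts is hypothetical (the question does not say which column holds the timestamp), and both helper functions are made up for this example. The idea is to list every (year, month, day) partition the range touches, so the partition columns are still filtered, and then add a BETWEEN on the timestamp itself.

```python
from datetime import datetime, timedelta


def parse_user_input(text: str) -> datetime:
    """Parse a 'mm-dd-yyyy HH:MM:SS' string as described in the question."""
    return datetime.strptime(text, "%m-%d-%Y %H:%M:%S")


def partition_predicate(start: datetime, end: datetime) -> str:
    """Build a WHERE fragment naming every (year, month, day) partition in [start, end]."""
    clauses = []
    current = start.date()
    while current <= end.date():
        clauses.append(
            f"(year = {current.year} AND month = {current.month} AND day = {current.day})"
        )
        current += timedelta(days=1)
    return "(" + " OR ".join(clauses) + ")"


start = parse_user_input("05-10-2014 08:30:00")
end = parse_user_input("05-12-2014 17:00:00")

where = partition_predicate(start, end)
# event_ts is a hypothetical timestamp column; substitute the real one.
time_filter = (
    f"event_ts BETWEEN '{start:%Y-%m-%d %H:%M:%S}' AND '{end:%Y-%m-%d %H:%M:%S}'"
)
print(where + " AND " + time_filter)
```

Walking the calendar day by day avoids the trap of separate day and month ranges, which would select the same days of every month in the range rather than a continuous period. For very long ranges the OR list grows large, so a coarser predicate on year and month may be preferable there.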

Related

Why is my total session count (aggregated using EXTRACT MONTH) less than the total when I break it down by date?

I'm trying to compute my total sessions by month. I've tried two different ways.
In the first, I use the date field as the first column.
In the second, I use a month field extracted from the date field using EXTRACT(MONTH FROM date) AS month.
I used the code below for the first one:
with
session1 as(
select date as date_key,
session_id
from table
where date >= '2019-05-20' AND date <= '2019-05-21')
SELECT date_key, COUNT(DISTINCT session_id) AS sessions from session1
GROUP BY 1
For the 2nd one I tried using this code:
with
session1 as(
select date as date_key,
session_id
from table
where date >= '2019-05-20' AND date <= '2019-05-21')
SELECT EXTRACT (MONTH FROM date_key) AS month, COUNT(DISTINCT session_id) AS sessions from session1
GROUP BY 1
I got the following output:
20 May: 1,548 Sessions; 21 May: 1,471 Sessions; Total: 3,019
May: 2,905
So there's a discrepancy of 114 sessions, and I'd like to know why.
Thank you in advance.
For simplicity's sake, let's say there is only one session and it spans two consecutive days. If you count by day and then sum the results, you will get 2 sessions, while if you count distinct sessions over the whole two days, you will get just 1 session.
I hope this shows you the reason: you are counting some sessions twice, on different days, probably because they run past the end of one day and into the start of the next.
The following query should show you which session_ids occur on both dates.
select session_id, count(distinct date) as num_dates
from table
where date >= '2019-05-20' AND date <= '2019-05-21'
group by 1
having num_dates > 1
This is either a data processing issue, or your session definition is allowed to span multiple days. Google Analytics, for example, traditionally ends a session and begins a new session at midnight. Other sessionization schemes might not impose this restriction.
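The effect is easy to reproduce on a tiny data set. The sketch below uses Python's built-in sqlite3 module with a made-up sessions table (table name, column names, and rows are all invented for illustration) in which one session appears on both days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (day TEXT, session_id TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        ("2019-05-20", "a"),
        ("2019-05-20", "b"),
        ("2019-05-21", "b"),  # session b runs past midnight into the next day
        ("2019-05-21", "c"),
    ],
)

# Distinct sessions per day, then summed: session b is counted once per day.
per_day = conn.execute(
    "SELECT day, COUNT(DISTINCT session_id) FROM sessions GROUP BY day ORDER BY day"
).fetchall()
sum_of_daily = sum(count for _, count in per_day)

# Distinct sessions over the whole period: session b is counted once.
overall = conn.execute("SELECT COUNT(DISTINCT session_id) FROM sessions").fetchone()[0]

# Sessions that appear on more than one day explain the gap.
spanning = conn.execute(
    "SELECT session_id FROM sessions GROUP BY session_id HAVING COUNT(DISTINCT day) > 1"
).fetchall()

print(sum_of_daily, overall, spanning)
```

With only two days in the range, the gap between the two totals equals the number of sessions that appear on both days, which in the question would be the 114 missing sessions.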

Oracle - Show 0 if no data for the month

I'm trying to show some averages over the past 12 months, but there is no data for June/July, so I want the month titles to display with just 0s in the 3 columns.
Currently it only shows August to May, which is 10 rows, so it's throwing off formulas, charts, etc.
select to_char(Months.Period,'YYYY/MM') As Period,
coalesce(avg(ec.hours_reset),0) as AvgOfHOURSReset,
coalesce(AVG(ec.cycles_reset),0) as AvgofCycles_Reset,
Coalesce(AVG(ec.days_reset),0) as AvgofDAYS_Reset
from (select distinct reset_date as Period from engineering_compliance
where reset_date between '01/JUN/15' and '31/MAY/16') Months
left outer join engineering_compliance ec on ec.reset_date = months.Period
WHERE EC.EO = 'AT CHECK'
group by to_char(Months.Period,'YYYY/MM')
order by to_char(Months.Period,'YYYY/MM')
;
(select distinct to_char(reset_date,'YYYY/MM') as Period from engineering_compliance
where reset_date between '01/JUN/15' and '31/MAY/16') Months;
That query is pretty good; it's not far from working.
You would need to replace the Months table part. You want exactly one row per month, regardless of whether there's any data in the ec table.
You could maybe synthesize some data without going to any actual table in your own schema.
For example:
SELECT
extract(month from add_months(sysdate,level-1)) Row_Month,
extract(year from add_months(sysdate,level-1)) Row_Year,
to_char(add_months(sysdate,level-1),'YYYY/MM') Formatted_Date,
trunc(add_months(sysdate,level-1),'mon') Join_Date
FROM dual
CONNECT BY level <= 12;
gives:
ROW_MONTH,ROW_YEAR,FORMATTED_DATE,JOIN_DATE
6,2016,'2016/06',1/06/2016
7,2016,'2016/07',1/07/2016
8,2016,'2016/08',1/08/2016
9,2016,'2016/09',1/09/2016
10,2016,'2016/10',1/10/2016
11,2016,'2016/11',1/11/2016
12,2016,'2016/12',1/12/2016
1,2017,'2017/01',1/01/2017
2,2017,'2017/02',1/02/2017
3,2017,'2017/03',1/03/2017
4,2017,'2017/04',1/04/2017
5,2017,'2017/05',1/05/2017
Option 1: Write that subselect inline into your query, replacing sysdate with the start month. The figure 12 on the last line can be changed to the number of months you want in the series.
Option 2 (can be reused more conveniently in a variety of situations and queries): Write a view with a long series of months (for example, Jan 1970 to Dec 2199) using my SQL above. You can then join to that view on join_date with whatever start and end months you want. It will give you one row per month and you can pick up the formatted date from its column.
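The same "one row per month, zero when there is no data" idea can be sketched outside the database. The Python below is only an illustration: the helper names and the sample averages are invented, and it stands in for the month series joined to the aggregated data.

```python
from datetime import date


def month_series(start: date, count: int) -> list[str]:
    """Return `count` consecutive 'YYYY/MM' labels beginning at start's month."""
    labels = []
    year, month = start.year, start.month
    for _ in range(count):
        labels.append(f"{year:04d}/{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


def fill_missing(averages: dict[str, float], start: date, count: int) -> list[tuple[str, float]]:
    """Pair every month in the series with its average, or 0.0 if the month has no data."""
    return [(label, averages.get(label, 0.0)) for label in month_series(start, count)]


# Hypothetical averages keyed by period; June and July 2015 have no data.
averages = {"2015/08": 12.5, "2015/09": 7.0}
rows = fill_missing(averages, date(2015, 6, 1), 12)
for period, value in rows:
    print(period, value)
```

In the SQL version, note that a condition on the outer-joined table placed in the WHERE clause (such as EC.EO = 'AT CHECK') discards the rows where that table is NULL, so it would likely need to move into the ON clause for the empty months to survive.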

Is there any better way to get data between days in H2 Database?

We want to select data from a table with the condition below.
Date of Transactiontime <= (Current Date - n Days)
For example:
Today is 2016-06-21.
Date of Transactiontime = '2016-06-19 11:45:07.148'.
With the query below we can get data that is at least 2 days old.
Query:
SELECT * FROM T WHERE FORMATDATETIME (Transactiontime,'YYYY-MM-d') <= FORMATDATETIME ( DATEADD('HH',-2*24,Now()), 'YYYY-MM-d');
Datatype of Transactiontime = TIMESTAMP
Is there any better way to get the same results?
Have you tried the DATEDIFF() function? It returns the second date minus the first, so the older value goes first:
SELECT * FROM T WHERE DATEDIFF('DAY', Transactiontime, NOW()) >= 2
EDIT
Might be better using DAY_OF_YEAR instead of DAY.
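Another option, sketched here in Python rather than H2 SQL, is to compute the cutoff once and compare the raw TIMESTAMP against it, so that no function has to be applied to the column. The helper name cutoff_for is made up for this example. The condition "date of Transactiontime <= current date - n days" is equivalent to "Transactiontime < midnight at the start of the following day".

```python
from datetime import datetime, timedelta


def cutoff_for(now: datetime, n_days: int) -> datetime:
    """Exclusive upper bound: midnight at the start of the day after (today - n_days)."""
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight - timedelta(days=n_days - 1)


now = datetime(2016, 6, 21, 9, 0, 0)
cutoff = cutoff_for(now, 2)

# A row qualifies when its timestamp is strictly before the cutoff.
transaction_time = datetime(2016, 6, 19, 11, 45, 7, 148000)
print(cutoff, transaction_time < cutoff)
```

The resulting value could then be passed to the query as a parameter, for example in a condition of the form Transactiontime < ?.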

Oracle - Another way for month query?

First, sorry for my English.
I have a query like this:
Select *
From tableA
Where (
TO_NUMBER(TO_CHAR(dateA(+),'SYYYY')) = 2013
AND TO_NUMBER(TO_CHAR(dateA(+),'MM')) = 02
AND to_number(to_char(dateA(+),'dd')) <= 25
)
It retrieves the data for each date up to the day number that I pass as a parameter, in this case day 25. It works, but it is very slow because of the form of the WHERE clause. Does anybody know another way to retrieve the data faster with the same functionality?
It sounds like you want
SELECT *
FROM tableA
WHERE dateA BETWEEN trunc( date '2013-02-26', 'MM' ) AND date '2013-02-26'
This will return all the rows where dateA is between the first of the month and the specified date. If there is an index on dateA, Oracle would be able to use it for this sort of query (though whether it actually would is a separate issue).
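If the year, month, and last day arrive as separate parameters, as in the question, the two range bounds can be derived from them. The Python sketch below uses a hypothetical helper and returns a half-open range: the first of the month, and the day after the last requested day as an exclusive end.

```python
from datetime import date, timedelta


def month_to_day_bounds(year: int, month: int, last_day: int) -> tuple[date, date]:
    """Return (inclusive start, exclusive end) covering day 1 through last_day."""
    start = date(year, month, 1)
    end_exclusive = date(year, month, last_day) + timedelta(days=1)
    return start, end_exclusive


start, end_exclusive = month_to_day_bounds(2013, 2, 25)
print(start, end_exclusive)
```

These values would be used as dateA >= start AND dateA < end_exclusive, which keeps every time of day on the 25th and leaves the column free of function calls, so an index on it remains usable.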

"BETWEEN" SQL Keyword for Oracle Dates -- Getting an error in Oracle

I have dates in this format in my database "01-APR-12" and the column is a DATE type.
My SQL statement looks like this:
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno AND s.salestype = 1
AND (s.salesdate BETWEEN '01-APR-12' AND '31-APR-12');
When I try to do it that way, I get this error -- ORA-01839: date not valid for month specified.
Can I even use the BETWEEN keyword with how the date is set up in the database?
If not, is there another way I can get the output of data that is in that date range without having to fix the data in the database?
Thanks!
April has 30 days, not 31.
Change
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno AND s.salestype = 1
AND (s.salesdate BETWEEN '01-APR-12' AND '31-APR-12');
to
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno AND s.salestype = 1
AND (s.salesdate BETWEEN '01-APR-12' AND '30-APR-12');
and you should be good to go.
If the range you are checking runs from the first day of a month to the last day of that month, you can modify the query so that you don't have to spell out the last day of the month explicitly:
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno
AND s.salestype = 1 AND (s.salesdate BETWEEN '01-APR-12' AND LAST_DAY(TO_DATE('APR-12', 'MON-YY')));
The LAST_DAY function will provide the last day of the month.
The other answers are missing out on something important and will not return the correct results. Dates have date and time components. If your salesdate column is in fact a date that includes time, you will miss out on any sales that happened on April 30 unless they occurred exactly at midnight.
Here's an example:
create table date_temp (temp date);
insert into date_temp values(to_date('01-APR-2014 15:12:00', 'DD-MON-YYYY HH24:MI:SS'));
insert into date_temp values(to_date('30-APR-2014 15:12:00', 'DD-MON-YYYY HH24:MI:SS'));
table DATE_TEMP created.
1 rows inserted.
1 rows inserted.
select * from date_temp where temp between '01-APR-2014' and '30-APR-2014';
Query Result: 01-APR-14
If you want to get all records from April, including those with time components in the date fields, you should use the first day of the next month as the upper bound of the BETWEEN clause:
select * from date_temp where temp between '01-APR-2014' and '01-MAY-2014';
01-APR-14
30-APR-14
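One caveat is that BETWEEN includes both endpoints, so an upper bound of 1 May would also match a row stamped exactly at midnight on 1 May. The Python sketch below mimics the comparison to show the difference between an inclusive range and a half-open one (>= start and < end). The first two values come from the example above; the third row is a hypothetical addition to expose the edge case.

```python
from datetime import datetime

rows = [
    datetime(2014, 4, 1, 15, 12),
    datetime(2014, 4, 30, 15, 12),
    datetime(2014, 5, 1, 0, 0),  # hypothetical row at exactly midnight on 1 May
]


def between(value: datetime, low: datetime, high: datetime) -> bool:
    """Inclusive on both ends, like SQL BETWEEN."""
    return low <= value <= high


def half_open(value: datetime, low: datetime, high: datetime) -> bool:
    """Inclusive start, exclusive end."""
    return low <= value < high


april_1 = datetime(2014, 4, 1)
april_30 = datetime(2014, 4, 30)
may_1 = datetime(2014, 5, 1)

to_april_30 = [r for r in rows if between(r, april_1, april_30)]
to_may_1 = [r for r in rows if between(r, april_1, may_1)]
april_only = [r for r in rows if half_open(r, april_1, may_1)]
print(len(to_april_30), len(to_may_1), len(april_only))
```

In SQL the half-open form would be written with two comparisons instead of BETWEEN, using the first day of the next month as the exclusive upper bound.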
