I have a table with a few columns, as shown below. I would like to get a count of all the records at the week level, but I am unable to do so. I could use GROUP BY, but I do not want to because it gives me too many records. I use Denodo and Oracle 18c.
s_id sub_id week year st_id
24hifew njfhwf 50 2020 ew1eer
939hjefbw newfkhwfe 34 2019 e3eef3
hewfhwe23 67832ghef 44 2018 ewfwf1
Code:
select
xx.s_id,
xx.sub_id,
xx.st_id,
yy.week,
yy.year,
count(*) OVER (PARTITION BY yy.year, yy.week, xx.s_id, xx.sub_id, xx.st_id) as week_l
from xx as xx left join yy as yy
Basically, I am looking for an equivalent query to partition by which will run fine.
Error:
finished with error: Error executing view: Function count is not executable
I'm trying to generate my total session by month. I've tried using two different ways.
I'm using date field for the first column
I'm using month field that is extracted from date field using EXTRACT(MONTH FROM date) AS month
I have tried using below code for the 1st one:
with
session1 as(
select date,
session_id
from table
where date >= '2019-05-20' AND date <= '2019-05-21')
SELECT date_key, COUNT(DISTINCT session_id) AS sessions from session1
GROUP BY 1
For the 2nd one I tried using this code:
with
session1 as(
select date,
session_id
from table
where date >= '2019-05-20' AND date <= '2019-05-21')
SELECT EXTRACT (MONTH FROM date_key) AS month, COUNT(DISTINCT session_id) AS sessions from session1
GROUP BY 1
For the result, I got the output as per below:
20 May: 1,548 Sessions; 21 May: 1,471 Sessions; Total: 3,019
May: 2,905
So, there's a discrepancy of 114 sessions, and I'd like to know why.
Thank you in advance.
For simplicity's sake, say there is only one session and it spans two consecutive days. If you count by day and then sum the results, you get 2 sessions, while if you count distinct sessions over the whole two days, you get just 1 session.
I hope this shows the reason: you are counting some sessions twice, once on each day, most likely because they run past the end of one day into the start of the next.
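The effect can be reproduced with a minimal Python sketch. The rows below are made-up sample data, not the asker's table; the session "s2" is assumed to span midnight, so it appears on both dates.

```python
from collections import defaultdict

# Hypothetical (date, session_id) rows; session "s2" spans midnight.
rows = [
    ("2019-05-20", "s1"),
    ("2019-05-20", "s2"),
    ("2019-05-21", "s2"),
    ("2019-05-21", "s3"),
]

# Equivalent of COUNT(DISTINCT session_id) ... GROUP BY date
sessions_by_day = defaultdict(set)
for day, session_id in rows:
    sessions_by_day[day].add(session_id)
daily_counts = {day: len(ids) for day, ids in sessions_by_day.items()}

# Equivalent of COUNT(DISTINCT session_id) over the whole period
overall_count = len({session_id for _, session_id in rows})

# Summing the daily counts counts "s2" twice.
sum_of_daily = sum(daily_counts.values())
print(daily_counts, sum_of_daily, overall_count)
```

The difference between the two totals equals the number of extra days on which already-counted sessions reappear, which is what the 114-session gap in the question would correspond to.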
The following query should show you which sessions_ids occur on both dates.
select session_id, count(distinct date) as num_dates
from table
where date >= '2019-05-20' AND date <= '2019-05-21'
group by 1
having num_dates > 1
This is either a data processing issue, or your session definition is allowed to span multiple days. Google Analytics, for example, traditionally ends a session and begins a new session at midnight. Other sessionization schemes might not impose this restriction.
Let me start by saying, I am very new to Hive, so I'm not sure what information folks will need to help me out. Please let me know what information would be useful. Also, while I'd usually create a small dataset to recreate the problem with, I think this problem has to do with the scale of my dataset, because I can't seem to recreate the problem on a smaller dataset. Let me know if you have suggestions to make this easier to answer.
Okay now that's out of the way, here's my problem. I have a huge dataset, partitioned by month, with about 500 million rows per month. I have a column with an ID number in it (I'll call it idcol), and I want to closely examine a couple of examples where there's a high number of repeated IDs and a very low number. So, I used this:
SELECT idcol, COUNT(*) FROM table WHERE month = 7 GROUP BY idcol LIMIT 10;
And got:
000005185884381 13
000035323848000 24
000017027256315 531
000010121767109 54
000039844553332 3
000013731352481 309
000024387407996 3
000028461234451 67
000016564844672 1
000032933040806 17
So, I went to investigate the first idcol value with a count of 3, with:
SELECT * FROM table WHERE month = 7 AND idcol = '000039844553332';
I expected to see just 3 rows, but ended up with 469 rows found! That was strange enough, but then I just happened to run the original line of code above but with LIMIT 5 instead and ended up with:
000005185884381 13
000017027256315 75
000010121767109 25
000013731352481 59
000024387407996 1
And, it may be hard to see because the idcol values are so long, but 000017027256315 ended up with a count of 531 when I did LIMIT 10 and just 75 when I did LIMIT 5.
What am I missing?! How can I get a correct count of just a small number of values so I can investigate further?!
BTW my first thought was to make the counting part a sub-query, but that didn't change a thing. I used:
SELECT * FROM (SELECT idcol, COUNT(*) FROM table WHERE month = 7 GROUP BY idcol) x LIMIT 10;
...same EXACT results
Most likely the counts are being computed from statistics. See here for the bug and the related discussion. Try turning that behavior off:
SET hive.compute.query.using.stats=false;
If this doesn't fix it, try the ANALYZE command before running the count(*):
ANALYZE TABLE table_name PARTITION(month) COMPUTE STATISTICS;
I'm trying to show some averages over the past 12 months, but there is no data for June/July, so I want the titles for those months to display with just 0s in the 3 columns.
Currently it only shows August to May, which is 10 rows, so it's throwing off formulas, charts, etc.
select to_char(Months.Period,'YYYY/MM') As Period, coalesce(avg(ec.hours_reset),0) as AvgOfHOURSReset, coalesce(AVG(ec.cycles_reset),0) as AvgofCycles_Reset, Coalesce(AVG(ec.days_reset),0) as AvgofDAYS_Reset
from (select distinct reset_date as Period from engineering_compliance
where reset_date between '01/JUN/15' and '31/MAY/16') Months
left outer join engineering_compliance ec on ec.reset_date = months.Period
WHERE EC.EO = 'AT CHECK'
group by to_char(Months.Period,'YYYY/MM')
order by to_char(Months.Period,'YYYY/MM')
;
(select distinct to_char(reset_date,'YYYY/MM') as Period from engineering_compliance
where reset_date between '01/JUN/15' and '31/MAY/16') Months;
That query is pretty good, it's not far from working.
You would need to replace the Months table part. You want exactly one row per month, regardless of whether there's any data in the ec table.
You could maybe synthesize some data without going to any actual table in your own schema.
For example:
SELECT
extract(month from add_months(sysdate,level-1)) Row_Month,
extract(year from add_months(sysdate,level-1)) Row_Year,
to_char(add_months(sysdate,level-1),'YYYY/MM') Formatted_Date,
trunc(add_months(sysdate,level-1),'mon') Join_Date
FROM dual
CONNECT BY level <= 12;
gives:
ROW_MONTH,ROW_YEAR,FORMATTED_DATE,JOIN_DATE
6,2016,'2016/06',1/06/2016
7,2016,'2016/07',1/07/2016
8,2016,'2016/08',1/08/2016
9,2016,'2016/09',1/09/2016
10,2016,'2016/10',1/10/2016
11,2016,'2016/11',1/11/2016
12,2016,'2016/12',1/12/2016
1,2017,'2017/01',1/01/2017
2,2017,'2017/02',1/02/2017
3,2017,'2017/03',1/03/2017
4,2017,'2017/04',1/04/2017
5,2017,'2017/05',1/05/2017
Option 1: Write that subselect inline into your query, replacing sysdate with the start month. The figure 12 on the last line can be changed to the number of months you want in the series.
Option 2 (can be reused more conveniently in a variety of situations and queries): Write a view with a long series of months (for example, Jan 1970 to Dec 2199) using my SQL above. You can then join to that view on join_date with whatever start and end months you want. It will give you one row per month and you can pick up the formatted date from its column.
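The idea behind both options (build one row per month first, then attach whatever data exists) can be sketched outside the database too. This is a minimal Python illustration, not the Oracle solution itself; `month_series` is a hypothetical helper and the averages are made-up values.

```python
from datetime import date

def month_series(start, count):
    """Return `count` consecutive 'YYYY/MM' labels starting at `start`'s month."""
    labels = []
    year, month = start.year, start.month
    for _ in range(count):
        labels.append(f"{year:04d}/{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
    return labels

# Hypothetical monthly averages; June and July 2015 have no data.
averages = {"2015/08": 12.5, "2015/09": 9.0}

# Left-join equivalent: every month appears, missing ones default to 0.
report = [(label, averages.get(label, 0))
          for label in month_series(date(2015, 6, 1), 12)]

for label, value in report:
    print(label, value)
```

Because the month list drives the result, the report always has 12 rows regardless of which months have data.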
I need to analyse abnormal returns for an event study on mergers and acquisitions.
I would like to analyse abnormal returns to acquirers by using event windows. Basically I would like to extract the prices for the acquirers using -1 (the day before the announcement date), the announcement date, and +1 (the day after the announcement date).
I have two different datasets to extract information from.
The first is a dataset with all the merger and acquisition information that has the information in the following format:
DealNO AcquirerNO TargetNO AnnouncementDate
123 abcd Cfgg 22/12/2010
222 qwert cddfgf 26/12/1998
In addition, I have a 2nd dataset which has all the prices.
ISINnumber Date Price
abcd 21/12/2010 10
abcd 22/12/2010 11
abcd 23/12/2010 11
abcd 24/12/2010 12
qwert 20/12/1998 20
qwert 21/12/1998 20
qwert 22/12/1998 21
qwert 23/12/1998 21
qwert 24/12/1998 21
qwert 25/12/1998 22
qwert 26/12/1998 21
qwert 27/12/1998 23
ISIN number is the same as acquirer no, and that is the matching code.
In the end I would like to have a database something like this:
DealNO AcquirerNO TargetNO AnnouncementDate Acquirerprice(-1day) Acquirerprice(0day) Acquirerprice(+1day)
123 abcd Cfgg 22/12/2010 10 11 12
222 qwert cddfgf 26/12/1998 22 21 23
Do you know how I can get this?
I'd prefer to use sas to run the code, but if you are familiar with any other programs that can get the data like this, please let me know.
Thank you in advance ^_^.
This can be done quite easily with PROC SQL by joining the PRICE dataset three times, once for each day in the event window. Try this (assuming data set names of ANNOUNCE and PRICE, and that both date columns hold SAS date values):
Warning: untested code
proc sql;
create table RESULT as
select a.dealno,
a.acquirerno,
a.targetno,
a.announcementdate,
p.price as acquirerprice_prev,
c.price as acquirerprice_cur,
n.price as acquirerprice_next
from ANNOUNCE a
/* match each deal's own announcement date, offset by -1, 0 and +1 calendar days */
left join PRICE p on a.acquirerno = p.isinnumber and p.date = a.announcementdate - 1
left join PRICE c on a.acquirerno = c.isinnumber and c.date = a.announcementdate
left join PRICE n on a.acquirerno = n.isinnumber and n.date = a.announcementdate + 1
;
quit;
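Since the question also allows other tools, here is a minimal Python sketch of the same three-way lookup, using the sample rows from the question. It assumes the window is measured in calendar days (not trading days), and a missing price is reported as None.

```python
from datetime import datetime, timedelta

def parse(text):
    """Parse a DD/MM/YYYY string as used in the question's sample data."""
    return datetime.strptime(text, "%d/%m/%Y").date()

# Sample rows copied from the question.
deals = [
    (123, "abcd", "Cfgg", "22/12/2010"),
    (222, "qwert", "cddfgf", "26/12/1998"),
]
prices_raw = [
    ("abcd", "21/12/2010", 10), ("abcd", "22/12/2010", 11),
    ("abcd", "23/12/2010", 11), ("abcd", "24/12/2010", 12),
    ("qwert", "20/12/1998", 20), ("qwert", "21/12/1998", 20),
    ("qwert", "22/12/1998", 21), ("qwert", "23/12/1998", 21),
    ("qwert", "24/12/1998", 21), ("qwert", "25/12/1998", 22),
    ("qwert", "26/12/1998", 21), ("qwert", "27/12/1998", 23),
]

# Index prices by (ISIN, date) so each window day is a dictionary lookup.
prices = {(isin, parse(day)): price for isin, day, price in prices_raw}

result = []
for deal_no, acquirer, target, announced in deals:
    day0 = parse(announced)
    # Prices for day -1, day 0 and day +1 around the announcement.
    window = tuple(
        prices.get((acquirer, day0 + timedelta(days=offset)))
        for offset in (-1, 0, 1)
    )
    result.append((deal_no, acquirer, target, announced) + window)

for row in result:
    print(row)
```

If the window should skip weekends and holidays, the offsets would have to be taken over each acquirer's sorted list of available price dates instead of the calendar.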
I'm trying to figure out how to compare the result of a date subtraction in a WHERE clause.
Clients subscribe to a service and are therefore linked to a subscription that has an end date. I want to display the list of subscriptions that will come to an end within the next 2 weeks.
I did not design the database, but I noticed that the End_Date column type is a varchar and not a date. I can't change that.
My problem is the following:
If I try to select the result of the subtraction, for example with this request:
SELECT(TO_DATE(s.end_date,'YYYY-MM-DD') - TRUNC(SYSDATE)) , s.name
from SUBSCRIPTION s WHERE s.id_acces = 15
This will work and give me the number of days between the end of the subscription and the current date.
BUT now, if I try to include the exact same expression in a WHERE clause for comparison:
SELECT s.name
from SUBSCRIPTION S
WHERE (TO_DATE(s.end_date,'YYYY-MM-DD') - TRUNC(SYSDATE)) between 0 and 16
I will get an error: "ORA-01839 : date not valid for month specified".
Any help would be appreciated..
Somewhere in the table you have your date formatted in a different way from YYYY-MM-DD. In your first query you check a certain row (or a set of rows, s.id_acces = 15), which is probably ok, but in the second you scan through all the table.
Try finding these rows with something like,
select end_date from subscription
where not regexp_like(end_date, '^[0-9]{4}-[0-9]{2}-[0-9]{2}$')
Check your DD value (i.e. the day of the month). This value must be between 1 and the number of days in the month.
January - 1 to 31
February - 1 to 28 (1 to 29, if a leap year)
March - 1 to 31
April - 1 to 30
May - 1 to 31
June - 1 to 30
July - 1 to 31
August - 1 to 31
September - 1 to 30
October - 1 to 31
November - 1 to 30
December - 1 to 31
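The day ranges above can also be checked programmatically. A minimal Python sketch, assuming the values are expected in YYYY-MM-DD form; `is_valid_ymd` is a hypothetical helper name, not an Oracle function.

```python
import calendar

def is_valid_ymd(text):
    """Return True if `text` is a YYYY-MM-DD string naming a real calendar day."""
    parts = text.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    year, month, day = (int(p) for p in parts)
    if year < 1 or not 1 <= month <= 12:
        return False
    # monthrange returns (weekday of first day, number of days in month),
    # and accounts for leap years.
    days_in_month = calendar.monthrange(year, month)[1]
    return 1 <= day <= days_in_month

for sample in ("2016-02-29", "2015-02-29", "2015-04-31", "31/05/2016"):
    print(sample, is_valid_ymd(sample))
```

A value such as 2015-04-31 has the right shape but names a day that does not exist, which is the kind of row that raises ORA-01839.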
" the End_Date column type is a varchar and not a date.. I can't
change that."
If you can't change the datatype, you'll have to change the data. You can identify the rogue values with this function:
create or replace function check_date_format (p_str in varchar2) return varchar2
is
d date;
begin
d := to_date(p_str,'YYYY-MM-DD');
return 'VALID';
exception
when others then
return 'INVALID';
end;
You can use this function in a query:
select sid, end_date
from SUBSCRIPTION
where check_date_format(end_date) != 'VALID';
Your choices are:
fix the data so all the dates are in the same format
fix the data and apply a check constraint to enforce future validity
write a bespoke MY_TO_DATE() function which takes a string and applies lots of different date format masks to it in the hope of getting a successful conversion.
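To illustrate the third choice, here is a minimal Python sketch of the "try several masks" idea. It is not PL/SQL; the list of candidate formats is an assumption for illustration, and the real list would have to come from inspecting the rogue values in the table.

```python
from datetime import datetime

# Assumed candidate formats, tried in order; adjust to what the data contains.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def my_to_date(text):
    """Return a date parsed with the first matching format, or None if none fit."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # wrong format or impossible date, try the next mask
    return None

for sample in ("2016-05-31", "31/05/2016", "20160531", "2015-02-30"):
    print(sample, my_to_date(sample))
```

The order of the masks matters when a string could match more than one of them (for example day/month versus month/day), which is one reason fixing the data is usually the safer choice.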