Oracle query with different dates

I have to write this query, and it is a bit complex. I am hoping someone can help, as I've received much help from here before.
Say I have a customer's stock portfolio: a list of company tickers and the date each ticker was purchased. My list looks something like this:
CYSL 1/16/2017
MCIG 4/1/2016
MSRT 9/13/2016
NTFU 1/16/2017
QNTM 10/30/2014
SIGWX 6/28/2014
TRMCX 6/25/2014
TRT2 4/19/2016
Now, in order for me to compute some YTD performance, I need to apply the following logic:
If the purchase date > 01/01/2017, I'll use the closing price of the ticker when it was purchased.
If the purchase date < 01/01/2017, I'll use the closing price of the ticker on or before 12/31/2016.
There are 2 tables involved: 1) Portfolio Table 2) Price History
I've gotten this far:
SELECT ticker, MIN(transaction_date) KEEP (DENSE_RANK FIRST ORDER BY transaction_date) transaction_date
FROM customer_portfolios
WHERE portfolio_id = 954118
GROUP BY ticker;
This gives me the list above. Now I am lost on how to combine this with the logic above to get the proper date and look up the proper price.
I hope I am explaining this correctly.
Any help will be great, and I can explain more if it will help you help me.
Thank you.

Use GREATEST to get either 2016-12-31 or the later transaction date, then join to the price history table:
SELECT cp.ticker,
       cp.transaction_date,
       h.close_price
FROM (
    SELECT ticker,
           GREATEST(
               DATE '2016-12-31',
               MIN(transaction_date) KEEP (
                   DENSE_RANK FIRST
                   ORDER BY transaction_date
               )
           ) AS transaction_date
    FROM customer_portfolios
    WHERE portfolio_id = 954118
    GROUP BY ticker
) cp
INNER JOIN price_history h
    ON ( cp.ticker = h.ticker
         AND cp.transaction_date BETWEEN h.start_date AND h.end_date )
or if the price history has a row per day (rather than the assumed range in the query above) then replace the last line with:
AND cp.transaction_date = h.price_date )
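For reference, here is the per-day variant written out in full, so the two snippets don't need to be spliced together. This is a sketch against the same assumed table and column names; note that plain MIN(transaction_date) is equivalent to the KEEP (DENSE_RANK FIRST ...) form above, since both reduce to the earliest date:
SELECT cp.ticker,
       cp.transaction_date,
       h.close_price
FROM (
    -- earliest purchase per ticker, with 2016-12-31 as a lower bound
    SELECT ticker,
           GREATEST(DATE '2016-12-31', MIN(transaction_date)) AS transaction_date
    FROM customer_portfolios
    WHERE portfolio_id = 954118
    GROUP BY ticker
) cp
INNER JOIN price_history h
    ON  cp.ticker = h.ticker
    AND cp.transaction_date = h.price_date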

Perhaps I'm missing something here, but without any real sample data this is the best I can come up with. Below are the table DDLs, inserts, and query.
CREATE TABLE PRICE_HISTORY
( PRICE_DATE DATE,
TICKER VARCHAR2(20 BYTE),
OPEN_PRICE NUMBER,
CLOSE_PRICE NUMBER
);
CREATE TABLE PORTFOLIO_TABLE
( TICKER VARCHAR2(20 BYTE),
TICKER_DATE VARCHAR2(20 BYTE),
CUSTOMER VARCHAR2(20 BYTE)
);
Insert into PORTFOLIO_TABLE (TICKER,TICKER_DATE,CUSTOMER) values ('CYSL','1/16/2018','1');
Insert into PORTFOLIO_TABLE (TICKER,TICKER_DATE,CUSTOMER) values ('MCIG','04/1/2016','2');
Insert into PORTFOLIO_TABLE (TICKER,TICKER_DATE,CUSTOMER) values ('MSRT','09/13/2016','3');
Insert into PORTFOLIO_TABLE (TICKER,TICKER_DATE,CUSTOMER) values ('NTFU','01/16/2017','4');
Insert into PRICE_HISTORY (PRICE_DATE,TICKER,OPEN_PRICE,CLOSE_PRICE) values (to_date('27-MAR-2017 20:27:12','DD-MON-RRRR HH24:MI:SS'),'CYSL',1,2);
Insert into PRICE_HISTORY (PRICE_DATE,TICKER,OPEN_PRICE,CLOSE_PRICE) values (to_date('16-JUN-1997 20:27:33','DD-MON-RRRR HH24:MI:SS'),'MCIG',1,2);
Insert into PRICE_HISTORY (PRICE_DATE,TICKER,OPEN_PRICE,CLOSE_PRICE) values (to_date('31-MAY-2011 20:27:45','DD-MON-RRRR HH24:MI:SS'),'MSRT',5,8);
Insert into PRICE_HISTORY (PRICE_DATE,TICKER,OPEN_PRICE,CLOSE_PRICE) values (to_date('25-JAN-2021 20:27:55','DD-MON-RRRR HH24:MI:SS'),'NTFU',7,6);
WITH PORTFOLIO AS
( SELECT TICKER, TICKER_DATE, CUSTOMER FROM PORTFOLIO_TABLE
)
SELECT
  CASE
    WHEN PRICE_DATE > DATE '2017-01-01'
    THEN CLOSE_PRICE
    ELSE OPEN_PRICE
  END AS AMOUNT,
  PRICE_DATE,
  P.TICKER,
  OPEN_PRICE,
  CLOSE_PRICE
FROM PRICE_HISTORY H,
     PORTFOLIO P
WHERE H.TICKER = P.TICKER;

The ultimate goal of this query is to get the SUM of the customer's portfolio value, based on his purchases.
This part is working perfectly!
SELECT m_ticker,
       GREATEST(
           DATE '2016-12-31',
           MIN(transaction_date) KEEP (
               DENSE_RANK FIRST
               ORDER BY transaction_date
           )
       ) AS transaction_date
FROM customer_portfolio_history
WHERE portfolio_id = 954118
GROUP BY m_ticker;
Gives me the data I need:
CYSL 1/16/2017
MCIG 12/31/2016
MSRT 12/31/2016
NTFU 1/16/2017
QNTM 12/31/2016
SIGWX 12/31/2016
TRMCX 12/31/2016
TRT2 12/31/2016
Now it gets trickier. With the results above, I need to go into PRICE_HISTORY and find the price dated on, or closest to but earlier than, that date.
So, if there is no entry for 12/31/2016 (maybe market was closed), then try 12/30, 12/29, etc. Same for 1/16/2017. If there is no entry, then try 1/15, then 1/14....
After that, I can take the total number of shares from the PORTFOLIO table for that customer/portfolio/ticker and multiply it by the price retrieved for the day found, and there is the value.
Pretty crazy, I know.
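One way to sketch that last step, using the table and column names assumed above (price_date and close_price are guesses at the PRICE_HISTORY columns): join on price_date <= the target date and keep only the row with the latest such date per ticker via KEEP (DENSE_RANK LAST). The share count and multiplication can then be layered on top of this result.
SELECT cp.ticker,
       cp.transaction_date,
       -- close price from the row with the latest price_date on or before the target date
       MAX(h.close_price) KEEP (DENSE_RANK LAST ORDER BY h.price_date) AS close_price
FROM (
    SELECT m_ticker AS ticker,
           GREATEST(DATE '2016-12-31', MIN(transaction_date)) AS transaction_date
    FROM customer_portfolio_history
    WHERE portfolio_id = 954118
    GROUP BY m_ticker
) cp
JOIN price_history h
    ON  h.ticker = cp.ticker
    AND h.price_date <= cp.transaction_date
GROUP BY cp.ticker, cp.transaction_date;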

Related

aggregate date ranges with gaps in oracle

I need to aggregate date ranges, allowing for gaps of at most 2 days in between, for each id. Any help would be much appreciated.
create table tt ( id int, startdate date, stopdate date);
insert into tt values (1,  date '2010-05-24', date '2010-05-29');
insert into tt values (1,  date '2010-05-30', date '2010-06-22');
insert into tt values (10, date '2012-06-26', date '2012-06-28');
insert into tt values (10, date '2012-06-29', date '2012-06-30');
insert into tt values (10, date '2012-07-01', date '2012-07-30');
insert into tt values (10, date '2012-08-03', date '2012-12-30');
insert into tt values (90, date '2002-03-08', date '2002-03-16');
insert into tt values (90, date '2002-01-31', date '2002-02-15');
insert into tt values (90, date '2002-02-15', date '2002-02-28');
insert into tt values (90, date '2002-01-31', date '2004-02-15');
insert into tt values (90, date '2004-02-15', date '2004-04-15');
insert into tt values (90, date '2002-03-01', date '2002-03-07');
expected output would be:
1 24/05/2010 22/06/2010
10 26/06/2012 30/07/2012
10 03/08/2012 30/12/2012
90 31/01/2002 15/04/2004
If you're on 12c, you can use one of my favourite SQL features: pattern matching (match_recognize).
With this you need to define a pattern variable. This is where you'll check that the start date of the current row is within two days of the stop date of the previous row, which is:
startdate <= prev ( stopdate ) + 2
The pattern you're searching for is any row, followed by zero or more rows that meet this criterion.
So you have an "always true" strt variable, followed by * (regular expression zero-or-more quantifier) occurrences of the within2 variable:
( strt within2* )
I'm guessing you also need to split the ranges up by ID. So I've added a partition by for this.
Put it all together and you get:
select *
from tt match_recognize (
    partition by id
    order by startdate, stopdate
    measures
        first ( startdate ) startdate,
        last ( stopdate ) stopdate
    pattern ( strt within2* )
    define
        within2 as startdate <= prev ( stopdate ) + 2
);
ID STARTDATE STOPDATE
1 24/05/2010 22/06/2010
10 26/06/2012 30/07/2012
10 03/08/2012 30/12/2012
If you want to know more about this, you can find several match_recognize examples here.
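If you're not on 12c yet, the same grouping can be sketched with analytic functions instead (a gaps-and-islands approach; the running MAX is needed because the id 90 ranges overlap):
select id, min(startdate) as startdate, max(stopdate) as stopdate
from (
    select id, startdate, stopdate,
           -- running count of gaps greater than 2 days numbers each island
           sum(gap_flag) over (partition by id order by startdate, stopdate) as grp
    from (
        select t.*,
               case
                   when startdate <= max(stopdate) over (
                            partition by id
                            order by startdate, stopdate
                            rows between unbounded preceding and 1 preceding
                        ) + 2
                   then 0
                   else 1
               end as gap_flag
        from tt t
    )
)
group by id, grp
order by id, startdate;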

Delete duplicate rows from a BigQuery table

I have a table with >1M rows of data and 20+ columns.
Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).
If possible, I would like to retain the original table name and remove the duplicate records from my problematic column; otherwise, I could create a new table (tableXfinal) with the same schema but without the duplicates.
I am not proficient in SQL or any other programming language so please excuse my ignorance.
delete from Accidents.CleanedFilledCombined
where Fixed_Accident_Index in (
  select Fixed_Accident_Index
  from Accidents.CleanedFilledCombined
  group by Fixed_Accident_Index
  having count(Fixed_Accident_Index) > 1
);
You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).
A query that should work is here:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1
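For example, to rewrite the table in place with standard SQL DDL (a sketch assuming the same Accidents.CleanedFilledCombined table; back it up first):
CREATE OR REPLACE TABLE Accidents.CleanedFilledCombined AS
SELECT * EXCEPT(row_number)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS row_number
  FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1;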
UPDATE 2019: To de-duplicate rows on a single partition with a MERGE, see:
https://stackoverflow.com/a/57900778/132438
An alternative to Jordan's answer, one that scales better when there are too many duplicates:
#standardSQL
SELECT event.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.created_at DESC LIMIT 1
)[OFFSET(0)] event
FROM `githubarchive.month.201706` t
# GROUP BY the id you are de-duplicating by
GROUP BY actor.id
)
Or a shorter version (takes any row, instead of the newest one):
SELECT k.*
FROM (
SELECT ARRAY_AGG(x LIMIT 1)[OFFSET(0)] k
FROM `fh-bigquery.reddit_comments.2017_01` x
GROUP BY id
)
To de-duplicate rows on an existing table:
CREATE OR REPLACE TABLE `deleting.deduplicating_table`
AS
# SELECT id FROM UNNEST([1,1,1,2,2]) id
SELECT k.*
FROM (
SELECT ARRAY_AGG(row LIMIT 1)[OFFSET(0)] k
FROM `deleting.deduplicating_table` row
GROUP BY id
)
Not sure why nobody mentioned a DISTINCT query.
Here is a way to clean up duplicate rows:
CREATE OR REPLACE TABLE project.dataset.table
AS
SELECT DISTINCT * FROM project.dataset.table
If your schema doesn't have any RECORD (nested) fields, the below variation of Jordan's answer will work well enough, writing over the same table or a new one, etc.
SELECT <list of original fields>
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS pos
FROM Accidents.CleanedFilledCombined
)
WHERE pos = 1
In a more generic case, with a complex schema containing RECORD/nested fields, etc., the above approach can be a challenge.
I would propose trying the tabledata.insertAll API with rows[].insertId set to the respective Fixed_Accident_Index for each row.
In this case duplicate rows will be eliminated by BigQuery.
Of course, this will involve some client-side coding, so it might not be relevant for this particular question.
I haven't tried this approach myself either, but I feel it might be interesting to try :o)
If you have a large partitioned table and only have duplicates in a certain partition range, you don't want to scan or process the whole table. Use the MERGE SQL below with predicates on the partition range:
-- WARNING: back up the table before this operation
-- FOR large size timestamp partitioned table
-- -------------------------------------------
-- -- To de-duplicate rows of a given range of a partition table, using surrogate_key as unique id
-- -------------------------------------------
DECLARE dt_start DEFAULT TIMESTAMP("2019-09-17T00:00:00", "America/Los_Angeles") ;
DECLARE dt_end DEFAULT TIMESTAMP("2019-09-22T00:00:00", "America/Los_Angeles");
MERGE INTO `gcp_project`.`data_set`.`the_table` AS INTERNAL_DEST
USING (
SELECT k.*
FROM (
SELECT ARRAY_AGG(original_data LIMIT 1)[OFFSET(0)] k
FROM `gcp_project`.`data_set`.`the_table` AS original_data
WHERE stamp BETWEEN dt_start AND dt_end
GROUP BY surrogate_key
)
) AS INTERNAL_SOURCE
ON FALSE
WHEN NOT MATCHED BY SOURCE
AND INTERNAL_DEST.stamp BETWEEN dt_start AND dt_end -- remove all data in partition range
THEN DELETE
WHEN NOT MATCHED THEN INSERT ROW
credit: https://gist.github.com/hui-zheng/f7e972bcbe9cde0c6cb6318f7270b67a
An easier answer, without a subselect:
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
WHERE TRUE
QUALIFY row_number = 1
The WHERE TRUE is necessary because QUALIFY needs a WHERE, GROUP BY, or HAVING clause.
Felipe's answer is the best approach for most cases. Here is a more elegant way to accomplish the same:
CREATE OR REPLACE TABLE Accidents.CleanedFilledCombined
AS
SELECT
Fixed_Accident_Index,
ARRAY_AGG(x LIMIT 1)[SAFE_OFFSET(0)].* EXCEPT(Fixed_Accident_Index)
FROM Accidents.CleanedFilledCombined AS x
GROUP BY Fixed_Accident_Index;
To be safe, make sure you back up the original table before you run this ^^
I don't recommend using the ROW_NUMBER() OVER() approach if possible, since you may run into BigQuery memory limits and get unexpected errors.
Update the BigQuery schema with a new column, bq_uuid, NULLABLE and of type STRING.

Create duplicate rows by running the same command 5 times, for example:
insert into `beginner-290513.917834811114.messages` (id, type, flow, updated_at)
values (19999, "hello", "inbound", '2021-06-08T12:09:03.693646');
Check whether duplicate entries exist:
select * from `beginner-290513.917834811114.messages` where id = 19999;
Use the GENERATE_UUID function to assign a uuid to each message:

UPDATE `beginner-290513.917834811114.messages`
SET bq_uuid = GENERATE_UUID()
WHERE id > 0;
Clean up the duplicate entries:
DELETE FROM `beginner-290513.917834811114.messages`
WHERE bq_uuid IN (
  SELECT bq_uuid
  FROM (
    SELECT bq_uuid,
           ROW_NUMBER() OVER (PARTITION BY updated_at ORDER BY bq_uuid) AS row_num
    FROM `beginner-290513.917834811114.messages`
  ) t
  WHERE t.row_num > 1
);

How to generate each day of backlog for a ticket

Hi, I'm trying to create a procedure for calculating the backlog for each day.
For example: I have a ticket with ticket_submitdate on 12-sep-2015 and resolved_date on 15-sep-2015 in one table. This ticket should appear as a backlog in the backlog_table because it was not resolved on the same day as the ticket_submitdate.
I have another column, date_col, in the backlog_table showing the date on which the ticket was a backlog, i.e. this ticket should be in the ticket_backlog table for the dates 13-sep-2015 and 14-sep-2015, and the date_col column should have this ticket for both of those dates.
Please help.
Thanks in advance.
Here is some test data:
create table backlog (ticket_no number, submit_date date, resolved_date date);
insert into backlog values (100, date '2015-09-12', date '2015-09-15');
insert into backlog values (200, date '2015-09-12', date '2015-09-14');
insert into backlog values (300, date '2015-09-13', date '2015-09-15');
insert into backlog values (400, date '2015-09-13', date '2015-09-16');
insert into backlog values (500, date '2015-09-13', date '2015-09-13');
This query generates a list of dates which spans the range of BACKLOG records, and joins them to the BACKLOG.
with dt as ( select min(submit_date) as st_dt
                  , greatest(max(resolved_date), max(submit_date)) as end_dt
             from backlog )
   , dt_range as ( select st_dt + (level-1) as date_col
                   from dt
                   connect by level <= ( end_dt - st_dt ) + 1 )
select b.ticket_no
     , d.date_col
from backlog b
     cross join dt_range d
where d.date_col between b.submit_date and b.resolved_date
  and b.submit_date != b.resolved_date
order by b.ticket_no
       , d.date_col
/
It produces a list of TICKET_NOs with all the dates on which they are live:
TICKET_NO DATE_COL
---------- ---------
100 12-SEP-15
100 13-SEP-15
100 14-SEP-15
100 15-SEP-15
200 12-SEP-15
200 13-SEP-15
200 14-SEP-15
300 13-SEP-15
300 14-SEP-15
300 15-SEP-15
400 13-SEP-15
400 14-SEP-15
400 15-SEP-15
400 16-SEP-15
14 rows selected.
The result set does not include ticket #500 because it was resolved on the day of submission. You will probably need to tweak the filters to fit your actual business rules.
I'm not sure I understood your question; if you are looking for all the dates between two dates then you can use the query below:
select trunc(date_col2 + lv)
from (select level lv from dual connect by level < (date_col1 - date_col2 - 1))
order by 1
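A runnable version of the same idea, with assumed literal endpoints standing in for date_col1 and date_col2 (it generates the dates strictly between the two):
-- returns 13-SEP-2015 and 14-SEP-2015
select date '2015-09-12' + level as date_col
from dual
connect by date '2015-09-12' + level < date '2015-09-15';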

Aggregate only new rows from source table

I have a source table with a timestamp column (YYYY.MM.DD HH24:MI:SS) and a target table with rows aggregated on a daily basis (date column: YYYY.MM.DD).
My Problem is: How do I bring new data from source to target and aggregate it?
I tried:
select
a.Sales,
trunc(a.timestamp,'DD') as TIMESTAMP,
count(1) as COUNT
from
tbl_Source a
where trunc(a.timestamp,'DD') > nvl((select MAX(b.TIME_TO_DAY)from tbl_target b), to_date('01.01.1975 00:00:00','dd.mm.yyyy hh24:mi:ss'))
group by a.sales,
trunc(a.Timestamp,'DD')
The problem with that is: when I have a row with timestamp '2013.11.15 00:01:32' and the max day in the target is the 14th of November, it will only aggregate the 15th. If I used >= instead of >, some rows would get loaded twice.
It looks like you are looking for a MERGE statement: if the day is already present in tbl_target then update the count, else insert the record.
merge into tbl_target dest
using
(
select sales, trunc(timestamp) as theday , count(*) as sales_count
from tbl_Source
where trunc(timestamp) >= ( select nvl(max(time_to_day),to_date('01.01.1975','dd.mm.yyyy')) from tbl_target )
group by sales, trunc(timestamp)
) src
on (src.theday = dest.time_to_day)
when matched then update set
dest.sales_count = src.sales_count
when not matched then
insert (time_to_day, sales_count)
values (src.theday, src.sales_count)
;
As far as I can understand your question, you need to get everything since the last reload of the target table.
The problem here: you need this date, but it is truncated during the update.
If my guesses are correct, you cannot do anything except store the date of the reload as an additional column, because there is no way to get it back from the data presented here.
About your query:
count(*) and count(1) are the same in performance (proved many times, at least in versions 10-11), so don't write count(1); it looks really ugly
don't use nvl; use coalesce instead, as it is much faster
I would write your query like this:
with t as (select max(b.time_to_day) mx from tbl_target b)
select a.sales, trunc(a.timestamp, 'dd') as timestamp, count(*) as count
from tbl_source a, t
where trunc(a.timestamp, 'dd') > t.mx or t.mx is null
group by a.sales, trunc(a.timestamp, 'dd')
Does this fit your needs:
WHERE trunc(a.timestamp,'DD') > nvl((select MAX(b.TIME_TO_DAY) + 1 - 1/(24*60*60) from tbl_target b), to_date('01.01.1975 00:00:00','dd.mm.yyyy hh24:mi:ss'))
i.e. instead of 2013-11-15 00:00:00 compare to 2013-11-16 23:59:59
Update:
This one?
WHERE trunc(a.timestamp,'DD') BETWEEN nvl((select MAX(b.TIME_TO_DAY) from ...) AND nvl((select MAX(b.TIME_TO_DAY) + 1 - 1/(24*60*60) from ...)

"BETWEEN" SQL Keyword for Oracle Dates -- Getting an error in Oracle

I have dates in this format in my database "01-APR-12" and the column is a DATE type.
My SQL statement looks like this:
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno AND s.salestype = 1
AND (s.salesdate BETWEEN '01-APR-12' AND '31-APR-12');
When I try to do it that way, I get this error -- ORA-01839: date not valid for month specified.
Can I even use the BETWEEN keyword with how the date is set up in the database?
If not, is there another way I can get the output of data that is in that date range without having to fix the data in the database?
Thanks!
April has 30 days, not 31.
Change
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno AND s.salestype = 1
AND (s.salesdate BETWEEN '01-APR-12' AND '31-APR-12');
to
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno AND s.salestype = 1
AND (s.salesdate BETWEEN '01-APR-12' AND '30-APR-12');
and you should be good to go.
In case the dates you are checking range from the first day of a month to the last day of that month, you may modify the query to avoid having to explicitly specify the last day of the month:
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno
AND s.salestype = 1 AND (s.salesdate BETWEEN '01-APR-12' AND LAST_DAY(TO_DATE('APR-12', 'MON-YY')));
The LAST_DAY function will provide the last day of the month.
The other answers are missing out on something important and will not return the correct results. Dates have date and time components. If your salesdate column is in fact a date that includes time, you will miss out on any sales that happened on April 30 unless they occurred exactly at midnight.
Here's an example:
create table date_temp (temp date);
insert into date_temp values(to_date('01-APR-2014 15:12:00', 'DD-MON-YYYY HH24:MI:SS'));
insert into date_temp values(to_date('30-APR-2014 15:12:00', 'DD-MON-YYYY HH24:MI:SS'));
table DATE_TEMP created.
1 rows inserted.
1 rows inserted.
select * from date_temp where temp between '01-APR-2014' and '30-APR-2014';
Query Result: 01-APR-14
If you want to get all records from April that includes those with time-components in the date fields, you should use the first day of the next month as the second side of the between clause:
select * from date_temp where temp between '01-APR-2014' and '01-MAY-2014';
01-APR-14
30-APR-14
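A pattern that sidesteps both problems, sketched against the question's customer and sales tables: use a half-open interval, >= the first day of the range and < the first day of the next month, with explicit DATE literals instead of NLS-dependent strings:
SELECT DISTINCT c.customerno, c.lname, c.fname
FROM customer c, sales s
WHERE c.customerno = s.customerno
  AND s.salestype = 1
  AND s.salesdate >= DATE '2012-04-01'
  AND s.salesdate <  DATE '2012-05-01';
This catches rows at any time of day on 30-APR and can never raise ORA-01839, since no day-of-month arithmetic is involved.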
