Oracle SQL: totalize days for contiguous ranges

I have a table with date ranges and I need to count the days, but only for contiguous date ranges...
-----------------------------------
|           table RANGES          |
-----------------------------------
| d_start    | d_end      | days  |
| (date)     | (date)     | (num) |
-----------------------------------
| 2014-02-01 | 2014-02-05 |     4 |
| 2014-02-06 | 2014-02-11 |     5 |
| 2014-03-22 | 2014-03-25 |     3 |
| 2014-04-02 | 2014-04-10 |     8 |
| 2014-04-11 | 2014-04-20 |     9 |
-----------------------------------
I need to total the days, breaking whenever the date ranges are not contiguous, to get a result like this:
| 2014-02-01 | 2014-02-11 |     9 |
| 2014-03-22 | 2014-03-25 |     3 |
| 2014-04-02 | 2014-04-20 |    17 |
I tried LEAD to check whether the next record's d_start equals d_end, but I can't achieve the goal.
Many thanks for any ideas!
Marco

The answer is quite tricky:
SQL> create table tmp$dates (d_start date, d_end date);
Table created
SQL> insert into tmp$dates values (DATE '2014-02-01', DATE '2014-02-05');
1 row inserted
SQL> insert into tmp$dates values (DATE '2014-02-06', DATE '2014-02-11');
1 row inserted
SQL> insert into tmp$dates values (DATE '2014-03-22', DATE '2014-03-25');
1 row inserted
SQL> insert into tmp$dates values (DATE '2014-04-02', DATE '2014-04-10');
1 row inserted
SQL> insert into tmp$dates values (DATE '2014-04-11', DATE '2014-04-20');
1 row inserted
SQL> select min(d_start), max(d_end), max(d_end) - min(d_start) + 1 n#
2 from tmp$dates d
3 start with d_start not in (select d_end + 1 from tmp$dates)
4 connect by prior d_end = d_start - 1
5 group by level - rownum
6 order by 1;
MIN(D_START) MAX(D_END) N#
------------ ----------- ----------
01.02.2014 11.02.2014 11
22.03.2014 25.03.2014 4
02.04.2014 20.04.2014 19
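Note that N# above is the inclusive span, max(d_end) - min(d_start) + 1, which is why it shows 11, 4 and 19 rather than the 9, 3 and 17 asked for in the question. To get the question's totals, the same grouping can sum each range's own length instead. A sketch, assuming days is simply d_end - d_start as in the sample data:
-- same START WITH / CONNECT BY grouping as above, but summing each range's own length
select min(d_start), max(d_end), sum(d_end - d_start) days_total
  from tmp$dates d
 start with d_start not in (select d_end + 1 from tmp$dates)
connect by prior d_end = d_start - 1
 group by level - rownum
 order by 1;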

Related

How can I write code using a FOR LOOP to sum balance per date

I have two tables.
One table is tbl_ledger, with columns:
| balance | eff_date   |
| ------- | ---------- |
| 200$    | 2 FEB 2008 |
| 500$    | 2 FEB 2008 |
| 250$    | 3 FEB 2008 |
| 150$    | 5 FEB 2008 |
The other table is tbl_ledger_input, with columns balance and eff_date:
| balance | eff_date   |
| ------- | ---------- |
| 700$    | 2 FEB 2008 |
| 250$    | 3 FEB 2008 |
| 150$    | 5 FEB 2008 |
| ...     | ...        |
**I want to insert into tbl_ledger_input, for each day, the sum(balance) from tbl_ledger for that date, using a FOR LOOP.**
Please help me. I want to insert into tbl_ledger_input using a FOR LOOP over each eff_date.
As you've already seen in the comments, a loop isn't the best choice for that. Row-by-row processing is slow, though you won't notice any difference when dealing with small data sets (as in this example).
If you want to do it in PL/SQL (using a loop), a simple option is a cursor FOR loop:
SQL> begin
2 for cur_r in (select eff_date, sum(balance) balance
3 from tbl_ledger
4 group by eff_date
5 )
6 loop
7 insert into tbl_ledger_input (balance, eff_date)
8 values (cur_r.balance, cur_r.eff_date);
9 end loop;
10 end;
11 /
PL/SQL procedure successfully completed.
SQL> select * From tbl_ledger_input;
BALANCE EFF_DATE
---------- -----------
700 02-FEB-2008
250 03-FEB-2008
What you should really be doing is
SQL> truncate table tbl_ledger_input;
Table truncated.
SQL> insert into tbl_ledger_input (balance, eff_date)
2 select sum(balance), eff_date
3 from tbl_ledger
4 group by eff_date;
2 rows created.
SQL> select * From tbl_ledger_input;
BALANCE EFF_DATE
---------- -----------
700 02-FEB-2008
250 03-FEB-2008
SQL>

Pull interlinked records based on rank and latest timestamp

I have a table like below.
myTable:
---------------------------------------------------------------------------------
id | ref | type | status | update_dt
---------------------------------------------------------------------------------
id1 | m1123 | 10 | 1 | 03-NOV-22 10.44.64.104000000 AM
id1 | m2123 | 10 | 2 | 03-NOV-22 10.44.64.104000000 AM
id1 | s1123 | 20 | | 03-NOV-22 10.44.64.104000000 AM
id1 | s2123 | 20 | | 03-NOV-22 10.44.54.104000000 AM
id1 | p1123 | 30 | | 03-NOV-22 10.44.54.104000000 AM
id2 | m1234 | 10 | | 02-NOV-22 10.44.64.104000000 AM
id2 | s1234 | 20 | | 02-NOV-22 10.44.54.104000000 AM
id2 | s2234 | 20 | | 02-NOV-22 10.44.54.104000000 AM
id3 | m1345 | 10 | 1 | 01-NOV-22 10.44.64.104000000 AM
id3 | s1345 | 20 | | 01-NOV-22 10.44.64.104000000 AM
id3 | s2345 | 20 | | 01-NOV-22 10.44.54.104000000 AM
---------------------------------------------------------------------------------
My requirement looks pretty complex to me, and I have tried to get partway there but not completely. Here are my requirements:
1. From the table, I have to pull records of type 10 and 20 alone, with type 10 having status either null or 1.
2. For the type 10 comparison, I need to convert update_dt to epoch and pull all the type 10 records above a specific epoch.
3. Type 10 records are linked to type 20 records by the id; they have the same id.
4. For all the records pulled in step 2, I need to pull their corresponding type 20 records, but only the latest one based on update_dt.
5. If multiple records of type 20 have the same update_dt from step 4, any one of them can be picked.
Based on the above requirements, for a sample epoch corresponding to Nov 1 2022, 11 AM (1667300400), I need to get a result like this:
-----------------------------------------------------------------------------------------------
ref1 | ref2 | ref1_update_dt | ref2_update_dt
-----------------------------------------------------------------------------------------------
m1123 | s1123 | 03-NOV-22 10.44.64.104000000 AM | 03-NOV-22 10.44.64.104000000 AM
m1234 | s2234 | 02-NOV-22 10.44.64.104000000 AM | 02-NOV-22 10.44.54.104000000 AM
-----------------------------------------------------------------------------------------------
I tried the below, but didn't quite get there.
WITH cte_latest AS (
    SELECT t1.ref       ref1,
           t2.ref       ref2,
           t1.update_dt ref1_update_dt,
           t2.update_dt ref2_update_dt,
           RANK() OVER (ORDER BY t2.update_dt DESC) rank_temp
    FROM   myTable t1
    JOIN   myTable t2 ON t1.id = t2.id
    WHERE  t1.type = 10
    AND    (t1.status IS NULL OR t1.status = 1)
    AND    t2.type = 20
    AND    (CAST(t1.update_dt AS DATE) - TO_DATE('01/01/1970', 'DD/MM/YYYY')) * 24 * 60 * 60 > '1667300400'
)
SELECT ref1,
       ref2,
       ref1_update_dt,
       ref2_update_dt
FROM   cte_latest
WHERE  rank_temp = 1
ORDER  BY ref1_update_dt;
Please help.
RANK will return the same number when there are multiple type 20 records that have the same update_dt. So, you will want to use ROW_NUMBER instead. That will ensure that each type 20 row gets a unique number to break any ties - per rule #5.
Also, you will need to partition the ROW_NUMBER based on the id of the type 10 records. That will cause the numbering to reset at 1 for each type 10 record id. Without partitioning every row in the result set would get a unique number.
ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.update_dt DESC)
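Putting that into the original query, a sketch (table and column names are taken from the question; the epoch literal is the sample 1667300400, compared as a number rather than a string):
WITH cte_latest AS (
    SELECT t1.ref       ref1,
           t2.ref       ref2,
           t1.update_dt ref1_update_dt,
           t2.update_dt ref2_update_dt,
           -- unique numbering per type 10 id, latest type 20 record first
           ROW_NUMBER() OVER (PARTITION BY t1.id
                              ORDER BY t2.update_dt DESC) rn
    FROM   myTable t1
    JOIN   myTable t2 ON t1.id = t2.id
    WHERE  t1.type = 10
    AND    (t1.status IS NULL OR t1.status = 1)
    AND    t2.type = 20
    AND    (CAST(t1.update_dt AS DATE) - TO_DATE('01/01/1970', 'DD/MM/YYYY')) * 24 * 60 * 60 > 1667300400
)
SELECT ref1, ref2, ref1_update_dt, ref2_update_dt
FROM   cte_latest
WHERE  rn = 1
ORDER  BY ref1_update_dt;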

See number of live/dead tuples in MonetDB

I'm trying to get some precise row counts for all tables, given that some have deleted rows. I have been using sys.storage.count, but this seems to count the deleted rows as well.
I assume using sys.storage would be simpler and faster than looping through count(*) queries, though both strategies may be fine in practice.
Maybe there is some column that counts modifications so I could just subtract the two counts?
If all you need to know is the number of actual rows in a table, I'd recommend just using a count(*) query. It's very fast. Even if you have N tables, it's easy to do a count(*) for each table.
sys.storage gives you information from the raw storage. With that, you can get pretty low-level information, but it has some rough edges. sys.storage.count returns the count in the storage, so it does indeed include the deleted rows, since they are not physically removed yet. As of the Jul2021 version of MonetDB, deleted rows are automatically overwritten by new inserts (i.e. auto-vacuuming). So, to get the actual row count, you need to look up the 'deletes' from sys.deltas('<schema>', '<table>'). For instance:
sql>create table tbl (id int, city string);
operation successful
sql>insert into tbl values (1, 'London'), (2, 'Paris'), (3, 'Barcelona');
3 affected rows
sql>select * from tbl;
+------+-----------+
| id | city |
+======+===========+
| 1 | London |
| 2 | Paris |
| 3 | Barcelona |
+------+-----------+
3 tuples
sql>select schema, table, column, count from sys.storage where table='tbl';
+--------+-------+--------+-------+
| schema | table | column | count |
+========+=======+========+=======+
| sys | tbl | city | 3 |
| sys | tbl | id | 3 |
+--------+-------+--------+-------+
2 tuples
sql>select id, deletes from sys.deltas ('sys', 'tbl');
+-------+---------+
| id | deletes |
+=======+=========+
| 15569 | 0 |
| 15570 | 0 |
+-------+---------+
2 tuples
After we delete one row, the actual row count is sys.storage.count - sys.deltas ('sys', 'tbl').deletes:
sql>delete from tbl where id = 2;
1 affected row
sql>select * from tbl;
+------+-----------+
| id | city |
+======+===========+
| 1 | London |
| 3 | Barcelona |
+------+-----------+
2 tuples
sql>select schema, table, column, count from sys.storage where table='tbl';
+--------+-------+--------+-------+
| schema | table | column | count |
+========+=======+========+=======+
| sys | tbl | city | 3 |
| sys | tbl | id | 3 |
+--------+-------+--------+-------+
2 tuples
sql>select id, deletes from sys.deltas ('sys', 'tbl');
+-------+---------+
| id | deletes |
+=======+=========+
| 15569 | 1 |
| 15570 | 1 |
+-------+---------+
2 tuples
After we insert a new row, the deleted row is overwritten:
sql>insert into tbl values (4, 'Praag');
1 affected row
sql>select * from tbl;
+------+-----------+
| id | city |
+======+===========+
| 1 | London |
| 4 | Praag |
| 3 | Barcelona |
+------+-----------+
3 tuples
sql>select schema, table, column, count from sys.storage where table='tbl';
+--------+-------+--------+-------+
| schema | table | column | count |
+========+=======+========+=======+
| sys | tbl | city | 3 |
| sys | tbl | id | 3 |
+--------+-------+--------+-------+
2 tuples
sql>select id, deletes from sys.deltas ('sys', 'tbl');
+-------+---------+
| id | deletes |
+=======+=========+
| 15569 | 0 |
| 15570 | 0 |
+-------+---------+
2 tuples
So, the formula to compute the actual row count (sys.storage.count - sys.deltas ('sys', 'tbl').deletes) is generally applicable. sys.deltas() keeps stats for every column of a table, but the count and deletes are table wide, so you only need to check one column.
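If you want it as a single query, the subtraction can be done directly. A sketch for the sys.tbl example above (identifiers are quoted to avoid keyword clashes; max() just collapses the per-column rows, since both figures are table-wide):
-- live rows = storage count minus deletes reported by sys.deltas()
select (select max("count")  from sys.storage
         where "schema" = 'sys' and "table" = 'tbl')
     - (select max(deletes)  from sys.deltas('sys', 'tbl')) as live_rows;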

Extract timestamp from filename and add it in a new column (say, date) using Pig

I have files named YYYYMMDD_claims_portal.csv; I need only the YYYYMMDD part, stored in a new column (say, date).
Currently we have 3 columns: Claim, User, ID. Now I need to add one more column, date, whose value is the YYYYMMDD taken from the file name.
Use the virtual column input__file__name.
Demo
bash
[]$ mkdir mytable
[]$ cat>mytable/20170918_claims_portal.csv
1
2
[]$ cat>mytable/20170919_claims_portal.csv
3
[]$ cat>mytable/20170920_claims_portal.csv
4
5
6
hive
create external table mytable (i int) stored as textfile
;
select i
,regexp_extract(input__file__name,'(\\d{8})_claims_portal.csv',1) as dt
from mytable
;
+----+-----------+
| i | dt |
+----+-----------+
| 4 | 20170920 |
| 5 | 20170920 |
| 6 | 20170920 |
| 3 | 20170919 |
| 1 | 20170918 |
| 2 | 20170918 |
+----+-----------+

Slow Update When Using Oracle PL/SQL Table

We're using a PL/SQL table (named pTable) to collect a number of ids to be updated.
However, the statement
UPDATE aTable
SET aColumn = 1
WHERE id IN (SELECT COLUMN_VALUE
FROM TABLE (pTable));
takes a long time to execute.
It seems that the optimizer comes up with a very bad execution plan: instead of using the index defined on id (the primary key), it decides to do a full table scan on aTable. pTable usually contains very few values (in most cases just one).
What can we do to make this faster? The best we've come up with is to handle low pTable.Count (1 and 2) as special cases, but that is certainly not very elegant.
Thanks for all the great suggestions. I wrote about this issue in my blog at http://smartercoding.blogspot.com/2010/01/performance-issues-using-plsql-tables.html.
You can try the cardinality hint. This is good if you know (roughly) the number of rows in the collection.
UPDATE aTable
SET aColumn = 1
WHERE id IN (SELECT /*+ cardinality( pt 10 ) */
COLUMN_VALUE
FROM TABLE (pTable) pt );
Here's another approach. Create a temporary table:
create global temporary table pTempTable ( id int primary key )
on commit delete rows;
To perform the update, populate pTempTable with the contents of pTable and execute:
update
(
select aColumn
from aTable aa join pTempTable pp on aa.id = pp.id
)
set aColumn = 1;
This should perform reasonably well without resorting to optimizer hints.
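The populate step itself is a one-liner. A sketch, assuming pTable is a SQL-level collection type that TABLE() can iterate over (as it must be for the original UPDATE to compile):
-- copy the collection's ids into the temporary table before running the update
insert into pTempTable (id)
select column_value from table(pTable);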
The bad execution plan is probably unavoidable (unfortunately). There is no statistics information for the PL/SQL table, so the optimizer has no way of knowing that there are few rows in it. Is it possible to use hints in an UPDATE? If so, you might force use of the index that way.
It helped to tell the optimizer to use the "correct" index instead of going on a wild full-table scan:
UPDATE /*+ INDEX(aTable PK_aTable) */aTable
SET aColumn = 1
WHERE id IN (SELECT COLUMN_VALUE
FROM TABLE (CAST (pdarllist AS list_of_keys)));
I couldn't apply this solution to more complicated scenarios, but found other workarounds for those.
You could try adding a ROWNUM < ... clause.
In this test a ROWNUM < 30 changes the plan to use an index.
Of course that depends on your set of values having a reasonable maximum size.
create table atable (acolumn number, id number);
insert into atable select rownum, rownum from dual connect by level < 150000;
alter table atable add constraint atab_pk primary key (id);
exec dbms_stats.gather_table_stats(ownname => user, tabname => 'ATABLE');
create type type_coll is table of number(4);
/
declare
v_coll type_coll;
begin
v_coll := type_coll(1,2,3,4);
UPDATE aTable
SET aColumn = 1
WHERE id IN (SELECT COLUMN_VALUE
FROM TABLE (v_coll));
end;
/
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
UPDATE ATABLE SET ACOLUMN = 1 WHERE ID IN (SELECT COLUMN_VALUE FROM TABLE (:B1 ))
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | | 142 (100)| |
| 1 | UPDATE | ATABLE | | | | |
|* 2 | HASH JOIN RIGHT SEMI | | 1 | 11 | 142 (8)| 00:00:02 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| | | | | |
| 4 | TABLE ACCESS FULL | ATABLE | 150K| 1325K| 108 (6)| 00:00:02 |
----------------------------------------------------------------------------------------------
declare
v_coll type_coll;
begin
v_coll := type_coll(1,2,3,4);
UPDATE aTable
SET aColumn = 1
WHERE id IN (SELECT COLUMN_VALUE
FROM TABLE (v_coll)
where rownum < 30);
end;
/
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
UPDATE ATABLE SET ACOLUMN = 1 WHERE ID IN (SELECT COLUMN_VALUE FROM TABLE (:B1 ) WHERE
ROWNUM < 30)
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | | 31 (100)| |
| 1 | UPDATE | ATABLE | | | | |
| 2 | NESTED LOOPS | | 1 | 22 | 31 (4)| 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 29 | 377 | 29 (0)| 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 58 | | |
|* 5 | COUNT STOPKEY | | | | | |
| 6 | COLLECTION ITERATOR PICKLER FETCH| | | | | |
|* 7 | INDEX UNIQUE SCAN | ATAB_PK | 1 | 9 | 0 (0)| |
---------------------------------------------------------------------------------------------------
I wonder if the MATERIALIZE hint in the subselect from the PL/SQL table would force a temp table instantiation and help the optimizer?
UPDATE aTable
SET aColumn = 1
WHERE id IN (SELECT /*+ MATERIALIZE */ COLUMN_VALUE
FROM TABLE (pTable));
