I have a table like this, where TIMELINE is an overall series of dates in order, and DATE is matched to the dates in TIMELINE and holds the actual records of dates that have values (shown in the NUMBER column):
TIMELINE     DATE         NUMBER
2022-03-03   2022-03-03   NULL
2022-03-04   2022-03-04   40
2022-03-07   NULL         NULL
2022-03-08   NULL         NULL
2022-06-08   2022-06-08   45
2022-06-28   2022-06-28   NULL
2022-06-29   NULL         NULL
2022-06-30   NULL         NULL
2022-07-08   2022-07-08   80
I am trying to fill in this table so that for every row where DATE is NULL, its NUMBER becomes the NUMBER of the most recent row whose DATE is not NULL and <= the current row's TIMELINE. For example, in the third row the DATE is NULL, so its NUMBER would become the value (40) associated with the date 2022-03-04, and the same for the fourth row. For the third-to-last and second-to-last rows, NUMBER stays NULL, because NULL is the value associated with the most recent date (2022-06-28) before those rows.
I am trying to get an output like this:
TIMELINE     DATE         NUMBER
2022-03-03   2022-03-03   NULL
2022-03-04   2022-03-04   40
2022-03-07   NULL         40
2022-03-08   NULL         40
2022-06-08   2022-06-08   45
2022-06-28   2022-06-28   NULL
2022-06-29   NULL         NULL
2022-06-30   NULL         NULL
2022-07-08   2022-07-08   80
Would this be achieved by joining the table onto itself, or is there a function that would help me do this? Thanks!
Here's an example of an update query that produces the desired results. Although I've kept your column names, DATE and NUMBER are reserved words; I would suggest avoiding those in your modeling.
--
-- Create test table
--
create or replace table x (timeline date, date date, number integer);

--
-- Load test data
--
insert into x values
    ('2022-03-03'::date, '2022-03-03'::date, NULL),
    ('2022-03-04'::date, '2022-03-04'::date, 40),
    ('2022-03-07'::date, NULL, NULL),
    ('2022-03-08'::date, NULL, NULL),
    ('2022-06-08'::date, '2022-06-08'::date, 45),
    ('2022-06-28'::date, '2022-06-28'::date, NULL),
    ('2022-06-29'::date, NULL, NULL),
    ('2022-06-30'::date, NULL, NULL),
    ('2022-07-08'::date, '2022-07-08'::date, 80);
--
-- Update the original table via a correlated subquery.
-- Note: the final join is on timeline alone (it is unique here);
-- joining on date as well would never match the rows where date is NULL,
-- since NULL = NULL is not true.
--
update x
set x.number = z.number
from (
    select a.timeline,
           a.date,
           case
               when a.date is NULL then b.number
               else a.number
           end as number
    from x a
    left join x b
        on b.timeline = (
            select max(timeline)
            from x
            where timeline < a.timeline
              and date is not null
        )
) z
where x.timeline = z.timeline;
--
-- Results
--
select * from x order by 1;
Results:
TIMELINE     DATE         NUMBER
2022-03-03   2022-03-03
2022-03-04   2022-03-04   40
2022-03-07                40
2022-03-08                40
2022-06-08   2022-06-08   45
2022-06-28   2022-06-28
2022-06-29
2022-06-30
2022-07-08   2022-07-08   80
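For reference, the same fill can also be computed in a single SELECT with window functions, without the self-join or the update (a sketch, assuming a dialect with standard window functions, such as Snowflake or PostgreSQL, and the same table x as above):

select timeline,
       date,
       -- each group starts at the most recent row that had a non-NULL date;
       -- that first row's NUMBER is the value to carry forward
       first_value(number) over (partition by grp order by timeline) as number
from (
    select timeline, date, number,
           count(date) over (order by timeline) as grp  -- count() skips NULL dates
    from x
) t
order by timeline;

The running count(date) assigns every NULL-date row to the group of the most recent non-NULL DATE, which is exactly the "max DATE <= current row" rule described in the question.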
Related
I ran into a problem, and maybe there are experienced folks here who can help me figure it out:
I have a table with rows:
ID     VALUE   DATE
2827   0       20.07.2022 10:40:01
490    27432   20.07.2022 10:40:01
565    189     20.07.2022 9:51:03
200    1       20.07.2022 9:50:01
731    0.91    20.07.2022 9:43:21
161    13004   19.07.2022 16:11:01
This table has about a million records and roughly 1000 distinct IDs; between rows for the same ID, only the date of the value change and, therefore, the value itself differ.
Whenever the value of an ID changes, a row is added to this table:
ID | time the value was changed (DATE) | VALUE
My task is to get each ID's value closest to an input date.
I mean: if I input the date "20.07.2022 10:00:00",
I want to get each ID (1-1000) with the "value, date" row having the last date before "20.07.2022 10:00:00":
ID     VALUE   DATE
2827   0       20.07.2022 9:59:11
490    27432   20.07.2022 9:40:01
565    189     20.07.2022 9:51:03
200    1       20.07.2022 9:50:01
731    0.91    20.07.2022 8:43:21
161    13004   19.07.2022 16:11:01
What query would be the most efficient and correct in this case?
If you want the data for each ID with the latest change up to, but not after, your input date, then you can just filter on that date and use aggregate functions to get the most recent data in that filtered range:
select id,
max(change_time) as change_time,
max(value) keep (dense_rank last order by change_time) as value
from your_table
where change_time <= <your input date>
group by id
With your previous sample data, using midnight this morning as the input date would give:
select id,
max(change_time) as change_time,
max(value) keep (dense_rank last order by change_time) as value
from your_table
where change_time <= timestamp '2022-07-28 00:00:00'
group by id
order by id
ID   CHANGE_TIME           VALUE
1    2022-07-24 10:00:00   900
2    2022-07-22 21:51:00   422
3    2022-07-24 13:01:00   1
4    2022-07-24 10:48:00   67
and using midday today would give:
select id,
max(change_time) as change_time,
max(value) keep (dense_rank last order by change_time) as value
from your_table
where change_time <= timestamp '2022-07-28 12:00:00'
group by id
order by id
ID   CHANGE_TIME           VALUE
1    2022-07-24 10:00:00   900
2    2022-07-22 21:51:00   422
3    2022-07-28 11:59:00   12
4    2022-07-28 11:45:00   63
5    2022-07-28 10:20:00   55
db<>fiddle with some other input dates to show the result set changing.
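An equivalent analytic-function formulation (a sketch against the same hypothetical your_table and columns): rank each ID's changes within the filtered range, newest first, and keep the top row. Some find this easier to read than the KEEP clause:

select id, change_time, value
from (
    select id, change_time, value,
           -- the newest change per id within the filtered range gets rn = 1
           row_number() over (partition by id order by change_time desc) as rn
    from your_table
    where change_time <= timestamp '2022-07-28 12:00:00'
) ranked
where rn = 1
order by id;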
I have this series (notice the holes in the dates):
Date         Value
2019-12-31   100
2020-01-02   110
2020-01-05   120
2020-01-07   125
2020-01-08   130
And I need to get this one:
Date         Value
2019-12-31   100
2020-01-01   100
2020-01-02   110
2020-01-03   110
2020-01-04   110
2020-01-05   120
2020-01-06   120
2020-01-07   125
2020-01-08   130
Notice that the rows for 2020-01-01, 2020-01-03, 2020-01-04 and 2020-01-06 didn't exist in the first table; their values are forward-filled from the most recent value available.
To get this done:
I created a dummy calendar with the List.Dates() function.
I merged this calendar with the first table, obtaining this:
Date         Value
2019-12-31   100
2020-01-01   null
2020-01-02   110
2020-01-03   null
2020-01-04   null
2020-01-05   120
2020-01-06   null
2020-01-07   125
2020-01-08   130
Then I created a function that takes a date as a parameter, filters the first table, and uses List.Last() to take the last non-null value, placing it in that date's row of the third table instead of the null.
It works quite well, but I find it too slow: for each row, the function must be called to scan the table for the most recent value available.
Is there a quicker way to perform this?
I want to create a stored procedure or function that returns the sum of a column's values over a period, aggregated by a step (day, month, year). For example, I have a table with consumption data that is saved every 15 minutes. I would like to get a report for the period from 2019-05-01 to 2019-05-10 with step '1 day': define a dataset for each of the days in this interval and get the sum of the values for each day.
The procedure then returns the data to Laravel, and charts are built from it.
My code at the moment:
CREATE OR REPLACE FUNCTION "public"."test"("meterid" int4, "started" text, "ended" text, "period" text)
  RETURNS TABLE("_kwh" numeric, "datetime" timestamp) AS $BODY$
BEGIN
  RETURN QUERY
  SELECT kwh, a_datetime
  FROM "public"."consumption"
  WHERE meter_id = meterid
    AND a_datetime BETWEEN to_timestamp(started, 'YYYY-MM-DD HH24:MI:SS')
                       AND to_timestamp(ended, 'YYYY-MM-DD HH24:MI:SS');
END$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;
I'm using PostgreSQL 10.7.
You can use generate_series(start, end, interval).
More information: see the PostgreSQL documentation on set-returning functions.
To simulate your situation I created a simple table:
postgres=# create table consumption (kwh int, datetime date);
CREATE TABLE
postgres=# insert into consumption values (10, '2019-01-01');
INSERT 0 1
postgres=# insert into consumption values (2, '2019-01-03');
INSERT 0 1
postgres=# insert into consumption values (24, '2019-03-06');
INSERT 0 1
postgres=# insert into consumption values (30, '2019-03-22');
INSERT 0 1
And ran the select with generate_series():
postgres=# SELECT COALESCE(SUM(kwh), 0) AS kwh,
period::DATE
FROM GENERATE_SERIES('2019-01-01','2019-12-31', '1 day'::interval) AS period
LEFT JOIN consumption ON period::DATE=datetime::DATE
GROUP BY 2
kwh | period
-----+------------
0 | 2019-04-17
0 | 2019-05-29
....
0 | 2019-04-06
0 | 2019-04-26
2 | 2019-01-03
0 | 2019-03-15
...
0 | 2019-11-21
0 | 2019-07-24
30 | 2019-03-22
0 | 2019-05-22
0 | 2019-11-19
...
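Applied to the original function, it might look like this (a sketch, untested: parameter names are kept from the question, with the step passed in the previously unused fourth parameter; each step's sum is labeled with the step's start time):

CREATE OR REPLACE FUNCTION public.test(meterid int4, started text, ended text, period text)
  RETURNS TABLE(_kwh numeric, datetime timestamp) AS $BODY$
BEGIN
  RETURN QUERY
  SELECT COALESCE(SUM(c.kwh), 0)::numeric AS _kwh,
         p.step_start::timestamp AS datetime
  FROM generate_series(to_timestamp(started, 'YYYY-MM-DD HH24:MI:SS'),
                       to_timestamp(ended, 'YYYY-MM-DD HH24:MI:SS'),
                       period::interval) AS p(step_start)
  LEFT JOIN public.consumption c
         ON c.meter_id = meterid
        AND c.a_datetime >= p.step_start
        AND c.a_datetime < p.step_start + period::interval
  GROUP BY p.step_start
  ORDER BY p.step_start;
END
$BODY$ LANGUAGE plpgsql STABLE;

A call such as select * from test(1, '2019-05-01 00:00:00', '2019-05-10 00:00:00', '1 day'); would then return one summed row per day.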
The title says it pretty much all. What is the exact value assigned to (a) an arithmetic, (b) a string, (c) a logical field to represent NULL in the Oracle DBMS?
Thank you for your time!
Null is the absence of meaning, the absence of value. What gets assigned is null. Not even an ASCII null (ASCII value 0), but nothing.
That's why there's a special operation to test for null. This will never be true:
...
where col1 = null
We need to test for:
where col1 is null
"we were asked by a professor at uni to find what exactly that value is in these 3 respective cases"
Okay, let's investigate that. Here is a table with two rows:
SQL> create table t42 (
2 type varchar2(10)
3 , colv varchar2(10)
4 , coln number
5 , cold date
6 )
7 /
Table created.
SQL> insert into t42 values ('not null', 'X', 1, sysdate);
1 row created.
SQL> insert into t42 values ('all null', null, null, null);
1 row created.
SQL>
Oracle has a function dump() which shows us the datatype and content of the passed value.
What does dump() tell us about our two rows?
SQL> select type
2 , dump(colv) as colv
3 , dump(coln) as coln
4 , dump(cold) as cold
5 from t42;
TYPE COLV COLN COLD
---------- -------------------- -------------------- ----------------------------------
not null Typ=1 Len=1: 88 Typ=2 Len=2: 193,2 Typ=12 Len=7: 120,117,4,29,6,60,44
all null NULL NULL NULL
SQL>
So: the null columns have no data type, no value.
"I don't think dump is suitable for supporting any argument over what "exactly" gets stored to represent a null - because if the expression is null, it simply returns null by definition "
#JeffreyKemp makes a fair point. So let's dip a toe into the internals. The first step is to dump the data block(s);l the dump is written to a trace file:
SQL> conn / as sysdba
Connected.
USER is "SYS"
SQL> select dbms_rowid.rowid_relative_fno(t42.rowid) as fno
2 , dbms_rowid.rowid_block_number(t42.rowid) as blk
3 from a.t42
4 /
FNO BLK
-------- --------
11 132
11 132
SQL> alter system dump datafile 11 block 132;
System altered.
SQL> select value from v$diag_info where name = 'Default Trace File';
VALUE
--------------------------------------------------------------------------------
/home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3275.trc
SQL>
Because T42 is small it fits into only one block. Here is the interesting bit of the dump:
data_block_dump,data header at 0x805664
===============
tsiz: 0x1f98
hsiz: 0x16
pbl: 0x00805664
76543210
flag=--------
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1f73
avsp=0x1f5d
tosp=0x1f5d
0xe:pti[0] nrow=2 offs=0
0x12:pri[0] offs=0x1f7f
0x14:pri[1] offs=0x1f73
block_row_dump:
tab 0, row 0, #0x1f7f
tl: 25 fb: --H-FL-- lb: 0x1 cc: 4
col 0: [ 8] 6e 6f 74 20 6e 75 6c 6c
col 1: [ 1] 58
col 2: [ 2] c1 02
col 3: [ 7] 78 75 05 01 02 08 08
tab 0, row 1, #0x1f73
tl: 12 fb: --H-FL-- lb: 0x1 cc: 1
col 0: [ 8] 61 6c 6c 20 6e 75 6c 6c
end_of_block_dump
End dump data blocks tsn: 33 file#: 11 minblk 132 maxblk 132
We can see there are two rows in the table. The first row has entries for four columns; this is the 'not null' row. The second row has only one column: this is the 'all null' row. So, Jeffrey is quite right. All the trailing fields are null so Oracle stores nothing for them.
APC's answer is quite right; let me add some information on what it means:
Arithmetic: NULL basically means "not defined". Every math operation with NULL (i.e. "not defined") also returns NULL.
String: Oracle treats an empty string as NULL, i.e. '' IS NULL returns TRUE - this behavior of Oracle differs from many other RDBMSs.
Logical: I assume you mean what happens with BOOLEAN data types. Unlike almost any other programming language, in PL/SQL a BOOLEAN variable can have three different states: TRUE, FALSE and NULL. Be aware of this special behavior when you work with BOOLEAN in PL/SQL.
In addition to #APC's answer: in databases there is something like 'ternary logic' in comparison operations. We can say two values are equal, not equal, or, thirdly, "we don't know", because a value is absent; even a comparison of two NULL values yields NULL, meaning we have no information about the operands.
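A quick, hypothetical illustration of both points that can be run in any Oracle session (dual is Oracle's built-in one-row table):

-- '' is treated as NULL, and a comparison involving NULL is neither TRUE nor FALSE
select case when '' is null then 'empty string is null' end as empty_string,
       case when 1 = null then 'equal'
            when 1 <> null then 'not equal'
            else 'unknown'
       end as three_valued
from dual;
-- returns: 'empty string is null', 'unknown'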
How do I filter rows that have values in all columns (i.e. exclude a row if it has a missing value/NULL in any of the columns)?
Say:
id name age height
------------------------
1 abc 19 NULL
2 fds 34 2.3
3 grt NULL NULL
The output should be only row 2. How do I do this?
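A minimal sketch, assuming the table is named t (a hypothetical name) and has exactly the columns shown: test each nullable column with IS NOT NULL and combine the conditions with AND.

-- keep only rows where every column has a value
select *
from t
where name is not null
  and age is not null
  and height is not null;
-- id looks like a non-null key here; add "and id is not null" if it can be NULL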