Get closest date with id and value in Oracle

I ran into a problem and maybe there are experienced guys here to help me figure it out:
I have a table with rows:
ID    VALUE  DATE
2827  0      20.07.2022 10:40:01
490   27432  20.07.2022 10:40:01
565   189    20.07.2022 9:51:03
200   1      20.07.2022 9:50:01
731   0.91   20.07.2022 9:43:21
161   13004  19.07.2022 16:11:01
This table has about a million records for roughly 1000 distinct IDs. For a given ID, only the value and the date of the value change differ between rows.
Whenever the value for an ID changes, a row is added to this table:
ID | Time the value was changed (DATE) | VALUE
My task is to get, for every ID, the value closest to an input date.
I mean: if I input the date "20.07.2022 10:00:00",
I want to get each ID (1-1000) with its "value, date" row that has the last date before "20.07.2022 10:00:00":
ID    VALUE  DATE
2827  0      20.07.2022 9:59:11
490   27432  20.07.2022 9:40:01
565   189    20.07.2022 9:51:03
200   1      20.07.2022 9:50:01
731   0.91   20.07.2022 8:43:21
161   13004  19.07.2022 16:11:01
What query will be the most optimal and correct in this case?

If you want the data for each ID with the latest change up to, but not after, your input date then you can just filter on that date, and use aggregate functions to get the most recent data in that filtered range:
select id,
max(change_time) as change_time,
max(value) keep (dense_rank last order by change_time) as value
from your_table
where change_time <= <your input date>
group by id
With your previous sample data, using midnight this morning as the input date would give:
select id,
max(change_time) as change_time,
max(value) keep (dense_rank last order by change_time) as value
from your_table
where change_time <= timestamp '2022-07-28 00:00:00'
group by id
order by id
ID  CHANGE_TIME          VALUE
1   2022-07-24 10:00:00  900
2   2022-07-22 21:51:00  422
3   2022-07-24 13:01:00  1
4   2022-07-24 10:48:00  67
and using midday today would give:
select id,
max(change_time) as change_time,
max(value) keep (dense_rank last order by change_time) as value
from your_table
where change_time <= timestamp '2022-07-28 12:00:00'
group by id
order by id
ID  CHANGE_TIME          VALUE
1   2022-07-24 10:00:00  900
2   2022-07-22 21:51:00  422
3   2022-07-28 11:59:00  12
4   2022-07-28 11:45:00  63
5   2022-07-28 10:20:00  55
db<>fiddle with some other input dates to show the result set changing.
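For anyone who wants to try the idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the query above: SQLite has no KEEP (DENSE_RANK LAST ...) clause, so this version joins each row back to the per-ID maximum change_time at or before the cutoff. The rows are the ones visible in the two result sets above, the timestamps are stored as ISO text (an assumption, so that string comparison is chronological), and the helper name latest_per_id is made up for the example.

```python
import sqlite3

# Rows taken from the result sets shown above; table and column names
# follow the answer (your_table, id, change_time, value).
ROWS = [
    (1, "2022-07-24 10:00:00", 900),
    (2, "2022-07-22 21:51:00", 422),
    (3, "2022-07-24 13:01:00", 1),
    (3, "2022-07-28 11:59:00", 12),
    (4, "2022-07-24 10:48:00", 67),
    (4, "2022-07-28 11:45:00", 63),
    (5, "2022-07-28 10:20:00", 55),
]

# Portable rewrite: find the latest change_time per id up to the cutoff,
# then join back to fetch the value stored on that row.
QUERY = """
select t.id, t.change_time, t.value
from your_table t
join (
    select id, max(change_time) as change_time
    from your_table
    where change_time <= ?
    group by id
) latest
  on latest.id = t.id
 and latest.change_time = t.change_time
order by t.id
"""


def latest_per_id(conn, cutoff):
    """Return (id, change_time, value) for the last change at or before cutoff."""
    return conn.execute(QUERY, (cutoff,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (id integer, change_time text, value numeric)")
conn.executemany("insert into your_table values (?, ?, ?)", ROWS)

for row in latest_per_id(conn, "2022-07-28 00:00:00"):
    print(row)
```

One difference to keep in mind: if an ID has two rows with exactly the same change_time, this join returns both, whereas the aggregate form above collapses them into one row per ID.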

Related

How to find the earliest date of the occurrence of a value for each year

I have a table with this structure:
STATION ID  YEAR  MONTH  DAY  RECDATE     VALUE
123456      1950  01     01   01-01-1950  95
123456      1950  01     15   01-15-1950  85
123456      1950  03     15   03-15-1950  95
123456      1951  01     02   01-02-1951  35
123456      1951  01     10   01-10-1951  35
123456      1952  02     12   02-12-1952  80
123456      1952  02     13   02-13-1952  80
And so on. There's a TMIN value for this station ID for every day of every year between 1888 and 2022. What I'm trying to figure out is a query that will give me the earliest date in each year that a value between -100 and 100 occurs.
The query
select year, max(value) from table where value between -100 and 100 group by year order by year
gives the year and value. The query
select recdate, min(value) from table group by recdate order by recdate
gives me every recdate with the value.
I have a vague memory of a query that practically partitions the data by a year or a date range so that the query would look at all the 1950 dates and give the earliest date for the value, then all the 1951 dates, and so on. Does anyone remember queries like that?
Thanks for any and all suggestions.
If I understood you correctly, this is your question:
What I'm trying to figure out is a query that will give me the earliest date in each year that a value between -100 and 100 occurs.
Then you posted 2 queries which return something, but I don't see how they relate to the question. What was their purpose? To me, they look like random queries one could write against the data in that table.
Therefore, back to the question: isn't that just
select min(recdate), --> "earliest date
year --> in each year
from that_table -- that a
where value between -100 and 100 --> value between -100 and 100 occurs"
group by year
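
A quick way to check that query against the sample rows is to run it through Python's built-in sqlite3 module. This is only a sketch under some assumptions: RECDATE is stored as ISO text so that min() on it is chronological, the column names station_id, year, recdate and value are guesses at the real ones, and the last row is a made-up out-of-range reading added just to exercise the filter.

```python
import sqlite3

# Sample rows from the question, with RECDATE rewritten as ISO text
# (an assumption, so that min() on text sorts chronologically in SQLite).
ROWS = [
    (123456, 1950, "1950-01-01", 95),
    (123456, 1950, "1950-01-15", 85),
    (123456, 1950, "1950-03-15", 95),
    (123456, 1951, "1951-01-02", 35),
    (123456, 1951, "1951-01-10", 35),
    (123456, 1952, "1952-02-12", 80),
    (123456, 1952, "1952-02-13", 80),
    # Hypothetical out-of-range row, not from the question.
    (123456, 1952, "1952-01-05", 150),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table that_table "
    "(station_id integer, year integer, recdate text, value integer)"
)
conn.executemany("insert into that_table values (?, ?, ?, ?)", ROWS)

# Same shape as the query above: earliest in-range date per year.
earliest = conn.execute(
    """
    select year, min(recdate)
    from that_table
    where value between -100 and 100
    group by year
    order by year
    """
).fetchall()

for year, recdate in earliest:
    print(year, recdate)
```

If the table holds more than one station, the station ID column would presumably need to go into both the select list and the group by.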

Replace null values with the most recent value in a time series

I have this series (notice the holes in the dates):
Date        Value
2019-12-31  100
2020-01-02  110
2020-01-05  120
2020-01-07  125
2020-01-08  130
And I need to get this one:
Date        Value
2019-12-31  100
2020-01-01  100
2020-01-02  110
2020-01-03  110
2020-01-04  110
2020-01-05  120
2020-01-06  120
2020-01-07  125
2020-01-08  130
Notice that the rows for 2020-01-01, 2020-01-03, 2020-01-04 and 2020-01-06 didn't exist in the first table; their values are forward filled from the most recent value available.
To get this done:
I created a dummy calendar with the List.Dates() function.
I merged this calendar with the first table obtaining this:
Date        Value
2019-12-31  100
2020-01-01  null
2020-01-02  110
2020-01-03  null
2020-01-04  null
2020-01-05  120
2020-01-06  null
2020-01-07  125
2020-01-08  130
Then I created a function that took a date as a parameter which filtered the first table and with the function List.Last() took the last non-null value and placed it in the row of the third table instead of the null.
It works quite well, but I find it too slow. For each row the function must be called to scan the table for the most recent value available.
Is there a quicker way to perform this?
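
The slowness described above comes from rescanning the table once per row. A single pass that carries the last seen value forward avoids that. Below is a minimal Python sketch of the idea, not Power Query code; the function name forward_fill is made up for the example. In Power Query itself, the built-in Table.FillDown applied to the merged, date-sorted table is probably the closest equivalent.

```python
from datetime import date, timedelta

# The series from the question: (date, value) pairs with gaps in the dates.
SERIES = [
    (date(2019, 12, 31), 100),
    (date(2020, 1, 2), 110),
    (date(2020, 1, 5), 120),
    (date(2020, 1, 7), 125),
    (date(2020, 1, 8), 130),
]


def forward_fill(series):
    """Fill missing days with the most recent earlier value in one pass.

    `series` is a list of (date, value) pairs sorted by date.
    """
    if not series:
        return []
    known = dict(series)
    day, last_day = series[0][0], series[-1][0]
    filled, current = [], None
    while day <= last_day:
        # Use the stored value when the day exists, else keep the previous one.
        current = known.get(day, current)
        filled.append((day, current))
        day += timedelta(days=1)
    return filled


filled = forward_fill(SERIES)
for day, value in filled:
    print(day.isoformat(), value)
```

Each calendar day is visited once and looked up in a dictionary, so the work grows linearly with the length of the date range rather than with rows times table size.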

How to find Max value of an alphanumeric field in oracle?

I have the data as below and ID is VARCHAR2 type
Table Name :EMP
ID TST_DATE
A035 05/12/2015
BAB0 05/12/2015
701 07/12/2015
81 07/12/2015
I used the query below to get the max of ID grouped by TST_DATE.
SELECT TST_DATE,MAX(ID) from EMP group by TST_DATE;
TST_DATE MAX(ID)
05/12/2015 BAB0
07/12/2015 81
In the second row it is returning 81 instead of 701.
To sort strings that represent (hex) numbers in numeric, rather than lexicographical, order you need to convert them to actual numbers:
SELECT TST_DATE, ID, TO_NUMBER(ID, 'XXXXXXXXXX') from EMP
ORDER BY TO_NUMBER(ID, 'XXXXXXXXXX');
TST_DATE ID TO_NUMBER(ID,'XXXXXXXXXX')
---------- ---- ---------------------------------------
07/12/2015 81 129
07/12/2015 701 1793
05/12/2015 A035 41013
05/12/2015 BAB0 47792
You can use that numeric form within your max() and convert back to a hex string for display:
SELECT TST_DATE,
TO_CHAR(MAX(TO_NUMBER(ID, 'XXXXXXXXXX')), 'XXXXXXXXXX')
from EMP group by TST_DATE;
TST_DATE TO_CHAR(MAX
---------- -----------
07/12/2015 701
05/12/2015 BAB0
With a suitable number of Xs in the format models of course; how many depends on the size of your varchar2 column.
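
To see why the string comparison picks 81 over 701, and how comparing the numeric value changes that, here is a small Python sketch over the same four rows. It assumes, as the answer does, that the IDs are hex strings; int(s, 16) stands in for TO_NUMBER(ID, 'XXXXXXXXXX'), and the function name max_hex_id_per_date is made up for the example.

```python
from collections import defaultdict

# Rows from the question's EMP table: (ID, TST_DATE).
EMP = [
    ("A035", "05/12/2015"),
    ("BAB0", "05/12/2015"),
    ("701", "07/12/2015"),
    ("81", "07/12/2015"),
]


def max_hex_id_per_date(rows):
    """Return {tst_date: id}, picking the ID with the largest hex value."""
    by_date = defaultdict(list)
    for emp_id, tst_date in rows:
        by_date[tst_date].append(emp_id)
    # Compare by numeric value, but return the original string.
    return {d: max(ids, key=lambda s: int(s, 16)) for d, ids in by_date.items()}


# Plain string max compares character by character, so "8" beats "7".
print(max(["701", "81"]))
print(max_hex_id_per_date(EMP))
```

Because the original string is returned rather than a re-formatted number, there is no padding or format width to worry about on the way back.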

PIG Script How to

I am trying to clean up this employee volunteer data. There is no way to track whether an employee is already a registered volunteer, so he can sign up as a new volunteer and get a new VOLUNTEER_ID. I have a data feed where I can tie each VOLUNTEER_ID to its EMP_ID. The volunteer data needs to be cleaned up so we can figure out how the employee moved from one volunteer level to another, and when.
The business logic is that, when dates overlap, we give the employee the highest level for the timeframe between start_date and end_date.
I posted a sample of the input data and what the output should be.
Is it possible to do this in a Pig script? Can someone please help me?
INPUT:
EMP_ID VOLUNTEER_ID V_LEVEL STATUS START_DATE END_DATE
10001 100 1 A 1/1/2006 12/31/2007
10001 200 1 A 5/1/2006
10001 100 1 A 1/1/2008
10001 300 3 P 3/1/2008 3/1/2008
10001 300 3 A 3/2/2008 12/1/2008
10001 1001 2 A 5/1/2008 6/30/2008
10001 1001 3 A 7/1/2008
10001 300 2 A 12/2/2008
OUTPUT NEEDED: (VOLUNTEER_ID is not needed in the output; it is included below only to show which IDs were selected and which were not)
EMP_ID VOLUNTEER_ID V_LEVEL STATUS START_DATE END_DATE
10001 100 1 A 1/1/2006 12/31/2007
10001 300 3 P 3/1/2008 3/1/2008
10001 300 3 A 3/2/2008 12/1/2008
10001 1001 2 A 5/1/2008 6/30/2008
10001 1001 3 A 7/1/2008
It seems like you want the row in your data with the earliest start date for each V_LEVEL, STATUS, EMP_ID, and VOLUNTEER_ID.
First we add a unix time column and then find the min for that column (this is in the latest version of Pig, so you may need to update your version).
data_with_unix = foreach data generate EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS, START_DATE, END_DATE, ToUnixTime((datetime)START_DATE) as unix_time;
grp = group data_with_unix by (EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS);
min_date = foreach grp generate group, MIN(data_with_unix.unix_time);
Then join the start and end date back into your dataset, since it doesn't look like there is currently a way to convert unix time back to a date.

How to consolidate overlap date in single

How would I get one row out of two rows with overlapping dates from the same table for the same ID? I have more than 50000 records.
I have the following sample data.
ID start_dt end_dt division
1212 04/01/2006 03/01/2007 second
1212 05/01/2009 01/01/2010 second
1212 04/01/2006 03/01/2008 second --- This should be selected as longest timeframe
1212 09/03/2007 03/01/2008 third
1213 05/03/2005 04/11/2009 second
1214 07/03/2007 03/01/2008 third
And this is the data I should get:
ID start_dt end_dt division
1212 04/01/2006 03/01/2008 second
1212 05/01/2009 01/01/2010 second
1213 05/03/2005 04/11/2009 second
1214 07/03/2007 03/01/2008 third
Thank you.
Ramu
Now that I understand your issue, just subtract the 2 dates to determine the time frame difference:
SELECT S.Id, S.Start_dt, S.End_dt, S.Division
FROM Sample S
JOIN (
SELECT S.Id, Max(S.end_dt-S.start_dt) as timeframe
FROM Sample S
GROUP BY S.Id ) S2 ON S.Id = S2.Id
AND S.end_dt-S.start_dt = s2.timeframe
Here is the Fiddle.
Good luck.
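
Note that the query above keeps only the single longest row per ID, while the expected output in the question keeps two rows for ID 1212. One possible reading of that output is: within each ID, group rows whose date ranges overlap, and keep the longest row of each group. The Python sketch below follows that reading. It is a different technique from the SQL above, it assumes the dates are MM/DD/YYYY, and the function name longest_per_overlap_group is made up for the example.

```python
from datetime import datetime

# Sample rows from the question: (ID, start_dt, end_dt, division).
ROWS = [
    (1212, "04/01/2006", "03/01/2007", "second"),
    (1212, "05/01/2009", "01/01/2010", "second"),
    (1212, "04/01/2006", "03/01/2008", "second"),
    (1212, "09/03/2007", "03/01/2008", "third"),
    (1213, "05/03/2005", "04/11/2009", "second"),
    (1214, "07/03/2007", "03/01/2008", "third"),
]


def parse(text):
    """Parse a date string, assuming MM/DD/YYYY."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def longest_per_overlap_group(rows):
    """Within each ID, cluster overlapping date ranges and keep the longest row."""
    ordered = sorted(rows, key=lambda r: (r[0], parse(r[1]), parse(r[2])))
    result = []
    best = None         # longest row seen in the current cluster
    cluster_id = None   # ID the current cluster belongs to
    cluster_end = None  # latest end date seen in the current cluster
    for row in ordered:
        row_id, start, end = row[0], parse(row[1]), parse(row[2])
        if best is not None and row_id == cluster_id and start <= cluster_end:
            # Overlaps the running cluster: extend it, keep the longer row.
            cluster_end = max(cluster_end, end)
            if end - start > parse(best[2]) - parse(best[1]):
                best = row
        else:
            # New ID or a gap in time: close the previous cluster.
            if best is not None:
                result.append(best)
            best, cluster_id, cluster_end = row, row_id, end
    if best is not None:
        result.append(best)
    return result


consolidated = longest_per_overlap_group(ROWS)
for row in consolidated:
    print(row)
```

On the sample data this yields the four rows listed in the question's expected output, but whether overlap across different divisions should really merge is a guess based on that output.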
