Complex procedure to adjust data continuously - Oracle

I solved this in SQL Server with a trigger. Now I face it on Oracle.
I have a big set of data that periodically grows with new items.
An item has these fundamental columns:
ID: string identifier (not null)
DATETIME (not null)
DATETIME_EMIS: optional emission datetime (eventually null, always null for type 1), equal to the DATETIME of the corresponding emission item
TYPE (0 or 1)
VALUE (only for type 1)
It is basically a logbook.
For example: an item with ID='FIREALARM' and datetime='2023-02-12 12:02' has a closing item like this:
ID='FIREALARM' in datetime='2023-02-12 15:11', emission datetime='2023-02-12 12:02' (equal to the emission item).
What I need is to obtain a final item in the destination table like this:
ID='FIREALARM' in DATETIME_BEGIN ='2023-02-12 12:02', DATETIME_END ='2023-02-12 15:11'
Not all items have a closing item (those of Type=1, as opposed to Type=0); in this case the next item should be used to close the previous one (with the problem of finding it). For example:
Item1:
ID='DEVICESTATUS', datetime='2023-02-12 22:11', Value='Broken' ;
Item2:
ID='DEVICESTATUS', datetime='2023-02-12 22:14', Value='Running'
Should result in
ID='DEVICESTATUS', DATETIME_BEGIN ='2023-02-12 22:11',DATETIME_END ='2023-02-12 22:14', Value='Broken'
The final data should be extracted by a select query as fast as possible.
The elaboration process should be independent of insertion order.
In SQL Server, I created a trigger with several operations involving a temporary table and some queries on the inserted set and the entire destination table; a complex procedure that is not worth showing to understand the problem.
Now I have discovered that Oracle has some limitations and it is not easy to port the trigger. For example, temporary tables cannot be used in the same way, and trigger operations are per row.
I am asking what could be a good strategy in Oracle to elaborate the data into the final form, considering that the set increases continuously and the opening and closing items must be reduced to a single item. I am not asking for a solution to the problem; I am trying to understand which instruments in Oracle could be useful for a complex elaboration like this. Thanks.

From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row pattern matching:
SELECT *
FROM   destination
MATCH_RECOGNIZE(
  PARTITION BY id
  ORDER BY datetime
  MEASURES
    FIRST(datetime) AS datetime_begin,
    LAST(datetime)  AS datetime_end,
    FIRST(value)    AS value
  PATTERN ( ^ any_row+ $ )
  DEFINE
    any_row AS 1 = 1
)
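Note that the pattern above simply collapses each partition into a single row. For the Type=1 case, where each row is closed by the next row for the same ID, a hedged sketch (same table and columns as above; AFTER MATCH SKIP TO LAST reuses each closing row as the next opener) might look like:
SELECT *
FROM   destination
MATCH_RECOGNIZE(
  PARTITION BY id
  ORDER BY datetime
  MEASURES
    a.datetime AS datetime_begin,
    b.datetime AS datetime_end,
    a.value    AS value
  AFTER MATCH SKIP TO LAST b  -- the closing row opens the next interval
  PATTERN ( a b )
  DEFINE
    b AS 1 = 1                -- any following row closes the previous one
)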
Which, for the sample data:
CREATE TABLE destination (id, datetime, value) AS
SELECT 'DEVICESTATUS', DATE '2023-02-12' + INTERVAL '22:11' HOUR TO MINUTE, 'Broken' FROM DUAL UNION ALL
SELECT 'DEVICESTATUS', DATE '2023-02-12' + INTERVAL '22:14' HOUR TO MINUTE, 'Running' FROM DUAL;
Outputs:

ID           | DATETIME_BEGIN      | DATETIME_END        | VALUE
DEVICESTATUS | 2023-02-12 22:11:00 | 2023-02-12 22:14:00 | Broken


How to calculate longest period between two specific dates in SQL?

I have a problem with a task that looks like this: I have a table Warehouse containing a list of items that a company has on stock. This
table contains the columns ItemID, ItemTypeID, InTime and OutTime, where InTime (OutTime)
specifies the point in time at which a respective item entered (left) the warehouse. I have to calculate the longest period that the company has gone without an item entering or leaving the warehouse. I am trying to solve it this way:
select MAX(OutTime-InTime) from Warehouse where OutTime is not null
Is my understanding correct? Because I believe that it is not ;)
You want the greatest gap between any two consecutive actions (item entering or leaving the warehouse). One method is to unpivot the in and out times to rows, then use lag() to get the date of the "previous" action. The final step is aggregation:
select max(x_time - lag_x_time) as max_time_diff
from (
  select x.x_time,
         lag(x.x_time) over (order by x.x_time) as lag_x_time
  from warehouse w
  cross apply (
    select w.in_time as x_time from dual
    union all
    select w.out_time from dual
  ) x
)
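If CROSS APPLY is not available (it needs Oracle 12c), the same unpivot-then-lag idea can be written with UNPIVOT; a sketch, assuming the column names above:
select max(x_time - lag_x_time) as max_time_diff
from (
  select x_time,
         lag(x_time) over (order by x_time) as lag_x_time
  from warehouse
  unpivot (x_time for src in (in_time as 'in', out_time as 'out'))
)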
You can perform date calculations directly in Oracle.
The result of subtracting two dates is in days.
If you want it in hours, multiply the result by 24.
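For instance (a throwaway illustration against DUAL; the dates are arbitrary):
SELECT (TO_DATE('2023-02-12 22:14', 'YYYY-MM-DD HH24:MI')
      - TO_DATE('2023-02-12 22:11', 'YYYY-MM-DD HH24:MI')) * 24 * 60 AS minutes_diff
FROM dual;
-- returns 3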
To calculate the duration in days, and check all the information in the table:
SELECT round(OutTime - InTime) AS periodDay, Warehouse.*
FROM Warehouse
WHERE OutTime is not null
ORDER BY periodDay DESC
To calculate the duration in hours:
SELECT round((OutTime - InTime) * 24) AS periodHour, Warehouse.*
FROM Warehouse
WHERE OutTime is not null
ORDER BY periodHour DESC
round() is used to remove the decimal digits.
Select only the record with maximum period.
SELECT *
FROM Warehouse
WHERE (OutTime - InTime) =
( SELECT MAX(OutTime - InTime) FROM Warehouse)
Select only the record with maximum period, with the period indicated.
SELECT (OutTime - InTime) AS period, Warehouse.*
FROM Warehouse
WHERE (OutTime - InTime) =
( SELECT MAX(OutTime - InTime) FROM Warehouse)
When finding the longest period this way, the OutTime is not null condition is not needed: rows with a null OutTime yield a null duration, which MAX() ignores and the equality filter rejects.
SQL Server has DateDiff; in Oracle you can just subtract one date from the other.
The code looks ok. Oracle has Live SQL, a tool where you can test out queries in your browser; that should help you.
https://livesql.oracle.com/

Force Oracle to process one row at a time

I have a query that in the select statement uses a custom built function to return one of the values.
The problem I have is every now and then this function will error out because it returns more than one row of information. SQL Error: ORA-01422: exact fetch returns more than requested number of rows
To further compound the issue, I have checked the table data within the range this query runs over and can't find any rows that would duplicate based on the function's where clause.
So I would like a quick way to identify which row of the original query this crashes on, so that I can take the values passed into the function, rebuild the function's query with those values, and see which two or more rows are returned.
Any ideas? I was hoping there could be a way to force Oracle to process one row at a time until it errors, so you can see the results up to the first error.
Added the code:
FUNCTION EFFPEG
--Returns Effective Pegged Freight given an Effdate, ShipTo, Item
( DATE1 IN NUMBER -- Effective Date (JULIANDATE)
, SHAN IN NUMBER -- ShipTo Number (Numeric)
, ITM IN NUMBER -- Short Item Number (Numeric)
, AST IN VARCHAR -- Advance Pricing type (varchar)
, MCU IN VARCHAR Default Null --ShipFrom Plant (varchar)
) RETURN Number
IS
  vReturn Number;
BEGIN
  Select ADFVTR/10000
  into vReturn
  from PRODDTA.F4072
  where ADEFTJ <= DATE1
  and ADEXDJ >= DATE1
  and ADAN8 = SHAN and ADITM = ITM
  and TRIM(ADAST) = TRIM(AST)
  and ADEXDJ = (
    Select min(ADEXDJ)
    from PRODDTA.F4072
    where ADEFTJ <= DATE1
    and ADEXDJ >= DATE1
    and ADAN8 = SHAN
    and ADITM = ITM
    and TRIM(ADAST) = TRIM(AST));
  RETURN vReturn;
END;
Query that calls this code and passes in the values is:
select GLEXR, ORDTYPE,
       EFFPEG(SDADDJ, SDSHAN, SDITM, 'PEGFRTT', SDMCU)
from proddta.F42119
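Rather than forcing row-at-a-time processing, one hedged shortcut is to look for the duplicates directly: group F4072 by the columns the function filters on and keep only the duplicated combinations (the effective-date range check is left out here, so this lists candidates rather than exact failures):
SELECT ADAN8, ADITM, TRIM(ADAST) AS ADAST, ADEXDJ, COUNT(*) AS cnt
FROM PRODDTA.F4072
GROUP BY ADAN8, ADITM, TRIM(ADAST), ADEXDJ
HAVING COUNT(*) > 1;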
I think the best way to do it is through exceptions.
What you need to do is add code to handle the too-many-rows exception in your function:
EXCEPTION
  WHEN TOO_MANY_ROWS THEN
    INSERT INTO ERR_TABLE
    SELECT your_columns
    FROM query_that_sometimes_returns_multiple_rows;
In this example the doubled result goes to a separate table, or you can decide to simply print it out with dbms_output.
The Oracle documentation on PL/SQL exceptions is an easy place to start; then just google "exception" and you should be able to find all you need.
Hope this can help.
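A minimal sketch of how that could be wired into EFFPEG itself. Because the function is called from a SELECT, plain DML inside it would raise ORA-14551, so the logging needs an autonomous transaction (ERR_TABLE and its columns are assumptions here):
create or replace procedure log_effpeg_error (p_shan number, p_itm number) is
  pragma autonomous_transaction;
begin
  insert into err_table (shan, itm) values (p_shan, p_itm);
  commit;  -- required before leaving an autonomous transaction
end;
/
-- and at the end of EFFPEG:
EXCEPTION
  WHEN TOO_MANY_ROWS THEN
    log_effpeg_error(SHAN, ITM);
    RETURN NULL;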

Synchronizing two tables using a stored procedure, only updating and adding rows where values do not match

The scenario
I've got two tables with identical structure.
TABLE [INFORMATION], [SYNC_INFORMATION]
[ITEM] [nvarchar](255) NOT NULL
[DESCRIPTION] [nvarchar](255) NULL
[EXTRA] [nvarchar](255) NULL
[UNIT] [nvarchar](2) NULL
[COST] [float] NULL
[STOCK] [nvarchar](1) NULL
[CURRENCY] [nvarchar](255) NULL
[LASTUPDATE] [nvarchar](50) NULL
[IN] [nvarchar](4) NULL
[CLIENT] [nvarchar](255) NULL
I'm trying to create a synchronize procedure that will be triggered by a scheduled event at a given time every day.
CREATE PROCEDURE [dbo].[usp_SynchronizeInformation]
AS
BEGIN
    SET NOCOUNT ON;

    --Update all rows
    UPDATE TARGET_TABLE
    SET TARGET_TABLE.[DESCRIPTION] = SOURCE_TABLE.[DESCRIPTION],
        TARGET_TABLE.[EXTRA] = SOURCE_TABLE.[EXTRA],
        TARGET_TABLE.[UNIT] = SOURCE_TABLE.[UNIT],
        TARGET_TABLE.[COST] = SOURCE_TABLE.[COST],
        TARGET_TABLE.[STOCK] = SOURCE_TABLE.[STOCK],
        TARGET_TABLE.[CURRENCY] = SOURCE_TABLE.[CURRENCY],
        TARGET_TABLE.[LASTUPDATE] = SOURCE_TABLE.[LASTUPDATE],
        TARGET_TABLE.[IN] = SOURCE_TABLE.[IN],
        TARGET_TABLE.[CLIENT] = SOURCE_TABLE.[CLIENT]
    FROM SYNC_INFORMATION TARGET_TABLE
    JOIN LSERVER.dbo.[INFORMATION] SOURCE_TABLE ON TARGET_TABLE.[ITEM] = SOURCE_TABLE.[ITEM]

    --Add new rows
    INSERT INTO SYNC_INFORMATION ([ITEM], [DESCRIPTION], [EXTRA], [UNIT], [COST], [STOCK], [CURRENCY], [LASTUPDATE], [IN], [CLIENT])
    SELECT
        src.[ITEM],
        src.[DESCRIPTION],
        src.[EXTRA],
        src.[UNIT],
        src.[COST],
        src.[STOCK],
        src.[CURRENCY],
        src.[LASTUPDATE],
        src.[IN],
        src.[CLIENT]
    FROM LSERVER.dbo.[INFORMATION] src
    LEFT JOIN SYNC_INFORMATION targ ON src.[ITEM] = targ.[ITEM]
    WHERE targ.[ITEM] IS NULL
END
Currently, this procedure (including some others that are also executed at the same time) takes about 15 seconds to execute.
I'm planning on adding a "Synchronize" button in my work interface so that users can manually synchronize when, for instance, a new item is added and needs to be used the same day.
But in order for me to do that, I need to trim those 15 seconds as much as possible.
Instead of updating every single row, like in my procedure, is it possible to update only the rows that have values that do not match?
This would greatly increase the execution speed, since it wouldn't have to update all 4000 rows when maybe only 20 actually need it.
Can this be done in a better way, or optimized?
Does it need improvements, if yes, where?
How would you solve this?
Would also appreciate some time differences between the solutions so I can compare them.
UPDATE
Using marc_s's CHECKSUM is really brilliant. The problem is that in some instances the information produces the same checksum. Here's an example; due to the classified content, I can only show you 2 columns, but I can say that all columns have identical information except these 2. To clarify: the rows shown were all the rows that had duplicate CHECKSUMs. These are also the only rows with a hyphen in the ITEM column, I've checked.
The query was simply
SELECT *, CHECKSUM(*) FROM SYNC_INFORMATION
If you can change the table structure ever so slightly - you could add a computed CHECKSUM column to your two tables, and in the case the ITEM is identical, you could then check that checksum column to see if there are any differences at all in the columns of the table.
If you can do this - try something like this here:
ALTER TABLE dbo.[INFORMATION]
ADD CheckSumColumn AS CHECKSUM([DESCRIPTION], [EXTRA], [UNIT],
[COST], [STOCK], [CURRENCY],
[LASTUPDATE], [IN], [CLIENT]) PERSISTED
Of course: only include those columns that should be considered when determining whether a source and a target row are identical! (This depends on your needs and requirements.)
This persists a new column to your table, which is calculated as the checksum over the columns specified in the list of arguments to the CHECKSUM function.
This value is persisted, i.e. it could be indexed, too! :-O
Now, you could simplify your UPDATE to
UPDATE TARGET_TABLE
SET ......
FROM SYNC_INFORMATION TARGET_TABLE
JOIN LSERVER.dbo.[INFORMATION] SOURCE_TABLE ON TARGET_TABLE.[ITEM] = SOURCE_TABLE.[ITEM]
WHERE
TARGET_TABLE.CheckSumColumn <> SOURCE_TABLE.CheckSumColumn
Read more about the CHECKSUM T-SQL function on MSDN!
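Given the duplicate checksums noted in the question's update: CHECKSUM is known to collide fairly easily. A hedged alternative is a persisted hash column, e.g. with HASHBYTES (this sketch assumes SQL Server 2017+ for CONCAT_WS; note that CONCAT_WS skips NULLs, so NULL handling deserves some thought):
ALTER TABLE dbo.[INFORMATION]
ADD HashColumn AS HASHBYTES('SHA2_256',
      CONCAT_WS('|', [DESCRIPTION], [EXTRA], [UNIT], [COST],
                [STOCK], [CURRENCY], [LASTUPDATE], [IN], [CLIENT])) PERSISTED
Add the same column to SYNC_INFORMATION and compare HashColumn instead of CheckSumColumn in the UPDATE above.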

How to? Correct SQL syntax for finding the next available identifier

I think I could use some help here from more experienced users...
I have an integer column in a table SO; let's call it SO_ID. For each new row I need to calculate a new SO_ID based on the following rules:
1) SO_ID consists of 6 digits, where the first 3 are an area code and the last three are the sequence number within this area.
309001
309002
309003
2) so the next new row will have a SO_ID of value
309004
3) if someone deletes the row with SO_ID value = 309002, then the next new row must recycle this value, so the next new row has got to have the SO_ID of value
309002
Can anyone please provide me with either a SQL or PL/SQL function (perhaps a trigger straight away?) that would return the next available SO_ID I need to use?
I reckon I could make use of the keyword rownum in my SQL, but the following just doesn't work properly:
select max(so_id),max(rownum) from(
select (so_id),rownum,cast(substr(cast(so_id as varchar(6)),4,3) as int) from SO
where length(so_id)=6
and substr(cast(so_id as varchar(6)),1,3)='309'
and cast(substr(cast(so_id as varchar(6)),4,3) as int)=rownum
order by so_id
);
thank you for all your help!
This kind of logic is fraught with peril. What if two sessions calculate the same "next" value, or both try to reuse the same "deleted" value? Since your column is an integer, you'd probably be better off querying "between 309001 and 309999", but that begs the question of what happens when you hit the thousandth item in area 309?
Is it possible to make SO_ID a foreign key to another table as well as a unique key? You could pre-populate the parent table with all valid IDs (or use a function to generate them as needed), and then it would be a simple matter to select the lowest one where a child record doesn't exist.
Well, we came up with this... it sort of works; concurrency is 'solved' via a unique constraint:
select min(lastnumber)
from (
  select so_id,
         so_id - LAG(so_id, 1, so_id) OVER (ORDER BY so_id) AS diff,
         LAG(so_id, 1, so_id) OVER (ORDER BY so_id) AS lastnumber
  from so_miso
  where substr(cast(so_id as varchar(6)), 1, 3) = '309'
  and length(so_id) = 6
) a
where diff > 1;
Do you really need to compute & store this value at the time a row is inserted? You would normally be better off storing the area code and a date in a table and computing the SO_ID in a view, i.e.
SELECT area_code ||
LPAD( DENSE_RANK() OVER( PARTITION BY area_code
ORDER BY date_column ),
3,
'0' ) AS so_id,
<<other columns>>
FROM your_table
or having a process that runs periodically (nightly, for example) to assign the SO_ID using similar logic.
If your application is not pure SQL, you could do this in application code (e.g. Java). That would be more straightforward.
If you are recycling numbers when rows are deleted, your base table must be consulted when generating the next number. "Legacy" pre-relational schemes that attempt to encode information in numbers are a pain to make airtight when numbers must be recycled after deletes, as you say yours must.
If you want to avoid having to scan your table looking for gaps, an after-delete routine must write the deleted number to a separate table in a "ReuseMe" column. The insert routine then does this (a sketch follows below):
- begins a transaction
- selects the next-number table for update
- uses a ReuseMe number if available, else uses the next number
- clears the ReuseMe number if applicable, or increments the next-number in the next-number table
- commits the transaction
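A hedged PL/SQL sketch of that scheme (NEXT_NUMBERS and REUSE_ME are hypothetical tables, with one NEXT_NUMBERS row per area; the caller's commit releases the row lock):
create or replace function next_so_id (p_area in number) return number is
  v_next  number;
  v_reuse number;
begin
  -- lock the per-area counter row to serialize concurrent callers
  select next_number into v_next
  from next_numbers
  where area = p_area
  for update;

  -- prefer the lowest recycled number, if any
  select min(so_id) into v_reuse
  from reuse_me
  where area = p_area;

  if v_reuse is not null then
    delete from reuse_me where so_id = v_reuse;
    return v_reuse;
  end if;

  update next_numbers set next_number = next_number + 1 where area = p_area;
  return v_next;
end;
/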
Ignoring the issues about concurrency, the following should give a decent start.
If 'traffic' on the table is low enough, go with locking the table in exclusive mode for the duration of the transaction.
create table blah (soc_id number(6));
insert into blah select 309000 + rownum from user_tables;
delete from blah where soc_id = 309003;
commit;
create or replace function get_next (i_soc in number) return number is
  v_min number := i_soc * 1000;
  v_max number := v_min + 999;
  v_gap number;
begin
  lock table blah in exclusive mode;
  select min(rn) into v_gap
  from
  (select rownum rn from dual connect by level <= 999
   minus
   select to_number(substr(soc_id, 4))
   from blah
   where soc_id between v_min and v_max);
  return v_min + v_gap;
end;
/
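A usage sketch; because of the LOCK TABLE, this must be called from PL/SQL rather than from a query, and the exclusive lock is held until the caller commits:
declare
  v_id number;
begin
  v_id := get_next(309);  -- 309003 with the sample data above
  insert into blah (soc_id) values (v_id);
  commit;  -- releases the exclusive lock on blah
end;
/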

Oracle optimizing query involving date calculation

Database:
Table1 (Id, Table2Id, ...)
Table2 (Id, StartTime, Duration /* in hours */)
Query:
select * from Table1 join Table2 on Table2Id = Table2.Id
where starttime < :starttime and starttime + Duration/24 > :endtime
This query currently takes about 2 seconds to run, which is too long. There is an index on the id columns and a function-based index on starttime + duration/24. In SQL Developer the query plan shows no indexes being used. The query returns 475 rows for my test start and end times. Table2 has ~800k rows; Table1 has ~200k rows.
If the duration/24 calculation is removed from the query and replaced with a static value, the query time is reduced by half. This does not retrieve exactly the same data, but it leads me to believe that the division is expensive.
I have also tested adding an endtime column to Table2 that is populated with (starttime + duration/24). The column was prepopulated via a single update; if it were used in production I would populate it via an update trigger.
select * from Table1 join Table2 on Table2Id = Table2.Id
where starttime < :starttime and endtime > :endtime
This query will run in about 600ms and it uses an index for the join. It is less than ideal because of the additional column with redundant data.
Are there any methods of making this query faster?
Create a function index on both starttime and the expression starttime + Duration/24:
create index myindex on table2(starttime, starttime + Duration / 24);
A compound index covering the entire predicate of your query should be selected, whereas with the columns indexed individually the optimizer is likely deciding that repeated table access by rowid, based on a scan of one of those indexes, is actually slower than a full table scan.
Also make sure that you're not doing an implicit conversion from varchar to date, by ensuring that you're passing DATEs in your bind variables.
Try lowering the optimizer_index_cost_adj system parameter. I believe the default is 100. Try setting that to 10 and see if your index is selected.
Consider partitioning the table by starttime.
You have two criteria with range predicates (greater than/less than). An index range scan can start at one point in the index and end at another.
For a compound index on starttime and "starttime + duration/24", since the leading column is starttime and the predicate is "less than bind value", it will start at the leftmost edge of the index (earliest starttime) and range scan all rows up to the point where starttime reaches the limit. For each of those matches, it can evaluate the calculated value of "starttime + duration/24" in the index against the bind value and pass or reject the row. I'd suspect most of the data in the table is old, so most entries have an old starttime and you'd end up scanning most of the index.
For a compound index on "starttime + duration/24" and starttime, since the leading column is the function and the predicate is "greater than bind value", it will start partway through the index and work its way to the end. For each of those matches, it can evaluate starttime in the index against the bind value and pass or reject the row. If the end date passed in is recent, I suspect this would actually involve a much smaller amount of the index being scanned.
Even without starttime as a second column in the index, the existing function-based index on "starttime + duration/24" should still be useful and used. Check the explain plan to make sure the bind value is either a date or converted to a date. If it is converted, make sure the appropriate format mask is used (e.g. an entered value of '1/Jun/09' may be converted to year 0009, so Oracle will see the condition as very relaxed and would tend not to use the index - plus the result could be wrong).
"In Sql Developer the query plan shows no indexes being used." If the index wasn't being used to find the Table2 rows, I suspect the optimizer thought most/all of Table2 would be returned [which it obviously isn't, by your numbers]. I'd guess that it thought most of Table1 would be returned, and thus that neither of your predicates did a lot of filtering. As I said above, I think the "less than" predicate isn't selective, but the "greater than" one should be. Look at the explain plan, especially the ROWS value, to see what Oracle thinks.
PS.
Adjusting the value means the optimizer changes the basis for its estimates. If a journey planner says a trip will take six hours because it assumes an average speed of 50, then telling it to assume an average of 100 will make it come out with three hours; it won't actually affect the speed you travel at, or how long the journey actually takes.
So you only want to change that value to make it more accurately reflect the actual value for your database (or session).
Oracle will not use indexes if the selectivity of the where clause is not very good. An index will be used only if the number of rows returned is a small percentage of the total number of rows in the table (the percentage varies, since Oracle counts the cost of reading the index as well as reading the table).
Also, when an index column is wrapped in a function or expression in the where clause, that index can no longer be used. For example, UPPER(some_index_column) disables the use of a plain index on some_index_column. This is why starttime + Duration/24 > :endtime does not use the index on starttime.
Can you try this:
select * from Table1 join Table2 on Table2Id = Table2.Id
where starttime < :starttime and starttime > :endtime - Duration/24
This should allow the use of the index and there is no need for an additional column.
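Returning to the endtime-column idea from the question: assuming Oracle 11g or later (the question does not state a version), a virtual column would give the same indexable endtime without the trigger-maintained redundant data:
ALTER TABLE Table2 ADD (endtime GENERATED ALWAYS AS (StartTime + Duration / 24) VIRTUAL);
CREATE INDEX table2_endtime_ix ON Table2(endtime);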
