Oracle SQL: How do you compare a timestamp and take the data point before it?

I have two tables. The parts of the table I care about look more or less like this.
CAUSEDATE
ID | Timestamp
---+------------------------------
 1 | 01-JAN-15 07.00.01.163000000
 2 | 01-JAN-15 07.00.30.023000000
 3 | 01-JAN-15 07.01.01.293000000
 5 | 01-JAN-15 07.01.11.153000000
 6 | 01-JAN-15 07.02.01.523000000
EVENTS
ID  | Timestamp                    | INFO
----+------------------------------+-----
101 | 01-JAN-15 07.00.01.123000000 | Ker
102 | 01-JAN-15 07.00.01.233000000 | Bal
103 | 01-JAN-15 07.00.01.323000000 | Spa
105 | 01-JAN-15 07.00.01.553000000 | CeP
106 | 01-JAN-15 07.00.01.633000000 | rog
I want to match the timestamp in EVENTS to the timestamp in CAUSEDATE, so that when I pull ID = 1 from CAUSEDATE by its timestamp, it matches ID 101 in EVENTS, but not any of the events that take place afterwards within the same second. I'm only interested in the first result, not the ones after it.
It's quite variable whether EVENTS registers a millisecond after, or ten milliseconds, or sometimes a hundred, and in some cases it can be more than a second. So what I'm looking for is a solution that looks at the timestamp in CAUSEDATE, then finds the timestamp in EVENTS that takes place right before it (presumably the event that triggered the "cause").
I've tried using TRUNC(CAUSEDATE.Timestamp, 'MI') = TRUNC(EVENTS.Timestamp, 'MI'), but this is far too coarse and returns too much irrelevant information. There's no 'SS' option, and even if there were, it wouldn't grab the data I need when it's entered a little late, e.g. when comparing 01.993000000 with 02.006000000.
How can I retrieve the row from EVENTS whose timestamp is the latest one before the timestamp in CAUSEDATE? So it would give me back "Ker" as the cause of ID = 1, and not "Bal".
I'm sorry for the lengthy explanation. I hope I have made my problem clear enough.
EDIT: Nearly forgot an important part.
I've thought of converting the timestamps to floats.
What I did was use the following function:
create or replace FUNCTION oracle_to_unix (in_date IN DATE)
  RETURN NUMBER
IS
BEGIN
  RETURN (in_date - TO_DATE('19700101', 'yyyymmdd')) * 86400
         - TO_NUMBER(SUBSTR(TZ_OFFSET(sessiontimezone), 1, 3)) * 3600;
END;
But this only finds the events that happen to sync up perfectly with the cause time; I also want the ones that weren't logged at exactly the same time.

If I understand correctly, then ...
select *
from events e1
join causedate c
  on e1.timestamp = (select max(e2.timestamp)
                     from events e2
                     where e2.timestamp < c.timestamp)
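If you want the event's INFO back directly (e.g. "Ker" for CAUSEDATE ID 1), a correlated subquery with KEEP (DENSE_RANK LAST) picks the latest event before each cause in one pass. A minimal sketch, assuming the table and column names from the question:

select c.id,
       c.timestamp,
       (select max(e.info) keep (dense_rank last order by e.timestamp)
        from events e
        where e.timestamp < c.timestamp) as cause_info  -- INFO of the latest prior event
from causedate c;

For ID = 1 this returns "Ker", since event 101 is the only event earlier than 07.00.01.163; later events within the same second are ignored.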

Oracle v19c: hierarchical searches take forever and a day

We are working on a ticketing system ("tickets" in the sense of IT tickets where we capture issues or customer contacts, not tickets that allow entry to some event).
These tickets can have references to some other (parent-)ticket, such that these tickets can form a tree. The tree depth (or height?) is typically rather low. Maybe 20% of the tickets have 1, 2 or maybe up to 3 ancestors.
Tickets that have no ancestor (i.e. where "TICKET_VORGAENGER_ID" = null / "Vorgaenger" means "ancestor" in German) are called root tickets.
We have a query that searches for the root ticket of any given ticket-id.
The SQL reads like so:
SELECT CONNECT_BY_ROOT TICKET_ID AS TICKET_ID
FROM TICKET
WHERE TICKET_ID = :ticketId
START WITH TICKET_VORGAENGER_ID IS NULL
CONNECT BY PRIOR TICKET_ID = TICKET_VORGAENGER_ID;
:ticketId is the id of the ticket for which to search its root ancestor.
Both columns, TICKET_ID (PK) and TICKET_VORGAENGER_ID have defined indices.
For some odd reason these queries take 40 seconds on average, and can take up to 80(!) seconds, on our DB.
Currently we have about 3.5 million tickets in the DB and the DB server is quite potent.
Why on earth are these queries taking so long? What's wrong with it? It was taken more or less 1:1 from an Oracle example page.
The query's execution plan looks like so:
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6390K| 158M| 22765 (62)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 3 | TABLE ACCESS FULL | TICKET | 3654K| 31M| 8877 (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TICKET_ID"=TO_NUMBER(:1))
2 - access("TICKET_VORGAENGER_ID"=PRIOR "TICKET_ID")
filter("TICKET_VORGAENGER_ID" IS NULL)
What's puzzling me: why is there a "TABLE ACCESS FULL" here, if we have indices on all columns mentioned in this query?
Does anyone have any ideas or advice on how to speed this up?
If :ticketId is the root, then use START WITH TICKET_ID = :ticketId; there is no need for the WHERE clause, which will not use the index on TICKET_ID alone anyway. As written, the START WITH looks for NULLs, and NULLs are not stored in a single-column index, so Oracle builds every tree from every root before filtering. Rewriting it this way should trigger the usage of the correct index. More generally, you can invert the traversal and walk up from the given ticket, as in the sketch below.
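A minimal sketch of that bottom-up traversal, assuming the table and column names from the question: start at the given ticket and follow TICKET_VORGAENGER_ID upwards, keeping only the row that has no ancestor. Each step is then a single lookup on the TICKET_ID primary key index, so the query touches only as many rows as the chain is deep instead of scanning every tree:

SELECT TICKET_ID
FROM TICKET
WHERE TICKET_VORGAENGER_ID IS NULL                    -- keep only the root of the chain
START WITH TICKET_ID = :ticketId                      -- begin at the given ticket
CONNECT BY PRIOR TICKET_VORGAENGER_ID = TICKET_ID;    -- step up to the parent each level

If :ticketId is itself a root ticket, the same query still returns it, since its TICKET_VORGAENGER_ID is NULL.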

Power BI DAX - previous quarter value?

I have a simple table with data in the following format:
category | score | date
cat1     | 3     | 31/3/2019
cat2     | 9     | 31/3/2019   <- Q1 data
cat3     | 7     | 31/3/2019
...
cat1     | 6     | 30/6/2019
cat2     | 4     | 30/6/2019   <- Q2 data, etc.
cat3     | 1     | 30/6/2019
Basically, I have many rows for quarterly data (scores for different categories) where the date column references the actual quarter. I have a chart where I'm showing the values from the latest quarter (most recent data), but I need a column to give me previous quarter's score. I found out about PREVIOUSQUARTER, which looked like an easy trick, but it returns blanks.
prevQtr = CALCULATE(SUM(data[score]), PREVIOUSQUARTER(data[date]))
Can someone tell me what I'm doing wrong? I tried creating a date table with continuous dates between the first and the last date of my column, but it didn't help. No other time intelligence function seems to return anything either, so I guess it's something generic. I checked the documentation, but it doesn't mention any limitation. What I'm looking for is:
category | score | date      | prevQtr
cat1     | 3     | 31/3/2019 |
cat2     | 9     | 31/3/2019 |
cat3     | 7     | 31/3/2019 |
...
cat1     | 6     | 30/6/2019 | 3
cat2     | 4     | 30/6/2019 | 9
cat3     | 1     | 30/6/2019 | 7
Thanks
Okay, three things here.
- You need to reference your date dimension's date column for the built-in time intelligence functions. (Note: this also implies you should only use date fields from the date dimension. Hide 'data'[date] in your fact table so you aren't tempted to use it.)
- Since you're adding this as a calculated column, you're going to have to do some extra context manipulation.
- Since there is no context being contributed by the date dimension here (you're only in row context of the fact table), you can't use a built-in time intelligence function effectively.
So! My suggestion is to define your PrevQtr as a measure, rather than a column, in which case it will work with the small refactoring below:
PrevQtr =
CALCULATE (
SUM ( 'Data'[Score] ),
PREVIOUSQUARTER ( 'dateTable'[Date] ) // just referencing the date table here
)
If you must create a calculated column for this, and your data all follows the pattern in your sample of having values on the last day of the quarter, then I think this is the best bet for it:
PrevQtrColumn =
VAR CurrentRowDate = 'Data'[Date]
VAR PriorQuarterEnd = EOMONTH ( CurrentRowDate, -3 )
RETURN
    CALCULATE (
        SUM ( 'Data'[Score] ),
        ALLEXCEPT ( 'Data', 'Data'[Category] ),
        'Data'[Date] = PriorQuarterEnd
    )
The big gotcha there is covered by the ALLEXCEPT. CALCULATE converts all row context to filter context, so every column of the current row, including 'Data'[Score] and 'Data'[Date], becomes filter context. ALLEXCEPT clears that, preserving only 'Data'[Category].
I would recommend just using a measure to achieve this, but your measure:
prevQtr = CALCULATE(SUM(data[score]), PREVIOUSQUARTER(data[date]))
Needs to reference your date table:
prevQtr = CALCULATE(SUM(data[score]), PREVIOUSQUARTER('dateTable'[date]))
This modification should do it.

How to select nth row in CockroachDB?

If I use something like a SERIAL (which is a random number) for my table's primary key, how can I select a numbered row from my table? In MySQL, I just use the auto incremented ID to select a specific row, but not sure how to approach the problem with an arbitrary numbering sequence.
For reference, here is the table I'm working with:
+--------------------+------+-------+
| id | name | score |
+--------------------+------+-------+
| 235451721728983041 | ABC | 1000 |
| 235451721729015809 | EDF | 1100 |
| 235451721729048577 | GHI | 1200 |
| 235451721729081345 | JKL | 900 |
+--------------------+------+-------+
Using the LIMIT and OFFSET clauses will return the nth row. For example SELECT * FROM tbl ORDER BY col1 LIMIT 1 OFFSET 9 returns the 10th row.
Note that it’s important to include the ORDER BY clause here because you care about the order of the results (if you don’t include ORDER BY, it’s possible that the results are arbitrarily ordered).
If you care about the order in which things were inserted, you could ORDER BY the SERIAL column (id in your case), though that order is not guaranteed to match insertion order exactly: transaction contention and other factors can cause the generated SERIAL values to not be strictly ordered.
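If you also want the computed position visible in the output, a window function works as well. A minimal sketch, assuming the table is named tbl with the columns from the question (the derived-table alias is required):

SELECT id, name, score
FROM (
  SELECT id, name, score,
         row_number() OVER (ORDER BY id) AS rn
  FROM tbl
) AS t
WHERE rn = 3;  -- the 3rd row when ordered by id

The same caveat applies: ordering by id only approximates insertion order.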

ORDER BY subquery and ROWNUM goes against relational philosophy?

Oracle's ROWNUM is applied before ORDER BY. In order to assign ROWNUM according to a sorted column, the following subquery is proposed in all documentation and texts.
select *
from (
select *
from table
order by price
)
where rownum <= 7
That bugs me. As I understand it, the table input into FROM is relational, hence no order is stored, meaning the order from the subquery should not be respected when seen by FROM.
I cannot remember the exact scenarios, but I have read more than once that an ORDER BY has no effect in the outer query. Examples are inline subqueries, subqueries for INSERT, the ORDER BY of a PARTITION clause, etc. For example, in
OVER (PARTITION BY name ORDER BY salary)
the salary order will not be respected in the outer query, and if we want salary to be sorted in the outer query's output, another ORDER BY needs to be added to the outer query.
Can anyone offer some insight into why the relational property is not respected here and the order is preserved by the subquery?
The ORDER BY in this context is in effect Oracle's proprietary syntax for generating an "ordered" row number on a (logically) unordered set of rows. This is a poorly designed feature in my opinion, but the equivalent ISO standard SQL ROW_NUMBER() function (also valid in Oracle) may make it clearer what is happening:
select *
from (
select ROW_NUMBER() OVER (ORDER BY price) rn, x.*
from table x
) t
where rn <= 7;
In this example the ORDER BY goes where it more logically belongs: as part of the specification of a derived row number attribute. This is more powerful than Oracle's version because you can specify several different orderings defining different row numbers in the same result. The actual ordering of rows returned by this query is undefined. I believe that's also true in your Oracle-specific version of the query because no guarantee of ordering is made when you use ORDER BY in that way.
It's worth remembering that Oracle is not a Relational DBMS. In common with other SQL DBMSs Oracle departs from the relational model in some fundamental ways. Features like implicit ordering and DISTINCT exist in the product precisely because of the non-relational nature of the SQL model of data and the consequent need to work around keyless tables with duplicate rows.
Not surprisingly really, Oracle treats this as a bit of a special case. You can see that from the execution plan. With the naive (incorrect/indeterminate) version of the limit that crops up sometimes, you get SORT ORDER BY and COUNT STOPKEY operations:
select *
from my_table
where rownum <= 7
order by price;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 13 | 3 (34)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS FULL| MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<=7)
If you just use an ordered subquery, with no limit, you only get the SORT ORDER BY operation:
select *
from (
select *
from my_table
order by price
);
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 13 | 3 (34)| 00:00:01 |
| 2 | TABLE ACCESS FULL| MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
With the usual subquery/ROWNUM construct you get something different:
select *
from (
select *
from my_table
order by price
)
where rownum <= 7;
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 13 | 3 (34)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 13 | 3 (34)| 00:00:01 |
| 4 | TABLE ACCESS FULL | MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=7)
3 - filter(ROWNUM<=7)
The COUNT STOPKEY operation is still there for the outer query, but the inner query (inline view, or derived table) now has a SORT ORDER BY STOPKEY instead of the simple SORT ORDER BY. This is all hidden away in the internals, so I'm speculating, but it looks like the stop key - i.e. the row number limit - is being pushed into the subquery processing, so in effect the subquery may only end up with seven rows anyway. The plan's ROWS value doesn't reflect that (though you get the same plan with a different limit), and Oracle still feels the need to apply the COUNT STOPKEY operation separately.
Tom Kyte covered similar ground in an Oracle Magazine article, when talking about "Top-N Query Processing with ROWNUM" (emphasis added):
There are two ways to approach this:
- Have the client application run that query and fetch just the first N rows.
- Use that query as an inline view, and use ROWNUM to limit the results, as in SELECT * FROM ( your_query_here ) WHERE ROWNUM <= N.
The second approach is by far superior to the first, for two reasons. The lesser of the two reasons is that it requires less work by the client, because the database takes care of limiting the result set. The more important reason is the special processing the database can do to give you just the top N rows. Using the top-N query means that you have given the database extra information. You have told it, "I'm interested only in getting N rows; I'll never consider the rest." Now, that doesn't sound too earth-shattering until you think about sorting—how sorts work and what the server would need to do.
... and then goes on to outline what it's actually doing, rather more authoritatively than I can.
Interestingly, I don't think the order of the final result set is actually guaranteed; it always seems to work, but arguably you should still have an ORDER BY on the outer query too to make it complete. It looks like the order isn't really stored in the subquery; it just happens to be produced like that. (I very much doubt that will ever change, as it would break too many things; this ends up looking similar to a table collection expression, which also always seems to retain its ordering - breaking that would stop dbms_xplan working, though. I'm sure there are other examples.)
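For completeness, the belt-and-braces version suggested above just repeats the ORDER BY in the outer query (a sketch, using the same hypothetical my_table):

select *
from (
select *
from my_table
order by price
)
where rownum <= 7
order by price;  -- makes the final ordering explicit rather than incidental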
Just for comparison, this is what the ROW_NUMBER() equivalent does:
select *
from (
select ROW_NUMBER() OVER (ORDER BY price) rn, my_table.*
from my_table
) t
where rn <= 7;
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 2 | 52 | 4 (25)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 2 | 26 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL | MY_TABLE | 2 | 26 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN"<=7)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "PRICE")<=7)
Adding to sqlvogel's good answer:
"As I understand, table input into FROM is relational"
No, table input into FROM is not relational. It is not relational because "table inputs" are tables, and tables are not relations. The myriad quirks and oddities in SQL eventually all boil down to that simple fact: the core building brick in SQL is the table, and a table is not a relation. To sum up the differences:
- Tables can contain duplicate rows, relations cannot. (As a consequence, SQL offers bag algebra, not relational algebra. As another consequence, it is as good as impossible for SQL to even define an equality comparison for its most basic building brick! How would you compare tables for equality, given that you might have to deal with duplicate rows?)
- Tables can contain unnamed columns (SELECT X+Y FROM ...), relations cannot. As a consequence, SQL is forced into "column identity by ordinal position", and as a consequence of that you get all sorts of quirks, e.g. in SELECT A,B FROM ... UNION SELECT B,A FROM ...
- Tables can contain duplicate column names, relations cannot. A.ID and B.ID in a table are not distinct column names: the part before the dot is not part of the name, it is a "scope identifier", and that scope identifier "disappears" once you're outside the SELECT in which it is introduced. You can verify this with a nested SELECT: SELECT A.ID FROM (SELECT A.ID, B.ID FROM ...). It won't work (unless your particular implementation departs from the standard in order to make it work); see the sketch after this list.
- Various SQL constructs leave people with the impression that tables do have an ordering to their rows: the ORDER BY clause, obviously, but also the GROUP BY clause (which can be made to work only by introducing rather dodgy concepts of "intermediate tables with rows grouped together"). Relations simply are not like that.
- Tables can contain NULLs, relations cannot. This one has been beaten to death.
There should be some more, but I don't remember them off the top of my head.
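A small illustration of the duplicate-column-name point (a sketch; the table names A and B and the join column B.A_ID are hypothetical). The inline view exposes two columns that are both named ID once the scope identifiers disappear, so the outer reference is rejected:

SELECT ID
FROM (SELECT A.ID, B.ID
      FROM A
      JOIN B ON B.A_ID = A.ID);
-- fails with ORA-00918: column ambiguously defined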

Simple Oracle UPDATE Statement unusually bad performance

Every month I run a simple update statement on my Oracle database. But since Monday it has been taking very long. The table grows by 5 percent every month; now there are 8 million records stored.
The Statement:
update /*+ parallel(destination_tab, 4) */ destination_tab dest
set (full_name, state) =
(select /*+ parallel(source_tab, 4) */ dest.name, src.state
from source_tab src
where src.city = dest.city);
In reality there are 20 fields to update, not only two, but this way the problem is easier to describe.
explain plan:
-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time      |
-----------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |                 | 8517K | 3167M |  579M (50) | 999:59:59 |
|   1 |  UPDATE                       | DESTINATION_TAB |       |       |            |           |
|   2 |   PX COORDINATOR              |                 |       |       |            |           |
|   3 |    PX SEND QC (RANDOM)        | :TQ10000        | 8517K | 3167M |   6198 (1) | 00:01:27  |
|   4 |     PX BLOCK ITERATOR         |                 | 8517K | 3167M |   6198 (1) | 00:01:27  |
|   5 |      TABLE ACCESS FULL        | DESTINATION_TAB | 8517K | 3167M |   6198 (1) | 00:01:27  |
|   6 |  TABLE ACCESS BY INDEX ROWID  | SOURCE_TAB      |     1 |    56 |      1 (0) | 00:00:01  |
|*  7 |   INDEX UNIQUE SCAN           | CITY_PK         |     1 |       |      1 (0) | 00:00:01  |
-----------------------------------------------------------------------------------------------------
Could anyone explain to me how this can be? The plan looks very bad! Thank you very much.
You didn't say how long is too long. You are joining an 8 million row table. Not sure how many rows are in source_tab.
I noticed the execution plan indicates a full table scan of destination_tab. Is the city column on the destination_tab table indexed? If not, try adding an index. If it is, Oracle may be ignoring it because it knows it needs to return every value anyway and destination_tab is the driving table.
No matter how you optimize it, this will always degrade in performance as the tables grow because you are updating every row by fetching a value from the same table joined to another. That is, you are always doing N operations where N is the number of rows in destination_tab.
High-level questions/suggestions:
Do you need to update every row every time? Are only certain rows likely to have changed values? If so, can you somehow predict which rows you need to update and limit your updates to those rows?
Why are the hints there? If performance changes, I would experiment with dropping hints. It's the optimizer's job to find the best plan for you. By using hints, you are telling the optimizer how to do its job. You'd better be right.
You are updating the full_name column on destination_tab to the name column of the same row. But you are obtaining the name column through a join to the table. It may be quicker to take that out of your select and use something like below. This is a guess. It may not matter.
update destination_tab dest
set full_name = dest.name,
    state = (select src.state
             from source_tab src
             where src.city = dest.city);
Try the following.
merge into destination_tab d
using source_tab s
on (d.city = s.city)
when matched then update
  set d.state = s.state
  where decode(d.state, s.state, 1, 0) = 0;  -- only touch rows whose state actually differs (DECODE treats two NULLs as equal)
If this is a data warehouse, I wouldn't do updates, especially not every row in a large table. I'd probably create a materialized view combining the pieces from various base tables, and do a full refresh when needed (non-atomic: truncate + insert append).
Edit:
As for WHY the current update approach is taking much longer than usual, my guess is that in previous runs Oracle found a good number of blocks needed for the update in buffer cache, and lately Oracle has had to pull a lot from disk into buffer first. You can look into consistent gets and db block gets (logical io) vs physical io (disk).
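A sketch of how to compare those for your own session, using the standard dynamic performance views v$mystat and v$statname (requires SELECT privilege on them):

select sn.name, ms.value
from v$mystat ms
join v$statname sn on sn.statistic# = ms.statistic#
where sn.name in ('consistent gets', 'db block gets', 'physical reads');
-- logical I/O = consistent gets + db block gets; compare against physical reads

Run it before and after the update; a large jump in physical reads relative to previous months would support the buffer cache theory.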
I understand the comments about the sense of a data warehouse and so on. However, I have to do the update this way. The update is part of an ETL workflow: every month I have to copy the complete 8 million records into the "destination" table, and after this step I have to run the UPDATE that is causing the problems.
What I do not understand is why the performance has become so bad from one day to the next. Usually the update runs in 45 minutes; now it runs for about 4 hours. But why? No sorting is necessary, so the famous explanation "sorting on disk instead of in main memory" does not apply. What is the problem in my case?
Could there be a performance difference between a normal update (the way I do it) and a merge-update?
