SYS_CONNECT_BY_PATH and START WITH/CONNECT BY PostgreSQL Equivalent

SYS_CONNECT_BY_PATH and START WITH/CONNECT BY PostgreSQL Equivalent - oracle

I have a query that is part of a much larger query written in Oracle that I need to convert to PostgreSQL.
/*rn and cnt are defined earlier*/
SELECT wtn, LTRIM(SYS_CONNECT_BY_PATH(RESP_TCSI, ','),',') TCSI_CODES
FROM DATA
WHERE rn = cnt
START WITH rn = 1
CONNECT BY PRIOR rn = rn-1
AND PRIOR WTN = WTN
From what I'm able to tell, there's not an equivalent to SYS_CONNECT_BY_PATH() in Postgres. I know that Postgres has a CONNECTBY() function in tablefunc, but I don't think it does what either the start with and connect by bits do. I'm also aware of what the Postgres equivalent to LTRIM() is, but if I have to use CONNECTBY() or something similar, I'm not sure if trimming the string is important.
Reading and searching around I noticed that there is probably a way to do this with some recursive select, but I'm unsure how I would do that, and beyond that, I don't really understand what the code is doing. My assumption would be that it has something to do with a hierarchical tree, based on the Oracle equivalents, but even then I'm not sure. How would I do something equivalent or similar to this in Postgres?
Thanks.

Use a recursive common table expression:
with recursive tree as (
select wtn,
resp_tcsi as tcsi_codes
from data
where rn = 1 -- this is the "start with" part
union all
select ch.wtn,
p.tcsi_codes||','||ch.resp_tcsi
from data as ch
join tree p
on ch.rn -1 = p.rn -- this is the "connect by" part
and ch.wtn = p.wtn
)
select *
from tree;

Related

order by rownum — is it correct or not?

I have a canonical top-N query against an Oracle database suggested by all FAQs and HowTos:
select ... from (
select ... from ... order by ...
) where ronwum <= N
It works perfectly on Oracle 11, i.e. it returns top-N records in the order specified in inner select.
However it breaks on Oracle 12. It still returns the same top-N records, but they may get shuffled. The final order of these records is non-deterministic.
I googled but haven't found any related discussions. Looks like everyone else is always getting the correct record order from such select.
One finding was interesting though. I saw that some people use (without an explanation, unfortunately) an additional order by rownum clause in the outer select:
select ... from (
select ... from ... order by ...
) where ronwum <= N
order by rownum
(both rownum's here are references to the Oracle pseudocolumn; it's not something returned by inner select)
It appears to work. But with Oracle optimizer you can never be sure if it's just luck or a really correct solution.
The question is: does order by rownum guarantee correct ordering in this case or not, and why? Me and my colleagues could not come to agreement about it.
P.S. I'm aware of other ways to select top-N records, e.g. using row_number analytic function and fetch first clause introduced in Oracle 12. I'm also aware that I can repeat the same order by ... on the outer select. The question is about order by rownum only — is it correct or not.

Inner query and outer query may or may not give different order and hence different order of rownum. As rownum is already ordered and if you want to get top N records then best thing is to do is create alias of rownum in inner query and use it on outer query.
select ... from (
select rownum rn ... from ...
) where rn <= N
order by rn

Forcing Oracle to do distinct last

I have a quite complicated view (using several layers of views across several database links) which takes a second to return all of it's rows. But, when I ask for distinct rows, it takes considerably more time. I stopped waiting after 4 minutes.
To make my self as clear as possible:
select a, b from compicated_view; -- takes 1 sec (returns 6 rows)
select distinct a, b from compicated_view; -- takes at least 4 minutes
I find that pretty weird, but hey, that's how it is. I guess Oracle messed something up when planing that query. Now, is there a way to force Oracle to first finish the select without distinct, and then do a "select distinct *" on the results? I looked into optimizer hints, but I can't find anything about hinting the order in which distinct is applied (this is first time I'm optimizing a query, obviously :-/).
I'm using Oracle SQl Developer on Oracle 10g EE.

Try:
SELECT DISTINCT A,B FROM (
SELECT A,B FROM COMPLICATED_VIEW
WHERE rownum > 0 );
this forces to materialize the subquery and prevents from view merging/predicate pushing, and likely from changing the original plan of the view.
You may also try NO_MERGE hint:
SELECT /*+ NO_MERGE(alias) */
DISTINCT a,b
FROM (
SELECT a,b FROM COMPLICATED_VIEW
) alias

Since you haven't posted details... try the following:
SELECT DISTINCT A,B
FROM
(SELECT A,B FROM COMPLICATED_VIEW);

oracle : how to ensure that a function in the where clause will be called only after all the remaining where clauses have filtered the result?

I am writing a query to this effect:
select *
from players
where player_name like '%K%
and player_rank<10
and check_if_player_is_eligible(player_name) > 1;
Now, the function check_if_player_is_eligible() is heavy and, therefore, I want the query to filter the search results sufficiently and then only run this function on the filtered results.
How can I ensure that the all filtering happens before the function is executed, so that it runs the minimum number of times ?

Here's two methods where you can trick Oracle into not evaluating your function before all the other WHERE clauses have been evaluated:
Using rownum
Using the pseudo-column rownum in a subquery will force Oracle to "materialize" the subquery. See for example this askTom thread for examples.
SELECT *
FROM (SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND ROWNUM >= 1)
WHERE check_if_player_is_eligible(player_name) > 1
Here's the documentation reference "Unnesting of Nested Subqueries":
The optimizer can unnest most subqueries, with some exceptions. Those exceptions include hierarchical subqueries and subqueries that contain a ROWNUM pseudocolumn, one of the set operators, a nested aggregate function, or a correlated reference to a query block that is not the immediate outer query block of the subquery.
Using CASE
Using CASE you can force Oracle to only evaluate your function when the other conditions are evaluated to TRUE. Unfortunately it involves duplicating code if you want to make use of the other clauses to use indexes as in:
SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND CASE
WHEN player_name LIKE '%K%'
AND player_rank < 10
THEN check_if_player_is_eligible(player_name)
END > 1

There is the NO_PUSH_PRED hint to do it without involving rownum evaluation (that is a good trick anyway) in the process!
SELECT /*+NO_PUSH_PRED(v)*/*
FROM (
SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
) v
WHERE check_if_player_is_eligible(player_name) > 1

You usually want to avoid forcing a specific order of execution. If the data or the query changes, your hints and tricks may backfire. It's usually better to provide useful metadata to Oracle so it can make the correct decisions for you.
In this case, you can provide better optimizer statistics about the function with ASSOCIATE STATISTICS.
For example, if your function is very slow because it has to read 50 blocks each time it is called:
associate statistics with functions
check_if_player_is_eligible default cost(1000 /*cpu*/, 50 /*IO*/, 0 /*network*/);
By default Oracle assumes that a function will select a row 1/20th of the time. Oracle wants to eliminate as many rows as soon
as possible, changing the selectivity should make the function less likely to be executed first:
associate statistics with functions
check_if_player_is_eligible default selectivity 90;
But this raises some other issues. You have to pick a selectivity for ALL possible conditions, 90% certainly won't always be accurate. The IO cost is the number of blocks fetched, but CPU cost is "machine instructions used", what exactly does that mean?
There are more advanced ways to customize statistics,for example using the Oracle Data Cartridge Extensible Optimizer. But data cartridge is probably one of the most difficult Oracle features.

You did't specify whether player.player_name is unique or not. One could assume that it is and then the database has to call the function at least once per result record.
But, if player.player_name is not unique, you would want to minimize the calls down to count(distinct player.player_name) times. As (Ask)Tom shows in Oracle Magazine, the scalar subquery cache is an efficient way to do this.
You would have to wrap your function call into a subselect in order to make use of the scalar subquery cache:
SELECT players.*
FROM players,
(select check_if_player_is_eligible(player.player_name) eligible) subq
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND ROWNUM >= 1
AND subq.eligible = 1

Put the original query in a derived table then place the additional predicate in the where clause of the derived table.
select *
from (
select *
from players
where player_name like '%K%
and player_rank<10
) derived_tab1
Where check_if_player_is_eligible(player_name) > 1;

Performing simultaneous unrelated queries in EF4 without a stored procedure?

I have a page that pulls together aggregate data from two different tables. I would like to perform these queries in parallel to reduce the latency without having to introduce a stored procedure that would do both.
For example, I currently have this:
ViewBag.TotalUsers = DB.Users.Count();
ViewBag.TotalPosts = DB.Posts.Count();
// Page displays both values but has two trips to the DB server
I'd like something akin to:
var info = DB.Select(db => new {
TotalUsers = db.Users.Count(),
TotalPosts = db.Posts.Count());
// Page displays both values using one trip to DB server.
that would generate a query like this
SELECT (SELECT COUNT(*) FROM Users) AS TotalUsers,
(SELECT COUNT(*) FROM Posts) AS TotalPosts
Thus, I'm looking for a single query to hit the DB server. I'm not asking how to parallelize two separate queries using Tasks or Threads
Obviously I could create a stored procedure that got back both values in a single trip, but I'd like to avoid that if possible as it's easier to add additional stats purely in code rather than having to keep refreshing the DB import.
Am I missing something? Is there a nice pattern in EF to say that you'd like several disparate values that can all be fetched in parallel?

This will return the counts using a single select statement, but there is an important caveat. You'll notice that the EF-generated sql uses cross joins, so there must be a table (not necessarily one of the ones you are counting), that is guaranteed to have rows in it, otherwise the query will return no results. This isn't an ideal solution, but I don't know that it's possible to generate the sql in your example since it doesn't have a from clause in the outer query.
The following code counts records in the Addresses and People tables in the Adventure Works database, and relies on StateProvinces to have at least 1 record:
var r = from x in StateProvinces.Top("1")
let ac = Addresses.Count()
let pc = People.Count()
select new { AddressCount = ac, PeopleCount = pc };
and this is the SQL that is produced:
SELECT
1 AS [C1],
[GroupBy1].[A1] AS [C2],
[GroupBy2].[A1] AS [C3]
FROM
(
SELECT TOP (1) [c].[StateProvinceID] AS [StateProvinceID]
FROM [Person].[StateProvince] AS [c]
) AS [Limit1]
CROSS JOIN
(
SELECT COUNT(1) AS [A1]
FROM [Person].[Address] AS [Extent2]
) AS [GroupBy1]
CROSS JOIN
(
SELECT COUNT(1) AS [A1]
FROM [Person].[Person] AS [Extent3]
) AS [GroupBy2]
and the results from the query when it's run in SSMS:
C1 C2 C3
----------- ----------- -----------
1 19614 19972

You should be able to accomplish what you want with Parallel LINQ (PLINQ). You can find an introduction here.

It seems like there's no good way to do this (yet) in EF4. You can either:
Use the technique described by adrift which will generate a slightly awkward query.
Use the ExecuteStoreQuery where T is some dummy class that you create with property getters/setters matching the name of the columns from the query. The disadvantage of this approach is that you can't directly use your entity model and have to resort to SQL. In addition, you have to create these dummy entities.
Use the a MultiQuery class that combines several queries into one. This is similar to NHibernate's futures hinted at by StanK in the comments. This is a little hack-ish and it doesn't seem to support scalar valued queries (yet).

Common Table Expression in Sub-Query

I would request for help in understanding which all RDBMS from Oracle, DB2, Sybase support a common table expression (CTE) in a sub-query. I am aware that PostgreSQL does while MS SQL Server does not.
SELECT a.*, b.*
FROM (WHERE aa as (
<<select_query>),
SELECT *
FROM aa
WHERE <<criteria>>
) as a
LEFT JOIN (
WITH bb as (
<<select_query>
),
select * from bb inner join tbl_c on <<innerjoin>> where <<criteria>>
) as b
on <<join_expr>>
I am unable to define the with clause outside the sub-queries - both the queries are dynamically generated w.r.t. the columns, criteria, security, etc.
Also, the above query itself may be used in another query as a sub-query.
In summary, the principle is dynamically generated views, re-usable later. Some queries may have upto 10-12 such dynamic views being merged together as well.
The problem is that the application is supposed to be database-agnostic at least so far as PG, Oracle & DB2 are concerned and features not supported by one are not implemented at all.

Yes, you can use CTE's in subqueries in Oracle. From the Oracle 11g docs:
You can specify this clause in any top-level SELECT statement and in
most types of subqueries. The query name is visible to the main query
and to all subsequent subqueries. For recursive subquery factoring,
the query name is even visible to the subquery that defines the query
name itself.
As an example, this works in Oracle:
SELECT a.*, b.*
FROM (WITH aa AS
(
SELECT LEVEL l1, mod(level, 5) m1 FROM dual CONNECT BY LEVEL < 50
)
SELECT * FROM aa WHERE m1 < 3) a LEFT JOIN
(WITH bb AS
(
SELECT LEVEL l2, mod(level, 5) m2 FROM dual CONNECT BY LEVEL < 50
)
SELECT * FROM bb WHERE m2 BETWEEN 1 AND 4) b
ON a.l1 = b.l2;

That's not directly an answer to your question, but maybe you can think about this:
SQL Server seems to limit the semantics (not necessarily the syntax) of SQL where it makes sense to do so. For instance, you cannot have a subquery with an ORDER BY clause, if you don't also specify a TOP n clause. This makes sense, as ordered subqueries are pointless unless they have limits. Other RDBMS allow for such pointlessness.
In your case (that's just a guess), having CTE's in subqueries only makes limited sense, because you can rewrite your whole query in a way that the CTE's are declared at the top-most level. The only difference you'll have is the scope and maybe the readability of each declaration.
On the other hand, CTE's allow for recursive queries, which might be very hard to apply when CTE's are declared in subqueries...
Since you need to implement database-agnostic SQL, I recommend you do not make heavy use of CTE's yet. If CTE's are simple, you can always rewrite them as simple views...

The newer Microsoft SQL Server versions do support CTE's.

While PostgreSQL supports CTE's, they are an optimisation barrier which prevents predicate or join pushing into the CTE query. This makes them less effective in many cases than a simple subquery.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

SYS_CONNECT_BY_PATH and START WITH/CONNECT BY PostgreSQL Equivalent - oracle

Related

order by rownum — is it correct or not?

Forcing Oracle to do distinct last

oracle : how to ensure that a function in the where clause will be called only after all the remaining where clauses have filtered the result?

Performing simultaneous unrelated queries in EF4 without a stored procedure?

Common Table Expression in Sub-Query

Categories

Resources