I'm looking for a way to generate a script that produces, for a given parent table, an SQL query selecting all of its child tables' columns.
Let's say you have a table Class (teacher, room, program) and a table Student (firstname, lastname, age, score, email).
Let's say you want a SELECT of all students in a Class.
Sure, you could write the query manually.
But now imagine you have a complex table with dozens of child tables: how do you do this efficiently/programmatically?
This is something all programmers would like to have, no?
I can't believe no one has ever done it.
I understand the answer may depend on the DBMS vendor, I'm personally looking for a solution for Oracle.
Questions that are somewhat similar:
Oracle: Easy way to find names and/or number of child record tables
Postgres: select data from parent table and all child tables
And here is an idea that solves this partially: use a tool such as Power BI, or Visual Studio's generate-model-from-database feature for ASP.NET MVC. You won't get the SQL query, but you will get the data.
You can start with this POC:
select
    juc.table_name as parent_table,
    /*
    uc.table_name as child_table, uc.constraint_name, uc.r_constraint_name,
    juc.constraint_type,
    uccc.column_name as parent_col_name, uccc.position as parent_col_position,
    uccp.column_name as child_col_name, uccp.position as child_col_position,
    */
    'SELECT c.* FROM ' || juc.table_name || ' p JOIN ' || uc.table_name || ' c ON '
    || listagg( 'c.' || uccp.column_name || ' = p.' || uccc.column_name, ' AND ' )
       within group (order by uccc.position)
    as sql
from user_constraints uc
-- juc: the parent's PK/unique constraint referenced by the FK
join user_constraints juc on juc.constraint_name = uc.r_constraint_name
-- uccc: columns of the parent (referenced) constraint
join user_cons_columns uccc on uccc.constraint_name = uc.r_constraint_name
-- uccp: columns of the child FK constraint, matched by position for composite keys
join user_cons_columns uccp on uccp.constraint_name = uc.constraint_name and uccc.position = uccp.position
where uc.constraint_type = 'R'  -- 'R' = referential (foreign key) constraints
group by uc.table_name, juc.table_name, uc.constraint_name
;
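For the Class/Student example from the question, assuming Student carries a foreign key column CLASS_ID referencing CLASS(ID) (those column names are illustrative, not from the question), the POC would emit a row like:

PARENT_TABLE  SQL
------------  -----------------------------------------------------------
CLASS         SELECT c.* FROM CLASS p JOIN STUDENT c ON c.CLASS_ID = p.ID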
You can create your own entity relationship model metadata and write PL/SQL that will traverse it and assemble SQL intelligently. I've done this myself to avoid having to hard-code SQL in my front-end apps. But it is highly complex and involves a lot of coding, far more than can be shared in a forum like this. To give you the general gist, though, I have the following metadata tables that describe my model (a rough DDL sketch follows the list):
sql_statements - associates a logical entity with a primary table, and specifies the PK column.
sql_statement_parents - defines the parent entity and the child attribute used to join to the parent's PK.
sql_attribute_dictionary - lists every available attribute for every statement, the source column, its datatype, plus optional derived column expressions.
attribute_dependencies - used for derived column expressions, specifies which attributes are needed by the derived attribute.
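A minimal sketch of what these metadata tables might look like (the columns shown are illustrative, inferred only from the descriptions above; a real model would be richer):

CREATE TABLE sql_statements (
    statement_name   VARCHAR2(30) PRIMARY KEY,   -- logical entity name
    primary_table    VARCHAR2(30) NOT NULL,      -- table the entity is based on
    pk_column        VARCHAR2(30) NOT NULL       -- PK column of that table
);

CREATE TABLE sql_statement_parents (
    statement_name   VARCHAR2(30) NOT NULL REFERENCES sql_statements,
    parent_name      VARCHAR2(30) NOT NULL REFERENCES sql_statements,
    join_attribute   VARCHAR2(30) NOT NULL,      -- child attribute joined to the parent's PK
    PRIMARY KEY (statement_name, parent_name)
);

CREATE TABLE sql_attribute_dictionary (
    statement_name   VARCHAR2(30) NOT NULL REFERENCES sql_statements,
    attribute_name   VARCHAR2(30) NOT NULL,
    source_column    VARCHAR2(30),               -- underlying column, if not derived
    datatype         VARCHAR2(30) NOT NULL,
    derived_expr     VARCHAR2(4000),             -- optional derived column expression
    PRIMARY KEY (statement_name, attribute_name)
);

CREATE TABLE attribute_dependencies (
    statement_name   VARCHAR2(30) NOT NULL,
    attribute_name   VARCHAR2(30) NOT NULL,      -- the derived attribute
    depends_on       VARCHAR2(30) NOT NULL,      -- attribute the derivation needs
    PRIMARY KEY (statement_name, attribute_name, depends_on),
    FOREIGN KEY (statement_name, attribute_name)
        REFERENCES sql_attribute_dictionary (statement_name, attribute_name)
);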
Then you write code that takes a sql_statement name, a list of desired attributes, and a set of optional filters. It builds a list of needed source tables/columns using the relationships in the metadata, then uses the parent-child relationships to recursively build SQL (using nested query blocks) from the child up to whatever parent ancestor(s) it needs to obtain the required columns, intelligently aliasing everything and joining in the right way to be performant. It can then pass back the finished SQL as a REF CURSOR which you can parse, open and fetch from to get results. It works great for me, but it took weeks of work to perfect, and that's with decades of experience in SQL and PL/SQL. This is no simple task, but it is doable. And of course there are always complex needs that defy the capabilities of our metadata model; for those we end up creating views or pipelined functions and registering them in our metadata so that generated SQL can invoke them when needed.
But in the end, however you do it, you will not get away from having to describe your data model in detail so that code can walk it.
I understand that query performance improves when we use EXISTS and NOT EXISTS in place of IN and NOT IN. However, is performance improved further when we replace NOT IN with an OUTER JOIN, as opposed to NOT EXISTS?
For example, the following query selects all models from a PRODUCT table that are not in another table called PC. For the record, no model values in the PRODUCT or PC tables are null:
select model
from product
where not exists (
    select *
    from pc
    where product.model = pc.model);
The following OUTER JOIN will display the same results:
select product.model
from product left join pc
on pc.model = product.model
where pc.model is null;
Seeing as these both return the same values, which option should we use to better improve the performance of our queries?
The query plan will tell you. It will depend on the data and the tables. In the case of the OUTER JOIN and NOT EXISTS queries here, they typically produce the same plan.
However, regarding your opening sentence: NOT IN and NOT EXISTS are not the same if NULL is accepted on model. In this case you say model cannot be null, so you may find they all have the same plan anyway. But for the optimizer to make that assumption, the database must be told there cannot be nulls (by declaring the column NOT NULL), as opposed to there simply not being any. If you don't, it will build different plans for each query, which may result in different performance depending on your actual data. This is generally true, and particularly true for Oracle, which does not index NULLs.
Check out EXPLAIN PLAN
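A minimal sketch of both steps, using the tables from the question (and assuming the model columns really should never be null, so they can be declared NOT NULL as discussed above):

-- Tell the optimizer the columns can never be null.
ALTER TABLE pc MODIFY (model NOT NULL);
ALTER TABLE product MODIFY (model NOT NULL);

-- Explain the NOT EXISTS query and display its plan.
EXPLAIN PLAN FOR
select model
from product
where not exists (
    select *
    from pc
    where product.model = pc.model);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- Look for an anti-join step such as HASH JOIN ANTI in the output.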
I first read the SELECT statement documentation in the Oracle Docs.
I have some questions about Oracle SELECT behaviour when my query contains SELECT, JOIN, and WHERE.
See below for details:
My sample tables:
[ P_IMAGE_ID ]
IMAGE_ID (PK)
FILE_NAME
FILE_TYPE
...
...
[ P_IMG_TAG ]
IMG_TAG_ID (PK)
IMAGE_ID (FK)
TAG
...
...
My requirement is: get distinct images whose tag is '70702'.
Method 1: Select -> Join -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
INNER JOIN P_IMG_TAG PTAG
ON PTAG.IMAGE_ID = PID.IMAGE_ID
WHERE PTAG.TAG = '70702';
I think the query behaviour should be like:
join tables -> apply the WHERE clause -> DISTINCT select
I used Oracle SQL Developer to get the explain plan:
Method 1 costs 76.
Method 2: Select -> Where -> Where -> Distinct
SELECT DISTINCT PID.IMAGE_ID
, PID.FILE_NAME
FROM P_IMAGE_ID PID
WHERE PID.IMAGE_ID IN
(
SELECT PTAG.IMAGE_ID
FROM P_IMG_TAG PTAG
WHERE PTAG.TAG = '70702'
);
I think the second query's behaviour should be like:
apply the WHERE clause (subquery) -> apply the WHERE clause (outer query) -> DISTINCT select
I used Oracle SQL Developer to get the explain plan too:
Method 2 costs 76 as well. Why?
I believed that applying the WHERE clause first, to reduce the rows the database must process and to avoid joining the tables, should perform better than the join query. But when I test it, the two methods have equal cost, which confuses me.
Or have I misunderstood something?
List of my questions:
Why are the costs of the two methods above equal?
If the subquery on Tag = '70702' returns thousands or millions of rows or more, is a table join better?
If the subquery on Tag = '70702' returns few rows, is a subquery better for reducing the data processed?
When I use method 1 (Select -> Join -> Where -> Distinct), does the database really join the tables before applying the WHERE clause?
Someone told me that if I move the condition Tag = '70702' into the join clause
(i.e. INNER JOIN P_IMG_TAG PTAG ON PTAG.IMAGE_ID = PID.IMAGE_ID AND PTAG.TAG = '70702'), performance may be better. Is that right?
I read the topics subselect vs outer join and subquery or inner join, but both are about SQL Server; I'm not sure whether the same applies to an Oracle database.
The DBMS takes your query and executes something, but it doesn't execute steps corresponding to the parts of an SQL statement in the order they appear in that statement.
Read about "relational query optimization", which could just as well be called "relational query implementation" - e.g. in the Oracle documentation.
Any language processor takes declarations and calls as input and implements the described behaviour in terms of internal data structures and operations, maybe through one or more levels of "intermediate code" running on a "virtual machine", eventually down to physical machines.

But even just staying in the input language, SQL queries can be rearranged into other SQL queries that return the same value but perform significantly better under simple and general implementation assumptions. Just as you know that your question's queries always return the same thing for a given database, the DBMS can know. Part of how it knows is that there are many rules for taking a relational algebra expression and generating a different but same-valued expression. Certain rewrite rules apply under certain limited circumstances. There are rules that take into consideration SQL-level relational things like primary keys, unique columns, foreign keys and other constraints. Other rules use implementation-oriented SQL-level things like indexes and statistics. This is the "relational query rewriting" part of relational query optimization.
Even when two different but equivalent queries generate different plans, the cost can be similar because the plans are so similar. Here, both plans end with a duplicate-elimination step (a HASH UNIQUE or SORT UNIQUE operation). (It would be interesting to know what the few top plans were for each of your queries. It is quite likely that those few are the same for both, but that the plan more directly derived from the particular input expression is the one offered when there's little difference.)
The way to get the DBMS to find good query plans is to write the most natural expression of a query that you can find.
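A sketch of how to check this yourself in Oracle, explaining both of the question's queries and comparing the plans (the statement IDs are arbitrary labels):

EXPLAIN PLAN SET STATEMENT_ID = 'method1' FOR
SELECT DISTINCT PID.IMAGE_ID, PID.FILE_NAME
FROM P_IMAGE_ID PID
INNER JOIN P_IMG_TAG PTAG ON PTAG.IMAGE_ID = PID.IMAGE_ID
WHERE PTAG.TAG = '70702';

EXPLAIN PLAN SET STATEMENT_ID = 'method2' FOR
SELECT DISTINCT PID.IMAGE_ID, PID.FILE_NAME
FROM P_IMAGE_ID PID
WHERE PID.IMAGE_ID IN
    (SELECT PTAG.IMAGE_ID FROM P_IMG_TAG PTAG WHERE PTAG.TAG = '70702');

-- Display both plans; expect the optimizer to have rewritten them
-- into the same (or nearly the same) shape.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'method1'));
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'method2'));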
Can I use the Oracle sys (data dictionary) views to trace a path between two tables, i.e. find all the possible ways to go from table X to table Y?
The problem is:
I work on an enormous database where it's really difficult to know quickly which tables are vital for making a join between two tables.
Can I do this?
First Need:
The problem with SQL Developer Data Modeler and the other tools is that you have to select the tables to reverse-engineer (so I would already need to know which tables to select), and for me this is the major problem. In my case I have 800 tables and I can't select them all to trace the path. What I want is to submit two tables as arguments and then generate all the possible paths.
Second Need:
I have already tried querying sys.all_constraints, and the most I've managed is to detect the tables directly connected to a table X.
The query:
SELECT C1.TABLE_NAME, C2.TABLE_NAME
FROM ALL_CONSTRAINTS C1, ALL_CONSTRAINTS C2
WHERE C2.CONSTRAINT_NAME = C1.R_CONSTRAINT_NAME
AND UPPER(C1.OWNER) LIKE 'MY_SCHEMA'
AND C1.CONSTRAINT_TYPE = 'R'
AND UPPER(C1.TABLE_NAME) LIKE 'X'
ORDER BY C1.TABLE_NAME
So if somebody can help me conceive at least a query that returns this result:
Table1 | Table2 | JoinColumnOfTable1 | JoinColumnOfTable2
To get that, I surmise the other view to join to ALL_CONSTRAINTS is ALL_CONS_COLUMNS.
But the problem I've run into is composite primary keys.
This is why Nature gave us data models: to assist in tasks like this.
If you don't have a data model then you can reverse engineer one from the data dictionary. See my answer to a question on reverse engineering.
Reverse engineering can only identify relationships which have been defined by foreign keys. This shouldn't need stating but let's say it anyway: if your database hasn't got constraints you have no chance of deriving a data model automatically.
"I have 800 tables and I can't select them all to trace the path. "
Hmmm, I suppose recommending you reverse engineer a data model is a bit like the punchline to the old joke about asking for directions to Cork: "Well, I wouldn't start from here." The whole point of having a data model upfront is that we have it when we really need it.
If primary and foreign key relationships are established in the database, you can use a tool like Oracle SQL Developer Data Modeler to reverse engineer the model and give a graphical representation of the relationships.
Tools like this read the Oracle dictionary to determine the relationships between tables. You can do this yourself by querying views such as sys.all_constraints.
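For example, here is a sketch of such a dictionary query. It returns one row per joined column pair, so composite keys simply produce one row per position ('MY_SCHEMA' is a placeholder):

SELECT c.table_name    AS child_table
     , p.table_name    AS parent_table
     , ccc.column_name AS child_column
     , pcc.column_name AS parent_column
FROM all_constraints c
-- p: the PK/unique constraint the foreign key references
JOIN all_constraints p
     ON p.owner = c.r_owner
     AND p.constraint_name = c.r_constraint_name
-- ccc: the child FK's columns
JOIN all_cons_columns ccc
     ON ccc.owner = c.owner
     AND ccc.constraint_name = c.constraint_name
-- pcc: the parent key's columns, aligned by position for composite keys
JOIN all_cons_columns pcc
     ON pcc.owner = p.owner
     AND pcc.constraint_name = p.constraint_name
     AND pcc.position = ccc.position
WHERE c.constraint_type = 'R'
AND c.owner = 'MY_SCHEMA'
ORDER BY c.table_name, c.constraint_name, ccc.position;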
I cobbled the following query together using Tim Hall's Generic Function Using a Ref Cursor, since I only have 10g here (if you're on 11g you can use the LISTAGG function instead; a sketch of that variant follows the query). It should get you close.
SELECT ac1.table_name "Table", ac2.table_name "Referencing Table"
, concatenate_list(CURSOR(SELECT acc.column_name
FROM all_cons_columns acc
WHERE acc.constraint_name = ac1.constraint_name
AND acc.owner = 'the_owner'
ORDER BY position)) "PK Columns"
, concatenate_list(CURSOR(SELECT acc.column_name
FROM all_cons_columns acc
WHERE acc.constraint_name = ac2.constraint_name
AND acc.owner = 'the_owner'
ORDER BY position)) "FK Columns"
FROM all_constraints ac1 JOIN all_constraints ac2
ON ac1.constraint_name = ac2.r_constraint_name
WHERE ac1.table_name = 'your_table'
AND ac1.owner = 'the_owner'
AND ac2.owner = 'the_owner'
AND ac1.constraint_type = 'P';
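For reference, a sketch of the 11g+ variant of the same query, using LISTAGG in scalar subqueries instead of the custom concatenate_list function ('the_owner' and 'your_table' remain placeholders):

SELECT ac1.table_name "Table", ac2.table_name "Referencing Table"
     , (SELECT LISTAGG(acc.column_name, ', ') WITHIN GROUP (ORDER BY acc.position)
        FROM all_cons_columns acc
        WHERE acc.constraint_name = ac1.constraint_name
        AND acc.owner = 'the_owner') "PK Columns"
     , (SELECT LISTAGG(acc.column_name, ', ') WITHIN GROUP (ORDER BY acc.position)
        FROM all_cons_columns acc
        WHERE acc.constraint_name = ac2.constraint_name
        AND acc.owner = 'the_owner') "FK Columns"
FROM all_constraints ac1 JOIN all_constraints ac2
     ON ac1.constraint_name = ac2.r_constraint_name
WHERE ac1.table_name = 'your_table'
AND ac1.owner = 'the_owner'
AND ac2.owner = 'the_owner'
AND ac1.constraint_type = 'P';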
Also try SchemaSpy - a free open-source alternative which uses the foreign keys to generate a relationship model!
I am not an expert in T-SQL, but I wonder if such a thing is possible:
Imagine I have a select that joins to a table which is another query's result set:
SELECT *
FROM tProduct
JOIN (SELECT ProductId FROM ...... -- some other joins) tInlineQuery
  ON tInlineQuery.ProductId = tProduct.Id
WHERE tInlineQuery. -- some condition
Is it possible or meaningful to create an index on tInlineQuery, so that filtering on that result set performs faster?
If so, how?
No. You could have suitable indexes on the objects within the subquery, but you can't add a temporary index to the subquery as you have it there. You can use query hints to control the way in which the data is joined, e.g. nested loops, merge or hash join - but the optimizer tends to make the right decision.
An option to get that effect would be to select the results of that subquery into a temp table, and place an index on there, then join to that temp table.
To do this, you would need a stored procedure and to include the following code:
SELECT yourFields
INTO #TempTableName
FROM SomeTable -- the FROM clause was missing; SomeTable is a placeholder
JOIN SomeOtherTables ON SomeOtherTables.SomeField = SomeTable.SomeField
WHERE SomeField = SomeValue;
CREATE CLUSTERED INDEX SomeIndexName ON #TempTableName(SomeField,AnotherField);
SELECT *
FROM tProduct p
JOIN #TempTableName t ON t.SomeField = p.SomeField
...
DROP TABLE #TempTableName -- optional, the table will die when it goes out of scope at the end of the procedure.
The temp table index doesn't have to be clustered; that's down to your choice.
Andrew has an excellent answer, but if this is a subquery you will be using often, another option would be to create an indexed view. There are several good articles about that, including one I wrote at SQL Server Central titled On Indexes and Views.
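A minimal sketch of the indexed-view approach, with illustrative object names (none of these come from the question): the view must be schema-bound, and its first index must be unique and clustered, which materializes the view's rows.

CREATE VIEW dbo.vProductSubset
WITH SCHEMABINDING
AS
SELECT ProductId, SomeField
FROM dbo.SomeTable
WHERE SomeField = 'SomeValue';
GO

-- The first index on a view must be unique and clustered; it materializes the rows.
CREATE UNIQUE CLUSTERED INDEX IX_vProductSubset
    ON dbo.vProductSubset (ProductId);

Queries can then join to dbo.vProductSubset directly (on non-Enterprise editions, add the NOEXPAND hint so the optimizer uses the view's index).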
I have a page that pulls together aggregate data from two different tables. I would like to perform these queries in parallel to reduce the latency without having to introduce a stored procedure that would do both.
For example, I currently have this:
ViewBag.TotalUsers = DB.Users.Count();
ViewBag.TotalPosts = DB.Posts.Count();
// Page displays both values but has two trips to the DB server
I'd like something akin to:
var info = DB.Select(db => new {
    TotalUsers = db.Users.Count(),
    TotalPosts = db.Posts.Count()
});
// Page displays both values using one trip to the DB server.
that would generate a query like this
SELECT (SELECT COUNT(*) FROM Users) AS TotalUsers,
(SELECT COUNT(*) FROM Posts) AS TotalPosts
Thus, I'm looking for a single query to hit the DB server. I'm not asking how to parallelize two separate queries using Tasks or Threads.
Obviously I could create a stored procedure that got back both values in a single trip, but I'd like to avoid that if possible as it's easier to add additional stats purely in code rather than having to keep refreshing the DB import.
Am I missing something? Is there a nice pattern in EF to say that you'd like several disparate values that can all be fetched in parallel?
This will return the counts using a single SELECT statement, but there is an important caveat. You'll notice that the EF-generated SQL uses cross joins, so there must be a table (not necessarily one of the ones you are counting) that is guaranteed to have rows in it, otherwise the query will return no results. This isn't an ideal solution, but I don't know that it's possible to generate the SQL in your example, since it doesn't have a FROM clause in the outer query.
The following code counts records in the Addresses and People tables in the Adventure Works database, and relies on StateProvinces to have at least 1 record:
var r = from x in StateProvinces.Top("1")
        let ac = Addresses.Count()
        let pc = People.Count()
        select new { AddressCount = ac, PeopleCount = pc };
and this is the SQL that is produced:
SELECT
1 AS [C1],
[GroupBy1].[A1] AS [C2],
[GroupBy2].[A1] AS [C3]
FROM
(
SELECT TOP (1) [c].[StateProvinceID] AS [StateProvinceID]
FROM [Person].[StateProvince] AS [c]
) AS [Limit1]
CROSS JOIN
(
SELECT COUNT(1) AS [A1]
FROM [Person].[Address] AS [Extent2]
) AS [GroupBy1]
CROSS JOIN
(
SELECT COUNT(1) AS [A1]
FROM [Person].[Person] AS [Extent3]
) AS [GroupBy2]
and the results from the query when it's run in SSMS:
C1 C2 C3
----------- ----------- -----------
1 19614 19972
You should be able to accomplish what you want with Parallel LINQ (PLINQ). You can find an introduction here.
It seems like there's no good way to do this (yet) in EF4. You can either:
Use the technique described by adrift, which generates a slightly awkward query.
Use ExecuteStoreQuery<T>, where T is some dummy class that you create with property getters/setters matching the names of the columns from the query. The disadvantage of this approach is that you can't directly use your entity model and have to resort to SQL. In addition, you have to create these dummy entities.
Use a MultiQuery class that combines several queries into one. This is similar to NHibernate's futures hinted at by StanK in the comments. This is a little hack-ish and it doesn't seem to support scalar valued queries (yet).