I am looking for single-line definitions to understand concepts or terms like those below.
I have referred to many sites, even the Oracle docs, but I can't understand these concepts or map them to real-time scenarios. Please help me understand.
Thanks in advance.
1) Normalization and its forms
2) Table level locking and how to resolve it
3) Deadlocking and how to resolve it
4) Cube and Rollup
5) Table partition
1) Normalization and its forms
Normalization is a database design technique which organizes tables in
a manner that reduces redundancy and dependency of data. It divides
larger tables into smaller tables and links them using relationships.
The main normal forms are:
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
Boyce-Codd Normal Form (BCNF)
4NF (Fourth Normal Form)
5NF (Fifth Normal Form)
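As a quick illustration (hypothetical tables), moving repeated customer details out of an orders table removes the redundancy that normalization targets:
-- unnormalized: customer facts repeated on every order
-- orders(order_id, customer_name, customer_city, order_date)
-- normalized (3NF): customer facts stored once, referenced by key
create table customers (
  customer_id   number primary key,
  customer_name varchar2(100),
  customer_city varchar2(100)
);
create table orders (
  order_id    number primary key,
  customer_id number references customers (customer_id),
  order_date  date
);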
Refer to this document for more information on normalization.
2) Table level locking and how to resolve it
According to the Oracle documentation, a transaction automatically
acquires a table lock (TM lock) when a table is modified with the
following statements: INSERT, UPDATE, DELETE, MERGE, and SELECT ...
FOR UPDATE. These DML operations require table locks to reserve DML
access to the table on behalf of a transaction and to prevent DDL
operations that would conflict with the transaction.
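Blocked sessions normally clear themselves once the holding transaction commits or rolls back. To see who is blocking whom, a quick sketch (assuming you can query the V$ views):
select sid, serial#, blocking_session, event
from v$session
where blocking_session is not null;
-- if the holding session has been abandoned, a DBA can terminate it:
-- alter system kill session '<sid>,<serial#>';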
Refer to this document for more information on table-level locking.
3) Deadlocking and how to resolve it
A deadlock occurs when two or more sessions are waiting for data
locked by each other, resulting in all the sessions being blocked.
Oracle automatically detects and resolves deadlocks by rolling back
the statement associated with the transaction that detects the
deadlock.
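As a minimal two-session illustration (hypothetical ACCOUNTS table): each session locks one row and then requests the row held by the other, and Oracle raises ORA-00060 in one of them.
-- Session 1
update accounts set balance = balance - 10 where id = 1;
-- Session 2
update accounts set balance = balance - 10 where id = 2;
-- Session 1: blocks, waiting for Session 2's lock on id = 2
update accounts set balance = balance + 10 where id = 2;
-- Session 2: Oracle detects the cycle and raises ORA-00060 here
update accounts set balance = balance + 10 where id = 1;
Only the statement that hit ORA-00060 is rolled back, not the whole transaction; the usual prevention is to have all code lock rows in a consistent order.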
Refer to this document for more information on deadlocks.
4) Cube and Rollup
ROLLUP:
In addition to the regular aggregation results we expect from the
GROUP BY clause, the ROLLUP extension produces group subtotals from
right to left and a grand total.
CUBE:
In addition to the subtotals generated by the ROLLUP extension, the
CUBE extension will generate subtotals for all combinations of the
dimensions specified.
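A quick sketch of both, assuming the classic EMP table with DEPTNO, JOB and SAL columns:
select deptno, job, sum(sal)
from emp
group by rollup (deptno, job);
-- rows per (deptno, job), subtotals per deptno, and a grand total
select deptno, job, sum(sal)
from emp
group by cube (deptno, job);
-- additionally produces subtotals per job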
Refer to this document for more information on ROLLUP and CUBE.
5) Table partition
Partitioning allows a table, index, or index-organized table to be
subdivided into smaller pieces, where each piece of such a database
object is called a partition. Each partition has its own name, and may
optionally have its own storage characteristics.
There are several types of partitioning:
- Range Partitioning Tables
- Hash Partitioning Tables
- Composite Partitioning Tables
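For example, a range-partitioned table might look like this (hypothetical names):
create table orders (
  order_id   number,
  order_date date,
  amount     number
)
partition by range (order_date) (
  partition p2019 values less than (date '2020-01-01'),
  partition p2020 values less than (date '2021-01-01'),
  partition pmax  values less than (maxvalue)
);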
Refer to this document for more information on table partitioning.
Cheers!!
I am trying to prepare a DB design for an APEX application. The requirement is as follows.
On the Departments IR page, users are asking for the columns below:
Number of employees in each department (Department may or may not have employees)
Primary Location for Department (Department can have multiple addresses and addresses are stored in other table, along with primary flag)
Alternative Manager's Email Address for Department (alt_manager_id column; this is an optional column and refers to the employees table)
I can implement these requirements using either inline subqueries or OUTER JOINs. But these approaches will have a performance impact as the data grows (to, say, hundreds of thousands of rows). So, my question is: is it OK to store these data directly in the "Departments" table and update the "Departments" table when the child tables get updated? Basically, I am trying to store summary data in the master table, instead of deriving it as and when needed from the child tables. Is this considered bad practice? Is it OK to implement such a DB design?
Thank you
"Is this considered bad practice?"
Usually yes. There are several problems with maintaining summary detail information in a master record.
Your inserts into child tables (and deletes if you have them) now also have to take a lock on the master record, to increment the count. This adds complexity to what should be simple transactions.
It also has two performance hits: the additional overhead of maintaining the counts and the potential for sessions to hang in multi-user environments.
Note that you are adding a definite performance hit to your insert activity for a possible saving in the performance of aggregating queries.
The good practice is to just run the counts when you need the summaries. Tune the queries if you need to.
If you think you really are going to be querying the summary data often enough for the workload to be a problem you should consider building materialized views for the summary queries. Then, when you enable query rewrites, Oracle will transparently query the materialized view if it can satisfy the query rather than re-running the aggregations. This is a technique which is used a lot in data warehouses, but there's no reason not to use it in OLTP environments if you really have the data volumes to justify it. Find out more.
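For instance, a sketch of such a materialized view for the department head-count (hypothetical names; fast refresh on commit also requires a materialized view log on the child table):
create materialized view log on employees
  with sequence, rowid (department_id) including new values;
create materialized view dept_emp_count_mv
  refresh fast on commit
  enable query rewrite
as
select department_id, count(*) as emp_count
from employees
group by department_id;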
Generally, try the simplest thing which could work first. Only look to do something different (like building a materialized view for aggregations) when you know you have a demonstrable problem with performance.
I'm an ETL developer who's currently tasked with developing a type 2 SCD from existing historical data in a relational database. I'm perfectly capable of creating a type 2 SCD that's responsible for tracking future changes to the data, but I'm completely useless when it comes to the task at hand.
The relational model is in our ODS. Based on that relational model, I'm supposed to build flat records in our DW dimension. There are multiple attributes which need to be monitored for changes, each in specific related tables in the relational model. Historical changes must be kept on a daily basis, and if multiple changes to the same attribute occur on the same day, only the last subsists.
How can I tackle this? I'm lost. Thanks in advance.
P.S. we're talking tables with 20-30 million rows and multiple attributes that may change at any given time and therefore must result in a new record in the SCD.
This will indeed be painful. I'm assuming from your question that the tables containing the attribute values are currently varying independently (or you wouldn't need to ask the question).
If you have a table 'Table1' containing 'Key', 'Attribute1' and 'Effective From','Effective To' columns, then you can 'explode' that table into a virtual table in the form 'Key','Attribute1','Date', projecting out one row for every date where that attribute was current.
(Note that you probably don't want to do this as a ranged join against your date dimension, because that would be a triangular join (i.e. it would perform really badly); you probably need to explode the rows in an ETL tool or programmatically.)
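If you do want to stay in SQL, one way to avoid the triangular join is a per-row row generator, e.g. a LATERAL inline view on Oracle 12c+. A sketch, using the column names assumed above:
select t.key, t.attribute1,
       t.effective_from + d.n as snapshot_date
from table1 t,
     lateral (select level - 1 as n
              from dual
              connect by level <= t.effective_to - t.effective_from + 1) d;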
If you perform this process across multiple tables, you will have a set of tables giving you the full day-by-day snapshot of each attribute for every day that you care about. It's then fairly easy to join those tables on 'Key' and 'Date' to give you the complete daily snapshot across all of the attribute values.
Then, of course, you need to run this through another process to collapse rows with the same Key, contiguous dates and all the same attribute values, i.e. 'unexplode' the rows back into 'effective from','effective to' form. Note again that this is fundamentally a row-by-row operation (or at the very least a windowing function), and a set-based approach will perform very badly. Personally I'd just stream it all through some .net/java code to achieve this.
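For the windowing-function route, the usual gaps-and-islands trick works: subtracting a row number from each date yields a constant within every contiguous run. A sketch, assuming the exploded rows landed in a stage called EXPLODED_DAILY (one attribute shown for brevity; partition by all of them in practice):
select key, attribute1,
       min(snapshot_date) as effective_from,
       max(snapshot_date) as effective_to
from (
  select key, attribute1, snapshot_date,
         snapshot_date
           - row_number() over (partition by key, attribute1
                                order by snapshot_date) as grp
  from exploded_daily
)
group by key, attribute1, grp;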
Given data volumes this will take a while, but should be achievable.
I have a situation where a large amount of data (9+ billion rows per day) is being collected in a loading table that has fields like
-TABLE loader
first_seen,request,type,response,hits
1232036346,mydomain.com,A,203.11.12.1,200
1332036546,ogm.com,A,103.13.12.1,600
1432039646,mydomain.com,A,203.11.12.1,30
that needs to be split into two tables (de-duplicated):
-TABLE final
request,type,response,hitcount,id
mydomain.com,A,203.11.12.1,230,1
ogm.com,A,103.13.12.1,600,2
and
-TABLE timestamps
id,times_seen
1,1232036346
2,1432036546
1,1432039646
I can create the schemas and do a select like
select request, type, response, sum(hits) as hitcount from loader group by request, type, response;
to get data into the final table. For best performance, I want to see if I can use "insert all" to move data from the loader into these two tables, and perhaps use triggers in the database to achieve this. Any ideas and recommendations on the best ways to solve this?
"9+ billion per day"
That's more than just a large number of rows: that's a huge number, and it will require special engineering to handle it.
For starters, you don't just need INSERT statements. The requirement to maintain the count for existing (request,type,response) tuples points to UPDATE too. The need to generate and return a synthetic key is problematic in this scenario. It rules out MERGE, the easiest way of implementing upserts (because the MERGE syntax doesn't support the RETURNING clause).
Beyond that, attempting to handle nine billion rows in a single transaction is a bad idea. How long will it take to process? What happens if it fails halfway through? You need to define a more granular unit of work.
That raises some business issues, though. Do the users only want to see the whole picture, after the Close-Of-Day? Or would they derive benefit from seeing intra-day results? If yes, how do we distinguish intra-day from Close-Of-Day results? If no, how do we hide partially processed results whilst the rest is still in flight? Also, how soon after Close-Of-Day do they want to see those totals?
Then there are the architectural considerations. These figures mean processing over one hundred thousand (one lakh) rows every second. That requires serious crunch and expensive licensing extras: obviously Enterprise Edition for parallel processing, but also the Partitioning and perhaps RAC options.
By now you should have an inkling why nobody answered your question straight-away. This is a consultancy gig not a StackOverflow question.
But let's sketch a solution.
We must have continuous processing of incoming raw data. So we stream records for loading into FINAL and TIMESTAMP tables alongside the LOADER table, which becomes an audit of the raw data (or else perhaps we get rid of the LOADER table altogether).
We need to batch the incoming records to leverage set-based operations. Depending on the synthetic key implementation we should aim for pure SQL, otherwise Bulk PL/SQL.
Keeping the thing going is vital so we need to pay attention to Bulk Error Handling.
Ideally the target tables can be partitioned, so we can load into offline tables and use Partition Exchange to bring the cleaned data online.
For the synthetic key I would be tempted to use a hash key based on the (request,type,response) tuple rather than a sequence, as that would give us the option to load TIMESTAMP and FINAL independently. (Collisions are extremely unlikely.)
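A sketch of that hash-keyed load, assuming a per-batch staging table LOADER_BATCH and Oracle 12c's STANDARD_HASH (the ID columns would then be RAW rather than the sequence numbers shown in the question); note that with a hash key there is no need for RETURNING, so MERGE becomes viable here:
insert into timestamps (id, times_seen)
select standard_hash(request || '|' || type || '|' || response, 'MD5'),
       first_seen
from loader_batch;
merge into final f
using (
  select request, type, response,
         sum(hits) as hitcount,
         standard_hash(request || '|' || type || '|' || response, 'MD5') as id
  from loader_batch
  group by request, type, response
) s
on (f.id = s.id)
when matched then
  update set f.hitcount = f.hitcount + s.hitcount
when not matched then
  insert (request, type, response, hitcount, id)
  values (s.request, s.type, s.response, s.hitcount, s.id);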
Just to be clear, this is a bagatelle not a serious architecture. You need to experiment and benchmark various approaches against realistic volumes of data on Production-equivalent hardware.
If we use Oracle Row Level Security (RLS) to hide some records, are there any performance implications - will it slow down my SQL queries? The Oracle package for this is DBMS_RLS.
I plan to add an IS_HISTORICAL=T/F column to some tables, and then use RLS to hide the records which have IS_HISTORICAL=T.
The SQL Queries we use in application are quite complex, with inner/outer joins, subqueries, correlated subqueries etc.
Of the 200-odd tables, about 50 of them will have this RLS policy (to hide records by IS_HISTORICAL=T) applied on them. The remaining 150 tables are child tables of these 50, so RLS is implicit on them.
Any License implications?
Thanks.
"Are there any Performance Implications - will it slow down my SQL
Queries? "
As with all questions relating to performance the answer is, "it depends". RLS works by wrapping the controlled query in an outer query which applies the policy function as a WHERE clause...
select /*+ rls query */ * from (
select /*+ your query */ ... from t23
where whatever = 42 )
where rls_policy.function_t23 = 'true'
So the performance implications rest entirely on what goes in the function.
The normal way of doing these things is to use context namespaces. These are predefined areas of session memory accessed through the SYS_CONTEXT() function. As such the cost of retrieving a stored value from a context is negligible. And as we would normally populate the namespaces once per session - say by an after logon trigger or a similar connection hook - the overall cost per query is trivial. There are different ways of refreshing the namespace which might have performance implications but again these are trivial in the overall scheme of things (see this other answer).
So the performance impact depends on what your function actually does. Which brings us to a consideration of your actual policy:
"this RLS Policy (to hide records by IS_HISTORICAL=T)"
The good news is the execution of such a function is unlikely to be costly in itself. The bad news is the performance may still be Teh Suck! anyway, if the ratio of live records to historical records is unfavourable. You will probably end up retrieving all the records and then filtering out the historical ones. The optimizer might push the RLS predicate into the main query but I think it's unlikely because of the way RLS works: it avoids revealing the criteria of the policy to the general gaze (which makes debugging RLS operations a real PITN).
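For reference, a minimal sketch of the mechanism (hypothetical schema and object names; the policy function simply returns the predicate string RLS appends):
create or replace function hide_historical(
  p_schema in varchar2,
  p_object in varchar2
) return varchar2
as
begin
  return 'is_historical = ''F''';
end;
/
begin
  dbms_rls.add_policy(
    object_schema   => 'APP',
    object_name     => 'SOME_TABLE',
    policy_name     => 'hide_historical_rows',
    function_schema => 'APP',
    policy_function => 'HIDE_HISTORICAL',
    statement_types => 'SELECT');
end;
/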
Your users will pay the price of your poor design decision. It is much better to have journalling or history tables to store old records and keep only live data in the real tables. Retaining historical records alongside live ones is rarely a solution which scales.
"Any License implications?"
DBMS_RLS requires an Enterprise Edition license.
I am developing an enterprise application with an Oracle backend. I am designing a core part of the DB architecture now and I'm having some questions about it.
First and most important: most of my tables need to preserve old data. For example:
Consider a table with the fields
Contract No, Contract Name, Contract Person, Contract Email
I have a record like
12, xxx, yyy, xxx#zzz.ccc
and someone modifies it to
12, xxx, zzz, xxx#zzz.ccc
At any point in time I need to display the new record while still having a copy of the old record.
So what I thought was to insert a duplicate of the old record, update the fields that were changed, and have a flag to keep track of active records, with something like "is active" = 1.
The downside is that this creates redundancy in the table and seems like bad design. But any other model seems unnecessarily complex, and this seems cleaner to me. Also, I don't see any performance issues with having a duplicate record. So please let me know if this is OK, or am I missing something here?
Sometimes, where there is a one-to-many relationship, my assumption is to have a mapping table where I map the multiple entities in individual records by repeating the master ID and changing the child ID in each record. Is this the right way to do it, or is there a better way?
Is there a book on database best practices?
Thanks.
The database I'm dealing with is Oracle 11g on a two-node RAC cluster.
Also, I don't see any performance issues with having a duplicate record.
Assume you have a row that, over time, has 15 updates to it. If you don't store any temporal data (if you don't store different versions of the row), you end up storing one row. If you do store temporal data, you end up storing 15 rows.
You also need more indexes, because the id number is no longer sufficient to identify a single row.
If you have only relatively small tables, you probably won't see any performance difference. (There will be one, but it probably won't be noticeable to users.) But a table that has 10 million rows will perform differently than a table that has 150 million rows. (15 versions per row, times 10 million rows.)
Sometimes, where there is a one-to-many relationship, my assumption is to have a mapping table where I map the multiple entities in individual records by repeating the master ID and changing the child ID in each record. Is this the right way to do it, or is there a better way?
You probably need to know which child rows belong to which parent rows. So you need more than a single master id for the key. The master id alone doesn't tell you which version of that row in the parent table applies to a given child row.
Is there a book on database best practices?
There are books on temporal databases. The first one that I know of is Snodgrass's Developing Time-Oriented Database Applications in SQL. It's available in several formats, and it's free. It's also kind of old, but the information in it is important to understand if you're going to be building a temporal database. Also, think about reading Date's book Temporal Data and the Relational Model.
Wikipedia has an article that summarizes the ideas behind temporal databases.
Is normalization completely mandatory?
That's a meaningless question. You will have different issues with tables normalized to 2NF than you'll have with tables normalized to 5NF or 6NF.
I would keep the old/history records in a separate table. Create an upd/del trigger to populate your audit/history table for you, and keep only the most current data in your main table.
See here for an example. Many other similar examples exist on SO.
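A minimal sketch of such a trigger, using the contract table from this question (the history table and its audit columns are assumed):
create or replace trigger contracts_hist_trg
before update or delete on contracts
for each row
begin
  insert into contracts_history
    (contract_no, contract_name, contract_person, contract_email,
     changed_on, change_type)
  values
    (:old.contract_no, :old.contract_name, :old.contract_person,
     :old.contract_email, systimestamp,
     case when updating then 'U' else 'D' end);
end;
/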