Advantages of temporary tables in Oracle - oracle

I've tried to figure out which performance impacts the use of temporary tables has on an Oracle database. We want to use these tables in our ETL process to save temporary results. At this time we are using physical tables for this purpose and truncating this tables at the beginning of the ETL process. I know that the truncate process is very expensive and therefore I thought if it would be better to use temporary tables instead.
Have anyone of you experiences if there is a performance boost by using temporary tables in this scenario?
There were only some answers on this question regarding to the SQL Server like in this question. But I don't know if these recommendations also fit for the Oracle db.
It would be nice if anyone could list the advantages and disadvanteges of this feature and also point out in which scenarios this feature could be applicable.
Thanks in advance.

First of all: truncate is not expensive, a delete with no condition is very expensive.
Second: do your temporary table have indexes? What about external keys?
That could affect performance.
The temporary table works more or less like Sql Server (of course the syntax is different, like global temporary table), and both are just table.
You won't get any performance gain with temporary tables against normal table, they are just the same: they have a definition on DB, can have indexes, and are logged.
The only difference is that temporary table are exclusive to your session (except for global table) and that means if multiple scripts from multiple sessions refer to the same table, every one is reading/writing a different table and they cannot locking each other (in this case you could gain performance, but I think it's rarely the case).

Related

oracle schema sharing , is it possible?

Trying to understand if there is any such concept like this in Oracle Database.
Let's say I have two Databases, Database_A & Database_B
Database_A has schema_A, is there a way I can attach this schema to Database_B?
What I mean by this is if there is a job populating a TABLE_A in schema_A, I can see that read-only view in Database_B. We are trying to split a big Oracle database into two smaller databases and have a vast PL/SQL code, and trying to minimize the refactoring here.
Sharding might be what you're looking for. The schemas and tables will still logically exist on all databases, but you can arrange the data to be physically stored in specific databases. There might be a way to setup shardspaces, tablespaces, and user default tablespaces in a way where each schema's data is automatically stored in a specific database.
But I haven't actually used sharding. From what I've read, it seems to be designed for massive distributed OLTP systems, and it is likely complicated to administer. I'd guess this feature isn't worth the hassle unless you have petabytes of data.

Hive Managed vs External tables maintainability

Which one is better (performance wise and operation on the long run) in maintaining data loaded, managed or external?
And by maintaining, i mean that these tables will have the following operations on daily basis frequently;
Select using partitions most of the time.. but for some of it they are not used.
Delete specific records, not all the partition (for example found a problem in some columns and want to delete and insert it again). - i am not sure if this supported for normal tables, unless transactional is used.
Most important, The need to merge files frequently.. may be twice a day to merge small files to gain less mappers. I know concate is available on managed and insert overwrite on external.. which one is less cost?
It depends on your use case. External table is recommended when they are used across multiple application for example Along with hive pig or other application is also used for processing the data in this kind of scenario external tables are mainly recommended.They are used when you are mainly reading data.
While in case of managed tables hive have complete control over the data. Though you can convert any external table to managed and vice versa
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
As in your case you are doing frequent modifications in data so it is better that hive should have total control over the data. In this scenraio it is recommended to use Managed tables.
Apart from that managed table are more secure then external table because external table can be accessed by anyone. While in managed table you can implement hive level security which provided better control but in case of external you will have to implement HDFS level security.
You can refer the below links which can give you few pointers in considerations
External Vs Managed tables comparison

Benefits of External Tables vs. UTL_FILE

I am writing an application in PL/SQL that takes a .csv flat-file, reads it, does some data processing on it, and then decides which of several tables to update, insert into, or delete.
I have the option of using the UTL_FILE.GET_LINE functionality to process a single record at a time, parsing it with various REGEX tools, storing the data temporarily in some variables, and then doing work with it (making decisions, updating tables, etc.)
I ALSO have the option, of creating an External table, and then just stepping through it using a cursor on said external table (using a for each loop for performance) I should still be able to do all of the same things with the data(making decisions, updating tables, etc.)
I have looked around, and a couple of forums suggest that External Tables are the preferred solution to this, as they scale better, are faster, and more reliable. I have not, however, heard a why. Oracles documentation on utl_file and/or external tables does not talk about why one might be faster than the other, so I'm curious if anyone has some more information or references that I do not about what would make one perform better over the other.
The performance difference is quite simple: UTL_FILE is a PL/SQL package, while external tables use the SQL*Loader code written in C.
If you have enough data, you can even load external tables in parallel with minimal effort f.i. ALTER TABLE my_external_table PARALLEL 4;
External tables can be used in bulk mode (INSERT INTO my_table SELECT ... FROM my_external_table JOIN my_lookup_table USING (lookup_column)).
External tables can be set to transactionally safe mode (REJECT LIMIT 0), so the above INSERT either works or rolls back.
Do you need more reasons?
If the file has data that has a known structure/file format then external table is the way to go. UTL_FILE is at a different abstraction level - you are now just working with a file - your use of UTL_FILE will be brittle and likely introduce bugs. The deciding factor should not be performance; however I doubt you will be able to 'outperform' Oracle's external table implementation by rolling your own using REGEX and UTL_FILE.

Doesn't Read-Only make a difference for SQL Server?

I’ve been tasked with optimizing a rather nasty stored procedure in a legacy system. It’s a database dedicated to search, and a new copy is being generate every day, with a lot of complex joins being de-normalized. No writes are being performed, only SELECTs, so I figured some easy improvements could be made by making the whole database read-only and changing the recovery model to “Simple”.
Much to my surprise, this didn’t help – at all! The stored procedure still takes the same amount of time of complete. If fact, I’m so surprised that I figured I did it wrong!
My questions:
Do I need to do anything other than setting “Database read-only” to “true”?
Am I wrong to expect significant performance improvement by making the database read-only?
Same for the recovery model: Shouldn’t “Simple” have some noticeable impact?
Are there other similar database-wide configurations that can improve performance in this scenario?
The stored procedure is huge, with temporary tables, 40+ tables joined in 20+ queries. But I’d like to optimize the database itself before I edit this proc.
Since no writes are performed by your SP, there is no reason to expect noticable performance improvement from changing recovery model and read-write mode.
As others mentioned, you should look into the query plan and optimize your queries.
Another hint: indexes in the database might get fragmented while the database is filled up. Since the data is not going to be modified any more, it might help to rebuild all the indexes with fillfactor 100 - this might help to get rid of fragmentation and to compact data.
Call this for each table in the database: ALTER INDEX ALL ON table_name REBUILD WITH (FILLFACTOR = 100).
Generally, I won't expect much of performance improvement from this, but it depends on the particular database.
Speaking of query optimization, there are very useful features in SQL Server 2005 and later: Execution Related and Index-Related Dynamic Management Views. In particular, sys.dm_exec_query_stats and missing indexes are of interest.
These give you almost the same information as Tuning Advisor, but using you real-life workload, so you don't need to simulate it and feed to the Advisor.
Have you tried using the Database Engine Tuning Advisor included in SQL Server? It will analyze your query and suggest new indexes that will improve the performance of the query. Some of them will be good, some will be bad (for example, I've seen it suggest adding every column in a table to an index, sometimes like 30 of them!), so I don't follow it blindly. Generally I'll add a few indexes and then retest, to find the suggestions that are the most important. I've used it to optimize many queries that I thought I had properly indexed, only to find I could get a lot more performance out of them.
I had a similar setup, large stored procedures with lots of large temp tables.
Our problem was that the joins with and between the temp tables was very slow.
I recommend that you look at your execution plan and try to add relevant indexes to the temp tables too if you have not already.

Slow Performance on Sql Express after inserting big chunks of data

We have noticed that our queries are running slower on databases that had big chunks of data added (bulk insert) when compared with databases that had the data added on record per record basis, but with similar amounts of data.
We use Sql 2005 Express and we tried reindexing all indexes without any better results.
Do you know of some kind of structural problem on the database that can be caused by inserting data in big chunks instead of one by one?
Thanks
One tip I've seen is to turn off Auto-create stats and Auto-update stats before doing the bulk insert:
ALTER DATABASE databasename SET AUTO_CREATE_STATISTICS OFF WITH NO_WAIT
ALTER DATABASE databasename SET AUTO_UPDATE_STATISTICS OFF WITH NO_WAIT
Afterwards, manually creating statistics by one of 2 methods:
--generate statistics quickly using a sample of data from the table
exec sp_createstats
or
--generate statistics using a full scan of the table
exec sp_createstats #fullscan = 'fullscan'
You should probably also turn Auto-create and Auto-update stats back on when you're done.
Another option is to check and defrag the indexes after a bulk insert. Check out Pinal Dave's blog post.
Probably SQL Server allocated new disk space in many small chunks. When doing big transactions, it's better to pre-allocate much space in both the data and log files.
That's an interesting question.
I would have guessed that Express and non-Express have the same storage layout, so when you're Googling for other people with similar problems, don't restrict yourself to Googling for problems in the Express version. On the other hand though, bulk insert is a common-place operation and performance is important, so I wouldn't consider it likely that this is a previously-undetected bug.
One obvious question: which is the clustered index? Is the clustered index also the primary key? Is the primary key unassigned when you insert, and therefore initialized by the database? If so then maybe there's a difference (between the two insert methods) in the pattern or sequence of successive values assigned by the database, which affects the way in which the data is clustered, which then affects performance.
Something else: as well as indexes, people say that SQL uses statistics (which it created as a result of runing previous queries) to optimize its execution plan. I don't know any details of that, but as well as "reindexing all indexes", check the execution plans of your queries in the two test cases to ensure that the plans are identical (and/or check the associated statistics).

Resources