External Tables vs SQLLoader - oracle

So, I often have to load data into holding tables to run some data validation checks and then return the results.
Normally, I create the holding table, then a sqlldr control file and load the data into the table, then I run my queries.
Is there any reason I should be using external tables for this instead?
In what way will they make my life easier?

The big advantage of external tables is that we can query them from inside the database using SQL. So we can just run the validation checks as SELECT statements without the need for a holding table. Similarly if we need to do some manipulation of the loaded data it is almost always easier to do this with SQL rather than SQLLDR commands. We can also manage data loads with DBMS_JOB/DBMS_SCHEDULER routines, which further cuts down the need for shell scripts and cron jobs.
However, if you already have a mature and stable process using SQLLDR then I concede it is unlikely you would realise tremendous benefits from porting to external tables.
There are also some cases - especially if you are loading millions of rows - where the SQLLDR approach may be considerably faster. However, the difference will not be as marked with more recent versions of the database. I fully expect that SQLLDR will eventually be deprecated in favour of external tables.
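For illustration, a minimal sketch of the validate-with-SQL approach (the directory object, table and column names here are invented):

-- external table over the flat file; DATA_DIR is an assumed directory object
CREATE TABLE customers_ext (
  cust_id   NUMBER,
  cust_name VARCHAR2(100),
  join_date DATE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (cust_id, cust_name, join_date CHAR(10) DATE_FORMAT DATE MASK "YYYY-MM-DD")
  )
  LOCATION ('customers.csv')
)
REJECT LIMIT UNLIMITED;

-- validation check runs straight against the file, no holding table needed
SELECT cust_id, cust_name
FROM   customers_ext
WHERE  join_date > SYSDATE OR cust_id IS NULL;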

If you look at the External Table syntax, it looks suspiciously like SQL*Loader control file syntax :-)
If your external table is going to be repeatedly used in multiple queries it might be faster to load a table (as you're doing now) rather than rescan your external table for each query. As #APC notes, Oracle is making improvements in them, so depending on your DB version YMMV.

I would use external tables for their flexibility.
It's easier to modify the data source on them to point at a different file: alter table ... location ('my_file.txt1','myfile.txt2') (see the sketch after this list)
You can do multitable inserts, merges, run it through a pipelined function etc...
Parallel query is easier ...
It also establishes dependencies better ...
The code is stored in the database so it's automatically backed up ...
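A quick sketch of a couple of these points, reusing the invented customers_ext table from above (file names are placeholders):

-- repoint the same external table at tomorrow's files
ALTER TABLE customers_ext LOCATION ('my_file.txt1', 'my_file.txt2');

-- read the flat file in parallel with a plain hint
SELECT /*+ PARALLEL(c, 4) */ COUNT(*)
FROM   customers_ext c;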

Another thing that you can do with external tables is read compressed files. If your files are gzip compressed for example, then you can use the PREPROCESSOR directive within your external table definition, to decompress the files as they are read.
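A rough sketch of that (data_dir, exec_dir and the file name are assumptions; exec_dir must be a directory object pointing at the directory that holds the zcat binary, and you need EXECUTE on it):

-- the PREPROCESSOR program decompresses each file as it is read
CREATE TABLE sales_ext (
  sale_id NUMBER,
  amount  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR exec_dir:'zcat'
    FIELDS TERMINATED BY ','
  )
  LOCATION ('sales.csv.gz')
);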

Related

Hive Managed vs External tables maintainability

Which one is better (performance-wise and operationally in the long run) for maintaining loaded data, managed or external?
And by maintaining, I mean that these tables will frequently have the following operations on a daily basis:
Select using partitions most of the time, but for some queries partitions are not used.
Delete specific records, not the whole partition (for example, a problem is found in some columns and we want to delete and re-insert them). I am not sure if this is supported for normal tables, unless transactional tables are used.
Most important, the need to merge files frequently, maybe twice a day, so that the small files are combined and fewer mappers are needed. I know CONCATENATE is available on managed tables and INSERT OVERWRITE on external ones; which one costs less?
It depends on your use case. External tables are recommended when they are used across multiple applications, for example when Pig or another tool is used alongside Hive to process the same data; in that kind of scenario external tables are mainly recommended. They are used when you are mainly reading data.
With managed tables, on the other hand, Hive has complete control over the data. You can convert any external table to managed and vice versa:
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
As in your case you are making frequent modifications to the data, it is better that Hive has total control over it. In this scenario it is recommended to use managed tables.
Apart from that, managed tables are more secure than external tables, because external tables can be accessed by anyone with access to the underlying files. With managed tables you can implement Hive-level security, which provides better control, whereas with external tables you have to implement HDFS-level security.
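As a rough sketch of that route (table and partition names are made up, and CONCATENATE assumes an ORC or RCFile storage format):

-- take ownership of the data by converting the table to managed
ALTER TABLE sales_data SET TBLPROPERTIES('EXTERNAL'='FALSE');

-- merge the small files in one partition to cut the number of mappers
ALTER TABLE sales_data PARTITION (load_date='2019-01-01') CONCATENATE;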
You can refer to the links below, which may give you a few pointers to consider:
External Vs Managed tables comparison

How can I load a large amount of data into an Oracle database from a .csv file without risking dropping or mismatching data?

I’m in the middle of trying to migrate a large amount of data into an Oracle database from existing Excel files.
Due to the large number of rows loaded every time (10 000 and more), it is not possible to use SQL Developer for this task.
In every worksheet there’s data that needs to go into different tables, while at the same time keeping the relations and not dropping any data.
As of now, I use one .CSV file for each table and map them together afterwards. This, though, carries a great risk of adding the wrong FK and thereby screwing the whole thing up. And I don’t have the time, energy or will for clean-ups, even if it is my own mess…
My initial thought was that I could bulk transfer with SQL*Loader using some kind of PL/SQL script, maybe in a ctl-file (the one used for mapping the properties), but it seems like I’m quite out in the bush with that one… (or am I…?)
The other thought was to create a simple program in C# and use FastMember and load the database that way. (But that means I need to take the time to actually write the program, however small it is.)
I can’t possibly be the only one that has had this issue, but trying to use my notToElevatedNinjaGoogling skills ends up with either using SQL Developer (which is not an alternative) or the bulk copy thing from SQL*Loader (where I need to map it all together afterwards).
Are there any alternative solutions to my problem, or are the above solutions the ones I need to cope with?
Did you consider using CSV files as external tables? As they act as if they were ordinary Oracle tables, you can write (PL/)SQL against them, inserting data into different tables in the target schema. That might give you some more freedom & control over what you are doing.
Behind the scene, it is still SQL*Loader.
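A hedged sketch of what that could look like (all table, column and file names are invented; the join resolves the foreign key instead of mapping it by hand):

CREATE TABLE orders_ext (
  order_ref    VARCHAR2(30),
  customer_ref VARCHAR2(30),
  amount       NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('orders.csv')
)
REJECT LIMIT UNLIMITED;

-- load children after parents, looking the FK up with SQL
INSERT INTO orders (order_ref, customer_id, amount)
SELECT e.order_ref, c.customer_id, e.amount
FROM   orders_ext e
JOIN   customers c ON c.customer_ref = e.customer_ref;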

Advantages of temporary tables in Oracle

I've tried to figure out which performance impact the use of temporary tables has on an Oracle database. We want to use these tables in our ETL process to save temporary results. At the moment we are using physical tables for this purpose and truncating these tables at the beginning of the ETL process. I know that the truncate process is very expensive and therefore I wondered whether it would be better to use temporary tables instead.
Does anyone of you have experience of whether there is a performance boost from using temporary tables in this scenario?
There were only some answers on this question regarding SQL Server, like in this question. But I don't know if these recommendations also apply to the Oracle db.
It would be nice if anyone could list the advantages and disadvantages of this feature and also point out in which scenarios it could be applicable.
Thanks in advance.
First of all: truncate is not expensive; a delete with no condition is very expensive.
Second: does your temporary table have indexes? What about foreign keys?
That could affect performance.
Temporary tables work more or less like in SQL Server (of course the syntax is different, e.g. global temporary table), and both are just tables.
You won't get any performance gain with temporary tables over normal tables; they are just the same: they have a definition in the DB, can have indexes, and are logged.
The only difference is that temporary tables are exclusive to your session (except for global tables), which means that if multiple scripts from multiple sessions refer to the same table, each one is reading/writing a different table and they cannot lock each other (in this case you could gain performance, but I think that's rarely the case).
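For reference, a minimal sketch of the Oracle variant being discussed (names invented); the definition is permanent, but the rows are private to the session:

CREATE GLOBAL TEMPORARY TABLE etl_scratch (
  id      NUMBER,
  payload VARCHAR2(4000)
) ON COMMIT PRESERVE ROWS;  -- use ON COMMIT DELETE ROWS to empty it at every commit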

Benefits of External Tables vs. UTL_FILE

I am writing an application in PL/SQL that takes a .csv flat-file, reads it, does some data processing on it, and then decides which of several tables to update, insert into, or delete.
I have the option of using the UTL_FILE.GET_LINE functionality to process a single record at a time, parsing it with various REGEX tools, storing the data temporarily in some variables, and then doing work with it (making decisions, updating tables, etc.)
I ALSO have the option, of creating an External table, and then just stepping through it using a cursor on said external table (using a for each loop for performance) I should still be able to do all of the same things with the data(making decisions, updating tables, etc.)
I have looked around, and a couple of forums suggest that External Tables are the preferred solution to this, as they scale better, are faster, and more reliable. I have not, however, heard a why. Oracle's documentation on UTL_FILE and/or external tables does not talk about why one might be faster than the other, so I'm curious if anyone has more information or references that I do not know about regarding what would make one perform better than the other.
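For context, the first option would be roughly this shape (the directory object and file name are assumptions):

DECLARE
  l_file UTL_FILE.FILE_TYPE;
  l_line VARCHAR2(32767);
BEGIN
  l_file := UTL_FILE.FOPEN('DATA_DIR', 'input.csv', 'R');
  LOOP
    BEGIN
      UTL_FILE.GET_LINE(l_file, l_line);
    EXCEPTION
      WHEN NO_DATA_FOUND THEN EXIT;  -- end of file
    END;
    -- parse l_line with REGEXP_SUBSTR etc., then decide which tables to touch
  END LOOP;
  UTL_FILE.FCLOSE(l_file);
END;
/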
The performance difference is quite simple: UTL_FILE is a PL/SQL package, while external tables use the SQL*Loader code written in C.
If you have enough data, you can even load external tables in parallel with minimal effort, e.g. ALTER TABLE my_external_table PARALLEL 4;
External tables can be used in bulk mode (INSERT INTO my_table SELECT ... FROM my_external_table JOIN my_lookup_table USING (lookup_column)).
External tables can be set to transactionally safe mode (REJECT LIMIT 0), so the above INSERT either works or rolls back.
Do you need more reasons?
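A quick sketch tying those points together (table and column names are placeholders):

ALTER TABLE my_external_table PARALLEL 4;      -- parallel scan of the flat file
ALTER TABLE my_external_table REJECT LIMIT 0;  -- any bad record fails the whole load

INSERT /*+ APPEND */ INTO my_table (col1, col2, col3)
SELECT e.col1, e.col2, l.col3
FROM   my_external_table e
JOIN   my_lookup_table l ON l.lookup_column = e.lookup_column;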
If the file has data that has a known structure/file format then external table is the way to go. UTL_FILE is at a different abstraction level - you are now just working with a file - your use of UTL_FILE will be brittle and likely introduce bugs. The deciding factor should not be performance; however I doubt you will be able to 'outperform' Oracle's external table implementation by rolling your own using REGEX and UTL_FILE.

How can I use Oracle Preprocessor for External Tables to consume this type of format?

Suppose I have a custom file format, which can be analogous to N tables. Let's pick 3. I could transform the file, writing a custom load wrapper to fill 3 database tables.
But suppose for space and resource constraints, I can't store all of this in the tablespace.
Can I use Oracle Preprocessor for External Tables to transform the custom file three different ways?
The examples of use I have read give gzip'd text files as an example. But this is a one-to-one file-to-table relationship, with only one transform.
I have a single file with N possible extractions of data.
Would I need to define N external tables, each referencing a different program?
If I map three tables to the same file, how will this affect performance? (Access is mostly or all reads, few or no writes).
Also, what format does the standard output of my preprocessor have to be? Must it be CSV, or are there ways to configure the external table driver?
"If I map three tables to the same
file, how will this affect
performance? (Access is mostly or all
reads, few or no writes"
There should be little or no difference between three sessions accessing the same file through one external table definition or three external table definitions.
External tables aren't cached by the database (might be by the file system or disk), so any access is purely physical reads.
Depending on the pre-processor program, there might be some level of serialization there (or you may use a pre-processor program to impose serialization).
Performance-wise, you'd be better off having a single session scan the external file/table and load it into one or more database tables. The other sessions read it from there, and it is cached in the SGA. Also, you can index a database table so you don't have to read all of it.
You may be able to use multi-table inserts to load multiple database tables from a single external table definition in a single pass.
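Something along these lines, with invented names and a rec_type field assumed to exist in the file:

INSERT ALL
  WHEN rec_type = 'A' THEN INTO table_a (id, val) VALUES (id, val)
  WHEN rec_type = 'B' THEN INTO table_b (id, val) VALUES (id, val)
  WHEN rec_type = 'C' THEN INTO table_c (id, val) VALUES (id, val)
SELECT rec_type, id, val
FROM   my_external_table;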
"what format does the standard output
of my preprocessor have to be? Must it
be CSV, or are there ways to configure
the external table driver?"
It pretty much follows SQL*Loader, and both are in the Utilities manual. You can use fixed format or other delimiters.
Would I need to define N external tables, each referencing a different program?
Depends on how the data is interleaved. Ignoring pre-processors, you can have different external tables pulling different columns from the same file, or you can use the LOAD WHEN clause to determine which records to include or exclude.
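A hedged example of the LOAD WHEN approach (all names invented; here the first character of each record marks the record type):

CREATE TABLE type_a_ext (
  rec_type CHAR(1),
  id       NUMBER,
  val      VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    LOAD WHEN ((1:1) = 'A')
    FIELDS TERMINATED BY ','
  )
  LOCATION ('multi_format.dat')
);

-- a second external table over the same file would use LOAD WHEN ((1:1) = 'B'), and so on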
