I have read many posts compaing External table with sqlloader and the main advantage is optimizing the select query with many options available in SQL for the external table. But i am finding it difficult to do selects on large files(1.5 GB). Just for a select count(*) itself it takes minutes to perform.
My plan is to generate a report based on this data by doing a number of select statements from this data. I wonder if this is a better idea compared to loading the data to an internal table.
I assume the ideal use of External table would be to do SELECT on the file to perform cleanup and Load to an internal table more efficiently. It is not meant to use the file as a table for a longer duration(Especially for large files). Please correct if i am wrong.
If you're going to execute multiple select on data from big file it is much better to load it to some internal staging table (either by SQLoader or by external table and insert as select) and then perform queries.
You should probably consider creating some indexes on table to speed up your queries.
Related
In db2wh,
One of our tasks is to look for candidate alternatives for INSERT FROM master SELECT * FROM staging and dbload may be the one.
Comparing the elapsed time of INSERT and dbload from a same local CSV file, dbload is a little bit faster than INSERT but is nearly the same.
Question is:
As internal implementation, is dbload same as INSERT? What is dbload advantage compared to INSERT OR which one is better to use for laoding data?
dbload uses Db2 Warehouse EXTERNAL TABLEs to get data into Db2. INSERTs from EXTERNAL TABLEs are the same as INSERTs from SELECTs in many respects. They use much of the same internal processing within Db2.
Generally speaking, once you have got your data into the database (i.e. into Staging), you are better off leaving it in the database, rather than exporting it, then re-importing it again.
In short, stick to INSERT FROM SELECT.
I have an external hive table and I would like to refresh the data files on a daily basis. What is the recommended way to do this?
If I just overwrite the files, and if we are unlucky enough to have some other hive queries to execute in parallel against this table, what will happen to those queries? Will they just fail? Or will my HDFS operations fail? Or will they block until the queries complete?
If availability is a concern and space isn't an issue, you can do the following:
Make a synonym for the external table. Make sure all queries use this synonym when accessing the table.
When loading new data, load it to a new table with a different name.
When the load is complete, point the synonym to the newly loaded table.
After an appropriate length of time (long enough for any running queries to finish), drop the previous table.
First of all.. if you are accessing any table it may have two types of locks:
exclusive(if data is getting added) and shared(if data is getting read)..
so if you insert overwrite and add data into the table then at that time if you access the table with other queries, they wont get executed because there will be an exclusive lock on it and once the insert overwrite query completes then you may access the table.
Please refer to the following link:
https://cwiki.apache.org/confluence/display/Hive/Locking
This is known to us that all DML statement has been supported by Oracle Regular Table but not the same for External Table? I tried below :
SQL> INSERT INTO xtern_empl_rpt VALUES ('70','Rakshit','Nantu','4587966214','na
tu.rakshit#ge.com','55');
INSERT INTO xtern_empl_rpt VALUES ('70','Rakshit','Nantu','4587966214','natu.ra
kshit#ge.com','55')
*
ERROR at line 1:
ORA-30657: operation not supported on external organized table
SQL> update xtern_empl_rpt set FIRST_NAME='Arup' where SSN='896743856';
update xtern_empl_rpt set FIRST_NAME='Arup' where SSN='896743856'
*
ERROR at line 1:
ORA-30657: operation not supported on external organized table
SQL>
So it seems External table not support this. But my question is - what the logical reason behind this design?
There is no mechanism in Oracle for locking rows in external tables, none of the concurrency controls which we get with regular heap tables. So updating is not allowed.
External tables created with the Oracle Loader driver are read only; the Datapump driver allows us to write to external table files but only in an CTAS mode.
The problem is that eternal tables are basically windows on OS files, without the layer of abstraction and control that internal tables offer. Basically, there is no way for the database to lock a record in an OS file, because the notion of a "record" is a databse thang, not an OS file thang.
External tables are designed for only one thing: data loading and unloading. They are simply not meant to be used with normal DML, and they're not really meant for normal selects either - that works, but if you need to do a lot of selections on an external table, you're "doing it wrong": load the data into proper tables, calculate statistics & add indexes as necessary.
Having external tables behave like normal tables would need that all the transactional machinery be implemented for them, which is very complex, and not worth it since that's not what they are meant for.
If you need normal tables and want to transplant them from one Oracle database to another, you should evaluate using transportable tablespaces too.
Limitations of external table are an obvious consequence of their being read-only; they are an adapter to involve in SQL queries either arbitrary record-organized files (ORACLE_LOADER type) or exported copies of tables in another database (ORACLE_DATAPUMP type).
As already mentioned, external tables are only good for full table scan queries; if one needs to use indexes in heavy duty queries or to modify foreign data sets that have been imported from files, regular tables can be populated using the SQL Loader tool.
I am going to create a lot of data scripts such as INSERT INTO and UPDATE
There will be 100,000 plus records if not 1,000,000
What is the best way to get this data into Oracle quickly? I have already found that SQL Loader is not good for this as it does not update individual rows.
Thanks
UPDATE: I will be writing an application to do this in C#
Load the records in a stage table via SQL*Loader. Then use bulk operations:
INSERT INTO SELECT (for example "Bulk Insert into Oracle database")
mass UPDATE ("Oracle - Update statement with inner join")
or a single MERGE statement
To keep It as fast as possible I would keep it all in the database.
Use external tables (to allow Oracle to read the file contents),
and create a stored procedure to do the processing.
The update could be slow, If possible, It may be a good idea to consider creating a new table based on all the records in the old (with updates) then switch the new & old tables around.
How about using a spreadsheet program like MS Excel or LibreOffice Calc? This is how I perform bulk inserts.
Prepare your data in a tabular format.
Let's say you have three columns, A (text), B (number) & C (date). In the D column, enter the following formula. Adjust accordingly.
="INSERT INTO YOUR_TABLE (COL_A, COL_B, COL_C) VALUES ('"&A1&"', "&B1&", to_date ('"&C1&"', 'mm/dd/yy'));"
I am writing a data conversion in PL/SQL that processes data and loads it into a table. According to the PL/SQL Profiler, one of the slowest parts of the conversion is the actual insert into the target table. The table has a single index.
To prepare the data for load, I populate a variable using the rowtype of the table, then insert it into the table like this:
insert into mytable values r_myRow;
It seems that I could gain performance by doing the following:
Turn logging off during the insert
Insert multiple records at once
Are these methods advisable? If so, what is the syntax?
It's much better to insert a few hundred rows at a time, using PL/SQL tables and FORALL to bind into insert statement. For details on this see here.
Also be careful with how you construct the PL/SQL tables. If at all possible, prefer to instead do all your transforms directly in SQL using "INSERT INTO t1 SELECT ..." as doing row-by-row operations in PL/SQL will still be slower than SQL.
In either case, you can also use direct-path inserts by using INSERT /*+APPEND*/, which basically bypasses the DB cache and directly allocates and writes new blocks to data files. This can also reduce the amount of logging, depending on how you use it. This also has some implications, so please read the fine manual first.
Finally, if you are truncating and rebuilding the table it may be worthwhile to first drop (or mark unusable) and later rebuild indexes.
Regular insert statements are the slowest way to get data in a table and not meant for bulk inserts. The following article references a lot of different techniques for improving performance: http://www.dba-oracle.com/oracle_tips_data_load.htm
Drop the index, then insert the rows, then re-create the index.
If dropping the index doesn't speed things up enough, you need the Oracle SQL*Loader:
http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html
Suppose you have taken eid,ename,sal,job. So create a table first as:
SQL>create table tablename(eid number, ename varchar2(20),sal number,job char(10));
Now insert data:-
SQL>insert into tablename values(&eid,'&ename',&sal,'&job');
Check this link
http://www.dba-oracle.com/t_optimize_insert_sql_performance.htm
main points to consider for your
case is to use Append hint as this
will directly append into the table
instead of using freelist. If you can afford to turn off logging than use append with nologging hint to do it
Use a bulk insert instead instead of iterating in PL/SQL
Use sqlloaded to load the data directly into the table if you are getting data from a file feed
Here are my recommendations on fast insert.
Trigger - Disable any triggers associated with a table. Enable after Inserts are complete.
Index - Drop Index and re-create it after your Inserts are complete.
Stale stats - Re-analyze table and index stats.
Index de-fragmentation - Rebuild Index if needed
Use No Logging -Insert using INSERT APPEND (Oracle only). This approach is very risky approach, no redo logs are generated therefore you can’t do a rollback - make a backup of table before you start and don't try on live tables. Check if your db has similar option
Parallel Insert: Running parallel insert will get the job faster.
Use Bulk Insert
Constraints - Not much overhead during inserts but still a good idea to check, if it is still slow after even after step 1
You can learn more on http://www.dbarepublic.com/2014/04/slow-insert.html
Maybe one of your best option is to avoid Oracle as much as possible actually.
I've been baffled by this myself, but very often a Java process can outperform many of the Oracle's utilities which either use OCI (read: SQL Plus) or will take up so much of your time to get right (read: SQL*Loader).
This doesn't prevent you to use specific hints either (like /APPEND/).
I've been pleasantly surprised each time I've turned to that kind of solution.
Cheers,
Rollo