When we are talking about parallel mode with sqlloader what does that actually mean? When i have in my script to execute:
Sqlldr control=first.ctl parallel=true direct=true data=first.unl
Sqlldr control=second.ctl parallel=true direct=true data=second.unl
I am inserting into 2 tables using as a data file for the inserts of the first table the first.unl and for the 2nd table the second.unl
By having parallel=true and direct=true will this run the 2 instances of sqlloader for first.unl and second.unl in parallel or will it run the first instance and do multiple inserts based on the first.unl and the run the second instance and do multiple inserts again based on the second.unl?
From the documentation
"PARALLEL specifies whether direct loads can operate in multiple concurrent sessions to load data into the same table."
So, one instance of SQL Loader uses multiple sessions to insert into one table. The actual degree of parallelism is governed by the usual parallelization parameters.
"So i cannot have inserting into multiple tables in parallel?"
If you kick off two SQL Loader instances they will run simultaneously. You need to be careful that you have enough CPU to handle the number of threads you're spawning.
Related
I have a scenario where there are 3 files of 5 million lines each with 3 weeks of data is bulk mapped to a staging table. They run parallelly. If the data transfer for a file fails with a concurrency error, what will be the best way to load the data of the 3 files effectively into the stage table.
(I was asked this in an interview.)
In ODI 12c, Parallel Target Table Load can be achieved by selecting the Use Unique Temporary Object Names checkbox in the Physical tab of a mapping. The work tables for each concurrent session will have a unique name.
I discuss it in more details in the Parallel Target Table Load section of this blog on ODI 12c new features
In my business case, I need insert one row and can't use batch insert . So I want to know what the throughput can made by Oracle. I try these ways:
Effective way
I use multi-thread, each thread owns one connection to insert data
I use ssd to store oracle datafile
Ineffective way
I use multi table to store data in one schema
I use table partition
I use multi schema to store data
Turn up data file block size
Use append hint in insert SQL
In the end the best TPS is 1w/s+
Other:
Oracle 11g
Single insert data size 1k
CPU i7, 64GB memory
Oracle is highly optimized for anything from one row inserts to batches of hundreds of rows. You do not mention whether you are having performance problems with this one row insert nor how long the insert takes. For such a simple operation, you don't need to worry about any of those details. If you have thousands of web-based users inserting one row into a table every minute, no problem. If you are committing your work at the appropriate time, and you don't have a huge number of indexes, a single row insert should not take more than a few milliseconds.
In SQL*Plus try the commands
set autotrace on explain statistics
set timing on
and run your insert statement.
Edit your question to include the results of the explain plan. And be sure to indent the results 4 spaces.
I have an external hive table and I would like to refresh the data files on a daily basis. What is the recommended way to do this?
If I just overwrite the files, and if we are unlucky enough to have some other hive queries to execute in parallel against this table, what will happen to those queries? Will they just fail? Or will my HDFS operations fail? Or will they block until the queries complete?
If availability is a concern and space isn't an issue, you can do the following:
Make a synonym for the external table. Make sure all queries use this synonym when accessing the table.
When loading new data, load it to a new table with a different name.
When the load is complete, point the synonym to the newly loaded table.
After an appropriate length of time (long enough for any running queries to finish), drop the previous table.
First of all.. if you are accessing any table it may have two types of locks:
exclusive(if data is getting added) and shared(if data is getting read)..
so if you insert overwrite and add data into the table then at that time if you access the table with other queries, they wont get executed because there will be an exclusive lock on it and once the insert overwrite query completes then you may access the table.
Please refer to the following link:
https://cwiki.apache.org/confluence/display/Hive/Locking
I am not an expert of oracle, I just write queries and scripts through TOAD but I don't know how oracle works.
In order to make a data analysis I have created several tables in order to build a DB from them.
In order to build them I have used the following command
CREATE TABLE SCHEMA.NAME
TABLESPACE SCHEMASPACE NOLOGGING PARALLEL AS
select [...]
with the parallel option active in order to make the server work on more processors.
Once that the DB was created I have tried to do some queries on it but the same query done on the same DB 2 times gave me two different answers. In detail the first time I started the query it returned to me a table with about 2500 rows and the second time it returned to me a table with about 2650 rows.
I have tried to redo the entire process without the parallel option and now it seems to work, but it take too much time and I need to run the process several times, does anyone have a solution?
I am writing a Java program that creates a CSV file with 6,800,000 records conforming to specific distribution parameters and populates a table using Oracle's SQL*Loader.
I am testing my program using different sizes of records (50,000 and 500.000). The CSV File generation by itself is quite fast, using concurrency it takes miliseconds to create and insert these records into a file.
Inserting said records, on the other hand, is taking too long. Reading the log file generated by SQL*Loader, it takes 00:00:32.90 seconds to populate the table with 50,000 records and 00:07:58.83 minutes to populate it with 500,000.
SQL*Loader benchmarks I've googled show much better perfomances, such as 2 million rows in less than 2 minutes. I've followed this tutorial to improve the time, but it barely changed at all. There's obviously something wrong here, but I don't know what.
Here's my control file:
OPTIONS (SILENT=ALL, DIRECT=TRUE, ERRORS=50, COLUMNARRAYROWS=50000, STREAMSIZE=500000)
UNRECOVERABLE LOAD DATA
APPEND
INTO TABLE MY_TABLE
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
...
Another important info: I've tried using PARALLEL=TRUE, but I get the ORA-26002 error (Table MY_TABLE has index defined upon it). Unfortunatly, running with skip_index_maintenance renders the index UNUSABLE.
What am I doing wrong?
Update
I have noticed that soon after running the program (less than a second), all rows are already present in the database. Yet, SQL*Loader is still busy and only finishes after 32-45 seconds.
What could it be doing?
One thought would be to create an external table and set the name to the csv file. Then after creating the file you can run a sql script inside Oracle to process the data directly.
Or, look at the following (copied from here:)
This issue is caused when using the bulk load option in parallel to load an Oracle target that has an index on it. An Oracle limitation.
To resolve this issue do one of the following:
· Change the target load option to Normal.
· Disable the enable parallel mode option in relational connection browser.
· Drop the indexes before loading.
· Or create a pre- and post-session sql to drop and create indexes and key constraints