Is there a way to optimize the insertion of a lot of data into an empty CockroachDB table?
To optimize inserting data into CockroachDB tables, there are a few pieces of guidance:
Create the table without any secondary indexes, insert your data, and then add any secondary indexes you want.
Insert 500 rows per INSERT statement. That number might vary a bit depending on the size of your rows, but is a good guideline to optimize the speed at which you can write data.
Use the IMPORT statement to bulk import CSV files into a single table. This is the fastest way to get data into CockroachDB.
If you're moving from PostgreSQL to CockroachDB, you can also use pg_dump to create a COPY statement, which CockroachDB is optimized to ingest. It's a slightly more involved process, but you can find the details about how to do it in CockroachDB's import documentation.
Related
In my business case, I need insert one row and can't use batch insert . So I want to know what the throughput can made by Oracle. I try these ways:
Effective way
I use multi-thread, each thread owns one connection to insert data
I use ssd to store oracle datafile
Ineffective way
I use multi table to store data in one schema
I use table partition
I use multi schema to store data
Turn up data file block size
Use append hint in insert SQL
In the end the best TPS is 1w/s+
Other:
Oracle 11g
Single insert data size 1k
CPU i7, 64GB memory
Oracle is highly optimized for anything from one row inserts to batches of hundreds of rows. You do not mention whether you are having performance problems with this one row insert nor how long the insert takes. For such a simple operation, you don't need to worry about any of those details. If you have thousands of web-based users inserting one row into a table every minute, no problem. If you are committing your work at the appropriate time, and you don't have a huge number of indexes, a single row insert should not take more than a few milliseconds.
In SQL*Plus try the commands
set autotrace on explain statistics
set timing on
and run your insert statement.
Edit your question to include the results of the explain plan. And be sure to indent the results 4 spaces.
In db2wh,
One of our tasks is to look for candidate alternatives for INSERT FROM master SELECT * FROM staging and dbload may be the one.
Comparing the elapsed time of INSERT and dbload from a same local CSV file, dbload is a little bit faster than INSERT but is nearly the same.
Question is:
As internal implementation, is dbload same as INSERT? What is dbload advantage compared to INSERT OR which one is better to use for laoding data?
dbload uses Db2 Warehouse EXTERNAL TABLEs to get data into Db2. INSERTs from EXTERNAL TABLEs are the same as INSERTs from SELECTs in many respects. They use much of the same internal processing within Db2.
Generally speaking, once you have got your data into the database (i.e. into Staging), you are better off leaving it in the database, rather than exporting it, then re-importing it again.
In short, stick to INSERT FROM SELECT.
So, I have a .NET program doing batch loading of records into partitioned tables using array bound stored procedure calls via Oracle ODP.NET, but that's neither here nor there.
What I would like to know is: because I have a partitioned index on said tables, the speed of the batch load is pretty slow. I fully understand that I cannot drop an index partition, but I would obviously prefer not to have to drop and rebuild the entire index since that will take considerably more time to execute. Is this my only recourse?
Is there a fairly simple way to drop the partition itself and then rebuild the partition and index partition that would save time and go about accomplishing my goal?
Are you loading an entire partition at once? Or are you merely adding new rows to an existing partition? Are all the indexes equipartitioned with the table?
Normally, if you are loading data into a partitioned table, your partitioning scheme is chosen so that each load will put data into a fresh partition. If that is the case, you can use partition exchange to load the data. In a nutshell, you load data into an (unindexed) staging table whose structure matches the real table, you create the indexes to match the indexes on the real table, and then do
ALTER TABLE partitioned_table
EXCHANGE PARTITION new_partition_name
WITH TABLE staging_table_name
WITHOUT VALIDATION;
We're trying to figure out the best way to handle BULK INSERTs using Oracle (10gR2), and I'm finding that it can be a pretty complicated subject. One method that I've found involves using the Append optimizer hint:
INSERT /*+ Append*/
INTO some_table (a, b)
VALUES (1, 2)
My understanding is that this will tell Oracle to ignore indexes and just put the results at the end of the table. Then, all I should have to do is rebuild the indexes:
ALTER INDEX some_index REBUILD
This would be easier than trying to launch SQL*Loader as an external process or doing some pl/SQL. This almost seems too easy. Is there something I'm missing? Any things that could come back to bite me if I take this approach?
A few notes ...
A single row cannot be appended, therefore APPEND is only valid with INSERT INTO ... SELECT FROM syntax.
An append is the addition of data above the high water mark of the table, in which the data is formatted into complete blocks that are then written to the table and which bypass the SQL engine
An append in parallel mode requires that each parallel query thread allocate at least one new extent to the table, into which the new blocks are written. This can be wasteful of space.
The indexes are not ignored, but maintenance of them is defered until the blocks have been written into the table.
See he docs for more important information: http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/tables.htm#ADMIN01509
I am writing a data conversion in PL/SQL that processes data and loads it into a table. According to the PL/SQL Profiler, one of the slowest parts of the conversion is the actual insert into the target table. The table has a single index.
To prepare the data for load, I populate a variable using the rowtype of the table, then insert it into the table like this:
insert into mytable values r_myRow;
It seems that I could gain performance by doing the following:
Turn logging off during the insert
Insert multiple records at once
Are these methods advisable? If so, what is the syntax?
It's much better to insert a few hundred rows at a time, using PL/SQL tables and FORALL to bind into insert statement. For details on this see here.
Also be careful with how you construct the PL/SQL tables. If at all possible, prefer to instead do all your transforms directly in SQL using "INSERT INTO t1 SELECT ..." as doing row-by-row operations in PL/SQL will still be slower than SQL.
In either case, you can also use direct-path inserts by using INSERT /*+APPEND*/, which basically bypasses the DB cache and directly allocates and writes new blocks to data files. This can also reduce the amount of logging, depending on how you use it. This also has some implications, so please read the fine manual first.
Finally, if you are truncating and rebuilding the table it may be worthwhile to first drop (or mark unusable) and later rebuild indexes.
Regular insert statements are the slowest way to get data in a table and not meant for bulk inserts. The following article references a lot of different techniques for improving performance: http://www.dba-oracle.com/oracle_tips_data_load.htm
Drop the index, then insert the rows, then re-create the index.
If dropping the index doesn't speed things up enough, you need the Oracle SQL*Loader:
http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html
Suppose you have taken eid,ename,sal,job. So create a table first as:
SQL>create table tablename(eid number, ename varchar2(20),sal number,job char(10));
Now insert data:-
SQL>insert into tablename values(&eid,'&ename',&sal,'&job');
Check this link
http://www.dba-oracle.com/t_optimize_insert_sql_performance.htm
main points to consider for your
case is to use Append hint as this
will directly append into the table
instead of using freelist. If you can afford to turn off logging than use append with nologging hint to do it
Use a bulk insert instead instead of iterating in PL/SQL
Use sqlloaded to load the data directly into the table if you are getting data from a file feed
Here are my recommendations on fast insert.
Trigger - Disable any triggers associated with a table. Enable after Inserts are complete.
Index - Drop Index and re-create it after your Inserts are complete.
Stale stats - Re-analyze table and index stats.
Index de-fragmentation - Rebuild Index if needed
Use No Logging -Insert using INSERT APPEND (Oracle only). This approach is very risky approach, no redo logs are generated therefore you can’t do a rollback - make a backup of table before you start and don't try on live tables. Check if your db has similar option
Parallel Insert: Running parallel insert will get the job faster.
Use Bulk Insert
Constraints - Not much overhead during inserts but still a good idea to check, if it is still slow after even after step 1
You can learn more on http://www.dbarepublic.com/2014/04/slow-insert.html
Maybe one of your best option is to avoid Oracle as much as possible actually.
I've been baffled by this myself, but very often a Java process can outperform many of the Oracle's utilities which either use OCI (read: SQL Plus) or will take up so much of your time to get right (read: SQL*Loader).
This doesn't prevent you to use specific hints either (like /APPEND/).
I've been pleasantly surprised each time I've turned to that kind of solution.
Cheers,
Rollo