HSQL simple one column update runs forever - performance

I have a database with about 125,000 rows, each row with a primary key, a couple of int columns and a couple of varchars.
I've added an int column and I'm trying to populate it before adding a not null constraint.
The db is persisted in a script file. I've read somewhere that all the affected rows get loaded into memory before the actual update, which means there won't be a disk write for every row. The whole db is about 20MB, which would mean loading it and doing the update should be reasonably fast, right?
So, no joins, no nested queries, basic update.
I've tried multiple db managers including the one bundled with hsql jar.
update tbl1 set col1 = 1
Query never finishes executing.

It is probably running out of memory.
The easier way to do this operation is to define the column with DEFAULT 1, which does not use much memory regardless of the size of the table. You can even add the not null constraint at the same time:
ALTER TABLE T ADD COLUMN C INT DEFAULT 1 NOT NULL
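If the empty int column has already been added, one way to reach the same end state (a sketch only, assuming col1 currently holds nothing you need to keep) is to drop it and re-add it with the default and the constraint in one statement:
-- sketch for HSQLDB, assuming col1 was added earlier and is still empty
ALTER TABLE tbl1 DROP COLUMN col1;
ALTER TABLE tbl1 ADD COLUMN col1 INT DEFAULT 1 NOT NULL;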

Related

Oracle sql does not release space from temp when executing finishes

There is a table (let's say TKUBRA) which has 2,255,478 records in it.
And there is a query like:
select *
from kubra.tkubra
where ckubra is null
order by c1kubra asc;
CKUBRA does not have null records. It has 3 thousand records with IDs, and the rest contain empty space characters.
CKUBRA has an index, but when the statement executes it does a full table scan, and its cost is 258,794.
And the result returns no rows, as expected.
When the statement executes, it consumes temporary tablespace and does not release the space after it finishes.
What causes this?
This is the query and the results for the temporary tablespace usage:
Oracle does not store information about NULL values in normal (B-tree) indexes. Thus, when you query using a condition like WHERE CKUBRA IS NULL, the database engine has to perform a full table scan to generate the answer.
However, bitmap indexes do store NULL values, so if you want to be able to use an index to find NULL values you can create a bitmap index on the appropriate fields, as in:
CREATE BITMAP INDEX KUBRA.TKUBRA_2
ON KUBRA.TKUBRA(CKUBRA);
Once you've created an index, remember to gather statistics on the table:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS('KUBRA', 'TKUBRA');
END;
That may allow the database to use an index to find NULL values - but be aware that bitmap indexes are intended for low-update applications, such as data warehouses, and using one on a transactional table (one which is frequently updated) may lead to performance problems.
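If the table is update-heavy, a common alternative worth sketching (the index name below is made up) is an ordinary B-tree index with a trailing constant; a B-tree entry is created as long as at least one column in the key is non-NULL, so rows where CKUBRA is NULL are still indexed and WHERE CKUBRA IS NULL can use the index:
CREATE INDEX KUBRA.TKUBRA_NULLS ON KUBRA.TKUBRA (CKUBRA, 1);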
Still, it's something to play with - and you can always drop it later.
Best of luck.
Temporary tablespace is not released until all the rows are returned, or the cursor is closed, or the session is closed.
Are you sure all those 19 sessions are truly done executing the query? It looks like it's returning a lot of data, which implies it may take a while for the application to retrieve all the rows.
If you run the query in an IDE like SQL Developer, it will normally only return the Top N rows. Your IDE may imply the query has finished, but if there are more rows to receive it is not truly finished yet.
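One way to check (a sketch using the standard dictionary views, not the poster's own monitoring query) is to see which sessions still hold temporary segments:
SELECT s.sid, s.serial#, s.status, s.sql_id, u.tablespace, u.segtype, u.blocks
FROM v$tempseg_usage u
JOIN v$session s ON s.saddr = u.session_addr
ORDER BY u.blocks DESC;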

Adding column with default value

I have a table A (3 columns) in production which has around 10 million records. I want to add one more column to that table, and I also want its default value to be 1. Is it going to impact production DB performance if I add a column with default value 1 or something else? What would be the best approach, to avoid any kind of performance impact on the DB? Your thoughts are much appreciated!
In Oracle 11g the process of adding a new column with a default value has been considerably optimized. If a newly added column is specified as NOT NULL, the default value for that column is maintained in the data dictionary, and it is no longer necessary to store the default value in every record of the table, so each record no longer has to be updated with the default value. This optimization considerably reduces the amount of time the table is exclusively locked during the operation.
alter table <tab_name> add(<col_name> <data_type> default <def_val> not null)
Moreover, a column with a default value added that way will not consume space until you deliberately start to update that column or insert a record with a non-default value for it. So the operation of adding a new column with a default value and a not null constraint completes pretty quickly.
I think it is better to create a backup table first, with this syntax:
create table BackUpTable as SELECT * FROM YourTable;
alter table BackUpTable add (newColumn number(5,0) default 1);
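If the intention of that approach is to eventually replace the original table, the follow-up would look roughly like the sketch below (note that CREATE TABLE AS SELECT does not carry over indexes, constraints, grants or triggers, so those would need recreating):
-- hypothetical follow-up to the backup-table approach
ALTER TABLE YourTable RENAME TO YourTable_old;
ALTER TABLE BackUpTable RENAME TO YourTable;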

Sqlite appending data performance linear degradation, is this solvable?

I have a test set up to write rows to a database.
Each transaction inserts 10,000 rows, no updates.
Each step takes linearly longer than the last.
The first steps took the following amounts of time, in ms, to perform a commit:
568, 772, 942, 1247, 1717, 1906, 2268, 2797, 2922, 3816, 3945
By the time it reaches adding 10,000 rows to a table of 500,000 rows, it takes 37,149 ms to commit!
I have no foreign key constraints.
I have found that using WAL improves performance (it gives the figures above), but there is still linear degradation.
PRAGMA Synchronous=OFF has no effect
PRAGMA locking_mode=EXCLUSIVE has no effect
I ran both with and without the additional indexes. That made a roughly constant time difference, so there was still linear degradation.
Some other settings I have
setAutocommit(false)
PRAGMA page_size = 4096
PRAGMA journal_size_limit = 104857600
PRAGMA count_changes = OFF
PRAGMA cache_size = 10000
Schema has Id INTEGER PRIMARY KEY ASC, insertion of which is incremental and generated by Sqlite
Full schema is as follows (I have run both with and without the indexes, but have included them here):
create table if not exists [EventLog] (
Id INTEGER PRIMARY KEY ASC,
DocumentId TEXT NOT NULL,
Event TEXT NOT NULL,
Content TEXT NOT NULL,
TransactionId TEXT NOT NULL,
Date INTEGER NOT NULL,
User TEXT NOT NULL)
create index if not exists DocumentId ON EventLog (DocumentId)
create index if not exists TransactionId ON EventLog (TransactionId)
create index if not exists Date ON EventLog (Date)
This is using sqlite-jdbc-3.7.2 running in a Windows environment.
SQLite tables and indexes are internally organized as B-Trees. In tables, the Rowid is the sorting key. (Your INTEGER PRIMARY KEY is the Rowid.)
If your inserted IDs are not larger than the largest ID already in the table, then the records are not appended, but inserted somewhere in the middle of the tree. When inserting enough records in one transaction, and if the distribution of IDs is random, this means that almost every page in the database must be rewritten.
To avoid this,
insert the IDs in increasing order; or
insert the IDs as NULL so that SQLite chooses the next value; or
prevent SQLite from using your ID field as the Rowid by declaring it as INTEGER UNIQUE (or just INTEGER if you don't need the extra check/index), thus making the table ordering independent of your ID (see the sketch after this list).
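A sketch of that last option, using the schema from the question with only the Id line changed:
-- Id is UNIQUE but no longer INTEGER PRIMARY KEY, so it is not the rowid;
-- the table itself stays append-ordered regardless of the Id values supplied
create table if not exists [EventLog] (
Id INTEGER UNIQUE,
DocumentId TEXT NOT NULL,
Event TEXT NOT NULL,
Content TEXT NOT NULL,
TransactionId TEXT NOT NULL,
Date INTEGER NOT NULL,
User TEXT NOT NULL)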
In the case of indexes, inserting an indexed field with a random distribution requires that the index is updated at a random position. Like with tables, when inserting enough records in one transaction, this means that almost every page in the index must be rewritten.
When you're loading large amounts of data, it is recommended to do this without any indexes and to recreate them afterwards. (Unlike some other databases, SQLite has no function to temporarily disable indexes; just drop them.)
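For the schema in the question, that would look roughly like this (a sketch; the index names come from the CREATE INDEX statements above):
-- before the bulk load
DROP INDEX IF EXISTS DocumentId;
DROP INDEX IF EXISTS TransactionId;
DROP INDEX IF EXISTS Date;
-- ...run the large insert transactions...
-- after the bulk load
CREATE INDEX IF NOT EXISTS DocumentId ON EventLog (DocumentId);
CREATE INDEX IF NOT EXISTS TransactionId ON EventLog (TransactionId);
CREATE INDEX IF NOT EXISTS Date ON EventLog (Date);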
FYI, although I haven't limited the structure in terms of the content of the key, in 99.999% of cases it will be a GUID. So to resolve the performance issue I just wrote an algorithm for generating sequential GUIDs, using a time-based value for the first 8 hex digits. This worked very well, even if blocks of GUIDs are generated using early time values.
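For illustration only, the same idea can be expressed directly in SQLite SQL (this needs SQLite 3.8.3+ for printf, so it would not work on the 3.7.2 driver mentioned above; the poster generated the values in application code instead):
-- first 8 hex digits come from the current unix time, the rest is random,
-- so keys generated later sort after earlier ones
SELECT printf('%08x', CAST(strftime('%s','now') AS INTEGER))
       || lower(hex(randomblob(12))) AS sequential_guid;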

create index before adding columns vs. create index after adding columns - does it matter?

In Oracle 10g, does it matter what order create index and alter table comes in?
Say I have a query Q with a where clause on column C in table T. Now I perform one of the following scenarios:
1. I create index I(C) and then add columns X, Y, Z.
2. I add columns X, Y, Z and then create index I(C).
Q is 'select * from T where C = whatever'
Between 1 and 2 will there be a significant difference in performance of Q on table T when T contains a very large number of rows?
I personally make it a practice to do #2 but others seem to have a different opinion.
thanks
It makes no difference if you add columns to a table before or after creating an index. The optimizer should pick the same plan for the query and the execution time should be unchanged.
Depending on the physical storage parameters of the table, it is possible that adding the additional columns and populating them with data may force quite a bit of row migration to take place. That row migration will generate changes to the indexes on the table. If the index exists when you are populating the three new columns with data, it is possible that populating the data in X, Y, and Z will take a bit longer because of the additional index maintenance.
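If you want to see whether that migration actually happened, one option is the sketch below (ANALYZE is used here because DBMS_STATS does not populate CHAIN_CNT):
ANALYZE TABLE t COMPUTE STATISTICS;
SELECT table_name, num_rows, chain_cnt
FROM user_tables
WHERE table_name = 'T';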
If you add columns without populating them, then it is pretty quick as it is just a metadata change. Adding an index does require the table to be read (or potentially another index) so that can be very time consuming and of much greater impact than the simple metadata change of recording the new index details.
If the new columns are going to be populated as part of the ALTER TABLE, it is a different matter.
The database may undergo an unplanned shutdown during the course of adding that data to every row of the table.
The server memory may not have room to record every row changed in that table.
Those row changes may therefore be written to the datafiles before the commit, as dirty blocks.
The next read of those blocks, after the ALTER TABLE has successfully completed, will do a delayed block cleanout (i.e. record the fact that the change has been committed).
If you add the columns (with data) first, then the create index will (probably) read the table and do the added work of the delayed block cleanout.
If you create the index first then add the columns, the create index may be faster but the delayed block cleanout won't happen and that housekeeping will be picked up by the application later (potentially by the select * from T where C = whatever)
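If you would rather not have the application pay for that cleanout, one common approach (a sketch; the hint only makes the full scan explicit) is to read the whole table once after the ALTER has committed, so the cleanout happens up front:
SELECT /*+ FULL(T) */ COUNT(*) FROM T;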

How do I prevent the loading of duplicate rows in to an Oracle table?

I have some large tables (millions of rows). I constantly receive files containing new rows to add into those tables - up to 50 million rows per day. Around 0.1% of the rows I receive are duplicates of rows I have already loaded (or are duplicates within the files). I would like to prevent those rows being loaded into the table.
I currently use SQL*Loader in order to have sufficient performance to cope with my large data volume. If I take the obvious step and add a unique index on the columns which govern whether or not a row is a duplicate, SQL*Loader will start to fail the entire file which contains the duplicate row - whereas I only want to prevent the duplicate row itself being loaded.
I know that in SQL Server and Sybase I can create a unique index with the 'Ignore Duplicates' property and that if I then use BCP the duplicate rows (as defined by that index) will simply not be loaded.
Is there some way to achieve the same effect in Oracle?
I do not want to remove the duplicate rows once they have been loaded - it's important to me that they should never be loaded in the first place.
What do you mean by "duplicate"? If you have a column which defines a unique row you should set up a unique constraint against that column. One typically creates a unique index on this column, which will automatically set up the constraint.
EDIT:
Yes, as commented below, you should set up a "bad" file for SQL*Loader to capture invalid rows. But I think that establishing the unique index is probably a good idea from a data-integrity standpoint.
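As a sketch (the column names here are placeholders for whatever actually defines a duplicate in your data):
ALTER TABLE target_table
  ADD CONSTRAINT target_table_uk UNIQUE (key_col1, key_col2);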
Use Oracle MERGE statement. Some explanations here.
You didn't mention which release of Oracle you have. Have a look there for the MERGE command.
Basically like this:
---- Loop through all the rows from a record temp_emp_rec
MERGE INTO hr.employees e
USING temp_emp_rec t
ON (e.emp_ID = t.emp_ID)
WHEN MATCHED THEN
--- You can update
UPDATE
SET first_name = t.first_name,
last_name = t.last_name
--- Insert into the table
WHEN NOT MATCHED THEN
INSERT (emp_id, first_name, last_name)
VALUES (t.emp_id, t.first_name, t.last_name);
I would use integrity constraints defined on the appropriate table columns.
This page from the Oracle concepts manual gives an overview, if you also scroll down you will see what types of constraints are available.
Use the option below. If you get more than 9,999,999 errors, your sqlldr run will terminate at that point.
OPTIONS (ERRORS=9999999, DIRECT=FALSE)
LOAD DATA
You will get the duplicate records in the bad file.
sqlldr user/password@schema CONTROL=file.ctl, LOG=file.log, BAD=file.bad
