I have a huge database (8 GB, 14,126,762 rows) with two tables, table1 (very big) and table2 (much smaller), in which I need to subtract a value in table1 using table2 values.
In tests on a smaller database (5 MB), it was fine. But now, on the bigger database, it takes forever and I don't know if it is working at all.
For instance, it takes 12 minutes just to create the database with the INSERT command.
The troublesome transaction is the following:
UPDATE table1
SET vl_empenho = vl_empenho -
    (SELECT vl_estorno
       FROM table2
      WHERE table1.cd_ugestora = table2.cd_ugestora
        AND table1.dt_ano      = table2.dt_ano
        AND table1.nu_empenho  = table2.nu_empenho)
WHERE cd_ugestora IN
    (SELECT table2.cd_ugestora
       FROM table2
      WHERE table1.dt_ano     = table2.dt_ano
        AND table1.nu_empenho = table2.nu_empenho);
I'm not proficient in SQLite. The statement gave me what I wanted, but I don't know if it is redundant.
Thanks for any help!
After reading the comments and a related Stack Overflow question, I created an index on each column used in the query, and I also set:
PRAGMA synchronous = OFF;
PRAGMA journal_mode = MEMORY;
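The indexes amount to something like this; a single composite index covering the three lookup columns on table2 should serve the correlated subqueries (the index name is made up):
CREATE INDEX idx_table2_lookup ON table2 (cd_ugestora, dt_ano, nu_empenho);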
With that, it took about 20 minutes to run the UPDATE mentioned above and 6 minutes for the INSERT command, which I find acceptable considering the file size (now 10 GB).
Thanks for all the attention!
EDIT: Regarding the comment from David Stein, it is TRUE! You CAN easily corrupt your database with these options. In my case, it was a very replaceable, rebuildable database with no sensitive data; I could rebuild it any time I wanted and I was its only user, so I needed it blazing fast.
Maybe that's not your situation.
I'm a longtime MSSQL developer who finds himself back in PL/SQL for the first time since Oracle 7. I'm looking for tuning advice regarding a large export stored procedure, which sporadically, and not very reproducibly, runs slow at certain points. This happens around some static working tables which it truncates, fills, and uses as part of the export. In outline the code typically looks like this:
create or replace procedure BigMultiPurposeExport as
begin
-- about 2000 lines of other code
INSERT INTO WORK_TABLE_5 SELECT WHATEVER1 FROM WHEREVER1;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER2 FROM WHEREVER2;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER3 FROM WHEREVER3;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER4 FROM WHEREVER4;
-- WORK_TABLE_5 now has 0 to ~500k rows whose content can vary drastically from run to run
-- e.g. one hourly run exports 3 whale sightings, next exports all tourist visits to Kenya this decade
-- about 1000 lines of other code
INSERT INTO OUTPUT_TABLE_3
SELECT THIS, THAT, THE_OTHER
FROM BUSINESS_TABLE_1 BT1
INNER JOIN BUSINESS_TABLE_2 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_3 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_1 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_2 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_3 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_5 WT5 ON BT1.ID = WT5.BT1_ID AND WT5.RECORD_TYPE = 21
-- join above is now supported by indexes on BUSINESS_TABLE_1 (ID) and WORK_TABLE_5 (BT1_ID, RECORD_TYPE), originally wasn't
LEFT OUTER JOIN WORK_TABLE_6 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_7 ON etc -- typical join on indexed columns
-- about 4000 lines of other code
end;
That final insert into OUTPUT_TABLE_3 usually runs in under 10 seconds, but once in a while on certain customer servers it times out at our default 99 minutes. Then we have them remove the timeout and run it on a Friday night, and it finishes, but takes 16 hours.
I narrowed the problem down to the join to WORK_TABLE_5, which had no index support, and put an index on the join terms. The next run took 4 seconds. But success has been intermittent: the customer occasionally gets slow runs when they drastically change their export selection (i.e. drastically change the data in WORK_TABLE_5). And if we update statistics and rebuild indexes after a timed-out export, it runs fine on the next attempt.
So, I am wondering how best to handle truncating and filling static work tables that have static indexes and statistics updated overnight, used by a stored procedure compiled when the statistics were nothing like they are at runtime.
I have a few general questions about things I'd like to understand better:
Is the nature of the data in the work table going to substantially affect the query plan? Does Oracle form its query plan when you compile the stored procedure? Could we get a highly inappropriate query plan if we compile the stored procedure with the table empty and then use a table with 500k rows at runtime?
I expect that if this were an ad-hoc script then updating statistics on the problem table just before selecting from it would eliminate the sporadic slowdowns. But what if I were to update statistics inside the stored procedure, which is compiled with different statistics from runtime?
Anything else you'd like to add...
Thanks for any advice. I hope my MSSQL preconceptions haven't led me too far off base.
This is happening in Oracle 11g, but the code is deployed to assorted customers using Oracle 10 through 12 and I'd like to cater to all of those if possible.
-- Joel
Huge differences in table or index sizes can most definitely cause performance problems. The solution is to add statistics gathering to the procedure instead of relying on the default statistics jobs.
If you've been away from Oracle since version 7, the most important new feature is the Cost Based Optimizer. Oracle now builds query execution plans based on the optimizer statistics of tables, indexes, columns, expressions, system statistics, outlines, directives, dynamic sampling, etc. If you're a full time Oracle developer you should probably spend a day reading about optimizer statistics. Start with Managing Optimizer Statistics and DBMS_STATS in the official documentation.
Eventually the stored procedure should look like this:
--1: Insert into working tables.
insert into work_table...
--2: Gather statistics on working tables.
dbms_stats.gather_table_stats('SCHEMA_NAME', 'WORK_TABLE', ...);
--3: Use working tables.
insert into other_table select * from work_table...
There are so many statistics features it's hard to know exactly what parameters to use in that second step above. Here are some guesses about some features you might find useful:
DEGREE - One reason people avoid gathering statistics inside a process is the time it takes. You can significantly improve the run time by setting the degree, although this also uses significantly more resources.
NO_INVALIDATE - It can be tricky to know exactly when the statistics take effect for a query. Gathering statistics usually quickly invalidates execution plans that were based on old statistics, but not always. If you want to be 100% sure that the next query uses the latest statistics, set NO_INVALIDATE=>FALSE.
ESTIMATE_PERCENT - In 11g and above you definitely want the default, which uses a faster algorithm. In 10g and below you may need to set the value to something low to make the gathering fast enough.
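As a rough sketch (the parameter values here are guesses for you to adapt, not recommendations), step 2 above might become:
begin
  dbms_stats.gather_table_stats(
    ownname          => 'SCHEMA_NAME',
    tabname          => 'WORK_TABLE_5',
    degree           => 4,                           -- parallel gathering: faster, but uses more resources
    no_invalidate    => false,                       -- invalidate stale execution plans immediately
    estimate_percent => dbms_stats.auto_sample_size  -- the fast 11g+ default
  );
end;
/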
Although Oracle 10g and above come with default statistics gathering jobs, you cannot rely on them, for a few reasons:
They are scheduled and may not run at the right time. If a process significantly changes the data, then new stats are needed right away, not at 10 PM. And if a lot of tables need to be analyzed, the job may not get to them all in one day.
Many DBAs disable the jobs. This is ridiculous and almost always a mistake, but you'll find many DBAs who disabled the job because they think they can do it better. Instead of working with the auto tasks and setting preferences, many DBAs throw the whole thing out and replace it with a custom procedure that rots over time.
I've got a SQLite database (my first project with SQLite) from which I need to delete a bulk of records at once, about 14,000 records. The query is as follows (I changed the names for readability):
delete from table_1 where table_2_id in (
select id from table_2 where table_3_id in (
select id from table_3
where (deleted = 1 or
table_4_id in (select id from table_4 where deleted = 1))));
This query takes about 8 minutes to run. But when I do
select * from table_1 where table_2_id in (
select id from table_2 where table_3_id in (
select id from table_3
where (deleted = 1 or
table_4_id in (select id from table_4 where deleted = 1))));
it gives me a result in 3 seconds.
I tried using a transaction, the cache size, and the journal mode, but I could not get better performance. What am I missing?
I had the same problem. The solution was to divide the delete into many smaller chunks, and run it again and again until no more rows are affected (sqlite3_changes() returns zero).
Of course the operation doesn't complete any sooner this way, but the table is not locked continuously for too long.
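In SQLite SQL the chunked approach looks roughly like this (a sketch assuming table_1 is an ordinary rowid table; the chunk size is arbitrary):
-- run repeatedly until changes() reports 0 deleted rows
delete from table_1 where rowid in (
    select rowid from table_1 where table_2_id in (
        select id from table_2 where table_3_id in (
            select id from table_3
            where deleted = 1
               or table_4_id in (select id from table_4 where deleted = 1)))
    limit 10000);
select changes();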
Hope this helps someone.
The cache size was 30 MB and I also turned off the virus scanner, but without any result. So I tried multiple things: I copied the database, deleted all foreign keys, and tried again, and it was much faster. So then I put an index on all the foreign key columns and it also deleted fast. That was the solution, but I do not know why it selects fast and deletes so slowly. Hope this helps anyone!
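For anyone hitting the same thing, the fix boiled down to statements like these (using the renamed tables and columns from the query above):
create index idx_table_1_table_2_id on table_1 (table_2_id);
create index idx_table_2_table_3_id on table_2 (table_3_id);
create index idx_table_3_table_4_id on table_3 (table_4_id);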
Do you have an index on any of the columns involved? If so, consider dropping it for a large delete and then rebuilding it. If you don't, try adding one.
I have a Python script to delete entries from a SQLite DB; lfl is a list of files that I want to delete from the DB. Deleting with an IN statement speeds up the process a lot:
while lfl:
    print("Deleting entries in DB:", len(lfl))
    chunk = lfl[:500]
    # bound parameters sidestep quoting problems in file names
    placeholders = ",".join("?" * len(chunk))
    cursor.execute("delete from md5t where file in (%s)" % placeholders, chunk)
    db.commit()
    del lfl[:500]
I am executing the MERGE statement below for an insert/update operation.
It works fine for 1 to 2 million records, but for 4 to 5 billion records it takes 6 to 7 hours to complete.
Can anyone suggest an alternative or some performance tips for the MERGE statement?
merge into employee_payment ep
using (
select
p.pay_id vista_payroll_id,
p.pay_date pay_dte,
c.client_id client_id,
c.company_id company_id,
case p.uni_ni when 0 then null else u.unit_id end unit_id,
p.pad_seq pay_dist_seq_nbr,
ph.payroll_header_id payroll_header_id,
p.pad_id vista_paydist_id,
p.pad_beg_payperiod pay_prd_beg_dt,
p.pad_end_payperiod pay_prd_end_d
from
stg_paydist p
inner join company c on c.vista_company_id = p.emp_ni
inner join payroll_header ph on ph.vista_payroll_id = p.pay_id
left outer join unit u on u.vista_unit_id = p.uni_ni
where ph.deleted = '0'
) ps
on (ps.vista_paydist_id = ep.vista_paydist_id)
when matched then
update
set ep.vista_payroll_id = ps.vista_payroll_id,
ep.pay_dte = ps.pay_dte,
ep.client_id = ps.client_id,
ep.company_id = ps.company_id,
ep.unit_id = ps.unit_id,
ep.pay_dist_seq_nbr = ps.pay_dist_seq_nbr,
ep.payroll_header_id = ps.payroll_header_id
when not matched then
insert (
ep.employee_payment_id,
ep.vista_payroll_id,
ep.pay_dte,
ep.client_id,
ep.company_id,
ep.unit_id,
ep.pay_dist_seq_nbr,
ep.payroll_header_id,
ep.vista_paydist_id
) values (
seq_employee_payments.nextval,
ps.vista_payroll_id,
ps.pay_dte,
ps.client_id,
ps.company_id,
ps.unit_id,
ps.pay_dist_seq_nbr,
ps.payroll_header_id,
ps.vista_paydist_id
) log errors into errorlog (v_batch || 'EMPLOYEE_PAYMENT') reject limit unlimited;
Try using the Oracle hints:
MERGE /*+ append leading(PS) use_nl(PS EP) parallel (12) */
Try using hints to optimize the inner USING query.
Processing lots of data takes lots of time...
Here are some things that may help you (assuming there is not a problem with a bad execution plan):
Add a WHERE clause to the UPDATE part so records are only updated when the values are actually different (see the sketch after this list). If you are merging the same data over and over again and only a small subset is actually modified, this will improve performance.
If you indeed are processing the same data over and over again, investigate whether you can add a modification flag/date so that you only process records that are new since the last run.
Depending on the kind of environment and when/who updates your source tables, investigate whether a truncate-insert approach is beneficial. Remember to set the indexes unusable beforehand.
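For the first point, the filter goes on the MATCHED branch of the merge. DECODE is handy here because, unlike <>, it treats two NULLs as equal. A fragment only, with the column list abbreviated:
when matched then
  update
    set ep.vista_payroll_id = ps.vista_payroll_id,
        ep.pay_dte          = ps.pay_dte
        -- ... remaining columns ...
    where decode(ep.vista_payroll_id, ps.vista_payroll_id, 1, 0) = 0
       or decode(ep.pay_dte,          ps.pay_dte,          1, 0) = 0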
I think your best bet here is to exploit the patterns in your data. This is something Oracle does not know about, so you may have to get creative.
I was working on a similar problem, and a good solution I found was to break the query up.
The primary reason big-table merges are a bad idea is the in-memory storage of the result of the USING query. The PGA fills up pretty quickly, so the database starts using the temporary tablespace for sort operations and joins. The temp tablespace, being on disk, is excruciatingly slow. Excessive temp tablespace use can easily be avoided by splitting the query into two queries.
So the below query
merge into emp e
using (
select a,b,c,d from (/* big query here */)
) ec
on /*conditions*/
when matched then
/* rest of merge logic */
can become
create table temp_big_query as select a,b,c,d from (/* big query here */);
merge into emp e
using (
select a,b,c,d from temp_big_query
) ec
on /*conditions*/
when matched then
/* rest of merge logic */
If the USING query also has CTEs and subqueries, try breaking it up to use more temp tables like the one shown above. Also avoid parallel hints: they mostly tend to slow the query down unless the query itself has something that can be done in parallel. Try using indexes instead as much as possible; parallelism should be the last option for optimization.
I know some references are missing; please feel free to comment and add references or point out mistakes in my answer.
We are running into a performance issue and I need some suggestions (we are on Oracle 10g R2).
The situation is something like this:
1) It is a legacy system.
2) In some of the tables it holds data for the last 10 years (meaning data has never been deleted since the first version was rolled out). Most of the OLTP tables now have around 30,000,000 - 40,000,000 rows.
3) Search operations on these tables take a flat 5-6 minutes. (A simple query like select count(0) from xxxxx where isActive = 'Y' takes around 6 minutes.) In the explain plan we saw that an index scan is happening on the isActive column.
4) We have suggested archiving and purging the old data that is no longer needed, and the team is working towards it. But even if we delete 5 years of data we are left with around 15,000,000 - 20,000,000 rows, which is still huge, so we thought of partitioning these tables. However, users can search on most of the columns of these tables from the UI, which would defeat the purpose of partitioning.
So what steps should be taken to improve this situation?
First of all: question why you are issuing the query select count(0) from xxxxx where isactive = 'Y' in the first place. Nine times out of ten it is a lazy way to check for the existence of a record. If that's the case for you, just replace it with a query that selects one row (rownum = 1 and a first_rows hint).
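A sketch of that probe, using the table and column from the question:
select /*+ first_rows(1) */ 1
from xxxxx
where isActive = 'Y'
and rownum = 1;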
The row counts you mention are nothing to be worried about. If your application doesn't perform well as the number of rows grows, then your system is not designed to scale. I'd investigate all queries that take too long using SQL*Trace or ASH and fix them.
By the way: nothing you mentioned justifies the term legacy, IMHO.
Regards,
Rob.
Just a few observations:
I'm guessing that the "isActive" column can have two values - 'Y' and 'N' (or perhaps 'Y', 'N', and NULL - although why in the name of Fred there wouldn't be a NOT NULL constraint on such a column escapes me). If that's the case, an index on this column has very poor selectivity and you might be better off without it. Try dropping the index and re-running your query.
@RobVanWijk's comment about the use of SELECT COUNT(*) is excellent. ONLY ask for a row count if you really need the count. If you don't, I've found it's faster to do a direct probe (SELECT whatever FROM wherever WHERE somefield = somevalue) with an appropriate exception handler than to do a SELECT COUNT(*). In the case you cited, I think it would be better to do something like
BEGIN
  SELECT IS_ACTIVE
    INTO strIsActive
    FROM MY_TABLE
   WHERE IS_ACTIVE = 'Y';
  bActive_records_found := TRUE;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    bActive_records_found := FALSE;
  WHEN TOO_MANY_ROWS THEN
    bActive_records_found := TRUE;
END;
As to partitioning: partitioning can be effective at reducing query times IF the field on which the table is partitioned is used in all queries. For example, if a table is partitioned on TRANSACTION_DATE, then for the partitioning to make a difference all queries against the table would have to include a TRANSACTION_DATE test in the WHERE clause. Otherwise the database has to search every partition to satisfy the query, so I doubt any improvement would be noticed.
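To illustrate with a made-up table (all names invented), only the second query below lets Oracle prune partitions:
create table sales_hist (
  sale_id          number,
  transaction_date date,
  is_active        char(1)
)
partition by range (transaction_date) (
  partition p2009 values less than (date '2010-01-01'),
  partition p2010 values less than (date '2011-01-01'),
  partition pmax  values less than (maxvalue)
);

-- no TRANSACTION_DATE predicate: every partition is scanned
select count(*) from sales_hist where is_active = 'Y';

-- TRANSACTION_DATE predicate: only partition p2010 is scanned
select count(*) from sales_hist
where transaction_date >= date '2010-01-01'
  and transaction_date <  date '2011-01-01'
  and is_active = 'Y';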
Share and enjoy.
The program that I am currently assigned to has a requirement that I copy the contents of a table to a backup table, prior to the real processing.
During code review, a coworker pointed out that
INSERT INTO BACKUP_TABLE
SELECT *
FROM PRIMARY_TABLE
is unduly risky, as it is possible for the tables to have different columns, and different column orders.
I am also under the constraint to not create/delete/rename tables. ~Sigh~
The columns in the table are expected to change, so simply hard-coding the column names is not really the solution I am looking for.
I am looking for ideas on a reasonable non-risky way to get this job done.
Does the backup table stay around? Does it keep the data permanently, or is it just a copy of the current values?
Too bad about not being able to create/delete/rename/copy. Otherwise, if it's short term, just used in case something goes wrong, then you could drop it at the start of processing and do something like
create table backup_table as select * from primary_table;
Your best option may be to make the select explicit, as
insert into backup_table (<list of columns>) select <list of columns> from primary_table;
You could generate that by building a SQL string from the data dictionary, then doing execute immediate. But you'll still be at risk if the backup_table doesn't contain all the important columns from the primary_table.
Might just want to make it explicit, and raise a major error if backup_table doesn't exist, or any of the columns in primary_table aren't in backup_table.
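A sketch of that generation step, restricted to the columns the two tables share (LISTAGG needs Oracle 11gR2; on older versions, aggregate the names another way):
declare
  l_cols varchar2(4000);
begin
  select listagg(p.column_name, ', ') within group (order by p.column_id)
    into l_cols
    from user_tab_columns p
    join user_tab_columns b
      on b.column_name = p.column_name
     and b.table_name = 'BACKUP_TABLE'
   where p.table_name = 'PRIMARY_TABLE';

  execute immediate 'insert into backup_table (' || l_cols ||
                    ') select ' || l_cols || ' from primary_table';
end;
/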
How often do you change the structure of your tables? Your method should work just fine provided the structure doesn't change. Personally I think your DBAs should give you a mechanism for dropping the backup table and recreating it, such as a stored procedure. We had something similar at my last job for truncating certain tables, since truncating is frequently much faster than DELETE FROM TABLE;.
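Such a mechanism could be as small as this sketch (it assumes the procedure owner has the needed privileges):
create or replace procedure recreate_backup_table as
begin
  begin
    execute immediate 'drop table backup_table';
  exception
    when others then null;  -- ignore "table does not exist" on the first run
  end;
  execute immediate 'create table backup_table as select * from primary_table';
end;
/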
Is there a reason that you can't just list out the columns in the tables? So
INSERT INTO backup_table( col1, col2, col3, ... colN )
SELECT col1, col2, col3, ..., colN
FROM primary_table
Of course, this requires that you revisit the code when you change the definition of one of the tables to determine whether you need code changes, but that's generally a small price to pay for insulating yourself from differences in column order, differences in column names, and irrelevant differences in table definitions.
If I had this situation, I would retrieve the column definitions for the two tables right at the beginning of the program. Then, if they were identical, I would proceed with the simple:
INSERT INTO BACKUP_TABLE
SELECT *
FROM PRIMARY_TABLE
If they were different, I would only proceed if there were no critical columns missing from the backup table. In this case I would use this form for the backup copy:
INSERT INTO BACKUP_TABLE (<list of columns>)
SELECT <list of columns>
FROM PRIMARY_TABLE
But I'd also worry about what would happen if the program simply stopped with an error, so I might even have a backup plan: use the second form for the columns that are in both tables, and also dump a text file with the PK and any columns that are missing from the backup. Also log an error, even though the program appears to have completed normally. That way, you could recover the data if the worst happened.
Really, this is a symptom of bad processes somewhere that should be addressed, but defensive programming can help make it someone else's problem, not yours. If they don't notice the logged error message telling them about the text dump with the missing columns, then it's not your fault.
But if you don't code defensively, and the worst happens, it will be partly your fault.
You could try something like:
CREATE TABLE secondary_table AS SELECT * FROM primary_table;
In Oracle, CREATE TABLE ... AS SELECT does copy the data. If you want to create the structure first and load the data separately:
CREATE TABLE secondary_table AS SELECT * FROM primary_table WHERE 1 = 0;
INSERT INTO secondary_table SELECT * FROM primary_table;
Edit:
Sorry, I didn't read your post completely, especially the constraints part. I'm afraid I don't know how to do it under those constraints. My guess would be a procedure that first describes both tables and compares them, before building a lengthy insert/select query.
Still, if you're using a backup table, I think it's pretty important that it matches the original exactly.