Oracle 10g direct path write/read wait events - oracle

My Oracle 10g production database has a performance problem. Some queries that used to return in milliseconds now take 20 seconds. I generated an AWR report, and the top three wait events are shown below. I searched but couldn't make sense of them.
Can someone explain these events? Thanks,
Event                   Waits       Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class  Month
----------------------  ----------  -------  ------------  -----------------  ----------  --------
direct path write temp  11,941,557  866,004            73               29.8  User I/O    FEBRUARY
direct path write temp  16,197,445  957,129            59               17.2  User I/O    MARCH
db file scattered read   5,826,190   58,095            10                2.0  User I/O    FEBRUARY
db file scattered read  10,128,657   70,408             7                1.3  User I/O    MARCH
direct path read temp   34,197,762  324,663             9               11.2  User I/O    FEBRUARY
direct path read temp   88,688,686  507,715             6                9.1  User I/O    MARCH

Two of your wait events are related to sorting: direct path write temp and direct path read temp. These indicate an increase in sorting on disk rather than in memory; disk I/O is always slower.
So, what has changed regarding memory allocation usage? Perhaps you need to revisit the values of SORT_AREA_SIZE or PGA_AGGREGATE_TARGET init parameters (depending on whether you are using Automatic PGA memory). Here is a query which calculates the memory/disk sort ratio:
SELECT 100 * (mem.value - dsk.value)/(mem.value) AS sort_ratio
FROM v$sysstat mem
cross join v$sysstat dsk
WHERE mem.name = 'sorts (memory)'
AND dsk.name ='sorts (disk)'
In an OLTP application we would expect this to be over 95%.
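As a quick check of the arithmetic behind that query, here is a minimal Python sketch. The function name sort_ratio and the sample counter values are made up for illustration; only the formula comes from the query above.

```python
def sort_ratio(memory_sorts: int, disk_sorts: int) -> float:
    """Same formula as the query: 100 * (mem - dsk) / mem."""
    if memory_sorts <= 0:
        raise ValueError("memory_sorts must be positive")
    return 100 * (memory_sorts - disk_sorts) / memory_sorts


# Hypothetical counters: 2,000,000 in-memory sorts and 50,000 disk sorts.
ratio = sort_ratio(2_000_000, 50_000)
print(f"{ratio:.2f}%")  # 97.50%
```

Because v$sysstat counters are cumulative since instance startup, the ratio describes the whole uptime rather than the slow period alone.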
The other thing is, instead of looking at macro events you need to look at the specific queries which are running much slower. What has changed with regards to them? Lots more data? New indexes or dropped indexes? Refreshed statistics?
"SORT_RATIO ---------- 99.9985462"
So disk sorts are up, but they are still a tiny fraction of all sorts. You need to focus on specific queries.
"In March we began to use a Python application for some new queries. Could this be the reason?"
Could be. Application change is always the prime suspect when our system exhibits different behavior.
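The AWR figures in the question can also be compared month to month. The sketch below is only an illustration: the dictionary layout and function names are invented here, while the numbers are copied from the table in the question. It recomputes the average wait as Time(s) / Waits and shows how much each wait count grew from February to March.

```python
# (waits, total wait time in seconds) per event and month, copied from the AWR table.
awr = {
    "direct path write temp": {"FEBRUARY": (11_941_557, 866_004), "MARCH": (16_197_445, 957_129)},
    "db file scattered read": {"FEBRUARY": (5_826_190, 58_095), "MARCH": (10_128_657, 70_408)},
    "direct path read temp": {"FEBRUARY": (34_197_762, 324_663), "MARCH": (88_688_686, 507_715)},
}


def avg_wait_ms(waits: int, seconds: int) -> float:
    """Average wait in milliseconds: total wait time divided by number of waits."""
    return 1000 * seconds / waits


def growth(event: str) -> float:
    """Factor by which the number of waits grew from February to March."""
    return awr[event]["MARCH"][0] / awr[event]["FEBRUARY"][0]


for event, months in awr.items():
    feb = avg_wait_ms(*months["FEBRUARY"])
    mar = avg_wait_ms(*months["MARCH"])
    print(f"{event}: avg wait {feb:.1f} ms -> {mar:.1f} ms, waits x{growth(event):.2f}")
```

On these numbers the average latency per wait did not get worse in March. What grew is the number of waits, most of all for direct path read temp, which fits the advice to look for new or changed queries.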

Related

How to detect the cpu-peak-inducing transaction/statements (mostly read/select) from mon$... data?

I can see that the Firebird 2.1 server process (on Linux, serving our program) reaches 97% CPU load. The load may be unevenly distributed: for example, on a 4-core server, 2 cores run at 97% while the other 2 stay under normal load (1-10%) from the Firebird process. The bad thing is that this 97% peak can last half an hour, an hour, or even longer.
As I understand it, I just need to determine the Firebird transaction and attachment (i.e. connection) that created this peak, and then ask the user or software instance that opened that attachment to close the program and start again. When the attachment is closed, Firebird detects this and stops the CPU load and any processing assigned to that attachment.
So, my aim is to look on the data from the monitoring tables (mon$...) and to determine the offending transaction/connection.
I came up with the select (for Firebird 2.1):
select a.mon$user, sa.*, t.*
from mon$transactions t
left join mon$io_stats s on (t.mon$stat_id=s.mon$stat_id)
left join mon$attachments a on (t.mon$attachment_id=a.mon$attachment_id)
left join mon$statements sa on (t.mon$transaction_id=sa.mon$transaction_id)
where s.mon$page_reads>1000000
This SQL seems to be right, but in practice the results are misleading. For example, my select returns several entries whose a.mon$timestamp is 4 or more hours old. I cannot believe that there are transactions that old which are still consuming resources. The strange thing is that these records have no data from the left-joined mon$statements. So I have some information about long-running transactions, but no information about the statements that created or prolonged them. I don't even understand whether such transactions are actually causing the CPU peak or whether this data is obsolete.
So, how can I correct this SQL (or rewrite it completely) to find the statements/attachments that are causing the CPU load in Firebird 2.1?

How to improve SQLCipher performance?

I have a very small encrypted sqlite test database. I run a very simple select: just one record from the table which contains one record. This request takes very significant time: 0.3 sec.
lesnik#westfall:~/Projects/ls$ cat sql_enc.sql
PRAGMA KEY = "DUMMYKEYDUMMYKEY";
SELECT * FROM 'version';
lesnik#westfall:~/Projects/ls$
lesnik#westfall:~/Projects/ls$ time sqlcipher rabbits_enc.sqlite3 < sql_enc.sql
key ver
---------- ----------
1 aaa
real 0m0.299s
user 0m0.297s
sys 0m0.000s
Experiments show that the time doesn't depend on number of requests in script and doesn't depend on size of database (this test database is just 5kb, result is the same on 500kb databases)
There is no such problem if database is not encrypted.
Performance is slightly better on another Linux installation (in a different VirtualBox VM on the same host). And the problem does not occur at all on yet another Linux installation (script execution time is about 0.001s there), so I believe this is a problem with the environment. But I have no idea how to investigate it further. Any help is appreciated.
We provide general performance guidance for utilizing SQLCipher here
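One fixed cost that matches the symptoms in the question (constant time, independent of database size and number of statements) is key derivation when the database is opened. The sketch below is not SQLCipher code. It only times PBKDF2 from Python's standard library, assuming SQLCipher 4 default settings (PBKDF2-HMAC-SHA512 with 256,000 iterations). The question does not say which version or settings are in use, so treat these parameters as assumptions.

```python
import hashlib
import time


def derive_key(passphrase: bytes, salt: bytes, iterations: int) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA512 (assumed SQLCipher 4 default KDF)."""
    return hashlib.pbkdf2_hmac("sha512", passphrase, salt, iterations, dklen=32)


# Placeholder salt for illustration; a real database carries its own random salt.
salt = bytes(16)

for iterations in (1_000, 256_000):
    start = time.perf_counter()
    key = derive_key(b"DUMMYKEYDUMMYKEY", salt, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>7} iterations: {elapsed:.3f} s, {len(key)}-byte key")
```

If the high iteration count accounts for most of the measured time on your machine, the practical remedy is to keep one connection open and reuse it rather than opening the database for every request.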

Hadoop vs Cassandra: Which is better for the following scenario?

There is a situation in our systems in which the user can view and "close" a report. After they close it, the report is moved to a temporary table inside the database where it is kept for 24 hours, and then moved to an archives table (where the report is stored for the next 7 years). At any point during the 7 years, a user can "reopen" the report and work on it. The problem is that the archive storage is getting large, and finding/reopening reports tends to be time consuming. I also need to get statistics on the archives from time to time (i.e. report dates, clients, average length "opened", etc). I want to use a big data approach, but I am not sure whether to use Hadoop, Cassandra, or something else. Can someone give me some guidelines on how to get started and decide what to use?
If your archive is large and you'd like to get reports from it, you won't be able to use just Cassandra, as it has no easy means of aggregating the data. You'll end up collocating Hadoop and Cassandra on the same nodes.
From my experience, archives (write once, read many) are not the best use case for Cassandra if you have a lot of writes (we tried it as a backend for a backup system). Depending on your compaction strategy, you'll pay either in space or in IOPS. Added changes are propagated through the SSTable hierarchy, resulting in many more writes than the original change.
It is not possible to answer your question in full without knowing other variables: how much hardware (servers, their ram/cpu/hdd/ssd) are you going to allocate? what is the size of each 'report' entry? how many reads / writes you usually serve daily? How large is your archive storage now?
Cassandra might work fine. Keep two tables, reports and reports_archive. Define the schema using a TTL of 24 hours and 7 years:
CREATE TABLE reports (
...
) WITH default_time_to_live = 86400;
CREATE TABLE reports_archive (
...
) WITH default_time_to_live = 220752000; -- 86400 * 365 * 7 seconds = 7 years (table options take a literal value)
Use the new Time Window Compaction Strategy (TWCS) to minimize write amplification. It could be advantageous to store the report metadata and report binary data in separate tables.
For roll-up analytics, use Spark with Cassandra. You don't mention the size of your data, but roughly speaking 1-3 TB per Cassandra node should work fine. Using RF=3 you'll need at least three nodes.
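The two TTL values in that schema are plain second counts, so it can help to compute them up front. This is a small sketch with invented helper names. The 20-year ceiling reflects Cassandra's maximum TTL as I understand it, so verify it against the version you run.

```python
SECONDS_PER_DAY = 86_400
# Assumed upper bound for a Cassandra TTL: 20 years, expressed in seconds.
MAX_TTL_SECONDS = 20 * 365 * SECONDS_PER_DAY


def ttl_seconds(days: int) -> int:
    """Convert a retention period in days to a TTL in seconds, checking the assumed limit."""
    ttl = days * SECONDS_PER_DAY
    if ttl > MAX_TTL_SECONDS:
        raise ValueError(f"TTL of {ttl} s exceeds the assumed maximum of {MAX_TTL_SECONDS} s")
    return ttl


print(ttl_seconds(1))        # 86400      (reports: 24 hours)
print(ttl_seconds(365 * 7))  # 220752000  (reports_archive: 7 years)
```

Note that a TTL deletes the row when it expires, so the reports table only behaves as described if something copies each report into reports_archive before the 24 hours are up.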

Oracle slows down unexpectedly and rapidly when using SQL "update" continuously

The situation is simple, there is a table in oracle used as a "shared table" for data exchange. The table structure and number of records remains unchanged. In normal case, I continuously update data into this table and other process read this table for current data.
The strange thing is that when my process starts, each update statement takes approximately 2 ms to execute. After a certain period of time (around 8 hours), this increases to 10 ~ 20 ms per statement, which makes the procedure quite slow.
the structure of table
and the update statement is like:
anaNum = anaList.size();
qry.prepare(tr("update YC set MEAVAL=:MEAVAL, QUALITY=:QUALITY, LASTUPDATE=:LASTUPDATE where YCID=:YCID"));
foreach(STbl_ANA ana, anaList)
{
    qry.bindValue(":MEAVAL", ana.meaVal);
    qry.bindValue(":QUALITY", ana.quality);
    qry.bindValue(":LASTUPDATE", QDateTime::fromTime_t(ana.lastUpdate));
    qry.bindValue(":YCID", ana.ycId);
    if(!qry.exec())
    {
        qWarning() << QObject::tr("update yc failed, ")
                   << qry.lastError().databaseText() << qry.lastError().driverText();
        failedAnaList.append(ana);
    }
}
the update statement using qt interface
There are many reasons that can cause Oracle operations to slow down, but I cannot find a clue to explain this.
I never start a transaction manually in the Qt code, which means a commit is executed after every update statement.
The update frequency is about 200 records per second, but the number changes dynamically over time. It may increase to 1000 at one moment and drop to 10 the next.
Once the time consumption is up to 10 ~ 20 ms per statement, it never drops back down. It can be restored to 2 ms only by restarting the Oracle service. (Shutting down or restarting any of the user processes that access Oracle does not help.)
Please tell me how to solve it or at least what to be examined.
A good starting point is to check the AWR and ASH reports.
Comparing the reports from "good" and "bad" periods, you can spot the cause of the change. This can be, for example, a change of execution plan or an increase in wait events. One possible outcome is that the only change you see is that the database is spending more time waiting on the client (i.e. the problem is not in the DB).
Anyway, as diagnosed in the other answer, the root cause of the problem seems to be the update in a loop. If your update lists are long (say more than 10-100 entries), you can profit by updating the whole list in a single statement using MERGE:
1. build a collection from your list
2. cast the collection as TABLE
3. use this table in a MERGE statement to update the rows.
See here for details.
You can trace the session while it is running quickly and again later when it is running slowly. Use the sql trace functionality and tkprof to get a breakdown of where the update is spending its time in each case and see what has changed.
https://docs.oracle.com/cd/E25178_01/server.1111/e16638/sqltrace.htm#i4640
If you need help interpreting the results you can update your question or ask a new one.
Secondly, as a rule single record updates are not the best way to do updates in Oracle. Since you have many records to update already prepared before you prepare the query, look at execBatch.
https://doc.qt.io/qt-4.8/qsqlquery.html#execBatch
This will both execute the update faster and only issue a single commit.
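The batching idea can be tried in isolation. The sketch below is an illustration only: it uses Python's built-in sqlite3 as a stand-in for Oracle, and the column types of YC are guesses, since the question's table structure is not shown. It sends the whole update list through one prepared statement and commits once, instead of committing after every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the column names come from the question's UPDATE statement.
conn.execute(
    "CREATE TABLE YC (YCID INTEGER PRIMARY KEY, MEAVAL REAL, QUALITY INTEGER, LASTUPDATE INTEGER)"
)
conn.executemany("INSERT INTO YC VALUES (?, 0.0, 0, 0)", [(i,) for i in range(1, 1001)])
conn.commit()

# Made-up batch of new measurements: (MEAVAL, QUALITY, LASTUPDATE, YCID).
batch = [(i * 1.5, 1, 1_700_000_000, i) for i in range(1, 1001)]

# One prepared statement executed for every row in the batch ...
cur = conn.executemany(
    "UPDATE YC SET MEAVAL = ?, QUALITY = ?, LASTUPDATE = ? WHERE YCID = ?",
    batch,
)
# ... followed by a single commit for the whole batch.
conn.commit()

updated = cur.rowcount
print(updated)  # 1000
```

In the Qt code the equivalent shape would be to bind one QVariantList per placeholder and call execBatch inside an explicit transaction, so the readers of the shared table still see a consistent set of values after each commit.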

Oracle SQL*loader running in direct mode is much slower than conventional path load

In the past few days I've been playing around with Oracle's SQL*Loader in an attempt to bulk load data into Oracle. After trying out different combinations of options, I was surprised to find that the conventional path load runs much quicker than the direct path load.
A few facts about the problem:
Number of records to load is 60K.
Number of records in target table, before load, is 700 million.
Oracle version is 11g r2.
The data file contains date, character (ascii, no conversion required), integer, float. No blob/clob.
Table is partitioned by hash. Hash function is same as PK.
Parallel of table is set to 4 while server has 16 CPU.
Index is locally partitioned. Parallel of index (from ALL_INDEXES) is 1.
There's only 1 PK and 1 index on target table. PK constraint built using index.
Check on index partitions revealed that records distribution among partitions are pretty even.
Data file is delimited.
APPEND option is used.
Select and delete of the loaded data through SQL is pretty fast, almost instant response.
With conventional path, loading completes in around 6 seconds.
With direct path load, loading takes around 20 minutes. The worst run took 1.5 hours to complete, yet the server was not busy at all.
If skip_index_maintenance is enabled, direct path load completes in 2-3 seconds.
I've tried quite a number of options but none of them gives noticeable improvement... UNRECOVERABLE, SORTED INDEXES, MULTITHREADING (I am running SQL*Loader on a multiple CPU server). None of them improve the situation.
Here's the wait event I kept seeing during the time SQL*Loader runs in direct mode:
Event: db file sequential read
P1/2/3: file#, block#, blocks (check from dba_extents that it is an index block)
Wait class: User I/O
Does anyone have any idea what has gone wrong with the direct path load? Or is there anything I can check further to dig out the root cause of the problem? Thanks in advance.
I guess you are falling foul of this
"When loading a relatively small number of rows into a large indexed table
During a direct path load, the existing index is copied when it is merged with the new index keys. If the existing index is very large and the number of new keys is very small, then the index copy time can offset the time saved by a direct path load."
from When to Use a Conventional Path Load in: http://download.oracle.com/docs/cd/B14117_01/server.101/b10825/ldr_modes.htm
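A back-of-envelope comparison shows why the quoted caveat matters for the numbers in the question. This is a toy cost model, not a description of Oracle's actual I/O: it ignores block sizes, caching, and the fact that the index here is locally partitioned, and it assumes roughly log2(N) steps per conventional B-tree insert.

```python
import math

existing_keys = 700_000_000  # rows already in the table, from the question
new_keys = 60_000            # rows being loaded, from the question

# Direct path, per the quoted documentation: the existing index is copied
# while it is merged with the new keys, so every entry is touched once.
direct_path_entries = existing_keys + new_keys

# Conventional path (assumption): each new key is inserted individually,
# costing about log2(N) steps to locate its place in the index.
conventional_steps = new_keys * math.ceil(math.log2(existing_keys))

print(direct_path_entries)  # 700060000
print(conventional_steps)   # 1800000
print(round(direct_path_entries / conventional_steps))
```

Under these assumptions the direct path touches hundreds of times more index entries than the conventional path, which is consistent with the load finishing in seconds once skip_index_maintenance is enabled.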
