Subqueries and MySQL cache for an 18M+ row table

As this is my first post it seems I can only include one link, so I have listed the sites I'm referring to at the bottom. In a nutshell, my goal is to make the database return results faster. I have tried to include as much relevant information as I could think of to help frame the questions at the bottom of the post.
Machine Info
8 processors
model name : Intel(R) Xeon(R) CPU E5440 @ 2.83GHz
cache size : 6144 KB
cpu cores : 4
top - 17:11:48 up 35 days, 22:22, 10 users, load average: 1.35, 4.89, 7.80
Tasks: 329 total, 1 running, 328 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 87.4%id, 12.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8173980k total, 5374348k used, 2799632k free, 30148k buffers
Swap: 16777208k total, 6385312k used, 10391896k free, 2615836k cached
However, we are looking at moving the MySQL installation to a different machine in the cluster that has 256 GB of RAM.
Table Info
My MySQL table looks like:
CREATE TABLE ClusterMatches
(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
cluster_index INT,
matches LONGTEXT,
tfidf FLOAT,
INDEX(cluster_index)
);
It has approximately 18M rows, with 1M unique cluster_index values and 6K unique matches values. The SQL query I am generating in PHP looks like:
SQL query
$sql_query="SELECT `matches`, SUM(`tfidf`) FROM
(SELECT * FROM ClusterMatches WHERE `cluster_index` IN (".$clusters."))
AS result GROUP BY `matches` ORDER BY SUM(`tfidf`) DESC LIMIT 0, 10;";
where $clusters contains a string of approximately 3,000 comma-separated cluster_index values. This query touches approximately 50,000 rows and takes approximately 15 s to run; when the same query is run again it takes approximately 1 s.
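As an aside, the derived table appears to add nothing here; assuming the same schema, a flattened equivalent (a sketch only, not benchmarked on this data) would be:

SELECT `matches`, SUM(`tfidf`) AS total
FROM ClusterMatches
WHERE `cluster_index` IN (1, 2, 3)   -- stand-in for the ~3,000 ids in $clusters
GROUP BY `matches`
ORDER BY total DESC
LIMIT 0, 10;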
Usage
The content of the table can be assumed to be static.
Low number of concurrent users
The query above is currently the only query that will be run on the table
Subquery
Based on this post, [stackoverflow: Cache/Re-Use a Subquery in MySQL][1], and the improvement in query time on the second run, I believe my subquery can be indexed.
mysql> EXPLAIN EXTENDED SELECT `matches`, SUM(`tfidf`) FROM
(SELECT * FROM ClusterMatches WHERE `cluster_index` IN (1,2,...,3000))
AS result GROUP BY `matches` ORDER BY SUM(`tfidf`) ASC LIMIT 0, 10;
+----+-------------+----------------+-------+---------------+---------------+---------+------+-------+---------------------------------+
| id | select_type | table          | type  | possible_keys | key           | key_len | ref  | rows  | Extra                           |
+----+-------------+----------------+-------+---------------+---------------+---------+------+-------+---------------------------------+
|  1 | PRIMARY     | <derived2>     | ALL   | NULL          | NULL          | NULL    | NULL | 48528 | Using temporary; Using filesort |
|  2 | DERIVED     | ClusterMatches | range | cluster_index | cluster_index | 5       | NULL | 53689 | Using where                     |
+----+-------------+----------------+-------+---------------+---------------+---------+------+-------+---------------------------------+
According to this older article, [Optimizing MySQL: Queries and Indexes][2], the bad ones to see in the Extra column are "Using temporary" and "Using filesort".
MySQL Configuration Info
The query cache is available, but effectively turned off as the size is currently set to zero:
mysqladmin variables
+---------------------------------+----------------------+
| Variable_name                   | Value                |
+---------------------------------+----------------------+
| bdb_cache_size                  | 8384512              |
| binlog_cache_size               | 32768                |
| expire_logs_days                | 0                    |
| have_query_cache                | YES                  |
| flush                           | OFF                  |
| flush_time                      | 0                    |
| innodb_additional_mem_pool_size | 1048576              |
| innodb_autoextend_increment     | 8                    |
| innodb_buffer_pool_awe_mem_mb   | 0                    |
| innodb_buffer_pool_size         | 8388608              |
| join_buffer_size                | 131072               |
| key_buffer_size                 | 8384512              |
| key_cache_age_threshold         | 300                  |
| key_cache_block_size            | 1024                 |
| key_cache_division_limit        | 100                  |
| max_binlog_cache_size           | 18446744073709547520 |
| sort_buffer_size                | 2097144              |
| table_cache                     | 64                   |
| thread_cache_size               | 0                    |
| query_cache_limit               | 1048576              |
| query_cache_min_res_unit        | 4096                 |
| query_cache_size                | 0                    |
| query_cache_type                | ON                   |
| query_cache_wlock_invalidate    | OFF                  |
| read_rnd_buffer_size            | 262144               |
+---------------------------------+----------------------+
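For reference, a minimal sketch of turning the query cache on at runtime (hedged: this applies to MySQL versions that still ship the query cache; the size below is an example value only, not a recommendation):

SET GLOBAL query_cache_size = 268435456;   -- 256 MB, illustrative value only
SET GLOBAL query_cache_type = 1;           -- cache all cacheable SELECTs
SHOW VARIABLES LIKE 'query_cache%';        -- confirm the new settings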
Based on this article on [MySQL Database Performance Tuning][3], I believe that the values I need to tweak are
table_cache
key_buffer
sort_buffer
read_buffer_size
record_rnd_buffer (for GROUP BY and ORDER BY terms)
Areas Identified for improvement - MySQL Query tweaks
Changing the datatype of matches to an int index pointing into another table: [MySQL will indeed use a dynamic row format if it contains variable length fields like TEXT or BLOB, which, in this case, means sorting needs to be done on disk. The solution is not to eschew these datatypes, but rather to split off such fields into an associated table.][4]
Indexing the new match_index field so that the GROUP BY matches occurs faster, based on the statement ["You should probably create indices for any field on which you are selecting, grouping, ordering, or joining."][5]
Tools
To tune performance I plan to use
[Explain][6] making reference to [the output format][7]
[ab - Apache HTTP server benchmarking tool][8]
[Profiling][9] with [log data][10]
Future Database Size
The goal is to build a system that can have 1M unique cluster_index values and 1M unique match values, approximately 3,000,000,000 table rows, with a response time to the query of around 0.5 s (we can add more RAM as necessary and distribute the database across the cluster).
Questions
I think we want to keep the entire recordset in RAM so that the query doesn't touch the disk. If we keep the entire database in the MySQL cache, does that eliminate the need for memcachedb?
Is trying to keep the entire database in the MySQL cache a bad strategy, as it's not designed to be persistent? Would something like memcachedb or Redis be a better approach, and if so, why?
Is the temporary table "result" that is created by the query automatically destroyed when the query finishes?
Should we switch from InnoDB to MyISAM, [as it's good for read-heavy data whereas InnoDB is good for write-heavy][11]?
My cache doesn't appear to be on, as its size is zero per my [Query Cache Configuration][12], so why does the query currently run faster the second time?
Can I restructure my query to eliminate "Using temporary" and "Using filesort", and should I be using a join instead of a subquery?
How do you view the size of the MySQL [Data Cache][13]? (See the sketch after this list.)
What sort of sizes for the values table_cache, key_buffer, sort_buffer, read_buffer_size, and record_rnd_buffer would you suggest as a starting point?
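A hedged sketch for the Data Cache question above; these are standard status/variable statements, not specific to this setup:

SHOW STATUS LIKE 'Qcache%';                      -- query cache hits, free memory, etc.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';   -- InnoDB data/index cache size
SHOW STATUS LIKE 'Key_%';                        -- MyISAM key buffer activity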
Links
1: stackoverflow.com/questions/658937/cache-re-use-a-subquery-in-mysql
2: databasejournal.com/features/mysql/article.php/10897_1382791_4/Optimizing-MySQL-Queries-and-Indexes.htm
3: debianhelp.co.uk/mysqlperformance.htm
4: 20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
5: 20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
6: dev.mysql.com/doc/refman/5.0/en/explain.html
7: dev.mysql.com/doc/refman/5.0/en/explain-output.html
8: httpd.apache.org/docs/2.2/programs/ab.html
9: mtop.sourceforge.net/
10: dev.mysql.com/doc/refman/5.0/en/slow-query-log.html
11: 20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
12: dev.mysql.com/doc/refman/5.0/en/query-cache-configuration.html
13: dev.mysql.com/tech-resources/articles/mysql-query-cache.html

Changing the table
Based on the advice in this post on how to pick indexes for ORDER BY and GROUP BY queries, the table now looks like:
CREATE TABLE ClusterMatches
(
cluster_index INT UNSIGNED,
match_index INT UNSIGNED,
id INT NOT NULL AUTO_INCREMENT,
tfidf FLOAT,
PRIMARY KEY (match_index,cluster_index,id,tfidf)
);
CREATE TABLE MatchLookup
(
match_index INT UNSIGNED NOT NULL PRIMARY KEY,
image_match TINYTEXT
);
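To populate the two new tables from the original single-table layout, a migration along these lines should work. This is a sketch only: it assumes the old table has been renamed to ClusterMatchesOld (a hypothetical name) and that no matches value exceeds the 255-byte TINYTEXT limit:

SET @i := 0;
INSERT INTO MatchLookup (match_index, image_match)
SELECT (@i := @i + 1), m.matches                  -- assign sequential ids to distinct matches
FROM (SELECT DISTINCT matches FROM ClusterMatchesOld) AS m;

INSERT INTO ClusterMatches (cluster_index, match_index, tfidf)
SELECT o.cluster_index, l.match_index, o.tfidf    -- id is filled by AUTO_INCREMENT
FROM ClusterMatchesOld o
JOIN MatchLookup l ON l.image_match = o.matches;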
Eliminating Subquery
The query, without sorting the results by SUM(tfidf), looks like:
SELECT match_index, SUM(tfidf) FROM ClusterMatches
WHERE cluster_index in (1,2,3 ... 3000) GROUP BY match_index LIMIT 10;
Which eliminates Using temporary and Using filesort:
explain extended SELECT match_index, SUM(tfidf) FROM ClusterMatches
WHERE cluster_index in (1,2,3 ... 3000) GROUP BY match_index LIMIT 10;
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table          | type  | possible_keys | key     | key_len | ref  | rows  | Extra                    |
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | ClusterMatches | range | PRIMARY       | PRIMARY | 4       | NULL | 14938 | Using where; Using index |
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+--------------------------+
Sorting Problem
However, if I add the ORDER BY SUM(tfidf) in
SELECT match_index, SUM(tfidf) AS total FROM ClusterMatches
WHERE cluster_index in (1,2,3 ... 3000) GROUP BY match_index
ORDER BY total DESC LIMIT 0,10;
+-------------+--------------------+
| match_index | total              |
+-------------+--------------------+
|         868 |   0.11126546561718 |
|        4182 | 0.0238558370620012 |
|        2162 | 0.0216601379215717 |
|        1406 | 0.0191618576645851 |
|        4239 | 0.0168981291353703 |
|        1437 | 0.0160425212234259 |
|        2599 | 0.0156466849148273 |
|         394 | 0.0155945559963584 |
|        3116 | 0.0151005545631051 |
|        4028 | 0.0149106932803988 |
+-------------+--------------------+
10 rows in set (0.03 sec)
The result is suitably fast at this scale, BUT having the ORDER BY SUM(tfidf) means it uses Using temporary and Using filesort:
explain extended SELECT match_index, SUM(tfidf) AS total FROM ClusterMatches
WHERE cluster_index IN (1,2,3 ... 3000) GROUP BY match_index
ORDER BY total DESC LIMIT 0,10;
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+------------------------------------------------------------+
| id | select_type | table          | type  | possible_keys | key     | key_len | ref  | rows  | Extra                                                      |
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+------------------------------------------------------------+
|  1 | SIMPLE      | ClusterMatches | range | PRIMARY       | PRIMARY | 4       | NULL | 65369 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+----------------+-------+---------------+---------+---------+------+-------+------------------------------------------------------------+
Possible Solutions?
I'm looking for a solution that doesn't use temporary or filesort, along the lines of
SELECT match_index, SUM(tfidf) AS total FROM ClusterMatches
WHERE cluster_index IN (1,2,3 ... 3000) GROUP BY cluster_index, match_index
HAVING total>0.01 ORDER BY cluster_index;
where I don't need to hardcode a threshold for total. Any ideas?
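One hedged observation: since there are only ~6K distinct match_index values, the filesort triggered by ORDER BY SUM(tfidf) runs over at most ~6K aggregated rows, not over the tens of thousands of base rows, so it may be cheap enough to live with. Writing the query with an explicit derived table makes that two-phase shape visible (same result, still reports a filesort; a sketch only):

SELECT match_index, total
FROM (SELECT match_index, SUM(tfidf) AS total
      FROM ClusterMatches
      WHERE cluster_index IN (1, 2, 3)   -- stand-in for the full id list
      GROUP BY match_index) AS agg
ORDER BY total DESC
LIMIT 0, 10;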

Related

How to properly apply index_desc hint for a remote database object in Oracle database?

We were facing some issues with execution plans while accessing remote database objects over a dblink. Here is the query itself, run on the remote database:
select --+ index_desc (d DAY_OPERATIONAL_PK)
d.oper_day
from day_operational d
where rownum = 1
The plan for this query is the following :
Plan Hash Value : 2761870770
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 2 | 00:00:01 |
| * 1 | COUNT STOPKEY | | | | | |
| 2 | INDEX FULL SCAN DESCENDING | DAY_OPERATIONAL_PK | 1 | 8 | 2 | 00:00:01 |
---------------------------------------------------------------------------------------------
This one works correctly, that is, it returns the last operational day, in this case 14.09.2021. However, if we execute this exact same query from another database, connecting to this one via dblink, wrong results are returned; in this case the first row of the table is returned, 05.09.2009.
Here is the query:
select --+ index_desc (d DAY_OPERATIONAL_PK)
d.oper_day
from day_operational#iabs d
where rownum = 1
The plan generated for this query in the local database is the following:
Plan Hash Value :
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | | 1 | 8 | 2 | 00:00:01 |
| * 1 | COUNT STOPKEY | | | | | |
| 2 | INDEX FAST FULL SCAN | XPKDAY_OPERATIONAL | 1 | 8 | 2 | 00:00:01 |
---------------------------------------------------------------------------------------
As can be seen, the plan generated when connecting via dblink uses an INDEX FAST FULL SCAN, which reads index blocks in storage order rather than descending key order, and ignores the index_desc hint. How could we force Oracle to use this index? We tried adding a driving_site hint, but it didn't help.
Sorry for the confusion. It turned out that the index names in the local and remote databases are not the same, so the hint was being ignored since there was no index with that name in the remote DB. With the right index name, both the plan and the result were correct.
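For anyone hitting the same issue, a quick way to list the actual index names on the remote side is to query the data dictionary over the dblink (a sketch, assuming the iabs link from the question):

SELECT index_name, uniqueness
FROM all_indexes@iabs
WHERE table_name = 'DAY_OPERATIONAL';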

Oracle function-based index returning slowly

I have a table that contains some 640M records, and I'm trying to create an index.
The manner in which I want to select records involves an index like this:
CREATE INDEX table_a_i9 ON table_a (ORA_HASH(placard_bcd, 128), event);
However, that still returns about 3-4 million records, and from my testing it takes a lengthy amount of time (~12 minutes or so).
Is this a bad idea as an index? I don't think getting 3-4M records should take that long.
Any ideas?
Edit (adding more info):
The table has a bunch of columns but I don't know if I need to list all of them:
table_a
  container   NUMBER(19)   NOT NULL,
  placard_bcd VARCHAR2(30) NOT NULL,
  event       VARCHAR2(5)  NOT NULL,
  bin_number  NUMBER(3),
  ...
  ...
It takes about 12 minutes to return all of the records matched via the index above, i.e. all 3-4 million records.
The query used looks something like this:
select placard_bcd, event, bin_number
from table_a
where ora_hash(placard_bcd, 128) = 105
  and event in ('CLOS','PASG','BUILD');
The explain plan provided is this:
Plan hash value: 4185630329
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 35074 | 4212K| 338 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| TABLE_A | 35074 | 4212K| 338 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | TABLE_A_I9 | 14030 | | 14 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(("EVENT_TYPE"='BUILD' OR "EVENT_TYPE"='CLOS' OR "EVENT_TYPE"='PASG') AND
ORA_HASH("PLACARD_BCD",128)=105)
Everything seems correct, but it still takes a while to return the records.
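One hedged idea (not from the original thread): with 3-4M matching rows, most of the time is probably spent in the TABLE ACCESS BY INDEX ROWID step fetching scattered table blocks. If the query only ever needs placard_bcd, event and bin_number, a covering index would let Oracle answer from the index alone; a sketch, with a hypothetical index name:

CREATE INDEX table_a_i9c ON table_a
  (ORA_HASH(placard_bcd, 128), event, placard_bcd, bin_number);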

Oracle insert into indexed table: time to load 500 thousand rows is more than inserting 16 million rows

At first I tried a normal insert into the target table from the temporary table.
INSERT /*+ APPEND */ INTO RDW10DM.INV_ITEM_LW_DM
SELECT
*
FROM
RDW10PRD.TMP_MDS_RECLS_INV_ITEM_LW_DM
;
COMMIT;
It took only 17 min to load. The total count in the temp table TMP_MDS_RECLS_INV_ITEM_LW_DM is 16,491,650.
Plan for Execution:
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 16M| 1290M| 4927 |
| 1 | LOAD AS SELECT | | | | |
| 2 | TABLE ACCESS FULL | TMP_MDS_RECLS_INV_ITEM_LW_DM | 16M| 1290M| 4927 |
--------------------------------------------------------------------------------------
Note: cpu costing is off
Then I tried to load it location-wise:
INSERT /*+ APPEND */ INTO RDW10DM.INV_ITEM_LW_DM
SELECT
*
FROM
RDW10PRD.TMP_MDS_RECLS_INV_ITEM_LW_DM
where LOC_KEY=222
;
COMMIT;
That took around 28 min to load. The total count in the temp table with the filter is 493,465.
Plan for execution:
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 492K| 38M| 4927 |
| 1 | LOAD AS SELECT | | | | |
|* 2 | TABLE ACCESS FULL | TMP_MDS_RECLS_INV_ITEM_LW_DM | 492K| 38M| 4927 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("TMP_MDS_RECLS_INV_ITEM_LW_DM"."LOC_KEY"=222)
Note: cpu costing is off
Index in Target table:
Does anyone have any idea why this is happening?
My guess? The TMP table doesn't have an index.
Therefore, selecting all records and inserting them is faster than applying a filter on 16M records.
As you can see, in your second execution plan the scanner is doing a full table access, which slows down the query. Try adding an index on TMP_MDS_RECLS_INV_ITEM_LW_DM(LOC_KEY), as shown below. It should boost your query performance.
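The suggested index, as DDL (the index name is hypothetical):

CREATE INDEX tmp_recls_loc_key_idx
  ON RDW10PRD.TMP_MDS_RECLS_INV_ITEM_LW_DM (LOC_KEY);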
Thanks everyone for your valuable thoughts.
I found the actual problem later: since I was doing frequent truncate-and-loads on the target table RDW10DM.INV_ITEM_LW_DM, the index pages might have become fragmented.
So I ran the query after rebuilding the indexes and got the expected results.
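For reference, a rebuild along these lines (a sketch; the index name is hypothetical, and the ONLINE option depends on edition/licensing):

ALTER INDEX rdw10dm.inv_item_lw_dm_pk REBUILD;
-- or, to allow DML during the rebuild where supported:
ALTER INDEX rdw10dm.inv_item_lw_dm_pk REBUILD ONLINE;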

Improving SQL Exists scalability

Say we have two tables, TEST and TEST_CHILD, set up in the following way:
CREATE TABLE TEST (id1 NUMBER PRIMARY KEY, word VARCHAR2(50), numero NUMBER);
CREATE TABLE TEST_CHILD (id2 NUMBER REFERENCES TEST(id1), word2 VARCHAR2(50));
CREATE INDEX TEST_IDX ON TEST_CHILD(word2);
CREATE INDEX TEST_JOIN_IDX ON TEST_CHILD(id2);
INSERT INTO TEST SELECT ROWNUM, U1.USERNAME||U2.TABLE_NAME, LENGTH(U1.USERNAME) FROM ALL_USERS U1, ALL_TABLES U2;
INSERT INTO TEST_CHILD SELECT MOD(ROWNUM,15000)+1, U1.USER_ID||U2.TABLE_NAME FROM ALL_USERS U1, ALL_TABLES U2;
We would like to query for rows from the TEST table that satisfy some criteria in the child table, so we go for:
SELECT /*+ FIRST_ROWS(10) */ * FROM TEST T WHERE EXISTS (SELECT NULL FROM TEST_CHILD TC WHERE word2 LIKE 'string%' AND TC.id2 = T.id1) AND ROWNUM < 10;
We always want just the first 10 results, not any more at all. Therefore, we would like the response time for reading 10 results to be the same whether the child table has 10 matching values or 1,000,000, since the database could get 10 distinct results from the child table and then fetch the corresponding rows from the parent table (or at least that is the plan we would like). But when checking the actual execution plan we see:
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 54 | 5 (20)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | NESTED LOOPS | | | | | |
| 3 | NESTED LOOPS | | 1 | 54 | 5 (20)| 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 23 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TEST_CHILD | 1 | 23 | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | TEST_IDX | 1 | | 2 (0)| 00:00:01 |
|* 7 | INDEX UNIQUE SCAN | SYS_C005145 | 1 | | 0 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | TEST | 1 | 31 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<10)
6 - access("WORD2" LIKE 'string%')
filter("WORD2" LIKE 'string%')
7 - access("TC"."ID"="T"."ID")
There is a SORT UNIQUE under the COUNT STOPKEY, which as far as I know means it is reading all matching rows from the child table and de-duplicating them before finally selecting the first 10, making the query not as scalable as we would like it to be.
Is there any mistake in my example?
Is it possible to improve this execution plan so it scales better?
The SORT UNIQUE is going to find and sort all of the records from TEST_CHILD that match 'string%'; it is NOT going to read all rows of the child table. Your logic requires this: if you only picked the first 10 rows from TEST_CHILD that matched 'string%', and those 10 rows all had the same ID, then your final result from TEST would have only 1 row.
Anyway, your performance should be fine as long as 'string%' matches a relatively low number of rows in TEST_CHILD. If your situation is such that 'string%' often matches a HUGE record count in TEST_CHILD, there's not much you can do to make the SQL more performant given the current tables. In such a case, if this is a mission-critical SQL with performance tied to your annual bonus, there's probably some fancy footwork you could do with MATERIALIZED VIEWs to, e.g., pre-compute 10 TEST rows for high-cardinality WORD2 values in TEST_CHILD.
One final thought: a "risky" solution, but one which should work if you don't have thousands of TEST_CHILD rows matching the same TEST row, would be the following:
SELECT *
FROM TEST
WHERE ID1 IN
(SELECT ID2
FROM TEST_CHILD
WHERE word2 like 'string%'
AND ROWNUM < 1000)
AND ROWNUM <10;
You can adjust 1000 up or down, of course, but if it's too low, you risk finding less than 10 distinct ID values, which would give you final results with less than 10 rows.

Oracle 11g how to estimate needed TEMP tablespace?

We do an initial bulk load of some tables (both source and target are Oracle 11g). The process is as follows: 1. truncate, 2. drop the indexes (the PK and a unique index), 3. bulk insert, 4. recreate the indexes (again, the PK and the unique index). Now I got the following error:
alter table TARGET_SCHEMA.MYBIGTABLE
add constraint PK_MYBIGTABLE primary key (MYBIGTABLE_PK)
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
So obviously the TEMP tablespace is too small for the PK creation (FYI, the table has 6 columns and about 2.2 billion records). So I did this:
explain plan for
select line_1,line_2,line_3,line_4,line_5,line_6,count(*) as cnt
from SOURCE_SCHEMA.MYBIGTABLE
group by line_1,line_2,line_3,line_4,line_5,line_6;
select * from table( dbms_xplan.display );
/*
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2274M| 63G| | 16M (2)| 00:05:06 |
| 1 | HASH GROUP BY | | 2274M| 63G| 102G| 16M (2)| 00:05:06 |
| 2 | TABLE ACCESS FULL| MYBIGTABLE | 2274M| 63G| | 744K (7)| 00:00:14 |
-----------------------------------------------------------------------------------------------
*/
Is this how to tell how much TEMP tablespace will be needed for the PK creation (102 GB in my case)? Or would you make the estimate differently?
Additional: the PK only exists on the target system. But fair point, so I ran your query against the target PK:
explain plan for
select MYBIGTABLE_PK
from TARGET_SCHEMA.MYBIGTABLE
group by MYBIGTABLE_PK ;
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
| 1 | HASH GROUP BY | | 1 | 13 | 3 (34)| 00:00:01 |
| 2 | TABLE ACCESS FULL| MYBIGTABLE | 1 | 13 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
So how should I read this now?
This is a good question.
First, if you create the following primary key
alter table TARGET_SCHEMA.MYBIGTABLE
add constraint PK_MYBIGTABLE primary key (MYBIGTABLE_PK)
then you should query
explain plan for
select MYBIGTABLE_PK
from SOURCE_SCHEMA.MYBIGTABLE
group by MYBIGTABLE_PK
to get an estimate (make sure you gather stats first: exec dbms_stats.gather_table_stats('SOURCE_SCHEMA','MYBIGTABLE')).
Second, you can query V$TEMPSEG_USAGE to see how many temp blocks were consumed before the statement failed, and V$SESSION_LONGOPS to see how much of the total process had completed.
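A sketch of the V$TEMPSEG_USAGE check, converting blocks to megabytes (assumes the DBA views are accessible):

SELECT u.tablespace, u.segtype,
       ROUND(SUM(u.blocks) * MAX(t.block_size) / 1024 / 1024) AS mb_used
FROM v$tempseg_usage u
JOIN dba_tablespaces t ON t.tablespace_name = u.tablespace
GROUP BY u.tablespace, u.segtype;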
The Oracle docs suggest creating a dedicated temp tablespace for the process, so as not to disturb any other operations.
Please post an edit if you find a more accurate solution.
