Magento 1.7 / 1.8 deadlocks from index_process table - magento

I'm having serious problems with Magento. Last Friday we upgraded Magento from 1.7 to 1.8.
The issue is that we're having a lot of deadlocks in the MySQL database.
Our server setup is
1 Load Balancer
4 Webservers (Apache, PHP5, APC)
2 MySQL Servers (64 GB Ram, 30 cores SSD HDD) - 1 Master (Has Memcache for sessions) - 1 Slave (Has Redis for caching)
Deadlocks are less frequent on Magento 1.8 than on 1.7, but they still appear from time to time.
Does anyone have good ideas on how to get past this problem?
Here's some data from SHOW ENGINE INNODB STATUS:
LATEST DETECTED DEADLOCK
130930 12:03:35
* (1) TRANSACTION:
TRANSACTION 918EEC3B, ACTIVE 37 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 41 lock struct(s), heap size 6960, 50 row lock(s), undo log entries 6
MySQL thread id 51899, OS thread handle 0x7f9774169700, query id 2583719 xxx.xx.xxx.47 dbxxx Updating
UPDATE m17_index_process SET started_at = '2013-09-30 10:03:36' WHERE (process_id='8')
* (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 594 page no 3 n bits 208 index PRIMARY of table xxx.xx.xxx.47 dbxxx.m17_index_process trx id 918EEC3B lock_mode X locks rec but not gap waiting
* (2) TRANSACTION:
TRANSACTION 918EE3E7, ACTIVE 72 sec starting index read
mysql tables in use 1, locked 1
680 lock struct(s), heap size 80312, 150043 row lock(s), undo log entries 294
MySQL thread id 51642, OS thread handle 0x7f8a336c7700, query id 2586254 xxx.xx.xxx.47 dbxxx Updating
UPDATE m17_index_process SET started_at = '2013-09-30 10:03:40' WHERE (process_id='8')
(2) HOLDS THE LOCK(S):
RECORD LOCKS space id 594 page no 3 n bits 208 index PRIMARY of table dbxxx.m17_index_process trx id 918EE3E7 lock mode S locks rec but not gap
(2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 594 page no 3 n bits 208 index PRIMARY of table dbxxx.m17_index_process trx id 918EE3E7 lock_mode X locks rec but not gap waiting
* WE ROLL BACK TRANSACTION (1)
Best Regards.
Rasmus

The deadlocks seem to be caused by the indexing processes. Try disabling automatic indexing (see "Magento - Programmatically Disable Automatic Indexing") and running the indexers manually.
Also try disabling cron for some time and check whether the issue recurs.
It's possible that many store admins are saving products from different stores. In that case, product saves may be deadlocking with the index processes.
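To see which transaction is the heavy one, the deadlock report from the question can be summarised with a few lines of Python. This is only a rough sketch: the `summarize` helper is a hypothetical name, and the report below is an excerpt with the lock details left out.

```python
import re

# Excerpt of the report from the question (lock details omitted for brevity).
REPORT = """\
* (1) TRANSACTION:
TRANSACTION 918EEC3B, ACTIVE 37 sec starting index read
LOCK WAIT 41 lock struct(s), heap size 6960, 50 row lock(s), undo log entries 6
UPDATE m17_index_process SET started_at = '2013-09-30 10:03:36' WHERE (process_id='8')
* (2) TRANSACTION:
TRANSACTION 918EE3E7, ACTIVE 72 sec starting index read
680 lock struct(s), heap size 80312, 150043 row lock(s), undo log entries 294
UPDATE m17_index_process SET started_at = '2013-09-30 10:03:40' WHERE (process_id='8')
"""

def summarize(report):
    """Return one dict per transaction: id, seconds active, row locks held."""
    txns = []
    for line in report.splitlines():
        m = re.match(r"TRANSACTION (\w+), ACTIVE (\d+) sec", line)
        if m:
            txns.append({"id": m.group(1), "active_sec": int(m.group(2)), "row_locks": 0})
            continue
        m = re.search(r"(\d+) row lock\(s\)", line)
        if m and txns:
            txns[-1]["row_locks"] = int(m.group(1))
    return txns

summary = summarize(REPORT)
for t in summary:
    print(t)
```

Transaction (2) has been open for 72 seconds and holds 150043 row locks, which fits a long-running indexer transaction colliding with a much smaller one on the same m17_index_process row.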
Thanks

Related

Root cause of deadlock?

I see the details below for one of the deadlocks detected in the Oracle 12g trace files, but I don't understand why the deadlock is happening here.
A deadlock happens when thread 1 acquires a lock on table 1 (or some of its rows) and waits for rows in table 2, while at the same time thread 2 holds a lock on rows in table 2 and waits for rows in table 1.
But I do not see which session acquired the lock on which table and which resource it is waiting for. Can anyone help me work out which objects got locked here and what caused it?
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00290010-00015F75-00000000-00000000 295 1200 X 288 10 X
TX-00570012-00005D9B-00000000-00000000 288 10 X 295 1200 X
session 1200: DID 0001-0127-00014421 session 10: DID 0001-0120-00016BD1
session 10: DID 0001-0120-00016BD1 session 1200: DID 0001-0127-00014421
Rows waited on:
Session 1200: obj - rowid = 00051348 - BABRNIAARAAKfNLAAl
...
Session 10: obj - rowid = 000514F2 - BABRTyAAJAAKWbIAAY
....
----- Information for the OTHER waiting sessions -----
....
current SQL:
update employee set name=:1
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=5dfr2prw60rh1) -----
update department set address =:1 where id=:1
===================================================
Your output says the current session is trying to update a locked record in the department table (the "information for THIS session" output). The other session is trying to update every employee record (the "information for the OTHER waiting sessions" output). The current session must have updated a record in the employee table, blocking the other session, while the other session updated the record the current session is trying to update.
I assume this is some sort of exercise to cause a deadlock, since you're setting every employee record to the same name.
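The "Deadlock graph" section can be read as a wait-for graph: each row names a resource, the session holding it (blocker) and the session waiting for it (waiter). A minimal sketch, with the two rows from the trace typed in by hand and hypothetical helper names, shows the cycle:

```python
# Rows from the "Deadlock graph" above: (resource, blocker session, waiter session).
GRAPH = [
    ("TX-00290010-00015F75-00000000-00000000", 1200, 10),
    ("TX-00570012-00005D9B-00000000-00000000", 10, 1200),
]

def wait_for(graph):
    """Map each waiting session to the session it waits on."""
    return {waiter: blocker for _resource, blocker, waiter in graph}

def find_cycle(edges, start):
    """Follow waits from `start`; return the cycle as a list, or None."""
    path = [start]
    current = start
    while current in edges:
        current = edges[current]
        if current == start:
            return path + [start]
        if current in path:
            return None  # a cycle exists but does not pass through `start`
        path.append(current)
    return None

edges = wait_for(GRAPH)
cycle = find_cycle(edges, 1200)
print(cycle)  # [1200, 10, 1200]
```

Session 1200 waits on session 10, which in turn waits on session 1200. Each TX resource is a transaction, so each session is waiting for a row that the other session's transaction has already locked.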

Cassandra timing out when queried for key that have over 10,000 rows even after giving timeout of 10sec

I'm using DataStax Community v 2.1.2-1 (AMI v 2.5) with the preinstalled default settings.
I have this table:
CREATE TABLE notificationstore.note (
user_id text,
real_time timestamp,
insert_time timeuuid,
read boolean,
PRIMARY KEY (user_id, real_time, insert_time))
WITH CLUSTERING ORDER BY (real_time DESC, insert_time ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND default_time_to_live = 20160
Other configuration details:
I have 2 nodes on m3.large instances, each having 1 x 32 (SSD).
I'm getting timeouts on this particular table even when consistency is set to ONE.
I increased the heap space to 3 GB (RAM size is 8 GB).
I increased the read timeout to 10 seconds.
select count (*) from note where user_id = 'xxx' limit 2; // errors={}, last_host=127.0.0.1.
I am wondering whether the problem could be the time to live, or whether there is any other configuration or tuning that matters here.
The amount of data in the database is pretty small.
Also, the problem does not occur right after inserting. It happens after some time (more than 6 hours).
Thanks.
[Copying my answer from here because it's the same environment/problem: amazon ec2 - Cassandra Timing out because of TTL expiration.]
You're running into a problem where the number of tombstones (deleted values) is passing a threshold, and then timing out.
You can see this if you turn on tracing and then try your select statement, for example:
cqlsh> tracing on;
cqlsh> select count(*) from test.simple;
activity | timestamp | source | source_elapsed
---------------------------------------------------------------------------------+--------------+--------------+----------------
...snip...
Scanned over 100000 tombstones; query aborted (see tombstone_failure_threshold) | 23:36:59,324 | 172.31.0.85 | 123932
Scanned 1 rows and matched 1 | 23:36:59,325 | 172.31.0.85 | 124575
Timed out; received 0 of 1 responses for range 2 of 4 | 23:37:09,200 | 172.31.13.33 | 10002216
You're kind of running into an anti-pattern for Cassandra where data is stored for just a short time before being deleted. There are a few options for handling this better, including revisiting your data model if needed. Here are some resources:
The cassandra.yaml configuration file - See section on tombstone settings
Cassandra anti-patterns: Queues and queue-like datasets
About deletes
For your sample problem, I tried lowering the gc_grace_seconds setting to 300 (5 minutes). That causes the tombstones to be cleaned up more frequently than with the default of 10 days, but that may or may not be appropriate for your application. Read up on the implications of deletes and adjust as needed.
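One detail worth checking against the table definition in the question: default_time_to_live is expressed in seconds. A quick calculation (a sketch only; the idea that the value was meant as minutes is a guess, not something the question states) shows why the timeouts start some hours after the inserts:

```python
TTL_SECONDS = 20160  # default_time_to_live from the CREATE TABLE in the question

# Cassandra interprets the value as seconds.
ttl_hours = TTL_SECONDS / 3600
print(f"rows expire after {ttl_hours:.1f} hours")

# Guess: if the value was meant as minutes, the intended lifetime would be:
intended_days = TTL_SECONDS / (60 * 24)
print(f"20160 minutes would be {intended_days:.0f} days")
```

Rows start expiring roughly 5.6 hours after they are written, which is in line with the problem showing up "after some time (more than 6 hours)": from then on every read of the partition has to scan the expired cells as tombstones.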

Finding cause of deadlock error from oracle trace file

I have been getting this "ora-00060 deadlock detected while waiting for resource" error often in my application when multiple users are using it. I got the trace file from the Oracle admin, but I need help reading it. Below are parts of the trace file, which I hope will help in locating the cause.
*** 2013-06-25 09:37:35.324
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a deadlock due
to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TM-000151a2-00000000 210 72 SX SSX 208 24 SX SSX
TM-000151a2-00000000 208 24 SX SSX 210 72 SX SSX
session 72: DID 0001-00D2-000000C6 session 24: DID 0001-00D0-00000043
session 24: DID 0001-00D0-00000043 session 72: DID 0001-00D2-000000C6
Rows waited on:
Session 72: no row
Session 24: no row
----- Information for the OTHER waiting sessions -----
Session 24:
sid: 24 ser: 45245 audsid: 31660323 user: 90/USER
flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
pid: 208 O/S info: user: zgrid, term: UNKNOWN, ospid: 2439
image: oracle#xyz.local
client details:
O/S info: user: , term: , ospid: 1234
machine: xyz.local program:
current SQL:
delete from EMPLOYEE where EMP_ID=:1
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=dyfg1wd8xa9qt) -----
delete from EMPLOYEE where EMP_ID=:1
===================================================
I would appreciate it if someone could tell me what the "Deadlock graph" is saying. Also, the "Rows waited on" section says no row.
I also read in some blogs that the "sqltxt" section of the trace file can suggest the cause. Below is the query I see in that section.
select /*+ all_rows */ count(1) from "USERS"."EMPLOYEE_SALARY" where EMPSAL_EMP_ID=:1
The employee_salary table has a foreign key constraint on the EMPSAL_EMP_ID column.
The SQL hint says "all_rows", so does it mean that this table gets a table-level lock when deleting records from the employee table? I don't currently have an index on the foreign key column. Would adding an index on this column help?
Please let me know if any more information is needed.
Thanks
First of all, a select statement never locks anything in Oracle; it just reads the last available consistent version of the data. The exception is select ... for update, which locks rows the way an update does (since Oracle 9i), but there is no for update clause in the query from the question.
Resource Name process session holds waits process session holds waits
TM-000151a2-00000000 210 72 SX SSX 208 24 SX SSX
Session #72 holds a table-level lock (TM) of type "Row Exclusive" (SX) and wants to acquire a "Share Row Exclusive" (SSX) lock on the same table. This session is blocked by Session #24, which already holds a table-level lock of the same type (SX) and is itself waiting for an SSX lock to become available.
Resource Name process session holds waits process session holds waits
TM-000151a2-00000000 208 24 SX SSX 210 72 SX SSX
This second row shows exactly the same situation in the opposite direction: Session #24 is waiting for an SSX lock to become available, but is blocked by Session #72, which already holds an SX lock on the same table.
So Session #24 and Session #72 block each other: a deadlock.
Both lock types (SX and SSX) are table-level locks.
To understand the situation, I recommend reading this article by Franck Pachot.
Below is a quotation from the article that is directly relevant to your situation (note that the abbreviations SSX and SRX are equivalent):
Referential integrity also acquires TM locks. For example, the common
issue with unindexed foreign keys leads to S locks on child table when
you issue a delete, or update on the key, on the parent table. This is
because without an index, Oracle has no single lower level resource to
lock in order to prevent a concurrent insert that can violate the
referential integrity.
When the foreign key columns are the leading
columns in a regular index, then the first index entry with the parent
value can be used as a single resource and locked with a row level TX
lock.
And what if referential integrity has an on delete cascade? In
addition to the S mode, there is the intention to update rows in the
child table, as with Row X (RX) mode. This is where the share row
exclusive (SRX) occurs: S+RX=SRX.
So the most probable explanation is that Session #72 and Session #24 delete rows from the EMPLOYEE table at the same time, and that there is an on delete cascade constraint on EMPSAL_EMP_ID combined with the absence of an index on the EMPLOYEE_SALARY table in which the EMPSAL_EMP_ID column is listed first.
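Why two sessions that each hold SX and each want SSX can never make progress follows from Oracle's table-lock compatibility rules. The sketch below encodes those rules as a small lookup table; `can_grant` is a hypothetical helper, not an Oracle API.

```python
# Table-lock (TM) mode compatibility as documented by Oracle.
# RS = row share, RX = row exclusive (SX in the trace), S = share,
# SRX = share row exclusive (SSX in the trace), X = exclusive.
COMPATIBLE = {
    "RS": {"RS", "RX", "S", "SRX"},
    "RX": {"RS", "RX"},
    "S": {"RS", "S"},
    "SRX": {"RS"},
    "X": set(),
}

def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every mode other sessions hold."""
    return all(held in COMPATIBLE[requested] for held in held_by_others)

# Both sessions can hold RX (SX) on the table at the same time.
print(can_grant("RX", ["RX"]))   # True
# Neither can upgrade to SRX (SSX) while the other still holds RX.
print(can_grant("SRX", ["RX"]))  # False
```

Each session's SSX request is blocked by the other's SX lock, and neither releases its SX lock until its own statement finishes, which is the cycle shown in the deadlock graph.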

Is a deadlock possible when updating and deleting different rows in a table?

In Oracle 10+ versions, can an update and a delete on the same table cause deadlocks even if they operate concurrently on different rows of that table?
The table has a primary key made up of two columns and has no FK associated with or referenced by any other table. There is no parent/child relationship with any other table.
I believe it should not create a deadlock, but I'm facing an issue in my application.
Adding the Oracle trace:
The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL. The following information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-0007003e-0081d6c3 45 790 X 104 20 X
TX-00080043-0085e6be 104 20 X 45 790 X
session 790: DID 0001-002D-000035F9 session 20: DID 0001-0068-000007F6
session 20: DID 0001-0068-000007F6 session 790: DID 0001-002D-000035F9
Rows waited on:
Session 790: obj - rowid = 0000F0C8 - AAAPDIAAMAAAEfIAAA
(dictionary objn - 61640, file - 12, block - 18376, slot - 0)
Session 20: obj - rowid = 0000F0C8 - AAAPDIAAMAAAEfGAAA
(dictionary objn - 61640, file - 12, block - 18374, slot - 0)
----- Information for the OTHER waiting sessions -----
Session 20:
sid: 20 ser: 4225 audsid: 57496371 user: 72/RPT_TABLE
flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
pid: 104 O/S info: user: oracle, term: UNKNOWN, ospid: 20798
image: oracle#caidb10p-node1
client details:
O/S info: user: gtsgen, term: unknown, ospid: 1234
machine: caiapp08p-node0.nam.nsroot.net program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
delete from RPT_TABLE.TEMP_TABLE_T1 where TEMP_T1_ID=:1
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=bsaxpc2bdps9q) -----
UPDATE RPT_TABLE.TEMP_TABLE_T1 temp1 SET temp1.CLIENT_ID = (SELECT MIN(INVMAP.CLIENT_ID) FROM LI_REF.REF_CLIENT_MAP INVMAP WHERE INVMAP.F_CODE = :B2 AND INVMAP.AID = temp1.ID AND temp1.R_ID=:B1 )
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
45887d750 24 procedure RPT_TABLE.T1_UPDATE_StoredProc
6399ba188 1 anonymous block
If you could update your question with the deadlock graph, that would be useful information. (When your application encounters a deadlock, Oracle will raise an ORA-00060, and a tracefile will be written to the user_dump_dest.) If you look in the trace file, you'll find a section called the "Deadlock Graph". If you can post that, and also post the statement that caused the deadlock and other statements involved in the deadlock, then we can begin to draw some conclusions. (All the information I requested is available in the trace file.)
As Alessandro mentioned, it's possible for sessions locking different rows in the same table to deadlock due to unindexed foreign keys on the child table of a parent/child relationship. It's also possible to have deadlocks between two sessions updating different rows of the same table, even if the table is not part of a parent/child relationship, if, for example, the table has a shortage of ITL entries.
Again, post the information requested above, and I'm confident we can determine the root cause of your deadlock.
Added on 7/30/2012:
Adding the following, now that the deadlock trace file has been supplied:
Ok, first off, based on the trace file contents, this is a simple deadlock due to sessions overlapping/colliding on the rows they are trying to lock. Despite your previous comments about the deadlock being on different rows, I'm here to tell you that this particular deadlock is due to row-level locking on the same rows.
The deadlock graph shows that the lock is held in mode 'X' (exclusive) and waited on in mode 'X', which tells me this is simple row-level locking.
In this case, SID 20 is executing "delete from RPT_TABLE.TEMP_TABLE_T1 where TEMP_T1_ID=:1" and already has a lock on rowid AAAPDIAAMAAAEfIAAA.
Meanwhile, SID 790 is executing "RPT_TABLE.T1_UPDATE_StoredProc", while already holding a lock on rowid AAAPDIAAMAAAEfGAAA.
Note from the "Rows waited on" section of the tracefile, that SID 20 is waiting on the row that SID 790 holds and SID 790 is waiting on the row that SID 20 is holding. This is a classic deadlock.
Some additional information:
Enqueue type is TX (see the deadlock graph), so, this is definitely not locking due to unindexed foreign keys. If it were locking due to unindexed FKs, the enqueue type would be TM, not TX. (There is at least one other case where TM enqueues are involved, and it's not unindexed FKs. So, don't assume that TM enqueue always means unindexed FKs.)
The mode the lock is being waited on is 'X' (exclusive), so this is row-level locking. If the mode waited on was 'S' (shared), then it would not be row-level locking. Rather, it could be ITL shortage or PK or UK enforcement.
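The two rules of thumb above can be written down as a small lookup. This is only a sketch of the reasoning in this answer, with a hypothetical `classify` helper; it is not an exhaustive diagnosis of every enqueue type.

```python
def classify(enqueue_type, mode_waited):
    """Apply the rules of thumb above to one deadlock-graph row."""
    if enqueue_type == "TX" and mode_waited == "X":
        return "row-level lock conflict on the same rows"
    if enqueue_type == "TX" and mode_waited == "S":
        return "not row-level: possibly ITL shortage or PK/UK enforcement"
    if enqueue_type == "TM":
        return "table-level lock: check for unindexed foreign keys, among other causes"
    return "unknown: not covered by these rules"

# Both rows in the deadlock graph above are TX enqueues waited on in mode X.
print(classify("TX", "X"))
```

For the trace in the question, both rows fall into the first case.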
Hope that helps!
I don't know if you have foreign keys involved in your application but it could probably be the source of your locks. If so take a look at these links:
http://docs.oracle.com/cd/E11882_01/server.112/e16508/consist.htm#BABCAHDJ
http://docs.oracle.com/cd/E11882_01/server.112/e16508/datainte.htm#CNCPT1657
Oracle Database maximizes the concurrency control of parent keys in relation to dependent foreign keys. Locking behaviour depends on whether foreign key columns are indexed. If foreign keys are not indexed, then the child table will probably be locked more frequently, deadlocks will occur, and concurrency will be decreased. For this reason foreign keys should almost always be indexed. The only exception is when the matching unique or primary key is never updated or deleted.
Locks and Unindexed Foreign Keys
When both of the following conditions are true, the database acquires a full table lock on the child table:
No index exists on the foreign key column of the child table.
A session modifies a primary key in the parent table (for example, deletes a row or modifies primary key attributes) or merges rows into the parent table. Inserts into the parent table do not acquire table locks on the child table.
If this is not your case, try to provide more information about it. Tell us about the kinds of locks held/requested by the sessions, and take a look at the system views V$LOCK, V$LOCKED_OBJECT, DBA_DDL_LOCKS, DBA_DML_LOCKS or V$SESSION_WAIT.

Oracle insert performs too long

I'm confused about how long Oracle 10g XE takes to perform an insert. I implemented a bulk insert from an XML file into several tables with programmatic transaction management. Why does one insert complete in a moment while another takes more than 10 minutes? I can't wait any longer, so I stop it. I think there's something more complex that I haven't paid attention to yet.
Update:
I found lock using Monitor.
Waits
Event enq: TX - row lock contention
name|mode 1415053316
usn<<16 | slot 327711
sequence 162
SQL
INSERT INTO ESKD$SERVICESET (ID, TOUR_ID, CURRENCY_ID) VALUES (9, 9, 1)
What does it mean and how should I resolve it?
TX- Enqueues are well known and a quick google will give you a clear answer.
From that article:
1) Waits for TX in mode 6 occurs when a session is waiting for a row level lock that is already held by another session. This occurs when one user is updating or deleting a row, which another session wishes to update or delete. This type of TX enqueue wait corresponds to the wait event enq: TX - row lock contention.
If you have lots of simultaneous inserts and updates to a table, you want each transaction to be as short as possible. Get in, get out... the longer things sit in between, the longer the delays for OTHER transactions.
PURE GUESS:
I have a feeling that your mention of programmatic transaction management means you're trying to use a table like a QUEUE: inserting a start record, updating it frequently to change the status, and then deleting the 'finished' ones. That is always trouble.
This question will be really hard to answer with so little specific information. All that I can tell you is why this could be.
If you are doing an INSERT ... SELECT ... bulk insert, then perhaps your SELECT query is performing poorly. There may be a large number of table joins, inefficient use of inline views, or other factors that negatively impact the performance of your INSERT.
Try running your SELECT query through Explain Plan to see how the optimizer derives the plan and to evaluate the cost of the query.
The other thing you mentioned was a possible lock. This could be the case; however, you will need to analyze it with the OEM tool to tell for sure.
Another thing to consider may be that you do not have indexes on your tables OR the statistics on these tables may be out of date. Out of date statistics can GREATLY impact the performance of queries on large tables.
see sites.google.com/site/embtdbo/wait-event-documentation/oracle-enqueues
The locking wait indicates a conflict that could easily be the cause of your performance issues. On the surface it looks likely that the problem is inserting a duplicate key value while the first insert of that key value has not yet committed. The lock you see, "enq: TX - row lock contention", happens because one session is trying to modify uncommitted data from another session. There are 4 common reasons for this particular lock wait event:
update/delete of the same row
inserting the same unique key
modifying the same bitmap index chunk
deleting/updating a parent value to a foreign key
We can eliminate the first and last cases, since you are doing an insert.
You should be able to identify the 2nd if you have no bitmap indexes involved. If you have both bitmap indexes and unique keys involved, you could investigate easily if you had Active Session History (ASH) data, but unfortunately Oracle XE doesn't. On the other hand, you can collect it yourself with S-ASH, see: http://ashmasters.com/ash-simulation/ . With ASH or S-ASH you can run a query like
col event for a22
col block_type for a18
col objn for a18
col otype for a10
col fn for 99
col sid for 9999
col bsid for 9999
col lm for 99
col p3 for 99999
col blockn for 99999
select
to_char(sample_time,'HH:MI') st,
substr(event,0,20) event,
ash.session_id sid,
mod(ash.p1,16) lm,
ash.p2,
ash.p3,
nvl(o.object_name,ash.current_obj#) objn,
substr(o.object_type,0,10) otype,
CURRENT_FILE# fn,
CURRENT_BLOCK# blockn,
ash.SQL_ID,
BLOCKING_SESSION bsid
--,ash.xid
from v$active_session_history ash,
all_objects o
where event like 'enq: TX %'
and o.object_id (+)= ash.CURRENT_OBJ#
Order by sample_time
/
Which would output something like:
ST EVENT SID LM P2 P3 OBJ OTYPE FN BLOCKN SQL_ID BSID
10:41 enq: TX - row lock c 143 4 966081 4598 I1 INDEX 0 0 azav296xxqcjx 144
10:41 enq: TX - row lock c 143 4 966081 4598 I1 INDEX 0 0 azav296xxqcjx 144
10:41 enq: TX - row lock c 143 4 966081 4598 I1 INDEX 0 0 azav296xxqcjx 144
10:41 enq: TX - row lock c 143 4 966081 4598 I1 INDEX 0 0 azav296xxqcjx 144
showing the object name (OBJ) and the object type (OTYPE) on which the contention occurs; here the type is INDEX. From there you could look up the type of index to verify whether it is bitmap.
If the problem is a bitmap index, then you should probably re-evaluate using bitmap indexes, or revisit the way the data is loaded and/or modified to reduce conflicts.
If the problem isn't BITMAP indexes, then it's trying to insert a duplicate key. Some other process had inserted the same key value and not yet committed. Then your process tries to insert the same key value and has to wait for the first session to commit or rollback.
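The wait parameters posted in the question already carry the lock mode, so it can be checked without ASH. For enqueue waits, P1 ("name|mode") packs the two-letter lock name into the high bytes and the requested mode into the low bits, and P2 of a TX enqueue is "usn<<16 | slot". A small sketch with hypothetical helper names decodes the values from the question:

```python
def decode_p1(p1):
    """Split an enqueue wait's P1 ("name|mode") into lock name and mode."""
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return name, mode

def decode_p2(p2):
    """Split a TX enqueue's P2 ("usn<<16 | slot") into undo segment and slot."""
    return p2 >> 16, p2 & 0xFFFF

name, mode = decode_p1(1415053316)
usn, slot = decode_p2(327711)
print(name, mode)  # TX 4
print(usn, slot)   # 5 31
```

The session is waiting on a TX enqueue in mode 4 (share), not mode 6 (exclusive). A mode 4 wait fits the unique key and bitmap index cases above, while a wait for a row that another session is updating or deleting would show mode 6.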
For more information see this link: lock waits
It means your sequence cache is too small. Increase it.
