Soft version as follows:
apache hbase 2.1.6
apache flink 1.13.6
apache hadoop 3.1.1
When I use the hbase-client api to access hbase, I get the following error:
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=16, exceptions:
Wed Sep 28 03:03:11 UTC 2022, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68532: java.io.IOException: Invalid currTagsLen -32239. Block offset: 1319713, block length: 99991, position: 42422 (without header). path=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/cd083a4a1ef04baff94ebb5aabdb8cb8/i/1f6dd8a1bc054eefbc9faa1bf625e24f
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:472)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: java.lang.IllegalStateException: Invalid currTagsLen -32239. Block offset: 1319713, block length: 99991, position: 42422 (without header). path=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/cd083a4a1ef04baff94ebb5aabdb8cb8/i/1f6dd8a1bc054eefbc9faa1bf625e24f
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.checkTagsLen(HFileReaderImpl.java:642)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readKeyValueLen(HFileReaderImpl.java:630)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1080)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:153)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6581)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6745)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6518)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3155)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3404)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42190)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
... 3 more
The exception for hbase regionserver is as follows:
2022-09-28 11:19:36,019 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started
2022-09-28 11:20:20,946 INFO [MemStoreFlusher.0] regionserver.HRegion: Flushing 1/1 column families, dataSize=1.95 MB heapSize=2.09 MB
2022-09-28 11:20:20,969 INFO [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed memstore data size=1.95 MB at sequenceid=8934625 (bloomFilter=true), to=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d5468
55f/.tmp/i/2629dbae7d5e402489ef56b1c097289f
2022-09-28 11:20:20,977 INFO [MemStoreFlusher.0] regionserver.HStore: Added hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/2629dbae7d5e402489ef56b1c097289f, entries=1212, sequenceid=8934625, filesize=359.
1 K
2022-09-28 11:20:20,978 INFO [MemStoreFlusher.0] regionserver.HRegion: Finished flush of dataSize ~1.95 MB/2041026, heapSize ~2.09 MB/2190200, currentSize=0 B/0 for e63ee2269b0b076a415c5f76d546855f in 32ms, sequenceid=8934625, compaction requested=true
2022-09-28 11:20:20,986 INFO [regionserver/bghbaseclusterdn9528:16020-shortCompactions-1664173471436] regionserver.HRegion: Starting compaction of i in expose,9ffffff6,1663741391432.e63ee2269b0b076a415c5f76d546855f.
2022-09-28 11:20:20,986 INFO [regionserver/bghbaseclusterdn9528:16020-shortCompactions-1664173471436] regionserver.HStore: Starting compaction of [hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/98d0ecd1ed
7744a8a5f94923c382861e, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/30bab1682dba4721b25e58b78dd17255, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/f8
0c2f08176e417a9184f434d4300935, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/52baca576c154c26b7df3b5d126d47b8, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d5468
55f/i/7d8291d422d042de9aa43aa5b79da6ad, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/8bf3b47909ab4eeb86d8a5c283cfe942, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5
f76d546855f/i/0663d48a4ed94dbe9fdc78f6649c1eb3, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/b80b55d744174bc882db93283cd70c71] into tmpdir=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e
63ee2269b0b076a415c5f76d546855f/.tmp, totalSize=18.9 M
2022-09-28 11:20:21,153 INFO [regionserver/bghbaseclusterdn9528:16020-shortCompactions-1664173471436] throttle.PressureAwareThroughputController: e63ee2269b0b076a415c5f76d546855f#i#compaction#637 average throughput is 122.45 MB/second, slept 0 time(s) and
total slept time is 0 ms. 0 active operations remaining, total limit is 61.86 MB/second
2022-09-28 11:20:21,159 ERROR [regionserver/bghbaseclusterdn9528:16020-shortCompactions-1664173471436] regionserver.CompactSplit: Compaction failed region=expose,9ffffff6,1663741391432.e63ee2269b0b076a415c5f76d546855f., storeName=i, priority=73, startTime=
1664335220978
java.lang.IllegalStateException: Invalid currTagsLen -9. Block offset: 1677972, block length: 161891, position: 48652 (without header). path=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/b80b55d744174bc88
2db93283cd70c71
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.checkTagsLen(HFileReaderImpl.java:642)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readKeyValueLen(HFileReaderImpl.java:630)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1080)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:388)
at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:327)
at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1410)
at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2187)
at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:596)
at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:638)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2022-09-28 11:20:25,000 INFO [RpcServer.default.FPBQ.Fifo.handler=18,queue=3,port=16020] regionserver.HRegion: writing data to region expose,9ffffff6,1663741391432.e63ee2269b0b076a415c5f76d546855f. with WAL disabled. Data may be lost in the event of a cra
sh.
2022-09-28 11:24:01,565 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.08 GB, freeSize=2.52 GB, max=3.60 GB, blockCount=17155, accesses=133155383, hits=132992986, hitRatio=99.88%, , cachingAccesses=132985682, cachingHits=132951576, cac
hingHitsRatio=99.97%, evictions=16199, evicted=0, evictedPerRun=0.0
2022-09-28 11:24:01,569 INFO [MobFileCache #0] mob.MobFileCache: MobFileCache Statistics, access: 0, miss: 0, hit: 0, hit ratio: 0%, evicted files: 0
2022-09-28 11:24:05,246 INFO [regionserver/bghbaseclusterdn9528:16020.logRoller] wal.AbstractFSWAL: Rolled WAL /apps/hbase/data/WALs/bghbaseclusterdn9528,16020,1664173440239/bghbaseclusterdn9528%2C16020%2C1664173440239.1664331845190 with entries=21, files
ize=5.39 KB; new WAL /apps/hbase/data/WALs/bghbaseclusterdn9528,16020,1664173440239/bghbaseclusterdn9528%2C16020%2C1664173440239.1664335445235
I found some solutions in code. such as HBASE-21507、HBASE-24515、HBASE-21775
Performing a simple query on a small sample dataset (195 rows, 22 columns) either throws an out of memory exception, or, following many suggestions to increase memory sizes, never ends.
Options tried
set hive.optimize.sort.dynamic.partition = true
increase tez memory
increase memory & decrease shuffle size
increase memory
more like that
Sometimes the OOM error is gone, but then it runs for hours without any result...
Query
select * lag(status, 1, null) over (partition by type_id order by time) as status_prev from sample_table
Example query that never stops
hive -hiveconf hive.tez.container.size=2048 -hiveconf hive.tez.java.opts=-Xmx1640m -hiveconf tez.runtime.io.sort.mb=820 -hiveconf tez.runtime.unordered.output.buffer.size-mb=205 -e "select * lag(status, 1, null) over (partition by type_id order by time) as status_prev from sample_table"
Out of memory
Status: Running (Executing on YARN cluster with App id application_1473144435077_0015)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 FAILED 1 0 0 1 4 0
Reducer 2 KILLED 1 0 0 1 0 0
--------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 18.30 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1473144435077_0015_1_00, diagnostics=[Task failed, taskId=task_1473144435077_0015_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:159)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:172)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:116)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:142)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:142)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
... 14 more
Never stops
(33 secs is example, doesnt stop in hours)
Status: Running (Executing on YARN cluster with App id application_1473144435077_0025)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 INITED 1 0 0 1 0 0
Reducer 2 INITED 1 0 0 1 0 0
--------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 33.32 s
--------------------------------------------------------------------------------
Took me too long to find the answer, hopefully this will help someone else...
So this breaks down to 2 problems:
Heap size too small, solved by increasing heap size
Hive job stuck in pending status
The following command solved my issues
hive -hiveconf hive.tez.container.size=512 -hiveconf hive.tez.java.opts="-server -Xmx512m -Djava.net.preferIPv4Stack=true" -e "select * lag(status, 1, null) over (partition by type_id order by time) as status_prev from sample_table"
Source
have a batch process and a regular application that updates the same table.My batch have multiple threads that run on multiple sessions. I got the following errors in my batch Tomcat:
2012-09-10 11:30:17,043 [SyncDataThread567] ERROR org.springframework.batch.core.step.AbstractStep - Encountered an error executing the step
aaa.bbb.ccc.framework.orm.DAOException:
--- The error occurred in abc.xml.
--- The error occurred while applying a parameter map.
--- Check the ear.updateServiceTimeParamMap.
--- Check the statement (update procedure failed).
--- Cause: java.sql.SQLException: ORA-20011: FUNC_UPDATESERVICETIME : Error occured ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "ER.FUNC_UPDATESERVICETIME", line 154
ORA-06512: at line 1
at aaa.bbb.ccc.ddd.eee.Sss.updateServiceTimes(ServiceOrderDAOImpl.java:76)
at sun.reflect.GeneratedMethodAccessor352.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy6.updateServiceTimes(Unknown Source)
at aaa.bbb.ccc.ddd.eeee.Inbddd.updateServiceTimes(InbDataWriter.java:144)
at aaa.bbb.ccc.ddd.eeee.Inbddd.write(InbDataWriter.java:74)
at sun.reflect.GeneratedMethodAccessor270.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy7.write(Unknown Source)
at org.springframework.batch.core.step.item.SimpleChunkProcessor.writeItems(SimpleChunkProcessor.java:171)
at org.springframework.batch.core.step.item.SimpleChunkProcessor.doWrite(SimpleChunkProcessor.java:150)
at org.springframework.batch.core.step.item.SimpleChunkProcessor.write(SimpleChunkProcessor.java:268)
at org.springframework.batch.core.step.item.SimpleChunkProcessor.process(SimpleChunkProcessor.java:194)
at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:74)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:109)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:107)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at org.springframework.core.task.SimpleAsyncTaskExecutor$ConcurrencyThrottlingRunnable.run(SimpleAsyncTaskExecutor.java:192)
at java.lang.Thread.run(Thread.java:619)
Caused by: com.ibatis.common.jdbc.exception.NestedSQLException:
--- The error occurred in ael.xml.
--- The error occurred while applying a parameter map.
--- Check the eraa.updateServiceTimeParamMap.
--- Check the statement (update procedure failed).
--- Cause: java.sql.SQLException: ORA-20011: FUNC_UPDATESERVICETIME : Error occured ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "ER.FUNC_UPDATESERVICETIME", line 154
ORA-06512: at line 1
at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.executeQueryWithCallback(MappedStatement.java:201)
at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.executeQueryForObject(MappedStatement.java:120)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:518)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:493)
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryForObject(SqlMapSessionImpl.java:106)
at com.iit.integration.erl.orm.ServiceOrderDAOImpl.updateServiceTimes(ServiceOrderDAOImpl.java:71)
... 44 more
Caused by: java.sql.SQLException: ORA-20011: FUNC_UPDATESERVICETIME : Error occured ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "ER.FUNC_IIT_UPDATESERVICETIME", line 154
ORA-06512: at line 1
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:331)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:288)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:743)
at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:215)
at oracle.jdbc.driver.T4CCallableStatement.executeForRows(T4CCallableStatement.java:954)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1168)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3285)
at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3390)
at oracle.jdbc.driver.OracleCallableStatement.execute(OracleCallableStatement.java:4223)
at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:169)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQueryProcedure(SqlExecutor.java:278)
at com.ibatis.sqlmap.engine.mapping.statement.ProcedureStatement.sqlExecuteQuery(ProcedureStatement.java:39)
at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.executeQueryWithCallback(MappedStatement.java:189)
... 49 more
This is my oracle trace file:
Redo thread mounted by this instance: 1
Oracle process number: 63
Windows thread id: 2464, image: ORACLE.EXE (SHAD)
*** 2012-09-10 11:30:12.384
*** SERVICE NAME:(SYS$USERS) 2012-09-10 11:30:12.244
*** SESSION ID:(411.3766) 2012-09-10 11:30:12.244
DEADLOCK DETECTED
[Transaction Deadlock]
Current SQL statement for this session:
UPDATE SP SET SRVC_TM = :B4 , MODIFICATION_DTM=SYSDATE WHERE OPERATION_AREA_CD = :B3 AND ROUTE_TYP = :B2 AND OBJECTID = :B1
----- PL/SQL Call Stack -----
object line object
handle number name
000000057D9B52E8 134 function ER.FUNC_UPDATESERVICETIME
000000057C3A5848 1 anonymous block
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00040020-0017465b 63 411 X 94 364 X
TX-00020020-00166804 94 364 X 63 411 X
session 411: DID 0001-003F-00000033 session 364: DID 0001-005E-00000016
session 364: DID 0001-005E-00000016 session 411: DID 0001-003F-00000033
Rows waited on:
Session 364: obj - rowid = 0000CC64 - AAAMxkAA2AAA1q2AAY
(dictionary objn - 52324, file - 54, block - 219830, slot - 24)
Session 411: obj - rowid = 0000CC64 - AAAMxkAA2AAA1q2AAR
(dictionary objn - 52324, file - 54, block - 219830, slot - 17)
Information on the OTHER waiting sessions:
Session 364:
pid=94 serial=6104 audsid=693767 user: 57/ER
O/S info: user: , term: , ospid: 1234, machine: abc
program:
Current SQL Statement:
UPDATE SP SET ORIG_NO='751' ,ORIG_SEQ_NO=0,SP_ROUTING_STATUS='A', USER_ID='XXXX', MODIFICATION_DTM=SYSDATE WHERE OBJECTID IN ('104883389','104883404','104883407','104883440','104883443','104883455','104883467','104883509','104883545','104883764','104883788','104883806','104883812','104883821','104883836','104883854','104883863','104883893','104883899','104883931','104883937','104883964','104884084','104884117','104884120','104884138','104884141','104885439','104883386','104883422','104883560','104883587','104883767','104883785','104883809','104883824','104883845','104883851','104883884','104883890','104883955','104883958','104884012','104884093','104884114','104885412','104885436','104885442','104885445','104883383','104883395','104883413','104883419','104883464','104883494','104883524','104883773','104883842','104883917','104883920','104883943','104883949','104883967','104883997','104884051','104884105','104884108','104885451','104883437','104883461','104883476','104883497','104883500','104883503','104883566','104883584','104883614','104883794','104883800','104883815','104883830','104883857','104883869','104883923','104883952','104884048','104884057','104884063','104884066','104884081','104884087','104884102','104884111','104884135','104885415','104885424','104885427','104886297','104886308','104883398','104883410','104883458','104883473','104883512','104883515','104883527','104883530','104883536','104883554','104883596','104883770','104883782','104883803','104883827','104883833','104883839','104883848','104883866','104883875','104883878','104883896','104883902','104883914','104883970','104883976','104884060','104884069','104884072','104884123','104884132','104885409','104885430','104883425','104883431','104883446','104883449','104883452','104883482','104883506','104883518','104883539','104883548','104883569','104883575','104883578','104883623','104883779','104883797','104883818','104883860','104883925','104883934','104883940','104883946','104883973','104883979','104883982','104884078','104884090','104884096','104885421','104885448','104885454','104883392','104883416','104883428','104883479','104883491','104883521','104883542','104883551','104883557','104883563','104883872','104883911','104883928','104883961','104883994','104884018','104884054','104884099','104884129','104886299','104883401','104883434','104883470','104883485','104883533','104883572','104883581','104883776','104883791','104883881','104883887','104883905','104883908','104884075','104884126','104885418','104885433')
End of information on OTHER waiting sessions.
===================================================
PROCESS STATE
-------------
Process global information:
process: 000000057B3343D8, call: 0000000574FCBF78, xact: 0000000576A07F60, curses: 000000057E48D858, usrses: 000000057E48D858
----------------------------------------
SO: 000000057B3343D8, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=63, calls cur/top: 0000000574FCBF78/0000000574FD4C48, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 108 0 4
last post received-location: aaa
last process to post me: 7e31d890 1 6
last post sent: 0 0 112
last post sent-location: bbb
last process posted by me: 7b334c00 3 0
(latch info) wait_event=0 bits=10
holding (efd=19) 4745310 Parent+children enqueue hash chains level=4
Location from where latch is held: cmi: gpl:
Context saved from call: 0
state=busy, wlstate=free
recovery area:
Dump of memory from 0x000000057E300810 to 0x000000057E300830
57E300810 00000000 00000000 00000000 00000000 [...............
I have been reaserching this issue from past few days. From what I saw few are saying its a indexing issue, few are saying its INITRANS... I am not sure.. But this deadlock happens very rare. But whenever it happens its a big issue.
So please help me guys.. what to look for.. and how I can solve this issue..
Look at your two UPDATE statements and try to understand why they would request the same row, but in a different order. That's how almost all
deadlocks happen.
There are several possible ways to avoid this error:
1) Update rows in the same order. You may be able to do this with a hint to force a full table scan or index. (I'm not 100% certain that using the same access method will always avoid this issue, but in practice it does seem to fix it. See my old question for a painful discussion about deadlocks when the access method is the same.)
2) Do not run your two processes at the same time.
3) Handle exceptions. For example, something like this:
declare
deadlock exception;
pragma exception_init(deadlock, -00060);
begin
<code>
exception when deadlock then
<do something about it here, such as re-try>
end;
/
You need to add the exception handling to both blocks of code. And a deadlock will still generate a trace file, which will probably
slow things down and take up a lot of space where no one expects it.
When I run apache bench I get results like:
Command: abs.exe -v 3 -n 10 -c 1 https://mysite
Connection Times (ms)
min mean[+/-sd] median max
Connect: 203 213 8.1 219 219
Processing: 78 177 88.1 172 359
Waiting: 78 169 84.6 156 344
Total: 281 389 86.7 391 564
I can't seem to find the definition of Connect, Processing and Waiting. What do those numbers mean?
By looking at the source code we find these timing points:
apr_time_t start, /* Start of connection */
connect, /* Connected, start writing */
endwrite, /* Request written */
beginread, /* First byte of input */
done; /* Connection closed */
And when request is done some timings are stored as:
s->starttime = c->start;
s->ctime = ap_max(0, c->connect - c->start);
s->time = ap_max(0, c->done - c->start);
s->waittime = ap_max(0, c->beginread - c->endwrite);
And the 'Processing time' is later calculated as
s->time - s->ctime;
So if we translate this to a timeline:
t1: Start of connection
t2: Connected, start writing
t3: Request written
t4: First byte of input
t5: Connection closed
Then the definitions would be:
Connect: t1-t2 Most typically the network latency
Processing: t2-t5 Time to receive full response after connection was opened
Waiting: t3-t4 Time-to-first-byte after the request was sent
Total time: t1-t5
From http://chestofbooks.com/computers/webservers/apache/Stas-Bekman/Practical-mod_perl/9-1-1-ApacheBench.html:
Connect and Waiting times
The amount of time it took to establish the connection and get the first bits of a response
Processing time
The server response time—i.e., the time it took for the server to process the request and send a reply
Total time
The sum of the Connect and Processing times
I equate this to:
Connect time: the amount of time it took for the socket to open
Processing time: first byte + transfer
Waiting: time till first byte
Total: Sum of Connect + Processing
Connect: Time it takes to connect to remote host
Processing: Total time minus time it takes to connect to remote host
Waiting: Response first byte receive minus last byte sent
Total: From before connect till after the connection is closed