Copying an Oracle database to another environment does not work

I am trying to restore a database from one environment to another. I backed up all the files from machineA and copied them to machineB.
The directory structures and file locations are the same on both machines, and both are running Oracle 10.2.0.3.0.
I have done this several times before and it has always worked fine, but this time I seem to be stuck. After restoring all the files onto machineB, I start up Oracle and it reports that it has started.
SQL> startup
ORACLE instance started.
Total System Global Area 1610612736 bytes
Fixed Size 2030456 bytes
Variable Size 234882184 bytes
Database Buffers 1358954496 bytes
Redo Buffers 14745600 bytes
Database mounted.
Database opened.
A few minutes later the instance just terminates. The alert log shows the following:
ALTER DATABASE MOUNT
Wed Nov 23 11:16:14 2011
Setting recovery target incarnation to 1
Wed Nov 23 11:16:14 2011
Successful mount of redo thread 1, with mount id 4202976378
Wed Nov 23 11:16:14 2011
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
Wed Nov 23 11:16:14 2011
ALTER DATABASE OPEN
Wed Nov 23 11:16:15 2011
Beginning crash recovery of 1 threads
parallel recovery started with 2 processes
Wed Nov 23 11:16:15 2011
Started redo scan
Wed Nov 23 11:16:15 2011
Completed redo scan
22887 redo blocks read, 29 data blocks need recovery
Wed Nov 23 11:16:15 2011
Started redo application at
Thread 1: logseq 29229, block 72
Wed Nov 23 11:16:15 2011
Recovery of Online Redo Log: Thread 1 Group 3 Seq 29229 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo03.log
Wed Nov 23 11:16:15 2011
Completed redo application
Wed Nov 23 11:16:16 2011
Completed crash recovery at
Thread 1: logseq 29229, block 22959, scn 10603747634124
29 data blocks read, 29 data blocks written, 22887 redo blocks read
Wed Nov 23 11:16:17 2011
Thread 1 advanced to log sequence 29230
Thread 1 opened at log sequence 29230
Current log# 1 seq# 29230 mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Successful open of redo thread 1
Wed Nov 23 11:16:17 2011
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Nov 23 11:16:17 2011
SMON: enabling cache recovery
Wed Nov 23 11:16:18 2011
Successfully onlined Undo Tablespace 1.
Wed Nov 23 11:16:18 2011
SMON: enabling tx recovery
Wed Nov 23 11:16:18 2011
Database Characterset is WE8ISO8859P1
Wed Nov 23 11:16:18 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_smon_13515.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=16, OS id=13532
Wed Nov 23 11:16:20 2011
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:16:20 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery stopped at EOT rba 29230.66.16
Block recovery completed at rba 29230.66.16, scn 2468.3768347663
Doing block recovery for file 2 block 25
Block recovery from logseq 29230, block 56 to scn 10603747634177
Wed Nov 23 11:16:20 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.58.16, scn 2468.3768347651
Wed Nov 23 11:16:20 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_smon_13515.trc:
ORA-01595: error freeing extent (3) of rollback segment (2))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Wed Nov 23 11:16:20 2011
Completed: ALTER DATABASE OPEN
Wed Nov 23 11:16:21 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_mmon_13521.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Wed Nov 23 11:16:22 2011
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:16:22 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Doing block recovery for file 2 block 25
Block recovery from logseq 29230, block 56 to scn 10603747634208
Wed Nov 23 11:16:23 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.88.16, scn 2468.3768347681
Wed Nov 23 11:18:27 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Wed Nov 23 11:18:28 2011
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:28 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:28 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:30 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:30 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:32 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:32 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:34 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:34 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-06512: at "SYS.PRVT_ADVISOR", line 4896
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-06512: at line 1
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:35 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:35 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-06512: at "SYS.PRVT_ADVISOR", line 4896
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-06512: at line 1
Wed Nov 23 11:18:36 2011
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:36 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:36 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_pmon_13503.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Wed Nov 23 11:18:37 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_pmon_13503.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
PMON: terminating instance due to error 472
Instance terminated by PMON, pid = 13503
The only difference this time (compared to the last time I restored the database) is that I cleared the trace files from the bdump directory before starting up the database (I did not touch the alert log). Could this have caused the problem?
Here is an excerpt from one of the trace files mentioned in the alert log:
/u/db1/app/oracle/admin/ccsbill/bdump/mydb_pmon_13503.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /u/db1/app/oracle/product/10.2.0/db
System name: SunOS
Node name: myPC
Release: 5.9
Version: Generic_122300-13
Machine: sun4u
Instance name: mydb
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 13503, image: oracle#myPC (PMON)
*** 2011-11-23 11:18:36.626
*** SERVICE NAME:(SYS$BACKGROUND) 2011-11-23 11:18:36.625
*** SESSION ID:(170.1) 2011-11-23 11:18:36.625
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
DEBUG: Reconstructing undo block 0x8003cc for xcb 0x3ddf94728
Doing block recovery for file 2 block 972
Block header before block recovery:
buffer tsn: 1 rdba: 0x008003cc (2/972)
scn: 0x09a4.e09bc65c seq: 0x01 flg: 0x04 tail: 0xc65c0201
frmt: 0x02 chkval: 0x409e type: 0x02=KTU UNDO BLOCK
Block recovery from logseq 29230, block 56 to scn 10603747634191
*** 2011-11-23 11:18:36.641
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
----- Redo read statistics for thread 1 -----
Read rate (ASYNC): 383Kb in 0.06s => 6.24 Mb/sec
Total physical reads: 4096Kb
Longest record: 0Kb, moves: 0/11 (0%)
Longest LWN: 1Kb, moves: 0/7 (0%), moved: 0Mb
Last redo scn: 0x09a4.e09c6c0e (10603747634190)
----------------------------------------------
Block image after block recovery:
buffer tsn: 1 rdba: 0x008003cc (2/972)
scn: 0x09a4.e09bc65c seq: 0x01 flg: 0x04 tail: 0xc65c0201
frmt: 0x02 chkval: 0x409e type: 0x02=KTU UNDO BLOCK
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00000003D13FA000 to 0x00000003D13FC000
3D13FA000 02A20000 008003CC E09BC65C 09A40104 [...........\....]
3D13FA010 409E0000 00020018 00065BF2 C49F1515 [#.........[.....]
3D13FA020 00001FE8 1F641ED8 1E041D7C 1CF81C5C [.....d.....|...\]
3D13FA030 1BC41B40 1ADC1A88 1A1C1998 192818A8 [...#.........(..]
3D13FA040 1810172C 16C4166C 15E8156C 14D40000 [...,...l...l....]
3D13FA050 00000000 00000000 00000000 00000000 [................]
Repeat 328 times
3D13FB4E0 00000000 00000000 000C0048 0020001D [...........H. ..]
3D13FB4F0 00020000 000000ED 000000ED 00000000 [................]
3D13FB500 00000000 0B011800 04080001 008003CC [................]
3D13FB510 C49F1300 E09BC00B 09A40000 E09BC011 [................]
3D13FB520 09A40001 000A0024 E09BC647 09A4FFFF [.......$...G....]
3D13FB530 008003C5 00000000 00000000 04010000 [................]
3D13FB540 00000000 00070010 0004935C 00800580 [...........\....]
3D13FB550 9B5B1600 800009A4 E09BC643 0040067A [.[.........C.#.z]
3D13FB560 00400679 12FF0501 020076C0 2C000100 [.#.y......v.,...]
3D13FB570 00001301 FFF90100 00000000 00050000 [................]
...skipping...
child# table reference handle
------ -------- --------- --------
0 3dac7ffd8 3dac7fc48 3de7ab528
DATA BLOCKS:
data# heap pointer status pins change whr
----- -------- -------- --------- ---- ------ ---
0 3de7abae8 3dac80628 I/P/A/-/- 0 NONE 00
----------------------------------------
SO: 3df696d30, type: 12, owner: 3df4091d8, flag: -/-/-/0x00
KSV Slave Class State
--------------
slave num 0, incarnation 1, KSV Context 3df694da0, creator: 3df2f5ff8
slave flags: 0x102
ksvcctx: 3df694da0 dpptr: 3df696d30 exitcond: 0 class#: 5
active: 1 spawned: 1 max: a flags: 0x2 enqueue: 0
directmsghdl: 3df4678b8 workmsghdl: 3df467928
ksvwqlr: 3df694da0 latch 3df694da0
ksvrecv: 3df694e40 op: 0x0 ro = 0 owner = 0
Queue (0)
ksvmqd: 3df694e90 count : 0
ksvwqlr: 3df694e90 latch 3df694e90
ksvrecv: 3df694f30 op: 0x0 ro = 0 owner = 0
Queue messages 3df694f50 Is Empty [3df694f50,3df694f50]
Queue (1)
ksvmqd: 3df694f68 count : 0
ksvwqlr: 3df694f68 latch 3df694f68
ksvrecv: 3df695008 op: 0x0 ro = 0 owner = 0
Queue messages 3df695028 Is Empty [3df695028,3df695028]
Queue (2)
ksvmqd: 3df695040 count : 0
ksvwqlr: 3df695040 latch 3df695040
ksvrecv: 3df6950e0 op: 0x0 ro = 0 owner = 0
Queue messages 3df695100 Is Empty [3df695100,3df695100]
dmsg: sendq: 3df696dc0 Is Empty [3df696dc0,3df696dc0]
dmsg: recvq: 3df696dd0 Is Empty [3df696dd0,3df696dd0]
dmsg: doneq: 3df696de0 Is Empty [3df696de0,3df696de0]
wmsg: workq: 3df696df0 Is Empty [3df696df0,3df696df0]
wmsg: doneq: 3df696e00 Is Empty [3df696e00,3df696e00]
Class Context: active: 1, spawned: 1, max: 10
Context Flags: 0x2, Work Queue: 3df694e90, Class Num: 5
----------------------------------------
SO: 3ddfcbbe8, type: 41, owner: 3df4091d8, flag: INIT/-/-/0x00
(dummy) nxc=0, nlb=1
----------------------------------------
SO: 3ddf46648, type: 39, owner: 3ddfcbbe8, flag: -/-/-/0x00
(List of Blocks) next index = 5
index itli buffer hint rdba savepoint
-----------------------------------------------------------
0 1 0x3d0fa00a8 0xc05534 0x6b69
1 2 0x3d0f9f2d8 0xc002ee 0x6b6b
2 2 0x3d0f97ce8 0xc002f8 0x6b6d
3 2 0x3d0f97ac8 0xc00300 0x6b6f
4 2 0x3d0f97578 0xc0894a 0x6b71
----------------------------------------
SO: 3df43ad08, type: 3, owner: 3df2f9f38, flag: INIT/-/-/0x00
(call) sess: cur 3df4091d8, rec 3df4053c8, usr 3df4091d8; depth: 0
(k2g table)
error 600 detected in background process
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []

Did you see the ORA-00600 [4194] errors? They look like this:
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_smon_13515.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
That's your problem.
An ORA-00600 always means you should work with Oracle Support on the problem. I did a quick lookup, and the [4194] error means you have undo segment corruption.
You may try redoing the clone, assuming the source database itself isn't corrupted. If the source has this problem too, you'll probably need to restore/recover the UNDO tablespace, at a minimum.
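If redoing the clone isn't an option, the commonly cited recovery path for ORA-600 [4194] is to open the database with manual undo management and rebuild the undo tablespace. This is a sketch only, with placeholder tablespace and datafile names; verify the exact steps against the MOS note referenced below, and take a cold backup of the copied files first:

```sql
-- Sketch only: typical ORA-600 [4194] recovery path. Tablespace and
-- datafile names here are placeholders, not taken from this system.
-- In a pfile, set:  undo_management = MANUAL
STARTUP PFILE='/tmp/initmydb.ora';

-- Create a fresh undo tablespace:
CREATE UNDO TABLESPACE undotbs2
  DATAFILE '/u/db1/app/oracle/oradata/mydb/undotbs02.dbf' SIZE 500M;

-- Point the instance at it:
ALTER SYSTEM SET undo_tablespace = UNDOTBS2 SCOPE=SPFILE;

-- Drop the corrupted undo tablespace (placeholder name):
DROP TABLESPACE undotbs1 INCLUDING CONTENTS AND DATAFILES;

-- Then set undo_management back to AUTO and restart the instance.
```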
I strongly suggest you log in to the My Oracle Support (MOS) site and look closely at this document:
ORA-600 [4194] "Undo Record Number Mismatch While Adding Undo Record"
[ID 39283.1]
Hope that helps.

DPC_WATCHDOG_VIOLATION (133/1) Potentially related to NdisFIndicateReceiveNetBufferLists?

We have an NDIS LWF driver, and on a single machine we get a DPC_WATCHDOG_VIOLATION (133/1) bugcheck when the customer connects to their VPN to reach the internet. This could be related to our call to NdisFIndicateReceiveNetBufferLists, as the IRQL is raised to DISPATCH_LEVEL before calling it (and lowered to its previous value afterward), and that call does appear in the output of !dpcwatchdog shown below. The raise is a workaround for another bug, explained here:
IRQL_UNEXPECTED_VALUE BSOD after NdisFIndicateReceiveNetBufferLists?
Now this is the bugcheck:
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000001, The system cumulatively spent an extended period of time at
DISPATCH_LEVEL or above. The offending component can usually be
identified with a stack trace.
Arg2: 0000000000001e00, The watchdog period.
Arg3: fffff805422fb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains
additional information regarding the cumulative timeout
Arg4: 0000000000000000
STACK_TEXT:
nt!KeBugCheckEx
nt!KeAccumulateTicks+0x1846b2
nt!KiUpdateRunTime+0x5d
nt!KiUpdateTime+0x4a1
nt!KeClockInterruptNotify+0x2e3
nt!HalpTimerClockInterrupt+0xe2
nt!KiCallInterruptServiceRoutine+0xa5
nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
nt!KiInterruptDispatchNoLockNoEtw+0x37
nt!KxWaitForSpinLockAndAcquire+0x2c
nt!KeAcquireSpinLockAtDpcLevel+0x5c
wanarp!WanNdisReceivePackets+0x4bb
ndis!ndisMIndicateNetBufferListsToOpen+0x141
ndis!ndisMTopReceiveNetBufferLists+0x3f0e4
ndis!ndisCallReceiveHandler+0x61
ndis!ndisInvokeNextReceiveHandler+0x1df
ndis!NdisMIndicateReceiveNetBufferLists+0x104
ndiswan!IndicateRecvPacket+0x596
ndiswan!ApplyQoSAndIndicateRecvPacket+0x20b
ndiswan!ProcessPPPFrame+0x16f
ndiswan!ReceivePPP+0xb3
ndiswan!ProtoCoReceiveNetBufferListChain+0x442
ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xf6
ndis!NdisMCoIndicateReceiveNetBufferLists+0x11
raspptp!CallIndicateReceived+0x210
raspptp!CallProcessRxNBLs+0x199
ndis!ndisDispatchIoWorkItem+0x12
nt!IopProcessWorkItem+0x135
nt!ExpWorkerThread+0x105
nt!PspSystemThreadStartup+0x55
nt!KiStartSystemThread+0x28
SYMBOL_NAME: wanarp!WanNdisReceivePackets+4bb
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: wanarp
IMAGE_NAME: wanarp.sys
The following is the output of !dpcwatchdog, but I still can't find what is causing this bugcheck, or which function is spending too much time at DISPATCH_LEVEL. Could this be related to the spin locking done by wanarp, i.e. a bug in wanarp itself? Note that we don't use any spin locks in our driver, and raising the IRQL ourselves should not cause any issue, as it is very common for NDIS receive indications to be made at DISPATCH_LEVEL.
So how can I find the root cause of this bugcheck? There are no other third-party LWF drivers in the NDIS stack.
3: kd> !dpcwatchdog
All durations are in seconds (1 System tick = 15.625000 milliseconds)
Circular Kernel Context Logger history: !logdump 0x2
DPC and ISR stats: !intstats /d
--------------------------------------------------
CPU#0
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
dpcs: no pending DPCs found
--------------------------------------------------
CPU#1
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
1: Normal : 0xfffff80542220e00 0xfffff805418dbf10 nt!PpmCheckPeriodicStart
1: Normal : 0xfffff80542231d40 0xfffff8054192c730 nt!KiBalanceSetManagerDeferredRoutine
1: Normal : 0xffffbd0146590868 0xfffff80541953200 nt!KiEntropyDpcRoutine
DPC Watchdog Captures Analysis for CPU #1.
DPC Watchdog capture size: 641 stacks.
Number of unique stacks: 1.
No common functions detected!
The captured stacks seem to indicate that only a single DPC or generic function is the culprit.
Try to analyse what other processors were doing at the time of the following reference capture:
CPU #1 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec
# RetAddr Call Site
00 fffff805418d8991 nt!KiUpdateRunTime+0x5D
01 fffff805418d2803 nt!KiUpdateTime+0x4A1
02 fffff805418db1c2 nt!KeClockInterruptNotify+0x2E3
03 fffff80541808a45 nt!HalpTimerClockInterrupt+0xE2
04 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5
05 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA
06 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37
07 fffff805418da3cc nt!KxWaitForSpinLockAndAcquire+0x2C
08 fffff8054fa614cb nt!KeAcquireSpinLockAtDpcLevel+0x5C
09 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x4BB
0a fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141
0b fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4
0c fffff80546bddfef ndis!ndisCallReceiveHandler+0x61
0d fffff80546ba4a94 ndis!ndisInvokeNextReceiveHandler+0x1DF
0e fffff8057c32d17e ndis!NdisMIndicateReceiveNetBufferLists+0x104
0f fffff8057c30d6c7 ndiswan!IndicateRecvPacket+0x596
10 fffff8057c32d56b ndiswan!ApplyQoSAndIndicateRecvPacket+0x20B
11 fffff8057c32d823 ndiswan!ProcessPPPFrame+0x16F
12 fffff8057c308e62 ndiswan!ReceivePPP+0xB3
13 fffff80546c5c006 ndiswan!ProtoCoReceiveNetBufferListChain+0x442
14 fffff80546c5c2d1 ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xF6
15 fffff8057c2b0064 ndis!NdisMCoIndicateReceiveNetBufferLists+0x11
16 fffff8057c2b06a9 raspptp!CallIndicateReceived+0x210
17 fffff80546bd9dc2 raspptp!CallProcessRxNBLs+0x199
18 fffff80541899645 ndis!ndisDispatchIoWorkItem+0x12
19 fffff80541852b65 nt!IopProcessWorkItem+0x135
1a fffff80541871d25 nt!ExpWorkerThread+0x105
1b fffff80541a00778 nt!PspSystemThreadStartup+0x55
1c ---------------- nt!KiStartSystemThread+0x28
--------------------------------------------------
CPU#2
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
2: Normal : 0xffffbd01467f0868 0xfffff80541953200 nt!KiEntropyDpcRoutine
DPC Watchdog Captures Analysis for CPU #2.
DPC Watchdog capture size: 641 stacks.
Number of unique stacks: 1.
No common functions detected!
The captured stacks seem to indicate that only a single DPC or generic function is the culprit.
Try to analyse what other processors were doing at the time of the following reference capture:
CPU #2 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec
# RetAddr Call Site
00 fffff805418d245a nt!KeClockInterruptNotify+0x453
01 fffff80541808a45 nt!HalpTimerClockIpiRoutine+0x1A
02 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5
03 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA
04 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37
05 fffff805418a9a68 nt!KxWaitForSpinLockAndAcquire+0x2C
06 fffff8054fa611cb nt!KeAcquireSpinLockRaiseToDpc+0x88
07 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x1BB
08 fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141
09 fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4
0a fffff80546bddfef ndis!ndisCallReceiveHandler+0x61
0b fffff80546be3a81 ndis!ndisInvokeNextReceiveHandler+0x1DF
0c fffff80546ba804e ndis!ndisFilterIndicateReceiveNetBufferLists+0x3C611
0d fffff8054e384d77 ndis!NdisFIndicateReceiveNetBufferLists+0x6E
0e fffff8054e3811a9 ourdriver+0x4D70
0f fffff80546ba7d40 ourdriver+0x11A0
10 fffff8054182a6b5 ndis!ndisDummyIrpHandler+0x100
11 fffff80541c164c8 nt!IofCallDriver+0x55
12 fffff80541c162c7 nt!IopSynchronousServiceTail+0x1A8
13 fffff80541c15646 nt!IopXxxControlFile+0xC67
14 fffff80541a0aab5 nt!NtDeviceIoControlFile+0x56
15 ---------------- nt!KiSystemServiceCopyEnd+0x25
--------------------------------------------------
CPU#3
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
dpcs: no pending DPCs found
Target machine version: Windows 10 Kernel Version 19041 MP (4 procs)
Note that we also pass the NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL flag to NdisFIndicateReceiveNetBufferLists if the current IRQL is DISPATCH_LEVEL.
Edit1:
Here is also the output of !locks, !qlocks and !ready. The contention count on one of the resources is 49135; is this normal or too high? Could it be related to our issue? The threads waiting on it or owning it belong to normal processes such as chrome, csrss, etc.
3: kd> !kdexts.locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.
Resource # nt!ExpTimeRefreshLock (0xfffff80542219440) Exclusively owned
Contention Count = 17
Threads: ffffcf8ce9dee640-01<*>
KD: Scanning for held locks.....
Resource # 0xffffcf8cde7f59f8 Shared 1 owning threads
Contention Count = 62
Threads: ffffcf8ce84ec080-01<*>
KD: Scanning for held locks...............................................................................................
Resource # 0xffffcf8ce08d0890 Exclusively owned
Contention Count = 49135
NumberOfSharedWaiters = 1
NumberOfExclusiveWaiters = 6
Threads: ffffcf8cf18e3080-01<*> ffffcf8ce3faf080-01
Threads Waiting On Exclusive Access:
ffffcf8ceb6ce080 ffffcf8ce1d20080 ffffcf8ce77f1080 ffffcf8ce92f4080
ffffcf8ce1d1f0c0 ffffcf8ced7c6080
KD: Scanning for held locks.
Resource # 0xffffcf8ce08d0990 Shared 1 owning threads
Threads: ffffcf8cf18e3080-01<*>
KD: Scanning for held locks.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Resource # 0xffffcf8ceff46350 Shared 1 owning threads
Threads: ffffcf8ce6de8080-01<*>
KD: Scanning for held locks......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Resource # 0xffffcf8cf0cade50 Exclusively owned
Contention Count = 3
Threads: ffffcf8ce84ec080-01<*>
KD: Scanning for held locks.........................
Resource # 0xffffcf8cf0f76180 Shared 1 owning threads
Threads: ffffcf8ce83dc080-02<*>
KD: Scanning for held locks.......................................................................................................................................................................................................................................................
Resource # 0xffffcf8cf1875cb0 Shared 1 owning threads
Contention Count = 3
Threads: ffffcf8ce89db040-02<*>
KD: Scanning for held locks.
Resource # 0xffffcf8cf18742d0 Shared 1 owning threads
Threads: ffffcf8cee5e1080-02<*>
KD: Scanning for held locks....................................................................................
Resource # 0xffffcf8cdceeece0 Shared 2 owning threads
Contention Count = 4
Threads: ffffcf8ce3a1c080-01<*> ffffcf8ce5625040-01<*>
Resource # 0xffffcf8cdceeed48 Shared 1 owning threads
Threads: ffffcf8ce5625043-02<*> *** Actual Thread ffffcf8ce5625040
KD: Scanning for held locks...
Resource # 0xffffcf8cf1d377d0 Exclusively owned
Threads: ffffcf8cf0ff3080-02<*>
KD: Scanning for held locks....
Resource # 0xffffcf8cf1807050 Exclusively owned
Threads: ffffcf8ce84ec080-01<*>
KD: Scanning for held locks......
245594 total locks, 13 locks currently held
3: kd> !qlocks
Key: O = Owner, 1-n = Wait order, blank = not owned/waiting, C = Corrupt
Processor Number
Lock Name 0 1 2 3
KE - Unused Spare
MM - Unused Spare
MM - Unused Spare
MM - Unused Spare
CC - Vacb
CC - Master
EX - NonPagedPool
IO - Cancel
CC - Unused Spare
IO - Vpb
IO - Database
IO - Completion
NTFS - Struct
AFD - WorkQueue
CC - Bcb
MM - NonPagedPool
3: kd> !ready
KSHARED_READY_QUEUE fffff8053f1ada00: (00) ****------------------------------------------------------------
SharedReadyQueue fffff8053f1ada00: No threads in READY state
Processor 0: No threads in READY state
Processor 1: Ready Threads at priority 15
THREAD ffffcf8ce9dee640 Cid 2054.2100 Teb: 000000fab7bca000 Win32Thread: 0000000000000000 READY on processor 1
Processor 2: No threads in READY state
Processor 3: No threads in READY state
3: kd> dt nt!_ERESOURCE 0xffffcf8ce08d0890
+0x000 SystemResourcesList : _LIST_ENTRY [ 0xffffcf8c`e08d0610 - 0xffffcf8c`e08cf710 ]
+0x010 OwnerTable : 0xffffcf8c`ee6e8210 _OWNER_ENTRY
+0x018 ActiveCount : 0n1
+0x01a Flag : 0xf86
+0x01a ReservedLowFlags : 0x86 ''
+0x01b WaiterPriority : 0xf ''
+0x020 SharedWaiters : 0xffffae09`adcae8e0 Void
+0x028 ExclusiveWaiters : 0xffffae09`a9aabea0 Void
+0x030 OwnerEntry : _OWNER_ENTRY
+0x040 ActiveEntries : 1
+0x044 ContentionCount : 0xbfef
+0x048 NumberOfSharedWaiters : 1
+0x04c NumberOfExclusiveWaiters : 6
+0x050 Reserved2 : (null)
+0x058 Address : (null)
+0x058 CreatorBackTraceIndex : 0
+0x060 SpinLock : 0
3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 (*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0))
(*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0)) [Type: _OWNER_ENTRY]
[+0x000] OwnerThread : 0xffffcf8cf18e3080 [Type: unsigned __int64]
[+0x008 ( 0: 0)] IoPriorityBoosted : 0x0 [Type: unsigned long]
[+0x008 ( 1: 1)] OwnerReferenced : 0x0 [Type: unsigned long]
[+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 (31: 3)] OwnerCount : 0x1 [Type: unsigned long]
[+0x008] TableSize : 0xc [Type: unsigned long]
3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 ((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210)
((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210) : 0xffffcf8cee6e8210 [Type: _OWNER_ENTRY *]
[+0x000] OwnerThread : 0x0 [Type: unsigned __int64]
[+0x008 ( 0: 0)] IoPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 ( 1: 1)] OwnerReferenced : 0x1 [Type: unsigned long]
[+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 (31: 3)] OwnerCount : 0x0 [Type: unsigned long]
[+0x008] TableSize : 0x7 [Type: unsigned long]
Thanks for reporting this. I've tracked this down to an OS bug: there's a deadlock in wanarp. This issue appears to affect every version of the OS going back to Windows Vista.
I've filed internal issue task.ms/42393356 to track this: if you have a Microsoft support contract, your rep can get you status updates on that issue.
Meanwhile, you can partially work around this issue by either:
Indicating 1 packet at a time (NumberOfNetBufferLists==1); or
Indicating on a single CPU at a time
The bug in wanarp is exposed when 2 or more CPUs collectively process 3 or more NBLs at the same time. So either workaround would avoid the trigger conditions.
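A minimal sketch of the first workaround, un-batching the NBL chain in the filter's receive handler so no downstream component ever sees more than one NBL per indication. The handler name and SAL annotations are illustrative; this is not a complete handler (in particular, the NDIS_RECEIVE_FLAGS_RESOURCES ownership rules are only noted, not handled):

```c
// Sketch only: indicate one NBL at a time from an NDIS 6.x LWF.
// Handler name is a placeholder for the driver's
// FilterReceiveNetBufferListsHandler.
#include <ndis.h>

VOID
MyFilterReceiveNetBufferLists(
    _In_ NDIS_HANDLE FilterModuleContext,
    _In_ PNET_BUFFER_LIST NetBufferLists,
    _In_ NDIS_PORT_NUMBER PortNumber,
    _In_ ULONG NumberOfNetBufferLists,
    _In_ ULONG ReceiveFlags)
{
    UNREFERENCED_PARAMETER(NumberOfNetBufferLists);

    PNET_BUFFER_LIST nbl = NetBufferLists;
    while (nbl != NULL)
    {
        PNET_BUFFER_LIST next = NET_BUFFER_LIST_NEXT_NBL(nbl);
        NET_BUFFER_LIST_NEXT_NBL(nbl) = NULL;   // detach from the chain

        NdisFIndicateReceiveNetBufferLists(
            FilterModuleContext,
            nbl,
            PortNumber,
            1,              // NumberOfNetBufferLists: one at a time
            ReceiveFlags);

        // Caveat: if NDIS_RECEIVE_FLAGS_RESOURCES is set, ownership of
        // the NBL returns to us as soon as the call returns, and a real
        // driver must re-link the chain before returning to its caller.
        nbl = next;
    }
}
```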
Depending on how much bandwidth you're pushing through this network interface, those options could be rather bad for CPU/battery/throughput. So please try to avoid pessimizing batching unless it's really necessary. (For example, you could make this an option that's off-by-default, unless the customer specifically uses wanarp.)
Note that you cannot fully prevent the issue yourself. Other drivers in the stack, including NDIS itself, have the right to group packets together, which would have the side effect of re-batching the packets that you carefully un-batched. However, I believe you can make a statistically significant dent in the crashes if you just indicate 1 NBL at a time, or indicate multiple NBLs on 1 CPU at a time.
Sorry this is happening to you again! wanarp is... a very old codebase.

Long running Informatica Job

I am running a job whose source is an Oracle SQL query, followed by a Sorter transformation. Below are the session logs, where we can see that after the start of the Sorter transformation [srt] it takes more than 3 hours to process 410516 rows. I am failing to understand whether the Sorter or the source query is taking the time.
Appreciate your response.
READER_1_1_1> RR_4049 [2022-06-26 23:40:13.888] SQL Query issued to database : (Sun Jun 26 23:40:13 2022)
READER_1_1_1> RR_4050 [2022-06-26 23:49:01.147] First row returned from database to reader : (Sun Jun 26 23:49:01 2022)
TRANSF_1_1_1> SORT_40420 [2022-06-26 23:49:01.148] Start of input for Transformation [srt]. : (Sun Jun 26 23:49:01 2022)
READER_1_1_1> BLKR_16019 [2022-06-27 03:04:51.901] Read [410516] rows, read [0] error rows for source table [DUMMY_src] instance name [DUMMY_src]
READER_1_1_1> BLKR_16008 [2022-06-27 03:04:51.902] Reader run completed.
TRANSF_1_1_1> SORT_40421 [2022-06-27 03:04:51.909] End of input for Transformation [srt]. : (Mon Jun 27 03:04:51 2022)
TRANSF_1_1_1> SORT_40422 [2022-06-27 03:04:52.180] End of output from Sorter Transformation [srt]. Processed 410516 rows (6568256 input bytes; 0 temp I/O bytes). : (Mon Jun 27 03:04:52 2022)
TRANSF_1_1_1> SORT_40423 [2022-06-27 03:04:52.181] End of sort for Sorter Transformation [srt]. : (Mon Jun 27 03:04:52 2022)
WRITER_1_*_1> WRT_8167 [2022-06-27 03:04:52.201] Start loading table
WRT_8035 [2022-06-27 03:09:47.457] Load complete time: Mon Jun 27 03:09:47 2022

Repository creation fails while upgrading OWB11gR1(11.1.0.7) to OWB11gR2(11.2.0.4)

I need to create a new workspace in OWB11gR2 (11.2.0.4) to upgrade from OWB11gR1 (11.1.0.7). The Repository Assistant fails after processing 64%. The following is the error log.
main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): % = 0.8051529790660225
main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): -token name = LOADJAVA; -token type = 13
main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): ProcessEngine.token_db_min_ver =
main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): Before processing LOADJAVA Token
main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): ... I am in processLoadJavaToken ...
main.AWT-EventQueue-0[6]20200714#08:48:36.036: 00> oracle.wh.ui.jcommon.WhButton#424c414: WhButton setLabel rtsString = Yes
main.AWT-EventQueue-0[6]20200714#08:48:36.036: 00> oracle.wh.ui.jcommon.WhButton#424c414: WhButton setLabel rtsString = No
The following is the list of database patches.
Patch 17906774: applied on Wed Aug 04 11:21:52 BDT 2021
Unique Patch ID: 17692968
Created on 14 May 2014, 22:56:54 hrs PST8PDT
Bugs fixed:
17607032, 17974168, 17669786, 17561509, 16885825, 18274560, 17613052
17461930, 16829998, 17251918, 17435868, 17279666, 17328020, 17006987
18260620, 16833468, 18180599, 17292119, 17340242, 17296559, 15990966
17438322, 17939651, 17359696, 18385759, 17820353, 17939225, 17715818
18192446, 16960088, 17191248, 17422695
Patch 31668908 : applied on Mon Jul 12 16:13:02 BDT 2021
Unique Patch ID: 23822194
Patch description: "OJVM PATCH SET UPDATE 11.2.0.4.201020"
Created on 18 Sep 2020, 03:30:45 hrs PST8PDT
Bugs fixed:
23727132, 19554117, 19006757, 14774730, 18933818, 18458318, 18166577
19231857, 19153980, 19058059, 19007266, 17285560, 17201047, 17056813
19223010, 19852360, 19909862, 19895326, 19374518, 20408829, 21047766
21566944, 19176885, 17804361, 17528315, 21811517, 22253904, 19187988
21911849, 22118835, 22670385, 23265914, 22675136, 24448240, 25067795
24534298, 25076732, 25494379, 26023002, 19699946, 26637592, 27000663
25649873, 27461842, 27952577, 27642235, 28502128, 28915933, 29254615
29774367, 29992392, 29448234, 30160639, 30534664, 30855121, 31306274
30772207, 31476032, 30561292, 28394726, 26716835, 24817447, 23082876
31668867
Patch 31537677 : applied on Thu Jul 08 11:53:10 BDT 2021
Unique Patch ID: 23852314
Patch description: "Database Patch Set Update : 11.2.0.4.201020 (31537677)"
The following is the workaround to fix the issue.
Step 1: Roll back patches 31668908 and 31537677 (OWB is not supported with an OJVM patch newer than December 2018).
Step 2: Re-run the Repository Assistant.

Can't add node to the CockroachDB cluster

I'm struggling to join a CockroachDB node to a cluster.
I created the first cluster, then tried to join a 2nd node to the first node, but the 2nd node created a new cluster instead, as shown below.
Does anyone know which of my steps below is wrong? Any suggestions are welcome.
I've started first node as follows:
cockroach start --insecure --advertise-host=163.172.156.111
* Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html
*
CockroachDB node starting at 2019-05-11 01:11:15.45522036 +0000 UTC (took 2.5s)
build: CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui: http://163.172.156.111:8080
sql: postgresql://root@163.172.156.111:26257?sslmode=disable
client flags: cockroach <client cmd> --host=163.172.156.111:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp449555924
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: initialized new cluster
clusterID: 3e797faa-59a1-4b0d-83b5-36143ddbdd69
nodeID: 1
Then I started the secondary node to join 163.172.156.111, but it couldn't join:
cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257
CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s)
build: CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui: http://128.199.127.164:8080
sql: postgresql://root@128.199.127.164:26257?sslmode=disable
client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp067740997
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: restarted pre-existing node
clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
nodeID: 1
The cockroach.log of the joining node shows some gossip errors:
cat cockroach-data/logs/cockroach.log
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] file created at: 2019/05/11 01:21:13
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] running on machine: amfortas
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] binary: CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6)
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] arguments: [cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257]
I190511 01:21:13.762309 1 util/log/clog.go:1199 line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓
I190511 01:21:13.762307 1 cli/start.go:1033 logging to directory /home/ueda/cockroach-data/logs
W190511 01:21:13.763373 1 cli/start.go:1068 RUNNING IN INSECURE MODE!
- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.
Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html
I190511 01:21:13.763675 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
W190511 01:21:13.763752 1 cli/start.go:944 Using the default setting for --cache (128 MiB).
A significantly larger value is usually needed for good performance.
If you have a dedicated server a reasonable setting is --cache=.25 (248 MiB).
I190511 01:21:13.764011 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
W190511 01:21:13.764047 1 cli/start.go:957 Using the default setting for --max-sql-memory (128 MiB).
A significantly larger value is usually needed in production.
If you have a dedicated server a reasonable setting is --max-sql-memory=.25 (248 MiB).
I190511 01:21:13.764239 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:13.764272 1 cli/start.go:1082 CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6)
I190511 01:21:13.866977 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:13.867002 1 server/config.go:386 system total memory: 992 MiB
I190511 01:21:13.867063 1 server/config.go:388 server configuration:
max offset 500000000
cache size 128 MiB
SQL memory pool size 128 MiB
scan interval 10m0s
scan min idle time 10ms
scan max idle time 1s
event log enabled true
I190511 01:21:13.867098 1 cli/start.go:929 process identity: uid 1000 euid 1000 gid 1000 egid 1000
I190511 01:21:13.867115 1 cli/start.go:554 starting cockroach node
I190511 01:21:13.868242 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data/cockroach-temp067740997"
I190511 01:21:13.894320 21 server/server.go:876 [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I190511 01:21:13.894813 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data"
W190511 01:21:13.896301 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
W190511 01:21:13.905666 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
I190511 01:21:13.911380 21 server/config.go:494 [n?] 1 storage engine initialized
I190511 01:21:13.911417 21 server/config.go:497 [n?] RocksDB cache size: 128 MiB
I190511 01:21:13.911427 21 server/config.go:497 [n?] store 0: RocksDB, max size 0 B, max open file limit 10000
W190511 01:21:13.912459 21 gossip/gossip.go:1496 [n?] no incoming or outgoing connections
I190511 01:21:13.913206 21 server/server.go:926 [n?] Sleeping till wall time 1557537673913178595 to catches up to 1557537674394265598 to ensure monotonicity. Delta: 481.087003ms
I190511 01:21:14.251655 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 tripped: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
I190511 01:21:14.251695 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 event: BreakerTripped
W190511 01:21:14.251763 65 gossip/client.go:122 [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
I190511 01:21:14.395848 21 gossip/gossip.go:392 [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"128.199.127.164:26257" > attrs:<> locality:<> ServerVersion:<major_val:19 minor_val:1 patch:0 unstable:0 > build_tag:"v19.1.0" started_at:1557537674395557548
W190511 01:21:14.458176 21 storage/replica_range_lease.go:506 can't determine lease status due to node liveness error: node not in the liveness table
I190511 01:21:14.458465 21 server/node.go:461 [n1] initialized store [n1,s1]: disk (capacity=24 GiB, available=18 GiB, used=2.2 MiB, logicalBytes=41 MiB), ranges=20, leases=0, queries=0.00, writes=0.00, bytesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=6467.00 p90=26940.00 pMax=43017435.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
I190511 01:21:14.458775 21 storage/stores.go:244 [n1] read 0 node addresses from persistent storage
I190511 01:21:14.459095 21 server/node.go:699 [n1] connecting to gossip network to verify cluster ID...
W190511 01:21:14.469842 96 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown
I190511 01:21:14.474785 21 server/node.go:719 [n1] node connected via gossip and verified as part of cluster "a14e89a7-792d-44d3-89af-7037442eacbc"
I190511 01:21:14.475033 21 server/node.go:542 [n1] node=1: started with [<no-attributes>=/home/ueda/cockroach-data] engine(s) and attributes []
I190511 01:21:14.475393 21 server/status/recorder.go:610 [n1] available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
I190511 01:21:14.475514 21 server/server.go:1582 [n1] starting http server at [::]:8080 (use: 128.199.127.164:8080)
I190511 01:21:14.475572 21 server/server.go:1584 [n1] starting grpc/postgres server at [::]:26257
I190511 01:21:14.475605 21 server/server.go:1585 [n1] advertising CockroachDB node at 128.199.127.164:26257
W190511 01:21:14.475655 21 jobs/registry.go:341 [n1] unable to get node liveness: node not in the liveness table
I190511 01:21:14.532949 21 server/server.go:1650 [n1] done ensuring all necessary migrations have run
I190511 01:21:14.533020 21 server/server.go:1653 [n1] serving sql connections
I190511 01:21:14.533209 21 cli/start.go:689 [config] clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
I190511 01:21:14.533257 21 cli/start.go:697 node startup completed:
CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s)
build: CCL v19.1.0 @ 2019/04/29 18:36:40 (go1.11.6)
webui: http://128.199.127.164:8080
sql: postgresql://root@128.199.127.164:26257?sslmode=disable
client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp067740997
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: restarted pre-existing node
clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
nodeID: 1
I190511 01:21:14.541205 146 server/server_update.go:67 [n1] no need to upgrade, cluster already at the newest version
I190511 01:21:14.555557 149 sql/event_log.go:135 [n1] Event: "node_restart", target: 1, info: {Descriptor:{NodeID:1 Address:128.199.127.164:26257 Attrs: Locality: ServerVersion:19.1 BuildTag:v19.1.0 StartedAt:1557537674395557548 LocalityAddress:[] XXX_NoUnkeyedLiteral:{} XXX_sizecache:0} ClusterID:a14e89a7-792d-44d3-89af-7037442eacbc StartedAt:1557537674395557548 LastUp:1557537671113461486}
I190511 01:21:14.916458 59 gossip/gossip.go:1510 [n1] node has connected to cluster via gossip
I190511 01:21:14.916660 59 storage/stores.go:263 [n1] wrote 0 node addresses to persistent storage
I190511 01:21:24.480247 116 storage/store.go:4220 [n1,s1] sstables (read amplification = 2):
0 [ 51K 1 ]: 51K
6 [ 1M 1 ]: 1M
I190511 01:21:24.480380 116 storage/store.go:4221 [n1,s1]
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
L0 1/0 50.73 KB 0.5 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0
L6 1/0 1.26 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Sum 2/0 1.31 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0
Uptime(secs): 10.6 total, 10.6 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
estimated_pending_compaction_bytes: 0 B
I190511 01:21:24.481565 121 server/status/runtime.go:500 [n1] runtime stats: 170 MiB RSS, 114 goroutines, 0 B/0 B/0 B GO alloc/idle/total, 14 MiB/16 MiB CGO alloc/total, 0.0 CGO/sec, 0.0/0.0 %(u/s)time, 0.0 %gc (7x), 50 KiB/1.5 MiB (r/w)net
What could possibly be blocking the join? Thank you for your suggestions!
It seems you had previously started the second node (the one running on 128.199.127.164) by itself, creating its own cluster.
This can be seen in the error message:
W190511 01:21:14.251763 65 gossip/client.go:122 [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
To be able to join the cluster, the data directory of the joining node must be empty. You can either delete cockroach-data or specify an alternate directory with --store=/path/to/data-dir
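A sketch of the first option on the joining node (paths as in the logs above; the cockroach restart line is shown as a comment because it needs the first node reachable):

```shell
# Stand-in for the stale data directory left by the accidental
# single-node start on 128.199.127.164:
mkdir -p cockroach-data

# Remove it so the node starts with an empty store and can join:
rm -rf cockroach-data

# Then start the node again with the same flags:
# cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257
```

The alternative is to keep the old directory and pass `--store=/path/to/data-dir` pointing at a fresh, empty location.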

How to understand ORA-00060 deadlock trace file

Recently I got an ORA-00060 deadlock error.
I read this post, this post, and this post, but I'm not sure what the problem is: did an unindexed FK cause it, or was it something else?
My question is: how do I interpret this trace file, and how do I fix the deadlock?
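A TM-enqueue deadlock where several sessions wait on the same TM resource in S/SX modes (as in the deadlock graph below) is the classic signature of an unindexed foreign key. One way to check that theory is the standard dictionary query for FK columns with no matching index; this is a simplified sketch (it only matches index columns by position, so composite cases may need a closer look):

```sql
-- Foreign-key columns in the current schema that have no index
-- whose column at the same position matches:
SELECT c.table_name, cc.column_name, c.constraint_name
FROM   user_constraints  c
JOIN   user_cons_columns cc
       ON cc.constraint_name = c.constraint_name
WHERE  c.constraint_type = 'R'
AND NOT EXISTS (
    SELECT 1
    FROM   user_ind_columns ic
    WHERE  ic.table_name      = c.table_name
    AND    ic.column_name     = cc.column_name
    AND    ic.column_position = cc.position
);
```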
Below is the trace file:
*** 2015-06-02 14:53:45.513
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TM-00014d94-00000000 497 556 S 332 1414 SX
TM-00014d94-00000000 332 1414 SX 416 1038 S
TX-0011000f-000000ab 416 1038 X 302 457 S
TM-00014d94-00000000 302 457 SX 497 556 S
session 556: DID 0001-01F1-0000000D session 1414: DID 0001-014C-00000022
session 1414: DID 0001-014C-00000022 session 1038: DID 0001-01A0-0000000C
session 1038: DID 0001-01A0-0000000C session 457: DID 0001-012E-00000028
session 457: DID 0001-012E-00000028 session 556: DID 0001-01F1-0000000D
Rows waited on:
Session 556: obj - rowid = 00014D94 - AAAAAAAAAAAAAAAAAA
(dictionary objn - 85396, file - 0, block - 0, slot - 0)
Session 1414: obj - rowid = 00014D94 - AAAAAAAAAAAAAAAAAA
(dictionary objn - 85396, file - 0, block - 0, slot - 0)
Session 1038: obj - rowid = 00014D94 - AAAAAAAAAAAAAAAAAA
(dictionary objn - 85396, file - 0, block - 0, slot - 0)
Session 457: obj - rowid = 00014FA0 - AAAU+gAAEAAAft+AAA
(dictionary objn - 85920, file - 4, block - 129918, slot - 0)
----- Information for the OTHER waiting sessions -----
Session 1414:
sid: 1414 ser: 1424 audsid: 100128 user: 91/SW flags: 0x45
pid: 332 O/S info: user: oracle, term: UNKNOWN, ospid: 10179
image: oracle@jwdb
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: localhost.localdomain program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
insert into t_course_takes (created_at, updated_at, attend, course_id, course_take_type_id, election_mode_id, lesson_id, limit_group_id, paid, remark, semester_id, state, std_id, turn, virtual_cost, id) values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 )
Session 1038:
sid: 1038 ser: 951 audsid: 100212 user: 91/SW flags: 0x45
pid: 416 O/S info: user: oracle, term: UNKNOWN, ospid: 10343
image: oracle@jwdb
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: localhost.localdomain program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
delete from t_course_takes where id=:1
Session 457:
sid: 457 ser: 2983 audsid: 100099 user: 91/SW flags: 0x45
pid: 302 O/S info: user: oracle, term: UNKNOWN, ospid: 10111
image: oracle@jwdb
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: jdbcclient program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
insert into t_elect_loggers (created_at, updated_at, course_code, course_name, course_take_type_id, course_type, credits, election_mode_id, ip_address, lesson_no, operator_code, operator_name, project_id, remark, screening, semester_id, std_code, std_name, turn, type, virtual_orig, virtual_rest, id) values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 )
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=ca9jc1g44ap41) -----
delete from t_course_takes where id=:1
===================================================
PROCESS STATE
-------------
Process global information:
process: 0x9d0fd98c8, call: 0x95429a500, xact: 0x922b10710, curses: 0x94110e198, usrses: 0x94110e198
----------------------------------------
SO: 0x9d0fd98c8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x9d0fd98c8, name=process, file=ksu.h LINE:11459, pg=0
(process) Oracle pid:497, ser:7, calls cur/top: 0x95429a500/0x95429a500
flags : (0x0) -
flags2: (0x0), flags3: (0x0)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 9
last post received-location: ksq.h LINE:1877 ID:ksqrcl
last process to post me: 900fd0348 12 0
last post sent: 0 0 9
last post sent-location: ksq.h LINE:1877 ID:ksqrcl
last process posted by me: 900fce2c8 17 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 0x90102ea98
O/S info: user: oracle, term: UNKNOWN, ospid: 10507
OSD pid info: Unix process pid: 10507, image: oracle@jwdb
Dump of memory from 0x0000000921009C90 to 0x0000000921009E98
921009C90 00000000 00000000 00000000 00000000 [................]
Repeat 31 times
921009E90 00000000 00000000 [........]
(FOB) flags=2050 fib=0x902c3c2b8 incno=0 pending i/o cnt=0
fname=/home/jwdb/oracle/oradata/orcl/undotbs01.dbf
fno=3 lblksz=8192 fsiz=238080
(FOB) flags=2050 fib=0x902c3c8b8 incno=0 pending i/o cnt=0
fname=/home/jwdb/oracle/oradata/orcl/users01.dbf
fno=4 lblksz=8192 fsiz=603680
(FOB) flags=2050 fib=0x902c3b6a0 incno=0 pending i/o cnt=0
fname=/home/jwdb/oracle/oradata/orcl/system01.dbf
fno=1 lblksz=8192 fsiz=92160
----------------------------------------
SO: 0x94110e198, type: 4, owner: 0x9d0fd98c8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x9d0fd98c8, name=session, file=ksu.h LINE:11467, pg=0
(session) sid: 556 ser: 3224 trans: 0x922b10710, creator: 0x9d0fd98c8
flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40008) -/-
DID: , short-term DID:
txn branch: (nil)
oct: 7, prv: 0, sql: 0x9cfed0be0, psql: 0x9afeb20c8, user: 91/SW
ksuxds FALSE at location: 0
service name: SYS$USERS
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: localhost.localdomain program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
Current Wait Stack:
0: waiting for 'enq: TM - contention'
name|mode=0x544d0004, object #=0x14d94, table/partition=0x0
wait_id=787 seq_num=896 snap_id=53
wait times: snap=0.023828 sec, exc=4 min 18 sec, total=4 min 18 sec
wait times: max=infinite, heur=4 min 18 sec
wait counts: calls=87 os=87
in_wait=1 iflags=0x15a0
There is at least one session blocking this session.
Dumping first 3 direct blockers:
inst: 1, sid: 457, ser: 2983
inst: 1, sid: 1353, ser: 5618
inst: 1, sid: 907, ser: 5215
Dumping final blocker:
inst: 1, sid: 1168, ser: 1194
There are 1 sessions blocked by this session.
Dumping one waiter:
inst: 1, sid: 1136, ser: 2212
wait event: 'enq: TM - contention'
p1: 'name|mode'=0x544d0004
p2: 'object #'=0x14d94
p3: 'table/partition'=0x0
row_wait_obj#: 85396, block#: 0, row#: 0, file# 0
min_blocked_time: 12 secs, waiter_cache_ver: 32536
Wait State:
fixed_waits=0 flags=0x23 boundary=(nil)/-1
Session Wait History:
elapsed time of 0.000000 sec since current wait
0: waited for 'latch: enqueue hash chains'
address=0x9313226a0, number=0x1c, tries=0x0
wait_id=839 seq_num=895 snap_id=1
wait times: snap=0.226082 sec, exc=0.226082 sec, total=0.226082 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000000 sec of elapsed time
1: waited for 'enq: TM - contention'
name|mode=0x544d0004, object #=0x14d94, table/partition=0x0
wait_id=787 seq_num=894 snap_id=52
wait times: snap=3.000901 sec, exc=4 min 18 sec, total=4 min 18 sec
wait times: max=infinite
wait counts: calls=87 os=87
occurred after 0.000000 sec of elapsed time
2: waited for 'latch: enqueue hash chains'
address=0x6000cf38, number=0x1c, tries=0x0
wait_id=838 seq_num=893 snap_id=1
wait times: snap=0.000142 sec, exc=0.000142 sec, total=0.000142 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000000 sec of elapsed time
3: waited for 'enq: TM - contention'
name|mode=0x544d0004, object #=0x14d94, table/partition=0x0
wait_id=787 seq_num=892 snap_id=51
wait times: snap=12.003822 sec, exc=4 min 15 sec, total=4 min 15 sec
wait times: max=infinite
wait counts: calls=86 os=86
occurred after 0.000000 sec of elapsed time
4: waited for 'latch: enqueue hash chains'
address=0x6000cf38, number=0x1c, tries=0x0
wait_id=837 seq_num=891 snap_id=1
wait times: snap=0.000170 sec, exc=0.000170 sec, total=0.000170 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000000 sec of elapsed time
5: waited for 'enq: TM - contention'
name|mode=0x544d0004, object #=0x14d94, table/partition=0x0
wait_id=787 seq_num=890 snap_id=50
wait times: snap=6.001627 sec, exc=4 min 3 sec, total=4 min 3 sec
wait times: max=infinite
wait counts: calls=82 os=82
occurred after 0.000000 sec of elapsed time
6: waited for 'latch: enqueue hash chains'
address=0x6000cf38, number=0x1c, tries=0x0
wait_id=836 seq_num=889 snap_id=1
wait times: snap=0.000378 sec, exc=0.000378 sec, total=0.000378 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000000 sec of elapsed time
7: waited for 'enq: TM - contention'
name|mode=0x544d0004, object #=0x14d94, table/partition=0x0
wait_id=787 seq_num=888 snap_id=49
wait times: snap=3.000543 sec, exc=3 min 57 sec, total=3 min 57 sec
wait times: max=infinite
wait counts: calls=80 os=80
occurred after 0.000000 sec of elapsed time
8: waited for 'latch: enqueue hash chains'
address=0x6000cf38, number=0x1c, tries=0x0
wait_id=835 seq_num=887 snap_id=1
wait times: snap=0.000350 sec, exc=0.000350 sec, total=0.000350 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000000 sec of elapsed time
9: waited for 'enq: TM - contention'
name|mode=0x544d0004, object #=0x14d94, table/partition=0x0
wait_id=787 seq_num=886 snap_id=48
wait times: snap=3.000880 sec, exc=3 min 54 sec, total=3 min 54 sec
wait times: max=infinite
wait counts: calls=79 os=79
occurred after 0.000000 sec of elapsed time
Updated on 6/5/2015
The whole trace file is here
This ORA-00060 was causing problems for my team as well. The trace file showed two sessions blocking each other as TX--- X --------S. In our case we were sure the multithreaded workload was partitioned on a unique_number column, so two threads would never update the same record. Further down, the trace file showed "enq: TX - allocate ITL entry". While the processing was running, I checked the kinds of waits:
select blocking_session, sid, serial#, wait_class, seconds_in_wait
from
v$session
where
blocking_session is not NULL
order by
blocking_session;
It showed that the wait class was "Configuration". So I increased the INITRANS of the tables used by the processing, and also reduced the batch size so that commits happen more frequently than before. After that, the problem was solved.
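For reference, the INITRANS change might look like this. The table name is taken from the question's SQL and is purely illustrative, as is the value 10; pick a value that matches your expected concurrent transactions per block:

```sql
-- Reserve more ITL slots per block (applies to newly formatted blocks only):
ALTER TABLE t_course_takes INITRANS 10;

-- Rebuild the segment so existing blocks pick up the new setting:
ALTER TABLE t_course_takes MOVE;

-- MOVE leaves the table's indexes UNUSABLE; rebuild each one,
-- e.g. (index name hypothetical):
ALTER INDEX idx_course_takes_id REBUILD;
```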

Resources