Copying an Oracle Database to another environment does not work
I am trying to restore a database from one environment to another. I backed up all the files from machineA and copied them to machineB.
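(For reference, the copy procedure is essentially a cold clone along these lines; the paths and file names below are illustrative, and a consistent copy requires a clean shutdown on the source first.)

# on machineA: take the database down cleanly so the files are consistent
sqlplus / as sysdba <<EOF
shutdown immediate
EOF
# copy datafiles, control files, online redo logs, and the spfile,
# preserving the identical directory layout on machineB
scp -rp /u/db1/app/oracle/oradata/mydb machineB:/u/db1/app/oracle/oradata/
scp -p $ORACLE_HOME/dbs/spfilemydb.ora machineB:$ORACLE_HOME/dbs/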
The directory structures and file locations on both machines are the same, and both are running Oracle 10.2.0.3.0.
I have done this several times before and it has always worked fine, but this time I appear to be stuck. After restoring all the files onto machineB, I start up Oracle and it reports that it has started:
SQL> startup
ORACLE instance started.
Total System Global Area 1610612736 bytes
Fixed Size 2030456 bytes
Variable Size 234882184 bytes
Database Buffers 1358954496 bytes
Redo Buffers 14745600 bytes
Database mounted.
Database opened.
A few minutes later the instance just terminates. I looked at the alert log, and this is what it shows:
ALTER DATABASE MOUNT
Wed Nov 23 11:16:14 2011
Setting recovery target incarnation to 1
Wed Nov 23 11:16:14 2011
Successful mount of redo thread 1, with mount id 4202976378
Wed Nov 23 11:16:14 2011
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
Wed Nov 23 11:16:14 2011
ALTER DATABASE OPEN
Wed Nov 23 11:16:15 2011
Beginning crash recovery of 1 threads
parallel recovery started with 2 processes
Wed Nov 23 11:16:15 2011
Started redo scan
Wed Nov 23 11:16:15 2011
Completed redo scan
22887 redo blocks read, 29 data blocks need recovery
Wed Nov 23 11:16:15 2011
Started redo application at
Thread 1: logseq 29229, block 72
Wed Nov 23 11:16:15 2011
Recovery of Online Redo Log: Thread 1 Group 3 Seq 29229 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo03.log
Wed Nov 23 11:16:15 2011
Completed redo application
Wed Nov 23 11:16:16 2011
Completed crash recovery at
Thread 1: logseq 29229, block 22959, scn 10603747634124
29 data blocks read, 29 data blocks written, 22887 redo blocks read
Wed Nov 23 11:16:17 2011
Thread 1 advanced to log sequence 29230
Thread 1 opened at log sequence 29230
Current log# 1 seq# 29230 mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Successful open of redo thread 1
Wed Nov 23 11:16:17 2011
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Nov 23 11:16:17 2011
SMON: enabling cache recovery
Wed Nov 23 11:16:18 2011
Successfully onlined Undo Tablespace 1.
Wed Nov 23 11:16:18 2011
SMON: enabling tx recovery
Wed Nov 23 11:16:18 2011
Database Characterset is WE8ISO8859P1
Wed Nov 23 11:16:18 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_smon_13515.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=16, OS id=13532
Wed Nov 23 11:16:20 2011
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:16:20 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery stopped at EOT rba 29230.66.16
Block recovery completed at rba 29230.66.16, scn 2468.3768347663
Doing block recovery for file 2 block 25
Block recovery from logseq 29230, block 56 to scn 10603747634177
Wed Nov 23 11:16:20 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.58.16, scn 2468.3768347651
Wed Nov 23 11:16:20 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_smon_13515.trc:
ORA-01595: error freeing extent (3) of rollback segment (2))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Wed Nov 23 11:16:20 2011
Completed: ALTER DATABASE OPEN
Wed Nov 23 11:16:21 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_mmon_13521.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Wed Nov 23 11:16:22 2011
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:16:22 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Doing block recovery for file 2 block 25
Block recovery from logseq 29230, block 56 to scn 10603747634208
Wed Nov 23 11:16:23 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.88.16, scn 2468.3768347681
Wed Nov 23 11:18:27 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Wed Nov 23 11:18:28 2011
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:28 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:28 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:30 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:30 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:32 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:32 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:34 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:34 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-06512: at "SYS.PRVT_ADVISOR", line 4896
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-06512: at line 1
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:35 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:35 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_m000_13538.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-06512: at "SYS.PRVT_ADVISOR", line 4896
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
ORA-06512: at line 1
Wed Nov 23 11:18:36 2011
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
Doing block recovery for file 2 block 972
Block recovery from logseq 29230, block 56 to scn 10603747634191
Wed Nov 23 11:18:36 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Mem# 0: /u/db1/app/oracle/oradata/mydb/redo01.log
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
Wed Nov 23 11:18:36 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_pmon_13503.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Wed Nov 23 11:18:37 2011
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_pmon_13503.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
PMON: terminating instance due to error 472
Instance terminated by PMON, pid = 13503
The only difference this time (compared to the last time I restored the database) is that I cleared the trace files from the bdump directory (though not the alert log) before I started the database. Could this have caused the problem?
Here is an excerpt from one of the trace files mentioned in the alert log:
/u/db1/app/oracle/admin/ccsbill/bdump/mydb_pmon_13503.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /u/db1/app/oracle/product/10.2.0/db
System name: SunOS
Node name: myPC
Release: 5.9
Version: Generic_122300-13
Machine: sun4u
Instance name: mydb
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 13503, image: oracle#myPC (PMON)
*** 2011-11-23 11:18:36.626
*** SERVICE NAME:(SYS$BACKGROUND) 2011-11-23 11:18:36.625
*** SESSION ID:(170.1) 2011-11-23 11:18:36.625
Flush retried for xcb 0x3ddf94728, pmd 0x3dc32cc30
DEBUG: Reconstructing undo block 0x8003cc for xcb 0x3ddf94728
Doing block recovery for file 2 block 972
Block header before block recovery:
buffer tsn: 1 rdba: 0x008003cc (2/972)
scn: 0x09a4.e09bc65c seq: 0x01 flg: 0x04 tail: 0xc65c0201
frmt: 0x02 chkval: 0x409e type: 0x02=KTU UNDO BLOCK
Block recovery from logseq 29230, block 56 to scn 10603747634191
*** 2011-11-23 11:18:36.641
Recovery of Online Redo Log: Thread 1 Group 1 Seq 29230 Reading mem 0
Block recovery completed at rba 29230.66.16, scn 2468.3768347664
----- Redo read statistics for thread 1 -----
Read rate (ASYNC): 383Kb in 0.06s => 6.24 Mb/sec
Total physical reads: 4096Kb
Longest record: 0Kb, moves: 0/11 (0%)
Longest LWN: 1Kb, moves: 0/7 (0%), moved: 0Mb
Last redo scn: 0x09a4.e09c6c0e (10603747634190)
----------------------------------------------
Block image after block recovery:
buffer tsn: 1 rdba: 0x008003cc (2/972)
scn: 0x09a4.e09bc65c seq: 0x01 flg: 0x04 tail: 0xc65c0201
frmt: 0x02 chkval: 0x409e type: 0x02=KTU UNDO BLOCK
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00000003D13FA000 to 0x00000003D13FC000
3D13FA000 02A20000 008003CC E09BC65C 09A40104 [...........\....]
3D13FA010 409E0000 00020018 00065BF2 C49F1515 [#.........[.....]
3D13FA020 00001FE8 1F641ED8 1E041D7C 1CF81C5C [.....d.....|...\]
3D13FA030 1BC41B40 1ADC1A88 1A1C1998 192818A8 [...#.........(..]
3D13FA040 1810172C 16C4166C 15E8156C 14D40000 [...,...l...l....]
3D13FA050 00000000 00000000 00000000 00000000 [................]
Repeat 328 times
3D13FB4E0 00000000 00000000 000C0048 0020001D [...........H. ..]
3D13FB4F0 00020000 000000ED 000000ED 00000000 [................]
3D13FB500 00000000 0B011800 04080001 008003CC [................]
3D13FB510 C49F1300 E09BC00B 09A40000 E09BC011 [................]
3D13FB520 09A40001 000A0024 E09BC647 09A4FFFF [.......$...G....]
3D13FB530 008003C5 00000000 00000000 04010000 [................]
3D13FB540 00000000 00070010 0004935C 00800580 [...........\....]
3D13FB550 9B5B1600 800009A4 E09BC643 0040067A [.[.........C.#.z]
3D13FB560 00400679 12FF0501 020076C0 2C000100 [.#.y......v.,...]
3D13FB570 00001301 FFF90100 00000000 00050000 [................]
...skipping...
child# table reference handle
------ -------- --------- --------
0 3dac7ffd8 3dac7fc48 3de7ab528
DATA BLOCKS:
data# heap pointer status pins change whr
----- -------- -------- --------- ---- ------ ---
0 3de7abae8 3dac80628 I/P/A/-/- 0 NONE 00
----------------------------------------
SO: 3df696d30, type: 12, owner: 3df4091d8, flag: -/-/-/0x00
KSV Slave Class State
--------------
slave num 0, incarnation 1, KSV Context 3df694da0, creator: 3df2f5ff8
slave flags: 0x102
ksvcctx: 3df694da0 dpptr: 3df696d30 exitcond: 0 class#: 5
active: 1 spawned: 1 max: a flags: 0x2 enqueue: 0
directmsghdl: 3df4678b8 workmsghdl: 3df467928
ksvwqlr: 3df694da0 latch 3df694da0
ksvrecv: 3df694e40 op: 0x0 ro = 0 owner = 0
Queue (0)
ksvmqd: 3df694e90 count : 0
ksvwqlr: 3df694e90 latch 3df694e90
ksvrecv: 3df694f30 op: 0x0 ro = 0 owner = 0
Queue messages 3df694f50 Is Empty [3df694f50,3df694f50]
Queue (1)
ksvmqd: 3df694f68 count : 0
ksvwqlr: 3df694f68 latch 3df694f68
ksvrecv: 3df695008 op: 0x0 ro = 0 owner = 0
Queue messages 3df695028 Is Empty [3df695028,3df695028]
Queue (2)
ksvmqd: 3df695040 count : 0
ksvwqlr: 3df695040 latch 3df695040
ksvrecv: 3df6950e0 op: 0x0 ro = 0 owner = 0
Queue messages 3df695100 Is Empty [3df695100,3df695100]
dmsg: sendq: 3df696dc0 Is Empty [3df696dc0,3df696dc0]
dmsg: recvq: 3df696dd0 Is Empty [3df696dd0,3df696dd0]
dmsg: doneq: 3df696de0 Is Empty [3df696de0,3df696de0]
wmsg: workq: 3df696df0 Is Empty [3df696df0,3df696df0]
wmsg: doneq: 3df696e00 Is Empty [3df696e00,3df696e00]
Class Context: active: 1, spawned: 1, max: 10
Context Flags: 0x2, Work Queue: 3df694e90, Class Num: 5
----------------------------------------
SO: 3ddfcbbe8, type: 41, owner: 3df4091d8, flag: INIT/-/-/0x00
(dummy) nxc=0, nlb=1
----------------------------------------
SO: 3ddf46648, type: 39, owner: 3ddfcbbe8, flag: -/-/-/0x00
(List of Blocks) next index = 5
index itli buffer hint rdba savepoint
-----------------------------------------------------------
0 1 0x3d0fa00a8 0xc05534 0x6b69
1 2 0x3d0f9f2d8 0xc002ee 0x6b6b
2 2 0x3d0f97ce8 0xc002f8 0x6b6d
3 2 0x3d0f97ac8 0xc00300 0x6b6f
4 2 0x3d0f97578 0xc0894a 0x6b71
----------------------------------------
SO: 3df43ad08, type: 3, owner: 3df2f9f38, flag: INIT/-/-/0x00
(call) sess: cur 3df4091d8, rec 3df4053c8, usr 3df4091d8; depth: 0
(k2g table)
error 600 detected in background process
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
Did you see the ORA-00600 [4194] errors? They look like this:
Errors in file /u/db1/app/oracle/admin/mydb/bdump/mydb_smon_13515.trc:
ORA-00600: internal error code, arguments: [4194], [21], [21], [], [], [], [], []
That's your problem.
An ORA-00600 always means you should work with Oracle Support on the problem. I did a quick lookup, and the 4194 error means you have undo segment corruption.
You may try redoing the clone, assuming the source database itself isn't corrupted. If the source has this problem too, you'll probably need to restore/recover the UNDO tablespace, at a minimum.
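If it does come to rebuilding the undo, the commonly documented sequence for ORA-600 [4194] recovery looks roughly like the sketch below. Treat it as an outline to walk through with Support, not a recipe: the tablespace names, path, and size are examples, and Support may also have you set the _corrupted_rollback_segments / _offline_rollback_segments underscore parameters, which should never be done without their guidance.

SQL> create pfile='/tmp/initmydb.ora' from spfile;
-- edit /tmp/initmydb.ora: set undo_management = MANUAL
SQL> shutdown abort
SQL> startup restrict pfile='/tmp/initmydb.ora'
SQL> create undo tablespace undotbs2 datafile '/u/db1/app/oracle/oradata/mydb/undotbs02.dbf' size 500m;
SQL> drop tablespace undotbs1 including contents and datafiles;
-- set undo_management = AUTO and undo_tablespace = UNDOTBS2 in the pfile,
-- recreate the spfile from it, and restart normally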
I strongly suggest you log in to the My Oracle Support (MOS) site and look closely at this document:
ORA-600 [4194] "Undo Record Number Mismatch While Adding Undo Record"
[ID 39283.1]
Hope that helps.
Related
DPC_WATCHDOG_VIOLATION (133/1) Potentially related to NdisFIndicateReceiveNetBufferLists?
We have an NDIS LWF driver, and on a single machine we get a DPC_WATCHDOG_VIOLATION 133/1 bugcheck when the user tries to connect to their VPN to get internet access. This could be related to our NdisFIndicateReceiveNetBufferLists, as the IRQL is raised to DISPATCH before calling it (and obviously lowered to whatever it was afterward), and that does appear in the output of !dpcwatchdog shown below. This is done as a workaround for another bug explained here: IRQL_UNEXPECTED_VALUE BSOD after NdisFIndicateReceiveNetBufferLists? Now this is the bugcheck:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above.
Arguments:
Arg1: 0000000000000001, The system cumulatively spent an extended period of time at DISPATCH_LEVEL or above. The offending component can usually be identified with a stack trace.
Arg2: 0000000000001e00, The watchdog period.
Arg3: fffff805422fb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains additional information regarding the cumulative timeout
Arg4: 0000000000000000

STACK_TEXT:
nt!KeBugCheckEx
nt!KeAccumulateTicks+0x1846b2
nt!KiUpdateRunTime+0x5d
nt!KiUpdateTime+0x4a1
nt!KeClockInterruptNotify+0x2e3
nt!HalpTimerClockInterrupt+0xe2
nt!KiCallInterruptServiceRoutine+0xa5
nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
nt!KiInterruptDispatchNoLockNoEtw+0x37
nt!KxWaitForSpinLockAndAcquire+0x2c
nt!KeAcquireSpinLockAtDpcLevel+0x5c
wanarp!WanNdisReceivePackets+0x4bb
ndis!ndisMIndicateNetBufferListsToOpen+0x141
ndis!ndisMTopReceiveNetBufferLists+0x3f0e4
ndis!ndisCallReceiveHandler+0x61
ndis!ndisInvokeNextReceiveHandler+0x1df
ndis!NdisMIndicateReceiveNetBufferLists+0x104
ndiswan!IndicateRecvPacket+0x596
ndiswan!ApplyQoSAndIndicateRecvPacket+0x20b
ndiswan!ProcessPPPFrame+0x16f
ndiswan!ReceivePPP+0xb3
ndiswan!ProtoCoReceiveNetBufferListChain+0x442
ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xf6
ndis!NdisMCoIndicateReceiveNetBufferLists+0x11
raspptp!CallIndicateReceived+0x210
raspptp!CallProcessRxNBLs+0x199
ndis!ndisDispatchIoWorkItem+0x12
nt!IopProcessWorkItem+0x135
nt!ExpWorkerThread+0x105
nt!PspSystemThreadStartup+0x55
nt!KiStartSystemThread+0x28

SYMBOL_NAME: wanarp!WanNdisReceivePackets+4bb
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: wanarp
IMAGE_NAME: wanarp.sys

And the following is the output of !dpcwatchdog, but I still can't find what is causing this bugcheck, or which function is consuming too much time at DISPATCH level. Although I think this could be related to some spin locking done by wanarp? Could this be a bug in wanarp? Note that we don't use any spinlocking in our driver, and raising the IRQL ourselves should not cause any issue, as it is actually very common for indications in NDIS to be done at IRQL DISPATCH. So how can I find the root cause of this bugcheck? There are no other third-party LWFs in the NDIS stack.
3: kd> !dpcwatchdog All durations are in seconds (1 System tick = 15.625000 milliseconds) Circular Kernel Context Logger history: !logdump 0x2 DPC and ISR stats: !intstats /d -------------------------------------------------- CPU#0 -------------------------------------------------- Current DPC: No Active DPC Pending DPCs: ---------------------------------------- CPU Type KDPC Function dpcs: no pending DPCs found -------------------------------------------------- CPU#1 -------------------------------------------------- Current DPC: No Active DPC Pending DPCs: ---------------------------------------- CPU Type KDPC Function 1: Normal : 0xfffff80542220e00 0xfffff805418dbf10 nt!PpmCheckPeriodicStart 1: Normal : 0xfffff80542231d40 0xfffff8054192c730 nt!KiBalanceSetManagerDeferredRoutine 1: Normal : 0xffffbd0146590868 0xfffff80541953200 nt!KiEntropyDpcRoutine DPC Watchdog Captures Analysis for CPU #1. DPC Watchdog capture size: 641 stacks. Number of unique stacks: 1. No common functions detected! The captured stacks seem to indicate that only a single DPC or generic function is the culprit. Try to analyse what other processors were doing at the time of the following reference capture: CPU #1 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec # RetAddr Call Site 00 fffff805418d8991 nt!KiUpdateRunTime+0x5D 01 fffff805418d2803 nt!KiUpdateTime+0x4A1 02 fffff805418db1c2 nt!KeClockInterruptNotify+0x2E3 03 fffff80541808a45 nt!HalpTimerClockInterrupt+0xE2 04 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5 05 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA 06 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37 07 fffff805418da3cc nt!KxWaitForSpinLockAndAcquire+0x2C 08 fffff8054fa614cb nt!KeAcquireSpinLockAtDpcLevel+0x5C 09 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x4BB 0a fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141 0b fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4 0c fffff80546bddfef ndis!ndisCallReceiveHandler+0x61 0d fffff80546ba4a94 ndis!ndisInvokeNextReceiveHandler+0x1DF 0e fffff8057c32d17e ndis!NdisMIndicateReceiveNetBufferLists+0x104 0f fffff8057c30d6c7 ndiswan!IndicateRecvPacket+0x596 10 fffff8057c32d56b ndiswan!ApplyQoSAndIndicateRecvPacket+0x20B 11 fffff8057c32d823 ndiswan!ProcessPPPFrame+0x16F 12 fffff8057c308e62 ndiswan!ReceivePPP+0xB3 13 fffff80546c5c006 ndiswan!ProtoCoReceiveNetBufferListChain+0x442 14 fffff80546c5c2d1 ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xF6 15 fffff8057c2b0064 ndis!NdisMCoIndicateReceiveNetBufferLists+0x11 16 fffff8057c2b06a9 raspptp!CallIndicateReceived+0x210 17 fffff80546bd9dc2 raspptp!CallProcessRxNBLs+0x199 18 fffff80541899645 ndis!ndisDispatchIoWorkItem+0x12 19 fffff80541852b65 nt!IopProcessWorkItem+0x135 1a fffff80541871d25 nt!ExpWorkerThread+0x105 1b fffff80541a00778 nt!PspSystemThreadStartup+0x55 1c ---------------- nt!KiStartSystemThread+0x28 -------------------------------------------------- CPU#2 -------------------------------------------------- Current DPC: No Active DPC Pending DPCs: ---------------------------------------- CPU Type KDPC Function 2: Normal : 0xffffbd01467f0868 0xfffff80541953200 nt!KiEntropyDpcRoutine DPC Watchdog Captures Analysis for CPU #2. DPC Watchdog capture size: 641 stacks. Number of unique stacks: 1. No common functions detected! The captured stacks seem to indicate that only a single DPC or generic function is the culprit. 
Try to analyse what other processors were doing at the time of the following reference capture: CPU #2 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec # RetAddr Call Site 00 fffff805418d245a nt!KeClockInterruptNotify+0x453 01 fffff80541808a45 nt!HalpTimerClockIpiRoutine+0x1A 02 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5 03 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA 04 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37 05 fffff805418a9a68 nt!KxWaitForSpinLockAndAcquire+0x2C 06 fffff8054fa611cb nt!KeAcquireSpinLockRaiseToDpc+0x88 07 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x1BB 08 fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141 09 fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4 0a fffff80546bddfef ndis!ndisCallReceiveHandler+0x61 0b fffff80546be3a81 ndis!ndisInvokeNextReceiveHandler+0x1DF 0c fffff80546ba804e ndis!ndisFilterIndicateReceiveNetBufferLists+0x3C611 0d fffff8054e384d77 ndis!NdisFIndicateReceiveNetBufferLists+0x6E 0e fffff8054e3811a9 ourdriver+0x4D70 0f fffff80546ba7d40 ourdriver+0x11A0 10 fffff8054182a6b5 ndis!ndisDummyIrpHandler+0x100 11 fffff80541c164c8 nt!IofCallDriver+0x55 12 fffff80541c162c7 nt!IopSynchronousServiceTail+0x1A8 13 fffff80541c15646 nt!IopXxxControlFile+0xC67 14 fffff80541a0aab5 nt!NtDeviceIoControlFile+0x56 15 ---------------- nt!KiSystemServiceCopyEnd+0x25 -------------------------------------------------- CPU#3 -------------------------------------------------- Current DPC: No Active DPC Pending DPCs: ---------------------------------------- CPU Type KDPC Function dpcs: no pending DPCs found Target machine version: Windows 10 Kernel Version 19041 MP (4 procs) Also note that we also pass the NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL flag to the NdisFIndicateReceiveNetBufferLists, if the current IRQL is dispatch. Edit1: This is also the output of !locks and !qlocks and !ready, And the contention count on one of the resources is 49135, is this normal or too high? Could this be related to our issue? The threads that are waiting on it or own it are for normal processes such as chrome, csrss, etc. 3: kd> !kdexts.locks **** DUMP OF ALL RESOURCE OBJECTS **** KD: Scanning for held locks. Resource # nt!ExpTimeRefreshLock (0xfffff80542219440) Exclusively owned Contention Count = 17 Threads: ffffcf8ce9dee640-01<*> KD: Scanning for held locks..... Resource # 0xffffcf8cde7f59f8 Shared 1 owning threads Contention Count = 62 Threads: ffffcf8ce84ec080-01<*> KD: Scanning for held locks............................................................................................... Resource # 0xffffcf8ce08d0890 Exclusively owned Contention Count = 49135 NumberOfSharedWaiters = 1 NumberOfExclusiveWaiters = 6 Threads: ffffcf8cf18e3080-01<*> ffffcf8ce3faf080-01 Threads Waiting On Exclusive Access: ffffcf8ceb6ce080 ffffcf8ce1d20080 ffffcf8ce77f1080 ffffcf8ce92f4080 ffffcf8ce1d1f0c0 ffffcf8ced7c6080 KD: Scanning for held locks. 
Resource # 0xffffcf8ce08d0990 Shared 1 owning threads Threads: ffffcf8cf18e3080-01<*> KD: Scanning for held locks......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... Resource # 0xffffcf8ceff46350 Shared 1 owning threads Threads: ffffcf8ce6de8080-01<*> KD: Scanning for held locks...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... Resource # 0xffffcf8cf0cade50 Exclusively owned Contention Count = 3 Threads: ffffcf8ce84ec080-01<*> KD: Scanning for held locks......................... Resource # 0xffffcf8cf0f76180 Shared 1 owning threads Threads: ffffcf8ce83dc080-02<*> KD: Scanning for held locks....................................................................................................................................................................................................................................................... Resource # 0xffffcf8cf1875cb0 Shared 1 owning threads Contention Count = 3 Threads: ffffcf8ce89db040-02<*> KD: Scanning for held locks. Resource # 0xffffcf8cf18742d0 Shared 1 owning threads Threads: ffffcf8cee5e1080-02<*> KD: Scanning for held locks.................................................................................... Resource # 0xffffcf8cdceeece0 Shared 2 owning threads Contention Count = 4 Threads: ffffcf8ce3a1c080-01<*> ffffcf8ce5625040-01<*> Resource # 0xffffcf8cdceeed48 Shared 1 owning threads Threads: ffffcf8ce5625043-02<*> *** Actual Thread ffffcf8ce5625040 KD: Scanning for held locks... Resource # 0xffffcf8cf1d377d0 Exclusively owned Threads: ffffcf8cf0ff3080-02<*> KD: Scanning for held locks.... Resource # 0xffffcf8cf1807050 Exclusively owned Threads: ffffcf8ce84ec080-01<*> KD: Scanning for held locks...... 
245594 total locks, 13 locks currently held 3: kd> !qlocks Key: O = Owner, 1-n = Wait order, blank = not owned/waiting, C = Corrupt Processor Number Lock Name 0 1 2 3 KE - Unused Spare MM - Unused Spare MM - Unused Spare MM - Unused Spare CC - Vacb CC - Master EX - NonPagedPool IO - Cancel CC - Unused Spare IO - Vpb IO - Database IO - Completion NTFS - Struct AFD - WorkQueue CC - Bcb MM - NonPagedPool 3: kd> !ready KSHARED_READY_QUEUE fffff8053f1ada00: (00) ****------------------------------------------------------------ SharedReadyQueue fffff8053f1ada00: No threads in READY state Processor 0: No threads in READY state Processor 1: Ready Threads at priority 15 THREAD ffffcf8ce9dee640 Cid 2054.2100 Teb: 000000fab7bca000 Win32Thread: 0000000000000000 READY on processor 1 Processor 2: No threads in READY state Processor 3: No threads in READY state 3: kd> dt nt!_ERESOURCE 0xffffcf8ce08d0890 +0x000 SystemResourcesList : _LIST_ENTRY [ 0xffffcf8c`e08d0610 - 0xffffcf8c`e08cf710 ] +0x010 OwnerTable : 0xffffcf8c`ee6e8210 _OWNER_ENTRY +0x018 ActiveCount : 0n1 +0x01a Flag : 0xf86 +0x01a ReservedLowFlags : 0x86 '' +0x01b WaiterPriority : 0xf '' +0x020 SharedWaiters : 0xffffae09`adcae8e0 Void +0x028 ExclusiveWaiters : 0xffffae09`a9aabea0 Void +0x030 OwnerEntry : _OWNER_ENTRY +0x040 ActiveEntries : 1 +0x044 ContentionCount : 0xbfef +0x048 NumberOfSharedWaiters : 1 +0x04c NumberOfExclusiveWaiters : 6 +0x050 Reserved2 : (null) +0x058 Address : (null) +0x058 CreatorBackTraceIndex : 0 +0x060 SpinLock : 0 3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 (*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0)) (*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0)) [Type: _OWNER_ENTRY] [+0x000] OwnerThread : 0xffffcf8cf18e3080 [Type: unsigned __int64] [+0x008 ( 0: 0)] IoPriorityBoosted : 0x0 [Type: unsigned long] [+0x008 ( 1: 1)] OwnerReferenced : 0x0 [Type: unsigned long] [+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long] [+0x008 (31: 3)] OwnerCount : 0x1 [Type: unsigned long] [+0x008] TableSize : 0xc [Type: unsigned long] 3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 ((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210) ((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210) : 0xffffcf8cee6e8210 [Type: _OWNER_ENTRY *] [+0x000] OwnerThread : 0x0 [Type: unsigned __int64] [+0x008 ( 0: 0)] IoPriorityBoosted : 0x1 [Type: unsigned long] [+0x008 ( 1: 1)] OwnerReferenced : 0x1 [Type: unsigned long] [+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long] [+0x008 (31: 3)] OwnerCount : 0x0 [Type: unsigned long] [+0x008] TableSize : 0x7 [Type: unsigned long]
Thanks for reporting this. I've tracked this down to an OS bug: there's a deadlock in wanarp. This issue appears to affect every version of the OS going back to Windows Vista. I've filed internal issue task.ms/42393356 to track this; if you have a Microsoft support contract, your rep can get you status updates on that issue.

Meanwhile, you can partially work around this issue by either:

- Indicating 1 packet at a time (NumberOfNetBufferLists==1); or
- Indicating on a single CPU at a time

The bug in wanarp is exposed when 2 or more CPUs collectively process 3 or more NBLs at the same time, so either workaround would avoid the trigger conditions. Depending on how much bandwidth you're pushing through this network interface, those options could be rather bad for CPU/battery/throughput, so please try to avoid pessimizing batching unless it's really necessary. (For example, you could make this an option that's off-by-default, unless the customer specifically uses wanarp.)

Note that you cannot fully prevent the issue yourself. Other drivers in the stack, including NDIS itself, have the right to group packets together, which would have the side effect of re-batching the packets that you carefully un-batched. However, I believe that you can make a statistically significant dent in the crashes if you just indicate 1 NBL at a time, or indicate multiple NBLs on 1 CPU at a time.

Sorry this is happening to you again! wanarp is... a very old codebase.
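If you go the one-NBL-at-a-time route, the receive-path change is small. A minimal sketch, with illustrative names (this is not your driver's code; FilterHandle is the NdisFilterHandle passed to FilterAttach, and PortNumber/ReceiveFlags are whatever the original batched indication used):

#include <ndis.h>

// Indicate each NET_BUFFER_LIST by itself so wanarp never sees a batch.
VOID
IndicateNblsOneAtATime(
    _In_ NDIS_HANDLE      FilterHandle,
    _In_ PNET_BUFFER_LIST NetBufferLists,
    _In_ NDIS_PORT_NUMBER PortNumber,
    _In_ ULONG            ReceiveFlags)
{
    PNET_BUFFER_LIST current = NetBufferLists;

    while (current != NULL) {
        // Detach the head NBL from the chain before indicating it up.
        PNET_BUFFER_LIST next = NET_BUFFER_LIST_NEXT_NBL(current);
        NET_BUFFER_LIST_NEXT_NBL(current) = NULL;

        NdisFIndicateReceiveNetBufferLists(FilterHandle,
                                           current,
                                           PortNumber,
                                           1,   // NumberOfNetBufferLists
                                           ReceiveFlags);
        current = next;
    }
}

Splitting should be safe with or without NDIS_RECEIVE_FLAGS_RESOURCES: with the flag set, ownership of each NBL comes back when the call returns; without it, the NBLs return through FilterReturnNetBufferLists as usual.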
Long running Informatica Job
I am running a job where, in the source, I have an Oracle SQL query to read the rows, followed by a Sorter transformation. Below are the session logs, where we can see that after the start of the Sorter transformation [srt] it takes more than 3 hours to process 410516 rows. I am failing to understand whether the sorter is taking the time or the source query. Appreciate your response.

READER_1_1_1> RR_4049 [2022-06-26 23:40:13.888] SQL Query issued to database : (Sun Jun 26 23:40:13 2022)
READER_1_1_1> RR_4050 [2022-06-26 23:49:01.147] First row returned from database to reader : (Sun Jun 26 23:49:01 2022)
TRANSF_1_1_1> SORT_40420 [2022-06-26 23:49:01.148] Start of input for Transformation [srt]. : (Sun Jun 26 23:49:01 2022)
READER_1_1_1> BLKR_16019 [2022-06-27 03:04:51.901] Read [410516] rows, read [0] error rows for source table [DUMMY_src] instance name [DUMMY_src]
READER_1_1_1> BLKR_16008 [2022-06-27 03:04:51.902] Reader run completed.
TRANSF_1_1_1> SORT_40421 [2022-06-27 03:04:51.909] End of input for Transformation [srt]. : (Mon Jun 27 03:04:51 2022)
TRANSF_1_1_1> SORT_40422 [2022-06-27 03:04:52.180] End of output from Sorter Transformation [srt]. Processed 410516 rows (6568256 input bytes; 0 temp I/O bytes). : (Mon Jun 27 03:04:52 2022)
TRANSF_1_1_1> SORT_40423 [2022-06-27 03:04:52.181] End of sort for Sorter Transformation [srt]. : (Mon Jun 27 03:04:52 2022)
WRITER_1_*_1> WRT_8167 [2022-06-27 03:04:52.201] Start loading table
WRT_8035 [2022-06-27 03:09:47.457] Load complete time: Mon Jun 27 03:09:47 2022
Repository creation fails while upgrading OWB 11gR1 (11.1.0.7) to OWB 11gR2 (11.2.0.4)
I need to create a new workspace in OWB 11gR2 (11.2.0.4) to upgrade from OWB 11gR1 (11.1.0.7). The Repository Assistant fails after processing 64%. The following is the error log.

main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): % = 0.8051529790660225
main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): -token name = LOADJAVA; -token type = 13
main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): ProcessEngine.token_db_min_ver =
main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): Before processing LOADJAVA Token
main.TaskScheduler timer[5]20200714#08:45:58.058: 00> oracle.wh.service.impl.assistant.ProcessEngine.display(ProcessEngine.java:2122): ... I am in processLoadJavaToken ...
main.AWT-EventQueue-0[6]20200714#08:48:36.036: 00> oracle.wh.ui.jcommon.WhButton#424c414: WhButton setLabel rtsString = Yes
main.AWT-EventQueue-0[6]20200714#08:48:36.036: 00> oracle.wh.ui.jcommon.WhButton#424c414: WhButton setLabel rtsString = No

The following is the list of database patches.

Patch 17906774: applied on Wed Aug 04 11:21:52 BDT 2021
Unique Patch ID: 17692968
Created on 14 May 2014, 22:56:54 hrs PST8PDT
Bugs fixed:
17607032, 17974168, 17669786, 17561509, 16885825, 18274560, 17613052
17461930, 16829998, 17251918, 17435868, 17279666, 17328020, 17006987
18260620, 16833468, 18180599, 17292119, 17340242, 17296559, 15990966
17438322, 17939651, 17359696, 18385759, 17820353, 17939225, 17715818
18192446, 16960088, 17191248, 17422695

Patch 31668908 : applied on Mon Jul 12 16:13:02 BDT 2021
Unique Patch ID: 23822194
Patch description: "OJVM PATCH SET UPDATE 11.2.0.4.201020"
Created on 18 Sep 2020, 03:30:45 hrs PST8PDT
Bugs fixed:
23727132, 19554117, 19006757, 14774730, 18933818, 18458318, 18166577
19231857, 19153980, 19058059, 19007266, 17285560, 17201047, 17056813
19223010, 19852360, 19909862, 19895326, 19374518, 20408829, 21047766
21566944, 19176885, 17804361, 17528315, 21811517, 22253904, 19187988
21911849, 22118835, 22670385, 23265914, 22675136, 24448240, 25067795
24534298, 25076732, 25494379, 26023002, 19699946, 26637592, 27000663
25649873, 27461842, 27952577, 27642235, 28502128, 28915933, 29254615
29774367, 29992392, 29448234, 30160639, 30534664, 30855121, 31306274
30772207, 31476032, 30561292, 28394726, 26716835, 24817447, 23082876
31668867

Patch 31537677 : applied on Thu Jul 08 11:53:10 BDT 2021
Unique Patch ID: 23852314
Patch description: "Database Patch Set Update : 11.2.0.4.201020 (31537677)"
The following is the workaround that fixed the issue.

Step 1: Roll back patches 31668908 and 31537677. OWB is not supported with an OJVM patch newer than December 2018.
Step 2: Re-run the Repository Assistant.
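For reference, the rollback itself is the standard opatch sequence, something like this (a sketch assuming a single-instance home; always follow each patch README's pre- and post-deinstall steps):

# stop the database running out of this home first
sqlplus / as sysdba <<EOF
shutdown immediate
EOF
$ORACLE_HOME/OPatch/opatch rollback -id 31668908
$ORACLE_HOME/OPatch/opatch rollback -id 31537677
# restart the database, run any post-deinstall SQL from the patch READMEs,
# then re-run the Repository Assistant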
Can't add node to the CockroachDB cluster
I'm struggling to join a CockroachDB node to a cluster. I've created the first cluster, then tried to join a 2nd node to the first node, but the 2nd node created a new cluster, as shown below. Does anyone know which of my steps is wrong? Any suggestions are welcome.

I've started the first node as follows:

cockroach start --insecure --advertise-host=163.172.156.111

* Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html *
CockroachDB node starting at 2019-05-11 01:11:15.45522036 +0000 UTC (took 2.5s)
build: CCL v19.1.0 # 2019/04/29 18:36:40 (go1.11.6)
webui: http://163.172.156.111:8080
sql: postgresql://root#163.172.156.111:26257?sslmode=disable
client flags: cockroach <client cmd> --host=163.172.156.111:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp449555924
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: initialized new cluster
clusterID: 3e797faa-59a1-4b0d-83b5-36143ddbdd69
nodeID: 1

Then I start the secondary node to join to 163.172.156.111, but it can't join:

cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257

CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s)
build: CCL v19.1.0 # 2019/04/29 18:36:40 (go1.11.6)
webui: http://128.199.127.164:8080
sql: postgresql://root#128.199.127.164:26257?sslmode=disable
client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure
logs: /home/ueda/cockroach-data/logs
temp dir: /home/ueda/cockroach-data/cockroach-temp067740997
external I/O path: /home/ueda/cockroach-data/extern
store[0]: path=/home/ueda/cockroach-data
status: restarted pre-existing node
clusterID: a14e89a7-792d-44d3-89af-7037442eacbc
nodeID: 1

The cockroach.log of the joining node shows a gossip error:

cat cockroach-data/logs/cockroach.log
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] file created at: 2019/05/11 01:21:13
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] running on machine: amfortas
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] binary: CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6)
I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] arguments: [cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257]
I190511 01:21:13.762309 1 util/log/clog.go:1199 line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓
I190511 01:21:13.762307 1 cli/start.go:1033 logging to directory /home/ueda/cockroach-data/logs
W190511 01:21:13.763373 1 cli/start.go:1068 RUNNING IN INSECURE MODE!
- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.
Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html
I190511 01:21:13.763675 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory
W190511 01:21:13.763752 1 cli/start.go:944 Using the default setting for --cache (128 MiB). A significantly larger value is usually needed for good performance. If you have a dedicated server a reasonable setting is --cache=.25 (248 MiB).
I190511 01:21:13.764011 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory W190511 01:21:13.764047 1 cli/start.go:957 Using the default setting for --max-sql-memory (128 MiB). A significantly larger value is usually needed in production. If you have a dedicated server a reasonable setting is --max-sql-memory=.25 (248 MiB). I190511 01:21:13.764239 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory I190511 01:21:13.764272 1 cli/start.go:1082 CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6) I190511 01:21:13.866977 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory I190511 01:21:13.867002 1 server/config.go:386 system total memory: 992 MiB I190511 01:21:13.867063 1 server/config.go:388 server configuration: max offset 500000000 cache size 128 MiB SQL memory pool size 128 MiB scan interval 10m0s scan min idle time 10ms scan max idle time 1s event log enabled true I190511 01:21:13.867098 1 cli/start.go:929 process identity: uid 1000 euid 1000 gid 1000 egid 1000 I190511 01:21:13.867115 1 cli/start.go:554 starting cockroach node I190511 01:21:13.868242 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data/cockroach-temp067740997" I190511 01:21:13.894320 21 server/server.go:876 [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled I190511 01:21:13.894813 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data" W190511 01:21:13.896301 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed. W190511 01:21:13.905666 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed. I190511 01:21:13.911380 21 server/config.go:494 [n?] 1 storage engine initialized I190511 01:21:13.911417 21 server/config.go:497 [n?] RocksDB cache size: 128 MiB I190511 01:21:13.911427 21 server/config.go:497 [n?] store 0: RocksDB, max size 0 B, max open file limit 10000 W190511 01:21:13.912459 21 gossip/gossip.go:1496 [n?] no incoming or outgoing connections I190511 01:21:13.913206 21 server/server.go:926 [n?] Sleeping till wall time 1557537673913178595 to catches up to 1557537674394265598 to ensure monotonicity. Delta: 481.087003ms I190511 01:21:14.251655 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 tripped: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69" I190511 01:21:14.251695 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 event: BreakerTripped W190511 01:21:14.251763 65 gossip/client.go:122 [n?] 
failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69" I190511 01:21:14.395848 21 gossip/gossip.go:392 [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"128.199.127.164:26257" > attrs:<> locality:<> ServerVersion:<major_val:19 minor_val:1 patch:0 unstable:0 > build_tag:"v19.1.0" started_at:1557537674395557548 W190511 01:21:14.458176 21 storage/replica_range_lease.go:506 can't determine lease status due to node liveness error: node not in the liveness table I190511 01:21:14.458465 21 server/node.go:461 [n1] initialized store [n1,s1]: disk (capacity=24 GiB, available=18 GiB, used=2.2 MiB, logicalBytes=41 MiB), ranges=20, leases=0, queries=0.00, writes=0.00, bytesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=6467.00 p90=26940.00 pMax=43017435.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00} I190511 01:21:14.458775 21 storage/stores.go:244 [n1] read 0 node addresses from persistent storage I190511 01:21:14.459095 21 server/node.go:699 [n1] connecting to gossip network to verify cluster ID... W190511 01:21:14.469842 96 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown I190511 01:21:14.474785 21 server/node.go:719 [n1] node connected via gossip and verified as part of cluster "a14e89a7-792d-44d3-89af-7037442eacbc" I190511 01:21:14.475033 21 server/node.go:542 [n1] node=1: started with [<no-attributes>=/home/ueda/cockroach-data] engine(s) and attributes [] I190511 01:21:14.475393 21 server/status/recorder.go:610 [n1] available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory I190511 01:21:14.475514 21 server/server.go:1582 [n1] starting http server at [::]:8080 (use: 128.199.127.164:8080) I190511 01:21:14.475572 21 server/server.go:1584 [n1] starting grpc/postgres server at [::]:26257 I190511 01:21:14.475605 21 server/server.go:1585 [n1] advertising CockroachDB node at 128.199.127.164:26257 W190511 01:21:14.475655 21 jobs/registry.go:341 [n1] unable to get node liveness: node not in the liveness table I190511 01:21:14.532949 21 server/server.go:1650 [n1] done ensuring all necessary migrations have run I190511 01:21:14.533020 21 server/server.go:1653 [n1] serving sql connections I190511 01:21:14.533209 21 cli/start.go:689 [config] clusterID: a14e89a7-792d-44d3-89af-7037442eacbc I190511 01:21:14.533257 21 cli/start.go:697 node startup completed: CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s) build: CCL v19.1.0 # 2019/04/29 18:36:40 (go1.11.6) webui: http://128.199.127.164:8080 sql: postgresql://root#128.199.127.164:26257?sslmode=disable client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure logs: /home/ueda/cockroach-data/logs temp dir: /home/ueda/cockroach-data/cockroach-temp067740997 external I/O path: /home/ueda/cockroach-data/extern store[0]: path=/home/ueda/cockroach-data status: restarted pre-existing node clusterID: a14e89a7-792d-44d3-89af-7037442eacbc nodeID: 1 I190511 01:21:14.541205 146 server/server_update.go:67 [n1] no need to upgrade, cluster already at the newest version I190511 01:21:14.555557 149 sql/event_log.go:135 [n1] Event: "node_restart", target: 1, info: {Descriptor:{NodeID:1 Address:128.199.127.164:26257 Attrs: 
Locality: ServerVersion:19.1 BuildTag:v19.1.0 StartedAt:1557537674395557548 LocalityAddress:[] XXX_NoUnkeyedLiteral:{} XXX_sizecache:0} ClusterID:a14e89a7-792d-44d3-89af-7037442eacbc StartedAt:1557537674395557548 LastUp:1557537671113461486} I190511 01:21:14.916458 59 gossip/gossip.go:1510 [n1] node has connected to cluster via gossip I190511 01:21:14.916660 59 storage/stores.go:263 [n1] wrote 0 node addresses to persistent storage I190511 01:21:24.480247 116 storage/store.go:4220 [n1,s1] sstables (read amplification = 2): 0 [ 51K 1 ]: 51K 6 [ 1M 1 ]: 1M I190511 01:21:24.480380 116 storage/store.go:4221 [n1,s1] ** Compaction Stats [default] ** Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop ---------------------------------------------------------------------------------------------------------------------------------------------------------- L0 1/0 50.73 KB 0.5 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0 L6 1/0 1.26 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0 Sum 2/0 1.31 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0 Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0 Uptime(secs): 10.6 total, 10.6 interval Flush(GB): cumulative 0.000, interval 0.000 AddFile(GB): cumulative 0.000, interval 0.000 AddFile(Total Files): cumulative 0, interval 0 AddFile(L0 Files): cumulative 0, interval 0 AddFile(Keys): cumulative 0, interval 0 Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count estimated_pending_compaction_bytes: 0 B I190511 01:21:24.481565 121 server/status/runtime.go:500 [n1] runtime stats: 170 MiB RSS, 114 goroutines, 0 B/0 B/0 B GO alloc/idle/total, 14 MiB/16 MiB CGO alloc/total, 0.0 CGO/sec, 0.0/0.0 %(u/s)time, 0.0 %gc (7x), 50 KiB/1.5 MiB (r/w)net What is the possibly cause to block to join? Thank you for your suggestion!
It seems you had previously started the second node (the one running on 128.199.127.164) by itself, creating its own cluster. This can be seen in the error message:

W190511 01:21:14.251763 65 gossip/client.go:122 [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"

To be able to join the cluster, the data directory of the joining node must be empty. You can either delete cockroach-data or specify an alternate directory with --store=/path/to/data-dir
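Concretely, on the second node that looks like this (stop the cockroach process first; the alternate --store path is just an example):

# option 1: wipe the stray single-node cluster's state
rm -rf ~/cockroach-data
# option 2: keep it, and point the joining node at a fresh store
cockroach start --insecure --advertise-addr=128.199.127.164 \
  --join=163.172.156.111:26257 --store=/path/to/data-dir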
How to understand ORA-00060 deadlock trace file
Recently I got an ORA-00060 deadlock error. I read this post and this post and this post, but I'm still not sure what the problem is: did an unindexed FK cause this, or something else? My question is how to understand this trace file and how to solve the problem. Below is the trace file:

*** 2015-06-02 14:53:45.513
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL. The following information may aid in determining the deadlock:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TM-00014d94-00000000       497     556     S            332    1414          SX
TM-00014d94-00000000       332    1414    SX            416    1038           S
TX-0011000f-000000ab       416    1038     X            302     457           S
TM-00014d94-00000000       302     457    SX            497     556           S

session 556: DID 0001-01F1-0000000D     session 1414: DID 0001-014C-00000022
session 1414: DID 0001-014C-00000022    session 1038: DID 0001-01A0-0000000C
session 1038: DID 0001-01A0-0000000C    session 457: DID 0001-012E-00000028
session 457: DID 0001-012E-00000028     session 556: DID 0001-01F1-0000000D

Rows waited on:
Session 556: obj - rowid = 00014D94 - AAAAAAAAAAAAAAAAAA (dictionary objn - 85396, file - 0, block - 0, slot - 0)
Session 1414: obj - rowid = 00014D94 - AAAAAAAAAAAAAAAAAA (dictionary objn - 85396, file - 0, block - 0, slot - 0)
Session 1038: obj - rowid = 00014D94 - AAAAAAAAAAAAAAAAAA (dictionary objn - 85396, file - 0, block - 0, slot - 0)
Session 457: obj - rowid = 00014FA0 - AAAU+gAAEAAAft+AAA (dictionary objn - 85920, file - 4, block - 129918, slot - 0)

----- Information for the OTHER waiting sessions -----
Session 1414:
sid: 1414 ser: 1424 audsid: 100128 user: 91/SW flags: 0x45
pid: 332 O/S info: user: oracle, term: UNKNOWN, ospid: 10179 image: oracle#jwdb
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: localhost.localdomain program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
insert into t_course_takes (created_at, updated_at, attend, course_id, course_take_type_id, election_mode_id, lesson_id, limit_group_id, paid, remark, semester_id, state, std_id, turn, virtual_cost, id) values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 )

Session 1038:
sid: 1038 ser: 951 audsid: 100212 user: 91/SW flags: 0x45
pid: 416 O/S info: user: oracle, term: UNKNOWN, ospid: 10343 image: oracle#jwdb
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: localhost.localdomain program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
delete from t_course_takes where id=:1

Session 457:
sid: 457 ser: 2983 audsid: 100099 user: 91/SW flags: 0x45
pid: 302 O/S info: user: oracle, term: UNKNOWN, ospid: 10111 image: oracle#jwdb
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: jdbcclient program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
insert into t_elect_loggers (created_at, updated_at, course_code, course_name, course_take_type_id, course_type, credits, election_mode_id, ip_address, lesson_no, operator_code, operator_name, project_id, remark, screening, semester_id, std_code, std_name, turn, type, virtual_orig, virtual_rest, id) values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 )

----- End of information for the OTHER waiting sessions
----- Information for THIS session: ----- Current SQL Statement for this session (sql_id=ca9jc1g44ap41) ----- delete from t_course_takes where id=:1 =================================================== PROCESS STATE ------------- Process global information: process: 0x9d0fd98c8, call: 0x95429a500, xact: 0x922b10710, curses: 0x94110e198, usrses: 0x94110e198 ---------------------------------------- SO: 0x9d0fd98c8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x9d0fd98c8, name=process, file=ksu.h LINE:11459, pg=0 (process) Oracle pid:497, ser:7, calls cur/top: 0x95429a500/0x95429a500 flags : (0x0) - flags2: (0x0), flags3: (0x0) intr error: 0, call error: 0, sess error: 0, txn error 0 intr queue: empty ksudlp FALSE at location: 0 (post info) last post received: 0 0 9 last post received-location: ksq.h LINE:1877 ID:ksqrcl last process to post me: 900fd0348 12 0 last post sent: 0 0 9 last post sent-location: ksq.h LINE:1877 ID:ksqrcl last process posted by me: 900fce2c8 17 0 (latch info) wait_event=0 bits=0 Process Group: DEFAULT, pseudo proc: 0x90102ea98 O/S info: user: oracle, term: UNKNOWN, ospid: 10507 OSD pid info: Unix process pid: 10507, image: oracle#jwdb Dump of memory from 0x0000000921009C90 to 0x0000000921009E98 921009C90 00000000 00000000 00000000 00000000 [................] Repeat 31 times 921009E90 00000000 00000000 [........] (FOB) flags=2050 fib=0x902c3c2b8 incno=0 pending i/o cnt=0 fname=/home/jwdb/oracle/oradata/orcl/undotbs01.dbf fno=3 lblksz=8192 fsiz=238080 (FOB) flags=2050 fib=0x902c3c8b8 incno=0 pending i/o cnt=0 fname=/home/jwdb/oracle/oradata/orcl/users01.dbf fno=4 lblksz=8192 fsiz=603680 (FOB) flags=2050 fib=0x902c3b6a0 incno=0 pending i/o cnt=0 fname=/home/jwdb/oracle/oradata/orcl/system01.dbf fno=1 lblksz=8192 fsiz=92160 ---------------------------------------- SO: 0x94110e198, type: 4, owner: 0x9d0fd98c8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x9d0fd98c8, name=session, file=ksu.h LINE:11467, pg=0 (session) sid: 556 ser: 3224 trans: 0x922b10710, creator: 0x9d0fd98c8 flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x40008) -/- DID: , short-term DID: txn branch: (nil) oct: 7, prv: 0, sql: 0x9cfed0be0, psql: 0x9afeb20c8, user: 91/SW ksuxds FALSE at location: 0 service name: SYS$USERS client details: O/S info: user: root, term: unknown, ospid: 1234 machine: localhost.localdomain program: JDBC Thin Client application name: JDBC Thin Client, hash value=2546894660 Current Wait Stack: 0: waiting for 'enq: TM - contention' name|mode=0x544d0004, object #=0x14d94, table/partition=0x0 wait_id=787 seq_num=896 snap_id=53 wait times: snap=0.023828 sec, exc=4 min 18 sec, total=4 min 18 sec wait times: max=infinite, heur=4 min 18 sec wait counts: calls=87 os=87 in_wait=1 iflags=0x15a0 There is at least one session blocking this session. Dumping first 3 direct blockers: inst: 1, sid: 457, ser: 2983 inst: 1, sid: 1353, ser: 5618 inst: 1, sid: 907, ser: 5215 Dumping final blocker: inst: 1, sid: 1168, ser: 1194 There are 1 sessions blocked by this session. 
Dumping one waiter: inst: 1, sid: 1136, ser: 2212 wait event: 'enq: TM - contention' p1: 'name|mode'=0x544d0004 p2: 'object #'=0x14d94 p3: 'table/partition'=0x0 row_wait_obj#: 85396, block#: 0, row#: 0, file# 0 min_blocked_time: 12 secs, waiter_cache_ver: 32536 Wait State: fixed_waits=0 flags=0x23 boundary=(nil)/-1 Session Wait History: elapsed time of 0.000000 sec since current wait 0: waited for 'latch: enqueue hash chains' address=0x9313226a0, number=0x1c, tries=0x0 wait_id=839 seq_num=895 snap_id=1 wait times: snap=0.226082 sec, exc=0.226082 sec, total=0.226082 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time 1: waited for 'enq: TM - contention' name|mode=0x544d0004, object #=0x14d94, table/partition=0x0 wait_id=787 seq_num=894 snap_id=52 wait times: snap=3.000901 sec, exc=4 min 18 sec, total=4 min 18 sec wait times: max=infinite wait counts: calls=87 os=87 occurred after 0.000000 sec of elapsed time 2: waited for 'latch: enqueue hash chains' address=0x6000cf38, number=0x1c, tries=0x0 wait_id=838 seq_num=893 snap_id=1 wait times: snap=0.000142 sec, exc=0.000142 sec, total=0.000142 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time 3: waited for 'enq: TM - contention' name|mode=0x544d0004, object #=0x14d94, table/partition=0x0 wait_id=787 seq_num=892 snap_id=51 wait times: snap=12.003822 sec, exc=4 min 15 sec, total=4 min 15 sec wait times: max=infinite wait counts: calls=86 os=86 occurred after 0.000000 sec of elapsed time 4: waited for 'latch: enqueue hash chains' address=0x6000cf38, number=0x1c, tries=0x0 wait_id=837 seq_num=891 snap_id=1 wait times: snap=0.000170 sec, exc=0.000170 sec, total=0.000170 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time 5: waited for 'enq: TM - contention' name|mode=0x544d0004, object #=0x14d94, table/partition=0x0 wait_id=787 seq_num=890 snap_id=50 wait times: snap=6.001627 sec, exc=4 min 3 sec, total=4 min 3 sec wait times: max=infinite wait counts: calls=82 os=82 occurred after 0.000000 sec of elapsed time 6: waited for 'latch: enqueue hash chains' address=0x6000cf38, number=0x1c, tries=0x0 wait_id=836 seq_num=889 snap_id=1 wait times: snap=0.000378 sec, exc=0.000378 sec, total=0.000378 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time 7: waited for 'enq: TM - contention' name|mode=0x544d0004, object #=0x14d94, table/partition=0x0 wait_id=787 seq_num=888 snap_id=49 wait times: snap=3.000543 sec, exc=3 min 57 sec, total=3 min 57 sec wait times: max=infinite wait counts: calls=80 os=80 occurred after 0.000000 sec of elapsed time 8: waited for 'latch: enqueue hash chains' address=0x6000cf38, number=0x1c, tries=0x0 wait_id=835 seq_num=887 snap_id=1 wait times: snap=0.000350 sec, exc=0.000350 sec, total=0.000350 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time 9: waited for 'enq: TM - contention' name|mode=0x544d0004, object #=0x14d94, table/partition=0x0 wait_id=787 seq_num=886 snap_id=48 wait times: snap=3.000880 sec, exc=3 min 54 sec, total=3 min 54 sec wait times: max=infinite wait counts: calls=79 os=79 occurred after 0.000000 sec of elapsed time Updated on 6/5/2015 The whole trace file is here
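(An aside for anyone decoding the graph above: the TM resource name carries the object id in hex, so TM-00014d94 refers to object 0x14D94 = 85396, the same number shown as "dictionary objn" under "Rows waited on". It can be mapped back to a segment with a query like the one below.)

select owner, object_name, object_type
  from dba_objects
 where object_id = 85396;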
This ORA-00060 was causing problems for my team as well. The trace file showed two sessions blocking each other as TX --- X -------- S. In our case we were sure that the multithreaded environment was partitioned on a unique_number column such that two threads would never update the same record. Further down, the trace file showed "enq: TX - allocate ITL entry". While the processing was running, I checked what kind of waits were occurring:

select blocking_session, sid, serial#, wait_class, seconds_in_wait
from v$session
where blocking_session is not NULL
order by blocking_session;

It showed that the wait class was "Configure". So I increased the INITRANS of the tables used by the processing, and also reduced the batch size so that commits happen more frequently than before. After that, the problem was solved.
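For completeness, the INITRANS change was along these lines. The value and the index name are illustrative; note that a plain ALTER TABLE ... INITRANS only affects newly formatted blocks, so the segment has to be rebuilt for existing blocks to gain ITL slots:

alter table t_course_takes move initrans 16;
-- MOVE leaves the table's indexes UNUSABLE, so rebuild them too
-- (idx_course_takes_std is a hypothetical index name):
alter index idx_course_takes_std rebuild initrans 16;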