Stopping Cobalt hangs (blocks) indefinitely - cobalt

Cobalt hangs (blocks) indefinitely after calling ApplicationDirectFB::Get()->Stop() and cannot exit. The backtrace when hung is as follows; could anyone take a look?
<unknown> [0xb5d988f4]
SbConditionVariableWait [0xbd598]
base::WaitableEvent::TimedWait() [0xa0f1c]
base::WaitableEvent::Wait() [0xa0ff8]
cobalt::storage::StorageManager::FinishIO() [0x374454]
cobalt::storage::StorageManager::~StorageManager() [0x374750]
(the ~StorageManager frame repeats many more times)
If I comment out no_flushes_pending_.Wait(); in StorageManager::FinishIO in src/cobalt/storage/storage_manager.cc, it does not hang and exits successfully:
void StorageManager::FinishIO() {
  TRACE_EVENT0("cobalt::storage", __FUNCTION__);
  DCHECK(!sql_message_loop_->BelongsToCurrentThread());

  // The SQL thread may be communicating with the savegame I/O thread still,
  // flushing all pending updates. This process can require back and forth
  // communication. This method exists to wait for that communication to
  // finish and for all pending flushes to complete.

  // Start by finishing all commands currently in the sql message loop queue.
  // This method is called by the destructor, so the only new tasks posted
  // after this one will be generated internally. We need to do this because
  // it is possible that there are no flushes pending at this instant, but there
  // are tasks queued on |sql_message_loop_| that will begin a flush, and so
  // we make sure that these are executed first.
  base::WaitableEvent current_queue_finished_event_(true, false);
  sql_message_loop_->PostTask(
      FROM_HERE,
      base::Bind(&base::WaitableEvent::Signal,
                 base::Unretained(&current_queue_finished_event_)));
  current_queue_finished_event_.Wait();

  // Now wait for all pending flushes to wrap themselves up. This may involve
  // the savegame I/O thread and the SQL thread posting tasks to each other.
  // no_flushes_pending_.Wait();  // --> commented out to avoid the hang
}

This is not the best answer because I only vaguely remember encountering this before, but I couldn't find the reference to it anywhere to confirm. I believe this happens when one of the SbStorage APIs doesn't return the right value, perhaps on an error?

The root cause is that Cobalt writes data to $HOME/.starboard.storage (the path is set in starboard/shared/linux/get_home_directory.cc). On some platforms that partition is read-only, so the write fails and Cobalt hangs indefinitely. The fix is to change the path of .starboard.storage to a writable partition.

Related

Pepper API 7 Emulator in Android Studio Spawning Too Many Threads?

I'm using the Pepper plugin in Android Studio. I have the robot emulator and device emulator running fine, but when I run the application, I get this weird thread-pool spawning error. I've gone through the entire install tutorial and made sure everything was right, but I can't get around this. It happens most of the time that I run it, but sometimes it runs without any issues. Thanks!
07-29 11:38:29.474 2625-2643/com.tammy.tammygame E/qi.eventloop: Threadpool MainEventLoop: System seems to be deadlocked, sending emergency signal
07-29 11:38:29.474 2625-2643/com.tammy.tammygame A/qimessaging.jni: Emergency, aborting
07-29 11:38:29.474 2625-2631/com.tammy.tammygame I/art: Thread[3,tid=2631,WaitingInMainSignalCatcherLoop,Thread*=0xa682e700,peer=0x12c790a0,"Signal Catcher"]: reacting to signal 3
07-29 11:38:29.479 2625-2631/com.tammy.tammygame W/art: Method processed more than once: android.os.Message android.os.MessageQueue.next()
07-29 11:38:29.483 2625-2631/com.tammy.tammygame W/art: Method processed more than once: void java.lang.Daemons$ReferenceQueueDaemon.run()
07-29 11:38:29.484 2625-2631/com.tammy.tammygame W/art: Method processed more than once: java.lang.ref.Reference java.lang.ref.ReferenceQueue.remove(long)
07-29 11:38:29.484 2625-2631/com.tammy.tammygame W/art: Method processed more than once: boolean java.lang.Daemons$FinalizerWatchdogDaemon.waitForObject()
07-29 11:38:29.486 2625-2631/com.tammy.tammygame W/art: Method processed more than once: void java.lang.Daemons$HeapTaskDaemon.run()
07-29 11:38:29.490 2625-2631/com.tammy.tammygame W/art: Method processed more than once: void java.util.Timer$TimerImpl.run()
07-29 11:38:29.497 2625-2631/com.tammy.tammygame E/art: Unable to open stack trace file '/data/anr/traces.txt': No such file or directory
07-29 11:38:29.976 2625-2643/com.tammy.tammygame I/qi.eventloop: Threadpool MainEventLoop: Size limit reached (658 timeouts / 20 max, number of tasks: 690, number of active tasks: 8, number of threads: 8, maximum number of threads: 8)
07-29 11:38:29.976 2625-2643/com.tammy.tammygame E/qi.eventloop: Threadpool MainEventLoop: System seems to be deadlocked, sending emergency signal
07-29 11:38:29.976 2625-2643/com.tammy.tammygame A/qimessaging.jni: Emergency, aborting
The Qi SDK and its underlying framework, libQi, spawn threads automatically for your callbacks (for subscriptions and future continuations). But this is hard-coded to a limit of 8 threads for Android clients, which is not much. When they are all busy, communication with the robot is blocked, and apparently the program is aborted too.
To avoid this kind of issue you must be careful about the code you write in these callbacks (onRobotFocusGained is also one of them) and avoid blocking code there. To do so, always call Qi SDK methods via their asynchronous interfaces, for instance say.async().run() instead of say.run(). That returns a Future f, and the continuation of your code goes in the callback of f.andThen(...).
If you are using Kotlin, you can avoid this tiresome gymnastics by using coroutines, via suspend functions. A suspend function awaiting a result from a future suspends there, freeing the thread, and resumes execution upon receiving the result. Qi's notion of Future is compatible with coroutines. Here is a file you can put in your project to extend Qi's Future for coroutines. With this, you can call say.async().run().await() and wait for the result without blocking the thread. Yet it looks synchronous, so it is convenient for you as a developer.
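say.run() and say.async().run() are Qi SDK calls, but the underlying mechanism can be shown with plain java.util.concurrent and no Qi SDK at all. Here is a minimal sketch (pool sizes and all names are invented for illustration) of why blocking inside a small fixed callback pool deadlocks, while chaining a continuation on the same pool does not:

```java
import java.util.concurrent.*;

public class PoolBlockDemo {
    public static void main(String[] args) throws Exception {
        // A one-thread pool standing in for libQi's fixed callback pool.
        ExecutorService pool = Executors.newFixedThreadPool(1);

        // Blocking style: the callback occupies the only thread while waiting
        // for work that can only run on that same thread -> deadlock.
        Future<String> blocked = pool.submit(() -> {
            Future<String> inner = pool.submit(() -> "spoken");
            return inner.get();   // waits forever: no free thread to run 'inner'
        });
        try {
            blocked.get(500, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("blocking style deadlocked");
        }
        pool.shutdownNow();       // interrupt the stuck worker

        // Async style: chain a continuation instead of blocking; the thread is
        // freed between steps, so the same work completes on a one-thread pool.
        ExecutorService pool2 = Executors.newFixedThreadPool(1);
        CompletableFuture<String> done = CompletableFuture
                .supplyAsync(() -> "spoken", pool2)                   // like say.async().run()
                .thenApplyAsync(s -> s + ", then continued", pool2);  // like f.andThen(...)
        System.out.println(done.get(500, TimeUnit.MILLISECONDS));
        pool2.shutdown();
    }
}
```

The async version never holds a pool thread while waiting, which is exactly the discipline the answer recommends inside Qi callbacks.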

Async feature in Servlets

I was just going back to Servlet 3.x features and exploring them. If I am not wrong, before Servlet 3.x it was a thread-per-request model, and the pool would run out of threads under heavy incoming traffic.
So with Servlet 3.x it is said to be asynchronous: it doesn't keep threads blocked, but releases them immediately while the task is delegated.
Here is my interpretation,
consider there are 2 threads in Server thread-pool
For a new async servlet request R1 there is a thread T1; T1 delegates the task to T2 and responds back to the client immediately.
Question: Is T2 created from Server thread-pool? If so, I don't get the point.
Case 1: If it was old Synchronous Servlet request T1 would have been busy running I/O task,
Case 2: If it was Asynchronous Servlet call T2 is busy running I/O task.
In both cases, one of them is busy.
I tried to check the same with a sample async servlet in the Open Liberty app server; below is a sample log captured from my demo servlet.
Entering doGet() == thread name is = Default Executor-thread-116
Exiting doGet() == thread name is = Default Executor-thread-116
=== Long running task started ===
Thread executing #start of long running task = Default Executor-thread-54
Thread executing #end of long running task = Default Executor-thread-54
=== Long running task ended ===
As shown above, Default Executor-thread-116 is released immediately and delegates the long-running task to Default Executor-thread-54, but I am not sure whether they are from the app server thread pool. If so, why can't Default Executor-thread-116 just do the task instead of delegating it?
Can someone shed some light on this async behavior of servlets in Java EE?
In your example, where the work is synchronous and there's no separate executor/thread pool, there is nearly no point in using async servlets. Lots of samples/examples out there just block on a second thread because they're only trying to illustrate the syntax.
But there's no reason why you can't spin off a thread to do a little work, add your async context to some list, and then complete the async response once some event (inbound JMS, WebSocket, whatever) provides the data needed. For example, a two-player game server wouldn't wait for player 2 in a second thread; it would just leave the async context floating around in memory, waiting for a second player to find it.
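The "park the context, complete it later" idea doesn't need a real container to see. Below is a container-free sketch using CompletableFuture as a stand-in for the parked AsyncContext; the names (join, pending, game ids) are invented for illustration, not servlet API:

```java
import java.util.Map;
import java.util.concurrent.*;

public class ParkedContextDemo {
    // Stand-in for "add your async context to some list":
    // pending responses keyed by game id.
    static final Map<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    // Like the servlet's doGet(): returns immediately; the response is
    // completed later by whatever event supplies the missing data.
    static CompletableFuture<String> join(String gameId, String player) {
        CompletableFuture<String> waiting = pending.remove(gameId);
        if (waiting != null) {                 // a player was already parked here
            waiting.complete("matched: " + player);
            return CompletableFuture.completedFuture("matched: " + player);
        }
        CompletableFuture<String> ctx = new CompletableFuture<>();
        pending.put(gameId, ctx);              // park it; no thread is blocked
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> p1 = join("game-1", "alice"); // parked, no thread held
        CompletableFuture<String> p2 = join("game-1", "bob");   // the event that completes p1
        System.out.println(p1.get(1, TimeUnit.SECONDS));
        System.out.println(p2.get(1, TimeUnit.SECONDS));
    }
}
```

Between alice joining and bob arriving, no thread is waiting anywhere; that is the whole win of async servlets over thread-per-request.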

How to debug MPI program before bad termination?

I am currently developing a program written in C++ with the MPI+pthread paradigm.
I added some functionality to my program; however, I get a bad termination message from one MPI process, like this:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 37805 RUNNING AT node165
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0#node162] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:0#node162] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:2#node166] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
[proxy:0:2#node166] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:2#node166] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
srun: error: node162: task 0: Exited with exit code 7
[proxy:0:0#node162] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
srun: error: node166: task 2: Exited with exit code 7
[mpiexec#node162] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec#node162] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec#node162] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec#node162] main (ui/mpich/mpiexec.c:340): process manager error waiting for completion
My problem is such that I have no idea about why I have this kind of message, and thus how to correct it.
I use only some basic MPI functions and ensure that no thread uses MPI calls (only my "master process" is allowed to call such functions).
I also checked that no process sends a message to itself, and that the destination process exists before sending a message.
My question is quite simple: how can I find where the problem comes from, so that I can then debug my application?
Thank you a lot.
One of your processes has had a segmentation fault. This means reading from, or writing to, an area of memory that it is not permitted to access.
That's the cause, and MPI functions are often difficult to get right the first time; for example, it could be MPI send and receive calls with incorrect sizes or locations.
The best solution is to fire up a parallel debugger so that you can watch all the processes. It looks like you are using a proper HPC system, so there is a chance one is installed on it; ddt and totalview are the most popular.
Take a look at How to debug an MPI program.
My experience with this problem when writing C++ with MPI is that it frequently occurred when I did not call MPI_Finalize() before every return statement.

Why won't Micrometer stop sending data to datadog and just close already?

I have a Spring Boot app which, due to weird restrictions, needs to run once every three hours and won't work with Quartz, so I've been running it every three hours from OS cron, and it quits when it's done.
After adding micrometer-registry-datadog (and spring-legacy), however, it never quits; it just keeps sending metrics every 20 seconds or whatever the default period is, even after calling registry.close().
Am I doomed like the Dutchman to sail the seas of processing forever, or is there an obvious error I have made?
Code: it reaches SpringApplication.exit(ctx), but it does not actually exit cleanly. (service is a TimedExecutorService.)
public void close() throws InterruptedException {
    service.shutdown();
    service.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    meterRegistry.close();
    SpringApplication.exit(ctx);
}
This sounds like a bug. It is possible the Datadog exporter is running on a non-daemon thread. The JVM treats non-daemon threads as application-critical work.
So essentially the JVM thinks it shouldn't shut down until the non-daemon thread finishes. In the case of the Datadog exporter thread, that probably won't happen.
To verify there are non-daemon threads, use jstack to generate a thread dump (command: jstack <pid>), or dump all threads in your close method:
ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
for (ThreadInfo ti : threadMxBean.dumpAllThreads(true, true)) {
    System.out.print(ti.toString());
}
An example thread dump entry is below. Notice the absence of the word 'daemon' on the first line, which marks this as a non-daemon thread:
"pool-1-thread-1" #13 prio=5 os_prio=31 tid=0x00007fe885aa5000 nid=0xa907 waiting on condition [0x000070000d67b000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006c07e9720> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
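You can also see the daemon distinction without jstack. The sketch below (the thread name metrics-exporter is invented; this is not Micrometer's actual factory) shows that executors built with the default thread factory use non-daemon threads, which keep the JVM alive until the executor is shut down, while a daemon thread factory lets the JVM exit regardless:

```java
import java.util.concurrent.*;

public class DaemonCheckDemo {
    public static void main(String[] args) throws Exception {
        // Default factory: non-daemon threads. These keep the JVM alive
        // until the executor is shut down (the Datadog-exporter situation).
        ScheduledExecutorService keepsJvmAlive =
                Executors.newSingleThreadScheduledExecutor();
        boolean d1 = keepsJvmAlive
                .submit(() -> Thread.currentThread().isDaemon()).get();
        System.out.println("default factory daemon: " + d1);

        // Daemon factory: the JVM may exit even while this scheduler runs.
        ScheduledExecutorService daemonPool =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "metrics-exporter");
                    t.setDaemon(true);
                    return t;
                });
        boolean d2 = daemonPool
                .submit(() -> Thread.currentThread().isDaemon()).get();
        System.out.println("daemon factory daemon: " + d2);

        keepsJvmAlive.shutdown(); // without this, main would end but the JVM would not
        // daemonPool needs no shutdown for the JVM to exit
    }
}
```

If the exporter's scheduler were built on a daemon factory (or reliably stopped by registry.close()), the app would quit on its own after the cron run.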

Thread wait reasons

I've been using code that I found in the following post:
How to get thread state (e.g. suspended), memory + CPU usage, start time, priority, etc
I'm examining thread state, and there's the following enum that describes the reasons for a thread's 'waiting' status:
enum KWAIT_REASON
{
    Executive,
    FreePage,
    PageIn,
    PoolAllocation,
    DelayExecution,
    Suspended,
    UserRequest,
    WrExecutive,
    WrFreePage,
    WrPageIn,
    WrPoolAllocation,
    WrDelayExecution,
    WrSuspended,
    WrUserRequest,
    WrEventPair,
    WrQueue,
    WrLpcReceive,
    WrLpcReply,
    WrVirtualMemory,
    WrPageOut,
    WrRendezvous,
    Spare2,
    Spare3,
    Spare4,
    Spare5,
    Spare6,
    WrKernel,
    MaximumWaitReason
};
Can anyone explain what WrQueue is, and perhaps what the difference between WrUserRequest and UserRequest is?
The information is obtained using NtQuerySystemInformation() with SystemProcessInformation.
WrQueue: this is when a thread waits on a KQUEUE object (see its definition in wdm.h) in the kernel. This can be a call to ZwRemoveIoCompletion or to the Win32 GetQueuedCompletionStatus (an IOCP is exactly a KQUEUE object), or, beginning with Vista, a call to ZwWaitForWorkViaWorkerFactory (a worker factory internally uses a KQUEUE). It is also possible that the thread calls KeRemoveQueue in the kernel; this is usually what system worker threads do.
WrUserRequest is used by the win32k.sys subsystem. Usually this is when a thread calls GetMessage. So if we see WrUserRequest, we can be sure the thread is waiting for window messages.
UserRequest means that the thread is waiting on some object(s) via WaitForSingleObject[Ex], WaitForMultipleObjects[Ex], or MsgWaitForMultipleObjects[Ex] (or their equivalents).
