As far as I understand:
The OS kernel (e.g. Linux) always allocates a stack for each system-level thread when a thread is created.
CPython is known for using a private heap for its objects, including presumably the call stack for Python subroutines.
If so, what is the stack used for in CPython, if anything?
CPython is an ordinary C program. There is no magic in running a Python script, module, REPL, or whatever else: every piece of code must be read, parsed, and interpreted, in a loop, until it's done. There is a whole bunch of processor instructions behind every Python expression and statement.
Every "simple" top-level thing (parsing and production of bytecode, GIL management, attribute lookup, console I/O, etc.) is very complex under the hood. It consists of functions calling other functions, calling other functions... which means there is a stack involved. Seriously, check it yourself: some of the source files span a few thousand lines of code.
Just reaching the main loop of the interpreter is an adventure of its own. Here is the gist, stitched together from pieces all around the code base:
#ifdef MS_WINDOWS
int wmain(int argc, wchar_t **argv)
{
return Py_Main(argc, argv);
}
#else
// standard C entry point
#endif
int Py_Main(int argc, wchar_t **argv)
{
_PyArgv args = /* ... */;
return pymain_main(&args);
}
static int pymain_main(_PyArgv *args)
{
// ... calling some initialization routines and checking for errors ...
return Py_RunMain();
}
int Py_RunMain(void)
{
int exitcode = 0;
pymain_run_python(&exitcode);
// ... clean-up ...
return exitcode;
}
static void pymain_run_python(int *exitcode)
{
// ... initializing interpreter state and startup config ...
// ... determining main import path ...
if (config->run_command) {
*exitcode = pymain_run_command(config->run_command, &cf);
}
else if (config->run_module) {
*exitcode = pymain_run_module(config->run_module, 1);
}
else if (main_importer_path != NULL) {
*exitcode = pymain_run_module(L"__main__", 0);
}
else if (config->run_filename != NULL) {
*exitcode = pymain_run_file(config, &cf);
}
else {
*exitcode = pymain_run_stdin(config, &cf);
}
// ... clean-up
}
int PyRun_AnyFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags)
{
// ... even more routing ...
int err = PyRun_InteractiveLoopFlags(fp, filename, flags);
// ...
}
int PyRun_InteractiveLoopFlags(FILE *fp, const char *filename_str, PyCompilerFlags *flags)
{
// ... more initializing ...
do {
ret = PyRun_InteractiveOneObjectEx(fp, filename, flags);
// ... error handling ...
} while (ret != E_EOF);
// ...
}
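To make that concrete, here is a minimal embedding sketch (mine, not part of the excerpt above, assuming an ordinary CPython 3 build): three plain C calls, and every one of them pushes and pops ordinary C stack frames underneath while the objects they create live on CPython's private heap.
#include <Python.h>

int main(void)
{
    Py_Initialize();                       /* interpreter state goes on CPython's private heap,
                                              but the call itself is plain C using the C stack */
    PyRun_SimpleString("print('hello')");  /* tokenize, parse, compile, evaluate: C frames all the way down */
    return Py_FinalizeEx() < 0 ? 1 : 0;    /* tear-down is C function calls as well */
}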
I have a pyd, pytest.pyd, in which two functions are declared: say_hello and add_numbers. I want to load this pyd dynamically in C++ with LoadLibraryEx. But when I try to call the initpytest function, it fails.
const char* funcname = "initpytest";
HINSTANCE hDLL = LoadLibraryEx(L"D:\\msp\\myproj\\Test\\pytest.pyd", NULL, LOAD_WITH_ALTERED_SEARCH_PATH);
FARPROC p = GetProcAddress(hDLL, funcname);
(*p)(); // fail
In output error: Fatal Python error: PyThreadState_Get: no current thread
Unhandled exception at 0x00007FFCD0CC286E (ucrtbase.dll) in Test.exe: Fatal program exit requested.
Here is the code of the extension before it is built into the pyd:
#include "Python.h"
static PyObject* say_hello(PyObject* self, PyObject* args)
{
const char* msg;
if(!PyArg_ParseTuple(args, "s", &msg))
{
return NULL;
}
printf("This is C world\nYour message is %s\n", msg);
return Py_BuildValue("i", 25);
}
static PyObject* add_numbers(PyObject* self, PyObject* args)
{
double a, b;
if (!PyArg_ParseTuple(args, "dd", &a, &b))
{
return NULL;
}
double res = a + b;
return Py_BuildValue("d", res);
}
static PyMethodDef pytest_methods[] = {
{"say_hello", say_hello, METH_VARARGS, "Say Hello!"},
{"add_numbers", add_numbers, METH_VARARGS, "Adding two numbers"},
{NULL, NULL, 0, NULL}
};
PyMODINIT_FUNC initpytest(void)
{
Py_InitModule("pytest", pytest_methods);
}
In the absence of a proper minimal, reproducible example it's impossible to be certain. However, it's probably because you haven't initialized the interpreter (see this question for example). You need to call Py_Initialize before using any Python functions.
Can I suggest that you use the normal Python C-API tools for running modules (rather than doing it yourself with LoadLibraryEx!) until you fully understand how embedding Python works? You might consider PyImport_AppendInittab (called before initializing) to register your module directly and avoid the Python search path.
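A rough sketch of that approach (mine, assuming Python 2 to match the Py_InitModule-based extension above, and assuming the extension source is compiled into or linked with the embedding program rather than loaded by hand):
#include <Python.h>

void initpytest(void);  /* the init function defined in the extension source above
                           (declare it extern "C" if this file is compiled as C++) */

int main(void)
{
    PyImport_AppendInittab("pytest", initpytest);  /* register before initializing */
    Py_Initialize();                               /* must precede any other Python C API call */
    PyRun_SimpleString("import pytest\n"
                       "print pytest.say_hello('hello from embedded Python')\n");
    Py_Finalize();
    return 0;
}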
I hook a syscall (open) on Linux and want to print the name of the opened file.
Then I call another syscall (getcwd) to get the absolute path.
This is the source code:
void *memndup_from_user(const void __user *src, long len)
{
void *kbuf = NULL;
if(src == NULL) {
return kbuf;
}
kbuf = kmalloc(len + 1, GFP_KERNEL);
if(kbuf != NULL) {
if (copy_from_user(kbuf, src, len)) {
printk(KERN_ALERT "%s\n", "copy_from_user failed.");
kfree(kbuf);
kbuf = NULL;
}
else {
((char *)kbuf)[len] = '\0';
}
} else {
printk(KERN_ALERT "%s\n", "kmalloc failed.");
}
return kbuf;
}
void *memdup_from_user(const void __user *src)
{
long len = 0;
if(src == NULL) {
return NULL;
}
len = strlen_user(src);
return memndup_from_user(src, len);
}
asmlinkage long fake_getcwd(char __user *buf, unsigned long size)
{
return real_getcwd(buf, size);
}
asmlinkage long
fake_open(const char __user *filename, int flags, umode_t mode)
{
if(flags & O_CREAT) {
char *k_filename = (char *)memdup_from_user(filename);
char *u_path = (char *)kmalloc(PAGE_SIZE, GFP_USER);
if(k_filename != NULL) {
printk(KERN_ALERT "ano_fake_open pid:%ld create : %s\n", ano_fake_getpid(), k_filename);
kfree(k_filename);
}
if(u_path != NULL) {
long retv;
retv = fake_getcwd(u_path, PAGE_SIZE);
if(retv > 0) {
printk(KERN_ALERT "getcwd ret val: %ld, path: %s\n", retv, u_path);
} else {
printk(KERN_ALERT "getcwd ret val: %ld, error...\n", retv);
}
kfree(u_path);
}
}
return real_open(filename, flags, mode);
}
sys_getcwd requires a user-space buffer, so I call kmalloc with GFP_USER.
But sys_getcwd always returns -EFAULT (Bad Address)...
These are the dmesg logs:
[344897.726061] fake_open pid:70393 create : sssssssssssssssss
[344897.726065] getcwd ret val: -14, error...
[344897.727431] fake_open pid:695 create : /var/lib/rsyslog/imjournal.state.tmp
[344897.727440] getcwd ret val: -14, error...
So I looked at the implementation of sys_getcwd, which does:
# define __user __attribute__((noderef, address_space(1)))
# define __kernel __attribute__((address_space(0)))
#define __getname() kmem_cache_alloc(names_cachep, GFP_KERNEL)
SYSCALL_DEFINE2(getcwd, char __user *, buf, unsigned long, size)
{
char *page = __getname();
get_fs_root_and_pwd_rcu(current->fs, &root, &pwd);
...
// char *cwd = page + xxx; (xxx < PAGE_SIZE)
// len = PAGE_SIZE + page - cwd;
...
if (len <= size) {
error = len;
if (copy_to_user(buf, cwd, len))
error = -EFAULT;
}
}
Obviously, getcwd allocates memory with the GFP_KERNEL flag and then copies from that kernel allocation into my buffer (__user *buf)!
Shouldn't memory behind a __user pointer be allocated with GFP_USER?
The description of the GFP_USER flag (https://elixir.bootlin.com/linux/v4.4/source/include/linux/gfp.h#L208) is:
/* GFP_USER is for userspace allocations that also need to be directly
* accessibly by the kernel or hardware. It is typically used by hardware
* for buffers that are mapped to userspace (e.g. graphics) that hardware
* still must DMA to. cpuset limits are enforced for these allocations.
*/
What's wrong?
This is wrong on at least two counts:
Syscall hijacking (let alone for something like open) is just a bad idea. The only sensible way to catch every possible open path is to use LSM hooks. That also happens to deal with the actual file being opened, avoiding a race: you read the path in your routine, then the wrapped open reads it again, but by that time malicious userspace could have changed it and you end up looking at the wrong file.
It should be clear that getcwd has to have a way of resolving a name before it can put it into the userspace buffer. You should dig into that call and see what can be reused to put the result into a kernel buffer instead; see the sketch below.
Why are you doing this to begin with?
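If you do go digging in that direction, here is a rough sketch (mine, not part of the original answer, and only lightly checked against v4.4 headers) of resolving the cwd entirely in kernel space with d_path(), so no __user buffer or GFP flag juggling is involved at all:
/* Sketch only: resolve the current working directory in kernel space
 * instead of calling the getcwd syscall with a kernel buffer.
 * The function name is illustrative; error handling is minimal. */
#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/fs_struct.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/printk.h>
#include <linux/sched.h>

static void print_cwd_in_kernel(void)
{
    struct path pwd;
    char *page, *cwd;

    get_fs_pwd(current->fs, &pwd);              /* takes a reference on the path */
    page = (char *)__get_free_page(GFP_KERNEL); /* plain kernel buffer, not __user */
    if (page) {
        cwd = d_path(&pwd, page, PAGE_SIZE);    /* returns a pointer into 'page' */
        if (!IS_ERR(cwd))
            printk(KERN_ALERT "cwd: %s\n", cwd);
        free_page((unsigned long)page);
    }
    path_put(&pwd);                             /* drop the reference */
}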
I have implemented a custom storage interface in libtorrent as described in the help section here.
The storage_interface is working fine, although I can't figure out why readv is only called sporadically while downloading a torrent. From my point of view, the overridden virtual function readv should get called each time I call handle->read_piece in piece_finished_alert, since it has to read the piece for read_piece_alert, shouldn't it?
The buffer is provided in read_piece_alert without readv ever being notified.
So the question is: why is readv called only sporadically, and why is it not called on a read_piece() call? Is my storage_interface perhaps wrong?
The code looks like this:
struct temp_storage : storage_interface
{
virtual int readv(file::iovec_t const* bufs, int num_bufs
, int piece, int offset, int flags, storage_error& ec)
{
// Only called on random pieces while downloading a larger torrent
std::map<int, std::vector<char> >::const_iterator i = m_file_data.find(piece);
if (i == m_file_data.end()) return 0;
int available = i->second.size() - offset;
if (available <= 0) return 0;
if (available > num_bufs) available = num_bufs;
memcpy(&bufs, &i->second[offset], available);
return available;
}
virtual int writev(file::iovec_t const* bufs, int num_bufs
, int piece, int offset, int flags, storage_error& ec)
{
std::vector<char>& data = m_file_data[piece];
if (data.size() < offset + num_bufs) data.resize(offset + num_bufs);
std::memcpy(&data[offset], bufs, num_bufs);
return num_bufs;
}
virtual bool has_any_file(storage_error& ec) { return false; }
virtual ...
virtual ...
}
Initialized with:
storage_interface* temp_storage_constructor(storage_params const& params)
{
printf("NEW INTERFACE\n");
return new temp_storage(*params.files);
}
p.storage = &temp_storage_constructor;
The function below sets up alerts and invokes read_piece on each completed piece.
while(true) {
std::vector<alert*> alerts;
s.pop_alerts(&alerts);
for (alert* i : alerts)
{
switch (i->type()) {
case read_piece_alert::alert_type:
{
read_piece_alert* p = (read_piece_alert*)i;
if (p->ec) {
// read_piece failed
break;
}
// piece buffer, size is provided without readv
// notification after invoking read_piece in piece_finished_alert
break;
}
case piece_finished_alert::alert_type: {
piece_finished_alert* p = (piece_finished_alert*)i;
p->handle.read_piece(p->piece_index);
// Once the piece is finished, we read it to obtain the buffer in read_piece_alert.
break;
}
default:
break;
}
}
Sleep(100);
}
I will answer my own question. As Arvid said in the comments, readv was not invoked because of caching. Setting settings_pack::use_read_cache to false will cause readv to be invoked every time.
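For reference, applying that setting looks roughly like this (assuming the libtorrent 1.1-era settings_pack API; this snippet is mine, not Arvid's):
// Disabling the read cache forces read_piece() requests down to the custom storage's readv().
#include <libtorrent/session.hpp>
#include <libtorrent/settings_pack.hpp>

void disable_read_cache(libtorrent::session& ses)
{
    libtorrent::settings_pack pack;
    pack.set_bool(libtorrent::settings_pack::use_read_cache, false);
    ses.apply_settings(pack);  // subsequent read_piece() calls now hit the storage directly
}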
Question
What can I do to get a locking mechanism that provides minimal and stable latency while guaranteeing that a thread cannot reacquire a resource before another thread has acquired and released it?
The desirability of answers to this question are ranked as follows:
Some combination of built-in C++11 features that work in MinGW on Windows 7 (note that the <thread> and <mutex> libraries do not work on a Windows platform)
Some combination of Windows API features
A modification to the FairLock listed below, my own attempt at implementing such a mechanism
Some features provided by a free, open-source library that does not require a .configure/make/make install process, (getting that to work in MSYS is more of an adventure than I care for)
Background
I am writing an application which is effectively a multi-stage producer/consumer. One thread generates input consumed by another thread, which produces output consumed by yet another thread. The application uses pairs of buffers so that, after an initial delay, all threads can work nearly simultaneously.
Since I am writing a Windows 7 application, I had been using CriticalSections to guard the buffers. The problem with using CriticalSections (or, so far as I can tell, any other Windows or C++11 built-in synchronization object) is that they make no provision for preventing a thread that has just released a lock from reacquiring it before another thread has done so. Because of this, many of my test drivers for the middle thread (the Encoder) never gave the Encoder a chance to acquire the test input buffers and completed without having tested them. The end result was a ridiculous process of trying to determine an artificial wait time that stochastically worked for my machine.
Since the structure of my application requires that each stage wait for the other stages to have acquired, finished using, and released the necessary buffers before it can use them again, I need, for lack of a better term, a fair locking mechanism. I took a crack at writing one (the source code is provided below). In testing, this FairLock allows my test driver to run my Encoder at the same speeds that I was able to achieve using the CriticalSection in maybe 60% of the runs. The other 40% of the runs take anywhere between 10 and 100 ms longer, which is not acceptable for my application.
FairLock
// FairLock.hpp
#ifndef FAIRLOCK_HPP
#define FAIRLOCK_HPP
#include <atomic>
using namespace std;
class FairLock {
private:
atomic_bool owned {false};
atomic<DWORD> lastOwner {0};
public:
FairLock(bool owned);
bool inline hasLock() const;
bool tryLock();
void seizeLock();
void tryRelease();
void waitForLock();
};
#endif
// FairLock.cpp
#include <windows.h>
#include "FairLock.hpp"
#define ID GetCurrentThreadId()
FairLock::FairLock(bool owned) {
if (owned) {
this->owned = true;
this->lastOwner = ID;
} else {
this->owned = false;
this->lastOwner = 0;
}
}
bool inline FairLock::hasLock() const {
return owned && lastOwner == ID;
}
bool FairLock::tryLock() {
bool success = false;
DWORD id = ID;
if (owned) {
success = lastOwner == id;
} else if (
lastOwner != id &&
owned.compare_exchange_strong(success, true)
) {
lastOwner = id;
success = true;
} else {
success = false;
}
return success;
}
void FairLock::seizeLock() {
bool success = false;
DWORD id = ID;
if (!(owned && lastOwner == id)) {
while (!owned.compare_exchange_strong(success, true)) {
success = false;
}
lastOwner = id;
}
}
void FairLock::tryRelease() {
if (hasLock()) {
owned = false;
}
}
void FairLock::waitForLock() {
bool success = false;
DWORD id = ID;
if (!(owned && lastOwner == id)) {
while (lastOwner == id); // spin
while (!owned.compare_exchange_strong(success, true)) {
success = false;
}
lastOwner = id;
}
}
EDIT
DO NOT USE THIS FairLock CLASS; IT DOES NOT GUARANTEE MUTUAL EXCLUSION!
I reviewed the above code, comparing it against The C++ Programming Language, 4th Edition (a text I had not read carefully enough) and the synchronous queue CouchDeveloper recommended. I realized that there are several sequences in which a thread that has just released the FairLock can be tricked into thinking it still owns it. All it takes is an interleaving of instructions like the following:
New owner: set owned to true
Old owner: is owned true? yes
Old owner: am I the last owner? yes
New owner: set me as the last owner
At this point, the old and new owners both enter their critical sections.
I am considering whether this problem has a solution and whether it is worth attempting to solve this at all. In the meantime, don't use this unless you see a fix.
I would implement this in C++11 using a condition_variable-per-thread setup so that I could choose exactly which thread to wake up when (Live demo at Coliru):
class FairMutex {
private:
class waitnode {
std::condition_variable cv_;
waitnode* next_ = nullptr;
FairMutex& fmtx_;
public:
waitnode(FairMutex& fmtx) : fmtx_(fmtx) {
*fmtx.tail_ = this;
fmtx.tail_ = &next_;
}
~waitnode() {
for (waitnode** p = &fmtx_.waiters_; *p; p = &(*p)->next_) {
if (*p == this) {
*p = next_;
if (!next_) {
fmtx_.tail_ = &fmtx_.waiters_;
}
break;
}
}
}
void wait(std::unique_lock<std::mutex>& lk) {
while (fmtx_.held_ || fmtx_.waiters_ != this) {
cv_.wait(lk);
}
}
void notify() {
cv_.notify_one();
}
};
waitnode* waiters_ = nullptr;
waitnode** tail_ = &waiters_;
std::mutex mtx_;
bool held_ = false;
public:
void lock() {
auto lk = std::unique_lock<std::mutex>{mtx_};
if (held_ || waiters_) {
waitnode{*this}.wait(lk);
}
held_ = true;
}
bool try_lock() {
if (mtx_.try_lock()) {
std::lock_guard<std::mutex> lk(mtx_, std::adopt_lock);
if (!held_ && !waiters_) {
held_ = true;
return true;
}
}
return false;
}
void unlock() {
std::lock_guard<std::mutex> lk(mtx_);
held_ = false;
if (waiters_ != nullptr) {
waiters_->notify();
}
}
};
FairMutex models the Lockable concept so it can be used like any other standard library mutex type. Put simply, it achieves fairness by inserting waiters into a list in arrival order, and passing the mutex to the first waiter in the list when unlocking.
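For example, a caller can use it with the standard RAII wrappers; this usage sketch is mine and the names are purely illustrative:
// FairMutex satisfies the Lockable requirements, so std::lock_guard works unchanged.
#include <mutex>

FairMutex buffer_mutex;  // guards a shared buffer (hypothetical)

void encoder_step()
{
    std::lock_guard<FairMutex> guard(buffer_mutex);  // lock() here, unlock() at scope exit
    // ... consume the input buffer, produce the output buffer ...
}   // on unlock, the first waiter in arrival order is woken, so this thread
    // cannot immediately reacquire the mutex ahead of it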
If it's useful:
This demonstrates *) an implementation of a "synchronous queue" using semaphores as synchronization primitives.
Note: the actual implementation uses semaphores implemented with GCD (Grand Central Dispatch):
using gcd::mutex;
using gcd::semaphore;
// A blocking queue in which each put must wait for a get, and vice
// versa. A synchronous queue does not have any internal capacity,
// not even a capacity of one.
template <typename T>
class simple_synchronous_queue {
public:
typedef T value_type;
enum result_type {
OK = 0,
TIMEOUT_NOT_DELIVERED = -1,
TIMEOUT_NOT_PICKED = -2,
TIMEOUT_NOTHING_OFFERED = -3
};
simple_synchronous_queue()
: sync_(0), send_(1), recv_(0)
{
}
void put(const T& v) {
send_.wait();
new (address()) T(v);
recv_.signal();
sync_.wait();
}
result_type put(const T& v, double timeout) {
if (send_.wait(timeout)) {
new (storage_) T(v);
recv_.signal();
if (sync_.wait(timeout)) {
return OK;
}
else {
return TIMEOUT_NOT_PICKED;
}
}
else {
return TIMEOUT_NOT_DELIVERED;
}
}
T get() {
recv_.wait();
T result = *address();
address()->~T();
sync_.signal();
send_.signal();
return result;
}
std::pair<result_type, T> get(double timeout) {
if (recv_.wait(timeout)) {
std::pair<result_type, T> result =
std::pair<result_type, T>(OK, *address());
address()->~T();
sync_.signal();
send_.signal();
return result;
}
else {
return std::pair<result_type, T>(TIMEOUT_NOTHING_OFFERED, T());
}
}
private:
using storage_t = typename std::aligned_storage<sizeof(T), std::alignment_of<T>::value>::type;
T* address() {
return static_cast<T*>(static_cast<void*>(&storage_));
}
storage_t storage_;
semaphore sync_;
semaphore send_;
semaphore recv_;
};
*) demonstrates: be careful about potential issues, could be improved, etc. ... ;)
I accepted CouchDeveloper's answer since it pointed me down the right path. I wrote a Windows-specific C++11 implementation of a synchronous queue, and added this answer so that others could consider/use it if they so choose.
// SynchronousQueue.hpp
#ifndef SYNCHRONOUSQUEUE_HPP
#define SYNCHRONOUSQUEUE_HPP
#include <atomic>
#include <exception>
#include <windows.h>
using namespace std;
class CouldNotEnterException: public exception {};
class NoPairedCallException: public exception {};
template <typename T>
class SynchronousQueue {
private:
atomic_bool valueReady {false};
CRITICAL_SECTION getCriticalSection;
CRITICAL_SECTION putCriticalSection;
DWORD wait {0};
HANDLE getSemaphore;
HANDLE putSemaphore;
const T* address {nullptr};
public:
SynchronousQueue(DWORD waitMS): wait {waitMS}, address {nullptr} {
InitializeCriticalSection(&getCriticalSection);
InitializeCriticalSection(&putCriticalSection);
getSemaphore = CreateSemaphore(nullptr, 0, 1, nullptr);
putSemaphore = CreateSemaphore(nullptr, 0, 1, nullptr);
}
~SynchronousQueue() {
EnterCriticalSection(&getCriticalSection);
EnterCriticalSection(&putCriticalSection);
CloseHandle(getSemaphore);
CloseHandle(putSemaphore);
DeleteCriticalSection(&putCriticalSection);
DeleteCriticalSection(&getCriticalSection);
}
void put(const T& value) {
if (!TryEnterCriticalSection(&putCriticalSection)) {
throw CouldNotEnterException();
}
ReleaseSemaphore(putSemaphore, (LONG) 1, nullptr);
if (WaitForSingleObject(getSemaphore, wait) != WAIT_OBJECT_0) {
if (WaitForSingleObject(putSemaphore, 0) == WAIT_OBJECT_0) {
LeaveCriticalSection(&putCriticalSection);
throw NoPairedCallException();
} else {
WaitForSingleObject(getSemaphore, 0);
}
}
address = &value;
valueReady = true;
while (valueReady);
LeaveCriticalSection(&putCriticalSection);
}
T get() {
if (!TryEnterCriticalSection(&getCriticalSection)) {
throw CouldNotEnterException();
}
ReleaseSemaphore(getSemaphore, (LONG) 1, nullptr);
if (WaitForSingleObject(putSemaphore, wait) != WAIT_OBJECT_0) {
if (WaitForSingleObject(getSemaphore, 0) == WAIT_OBJECT_0) {
LeaveCriticalSection(&getCriticalSection);
throw NoPairedCallException();
} else {
WaitForSingleObject(putSemaphore, 0);
}
}
while (!valueReady);
T toReturn = *address;
valueReady = false;
LeaveCriticalSection(&getCriticalSection);
return toReturn;
}
};
#endif
I am trying to port a project (from Linux) that uses semaphores to Mac OS X; however, some of the POSIX semaphore functions are not implemented on Mac OS X.
The one that I hit in this port is sem_timedwait().
I don't know much about semaphores, but from the man pages sem_wait() seems to be close to sem_timedwait(), and it is implemented.
From the man pages:
The sem_timedwait() function shall lock the semaphore referenced by sem as in the sem_wait() function. However, if the semaphore cannot be locked without waiting for another process or thread to unlock the semaphore by performing a sem_post() function, this wait shall be terminated when the specified timeout expires.
From my limited understanding of how semaphores work, I can see that sem_timedwait() is safer, but I should still be able to use sem_wait().
Is this correct? If not, what other alternatives do I have?
Thanks
It's likely that the timeout is important to the operation of the algorithm. Therefore just using sem_wait() might not work.
You could use sem_trywait(), which returns right away in all cases. You can then loop, and use a sleep interval that you choose, each time decrementing the total timeout until you either run out of timeout or the semaphore is acquired.
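A rough sketch of that polling loop (the helper name and the 10 ms interval are mine, purely illustrative):
/* Polling fallback for a timed wait: returns 0 on success,
 * -1 with errno set to ETIMEDOUT (or the real error) on failure. */
#include <errno.h>
#include <semaphore.h>
#include <time.h>

int sem_wait_with_timeout(sem_t *sem, unsigned timeout_ms)
{
    const struct timespec interval = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    for (;;) {
        if (sem_trywait(sem) == 0)
            return 0;                 /* acquired */
        if (errno != EAGAIN)
            return -1;                /* real error */
        if (timeout_ms < 10) {
            errno = ETIMEDOUT;
            return -1;                /* timeout exhausted */
        }
        nanosleep(&interval, NULL);   /* sleep, then decrement the remaining budget */
        timeout_ms -= 10;
    }
}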
A much better solution is to rewrite the algorithm to use a condition variable, and then you can use pthread_cond_timedwait() to get the appropriate timeout.
Yet another alternative may be to use the sem_timedwait.c
implementation by Keith Shortridge of the Australian Astronomical Observatory's software group.
From the source file:
/*
* s e m _ t i m e d w a i t
*
* Function:
* Implements a version of sem_timedwait().
*
* Description:
* Not all systems implement sem_timedwait(), which is a version of
* sem_wait() with a timeout. Mac OS X is one example, at least up to
* and including version 10.6 (Leopard). If such a function is needed,
* this code provides a reasonable implementation, which I think is
* compatible with the standard version, although possibly less
* efficient. It works by creating a thread that interrupts a normal
* sem_wait() call after the specified timeout.
*
* ...
*
* Limitations:
*
* The mechanism used involves sending a SIGUSR2 signal to the thread
* calling sem_timedwait(). The handler for this signal is set to a null
* routine which does nothing, and with any flags for the signal
* (eg SA_RESTART) cleared. Note that this effective disabling of the
* SIGUSR2 signal is a side-effect of using this routine, and means it
* may not be a completely transparent plug-in replacement for a
* 'normal' sig_timedwait() call. Since OS X does not declare the
* sem_timedwait() call in its standard include files, the relevant
* declaration (shown above in the man pages extract) will probably have
* to be added to any code that uses this.
*
* ...
*
* Copyright (c) Australian Astronomical Observatory.
* Commercial use requires permission.
* This code comes with absolutely no warranty of any kind.
*/
I used to use named semaphores on OS X, but now sem_timedwait isn't available and sem_init and friends are deprecated. I implemented semaphores using a pthread mutex and condition variable as follows, which works for me (OS X 10.13.1). You might have to keep a handle-to-struct table and look up the sem_t if it can't hold a pointer (i.e., if pointers are 64 bits and sem_t is only 32).
#ifdef __APPLE__
typedef struct
{
pthread_mutex_t count_lock;
pthread_cond_t count_bump;
unsigned count;
}
bosal_sem_t;
int sem_init(sem_t *psem, int flags, unsigned count)
{
bosal_sem_t *pnewsem;
int result;
pnewsem = (bosal_sem_t *)malloc(sizeof(bosal_sem_t));
if (! pnewsem)
{
return -1;
}
result = pthread_mutex_init(&pnewsem->count_lock, NULL);
if (result)
{
free(pnewsem);
return result;
}
result = pthread_cond_init(&pnewsem->count_bump, NULL);
if (result)
{
pthread_mutex_destroy(&pnewsem->count_lock);
free(pnewsem);
return result;
}
pnewsem->count = count;
*psem = (sem_t)pnewsem;
return 0;
}
int sem_destroy(sem_t *psem)
{
bosal_sem_t *poldsem;
if (! psem)
{
return EINVAL;
}
poldsem = (bosal_sem_t *)*psem;
pthread_mutex_destroy(&poldsem->count_lock);
pthread_cond_destroy(&poldsem->count_bump);
free(poldsem);
return 0;
}
int sem_post(sem_t *psem)
{
bosal_sem_t *pxsem;
int result, xresult;
if (! psem)
{
return EINVAL;
}
pxsem = (bosal_sem_t *)*psem;
result = pthread_mutex_lock(&pxsem->count_lock);
if (result)
{
return result;
}
pxsem->count = pxsem->count + 1;
xresult = pthread_cond_signal(&pxsem->count_bump);
result = pthread_mutex_unlock(&pxsem->count_lock);
if (result)
{
return result;
}
if (xresult)
{
errno = xresult;
return -1;
}
return 0;
}
int sem_trywait(sem_t *psem)
{
bosal_sem_t *pxsem;
int result, xresult;
if (! psem)
{
return EINVAL;
}
pxsem = (bosal_sem_t *)*psem;
result = pthread_mutex_lock(&pxsem->count_lock);
if (result)
{
return result;
}
xresult = 0;
if (pxsem->count > 0)
{
pxsem->count--;
}
else
{
xresult = EAGAIN;
}
result = pthread_mutex_unlock(&pxsem->count_lock);
if (result)
{
return result;
}
if (xresult)
{
errno = xresult;
return -1;
}
return 0;
}
int sem_wait(sem_t *psem)
{
bosal_sem_t *pxsem;
int result, xresult;
if (! psem)
{
return EINVAL;
}
pxsem = (bosal_sem_t *)*psem;
result = pthread_mutex_lock(&pxsem->count_lock);
if (result)
{
return result;
}
xresult = 0;
if (pxsem->count == 0)
{
xresult = pthread_cond_wait(&pxsem->count_bump, &pxsem->count_lock);
}
if (! xresult)
{
if (pxsem->count > 0)
{
pxsem->count--;
}
}
result = pthread_mutex_unlock(&pxsem->count_lock);
if (result)
{
return result;
}
if (xresult)
{
errno = xresult;
return -1;
}
return 0;
}
int sem_timedwait(sem_t *psem, const struct timespec *abstim)
{
bosal_sem_t *pxsem;
int result, xresult;
if (! psem)
{
return EINVAL;
}
pxsem = (bosal_sem_t *)*psem;
result = pthread_mutex_lock(&pxsem->count_lock);
if (result)
{
return result;
}
xresult = 0;
if (pxsem->count == 0)
{
xresult = pthread_cond_timedwait(&pxsem->count_bump, &pxsem->count_lock, abstim);
}
if (! xresult)
{
if (pxsem->count > 0)
{
pxsem->count--;
}
}
result = pthread_mutex_unlock(&pxsem->count_lock);
if (result)
{
return result;
}
if (xresult)
{
errno = xresult;
return -1;
}
return 0;
}
#endif
Have you considered using the Apache Portable Runtime? It's preinstalled on every Mac OS X box and many Linux distros, and it comes with a platform-neutral wrapper around thread concurrency that works even on MS Windows:
http://apr.apache.org/docs/apr/1.3/group__apr__thread__cond.html
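For instance, a timed wait can be sketched with APR's condition variables roughly like this (my own untested sketch based on the apr_thread_cond_* API documented at the link above; error handling omitted, names illustrative):
#include <apr_thread_mutex.h>
#include <apr_thread_cond.h>
#include <apr_time.h>

apr_status_t timed_wait_example(apr_pool_t *pool)
{
    apr_thread_mutex_t *mutex;
    apr_thread_cond_t *cond;
    apr_status_t rv;

    apr_thread_mutex_create(&mutex, APR_THREAD_MUTEX_DEFAULT, pool);
    apr_thread_cond_create(&cond, pool);

    apr_thread_mutex_lock(mutex);
    /* Block for at most 2 seconds; APR_TIMEUP is returned on timeout. */
    rv = apr_thread_cond_timedwait(cond, mutex, apr_time_from_sec(2));
    apr_thread_mutex_unlock(mutex);

    return rv;
}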
I think the simplest solution is to use sem_wait() in combination with a call to alarm() to abort the wait. For example:
alarm(2);
int return_value = sem_wait( &your_semaphore );
if( return_value == -1 && errno == EINTR )
printf( "we have been interrupted by the alarm." );
One issue is that alarm takes seconds as input so the timed wait might be too long in your case.
-- aghiles
One option is to use the low-level Mach semaphore API:
#include <mach/semaphore.h>
semaphore_create(...)
semaphore_wait(...)
semaphore_timedwait(...)
semaphore_signal(...)
semaphore_destroy(...)
It is used in libuv BTW.
Reference:
https://opensource.apple.com/source/xnu/xnu-201/osfmk/kern/sync_sema.c
https://github.com/libuv/libuv/blob/master/src/unix/thread.c
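For what it's worth, here is a minimal sketch (mine, not from the links above) of how those calls fit together; error handling is minimal:
#include <mach/mach.h>
#include <mach/semaphore.h>

int mach_timed_wait_example(void)
{
    semaphore_t sem;
    mach_timespec_t timeout = { 2, 0 };  /* 2 seconds, 0 nanoseconds */
    kern_return_t kr;

    semaphore_create(mach_task_self(), &sem, SYNC_POLICY_FIFO, 0);

    kr = semaphore_timedwait(sem, timeout);  /* KERN_OPERATION_TIMED_OUT on timeout */
    /* ... another thread would call semaphore_signal(sem) to wake the waiter ... */

    semaphore_destroy(mach_task_self(), sem);
    return kr == KERN_SUCCESS ? 0 : -1;
}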
Could you try to mimic the functionality of the sem_timedwait() call by starting a timer in another thread that calls sem_post() after the timer expires if it hasn't been called by the primary thread that is supposed to call sem_post()?
If you can, just use the MP API:
MPCreateSemaphore/MPDeleteSemaphore
MPSignalSemaphore/MPWaitOnSemaphore
MPWaitOnSemaphore exits with kMPTimeoutErr if the specified timeout is exceeded without the semaphore being signaled.
I was planning on using the following function as a replacement, but then I discovered that sem_getvalue() was also deprecated and non-functional on OS X. You are free to use the following slightly untested code under an MIT or LGPL license (your choice).
#ifdef __APPLE__
struct CSGX__sem_timedwait_Info
{
pthread_mutex_t MxMutex;
pthread_cond_t MxCondition;
pthread_t MxParent;
struct timespec MxTimeout;
bool MxSignaled;
};
void *CSGX__sem_timedwait_Child(void *MainPtr)
{
CSGX__sem_timedwait_Info *TempInfo = (CSGX__sem_timedwait_Info *)MainPtr;
pthread_mutex_lock(&TempInfo->MxMutex);
// Wait until the timeout or the condition is signaled, whichever comes first.
int Result;
do
{
Result = pthread_cond_timedwait(&TempInfo->MxCondition, &TempInfo->MxMutex, &TempInfo->MxTimeout);
if (!Result || Result == ETIMEDOUT) break;  // pthread_cond_timedwait returns the error code directly
} while (1);
if (Result == ETIMEDOUT && !TempInfo->MxSignaled)
{
TempInfo->MxSignaled = true;
pthread_kill(TempInfo->MxParent, SIGALRM);
}
pthread_mutex_unlock(&TempInfo->MxMutex);
return NULL;
}
int sem_timedwait(sem_t *sem, const struct timespec *abs_timeout)
{
// Quick test to see if a lock can be immediately obtained.
int Result;
do
{
Result = sem_trywait(sem);
if (!Result) return 0;
} while (Result < 0 && errno == EINTR);
// Since it couldn't be obtained immediately, it is time to shuttle the request off to a thread.
// Depending on the timeout, this could take longer than the timeout.
CSGX__sem_timedwait_Info TempInfo;
pthread_mutex_init(&TempInfo.MxMutex, NULL);
pthread_cond_init(&TempInfo.MxCondition, NULL);
TempInfo.MxParent = pthread_self();
TempInfo.MxTimeout.tv_sec = abs_timeout->tv_sec;
TempInfo.MxTimeout.tv_nsec = abs_timeout->tv_nsec;
TempInfo.MxSignaled = false;
sighandler_t OldSigHandler = signal(SIGALRM, SIG_DFL);
pthread_t ChildThread;
pthread_create(&ChildThread, NULL, CSGX__sem_timedwait_Child, &TempInfo);
// Wait for the semaphore, the timeout to expire, or an unexpected error condition.
do
{
Result = sem_wait(sem);
if (Result == 0 || TempInfo.MxSignaled || (Result < 0 && errno != EINTR)) break;
} while (1);
// Terminate the thread (if it is still running).
TempInfo.MxSignaled = true;
int LastError = errno;
pthread_mutex_lock(&TempInfo.MxMutex);
pthread_cond_signal(&TempInfo.MxCondition);
pthread_mutex_unlock(&TempInfo.MxMutex);
pthread_join(ChildThread, NULL);
pthread_cond_destroy(&TempInfo.MxCondition);
pthread_mutex_destroy(&TempInfo.MxMutex);
// Restore previous signal handler.
signal(SIGALRM, OldSigHandler);
errno = LastError;
return Result;
}
#endif
SIGALRM makes more sense than SIGUSR2, which another example here apparently uses (I didn't bother looking at it). SIGALRM is mostly reserved for alarm() calls, which are virtually useless when you want sub-second resolution.
This code first attempts to acquire the semaphore with sem_trywait(). If that immediately succeeds, then it bails out. Otherwise, it starts a thread which is where the timer is implemented via pthread_cond_timedwait(). The MxSignaled boolean is used to determine the timeout state.
You may also find this relevant function useful for calling the above sem_timedwait() implementation (again, MIT or LGPL, your choice):
int CSGX__ClockGetTimeRealtime(struct timespec *ts)
{
#ifdef __APPLE__
clock_serv_t cclock;
mach_timespec_t mts;
if (host_get_clock_service(mach_host_self(), CALENDAR_CLOCK, &cclock) != KERN_SUCCESS) return -1;
if (clock_get_time(cclock, &mts) != KERN_SUCCESS) return -1;
if (mach_port_deallocate(mach_task_self(), cclock) != KERN_SUCCESS) return -1;
ts->tv_sec = mts.tv_sec;
ts->tv_nsec = mts.tv_nsec;
return 0;
#else
return clock_gettime(CLOCK_REALTIME, ts);
#endif
}
Helps populate a timespec structure with the closest thing to what clock_gettime() can provide. There are various comments out there that calling host_get_clock_service() repeatedly is expensive. But starting up a thread is also expensive.
The real fix is for Apple to implement the entire POSIX specification, not just the mandatory parts. Implementing only the mandatory bits of POSIX and then claiming POSIX compliance just leaves everyone with a half-broken OS and tons of workarounds like the above that may have less-than-ideal performance.
The above all said, I am giving up on native semaphores (both Sys V and POSIX) on both Mac OSX and Linux. They are broken in quite a few rather unfortunate ways. Everyone else should give up on them too. (I'm not giving up on semaphores on those OSes, just the native implementations.) At any rate, now everyone has a sem_timedwait() implementation without commercial restrictions that others can copy-pasta to their heart's content.