Why did POCO choose to use POSIX semaphores for OSX? - macos

I am a newbie to Mac/OSX. I am working on Titanium, a cross-platform runtime which uses the POCO library for most of its portable C++ API. I see that POCO uses a POSIX semaphore for its NamedMutex implementation on OSX, as opposed to the SysV semaphore it uses for a few other *NIX platforms.
bool NamedMutexImpl::tryLockImpl()
{
#if defined(sun) || defined(__APPLE__) || defined(__osf__) || defined(__QNX__) || defined(_AIX)
return sem_trywait(_sem) == 0;
#else
struct sembuf op;
op.sem_num = 0;
op.sem_op = -1;
op.sem_flg = SEM_UNDO | IPC_NOWAIT;
return semop(_semid, &op, 1) == 0;
#endif
}
From a few searches, I see that the SysV semaphore API (semop and friends) is supported on OSX as well: http://www.osxfaq.com/man/2/semop.ws. Any idea why the POCO developers chose to use the POSIX API on OSX?
I am particularly interested in the SEM_UNDO functionality in the above call, which POSIX semaphores can't provide.

Any idea why the POCO developers chose to use the POSIX API on OSX?
That seems to be a rather arbitrary decision on the part of the POCO developers: neither type of semaphore really matches Windows named semaphores (after which they are apparently modeled). There is no semaphore on POSIX which has its own symbolic namespace akin to a file system. (SysV semaphores have a namespace made of integer ids, but no symbolic names.)
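To make the namespace point concrete, here is a minimal sketch (mine, not POCO's code) of how a SysV semaphore is usually "named": an integer key is derived from a file path with ftok(), but the path is only a way to manufacture the key, not a real symbolic name for the semaphore.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int open_sysv_sem(const char *path)   /* hypothetical helper name */
{
    key_t key = ftok(path, 'p');      /* derive an integer key; 'p' is an arbitrary project id */
    if (key == (key_t)-1)
        return -1;
    return semget(key, 1, IPC_CREAT | 0600);   /* set containing a single semaphore */
}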
If the posted code really comes from the library, I can only advise you to stop relying on the library for portability. Well, at least for the semaphores you apparently have to start your own implementation already.
Edit 1. Check how the semaphores are implemented for Windows. It's common for such libraries to use Windows critical sections, in which case the POSIX sem_t is a proper match. You need SEM_UNDO only if the semaphore is accessed by several processes - it doesn't apply to threads; i.e., the undo happens when a process crashes. The fact that they use SysV on Linux is quite troubling, though. SysV semaphores are global and thus subject to an OS limit on their number (which can be changed at run time), while sem_t semaphores are local to the process, are just a structure in private memory, and are limited only by the amount of memory the process can allocate.
P.S. Reluctantly: the real reason might be that the main POCO development takes place on Windows (as is usual for "portable libraries"; they are "portable to Windows", so to say, trying to make *NIX look like Windows). The UNIX implementation is very often an afterthought, implemented by somebody who has seen a terminal screen from a few meters away and never read a man page past the function prototype. That was my personal experience with a couple of such "portable libraries" in the past.

Related

Shell script: Portable way to programmatically obtain the CPU vendor on POSIX systems

Is there a portable way to programmatically obtain the CPU vendor info on POSIX systems in shell scripts? In particular, I need to tell whether an x86_64/AMD64 CPU is made by Intel or AMD. The approach does not have to work on all POSIX systems, but it should work on a decent range of common POSIX systems: GNU/Linux, MacOS, and *BSD. As an example, a Linux-only approach is to extract the info from /proc/cpuinfo.
POSIX (IEEE Std 1003.1-2017) does not mandate a system utility or shell variable holding the CPU brand. The closest you'll get is uname -m, which is the "hardware type on which the system is running". Unfortunately, that command doesn't have standardized output, so while you might get amd64 on some older machines, you'll mostly get i686 or x86_64 these days.
POSIX does mandate that c99, a basic C compiler interface, be present whenever a C compiler is available at all. You can use that to compile a naive version of cpuid:
$ cat cpuid.c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#ifdef _WIN32
#include <intrin.h>   /* __cpuidex */
#endif

int main(void) {
    uint32_t regs[4] = { 0 };
    char brand[13] = { 0 };   /* 12 vendor characters plus terminating NUL */
#ifdef _WIN32
    __cpuidex((int *)regs, 0, 0);
#else
    __asm volatile("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3]) : "a" (0), "c" (0));
#endif
    /* leaf 0 returns the vendor string in EBX, EDX, ECX order */
    memcpy(&brand[0], &regs[1], 4);
    memcpy(&brand[4], &regs[3], 4);
    memcpy(&brand[8], &regs[2], 4);
    printf("%s\n", brand);
    return 0;
}
On a variety of test machines, here's what I get:
$ c99 -o cpuid cpuid.c && ./cpuid # MacOS X
GenuineIntel
$ c99 -o cpuid cpuid.c && ./cpuid # Intel-based AWS EC2 (M5)
GenuineIntel
$ c99 -o cpuid cpuid.c && ./cpuid # AMD-based AWS EC2 (T3a)
AuthenticAMD
Wikipedia lists numerous other possible vendor brands based on the cpuid instruction, but the ones likely most interesting for your defined use case are:
GenuineIntel - Intel
AMDisbetter! - AMD
AuthenticAMD - AMD
Provided you had this simple executable available in your path, the POSIX-y logic would look like:
if cpuid | grep -q AMD; then
    : # AMD logic here
elif cpuid | grep -q Intel; then
    : # Intel logic here
else
    # neither Intel nor AMD
    echo "Unsupported CPU vendor: $(cpuid)" >&2
fi
If you have a very, very old multi-core motherboard from the days when AMD was pin-equivalent with Intel, then you might care whether CPU0 and CPU1 are from the same vendor; in that case the C program above can be modified in the assembly lines to check processor 1 instead of 0 (the second argument to the respective asm functions).
This illustrates one particular benefit of this approach: if what you really want to know is whether the CPU supports a particular feature set (and are just using vendor as a proxy), then you can modify the C code to check whether the CPU feature is actually available. That's a quick modification to the EAX value given to the assembly code and a change to the interpretation of the E{B,C,D}X result registers.
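For instance, here is a minimal sketch (my own, not part of the answer's original code) of that feature-check variant: query leaf 1 and test ECX bit 20, which Intel documents as SSE4.2. The Windows branch assumes <intrin.h> for __cpuidex.
#include <stdio.h>
#include <stdint.h>
#ifdef _WIN32
#include <intrin.h>
#endif

int main(void) {
    uint32_t regs[4] = { 0 };
#ifdef _WIN32
    __cpuidex((int *)regs, 1, 0);   /* leaf 1: feature flags */
#else
    __asm volatile("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3]) : "a" (1), "c" (0));
#endif
    /* ECX bit 20 = SSE4.2 per Intel's documentation */
    printf("SSE4.2: %s\n", (regs[2] & (1u << 20)) ? "yes" : "no");
    return 0;
}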
With regards to the availability of c99, note that:
A POSIX-conforming system without c99 is proof that no C compiler is available on that system. If your target systems do not have c99, then you need to select and install a C compiler (gcc, clang, msvc, etc.) or attempt a fallback detection with e.g. /proc/cpuinfo.
The standard declares that "Unlike all of the other non-OB-shaded utilities in this standard, a utility by this name probably will not appear in the next version of this standard. This utility's name is tied to the current revision of the ISO C standard at the time this standard is approved. Since the ISO C standard and this standard are maintained by different organizations on different schedules, we cannot predict what the compiler will be named in the next version of the standard." So you should consider compiling with something along the lines of ${C99:-c99} -o cpuid cpuid.c, which lets you flex as the binary name changes over time.
I would proceed by writing the exact commands to get the CPU vendor for every supported OS and then running the appropriate set of commands based on the detected OS.
I wrote an example that can easily be improved or extended, taking into consideration the operating systems in your question:
OS="`uname`"
case "$OS" in
SunOS*) /usr/platform/`uname -m`/sbin/prtdiag -v ;;
Darwin*) sysctl -n machdep.cpu.vendor ;;
Linux*) lscpu | grep Vendor | awk '{print $NF}' ;;
FreeBSD*) sysctl -n hw.model | awk 'NR==1{print $NF}' ;;
*) echo "unknown: $OS" ;;
esac
This is the basic logic you need:
Detect the OS type: Linux or BSD; if BSD, Darwin or another BSD; if another BSD, then OpenBSD, FreeBSD, NetBSD, or DragonFly BSD. If Darwin, you'll need Darwin-specific handling. If it's not a BSD and not Linux, is it a proprietary type of Unix? Are you going to try to handle it? If not, you need a safe fallback. This determines which methods you use to do some, but not all, of the detections.
If Linux, it's easy if all you want is Intel or AMD - unless you need solid 32/64-bit detection. You specified 64-bit only, but is that the running kernel or the CPU? That has to be handled if it's relevant. Does it matter what type of Intel/AMD CPU it is? They make some SoC variants, for example.
sysctl for the BSDs will give you whatever each BSD decided to put in there. DragonFly and FreeBSD will be similar or the same, OpenBSD you have to check release by release, and NetBSD... is tricky. Some installs require root to read sysctl; that's out of your hands, so you have to handle it case by case and have error handling to detect when root is required. That varies - the usual practice is to make the data user-readable, but not always. Note that the BSDs can and do change the syntax of some fields' data in the output, so you have to keep up on it if you actually want BSD support. Apple in general does not seem to care at all about real Unix tools being able to work with its data, so it's empirical - don't assume anything without seeing several generations of the output. And they don't include a lot of standard Unix tools by default, so you can't assume things are actually installed in the first place.
/proc/cpuinfo will cover all Linux systems for AMD/Intel, and a variety of methods can be used to pinpoint whether the running kernel is 32-bit or 64-bit, and whether the CPU itself is.
VMs can help, but only go part of the way, since the CPU will be your host machine's (or part of it). Getting reliable current- and last-generation data is a pain. But if you have Intel and AMD systems to work with, you can install most of the BSD variants except Darwin/OSX and debug on those, so that gets you to most of the OS types - except Darwin, which requires having a Mac of some type available.
Does failure matter? Does it actually matter if the detection fails? If so, how is failure handled? Does ARM/MIPS/PPC matter? What about other CPUs, like Elbrus, that have many Intel-like features but are neither AMD nor Intel?
Like the comment said, read the CPU block in inxi to pick out what you need, but it's not easy to do, it requires a lot of data samples, and you'll be sad, because one day FreeBSD or OSX or OpenBSD will change something at total random for a new release.
If you ignore OSX and pretend it doesn't exist, then on the bright side you'll get 98% support out of the box with very little code, if all you need is Intel/AMD detection via /proc/cpuinfo, which prints it out as neatly as can be desired. If you must have OSX, then you have to add the full suite of BSD handlers, which is a pain. Personally, I wouldn't touch a project like that unless I got paid to do it, re OSX. Usually you can get FreeBSD and maybe OpenBSD reasonably readily, though you have to check every new major release to see if it all still works.
If you add more requirements, like cpus other than intel/amd, then it gets a lot harder and takes much more code.
Note that on Darwin, all current OSX machines are, I believe, Intel (though there are rumors Apple is looking to leave Intel); previously they were PowerPC. So it also comes down to how robust the solution has to be: do you care if it fails on a PowerPC Mac? Do you care if it fails on a future Mac that is not Intel-powered?
Further note that specifying BSD excludes a wide variety of even more fragmented Unix systems, like OpenIndiana, Solaris proper, and the proprietary unices of IBM, HP, and so on, which all use different tools.

How does Go make system calls?

As far as I know, in CPython, open() and read() - the API to read a file - are implemented in C code. The C code presumably calls some C library which knows how to make a system call.
What about a language such as Go? Isn't Go itself now written in Go? Does Go call C libraries behind the scenes?
The short answer is "it depends".
Go compiles for multiple combinations of H/W and OS, and they all have different approaches to how syscalls are to be made when working with them.
For instance, Solaris does not provide a stable supported set of syscalls, so Go goes through the system's libc - just as required by the vendor.
Windows does support a rather stable set of syscalls but it is defined as a C API provided by a set of standard DLLs.
The functions exposed by those DLLs are mostly shims which use a single "make a syscall by number" function, but these numbers are not documented and are different between the kernel flavours and releases (perhaps, intentionally).
Linux does provide a stable and documented set of numbered syscalls and hence there Go just calls the kernel directly.
Now keep in mind that "calling the kernel directly" means following the so-called ABI of the H/W and OS combo. For instance, on modern Linux on amd64, making a syscall requires filling a set of CPU registers with certain values, making some other arrangements, and then issuing the SYSCALL CPU instruction.
On Windows, you have to use its native calling convention (which is stdcall, not cdecl).
Yes, Go is now written in Go. But you don't need C to make syscalls.
An important thing to call out is that syscalls aren't "written in C." You can make syscalls from C on Unix because of <unistd.h>. In particular, how Linux defines this header is a little convoluted, but you can see the general idea from this file. Syscalls are defined with a name and a number. When you call read, for example, what really happens behind the scenes is that the parameters are set up in the proper registers/memory (Linux expects the syscall number in eax/rax), followed by an instruction that traps into the kernel - the legacy int 0x80 interrupt on 32-bit x86, or the dedicated syscall instruction on x86-64. The OS has already set up the proper handlers that receive control and go about doing whatever is needed for that syscall. So you don't need something written in C (or a standard library, for that matter) to make syscalls. You just need to understand the calling ABI and know the syscall numbers.
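To make that concrete, here is a minimal sketch (mine, assuming x86-64 Linux and GCC/Clang inline assembly) of invoking write directly via the syscall instruction, bypassing the libc wrapper; syscall number 1 is write on this ABI.
#include <unistd.h>   /* only for STDOUT_FILENO; the syscall itself needs no library */

int main(void) {
    static const char msg[] = "hello from a raw syscall\n";
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)                    /* rax: return value */
                      : "0" (1L),                     /* rax: syscall number (write) */
                        "D" ((long)STDOUT_FILENO),    /* rdi: fd */
                        "S" (msg),                    /* rsi: buffer */
                        "d" ((long)sizeof msg - 1)    /* rdx: count */
                      : "rcx", "r11", "memory");      /* clobbered by SYSCALL */
    return ret < 0 ? 1 : 0;
}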
However, as #retgits points out golang's approach is to piggyback off the fact that libc already has all of the logic for handling syscalls. mksyscall.go is a CLI script that parses these libc files to extract the necessary information.
You can actually trace the life of a syscall if you compile a go script like:
package main

import (
	"syscall"
)

func main() {
	var buf []byte
	syscall.Read(9, buf)
}
Run objdump -D on the resulting binary. The Go runtime is rather large, so your best bet is to find the main function, see where it calls syscall.Read, and then search for the offsets from there: syscall.Read calls syscall.syscall, syscall.syscall calls runtime.libcCall (which switches from the Go ABI to C ABI compatibility so that arguments are located where the OS expects them - you can see this in the runtime, for darwin for example), runtime.libcCall calls runtime.asmcgocall, etc.
For extra fun, run that binary with gdb and continue stepping in until you hit the syscall.
The sys package takes care of the syscalls to the underlying OS. Depending on the OS you're using, different packages are used to generate the appropriate calls. Here is a link to the README for Go running on Unix systems: https://github.com/golang/sys/blob/master/unix/README.md - the parts on mksyscall.go, on the hand-written Go files which implement system calls that need special handling, and on the types files should walk you through how it works.
The Go compiler (which translates Go code to target CPU code) is written in Go, but that is different from the run-time support code, which is what you are talking about. The standard library is mainly written in Go and probably knows how to make system calls directly, with no C code involved. However, there may be a bit of C support code, depending on the target platform.

feature request: an atomicAdd() function included in gwan.h

In the G-WAN KV options, KV_INCR_KEY will use the 1st field as the primary key.
That means there is a function which increments atomically already built into the G-WAN core to make this primary index work.
It would be good to make this function available to servlets, i.e. to include it in gwan.h.
By doing so, ANSI C newbies like me could benefit from it.
There was ample discussion about this on the old G-WAN forum, and people were invited to share their experiences with atomic operations in order to build a rich list of documented functions, platform by platform.
Atomic operations are not portable because they address the CPU directly. It means that the code for Intel x86 (32-bit) and AMD64 (64-bit) is different. Each platform (ARM, Power7, Cell, Motorola, etc.) has its own atomic instruction set.
Such a list has not been published in the gwan.h file so far because basic operations are easy to find (the GCC compiler offers several atomic intrinsics as C extensions), while more sophisticated operations are less obvious (they need asm skills) and people will build them as they need them - for very specific uses in their code.
Software Engineering is always a balance between what can be made available at the lowest possible cost to entry (like the G-WAN KV store, which uses a small number of functions) and how it actually works (which is far less simple to follow).
So, beyond the obvious (incr/decr, set/get), to learn more about atomic operations, use Google, find CPU instruction set manuals, and arm yourself with courage!
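To illustrate the "less obvious" end of that spectrum, here is a rough sketch (not G-WAN code) of a tiny spinlock built from the GCC compare-and-swap builtin mentioned above:
#include <sched.h>

static volatile int lock = 0;

static void spin_lock(void)
{
    /* atomically set lock to 1 only if it is currently 0 */
    while (!__sync_bool_compare_and_swap(&lock, 0, 1))
        sched_yield();              /* yield instead of burning the CPU */
}

static void spin_unlock(void)
{
    __sync_lock_release(&lock);     /* atomic store of 0 with release semantics */
}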
Thanks for Gil's helpful guidance.
Now, I can do it by myself.
I changed the code in persistence.c as below:
First, I changed the definition of val in data to volatile.
//data[0]->val++;
//xbuf_xcat(reply, "Value: %d", data[0]->val);
int new_count, loops = 50000000, time1, time2, time;
time1 = getus();
for(int i = 0; i < loops; i++){
    new_count = __sync_add_and_fetch(&data[0]->val, 1);
}
time2 = getus();
time = loops / (time2 - time1);   /* increments per microsecond */
time = time * 1000;               /* increments per millisecond */
xbuf_xcat(reply, "Value: %d, time: %d incr_ops/msec", new_count, time);
I got 52,000 incr_operations/msec with my old E2180 CPU.
So, with the GCC compiler I can do it by myself.
Thanks again.

Are the *A Win32 API calls still relevant?

I still see advice about using the LPTSTR/TCHAR types, etc., instead of LPWSTR/WCHAR. I believe the Unicode stuff was well established by Win2k, and I frankly don't write code for Windows 98 anymore. (Excepting special cases, of course.) Given that I don't care about Windows 98 (or, even less, ME), as they're decade-old OSes, is there any reason to use the compatibility TCHAR, etc. types? Why still advise people to use TCHAR - what benefit does it add over using WCHAR directly?
If someone tells you to walk up to 1,000,000 lines of non-_UNICODE C++, with plenty of declarations using char instead of wchar_t or TCHAR or WCHAR, you had better be prepared to cope with the non-Unicode Win32 API. Conversion on a large scale is quite costly, and may not be something the source-o-money is prepared to pay for.
As for new code, well, there's so much example code out there using TCHAR that it may be easier to cut and paste, and there is in some cases some friction between WCHAR as wchar_t and WCHAR as unsigned short.
Who knows, maybe some day MS will add a UTF-32 data type under TCHAR?
Actually, the Unicode versions of functions were introduced with Win32 in 1993, with Windows NT 3.1. In fact, on the NT-based OSes, almost all the *A functions just convert to Unicode and call the *W version internally. Support for the *W functions on 9x also exists, through the Microsoft Layer for Unicode.
For new programs, I would definitely recommend using the TCHAR macros or WCHARs directly. I doubt MS will be adding support for any other character size during NT's lifetime. For existing code bases, I guess it would depend on how important it is to support Unicode versus the cost of fixing it. The *A functions need to stay in Win32 forever for backward compatibility.
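For what the TCHAR route looks like in practice, here is a small sketch (mine; Windows-only, assuming <tchar.h> and that UNICODE/_UNICODE are defined together, as is usual): the same source compiles against the *W API when they are defined and against the *A API when they are not.
#include <windows.h>
#include <tchar.h>

int main(void)
{
    TCHAR msg[] = _T("Hello");      /* char[] or wchar_t[] depending on _UNICODE */
    /* MessageBox is a macro that resolves to MessageBoxA or MessageBoxW */
    MessageBox(NULL, msg, _T("Greeting"), MB_OK);
    return 0;
}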

How can I override malloc(), calloc(), free(), etc. under OS X?

Assuming the latest Xcode and GCC, what is the proper way to override the memory allocation functions (I guess operator new/delete as well)? The debugging memory allocators are too slow for a game; I just need some basic stats I can gather myself with minimal impact.
I know it's easy on Linux due to the hooks, and this was trivial under CodeWarrior ten years ago when I wrote HeapManager.
Sadly, SmartHeap no longer has a Mac version.
I would use library preloading for this task, because it does not require modification of the running program. If you're familiar with the usual Unix way of doing this, it's almost just a matter of replacing LD_PRELOAD with DYLD_INSERT_LIBRARIES.
The first step is to create a library with code such as this, then build it using the regular shared library linking options (gcc -dynamiclib):
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>   /* dlsym, RTLD_NEXT */

void *malloc(size_t size)
{
    void * (*real_malloc)(size_t);
    real_malloc = dlsym(RTLD_NEXT, "malloc");   /* look up the next (real) malloc */
    fprintf(stderr, "allocating %lu bytes\n", (unsigned long)size);
    /* Do your stuff here */
    return real_malloc(size);
}
Note that if you also divert calloc() and its implementation calls malloc(), you may need additional code to check how you're being called. C++ programs should be pretty safe because the new operator calls malloc() anyway, but be aware that no standard enforces that. I have never encountered an implementation that didn't use malloc(), though.
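One way to do that "check how you're being called", sketched here purely as an illustration (the flag, names, and messages are made up): have the calloc() diversion set a flag around its call into the real calloc(), and have the malloc() diversion skip its bookkeeping while the flag is set.
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

static __thread int inside_calloc = 0;

void *calloc(size_t nmemb, size_t size)
{
    void *(*real_calloc)(size_t, size_t) = dlsym(RTLD_NEXT, "calloc");
    void *ptr;

    inside_calloc = 1;
    ptr = real_calloc(nmemb, size);   /* may call malloc() internally */
    inside_calloc = 0;

    fprintf(stderr, "calloc(%lu, %lu)\n", (unsigned long)nmemb, (unsigned long)size);
    return ptr;
}

/* In the malloc() diversion shown above, do the logging only when !inside_calloc. */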
Finally, set up the running environment for your program and launch it (might require adjustments depending on how your shell handles environment variables):
export DYLD_INSERT_LIBRARIES=./yourlibrary.dylib
export DYLD_FORCE_FLAT_NAMESPACE=1
yourprogram --yourargs
See the dyld manual page for more information about the dynamic linker environment variables.
This method is pretty generic. There are limitations, however:
You won't be able to divert direct system calls
If the application itself tricks you by using dlsym() to load malloc's address, the call won't be diverted. Unless, however, you trick it back by also diverting dlsym!
The malloc_default_zone technique mentioned at http://lists.apple.com/archives/darwin-dev/2005/Apr/msg00050.html appears to still work, see e.g. http://code.google.com/p/fileview/source/browse/trunk/fileview/fv_zone.cpp?spec=svn354&r=354 for an example use that seems to be similar to what you intend.
After much searching (here included) and issues with 10.7 I decided to write a blog post about this topic: How to set malloc hooks in OSX Lion
You'll find a few good links at the end of the post with more information on this topic.
The basic solution:
malloc_zone_t *dz = malloc_default_zone();
if (dz->version >= 8)
{
    vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0,
               VM_PROT_READ | VM_PROT_WRITE);   // remove the write protection
}
original_free = dz->free;
dz->free = &my_free;   // this line throws a bad ptr exception without calling vm_protect first
if (dz->version == 8)
{
    vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0,
               VM_PROT_READ);                   // put the write protection back
}
This is an old question, but I came across it while trying to do this myself. I got curious about this topic for a personal project I was working on, mainly to make sure that what I thought was being automatically deallocated was actually being properly deallocated. I ended up writing a C++ implementation that lets me track the amount of allocated heap and report it if I so choose.
https://gist.github.com/monitorjbl/3dc6d62cf5514892d5ab22a59ff34861
As the name notes, this is OSX-specific. However, I was able to do the same on Linux environments using malloc_usable_size.
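For the Linux side mentioned above, here is a rough sketch of the same idea (not the gist's code; the counter and structure are illustrative) using glibc's malloc_usable_size() to know how much to subtract when memory is freed. Real code would also need a re-entrancy guard, since dlsym() itself may allocate.
#define _GNU_SOURCE
#include <dlfcn.h>
#include <malloc.h>    /* malloc_usable_size */
#include <stdlib.h>

static size_t live_bytes = 0;   /* not thread-safe; illustration only */

void *malloc(size_t size)
{
    void *(*real_malloc)(size_t) = dlsym(RTLD_NEXT, "malloc");
    void *ptr = real_malloc(size);
    if (ptr)
        live_bytes += malloc_usable_size(ptr);
    return ptr;
}

void free(void *ptr)
{
    void (*real_free)(void *) = dlsym(RTLD_NEXT, "free");
    if (ptr)
        live_bytes -= malloc_usable_size(ptr);
    real_free(ptr);
}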
Example
#define MALLOC_DEBUG_OUTPUT
#include "malloc_override_osx.hpp"
int main(){
    int* ip = (int*)malloc(sizeof(int));
    double* dp = (double*)malloc(sizeof(double));
    free(ip);
    free(dp);
}
Building
$ clang++ -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk \
-pipe -stdlib=libc++ -std=gnu++11 -g -o test test.cpp
$ ./test
0x7fa28a403230 -> malloc(16) -> 16
0x7fa28a403240 -> malloc(16) -> 32
0x7fa28a403230 -> free(16) -> 16
0x7fa28a403240 -> free(16) -> 0
Hope this helps someone else out in the future!
If the basic stats you need can be collected from a simple wrapper, a quick (and kind of dirty) trick is just using some #define macro replacement.
#include <stdlib.h>   /* size_t and the real malloc */

void* _mymalloc(size_t size)
{
    void* ptr = malloc(size);
    /* do your stat work? */
    return ptr;
}
and
#define malloc(sz_) _mymalloc(sz_)
Note: if the macro is defined before the _mymalloc definition, it will end up replacing the malloc call inside that function, leaving you with infinite recursion... so ensure this isn't the case. You might want to explicitly #undef it before that function definition and simply (re)define it afterward, depending on where you end up including it, to hopefully avoid this situation.
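A sketch of one file layout that avoids the recursion (the file split and names are just for illustration):
/* wrapper source file: compiled without the macro active */
#include <stdlib.h>

#undef malloc                    /* in case the macro leaked in via a header */
void* _mymalloc(size_t size)
{
    void* ptr = malloc(size);    /* the real malloc */
    /* do your stat work? */
    return ptr;
}

/* project-wide header: the rest of the project sees only the macro */
void* _mymalloc(size_t size);
#define malloc(sz_) _mymalloc(sz_)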
I think that if you define malloc() and free() in your own .c file included in the project, the linker will resolve that version.
Now then, how do you intend to implement malloc?
Check out Emery Berger's -- the author of the Hoard memory allocator's -- approach for replacing the allocator on OSX at https://github.com/emeryberger/Heap-Layers/blob/master/wrappers/macwrapper.cpp (and a few other files you can trace yourself by following the includes).
This is complementary to Alex's answer, but I thought this example was more to-the-point of replacing the system provided allocator.
