It has been a major obstacle for decade. It was reported as impossible. Forum talks referred to issues related to setting and restoring GS. Wine HQ FAQ still refers to ABI incompatibility page which is not a live wiki page, but news archive link.
Wine 2.0 announced macOS 64-bit support. But… how? Isn't it something that all macOS hackers should know? Maybe some elegant (or dirty) trick interesting on its own for any x86-64 hacker.
The primary obstacle is a conflict over the GS segment base address (GS.base) maintained by the CPU under the control of the OS.
On 64-bit Windows, GS.base is used to hold the address of the Thread Environment Block (TEB) structure for each thread. Windows apps expect to access the TEB using %gs-relative addresses. This is hard-coded into the app code rather than being behind an API function.
On macOS, GS.base is used to hold the base of the thread-local storage area of the thread's struct _pthread, an internal implementation detail of the Pthreads implementation. It's less common for Mac apps to have hard-coded %gs-relative accesses baked into them, but some do and so do the system libraries.
On Linux, GS.base is available for 64-bit apps to use for their own purpose. So, there, Wine simply sets it using the OS-provided mechanism. Wine can't do that on macOS. Not only does the OS not provide any mechanism to do it but, if Wine could, it would break the system libaries. (It would also pose potential problems for the kernel on context switches and/or the kernel might fail to restore any value Wine might have set.)
The solution we figured out is only a partial solution. The most commonly accessed fields of the TEB structure are the "self" field (%gs:0x30) and a field for the thread-local storage implementation (%gs:0x58). Often, if apps need to access other fields, they first read the self field and then reference off of that.
On macOS, %gs:0x30 and %gs:0x58 correspond to particular slots of the thread-local storage area. They are in a part that's reserved to Apple (rather than, say, application uses). We found that one of those slots was unused. The other was used for the ttyname() function in the C library. As it happens, Wine never calls that function and there's little reason to expect any of the system libraries that it uses to do so, either.
So, Wine simply pokes the appropriate values at those %gs-relative locations. Therefore, when 64-bit Windows app code reads them, it gets what it needs. The actual TEB that Wine has allocated is located elsewhere (in heap-allocated memory), but apps find the address of the TEB in the place they expect to be the TEB self field, so they find it that way.
Apple has since graciously permanently reserved both of those slots for uses like Wine's. ttyname() now uses a different slot.
That said, as mentioned above, this solution is only partial. Some apps access other fields of the TEB directly using %gs-relative addresses at offsets other than 0x30 or 0x58. When they do so, they get junk values and/or overwrite values used by other parts of the system. So, Wine's support for 64-bit Windows apps is not complete on macOS. Some such apps will crash or otherwise misbehave. Luckily, it happens infrequently enough that it's not much of a problem in practice.
For reference, here are the commits that implement this solution:
http://source.winehq.org/git/wine.git/?a=commit;h=7501942008f91a9a137fe598ce5ce7cb47de5522
http://source.winehq.org/git/wine.git/?a=commit;h=3d8efb238808a519902e047d8673237debb0f0a2
Related
If this question deals with too basic a matter, please forgive me.
As a somewhat-close-to-beginner-level programmer, I really wonder about this--whether the underlying code of every win API function is compiled altogether at the time of writing an app, or whether the machine code for executing win APIs stays in the memory as part of the OS since the pc is booted up, and only the app uses them?
All the APIs for an OS are used by many apps by means of function call. So I thought that rather than making every individual app include the API machine code on their own, apps just contain the header or signature to call the APIs and the API machine code addresses are mapped when launching the app.
I am sorry that I failed to make this question succinct due to my poor English. I really would like to get your insights. Thank you.
The implementation for (most) API calls is provided by the system by way of compiled modules (Portable Executable images). Application code only contains enough information so that the system can identify and load the required modules, and resolve the respective imports.
As an example consider the following code that shows a message box, waits for it to close, and then exits the program:
#include <Windows.h>
int main()
{
::MessageBoxW(nullptr, L"Foo", L"Bar", MB_OK);
}
Given the function signature (declared in WinUser.h, which gets pulled in from Windows.h) the compiler can almost generate a call instruction. It knows the number of arguments, their expected types, and the order and location the callee expects them in. What's missing is the actual target address inside user32.dll, that's only known after a process was fully initialized, and had the user32.dll module mapped into its address space.
Clearly, the compiler cannot postpone code generation until after load time. It needs to generate a call instruction now. Since we know that "all problems in computer science can be solved by another level of indirection" that's what the compiler does, too: Instead of emitting a direct call instruction it generates an indirect call. The difference is that, while a direct call immediately needs to provide the target address, an indirect call can specify the address at which the target address is stored.
In x86 assembly, instead of having to say
call _MessageBoxW#16 ; uh-oh, not yet known
the compiler can conveniently delegate the call to the Import Address Table (IAT):
call dword ptr [__imp__MessageBoxW#16]
Disaster averted, we've bought us just enough time to fix things up before the code actually executes.
Once a process object is created the system hands over control to its primary thread to finish initialization. Part of that initialization is loading dependencies (such as user32.dll here). Once that has completed, the system finally knows the load address (and ultimately the address of imported symbols, such as _MessageBoxW#16), and can overwrite the IAT entry at address __imp__MessageBoxW#16 with the imported function address.
And that is approximately how the system provides implementations for system services without requiring client applications to know where (physically) they will find them.
I'm saying "approximately" because things are somewhat more involved in reality. If that is something you'll want to learn about, I'll leave it up to Raymond Chen. He has published a series of blog entries covering this topic in far more detail:
How were DLL functions exported in 16-bit Windows?
How were DLL functions imported in 16-bit Windows?
How are DLL functions exported in 32-bit Windows?
Exported functions that are really forwarders
Rethinking the way DLL exports are resolved for 32-bit Windows
Calling an imported function, the naive way
How a less naive compiler calls an imported function
Issues related to forcing a stub to be created for an imported function
What happens when you get dllimport wrong?
Names in the import library are decorated for a reason
Why can't I GetProcAddress a function I dllexport'ed?
I have a special disk image format (compressed and with extra information such as bad blocks) and would like to make that available as a block device (e.g. under /dev/rdisk) in macOS.
I am looking for pointers on how to write a read-only block level userspace driver.
I believe I need to do this with IOKit, but the docs are quite sparse in that area, and provide no sample code for this as far as I can tell.
I've had a look at Amit Singh's Mac OS Internals, but that only goes about filerting existing blocks that get routed through the added driver. But I need to read the data from a separate file, i.e. I need to use the file system, and I also want to do this in a userspace app if possible because making a kext is both hard to debug and is getting deprecated now, too.
Ideally, this should work on current macOS versions, including 10.15, but I'd also be happy with a solution that only works on older macOS versions, starting at 10.6.
Maybe I am still misunderstanding some things here. I was under the impression that it was possible even before 10.15 to write userspace IOKit drivers. But maybe that's not possible with Block Storage devices? Any clarifications would be welcome. I feel quite lost.
Update June 22, 2020
I also asked the question on the OSXFUSE support forum, and got a negative reply there.
Maybe I am still misunderstanding some things here. I was under the impression that it was possible even before 10.15 to write userspace IOKit drivers.
This is a true statement, but it does not mean it's possible to write all types of drivers in userspace, even with 10.15. The confusion likely stems from the fact that "IOKit" is used to mean subtly different things in different contexts:
Fundamentally, it is the OOP-based driver stack technology which is built into the XNU kernel (macOS, iOS, etc.), written in C++. Its core data structure is the I/O Kit registry, which is a directed graph of objects representing devices, device (sub-)functions, virtual device, service, drivers, user space client connections, etc. (most of it has properties of it, but nodes can technically have multiple "parent" nodes, and some actually do so on your typical Mac system)
The IOKit user space libraries. (a.k.a. IOKit.framework a.k.a. IOKitLib + various IOCFPlugins) These provide a view of the kernel's I/O Registry to user space programs, and allow them to perform certain interactions with IOKit objects. This is essentially limited to:
Iterating over the registry graph and matching services based on matching rules. Both return handles to specific I/O Registry entries to the user space program.
Getting properties on I/O Registry entries, and if the object explicitly supports it, setting (some) properties.
Creating a user client connection. This is the only way to create an entry in the I/O Registry from regular user space, and these are always IOUserClient subclasses, and only if the parent object specfically permits it.
The IOKit view available to DriverKit based System Extension processes (new in 10.15), which allows instantiation of entirely user space backed registry entry objects, for subclassing a (bounded) selection of super class types.
To answer the question of your specific case:
To implement a driver for a block storage device which will appear to the system as any other, you need to subclass IOStorage or one of its subclasses (typically IOBlockStorageDevice).
IOKit.framework allows you to create user space drivers which will only be used by that user space process or other user space processes. It is therefore not possible to use it to create drivers that will be used by some part of the kernel. For example, it is not possible to create a block device driver which will be used by the kernel's VFS system.
DriverKit currently (as of macOS 10.15.4 SDK) does not allow dext-backed subclasses of any of the storage stack's classes. So a dext will not solve the problem.
The only way to do what you're trying to do, as far as I'm aware, is to have a 2-part driver: a kext implementing an IOBlockStorageDevice subclass, as well as a service which offers a user client interface for some user space process to connect to for creating instances of block devices, and which will handle the I/O requests for the device. This is a sort of analog to the FUSE mechanism which works at the VFS layer instead of the block layer.This is exactly how macOS's built-in disk image mounting system works. Unfortunately, neither the kext nor user space parts of this are open source, documented, or a public API. So unless you reverse engineer it and find a boundary which doesn't enforce the requirement of an Apple code signature, you basically have to reimplement this.
I'm not currently aware of an existing open source system that does this, but I haven't looked very hard. Especially for read-only access, it shouldn't be terribly hard to do if you know what you're doing, but it comes with the usual downsides of developing and distributing kexts.
Note that you could also use FUSE to open your disk image if the block device it represents hosts a file system you wish to mount, and a FUSE implementation for that file system exists.
I know about Cygwin, and I know of its shortcomings. I also know about the slowness of fork, but not why on Earth it's not possible to work around that. I also know Cygwin requires a DLL. I also understand POSIX defines a whole environment (shell, etc...), that's not really what I care about here.
My question is asking if there is another way to tackle the problem. I see more and more of POSIX functionality being implemented by the MinGW projects, but there's no complete solution providing a full-blown (comparable to Linux/Mac/BSD implementation status) POSIX functionality.
The question really boils down to:
Can the Win32 API (as of MSVC20??) be efficiently used to provide a complete POSIX layer over the Windows API?
Perhaps this will turn out to be a full libc that only taps into the OS library for low-level things like filesystem access, threads, and process control. But I don't know exactly what else POSIX consists of. I doubt a library can turn Win32 into a POSIX compliant entiity.
POSIX <> Win32.
If you're trying to write apps that target POSIX, why are you not using some variant of *N*X? If you prefer to run Windows, you can run Linux/BSD/whatever inside Hyper-V/VMWare/Parallels/VirtualBox on your PC/laptop/etc.
Windows used to have a POSIX compliant environment that ran alongside the Win32 subsystem, but was discontinued after NT4 due to lack of demand. Microsoft bought Interix and released Services For Unix (SFU). While it's still available for download, SFU 3.5 is now deprecated and no longer developed or supported.
As to why fork is so slow, you need to understand that fork isn't just "Create a new process", it's "create a new process (itself an expensive operation) which is a duplicate of the calling process along with all memory".
In *N*X, the forked process is mapped to the same memory pages as the parent (i.e. is pretty quick) and is only given new pages as and when the forked process tried to modify any shared pages. This is known as copy on write. This is largely achievable because in UNIX, there is no hard barrier between the parent and forked processes.
In NT, on the other hand, all processes are separated by a barrier enforced by CPU hardware. In NT, the easiest way to spawn a parallel activity which has access to your process' memory and resources, is to create a thread. Threads run within the memory space of the creating process and have access to all of the process' memory and resources.
You can also share data between processes via various forms of IPC, RPC, Named Pipes, mailslots, memory-mapped files but each technique has its own complexities, performance characteristics, etc. Read this for more details.
Because it tries to mimic UNIX, CygWin's 'fork' operation creates a new child process (in its own isolated memory space) and has to duplicate every page of memory in the parent process within the newly forked child. This can be a very costly operation.
Again, if you want to write POSIX code, do so in *N*X, not NT.
How about this
Most of the Unix API is implemented by the POSIX.DLL dynamically loaded (shared) library. Programs linked with POSIX.DLL run under the Win32 subsystem instead of the POSIX subsystem, so programs can freely intermix Unix and Win32 library calls.
From http://en.wikipedia.org/wiki/UWIN
The UWIN environment may be what you're looking for, but note that it is hosted at research.att.com, while UWIN is distributed under a liberal license it is not the GNU license. Also, as it is research for att, and only 2ndarily something that they are distributing for use, there are a lot of issues with documentation.
See more info see my write-up as the last answer for Regarding 'for' loop in KornShell
Hmm main UWIN link is bad link in that post, try
http://www2.research.att.com/sw/download/
Also, You can look at
https://mailman.research.att.com/pipermail/uwin-users/
OR
https://mailman.research.att.com/pipermail/uwin-developers/
To get a sense of the features vs issues.
I hope this helps.
The question really boils down to: Can the Win32 API (as of MSVC20??)
be efficiently used to provide a complete POSIX layer over the Windows
API?
Short answer: No.
"Complete POSIX" means fork(), mmap(), signal() and such, and these are [almost] impossible to implement on NT.
To drive the point home: GNU Hurd has problems with fork() as well, because Hurd kernel is not POSIX.
NT is not POSIX too.
Another difference is persisence:
In POSIX-compliant systems it is possible to create system objects and leave them there. Examples of such objects are named pipes and shared memory objects (shms). You can create a named pipe or a shm, and leave it in the filesystem (or in a special filesystem-like place) where other processes will be able to access it. The downside is that a process might die and fail to clean up after itself, leaving unused objects behind (you know about zombie processes? same thing).
In NT every object is reference-counted, and is destroyed as soon as its last handle is closed. Files are among the few objects that persist.
Symlinks are a filesystem feature, and don't exactly depend on NT kernel, but current implementation (in Vista and later) is incapable of creating object-type-agnostic symlinks. That is, a symlink is either a file or a directory, and must link to either a file or a directory. If the target has wrong type, the symlink won't work. You can give it the right type if the target exists when you create the symlink, but POSIX requires that symlinks may be created without their target existing. I can't imagine a use-case for a symlink that points first to a file, then to a directory, but POSIX says that this should work, and if it doesn't, you're not completely POSIX-compliant. Or if your symlinking API/utility can be given an option that specifies the right type, when target doesn't exist, that also breaks POSIX compatibility.
It is possible to replicate some POSIX features to some degree (such as "integer descriptors from in a single namespace, referencing any I/O object, and being select()able" without sacrificing [much] performance, but that is still a major undertaking, and POSIX interface is really restrictive (that is, if you could just add one more argument to that function, it would have been possible to Do The Right Thing...but you couldn't, unless you want to throw POSIX compliance away).
Your best bet is to not to rely on POSIX features that are difficult to port to non-POSIX systems, or abstract in such a way that lower levels may have separate implementations for different OSes, and upper levels do not care about the details.
I was watching the WWDC 2009 Keynote and something someone said about Windows 7/Vista got me curious..
The speaker claimed that 7 was still a poor operating system because it still used the same technologies such as DLLs and the registry. How accurate are his claims and how different is OS X doing it? Even os x has dynamically loaded libraries right? I guess the Registry thing might have some weight..
Can anyone explain to me the differences in each OS' strategy?
I'm not trying to incite fanboys here or anything, I just want to know how both operating systems tackle problems in general..
Thanks,
kreb
Of course both operating systems have facilities for using DLLs (they're called dylibs or Frameworks on OS X depending on how they're packaged).dylibs are very much like DLLs--they are a dynamically linked library and as such there may be multiple versions of them floating around. Frameworks, on the other hand, are really a directory structure. They contain dynamically linked libraries (potentially multiple versions of them), resources, headers, documentation, etc. The dynamic linker on OS X automatically handles choosing the correct library version from the framework for each executable. The system appears to work better than Windows' DLL management which is, well, quite a mess still (of course, Windows' system is tied by legacy issues that Apple dropped when they moved to OS X). To be fair, Unix has had a solution to this problem for a long time, as well using symbolic links to link dylibs to their correct versioned implementation, allowing multiple installed versions.
There is no OS X equivalent of the Windows registry. This is good and bad. The good side is that it's much harder to corrupt an entire OS X system with a registry screw up. OS X instead stores configuration in many separate files, usually one or more per application, user, whatever. These files are generally a plist (an XML schema representing dictionaries, arrays, and primitive types) formatted file. The bad side is that, by retaining this Unix-y heritage, OS X doesn't have the same über-admin tools that can churn through the registry and do all sorts of crazy things.
DLLs
The major difference between OS X and Windows is that Windows historically tried to save space/memory by having everyone share code (i.e. you install one DLL, everyone can use it). Apple statically compiles (well, not really, but it may as well be) all of the non-system libraries into every application. Wastes disk space/memory, but makes app deployment way easier and no versioning issues.
Registry
OS X does have a registry, they're just flat files called plists, instead of a magic component that's mostly like a filesystem except where it's not. Apple's approach makes it easy to migrate settings from one machine to another, whereas Windows' approach is faster in-memory, and allows apps to easily "watch" a key without taking a big perf hit (i.e. one app changes a key and the other instantly knows about it).
In conclusion
The keynote presenter's full of it, 10.6 is mostly the same code as 10.5, which was mostly the same code as 10.4 et al, just like Win7 is mostly Vista, which is mostly Server '03, etc. There's far too much tested code in an operating system to throw it away every release, especially if you actually want your customers' apps to work.
DLL's are bad variations of libaries since they are unable to operate on their own, to use them a further wrapper executable is called(automatically), which adds unrequired overhead and makes it much harder to tell which libraries are actually in use. Another less important flaw, is the inability for systems to truely share a library.
*nix systems avoid this by having libraries exist on the top level running on their own or under a larger wrapper(like kde-init ), the libraries may be shared by any applications, meaning only a single copy of each library is required, and you may at any time kill a single library with ease as required.
The registry is a great idea, except for the fact that it is used for so much, almost anything you install will use the registry, and a corrupt registry and render your operating system almost completely useless until it's fixed.
This is avoided in *nix systems by having multiple different files for different content, drivers are refered to via Xorg's config file, installed applications will be written to their own database, and keys or identification will often be written into a directory, rather than a single all purpose file. This reduces the likely hood of a serious failure and means that at any time you can probably still repair the system. If Xorg becomes corrupt you just reconfigure it, if the installed applications database becomes corrupt you can repair or rebuild it, and should an applications individual settings directory become corrupt you need only reinstall one application(and most good commercial aps should have a way to repair this anyway)
I was curious as to how does one go about finding undocumented APIs in Windows.
I know the risks involved in using them but this question is focused towards finding them and not whether to use them or not.
Use a tool to dump the export table from a shared library (for example, a .dll such as kernel32.dll). You'll see the named entry points and/or the ordinal entry points. Generally for windows the named entry points are unmangled (extern "C"). You will most likely need to do some peeking at the assembly code and derive the parameters (types, number, order, calling convention, etc) from the stack frame (if there is one) and register usage. If there is no stack frame it is a bit more difficult, but still doable. See the following links for references:
http://www.sf.org.cn/symbian/Tools/symbian_18245.html
http://msdn.microsoft.com/en-us/library/31d242h4.aspx
Check out tools such as dumpbin for investigating export sections.
There are also sites and books out there that try to keep an updated list of undocumented windows APIs:
The Undocumented Functions
A Primer of the Windows Architecture
How To Find Undocumented Constants Used by Windows API Functions
Undocumented Windows
Windows API
Edit:
These same principles work on a multitude of operating systems however, you will need to replace the tool you're using to dump the export table. For example, on Linux you could use nm to dump an object file and list its exports section (among other things). You could also use gdb to set breakpoints and step through the assembly code of an entry point to determine what the arguments should be.
IDA Pro is your best bet here, but please please double please don't actually use them for anything ever.
They're internal because they change; they can (and do) even change as a result of a Hotfix, so you're not even guaranteed your undocumented API will work for the specific OS version and Service Pack level you wrote it for. If you ship a product like that, you're living on borrowed time.
Everybody here so far is missing some substantial functionality that comprises hugely un-documented portions of the Windows OS RPC . RPC (think rpcrt4.dll, lsass.exe, csrss.exe, etc...) operations occur very frequently across all subsystems, via LPC ports or other interfaces, their functionality is buried in the mysticism incantations of various type/sub-type/struct-typedef's etc... which are substantially more difficult to debug, due to the asynchronous nature or the fact that they are destine for process's which if you were to debug via single stepping or what have you, you would find the entire system lockup due to blocking keyboard or other I/O from being passed ;)
ReactOS is probably the most expedient way to investigate undocumented API. They have a fairly mature kernel and other executive's built up. IDA is fairly time-intensive and it's unlikely you will find anything the ReactOS people have not already.
Here's a blurb from the linked page;
ReactOS® is a free, modern operating
system based on the design of Windows®
XP/2003. Written completely from
scratch, it aims to follow the
Windows® architecture designed by
Microsoft from the hardware level
right through to the application
level. This is not a Linux based
system, and shares none of the unix
architecture.
The main goal of the
ReactOS project is to provide an
operating system which is binary
compatible with Windows. This will
allow your Windows applications and
drivers to run as they would on your
Windows system. Additionally, the look
and feel of the Windows operating
system is used, such that people
accustomed to the familiar user
interface of Windows® would find using
ReactOS straightforward. The ultimate
goal of ReactOS is to allow you to
remove Windows® and install ReactOS
without the end user noticing the
change.
When I am investigating some rarely seen Windows construct, ReactOS is often the only credible reference.
Look at the system dlls and what functions they export. Every API function, whether documented or not, is exported in one of them (user, kernel, ...).
For user mode APIs you can open Kernel32.dll User32.dll Gdi32.dll, specially ntdll.dll in dependancy walker and find all the exported APIs. But you will not have the documentation offcourse.
Just found a good article on Native APIS by Mark Russinovich