CGShadingGetBounds() signature? - macos

I am trying to work out the signature of the function CGShadingGetBounds().
I tried CG_EXTERN CGRect CGShadingGetBounds(CGShadingRef); but that does not seem to be the case.
Can someone help me figure out the signature?
Below is the disassembly.
__text:000000000016BB76 public _CGShadingGetBounds
__text:000000000016BB76 _CGShadingGetBounds proc near ; CODE XREF: _log_LogShading+1B8p
__text:000000000016BB76 ; _dlr_DrawShading+1FEp ...
__text:000000000016BB76 push rbp
__text:000000000016BB77 mov rbp, rsp
__text:000000000016BB7A mov rax, rdi
__text:000000000016BB7D cmp byte ptr [rsi+28h], 0
__text:000000000016BB81 jz short loc_16BBAC
__text:000000000016BB83 movsd xmm0, qword ptr [rsi+30h]
__text:000000000016BB88 movsd qword ptr [rdi], xmm0
__text:000000000016BB8C movsd xmm0, qword ptr [rsi+38h]
__text:000000000016BB91 movsd qword ptr [rdi+8], xmm0
__text:000000000016BB96 movsd xmm0, qword ptr [rsi+40h]
__text:000000000016BB9B movsd qword ptr [rdi+10h], xmm0
__text:000000000016BBA0 movsd xmm0, qword ptr [rsi+48h]
__text:000000000016BBA5
__text:000000000016BBA5 loc_16BBA5: ; CODE XREF: _CGShadingGetBounds+5Ej
__text:000000000016BBA5 movsd qword ptr [rdi+18h], xmm0
__text:000000000016BBAA pop rbp
__text:000000000016BBAB retn
__text:000000000016BBAC ; ---------------------------------------------------------------------------
__text:000000000016BBAC
__text:000000000016BBAC loc_16BBAC: ; CODE XREF: _CGShadingGetBounds+Bj
__text:000000000016BBAC lea rcx, _CGRectInfinite
__text:000000000016BBB3 movsd xmm0, qword ptr [rcx]
__text:000000000016BBB7 movsd xmm1, qword ptr [rcx+8]
__text:000000000016BBBC movsd qword ptr [rdi], xmm0
__text:000000000016BBC0 movsd qword ptr [rdi+8], xmm1
__text:000000000016BBC5 movsd xmm0, qword ptr [rcx+10h]
__text:000000000016BBCA movsd qword ptr [rdi+10h], xmm0
__text:000000000016BBCF movsd xmm0, qword ptr [rcx+18h]
__text:000000000016BBD4 jmp short loc_16BBA5
__text:000000000016BBD4 _CGShadingGetBounds endp
My aim is to identify the bounds in which shading is going to happen.

I believe the signature you mentioned
CG_EXTERN CGRect CGShadingGetBounds(CGShadingRef);
is correct. For example, if you try to reconstruct such a function with a custom object, like this:
typedef struct
{
long a1, a2, a3, a4, a5;
char b6;
CGRect r;
} MyObj;
CGRect ReconstructFunc(MyObj *o)
{
if (o->b6) return o->r;
return CGRectNull;
}
Of course, this does something slightly different (it falls back to CGRectNull, whereas the original falls back to CGRectInfinite), but the "quick" path (where b6 is non-zero) is very similar to the original function, in both assembly and behaviour:
pushq %rbp
movq %rsp, %rbp
movq %rdi, %rax
cmpb $0, 40(%rsi)
je LBB0_2
movq 72(%rsi), %rcx
movq %rcx, 24(%rax)
movq 64(%rsi), %rcx
movq %rcx, 16(%rax)
movq 48(%rsi), %rcx
movq 56(%rsi), %rdx
movq %rdx, 8(%rax)
movq %rcx, (%rax)
popq %rbp
ret
... (continues)
This is basically the same as the assembly you posted. It also reveals the convention that GCC on the Mac uses when compiling functions that return a CGRect struct. According to the x86-64 ABI, arguments are passed in these registers: RDI, RSI, RDX (and more). If you look at the first two, RDI and RSI, they clearly contain arguments: the first is a pointer to the output struct (CGRect), the second is the opaque object (CGShadingRef).
Thus I believe that GCC on Mac translates this:
CGRect myrect = MyFuncReturningRect(param);
into this:
CGRect myrect;
MyFuncReturningRect(&myrect, param);
Anyway, to sum it all up, I strongly believe your guessed signature is correct. If the function doesn't return the values you expect, that is caused by other factors (probably the byte at [rsi+28h], which must be non-zero for the function to return anything other than the CGRectInfinite fallback).
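To make the hidden-pointer convention concrete, here is a minimal sketch. The names Rect, FakeShading, hasBounds and kPlaceholderInfinite are invented for illustration; the real CGShading layout is private, and only the offsets 0x28 and 0x30 are taken from the disassembly in the question.

```cpp
#include <cstddef>

// Stand-in for CGRect on 64-bit: four doubles, 32 bytes.
struct Rect { double x, y, width, height; };

// Hypothetical layout that mirrors the offsets seen in the disassembly.
struct FakeShading {
    char          header[0x28]; // opaque bytes before the flag
    unsigned char hasBounds;    // byte tested by: cmp byte ptr [rsi+28h], 0
    char          pad[7];       // explicit padding so bounds lands at +0x30
    Rect          bounds;       // copied from [rsi+30h] .. [rsi+48h]
};

static_assert(offsetof(FakeShading, hasBounds) == 0x28, "flag at +0x28");
static_assert(offsetof(FakeShading, bounds) == 0x30, "rect at +0x30");

// Placeholder for CGRectInfinite; the real constant's values are not used here.
static const Rect kPlaceholderInfinite = { -1.0, -1.0, 2.0, 2.0 };

// What the programmer writes: a function returning a 32-byte struct by value.
Rect getBounds(const FakeShading* s) {
    return s->hasBounds ? s->bounds : kPlaceholderInfinite;
}

// Roughly what the calling convention turns it into: the caller passes a
// hidden pointer to the result (RDI), the visible argument moves to RSI,
// and the same pointer is handed back in RAX.
Rect* getBoundsLowered(Rect* out, const FakeShading* s) {
    *out = getBounds(s);
    return out;
}
```

Calling getBounds(shading) and calling getBoundsLowered(&rect, shading) fill in the same rectangle; the second form is simply the shape that shows up in the disassembly.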

Related

Why does LLVM appear to ignore Rust's assume intrinsic?

LLVM appears to ignore core::intrinsics::assume(..) calls. They do end up in the LLVM IR, but don't change the resulting machine code. For example, take the following (nonsensical) code:
pub fn one(xs: &mut Vec<i32>) {
if let Some(x) = xs.pop() {
xs.push(x);
}
}
This compiles to a whole lot of assembly:
example::one:
push rbp
push r15
push r14
push r12
push rbx
mov rbx, qword ptr [rdi + 16]
test rbx, rbx
je .LBB0_9
mov r14, rdi
lea rsi, [rbx - 1]
mov qword ptr [rdi + 16], rsi
mov rdi, qword ptr [rdi]
mov ebp, dword ptr [rdi + 4*rbx - 4]
cmp rsi, qword ptr [r14 + 8]
jne .LBB0_8
lea rax, [rsi + rsi]
cmp rax, rbx
cmova rbx, rax
mov ecx, 4
xor r15d, r15d
mov rax, rbx
mul rcx
mov r12, rax
setno al
jo .LBB0_11
mov r15b, al
shl r15, 2
test rsi, rsi
je .LBB0_4
shl rsi, 2
mov edx, 4
mov rcx, r12
call qword ptr [rip + __rust_realloc@GOTPCREL]
mov rdi, rax
test rax, rax
je .LBB0_10
.LBB0_7:
mov qword ptr [r14], rdi
mov qword ptr [r14 + 8], rbx
mov rsi, qword ptr [r14 + 16]
.LBB0_8:
or ebp, 1
mov dword ptr [rdi + 4*rsi], ebp
add qword ptr [r14 + 16], 1
.LBB0_9:
pop rbx
pop r12
pop r14
pop r15
pop rbp
ret
.LBB0_4:
mov rdi, r12
mov rsi, r15
call qword ptr [rip + __rust_alloc@GOTPCREL]
mov rdi, rax
test rax, rax
jne .LBB0_7
.LBB0_10:
mov rdi, r12
mov rsi, r15
call qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL]
ud2
.LBB0_11:
call qword ptr [rip + alloc::raw_vec::capacity_overflow@GOTPCREL]
ud2
Now we could introduce the assumption that xs is not full (at capacity) after
the pop() (this is nightly only):
#![feature(core_intrinsics)]
pub fn one(xs: &mut Vec<i32>) {
if let Some(x) = xs.pop() {
unsafe {
core::intrinsics::assume(xs.len() < xs.capacity());
}
xs.push(x);
}
}
Yet despite the assume showing up in the LLVM IR, the assembly is
unchanged. If, however, we use core::hint::unreachable_unchecked() to create
a diverging path in the non-assumed case, such as:
pub fn one(xs: &mut Vec<i32>) {
if let Some(x) = xs.pop() {
if xs.len() >= xs.capacity() {
unsafe { core::hint::unreachable_unchecked() }
}
xs.push(x);
}
}
We get the following:
example::one:
mov rax, qword ptr [rdi + 16]
test rax, rax
je .LBB0_2
mov qword ptr [rdi + 16], rax
.LBB0_2:
ret
Which is essentially a no-op, but not too bad. Of course, we could have left the value in place by using:
pub fn one(xs: &mut Vec<i32>) {
xs.last_mut().map(|_e| ());
}
Which compiles down to what we'd expect:
example::one:
ret
Why does LLVM appear to ignore the assume intrinsic?
This now compiles to just a ret on recent versions of rustc, thanks to improvements in rustc and LLVM. The intrinsic appeared to be ignored because LLVM was not yet able to exploit the assumption in this code; newer versions can.

Import IORegistryEntrySearchCFProperty from Macapi.IOKit in Delphi for OSX64

I used this import definition in OSX32 successfully:
uses
MacApi.ObjectiveC,
MacApi.Foundation,
Macapi.CoreFoundation,
Macapi.Mach,
Macapi.IOKit;
type
io_iterator_t = io_object_t;
io_name_t = array[0..127] of AnsiChar;
function IORegistryEntrySearchCFProperty(entry: io_registry_entry_t;
plane: io_name_t; key: CFStringRef; allocator: CFAllocatorRef;
options: IOOptionBits): CFTypeRef; cdecl;
external libIOKit name _PU + 'IORegistryEntrySearchCFProperty';
function IOServiceGetMatchingServices(masterPort: mach_port_t;
matching: CFDictionaryRef; var existing: io_iterator_t): kern_return_t; cdecl;
external libIOKit name _PU + 'IOServiceGetMatchingServices';
function IOIteratorNext(name: io_iterator_t): io_object_t; cdecl;
external libIOKit name _PU + 'IOIteratorNext';
const
kIOSerialBSDServiceValue = 'IOSerialBSDClient';
kIOSerialBSDTypeKey = 'IOSerialBSDClientType';
kIOSerialBSDModemType = 'IOModemSerialStream';
kIOUSBDeviceClassName = 'IOUSBDevice';
kIOCalloutDeviceKey = 'IOCalloutDevice';
kIOTTYDeviceKey = 'IOTTYDevice';
kIOServicePlane = 'IOService';
kUSBInterfaceNumber = 'bInterfaceNumber';
kUSBVendorID = 'idVendor';
kUSBProductID = 'idProduct';
kIORegistryIterateRecursively = $00000001;
kIORegistryIterateParents = $00000002;
Since the migration to OSX64 I get a read access violation within the function IORegistryEntrySearchCFProperty.
From my point of view, there is no change in the parameters of IORegistryEntrySearchCFProperty from 32bit to 64bit.
The API function is used to read out vendor ID and the product ID of a USB device:
function TSerialInterface.SearchUSBSerialDevice(const Service: string; VendorID, ProductID: cardinal): integer;
var
MatchingDictionary: CFMutableDictionaryRef;
Iter: io_iterator_t;
USBRef: io_service_t;
ret: kern_return_t;
ResAsCFString: CFTypeRef;
aBsdPath: PAnsiChar;
Bsd: array[0..1024] of AnsiChar;
sBsd: string;
VID, PID: Int32;
begin
result := 0;
MatchingDictionary := IOServiceMatching(kIOSerialBSDServiceValue);
ret := IOServiceGetMatchingServices(kIOMasterPortDefault,
CFDictionaryRef(MatchingDictionary), Iter);
if (ret = KERN_SUCCESS) and (Iter <> 0) then
begin
try
repeat
USBRef := IOIteratorNext(Iter);
if USBRef <> 0 then
begin
// USB device found
Bsd[0] := #0;
VID := 0;
PID := 0;
ResAsCFString := IORegistryEntryCreateCFProperty(USBRef,
CFSTR(kIOCalloutDeviceKey), kCFAllocatorDefault, 0);
if assigned(ResAsCFString) then
begin
aBsdPath := CFStringGetCStringPtr(ResAsCFString,
kCFStringEncodingASCII);
if assigned(aBsdPath) then
sBsd := string(aBsdPath)
else if CFStringGetCString(ResAsCFString, @Bsd[0], sizeof(Bsd),
kCFStringEncodingASCII) then
sBsd := string(Bsd)
else
sBsd := ''; // Invalid device path
end;
ResAsCFString := IORegistryEntrySearchCFProperty(USBRef,
kIOServicePlane, CFSTR(kUSBVendorID), kCFAllocatorDefault,
kIORegistryIterateRecursively + kIORegistryIterateParents);
if assigned(ResAsCFString) then
if not CFNumberGetValue(ResAsCFString, kCFNumberIntType, @VID) then
VID := 0;
ResAsCFString := IORegistryEntrySearchCFProperty(USBRef,
kIOServicePlane, CFSTR(kUSBProductID), kCFAllocatorDefault,
kIORegistryIterateRecursively + kIORegistryIterateParents);
if assigned(ResAsCFString) then
if not CFNumberGetValue(ResAsCFString, kCFNumberIntType, @PID) then
PID := 0;
Log.d(name + ': USBDevice "' + sBsd + '" VID/PID: ' + IntToHex(VID) + '/' + IntToHex(PID));
end;
until USBRef = 0;
finally
IOObjectRelease(Iter);
end;
end;
I have also implemented the above function in C in Xcode, and that code runs without any trouble.
At the exception (EAccessViolation: Access violation at address 00007FFF3A929716, accessing address 000000000000000000) I see the following call stack in the Delphi IDE:
*System._DbgExcNotify(int, void*, System.SmallString<(unsigned char)255>*, void*, void*)(0,0x00000001039035e0,0x00000001012fa22e,0x00007fff3a929716,0x0000000000000000)
System.NotifyReRaise(System.TObject*, void*)(0x00000001039035e0,0x00007fff3a929716)
System._RaiseAtExcept(System.TObject*, void*)(0x00000001039035e0,0x00007fff3a929716)
System.Internal.Excutils.SignalConverter(NativeUInt, NativeUInt, NativeUInt)(140734176073494,0,11)
:000000010003CDE0 System::Internal::Excutils::GetExceptionObject(NativeUInt, NativeUInt, unsigned long)
:00007FFF3D27B2A0 IORegistryEntrySearchCFProperty
Serialinterface.TSerialInterface.SearchComPortForUSB(System.UnicodeString, unsigned int, unsigned int)(0x000000020978e5e0,'USBser',2283,65287)*
For deeper analysis, I have added the generated assembler code of a small test procedure that calls IORegistryEntrySearchCFProperty.
Object Pascal in Unit3.pas:
procedure Test;
var
USBRef: io_service_t;
key: CFStringRef;
allocator: CFAllocatorRef;
ResAsCFString: CFTypeRef;
begin
USBRef := 0;
key := CFSTR(kUSBVendorID);
allocator := kCFAllocatorDefault;
ResAsCFString := IORegistryEntrySearchCFProperty(USBRef,
kIOServicePlane, key, allocator, 0);
end;
The Delphi debugger CPU view shows the disassembled x64 code:
Unit3.pas.38: USBRef := 0;
000000010079CC43 C745FC00000000 mov dword ptr [rbp - 0x4], 0x0
Unit3.pas.39: key := CFSTR(kUSBVendorID);
000000010079CC4A 488D355B881000 lea rsi, [rip + 0x10885b]; __unnamed_1 + 12
000000010079CC51 488D7DD8 lea rdi, [rbp - 0x28]
000000010079CC55 E8E69987FF call 0x100016640; System.UTF8Encode(System.UnicodeString) at System.pas:39603
000000010079CC5A EB00 jmp 0x10079cc5c; <+44> at Unit3.pas:39
000000010079CC5C 488B7DD8 mov rdi, qword ptr [rbp - 0x28]
000000010079CC60 E85B9B87FF call 0x1000167c0; System._LStrToPChar(System.AnsiStringT<(unsigned short)0>) at System.pas:28783
000000010079CC65 48898538FFFFFF mov qword ptr [rbp - 0xc8], rax
000000010079CC6C EB00 jmp 0x10079cc6e; <+62> at Unit3.pas:39
000000010079CC6E 488BBD38FFFFFF mov rdi, qword ptr [rbp - 0xc8]
000000010079CC75 E8767789FF call 0x1000343f0; Macapi::Corefoundation::__CFStringMakeConstantString(char*)
000000010079CC7A 48898530FFFFFF mov qword ptr [rbp - 0xd0], rax
000000010079CC81 EB00 jmp 0x10079cc83; <+83> at Unit3.pas:39
000000010079CC83 488B8530FFFFFF mov rax, qword ptr [rbp - 0xd0]
000000010079CC8A 488945F0 mov qword ptr [rbp - 0x10], rax
Unit3.pas.40: allocator := kCFAllocatorDefault;
000000010079CC8E E8AD7A89FF call 0x100034740; Macapi.Corefoundation.kCFAllocatorDefault() at CFBaseImpl.inc:65
000000010079CC93 48898528FFFFFF mov qword ptr [rbp - 0xd8], rax
000000010079CC9A EB00 jmp 0x10079cc9c; <+108> at Unit3.pas:40
000000010079CC9C 488B8528FFFFFF mov rax, qword ptr [rbp - 0xd8]
000000010079CCA3 488945E8 mov qword ptr [rbp - 0x18], rax
Unit3.pas.41: ResAsCFString := IORegistryEntrySearchCFProperty(USBRef,
000000010079CCA7 8B7DFC mov edi, dword ptr [rbp - 0x4]
000000010079CCAA 0F28057F881000 movaps xmm0, xmmword ptr [rip + 0x10887f]; __unnamed_2 + 112
000000010079CCB1 0F2945C0 movaps xmmword ptr [rbp - 0x40], xmm0
000000010079CCB5 0F280564881000 movaps xmm0, xmmword ptr [rip + 0x108864]; __unnamed_2 + 96
000000010079CCBC 0F2945B0 movaps xmmword ptr [rbp - 0x50], xmm0
000000010079CCC0 0F280549881000 movaps xmm0, xmmword ptr [rip + 0x108849]; __unnamed_2 + 80
000000010079CCC7 0F2945A0 movaps xmmword ptr [rbp - 0x60], xmm0
000000010079CCCB 0F28052E881000 movaps xmm0, xmmword ptr [rip + 0x10882e]; __unnamed_2 + 64
000000010079CCD2 0F294590 movaps xmmword ptr [rbp - 0x70], xmm0
000000010079CCD6 0F280513881000 movaps xmm0, xmmword ptr [rip + 0x108813]; __unnamed_2 + 48
000000010079CCDD 0F294580 movaps xmmword ptr [rbp - 0x80], xmm0
000000010079CCE1 0F2805F8871000 movaps xmm0, xmmword ptr [rip + 0x1087f8]; __unnamed_2 + 32
000000010079CCE8 0F298570FFFFFF movaps xmmword ptr [rbp - 0x90], xmm0
000000010079CCEF 0F2805DA871000 movaps xmm0, xmmword ptr [rip + 0x1087da]; __unnamed_2 + 16
000000010079CCF6 0F298560FFFFFF movaps xmmword ptr [rbp - 0xa0], xmm0
000000010079CCFD 0F2805BC871000 movaps xmm0, xmmword ptr [rip + 0x1087bc]; __unnamed_2
000000010079CD04 0F298550FFFFFF movaps xmmword ptr [rbp - 0xb0], xmm0
000000010079CD0B 488B75F0 mov rsi, qword ptr [rbp - 0x10]
000000010079CD0F 488B55E8 mov rdx, qword ptr [rbp - 0x18]
000000010079CD13 4889E1 mov rcx, rsp
000000010079CD16 0F2845C0 movaps xmm0, xmmword ptr [rbp - 0x40]
000000010079CD1A 0F114170 movups xmmword ptr [rcx + 0x70], xmm0
000000010079CD1E 0F2845B0 movaps xmm0, xmmword ptr [rbp - 0x50]
000000010079CD22 0F114160 movups xmmword ptr [rcx + 0x60], xmm0
000000010079CD26 0F2845A0 movaps xmm0, xmmword ptr [rbp - 0x60]
000000010079CD2A 0F114150 movups xmmword ptr [rcx + 0x50], xmm0
000000010079CD2E 0F284590 movaps xmm0, xmmword ptr [rbp - 0x70]
000000010079CD32 0F114140 movups xmmword ptr [rcx + 0x40], xmm0
000000010079CD36 0F288550FFFFFF movaps xmm0, xmmword ptr [rbp - 0xb0]
000000010079CD3D 0F288D60FFFFFF movaps xmm1, xmmword ptr [rbp - 0xa0]
000000010079CD44 0F289570FFFFFF movaps xmm2, xmmword ptr [rbp - 0x90]
000000010079CD4B 0F285D80 movaps xmm3, xmmword ptr [rbp - 0x80]
000000010079CD4F 0F115930 movups xmmword ptr [rcx + 0x30], xmm3
000000010079CD53 0F115120 movups xmmword ptr [rcx + 0x20], xmm2
000000010079CD57 0F114910 movups xmmword ptr [rcx + 0x10], xmm1
000000010079CD5B 0F1101 movups xmmword ptr [rcx], xmm0
000000010079CD5E 31C9 xor ecx, ecx
> Register content here
> RBP: 00007FFEEFBFEA80
> RSP: 00007FFEEFBFEA00
> Memory content here
> 00007FFEEFBFEA00 49 4F 53 65 72 76 69 63 IOServic
> 00007FFEEFBFEA08 65 00 00 00 00 00 00 00 e.......
000000010079CD60 E82BFEFFFF call 0x10079cb90; Macapi::Iokit2::IORegistryEntrySearchCFProperty(unsigned int, System::StaticArray<char, 128>, __CFString*, __CFAllocator*, unsigned int)
> Access violation at address 00007FFF2ADD716, accessing address 0
000000010079CD65 48898520FFFFFF mov qword ptr [rbp - 0xe0], rax
000000010079CD6C EB00 jmp 0x10079cd6e; <+318> at Unit3.pas:41
000000010079CD6E 488B8520FFFFFF mov rax, qword ptr [rbp - 0xe0]
000000010079CD75 488945E0 mov qword ptr [rbp - 0x20], rax
000000010079CD79 EB37 jmp 0x10079cdb2; <+386> at Unit3.pas:41
000000010079CD7B 89D1 mov ecx, edx
000000010079CD7D 48898540FFFFFF mov qword ptr [rbp - 0xc0], rax
000000010079CD84 898D48FFFFFF mov dword ptr [rbp - 0xb8], ecx
000000010079CD8A 488D7DD8 lea rdi, [rbp - 0x28]
000000010079CD8E E81D9F87FF call 0x100016cb0; System._LStrClr(void*) at System.pas:25402
000000010079CD93 8B8D48FFFFFF mov ecx, dword ptr [rbp - 0xb8]
000000010079CD99 488BBD40FFFFFF mov rdi, qword ptr [rbp - 0xc0]
000000010079CDA0 48898518FFFFFF mov qword ptr [rbp - 0xe8], rax
000000010079CDA7 898D14FFFFFF mov dword ptr [rbp - 0xec], ecx
000000010079CDAD E8863B0100 call 0x1007b0938; symbol stub for: _Unwind_Resume
000000010079CDB2 488D45D8 lea rax, [rbp - 0x28]
Unit3.pas.47: end;
000000010079CDB6 4889C7 mov rdi, rax
000000010079CDB9 E8F29E87FF call 0x100016cb0; System._LStrClr(void*) at System.pas:25402
000000010079CDBE 48898508FFFFFF mov qword ptr [rbp - 0xf8], rax
000000010079CDC5 4881C480010000 add rsp, 0x180
000000010079CDCC 5D pop rbp
000000010079CDCD C3 ret
Have I shown all the relevant registers? (Unfortunately, I am far from understanding the details of this x64 code.)
For comparison, here is the equivalent C code written in Xcode:
void Test(void)
{
io_service_t usbRef = 0;
CFStringRef key = CFSTR("idVendor");
CFAllocatorRef allocator = kCFAllocatorDefault;
CFTypeRef cf_vendor;
cf_vendor = IORegistryEntrySearchCFProperty(usbRef, kIOServicePlane, key, allocator, 0);
}
It generates this x64 assembly code:
0x100001c40 <+0>: pushq %rbp
0x100001c41 <+1>: movq %rsp, %rbp
0x100001c44 <+4>: subq $0x20, %rsp
0x100001c48 <+8>: xorl %r8d, %r8d
0x100001c4b <+11>: movq 0x3c6(%rip), %rax ; (void *)0x00007fff2fc755f0: kCFAllocatorDefault
0x100001c52 <+18>: leaq 0x4f7(%rip), %rcx ; @"idVendor"
0x100001c59 <+25>: movl $0x0, -0x4(%rbp)
0x100001c60 <+32>: movq %rcx, -0x10(%rbp)
0x100001c64 <+36>: movq (%rax), %rax
0x100001c67 <+39>: movq %rax, -0x18(%rbp)
-> 0x100001c6b <+43>: movl -0x4(%rbp), %edi
0x100001c6e <+46>: movq -0x10(%rbp), %rdx
0x100001c72 <+50>: movq -0x18(%rbp), %rcx
0x100001c76 <+54>: leaq 0x263(%rip), %rsi ; "IOService"
0x100001c7d <+61>: callq 0x100001d12 ; symbol stub for: IORegistryEntrySearchCFProperty
0x100001c82 <+66>: movq %rax, -0x20(%rbp)
0x100001c86 <+70>: addq $0x20, %rsp
0x100001c8a <+74>: popq %rbp
0x100001c8b <+75>: retq
For comparison, this is the x86 code that the Delphi 32-bit compiler generates for the Pascal test procedure, which works as expected:
Unit3.pas.38: USBRef := 0;
004BB6E1 33C0 xor eax,eax
004BB6E3 8945FC mov [ebp-$04],eax
Unit3.pas.39: key := CFSTR(kUSBVendorID);
004BB6E6 6810000000 push $00000010
004BB6EB 55 push ebp
004BB6EC 68EDFEEFBE push $beeffeed
004BB6F1 83C4F4 add esp,-$0c
004BB6F4 83C4FC add esp,-$04
004BB6F7 8D55E8 lea edx,[ebp-$18]
004BB6FA 8D83A0B74B00 lea eax,[ebx+Test + $DC]
004BB700 E8276EB6FF call UTF8Encode
004BB705 83C404 add esp,$04
004BB708 8B45E8 mov eax,[ebp-$18]
004BB70B 83C4FC add esp,-$04
004BB70E E85904B6FF call @LStrToPChar
004BB713 83C404 add esp,$04
004BB716 50 push eax
004BB717 E8A0AD4400 call $009064bc
004BB71C 83C41C add esp,$1c
004BB71F FE4424F4 inc byte ptr [esp-$0c]
004BB723 8945EC mov [ebp-$14],eax
004BB726 8B45EC mov eax,[ebp-$14]
004BB729 8945F8 mov [ebp-$08],eax
Unit3.pas.40: allocator := kCFAllocatorDefault;
004BB72C 83C4F4 add esp,-$0c
004BB72F E8602EB7FF call kCFAllocatorDefault
004BB734 83C40C add esp,$0c
004BB737 8945F4 mov [ebp-$0c],eax
Unit3.pas.41: ResAsCFString := IORegistryEntrySearchCFProperty(USBRef,
004BB73A 6820000000 push $00000020
004BB73F 55 push ebp
004BB740 68EDFEEFBE push $beeffeed
004BB745 83C4F4 add esp,-$0c
004BB748 6A00 push $00
004BB74A 8B45F4 mov eax,[ebp-$0c]
004BB74D 50 push eax
004BB74E 8B45F8 mov eax,[ebp-$08]
004BB751 50 push eax
004BB752 8D83B4B74B00 lea eax,[ebx+Test + $F0]
> Register content here
> EAX: 004BB7B4
>Memory content here:
> 004BB7B4 49 4F 53 65 72 76 69 63 IOServic
> 004BB7BC 65 00 00 00 00 00 00 00 e.......
004BB758 50 push eax
004BB759 8B45FC mov eax,[ebp-$04]
004BB75C 50 push eax
004BB75D E8BEA84400 call $00906020
004BB762 83C42C add esp,$2c
004BB765 FE4424F4 inc byte ptr [esp-$0c]
004BB769 8945F0 mov [ebp-$10],eax
Unit3.pas.47: end;
004BB76C 688FB74B00 push $004bb78f
004BB771 011C24 add [esp],ebx
004BB774 8D45E8 lea eax,[ebp-$18]
004BB777 83C4F8 add esp,-$08
004BB77A E86DF7B5FF call @LStrClr
004BB77F 83C408 add esp,$08
004BB782 C3 ret
Maybe someone can give me a tip on what I need to pay attention to here.
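The issue was the declaration of the type io_name_t. With the Delphi macOS 64-bit compiler, an array of AnsiChar in a procedure declaration is no longer equivalent to a PAnsiChar: the 64-bit disassembly above shows the whole 128-byte array being copied onto the stack instead of its address being passed in a register. The following declaration works: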
type
io_name_t = PAnsiChar;
function IORegistryEntrySearchCFProperty(entry: io_registry_entry_t;
plane: io_name_t; key: CFStringRef; allocator: CFAllocatorRef;
options: IOOptionBits): CFTypeRef; cdecl;
external libIOKit name _PU + 'IORegistryEntrySearchCFProperty';
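The reason PAnsiChar matches the C side can be sketched in a few lines. In C and C++, a parameter declared with an array type is adjusted to a pointer, so a prototype that takes a 128-char array really takes a char pointer. The names name_t and firstChar below are hypothetical stand-ins, not the IOKit declarations themselves.

```cpp
#include <type_traits>

// Same shape as the Delphi declaration array[0..127] of AnsiChar.
typedef char name_t[128];

// Declared with an array-typed parameter, as a C header would do it.
char firstChar(const name_t plane) {
    return plane[0];
}

// The compiler adjusts the parameter to a pointer: the function's real
// type takes const char*, so the callee expects an address in a register,
// not 128 bytes copied by value.
static_assert(
    std::is_same<decltype(&firstChar), char (*)(const char*)>::value,
    "an array parameter decays to a pointer");
```

A call such as firstChar("IOService") therefore passes only the address of the string, which is what the Xcode-generated code does with leaq ... %rsi.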

What exactly is GCC's auto-vectorized SSE2 implementation of sum += 1..n doing?

When GCC 8.3 for x86-64 with -O3 option is fed this small C function
int sum(int n) {
int sum = 0;
for (int i = 1; i <= n; i++) {
sum += i;
}
return sum;
}
it produces the following assembly (courtesy of godbolt):
sum:
test edi, edi
jle .L8
lea eax, [rdi-1]
cmp eax, 17
jbe .L9
mov edx, edi
movdqa xmm1, XMMWORD PTR .LC0[rip]
xor eax, eax
pxor xmm0, xmm0
movdqa xmm2, XMMWORD PTR .LC1[rip]
shr edx, 2
.L4:
add eax, 1
paddd xmm0, xmm1
paddd xmm1, xmm2
cmp eax, edx
jne .L4
movdqa xmm1, xmm0
mov ecx, edi
psrldq xmm1, 8
and ecx, -4
paddd xmm0, xmm1
lea edx, [rcx+1]
movdqa xmm1, xmm0
psrldq xmm1, 4
paddd xmm0, xmm1
movd eax, xmm0
cmp edi, ecx
je .L13
.L7:
add eax, edx
add edx, 1
cmp edi, edx
jge .L7
ret
.L13:
ret
.L8:
xor eax, eax
ret
.L9:
mov edx, 1
xor eax, eax
jmp .L7
.LC0:
.long 1
.long 2
.long 3
.long 4
.LC1:
.long 4
.long 4
.long 4
.long 4
I understand that for values of n less than 19 a completely unoptimized loop (the code at .L9 and .L7) is used, but I can't make heads or tails of what is happening for larger values of n. Could someone explain it?
Clang, on the other hand, simply calculates (n-1)*(n-2)/2 + 2*n - 1, which is a slightly more roundabout way of calculating n*(n+1)/2 (perhaps to prevent some problems with signed overflow) and seems to be a much more effective way to optimize this loop.
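One way to read the vectorized path is as four running sums kept side by side. The sketch below is a scalar model of the listing, not GCC's own output: acc plays the role of xmm0, idx the role of xmm1 (starting at .LC0 = 1,2,3,4 and stepped by .LC1 = 4,4,4,4), and the function name sumModel is made up. Unsigned arithmetic is used only to keep the model free of signed-overflow issues.

```cpp
#include <cstdint>

int sumModel(int n) {
    if (n <= 0) return 0;                 // test edi, edi / jle .L8

    std::uint32_t total = 0;
    int next = 1;                         // .L9: scalar loop starts at 1

    if (static_cast<unsigned>(n - 1) > 17u) {      // cmp eax, 17 / jbe .L9
        std::uint32_t acc[4] = {0, 0, 0, 0};       // pxor xmm0, xmm0
        std::uint32_t idx[4] = {1, 2, 3, 4};       // .LC0
        for (int iter = 0; iter < n / 4; ++iter) { // shr edx, 2 ; loop .L4
            for (int lane = 0; lane < 4; ++lane) {
                acc[lane] += idx[lane];            // paddd xmm0, xmm1
                idx[lane] += 4;                    // paddd xmm1, xmm2
            }
        }
        // Horizontal add: the two psrldq/paddd pairs fold 4 lanes into 1.
        total = acc[0] + acc[1] + acc[2] + acc[3];
        next = (n & -4) + 1;              // and ecx, -4 / lea edx, [rcx+1]
    }

    // Scalar tail (.L7): the 0 to 3 leftover values, or everything for small n.
    for (; next <= n; ++next) total += static_cast<std::uint32_t>(next);
    return static_cast<int>(total);
}
```

For n = 100, for example, the vector loop runs 25 times and covers 1..100 completely, so the tail loop does nothing; for n = 19 it runs 4 times (1..16) and the tail adds 17, 18 and 19.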

Replacing #pragma omp atomic with c++ atomics

I'm replacing some OpenMP code with standard C++11/C++14 atomics/thread support. Here is the OpenMP minimal code example:
#include <vector>
#include <cstdint>
void omp_atomic_add(std::vector<std::int64_t> const& rows,
std::vector<std::int64_t> const& cols,
std::vector<double>& values,
std::size_t const row,
std::size_t const col,
double const value)
{
for (auto i = rows[row]; i < rows[row+1]; ++i)
{
if (cols[i] == col)
{
#pragma omp atomic
values[i] += value;
return;
}
}
}
The code updates a matrix stored in CSR format and sits on a hot path of a scientific computation. It is technically possible to use a std::mutex, but the values vector can have millions of elements and is accessed many times more than that, so a std::mutex is too heavy.
Checking the assembly https://godbolt.org/g/nPE9Dt, it seems to use CAS (with the disclaimer my atomic and assembly knowledge is severely limited so my comments are likely incorrect):
mov rax, qword ptr [rdi]
mov rdi, qword ptr [rax + 8*rcx]
mov rax, qword ptr [rax + 8*rcx + 8]
cmp rdi, rax
jge .LBB0_6
mov rcx, qword ptr [rsi]
.LBB0_2: # =>This Inner Loop Header: Depth=1
cmp qword ptr [rcx + 8*rdi], r8
je .LBB0_3
inc rdi
cmp rdi, rax
jl .LBB0_2
jmp .LBB0_6
#### Interesting stuff happens from here onwards
.LBB0_3:
mov rcx, qword ptr [rdx] # Load values pointer into register
mov rax, qword ptr [rcx + 8*rdi] # Offset to value[i]
.LBB0_4: # =>This Inner Loop Header: Depth=1
movq xmm1, rax # Move value into floating point register
addsd xmm1, xmm0 # Add function arg to the value from the vector<double>
movq rdx, xmm1 # Move result to register
lock # x86 lock
cmpxchg qword ptr [rcx + 8*rdi], rdx # Compare exchange on the value in the vector
jne .LBB0_4 # If failed, go back to the top and try again
.LBB0_6:
ret
Is this possible to do using C++ atomics? The examples I've seen only use std::atomic<double> value{} and nothing in the context of accessing a value through a pointer.
You can create a std::vector<std::atomic<double>> but you cannot change its size.
The first thing I'd do is get gsl::span or write my own variant. Then gsl::span<std::atomic<double>> is a better model for values than std::vector<std::atomic<double>>.
Once we have done that, simply remove the #pragma omp atomic and your code is atomic in C++20. In C++17 and earlier you have to implement += manually:
double old = values[i];
while(!values[i].compare_exchange_weak(old, old+value))
{}
Live example.
Clang 5 generates:
omp_atomic_add(std::vector<long, std::allocator<long> > const&, std::vector<long, std::allocator<long> > const&, std::vector<std::atomic<double>, std::allocator<std::atomic<double> > >&, unsigned long, unsigned long, double): # @omp_atomic_add(std::vector<long, std::allocator<long> > const&, std::vector<long, std::allocator<long> > const&, std::vector<std::atomic<double>, std::allocator<std::atomic<double> > >&, unsigned long, unsigned long, double)
mov rax, qword ptr [rdi]
mov rdi, qword ptr [rax + 8*rcx]
mov rax, qword ptr [rax + 8*rcx + 8]
cmp rdi, rax
jge .LBB0_6
mov rcx, qword ptr [rsi]
.LBB0_2: # =>This Inner Loop Header: Depth=1
cmp qword ptr [rcx + 8*rdi], r8
je .LBB0_3
inc rdi
cmp rdi, rax
jl .LBB0_2
jmp .LBB0_6
.LBB0_3:
mov rax, qword ptr [rdx]
mov rax, qword ptr [rax + 8*rdi]
.LBB0_4: # =>This Inner Loop Header: Depth=1
mov rcx, qword ptr [rdx]
movq xmm1, rax
addsd xmm1, xmm0
movq rsi, xmm1
lock
cmpxchg qword ptr [rcx + 8*rdi], rsi
jne .LBB0_4
.LBB0_6:
ret
which looks identical at a casual glance.
There is a proposal for atomic_view that lets you manipulate a non-atomic value through an atomic view. In general, C++ only lets you operate atomically on atomic data.
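Putting the pieces of this answer together, a self-contained version of the question's function might look like the sketch below. It assumes values has already been changed to std::vector<std::atomic<double>> (a span over atomics would work the same way); the helper name atomic_add and the function name atomic_add_csr are made up for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Manual += for std::atomic<double>, usable before C++20.
// On failure compare_exchange_weak reloads `old`, so the sum is recomputed
// from the freshest value on every retry.
inline void atomic_add(std::atomic<double>& target, double value) {
    double old = target.load();
    while (!target.compare_exchange_weak(old, old + value)) {
    }
}

// Same search as the OpenMP version, with the pragma replaced by the helper.
void atomic_add_csr(std::vector<std::int64_t> const& rows,
                    std::vector<std::int64_t> const& cols,
                    std::vector<std::atomic<double>>& values,
                    std::size_t const row,
                    std::size_t const col,
                    double const value) {
    for (auto i = rows[row]; i < rows[row + 1]; ++i) {
        if (cols[i] == static_cast<std::int64_t>(col)) {
            atomic_add(values[i], value);
            return;
        }
    }
}
```

The vector of atomics has to be sized once up front, since std::atomic<double> is neither copyable nor movable; the elements should also be given a value with store() before use, because before C++20 a default-constructed atomic is uninitialized.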

Find most significant DWORD in an DWORD array

I want to find the most significant DWORD which isn't equal to 0 in a DWORD array. The algorithm should be optimized for data sizes up to 128 bytes.
I've written three different functions, all of which return the index of that DWORD.
unsigned long msb_msvc(long* dw, std::intptr_t n)
{
while( --n )
{
if( dw[n] )
break;
}
return n;
}
static inline unsigned long msb_386(long* dw, std::intptr_t n)
{
__asm
{
mov ecx, [dw]
mov eax, [n]
__loop: sub eax, 1
jz SHORT __exit
cmp DWORD PTR [ecx + eax * 4], 0
jz SHORT __loop
__exit:
}
}
static inline unsigned long msb_sse2(long* dw, std::intptr_t n)
{
__asm
{
mov ecx, [dw]
mov eax, [n]
test ecx, 0x0f
jnz SHORT __128_unaligned
__128_aligned:
cmp eax, 4
jb SHORT __64
sub eax, 4
movdqa xmm0, XMMWORD PTR [ecx + eax * 4]
pxor xmm1, xmm1
pcmpeqd xmm0, xmm1
pmovmskb edx, xmm0
not edx
and edx, 0xffff
jz SHORT __128_aligned
jmp SHORT __exit
__128_unaligned:
cmp eax, 4
jb SHORT __64
sub eax, 4
movdqu xmm0, XMMWORD PTR [ecx + eax * 4]
pxor xmm1, xmm1
pcmpeqd xmm0, xmm1
pmovmskb edx, xmm0
not edx
and edx, 0xffff
jz SHORT __128_unaligned
jmp SHORT __exit
__64:
cmp eax, 2
jb __32
sub eax, 2
movq mm0, MMWORD PTR [ecx + eax * 4]
pxor mm1, mm1
pcmpeqd mm0, mm1
pmovmskb edx, mm0
not edx
and edx, 0xff
emms
jz SHORT __64
jmp SHORT __exit
__32:
test eax, eax
jz SHORT __exit
xor eax, eax
jmp __leave ; retn
__exit:
bsr edx, edx
shr edx, 2
add eax, edx
__leave:
}
}
These functions will be used to preselect data that is then compared against each other, so they need to perform well.
Does anybody know a better algorithm?
I think you are just looking for the first non-zero word in a given array, scanning from the top. I would definitely go with a simple loop written in C. If there is a reason why this is extremely performance critical, I would recommend looking at the larger context of your program and asking, for example, why you need to search the array for the non-zero element at all and why its location is not already known.
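A minimal sketch of such a loop follows. It is not the asker's msb_msvc: the name mostSignificantNonZero is made up, and unlike the functions in the question it also inspects index 0 and returns -1 when every word is zero, which is an assumption about the desired behaviour.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the index of the highest-indexed non-zero 32-bit word,
// or -1 if all `count` words are zero.
std::ptrdiff_t mostSignificantNonZero(const std::uint32_t* words,
                                      std::ptrdiff_t count) {
    for (std::ptrdiff_t i = count - 1; i >= 0; --i) {
        if (words[i] != 0) {
            return i;
        }
    }
    return -1;
}
```

With at most 32 words (128 bytes) to scan, the loop runs a few dozen iterations in the worst case, so it is worth measuring before reaching for hand-written SSE2.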
