mmap() RWX page on macOS (ARM64 architecture)?

I've been trying to map a page that is both writable AND executable.
mov x0, 0 // start address
mov x1, 4096 // length
mov x2, 7 // rwx
mov x3, 0x1001 // flags
mov x4, -1 // file descriptor
mov x5, 0 // offset
movl x16, 0x200005c // mmap
svc 0
This gives me a 0xD error code (EACCES, which the documentation unhelpfully blames on an invalid file descriptor, although the same documentation says to use '-1'). I think the code is correct: it returns a valid mapping if I just pass 'r--' for permissions.
I know the same code works on Catalina and on the x64 architecture. I verified that the same error occurs with SIP disabled.
For more context, I'm trying to port a FORTH implementation to macOS/ARM64, and this FORTH, like many others, relies heavily on self-modifying code and on assembling code at runtime. The code that does the assembling/compiling resides in the middle of the newly created code (in fact, part of the compiler is generated in machine language as part of running FORTH), so it's very hard, if not infeasible, to separate the FORTH JIT compiler (if you can call it that) from the generated code.
Now, I really don't want to end up with the answer: "Apple thinks they know better than you, no FORTH for you!", but that is what it looks like so far. Thanks for any help!

You need to toggle the mapping between writable and executable for the current thread; it cannot be both at the same time. I think it is actually possible to do both with the same memory using two different threads, but I haven't tried.
Before you write to the memory you mmap, call this:
pthread_jit_write_protect_np(0);
sys_icache_invalidate(addr, size);
Then when you are done writing to it you can switch back again like this:
pthread_jit_write_protect_np(1);
sys_icache_invalidate(addr, size);
This is the full code I am using right now:
#include <stdio.h>
#include <sys/mman.h>
#include <pthread.h>
#include <libkern/OSCacheControl.h>
#include <stdlib.h>
#include <stdint.h>

uint32_t* c_get_memory(uint32_t size) {
    int prot = PROT_READ | PROT_WRITE | PROT_EXEC;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_JIT;
    int fd = -1;
    int offset = 0;
    uint32_t* addr = 0;
    addr = (uint32_t*)mmap(0, size, prot, flags, fd, offset);
    if (addr == MAP_FAILED){
        printf("failure detected\n");
        exit(-1);
    }
    pthread_jit_write_protect_np(0);
    sys_icache_invalidate(addr, size);
    return addr;
}

void c_jit(uint32_t* addr, uint32_t size) {
    pthread_jit_write_protect_np(1);
    sys_icache_invalidate(addr, size);
    void (*foo)(void) = (void (*)())addr;
    foo();
}
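As a side note, the flags value in the question's assembly can be decoded by hand. The sketch below hard-codes the flag values from the macOS <sys/mman.h> header (MAP_SHARED 0x0001, MAP_PRIVATE 0x0002, MAP_JIT 0x0800, MAP_ANON 0x1000) so that it builds on any platform; the DARWIN_ prefix and both function names are made up for this illustration. It suggests that 0x1001 is MAP_ANON | MAP_SHARED without MAP_JIT, whereas the C code above passes the equivalent of 0x1802.

```c
#include <stdio.h>

/* Flag values copied from the macOS <sys/mman.h> header. They are
   hard-coded (an assumption of this sketch) so it compiles anywhere. */
enum {
    DARWIN_MAP_SHARED  = 0x0001,
    DARWIN_MAP_PRIVATE = 0x0002,
    DARWIN_MAP_JIT     = 0x0800,
    DARWIN_MAP_ANON    = 0x1000
};

/* Returns 1 if `flags` asks for a private, anonymous MAP_JIT mapping,
   which is what the C code in the answer requests. */
int is_jit_mapping_request(int flags)
{
    int want = DARWIN_MAP_PRIVATE | DARWIN_MAP_ANON | DARWIN_MAP_JIT;
    return (flags & want) == want;
}

/* Prints the names of the known flags that are set in `flags`. */
void print_darwin_mmap_flags(int flags)
{
    printf("0x%04x =", (unsigned)flags);
    if (flags & DARWIN_MAP_SHARED)  printf(" MAP_SHARED");
    if (flags & DARWIN_MAP_PRIVATE) printf(" MAP_PRIVATE");
    if (flags & DARWIN_MAP_JIT)     printf(" MAP_JIT");
    if (flags & DARWIN_MAP_ANON)    printf(" MAP_ANON");
    printf("\n");
}
```

Calling print_darwin_mmap_flags(0x1001) lists MAP_SHARED and MAP_ANON only, so in the assembly version x3 would presumably need to be 0x1802 to match the C code.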

Related

Copying value from an x86 register to a memory location which is given by a pointer

Assume I have a memory location that I want to copy data to, and I have that address in a pointer.
Is it possible to copy data to that location with a MOV instruction in inline assembly? Or, more generally, how would I copy data to a memory location via inline assembly?
In the code snippet below, I am just copying my value to the eax register and then copying from eax to an output value, whereas I would like to copy it to a memory address.
In short, I would like to store what is in the eax register at the address pointed to by outpointer.
Is this possible, or is there a better way to do it in inline assembly?
#include <stdio.h>
#include <iostream>
using namespace std;

int func3(int parm)
{
    int out = 0;
    // int *outpointer = &out;
    // what is in the eax register I would like to store
    // in the address pointed by outpointer
    asm("mov %1 ,%%eax\n\t"
        "mov %%eax, %0": "=r"(out) : "r"(parm));
    // cout<<outpointer;
    return out;
}

int main(int argc, char const *argv[])
{
    int x{5};
    int y{0};
    y = func3(x);
    cout<<y;
    return 0;
}
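One possible way to do what is asked here, sketched under the assumption of GCC or Clang extended asm on x86: give the asm statement a memory output operand ("=m") for the pointed-to object, so the compiler supplies the addressing mode and the mov writes straight to that location. The function name store_via_asm is made up for this illustration, and the non-x86 branch is only a fallback so the sketch builds elsewhere.

```c
#include <stdio.h>

/* Stores `value` at the address held in `dst`.
   "=m"(*dst) tells the compiler that the asm writes the int that dst
   points to; "r"(value) asks for the value in a register of its choice. */
void store_via_asm(int *dst, int value)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %0"      /* AT&T syntax: source first, destination last */
             : "=m" (*dst)
             : "r" (value));
#else
    *dst = value;               /* plain C fallback on other architectures */
#endif
}
```

With this shape there is no need to name eax explicitly; letting the compiler pick the register usually gives it more freedom than hard-coding one.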

Inline 64bit Assembly in 32bit GCC C Program

I'm compiling a 32 bit binary but want to embed some 64 bit assembly in it.
void method() {
asm("...64 bit assembly...");
}
Of course when I compile I get errors about referring to bad registers because the registers are 64 bit.
evil.c:92: Error: bad register name `%rax'
Is it possible to add some annotations so gcc will process the asm sections using the 64-bit assembler instead? I have a workaround, which is to compile separately, map in a page with PROT_EXEC|PROT_WRITE and copy in my code, but this is very awkward.
No, this isn't possible. You can't run 64-bit assembly from a 32-bit binary, as the processor will not be in long mode while running your program.
Copying 64-bit code to an executable page will result in that code being interpreted incorrectly as 32-bit code, which will have unpredictable and undesirable results.
Don't try to put 64-bit machine-code inside a compiler-generated function. It might work since the encoding for function prologue/epilogue is the same in 32 and 64-bit, but it would be cleaner to just have a separate block of 64-bit code.
The easiest thing is probably to assemble that block in a separate file, using GAS .code64 or NASM BITS 64 to get 64-bit code in an object file you can link into a 32-bit executable.
You said in a comment you're thinking of using this for a kernel exploit against a 64-bit kernel from a 32-bit user-space process, so you just need some code bytes in an executable part of your process's memory and a way to get a pointer to that block. This is certainly plausible; if you can gain control of the kernel's RIP from a 32-bit process, this is what you want, because kernel code will always be running in long mode.
If you were doing something with 64-bit userspace code in a process that started in 32-bit mode, you could maybe far jmp to the block of 64-bit code (as @RossRidge suggests), using a known value for the kernel's __USER_CS 64-bit code segment descriptor. syscall from 64-bit code should return in 64-bit mode, but if not, try the int 0x80 ABI. It always returns to the mode you were in, saving/restoring cs and ss along with rip and rflags. (What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?)
.rodata is part of the text segment of your executable, so just get the compiler to put bytes in a const array. Fun fact: const int main = 195; compiles to a program that exits without segfaulting, because 195 = 0xc3 = the x86 encoding for ret (and x86 is little-endian). For an arbitrary-length machine-code sequence, const char funcname[] = { 0x90, 0x90, ..., 0xc3 } will work. The const is necessary, otherwise it will go in .data (read/write/noexec) instead of .rodata.
You could use const char funcname[] __attribute__((section(".text"))) = { ... }; to control what section it goes in (e.g. .text along with compiler-generated functions), or even a linker script to get more control.
If you really want to do it all in one .c file, instead of using the easier solution of a separately-assembled pure asm source:
To assemble some 64-bit code along with compiler-generated 32-bit code, use the .code64 GAS directive in an asm statement outside of any functions. IDK if there's any guarantee on what section will be active when gcc emits your asm, or on how gcc will mix that asm with its own, but it won't put it in the middle of a function.
asm(".pushsection .text \n\t" // AFAIK, there's no guarantee how this will mix with compiler asm output
".code64 \n\t"
".p2align 4 \n\t"
".globl my_codebytes \n\t" // optional
"my_codebytes: \n\t"
"inc %r10d \n\t"
"my_codebytes_end: \n\t"
//"my_codebytes_len: .long . - my_codebytes\n\t" // store the length in memory. Optional
".popsection \n\t"
#ifdef __i386
".code32" // back to 32-bit interpretation for gcc's code
// "\n\t inc %r10" // uncomment to check that it *doesn't* assemble
#endif
);
#ifdef __cplusplus
extern "C" {
#endif
// put C names on the labels.
// They are *not* pointers, their addresses are link-time constants
extern char my_codebytes[], my_codebytes_end[];
//extern const unsigned my_codebytes_len;
#ifdef __cplusplus
}
#endif
// This expression for the length isn't a compile-time constant, so this isn't legal C
//static const unsigned len = &my_codebytes_end - &my_codebytes;
#include <stddef.h>
#include <unistd.h>
int main(void) {
    size_t len = my_codebytes_end - my_codebytes;
    const char* bytes = my_codebytes;
    // do whatever you want. Writing it to stdout is one option!
    write(1, bytes, len);
}
This compiles and assembles with gcc and clang (compiler explorer).
I tried it on my desktop to double check:
peter@volta$ gcc -m32 -Wall -O3 /tmp/foo.c
peter@volta$ ./a.out | hd
00000000 41 ff c2 |A..|
00000003
This is the correct encoding for inc %r10d :)
The program also works when compiled without -m32, because I used #ifdef to decide whether to use .code32 at the end or not. (There's no push/pop mode directive like there is for sections.)
Of course, disassembling the binary will show you:
00000580 <my_codebytes>:
580: 41 inc ecx
581: ff c2 inc edx
because the disassembler doesn't know to switch to 64-bit disassembly for that block. (I wonder if ELF has attributes for that... I didn't use any assembler directives or linker scripts to generate such attributes, if such a thing exists.)
Switching between long mode and compatibility mode is done by changing CS. User mode code cannot modify the descriptor table, but it can perform a far jump or far call to a code segment that is already present in the descriptor table. In Linux the required descriptor is present (in my experience; this may not be true for all installations).
Here is sample code for 64-bit Linux (Ubuntu) that starts in 32-bit mode, switches to 64-bit mode, runs a function, and then switches back to 32-bit mode. Build with gcc -m32.
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
extern bool switch_cs(int cs, bool (*f)());
extern bool check_mode();
int main(int argc, char **argv)
{
    int cs = 0x33;
    if (argc > 1)
        cs = strtoull(argv[1], 0, 16);
    printf("switch to CS=%02x\n", cs);
    bool r = switch_cs(cs, check_mode);
    if (r)
        printf("cs=%02x: 64-bit mode\n", cs);
    else
        printf("cs=%02x: 32-bit mode\n", cs);
    return 0;
}
.intel_syntax noprefix
.text
.code32
.globl switch_cs
switch_cs:
mov eax, [esp+4]
mov edx, [esp+8]
push 0
push edx
push eax
push offset .L2
lea eax, [esp+8]
lcall [esp]
add esp, 16
ret
.L2:
call [eax]
lret
.code64
.globl check_mode
check_mode:
xor eax, eax
// In 32-bit mode, this instruction is executed as
// inc eax; test eax, eax
test rax, rax
setz al
ret

How do I ask the assembler to "give me a full size register"?

I'm trying to let the assembler give me a register of its choosing, and then use that register in inline assembly. I'm working with the program below, and it's segfaulting. The program was compiled with g++ -O1 -g2 -m64 wipe.cpp -o wipe.exe.
When I look at the crash under lldb, I believe I'm getting a 32-bit register rather than a 64-bit register. I'm trying to compute an address (base + offset) using lea, and store the result in a register the assembler chooses:
"lea (%0, %1), %2\n"
Above, I'm trying to say "use a register, and I'll refer to it as %2".
When I perform a disassembly, I see:
0x100000b29: leal (%rbx,%rsi), %edi
-> 0x100000b2c: movb $0x0, (%edi)
So it appears the generated code calculates an address using 64-bit values (rbx and rsi), but saves it to a 32-bit register (edi) that the assembler chose.
Here are the values at the time of the crash:
(lldb) type format add --format hex register
(lldb) p $edi
(unsigned int) $3 = 1063330
(lldb) p $rbx
(unsigned long) $4 = 4296030616
(lldb) p $rsi
(unsigned long) $5 = 10
A quick note on the Input Operands below. If I drop the "r" (2), then I get a compiler error when I refer to %2 in the call to lea: invalid operand number in inline asm string.
How do I tell the assembler to "give me a full size register" and then refer to it in my program?
int main(int argc, char* argv[])
{
    string s("Hello world");
    cout << s << endl;
    char* ptr = &s[0];
    size_t size = s.length();
    if(ptr && size)
    {
        __asm__ __volatile__
        (
            "%=:\n"                /* generate a unique label for TOP */
            "subq $1, %1\n"        /* 0-based index */
            "lea (%0, %1), %2\n"   /* calculate ptr[idx] */
            "movb $0, (%2)\n"      /* 0 -> ptr[size - 1] .. ptr[0] */
            "jnz %=b\n"            /* Back to TOP if non-zero */
            : /* no output */
            : "r" (ptr), "r" (size), "r" (2)
            : "0", "1", "2", "cc"
        );
    }
    return 0;
}
Sorry about these inline assembly questions. I hope this is the last one. I'm not really thrilled with using inline assembly in GCC because of pain points like this (and my fading memory). But it's the only legal way I know to do what I want to do, given GCC's interpretation of the qualifier volatile in C.
If interested, GCC interprets C's volatile qualifier as hardware backed memory, and anything else is an abuse and it results in an illegal program. So the following is not legal for GCC:
volatile void* g_tame_the_optimizer = NULL;
...
unsigned char* ptr = ...
size_t size = ...;
for(size_t i = 0; i < size; i++)
ptr[i] = 0x00;
g_tame_the_optimizer = ptr;
Interestingly, Microsoft uses a more customary interpretation of volatile (what most programmers expect - namely, anything can change the memory, and not just memory mapped hardware), and the code above is acceptable.
gcc inline asm is a complicated beast. "r" (2) means allocate an int sized register and load it with the value 2. If you just need an arbitrary scratch register you can declare a 64 bit early-clobber dummy output, such as "=&r" (dummy) in the output section, with void *dummy declared earlier. You can consult the gcc manual for more details.
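The dummy-output suggestion could look roughly like the sketch below. It assumes GCC or Clang on x86-64; the function name wipe, the variable name scratch and the named operands [n], [t] and [p] are choices made for this illustration, not code from the question. The counter becomes a read-write operand ("+r") instead of an input listed among the clobbers, and the non-x86-64 branch is only a fallback so the sketch builds elsewhere.

```c
#include <stddef.h>

/* Zeroes ptr[0 .. size-1], walking from the last byte down to the first. */
void wipe(char *ptr, size_t size)
{
    if (!ptr || size == 0)
        return;
#if defined(__x86_64__)
    void *scratch;  /* pointer-sized dummy, so the compiler picks a 64-bit register */
    __asm__ __volatile__ (
        "1:\n\t"
        "subq $1, %[n]\n\t"              /* index = index - 1, sets ZF */
        "leaq (%[p], %[n]), %[t]\n\t"    /* t = &ptr[index], flags untouched */
        "movb $0, (%[t])\n\t"            /* ptr[index] = 0, flags untouched */
        "jnz 1b\n\t"                     /* loop until the index reached 0 */
        : [n] "+r" (size), [t] "=&r" (scratch)
        : [p] "r" (ptr)
        : "cc", "memory");
#else
    /* Fallback for other targets: volatile stores are not optimized away. */
    volatile char *p = ptr;
    while (size--)
        p[size] = 0;
#endif
}
```

The early-clobber modifier (&) keeps the compiler from placing scratch in the same register as ptr, which matters because scratch is written before the last read of ptr.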
As for the final code snippet, it looks like you want a memory barrier, just as the linked email says. See the manual for an example.
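A compiler-level memory barrier of that kind is commonly written as an empty asm statement with a "memory" clobber. The sketch below assumes GCC or Clang; the function name zero_and_keep is hypothetical. Passing the pointer as an input operand tells the compiler that the asm may read the buffer, so the preceding stores have to be emitted.

```c
#include <stddef.h>
#include <string.h>

/* Zeroes the buffer and then places a compiler barrier after the stores,
   so they are not treated as dead and removed. */
void zero_and_keep(void *ptr, size_t size)
{
    memset(ptr, 0, size);
    /* Empty asm: emits no instructions, but the "memory" clobber makes the
       compiler assume that memory reachable through ptr is read here. */
    __asm__ __volatile__ ("" : : "r" (ptr) : "memory");
}
```

This only constrains the compiler; it is not a hardware fence and says nothing about ordering as seen by other CPUs.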

Assembly program runs and immediately crashes without printing the Hello World message

OK, so basically I was just writing a C program that builds my object files and then creates executables from them, using nasm and ld respectively.
The program I wrote makes the correct calls to nasm and ld, but it either assembles fine with -f win32/win64 (I'm on a 64-bit Windows 7 machine) or fails with the other options, which is fine though... right? If the program assembles and the exe is created, it runs and immediately crashes without printing the Hello World message. I'd really like to jump into assembly. Some help?
section .text
global _start ;must be declared for linker (ld)
_start: ;tells linker entry point
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov ah,00
int 16h
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db 'Hello, world!', 0xa ;our dear string
len equ $ - msg ;length of our dear string
I also happen to have a kali system ; I don't suppose I could compile for both operating systems without using Wine?
So my C program is working nicely! I can't find any examples of code to assemble though. Well I can... but it all fails. Does anyone have a link?
#include <stdio.h>
#include <stdlib.h>
#include <string.h> //strcpy, strcat
#include <CustomHeader_Small.h>

void Assemble(void);
void PrintMenu(void);
void LoadOptions(void);
void SaveOptions(void);

char TempBuff[255];
char Format[30];

void SaveOptions(void)
{
    FILE *Source = fopen("Settings.ini","w");
    if(Source)
    {
        printf("%s","Enter A Format Type ->");
        scanf("%s",Format); //Save Format
        fprintf(Source,"Format:%s",Format);
        fclose(Source);
        puts("Settings Updated!");
        LoadOptions();
    }
    return;
}

void LoadOptions(void)
{
    FILE *Source = fopen("Settings.ini","r");
    if(Source)
    {
        char Line[50];
        fscanf(Source,"%s",Line);
        CCopy(Line,CPos(Line,":",0)+1,CLen(Line),Format,0,1);
        fclose(Source); //Line Is A Stack Array, So It Must Not Be free()d
    }
    else
    {
        Source = fopen("Settings.ini","w");
        fprintf(Source,"%s","Format:Win32");
        fclose(Source);
    }
    PrintMenu(); //No Recursive LoadOptions() Call Here, It Would Never Return
    return;
}

void PrintMenu(void)
{
    printf("%s","Menu:\n________\n1.) [C]reate A New Project.\n2.) [O]pen A Project.\n3.) [A]ssemble A Project.\n4.) [E]dit Settings\n");
    printf("%s","5.) [Q]uit\n");
    return;
}

void Assemble(void)
{
    char *File=malloc(256);
    printf("Note : Compiling In %s Mode\n",Format);
    printf("%s","Enter A Project Name -> ");
    scanf("%s",File);
    char *Command;
    int ch;
    while((ch=getchar())!='S')
    {
        Command=malloc(1024);
        strcpy(Command,"H:\\Users\\Grim\\AppData\\Local\\nasm\\nasm.exe -f ");
        strcat(Command,Format);
        strcat(Command," ");
        strcat(Command,File);
        strcat(Command,"\\");
        strcat(Command,File);
        strcat(Command,".asm ");
        strcat(Command,"-o ");
        strcat(Command,File);
        strcat(Command,"\\");
        strcat(Command,File);//This Creates The Object File Using Nasm ( Not Just Yet But Were Well On Our Way!
        strcat(Command,".o");
        system(Command); //Calls Nasm.
        free(Command);
        Command=malloc(1024);
        strcpy(Command,"H:\\MinGW\\bin\\ld.exe ");
        strcat(Command,File);
        strcat(Command,"\\");
        strcat(Command,File);
        strcat(Command,".o ");
        strcat(Command,"-o ");
        strcat(Command,File);//This Creates The Executable File Using ld ( Not Just Yet But Were Well On Our Way!
        strcat(Command,"\\");
        strcat(Command,File);
        strcat(Command,".exe");
        system(Command); //Calls ld.
        free(Command);
        puts("Press Enter To Compile Again But Enter An [S] Followed By Enter To [S]top.");
    }
    free(File);
    puts("NasmWrapper Assembly Done!");
    printf("%s","\n\n\n");
    PrintMenu();
    return;
}

int main()
{
    LoadOptions();
    char ch;
    while((ch = getchar())!='Q')
    {
        if(ch=='A') Assemble();
        if(ch=='E') SaveOptions();
    }
    return 0;
}
Also any comments on the C program would be nice :D Thanks for explaining how to use the [Code] thing.
Your assembly program, apart from that odd int 16h*, is written specifically for Linux (32-bit Linux, to be more precise). int 0x80 is the way you invoke a Linux kernel system call.
Windows doesn't do it this way. Instead you call the Windows API or the C standard library.
This OS-specific variation is one of the reasons it is good to use a higher level language rather than assembly.
If you want to play with assembly, my recommendation would be to decide on which OS you want to start with, and use that exclusively to begin with. Find some tutorials (there are lots for Linux and Windows) and get started. Once you have got it working for one OS, try it for another.
* int 16h calls the BIOS from DOS. This won't work in Linux.
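To make the portability point concrete, here is a minimal sketch of the same greeting written against the C standard library instead of a kernel interface. The function name say_hello is made up for this illustration; the same source should build unchanged on Linux and on Windows (for example with MinGW gcc), because the library hides the OS-specific system call.

```c
#include <stdio.h>

/* Writes the greeting to `out` and returns the number of bytes written.
   No int 0x80 and no Windows API call appears here; the C library
   picks the right mechanism for the OS it was built for. */
size_t say_hello(FILE *out)
{
    static const char msg[] = "Hello, world!\n";
    return fwrite(msg, 1, sizeof msg - 1, out);  /* sizeof - 1 drops the NUL */
}
```

A main function that calls say_hello(stdout) is all that is needed to turn this into a complete program.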

atomic_inc and atomic_xchg in gcc assembly

I have written the following user-level code snippet to test two sub functions, atomic inc and xchg (refer to Linux code).
What I need is just try to perform operations on 32-bit integer, and that's why I explicitly use int32_t.
I assume global_counter will be raced by different threads, while tmp_counter is fine.
#include <stdio.h>
#include <stdint.h>

int32_t global_counter = 10;

/* Increment the value pointed by ptr */
void atomic_inc(int32_t *ptr)
{
    __asm__("incl %0;\n"
            : "+m"(*ptr));
}

/*
 * Atomically exchange the val with *ptr.
 * Return the value previously stored in *ptr before the exchange
 */
int32_t atomic_xchg(uint32_t *ptr, uint32_t val)
{
    uint32_t tmp = val;
    __asm__(
        "xchgl %0, %1;\n"
        : "=r"(tmp), "+m"(*ptr)
        : "0"(tmp)
        :"memory");
    return tmp;
}

int main()
{
    int32_t tmp_counter = 0;
    printf("Init global=%d, tmp=%d\n", global_counter, tmp_counter);
    atomic_inc(&tmp_counter);
    atomic_inc(&global_counter);
    printf("After inc, global=%d, tmp=%d\n", global_counter, tmp_counter);
    tmp_counter = atomic_xchg(&global_counter, tmp_counter);
    printf("After xchg, global=%d, tmp=%d\n", global_counter, tmp_counter);
    return 0;
}
My two questions are:
Are these two subfunctions written properly?
Will this behave the same when I compile it on a 32-bit or a 64-bit platform? For example, could the pointer address have a different length, or could incl and xchgl conflict with the operand?
My understanding is below; please correct me if I'm wrong.
All the read-modify-write instructions (e.g. incl, add, xchg) need a lock prefix to be atomic. The lock prefix locks the memory against access by other CPUs by asserting the LOCK# signal on the memory bus.
The __xchg function in the Linux kernel uses no "lock" prefix because xchg with a memory operand always implies lock anyway. http://lxr.linux.no/linux+v2.6.38/arch/x86/include/asm/cmpxchg_64.h#L15
However, the incl used in atomic_inc does not have this assumption so a lock_prefix is needed.
http://lxr.linux.no/linux+v2.6.38/arch/x86/include/asm/atomic.h#L105
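Put together, the two helpers might be written as in the sketch below. It assumes GCC or Clang; the function names follow the question, but both take int32_t here so that the pointer types match at the call site. On non-x86 targets the sketch falls back to the compiler's __atomic builtins so that it still builds.

```c
#include <stdint.h>

/* Atomically increments *ptr. The lock prefix is what makes the
   read-modify-write atomic with respect to other CPUs. */
void atomic_inc(int32_t *ptr)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("lock incl %0"
                          : "+m" (*ptr)
                          :
                          : "cc", "memory");
#else
    __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Atomically stores val in *ptr and returns the previous value.
   xchg with a memory operand is implicitly locked, so no prefix is used. */
int32_t atomic_xchg(int32_t *ptr, int32_t val)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("xchgl %0, %1"
                          : "+r" (val), "+m" (*ptr)
                          :
                          : "memory");
    return val;
#else
    return __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST);
#endif
}
```

Because incl and xchgl operate on 32-bit operands whatever the pointer width, the same source should behave the same when built for 32-bit and for 64-bit x86; only the addressing mode the compiler substitutes for the "m" operand changes.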
btw, I think you need to copy the *ptr to a volatile variable to avoid gcc optimization.
William

Resources