Self-modifying code on macOS v10.15 (Catalina) / x64

Self-modifying code on macOS v10.15 (Catalina) / x64 - macos

As part as porting a Forth compiler, I'm trying to create a binary that allows for self-modifying code. The gory details are at https://github.com/klapauciusisgreat/jonesforth-MacOS-x64
Ideally, I create a bunch of pages for user definitions and call mprotect like so:
#define __NR_exit 0x2000001
#define __NR_open 0x2000005
#define __NR_close 0x2000006
#define __NR_read 0x2000003
#define __NR_write 0x2000004
#define __NR_mprotect 0x200004a
#define PROT_READ 0x01
#define PROT_WRITE 0x02
#define PROT_EXEC 0x04
#define PROT_ALL (PROT_READ | PROT_WRITE | PROT_EXEC)
#define PAGE_SIZE 4096
// https://opensource.apple.com/source/xnu/xnu-201/bsd/sys/errno.h
#define EACCES 13 /* Permission denied */
#define EINVAL 22 /* Invalid argument */
#define ENOTSUP 45 /* Operation not supported */
/* Assembler entry point. */
.text
.globl start
start:
// Use mprotect to allow read/write/execute of the .bss section
mov $__NR_mprotect, %rax // mprotect
lea user_defs_start(%rip), %rdi // Start address
and $-PAGE_SIZE,%rdi // Align at page boundary
mov $USER_DEFS_SIZE, %rsi // Length
mov $PROT_ALL,%rdx
syscall
cmp $EINVAL, %rax
je 1f
cmp $EACCES,%rax
je 2f
test %rax,%rax
je 4f // All good, proceed:
// must be ENOTSUP
mov $2,%rdi // First parameter: stderr
lea errENOTSUP(%rip),%rsi // Second parameter: error message
mov $8,%rdx // Third parameter: length of string
mov $__NR_write,%rax // Write syscall
syscall
jmp 3f
1:
mov $2,%rdi // First parameter: stderr
lea errEINVAL(%rip),%rsi // Second parameter: error message
mov $7,%rdx // Third parameter: length of string
mov $__NR_write,%rax // Write syscall
syscall
jmp 3f
2:
mov $2,%rdi // First parameter: stderr
lea errEACCES(%rip),%rsi // Second parameter: error message
mov $7,%rdx // Third parameter: length of string
mov $__NR_write,%rax // Write syscall
syscall
3:
// did't work -- then exit
xor %rdi,%rdi
mov $__NR_exit,%rax // syscall: exit
syscall
4:
// All good, let's get started for real:
.
.
.
.set RETURN_STACK_SIZE,8192
.set BUFFER_SIZE,4096
.set USER_DEFS_SIZE,65536*2 // 128 kiB ought to be enough for everybody
.bss
.balign 8
user_defs_start:
.space USER_DEFS_SIZE
However, I get an EACCES return value. I suspect that this is because of some security policy apple has set up, but I do not find good documentation.
Where is the source code for mprotect, and/or what are the methods to mark the data area executable and writable at the same time?
I found that compiling with
gcc -segprot __DATA rwx rwx
does mark the entire data segment rwx, so it must somehow be possible to do the right thing. But I would prefer to only make the area hosting Forth words executable, not the entire data segment.
I found a similar discussion here, but without any solution.

The segment that I want to 'unprotect' exec permission in has really two values describing its permissions:
the initial protection settings, which for __DATA I want rw-
the maximum protection (loosest) settings, which I want to be rwx.
So first I need to set the maxprot field to rwx. According to the ld manpage, this should be achieved by invoking gcc or ld with the flags -segprot __DATA rwx rw. However, a recent change made by Apple to the linker essentially ignores the maxprot value, and sets maxprot=initprot.
Thanks to Darfink, you can use this script to tweak the maxprot bits after the fact. I thought additional code signing with special entitlements was required, but it's not, at least for the __DATA segment. The __TEXT segment may need code signing with the com.apple.security.cs.disable-executable-page-protection entitlement.
Also see here for a few more details.
Looking at the larger picture, I should also point out that rather to unprotect pieces of an otherwise protected __DATA segment, it may be better to create a complete new data/code segment just for the self-modifying code with rwx permissions from the start. This allows still protecting the rest of data by the operating system, and requires no non-standard tools.

Related

Cannot modify data segment register. When tried General Protection Error is thrown

I have been trying to create an ISR handler following this
tutorial by James Molloy but I got stuck. Whenever I throw a software interrupt, general purpose registers and the data segment register is pushed onto the stack with the variables automatically pushed by the CPU. Then the data segment is changed to the value of 0x10 (Kernel Data Segment Descriptor) so the privilege levels are changed. Then after the handler returns those values are poped. But whenever the value in ds is changed a GPE is thrown with the error code 0x2544 and after a few seconds the VM restarts. (linker and compiler i386-elf-gcc , assembler nasm)
I tried placing hlt instructions in between instructions to locate which instruction was throwing the GPE. After that I was able to find out that the the `mov ds,ax' instruction. I tried various things like removing the stack which was initialized by the bootstrap code to deleting the privilege changing parts of the code. The only way I can return from the common stub is to remove the parts of my code which change the privilege levels but as I want to move towards user mode I still want them to stay.
Here is my common stub:
isr_common_stub:
pusha ; Pushes edi,esi,ebp,esp,ebx,edx,ecx,eax
xor eax,eax
mov ax, ds ; Lower 16-bits of eax = ds.
push eax ; save the data segment descriptor
mov ax, 0x10 ; load the kernel data segment descriptor
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
call isr_handler
xor eax,eax
pop eax
mov ds, ax ; This is the instruction everything fails;
mov es, ax
mov fs, ax
mov gs, ax
popa
iret
My ISR handler macros:
extern isr_handler
%macro ISR_NOERRCODE 1
global isr%1 ; %1 accesses the first parameter.
isr%1:
cli
push byte 0
push %1
jmp isr_common_stub
%endmacro
%macro ISR_ERRCODE 1
global isr%1
isr%1:
cli
push byte %1
jmp isr_common_stub
%endmacro
ISR_NOERRCODE 0
ISR_NOERRCODE 1
ISR_NOERRCODE 2
ISR_NOERRCODE 3
...
My C handler which results in "Received interrupt: 0xD err. code 0x2544"
#include <stdio.h>
#include <isr.h>
#include <tty.h>
void isr_handler(registers_t regs) {
printf("ds: %x \n" ,regs.ds);
printf("Received interrupt: %x with err. code: %x \n", regs.int_no, regs.err_code);
}
And my main function:
void kmain(struct multiboot *mboot_ptr) {
descinit(); // Sets up IDT and GDT
ttyinit(TTY0); // Sets up the VGA Framebuffer
asm volatile ("int $0x1"); // Triggers a software interrupt
printf("Wow"); // After that its supposed to print this
}
As you can see the code was supposed to output,
ds: 0x10
Received interrupt: 0x1 with err. code: 0
but results in,
...
ds: 0x10
Received interrupt: 0xD with err. code: 0x2544
ds: 0x10
Received interrupt: 0xD with err. code: 0x2544
...
Which goes on until the VM restarts itself.
What am I doing wrong?

The code isn't complete but I'm going to guess what you are seeing is a result of a well known bug in James Molloy's OSDev tutorial. The OSDev community has compiled a list of known bugs in an errata list. I recommend reviewing and fixing all the bugs mentioned there. Specifically in this case I believe the bug that is causing problems is this one:
Problem: Interrupt handlers corrupt interrupted state
This article previously told you to know the ABI. If you do you will
see a huge problem in the interrupt.s suggested by the tutorial: It
breaks the ABI for structure passing! It creates an instance of the
struct registers on the stack and then passes it by value to the
isr_handler function and then assumes the structure is intact
afterwards. However, the function parameters on the stack belongs to
the function and it is allowed to trash these values as it sees fit
(if you need to know whether the compiler actually does this, you are
thinking the wrong way, but it actually does). There are two ways
around this. The most practical method is to pass the structure as a
pointer instead, which allows you to explicitly edit the register
state when needed - very useful for system calls, without having the
compiler randomly doing it for you. The compiler can still edit the
pointer on the stack when it's not specifically needed. The second
option is to make another copy the structure and pass that
The problem is that the 32-bit System V ABI doesn't guarantee that data passed by value will be unmodified on the stack! The compiler is free to reuse that memory for whatever purposes it chooses. The compiler probably generated code that trashed the area on the stack where DS is stored. When DS was set with the bogus value it crashed. What you should be doing is passing by reference rather than value. I'd recommend these code changes in the assembly code:
irq_common_stub:
pusha
mov ax, ds
push eax
mov ax, 0x10 ;0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
push esp ; At this point ESP is a pointer to where GS (and the rest
; of the interrupt handler state resides)
; Push ESP as 1st parameter as it's a
; pointer to a registers_t
call irq_handler
pop ebx ; Remove the saved ESP on the stack. Efficient to just pop it
; into any register. You could have done: add esp, 4 as well
pop ebx
mov ds, bx
mov es, bx
mov fs, bx
mov gs, bx
popa
add esp, 8
sti
iret
And then modify irq_handler to use registers_t *regs instead of registers_t regs :
void irq_handler(registers_t *regs) {
if (regs->int_no >= 40) port_byte_out(0xA0, 0x20);
port_byte_out(0x20, 0x20);
if (interrupt_handlers[regs->int_no] != 0) {
interrupt_handlers[regs->int_no](*regs);
}
else
{
klog("ISR: Unhandled IRQ%u!\n", regs->int_no);
}
}
I'd actually recommend each interrupt handler take a pointer to registers_t to avoid unnecessary copying. If your interrupt handlers and the interrupt_handlers array used function that took registers_t * as the parameter (instead of registers_t) then you'd modify the code:
interrupt_handlers[r->int_no](*regs);
to be:
interrupt_handlers[r->int_no](regs);
Important: You have to make these same type of changes for your ISR handlers as well. Both the IRQ and ISR handlers and associated code have this same problem.

gcc with intel x86-32 bit assembly : accessing C function arguments

I am doing an operating system implementation work.
Here's the code first :
//generate software interrupt
void generate_interrupt(int n) {
asm("mov al, byte ptr [n]");
asm("mov byte ptr [genint+1], al");
asm("jmp genint");
asm("genint:");
asm("int 0");
}
I am compiling above code with -masm=intel option in gcc. Also,
this is not complete code to generate software interrupt.
My problem is I am getting error as n undefined, how do I resolve it, please help?
Also it promts error at link time not at compile time, below is an image

When you are using GCC, you must use GCC-style extended asm to access variables declared in C, even if you are using Intel assembly syntax. The ability to write C variable names directly into an assembly insert is a feature of MSVC, which GCC does not copy.
For constructs like this, it is also important to use a single assembly insert, not several in a row; GCC can and will rearrange assembly inserts relative to the surrounding code, including relative to other assembly inserts, unless you take specific steps to prevent it.
This particular construct should be written
void generate_interrupt(unsigned char n)
{
asm ("mov byte ptr [1f+1], %0\n\t"
"jmp 1f\n"
"1:\n\t"
"int 0"
: /* no outputs */ : "r" (n));
}
Note that I have removed the initial mov and any insistence on involving the A register, instead telling GCC to load n into any convenient register for me with the "r" input constraint. It is best to do as little as possible in an assembly insert, and to leave the choice of registers to the compiler as much as possible.
I have also changed the type of n to unsigned char to match the actual requirements of the INT instruction, and I am using the 1f local label syntax so that this works correctly if generate_interrupt is made an inline function.
Having said all that, I implore you to find an implementation strategy for your operating system that does not involve self-modifying code. Well, unless you plan to get a whole lot more use out of the self-modifications, anyway.

This isn't an answer to your specific question about passing parameters into inline assembly (see #zwol's answer). This addresses using self modifying code unnecessarily for this particular task.
Macro Method if Interrupt Numbers are Known at Compile-time
An alternative to using self modifying code is to create a C macro that generates the specific interrupt you want. One trick is you need to a macro that converts a number to a string. Stringize macros are quite common and documented in the GCC documentation.
You could create a macro GENERATE_INTERRUPT that looks like this:
#define STRINGIZE_INTERNAL(s) #s
#define STRINGIZE(s) STRINGIZE_INTERNAL(s)
#define GENERATE_INTERRUPT(n) asm ("int " STRINGIZE(n));
STRINGIZE will take a numeric value and convert it into a string. GENERATE_INTERRUPT simply takes the number, converts it to a string and appends it to the end of the of the INT instruction.
You use it like this:
GENERATE_INTERRUPT(0);
GENERATE_INTERRUPT(3);
GENERATE_INTERRUPT(255);
The generated instructions should look like:
int 0x0
int3
int 0xff
Jump Table Method if Interrupt Numbers are Known Only at Run-time
If you need to call interrupts only known at run-time then one can create a table of interrupt calls (using int instruction) followed by a ret. generate_interrupt would then simply retrieve the interrupt number off the stack, compute the position in the table where the specific int can be found and jmp to it.
In the following code I get GNU assembler to generate the table of 256 interrupt call each followed by a ret using the .rept directive. Each code fragment fits in 4 bytes. The result code generation and the generate_interrupt function could look like:
/* We use GNU assembly to create a table of interrupt calls followed by a ret
* using the .rept directive. 256 entries (0 to 255) are generated.
* generate_interrupt is a simple function that takes the interrupt number
* as a parameter, computes the offset in the interrupt table and jumps to it.
* The specific interrupted needed will be called followed by a RET to return
* back from the function */
extern void generate_interrupt(unsigned char int_no);
asm (".pushsection .text\n\t"
/* Generate the table of interrupt calls */
".align 4\n"
"int_jmp_table:\n\t"
"intno=0\n\t"
".rept 256\n\t"
"\tint intno\n\t"
"\tret\n\t"
"\t.align 4\n\t"
"\tintno=intno+1\n\t"
".endr\n\t"
/* generate_interrupt function */
".global generate_interrupt\n" /* Give this function global visibility */
"generate_interrupt:\n\t"
#ifdef __x86_64__
"movzx edi, dil\n\t" /* Zero extend int_no (in DIL) across RDI */
"lea rax, int_jmp_table[rip]\n\t" /* Get base of interrupt jmp table */
"lea rax, [rax+rdi*4]\n\t" /* Add table base to offset = jmp address */
"jmp rax\n\t" /* Do sepcified interrupt */
#else
"movzx eax, byte ptr 4[esp]\n\t" /* Get Zero extend int_no (arg1 on stack) */
"lea eax, int_jmp_table[eax*4]\n\t" /* Compute jump address */
"jmp eax\n\t" /* Do specified interrupt */
#endif
".popsection");
int main()
{
generate_interrupt (0);
generate_interrupt (3);
generate_interrupt (255);
}
If you were to look at the generated code in the object file you'd find the interrupt call table (int_jmp_table) looks similar to this:
00000000 <int_jmp_table>:
0: cd 00 int 0x0
2: c3 ret
3: 90 nop
4: cd 01 int 0x1
6: c3 ret
7: 90 nop
8: cd 02 int 0x2
a: c3 ret
b: 90 nop
c: cc int3
d: c3 ret
e: 66 90 xchg ax,ax
10: cd 04 int 0x4
12: c3 ret
13: 90 nop
...
[snip]
Because I used .align 4 each entry is padded out to 4 bytes. This makes the address calculation for the jmp easier.

static C variable not getting initialized

I have one file-level static C variable that isn't getting initialized.
const size_t VGA_WIDTH = 80;
const size_t VGA_HEIGHT = 25;
static uint16_t* vgat_buffer = (uint16_t*)0x62414756; // VGAb
static char vgat_initialized= '\0';
In particular, vgat_initialized isn't always 0 the first time it is accessed. (Of course, the problem only appears on certain machines.)
I'm playing around with writing my own OS, so I'm pretty sure this is a problem with my linker script; but, I'm not clear how exactly the variables are supposed to be organized in the image produced by the linker (i.e., I'm not sure if this variable is supposed to go in .data, .bss, some other section, etc.)
VGA_WIDTH and VGA_HEIGHT get placed in the .rodata section as expected.
vgat_buffer is placed in the .data section, as expected (By initializing this variable to 0x62417656, I can clearly see where the linker places it in the resulting image file.)
I can't figure out where vgat_initialized is supposed to go. I've included the relevant parts of the assembly file below. From what I understand, the .comm directive is supposed to allocate space for the variable in the data section; but, I can't tell where. Looking in the linker's map file didn't provide any clues either.
Interestingly enough, if I change the initialization to
static char vgat_initialized= 'x';
everything works as expected: I can clearly see where the variable is placed in the resulting image file (i.e., I can see the x in the hexdump of the image file).
Assembly code generated from the C file:
.text
.LHOTE15:
.local buffer.1138
.comm buffer.1138,100,64
.local buffer.1125
.comm buffer.1125,100,64
.local vgat_initialized
.comm vgat_initialized,1,1
.data
.align 4
.type vgat_buffer, #object
.size vgat_buffer, 4
vgat_buffer:
.long 1648445270
.globl VGA_HEIGHT
.section .rodata
.align 4
.type VGA_HEIGHT, #object
.size VGA_HEIGHT, 4
VGA_HEIGHT:
.long 25
.globl VGA_WIDTH
.align 4
.type VGA_WIDTH, #object
.size VGA_WIDTH, 4
VGA_WIDTH:
.long 80
.ident "GCC: (GNU) 4.9.2"

compilers can conform to their own names for sections certainly but using the common .data, .text, .rodata, .bss that we know from specific compilers, this should land in .bss.
But that doesnt in any way automatically zero it out. There needs to be a mechanism, sometimes depending on your toolchain the toolchain takes care of it and creates a binary that in addition to .data, .rodata (and naturally .text) being filled in will fill in .bss in the binary. But depends on a few things, primarily is this a simple ram only image, is everything living under one memory space definition in the linker script.
you could for example put .data after .bss in the linker script and depending the binary format you use and/or tools that convert that you could end up with zeroed memory in the binary without any other work.
Normally though you should expect to using toolchain specific (linker scripts are linker specific not to be assumed to be universal to all tools) mechanism for defining where .bss is from your perspective, then some form of communication from the linker as to where it starts and what size, that information is used by the bootstrap whose job it is to zero it in that case, and one can assume it is always the bootstrap's job to zero .bss with naturally some exceptions. Likewise if the binary is meant to be on a read only media (rom, flash, etc) but .data, and .bss are read/write you need to have .data in its entirety on this media then someone has to copy it to its runtime position in ram, and .bss is either part of that depending on the toolchain and how you used it or the start address and size are on the read only media and someone has to zero that space at some point pre-main(). Here again this is the job of the bootstrap. Set the stack pointer, move .data if needed, zero .bss are the typical minimal jobs of the bootstrap, you can shortcut them in special cases or avoid using .data or .bss.
Since it is the linkers job to take all the little .data and .bss (and other) definitions from the objects being linked and combine them per the directions from the user (linker script, command line, whatever that tool uses), the linker ultimately knows.
In the case of gcc you use what I would call variables that are defined in the linker script, the linker script can fill in these values with matching variable/label names for the assembler such that a generic bootstrap can be used and you dont have to do any more work than that.
Like this but possibly more complicated
MEMORY
{
bob : ORIGIN = 0x8000, LENGTH = 0x1000
ted : ORIGIN = 0xA000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > bob
__data_rom_start__ = .;
.data : {
__data_start__ = .;
*(.data*)
} > ted AT > bob
__data_end__ = .;
__data_size__ = __data_end__ - __data_start__;
.bss : {
__bss_start__ = .;
*(.bss*)
} > bob
__bss_end__ = .;
__bss_size__ = __bss_end__ - __bss_start__;
}
then you can pull these into the assembly language bootstrap
.globl bss_start
bss_start: .word __bss_start__
.globl bss_end
bss_end: .word __bss_end__
.word __bss_size__
.globl data_rom_start
data_rom_start:
.word __data_rom_start__
.globl data_start
data_start:
.word __data_start__
.globl data_end
data_end:
.word __data_end__
.word __data_size__
and then write some code to operate on those as needed for your design.
you can simply put things like that in a linked in assembly language file without other code using them and assemble, compile other code and link and then the disassembly or other tools you prefer will show you what the linker generated, tweak that until you are satisfied then you can write or borrow or steal bootstrap code to use them.
for bare metal I prefer to not completely conform to the standard with my code, dont have any .data and dont expect .bss to be zero, so my bootstrap sets the stack pointer and calls main, done. For an operating system, you should conform. the toolchains already have this solved for the native platform, but if you are taking over that with your own linker script and boostrap then you need to deal with it, if you want to use an existing toolchains solution for an existing operating system then...done...just do that.

This answer is simply an extension of the others. As has been mentioned C standard has rules about initialization:
10) If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules;
if it is a union, the first named member is initialized (recursively) according to these rules.
The problem in your code is that a computers memory may not always be initialized to zero. It is up to you to make sure the BSS section is initialized to zero in a free standing environment (like your OS and bootloader).
The BSS sections usually don't (by default) take up space in a binary file and usually occupy memory in the area beyond the limits of the code and data that appears in the binary. This is done to reduce the size of the binary that has to be read into memory.
I know you are writing an OS for x86 booting with legacy BIOS. I know that you are using GCC from your other recent questions. I know you are using GNU assembler for part of your bootloader. I know that you have a linker script, but I don't know what it looks like. The usual mechanism to do this is via a linker script that places the BSS data at the end, and creates start and end symbols to define the address extents of the section. Once these symbols are defined by the linker they can be used by C code (or assembly code) to loop through the region and set it to zero.
I present a reasonably simple MCVE that does this. The code reads an extra sector with the kernel with Int 13h/AH=2h; enables the A20 line (using fast A20 method); loads a GDT with 32-bit descriptors; enables protected mode; completes the transition into 32-bit protected mode; and then calls a kernel entry point in C called kmain. kmain calls a C function called zero_bss that initializes the BSS section based on the starting and ending symbols (__bss_start and __bss_end) generated by a custom linker script.
boot.S:
.extern kmain
.globl mbrentry
.code16
.section .text
mbrentry:
# If trying to create USB media, a BPB here may be needed
# At entry DL contains boot drive number
# Segment registers to zero
xor %ax, %ax
mov %ax, %ds
mov %ax, %es
# Set stack to grow down from area under the place the bootloader was loaded
mov %ax, %ss
mov $0x7c00, %sp
cld # Ensure forward direction of MOVS/SCAS/LODS instructions
# which is required by generated C code
# Load kernel into memory
mov $0x02, %ah # Disk read
mov $1, %al # Read 1 sector
xor %ch, %ch # Cylinder 0
xor %dh, %dh # Head 0
mov $2, %cl # Start reading from second sector
mov $0x7e00, %bx # Load kernel at 0x7e00
int $0x13
# Quick and dirty A20 enabling. May not work on all hardware
a20fast:
in $0x92, %al
or $2, %al
out %al, $0x92
loadgdt:
cli # Turn off interrupts until a Interrupt Vector
# Table (IVT) is set
lgdt (gdtr)
mov %cr0, %eax
or $1, %al
mov %eax, %cr0 # Enable protected mode
jmp $0x08,$init_pm # FAR JMP to next instruction to set
# CS selector with a 32-bit code descriptor and to
# flush the instruction prefetch queue
.code32
init_pm:
# Set remaining 32-bit selectors
mov $DATA_SEG, %ax
mov %ax, %ds
mov %ax, %es
mov %ax, %fs
mov %ax, %gs
mov %ax, %ss
# Start executing kernel
call kmain
cli
loopend: # Infinite loop when finished
hlt
jmp loopend
.align 8
gdt_start:
.long 0 # null descriptor
.long 0
gdt_code:
.word 0xFFFF # limit low
.word 0 # base low
.byte 0 # base middle
.byte 0b10011010 # access
.byte 0b11001111 # granularity/limit high
.byte 0 # base high
gdt_data:
.word 0xFFFF # limit low (Same as code)
.word 0 # base low
.byte 0 # base middle
.byte 0b10010010 # access
.byte 0b11001111 # granularity/limit high
.byte 0 # base high
end_of_gdt:
gdtr:
.word end_of_gdt - gdt_start - 1
# limit (Size of GDT)
.long gdt_start # base of GDT
CODE_SEG = gdt_code - gdt_start
DATA_SEG = gdt_data - gdt_start
kernel.c:
#include <stdint.h>
extern uintptr_t __bss_start[];
extern uintptr_t __bss_end[];
/* Zero the BSS section 4-bytes at a time */
static void zero_bss(void)
{
uint32_t *memloc = __bss_start;
while (memloc < __bss_end)
*memloc++ = 0;
}
int kmain(){
zero_bss();
return 0;
}
link.ld
ENTRY(mbrentry)
SECTIONS
{
. = 0x7C00;
.mbr : {
boot.o(.text);
boot.o(.*);
}
. = 0x7dfe;
.bootsig : {
SHORT(0xaa55);
}
. = 0x7e00;
.kernel : {
*(.text*);
*(.data*);
*(.rodata*);
}
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss*);
}
. = ALIGN(4);
__bss_end = .;
/DISCARD/ : {
*(.eh_frame);
*(.comment);
}
}
To compile, link and generate a binary file that can be used in a disk image from this code, you could use commands like:
as --32 boot.S -o boot.o
gcc -c -m32 -ffreestanding -O3 kernel.c
gcc -ffreestanding -nostdlib -Wl,--build-id=none -m32 -Tlink.ld \
-o boot.elf -lgcc boot.o kernel.o
objcopy -O binary boot.elf boot.bin

The C standard says that static variables must be zero-initialized, even in absence of explicit initializer, so static char vgat_initialized= '\0'; is equivalent to static char vgat_initialized;.
In ELF and other similar formats, the zero-initialized data, such as this vgat_initialized goes to the .bss section. If you load such an executable yourself into memory, you need to explicitly zero the .bss part of the data segment.

The other answers are very complete and very helpful. In turns out that, in my specific case, I just needed to know that static variables initialized to 0 were put in .bss and not .data. Adding a .bss section to the linker script placed a zeroed-out section of memory in the image which solved the problem.

Why syscall doesn't work?

I'm on MAC OSX and I'm trying to call through assembly the execve syscall..
His opcode is 59 .
In linux I have to set opcode into eax, then parameters into the others registers, but here I have to put the opcode into eax and push parameters into the stack from right to left.
So I need execve("/bin/sh",NULL,NULL), I found somewhere that with assembly null=0, so I put null into 2nd and 3rd parameters.
global start
section .text
start:
jmp string
main:
; 59 opcode
; int execve(char *fname, char **argp, char **envp);
pop ebx ;stringa
push 0x0 ;3rd param
push 0x0 ;2nd param
push ebx ;1st param
add eax,0x3b ;execve opcode
int 0x80 ;interupt
sub eax,0x3a ; exit opcode
int 0x80
string:
call main
db '/bin/sh',0
When I try to execute it say:
Bad system call: 12

32-bit programs on BSD (on which OS/X is based) requires you to push an extra 4 bytes onto the stack if you intend to call int 0x80 directly. From the FreeBSD documentation you will find this:
By default, the FreeBSD kernel uses the C calling convention. Further, although the kernel is accessed using int 80h, it is assumed the program will call a function that issues int 80h, rather than issuing int 80h directly.
[snip]
But assembly language programmers like to shave off cycles. The above example requires a call/ret combination. We can eliminate it by pushing an extra dword:
open:
push dword mode
push dword flags
push dword path
mov eax, 5
push eax ; Or any other dword
int 80h
add esp, byte 16
When calling int 0x80 you need to adjust the stack pointer by 4. Pushing any value will achieve this. In the example they just do a push eax. Before your calls to int 0x80 push 4 bytes onto the stack.
Your other problem is that add eax,0x3b for example requires EAX to already be zero which is almost likely not the case. To fix that add an xor eax, eax to the code.
The fixes could look something like:
global start
section .text
start:
jmp string
main:
; 59 opcode
; int execve(char *fname, char **argp, char **envp);
xor eax, eax ;zero EAX
pop ebx ;stringa
push 0x0 ;3rd param
push 0x0 ;2nd param
push ebx ;1st param
add eax,0x3b ;execve opcode
push eax ;Push a 4 byte value after parameters per calling convention
int 0x80 ;interupt
sub eax,0x3a ; exit opcode
push eax ;Push a 4 byte value after parameters per calling convention
; in this case though it won't matter since the system call
; won't be returning
int 0x80
string:
call main
db '/bin/sh',0
Shellcode
Your code is actually called the JMP/CALL/POP method and is used for writing exploits. Are you writing an exploit or did you just find this code online? If it is intended to be used as shell code you would need to avoid putting a 0x00 byte in the output string. push 0x00 will encode 0x00 bytes in the generated code. To avoid this we can use EAX which we are now zeroing out and push it on the stack. As well you won't be able to NUL terminate the string so you'd have to move a NUL(0) character into the string. One way after zeroing EAX and popping EBX is to move zero to the end of the string manually with something like mov [ebx+7], al. Seven is the index after the end of the string /bin/sh. Your code would then look like this:
global start
section .text
start:
jmp string
main:
; 59 opcode
; int execve(char *fname, char **argp, char **envp);
xor eax, eax ;Zero EAX
pop ebx ;stringa
mov [ebx+7], al ;append a zero onto the end of the string '/bin/sh'
push eax ;3rd param
push eax ;2nd param
push ebx ;1st param
add eax,0x3b ;execve opcode
push eax
int 0x80 ;interupt
sub eax,0x3a ; exit opcode
push eax
int 0x80
string:
call main
db '/bin/sh',1

You are using a 64 bit syscall numbers and a 32 bit instruction to jump to the syscall. That is not going to work.
For 32 bit users:
opcode for Linux/MacOS execve: 11
instruction to call syscall: int 0x80
For 64 bit users:
opcode for Linux execve: 59 (MacOS 64-bit system calls also have a high bit set).
instruction to call syscall: syscall
The method for passing args to system calls is also different: 32-bit uses the stack, 64-bit uses similar registers to the function-calling convention.

Segmentation Fault 11 linking os x 32-bit assembler

UPDATE: Sure enough, it was a bug in the latest version of nasm. I "downgraded" and after fixing my code as shown in the answer I accepted, everything is working properly. Thanks, everyone!
I'm having problems with what should be a very simple program in 32-bit assembler on OS X.
First, the code:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack
push hello
call _printf
push 0
call _exit
It assembles and links, but when I run the executable it crashes with a segmentation fault: 11.
The command lines to assemble and link are:
nasm -f macho32 hello32x.asm -o hello32x.o
I know the -o there is not 100 percent necessary
Linking:
ld -lc -arch i386 hello32x.o -o hello32x
When I run it into lldb to debug it, everything is fine until it enters into the call to _printf, where it crashes as shown below:
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x00001fac hello32x`main + 8, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x00001fac hello32x`main + 8
hello32x`main:
-> 0x1fac <+8>: calll 0xffffffff991e381e
0x1fb1 <+13>: pushl $0x0
0x1fb3 <+15>: calll 0xffffffff991fec84
0x1fb8: addl %eax, (%eax)
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x991e381e libsystem_c.dylib`vfprintf + 49, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x991e381e libsystem_c.dylib`vfprintf + 49
libsystem_c.dylib`vfprintf:
-> 0x991e381e <+49>: xchgb %ah, -0x76f58008
0x991e3824 <+55>: popl %esp
0x991e3825 <+56>: andb $0x14, %al
0x991e3827 <+58>: movl 0xc(%ebp), %ecx
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x991e381e libsystem_c.dylib`vfprintf + 49, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x890a7ff8)
frame #0: 0x991e381e libsystem_c.dylib`vfprintf + 49
libsystem_c.dylib`vfprintf:
-> 0x991e381e <+49>: xchgb %ah, -0x76f58008
0x991e3824 <+55>: popl %esp
0x991e3825 <+56>: andb $0x14, %al
0x991e3827 <+58>: movl 0xc(%ebp), %ecx
As you can see toward the bottom, it stops due to a bad access error.

16-byte Stack Alignment
One serious issue with your code is stack alignment. 32-bit OS/X code requires 16-byte stack alignment at the point you make a CALL. The Apple IA-32 Calling Convention says this:
The function calling conventions used in the IA-32 environment are the same as those used in the System V IA-32 ABI, with the following exceptions:
Different rules for returning structures
The stack is 16-byte aligned at the point of function calls
Large data types (larger than 4 bytes) are kept at their natural alignment
Most floating-point operations are carried out using the SSE unit instead of the x87 FPU, except when operating on long double values. (The IA-32 environment defaults to 64-bit internal precision for the x87 FPU.)
You subtract 12 from ESP to align the stack to a 16 byte boundary (4 bytes for return address + 12 = 16). The problem is that when you make a CALL to a function the stack MUST be 16 bytes aligned just prior to the CALL itself. Unfortunately you push 4 bytes before the call to printf and exit. This misaligns the stack by 4, when it should be aligned to 16 bytes. You'll have to rework the code with proper alignment. As well you must clean up the stack after you make a call. If you use PUSH to put parameters on the stack you need to adjust ESP after your CALL to restore the stack to its previous state.
One naive way (not my recommendation) to fix the code would be to do this:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 8
push hello ; 4(return address)+ 8 + 4 = 16 bytes stack aligned
call _printf
add esp, 4 ; Remove arguments
push 0 ; 4 + 8 + 4 = 16 byte alignment again
call _exit ; This will not return so no need to remove parameters after
The code above works because we can take advantage of the fact that both functions (exit and printf) require exactly one DWORD being placed on the stack for parameters. 4 bytes for main's return address, 8 for the stack adjustment we made, 4 for the DWORD parameter = 16 byte alignment.
A better way to do this is to compute the amount of stack space you will need for all your stack based local variables (in this case 0) in your main function, plus the maximum number of bytes you will need for any parameters to function calls made by main and then make sure you pad enough bytes to make the value evenly divisible by 12. In our case the maximum number of bytes needed to be pushed for any one given function call is 4 bytes. We then add 8 to 4 (8+4=12) to become evenly divisible by 12. We then subtract 12 from ESP at the start of our function.
Instead of using PUSH to put parameters on the stack you can now move the parameters directly onto the stack into the space we have reserved. Because we don't PUSH the stack doesn't get misaligned. Since we didn't use PUSH we don't need to fix ESP after our function calls. The code could then look something like:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
If you wanted to pass multiple parameters you place them manually on the stack as we did with a single parameter. If we wanted to print an integer 42 as part of our call to printf we could do it this way:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
When run we should get:
Hello, world 42
16-byte Stack Alignment and a Stack Frame
If you are looking to create a function with a typical stack frame then the code in the previous section has to be adjusted. Upon entry to a function in a 32-bit application the stack is misaligned by 4 bytes because the return address was placed on the stack. A typical stack frame prologue looks like:
push ebp
mov ebp, esp
Pushing EBP into the stack after entry to your function still results in a misaligned stack, but it is misaligned now by 8 bytes (4 + 4).
Because of that the code must subtract 8 from ESP rather than 12. As well when determining the space needed to hold parameters, local stack variables, and pad bytes for alignment the stack allocation size will have to be evenly divisible by 8, not by 12. Code with a stack frame could look like:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
push ebp
mov ebp, esp ; Set up stack frame
sub esp, 8 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
xor eax, eax ; Return value = 0
mov esp, ebp
pop ebp ; Remove stack frame
ret ; We linked with C library that calls _main
; after initialization. We can do a RET to
; return back to the C runtime code that will
; exit the program and return the value in EAX
; We can do this instead of calling _exit
Because you link with the C library on OS/X it will provide an entry point and do initialization before calling _main. You can call _exit but you can also do a RET instruction with the program's return value in EAX.
Yet Another Potential NASM Bug?
I discovered that NASM v2.12 installed via MacPorts on El Capitan seems to generate incorrect relocation entries for _printf and _exit, and when linked to a final executable the code doesn't work as expected. I observed almost the identical errors you did with your original code.
The first part of my answer still applies about stack alignment, however it appears you will need to work around the NASM issue as well. One way to do this install the NASM that comes with the latest XCode command line tools. This version is much older and only supports Macho-32, and doesn't support the default directive. Using my previous stack aligned code this should work:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
;default rel ; This directive isn't supported in older versions of NASM
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
To assemble with NASM and link with LD you could use:
/usr/bin/nasm -f macho hello32x.asm -o hello32x.o
ld -macosx_version_min 10.8 -no_pie -arch i386 -o hello32x hello32x.o -lc
Alternatively you could link with GCC:
/usr/bin/nasm -f macho hello32x.asm -o hello32x.o
gcc -m32 -Wl,-no_pie -o hello32x hello32x.o
/usr/bin/nasm is the location of the XCode command line tools version of NASM that Apple distributes. The version I have on El Capitan with latest XCode command line tools is:
NASM version 0.98.40 (Apple Computer, Inc. build 11) compiled on Jan 14 2016
I don't recommend NASM version 2.11.08 because it has a serious bug related to macho64 format. I recommend 2.11.09rc2. I have tested that version here and it does seem to work properly with the code above.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Self-modifying code on macOS v10.15 (Catalina) / x64 - macos

Related

Cannot modify data segment register. When tried General Protection Error is thrown

gcc with intel x86-32 bit assembly : accessing C function arguments

static C variable not getting initialized

Why syscall doesn't work?

Segmentation Fault 11 linking os x 32-bit assembler

Categories

Resources