What is the correct GNU assembler syntax for doing the following:
.section .data2
.asciz "******* Output Data ********"
total_sectors_written: .word 0x0
max_buffer_sectors: .word ((0x9fc00 - $data_buffer) / 512) # <=== need help here
.align 512
data_buffer: .asciz "<The actual data will overwrite this>"
Specifically, I'm writing a toy OS. The code above is in 16-bit real mode. I'm setting up a data buffer that will be dumped back to the boot disk. I want to calculate the number of sectors there are between where data_buffer gets placed in memory, and the upper bound of that data buffer. (Address 0x9fc00 is where the buffer would run into RAM reserved for other purposes.)
I know I could write assembly code to calculate this; but, since it is a constant known at build time, I'm curious if I can get the assembler to calculate it for me.
I'm running into three specific problems:
(1) If I use $data_buffer I get this error:
os_src/boot.S: Assembler messages:
os_src/boot.S:497: Error: missing ')'
os_src/boot.S:497: Error: can't resolve `L0' {*ABS* section} - `$data_buffer' {*UND* section}
which I find confusing, because I should use $ when I want the memory address of a label, correct?
(2) If I use data_buffer instead of $data_buffer, I get this error:
os_src/boot.S: Assembler messages:
os_src/boot.S:497: Error: missing ')'
os_src/boot.S:497: Error: value of 653855 too large for field of 2 bytes at 31
make: *** [obj/boot/dd_test.o] Error 1
which seems to suggest that the assembler is complaining about the size of the intermediate value (which does not need to fit in a 16-bit word).
(3) And, of course, what's up with the missing ')'?
When you use an expression like this in GNU assembler, it has to resolve to an absolute value at assembly time. The assembler doesn't know where the code will ultimately be placed in memory; that is the linker's job. Because of that, data_buffer's absolute address isn't known until linking is done, so it is considered relocatable. If you take an absolute value like 0x9fc00 and subtract a relocatable value from it, you get a relocatable value. Relocatable values can't be used in constant (absolute) expressions. (As for the $: it only marks immediate operands of instructions. In a data directive such as .word, $data_buffer is read as a different, undefined symbol, hence the *UND* in your first error.)
All is not lost. The linker itself will know the absolute address once it arranges everything in memory. You seem to suggest you already use a linker script which means the work you have to do is minimal. You can use the linker to compute the value of max_buffer_sectors.
Your linker script will have a SECTIONS directive like:
SECTIONS
{
[your section contents here]
}
You can create a linker symbol max_buffer_sectors with something like:
SECTIONS
{
max_buffer_sectors = (0x9fc00 - (data_buffer)) / 512;
[your section contents here]
}
This lets the linker compute the value, since it knows data_buffer's absolute address in memory.
Your GNU assembly file will need a bit of tweaking:
.globl data_buffer
.section .data2
.asciz "******* Output Data ********"
total_sectors_written: .word 0x0
.align 512
data_buffer: .asciz "<The actual data will overwrite this>"
You'll notice I used .globl data_buffer. This exports the symbol and makes it global so that the linker can use it.
You can then use the symbol max_buffer_sectors in code like:
mov $max_buffer_sectors, %ax
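To sanity-check the value the linker produces, the same arithmetic can be sketched in C. This is only an illustration: the function name and the 0x7e00 address used in the note below are assumptions, not values taken from your build.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound of the buffer (from the question) and the sector size. */
#define BUFFER_LIMIT 0x9fc00u
#define SECTOR_SIZE  512u

/* Mirrors the linker expression (0x9fc00 - data_buffer) / 512.
   data_buffer_addr is a hypothetical stand-in for the address the
   linker assigns to data_buffer. */
static uint32_t max_buffer_sectors(uint32_t data_buffer_addr)
{
    assert(data_buffer_addr <= BUFFER_LIMIT);
    return (BUFFER_LIMIT - data_buffer_addr) / SECTOR_SIZE;
}
```

For example, max_buffer_sectors(0x7e00) evaluates to 1215, which fits in a 16-bit word even though the intermediate difference (0x97e00) does not.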
I have a linker script in which I have defined a section for containing the checksum of a software image. Something like:
...
.my_checksum :
{
__checksum_is_here = .;
KEEP (*(.my_checksum))
. = ALIGN(4);
_sw_image_code_end = .;
} > IMAGE
...
The checksum is placed into that section by using objcopy --update-section.
I build an elf file by using the arm gcc compiler, and I can see this section and its value within it:
> arm-none-eabi-objdump -h my_elf_file.elf
...
0 .text 0001496c 08010000 08010000 00010000 2**4
...
7 .my_checksum 00000004 080250c0 080250c0 000350c0 2**2
...
// Notice that 000350c0 is the file offset and 080250c0 is the LMA.
// The starting LMA is 08010000
And I can retrieve its value:
> xxd -s 0x000350c0 -l 4 my_elf_file.elf
000350c0: 015e 028e // I have checked this value and it is correct.
Now I generate a bin file by executing
> arm-none-eabi-objcopy -O binary --gap-fill 0xFF -S my_elf_file.elf my_elf_file.bin
Now, if I try to read the checksum value again, using the difference between the checksum LMA and the first section LMA (see above):
> xxd -s 0x150c0 -l 4 my_elf_file.bin
The result I obtain here is different from the one obtained in the elf file, that is, the checksum section has been removed by objcopy. (That's what I think at least).
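The offset used with xxd above is the checksum's LMA minus the LMA of the first section. A minimal sketch of that arithmetic in C, assuming the raw binary starts at the lowest LMA (the function name is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a section inside a raw binary produced by objcopy -O binary,
   assuming the image starts at image_base_lma (the lowest LMA). */
static uint32_t bin_file_offset(uint32_t section_lma, uint32_t image_base_lma)
{
    assert(section_lma >= image_base_lma);
    return section_lma - image_base_lma;
}
```

With the values from the objdump listing, bin_file_offset(0x080250c0, 0x08010000) gives 0x150c0.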
Nevertheless, if I define this in my main.c file:
static volatile unsigned int __aux_checksum __attribute__((section(".my_checksum")));
...
int main() {
...
((void)__aux_checksum); // Avoid compiler/linker optimizations.
...
}
Now, if I replicate the same steps as above with the elf and bin files (using the proper offsets), I can retrieve the checksum from the bin file (elf and bin give the same result).
Questions
My first question is: I know that you can define a section using __attribute__((section)), but if you use a section already defined within the linker script, does this attribute change its behaviour and place the variable within that section, instead of creating a new one?
My second question is: Is this the only way to prevent objcopy from removing this particular section?
Let's answer your second question first:
Is this the only way to prevent objcopy from removing this particular section?
You need a concept documented in the GNU ld manual under SECTIONS.
4.6.8.1. Output Section Type
Each output section may have a type. The type is a keyword in parentheses. The following types are defined:
NOLOAD
The section should be marked as not loadable, so that it will not be loaded into memory when the program is run.
DSECT, COPY, INFO, OVERLAY
These type names are supported for backward compatibility, and are rarely used. They all have the same effect: the section should be marked as not allocatable, so that no memory is allocated for the section when the program is run.
The linker normally sets the attributes of an output section based on the input sections which map into it. You can override this by using the section type. For example, in the script sample below, the ROM section is addressed at memory location 0 and does not need to be loaded when the program is run. The contents of the ROM section will appear in the linker output file as usual.
SECTIONS {
ROM 0 (NOLOAD) : { … }
…
}
So what does that mean? Say you have debugging info in your objects. If you are burning a ROM image, you probably don't want to place the debugging info in the image. Likewise, the BSS segment is all zero and there is no need to store it in ROM, but you need to clear out RAM (at the load address) to make way for it. The 'init value' for the .data section is initialized from ROM but resides in RAM. The concepts are 'loadable' and 'allocatable', and an ELF file has flags for them. By default your .my_checksum gets no flags, i.e. it is neither allocated nor loadable, like debug info.
I know that you can define a section using __attribute__((section)), but if you use a section already defined within the linker script, does this attribute change its behaviour and place the variable within that section, instead of creating a new one?
From the above,
The linker normally sets the attributes of an output section based on the input sections which map into it.
Your input section's flags get inherited by your output section. So, by defining the variable, you have put in at least allocatable as a flag.
I would suggest that you just put your checksum at the end of either .text or .data. For instance, input sections such as .rodata (constant values) usually get put in the output .text. There is usually no need to invent another output section unless you want some bookkeeping that won't get into the final image. Your __checksum_is_here label is sufficient to find it, and you can look at this question on CRCs.
I'm trying to use .ascii directive in the gcc extended asm command but I keep getting compiler errors. What is the exact syntax for directives inside extended asm?
I tried the following options but none of them worked:
asm ("NOP;"
".ASCII ""ABC"""
);
I got "Error: junk at end of line, first unrecognized character is `/'"
asm ("NOP;"
".ASCII "ABC""
);
I got "Error: junk at end of line, first unrecognized character is `/'"
asm ("NOP;"
.ASCII "ABC"
);
I got "error: expected ‘:’ or ‘)’ before ‘/’ token"
The syntax for directives inside the asm is identical to writing GNU Assembler, so you can reference the GNU Assembler manual for the relevant syntax.
Example:
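#include <stdio.h>

int
main (void)
{
  char *string;

  /* .ascii does not append a terminator, so the NUL that puts()
     relies on is written explicitly as \0. */
  asm (".pushsection .rodata\n"
       "0:\n"
       "  .ascii \"Testing 1 2 3!\\0\"\n"
       "  .popsection\n"
       "  mov $0b, %0\n" : "=rm" (string));
  puts (string);
  return 0;
}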
In the example we use an extended asm to copy the address of a string to a char * and then pass that to puts to print the string.
The string needs to be placed into the appropriate linker section, not just added to the current one (usually the code section, i.e. .text). So you begin by pushing the section you want the string stored in onto the assembler's section stack. In the example I give, it's the read-only data section (.rodata), where most strings live. Then you pop the section off the section stack to get back to whatever section the compiler left you in, and do your operation with the string address. The trick is to use a local label like 0 to reference the string and let the assembler and linker compute the offset for you. This may require more work if you're building PIE or PIC code, depending on how much more complicated your references become or whether they require relocations.
I have this code
global start
section .text
start:
mov rax,0x2000004
mov rdi,1
mov rsi,msg
mov rdx,msg.len
syscall
mov rax,0x2000004
mov rdi,2
mov rsi,msgt
mov rdx,msgt.len
syscall
mov rax,0x2000004
mov rdi,3
mov rsi,msgtn
mov rdx,msgtn.len
syscall
mov rax,0x2000001
mov rdi,0
syscall
section .data
msg: db "This is a string",10
.len: equ $ - msg
var: db 1
msgt: db "output of 1+1: "
.len: equ $ - msgt
msgtn: db 1
.len: equ $ - msg
I want to print the variable msgtn. I tried msgt: db "output of 1+1", var
But the NASM assembler failed with:
second.s:35: error: Mach-O 64-bit format does not support 32-bit absolute addresses
Instead of the variable, I also tried "output of 1+1", [1+1], but I got:
second.s:35: error: expression syntax error
I also tried it without the brackets; no number was printed, only the string "1+1".
The command I used to assemble my program was:
/usr/local/Cellar/nasm/*/bin/nasm -f macho64 second.s && ld -macosx_version_min 10.7.0 second.o second.o
nasm -v shows:
NASM version 2.11.08 compiled on Nov 27 2015
OS X 10.9.5 with Intel core i5 (x86_64 assembly)
db directives let you put assemble-time-constant bytes into the object file (usually in the data section). You can use an expression as an argument, to have the assembler do some math for you at assemble time. Anything that needs to happen at run time needs to be done by instructions that you write, and that get run. It's not like C++ where a global variable can have a constructor that gets run at startup behind the scenes.
msgt: db "output of 1+1", var
would place those ascii characters, followed by (the low byte of?) the absolute address of var. You'd use this kind of thing (with dd or dq) to do something like this C: int var; int *global_ptr = &var;, where you have a global/static pointer variable that starts out initialized to point to another global/static variable. I'm not sure if MacOS X allows this with a 64bit pointer, or if it just refuses to do relocations for 32bit addresses. But that's why you're getting:
second.s:35: error: Mach-O 64-bit format does not support 32-bit absolute addresses
Notice that the numeric value of the pointer depends on where in virtual address space the code is loaded, so the address isn't strictly an assemble-time constant. The linker needs to mark things that need run-time relocation, like those 64-bit immediate-constant addresses you mov into registers (mov rsi,msg). See this answer for some information on the difference between that and lea rsi, [rel msg] to get the address into a register using a RIP-relative method. (That answer has links to more detailed info, and so does the x86 tag wiki.)
Your attempt at using db [1+1]: What the heck were you expecting? [] in NASM syntax means memory reference. First: the resulting byte has to be an assemble-time constant. I'm not sure if there's an easy syntax for duplicating whatever's at some other address, but this isn't it. (I'd just define a macro and use it in both places.) Second: 2 is not a valid address.
msgt: db "output of 1+1: ", '0' + 1 + 1, 10
would put the ASCII characters output of 1+1: 2\n at that point in the object file. 10 is the decimal value of ASCII newline. '0' is a way of writing 0x30, the ASCII encoding of the character '0'. A byte with value 2 is not a printable ASCII character. Your version that did that would have printed a byte with value 2 there, but you wouldn't notice unless you piped the output into hexdump (or od -t x1c or something; I don't know what OS X provides. od isn't very nice, but it is widely available.)
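The '0' + 1 + 1 trick is the usual way to turn a single digit into its ASCII character. A small C sketch of the same idea (the helper name is made up for illustration):

```c
#include <assert.h>

/* Convert a single decimal digit (0..9) to its ASCII character,
   the same way '0' + 1 + 1 yields '2' in the db line above. */
static char digit_to_ascii(int digit)
{
    assert(digit >= 0 && digit <= 9);
    return (char)('0' + digit);
}
```

This only works for values 0 through 9; anything larger needs a real integer-to-string conversion.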
Note that this string is not null-terminated. If you want to pass it to something expecting an implicit-length string (like fputs(3) or strchr(3), instead of write(2) or memchr(3)), tack on an extra , 0 to add a zero-byte after everything else.
If you wanted to do the math at run time, you'd need to get the data into registers, add it, then store a string representation of the number into a buffer somewhere. (Or print it one byte at a time, but that's horrible.)
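As a rough sketch of what that run-time conversion involves, here is the usual divide-by-10 loop written in C rather than asm. The function name and the buffer size are my own choices for illustration; an asm version would follow the same steps with div and a small stack or scratch buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the decimal representation of value into buf (NUL-terminated)
   and return the number of digits written. buf must hold at least
   11 bytes: up to 10 digits for a 32-bit value plus the terminator. */
static size_t u32_to_decimal(uint32_t value, char *buf)
{
    char tmp[10];
    size_t n = 0;

    assert(buf != NULL);

    /* Produce digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Reverse them into the caller's buffer. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

The returned length is what you would pass as the byte count to write(2) after the conversion.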
The easy way is to just call printf, to easily print a constant string with some stuff substituted in. Spend your time writing asm for the part of your code that needs to be hand-tuned, not re-implementing library functions.
There's some discussion of int-to-string in comments.
Your link command looks funny:
ld -macosx_version_min 10.7.0 second.o second.o
Are you sure you want the same .o twice?
You could save some code bytes by only moving to 32bit registers when you don't need sign-extension into the 64bit reg. e.g. mov edi,2 instead of mov rdi,2 saves a byte (the REX prefix), unless NASM is clever and does that anyway (actually, it does).
lea rsi, [rel msg] (or use default rel) is a shorter instruction than mov r64, imm64, though. (The AT&T mnemonic is movabs, but Intel syntax still calls it mov.)
The bootloader is separated into two stages. The first stage is written in assembly and only loads the second stage, which is written in C. Stage 1 loads the C code to address 0x0500:0 and jumps there. Stage 2 has to write a "hello" message and halt.
I tried different ways of setting the starting address of the raw binary, but nothing worked. The binary is built with:
cc -nostartfiles -nostdlib -c stage2.c
ld -s -T scrptfile.ld stage2.o /* I'm using ld just to set starting address of executable */
objcopy -O binary stage2 stage2.bin /* delete all unuseful data */
Linker script
SECTIONS
{
. = 0x0500;
.text : { *(.text)}
.data : { *(.data)}
.bss : { *(.bss)}
}
Maybe I'm deleting something with objcopy that shouldn't be deleted.
How can I execute this stage2.bin then?
As I understand it, the compiled C code uses 32-bit instructions, while the raw binary allows only 16-bit ones?
P.S. The -set-start parameter of objcopy returns an error: Invalid bfd target. Is that because the output file is binary?
Thank you for answers.
. = 0x0500 does not correspond to 0x0500:0. 0x0500:0 is physical address 0x5000, not 0x500.
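The real-mode translation is physical = segment * 16 + offset. A minimal C sketch of that formula (the function name is just for illustration):

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset. */
static uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

real_mode_physical(0x0500, 0) gives 0x5000, whereas physical address 0x500 would be reached with 0x0050:0 (or 0x0000:0x0500).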
Also, if you're trying to compile C code as 32-bit and run it in real mode (which is 16-bit), it won't work. You need to either compile code as 16-bit or switch the CPU into 32-bit protected mode. There aren't that many C compilers still compiling 16-bit code. Turbo C++ is one, Open Watcom is another. AFAIK, gcc can't do that.
Finally, I'm guessing you expect the entry point to be at 0x500:0 (0x5000 physical). You need to either tell this to the linker (I don't remember how, if at all possible) or deal with an arbitrary location of the entry point (i.e. extract it from the binary somehow).
In the following line of code (which declares a global variable),
unsigned int __attribute__((section(".myVarSection,\"aw\",#nobits#"))) myVar;
what does the "aw" flag mean?
My understanding is that the nobits flag will prevent the variable from being initialised to zero, but I am struggling to find info about the "aw" flag.
Also, what meaning do the # and # have around the nobits flag?
The section("section-name") attribute places a variable in a specific section by producing the following assembler line:
.section section-name,"aw",#progbits
When you set section-name to ".myVarSection,\"aw\",#nobits#" you exploit a kind of "code injection" in GCC to produce:
.section .myVarSection,"aw",#nobits#,"aw",#progbits
Note that the # sign starts a one-line comment.
See the GNU Assembler manual for a full description of the .section directive. The general syntax is
.section name [, "flags"[, #type[,flag_specific_arguments]]]
so "aw" are flags:
a: section is allocatable
w: section is writable
and #nobits is a type:
#nobits: section does not contain data (i.e., section only occupies space)
All the above is also applicable to functions, not just variables.
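To make the flag letters concrete, here is a small C sketch that maps a .section flag string to the corresponding ELF section header bits. The function and macro names are made up for illustration; the numeric values match SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR from <elf.h>, and only the letters a, w and x are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELF section header flag values (same as SHF_WRITE, SHF_ALLOC and
   SHF_EXECINSTR in <elf.h>; redefined here to stay self-contained). */
#define FLAG_WRITE     0x1u
#define FLAG_ALLOC     0x2u
#define FLAG_EXECINSTR 0x4u

/* Translate a .section flag string such as "aw" into ELF flag bits.
   Only a, w and x are handled in this sketch; other letters are ignored. */
static uint32_t section_flags_from_string(const char *flags)
{
    uint32_t bits = 0;

    assert(flags != NULL);
    for (; *flags != '\0'; flags++) {
        switch (*flags) {
        case 'a': bits |= FLAG_ALLOC;     break;
        case 'w': bits |= FLAG_WRITE;     break;
        case 'x': bits |= FLAG_EXECINSTR; break;
        default:  break;
        }
    }
    return bits;
}
```

So "aw" becomes 0x3 (allocatable and writable), which is what a typical data section carries, while a code section would normally use "ax".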
what does the "aw" flag mean?
It means that the section is allocatable (i.e. it's loaded into memory at run time) and writable (and readable, of course).
My understanding is that the nobits flag will prevent the variable from being initialised to zero, but I am struggling to find info about the "aw" flag.
Also, what meaning do the # and # have around the nobits flag?
#nobits (the # is just part of the name) means that the section isn't stored in the image on disk; it exists only at run time (and is filled with zeros at startup).
The trailing # character begins a comment, so whatever the compiler appends after what you have specified will be ignored.