Assembly ".set" directive emitting symbol - macOS

In some kernel-mode assembly source I have a line that looks like this:
; excerpt #1
.set __framesize, ROUND_TO_STACK(localvarsize)
(localvarsize is a parameter to a C-preprocessor macro, if you’re wondering.) I assume that __framesize is a compile-time variable that is usable in .if statements, and is then discarded. However, I find references to a symbol named __framesize in the symbol table and disassembly of my kernel. The symbol is defined (as output by nm -m) as such:
; excerpt #2
0000000000000000 (absolute) non-external __framesize
The usage of __framesize in compiler-generated assembly is as such:
; excerpt #3
movq %gs:__framesize, %rax
movq 0x140(%rax), %r15
Given what I understand of my compiler and my kernel, excerpt #3 should be emitted as movq %gs:0x140, %r15, and that code should work. (The code that is actually being emitted from the C as excerpt #3 is causing a triple fault on the second line.)
I have two questions:
Should this __framesize symbol be emitted into my binary by the assembler when used in this fashion? If possible, how can I suppress it?
Would this usage of __framesize cause a problem like what is discussed above?
I am using GAS assembler syntax and the Xcode 7.1.1 assembler, and a Mach-O output format, if it is useful.

The GNU as manual says that .set sets the value (i.e. the address) and/or type of a symbol. It's synonymous with .equ, so it can be used to set or modify an assembler-time variable, or to mess around with symbols that are also labels.
If __framesize is showing up in the object file, then it's probably declared somewhere else.
Look at the disassembly output to see what really happened.
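The question doesn't show how ROUND_TO_STACK is defined. As a minimal sketch, assuming it rounds a byte count up to 16-byte stack alignment, this is the kind of value __framesize is meant to hold. In C the same computation can be done entirely at compile time with an enum constant, which gets no symbol table entry. ROUND_TO_STACK, STACK_ALIGN and FRAMESIZE_40 here are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical stand-in for ROUND_TO_STACK: the real definition is not shown
 * in the question. This version assumes 16-byte stack alignment. */
#define STACK_ALIGN 16u
#define ROUND_TO_STACK(n) (((n) + (STACK_ALIGN - 1u)) & ~(STACK_ALIGN - 1u))

/* Folded at compile time, like a value only used in assembler .if tests;
 * an enum constant does not produce a symbol table entry. */
enum { FRAMESIZE_40 = ROUND_TO_STACK(40u) };

_Static_assert(FRAMESIZE_40 == 48u, "40 bytes of locals round up to 48");
_Static_assert(ROUND_TO_STACK(0u) == 0u, "zero stays zero");
_Static_assert(ROUND_TO_STACK(16u) == 16u, "aligned sizes are unchanged");
```

The mask trick only works because the alignment is a power of two.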

Related

What does this backslash do in this assembly code?

I am not sure what the difference is in these push lines. (trimmed down from Linux's x86/entry/calling.h, with the xor-zero clearing removed.)
.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
pushq \rdx
pushq \rax
pushq %r11
pushq %r12
.endm
Do both push onto the stack? Or do the first two push lines do something different? I am on linux using the GNU toolchain.
These lines were found in a .h file that's included by .S asm source files.
Also, can anyone tell me what this code does?
.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
Specifically the code after PUSH_AND_CLEAR_REGS.
Inside a GAS .macro, you use \foo to refer to a macro parameter called foo.
The .macro you're looking at has 3 args with default values; presumably in some use-case they want to get alternate values saved in place of what's actually in RAX and RDX. But the rest of the registers get saved and xor-zeroed as normal.
So after macro expansion, yes it's just push %rdx and push %rax, same as the push %r11 and push %r12.
IDK if you were looking at an old version of Linux, but this is in a .h that's included by other .S hand-written asm source files, not by .c sources. I fixed your question for you.
I thought the comment on the GAS macro definition was pretty clear about the purpose of this macro. See the GitHub link I added to your question.
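To make the substitution concrete, here is a toy C model of what GAS does with \name inside a .macro body: each reference is replaced by the argument bound to that parameter, which is its default unless the caller overrides it. This is only an illustration (expand_line and struct macro_param are hypothetical names), not how GAS is actually implemented.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One macro parameter and the text it currently expands to. */
struct macro_param {
    const char *name;  /* parameter name, e.g. "rdx" */
    const char *value; /* its default (e.g. "%rdx") or the caller's override */
};

/* Expand one body line into out, which must hold at least 1 byte.
 * Text that is not \name of a known parameter is copied unchanged. */
static void expand_line(const char *line, const struct macro_param *params,
                        size_t nparams, char *out, size_t outsz) {
    size_t o = 0;
    while (*line != '\0' && o + 1 < outsz) {
        const struct macro_param *hit = NULL;
        size_t len = 0;
        if (*line == '\\') {
            /* Measure the identifier that follows the backslash. */
            while (isalnum((unsigned char)line[1 + len]) || line[1 + len] == '_')
                len++;
            for (size_t i = 0; i < nparams; i++)
                if (strlen(params[i].name) == len &&
                    strncmp(line + 1, params[i].name, len) == 0)
                    hit = &params[i];
        }
        if (hit != NULL) {
            /* Copy the parameter's value in place of "\name". */
            for (const char *v = hit->value; *v != '\0' && o + 1 < outsz; v++)
                out[o++] = *v;
            line += 1 + len;
        } else {
            out[o++] = *line++;
        }
    }
    out[o] = '\0';
}
```

With the defaults rdx=%rdx rax=%rax, the line pushq \rdx expands to pushq %rdx, exactly like the plain pushq %r11 line. Only a caller that passes something else (say, a hypothetical rax=$0) changes what gets pushed.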

Using a user defined entry point in assembly x86-64 nasm when compiling with gcc

I recently started learning assembly and was wondering if it is possible for us to have our own defined entry point for an assembly code when compiling with gcc?
For example, the standard code that compiles with gcc is:
global main
section .data
section .bss
section .text
main:
I would like to change the entry point to a more defined name such as "addition", something like this below.
global addition
section .data
section .bss
section .text
addition:
One reason I'm using gcc to compile in the first place is that I'm calling the C library functions printf and scanf from my assembly code, and every time I tried to change the entry point, I got an "undefined reference to main" error.
If you are writing in assembly and not using the C runtime library, then you can call your entry point whatever you want. You tell the linker what the name of the entry point is, using either the gcc command line option -Wl,--entry=<symbol> or the ENTRY directive in the linker script. The linker writes the address of this entry point in the executable file.
If you are using the C runtime library, then the entry point in the executable file needs to be the entry point of the C runtime library, so that it can perform initialization. That startup code is traditionally called crt0 (with glibc it lives in crt1.o, and its entry symbol is _start). When it finishes initializing, it calls main, so in this case you cannot change the name.
You can put multiple labels on the same address. So you can stick the main label at whatever place you want the CRT startup code to call.
global main
main:
addition:
lea eax, [rdi+rdi] ; return argc*2
ret
I checked, and GDB chooses to show main in the disassembly for the block of code following the label, regardless of which one you declare first. (global addition doesn't help either.)
Or if you want to be able to change one line at the top of your file to select which function is the main entry point, you could maybe do:
%define addition main
I'm not sure if NASM lets you create an alias or weak-alias for a symbol, like GAS's .weakref main, addition (see the related question "Call a function in another object file without using PLT within a shared library?").
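The same "two labels on one address" idea can be sketched in C with the GCC/Clang alias attribute. This assumes an ELF target such as Linux (Mach-O does not support alias). The name entry_alias is hypothetical; in a real program you would make the alias main, so that the CRT startup code still finds the symbol it calls.

```c
/* The actual code lives in addition(). */
int addition(int argc, char **argv) {
    (void)argv;
    return argc * 2; /* same as the NASM example's lea eax, [rdi+rdi] */
}

/* A second symbol for the same address, like stacking two labels.
 * Declaring `int main(int, char **) __attribute__((alias("addition")));`
 * would give the CRT its main while the code is still called addition. */
int entry_alias(int argc, char **argv) __attribute__((alias("addition")));
```

No wrapper function is emitted: the object file simply gets two symbols with the same value.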

Doesn't PIC mean no relocations?

I'm building an ELF SO for bada on ARM using GCC. The compiler options include -fpic. Yet in the built file, when I run readelf -r, there are a whole lot of relocation records of the following types:
R_ARM_RELATIVE
R_ARM_REL32
R_ARM_ABS32
R_ARM_GLOB_DAT
R_ARM_JUMP_SLOT
What am I misunderstanding here?
EDIT: from what I can see, the PIC implementation in the compiler doesn't use the GOT. Instead, it uses PC-relative addressing, with stored constants being offsets from the point of use to the symbol address; those are resolved by the linker. Like this, to read a global variable:
ldr r12, OffsetToVar
PointOfUse:
ldr r0, [pc, r12]
# r0 now has the value of MyVar
#...
# At function's end...
OffsetToVar:
.long MyVar-PointOfUse-8
# Compiler can't resolve this, since it doesn't know
# the address of MyVar, but linker can
Similar idea for cross-module function calls. When a project mixes ARM and Thumb code though, the latter may misfire. But I've worked around that.
Doesn't PIC mean no relocations?
No, it does not.
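It just means no relocations against the .text section (so the .text can be shared between multiple processes). The relocations you see apply to data instead: R_ARM_RELATIVE adjusts pointers stored in data by the load address, R_ARM_GLOB_DAT fills in GOT entries, and R_ARM_JUMP_SLOT fills in PLT slots.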
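A small worked example of why the code needs no relocation while a pointer in data does. The link-time addresses below are hypothetical, and the sketch relies on the ARM-state rule that reading pc gives the current instruction's address plus 8 (the -8 in the question's .long MyVar-PointOfUse-8).

```c
#include <stdint.h>

/* Hypothetical link-time addresses, as if the .so were loaded at 0. */
#define POINT_OF_USE 0x1000u
#define MY_VAR       0x5000u
#define ARM_PC_BIAS  8u /* ARM state: pc reads as current instruction + 8 */

/* The word stored at OffsetToVar: only the distance matters. */
static int32_t stored_offset(void) {
    return (int32_t)(MY_VAR - POINT_OF_USE - ARM_PC_BIAS);
}

/* Address formed by ldr r0, [pc, r12] when the library is loaded at base.
 * The base cancels out, so the instruction bytes never need patching. */
static uint32_t pc_relative_address(uint32_t base) {
    uint32_t pc = base + POINT_OF_USE + ARM_PC_BIAS;
    return pc + (uint32_t)stored_offset();
}

/* A pointer stored in .data (e.g. int *p = &MyVar;) holds an absolute
 * address, so the loader must add the base: an R_ARM_RELATIVE fixup. */
static uint32_t apply_relative_reloc(uint32_t base, uint32_t addend) {
    return base + addend;
}
```

For any base, pc_relative_address(base) equals base + MY_VAR without touching .text, while the data pointer only becomes correct after the loader adds the base.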

Inline assembly __sync_fetch_and_add and __sync_add_and_fetch

The GCC builtin __sync_fetch_and_add is an implementation of the x86 inline assembly:
asm volatile("lock; xaddl %%eax, %1"
    : "+a" (val), "+m" (*ptr)  /* val receives the old *ptr; *ptr is updated */
    :
    : "memory");
How can I implement this inline assembly using the addl instruction instead of xaddl?
My other question is: what would the x86 inline assembly for the builtin __sync_add_and_fetch look like?
Thanks.
Builtins do not necessarily correspond with a single well defined chunk of assembly code. In particular both __sync_add_and_fetch and __sync_fetch_and_add will generate lock addl instead of lock xaddl if the result is not live out of the builtin, and they may generate lock incl if the result is not live out and the second argument is known to have the value 1.
It is not clear what you mean by "how can I implement this inline assembly". Assembly is something that you write or generate, not something that you implement (unless you are writing an assembler).
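For illustration, here is a minimal sketch of both operations with GCC/Clang extended asm on x86, falling back to the __sync builtin elsewhere. The my_* names are hypothetical. xadd leaves the old memory value in the register, so add-and-fetch is just that old value plus the addend. A plain lock addl cannot return anything, so it only fits when the result is unused, which matches the answer's point about what the builtins compile to.

```c
/* Atomically *p += v; return the value *p had BEFORE the add. */
static inline int my_fetch_and_add(volatile int *p, int v) {
#if defined(__i386__) || defined(__x86_64__)
    /* xadd exchanges the register with memory and stores the sum in
     * memory, so afterwards v holds the old value of *p. */
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(v), "+m"(*p)
                         :
                         : "memory");
    return v;
#else
    return __sync_fetch_and_add(p, v);
#endif
}

/* Atomically *p += v; return the value AFTER the add (old value + v). */
static inline int my_add_and_fetch(volatile int *p, int v) {
    return my_fetch_and_add(p, v) + v;
}

/* When the result is not needed, lock addl is enough. */
static inline void my_atomic_add(volatile int *p, int v) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; addl %1, %0"
                         : "+m"(*p)
                         : "ir"(v)
                         : "memory");
#else
    (void)__sync_fetch_and_add(p, v);
#endif
}
```

The "memory" clobber keeps the compiler from moving other memory accesses across the atomic operation, similar to the full barrier the __sync builtins imply.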

Using DLLs with NASM

I have been doing some x86 programming in Windows with NASM and I have run into some confusion. I am confused as to why I must do this:
extern _ExitProcess@4
Specifically I am confused about the '_' and the '@4'. I know that the '@4' is the size of the stack, but why is it needed? When I looked at kernel32.dll with a hex editor I only saw 'ExitProcess', not '_ExitProcess@4'.
I am also confused as to why C functions do not need the underscore and the stack size, such as this:
extern printf
Why don't C functions need decorations?
My third question is "Is this the way I should be using these functions?" Right now I am linking with the actual dll files themselves.
I know that the '@4' is the size of the stack, but why is it needed?
To enable the linker to report a fatal error if your compiler assumed the wrong calling convention for the function (this can happen if you forget to include header files in C and ignore all the compiler warnings, or if a declaration doesn't exactly match the function in the shared library). The 4 is the number of bytes of arguments that the stdcall callee removes from the stack; ExitProcess takes a single 4-byte UINT.
Why don't C Functions need decorations?
Functions that use the cdecl calling convention are decorated with a single leading underscore (so it would actually be _printf).
The reason why no parameter size is encoded into the decorated name is that the caller is responsible for both setting up and tearing down the stack, so an argument count mismatch will not be fatal for the stack setup (though the calling function might still crash if it isn't given the right arguments, of course). It might even be possible that the argument count is variable, like in the case of printf.
When I looked at kernel32.dll with a hex editor I only saw ExitProcess, not _ExitProcess@4.
The mangled names are usually mapped to the actual exported names of the DLL using definition files (*.def), which then get compiled to *.lib import library files that can be used in your linker invocation. An example of such a definition file for kernel32.dll is this one. The following line defines the mapping for ExitProcess:
_ExitProcess@4 = ExitProcess
Is this the way I should be using these functions?
I don't know NASM very well, but the code I've seen so far usually specifies the decorated name, like in your example.
You can find more information on this excellent page about Win32 calling conventions.
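The decoration rules from the answer can be sketched in C. This assumes 32-bit x86 Windows conventions, where every argument occupies at least one 4-byte stack slot, so the @N suffix is the argument-list size after widening. decorate_cdecl and decorate_stdcall are hypothetical helper names.

```c
#include <stddef.h>
#include <stdio.h>

/* cdecl: a single leading underscore, e.g. printf -> _printf.
 * Returns the length snprintf reports. */
static int decorate_cdecl(char *out, size_t n, const char *name) {
    return snprintf(out, n, "_%s", name);
}

/* stdcall: _name@N, where N is the number of argument bytes the callee
 * pops. Each argument is rounded up to a multiple of 4 bytes. */
static int decorate_stdcall(char *out, size_t n, const char *name,
                            const size_t *arg_sizes, size_t nargs) {
    size_t bytes = 0;
    for (size_t i = 0; i < nargs; i++)
        bytes += (arg_sizes[i] + 3u) & ~(size_t)3u;
    return snprintf(out, n, "_%s@%zu", name, bytes);
}
```

ExitProcess takes one 4-byte UINT, so decorate_stdcall gives _ExitProcess@4, and printf gets only the underscore because cdecl encodes no argument size.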
