Can we use PUSH instruction as ' PUSH AL ' - x86-16

8086 Microprocessor Instruction.Can we use PUSH instruction as 'PUSH AL'? What is the syntax of PUSH instruction either its operand is 16 bit or 8 bit

On 8086 push decrements the SP register by 2 and then writes a 16 bit value at the memory pointed to by the SP register. You cannot write push al because AL is an 8 bit value and push requires a 16 bit value. So you must write push ax.

Related

When a push or pop instruction is being executed then what is the byte size of the value that is being pushed or popped? [duplicate]

I can push 4 bytes onto the stack by doing this:
push DWORD 123
But I have found out that I can use push without specifying the operand size:
push 123
In this case, how many bytes does the push instruction push onto the stack? Does the number of bytes pushed depends on the operand size (so in my example it will push 1 byte)?
Does the number of bytes pushed depends on the operand size
It doesn't depend on the value of the number. The technical x86 term for how many bytes push pushes is "operand-size", but that's a separate thing from whether the number fits in an imm8 or not.
See also Does each PUSH instruction push a multiple of 8 bytes on x64?
(so in my example it will push 1 byte)?
No, the size of the immediate is not the operand-size. It always pushes 4 bytes in 32-bit code, or 64 in 64-bit code, unless you do something weird.
Recommendation: always just write push 123 or push 0x12345 to use the default push size for the mode you're in and and let the assembler pick the encoding. That is almost always what you want. If that's all you wanted to know, you can stop reading now.
First of all, it's useful to know what sizes of push are even possible in x86 machine code:
In 16-bit mode, you can push 16 or (with operand-size prefix on 386 and later) 32 bits.
In 32-bit mode, you can push 32 or (with operand-size prefix) 16 bits.
In 64-bit mode, you can push 64 or (with operand-size prefix) 16 bits.
A REX.W=0 prefix does not let you encode a 32-bit push.1
There are no other options. The stack pointer is always decremented by the operand-size of the push2. (So it's possible to "misalign" the stack by pushing 16 bits). pop has the same choices of size: 16, 32, or 64, except no 32-bit pop in 64-bit mode.
This applies whether you're pushing a register or an immediate, and regardless of whether the immediate fits in a sign-extended imm8 or it needs an imm32 (or imm16 for 16-bit pushes). (A 64-bit push imm32 sign-extends to 64-bit. There is no push imm64, only mov reg, imm64)
In NASM source code, push 123 assembles to the operand-size that matches the mode you're in. In your case, I think you're writing 32-bit code, so push 123 is a 32-bit push, even though it can (and does) use the push imm8 encoding.
Your assembler always knows what kind of code it's assembling, since it has to know when to use or not use operand-size prefixes when you do force the operand-size.
MASM is the same; the only thing that might be different is the syntax for forcing a different operand-size.
Anything you write in assembler will assemble to one of the valid machine-code options (because the people that wrote the assembler know what is and isn't encodeable), so no, you can't push a single byte with a push instruction. If you wanted that, you could emulate it with dec esp / mov byte [esp], 123
NASM Examples:
Output from nasm -l /dev/stdout to dump a listing to the terminal, along with the original source line.
Lightly edited to separate opcode and prefix bytes from the operands. (Unlike objdump -drwC -Mintel, NASM's disassembly format doesn't leave spaces between bytes in the machine-code hexdump).
68 80000000 push 128
6A 80 push -128 ;; signed imm8 is -128 to +127
6A 7B push byte 123
6A 7B push dword 123 ;; still optimized to the imm8 encoding
68 7B000000 push strict dword 123
6A 80 push strict byte 0x80 ;; will decode as push -128
****************** warning: signed byte value exceeds bounds [-w+number-overflow]
dword is normally an operand-size thing, while strict dword is how you request that the assembler doesn't optimize it to a smaller encoding.
All the preceding instructions are 32-bit pushes (or 64-bit in 64-bit mode, with the same machine code). All the following instructions are 16-bit pushes, regardless of what mode you assemble them in. (If assembled in 16-bit mode, they won't have a 0x66 operand-size prefix)
66 6A 7B push word 123
66 68 8000 push word 128
66 68 7B00 push strict word 123
NASM apparently seems to treat the byte and dword overrides as applying to the size of the immediate, but word applies to the operand-size of the instruction. Actually using o32 push 12 in 64-bit mode doesn't get a warning either. push eax does, though: "error: instruction not supported in 64-bit mode".
Notice that push imm8 is encoded as 6A ib in all modes. With no operand-size prefix, the operand size is the mode's size. (e.g. 6A FF decodes in long mode as a 64-bit operand-size push with an operand of -1, decrementing RSP by 8 and doing an 8-byte store.)
The address-size prefix only affects the explicit addressing mode used for push with a memory-source, e.g. in 64-bit mode: push qword [rsi] (no prefixes) vs. push qword [esi] (address-size prefix for 32-bit addressing mode). push dword [rsi] is not encodeable, because nothing can make the operand-size 32-bit in 64-bit code1. push qword [esi] does not truncate rsp to 32-bit. Apparently "Stack Address Width" is a different thing, probably set in a segment descriptor. (It's always 64 in 64-bit code on a normal OS, I think even for Linux's x32 ABI: ILP32 in long mode.)
When would you ever want to push 16 bits? If you're writing in asm for performance reasons, then probably never. In my code-golf adler32, a narrow push -> wide pop took fewer bytes of code than shift/OR to combine two 16b integers into a 32b value.
Or maybe in an exploit for 64-bit code, you might want to push some data onto the stack without gaps. You can't just use push imm32, because that sign or zero extends to 64-bit. You could do it in 16-bit chunks with multiple 16-bit push instructions. But still probably more efficient to mov rax, imm64 / push rax (10B+1B = 11B for an 8B imm payload). Or push 0xDEADBEEF / mov dword [rsp+4], 0xDEADC0DE (5B + 8B = 13B and doesn't need a register). four 16-bit pushes would take 16B.
Footnotes:
In fact REX.W=0 is ignored, and doesn't modify the operand-size away from its default 64-bit. NASM, YASM, and GAS all assemble push r12 to 41 54, not 49 54. GNU objdjump thinks 49 54 is unusual, and decodes it as 49 54 rex.WB push r12. (Both execute the same). Microsoft agrees as well, using a 40h REX as padding on push rbx in some Windows DLLs.
Intel just says that 32-bit pushes are "not encodeable" (N.E. in the table) in long mode. I don't understand why W=1 isn't the standard encoding for push / pop when a REX prefix is needed, but apparently the choice is arbitrary.
Fun-fact: only stack instructions and a few others default to 64-bit operand size in 64-bit mode. In machine code, add rax, rdx needs a REX prefix (with the W bit set). Otherwise it would decode as add eax, edx. But you can't decrease the operand-size with a REX.W=0 when it defaults to 64-bit, only increase it when it defaults to 32.
http://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix lists the instructions that default to 64-bit in 64-bit mode. Note that jrcxz doesn't strictly belong in that list, because the register it checks (cx/ecx/rcx) is determined by address-size, not operand-size, so it can be overridden to 32-bit (but not 16-bit) in 64-bit mode. loop is the same.
It's strange that Intel's instruction reference manual entry for push (HTML extract: http://felixcloutier.com/x86/PUSH.html)
shows what would happen for a 32-bit operand-size push in 64-bit mode (the only case where stack address width can be 64, so it uses rsp). Perhaps it's achievable somehow with some non-standard settings in the code-segment descriptor, so you can't do it in normal 64-bit code running under a normal OS. Or more likely it's an oversight, and that's what would happen if it was encodeable, but it's not.
Except segment registers are 16-bit, but a normal push fs will still decrement the stack pointer by the stack-width (operand-size). Intel documents that recent Intel CPUs only do a 16b store in that case, leaving the rest of the 32 or 64b unmodified.
x86 doesn't officially have a stack width that's enforced in hardware. It's a software / calling convention term, e.g. char and short args passed on the stack in any calling conventions are padded out to 4B or 8B, so the stack stays aligned. (Modern 32 and 64-bit calling conventions such as the x86-32 System V psABI used by Linux keep the stack 16B aligned before function calls, even though an arg "slot" on the stack is still only 4B). Anyway, "stack width" is only a programming convention on any architecture.
The closest thing in the x86 ISA to a "stack width" is the default operand-size of push/pop. But you can manipulate the stack pointer however you want, e.g. sub esp,1. You can, but don't for performance reasons :P
The "stack width" in a computer, which is the smallest amount of data that can be pushed onto the stack, is defined to be the register size of the processor. This means that if you are dealing with a processor with 16 bit registers, the stack width will be 2 bytes. If the processor has 32 bit registers, the stack width is 4 bytes. If the processor has 64 bit registers, the stack width is 8 bytes.
Don't be confused when using modern x86/x86_64 systems; if the system is running in a 32 bit mode, the stack width and register size is 32 bits or 4 bytes. If you switch to 64 bit mode, then and only then will the register and stack size change.

Allocating memory using malloc() in 32-bit and 64-bit assembly language

I have to do a 64 bits stack. To make myself comfortable with malloc I managed to write two integers(32 bits) into memory and read from there:
But, when i try to do this with 64 bits:
The first snippet of code works perfectly fine. As Jester suggested, you are writing a 64-bit value in two separate (32-bit) halves. This is the way you have to do it on a 32-bit architecture. You don't have 64-bit registers available, and you can't write 64-bit chunks of memory at once. But you already seemed to know that, so I won't belabor it.
In the second snippet of code, you tried to target a 64-bit architecture (x86-64). Now, you no longer have to write 64-bit values in two 32-bit halves, since 64-bit architectures natively support 64-bit integers. You have 64-bit wide registers available, and you can write a 64-bit chunk to memory directly. Take advantage of that to simplify (and speed up) the code.
The 64-bit registers are Rxx instead of Exx. When you use QWORD PTR, you will want to use Rxx; when you use DWORD PTR, you will want to use Exx. Both are legal in 64-bit code, but only 32-bit DWORDs are legal in 32-bit code.
A couple of other things to note:
Although it is perfectly valid to clear a register using MOV xxx, 0, it is smaller and faster to use XOR eax, eax, so this is generally what you should write. It is a very old trick, something that any assembly-language programmer should know, and if you ever try to read other people's assembly programs, you'll need to be familiar with this idiom. (But actually, in the code you're writing, you don't need to do this at all. For the reason why, see point #2.)
In 64-bit mode, all instructions implicitly zero the upper 32 bits when writing the lower 32 bits, so you can simply write XOR eax, eax instead of XOR rax, rax. This is, again, smaller and faster.
The calling convention for 64-bit programs is different than the one used in 32-bit programs. The exact specification of the calling convention is going to vary, depending on which operating system you're using. As Peter Cordes commented, there is information on this in the x86 tag wiki. Both Windows and Linux x64 calling conventions pass at least the first 4 integer parameters in registers (rather than on the stack like the x86-32 calling convention), but which registers are actually used is different. Also, the 64-bit calling conventions have different requirements than do the 32-bit calling conventions for how you must set up the stack before calling functions.
(Since your screenshot says something about "MASM", I'll assume that you're using Windows in the sample code below.)
; Set up the stack, as required by the Windows x64 calling convention.
; (Note that we use the 64-bit form of the instruction, with the RSP register,
; to support stack pointers larger than 32 bits.)
sub rsp, 40
; Dynamically allocate 8 bytes of memory by calling malloc().
; (Note that the x64 calling convention passes the parameter in a register, rather
; than via the stack. On Windows, the first parameter is passed in RCX.)
; (Also note that we use the 32-bit form of the instruction here, storing the
; value into ECX, which is safe because it implicitly zeros the upper 32 bits.)
mov ecx, 8
call malloc
; Write a single 64-bit value into memory.
; (The pointer to the memory block allocated by malloc() is returned in RAX.)
mov qword ptr [rax], 1
; ... do whatever
; Clean up the stack space that we allocated at the top of the function.
add rsp, 40
If you wanted to do this in 32-bit halves, even on a 64-bit architecture, you certainly could. That would look like the following:
sub rsp, 40 ; set up stack
mov ecx, 8 ; request 8 bytes
call malloc ; allocate memory
mov dword ptr [eax], 1 ; write "1" into low 32 bits
mov dword ptr [eax+4], 2 ; write "2" into high 32 bits
; ... do whatever
add rsp, 40 ; clean up stack
Note that these last two MOV instructions are identical to what you wrote in the 32-bit version of the code. That makes sense, because you're doing exactly the same thing.
The reason the code you originally wrote didn't work is because EAX doesn't contain a QWORD PTR, it contains a DWORD PTR. Hence, the assembler generated the "invalid instruction operands" error, because there was a mismatch. This is the same reason that you don't offset by 8, because a DWORD PTR is only 4 bytes. A QWORD PTR is indeed 8 bytes, but you don't have one of those in EAX.
Or, if you wanted to write 16 bytes:
sub rsp, 40 ; set up stack
mov ecx, 16 ; request 16 bytes
call malloc ; allocate memory
mov qword ptr [rax], 1 ; write "1" into low 64 bits
mov qword ptr [rax+8], 2 ; write "2" into high 64 bits
; ... do whatever
add rsp, 40 ; clean up stack
Compare these three snippets of code, and make sure you understand the differences and why they need to be written as they are!

Understanding OSX 16-Byte alignment

So it seems like everyone knows that OSX syscalls are always 16 byte stack aligned. Great, that makes sense when you have code like this:
section .data
message db 'something', 10, 0
section .text
global start
start:
push 10 ; size of the message (4 bytes)
push msg ; the address of the message (4 bytes)
push 1 ; we want to write to STD_OUT (4 bytes)
mov eax, 4 ; write(...) syscall
sub esp, 4 ; move stack pointer down to 4 bytes for a total of 16.
int 0x80 ; invoke
add esp, 16 ; clean
Perfect, the stack is aligned to 16 bytes, makes perfect sense. How about though we call syscall(1) (exit). Logically that would look something like this:
push 69 ; return value
mov eax, 1 ; exit(...) syscall
sub esp, 12 ; push down stack for total of 16 bytes.
int 0x80 ; invoke
This doesn't work though, but this does:
push 69 ; return value
mov eax, 1 ; exit(...) syscall
sub esp, 4 ; push down stack for total of 8 bytes.
int 0x80 ; invoke
That works fine, but that's only 8 bytes???? Osx is cool, but this ABI is driving me nuts. Can someone shed some light on what I'm not understanding?
Short version: you probably don't need to align to 16 bytes, you just need to always leave a 4-byte gap before your argument list.
Long version:
Here's what I think is happening: I'm not sure that it's true that the stack should be 16-byte aligned. However, logic dictates that if it is and if padding or adjusting the stack is necessary to achieve that alignment, it must happen before the arguments for the syscall are pushed, not after. There can't be an arbitrary number of bytes between the stack pointer at the time of the int 0x80 instruction and where the arguments actually are. The kernel wouldn't know where to find the actual arguments. Subtracting from the stack pointer after pushing the arguments to achieve "alignment" doesn't align the arguments, it aligns the stack pointer by inserting an arbitrary number of bytes between the stack pointer and the arguments. Whatever else may be true, that can't be right.
Then why do the first and third snippets work at all? Don't they also insert arbitrary bytes there? They work by accident. It's because they both happen to insert 4 bytes. That adjustment isn't "successful" because it achieves stack alignment, it's part of the syscall ABI. Apparently, the syscall ABI expects and requires that there be a 4-byte slot before the argument list.
The source for the syscall() function can be found here. It looks like this:
LEAF(___syscall, 0)
popl %ecx // ret addr
popl %eax // syscall number
pushl %ecx
UNIX_SYSCALL_TRAP
movl (%esp),%edx // add one element to stack so
pushl %ecx // caller "pop" will work
jnb 2f
BRANCH_EXTERN(cerror)
2:
END(___syscall)
To call this library function, the caller will have set up the stack pointer to point to the arguments to the syscall() function, which starts with the syscall number and then has the real arguments for the actual syscall. However, the caller will then have used a call instruction to call it, which pushed the return address onto the stack.
So, the above code pops the return address, pops the syscall number into %eax, pushes the return address back onto the stack (where the syscall number originally was), and then does int 0x80. So, the stack pointer points to the return address and then the arguments. There's the extra 4 bytes: the return address. I suspect the kernel ignores the return address. I guess its presence in the syscall ABI may just be to make the ABI for system calls similar to that of function calls.
What does this mean for the alignment requirement of syscalls? Well, this function is guaranteed to change the alignment of the stack from how it was set up by its caller. The caller presumably set up the stack with 16-byte alignment and this function moves it by 4 bytes before the interrupt. It may just be a myth that the stack needs to be 16-byte aligned for syscalls. On the other hand, the 16-byte alignment requirement is definitely real for calling system library functions. The Wine project, for which I develop, was burned by it. It is mostly necessary for 128-bit SSE argument data types, but Apple made their lazy symbol resolver deliberately blow up if the alignemtn is wrong even for functions which don't use such arguments so that problems would be found early. Syscalls would not be subject to that early-failure mechanism. It may be that the kernel doesn't require the 16-byte alignment. I'm not sure if any syscalls take 128-bit arguments.

push on 64bit intel osx

I want to push 64 bit address on stack as below,
__asm("pushq $0x1122334455667788");
But I get compilation error and I can only push in following way,
__asm("pushq $0x11223344");
Can someone help me understand my mistake?
I am new to assembly, so please excuse me if my question sounds stupid.
x86-64 has some interesting quirks, which aren't obvious even if you're familiar with 32-bit x86...
Most instructions can only take a 32-bit immediate value, which is sign-extended to 64 bits if used in a 64-bit context. (The instruction encoding stores only 32 bits.)
This means that you can use pushq for immedate values in the range 0x0 - 0x7fffffff (i.e. positive signed 32-bit values which are sign-extended with 0 bits) or 0xffffffff80000000 - 0xffffffffffffffff) (i.e. negative signed 32-bit values which are sign-extended with 1 bits). But you cannot use values outside this range (as they cannot be represented in the instruction encoding).
mov is a special case: there is an encoding which takes a full 64-bit immediate operand. Hence Daniel's answer (which is probably your best bet).
If you really don't want to corrupt a register, you could use multiple pushes of smaller values. However, the obvious thing of pushing two 32-bit values won't work. In the 64-bit world, push will work with a 64 bit operand (subject to point 1 above, if it's an immediate constant), or a 16 bit operand, but not a 32 bit operand (even pushl %eax is not valid). So the best you can do is 4 16-bit pushes:
pushw $0x1122; pushw $0x3344; pushw $0x5566; pushw $0x7788
Your best bet would be to do something like this.
movq $0x1122334455667788, %rax
pushq %rax
Replace %rax with any other 64-bit register you find appropriate.
There is no single instruction capable of taking a 64-bit immediate value and pushing that onto the stack.
from how to use rip relative addressing
pushq my_const(%rip)
...
my_const: .quad 1122334455667788

NASM on OS X work with bit, not byte

I am working on my first NASM program, and while trying to figure out the not instruction, I realized that instead of reversing the bit 0, it was reversing the byte 00000000. How would I tell it to work with a bit or otherwise fix this? Here is my code...
section .text
global start
start:
mov eax, 255
not eax
push eax
mov eax, 0x1
sub esp, 4
int 0x80
Feel free to give me pointers also on my assembly coding, as I don't want to get into any bad habits.
In most computer architectures (including the x86), a bit is not a directly addressable unit of memory. The smallest unit that you can directly refer to is a byte, which happens to contain 8 bits on the x86. You have not stated what you're exactly trying to accomplish, so I'm not able to give you an exact solution to your problem, but working with single bits (or groups of bits) is most often achieved by masking out the bits that are of no interest with the AND instruction, eventually shifting the value left or right, and then doing the processing.
If you want to actually get the value of the n-th bit in a register, then you're most probably looking for the instruction BT. It stores the value of the n-th bit in the Carry Flag.
When it comes to other tips : the push instruction decrements the stack pointer by the number of bytes pushed to the stack. This is a characteristic of the x86 architecture - the stack grows, by design, downwards. Therefore, if you want to free some space on the stack, you do add esp, number_of_bytes, not sub (the way you did), which just reserves more space on the stack.

Resources