What is the initial value of `ESP`?

Under Visual Studio masm:
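mov ecx,270     ; number of pops to attempt
l1: pop eax     ; pop a dword this program never pushed
loop l1         ; decrement ecx and repeat until it reaches zero
push eax        ; this push is where the memory error sometimes appears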
The point of this code is to find out whether ESP has an initial value and, if so, what it is. I pop immediately after the program starts and test how many pops it takes before a push causes a memory access error. The results are unstable, even with exactly the same value in ecx. Generally, values greater than 512 always (in my limited number of experiments) cause an error, values below 128 are always "safe", and values from roughly 250 to 400 sometimes cause an error. It seems that there is no fixed initial value for ESP. If there were, my experiment should give a stable result.
OK, I ran 127 another 10 times and now it has started to crash. I am experimenting with more values.
Let us say we are on Windows x86, at an average moment when a program like my experiment starts. How does Windows determine the initial value of esp? Is this difficult to determine (I could imagine it simply puts the last address of the stack segment in esp)? Is there a common practice for doing this?

The initial value is wherever the OS put the stack in the process's virtual address space. On modern operating systems that location is randomized.
What is above the top of the stack at _start is platform-dependent. On Windows, you get an actual return address to something that will exit the current thread. On Linux, you get the command line and the environment variables. In any case, popping values from the stack that you didn't push is not ABI compliant and will get you into trouble. The only rules that remain at that point are the security rules.

number of executed bytes when debugging a PE

I am currently writing a small debugger in assembly on the Windows platform.
I open the debuggee process as follows:
invoke CreateProcess, addr buffer, NULL, NULL, NULL, FALSE, DEBUG_PROCESS+DEBUG_ONLY_THIS_PROCESS, NULL, NULL, addr startinfo, addr pi
It works well: I can get the EIP by looking at the context of the debuggee, and so I can get the first byte of the instruction that will be executed.
However, I need to get the number of bytes that were executed in the previous instruction.
Instructions do not all have the same size. Sometimes an instruction is just 1 byte, and other times 6 bytes or more.
I tried subtracting the previous EIP from the current EIP to get the number of bytes that were executed. But that doesn't work if there is a jmp or a call, because execution continues at an unrelated address.
I planned to build a map of all opcodes and do some cmp, but that seems to be a huge amount of work.
If you have an idea for getting the number of bytes of the previous instruction that was executed (maybe by looking into a cache or something like that), please let me know.
Best regards

TL;DR
Keep it simple: single step and decode only the branch instructions and use EIP - last EIP unless the last instruction was a branch (in that case use the decoding to find the length).
If an unknown instruction is found, back off and don't provide its size.
It's impossible to decode an x86 instruction stream backward because the x86 encoding is not symmetric (w.r.t. address growth). To see this, consider mov eax, 90909090h or similar: its immediate bytes are themselves valid one-byte nop opcodes.
So you need to disassemble each instruction as you single step through the program (a debugger needs this anyway) and record its size.
Control transfer instructions are far fewer than the total number of instructions, so you could decode just those and use the EIP - EIP' trick (where EIP' is the EIP of the previous instruction) otherwise.
Intel processors support Last Branch Recording, but it requires OS support and you'd need to post-process the data anyway, so it seems too burdensome.
A similar argument can be made for the Intel Processor Trace technology.
I can't think of any performance counter event (granted that you can use them) that would give the number of bytes of an instruction.
Actually, in the backend the concept of "instruction" has been reduced to a sequence of uOPs (probably with a bit to mark the last uOP of an instruction), and the front-end is mostly decoupled from the architectural value of eip (working almost always with a speculative value of eip), so it may be several instructions ahead of the backend.
I believe each uOP probably has a field recording how to update eip at retirement, but not the size of the instruction in bytes.
Similarly, in the front-end an instruction's length in bytes is recorded only in the pre-decode stage; after that I think it's discarded (I can't think of any use for it).
Instructions in the L1 instruction cache are not yet decoded, so even if there was a way to inspect their content and metadata there would be nothing there.
The usual way this is done is by making a trace: single step through the program, disassemble the instruction at eip (see below), record its size, resume the program, repeat until a stop condition.
This gives you a list of addresses and instruction sizes.
If you find an instruction you can't decode, you either don't record a size for it or try to estimate it with some heuristic (its length must be less than 16B and you could in theory combine the data with the count from a PMC like BR_INST_RETIRED.ALL_BRANCHES).
It's possible to detect the size of an instruction at runtime but that's totally not feasible in this context.
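As a rough sketch of the TL;DR above, the bookkeeping can be modeled in Python without any debugger API. The `branch_length` hook is hypothetical: it stands in for whatever partial decoder you write, returning the encoded length for control transfer instructions and None for everything else. The addresses in the example are made up.

```python
from typing import Callable, List, Optional


def instruction_sizes(
    eips: List[int],
    branch_length: Callable[[int], Optional[int]],
) -> List[Optional[int]]:
    """Return the size in bytes of every traced instruction except the last.

    eips          -- EIP values recorded at each single-step stop, in order
    branch_length -- hypothetical decoder hook: encoded length if the instruction
                     at that address is a control transfer, otherwise None
    """
    sizes: List[Optional[int]] = []
    for current, following in zip(eips, eips[1:]):
        length = branch_length(current)
        if length is None:
            # Not a branch: execution fell through, so the next EIP marks the end.
            length = following - current
            if not 0 < length <= 15:
                length = None  # implausible difference: back off, report no size
        sizes.append(length)
    return sizes


# Made-up trace: a 5-byte mov at 0x401000, a 2-byte jmp at 0x401005 that lands
# on 0x401020, then a 1-byte instruction there.
known_branches = {0x401005: 2}
trace = [0x401000, 0x401005, 0x401020, 0x401021]
sizes = instruction_sizes(trace, known_branches.get)
```

The size of the final traced instruction is left out because there is no following EIP to compare against; a real trace would need the decoder for that one.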

Assembly - How to modify stack size?

I am a newbie in assembly programming and I am using push and pop instructions that use the memory stack.
So, what is the default stack size, how do I modify it, and what is the limit on its size?

Stack size depends upon a lot of factors.
It depends on where you start the stack, how much memory you have, what CPU you are using etc.
The CPU you are using is not called a "Windows CPU".
If you are specifying what CPU you are using, you specify the name of that CPU in detail and also, very important, the architecture of the CPU. In this case, you are probably using x86 architecture.
Here is the typical low-memory map of an x86 PC at boot (real mode), listed from the top down:
All addresses above 0x100000 - Free
0x100000 - 0xc0000 - BIOS
0xc0000 - 0xa0000 - Video Memory
0xa0000 - 0x9fc00 - Extended BIOS data area
0x9fc00 - 0x7e00 - Free
0x7e00 - 0x7c00 - Boot loader
0x7c00 - 0x500 - Free
0x500 - 0x400 - BIOS data area
0x400 - 0x00 - Interrupt vector table
In x86, stack information is held by two registers:
Base pointer (bp): By convention holds the starting address (base) of the stack or of the current stack frame
Stack pointer (sp): Holds the address of the most recently pushed value (the top of the stack)
These registers have different names in different modes:
Mode                     Base pointer   Stack pointer
16 bit real mode         bp             sp
32 bit protected mode    ebp            esp
64 bit mode              rbp            rsp
When you set up a stack, the stack pointer and the base pointer get the same address.
The stack is set up at the address specified in the base pointer register.
You can set up your stack anywhere in memory that is free, and the stack grows downwards.
Each time you "push" something onto the stack, the stack pointer register is decremented and the value is then stored at the address it specifies (so the first push lands just below the starting address).
Each time you "pop" something from the stack, the value stored at the address specified by the stack pointer register is copied into a register specified by the programmer and the stack pointer register is incremented.
In 16 bit real mode, you "push" and "pop" 16 bits. So each time you "push" or "pop", the stack pointer register is decremented or incremented by 0x02, since each address holds 8 bits.
In 32 bit protected mode, you "push" and "pop" 32 bits. So each time you "push" or "pop", the stack pointer register is decremented or incremented by 0x04, since each address holds 8 bits.
You will have to set up the stack in the right place depending upon how many values you are going to be "pushing".
If you keep "pushing", your stack keeps growing downwards and at some point it may overwrite something. So be wise and set up the stack at an address in memory where there is plenty of room for it to grow downwards.
For example:
If you set up your stack at 0x7c00, just below the bootloader, and you "push" too many values, your stack might overwrite the BIOS data area at some point, which causes a lot of errors.
You should have a basic idea of a stack and the size of it by now.
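To make the push and pop arithmetic concrete, here is a small Python model of a 32 bit stack in a flat byte array. It is only a sketch: the class name `Stack32`, the memory size and the starting address 0x7C00 are chosen for illustration, and real hardware does no bounds checking of this kind.

```python
class Stack32:
    """Toy model of a 32-bit x86 stack living in a flat byte array."""

    def __init__(self, memory_size: int, stack_top: int) -> None:
        self.memory = bytearray(memory_size)
        self.esp = stack_top  # illustrative starting address chosen by the caller

    def push(self, value: int) -> None:
        self.esp -= 4  # the stack grows toward lower addresses
        if self.esp < 0:
            raise MemoryError("stack ran past address 0")
        # x86 is little-endian: the low byte goes to the lowest address.
        self.memory[self.esp:self.esp + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self) -> int:
        value = int.from_bytes(self.memory[self.esp:self.esp + 4], "little")
        self.esp += 4  # popping releases the slot again
        return value


stack = Stack32(memory_size=0x8000, stack_top=0x7C00)
stack.push(0x11223344)
stack.push(0xDEADBEEF)
esp_after_pushes = stack.esp   # two pushes moved esp down by 8 bytes
first_popped = stack.pop()     # the last value pushed comes off first
```

How far `esp` can drop before it reaches memory used for something else is exactly the "stack size" in this bare-metal picture.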

Whatever loaded ("the loader") your program into memory, and passed control to it, determines where in memory the stack is located, and how much space is available for the stack.
It does so by the simple artifice of loading the stack pointer, typically using a MOV ESP, ... instruction before calling/jumping to your code. Your program then uses the stack area supplied.
If your program uses too much, it will write beyond the end of the allocated stack area. This is a program bug, because the memory past the end may be allocated for some other purpose in the application. Writing on that other memory is likely to change the program behavior (e.g., "bug") when that memory gets used, and finding the cause of that bug is likely to be difficult (people assume that stacks don't damage program data and vice versa).
If your application wants to use a larger stack, generally all you have to do is allocate your own area, large enough for your purposes, and do a MOV ESP, ... yourself to set the stack to the chosen location. How you allocate an area depends on the execution environment in which you run. (You need to respect ESP conventions: must be a multiple of 4, should be initialized to the bottom of a cache line, often useful to initialize to the bottom of virtual memory page).
It is generally a good idea when "switching" stacks to save the old value of ESP provided by the loader, and restore ESP to that old value before returning control to the loader/caller/OS. Likewise, you should free the extended stack space no longer being used.
This scheme will work if you know the amount of stack space you need in advance. In practice, this is rather hard to "guess" (and may be impossible if your code has a recursive algorithm that nests deeply). So you can either pick a really huge number bigger than you need (ick) or you can use an organized approach to switch stacks when it is clear to the program that it needs more.
See How does a stackless language work? for more discussion.

How does a CPU know if an address in RAM contains an integer, a pre-defined CPU instruction, or any other kind of data?

The reason this gets me confused is that all addresses hold a sequence of 1's and 0's. So how does the CPU differentiate, let's say, 00000100(integer) from 00000100(CPU instruction)?

First of all, different commands have different values (opcodes). That's how the CPU knows what to do.
The question remains: what's a command, and what's data?
Modern PCs use the von Neumann architecture ( https://en.wikipedia.org/wiki/John_von_Neumann), where data and opcodes are stored in the same memory space. (There are architectures that separate these two kinds of data, such as the Harvard architecture.)
Explaining everything in detail would be far beyond the scope of Stack Overflow; most likely the number of characters allowed per post would not be sufficient.
To answer the question in as few words as possible (everyone actually working at this level would kill me for the shortcuts in this explanation):
Data in the memory is stored at certain addresses.
Each CPU instruction basically consists of 3 different addresses (NOT values - just addresses!):
Address of what to do
Address of a value
Address of an additional value
So, assuming an addition should be performed and you have 3 addresses available in memory, the application would store the following (in the case of 5+7; I used "verbs" for the instructions):
Address | Stored Value
1 | ADD
2 | 5
3 | 7
Finally the CPU receives the instruction 1 2 3, which then means ADD 5 7 (these things are order-sensitive: [Command] [v1] [v2])... And now things get complicated.
The CPU will move these values (actually not the values, just the addresses of the values) into its registers and then process them. The exact registers chosen depend on data type, data size and opcode.
In the case of the command #1 #2 #3, the CPU will first read these memory addresses and then know that ADD 5 7 is desired.
Based on the opcode for ADD the CPU will know:
Put Address #2 into r1
Put Address #3 into r2
Read Memory-Value Stored at the address stored in r1
Read Memory-Value stored at the address stored in r2
Add both values
Write result somewhere in memory
Store Address of where I put the result into r3
Store Address stored in r3 into the Memory-Address stored in r1.
Note that this is simplified. Actually the CPU needs exact instructions on whether it's handling a value or an address. In assembly this is done by using
eax (means value stored in register eax)
[eax] (means value stored in memory at the address stored in the register eax)
The CPU cannot perform calculations on values stored in the memory, so it is quite busy moving values from memory to registers and from registers to memory.
For example, if you have
eax = 0x2
and in memory
0x2 = 110011
and the instruction
MOV ebx, [eax]
this means: move the value currently stored at the address that is currently stored in eax into the register ebx. So finally
ebx = 110011
(This happens EVERY TIME the CPU does a single calculation! Memory -> Register -> Memory)
Finally, the requesting application can read its predefined memory address #2,
resulting in address #2568, and then knows that the outcome of the calculation is stored at address #2568. Reading that address will result in the value 12 (5+7).
This is just a tiny, tiny example of what's going on. For a more detailed introduction, refer to http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
One cannot really grasp the amount of data movement and calculation done for a simple addition of 2 values. Doing what a CPU does (on paper) would take you several minutes just to calculate "5+7", since there is no "5" and no "7" - everything is hidden behind an address in memory, pointing to some bits, resulting in different values depending on what the bits at address 0x1 are instructing...

Short form: The CPU does not know what's stored there, but the instructions tell the CPU how to interpret it.
Let's have a simplified example.
If the CPU is told to add a word (let's say, a 32-bit integer) stored at location X, it fetches the content of that address and adds it.
If the program counter reaches the same location, the CPU will again fetch this word, but this time execute it as a command.
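A tiny Python illustration of that point, using the byte 00000100 from the question. The `decode` function is a toy that knows exactly one real x86 encoding (opcode 0x04 followed by one byte is `add al, imm8`); everything else about it is made up for the example.

```python
memory = bytes([0x04, 0x07])  # 0x04 is binary 00000100


# Read as data: the first byte is simply the integer 4.
as_integer = memory[0]


def decode(code: bytes) -> str:
    """Toy decoder: recognizes only opcode 0x04, 'add al, imm8'."""
    if code[0] == 0x04:
        return f"add al, {code[1]}"
    return "unknown"


# Read as code: the very same byte is the start of an instruction.
as_instruction = decode(memory)
```

Nothing in `memory` says which reading is the right one; the choice is made entirely by what the program does with that address.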

The CPU (other than security stuff like the NX bit) is blind to whether it's data or code.
The only way data doesn't accidentally get executed as code is by carefully organizing the code to never refer to a location holding data with an instruction meant to operate on code.
When a program is started, the processor starts executing it at a predefined spot. The author of a program written in machine language will have intentionally put the beginning of their program there. From there, each instruction will always end up setting the next location the processor will execute to somewhere there is an instruction. This continues to be the case for all of the instructions that make up the program, unless there is a serious bug in the code.
There are two main ways instructions can set where the processor goes next: jumps/branches, and not explicitly specifying. If the instruction doesn't explicitly specify where to go next, the CPU defaults to the location directly after the current instruction. Contrast that with jumps and branches, which have space to specifically encode the address of the next instruction. Jumps always jump to the place specified. Branches check if a condition is true. If it is, the CPU will jump to the encoded location. If the condition is false, it will simply go to the instruction directly after the branch.
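The three cases (fall through, jump, branch) can be sketched with a toy machine in Python. The instruction names and the single accumulator are invented for the example and do not correspond to any real instruction set.

```python
from typing import List, Tuple


def run(program: List[Tuple[str, int]], acc: int) -> Tuple[int, List[int]]:
    """Execute a toy program; return the final accumulator and visited addresses."""
    pc = 0  # program counter: address of the next instruction
    visited: List[int] = []
    while pc < len(program):
        op, arg = program[pc]
        visited.append(pc)
        if op == "halt":
            break
        if op == "jump":
            pc = arg                 # jumps always go to the encoded address
        elif op == "branch_if_zero" and acc == 0:
            pc = arg                 # condition true: take the branch
        else:
            if op == "add":
                acc += arg
            pc += 1                  # default: the instruction directly after
    return acc, visited


# Count the accumulator down to zero, then stop.
program = [
    ("branch_if_zero", 3),  # address 0
    ("add", -1),            # address 1
    ("jump", 0),            # address 2
    ("halt", 0),            # address 3
]
final_acc, path = run(program, acc=2)
```

The list `path` records every address the program counter visited, which shows that only locations holding instructions are ever fetched as long as the program is written correctly.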
Additionally, a machine language program should never write data to a location that is meant for instructions, or some other instruction at some future point in the program could try to run what was overwritten with data. Having that happen could cause all sorts of bad things. The data there could have an "opcode" that doesn't match anything the processor knows how to handle. Or the data there could tell the computer to do something completely unintended. Either way, you're in for a bad day. Be glad that your compiler never messes up and accidentally inserts something that does this.
Unfortunately, sometimes the programmer using the compiler messes up, and does something that tells the CPU to write data outside of the area they allocated for data. (A common way this happens in C/C++ is to allocate an array L items long, and use an index >=L when writing data.) Having data written to an area set aside for code is what buffer overflow vulnerabilities are made of. Some program may have a bug that lets a remote machine trick the program into writing data (which the remote machine sent) beyond the end of an area set aside for data, and into an area set aside for code. Then, at some later point, the processor executes that "data" (which, remember, was sent from a remote computer). If the remote computer/attacker was smart, they carefully crafted the "data" that went past the boundary to be valid instructions that do something malicious. (To give them more access, destroy data, send back sensitive data from memory, etc).

This is because an ISA must define what the valid instructions are and how to encode their operands: memory addresses, registers and literals.
See this for more general info on how an ISA is designed:
https://en.wikipedia.org/wiki/Instruction_set

In short, the operating system tells it where the next instruction is. In the case of x64 there is a special register called rip (instruction pointer) which holds the address of the next instruction to be executed. It will automatically read the data at this address, decode and execute it, and automatically increment rip by the number of bytes of the instruction.
Generally, the OS can mark regions of memory (pages) as holding executable code or not. If an error or exploit tries to modify executable memory an error should occur, similarly if the CPU finds itself trying to execute non-executable memory it will/should also signal an error and terminate the program. Now you're into the wonderful world of software viruses!

Can someone help me with segmentation and 8086 intel's microprocessor?

I am reading about the architecture of Intel's 8086 and can't figure out the following things about segmentation. I know that each segment register points to a segment and contains the base address of a 64 KB segment. But who calculates the physical address, and at what point is it set in the segment registers? Also, because one physical address can be reached through multiple segment:offset pairs and segments can overlap, how can you be sure that you won't overwrite something? Where can I read more about this?

Generally speaking the Assembler will only use offset addresses to access a logical address. For example looking at this code:
start lea si,[hello] ; Load effective address of string
mov word [ds:si+10],0 ; Zero-terminate string after 10th letter
jmp $ ; Loop endlessly
; Fill rest of the segment with 0s
times 65536-($-$$) db 0x00
hello db "I'm just outside of the current segment. Hello!",0
The assembler will try to calculate the offset of 'hello' from the origin of the program. Since no origin is defined, 0x0 will be assumed. However, the offset of 'hello' would be 0x10000 in this case, which does not fit in 16 bits. Therefore the assembler will truncate the address to 0x0000. It will not change any of the segment registers. However, it will likely issue a warning, for example test.asm:1: warning: word data exceeds bounds. What actually happens when you run this program is that the jmp $ line is overwritten with zeroes, because the address of hello wrapped around, and the CPU will start executing nothing but zeroes, which was not what you intended to do.
That is of course only the case if the code segment and data segment are the same. Now who guarantees that? Nobody really, especially since I still don't know what platform you are coding for. It is entirely your responsibility to set up the segment registers with correct values. The easiest way to do so is:
push cs ; Push address of code segment to stack
pop ds ; Pop address back into data segment
push cs ; Same for extra data segment
pop es ;
This way you can be certain you are accessing the offset in the correct data segment.
Now regarding 'How do you make sure the code segment doesn't overlap the data segment': why shouldn't it? When your program with its data is smaller than 64KB, having identical code and data segments is actually the easiest way to access data.
And how can you be sure that you don't overwrite anything important? The assembler can't help you with that; you have to check for yourself whether the segment:offset address you are writing to already contains data.
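For reference, the 8086 forms a physical address as segment * 16 + offset, wrapped to 20 bits. The short Python sketch below (the function name and the sample values are mine, picked only for illustration) shows two different segment:offset pairs landing on the same byte, and the 16-bit truncation of the offset 0x10000 from the example above.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 address translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that name the same physical byte.
first = physical_address(0x1234, 0x0010)
second = physical_address(0x1235, 0x0000)

# An offset of 0x10000 does not fit in 16 bits and truncates to 0.
truncated_offset = 0x10000 & 0xFFFF
```

Because many pairs alias the same location, nothing in the hardware stops a write through one segment from landing on data or code addressed through another.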

How to view the current stack size of a windows thread when it overflows

I have a process that is overflowing the stack when run from within an IIS process, but works fine when run on its own. I suspect that on its own it gets the default 1MB stack, but within IIS gets somewhat less.
To avoid messing with the IIS worker processes I am using a sub-thread within the IIS process to allocate a bigger stack, but I suspect the stack size argument to Thread creation is being ignored as per the documentation (http://msdn.microsoft.com/en-us/library/ms149581.aspx)
When the stack overflows I can view the halted process in the debugger, but how do I find out how big a stack was actually allocated?

The answer is as follows.
In the debugger, add a watch on the pseudo-register TIB (http://msdn.microsoft.com/en-us/library/aa232399(v=vs.60).aspx )
Now take this value and display that address in a memory window. Subtract the third 4-byte word (the stack limit) from the second 4-byte word (the stack base), remembering to use little-endian byte ordering.
http://en.wikipedia.org/wiki/Win32_Thread_Information_Block
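If you would rather not do the little-endian arithmetic by hand, the subtraction can be sketched in Python. This assumes a 32-bit process and that you copy the first 12 bytes at the TIB address out of the memory window; the sample bytes below are made up.

```python
import struct


def stack_size_from_tib(tib_bytes: bytes) -> int:
    """Return stack base minus stack limit from the start of a 32-bit TIB."""
    # First three little-endian 32-bit fields: exception list pointer,
    # stack base (highest address), stack limit (lowest address in use).
    _exception_list, stack_base, stack_limit = struct.unpack_from("<3I", tib_bytes)
    return stack_base - stack_limit


# Made-up bytes, typed in the order a memory window shows them.
sample = bytes.fromhex("ffffffff" "00001300" "00c01200")
size = stack_size_from_tib(sample)
```

As far as I know the stack limit only tracks the part of the stack touched so far, so this figure is most meaningful when read at the moment of the overflow, when the whole stack has been used.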
