How to disambiguate instructions from data in the .text segment of a PE file? - windows

I have a PE file and I am trying to disassemble it in order to get its instructions. However, I noticed that the .text segment contains not only instructions but also some data (I used IDA to notice that). Here's an example:
.text:004037E4 jmp ds:__CxxFrameHandler3
.text:004037EA ; [00000006 BYTES: COLLAPSED FUNCTION _CxxThrowException. PRESS KEYPAD "+" TO EXPAND]
.text:004037F0 ;
.text:004037F0 mov ecx, [ebp-10h]
.text:004037F3 jmp ds:??1exception@std@@UAE@XZ ; std::exception::~exception(void)
.text:004037F3 ;
.text:004037F9 byte_4037F9 db 8Bh, 54h, 24h ; DATA XREF: sub_401440+2o
.text:004037FC dd 0F4428D08h, 33F04A8Bh, 0F6B2E8C8h, 0C4B8FFFFh, 0E9004047h
.text:004037FC dd 0FFFFFFD0h, 3 dup(0CCCCCCCCh), 0E904458Bh, 0FFFFD9B8h
.text:00403828 dword_403828 dd 824548Bh, 8BFC428Dh, 0C833F84Ah, 0FFF683E8h, 47F0B8FFh
.text:00403828 ; DATA XREF: sub_4010D0+2o
.text:00403828 ; .text:00401162o
.text:00403828 dd 0A1E90040h, 0CCFFFFFFh, 3 dup(0CCCCCCCCh), 50E0458Dh
.text:00403828 dd 0FFD907E8h, 458DC3FFh, 0D97EE9E0h
.text:00403860 db 2 dup(0FFh)
.text:00403862 word_403862 dw 548Bh
How can I distinguish such data from instructions? My solution was to find the first instruction (the entry address) and visit each instruction and all called functions. Unfortunately, it turned out that some blocks of code are not called directly; their addresses sit in the .rdata segment among other data, and I have no idea how to distinguish valid instruction addresses from data.
To sum up: is there any way to decide whether some address in .text segment contains data or instructions? Or maybe is there any way to decide which potential addresses in .rdata should be interpreted as instructions addresses and which as data?

You cannot, in general. The .text section of a PE file can mix code and constants any way the author likes. Programs like IDA try to make sense of this by starting at the entry points, disassembling from there, and tracking which addresses are targets of jumps and which are targets of reads. But devious programs can 'pun' between instructions and data.
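As a rough illustration of the heuristic side of this, here is a minimal Python sketch that scans a block of .rdata bytes for aligned 32-bit values falling inside the .text address range. The function name, the 4-byte alignment and the address range are assumptions made for the example. Anything it reports is only a candidate code pointer: an ordinary constant can land in that range by coincidence, which is exactly why no definitive answer exists.

```python
import struct

def candidate_code_pointers(rdata: bytes, text_start: int, text_end: int):
    """Return (offset, value) pairs for aligned little-endian dwords in
    rdata whose value lies in [text_start, text_end).

    Hypothetical helper: a hit is only a *candidate* pointer into .text.
    """
    hits = []
    for off in range(0, len(rdata) - 3, 4):
        (value,) = struct.unpack_from("<I", rdata, off)
        if text_start <= value < text_end:
            hits.append((off, value))
    return hits

# Made-up .rdata contents: two values inside .text, two outside.
sample = struct.pack("<4I", 0x004037F9, 0x12345678, 0x00403828, 0)
for off, value in candidate_code_pointers(sample, 0x00401000, 0x00404000):
    print(f"+{off:#x}: {value:#010x}")
```

Each candidate would still have to be confirmed some other way, for example by checking that disassembly from that address yields a plausible instruction stream.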

Related

Explain to me how Windows allocates process virtual memory

I have a fairly complex question made up of several related questions. Let me start with the preamble.
I wrote a simple Win64 program in assembly language which prints "2 + 3 = 5" using printf and then "Hello World!" using puts:
format PE64
entry start
section '.text' code readable executable
start:
sub rsp,8*5 ; reserve stack for API use and make stack dqword aligned
mov edx, 3
mov ecx, 2
call print_sum
lea rcx,[_hw_message]
call [puts]
mov ecx,0
call [ExitProcess]
print_sum:
sub rsp, 20h
mov r9d, ecx
add r9d, edx
mov r8d, edx
mov edx, ecx
lea ecx, [_format_message]
call [printf]
add rsp, 20h
ret
section '.data' data readable writeable
_hw_message db 'Hello World!',0
_format_message db '%d + %d = %d',13,10,0
section '.idata' import data readable writeable
dd 0,0,0,RVA kernel_name,RVA kernel_table
dd 0,0,0,RVA msvcrt_name,RVA msvcrt_table
kernel_table:
ExitProcess dq RVA _ExitProcess
dq 0
msvcrt_table:
printf dq RVA _printf
puts dq RVA _puts
dq 0
kernel_name db 'KERNEL32.DLL',0
msvcrt_name db 'msvcrt.dll',0
_ExitProcess dw 0
db 'ExitProcess',0
_printf dw 0
db 'printf',0
_puts dw 0
db 'puts',0
and built it with fasm. Resulting binary size is 2048 bytes.
I've opened it with CFF Explorer to see PE header values.
Image base is 0x400000, entry point is 0x1000, .text section virtual address is 0x1000 too, so, as far as I understand, it should start in virtual memory at offset 0x401000 and it is also its entry point.
Then I've opened it in debugger (I use x64dbg) to confirm my guess:
Looks believable. Also note that stack is located at 0x8A000.
Fine, then I've tried the same with another program – notepad.exe from C:\Windows:
Wait, what? 0x140000000 + 0x24050 = 0x140024050, not 0x7FF75FD04050. And I can't find in PE headers such big values starting with 7FF.
In addition, the stack is again located somewhere at the beginning of the process's memory map, but now its address is already much larger:
I thought that perhaps this is because notepad.exe is a system program and is tightly tied to the Windows system APIs, and some parts of it (and maybe all the code) are always loaded into RAM while Windows is running. Therefore, I tried to do the same with x64dbg itself, and saw about the same picture:
image base: 0x140000000
entry point (in headers): 0x2440
entry point in VM: 0x7FF6B0E82440
location of stack in VM: 0xFDA07F8000
So the questions are:
Why are sections of some programs mapped to addresses greater than 0x7ff000000000, which doesn't match PE headers?
How are these processes different from others?
How does the OS decide where to place the stack in virtual memory?
Each thread has its own stack. As you can see from the screenshots, thread stacks are usually placed before code sections. If the program starts a dynamic number of threads, this memory may not be enough. Where, in this case, will stacks be allocated for new threads?
How can I programmatically, having an executable file, but not running it, statically determine at what addresses in the virtual memory of its process the sections, the stack will be located, and what address spaces will be available for allocation on the heap?
I understand that this can be difficult to explain in a nutshell, so I appreciate if, in addition to answering my questions, you can recommend me some reading material that will help me improve my understanding of the Windows virtual memory mapping.
What you're seeing is address space layout randomization (ASLR), which MSVC enables by default with the linker flag /DYNAMICBASE.
For an image to opt in, the flag IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x40) must be set in the PE header, at NT headers -> OptionalHeader -> DllCharacteristics.
When it is set, the OS selects a random address for the image base, stack, and heap, and the ImageBase specified in the PE header is not used as the load address.
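To check that flag without running the executable, something like the following Python sketch should work. It assumes the standard PE layout (e_lfanew at file offset 0x3C, a 4-byte signature, a 20-byte file header, and DllCharacteristics at offset 70 of the optional header for both PE32 and PE32+). The function name is made up for the example.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040

def has_dynamic_base(image: bytes) -> bool:
    """Return True if the PE image opts in to ASLR (hypothetical helper)."""
    # e_lfanew: file offset of the "PE\0\0" signature, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # The optional header follows the signature (4) and file header (20).
    optional_header = e_lfanew + 4 + 20
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+.
    (dll_characteristics,) = struct.unpack_from("<H", image, optional_header + 70)
    return bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
```

Usage would be along the lines of has_dynamic_base(open(path, "rb").read()), where path is whatever executable you want to inspect.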

LOOP is done only one time when single stepping in Turbo Debugger

The code should output 'ccb', but it outputs only 'c'. The LOOP executes only once. I have stepped through it in TD, but why does the LOOP execute only once?
I think that I have to decrement STRING_LENGTH, so I wrote
DEC STRING_LENGTH
But it does not work, so I wrote this instead:
MOV SP,STRING_LENGTH
DEC SP
MOV STRING_LENGTH,SP
I know what you are thinking right now: that is so incorrect. You are right :)
I could use C++, but I want to do it only in assembly.
DOSSEG
.MODEL SMALL
.STACK 200H
.DATA
STRING DB 'cScbd$'
STRING_LENGTH EQU $-STRING
STRING1 DB STRING_LENGTH DUP (?) , '$'
.CODE
MOV AX,@DATA
MOV DS,AX
XOR SI,SI
XOR DI,DI
MOV CX,STRING_LENGTH
S:
MOV BL,STRING[DI]
AND STRING[DI],01111100B
CMP STRING[DI],01100000B
JNE L1
MOV AL,BL
MOV STRING1[SI],AL
ADD SI,2
L1:
ADD DI,2
LOOP S
MOV DL,STRING1
MOV AH,9
INT 21H
MOV AH,4CH
INT 21H
END
In Turbo Debugger (TD.EXE), F8 ("step") executes the loop completely, until cx becomes zero (you can even create an infinite loop by setting cx back to some value, preventing it from reaching the 1 -> 0 step).
To single-step through the loop instruction, use F7 ("trace"): cx will go from 6 to 5, and the code pointer will follow the jump back to the start of the loop.
About some other issues of your code:
MOV SP,STRING_LENGTH
DEC SP
MOV STRING_LENGTH,SP
sp is not a general-purpose register; don't use it for calculations like this. Whenever an instruction uses the stack implicitly (push, pop, call, ret, ...), values are written to and read from the memory area addressed by the ss:sp register pair, so by manipulating the sp value you are modifying the current "stack".
Also, in 16-bit x86 real mode, whenever an interrupt occurs (keyboard, timer, ...), the current flags register and code address are stored on the stack before control passes to the interrupt handler, which will usually push additional values as well. So whatever is in memory at addresses below the current ss:sp is not safe in 16-bit x86 real mode: its content keeps changing "randomly" as interrupts execute in the meantime (TD.EXE itself uses part of this stack memory after every single step).
For arithmetic, use other registers, not sp. Once you know enough about the "stack", you will understand what kind of sp manipulation is common and why (like sub sp,40 at the beginning of a function which needs additional "local" memory space), and how to restore the stack to its expected state.
One more thing about that:
MOV SP,STRING_LENGTH
DEC SP
MOV STRING_LENGTH,SP
STRING_LENGTH is defined by EQU, which makes it a compile-time constant, and only that. It is not a "variable" (a memory allocation), unlike something like someLabel dw 1345, which causes the assembler to emit two bytes with values 0100_0001B, 0000_0101B (read as a 16-bit little-endian word, that is the value 1345), and gives the address of the first byte the symbolic name someLabel. That name can be used in further instructions, like dec word ptr [someLabel], to decrement the value in memory from 1345 to 1344 at runtime.
But EQU is different: it assigns the symbol STRING_LENGTH a final value, here 6 (the length of 'cScbd$').
So your code can be read as:
mov sp,6 ; almost makes sense (but practically destroys the stack setup)
dec sp ; still valid
mov 6,sp ; doesn't make any sense, a constant can't be the destination for MOV
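If the byte-level picture of someLabel dw 1345 is unclear, this small Python sketch may help. It models the two emitted bytes and the runtime decrement on them. The helper names emit_word and dec_word are made up for the illustration; they are not assembler features.

```python
import struct

def emit_word(value: int) -> bytearray:
    """Bytes an assembler would emit for 'dw value' (little-endian)."""
    return bytearray(struct.pack("<H", value))

def dec_word(memory: bytearray, offset: int) -> None:
    """Model of 'dec word ptr [offset]': read, decrement, write back."""
    (value,) = struct.unpack_from("<H", memory, offset)
    struct.pack_into("<H", memory, offset, (value - 1) & 0xFFFF)

memory = emit_word(1345)
print([format(b, "08b") for b in memory])   # ['01000001', '00000101']
dec_word(memory, 0)
print(struct.unpack_from("<H", memory, 0))  # (1344,)
```

An EQU constant has no such bytes anywhere in memory, so there is nothing for dec or mov to write to.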

How can I print numbers in my assembly program

I have a problem with my assembly program. My assembly compiler is NASM. The source and the outputs are in this picture:
The problem is that I can't print numbers from calculations with the extern C function printf(). How can I do it?
The output should be "Ergebnis: 8" but it isn't correct.
The NASM documentation points out that NASM requires square brackets for memory references. When you write a label name without brackets, NASM gives you its memory address (or offset, as it is sometimes called). So mov eax, val_1 means that the eax register gets val_1's offset. When you then add eax, val_2, the offset of val_2 is added to the offset of val_1, and you get the result you see.
Write instead:
mov eax, [val_1]
add eax, [val_2]
And you should get 8 in eax.
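To make the difference concrete, here is a tiny Python model of the two cases. The addresses 0x1000 and 0x1004 and the stored values 3 and 5 are assumptions for the illustration (the original source is only available as a picture); they are chosen so that the correct sum is 8.

```python
# Hypothetical layout: label -> address, address -> stored value.
labels = {"val_1": 0x1000, "val_2": 0x1004}
memory = {0x1000: 3, 0x1004: 5}

def add_without_brackets() -> int:
    """mov eax, val_1 / add eax, val_2: adds the two addresses."""
    return labels["val_1"] + labels["val_2"]

def add_with_brackets() -> int:
    """mov eax, [val_1] / add eax, [val_2]: adds the stored values."""
    return memory[labels["val_1"]] + memory[labels["val_2"]]

print(hex(add_without_brackets()))  # 0x2004
print(add_with_brackets())          # 8
```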
P.S. It seems that you have just switched to NASM from MASM or TASM.
There are a lot of guides for switchers like you. See for example nice tutorials here and here.

PIC30F Data EEPROM reads 0xFFFF first time around

The first time my PIC30F code reads a word from Data EEPROM, it reads 0xFFFF instead of the data actually in EEPROM. It reads fine afterward.
After a bad read, I checked W1 and it does have the correct address
There are no words in data EEPROM with a value of 0xFFFF
I checked the supply: it's 5.13 V
If I break right before the table read instruction, and step through it, it works fine
I know that NVMADRU and NVMADR are not involved in reading, but I checked them, and their value doesn't change between good reads and bad reads
It's a dsPIC30F5011
I checked the Errata, and did not find any reference to such issue
I am working through the debug function of MPLAB 8, with a PICkit II: I reset, then run, and it fails
If I place the code in a tight loop until the value is correct, and counting the number of iterations, I see that it takes 2339 times through the loop until it reads correctly
EEPROM read code:
_ReadEEWord:
;--------------------------------------------------------------------------------
; Read a word from Data EEPROM
; Entry W0 Word address relative to the start of Data EEPROM
; Exit W0 Word at that location
; Uses W1, Table pointer
;--------------------------------------------------------------------------------
; Start address of Data EEPROM
#define DATAEE_START 0x7FFC00
; Setup pointer to EEPROM memory
mov #0x7F,W1 ; Set the table pointer
mov W1,TBLPAG ; to the page with the EEPROM
add W0,W0,W0 ; Convert the word address to a byte address
mov #0xFC00,W1 ; Add the start of EEPROM
add W1,W0,W1 ; to the address
nop
nop
nop
; Return the EEPROM data
tblrdl [W1],W0 ; Read the EEPROM data
nop
nop
nop
return
Any suggestions of what may be causing that?
SOLVED
The documentation doesn't say so, but, before you can read data EEPROM, you must wait for any previous EEPROM operations to be done.
You can do it in one of these ways:
1) In C:
#include <libpic30.h> // Includes EEPROM utilities
_wait_eedata(); // Wait for the erase to be done
2) In C, no lib import
while (NVMCONbits.WR);
3) In assembly:
btsc NVMCON,#15 ; If busy (WR bit is set)
bra $-2 ; Go back and wait

Assemble IA-32 mov [bootdrv], dl

I have just started programming IA-32 assembly and boot loaders, and I can't understand one instruction: mov [bootdrv], dl.
dl is the low 8 bits of the data register, but I don't know what [bootdrv] is. Is it a variable or something? How can a register be stored in [bootdrv]?
start:
mov ax,0x7c0 ; BIOS puts us at 0:07C00h, so set DS accordingly
mov ds,ax ; Therefore, we don't have to add 07C00h to all our data
mov [bootdrv], dl ; quickly save what drive we booted from
These are the first 3 lines of a boot loader, and [bootdrv] just appears without any definition, which I can't understand.
Any information would be helpful and appreciated, thank you!
[bootdrv] specifies a memory operand: the byte at offset bootdrv in the current data segment. The code:
mov [bootdrv], dl
copies the contents of the 8-bit DL register into a byte in memory, at the address obtained by multiplying the current value of DS by 16 and then adding the value of bootdrv. bootdrv itself is a label, whose value represents where in the current data segment that memory position is located.
On the other hand, the symbol bootdrv must be defined somewhere. Otherwise, the assembler will stop with a "symbol not defined" error. Maybe it's defined past the code (assemblers do two passes through the source code in order to get all symbols so they can be used even if they are defined after the code sequence that uses them). Maybe it's in a separate .INC file.
mov [bootdrv], dl indicates a segment:offset memory access. In the previous instruction, you configured the Data Segment register with an address, so the mov [bootdrv], dl instruction writes to the segment:offset address 0x7c0:bootdrv, whatever bootdrv might be.
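The segment:offset arithmetic described above can be sketched in a few lines of Python. The function name and the offset 0x1FD used for bootdrv are assumptions for the example; the real offset depends on where the label is defined in the boot loader source.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset (no A20 wrap modelled)."""
    return (segment << 4) + offset

BOOTDRV_OFFSET = 0x1FD  # hypothetical offset of the bootdrv label

# With DS = 0x7C0, offset 0 already points at 0x7C00, where the BIOS
# loaded the boot sector, so data offsets need no extra 0x7C00 added.
print(hex(linear_address(0x7C0, 0)))               # 0x7c00
print(hex(linear_address(0x7C0, BOOTDRV_OFFSET)))  # 0x7dfd
```

This is why the code loads 0x7c0 into DS first: labels in the boot sector can then be used as plain offsets from the start of the loaded sector.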

Resources