Thread Environment block and Process Environment block - windows

I have read that Win32 process memory contains these structures:
One Process Environment Block (PEB) (one per process)
Several Thread Environment Blocks (TEB) (one per thread inside the process)
I have read a lot of documentation and I do not understand:
Are the TEB and PEB specific to 32-bit x86 Windows, or do they also exist on 64-bit x86 Windows?
Is there a way to loop over the TEBs of all of a process's threads without calling the Windows API?
What is the equivalent of the TEB/PEB on Linux systems?
Thanks

Are the TEB and PEB specific to 32-bit x86 Windows, or do they also exist on 64-bit x86 Windows?
There's a TEB and a PEB for both 32-bit and 64-bit programs; e.g. there are TEB32 and TEB64 structures (you can see them in the kernel symbols). They have the same fields, but since pointer-sized fields are larger on x64 (a pointer is 4 bytes on 32-bit but 8 bytes on 64-bit), their sizes differ and so do the field offsets.
From a kernel debugger:
0: kd> ?? sizeof(_TEB64)
unsigned int64 0x1838
0: kd> dt _TEB64
nt!_TEB64
+0x000 NtTib : _NT_TIB64
+0x038 EnvironmentPointer : Uint8B
+0x040 ClientId : _CLIENT_ID64
+0x050 ActiveRpcHandle : Uint8B
+0x058 ThreadLocalStoragePointer : Uint8B
+0x060 ProcessEnvironmentBlock : Uint8B
+0x068 LastErrorValue : Uint4B
...
0: kd> ?? sizeof(_TEB32)
unsigned int64 0x1000
0: kd> dt _TEB32
nt!_TEB32
+0x000 NtTib : _NT_TIB32
+0x01c EnvironmentPointer : Uint4B
+0x020 ClientId : _CLIENT_ID32
+0x028 ActiveRpcHandle : Uint4B
+0x02c ThreadLocalStoragePointer : Uint4B
+0x030 ProcessEnvironmentBlock : Uint4B
+0x034 LastErrorValue : Uint4B
...
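To make the offset difference concrete, here is a minimal Python sketch that models only the fields visible in the two dumps above and derives their offsets from the pointer width. It assumes that _NT_TIB consists of seven pointer-sized members and that _CLIENT_ID consists of two (consistent with the sizes implied by the dumps); the names make_teb_prefix, TEB32_PREFIX and TEB64_PREFIX are made up for this illustration and are not real Windows types.

```python
import ctypes


def make_teb_prefix(name, ptr_type):
    """Model the first few TEB fields for a given pointer width (illustration only)."""
    fields = [
        ("NtTib", ptr_type * 7),                 # assumed: seven pointer-sized members
        ("EnvironmentPointer", ptr_type),
        ("ClientId", ptr_type * 2),              # assumed: UniqueProcess, UniqueThread
        ("ActiveRpcHandle", ptr_type),
        ("ThreadLocalStoragePointer", ptr_type),
        ("ProcessEnvironmentBlock", ptr_type),
        ("LastErrorValue", ctypes.c_uint32),     # 4 bytes on both architectures
    ]
    return type(name, (ctypes.Structure,), {"_fields_": fields})


TEB32_PREFIX = make_teb_prefix("TEB32_PREFIX", ctypes.c_uint32)
TEB64_PREFIX = make_teb_prefix("TEB64_PREFIX", ctypes.c_uint64)


def field_offsets(struct_type):
    """Return {field name: byte offset} for a ctypes structure type."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}


for struct_type in (TEB32_PREFIX, TEB64_PREFIX):
    print(struct_type.__name__)
    for name, offset in field_offsets(struct_type).items():
        print(f"  +0x{offset:03x} {name}")
```

The computed offsets line up with the dt output: the same field order, with each pointer-sized slot doubling in width on 64-bit.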
Is there a way to loop over the TEBs of all of a process's threads without calling the Windows API?
Nope, TEBs are not linked and the PEB doesn't have a list of the TEBs. At the kernel level this is possible (with EPROCESS and ETHREAD structures), but not at the user-mode level. So, not without calling an API (e.g. NtQueryInformationThread).
What is the equivalent of TEB/PEB for Linux systems ?
There's no direct 1:1 mapping between the TEB/PEB and Linux structures; the closest you could get is, I guess, task_struct and thread_info (which are more akin to EPROCESS / ETHREAD), but the system architectures are different enough that there are no real counterparts in Linux.

Related

inl instruction in __asm__

Been going through the linux kernel code and I have seen this:
__asm__("inl (%%dx)..."
Been trying to look it up online but couldn't find any docs on this instruction.
It's supposed to be something related to I/O.
It is the IN instruction with a 16-bit port argument (%dx) and a 32-bit destination value (%eax); in AT&T syntax the l suffix selects the 32-bit (long) operand size:
Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
ED      IN EAX,DX    ZO     Valid        Valid            Input doubleword from I/O port in DX into EAX.
It reads a DWORD from the I/O address space.

Pinning a DLL in memory (increase reference count)

I am trying to run an application, but the application exits due to an access violation. Running the application in the debugger I can see that this is caused by an unloaded library. I cannot wait for the next release of the application, so I'm trying to work around the problem.
I wonder whether WinDbg provides a way of increasing the reference count of a loaded module, similar to the C++ LoadLibrary() call. I could then break on module loads and increase the reference count on the affected DLL to see if I can use the application then.
I have already looked for commands starting with .load, !load, .lock, !lock, .mod and !mod in WinDbg help. .load will load the DLL as an extension into the debugger process, not into the target process.
Update
Forgot to mention that I have no source code, so I can't simply implement a LoadLibrary() call as a workaround and recompile.
The comment by Hans Passant leads me to .call and I tried to use it like
.call /v kernel32!LoadLibraryA("....dll")
but it gives the error message
Symbol not a function in '.call /v kernel32!LoadLibraryA("....dll")'
Update 2
Probably the string for the file name in .call should be a pointer to some memory in the target process instead of a string which resides in WinDbg.exe where I type the command. That again means I would probably need to allocate some memory to store the string inside, so this might become more complex.
Using .call in WinDbg has always been finicky for me. I believe you are having trouble with it because kernel32 only has public symbols, so the debugger doesn't know what its arguments look like.
So let's look at some alternatives...
The easy way
You can go grab a tool like Process Hacker, which I think is a wonderful addition to any debugger's tool chest. It has an option to inject a DLL into a process.
Behind the scenes, it calls CreateRemoteThread to spawn a thread in the target process which calls LoadLibrary on the chosen DLL. With any luck, this will increase the module reference count. You can verify that the LoadCount has been increased in windbg by running the !dlls command before and after the dll injection.
The hard way
You can also dig into the internal data structures Windows uses to keep track of a process's loaded modules and play with the LoadCount. This changes between versions of Windows and is a serious no-no. But, we're debugging, so, what the hell? Let's do this.
Start by getting a list of loaded modules with !dlls. Suppose we care about your.dll; we might see something like:
0x002772a8: C:\path\to\your.dll
Base 0x06b80000 EntryPoint 0x06b81000 Size 0x000cb000 DdagNode 0x002b3a10
Flags 0x800822cc TlsIndex 0x00000000 LoadCount 0x00000001 NodeRefCount 0x00000001
We can see that the load count is currently 1. To modify it, we could use the address printed before the module path. It is the address of the ntdll!_LDR_DATA_TABLE_ENTRY the process holds for that module.
r? @$t0 = (ntdll!_LDR_DATA_TABLE_ENTRY*) 0x002772a8
And now you can change the LoadCount member to something larger, like so:
?? @$t0->LoadCount = 2
But, as I said, this stuff changes with new versions of Windows. On Windows 8, the LoadCount member was moved out of _LDR_DATA_TABLE_ENTRY and into a new ntdll!_LDR_DDAG_NODE structure. In its place, there is now an ObsoleteLoadCount, which is not what we want.
On Windows 8, we would run the following command instead:
?? @$t0->DdagNode->LoadCount = 2
And, time to check our work...
0x002772a8: C:\path\to\your.dll
Base 0x06b80000 EntryPoint 0x06b81000 Size 0x000cb000 DdagNode 0x002b3a10
Flags 0x800822cc TlsIndex 0x00000000 LoadCount 0x00000002 NodeRefCount 0x00000001
Awesome. It's 2 now. That'll teach FreeLibrary a lesson about unloading our DLLs before we say it can.
The takeaway
Try the easy way first. If that doesn't work, you can start looking at the internal data structures Windows uses to keep track of this kind of stuff. I don't provide the hard way hoping you'll actually try it, but that it might make you more comfortable around the !dlls command and those data structures in the future.
Still, all modifying the LoadCount will afford you is confirmation that you are seeing a DLL get unloaded before it should have. If the problem goes away after artificially increasing the LoadCount, meaning that you've confirmed your theory, you'll have to take a different approach to debugging it -- figuring out when and why it got unloaded.
A DLL that is linked at build time normally has a LoadCount of -1 (0xffff) and cannot be unloaded via FreeLibrary,
so you can use the module-load event to break on a dynamically loaded module and increase its LoadCount during the event.
Here is the Blink of InLoadOrderModuleList (the last DLL loaded in the process) at the initial break in ntdll!DbgBreakPoint on XP SP3, for an arbitrary console app that uses a DLL:
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName LoadCount @@(((@$peb)->Ldr)->InLoadOrderModuleList.Blink)
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\system32\GDI32.dll"
+0x038 LoadCount : 0xffff <----------- not unloadable via FreeLibrary
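Why 0xffff reads as -1: assuming LoadCount is a 16-bit field in this XP layout (which the 0xffff value in the dump suggests), the same bit pattern is 65535 when read as unsigned and -1 when read as signed. A small Python sketch of that reinterpretation, where as_signed16 is just a helper name chosen for this example:

```python
import struct


def as_signed16(value):
    """Reinterpret an unsigned 16-bit value as a signed 16-bit integer."""
    return struct.unpack("<h", struct.pack("<H", value))[0]


print(as_signed16(0xFFFF))  # -1: the "pinned, not unloadable" marker
print(as_signed16(0x0005))  # 5: an ordinary reference count
```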
Set up a break on a specific module load:
0:000> sxe ld skeleton
0:000> g
ModLoad: 10000000 10005000 C:\skeleton.dll
ntdll!KiFastSystemCallRet:
7c90e514 c3 ret
The module-load event breaks while the section is being mapped, so Ldr isn't updated yet:
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName LoadCount @@(((@$peb)->Ldr)->InLoadOrderModuleList.Blink)
+0x024 FullDllName : _UNICODE_STRING "C:\WINDOWS\system32\GDI32.dll"
+0x038 LoadCount : 0xffff
Go up until Ldr is updated:
0:000> gu;gu;gu
ntdll!LdrpLoadDll+0x1e9:
7c91626a 8985c4fdffff mov dword ptr [ebp-23Ch],eax ss:0023:0013fa3c=00000000
Blink now shows the last loaded module; notice that LoadCount is 0, not updated yet:
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY FullDllName LoadCount @@(((@$peb)->Ldr)->InLoadOrderModuleList.Blink)
+0x024 FullDllName : _UNICODE_STRING "C:\skeleton.dll"
+0x038 LoadCount : 0
Dump the loader entry of the module:
0:000> !dlls -c skeleton
Dump dll containing 0x10000000:
0x00252840: C:\skeleton.dll
Base 0x10000000 EntryPoint 0x10001000 Size 0x00005000
Flags 0x00000004 LoadCount 0x00000000 TlsIndex 0x00000000
LDRP_IMAGE_DLL
Increase the load count arbitrarily and dump again (process attach hasn't been called yet):
0:000> ed 0x252840+0x38 4
0:000> !dlls -c skeleton
Dump dll containing 0x10000000:
0x00252840: C:\skeleton.dll
Base 0x10000000 EntryPoint 0x10001000 Size 0x00005000
Flags 0x00000004 LoadCount 0x00000004 TlsIndex 0x00000000
LDRP_IMAGE_DLL
Run the binary:
0:000> g
The DLL is loaded into the process; break in with Ctrl+Break:
Break-in sent, waiting 30 seconds...
(aa0.77c): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
7c90120e cc int 3
Dump again and see that the system has updated the load count to our count + 1; process attach has also been called:
0:001> !dlls -c skeleton
Dump dll containing 0x10000000:
0x00252840: C:\skeleton.dll
Base 0x10000000 EntryPoint 0x10001000 Size 0x00005000
Flags 0x80084004 LoadCount 0x00000005 TlsIndex 0x00000000
LDRP_IMAGE_DLL
LDRP_ENTRY_PROCESSED
LDRP_PROCESS_ATTACH_CALLED
By the way, you can use Ken Johnson's (Skywing) SDbgExt !remotecall instead of .call;
it doesn't require private symbols.
.load sdbgext
!remotecall kernel32!LoadLibraryA 0 "c:\skeleton.dll" ; g
This should load the DLL into the process. Alternatively, use
!loaddll "c:\\skeleton.dll" from the same extension.
kernel32!LoadLibraryA() will be run when execution is resumed:
0:002> g
kernel32!LoadLibraryA() [conv=0 argc=4 argv=00AC0488]
kernel32!LoadLibraryA() returned 10000000
The simplest way: get the DLL's path and call LoadLibrary on it.
That increases the DLL's reference count, so the DLL will not be released.

Kernel panic using deferred_io on kmalloced buffer

I'm writing a framebuffer for an SPI LCD display on ARM. Before I complete that, I've written a memory only driver and trialled it under Ubuntu (Intel, Virtualbox). The driver works fine - I've allocated a block of memory using kmalloc, page aligned it (it's page aligned anyway actually), and used the framebuffer system to create a /dev/fb1. I have my own mmap function if that's relevant (deferred_io ignores it and uses its own by the look of it).
I have set:
info->screen_base = (u8 __iomem *)kmemptr;
info->fix.smem_len = kmem_size;
When I open /dev/fb1 with a test program and mmap it, it works correctly. I can see what is happening by using x11vnc to "share" fb1 out:
x11vnc -rawfb map:/dev/fb1#320x240x16
And view with a vnc viewer:
gvncviewer strontium:0
I've made sure I've no overflows by writing to the entire mmapped buffer and that seems to be fine.
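The test program itself isn't shown, so the following is only a sketch of that kind of overflow check, written in Python for brevity: it maps the whole buffer and writes one 16-bit pixel value to every position. The function name fill_framebuffer, the little-endian RGB565 pixel format and the default length (320 x 240 x 2 bytes, matching the 153600 bytes reported later) are assumptions made for this example.

```python
import mmap
import os
import struct

WIDTH, HEIGHT, BYTES_PER_PIXEL = 320, 240, 2   # 320 x 240 at 16 bpp


def fill_framebuffer(path, color565, length=WIDTH * HEIGHT * BYTES_PER_PIXEL):
    """Map `length` bytes of `path` and store one 16-bit pixel value everywhere."""
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, length) as fb:
            pixel = struct.pack("<H", color565)       # assumed little-endian RGB565
            fb[:] = pixel * (length // len(pixel))    # touches every byte of the mapping
    finally:
        os.close(fd)
    return length
```

Calling fill_framebuffer("/dev/fb1", 0xF800) would paint the whole screen red under the RGB565 assumption; any write past the end of the driver's buffer would show up as a fault rather than silent corruption.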
The problem arises when I add in deferred_io. As a test of it, I have a delay of 1 second and the called deferred_io function does nothing except a pr_devel() print. I followed the docs.
Now, the test program opens /dev/fb1 fine, mmap returns ok but as soon as I write to that pointer, I get a kernel panic. The following dump is from the ARM machine actually but it panics on the Ubuntu VM as well:
root@duovero:~/testdrv# ./fbtest1 /dev/fb1
Device opened: /dev/fb3
Screen is: 320 x 240, 16 bpp
Screen size = 153600 bytes
mmap on device succeeded
Unable to handle kernel paging request at virtual address bf81e020
pgd = edbec000
[bf81e020] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in: hhlcd28a(O) sysimgblt sysfillrect syscopyarea fb_sys_fops bnep ipv6 mwifiex_sdio mwifiex btmrvl_sdio firmware_class btmrvl cfg80211 bluetooth rfkill
CPU: 0 Tainted: G O (3.6.0-hh04 #1)
PC is at fb_deferred_io_fault+0x34/0xb0
LR is at fb_deferred_io_fault+0x2c/0xb0
pc : [<c0271b7c>] lr : [<c0271b74>] psr: a0000113
sp : edbdfdb8 ip : 00000000 fp : edbeedb8
r10: edbeedb8 r9 : 00000029 r8 : edbeedb8
r7 : 00000029 r6 : bf81e020 r5 : eda99128 r4 : edbdfdd8
r3 : c081e000 r2 : f0000000 r1 : 00001000 r0 : bf81e020
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5387d Table: adbec04a DAC: 00000015
Process fbtest1 (pid: 485, stack limit = 0xedbde2f8)
Stack: (0xedbdfdb8 to 0xedbe0000)
[snipped out hexdump]
[<c0271b7c>] (fb_deferred_io_fault+0x34/0xb0) from [<c00db0c4>] (__do_fault+0xbc/0x470)
[<c00db0c4>] (__do_fault+0xbc/0x470) from [<c00dde0c>] (handle_pte_fault+0x2c4/0x790)
[<c00dde0c>] (handle_pte_fault+0x2c4/0x790) from [<c00de398>] (handle_mm_fault+0xc0/0xd4)
[<c00de398>] (handle_mm_fault+0xc0/0xd4) from [<c049a038>] (do_page_fault+0x140/0x37c)
[<c049a038>] (do_page_fault+0x140/0x37c) from [<c0008348>] (do_DataAbort+0x34/0x98)
[<c0008348>] (do_DataAbort+0x34/0x98) from [<c0498af4>] (__dabt_usr+0x34/0x40)
Exception stack(0xedbdffb0 to 0xedbdfff8)
ffa0: 00000280 0000ffff b6f5c900 00000000
ffc0: 00000003 00000000 00025800 b6f5c900 bea6dc1c 00011048 00000032 b6f5b000
ffe0: 00006450 bea6db70 00000000 000085d6 40000030 ffffffff
Code: 28bd8070 ebffff37 e2506000 0a00001b (e5963000)
---[ end trace 7e5ca57bebd433f5 ]---
Segmentation fault
root@duovero:~/testdrv#
I'm totally stumped - other drivers look more or less the same as mine but I assume they work. Most use vmalloc actually - is there a difference between kmalloc and vmalloc for this purpose?
Confirmed the fix so I'll answer my own question:
deferred_io replaces the info mmap with its own, which sets up fault handlers for writes to the video memory pages. The fault handler:
checks bounds against info->fix.smem_len, so you must set that;
gets the page that was written to.
For the latter step, it treats vmalloc differently from kmalloc (by checking info->screen_base to see whether it was vmalloced). If you used vmalloc, it uses screen_base as the virtual address. If you did not use vmalloc, it assumes that the address of interest is the physical address in info->fix.smem_start.
So, to use deferred_io correctly:
set info->screen_base (char __iomem *) to the buffer's virtual address;
set info->fix.smem_len to the video buffer size;
if you are not using vmalloc, set info->fix.smem_start to the video buffer's physical address, using virt_to_phys(vid_buffer).
Confirmed on Ubuntu as fixing the issue.
Really interesting. I'm currently implementing an SPI-based display FB driver too (Sharp Memory LCD display and my VFDHack32 host driver). I'm also facing a similar problem where it crashes in deferred_io. Can you share your source code? Mine is at my GitHub repo. P.S. that Memory LCD display is monochrome, so I just pretend to be a color display and check whether the pixel byte is empty (dot off) or not empty (dot on).

Windbg ethread - IrpList location

I'm currently struggling to make sense of the output from Windbg.
What I'm trying to do is find out how many IRPs (I/O Request Packets) are queued on a particular thread, so here is what I currently have:
lkd> !thread
THREAD fffffa8001fce270
IRP List:
fffffa8001cf3b60
...
So this tells me that the current thread has one IRP in its list, and gives its address.
However, the next command is what's confusing me slightly:
lkd> ?? @$thread->IrpList
struct _LIST_ENTRY
[ 0xfffffa8001cf3b80 - 0xfffffa8001cf3b80 ]
+0x000 Flink 0xfffffa8001cf3b80 _LIST_ENTRY [ 0xfffffa8001fce658 - 0xfffffa8001fce658]
+0x008 Blink 0xfffffa8001cf3b80 _LIST_ENTRY [ 0xfffffa8001fce658 - 0xfffffa8001fce658]
All of this information is coming out of the _ETHREAD structure, and according to windbg the offset for the 'IrpList' element in the structure is 0x3e8.
So if the thread (_ETHREAD) starts at address 0xfffffa8001fce270, the IrpList element should be at address 0xfffffa8001fce658 (0xfffffa8001fce270 + 0x3e8).
However, I don't fully understand why windbg is reporting the IRP List entry at address 0xfffffa8001cf3b80.
I'm probably getting the wrong end of the stick here, but if anyone can point me in the right direction, I'd greatly appreciate it.
Thanks
The list address is not 0xfffffa8001cf3b80. That's the address of the list entry in the IRP, which is at IRP+0x20 (0xfffffa8001cf3b60 + 0x20 = 0xfffffa8001cf3b80). The list entry address in the ETHREAD is 0xfffffa8001fce658 (0xfffffa8001fce658 - 0xfffffa8001fce270 = 0x3e8).
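To count the IRPs, you walk the circular list from the head in the ETHREAD until Flink points back at the head, subtracting 0x20 from each entry to recover the IRP address. Here is a rough Python sketch of that walk over the values from the question; the flinks dictionary is a stand-in for reading debugger memory, and both offsets are the ones observed on this particular build, not constants to rely on.

```python
IRP_LIST_OFFSET_IN_ETHREAD = 0x3E8   # from the question; build-specific
LIST_ENTRY_OFFSET_IN_IRP = 0x20      # from the answer; build-specific


def irps_on_thread(ethread, read_flink):
    """Walk the circular list at ETHREAD.IrpList and return the IRP addresses."""
    head = ethread + IRP_LIST_OFFSET_IN_ETHREAD
    irps = []
    entry = read_flink(head)
    while entry != head:                                  # back at the head: done
        irps.append(entry - LIST_ENTRY_OFFSET_IN_IRP)     # entry address -> IRP address
        entry = read_flink(entry)
    return irps


# Stand-in for debugger memory reads, filled in from the output in the question.
flinks = {
    0xFFFFFA8001FCE658: 0xFFFFFA8001CF3B80,  # list head in the ETHREAD -> entry in the IRP
    0xFFFFFA8001CF3B80: 0xFFFFFA8001FCE658,  # entry in the IRP -> back to the head
}

irps = irps_on_thread(0xFFFFFA8001FCE270, flinks.__getitem__)
print(len(irps), [hex(address) for address in irps])
```

With these values the walk finds a single IRP at 0xfffffa8001cf3b60, the same address !thread printed.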

Modifying current process' pte through /dev/mem?

AFAIK, /dev/mem exposes physical memory to user space, and it's usually used for device reads/writes through MMIO. In my use case, I want to modify the current process's PTEs so that two PTEs point to the same physical page. In particular, I move an x86_64 binary above 4G in virtual address space and mmap a region below 4G. I want the PTE above 4G and the PTE below 4G to point to the same physical page, so that when I write to the vaddr above 4G and read from the vaddr below 4G, I get the same result. Sample code might look like this:
*(unsigned char *)vaddr1 = 7; // write to vaddr1 (above 4G)
val = *(unsigned char *)vaddr2; // read from vaddr2 (below 4G)
printf("val should be 7, %d\n", val);
But after I modify the PTE below 4G, through /dev/mem, to point to the physical page that the PTE above 4G points to, the kernel gives me the message below:
BUG: Bad page map in process mmap pte:8000000007eb2067 pmd:07acb067
page:ffffea00001fac80 count:0 mapcount:-1 mapping: (null) index:0x101b7b
page flags: 0x4000000000000014(referenced|dirty)
addr:0000000101b7b000 vm_flags:00100073 anon_vma:ffff880007ab0708 mapping: (null) index:101b7b
Pid: 609, comm: mmap Tainted: G B 3.5.3 #7
Call Trace:
[<ffffffff8107abcc>] ? print_bad_pte+0x1d2/0x1ea
[<ffffffff8107bf18>] ? unmap_single_vma+0x3a0/0x56d
[<ffffffff8107c745>] ? unmap_vmas+0x2c/0x46
[<ffffffff8108106b>] ? exit_mmap+0x6e/0xdd
[<ffffffff8101cc4f>] ? do_page_fault+0x30f/0x348
[<ffffffff81020ce6>] ? mmput+0x20/0xb4
[<ffffffff810256ae>] ? exit_mm+0x105/0x110
[<ffffffff8103bb6c>] ? hrtimer_try_to_cancel+0x67/0x70
[<ffffffff81026b59>] ? do_exit+0x211/0x711
[<ffffffff810272e0>] ? do_group_exit+0x76/0xa0
[<ffffffff8102731c>] ? sys_exit_group+0x12/0x19
[<ffffffff812f3662>] ? system_call_fastpath+0x16/0x1b
BUG: Bad rss-counter state mm:ffff880007a496c0 idx:0 val:-1
BUG: Bad rss-counter state mm:ffff880007a496c0 idx:1 val:1
I guess the kernel checks whether the PTE has been modified, and I did something wrong. Here are the PTEs of vaddr1 and vaddr2 before and after my rewrite:
above 4G pte: 0x8000000007eb2067
below 4G pte: 0x0000000007ea7067
after rewriting pte...
above 4G pte: 0x8000000007eb2067
below 4G pte: 0x8000000007eb2067
Any idea? Thanks.
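For reading the raw values above, here is a small Python sketch that splits an x86-64 page table entry into its physical page address and flag bits. It assumes the standard 4 KiB page layout (physical address in bits 12..51, NX in bit 63); decode_pte and the flag names are labels chosen for this example, not kernel identifiers.

```python
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    4: "cache-disable",
    5: "accessed",
    6: "dirty",
    63: "no-execute",
}
PHYS_ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51


def decode_pte(pte):
    """Return (physical page address, list of set flag names) for a 4 KiB PTE."""
    flags = [name for bit, name in PTE_FLAGS.items() if (pte >> bit) & 1]
    return pte & PHYS_ADDR_MASK, flags


for label, pte in (("above 4G", 0x8000000007EB2067),
                   ("below 4G", 0x0000000007EA7067)):
    phys, flags = decode_pte(pte)
    print(f"{label}: page at {phys:#x}, flags: {', '.join(flags)}")
```

Decoded this way, the two original entries point at different physical pages (0x7eb2000 and 0x7ea7000), and only the entry above 4G has the no-execute bit set; after the rewrite both entries reference the page at 0x7eb2000.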
Note: Now I know I should release the physical page that vaddr2's PTE originally pointed to; otherwise the kernel notices that the physical page is no longer pointed to by any PTE and reports those errors. But how? I tried __free_page, but got the error below.
BUG: unable to handle kernel paging request at ffffebe00008001c
IP: [<ffffffff8106b908>] __free_pages+0x4/0x2a
PGD 0
Oops: 0000 [#2] PREEMPT SMP
CPU 0

Resources