Flashing NodeMCU on ESP32 - nodemcu

ESP8266 and ESP32 noob here. I bought a couple of ESP32 modules and I'm trying to install NodeMCU on them (they came with just some sample code).
I created a firmware image using the cloud builder and tried to flash it to the device (later on, I built it myself too, same result). After some experimentation I found that the bootloader expects the firmware to start at 0x1000 in the flash, instead of 0x0000 (I read the original flash content to confirm that), so I flashed the firmware at 0x1000. I can confirm using read_flash (or other methods) that the firmware has been flashed correctly. But when I connect to the serial port to see what the output is, I get this at the beginning:
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0x00
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x01,hd_drv:0x00,wp_drv:0x04
mode:DIO, clock div:2
load:0x260513e7,len:0
load:0x46007200,len:65534
1162 mmu set 00010000, pos 00010000
load:0x65920020,len:-491131
1162 mmu set 00020000, pos 00020000
1162 mmu set 00030000, pos 00030000
1162 mmu set 00040000, pos 00040000
1162 mmu set 00050000, pos 00050000
1162 mmu set 00060000, pos 00060000
1162 mmu set 00070000, pos 00070000
1162 mmu set 00080000, pos 00080000
1162 mmu set 00090000, pos 00090000
1162 mmu set 000a0000, pos 000a0000
1162 mmu set 000b0000, pos 000b0000
ets Jun 8 2016 00:22:57
After that, the following repeats over and over:
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0x00
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x01,hd_drv:0x00,wp_drv:0x04
mode:DIO, clock div:2
load:0x260513e7,len:0
load:0x46007200,len:65534
1162 mmu set 00010000, pos 00010000
load:0x65920020,len:-491131
1162 mmu set 00020000, pos 00020000
1162 mmu set 00030000, pos 00030000
1162 mmu set 00040000, pos 00040000
1162 mmu set 00050000, pos 00050000
1162 mmu set 00060000, pos 00060000
1162 mmu set 00070000, pos 00070000
1162 mmu set 00080000, pos 00080000
1162 mmu set 00090000, pos 00090000
1162 mmu set 000a0000, pos 000a0000
1162 mmu set 000b0000, pos 000b0000
ets Jun 8 2016 00:22:57
At this point I'm quite lost as to what could be wrong. Any help is appreciated.

Ultimately, the solution is to use the esp32 branch of NodeMCU, as indicated in my comment above. The standard branch only works on the ESP8266.


How to translate Virtual to Physical Address (WinDbg)?

It seems I'm misunderstanding something.
I'm trying to translate a virtual address (VA) to a physical address (PA) on Windows 10 (x86) under VirtualBox.
I'm following the Microsoft manual for that.
I set up local kernel debugging (bcdedit) and launched CFF Explorer as the test application. Then I started WinDbg, connected to the kernel, and listed the active processes:
!process 0 0
Found my test application:
PROCESS a6bd7900 SessionId: 1 Cid: 0988 Peb: 7ffd9000 ParentCid: 0840
DirBase: ba9ac3c0 ObjectTable: acaeedc0 HandleCount: <Data Not Accessible>
Image: CFF Explorer.exe
Then I got the PEB:
.process /p a6bd7900; !peb 7ffd9000
Implicit process is now a6bd7900
PEB at 7ffd9000
...
ImageBaseAddress: 00400000
...
Ldr 76f99aa0
Ldr.Initialized: Yes
Ldr.InInitializationOrderModuleList: 00881658 . 00887c00
Ldr.InLoadOrderModuleList: 00881728 . 00887bf0
Ldr.InMemoryOrderModuleList: 00881730 . 00887bf8
Base TimeStamp Module
400000 50a8fbd6 Nov 18 18:16:38 2012 C:\Program Files\NTCore\Explorer Suite\CFF Explorer.exe
76e90000 580ee2c9 Oct 25 07:42:49 2016 C:\WINDOWS\SYSTEM32\ntdll.dll
74970000 57cf8f7a Sep 07 06:54:34 2016 C:\WINDOWS\system32\KERNEL32.DLL
...
I typed the "!r" command to print all registers:
cr0 Value: 00720054
cr2 Value: 00720054
cr3 Value: 00720054
cr4 Value: 00720054
cr4 in bin: 00000000 00001010 11111100 10110110
Bit 5 is set, which means that PAE is enabled.
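The bit test can be reproduced with a minimal Python sketch. The helper name pae_enabled is hypothetical; the first input is the binary dump quoted above, and the second is the cr4 value reported later in UPDATE 0.

```python
CR4_PAE_BIT = 5  # CR4.PAE is bit 5, counting from bit 0


def pae_enabled(cr4: int) -> bool:
    """Return True when the PAE flag is set in a CR4 value."""
    return bool((cr4 >> CR4_PAE_BIT) & 1)


# The binary dump quoted in the question, with the spaces removed.
cr4_from_dump = int("00000000 00001010 11111100 10110110".replace(" ", ""), 2)

# The cr4 value seen in the remote debugging session (UPDATE 0).
cr4_remote = 0x000406E9

print(hex(cr4_from_dump), pae_enabled(cr4_from_dump))
print(hex(cr4_remote), pae_enabled(cr4_remote))
```

Both values have bit 5 set, so the PAE conclusion holds for either reading of the register.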
Then I opened the Memory window and entered address 400000 to check that the header of CFF Explorer.exe is present in virtual memory.
Then I tried to get the page frame number (PFN) via the !pte extension, following the manual:
lkd> !pte 00400000
VA 00400000
PDE at C0600010 PTE at C0002000
contains 0000000000000000
contains 0000000000000000
not valid
I got a "not valid" result. At the same time, when I tried to get the PFN for kernel32.dll, I got a valid address:
lkd> !pte 74970000
VA 74970000
PDE at C0601D20 PTE at C03A4B80
contains 000000000121B867 contains 800000006F1CE005
pfn 121b ---DA--UWEV pfn 6f1ce -------UR-V
I then successfully read the header at that physical address via "!dc 6f1ce000".
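The numbers in the two !pte outputs above can be reproduced by hand. The following Python sketch assumes the usual 32-bit PAE self-map layout (PTEs starting at C0000000, PDEs at C0600000, 8-byte entries), which is consistent with the addresses !pte printed; the function names are hypothetical.

```python
PTE_BASE = 0xC0000000            # start of the PTE self-map (assumed from the !pte output)
PDE_BASE = 0xC0600000            # start of the PDE self-map (assumed from the !pte output)
ENTRY_SIZE = 8                   # PAE entries are 64 bits wide
PAGE_SHIFT = 12                  # 4 KiB pages
PFN_MASK = 0x000FFFFFFFFFF000    # bits 12..51 of an entry hold the page frame number


def pde_address(va: int) -> int:
    """Virtual address of the PDE that maps va (each PDE covers 2 MiB)."""
    return PDE_BASE + (va >> 21) * ENTRY_SIZE


def pte_address(va: int) -> int:
    """Virtual address of the PTE that maps va (each PTE covers 4 KiB)."""
    return PTE_BASE + (va >> PAGE_SHIFT) * ENTRY_SIZE


def pfn(entry: int) -> int:
    """Page frame number stored in a valid PDE or PTE."""
    return (entry & PFN_MASK) >> PAGE_SHIFT


def physical_address(pte: int, va: int) -> int:
    """Combine the PTE's frame with the page offset of va."""
    return (pfn(pte) << PAGE_SHIFT) | (va & 0xFFF)


for va in (0x00400000, 0x74970000):
    print(hex(va), hex(pde_address(va)), hex(pte_address(va)))

# PTE contents shown by !pte for kernel32.dll
print(hex(physical_address(0x800000006F1CE005, 0x74970000)))
```

The last line yields the same physical address that was passed to !dc.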
Then I checked windbg.exe itself and noticed that kernel32.dll has the same base address there as in CFF Explorer.exe. I always thought that each process has its own mapping of a dependent module into its own memory, but now it seems that is not so.
My questions:
Why do I get "not valid" when trying to translate the address 0x00400000?
Please clarify the situation with kernel32.dll and my doubts about how a module is mapped into each process.
UPDATE 0:
I don't know why, but when I debug the kernel locally, I see the same value in ALL registers. I tried debugging the kernel remotely, and now I see different values for each register:
cr0 Value: 80010033
cr2 Value: 909a301c
cr3 Value: 001a8000
cr4 Value: 000406e9
And now I can't get a translation for kernel32.dll or for any of the other modules.
The main questions remain open.
!pte may not work unless you use the capital /P when setting the process context, because !pte reads the contents of the page table entries via their virtual addresses, starting with nt!MmPteBase (FFFFF6FB7DBED000 in my case), and this is a kernel address. Remember that the page tables live in kernel virtual memory, meaning the PTs/PDs/PDPTs/PML4 themselves have kernel virtual addresses, so enabling the user-mode address bypass will not stop kernel addresses from still being translated in hardware.
Without /P, the debugger will use the page table of the process currently running on the logical core to access the data at this virtual address, relying on hardware translation by the CPU. This works fine for virtual addresses that are not process-unique, because the same physical page is mapped into all page tables, so it doesn't matter which one is currently loaded on the core. It will not work for any user virtual memory, as all user memory is unique to the process (a virtual page maps to a physical page unique to the process), and neither will it work for any kernel virtual memory that is unique to the process. Examples of kernel virtual memory that is unique to the process are the page tables for user addresses, and the page tables for the kernel addresses that contain those page tables.
/p and /P are used to bypass this: the debugger reads the correct dirbase and walks the page table in software. /p bypasses hardware translation only for user-mode addresses, and /P also bypasses it for kernel-mode addresses.
lkd> !process 0 0 calc.exe
PROCESS fffffa805d954b10
SessionId: 1 Cid: 3294 Peb: 7fffffdb000 ParentCid: 10f8
DirBase: 27a385000 ObjectTable: fffff8a02a766e60 HandleCount: 81.
Image: calc.exe
lkd> .process /p fffffa805d954b10
Implicit process is now fffffa80`5d954b10
lkd> !pte 0`ffbe0000
VA 00000000ffbe0000
PXE at FFFFF6FB7DBED000 PPE at FFFFF6FB7DA00018 PDE at FFFFF6FB40003FE8 PTE at FFFFF680007FDF00
contains 00C0000263E77867 contains 0000000000000000
pfn 263e77 ---DA--UWEV not valid
----------------------------------------------------------------------------------------
lkd> .process /P fffffa805d954b10
Implicit process is now fffffa80`5d954b10
lkd> !pte 0`ffbe0000
VA 00000000ffbe0000
PXE at FFFFF6FB7DBED000 PPE at FFFFF6FB7DA00018 PDE at FFFFF6FB40003FE8 PTE at FFFFF680007FDF00
contains 00C000000B023867 contains 00D0000759124867 contains 00E0000792FA5867 contains 80F000004D7DD025
pfn b023 ---DA--UWEV pfn 759124 ---DA--UWEV pfn 792fa5 ---DA--UWEV pfn 4d7dd ----A--UR-V
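The four entry addresses printed by !pte follow directly from the virtual address. This Python sketch takes the four self-map base addresses from the output above (they are specific to this system, so treat them as assumptions elsewhere); the function name is hypothetical.

```python
# Self-map bases as they appear in the !pte output above (system-specific).
PXE_BASE = 0xFFFFF6FB7DBED000
PPE_BASE = 0xFFFFF6FB7DA00000
PDE_BASE = 0xFFFFF6FB40000000
PTE_BASE = 0xFFFFF68000000000
ENTRY_SIZE = 8                 # each paging-structure entry is 64 bits
VA_MASK = (1 << 48) - 1        # only the low 48 bits take part in translation


def entry_addresses(va: int) -> dict:
    """Kernel virtual addresses of the PXE, PPE, PDE and PTE that map va."""
    va &= VA_MASK
    return {
        "PXE": PXE_BASE + (va >> 39) * ENTRY_SIZE,
        "PPE": PPE_BASE + (va >> 30) * ENTRY_SIZE,
        "PDE": PDE_BASE + (va >> 21) * ENTRY_SIZE,
        "PTE": PTE_BASE + (va >> 12) * ENTRY_SIZE,
    }


for name, address in entry_addresses(0xFFBE0000).items():
    print(name, format(address, "016X"))
```

Because all four results are kernel virtual addresses, reading them correctly for another process depends on whose page tables perform the translation, which is why /p and /P give different results above.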
!vtop 0 ffbe0000 will work without /p or /P because it gets the physical address of the PML4 (the dirbase) from the EPROCESS structure (EPROCESS is in non-process-unique kernel memory, so any page table can be used to access it). It then maps in the PML4 physical page of the correct page table by physical address, shows the physical addresses, and maps each resulting physical address of the next entry in the hierarchy into virtual memory so that it can read it and continue the walk.
!pte fffffa80`5d954b10 (the EPROCESS address) will work without /p or /P because the physical page holding the EPROCESS block is mapped into all page tables at the same virtual address, so it doesn't matter whether the debugger bypasses the translation or whether it is done in hardware with whatever page table is currently loaded on the core.
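A software walk of the kind described for !vtop can be sketched in Python. This is only an illustration, not the debugger's implementation: physical memory is simulated by a dictionary filled with the four entry values from the /P output above, the DirBase of calc.exe is assumed to be the physical address of its PML4, and large pages are ignored.

```python
PFN_MASK = 0x000FFFFFFFFFF000   # bits 12..51 hold the physical frame of the next level
PRESENT = 1                     # bit 0: entry is valid


def walk(read_phys_qword, dirbase: int, va: int) -> int:
    """Translate va by reading PML4, PDPT, PD and PT entries by physical address."""
    table = dirbase & PFN_MASK
    for shift in (39, 30, 21, 12):
        index = (va >> shift) & 0x1FF            # 9 index bits per level
        entry = read_phys_qword(table + index * 8)
        if not entry & PRESENT:
            raise LookupError(f"entry at level shift {shift} is not valid")
        table = entry & PFN_MASK                 # frame of the next table, or of the page
    return table | (va & 0xFFF)


DIRBASE = 0x27A385000   # DirBase shown by !process for calc.exe
VA = 0xFFBE0000

# Simulated physical memory: entry physical address -> entry value from the /P output.
memory = {
    0x27A385000 + 0x000 * 8: 0x00C000000B023867,   # PXE
    0x00B023000 + 0x003 * 8: 0x00D0000759124867,   # PPE
    0x759124000 + 0x1FD * 8: 0x00E0000792FA5867,   # PDE
    0x792FA5000 + 0x1E0 * 8: 0x80F000004D7DD025,   # PTE
}

print(hex(walk(memory.__getitem__, DIRBASE, VA)))
```

Every read in the walk uses a physical address, so the result does not depend on which process's page tables are loaded on the core.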
It appears to me that you only need to use /p or /P once for the whole debug session, and to reset it you have to run .cache nodecodeptes, which you can't do in a local debugging session for some reason:
lkd> .process /P fffffa805d954b10
Implicit process is now fffffa80`5d954b10
lkd> !pte 10000
VA 0000000000010000
PXE at FFFFF6FB7DBED000 PPE at FFFFF6FB7DA00000 PDE at FFFFF6FB40000000 PTE at FFFFF68000000080
contains 00C000000B023867 contains 01300001F5CA7867 contains 014000030E728867 contains 8D400001A4654867
pfn b023 ---DA--UWEV pfn 1f5ca7 ---DA--UWEV pfn 30e728 ---DA--UWEV pfn 1a4654 ---DA--UW-V
------------------------------------------
lkd> .process fffffa8027653b10
Implicit process is now fffffa80`27653b10
lkd> !pte 10000
VA 0000000000010000
PXE at FFFFF6FB7DBED000 PPE at FFFFF6FB7DA00000 PDE at FFFFF6FB40000000 PTE at FFFFF68000000080
contains 12B0000195039867 contains 036000016A13C867 contains 01400001730BD867 contains FFFFFFFF00000480
pfn 195039 ---DA--UWEV pfn 16a13c ---DA--UWEV pfn 1730bd ---DA--UWEV not valid
Proto: VAD
Protect: 4 - ReadWrite
The documentation does say that the behaviour of /p and /P is the same as .cache forcedecodeuser and .cache forcedecodeptes respectively. Omitting both /p and /P does not perform .cache nodecodeptes but leaves the setting as it is, so once you've set /p on one process it applies to all processes (despite what MSDN says, which I think is wrong). You can then switch to /P on a new process, and /P will apply to all processes. When you start the session, the current state is .cache nodecodeptes, and in that state the result depends on the page tables that are actually loaded on the logical core at the time: for a local debug that will be those of kd.exe, and for a remote debug it will be those of the process whose thread has broken into the debugger.

Modify page table entry on Windows

For a stack address I have the following PDE / PTE info from WinDbg:
kd> !pte 6EFFC
VA 0006effc
PDE at C0600000 PTE at C0000370
contains 0000000065D39867 contains 0000000000000020
pfn 65d39 ---DA--UWEV not valid
DemandZero
Protect: 1 - Readonly
How does WinDbg find out about the read-only state when the PTE is not even valid, and how can it be changed? Does it have to be done via the VAD?
If the 'valid' bit of the PTE is not set (which is the case in your example) then the PTE is handled by the operating system, not by the MMU.
In this case your PTE is a software PTE (the _MMPTE_SOFTWARE structure, as opposed to _MMPTE_HARDWARE; you can 'dt' both structures in WinDbg), which can be one of 4 types of software PTE, depending on the bits set in the bitfield.
If bits 12 to 31 are all zero, then this is a "Demand Zero" PTE (thus, not resolved via the VAD). Bits 5 to 9 indicate the page protection (0x20 = bit 5 set = protection value 1 = Read Only).
Protection bits are not officially documented, although you can find their values on some pages on the net. Taken from this reactos page:
#define MM_ZERO_ACCESS 0 // this value is not used.
#define MM_READONLY 1
#define MM_EXECUTE 2
#define MM_EXECUTE_READ 3
#define MM_READWRITE 4 // bit 2 is set if this is writable.
#define MM_WRITECOPY 5
#define MM_EXECUTE_READWRITE 6
#define MM_EXECUTE_WRITECOPY 7
#define MM_NOCACHE 8
#define MM_DECOMMIT 0x10
#define MM_NOACCESS MM_DECOMMIT|MM_NOCACHE
(Note: remember you have to left shift by 5 the above constants as protection bits start at bit 5)
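The decoding described above can be sketched in Python for the PTE value 0x20 from the question. The function name is hypothetical, the protection names come from the ReactOS list above, and the extra checks on bit 10 (Prototype) and bit 11 (Transition) are my assumption based on the commonly published _MMPTE_SOFTWARE layout, not something stated in this answer.

```python
# Low three bits of the protection field, as listed in the ReactOS defines above.
MM_PROTECTION_NAMES = {
    0: "MM_ZERO_ACCESS",
    1: "MM_READONLY",
    2: "MM_EXECUTE",
    3: "MM_EXECUTE_READ",
    4: "MM_READWRITE",
    5: "MM_WRITECOPY",
    6: "MM_EXECUTE_READWRITE",
    7: "MM_EXECUTE_WRITECOPY",
}


def decode_software_pte(pte: int) -> dict:
    """Decode an invalid (software) PTE as described in the answer."""
    valid = bool(pte & 1)                    # bit 0: handled by the MMU when set
    protection = (pte >> 5) & 0x1F           # bits 5..9: page protection
    prototype = bool((pte >> 10) & 1)        # assumed layout: bit 10
    transition = bool((pte >> 11) & 1)       # assumed layout: bit 11
    high_bits_zero = ((pte >> 12) & 0xFFFFF) == 0   # bits 12..31 all zero
    return {
        "valid": valid,
        "protection": protection,
        "protection_name": MM_PROTECTION_NAMES[protection & 0x7],
        "demand_zero": (not valid) and (not prototype)
                       and (not transition) and high_bits_zero,
    }


print(decode_software_pte(0x20))
```

For 0x20 this reports protection 1 (MM_READONLY) and a demand-zero PTE, matching the "DemandZero / Protect: 1 - Readonly" lines in the !pte output.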
See this blog post "Windows Virtual Address Translation and the Pagefile" (especially the part discussing Software PTEs) for a very good explanation about the various PTEs.

Kernel panic using deferred_io on kmalloced buffer

I'm writing a framebuffer for an SPI LCD display on ARM. Before I complete that, I've written a memory only driver and trialled it under Ubuntu (Intel, Virtualbox). The driver works fine - I've allocated a block of memory using kmalloc, page aligned it (it's page aligned anyway actually), and used the framebuffer system to create a /dev/fb1. I have my own mmap function if that's relevant (deferred_io ignores it and uses its own by the look of it).
I have set:
info->screen_base = (u8 __iomem *)kmemptr;
info->fix.smem_len = kmem_size;
When I open /dev/fb1 with a test program and mmap it, it works correctly. I can see what is happening by using x11vnc to "share" fb1 out:
x11vnc -rawfb map:/dev/fb1#320x240x16
And view with a vnc viewer:
gvncviewer strontium:0
I've made sure I've no overflows by writing to the entire mmapped buffer and that seems to be fine.
The problem arises when I add in deferred_io. As a test of it, I have a delay of 1 second and the called deferred_io function does nothing except a pr_devel() print. I followed the docs.
Now, the test program opens /dev/fb1 fine, mmap returns ok but as soon as I write to that pointer, I get a kernel panic. The following dump is from the ARM machine actually but it panics on the Ubuntu VM as well:
root#duovero:~/testdrv# ./fbtest1 /dev/fb1
Device opened: /dev/fb3
Screen is: 320 x 240, 16 bpp
Screen size = 153600 bytes
mmap on device succeeded
Unable to handle kernel paging request at virtual address bf81e020
pgd = edbec000
[bf81e020] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in: hhlcd28a(O) sysimgblt sysfillrect syscopyarea fb_sys_fops bnep ipv6 mwifiex_sdio mwifiex btmrvl_sdio firmware_class btmrvl cfg80211 bluetooth rfkill
CPU: 0 Tainted: G O (3.6.0-hh04 #1)
PC is at fb_deferred_io_fault+0x34/0xb0
LR is at fb_deferred_io_fault+0x2c/0xb0
pc : [<c0271b7c>] lr : [<c0271b74>] psr: a0000113
sp : edbdfdb8 ip : 00000000 fp : edbeedb8
r10: edbeedb8 r9 : 00000029 r8 : edbeedb8
r7 : 00000029 r6 : bf81e020 r5 : eda99128 r4 : edbdfdd8
r3 : c081e000 r2 : f0000000 r1 : 00001000 r0 : bf81e020
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5387d Table: adbec04a DAC: 00000015
Process fbtest1 (pid: 485, stack limit = 0xedbde2f8)
Stack: (0xedbdfdb8 to 0xedbe0000)
[snipped out hexdump]
[<c0271b7c>] (fb_deferred_io_fault+0x34/0xb0) from [<c00db0c4>] (__do_fault+0xbc/0x470)
[<c00db0c4>] (__do_fault+0xbc/0x470) from [<c00dde0c>] (handle_pte_fault+0x2c4/0x790)
[<c00dde0c>] (handle_pte_fault+0x2c4/0x790) from [<c00de398>] (handle_mm_fault+0xc0/0xd4)
[<c00de398>] (handle_mm_fault+0xc0/0xd4) from [<c049a038>] (do_page_fault+0x140/0x37c)
[<c049a038>] (do_page_fault+0x140/0x37c) from [<c0008348>] (do_DataAbort+0x34/0x98)
[<c0008348>] (do_DataAbort+0x34/0x98) from [<c0498af4>] (__dabt_usr+0x34/0x40)
Exception stack(0xedbdffb0 to 0xedbdfff8)
ffa0: 00000280 0000ffff b6f5c900 00000000
ffc0: 00000003 00000000 00025800 b6f5c900 bea6dc1c 00011048 00000032 b6f5b000
ffe0: 00006450 bea6db70 00000000 000085d6 40000030 ffffffff
Code: 28bd8070 ebffff37 e2506000 0a00001b (e5963000)
---[ end trace 7e5ca57bebd433f5 ]---
Segmentation fault
root#duovero:~/testdrv#
I'm totally stumped - other drivers look more or less the same as mine but I assume they work. Most use vmalloc actually - is there a difference between kmalloc and vmalloc for this purpose?
Confirmed the fix so I'll answer my own question:
deferred_io replaces the info mmap with its own, which sets up fault handlers for writes to the video memory pages. The fault handler does two things:
1 - checks bounds against info->fix.smem_len, so you must set that
2 - gets the page that was written to.
For the latter case, it treats vmalloc differently from kmalloc (by checking info->screen_base to see if it's vmalloced). If you have vmalloced, it uses screen_base as the virtual address. If you have not used vmalloc, it assumes that the address of interest is the physical address in info->fix.smem_start.
So, to use deferred_io correctly:
1 - set info->screen_base (char __iomem *) to point to the buffer's virtual address
2 - set info->fix.smem_len to the video buffer size
3 - if you are not using vmalloc, set info->fix.smem_start to the video buffer's physical address by using virt_to_phys(vid_buffer)
Confirmed on Ubuntu as fixing the issue.
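To see why a missing smem_start matters for a kmalloc buffer, here is a simplified Python model of the non-vmalloc path described above. It is not kernel code: the function name, the 4 KiB page size, and the physical address 0x8DA00000 are hypothetical, and only the bounds check and the "physical address in smem_start" rule are taken from the answer.

```python
PAGE_SHIFT = 12  # assuming 4 KiB pages


def kmalloc_fault_pfn(offset: int, smem_start: int, smem_len: int):
    """Model of the page frame a write fault resolves to for a non-vmalloc buffer.

    Returns None when the offset fails the bounds check against smem_len.
    """
    if offset >= smem_len:
        return None
    return (smem_start + offset) >> PAGE_SHIFT


SMEM_LEN = 153600            # 320 x 240 x 16 bpp, as printed by the test program
BUFFER_PHYS = 0x8DA00000     # hypothetical result of virt_to_phys(vid_buffer)
FAULT_OFFSET = 0x2000        # write into the third page of the mapping

# smem_start left at 0: the frame number has nothing to do with the buffer.
print(hex(kmalloc_fault_pfn(FAULT_OFFSET, 0, SMEM_LEN)))

# smem_start set to the buffer's physical address: the frame lies inside the buffer.
print(hex(kmalloc_fault_pfn(FAULT_OFFSET, BUFFER_PHYS, SMEM_LEN)))
```

With smem_start unset, the handler ends up working with a page that does not belong to the video buffer, which is consistent with the oops inside fb_deferred_io_fault shown in the question.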
Really interesting. I'm currently implementing an SPI-based display FB driver too (a Sharp Memory LCD display and my VFDHack32 host driver). I'm also facing a similar problem where it crashes in deferred_io. Can you share your source code? Mine is in my GitHub repo. P.S. The Memory LCD display is monochrome, so I just pretend it is a color display and check whether the pixel byte is empty (dot off) or not empty (dot on).

Modifying current process' pte through /dev/mem?

AFAIK, /dev/mem presents physical memory to the user, and it's usually used for device reads/writes through MMIO. In my use case, I want to modify the current process's PTEs so that two PTEs point to the same physical page. In particular, I move an x86_64 binary above 4G in the virtual address space and mmap virtual address space below 4G. I want the PTE above 4G and the PTE below 4G to point to the same physical page, so that when I write to the vaddr above 4G and read from the vaddr below 4G, I get the same result. Sample code might look like this:
*(unsigned char *)vaddr1 = 7; // write into 4G above vaddr1
val = *(unsigned char *)vaddr2; // read from 4G below vaddr2
printf("val should be 7, %d\n", val);
But after I modify the PTE below 4G, through /dev/mem, to point to the physical page pointed to by the PTE above 4G, the kernel gives me the message below:
BUG: Bad page map in process mmap pte:8000000007eb2067 pmd:07acb067
page:ffffea00001fac80 count:0 mapcount:-1 mapping: (null) index:0x101b7b
page flags: 0x4000000000000014(referenced|dirty)
addr:0000000101b7b000 vm_flags:00100073 anon_vma:ffff880007ab0708 mapping: (null) index:101b7b
Pid: 609, comm: mmap Tainted: G B 3.5.3 #7
Call Trace:
[<ffffffff8107abcc>] ? print_bad_pte+0x1d2/0x1ea
[<ffffffff8107bf18>] ? unmap_single_vma+0x3a0/0x56d
[<ffffffff8107c745>] ? unmap_vmas+0x2c/0x46
[<ffffffff8108106b>] ? exit_mmap+0x6e/0xdd
[<ffffffff8101cc4f>] ? do_page_fault+0x30f/0x348
[<ffffffff81020ce6>] ? mmput+0x20/0xb4
[<ffffffff810256ae>] ? exit_mm+0x105/0x110
[<ffffffff8103bb6c>] ? hrtimer_try_to_cancel+0x67/0x70
[<ffffffff81026b59>] ? do_exit+0x211/0x711
[<ffffffff810272e0>] ? do_group_exit+0x76/0xa0
[<ffffffff8102731c>] ? sys_exit_group+0x12/0x19
[<ffffffff812f3662>] ? system_call_fastpath+0x16/0x1b
BUG: Bad rss-counter state mm:ffff880007a496c0 idx:0 val:-1
BUG: Bad rss-counter state mm:ffff880007a496c0 idx:1 val:1
I guess the kernel checks whether the PTE has been modified, and I did something wrong. Here are the PTEs for vaddr1 and vaddr2 before and after my PTE rewriting.
above 4G pte: 0x8000000007eb2067
below 4G pte: 0x0000000007ea7067
after rewriting pte...
above 4G pte: 0x8000000007eb2067
below 4G pte: 0x8000000007eb2067
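Decoding the PTE values makes the effect of the rewrite easier to see. This Python sketch uses the standard x86-64 4 KiB PTE layout (frame in bits 12..51, NX in bit 63); the function name is hypothetical.

```python
PFN_MASK = 0x000FFFFFFFFFF000   # bits 12..51: physical page frame
FLAG_BITS = {
    "present": 0,
    "writable": 1,
    "user": 2,
    "accessed": 5,
    "dirty": 6,
    "no_execute": 63,
}


def decode_pte(pte: int):
    """Return (page frame number, flag dictionary) for an x86-64 PTE."""
    flags = {name: bool((pte >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return (pte & PFN_MASK) >> 12, flags


above_4g = 0x8000000007EB2067
below_4g_before = 0x0000000007EA7067

for label, pte in (("above 4G", above_4g), ("below 4G (before)", below_4g_before)):
    frame, flags = decode_pte(pte)
    print(label, hex(frame), flags)
```

Before the rewrite the two entries reference frames 0x7eb2 and 0x7ea7. Afterwards both reference 0x7eb2, so frame 0x7ea7 is no longer referenced by the PTE for vaddr2, and copying the whole value also copied the NX bit.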
Any ideas? Thanks.
Note: I now know that I should release the physical page originally pointed to by vaddr2's PTE; otherwise the kernel will notice that the physical page isn't pointed to by any PTE and report those errors. But how? I tried to use __free_page, but got the error below.
BUG: unable to handle kernel paging request at ffffebe00008001c
IP: [<ffffffff8106b908>] __free_pages+0x4/0x2a
PGD 0
Oops: 0000 [#2] PREEMPT SMP
CPU 0

Dump the contents of TLB buffer of x86 CPU

Is it possible to get the list of translations (from virtual pages to physical pages) from the TLB (translation lookaside buffer, a special cache in the CPU)? I mean on modern x86 or x86_64, and I want to do it programmatically, not by using JTAG and shifting all the TLB entries out.
The Linux kernel has no such dumper. There is a page in the Linux kernel documentation about caches and the TLB: https://www.kernel.org/doc/Documentation/cachetlb.txt "Cache and TLB Flushing Under Linux." David S. Miller
There was such a TLB dump in the 80386DX (and the 80486, and possibly in the "Embedded Pentium" 100-166 MHz / "Embedded Pentium MMX" 200-233 MHz in 1998):
1 - Book "MICROPROCESSORS: THE 8086/8088, 80186/80286, 80386/80486 AND THE PENTIUM FAMILY", ISBN 9788120339422, 2010, page 579
This was done via the test registers TR6 and TR7:
2 - Book "Microprocessors & Microcontrollers" by Godse&Godse, 2008 ISBN 9788184312973 page SA3-PA19: "3.2.7.3 Test Registers" "only two test registers (TR6-TR7) are currently defined. ... These registers are used to check translation lookaside buffer (TLB) of the paging unit."
3 - "x86-Programmierung und -Betriebsarten (Teil 5). Die Testregister TR6 und TR7", a German article about the registers: "Two test registers, TR6 and TR7, are provided for testing the translation lookaside buffer. They are called the test command register (TR6) and the test data register (TR7)."
4 - Intel's "Embedded Pentium® Processor Family Developer’s Manual", part "26 Model Specific Registers and Functions", page 8, "26.2.1.2 TLB Test Registers"
TR6 is the command register; the linear address is written to it. It can be used to write to the TLB or to read a line from the TLB. TR7 holds the data to be written to the TLB or read from the TLB.
Wikipedia says in https://en.wikipedia.org/wiki/Test_register that reading TR6/TR7 "generate invalid opcode exception on any CPU newer than 80486."
The encoding of mov tr6/tr7 was available only to privilege level 0: http://www.fermimn.gov.it/linux/quarta/x86/movrs.htm
0F 24 /r movl tr6/tr7,r32 12 Move (test register) to (register)
movl %tr6,%ebx
movl %tr7,%ebx
0F 26 /r movl r32,tr6/tr7 12 Move (register) to (test register)
movl %ebx,%tr6
movl %ebx,%tr7
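As an illustration of the "0F 24 /r" and "0F 26 /r" encodings listed above, this Python sketch assembles the bytes for the four example instructions. It assumes the ModRM layout used by the other special-register moves (mod = 11, reg = test register number, r/m = general register); the function name is hypothetical.

```python
# Standard 32-bit general-purpose register numbers used in the ModRM r/m field.
GPR_NUMBER = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
              "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_tr(test_register: int, gpr: str, to_test_register: bool) -> bytes:
    """Encode a move between TR6/TR7 and a 32-bit general register."""
    if test_register not in (6, 7):
        raise ValueError("only TR6 and TR7 are covered by this sketch")
    opcode = 0x26 if to_test_register else 0x24      # 0F 26 writes TRx, 0F 24 reads it
    modrm = 0xC0 | (test_register << 3) | GPR_NUMBER[gpr]   # mod=11, reg=TRx, r/m=GPR
    return bytes([0x0F, opcode, modrm])


print(encode_mov_tr(6, "ebx", to_test_register=False).hex())  # movl %tr6,%ebx
print(encode_mov_tr(7, "ebx", to_test_register=False).hex())  # movl %tr7,%ebx
print(encode_mov_tr(6, "ebx", to_test_register=True).hex())   # movl %ebx,%tr6
print(encode_mov_tr(7, "ebx", to_test_register=True).hex())   # movl %ebx,%tr7
```

These byte sequences are only meaningful on the processors named above; as noted, newer CPUs raise an invalid opcode exception for them.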
You can get the list of VA-PA translations stored in the TLB, but you may have to use a processor emulator like QEMU. You can download and install QEMU from http://wiki.qemu.org/Main_Page
You can boot a kernel stored in a disk image (typically in qcow2 or raw format) and run your application. You may have to tweak the QEMU code to print the contents of the TLB. Look at the tlb_* functions in qemu/exec.c. You may want to add a tlb_dump_function to print the contents of the TLB. As far as I know, this is the closest you can get to dumping the contents of the TLB.
P.S.: I started answering this question and then realized it was a year old.
