"nosplit stack overflow" when building Go project? - go

I did a spring cleaning in my code by splitting it up in more Go packages, mainly to help reusability (each "building block" in its own package).
After fixing the import errors, I discovered that my program suddenly won't build. Running "go build" returns a nosplit stack overflow error.
robot main.init: nosplit stack overflow
120 guaranteed after split check in main.init
112 on entry to robot/web.init
104 on entry to robot/controller.init
96 on entry to robot/slam.init
88 on entry to robot/slam/hector.init
80 on entry to hectormapping/map/mapimages.init
72 on entry to hectormapping/map/maprep.init
64 on entry to hectormapping/map/mapproccontainer.init
56 on entry to hectormapping/scanmatcher.init
48 on entry to hectormapping/map/gridmap/occbase.init
40 on entry to hectormapping/map/gridmap/base.init
32 on entry to hectormapping/map/gridmap.init
24 on entry to github.com/skelterjohn/go%2ematrix.init
16 on entry to math.init
8 on entry to math.init·1
0 on entry to runtime.panicindex
-8 on entry to runtime.morestack00
runtime.main: nosplit stack overflow
120 guaranteed after split check in runtime.main
128 after runtime.main uses -8
120 on entry to main.init
112 on entry to robot/web.init
104 on entry to robot/controller.init
96 on entry to robot/slam.init
88 on entry to robot/slam/hector.init
80 on entry to hectormapping/map/mapimages.init
72 on entry to hectormapping/map/maprep.init
64 on entry to hectormapping/map/mapproccontainer.init
56 on entry to hectormapping/scanmatcher.init
48 on entry to hectormapping/map/gridmap/occbase.init
40 on entry to hectormapping/map/gridmap/base.init
32 on entry to hectormapping/map/gridmap.init
24 on entry to github.com/skelterjohn/go%2ematrix.init
16 on entry to math.init
8 on entry to math.init·1
0 on entry to runtime.panicindex
-8 on entry to runtime.morestack00
Does anyone know what this is about? I can't find much documentation as to what might be causing it, except that for some cases this is a bug that supposedly is fixed.
Some of the code was split into a new folder in the "src" folder, so that the file structure is now:
src/robot/main.go (main() lives here)
src/robot/(...) (application-specific packages)
src/hectormapping/(...) (stand-alone package used in "robot")
I am using Go 1.0.3 on Windows 7 (x64).

This seems to be the same as described here which was said to be fixed in tip. The corresponding fix can be reviewed here.
To summarize the problem as I am seeing it:
Split stacking is used for growing stacks instead of the conventional fixed memory area. This has the benefit that more threads can be spawned, as only the needed stack memory is actually reserved. The problem here seems to be that the linker marks functions that don't use memory on the split stack accidentally as 'nosplit' because it doesn't find the split stack prologue. This leads to the linker calculating a wrong stack limit, which in turn lets the linker think there's no space and throws the error message at you.
Sadly, the only way of getting the tip version is to compile it yourself. As Nick Craig-Wood already mentioned, you can find the instructions here. If you really can't upgrade, you could try to work around this by allocating some arbitrary local variable in your init functions. But this is of course very messy.

Related

Finding and patching an instruction in a DLL

I have a (C++) program where, in one of its dll's, the following is done:
if (m_Map.GetMaxValue() >= MAX_CLASSES) {
I have two binaries of this program (compiled with various versions of Visual Studio), one where MAX_CLASSES was #define'd to 50, and one where it was 75. These binaries were made from different branches of the code and the other functionality is different as well. What I need is a version of the binary where the MAX_CLASSES was defined as 50, except with the higher limit i.e. 75.
So a sane person would change the constant in the source code of the branch I need, rebuild and go home. But, building this software is complex because it's old, the dependencies and tooling are old, etc.; plus I have issues with building the installers, and data and so on. So, I thought, how about I just patch this binary so that this one constant is changed directly in the DLL. I have vague recollections of doing similar things in the 1990's for, eh, probably 'educational' purposes.
But times have changed and I barely remember doing it, let alone how I did things back then. I opened the DLL (one where the limit is set to 75, this is the binary I have at hand - I will have to re-do this as soon as I have the actual binary with the 50 limit, so the following references 75 i.e. 0x4b for illustrating the principle) in Ghidra and after some poking around, I found the following:
18005160e 3c 4b CMP AL,0x4b
180051610 0f 82 19 JC LAB_18005172f
01 00 00
Which in the decompiler window I could link back to
if (bVar3 < 0x4b)
and some operations after that that I can map to the source code of the function I have.
Now my questions are:
how do I interpret the values above (the Ghidra output) wrt to the binary layout of the dll? When I hover over the first column value ('18005160e') in Ghidra, I get values for 'imagebase offset', 'memory block offset', 'function offset' and 'byte source offset'. Is this 'byte source offset' the physical address from the start of the dll where these instructions start? The actual value in this hover balloon is 50a0eh - is that Ghidra's notation for 0x50a0e ? I.e. does the trailing 'h' denote 'hex'?
I then tried to open the dll in a regular hex editor ('Hex Editor Neo' which I like to use to view/edit binary data files), and went to offset 0x50a0e, and looked for the values '3c 4b' around there which I didn't find. I searched for this byte sequence in the whole file, and found 7 occurrences, none of which are around 0x50a0e, leading me to think I'm misinterpreting Ghidra's 'byte source offset' here.
how do I make a 'patcher' for this? I would think what I need is a program that only does
FILE* fh = fopen("mydll.dll", "r+b");
fseek(fh, 0x[magic constant], SEEK_SET);
fputc(0x4b, fh);
fclose(fh);
where '0x[magic constant]' is hopefully just the value I got from Ghidra in 'byte source offset'? Or is there anything else I need to consider here? Is there any software tool that can generate a patcher program?
Thanks.
18005160e is a VA, a Virtual Address.
It is the sum of a Base Address (most likely 180000000) and an RVA, a Relative Virtual Address.
Find the Base Address of the DLL with any PE inspecting tool (e.g. CFF Explorer) or Ghidra itself.
Subtract the base address from 18005160e to get the RVA. Let's say the result is 5160e.
Now you need to find which section this RVA lies in. Again, use a PE inspecting tool to find the list of the sections and their RVA/Virtual start and RVA/Virtual size.
Say the RVA lies in the .text section with start at the RVA 1000.
Subtract this start RVA from the result above: 5160e - 1000 = 5060e.
This is the offset of the instruction in the .text section.
To find the offset in the file, just add the raw/offset start of the section (again you can find this with a PE inspecting tool).
Say the .text section starts at the offset 400, then 5060e + 400 = 50a0e is the offset corresponding to the VA 18005160e.
This is all PE 101.
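The arithmetic above, plus the one-byte patch the question asks about, can be sketched in C as follows. This is only a minimal sketch: the function names va_to_file_offset and patch_byte are made up for illustration, and the base address 180000000, section RVA 1000 and raw start 400 are the example figures from this answer, not values read from the real DLL. The patch refuses to write unless the byte currently at the offset is the expected one, which guards against a miscalculated offset.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* File offset of a virtual address, given the image base and the
 * containing section's start RVA and raw (file) start offset.
 * Hypothetical helper, named for illustration only. */
uint64_t va_to_file_offset(uint64_t va, uint64_t image_base,
                           uint64_t section_rva, uint64_t section_raw)
{
    assert(va >= image_base + section_rva);
    return va - image_base - section_rva + section_raw;
}

/* Overwrite one byte at `offset`, but only if the byte currently there
 * equals `expected`. Returns 0 on success, -1 on failure or mismatch.
 * The stream must be opened for reading and writing in binary mode. */
int patch_byte(FILE *fh, long offset, unsigned char expected,
               unsigned char replacement)
{
    int current;

    if (fseek(fh, offset, SEEK_SET) != 0)
        return -1;
    current = fgetc(fh);
    if (current == EOF || (unsigned char)current != expected)
        return -1;
    /* A seek is required between a read and a write on the same stream. */
    if (fseek(fh, offset, SEEK_SET) != 0)
        return -1;
    if (fputc(replacement, fh) == EOF)
        return -1;
    return fflush(fh) == 0 ? 0 : -1;
}
```

With the example figures, va_to_file_offset(0x18005160e, 0x180000000, 0x1000, 0x400) gives 0x50a0e. Since CMP AL,imm8 is encoded as 3c followed by the immediate, the constant itself sits one byte after the start of the instruction, so that is the offset to hand to patch_byte on a copy of the DLL.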

Data loss during windows partitioning | Foremost and autopsy tools

I'm currently dealing with a loss of 320 GB of data.
During the installation of Windows, I accidentally deleted a partition (currently in a "not allocated space" state; please see the example picture).
I tried Autopsy and Foremost, but neither detects any data on this SATA HDD.
Is there any advice to proceed with?
The disk is untouched and nothing has been done after that operation
Stage of data loss
You have to recreate the partition table. To do that, use a hex editor to find where your deleted partition(s) started, write down those offsets, and then create a new partition table.
Your picture says you're under Windows. If the partition is a Windows (NTFS) one, it will start like this (hexdump -C of the first 8 bytes):
00000000 eb 52 90 4e 54 46 53 20 |.R.NTFS |
which you can use as search pattern.
I would advise creating the new partition table under Linux, as you can quickly check your work with mount, unless you have some magical tools for Windows. You can use any live Linux you like (here is a small list with descriptions), even in VirtualBox.
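The signature search can also be automated instead of being done by eye in a hex editor. The sketch below scans a buffer (for example a chunk read from a disk image) for the 8-byte pattern shown above. It assumes 512-byte sectors and that partitions start on a sector boundary; the function name find_ntfs_boot_sector is hypothetical. Treat a hit as a candidate only, since a match may also be a backup copy of a boot sector rather than the start of a partition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512u /* assumed sector size */

/* First 8 bytes of an NTFS boot sector, as in the hexdump above. */
static const unsigned char NTFS_SIGNATURE[8] =
    {0xeb, 0x52, 0x90, 0x4e, 0x54, 0x46, 0x53, 0x20};

/* Return the index of the first sector at or after `first_sector` whose
 * first 8 bytes match the signature, or -1 if there is none in `buf`. */
long find_ntfs_boot_sector(const unsigned char *buf, size_t len,
                           size_t first_sector)
{
    size_t sector;

    assert(buf != NULL);
    for (sector = first_sector; (sector + 1) * SECTOR_SIZE <= len; sector++) {
        if (memcmp(buf + sector * SECTOR_SIZE, NTFS_SIGNATURE,
                   sizeof NTFS_SIGNATURE) == 0)
            return (long)sector;
    }
    return -1;
}
```

The returned sector index is the start value to write down for the new partition table entry. Calling the function again with first_sector set to the previous hit plus one lists further candidates.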

DOS debug.exe: Restricted areas of memory?

(This is my first question, so excuse me for any mistakes.)
I was messing around with debug.exe and tried to alter the BIOS date stored in address range FFFF:0005 to FFFF:000C.
-d FFFF:5 L 8
FFFF:0000 30 31 2F-30 31 2F 39 32 01/01/92
I finally figured out that to move to the address I want to modify, I had to point the DS register to it and not CS, as erroneously stated on some sites (e.g. here).
-r DS
DS=073F
:FFFF
I also figured out that I can use the whole address to modify the exact memory address I want.
-e FFFF:000b
FFFF:000B 39.31 32.31
but then the output of dump command remained unchanged!!!
-d FFFF:5 L 8
FFFF:0000 30 31 2F-30 31 2F 39 32 01/01/92
I suspect that there may be some "protected" areas in memory that I cannot modify, but I couldn't find any documentation about that, which is why I am asking. Can anyone explain to me why and how this is happening?
Thank you
P.S. Note that I am using DosBox to emulate this and to not brick my computer!(maybe this is the problem?)
As the comments suggest, you are writing to ROM, so the values there can't be changed by your code. On modern machines you would get some sort of error as feedback for doing this, but on old hardware it's very common for writes to ROM to be silently ignored. In other words, the CPU will perform the requested operation anyway, but that operation will have no effect on the memory.
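To make the behaviour concrete, here is a toy model in C, not a description of how DOSBox or any real chipset is implemented. It shows how a real-mode segment:offset pair maps to a linear address (segment times 16 plus offset) and how a write into a ROM region can be dropped without any error. The names linear_address and mem_write are hypothetical, and the ROM region starting at F0000 is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE  0x100000u /* 1 MiB real-mode address space */
#define ROM_START 0xF0000u  /* assumed start of the BIOS ROM region */

/* Linear address of segment:offset in real mode: segment * 16 + offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Toy write: RAM accepts it; a write into ROM is silently dropped,
 * with no fault and no error reported to the caller. */
void mem_write(uint8_t *mem, uint32_t addr, uint8_t value)
{
    assert(addr < MEM_SIZE);
    if (addr >= ROM_START)
        return; /* ROM: the write has no effect */
    mem[addr] = value;
}
```

In this model FFFF:000B resolves to linear address FFFFB, which lies in the ROM region, so the edit from the question is accepted without complaint and a later dump shows the old bytes.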

Finding the Raw entrypoint

I want to be able to find out where the code appearing at the entry point comes from by looking at the PE header.
For example, this piece of code is the starting code of my program(401000h)
00401000 >/$ 58 POP EAX ; kernel32.76E93677
00401001 |. 2D 77360100 SUB EAX,13677
00401006 |. BB 4A184000 MOV EBX,<JMP.&kernel32.VirtualProtect>
I want to know where this code comes from. How can I find it without manually scanning my file? (to complete the example, here's an hexdump from the same file, the code now resides at 200h)
Offset 0 1 2 3 4 5 6 7 8 9 A B C D E F
00000200 58 2D 77 36 01 00 BB 4A 18 40 00
How can I get from my virtual entry point (401000h) to the raw entry point (200h)?
I tried solving it myself of course. But I'm missing something. At first I thought:
.text[ Entrypoint (1000h) - VirtualOffset (1000d) ] = raw entrypoint
since the file alignment = 200, and the raw entry point was at the very start of my .text section, I thought I could use this for all the executables.
Solved, I made stupid mistakes when calculating the raw entry point
.text[ Entry point - Virtual offset ] + File Alignment = Raw entry point (relative to .text section)
To locate the offset in the file by yourself, you need to have a look at the _IMAGE_NT_HEADERS structure. From this you can get the IMAGE_OPTIONAL_HEADER, which holds the member you are interested in, ImageBase. You can change its value with EditBin /REBASE, so there is little need to roll your own tool.
For reference, here is how you can determine the entry point via dumpbin.
You can use
dumpbin /headers
dumpbin /headers \Windows\bfsvc
Dump of file \Windows\bfsvc.exe
PE signature found
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (x86)
4 number of sections
4A5BBFB3 time date stamp Tue Jul 14 01:13:55 2009
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
102 characteristics
Executable
32 bit word machine
OPTIONAL HEADER VALUES
10B magic # (PE32)
9.00 linker version
DE00 size of code
2000 size of initialized data
0 size of uninitialized data
4149 entry point (01004149)
1000 base of code
F000 base of data
1000000 image base (01000000 to 01011FFF)
1000 section alignment
200 file alignment
For the entry point the image base value is relevant. But this is only true for images that are not ASLR enabled. For ASLR-enabled images a random base address (1 of 128 different ones) is chosen.
The flag that indicates if an image is ASLR enabled is the value 0x40 which is set in DLL characteristics.
8140 DLL characteristics
It is set for svchost.exe, for example; for older programs it is generally 0.
Yours,
Alois Kraus
Have a look at this thread including an answer with a detailed explanation: Calculating the file offset of a entry point in a PE file
AddressOfRawEntryPoint (in EXE file) = AddressOfEntryPoint + .text[PointerToRawData] - .text[VirtualAddress]
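The formula generalises to any RVA once you know which section contains it. The following is a minimal sketch in C, assuming the section table has already been read into memory; the struct and the function name rva_to_file_offset are hypothetical, with fields named after the IMAGE_SECTION_HEADER members they stand for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three section-header fields the formula needs. */
struct section {
    uint32_t virtual_address;     /* VirtualAddress: start RVA of the section */
    uint32_t virtual_size;        /* VirtualSize of the section */
    uint32_t pointer_to_raw_data; /* PointerToRawData: file offset of the section */
};

/* Map an RVA to a file offset by finding the section that contains it.
 * Returns 0 if no section contains the RVA (this sketch does not handle
 * RVAs that fall inside the headers). */
uint32_t rva_to_file_offset(const struct section *sections, size_t count,
                            uint32_t rva)
{
    size_t i;

    assert(sections != NULL || count == 0);
    for (i = 0; i < count; i++) {
        const struct section *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->virtual_size)
            return rva - s->virtual_address + s->pointer_to_raw_data;
    }
    return 0;
}
```

For the executable in the question, the entry point RVA is 1000h (401000h minus an image base of 400000h, which the question implies), and .text has VirtualAddress 1000h and PointerToRawData 200h. The lookup therefore returns 200h, matching the hexdump.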

CreateThread() fails on 64 bit Windows, works on 32 bit Windows. Why?

Operating System: Windows XP 64 bit, SP2.
I have an unusual problem. I am porting some code from 32 bit to 64 bit. The 32 bit code works just fine. But when I call CreateThread() for the 64 bit version the call fails. I have three places where this fails: two call CreateThread() directly, and one calls _beginthreadex(), which calls CreateThread().
All three calls fail with error code 0x3E6, "Invalid access to memory location".
The problem is all the input parameters are correct.
HANDLE h;
DWORD threadID;
h = CreateThread(0, // default security
0, // default stack size
myThreadFunc, // valid function to call
myParam, // my param
0, // no flags, start thread immediately
&threadID);
All three calls to CreateThread() are made from a DLL I've injected into the target program at the start of the program execution (this is before the program has got to the start of main()/WinMain()). If I call CreateThread() from the target program (same params) via say a menu, it works. Same parameters etc. Bizarre.
If I pass NULL instead of &threadID, it still fails.
If I pass NULL as myParam, it still fails.
I'm not calling CreateThread from inside DllMain(), so that isn't the problem. I'm confused and searching on Google etc hasn't shown any relevant answers.
If anyone has seen this before or has any ideas, please let me know.
Thanks for reading.
ANSWER
Short answer: Stack Frames on x64 need to be 16 byte aligned.
Longer answer:
After much banging my head against the debugger wall and posting responses to the various suggestions (all of which helped in some way, prodding me to try new directions), I started exploring what-ifs about what was on the stack prior to calling CreateThread(). This proved to be a red herring, but it did lead to the solution.
Adding extra data to the stack changes the stack frame alignment. Sooner or later one of the tests gets you to 16 byte stack frame alignment. At that point the code worked. So I retraced my steps and started putting NULL data onto the stack rather than what I thought was the correct values (I had been pushing return addresses to fake up a call frame). It still worked - so the data isn't important, it must be the actual stack addresses.
I quickly realised it was 16 byte alignment for the stack. Previously I was only aware of 8 byte alignment for data. This Microsoft document explains all the alignment requirements.
If the stackframe is not 16 byte aligned on x64 the compiler may put large (8 byte or more) data on the wrong alignment boundaries when it pushes data onto the stack.
Hence the problem I faced - the hooking code was called with a stack that was not aligned on a 16 byte boundary.
Quick summary of alignment requirements, expressed as size : alignment
1 : 1
2 : 2
4 : 4
8 : 8
10 : 16
16 : 16
Anything larger than 8 bytes is aligned on the next power of 2 boundary.
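When debugging this kind of problem, it helps to check addresses against the required boundary explicitly. Below is a small sketch in C; the helper names is_aligned and align_down are hypothetical and are not part of any Windows API. Hooking code that builds its own call frames could use the second helper to force the stack pointer onto a 16 byte boundary before calling into other code.

```c
#include <assert.h>
#include <stdint.h>

/* True if `addr` is a multiple of `alignment`, which must be a power of two. */
int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Round an address down to the nearest `alignment` boundary. Rounding
 * down is the safe direction for a stack pointer, because the stack
 * grows towards lower addresses on x64. */
uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

For example, an address ending in 8 such as 0x1008 passes the 8 byte check but fails the 16 byte one, which is the situation described above: fine for ordinary data, wrong for a stack frame.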
I think Microsoft's error code is a bit misleading. The initial STATUS_DATATYPE_MISALIGNMENT could be expressed as a STATUS_STACK_MISALIGNMENT which would be more helpful. But then turning STATUS_DATATYPE_MISALIGNMENT into ERROR_NOACCESS - that actually disguises and misleads as to what the problem is. Very unhelpful.
Thank you to everyone that posted suggestions. Even if I disagreed with the suggestions, they prompted me to test in a wide variety of directions (including the ones I disagreed with).
I have written a more detailed description of the problem of datatype misalignment here: 64 bit porting gotcha #1! x64 Datatype misalignment.
The only reason that 64bit would make a difference is that threading on 64bit requires 64bit aligned values. If threadID isn't 64bit aligned, you could cause this problem.
Ok, that idea's not it. Are you sure it's valid to call CreateThread before main/WinMain? It would explain why it works in a menu- because that's after main/WinMain.
In addition, I'd triple-check the lifetime of myParam. CreateThread returns (this I know from experience) long before the function you pass in is called.
Post the thread routine's code (or just a few lines).
It suddenly occurs to me: Are you sure that you're injecting your 64bit code into a 64bit process? Because if you had a 64bit CreateThread call and tried to inject that into a 32bit process running under WOW64, bad things could happen.
Starting to seriously run out of ideas. Does the compiler report any warnings?
Could the bug be due to a bug in the host program, rather than the DLL? Some other code runs before main/WinMain, such as the loading of a DLL if you used __declspec(import/export). That DLL's DllMain, for example, could have a bug in it.
I ran into this issue today. I checked every argument fed into _beginthread/CreateThread/NtCreateThread via rohitab's Windows API Monitor v2. Every argument is aligned properly (AFAIK).
So, where does STATUS_DATATYPE_MISALIGNMENT come from?
The first few lines of NtCreateThread validate parameters passed from user mode.
ProbeForReadSmallStructure (ThreadContext, sizeof (CONTEXT), CONTEXT_ALIGN);
for i386
#define CONTEXT_ALIGN (sizeof(ULONG))
for amd64
#define STACK_ALIGN (16UI64)
...
#define CONTEXT_ALIGN STACK_ALIGN
On amd64, if the ThreadContext pointer is not aligned to 16 bytes, NtCreateThread will return STATUS_DATATYPE_MISALIGNMENT.
CreateThread (actually CreateRemoteThread) allocates ThreadContext on the stack and does nothing special to guarantee that the alignment requirement is satisfied. Things will work smoothly if every piece of your code follows the Microsoft x64 calling convention, which unfortunately was not true for me.
PS: The same code may work on newer Windows (say Vista and newer). I didn't check though. I'm facing this issue on Windows Server 2003 R2 x64.
I'm in the business of using parallel threads under Windows for calculations. No funny business, no DLL calls, and certainly no callbacks. The following works in 32-bit Windows. I set up the stack for my calculation, well within the area reserved for my program. All relevant data about areas and start addresses is contained in a data structure that is passed to CreateThread as parameter 3. The address that is called contains a small assembler routine that uses this data structure. Indeed, this routine finds the address to return to on the stack, then the address of the data structure.
There is no reason to go far into this. It just works, and it calculates the number of primes below 2,000,000,000 just fine, in one thread, in two threads or in 20 threads.
Now CreateThread in 64 bits doesn't push the address of the data structure. That seems implausible, so I show you the smoking gun, a dump of a debug session. In the subwindow at the bottom right you see the stack, and there is merely the return address, amidst a sea of zeroes.
The mechanism I use to fill in parameters is portable between 32 and 64 bits. No other call exhibits a difference between word sizes. Moreover, why would the code address work but not the data address?
The bottom line: one would expect that CreateThread passes the data parameter on the stack in the same way in 64 bits as in 32 bits, then does a subroutine call. At the assembler level it doesn't work that way. If there are any hidden requirements, e.g. on RSP, that are automatically fulfilled in C++, that would be very nasty.
P.S. No, there are no 16-byte alignment problems. That lies ages behind me.
Try using _beginthread() or _beginthreadex() instead, you shouldn't be using CreateThread directly.
See this previous question.
