I'm trying to mprotect a subset of the functions bundled in a shared library, as part of a larger feature.
Given the page-alignment requirements of mprotect, I've set up a separate section in my linker script that ensures that alignment:
SECTIONS
{
.protectedsection ALIGN(4096) : {
*(.protectedsection)
}
}
INSERT AFTER .rodata;
And in the declaration of the function I want protection on, I add the relevant GCC attributes:
// Setup with protection
void bar(int a) __attribute__ ((section ("protectedsection")));
// Setup without protection
void foo(int a);
I compile the resultant C code with GCC, along with the -T option to pass in the linker file:
gcc -fpic -shared -T linkerscript.ld libfuncs.c -o libfuncs.so
Objdump'ing it reveals that while it is in the right section, the alignment isn't right:
Disassembly of section protectedsection:
00000000000010da <_Z3bari>:
10da: 55 push %rbp
10db: 48 89 e5 mov %rsp,%rbp
10de: 48 83 ec 10 sub $0x10,%rsp
10e2: 89 7d fc mov %edi,-0x4(%rbp)
10e5: 48 8d 3d e1 00 00 00 lea 0xe1(%rip),%rdi # 11cd <_fini+0x9>
10ec: e8 df f4 ff ff callq 5d0 <puts@plt>
10f1: 90 nop
10f2: c9 leaveq
10f3: c3 retq
Is what I'm trying to do here possible, and if so, how?
Found the problem - changing 'INSERT AFTER .rodata' to 'INSERT AFTER .text' fixed it. With that, I was able to set up a 'buffered' region for the code I want mprotect'ed, and it all works like a charm!
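For completeness, here is a minimal sketch of the mprotect call this layout enables. The __protected_start/__protected_end symbols are hypothetical; they would have to be defined in the linker script (e.g. with PROVIDE()) around the section:
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

extern char __protected_start[]; /* hypothetical linker-provided symbols */
extern char __protected_end[];

static int protect_section(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* round the start down to a page boundary, as mprotect requires */
    uintptr_t start = (uintptr_t)__protected_start & ~(uintptr_t)(page - 1);
    size_t len = (uintptr_t)__protected_end - start;
    /* change protection on the page-aligned region (PROT_READ as an example) */
    return mprotect((void *)start, len, PROT_READ);
}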
I understand when to use a clobber list (e.g. listing a register which is modified in the assembly so that it doesn't get chosen for use as an input register, etc.), but I can't wrap my head around the earlyclobber constraint &. If you list your outputs, wouldn't that already mean that inputs can't use the selected register (aside from matching digit constraints)?
For example:
asm(
"movl $1, %0;"
"addl $3, %0;"
"addl $4, %1;" // separate bug: modifies input-only operand
"addl %1, %0;"
: "=g"(num_out)
: "g"(num_in)
:
);
Would & even be needed for the output variables? The compiler should know the register that was selected for the output, and thus know not to use it for the input.
By default, the compiler assumes all inputs will be consumed before any output registers are written to, so that it's allowed to use the same registers for both. This leads to better code when possible, but if the assumption is wrong, things will fail catastrophically. The "early clobber" marker is a way to tell the compiler that this output will be written before all the input has been consumed, so it cannot share a register with any input.
GNU C inline asm syntax was designed to wrap a single instruction as efficiently as possible. You can put multiple instructions in an asm template, but the defaults (assuming that all inputs are read before any outputs are written) are designed around wrapping a single instruction.
It's the same constraint syntax as GCC uses in its machine-description files that teach the compiler what instructions are available in an ISA.
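To illustrate that single-instruction default with a sketch of my own (not from the question; x86-64 assumed): when the template really is one instruction, an input and the output may safely share a register, because the instruction reads its operands before writing its result:
#include <stdio.h>

int main(void) {
    int x = 41, res;
    /* one instruction: %1 is read and %0 is written by the same insn,
       so the compiler is free to assign both to the same register */
    __asm__("leal 1(%q1), %0" : "=r"(res) : "r"(x));
    printf("%d\n", res); /* prints 42 */
    return 0;
}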
Minimal educational example
Here is a minimal educational example that attempts to make what https://stackoverflow.com/a/15819941/895245 mentioned clearer.
This specific code is of course not useful in practice, and could be achieved more efficiently with a single lea 1(%q[in]), %out instruction; it is just a simple educational example.
main.c
#include <assert.h>
#include <inttypes.h>
int main(void) {
uint64_t in = 1;
uint64_t out;
__asm__ (
"mov %[in], %[out];" /* out = in */
"inc %[out];" /* out++ */
"mov %[in], %[out];" /* out = in */
"inc %[out];" /* out++ */
: [out] "=&r" (out)
: [in] "r" (in)
:
);
assert(out == 2);
}
Compile and run:
gcc -ggdb3 -std=c99 -O3 -Wall -Wextra -pedantic -o main.out main.c
./main.out
This program is correct and the assert passes, because & forces the compiler to choose different registers for in and out.
This is because & tells the compiler that in might be used after out was written to, which is actually the case here.
Therefore, the only way to not wrongly modify in is to put in and out in different registers.
The disassembly:
gdb -nh -batch -ex 'disassemble/rs main' main.out
contains:
0x0000000000001055 <+5>: 48 89 d0 mov %rdx,%rax
0x0000000000001058 <+8>: 48 ff c0 inc %rax
0x000000000000105b <+11>: 48 89 d0 mov %rdx,%rax
0x000000000000105e <+14>: 48 ff c0 inc %rax
which shows that GCC chose rax for out and rdx for in.
If we remove the & however, the behavior is unspecified.
In my test system, the assert actually fails, because the compiler tries to minimize register usage, and compiles to:
0x0000000000001055 <+5>: 48 89 c0 mov %rax,%rax
0x0000000000001058 <+8>: 48 ff c0 inc %rax
0x000000000000105b <+11>: 48 89 c0 mov %rax,%rax
0x000000000000105e <+14>: 48 ff c0 inc %rax
therefore using rax for both in and out.
The result of this is that out is incremented twice, and equals 3 instead of 2 in the end.
Tested in Ubuntu 18.10 amd64, GCC 8.2.0.
More practical examples
multiplication implicit output registers
non-hardcoded scratch registers: GCC: Prohibit use of some registers
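As a sketch of that scratch-register idea (my own code, x86-64 assumed): a dummy output constrained with "=&r" reserves a temporary that is guaranteed not to alias any input, which is required here because the scratch is written before the inputs are fully consumed:
#include <stdint.h>
#include <assert.h>

int main(void) {
    uint64_t a = 2, b = 3, tmp, out;
    /* tmp is written first and the inputs a and b are read afterwards,
       so tmp must be early-clobbered to keep it out of their registers */
    __asm__("mov %[a], %[t];"
            "add %[b], %[t];"
            "mov %[t], %[o];"
            : [o] "=r" (out), [t] "=&r" (tmp)
            : [a] "r" (a), [b] "r" (b));
    assert(out == 5);
    return 0;
}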
I want to use SymGetSourceFile to get a source file from a source server, using info from a dump file. But the first param is a handle to a process, and during postmortem debugging we don't have a process - so is it meant to be used only by live debugging tools? How can I use it from a postmortem debugging tool?
BOOL IMAGEAPI SymGetSourceFile(
HANDLE hProcess,
ULONG64 Base,
PCSTR Params,
PCSTR FileSpec,
PSTR FilePath,
DWORD Size
);
https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symgetsourcefile
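For reference, SymInitialize is documented to accept an arbitrary unique value as hProcess when fInvadeProcess is FALSE, so a postmortem tool can work without a live process. A minimal sketch (module name and base address below are made up):
#include <stdio.h>
#include <windows.h>
#include <dbghelp.h>

int main(void)
{
    HANDLE hSym = (HANDLE)0x1234; /* any unique value; no live process needed */
    char path[MAX_PATH];

    if (!SymInitialize(hSym, NULL, FALSE))
        return 1;
    /* register the dumped module at the base recorded in the dump
       (hypothetical name and base) */
    DWORD64 base = SymLoadModuleEx(hSym, NULL, "Application.dll", NULL,
                                   0x000000dd6f5f0000ULL, 0, NULL, 0);
    /* Params: srcsrv token parameters; NULL here for simplicity */
    if (base != 0 && SymGetSourceFile(hSym, base, NULL, "Application.cs",
                                      path, sizeof(path)))
        printf("source extracted to %s\n", path);
    SymCleanup(hSym);
    return 0;
}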
Update:
I have tried using the IDebugAdvanced3 interface for the same, but get HR = 0x80004002 (E_NOINTERFACE) for the GetSourceFileInformation call.
char buf[1000] = { 0 };
HRESULT hr = g_ExtAdvanced->GetSourceFileInformation(DEBUG_SRCFILE_SYMBOL_TOKEN,
    "Application.cs",
    0x000000dd6f5f1000, 0, buf, 1000, 0);
if (SUCCEEDED(hr))
{
    dprintf("GetSourceFileInformation = %s", buf);
    char buftok[5000] = { 0 };
    hr = g_ExtAdvanced->FindSourceFileAndToken(0, 0x000000dd6f5f1000,
        "Application.cs", DEBUG_FIND_SOURCE_TOKEN_LOOKUP,
        buf, 1000, 0, buftok, 5000, 0);
    if (SUCCEEDED(hr))
    {
        dprintf("FindSourceFileAndToken = %s", buftok); // the result lands in buftok
    }
    else
        dprintf("FindSourceFileAndToken HR = %x", hr);
}
else
    dprintf("GetSourceFileInformation HR = %x", hr);
I have a dump that has this module and PDB loaded, and I pass an address within the module - 0x000000dd6f5f1000 - to GetSourceFileInformation.
this was a comment but grew too long, so adding as an answer
GetSourceFileInformation iirc checks the source servers (those that start with srv or %srcsrv%)
this returns a token for use with FindSourceFileAndToken
if you have a known offset (0x1070 == main() in the case below)
use GetLineByOffset; this has the added advantage of reloading all the modules
hope you have your private pdb for the dump file you open.
this is engextcpp syntax
Hr = m_Client->OpenDumpFile("criloc.dmp");
Hr = m_Control->WaitForEvent(0,INFINITE);
unsigned char Buff[BUFFERSIZE] = {0};
ULONG Buffused = 0;
DEBUG_READ_USER_MINIDUMP_STREAM MiniStream ={ModuleListStream,0,0,Buff,BUFFERSIZE,Buffused};
Hr = m_Advanced2->Request(DEBUG_REQUEST_READ_USER_MINIDUMP_STREAM,&MiniStream,sizeof(
DEBUG_READ_USER_MINIDUMP_STREAM),NULL,NULL,NULL);
MINIDUMP_MODULE_LIST *modlist = (MINIDUMP_MODULE_LIST *)&Buff;
Hr = m_Symbols->GetLineByOffset(modlist->Modules[0].BaseOfImage+0x1070,&Line,
FileBuffer,0x300,&Filesize,&Displacement);
Out("getlinebyoff returned %x\nsourcefile is at %s line number is %d\n",Hr,FileBuffer,Line);
this is partial source; adapt it to your needs.
the result of the extension command is pasted below
0:000> .load .\mydt.dll
0:000> !mydt
Loading Dump File [C:\Users\xxxx\Desktop\srcfile\criloc.dmp]
User Mini Dump File with Full Memory: Only application data is available
OpenDumpFile Returned 0
WaitForEvent Returned 0
Request Returned 0
Ministream Buffer Used 28c
06 00 00 00 00 00 8d 00 00 00 00 00 00 e0 04 00
f0 9a 05 00 2d 2e a8 5f ba 14 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
43 00 00 00 4a 38 00 00 00 00 00 00 00 00 00 00
40 81 00 00 00 00 00 00 00 00 00 00 00 00 00 00
No of Modules =6
Module[0]
Base = 8d0000
Size = 4e000
getlinebyoff returned 0
sourcefile is at c:\users\xxx\desktop\misc\criloc\criloc.cpp line number is 21 <<<<<<<<<
||1:1:010> lm
start end module name
008d0000 0091e000 CRILOC (private pdb symbols) C:\Users\xxxx\Desktop\misc\CRILOC\CRILOC.pdb
||1:1:010>
and the actual source file contents on that path:
:\>grep -i -n main CRILOC.CPP
20:int main(void) << the curly brace is on line 21
UPDATE:
yes, if the src file is not source indexed (cvs, perforce, ...) GetSourceFileInformation() will not return a token
it checks for a token using the Which parameter
and the returned info can be used in FindSourceFileAndToken()
if your source is not source indexed and you only have a source path,
use FindSourceFileAndToken() with the DEBUG_FIND_SOURCE_FULL_PATH flag
be aware you need to either use SetSourcePath(), issue the .srcpath command, use the _NT_SOURCE_PATH environment variable, or use the -srcpath command-line switch prior to invoking FindSourceFileAndToken()
see below for a walkthrough
sourcefile and contents
:\>ls *.cpp
mydt.cpp
:\>cat mydt.cpp
#include <engextcpp.cpp>
#define BSIZE 0x1000
class EXT_CLASS : public ExtExtension {
public:
EXT_COMMAND_METHOD(mydt);
};
EXT_DECLARE_GLOBALS();
EXT_COMMAND( mydt, "mydt", "{;e,o,d=0;!mydt;}" ){
HRESULT Hr = m_Client->OpenDumpFile("criloc.dmp");
Hr = m_Control->WaitForEvent(0,INFINITE);
char Buff[BSIZE] = {0};
ULONG Buffused = 0;
DEBUG_READ_USER_MINIDUMP_STREAM MiniStream ={ModuleListStream,0,0,
Buff,BSIZE,Buffused};
Hr = m_Advanced2->Request(DEBUG_REQUEST_READ_USER_MINIDUMP_STREAM,&MiniStream,
sizeof(DEBUG_READ_USER_MINIDUMP_STREAM),NULL,NULL,NULL);
MINIDUMP_MODULE_LIST *modlist = (MINIDUMP_MODULE_LIST *)&Buff;
//m_Symbols->SetSourcePath("C:\\Users\\xxx\\Desktop\\misc\\CRILOC");
char srcfilename[BSIZE] ={0};
ULONG foundsize =0 ;
Hr = m_Advanced3->FindSourceFileAndToken(0,modlist->Modules[0].BaseOfImage,"criloc.cpp",
DEBUG_FIND_SOURCE_FULL_PATH,NULL,0,NULL,srcfilename,0x300,&foundsize);
Out("gsfi returned %x\n" , Hr);
Out("srcfilename is %s\n",srcfilename);
}
compiled and linked with
:\>cat bld.bat
@echo off
set "INCLUDE= %INCLUDE%;E:\windjs\windbg_18362\inc"
set "LIB=%LIB%;E:\windjs\windbg_18362\lib\x86"
set "LINKLIBS=user32.lib kernel32.lib dbgeng.lib dbghelp.lib"
cl /LD /nologo /W4 /Od /Zi /EHsc mydt.cpp /link /nologo /EXPORT:DebugExtensionInitialize /Export:mydt /Export:help /RELEASE %linklibs%
:\>bld.bat
mydt.cpp
E:\windjs\windbg_18362\inc\engextcpp.cpp(1849): warning C4245: 'argument': conversion from 'int' to 'ULONG64', signed/unsigned mismatch
Creating library mydt.lib and object mydt.exp
:\>file mydt.dll
mydt.dll; PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit
executing
:\>cdb cdb
Microsoft (R) Windows Debugger Version 10.0.18362.1 X86
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ntdll!LdrpDoDebuggerBreak+0x2c:
77d805a6 cc int 3
0:000> .load .\mydt.dll
0:000> .chain
Extension DLL chain:
.\mydt.dll: API 1.0.0, built Thu Mar 18 20:40:04 2021
[path: C:\Users\xxxx\Desktop\srcfile\New folder\mydt.dll]
0:000> !mydt
Loading Dump File [C:\Users\xxxx\Desktop\srcfile\New folder\criloc.dmp]
User Mini Dump File with Full Memory: Only application data is available
gsfi returned 80004002
srcfilename is
||1:1:010> .srcpath "c:\\users\\xxxx\\desktop\\misc\\criloc\\"
Source search path is: c:\\users\\xxxx\\desktop\\misc\\criloc\\
************* Path validation summary **************
Response Time (ms) Location
OK c:\\users\\xxxx\\desktop\\misc\\criloc\\
||1:1:010> !mydt
Loading Dump File [C:\Users\xxxx\Desktop\srcfile\New folder\criloc.dmp]
gsfi returned 0
srcfilename is c:\\users\\xxxx\\desktop\\misc\\criloc\\criloc.cpp
||2:2:021>
I'm attempting to call a method from LLVM IR back to C++ code. I'm working in 64-bit Visual C++, or as LLVM describes it:
Machine CPU: skylake
Machine info: x86_64-pc-windows-msvc
For integer types and pointer types my code works fine as-is. However, floating point numbers seem to be handled a bit strangely.
Basically the call looks like this:
struct SomeStruct
{
    static void Breakpoint() { return; } // used to set a breakpoint
    static double Set(uint8_t* ptr, double foo) { return foo * 2; }
};
and LLVM IR looks like this:
define i32 @main(i32, i8**) {
varinit:
  ; omitted here: initialize %ptr from i8**.
  %5 = load i8*, i8** %instance0
  ; call to some method. This works - I use it to set a breakpoint
  call void @"Helper::Breakpoint"(i8* %5)
  ; this call fails:
  call void @"Helper::Set"(i8* %5, double 0xC19EC46965A6494D)
  ret i32 0
}
declare double @"SomeStruct::Callback"(i8*, double)
I figured that the problem is probably in the way the calling conventions work. So I've attempted to make some adjustments to correct for that:
// during initialization of the function
auto function = llvm::Function::Create(functionType, llvm::Function::ExternalLinkage, name, module);
function->setCallingConv(llvm::CallingConv::X86_64_Win64);
...
// during calling of the function
call->setCallingConv(llvm::CallingConv::X86_64_Win64);
Unfortunately, no matter what I try, I end up with 'invalid instruction' errors, which this user reports to be an issue with calling conventions: Clang producing executable with illegal instruction. I've tried this with X86_64_Win64, Stdcall, Fastcall and no calling convention specs - all with the same result.
I've read up on https://msdn.microsoft.com/en-us/library/ms235286.aspx in an attempt to figure out what's going on. Then I looked at the assembly output that's supposed to be generated by LLVM (using the targetMachine->addPassesToEmitFile API call) and found:
movq (%rdx), %rsi
movq %rsi, %rcx
callq "Helper2<double>::Breakpoint"
vmovsd __real@c19ec46965a6494d(%rip), %xmm1
movq %rsi, %rcx
callq "Helper2<double>::Set"
xorl %eax, %eax
addq $32, %rsp
popq %rsi
According to MSDN, argument 2 should be in %xmm1 so that also seems correct. However, when checking if everything works in the debugger, Visual Studio reports a lot of question marks (e.g. 'illegal instruction').
Any feedback is appreciated.
The disassembly code:
00000144F2480007 48 B8 B6 48 B8 C8 FA 7F 00 00 mov rax,7FFAC8B848B6h
00000144F2480011 48 89 D1 mov rcx,rdx
00000144F2480014 48 89 54 24 20 mov qword ptr [rsp+20h],rdx
00000144F2480019 FF D0 call rax
00000144F248001B 48 B8 C0 48 B8 C8 FA 7F 00 00 mov rax,7FFAC8B848C0h
00000144F2480025 48 B9 00 00 47 F2 44 01 00 00 mov rcx,144F2470000h
00000144F248002F ?? ?? ??
00000144F2480030 ?? ?? ??
00000144F2480031 FF 08 dec dword ptr [rax]
00000144F2480033 10 09 adc byte ptr [rcx],cl
00000144F2480035 48 8B 4C 24 20 mov rcx,qword ptr [rsp+20h]
00000144F248003A FF D0 call rax
00000144F248003C 31 C0 xor eax,eax
00000144F248003E 48 83 C4 28 add rsp,28h
00000144F2480042 C3 ret
Some of the information about the memory is missing. Memory view:
0x00000144F248001B 48 b8 c0 48 b8 c8 fa 7f 00 00 48 b9 00 00 47 f2 44 01 00 00 62 f1 ff 08 10 09 48 8b 4c 24 20 ff d0 31 c0 48 83 c4 28 c3 00 00 00 00 00 ...
The question marks that are missing here are: '62 f1 '.
Some code may help to see how I get the JIT to compile, etc. I'm afraid it's a bit long, but it helps to get the idea... and I have no clue how to create a smaller piece of code.
// Note: FunctionBinderBase basically holds an llvm::Function* object
// which is bound using the above code and a name.
llvm::ExecutionEngine* Module::Compile(std::unordered_map<std::string, FunctionBinderBase*>& externalFunctions)
{
// DebugFlag = true;
#if (LLVMDEBUG >= 1)
this->module->dump();
#endif
// -- Initialize LLVM compiler: --
std::string error;
// Helper function, gets the current machine triplet.
llvm::Triple triple(MachineContextInfo::Triplet());
const llvm::Target *target = llvm::TargetRegistry::lookupTarget("x86-64", triple, error);
if (!target)
{
throw error.c_str();
}
llvm::TargetOptions Options;
// Options.PrintMachineCode = true;
// Options.EnableFastISel = true;
std::unique_ptr<llvm::TargetMachine> targetMachine(
target->createTargetMachine(MachineContextInfo::Triplet(), MachineContextInfo::CPU(), "", Options, llvm::Reloc::Default, llvm::CodeModel::Default, llvm::CodeGenOpt::Aggressive));
if (!targetMachine.get())
{
throw "Could not allocate target machine!";
}
// Create the target machine; set the module data layout to the correct values.
auto DL = targetMachine->createDataLayout();
module->setDataLayout(DL);
module->setTargetTriple(MachineContextInfo::Triplet());
// Pass manager builder:
llvm::PassManagerBuilder pmbuilder;
pmbuilder.OptLevel = 3;
pmbuilder.BBVectorize = false;
pmbuilder.SLPVectorize = true;
pmbuilder.LoopVectorize = true;
pmbuilder.Inliner = llvm::createFunctionInliningPass(3, 2);
llvm::TargetLibraryInfoImpl *TLI = new llvm::TargetLibraryInfoImpl(triple);
pmbuilder.LibraryInfo = TLI;
// Generate pass managers:
// 1. Function pass manager:
llvm::legacy::FunctionPassManager FPM(module.get());
pmbuilder.populateFunctionPassManager(FPM);
// 2. Module pass manager:
llvm::legacy::PassManager PM;
PM.add(llvm::createTargetTransformInfoWrapperPass(targetMachine->getTargetIRAnalysis()));
pmbuilder.populateModulePassManager(PM);
// 3. Execute passes:
// - Per-function passes:
FPM.doInitialization();
for (llvm::Module::iterator I = module->begin(), E = module->end(); I != E; ++I)
{
if (!I->isDeclaration())
{
FPM.run(*I);
}
}
FPM.doFinalization();
// - Per-module passes:
PM.run(*module);
// Fix function pointers; the PM.run will ruin them, this fixes that.
for (auto it : externalFunctions)
{
auto name = it.first;
auto fcn = module->getFunction(name);
it.second->function = fcn;
}
#if (LLVMDEBUG >= 2)
// -- ASSEMBLER dump code
// 3. Code generation pass manager:
llvm::legacy::PassManager CGP;
CGP.add(llvm::createTargetTransformInfoWrapperPass(targetMachine->getTargetIRAnalysis()));
pmbuilder.populateModulePassManager(CGP);
std::string result;
llvm::raw_string_ostream str(result);
llvm::buffer_ostream os(str);
targetMachine->addPassesToEmitFile(CGP, os, llvm::TargetMachine::CodeGenFileType::CGFT_AssemblyFile);
CGP.run(*module);
str.flush();
auto stringref = os.str();
std::string assembly(stringref.begin(), stringref.end());
std::cout << "ASM code: " << std::endl << "---------------------" << std::endl << assembly << std::endl << "---------------------" << std::endl;
// -- end of ASSEMBLER dump code.
for (auto it : externalFunctions)
{
auto name = it.first;
auto fcn = module->getFunction(name);
it.second->function = fcn;
}
#endif
#if (LLVMDEBUG >= 2)
module->dump();
#endif
// All done, *RUN*.
llvm::EngineBuilder engineBuilder(std::move(module));
engineBuilder.setEngineKind(llvm::EngineKind::JIT);
engineBuilder.setMCPU(MachineContextInfo::CPU());
engineBuilder.setMArch("x86-64");
engineBuilder.setUseOrcMCJITReplacement(false);
engineBuilder.setOptLevel(llvm::CodeGenOpt::None);
llvm::ExecutionEngine* engine = engineBuilder.create();
// Define external functions
for (auto it : externalFunctions)
{
auto fcn = it.second;
if (fcn->function)
{
engine->addGlobalMapping(fcn->function, const_cast<void*>(fcn->FunctionPointer())); // Yuck... LLVM only takes non-const pointers
}
}
// Finalize
engine->finalizeObject();
return engine;
}
Update (progress)
Apparently my Skylake has problems with the vmovsd instruction. When running the same code on a Haswell (server), the test succeeds. I've checked the assembly output on both - they are exactly the same.
Just to be sure: XSAVE/XRSTOR shouldn't be the problem on Win10-x64, but let's find out anyway. I've checked the features with the code from https://msdn.microsoft.com/en-us/library/hskdteyh.aspx and the XSAVE/XRSTOR check from https://insufficientlycomplicated.wordpress.com/2011/11/07/detecting-intel-advanced-vector-extensions-avx-in-visual-studio/ . The latter runs just fine. As for the former, these are the results:
GenuineIntel
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
3DNOW not supported
3DNOWEXT not supported
ABM not supported
ADX supported
AES supported
AVX supported
AVX2 supported
AVX512CD not supported
AVX512ER not supported
AVX512F not supported
AVX512PF not supported
BMI1 supported
BMI2 supported
CLFSH supported
CMPXCHG16B supported
CX8 supported
ERMS supported
F16C supported
FMA supported
FSGSBASE supported
FXSR supported
HLE supported
INVPCID supported
LAHF supported
LZCNT supported
MMX supported
MMXEXT not supported
MONITOR supported
MOVBE supported
MSR supported
OSXSAVE supported
PCLMULQDQ supported
POPCNT supported
PREFETCHWT1 not supported
RDRAND supported
RDSEED supported
RDTSCP supported
RTM supported
SEP supported
SHA not supported
SSE supported
SSE2 supported
SSE3 supported
SSE4.1 supported
SSE4.2 supported
SSE4a not supported
SSSE3 supported
SYSCALL supported
TBM not supported
XOP not supported
XSAVE supported
It's weird, so I figured: why not simply emit the instruction directly.
#include <immintrin.h> // _mm_load_sd
#include <iostream>
#include <string>

int main()
{
    const double value = 1.2;
    const double value2 = 1.3;
    auto x1 = _mm_load_sd(&value);
    auto x2 = _mm_load_sd(&value2);
    std::string s;
    std::getline(std::cin, s);
}
This code runs fine. The disassembly:
auto x1 = _mm_load_sd(&value);
00007FF7C4833724 C5 FB 10 45 08 vmovsd xmm0,qword ptr [value]
auto x1 = _mm_load_sd(&value);
00007FF7C4833729 C5 F1 57 C9 vxorpd xmm1,xmm1,xmm1
00007FF7C483372D C5 F3 10 C0 vmovsd xmm0,xmm1,xmm0
Apparently it won't use register xmm1 here, but this still proves that the instruction itself does the trick.
I just checked on another Intel Haswell what's going on here, and found this:
0000015077F20110 C5 FB 10 08 vmovsd xmm1,qword ptr [rax]
Apparently on Intel Haswell it emits a different encoding of the instruction than on my Skylake.
@Ha. was kind enough to point me in the right direction here. Yes, the hidden bytes indeed encode VMOVSD, but apparently it's encoded with an EVEX prefix. That's all nice and well, but the EVEX prefix / encoding is part of AVX512, which won't be supported until Skylake Purley (the server line) in 2017. In other words, on this Skylake it's an invalid instruction.
To check, I've put a breakpoint in X86MCCodeEmitter::EmitMemModRMByte. At some point, I do see a bool HasEVEX = [...] evaluating to true. This confirms that the codegen / emitter is producing the wrong output.
My conclusion is therefore that this has to be a bug in LLVM's target information for Skylake CPUs. That means there are only two things left to do: figure out where exactly this bug lives in LLVM so we can fix it, and report it to the LLVM team...
So where is it in LLVM? That's tough to tell... x86.td.def defines the skylake features as including 'FeatureAVX512', which will probably set X86SSELevel to AVX512F. That in turn will select the wrong instruction encodings. As a workaround, it's easiest to simply tell LLVM that we have an Intel Haswell instead, and all will be well:
// MCPU is used to call createTargetMachine
llvm::StringRef MCPU = llvm::sys::getHostCPUName();
if (MCPU.str() == "skylake")
{
MCPU = llvm::StringRef("haswell");
}
Test, works.
See the simple code below:
#include <stdio.h>
int foo(int a)
{
return a;
}
int main() {
printf("%x\n", foo);
printf("%x\n", &foo);
printf("%x\n", *foo);
foo(1);
}
They all displayed the same value:
0x20453840
0x20453840
0x20453840
I used gdb to check foo()'s entry point:
(gdb) p foo
$1 = {int (int)} 0x100003d8 <foo>
the value 0x20453840 is actually a pointer to foo()'s entry point (a pointer to a pointer):
(gdb) p /x *0x20453850
$3 = 0x100003d8
(gdb) si
0x10000468 76 foo(1);
0x10000464 <main+76>: 38 60 00 01 li r3,1
=> 0x10000468 <main+80>: 4b ff ff 71 bl 0x100003d8 <foo>
(gdb)
foo (a=541407312) at insertcode.c:57
57 {
=> 0x100003d8 <foo+0>: 93 e1 ff fc stw r31,-4(r1)
0x100003dc <foo+4>: 94 21 ff e0 stwu r1,-32(r1)
0x100003e0 <foo+8>: 7c 3f 0b 78 mr r31,r1
0x100003e4 <foo+12>: 90 7f 00 38 stw r3,56(r31)
(gdb)
So I think 0x100003d8 is the entry point.
I used gcc 4.6.2 to compile.
I have two questions:
Why is the function address defined differently on AIX? Is it related to gcc?
I have to use gcc, not xlC.
How do I get the real function address in C on AIX?
Thanks in advance!
Why is the function address defined differently on AIX?
nm -Pg ./f_addr | grep foo
Try this command, and you will see that you have two symbols: foo and .foo. One of them lives in the code segment (or text segment), the other in the data segment.
The purpose is, indeed, to create an indirection for function calls; this is important when creating/using shared libraries.
Is it related to gcc? I have to use gcc, not xlC.
No.
How do I get the real function address in C on AIX?
Please clarify your question: what do you want to do with the 'real address'?
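Purely as an illustration (a sketch assuming the standard AIX function-descriptor layout, where the data-segment symbol foo points to a descriptor whose first word is the .foo code address; the cast is deliberately non-portable):
#include <stdio.h>

int foo(int a) { return a; }

int main(void)
{
    /* On AIX, the symbol foo refers to a function descriptor in the data
       segment; its first slot holds the real entry point (.foo). */
    void **desc = (void **)foo;
    printf("descriptor:  %p\n", (void *)desc);
    printf("entry point: %p\n", desc[0]);
    return 0;
}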
I am getting unexpected global variable read results when compiling the following code in avr-gcc 4.6.2 for ATmega328:
#include <avr/io.h>
#include <util/delay.h>
#define LED_PORT PORTD
#define LED_BIT 7
#define LED_DDR DDRD
uint8_t latchingFlag;
int main() {
LED_DDR = 0xFF;
for (;;) {
latchingFlag=1;
if (latchingFlag==0) {
LED_PORT ^= 1<<LED_BIT; // Toggle the LED
_delay_ms(100); // Delay
latchingFlag = 1;
}
}
}
This is the entire code. I would expect the LED toggling to never execute, seeing as latchingFlag is set to 1; however, the LED blinks continuously. If latchingFlag is declared local to main(), the program executes as expected: the LED never blinks.
The disassembled code doesn't reveal any gotchas that I can see, here's the disassembly of the main loop of the version using the global variable (with the delay routine call commented out; same behavior)
59 .L4:
27:main.cpp **** for (;;) {
60 .loc 1 27 0
61 0026 0000 nop
62 .L3:
28:main.cpp **** latchingFlag=1;
63 .loc 1 28 0
64 0028 81E0 ldi r24,lo8(1)
65 002a 8093 0000 sts latchingFlag,r24
29:main.cpp **** if (latchingFlag==0) {
66 .loc 1 29 0
67 002e 8091 0000 lds r24,latchingFlag
68 0032 8823 tst r24
69 0034 01F4 brne .L4
30:main.cpp **** LED_PORT ^= 1<<LED_BIT; // Toggle the LED
70 .loc 1 30 0
71 0036 8BE2 ldi r24,lo8(43)
72 0038 90E0 ldi r25,hi8(43)
73 003a 2BE2 ldi r18,lo8(43)
74 003c 30E0 ldi r19,hi8(43)
75 003e F901 movw r30,r18
76 0040 3081 ld r19,Z
77 0042 20E8 ldi r18,lo8(-128)
78 0044 2327 eor r18,r19
79 0046 FC01 movw r30,r24
80 0048 2083 st Z,r18
31:main.cpp **** latchingFlag = 1;
81 .loc 1 31 0
82 004a 81E0 ldi r24,lo8(1)
83 004c 8093 0000 sts latchingFlag,r24
27:main.cpp **** for (;;) {
84 .loc 1 27 0
85 0050 00C0 rjmp .L4
The lines 71-80 are responsible for port access: according to the datasheet, PORTD is at address 0x2B, which is decimal 43 (cf. lines 71-74).
The only difference between local/global declaration of the latchingFlag variable is how latchingFlag is accessed: the global variable version uses sts (store direct to data space) and lds (load direct from data space) to access latchingFlag, whereas the local variable version uses ldd (Load Indirect from Data Space to Register) and std (Store Indirect From Register to Data Space) using register Y as the address register (which can be used as a stack pointer, by avr-gcc AFAIK). Here are the relevant lines from the disassembly:
63 002c 8983 std Y+1,r24
65 002e 8981 ldd r24,Y+1
81 004a 8983 std Y+1,r24
The global version also has latchingFlag in the .bss section. I am really not sure what to attribute the different global vs. local variable behavior to. Here's the avr-gcc command line (notice -O0):
/usr/local/avr/bin/avr-gcc \
-I. -g -mmcu=atmega328p -O0 \
-fpack-struct \
-fshort-enums \
-funsigned-bitfields \
-funsigned-char \
-D CLOCK_SRC=8000000UL \
-D CLOCK_PRESCALE=8UL \
-D F_CPU="(CLOCK_SRC/CLOCK_PRESCALE)" \
-Wall \
-ffunction-sections \
-fdata-sections \
-fno-exceptions \
-Wa,-ahlms=obj/main.lst \
-Wno-uninitialized \
-c main.cpp -o obj/main.o
With the -Os compiler flag, the loop is gone from the disassembly, but it can be forced back by declaring latchingFlag volatile, in which case the unexpected behavior persists for me.
According to your disassembler listing, the latchingFlag global variable is located at RAM address 0. This address corresponds to the mirrored register r0 and is not a valid RAM address for a global variable.
After a couple of checks and code compares in EE chat, I noticed that my version of avr-gcc (4.7.0) stores the value of latchingFlag at 0x0100, whereas Egor Skriptunoff noted SRAM address 0 in the OP's assembly listing.
Looking at the OP's disassembly (the avr-objdump version), I noticed that the OP's compiler (4.6.2) stores the latchingFlag value at a different address (specifically, 0x060) than my compiler (version 4.7.0), which stores it at address 0x0100.
My advice is to update avr-gcc to at least version 4.7.0. The advantage of 4.7.0 over the latest and greatest available is the ability to compare the generated code against my findings.
Of course, if 4.7.0 solves the issue, then there is no harm in upgrading to a more recent version (if available).
Egor Skriptunoff's suggestion is almost exactly right: the SRAM variable is mapped to the wrong memory address. The latchingFlag variable is not at address 0x0100, which is the first valid SRAM address, but is mapped to 0x060, overlapping the WDTCSR register. This can be seen in disassembly lines like the following one:
lds r24, 0x0060
This line is supposed to load the value of latchingFlag from SRAM, and we can see that location 0x060 is used instead of 0x100.
The problem has to do with a bug in binutils which manifests when two conditions are met:
The linker is invoked with the --gc-sections flag (compiler option: -Wl,--gc-sections) to save code space
None of your SRAM variables are initialized to non-zero values
When both of these conditions are met, the .data section gets removed. When the .data section is missing, the SRAM variables start at address 0x060 instead of 0x100.
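A quick way to test this diagnosis without rebuilding anything (the variable name is made up): add one variable with a non-zero initializer to the existing program, so a .data section exists again and SRAM allocation starts at 0x0100:
/* hypothetical anchor variable: the non-zero initializer re-creates the
   .data section; "used" keeps --gc-sections from discarding it */
__attribute__((used)) volatile uint8_t dataAnchor = 1;
If the LED then stops blinking, the empty-.data condition above was indeed the trigger.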
One solution is to reinstall binutils: current versions have this bug fixed. Another solution is to edit your linker scripts: on Ubuntu these are probably in /usr/lib/ldscripts. For the ATmega168/328, the script that needs to be edited is avr5.x, but you should really edit all of them, otherwise you could run into this bug on other AVR platforms. The change that needs to be made is the following:
.data : AT (ADDR (.text) + SIZEOF (.text))
{
PROVIDE (__data_start = .) ;
- *(.data)
+ KEEP(*(.data))
So replace the line *(.data) with KEEP(*(.data)). This ensures that the .data section is not discarded, and consequently the SRAM variable addresses start at 0x0100.