I'm playing with some runtime function patching, but I have a problem with endianness when writing memory address values. Here is what I have:
char buf[] = "\xE9\xDE\xAD\xBE\xEF";
At runtime I have to fix up the 0xDEADBEEF to point to the actual address. Here is my function to do this:
void FixJMPAddress(BYTE *jump, BYTE *newRoutine) {
    DWORD address;
    DWORD *dwPtr;

    address = (DWORD)newRoutine;
    dwPtr = (DWORD *)&(jump[1]);
    *dwPtr = address;
}
It is invoked like this:
FixJMPAddress((BYTE *)buf, (BYTE *)&Something);
Unfortunately when disassembling the end result I get:
E9 60 DA 47 93
instead of
E9 93 47 DA 60
So this is due to the fact that x86 is little-endian, but is there a way I can cope with this automatically, without having to write a function that essentially reverses the byte order of the input?
This has nothing to do with little-endian. Your code assumes that the operand is stored in the same endianness as the architecture it runs on. That should be fine as long as your code runs on x86.
The real problem is that jmp uses a relative offset, not an absolute one.
To calculate the jmp destination:
dest = address_of_jmp + operand + sizeof(jmp_instruction)
For example, an E9 jmp located at 0x00401000 with operand 0x00000100 transfers control to 0x00401000 + 0x100 + 5 = 0x00401105.
Assuming BYTE* jump points to the actual instruction that will be executed, it should be:
LONG delta = address - (DWORD)jump - 5;
*(LONG*)(jump+1) = delta;
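Putting that together, a corrected version of the helper from the question might look like this (a sketch: it keeps the original BYTE/DWORD/LONG Windows typedefs and assumes 32-bit code, since a DWORD only holds a 32-bit pointer):
void FixJMPAddress(BYTE *jump, BYTE *newRoutine) {
    // E9 rel32: destination = address_of_jmp + 5 + rel32,
    // so rel32 = destination - address_of_jmp - 5.
    LONG delta = (LONG)((DWORD)newRoutine - (DWORD)jump - 5);
    *(LONG *)(jump + 1) = delta;   // byte 0 is the E9 opcode; the operand follows
}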
I am looking into SIGBUS on unaligned data access. I am tracking one of these errors and I would like to know how this happens on the Sitara AM335x. Can someone please give me example code that demonstrates this, or that reliably triggers it?
Adding code snippet:
int Read( void *value, uint32_t *size, const uint32_t baseAddress )
{
    uint8_t *userDataAddress = (uint8_t *)( baseAddress + sizeof( DBANode ));
    memcpy( value, userDataAddress, ourDataSize );
    *size = ourDataSize;
    return 0;
}
DBANode is a class object of 20 bytes.
baseAddress is an mmap of a shared-memory file (again of class type DBANode), cast to a uint32_t so that the pointer arithmetic can be done.
This is the disassembly of the section:
91a8: e51b3010 ldr r3, [fp, #-16]
91ac: e5933000 ldr r3, [r3]
91b0: e51b0014 ldr r0, [fp, #-20] ; 0xffffffec
91b4: e51b1008 ldr r1, [fp, #-8]
91b8: e1a02003 mov r2, r3
91bc: ebffe72b bl 2e70 <memcpy@plt>
91c0: e51b3010 ldr r3, [fp, #-16]
91c4: e5932000 ldr r2, [r3]
91c8: e51b3018 ldr r3, [fp, #-24] ; 0xffffffe8
91cc: e5832000 str r2, [r3]
00002e70 <memcpy@plt>:
2e70: e28fc600 add ip, pc, #0, 12
2e74: e28cca08 add ip, ip, #8, 20 ; 0x8000
2e78: e5bcf868 ldr pc, [ip, #2152]! ; 0x868
When the exact same code base was rebuilt, the problem just disappeared. Can GCC generate two different instruction sequences for the same code with the same -O0 optimization level specified?
I also diffed the objdumps of the library .so files from both compilations; they are exactly the same. The API is used quite often, but the crash only happens after prolonged use over a few days. I am reading the same node every 500 ms, so this is not consistent.
Should I be looking at pointer corruption?
Turns out baseAddress is the issue. As I mentioned, it is an mmap of a shared-memory location, and the mmap can fail. A failed mmap returns MAP_FAILED (-1), but the code was checking for NULL and proceeded to access -1, i.e. 0xFFFFFFFF, causing a SIGBUS.
Code 1 is seen when we use memcpy; trying any other access, like direct byte addressing, gives a SIGBUS with code 3.
I am still not sure why it triggers SIGBUS instead of SIGSEGV. Shouldn't this be a memory access violation instead?
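For anyone hitting the same thing: the check the original code was missing is against MAP_FAILED rather than NULL. A minimal sketch (the function name and parameters here are just illustrative); my reproduction example follows below.
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

// mmap() signals failure with MAP_FAILED ((void *)-1), not NULL, so a NULL
// check lets the caller carry on with an address of 0xFFFFFFFF.
void *MapNode(int shm_fd, size_t size)
{
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return NULL;          // or propagate the error however is appropriate
    }
    return base;
}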
Here is an example:
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

int main(int argc, char **argv)
{
    // Shared memory example
    const char *NAME = "SharedMemory";
    const int SIZE = 10 * sizeof(uint8_t);
    uint8_t src[] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0x00};
    int shm_fd = -1;

    shm_fd = shm_open(NAME, O_CREAT | O_RDONLY, 0666);
    ftruncate(shm_fd, SIZE);

    // Map shared memory segment to address space
    // (_NOCACHE is not a standard POSIX flag; it must be defined elsewhere)
    uint8_t *ptr = (uint8_t *) mmap(0, SIZE, PROT_READ | PROT_WRITE | _NOCACHE, MAP_SHARED, shm_fd, 0);
    if (ptr == MAP_FAILED)
    {
        std::cerr << "ERROR in mmap()" << std::endl;
        // return -1;
    }

    printf("ptr = %p\n", (void *)ptr);
    std::cout << "Now storing data to mmap() memory" << std::endl;

#if 0
    ptr[0] = 0x11;
    ptr[1] = 0x22;
    ptr[2] = 0x33;
    ptr[3] = 0x44;
    ptr[4] = 0x55;
    ptr[5] = 0x66;
    ptr[6] = 0x77;
    ptr[7] = 0x88;
    ptr[8] = 0x99;
    ptr[9] = 0x00;
#endif

    memcpy(ptr, src, SIZE); // causes SIGBUS, code 1
    shm_unlink(NAME);
}
I still do not know why mmap is failing on a shm even though I have 100 MB of RAM available, all my resource limits are set to unlimited, and over 400 file descriptors are still available out of the 1000-fd limit!
From the Cortex-A8 Technical Reference Manual:
The processor supports loads and stores of unaligned words and
halfwords. The processor makes the required number of memory accesses
and transfers adjacent bytes transparently.
Note: Data accesses that cross a word boundary can add to the access time.
Setting the A bit in the CP15 c1 Control Register enables alignment
checking. When the A bit is set to 1, two types of memory access
generate a Data Abort signal and an Alignment fault status code:
a 16-bit access that is not halfword-aligned
a 32-bit load or store that is not word-aligned
Alignment fault detection is a mandatory address-generation function
rather than an optionally supported function of external memory
management hardware. See the ARM Architecture Reference Manual for
more information on unaligned data access support.
From the ARM ARM, instructions which always generate an alignment fault if not aligned to the transfer size:
LDREX, STREX, LDREXD, STREXD, LDM, STM, LDRD, RFE, SRS, STRD, SWP, LDC, LDC2, STC, STC2, VLDM, VLDR, VPOP, VPUSH, VSTM, VSTR.
Also, most PUSH, POP and VLDx where :align: is specified.
Further,
In an implementation that includes the Virtualization Extensions, an
unaligned access to Device or Strongly-ordered memory always causes an
Alignment fault Data Abort exception
As in the linked question, structs are the most obvious way to cause 'intended' unaligned accesses, but any corruption of the stack pointer or another variable pointer would give the same result. Depending on how the core is configured, normal single word/halfword accesses may just be slow, or may trigger a fault.
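For illustration, a minimal sketch of the kind of access involved, here produced with a casted byte pointer rather than a struct (this is undefined behaviour in C/C++, which is rather the point):
#include <cstdint>

// Reads a 32-bit value through a pointer that is off by one byte. The
// compiled code performs an LDR from an unaligned address: depending on the
// CP15 A bit and the memory type, that is handled transparently, is merely
// slow, or raises a Data Abort (seen in userspace as SIGBUS).
uint32_t ReadMisaligned(const uint8_t *buffer)
{
    const uint32_t *p = reinterpret_cast<const uint32_t *>(buffer + 1);
    return *p;
}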
If you have access to the ETM trace, you would be able to identify the exact accesses. It seems that part has ETM/ETB (so no fancy trace capture device is required), but I've no idea how easy it will be to get tools to work with it.
As regards what code can trigger this, yes, even memcpy() could be a problem. Since the ARM instruction set has optimisations for transferring multiple registers (or register pairs in AArch64), the optimised library functions will prefer to 'stream' data rather than perform byte-by-byte loads and stores. Depending on the data structures and compilation target, it is perfectly possible to end up with an illegal LDM to an unaligned address.
I am doing some operating-system implementation work.
Here's the code first:
// generate software interrupt
void generate_interrupt(int n) {
    asm("mov al, byte ptr [n]");
    asm("mov byte ptr [genint+1], al");
    asm("jmp genint");
    asm("genint:");
    asm("int 0");
}
I am compiling the above code with the -masm=intel option in GCC. Also,
this is not the complete code to generate a software interrupt.
My problem is that I am getting an error saying n is undefined. How do I resolve it?
Also, the error is reported at link time, not at compile time.
When you are using GCC, you must use GCC-style extended asm to access variables declared in C, even if you are using Intel assembly syntax. The ability to write C variable names directly into an assembly insert is a feature of MSVC, which GCC does not copy.
For constructs like this, it is also important to use a single assembly insert, not several in a row; GCC can and will rearrange assembly inserts relative to the surrounding code, including relative to other assembly inserts, unless you take specific steps to prevent it.
This particular construct should be written
void generate_interrupt(unsigned char n)
{
    asm ("mov byte ptr [1f+1], %0\n\t"
         "jmp 1f\n"
         "1:\n\t"
         "int 0"
         : /* no outputs */ : "r" (n));
}
Note that I have removed the initial mov and any insistence on involving the A register, instead telling GCC to load n into any convenient register for me with the "r" input constraint. It is best to do as little as possible in an assembly insert, and to leave the choice of registers to the compiler as much as possible.
I have also changed the type of n to unsigned char to match the actual requirements of the INT instruction, and I am using the 1f local label syntax so that this works correctly if generate_interrupt is made an inline function.
Having said all that, I implore you to find an implementation strategy for your operating system that does not involve self-modifying code. Well, unless you plan to get a whole lot more use out of the self-modifications, anyway.
This isn't an answer to your specific question about passing parameters into inline assembly (see @zwol's answer). This addresses using self-modifying code unnecessarily for this particular task.
Macro Method if Interrupt Numbers are Known at Compile-time
An alternative to using self-modifying code is to create a C macro that generates the specific interrupt you want. One trick is that you need a macro that converts a number to a string. Stringize macros are quite common and documented in the GCC documentation.
You could create a macro GENERATE_INTERRUPT that looks like this:
#define STRINGIZE_INTERNAL(s) #s
#define STRINGIZE(s) STRINGIZE_INTERNAL(s)
#define GENERATE_INTERRUPT(n) asm ("int " STRINGIZE(n));
STRINGIZE will take a numeric value and convert it into a string. GENERATE_INTERRUPT simply takes the number, converts it to a string, and appends it to the end of the INT instruction.
You use it like this:
GENERATE_INTERRUPT(0);
GENERATE_INTERRUPT(3);
GENERATE_INTERRUPT(255);
The generated instructions should look like:
int 0x0
int3
int 0xff
Jump Table Method if Interrupt Numbers are Known Only at Run-time
If you need to call interrupts known only at run time, then one can create a table of interrupt calls (using the int instruction), each followed by a ret. generate_interrupt then simply retrieves the interrupt number off the stack, computes the position in the table where the specific int can be found, and jmps to it.
In the following code I get the GNU assembler to generate the table of 256 interrupt calls, each followed by a ret, using the .rept directive. Each code fragment fits in 4 bytes. The resulting code generation and the generate_interrupt function could look like this:
/* We use GNU assembly to create a table of interrupt calls followed by a ret
* using the .rept directive. 256 entries (0 to 255) are generated.
* generate_interrupt is a simple function that takes the interrupt number
* as a parameter, computes the offset in the interrupt table and jumps to it.
* The specific interrupt needed will be called, followed by a RET to return
* back from the function */
extern void generate_interrupt(unsigned char int_no);
asm (".pushsection .text\n\t"
/* Generate the table of interrupt calls */
".align 4\n"
"int_jmp_table:\n\t"
"intno=0\n\t"
".rept 256\n\t"
"\tint intno\n\t"
"\tret\n\t"
"\t.align 4\n\t"
"\tintno=intno+1\n\t"
".endr\n\t"
/* generate_interrupt function */
".global generate_interrupt\n" /* Give this function global visibility */
"generate_interrupt:\n\t"
#ifdef __x86_64__
"movzx edi, dil\n\t" /* Zero extend int_no (in DIL) across RDI */
"lea rax, int_jmp_table[rip]\n\t" /* Get base of interrupt jmp table */
"lea rax, [rax+rdi*4]\n\t" /* Add table base to offset = jmp address */
"jmp rax\n\t" /* Do sepcified interrupt */
#else
"movzx eax, byte ptr 4[esp]\n\t" /* Get Zero extend int_no (arg1 on stack) */
"lea eax, int_jmp_table[eax*4]\n\t" /* Compute jump address */
"jmp eax\n\t" /* Do specified interrupt */
#endif
".popsection");
int main()
{
    generate_interrupt (0);
    generate_interrupt (3);
    generate_interrupt (255);
}
If you were to look at the generated code in the object file you'd find the interrupt call table (int_jmp_table) looks similar to this:
00000000 <int_jmp_table>:
0: cd 00 int 0x0
2: c3 ret
3: 90 nop
4: cd 01 int 0x1
6: c3 ret
7: 90 nop
8: cd 02 int 0x2
a: c3 ret
b: 90 nop
c: cc int3
d: c3 ret
e: 66 90 xchg ax,ax
10: cd 04 int 0x4
12: c3 ret
13: 90 nop
...
[snip]
Because I used .align 4, each entry is padded out to 4 bytes. This makes the address calculation for the jmp easier: the entry for interrupt n is simply at int_jmp_table + 4*n.
I'm attempting to call a method from LLVM IR back to C++ code. I'm working in 64-bit Visual C++, or as LLVM describes it:
Machine CPU: skylake
Machine info: x86_64-pc-windows-msvc
For integer types and pointer types my code works fine as-is. However, floating-point numbers seem to be handled a bit strangely.
Basically the call looks like this:
struct SomeStruct
{
    static void Breakpoint(uint8_t* ptr) { return; } // used to set a breakpoint
    static double Set(uint8_t* ptr, double foo) { return foo * 2; }
};
and LLVM IR looks like this:
define i32 @main(i32, i8**) {
varinit:
  ; omitted here: initialize %instance0 from the i8**.
  %5 = load i8*, i8** %instance0
  ; call to some method. This works - I use it to set a breakpoint
  call void @"Helper::Breakpoint"(i8* %5)
  ; this call fails:
  call void @"Helper::Set"(i8* %5, double 0xC19EC46965A6494D)
  ret i32 0
}
declare double @"SomeStruct::Callback"(i8*, double)
I figured that the problem is probably in the way the calling conventions work. So I've attempted to make some adjustments to correct for that:
// during initialization of the function
auto function = llvm::Function::Create(functionType, llvm::Function::ExternalLinkage, name, module);
function->setCallingConv(llvm::CallingConv::X86_64_Win64);
...
// during calling of the function
call->setCallingConv(llvm::CallingConv::X86_64_Win64);
Unfortunately, no matter what I try, I end up with 'invalid instruction' errors, which this user reports to be an issue with calling conventions: Clang producing executable with illegal instruction. I've tried this with X86_64_Win64, Stdcall, Fastcall, and no calling-convention spec at all - all with the same result.
I've read up on https://msdn.microsoft.com/en-us/library/ms235286.aspx in an attempt to figure out what's going on. Then I looked at the assembly output that's supposed to be generated by LLVM (using the targetMachine->addPassesToEmitFile API call) and found:
movq (%rdx), %rsi
movq %rsi, %rcx
callq "Helper2<double>::Breakpoint"
vmovsd __real@c19ec46965a6494d(%rip), %xmm1
movq %rsi, %rcx
callq "Helper2<double>::Set"
xorl %eax, %eax
addq $32, %rsp
popq %rsi
According to MSDN, argument 2 should be in %xmm1 so that also seems correct. However, when checking if everything works in the debugger, Visual Studio reports a lot of question marks (e.g. 'illegal instruction').
Any feedback is appreciated.
The disassembly code:
00000144F2480007 48 B8 B6 48 B8 C8 FA 7F 00 00 mov rax,7FFAC8B848B6h
00000144F2480011 48 89 D1 mov rcx,rdx
00000144F2480014 48 89 54 24 20 mov qword ptr [rsp+20h],rdx
00000144F2480019 FF D0 call rax
00000144F248001B 48 B8 C0 48 B8 C8 FA 7F 00 00 mov rax,7FFAC8B848C0h
00000144F2480025 48 B9 00 00 47 F2 44 01 00 00 mov rcx,144F2470000h
00000144F248002F ?? ?? ??
00000144F2480030 ?? ?? ??
00000144F2480031 FF 08 dec dword ptr [rax]
00000144F2480033 10 09 adc byte ptr [rcx],cl
00000144F2480035 48 8B 4C 24 20 mov rcx,qword ptr [rsp+20h]
00000144F248003A FF D0 call rax
00000144F248003C 31 C0 xor eax,eax
00000144F248003E 48 83 C4 28 add rsp,28h
00000144F2480042 C3 ret
Some of the information about the memory is missing. Memory view:
0x00000144F248001B 48 b8 c0 48 b8 c8 fa 7f 00 00 48 b9 00 00 47 f2 44 01 00 00 62 f1 ff 08 10 09 48 8b 4c 24 20 ff d0 31 c0 48 83 c4 28 c3 00 00 00 00 00 ...
The question marks that are missing here are: '62 f1 '.
Some code is helpful to see how I get the JIT to compile, etc. I'm afraid it's a bit long, but it helps to get the idea... and I have no clue how to create a smaller piece of code.
// Note: FunctionBinderBase basically holds an llvm::Function* object
// which is bound using the above code and a name.
llvm::ExecutionEngine* Module::Compile(std::unordered_map<std::string, FunctionBinderBase*>& externalFunctions)
{
// DebugFlag = true;
#if (LLVMDEBUG >= 1)
this->module->dump();
#endif
// -- Initialize LLVM compiler: --
std::string error;
// Helper function, gets the current machine triplet.
llvm::Triple triple(MachineContextInfo::Triplet());
const llvm::Target *target = llvm::TargetRegistry::lookupTarget("x86-64", triple, error);
if (!target)
{
throw error.c_str();
}
llvm::TargetOptions Options;
// Options.PrintMachineCode = true;
// Options.EnableFastISel = true;
std::unique_ptr<llvm::TargetMachine> targetMachine(
target->createTargetMachine(MachineContextInfo::Triplet(), MachineContextInfo::CPU(), "", Options, llvm::Reloc::Default, llvm::CodeModel::Default, llvm::CodeGenOpt::Aggressive));
if (!targetMachine.get())
{
throw "Could not allocate target machine!";
}
// Create the target machine; set the module data layout to the correct values.
auto DL = targetMachine->createDataLayout();
module->setDataLayout(DL);
module->setTargetTriple(MachineContextInfo::Triplet());
// Pass manager builder:
llvm::PassManagerBuilder pmbuilder;
pmbuilder.OptLevel = 3;
pmbuilder.BBVectorize = false;
pmbuilder.SLPVectorize = true;
pmbuilder.LoopVectorize = true;
pmbuilder.Inliner = llvm::createFunctionInliningPass(3, 2);
llvm::TargetLibraryInfoImpl *TLI = new llvm::TargetLibraryInfoImpl(triple);
pmbuilder.LibraryInfo = TLI;
// Generate pass managers:
// 1. Function pass manager:
llvm::legacy::FunctionPassManager FPM(module.get());
pmbuilder.populateFunctionPassManager(FPM);
// 2. Module pass manager:
llvm::legacy::PassManager PM;
PM.add(llvm::createTargetTransformInfoWrapperPass(targetMachine->getTargetIRAnalysis()));
pmbuilder.populateModulePassManager(PM);
// 3. Execute passes:
// - Per-function passes:
FPM.doInitialization();
for (llvm::Module::iterator I = module->begin(), E = module->end(); I != E; ++I)
{
if (!I->isDeclaration())
{
FPM.run(*I);
}
}
FPM.doFinalization();
// - Per-module passes:
PM.run(*module);
// Fix function pointers; the PM.run will ruin them, this fixes that.
for (auto it : externalFunctions)
{
auto name = it.first;
auto fcn = module->getFunction(name);
it.second->function = fcn;
}
#if (LLVMDEBUG >= 2)
// -- ASSEMBLER dump code
// 3. Code generation pass manager:
llvm::legacy::PassManager CGP;
CGP.add(llvm::createTargetTransformInfoWrapperPass(targetMachine->getTargetIRAnalysis()));
pmbuilder.populateModulePassManager(CGP);
std::string result;
llvm::raw_string_ostream str(result);
llvm::buffer_ostream os(str);
targetMachine->addPassesToEmitFile(CGP, os, llvm::TargetMachine::CodeGenFileType::CGFT_AssemblyFile);
CGP.run(*module);
str.flush();
auto stringref = os.str();
std::string assembly(stringref.begin(), stringref.end());
std::cout << "ASM code: " << std::endl << "---------------------" << std::endl << assembly << std::endl << "---------------------" << std::endl;
// -- end of ASSEMBLER dump code.
for (auto it : externalFunctions)
{
auto name = it.first;
auto fcn = module->getFunction(name);
it.second->function = fcn;
}
#endif
#if (LLVMDEBUG >= 2)
module->dump();
#endif
// All done, *RUN*.
llvm::EngineBuilder engineBuilder(std::move(module));
engineBuilder.setEngineKind(llvm::EngineKind::JIT);
engineBuilder.setMCPU(MachineContextInfo::CPU());
engineBuilder.setMArch("x86-64");
engineBuilder.setUseOrcMCJITReplacement(false);
engineBuilder.setOptLevel(llvm::CodeGenOpt::None);
llvm::ExecutionEngine* engine = engineBuilder.create();
// Define external functions
for (auto it : externalFunctions)
{
auto fcn = it.second;
if (fcn->function)
{
engine->addGlobalMapping(fcn->function, const_cast<void*>(fcn->FunctionPointer())); // Yuck... LLVM only takes non-const pointers
}
}
// Finalize
engine->finalizeObject();
return engine;
}
Update (progress)
Apparently my Skylake has problems with the vmovsd instruction. When running the same code on a Haswell (server), the test succeeds. I've checked the assembly output on both - they are exactly the same.
Just to be sure: XSAVE/XRESTORE shouldn't be the problem on Win10-x64, but let's find out anyways. I've checked the features with the code from https://msdn.microsoft.com/en-us/library/hskdteyh.aspx and the XSAVE/XRESTORE from https://insufficientlycomplicated.wordpress.com/2011/11/07/detecting-intel-advanced-vector-extensions-avx-in-visual-studio/ . The latter runs just fine. As for the former, these are the results:
GenuineIntel
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
3DNOW not supported
3DNOWEXT not supported
ABM not supported
ADX supported
AES supported
AVX supported
AVX2 supported
AVX512CD not supported
AVX512ER not supported
AVX512F not supported
AVX512PF not supported
BMI1 supported
BMI2 supported
CLFSH supported
CMPXCHG16B supported
CX8 supported
ERMS supported
F16C supported
FMA supported
FSGSBASE supported
FXSR supported
HLE supported
INVPCID supported
LAHF supported
LZCNT supported
MMX supported
MMXEXT not supported
MONITOR supported
MOVBE supported
MSR supported
OSXSAVE supported
PCLMULQDQ supported
POPCNT supported
PREFETCHWT1 not supported
RDRAND supported
RDSEED supported
RDTSCP supported
RTM supported
SEP supported
SHA not supported
SSE supported
SSE2 supported
SSE3 supported
SSE4.1 supported
SSE4.2 supported
SSE4a not supported
SSSE3 supported
SYSCALL supported
TBM not supported
XOP not supported
XSAVE supported
It's weird, so I figured: why not simply emit the instruction directly.
#include <emmintrin.h>
#include <iostream>
#include <string>

int main()
{
    const double value = 1.2;
    const double value2 = 1.3;
    auto x1 = _mm_load_sd(&value);
    auto x2 = _mm_load_sd(&value2);
    std::string s;
    std::getline(std::cin, s);
}
This code runs fine. The disassembly:
auto x1 = _mm_load_sd(&value);
00007FF7C4833724 C5 FB 10 45 08 vmovsd xmm0,qword ptr [value]
auto x1 = _mm_load_sd(&value);
00007FF7C4833729 C5 F1 57 C9 vxorpd xmm1,xmm1,xmm1
00007FF7C483372D C5 F3 10 C0 vmovsd xmm0,xmm1,xmm0
Apparently it won't use register xmm1, but still proves that the instruction itself does the trick.
I just checked on another Intel Haswell what's going on here, and found this:
0000015077F20110 C5 FB 10 08 vmovsd xmm1,qword ptr [rax]
Apparently on the Intel Haswell it emits a different encoding of the instruction than on my Skylake.
@Ha. actually was kind enough to point me in the right direction here. Yes, the hidden bytes indeed indicate VMOVSD, but apparently it's encoded as EVEX. That's all well and good, but the EVEX prefix / encoding is introduced as part of AVX-512, which won't be supported until Skylake Purley in 2017. In other words, this is an invalid instruction on my CPU.
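For completeness, a minimal sketch of verifying from user code that the CPU really lacks AVX512F (and hence EVEX) support, using MSVC's __cpuidex intrinsic (leaf 7, EBX bit 16 is the AVX512F flag per Intel's CPUID documentation; the main() wrapper is just illustrative):
#include <intrin.h>
#include <cstdio>

int main()
{
    int regs[4];                                 // EAX, EBX, ECX, EDX
    __cpuidex(regs, 7, 0);                       // structured extended feature flags
    bool avx512f = (regs[1] & (1 << 16)) != 0;   // EBX bit 16 = AVX512F
    printf("AVX512F (EVEX) supported: %s\n", avx512f ? "yes" : "no");
}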
To check, I've put a breakpoint in X86MCCodeEmitter::EmitMemModRMByte. At some point, I do see a bool HasEVEX = [...] evaluating to true. This confirms that the codegen / emitter is producing the wrong output.
My conclusion is therefore that this has to be a bug in LLVM's target information for Skylake CPUs. That means there are only two things remaining to do: figure out where exactly this bug is in LLVM so we can fix it, and report the bug to the LLVM team...
So where is it in LLVM? That's tough to tell... x86.td.def defines the skylake features to include 'FeatureAVX512', which will probably set X86SSELevel to AVX512F. That in turn will give the wrong instructions. As a workaround, it's best to simply tell LLVM that we have an Intel Haswell instead, and all will be well:
// MCPU is used to call createTargetMachine
llvm::StringRef MCPU = llvm::sys::getHostCPUName();
if (MCPU.str() == "skylake")
{
    MCPU = llvm::StringRef("haswell");
}
Test, works.
I want to use inline assembly in Visual Studio to jump to a specific address. I tried this:
_asm {
    jmp 0x12345678
}
But the compiler says: "The opcode does not use operands of this type."
How can I do a direct jump?
As far as I understand it, MASM does not support this type of jump. You have a few options:
mov eax, 12345678h
jmp eax
or
push 12345678h
ret
The first uses a register; the second incurs a performance hit because it disturbs the CPU's CALL/RET pairing (return-address prediction) optimizations. You could also use a typed constant or a local variable, I think - this also consumes a few extra bytes. I don't think there's any other way, nor any direct, one-line means of performing a direct jump like this in MASM.
Caveat: this assumes you are working in x86 code. Your OP suggests as much from the size of the jmp argument, but if this is x64 then the answer will obviously be different.
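If you want the register variant wrapped up for a target only known at run time, here is a minimal sketch (32-bit MSVC inline assembly; JumpTo is just an illustrative name):
// Jump to an address computed at run time. MSVC's inline assembler can
// reference the C parameter by name. Note that control never returns, so
// this function's stack frame is simply abandoned.
void JumpTo(void *target)
{
    __asm {
        mov eax, target
        jmp eax
    }
}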
Try setting a variable to the address:
unsigned int var = 0x12345678;
_asm {
    jmp [var]
}
If you're using linker options /DYNAMICBASE:NO /FIXED /BASE:0x[your base address] for your module, you can use:
_asm
{
    jmp label1 + 0xFFFFFF
label1:
}
Which in disassembly compiles to:
_asm
{
    jmp label1 + 0xFFFFFF
040117DC E9 FF FF FF 00 jmp 050117E0
label1:
}
You can then play around to get an E9 jump to your destination address.
I was playing around a bit to get a better grip on calling conventions and how the stack is handled, but I can't figure out why main allocates three extra double words when setting up the stack (at <main+0>). It's neither aligned to 8 bytes nor 16 bytes, so that's not why as far as I know. As I see it, main requires 12 bytes for the two parameters to func and the return value.
What am I missing?
The program is C code compiled with "gcc -ggdb" on a x86 architecture.
Edit: I removed the -O0 flag from gcc, and it made no difference to the output.
(gdb) disas main
Dump of assembler code for function main:
0x080483d1 <+0>: sub esp,0x18
0x080483d4 <+3>: mov DWORD PTR [esp+0x4],0x7
0x080483dc <+11>: mov DWORD PTR [esp],0x3
0x080483e3 <+18>: call 0x80483b4 <func>
0x080483e8 <+23>: mov DWORD PTR [esp+0x14],eax
0x080483ec <+27>: add esp,0x18
0x080483ef <+30>: ret
End of assembler dump.
Edit: Of course I should have posted the C code:
int func(int a, int b) {
    int c = 9;
    return a + b + c;
}

void main() {
    int x;
    x = func(3, 7);
}
The platform is Arch Linux i686.
The parameters to a function (including, but not limited to main) are already on the stack when you enter the function. The space you allocate inside the function is for local variables. For functions with simple return types such as int, the return value will normally be in a register (eax, with a typical 32-bit compiler on x86).
If, for example, main was something like this:
int main(int argc, char **argv) {
    char a[35];
    return 0;
}
...we'd expect to see at least 35 bytes allocated on the stack as we entered main to make room for a. Assuming a 32-bit implementation, that would normally be rounded up to the next multiple of 4 (36, in this case) to maintain 32-bit alignment of the stack. We would not expect to see any space allocated for the return value. argc and argv would be on the stack, but they'd already be on the stack before main was entered, so main would not have to do anything to allocate space for them.
In the case above, after allocating space for a, a would typically start at [esp-36], argv would be at [esp-44] and argc would be at [esp-48] (or those two might be reversed -- depending on whether arguments were pushed left to right or right to left). In case you're wondering why I skipped [esp-40], that would be the return address.
Edit: Here's a diagram of the stack on entry to the function, and after setting up the stack frame:
Edit 2: Based on your updated question, what you have is slightly roundabout, but not particularly hard to understand. Upon entry to main, it's allocating space not only for the variables local to main, but also for the parameters you're passing to the function you call from main.
That accounts for at least some of the extra space being allocated (though not necessarily all of it).
It's alignment. I assumed for some reason that esp would be aligned from the start, which it clearly isn't.
gcc aligns stack frames to 16 bytes by default, which is what happened.