Non-static member functions with no ref-qualifier

Non-static member functions with no ref-qualifier - c++11

This is a follow-up to my previous post
With reference to Non-static member functions
Under
const-, volatile-, and ref-qualified member functions
A non-static member function can be declared with no ref-qualifier,...
During overload resolution, non-static cv-qualified member function of
class X is treated as follows:
no ref-qualifier: the implicit object parameter has type lvalue
reference to cv-qualified X and is additionally allowed to bind rvalue
implied object argument
To explore this further, I experimented with the source code provided in the above link, as shown below:
#include <utility>
#include <iostream>
using std::move;
using std::cout;
using std::endl;
struct S {
void f() {cout << "no ref-qualifier: the implicit object parameter has type lvalue reference to cv-qualified S and is additionally allowed to bind rvalue implied object argument\n"; }
//if only the below method signature were to be enabled,
//the invocations using rvalue implicit object would fail
//to compile with the error [-fpermissive]
//void f() & {cout << "lvalue\n"; }
//if only the below method signature were to be enabled,
//the invocation using lvalue implicit object would fail
//to complile with the error [-fpermissive]
//void f() && {cout << "rvalue\n"; }
};
int main (void){
S s;
s.f(); // prints "lvalue"
move(s).f(); // prints "rvalue"
S().f(); // prints "rvalue"
return 0;
}
I have provided relevant comments above each non-static member function overload based on reference qualifier, highlighting the compilation issue that would ensue if only that particular overload were to be enabled, in view of the source code in main().
My question is, what is happening under the hoods in order for the non-ref qualified non-static member function to be able to be agnostic to the implicit type of the object upon which it has been invoked? Does the compiler step in with the appropriate overload?
Appreciate your thoughts.

Const- and ref-qualified member functions do not function differently from non-const/ref member functions. These markings are mostly there to stop programmers from misusing stuff.
No that object is const because you are not supposed to change it, so
you are not allowed to call a mutating function on it!
compiler yelling at programmer.
If you have a function void g() const, then it does not suddenly compile to different instructions because you remove the const from it - it just means that the compiler stops checking if you mutate this in the body.
Well, mostly... It gets a bit more complicated by the fact that, by upholding various invariants, these keywords also sometimes allow the programmer (or compiler) to do stuff that would not be allowed in the general case. For example, if I know you have given me an rvalue, then I am allowed to steal its innards instead of needing to copy them.
In any case, to illustrate, I ran your three examples through godbolt.
In the first case void f() compiled to:
S::f():
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
nop
leave
ret
You don't have to be able to read all that, just see that when I compiled the two other functions they produced this:
S::f() &:
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
nop
leave
ret
S::f() &&:
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov esi, OFFSET FLAT:.LC1
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
nop
leave
ret
Just so we are clear, except for loading different string literals (.LC0, .LC1), they are all three identical. And if we add -O3, then it all gets inlined into main as such:
No ref-qualifiers:
main:
sub rsp, 8
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
xor eax, eax
add rsp, 8
ret
_GLOBAL__sub_I_main:
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
With ref-qualifiers:
main:
sub rsp, 8
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
mov esi, OFFSET FLAT:.LC1
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
mov esi, OFFSET FLAT:.LC1
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
xor eax, eax
add rsp, 8
ret
_GLOBAL__sub_I_main:
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
Which again is identical (except for string literals)
In short: Nothing has to be done to make the non-ref-qualified function accept both arguments - it is the other way around: The compiler stops the ref-qualified functions from accepting arguments that they technically could but are not allowed to take (both to guard against programmers doing stuff they should not do, and in case the function was optimized to do something that would actually break on the general case).
On some level it is like the type system: On the machine level the computer is just moving around bits and bytes, and gives absolutely zero fucks about what type you think any given chunk represents. It is purely the compiler trying to hold you to account and make sure you uphold the invariants that you declared, but once the compiler is happy and emits some machine code then the types are long gone.
Or, since I like examples, you can say that it is like the difference between the VIP and the non-VIP door: The doors actually identical, but the guard (compiler) only lets you enter through the ones that (they think) you have permission to.

Related

Efficient type punning without undefined behavior

Say I'm working on a library called libModern. This library uses a legacy C library, called libLegacy, as an implementation strategy. libLegacy's interface looks like this:
typedef uint32_t LegacyFlags;
struct LegacyFoo {
uint32_t x;
uint32_t y;
LegacyFlags flags;
// more data
};
struct LegacyBar {
LegacyFoo foo;
float a;
// more data
};
void legacy_input(LegacyBar const* s); // Does something with s
void legacy_output(LegacyBar* s); // Stores data in s
libModern shouldn't expose libLegacy's types to its users for various reasons, among them:
libLegacy is an implementation detail that shouldn't be leaked. Future versions of libModern might chose to use another library instead of libLegacy.
libLegacy uses hard-to-use, easy-to-misuse types that shouldn't be part of any user-facing API.
The textbook way to deal with this situation is the pimpl idiom: libModern would provide a wrapper type that internally has a pointer to the legacy data. However, this is not possible here, since libModern cannot allocate dynamic memory. Generally, its goal is not to add a lot of overhead.
Therefore, libModern defines its own types that are layout-compatible with the legacy types, yet have a better interface. In this example it is using a strong enum instead of a plain uint32_t for flags:
enum class ModernFlags : std::uint32_t
{
first_flag = 0,
second_flag = 1,
};
struct ModernFoo {
std::uint32_t x;
std::uint32_t y;
ModernFlags flags;
// More data
};
struct ModernBar {
ModernFoo foo;
float a;
// more data
};
Now the question is: How can libModern convert between the legacy and the modern types without much overhead? I know of 3 options:
reinterpret_cast. This is undefined behavior, but in practice produces perfect assembly. I want to avoid this, since I cannot rely on this still working tomorrow or on another compiler.
std::memcpy. In simple cases this generates the same optimal assembly, but in any non-trivial case this adds significant overhead.
C++20's std::bit_cast. In my tests, at best it produces exactly the same code as memcpy. In some cases it's worse.
This is a comparison of the 3 ways to interface with libLegacy:
Interfacing with legacy_input()
Using reinterpret_cast:
void input_ub(ModernBar const& s) noexcept {
legacy_input(reinterpret_cast<LegacyBar const*>(&s));
}
Assembly:
input_ub(ModernBar const&):
jmp legacy_input
This is perfect codegen, but it invokes UB.
Using memcpy:
void input_memcpy(ModernBar const& s) noexcept {
LegacyBar ls;
std::memcpy(&ls, &s, sizeof(ls));
legacy_input(&ls);
}
Assembly:
input_memcpy(ModernBar const&):
sub rsp, 24
movdqu xmm0, XMMWORD PTR [rdi]
mov rdi, rsp
movaps XMMWORD PTR [rsp], xmm0
call legacy_input
add rsp, 24
ret
Significantly worse.
Using bit_cast:
void input_bit_cast(ModernBar const& s) noexcept {
LegacyBar ls = std::bit_cast<LegacyBar>(s);
legacy_input(&ls);
}
Assembly:
input_bit_cast(ModernBar const&):
sub rsp, 40
movdqu xmm0, XMMWORD PTR [rdi]
mov rdi, rsp
movaps XMMWORD PTR [rsp+16], xmm0
mov rax, QWORD PTR [rsp+16]
mov QWORD PTR [rsp], rax
mov rax, QWORD PTR [rsp+24]
mov QWORD PTR [rsp+8], rax
call legacy_input
add rsp, 40
ret
And I have no idea what's going on here.
Interfacing with legacy_output()
Using reinterpret_cast:
auto output_ub() noexcept -> ModernBar {
ModernBar s;
legacy_output(reinterpret_cast<LegacyBar*>(&s));
return s;
}
Assembly:
output_ub():
sub rsp, 56
lea rdi, [rsp+16]
call legacy_output
mov rax, QWORD PTR [rsp+16]
mov rdx, QWORD PTR [rsp+24]
add rsp, 56
ret
Using memcpy:
auto output_memcpy() noexcept -> ModernBar {
LegacyBar ls;
legacy_output(&ls);
ModernBar s;
std::memcpy(&s, &ls, sizeof(ls));
return s;
}
Assembly:
output_memcpy():
sub rsp, 56
lea rdi, [rsp+16]
call legacy_output
mov rax, QWORD PTR [rsp+16]
mov rdx, QWORD PTR [rsp+24]
add rsp, 56
ret
Using bit_cast:
auto output_bit_cast() noexcept -> ModernBar {
LegacyBar ls;
legacy_output(&ls);
return std::bit_cast<ModernBar>(ls);
}
Assembly:
output_bit_cast():
sub rsp, 72
lea rdi, [rsp+16]
call legacy_output
movdqa xmm0, XMMWORD PTR [rsp+16]
movaps XMMWORD PTR [rsp+48], xmm0
mov rax, QWORD PTR [rsp+48]
mov QWORD PTR [rsp+32], rax
mov rax, QWORD PTR [rsp+56]
mov QWORD PTR [rsp+40], rax
mov rax, QWORD PTR [rsp+32]
mov rdx, QWORD PTR [rsp+40]
add rsp, 72
ret
Here you can find the entire example on Compiler Explorer.
I also noted that the codegen varies significantly depending on the exact definition of the structs (i.e. order, amount & type of members). But the UB version of the code is consistently better or at least as good as the other two versions.
Now my questions are:
How come the codegen varies so dramatically? It makes me wonder if I'm missing something important.
Is there something I can do to guide the compiler to generate better code without invoking UB?
Are there other standard-conformant ways that generate better code?

In your compiler explorer link, Clang produces the same code for all output cases. I don't know what problem GCC has with std::bit_cast in that situation.
For the input case, the three functions cannot produce the same code, because they have different semantics.
With input_ub, the call to legacy_input may be modifying the caller's object. This cannot be the case in the other two versions. Therefore the compiler cannot optimize away the copies, not knowing how legacy_input behaves.
If you pass by-value to the input functions, then all three versions produce the same code at least with Clang in your compiler explorer link.
To reproduce the code generated by the original input_ub you need to keep passing the address of the caller's object to legacy_input.
If legacy_input is an extern C function, then I don't think the standards specify how the object models of the two languages are supposed to interact in this call. So, for the purpose of the language-lawyer tag, I will assume that legacy_input is an ordinary C++ function.
The problem in passing the address of &s directly is that there is generally no LegacyBar object at the same address that is pointer-interconvertible with the ModernBar object. So if legacy_input tries to access LegacyBar members through the pointer, that would be UB.
Theoretically you could create a LegacyBar object at the required address, reusing the object representation of the ModernBar object. However, since the caller presumably will expect there to still be a ModernBar object after the call, you then need to recreate a ModernBar object in the storage by the same procedure.
Unfortunately though, you are not always allowed to reuse storage in this way. For example if the passed reference refers to a const complete object, that would be UB, and there are other requirements. The problem is also whether the caller's references to the old object will refer to the new object, meaning whether the two ModernBar objects are transparently replaceable. This would also not always be the case.
So in general I don't think you can achieve the same code generation without undefined behavior if you don't put additional constraints on the references passed to the function.

Most non-MSVC compilers support an attribute called __may_alias__ that you can use
struct ModernFoo {
std::uint32_t x;
std::uint32_t y;
ModernFlags flags;
// More data
} __attribute__((__may_alias__));
struct ModernBar {
ModernFoo foo;
float a;
// more data
} __attribute__((__may_alias__));
Of course some optimizations can't be done when aliasing is allowed, so use it only if performance is acceptable
Godbolt link

Programs which would ever have any reason to access storage as multiple types should be processed using -fno-strict-aliasing or equivalent on any compiler that doesn't limit type-based aliasing assumptions around places where a pointer or lvalue of one type is converted to another, even if the program uses only corner-case behaviors mandated by the Standard. Using such a compiler flag will guarantee that one won't have type-based-aliasing problems, while jumping through hoops to use only standard-mandated corner cases won't. Both clang and gcc are sometimes prone to both:
have one phase of optimization change code whose behavior would be mandated by the Standard into code whose behavior isn't mandated by the Standard would be equivalent in the absence of further optimization, but then
have a later phase of optimization further transform the code in a manner that would have been allowable for the version of the code produced by #1 but not for the code as it was originally written.
If using -fno-strict-aliasing on straightforwardly-written source code yields machine code whose performance is acceptable, that's a safer approach than trying to jump through hoops to satisfy constraints that the Standard allows compilers to impose in cases where doing so would allow them to be more useful [or--for poor quality compilers--in cases where doing so would make them less useful].

You could create a union with a private member to restrict access to the legacy representation:
union UnionBar {
struct {
ModernFoo foo;
float a;
};
private:
LegacyBar legacy;
friend LegacyBar const* to_legacy_const(UnionBar const& s) noexcept;
friend LegacyBar* to_legacy(UnionBar& s) noexcept;
};
LegacyBar const* to_legacy_const(UnionBar const& s) noexcept {
return &s.legacy;
}
LegacyBar* to_legacy(UnionBar& s) noexcept {
return &s.legacy;
}
void input_union(UnionBar const& s) noexcept {
legacy_input(to_legacy_const(s));
}
auto output_union() noexcept -> UnionBar {
UnionBar s;
legacy_output(to_legacy(s));
return s;
}
The input/output functions are compiled to the same code as the reinterpret_cast-versions (using gcc/clang):
input_union(UnionBar const&):
jmp legacy_input
and
output_union():
sub rsp, 56
lea rdi, [rsp+16]
call legacy_output
mov rax, QWORD PTR [rsp+16]
mov rdx, QWORD PTR [rsp+24]
add rsp, 56
ret
Note that this uses anonymous structs and requires you to include the legacy implementation, which you mentioned you do not want. Also, I'm missing the experience to be fully confident that there's no hidden UB, so it would be great if someone else would comment on that :)

How to get the EXCEPTION_POINTERS in Windows 32bit x86 assembly?

So I have some basic asm file that looks like the one below(print_eax is removed as it's large and unrelated to the question) and have been using this http://www.godevtool.com/ExceptFrame.htm as a source of information. But I'm unsure of how to get the EXCEPTION_POINTERS. I've tried popping from the stack, and using different registers in case it was passed through those, as well as various offsets from each of these, but honestly I'm at a loss and have been unable to find a solution through google. How do I get the EXCEPTION_POINTERS when we enter FINAL_HANDLER?
global _main
extern _GetStdHandle#4
extern _WriteFile#20
extern _ExitProcess#4
extern _SetUnhandledExceptionFilter#4
section .text
_main:
push FINAL_HANDLER
CALL _SetUnhandledExceptionFilter#4
;---Protected code---
mov eax, dword [0xffffffff] ;Force C0000005h exception
; ExitProcess(0)
push 0
call _ExitProcess#4
; never here
hlt
FINAL_HANDLER:
; get EXCEPTION_POINTERS
;????
; get EXCEPTION_RECORD(dword [EXCEPTION_POINTERS+0])
mov eax, dword [EXCEPTION_POINTERS]
; get ExceptionCode(dword [EXCEPTION_RECORD+0])
mov eax, dword [eax]
call print_eax ; a simple procedure that outputs eax in hex
; ExitProcess(1)
push 1
call _ExitProcess#4

How to fix: (cannot have implicit far jump or call to near label) and (use a register assumed to ERROR)

I'm trying to create dll using VS 2017.
The dll will have one proc: symbol_count.
It asks to enter the string and then set symbol what is needed to count.
.def file
LIBRARY name
EXPORTS
symbol_count
Code:
.586
.model flat, stdcall
option casemap: none
include C:\masm32\include\windows.inc
include C:\masm32\include\user32.inc
include C:\masm32\include\msvcrt.inc
includelib C:\masm32\lib\msvcrt.lib
includelib C:\masm32\lib\user32.lib
.data
msg_string db 'Enter string: ', 0
msg_symbol db 'Enter symbol: ', 0
result db 'Count = %d', 0
str_modifier db '%s', 0
sym_modifier db '%c', 0
.data
string db ?
symbol db ?
DllEntry PROC hInstDLL:DWORD, reason:DWORD, reserved:DWORD
mov eax, 1
ret
DllEntry ENDP
symbol_count PROC
invoke crt_printf, OFFSET msg_string
invoke crt_scanf, OFFSET str_modifier, OFFSET string
invoke crt_printf, OFFSET msg_symbol
invoke crt_scanf, OFFSET sym_modifier, OFFSET symbol
xor esi, esi
xor ecx, ecx
mov ebx, OFFSET string
mov ecx, eax
mov al, symbol
loop1: <------------------------------------------ A2108
cmp byte ptr [ebx + ecx], 0
je endloop <------------------------------ A2107
cmp al, byte ptr [ebx + ecx]
jne next <-------------------------------- A2107
inc esi
next: <------------------------------------------- A2108
inc ecx
jmp loop1 <------------------------------- A2107
endloop: <---------------------------------------- A2108
invoke crt_printf, OFFSET result, esi
ret
symbol_count ENDP
End DllEntry
Here is the list of error messages, what a compiler gives to me: (
in the code, I marked the places where the compiler swears)
A2108 use of register assumed to ERROR
A2108 use of register assumed to ERROR
A2108 use of register assumed to ERROR
A2107 cannot have implicit far jump or call to near label
A2107 cannot have implicit far jump or call to near label
A2107 cannot have implicit far jump or call to near label
procedure argument or local not referenced : hInstDLL } all this points
procedure argument or local not referenced : reason } to DllEntry ENDP
procedure argument or local not referenced : reserved }

"You put your code into the .data section which may or may not cause some of the errors. The last 3 should just be warnings as you don't use the arguments." – #Jester

Input Buffer holds old input despite FlushConsoleInputBuffer MASM

I wrote a program that processes the array and I need to check the correctness of the input. If you enter along with numbers and letters, then the input command loops and does not allow you to enter data, I decided to do the cleaning of the buffer before each input, but the problem remained
S1:invoke WriteConsole, h_output, ADDR ComSizeMas, Len_ComSize, ADDR nWrite, 0
invoke FlushConsoleInputBuffer,h_input
invoke crt_scanf, ADDR format_size_buf, ADDR Size_buf
CMP Size_buf,1
JL S1
CMP Size_buf,100
JG S1

scanf[_s] read data to internal crt buffer. call FlushConsoleInputBuffer have no effect on it. instead we need call fflush on stdin stream
If the stream is open for input, fflush clears the contents of the
buffer
so in c/c++ code we need fflush(__iob_func()) call.
demo example c/c++
ULONG __cdecl GetNumberc()
{
ULONG n;
while (!scanf_s("%u", &n))
{
if (fflush(__iob_func())) break;
printf("invalid number\n");
}
return n;
}
for x86 asm
extern __imp__fflush : DWORD,
__imp____iob_func : DWORD,
__imp__scanf_s : DWORD,
__imp__printf : DWORD
.const
format_number DB "%u",0
invalid_number DB "invalid number",10,0
.code
_GetNumber proc
sub esp,4
##0:
push esp
push offset format_number
call __imp__scanf_s
add esp,8
test eax,eax
jnz ##1
call __imp____iob_func
push eax
call __imp__fflush
add esp,4
test eax,eax
jnz ##1
push offset invalid_number
call __imp__printf
add esp,4
jmp ##0
##1:
mov eax,[esp]
add esp,4
ret
_GetNumber endp
for x64 asm
extern __imp_fflush : QWORD,
__imp___iob_func : QWORD,
__imp_scanf_s : QWORD,
__imp_printf : QWORD
.const
format_number DB "%u",0
invalid_number DB "invalid number",10,0
.code
GetNumber proc
sub rsp,28h
##0:
lea rdx,[rsp+30h]
lea rcx,format_number
call __imp_scanf_s
test eax,eax
jnz ##1
call __imp___iob_func
mov rcx,rax
call __imp_fflush
test eax,eax
jnz ##1
lea rcx,invalid_number
call __imp_printf
jmp ##0
##1:
mov eax,[rsp+30h]
add rsp,28h
ret
GetNumber endp

Strange behaviour with a simple MASM32 program

I want to write a MASM program similar to the following C++ program :
#include <Windows.h>
#include <iostream>
typedef UINT (_stdcall *FuncPtr)(LPCSTR lpCmdLine, UINT uCmdShow);
int main(void)
{
HMODULE hDll = LoadLibrary(TEXT("Kernel32.dll"));
FuncPtr func_addr = reinterpret_cast<FuncPtr>(GetProcAddress(hDll, "WinExec"));
(*func_addr)("C:\\WINDOWS\\system32\\calc.exe", SW_SHOWDEFAULT);
FreeLibrary(hDll);
return (0);
}
As you can see, this code execute the microsoft calculator. I just want to do the same thing using MASM but the execution fails.
Here's the MASM source code :
.386
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
include \masm32\include\msvcrt.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\msvcrt.lib
.data
LpFileName db "kernel32.dll", 0
procName db "WinExec", 0
display db "addr_func = 0x%x", 0
.data?
hModule HMODULE ?
procAddr FARPROC ?
.code
start:
invoke LoadLibrary, offset LpFileName
mov hModule, eax
invoke GetProcAddress, hModule, ADDR procName
mov procAddr, eax
INVOKE crt_printf, ADDR display, procAddr
mov esi, procAddr
call esi
db "C:\WINDOWS\system32\calc.exe"
invoke FreeLibrary, hModule
invoke ExitProcess, NULL
end start
The crt_printf output is correct. The same address is printed like withe the C++ program. So the address passed to call is the same one. However the execution fails.
Here's a MASM32 code which works but this time the address of the function WinExec is hardcoded like this :
.386
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
.code
start:
jmp _Debut
_Final:
TCHAR 233
dword 42424242h
_Suite:
mov esi, 779e304eh
call esi
jmp _Final
_Debut:
xor eax, eax
push eax
call _Suite
db "C:\WINDOWS\system32\calc.exe"
end start
See the line mov esi, 779e304eh. But dynamically, there is a problem. If I disassemble the code just above we can see that the order of bytes is reversed.
8EEH047E379
Maybe it's not the case dynamically and maybe I need a keyword in the following line (between the comma and procAddr):
mov esi, procAddr
I cannot find the solution. I'm lost. Can anyone help me?
Thanks a lot in advance for your help.

The execution fails because you are not passing it's parameters.
Here you just call the function without any arguments or rather with invalid arguments (because whatever is currently on the stack will be taken and the stack is corrupted in the process).
mov esi, procAddr
call esi
You should do
push SW_SHOWDEFAULT
push offset YourPathToCalc
mov esi, procAddr
call esi
In your samplecode this is what is done here implicitly
xor eax, eax
push eax ; uCmdShow
call _Suite ; Returnadress is the address of the commandline so this is bascially the "push path"
Another thing you are missing is, that when WinExec returns, it will start executing the path in your case so you need a jmp somewhere after the call.
And as Gunner pointed out, the path must be 0 terminated.

To add to the correct answer Devolus posted, your path to calc is not NULL terminated.
This
"C:\WINDOWS\system32\calc.exe" is not correct!
Instead it should be:
"C:\WINDOWS\system32\calc.exe", 0
Also, if you are going to put strings in the code section to use, you need to give them a label in order to use them and you need to jump over them otherwise the CPU will try to execute the bytes.
INVOKE crt_printf, ADDR display, procAddr
mov esi, procAddr
push SW_SHOWDEFAULT
push offset Calc
call esi
jmp #F
Calc db "C:\WINDOWS\system32\calc.exe", 0
##:
invoke FreeLibrary, hModule
invoke ExitProcess, NULL
* EDIT *
To convert that code to Assembly using MASM, all that is needed is this:
.data
LpFileName db "kernel32", 0
procName db "WinExec", 0
Calc db "calc", 0
.code
start:
invoke LoadLibrary, offset LpFileName
push eax
invoke GetProcAddress, eax, ADDR procName
push SW_SHOWDEFAULT
push offset Calc
call eax
call FreeLibrary
invoke ExitProcess, NULL
end start

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio