How does the kernel know the difference between "int 0x80" and "int x"? - linux-kernel

int 0x80 is a system call; 0x80 is also 128 written in hexadecimal.
Why does the kernel treat int 0x80 as an interrupt, while when I declare int x it knows it's just an integer named x, and vice versa?

You appear to be confused about the difference between C and assembly language. Both are programming languages, (nowadays) both accept the 0xNNNN notation for writing numbers in hexadecimal, and there's usually some way to embed tiny snippets of assembly language in a C program, but they are different languages. The keyword int means something completely different in C than it does in (x86) assembly language.
To a C compiler, int always and only means to declare something involving an integer, and there is no situation where you can immediately follow int with a numeric literal. int 0x80 (or int 128, or int 23, or anything else of the sort) is always a syntax error in C.
To an x86 assembler, int always and only means to generate machine code for the INTerrupt instruction, and a valid operand for that instruction (an "imm8", i.e. a number in the range 0–255) must be the next thing on the line. int x; is a syntax error in x86 assembly language, unless x has been defined as a constant in the appropriate range using the assembler's macro facilities.
Obvious follow-up question: If a C compiler doesn't recognize int as the INTerrupt instruction, how does a C program (compiled for x86) make system calls? There are four complementary answers to this question:
Most of the time, in a C program, you do not make system calls directly. Instead, you call functions in the C library that do it for you. When processing your program, as far as the C compiler knows, open (for instance) is no different than any other external function. So it doesn't need to generate an int instruction. It just does call open.
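To make that concrete, here is a minimal Linux/glibc sketch in which the same kernel service (getpid) is reached twice from ordinary C, once through the usual wrapper and once through glibc's generic syscall() function. Neither call site contains an int or syscall instruction; the C library supplies it. The helper name pid_same_both_ways is hypothetical.

```c
#define _GNU_SOURCE          /* glibc: make sure syscall() is declared by <unistd.h> */
#include <sys/syscall.h>     /* SYS_getpid */
#include <sys/types.h>       /* pid_t */
#include <unistd.h>          /* getpid(), syscall() */

/* Hypothetical helper: ask the kernel for our PID two ways, both via plain C calls. */
int pid_same_both_ways(void)
{
    pid_t via_wrapper = getpid();              /* ordinary libc wrapper function */
    long  via_generic = syscall(SYS_getpid);   /* glibc's "make system call N" function */
    return (long)via_wrapper == via_generic;
}
```

As far as the C compiler is concerned, getpid and syscall are just external functions, exactly like open in the paragraph above.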
But the C library is just more C that someone else wrote for you, isn't it? Yet, if you disassemble the implementation of open, you will indeed see an int instruction (or maybe syscall or sysenter instead). How did the people who wrote the C library do that? They wrote that function in assembly language, not in C. Or they used that technique for embedding snippets of assembly language in a C program, which brings us to ...
How does that work? Doesn't that mean the C compiler does need to understand int as an assembly mnemonic sometimes? Not necessarily. Let's look at the GCC syntax for inserting assembly—this could be an implementation of open for x86/32/Linux:
int open(const char *path, int flags, mode_t mode)
{
    int ret;
    /* i386 Linux int 0x80 ABI: syscall number in EAX, arguments in EBX, ECX, EDX */
    asm volatile ("int $0x80"
        : "=a" (ret)
        : "0" (SYS_open), "b" (path), "c" (flags), "d" (mode)
        : "memory");    /* the kernel reads the path string */
    if (ret >= 0) return ret;
    return __set_errno(ret);    /* assumed libc-internal helper: sets errno = -ret, returns -1 */
}
You don't need to understand the bulk of that: the important thing for the purpose of this question is, yes, it says int $0x80 (the $ marks an immediate operand in the AT&T syntax GCC uses by default), but it says it inside a string literal. The compiler will copy the contents of that string literal, verbatim, into the generated assembly-language file that it will then feed to the assembler. It doesn't need to know what it means. That's the assembler's job.
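For comparison, a minimal sketch of the same idea on x86-64 Linux, where the syscall instruction is used instead of the int 0x80 software interrupt. It assumes GCC or Clang (GNU C asm syntax); raw_getpid is a hypothetical name, and on any other target the sketch simply falls back to the libc getpid() wrapper.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid() for the fallback path */

/* Hypothetical helper: issue the getpid system call directly on x86-64 Linux. */
long raw_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* x86-64 Linux ABI: syscall number in RAX, result returned in RAX.
       The syscall instruction itself overwrites RCX and R11. */
    __asm__ volatile ("syscall"            /* only a string to the C compiler */
                      : "=a" (ret)
                      : "0" ((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#else
    return (long)getpid();                 /* other targets: use the libc wrapper */
#endif
}
```

Again, the compiler never interprets "syscall"; it pastes the string into its assembly output and the assembler turns it into machine code.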
More generally, there are lots of words that mean one thing in C and a completely different thing in assembly language. A C compiler produces assembly language, so it has to "know" both of the meanings of those words, right? It does, but it does not confuse them, because they are always used in separate contexts. The fact that "add" is an assembly mnemonic the C compiler knows how to use does not mean that there is any problem with naming a variable "add" in a C program, even if the "add" instruction gets used in that program.
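A small sketch of that separation from the C side: every appearance of "int" and "add" below is ordinary C, whatever those words mean to an x86 assembler. The function name keyword_demo is hypothetical.

```c
#include <string.h>   /* strlen */

/* Returns 1 if each line below means what the comments say it means in C. */
int keyword_demo(void)
{
    int x = 0x80;                     /* `int` is a type name; 0x80 is hexadecimal for 128 */
    int add = x + 1;                  /* `add` is just an identifier, x86 mnemonic or not */
    const char *text = "int $0x80";   /* inside a string literal these are only characters */
    return x == 128 && add == 129 && strlen(text) == 9;
}
```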

Related

GCC and cast-as-lvalue : what if I do want to cast as Lvalue?

To avoid declaring a flood of one-time-use variables, I like to use as few variables as possible and recast them like so:
int main()
{
int i1;
#define cChar ((char)i1)
cChar='a';
#undef cChar
}
Which gives me the famous "error : lvalue required as left operand of assignment".
After reading a bit about this issue on this forum, I learned that cast-as-lvalue was removed in GCC 4.0 (https://gcc.gnu.org/gcc-4.0/changes.html).
However, I fail to understand the reason behind this, and I was wondering whether there is an option (apparently not) or even an alternative to GCC that would accept this kind of operation, which has worked for ages on ye olde compilers (obviously not GCC, but something like Borland C++).
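As far as I recall from the older GCC manuals (the "Generalized Lvalues" extension), an assignment to a cast such as (type)lv = val converted val to type, then to the type of lv, stored it, and yielded the stored value converted back to type. Assuming that is the behaviour you relied on, it can be spelled out in standard C; ASSIGN_AS and assign_as_demo are hypothetical names, not part of any library.

```c
/* Hypothetical macro: standard-C spelling of the old `(type)lvalue = value`.
   value is converted to `type`, stored into lvalue (converting again to lvalue's
   own type), and the expression yields the stored value viewed as `type`. */
#define ASSIGN_AS(type, lvalue, value) ((type)((lvalue) = (type)(value)))

int assign_as_demo(void)
{
    int i1 = 0;
    (void)ASSIGN_AS(char, i1, 'a');                 /* same effect as i1 = (char)'a' */
    int ok = (i1 == 'a');
    int r = ASSIGN_AS(unsigned char, i1, 0x141);    /* truncated to unsigned char: 0x41 */
    return ok && i1 == 0x41 && r == 0x41;
}
```

With this, the cChar macro from the question would become something like #define SET_CCHAR(v) ASSIGN_AS(char, i1, v), still without declaring an extra variable.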

Thread Declaration in C++

I was going through a project's C code, and I saw this thread declaration:
pthread_t ui_thread = (pthread_t) 0;
I didn't understand the part starting from the = operator. What is it, and how can I write the same declaration in C++?
(pthread_t) 0 converts the literal integer value 0 to a thread handle pthread_t. This assumes that such a conversion is possible, and valid, and that this is a meaningful value (probably expected to be "no thread").
The full statement creates a variable ui_thread which is a thread handle of type pthread_t, and initializes it with this value.
In C++, you could probably write the same if you were on a platform where it was valid for C. However, you would be better off using the C++ thread library.
std::thread t;
will create a default-constructed thread handle with no associated thread, which is likely the equivalent to the above.
The (pthread_t) part is known as type casting in C. Also called explicit type conversion. It is just a way for the programmer to inform the compiler that the programmer means for the value (0 in this case) to be treated as the type pthread_t.
The code you have is still valid C++.
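In C++11 you can simply value-initialize it, which gives a zero value whether pthread_t is an integer, pointer, or struct type (pthread_t ui_thread{nullptr}; would not compile where pthread_t is an integer, e.g. unsigned long on glibc):
pthread_t ui_thread{};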
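If you stay in C with POSIX threads, note that POSIX does not specify any pthread_t value that means "no thread", so (pthread_t) 0 is only a platform-specific convention. A minimal sketch of a portable alternative keeps an explicit flag next to the handle; struct thread_slot, worker and start_and_join are hypothetical names, and older toolchains may need -pthread when linking.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>    /* NULL */

/* Hypothetical wrapper: track "is there a thread?" separately from the handle. */
struct thread_slot {
    pthread_t handle;   /* meaningful only while valid is true */
    bool      valid;
};

static void *worker(void *arg)
{
    *(int *)arg = 42;   /* hand a result back through the argument */
    return NULL;
}

/* Start one worker, join it, and return what it wrote (0 if creation failed). */
int start_and_join(void)
{
    struct thread_slot slot = { .valid = false };   /* "no thread yet" */
    int result = 0;

    if (pthread_create(&slot.handle, NULL, worker, &result) == 0)
        slot.valid = true;

    if (slot.valid) {
        pthread_join(slot.handle, NULL);
        slot.valid = false;   /* a joined handle must not be used again */
    }
    return result;
}
```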

GCC inline assembly read value from array

While learning gcc inline assembly I was playing a bit with memory access. I'm trying to read a value from an array using a value from a different array as index.
Both arrays are initialized to something.
Initialization:
uint8_t* index = (uint8_t*)malloc(256);
memset(index, 33, 256);
uint8_t* data = (uint8_t*)malloc(256);
memset(data, 44, 256);
Array access:
unsigned char read(void *index,void *data) {
unsigned char value;
asm __volatile__ (
" movzb (%1), %%edx\n"
" movzb (%2, %%edx), %%eax\n"
: "=r" (value)
: "c" (index), "c" (data)
: "%eax", "%edx");
return value;
}
This is how I use the function:
unsigned char value = read(index, data);
Now I would expect it to return 44. But it actually returns some random value. Am I reading from uninitialized memory? Also, I'm not sure how to tell the compiler that it should assign the value from eax to the variable value.
You told the compiler you were going to put the output in %0, and it could pick any register for that "=r". But instead you never write %0 in your template.
And you use two temporaries for no apparent reason when you could have used %0 as the temporary.
As usual, you can debug your inline asm by adding comments like # 0 = %0 and looking at the compiler's asm output. (Not disassembly, just gcc -S to see what it fills in, e.g. # 0 = %ecx. You didn't use an early-clobber "=&r", so it can pick the same register as one of the inputs.)
Also, this has 2 other bugs:
It doesn't compile. Requesting 2 different operands in ECX with "c" constraints can't work unless the compiler can prove at compile-time that they have the same value, so that %1 and %2 can be the same register. https://godbolt.org/z/LgR4xS
You dereference pointer inputs without telling the compiler you're reading the pointed-to memory. Use a "memory" clobber or dummy memory operands; see "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?"
Or better https://gcc.gnu.org/wiki/DontUseInlineAsm because it's useless for this; just let GCC emit the movzb loads itself. unsigned char* is safe from strict-aliasing UB so you can safely cast any pointer to unsigned char* and dereference it, without even having to use memcpy or other hacks to fight against language rules for wider unaligned or type-punned accesses.
But if you insist on inline asm, read manuals and tutorials, links at https://stackoverflow.com/tags/inline-assembly/info. You can't just throw code at the wall until it sticks with inline asm: you must understand why your code is safe to have any hope of it being safe. There are many ways for inline asm to happen to work but actually be broken, or be waiting to break with different surrounding code.
This is a safe and not totally terrible version (other than the unavoidable optimization-defeating parts of inline asm). You do still want a movzbl load for both loads, even though the return value is only 8 bits. movzbl is the natural efficient way to load a byte, replacing instead of merging with the old contents of a full register.
unsigned char read(void *index, void *data)
{
uintptr_t value;
asm (
" movzb (%[idx]), %k[out] \n\t"
" movzb (%[arr], %[out]), %k[out]\n"
: [out] "=&r" (value) // early-clobber output
: [idx] "r" (index), [arr] "r" (data)
: "memory" // we deref some inputs as pointers
);
return value;
}
Note the early-clobber on the output: this stops gcc from picking the same register for output as one of the inputs. It would be safe for it to destroy the [idx] register with the first load, but I don't know how to tell GCC that in one asm statement. You could split your asm statement into two separate ones, each with their own input and output operands, connecting the output of the first to the input of the 2nd via a local variable. Then neither one would need early-clobber because they're just wrapping single instructions like GNU C inline asm syntax is designed to do nicely.
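A minimal sketch of that two-statement split, assuming GNU C on x86 or x86-64 (other targets fall back to plain C) and assuming data points to at least 256 bytes; read_split is a hypothetical name. Each statement wraps a single instruction, which reads its inputs before writing its output, so no early-clobber is needed. It also uses an "m" operand for the index byte, which lets the compiler choose the addressing mode, and a dummy "m" operand so the compiler knows the second load reads data.

```c
#include <stdint.h>   /* uintptr_t */

static inline unsigned char read_split(const unsigned char *index, const unsigned char *data)
{
#if defined(__x86_64__) || defined(__i386__)
    uintptr_t idx, value;
    /* Statement 1: zero-extending byte load of *index into a full register. */
    __asm__ ("movzbl %[src], %k[dst]"
             : [dst] "=r" (idx)
             : [src] "m" (*index));
    /* Statement 2: indexed byte load. The unused "m" operand tells the compiler
       that up to 256 bytes starting at data are read. */
    __asm__ ("movzbl (%[arr], %[i]), %k[dst]"
             : [dst] "=r" (value)
             : [arr] "r" (data), [i] "r" (idx),
               "m" (*(const unsigned char (*)[256])data));
    return (unsigned char)value;
#else
    return data[*index];
#endif
}
```

The local variable idx is the connection between the two statements; the compiler is free to keep it in any register.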
See Godbolt, with a test caller, for how it inlines and optimizes when called twice with i386 clang and x86-64 gcc. For example, asking for index in a register forces an LEA, instead of letting the compiler see the deref and pick an addressing mode for *index. Also note the extra movzbl %al, %eax the compiler emits when adding the result to an unsigned sum, because we used a narrow return type.
I used uintptr_t value so this can compile for 32-bit and 64-bit x86. There's no harm in making the output from the asm statement wider than the return value of the function, and that saves us from having to use size modifiers like movzbl (%1), %k0 to get GCC to print the 32-bit register name (like EAX) if it chose AL for an 8-bit output variable, for example.
I did decide to actually use %k[out] for the benefit of 64-bit mode: we want movzbl (%rdi), %eax, not movzb (%rdi), %rax (wasting a REX prefix).
You might as well declare the function to return unsigned int or uintptr_t, though, so the compiler knows that it doesn't have to redo zero-extension. OTOH sometimes it can help the compiler to know that the value-range is only 0..255. You could tell it that you produce a correctly zero-extended value using if(retval>255) __builtin_unreachable() or something. Or you could just not use inline asm.
You don't need asm volatile. (Assuming you want to let it optimize away if the result is unused, or be hoisted out of loops for constant inputs). You only need a "memory" clobber so if it does get used, the compiler knows that it reads memory.
(A "memory" clobber counts as all memory being an input, and all memory being an output. So it can't CSE, e.g. hoist out of a loop, because as far as the compiler knows one invocation might read something a previous one wrote. So in practice a "memory" clobber is about as bad as asm volatile. Even two back-to-back calls to this function without touching the input array force the compiler to emit the instructions twice.)
You could avoid this with dummy memory-input operands so the compiler knows this asm block doesn't modify memory, only read it. But if you actually care about efficiency, you shouldn't be using inline asm for this.
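A minimal sketch of that variant of the single-statement version, assuming GNU C on x86 or x86-64 (plain C elsewhere) and a 256-byte data array; read_dummy_mem is a hypothetical name. The "memory" clobber is replaced by dummy "m" inputs that describe exactly what is read, so the compiler may treat two back-to-back calls with unchanged memory as the same value, while a store into data between calls still forces the asm to run again.

```c
#include <stdint.h>   /* uintptr_t */

static inline unsigned char read_dummy_mem(const unsigned char *index, const unsigned char *data)
{
#if defined(__x86_64__) || defined(__i386__)
    uintptr_t value;
    __asm__ (
        "movzbl (%[idx]), %k[out]\n\t"
        "movzbl (%[arr], %[out]), %k[out]"
        : [out] "=&r" (value)                          /* early-clobber, as before */
        : [idx] "r" (index), [arr] "r" (data),
          "m" (*index),                                /* dummy: reads 1 byte at index */
          "m" (*(const unsigned char (*)[256])data)    /* dummy: may read data[0..255] */
    );                                                 /* no "memory" clobber */
    return (unsigned char)value;
#else
    return data[*index];
#endif
}
```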
But like I said there is zero reason to use inline asm:
This will do exactly the same thing in 100% portable and safe ISO C:
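// safe from strict-aliasing violations
// because unsigned char* can alias anything
// static inline: a plain C99 `inline` definition provides no external definition;
// note the name also clashes with POSIX read() if <unistd.h> is included
static inline
unsigned char read(void *index, void *data) {
    unsigned idx = *(unsigned char*)index;
    unsigned char * dp = data;
    return dp[idx];
}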
You could cast one or both pointers to volatile unsigned char* if you insist on the access happening every time and not being optimized away.
Or maybe even to atomic<unsigned char> * depending on what you're doing. (That's a hack, prefer C++20 atomic_ref to atomically load/store on objects that are normally not atomic.)
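The volatile option is a one-line change to the portable version. A minimal sketch; read_volatile is a hypothetical name chosen to avoid the clash with POSIX read().

```c
/* Each load goes through a pointer to volatile unsigned char, so the compiler
   must emit both loads on every call; unsigned char still aliases anything. */
static inline unsigned char read_volatile(const void *index, const void *data)
{
    unsigned idx = *(const volatile unsigned char *)index;  /* load of the index byte */
    const volatile unsigned char *dp = data;                /* indexed load below */
    return dp[idx];
}
```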

Return Code Best Practices in Windows apps [duplicate]

What is the correct (most efficient) way to define the main() function in C and C++ — int main() or void main() — and why? And how about the arguments?
If int main() then return 1 or return 0?
There are numerous duplicates of this question, including:
What are the valid signatures for C's main() function?
The return type of main() function
Difference between void main() and int main()?
main()'s signature in C++
What is the proper declaration of main()? — For C++, with a very good answer indeed.
Styles of main() functions in C
Return type of main() method in C
int main() vs void main() in C
Related:
C++ — int main(int argc, char **argv)
C++ — int main(int argc, char *argv[])
Is char *envp[] as a third argument to main() portable?
Must the int main() function return a value in all compilers?
Why is the type of the main() function in C and C++ left to the user to define?
Why does int main(){} compile?
Legal definitions of main() in C++14?
The return value for main indicates how the program exited. Normal exit is represented by a 0 return value from main. Abnormal exit is signaled by a non-zero return, but there is no standard for how non-zero codes are interpreted. As noted by others, void main() is prohibited by the C++ standard and should not be used. The valid C++ main signatures are:
int main(void)
and
int main(int argc, char **argv)
which is equivalent to
int main(int argc, char *argv[])
It is also worth noting that in C++, int main() can be left without a return-statement, at which point it defaults to returning 0. This is also true with a C99 program. Whether return 0; should be omitted or not is open to debate. The range of valid C program main signatures is much greater.
Efficiency is not an issue with the main function. It can only be entered and left once (marking the program's start and termination) according to the C++ standard. For C, re-entering main() is allowed, but should be avoided.
The accepted answer appears to be targeted for C++, so I thought I'd add an answer that pertains to C, and this differs in a few ways. There were also some changes made between ISO/IEC 9899:1989 (C90) and ISO/IEC 9899:1999 (C99).
main() should be declared as either:
int main(void)
int main(int argc, char **argv)
Or equivalent. For example, int main(int argc, char *argv[]) is equivalent to the second one. In C90, the int return type can be omitted as it is a default, but in C99 and newer, the int return type may not be omitted.
If an implementation permits it, main() can be declared in other ways (e.g., int main(int argc, char *argv[], char *envp[])), but this makes the program implementation defined, and no longer strictly conforming.
The standard defines 3 values for returning that are strictly conforming (that is, that do not rely on implementation-defined behaviour): 0 and EXIT_SUCCESS for a successful termination, and EXIT_FAILURE for an unsuccessful termination. Any other values are non-standard and implementation-defined. In C90, main() must have an explicit return statement at the end to avoid undefined behaviour. In C99 and newer, you may omit the return statement from main(). If you do, and main() finishes, there is an implicit return 0.
Finally, there is nothing wrong from a standards point of view with calling main() recursively from a C program.
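A common way to stick to those three status values, sketched here with a hypothetical run() function, is to keep main() trivial and let it return whatever run() decides: int main(int argc, char *argv[]) { return run(argc, argv); }. The "exactly one argument" rule is only an illustrative assumption.

```c
#include <stdio.h>    /* fprintf, printf */
#include <stdlib.h>   /* EXIT_SUCCESS, EXIT_FAILURE */

/* Hypothetical program body: succeeds only when exactly one argument is given. */
int run(int argc, char *argv[])
{
    if (argc != 2) {
        /* argv[0] may be "" if the program name is unavailable, so guard argc too */
        fprintf(stderr, "usage: %s NAME\n", argc > 0 ? argv[0] : "prog");
        return EXIT_FAILURE;   /* portable "unsuccessful termination" */
    }
    printf("hello, %s\n", argv[1]);
    return EXIT_SUCCESS;       /* portable "successful termination", like 0 */
}
```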
Standard C — Hosted Environment
For a hosted environment (that's the normal one), the C11 standard (ISO/IEC 9899:2011) says:
5.1.2.2.1 Program startup
The function called at program startup is named main. The implementation declares no
prototype for this function. It shall be defined with a return type of int and with no
parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be
used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent;10) or in some other implementation-defined manner.
If they are declared, the parameters to the main function shall obey the following
constraints:
The value of argc shall be nonnegative.
argv[argc] shall be a null pointer.
If the value of argc is greater than zero, the array members argv[0] through
argv[argc-1] inclusive shall contain pointers to strings, which are given
implementation-defined values by the host environment prior to program startup. The
intent is to supply to the program information determined prior to program startup
from elsewhere in the hosted environment. If the host environment is not capable of
supplying strings with letters in both uppercase and lowercase, the implementation
shall ensure that the strings are received in lowercase.
If the value of argc is greater than zero, the string pointed to by argv[0]
represents the program name; argv[0][0] shall be the null character if the
program name is not available from the host environment. If the value of argc is
greater than one, the strings pointed to by argv[1] through argv[argc-1]
represent the program parameters.
The parameters argc and argv and the strings pointed to by the argv array shall
be modifiable by the program, and retain their last-stored values between program
startup and program termination.
10) Thus, int can be replaced by a typedef name defined as int, or the type of argv can be written as
char **argv, and so on.
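Because argv[argc] is guaranteed to be a null pointer, a program can also walk argv without consulting argc. A small sketch; count_args is a hypothetical name.

```c
#include <stddef.h>   /* NULL */

/* Walk argv up to the terminating null pointer. For the argv that the
   environment passes to main, the result equals argc. */
int count_args(char *argv[])
{
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```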
Program termination in C99 or C11
The value returned from main() is transmitted to the 'environment' in an implementation-defined way.
5.1.2.2.3 Program termination
1 If the return type of the main function is a type compatible with int, a return from the
initial call to the main function is equivalent to calling the exit function with the value
returned by the main function as its argument;11) reaching the } that terminates the
main function returns a value of 0. If the return type is not compatible with int, the
termination status returned to the host environment is unspecified.
11) In accordance with 6.2.4, the lifetimes of objects with automatic storage duration declared in main
will have ended in the former case, even where they would not have in the latter.
Note that 0 is mandated as 'success'. You can use EXIT_FAILURE and EXIT_SUCCESS from <stdlib.h> if you prefer, but 0 is well established, and so is 1. See also Exit codes greater than 255 — possible?.
In C89 (and hence in Microsoft C), there is no statement about what happens if the main() function returns but does not specify a return value; it therefore leads to undefined behaviour.
7.22.4.4 The exit function
¶5 Finally, control is returned to the host environment. If the value of status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned. If the value of status is EXIT_FAILURE, an implementation-defined form of the status unsuccessful termination is returned. Otherwise the status returned is implementation-defined.
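To see one implementation-defined mapping in practice, the following POSIX-only sketch forks a child that exits with a given status and reports what the parent observes through waitpid(). observed_exit_status is a hypothetical name. POSIX specifies that WEXITSTATUS yields only the low-order 8 bits of the value passed to _exit() or exit(), which is why statuses above 255 do not survive.

```c
#include <stdlib.h>     /* EXIT_SUCCESS, EXIT_FAILURE for callers */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork, _exit */

/* Returns the status the parent sees, or -1 if the child could not be
   created or did not terminate normally. */
int observed_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: terminate at once, without flushing stdio buffers */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);   /* low-order 8 bits of code */
}
```

For example, observed_exit_status(256) returns 0, so a program that "fails" with status 256 looks successful to its parent.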
Standard C++ — Hosted Environment
The C++11 standard (ISO/IEC 14882:2011) says:
3.6.1 Main function [basic.start.main]
¶1 A program shall contain a global function called main, which is the designated start of the program. [...]
¶2 An implementation shall not predefine the main function. This function shall not be overloaded. It shall
have a return type of type int, but otherwise its type is implementation defined.
All implementations
shall allow both of the following definitions of main:
int main() { /* ... */ }
and
int main(int argc, char* argv[]) { /* ... */ }
In the latter form argc shall be the number of arguments passed to the program from the environment
in which the program is run. If argc is nonzero these arguments shall be supplied in argv[0]
through argv[argc-1] as pointers to the initial characters of null-terminated multibyte strings (NTMBSs) (17.5.2.1.4.2) and argv[0] shall be the pointer to the initial character of a NTMBS that represents the
name used to invoke the program or "". The value of argc shall be non-negative. The value of argv[argc]
shall be 0. [Note: It is recommended that any further (optional) parameters be added after argv. —end
note]
¶3 The function main shall not be used within a program. The linkage (3.5) of main is implementation-defined. [...]
¶5 A return statement in main has the effect of leaving the main function (destroying any objects with automatic
storage duration) and calling std::exit with the return value as the argument. If control reaches the end
of main without encountering a return statement, the effect is that of executing
return 0;
The C++ standard explicitly says "It [the main function] shall have a return type of type int, but otherwise its type is implementation defined", and requires the same two signatures as the C standard to be supported as options. So a 'void main()' is directly not allowed by the C++ standard, though there's nothing it can do to stop a non-standard implementation allowing alternatives. Note that C++ forbids the user from calling main (but the C standard does not).
There's a paragraph of §18.5 Start and termination in the C++11 standard that is identical to the paragraph from §7.22.4.4 The exit function in the C11 standard (quoted above), apart from a footnote (which simply documents that EXIT_SUCCESS and EXIT_FAILURE are defined in <cstdlib>).
Standard C — Common Extension
Classically, Unix systems support a third variant:
int main(int argc, char **argv, char **envp) { ... }
The third argument is a null-terminated list of pointers to strings, each of which is an environment variable which has a name, an equals sign, and a value (possibly empty). If you do not use this, you can still get at the environment via 'extern char **environ;'. This global variable is unique among those in POSIX in that it does not have a header that declares it.
This is recognized by the C standard as a common extension, documented in Annex J:
J.5.1 Environment arguments
¶1 In a hosted environment, the main function receives a third argument, char *envp[],
that points to a null-terminated array of pointers to char, each of which points to a string
that provides information about the environment for this execution of the program (5.1.2.2.1).
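A small POSIX sketch of the extern char **environ approach mentioned above; the walk over "NAME=value" strings is the same one you would do over a third envp argument where an implementation provides it. env_lookup is a hypothetical helper, not a standard function (getenv() already does this job).

```c
#define _POSIX_C_SOURCE 200112L   /* setenv() for callers */
#include <stddef.h>   /* NULL, size_t */
#include <stdlib.h>   /* setenv */
#include <string.h>   /* strlen, strncmp */

extern char **environ;   /* no standard header is required to declare it */

/* Return a pointer to the value of NAME, or NULL if NAME is not set. */
const char *env_lookup(const char *name)
{
    if (environ == NULL)
        return NULL;
    size_t len = strlen(name);
    for (char **p = environ; *p != NULL; p++) {
        if (strncmp(*p, name, len) == 0 && (*p)[len] == '=')
            return *p + len + 1;   /* skip "NAME=" */
    }
    return NULL;
}
```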
Microsoft C
The Microsoft VS 2010 compiler is interesting. The web site says:
The declaration syntax for main is
int main();
or, optionally,
int main(int argc, char *argv[], char *envp[]);
Alternatively, the main and wmain functions can be declared as returning void (no return value). If you declare main or wmain as returning void, you cannot return an exit code to the parent process or operating system by using a return statement. To return an exit code when main or wmain is declared as void, you must use the exit function.
It is not clear to me what happens (what exit code is returned to the parent or OS) when a program with void main() does exit — and the MS web site is silent too.
Interestingly, MS does not prescribe the two-argument version of main() that the C and C++ standards require. It only prescribes a three-argument form where the third argument is char **envp, a pointer to a list of environment variables.
The Microsoft page also lists some other alternatives — wmain() which takes wide character strings, and some more.
The Microsoft Visual Studio 2005 version of this page does not list void main() as an alternative. The versions from Microsoft Visual Studio 2008 onwards do.
Standard C — Freestanding Environment
As noted early on, the requirements above apply to hosted environments. If you are working with a freestanding environment (which is the alternative to a hosted environment), then the standard has much less to say. For a freestanding environment, the function called at program startup need not be called main and there are no constraints on its return type. The standard says:
5.1.2 Execution environments
Two execution environments are defined: freestanding and hosted. In both cases,
program startup occurs when a designated C function is called by the execution
environment. All objects with static storage duration shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified. Program termination returns control to the execution environment.
5.1.2.1 Freestanding environment
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.
The effect of program termination in a freestanding environment is implementation-defined.
The cross-reference to clause 4 Conformance refers to this:
¶5 A strictly conforming program shall use only those features of the language and library specified in this International Standard.3) It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit.
¶6 The two forms of conforming implementation are hosted and freestanding. A conforming hosted implementation shall accept any strictly conforming program. A conforming freestanding implementation shall accept any strictly conforming program in which the use of the features specified in the library clause (clause 7) is confined to the contents of the standard headers <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h>. A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any strictly conforming program.4)
¶7 A conforming program is one that is acceptable to a conforming implementation.5)
3) A strictly conforming program can use conditional features (see 6.10.8.3) provided the use is guarded by an appropriate conditional inclusion preprocessing directive using the related macro. For example:
#ifdef __STDC_IEC_559__ /* FE_UPWARD defined */
/* ... */
fesetround(FE_UPWARD);
/* ... */
#endif
4) This implies that a conforming implementation reserves no identifiers other than those explicitly reserved in this International Standard.
5) Strictly conforming programs are intended to be maximally portable among conforming implementations. Conforming programs may depend upon non-portable features of a conforming implementation.
It is noticeable that the only header required of a freestanding environment that actually defines any functions is <stdarg.h> (and even those may be — and often are — just macros).
Standard C++ — Freestanding Environment
Just as the C standard recognizes both hosted and freestanding environment, so too does the C++ standard. (Quotes from ISO/IEC 14882:2011.)
1.4 Implementation compliance [intro.compliance]
¶7 Two kinds of implementations are defined: a hosted implementation and a freestanding implementation. For a hosted implementation, this International Standard defines the set of available libraries. A freestanding implementation is one in which execution may take place without the benefit of an operating system, and has an implementation-defined set of libraries that includes certain language-support libraries (17.6.1.3).
¶8 A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any well-formed program. Implementations are required to diagnose programs that use such extensions that are ill-formed according to this International Standard. Having done so, however, they can compile and execute such programs.
¶9 Each implementation shall include documentation that identifies all conditionally-supported constructs that it does not support and defines all locale-specific characteristics.3)
3) This documentation also defines implementation-defined behavior; see 1.9.
17.6.1.3 Freestanding implementations [compliance]
Two kinds of implementations are defined: hosted and freestanding (1.4). For a hosted implementation, this International Standard describes the set of available headers.
A freestanding implementation has an implementation-defined set of headers. This set shall include at least the headers shown in Table 16.
The supplied version of the header <cstdlib> shall declare at least the functions abort, atexit, at_quick_exit, exit, and quick_exit (18.5). The other headers listed in this table shall meet the same requirements as for a hosted implementation.
Table 16 — C++ headers for freestanding implementations
Subclause Header(s)
<ciso646>
18.2 Types <cstddef>
18.3 Implementation properties <cfloat> <limits> <climits>
18.4 Integer types <cstdint>
18.5 Start and termination <cstdlib>
18.6 Dynamic memory management <new>
18.7 Type identification <typeinfo>
18.8 Exception handling <exception>
18.9 Initializer lists <initializer_list>
18.10 Other runtime support <cstdalign> <cstdarg> <cstdbool>
20.9 Type traits <type_traits>
29 Atomics <atomic>
What about using int main() in C?
§5.1.2.2.1 of the C11 standard shows the preferred notation — int main(void) — but there are also two examples in the standard which show int main(): §6.5.3.4 ¶8 and §6.7.6.3 ¶20. Now, it is important to note that examples are not 'normative'; they are only illustrative. If there are bugs in the examples, they do not directly affect the main text of the standard. That said, they are strongly indicative of expected behaviour, so if the standard includes int main() in an example, it suggests that int main() is not forbidden, even if it is not the preferred notation.
6.5.3.4 The sizeof and _Alignof operators
…
¶8 EXAMPLE 3 In this example, the size of a variable length array is computed and returned from a function:
#include <stddef.h>
size_t fsize3(int n)
{
char b[n+3]; // variable length array
return sizeof b; // execution time sizeof
}
int main()
{
size_t size;
size = fsize3(10); // fsize3 returns 13
return 0;
}
A function definition like int main(){ … } does specify that the function takes no arguments, but does not provide a function prototype, AFAICT. For main() that is seldom a problem; but it does mean that if you have recursive calls to main(), the arguments won't be checked. For other functions, it is more of a problem — you really need a prototype in scope when the function is called to ensure that the arguments are correct.
You don't normally call main() recursively, outside of places like IOCCC — and you are explicitly forbidden from doing so in C++. I do have a test program that does it — mainly for novelty. If you have:
int i = 0;
int main()
{
if (i++ < 10)
main(i, i * i);
return 0;
}
and compile it with GCC using a pre-C23 -std setting and without -Wstrict-prototypes, it compiles cleanly even under stringent warnings. If it's main(void), it fails to compile because the function definition says "no arguments". (Under C23, where int main() also means "no arguments", it fails either way.)
I believe that main() should return either EXIT_SUCCESS or EXIT_FAILURE. They are defined in <stdlib.h>.
Note that the C and C++ standards define two kinds of implementations: freestanding and hosted.
C90 hosted environment
Allowed forms 1:
int main (void)
int main (int argc, char *argv[])
main (void)
main (int argc, char *argv[])
/*... etc, similar forms with implicit int */
Comments:
The former two are explicitly stated as the allowed forms, the others are implicitly allowed because C90 allowed "implicit int" for return type and function parameters. No other form is allowed.
C90 freestanding environment
Any form or name of main is allowed 2.
C99 hosted environment
Allowed forms 3:
int main (void)
int main (int argc, char *argv[])
/* or in some other implementation-defined manner. */
Comments:
C99 removed "implicit int" so main() is no longer valid.
A strange, ambiguous sentence "or in some other implementation-defined manner" has been introduced. This can either be interpreted as "the parameters to int main() may vary" or as "main can have any implementation-defined form".
Some compilers have chosen to interpret the standard in the latter way. Arguably, one cannot easily state that they are not conforming by citing the standard itself, since it is ambiguous.
However, to allow completely wild forms of main() was probably(?) not the intention of this new sentence. The C99 rationale (not normative) implies that the sentence refers to additional parameters to int main 4.
Yet the section for hosted environment program termination then goes on arguing about the case where main does not return int 5. Although that section is not normative for how main should be declared, it definitely implies that main might be declared in a completely implementation-defined way even on hosted systems.
C99 freestanding environment
Any form or name of main is allowed 6.
C11 hosted environment
Allowed forms 7:
int main (void)
int main (int argc, char *argv[])
/* or in some other implementation-defined manner. */
C11 freestanding environment
Any form or name of main is allowed 8.
Note that int main() was never listed as a valid form for any hosted implementation of C in any of the above versions. In C, unlike C++, () and (void) have different meanings. The former is an obsolescent feature which may be removed from the language. See C11 future language directions:
6.11.6 Function declarators
The use of function declarators with empty parentheses (not prototype-format parameter type declarators) is an obsolescent feature.
C++03 hosted environment
Allowed forms 9:
int main ()
int main (int argc, char *argv[])
Comments:
Note the empty parentheses in the first form. C++ and C differ here: in C++ this means that the function takes no parameters, but in C it means that the number and types of the parameters are unspecified.
C++03 freestanding environment
The name of the function called at startup is implementation-defined. If it is named main() it must follow the stated forms 10:
// implementation-defined name, or
int main ()
int main (int argc, char *argv[])
C++11 hosted environment
Allowed forms 11:
int main ()
int main (int argc, char *argv[])
Comments:
The text of the standard has been changed but it has the same meaning.
C++11 freestanding environment
The name of the function called at startup is implementation-defined. If it is named main() it must follow the stated forms 12:
// implementation-defined name, or
int main ()
int main (int argc, char *argv[])
References
ANSI X3.159-1989 2.1.2.2 Hosted environment. "Program startup"
The function called at program startup is named main. The
implementation declares no prototype for this function. It shall be
defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as
argc and argv, though any names may be used, as they are local to the
function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
ANSI X3.159-1989 2.1.2.1 Freestanding environment:
In a freestanding environment (in which C program execution may take
place without any benefit of an operating system), the name and type
of the function called at program startup are implementation-defined.
ISO 9899:1999 5.1.2.2 Hosted environment -> 5.1.2.2.1 Program startup
The function called at program startup is named main. The
implementation declares no prototype for this function. It shall be
defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as
argc and argv, though any names may be used, as they are local to the
function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent;9) or in some other implementation-defined
manner.
Rationale for International Standard — Programming Languages — C, Revision 5.10. 5.1.2.2 Hosted environment --> 5.1.2.2.1 Program startup
The behavior of the arguments to main, and of the interaction of exit, main and atexit
(see §7.20.4.2) has been codified to curb some unwanted variety in the representation of argv
strings, and in the meaning of values returned by main.
The specification of argc and argv as arguments to main recognizes extensive prior practice.
argv[argc] is required to be a null pointer to provide a redundant check for the end of the list, also on the basis of common practice.
main is the only function that may portably be declared either with zero or two arguments. (The number of other functions’ arguments must match exactly between invocation and definition.)
This special case simply recognizes the widespread practice of leaving off the arguments to main when the program does not access the program argument strings. While many implementations support more than two arguments to main, such practice is neither blessed nor forbidden by the Standard; a program that defines main with three arguments is not strictly conforming (see §J.5.1.).
ISO 9899:1999 5.1.2.2 Hosted environment --> 5.1.2.2.3 Program termination
If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument;11) reaching the } that terminates the main function returns a value of 0. If the return type is not compatible with int, the termination status returned to the host environment is unspecified.
ISO 9899:1999 5.1.2.1 Freestanding environment
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined.
ISO 9899:2011 5.1.2.2 Hosted environment -> 5.1.2.2.1 Program startup
This section is identical to the C99 one cited above.
ISO 9899:2011 5.1.2.1 Freestanding environment
This section is identical to the C99 one cited above.
ISO 14882:2003 3.6.1 Main function
An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a return type of type int, but otherwise its type is implementation-defined. All implementations shall allow both of the following definitions of main:
int main() { /* ... */ }
and
int main(int argc, char* argv[]) { /* ... */ }
ISO 14882:2003 3.6.1 Main function
It is implementation-defined whether a program in a freestanding environment is required to define a main function.
ISO 14882:2011 3.6.1 Main function
An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a return type of type int, but otherwise its type is implementation-defined. All implementations shall
allow both
— a function of () returning int and
— a function of (int, pointer to pointer to char) returning int
as the type of main (8.3.5).
ISO 14882:2011 3.6.1 Main function
This section is identical to the C++03 one cited above.
Return 0 on success and non-zero for error. This is the standard used by UNIX and DOS scripting to find out what happened with your program.
In C89 and K&R C, a main() declared without a return type defaults to int.
return 1? return 0?
If you do not write a return statement in int main(), the closing } will return 0 by default.
(This holds only in C++ and in C99 onwards; in C90 you must write a return statement. Please see Why main does not return 0 here?)
The value of return 0 or return 1 is received by the parent process. In a shell it goes into a shell variable, and if you are running your program from a shell and not using that variable, then you need not worry about the return value of main().
See How can I get what my main function has returned?
$ ./a.out
$ echo $?
This way you can see that it is the variable $? which receives the least significant byte of the return value of main().
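To see the same truncation from C rather than from the shell, here is a minimal POSIX sketch (it assumes a Unix-like system with fork() and waitpid(); the helper name child_exit_status is hypothetical). The child calls _exit(code), which gives the same exit status as returning code from main, and the parent reads the status back with WEXITSTATUS, which yields only the low 8 bits.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` and return
   the exit status the parent observes, or -1 on error. */
int child_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: _exit avoids re-flushing the parent's stdio buffers */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;              /* wait failed, or the child did not exit normally */
    return WEXITSTATUS(status); /* only the low 8 bits of `code` survive */
}
```

On Linux, for example, child_exit_status(300) yields 44, because 300 is 256 + 44.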
In Unix and DOS scripting, 0 is usually returned on success and non-zero on error. This is the convention Unix and DOS scripts use to find out what happened with your program and to control the overall flow.
Keep in mind that, even though you're returning an int, some OSes (POSIX systems such as Linux) pass only the low byte (0-255) of the returned value to the parent; Windows, by contrast, keeps the full 32-bit exit code.
The return value can be used by the operating system to check how the program was closed.
Return value 0 usually means OK in most operating systems (the ones I can think of anyway).
It also can be checked when you call a process yourself, and see if the program exited and finished properly.
It's NOT just a programming convention.
The return value of main() shows how the program exited. If the return value is zero it means that the execution was successful while any non-zero value will represent that something went bad in the execution.
Omit return 0
When a C or C++ program reaches the end of main the compiler will automatically generate code to return 0, so there is no need to put return 0; explicitly at the end of main.
Note: when I make this suggestion, it's almost invariably followed by one of two kinds of comments: "I didn't know that." or "That's bad advice!" My rationale is that it's safe and useful to rely on compiler behavior explicitly supported by the standard. For C, since C99; see ISO/IEC 9899:1999 section 5.1.2.2.3:
[...] a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0.
For C++, since the first standard in 1998; see ISO/IEC 14882:1998 section 3.6.1:
If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;
All versions of both standards since then (C99 and C++98) have maintained the same idea. We rely on automatically generated member functions in C++, and few people write explicit return; statements at the end of a void function. Reasons against omitting seem to boil down to "it looks weird". If, like me, you're curious about the rationale for the change to the C standard read this question. Also note that in the early 1990s this was considered "sloppy practice" because it was undefined behavior (although widely supported) at the time.
Additionally, the C++ Core Guidelines contain multiple instances of omitting return 0; at the end of main and no instances in which an explicit return is written. Although there is not yet a specific guideline on this particular topic in that document, that seems at least a tacit endorsement of the practice.
So I advocate omitting it; others disagree (often vehemently!) In any case, if you encounter code that omits it, you'll know that it's explicitly supported by the standard and you'll know what it means.
Returning 0 should tell the programmer that the program has successfully finished the job.
What is the correct (most efficient) way to define the main() function in C and C++ — int main() or void main() — and why?
Those words "(most efficient)" don't change the question. Unless you're in a freestanding environment, there is one universally correct way to declare main(), and that's as returning int.
What should main() return in C and C++?
An int, pure and simple. And it's more than "what should main() return", it's "what must main() return". main() is, of course, a function that someone else calls. You don't have any control over the code that calls main. Therefore, you must declare main with a type-correct signature to match its caller. You simply don't have any choice in the matter. You don't have to ask yourself what's more or less efficient, or what's better or worse style, or anything like that, because the answer is already perfectly well defined, for you, by the C and C++ standards. Just follow them.
If int main() then return 1 or return 0?
0 for success, nonzero for failure. Again, not something you need to (or get to) pick: it's defined by the interface you're supposed to be conforming to.
What to return depends on what you want to do with the executable. For example, if you are using your program with a command-line shell, then you need to return 0 for success and a non-zero value for failure. You can then use the program in shells with conditional processing that depends on the outcome of your code. You can also assign any non-zero value according to your own interpretation: for example, for critical errors, different program exit points could terminate the program with different exit values, which are available to the calling shell, which can decide what to do by inspecting the returned value.
If the code is not intended for use with shells and the returned value does not bother anybody then it might be omitted. I personally use the signature int main (void) { .. return 0; .. }
If you really have issues related to the efficiency of returning an integer from a process, you should probably avoid calling that process so many times that this return value becomes an issue.
If you are doing this (calling a process many times), you should find a way to put your logic directly inside the caller, or in a DLL file, without allocating a separate process for each call; the many process allocations are what cause the efficiency problem in this case.
In detail, if you only want to know whether returning 0 is more or less efficient than returning 1, it could depend on the compiler in some cases, but generally, assuming both values come from the same kind of source (local, field, constant, embedded in the code, function result, etc.), it requires exactly the same number of clock cycles.
Here is a small demonstration of the usage of return codes...
When using the various tools that the Linux terminal provides one can use the return code for example for error handling after the process has been completed. Imagine that the following text file myfile is present:
This is some example in order to check how grep works.
When you execute the grep command, a process is created. Once it finishes (without crashing), it returns an exit code between 0 and 255. For example:
$ grep order myfile
If you do
$ echo $?
0
you will get a 0. Why? Because grep found a match and returned exit code 0, which is the usual value for a successful exit. The reason probably lies in the boolean nature of a simple check whether everything is ok or not: negating 0 (boolean false) gives 1 (boolean true), which can easily be handled in if-else statements.
Let's check it out again but with something that is not inside our text file and thus no match will be found:
$ grep foo myfile
$ echo $?
1
Since grep failed to match the token "foo" with the content of our file the return code is 1 (this is the usual case when a failure occurs but as stated above you have plenty of values to choose from). Again if we put this in the simple boolean context (everything is ok or not) negating the 1 (boolean true) yields a 0 (boolean false), which again can easily be handled by an if-else statement. When it comes to boolean values anything that is not a 0 is considered to be equivalent to 1 (so 2, 3, 4 etc. in a simple if-else statement for checking whether an error has occurred or not will work the same way as if a 1 was used). You can use different return values to increase the granularity of your error state. It is considered a bad practice to use anything but a 0 for the state of successful execution (due to the reasons given above).
The following bash script (simply type it in a Linux terminal) although very basic should give some idea of error handling:
$ grep foo myfile
$ CHECK=$?
$ [ $CHECK -eq 0 ] && echo 'Match found'; [ $CHECK -ne 0 ] && echo 'No match was found'
No match was found
The first test prints nothing, since "foo" made grep return 1 and that test checks whether grep's return code was 0. The second test echoes its message (the last line) because it is true: CHECK is 1.
As you can see, if you are calling this and that process, it is sometimes essential to see what it has returned (via the return value of main()), e.g. when running tests.
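The same check can be made from a C program. A minimal sketch, assuming a POSIX system, where system() returns a wait status that the <sys/wait.h> macros can decode; command_status is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/wait.h>

/* Run `cmd` through the shell and return its exit status (what $? would
   hold), or -1 if the command could not be run or did not exit normally. */
int command_status(const char *cmd)
{
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For instance, command_status("grep -q foo myfile") should give 0 when a line matches and 1 when none does, mirroring the CHECK variable in the script above; grep reports errors such as a missing file with a value greater than 1.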
"int" is now mandated by the ISO for both C and C++ as the return type for "main".
Both languages previously allowed implicit "int", and for "main" to be declared without any return type. In fact, the very first external release of C++ (Release E of "cfront" from February 1985), which is written in its own language, declared "main" without any return type ... but returned an integer value: the number of errors or 127, whichever was smaller.
As to the question of what to return: the ISO standards for C and C++ work in synchronization with the POSIX standard. For any hosted environment conforming to the POSIX standard,
(1) 126 is reserved for the OS's shell to indicate utilities that are not executable,
(2) 127 is reserved for the OS's shell to indicate a command that is not found,
(3) the exit values for utilities are separately spelled out on a utility-by-utility basis,
(4) programs that invoke utilities outside the shell should use similar values for their own exits,
(5) the values above 128 are meant to indicate termination that results from receiving a signal,
(6) the values 1-125 are for failures,
(7) the value 0 is for success.
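Conventions (5) and (7) can be seen from the parent's side with a small POSIX sketch. A process that waits for a child receives a status word, and the <sys/wait.h> macros tell a normal exit (WIFEXITED, WEXITSTATUS) apart from death by a signal (WIFSIGNALED, WTERMSIG). The mapping to 128 + signal number is what common shells such as bash report; POSIX itself only requires a value greater than 128, so treat that encoding as an assumption. Both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Convert a waitpid() status into a shell-style $? value: the exit
   status for a normal exit, or 128 + signal number (bash's convention)
   for termination by a signal. Returns -1 for anything else. */
int shell_style_status(int wstatus)
{
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);
    return -1;
}

/* Hypothetical demo: fork a child that sends itself `sig`, then return
   the shell-style status the parent observes (-1 on error). */
int status_after_signal(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(sig);
        _exit(0); /* reached only if `sig` did not terminate the child */
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return shell_style_status(wstatus);
}
```

With this encoding, a child killed by SIGKILL (signal 9) is reported as 137.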
In C and C++, the values EXIT_SUCCESS and EXIT_FAILURE are meant to handle the most common situation: programs that report a success or just a generic failure. They may, but need not, be equal to 0 and 1 respectively.
That means if you want a program to return different values for different failure modes or status indications, while continuing to make use of those two constants, you might have to resort to first making sure that your additional "failure" or "status" values lie strictly between max(EXIT_SUCCESS, EXIT_FAILURE) and 126 (and hope that there's enough room in-between), and to reserve EXIT_FAILURE to mark the generic or default failure mode.
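A minimal sketch of that scheme, with hypothetical failure modes (the enum and function names are purely illustrative):

```c
#include <stdlib.h>

/* Hypothetical failure modes for an illustrative program. */
enum failure_mode { FAIL_GENERIC, FAIL_USAGE, FAIL_IO };

/* Map a failure mode to an exit status. The generic mode uses EXIT_FAILURE;
   the specific modes are placed just above max(EXIT_SUCCESS, EXIT_FAILURE)
   and must stay below 126, which POSIX shells reserve. */
int exit_code_for(enum failure_mode mode)
{
    int base = EXIT_SUCCESS > EXIT_FAILURE ? EXIT_SUCCESS : EXIT_FAILURE;
    if (mode == FAIL_GENERIC)
        return EXIT_FAILURE;
    int code = base + (int)mode;   /* FAIL_USAGE -> base + 1, FAIL_IO -> base + 2 */
    return code < 126 ? code : -1; /* -1: no room left below 126 */
}
```

A main() using it might end with return exit_code_for(FAIL_USAGE); when given bad arguments. Where EXIT_FAILURE is 1, as on glibc, that yields 2 for FAIL_USAGE and 3 for FAIL_IO.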
Otherwise, if you're not going to use the constants, then you should go by what POSIX mandates.
For programs meant for use on free-standing environments or on hosts that are not POSIX-compliant, I can say nothing more, except the following:
I have written free-standing programs -- as multi-threaded programs on a custom run-time system (and a custom tool-base for everything else). The general rule I followed was that:
(1) "main" ran the foreground processes, which usually consisted only of start-up, configuration or initialization routines, but could have just as well included foreground processes meant for continual operation (like polling loops),
(2) "main" returns into an infinite sleep & wait loop,
(3) no return value for "main" was defined or used,
(4) background processes ran separately, as interrupt-driven & event-driven threads, independently of "main", terminated only by the receipt of a reset signal or by other threads ... or by simply shutting off the monitoring of whatever event was driving the thread.
In C, Section 5.1.2.2.1 of the C11 standard says:
It shall be defined with a return type of int and with no
parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though
any names may be used, as they are local to the function in which they
are declared):
int main(int argc, char *argv[]) { /* ... */ }
However, for beginners like me, an abstract example helps to get a grasp of it:
When you write a method in your program, e.g. int read_file(char filename[LEN]);, then as the caller of this method you want to know whether everything went well (because failures can happen, e.g. the file could not be found). By checking the return value of the method, you can know whether everything went well or not; it's a mechanism for the method to signal its successful execution (or not) and to let the caller (you, e.g. in your main method) decide how to handle an unexpected failure.
So now imagine I write a C program for a micro-mechanism which is used in a more complex system. When the system calls the micro-mechanism, it wants to know whether everything went as expected, so that it can handle any potential error. If the C program's main method returned void, how would the calling system know about the execution of its subsystem (the micro-mechanism)? It couldn't; that's why main() returns int, in order to communicate successful (or unsuccessful) execution to its caller.
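A minimal sketch of that idea, using a hypothetical read_file() that takes a plain const char * instead of the char filename[LEN] parameter above (LEN is not defined here). It returns 0 on success and -1 on failure, leaving the decision to the caller.

```c
#include <stdio.h>

/* Hypothetical read_file(): returns 0 on success and -1 if the file
   cannot be opened or an error occurs while reading it. */
int read_file(const char *filename)
{
    FILE *fp = fopen(filename, "r");
    if (fp == NULL)
        return -1;                 /* e.g. the file could not be found */

    int ch;
    while ((ch = fgetc(fp)) != EOF)
        ;                          /* a real program would process ch here */

    int failed = ferror(fp);       /* distinguish a read error from end of file */
    fclose(fp);
    return failed ? -1 : 0;
}
```

main() can then forward the outcome to its own caller, e.g. return read_file(argv[1]) == 0 ? EXIT_SUCCESS : EXIT_FAILURE; (after checking that argc > 1 and including <stdlib.h>).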
In other words:
The rationale is that the host environment (i.e. the Operating System (OS)) needs to know whether the program finished correctly. Without an int-compatible return type (e.g. void), the "status returned to the host environment is unspecified" (i.e. undefined behavior on most OSes).
On Windows, if a program crashes due to an access violation, the exit code will be STATUS_ACCESS_VIOLATION (0xC0000005). The same goes for other kinds of crashes caused by x86 exceptions.
So there are things other than what you return from main or pass to exit that can cause an exit code to be seen.

What are the limitations on the use of output registers in avr-gcc inline assembly?

Output registers in inline assembly must be declared with the "=" constraint, meaning "write-only" [1]. What exactly does this mean? Is it truly forbidden to read and modify them within the assembly? For example, consider this code:
#include <stdint.h>

uint8_t one ()
{
uint8_t res;
asm("ldi %[res],0\n"
"inc %[res]\n"
: [res] "=r" (res)
);
return res;
}
The assembly sets the output register to 0 then increments it. Is this breaking the "write-only" constraint?
UPDATE
I'm seeing problems where my inline asm breaks when I change it to work directly on an output register, as opposed to using r16 for the computation and finally mov'ing r16 into the output register. The code is here: http://ideone.com/JTpYma . It prints results to serial; you just need to define F_CPU and BAUD. The problem appears only with gcc-4.8.0, not with gcc-4.7.2.
[1] http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
The compiler doesn't care whether you read it or not; it just won't put the initial value of the variable into the register. Your example is entirely legal, but people often wrongly expect to get result 2 from this code:
#include <stdint.h>

uint8_t one ()
{
uint8_t res = 1;
asm("inc %[res]\n"
: [res] "=r" (res)
);
return res;
}
Since it's only an output constraint, the initial value of res is not guaranteed to be loaded into the register. In fact, the initializer may even be optimized away on the assumption that the asm block will overwrite it anyway. The above code is compiled to this by my version of avr-gcc:
inc r24
ret
As you can see, the compiler indeed removed the load of 1 into res (and hence into r24), thus producing an undefined result.
Update
The problem with the updated program in the question is that it also has an input register operand. By default the compiler assumes that all inputs are consumed before the outputs are assigned, so it's safe to allocate overlapping registers. That's clearly not the case for your example. You should use an "early clobber" modifier (&) for the output. This is what the manual has to say about that:
& Means (in a particular alternative) that this operand is an
earlyclobber operand, which is modified before the instruction is
finished using the input operands. Therefore, this operand may not lie
in a register that is used as an input operand or as part of any
memory address.
Nobody said gcc inline asm was easy :D
