We are using the SPARC architecture, our language is Ada, and we compile our code with the GNAT compiler.
We have observed something odd:
some of the constants in our code have two or more copies.
file.adb:
with FileConsts; use FileConsts;
procedure SomeProcedure is
   A : Long_Float;
begin
   A := cSomeConstant;
end SomeProcedure;
FileConsts.ads:
package FileConsts is
   cSomeConstant : constant Long_Float := 100.0;
end FileConsts;
In the map file we see:
.rodata 0x40010000 (0x8)file.o
.rodata 0x40010008 pfileconsts__csomeconstant
In the assembly, the code accesses the area belonging to file.o, i.e. 0x40010000, instead of 0x40010008. In the binary, the values at 0x40010000 and 0x40010008 are the same, so the program behaves as expected. But why would the compiler do that?
If another package (file2.adb) also accesses cSomeConstant, the compiler makes yet another copy in the section:
.rodata 0x40010010 (0x8)file2.o
Again, the value in the binary is the same as cSomeConstant.
Why is the compiler behaving this way, and how can we suppress this behavior?
It is really confusing while debugging.
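Keep in mind that in Ada a typed constant is still an object declaration, even when its value is known at compile time.
If you want a constant that is purely a compile-time value, with no object behind it, use a "named number":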
Some_Constant : constant := 100.0;
(I don't know what code the compiler will generate in this case.)
int a = 10;
if(a >= 5)
printf("Hello World");
int b;
b = 3;
For example, if I run "info locals" before executing line 4 ("int b;"), gdb prints information about both a and b. Why does gdb work like this, and how can I print only the variables that have been declared so far?
It depends on how the compiler decided to compile your code.
In your case, I believe that since variable b is always used in your code, the compiler may have "declared" (allocated) a and b together to save execution time.
If you really want to understand what happened, disassemble the program and see the assembly that actually runs when you execute this code.
I expect you will find that the instruction that reserves stack-frame space for b actually reserves space for both variables (a and b) at the same time.
gdb shows the variable b because the compiler has already allocated it.
Assuming this code is inside a function, once execution enters the function, it allocates the local variables on the stack; this is where the values of a and b reside. That's because the compiler reads the whole function and reserves space for all of its locals at the beginning, even if they aren't declared at the top of the function.
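A small C sketch of this, assuming a typical GCC or Clang build: the hypothetical function local_slot_distance reproduces the question's code, takes the addresses of a and b, and returns the distance between them. A distance of only a few bytes is consistent with both slots being reserved together in the same frame when the function is entered; only the disassembly of the function prologue can confirm that.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper mirroring the question's code: returns the byte
   distance between the stack slots of a and b. */
long local_slot_distance(void)
{
    int a = 10;
    intptr_t addr_a = (intptr_t)&a;   /* address taken before b is declared */

    if (a >= 5)
        printf("Hello World\n");

    int b;
    b = 3;
    intptr_t addr_b = (intptr_t)&b;

    /* Both addresses are taken, so both variables must live in memory,
       and both are alive here, so their addresses must differ. */
    printf("&a = %p, &b = %p, b = %d\n", (void *)&a, (void *)&b, b);
    return (long)(addr_b - addr_a);
}
```

The sign of the result depends on how the compiler lays out the frame, so only its magnitude is meaningful.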
See also the question "How the local variable stored in stack".
I am using Go 1.14 with the linux/riscv64 target, and when compiling a hello world program I see this in the assembly:
1b078: 04813183 ld gp,72(sp)
1b07c: 00018003 lb zero,0(gp)
1b080: 00313423 sd gp,8(sp)
As you can see, there is a load into the zero register from [gp+0], which, according to the specification, must still behave like a real load:
Loads with a destination of x0 must still raise any exceptions and cause any other side effects even though the load value is discarded.
What exactly is going on here? Is the compiler producing erroneous output?
I don't know anything about Go on RISC-V, but this is a common pattern.
The load only checks that [gp+0] is accessible and readable; the value it reads is discarded.
This is useful for cases like:
func f(a *[0x100001]byte) {
	(*a)[0x100000] = 1
}
The compiler must generate the following pseudo code:
check_not_null(a)
store(a + 0x100000, 1)
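The offset 0x100000 is far beyond the unmapped page at address 0, so if a were nil, a store to a + 0x100000 could hit mapped memory instead of faulting. The compiler therefore has to check a explicitly first, and that null check can be implemented without any branches using the same construct you've discovered: a load from [a+0] whose result is discarded.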
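To make the pseudo code concrete, here is a C sketch of the same two steps. It is only an illustration under stated assumptions: in C, dereferencing a null pointer is undefined behaviour, so a C compiler gives no guarantee of a fault, whereas the Go compiler emits this probe deliberately. The names f, demo and BIG_OFFSET are hypothetical.

```c
#include <stdlib.h>   /* calloc, free for the usage demo */

#define BIG_OFFSET 0x100000   /* matches the Go example's (*a)[0x100000] */

/* Hypothetical C rendering of check_not_null(a); store(a + 0x100000, 1). */
void f(unsigned char *a)
{
    /* check_not_null(a): a one-byte load whose result is discarded, the
       counterpart of `lb zero,0(gp)`. volatile stops the compiler from
       deleting the otherwise unused load. For a NULL a this touches
       address 0, which is unmapped on typical systems, so a fault is
       expected here. */
    (void)*(volatile const unsigned char *)a;

    /* store(a + 0x100000, 1): on its own, this address might be mapped
       even when a is NULL, so it must not be the first access. */
    a[BIG_OFFSET] = 1;
}

/* Usage demo: returns the stored byte for a valid buffer, or -1 if the
   allocation fails. */
int demo(void)
{
    unsigned char *buf = calloc(BIG_OFFSET + 1, 1);
    if (buf == NULL)
        return -1;
    f(buf);
    int result = buf[BIG_OFFSET];
    free(buf);
    return result;
}
```

Calling demo() should return 1; the probe only changes behaviour when a is invalid.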
int 0x80 is a system call, and 0x80 is 128 in hexadecimal.
Why does the kernel use int 0x80 as an interrupt, yet when I declare int x, the compiler knows it's just an integer named x, and vice versa?
You appear to be confused about the difference between C and assembly language. Both are programming languages, (nowadays) both accept the 0xNNNN notation for writing numbers in hexadecimal, and there's usually some way to embed tiny snippets of assembly language in a C program, but they are different languages. The keyword int means something completely different in C than it does in (x86) assembly language.
To a C compiler, int always and only means to declare something involving an integer, and there is no situation where you can immediately follow int with a numeric literal. int 0x80 (or int 128, or int 23, or anything else of the sort) is always a syntax error in C.
To an x86 assembler, int always and only means to generate machine code for the INTerrupt instruction, and a valid operand for that instruction (an "imm8", i.e. a number in the range 0–255) must be the next thing on the line. int x; is a syntax error in x86 assembly language, unless x has been defined as a constant in the appropriate range using the assembler's macro facilities.
Obvious follow-up question: If a C compiler doesn't recognize int as the INTerrupt instruction, how does a C program (compiled for x86) make system calls? There are four complementary answers to this question:
Most of the time, in a C program, you do not make system calls directly. Instead, you call functions in the C library that do it for you. When processing your program, as far as the C compiler knows, open (for instance) is no different from any other external function, so it doesn't need to generate an int instruction. It just emits call open.
But the C library is just more C that someone else wrote for you, isn't it? Yet, if you disassemble the implementation of open, you will indeed see an int instruction (or maybe syscall or sysenter instead). How did the people who wrote the C library do that? They wrote that function in assembly language, not in C. Or they used that technique for embedding snippets of assembly language in a C program, which brings us to ...
How does that work? Doesn't that mean the C compiler does need to understand int as an assembly mnemonic sometimes? Not necessarily. Let's look at the GCC syntax for inserting assembly—this could be an implementation of open for x86/32/Linux:
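#include <sys/syscall.h>   /* SYS_open */
#include <sys/types.h>     /* mode_t */

int __set_errno(int ret);  /* helper assumed to exist: sets errno = -ret, returns -1 */

int open(const char *path, int flags, mode_t mode)
{
    int ret;
    /* Intel syntax (build with -masm=intel); in GCC's default AT&T syntax: "int $0x80".
       i386 Linux convention: eax = syscall number; ebx, ecx, edx = first three args. */
    asm volatile ("int 0x80"
                  : "=a" (ret)
                  : "0" (SYS_open), "b" (path), "c" (flags), "d" (mode)
                  : "memory");
    if (ret >= 0) return ret;
    return __set_errno(ret);
}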
You don't need to understand the bulk of that: the important thing for purpose of this question is, yes, it says int 0x80, but it says it inside a string literal. The compiler will copy the contents of that string literal, verbatim, into the generated assembly-language file that it will then feed to the assembler. It doesn't need to know what it means. That's the assembler's job.
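The same idea on x86-64 Linux, where system calls normally use the syscall instruction rather than int 0x80, can be sketched as follows. This is an illustration under assumptions, not how any particular C library does it: raw_syscall0 is a hypothetical name, and it only handles system calls that take no arguments. As before, the compiler copies the template string "syscall" through untouched and only interprets the constraints.

```c
#include <sys/syscall.h>  /* SYS_getpid */
#include <unistd.h>       /* getpid, for comparison in the usage example */

#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

/* Hypothetical wrapper: invoke a zero-argument system call directly. */
static long raw_syscall0(long number)
{
    long ret;
    /* "a" (number): the syscall number goes in rax on entry.
       "=a" (ret):   the kernel's return value comes back in rax.
       The syscall instruction overwrites rcx and r11, so they are clobbered. */
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (number)
                      : "rcx", "r11", "memory");
    return ret;
}
```

For example, raw_syscall0(SYS_getpid) should return the same value as the C library's getpid().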
More generally, there are lots of words that mean one thing in C and a completely different thing in assembly language. A C compiler produces assembly language, so it has to "know" both of the meanings of those words, right? It does, but it does not confuse them, because they are always used in separate contexts. "add" being an assembly mnemonic that the C compiler knows how to use, does not mean that there is any problem with naming a variable "add" in a C program, even if the "add" instruction gets used in that program.
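A tiny sketch of that point: the hypothetical function below uses local variables named after x86 mnemonics. To the C compiler they are ordinary identifiers; whatever instructions it generates for the sum (an add, or nothing at all after constant folding) have no connection to the names.

```c
/* Hypothetical example: identifiers that happen to be x86 mnemonics. */
int mnemonic_names(void)
{
    int add = 2;        /* "add" is an x86 instruction, but here just a variable */
    int mov = 3;        /* likewise "mov" */
    int push = add + mov;
    return push;
}
```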
In all flavors of GCC, local variables that don't fit into registers are stored on the stack. For accessing them, one uses constructs like [ESP+n] or [EBP-n], where n might involve an offset within the variable.
When passing such variables to GCC inline assembly as operands, a spare register is used to store the calculated address. Is there a way to designate operands as "the base register of this variable" and/or "the offset of this variable relative to the base register"?
If you do something like
int stackvar;
...
asm ("..." : : "r" (stackvar));
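you force GCC to load stackvar into a register. If you add the m constraint, GCC may instead pass the variable's memory operand (base register plus offset) directly, so no spare register is needed: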
int stackvar;
...
asm ("..." : : "rm" (stackvar));
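To see the difference, here is a sketch assuming GCC or Clang targeting x86 with the default AT&T syntax. Both hypothetical functions increment a local through inline assembly. The first uses "+m", so %0 expands to the variable's own stack location (something like 12(%rsp) or -4(%rbp)); the second uses "+r", which makes the compiler load the value into a register and store it back afterwards. Compiling with -S shows the substituted operands.

```c
#if !defined(__i386__) && !defined(__x86_64__)
#error "this sketch uses x86 inline assembly"
#endif

/* "+m": the asm operates on the variable in place; %0 becomes an
   addressing mode built from the frame's base register and an offset. */
static int increment_in_memory(int start)
{
    int stackvar = start;
    __asm__ ("incl %0" : "+m" (stackvar));
    return stackvar;
}

/* "+r": GCC must copy stackvar into a register, run the asm on that
   register, and treat the register as the variable's new value. */
static int increment_in_register(int start)
{
    int stackvar = start;
    __asm__ ("incl %0" : "+r" (stackvar));
    return stackvar;
}
```

With "rm", as in the answer, GCC is free to pick whichever of the two forms it prefers.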
For the following statement inside function func(), I'm trying to figure out the variable name (which is 'dictionary' in the example) that points to the malloc'ed memory region.
#include <stdint.h>
#include <stdlib.h>
void func(void) {
    uint64_t *dictionary = (uint64_t *) malloc(sizeof(uint64_t) * 128);
}
The instrumented malloc() can record the start address and size of the allocation, but it has no knowledge of the variable 'dictionary' that the result will be assigned to. Are there any compiler features that can help solve this problem without modifying the compiler to instrument such assignment statements?
One approach I've been considering is to exploit the fact that the variable 'dictionary' and the call to malloc are on the same source line (or on adjacent lines), since DWARF provides line information.
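One way to connect an allocation to a source line, sketched under assumptions (GCC or Clang, whose __builtin_return_address builtin is used here): the hypothetical wrapper traced_malloc records the address it was called from, and that address can later be mapped to file:line through the DWARF line table, for example with addr2line -e prog ADDR. For a position-independent executable the runtime address must first be converted to an offset from the load base, and because a return address points just past the call, tools often subtract 1 to land on the call itself. This recovers the line, not the variable name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record of one allocation. */
struct alloc_record {
    void  *ptr;
    size_t size;
    void  *call_site;   /* return address inside the caller of traced_malloc */
};

#define MAX_RECORDS 1024
static struct alloc_record records[MAX_RECORDS];
static size_t record_count;

/* noinline keeps __builtin_return_address(0) pointing into the real caller. */
__attribute__((noinline))
void *traced_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL && record_count < MAX_RECORDS) {
        records[record_count].ptr = p;
        records[record_count].size = size;
        records[record_count].call_site = __builtin_return_address(0);
        record_count++;
    }
    return p;
}

/* The question's function, calling the wrapper instead of malloc. */
void func(void)
{
    uint64_t *dictionary = traced_malloc(sizeof(uint64_t) * 128);
    (void)dictionary;   /* never freed, as in the original snippet */
}

/* Print the records so the call-site addresses can be fed to addr2line. */
void dump_allocations(void)
{
    for (size_t i = 0; i < record_count; i++)
        printf("%p: %zu bytes, called from %p\n",
               records[i].ptr, records[i].size, records[i].call_site);
}
```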
One thing you can do with Clang and LLVM is emit the code with debug information and then look for malloc calls. These will be assigned to LLVM values, which can be traced (when not compiled with optimizations, that is) to the original C/C++ source code via the debug information metadata.