How can I compile and run a C program (with OpenMP) in gem5 Full System?

I am an undergraduate student working on my thesis about parallel programming.
I am using the OpenMP model. Now I want to use gem5 to measure performance.
I installed gem5 full system successfully by following this link:
http://cearial01.kaist.ac.kr/index.php/2016/08/26/gem5-documentation/
Now I want to compile and run a C program with OpenMP (matmul.c) using gem5.
How can I compile and run this program?
I mean, in which folder should I store this program file (matmul.c) for compilation?
How do I create an object file of this program?
How can I change the number of processors/CPUs and the cache memory size during compilation and running?
I am new to this area, which is why my list of questions is so long!
I hope nobody minds.
Best Regards,
Litu

How can I compile and run this program? I mean, in which folder should I store this program file (matmul.c) for compilation? How do I create an object file of this program?
How to cross compile for an image is not gem5 specific, so I'll be brief.
First you must obtain a cross compiler for the image.
The best way to do that is to get a cross compiler from the same source as the image, to ensure compatibility.
My preferred approach is to use minimal Buildroot images. Buildroot:
builds both the cross compiler and the image for me, and thus ensures compatibility
makes it easy to automate building new software with its package system
can produce very simple images, which are more suitable for gem5 and architecture research
This is my setup on GitHub. It contains a minimal OpenMP hello world which I have successfully run inside of gem5.
Another good option is to use https://crosstool-ng.github.io/
Since you want OpenMP support, you must build the GCC cross compiler with OpenMP support.
I think this is done in GCC with:
./configure --enable-libgomp
or through the Buildroot option:
BR2_GCC_ENABLE_OPENMP=y
And then when compiling, you must pass the -fopenmp option to gcc.
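For reference, a minimal OpenMP test program and a typical cross-compile line look like this (a sketch: the arm-linux-gnueabihf- toolchain prefix is an assumption, use whatever prefix your Buildroot/crosstool-NG build produced; -static avoids needing a matching libgomp on the image):

/* hello_omp.c -- minimal OpenMP sanity check.
 * Cross-compile with something like (toolchain prefix is an assumption):
 *   arm-linux-gnueabihf-gcc -O2 -fopenmp -static hello_omp.c -o hello_omp
 */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}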
How can I change the number of processors and the cache memory size?
The best way to answer that question yourself is to use something like:
./build/ARM/gem5.opt configs/example/fs.py -h
and search the options.
You come across:
-n NUM_CPUS, --num-cpus=NUM_CPUS.
For ARM you also need to pass a .dtb with the corresponding core count, e.g.: ./system/arm/dt/armv7_gem5_v1_2cpu.dtb for 2 cores.
caches: you will find the following options easily (a combined example invocation follows the caveats below):
--caches --l1d_size=1024 --l1i_size=1024 --l2cache --l2_size=1024 --l3_size=1024
But keep in mind that:
the Linux kernel does not see the cache sizes correctly as of gem5 commit fbe63074e3a8128bdbe1a5e8f6509c565a3abbd4
see also: How to switch CPU models in gem5 after restoring a checkpoint and then observe the difference?
caches only affect certain CPU types, usually the more detailed such as ARM HPI and x86 DerivO3CPU, but not AtomicSimpleCPU.
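Putting those options together, a run could look roughly like this (a sketch: the flags are the ones listed above plus --dtb-filename for the 2-CPU device tree; flag names and the required --kernel/--disk-image arguments vary between gem5 versions, so double-check with fs.py -h):

./build/ARM/gem5.opt configs/example/fs.py -n 2 --caches --l1d_size=32kB --l1i_size=32kB --l2cache --l2_size=1MB --dtb-filename=system/arm/dt/armv7_gem5_v1_2cpu.dtb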

Related

Why must GNU binutils be configured for a specific target? What's going on underneath?

I am messing around with creating my own custom gcc toolchain for an ARM Cortex-A5 CPU, and I am trying to dive as deeply as possible into each step. I have deliberately avoided using crosstool-ng or other tools to assist, in order to get a better understanding of what is going on in the process of creating a toolchain.
One thing stumps me, though. During configuration and building of binutils, I need to specify a target (--target). This target can be either the classic host tuple (e.g. arm-none-linux-gnueabi) or a specific type, something like i686-elf for example.
Why is this target needed, and what does it specifically do with the generated "as" and "ld" programs built by binutils?
For example, if I build it with arm-none-linux-gnueabi, it looks like the resulting "as" program supports every ARM instruction set under the sun (armv4t, armv5, etc.).
Is it simply for saving space in the resulting executable? Or is something more going on?
I would get it if I configured binutils for a specific instruction set, for example: build me an assembler that understands armv4t instructions.
Looking through the source of binutils, and gas specifically, it looks like the host tuple is selecting some header files located in gas/config/tc*, gas/config/te*. Again, this seems arbitrary, as these are broad categories of systems.
Sorry for the rambling :) I guess my question can be stated as: why is binutils not an all-in-one package?
Why is this target needed?
Because there are (many) different target architectures. ARM assembler/code is different from PowerPC, which is different from x86, and so on. In principle it would have been possible to design one tool for all targets, but that was not the approach taken at the time.
The focus was mainly on speed and performance. The executables are small by today's standards, but combining all 40+ architectures and all the tools like as, ld, nm, etc. would have been quite clunky.
Moreover, not only are modern host machines much more powerful; so are the programs being compiled and assembled, sometimes zillions of lines of (preprocessed) C++. This means overall build times have shifted much more toward compilation than in the old days.
Usually only different core families are switchable/selectable via options.
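To make the effect of --target concrete, a minimal binutils build might look like this (a sketch; the version number and install prefix are placeholders):

tar xf binutils-2.40.tar.xz
cd binutils-2.40
./configure --target=arm-none-eabi --prefix=$HOME/cross
make && make install

The installed tools come out as arm-none-eabi-as, arm-none-eabi-ld, and so on, and they only understand and emit objects for that target family; configuring the same source tree with a different --target gives you a different set of binaries.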

Can I use -fstack-check when compiling my Ubuntu 10.04 kernel module?

It looks like my kernel module is performing some stack smashing under heavy loads. Can I use the -fstack-check compile option for kernel modules? It appears as if that compile option causes the compiler to emit additional code, but not link to a library or runtime. Is that correct?
I have a very simplified kernel that does not do much. I can load that simple kernel with and without slub debugging enabled, and it will also load with and without -fstack-check at compile time. When I start testing my module, it starts crashing when I use the -fstack-check compile option, whereas it does not seem to trip errors with just slub debugging.
A different question (How does the gcc option -fstack-check exactly work?) provided some information but I haven't been able to find examples of people using the -fstack-check option in kernel module compilations.
The stack space inside the Linux kernel is severely limited. Go over your code with a fine-tooth comb to check that no paths use too much space in local variables; no alloca() is allowed at all. Other than that, the kernel environment is harsh. Check your logic carefully. Add tests for possibly out-of-range data, trace data back to wherever it comes from, and make sure it is always what you believe it to be. Data from userland is always a reason for extra paranoia.
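To illustrate the local-variable point, this is the typical pattern that overflows the (only a few KiB) kernel stack, and the usual fix; the function names and the 4096-byte bound are made up for the example:

#include <linux/slab.h>      /* kmalloc, kfree */
#include <linux/string.h>    /* memcpy */
#include <linux/errno.h>
#include <linux/types.h>

/* Bad: ~4 KiB of scratch space in a single stack frame. */
static int parse_request_on_stack(const u8 *src, size_t len)
{
	u8 scratch[4096];                 /* lives on the small kernel stack */

	if (len > sizeof(scratch))
		return -EINVAL;
	memcpy(scratch, src, len);
	/* ... process scratch ... */
	return 0;
}

/* Better: take the scratch space from the heap instead. */
static int parse_request_on_heap(const u8 *src, size_t len)
{
	u8 *scratch;

	if (len > 4096)
		return -EINVAL;
	scratch = kmalloc(len, GFP_KERNEL);
	if (!scratch)
		return -ENOMEM;
	memcpy(scratch, src, len);
	/* ... process scratch ... */
	kfree(scratch);
	return 0;
}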

Distro provided cross compiler vs custom built gcc

I intend to cross compile for Raspberry Pi, basically a small ARM computer. The host will be an i686 box running Arch Linux.
My first instinct is to use the cross compiler provided by Arch Linux, arm-elf-gcc-base and arm-elf-binutils. However, every wiki and post I read seems to use some custom-built version of gcc. They seem to spend significant time on cooking their own gcc. The problem is that they never say WHY it is important to use their gcc over another.
1. Can stock distro-provided cross compilers be used for building Raspberry Pi (or ARM in general) kernels and apps?
2. Is it necessary to have multiple compilers for the ARM architecture? If so, why, since a single gcc can support all x86 variants?
3. If 2), then how can I deduce what target subset is supported by a particular version of gcc?
4. More generally, what use cases call for a custom gcc build?
Please be as technical as you can, I'd like to know WHY as well as how.
When developers talk about building software (cross compiling) for a different machine (target) than their own (host), they use the term toolchain to describe the set of tools necessary to build binary files. That's because when you need to build an executable binary, you need more than a compiler.
You need routines (crt0.o) to initialize the runtime according to the requirements of the operating system and standard libraries. You need a standard set of libraries, and those libraries need to be aware of the kernel on the target because of the system call API and several OS-level configurations (e.g. page size) and data structures (e.g. time structures).
On the hardware side, there are different ARM architecture versions. Architectures can be backward compatible, but a toolchain by nature is binary and targeted at a specific architecture. You can target the most widespread architecture by default, but that won't be very fruitful for an already constrained environment (an embedded device). If you target the latest architecture, it won't be useful for targets based on older architectures.
When you build a binary on your host for your host, the compiler can look up all the necessary bits from its own environment or use what's on the host, so most of the above details are invisible to the developer. However, when you build for a target different from your host, the toolchain must know about the hardware, OS, and standard library details. The way you tell the toolchain about these is... by building it according to those details, which might require some level of bootstrapping (or you can do this via an extensive set of parameters, if the toolchain supports it / was built for it).
So when there is a generic (stock) cross compile toolchain, it has already some target specifics set and that might not meet your requirements. Please see this recent question about the situation on Ubuntu for an example.
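As for deducing what a given cross gcc was built for (question 3 in the list above), you can ask the compiler itself instead of rebuilding it; for instance (the arm-elf- prefix matches the Arch packages mentioned above, adjust to your toolchain):

arm-elf-gcc -dumpmachine       # prints the configured target triplet
arm-elf-gcc -v                  # shows the exact ./configure line the compiler was built with
arm-elf-gcc -Q --help=target    # lists target options and their default values (-march and friends)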

Can I mix arm-eabi with arm-elf?

I have a product whose bootloader and application are compiled using a compiler (GNUARM GCC 4.1.1) that generates "arm-elf" output.
The bootloader and application are segregated in different FLASH memory areas in the linker script.
The application has a feature that enables it to call the bootloader (as a simple C function with 2 parameters).
I need to be able to upgrade existing products around the world, and I can safely do this using always the same compiler.
Now I'd like to be able to compile this product application using a new GCC version that outputs arm-eabi.
Everything will be fine for new products, where both application and bootloader are compiled using the same toolchain, but what happens with existing products?
If I flash a new application, compiled with GCC 4.6.x and arm-none-eabi, will my application still be able to call the bootloader function from the old arm-elf bootloader?
Furthermore, not directly related to the above question, can I mix object files compiled with arm-elf into a binary compiled with arm-eabi?
EDIT:
I think it is good to make clear that I am building for a bare-metal ARM7, if it makes any difference...
No. An ABI is the magic that makes binaries compatible. The Application Binary Interface determines various conventions for communicating with other libraries/applications. For example, an ABI will define the calling convention, which makes implicit assumptions about things like which registers are used for passing arguments to C functions, and how to deal with excess arguments.
I don't know the exact differences between the EABI and the old ABI, but you can find some of them by reading up on the EABI. Debian's page mentions that the syscall convention is different, along with some alignment changes.
Given the above, of course, you cannot mix arm-elf and arm-eabi objects.
The above answer is given on the assumption that you talk to the bootloader code from your main application. Given that the interface may be very simple (just a function call with two parameters), it's possible that it might work. It would be an interesting experiment to try. However, it is not guaranteed to work.
Please keep in mind that you do not have to use the EABI. You can generate an arm-elf toolchain with gcc 4.6 just as well as with older versions. Since you're using a binary toolchain on Windows, you may have more of a challenge. I'd suggest investigating crosstool-ng, which works quite well on Linux, and may work okay on Cygwin for building the appropriate toolchain.
There is always the option of making the call to the bootloader in inline assembly, in which case you can adhere to any calling standard you need :) (a sketch follows the caveats below).
However, besides the portability issue it introduces, this approach also makes two assumptions about your bootloader and application:
you are able to detect in your app that a particular device has a bootloader built with your non-EABI toolchain, since you can only call the older type of bootloader using the assembly code.
the two parameters you mentioned are used as primitive data by your bootloader. Should the bootloader use them, for example, as pointers to structs, then you could be facing issues with incorrect alignment, padding and so forth.
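A sketch of the inline-assembly idea for a bare-metal ARM7 built in ARM state; the entry address, prototype and return value are hypothetical, and it relies only on the fact that both the old APCS and the EABI's AAPCS pass the first two arguments in r0/r1:

/* Hypothetical bootloader entry point in flash. */
#define BOOTLOADER_ENTRY  0x00000000u

static int call_bootloader(unsigned int arg0, unsigned int arg1)
{
    register unsigned int r0 asm("r0") = arg0;
    register unsigned int r1 asm("r1") = arg1;

    /* ARM7 (ARMv4T) has no "blx <reg>", so use the classic
     * "mov lr, pc; bx <reg>" call sequence. The callee may clobber
     * r2, r3, r12 and lr, so list them. Must be compiled in ARM state. */
    asm volatile(
        "mov lr, pc\n\t"
        "bx  %[entry]"
        : "+r"(r0), "+r"(r1)
        : [entry] "r"(BOOTLOADER_ENTRY)
        : "r2", "r3", "r12", "lr", "cc", "memory");

    return (int)r0;   /* assumes the bootloader returns a status in r0 */
}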
I think that this will be OK. I did a migration something like this myself; from what I remember, I only ran into a problem to do with handling division.
This is the best info I can find about the differences; it suggests that if you don't have struct alignment issues, you may be OK.

Modifying gcc to accommodate more registers

I have built a processor using PTLsim and want to test it, for educational purposes. The main feature of the processor is that it has more than 100 registers available to the code; again, this is just a proof of concept. To accommodate the code, I would like to compile a benchmark using gcc, but I want to tell gcc that I have 100 registers.
So is there any compiler, even other than gcc, that allows me to modify the register set? If gcc does, how can I modify it?
Here is the documentation about specifying the registers; it's part of specifying the target machine.
PS: I have not used it myself.
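For orientation, the register set of a GCC backend is declared in the port's target header and machine-description files; the macro names below are the ones the GCC internals documentation uses, while the values are purely illustrative for a hypothetical 100-plus-register target (this is a sketch, not a complete or buildable port):

/* In the port's <target>.h -- illustrative values only */
#define FIRST_PSEUDO_REGISTER 104                    /* hard regs 0..103; pseudos start here    */
#define FIXED_REGISTERS       { 0, 0, /* ... */ 1 }  /* one flag per hard reg: always reserved? */
#define CALL_USED_REGISTERS   { 1, 1, /* ... */ 1 }  /* which regs a call is allowed to clobber */
#define REGISTER_NAMES        { "r0", "r1", /* ... */ "r103" }
/* The .md machine description, the register classes and the calling
 * convention then describe how those registers may actually be used. */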
