Is Intel icpc + OpenMP slower than icc + OpenMP? - openmp

I am running a 3D simulation of a diffusion-reaction model using finite differences. The system has over 8 million nodes. I have built the solver with both icc + OpenMP and icpc + OpenMP, and the icc + OpenMP build turns out to be about 3 times faster than the icpc + OpenMP build. Such a large difference in run time is confusing, so I am asking whether icpc + OpenMP is actually slower than icc + OpenMP.
For more information:
intel compiler version: 15.0.1 20141023
All reaction equations are solved using C functions (not class objects).
Makefile options:
a) icc+openmp:
CC = icc
CFLAGS = -g -Wall -Ilib -O3 -openmp
LDFLAGS= -lz
Main: $(patsubst %.c,%.o,$(wildcard lib/*.c))
b) icpc+openmp:
CXX=icpc
CFLAGS = -Wall -Ilib -O3 -openmp -std=c++11
CXXFLAGS = -Wall -Ilib -O3 -openmp -std=c++11
LDFLAGS= -lz
main: $(patsubst %.cpp,%.o,$(wildcard lib/*.cpp))
Thank you!

Related

How to compile a C program in clang exactly the same as -O2 and -O3 of gcc

I want to evaluate an AVX2 program written with C intrinsics, compiling it with gcc 5.4.0 and clang 3.8 and analyzing it with perf, valgrind and IACA. I want exactly the same optimization approach in both compilers, so I read this related question on clang optimization and this page on gcc optimization options, but I am still in doubt.
gcc -O2 and gcc -O3 are my baseline, and I want the same behavior in clang. Since clang does auto-vectorization at -O2, I don't want it when comparing against gcc -O2, but I do want it when comparing against gcc -O3. So the question is: which clang commands correspond to these gcc commands?
First:
compile :
gcc -Wall -O2 -march=native -masm=intel -c -S "%f"
build:
gcc -Wall -O2 -mavx2 -o "%e" "%f"
Second:
compile :
gcc -Wall -O3 -march=native -masm=intel -c -S "%f"
build:
gcc -Wall -O3 -mavx2 -o "%e" "%f"

Stripping unused library functions / dead code from a static executable

I'm compiling code for an ARM Cortex-M0 mcu with GCC arm-none-eabi-g++ (4.8.3).
All is fine, but I noticed that when I include and use any function from cstdlib, all functions from that file are included as well. How to get rid of them?
I'm calling malloc() and free() only, but the resulting ELF has system() and isatty() machine code as well.
The mcu has only 32kB flash, so ~0.7kB ballast matters, especially if this keeps happening for other headers.
Right now I use -ffunction-sections -fdata-sections for compiling and -Wl,--gc-sections -Wl,--static while linking, as follows:
arm-none-eabi-g++ -c --std=c++11 -Os -I. -Ilpc1xxx -Idrivers -Wall -mthumb \
-ffunction-sections -fdata-sections -fmessage-length=0 -mcpu=cortex-m0 \
-DTARGET=LPC11xx -fno-builtin -flto -fno-exceptions -o main.o main.cpp
arm-none-eabi-gcc -c --std=c11 -Os -I. -Ilpc1xxx -Idrivers -Wall -mthumb \
-ffunction-sections -fdata-sections -fmessage-length=0 -mcpu=cortex-m0 \
-DTARGET=LPC11xx -fno-builtin -flto -o core_cm0.o lpc1xxx/nxp/core_cm0.c
arm-none-eabi-gcc -nostartfiles -mcpu=cortex-m0 -mthumb -Wl,--gc-sections -flto \
-Os -Wl,--static -T lpc1xxx/memory.ld -o firmware.elf main.o core_cm0.o \
libaeabi-cortexm0/libaeabi-cortexm0.a LPC11xx_handlers.o LPC1xxx_startup.o
Edit: Warning: The -flto flag in my example is wrong – somehow it discards interrupt routines.
The result is that when I do arm-none-eabi-objdump -t firmware.elf, I get among others:
00000fbc g F .text 0000002c _isatty
00001798 g F .text 00000018 fclose
00000e4c g F .text 00000030 _kill
00000e7c g F .text 00000018 _exit
00000fe8 g F .text 00000050 _system
These functions are clearly redundant (and quite useless on an MCU anyway), yet GCC keeps them in the executable. There are no calls to them, and these symbols are not referenced anywhere. It's effectively dead code.
How can I get rid of them? Are there some extra compiler/linker flags?
Edit:
Minimal code to reproduce my problem:
#include <cstdlib>
int main() {
    [[gnu::unused]] volatile void * x = malloc(1);
    return 0;
}
Command used to compile that:
arm-none-eabi-g++ --std=c++11 -Os -Wall -mthumb -ffunction-sections \
-fdata-sections -fmessage-length=0 -mcpu=cortex-m0 -fno-builtin -flto \
-fno-exceptions -Wl,--static -Wl,--gc-sections -o main.elf main.cpp
And the main.elf file still has all stdlib bloat.
Using -ffunction-sections is the right approach here, but the problem is that the object file that provides malloc and free was built without it (either LPC11xx_handlers.o, LPC1xxx_startup.o, or one of the object files within libaeabi-cortexm0.a). In that case, the linker can only include the whole object file (or, with -Wl,--gc-sections, the whole section) that contains the functions you need.
The layout of functions in object files and sections is the only thing that actually matters, not which function is defined in the same header as another function.
So to fix your issue, rebuild your standard library files with -ffunction-sections -fdata-sections.
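As a rough sketch of what that rebuild could look like, assuming the library sources are available and GNU ld is the linker: the fragment below compiles every library object with one section per function and links with garbage collection enabled. The names LIB_SRCS, libsrc/ and libmylib.a are hypothetical placeholders, and the linker script and -nostartfiles from the question are left out for brevity.

```makefile
# Hypothetical fragment: LIB_SRCS, libsrc/ and libmylib.a are placeholders.
CC  := arm-none-eabi-gcc
CXX := arm-none-eabi-g++
AR  := arm-none-eabi-ar

# One section per function and per data object, so the linker can
# discard each of them individually.
SECTION_FLAGS := -ffunction-sections -fdata-sections
CFLAGS   := -Os -mthumb -mcpu=cortex-m0 $(SECTION_FLAGS)
CXXFLAGS := $(CFLAGS) -std=c++11 -fno-exceptions

# --print-gc-sections (GNU ld) reports every section that was dropped.
LDFLAGS := -mthumb -mcpu=cortex-m0 -Wl,--gc-sections -Wl,--print-gc-sections

LIB_SRCS := $(wildcard libsrc/*.c)
LIB_OBJS := $(LIB_SRCS:.c=.o)

firmware.elf: main.o libmylib.a
	$(CC) $(LDFLAGS) -o $@ $^

libmylib.a: $(LIB_OBJS)
	$(AR) rcs $@ $^

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c -o $@ $<
```

If the linker output still lists functions such as _system after this, the object that defines them was probably not rebuilt with the section flags, or something still references it.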

gfortran make circular dependency dropped

I'm running a makefile using GNU Make 4.1 on Windows. I've seen a lot of SO links about this topic, but they all seem to be for C or C++. I'm not sure if the same rules apply, and since I'm using Windows, the syntax seems to be a bit different too.
Here's my make file:
FC = gfortran
FCFLAGS = -O0 -Og -Wall -pedantic -fbacktrace -fcheck=all
# FCFLAGS = -O2
MODDIR = "bin"
FCFLAGS += -J$(MODDIR) -fopenmp -fimplicit-none -Wuninitialized
SRCS_C =\
gridFun.f90 \
test.f90
OBJS_C = $(SRCS_C:.c=.o)
TARGET = test
all: $(TARGET)
$(TARGET): $(OBJS_C)
	$(FC) -o $@ $(FCFLAGS) $(OBJS_C)
$(OBJS_C): $(SRCS_C)
	$(FC) $(FCFLAGS) -c $(SRCS_C)
cleanMod:
	del *.mod
cleanObj:
	del *.o
I run my make file with
gmake
and I've noticed that
mingw32-make
seems to produce the same result. The error I'm getting is:
gmake: Circular gridFun.f90 <- gridFun.f90 dependency dropped.
gmake: Circular test.f90 <- gridFun.f90 dependency dropped.
gmake: Circular test.f90 <- test.f90 dependency dropped.
gfortran -O0 -Og -Wall -pedantic -fbacktrace -fcheck=all -J"bin" -fopenmp -fimplicit-none -Wuninitialized -c gridFun.f90 test.f90
gfortran -o test -O0 -Og -Wall -pedantic -fbacktrace -fcheck=all -J"bin" -fopenmp -fimplicit-none -Wuninitialized gridFun.f90 test.f90
Any help on how to fix this, and maybe an explanation, would be greatly appreciated!
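You don't have C sources, so the _C suffix on the variables is inaccurate (harmless but confusing).
The real issue is the substitution reference OBJS_C = $(SRCS_C:.c=.o).
It is meant to turn .c names into .o names, but none of your files end in .c, so nothing is replaced and OBJS_C ends up identical to SRCS_C. The rule $(OBJS_C): $(SRCS_C) then makes each .f90 file depend on itself and on the other source, which is the circular dependency make reports.
Change that to OBJS_C = $(SRCS_C:.f90=.o) and it should work for you.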
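Putting that fix together, a minimal sketch of the corrected makefile might look like the following. It renames the variables to SRCS and OBJS, drops the -J module directory to keep the example short, and assumes that test.f90 uses a module defined in gridFun.f90; that last dependency is a guess, not something stated in the question.

```makefile
FC      = gfortran
FCFLAGS = -Og -Wall -pedantic -fbacktrace -fcheck=all -fopenmp -fimplicit-none

SRCS   = gridFun.f90 test.f90
# .f90 matches the real suffix, so OBJS expands to: gridFun.o test.o
OBJS   = $(SRCS:.f90=.o)
TARGET = test

all: $(TARGET)

# Link step: $@ is the target name, here "test".
$(TARGET): $(OBJS)
	$(FC) $(FCFLAGS) -o $@ $(OBJS)

# Pattern rule: each object depends only on its own source file.
%.o: %.f90
	$(FC) $(FCFLAGS) -c -o $@ $<

# Assumption: test.f90 uses a module from gridFun.f90, so gridFun.o
# (and its .mod file) must be built before test.o.
test.o: gridFun.o
```

The pattern rule compiles one file per invocation, so make can rebuild only the object whose source changed instead of recompiling every source each time.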

Bootloader issues due to GCC-4.7.0

This is a weird problem. I have a custom bootloader for a MIPS 34Kc processor that had been booting my target consistently. It was compiled with GCC-4.2.4. We recently moved to GCC-4.7.0, and the bootloader now fails to boot the target every time.
The compiler options are as follows:
W_OPTS = -Wimplicit -Wformat -Werror
CC_OPTS = -c -O -mips32r2 $(W_OPTS) -fomit-frame-pointer -fno-pic -nostdinc -mno-abicalls
CC_OPTS_16 = -c -O -mips16 $(W_OPTS) -fomit-frame-pointer -fno-pic -nostdinc -mno-abicalls
CC_OPTS_A = $(CC_OPTS) -D_ASSEMBLER_
Any pointers to debug this issue would be helpful.

options superseded in gcc

In the Makefile of a library I am trying to build, there are a few lines that specify the options for gcc:
CFLAGS += -I$(CURDIR) -pedantic -std=c89 -O3
CFLAGS += -Wall -Wno-unused-function -Wno-long-long
CFLAGS += $(if $(DEBUG), -O0 -g)
If DEBUG is set, CFLAGS will contain both -O3 and -O0 -g. But -O0 and -O3 cannot both take effect at the same time. Will the one specified later supersede the earlier one?
Thanks and regards!
From the manpage:
If you use multiple -O options, with or without level numbers, the
last such option is the one that is effective.
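So the Makefile above works as written, because -O0 comes after -O3 on the command line. If relying on option order feels fragile, one possible alternative is to pick a single optimization level up front. This is only a sketch: OPT is a hypothetical helper variable that is not in the original Makefile, and the condition is written to mirror the non-empty test that $(if $(DEBUG), ...) performs.

```makefile
# DEBUG comes from the original Makefile; OPT is a hypothetical helper.
ifneq ($(strip $(DEBUG)),)
OPT := -O0 -g
else
OPT := -O3
endif

# Exactly one -O level ends up in CFLAGS, whatever the order of the lines.
CFLAGS += -I$(CURDIR) -pedantic -std=c89 $(OPT)
CFLAGS += -Wall -Wno-unused-function -Wno-long-long
```

Running make DEBUG=1 would then compile with -O0 -g only, and a plain make with -O3 only.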
