I'm trying to profile a program written with Intel AVX2 instructions using Valgrind. The program runs fine under memcheck, but when I run it under callgrind (valgrind --tool=callgrind), it terminates with an unrecognised-instruction error. The Valgrind 3.9.0 release notes say: "Support for Intel AVX2 instructions. This is available only on 64 bit code." I compile my program with g++-4.8 -std=c++11 -mavx2 -m64, but the error remains. Part of the output is below:
vex amd64->IR: unhandled instruction bytes: 0x16 0xC5 0xDD 0x64 0xD2 0xC5 0xF5 0xDB
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==6775== valgrind: Unrecognised instruction at address 0x43d1c9.
==6775== at 0x43D1C9: byteslice::ByteSliceColumnBlock<16ul, (byteslice::Direction)1>::Scan(byteslice::Comparator, unsigned long, byteslice::BitVectorBlock*, byteslice::Bitwise) const (avxintrin.h:965)
==6775== by 0x45DEAD: byteslice::Column::Scan(byteslice::Comparator, unsigned long, byteslice::BitVector*, byteslice::Bitwise) const (column.cpp:113)
==6775== by 0x4017C9: main (simple.cpp:89)
Edit: I found that the error depends on the optimization level. There is no error with -O0, but it shows up with -O1 and above.
I'm frankly not even sure if this is a thing GDB can do, but no amount of searching I've done so far has given me a 'yes' or 'no'.
When I try to debug an application using a GDB built for Linux and running in WSL, it cannot insert a breakpoint anywhere in the program, claiming it cannot access the memory at that address. If I do this from Windows with a GDB built for Windows, the error does not occur. (Before you ask why I don't just use the Windows build: I'm having other miscellaneous issues with that one. I may open a separate question for that as well.)
I've also gotten an internal error from GDB, but unfortunately I can't seem to reproduce it right now.
I've tried rebuilding GDB, as well as switching to another version of GDB (the same version as my Windows build).
I'm using a WSL installation of Ubuntu 20.04 and GDB 10.2, configured as follows:
(gdb) show configuration
This GDB was configured as follows:
configure --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu
--with-auto-load-dir=$debugdir:$datadir/auto-load
--with-auto-load-safe-path=$debugdir:$datadir/auto-load
--without-expat
--with-gdb-datadir=/usr/local/share/gdb (relocatable)
--with-jit-reader-dir=/usr/local/lib/gdb (relocatable)
--without-libunwind-ia64
--without-lzma
--without-babeltrace
--without-intel-pt
--without-mpfr
--without-xxhash
--without-python
--without-python-libdir
--without-debuginfod
--without-guile
--disable-source-highlight
--with-separate-debug-dir=/usr/local/lib/debug (relocatable)
To see if this was an issue with the particular program I was debugging, I made a very minimal program in NASM (my original project was also in NASM) and compiled it as follows:
nasm -f win32 -gcv8 Test.asm
gcc -m32 -g Test.obj -o Test.exe
The source assembly is very simple. It just calls printf with a string and integer.
; Test.asm
global _main
extern _printf
section .data
fmt: db "%s, %d", 0x0
string: db "Testing...", 0x0
section .bss
num: resd 1
section .text
_main:
mov dword [num], 28
push dword [num]
push string
push fmt
call _printf
add esp, 12
ret
When attempting to debug this with GDB in WSL, this is the output I get:
(gdb) file Test.exe
Reading symbols from Test.exe...
(gdb) set architecture i386:x86-64
The target architecture is set to "i386:x86-64".
(gdb) start
Temporary breakpoint 1 at 0x401520
Starting program: /mnt/c/NASM/Test.exe
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x401520
EDIT: After poking at it some more, I discovered something that seems important. GDB is only unable to access the memory and place breakpoints when the program is running. Before I've started the program, I can place breakpoints and disassemble freely.
(gdb) disas main
Dump of assembler code for function main:
0x00401520 <+0>: mov DWORD PTR ds:0x405028,0x1c
0x0040152a <+10>: push DWORD PTR ds:0x405028
0x00401530 <+16>: push 0x40300b
0x00401535 <+21>: push 0x403004
0x0040153a <+26>: call 0x40249c <printf>
0x0040153f <+31>: add esp,0xc
0x00401542 <+34>: ret
0x00401543 <+35>: xchg ax,ax
0x00401545 <+37>: xchg ax,ax
0x00401547 <+39>: xchg ax,ax
0x00401549 <+41>: xchg ax,ax
0x0040154b <+43>: xchg ax,ax
0x0040154d <+45>: xchg ax,ax
0x0040154f <+47>: nop
End of assembler dump.
(gdb) b *main+26
Breakpoint 1 at 0x40153a
(gdb) run
Starting program: /mnt/c/NASM/Test.exe
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x40153a
(gdb) disas main
Dump of assembler code for function main:
0x00401520 <+0>: Cannot access memory at address 0
EDIT 2:
I don't know how useful this information might be, but I did find a method that consistently causes an internal error for GDB. Starting execution of the program, then setting the architecture to auto causes an internal error every time I've tried it.
(gdb) file Test.exe
Reading symbols from Test.exe...
(gdb) start
Temporary breakpoint 1 at 0x401520
Starting program: /mnt/c/NASM/Test.exe
warning: Selected architecture i386 is not compatible with reported target architecture i386:x86-64
warning: Architecture rejected target-supplied description
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x401520
(gdb) set architecture auto
warning: Selected architecture i386 is not compatible with reported target architecture i386:x86-64
/mnt/c/Users/Joshua/gdb-10.2/gdb/arch-utils.c:503: internal-error: could not select an architecture automatically
A problem internal to GDB has been detected,
further debugging may prove unreliable.
If the answer to this really is as simple as "GDB built for Linux can't debug applications built for Windows"... I'll be very sad, and also quite annoyed that I was unable to find that info anywhere.
I have an ARMv8 inline assembly segment:
/* get leading 0 of cache ways */
__asm__ __volatile__
(
"CLZ %w[shift], %w[maxWay] \n"
: [shift] "=r" (uiShift)
: [maxWay] "r" (uiMaxWay)
);
When I compile it with the ARM GCC compiler, I get an error. Interestingly, if I compile with the Linaro compiler, there is no problem.
Is the problem in the ARM GCC compiler, or in my code?
Unlike x86 where the same compiler can produce x86-32 or x86-64 code with -m32 and -m64, you need a separate build of gcc for ARM vs. AArch64.
ARM gcc accepts -march=armv8-a, but it's still compiling in 32-bit ARM mode, not AArch64.
I can reproduce your problem on the Godbolt compiler explorer with AArch64 gcc and ARM gcc. (And I included an example that uses __builtin_clz(uiShift) instead of inline asm, so it compiles to a clz instruction on either architecture.)
BTW, you could have left out the w size override on both operands and simply used unsigned int for the input and output. Then the same inline asm would work with both ARM and AArch64. (But __builtin_clz is still better, because the compiler understands what it does; e.g. it knows the result is in the range 0..31, which may enable some optimizations.)
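As a rough illustration of the __builtin_clz suggestion, here is a minimal sketch. The function name count_leading_zeros is hypothetical (it does not appear in the question), and the sketch assumes a GCC-compatible compiler and a 32-bit unsigned int. Because there is no inline asm, the same source should build for both ARM and AArch64.

```c
#include <assert.h>

/* Hypothetical helper, not from the original post: count the leading
 * zero bits of a non-zero unsigned int with the GCC/Clang builtin
 * instead of a hand-written CLZ instruction. */
unsigned count_leading_zeros(unsigned value)
{
    /* The result of __builtin_clz is undefined for 0, so reject it. */
    assert(value != 0u);
    return (unsigned)__builtin_clz(value);
}
```

In the question's code this would presumably be called as uiShift = count_leading_zeros(uiMaxWay); whether uiMaxWay can ever be zero there is not something the post tells us.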
I am trying to compile Apple's Libm (version 2026, tarball here). The only file that fails to compile is Source/Intel/frexp.s, with the following errors:
/<path>/Libm-2026/Source/Intel/frexp.s:239:5:
error: invalid instruction mnemonic 'movsxw'
movsxw 8+(8 + (32 - 8))(%rsp), %eax
^~~~~~
/<path>/Libm-2026/Source/Intel/frexp.s:291:5:
error: invalid instruction mnemonic 'movsxw'
movsxw 8(%rsp), %eax
^~~~~~
Looking around on the Internet, I can find only scant details about the movsxw instruction, but it does appear to exist for i386 architectures. I am running OS X 10.9.3 on a Core i5 processor. The macro __x86_64__ is predefined; however, it seems the __i386__ macro is NOT *.
I was under the impression that the x86_64 instruction set was fully compatible with the i386 set. Is this incorrect? I can only assume that the movsxw instruction does not exist in the x86_64 instruction set, thus my question is: what does it do, and what can I replace it with from the x86_64 instruction set?
*Checked with: clang -dM -E -x c /dev/null
The canonical AT&T syntax for movsxw is movswl, although at least some assembler versions seem to accept the former too.
movsxb : Sign-extend a byte into the second operand
movsxw : Sign-extend a word (16 bits) into the second operand
movsxl : Sign-extend a long (32 bits) into the second operand
movsxw assembles just fine for me in 64-bit mode using gcc/as (4.8.1/2.24). I don't have clang for x86 installed on this machine, but you could try specifying the size of the second operand (i.e. change movsxw to movsxwl, which would be "sign-extend word into long").
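To make the "sign-extend word into long" semantics concrete, here is a small C sketch of the same operation. The function name sign_extend_word is made up for illustration. It models only the arithmetic result of a 16-bit to 32-bit sign extension, not the instruction encoding.

```c
#include <stdint.h>

/* Hypothetical helper: mimic the effect of movswl by widening a 16-bit
 * value to 32 bits while preserving its sign. */
int32_t sign_extend_word(uint16_t word)
{
    /* If bit 15 is set, the value is negative in two's complement, so
     * subtract 2^16; otherwise the value is unchanged. */
    return (word & 0x8000u) ? (int32_t)word - 0x10000 : (int32_t)word;
}
```

For example, the word 0xFFFF becomes -1 and 0x7FFF stays 32767, which is what the frexp.s code would expect to find in %eax after loading a 16-bit value from the stack.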
I looked up the help for -march by running gcc --target-help:
-march=CPU[,+EXTENSION...]
generate code for CPU and EXTENSION, CPU is one of: i8086,
i186, i286, i386, i486, pentium, pentiumpro, pentiumii,
pentiumiii, pentium4, prescott, nocona, core, core2,
corei7, l1om, k6, k6_2, athlon, k8, amdfam10, generic32,
generic64 EXTENSION is combination of: 8087, 287, 387,
no87, mmx, nommx, sse, sse2, sse3, ssse3, sse4.1, sse4.2,
sse4, nosse, avx, noavx, vmx, smx, xsave, movbe, ept, aes,
pclmul, fma, clflush, syscall, rdtscp, 3dnow, 3dnowa,
sse4a, svme, abm, padlock, fma4, xop, lwp
I tried -march=i686+nommx and -march=i686,+nommx, but neither is accepted. gcc reports: error: bad value (i686,+nommx) for -march= switch
I want to build my program for an i686 target without MMX. How should I set the -march option?
When I compiled a program I was writing in C++ (on the latest MacBook Pro, which of course supports the AVX instruction set), I got the following errors. I am using the latest release of g++ obtained from MacPorts. Do you have any ideas as to what I can do to fix the errors without restricting the instruction sets available to the compiler? Is there any package in particular that I should try to update?
g++-mp-4.7 -std=c++11 -Wall -Ofast -march=native -fno-rtti src/raw_to_json.cpp -o bin/raw_to_json.bin
/var/folders/83/tjczqmxn1y9166m642_rxdlw0000gn/T//cc0hIx0w.s:1831:no such instruction: `vpxor %xmm0, %xmm0,%xmm0'
/var/folders/83/tjczqmxn1y9166m642_rxdlw0000gn/T//cc0hIx0w.s:1847:no such instruction: `vmovdqa %xmm0, 96(%rsp)'
/var/folders/83/tjczqmxn1y9166m642_rxdlw0000gn/T//cc0hIx0w.s:1848:no such instruction: `vmovdqa %xmm0, 112(%rsp)'
/var/folders/83/tjczqmxn1y9166m642_rxdlw0000gn/T//cc0hIx0w.s:1849:no such instruction: `vmovdqa %xmm0, 128(%rsp)'
/var/folders/83/tjczqmxn1y9166m642_rxdlw0000gn/T//cc0hIx0w.s:1850:no such instruction: `vmovdqa %xmm0, 144(%rsp)'
/var/folders/83/tjczqmxn1y9166m642_rxdlw0000gn/T//cc0hIx0w.s:1851:no such instruction: `vmovdqa %xmm0, 160(%rsp)'
Thanks for the help!
A simpler solution that fixed this problem for me was adding -Wa,-q to the compiler flags. From the man pages for as (version 1.38):
-q
Use the clang(1) integrated assembler instead of the GNU based system assembler.
Fixed thanks to Conrado PLG's answer to his own question here. In short, I had to do the following:
Move or otherwise get rid of the old as, found at /opt/local/bin/../local/libexec/as/x86_64/as.
Copy the script by Vincent Habchi, found here, to /opt/local/bin/../local/libexec/as/x86_64/as.
sudo chmod +x the script.
Note that there may be some performance degradation, because calling the assembler now goes through a shell script first.