I'm trying to capture outgoing ethernet frames on the local host before they are sent by inserting a kprobe into __dev_queue_xmit().
However, the bytes I extract from the sk_buff structure do not match the subsequently captured packets.
So far I have only attempted it for linear skbs, because I already get unexpected results there.
For example, my kprobe reported the following information during a call to __dev_queue_xmit():
COMM PID TGID LEN DATALEN
chronyd 1058 1058 90 0
3431c4b06a8b3c7c3f2023bd08006500d0a57f040f7f0000000000000000000000000000000000006018d11a0f7f00000100000000000000000000000000000060a67f040f7f0000000000000000000000000000000000004001
COMM is the name of the process which called the function,
PID is the calling thread's id and TGID its thread group id. LEN is the value of (skb->len - skb->data_len) and DATALEN is skb->data_len.
The program then copied LEN (in this case 90) bytes starting at skb->data.
Since DATALEN is zero, this is a linear skb. Thus, those bytes should contain exactly the frame which is about to be sent, shouldn't they?
Well, Wireshark subsequently recorded this frame:
0000 34 31 c4 b0 6a 8b 3c 7c 3f 20 23 bd 08 00 45 00
0010 00 4c 83 93 40 00 40 11 d1 a2 c0 a8 b2 18 c0 a8
0020 b2 01 c8 07 00 7b 00 38 e5 b4 23 00 06 20 00 00
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0050 00 00 38 bc 17 13 12 4a 4c c0
The first 14 bytes, which form the Ethernet header, match perfectly, as expected. Nothing after that matches at all.
The question now is: Why do the bytes not match up?
(Yes, I am certain the frame from Wireshark is indeed the one caused by this call to __dev_queue_xmit(). This is because only background programs using the network were running at the time, so the amount of outgoing traffic was rather small. Additionally, the captured frame contains, as expected, 90 bytes. Also, this frame holds an NTP payload, which is just what you'd expect from chronyd.)
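For reference, LEN is the length of the skb's linear part, which the kernel itself exposes as skb_headlen():

/* include/linux/skbuff.h */
static inline unsigned int skb_headlen(const struct sk_buff *skb)
{
    return skb->len - skb->data_len;
}

For a linear skb this equals skb->len, so the whole frame should sit in the linear area starting at skb->data.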
My kernel version is 5.12.6-200.fc33.x86_64.
If you want to try it out yourself or have a closer look at my program, here it is:
from bcc import BPF
from ctypes import cast, POINTER, c_char
prog = """
#include <linux/sched.h>
#include <linux/skbuff.h>
struct xmit_event {
u64 ts;
u32 pid;
u32 tgid;
u32 len;
u32 datalen;
u32 packet_buf_ptr;
char comm[TASK_COMM_LEN];
u64 head;
u64 data;
u64 tail;
u64 end;
};
BPF_PERF_OUTPUT(xmits);
#define PACKET_BUF_SIZE 32768
#define PACKET_BUFS_PER_CPU 15
struct packet_buf {
char data[PACKET_BUF_SIZE];
};
BPF_PERCPU_ARRAY(packet_buf, struct packet_buf, PACKET_BUFS_PER_CPU);
BPF_PERCPU_ARRAY(packet_buf_head, u32, 1);
int kprobe____dev_queue_xmit(struct pt_regs *ctx, struct sk_buff *skb, void *accel_priv) {
if (skb == NULL || skb->data == NULL)
return 0;
struct xmit_event data = { };
u64 both = bpf_get_current_pid_tgid();
data.pid = both;
if (data.pid == 0)
return 0;
data.tgid = both >> 32;
data.ts = bpf_ktime_get_ns();
bpf_get_current_comm(&data.comm, sizeof(data.comm));
data.len = skb->len;
// Copy packet contents
int slot = 0;
u32 *packet_buf_ptr = packet_buf_head.lookup(&slot);
if (packet_buf_ptr == NULL)
return 0;
u32 buf_head = *packet_buf_ptr;
u32 next_buf_head = (buf_head + 1) % PACKET_BUFS_PER_CPU;
packet_buf_head.update(&slot, &next_buf_head);
struct packet_buf *ringbuf = packet_buf.lookup(&buf_head);
if (ringbuf == NULL)
return 0;
u32 skb_data_len = skb->data_len;
u32 headlen = data.len - skb_data_len;
headlen &= 0xffffff; // Useless, but validator demands it because "this unsigned(!) variable could otherwise be negative"
bpf_probe_read_kernel(ringbuf->data, headlen < PACKET_BUF_SIZE ? headlen : PACKET_BUF_SIZE, skb->data);
data.packet_buf_ptr = buf_head;
data.len = headlen;
data.datalen = skb_data_len;
data.head = (u64) skb->head;
data.data = (u64) skb->data;
data.tail = (u64) skb->tail;
data.end = (u64) skb->end;
xmits.perf_submit(ctx, &data, sizeof(data));
return 0;
}
"""
global b
def xmit_received(cpu, data, size):
global b
global py_packet_buf
ev = b["xmits"].event(data)
print("%-18d %-25s %-8d %-8d %-10d %-10d %-12d %-12d %-12d %-12d" % (ev.ts, ev.comm.decode(), ev.pid, ev.tgid, ev.len, ev.datalen, ev.head, ev.data, ev.tail, ev.end))
bs = cast(py_packet_buf[ev.packet_buf_ptr][cpu].data, POINTER(c_char))[:ev.len]
c = bytes(bs)
print(c.hex())
def observe_kernel():
# load BPF program
global b
b = BPF(text=prog)
print("%-18s %-25s %-8s %-8s %-10s %-10s %-12s %-12s %-12s %-12s" % ("TS", "COMM", "PID", "TGID", "LEN", "DATALEN", "HEAD", "DATA", "TAIL", "END"))
b["xmits"].open_perf_buffer(xmit_received)
global py_packet_buf
py_packet_buf = b["packet_buf"]
try:
while True:
b.perf_buffer_poll()
except KeyboardInterrupt:
print("Kernel observer thread stopped.")
observe_kernel()
Found the issue.
I needed to replace
struct packet_buf {
char data[PACKET_BUF_SIZE];
};
with
struct packet_buf {
unsigned char data[PACKET_BUF_SIZE];
};
I, however, do not understand how signedness makes a difference when I am not performing comparisons or arithmetic operations with this data.
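For what it's worth, the signedness cannot change the bytes that bpf_probe_read_kernel() copies into the buffer; a trivial sketch of that (plain C, nothing BPF-specific):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char          s[4] = { '\x34', '\x31', '\xc4', '\xb0' };
    unsigned char u[4] = { 0x34, 0x31, 0xc4, 0xb0 };

    /* Identical object representation: prints "same bytes". */
    printf("%s\n", memcmp(s, u, sizeof s) == 0 ? "same bytes" : "different bytes");
    return 0;
}

So the difference is presumably on the user-space side: BCC mirrors the struct into Python, and a char[] array and an unsigned char[] array end up as different ctypes types, which Python then presents differently when the field is read back.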
I would like to create an icon manually with CreateIconIndirect as follows:
HDC hDC = ::CreateCompatibleDC( nullptr );
BITMAPINFO bmiMask = {};
bmiMask.bmiHeader.biSize = sizeof( bmiMask.bmiHeader );
bmiMask.bmiHeader.biWidth = 16;
bmiMask.bmiHeader.biHeight = -16; // starts with top row
bmiMask.bmiHeader.biPlanes = 1;
bmiMask.bmiHeader.biBitCount = 32;
BYTE *byMask = nullptr;
HBITMAP hbmMask = ::CreateDIBSection( hDC, &bmiMask, DIB_RGB_COLORS,
reinterpret_cast< void** >( &byMask ),
nullptr, 0 );
BYTE bgraMask[] = { 0x00, 0x00, 0x00, 0x00 };
for( int i = 0; i < 16 * 16; i++ )
for( int j = 0; j < 4; j++ )
byMask[ i * 4 + j ] = bgraMask[ j ];
byMask[ 0 ] = 0x00; byMask[ 1 ] = 0x00; byMask[ 2 ] = 0x00; byMask[ 3 ] = 0x00;
BITMAPINFO bmiColor = {};
bmiColor.bmiHeader.biSize = sizeof( bmiColor.bmiHeader );
bmiColor.bmiHeader.biWidth = 16;
bmiColor.bmiHeader.biHeight = -16; // starts with top row
bmiColor.bmiHeader.biPlanes = 1;
bmiColor.bmiHeader.biBitCount = 32;
BYTE *byColor = nullptr;
HBITMAP hbmColor = ::CreateDIBSection( hDC, &bmiColor, DIB_RGB_COLORS,
reinterpret_cast< void** >( &byColor ),
nullptr, 0 );
BYTE bgraColor[] = { 0xff, 0xff, 0xff, 0xff };
for( int i = 0; i < 16 * 16; i++ )
for( int j = 0; j < 4; j++ )
byColor[ i * 4 + j ] = bgraColor[ j ];
byColor[ 0 ] = 0x00; byColor[ 1 ] = 0x00; byColor[ 2 ] = 0x00; byColor[ 3 ] = 0x00;
ICONINFO ii = {};
ii.fIcon = TRUE;
ii.xHotspot = ii.yHotspot = 0;
ii.hbmMask = hbmMask;
ii.hbmColor = hbmColor;
HICON hIcon = ::CreateIconIndirect( &ii );
::SendMessage( hwndDialog, WM_SETICON, ICON_SMALL,
reinterpret_cast< LPARAM >( hIcon ) );
According to MSDN ( https://learn.microsoft.com/en-us/previous-versions/dd183376(v=vs.85) ) each pixel's data consists of 4 bytes: blue, green, red, and an unused byte.
I made some experiments with the data by changing the values of byMask and byColor, then taking a screenshot and reading the exact RGB value in MS Paint. (Each time I placed Notepad directly behind the application's window to have a constant background for any transparency / alpha-channel effects.)
First I changed only the top left corner, while the rest of the data was 0x00 for the mask and 0xff for the color. The result: most of the icon was white (as expected) and the top left pixel had the following colors:
MR MA CR CA OR OG OB
00 00 00 ff 00 00 00
00 00 00 00 d3 e9 fe
00 00 ff 00 d3 e9 fe (a)
ff 00 00 00 d3 e9 fe
ff ff 00 00 d3 e9 fe
00 ff 00 00 d3 e9 fe
00 00 ff ff ff 00 00 (b)
ff 00 ff ff ff 00 00
ff ff ff ff ff 00 00
00 ff ff ff ff 00 00
After that I changed each pixel in both bitmaps to the same value. (And checked with MS Paint's fill tool that the picture has only one color.)
MR MA CR CA OR OG OB
00 00 00 ff 00 00 00
00 00 00 00 00 00 00
00 00 ff 00 ff 00 00 (c)
ff 00 00 00 00 00 00
ff ff 00 00 00 00 00
00 ff 00 00 00 00 00
00 00 ff ff ff 00 00 (d)
ff 00 ff ff ff 00 00 (d)
ff ff ff ff ff 00 00 (d)
00 ff ff ff ff 00 00 (d)
Abbreviations: M = mask, C = color, O = observation, R = red, G = green, B = blue, A = alpha / 4th byte
I don't understand the following (letters refer to rows in the above table):
If the 4th byte is ignored and not an alpha channel, why is pixel (a) not ff0000 like (b)?
If I change other pixels only, why does the first one change: (c) vs (a)?
Why is the result of all rows marked with (d) the same, why does the mask have no effect here?
Is there a bug in my code? Am I looking at the wrong MSDN page?
OK, I figured it out. For historical reasons there are two types of 32 bit/pixel bitmaps: those that use 3 bytes for the 3 color channels and leave the fourth byte unused (usually set to 0), and those that use the fourth byte to represent the alpha channel.
The catch is that the type is not indicated by any field: Windows treats the bitmap as the old format (without alpha) if every fourth byte is zero, and as the new format (with alpha) if any fourth byte differs from zero. So this will result in a fully black icon:
for( int i = 0; i < 16 * 16; i++ )
for( int j = 0; j < 4; j++ )
byColor[ i * 4 + j ] = 0x00;
But if I add this line to the above, the whole icon becomes transparent*:
byColor[ (7 * 16 + 7) * 4 + 3 ] = 0x01;
* Except for the pixel in row 7 column 7, which will be only 255/256 transparent.
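Conversely, to get a fully opaque icon under the "new" interpretation, set the fourth byte of every pixel to 0xff instead of leaving it at zero (a sketch only, reusing byColor from the code above):

// With at least one non-zero 4th byte present, the color bitmap is treated
// as carrying alpha, so make every pixel explicitly opaque white.
for( int i = 0; i < 16 * 16; i++ )
{
    byColor[ i * 4 + 0 ] = 0xff; // blue
    byColor[ i * 4 + 1 ] = 0xff; // green
    byColor[ i * 4 + 2 ] = 0xff; // red
    byColor[ i * 4 + 3 ] = 0xff; // alpha: fully opaque
}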
The following dBase code invokes a win32 API function to convert a local DST time to a system time. The first parameter set to "null" means that the function takes the current active time zone. What value do I have to put instead of "null" to specify another time zone?
The following page describes lpTimeZoneInformation as a pointer to a TIME_ZONE_INFORMATION structure that specifies the time zone for the local time input (lpLocalTime), but it is unclear to me what kind of pointer this is.
I have tried 'Brisbane', 'E. Australia Standard Time', '10:00' and '+10:00' but none returns the expected value.
https://learn.microsoft.com/en-us/windows/win32/api/timezoneapi/nf-timezoneapi-tzspecificlocaltimetosystemtime
ITOH and HTOI are Integer TO Hex and vice-versa conversion functions
localtime and systemtime structures work, I tried to replicate that for the time_zone_information part but without success so far
As it stands, the return value is 13.20
Thanks for any help!
d=new date("31/12/2020 5:08")
offset1=getLocalTimeOffset(d)/60
function getLocalTimeOffset(d_in)
// todo typechecking of the parameter
extern clogical TzSpecificLocalTimeToSystemTime(cptr,cptr,cptr) kernel32
extern culong GetLastError(cvoid) kernel32
local systemtime,localtime,tmp
localtime = replicate(chr(0),16)
systemtime = replicate(chr(0),16)
TZI = replicate(chr(0),16)
TZIa=itoh(-600,4)
TZIb=itoh(-60,4)
TZI.setbyte(1,htoi(left(TZIa,2)))
TZI.setbyte(0,htoi(right(TZIa,2)))
TZI.setbyte(9,htoi(left(TZIb,2)))
TZI.setbyte(8,htoi(right(TZIb,2)))
tmp = itoh(d_in.year,4)
localtime.setbyte(1,htoi(left(tmp,2))) // fill the systemtime structure
localtime.setbyte(0,htoi(right(tmp,2))) // seconds and ms are of no concern
localtime.setbyte(2,d_in.month+1)
localtime.setbyte(4,d_in.day)
localtime.setbyte(6,d_in.date)
localtime.setbyte(8,d_in.hour)
localtime.setbyte(10,d_in.minute)
if TzSpecificLocalTimeToSystemTime(TZI,localtime,systemtime) = 0
tmp = getlasterror() ; ? "Error: "+tmp ; return 9999
endif
tmp = sign(d_in.date-systemtime.getbyte(6))*24*60 // consider day boundary
if (d_in.date = 1 or systemtime.getbyte(6) = 1) and (d_in.month+1 <> systemtime.getbyte(2))
tmp = -tmp // adjust for month boundaries
endif
tmp += (d_in.hour - systemtime.getbyte(8))*60
tmp += d_in.minute - systemtime.getbyte(10)
return tmp
(Too long for a comment.) The first parameter to TzSpecificLocalTimeToSystemTime must be either NULL, or otherwise point to a TIME_ZONE_INFORMATION structure, filled-in with the target timezone data from HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones per Remarks on the same page.
In OP's case, Brisbane falls under the E. Australia Standard Time key, and TZI data parses as:
typedef struct _REG_TZI_FORMAT
{
LONG Bias; // -600 A8 FD FF FF
LONG StandardBias; // 0 00 00 00 00
LONG DaylightBias; // -60 C4 FF FF FF
SYSTEMTIME StandardDate; // n/a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SYSTEMTIME DaylightDate; // n/a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
} REG_TZI_FORMAT;
Following is the C code to fill-in a TIME_ZONE_INFORMATION structure with the same data, and successfully convert a Brisbane local time to UTC:
#include <windows.h>
#include <stdio.h>
int main()
{
TIME_ZONE_INFORMATION tzEAST = // [offset] bytes
{
-600, // LONG Bias; [0] A8 FD FF FF
{ 0 }, // WCHAR StandardName[32]; [4] 00 .. 00
{ 0 }, // SYSTEMTIME StandardDate; [68] 00 .. 00
0, // LONG StandardBias; [84] 00 00 00 00
{ 0 }, // WCHAR DaylightName[32]; [88] 00 .. 00
{ 0 }, // SYSTEMTIME DaylightDate; [152] 00 .. 00
-60 // LONG DaylightBias; [168] C4 FF FF FF
};
SYSTEMTIME stEAST = { 2021, 1, 1, 4, 12 }, stUTC = { 0 };
if(!TzSpecificLocalTimeToSystemTime(&tzEAST, &stEAST, &stUTC)) return 1;
printf("EAST %d-%02d-%02d %02d:%02d:%02d = UTC %d-%02d-%02d %02d:%02d:%02d\n",
stEAST.wYear, stEAST.wMonth, stEAST.wDay, stEAST.wHour, stEAST.wMinute, stEAST.wSecond,
stUTC.wYear, stUTC.wMonth, stUTC.wDay, stUTC.wHour, stUTC.wMinute, stUTC.wSecond);
return 0;
}
Output:
EAST 2021-01-04 12:00:00 = UTC 2021-01-04 02:00:00
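For completeness, the same TZI blob can be read straight from the registry rather than hard-coded; the following is only a sketch (error handling kept minimal, REG_TZI_FORMAT declared locally since, to my knowledge, it is documented but not declared in the SDK headers; link with Advapi32.lib):

#include <windows.h>
#include <string.h>

// Registry layout of the "TZI" value, as documented with TIME_ZONE_INFORMATION.
typedef struct _REG_TZI_FORMAT
{
    LONG Bias;
    LONG StandardBias;
    LONG DaylightBias;
    SYSTEMTIME StandardDate;
    SYSTEMTIME DaylightDate;
} REG_TZI_FORMAT;

static BOOL LoadTimeZone(const wchar_t *subKey, TIME_ZONE_INFORMATION *tzi)
{
    // e.g. subKey = L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
    //               L"Time Zones\\E. Australia Standard Time"
    REG_TZI_FORMAT rtzi;
    DWORD cb = sizeof(rtzi);

    if (RegGetValueW(HKEY_LOCAL_MACHINE, subKey, L"TZI",
                     RRF_RT_REG_BINARY, NULL, &rtzi, &cb) != ERROR_SUCCESS)
        return FALSE;

    memset(tzi, 0, sizeof(*tzi));        // StandardName/DaylightName left empty
    tzi->Bias         = rtzi.Bias;
    tzi->StandardBias = rtzi.StandardBias;
    tzi->DaylightBias = rtzi.DaylightBias;
    tzi->StandardDate = rtzi.StandardDate;
    tzi->DaylightDate = rtzi.DaylightDate;
    return TRUE;
}

With that, tzEAST above could be filled from the "E. Australia Standard Time" key instead of being written out by hand.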
[ EDIT ] Following is my guess of what the dBase code might look like. Just a guess, and nothing more than a guess, since I don't actually know dBase beyond what's been posted here.
tzBias = itoh(-600, 8)
tzDstBias = itoh( -60, 8)
tzi = replicate(chr(0), 86) // 86*2 = 172 = sizeof TIME_ZONE_INFORMATION
tzi.setbyte( 0, htoi(substring( tzBias, 6, 8))) // [ 0] LONG Bias;
tzi.setbyte( 1, htoi(substring( tzBias, 4, 6)))
tzi.setbyte( 2, htoi(substring( tzBias, 2, 4)))
tzi.setbyte( 3, htoi(substring( tzBias, 0, 2)))
tzi.setbyte(168, htoi(substring(tzDstBias, 6, 8))) // [168] LONG DaylightBias;
tzi.setbyte(169, htoi(substring(tzDstBias, 4, 6)))
tzi.setbyte(170, htoi(substring(tzDstBias, 2, 4)))
tzi.setbyte(171, htoi(substring(tzDstBias, 0, 2)))
if TzSpecificLocalTimeToSystemTime(tzi, localtime, systemtime) = 0 // ...
[ EDIT #2 courtesy OP ] The working dBase code to fill the structure is the following:
tzi.setbyte( 0, htoi(substr(tzBias, 7, 2))) // [ 0] LONG Bias
tzi.setbyte( 1, htoi(substr(tzBias, 5, 2)))
tzi.setbyte( 2, htoi(substr(tzBias, 3, 2)))
tzi.setbyte( 3, htoi(substr(tzBias, 1, 2)))
tzi.setbyte(168, htoi(substr(tzDstBias, 7,2))) // [168] LONG DaylightBias
tzi.setbyte(169, htoi(substr(tzDstBias, 5,2)))
tzi.setbyte(170, htoi(substr(tzDstBias, 3,2)))
tzi.setbyte(171, htoi(substr(tzDstBias, 1,2)))
Assume we have a 64-bit x86 machine, which is little-endian and therefore stores the least-significant byte of a word in the byte with the lowest address.
Assuming standard alignment rules for a 64-bit x86 Linux C compiler.
Consider
File 1:
#include <stdio.h>
struct cs {
int count;
unsigned short flags;
};
struct cs gcount;
extern void add_counter( int n );
int main(int c, char *argv[]);
int main(int c, char *argv[]) {
gcount.flags = 0xe700;
gcount.count = 1;
add_counter(42);
printf("count =%d\n", gcount.count);
return 0;
}
File 2:
struct cs {
unsigned short flags;
int count;
};
struct cs gcount = {0,0};
void add_counter (int n) {
gcount.count +=n;
}
If compiled and linked together, the output is 1.
Explanation:
gcount is defined as a strong global in the second file and is thus
initialized to {0,0}; here the order doesn't matter yet since it's
just all zeroes.
A struct / type is defined per compilation unit, so the first file uses
the first definition to write to the struct, meaning
gcount.flags = 0xe700; gcount.count=1;
cause the memory to look like
[e7 00 | 00 00 00 01] where (in little endian) the left is the top and
the right is the bottom of memory.
(there's no padding between the two fields since short is at the end,
sizeof will report 8B though)
when calling add_counter(42), the second file will use the second
definition of cs and look at the memory as
[e7 00 00 00 | 00 01]
Now there's a 2B padding in between the two fields and the write
access to the count will thus affect the range
[e7 00 00 00 | 00 01]
42 is 0x2a in hexadecimal (2*16 + 10) and will thus result in
[e7 2a 00 00 | 00 01]
converting this back to the view the first file has we get
[e7 2a | 00 00 00 01]
and thus the result is 1 instead of the expected 43.
Now I do get the general gist but I'm a bit confused about why we get [*e7 2a* 00 00 | 00 01] when adding 42=0x2a and not [*e7 00 00 2a | 00 01].
I'm expecting [*e7 00 00 2a | 00 01] because we are using little-endian, meaning, the most right bit is the LSB. So e7 would actually represent the most significant 8 bits here.
My disparaging comments about the exercise itself notwithstanding, it is possible to interpret the question as a simpler one about byte ordering. In that sense, the issue is with this assertion:
little-endian, meaning, the most right bit is the LSB.
Little-endian means that the bytes are ordered from least significant to most significant. The term having been coined in English, and English being written left to right, that means the leftmost byte (the one at the lowest address, the one written first) is the LSB in little-endian ordering.
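If in doubt, this is easy to check directly (a minimal sketch, independent of the struct exercise):

#include <stdio.h>

int main(void)
{
    unsigned int x = 42;                          /* 0x0000002a */
    const unsigned char *p = (const unsigned char *)&x;

    /* On a little-endian x86-64 machine this prints "2a 00 00 00":
       the least significant byte sits at the lowest address, i.e. it is
       the leftmost byte when memory is listed in increasing address order. */
    for (size_t i = 0; i < sizeof x; i++)
        printf("%02x ", p[i]);
    printf("\n");
    return 0;
}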
I have an issue with a stack-use-after-scope error with the C++ Armadillo library within an OpenMP block in an R package, and I cannot figure out what is wrong. The complete gcc log is here from the CRAN GCC ASAN check of the R package. I have kept the relevant part of the log below
==33791==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd03364940 at pc 0x7ff8127abc07 bp 0x7ffd03364680 sp 0x7ffd03364670
WRITE of size 4 at 0x7ffd03364940 thread T0
#0 0x7ff8127abc06 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool) /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215
#1 0x7ff8129fb0c2 in GMA<logistic>::solve() [clone ._omp_fn.0] /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:411
#2 0x7ff825ae2cde in GOMP_parallel (/lib64/libgomp.so.1+0xdcde)
#3 0x7ff812a0c9f8 in GMA<logistic>::solve() ddhazard/GMA_solver.cpp:83
#4 0x7ff81276421d in ddhazard_fit_cpp(...
Address 0x7ffd03364940 is located in stack of thread T0 at offset 416 in frame
#0 0x7ff8129fa82f in GMA<logistic>::solve() [clone ._omp_fn.0] ddhazard/GMA_solver.cpp:83
This frame has 5 object(s):
[32, 40) 'dest'
[96, 104) 'src'
[160, 176) 'ans'
[224, 384) 'my_X_cross'
[416, 576) '<unknown>' <== Memory access at offset 416 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool)
Shadow bytes around the buggy address:
0x1000206648d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000206648e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000206648f0: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2
0x100020664900: 00 f2 f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f2 f2 f2 f2
0x100020664910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100020664920: 00 00 00 00 f2 f2 f2 f2[f8]f8 f8 f8 f8 f8 f8 f8
0x100020664930: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f3 f3 f3 f3
0x100020664940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==33791==ABORTING
The WRITE that causes the error is in dynamichazard/src/ddhazard/GMA_solver.cpp, specifically in this OpenMP block
#ifdef _OPENMP
int n_threads = std::max(1, std::min(omp_get_max_threads(),
(int)r_set.n_elem / 1000 + 1));
#pragma omp parallel num_threads(n_threads) if(n_threads > 1)
{
#endif
arma::mat my_X_cross(q, q, arma::fill::zeros);
#ifdef _OPENMP
#pragma omp for schedule(static)
#endif
for(arma::uword i = 0; i < r_set.n_elem; i++){
auto trunc_eta = T::truncate_eta(
is_event[i], eta[i], exp(eta[i]), at_risk_length[i]);
h_1d[i] = w[i] * T::d_log_like(
is_event[i], trunc_eta, at_risk_length[i]);
double h_2d_neg = - w[i] * T::dd_log_like(
is_event[i], trunc_eta, at_risk_length[i]);
sym_mat_rank_one_update(h_2d_neg, X_t.unsafe_col(i), my_X_cross);
}
#ifdef _OPENMP
#pragma omp critical(gma_lock)
{
#endif
X_cross += my_X_cross;
#ifdef _OPENMP
}
}
#endif
As far as I can tell, the error is at the X_t.unsafe_col(i) call in the call to sym_mat_rank_one_update. The declaration of the function is
void sym_mat_rank_one_update(const double, const arma::vec&, arma::mat&);
It should trigger a call to the arma::Col<double> constructor in line 411 of include/armadillo_bits/Col_meat.hpp, which in turn calls the arma::Mat<double> constructor in line 1215 of include/armadillo_bits/Mat_meat.hpp. I gather this is where the 4-byte write to one of the unsigned ints occurs, since the arma::Mat<double> constructor is
template<typename eT>
inline
Mat<eT>::Mat(eT* aux_mem, const uword aux_n_rows, const uword aux_n_cols, const bool copy_aux_mem, const bool strict)
: n_rows ( aux_n_rows )
, n_cols ( aux_n_cols )
, n_elem ( aux_n_rows*aux_n_cols )
, vec_state( 0 )
, mem_state( copy_aux_mem ? 0 : ( strict ? 2 : 1 ) )
, mem ( copy_aux_mem ? 0 : aux_mem )
{
arma_extra_debug_sigprint_this(this);
if(copy_aux_mem == true)
{
init_cold();
arrayops::copy( memptr(), aux_mem, n_elem );
}
}
where
template<typename eT>
class Mat : public Base< eT, Mat<eT> >
{
public:
typedef eT elem_type; //!< the type of elements stored in the matrix
typedef typename get_pod_type<eT>::result pod_type; //!< if eT is std::complex<T>, pod_type is T; otherwise pod_type is eT
const uword n_rows; //!< number of rows (read-only)
const uword n_cols; //!< number of columns (read-only)
const uword n_elem; //!< number of elements (read-only)
const uhword vec_state; //!< 0: matrix layout; 1: column vector layout; 2: row vector layout
const uhword mem_state;
...
See include/armadillo_bits/Mat_bones.hpp and notice that arma::uword is unsigned int. However, I cannot figure out why this would cause a stack-use-after-scope.
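To make the call pattern concrete, here is a self-contained sketch of what I believe the flagged construct boils down to (with a stand-in for sym_mat_rank_one_update; this is not the package's code and is not claimed to reproduce the ASan report): unsafe_col() returns an arma::Col<double> temporary by value that aliases the matrix memory, and that temporary, including the n_rows/n_cols/n_elem header written by the Mat constructor, lives on the calling thread's stack only until the end of the full expression.

#include <armadillo>

// Stand-in for sym_mat_rank_one_update(): takes the column by const reference.
static void rank_one_update(const arma::vec &v, arma::mat &out)
{
    out += v * v.t();
}

int main()
{
    arma::mat X_t(3, 5, arma::fill::randu);
    arma::mat X_cross(3, 3, arma::fill::zeros);

    for (arma::uword i = 0; i < X_t.n_cols; i++)
        // unsafe_col(i) constructs a temporary Col<double> on the stack that
        // aliases column i of X_t; it is destroyed after the call returns.
        rank_one_update(X_t.unsafe_col(i), X_cross);

    X_cross.print("X_cross");
    return 0;
}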
A similar error is in the Morpho package. See the current CRAN log here and src/createL.cpp.
Setup
The above check is on CRAN. As far as I can tell, it is with gcc 7.2 on Fedora 26 with the following config.site used to build R
CXX="g++ -fsanitize=address,undefined,bounds-strict -fno-omit-frame-pointer"
CFLAGS="-g -O2 -Wall -pedantic -mtune=native -fsanitize=address"
FFLAGS="-g -O2 -mtune=native"
FCFLAGS="-g -O2 -mtune=native"
CXXFLAGS="-g -O2 -Wall -pedantic -mtune=native"
MAIN_LDFLAGS=-fsanitize=address,undefined
Further, the following ~/.R/Makevars is used
CC = gcc -std=gnu99 -fsanitize=address,undefined -fno-omit-frame-pointer
F77 = gfortran -fsanitize=address
FC = gfortran -fsanitize=address
FCFLAGS = -g -O2 -mtune=native -fbounds-check
FFLAGS = -g -O2 -mtune=native -fbounds-check
The error does not happen with clang 5.0.0 and valgrind on the same machine. Further, I cannot reproduce it on a local Ubuntu 17.04 with gcc version 6.3 and clang version 4.0.0.
Minimal, Complete, and Verifiable example
I will work on making one.
WinAPI uses wchar_t buffers. As I understand it, we need to use UTF-16 to encode all our arguments to WinAPI.
We have two versions of UTF-16: UTF-16be and UTF-16le. Let's encode the string "Example": 0x45 0x78 0x61 0x6d 0x70 0x6c 0x65. With UTF-16be the bytes would be placed like this: 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65. With UTF-16le it would be 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00. (We are omitting the BOM.) The byte representations of the same string are different.
According to the docs, Windows uses UTF-16le. This means that we should encode all strings with UTF-16le or it will not work.
At the same time, my compiler (VS2015) uses UTF-16be for the strings that I hard-code into my source (something like L"my test string"). But WinAPI works fine with these strings. Why does it work? What am I missing?
Update 1:
To test the byte representation of hard-coded strings I used the following code:
std::string charToHex(wchar_t ch)
{
const char alphabet[] = "0123456789ABCDEF";
std::string result(4, ' ');
result[0] = alphabet[static_cast<unsigned int>((ch & 0xf000) >> 12)];
result[1] = alphabet[static_cast<unsigned int>((ch & 0xf00) >> 8)];
result[2] = alphabet[static_cast<unsigned int>((ch & 0xf0) >> 4)];
result[3] = alphabet[static_cast<unsigned int>(ch & 0xf)];
return std::move(result);
}
Little endian or big endian describes the way that variables of more than 8 bits are stored in memory. The test you have devised doesn't test memory layout; it works with wchar_t values directly, and the upper bits of an integer type are always the upper bits, no matter whether the CPU is big endian or little endian!
This modification to your code will show how it really works.
std::string charToHex(wchar_t *pch)
{
    const char alphabet[] = "0123456789ABCDEF";
    std::string result;
    // Look at the object's actual bytes in memory, in address order.
    const unsigned char *pbytes = reinterpret_cast<const unsigned char *>(pch);
    for (size_t i = 0; i < sizeof(wchar_t); ++i)
    {
        result.push_back(alphabet[(pbytes[i] & 0xf0) >> 4]);
        result.push_back(alphabet[pbytes[i] & 0x0f]);
    }
    return result;
}
}
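For example, calling the function above on a single character (assuming a little-endian Windows build, where wchar_t is 2 bytes) shows the low byte first:

#include <iostream>

int main()
{
    wchar_t ch = L'A';                    // code point U+0041
    std::cout << charToHex(&ch) << "\n";  // prints "4100": 0x41 sits at the lower address
}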