CreateIconIndirect - interpretation of pixel data - winapi

I would like to create an icon manually with CreateIconIndirect as follows:
HDC hDC = ::CreateCompatibleDC( nullptr );
BITMAPINFO bmiMask = {};
bmiMask.bmiHeader.biSize = sizeof( bmiMask.bmiHeader );
bmiMask.bmiHeader.biWidth = 16;
bmiMask.bmiHeader.biHeight = -16; // starts with top row
bmiMask.bmiHeader.biPlanes = 1;
bmiMask.bmiHeader.biBitCount = 32;
BYTE *byMask = nullptr;
HBITMAP hbmMask = ::CreateDIBSection( hDC, &bmiMask, DIB_RGB_COLORS,
reinterpret_cast< void** >( &byMask ),
nullptr, 0 );
BYTE bgraMask[] = { 0x00, 0x00, 0x00, 0x00 };
for( int i = 0; i < 16 * 16; i++ )
for( int j = 0; j < 4; j++ )
byMask[ i * 4 + j ] = bgraMask[ j ];
byMask[ 0 ] = 0x00; byMask[ 1 ] = 0x00; byMask[ 2 ] = 0x00; byMask[ 3 ] = 0x00;
BITMAPINFO bmiColor = {};
bmiColor.bmiHeader.biSize = sizeof( bmiColor.bmiHeader );
bmiColor.bmiHeader.biWidth = 16;
bmiColor.bmiHeader.biHeight = -16; // starts with top row
bmiColor.bmiHeader.biPlanes = 1;
bmiColor.bmiHeader.biBitCount = 32;
BYTE *byColor = nullptr;
HBITMAP hbmColor = ::CreateDIBSection( hDC, &bmiColor, DIB_RGB_COLORS,
reinterpret_cast< void** >( &byColor ),
nullptr, 0 );
BYTE bgraColor[] = { 0xff, 0xff, 0xff, 0xff };
for( int i = 0; i < 16 * 16; i++ )
for( int j = 0; j < 4; j++ )
byColor[ i * 4 + j ] = bgraColor[ j ];
byColor[ 0 ] = 0x00; byColor[ 1 ] = 0x00; byColor[ 2 ] = 0x00; byColor[ 3 ] = 0x00;
ICONINFO ii = {};
ii.fIcon = TRUE;
ii.xHotspot = ii.yHotspot = 0;
ii.hbmMask = hbmMask;
ii.hbmColor = hbmColor;
HICON hIcon = ::CreateIconIndirect( &ii );
::SendMessage( hwndDialog, WM_SETICON, ICON_SMALL,
reinterpret_cast< LPARAM >( hIcon ) );
According to MSDN ( https://learn.microsoft.com/en-us/previous-versions/dd183376(v=vs.85) ) each pixel's data consists of 4 bytes: blue, green, red, and an unused byte.
I experimented with the data by changing the values of byMask and byColor, then taking a screenshot and reading the exact RGB value in MS Paint. (Each time I placed Notepad directly behind the application's window so that any transparency / alpha channel effects would show against a constant background.)
First I changed only the top left corner, while the rest of the data was 0x00 for the mask and 0xff for the color. The result: most of the icon was white (as expected) and the top left pixel had the following colors:
MR MA CR CA OR OG OB
00 00 00 ff 00 00 00
00 00 00 00 d3 e9 fe
00 00 ff 00 d3 e9 fe (a)
ff 00 00 00 d3 e9 fe
ff ff 00 00 d3 e9 fe
00 ff 00 00 d3 e9 fe
00 00 ff ff ff 00 00 (b)
ff 00 ff ff ff 00 00
ff ff ff ff ff 00 00
00 ff ff ff ff 00 00
After that I changed each pixel in both bitmaps to the same value. (And checked with MS Paint's fill tool that the picture has only one color.)
MR MA CR CA OR OG OB
00 00 00 ff 00 00 00
00 00 00 00 00 00 00
00 00 ff 00 ff 00 00 (c)
ff 00 00 00 00 00 00
ff ff 00 00 00 00 00
00 ff 00 00 00 00 00
00 00 ff ff ff 00 00 (d)
ff 00 ff ff ff 00 00 (d)
ff ff ff ff ff 00 00 (d)
00 ff ff ff ff 00 00 (d)
Abbreviations: M=mask, C=color, O=observation, R=red, G=green, B=blue, A=alpha (4th byte)
I don't understand the following (letters refer to rows in the tables above):
If the 4th byte is ignored and is not an alpha channel, why is pixel (a) not ff0000 like (b)?
If I change only the other pixels, why does the first one change: (c) vs (a)?
Why is the result the same for all rows marked (d)? Why does the mask have no effect here?
Is there a bug in my code? Am I looking at the wrong MSDN page?

OK, I figured it out. For historical reasons, there are two types of 32 bits-per-pixel bitmaps: those that use 3 bytes for the 3 colors and leave the fourth byte unused (usually set to 0), and those that use the fourth byte as the alpha channel.
The catch is that no field indicates the type. Windows treats the bitmap as the old format (without alpha) if every fourth byte is zero, and as the new format (with alpha) if any fourth byte is nonzero. So this will result in a fully black icon:
for( int i = 0; i < 16 * 16; i++ )
for( int j = 0; j < 4; j++ )
byColor[ i * 4 + j ] = 0x00;
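But if I add this line to the above, the whole icon becomes transparent*:
byColor[ ( 7 * 16 + 7 ) * 4 + 3 ] = 0x01; // fourth (alpha) byte of the pixel in row 7, column 7
* Except for the pixel in row 7, column 7, which has alpha 0x01 and is therefore almost, but not quite, fully transparent.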
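The rule described in this answer can be sketched in a few lines of Go. This is only an illustration: `alphaOffset` and `usesAlpha` are hypothetical helper names, and the "any nonzero fourth byte" check models the behaviour observed here, not documented Windows internals. It assumes a top-down 32 bpp buffer with bytes in B, G, R, fourth-byte order.

```go
package main

import "fmt"

const bytesPerPixel = 4 // blue, green, red, then alpha (or unused)

// alphaOffset returns the index of the fourth byte of pixel (row, col)
// in a top-down 32 bpp buffer that is width pixels wide.
func alphaOffset(row, col, width int) int {
	return (row*width+col)*bytesPerPixel + 3
}

// usesAlpha models the observed heuristic: the buffer counts as
// "with alpha" as soon as any pixel has a nonzero fourth byte.
func usesAlpha(pixels []byte) bool {
	for i := 3; i < len(pixels); i += bytesPerPixel {
		if pixels[i] != 0 {
			return true
		}
	}
	return false
}

func main() {
	pixels := make([]byte, 16*16*bytesPerPixel) // all zero: black, no alpha
	fmt.Println(usesAlpha(pixels))              // false

	pixels[alphaOffset(7, 7, 16)] = 0x01
	fmt.Println(alphaOffset(7, 7, 16), usesAlpha(pixels)) // 479 true
}
```

Under this model, a single nonzero fourth byte switches the interpretation of every pixel, which would explain why changing other pixels altered how the top-left pixel was drawn.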


Reading sk_buff with ebpf inside dev_queue_xmit yields questionable data

I'm trying to capture outgoing ethernet frames on the local host before they are sent by inserting a kprobe into __dev_queue_xmit().
However, the bytes I extract from the sk_buff structure do not match the subsequently captured packets.
So far I have only attempted it for linear skbs, because I already get unexpected results there.
For example, my kprobe reported the following information during a call to __dev_queue_xmit():
COMM PID TGID LEN DATALEN
chronyd 1058 1058 90 0
3431c4b06a8b3c7c3f2023bd08006500d0a57f040f7f0000000000000000000000000000000000006018d11a0f7f00000100000000000000000000000000000060a67f040f7f0000000000000000000000000000000000004001
COMM is the name of the process which called the function,
PID is the calling thread's id and TGID its thread group id. LEN is the value of (skb->len - skb->data_len) and DATALEN is skb->data_len.
The program then copies LEN (in this case 90) bytes starting at skb->data.
Since DATALEN is zero, this is a linear skb. Thus, those bytes should contain exactly the frame which is about to be sent, shouldn't they?
Well, Wireshark subsequently recorded this frame:
0000 34 31 c4 b0 6a 8b 3c 7c 3f 20 23 bd 08 00 45 00
0010 00 4c 83 93 40 00 40 11 d1 a2 c0 a8 b2 18 c0 a8
0020 b2 01 c8 07 00 7b 00 38 e5 b4 23 00 06 20 00 00
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0050 00 00 38 bc 17 13 12 4a 4c c0
The first 14 bytes, which form the Ethernet header, match perfectly, as expected. Nothing after that matches at all.
The question now is: Why do the bytes not match up?
(Yes, I am certain the frame from Wireshark is indeed the one caused by this call to __dev_queue_xmit(). This is because only background programs using the network were running at the time, so the amount of outgoing traffic was rather small. Additionally, the captured frame contains, as expected, 90 bytes. Also, this frame holds an NTP payload, which is just what you'd expect from chronyd.)
My kernel version is 5.12.6-200.fc33.x86_64.
If you want to try it out yourself or have a closer look at my program, here it is:
from bcc import BPF
from ctypes import cast, POINTER, c_char
prog = """
#include <linux/sched.h>
#include <linux/skbuff.h>
struct xmit_event {
u64 ts;
u32 pid;
u32 tgid;
u32 len;
u32 datalen;
u32 packet_buf_ptr;
char comm[TASK_COMM_LEN];
u64 head;
u64 data;
u64 tail;
u64 end;
};
BPF_PERF_OUTPUT(xmits);
#define PACKET_BUF_SIZE 32768
# define PACKET_BUFS_PER_CPU 15
struct packet_buf {
char data[PACKET_BUF_SIZE];
};
BPF_PERCPU_ARRAY(packet_buf, struct packet_buf, PACKET_BUFS_PER_CPU);
BPF_PERCPU_ARRAY(packet_buf_head, u32, 1);
int kprobe____dev_queue_xmit(struct pt_regs *ctx, struct sk_buff *skb, void *accel_priv) {
if (skb == NULL || skb->data == NULL)
return 0;
struct xmit_event data = { };
u64 both = bpf_get_current_pid_tgid();
data.pid = both;
if (data.pid == 0)
return 0;
data.tgid = both >> 32;
data.ts = bpf_ktime_get_ns();
bpf_get_current_comm(&data.comm, sizeof(data.comm));
data.len = skb->len;
// Copy packet contents
int slot = 0;
u32 *packet_buf_ptr = packet_buf_head.lookup(&slot);
if (packet_buf_ptr == NULL)
return 0;
u32 buf_head = *packet_buf_ptr;
u32 next_buf_head = (buf_head + 1) % PACKET_BUFS_PER_CPU;
packet_buf_head.update(&slot, &next_buf_head);
struct packet_buf *ringbuf = packet_buf.lookup(&buf_head);
if (ringbuf == NULL)
return 0;
u32 skb_data_len = skb->data_len;
u32 headlen = data.len - skb_data_len;
headlen &= 0xffffff; // Useless, but validator demands it because "this unsigned(!) variable could otherwise be negative"
bpf_probe_read_kernel(ringbuf->data, headlen < PACKET_BUF_SIZE ? headlen : PACKET_BUF_SIZE, skb->data);
data.packet_buf_ptr = buf_head;
data.len = headlen;
data.datalen = skb_data_len;
data.head = (u64) skb->head;
data.data = (u64) skb->data;
data.tail = (u64) skb->tail;
data.end = (u64) skb->end;
xmits.perf_submit(ctx, &data, sizeof(data));
return 0;
}
"""
global b
def xmit_received(cpu, data, size):
    global b
    global py_packet_buf
    ev = b["xmits"].event(data)
    print("%-18d %-25s %-8d %-8d %-10d %-10d %-12d %-12d %-12d %-12d" % (ev.ts, ev.comm.decode(), ev.pid, ev.tgid, ev.len, ev.datalen, ev.head, ev.data, ev.tail, ev.end))
    bs = cast(py_packet_buf[ev.packet_buf_ptr][cpu].data, POINTER(c_char))[:ev.len]
    c = bytes(bs)
    print(c.hex())
def observe_kernel():
    # load BPF program
    global b
    b = BPF(text=prog)
    print("%-18s %-25s %-8s %-8s %-10s %-10s %-12s %-12s %-12s %-12s" % ("TS", "COMM", "PID", "TGID", "LEN", "DATALEN", "HEAD", "DATA", "TAIL", "END"))
    b["xmits"].open_perf_buffer(xmit_received)
    global py_packet_buf
    py_packet_buf = b["packet_buf"]
    try:
        while True:
            b.perf_buffer_poll()
    except KeyboardInterrupt:
        print("Kernel observer thread stopped.")
observe_kernel()
Found the issue.
I needed to replace
struct packet_buf {
char data[PACKET_BUF_SIZE];
};
with
struct packet_buf {
unsigned char data[PACKET_BUF_SIZE];
};
I, however, do not understand how signedness makes a difference when I am not performing comparisons or arithmetic operations with this data.

How to generate a deterministic set of UUIDs in golang

I'm doing some testing and it would be useful to have a known set of UUIDs that are getting used by our code. However, I'm having trouble figuring out how to create a deterministic set of UUIDs in golang.
I've tried two approaches, but neither worked:
type KnownReader struct {
store *Store
}
type Store struct {
val uint16
}
func (r KnownReader) Read(p []byte) (n int, err error) {
ret := r.store.val
r.store.val = ret + 1
fmt.Printf("\nStore: %v", r.store.val)
p = make([]byte, 4)
binary.LittleEndian.PutUint16(p, uint16(ret))
fmt.Printf("\nreader p: % x", p)
return binary.MaxVarintLen16, nil
}
func main() {
r := KnownReader{
store: &Store{val: 111},
}
uuid.SetRand(r)
u, _ := uuid.NewRandomFromReader(r)
fmt.Printf("\n%v",u)
u, _ = uuid.NewRandomFromReader(r)
fmt.Printf("\n%v",u)
}
---- OUTPUT ----
Store: 1
reader p: 00 00 00 00
Store: 2
reader p: 01 00 00 00
Store: 3
reader p: 02 00 00 00
Store: 4
reader p: 03 00 00 00
Store: 5
reader p: 04 00 00 00
Store: 6
reader p: 05 00 00 00
00000000-0000-4000-8000-000000000000
Store: 7
reader p: 06 00 00 00
Store: 8
reader p: 07 00 00 00
Store: 9
reader p: 08 00 00 00
Store: 10
reader p: 09 00 00 00
Store: 11
reader p: 0a 00 00 00
Store: 12
reader p: 0b 00 00 00
00000000-0000-4000-8000-000000000000
As you can see, the UUID does not change between calls.
I also tried using uuid.FromBytes, but that didn't seem to work either:
func getbytes(num uint16) []byte {
p := make([]byte, 4)
binary.LittleEndian.PutUint16(p, num)
fmt.Printf("\ngetbytes p: % x", p)
return p
}
func main() {
var i uint16 = 0
fmt.Printf("\nout getbytes: % x", getbytes(i))
u, _ := uuid.FromBytes(getbytes(i))
i = i + 1
fmt.Printf("\nUUID: %v", u)
fmt.Printf("\nout getbytes: % x", getbytes(i))
u, _ = uuid.FromBytes(getbytes(i))
fmt.Printf("\nUUID: %v", u)
}
---- OUTPUT ----
getbytes p: 00 00 00 00
out getbytes: 00 00 00 00
getbytes p: 00 00 00 00
UUID: 00000000-0000-0000-0000-000000000000
getbytes p: 01 00 00 00
out getbytes: 01 00 00 00
getbytes p: 01 00 00 00
UUID: 00000000-0000-0000-0000-000000000000
As you can see the UUIDs are still the same here as well.
So, is there something I'm missing? How can I get a consistent set of UUIDs?
Thanks
Thanks Adrian, I think I figured out the answer:
rnd := rand.New(rand.NewSource(1))
uuid.SetRand(rnd)
u, _ = uuid.NewRandomFromReader(rnd)
fmt.Printf("\n%v", u)
u, _ = uuid.NewRandomFromReader(rnd)
fmt.Printf("\n%v", u)
--- OUTPUT ---
52fdfc07-2182-454f-963f-5f0f9a621d72
9566c74d-1003-4c4d-bbbb-0407d1e2c649
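For comparison, two details appear to explain the earlier failures, assuming the package in use is github.com/google/uuid. In the first attempt, `Read` reassigned `p` with `make`, so the bytes went into a fresh local slice and the caller's buffer stayed all zero. In the second, `uuid.FromBytes` expects exactly 16 bytes and the returned error was discarded. The sketch below shows a counter-based reader that writes into the caller's buffer. To stay standard-library only it formats a version-4-style UUID by hand; `counterReader` and `newUUID` are illustrative names, not part of any library.

```go
package main

import (
	"fmt"
	"io"
)

// counterReader is a hypothetical deterministic io.Reader: it fills the
// caller's buffer with successive byte values instead of random data.
type counterReader struct{ next byte }

func (r *counterReader) Read(p []byte) (int, error) {
	for i := range p { // write into p itself; never reassign it
		p[i] = r.next
		r.next++
	}
	return len(p), nil
}

// newUUID reads 16 bytes from r, stamps the version 4 and RFC 4122
// variant bits, and returns the canonical text form.
func newUUID(r io.Reader) (string, error) {
	var b [16]byte
	if _, err := io.ReadFull(r, b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	r := &counterReader{}
	for i := 0; i < 2; i++ {
		u, err := newUUID(r)
		if err != nil {
			panic(err)
		}
		fmt.Println(u)
	}
	// Prints:
	// 00010203-0405-4607-8809-0a0b0c0d0e0f
	// 10111213-1415-4617-9819-1a1b1c1d1e1f
}
```

A reader like this should also be usable with `uuid.SetRand` or `uuid.NewRandomFromReader`, since both accept an `io.Reader`. The seeded `math/rand` source in the answer has the advantage that the resulting values look like ordinary random UUIDs.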

how to supply a specific timezone to TzSpecificLocalTimeToSystemTime()

The following dBase code invokes a win32 API function to convert a local DST time to a system time. The first parameter set to "null" means that the function takes the current active time zone. What value do I have to put instead of "null" to specify another time zone?
The following page describes lpTimeZoneInformation as a pointer to a TIME_ZONE_INFORMATION structure that specifies the time zone for the local time input to this function (lpLocalTime), but it is unclear to me what kind of pointer this is.
I have tried 'Brisbane', 'E. Australia Standard Time', '10:00' and '+10:00' but none returns the expected value.
https://learn.microsoft.com/en-us/windows/win32/api/timezoneapi/nf-timezoneapi-tzspecificlocaltimetosystemtime
ITOH and HTOI are Integer TO Hex and vice-versa conversion functions
localtime and systemtime structures work, I tried to replicate that for the time_zone_information part but without success so far
As it stands, the return value is 13.20
Thanks for any help!
d=new date("31/12/2020 5:08")
offset1=getLocalTimeOffset(d)/60
function getLocalTimeOffset(d_in)
// todo typechecking of the parameter
extern clogical TzSpecificLocalTimeToSystemTime(cptr,cptr,cptr) kernel32
extern culong GetLastError(cvoid) kernel32
local systemtime,localtime,tmp
localtime = replicate(chr(0),16)
systemtime = replicate(chr(0),16)
TZI = replicate(chr(0),16)
TZIa=itoh(-600,4)
TZIb=itoh(-60,4)
TZI.setbyte(1,htoi(left(TZIa,2)))
TZI.setbyte(0,htoi(right(TZIa,2)))
TZI.setbyte(9,htoi(left(TZIb,2)))
TZI.setbyte(8,htoi(right(TZIb,2)))
tmp = itoh(d_in.year,4)
localtime.setbyte(1,htoi(left(tmp,2))) // fill the systemtime structure
localtime.setbyte(0,htoi(right(tmp,2))) // seconds and ms are of no concern
localtime.setbyte(2,d_in.month+1)
localtime.setbyte(4,d_in.day)
localtime.setbyte(6,d_in.date)
localtime.setbyte(8,d_in.hour)
localtime.setbyte(10,d_in.minute)
if TzSpecificLocalTimeToSystemTime(TZI,localtime,systemtime) = 0
tmp = getlasterror() ; ? "Error: "+tmp ; return 9999
endif
tmp = sign(d_in.date-systemtime.getbyte(6))*24*60 // consider day boundary
if (d_in.date = 1 or systemtime.getbyte(6) = 1) and (d_in.month+1 <> systemtime.getbyte(2))
tmp = -tmp // adjust for month boundaries
endif
tmp += (d_in.hour - systemtime.getbyte(8))*60
tmp += d_in.minute - systemtime.getbyte(10)
return tmp
(Too long for a comment.)   The first parameter to TzSpecificLocalTimeToSystemTime must be either NULL, or otherwise point to a TIME_ZONE_INFORMATION structure, filled-in with the target timezone data from HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones per Remarks on the same page.
In OP's case, Brisbane falls under the E. Australia Standard Time key, and TZI data parses as:
typedef struct _REG_TZI_FORMAT
{
LONG Bias; // -600 A8 FD FF FF
LONG StandardBias; // 0 00 00 00 00
LONG DaylightBias; // -60 C4 FF FF FF
SYSTEMTIME StandardDate; // n/a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SYSTEMTIME DaylightDate; // n/a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
} REG_TZI_FORMAT;
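The byte columns in the comments of this structure are the little-endian two's-complement encodings of the 32-bit values, which is also what has to be written byte by byte when the structure is filled in by hand. A small sketch (in Go, standard library only; `encodeLong` is an illustrative helper name) reproduces them:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeLong returns the 4 bytes that a 32-bit signed value (a Win32 LONG)
// occupies in memory on a little-endian machine.
func encodeLong(v int32) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, uint32(v))
	return b
}

func main() {
	fmt.Printf("% X\n", encodeLong(-600)) // A8 FD FF FF
	fmt.Printf("% X\n", encodeLong(-60))  // C4 FF FF FF
	fmt.Printf("% X\n", encodeLong(0))    // 00 00 00 00
}
```

Writing only two bytes of a negative value, as the question's code does with a 4-digit hex string, leaves the upper two bytes at zero, so the field no longer holds the intended negative number.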
Following is the C code to fill-in a TIME_ZONE_INFORMATION structure with the same data, and successfully convert a Brisbane local time to UTC:
#include <windows.h>
#include <stdio.h>
int main()
{
TIME_ZONE_INFORMATION tzEAST = // [offset] bytes
{
-600, // LONG Bias; [0] A8 FD FF FF
{ 0 }, // WCHAR StandardName[32]; [4] 00 .. 00
{ 0 }, // SYSTEMTIME StandardDate; [68] 00 .. 00
0, // LONG StandardBias; [84] 00 00 00 00
{ 0 }, // WCHAR DaylightName[32]; [88] 00 .. 00
{ 0 }, // SYSTEMTIME DaylightDate; [152] 00 .. 00
-60 // LONG DaylightBias; [168] C4 FF FF FF
};
SYSTEMTIME stEAST = { 2021, 1, 1, 4, 12 }, stUTC = { 0 };
if(!TzSpecificLocalTimeToSystemTime(&tzEAST, &stEAST, &stUTC)) return 1;
printf("EAST %d-%02d-%02d %02d:%02d:%02d = UTC %d-%02d-%02d %02d:%02d:%02d\n",
stEAST.wYear, stEAST.wMonth, stEAST.wDay, stEAST.wHour, stEAST.wMinute, stEAST.wSecond,
stUTC.wYear, stUTC.wMonth, stUTC.wDay, stUTC.wHour, stUTC.wMinute, stUTC.wSecond);
return 0;
}
Output:
EAST 2021-01-04 12:00:00 = UTC 2021-01-04 02:00:00
[ EDIT ]   Following is my guess of what the dBase code might look like. Just a guess, and nothing more than a guess, since I don't actually know dBase beyond what's been posted here.
tzBias = itoh(-600, 8)
tzDstBias = itoh( -60, 8)
tzi = replicate(chr(0), 86) // 86*2 = 172 = sizeof TIME_ZONE_INFORMATION
tzi.setbyte( 0, htoi(substring( tzBias, 6, 8))) // [ 0] LONG Bias;
tzi.setbyte( 1, htoi(substring( tzBias, 4, 6)))
tzi.setbyte( 2, htoi(substring( tzBias, 2, 4)))
tzi.setbyte( 3, htoi(substring( tzBias, 0, 2)))
tzi.setbyte(168, htoi(substring(tzDstBias, 6, 8))) // [168] LONG DaylightBias;
tzi.setbyte(169, htoi(substring(tzDstBias, 4, 6)))
tzi.setbyte(170, htoi(substring(tzDstBias, 2, 4)))
tzi.setbyte(171, htoi(substring(tzDstBias, 0, 2)))
if TzSpecificLocalTimeToSystemTime(tzi, localtime, systemtime) = 0 // ...
[ EDIT #2 courtesy OP ]   The working dBase code to fill the structure is the following:
tzi.setbyte( 0, htoi(substr(tzBias, 7, 2))) // [ 0] LONG Bias
tzi.setbyte( 1, htoi(substr(tzBias, 5, 2)))
tzi.setbyte( 2, htoi(substr(tzBias, 3, 2)))
tzi.setbyte( 3, htoi(substr(tzBias, 1, 2)))
tzi.setbyte(168, htoi(substr(tzDstBias, 7,2))) // [168] LONG DaylightBias
tzi.setbyte(169, htoi(substr(tzDstBias, 5,2)))
tzi.setbyte(170, htoi(substr(tzDstBias, 3,2)))
tzi.setbyte(171, htoi(substr(tzDstBias, 1,2)))

Armadillo and OpenMP and stack-use-after-scope

I have an issue with a stack-use-after-scope error with the C++ Armadillo library within an OpenMP block in an R package, and I cannot figure out what is wrong. The complete gcc log is here, from the CRAN GCC ASAN check of the R package. I have kept the relevant part of the log below.
==33791==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd03364940 at pc 0x7ff8127abc07 bp 0x7ffd03364680 sp 0x7ffd03364670
WRITE of size 4 at 0x7ffd03364940 thread T0
#0 0x7ff8127abc06 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool) /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215
#1 0x7ff8129fb0c2 in GMA<logistic>::solve() [clone ._omp_fn.0] /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:411
#2 0x7ff825ae2cde in GOMP_parallel (/lib64/libgomp.so.1+0xdcde)
#3 0x7ff812a0c9f8 in GMA<logistic>::solve() ddhazard/GMA_solver.cpp:83
#4 0x7ff81276421d in ddhazard_fit_cpp(...
Address 0x7ffd03364940 is located in stack of thread T0 at offset 416 in frame
#0 0x7ff8129fa82f in GMA<logistic>::solve() [clone ._omp_fn.0] ddhazard/GMA_solver.cpp:83
This frame has 5 object(s):
[32, 40) 'dest'
[96, 104) 'src'
[160, 176) 'ans'
[224, 384) 'my_X_cross'
[416, 576) '<unknown>' <== Memory access at offset 416 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope /data/gannet/ripley/R/test-3.5/RcppArmadillo/include/armadillo_bits/Mat_meat.hpp:1215 in arma::Mat<double>::Mat(double*, unsigned int, unsigned int, bool, bool)
Shadow bytes around the buggy address:
0x1000206648d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000206648e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000206648f0: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2
0x100020664900: 00 f2 f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f2 f2 f2 f2
0x100020664910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100020664920: 00 00 00 00 f2 f2 f2 f2[f8]f8 f8 f8 f8 f8 f8 f8
0x100020664930: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f3 f3 f3 f3
0x100020664940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020664970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==33791==ABORTING
The WRITE that causes the error is in dynamichazard/src/ddhazard/GMA_solver.cpp, specifically in this OpenMP block:
#ifdef _OPENMP
int n_threads = std::max(1, std::min(omp_get_max_threads(),
(int)r_set.n_elem / 1000 + 1));
#pragma omp parallel num_threads(n_threads) if(n_threads > 1)
{
#endif
arma::mat my_X_cross(q, q, arma::fill::zeros);
#ifdef _OPENMP
#pragma omp for schedule(static)
#endif
for(arma::uword i = 0; i < r_set.n_elem; i++){
auto trunc_eta = T::truncate_eta(
is_event[i], eta[i], exp(eta[i]), at_risk_length[i]);
h_1d[i] = w[i] * T::d_log_like(
is_event[i], trunc_eta, at_risk_length[i]);
double h_2d_neg = - w[i] * T::dd_log_like(
is_event[i], trunc_eta, at_risk_length[i]);
sym_mat_rank_one_update(h_2d_neg, X_t.unsafe_col(i), my_X_cross);
}
#ifdef _OPENMP
#pragma omp critical(gma_lock)
{
#endif
X_cross += my_X_cross;
#ifdef _OPENMP
}
}
#endif
As far as I can tell, the error is at the X_t.unsafe_col(i) call in the call to sym_mat_rank_one_update. The declaration of the function is
void sym_mat_rank_one_update(const double, const arma::vec&, arma::mat&);
It should trigger a call to the arma::Col<double> constructor in line 411 of include/armadillo_bits/Col_meat.hpp, which in turn calls the arma::Mat<double> constructor in line 1215 of include/armadillo_bits/Mat_meat.hpp. I gather this is where the 4-byte write to one of the unsigned int members occurs, since the arma::Mat<double> constructor is
template<typename eT>
inline
Mat<eT>::Mat(eT* aux_mem, const uword aux_n_rows, const uword aux_n_cols, const bool copy_aux_mem, const bool strict)
: n_rows ( aux_n_rows )
, n_cols ( aux_n_cols )
, n_elem ( aux_n_rows*aux_n_cols )
, vec_state( 0 )
, mem_state( copy_aux_mem ? 0 : ( strict ? 2 : 1 ) )
, mem ( copy_aux_mem ? 0 : aux_mem )
{
arma_extra_debug_sigprint_this(this);
if(copy_aux_mem == true)
{
init_cold();
arrayops::copy( memptr(), aux_mem, n_elem );
}
}
where
template<typename eT>
class Mat : public Base< eT, Mat<eT> >
{
public:
typedef eT elem_type; //!< the type of elements stored in the matrix
typedef typename get_pod_type<eT>::result pod_type; //!< if eT is std::complex<T>, pod_type is T; otherwise pod_type is eT
const uword n_rows; //!< number of rows (read-only)
const uword n_cols; //!< number of columns (read-only)
const uword n_elem; //!< number of elements (read-only)
const uhword vec_state; //!< 0: matrix layout; 1: column vector layout; 2: row vector layout
const uhword mem_state;
...
See include/armadillo_bits/Mat_bones.hpp and notice that arma::uword is unsigned int. However, I cannot figure out why this would cause a stack-use-after-scope.
A similar error is in the Morpho package. See the current CRAN log here and src/createL.cpp.
Setup
The above check is on CRAN. As far as I can tell, it is with gcc 7.2 on Fedora 26 with the following config.site used to build R
CXX="g++ -fsanitize=address,undefined,bounds-strict -fno-omit-frame-pointer"
CFLAGS="-g -O2 -Wall -pedantic -mtune=native -fsanitize=address"
FFLAGS="-g -O2 -mtune=native"
FCFLAGS="-g -O2 -mtune=native"
CXXFLAGS="-g -O2 -Wall -pedantic -mtune=native"
MAIN_LDFLAGS=-fsanitize=address,undefined
Further, the following ~/.R/Makevars is used
CC = gcc -std=gnu99 -fsanitize=address,undefined -fno-omit-frame-pointer
F77 = gfortran -fsanitize=address
FC = gfortran -fsanitize=address
FCFLAGS = -g -O2 -mtune=native -fbounds-check
FFLAGS = -g -O2 -mtune=native -fbounds-check
The error does not happen with clang 5.0.0 and valgrind on the same machine. Further, I cannot reproduce it on a local Ubuntu 17.04 machine with gcc version 6.3 and clang version 4.0.0.
Minimal, Complete, and Verifiable example
I will work on making one.

golang: index efficiency of array

It's a simple program.
test environment: debian 8, go 1.4.2
union.go:
package main
import "fmt"
type A struct {
t int32
u int64
}
func test() (total int64) {
a := [...]A{{1, 100}, {2, 3}}
for i := 0; i < 5000000000; i++ {
p := &a[i%2]
total += p.u
}
return
}
func main() {
total := test()
fmt.Println(total)
}
union.c:
#include <stdio.h>
struct A {
int t;
long u;
};
long test()
{
struct A a[2];
a[0].t = 1;
a[0].u = 100;
a[1].t = 2;
a[1].u = 3;
long total = 0;
long i;
for (i = 0; i < 5000000000; i++) {
struct A* p = &a[i % 2];
total += p->u;
}
return total;
}
int main()
{
long total = test();
printf("%ld\n", total);
}
Comparison of results:
go:
257500000000
real 0m9.167s
user 0m9.196s
sys 0m0.012s
C:
257500000000
real 0m3.585s
user 0m3.560s
sys 0m0.008s
It seems that the Go compiler generates a lot of odd assembly code (you can use objdump -D to check it).
For example, why does movabs $0x12a05f200,%rbp appear twice?
400c60: 31 c0 xor %eax,%eax
400c62: 48 bd 00 f2 05 2a 01 movabs $0x12a05f200,%rbp
400c69: 00 00 00
400c6c: 48 39 e8 cmp %rbp,%rax
400c6f: 7d 46 jge 400cb7 <main.test+0xb7>
400c71: 48 89 c1 mov %rax,%rcx
400c74: 48 c1 f9 3f sar $0x3f,%rcx
400c78: 48 89 c3 mov %rax,%rbx
400c7b: 48 29 cb sub %rcx,%rbx
400c7e: 48 83 e3 01 and $0x1,%rbx
400c82: 48 01 cb add %rcx,%rbx
400c85: 48 8d 2c 24 lea (%rsp),%rbp
400c89: 48 83 fb 02 cmp $0x2,%rbx
400c8d: 73 2d jae 400cbc <main.test+0xbc>
400c8f: 48 6b db 10 imul $0x10,%rbx,%rbx
400c93: 48 01 dd add %rbx,%rbp
400c96: 48 8b 5d 08 mov 0x8(%rbp),%rbx
400c9a: 48 01 f3 add %rsi,%rbx
400c9d: 48 89 de mov %rbx,%rsi
400ca0: 48 89 5c 24 28 mov %rbx,0x28(%rsp)
400ca5: 48 ff c0 inc %rax
400ca8: 48 bd 00 f2 05 2a 01 movabs $0x12a05f200,%rbp
400caf: 00 00 00
400cb2: 48 39 e8 cmp %rbp,%rax
400cb5: 7c ba jl 400c71 <main.test+0x71>
400cb7: 48 83 c4 20 add $0x20,%rsp
400cbb: c3 retq
400cbc: e8 6f e0 00 00 callq 40ed30 <runtime.panicindex>
400cc1: 0f 0b ud2
...
whereas the C assembly is cleaner:
0000000000400570 <test>:
400570: 48 c7 44 24 e0 64 00 movq $0x64,-0x20(%rsp)
400577: 00 00
400579: 48 c7 44 24 f0 03 00 movq $0x3,-0x10(%rsp)
400580: 00 00
400582: b9 64 00 00 00 mov $0x64,%ecx
400587: 31 d2 xor %edx,%edx
400589: 31 c0 xor %eax,%eax
40058b: 48 be 00 f2 05 2a 01 movabs $0x12a05f200,%rsi
400592: 00 00 00
400595: eb 18 jmp 4005af <test+0x3f>
400597: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
40059e: 00 00
4005a0: 48 89 d1 mov %rdx,%rcx
4005a3: 83 e1 01 and $0x1,%ecx
4005a6: 48 c1 e1 04 shl $0x4,%rcx
4005aa: 48 8b 4c 0c e0 mov -0x20(%rsp,%rcx,1),%rcx
4005af: 48 83 c2 01 add $0x1,%rdx
4005b3: 48 01 c8 add %rcx,%rax
4005b6: 48 39 f2 cmp %rsi,%rdx
4005b9: 75 e5 jne 4005a0 <test+0x30>
4005bb: f3 c3 repz retq
4005bd: 0f 1f 00 nopl (%rax)
Could somebody explain it? Thanks!
The main difference is the array bounds checking. In the disassembly dump for the Go program, there is:
400c89: 48 83 fb 02 cmp $0x2,%rbx
400c8d: 73 2d jae 400cbc <main.test+0xbc>
...
400cbc: e8 6f e0 00 00 callq 40ed30 <runtime.panicindex>
400cc1: 0f 0b ud2
So if %rbx is greater than or equal to 2, then it jumps down to a call to runtime.panicindex. Given you're working with an array of size 2, that is clearly the bounds check. You could make the argument that the compiler should be smart enough to skip the bounds check in this particular case where the range of the index can be determined statically, but it seems that it isn't smart enough to do so yet.
While you're seeing a noticeable performance difference for this micro-benchmark, it is worth considering whether it is representative of your real code. If you're doing other work in your loop, the difference is likely to be less noticeable.
And while bounds checking does have a cost, in many cases it is better than the alternative of the program continuing with undefined behaviour.
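One way to probe the point about static index ranges is to give the compiler an index whose range is obvious. The sketch below masks the index with `i & 1`, which is 0 or 1 for any `i`. Newer Go compilers may be able to drop the bounds check for this pattern, but that is an assumption about compiler behaviour rather than something shown in this thread; it can be checked on a given toolchain with `go build -gcflags="-d=ssa/check_bce/debug=1"`. The name `sumMasked` and the small loop count are illustrative.

```go
package main

import "fmt"

type A struct {
	t int32
	u int64
}

// sumMasked mirrors test() from the question but indexes with i&1 instead
// of i%2. For a signed i, i%2 can be negative, whereas i&1 is always 0 or 1.
func sumMasked(n int) (total int64) {
	a := [...]A{{1, 100}, {2, 3}}
	for i := 0; i < n; i++ {
		total += a[i&1].u
	}
	return
}

func main() {
	fmt.Println(sumMasked(10)) // 515 = 5*100 + 5*3
}
```

The signed remainder also accounts for the `sar`, `sub`, `and`, `add` sequence in the Go disassembly: it computes `i % 2` with the sign of `i` preserved, which the mask avoids.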
