Memory overwrite when using strong and weak symbols in two linked files - gcc

Assume we have a 64-bit x86 machine, which is little-endian and therefore stores the least significant byte of a word at the lowest address, and assume the standard alignment rules of a 64-bit x86 Linux C compiler.
Consider
File 1:
#include <stdio.h>

struct cs {
    int count;
    unsigned short flags;
};

struct cs gcount;

extern void add_counter( int n );

int main(int c, char *argv[]);
int main(int c, char *argv[]) {
    gcount.flags = 0xe700;
    gcount.count = 1;
    add_counter(42);
    printf("count =%d\n", gcount.count);
    return 0;
}
File 2:
struct cs {
    unsigned short flags;
    int count;
};

struct cs gcount = {0,0};

void add_counter (int n) {
    gcount.count += n;
}
If compiled and linked together, the output is 1.
Explanation:
gcount is defined as a strong global in the second file and is thus
initialized to {0,0}; here the field order doesn't matter yet, since it's
all zeroes.
A struct type is defined per compilation unit, so the first file uses
its own definition when writing to the struct, meaning
gcount.flags = 0xe700; gcount.count = 1;
causes the memory to look like
[e7 00 | 00 00 00 01] where (in little-endian) the left is the top and
the right is the bottom of memory.
(There's no padding between the two fields since the short is at the end;
sizeof will still report 8 bytes because of trailing padding.)
When calling add_counter(42), the second file will use the second
definition of cs and view the memory as
[e7 00 00 00 | 00 01]
Now there are 2 bytes of padding between the two fields, and the write
to count will thus affect the range
[e7 00 00 00 | 00 01]
42 is 0x2a in hexadecimal (2*16 + 10) and will thus result in
[e7 2a 00 00 | 00 01]
Converting this back to the view the first file has, we get
[e7 2a | 00 00 00 01]
and thus the result is 1 instead of the expected 43.
Now I do get the general gist but I'm a bit confused about why we get [*e7 2a* 00 00 | 00 01] when adding 42=0x2a and not [*e7 00 00 2a | 00 01].
I'm expecting [*e7 00 00 2a | 00 01] because we are using little-endian, meaning the rightmost bit is the LSB. So e7 would actually represent the most significant 8 bits here.

My disparaging comments about the exercise itself notwithstanding, it is possible to interpret the question as a simpler one about byte ordering. In that sense, the issue is with this assertion:
little-endian, meaning the rightmost bit is the LSB.
Little-endian means that the bytes are ordered from least significant to most significant as addresses increase. The term having been coined in English, and English being written left to right, that means that when memory is written down in the usual lowest-address-first order, the leftmost byte is the least significant byte in little-endian ordering.
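Not part of the exercise, but a minimal single-file sketch (hypothetical; it uses a union instead of two translation units, and prints bytes lowest address first) reproduces the same picture and confirms that only the byte at offset 4 changes:

#include <stdio.h>

/* File 1's and File 2's definitions of struct cs, side by side. */
struct cs1 { int count; unsigned short flags; };
struct cs2 { unsigned short flags; int count; };

union object {
    struct cs1 as_file1;
    struct cs2 as_file2;
    unsigned char bytes[8];
};

static void dump(const union object *o) {
    for (int i = 0; i < (int)sizeof o->bytes; i++)
        printf("%02x ", o->bytes[i]);              /* lowest address first */
    printf("\n");
}

int main(void) {
    union object gcount = { .bytes = {0} };        /* File 2: gcount = {0,0} */

    gcount.as_file1.flags = 0xe700;                /* File 1's writes */
    gcount.as_file1.count = 1;
    dump(&gcount);                                 /* 01 00 00 00 00 e7 00 00 */

    gcount.as_file2.count += 42;                   /* what add_counter(42) does */
    dump(&gcount);                                 /* 01 00 00 00 2a e7 00 00 */

    printf("count = %d\n", gcount.as_file1.count); /* prints 1 */
    return 0;
}

Read lowest address first: File 2's count occupies offsets 4 to 7 and holds 0xe700, so adding 0x2a changes only the byte at offset 4, the least significant one, right next to the 0xe7 at offset 5. That is why the e7 and the 2a end up adjacent in the exercise's notation.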

Related

Reading sk_buff with ebpf inside dev_queue_xmit yields questionable data

I'm trying to capture outgoing ethernet frames on the local host before they are sent by inserting a kprobe into __dev_queue_xmit().
However, the bytes I extract from the sk_buff structure do not match the subsequently captured packets.
So far I have only attempted this for linear skbs, because I already get unexpected results there.
For example, my kprobe reported the following information during a call to __dev_queue_xmit():
COMM PID TGID LEN DATALEN
chronyd 1058 1058 90 0
3431c4b06a8b3c7c3f2023bd08006500d0a57f040f7f0000000000000000000000000000000000006018d11a0f7f00000100000000000000000000000000000060a67f040f7f0000000000000000000000000000000000004001
COMM is the name of the process which called the function,
PID is the calling thread's id and TGID its thread group id. LEN is the value of (skb->len - skb->data_len) and DATALEN is skb->data_len.
Next, the program has copied LEN (in this case 90) bytes starting at skb->data.
Since DATALEN is zero, this is a linear skb. Thus, those bytes should contain exactly the frame which is about to be sent, shouldn't they?
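(For reference, this LEN is the skb's linear, or "head", length; the kernel's helper in include/linux/skbuff.h computes essentially the same thing, paraphrased here:)

/* Paraphrased from include/linux/skbuff.h: the linear ("head") part of an
   skb is its total length minus the paged data_len. */
static inline unsigned int skb_headlen(const struct sk_buff *skb)
{
    return skb->len - skb->data_len;
}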
Well, Wireshark subsequently recorded this frame:
0000 34 31 c4 b0 6a 8b 3c 7c 3f 20 23 bd 08 00 45 00
0010 00 4c 83 93 40 00 40 11 d1 a2 c0 a8 b2 18 c0 a8
0020 b2 01 c8 07 00 7b 00 38 e5 b4 23 00 06 20 00 00
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0050 00 00 38 bc 17 13 12 4a 4c c0
The first 14 bytes, which form the Ethernet header, match up perfectly as expected. Everything else doesn't match at all.
The question now is: Why do the bytes not match up?
(Yes, I am certain the frame from Wireshark is indeed the one caused by this call to __dev_queue_xmit(). This is because only background programs using the network were running at the time, so the amount of outgoing traffic was rather small. Additionally, the captured frame contains, as expected, 90 bytes. Also, this frame holds an NTP payload, which is just what you'd expect from chronyd.)
My kernel version is 5.12.6-200.fc33.x86_64.
If you want to try it out yourself or have a closer look at my program, here it is:
from bcc import BPF
from ctypes import cast, POINTER, c_char
prog = """
#include <linux/sched.h>
#include <linux/skbuff.h>
struct xmit_event {
    u64 ts;
    u32 pid;
    u32 tgid;
    u32 len;
    u32 datalen;
    u32 packet_buf_ptr;
    char comm[TASK_COMM_LEN];
    u64 head;
    u64 data;
    u64 tail;
    u64 end;
};

BPF_PERF_OUTPUT(xmits);

#define PACKET_BUF_SIZE 32768
#define PACKET_BUFS_PER_CPU 15

struct packet_buf {
    char data[PACKET_BUF_SIZE];
};

BPF_PERCPU_ARRAY(packet_buf, struct packet_buf, PACKET_BUFS_PER_CPU);
BPF_PERCPU_ARRAY(packet_buf_head, u32, 1);

int kprobe____dev_queue_xmit(struct pt_regs *ctx, struct sk_buff *skb, void *accel_priv) {
    if (skb == NULL || skb->data == NULL)
        return 0;

    struct xmit_event data = { };
    u64 both = bpf_get_current_pid_tgid();
    data.pid = both;
    if (data.pid == 0)
        return 0;
    data.tgid = both >> 32;
    data.ts = bpf_ktime_get_ns();
    bpf_get_current_comm(&data.comm, sizeof(data.comm));
    data.len = skb->len;

    // Copy packet contents
    int slot = 0;
    u32 *packet_buf_ptr = packet_buf_head.lookup(&slot);
    if (packet_buf_ptr == NULL)
        return 0;
    u32 buf_head = *packet_buf_ptr;
    u32 next_buf_head = (buf_head + 1) % PACKET_BUFS_PER_CPU;
    packet_buf_head.update(&slot, &next_buf_head);
    struct packet_buf *ringbuf = packet_buf.lookup(&buf_head);
    if (ringbuf == NULL)
        return 0;
    u32 skb_data_len = skb->data_len;
    u32 headlen = data.len - skb_data_len;
    headlen &= 0xffffff; // Useless, but validator demands it because "this unsigned(!) variable could otherwise be negative"
    bpf_probe_read_kernel(ringbuf->data, headlen < PACKET_BUF_SIZE ? headlen : PACKET_BUF_SIZE, skb->data);
    data.packet_buf_ptr = buf_head;

    data.len = headlen;
    data.datalen = skb_data_len;
    data.head = (u64) skb->head;
    data.data = (u64) skb->data;
    data.tail = (u64) skb->tail;
    data.end = (u64) skb->end;

    xmits.perf_submit(ctx, &data, sizeof(data));
    return 0;
}
"""
global b

def xmit_received(cpu, data, size):
    global b
    global py_packet_buf
    ev = b["xmits"].event(data)
    print("%-18d %-25s %-8d %-8d %-10d %-10d %-12d %-12d %-12d %-12d" % (ev.ts, ev.comm.decode(), ev.pid, ev.tgid, ev.len, ev.datalen, ev.head, ev.data, ev.tail, ev.end))
    bs = cast(py_packet_buf[ev.packet_buf_ptr][cpu].data, POINTER(c_char))[:ev.len]
    c = bytes(bs)
    print(c.hex())

def observe_kernel():
    # load BPF program
    global b
    b = BPF(text=prog)
    print("%-18s %-25s %-8s %-8s %-10s %-10s %-12s %-12s %-12s %-12s" % ("TS", "COMM", "PID", "TGID", "LEN", "DATALEN", "HEAD", "DATA", "TAIL", "END"))
    b["xmits"].open_perf_buffer(xmit_received)
    global py_packet_buf
    py_packet_buf = b["packet_buf"]
    try:
        while True:
            b.perf_buffer_poll()
    except KeyboardInterrupt:
        print("Kernel observer thread stopped.")

observe_kernel()
Found the issue.
I needed to replace
struct packet_buf {
    char data[PACKET_BUF_SIZE];
};
with
struct packet_buf {
    unsigned char data[PACKET_BUF_SIZE];
};
I, however, do not understand how signedness makes a difference when I am not performing comparisons or arithmetic operations with this data.

Why does `sizeof var` not show the true size?

Suppose we use avr-gcc to compile code which has the following structure:
typedef struct {
    uint8_t bLength;
    uint8_t bDescriptorType;
    int16_t wString[];
} S_string_descriptor;
We initialize it globally like this:
const S_string_descriptor sn_desc PROGMEM = {
    1 + 1 + sizeof L"1234" - 2, 0x03, L"1234"
};
Let's check what is generated from it:
000000ac <__trampolines_end>:
ac: 0a 03 fmul r16, r18
ae: 31 00 .word 0x0031 ; ????
b0: 32 00 .word 0x0032 ; ????
b2: 33 00 .word 0x0033 ; ????
b4: 34 00 .word 0x0034 ; ????
...
So the string content does indeed follow the first two members of the structure, as required.
But if we check sizeof sn_desc, the result is 2.
The variable is defined at compile time and sizeof is a compile-time operator, so why does sizeof var not show the true size of var? And where is this behavior of the compiler (i.e., appending arbitrary data to a structure) documented?
sizeof reports 2 because wString is a flexible array member, and a flexible array member is ignored by sizeof: only the two uint8_t members contribute to sizeof(S_string_descriptor). The string data that the initializer places after them (GCC documents static initialization of flexible array members as an extension) is not part of the type's size, so there is no way to recover its length from the object itself; store it separately. And since the object is in PROGMEM, its bytes must be read back with LPM / the pgm_read macros rather than by plain C dereferencing.
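If the total size of the initialized object is needed, one workaround (a sketch that reuses S_string_descriptor and sn_desc from the question, plus avr-libc's memcpy_P) is to record it in a separate compile-time constant built from the same initializer expression:

#include <avr/pgmspace.h>   /* memcpy_P */
#include <stdint.h>

/* sizeof(S_string_descriptor) is 2 (the two uint8_t members); the wide
   string adds sizeof L"1234" - 2 bytes (dropping the terminating L'\0'). */
enum { SN_DESC_TOTAL_SIZE = sizeof(S_string_descriptor) + sizeof L"1234" - 2 };

/* Copy the whole descriptor out of flash into a RAM buffer when needed. */
static void load_sn_desc(uint8_t *buf)
{
    memcpy_P(buf, &sn_desc, SN_DESC_TOTAL_SIZE);
}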

What's the memory layout of UTF-16 encoded strings with Visual Studio 2015?

WinAPI uses wchar_t buffers. As I understand it, we need to use UTF-16 to encode all string arguments to WinAPI.
There are two variants of UTF-16: UTF-16BE and UTF-16LE. Let's encode the string "Example" (0x45 0x78 0x61 0x6d 0x70 0x6c 0x65). With UTF-16BE the bytes should be placed like this: 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65. With UTF-16LE it should be: 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00. (We are omitting the BOM.) The byte representations of the same string are different.
According to the docs, Windows uses UTF-16LE. This means that we should encode all strings as UTF-16LE or it will not work.
At the same time, my compiler (VS2015) uses UTF-16BE for the strings that I hard-coded into my code (something like L"my test string"). But WinAPI works fine with these strings. Why does it work? What am I missing?
Update 1:
To test the byte representation of hard-coded strings I used the following code:
std::string charToHex(wchar_t ch)
{
    const char alphabet[] = "0123456789ABCDEF";
    std::string result(4, ' ');
    result[0] = alphabet[static_cast<unsigned int>((ch & 0xf000) >> 12)];
    result[1] = alphabet[static_cast<unsigned int>((ch & 0xf00) >> 8)];
    result[2] = alphabet[static_cast<unsigned int>((ch & 0xf0) >> 4)];
    result[3] = alphabet[static_cast<unsigned int>(ch & 0xf)];
    return std::move(result);
}
Little endian or big endian describes the way that variables of more than 8 bits are stored in memory. The test you have devised doesn't test memory layout; it works with wchar_t values directly, and the upper bits of an integer type are always the upper bits, no matter whether the CPU is big endian or little endian!
This modification to your code will show how it really works.
std::string charToHex(wchar_t * pch)
{
    const char alphabet[] = "0123456789ABCDEF";
    std::string result;
    unsigned char * pbytes = reinterpret_cast<unsigned char *>(pch);
    for (size_t i = 0; i < sizeof(wchar_t); ++i)
    {
        result.push_back(alphabet[(pbytes[i] & 0xf0) >> 4]);
        result.push_back(alphabet[pbytes[i] & 0x0f]);
    }
    return std::move(result);
}
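A quick way to see the same thing without WinAPI (a hypothetical test program in plain C, not from the question; it assumes the 16-bit wchar_t used by VS2015 on x86/x64) is to dump a wide string literal byte by byte:

#include <stdio.h>
#include <wchar.h>

int main(void) {
    const wchar_t *s = L"Example";
    const unsigned char *p = (const unsigned char *)s;
    /* On a little-endian build with 16-bit wchar_t this prints
       45 00 78 00 61 00 6d 00 70 00 6c 00 65 00, i.e. UTF-16LE. */
    for (size_t i = 0; i < wcslen(s) * sizeof(wchar_t); i++)
        printf("%02x ", p[i]);
    printf("\n");
    return 0;
}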

Re-attempting BASIC 6502 N-byte integer addition?

I initially asked for help and wrote a basic program in the 6502 PET emulator which added two n-byte integers. However, my feedback was that it was simply adding two 16-bit integers (not adding n-byte integers).
Can anyone help me understand this feedback by looking at my code, and point me in the right direction to make a program that adds two n-byte integers?
Thank you for the collaboration!
Documentation:
Adds two n-byte integers using absolute indexed addressing. The addends begin at memory locations $0600, $0700 and the answer is at $0800. Byte length of the integers is at $0600 (¢ —> 256)
Machine Code:
18 a2 00 ac 00 06 bd 00
07 7d 00 08 9d 00 09 e8
00 88 00 d0
Op Codes, Documentation, Variables:
A1 = $0600
B1 = $0700
B2 = $0800
Z1 = $0900
[START] = $0500
       CLC            18
       LDX            A2 00      // loads x with 0
       LDY A1         AC 00 06   // loads length on Y
loop:  LDA B1, x      BD 00 07   // load first operand
       ADC B2, x      7D 00 08   // adds second operand
       STA Z1, x      9D 00 09   // store result
       INX            E8 00      // go to next byte
       DEY            88 00      // count how many are left
       BNE loop       D0         // do more if needed
It looked to me like your code does what you claim -- adds two N-byte operands in little-endian byte order. I vaguely remembered the various addressing modes of the 6502 from my misspent youth and the code seems fine. X is used to index the current byte of the two numbers, Y is a counter for the length of the operands in bytes, and you loop over those bytes, stored at addresses 0x0700 and 0x0800, writing the result at address 0x0900.
Rather than get the Commodore 64 out of the attic and try it out, I used an online virtual 6502 simulator. On this site we can set the memory address and load the byte values in. They even link to a page to assemble opcodes too. So, setting memory location 0x0600 to "04" and both 0x0700 and 0x0800 to "04 03 02 01", we should see this code add these two 32-bit values (0x01020304 + 0x01020304 == 0x02040608).
Stepping through the code by clicking on the PC register, setting it to 0x0500 and then single stepping, we see there is a bug in your machine code. After INX, which assembles to E8, we hit a spurious 0x00 value (BRK) which terminates. The corrected code below runs to completion and the expected value can be seen by reading the memory at 0x0900.
0000 CLC 18
0001 LDX #$00 A2 00
0003 LDY $0600 AC 00 06
0006 LOOP: LDA $0700,X BD 00 07
0009 ADC $0800,X 7D 00 08
000C STA $0900,X 9D 00 09
000F INX E8
0010 DEY 88
0011 BNE LOOP: D0 F3
Memory dump:
:0900 08 06 04 02 00 00 00 00
:0908 00 00 00 00 00 00 00 00
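For comparison (not part of the assignment), the same byte-at-a-time algorithm can be sketched in C; the arrays stand in for the memory at $0700, $0800 and $0900, and the carry propagates exactly as ADC does:

#include <stdint.h>
#include <stdio.h>

/* Add two little-endian n-byte integers: a[0] and b[0] are the least
   significant bytes, and the carry ripples upward one byte at a time. */
static void add_n_bytes(const uint8_t *a, const uint8_t *b, uint8_t *sum, size_t n)
{
    unsigned carry = 0;                    /* CLC */
    for (size_t i = 0; i < n; i++) {       /* the LDA/ADC/STA loop */
        unsigned t = a[i] + b[i] + carry;
        sum[i] = (uint8_t)t;
        carry = t >> 8;                    /* carry into the next byte */
    }
}

int main(void) {
    uint8_t a[4] = {0x04, 0x03, 0x02, 0x01};   /* 0x01020304, little-endian */
    uint8_t b[4] = {0x04, 0x03, 0x02, 0x01};
    uint8_t s[4];
    add_n_bytes(a, b, s, 4);
    for (int i = 0; i < 4; i++)
        printf("%02x ", s[i]);                 /* 08 06 04 02 */
    printf("\n");
    return 0;
}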

Hashing a 64-bit value into a 32-bit MAC address

I'm looking for suggestions on how to convert a 64-bit die revision field into a 32-bit MAC address I can use for a wireless application, in a way that avoids collisions.
The die information is
struct {
    uint32_t lot;
    uint16_t X_coordinate;
    uint16_t Y_coordinate;
}
I don't know the range of the coordinates, but based on a few samples, I think the coordinates are limited to < 256. That effectively reduces the space by 2 bytes. But the lot number is fully populated.
I'm going to try this (pseudocode to make it readable; I'm leaving the casts out):
MAC = X_coordinate | Y_coordinate << 8 | lot << 16;
and throw away the top 16 bits of the lot and the top 8 bits of the coordinates. I feel, though, that maybe I should XOR in the top 16 bits of the lot somewhere, but I have no experience with this in the real world. (A rough sketch of this packing is shown after the sample data below.)
Here is a sample of die revision information (little-endian byte dump):
lot/wafer ID   X coordinate   Y coordinate
C3 1B B0 46    20 00          22 00
CB 8B 94 46    14 00          32 00
CB 8B 94 46    27 00          1E 00
B9 F7 80 6F    20 00          08 00
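A rough sketch of the packing described in the question (a hypothetical helper, with field widths taken from the struct above; the XOR at the end is just one possible way to mix the otherwise-discarded top lot bits back in):

#include <stdint.h>

struct die_id {
    uint32_t lot;
    uint16_t X_coordinate;
    uint16_t Y_coordinate;
};

static uint32_t die_to_mac(const struct die_id *d)
{
    /* Low 8 bits of each coordinate, low 16 bits of the lot. */
    uint32_t mac = ((uint32_t)d->X_coordinate & 0xFFu)
                 | (((uint32_t)d->Y_coordinate & 0xFFu) << 8)
                 | ((d->lot & 0xFFFFu) << 16);
    /* Fold the top 16 lot bits into the low half so they still matter. */
    mac ^= (d->lot >> 16) & 0xFFFFu;
    return mac;
}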
