This is the first time I have needed to work on raw data (different endianness, two's complement, ...), so I am only now figuring out how to work with the bytes type.
I need to implement the following checksum algorithm. I understand the C code, but wonder how to do this gracefully in Python 3.
I'm sure I could come up with something that works, but it would be terribly inefficient or unreliable.
The checksum algorithm used is the 8-bit Fletcher algorithm. This algorithm works as follows:
Buffer[N] is an array of bytes that contains the data over which the checksum is to be calculated.
The two values CK_A and CK_B are 8-bit unsigned integers only! If implementing with larger-sized integer values, make sure to mask both
CK_A and CK_B with the value 0xff after both operations in the loop.
After the loop, the two U1 values contain the checksum, transmitted after the message payload, which concludes the frame.
CK_A = 0, CK_B = 0
For (I = 0; I < N; I++)
{
    CK_A = CK_A + Buffer[I]
    CK_B = CK_B + CK_A
}
My data structure is as follows:
source = b'\xb5b\x01<#\x00\x01\x00\x00\x00hUX\x17\xdd\xff\xff\xff^\xff\xff\xff\xff\xff\xff\xff\xa6\x00\x00\x00F\xee\x88\x01\x00\x00\x00\x00\xa5\xf5\xd1\x05d\x00\x00\x00d\x00\x00\x00j\x00\x00\x00d\x00\x00\x00\xcb\x86\x00\x00\x00\x00\x00\x007\x01\x00\x00\xcd\xa2'
I came up with a couple of ideas on how to do this but have issues.
The following is where I am now, I've added comments on how I think it would work (but doesn't).
for b in source[5:-2]:
    # The following results in "TypeError("can't concat int to bytes")"
    # So I take one element of a byte, then I would expect to get a single byte.
    # However, I get an int.
    # Should I convert the left part of the operation to an int first?
    # I suppose I could get this done in a couple of steps but it seems this can't be the "correct" way...
    CK_A[-1:] += b
    # I hoped the following would work as a bitmask,
    # (by keeping only the last byte) thus "emulating" an uint8_t
    # Might not be the correct/best assumption...
    CK_A = CK_A[-1:]
    CK_B[-1:] += CK_A
    CK_B = CK_B[-1:]
ret = CK_A + CK_B
Clearly, I do not completely grasp how this Bytes type works/should be used.
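The TypeError mentioned in the comments above comes from a general property of bytes: indexing gives an int, while slicing gives another bytes object. A minimal sketch of that behaviour follows; the names data, as_int and as_bytes are only illustrative and do not come from the original code.

```python
data = b"\xb5b\x01"

# Indexing a bytes object yields an int in the range 0..255.
first = data[0]            # 0xB5 as a plain int

# Slicing yields a bytes object, even when it holds a single element.
first_slice = data[0:1]    # b"\xb5"

# Mixing the two is rejected, which is the TypeError from the question.
error = None
try:
    data[0:1] + data[1]    # bytes + int
except TypeError as exc:
    error = exc

# Doing the arithmetic on ints and converting back once avoids the problem.
as_int = data[0] + data[1]            # ordinary integer addition, may exceed 255
as_bytes = bytes([as_int & 0xFF])     # mask to 8 bits, then wrap in a bytes object
```

So the sum can be kept as an int for the whole loop and only turned back into bytes at the very end.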
Seems I was making things too difficult...
CK_A = 0
CK_B = 0
for b in source:
    CK_A += b
    CK_B += CK_A
    CK_A %= 0x100
    CK_B %= 0x100
ret = bytes()
ret = int.to_bytes(CK_A,1, 'big') + int.to_bytes(CK_B,1,'big')
The %=0x100 works as a bit mask, leaving only the 8 LSB...
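The same loop can be wrapped in a small function. This is only a sketch: the name fletcher8 is made up for illustration, and which slice of the frame to pass in (the question used source[5:-2], the code above the whole of source) depends on the protocol and is not settled here.

```python
def fletcher8(data: bytes) -> bytes:
    """Return the two checksum bytes CK_A and CK_B for data."""
    ck_a = 0
    ck_b = 0
    for byte in data:                  # iterating over bytes yields ints
        ck_a = (ck_a + byte) & 0xFF    # same effect as %= 0x100 for non-negative ints
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes((ck_a, ck_b))


# Small hand-checkable inputs:
small = fletcher8(b"\x01\x02")      # CK_A: 1 then 3, CK_B: 1 then 4
wrapped = fletcher8(b"\xff\x01")    # CK_A wraps from 0x100 to 0x00
```

Masking inside the loop keeps both values in the 8-bit range at every step, mirroring what a uint8_t would do in the C version.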
Why is my code behaving this way when I dereference *str + i? I know that, if I were going to try to print each character of the string one-by-one, I would have done str[i] rather than *str + i, but I wanted to see what happened here.
Is the computer recognizing that 'A' is the first letter, finding the memory location of 'A', and then just going up through the ASCII table? It almost seems like there is just one place in the computer in which the letter 'A' is stored, I found it, and then because a char is one byte it just went through the rest of the ASCII table on the for loop.
Thank you!
input:
char *str1 = "Abc";
for (int i = 0; i < 30; i++)
{
    printf("letter: %c - ", *str1 + i);
    printf("memory address: %p", &str1 + i);
    printf("\n");
}
output:
letter: A - memory address: 0x7ffea8ea9510
letter: B - memory address: 0x7ffea8ea9518
letter: C - memory address: 0x7ffea8ea9520
letter: D - memory address: 0x7ffea8ea9528
letter: E - memory address: 0x7ffea8ea9530
letter: F - memory address: 0x7ffea8ea9538
letter: G - memory address: 0x7ffea8ea9540
letter: H - memory address: 0x7ffea8ea9548
letter: I - memory address: 0x7ffea8ea9550
letter: J - memory address: 0x7ffea8ea9558
letter: K - memory address: 0x7ffea8ea9560
letter: L - memory address: 0x7ffea8ea9568
letter: M - memory address: 0x7ffea8ea9570
letter: N - memory address: 0x7ffea8ea9578
letter: O - memory address: 0x7ffea8ea9580
etc. etc. etc.
When you write
char *str1 = "ABC";
memory looks like this:
+---+---+---+---+
| A | B | C | \0|
+---+---+---+---+
^
|
+---+
| | str1
+---+
With that in mind, what does
*str1 + i
do? Well, C interprets this to mean "give me the character pointed at by str1, then add i to it." Since str1 points to the first character of the string, the value of *str1 is 'A'. Adding i to 'A' then starts advancing through the characters of the alphabet sequentially, which is why you see A, then B, then C, etc.
On the other hand, when you write
&str1 + i
C interprets this to mean "give me the address of the variable str1, then shift forward i positions." So, for example, &str1 + 0 is the address of the str1 pointer, then &str1 + 1 is the memory address of where a char* one position past str1 would be, etc. But none of those addresses, other than &str1 + 0, actually represents anything.
If you want to see the addresses of the array elements, just write str1 + i. That means "go where str1 points, then advance forward i positions."
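The difference between adding to a value and adding to an address can also be observed outside C. Below is a rough sketch using Python's standard ctypes module purely as a stand-in for the C program; buf, base, value_walk and address_walk are illustrative names, and the buffer holds "Abc" as in the question.

```python
import ctypes

# A NUL-terminated 4-byte buffer holding "Abc", standing in for the C string.
buf = ctypes.create_string_buffer(b"Abc")
base = ctypes.addressof(buf)    # plays the role of str1 (address of the first char)

# Value arithmetic, like *str1 + i: read the first byte, then add i to that number.
value_walk = [chr(buf.raw[0] + i) for i in range(4)]

# Address arithmetic, like *(str1 + i): step i bytes forward, then read what is there.
address_walk = [ctypes.c_char.from_address(base + i).value for i in range(4)]
```

Here value_walk climbs through the character codes starting at 'A' regardless of what the string contains, while address_walk returns the bytes actually stored in the buffer, including the terminating NUL.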
I have the following two types that are mutually recursive (they have pointers pointing to each other):
vtypedef SimpleTextOutputInterface =
  @{ reset = EfiTextReset
   , output_string = [l:addr] (EfiTextString@l | ptr l)
   }
vtypedef EfiTextString = [n:nat][l1,l2:addr] (SimpleTextOutputInterface@l1, @[uint16][n]@l2 | ptr l1, ptr l2) -> uint64
I tried declaring an abstract type, like this:
absvt@ype SimpleTextOutputInterface
vtypedef EfiTextString = [n:nat][l1,l2:addr] (SimpleTextOutputInterface@l1, @[uint16][n]@l2 | ptr l1, ptr l2) -> uint64
assume SimpleTextOutputInterface =
  @{ reset = EfiTextReset
   , output_string = [l:addr] (EfiTextString@l | ptr l)
   }
But I have type errors when I try to use them (as though an at-view is lost somewhere).
Is there a way to make this work?
Maybe with a forward declaration if that exists in ATS?
My "standard" approach is to introduce a dummy and some related proof functions:
abst@ype SimpleTextOutputInterface_
prfun encode :
  {l:addr} SimpleTextOutputInterface @l -> SimpleTextOutputInterface_@l
prfun decode :
  {l:addr} SimpleTextOutputInterface_@l -> SimpleTextOutputInterface @l
I'm trying to write to memory-mapped registers from user space (through a custom driver). I want to write three 64-bit integers, and I declared the variables value_1, value_2 and value_3 as uint64_t.
I must use the gcc inline mov instruction and I'm working on an ARM 64-bit architecture on a custom version of linux for an embedded system.
This is my code:
asm ( "MOV %[reg1], %[val1]\t\n"
"MOV %[reg2], %[val2]\t\n"
"MOV %[reg3], %[val3]\t\n"
:[reg1] "=&r" (*register_1),[reg2] "=&r" (*register_2), [reg3] "=&r" (*register_3)
:[val1] "r"(value_1),[val2] "r" (value_2), [val3] "r" (value_3)
);
The problem is strange...
If I perform just two MOVs, the code works.
If I perform all three MOVs, the entire system crashes and I have to reboot it.
Even stranger...
If I put a printf, or even a nanosleep of 0 nanoseconds, between the second and the third MOV, the code works!
I looked around for a solution and also tried adding a memory clobber:
asm ( "MOV %[reg1], %[val1]\t\n"
"MOV %[reg2], %[val2]\t\n"
"MOV %[reg3], %[val3]\t\n"
:[reg1] "=&r" (*register_1),[reg2] "=&r" (*register_2), [reg3] "=&r" (*register_3)
:[val1] "r"(value_1),[val2] "r" (value_2), [val3] "r" (value_3)
:"memory"
);
...doesn't work!
I also used the memory barrier macro between the second and the third MOV, or at the end of the three MOVs:
asm volatile("": : :"memory")
...doesn't work!
Also, I tried to write directly into the registers using pointers and got the same behavior: after the second write the system crashes...
Can anybody suggest a solution, or tell me whether I'm using the gcc inline MOV or the memory barrier in the wrong way?
----> MORE DETAILS <-----
This is my main:
int main()
{
    int dev_fd;
    volatile void * base_addr = NULL;
    volatile uint64_t * reg1_addr = NULL;
    volatile uint32_t * reg2_addr = NULL;
    volatile uint32_t * reg3_addr = NULL;

    dev_fd = open(MY_DEVICE, O_RDWR);
    if (dev_fd < 0)
    {
        perror("Open call failed");
        return -1;
    }

    /* Map the device registers using the descriptor opened above */
    base_addr = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, dev_fd, 0);
    if (base_addr == MAP_FAILED)
    {
        perror("mmap operation failed");
        return -1;
    }
    printf("BASE ADDRESS VIRT: 0x%p\n", base_addr);

    /* Preparing the registers */
    reg1_addr = base_addr + REG1_OFF;
    reg2_addr = base_addr + REG2_OFF;
    reg3_addr = base_addr + REG3_OFF;

    uint64_t val_1 = 0xEEEEEEEE;
    uint64_t val_2 = 0x00030010;
    uint64_t val_3 = 0x01;

    asm ( "str %[val1], %[reg1]\t\n"
          "str %[val2], %[reg2]\t\n"
          "str %[val3], %[reg3]\t\n"
          :[reg1] "=&m" (*reg1_addr),[reg2] "=&m" (*reg2_addr), [reg3] "=&m" (*reg3_addr)
          :[val1] "r"(val_1),[val2] "r" (val_2), [val3] "r" (val_3)
        );

    printf("--- END ---\n");
    close(dev_fd);
    return 0;
}
This is the compiler output for the asm statement (Linaro toolchain; I cross-compile):
400bfc: f90013a0 str x0, [x29,#32]
400c00: f94027a3 ldr x3, [x29,#72]
400c04: f94023a4 ldr x4, [x29,#64]
400c08: f9402ba5 ldr x5, [x29,#80]
400c0c: f9401ba0 ldr x0, [x29,#48]
400c10: f94017a1 ldr x1, [x29,#40]
400c14: f94013a2 ldr x2, [x29,#32]
400c18: f9000060 str x0, [x3]
400c1c: f9000081 str x1, [x4]
400c20: f90000a2 str x2, [x5]
Thank you!
I tried with *reg1_addr = val_1; and I have the same problem.
Then this code isn't the problem. Avoiding asm is just a cleaner way to get equivalent machine code, without having to use inline asm. Your problem is more likely your choice of registers and values, or the kernel driver.
Or do you need the values to be in CPU registers before writing the first mmaped location, to avoid loading anything from the stack between stores? That's the only reason I can think of that you'd need inline asm, where compiler-generated stores might not be equivalent.
Answer to original question:
An "=&r" output constraint means a CPU register. So your inline-asm instructions will run in that order, assembling to something like
mov x0, x5
mov x1, x6
mov x2, x7
And then after that, compiler-generated code will store the values back to memory in some unspecified order. That order depends on how it chooses to generate code for the surrounding C. This is probably why changing the surrounding code changes the behaviour.
One solution might be "=&m" constraints with str instructions, so your asm does actually store to memory. Write the instruction as str %[val1], %[reg1], because STR takes the addressing mode as the 2nd operand, even though it's the destination.
Why can't you use a volatile uint64_t * pointer and a plain assignment such as *register_1 = value_1; like a normal person, to have the compiler emit store instructions that it's not allowed to reorder or optimize away? MMIO is exactly what volatile is for.
Doesn't Linux have macros or functions for doing MMIO loads/stores?
If you're having problems with inline asm, step 1 in debugging should be to look at the actual asm emitted by the compiler when it filled in the asm template, and the surrounding code.
Then single-step by instructions through the code (with GDB stepi, maybe in layout reg mode).
To clarify the question, please observe the c/c++ code fragment:
int a = 10, b = 20, c = 30, d = 40; //consecutive 4 int data values.
int* p = &d; //address of variable d.
Now, in Visual Studio (tested on 2013), if the value of p is hex_value (which can be viewed in the debugger memory window), you can observe that the addresses of the variables a, b, c and d are each 12 bytes apart!
So, if p == hex_value, then it follows:
&c == hex_value + 0xC (note hex C is 12 in decimal)
&b == &c + 0xC
&a == &b + 0xC
So why is there a 12-byte offset instead of 4 bytes, given that an int is just 4 bytes?
Now, if we declared an array:
int array[] = {10,20,30,40};
The values 10, 20, 30, 40 each are located at 4 bytes difference as expected!
Can anyone please explain this behavior?
The C++ standard states in section 8.3.4, Arrays, that "An object of array type contains a contiguously allocated non-empty set of N subobjects of type T."
This is why array[] is a set of contiguous ints, and why the difference between one element and the next is exactly sizeof(int).
For local/block variables (automatic storage), no such guarantee is given. The only statements are in section 1.7, The C++ memory model: "Every byte has a unique address." and 1.8, The C++ object model: "the address of that object is the address of the first byte it occupies. Two objects (...) shall have distinct addresses".
So anything you do that assumes such objects are contiguous would be undefined behavior and non-portable. You cannot even be sure of the order of the addresses at which these objects are created.
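The contiguity guarantee for arrays can be checked by reading memory at fixed strides. The sketch below uses Python's standard ctypes module as a stand-in for the C++ array (ctypes arrays follow the C layout); values, base and step are illustrative names, not part of the original code.

```python
import ctypes

# A C-layout array of four ints, mirroring int array[] = {10, 20, 30, 40};
values = (ctypes.c_int * 4)(10, 20, 30, 40)
base = ctypes.addressof(values)          # address of the first element
step = ctypes.sizeof(ctypes.c_int)       # sizeof(int) on this platform

# Read an int at base, base + step, base + 2*step, ...:
# this only returns the elements if they sit back to back.
read_back = [ctypes.c_int.from_address(base + i * step).value for i in range(4)]

# The whole array occupies exactly 4 * sizeof(int) bytes, so there is no padding.
total_size = ctypes.sizeof(values)
```

No comparable check is possible for four separate local variables, because nothing fixes their relative positions.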
Now I have played with a modified version of your code:
int a = 10, b = 20, c = 30, d = 40; //consecutive 4 int data values.
int* p = &d; //address of variable d.
int array[] = { 10, 20, 30, 40 };
char *pa = reinterpret_cast<char*>(&a),
*pb = reinterpret_cast<char*>(&b),
*pc = reinterpret_cast<char*>(&c),
*pd = reinterpret_cast<char*>(&d);
cout << "sizeof(int)=" << sizeof(int) << "\n &a=" << &a << \
" +" << pa - pb << "char\n &b=" << &b << \
" +" << pb - pc << "char\n &c=" << &c << \
" +" << pc - pd << "char\n &d=" << &d;
memset(&d, 0, (&a - &d)*sizeof(int));
// ATTENTION: undefined behaviour:
// will trigger core dump on leaving
// "Runtime check #2, stack around the variable b was corrupted".
When running this code I get:
debug                      release                    comment on release
sizeof(int)=4              sizeof(int)=4
&a=0052F884 +12char        &a=009EF9AC +4char
&b=0052F878 +12char        &b=009EF9A8 +-8char        // is before a
&c=0052F86C +12char        &c=009EF9B0 +12char        // is just after a !!
&d=0052F860                &d=009EF9A4
So you see that the order of the addresses may even be altered on the same compiler, depending on the build options! In fact, in release mode the variables are contiguous but not in the same order.
The extra space in the debug version comes from the option /RTCs. I have on purpose overwritten the variables with a harsh memset() that assumes they are contiguous. On exiting, I immediately get the message "Runtime check #2, stack around the variable b was corrupted", which clearly demonstrates the purpose of these extra bytes.
If you remove the option, you will get contiguous variables with MSVC13, each of 4 bytes as you expected. But there will be no more error message about stack corruption either.