Question about byte order and the Go standard library

We have the number 0x1234.
In big-endian order:
low address -----------------> high address
0x12 | 0x34
In little-endian order:
low address -----------------> high address
0x34 | 0x12
We can see the following function in binary.go:
func (bigEndian) PutUint16(b []byte, v uint16) {
    _ = b[1] // early bounds check to guarantee safety of writes below
    b[0] = byte(v >> 8)
    b[1] = byte(v)
}
I downloaded the Go distributions for the x86 and PowerPC architectures and found the same definition.
https://golang.org/dl/
go1.12.7.linux-ppc64le.tar.gz Archive Linux ppc64le 99MB 8eda20600d90247efbfa70d116d80056e11192d62592240975b2a8c53caa5bf3
Now let's see what happens in this function.
If the CPU is little-endian, 0x1234 is stored in memory like this:
low address -----------------> high address
0x34 | 0x12
v >> 8 shifts the value right by 8 bits (i.e. divides by 2^8), so we get this in memory:
low address -----------------> high address
0x12 | 0x00
byte(v >> 8) gives us the byte 0x12, which is at the low address -> b[0]
byte(v) gives us the byte 0x34 -> b[1]
So we get the result, which I think is right:
[0x12, 0x34]
=====================================
If the CPU is big-endian, 0x1234 is stored in memory like this:
low address -----------------> high address
0x12 | 0x34
v >> 8 shifts the value right by 8 bits (i.e. divides by 2^8), so we get this in memory:
low address -----------------> high address
0x00 | 0x12
byte(v >> 8) gives us the byte 0x00, which is at the low address -> b[0]
byte(v) gives us the byte 0x12 -> b[1]
So we get a result that I think is not right:
[0x00, 0x12]
I found on the web how to check whether your CPU is big-endian or little-endian, and I wrote the function below:
func IsBigEndian() bool {
    test16 := uint16(0x1234)
    test8 := *(*uint8)(unsafe.Pointer(&test16))
    if test8 == 0x12 {
        return true
    } else {
        fmt.Printf("little")
        return false
    }
}
According to this function, I think byte() takes the byte at the low address. Am I right?
If so, why do I get a wrong result in my analysis of the big-endian case above?

Thanks a lot @Volker. I found the post "Does bit-shift depend on endianness?" and now understand that byte(x) operates on the value in a processor register, which does not depend on the endianness of memory, so byte(0x1234) always gives 0x34.
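To make that concrete, here is a small self-contained sketch of my own (not part of the standard library) that checks the host byte order and then encodes 0x1234 with binary.BigEndian.PutUint16. The encoded bytes come out as [0x12 0x34] on both little- and big-endian machines, because the shifts and conversions happen on the register value, not on the bytes as laid out in memory:

package main

import (
    "encoding/binary"
    "fmt"
    "unsafe"
)

// hostIsBigEndian reports the byte order of the machine running the program
// by looking at the byte stored at the lowest address of a known 16-bit value.
func hostIsBigEndian() bool {
    v := uint16(0x1234)
    return *(*byte)(unsafe.Pointer(&v)) == 0x12
}

func main() {
    b := make([]byte, 2)
    binary.BigEndian.PutUint16(b, 0x1234)
    // Prints the same encoded bytes regardless of the host byte order.
    fmt.Printf("host big-endian: %v, encoded: [%#x %#x]\n", hostIsBigEndian(), b[0], b[1])
}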

Related

IPv4 Address BigEndian ByteOrder

I'm a bit confused about how the Go binary package in the standard library encodes an integer into a []byte with big-endian ordering.
For reference, below is the method in the standard library I'm confused with:
func (bigEndian) PutUint32(b []byte, v uint32) {
    _ = b[3] // early bounds check to guarantee safety of writes below
    b[0] = byte(v >> 24)
    b[1] = byte(v >> 16)
    b[2] = byte(v >> 8)
    b[3] = byte(v)
}
Suppose I have an IPv4 address represented as an unsigned 32-bit integer, such as 236194314.
With big-endian ordering, I expect this to be represented as the 4-byte slice [10 10 20 14].
However, PutUint32 stores byte(v) at the last index, b[3], and I end up with [14 20 10 10].
Is there any specific explanation for this?
The number 236194314 is 0E 14 0A 0A in hex, so its most significant byte is 0x0E (14 decimal). Your IPv4 address represented as an unsigned 32-bit integer already comes in byte-reversed.
The problem happened before you converted to a byte slice.
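As a quick check, here is a small sketch of my own (not from the original answer): building the uint32 from the address bytes in network order first gives the expected slice, while the already byte-reversed 236194314 reproduces [14 20 10 10]:

package main

import (
    "encoding/binary"
    "fmt"
    "net"
)

func main() {
    // Build the uint32 from the address bytes in network (big-endian) order.
    ip := net.ParseIP("10.10.20.14").To4()
    v := binary.BigEndian.Uint32(ip) // 0x0A0A140E

    b := make([]byte, 4)
    binary.BigEndian.PutUint32(b, v)
    fmt.Println(b) // [10 10 20 14]

    // 236194314 is 0x0E140A0A, i.e. the bytes already reversed,
    // which is why PutUint32 yields [14 20 10 10] for it.
    binary.BigEndian.PutUint32(b, 236194314)
    fmt.Println(b) // [14 20 10 10]
}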

Extending SRecord to handle crc32_mpeg2?

statement of problem:
I'm working with a Kinetis L series (ARM Cortex M0+) that has a dedicated CRC hardware module. Through trial and error and using this excellent online CRC calculator, I determined that the CRC hardware is configured to compute CRC32_MPEG2.
I'd like to use srec_input (a part of SRecord 1.64) to generate a CRC for a .srec file whose results must match the CRC_MPEG2 computed by the hardware. However, srec's built-in CRC algos (CRC32 and STM32) don't generate the same results as the CRC_MPEG2.
the question:
Is there a straightforward way to extend srec to handle CRC32_MPEG2? My current thought is to fork the srec source tree and extend it, but it seems likely that someone's already been down this path.
Alternatively, is there a way for srec to call an external program? (I didn't see one after a quick scan.) That might do the trick as well.
some details
The parameters of the hardware CRC32 algorithm are:
Input Reflected: No
Output Reflected: No
Polynomial: 0x4C11DB7
Initial Seed: 0xFFFFFFFF
Final XOR: 0x0
To test it, an input string of:
0x10 0xB5 0x06 0x4C 0x23 0x78 0x00 0x2B
0x07 0xD1 0x05 0x4B 0x00 0x2B 0x02 0xD0
should result in a CRC32 value of:
0x938F979A
what generated the CRC value in the first place?
In response to Mark Adler's well-posed question, the firmware uses the Freescale fsl_crc library to compute the CRC. The relevant code and parameters (mildly edited) follows:
void crc32_update(crc32_data_t *crc32Config, const uint8_t *src, uint32_t lengthInBytes)
{
    crc_config_t crcUserConfigPtr;

    CRC_GetDefaultConfig(&crcUserConfigPtr);

    crcUserConfigPtr.crcBits = kCrcBits32;
    crcUserConfigPtr.seed = 0xffffffff;
    crcUserConfigPtr.polynomial = 0x04c11db7U;
    crcUserConfigPtr.complementChecksum = false;
    crcUserConfigPtr.reflectIn = false;
    crcUserConfigPtr.reflectOut = false;

    CRC_Init(g_crcBase[0], &crcUserConfigPtr);
    CRC_WriteData(g_crcBase[0], src, lengthInBytes);
    crcUserConfigPtr.seed = CRC_Get32bitResult(g_crcBase[0]);

    crc32Config->currentCrc = crcUserConfigPtr.seed;
    crc32Config->byteCountCrc += lengthInBytes;
}
Peter Miller be praised...
It turns out that if you supply enough filters to srec_cat, you can make it do anything! :) In fact, the following arguments produce the correct checksum:
$ srec_cat test.srec -Bit_Reverse -CRC32LE 0x1000 -Bit_Reverse -XOR 0xff -crop 0x1000 0x1004 -Output -HEX_DUMP
00001000: 93 8F 97 9A #....
In other words: bit-reverse the bits going into the CRC32 algorithm, bit-reverse them on the way out, and one's-complement the result.
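If you want to double-check values outside of srec_cat, the CRC itself is short to compute directly. Here is a bit-at-a-time sketch in Go (my own illustration, using the parameters listed in the question); it should reproduce the 0x938F979A value for the 16-byte test input above:

package main

import "fmt"

// crc32MPEG2 computes CRC-32/MPEG-2: polynomial 0x04C11DB7, initial value
// 0xFFFFFFFF, no input/output reflection, no final XOR.
func crc32MPEG2(data []byte) uint32 {
    crc := uint32(0xFFFFFFFF)
    for _, b := range data {
        crc ^= uint32(b) << 24
        for i := 0; i < 8; i++ {
            if crc&0x80000000 != 0 {
                crc = (crc << 1) ^ 0x04C11DB7
            } else {
                crc <<= 1
            }
        }
    }
    return crc
}

func main() {
    data := []byte{
        0x10, 0xB5, 0x06, 0x4C, 0x23, 0x78, 0x00, 0x2B,
        0x07, 0xD1, 0x05, 0x4B, 0x00, 0x2B, 0x02, 0xD0,
    }
    fmt.Printf("0x%08X\n", crc32MPEG2(data)) // expected: 0x938F979A (per the question)
}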

golang slice allocation performance

I stumbled upon an interesting thing while checking the performance of memory allocation in Go.
package main

import (
    "fmt"
    "time"
)

func main() {
    const alloc int = 65536
    now := time.Now()
    loop := 50000
    for i := 0; i < loop; i++ {
        sl := make([]byte, alloc)
        i += len(sl) * 0
    }
    elapsed := time.Since(now)
    fmt.Printf("took %s to allocate %d bytes %d times", elapsed, alloc, loop)
}
I am running this on a Core i7-2600 with Go 1.6 64-bit (same results on 32-bit) and 16 GB of RAM, on Windows 10.
When alloc is 65536 (exactly 64K) it runs for 30 seconds (!).
When alloc is 65535 it takes ~200 ms.
Can someone explain this to me please?
I tried the same code at home on my Core i7-920 @ 3.8 GHz, but it didn't show the same results (both took around 200 ms). Does anyone have an idea what's going on?
Setting GOGC=off improved performance (down to less than 100ms). Why?
Because of escape analysis. When you build with go build -gcflags -m, the compiler prints which allocations escape to the heap. It really depends on your machine and Go compiler version, but when the compiler decides that an allocation should move to the heap it means two things:
1. The allocation will take longer (since "allocating" on the stack takes just one CPU instruction).
2. The GC will have to clean up that memory later, costing more CPU time.
On my machine, the allocation of 65536 bytes escapes to the heap and 65535 doesn't.
That's why one byte changed the whole process from 200 ms to 30 s. Amazing.
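For reference, a minimal sketch to inspect the compiler's escape decisions yourself (the constant and function names here are mine, and the reported results depend on your Go version):

package main

// Build with:
//     go build -gcflags=-m
// The compiler reports which make calls it moves to the heap
// ("escapes to heap"); the exact size threshold depends on the Go version.

const (
    smallAlloc = 65535
    largeAlloc = 65536
)

func allocSmall() int {
    s := make([]byte, smallAlloc) // may stay on the stack
    return len(s)
}

func allocLarge() int {
    s := make([]byte, largeAlloc) // may be reported as escaping to the heap
    return len(s)
}

func main() {
    _ = allocSmall()
    _ = allocLarge()
}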
Note/Update 2021: as Tapir Liui notes in Go101 with this tweet:
As of Go 1.17, Go runtime will allocate the elements of slice x on stack if the compiler proves they are only used in the current goroutine and N <= 64KB:
var x = make([]byte, N)
And Go runtime will allocate the array y on stack if the compiler proves it is only used in the current goroutine and N <= 10MB:
var y [N]byte
Then how do we allocate (the elements of) a slice whose size is larger than 64KB but not larger than 10MB on the stack (when the slice is only used in one goroutine)?
Just use the following way:
var y [N]byte
var x = y[:]
Considering that stack allocation is faster than heap allocation, this has a direct effect on your test for alloc equal to 65536 and above.
Tapir adds:
In fact, we can allocate slices with an arbitrarily large total element size on the stack.
const N = 500 * 1024 * 1024 // 500M

var v byte = 123

func createSlice() byte {
    var s = []byte{N: 0}
    for i := range s {
        s[i] = v
    }
    return s[v]
}
Changing 500 to 512 makes the program crash.
The reason is very simple.
const alloc int = 65535
0x0000 00000 (example.go:8) TEXT "".main(SB), ABIInternal, $65784-0
const alloc int = 65536
0x0000 00000 (example.go:8) TEXT "".main(SB), ABIInternal, $248-0
The difference is where the slice's backing array is created: with 65535 it fits inside the 65784-byte stack frame, while with 65536 it is allocated on the heap (hence the much smaller $248 frame).

How to calculate g values from LIS3DH sensor?

I am using a LIS3DH sensor with an ATmega128 to read acceleration values and detect motion. I went through the datasheet, but it seemed inadequate, so I decided to post here. From other posts I am convinced that the sensor resolution is 12 bits instead of 16 bits. What I need to know is: when computing the g value from the X-axis output registers, do we take the two's complement of the register contents only when the sign bit (the MSB of OUT_X_H, the high register) is 1, or every time, even when this bit is 0?
From my calculations I think we take the two's complement only when the MSB of the OUT_X_H register is 1.
But the datasheet says that we need to take the two's complement of both OUT_X_L and OUT_X_H every time.
Could anyone enlighten me on this?
Sample code
int main(void)
{
    stdout = &uart_str;
    UCSRB = 0x18;   // RXEN=1, TXEN=1
    UCSRC = 0x06;   // no parity, 1 stop bit, 8-bit data
    UBRRH = 0;
    UBRRL = 71;     // baud 9600

    timer_init();

    TWBR = 216;     // 400HZ
    TWSR = 0x03;
    TWCR |= (1 << TWINT) | (1 << TWSTA) | (0 << TWSTO) | (1 << TWEN); // TWCR=0x04;

    printf("\r\nLIS3D address: %x\r\n", twi_master_getchar(0x0F));
    twi_master_putchar(0x23, 0b000100000);
    printf("\r\nControl 4 register 0x23: %x", twi_master_getchar(0x23));
    printf("\r\nStatus register %x", twi_master_getchar(0x27));
    twi_master_putchar(0x20, 0x77);

    DDRB = 0xFF;
    PORTB = 0xFD;
    SREG = 0x80;    // sei();

    while (1)
    {
        process();
    }
}

void process(void)
{
    x_l = twi_master_getchar(0x28);
    x_h = twi_master_getchar(0x29);
    y_l = twi_master_getchar(0x2a);
    y_h = twi_master_getchar(0x2b);
    z_l = twi_master_getchar(0x2c);
    z_h = twi_master_getchar(0x2d);

    xvalue = (short int)(x_l + (x_h << 8));
    yvalue = (short int)(y_l + (y_h << 8));
    zvalue = (short int)(z_l + (z_h << 8));

    printf("\r\nx_val: %dg", xvalue);
    printf("\r\ny_val: %dg", yvalue);
    printf("\r\nz_val: %dg", zvalue);
}
I wrote CTRL_REG4 as 0x10 (4g), but when I read it back I got 0x20 (8g). This seems a bit bizarre.
Do not compute the two's complement. That would have the effect of negating the result.
Instead, the datasheet tells us the result is already a signed value. That is, 0 is not the lowest value; it is in the middle of the scale. (0xFFFF is just a little less than zero, not the highest value.)
Also, the result is always 16-bit, but it is not meant to be taken as that accurate. You can set a control register to generate more accurate values at the expense of current consumption, but it is still not guaranteed to be accurate to the last bit.
The datasheet does not say (at least not in the register description in chapter 8.2) that you have to calculate the two's complement; it states that the contents of the two registers are in two's complement.
So all you have to do is receive the two bytes and cast them to an int16_t to get the signed raw value.
uint8_t xl = 0x00;
uint8_t xh = 0xFC;
int16_t x = (int16_t)((((uint16_t)xh) << 8) | xl);
or
uint8_t xa[2] = {0x00, 0xFC}; // little endian: lower byte at lower address
int16_t x = *((int16_t *)xa);
(hope I did not mix something up with this)
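The same idea written out in Go, purely as an illustration (the rawToCounts helper and its numBits parameter are mine, not from the answer above): combine the high and low registers into a signed 16-bit value, then shift right arithmetically to undo the left justification.

package main

import "fmt"

// rawToCounts combines the two output registers (low byte, high byte) and
// sign-extends the left-justified reading by casting to int16 and shifting
// right arithmetically. numBits is 8, 10 or 12 depending on the power mode.
func rawToCounts(lo, hi uint8, numBits uint) int16 {
    raw := int16(uint16(hi)<<8 | uint16(lo))
    return raw >> (16 - numBits) // arithmetic shift keeps the sign
}

func main() {
    // Example bytes from the answer above: OUT_X_L = 0x00, OUT_X_H = 0xFC.
    fmt.Println(rawToCounts(0x00, 0xFC, 12)) // -64 counts in high-resolution mode
}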
I have another approach, which may be easier to implement as the compiler will do all of the work for you. The compiler will probably do it most efficiently and with no bugs too.
Read the raw data into the raw field in:
typedef union
{
    struct
    {
        // in low power - 8 significant bits, left justified
        int16_t reserved : 8;
        int16_t value    : 8;
    } lowPower;
    struct
    {
        // in normal power - 10 significant bits, left justified
        int16_t reserved : 6;
        int16_t value    : 10;
    } normalPower;
    struct
    {
        // in high resolution - 12 significant bits, left justified
        int16_t reserved : 4;
        int16_t value    : 12;
    } highPower;
    // the raw data as read from registers H and L
    uint16_t raw;
} LIS3DH_RAW_CONVERTER_T;
Then use the value field corresponding to the power mode you are using.
Note: in this example the bit-field layout is assumed to be big-endian.
Check whether you need to reverse the order of 'value' and 'reserved' for your compiler.
The LISxDH sensors produce two's-complement, left-justified values. They can be set to 12-bit, 10-bit, or 8-bit resolution. The value is read from the sensor as two 8-bit registers (LSB, MSB) that need to be assembled together.
If you set the resolution to 8-bit, you can just cast the LSB to int8, which is likely your processor's representation of 8-bit two's complement. Likewise, if it were possible to set the sensor to 16-bit resolution, you could just cast the assembled value to an int16.
However, if the value is 10-bit left-justified, the sign bit is in the wrong place for an int16. Here is how you convert it to an int16 (16-bit two's complement).
1. Read LSB, MSB from the sensor:
[MMMM MMMM] [LL00 0000]
[1001 0101] [1100 0000] //example = [0x95] [0xC0] (note that the LSB comes before MSB on the sensor)
2. Assemble the bytes, keeping in mind that the LSB is left-justified.
//---As an example....
uint8_t byteMSB = 0x95; //[1001 0101]
uint8_t byteLSB = 0xC0; //[1100 0000]
//---Cast to U16 to make room, then combine the bytes---
assembledValue = ( (uint16_t)(byteMSB) << UINT8_LEN ) | (uint16_t)byteLSB;
/*[MMMM MMMM LL00 0000]
[1001 0101 1100 0000] = 0x95C0 */
//---Shift to right justify---
assembledValue >>= (INT16_LEN-numBits);
/*[0000 00MM MMMM MMLL]
[0000 0010 0101 0111] = 0x0257 */
3. Convert from 10-bit 2's complement (now right-justified) to an int16 (which is just 16-bit 2's complement on most platforms).
Approach #1: If the sign bit (in our example, the tenth bit) = 0, then just cast it to int16 (since positive numbers are represented the same in 10-bit 2's complement and 16-bit 2's complement).
If the sign bit = 1, then invert the bits (keeping just the 10bits), add 1 to the result, then multiply by -1 (as per the definition of 2's complement).
convertedValueI16 = ~assembledValue; //invert bits
convertedValueI16 &= ( 0xFFFF>>(16-numBits) ); //but keep just the 10-bits
convertedValueI16 += 1; //add 1
convertedValueI16 *=-1; //multiply by -1
/*Note that the last two lines could be replaced by convertedValueI16 = ~convertedValueI16;*/
//result = -425 = 0xFE57 = [1111 1110 0101 0111]
Approach #2: Zero the sign bit (the tenth bit) and subtract out half the range, 1<<9.
//----Zero the sign bit (tenth bit)----
convertedValueI16 = (int16_t)( assembledValue^( 0x0001<<(numBits-1) ) );
/*Result = 87 = 0x57 [0000 0000 0101 0111]*/
//----Subtract out half the range----
convertedValueI16 -= ( (int16_t)(1)<<(numBits-1) );
/* [0000 0000 0101 0111]
  -[0000 0010 0000 0000]
 = [1111 1110 0101 0111]
   Result = 87 - 512 = -425 = 0xFE57 */
Link to script to try out (not optimized): http://tpcg.io/NHmBRR

Breaking a 32 bit integer into 8 bit chunks for Radix Sort

I am basically a beginner in computer science. Please forgive me if I ask elementary questions. I am trying to understand radix sort. I read that a 32-bit unsigned integer can be broken down into four 8-bit chunks. After that, all it takes is 4 passes to complete the radix sort. Can somebody please show me an example of how this breakdown (32 bits into four 8-bit chunks) works? Maybe with a 32-bit integer like 2147507648.
Thanks!
You would divide the 32-bit integer up into 4 pieces of 8 bits. Extracting those pieces is a matter of using some of the bitwise operators available in C:
uint32_t x = 2147507648;
uint8_t chunk1 = x & 0x000000ff; //lower 8 bits
uint8_t chunk2 = (x & 0x0000ff00) >> 8;
uint8_t chunk3 = (x & 0x00ff0000) >> 16;
uint8_t chunk4 = (x & 0xff000000) >> 24; //highest 8 bits
2147507648 decimal is 0x80005DC0 hex. You can pretty much eyeball those 8-bit pieces out of the hex representation, since each hex digit represents 4 bits, so each pair of hex digits is one 8-bit chunk.
So chunk 1 is 0xC0, chunk 2 is 0x5D, chunk 3 is 0x00, and chunk 4 is 0x80.
It's done as follows:
2147507648
=> 0x80005DC0 (hex value of 2147507648)
=> 0x80 0x00 0x5D 0xC0
=> 128 0 93 192
To do this, you'd need bitwise operations as nos suggested.
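To connect the chunk extraction to the "4 passes" mentioned in the question, here is a compact LSD radix sort sketch (written in Go rather than C, purely as an illustration): each pass counting-sorts the numbers on one 8-bit chunk, starting with the least significant one.

package main

import "fmt"

// radixSortUint32 sorts the slice with 4 passes of a counting sort,
// one pass per 8-bit chunk, least significant chunk first.
func radixSortUint32(a []uint32) {
    buf := make([]uint32, len(a))
    for pass := uint(0); pass < 4; pass++ {
        shift := pass * 8
        var count [256]int
        for _, v := range a {
            count[(v>>shift)&0xFF]++
        }
        // Turn the counts into starting positions (prefix sums).
        pos := 0
        for i, c := range count {
            count[i] = pos
            pos += c
        }
        // Stable scatter into the buffer, then copy back.
        for _, v := range a {
            chunk := (v >> shift) & 0xFF
            buf[count[chunk]] = v
            count[chunk]++
        }
        copy(a, buf)
    }
}

func main() {
    a := []uint32{2147507648, 42, 0xFFFFFFFF, 7}
    radixSortUint32(a)
    fmt.Println(a) // [7 42 2147507648 4294967295]
}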
