How to work with hex strings and bit shifting in Go

I have a hexadecimal string that I constructed from an Arduino sensor: "5FEE012503591FC70CC8"
This value contains an array of bytes, with specific data placed at specific locations of the hex string. Using JavaScript I can convert the hex string to a buffer and then apply bit-shifting operations to extract data, for example the battery status, like so:
const bytes = Buffer.from("5FEE012503591FC70CC8", "hex");
const battery = (bytes[8] << 8) | bytes[9];
console.log(battery);
My attempt below works, but I feel it's not as clean as the JS version shown above:
package main

import (
    "fmt"
    "strconv"
)

func main() {
    hex := "5FEE012503591FC70CC8"
    bat, err := strconv.ParseInt(hex[16:20], 16, 64)
    if err != nil {
        panic(err)
    }
    fmt.Println("Battery:", bat)
}
I would really appreciate it if someone could show me how to work with buffers, or the equivalent of the JS solution, in Go. FYI: the battery information is in the last 4 characters of the hex string.

The standard way to convert a string of hex digits into a byte slice is to use hex.DecodeString:
import "encoding/hex"
bs, err := hex.DecodeString("5FEE012503591FC70CC8")
You can convert the last two bytes into a uint16 (unsigned, since a battery level is probably not negative) in the same fashion as your JavaScript:
bat := uint16(bs[8])<<8 | uint16(bs[9])
https://play.golang.org/p/eeCeofCathF
or you can use the encoding/binary package:
bat := binary.BigEndian.Uint16(bs[8:])
https://play.golang.org/p/FVvaofgM8W-
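Putting both pieces together, here is a minimal runnable sketch (the payload and the "battery in the last two bytes" layout are taken from the question):
package main

import (
    "encoding/binary"
    "encoding/hex"
    "fmt"
)

func main() {
    // Decode the hex string into a byte slice.
    bs, err := hex.DecodeString("5FEE012503591FC70CC8")
    if err != nil {
        panic(err)
    }
    // The battery value is the last two bytes, interpreted big-endian.
    battery := binary.BigEndian.Uint16(bs[len(bs)-2:])
    fmt.Println("Battery:", battery) // 3272 (0x0CC8)
}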

Related

How is this code generating memory aligned slices?

I'm trying to do direct I/O on Linux, so I need to create memory-aligned buffers. I copied some code to do it, but I don't understand how it works:
package main

import (
    "fmt"
    "unsafe"

    "golang.org/x/sys/unix"
)

const (
    AlignSize = 4096
    BlockSize = 4096
)

// Looks like dark magic
func Alignment(block []byte, AlignSize int) int {
    return int(uintptr(unsafe.Pointer(&block[0])) & uintptr(AlignSize-1))
}

func main() {
    path := "/path/to/file.txt"
    fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECT, 0666)
    if err != nil {
        panic(err)
    }
    defer unix.Close(fd)

    file := make([]byte, 4096*2)
    a := Alignment(file, AlignSize)
    offset := 0
    if a != 0 {
        offset = AlignSize - a
    }
    file = file[offset : offset+BlockSize]

    _, readErr := unix.Pread(fd, file, 0)
    if readErr != nil {
        panic(readErr)
    }
    fmt.Println(a, offset, offset+BlockSize, len(file))
    fmt.Println("Content is: ", string(file))
}
I understand that I'm generating a slice twice as big as what I need and then extracting a memory-aligned block from it, but the Alignment function doesn't make sense to me.
How does the Alignment function work?
If I try to fmt.Println the intermediate steps of that function I get different results. Why? I guess observing it changes its memory alignment (like in quantum physics :D).
Edit:
Example with fmt.Println, where I no longer seem to need any alignment:
package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    path := "/path/to/file.txt"
    fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECT, 0666)
    if err != nil {
        panic(err)
    }
    defer unix.Close(fd)

    file := make([]byte, 4096)
    fmt.Println("Pointer: ", &file[0])

    n, readErr := unix.Pread(fd, file, 0)
    fmt.Println("Return is: ", n)
    if readErr != nil {
        panic(readErr)
    }
    fmt.Println("Content is: ", string(file))
}
Your AlignSize is a power of 2. In binary representation it is a single 1 bit followed by zeros:
fmt.Printf("%b", AlignSize) // 1000000000000
A slice allocated by make() may start at a more or less random memory address (more precisely, the starting address of its backing array may be anything), with ones and zeros following each other randomly in its binary representation.
Since you allocate twice the required size, you are guaranteed that the backing array covers an address range which contains, somewhere in the middle, an address ending with at least as many zeros as AlignSize's binary representation has, and which still has BlockSize room in the array starting at that address. We want to find this address.
This is what the Alignment() function does. It gets the starting address of the backing array with &block[0]. In Go there's no pointer arithmetic, so in order to do something like that, we have to convert the pointer to an integer (there is integer arithmetic of course). In order to do that, we have to convert the pointer to unsafe.Pointer: all pointers are convertible to this type, and unsafe.Pointer can be converted to uintptr (which is an unsigned integer large enough to store the uninterpreted bits of a pointer value), on which–being an integer–we can perform integer arithmetic.
We use bitwise AND with the value uintptr(AlignSize-1). Since AlignSize is a power of 2 (a single 1 bit followed by zeros), one less than it is a number whose binary representation is all ones, exactly as many as AlignSize has trailing zeros. See this example:
x := 0b1010101110101010101
fmt.Printf("AlignSize : %22b\n", AlignSize)
fmt.Printf("AlignSize-1 : %22b\n", AlignSize-1)
fmt.Printf("x : %22b\n", x)
fmt.Printf("result of & : %22b\n", x&(AlignSize-1))
Output:
AlignSize : 1000000000000
AlignSize-1 : 111111111111
x : 1010101110101010101
result of & : 110101010101
So the result of & is how far the starting address lies past the previous multiple of AlignSize. Subtracting it from AlignSize gives the number of bytes we must skip forward so that the resulting address has as many trailing zeros as AlignSize itself: that address is "aligned" to a multiple of AlignSize.
So we will use the part of the file slice starting at offset, and we only need BlockSize bytes of it:
file = file[offset : offset+BlockSize]
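Wrapped up as a helper, a minimal sketch of the same idea (the name alignedBlock is made up for illustration; it assumes the same unsafe import as above and a power-of-two alignSize):
// alignedBlock returns a blockSize-byte slice whose first byte sits on an
// alignSize boundary. It over-allocates by alignSize bytes so that the
// aligned region is guaranteed to fit inside the backing array.
func alignedBlock(blockSize, alignSize int) []byte {
    buf := make([]byte, blockSize+alignSize)
    a := int(uintptr(unsafe.Pointer(&buf[0])) & uintptr(alignSize-1))
    offset := 0
    if a != 0 {
        offset = alignSize - a
    }
    return buf[offset : offset+blockSize]
}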
Edit:
Looking at your modified code that tries to print the intermediate steps, I get output like this:
Pointer: 0xc0000b6000
Unsafe pointer: 0xc0000b6000
Unsafe pointer, uintptr: 824634466304
Unpersand: 0
Cast to int: 0
Return is: 0
Content is:
Note that nothing changes here. It's simply that the fmt package prints pointer values using hexadecimal representation, prefixed by 0x, while uintptr values are printed as integers, using decimal representation. Those values are equal:
fmt.Println(0xc0000b6000, 824634466304) // output: 824634466304 824634466304
Also note that the remainder (the result of &) is 0 because in my case 0xc0000b6000 is already a multiple of 4096; in binary it is 1100000000000000000010110110000000000000.
Edit #2:
When you use fmt.Println() to debug parts of the calculation, that may change the escape analysis results and thus the allocation of the slice (moving it from stack to heap). This also depends on the Go version used. Do not rely on your slice being allocated at an address that is already aligned to AlignSize.
See related questions for more details:
Mix print and fmt.Println and stack growing
why struct arrays comparing has different result
Addresses of slices of empty structs

How do I count emojis in a string in go? [duplicate]

How can I get the number of characters of a string in Go?
For example, if I have a string "hello" the method should return 5. I saw that len(str) returns the number of bytes and not the number of characters so len("£") returns 2 instead of 1 because £ is encoded with two bytes in UTF-8.
You can try RuneCountInString from the utf8 package.
returns the number of runes in p
As illustrated in this script, the byte length of "World" written in Chinese ("世界") is 6, but the rune count of "世界" is 2:
package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    fmt.Println("Hello, 世界", len("世界"), utf8.RuneCountInString("世界"))
}
Phrozen adds in the comments:
Actually you can do len() over runes by just type casting.
len([]rune("世界")) will print 2. At least in Go 1.3.
And with CL 108985 (May 2018, for Go 1.11), len([]rune(string)) is now optimized. (Fixes issue 24923)
The compiler detects len([]rune(string)) pattern automatically, and replaces it with for r := range s call.
Adds a new runtime function to count runes in a string.
Modifies the compiler to detect the pattern len([]rune(string))
and replaces it with the new rune counting runtime function.
name                                old time/op    new time/op    delta
RuneCount/lenruneslice/ASCII        27.8ns ± 2%    14.5ns ± 3%    -47.70%
RuneCount/lenruneslice/Japanese      126ns ± 2%      60ns ± 2%    -52.03%
RuneCount/lenruneslice/MixedLength   104ns ± 2%      50ns ± 1%    -51.71%
Stefan Steiger points to the blog post "Text normalization in Go"
What is a character?
As was mentioned in the strings blog post, characters can span multiple runes.
For example, an 'e' and a combining acute accent '◌́' ("\u0301") can combine to form 'é' ("e\u0301" in NFD). Together these two runes are one character.
The definition of a character may vary depending on the application.
For normalization we will define it as:
a sequence of runes that starts with a starter,
a rune that does not modify or combine backwards with any other rune,
followed by possibly empty sequence of non-starters, that is, runes that do (typically accents).
The normalization algorithm processes one character at a time.
Using that package and its Iter type, the actual number of "characters" would be:
package main

import (
    "fmt"

    "golang.org/x/text/unicode/norm"
)

func main() {
    var ia norm.Iter
    ia.InitString(norm.NFKD, "école")
    nc := 0
    for !ia.Done() {
        nc = nc + 1
        ia.Next()
    }
    fmt.Printf("Number of chars: %d\n", nc)
}
Here, this uses the Unicode normalization form NFKD ("Compatibility Decomposition").
Oliver's answer points to UNICODE TEXT SEGMENTATION as the only way to reliably determine default boundaries between certain significant text elements: user-perceived characters, words, and sentences.
For that, you need an external library like rivo/uniseg, which does Unicode Text Segmentation.
That will actually count "grapheme clusters", where multiple code points may be combined into one user-perceived character.
package main

import (
    "fmt"

    "github.com/rivo/uniseg"
)

func main() {
    gr := uniseg.NewGraphemes("👍🏼!")
    for gr.Next() {
        fmt.Printf("%x ", gr.Runes())
    }
    // Output: [1f44d 1f3fc] [21]
}
Two graphemes, even though there are three runes (Unicode code points).
You can see other examples in "How to manipulate strings in GO to reverse them?"
👩🏾‍🦰 alone is one grapheme but, as a Unicode code-point converter shows, 4 runes:
👩: woman (1f469)
medium-dark skin tone modifier (1f3fe)
ZERO WIDTH JOINER (200d)
🦰: red hair (1f9b0)
There is a way to get the count of runes without any packages, by converting the string to []rune and taking len([]rune(YOUR_STRING)):
package main

import "fmt"

func main() {
    russian := "Спутник и погром"
    english := "Sputnik & pogrom"

    fmt.Println("count of bytes:",
        len(russian),
        len(english))

    fmt.Println("count of runes:",
        len([]rune(russian)),
        len([]rune(english)))
}
count of bytes: 30 16
count of runes: 16 16
I should point out that none of the answers provided so far give you the number of characters as you would expect, especially when you're dealing with emojis (but also some languages like Thai, Korean, or Arabic). VonC's suggestions will output the following:
fmt.Println(utf8.RuneCountInString("🏳️‍🌈🇩🇪")) // Outputs "6".
fmt.Println(len([]rune("🏳️‍🌈🇩🇪"))) // Outputs "6".
That's because these methods only count Unicode code points. There are many characters which can be composed of multiple code points.
Same for using the Normalization package:
var ia norm.Iter
ia.InitString(norm.NFKD, "🏳️‍🌈🇩🇪")
nc := 0
for !ia.Done() {
    nc = nc + 1
    ia.Next()
}
fmt.Println(nc) // Outputs "6".
Normalization is not really the same as counting characters and many characters cannot be normalized into a one-code-point equivalent.
masakielastic's answer comes close but only handles modifiers (the rainbow flag contains a modifier which is thus not counted as its own code point):
fmt.Println(GraphemeCountInString("🏳️‍🌈🇩🇪")) // Outputs "5".
fmt.Println(GraphemeCountInString2("🏳️‍🌈🇩🇪")) // Outputs "5".
The correct way to split Unicode strings into (user-perceived) characters, i.e. grapheme clusters, is defined in the Unicode Standard Annex #29. The rules can be found in Section 3.1.1. The github.com/rivo/uniseg package implements these rules so you can determine the correct number of characters in a string:
fmt.Println(uniseg.GraphemeClusterCount("🏳️‍🌈🇩🇪")) // Outputs "2".
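For comparison, a small self-contained sketch of byte count versus rune count versus grapheme cluster count (using the same flag string as above):
package main

import (
    "fmt"
    "unicode/utf8"

    "github.com/rivo/uniseg"
)

func main() {
    s := "🏳️‍🌈🇩🇪"
    fmt.Println(len(s))                         // number of bytes in the UTF-8 encoding
    fmt.Println(utf8.RuneCountInString(s))      // 6 code points (runes)
    fmt.Println(uniseg.GraphemeClusterCount(s)) // 2 user-perceived characters
}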
If you need to take grapheme clusters into account, use the regexp or unicode package. Counting the number of code points (runes) or bytes is also needed for validation, since the length of a grapheme cluster is unbounded. If you want to eliminate extremely long sequences, check whether they conform to the stream-safe text format.
package main

import (
    "regexp"
    "strings"
    "unicode"
)

func main() {
    str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
    str2 := "a" + strings.Repeat("\u0308", 1000)

    println(4 == GraphemeCountInString(str))
    println(4 == GraphemeCountInString2(str))

    println(1 == GraphemeCountInString(str2))
    println(1 == GraphemeCountInString2(str2))

    println(true == IsStreamSafeString(str))
    println(false == IsStreamSafeString(str2))
}

// GraphemeCountInString counts a non-mark rune followed by any number of
// combining marks as one grapheme; anything else (e.g. a leading mark) is
// matched by the "." alternative and counted on its own.
func GraphemeCountInString(str string) int {
    re := regexp.MustCompile("\\PM\\pM*|.")
    return len(re.FindAllString(str, -1))
}

// GraphemeCountInString2 does the same without regexp: every non-mark rune
// starts a new grapheme; combining marks seen before any non-mark rune are
// counted individually.
func GraphemeCountInString2(str string) int {
    length := 0
    checked := false
    for _, c := range str {
        if !unicode.Is(unicode.M, c) {
            length++
            if !checked {
                checked = true
            }
        } else if !checked {
            length++
        }
    }
    return length
}

// IsStreamSafeString reports whether no rune is followed by 30 or more
// combining marks (the stream-safe text format limit).
func IsStreamSafeString(str string) bool {
    re := regexp.MustCompile("\\PM\\pM{30,}")
    return !re.MatchString(str)
}
There are several ways to get a string length:
package main

import (
    "bytes"
    "fmt"
    "strings"
    "unicode/utf8"
)

func main() {
    b := "这是个测试"

    len1 := len([]rune(b))
    len2 := bytes.Count([]byte(b), nil) - 1
    len3 := strings.Count(b, "") - 1
    len4 := utf8.RuneCountInString(b)

    fmt.Println(len1)
    fmt.Println(len2)
    fmt.Println(len3)
    fmt.Println(len4)
}
Depends a lot on your definition of what a "character" is. If "rune equals a character" is OK for your task (generally it isn't), then the answer by VonC is perfect for you. Otherwise, it should probably be noted that there are few situations where the number of runes in a Unicode string is an interesting value. And even in those situations it is better, if possible, to infer the count while "traversing" the string as the runes are processed, to avoid doubling the UTF-8 decoding effort.
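For instance, if you already range over the string to process it, the rune count falls out of the same pass (a small sketch; for range decodes one rune per iteration):
package main

import "fmt"

func main() {
    s := "Спутник и погром"
    count := 0
    for _, r := range s {
        // ... process the rune r here ...
        _ = r
        count++
    }
    fmt.Println(count) // 16
}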
I tried to make the normalization a bit faster:
en, _ = glyphSmart(data)

func glyphSmart(text string) (int, int) {
    gc := 0
    dummy := 0
    for ind := range text { // range decodes one rune per iteration
        gc++
        dummy = ind
    }
    dummy = 0
    return gc, dummy
}

Using crypto/rand for generating permutations with rand.Perm

Go has two packages for random numbers:
crypto/rand, which provides a way to get random bytes
math/rand, which has a nice algorithm for shuffling ints
I want to use the Perm algorithm from math/rand, but provide it with high-quality random numbers.
Since the two rand packages are part of the same standard library, there should be a way to combine them so that crypto/rand provides a good source of random numbers that math/rand.Perm uses to generate a permutation.
Here (and on the Playground) is the code I wrote to connect these two packages:
package main

import (
    cryptoRand "crypto/rand"
    "encoding/binary"
    "fmt"
    mathRand "math/rand"
)

type cryptoSource struct{}

func (s cryptoSource) Int63() int64 {
    bytes := make([]byte, 8)
    cryptoRand.Read(bytes)
    return int64(binary.BigEndian.Uint64(bytes) >> 1)
}

func (s cryptoSource) Seed(seed int64) {
    panic("seed")
}

func main() {
    rnd := mathRand.New(&cryptoSource{})
    perm := rnd.Perm(52)
    fmt.Println(perm)
}
This code works. Ideally I don't want to define the cryptoSource type myself but just glue the two rand packages together. So is there a predefined version of this cryptoSource type somewhere?
That's basically what you need to do. It's not often that you need a cryptographically secure source of randomness for the common uses of math/rand, so no adaptor is provided. You can make the implementation slightly more efficient by allocating the buffer space directly in the value, rather than allocating a new slice on every call. However, in the unlikely event that reading the OS random source fails, this will need to panic to prevent returning invalid results.
type cryptoSource [8]byte

func (s *cryptoSource) Int63() int64 {
    _, err := cryptoRand.Read(s[:])
    if err != nil {
        panic(err)
    }
    return int64(binary.BigEndian.Uint64(s[:]) & (1<<63 - 1))
}
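A minimal sketch of plugging this source into math/rand, assuming the same imports and aliases as in the question (note that a Seed method is still required to satisfy rand.Source, as in your original version):
// Seed is a no-op: crypto/rand needs no seeding; the method only exists to
// satisfy the rand.Source interface.
func (s *cryptoSource) Seed(seed int64) {}

func main() {
    var src cryptoSource
    rnd := mathRand.New(&src)
    fmt.Println(rnd.Perm(52))
}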

Umlauts and slices

I'm having some trouble reading a file which has a fixed column-length format. Some columns may contain umlauts.
Umlauts seem to use 2 bytes instead of one. This is not the behaviour I was expecting. Is there any kind of function that returns a substring? Slicing does not seem to work in this case.
Here's some sample code:
http://play.golang.org/p/ZJ1axy7UXe
umlautsString := "Rhön"
fmt.Println(len(umlautsString))
fmt.Println(umlautsString[0:4])
Prints:
5
Rhö
In Go, slicing a string counts bytes, not runes. This is why "Rhön"[0:3] gives you Rh and only the first byte of ö.
Characters encoded in UTF-8 are represented as runes because UTF-8 may encode a character in more than one byte (up to four bytes) in order to provide a bigger range of characters.
If you want to slice a string with the [] syntax, convert the string to []rune first.
Example (on play):
umlautsString := "Rhön"
runes := []rune(umlautsString)
fmt.Println(string(runes[0:3])) // Rhö
Noteworthy: this Go blog post about string representation in Go.
You can convert string to []rune and work with it:
package main

import "fmt"

func main() {
    umlautsString := "Rhön"
    fmt.Println(len(umlautsString))

    subStrRunes := []rune(umlautsString)
    fmt.Println(len(subStrRunes))
    fmt.Println(string(subStrRunes[0:4]))
}
http://play.golang.org/p/__WfitzMOJ
Hope that helps!
Another option is the utf8string package:
package main

import "golang.org/x/exp/utf8string"

func main() {
    s := utf8string.NewString("🧡💛💚💙💜")

    // example 1
    n := s.RuneCount()
    println(n == 5)

    // example 2
    t := s.Slice(0, 2)
    println(t == "🧡💛")
}
https://pkg.go.dev/golang.org/x/exp/utf8string

Looking for an elegant way to parse an Integer [duplicate]

This question already has answers here:
Convert string to integer type in Go?
(5 answers)
Closed 8 months ago.
Right now I am doing the following in order to parse an integer from a string and then convert it to int type:
tmpValue, _ := strconv.ParseInt(str, 10, 64) //returns int64
finalValue = int(tmpValue)
It is quite verbose, and definitely not pretty, since I haven't found a way to do the conversion in the ParseInt call. Is there a nicer way to do that?
It seems that the function strconv.Atoi does what you want, except that it works regardless of the bit size of int (your code seems to assume it's 64 bits wide).
If you have to write that only once in your program then I see no problem. If you need it in several places you can write a simple specialized wrapper, for example:
func parseInt(s string) (int, error) {
    i, err := strconv.ParseInt(s, 10, 32) // or 64 on a 64-bit platform
    return int(i), err
}
The standard library doesn't aim to supply (bloated) APIs for every possible numeric type and/or their combinations.
Don't forget to check for errors. For example,
package main

import (
    "fmt"
    "strconv"
)

func main() {
    s := "123"
    i, err := strconv.Atoi(s)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(i)
}
Output:
123
