bytewise compare varint encoded int64's - go

I'm using levigo, the leveldb bindings for Go. My keys are int64s and need to be kept sorted. By default, leveldb uses a bytewise comparator, so I'm trying to use varint encoding.
func i2b(x int64) []byte {
b := make([]byte, binary.MaxVarintLen64)
n := binary.PutVarint(b, x)
return b[:n]
}
My keys are not being sorted correctly. I wrote the following as a test.
var prev int64 = 0
for i := int64(1); i < 1e5; i++ {
if bytes.Compare(i2b(i), i2b(prev)) <= 0 {
log.Fatalf("bytewise: %d > %d", prev, i)
}
prev = i
}
output: bytewise: 127 > 128
I'm not sure where the problem is. Am I doing the encoding wrong? Is varint not the right encoding to use?
EDIT:
Big-endian fixed-width encoding is bytewise comparable (at least for non-negative values):
func i2b(x int64) []byte {
b := make([]byte, 8)
binary.BigEndian.PutUint64(b, uint64(x))
return b
}

The varint encoding is not bytewise comparable* with respect to the order of the values it carries. One option is to write your own ordering/collating function (cmp below), for example:
package main
import (
"encoding/binary"
"log"
)
func i2b(x int64) []byte {
var b [binary.MaxVarintLen64]byte
return b[:binary.PutVarint(b[:], x)]
}
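// cmp returns a negative, zero, or positive int, like bytes.Compare.
func cmp(a, b []byte) int {
x, n := binary.Varint(a)
if n <= 0 {
log.Fatal(n)
}
y, n := binary.Varint(b)
if n <= 0 {
log.Fatal(n)
}
// Compare instead of subtracting: x - y can overflow int64.
switch {
case x < y:
return -1
case x > y:
return 1
}
return 0
}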
func main() {
var prev int64 = 0
for i := int64(1); i < 1e5; i++ {
if cmp(i2b(i), i2b(prev)) <= 0 {
log.Fatal("fail")
}
prev = i
}
}
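(*) The reason is the bit manipulation the encoding performs: PutVarint zig-zag encodes the value and then writes it in 7-bit groups, least-significant group first, so the leading byte does not hold the most significant bits.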

Related

Golang number base conversion

I was wondering, how do you convert a base-10 number to another base without using strconv in Go?
Could you please give me some advice ?
package main
import (
"fmt"
"math/big"
)
func main() {
fmt.Println(big.NewInt(1000000000000).Text(62))
}
Use the math package and a log identity:
log_77(x) = log(x) / log(77)
This is probably cheating but I guess you could look at the implementation of strconv.FormatInt, and build some of your own code using that as an example. That way you aren't using it directly, you have implemented it yourself.
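As a rough illustration of that idea, here is a minimal sketch of the repeated-division approach. It is not the strconv implementation; the name toBase and the digit alphabet are assumptions for this example, and it handles only unsigned values and bases 2 to 36.

```go
package main

import "fmt"

// digits is the alphabet used for bases up to 36 (an assumption for this sketch).
const digits = "0123456789abcdefghijklmnopqrstuvwxyz"

// toBase returns n written in the given base (2..36).
func toBase(n uint64, base uint64) string {
	if base < 2 || base > uint64(len(digits)) {
		panic("toBase: base out of range")
	}
	var buf [64]byte // 64 binary digits is the longest possible result
	pos := len(buf)
	for {
		// Fill the buffer from the right with the least significant digit first.
		pos--
		buf[pos] = digits[n%base]
		n /= base
		if n == 0 {
			return string(buf[pos:])
		}
	}
}

func main() {
	fmt.Println(toBase(100, 16)) // 64
	fmt.Println(toBase(255, 2))  // 11111111
	fmt.Println(toBase(0, 7))    // 0
}
```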
You can use this function to convert any decimal number to any base with the character set of your choice.
func encode(nb uint64, buf *bytes.Buffer, base string) {
l := uint64(len(base))
if nb/l != 0 {
encode(nb/l, buf, base)
}
buf.WriteByte(base[nb%l])
}
func decode(enc, base string) uint64 {
var nb uint64
lbase := len(base)
le := len(enc)
for i := 0; i < le; i++ {
mult := 1
for j := 0; j < le-i-1; j++ {
mult *= lbase
}
nb += uint64(strings.IndexByte(base, enc[i]) * mult)
}
return nb
}
You can use it like this:
// encoding
var buf bytes.Buffer
encode(100, &buf, "0123456789abcdef")
fmt.Println(buf.String())
// 64
// decoding
val := decode("64", "0123456789abcdef")
fmt.Println(val)
// 100

How to remove Unicode characters from byte buffer in Go?

I have a bytes.Buffer type variable which I filled with Unicode characters:
var mbuff bytes.Buffer
unicodeSource := "کیا حال ھے؟"
for _, r := range unicodeSource {
mbuff.WriteRune(r)
}
Note: I iterated over a Unicode literal here, but really the source is an infinite loop of user input characters.
Now, I want to remove a Unicode character from any position in the buffer mbuff. The problem is that characters may be of variable byte sizes. So I cannot just pick out the ith byte from mbuff.String() as it might be the beginning, middle, or end of a character. This is my trivial (and horrendous) solution:
// removing Unicode character at position n
var tempString string
currChar := 0
for _, ch := range(mbuff.String()) { // iterate over Unicode chars
if currChar != n { // skip concatenating nth char
tempString += string(ch)
}
currChar++
}
mbuff.Reset() // empty buffer
mbuff.WriteString(tempString) // write new string
This is bad in many ways. For one, I convert buffer to string, remove ith element, and write a new string back into the buffer. Too many operations. Second, I use the += operator in the loop to concatenate Unicode characters into a new string. I am using buffers in the first place exactly to avoid concatenation using += which is slow as this answer points out.
What is an efficient method to remove the ith Unicode character in a bytes.Buffer?
Also what is an efficient way to insert a Unicode character after i-1 Unicode characters (i.e. in the ith place)?
To remove the ith rune from a slice of bytes, loop through the slice counting runes. When the ith rune is found, copy the bytes following the rune down to the position of the ith rune:
func removeAtBytes(p []byte, i int) []byte {
j := 0
k := 0
for k < len(p) {
_, n := utf8.DecodeRune(p[k:])
if i == j {
return p[:k+copy(p[k:], p[k+n:])] // shift the tail down and stop scanning
}
j++
k += n
}
return p
}
This function modifies the backing array of the argument slice, but it does not allocate memory.
Use this function to remove a rune from a bytes.Buffer.
p := removeAtBytes(mbuff.Bytes(), i)
mbuff.Truncate(len(p)) // backing bytes were updated, adjust length
To remove the ith rune from a string, loop through the string counting runes. When the ith rune is found, create a string by concatenating the segment of the string before the rune with the segment of the string after the rune.
func removeAt(s string, i int) string {
j := 0 // count of runes
k := 0 // index in string of current rune
for k < len(s) {
_, n := utf8.DecodeRuneInString(s[k:])
if i == j {
return s[:k] + s[k+n:]
}
j++
k += n
}
return s
}
This function allocates a single string, the result. DecodeRuneInString is a function in the standard library unicode/utf8 package.
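The question also asks about insertion. A sketch along the same lines follows, assuming the new rune should end up at rune index i and that an out-of-range i simply appends; insertAt is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// insertAt returns s with r inserted so that it becomes the ith rune.
// If s has fewer than i runes, r is appended.
func insertAt(s string, i int, r rune) string {
	j := 0 // count of runes seen so far
	k := 0 // byte index of the current rune
	for k < len(s) && j < i {
		_, n := utf8.DecodeRuneInString(s[k:])
		j++
		k += n
	}
	// k now sits on a rune boundary, so slicing here cannot split a character.
	return s[:k] + string(r) + s[k:]
}

func main() {
	fmt.Println(insertAt("héllo", 2, 'X')) // héXllo
	fmt.Println(insertAt("ab", 5, 'c'))    // abc
}
```

Like removeAt, this allocates only the result string. For a bytes.Buffer, one option is to build the new string from buf.String() and write it back after a Reset.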
Taking a step back, Go often works on Readers and Writers, so an alternative solution would be to use the golang.org/x/text/transform package. You create a Transformer, attach it to a Reader and use the new Reader to produce a transformed string. For example, here's a skipper:
func main() {
src := strings.NewReader("کیا حال ھے؟")
skipped := transform.NewReader(src, NewSkipper(5))
var buf bytes.Buffer
io.Copy(&buf, skipped)
fmt.Println("RESULT:", buf.String())
}
And here's the implementation:
package main
import (
"bytes"
"fmt"
"io"
"strings"
"unicode/utf8"
"golang.org/x/text/transform"
)
type skipper struct {
pos int
cnt int
}
// NewSkipper creates a text transformer which will remove the rune at pos
func NewSkipper(pos int) transform.Transformer {
return &skipper{pos: pos}
}
func (s *skipper) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
for utf8.FullRune(src) {
_, sz := utf8.DecodeRune(src)
// not enough space in the dst
if len(dst) < sz {
return nDst, nSrc, transform.ErrShortDst
}
if s.pos != s.cnt {
copy(dst[:sz], src[:sz])
// track that we stored in dst
dst = dst[sz:]
nDst += sz
}
// track that we read from src
src = src[sz:]
nSrc += sz
// on to the next rune
s.cnt++
}
if len(src) > 0 && !atEOF {
return nDst, nSrc, transform.ErrShortSrc
}
return nDst, nSrc, nil
}
func (s *skipper) Reset() {
s.cnt = 0
}
There may be bugs with this code, but hopefully you can see the idea.
The benefit of this approach is it could work on a potentially infinite amount of data without having to store all of it in memory. For example you could transform a file this way.
Edit:
Remove the ith rune in the buffer:
A: Shift all runes one location to the left (Here A is faster than B), try it on The Go Playground:
func removeRuneAt(s string, runePosition int) string {
if runePosition < 0 {
return s
}
r := []rune(s)
if runePosition >= len(r) {
return s
}
copy(r[runePosition:], r[runePosition+1:])
return string(r[:len(r)-1])
}
B: Copy to new buffer, try it on The Go Playground
func removeRuneAt(s string, runePosition int) string {
if runePosition < 0 {
return s // avoid allocation
}
r := []rune(s)
if runePosition >= len(r) {
return s // avoid allocation
}
t := make([]rune, len(r)-1) // Apply replacements to buffer.
w := copy(t, r[:runePosition])
w += copy(t[w:], r[runePosition+1:])
return string(t[:w])
}
C: Try it on The Go Playground:
package main
import (
"bytes"
"fmt"
)
func main() {
str := "hello"
fmt.Println(str)
fmt.Println(removeRuneAt(str, 1))
buf := bytes.NewBuffer([]byte(str))
fmt.Println(buf.Bytes())
buf = bytes.NewBuffer([]byte(removeRuneAt(buf.String(), 1)))
fmt.Println(buf.Bytes())
}
func removeRuneAt(s string, runePosition int) string {
if runePosition < 0 {
return s // avoid allocation
}
r := []rune(s)
if runePosition >= len(r) {
return s // avoid allocation
}
t := make([]rune, len(r)-1) // Apply replacements to buffer.
w := copy(t, r[0:runePosition])
w += copy(t[w:], r[runePosition+1:])
return string(t[0:w])
}
D: Benchmark:
A: 745.0426ms
B: 1.0160581s
for 2000000 iterations
1- Short Answer: to replace all (n) instances of a character (or even a string):
n := -1
newR := ""
old := "µ"
buf = bytes.NewBuffer([]byte(strings.Replace(buf.String(), old, newR, n)))
2- For replacing the character(string) in the ith instance in the buffer, you may use:
buf = bytes.NewBuffer([]byte(Replace(buf.String(), oldString, newOrEmptyString, ith)))
See:
// Replace returns a copy of the string s with the ith
// non-overlapping instance of old replaced by new.
func Replace(s, old, new string, ith int) string {
if len(old) == 0 || old == new || ith < 0 {
return s // avoid allocation
}
i, j := 0, 0
for ; ith >= 0; ith-- {
j = strings.Index(s[i:], old)
if j < 0 {
return s // avoid allocation
}
j += i
i = j + len(old)
}
t := make([]byte, len(s)+(len(new)-len(old))) // Apply replacements to buffer.
w := copy(t, s[0:j])
w += copy(t[w:], new)
w += copy(t[w:], s[j+len(old):])
return string(t[0:w])
}
Try it on The Go Playground:
package main
import (
"bytes"
"fmt"
"strings"
)
func main() {
str := `How are you?µ`
fmt.Println(str)
fmt.Println(Replace(str, "µ", "", 0))
buf := bytes.NewBuffer([]byte(str))
fmt.Println(buf.Bytes())
buf = bytes.NewBuffer([]byte(Replace(buf.String(), "µ", "", 0)))
fmt.Println(buf.Bytes())
}
func Replace(s, old, new string, ith int) string {
if len(old) == 0 || old == new || ith < 0 {
return s // avoid allocation
}
i, j := 0, 0
for ; ith >= 0; ith-- {
j = strings.Index(s[i:], old)
if j < 0 {
return s // avoid allocation
}
j += i
i = j + len(old)
}
t := make([]byte, len(s)+(len(new)-len(old))) // Apply replacements to buffer.
w := copy(t, s[0:j])
w += copy(t[w:], new)
w += copy(t[w:], s[j+len(old):])
return string(t[0:w])
}
3- If you want to remove all instances of Unicode character (old string) from any position in the string, you may use:
strings.Replace(str, old, "", -1)
4- This also works fine for removing from a bytes.Buffer:
strings.Replace(buf.String(), old, newR, -1)
Like so:
buf = bytes.NewBuffer([]byte(strings.Replace(buf.String(), old, newR, -1)))
Here is the complete working code (try it on The Go Playground):
package main
import (
"bytes"
"fmt"
"strings"
)
func main() {
str := `کیا حال ھے؟` //How are you?
old := `ک`
newR := ""
fmt.Println(strings.Replace(str, old, newR, -1))
buf := bytes.NewBuffer([]byte(str))
// for _, r := range str {
// buf.WriteRune(r)
// }
fmt.Println(buf.Bytes())
bs := []byte(strings.Replace(buf.String(), old, newR, -1))
buf = bytes.NewBuffer(bs)
fmt.Println(" ", buf.Bytes())
}
output:
یا حال ھے؟
[218 169 219 140 216 167 32 216 173 216 167 217 132 32 218 190 219 146 216 159]
[219 140 216 167 32 216 173 216 167 217 132 32 218 190 219 146 216 159]
5- strings.Replace is very efficient, see inside:
// Replace returns a copy of the string s with the first n
// non-overlapping instances of old replaced by new.
// If old is empty, it matches at the beginning of the string
// and after each UTF-8 sequence, yielding up to k+1 replacements
// for a k-rune string.
// If n < 0, there is no limit on the number of replacements.
func Replace(s, old, new string, n int) string {
if old == new || n == 0 {
return s // avoid allocation
}
// Compute number of replacements.
if m := Count(s, old); m == 0 {
return s // avoid allocation
} else if n < 0 || m < n {
n = m
}
// Apply replacements to buffer.
t := make([]byte, len(s)+n*(len(new)-len(old)))
w := 0
start := 0
for i := 0; i < n; i++ {
j := start
if len(old) == 0 {
if i > 0 {
_, wid := utf8.DecodeRuneInString(s[start:])
j += wid
}
} else {
j += Index(s[start:], old)
}
w += copy(t[w:], s[start:j])
w += copy(t[w:], new)
start = j + len(old)
}
w += copy(t[w:], s[start:])
return string(t[0:w])
}

Convert int32 to string in Golang

I need to convert an int32 to string in Golang. Is it possible to convert int32 to string in Golang without converting to int or int64 first?
Itoa needs an int. FormatInt needs an int64.
One line answer is fmt.Sprint(i).
Anyway there are many conversions, even inside standard library function like fmt.Sprint(i), so you have some options (try The Go Playground):
1- You may write your conversion function (Fastest):
func String(n int32) string {
buf := [11]byte{}
pos := len(buf)
i := int64(n)
signed := i < 0
if signed {
i = -i
}
for {
pos--
buf[pos], i = '0'+byte(i%10), i/10
if i == 0 {
if signed {
pos--
buf[pos] = '-'
}
return string(buf[pos:])
}
}
}
2- You may use fmt.Sprint(i) (Slow)
See inside:
// Sprint formats using the default formats for its operands and returns the resulting string.
// Spaces are added between operands when neither is a string.
func Sprint(a ...interface{}) string {
p := newPrinter()
p.doPrint(a)
s := string(p.buf)
p.free()
return s
}
3- You may use strconv.Itoa(int(i)) (Fast)
See inside:
// Itoa is shorthand for FormatInt(int64(i), 10).
func Itoa(i int) string {
return FormatInt(int64(i), 10)
}
4- You may use strconv.FormatInt(int64(i), 10) (Faster)
See inside:
// FormatInt returns the string representation of i in the given base,
// for 2 <= base <= 36. The result uses the lower-case letters 'a' to 'z'
// for digit values >= 10.
func FormatInt(i int64, base int) string {
_, s := formatBits(nil, uint64(i), base, i < 0, false)
return s
}
Comparison & Benchmark (with 50000000 iterations):
s = String(i) takes: 5.5923198s
s = String2(i) takes: 5.5923199s
s = strconv.FormatInt(int64(i), 10) takes: 5.9133382s
s = strconv.Itoa(int(i)) takes: 5.9763418s
s = fmt.Sprint(i) takes: 13.5697761s
Code:
package main
import (
"fmt"
//"strconv"
"time"
)
func main() {
var s string
i := int32(-2147483648)
t := time.Now()
for j := 0; j < 50000000; j++ {
s = String(i) //5.5923198s
//s = String2(i) //5.5923199s
//s = strconv.FormatInt(int64(i), 10) // 5.9133382s
//s = strconv.Itoa(int(i)) //5.9763418s
//s = fmt.Sprint(i) // 13.5697761s
}
fmt.Println(time.Since(t))
fmt.Println(s)
}
func String(n int32) string {
buf := [11]byte{}
pos := len(buf)
i := int64(n)
signed := i < 0
if signed {
i = -i
}
for {
pos--
buf[pos], i = '0'+byte(i%10), i/10
if i == 0 {
if signed {
pos--
buf[pos] = '-'
}
return string(buf[pos:])
}
}
}
func String2(n int32) string {
buf := [11]byte{}
pos := len(buf)
i, q := int64(n), int64(0)
signed := i < 0
if signed {
i = -i
}
for {
pos--
q = i / 10
buf[pos], i = '0'+byte(i-10*q), q
if i == 0 {
if signed {
pos--
buf[pos] = '-'
}
return string(buf[pos:])
}
}
}
The Sprint function converts a given value to string.
package main
import (
"fmt"
)
func main() {
var sampleInt int32 = 1
sampleString := fmt.Sprint(sampleInt)
fmt.Printf("%v %v\n", sampleInt, sampleString)
}
// 1 1
Use a conversion and strconv.FormatInt to format int32 values as a string. The conversion has zero cost on most platforms.
s := strconv.FormatInt(int64(n), 10)
If you have many calls like this, consider writing a helper function similar to strconv.Itoa:
func formatInt32(n int32) string {
return strconv.FormatInt(int64(n), 10)
}
All of the low-level integer formatting code in the standard library works with int64 values. Any answer to this question using formatting code in the standard library (fmt package included) requires a conversion to int64 somewhere. The only way to avoid the conversion is to write a formatting function from scratch, but there's little point in doing that.
func FormatInt32(value int32) string {
return fmt.Sprintf("%d", value)
}
Does this work?

Golang Cryptographic Shuffle

I'm trying to implement a string shuffle function in Go that uses crypto/rand instead of math/rand. The Fisher-Yates Shuffle requires random integers so I've tried to implement that functionality, without having to use crypto/rand Int which relies on math/big. Below is the best I've come up with so far but is there a better method? The fact that I can't find existing examples leads me to wonder if there's a good reason why nobody does this!
package main
import "crypto/rand"
import "fmt"
import "encoding/binary"
func randomInt(max int) int {
var n uint16
binary.Read(rand.Reader, binary.LittleEndian, &n)
return int(n) % max
}
func shuffle(s *[]string) {
slice := *s
for i := range slice {
j := randomInt(i + 1)
slice[i], slice[j] = slice[j], slice[i]
}
*s = slice
}
func main() {
slice := []string{"a", "b", "c", "d", "e", "f", "h", "i", "j", "k"}
shuffle(&slice)
fmt.Println(slice)
}
Go's math/rand library has good facilities for producing random numerical primitives from a Source.
// A Source represents a source of uniformly-distributed
// pseudo-random int64 values in the range [0, 1<<63).
type Source interface {
Int63() int64
Seed(seed int64)
}
NewSource(seed int64) returns the builtin, deterministic PRNG, but New(source Source) will allow anything that satisfies the Source interface.
Here is an example of a Source that is backed by crypto/rand.
type CryptoRandSource struct{}
func NewCryptoRandSource() CryptoRandSource {
return CryptoRandSource{}
}
func (_ CryptoRandSource) Int63() int64 {
var b [8]byte
rand.Read(b[:])
// mask off sign bit to ensure positive number
return int64(binary.LittleEndian.Uint64(b[:]) & (1<<63 - 1))
}
func (_ CryptoRandSource) Seed(_ int64) {}
You can use it like this:
r := rand.New(NewCryptoRandSource())
for i := 0; i < 10; i++ {
fmt.Println(r.Int())
}
The math/rand library has a properly implemented Intn() method which ensures a uniform distribution.
func (r *Rand) Intn(n int) int {
if n <= 0 {
panic("invalid argument to Intn")
}
if n <= 1<<31-1 {
return int(r.Int31n(int32(n)))
}
return int(r.Int63n(int64(n)))
}
func (r *Rand) Int31n(n int32) int32 {
if n <= 0 {
panic("invalid argument to Int31n")
}
if n&(n-1) == 0 { // n is power of two, can mask
return r.Int31() & (n - 1)
}
max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
v := r.Int31()
for v > max {
v = r.Int31()
}
return v % n
}
func (r *Rand) Int63n(n int64) int64 {
if n <= 0 {
panic("invalid argument to Int63n")
}
if n&(n-1) == 0 { // n is power of two, can mask
return r.Int63() & (n - 1)
}
max := int64((1 << 63) - 1 - (1<<63)%uint64(n))
v := r.Int63()
for v > max {
v = r.Int63()
}
return v % n
}
Cryptographic hash functions also can be wrapped as a Source for alternate means of randomness.
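Putting the pieces together for the shuffle in the question: since both packages are named rand, one of them needs an import alias. The sketch below assumes Go 1.10 or later for (*Rand).Shuffle. The aliases crand and mrand, the type name cryptoSource, and the panic on a failed read are choices made for this example.

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	mrand "math/rand"
)

// cryptoSource implements math/rand.Source on top of crypto/rand.
type cryptoSource struct{}

func (cryptoSource) Int63() int64 {
	var b [8]byte
	if _, err := crand.Read(b[:]); err != nil {
		panic(err) // no sensible fallback without a source of randomness
	}
	// Clear the sign bit so the result is in [0, 1<<63).
	return int64(binary.LittleEndian.Uint64(b[:]) & (1<<63 - 1))
}

// Seed is a no-op: crypto/rand cannot be seeded.
func (cryptoSource) Seed(int64) {}

// shuffle permutes s in place using randomness drawn from crypto/rand.
func shuffle(s []string) {
	r := mrand.New(cryptoSource{})
	r.Shuffle(len(s), func(i, j int) { s[i], s[j] = s[j], s[i] })
}

func main() {
	s := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
	shuffle(s)
	fmt.Println(s)
}
```

This leaves the uniform-distribution logic to math/rand, so the only custom code is the source itself.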
The numbers from n % max are not distributed uniformly. For example,
package main
import (
"fmt"
"math"
)
func main() {
max := 7
size := math.MaxUint8
count := make([]int, size)
for i := 0; i < size; i++ {
count[i%max]++
}
fmt.Println(count[:max])
}
Output:
[37 37 37 36 36 36 36]
Based on the comments received, I think I can improve on the example in my question by adding a uniformInt function, populating a uint32 instead of a uint16 and removing the pointer to the slice.
package main
import "crypto/rand"
import "fmt"
import "encoding/binary"
func randomInt() int {
var n uint32
binary.Read(rand.Reader, binary.LittleEndian, &n)
return int(n)
}
func uniformInt(max int) (r int) {
divisor := 4294967295 / max // Max Uint32
for {
r = randomInt() / divisor
if r < max { // r is used as an index, so it must stay in [0, max)
break
}
}
return
}
func shuffle(slice []string) {
for i := range slice {
j := uniformInt(i + 1)
slice[i], slice[j] = slice[j], slice[i]
}
}
func main() {
slice := []string{"a", "b", "c", "d", "e", "f", "h", "i", "j", "k"}
shuffle(slice)
fmt.Println(slice)
}
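An alternative sketch that stays with crypto/rand and avoids math/big is rejection sampling, the same idea Int31n uses above: draw a uint32, discard draws that fall in the incomplete top bucket, and reduce the rest with a modulus. The function name uniformUint32 and the panic-on-error behavior are assumptions for illustration.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// uniformUint32 returns a uniformly distributed value in [0, max); max must be > 0.
func uniformUint32(max uint32) uint32 {
	if max == 0 {
		panic("uniformUint32: max must be positive")
	}
	rem := uint32((1 << 32) % uint64(max)) // size of the incomplete top bucket
	maxAccept := ^uint32(0) - rem          // accept v in [0, maxAccept]
	var b [4]byte
	for {
		if _, err := rand.Read(b[:]); err != nil {
			panic(err)
		}
		v := binary.LittleEndian.Uint32(b[:])
		if v <= maxAccept {
			// The accepted range is a whole multiple of max, so v % max is unbiased.
			return v % max
		}
	}
}

// shuffle is a Fisher-Yates shuffle driven by uniformUint32.
func shuffle(s []string) {
	for i := len(s) - 1; i > 0; i-- {
		j := int(uniformUint32(uint32(i + 1)))
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	s := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
	shuffle(s)
	fmt.Println(s)
}
```

Because fewer than half of all draws can ever be rejected, the loop is expected to finish after one or two reads.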

The binary representation of unsigned integer in Go

Is there a built-in function to convert a uint to a slice of binary integers {0,1} ?
>> convert_to_binary(2)
[1, 0]
I am not aware of such a function, however you can use strconv.FormatUint for that purpose.
Example:
func Bits(i uint64) []byte {
bits := []byte{}
for _, b := range strconv.FormatUint(i, 2) {
bits = append(bits, byte(b - rune('0')))
}
return bits
}
FormatUint will return the string representation of the given uint to a base, in this case 2, so we're encoding it in binary. So the returned string for i=2 looks like this: "10". In bytes this is [49 48] as 1 is 49 and 0 is 48 in ASCII and Unicode. So we just need to iterate over the string, subtracting 48 from each rune (unicode character) and converting it to a byte.
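If the round trip through a string feels indirect, a shift-and-mask loop gives the same digits. A minimal sketch, assuming zero should map to a single 0 digit (matching what FormatUint returns for 0); bitsOf is a hypothetical name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitsOf returns the binary digits of x, most significant first.
func bitsOf(x uint64) []byte {
	n := bits.Len64(x) // number of significant bits; 0 when x == 0
	if n == 0 {
		return []byte{0}
	}
	out := make([]byte, n)
	for i := range out {
		// Shift the wanted bit down to position 0 and mask it off.
		out[i] = byte(x >> uint(n-1-i) & 1)
	}
	return out
}

func main() {
	fmt.Println(bitsOf(2))  // [1 0]
	fmt.Println(bitsOf(10)) // [1 0 1 0]
}
```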
Here is another method:
package main
import (
"bytes"
"fmt"
"math/bits"
)
func unsigned(x uint) []byte {
b := make([]byte, bits.UintSize)
for i := range b {
if bits.LeadingZeros(x) == 0 {
b[i] = 1
}
x = bits.RotateLeft(x, 1)
}
return b
}
func trimUnsigned(x uint) []byte {
return bytes.TrimLeft(unsigned(x), "\x00")
}
func main() {
b := trimUnsigned(2)
fmt.Println(b) // [1 0]
}
https://golang.org/pkg/math/bits#LeadingZeros
