Convert string of numbers into 'binary representation' - performance

I recently did the "Winning Lottery Ticket" coding challenge on HackerRank.
https://www.hackerrank.com/challenges/winning-lottery-ticket/
The idea is to count the pairs of lines that together contain all digits from 0-9. In the example below there are 5 such pairs in total.
56789
129300455
5559948277
012334556
123456879
My idea is to convert each line into a representation that makes it quicker to check whether all digits are contained.
Example representation:
1278 --> 0110000110
Example with using the first two lines from above:
56789129300455 --> 1111111111
When checking whether all digits are contained in the concatenation of two lines, I can abort as soon as I encounter a zero, because that pair cannot contain all digits 0-9.
This logic works, but it is too slow when there is a huge number of lines to compare.
// Go code
import (
"errors"
"fmt"
"strconv"
"strings"
)
func winningLotteryTicket(tickets []string) int {
counter := 0
for i := 0; i < len(tickets); i++ {
for j := i + 1; j < len(tickets); j++ {
if err := bitMask(fmt.Sprintf("%v%v", tickets[i], tickets[j])); err == nil {
counter++
}
}
}
return counter
}
func bitMask(s string) error {
for i := 0; i <= 9; i++ {
if !strings.Contains(s, strconv.Itoa(i)) {
return errors.New("No Pair")
}
}
return nil
}
Not sure if this representation is called a bitmask; if not, please correct me and I will adjust this post.
From my point of view there is no way to improve performance on the concatenation of the strings, because I have to check each combination.
I'm not sure about the check in the function bitMask that tests whether each digit is contained in the string.
Do you have an idea how this could perform better?

Bit masks are integers, not strings of ones and zeros. It's called a bitmask because we're not interested in the numerical value of these integers but only in the bit pattern. We can use bitwise operations on integers and those are really fast because they are implemented in hardware, directly in the CPU.
Here is a function that turns a string into an actual bitmask, with each one-bit signaling that a particular digit is present in the string:
func mask(s string) uint16 {
// We need ten bits, one for each possible decimal digit in s, so int16 and
// uint16 are the smallest possible integer types that fit. For bitmasks it
// is typical to select an unsigned type because the sign bit doesn't have
// any meaning. As I said earlier, mask's numerical value is irrelevant.
var mask uint16
for _, c := range s {
switch c {
case '0':
mask |= 0b0000000001
case '1':
mask |= 0b0000000010
case '2':
mask |= 0b0000000100
case '3':
mask |= 0b0000001000
case '4':
mask |= 0b0000010000
case '5':
mask |= 0b0000100000
case '6':
mask |= 0b0001000000
case '7':
mask |= 0b0010000000
case '8':
mask |= 0b0100000000
case '9':
mask |= 0b1000000000
}
}
return mask
}
This is rather verbose, but it should be pretty obvious what happens.
Note that the binary number literals can be replaced with bit shifts:
0b0000000001 is the same as 1<<0 (1 shifted zero times to the left)
0b0000000010 is the same as 1<<1 (1 shifted one time to the left)
0b0000000100 is the same as 1<<2 (1 shifted two times to the left), and so on
Using this, and taking advantage of the fact that the characters '0' through '9' are themselves just integers (48 through 57 in decimal, given by their place in the ASCII table), we can shorten this function like so:
func mask(s string) uint16 {
var mask uint16
for _, c := range s {
if '0' <= c && c <= '9' {
mask |= 1 << (c - '0')
}
}
return mask
}
To check two lines, then, all we have to do is OR the masks for the lines and compare to 0b1111111111 (i.e. check if all ten bits are set):
package main
import "fmt"
func main() {
a := "56789"
b := "129300455"
mA := mask(a)
mB := mask(b)
fmt.Printf("mask(%11q) = %010b\n", a, mA)
fmt.Printf("mask(%11q) = %010b\n", b, mB)
fmt.Printf("combined = %010b\n", mA|mB)
fmt.Printf("all digits present: %v\n", mA|mB == 0b1111111111)
}
func mask(s string) uint16 {
var mask uint16
for _, c := range s {
if '0' <= c && c <= '9' {
mask |= 1 << (c - '0')
}
}
return mask
}
mask( "56789") = 1111100000
mask("129300455") = 1000111111
combined = 1111111111
all digits present: true
Try it on the playground: https://play.golang.org/p/mr1KqnC9phB

Related

Splitting big.Int by digit

I'm trying to split a big.Int into a number of int64s such that each is a portion of the larger number, with a standard offset of 18 digits. For example, given the following input value of 1234512351234088800000999, I would expect the following output: [351234088800000999, 1234512]. For negative numbers, I would expect all of the parts to be negative (i.e. -1234512351234088800000999 produces [-351234088800000999, -1234512]).
I already know I can do this to get the result I want:
import (
"math"
"math/big"
"strconv"
)
func Split(input *big.Int) []int64 {
const width = 18
// Use big.Int's own methods on the parameter: Text for the decimal string, Sign for negativity.
asStr := input.Text(10)
strLen := len(asStr)
offset := 0
if input.Sign() < 0 {
offset = 1
}
length := int(math.Ceil(float64(strLen-offset) / width))
ints := make([]int64, length)
for i := 1; i <= length; i++ {
start := strLen - (i * width)
end := start + width
if start < 0 || (start == 1 && asStr[0] == '-') {
start = 0
}
ints[i-1], _ = strconv.ParseInt(asStr[start:end], 10, 64)
if offset == 1 && ints[i-1] > 0 {
ints[i-1] = 0 - ints[i-1]
}
}
return ints
}
However, I don't like the idea of using string-parsing nor do I like the use of strconv. Is there a way I can do this utilizing the big.Int directly?
You can use the DivMod function to do what you need here, with some special care to handle negative numbers:
var offset = big.NewInt(1e18)
func Split(input *big.Int) []int64 {
rest := new(big.Int)
rest.Abs(input)
var ints []int64
r := new(big.Int)
for {
rest.DivMod(rest, offset, r)
ints = append(ints, r.Int64() * int64(input.Sign()))
if rest.BitLen() == 0 {
break
}
}
return ints
}
Multiplying each output by input.Sign() ensures that each output will be negative if the input is negative. Multiplying each output value by (1e18)^i, where i is its index in the output, and summing the results should give back the input.

How to convert any negative value to zero with bitwise operators?

I'm writing the PopBack() operation for a LinkedList in Go, the code looks like this:
// PopBack will remove an item from the end of the linked list
func (ll *LinkedList) PopBack() {
lastNode := &ll.node
for *lastNode != nil && (*lastNode).next != nil {
lastNode = &(*lastNode).next
}
*lastNode = nil
if ll.Size() != 0 {
ll.size -= 1
}
}
I don't like the last if clause; if the size is zero we don't want to decrement it to a negative value. Is there a bitwise operation that converts the value after the decrement to zero if it is negative, and leaves it unchanged otherwise?
Negative values have the sign bit set, so you can do this:
ll.size += (-ll.size >> 31)
Suppose ll.size is int32 and ll.Size() returns ll.size. Of course this also implies that size is never negative. When the size is positive, the right shift sign-extends -ll.size to make it -1; otherwise the result is 0.
If ll.size is int64 then change the shift count to 63. If ll.size is uint64 you can simply convert it to int64, provided the size always stays below 2^63. But if the size can be that large (although that is almost impossible even in the far future), then things are much trickier:
mask := uint64(-int64(ll.size >> 63)) // all ones if ll.size >= (1 << 63)
ll.size = ((ll.size - 1) & mask) | ((ll.size + uint64(-int64(ll.size) >> 63)) & ^mask)
It's basically a bitwise multiplexer, a common bithack, needed here because Go cannot convert a bool to an int without an if.
Neither of these is readable at first glance, so the if block is usually better.
Trade a nil check in each iteration of the loop for a single nil check before the loop. With this change, the loop runs faster and size can be updated with a plain subtraction.
func (ll *LinkedList) PopBack() {
if ll.node == nil {
return
}
lastNode := &ll.node
for (*lastNode).next != nil {
lastNode = &(*lastNode).next
}
*lastNode = nil
ll.size -= 1
}
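The answers assume a LinkedList type that the question does not show. Here is a runnable sketch with hypothetical listNode, LinkedList, Size and PushFront definitions (assumptions chosen only to match the field names node, next and size used above), showing that popping an empty list leaves size at zero.

```go
package main

import "fmt"

// listNode and LinkedList are hypothetical definitions matching the
// field names used in the question (node, next, size).
type listNode struct {
	value int
	next  *listNode
}

type LinkedList struct {
	node *listNode // head of the list
	size int
}

func (ll *LinkedList) Size() int { return ll.size }

// PushFront is a hypothetical helper that prepends a value.
func (ll *LinkedList) PushFront(v int) {
	ll.node = &listNode{value: v, next: ll.node}
	ll.size++
}

// PopBack removes the last node; on an empty list it does nothing.
func (ll *LinkedList) PopBack() {
	if ll.node == nil {
		return
	}
	lastNode := &ll.node // pointer to the link that points at the last node
	for (*lastNode).next != nil {
		lastNode = &(*lastNode).next
	}
	*lastNode = nil
	ll.size -= 1
}

func main() {
	var ll LinkedList
	ll.PushFront(2)
	ll.PushFront(1) // list is now 1 -> 2
	ll.PopBack()
	fmt.Println(ll.Size(), ll.node.value) // prints 1 1
	ll.PopBack()
	ll.PopBack()                           // popping an empty list is a no-op
	fmt.Println(ll.Size(), ll.node == nil) // prints 0 true
}
```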

Byte ordering of floats

I'm working on Tormenta (https://github.com/jpincas/tormenta) which is backed by BadgerDB (https://github.com/dgraph-io/badger). BadgerDB stores keys (slices of bytes) in byte order. I am creating keys which contain floats which need to be stored in order so I can use Badger's key iteration properly. I don't have a solid CS background so I'm a little out of my depth.
I encode the floats like this: binary.Write(buf, binary.BigEndian, myFloat). This works fine for positive floats - the key order is what you'd expect, but the byte ordering breaks down for negative floats.
As an aside, ints present the same problem, but I was able to fix that relatively easily by flipping the sign bit on the int with b[0] ^= 1 << 7 (where b is the []byte holding the result of encoding the int), and then flipping back when retrieving the key.
Although b[0] ^= 1 << 7 DOES also flip the sign bit on floats and thus places all the negative floats before the positive ones, the negative ones are incorrectly (backwards) ordered. It is necessary to flip the sign bit and reverse the order of the negative floats.
A similar question was asked on StackOverflow here: Sorting floating-point values using their byte-representation, and the solution was agreed to be:
XOR all positive numbers with 0x8000... and negative numbers with 0xffff.... This should flip the sign bit on both (so negative numbers go first), and then reverse the ordering on negative numbers.
However, that's way above my bit-flipping-skills level, so I was hoping a Go bit-ninja could help me translate that into some Go code.
Using math.Float64bits()
You could use math.Float64bits(), which returns a uint64 value having the same bytes / bits as the float64 value passed to it.
Once you have a uint64, performing bitwise operations on it is trivial:
f := 1.0 // Some float64 value
bits := math.Float64bits(f)
if f >= 0 {
bits ^= 0x8000000000000000
} else {
bits ^= 0xffffffffffffffff
}
Then serialize the bits value instead of the f float64 value, and you're done.
Let's see this in action. Let's create a wrapper type holding a float64 number and its bytes:
type num struct {
f float64
data [8]byte
}
Let's create a slice of these nums:
nums := []*num{
{f: 1.0},
{f: 2.0},
{f: 0.0},
{f: -1.0},
{f: -2.0},
{f: math.Pi},
}
Serializing them:
for _, n := range nums {
bits := math.Float64bits(n.f)
if n.f >= 0 {
bits ^= 0x8000000000000000
} else {
bits ^= 0xffffffffffffffff
}
if err := binary.Write(bytes.NewBuffer(n.data[:0]), binary.BigEndian, bits); err != nil {
panic(err)
}
}
This is how we can sort them byte-wise:
sort.Slice(nums, func(i int, j int) bool {
ni, nj := nums[i], nums[j]
for k := range ni.data {
if bi, bj := ni.data[k], nj.data[k]; bi < bj {
return true // We're certain it's less
} else if bi > bj {
return false // We're certain it's not less
} // We have to check the next byte
}
return false // If we got this far, they are equal (=> not less)
})
And now let's see the order after byte-wise sorting:
fmt.Println("Final order byte-wise:")
for _, n := range nums {
fmt.Printf("% .7f %3v\n", n.f, n.data)
}
Output will be (try it on the Go Playground):
Final order byte-wise:
-2.0000000 [ 63 255 255 255 255 255 255 255]
-1.0000000 [ 64 15 255 255 255 255 255 255]
0.0000000 [128 0 0 0 0 0 0 0]
1.0000000 [191 240 0 0 0 0 0 0]
2.0000000 [192 0 0 0 0 0 0 0]
3.1415927 [192 9 33 251 84 68 45 24]
Without math.Float64bits()
Another option would be to first serialize the float64 value, and then perform XOR operations on the bytes.
If the number is positive (or zero), XOR the first byte with 0x80 and the rest with 0x00, which leaves them unchanged.
If the number is negative, XOR all the bytes with 0xff, which is basically a bitwise negation.
In action: the only part that is different is the serialization and XOR operations:
for _, n := range nums {
if err := binary.Write(bytes.NewBuffer(n.data[:0]), binary.BigEndian, n.f); err != nil {
panic(err)
}
if n.f >= 0 {
n.data[0] ^= 0x80
} else {
for i, b := range n.data {
n.data[i] = ^b
}
}
}
The rest is the same. The output will also be the same. Try this one on the Go Playground.

How to generate a stream of *unique* random numbers in Go using the standard library

How can I generate a stream of unique random numbers in Go?
I want to guarantee there are no duplicate values in array a using math/rand and/or standard Go library utilities.
func RandomNumberGenerator() *rand.Rand {
s1 := rand.NewSource(time.Now().UnixNano())
r1 := rand.New(s1)
return r1
}
rng := RandomNumberGenerator()
N := 10000
a := make([]int, N)
for i := 0; i < N; i++ {
a[i] = rng.Int()
}
There are questions and solutions on how to generate a series of random numbers in Go, for example, here.
But I would like to generate a series of random numbers that does not duplicate previous values. Is there a standard/recommended way to achieve this in Go?
My guess is to either (1) use a permutation or (2) keep track of previously generated numbers and regenerate a value if it has been generated before.
But solution (1) sounds like overkill if I only want a few numbers, and (2) sounds very time-consuming if I end up generating a long series of random numbers due to collisions, and I guess it's also very memory-consuming.
Use case: to benchmark a Go program with 10K, 100K, or 1M pseudo-random numbers that have no duplicates.
You should absolutely go with approach 2. Let's assume you're running on a 64-bit machine, and thus generating 63-bit integers (64 bits, but rand.Int never returns negative numbers). Even after you have generated 4 billion (about 2^32) numbers, there's only about a 1 in 2 billion (2^32 / 2^63 = 2^-31) chance that the next number will be a duplicate. Thus, you'll almost never have to regenerate, and almost never have to regenerate twice.
Try, for example:
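import "math/rand"
type UniqueRand struct {
generated map[int]bool
}
func (u *UniqueRand) Int() int {
if u.generated == nil {
// Lazily create the map so a zero-value UniqueRand is usable.
u.generated = make(map[int]bool)
}
for {
i := rand.Int()
if !u.generated[i] {
u.generated[i] = true
return i
}
}
}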
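For the benchmark use case in the question, a sketch of the same idea with its own seeded *rand.Rand, so a run with a given seed is reproducible. newSeededUniqueRand and its seed parameter are hypothetical additions; map[int]struct{} is used instead of map[int]bool only to avoid storing a value per key.

```go
package main

import (
	"fmt"
	"math/rand"
)

// UniqueRand remembers every value it has returned and redraws on a repeat.
type UniqueRand struct {
	generated map[int]struct{}
	rng       *rand.Rand
}

// newSeededUniqueRand is a hypothetical constructor; a fixed seed makes the
// generated sequence reproducible, which is convenient for benchmarks.
func newSeededUniqueRand(seed int64) *UniqueRand {
	return &UniqueRand{
		generated: make(map[int]struct{}),
		rng:       rand.New(rand.NewSource(seed)),
	}
}

func (u *UniqueRand) Int() int {
	for {
		i := u.rng.Int()
		if _, seen := u.generated[i]; !seen {
			u.generated[i] = struct{}{}
			return i
		}
	}
}

func main() {
	for _, n := range []int{10000, 100000, 1000000} {
		u := newSeededUniqueRand(1)
		a := make([]int, n)
		for i := range a {
			a[i] = u.Int()
		}
		// One map entry per returned value, so len == n means no duplicates.
		fmt.Println(n, len(u.generated))
	}
}
```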
I had a similar task: picking elements from an initial slice at random unique indexes, i.e. getting 1k random unique elements from a slice with 10k elements.
Here is a simple, head-on solution:
import (
"math/rand"
"time"
)
func getRandomElements(array []string) []string {
// Seed once up front instead of on every call to randomIndex.
rand.Seed(time.Now().UnixNano())
randomElementsCount := 1000
if randomElementsCount > len(array) {
randomElementsCount = len(array) // otherwise randomIndex would loop forever
}
result := make([]string, 0, randomElementsCount)
existingIndexes := make(map[int]struct{}, randomElementsCount)
for i := 0; i < randomElementsCount; i++ {
randomIndex := randomIndex(len(array), existingIndexes)
result = append(result, array[randomIndex])
}
return result
}
func randomIndex(size int, existingIndexes map[int]struct{}) int {
for {
randomIndex := rand.Intn(size)
_, exists := existingIndexes[randomIndex]
if !exists {
existingIndexes[randomIndex] = struct{}{}
return randomIndex
}
}
}
I see two reasons for wanting this: either you want to test a random number generator, or you want unique random numbers.
You're Testing A Random Number Generator
My first question is why? There are plenty of solid random number generators available. Don't write your own; it's basically dabbling in cryptography and that's never a good idea. Maybe you're testing a system that uses a random number generator to generate random output?
There's a problem: there's no guarantee random numbers are unique. They're random. There's always a possibility of collision. Testing that random output is unique is incorrect.
Instead, you want to test the results are distributed evenly. To do this I'll reference another answer about how to test a random number generator.
You Want Unique Random Numbers
From a practical perspective you don't need guaranteed uniqueness; you need collisions to be so unlikely that they are not a concern. This is what UUIDs are for. They're 128-bit Universally Unique IDentifiers. There are a number of ways to generate them for particular scenarios.
UUIDv4 is basically just a 122 bit random number which has some ungodly small chance of a collision. Let's approximate it.
n = how many random numbers you'll generate
M = size of the keyspace (2^122 for a 122 bit random number)
P = probability of collision
P = n^2/(2M)
Solving for n...
n = sqrt(2MP)
Setting P to something absurd like 1e-12 (one in a trillion), we find you can generate about 3.2 trillion UUIDv4s with a 1 in a trillion chance of collision. You're 1000 times more likely to win the lottery than have a collision in 3.2 trillion UUIDv4s. I think that's acceptable.
Here's a UUIDv4 library in Go to use and a demonstration of generating 1 million unique random 128 bit values.
package main
import (
"fmt"
"github.com/frankenbeanies/uuid4"
)
func main() {
for i := 0; i < 1000000; i++ {
uuid := uuid4.New().Bytes()
_ = uuid // use the uuid here
}
}
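Since the question asks for the standard library only, here is a sketch that builds version-4 UUIDs from crypto/rand without a third-party package, setting the version and variant bits as RFC 4122 describes, which leaves the 122 random bits mentioned above. newUUIDv4 is a hypothetical name, and the demo generates 100,000 values to keep it quick.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns 16 random bytes with the RFC 4122 version (4) and
// variant (binary 10) bits set, leaving 122 random bits.
func newUUIDv4() ([16]byte, error) {
	var u [16]byte
	if _, err := rand.Read(u[:]); err != nil {
		return u, err
	}
	u[6] = (u[6] & 0x0f) | 0x40 // version 4 in the high nibble of byte 6
	u[8] = (u[8] & 0x3f) | 0x80 // variant bits 10 at the top of byte 8
	return u, nil
}

func main() {
	const n = 100000
	seen := make(map[[16]byte]struct{}, n)
	for i := 0; i < n; i++ {
		u, err := newUUIDv4()
		if err != nil {
			panic(err)
		}
		seen[u] = struct{}{}
	}
	fmt.Println(len(seen)) // n, unless a (vanishingly unlikely) collision occurred
	u, _ := newUUIDv4()
	fmt.Printf("%x-%x-%x-%x-%x\n", u[0:4], u[4:6], u[6:8], u[8:10], u[10:])
}
```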
You can generate a unique 12-digit number using UnixNano from Go's time package:
uniqueNumber:=time.Now().UnixNano()/(1<<22)
println(uniqueNumber)
it's always random :D
1- Fast positive and negative int32 unique pseudo random numbers in 296ms using std lib:
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
const n = 1000000
rand.Seed(time.Now().UTC().UnixNano())
duplicate := 0
mp := make(map[int32]struct{}, n)
var r int32
t := time.Now()
for i := 0; i < n; {
r = rand.Int31()
if i&1 == 0 {
r = -r
}
if _, ok := mp[r]; ok {
duplicate++
} else {
mp[r] = zero
i++
}
}
fmt.Println(time.Since(t))
fmt.Println("len: ", len(mp))
fmt.Println("duplicate: ", duplicate)
positive := 0
for k := range mp {
if k > 0 {
positive++
}
}
fmt.Println(`n=`, n, `positive=`, positive)
}
var zero = struct{}{}
output:
296.0169ms
len: 1000000
duplicate: 118
n= 1000000 positive= 500000
2- Just fill the map[int32]struct{}:
for i := int32(0); i < n; i++ {
m[i] = zero
}
When you iterate over the map, Go does not return the keys in order:
for k := range m {
fmt.Print(k, " ")
}
And this just takes 183ms for 1000000 unique numbers, no duplicate (The Go Playground):
package main
import (
"fmt"
"time"
)
func main() {
const n = 1000000
m := make(map[int32]struct{}, n)
t := time.Now()
for i := int32(0); i < n; i++ {
m[i] = zero
}
fmt.Println(time.Since(t))
fmt.Println("len: ", len(m))
// for k := range m {
// fmt.Print(k, " ")
// }
}
var zero = struct{}{}
3- Here is a simple but slow approach (it takes 22s for 200000 unique numbers), so you may want to generate the numbers once and save them to a file:
package main
import "time"
import "fmt"
import "math/rand"
func main() {
dup := 0
t := time.Now()
const n = 200000
rand.Seed(time.Now().UTC().UnixNano())
var a [n]int32
var exist bool
for i := 0; i < n; {
r := rand.Int31()
exist = false
for j := 0; j < i; j++ {
if a[j] == r {
dup++
fmt.Println(dup)
exist = true
break
}
}
if !exist {
a[i] = r
i++
}
}
fmt.Println(time.Since(t))
}
Temporary workaround based on @joshlf's answer
type UniqueRand struct {
generated map[int]bool //keeps track of numbers already generated
rng *rand.Rand //underlying random number generator
scope int //scope of number to be generated
}
//Generating unique rand less than N
//If N is less or equal to 0, the scope will be unlimited
//If N is greater than 0, it will generate values in [0, N)
//If no more unique numbers can be generated, it will return -1 from then on
func NewUniqueRand(N int) *UniqueRand{
s1 := rand.NewSource(time.Now().UnixNano())
r1 := rand.New(s1)
return &UniqueRand{
generated: map[int]bool{},
rng: r1,
scope: N,
}
}
func (u *UniqueRand) Int() int {
if u.scope > 0 && len(u.generated) >= u.scope {
return -1
}
for {
var i int
if u.scope > 0 {
i = u.rng.Int() % u.scope
}else{
i = u.rng.Int()
}
if !u.generated[i] {
u.generated[i] = true
return i
}
}
}
Client side code
func TestSetGet2(t *testing.T) {
const N = 10000
for _, mask := range []int{0, -1, 0x555555, 0xaaaaaa, 0x333333, 0xcccccc, 0x314159} {
rng := NewUniqueRand(2*N)
a := make([]int, N)
for i := 0; i < N; i++ {
a[i] = (rng.Int() ^ mask) << 1
}
//Benchmark Code
}
}

Convert rune to int?

In the following code, I iterate over a string rune by rune, but I'll actually need an int to perform some checksum calculation. Do I really need to encode the rune into a []byte, then convert it to a string and then use Atoi to get an int out of the rune? Is this the idiomatic way to do it?
// The string `s` only contains digits.
var factor int
for i, c := range s[:12] {
if i % 2 == 0 {
factor = 1
} else {
factor = 3
}
buf := make([]byte, 1)
_ = utf8.EncodeRune(buf, c)
value, _ := strconv.Atoi(string(buf))
sum += value * factor
}
On the playground: http://play.golang.org/p/noWDYjn5rJ
The problem is simpler than it looks. You convert a rune value to an int value with int(r). But your code implies you want the integer value out of the ASCII (or UTF-8) representation of the digit, which you can trivially get with r - '0' as a rune, or int(r - '0') as an int. Be aware that out-of-range runes will corrupt that logic.
For example, sum += (int(c) - '0') * factor:
package main
import (
"fmt"
"strconv"
"unicode/utf8"
)
func main() {
s := "9780486653556"
var factor, sum1, sum2 int
for i, c := range s[:12] {
if i%2 == 0 {
factor = 1
} else {
factor = 3
}
buf := make([]byte, 1)
_ = utf8.EncodeRune(buf, c)
value, _ := strconv.Atoi(string(buf))
sum1 += value * factor
sum2 += (int(c) - '0') * factor
}
fmt.Println(sum1, sum2)
}
Output:
124 124
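The weights 1 and 3 over the first 12 digits, and the fact that the sample string's last digit is 6 = (10 - 124 % 10) % 10, suggest an EAN-13/ISBN-13 check digit; that is an assumption here. A sketch that uses c - '0' and rejects non-digit runes first, since, as noted above, out-of-range runes would otherwise corrupt the sum. checkDigit is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// checkDigit computes an EAN-13 style check digit from the first 12
// characters of s, weighting digits alternately by 1 and 3. Each rune is
// converted with c - '0' after checking that it is an ASCII digit.
func checkDigit(s string) (int, error) {
	if len(s) < 12 {
		return 0, errors.New("need at least 12 digits")
	}
	sum := 0
	for i, c := range s[:12] {
		if c < '0' || c > '9' {
			return 0, fmt.Errorf("not a digit: %q", c)
		}
		factor := 1
		if i%2 == 1 {
			factor = 3
		}
		sum += int(c-'0') * factor
	}
	return (10 - sum%10) % 10, nil
}

func main() {
	d, err := checkDigit("9780486653556")
	fmt.Println(d, err) // prints 6 <nil>, matching the last digit of the string
	_, err = checkDigit("97804866535x6")
	fmt.Println(err)
}
```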
Why not just use string(rune)?
s:="12345678910"
var factor,sum int
for i,x:=range s{
if i%2==0{
factor=1
}else{
factor=3
}
xstr:=string(x) //x is rune converted to string
xint,_:=strconv.Atoi(xstr)
sum+=xint*factor
}
fmt.Println(sum)
val, _ := strconv.Atoi(string(v))
where v is a rune.
More concise, but the same idea as above.
