Parsing a UInt8 Array of chars into a float without allocation - performance

I have this problem here:
function main()
    b = UInt8[0x35, 0x30, 0x2e, 0x30] # This is "50.0" in bytes
    s = parse(Float64, String(b))
end

@btime main()
  250.269 ns (2 allocations: 128 bytes)
50.0
I have a UInt8 array which contains a number in string format, e.g. "50.0". I want to parse this number into a float without allocation and as fast as possible (I have millions of these numbers to parse). Is there a better way than what I've shown above (ignoring the allocation for the UInt8 array itself, since in the real code it already exists)?
Cheers guys!

You can use:
julia> using Parsers
julia> b = UInt8[0x35, 0x30, 0x2e, 0x30] # This is "50.0" in bytes
4-element Vector{UInt8}:
0x35
0x30
0x2e
0x30
julia> @btime Parsers.parse(Float64, $b)
13.527 ns (0 allocations: 0 bytes)
50.0
Note that your code allocates b inside main, so your benchmark includes the time and memory of creating b.
Another difference is that in your code String(b) empties b, while Parsers.parse leaves b untouched. This is especially useful if b is a view of a slice of a longer tape of UInt8 values.
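For example, here is a sketch of parsing slices out of a longer byte tape without copying them (the exact input types accepted depend on your Parsers.jl version, but recent versions take any AbstractVector{UInt8}):
julia> tape = Vector{UInt8}("50.0;7.25"); # two numbers on one byte tape

julia> Parsers.parse(Float64, view(tape, 1:4)) # bytes of "50.0"
50.0

julia> Parsers.parse(Float64, view(tape, 6:9)) # bytes of "7.25"
7.25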


Does Go use something like space padding for structs?

I was playing around in Go, trying to calculate the size of struct objects, and found something interesting. Take a look at the following structs:
type Something struct {
    anInteger int16 // 2 bytes
    anotherInt int16 // 2 bytes
    yetAnother int16 // 2 bytes
    someBool   bool  // 1 byte
} // I expected 7 bytes total

type SomethingBetter struct {
    anInteger   int16 // 2 bytes
    anotherInt  int16 // 2 bytes
    yetAnother  int16 // 2 bytes
    someBool    bool  // 1 byte
    anotherBool bool  // 1 byte
} // I expected 8 bytes total

type Nested struct {
    Something           // 7 bytes expected at first
    completingByte bool // 1 byte
} // 8 bytes expected at first sight
But the result I got using unsafe.Sizeof(...) was as following:
Something -> 8 bytes
SomethingBetter -> 8 bytes
Nested -> 12 bytes (even after finding out that Something uses 8 bytes, I expected 9 here)
I suspect that Go does some kind of padding, but I don't know how or why it does that. Is there some formula or logic behind it? If it uses padding, is it done randomly, or based on some rules?
Yes, we have padding! If your system architecture is 32-bit, the word size is 4 bytes; if it is 64-bit, the word size is 8 bytes. Now, what is the word size? "Word size" refers to the number of bits processed by a computer's CPU in one go (these days, typically 32 bits or 64 bits). Data bus size, instruction size, and address size are usually multiples of the word size.
For example, suppose this struct:
type data struct {
    a bool  // 1 byte
    b int64 // 8 bytes
}
This struct is not 9 bytes: with a word size of 8, the CPU reads the 1-byte bool in its first cycle and pads the remaining 7 bytes so that the int64 starts on the next 8-byte boundary.
Imagine:
p: padding

+-----------------------------------------+----------------+
| 1-byte bool | p | p | p | p | p | p | p |     int64      |
+-----------------------------------------+----------------+
               first 8 bytes                second 8 bytes
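You can verify this yourself with the unsafe package (the values shown in the comments are for a typical 64-bit platform):
package main

import (
    "fmt"
    "unsafe"
)

type data struct {
    a bool  // 1 byte
    b int64 // 8 bytes
}

func main() {
    var d data
    fmt.Println(unsafe.Sizeof(d))     // 16: 1-byte bool + 7 bytes padding + 8-byte int64
    fmt.Println(unsafe.Offsetof(d.b)) // 8: b starts at the next 8-byte boundary
    fmt.Println(unsafe.Alignof(d.b))  // 8: int64 must be 8-byte aligned
}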
For better performance, sort your struct fields from largest to smallest.
This layout does not perform well:
type data struct {
    a string // 16 bytes, size 16
    b int32  // 4 bytes,  size 20
    //          4 bytes padding, size 24
    c string // 16 bytes, size 40
    d int32  // 4 bytes,  size 44
    //          4 bytes padding, size 48 - aligned on 8 bytes
}
This is better:
type data struct {
    a string // 16 bytes, size 16
    c string // 16 bytes, size 32
    d int32  // 4 bytes,  size 36
    b int32  // 4 bytes,  size 40
    //          no padding, size 40 - aligned on 8 bytes
}
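A quick way to check both layouts side by side (the type names here are mine, and the sizes assume a 64-bit platform):
package main

import (
    "fmt"
    "unsafe"
)

type unsorted struct {
    a string
    b int32
    c string
    d int32
}

type sorted struct {
    a string
    c string
    d int32
    b int32
}

func main() {
    fmt.Println(unsafe.Sizeof(unsorted{})) // 48: two 4-byte holes
    fmt.Println(unsafe.Sizeof(sorted{}))   // 40: no padding needed
}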

golang slice allocation performance

I stumbled upon an interesting thing while checking the performance of memory allocation in Go.
package main

import (
    "fmt"
    "time"
)

func main() {
    const alloc int = 65536
    now := time.Now()
    loop := 50000
    for i := 0; i < loop; i++ {
        sl := make([]byte, alloc)
        i += len(sl) * 0 // use sl so the allocation is not optimized away
    }
    elapsed := time.Since(now)
    fmt.Printf("took %s to allocate %d bytes %d times", elapsed, alloc, loop)
}
I am running this on a Core i7-2600 with Go version 1.6 64-bit (same results on 32-bit) and 16GB of RAM, on Windows 10.
So when alloc is 65536 (exactly 64K), it runs for 30 seconds (!!!).
When alloc is 65535 it takes ~200ms.
Can someone explain this to me please?
I tried the same code at home with my Core i7-920 @ 3.8GHz, but it didn't show the same results (both took around 200ms). Does anyone have an idea what's going on?
Setting GOGC=off improved performance (down to less than 100ms). Why?
Because of escape analysis. When you build with go build -gcflags -m, the compiler prints which allocations escape to the heap. It really depends on your machine and Go compiler version, but when the compiler decides that an allocation should move to the heap, it means two things:
1. the allocation will take longer (since "allocating" on the stack is just one CPU instruction)
2. the GC will have to clean up that memory later, costing more CPU time
On my machine, the allocation of 65536 bytes escapes to the heap and 65535 doesn't.
That's why one byte changed the whole process from 200ms to 30s. Amazing.
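You can probe what your own compiler decides with a minimal sketch like this (build it with go build -gcflags=-m and read the escape-analysis output; the exact size threshold depends on your Go version, so the comments only reflect what was observed above):
package main

func small() byte {
    s := make([]byte, 65535) // observed to stay on the stack
    return s[0]
}

func large() byte {
    s := make([]byte, 65536) // observed to escape to the heap
    return s[0]
}

func main() {
    _ = small()
    _ = large()
}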
Note/Update 2021: as Tapir Liu notes in Go101 with this tweet:
As of Go 1.17, Go runtime will allocate the elements of slice x on stack if the compiler proves they are only used in the current goroutine and N <= 64KB:
var x = make([]byte, N)
And Go runtime will allocate the array y on stack if the compiler proves it is only used in the current goroutine and N <= 10MB:
var y [N]byte
Then how do we allocate (the elements of) a slice whose size is larger than 64KB but not larger than 10MB on the stack (when the slice is only used in one goroutine)?
Just use the following way:
var y [N]byte
var x = y[:]
Considering that stack allocation is faster than heap allocation, this has a direct effect on your test for alloc equal to 65536 and more.
Tapir adds:
In fact, we could allocate slices with arbitrarily large total element size on the stack.
const N = 500 * 1024 * 1024 // 500M

var v byte = 123

func createSlice() byte {
    var s = []byte{N: 0}
    for i := range s {
        s[i] = v
    }
    return s[v]
}
Changing 500 to 512 makes the program crash.
The reason is very simple. With
const alloc int = 65535
the function header in the generated assembly is
0x0000 00000 (example.go:8) TEXT "".main(SB), ABIInternal, $65784-0
while with
const alloc int = 65536
it is
0x0000 00000 (example.go:8) TEXT "".main(SB), ABIInternal, $248-0
The difference is where the slice is created: in the first case the backing array lives in main's 65784-byte stack frame, while in the second it escapes to the heap, leaving only a 248-byte frame.

Index Out of Range when using binary.PutVarint(...)

http://play.golang.org/p/RqScJVvpS7
package main

import (
    "encoding/binary"
    "fmt"
    "math/rand"
)

func main() {
    buffer := []byte{0, 0, 0, 0, 0, 0, 0, 0}
    num := rand.Int63()
    count := binary.PutVarint(buffer, num)
    fmt.Println(count)
}
I had this working a while ago when num was just an incrementing uint64 and I was using binary.PutUvarint, but now that it's a random int64 and binary.PutVarint, I get an error:
panic: runtime error: index out of range
goroutine 1 [running]:
encoding/binary.PutUvarint(0x1042bf58, 0x8, 0x8, 0x6ccb, 0xff9faa4, 0x9acb0442, 0x7fcfd52, 0x4d658221)
/usr/local/go/src/encoding/binary/varint.go:44 +0xc0
encoding/binary.PutVarint(0x1042bf58, 0x8, 0x8, 0x6ccb, 0x7fcfd52, 0x4d658221, 0x14f9e0, 0x104000e0)
/usr/local/go/src/encoding/binary/varint.go:83 +0x60
main.main()
/tmp/sandbox010341234/main.go:12 +0x100
What am I missing? I would have thought this to be a trivial change...
EDIT: I just tried extending my buffer array. For some odd reason it works and I get a count of 10. How can that be? int64 is 64 bits = 8 bytes, right?
Quoting the doc of encoding/binary:
The varint functions encode and decode single integer values using a variable-length encoding; smaller values require fewer bytes. For a specification, see https://developers.google.com/protocol-buffers/docs/encoding.
So binary.PutVarint() does not use a fixed-length but a variable-length encoding. When passing an int64, it will need more than 8 bytes for large numbers and fewer than 8 bytes for small numbers. Since the number you're encoding is a random one, it will have random bits even in its highest byte.
See this simple example:
buffer := make([]byte, 100)
for num := int64(1); num < 1<<60; num <<= 4 {
    count := binary.PutVarint(buffer, num)
    fmt.Printf("Num=%d, bytes=%d\n", num, count)
}
Output:
Num=1, bytes=1
Num=16, bytes=1
Num=256, bytes=2
Num=4096, bytes=2
Num=65536, bytes=3
Num=1048576, bytes=4
Num=16777216, bytes=4
Num=268435456, bytes=5
Num=4294967296, bytes=5
Num=68719476736, bytes=6
Num=1099511627776, bytes=6
Num=17592186044416, bytes=7
Num=281474976710656, bytes=8
Num=4503599627370496, bytes=8
Num=72057594037927936, bytes=9
The essence of variable-length encoding is that small numbers use fewer bytes, but this can only be achieved if, in turn, big numbers may use more than 8 bytes (the size of an int64).
Details of the specific encoding are on the linked page.
A very simple example: a byte is 8 bits. Use 7 bits of each output byte as the "useful" bits that encode the data. If the highest bit is 1, more bytes follow; if it is 0, we're done. Small numbers can be encoded in a single output byte (e.g. n=10), but since we spend 1 extra bit for every 7 useful bits, an input that uses all 64 bits needs 10 output bytes: 9 groups cover only 9*7=63 bits.
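That worst case is exactly what the package exposes as the constant binary.MaxVarintLen64 (which is 10), so the fix for the original snippet is to size the buffer with it:
package main

import (
    "encoding/binary"
    "fmt"
    "math/rand"
)

func main() {
    // MaxVarintLen64 = 10 is the most bytes a varint-encoded
    // 64-bit integer can occupy.
    buffer := make([]byte, binary.MaxVarintLen64)
    num := rand.Int63()
    count := binary.PutVarint(buffer, num)
    fmt.Println(count, buffer[:count])
}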

Convert uint64 to int64 without loss of information

The problem with the following code:
var x uint64 = 18446744073709551615
var y int64 = int64(x)
is that y is -1. Without loss of information, is the only way to convert between these two number types to use an encoder and decoder?
buff bytes.Buffer
Encoder(buff).encode(x)
Decoder(buff).decode(y)
Note, I am not attempting a straight numeric conversion in your typical case. I am more concerned with maintaining the statistical properties of a random number generator.
Your conversion does not lose any information; all the bits are untouched. It is just that:
uint64(18446744073709551615) = 0xFFFFFFFFFFFFFFFF
int64(-1) = 0xFFFFFFFFFFFFFFFF
Try:
var x uint64 = 18446744073709551615 - 3
and you will have y = -4.
For instance: playground
var x uint64 = 18446744073709551615 - 3
var y int64 = int64(x)
fmt.Printf("%b\n", x)
fmt.Printf("%b or %d\n", y, y)
Output:
1111111111111111111111111111111111111111111111111111111111111100
-100 or -4
Seeing -1 would be consistent with a process running as 32 bits.
See for instance the Go 1.1 release notes (which made int and uint 64 bits wide on 64-bit platforms):
x := ^uint32(0) // x is 0xffffffff
i := int(x)     // i is -1 on 32-bit systems, 0xffffffff on 64-bit
fmt.Println(i)
Using fmt.Printf("%b\n", y) can help to see what is going on (see ANisus' answer).
As it turned out, the OP wheaties confirms (in the comments) that it was run initially in 32 bits (hence this answer), but then realized that 18446744073709551615 is 0xffffffffffffffff (-1) anyway: see ANisus' answer.
The types uint64 and int64 can both represent 2^64 discrete integer values.
The difference between the two is that uint64 holds only non-negative integers (0 through 2^64-1), whereas int64 holds both negative and positive integers, using 1 bit for the sign (-2^63 through 2^63-1).
As others have said, if your generator is producing 0xffffffffffffffff, uint64 will represent this as the raw integer (18,446,744,073,709,551,615) whereas int64 will interpret the two's complement value and return -1.
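For the random-number use case in the question, a quick round trip shows that the bits survive both conversions:
package main

import "fmt"

func main() {
    var x uint64 = 18446744073709551615 // 0xFFFFFFFFFFFFFFFF
    y := int64(x)                       // reinterprets the same 64 bits
    z := uint64(y)                      // converts back, bit for bit

    fmt.Println(y)      // -1
    fmt.Println(z == x) // true: no information was lost
}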

Breaking a 32 bit integer into 8 bit chunks for Radix Sort

I am basically a beginner in computer science. Please forgive me if I ask elementary questions. I am trying to understand radix sort. I read that a 32-bit unsigned integer can be broken down into four 8-bit chunks. After that, all it takes is "4 passes" to complete the radix sort. Can somebody please show me an example of how this breakdown (32 bits into four 8-bit chunks) works? Maybe with a 32-bit integer like 2147507648.
Thanks!
You would divide the 32-bit integer up into 4 pieces of 8 bits. Extracting those pieces is a matter of using some of the bitwise operators available in C:
uint32_t x = 2147507648;
uint8_t chunk1 = x & 0x000000ff; //lower 8 bits
uint8_t chunk2 = (x & 0x0000ff00) >> 8;
uint8_t chunk3 = (x & 0x00ff0000) >> 16;
uint8_t chunk4 = (x & 0xff000000) >> 24; //highest 8 bits
2147507648 in decimal is 0x80005DC0 in hex. You can pretty much eyeball those 8-bit chunks out of the hex representation, since each hex digit represents 4 bits, so each pair of hex digits represents 8 bits.
That means chunk 1 is 0xC0, chunk 2 is 0x5D, chunk 3 is 0x00 and chunk 4 is 0x80.
It's done as follows:
2147507648
=> 0x80005DC0 (hex value of 2147507648)
=> 0x80 0x00 0x5D 0xC0
=> 128 0 93 192
To do this, you'd need bitwise operations, as nos suggested.
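To see how the four chunks drive the "4 passes", here is a minimal sketch of a least-significant-digit radix sort (written in Go like most of the other snippets on this page; the function names are mine):
package main

import "fmt"

// radixSort sorts uint32 keys with 4 stable counting-sort passes,
// one per 8-bit chunk, least significant chunk first.
func radixSort(a []uint32) {
    buf := make([]uint32, len(a))
    for shift := uint(0); shift < 32; shift += 8 {
        var count [256]int
        for _, x := range a { // histogram of the current chunk
            count[(x>>shift)&0xff]++
        }
        sum := 0
        for i, c := range count { // prefix sums -> start offsets
            count[i] = sum
            sum += c
        }
        for _, x := range a { // stable scatter into the buffer
            buf[count[(x>>shift)&0xff]] = x
            count[(x>>shift)&0xff]++
        }
        a, buf = buf, a
    }
    // 4 passes is an even number of swaps, so the sorted data
    // ends up back in the caller's slice.
}

func main() {
    a := []uint32{2147507648, 42, 4294967295, 7, 1000}
    radixSort(a)
    fmt.Println(a) // [7 42 1000 2147507648 4294967295]
}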
