How the slice is enlarged by append? Is the capacity always doubled? - go

When appending to a slice, the slice may be enlarged if necessary. Because the spec doesn't specify the algorithm, I am curious about it.
I tried to find the append implementation in the Go source code, but couldn't find it.
Could anyone explain the algorithm used for enlarging a slice? Is the capacity always doubled? Or could anyone point me to where append is implemented in the source? I can check it myself.

The code responsible for growing slices in append can be found here.
As of 2014-2020 the implemented rules are:
If appending to the slice will increase its length by more than double, the new capacity is set to the new length.
Otherwise, double the capacity if the current length is less than 1024, or grow it by 25% if it is larger. Repeat this step until the new capacity fits the desired length.
Presumably this isn't part of the specification so the heuristics can be changed in future if needed. You can check the most current version of this implementation on the master branch.
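The rules above can be sketched in a few lines. This is a minimal Python model of the description, not the Go runtime code: the function name and parameters are made up for illustration, and it ignores the rounding up to allocator size classes that the real runtime applies afterwards, so actual capacities can come out slightly larger.

```python
def grow_cap_pre118(old_len: int, old_cap: int, needed: int) -> int:
    """Model of the 2014-2020 growth rule described above (hypothetical helper)."""
    # Appending more than doubles the slice: use exactly the needed capacity.
    if needed > 2 * old_cap:
        return needed
    # Small slices double.
    if old_len < 1024:
        return 2 * old_cap
    # Large slices grow by 25% at a time until the new length fits.
    new_cap = old_cap
    while new_cap < needed:
        new_cap += new_cap // 4
    return new_cap


if __name__ == "__main__":
    for old, need in [(4, 5), (4, 100), (1024, 1025), (2000, 2600)]:
        print(old, "->", grow_cap_pre118(old, old, need))
```

So the capacity is doubled only for small slices; a large append or a large slice takes one of the other two branches.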

In Go 1.18 it changed.
https://github.com/golang/go/commit/2dda92ff6f9f07eeb110ecbf0fc2d7a0ddd27f9d
starting cap    growth factor
256             2.0
512             1.63
1024            1.44
2048            1.35
4096            1.30
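The factors in that table follow from a smooth transition between doubling and 25% growth. The sketch below is a Python model under the assumption that the Go 1.18 rule uses a threshold of 256 and grows by (cap + 3*threshold)/4 per step; the function name is made up, and rounding to allocator size classes is again left out.

```python
def grow_cap_118(old_cap: int, needed: int, threshold: int = 256) -> int:
    """Model of the Go 1.18 growth rule (hypothetical helper, no size-class rounding)."""
    if needed > 2 * old_cap:
        return needed
    if old_cap < threshold:
        return 2 * old_cap
    new_cap = old_cap
    while new_cap < needed:
        # Grows 2x near the threshold and approaches 1.25x for large slices.
        new_cap += (new_cap + 3 * threshold) // 4
    return new_cap


if __name__ == "__main__":
    for cap in (256, 512, 1024, 2048, 4096):
        new = grow_cap_118(cap, cap + 1)
        print(cap, new, new / cap)
```

Printing the ratio for each starting capacity gives values close to the growth factors listed in the commit message.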

Related

Compression of Binary file

Suppose we are given a binary file of length n, where each bit is independently one with probability 1/3 and zero otherwise. We want to construct a method whose expected compressed length is less than 10 percent above Shannon's lower bound (for all n large enough).
I found the lower bound to be 0.918. I tried using tuples of size 2, but Huffman coding gives me an expected length of 1.88. Am I going in the right direction?
What if we want a 3% margin?
The Shannon entropy bound is 0.918 output bits per input bit.
If you just write the bits you're given, you'll spend 1 output bit per input bit.
This is already less than 10% more than the bound, so no compression is required.
You can use Arithmetic compressor or Rangecoder.
There is explanation with code for Arithmetic compressor and open-source implementation of Rangecoder.
I personally recommend using Rangecoder, because it is the fastest and has never been patented (the patent for the arithmetic compressor has already expired).
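The numbers in this thread can be checked directly. The sketch below (function names are made up for illustration) computes the entropy for p = 1/3 and the expected Huffman code length when bits are grouped into blocks. Note that the 1.88 figure from the question is per pair of input bits, which is about 0.944 output bits per input bit.

```python
import heapq
import math
from itertools import product


def entropy(p_one: float) -> float:
    """Shannon entropy in bits of a single bit that is 1 with probability p_one."""
    return -(p_one * math.log2(p_one) + (1 - p_one) * math.log2(1 - p_one))


def huffman_expected_length(probs) -> float:
    """Expected codeword length of a Huffman code: the sum of all merged node weights."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def bits_per_input_bit(p_one: float, block: int) -> float:
    """Huffman-code blocks of `block` independent bits; return output bits per input bit."""
    probs = [
        math.prod(p_one if bit else 1 - p_one for bit in bits)
        for bits in product((0, 1), repeat=block)
    ]
    return huffman_expected_length(probs) / block


if __name__ == "__main__":
    h = entropy(1 / 3)
    print("entropy:", h)
    for block in (1, 2, 3, 4):
        rate = bits_per_input_bit(1 / 3, block)
        print(block, rate, rate / h)
```

For blocks of 2 the rate is 17/18 output bits per input bit, which is within 3% of the 0.918 bound, so pairing the bits was a step in the right direction.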

Why does the GIF spec require at least 2-bits for the initial LZW code size?

I've been trying to figure out why the GIF89a spec requires the initial LZW code size to be at least 2 bits, even when encoding 1-bit images (B&W). In appendix F of the spec, it says the following:
ESTABLISH CODE SIZE
The first byte of the Compressed Data stream is a value indicating the minimum number of bits required to represent the set of actual pixel values. Normally this will be the same as the number of color bits. Because of some algorithmic constraints however, black & white images which have one color bit must be indicated as having a code size of 2.
I'm curious as to what these algorithmic constraints are. What would possibly prevent the variant of LZW used in GIF from using a code size of 1? Was this just a limitation of the early encoders or decoders? Or is there some weird edge case that can manifest itself with just the right combination of bits? Or is there something completely different going on here?
In addition to the codes for 0 and 1, you also have a clear code and an end of information code.
Quoting from the spec:
The output codes are of variable length, starting at <code size>+1 bits per code, up to 12 bits per code. This defines a maximum code value of 4095 (0xFFF). Whenever the LZW code value would exceed the current code length, the code length is increased by one.
If you start with a code size of 1, the code size needs to be increased immediately by this rule.
This limitation removes one if from the implementation (with codesize==1 the first vocabulary phrase code would have width==codesize+2; in all other cases width==codesize+1).
The drawback is a very small decrease in compression ratio for 2-color pictures.
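The arithmetic behind both answers is easy to tabulate. In this small sketch (the helper name is made up), a code size of n gives 2^n literal codes, followed by the clear code and the end-of-information code, so the first free dictionary code is 2^n + 2.

```python
def initial_codes(code_size: int):
    """Return (clear code, end-of-information code, first free code, bits needed for it)."""
    clear = 1 << code_size       # first code after the 2**code_size literal codes
    end_of_info = clear + 1
    first_free = clear + 2       # first code the dictionary can assign
    return clear, end_of_info, first_free, first_free.bit_length()


if __name__ == "__main__":
    for n in (1, 2, 3, 8):
        clear, eoi, first, width = initial_codes(n)
        print(f"code size {n}: clear={clear} eoi={eoi} first free={first} width={width}")
```

With a code size of 1 the first free code is 4, which already needs 3 bits (code size + 2), while every larger code size fits its first free code in code size + 1 bits.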

How to efficiently store and manipulate sparse binary matrices in Octave?

I'm trying to manipulate sparse binary matrices in GNU Octave, and it's using way more memory than I expect, and relevant sparse-matrix functions don't behave the way I want them to. I see this question about higher-than-expected sparse-matrix storage in MATLAB, which suggests that this matrix should consume even more memory, but helped explain (only) part of this situation.
For a sparse, binary matrix, I can't figure out any way to get Octave to NOT STORE the array of values (they're always implicitly 1, so need not be stored). Can this be done? Octave always seems to consume memory for a values array.
A trimmed-down example demonstrating the situation: create random sparse matrix, turn it into "binary":
mys=spones(sprandn(1024,1024,.03)); nnz(mys), whos mys
This shows the situation. The consumed size is consistent with the storage mechanism outlined in the aforementioned SO answer and expanded below, if spones() creates an array of storage-class double and if all indices are 32-bit (i.e., TotalStorageSize - rowIndices - columnIndices == NumNonZero*sizeof(double)). Unnecessarily storing these values (all 1s as doubles) takes over half of the total memory consumed by this 3%-sparse object.
After messing with this (for too long) while composing this question, I discovered some partial workarounds, so I'm going to "self-answer" (only) part of the question for continuity (hopefully), but I didn't figure out an adequate answer to main question:
How do I create an efficiently-stored ("no-/implicit-values") binary matrix in Octave?
Additional background on storage format follows...
The Octave docs say the storage format for sparse matrices uses format Compressed Sparse Column (CSC). This seems to imply storing the following arrays (expanding on aforementioned SO answer, with canonical Yale format labels and tweaks for column-major order):
values (A), number-of-nonzeros (NNZ) entries of storage-class size;
row numbers (IA), NNZ entries of index size (hopefully int64 but maybe int32);
start of each column (JA), number-of-columns-plus-1 entries of index size.
In this case, for binary-only storage, I hope there's a way to completely avoid storing array (A), but I can't figure it out.
Full disclosure: As noted above, as I was composing this question, I discovered a workaround to reduce memory usage, so I'm "self-answering" part of this here, but it still isn't fully satisfying, so I'm still listening for a better actual answer to storage of a sparse binary matrix without a trivial, bloated, unnecessary values array...
To get a binary-like value out of a number-like value and reduce the memory usage in this case, use "logical" storage, created by logical(X). For example, building from above,
logicalmys = logical(mys);
creates a sparse bool matrix, that takes up less memory (1-byte logical rather than 8-byte double for the values array).
Adding more information to the whos information using whos_line_format helps illuminate the situation: The default string includes 5 of the 7 properties (see docs for more). I'm using the format string
whos_line_format(" %a:4; %ln:6; %cs:16:6:1; %rb:12; %lc:8; %e:10; %t:20;\n")
to add display of "elements", and "type" (which is distinct from "class").
With that, whos mys logicalmys shows something like
Attr  Name        Size       Bytes   Class    Elements  Type
====  ====        ====       =====   =====    ========  ====
      mys         1024x1024  391100  double   32250     sparse matrix
      logicalmys  1024x1024  165350  logical  32250     sparse bool matrix
So this shows a distinction between sparse matrix and sparse bool matrix. However, the total memory consumed by logicalmys is consistent with actually storing an array of NNZ booleans (1-byte) -- That is:
totalMemory minus rowIndices minus columnOffsets leaves NNZ bytes left;
in numbers,
165350 - 32250*4 - 1025*4 == 32250.
So we're still storing 32250 elements, all of which are 1. Further, if you set one of the 1-elements to zero, it reduces the reported storage! For a good time, try: pick a nonzero element, e.g., (42,1), then zero it: logicalmys(42,1) = 0; then whos it!
My hope is that this is correct, and that this clarifies some things for those who might be interested. Comments, corrections, or actual answers welcome!
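The byte counts reported by whos can be reproduced from the CSC layout described above. The sketch below is plain arithmetic in Python rather than Octave, with a made-up helper name, and it assumes 32-bit indices as the numbers suggest.

```python
def csc_bytes(nnz: int, ncols: int, value_size: int, index_size: int = 4) -> int:
    """Bytes used by a CSC matrix: values + row indices + column start offsets."""
    values = nnz * value_size
    row_indices = nnz * index_size
    column_starts = (ncols + 1) * index_size
    return values + row_indices + column_starts


if __name__ == "__main__":
    print("double :", csc_bytes(32250, 1024, value_size=8))
    print("logical:", csc_bytes(32250, 1024, value_size=1))
    # Hypothetical pattern-only storage with no values array at all.
    print("pattern:", csc_bytes(32250, 1024, value_size=0))
```

The first two results match the 391100 and 165350 bytes shown by whos. The third line is what an implicit-values format would need, which is the storage the question is asking for.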

Hashing of pointer values

Sometimes you need to take a hash function of a pointer; not the object the pointer points to, but the pointer itself. Lots of the time, folks just punt and use the pointer value as an integer, chop off some high bits to make it fit, maybe shift out known-zero bits at the bottom. Thing is, pointer values aren't necessarily well-distributed in the code space; in fact, if your allocator is doing its job, there's an excellent chance they're all clustered close together.
So, my question is, has anyone developed hash functions that are good for this? Take a 32- or 64-bit value that's maybe got 12 bits of entropy in it somewhere and spread it evenly across a 32-bit number space.
This page lists several methods that might be of use. One of them, due to Knuth, is as simple as multiplying (in 32 bits) by 2654435761, but "Bad hash results are produced if the keys vary in the upper bits." In the case of pointers, that's a rare enough situation.
Here are some more algorithms, including performance tests.
It seems that the magic words are "integer hashing".
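A minimal sketch of the multiplicative method mentioned above, written in Python for brevity. The multiplier is the one quoted in this answer. Taking the top bits of the product as the bucket index and shifting out four alignment bits first are assumptions made for the example, as are the function names.

```python
KNUTH_MULTIPLIER = 2654435761  # constant quoted in the answer above


def knuth_hash32(key: int, bits: int = 32) -> int:
    """Multiply in 32 bits and keep the top `bits` bits of the product."""
    product = (key * KNUTH_MULTIPLIER) & 0xFFFFFFFF
    return product >> (32 - bits)


def pointer_hash(addr: int, table_bits: int, align_shift: int = 4) -> int:
    """Hash an address into 2**table_bits buckets.

    align_shift=4 assumes 16-byte-aligned allocations (an assumption, adjust as needed).
    """
    return knuth_hash32(addr >> align_shift, table_bits)


if __name__ == "__main__":
    base = 0x7F0000001000
    buckets = {pointer_hash(base + 16 * i, table_bits=8) for i in range(64)}
    print(len(buckets), "distinct buckets for 64 clustered addresses")
```

In C the same idea is a single multiply and shift on a uint32_t.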
They'll likely exhibit locality, yes - but in the lower bits, which means objects will be distributed through the hashtable. You'll only see collisions if a pointer's address is a multiple of the hashtable's length from another pointer.
If you know the lowest possible pointer address (which is often the case if you're working within a large buffer), just convert the pointer to an integer by subtracting the lowest possible pointer value, e.g. the buffer's base address.
Remember: a pointer subtracted from a pointer equals an offset (an integer).
So: Don't "chop off" bits; it's much better to convert to an offset.
As a result, the offset value is much smaller than the pointer value.
In some cases it may help further to shift the value right by two bits (i.e. divide by 4) before hashing it.
The problem with pointers is often that small blocks of memory are likely to be allocated at the same address (e.g. a block is freed and another block takes the freed block's place).
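The offset idea from this answer, sketched in Python with integers standing in for addresses. The helper name and the default shift of two bits (4-byte alignment) are assumptions for illustration.

```python
def pointer_key(addr: int, base: int, align_shift: int = 2) -> int:
    """Turn an address inside a known buffer into a small integer key."""
    offset = addr - base            # pointer minus pointer is an offset
    return offset >> align_shift    # drop bits that are always zero


if __name__ == "__main__":
    base = 0x7F001000
    for addr in (0x7F001000, 0x7F001010, 0x7F001020):
        print(hex(addr), "->", pointer_key(addr, base))
```

The resulting keys are small and dense, so they can be fed to any integer hash or used directly as a table index when the buffer is small enough.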
Why not just use an existing hash function?

optimizing byte-pair encoding

Noticing that byte-pair encoding (BPE) is sorely lacking from the large text compression benchmark, I very quickly made a trivial literal implementation of it.
The compression ratio - considering that there is no further processing, e.g. no Huffman or arithmetic encoding - is surprisingly good.
The runtime of my trivial implementation was less than stellar, however.
How can this be optimized? Is it possible to do it in a single pass?
This is a summary of my progress so far:
Googling found this little report that links to the original code and cites the source:
Philip Gage, titled 'A New Algorithm for Data Compression', that appeared in 'The C Users Journal' - February 1994 edition.
The links to the code on Dr Dobbs site are broken, but that webpage mirrors them.
That code uses a hash table to track the used digraphs and their counts on each pass over the buffer, so as to avoid recomputing them from scratch each pass.
My test data is enwik8 from the Hutter Prize.
|----------------|-----------------|-------------------------------------------------|
| Implementation | Time (min.secs) | Notes                                           |
|----------------|-----------------|-------------------------------------------------|
| bpev2          | 1.24            | The current version in the large text benchmark |
| bpe_c          | 1.07            | The original version by Gage, using a hashtable |
| bpev3          | 0.25            | Uses a list, custom sort, less memcpy           |
|----------------|-----------------|-------------------------------------------------|
bpev3 creates a list of all digraphs; the blocks are 10KB in size, and there are typically 200 or so digraphs above the threshold (of 4, which is the smallest count at which compressing gains a byte); this list is sorted and the first substitution is made.
As the substitutions are made, the statistics are updated; typically only around 10 or 20 digraphs change each pass; these are 'painted' and sorted, and then merged with the digraph list; this is substantially faster than always sorting the whole digraph list each pass, since the list is nearly sorted.
The original code moved between 'tmp' and 'buf' byte buffers; bpev3 just swaps buffer pointers, which is worth about 10 seconds of runtime alone.
Given that the buffer-swapping fix applied to bpev2 would bring the exhaustive search in line with the hashtable version, I think the hashtable is of arguable value, and that a list is a better structure for this problem.
It's still multi-pass though, and so it's not a generally competitive algorithm.
If you look at the Large Text Compression Benchmark, the original bpe has been added. Because of its larger block sizes, it performs better than my bpe on enwik9. Also, the performance gap between the hash tables and my lists is much smaller; I put that down to the march=PentiumPro that the LTCB uses.
There are of course occasions where it is suitable and used; Symbian use it for compressing pages in ROM images. I speculate that the 16-bit nature of Thumb binaries makes this a straightforward and rewarding approach; compression is done on a PC, and decompression is done on the device.
I've done work with optimizing a LZF compression implementation, and some of the same principles I used to improve performance are usable here.
To speed up performance on byte-pair encoding:
Limit the block size to less than 65kB (probably 8-16 kB will be optimal). This guarantees not all bytes will be used, and allows you to hold intermediate processing info in RAM.
Use a hashtable or simple lookup table indexed by short integer (more RAM, but faster) to hold the counts for byte pairs. There are 65,536 possible 2-byte pairs, and each count is at most BlockSize (max block size 64k), so a 16-bit counter per pair is enough. This gives you a table of 128 kB.
Allocate and reuse data structures capable of holding a full compression block, replacement table, byte-pair counts, and output bytes in memory. This sounds wasteful of RAM, but when you consider that your block size is small, it's worth it. Your data should be able to sit entirely in CPU L2 or (worst case) L3 cache. This gives a BIG speed boost.
Do one fast pass over the data to collect counts, THEN worry about creating your replacement table.
Pack bytes into integers or short ints whenever possible (applicable mostly to C/C++). A single entry in the counting table can be represented by an integer (16-bit count, plus byte pair).
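The lookup-table suggestion from the list above can be sketched as follows. This is an illustrative Python version (the function name is made up) of counting every byte pair in one fast pass, using a flat 65,536-entry table indexed by the pair packed into a 16-bit integer. In C the table would be a plain array of 16-bit or 32-bit counters.

```python
from array import array


def count_pairs(block: bytes) -> array:
    """Count adjacent byte pairs in one pass; index is (first << 8) | second."""
    counts = array("L", [0]) * 65536   # one counter per possible byte pair
    for first, second in zip(block, block[1:]):
        counts[(first << 8) | second] += 1
    return counts


if __name__ == "__main__":
    counts = count_pairs(b"abracadabra")
    best = max(range(65536), key=counts.__getitem__)
    print(bytes((best >> 8, best & 0xFF)), counts[best])
```

Overlapping pairs such as the two "aa" in "aaa" are both counted here, so a real encoder has to account for that when it estimates how many replacements a pair will actually yield.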
Code in JustBasic can be found here complete with input text file.
Just BASIC Files Archive – forum post
EBPE by TomC 02/2014 – Enhanced Byte Pair Encoding
EBPE features two post processes to Byte Pair Encoding
1. Is compressing the dictionary (believed to be a novelty)
A dictionary entry is composed of 3 bytes:
AA – the two char to be replaced by (byte pair)
1 – this single token (tokens are unused symbols)
So "AA1" tells us when decoding that every time we see a "1" in the
data file, replace it with "AA".
While long runs of sequential tokens are possible, let’s look at this
8 token example:
AA1BB3CC4DD5EE6FF7GG8HH9
It is 24 bytes long (8 * 3)
The token 2 is not in the file indicating that it was not an open token to
use, or another way to say it: the 2 was in the original data.
We can see the last 7 tokens 3,4,5,6,7,8,9 are sequential so any time we
see a sequential run of 4 tokens or more, let’s modify our dictionary to be:
AA1BB3<255>CCDDEEFFGGHH<255>
Where the <255> tells us that the tokens for the byte pairs are implied and
are incremented by 1 more than the last token we saw (3). We increment
by one until we see the next <255> indicating an end of run.
The original dictionary was 24 bytes,
The enhanced dictionary is 20 bytes.
I saved 175 bytes using this enhancement on a text file where tokens
128 to 254 would be in sequence as well as others in general, to include
the run created by lowercase pre-processing.
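A sketch of the dictionary run encoding described in this section, to check the 24-byte and 20-byte figures. This is one reading of the format, written in Python rather than JustBasic: the function name is made up, entries are (byte pair, token) tuples, and it does not handle the case where a byte pair itself contains the value 255.

```python
def pack_dictionary(entries, marker: bytes = b"\xff", min_run: int = 4) -> bytes:
    """Pack (pair, token) entries, collapsing runs of sequential tokens.

    A run of at least min_run sequential tokens is written as its first entry
    in full, then marker, then only the byte pairs of the rest, then marker.
    """
    out = bytearray()
    i = 0
    while i < len(entries):
        # Find the end of the run of consecutive tokens starting at i.
        j = i
        while j + 1 < len(entries) and entries[j + 1][1] == entries[j][1] + 1:
            j += 1
        pair, token = entries[i]
        out += pair + bytes([token])
        if j - i + 1 >= min_run:
            out += marker
            for next_pair, _ in entries[i + 1 : j + 1]:
                out += next_pair       # tokens are implied: previous token + 1
            out += marker
            i = j + 1
        else:
            i += 1
    return bytes(out)


if __name__ == "__main__":
    entries = [(b"AA", ord("1"))] + [
        (bytes([c, c]), ord("3") + k) for k, c in enumerate(b"BCDEFGH")
    ]
    packed = pack_dictionary(entries)
    print(len(entries) * 3, "->", len(packed), packed)
```

A run of exactly 4 tokens saves a single byte under this layout, which is consistent with the rule above of only collapsing runs of 4 or more.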
2. Is compressing the data file
Re-using rarely used characters as tokens is nothing new.
After using all of the symbols for compression (except for <255>),
we scan the file and find a single "j" in the file. Let this char do double
duty by:
"<255>j" means this is a literal "j"
"j" is now used as a token for re-compression,
If the j occurred 1 time in the data file, we would need to add 1 <255>
and a 3 byte dictionary entry, so we need to save more than 4 bytes in BPE
for this to be worth it.
If the j occurred 6 times we would need 6 <255> and a 3 byte dictionary
entry so we need to save more than 9 bytes in BPE for this to be worth it.
Depending on if further compression is possible and how many byte pairs remain
in the file, this post process has saved in excess of 100 bytes on test runs.
Note: When decompressing make sure not to decompress every "j".
One needs to look at the prior character to make sure it is not a <255> in order
to decompress. Finally, after all decompression, go ahead and remove the <255>'s
to recreate your original file.
3. What’s next in EBPE?
Unknown at this time
I don't believe this can be done in a single pass unless you find a way to predict, given a byte-pair replacement, if the new byte-pair (after-replacement) will be good for replacement too or not.
Here are my thoughts at first sight. Maybe you already do or have already thought all this.
I would try the following.
Two adjustable parameters:
Number of byte-pair occurrences in a chunk of data before considering replacing it. (So that the dictionary doesn't grow faster than the chunk shrinks.)
Number of replacements per pass below which it's probably not worth replacing anymore. (So that the algorithm stops wasting time when there's maybe only 1 or 2 % left to gain.)
I would do passes, as long as it is still worth compressing one more level (according to parameter 2). During each pass, I would keep a count of byte-pairs as I go.
I would play with the two parameters a little and see how they influence compression ratio and speed. They should probably change dynamically, according to the length of the chunk to compress (and maybe one or two other things).
Another thing to consider is the data structure used to store the count of each byte-pair during the pass. There very likely is a way to write a custom one which would be faster than generic data structures.
Keep us posted if you try something and get interesting results!
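For reference, here is a minimal multi-pass sketch of the approach discussed in this answer. It is a plain illustration in Python, not Gage's code or any of the implementations benchmarked above: the names are made up, min_count plays the role of the first parameter (the default of 4 is the threshold mentioned in the question), and max_passes is a crude stand-in for the second.

```python
from collections import Counter


def bpe_compress(data: bytes, min_count: int = 4, max_passes: int = 256):
    """Repeatedly replace the most frequent byte pair with an unused byte value."""
    buf = list(data)
    table = []                                   # (token, (first, second)) per pass
    for _ in range(max_passes):
        free = set(range(256)) - set(buf)        # byte values not present in the block
        if not free or len(buf) < 2:
            break
        pair, count = Counter(zip(buf, buf[1:])).most_common(1)[0]
        if count < min_count:
            break                                # not worth a dictionary entry
        token = min(free)
        out, i = [], 0
        while i < len(buf):
            if i + 1 < len(buf) and (buf[i], buf[i + 1]) == pair:
                out.append(token)
                i += 2
            else:
                out.append(buf[i])
                i += 1
        buf = out
        table.append((token, pair))
    return bytes(buf), table


def bpe_decompress(buf: bytes, table) -> bytes:
    """Undo the passes in reverse order."""
    out = list(buf)
    for token, (first, second) in reversed(table):
        expanded = []
        for value in out:
            if value == token:
                expanded.extend((first, second))
            else:
                expanded.append(value)
        out = expanded
    return bytes(out)


if __name__ == "__main__":
    text = b"ab" * 10 + b" the theme of the thesis"
    packed, table = bpe_compress(text)
    assert bpe_decompress(packed, table) == text
    print(len(text), "->", len(packed), "bytes plus", 3 * len(table), "dictionary bytes")
```

Each pass rescans and recounts the whole block, which is exactly the cost the question is trying to avoid; the sketch is meant as a baseline to measure optimizations against.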
Yes, keep us posted.
guarantee?
BobMcGee gives good advice.
However, I suspect that "Limit the block size to less than 65kB ... . This guarantees not all bytes will be used" is not always true.
I can generate a (highly artificial) binary file less than 1kB long that has a byte pair that repeats 10 times, but cannot be compressed at all with BPE because it uses all 256 bytes -- there are no free bytes that BPE can use to represent the frequent byte pair.
If we limit ourselves to 7 bit ASCII text, we have over 127 free bytes available, so all files that repeat a byte pair enough times can be compressed at least a little by BPE.
However, even then I can (artificially) generate a file that uses only the isgraph() ASCII characters and is less than 30kB long that eventually hits the "no free bytes" limit of BPE, even though there is still a byte pair remaining with over 4 repeats.
single pass
It seems like this algorithm can be slightly tweaked in order to do it in one pass.
Assuming 7 bit ASCII plaintext:
Scan over input text, remembering all pairs of bytes that we have seen in some sort of internal data structure, somehow counting the number of unique byte pairs we have seen so far, and copying each byte to the output (with high bit zero).
Whenever we encounter a repeat, emit a special byte that represents a byte pair (with high bit 1, so we don't confuse literal bytes with byte pairs).
Include in the internal list of byte "pairs" that special byte, so that the compressor can later emit some other special byte that represents this special byte plus a literal byte -- so the net effect of that other special byte is to represent a triplet.
As phkahler pointed out, that sounds practically the same as LZW.
EDIT:
Apparently the "no free bytes" limitation I mentioned above is not, after all, an inherent limitation of all byte pair compressors, since there exists at least one byte pair compressor without that limitation.
Have you seen
"SCZ - Simple Compression Utilities and Library"?
SCZ appears to be a kind of byte pair encoder.
SCZ apparently gives better compression than other byte pair compressors I've seen, because
SCZ doesn't have the "no free bytes" limitation I mentioned above.
If any byte pair BP repeats enough times in the plaintext (or, after a few rounds of iteration, the partially-compressed text),
SCZ can do byte-pair compression, even when the text already includes all 256 bytes.
(SCZ uses a special escape byte E in the compressed text, which indicates that the following byte is intended to represent itself literally, rather than expanded as a byte pair.
This allows some byte M in the compressed text to do double-duty:
The two bytes EM in the compressed text represent M in the plain text.
The byte M (without a preceding escape byte) in the compressed text represents some byte pair BP in the plain text.
If some byte pair BP occurs many more times than M in the plaintext, then the space saved by representing each BP byte pair as the single byte M in the compressed data is more than the space "lost" by representing each M as the two bytes EM.)
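The escape mechanism described in the parenthetical can be sketched for a single replacement round. This is an illustration of the idea in Python, not SCZ's actual file format: the function names are made up, and escaping a literal E as the two bytes EE is an assumption of this sketch.

```python
def escape_pass(data: bytes, pair: bytes, marker: int, esc: int) -> bytes:
    """Replace `pair` with `marker`; literal marker/escape bytes get an escape prefix."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i : i + 2] == pair:
            out.append(marker)                 # M alone stands for the byte pair BP
            i += 2
        elif data[i] in (marker, esc):
            out += bytes((esc, data[i]))       # E M stands for a literal M
            i += 1
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def unescape_pass(data: bytes, pair: bytes, marker: int, esc: int) -> bytes:
    """Inverse of escape_pass."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == esc:
            out.append(data[i + 1])            # next byte is literal
            i += 2
        elif data[i] == marker:
            out += pair
            i += 1
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


if __name__ == "__main__":
    text = b"the cat sat on the mat; jazz"
    packed = escape_pass(text, pair=b"at", marker=ord("z"), esc=0xFF)
    assert unescape_pass(packed, b"at", ord("z"), 0xFF) == text
    print(len(text), "->", len(packed))
```

In the usage example the pair "at" occurs three times and the literal "z" twice, so the round saves three bytes and spends two, which is the trade-off described above.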
You can also optimize the dictionary so that:
AA1BB2CC3DD4EE5FF6GG7HH8 is a sequential run of 8 tokens.
Rewrite that as:
AA1<255>BBCCDDEEFFGGHH<255> where the <255> tells the program that each of the following byte pairs (up to the next <255>) is sequential and incremented by one. This works great for text files and anywhere there are at least 4 sequential tokens.
It saved 175 bytes on a recent test.
Here is a new BPE (http://encode.ru/threads/1874-Alba).
To compile it, for example:
gcc -O1 alba.c -o alba.exe
It's faster than default.
There is an O(n) version of byte-pair encoding which I describe here. I am getting a compression speed of ~200kB/second in Java.
The easiest efficient structure is a 2-dimensional array like byte_pair(255,255). Drop the counts in there and modify them as the file compresses.
