Is there a super fast compare byte[] method in Chronicle Bytes? - chronicle

I'm storing a Murmur3 64-bit hash as a byte array in a Chronicle Bytes object. I'm trying to sort these keys as fast as physically possible, and I implemented quicksort to do that. I noticed there is a set of compare-and-swap methods, but nothing for a byte array. Is there anything I can use to speed up my quicksort?
Profiling indicates that most of the pressure is on net.openhft.chronicle.bytes.AbstractBytes.readCheckOffset(long, long, boolean).
Thanks for any hints.

The fastest way to compare two byte[] arrays would be to use Unsafe to read an int or a long from the underlying arrays (and swap the byte order if the platform is little-endian). This gives you a very fast comparison.
For the most part, Chronicle Bytes is designed to make it easier to work with off-heap memory, though it also supports on-heap memory, e.g. byte[].
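A minimal sketch of that idea, assuming each key is exactly 8 bytes (a Murmur3 64-bit hash); the class and method names are illustrative and not part of the Chronicle Bytes API:
import java.lang.reflect.Field;
import java.nio.ByteOrder;
import sun.misc.Unsafe;

public final class FastKeyCompare {
    private static final Unsafe UNSAFE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }
    private static final long BASE = Unsafe.ARRAY_BYTE_BASE_OFFSET;
    private static final boolean LITTLE_ENDIAN =
            ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;

    // Compare two 8-byte keys in lexicographic (unsigned, big-endian) order
    // by reading each array as a single long.
    public static int compare8(byte[] a, byte[] b) {
        long x = UNSAFE.getLong(a, BASE);
        long y = UNSAFE.getLong(b, BASE);
        if (LITTLE_ENDIAN) {
            // Swap so that byte 0 becomes the most significant byte.
            x = Long.reverseBytes(x);
            y = Long.reverseBytes(y);
        }
        return Long.compareUnsigned(x, y);
    }
}
A quicksort over the keys can call compare8 directly. Since the keys are hashes, any consistent total order would do, but the byte swap keeps the result identical to a plain byte-by-byte comparison.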

Related

nanopb oneof size requirements

I came across nanopb and want to use it in my project. I'm writing code for an embedded device, so memory limitations are a real concern.
My goal is to transfer data items from device to device; each data item has a 32-bit identifier and a value. The value can be anything from a single char to a float to a long string. I'm wondering what would be the most efficient way of declaring the messages for this type of problem.
I was thinking something like this:
message data_msg {
    message data_item {
        int32 id = 1;
        oneof value {
            int32 ival = 2;   // protobuf has no int8; nanopb can map this to a smaller C type via options
            float fval = 3;
            string sval = 4;
        }
    }
    repeated data_item items = 1;
}
But as I understand it, this is converted into a C union, which is the size of its largest member. Say I limit the string to 50 chars; then the union is always 50 bytes long, even if I would only need 4 bytes for a float.
Have I understood this correctly, or is there some other way to accomplish this?
Thanks!
Your understanding is correct: the structure size in C will equal the size of the largest member of the oneof. However, this is only the size in memory; the message size after serialization will be the minimum needed for the contents.
If the size in memory is an issue, there are several options available. The default of allocating the maximum possibly needed size makes memory management easy. If you want to dynamically allocate only the needed amount of memory, you'll have to decide how you want to do it:
Using the PB_ENABLE_MALLOC compilation option, you can use the FT_POINTER field type for the string and other large fields. The memory will then be allocated using malloc() from the system heap.
With FT_CALLBACK, instead of allocating any memory, you'll get a callback where you can read out the string and process or store it any way you want. For example, if you wanted to write the string to SD card, you could do so without ever storing it completely in memory.
In overall system design, static allocation for maximum size required is often the easiest to test for. If the data fits once, it will always fit. If you go for dynamic allocation, you'll need to analyze more carefully what is the maximum possible memory usage needed.
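For illustration, a hedged sketch of what the field options might look like in a nanopb .options file, assuming the message and field names from the question (check the nanopb documentation for your version):
# statically allocated, maximum-size string (the default approach)
data_msg.data_item.sval max_size:50

# or, with the PB_ENABLE_MALLOC compilation option, allocate from the heap
# data_msg.data_item.sval type:FT_POINTER

# or, handle the string in a decode callback instead of storing it
# data_msg.data_item.sval type:FT_CALLBACK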

How can a compact trie be beneficial?

I just learned that a compact trie can save memory compared to a regular trie by storing 3 integers and a reference to a string object instead of actually storing a string in each node. However, I'm still confused about how it can actually save memory using that method.
If a node of a compact trie stores 3 integers and a reference to a String object, doesn't that mean no memory is saved at all if that String object is also stored in memory?
If that is the case, is the compact trie only beneficial when we store the String object on disk?
The compact storage of a compressed trie can be more space efficient if you're already storing the strings for some other purpose.
The compact and non-compact versions will have similar memory usage if you're also counting the storing of the strings. The compact version might even be worse, depending on how many bytes your integers and pointers require.
For a non-compact compressed trie:
Each node would have a string which requires a pointer (let's say 4 bytes) and a length (2 bytes). This gives us 6 bytes.
On top of this, we'd need to store the actual string (even if we're already storing these elsewhere).
For a compact compressed trie:
Each node would have 3 integers (2 bytes each). This gives us 6 bytes.
If we're counting storing the strings as well, we'd have the actual strings (which is the same as the above), in addition to the overhead (pointer and length) of those strings.
Given these numbers (if we're counting storing the strings), this version would be worse.
For a non-compressed trie (that is, where we only store one character per node):
Each node directly stores its one character.
There would be no string overhead.
However, if you have long chains, you could have many more nodes with this representation, thus this could end up being less time and space efficient (especially considering the time cost of having to jump between a bunch of locations in memory instead of just being able to read a sequential block of memory).
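To make the comparison concrete, here is a rough Java sketch of the three node layouts being compared; the field names are illustrative, and the byte counts in the answer assume smaller integers and pointers than a typical JVM would actually use:
// Non-compact compressed trie: each node owns its own substring.
class CompressedTrieNode {
    String edgeLabel;                  // reference to the label, plus the characters themselves
    CompressedTrieNode[] children;
}

// Compact compressed trie: each node only stores indices into one shared text.
class CompactTrieNode {
    int textStart;                     // substring [textStart, textEnd) of the shared text
    int textEnd;
    int firstChild;                    // e.g. index of the first child in a node array
}

// Non-compressed trie: one character per node, no string overhead,
// but potentially many more nodes.
class TrieNode {
    char ch;
    TrieNode[] children;
}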

What's the reason behind ZigZag encoding in Protocol Buffers and Avro?

ZigZag requires a lot of overhead to write/read numbers. Actually I was stunned to see that it doesn't just write int/long values as they are, but does a lot of additional scrambling. There's even a loop involved:
https://github.com/mardambey/mypipe/blob/master/avro/lang/java/avro/src/main/java/org/apache/avro/io/DirectBinaryEncoder.java#L90
I can't seem to find in the Protocol Buffers docs or the Avro docs, or work out for myself, what the advantage of scrambling numbers like that is. Why is it better to have positive and negative numbers alternate after encoding?
Why aren't they just written in little-endian or big-endian (network) order, which would only require reading them into memory and possibly reversing the byte order? What do we buy by paying with performance?
It is a variable-length, 7-bits-per-byte encoding. Every byte except the last has its high bit set to 1; the final byte has it set to 0, which is how the decoder can tell how many bytes were used to encode the value. Byte order is always little-endian (least significant group first), regardless of the machine architecture.
The ZigZag step exists because of this: in plain two's complement, every negative number has its high bits set and would always need the maximum number of bytes. ZigZag maps signed values to unsigned ones so that values of small magnitude, positive or negative, become small numbers. It is an encoding trick that permits writing as few bytes as needed to encode the value, so an 8-byte long with a value between -64 and 63 takes only one byte. That is the common case; the full range of a long is very rarely used in practice.
Packing the data tightly without the overhead of a gzip-style compression method was the design goal. A similar encoding is also used in the .NET Framework. The processor overhead needed to encode/decode the value is inconsequential: it is already much lower than that of a compression scheme, and a very small fraction of the I/O cost.
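A short sketch of the two steps in Java, roughly what the Avro/Protocol Buffers writers do (the method names here are illustrative):
// Step 1: ZigZag-map a signed long so small magnitudes become small unsigned numbers.
// -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
static long zigZagEncode(long n) {
    return (n << 1) ^ (n >> 63);
}

static long zigZagDecode(long z) {
    return (z >>> 1) ^ -(z & 1);
}

// Step 2: write the result as a base-128 varint, least significant 7 bits first.
// Every byte except the last has its high bit set. Returns the new write position.
static int writeVarLong(long v, byte[] out, int pos) {
    while ((v & ~0x7FL) != 0) {
        out[pos++] = (byte) ((v & 0x7F) | 0x80);
        v >>>= 7;
    }
    out[pos++] = (byte) v;
    return pos;
}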

Redis 10x more memory usage than data

I am trying to store a wordlist in redis. The performance is great.
My approach is to make a set called "words" and add each new word via SADD.
When adding a file that's 15.9 MB and contains about a million words, the redis-server process consumes 160 MB of RAM. How come I am using 10x the memory? Is there any better way of approaching this problem?
Well, this is expected of any efficient data storage: the words have to be indexed in memory in a dynamic data structure of cells linked by pointers. The size of the structure metadata, the pointers, and the memory allocator's internal fragmentation is the reason the data take much more memory than a corresponding flat file.
A Redis set is implemented as a hash table. This includes:
an array of pointers growing geometrically (powers of two)
a second array may be required when incremental rehashing is active
singly-linked list cells representing the entries in the hash table (3 pointers, 24 bytes per entry)
Redis object wrappers (one per value) (16 bytes per entry)
the actual data themselves (each string prefixed by 8 bytes for size and capacity)
All the above sizes are given for the 64-bit implementation. Accounting for the memory allocator overhead, it results in Redis taking at least 64 bytes per set item (on top of the data) for a recent version of Redis using the jemalloc allocator (>= 2.4).
Redis provides memory optimizations for some data types, but they do not cover sets of strings. If you really need to optimize memory consumption of sets, there are tricks you can use though. I would not do this for just 160 MB of RAM, but should you have larger data, here is what you can do.
If you do not need the union, intersection, difference capabilities of sets, then you may store your words in hash objects. The benefit is hash objects can be optimized automatically by Redis using zipmap if they are small enough. The zipmap mechanism has been replaced by ziplist in Redis >= 2.6, but the idea is the same: using a serialized data structure which can fit in the CPU caches to get both performance and a compact memory footprint.
To guarantee the hash objects are small enough, the data could be distributed according to some hashing mechanism. Assuming you need to store 1M items, adding a word could be implemented in the following way:
hash it modulo 10000 (done on client side)
HMSET words:[hashnum] [word] 1
Instead of storing:
words => set{ hi, hello, greetings, howdy, bonjour, salut, ... }
you can store:
words:H1 => map{ hi:1, greetings:1, bonjour:1, ... }
words:H2 => map{ hello:1, howdy:1, salut:1, ... }
...
To retrieve or check the existence of a word, it is the same (hash it and use HGET or HEXISTS).
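A hedged sketch of this bucketing scheme using the Jedis client; the bucket count, key prefix, and class name are illustrative:
import redis.clients.jedis.Jedis;

public class BucketedWordSet {
    private static final int BUCKETS = 10_000;   // ~100 words per bucket for 1M words
    private final Jedis jedis;

    public BucketedWordSet(Jedis jedis) {
        this.jedis = jedis;
    }

    // Hash the word on the client side and derive the bucket key.
    private String bucketKey(String word) {
        return "words:" + Math.floorMod(word.hashCode(), BUCKETS);
    }

    public void add(String word) {
        jedis.hset(bucketKey(word), word, "1");          // HSET words:<n> <word> 1
    }

    public boolean contains(String word) {
        return jedis.hexists(bucketKey(word), word);     // HEXISTS words:<n> <word>
    }
}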
With this strategy, significant memory savings can be achieved, provided the modulo of the hash is chosen according to the zipmap configuration (or ziplist for Redis >= 2.6):
# Hashes are encoded in a special way (much more memory efficient) when they
# have at max a given number of elements, and the biggest element does not
# exceed a given threshold. You can configure this limits with the following
# configuration directives.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64
Beware: the names of these parameters have changed with Redis >= 2.6.
Here, modulo 10000 for 1M items means 100 items per hash object, which guarantees that all of them are stored as zipmaps/ziplists.
In my experiments, it is better to store your data inside a hash table/dictionary. The best case I reached after a lot of benchmarking was to keep hash entries that do not exceed 500 keys.
I tried the standard string SET/GET: for 1 million keys/values, the size was 79 MB. That is huge if you have big numbers, like 100 million keys, which would use around 8 GB.
I tried hashes to store the same data; for the same million keys/values, the size was a much smaller 16 MB.
Give it a try; in case anybody needs the benchmarking code, drop me a mail.
Did you try persisting the database (BGSAVE for example), shutting the server down and getting it back up? Due to fragmentation behavior, when it comes back up and populates its data from the saved RDB file, it might take less memory.
Also: What version of Redis do you work with? Have a look at this blog post - it says that fragmentation has been partially solved as of version 2.4.

Sort serial data with buffer

Are there any algorithms to sort data from a serial input using a buffer that is smaller than the data length?
For example, I have 100 bytes of serial data, which can be read only once, and a 40-byte buffer, and I need to print out the sorted bytes.
I need it in JavaScript, but any general ideas are appreciated.
This kind of sorting is not possible in a single pass.
Using your example: suppose you have filled your 40 byte buffer, so you need to start printing out bytes in order to make room for the next one. In order to print out sorted data, you must print the smallest byte first. However, if the smallest byte has not been read, you can't possibly print it out yet!
The closest relevant fit to your question may be external sorting algorithms, which take multiple passes in order to sort data that can't fit into memory. That is, if you have peripherals that can store the output of a processing pass, you can sort data larger than your memory in O(log(N/M)) passes, where N is the size of the problem, and M is the size of your memory.
The classic storage peripheral for external sorting is the tape drive; however, the same algorithms work for disk drives (of whatever kind). Also, as cache hierarchies grow in depth, the principles of external sorting become more relevant even for in-memory sorts -- try taking a look at cache-oblivious algorithms.
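The question asked for JavaScript, but as an illustration of the external-sort idea, here is a hedged Java sketch in which temporary files stand in for the storage peripheral: the first pass reads the input once and writes each sorted buffer-sized chunk out as a run, and the second pass merges the runs. (In a genuinely constrained environment the merge pass would also have to respect the buffer limit, typically by merging only a few runs at a time.)
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class TinyExternalSort {
    // Sorts a byte stream using at most bufSize bytes of working buffer for the
    // run-generation pass; the runs themselves live on external storage.
    public static void sort(InputStream in, OutputStream out, int bufSize) throws IOException {
        // Phase 1: read the input once, writing each sorted chunk out as a run.
        List<Path> runs = new ArrayList<>();
        byte[] buf = new byte[bufSize];
        int n;
        while ((n = in.read(buf)) > 0) {
            byte[] run = Arrays.copyOf(buf, n);
            Arrays.sort(run);                      // signed byte order; flip the sign bit first for unsigned order
            Path p = Files.createTempFile("run", ".bin");
            Files.write(p, run);
            runs.add(p);
        }

        // Phase 2: k-way merge of the sorted runs (one open reader per run).
        List<InputStream> readers = new ArrayList<>();
        PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int i = 0; i < runs.size(); i++) {
            InputStream r = new BufferedInputStream(Files.newInputStream(runs.get(i)));
            readers.add(r);
            int b = r.read();
            if (b >= 0) heap.add(new int[]{(byte) b, i});   // cast back to signed to match the run order
        }
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            out.write(e[0]);
            int b = readers.get(e[1]).read();
            if (b >= 0) heap.add(new int[]{(byte) b, e[1]});
        }
        for (InputStream r : readers) r.close();
        for (Path p : runs) Files.delete(p);
    }
}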
