How to determine the optimal capacity for Quadtree subdivision?

I've created a flocking simulation using the Boids algorithm and have integrated a quadtree for optimization. A boid is inserted into a quadtree node if that node has not yet reached its boid capacity. Once a node reaches its capacity, it subdivides into smaller quadtrees, and the remaining boids are inserted into those children, recursively.
Performance seems to improve when I increase the capacity from its default of 4 to something that can hold more boids, like 20, and I was wondering whether there is any rule or methodology for picking the optimal capacity formulaically.
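For concreteness, here is a minimal sketch of the insert/subdivide logic described above (written in Java with hypothetical names; it is not the actual code from the linked project):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: a point quadtree that subdivides once a node exceeds CAPACITY.
class Quadtree {
    static final int CAPACITY = 4;                  // the tuning knob in question
    final double x, y, w, h;                        // bounds of this node
    final List<double[]> boids = new ArrayList<>(); // stored (x, y) points
    Quadtree[] children;                            // null until subdivided

    Quadtree(double x, double y, double w, double h) {
        this.x = x; this.y = y; this.w = w; this.h = h;
    }

    boolean contains(double px, double py) {
        return px >= x && px < x + w && py >= y && py < y + h;
    }

    void insert(double px, double py) {
        if (!contains(px, py)) return;              // not our region
        if (children == null && boids.size() < CAPACITY) {
            boids.add(new double[]{px, py});        // still room: store it here
            return;
        }
        if (children == null) subdivide();          // capacity reached: split once
        for (Quadtree child : children) child.insert(px, py); // only the child containing the point keeps it
    }

    void subdivide() {
        double hw = w / 2, hh = h / 2;
        children = new Quadtree[]{
            new Quadtree(x, y, hw, hh),      new Quadtree(x + hw, y, hw, hh),
            new Quadtree(x, y + hh, hw, hh), new Quadtree(x + hw, y + hh, hw, hh)
        };
        for (double[] b : boids)                    // push existing boids down
            for (Quadtree child : children) child.insert(b[0], b[1]);
        boids.clear();
    }
}
```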
You can view the site live here or the source code here if relevant.

I'd assume it very much depends on your implementation, hardware, and the data characteristics.
Implementation:
An extreme case would be using GPU processing to compare entries. If you support that, having very large nodes, potentially just a single node containing all entries, may be faster than any other solution.
Hardware:
Cache size and bus speed will play a big role, also depending on how much memory every node and every entry consumes. Accessing a sub-node that is not cached is obviously expensive, so you may want to increase node size in order to reduce sub-node traversal.
Coming back to implementation: storing the whole quadtree in a contiguous segment of memory can be very beneficial.
Data characteristics:
Clustered data: Strongly clustered data can have an adverse effect on performance because it may cause the tree to become very deep. In this case, increasing node size may help.
Large amounts of data may push you past the threshold where everything fits into the cache. In that case, making nodes larger will save memory, because you will have fewer nodes and everything may fit into the cache again.
In my experience, 10-50 entries per node gives the best performance across different datasets.
If you update your tree a lot, you may want to use two thresholds to avoid 'flickering', i.e. frequent merging and splitting of the same nodes: for example, split nodes once they exceed 25 entries, but merge them only once they drop below 15 entries (a minimal sketch follows below).
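Such a hysteresis band could look like this (a sketch only; Node, subdivide() and merge() are hypothetical stand-ins, and the thresholds are the example values from above):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of split/merge hysteresis to avoid 'flickering'.
class Node {
    static final int SPLIT_THRESHOLD = 25;  // split only above this...
    static final int MERGE_THRESHOLD = 15;  // ...merge only below this, so a node hovering
                                            // around 20 entries neither splits nor merges

    List<double[]> entries = new ArrayList<>();
    Node[] children;                        // null while this node is a leaf

    void maybeSplit() {
        if (children == null && entries.size() > SPLIT_THRESHOLD) subdivide();
    }

    void maybeMerge(int totalEntriesInChildren) {
        if (children != null && totalEntriesInChildren < MERGE_THRESHOLD) merge();
    }

    void subdivide() { /* create four children and push entries down */ }
    void merge()     { /* pull all entries back up and drop the children */ }
}
```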
If you are interested in a quadtree-like structure that avoids degenerate 'deep' quadtrees, have a look at my PH-Tree. It is structured like a quadtree but operates at the bit level, so the maximum depth is strictly limited to 64 or 32, depending on how many bits your data has. In practice the depth will rarely exceed 10 levels or so, even for very dense data. Note: a plain PH-Tree is a key-value 'map' in the sense that every coordinate (= key) can have only one entry (= value). That means you need to store lists or sets of entries if you expect more than one entry for any given coordinate.

Related

shared memory and population dynamics on a landscape

I would like to parallelize population dynamics for individuals moving on a 2D landscape. The landscape will be divided into cells with each processing core operating on individuals that exist in a specific cell.
The problem is that, because the individuals move, they travel between cells. Meanwhile, the positions of the individuals in a given cell (and its neighboring cells) must be known at any point in time in order to determine when pairs of individuals can mate.
In Open MPI, it would be necessary to pass the structures of individuals (in this case, a list of mutations and their locations in a genome) as messages whenever they move to a different cell, which would be very slow.
However, it seems that in OpenMP there is a way for the processing cores to share the memory for the entire list of genomes / individuals (i.e., for all cells). In this case, there would be no need for message passing, and the code could be very efficient.
Is my understanding of OpenMP correct? The nodes on my cluster each contain 32 processing cores. Does this mean I am limited to sharing memory among those 32 cores?
Thank you

Redis GEORADIUS with one ZSET versus a lot of ZSETs of particular size

What will work faster: one big ZSET with geodata, where I'll query within a 100m radius using GEORADIUS,
OR
a lot of ZSETs, where each ZSET is responsible for a 100m x 100m square covering the whole world and is named after that square, like:
left_corner1_49_2440000_28_5010000
left_corner2_49_2450000_28_5010000
.......
with each set holding all the points up to 100 meters to the right and below that corner.
So when searching for the nearest point, I'll just drop the redundant digits from the GPS coordinates: 49.2440408, 28.5011694 becomes 49.2440000, 28.5010000, and that way I know the name of the ZSET from which to fetch all the exact values with 100-meter precision.
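A minimal sketch of that key derivation (hedged: the cell_ key format is a simplified stand-in for the naming scheme above, and the 3-decimal truncation assumes roughly 0.001 degrees ≈ 100 m of latitude):

```java
import java.util.Locale;

// Derive the per-cell ZSET key by truncating coordinates to the cell's corner.
public class GeoCellKey {
    static String cellKey(double lat, double lon) {
        double cellLat = Math.floor(lat * 1000) / 1000.0;   // 49.2440408 -> 49.244
        double cellLon = Math.floor(lon * 1000) / 1000.0;   // 28.5011694 -> 28.501
        return String.format(Locale.ROOT, "cell_%.7f_%.7f", cellLat, cellLon);
    }

    public static void main(String[] args) {
        System.out.println(cellKey(49.2440408, 28.5011694)); // cell_49.2440000_28.5010000
    }
}
```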
Or, to ask the question in a more general form: how are ZSET names (keys) stored and accessed in Redis? If I have too many ZSETs, will it impact performance when accessing them?
A precise comparison of these approaches could only be done via a benchmark, and it would be specific to your dataset and configuration. But architecturally speaking, your pros and cons are:
BIG ZSET: less bandwidth and fewer operations (CPU cycles) to execute, no problems at cell borders (with many ZSETs you can get duplicates there), and you can gain throughput with sharding;
MANY ZSETs: lower latency for other operations (while a query on the big ZSET is running, other commands are waiting), and you can gain throughput with sharding AND lower latency with clustering.
As for the bottom-line question: I did not see your implementation code, but set names are just keys like any other keys you use. This is what the Redis FAQ says about the number of keys:
What is the maximum number of keys a single Redis instance can hold? <...>
Redis can handle up to 2^32 keys, and was tested in practice to handle
at least 250 million keys per instance.
UPDATE:
Look at what the Redis docs say about GEORADIUS:
Time complexity: O(N+log(M)) where N is the number of elements inside
the bounding box of the circular area delimited by center and radius
and M is the number of items inside the index.
It means that items outside your query contribute only the O(log(M)) term. So, 17 hops for 10M items or 21 hops for 1B items, which is quite affordable. The remaining question is whether you will partition the data between nodes.

Using ChronicleMap as a key-value database

I would like to use a ChronicleMap as a memory-mapped key-value database (String to byte[]). It should be able to hold up to the order of 100 million entries. Reads/gets will happen much more frequently than writes/puts, with an expected write rate of less than 10 entries/sec. While the keys would be similar in length, the length of the values could vary strongly: anything from a few bytes up to tens of MB. Yet, the majority of values will have a length between 500 and 1000 bytes.
Having read a bit about ChronicleMap, I am amazed about its features and am wondering why I can't find articles describing it being used as a general key-value database. To me there seem to be a lot of advantages of using ChronicleMap for such a purpose. What am I missing here?
What are the drawbacks of using ChronicleMap for the given boundary conditions?
I voted for closing this question because any "drawbacks" would be relative.
As a data structure, Chronicle Map is not sorted, so it doesn't fit when you need to iterate the key-value pairs in the sorted order by key.
A limitation of the current implementation is that you need to specify, in advance, the number of elements that are going to be stored in the map. If the actual number isn't close to the specified number, you are going to overuse memory and disk (not very severely, though, on Linux systems). And if the actual number of entries exceeds the specified number by approximately 20% or more, operation performance starts to degrade, and the performance hit grows linearly as the number of entries grows further. See https://github.com/OpenHFT/Chronicle-Map/issues/105
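For illustration, a minimal sketch of creating such a map with the Chronicle Map builder, using the sizing figures from the question (treat the exact builder calls as an assumption to verify against the current Chronicle Map documentation):

```java
import net.openhft.chronicle.map.ChronicleMap;
import java.io.File;
import java.io.IOException;

public class KvStoreSketch {
    public static void main(String[] args) throws IOException {
        // Entry count and average sizes must be estimated up front (see the limitation above).
        ChronicleMap<CharSequence, byte[]> map = ChronicleMap
                .of(CharSequence.class, byte[].class)
                .name("kv-store")
                .entries(100_000_000L)       // expected number of entries
                .averageKeySize(32)          // assumption: keys of similar, short length
                .averageValueSize(750)       // majority of values are 500-1000 bytes
                .createPersistedTo(new File("kv-store.dat")); // memory-mapped file on disk

        map.put("some-key", new byte[]{1, 2, 3});
        byte[] value = map.get("some-key");
        map.close();
    }
}
```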

Balanced trees and space and time trade-offs

I was trying to solve problem 3-1 for large input sizes given in the following link http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/assignments/MIT6_006F11_ps3_sol.pdf. The solution uses an AVL tree for range queries and that got me thinking.
I was wondering about scalability issues when the input size increases from a million to a billion and beyond. For instance, consider a stream of 4-byte integers and an input of size 1 billion: the space required to store the integers in memory alone would be roughly 3.7 GiB! The problem gets worse when you consider other data types, such as floats and strings, at input sizes of that order of magnitude.
Thus, I reached the conclusion that I would need secondary storage to hold all those numbers and the pointers to the child nodes of the AVL tree. I considered storing the left and right child nodes as separate files, but then realized that this would mean far too many files, and that opening and closing them would require expensive system calls and time-consuming disk access. At that point I realized that AVL trees would not work.
I next thought about B-Trees and the advantage they provide: each node can have 'n' children, which reduces the number of files on disk while packing more keys into every level. I am considering creating separate files for the nodes and inserting the keys into those files as they are generated.
1) I wanted to ask whether my approach and thought process are correct, and
2) whether I am using the right data structure; if B-Trees are the right data structure, what should the order be to make the application efficient, and what flavour of B-Tree would yield maximum efficiency? Sorry for the long post! Thanks in advance for your replies!
Yes, your reasoning is correct, although there are probably smarter schemes than storing one node per file. In fact, a B(+)-Tree often outperforms a binary search tree in practice (especially for very large collections) for numerous reasons, which is why just about every major database system uses it as its main index structure. Some reasons why binary search trees don't perform well here are:
Relatively large tree height (1 billion elements ≈ height 30, if perfectly balanced).
Every comparison is completely unpredictable (a 50/50 choice), so the hardware can't prefetch memory and keep the CPU pipeline filled with instructions.
After the upper few levels, you jump far away and to unpredictable locations in memory, each possibly requiring accessing the hard drive.
A B(+)-Tree with a high order will always be relatively shallow (a height of 3-5), which reduces the number of disk accesses. For range queries, you can read consecutively from memory, while in binary trees you jump around a lot. Searching within a node may take a bit longer, but practically speaking you are limited by memory accesses, not CPU time, anyway.
So, the question remains: what order should you use? Usually, the node size is chosen to be equal to the page size (4-64 KB), since optimizing for disk accesses is paramount. The page size is the minimal contiguous chunk of memory your computer loads from disk into main memory. Depending on the size of your keys, this results in a different number of elements per node; a rough back-of-the-envelope calculation is sketched below.
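For example (hedged: the 4 KB page size and 8-byte child pointers are assumptions; only the 4-byte keys and the 1 billion elements come from the question):

```java
// Rough estimate of B-tree order and height for 1 billion 4-byte keys.
public class BTreeSizing {
    public static void main(String[] args) {
        int pageSize = 4 * 1024;                 // assumed node size = one 4 KB page
        int keySize = 4;                         // 4-byte integer keys
        int pointerSize = 8;                     // assumed 8-byte child pointers
        int order = pageSize / (keySize + pointerSize);            // ~341 children per node
        long n = 1_000_000_000L;
        double height = Math.ceil(Math.log(n) / Math.log(order));  // log_order(n) ~ 4 levels
        System.out.printf("order ~ %d, height ~ %.0f%n", order, height);
    }
}
```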
For help with the implementation, just look at how B+-Trees are implemented in database systems.

Why is skip list memory locality poor while a balanced tree's is good?

Someone once challenged antirez (the author of Redis) on Hacker News (ycombinator) about why Redis uses a skip list to implement sorted sets:
I was looking at Redis yesterday and noticed this. Is there any
particular reason you chose skip list instead of btrees except for
simplicity? Skip lists consume more memory in pointers and are
generally slower than btrees because of poor memory locality so
traversing them means lots of cache misses. I also suggested a way to
improve throughput when you guarantee each command's durability (at
the end of the wiki page):
http://code.google.com/p/redis/wiki/AppendOnlyFileHowto Also, have you
thought about accommodating read-only traffic in an additional thread
as a way to utilize at least two cores efficiently while sharing the
same memory?
Then antirez answered:
There are a few reasons: 1) They are not very memory intensive. It's
up to you basically. Changing parameters about the probability of a
node to have a given number of levels will make then less memory
intensive than btrees. 2) A sorted set is often target of many ZRANGE
or ZREVRANGE operations, that is, traversing the skip list as a linked
list. With this operation the cache locality of skip lists is at least
as good as with other kind of balanced trees. 3) They are simpler to
implement, debug, and so forth. For instance thanks to the skip list
simplicity I received a patch (already in Redis master) with augmented
skip lists implementing ZRANK in O(log(N)). It required little changes
to the code. About the Append Only durability & speed, I don't think
it is a good idea to optimize Redis at cost of more code and more
complexity for a use case that IMHO should be rare for the Redis
target (fsync() at every command). Almost no one is using this feature
even with ACID SQL databases, as the performance hint is big anyway.
About threads: our experience shows that Redis is mostly I/O bound.
I'm using threads to serve things from Virtual Memory. The long term
solution to exploit all the cores, assuming your link is so fast that
you can saturate a single core, is running multiple instances of Redis
(no locks, almost fully scalable linearly with number of cores), and
using the "Redis Cluster" solution that I plan to develop in the
future.
I read that carefully, but I can't understand why a skip list comes with poor memory locality, or why a balanced tree would lead to good memory locality.
In my opinion, memory locality is about storing data in contiguous memory. I think it's true that when data at address x is read, the CPU will load the data at address x+1 into the cache (based on some experiments I did in C years ago). So traversing an array will very likely result in cache hits, and we can say an array has good memory locality.
But when it comes to skip lists and balanced trees, neither is an array and neither stores data contiguously, so I would think their memory locality is equally poor. Could anyone explain this a little for me?
Maybe the guy meant that there is only one key per skip list node (in the default implementation), while a B-tree node holds N keys in a linear layout, so a bunch of B-tree keys can be loaded from a node into the cache at once.
You've said:
both aren't arrays and don't store data continuously
but we do: within a B-tree node, the keys are stored contiguously.
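A small sketch of the difference in node layout (hedged: these are generic illustrations, not Redis's actual zskiplist or any particular B-tree implementation):

```java
// One key per skip-list node: nodes are separate heap allocations, so following the
// level-0 links jumps to unrelated addresses and each hop is a likely cache miss.
class SkipListNode {
    long key;
    SkipListNode[] forward;   // one forward pointer per level
}

// Many keys per B-tree node: the keys sit next to each other in one array, so a single
// cache-line / page load brings in dozens of keys that can be compared without
// touching other memory.
class BTreeNode {
    long[] keys = new long[64];
    BTreeNode[] children = new BTreeNode[65];
    int numKeys;
}
```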
