What data structure do vectors in Clojure use? - data-structures

What data structure does Clojure use to implement its vector type?
I ask because they have some interesting complexity properties. It is cheap (O(log32(N))) to index in to them, and you can get a new copy with any item changed cheaply.
This would lead me to think that it is based on (really wide) tree, but that wouldn't explain why it is cheap to add to one end, but not the other. You also can't cheaply insert or delete elements in the middle of a vector.

Yes, they are wide trees. http://blog.higher-order.net/2009/02/01/understanding-clojures-persistentvector-implementation.html and http://hypirion.com/musings/understanding-persistent-vector-pt-1 are two article series describing in more detail how they work.

Related

A data structure with certain properties

I want to implement a data structure myself in C++11. What I'm planning to do is having a data structure with the following properties:
search. O(log(n))
insert. O(log(n))
delete. O(log(n))
iterate. O(n)
What I have been thinking about after research was implementing a balanced binary search tree. Are there other structures that would fulfill my needs? I am completely new to this topic and thought a question here would give me a good jumpstart.
First of all, using the existing standard library data types is definitely the way to go for production code. But since you are asking how to implement such data structures yourself, I assume this is mainly an educational exercise for you.
Binary search trees of some form (https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree#Implementations) or B-trees (https://en.wikipedia.org/wiki/B-tree) and hash tables (https://en.wikipedia.org/wiki/Hash_table) are definitely the data structures that are usually used to accomplish efficient insertion and lookup. If you want to go wild you can combine the two by using a tree instead of a linked list to handle hash collisions (although this has a good potential to actually make your implementation slower if you don't make massive mistakes in sizing your hash table or in choosing an adequate hash function).
Since I'm assuming you want to learn something, you might want to have a look at minimal perfect hashing in the context of hash tables (https://en.wikipedia.org/wiki/Perfect_hash_function) although this only has uses in special applications (I had the opportunity to use a perfect minimal hash function exactly once). But it sure is fascinating. As you can see from the link above, the botany of search trees is virtually limitless in scope so you can also go wild on that front.

How to implement an efficient excel-like app

I need to implement an efficient excel-like app.
I'm looking for a data structure that will:
Store the data in an efficient manner (for example - I don't want
to pre-allocate memory for unused cells).
Allow efficient update when the user changes a formula in one of the cells
Any ideas?
Thanks,
Li
In this case, you're looking for an online dictionary structure. This is a category of structures which allow you to associate one morsel of data (in this case, the coordinates that represent the cell) with another (in this case, the cell contents or formula). The "online" adjective means dictionary entries can be added, removed, or changed in real time.
There's many such structures. To name some more common ones: hash tables, binary trees, skip lists, linked lists, and even lists in arrays.
Of course, some of these are more efficient than others (depending on implementation and the number of entries). Typically I use hash tables for this sort of problem.
However, if you need to do range querying "modify all of the cells in this range", you may be better off with a binary tree or a more complicated spatial structure -- but not likely given the simple requirements of the problem.

Data structure for storing entities in game

I have a tiled map which is made of chunks and each chunk is a rectangle of tiles. Now I want to add entities to this(each entity stand on a specific tile), and every loop all the entities should call their update() function, so I wanted to ask for a suggestion: what data structure should I use for saving their locations?
I don't know yet what kind of methods I'll have, but I'll probably need a method that gets all the entities from a specific area(maybe a point) for drawing for example. This is a critic question because there maybe a huge map like 100x100 chunks where each chunk is 30x20 tiles so it will be 3000x2000 tiles, and lots of entities for example 1000, so if I'll save it in a list it will be very slow to search for an entities O(n) and if every entities make a search it will take O(n^2).
Right now I have a couple of solutions but they are all problematic:
kd-tree(for 2d) - since each loop all the entities can change their locations, the complexity of updating them will be the same as rebuilding the whole tree each loop O(nlogn).
each chunk will save the entities that belong to it - my best solution so far, easy updating, but the complexity is higher then in the kd-tree.
So does anybody have a suggestion for this problem?
A dictionary that maps tile (position) towards a list of all entities on that tile. All entities should have a position property and an event notifying when it changes, so that the dictionary can be updated at each movement.
(There should be no list for the tiles without entities. The list should be created when an entitiy moves to that position, and removed when the last entity leaves the position.)
This might be a crude suggestion, and I'm sure it can be improved, but here's a thought:
First, store your positions in such a way that you can access them in constant time given a specific object. For instance, if you want to access them directly through your entities, you can store the position structs in a list/vector and give each entity a pointer/reference to its position.
Second, store an entity pointer/reference or GUID in the same struct as the entity position, so you can identify an entity based on a position object. (There is probably a better way I'm not thinking of right now though.)
Third, utilize some of the principles of sweep and prune/sort and sweep (common in 3D games): Keep two sorted position lists/vectors, one sorted in the x direction and the other sorted in the y direction. One can hold the actual position objects, and the other can hold pointers/references. These lists can take advantage of temporal coherence, so the cost of keeping them sorted shouldn't be too high, unless there's a lot of fast and chaotic movement.
An advantage of this setup is that it's really easy to figure out where every object is relative to each other. Want to know how many objects are within 10 squares of Billy the Elf in either direction? Check Billy's position and iterate forward/backward through both lists until you reach an entity more than 10 squares away in each direction.
If you're interested in the concept, look up sort and sweep (also known as sweep and prune). You'd only be using the first half of the algorithm, but it's used for broad-phase collision detection in practically every major 3D physics engine, so you know that it has to be fast in general. ;) There's a lot of information floating around about it, so you'll probably find much more sophisticated implementation ideas floating around too. (For instance, I don't like the indirection involved in storing a sorted list of pointers/references to position structs; working with the actual structs is more cache-efficient, but then you need to update the position in two places if you want to exploit temporal coherency with persistent arrays. Someone else may have thought of a more clever design that's escaping me right now.)
EDIT: I'd comment on Erik H's idea, but my rep isn't high enough. I just wanted to say that his idea sounds very well suited to your game, especially if you will have a lot of entities tightly packed on the same tile or in a small neighborhood. If I were you, I'd probably try it before the sweep and prune idea. However, it should be accompanied with a well-planned memory management strategy: If you have a dictionary of tile locations that naively map to vectors of entities, you're going to have a lot of memory being allocated and freed when entities move from one tile to another. Instead, you'll want to implement his idea as something more like a dictionary/linked list combo:
The dictionary keys would be tile positions, and the dictionary would return a single pointer to a linked list node. This node would be part of a linked list of all entities on the same tile. Whenever an entity moves from one tile to another, it will be removed from its current linked list and added to the new one. If an entity moves to an empty tile, it will be in a linked list all on its own, and it should be added to the dictionary. When the last entity moves from a tile, the entry for that tile should be removed from the dictionary. This will allow you to move around entities without continual dynamic allocation/deallocation, since you're just updating pointers (and the dictionary will probably be pretty memory efficient).
Note that you don't have to store full-blown entities in the linked lists, either; you can easily create your linked list out of lightweight objects (containing a pointer or GUID to the actual entity).

Iterable O(1) insert and random delete collection

I am looking to implement my own collection class. The characteristics I want are:
Iterable - order is not important
Insertion - either at end or at iterator location, it does not matter
Random Deletion - this is the tricky one. I want to be able to have a reference to a piece of data which is guaranteed to be within the list, and remove it from the list in O(1) time.
I plan on the container only holding custom classes, so I was thinking a doubly linked list that required the components to implement a simple interface (or abstract class).
Here is where I am getting stuck. I am wondering whether it would be better practice to simply have the items in the list hold a reference to their node, or to build the node right into them. I feel like both would be fairly simple, but I am worried about coupling these nodes into a bunch of classes.
I am wondering if anyone has an idea as to how to minimize the coupling, or possibly know of another data structure that has the characteristics I want.
It'd be hard to beat a hash map.
Take a look at tries.
Apparently they can beat hashtables:
Unlike most other algorithms, tries have the peculiar feature that the time to insert, or to delete or to find is almost identical because the code paths followed for each are almost identical. As a result, for situations where code is inserting, deleting and finding in equal measure tries can handily beat binary search trees or even hash tables, as well as being better for the CPU's instruction and branch caches.
It may or may not fit your usage, but if it does, it's likely one of the best options possible.
In C++, this sounds like the perfect fit for std::unordered_set (that's std::tr1::unordered_set or boost::unordered_set to you if you have an older compiler). It's implemented as a hash set, which has the characteristics you describe.
Here's the interface documentation. Note that the hash containers actually offer two sets of iterators, the usual ones and local ones which only go through one bucket.
Many other languages have "hash sets" as well, certainly Java and C#.

How to subdivide a 2d game world for better collision detection

I'm developing a game which features a sizeable square 2d playing area. The gaming area is tileless with bounded sides (no wrapping around). I am trying to figure out how I can best divide up this world to increase the performance of collision detection. Rather than checking each entity for collision with all other entities I want to only check nearby entities for collision and obstacle avoidance.
I have a few special concerns for this game world...
I want to be able to be able to use a large number of entities in the game world at once. However, a % of entities won't collide with entities of the same type. For example projectiles won't collide with other projectiles.
I want to be able to use a large range of entity sizes. I want there to be a very large size difference between the smallest entities and the largest.
There are very few static or non-moving entities in the game world.
I'm interested in using something similar to what's described in the answer here: Quadtree vs Red-Black tree for a game in C++?
My concern is how well will a tree subdivision of the world be able to handle large size differences in entities? To divide the world up enough for the smaller entities the larger ones will need to occupy a large number of regions and I'm concerned about how that will affect the performance of the system.
My other major concern is how to properly keep the list of occupied areas up to date. Since there's a lot of moving entities, and some very large ones, it seems like dividing the world up will create a significant amount of overhead for keeping track of which entities occupy which regions.
I'm mostly looking for any good algorithms or ideas that will help reduce the number collision detection and obstacle avoidance calculations.
If I were you I'd start off by implementing a simple BSP (binary space partition) tree. Since you are working in 2D, bound box checks are really fast. You basically need three classes: CBspTree, CBspNode and CBspCut (not really needed)
CBspTree has one root node instance of class CBspNode
CBspNode has an instance of CBspCut
CBspCut symbolize how you cut a set in two disjoint sets. This can neatly be solved by introducing polymorphism (e.g. CBspCutX or CBspCutY or some other cutting line). CBspCut also has two CBspNode
The interface towards the divided world will be through the tree class and it can be a really good idea to create one more layer on top of that, in case you would like to replace the BSP solution with e.g. a quad tree. Once you're getting the hang of it. But in my experience, a BSP will do just fine.
There are different strategies of how to store your items in the tree. What I mean by that is that you can choose to have e.g. some kind of container in each node that contains references to the objects occuping that area. This means though (as you are asking yourself) that large items will occupy many leaves, i.e. there will be many references to large objects and very small items will show up at single leaves.
In my experience this doesn't have that large impact. Of course it matters, but you'd have to do some testing to check if it's really an issue or not. You would be able to get around this by simply leaving those items at branched nodes in the tree, i.e. you will not store them on "leaf level". This means you will find those objects quick while traversing down the tree.
When it comes to your first question. If you only are going to use this subdivision for collision testing and nothing else, I suggest that things that can never collide never are inserted into the tree. A missile for example as you say, can't collide with another missile. Which would mean that you dont even have to store the missile in the tree.
However, you might want to use the bsp for other things as well, you didn't specify that but keep that in mind (for picking objects with e.g. the mouse). Otherwise I propose that you store everything in the bsp, and resolve the collision later on. Just ask the bsp of a list of objects in a certain area to get a limited set of possible collision candidates and perform the check after that (assuming objects know what they can collide with, or some other external mechanism).
If you want to speed up things, you also need to take care of merge and split, i.e. when things are removed from the tree, a lot of nodes will become empty or the number of items below some node level will decrease below some merge threshold. Then you want to merge two subtrees into one node containing all items. Splitting happens when you insert items into the world. So when the number of items exceed some splitting threshold you introduce a new cut, which splits the world in two. These merge and split thresholds should be two constants that you can use to tune the efficiency of the tree.
Merge and split are mainly used to keep the tree balanced and to make sure that it works as efficient as it can according to its specifications. This is really what you need to worry about. Moving things from one location and thus updating the tree is imo fast. But when it comes to merging and splitting it might become expensive if you do it too often.
This can be avoided by introducing some kind of lazy merge and split system, i.e. you have some kind of dirty flagging or modify count. Batch up all operations that can be batched, i.e. moving 10 objects and inserting 5 might be one batch. Once that batch of operations is finished, you check if the tree is dirty and then you do the needed merge and/or split operations.
Post some comments if you want me to explain further.
Cheers !
Edit
There are many things that can be optimized in the tree. But as you know, premature optimization is the root to all evil. So start off simple. For example, you might create some generic callback system that you can use while traversing the tree. This way you dont have to query the tree to get a list of objects that matched the bound box "question", instead you can just traverse down the tree and execute that call back each time you hit something. "If this bound box I'm providing intersects you, then execute this callback with these parameters"
You most definitely want to check this list of collision detection resources from gamedev.net out. It's full of resources with game development conventions.
For other than collision detection only, check their entire list of articles and resources.
My concern is how well will a tree
subdivision of the world be able to
handle large size differences in
entities? To divide the world up
enough for the smaller entities the
larger ones will need to occupy a
large number of regions and I'm
concerned about how that will affect
the performance of the system.
Use a quad tree. For objects that exist in multiple areas you have a few options:
Store the object in both branches, all the way down. Everything ends up in leaf nodes but you may end up with a significant number of extra pointers. May be appropriate for static things.
Split the object on the zone border and insert each part in their respective locations. Creates a lot of pain and isn't well defined for a lot of objects.
Store the object at the lowest point in the tree you can. Sets of objects now exist in leaf and non-leaf nodes, but each object has one pointer to it in the tree. Probably best for objects that are going to move.
By the way, the reason you're using a quad tree is because it's really really easy to work with. You don't have any heuristic based creation like you might with some BSP implementations. It's simple and it gets the job done.
My other major concern is how to
properly keep the list of occupied
areas up to date. Since there's a lot
of moving entities, and some very
large ones, it seems like dividing the
world up will create a significant
amount of overhead for keeping track
of which entities occupy which
regions.
There will be overhead to keeping your entities in the correct spots in the tree every time they move, yes, and it can be significant. But the whole point is that you're doing much much less work in your collision code. Even though you're adding some overhead with the tree traversal and update it should be much smaller than the overhead you just removed by using the tree at all.
Obviously depending on the number of objects, size of game world, etc etc the trade off might not be worth it. Usually it turns out to be a win, but it's hard to know without doing it.
There are lots of approaches. I'd recommend settings some specific goals (e.g., x collision tests per second with a ratio of y between smallest to largest entities), and do some prototyping to find the simplest approach that achieves those goals. You might be surprised how little work you have to do to get what you need. (Or it might be a ton of work, depending on your particulars.)
Many acceleration structures (e.g., a good BSP) can take a while to set up and thus are generally inappropriate for rapid animation.
There's a lot of literature out there on this topic, so spend some time searching and researching to come up with a list candidate approaches. Mock them up and profile.
I'd be tempted just to overlay a coarse grid over the play area to form a 2D hash. If the grid is at least the size of the largest entity then you only ever have 9 grid squares to check for collisions and it's a lot simpler than managing quad-trees or arbitrary BSP trees. The overhead of determining which coarse grid square you're in is typically just 2 arithmetic operations and when a change is detected the grid just has to remove one reference/ID/pointer from one square's list and add the same to another square.
Further gains can be had from keeping the projectiles out of the grid/tree/etc lookup system - since you can quickly determine where the projectile would be in the grid, you know which grid squares to query for potential collidees. If you check collisions against the environment for each projectile in turn, there's no need for the other entities to then check for collisions against the projectiles in reverse.

Resources