How to compress pointers? E.g. arbitrary-bit pointers - algorithm

I'm coding a complex tree data structure that stores lots of pointers. The pointers themselves occupy a lot of space, and that is the space I'm hoping to save.
So I'm here to ask whether there are examples of this. E.g.: for a 64-bit data type, can I use a 32-bit or smaller pointer if the data it points to is guaranteed to be contiguous?
I found a paper called Transparent Pointer Compression for Linked Data Structures, but I thought there could be a much simpler solution.
Update:
It is an octree. A paper about doing this on the GPU is GigaVoxels: A Voxel-Based Rendering Pipeline For Efficient Exploration Of Large And Detailed Scenes; they use 15-bit pointers on the GPU.

Instead of using pointers, use an index into an array. The index can be a short if the array holds fewer than 65536 elements, or an int32_t if it holds fewer than 2147483648.
An arbitrary pointer can really be anywhere in memory, so there's no way to shorten it by more than a couple of bits.
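For illustration, here is a minimal sketch (all names hypothetical) of a tree node that stores 32-bit indices into a shared node pool instead of 64-bit pointers:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: 32-bit indices into a shared pool replace
// 64-bit child pointers. Index 0 is reserved as the "null" value.
struct Node {
    int32_t value = 0;
    uint32_t left = 0;   // index into the pool, 0 = no child
    uint32_t right = 0;
};

struct Tree {
    std::vector<Node> pool{Node{}};  // slot 0 is the null sentinel

    // Append a node and return its index.
    uint32_t add(int32_t v) {
        pool.push_back(Node{v, 0, 0});
        return static_cast<uint32_t>(pool.size() - 1);
    }

    Node& at(uint32_t i) { return pool[i]; }
};
```

Each link then costs 4 bytes instead of 8, and index 0 doubles as the null value.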

If the pointers themselves take up a lot of space:
keep an array of pointers, and replace each pointer with its index in that array. That adds just one extra indirection. With fewer than 64k pointers, a short index suffices.
Simple implementation:
#include <assert.h>

#define MAX_PTR 60000
void *aptr[MAX_PTR];      /* pointer table */
unsigned short nb = 0;    /* next free slot */

unsigned short ptr2index(void *ptr) {
    assert(nb < MAX_PTR); /* table is full otherwise */
    aptr[nb] = ptr;
    return nb++;
}

void *index2ptr(unsigned short index) {
    return aptr[index];
}

Usage:
unsigned short next;      /* in Class */
...
Class *c = new Class();
mystruct->next = ptr2index((void *)c);
...
Class *x = (Class *)index2ptr(otherstruct->next);

One option is to write a custom allocator to allocate big blocks of contiguous memory, and then store your nodes contiguously in there. Each of your nodes can then be referenced by a simple index that can be mapped back to memory using simple pointer arithmetic (e.g.: node_ptr = mem_block_ptr + node_index).
Soon you realise that having multiple of these memory blocks means that you no longer know in which of them a specific node resides. This is where partitioning comes into play. You can opt for horizontal and/or vertical partitioning. Both considerably increase the level of complexity, and both have pros and cons (see [1] and [2]).
The key thing here is to ensure that the data is split up in a predictable manner.
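A minimal sketch of such an allocator, assuming fixed-size blocks (all names hypothetical): the high part of an index selects the block and the low part selects the slot, so the mapping back to memory stays predictable:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pool allocator sketch: nodes live in fixed-size blocks
// and are addressed by a 32-bit index. The split is predictable:
//   block = index / BLOCK_SIZE,  slot = index % BLOCK_SIZE
template <typename Node>
class NodePool {
    static constexpr uint32_t BLOCK_SIZE = 1024;  // nodes per block
    std::vector<std::vector<Node>> blocks_;

public:
    uint32_t allocate(const Node& n) {
        if (blocks_.empty() || blocks_.back().size() == BLOCK_SIZE) {
            blocks_.emplace_back();
            blocks_.back().reserve(BLOCK_SIZE);  // keep node addresses stable
        }
        blocks_.back().push_back(n);
        return static_cast<uint32_t>(
            (blocks_.size() - 1) * BLOCK_SIZE + blocks_.back().size() - 1);
    }

    Node& get(uint32_t index) {
        return blocks_[index / BLOCK_SIZE][index % BLOCK_SIZE];
    }
};
```

Because indices (unlike raw pointers) survive the allocation of new blocks, nodes can keep referring to each other by index no matter how many blocks exist.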
References:
Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes
37signals - Mr. Moore gets to punt on sharding

In some cases, you could simply use an array to hold the nodes. A binary tree node at arr[i] would have children from arr[(i*2)+1] to arr[(i+1)*2]. Its parent would be at arr[(i-1)/2], if i != 0. And to figure the real pointer address, of course, you could say &arr[i]. It's actually a rather common implementation for trees that are full by specification, like the tree used for a heap.
In order for a node to know for itself how to find its children, though, you'd likely need either an index or a pointer to the container. (And even then, with only one of the two pieces, you're having to do a bit of hoop-jumping; you really need both pieces in order to do things easily. But having to calculate stuff rather than remembering it, is kinda the price you pay when you're trying not to remember much.) In order to keep stuff reasonably space-efficient, you'd have to dumb down the nodes; make them basically structs, or even just the values, and let the tree class do all the node-finding stuff. It'd just hand out pointers to nodes, and that pointer would be all the container needs to figure out the node's index (and, thus, where its children will be). You'd also have to pass both a tree pointer and a node pointer to any function that wants to traverse the tree.
Note, though, this won't save much space unless your trees are consistently near full (that is, unless most/all of your leaf nodes sit at the bottom level). Every array slot reserved for an absent node is wasted, so a leaf that appears high in the tree wastes the slots of its entire missing subtree.
If you can't count on the tree being full, or on the nodes living in some restricted space, then there's not a whole lot to optimize here. The whole point of a tree is that nodes link to their children; you can fake that with an array, but finding a node's children has to stay cheap for the tree to be worthwhile.
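The index arithmetic above can be written down directly (a heap-style layout; these helper names are just an illustration):

```cpp
#include <cstddef>

// Heap-style layout for a full binary tree stored in an array:
// the children of arr[i] are arr[2*i + 1] and arr[2*i + 2],
// and the parent of arr[i] (for i != 0) is arr[(i - 1) / 2].
inline std::size_t left_child(std::size_t i)  { return 2 * i + 1; }
inline std::size_t right_child(std::size_t i) { return 2 * i + 2; }
inline std::size_t parent(std::size_t i)      { return (i - 1) / 2; }
```

This is exactly the layout used by binary heaps, which is why heaps almost never store child pointers at all.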

A very simple way of dealing with your issue is simply to use fewer pointers (sounds silly, right?).
Compare the two following approaches:
template <typename T>
struct OctreeNaiveNode {
    T value;
    Point center;
    OctreeNaiveNode* parent;
    std::unique_ptr<OctreeNaiveNode> children[8];
}; // struct OctreeNaiveNode

// sizeof(OctreeNaiveNode) >= sizeof(T) + sizeof(Point) + 9 * sizeof(void*)

template <typename T>
struct OctreeNode {
    T value;
    Point center;
    std::unique_ptr<OctreeNode[]> children; // allocate 8 only when necessary
}; // struct OctreeNode

// sizeof(OctreeNode) >= sizeof(T) + sizeof(Point) + sizeof(void*)
How does it work:
A parent pointer is only necessary for simple iterators; if you have far fewer iterators than nodes, it is more economical to have deep iterators, i.e. iterators that keep a stack of parents up to the root. Note that in an RB-tree this does not work so well (because of rebalancing), but in an octree it should work better because the partitioning is fixed.
A single children pointer: instead of having an array of pointers to children, keep a pointer to an array of children. Not only does it mean 1 dynamic allocation instead of 8 (less heap fragmentation/overhead), it also means 1 pointer instead of 8 within your node.
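A rough sketch of how the lazy children array might be managed (the subdivide helper and the minimal Point are assumptions, not from the original):

```cpp
#include <memory>

// Minimal stand-in for the Point type used above.
struct Point { float x, y, z; };

// Sketch of the compact node: children are allocated lazily as one
// block of 8, not as 8 separate allocations.
template <typename T>
struct OctreeNode {
    T value{};
    Point center{};
    std::unique_ptr<OctreeNode[]> children;  // null until subdivided

    // Hypothetical helper: allocate all 8 children in one shot.
    void subdivide() {
        if (!children)
            children = std::make_unique<OctreeNode[]>(8);
    }

    bool is_leaf() const { return !children; }
};
```

A null children pointer marks a leaf, so no separate flag is needed.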
Overhead:
Point = std::tuple<float,float,float> => sizeof(T) + sizeof(Point) >= 64 => +100%
Point = std::tuple<double,double,double> => sizeof(T) + sizeof(Point) >= 256 => +25%
So, rather than delving into pointer-compression strategies, I advise you to simply rework your data structures to have fewer pointers/indirections in the first place.


Stack overflow solution with O(n) runtime

I have a problem related to runtime for push and pop in a stack.
Here, I implemented a stack using array.
I want to avoid overflow in the stack when I insert a new element to a full stack, so when the stack is full I do the following (Pseudo-Code):
(I consider a stack as an array)
Generate a new array with the size of double of the origin array.
Copy all the elements in the origin stack to the new array in the same order.
Now, I know that a single push onto a stack of size n executes in O(n) in the worst case.
I want to show that the runtime of n pushes onto an empty stack is also O(n) in the worst case.
Also, how can I change this algorithm so that every push executes in constant time in the worst case?
Amortized constant time is often just as good in practice as, if not better than, strict constant-time alternatives.
Generate a new array with the size of double of the origin array.
Copy all the elements in the origin stack to the new array in the same order.
This is actually a very decent and respectable solution for a stack implementation because it has good locality of reference and the cost of reallocating and copying is amortized to the point of being almost negligible. Most generalized solutions to "growable arrays" like ArrayList in Java or std::vector in C++ rely on this type of solution, though they might not exactly double in size (lots of std::vector implementations increase their size by something closer to 1.5 than 2.0).
One of the reasons this is much better than it sounds is because our hardware is super fast at copying bits and bytes sequentially. After all, we often rely on millions of pixels being blitted many times a second in our daily software. That's a copying operation from one image to another (or frame buffer). If the data is contiguous and just sequentially processed, our hardware can do it very quickly.
Also how can I update this algorithm that for every push the operation
will execute in a constant runtime in the worst case?
I have come up with stack solutions in C++ that are ever-so-slightly faster than std::vector for pushing and popping a hundred million elements and meet your requirements, but only for pushing and popping in a LIFO pattern. We're talking about something like 0.22 secs for vector as opposed to 0.19 secs for my stack. That relies on allocating linked blocks, each typically holding far more than a handful of elements.
Each block stores a contiguous array of data, but when it fills up, it links to a next block. The blocks are linked (each storing only a previous link), and each might store, say, 512 bytes worth of data with 64-byte alignment. That allows constant-time pushes and pops without the need to reallocate/copy. When a block fills up, it just links a new block to the previous block and starts filling that up. When you pop, you just pop until the block becomes empty; once it's empty, you traverse its previous link to get to the block before it and start popping from that (you can also free the now-empty block at this point).
Here's your basic pseudo-C++ example of the data structure:
template <class T>
struct UnrolledNode
{
    // Points to the previous block. We use this to get
    // back to a former block when this one becomes empty.
    UnrolledNode* prev;

    // Stores the number of elements in the block. If
    // this becomes full with, say, 256 elements, we
    // allocate a new block and link it to this one.
    // If this reaches zero, we deallocate this block
    // and begin popping from the previous block.
    size_t num;

    // Array of the elements. This has a fixed capacity,
    // say 256 elements, though determined at runtime
    // based on sizeof(T). The structure is a VLS to
    // allow node and data to be allocated together.
    T data[];
};

template <class T>
struct UnrolledStack
{
    // Stores the tail end of the list (the last
    // block we're going to be using to push to and
    // pop from).
    UnrolledNode<T>* tail;
};
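A simplified sketch of the push/pop logic described above (using a fixed-size member array instead of the VLS, and hypothetical names):

```cpp
#include <cstddef>

// Simplified unrolled stack: fixed-capacity blocks linked by a
// 'prev' pointer. Push allocates a new block when the tail is full;
// pop frees the tail block once it becomes empty.
template <class T, std::size_t BLOCK_CAP = 256>
struct BlockStack {
    struct Block {
        Block* prev = nullptr;
        std::size_t num = 0;
        T data[BLOCK_CAP];
    };
    Block* tail = nullptr;

    void push(const T& value) {
        if (!tail || tail->num == BLOCK_CAP) {
            Block* b = new Block;   // constant-time: no copying of old data
            b->prev = tail;
            tail = b;
        }
        tail->data[tail->num++] = value;
    }

    T pop() {
        T value = tail->data[--tail->num];
        if (tail->num == 0) {       // block drained: free it
            Block* dead = tail;
            tail = dead->prev;
            delete dead;
        }
        return value;
    }

    bool empty() const { return tail == nullptr; }

    ~BlockStack() {
        while (tail) { Block* d = tail; tail = d->prev; delete d; }
    }
};
```

Every push and pop touches at most one block allocation or deallocation, which is the source of the worst-case constant time.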
That said, I actually recommend your solution instead for performance since mine barely has a performance edge over the simple reallocate and copy solutions and yours would have a slight edge when it comes to traversal since it can just traverse the array in a straightforward sequential fashion (as well as straightforward random-access if you need it). I didn't actually implement mine for performance reasons. I implemented it to prevent pointers from being invalidated when you push things to the container (the actual thing is a memory allocator in C) and, again, in spite of achieving true constant-time push backs and pop backs, it's still barely any faster than the amortized constant-time solution involving reallocation and memory copying.

List storing pointers or "plain object"

I am designing a class which tracks the user's manipulations in a piece of software in order to restore previous application states (i.e. Ctrl+Z/Ctrl+Y). I simply wanted to clarify something about performance.
I am using the std::list container of the STL. This list is not meant to contain really huge objects, but a significant number. Should I use pointers or not?
For instance, here is the kinds of objects which will be stored:
struct ImagesState
{
cv::Mat first;
cv::Mat second;
};
struct StatusBarState
{
std::string notification;
std::string algorithm;
};
For now, I store the whole thing under the form of struct pointers, such as:
std::list<ImagesState*> stereoImages;
I know (I think) that new and delete operators are time consuming, but I don't want to encounter a stack overflow with "plain object". Is it a bad design?
If you are using a list, I would suggest not using pointers. The list items live on the heap anyway, and the pointer just adds an unnecessary layer of indirection.
If you are after performance, using std::list is most likely not the best solution. Using std::vector might boost your performance significantly, since contiguous objects are much friendlier to your caches.
Even in a vector, the objects lie on the heap, so the pointers are not needed (they would hurt you even more than with a list). You only have to worry about stack overflow if you create an array on the stack,
like so:
Type arrayName[REALLY_HUGE_NUMBER];

What is the fastest way to Initialize a priority_queue from an unordered_set

It is said in construction of a priority queue , option (12):
template< class InputIt >
priority_queue( InputIt first, InputIt last,
const Compare& compare = Compare(),
Container&& cont = Container() );
But I don't know how to use this.
I have a non-empty std::unordered_set<std::shared_ptr<MyStruct>> mySet, and I want to convert it to a priority queue. I also create a comparator struct MyComparator:
struct MyComparator {
    bool operator()(const std::shared_ptr<MyStruct>& a,
                    const std::shared_ptr<MyStruct>& b) {...}
};
Now how can I construct a new priority_queue myQueue in a better way? I used the following and it works:
std::priority_queue<std::shared_ptr<MyStruct>, std::deque<std::shared_ptr<MyStruct>>, MyComparator>
    myQueue(mySet.begin(), mySet.end());
I benchmarked both vector and deque, and I find deque will outperform vector when the size is relatively large (~30K).
Since we have already known the size of mySet, I should create the deque with that size. But how can I create this priority_queue with my own comparator and predefined deque, say myDeque?
Since you have already determined that std::deque gives you better performance than std::vector, I don't think there is much more you can do in terms of how you construct the priority_queue. As you have probably seen, there is no std::deque::reserve() method, so it's simply not possible to create a deque with memory allocated ahead of time. For most use cases this is not a problem, because the main feature of deque vs vector is that deque does not need to copy elements as new ones are inserted.
If you are still not achieving the performance you desire, you might consider either storing raw pointers (keeping your smart pointers alive outside), or simply changing your unordered_set to a regular set and relying on the ordering that container provides.
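For what it's worth, the constructor from the question (option 12) can also be used with a pre-built deque; here is a sketch (MyStruct's contents and the comparator body are stand-ins):

```cpp
#include <deque>
#include <memory>
#include <queue>
#include <unordered_set>

struct MyStruct { int priority; };  // stand-in payload

struct MyComparator {
    bool operator()(const std::shared_ptr<MyStruct>& a,
                    const std::shared_ptr<MyStruct>& b) const {
        return a->priority < b->priority;  // max-heap on priority
    }
};

using Ptr = std::shared_ptr<MyStruct>;
using Queue = std::priority_queue<Ptr, std::deque<Ptr>, MyComparator>;

Queue makeQueue(const std::unordered_set<Ptr>& mySet) {
    // Build the deque up front, then move it into the queue;
    // the constructor heapifies the moved-in container in O(n).
    std::deque<Ptr> myDeque(mySet.begin(), mySet.end());
    return Queue(MyComparator{}, std::move(myDeque));
}
```

Moving the container avoids a second copy of all the shared_ptrs, which is about the only saving left once the container type is fixed.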

Performance of std::vector<Test> vs std::vector<Test*>

In an std::vector of a non POD data type, is there a difference between a vector of objects and a vector of (smart) pointers to objects? I mean a difference in the implementation of these data structures by the compiler.
E.g.:
class Test {
std::string s;
Test *other;
};
std::vector<Test> vt;
std::vector<Test*> vpt;
Could be there no performance difference between vt and vpt?
In other words: when I define a vector<Test>, internally will the compiler create a vector<Test*> anyway?
In other words: when I define a vector<Test>, internally will the compiler create a vector<Test*> anyway?
No, this is not allowed by the C++ standard. The following code is legal C++:
vector<Test> vt;
Test t1; t1.s = "1"; t1.other = NULL;
Test t2; t2.s = "2"; t2.other = NULL;
vt.push_back(t1);
vt.push_back(t2);
Test* pt = &vt[0];
pt++;
Test q = *pt; // q now equal to Test(2)
In other words, a vector "decays" to an array (accessing it like a C array is legal), so the compiler effectively has to store the elements internally as an array, and may not just store pointers.
But beware that the array pointer is valid only as long as the vector is not reallocated (which normally only happens when the size grows beyond capacity).
In general, whatever the type being stored in the vector is, instances of that may be copied. This means that if you are storing a std::string, instances of std::string will be copied.
For example, when you push a Type into a vector, the Type instance is copied into an instance housed inside the vector. Copying a pointer is cheap but, as Konrad Rudolph pointed out in the comments, that should not be the only thing you consider.
For simple objects like your Test, copying is going to be so fast that it will not matter.
Additionally, with C++11, moving allows avoiding creating an extra copy if one is not necessary.
So in short: A pointer will be copied faster, but copying is not the only thing that matters. I would worry about maintainable, logical code first and performance when it becomes a problem (or the situation calls for it).
As for your question about an internal pointer vector: no, vectors are implemented as arrays that are reallocated when necessary. You can find GNU's libstdc++ implementation of vector online.
The answer gets a lot more complicated at a lower than C++ level. Pointers will of course have to be involved since an entire program cannot fit into registers. I don't know enough about that low of level to elaborate more though.

Use cases of std::multimap

I don't quite get the purpose of this data structure. What's the difference between std::multimap<K, V> and std::map<K, std::vector<V>>? The same goes for std::multiset: it could just be std::map<K, int>, where the int counts the number of occurrences of K. Am I missing something about the uses of these structures?
A counter-example seems to be in order.
Consider a PhoneEntry in an AddressList grouped by name.
struct AddressListCompare {
    bool operator()(const PhoneEntry& p1, const PhoneEntry& p2) const {
        return p1.name < p2.name;
    }
};
multiset<PhoneEntry, AddressListCompare> addressList;
addressList.insert(PhoneEntry("Cpt.G", "123-456", "Cellular"));
addressList.insert(PhoneEntry("Cpt.G", "234-567", "Work"));
// Getting the entries
addressList.equal_range(PhoneEntry("Cpt.G")); // All numbers
This would not be feasible with a set+count. Your Object+count approach seems faster if this behavior is not required. For instance, the multiset::count() member is documented as "Complexity: logarithmic in size + linear in count."
You could make the substitutions that you suggest and extract similar behavior. But the interfaces would be very different from those of the regular standard containers. A major design theme of these containers is that they share as much interface as possible, making them as interchangeable as possible, so that the appropriate container can be chosen without having to change the code that uses it.
For instance, std::map<K, std::vector<V>> would have iterators that dereference to std::pair<K, std::vector<V>> instead of std::pair<K, V>. std::map<K, std::vector<V>>::count() wouldn't return the correct result, failing to account for the duplicates in the vector. Of course you could change your code to do the extra steps needed to correct for this, but now you are interfacing with the container in a much different way. You can't later drop in unordered_map or some other map implementation to see if it performs better.
In a broader sense, you are breaking the container abstraction by handling container implementation details in your code, rather than having a container that handles its own business.
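A small illustration of that interface difference (the data and helper names here are hypothetical):

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// With multimap, duplicates are first-class: count() sees every
// (key, value) pair. With map<K, vector<V>>, count() only sees
// distinct keys, and you must dig into the vectors yourself.
std::size_t entries_multimap(const std::multimap<std::string, int>& m,
                             const std::string& key) {
    return m.count(key);
}

std::size_t entries_mapvec(const std::map<std::string, std::vector<int>>& m,
                           const std::string& key) {
    auto it = m.find(key);
    return it == m.end() ? 0 : it->second.size();
}
```

The two functions give the same answer, but only the multimap version is the container's own interface; the map-of-vectors version is extra code you must write and maintain at every call site.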
It's entirely possible that your compiler's implementation of std::multimap is really just a wrapper around std::map<K, std::vector<V>>. Or it might not be. It could be more efficient and friendly to object pool allocation (which vectors are not).
Using std::map<K, int> instead of std::multiset is the same case: count() would not return the expected value, iterators would not iterate over the duplicates, and they would dereference to std::pair<K, int> instead of directly to K.
A multimap or multiset allows you to have elements with duplicate keys.
i.e. a set is an unordered collection of unique elements, so {A,B,C} == {B,C,A}.
