How can we use a contiguous block of memory in such a way that one part of it links to the rest? For example, if I allocate a contiguous block of bytes using malloc, I want to structure it so that the initial part of the block holds pointers that point into the remaining part. That means the pointers and the objects they point to should be contiguous.
The question doesn't make much sense to me. Let's assume you wanted nItems of the size sizeBytes (meaning they are all the same size), you would not need to store the pointers, because you can compute the offsets to the allocated memory whenever you needed it. So you are probably missing some criteria in your question. Here is how you'd do that:
void *block = malloc(nItems * sizeBytes);
Then to reach the n-th object, you'd simply do:
void *myMemory = (char *)block + n * sizeBytes; // cast first: arithmetic on void * is not standard C
You'd want to possibly do some bounds checking there...
But that's too easy, so I'm guessing you really have structures of different sizes that you want to allocate in a single malloc and get access to. So it's not just a question of figuring out what the address of a "sub block of memory" is, but you'd want to know how to cast it so that you can later make sense of the object (assuming it's a C structure). So I guess I'd have to say I remain confused by the question overall.
You'll probably want / need something like the pointer, the size, and the type of structure each "sub block" of memory is supposed to be. This would then indicate what your header information should probably look like. Roughly, you'd want to compute the storage required for your 'meta data' and then the 'payload data', and malloc those things together.
But it's not a trivial thing to implement, because you have to figure out how you tell your function that allocates / initializes the memory block what the mix of objects is going to be (and the sequence of the layout of each sub-object).
I'm afraid this question is woefully underspecified.
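As a rough illustration of the "meta data plus payload in one malloc" idea, here is a minimal sketch in C. It assumes the only header information needed is one pointer per sub-block and that the caller passes the size of each sub-block. The names alloc_linked_block, align_unit and round_up are hypothetical, not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A type whose size is a safe alignment unit for ordinary objects. */
typedef union { long long ll; long double ld; void *p; } align_unit;

/* Round n up to a multiple of sizeof(align_unit). */
static size_t round_up(size_t n) {
    size_t a = sizeof(align_unit);
    return (n + a - 1) / a * a;
}

/*
 * Allocate ONE block laid out as:
 *   [ptr 0 .. ptr count-1][padding][item 0][item 1] ...
 * Returns the pointer table; table[i] points at item i inside the
 * same block. A single free(table) releases everything.
 * Returns NULL if the allocation fails.
 */
void **alloc_linked_block(const size_t *sizes, size_t count) {
    assert(sizes != NULL || count == 0);

    size_t header = round_up(count * sizeof(void *));
    size_t total = header;
    for (size_t i = 0; i < count; ++i)
        total += round_up(sizes[i]);

    void **table = malloc(total);
    if (table == NULL)
        return NULL;

    char *cursor = (char *)table + header;   /* first payload byte */
    for (size_t i = 0; i < count; ++i) {
        table[i] = cursor;                   /* header entry i -> item i */
        cursor += round_up(sizes[i]);
    }
    return table;
}
```

The caller still has to remember which type lives in which slot, which is exactly the extra header information discussed above.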
If you want a 2D array of one type of object, you can do it like this:
int entries = xSize * ySize; // create a 2D array of xSize by ySize dimensions
size_t buffSize = entries * objectSize; // objectSize is number of bytes for your object
void *block = malloc(buffSize);
Now to access any entry in your 2D array:
void *thingie = (char *)block + (y * xSize + x) * objectSize; // scale the element index by the object size
Now thingie points to block that corresponds to x, y. If you wanted to, you can also change the layout of your memory object. Above I did row-major. You could do:
void *thing = (char *)block + (x * ySize + y) * objectSize;
That would be column major. The above can extend to n-dimensions:
int entries = xSize * ySize * zSize; // create a 3D array of xSize, ySize, zSize dimensions
size_t buffSize = entries * objectSize; // objectSize is number of bytes for your object
void *block = malloc(buffSize);
And then:
void *thingie = (char *)block + (z * ySize * xSize + y * xSize + x) * objectSize;
to get to your record in the 3D cube. You can take this to any number of dimensions, but you'll run out of memory sooner rather than later if you're dealing with large objects in high-dimensional spaces.
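The 3D address computation can be wrapped in a small helper that also does the bounds check. This is only a sketch: cell_at is a hypothetical name, and it assumes the same layout as above, with x varying fastest.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the address of element (x, y, z) in a flat block holding
 * xSize * ySize * zSize objects of objectSize bytes each.
 */
void *cell_at(void *block, size_t objectSize,
              size_t xSize, size_t ySize, size_t zSize,
              size_t x, size_t y, size_t z) {
    assert(x < xSize && y < ySize && z < zSize);   /* bounds check */
    size_t index = (z * ySize + y) * xSize + x;    /* element index */
    return (char *)block + index * objectSize;     /* byte offset */
}
```

The element index is computed first and only then multiplied by objectSize, so the same helper works for objects of any size.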
I need to use and recycle IDs (ints) within a range, say from 1 to 20 million.
What is the most efficient way to do this?
Some things I tried:
Generate a running sequence of numbers from 1 to k and store them in a map. After k numbers, if, say, ID 2 becomes free, we delete it from the map and continue with k+1 as the next ID. (It would be good if I could choose the ID that was freed near the beginning (2) instead of k+1. How can I do this?)
Generate random numbers in the range 1 to 20 million and check with a map lookup whether each is already used; if it is, choose another random number or try number+1 until the map lookup fails.
Store all numbers from 1 to 20 million in a set, take them one by one for use, and add them back when they're freed. (This has a bigger memory footprint and I don't want to do this.)
What is the most efficient way to solve this problem if, say, around 50% of the IDs are in use at any point in time?
A space-efficient solution is to use a bit-mask to keep track of free entries. 20M bits is only 2.5MB.
If about half of them will be free, then when you need to allocate a new ID, you can just start at a random spot and walk forward until you find an entry with a free bit.
If you need a guaranteed time bound, then you can use an array of 64-bit words for your bit mask, and a bit mask of 1/64 the size to keep track of which words have free entries. Recurse until you get to one or two words.
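A two-level version of this idea might look like the sketch below. It is scaled down to 4096 IDs so that the summary level fits in a single 64-bit word, IDs are 0-based, and a set bit means "free"; all names (ids_init, ids_alloc, ids_free) are hypothetical. For 20M IDs you would add more levels in the same way.

```c
#include <assert.h>
#include <stdint.h>

#define ID_COUNT 4096u            /* demo size, not the full 20M */
#define WORDS (ID_COUNT / 64u)    /* 64 words, so one summary word suffices */

static uint64_t free_bits[WORDS]; /* bit b of word w set = ID w*64+b is free */
static uint64_t summary;          /* bit w set = free_bits[w] is non-zero */

/* Position of the lowest set bit; v must be non-zero. */
static unsigned lowest_set_bit(uint64_t v) {
    unsigned n = 0;
    while ((v & 1u) == 0) { v >>= 1; ++n; }
    return n;
}

void ids_init(void) {
    for (unsigned w = 0; w < WORDS; ++w)
        free_bits[w] = ~UINT64_C(0);
    summary = ~UINT64_C(0);
}

/* Returns the lowest free ID, or -1 if none is left. */
long ids_alloc(void) {
    if (summary == 0)
        return -1;
    unsigned w = lowest_set_bit(summary);        /* a word with a free ID */
    unsigned b = lowest_set_bit(free_bits[w]);   /* a free ID in that word */
    free_bits[w] &= ~(UINT64_C(1) << b);
    if (free_bits[w] == 0)
        summary &= ~(UINT64_C(1) << w);          /* word is now full */
    return (long)(w * 64u + b);
}

void ids_free(long id) {
    assert(id >= 0 && id < (long)ID_COUNT);
    unsigned w = (unsigned)id / 64u, b = (unsigned)id % 64u;
    free_bits[w] |= UINT64_C(1) << b;
    summary |= UINT64_C(1) << w;
}
```

Each allocation inspects one word per level, so the cost depends on the number of levels rather than on how many IDs are in use.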
If space isn't a problem, then the simplest fast way is to keep free IDs in a free list. That requires an array of up to 20M integers. You remember the last entry freed, and for every free node x, array[x] is the index of the preceding freed node, or -1 if there is none.
If your IDs actually point to something, then often you can use the very same array for the free list and the pointers, so the free list takes no extra memory at all.
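The array-based free list described above can be sketched as follows. The pool is shrunk to 8 IDs (0-based) for illustration, and the names pool_init, pool_alloc and pool_free are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_SIZE 8   /* demo size; the real pool would hold 20M entries */

static int32_t prev_free[POOL_SIZE]; /* for a free ID x: the ID freed before it, or -1 */
static int32_t last_freed = -1;      /* head of the free list, -1 when empty */

/* Mark every ID as free, chained so that ID 0 is handed out first. */
void pool_init(void) {
    last_freed = -1;
    for (int32_t id = POOL_SIZE - 1; id >= 0; --id) {
        prev_free[id] = last_freed;
        last_freed = id;
    }
}

/* Pop the most recently freed ID; returns -1 if the pool is exhausted. */
int32_t pool_alloc(void) {
    int32_t id = last_freed;
    if (id != -1)
        last_freed = prev_free[id];
    return id;
}

/* Push an ID back onto the free list. */
void pool_free(int32_t id) {
    assert(id >= 0 && id < POOL_SIZE);
    prev_free[id] = last_freed;
    last_freed = id;
}
```

Both operations touch a constant number of array slots. While an ID is in use its slot in prev_free is unused, which is what makes it possible to share the array with the per-ID data.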
20M integers is about 80 MB of RAM. If we are talking about Java, then according to this article, HashSet<Integer> can take up to 22 times more space, so it's about 1.7 GB, wow.
You can implement your own bitset that supports fast selection of the next free ID. The bitset should take only about 2.4 MB of RAM and we can find the next free ID in O(1). I haven't tested the code, it's mostly an idea:
int range = 20_000_000;
long[] bitset = new long[range / 64 + 1]; // About 2.4 MB of RAM, array length is 312501
Stack<Integer> hasFreeIds = new Stack<Integer>(); // Slots in bitset with free IDs
for (int i = 0; i < bitset.length; ++i) { // All slots have free IDs in the beginning
hasFreeIds.push(i);
}
// Now `hasFreeIds` is about (8 + 4) * 312_000 bytes = ~4 MB of RAM
// Our structure should be ~6.4 MB of RAM in total
// Complexity is O(1), so should be fast
int getNextFreeId() {
// Select the first slot with free IDs
int freeSlotPos = hasFreeIds.pop();
long slot = bitset[freeSlotPos];
// Find the first free ID
long lowestZeroBit = Long.lowestOneBit(~slot);
int lowestZeroBitPosition = Long.numberOfTrailingZeros(lowestZeroBit);
int freeId = 64 * freeSlotPos + lowestZeroBitPosition;
// Update the slot, flip the bit to mark it as used
slot |= lowestZeroBit;
bitset[freeSlotPos] = slot;
// If the slot still has free IDs, then push it back to our stack
if (~slot != 0) {
hasFreeIds.push(freeSlotPos);
}
return freeId;
}
// Complexity is also O(1)
void returnId(int id) {
// Find slot that contains this id
long slot = bitset[id / 64];
boolean slotIsFull = (~slot == 0L); // True if the slot does not have free IDs
// Flip the bit in the slot to mark it as free
int bitPosition = id % 64;
slot &= ~(1L << bitPosition); // 1L, not 1: an int shift wraps at 32 bits and would clear the wrong bit
bitset[id / 64] = slot;
// If this slot was full before, we need to push it to the stack
if (slotIsFull) {
hasFreeIds.push(id / 64);
}
}
Theoretically speaking, the fastest would be storing all free IDs in a linked list.
That is, push 20M sequential numbers into a linked list. To allocate an ID, pop it from the front. When an ID is freed, push it at either the front or the back, depending on your preferred strategy (i.e., whether you want to reuse freed IDs first, or only after every preallocated one has been used).
This way both allocating an ID and freeing it is O(1).
Now, as an optimization you don't really need to preallocate all your IDs. You should only store the highest ID allocated. When you need to allocate an ID and the list of free IDs is empty - just increase the highest ID variable and return it.
This way your list will never reach big numbers, unless they were really allocated and returned.
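A minimal sketch of this lazy variant in C is shown below. It uses a growable array as a stack instead of a linked list (so freed IDs are reused first); the names id_acquire and id_release are hypothetical, and 0 is used as the "no ID available" value.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ID 20000000

static int *freed = NULL;                    /* stack of returned IDs */
static size_t freed_len = 0, freed_cap = 0;
static int highest = 0;                      /* highest ID handed out so far */

/* Returns an ID in 1..MAX_ID, or 0 if every ID is in use. */
int id_acquire(void) {
    if (freed_len > 0)
        return freed[--freed_len];           /* reuse a returned ID first */
    if (highest < MAX_ID)
        return ++highest;                    /* otherwise mint a new one */
    return 0;
}

/* Returns 0 on success, -1 if the stack could not grow. */
int id_release(int id) {
    assert(id >= 1 && id <= highest);
    if (freed_len == freed_cap) {
        size_t cap = freed_cap ? freed_cap * 2 : 16;
        int *grown = realloc(freed, cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        freed = grown;
        freed_cap = cap;
    }
    freed[freed_len++] = id;
    return 0;
}
```

The stack only ever holds IDs that were actually handed out and returned, so its size follows real usage rather than the full 20M range. The sketch does not detect an ID being released twice.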
I have two 32-bit floating point numbers. I want to keep a count of how often any combination of the two occurs. I could technically do this by concatenating them into a string and using a regular hash map to keep track of the count, but the overhead of that is considerable in my application, so I was wondering if there is a better way. I don't need to keep the full precision of a 32-bit float, and I know that one number is never > 10, and the other never > 100. So I could multiply the first by 10000 and the second by 1000, cast the results to int to chop off anything after the decimal point, shift the first number left by 16 bits, and OR (|) the two together into one integer. I could then allocate an array of MAX_INT elements and use the integer I just created as an index into that array.
However, that would leave me with a 2GB array, most of which would be empty, so I'd like to avoid that. I was wondering if there are any hashing algorithms that go about this in a more sophisticated way, or any data structures that work in a 'tiered' way, like a tree where a lookup is first done on a combination of the first digits of each number, then on the second digits and so on, so that no room needs to be allocated for any combinations that aren't known yet. (There is probably a problem with this exact approach, it's just an example of the direction I'm thinking in.) Any other way is fine too, like maybe a more sophisticated way of hashing two floats together, such that the result is scaled between 0 and some number, where the choice of that 'some number' would give me a way to tune the max size of the lookup table in memory.
Any ideas?
You could copy the floats into one buffer and then hash that buffer, like this (this is C/C++):
int hashNumbers(float a, float b){
char bytes[2 * sizeof(float)];
memcpy(bytes, &a, sizeof(float));
memcpy(bytes + sizeof(float), &b, sizeof(float));
//I don't know your implementation of the actual hash function
return hash(bytes, 2 * sizeof(float)); //assuming the hash function would take in an array of bytes with its size.
//I don't know its output type here. I'm assuming it's an int.
}
As for your hash function, you could use modulo. For example, if you take an int, which has 2^32 possible values, and then do x = x % 100, that leaves you with 100 possible values for x (make x unsigned so the result is never negative).
If instead of 100 you take a power of 2 (2, 4, 8, 16, etc.), you can replace the modulo with a bitwise operation. For an unsigned 32-bit x, x = x & 255 gives the same result as x % 256, and x = x >> 24 keeps the top 8 bits instead. Either way you now have only 256 possibilities for x.
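To make this concrete, here is one possible sketch. The answer leaves the actual hash function open, so 32-bit FNV-1a is swapped in purely as an example, and TABLE_BITS, hash_pair and bucket_of are hypothetical names. The table size is a power of two, so the bucket is taken with a mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 16u   /* 2^16 buckets; tune this to trade memory for collisions */

/* Hash the raw bytes of both floats with 32-bit FNV-1a. */
uint32_t hash_pair(float a, float b) {
    unsigned char bytes[2 * sizeof(float)];
    assert(sizeof(float) == 4);              /* the question assumes 32-bit floats */
    memcpy(bytes, &a, sizeof a);
    memcpy(bytes + sizeof a, &b, sizeof b);

    uint32_t h = 2166136261u;                /* FNV offset basis */
    for (size_t i = 0; i < sizeof bytes; ++i) {
        h ^= bytes[i];
        h *= 16777619u;                      /* FNV prime */
    }
    return h;
}

/* Reduce the hash to a table index in [0, 2^TABLE_BITS). */
uint32_t bucket_of(float a, float b) {
    return hash_pair(a, b) & ((1u << TABLE_BITS) - 1u);
}
```

Different pairs can still land in the same bucket, so each bucket has to store the pair alongside its count. Because the raw bytes are hashed, values that differ only in the last bits count as different keys; rounding both floats to the precision you care about before hashing avoids that.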
I have a large genetic dataset (X, Y coordinates), of which I can easily know one dimension (X) during runtime.
I drafted the following code for a matrix class that lets me specify the size of one dimension but leaves the other one dynamic by using std::vector. Each vector is allocated with new and owned by a unique_ptr, and those unique_ptrs are stored in a C-style array that is also allocated with new and owned by a unique_ptr.
class Matrix
{
private:
typedef std::vector<Genotype> GenVec;
typedef std::unique_ptr<GenVec> upGenVec;
std::unique_ptr<upGenVec[]> m;
unsigned long size_;
public:
// ...
// construct
Matrix(unsigned long _size): m(new upGenVec[_size]), size_(_size)
{
for (unsigned long i = 0; i < this->size_; ++i)
this->m[i] = upGenVec(new GenVec);
}
};
My question:
Does it make sense to use this instead of std::vector< std::vector<Genotype> > ?
My reasoning behind this implementation is that I only require one dimension to be dynamic, while the other should be fixed. Using std::vectors could imply more memory allocation than needed. As I am working with data that would fill up estimated ~50GB of RAM, I would like to control memory allocation as much as I can.
Or, are there better solutions?
I won't cite any paragraphs from specification, but I'm pretty sure that std::vector memory overhead is fixed, i.e. it doesn't depend on number of elements it contains. So I'd say your solution with C-style array is actually worse memory-wise, because what you allocate, excluding actual data, is:
N * pointer_size (first dimension array)
N * vector_fixed_size (second dimension vectors)
In vector<vector<...>> solution what you allocate is:
1 * vector_fixed_size (first dimension vector)
N * vector_fixed_size (second dimension vectors)
I'm having trouble with the simple task of finding the maximum of an array in OpenCL.
__kernel void ndft(/* lots of stuff*/)
{
size_t thread_id = get_global_id(0); // thread_id = [0 .. spectrum_size[
/* MATH MAGIC */
// Now I have float spectrum_abs[spectrum_size] and
// I want the maximum as well as the index holding the maximum
barrier();
// this is the old, sequential code:
if (*current_max_value < spectrum_abs[i])
{
*current_max_value = spectrum_abs[i];
*current_max_freq = i;
}
}
Now I could add if (thread_id == 0) and loop through the entire thing as I would do on a single core system, but since performance is a critical issue (otherwise I wouldn't be doing spectrum calculations on a GPU), is there a faster way to do that?
Returning to the CPU at the end of the kernel above is not an option, because the kernel actually continues after that.
You will need to write a parallel reduction. Split your "large" array into small pieces (a size a single workgroup can effectively process) and compute the min-max in each.
Do this iteratively (involves both host and device code) till you are left with only one set of min/max values.
Note that you might need to write a separate kernel that does this unless the current work-distribution works for this piece of the kernel (see my question to you above).
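An alternative, if your current work distribution is amenable, is to find the max (and its index) inside each workgroup and write it to a buffer in global memory (index = group id, one slot per workgroup). Once all workgroups have written their results, the work-item with thread_id == 0 loops across the reduced results and finds the max among them. Be aware that barrier() only synchronizes work-items within a single workgroup, so the final loop can stay inside the current kernel only if the kernel runs as one workgroup; otherwise it belongs in a follow-up kernel. This will not be the optimal solution, but it might be one that fits your current design.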
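The data flow of such a two-stage reduction can be sketched in plain C. This is a sequential simulation and not OpenCL kernel code: each iteration of the first loop stands in for one workgroup, and GROUP_SIZE, MAX_GROUPS and reduce_max are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4    /* elements per simulated workgroup */
#define MAX_GROUPS 64   /* capacity of the partial-result buffers */

/* Returns the index of the maximum of data[0..n) and stores its value. */
size_t reduce_max(const float *data, size_t n, float *max_value) {
    float  partial_value[MAX_GROUPS];
    size_t partial_index[MAX_GROUPS];
    size_t groups = (n + GROUP_SIZE - 1) / GROUP_SIZE;
    assert(n > 0 && groups <= MAX_GROUPS);

    /* Stage 1: every group reduces its own chunk independently. */
    for (size_t g = 0; g < groups; ++g) {
        size_t begin = g * GROUP_SIZE;
        size_t end = begin + GROUP_SIZE < n ? begin + GROUP_SIZE : n;
        size_t best = begin;
        for (size_t i = begin + 1; i < end; ++i)
            if (data[i] > data[best])
                best = i;
        partial_value[g] = data[best];
        partial_index[g] = best;
    }

    /* Stage 2: a single pass over the per-group results. */
    size_t winner = 0;
    for (size_t g = 1; g < groups; ++g)
        if (partial_value[g] > partial_value[winner])
            winner = g;

    *max_value = partial_value[winner];
    return partial_index[winner];
}
```

On a GPU, stage 1 is the part that runs in parallel, and stage 2 only has to look at one value per workgroup instead of the whole spectrum.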
Is it possible to create a collision-free hash function for a data structure with specific properties?
The datastructure is int[][][]
It contains no duplicates
The range of integers that are contained in it is defined. Let's say it's 0..1000, the maximal integer is definitely not greater than 10000.
Big problem is that this hash function should also be very fast.
Is there a way to create such a hash function? Maybe at run time depending on the integer range?
ADDITION: I should say that the purpose of this hash function is to quickly check whether a particular combination was already processed. So when some combination of numbers in the data structure is processed, I calculate the hash value and store it. Then, when processing another combination of numbers within the data structure, I will compare the hash values.
I think what you want is a "perfect hash" or even a "minimal perfect hash":
http://en.wikipedia.org/wiki/Perfect_hash_function
Edit: That said, if you're sure and certain you'll never go above [0...1000] and depending on what you need to do you probably can simply "bucket" your results directly in an array. If you don't have many elements, that array would be sparse (and hence a bit of a waste) but for at most 1001 elements going from [0...1000] an Object[1001] (or int[1001] or whatever) will probably do.
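What if you just use a 64-bit value and store the location in each level of the hierarchy in its own section of bits?
Something like (off the top of my head): hash = (a << 34) | (b << 17) | (c), with a, b and c widened to 64 bits before shifting. Each field then gets 17 bits, enough for values up to 131071.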
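In C, the packing could be sketched like this. It assumes exactly three values, each below 2^17, which covers the stated maximum of 10000; pack3 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Pack three values of at most 17 bits each into one 64-bit key. */
uint64_t pack3(uint32_t a, uint32_t b, uint32_t c) {
    assert(a < (1u << 17) && b < (1u << 17) && c < (1u << 17));
    /* Widen to 64 bits BEFORE shifting, otherwise the high bits are lost. */
    return ((uint64_t)a << 34) | ((uint64_t)b << 17) | (uint64_t)c;
}
```

Because the three fields never overlap, two different (a, b, c) triples always produce different keys, so for three in-range values the key is collision-free by construction.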
A perfect hash is likely not feasible, because it can take a lot of computation time to find one for your data set.
Would a bool[][][] work for you, where true means a certain x,y,z combination has been processed? Below is a prototype for a three-dimensional bit array. Because of the limits of an Int32, this will only work up to a maximum index of about 1,024 (but would fit within 128 MB). You could get to 10,000 by creating a BitArray[][]. However, this is probably not practical at that size, because it would occupy over 116 GB of RAM.
Depending on your exact problem size and needs, a plain old hash table (with collisions) may be your best bet. That said, here is the prototype code:
using System;
using System.Collections;

public class ThreeDimensionalBitArray
{
// todo: consider making the size configurable
private const int MAX_INDEX = 1000;
private BitArray _bits = new BitArray(MAX_INDEX * MAX_INDEX * MAX_INDEX);
public bool this[int x, int y, int z]
{
get { return _bits[getBitIndex(x, y, z)]; }
set { _bits[getBitIndex(x, y, z)] = value; }
}
public ThreeDimensionalBitArray()
{
}
private static int getBitIndex(int x, int y, int z)
{
// todo: bounds check x, y, and z
return (x * MAX_INDEX * MAX_INDEX) + (y * MAX_INDEX) + z;
}
}
public class BitArrayExample
{
public static void Main()
{
ThreeDimensionalBitArray bitArray = new ThreeDimensionalBitArray();
Console.WriteLine(bitArray[500, 600, 700]); // "false"
bitArray[500, 600, 700] = true;
Console.WriteLine(bitArray[500, 600, 700]); // "true"
}
}