Complexity: Is it more efficient to store two objects as variables or as array [2]? - complexity-theory

Is it more efficient to store two objects as two separate variables or as an array of length 2, or does it not make any difference?
I am sorry if this is a dumb question.

That depends on the language. For example, in C/C++ the two are the same; in Java, on the other hand, the array carries some object overhead and is therefore more expensive.
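A minimal C++ sketch of the C/C++ half of that claim: both forms are plain stack storage of identical size.

#include <cstdio>

int main() {
    // Two scalar locals: the compiler keeps these in registers or on the stack.
    int a = 1, b = 2;

    // A fixed-size array: also plain stack storage in C/C++, with no header
    // and no indirection; sizeof(arr) is exactly 2 * sizeof(int).
    int arr[2] = {1, 2};

    std::printf("%zu vs %zu bytes\n", sizeof(a) + sizeof(b), sizeof(arr));
    return 0;
}

In Java, by contrast, new int[2] is a separate heap object with an object header and a length field, reached through a reference, which is where the extra cost comes from.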

Related

Tuples vs StaticVectors in Julia

If I understand correctly, since tuples are immutable in Julia, they can also be stack-allocated (similar to StaticVectors). So there should not be any advantage to using StaticVectors in place of tuples when I am dealing with small vectors, say a length-3 vector for the coordinates of a particle. Can someone highlight the advantages of using StaticVectors in such cases? And more broadly, what are the use cases where I would possibly want to choose one over the other?
Thanks.
The raw performance is similar, since StaticArrays are built on tuples. The point of StaticArrays is all the functionality: the linear algebra, the solvers, sorting, the mutable arrays, etc.
Tuples are a barebones data collection with barely any mathematical structure. That's fine as far as it goes, but StaticArrays has done most of the work you would have to do yourself with tuples.
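To make that split concrete, here is a sketch in C++ as a language-agnostic analogy (the Vec3 wrapper is hypothetical, not StaticArrays code): a bare fixed-size collection defines no algebra of its own, and a thin wrapper is what supplies it, which is roughly the role StaticArrays.jl plays on top of tuples.

#include <array>
#include <cstdio>

// A bare std::array is a plain stack-allocated collection with no arithmetic
// defined on it, loosely analogous to a Julia Tuple. The wrapper below
// supplies the vector algebra.
struct Vec3 {
    std::array<double, 3> v;
    Vec3 operator+(const Vec3 &o) const {
        return {{v[0] + o.v[0], v[1] + o.v[1], v[2] + o.v[2]}};
    }
    double dot(const Vec3 &o) const {
        return v[0] * o.v[0] + v[1] * o.v[1] + v[2] * o.v[2];
    }
};

int main() {
    Vec3 a{{1, 2, 3}}, b{{4, 5, 6}};
    Vec3 c = a + b;                 // works only because the wrapper defines it
    std::printf("%f\n", c.dot(b)); // prints 109.000000
    return 0;
}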

Why is lookup in an Array O(1)?

I believe that in some languages other than Ruby, an Array lookup is O(1) because you know where the data starts, and you multiply the index by the size of the data the array is holding, and then access that memory location.
However, in Ruby, an Array can have objects from different classes, so how does it manage to do a lookup of O(1) complexity?
What @Neil Slater said, with a little more detail…
There are basically two plausible approaches to storing an array of heterogeneous objects of differing sizes:
Store the objects as a singly- or doubly-linked list, with the storage space for each individual object preceded by pointer(s) to the preceding and/or following objects. This structure has the advantage of making it very easy to insert new objects at arbitrary points without shifting around the rest of the array, but the huge downside is that looking up an object by its position is generally O(N), since you have to start from one end of the list and jump through it node-by-node until you arrive at the n-th one.
Store a table or array of constant-sized pointers to the individual objects. Since this lookup table contains constant-sized items in a contiguous ordered layout, looking up the address of an individual object is O(1); the table is just a C-style array, in which lookup takes only one-to-a-few machine instructions, even on RISC CPU architectures.
(The allocation strategies for storing the individual objects are also interesting and complex, but not immediately relevant to your question.)
Dynamic languages like Perl/Python/Ruby pretty much all opt for #2 for their general-purpose list/array types. In other words, they make lookup more efficient than inserting objects at arbitrary positions in the list, which is the better trade-off for many applications.
I'm not familiar with the implementation details for Ruby, but they are likely quite similar to those of Python's list type, whose performance and design is explained in wonderful detail at effbot.org.
Its implementation probably contains an array of memory addresses pointing to the actual objects, so it can still do the lookup without looping through the array.
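A sketch of approach #2 in C++ (the class names are hypothetical), showing why lookup stays O(1) even with heterogeneous, differently sized elements:

#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// The "array" is a contiguous table of same-sized pointers to heap objects
// of varying size, so arr[i] is just base_address + i * sizeof(pointer).
// That is O(1) no matter what each element actually is.
struct Object {
    virtual ~Object() = default;
    virtual void print() const = 0;
};
struct IntObj : Object {
    long v;
    explicit IntObj(long v) : v(v) {}
    void print() const override { std::printf("%ld\n", v); }
};
struct StrObj : Object {
    std::string s;
    explicit StrObj(std::string s) : s(std::move(s)) {}
    void print() const override { std::printf("%s\n", s.c_str()); }
};

int main() {
    std::vector<std::unique_ptr<Object>> arr;    // the pointer table
    arr.push_back(std::make_unique<IntObj>(42));
    arr.push_back(std::make_unique<StrObj>("hello"));
    arr[1]->print();   // constant-time index, then one pointer dereference
    return 0;
}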

What is a good sorting algorithm on CUDA?

I have an array of structs and I need to sort this array according to a property of the struct (N). The struct looks like this:
struct OBJ
{
    int N;   // sort the array of OBJ with respect to N
    OB *c;   // OB is another struct
};
The array size is small, about 512 elements, but the size of every element is big, so I cannot copy the array to shared memory.
What is the simplest and "good" way to sort this array? I do not need a complex algorithm that requires a lot of time to implement (since the number of elements in the array is small); I just need a simple algorithm.
Note: I have read some papers about sorting algorithms on GPUs, but the speed gains they report only show up when the size of the array is very big. Therefore I did not try to implement their algorithms, since the size of my array is small. I only need a simple way to sort my array in parallel. Thanks.
What means "big" and "small" ?
By "big" I assume you mean something of >1M elements, while small --- small enough to actually fit in shared memory (probably <1K elements). If my understanding of "small" matches yours, I would try the following:
Use only a single block to sort the array (it can be then a part of some bigger CUDA kernel)
Bitonic sort is one of the good approaches, and it adapts well to a parallel algorithm; a minimal single-block sketch appears at the end of this answer.
Some pages on bitonic sort:
Bitonic sort (a nice explanation, an applet to visualise it, and Java source that does not take too much space)
Wikipedia (a bit too brief an explanation for my taste, but more source code: some in an abstract language, plus Java)
NVIDIA code samples (a sample source in CUDA. I think it is a bit overfocused on avoiding bank conflicts; I believe simpler code may actually perform faster)
I once also implemented a bubble sort (lol!) for a single warp to sort arrays of 32 elements. Thanks to its simplicity it actually did not perform that badly. A well-tuned bitonic sort will still perform faster, though.
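A hypothetical single-block kernel along those lines, sorting (key, index) pairs in shared memory so the big structs never move during the sort (the kernel and its launch are my sketch, not code from the linked samples; n must be a power of two and no larger than the block size):

// Sort keys together with their original indices; afterwards idx_out gives
// the sorted order, and the big OBJ records can be permuted in one pass.
__global__ void bitonic_sort_idx(const int *keys_in, int *idx_out, int n)
{
    extern __shared__ int sh[];      // 2*n ints: keys first, then indices
    int *key = sh;
    int *idx = sh + n;

    int t = threadIdx.x;
    if (t < n) { key[t] = keys_in[t]; idx[t] = t; }
    __syncthreads();

    for (int k = 2; k <= n; k <<= 1) {            // size of bitonic runs
        for (int j = k >> 1; j > 0; j >>= 1) {    // compare distance
            int partner = t ^ j;
            if (t < n && partner > t) {
                bool ascending = ((t & k) == 0);
                if ((key[t] > key[partner]) == ascending) {
                    int tk = key[t]; key[t] = key[partner]; key[partner] = tk;
                    int ti = idx[t]; idx[t] = idx[partner]; idx[partner] = ti;
                }
            }
            __syncthreads();
        }
    }
    if (t < n) idx_out[t] = idx[t];
}

// Launch for the 512-element case from the question:
//   bitonic_sort_idx<<<1, 512, 2 * 512 * sizeof(int)>>>(d_keys, d_idx, 512);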
Use the sorting calls available in the CUDPP or the Thrust library.
If you use cudppSort, note that it only works with integers or floats. To sort your array of structures, you can first sort the keys along with an index array. Later, you can use the sorted index array to move the structures to their final sorted location. I have described how to do this for the cudppCompact compaction algorithm in a blog post here. The steps are similar for sorting an array of structs using cudppSort.
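The same key-plus-index pattern is a short pipeline in Thrust; here is a sketch assuming the OBJ struct from the question (sort_objs and key_of are illustrative names, not library API):

#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/transform.h>

struct OB;                        // opaque payload type from the question
struct OBJ { int N; OB *c; };

// Extract the sort key from each struct.
struct key_of {
    __host__ __device__ int operator()(const OBJ &o) const { return o.N; }
};

void sort_objs(thrust::device_vector<OBJ> &objs)
{
    const int n = objs.size();
    thrust::device_vector<int> keys(n);
    thrust::device_vector<int> idx(n);

    thrust::transform(objs.begin(), objs.end(), keys.begin(), key_of());
    thrust::sequence(idx.begin(), idx.end());       // idx = 0, 1, 2, ...

    // Sort the small (key, index) pairs, not the big structs.
    thrust::sort_by_key(keys.begin(), keys.end(), idx.begin());

    // One gather pass moves each struct exactly once to its sorted position.
    thrust::device_vector<OBJ> sorted(n);
    thrust::gather(idx.begin(), idx.end(), objs.begin(), sorted.begin());
    objs.swap(sorted);
}

Sorting the (key, index) pairs instead of the structs keeps the data movement during the O(n log n) sort down to a few bytes per element; each big struct then moves exactly once.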
Why exactly are you heading towards CUDA? I mean, it smells like your problem is not one of those problems CUDA is very good at. You just want to sort an array of 512 elements and let some pointers refer to another location. This is nothing fancy; use a simple serial algorithm for that, e.g. quicksort, heapsort, or mergesort.
Additionally, think about the overhead it takes to copy data from your heap/stack to your CUDA device. Using CUDA only makes sense when the calculations are intense enough that COMPUTING_TIME_ON_CUDA + COPY_DATA_FROM_HEAP_TO_CUDA_DEVICE + COPY_DATA_FROM_CUDA_DEVICE_TO_HEAP < COMPUTING_TIME_ON_HOST_CPU.
Besides, CUDA is immensely powerful at math on big vectors and matrices with rather simple data types (numbers), because that is the kind of problem that often arises on a GPU: computing graphics.
Yes, I would totally agree: the overhead of sorting small arrays (<5k elements) kills the possible speedup you will achieve with a "fine-tuned" parallel sorting algorithm implemented in CUDA. I would prefer CPU-based sorting for such a small size...

Why is a 2D array better than objects, for performance and memory, when storing x-y coordinates?

Assume I want to store n points with integer (x, y) coordinates. I can use a 2-D (2×n) array, or a list/collection/array of n objects where each object has two integer fields to store the coordinates.
As far as I know, the 2-D array option is faster and consumes less memory, but I don't know why. A detailed explanation or links with details are appreciated.
This is a very broad question, and it kinda has many parts to it. First off, this is relative to the language you are working in. Let's take Java as an example.
When you create an object in Java, the user-defined class inherits from Object, and that inheritance is where part of the overhead comes from. The compiler has to virtualize certain method calls in memory so that when you call .equals() or .toString(), the program knows which one to call (that is, your class's .equals() or Object's .equals()). This is accomplished with a lookup table and resolved at runtime through pointers.
This is called virtualization. Now, in Java an array is actually an object too, so you really don't gain much from an array of arrays. In fact, you might do better using your own class, since you can limit the metadata associated with it. Arrays in Java store information about their length.
However, many of the collections DO have overhead associated with them. ArrayList, for example, will resize itself and stores metadata about itself in memory that you might not need. LinkedList holds references to other nodes, which is overhead on top of its actual data.
Now, what I said is only true about Java. In other OO languages, objects behave differently on the insides, and some may be more/less efficient.
In a language such as C++, when you allocate an array, you are really just getting a chunk of memory, and it is up to you what you want to do with it. In that sense, it might be better. C++ has similar overhead with its objects if you use overriding (keyword virtual), as it will create these virtual lookup tables in memory.
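A small C++ sketch of that cost (PlainPoint and VirtualPoint are made-up names): a class with a virtual function carries a hidden vtable pointer in every instance, which is exactly the per-object overhead being described.

#include <cstdio>

struct PlainPoint {                    // two ints, no hidden fields
    int x, y;
};

struct VirtualPoint {                  // the virtual destructor forces a
    int x, y;                          // vtable pointer into every instance
    virtual ~VirtualPoint() = default;
};

int main() {
    // On a typical 64-bit platform this prints 8 vs 16: the vtable pointer
    // (plus alignment padding) doubles the size of each point.
    std::printf("plain: %zu, virtual: %zu\n",
                sizeof(PlainPoint), sizeof(VirtualPoint));
    return 0;
}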
All comes down to how efficiently you'll be using the storage space and what your access requirements are. Having to set aside memory to hold a 10,000 x 10,000 array to store only 10 points would be a hideous waste of memory. On the flip side, saving memory by storing the points in a linked list will also be pointless if you spend so much time iterating the list to find the one point you actually need in the 10,000,000 stored.
Some of the downsides of both can be overcome: sparse arrays, pre-sorting the list by some rule so "needed" points float to the top, etc.
In most languages, with a multidimensional array, say A×B, you just have a chunk of memory big enough to hold A*B objects, and when you look up element (m, n) all you need to do is fetch the element at location m*B + n (row-major). When you have a list of objects, there is overhead associated with every element, and the lookup is more complex than a simple address calculation.
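A minimal C++ illustration of that address calculation (the variable names are mine):

#include <cstdio>

int main() {
    // An A-row by B-column matrix stored as one flat, contiguous block
    // (row-major): element (m, n) lives at offset m * B + n.
    const int A = 4, B = 3;
    int grid[A * B];
    for (int i = 0; i < A * B; ++i) grid[i] = i;

    int m = 2, n = 1;
    std::printf("%d\n", grid[m * B + n]);  // prints 7: pure arithmetic, no pointer chasing
    return 0;
}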
If the size of your matrix is constant, a 2D array is the fastest option. If it needs to grow and shrink, though, you probably have no option but to use the second approach.

What is the standard OCaml data structure with fastest iteration?

I'm looking for a container that provides fastest unordered iterations through the encapsulated elements. In other words, "add once, iterate many times".
Is there one among OCaml's standard modules that is fast enough (such that further optimization of it would be useless)? Or some kind of third-party, GPL-ready one?
AFAIK there's just one OCaml compiler, so the concept of being fast is more or less clear...
...But after I saw a couple of answers, it appears it's not. Of course, there are plenty of data structures that allow O(n) iteration through a container of size n. But the task I'm solving is one of those where the difference between O(n) and O(2n) matters ;-).
I also see that arrays and lists carry unnecessary information about the order in which elements were added, which I don't need. Maybe in the "functional world" there exist data structures that can trade this information for a bit of iteration speed.
In C I would outright pick a plain array. The question is, what should I pick in OCaml?
You are unlikely to do better than the built-in arrays and lists, since they are hand-coded in C, unless you bind to your own native implementation of an iterator. An array will behave almost exactly like an array in C (a contiguously allocated block of memory containing a sequence of element values), possibly with some extra pointer indirections due to boxing. Lists are implemented exactly how you would expect: as cells with a value and a "next" pointer. Arrays will give you the best locality for unboxed types (especially floats, which have a super-special unboxed implementation).
For information about the implementation of arrays and lists, see Section 18.3 of the OCaml manual and the files byterun/mlvalues.h, byterun/array.c, and byterun/alloc.c in the OCaml source code.
From the questioner: indeed, Array appeared to be the fastest solution. However, it only outperformed List by 7%. Maybe that was because the type of an array element was not plain enough: it was an algebraic type. Hashtbl performed 4 times worse, as expected.
So I will pick Array, and I'm accepting this one. Good.
To know for sure, you're going to have to measure. Based on the machine instructions the compiler is likely to generate, I would try an array, then a list.
Access to an array element requires a bounds check, address arithmetic, and a load.
Access to the head of a list requires a load, a test for the empty list, and a load at a known compile-time offset.
The details of which is faster probably depend on your application and what else is happening on your machine. They also depend on the type of elements; for example, if they are floating-point numbers, ocamlopt may be clever enough to make an unboxed array, which will save you a level of indirection.
Other common data structures like hash tables or balanced trees generally require that you allocate some context somewhere to keep track of where you are. With an array, keeping track requires only an integer index; with a list, keeping track requires a single pointer. I think this is going to be hard to beat in another data structure.
Finally please note that there may be only one OCaml compiler, but it has two back ends: bytecode and native code. Naturally if you care about this level of performance, you are using the native-code ocamlopt version. Right?
Please take measurements and edit the results into your question.
Don't forget about Bigarrays: they are the closest thing to C arrays (just a flat piece of memory), but they cannot contain arbitrary OCaml values. Also consider switching bounds checking off (unsafe_set/unsafe_get). And of course you should profile first.
The array - a linear piece of memory with the items visited in sequential order - best utilises the CPU's L1 data cache.
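The effect is language-agnostic, so here is a hedged sketch of it in C++ (timings will vary by machine): summing a contiguous array versus a pointer-chasing linked list of the same length. The array traversal is typically several times faster purely because of locality.

#include <chrono>
#include <cstdio>
#include <list>
#include <vector>

int main() {
    const int n = 10000000;
    std::vector<int> arr(n, 1);   // contiguous: friendly to the L1 cache
    std::list<int> lst(n, 1);     // one heap node per element: pointer chasing

    auto time = [](const char *label, auto &&sum) {
        auto t0 = std::chrono::steady_clock::now();
        long s = sum();
        auto t1 = std::chrono::steady_clock::now();
        long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
        std::printf("%s: sum=%ld in %lld ms\n", label, s, ms);
    };

    time("array", [&] { long s = 0; for (int x : arr) s += x; return s; });
    time("list",  [&] { long s = 0; for (int x : lst) s += x; return s; });
    return 0;
}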
All common data structures are iterable in O(n) time, so the differences between data structures will only be constant (and very probably not significant).
At least lists and arrays allow iteration without significant overhead. I can't think of a situation where that would not be fast enough.
