Arrays in PHP work with both numeric keys and string keys, which is awesome.
For example:
$array[0] = "My value.";
or
$array['key'] = "My value";
Why doesn't Go implement arrays like this?
What's the benefit of having two separate concepts and syntaxes (arrays and maps) in Go?
I believe I'm failing to see the usefulness behind this.
Go is not PHP. While a few higher-level languages share this abstraction, it's not very common. Arrays and maps are different data structures for different purposes.
PHP's arrays are actually hash tables underneath. Go has true arrays, and it has slices, which are a more powerful abstraction over arrays.
Having real arrays gives you a predictable memory layout and true O(1) indexing (the same goes for Go's slices, which use an array internally). Using a hash table as the underlying data store costs you a constant overhead on every operation and gives you less control over data locality.
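To make the distinction concrete, here is a minimal sketch in Go (the variable names are only examples):

package main

import "fmt"

func main() {
    // Array: fixed length, contiguous memory, O(1) indexing.
    var arr [3]string
    arr[0] = "My value."

    // Slice: a resizable view over an underlying array.
    s := make([]string, 0, 3)
    s = append(s, "My value.")

    // Map: a hash table keyed by strings (or any comparable type).
    m := map[string]string{}
    m["key"] = "My value"

    fmt.Println(arr[0], s[0], m["key"])
}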
One of the primary reasons is that arrays have order and maps do not, which has important implications, as stated here:
When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next. If you require a stable iteration order you must maintain a separate data structure that specifies that order.
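If a stable order is required, one common approach (just a sketch, not the only way) is to keep the keys in a slice and sort them before ranging:

package main

import (
    "fmt"
    "sort"
)

func main() {
    m := map[string]int{"b": 2, "a": 1, "c": 3}

    // Collect the keys into a slice, which does have an order.
    keys := make([]string, 0, len(m))
    for k := range m {
        keys = append(keys, k)
    }
    sort.Strings(keys)

    // Iterate in a deterministic order.
    for _, k := range keys {
        fmt.Println(k, m[k])
    }
}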
I often find myself unsure about what data structure is better for matrix-based algorithms.
By "matrix-based algoritm" I mean algorithms like Needleman-Wunsh alignment. There are many algorithms that are visually represented with a matrix.
I wonder which I should choose:
Array of arrays
Linked-list of linked-lists
Hash table where the key is a tuple like (row, column)
etc.
What do I have to consider when facing this impasse?
Note: my question is language-agnostic. You can use any programming language in your answer.
What data structure to use depends on your algorithm and how you will access that matrix. For example, if the size is fixed and you need fast access, it is better to use a 2-dimensional array, because no matter what you use, you will have to allocate that space anyway. If the size of the matrix is determined dynamically, then a vector of vectors (or a similar data structure, depending on the language) is probably the better choice.
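For example, the score matrix in a Needleman-Wunsch alignment is dense and its size is known once the two sequences are known, so a plain 2-dimensional array (or slice of slices) is a natural fit. A rough Go sketch, with example sequences:

package main

import "fmt"

// newMatrix allocates a (rows+1) x (cols+1) score matrix as a slice of slices.
// For a dense, fixed-size DP table this gives O(1) access to any cell.
func newMatrix(rows, cols int) [][]int {
    m := make([][]int, rows+1)
    for i := range m {
        m[i] = make([]int, cols+1)
    }
    return m
}

func main() {
    a, b := "GATTACA", "GCATGCU" // example sequences
    score := newMatrix(len(a), len(b))
    score[1][1] = 1 // cells are addressed directly by (row, column)
    fmt.Println(len(score), len(score[0]))
}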
Another question is whether your matrix is sparse and extremely big (as in digital geometry algorithms) and you have to do arithmetic operations on it very often. In that case, triplet-style data structures can be useful, for example compressed row storage, which can be built from three vectors. You can read more at this link: https://de.wikipedia.org/wiki/Compressed_Row_Storage
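A minimal sketch of that three-vector idea in Go; the struct name and layout are illustrative, not a library API:

package main

import "fmt"

// CRS stores a sparse matrix as three vectors: the non-zero values,
// their column indices, and, for each row, the offset into those two
// slices where that row's entries begin.
type CRS struct {
    Values []float64
    Cols   []int
    RowPtr []int // length = number of rows + 1
}

// At returns the value at (row, col), scanning only that row's entries.
func (m CRS) At(row, col int) float64 {
    for i := m.RowPtr[row]; i < m.RowPtr[row+1]; i++ {
        if m.Cols[i] == col {
            return m.Values[i]
        }
    }
    return 0
}

func main() {
    // The 3x4 matrix
    //   0 0 5 0
    //   1 0 0 0
    //   0 2 0 3
    m := CRS{
        Values: []float64{5, 1, 2, 3},
        Cols:   []int{2, 0, 1, 3},
        RowPtr: []int{0, 1, 2, 4},
    }
    fmt.Println(m.At(0, 2), m.At(1, 1), m.At(2, 3)) // 5 0 3
}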
Hope it helps
Suppose you are developing a language that is intended to be used for scripting, prototyping, as a macro language for application automation, or as an interactive calculator. It's dynamically typed, and memory management is implicit, based on garbage collection. In most cases, you do not expect users to use highly optimized algorithms or hand-picked, fine-tuned data structures.
You want to provide a general-purpose list type that has decent performance on average. It must support all kinds of operations: iteration, random access by index, prepending and appending elements, insertion, deletion, mapping, filtering, membership testing, concatenation, splitting, reversing, sorting, cloning, and extracting segments. It could be used with both small and large numbers of elements (but you can assume that it fits into physical memory). It's intended only for single-threaded access, so you need not care about thread safety.
You expect users to use this general-purpose list type no matter what their scenario or usage pattern is. Some users might want to use it as a sparse array, where most elements have some default value (e.g. 0) and only a few elements have non-default values.
What implementation would you choose?
We assume that you can afford to invest significant development effort, so the solution need not necessarily be simple. For example, you could implement different ways of internally organizing the data and switch between them depending on the number of elements or the usage pattern. High performance is a more important goal than reducing memory consumption, so you can afford some memory overhead if it wins you performance.
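To make the "switch between internal representations" idea concrete, here is a toy sketch in Go of the kind of thing I mean (the int-only elements and the thresholds are simplifications, not a proposal for the real implementation):

package main

import "fmt"

// sparseThreshold is an arbitrary cut-off chosen only for illustration.
const sparseThreshold = 1024

// AdaptiveList keeps values in a dense slice, but switches to a map of
// index -> value once the list is large and mostly default (zero) values.
type AdaptiveList struct {
    dense  []int
    sparse map[int]int // nil while the dense representation is in use
}

func (l *AdaptiveList) Set(i, v int) {
    if l.sparse != nil {
        l.sparse[i] = v
        return
    }
    for len(l.dense) <= i {
        l.dense = append(l.dense, 0)
    }
    l.dense[i] = v
    l.maybeGoSparse()
}

func (l *AdaptiveList) Get(i int) int {
    if l.sparse != nil {
        return l.sparse[i]
    }
    if i < len(l.dense) {
        return l.dense[i]
    }
    return 0
}

// maybeGoSparse converts to the map form when the list is large and
// fewer than 10% of the slots hold non-default values.
func (l *AdaptiveList) maybeGoSparse() {
    if len(l.dense) < sparseThreshold {
        return
    }
    nonZero := 0
    for _, v := range l.dense {
        if v != 0 {
            nonZero++
        }
    }
    if nonZero*10 < len(l.dense) {
        l.sparse = make(map[int]int, nonZero)
        for i, v := range l.dense {
            if v != 0 {
                l.sparse[i] = v
            }
        }
        l.dense = nil
    }
}

func main() {
    var l AdaptiveList
    l.Set(3, 7)
    l.Set(5000, 42) // a large, mostly-empty list forces the sparse form
    fmt.Println(l.Get(3), l.Get(5000), l.Get(100)) // 7 42 0
}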
While browsing for a contains method, I came across the following Q&A:
contains-method-for-a-slice
It is said time and again in this Q&A that the method is really trivial to implement. What I don't understand is: if it is so easy to implement, and seeing how DRY is a popular software principle and most modern languages implement said method, what sort of design reasoning could be behind the exclusion of such a simple method?
The triviality of the implementation depends on the scope of the implementation. It is trivial to implement when you know how to compare each value. Application code usually knows how to compare the types used in that application. But it is not trivial to implement in the general case for arbitrary types, and that is the situation for the language and standard library.
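For a known element type (or, with Go 1.18+ generics, any comparable type) the helper really is a few lines; a sketch:

package main

import "fmt"

// contains reports whether needle appears in haystack.
// With generics this covers any comparable element type; before
// generics you would write one such loop per concrete type.
func contains[T comparable](haystack []T, needle T) bool {
    for _, v := range haystack {
        if v == needle {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(contains([]string{"a", "b"}, "b")) // true
    fmt.Println(contains([]int{1, 2, 3}, 5))       // false
}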
Figuring out whether a slice contains a certain value is an O(n) operation, where n is the length of the slice. That would not change if the language provided a function to do it. If your code relies on frequently checking whether a slice contains a certain value, you should re-evaluate your choice of data structures; a map is usually better in these kinds of cases. Why should the standard library include functions that encourage you to use the wrong data structure for the task at hand?
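If membership checks are frequent, a map used as a set (a common Go idiom, sketched below) gives roughly O(1) lookups instead of O(n) scans:

package main

import "fmt"

func main() {
    // Build the set once: struct{} values take no space.
    set := map[string]struct{}{}
    for _, v := range []string{"a", "b", "c"} {
        set[v] = struct{}{}
    }

    // Each membership test is a single map lookup.
    _, ok := set["b"]
    fmt.Println(ok) // true
}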
I need to generate global unique ids by hashing some data.
On the one hand, I could use a combination of timestamp and network address, which is unique since every computer can only create one id at a time. But since this data is too long, I'd need to hash it, and thus collisions could occur. (As a side note, we could also throw in a random number if the timestamp is not exact enough.)
On the other hand, I could just use a random number and hash that. Shouldn't that give exactly the same hash-collision probability as the first approach? This is interesting because the second approach would be faster and much easier to implement.
Is there a difference in terms of hash collisions when using unique data rather than random data? (By the way, I will not use real GUIDs as described by the standard but mine will only be 64 bits long. But that shouldn't affect the question.)
Why bother hashing a random number? Hashing is designed to map inputs uniformly to a keyspace, but a PRNG already gives you uniformly distributed outputs. All you're doing is creating more work.
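If all you need is 64 random bits, you can draw them directly; a sketch using Go's crypto/rand (assuming a cryptographically strong source is acceptable for your IDs):

package main

import (
    "crypto/rand"
    "encoding/binary"
    "fmt"
)

// newID returns 64 uniformly random bits read straight from the
// system's CSPRNG; hashing them afterwards would not make collisions
// any less likely.
func newID() (uint64, error) {
    var buf [8]byte
    if _, err := rand.Read(buf[:]); err != nil {
        return 0, err
    }
    return binary.BigEndian.Uint64(buf[:]), nil
}

func main() {
    id, err := newID()
    if err != nil {
        panic(err)
    }
    fmt.Printf("%016x\n", id)
}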
Let's put some numbers down first:
The largest of the lists is about 100M records (but it is expected to grow up to 500M). The other lists (5-6 of them) are in the millions but will be less than 100M for the foreseeable future.
These lists are always joined on a single id, and never on any other parameters.
What's the best algorithm to join such lists?
I was thinking along the lines of distributed computing: have a good hash function (the consistent-hashing kind, where you can add a node without a lot of data movement) and split these lists into several smaller files. Since they are always joined on the common id (which I will be hashing), the problem boils down to joining small files, perhaps using the *nix join command for that.
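Roughly, the partitioning step I have in mind looks like this sketch (the bucket count, the record shape and the plain modulo hashing are placeholders; in practice I'd use a consistent-hashing scheme):

package main

import (
    "fmt"
    "hash/fnv"
)

// bucketFor maps an id to one of n partitions. Records from every list
// that share an id land in the same partition, so each pair of partition
// files can be joined independently (and in parallel).
func bucketFor(id string, n uint32) uint32 {
    h := fnv.New32a()
    h.Write([]byte(id))
    return h.Sum32() % n
}

func main() {
    const buckets = 16
    ids := []string{"user-1", "user-2", "user-42"}
    for _, id := range ids {
        fmt.Printf("%s -> bucket %d\n", id, bucketFor(id, buckets))
    }
}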
A DB (at least MySQL) would join using a merge join (since it would be on the primary key). Is that going to be more efficient than my approach?
I know it's best to test and see, but given the magnitude of these files, that is pretty time-consuming. I would like to do some theoretical calculation first and then see how it fares in practice.
Any insights on these or other ideas would be helpful. I don't mind if it takes slightly longer, but I would prefer the best utilization of the resources I have. I don't have a huge budget :)
Use a database. They are designed for performing joins (with the right indexes, of course!).