MATLAB: efficient dynamically expanding primitive

Just wondering if MATLAB has an efficient dynamically expanding primitive, something akin to Java Collections? I realise that the Java API is always an option, but it can be a complete pain to use. Of course something like the ArrayList would be simple to implement, but I was specifically wondering about built-in data structures.
Thanks very much.

MATLAB's Map containers (containers.Map) should fit your criterion of a built-in data structure.
The Java Collections Framework may not be the best example of an 'expanding primitive': it is a framework implemented in the Java language, and even in Java, primitive data types are a break from pure object orientation.

Related

When would it be a good idea to implement data structures rather than using built-in ones?

What is the purpose of creating your own linked list, or another data structure like a map, queue, or hash table, for some programming language, instead of using the built-in ones? Why should I create it myself? Thank you.
Good question! There are several reasons why you might want to do this.
For starters, not all programming languages ship with all the nice data structures that you might want to use. For example, C doesn't have built-in libraries for any data structures (though it does have bsearch and qsort for arrays), so if you want to use a linked list, hash table, etc. in C you need to either build it yourself or use a custom third-party library.
Other languages (say, JavaScript) have built-in support for some but not all types of data structures. There's no native JavaScript support for linked lists or binary search trees, for example. And I'm not aware of any mainstream programming language that has a built-in library for tries, though please let me know if that's not the case!
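As a concrete example of filling such a gap yourself, here is a minimal Java trie sketch (illustrative only; the class and method names are mine, not from any standard library):

import java.util.HashMap;
import java.util.Map;

// A minimal trie over strings: the kind of structure you typically have
// to hand-roll, since mainstream standard libraries don't ship one.
class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;  // true if a word ends at this node
    }

    private final Node root = new Node();

    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    public boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return node.terminal;
    }
}

Usage is just trie.insert("cat") followed by trie.contains("cat"), and extending it with prefix queries is only a few more lines.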
The above examples indicate places where a lack of support, period, for some data structure would require you to write your own. But there are other reasons why you might want to implement your own custom data structures.
A big one is efficiency. Put yourself in the position of someone who has to implement a dynamic array, hash table, and binary search tree for a particular programming language. You can't possibly know what workflows people are going to subject your data structures to. Are they going to do a ton of inserts and deletes, or are they mostly going to be querying things? For example, if you're writing a binary search tree type where insertions and deletions are common, you probably would want to look at something like a red/black tree, but if insertions and deletions are rare then an AVL tree would work a lot better. But you can't know this up front, because you have to write one implementation that stands the test of time and works pretty well for all applications. That might counsel you to pick a "reasonable" choice that works well in many applications, but isn't aggressively performance-tuned for your specific application. Coding up a custom data structure, therefore, might let you take advantage of the particular structure of the problem you're solving.
In some cases, the language specification makes it difficult or impossible for the standard library to use the fastest known implementation of a data structure. For example, C++ requires its associative containers to allow deletions and insertions of elements without invalidating iterators into them. This makes it significantly more challenging / inefficient to implement those containers with types like B-trees, which might actually perform a bit better than regular binary search trees due to cache effects. Similarly, the unordered containers' specified interface (bucket iteration, load factors) assumes chained hashing, which isn't necessarily how you'd want to implement a hash table. That's why, for example, there are Google's alternatives to the standard containers, optimized around custom data structures that don't fit easily into the language framework.
Another reason why libraries might not provide the fastest containers would be challenges in providing a simple interface. For example, cuckoo hashing is a somewhat recent hashing scheme that has excellent performance in practice and guarantees worst-case efficient lookups. But to make cuckoo hashing work, you need the ability to select multiple hash functions for a given data type. Most programming languages have a concept that each data type has "a" hash function (std::hash<T>, Object.hashCode, __hash__, etc.), which isn't compatible with this idea. The languages could in principle require users to write families of hash functions with the idea that there would be many different hashes to pick from per object, but that complicates the logistics of writing your own custom types. Leaving it up to the programmer to write families of hash functions for types that need it then lets the language stay simple.
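To make that concrete, here is a hypothetical Java sketch of what a family-of-hash-functions contract might look like; the HashFamily interface and the seed-mixing scheme are invented for illustration, not taken from any library:

import java.util.Objects;

// A *family* of hash functions for T, indexed by i; contrast with
// Object.hashCode(), which gives each type exactly one hash function.
interface HashFamily<T> {
    int size();                // how many hash functions are available
    int hash(T value, int i);  // the i-th hash of value, 0 <= i < size()
}

// A toy family for strings: mix the built-in hash with a per-index seed.
// A real cuckoo table would want stronger, independent mixing.
class StringHashFamily implements HashFamily<String> {
    private static final int[] SEEDS = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35};

    @Override public int size() { return SEEDS.length; }

    @Override public int hash(String value, int i) {
        int h = Objects.hashCode(value) ^ SEEDS[i];
        h ^= (h >>> 16);  // extra avalanche step
        return h * 0x45D9F3B;
    }
}

A cuckoo table would then keep a HashFamily<T> alongside its buckets and probe with hash(key, 0), hash(key, 1), and so on; requiring users to supply such a family for every key type is exactly the interface burden described above.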
And finally, there's just plain innovation in the space. New data structures get invented all the time, and languages are often slow to grow and change. There's been a bunch of research into new faster binary search trees recently (check out WAVL trees as an example) or new hashing strategies (cuckoo hashing and the "Swiss Table" that Google developed), and language designers and implementers aren't always able to keep pace with them.
So, overall, the answer is a mix of "because you can't assume your favorite data structure will be there" and "because you might be able to get better performance rolling your own implementations."
There's one last reason I can think of, and that's "to learn how the language and the data structure work." Sometimes it's worthwhile building out custom data types just to sharpen your skills, and you'll often find some really clever techniques in data structures when you do!
All of this being said, I wouldn't recommend defaulting to coding your own version of a data structure every time you need one. Library versions are usually a pretty safe bet unless you're looking for extra performance or you're missing some features that you need. But hopefully this gives you a better sense as to why you may want to consider setting aside the default, well-tested tools and building out your own.
Hope this helps!

How To Use Classic Custom Data Structures As Java 8 Streams

I saw a SO question yesterday about implementing a classic linked list in Java. It was clearly an assignment from an undergraduate data structures class. It's easy to find questions and implementations for lists, trees, etc. in all languages.
I've been learning about Java lambdas and trying to use them at every opportunity to get the idiom under my fingers. This question made me wonder: How would I write a custom list or tree so I could use it in all the Java 8 lambda machinery?
All the examples I see use the built-in collections. Those work for me. I'm more curious about how a professor teaching data structures ought to rethink their techniques to reflect lambdas and functional programming.
I started with an Iterator, but it doesn't appear to be fully featured.
Does anyone have any advice?
Exposing a stream view of arbitrary data structures is pretty easy. The key interface you have to implement is Spliterator, which, as the name suggests, combines two things -- sequential element access (iteration) and decomposition (splitting).
Once you have a Spliterator, you can turn it into a stream easily with StreamSupport.stream(). In fact, here's the default stream() method from the Collection interface (which most collections just inherit):
default Stream<E> stream() {
    return StreamSupport.stream(spliterator(), false);
}
All the real work is in the spliterator() method, and there's a broad range of spliterator quality. The absolute minimum you need to implement is tryAdvance; if that's all you implement, the stream will work sequentially but will lose out on most of the stream optimizations. Look in the JDK sources (Arrays.stream(), IntStream.range()) for examples of how to do better.
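For a concrete starting point, here's a minimal sketch of how a classic hand-rolled singly linked list (the SimpleList name and its fields are mine, not from the question) can expose a stream by implementing only tryAdvance, using the Spliterators.AbstractSpliterator convenience base class:

import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class SimpleList<E> {
    private static class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private Node<E> head, tail;

    public void add(E value) {
        Node<E> n = new Node<>(value);
        if (head == null) head = tail = n;
        else { tail.next = n; tail = n; }
    }

    // The bare minimum: tryAdvance only. Streams over this will run
    // sequentially and won't split well for parallelism.
    public Spliterator<E> spliterator() {
        return new Spliterators.AbstractSpliterator<E>(Long.MAX_VALUE, Spliterator.ORDERED) {
            private Node<E> current = head;

            @Override
            public boolean tryAdvance(Consumer<? super E> action) {
                if (current == null) return false;
                action.accept(current.value);
                current = current.next;
                return true;
            }
        };
    }

    public Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }
}

With that in place, the whole Stream API (filter, map, collect, and so on) works on SimpleList just as it does on the built-in collections.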
For inspiration, I'd look at http://www.javaslang.io, a library that does exactly what you want to do: implement custom lists, trees, etc. in a Java 8 manner.
It specifically doesn't closely couple with the JDK collections outside of importing/exporting methods, but re-implements all the immutable collection semantics that a Scala (or other FP language) developer would expect.

Is there a fast way to use Clojure vectors as matrices?

I am trying to use Clojure to process images and I would like to represent images using Clojure data structures. Basically, my first approach was using a vector of vectors and mapv to operate over each pixel value and return a new image representation with the same data structure. However, some basic operations are taking too much time.
Using the JVisualVM profiler, I got the results shown below. Could somebody give me a tip to improve the performance? I can give more details if necessary, but maybe just by looking at the costs of seq and next someone can make a good guess.
You should check out core.matrix and associated libraries for anything to do with matrix computation. core.matrix is a general purpose Clojure API for matrix computation, supporting multiple back-end implementations.
Clojure's persistent data structures are great for most purposes, but are really not suited for fast processing of large matrices. The main problems are:
Immutability: usually a good thing, but can be a killer for low level code where you need to do things like accumulate results in a mutable array for performance reasons.
Boxing: Clojure data structures generally box their values (as java.lang.Double etc.), which adds a lot of overhead compared to using primitives.
Sequences: traversing most Clojure data structures as sequences involves the creation of temporary heap objects to hold the sequence elements. Normally this isn't a problem, but when you are dealing with large matrices it becomes problematic.
The related libraries that you may want to look at are:
vectorz-clj : a very fast matrix library that works as a complete core.matrix implementation. The underlying code is pure Java but with a nice Clojure wrapper. I believe it is currently the fastest way of doing general purpose matrix computation in Clojure without resorting to native code. Under the hood, it uses arrays of Java primitives, but you don't need to deal with this directly.
Clatrix: another fast matrix library for Clojure which is also a core.matrix implementation. Uses JBLAS under the hood.
image-matrix : represents a Java BufferedImage as a core.matrix implementation, so you can perform matrix operations on images. A bit experimental right now, but it should work for basic use cases.
Clisk : a library for procedural image processing. Not so much a matrix library itself, but very useful for creating and manipulating digital images using a Clojure-based DSL.
Depending on what you want to do, the best approach may be to use image-matrix to convert the images into vectorz-clj matrices and do your processing there. Alternatively, Clisk might be able to do what you want out of the box (it has a lot of ready-made filters / distortion effects etc.)
Disclaimer: I'm the lead developer for most of the above libraries. But I'm using them all myself for serious work, so very willing to vouch for their usefulness and help fix any issues you find.
I really think that you should use arrays of primitives for this. Clojure has array support built-in, even though it's not highlighted, and it's for cases just like this, where you have a high volume of numerical data.
Any other approach (vectors, even Java collections) will result in all of your numbers being boxed individually, which is very wasteful. Arrays of primitives (int, double, byte, whatever is appropriate) don't have this problem, and that's why they're there. People feel shy about using arrays in Clojure, but they exist for a reason, and this is it. And it'll be good portable Clojure code: int-array works in both JVM Clojure and ClojureScript.
Try arrays and benchmark.
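To see why boxing matters so much on the JVM (which is where Clojure runs), here is a plain-Java sketch of the contrast; the class name and sizes are illustrative:

import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    public static void main(String[] args) {
        int n = 10_000_000;

        // Primitive array: one contiguous block of unboxed doubles.
        double[] primitive = new double[n];
        for (int i = 0; i < n; i++) primitive[i] = i;
        double sum = 0;
        for (double d : primitive) sum += d;   // no per-element allocation

        // Boxed collection: every element is a separate java.lang.Double
        // on the heap, so traversal chases pointers and unboxes each value.
        List<Double> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) boxed.add((double) i);
        double boxedSum = 0;
        for (double d : boxed) boxedSum += d;  // unboxing on every step

        System.out.println(sum + " " + boxedSum);
    }
}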
Clojure's transients offer a middle ground between full persistence and no persistence (as with a standard Java array). They let you build the image using fast mutate-in-place operations (which are limited to the current thread) and then call persistent! to convert the result in constant time to a proper persistent structure for use in the rest of the program.
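As a rough plain-Java analogy for the shape of that pattern (analogy only: persistent! produces a true persistent structure in constant time, while the unmodifiable wrapper below merely blocks further mutation through the returned reference):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BuildThenFreeze {
    // Build with cheap in-place mutation, then hand out a frozen view,
    // mirroring the (persistent! (reduce conj! (transient []) ...)) idiom.
    static List<Integer> brighten(int[] rawPixels) {
        List<Integer> out = new ArrayList<>(rawPixels.length);
        for (int p : rawPixels) {
            out.add(Math.min(255, p * 2));  // example per-pixel operation
        }
        return Collections.unmodifiableList(out);  // constant-time "freeze"
    }
}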
It looks like you are also seeing a lot of overhead from working with sequences over the contents of the image. If transients don't make enough of a difference, you may next want to consider using normal Java arrays and structuring the code to access the array elements directly.

Persistent data structures in Scala

Are all immutable data structures in Scala persistent? If not, which of them are and which not? What are the behavioural characteristics of those which are persistent? Also, how do they compare to the persistent data structures in Clojure?
Scala's immutable data structures are all persistent, in the sense that the old value is maintained by an 'update' operation. In fact, I do not know of a difference between immutable and persistent; for me the two terms are aliases.
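To see concretely what 'persistent' means, here is a tiny plain-Java cons-list sketch (the names are invented for illustration): an 'update' returns a new list, the old list remains fully usable, and the two share structure.

// A minimal immutable cons list; prepending shares the entire tail.
final class ConsList<E> {
    final E head;
    final ConsList<E> tail;  // null marks the end of the list

    ConsList(E head, ConsList<E> tail) {
        this.head = head;
        this.tail = tail;
    }

    // The persistent "update": a new list whose tail is this entire list.
    ConsList<E> prepend(E value) {
        return new ConsList<>(value, this);
    }
}

// ConsList<String> a = new ConsList<>("x", null);
// ConsList<String> b = a.prepend("y");  // a is untouched; b.tail == a

Both a and b are valid after the prepend, and they share the node holding "x"; that structural sharing is what makes persistent updates cheap. Scala's Vector and HashMap generalize the idea with 32-ary trees, so an update copies only a short path rather than the whole structure.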
Two of Scala's 2.8 immutable data structures are vectors and hash tries, represented as 32-ary trees. These were originally designed by Phil Bagwell, who was working with my team at EPFL, then adopted for Clojure, and now finally adopted for Scala 2.8. The Scala implementation shares a common root with the Clojure implementation, but is certainly not a port of it.
Please have a look at these excellent articles by Daniel Spiewak:
http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala
http://www.codecommit.com/blog/scala/more-persistent-vectors-performance-analysis
He's also referring to the Clojure implementation.
List, Vector, HashMap and HashSet are all persistent in Scala 2.8. There are other persistent data structures, but since these cover all the major uses, I'm not sure there's any point in enumerating all of them.
For the last part of your question, I remember Rich Hickey mentioning in a presentation that the Clojure data structures have been ported to Scala. Also, Michael Fogus mentions plans for Scala 2.8 to adopt some of Clojure's data structures in this interview.
Sorry this is so short on details... I'm not sure what the status is on the above mentioned Scala 2.8 plans, but I remembered Rich and Michael mentioning this and thought it might be an interesting thing for you to google for if you're interested.

Reimplementing data structures in the real world

The topic of algorithms class today was reimplementing data structures, specifically ArrayList in Java. The fact that you can customize a structure in various ways definitely got me interested, particularly the variations of the add() and iterator.remove() methods.
But is reimplementing and customizing a data structure something that is of more interest to the academics vs the real-world programmers? Has anyone reimplemented their own version of a data structure in a commercial application/program, and why did you pick that route over your particular language's implementation?
Knowing how data structures are implemented, and how they can be implemented, is definitely of interest to everyone, not just academics. While you will most likely not reimplement a data structure if the language already provides an implementation with suitable functions and performance characteristics, it is very possible that you will have to create your own data structure by composing other data structures, or that you will need to implement one with slightly different behavior than a well-known data structure. In that case, you certainly will need to know how the original data structure is implemented. Alternatively, you may end up needing a data structure that does not exist, or one that provides behavior similar to an existing data structure but is used in a way that requires it to be optimized for a different set of operations. Again, such a situation would require you to know how to implement (and alter) the data structure, so yes, it is of interest.
Edit
I am not advocating that you reimplement existing data structures! Don't do that. What I'm saying is that the knowledge does have practical application. For example, you may need to create a bidirectional map (which you can implement by composing two unidirectional maps), or you may need a stack that keeps track of a variety of statistics (such as min, max, and mean) by using an existing stack with an element type that contains the value as well as these statistics. These are some trivial examples of things that you might need to implement in the real world; a sketch of the statistics-tracking stack follows below.
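To make the stack example concrete, here is a hedged Java sketch (the class name and representation are one arbitrary choice among many): each pushed entry carries the minimum of everything at or below it, so min() never has to scan the stack.

import java.util.ArrayDeque;
import java.util.Deque;

// A stack of ints that also answers min() in O(1), built by composing an
// existing stack (ArrayDeque) with entries that carry a running minimum.
class MinTrackingStack {
    private static final class Entry {
        final int value;
        final int minSoFar;
        Entry(int value, int minSoFar) { this.value = value; this.minSoFar = minSoFar; }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    public void push(int value) {
        int min = stack.isEmpty() ? value : Math.min(value, stack.peek().minSoFar);
        stack.push(new Entry(value, min));
    }

    public int pop() { return stack.pop().value; }

    public int min() { return stack.peek().minSoFar; }  // no scan needed
}

The same trick extends to max or a running mean by storing a few more fields per entry.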
I have re-implemented some of a language's built-in data structures, functions, and classes on a number of occasions. As an embedded developer, the main reason I would do that is for speed or efficiency. The standard libraries and types were designed to be useful in a variety of situations, but there are many instances where I can create a more specialized version that is custom-tailored to take advantage of the features and limitations of my current platform. If the language doesn't provide a way to open up and modify existing classes (like you can in Ruby, for instance), then re-implementing the class/function/structure can be the only way to go.
For example, one system I worked on used a MIPS CPU that was speedy when working with 32-bit numbers but slower when working with smaller ones. I rewrote several data structures and functions to use 32-bit integers instead of 16-bit integers, and also specified that the fields be aligned to 32-bit boundaries. The result was a noticeable speed boost in a section of code that was bottlenecking other parts of the software.
That being said, it was not a trivial process. I ended up having to modify every function that used that structure and I ended up having to re-write several standard library functions as well. In this particular instance, the benefits outweighed the work. In the general case, however, it's usually not worth the trouble. There's a big potential for hard-to-debug problems, and it's almost always more work than it looks like. Unless you have specific requirements or restrictions that the existing structures/classes don't meet, I would recommend against re-implementing them.
As Michael mentions, it is indeed useful to know how to re-implement structures even if you never do so. You may find a problem in the future that can be solved by applying the principles and techniques used in existing data structures.
