Is array.push() O(1) in Haxe? - data-structures

Many languages have a standard type that can resize itself when needed, like C++'s vector<T> or C#'s List<T>. In Haxe, however, I don't see such a data structure.
Does Array in Haxe work this way? Can it add/remove the last element in (amortized) O(1)?

Technically this of course depends on the platform-specific implementation of Array, but it is safe to assume that push has amortized O(1) cost, as that's pretty straightforward to accomplish (the Neko implementation shows this rather nicely).
On all platforms that come with a dynamically sized array with support for sparseness, Haxe uses it for the implementation (AFAIK that's Flash, JS and PHP), but I would guess that if those ever showed poor metrics, they would be re-implemented.
I would note that there is also List, if random access is not important. But on some platforms it is never faster than Array, only smaller.
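For intuition, here is a rough sketch (in Python rather than Haxe, and purely illustrative: the class name and the doubling growth factor are assumptions, not Haxe's actual Array internals) of why a doubling backing store gives amortized O(1) push:

```python
# A sketch of why push is amortized O(1): a growable array that doubles
# its backing store when full. Illustrative only, not any real Haxe target.

class GrowableArray:
    def __init__(self):
        self._cap = 4
        self._len = 0
        self._buf = [None] * self._cap

    def push(self, x):
        if self._len == self._cap:
            # Doubling means n pushes trigger copies totalling
            # n + n/2 + n/4 + ... < 2n, i.e. O(1) per push on average.
            self._cap *= 2
            new = [None] * self._cap
            new[:self._len] = self._buf
            self._buf = new
        self._buf[self._len] = x
        self._len += 1

    def pop(self):
        # Removing the last element never copies anything: plain O(1).
        self._len -= 1
        x = self._buf[self._len]
        self._buf[self._len] = None
        return x

    def __len__(self):
        return self._len
```

The occasional O(n) copy is paid for by the n cheap pushes that preceded it, which is exactly the "amortized" part of the claim.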

Related

When would it be a good idea to implement data structures rather than using built-in ones?

What is the purpose of creating your own linked list, or other data structures like maps, queues or hash functions, in some programming language instead of using the built-in ones? Why should I create them myself? Thank you.
Good question! There are several reasons why you might want to do this.
For starters, not all programming languages ship with all the nice data structures that you might want to use. For example, C doesn't have built-in libraries for any data structures (though it does have bsearch and qsort for arrays), so if you want to use a linked list, hash table, etc. in C you need to either build it yourself or use a custom third-party library.
Other languages (say, JavaScript) have built-in support for some but not all types of data structures. There's no native JavaScript support for linked lists or binary search trees, for example. And I'm not aware of any mainstream programming language that has a built-in library for tries, though please let me know if that's not the case!
The above examples indicate places where a lack of support, period, for some data structure would require you to write your own. But there are other reasons why you might want to implement your own custom data structures.
A big one is efficiency. Put yourself in the position of someone who has to implement a dynamic array, hash table, and binary search tree for a particular programming language. You can't possibly know what workflows people are going to subject your data structures to. Are they going to do a ton of inserts and deletes, or are they mostly going to be querying things? For example, if you're writing a binary search tree type where insertions and deletions are common, you probably would want to look at something like a red/black tree, but if insertions and deletions are rare then an AVL tree would work a lot better. But you can't know this up front, because you have to write one implementation that stands the test of time and works pretty well for all applications. That might counsel you to pick a "reasonable" choice that works well in many applications, but isn't aggressively performance-tuned for your specific application. Coding up a custom data structure, therefore, might let you take advantage of the particular structure of the problem you're solving.
In some cases, the language specification makes it impossible or difficult to use fast implementations of data structures as the language standard. For example, C++ requires its associative containers to allow for deletions and insertions of elements without breaking any iterators into them. This makes it significantly more challenging / inefficient to implement those containers with types like B-trees that might actually perform a bit better than regular binary search trees due to the effects of caches. Similarly, the implementation of the unordered containers has an interface that assumes chained hashing, which isn't necessarily how you'd want to implement a hash table. That's why, for example, there's Google's alternatives to the standard containers that are optimized to use custom data structures that don't easily fit into the language framework.
Another reason why libraries might not provide the fastest containers would be challenges in providing a simple interface. For example, cuckoo hashing is a somewhat recent hashing scheme that has excellent performance in practice and guarantees worst-case efficient lookups. But to make cuckoo hashing work, you need the ability to select multiple hash functions for a given data type. Most programming languages have a concept that each data type has "a" hash function (std::hash<T>, Object.hashCode, __hash__, etc.), which isn't compatible with this idea. The languages could in principle require users to write families of hash functions with the idea that there would be many different hashes to pick from per object, but that complicates the logistics of writing your own custom types. Leaving it up to the programmer to write families of hash functions for types that need it then lets the language stay simple.
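To make the "family of hash functions" point concrete, here is a hypothetical sketch (the class name, seed constants and mixing multiplier are all invented for illustration) of how seeding one base hash yields the multiple hash functions that cuckoo hashing needs:

```python
# Minimal cuckoo hash set sketch, illustrative only. The key idea: derive a
# *family* of hash functions by mixing seeds into the single built-in hash(),
# which is what a one-hash-per-type model doesn't directly give you.

class CuckooTable:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.seeds = (0x9E3779B9, 0x85EBCA6B)  # two seeds -> two hash functions
        self.slots = [[None] * capacity, [None] * capacity]
        self.size = 0

    def _hash(self, which, key):
        # Distinct hash functions from one base hash, via per-table seeds.
        return ((hash(key) ^ self.seeds[which]) * 0x27D4EB2F) % self.capacity

    def contains(self, key):
        # Worst-case O(1): a key can only live in one of its two home slots.
        return any(self.slots[w][self._hash(w, key)] == key for w in (0, 1))

    def insert(self, key):
        if self.contains(key):
            return
        item, which = key, 0
        for _ in range(2 * self.capacity):  # bounded displacement chain
            i = self._hash(which, item)
            if self.slots[which][i] is None:
                self.slots[which][i] = item
                self.size += 1
                return
            # Slot occupied: evict the occupant and try it in the other table.
            item, self.slots[which][i] = self.slots[which][i], item
            which = 1 - which
        self._grow()        # chain too long: rehash everything, then retry
        self.insert(item)

    def _grow(self):
        old = [x for table in self.slots for x in table if x is not None]
        self.capacity *= 2
        self.slots = [[None] * self.capacity, [None] * self.capacity]
        self.size = 0
        for x in old:
            self.insert(x)
```

A real implementation would also reseed on growth; the sketch just shows why the scheme needs more than one hash function per type.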
And finally, there's just plain innovation in the space. New data structures get invented all the time, and languages are often slow to grow and change. There's been a bunch of research into new faster binary search trees recently (check out WAVL trees as an example) or new hashing strategies (cuckoo hashing and the "Swiss Table" that Google developed), and language designers and implementers aren't always able to keep pace with them.
So, overall, the answer is a mix of "because you can't assume your favorite data structure will be there" and "because you might be able to get better performance rolling your own implementations."
There's one last reason I can think of, and that's "to learn how the language and the data structure work." Sometimes it's worthwhile building out custom data types just to sharpen your skills, and you'll often find some really clever techniques in data structures when you do!
All of this being said, I wouldn't recommend defaulting to coding your own version of a data structure every time you need one. Library versions are usually a pretty safe bet unless you're looking for extra performance or you're missing some features that you need. But hopefully this gives you a better sense as to why you may want to consider setting aside the default, well-tested tools and building out your own.
Hope this helps!

Scheme efficiency structure

I was wondering if Scheme interpreters were cheating to get better performance. As I understand it, the only real data structure in Scheme is the cons cell.
Obviously, a cons cell is good for making simple data structures like linked lists and trees, but I think it might make the code slower in some cases, for example if you want to access the cadr of an object. It would get worse with a data structure with many more elements...
That said, maybe Scheme's car and cdr are so efficient that it's not much slower than, say, a register offset in C++.
I was wondering if it is necessary to implement a special data structure that allocates native memory blocks, something similar to using malloc. I'm talking about pure Scheme and not anything related to FFI.
As I understand, the only real datastructure in a scheme is the cons cell.
That’s not true at all.
R5RS, R6RS, and R7RS scheme all include vectors and bytevectors as well as pairs/lists, which are the contiguous memory blocks you allude to.
Also, consider that Scheme is a minimal standard, and individual Scheme implementations tend to provide many more primitives than exist in the standard. These make it possible to do efficient I/O, for example, and many Schemes also provide an FFI to call C code if you want to do something that isn’t natively supported by the Scheme implementation you’re using.
It’s true that linked lists are a relatively poor general-purpose data structure, but they are simple and work alright for iteration from front to back, which is fairly common in functional programming. If you need random access, though, you’re right—using linked lists is a bad idea.
First off, there are many primitive types, many different compound types, and even user-defined types in Scheme.
In C++ the memory model and how values are stored are a crucial part of the standard. In Scheme you do not have access to the language internals as standard, but implementations can expose them so that a higher percentage of the implementation can be written in Scheme itself.
The standard doesn't dictate how an implementation stores data, so even though many implementations imitate each other (embedding primitive values in the address word, with every other value an object on the heap), it doesn't have to be that way. Using pairs as the implementation of vectors (arrays in C++) would be pushing it, though, and would make for a very unpopular implementation if not just a funny prank.
With R6RS you can make your own types; it's even extendable with records:
(define-record-type (node make-node node?)
  (fields
    (immutable value node-value)
    (immutable left node-left)
    (immutable right node-right)))
node? is disjoint: no value answers #t to it other than values made with the constructor make-node, and the record has 3 fields instead of needing two cons cells to store the same data.
Now C++ perhaps has the edge by default when it comes to storing elements of the same type in an array, but you can work around this in many ways, e.g. with the same trick you see in this video about optimizing Java for memory usage. I would start by doing good data modeling with records and worry about performance only when it becomes an issue.

Could range-based algorithm be fully independent of (and yet optimized for any) container type?

I was wondering whether boost::range or range-v3 will reconcile free functions and member functions in a similar way that std::begin reconciles STL containers and C-like arrays (in terms of coding genericity, I mean).
More particularly, it would be convenient for me to call std::sort on a list and have it automatically call the best possible implementation, namely std::list::sort.
In the end, could member functions be seen as interfaces for their generic counterparts only (std::list::sort never called in client code)?
AFAIK, neither library you mention deals with this directly. There is a push to deal with this kind of thing more generally in C++17, including a proposal to make f(x) and x.f() equivalent, but as I mentioned in the comment above, I'm unclear if it will work with range-v3's algorithms.
I did notice an interesting comment in range-v3's sort.hpp: // TODO Forward iterators, like EoP?. So, perhaps Niebler does have ideas to support a more generic sort. ("EoP" is Elements of Programming by Alex Stepanov.)
One complication: A generic sort uses iterators to reorder values, while list::sort() reorders the links themselves. The distinction is important if you care what iterators point to after the sort, so you'd still need a way to select which sort you want. One could even argue that sort() should never call list::sort(), given the different semantics.
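Setting C++ dispatch machinery aside, the selection the question asks about can be sketched in any language: use the container's own member sort when it exists, otherwise fall back to the generic algorithm. A Python sketch (the function name is invented; this mirrors the idea, not range-v3's actual design):

```python
def smart_sort(seq):
    # Prefer the container's own sort (cf. std::list::sort), which can
    # exploit the container's structure; otherwise fall back to a generic
    # algorithm over the elements (cf. std::sort).
    member = getattr(seq, "sort", None)
    if callable(member):
        member()           # in-place, container-specific
        return seq
    return sorted(seq)     # generic, returns a new sorted list
```

Note that this sketch silently changes semantics depending on the branch taken (in-place vs. copy), which is precisely the kind of observable difference the answer above warns about.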

Is there a fast way to use Clojure vectors as matrices?

I am trying to use Clojure to process images and I would like to represent images using Clojure data structures. Basically, my first approach was using a vector of vectors and mapv to operate over each pixel value and return a new image representation with the same data structure. However, some basic operations are taking too much time.
Using the jvisualvm profiler, I got the results shown below. Could somebody give me a tip to improve the performance? I can give more details if necessary, but maybe just by looking at the costs of seq and next someone can make a good guess.
You should check out core.matrix and associated libraries for anything to do with matrix computation. core.matrix is a general purpose Clojure API for matrix computation, supporting multiple back-end implementations.
Clojure's persistent data structures are great for most purposes, but are really not suited for fast processing of large matrices. The main problems are:
Immutability: usually a good thing, but can be a killer for low level code where you need to do things like accumulate results in a mutable array for performance reasons.
Boxing: Clojure data structures generally box results (as java.lang.Double etc.) which adds a lot of overhead compared to using primitives
Sequences: traversing most Clojure data structures as sequences involves the creation of temporary heap objects to hold the sequence elements. Normally not a problem, but when you are dealing with large matrices it becomes problematic.
The related libraries that you may want to look at are:
vectorz-clj : a very fast matrix library that works as a complete core.matrix implementation. The underlying code is pure Java but with a nice Clojure wrapper. I believe it is currently the fastest way of doing general purpose matrix computation in Clojure without resorting to native code. Under the hood, it uses arrays of Java primitives, but you don't need to deal with this directly.
Clatrix: another fast matrix library for Clojure which is also a core.matrix implementation. Uses JBLAS under the hood.
image-matrix : represents a Java BufferedImage as a core.matrix implementation, so you can perform matrix operations on images. A bit experimental right now, but should work for basic use cases
Clisk : a library for procedural image processing. Not so much a matrix library itself, but very useful for creating and manipulating digital images using a Clojure-based DSL.
Depending on what you want to do, the best approach may be to use image-matrix to convert the images into vectorz-clj matrices and do your processing there. Alternatively, Clisk might be able to do what you want out of the box (it has a lot of ready-made filters / distortion effects etc.)
Disclaimer: I'm the lead developer for most of the above libraries. But I'm using them all myself for serious work, so very willing to vouch for their usefulness and help fix any issues you find.
I really think that you should use arrays of primitives for this. Clojure has array support built-in, even though it's not highlighted, and it's for cases just like this, where you have a high volume of numerical data.
Any other approach (vectors, even Java collections) will result in all of your numbers being boxed individually, which is very wasteful. Arrays of primitives (int, double, byte, whatever is appropriate) don't have this problem, and that's why they're there. People feel shy about using arrays in Clojure, but they're there for a reason, and this is it. And it'll be good portable Clojure code: int-array works in both JVM Clojure and ClojureScript.
Try arrays and benchmark.
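The boxing cost is easy to see outside Clojure too. A small illustration in Python, where the array module plays the role of a primitive array (object sizes are CPython implementation details, mentioned only as an assumption):

```python
import array

boxed = [float(i) for i in range(1000)]    # 1000 separate heap float objects
packed = array.array('d', range(1000))     # one contiguous block of C doubles

# Each packed element costs exactly 8 bytes of payload; each boxed element
# costs a pointer in the list plus a whole float object on the heap.
assert packed.itemsize == 8
assert packed[10] == boxed[10] == 10.0
```

The same contrast holds between a Clojure vector of boxed Doubles and a double-array: the packed form is both smaller and friendlier to the CPU cache.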
Clojure's transients offer a middle ground between full persistence and no persistence like you would get with a standard Java array. They allow you to build the image using fast mutate-in-place operations (which are limited to the current thread) and then call persistent! to convert it in constant time to a proper persistent structure for manipulation in the rest of the program.
It looks like you are also seeing a lot of overhead from working with sequences over the contents of the image. If transients don't make enough of a difference, you may next want to consider using normal Java arrays and structuring the code to access the array elements directly.

Do you use linked lists, doubly linked lists and so on, in business programming?

Are data structures like linked lists something that are purely academic for real programming or do you really use them? Are they things that are covered by generics so that you don't need to build them (assuming your language has generics)? I'm not debating the importance of understanding what they are, just the usage of them outside of academia. I ask from a front end web, backend database perspective. I'm sure someone somewhere builds these. I'm asking from my context.
Thank you.
EDIT: Are Generics so that you don't have to build linked lists and the like?
It will depend on the language and frameworks you're using. Most modern languages and frameworks won't make you reinvent these wheels. Instead, they'll provide things like List<T> or HashTable.
EDIT:
We probably use linked lists all the time, but don't realize it. We don't have to write implementations of linked lists on our own, because the frameworks we use have already written them for us.
You may also be getting confused about "generics". You may be referring to generic list classes like List<T>. This is just the same as the non-generic class List, but where the element is always of type T. Under the hood it may be backed by an array or a linked list, but we don't have to care about that.
We also don't have to worry about allocation of physical memory, or how interrupts work, or how to create a file system. We have operating systems to do that for us. But we may be taught that information in school just the same.
Certainly. Many "List" implementations in modern languages are actually linked lists, sometimes in combination with arrays or hash tables for direct access (by index as opposed to iteration).
Linked lists (especially doubly linked lists) are very commonly used in "real-world" data structures.
I would dare to say every common language has a pre-built implementation of linked list, either as a language primitive, native template library (e.g. C++), native library (e.g. Java) or some 3rd party implementation (probably open-source).
That being said, several times in the past I wrote a linked list implementation from scratch myself when creating infrastructure code for complex data structures. Sometimes it's a good idea to have full control over the implementation, and sometimes you need to add a "twist" to the classic implementation for it to satisfy your specific requirement. There's no right or wrong when it comes to whether to code your own implementation, as long as you understand the alternatives and trade-offs. In most cases, and certainly in very modern languages like C# I would avoid it.
Another point is when you should use lists versus arrays/vectors or hash tables. From your question I understand you are aware of the trade-offs here, so I won't go too much into it, but basically, if your main usage is traversing the list in order, and the list size may vary significantly, a list may be a viable option. Another consideration is the type of insertion: if a common use case is "inserting in the middle", then lists have a significant advantage over arrays/vectors. I could go on, but this information is in the classic CS books :)
Clarification: my answer is language-agnostic and does not relate specifically to generics, which to my understanding include a linked list implementation.
A singly-linked list is the only way to have a memory-efficient immutable list that can be composed to "mutate" it. Look at how Erlang does it. It may be slightly slower than an array-backed list, but it has very useful properties in multithreaded and purely functional settings.
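The structural-sharing property is worth seeing concretely. A minimal sketch of an Erlang/Lisp-style persistent list in Python (tuples as cons cells; all names here are illustrative):

```python
# A persistent (immutable) singly-linked list: prepending builds a new head
# cell that shares the *entire* existing tail, so no copying ever happens.

NIL = None

def cons(head, tail):
    return (head, tail)

def to_list(cell):
    out = []
    while cell is not None:
        out.append(cell[0])
        cell = cell[1]
    return out

base = cons(2, cons(3, NIL))
a = cons(1, base)   # conceptually [1, 2, 3]
b = cons(0, base)   # conceptually [0, 2, 3]; shares `base` with `a`
```

Both `a` and `b` exist simultaneously, yet the `[2, 3]` suffix is stored exactly once, which is why this shape is so cheap for immutable and multithreaded code: nothing ever needs copying or locking.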
Yes, there are real-world applications that use linked lists; I sometimes have to maintain a huge application that makes very heavy use of linked lists.
And yes, linked lists are included in just about any class library from C++/STL to .net.
And I wish it used arrays instead.
In the real world linked lists are SLOW because of things like paging and CPU cache size (linked lists tend to spread your data, which makes it more likely that you will need to access data from different areas of memory; that is much slower on today's computers than using arrays that store all the data in one sequence).
Google "locality of reference" for more info.
Never used hand-made lists except for homeworks at university.
Depending on usage a linked list could be the best option. Deletes from the front of the list are much faster with a linked list than an array list.
In a Java program that I maintain, profiling showed that I could increase performance by moving from an ArrayList to a LinkedList for a List that had lots of deletes at the beginning.
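That trade-off is easy to reproduce in miniature. A Python sketch, with collections.deque standing in for the linked structure (deque is technically a block-linked list, so this is an analogy, not the Java classes themselves):

```python
from collections import deque

def drain_front_list(n):
    # list.pop(0) shifts every remaining element each time, like deleting
    # the head of an ArrayList: O(n) per delete, O(n^2) to drain.
    xs = list(range(n))
    count = 0
    while xs:
        xs.pop(0)
        count += 1
    return count

def drain_front_deque(n):
    # deque.popleft() just unhooks the head, like a LinkedList: O(1) per
    # delete, O(n) to drain.
    xs = deque(range(n))
    count = 0
    while xs:
        xs.popleft()
        count += 1
    return count
```

Timing the two drains with timeit for a large n shows the gap growing with the list size, which matches the profiling result described above.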
I've been developing line of business applications (.NET) for years and I can only think of one instance where I've used linked list and even then I did not have to create the object.
This has just been my experience.
I would say it depends on the usage, in some cases they are quicker than typical random access containers.
Also I think they are used by some libraries as an underlying collection type, so what may look like a non-linked list might actually be one underneath.
In a C/C++ application I developed at my last company we used doubly linked lists all the time. They were essential to what we were doing, which was real-time 3D graphics.
Yes, all sorts of data structures are very useful in daily software development. In most languages that I know (C/C++/Python/Objective-C) there are frameworks that implement those data structures so that you don't have to reinvent the wheel.
And yes, data structures are not only for academics; they are very useful and you would not be able to write software without them (depending on what you do).
You use data structures in message queues, data maps, hash tables, keeping data ordered, fast access/removal/insertion and so on, depending on what needs to be done.
Yes, I do. It all depends on the situation. If I won't be storing a lot of data in them, or if the specific application needs a FIFO structure, I'll use them without a second thought because they are fast to implement.
However, in applications from other developers I know of, there are times when a linked list would fit perfectly except that poor locality causes a lot of cache misses.
I cannot imagine many programs that don't deal with lists.
The minute you need to deal with more than one of something, lists in all forms and shapes become necessary, because you need somewhere to store these things. That list might be a singly/doubly linked list, an array, a set, a hashtable if you need to index your things by a key, a priority queue if you need to sort them, etc.
Typically you'd store these lists in a database system, but somewhere you need to fetch them from the db, store them in your application and manipulate them, even if it's something as simple as retrieving a little list of things to populate a drop-down combobox.
These days, in languages such as C#, Python, Java and many more, you're usually abstracted away from having to implement your own lists. These languages come with a great many container abstractions you can store stuff in, either via standard libraries or built into the language.
You're still at an advantage learning these topics; e.g. if you're working with C# you'd want to know how an ArrayList works, and whether you'd choose ArrayList or something else depending on your need to add/insert/search/random-index such a list.
