Why is D missing container classes?

I'm used to C++ STL containers. D has arrays, associative arrays, and strings, but where is the rest? I know about std.container, but as far as I can tell it only has one container, the red-black tree, which I could use if I needed something similar to std::set. But, what if I need a list? Am I supposed to use an array instead?
std::vector -> array
std::deque -> ?
std::queue -> ?
std::stack -> ? (maybe an array plus std.container functions?)
std::priority_queue -> BinaryHeap
std::list -> ?
std::set -> std.container RedBlackTree
std::multiset -> ?
std::unordered_set -> ?
std::map -> associative arrays
std::multimap -> ?
std::unordered_map -> ?
Are there any plans to support the missing ones?

I believe that the main holdup for getting more containers into std.container is that Andrei Alexandrescu has been sorting out how best to deal with custom allocators, and he wants to do that before implementing all of the sundry container types, because otherwise it's going to require a lot of code changes once he does.
In the interim, you have the built-in arrays and associative arrays, and std.container contains Array (which is essentially std::vector), SList (which is a singly-linked list), RedBlackTree (which can be used for any type of set or map which uses a tree - which is what the STL's various set and map types do), and BinaryHeap.
So, there's no question that the situation needs to be improved (and it will), but I don't know how soon. Eventually, std.container should have container types which correspond to all of the STL container types.

Containers are a to-do item for D library development, but no one has gotten a comprehensive container library into Phobos because no one agrees on what the design should be, and everyone who contributes to the standard library (which has been growing very rapidly) has found more interesting things to work on.
std::vector -> array as you say
std::deque, std::queue: We don't have these yet, unfortunately.
std::stack: This can be trivially implemented on top of SList or an array.
std::set: This can be trivially implemented on top of RedBlackTree.
std::multiset: I think RedBlackTree can be set to allow duplicates.
std::unordered_set: This can be trivially implemented on top of the built-in associative array; use byte[0][SomeType] so that the values occupy no space.
std::map: Can be trivially implemented on top of RedBlackTree.
std::multimap: You can probably use associative arrays of arrays for this.
std::unordered_map: Use the built-in associative arrays.

Related

Coq list sorting practices & sortBy?

What is the equivalent of Haskell's sortBy in Coq?
In general, I find the Coq standard library around sorting confusing.
I would have hoped for some "axiomatisation" of a sorted list, and the availability of different sorts to which I can provide an ordering function.
However, this does not seem to be the case.
There is a theory of a "Sorted" list, which uses a relation Variable R : A -> A -> Prop, but there are no restrictions on this R. I would have expected it to be required to be an ordering, but no such constraint exists.
There is another file with a "mergesort implementation", which requires a module to be passed to it.
There is no "higher level" version which provides helpers such as sortBy.
Is there some implementation of sortBy I can use, or do I need to create this manually?
The MergeSort module in the Coq standard library does what you need. It works like sortBy in Haskell except that instead of passing an ordering function and obtaining the specialized sort, you pass a module that encapsulates the ordering function along with the proof that this function is total. See the example at the bottom of the module documentation.
Besides the Coq standard library, the Mathematical Components library also has an implementation of merge sort. It is called sort, and it lives in the module mathcomp.path. Its signature is forall T : Type, (T -> T -> bool) -> list T -> list T, which is closer to the original sortBy.

Are some combinations of key behavior and value behavior not appropriate for a multimap implementation?

I've been looking at Guava's mutable multimap implementations and noticed that most of the key/value implementation combinations are not present.
Java offers 3 Map implementations, 2 List implementations and 3 Set implementations. The Map implementation controls the key behavior and the Set and List implementations control the value behavior. In theory, there can be 3 x (2+3) = 15 combinations. Guava offers 5 of these.
Do the other combinations make no sense (are they just worse)? Are they not possible to implement? Are they fine, but there is simply no need for so many combinations?
(Note that while I refer to Java, this is a question about data structures and is not restricted to a specific language. Any language that provides hash tables, arrays, linked lists, etc. can be used.)
Use MultimapBuilder to get all those combinations; e.g. TreeMap keys -> ArrayList values can be constructed with:
ListMultimap<String, Integer> treeListMultimap =
    MultimapBuilder.treeKeys().arrayListValues().build();
Want a SetMultimap with EnumSet "values"? Use:
SetMultimap<Integer, MyEnum> hashEnumMultimap =
    MultimapBuilder.hashKeys().enumSetValues(MyEnum.class).build();
(Examples taken from documentation.)
If you want even more possibilities, not restricted to JDK map / collection implementations, you can always construct your own multimaps using the static methods Multimaps.new{List,Set,SortedSet}Multimap:
ListMultimap<String, Integer> myListMultimap =
    Multimaps.newListMultimap(new HashMap<>(), MyList::new);
EDIT:
(Direct answers to your questions below.)
Do the other combinations make no sense (are they just worse)?
They make sense; they're just rarer in the real world.
Are they not possible to implement?
They are possible, why not?
Are they fine but there is no need for so many combinations?
See above: you can construct such a multimap using new*Multimap or MultimapBuilder. The Guava team gathers usage statistics across Google's internal codebase, so I'd guess they chose the most commonly used combinations.

Why does list ++ require scanning all elements of the list on its left?

The Haskell tutorial says to be cautious when using "Hello" ++ " World": constructing the new list has to visit every single element (here, every character of "Hello"), so if the list on the left of ++ is long, using ++ will hurt performance.
Perhaps I'm not understanding correctly: did Haskell's developers never tune the performance of list operations? Why does this operation remain slow? Is it for some kind of consistency with lambda functions or currying?
Any hints? Thanks.
In some languages, a "list" is a general-purpose sequence type intended to offer good performance for concatenation, splitting, etc. In Haskell, and most traditional functional languages, a list is a very specific data structure, namely a singly-linked list. If you want a general-purpose sequence type, you should use Data.Sequence from the containers package (which is already installed on your system and offers very good big-O asymptotics for a wide variety of operations), or perhaps some other one more heavily optimized for common usage patterns.
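To make that concrete, here is a minimal sketch with Data.Sequence (assuming GHC with the containers package; hello, greeting, and excited are just illustrative names):
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (><), (|>))

hello :: Seq Char
hello = Seq.fromList "Hello"

-- (><) concatenates two Seqs in O(log(min(n, m))), vs O(n) for list ++
greeting :: Seq Char
greeting = hello >< Seq.fromList " World"

-- (|>) appends a single element at the right end in O(1)
excited :: Seq Char
excited = greeting |> '!'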
If you have an immutable list which has a head and a reference to the tail, you cannot change its tail. If you want to add something to the 'end' of the list, you have to walk to the end and then push the items one by one onto the head of your right-hand list. It is a fundamental property of immutable lists: concatenation is expensive.
Haskell lists are like singly-linked lists: they are either empty or they consist of a head and a (possibly empty) tail. Hence, when appending something to a list, you'll first have to walk the entire list to get to the end. So you end up traversing the entire list (the list to which you append, that is), which needs O(n) runtime.
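You can see why in the definition of (++) itself. The following is essentially the standard Prelude definition (GHC's actual source adds rewrite rules for optimization, but the behavior is the same):
import Prelude hiding ((++))

-- Walks and rebuilds the spine of the left list, so the cost is linear
-- in its length; the right list is shared as-is, never copied.
(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
So "Hello" ++ " World" rebuilds the five cons cells of "Hello" while " World" is shared untouched; the cost depends only on the length of the left operand.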

Use cases of std::multimap

I don't quite get the purpose of this data structure. What's the difference between std::multimap<K, V> and std::map<K, std::vector<V>>? The same goes for std::multiset: it could just be std::map<K, int>, where the int counts the number of occurrences of K. Am I missing something about the uses of these structures?
A counter-example seems to be in order.
Consider a PhoneEntry in an AddressList grouped by name.
struct PhoneEntry { std::string name, number, type; };

// The comparator must be a callable type, not a function returning int
struct AddressListCompare {
    bool operator()(const PhoneEntry& p1, const PhoneEntry& p2) const
    { return p1.name < p2.name; }
};

std::multiset<PhoneEntry, AddressListCompare> addressList;
addressList.insert( PhoneEntry{"Cpt.G", "123-456", "Cellular"} );
addressList.insert( PhoneEntry{"Cpt.G", "234-567", "Work"} );
// Getting the entries: all numbers for "Cpt.G"
auto range = addressList.equal_range( PhoneEntry{"Cpt.G", "", ""} );
This would not be feasible with a set + count. Your object + count approach seems faster if this behavior is not required. For instance, the multiset::count() member is documented as "Complexity: logarithmic in size + linear in count."
You could make the substitutions that you suggest and extract similar behavior. But the interfaces would be very different from those of the regular standard containers. A major design theme of these containers is that they share as much interface as possible, making them as interchangeable as possible so that the appropriate container can be chosen without having to change the code that uses it.
For instance, std::map<K, std::vector<V>> would have iterators that dereference to std::pair<const K, std::vector<V>> instead of std::pair<const K, V>. std::map<K, std::vector<V>>::count() wouldn't return the correct result, failing to account for the duplicates in the vector. Of course you could change your code to do the extra steps needed to correct for this, but now you are interfacing with the container in a much different way. You can't later drop in unordered_map or some other map implementation to see if it performs better.
In a broader sense, you are breaking the container abstraction by handling container implementation details in your code rather than having a container that handles its own business.
It's entirely possible that your compiler's implementation of std::multimap is really just a wrapper around std::map<K, std::vector<V>>. Or it might not be. It could be more efficient and friendly to object pool allocation (which vectors are not).
Using std::map<K, int> instead of std::multiset is the same case: count() would not return the expected value, iterators would not iterate over the duplicates, and they would dereference to std::pair<const K, int> instead of directly to K.
A multimap or multiset allows you to have elements with duplicate keys.
i.e. a set is an unordered group of elements that are all unique, in the sense that {A, B, C} == {B, C, A}.

Why does F# prefer lists over arrays?

I'm trying to learn F# and was watching a video when something odd (at least, to me) came up. The video in question is here and the relevant part starts at 2:30 for those interested. But basically, the guy says that F# makes it awkward to work with arrays and that the designers did so on purpose because lists are easier to "prepend and append".
The question that immediately sprang to mind: isn't easy prepending and appending something that should be frowned upon in an immutable language? Specifically, I'm thinking of C#'s Lists where you can do something like List.Add(obj); and mutate the list. With an array you'd have to create an entirely new array, but that's also what would need to happen in an immutable language.
So why do the designers of F# prefer lists? What is the fundamental difference in an immutable environment between a list and an array? What am I missing? Are lists in F# really linked lists?
I would disagree that "F# makes it awkward to work with arrays". In fact, F# makes working with arrays quite nice compared to most languages.
For example, F# has literal array construction: let arr = [|1;2;3;4;|]. And perhaps even cooler, pattern matching on arrays:
match arr with
| [|1;2;_;_|] -> printfn "Starts with 1;2"
| [|_;_;3;4|] -> printfn "Ends with 3;4"
| _ -> printfn "Array not recognized"
As to why immutable singly linked lists are preferred in functional programming like F#, there's a lot to say, but the short answer is that it allows an O(1) prepend efficiency, and allows the implementation to share nodes, so it is easy on memory. For example,
let x = [2;3]
let y = 1::x
Here y is created by prepending 1 to x, but x is neither modified nor copied, so making y was very cheap. We can see a bit how this is possible, since x points to the head, 2, of the initially constructed list and can only move forward, and since the elements of the list it points to can't be mutated, it doesn't matter that y shares nodes with it.
In functional languages, lists are usually singly-linked lists, so it is not necessary to copy the complete list: prepending (often called cons) is an O(1) operation, and you can still use the old list, because lists are immutable.
First of all, arrays are a pretty low-level data structure, and they are really only useful if you know the length of the array when creating it. That's not often the case, and that's a reason why C# programmers use System.Collections.Generic.List<T> and F# programmers use the F# list<T>.
The reason why F# prefers its own functional list rather than using .NET List<T> is that functional languages prefer immutable types. Instead of modifying the object by calling list.Add(x), you can create new list with items added to the front by writing let newList = x::list.
I also agree with Stephen that using arrays in F# is not awkward at all. If you know the number of elements you're working with or you're transforming some existing data source, then working with arrays is quite easy:
// You can create arrays using `init`
let a = Array.init 10 (fun i -> (* calculate i-th element here *) )
// You can transform arrays using `map` and `filter`
a |> Array.map (fun n -> n + 10)
  |> Array.filter (fun n -> n > 12)
// You can use array comprehensions:
let a2 = [| for n in a do
                if n > 12 then yield n + 10 |]
This is essentially the same as processing lists - there you would use list comprehensions [ ... ] and list processing functions such as List.map etc. The difference really appears just when initializing the list/array.
F# makes it awkward to work with arrays
F# provides many features that make it easier to work with arrays than in other languages, including array literals, array patterns and higher-order functions.
The question that immediately sprang to mind: isn't easy prepending and appending something that should be frowned upon in an immutable language?
I believe you have misunderstood what that statement means. When people talk about prepending and appending in the context of purely functional data structures they are referring to the creation of a new collection that is derived (and shared most of its internals) with an existing collection.
So why do the designers of F# prefer lists?
F# inherited some list-related capabilities from OCaml which inherited them from Standard ML and ML because singly-linked immutable lists are very useful in the context of their application domain (metaprogramming) but I would not say that the designers of F# prefer lists.
What is the fundamental difference in an immutable environment between a list and an array?
In F#, lists provide O(1) prepend, O(n) append, and O(n) random access, whereas arrays provide O(n) prepend and append but O(1) random access. Arrays can be mutated but lists cannot.
What am I missing?
Basic knowledge of purely functional data structures. Read Okasaki.
Are lists in F# really linked lists?
Yes. Specifically, singly-linked immutable lists. In fact, in some MLs the list type can be defined as:
type 'a list =
| ([])
| (::) of 'a * 'a list
This is why the :: operator is a constructor and not a function, so you cannot write (::) on its own as you can with, for example, (+).
An F# list is more like the following data structure, a singly-linked list:
public class List<T> {
    public List(T item, List<T> prev) { /*...*/ }
    public T Item { get; }
    public List<T> Prev { get; }
}
So when a new list is created by prepending, it actually creates a single node holding the new item and a reference to the previous list, rather than copying an entire array.
