Groovy: sort hash keys by values

I want to sort a hash of key->value pairs by the values, and get the list of the sorted keys.
This seems to work:
groovy> def map = [a:5, b:3, c:6, d:4].sort { a, b -> a.value <=> b.value }.keySet()
groovy> println map
[b, d, a, c]
but will it always work?
I don't know if the iterator that builds the keySet() will always iterate them by order.
Thanks!

Short answer: Yes, the keySet() method will always return a java.util.Set whose iterator walks the keys in the sorted order.
Long answer: That's a bit hard to prove as we have to look at some source code.
Examination starts in org.codehaus.groovy.runtime.DefaultGroovyMethods, where the public static <K, V> Map<K, V> sort(Map<K, V> self, Closure closure) method returns a java.util.LinkedHashMap, which maintains insertion order.
The LinkedHashMap's Set<K> keySet() method is defined in the java.util.HashMap class and returns an Iterator by calling the Iterator<K> newKeyIterator() method, which is overridden in the LinkedHashMap class. It returns a LinkedHashMap$KeyIterator, which defines the K next() method that internally calls the Entry<K,V> nextEntry() method, which returns the Entry referenced by the LinkedHashMap$Entry.after field.
Finally, one can see in the LinkedHashMap$Entry.addBefore(Entry<K,V> existingEntry) method that the LinkedHashMap$Entry.after field is set in an ordered manner.
Oh my ... I had linked each statement above to the corresponding source code in org.codehaus.groovy.runtime.DefaultGroovyMethods, java.util.HashMap and java.util.LinkedHashMap, summing up to 10 hyperlinks. Unfortunately, as a newcomer to Stack Overflow, I'm only allowed to post one, so I had to remove most of the links ... sorry.
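To see the same guarantee without digging through the JDK source, here is a minimal Java sketch (plain JDK, no Groovy) of roughly what the sort above does: order the entries by value, copy them into a LinkedHashMap, and observe that keySet() iterates in that insertion order. The class name is made up for illustration.

import java.util.LinkedHashMap;
import java.util.Map;

public class SortedKeySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 5);
        map.put("b", 3);
        map.put("c", 6);
        map.put("d", 4);

        // Sort the entries by value and copy them into a new LinkedHashMap,
        // mirroring what DefaultGroovyMethods.sort(Map, Closure) does.
        Map<String, Integer> sorted = new LinkedHashMap<>();
        map.entrySet().stream()
           .sorted(Map.Entry.comparingByValue())
           .forEachOrdered(e -> sorted.put(e.getKey(), e.getValue()));

        // LinkedHashMap's keySet() iterates in insertion order,
        // so the keys come out ordered by their values.
        System.out.println(sorted.keySet()); // [b, d, a, c]
    }
}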

Related

Cannot understand C++ map semantics

I don't understand the constructor statement in the following code. How can the iterator to the past-of-end element be added to the map as a key?
template<typename K, typename V>
class my_map {
    std::map<K,V> m_map;
public:
    my_map(V const& val) {
        m_map.insert(m_map.end(), std::make_pair(std::numeric_limits<K>::lowest(), val));
    }
};
How can the iterator to the past-of-end element be added to the map as a key?
It's not the key. It's the position hint for the insertion. By passing end() you're saying "append to the map".
The key that you're inserting is the first part of the pair. i.e. std::numeric_limits<K>::lowest().
The value that you're inserting is the second part of the pair. i.e. val.
The docs for std::map::insert are useful.
How can the iterator to the past-of-end element be added to the map as a key?
That's an incorrect conclusion. std::map::insert has several overloads. The one that is used in your call is:
iterator insert( iterator hint, const value_type& value ); // Overload 4
which does the following:
Inserts value in the position as close as possible, just prior, to hint.

Kotlin Instantiate Immutable List

I've started using Kotlin as a substitute for Java and quite like it. However, I've been unable to find a solution to this without jumping back into Java-land:
I have an Iterable<SomeObject> and need to convert it to a list so I can iterate through it more than once. This is an obvious application of an immutable list, as all I need to do is read it several times. How do I actually put that data in the list at the beginning, though? (I know it's an interface, but I've been unable to find an implementation of it in the documentation.)
Possible (if unsatisfactory) solutions:
val valueList = arrayListOf(values)
// iterate through valuelist
or
fun copyIterableToList(values: Iterable<SomeObject>): List<SomeObject> {
    var outList = ArrayList<SomeObject>()
    for (value in values) {
        outList.add(value)
    }
    return outList
}
Unless I'm misunderstanding, these end up with MutableLists, which works but feels like a workaround. Is there a similar immutableListOf(Iterable<SomeObject>) method that will instantiate an immutable list object?
In Kotlin, List<T> is a read-only list interface, it has no functions for changing the content, unlike MutableList<T>.
In general, List<T> implementation may be a mutable list (e.g. ArrayList<T>), but if you pass it as a List<T>, no mutating functions will be exposed without casting. Such a list reference is called read-only, stating that the list is not meant to be changed. This is immutability through interfaces which was chosen as the approach to immutability for Kotlin stdlib.
Closer to the question, the toList() extension function for Iterable<T> in the stdlib will fit: it returns a read-only List<T>.
Example:
val iterable: Iterable<Int> = listOf(1, 2, 3)
val list: List<Int> = iterable.toList()

BiConsumer cannot modify argument

I implemented a Collector for a Java 8 stream that will store Entities to a Repository when a given threshold is hit.
public BiConsumer<Tuple2<Integer,List<T>>, T> accumulator() {
    return (tuple, e) -> {
        List<T> list = tuple._2;
        list.add(e);
        if (list.size() >= this.threshold) {
            this.repository.save(list);
            this.repository.flush();
            list = new LinkedList<>();
        }
        tuple = new Tuple2<>(tuple._1 + 1, list);
    };
}
This does not work as intended. The element e is added to the list, but the list is not reset after the threshold is reached. Also, the Integer stays at 0, which is to be expected since it's a final member.
As it seems my only option is to make my Tuple2 mutable and empty the List :-(
Any suggestions how to solve this using immutable tuples?
A lambda expression can be thought of as a function (or method) that simply doesn't have a name. In this example it would be like a method that has two formal parameters tuple and e and also some local variables within its body, including list.
When you make an assignment to a formal parameter or to a local variable, that assignment is local to the current method (or lambda body). No mutation or side effects will be visible to the outside after the accumulator returns, so these assignments won't affect the data structure you're collecting into.
I'm not entirely sure what you're trying to do, but instead of using a tuple (which presumably is immutable and must be replaced instead of mutated) you might try writing an ordinary, mutable class that contains an integer counter (or whatever) and a list. The accumulator would add to the list, conditionally flush and replace the list, and increment the counter. These mutative operations are allowed in collectors because the collector framework carefully thread-confines these operations, so the object you're mutating doesn't need to be thread-safe.
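As a sketch of that last suggestion, here is one way the accumulator could look with an ordinary mutable state object instead of an immutable tuple. The names (Accumulation, BatchingCollectorSketch, Repository, threshold) are assumptions made up for illustration, not taken from the original code.

import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Ordinary mutable accumulation state: a counter plus the current batch.
class Accumulation<T> {
    int count;
    List<T> batch = new ArrayList<>();
}

class BatchingCollectorSketch<T> {
    // Hypothetical stand-in for the real repository used in the question.
    interface Repository<T2> {
        void save(List<T2> batch);
        void flush();
    }

    private final int threshold;
    private final Repository<T> repository;

    BatchingCollectorSketch(int threshold, Repository<T> repository) {
        this.threshold = threshold;
        this.repository = repository;
    }

    public BiConsumer<Accumulation<T>, T> accumulator() {
        return (acc, e) -> {
            acc.batch.add(e);
            acc.count++;                        // mutate the counter in place
            if (acc.batch.size() >= threshold) {
                repository.save(acc.batch);
                repository.flush();
                acc.batch = new ArrayList<>();  // replace the field, not a local copy
            }
        };
    }
}

Because the collector framework confines each Accumulation instance to one thread at a time, these in-place mutations are safe without extra synchronization.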

Need help overriding compareTo

I'm a Java novice that is having some trouble overriding the compareTo method in the Comparable interface. My code creates a HashMap that associates strings to an int. I would like to override compareTo so that the strings in the ArrayList keys are sorted based on their HashMap values, not alphabetically. Under this implementation, however, the strings are still sorted alphabetically.
Oh, and to clarify, nameWeight is the HashMap of String,Integer pairs.
Any ideas?
List<String> keys = new ArrayList<String>(nameWeight.keySet());
System.out.println(keys);
Collections.sort(keys);
public int compareTo(String that) {
    int gtr = 1;
    int less = -1;
    int eql = 0;
    System.out.print(this);
    System.out.print(that);
    if (that == "JOHN")
        return less;
    int valThis = nameWeight.get(this);
    int valThat = nameWeight.get(that);
    if (valThis == valThat)
        return eql;
    if (valThis > valThat)
        return gtr;
    if (valThis < valThat)
        return less;
    return gtr;
}
You're sorting a list of strings so the compareTo method called is the one defined in class String (or its superclass). Since you can't modify String you would have to create a subclass of String, override compareTo in that class and use List<StringSubClass>. But since String is final you're not allowed to subclass from it (thanks #pst).
Alternatively you don't use String objects in the list but objects of the type you created and in which you override compareTo (don't forget to add implements Comparable to the class definition).
Or (shameless plug from #pst again), and this is probably the best solution: pass a Comparator to the sort method, which will then be used to order the strings instead of their natural (alphabetical) ordering.
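A minimal sketch of that last suggestion, assuming nameWeight is the HashMap<String, Integer> from the question (the names and weights below are made up for illustration):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByWeight {
    public static void main(String[] args) {
        final Map<String, Integer> nameWeight = new HashMap<>();
        nameWeight.put("ANNA", 3);
        nameWeight.put("JOHN", 1);
        nameWeight.put("MARK", 2);

        List<String> keys = new ArrayList<>(nameWeight.keySet());
        // Pass a Comparator so the strings are ordered by their mapped
        // weights instead of String's natural (alphabetical) order.
        Collections.sort(keys, new Comparator<String>() {
            public int compare(String a, String b) {
                return nameWeight.get(a).compareTo(nameWeight.get(b));
            }
        });
        System.out.println(keys); // [JOHN, MARK, ANNA]
    }
}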

Sorted hash table (map, dictionary) data structure design

Here's a description of the data structure:
It operates like a regular map with get, put, and remove methods, but has a sort method that can be called to sort the map. However, the map remembers its sorted structure, so subsequent calls to sort can be much quicker (if the structure doesn't change too much between calls to sort).
For example:
I call the put method 1,000,000 times.
I call the sort method.
I call the put method 100 more times.
I call the sort method.
The second time I call the sort method should be a much quicker operation, as the map's structure hasn't changed much. Note that the map doesn't have to maintain sorted order between calls to sort.
I understand that it might not be possible, but I'm hoping for O(1) get, put, and remove operations. Something like TreeMap provides guaranteed O(log(n)) time cost for these operations, but always maintains a sorted order (no sort method).
So what's the design of this data structure?
Edit 1 - returning the top-K entries
Although I'd enjoy hearing the answer to the general case above, my use case has gotten more specific: I don't need the whole thing sorted; just the top K elements.
Data structure for efficiently returning the top-K entries of a hash table (map, dictionary)
Thanks!
For "O(1) get, put, and remove operations" you essentially need O(1) lookup, which implies a hash function (as you know), but the requirements of a good hash function often break the requirement to be easily sorted. (If you had a hash table where adjacent values mapped to the same bucket, it would degenerate to O(N) on lots of common data, which is a worse case you typically want a hash function to avoid.)
I can think of how to get you 90% of the way there. Set up a hashtable alongside a parallel index that is sorted. The index has a clean part (ordered) and a dirty part (unordered). The index would map keys to the values (or references to the values stored in the hashtable - whichever suits you in terms of performance or memory use). When you add to the hashtable, the new entry is pushed onto the back of the dirty list. When you remove from the hashtable, the entry is nulled/removed from the clean and dirty parts of the index. You can sort the index, which sorts the dirty entries only, then merges them into the already sorted 'clean' part of the index. And obviously you can iterate over the index.
As far as I can see, this gives you the O(1) everywhere except on the remove operation and is still fairly simple to implement with standard containers (at least as provided by C++, Java, or Python). It also gives you the "second sort is cheaper" condition by only needing to sort the dirty index entries and then letting you do an O(N) merge. The cost of all this is obviously extra memory for the index and extra indirection when using it.
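To make the shape of that idea concrete, here is a rough Java sketch. The class and method names are made up, null keys are not handled, and, as noted above, remove is not O(1) because the key has to be found in the index.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hash table for O(1) get/put, plus a parallel key index split into a
// sorted "clean" part and an unsorted "dirty" part that is merged on sort().
class IndexedMap<K extends Comparable<K>, V> {
    private final Map<K, V> table = new HashMap<>();
    private final List<K> clean = new ArrayList<>();   // kept sorted
    private final List<K> dirty = new ArrayList<>();   // new keys since the last sort

    public V get(K key) { return table.get(key); }

    public void put(K key, V value) {
        if (table.put(key, value) == null) {
            dirty.add(key);          // only brand-new keys go onto the dirty index
        }
    }

    public V remove(K key) {
        V old = table.remove(key);
        clean.remove(key);           // note: not O(1), as discussed above
        dirty.remove(key);
        return old;
    }

    // Sort only the dirty keys, then merge them into the already sorted clean part.
    public List<K> sort() {
        Collections.sort(dirty);
        List<K> merged = new ArrayList<>(clean.size() + dirty.size());
        int i = 0, j = 0;
        while (i < clean.size() && j < dirty.size()) {
            merged.add(clean.get(i).compareTo(dirty.get(j)) <= 0 ? clean.get(i++) : dirty.get(j++));
        }
        while (i < clean.size()) merged.add(clean.get(i++));
        while (j < dirty.size()) merged.add(dirty.get(j++));
        clean.clear();
        clean.addAll(merged);
        dirty.clear();
        return clean;
    }
}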
Why exactly do you need a sort() function?
What you perhaps want and need is a red-black tree.
http://en.wikipedia.org/wiki/Red-black_tree
These trees automatically keep your entries sorted according to a comparator you supply. They are complex, but have excellent O(log n) characteristics. Couple the tree entries as keys with a hash map as dictionary and you get your data structure.
In Java it is implemented as TreeMap, an implementation of SortedMap.
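For reference, a small Java example of the TreeMap approach: no explicit sort() call is needed because the tree keeps its keys ordered at all times, at O(log n) per operation.

import java.util.SortedMap;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        // TreeMap keeps its entries sorted by key (natural order here,
        // or by a Comparator passed to the constructor).
        SortedMap<String, Integer> weights = new TreeMap<>();
        weights.put("c", 6);
        weights.put("a", 5);
        weights.put("b", 3);
        weights.put("d", 4);
        System.out.println(weights.keySet()); // [a, b, c, d]
    }
}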
What you're looking at is a hashtable with pointers in the entries to the next entry in sorted order. It's a lot like the LinkedHashMap in Java, except that the links track a sort order rather than the insertion order. You can actually implement this entirely by wrapping a LinkedHashMap and having the implementation of sort transfer the entries from the LinkedHashMap into a TreeMap and then back into a LinkedHashMap.
Here's an implementation that sorts the entries in an array list rather than transferring to a tree map. I think the sort algorithm used by Collections.sort will do a good job of merging the new entries into the already sorted portion.
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SortaSortedMap<K extends Comparable<K>, V> implements Map<K, V> {

    private LinkedHashMap<K, V> innerMap;

    public SortaSortedMap() {
        this.innerMap = new LinkedHashMap<K, V>();
    }

    public SortaSortedMap(Map<K, V> map) {
        this.innerMap = new LinkedHashMap<K, V>(map);
    }

    public Collection<V> values() {
        return innerMap.values();
    }

    public int size() {
        return innerMap.size();
    }

    public V remove(Object key) {
        return innerMap.remove(key);
    }

    public V put(K key, V value) {
        return innerMap.put(key, value);
    }

    public Set<K> keySet() {
        return innerMap.keySet();
    }

    public boolean isEmpty() {
        return innerMap.isEmpty();
    }

    public Set<Entry<K, V>> entrySet() {
        return innerMap.entrySet();
    }

    public boolean containsKey(Object key) {
        return innerMap.containsKey(key);
    }

    public V get(Object key) {
        return innerMap.get(key);
    }

    public boolean containsValue(Object value) {
        return innerMap.containsValue(value);
    }

    public void clear() {
        innerMap.clear();
    }

    public void putAll(Map<? extends K, ? extends V> m) {
        innerMap.putAll(m);
    }

    // Rebuild the backing LinkedHashMap with its entries sorted by key.
    public void sort() {
        List<Map.Entry<K, V>> entries = new ArrayList<Map.Entry<K, V>>(innerMap.entrySet());
        Collections.sort(entries, new KeyComparator());
        LinkedHashMap<K, V> newMap = new LinkedHashMap<K, V>();
        for (Map.Entry<K, V> e : entries) {
            newMap.put(e.getKey(), e.getValue());
        }
        innerMap = newMap;
    }

    private class KeyComparator implements Comparator<Map.Entry<K, V>> {
        public int compare(Entry<K, V> o1, Entry<K, V> o2) {
            return o1.getKey().compareTo(o2.getKey());
        }
    }
}
I don't know if there's a name, but you could store the current index of each item in the hash.
That is, you have a HashMap<Object, Pair<Integer, Object>>
and a List<Object> objects
When you put, add to the tail or head of the list and insert into the hashmap with your data and the index of insertion. This is O(1).
When you get, pull from the hashmap and ignore the index. This is O(1).
When you remove, you pull from the map. Take the index and remove from the list as well. This is O(1)
When you sort, just sort the list. Either update the indexes in the map during the sort, or update them after the sort is complete. This does not affect the O(n log n) sort, as it's a linear step: O(n log n + n) == O(n log n).
Ordered Dictionary
Recent versions of Python (2.7, 3.1) have "ordered dictionaries" which sound like what you're describing.
The official Python "ordered dictionary" implementation is inspired by previous 3rd-party implementations, as described in PEP 372.
References:
collections.OrderedDict documentation for Python 2.7
collections.OrderedDict documentation for Python 3.1
PEP 372
ActiveState Ordered Dictionary recipe for Python ≥ 2.4
I'm not aware of a data structure classification with that exact behavior, at least not in Java Collections (or from a nonlinear data structures class). Perhaps you can implement it, and it will henceforth be known as the RudigerMap.
