Scala map sorting

How do I sort a map of this kind:
"01" -> List(34,12,14,23), "11" -> List(22,11,34)
by the first values (the keys)?

One way is to use scala.collection.immutable.TreeMap, which is always sorted by keys:
import scala.collection.immutable.TreeMap

val t = TreeMap("01" -> List(34,12,14,23), "11" -> List(22,11,34))

//If you already have a map...
val m = Map("01" -> List(34,12,14,23), "11" -> List(22,11,34))
//...use this
val t = TreeMap(m.toSeq: _*)
You can convert it to a Seq or List and sort it, too:
//by specifying an element for sorting
m.toSeq.sortBy(_._1) //sort by comparing keys
m.toSeq.sortBy(_._2) //sort by comparing values (requires an Ordering for the value type; List[Int] has none by default)
//by providing a sort function
m.toSeq.sortWith(_._1 < _._1) //sort by comparing keys
There are plenty of possibilities, each more or less convenient in a certain context.

As stated, the default Map type is unsorted, but there's always SortedMap
import collection.immutable.SortedMap
SortedMap("01" -> List(34,12,14,23), "11" -> List(22,11,34))
Although I'm guessing you can't use that, because I recognise this as homework and suspect that YOUR map is the result of a groupBy operation. So you have to create an empty SortedMap and add the values:
val unsorted = Map("01" -> List(34,12,14,23), "11" -> List(22,11,34))
val sorted = SortedMap.empty[String, List[Int]] ++ unsorted
//or
val sorted = SortedMap(unsorted.toSeq:_*)
Or if you're not wedded to the Map interface, you can just convert it to a sequence of tuples. Note that this approach will only work if both the keys and values have a defined ordering. Lists don't have a default ordering defined, so this won't work with your example code - I therefore made up some other numbers instead.
val unsorted = Map("01" -> 56, "11" -> 34)
val sorted = unsorted.toSeq.sorted
This might be useful if you can first convert your lists to some other type (such as a String), which is best done using mapValues.
update: See Landei's answer, which shows how you can provide a custom sort function that'll make this approach work.
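For illustration, a minimal sketch of the mapValues idea just mentioned (the string conversion is only one possible choice):
val unsorted = Map("01" -> List(34,12,14,23), "11" -> List(22,11,34))
// Strings do have a default ordering, so the resulting (String, String) tuples are sortable
val sorted = unsorted.mapValues(_.mkString(",")).toSeq.sorted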

Related

Can I check whether a bounded list contains duplicates, in linear time?

Suppose I have an Int list whose elements are known to be bounded, and the list is known to be no longer than their range, so that it is entirely possible for it not to contain duplicates. How can I most quickly test whether that is the case?
I know of nubOrd. It is quite fast. We can pass our list through and see if it becomes shorter. But the efficiency of nubOrd is still not linear.
My idea is that we can trade space for time efficiency. Imperatively, we would allocate a bit field as wide as our range, and then traverse the list, marking the entries corresponding to the list elements' values. As soon as we try to flip a bit that is already 1, we return False. It only takes (read + compare + write) * length of the list. No binary search trees, no nothing.
Is it reasonable to attempt a similar construction in Haskell?
The discrimination package has a linear time nub you can use. Or a linear time group that doesn't require the equivalent elements to be adjacent in order to group them, so you could see if any of the groups are not size 1.
The whole package is based on sidestepping the well known bounds on comparison-based sorts (and joins, and etc) by using algorithms based on "discrimination" rather than ones based on comparisons. As I understand it, the technique is somewhat like a radix sort, but generalised to ADTs.
For integers (and other Ix-like types), you could use a mutable array, for example with the array package.
We can use an STUArray here, for example:
import Control.Monad.ST
import Data.Array.ST

updateDups_ :: [Int] -> STUArray s Int Bool -> ST s Bool
updateDups_ [] _ = return False
updateDups_ (x:xs) arr = do
    contains <- readArray arr x
    if contains
        then return True
        else writeArray arr x True >> updateDups_ xs arr

withDups_ :: Int -> [Int] -> ST s Bool
withDups_ mx l = newArray (0, mx) False >>= updateDups_ l

withDups :: Int -> [Int] -> Bool
withDups mx ls = runST (withDups_ mx ls)
For example:
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,5]
False
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,1]
True
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,16,2]
True
So here the first parameter is the maximum value that can appear in the list, and the second parameter is the list of values we want to check.
So you have a list of size N, and you know that the elements in the list are within the range min .. min+N-1.
There is a simple linear time algorithm that requires O(1) space.
First, scan the list to find the minimum and maximum elements.
If (max - min + 1) < N then you know there's a duplicate. Otherwise ...
Because the range is N, the minimum item can go at a[0], and the max item at a[n-1]. You can map any item to its position in the array simply by subtracting min. You can do an in-place sort in O(n) because you know exactly where every item should go.
Starting at the beginning of the list, take the first element and subtract min to determine where it should go. Go to that position, and replace the item that's there. With the new item, compute where it should go, and replace the item in that position, etc.
If you ever get to a point where you're trying to place an item at a[x], and the value already there is the value that's supposed to be there (i.e. a[x] == x+min), then you've found a duplicate.
The code to do all this is pretty simple:
min, max = findMinMax()
currentIndex = 0
while currentIndex < N
    temp = a[currentIndex]
    targetIndex = temp - min
    // Do this until we wrap around to the current index.
    // If the item is already in place, then targetIndex == currentIndex,
    // and we won't enter the loop.
    while targetIndex != currentIndex
        if (a[targetIndex] == temp)
            // The item at a[targetIndex] is the item that's supposed to be there.
            // The only way that can happen is if the item we have in temp is a duplicate.
            found a duplicate
        end if
        save = a[targetIndex]
        a[targetIndex] = temp
        temp = save
        targetIndex = temp - min
    end while
    // At this point, targetIndex == currentIndex.
    // We've wrapped around and need to place the last item.
    // There's no need to check here if a[targetIndex] == temp, because if it did,
    // we would not have entered the loop.
    a[targetIndex] = temp
    ++currentIndex
end while
That's the basic idea.
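For concreteness, here is a hedged Scala transcription of the pseudocode above (it mutates an Array, so run it on a copy of the list, and it assumes a non-empty input; purely illustrative):
def containsDuplicate(a: Array[Int]): Boolean = {
  val n = a.length
  val min = a.min
  val max = a.max
  // pigeonhole: if the range is smaller than the count, there must be a duplicate
  if (max - min + 1 < n) return true
  var current = 0
  while (current < n) {
    var temp = a(current)
    var target = temp - min
    while (target != current) {
      // the slot already holds the value that belongs there, so temp is a duplicate
      if (a(target) == temp) return true
      val save = a(target)
      a(target) = temp
      temp = save
      target = temp - min
    }
    a(target) = temp
    current += 1
  }
  false
}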

In Scala, what is the right way to sort a list on a composite key

I am trying to fetch top N-1 elements from a List. I have gone through similar posts in SO, like here and here. I have understood the motivations behind the solutions proposed in those posts. However, I think my problem is a bit different.
I have a sorted (descending) list of some elements. Let's assume that it has been sorted in an efficient manner, because that is not the point here.
The head is the topmost that I have got. But, there are 0 or more duplicates in the rest of the list. The attr1 of each of these duplicates is the same as that of head. I want to extract these duplicates (if any). Then I want to sort this (top + duplicates) list on the second attribute, attr2. In other words, I want to sort on a composite key: first by Key_1, and then - for the same Key_1 - by Key_2.
Let's say that an element in the list is a Pair.
case class PairOfAttrs(attr1: Int, attr2: Int)
/* .. more code here ... */
// Now, we obtain a sorted (descending) list of PairOfAttrs below
// Ordering is based on the value of attr1 (key_1)
val sortedPairsFirstAttr = .. // Seq(Pair1,Pair2,Pair3,Pair4....Pairn)
val top = sortedPairsFirstAttr.head
Obviously, top's attr1 is the highest in the list. To extract the duplicates, I do:
val otherToppers = sortedPairsFirstAttr.tail.filter(e => e.attr1 == top.attr1) // Expression(1)
This is the key point, IMHO. I cannot find the duplicates until I isolate the top and use its attribute (attr1) during the comparison.
// Then, I create a new list of the toppers only
val semiToppers =
  if (otherToppers.nonEmpty) List(top) ++ otherToppers // Expression(2)
  else List(top)
Then, I sort the resultant list. Ordering is based on the value of attr2 (Key_2)
val finalToppers = semiToppers.sortWith(_.attr2 < _.attr2) // Expression(3)
So, effectively, I have sorted the original list using a compound key: sorted descendingly on Key_1 and then, ascendingly on Key_2.
I understand that Expression(1) can be optimized for a long list; I don't need to traverse the entire list and can break out of it earlier. I also understand that Expression(2) and Expression(3) can be merged. So, we can keep these two points aside.
My question is whether my approach is functionally appropriate and acceptable. If not, what do you think is a better, more idiomatic approach?
This is a learner's question. So, I seek your comments/observations wholeheartedly.
Your solution is working and valid, so I won't comment on it. You already use val and immutable data structures, which is the pure way of doing it.
I will instead suggest an alternative that is slightly shorter and might also interest you:
Since you preprocess your list anyway, I suggest sorting it lexicographically in the first place: first by descending attr1, and in case of a tie, by ascending attr2:
scala> val originalList = Seq((1,0), (2,7), (3,1), (1,3), (2,5), (3,4), (3,2))
scala> val lst = originalList.sortBy(x => (-x._1, x._2))
lst: Seq[(Int, Int)] = List((3,1), (3,2), (3,4), (2,5), (2,7), (1,0), (1,3))
Now you just have to take the duplicates from the front and the result is already sorted by attr2:
scala> lst.takeWhile(_._1 == lst.head._1)
res8: Seq[(Int, Int)] = List((3,1), (3,2), (3,4))
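Applied to the case class from the question, the same idea might look roughly like this (a sketch; the sample values are made up):
case class PairOfAttrs(attr1: Int, attr2: Int)

val original = Seq(PairOfAttrs(1,0), PairOfAttrs(2,7), PairOfAttrs(3,1),
                   PairOfAttrs(1,3), PairOfAttrs(2,5), PairOfAttrs(3,4), PairOfAttrs(3,2))
// descending by attr1, ascending by attr2
val lst = original.sortBy(p => (-p.attr1, p.attr2))
// the toppers are now a prefix of the list, already sorted by attr2
val finalToppers = lst.takeWhile(_.attr1 == lst.head.attr1)
// finalToppers: Seq(PairOfAttrs(3,1), PairOfAttrs(3,2), PairOfAttrs(3,4))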

Neat way of computing functions on key-value pairs

Suppose you have a list with key-value pairs. Neither keys, nor values, nor the pairs are required to be unique.
The following example
a -> 1
b -> 2
c -> 3
a -> 3
b -> 1
would be valid.
Now suppose I want to associate to any key value pair (k->v) another value V,
which has the following properties:
it is the same for two pairs, if their keys are identical
it is uniquely determined by the set of key-value pairs in the entire list
This sounds abstract, but the sum, the maximum, and the count, for example, all qualify:
Pair     SUM   MAX   COUNT
a -> 1    4     3     2
b -> 2    3     2     2
c -> 3    3     3     1
a -> 3    4     3     2
b -> 1    3     2     2
I am looking for a fast methods/data structures to compute such functions on the entire list.
If the keys can be sorted, one can simply sort the list, then iterate through the sorted list, and compute the function V in each block with identical keys.
I am asking whether there are nice methods to do this, if the values are not comparable or one does
not want to change the order of the entries.
Some thoughts:
Of course, one could apply a hash function to the keys, in order to obtain sortable keys.
Of course, one could also store the original position of each pair, then do the sorting, then compute
the function, and finally undo the sorting.
So essentially the question is already answered. However, I am interested in whether there are more elegant solutions, maybe using some adapted data structure.
EDIT: To clarify, in response to Sunny Agrawal's comment, what I mean by associate: this is also part of the question, namely how to nicely arrange the data structure.
In my example, I would get another list/map with (k->v) as key and V as value. However, it might make sense not to arrange the data that way. I require that V is stored in such a way that, for a given k, it takes constant time to obtain V.
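For illustration, here is a minimal Scala sketch of the grouping idea mentioned above: the statistic (the sum, in this example) is computed once per key with a hash-based groupBy, the original order is left untouched, and obtaining V for a given k is then a constant-time lookup. The names are made up:
// compute the per-key sum up front
val pairs = List("a" -> 1, "b" -> 2, "c" -> 3, "a" -> 3, "b" -> 1)

val sumByKey: Map[String, Int] =
  pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }

// annotate each pair with V via a constant-time lookup
val annotated = pairs.map { case (k, v) => (k, v, sumByKey(k)) }
// annotated: List((a,1,4), (b,2,3), (c,3,3), (a,3,4), (b,1,3))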
Maintain two data structures:
1. List<Pair<Key_Type, Value_Type>>
2. Map<Key_Type, Stats>
where Stats is a struct as follows:
struct Stats
{
    int Sum;
    int Count;
    int Max;
};
The first DS contains all your (key, val) pairs in the order you want to store them; the second maintains the stats for each key, as shown in your example.
Insert will work as follows (pseudo C++ code):
void Insert(key, val)
{
    list.insert(Pair(key, val));
    Stats curr;
    if (map.contains(key))
    {
        curr = map[key];
        curr.Max = Max(curr.Max, val);
        curr.Count++;
        curr.Sum += val;
    }
    else
    {
        curr.Max = val;
        curr.Count = 1;
        curr.Sum = val;
    }
    map[key] = curr;
}
Complexity will be O(1) for updating the list and O(log M) for updating the map, where M is the number of unique keys.
If N is the total number of objects in the list, the total time for inserts will be O(N) + O(N log M).
Note: this will work if we have inserts only; in the case of deletions, updating Max will be difficult.
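A rough Scala equivalent of the same two-structure idea might look like this (a sketch under the insert-only assumption; names are illustrative):
import scala.collection.mutable

final case class Stats(sum: Int, count: Int, max: Int)

val list  = mutable.ListBuffer.empty[(String, Int)]   // keeps the original insertion order
val stats = mutable.Map.empty[String, Stats]          // per-key statistics

def insert(key: String, value: Int): Unit = {
  list += (key -> value)
  val updated = stats.get(key) match {
    case Some(Stats(s, c, m)) => Stats(s + value, c + 1, math.max(m, value))
    case None                 => Stats(value, 1, value)
  }
  stats(key) = updated
}
With a hash map the per-insert update is expected O(1) rather than O(log M); a sorted map would reproduce the O(log M) bound above.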

Efficient way to identify named Sets with common elements in Scala

Given a Map[String, Set[String]] what is an elegant and efficient way in Scala to determine the set of all pairs of distinct keys where the corresponding sets have a non-empty intersection?
For example fix the map as
val input = Map(
  "a" -> Set("x", "z"),
  "b" -> Set("f"),
  "c" -> Set("f", "z", "44"),
  "d" -> Set("99")
)
then the required output is
Set(
  ("a", "c"),
  ("b", "c")
)
Efficient in this context means better than O(n^2) where n is the sum of the number of elements in the family of sets given as input.
You can't get better worst-case complexity than O(n^2). Look at the following example:
Map(
  1 -> Set("a"),
  2 -> Set("a"),
  3 -> Set("a"),
  ...
  n -> Set("a")
)
In this case every single pair of sets has a non-empty intersection, so the size of the output is O(n^2), and you can't get better complexity than the size of the output.
Obviously, that doesn't mean you can't think of a better algorithm than just brute force. For example, you could transform this:
val input = Map(
  "a" -> Set("x", "z"),
  "b" -> Set("f"),
  "c" -> Set("f", "z", "44"),
  "d" -> Set("99")
)
into this:
val transformed = Map(
  "x" -> Set("a"),
  "z" -> Set("a", "c"),
  "f" -> Set("b", "c"),
  "44" -> Set("c"),
  "99" -> Set("d")
)
You can do this in linear time. I'd use Scala collection builders or mutable collections for this to avoid expensive operations on immutable collections.
Then you can just look at every set that is a value in this transformed map and for each one, generate all possible pairs of its elements. This can take O(n^2) but if you don't have many pairs in your output then it will be a lot faster.
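A hedged Scala sketch of this transformation and the pair generation (it uses plain immutable collections rather than builders, for brevity; names are made up):
val input = Map(
  "a" -> Set("x", "z"),
  "b" -> Set("f"),
  "c" -> Set("f", "z", "44"),
  "d" -> Set("99")
)

// invert: element -> set of keys whose sets contain that element
val transformed: Map[String, Set[String]] =
  input.toSeq
    .flatMap { case (k, vs) => vs.map(v => v -> k) }
    .groupBy(_._1)
    .map { case (v, kvs) => v -> kvs.map(_._2).toSet }

// for every element shared by several keys, emit all pairs of those keys
val pairs: Set[(String, String)] =
  transformed.values.flatMap { keys =>
    val ks = keys.toSeq.sorted
    for {
      i <- ks.indices
      j <- (i + 1) until ks.size
    } yield (ks(i), ks(j))
  }.toSet
// pairs: Set((a,c), (b,c))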

Efficient algorithm to remove any map that is contained in another map from a collection of maps

I have a set (s) of unique maps (Java HashMaps currently) and wish to remove from it any map that is completely contained by some other map in the set (i.e. remove m from s if m.entrySet() is a subset of n.entrySet() for some other n in s).
I have an n^2 algorithm, but it's too slow. Is there a more efficient way to do this?
Edit:
the set of possible keys is small, if that helps.
Here is an inefficient reference implementation:
public void removeSubmaps(Set<Map> s) {
    Set<Map> toRemove = new HashSet<Map>();
    for (Map a : s) {
        for (Map b : s) {
            // don't compare a map against itself, otherwise every map would be removed
            if (a != b && a.entrySet().containsAll(b.entrySet()))
                toRemove.add(b);
        }
    }
    s.removeAll(toRemove);
}
Not sure I can make this anything other than an n^2 algorithm, but I have a shortcut that might make it faster. Make a list of your maps together with the length of each map and sort it. A proper subset of a map must be shorter than or equal to the map you're comparing - there's never any need to compare to a map higher on the list.
Here's another stab at it.
Decompose all your maps into a list of (key, value, map number) triples. Sort the list by key and value. Go through the list, and for each group of key/value matches, create a permutation of all the map-number pairs - these are all potential subsets. When you have the final list of pairs, sort it by map numbers. Go through this second list and count the number of occurrences of each pair - if the number matches the size of one of the maps, you've found a subset.
Edit: My original interpretation of the problem was incorrect, so here is a new answer based on my re-read of the question.
You can create a custom hash function for HashMap which returns the product of the hash values of all its entries. Sort the list of hash values and, starting from the biggest value, find all divisors among the smaller hash values; these are possible subsets of this hashmap. Use set.containsAll() to confirm before marking them for removal.
This effectively transforms the problem into a mathematical problem of finding possible divisor from a collection. And you can apply all the common divisor-search optimizations.
Complexity is O(n^2), but if many hashmaps are subsets of others, the actual time spent can be a lot better, approaching O(n) in the best-case scenario (if all hashmaps are subsets of one). Even in the worst-case scenario, the division calculation would be a lot faster than set.containsAll(), which itself is O(n^2) where n is the number of items in a hashmap.
You might also want to create a simple hash function for hashmap entry objects that returns smaller numbers, to speed up the multiplication and division.
Here's a subquadratic (O(N**2 / log N)) algorithm for finding maximal sets from a set of sets: An Old Sub-Quadratic Algorithm for Finding Extremal Sets.
But if you know your data distribution, you can do much better in average case.
This is what I ended up doing. It works well in my situation, as there is usually some value that is only shared by a small number of maps. Kudos to Mark Ransom for pushing me in this direction.
In prose: Index the maps by key/value pair, so that each key/value pair is associated with a set of maps. Then, for each map: find the smallest set associated with one of its key/value pairs; this set is typically small for my data. Each of the maps in this set is a potential 'supermap'; no other map could be a 'supermap', as it would not contain this key/value pair. Search this set for a supermap. Finally, remove all the identified submaps from the original set.
private <K, V> void removeSubmaps(Set<Map<K, V>> maps) {
    // index the maps by key/value
    List<Map<K, V>> mapList = toList(maps);
    Map<K, Map<V, List<Integer>>> values = LazyMap.create(HashMap.class, ArrayList.class);
    for (int i = 0, uniqueRowsSize = mapList.size(); i < uniqueRowsSize; i++) {
        Map<K, V> row = mapList.get(i);
        Integer idx = i;
        for (Map.Entry<K, V> entry : row.entrySet())
            values.get(entry.getKey()).get(entry.getValue()).add(idx);
    }

    // find submaps
    Set<Map<K, V>> toRemove = Sets.newHashSet();
    for (Map<K, V> submap : mapList) {
        // find the smallest set of maps with a matching key/value
        List<Integer> smallestList = null;
        for (Map.Entry<K, V> entry : submap.entrySet()) {
            List<Integer> list = values.get(entry.getKey()).get(entry.getValue());
            if (smallestList == null || list.size() < smallestList.size())
                smallestList = list;
        }

        // compare with each of the maps in that set
        for (int i : smallestList) {
            Map<K, V> map = mapList.get(i);
            if (isSubmap(submap, map))
                toRemove.add(submap);
        }
    }
    maps.removeAll(toRemove);
}
private <K, V> boolean isSubmap(Map<K, V> submap, Map<K, V> map) {
    if (submap.size() >= map.size())
        return false;
    for (Map.Entry<K, V> entry : submap.entrySet()) {
        V other = map.get(entry.getKey());
        if (other == null)
            return false;
        if (!other.equals(entry.getValue()))
            return false;
    }
    return true;
}
