Subdivide a list into list of lists [duplicate] - algorithm

I would like to subdivide a list into a list of lists, with a maximum size for each sublist. For example, given List(1,2,5,3,90,3,4,1,0,3) and a maximum sublist size of 4, I would like to get List(List(1,2,5,3), List(90,3,4,1), List(0,3)) back.
This is what I have done so far:
val l: List[Int] = ???
val subSize: Int = 4
val rest: Int = if (l.size % subSize == 0) 0 else 1
val subdivided: List[List[Int]] = for {
  j <- List.range(0, l.size / subSize + rest, 1)
} yield {
  for {
    i <- List.range(subSize * j, subSize * (j + 1), 1)
    if (i < l.size)
  } yield {
    l(i)
  }
}
Is there a better, more functional way of doing this?

Yes there is, using grouped.
scala> List(1,2,5,3,90,3,4,1,0,3).grouped(4).toList
res1: List[List[Int]] = List(List(1, 2, 5, 3), List(90, 3, 4, 1), List(0, 3))
Note that grouped actually returns an Iterator, so that you can lazily traverse the collection without doing all the computation at once.

Number of ways to pick the elements of an array?

How to formulate this problem in code?
Problem Statement (updated):
Find the number of ways to pick the elements of the array which are not yet visited.
We start with the array 1, 2, ..., n, in which some number x (1 <= x <= n) of elements have already been picked/visited at random; these are given in the input.
Now we need to find the number of ways in which we can pick the remaining (n - x) elements of the array, where picking an element works as follows:
On every turn, we can only pick an element which is adjacent (either left or right) to some visited element. For example, in the array 1,2,3,4,5,6, say we have visited 3 and 6; then we can now pick 2, 4 or 5, as they are unvisited and adjacent to visited elements. Say we pick 2; now we can pick 1, 4 or 5, and so on.
Example:
input: N = 6 (number of elements: 1, 2, 3, 4, 5, 6)
M = 2 (number of visited elements)
visited elements = 1, 5
Output: 16 (number of ways we can pick the unvisited elements)
ways: 4, 6, 2, 3
4, 6, 3, 2
4, 2, 3, 6
4, 2, 6, 3
4, 3, 2, 6
4, 3, 6, 2
6, 4, 2, 3
6, 4, 3, 2
6, 2, 3, 4
6, 2, 4, 3
2, 6, 4, 3
2, 6, 3, 4
2, 4, 6, 3
2, 4, 3, 6
2, 3, 4, 6
2, 3, 6, 4.
Some analysis of the problem:
The actual values in the input array are assumed to be 1...n, but these values do not really play a role. These values just represent indexes that are referenced by the other input array, which lists the visited indexes (1-based)
The list of visited indexes actually cuts the main array into subarrays with smaller sizes. So for example, when n=6 and visited=[1,5], then the original array [1,2,3,4,5,6] is cut into [2,3,4] and [6]. So it cuts it into sizes 3 and 1. At this point the index numbering loses its purpose, so the problem really is fully described with those two sizes: 3 and 1. To illustrate, the solution for (n=6, visited=[1,5]) is necessarily the same as for (n=7, visited=[1,2,6]): the sizes into which the original array is cut are the same in both cases (in a different order, but that doesn't influence the result).
Algorithm, based on a list of sizes of subarrays (see above):
The number of ways in which one such subarray can be visited is not hard to count: if the subarray's size is 1, there is just one way. If it is greater, then at each pick there are two possibilities: either you pick from the left side or you pick from the right side. So you get 2*2*...*2*1 possibilities, i.e. 2^(size-1) possibilities.
The two outer subarrays are an exception to this, as you can only pick items from the inside-out, so for those the number of ways to visit such a subarray is just 1.
The number of ways in which you can pick items from two subarrays can be determined as follows: count the number of ways to pick from just one of those subarrays, and the number of ways to pick from the other one. Then consider that you can alternate between picking from one subarray and picking from the other; this comes down to interweaving the two subarrays. Say the larger of the two subarrays has j elements and the smaller has k; then there are j+1 positions at which an element from the smaller subarray can be injected (merged) into the larger one. There are "(j+1) multichoose k" ways to inject all elements of the smaller subarray, which works out to C(j+k, k).
When you have counted the number of ways to merge two subarrays, you actually have an array with a size that is the sum of those two sizes. The above logic can then be applied with this array and the next subarray in the problem specification. The number of ways just multiplies as you merge more subarrays into this growing array. Of course, you don't really deal with the arrays, just with sizes.
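As a quick sanity check, here is a small, hypothetical Python sketch applying these formulas to the example (n = 6, visited = [1, 5]), whose subarray sizes are 3 (an inner chunk, between the two visited elements) and 1 (an outer chunk, after the last visited element):

from math import comb  # C(n, k)

inner_ways = 2 ** (3 - 1)       # inner chunk of size 3: two choices (left/right) per pick, except the last
outer_ways = 1                  # outer chunk of size 1: only one way to consume it
interleavings = comb(3 + 1, 1)  # weave a 3-pick sequence with a 1-pick sequence: C(j+k, k)
print(inner_ways * outer_ways * interleavings)  # 16, matching the expected output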
Here is an implementation in JavaScript, which applies the above algorithm:
function getSubArraySizes(n, visited) {
    // Translate the problem into a list of subarray sizes.
    // The first and the last entry (the outer subarrays, before the first and
    // after the last visited index) are kept even when empty, so that
    // getPickCount can recognise the outer subarrays by their position;
    // empty inner gaps are dropped.
    let j = 0;
    let sizes = [];
    for (let i of visited) {
        let size = i - j - 1;
        if (size > 0 || sizes.length == 0) sizes.push(size);
        j = i;
    }
    sizes.push(n - j); // trailing outer subarray, possibly empty
    return sizes;
}
function Combi(n, k) {
    // Count combinations: "from n, take k"
    // See Wikipedia on "Combination"
    let c = 1;
    let end = Math.min(k, n - k);
    for (let i = 0; i < end; i++) {
        c = c * (n-i) / (end-i); // This is floating point
    }
    return c; // ... but result is integer
}
function getPickCount(sizes) {
    // Main function, based on a list of sizes of subarrays
    let count = 0;
    let result = 1;
    for (let i = 0; i < sizes.length; i++) {
        let size = sizes[i];
        // Number of ways to take items from this chunk:
        // - when items can only be taken from one side: 1
        // - otherwise: every time we have a choice between 2, except for the last remaining item
        let pickCount = i == 0 || i == sizes.length-1 ? 1 : 2 ** (size-1);
        // Number of ways to merge/weave two arrays, where relative order of elements is not changed
        // = a "k multichoice from n". See
        // https://en.wikipedia.org/wiki/Combination#Number_of_combinations_with_repetition
        let weaveCount = count == 0 ? 1 // First time only
                       : Combi(size+count, Math.min(count, size));
        // Number of possibilities:
        result *= pickCount * weaveCount;
        // Update the size to be the size of the merged/woven array
        count += size;
    }
    return result;
}
// Demo with the example input (n = 6, visited = 1 and 5)
let result = getPickCount(getSubArraySizes(6, [1, 5]));
console.log(result);

Given an array of ints and a number n, calculate the number of ways to sum to n using the ints

I saw this problem in my interview preparation.
Given an array of ints and a number n, calculate the number of ways to
sum to n using the ints
The following code is my solution. I tried to solve this by recursion. The subproblem is: for each int in the array, we can either pick it or not.
public static int count(List<Integer> list, int n) {
    System.out.print(list.size() + ", " + n);
    System.out.println();
    if (n < 0 || list.size() == 0)
        return 0;
    if (list.get(0) == n)
        return 1;
    int e = list.remove(0);
    return count(list, n) + count(list, n - e);
}
I tried to use [10, 1, 2, 7, 6, 1, 5] for the ints and set n to 8. The result should be 4. However, I got 0. I tried to print what I have on each layer of the stack to debug, as shown in the code. The following is what I got:
7, 8
6, 8
5, 8
4, 8
3, 8
2, 8
1, 8
0, 8
0, 3
0, 7
0, 2
0, 1
0, 6
0, 7
0, -2
This result confuses me. I think it looks right from the beginning up to (0, 3). Starting from (0, 7), it looks wrong to me. I expect (1, 7) there, because, if I understand correctly, this is the count(list, n - e) call on the second-to-bottom layer of the stack. The list operation on the lower layer shouldn't impact the list on the current layer.
So my questions are:
Why is it (0, 7) instead of (1, 7) with my current code?
What adjustment should I make to my current code to get the correct result?
Thanks!
The reason your algorithm is not working is that you are using a single list that is modified before the recursive calls.
Since the list is passed by reference, what ends up happening is that you recursively call remove until there is nothing left in the list, and then all of your recursive calls return 0.
What you could do is create two copies of the list on every recursive step. However, this would be way too inefficient.
A better way would be to use an index i that marks the element in the list that is being looked at during the call:
public static int count(List<Integer> list, int n, int i) {
    //System.out.print(list.size() + ", " + n);
    //System.out.println();
    if (n < 0 || i < 0) // i < 0 (not i <= 0), so that the element at index 0 is still considered
        return 0;
    int e = list.get(i); // e is the i-th element in the list
    if (e == n)
        return 1 + count(list, n, i-1); // Return 1 + check for more possibilities without picking e
    return count(list, n, i-1) + count(list, n - e, i-1); // Result if e is not picked + result if e is picked
}
You would then pass yourList.size() - 1 for i on the initial function call.
One more point is that when you return 1, you still have to add the number of possibilities for when your element e is not picked to be part of a sum. Otherwise, if - for example - your last element in the list was n, the recursion would end on the first step only returning 1 and not checking for more possible number combinations.
Finally, you might want to rewrite the algorithm using a dynamic approach, since that would give you a way better running time.
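As an illustration of that dynamic approach, here is a minimal bottom-up sketch (in Python rather than Java, with hypothetical names). Like the index-based recursion above, it treats equal values at different positions as distinct elements, and it assumes all ints are positive, as in the sample input:

def count_ways(nums, target):
    # ways[s] = number of subsets of the elements processed so far that sum to s
    ways = [0] * (target + 1)
    ways[0] = 1  # the empty subset sums to 0
    for e in nums:
        # go downwards so each element is used at most once (pick it or not)
        for s in range(target, e - 1, -1):
            ways[s] += ways[s - e]
    return ways[target]

print(count_ways([10, 1, 2, 7, 6, 1, 5], 8))  # same count as the fixed recursion above

This runs in O(len(nums) * n) time instead of exploring up to 2^len(nums) recursive branches.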

Algorithm for combining different age groups together based on their values

Let's say we have an array of age groups and an array of the number of people in each age group
For example:
Ages = ("1-13", "14-20", "21-30", "31-40", "41-50", "51+")
People = (1, 10, 21, 3, 2, 1)
I want an algorithm that combines these age groups using the following logic whenever a group has fewer than 5 people. The algorithm that I have so far does the following:
Start from the last element (here "51+"): can it be combined with the next group ("41-50")? If yes, add the numbers (1+2) and combine their labels. So we get the following:
Ages = ("1-13", "14-20", "21-30", "31-40", "41+")
People = (1, 10, 21, 3, 3)
Take the last one again (now "41+"). Can it be combined with the next group ("31-40")? The answer is yes, so we get:
Ages = ("1-13", "14-20", "21-30", "31+")
People = (1, 10, 21, 6)
Since the group "31+" now has 6 members, we cannot collapse it into the next group.
We cannot collapse "21-30" into the next one ("14-20") either.
"14-20" also has 10 people (>5), so we don't do anything with it either.
For the first one ("1-13"), since it has only one person and it is the last group (there is nothing to its left), we combine it with the next group "14-20" and get the following:
Ages = ("1-20", "21-30", "31+")
People = (11, 21, 6)
I have an implementation of this algorithm that uses many flags to keep track of whether any data has changed, and it makes a number of passes over the two arrays to finish the task.
My question is: do you know a more efficient way of doing the same thing? Any data structure or algorithm that can do this without so much bookkeeping would be great.
Update:
A radical example would be (5,1,5).
In the first pass it becomes (5,6) [collapsing the one on the right into the one in the middle].
Then we have (5,6). We cannot touch the 6 since it is larger than our threshold of 5, so we go to the next one (the 5 on the very left). Since it is less than or equal to 5, and since it is the last one on the left, we group it with the one on its right, so we finally get (11).
Here is an OCaml implementation of a left-to-right merge algorithm:
let close_group acc cur_count cur_names =
  (List.rev cur_names, cur_count) :: acc

let merge_small_groups mini l =
  let acc, cur_count, cur_names =
    List.fold_left (
      fun (acc, cur_count, cur_names) (name, count) ->
        if cur_count <= mini || count <= mini then
          (acc, cur_count + count, name :: cur_names)
        else
          (close_group acc cur_count cur_names, count, [name])
    ) ([], 0, []) l
  in
  List.rev (close_group acc cur_count cur_names)

let input = [
  "1-13", 1;
  "14-20", 10;
  "21-30", 21;
  "31-40", 3;
  "41-50", 2;
  "51+", 1
]

let output = merge_small_groups 5 input
(* output = [(["1-13"; "14-20"], 11); (["21-30"; "31-40"; "41-50"; "51+"], 27)] *)
As you can see, the result of merging from left to right may not be what you want.
Depending on the goal, it may make more sense to merge the pair of consecutive elements whose sum is smallest and iterate until all counts are above the minimum of 5.
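A rough Python sketch of that alternative (hypothetical names; the label merging is deliberately simplistic):

def merge_smallest_pairs(ages, people, threshold=5):
    # Repeatedly merge the adjacent pair with the smallest combined count,
    # until every remaining group is above the threshold (or only one group is left).
    ages, people = list(ages), list(people)
    while len(people) > 1 and min(people) <= threshold:
        i = min(range(len(people) - 1), key=lambda k: people[k] + people[k + 1])
        ages[i:i + 2] = [ages[i] + "," + ages[i + 1]]    # crude label merge
        people[i:i + 2] = [people[i] + people[i + 1]]
    return ages, people

print(merge_smallest_pairs(
    ["1-13", "14-20", "21-30", "31-40", "41-50", "51+"],
    [1, 10, 21, 3, 2, 1]))
# (['1-13,14-20', '21-30', '31-40,41-50,51+'], [11, 21, 6])

On this particular input the counts come out as (11, 21, 6), the same grouping the question asks for.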
Here is my Scala approach.
We start with two lists:
val people = List (1, 10, 21, 3, 2, 1)
val ages = List ("1-13", "14-20", "21-30", "31-40", "41-50", "51+")
and combine them into a kind of mapping:
val agegroup = ages.zip (people)
Then we define a method to merge two Strings describing (possibly open-ended) intervals. If one of the two intervals is open-ended (contains the "+" of "51+"), it has to be the first parameter.
/**
  combine age-strings
  a+  b-c => b+
  a-b c-d => c-b
*/
def merge (xs: String, ys: String) = {
  val xab = xs.split ("[+-]")
  val yab = ys.split ("-")
  if (xs.contains ("+")) yab(0) + "+"
  else yab(0) + "-" + xab(1)
}
Here is the real work:
/**
  reverse the list, combine groups < threshold.
*/
def remap (map: List [(String, Int)], threshold: Int) = {
  def remap (mappings: List [(String, Int)]): List [(String, Int)] = mappings match {
    case Nil => Nil
    case x :: Nil => x :: Nil
    case x :: y :: xs =>
      if (x._2 > threshold) x :: remap (y :: xs)
      else remap ((merge (x._1, y._1), x._2 + y._2) :: xs)
  }
  val nearly = (remap (map.reverse)).reverse
  // check for the first element
  if (! nearly.isEmpty && nearly.length > 1 && nearly (0)._2 < threshold) {
    val a = nearly (0)
    val b = nearly (1)
    val rest = nearly.tail.tail
    (merge (b._1, a._1), a._2 + b._2) :: rest
  } else nearly
}
and invocation
println (remap (agegroup, 5))
with result:
scala> println (remap (agegroup, 5))
List((1-20,11), (21-30,21), (31+,6))
The result is a list of pairs: age group and member count.
I guess the main part is easy to understand: there are 3 basic cases: an empty list, which can't be grouped; a list of one group, which is the solution itself; and more than one element.
If the first element (I reverse the list in the beginning, to start with the end) is bigger than the threshold (5, 6, whatever), yield it and proceed with the rest; if not, combine it with the second element, and call the method recursively with this combined element and the rest.
If 2 elements get combined, the merge method for the strings is called.
The map is remapped after reversing it, and the result is reversed again. Now the first element has to be inspected and possibly combined.
We're done.
I think a good data structure would be a linked list of pairs, where each pair contains the age span and the count. Using that, you can easily walk the list, and join two pairs in O(1).
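For illustration, a tiny hypothetical Python sketch of that idea; it walks left to right and splices a small group into its successor (the question's own pass runs right to left, but the O(1) join is the same):

class Group:
    # Singly linked list node holding an age span and a head count.
    def __init__(self, span, count, nxt=None):
        self.span, self.count, self.next = span, count, nxt

def collapse(head, threshold=5):
    node = head
    while node is not None and node.next is not None:
        if node.count <= threshold:
            # O(1) join: extend the span, add the counts, splice out the neighbour
            node.span += "," + node.next.span
            node.count += node.next.count
            node.next = node.next.next
        else:
            node = node.next
    return head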

How to generate cross product of sets in specific order

Given some sets (or lists) of numbers, I would like to iterate through the cross product of these sets in the order determined by the sum of the returned numbers. For example, if the given sets are { 1,2,3 }, { 2,4 }, { 5 }, then I would like to retrieve the cross-products in the order
<3,4,5>,
<2,4,5>,
<3,2,5> or <1,4,5>,
<2,2,5>,
<1,2,5>
I can't compute all the cross-products first and then sort them, because there are way too many. Is there any clever way to achieve this with an iterator?
(I'm using Perl for this, in case there are modules that would help.)
For two sets A and B, we can use a min heap as follows.
Sort A.
Sort B.
Push (0, 0) into a min heap H with priority function (i, j) |-> A[i] + B[j]. Break ties preferring small i and j.
While H is not empty, pop (i, j), output (A[i], B[j]), insert (i + 1, j) and (i, j + 1) if they exist and don't already belong to H.
For more than two sets, use the naive algorithm and sort to get down to two sets. In the best case (which happens when each set is relatively small), this requires storage for O(√#tuples) tuples instead of Ω(#tuples).
Here's some Python to do this. It should transliterate reasonably straightforwardly to Perl. You'll need a heap library from CPAN and to convert my tuples to strings so that they can be keys in a Perl hash. The set can be stored as a hash as well.
from heapq import heappop, heappush

def largest_to_smallest(lists):
    """
    >>> print list(largest_to_smallest([[1, 2, 3], [2, 4], [5]]))
    [(3, 4, 5), (2, 4, 5), (3, 2, 5), (1, 4, 5), (2, 2, 5), (1, 2, 5)]
    """
    for lst in lists:
        lst.sort(reverse=True)
    num_lists = len(lists)
    index_tuples_in_heap = set()
    min_heap = []

    def insert(index_tuple):
        if index_tuple in index_tuples_in_heap:
            return
        index_tuples_in_heap.add(index_tuple)
        minus_sum = 0  # compute -sum because it's a min heap, not a max heap
        for i in xrange(num_lists):  # 0, ..., num_lists - 1
            if index_tuple[i] >= len(lists[i]):
                return
            minus_sum -= lists[i][index_tuple[i]]
        heappush(min_heap, (minus_sum, index_tuple))

    insert((0,) * num_lists)
    while min_heap:
        minus_sum, index_tuple = heappop(min_heap)
        elements = []
        for i in xrange(num_lists):
            elements.append(lists[i][index_tuple[i]])
        yield tuple(elements)  # this is where the tuple is returned
        for i in xrange(num_lists):
            neighbor = []
            for j in xrange(num_lists):
                if i == j:
                    neighbor.append(index_tuple[j] + 1)
                else:
                    neighbor.append(index_tuple[j])
            insert(tuple(neighbor))

How to generate a list of subsets with restrictions?

I am trying to figure out an efficient algorithm that takes a list of items and generates all unique ways of splitting the list into exactly 2 sublists. I'm sure there is a general-purpose way to do this, but I'm interested in a specific case. My list will be sorted, and there can be duplicate items.
Some examples:
Input
{1,2,3}
Output
{{1},{2,3}}
{{2},{1,3}}
{{3},{1,2}}
Input
{1,2,3,4}
Output
{{1},{2,3,4}}
{{2},{1,3,4}}
{{3},{1,2,4}}
{{4},{1,2,3}}
{{1,2},{3,4}}
{{1,3},{2,4}}
{{1,4},{2,3}}
Input
{1,2,2,3}
Output
{{1},{2,2,3}}
{{2},{1,2,3}}
{{3},{1,2,2}}
{{1,2},{2,3}}
{{1,3},{2,2}}
I can do this on paper, but I'm struggling to figure out a simple way to do it programmatically. I'm only looking for a quick pseudocode description of how to do this, not any specific code examples.
Any help is appreciated. Thanks.
If you were generating all subsets you would end up generating 2^n subsets for a list of length n. A common way to do this is to iterate through all the numbers i from 0 to 2^n - 1 and use the bits that are set in i to determine which items are in the ith subset. This works because any item either is or is not present in any particular subset, so by iterating through all the combinations of n bits you iterate through the 2^n subsets.
For example, to generate the subsets of (1, 2, 3) you would iterate through the numbers 0 to 7:
0 = 000b → ()
1 = 001b → (1)
2 = 010b → (2)
3 = 011b → (1, 2)
4 = 100b → (3)
5 = 101b → (1, 3)
6 = 110b → (2, 3)
7 = 111b → (1, 2, 3)
In your problem you can generate each subset and its complement to get your pair of mutually exclusive subsets. Each pair would be repeated when you do this, so you only need to iterate up to 2^(n-1) - 1 and then stop.
1 = 001b → (1) + (2, 3)
2 = 010b → (2) + (1, 3)
3 = 011b → (1, 2) + (3)
To deal with duplicate items you could generate subsets of list indices instead of subsets of list items. Like with the list (1, 2, 2, 3) generate subsets of the list (0, 1, 2, 3) instead and then use those numbers as indices into the (1, 2, 2, 3) list. Add a level of indirection, basically.
Here's some Python code putting this all together.
#!/usr/bin/env python
def split_subsets(items):
    subsets = set()
    for n in xrange(1, 2 ** len(items) / 2):
        # Use ith index if ith bit of n is set.
        l_indices = [i for i in xrange(0, len(items)) if n & (1 << i) != 0]
        # Use the indices NOT present in l_indices.
        r_indices = [i for i in xrange(0, len(items)) if i not in l_indices]
        # Get the items corresponding to the indices above.
        l = tuple(items[i] for i in l_indices)
        r = tuple(items[i] for i in r_indices)
        # Swap l and r if they are reversed.
        if (len(l), l) > (len(r), r):
            l, r = r, l
        subsets.add((l, r))
    # Sort the subset pairs so the left items are in ascending order.
    return sorted(subsets, key = lambda (l, r): (len(l), l))

for l, r in split_subsets([1, 2, 2, 3]):
    print l, r
Output:
(1,) (2, 2, 3)
(2,) (1, 2, 3)
(3,) (1, 2, 2)
(1, 2) (2, 3)
(1, 3) (2, 2)
The following C++ function does exactly what you need, but the order differs from the one in the examples:
#include <iostream>
#include <map>
#include <vector>

// input contains all input numbers, with duplicates allowed
void generate(std::vector<int> input) {
    typedef std::map<int,int> Map;
    std::map<int,int> mp;
    for (size_t i = 0; i < input.size(); ++i) {
        mp[input[i]]++;
    }
    std::vector<int> numbers;
    std::vector<int> mult;
    for (Map::iterator it = mp.begin(); it != mp.end(); ++it) {
        numbers.push_back(it->first);
        mult.push_back(it->second);
    }
    std::vector<int> cur(mult.size());
    for (;;) {
        size_t i = 0;
        while (i < cur.size() && cur[i] == mult[i]) cur[i++] = 0;
        if (i == cur.size()) break;
        cur[i]++;
        std::vector<int> list1, list2;
        for (size_t i = 0; i < cur.size(); ++i) {
            list1.insert(list1.end(), cur[i], numbers[i]);
            list2.insert(list2.end(), mult[i] - cur[i], numbers[i]);
        }
        if (list1.size() == 0 || list2.size() == 0) continue;
        if (list1 > list2) continue;
        std::cout << "{{";
        for (size_t i = 0; i < list1.size(); ++i) {
            if (i > 0) std::cout << ",";
            std::cout << list1[i];
        }
        std::cout << "},{";
        for (size_t i = 0; i < list2.size(); ++i) {
            if (i > 0) std::cout << ",";
            std::cout << list2[i];
        }
        std::cout << "}}\n";
    }
}
A bit of Erlang code; the problem is that it generates duplicates when you have duplicate elements, so the result list still needs to be filtered:
do([E,F]) -> [{[E], [F]}];
do([H|T]) -> lists:flatten([{[H], T}] ++
                 [[{[H|L1],L2}, {L1,[H|L2]}] || {L1,L2} <- do(T)]).

filtered(L) ->
    lists:usort([case length(L1) < length(L2) of
                     true  -> {L1,L2};
                     false -> {L2,L1}
                 end || {L1,L2} <- do(L)]).
In pseudocode this means (a Python sketch follows below):
for a two-element list {E,F} the result is {{E},{F}}
for longer lists, take the first element H and the rest of the list T and return
{{H},{T}} (the first element as a single-element list, together with the remaining list)
and also run the algorithm recursively for T, and for each {L1,L2} element in the resulting list return {{H,L1},{L2}} and {{L1},{H,L2}}
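A rough Python transcription of that recursion (hypothetical names; like the Erlang version it assumes at least two elements and produces duplicates, which filtered then removes):

def splits(lst):
    # all ordered two-way splits, following the recursion described above
    if len(lst) == 2:
        return [([lst[0]], [lst[1]])]
    h, t = lst[0], lst[1:]
    result = [([h], t)]
    for l1, l2 in splits(t):
        result.append(([h] + l1, l2))   # H goes into the first part
        result.append((l1, [h] + l2))   # H goes into the second part
    return result

def filtered(lst):
    # normalise each pair (smaller part first) and drop duplicates
    seen, out = set(), []
    for l1, l2 in splits(lst):
        pair = tuple(map(tuple, sorted((l1, l2), key=lambda p: (len(p), p))))
        if pair not in seen:
            seen.add(pair)
            out.append(pair)
    return out

print(filtered([1, 2, 2, 3]))
# [((1,), (2, 2, 3)), ((1, 2), (2, 3)), ((2,), (1, 2, 3)), ((3,), (1, 2, 2)), ((1, 3), (2, 2))]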
My suggestion is...
First, count how many of each value you have, possibly in a hashtable. Then calculate the total number of combinations to consider: the product of (count + 1) over all values, since the first list can hold anywhere from zero up to count copies of each value.
Iterate through that number of combinations.
At each combination, copy your loop counter (as x), then start an inner loop through your hashtable items.
For each hashtable item, use x modulo (count + 1) as the number of instances of that hashtable key in the first list. Divide x by (count + 1) before repeating the inner loop.
If you are worried that the number of combinations might overflow your integer type, the issue is avoidable. Use an array with one item for every hashmap key, all starting from zero, and 'count' through the combinations treating each array item as a digit (so the whole array represents the combination number), but with each 'digit' having a different base (the corresponding count + 1). That is, to 'increment' the array, first increment item 0. If it overflows (exceeds that value's count), set it to zero and increment the next array item. Repeat the overflow check on each item you increment; if the overflows continue past the end of the array, you have finished.
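A rough Python sketch of this suggestion (hypothetical names; it uses collections.Counter as the hashtable and the modulo/divide form described above, relying on Python's arbitrary-precision integers rather than the digit array):

from collections import Counter

def split_pairs(items):
    # For each distinct value the first sublist may hold 0..count copies of it,
    # so every split corresponds to one value of a mixed-radix counter with one
    # "digit" per distinct value, where digit i runs over count_i + 1 values.
    counts = Counter(items)
    keys = sorted(counts)
    bases = [counts[k] + 1 for k in keys]
    total = 1
    for b in bases:
        total *= b                      # number of digit combinations to consider
    for x in range(total):
        left, right = [], []
        for k, b in zip(keys, bases):
            d = x % b                   # copies of k that go into the first sublist
            x //= b
            left += [k] * d
            right += [k] * (counts[k] - d)
        if not left or not right:
            continue                    # both parts must be non-empty
        if (len(left), left) > (len(right), right):
            continue                    # skip the mirror image of a pair seen earlier
        print(tuple(left), tuple(right))

split_pairs([1, 2, 2, 3])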
I think sergdev is using a very similar approach to this second one, but using std::map rather than a hashtable (std::unordered_map should work). A hashtable should be faster for large numbers of items, but won't give you the values in any particular order. The ordering for each loop through the keys in a hashtable should be consistent, though, unless you add/remove keys.
