Algorithm for combining different age groups together based on their values - algorithm

Let's say we have an array of age groups and an array of the number of people in each age group
For example:
Ages = ("1-13", "14-20", "21-30", "31-40", "41-50", "51+")
People = (1, 10, 21, 3, 2, 1)
I want to have an algorithm that combines these age groups with the following logic if there are fewer than 5 people in each group. The algorithm that I have so far does the following:
Start from the last element (e.g., "51+") can you combine it with the next group? (here "41-50") if yes add the numbers 1+2 and combine their labels. So we get the following
Ages = ("1-13", "14-20", "21-30", "31-40", "41+")
People = (1, 10, 21, 3, 3)
Take the last one again (here is "41+"). Can you combine it with the next group (31-40)? the answer is yes so we get:
Ages = ("1-13", "14-20", "21-30", "31+")
People = (1, 10, 21, 6)
since the group 31+ now has 6 members we cannot collapse it into the next group.
we cannot collapse "21-30" into the next one "14-20" either
"14-20" also has 10 people (>5) so we don't do anything on this either
for the first one ("1-13") since we have only one person and it is the last group we combine it with the next group "14-20" and get the following
Ages = ("1-20", "21-30", "31+")
People = (11, 21, 6)
I have an implementation of this algorithm that uses many flags to keep track of whether or not any data is changed and it makes a number of passes on the two arrays to finish this task.
My question is if you know any efficient way of doing the same thing? any data structure that can help? any algorithm that can help me do the same thing without doing too much bookkeeping would be great.
Update:
A radical example would be (5,1,5)
in the first pass it becomes (5,6) [collapsing the one on the right into the one in the middle]
then we have (5,6). We cannot touch 6 since it is larger than our threshold:5. so we go to the next one (which is element on the very left 5) since it is less than or equal to 5 and since it is the last one on the left we group it with the one on its right. so we finally get (11)

Here is an OCaml solution of a left-to-right merge algorithm:
let close_group acc cur_count cur_names =
(List.rev cur_names, cur_count) :: acc
let merge_small_groups mini l =
let acc, cur_count, cur_names =
List.fold_left (
fun (acc, cur_count, cur_names) (name, count) ->
if cur_count <= mini || count <= mini then
(acc, cur_count + count, name :: cur_names)
else
(close_group acc cur_count cur_names, count, [name])
) ([], 0, []) l
in
List.rev (close_group acc cur_count cur_names)
let input = [
"1-13", 1;
"14-20", 10;
"21-30", 21;
"31-40", 3;
"41-50", 2;
"51+", 1
]
let output = merge_small_groups 5 input
(* output = [(["1-13"; "14-20"], 11); (["21-30"; "31-40"; "41-50"; "51+"], 27)] *)
As you can see, the result of merging from left to right may not be what you want.
Depending on the goal, it may make more sense to merge the pair of consecutive elements whose sum is smallest and iterate until all counts are above the minimum of 5.

Here is my scala approach.
We start with two lists:
val people = List (1, 10, 21, 3, 2, 1)
val ages = List ("1-13", "14-20", "21-30", "31-40", "41-50", "51+")
and combine them to a kind of mapping:
val agegroup = ages.zip (people)
define a method to merge two Strings, describing an (open ended) interval. The first parameter is, if any, the one with the + in "51+".
/**
combine age-strings
a+ b-c => b+
a-b c-d => c-b
*/
def merge (xs: String, ys: String) = {
val xab = xs.split ("[+-]")
val yab = ys.split ("-")
if (xs.contains ("+")) yab(0) + "+" else
yab (0) + "-" + xab (1)
}
Here is the real work:
/**
reverse the list, combine groups < threshold.
*/
def remap (map: List [(String, Int)], threshold : Int) = {
def remap (mappings: List [(String, Int)]) : List [(String, Int)] = mappings match {
case Nil => Nil
case x :: Nil => x :: Nil
case x :: y :: xs => if (x._2 > threshold) x :: remap (y :: xs) else
remap ((merge (x._1, y._1), x._2 + y._2) :: xs) }
val nearly = (remap (map.reverse)).reverse
// check for first element
if (! nearly.isEmpty && nearly.length > 1 && nearly (0)._2 < threshold) {
val a = nearly (0)
val b = nearly (1)
val rest = nearly.tail.tail
(merge (b._1, a._1), a._2 + b._2) :: rest
} else nearly
}
and invocation
println (remap (agegroup, 5))
with result:
scala> println (remap (agegroup, 5))
List((1-20,11), (21-30,21), (31+,6))
The result is a list of pairs, age-group and membercount.
I guess the main part is easy to understand: There are 3 basic cases: an empty list, which can't be grouped, a list of one group, which is the solution itself, and more than one element.
If the first element (I reverse the list in the beginning, to start with the end) is bigger than 5 (6, whatever), yield it, and procede with the rest - if not, combine it with the second, and take this combined element and call it with the rest in a recursive way.
If 2 elements get combined, the merge-method for the strings is called.
The map is remapped, after reverting it, and the result reverted again. Now the first element has to be inspected and eventually combined.
We're done.

I think a good data structure would be a linked list of pairs, where each pair contains the age span and the count. Using that, you can easily walk the list, and join two pairs in O(1).

Related

Scala best way to find pairs in a collection [duplicate]

This question already has answers here:
Composing a list of all pairs
(3 answers)
Closed 5 years ago.
I'm trying to find the most optimal way of finding pairs in a Scala collection. For example,
val list = List(1,2,3)
should produce these pairs
(1,2) (1,3) (2,1) (2,3) (3,1) (3,2)
My current implement seems quite expensive. How can I further optimize this?
val pairs = list.flatMap { currentElement =>
val clonedList: mutable.ListBuffer[Int] = list.to[ListBuffer]
val currentIndex = list.indexOf(currentElement)
val removedValue = clonedList.remove(currentIndex)
clonedList.map { y =>
(currentElement, y)
}
}
val l = Array(1,2,3,4)
val result = scala.collection.mutable.HashSet[(Int, Int)]()
for(i <- 0 until l.size) {
for(j<- (i+1) until l.size) {
result += l(i)->l(j)
result += l(j)->l(i)
}
}
Several optimizations here. First, with the second loop, we only traverse the list from the current element to the end, dividing the number of iterations by two. Then we limit the number of object creations to the minimum (Only tuples are created and added to a mutable hashset). Finally, with the hashset you handle the duplicates for free. An additional optimization would be to check if the set already contains the tuple to avoid creating an object for nothing.
For 1,000 elements, it takes less than 1s on my laptop. 7s for 10k distinct elements.
Note that recursively, you could do it this way:
def combi(s : Seq[Int]) : Seq[(Int, Int)] =
if(s.isEmpty)
Seq()
else
s.tail.flatMap(x=> Seq(s.head -> x, x -> s.head)) ++ combi(s.tail)
It takes a little bit more than 1s for 1000 elements.
Supposing that "most optimal way" could be treated differently (e.g. most of time I treat it the one which allows myself to be more productive) I suggest the following approach:
val originalList = (1 to 1000) toStream
def orderedPairs[T](list: Stream[T]) = list.combinations(2).map( p => (p(0), p(1)) ).toStream
val pairs = orderedPairs(originalList) ++ orderedPairs(originalList.reverse)
println(pairs.slice(0, 1000).toList)

What is the logic behind the algorithm

I am trying to solve a problem from codility
"Even sums"
but am unable to do so. Here is the question below.
Even sums is a game for two players. Players are given a sequence of N positive integers and take turns alternately. In each turn, a player chooses a non-empty slice (a subsequence of consecutive elements) such that the sum of values in this slice is even, then removes the slice and concatenates the remaining parts of the sequence. The first player who is unable to make a legal move loses the game.
You play this game against your opponent and you want to know if you can win, assuming both you and your opponent play optimally. You move first.
Write a function:
string solution(vector< int>& A);
that, given a zero-indexed array A consisting of N integers, returns a string of format "X,Y" where X and Y are, respectively, the first and last positions (inclusive) of the slice that you should remove on your first move in order to win, assuming you have a winning strategy. If there is more than one such winning slice, the function should return the one with the smallest value of X. If there is more than one slice with the smallest value of X, the function should return the shortest. If you do not have a winning strategy, the function should return "NO SOLUTION".
For example, given the following array:
A[0] = 4 A[1] = 5 A[2] = 3 A[3] = 7 A[4] = 2
the function should return "1,2". After removing a slice from positions 1 to 2 (with an even sum of 5 + 3 = 8), the remaining array is [4, 7, 2]. Then the opponent will be able to remove the first element (of even sum 4) or the last element (of even sum 2). Afterwards you can make a move that leaves the array containing just [7], so your opponent will not have a legal move and will lose. One of possible games is shown on the following picture
Note that removing slice "2,3" (with an even sum of 3 + 7 = 10) is also a winning move, but slice "1,2" has a smaller value of X.
For the following array:
A[0] = 2 A[ 1 ] = 5 A[2] = 4
the function should return "NO SOLUTION", since there is no strategy that guarantees you a win.
Assume that:
N is an integer within the range [1..100,000]; each element of array A is an integer within the range [1..1,000,000,000]. Complexity:
expected worst-case time complexity is O(N); expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments). Elements of input arrays can be modified.
I have found a solution online in python.
def check(start, end):
if start>end:
res = 'NO SOLUTION'
else:
res = str(start) + ',' + str(end)
return res
def trans( strr ):
if strr =='NO SOLUTION':
return (-1, -1)
else:
a, b = strr.split(',')
return ( int(a), int(b) )
def solution(A):
# write your code in Python 2.7
odd_list = [ ind for ind in range(len(A)) if A[ind]%2==1 ]
if len(odd_list)%2==0:
return check(0, len(A)-1)
odd_list = [-1] + odd_list + [len(A)]
res_cand = []
# the numbers at the either end of A are even
count = odd_list[1]
second_count = len(A)-1-odd_list[-2]
first_count = odd_list[2]-odd_list[1]-1
if second_count >= count:
res_cand.append( trans(check( odd_list[1]+1, len(A)-1-count )))
if first_count >= count:
res_cand.append( trans(check( odd_list[1]+count+1, len(A)-1 )))
twosum = first_count + second_count
if second_count < count <= twosum:
res_cand.append( trans(check( odd_list[1]+(first_count-(count-second_count))+1, odd_list[-2] )))
###########################################
count = len(A)-1-odd_list[-2]
first_count = odd_list[1]
second_count = odd_list[-2]-odd_list[-3]-1
if first_count >= count:
res_cand.append( trans(check( count, odd_list[-2]-1 )))
if second_count >= count:
res_cand.append( trans(check( 0, odd_list[-2]-count-1)) )
twosum = first_count + second_count
if second_count < count <= twosum:
res_cand.append( trans(check( count-second_count, odd_list[-3])) )
res_cand = sorted( res_cand, key=lambda x: (-x[0],-x[1]) )
cur = (-1, -2)
for item in res_cand:
if item[0]!=-1:
cur = item
return check( cur[0], cur[1] )
This code works and I am unable to understand the code and flow of one function to the the other. However I don't understand the logic of the algorithm. How it has approached the problem and solved it. This might be a long task but can anybody please care enough to explain me the algorithm. Thanks in advance.
So far I have figured out that the number of odd numbers are crucial to find out the result. Especially the index of the first odd number and the last odd number is needed to calculate the important values.
Now I need to understand the logic behind the comparison such as "if first_count >= count" and if "second_count < count <= twosum".
Update:
Hey guys I found out the solution to my question and finally understood the logic of the algorithm.
The idea lies behind the symmetry of the array. We can never win the game if the array is symmetrical. Here symmetrical is defined as the array where there is only one odd in the middle and equal number of evens on the either side of that one odd.
If there are even number of odds we can directly win the game.
If there are odd number of odds we should always try to make the array symmetrical. That is what the algorithm is trying to do.
Now there are two cases to it. Either the last odd will remain or the first odd will remain. I will be happy to explain more if you guys didn't understand it. Thanks.

Finding the lowest sum of values in a list to form a target factor

I'm stuck as to how to make an algorithm to find a combination of elements from a list where the sum of those factors is the lowest possible where the factor of those numbers is a predetermined target value.
For instance a list:
(2,5,7,6,8,2,3)
And a target value:
12
Would result in these factors:
(2,2,3) and (2,6)
But the optimal combination would be:
(2,2,3)
As it has a lower sum
First erase from the list all numbers that aren't factors of n. So in your example your list would reduce to (2, 6, 2, 3). Then I would sort the list. So you have (2, 2, 3, 6). Start multiplying the elements from the left to right if you reach n stop. If you exceed n find the next smallest permutation of your numbers and repeat. This will be (2, 2, 6, 3) (for a C++ function that finds the next permutation see this link). This will guarantee to find the multiplication with the smallest sum because the we are checking the products in order from smallest sum to largest. This runs in the size of your list factorial but I think that is as good as you're going to get. This problem sounds NP hard.
You can do slightly better by pruning the permutations. Lets say you were looking for 24 and your list is (2, 4, 8, 12). The only subset is (2, 12). But the next permutation will be (2, 4, 12, 8) which you don't even need to generate because you knew that 2*4 was too small and 2*4*8 was too big and swapping 12 with 8 only increased 2*4*8. This way you didn't have to test that permutation.
You should be able to break the problem down recursively. You have a multiset of potential factors S = {n_1, n_2, ..., n_k}. Let f(S,n) be the maximum sum n_i_1 + n_i_2 + ... + n_i_j where n_i_l are distinct elements of the multiset and n_i_1 * ... * n_i_j = n. Then f(S,n) = max_i { (n_i + f(S-{n_i},n/n_i)) where n_i divides n }. In other words, f(S,n) can be computed recursively. With a little more work you can get the algorithm to spit out the actual n_is that work. The time complexity could be bad, but you don't say what your goals are in that regard.
def primes(n):
primfac = []
d = 2
while d*d <= n:
while (n % d) == 0:
primfac.append(d) # supposing you want multiple factors repeated
n //= d
d += 1
if n > 1:
primfac.append(n)
return primfac
def get_factors_list(dividend, ceiling = float('infinity')):
""" Yield all lists of factors where the largest is no larger than ceiling """
for divisor in range(min(ceiling, dividend - 1), 1, -1):
quotient, mod = divmod(dividend, divisor)
if mod == 0:
if quotient <= divisor:
yield [divisor, quotient]
for factors in get_factors_list(quotient, divisor):
yield [divisor] + factors
def print_factors(x):
factorList = []
if x > 0:
for factors in get_factors_list(x):
factorList.append(list(map(int, factors)))
return factorList
Here's is how you could do it in Haskell:
import Data.List(sortBy, subsequences)
import Data.Function(on)
lowestSumTargetFactor :: (Ord b, Num b) => [b] -> b -> [b]
lowestSumTargetFactor xs target = do
let l = filter (/= []) $ sortBy (compare `on` sum)
[x | x <- subsequences xs, product x == target]
if l == []
then error $ "lowestSumTargetFactor: " ++
"no subsequence product equals target."
else head l
Here's what is happening:
[x | x <- subsequences xs, product x == target] builds a list made of all subsequences of the list xs whose product equals target. In your example, it would build the list [[2,6],[6,2],[2,2,3]].
Then the sortBy (compareonsum) part sorts that list of list by the sum of it's list elements. It would return the list [[2,2,3],[2,6],[6,2]].
I then filter that list, removing any [] elements because product [] returns 1 (don't know the reasoning for this, yet). This was done because lowestSumTargetFactor [1, 1, 1] 1 would return [] instead of the expected [1].
Then I ask if the list we built is []. If no, I use the function head to return the first element of that list ([2,2,3] in your case). If yes, it returns the error as written.
Obs1: where it appears above, the $ just means that everything after it is enclosed in parentheses.
Obs2: the lowestSumTargetFactor :: (Ord b, Num b) => [b] -> b -> [b] part is just the function's type signature. It means that the function takes a list made of bs, a second argument b and returns another list made of bs, b being a member of both the Ord class of totally ordered datatypes, and the Num class, the basic numeric class.
Obs3: I'm still a beginner. A more experienced programmer would probably do this much more efficiently and elegantly.

Find all possible combinations from 4 input numbers which can add up to 24

Actually, this question can be generalized as below:
Find all possible combinations from a given set of elements, which meets
a certain criteria.
So, any good algorithms?
There are only 16 possibilities (and one of those is to add together "none of them", which ain't gonna give you 24), so the old-fashioned "brute force" algorithm looks pretty good to me:
for (unsigned int choice = 1; choice < 16; ++choice) {
int sum = 0;
if (choice & 1) sum += elements[0];
if (choice & 2) sum += elements[1];
if (choice & 4) sum += elements[2];
if (choice & 8) sum += elements[3];
if (sum == 24) {
// we have a winner
}
}
In the completely general form of your problem, the only way to tell whether a combination meets "certain criteria" is to evaluate those criteria for every single combination. Given more information about the criteria, maybe you could work out some ways to avoid testing every combination and build an algorithm accordingly, but not without those details. So again, brute force is king.
There are two interesting explanations about the sum problem, both in Wikipedia and MathWorld.
In the case of the first question you asked, the first answer is good for a limited number of elements. You should realize that the reason Mr. Jessop used 16 as the boundary for his loop is because this is 2^4, where 4 is the number of elements in your set. If you had 100 elements, the loop limit would become 2^100 and your algorithm would literally take forever to finish.
In the case of a bounded sum, you should consider a depth first search, because when the sum of elements exceeds the sum you are looking for, you can prune your branch and backtrack.
In the case of the generic question, finding the subset of elements that satisfy certain criteria, this is known as the Knapsack problem, which is known to be NP-Complete. Given that, there is no algorithm that will solve it in less than exponential time.
Nevertheless, there are several heuristics that bring good results to the table, including (but not limited to) genetic algorithms (one I personally like, for I wrote a book on them) and dynamic programming. A simple search in Google will show many scientific papers that describe different solutions for this problem.
Find all possible combinations from a given set of elements, which
meets a certain criteria
If i understood you right, this code will helpful for you:
>>> from itertools import combinations as combi
>>> combi.__doc__
'combinations(iterable, r) --> combinations object\n\nReturn successive r-length
combinations of elements in the iterable.\n\ncombinations(range(4), 3) --> (0,1
,2), (0,1,3), (0,2,3), (1,2,3)'
>>> set = range(4)
>>> set
[0, 1, 2, 3]
>>> criteria = range(3)
>>> criteria
[0, 1, 2]
>>> for tuple in list(combi(set, len(criteria))):
... if cmp(list(tuple), criteria) == 0:
... print 'criteria exists in tuple: ', tuple
...
criteria exists in tuple: (0, 1, 2)
>>> list(combi(set, len(criteria)))
[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
Generally for a problem as this you have to try all posebilities, the thing you should do have the code abort the building of combiantion if you know it will not satesfie the criteria (if you criteria is that you do not have more then two blue balls, then you have to abort calculation that has more then two). Backtracing
def perm(set,permutation):
if lenght(set) == lenght(permutation):
print permutation
else:
for element in set:
if permutation.add(element) == criteria:
perm(sett,permutation)
else:
permutation.pop() //remove the element added in the if
The set of input numbers matters, as you can tell as soon as you allow e.g. negative numbers, imaginary numbers, rational numbers etc in your start set. You could also restrict to e.g. all even numbers, all odd number inputs etc.
That means that it's hard to build something deductive. You need brute force, a.k.a. try every combination etc.
In this particular problem you could build an algoritm that recurses - e.g. find every combination of 3 Int ( 1,22) that add up to 23, then add 1, every combination that add to 22 and add 2 etc. Which can again be broken into every combination of 2 that add up to 21 etc. You need to decide if you can count same number twice.
Once you have that you have a recursive function to call -
combinations( 24 , 4 ) = combinations( 23, 3 ) + combinations( 22, 3 ) + ... combinations( 4, 3 );
combinations( 23 , 3 ) = combinations( 22, 2 ) + ... combinations( 3, 2 );
etc
This works well except you have to be careful around repeating numbers in the recursion.
private int[][] work()
{
const int target = 24;
List<int[]> combos = new List<int[]>();
for(int i = 0; i < 9; i++)
for(int x = 0; x < 9; x++)
for(int y = 0; y < 9; y++)
for (int z = 0; z < 9; z++)
{
int res = x + y + z + i;
if (res == target)
{
combos.Add(new int[] { x, y, z, i });
}
}
return combos.ToArray();
}
It works instantly, but there probably are better methods rather than 'guess and check'. All I am doing is looping through every possibility, adding them all together, and seeing if it comes out to the target value.
If i understand your question correctly, what you are asking for is called "Permutations" or the number (N) of possible ways to arrange (X) numbers taken from a set of (Y) numbers.
N = Y! / (Y - X)!
I don't know if this will help, but this is a solution I came up with for an assignment on permutations.
You have an input of : 123 (string) using the substr functions
1) put each number of the input into an array
array[N1,N2,N3,...]
2)Create a swap function
function swap(Number A, Number B)
{
temp = Number B
Number B = Number A
Number A = temp
}
3)This algorithm uses the swap function to move the numbers around until all permutations are done.
original_string= '123'
temp_string=''
While( temp_string != original_string)
{
swap(array element[i], array element[i+1])
if (i == 1)
i == 0
temp_string = array.toString
i++
}
Hopefully you can follow my pseudo code, but this works at least for 3 digit permutations
(n X n )
built up a square matrix of nxn
and print all together its corresponding crossed values
e.g.
1 2 3 4
1 11 12 13 14
2 .. .. .. ..
3 ..
4 .. ..

Filter elements in a list by length - Ocaml

I have the following list:
["A";"AA";"ABC";"BCD";"B";"C"]
I am randomly extracting an element from the list. But the element I extract should be of size 3 only not lesser than 3.
I am trying to do this as follows:
let randomnum = (Random.int(List.length (list)));;
let rec code c =
if (String.length c) = 3 then c
else (code ((List.nth (list) (randomnum)))) ;;
print_string (code ( (List.nth (list) (randomnum)))) ;;
This works fine if randomly a string of length 3 is picked out from the list.
But the program does not terminate if a string of length < 3 is picked up.
I am trying to do a recursive call so that new code keeps getting picked up till we get one of length = 3.
I am unable to figure out why this is does not terminate. Nothing gets output by the print statement.
What you probably want to write is
let rec code list =
let n = Random.int (List.length list) in
let s = List.nth list in
if String.length s < 3 then code list else s
Note that, depending on the size of the list and the number of strings of size greater than 3, you might want to work directly on a list with only strings greater than 3:
let code list =
let list = List.filter (fun s -> String.length s >= 3) list in
match list with
| [] -> raise Not_found
| _ -> List.nth list (Random.int (List.length list))
This second function is better, as it always terminate, especially when there are no strings greater than 3.
You only pick a random number once. Say you pick 5. You just keep recursing with 5 over and over and over. You need to get a new random number.
For your code to terminate, it would be better to first filter the list for suitable elements, then take your random number:
let code list =
let suitables = List.filter (fun x -> String.length x = 3) list in
match List.length suitables with
| 0 -> raise Not_found (* no suitable elements at all! *)
| len -> List.nth suitables (Random.int len)
Otherwise your code would take very long to terminate on a large list of elements with size <> 3; or worse on a list with no element of size 3, it would not terminate at all!

Resources