The inputs are an array A of non-negative integers and another integer K.
We should partition A into K blocks of consecutive elements (by "partition" I mean that every element of A belongs to some block and 2 different blocks don't contain any element in common).
We define the sum of a block as the sum of the elements of the block.
The goal is to find a partition into K blocks such that the maximum of the block sums (let's call that "MaxSumBlock") is minimized.
We need to output the MaxSumBlock (we don't need to find an actual partition).
Here is an example:
Input:
A = {2, 1, 5, 1, 2, 2, 2}
K = 3
Expected output:
MaxSumBlock: 6
(with partition: {2, 1}, {5, 1}, {2, 2, 2})
In the expected output, the sums of each block are 3, 6 and 6. The max is 6.
Here is a non-optimal partition:
partition: {2, 1}, {5}, {1, 2, 2, 2}
The sums of each block in that case are 3, 5 and 7. The max is hence 7. It is not a correct answer.
What algorithm solves this problem?
EDIT: K and the size of A are no bigger than 100'000. Each element of A is no bigger than 10'000.
Use binary search.
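Let the max sum range from 0 to sum(array), and let mid be the midpoint of the current range. Check in O(n) time whether the array can be partitioned into k sets with every set's sum at most mid. If yes, continue with the lower half of the range; if not, continue with the upper half.
This will give you the result in O(n log(sum(array))), since the binary search runs over the range of possible sums.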
PS: if you have any problem with writing the code, I can help but I'd suggest you try it first yourself.
EDIT:
as requested, I'll explain how to find if mid can be achieved by partitioning into k sets in O(n) time.
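Iterate through the elements, adding each one to the current set while the running sum stays less than or equal to mid. As soon as adding an element would make the sum greater than mid, let that element start the next set. If you end up with k or fewer sets, mid is achievable; otherwise it is not. (If a single element is greater than mid, mid is not achievable either.)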
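The binary search and the O(n) check described above could be sketched in Python roughly as follows. The function names are made up for illustration, the array is assumed to be non-empty, and the search starts at max(a) rather than 0, which is only a small shortcut (no block sum can be below the largest element).

```python
def feasible(a, k, limit):
    """Greedy check: can a be cut into at most k consecutive blocks, each with sum <= limit?"""
    blocks, current = 1, 0
    for x in a:
        if x > limit:
            return False          # a single element already exceeds the limit
        if current + x > limit:
            blocks += 1           # x starts the next block
            current = x
        else:
            current += x
    return blocks <= k


def min_max_block_sum(a, k):
    """Binary search for the smallest limit that the greedy check accepts."""
    lo, hi = max(a), sum(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(a, k, mid):
            hi = mid              # mid works, try smaller limits
        else:
            lo = mid + 1          # mid is too small
    return lo


print(min_max_block_sum([2, 1, 5, 1, 2, 2, 2], 3))  # prints 6
```

With the limits from the question (sum at most 10^9), the loop runs about 30 times, each pass being one linear scan.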
Problem is simple -
Suppose I have an array of following numbers -
4,1,4,5,7,4,3,1,5
I have to find number of sets of k elements each that can be created from above numbers having largest sum. Two sets are considered to be different if they have at least one different element.
e.g.
if k = 2, then there can be two sets - {7,5} and {7,5}. Note: 5 appears twice in above array.
I think I can start with something like-
1. Sort array
2. Create two arrays. One for the different numbers and another in parallel for each number's occurrence count.
But I am stuck now. Any suggestions?
The algorithm is as follows:
1) Sort elements in descending order.
2) Look at this array. It may look something like this:
a ... a b ... b c ... c d ...
| <- k -> |
Now obviously all elements a and b will be in the sets with the largest sum. You can't replace any of them with a smaller element, because then the sum wouldn't be the largest possible. So you have no choice here, you have to choose all a and b for any of the sets.
On the other hand only some of the elements c will be in those sets. So the answer is just the number of possibilities, to choose c's to fill the positions left in the sets, after you have taken all larger elements. That is the binomial coefficient:
count of c's choose (k - (count of elements larger than c))
For example for an array (already sorted here)
[9, 8, 7, 7, 5, 5, 5, 5, 4, 4, 2, 2, 1, 1, 1]
and k = 6, you must choose 9, 8 and both 7's for every set with the largest sum (which is 41). And then you can choose any two out of the four 5's. So the result will be 4 choose 2 = 6.
With the same array and k = 4, the result would be x choose 0 = 1 (that unique set is {9, 8, 7, 7}), with k = 7 the result would be 4 choose 3 = 4, and with k = 9: 2 choose 1 = 2 (choosing any 4 for the set with the largest sum).
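As a rough sketch of the counting rule above (assuming Python 3.8+ for math.comb and 1 <= k <= len(values); the function name is hypothetical):

```python
from math import comb


def count_max_sum_sets(values, k):
    """Number of ways to pick k elements with the largest possible sum."""
    ordered = sorted(values, reverse=True)
    boundary = ordered[k - 1]                        # the value "c" at the cut-off position
    larger = sum(1 for v in ordered if v > boundary) # elements that must be taken
    # Choose the remaining positions among all copies of the boundary value.
    return comb(ordered.count(boundary), k - larger)


data = [9, 8, 7, 7, 5, 5, 5, 5, 4, 4, 2, 2, 1, 1, 1]
print(count_max_sum_sets(data, 6))  # prints 6
```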
EDIT: I edited the answer, because we figured it out that OP needs to count multisets.
First, find the largest k numbers in the array. This is of course easy, and if k is very small, you can do it in O(n * k) by performing k linear scans. If k is not so small, you can use a binary heap or a priority queue, which is O(n * log(k)), or just sort the array, which is O(n * log(n)).
Let's assume that you have computed the k largest numbers. Of course all sets of size k with the largest sum have to contain exactly these k largest numbers and no other numbers. On the other hand, any different set doesn't have the largest sum.
Let count[i] be the number of occurrences of number i in the input sequence.
Let occ[i] be the number of occurrences of number i in the largest k numbers.
We can compute these both tables in very different ways, for example using a hash table or if input numbers are small, you can use an array indexed by these numbers.
Let B be the array of distinct numbers from the largest k numbers.
Let m be the size of B.
Now let's compute the answer. We will do it in m steps. After the i-th step we will have computed the number of different multisets consisting of the first i numbers from B. At the beginning the result is 1 since there is only one empty multiset. In the i-th step, we multiply the current result by the number of ways to choose occ[B[i]] elements from count[B[i]] elements, which is equal to binomial(count[B[i]], occ[B[i]]).
For example, let's consider your instance with added one more 7 at the end and k set to 3:
k = 3
A = [4, 1, 4, 5, 7, 4, 3, 1, 5, 7]
The largest three numbers in A are 7, 7, 5
At the beginning we have:
count[7] = 2
count[5] = 2
occ[7] = 2
occ[5] = 1
result = 1
B = [7, 5]
We start with the first element in B which is 7. Its count is 2 and its occ is also 2, so we do:
// binomial(2, 2) is 1
result = result * binomial(2, 2)
Next element in B is 5, its count is 2 and its occ is 1, so we do:
// binomial(2, 1) is 2
result = result * binomial(2, 1)
And the final result is 2, since there are two different multisets [7, 7, 5]
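The count/occ walkthrough above could look something like this in Python. This is only a sketch: collections.Counter stands in for the hash table, sorting is used to find the largest k numbers, and the function name is invented for the example.

```python
from collections import Counter
from math import comb


def count_multisets(values, k):
    """Multiply binomial(count[x], occ[x]) over the distinct numbers x among the largest k."""
    top_k = sorted(values, reverse=True)[:k]
    count = Counter(values)   # occurrences in the whole input
    occ = Counter(top_k)      # occurrences among the largest k numbers
    result = 1
    for number in occ:        # the distinct numbers, i.e. the array B
        result *= comb(count[number], occ[number])
    return result


print(count_multisets([4, 1, 4, 5, 7, 4, 3, 1, 5, 7], 3))  # prints 2
```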
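I'd create a sorted dictionary of the frequencies of occurrence of the numbers in the input. Then, for k = 2, take the two largest numbers and multiply the number of times they occur.
In C++, it could look something like this:
#include <iostream>
#include <map>
#include <vector>

int main() {
    std::vector<int> inputs { 4, 1, 4, 5, 7, 3, 1, 5 };
    std::map<int, int> counts;            // value -> number of occurrences, ordered by value
    for (auto i : inputs)
        ++counts[i];
    auto last = counts.rbegin();          // entry for the largest value
    int largest_count = last->second;
    int second_count = (++last)->second;  // entry for the second-largest value
    int set_count = largest_count * second_count;
    std::cout << set_count << '\n';
}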
You can do the following:
1) Sort the elements in descending order;
2) define variable answer=1;
3) Start from the beginning of the array and, for each new value you see, count the number of its occurrences (let's call this variable count). Every time, do: answer = answer * count. The pseudo-code should look like this.
find_count(Array A, K)
{
    sort(A, 'descending');
    int answer=1;
    int count=1;
    for (int i=1,j=1; i<K && j<A.length; j++)
    {
        if(A[j] != A[j-1])
        {
            answer = answer * count;
            i++;
            count=1;
        }
        else
            count++;
    }
    return answer;
}
I have a university assignment that requires me to code the Counting Sort algorithm with n threads in Java. We haven't really been given more information than that. I thought that the best way would be to partition the array into n sections, then each thread sorts a section. The problem is that I am unsure of how to partition the array properly; I have only seen examples on how to partition into 2 sections, not n sections.
I would appreciate it if someone could provide me with the logic on how to partition it like I've explained, or give some pseudo-code. No source code please, this is an assignment I have to do.
I have no problem with the actual sorting, just the partitioning.
Thanks.
Definitions
Let's say that you have an array a[0..n-1] to sort and you want to do it using k threads.
For simplicity, let's assume that the smallest element has value 0 and the largest has value m. If the smallest is not equal to 0, you can shift the values when assigning elements to threads.
Splitting into threads
Partition your array into k chunks each consisting of at most floor(m/k) + 1 different values of elements.
The i-th chunk consists of elements a[j] such that:
(i - 1) * (floor(m/k) + 1) <= a[j] < i * (floor(m/k) + 1)
For example, if you have an array with 10 elements:
a[0..9] = {1, 2, 5, 0, 3, 7, 2, 3 ,4, 6} and k = 3, then m = 7 and the 3 chunks are:
chunk_1: elements in range [0,3) -> [1, 2, 0, 2]
chunk_2: elements in range [3,6) -> [5, 3, 3, 4]
chunk_3: elements in range [6,9) -> [7, 6]
Next, assign each chunk to a separate thread. Each thread sorts one chunk and, to get the whole array sorted, just concatenate the results from all threads in order:
thread_1 thread_2 ... thread_k
Complexity:
As you know, the complexity of the count sort is O(n + L) where n is the number of elements to sort, and L is the maximum value of element.
First, notice that you can scale down values in each thread in such a way, that L < floor(m/k) + 1 in that thread, so the complexity of count sort in each thread always depends on the number of elements in that thread.
If you assume that the distribution of the values is uniform, then the expected number of elements in each thread is n/k, so the total complexity of each thread is O(n/k + m/k).
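For illustration only, here is how the value-range partitioning could be sketched in Python (not Java, and not meant as a ready-made assignment solution). It assumes non-negative integers with smallest value 0 as in the definitions above; the function names are hypothetical, and concurrent.futures.ThreadPoolExecutor is used just to show one chunk per worker.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_value(a, k):
    """Put each element into the chunk that owns its value range of width floor(m/k) + 1."""
    width = max(a) // k + 1
    chunks = [[] for _ in range(k)]
    for x in a:
        chunks[x // width].append(x)
    return chunks, width


def counting_sort(chunk, lo, width):
    """Counting sort of one chunk whose values all lie in [lo, lo + width)."""
    counts = [0] * width
    for x in chunk:
        counts[x - lo] += 1       # values are shifted down by lo
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)
    return out


def parallel_counting_sort(a, k):
    if not a:
        return []
    chunks, width = split_by_value(a, k)
    lows = [i * width for i in range(k)]
    with ThreadPoolExecutor(max_workers=k) as pool:
        parts = pool.map(counting_sort, chunks, lows, [width] * k)
        # map() yields results in chunk order, so concatenation is already sorted.
        return [x for part in parts for x in part]


print(parallel_counting_sort([1, 2, 5, 0, 3, 7, 2, 3, 4, 6], 3))
```

Each worker only allocates a count table of size floor(m/k) + 1, which is the "scaling down" mentioned in the complexity notes.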
The first idea popping into my mind is to partition the array recursively. That means if you can partition into 2, you can also partition into 4 , right?
A more advanced and modern approach is to partition into many more parts than you have threads or processes. Then assign these parts dynamically to the threads.
Here is the problem, an unsorted array a[n], and I need to find the kth smallest number in range [i, j], and absolutely 1<=i<=j<=n, k<=j-i+1.
Typically I would use quick-find to do the job, but it is not fast enough if there are many queries with different ranges [i, j]. I can hardly figure out an algorithm that answers a query in O(logn) time (preprocessing is allowed).
Any idea is appreciated.
PS
Let me make the problem easier to understand. Any kinds of preprocessing is allowed, but the query needs to be done in O(logn) time. And there will be many (more than 1) queries, like find the 1st in range [3,7], or 3rd in range [10,17], or 11th in range [33, 52].
By range [i, j] I mean in the original array, not sorted or something.
For example, a[5] = {3,1,7,5,9}, query 1st in range [3,4] is 5, 2nd in range [1,3] is 5, 3rd in range [0,2] is 7.
If pre-processing is allowed and not counted towards the time complexity, just use that to construct sub-lists so that you can efficiently find the element you're looking for. As with most optimisations, this trades space for time.
Your pre-processing step is to take your original list of n numbers and create a number of new sublists.
Each of these sublists is a portion of the original, starting at index n, extending for m+1 elements and then sorted. So your original list of:
{3, 1, 7, 5, 9}
gives you:
list[0][0] = {3}
list[0][1] = {1, 3}
list[0][2] = {1, 3, 7}
list[0][3] = {1, 3, 5, 7}
list[0][4] = {1, 3, 5, 7, 9}
list[1][0] = {1}
list[1][1] = {1, 7}
list[1][2] = {1, 5, 7}
list[1][3] = {1, 5, 7, 9}
list[2][0] = {7}
list[2][1] = {5, 7}
list[2][2] = {5, 7, 9}
list[3][0] = {5}
list[3][1] = {5,9}
list[4][0] = {9}
This isn't a cheap operation (in time or space) so you may want to maintain a "dirty" flag on the list so you only perform it the first time after you do a modifying operation (insert, delete, change).
In fact, you can use lazy evaluation for even more efficiency. Basically set all sublists to an empty list when you start and whenever you perform a modifying operation. Then, whenever you attempt to access a sublist and it's empty, calculate that sublist (and that one only) before trying to get the kth value out of it.
That ensures sublists are evaluated only when needed and cached to prevent unnecessary recalculation. For example, if you never ask for a value from the 3-through-6 sublist, it's never calculated.
The pseudo-code for creating all the sublists is basically (for loops inclusive at both ends):
for n = 0 to a.lastindex:
    create array list[n]
    for m = 0 to a.lastindex - n:
        create array list[n][m]
        for i = 0 to m:
            list[n][m][i] = a[n+i]
        sort list[n][m]
The code for lazy evaluation is a little more complex (but only a little), so I won't provide pseudo-code for that.
Then, in order to find the kth smallest number in the range i through j (where i and j are the original indexes), you simply look up list[i][j-i][k-1], a very fast O(1) operation:
1st in range [3,4] (values 5,9),   list[3][4-3=1][1-1=0] = 5
2nd in range [1,3] (values 1,7,5), list[1][3-1=2][2-1=1] = 5
3rd in range [0,2] (values 3,1,7), list[0][2-0=2][3-1=2] = 7
(first index = i, second index = j-i, third index = k-1)
Here's some Python code which shows this in action:
orig = [3,1,7,5,9]
print(orig)
print("=====")
list = []
for n in range(len(orig)):
    list.append([])
    for m in range(len(orig) - n):
        list[-1].append([])
        for i in range(m+1):
            list[-1][-1].append(orig[n+i])
        list[-1][-1] = sorted(list[-1][-1])
        print("(%d,%d)=%s"%(n,m,list[-1][-1]))
print("=====")
# Gives xth smallest in index range y through z inclusive.
x = 1; y = 3; z = 4; print("(%d,%d,%d)=%d"%(x,y,z,list[y][z-y][x-1]))
x = 2; y = 1; z = 3; print("(%d,%d,%d)=%d"%(x,y,z,list[y][z-y][x-1]))
x = 3; y = 0; z = 2; print("(%d,%d,%d)=%d"%(x,y,z,list[y][z-y][x-1]))
print("=====")
As expected, the output is:
[3, 1, 7, 5, 9]
=====
(0,0)=[3]
(0,1)=[1, 3]
(0,2)=[1, 3, 7]
(0,3)=[1, 3, 5, 7]
(0,4)=[1, 3, 5, 7, 9]
(1,0)=[1]
(1,1)=[1, 7]
(1,2)=[1, 5, 7]
(1,3)=[1, 5, 7, 9]
(2,0)=[7]
(2,1)=[5, 7]
(2,2)=[5, 7, 9]
(3,0)=[5]
(3,1)=[5, 9]
(4,0)=[9]
=====
(1,3,4)=5
(2,1,3)=5
(3,0,2)=7
=====
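The lazy-evaluation variant mentioned earlier might be sketched like this. It is an assumption-laden illustration: the class name and its methods are invented here, a dictionary keyed by (i, j) replaces the nested lists, and clearing the cache plays the role of the "dirty" flag.

```python
class LazyRangeKth:
    """Sorted sublists are built only when first requested, then cached."""

    def __init__(self, values):
        self.values = list(values)
        self.cache = {}               # (i, j) -> sorted copy of values[i..j]

    def set(self, index, value):
        """Modifying operation: invalidates every cached sublist."""
        self.values[index] = value
        self.cache.clear()

    def kth(self, k, i, j):
        """kth smallest (1-based) in index range i through j inclusive."""
        key = (i, j)
        if key not in self.cache:     # compute this sublist, and only this one
            self.cache[key] = sorted(self.values[i:j + 1])
        return self.cache[key][k - 1]


q = LazyRangeKth([3, 1, 7, 5, 9])
print(q.kth(1, 3, 4), q.kth(2, 1, 3), q.kth(3, 0, 2))  # prints 5 5 7
```

The first query on a range pays for the sort; repeated queries on the same range are O(1) until the next modification.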
The current solution is O( (logn)^2 ). I am pretty sure it can be modified to run in O(logn). The main advantage of this algorithm over paxdiablo's algorithm is space efficiency. This algorithm needs O(nlogn) space, not O(n^2) space.
First, the complexity of finding kth smallest element from two sorted arrays of length m and n is O(logm + logn). Complexity of finding kth smallest element from arrays of lengths a,b,c,d.. is O(loga+logb+.....).
Now, sort the whole array and store it. Sort the first half and the second half of the array and store them, and so on. You will have 1 sorted array of length n, 2 sorted arrays of length n/2, 4 sorted arrays of length n/4 and so on. Total memory required = 1*n+2*n/2+4*n/4+8*n/8...= nlogn.
Once you have i and j, figure out the list of subarrays which, when concatenated, give you the range [i,j]. There are going to be O(logn) such arrays. Finding the kth smallest number among them would take O( (logn)^2) time.
Example for the last paragraph:
Assume the array is of size 8 (indexed from 0 to 7). You have the following sorted lists:
A:0-7, B:0-3, C:4-7, D:0-1, E:2-3, F:4-5, G:6-7.
Now construct a tree with pointers to these arrays such that every node contains its immediate constituents. A will be root, B and C are its children and so on.
Now implement a recursive function that returns a list of arrays.
def getArrays(node, i, j):
    # Returns a flat list of nodes whose sorted arrays, concatenated, cover exactly [i, j].
    if i == node.min and j == node.max:
        return [node]
    if i <= node.left.max:
        if j <= node.left.max:
            return getArrays(node.left, i, j)  # (i,j) is located within left node
        # (i,j) is spread over left and right node
        return getArrays(node.left, i, node.left.max) + getArrays(node.right, node.right.min, j)
    return getArrays(node.right, i, j)  # (i,j) is located within right node
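To make the tree of sorted arrays concrete, here is a self-contained Python sketch. The Node class, the snake_case names and the final selection step are assumptions for illustration: instead of the multi-array selection described above, it binary-searches over the root's sorted values and counts with bisect, which is simpler to write but costs roughly O((logn)^3) per query.

```python
from bisect import bisect_right


class Node:
    """Tree node covering indexes [min, max] and storing that range sorted."""

    def __init__(self, a, lo, hi):
        self.min, self.max = lo, hi
        self.sorted = sorted(a[lo:hi + 1])
        self.left = self.right = None
        if lo < hi:
            mid = (lo + hi) // 2
            self.left = Node(a, lo, mid)
            self.right = Node(a, mid + 1, hi)


def get_arrays(node, i, j):
    """Nodes whose ranges, concatenated, cover exactly [i, j]."""
    if i == node.min and j == node.max:
        return [node]
    if j <= node.left.max:
        return get_arrays(node.left, i, j)
    if i >= node.right.min:
        return get_arrays(node.right, i, j)
    return (get_arrays(node.left, i, node.left.max)
            + get_arrays(node.right, node.right.min, j))


def kth_smallest(root, i, j, k):
    """kth smallest (1-based) among a[i..j], 0-based inclusive indexes."""
    parts = get_arrays(root, i, j)
    values = root.sorted
    lo, hi = 0, len(values) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        # How many elements of a[i..j] are <= values[mid]?
        count = sum(bisect_right(p.sorted, values[mid]) for p in parts)
        if count >= k:
            hi = mid
        else:
            lo = mid + 1
    return values[lo]


root = Node([3, 1, 7, 5, 9], 0, 4)
print(kth_smallest(root, 3, 4, 1), kth_smallest(root, 1, 3, 2), kth_smallest(root, 0, 2, 3))  # prints 5 5 7
```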
Preprocess: Make an nxn array where the [k][r] element is the kth smallest element of the first r elements (1-indexed for convenience).
Then, given some particular range [i,j] and value for k, do the following:
Find the element at the [k][j] slot of the matrix; call this x.
Go down the i-1 column of your matrix and find how many values in it are smaller than or equal to x (treat column 0 as having 0 smaller entries). By construction, this column will be sorted (all columns will be sorted), so it can be found in log time. Call this value s.
Find the element in the [k+s][j] slot of the matrix. This is your answer.
E.g., given 3 1 7 5 9
3 1 1 1 1
X 3 3 3 3
X X 7 5 5
X X X 7 7
X X X X 9
Now, if we're asked for the 2nd smallest in [2,4] range (again, 1-indexing), I first find the 2nd smallest in [1,4] range which is 3. I then look at column 1 and see that there is 1 element less than or equal to 3. Finally, I find the 3rd smallest in [1,4] range at [3][4] slot which is 5, as desired.
This takes n^2 space, and log(n) lookup time.
This one does not require pre-processing but is somewhat slower than O(logN). It's significantly faster than a naive iterate&count, and could support dynamic modification of the sequence.
It goes like this. Suppose the length n has n=2^x for some x. Construct a segment tree whose root node represents [0,n-1]. For each node, if it represents a segment [a,b], b>a, let it have two child nodes representing [a,(a+b)/2] and [(a+b)/2+1,b]. (That is, do a recursive divide-by-two.)
Then, on each node, maintain a separate binary search tree for the numbers within that segment. Therefore, each modification of the sequence takes O(logN)[on the segment tree]*O(logN)[on the BST]. Queries can be done like this. Let Q(a,b,x) be the rank of x within segment [a,b]. Obviously, if Q(a,b,x) can be computed efficiently, a binary search on x can compute the desired answer effectively (with an extra O(logE) factor, where E is the range of values searched).
Q(a,b,x) can be computed as: find smallest number of segments that make up [a,b], which can be done in O(logN) on the segment tree. For each segment, query on the binary search tree for that segment for the number of elements less than x. Add all these numbers to get Q(a,b,x).
This should be O(logN*logE*logN). Well not exactly what you have asked for though.
In O(log n) time it's not possible to read all of the elements of the array. Since it's not sorted, and there's no other provided information, this is impossible.
There's no way you can do better than O(n) in both worst and average case. You have to look at every single element.
E.g.: Array: 4,3,0,1,5 {Assume all digits are >=0. Also each element in array correspond to a digit. i.e. each element on the array is between 0 and 9. }
In the above array, the largest number divisible by 3 is: 5430 {using digits 5, 4, 3 and 0 from the array}
My Approach:
For divisibility by 3, we need the sum of digits to be divisible by 3.
So,
Step-1: Remove all the zeroes from the array.
Step-2: These zeroes will come at the end. {Since they don't affect the sum and we have to find the largest number}
Step-3: Find the subset of the elements of array (excluding zeroes) such that the number of digits is MAXIMUM and also that the sum of digits is MAXIMUM and the sum is divisible by 3.
STEP-4: The required number consists of the digits in the above found set in decreasing order.
So, the main step is STEP-3 i.e. how to find the subset such that it contains the MAXIMUM possible number of elements such that their sum is MAX and is divisible by 3.
I was thinking, maybe Step-3 could be done by the GREEDY CHOICE of taking all the elements and removing the smallest element in the set until the sum is divisible by 3.
But I am not convinced that this GREEDY choice will work.
Please tell if my approach is correct.
If it is, then please suggest as to how to do Step-3 ?
Also, please suggest any other possible/efficient algorithm.
Observation: If you can get a number that is divisible by 3, you need to remove at most 2 numbers, to maintain optimal solution.
A simple O(n^2) solution will be to check all possibilities to remove 1 number, and if none is valid, check all pairs (There are O(n^2) of those).
EDIT:
O(n) solution: Create 3 buckets - bucket1, bucket2, bucket0. Each will denote the modulus 3 value of the numbers. Ignore bucket0 in the next algorithm.
Let the sum of the array be sum.
If sum % 3 == 0: we are done.
else if sum % 3 == 1:
    if there is a number in bucket1 - choose the minimal and remove it
    else: remove the 2 minimals from bucket2
else if sum % 3 == 2:
    if there is a number in bucket2 - choose the minimal and remove it
    else: remove the 2 minimals from bucket1
Note: You don't actually need the buckets. To achieve O(1) space you need only the 2 minimal values from bucket1 and bucket2, since they are the only numbers we actually use from these buckets.
Example:
arr = { 3, 4, 0, 1, 5 }
bucket0 = {3,0} ; bucket1 = {4,1} bucket2 = { 5 }
sum = 13 ; sum %3 = 1
bucket1 is not empty - choose the minimal from it (1), and remove it from the array.
result array = { 3, 4, 0, 5 }
proceed to STEP 4 "as planned"
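A compact Python sketch of the bucket idea, assuming the input is a list of single digits (0 to 9). The function name is hypothetical, and for brevity it sorts the digits instead of tracking only the two minimal values per bucket, so this version is O(n log n) rather than O(n).

```python
def largest_multiple_of_3(digits):
    """Largest number (as a string) divisible by 3 built from the digits, or None."""
    remainder = sum(digits) % 3
    remaining = sorted(digits)        # ascending, so the minimal candidates come first
    if remainder:
        same = [d for d in remaining if d % 3 == remainder]       # bucket to try first
        other = [d for d in remaining if d % 3 == 3 - remainder]  # the other bucket
        if same:
            to_remove = same[:1]      # one minimal digit with the same remainder
        elif len(other) >= 2:
            to_remove = other[:2]     # two minimal digits from the other bucket
        else:
            return None
        for d in to_remove:
            remaining.remove(d)
    if not remaining:
        return None                   # nothing left, no solution
    remaining.sort(reverse=True)      # STEP 4: digits in decreasing order
    return "".join(map(str, remaining))


print(largest_multiple_of_3([3, 4, 0, 1, 5]))  # prints 5430
```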
Greedy choice definitely doesn't work: consider the set {5, 2, 1}. You'd remove the 1 first, but you should remove the 2.
I think you should work out the sum of the array modulo 3, which is either 0 (you're finished), or 1, or 2. Then you're looking to remove the minimal subset whose sum modulo 3 is 1 or 2.
I think that's fairly straightforward, so no real need for dynamic programming. Do it by removing one number with that modulus if possible, otherwise do it by removing two numbers with the other modulus. Once you know how many to remove, choose the smallest possible. You'll never need to remove three numbers.
You don't need to treat 0 specially, although if you're going to do that then you can further reduce the set under consideration in step 3 if you temporarily remove all 0, 3, 6, 9 from it.
Putting it all together, I would probably:
1. Sort the digits, descending.
2. Calculate the modulus. If 0, we're finished.
3. Try to remove a digit with that modulus, starting from the end. If successful, we're finished.
4. Remove two digits with negative-that-modulus, starting from the end. This always succeeds, so we're finished.
We might be left with an empty array (e.g. if the input is 1, 1), in which case the problem was impossible. Otherwise, the array contains the digits of our result.
Time complexity is O(n) provided that you do a counting sort in step 1. Which you certainly can since the values are digits.
What do you think about this:
first sort an array elements by value
sum up all numbers
- if sum's remainder after division by 3 is equal to 0, just return the sorted
array
- otherwise
- if sum of remainders after division by 3 of all the numbers is smaller
than the remainder of their sum, there is no solution
- otherwise
- if it's equal to 1, try to return the smallest number with remainder
equal to 1, or if no such, try two smallest with remainder equal to 2,
if no such two (I suppose it can happen), there's no solution
- if it's equal to 2, try to return the smallest number with remainder
equal to 2, or if no such, try two smallest with remainder equal to 1,
if no such two, there's no solution
first sort an array elements by remainder of division by 3 ascending
then each subset of equal remainder sort by value descending
First, this problem reduces to maximizing the number of elements selected such that their sum is divisible by 3.
Trivial: Select all numbers divisible by 3 (0,3,6,9).
Let a be the elements that leave 1 as remainder, b be the elements that leave 2 as remainder. If (|a|-|b|)%3 is 0, then select all elements from both a and b. If (|a|-|b|)%3 is 1, select all elements from b, and the |a|-1 highest numbers from a. If the remainder is 2, then select all numbers from a, and the |b|-1 highest numbers from b.
Once you have all the numbers, sort them in reverse order and concatenate. That is your answer.
Ultimately, if n is the number of elements, this algorithm returns a number that is at least n-1 digits long (except corner cases, see below).
NOTE: Take care of corner cases (i.e. what if |a|=0 or |b|=0 etc). (-1)%3 = 2 and (-2)%3 = 1.
If m is the size of the alphabet, and n is the number of elements, this algorithm is O(m+n).
Sorting the data is unnecessary, since there are only ten different values.
Just count the number of zeroes, ones, twos etc. in O (n) if n digits are given.
Calculate the sum of all digits, check whether the remainder modulo 3 is 0, 1 or 2.
If the remainder is 1: Remove the first of the following which is possible (one of these is guaranteed to be possible): 1, 4, 7, 2+2, 2+5, 5+5, 2+8, 5+8, 8+8.
If the remainder is 2: Remove the first of the following which is possible (one of these is guaranteed to be possible): 2, 5, 8, 1+1, 1+4, 4+4, 1+7, 4+7, 7+7.
If there are no digits left then the problem cannot be solved. Otherwise, the solution is created by concatenating 9's, 8's, 7's, and so on as many as are remaining.
(Sorting n digits would take O (n log n). Unless of course you sort by counting how often each digit occurs and generating the sorted result according to these numbers).
Amit's answer has a tiny thing missing.
If bucket1 is not empty but it holds only large values, let's say 79 and 97, and bucket2 is not empty as well and its 2 minimals are, say, 2 and 5, then when the modulus of the sum of all numbers is 1 we should remove 2 and 5 from bucket2 instead of the minimal in bucket1 to get the largest concatenated number.
Test case: 8 2 3 5 78 79
If we follow Amit's and Steve's suggested method, the largest number would be 878532, whereas the largest possible number divisible by 3 in this array is 879783.
The solution would be to compare the appropriate bucket's smallest minimal with the concatenation of both minimals of the other bucket and eliminate the smaller one.