I came across the following problem, which has been on my mind ever since:
Alice has written N consecutive positive integers on a blackboard, e.g. "99, 100, 101, 102". Bob has erased all digits but one from each number, so the sequence now reads e.g. "9, 0, 0, 1". Notice that the digit he leaves can be a different one for every integer.
Our task is to find, in O(N log N) time, the smallest number that may have started the sequence. In the above example the answer would be 99. For the length-7 sequence "1, 4, 0, 5, 4, 1, 4", the answer would be 1042 (which yields the sequence 1042, 1043, 1044, 1045, 1046, 1047, 1048).
I can show an upper bound of around 1234567890*N, so the output can't be of unlimited size. However, I haven't even been able to find an efficient O(N^2) solution.
Any ideas?
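For checking a faster solution on small inputs, a brute-force search is enough. The sketch below simply tries the starting values 1, 2, 3, ... in order; the name `smallest_start` is made up for illustration, and the running time is nowhere near the requested O(N log N).

```python
from itertools import count

def smallest_start(digits):
    """Brute force: smallest positive start such that digits[i] occurs
    somewhere in the decimal form of start + i, for every i."""
    wanted = [str(d) for d in digits]
    for start in count(1):
        if all(w in str(start + i) for i, w in enumerate(wanted)):
            return start

print(smallest_start([9, 0, 0, 1]))           # 99
print(smallest_start([1, 4, 0, 5, 4, 1, 4]))  # 1042
```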
UPDATE: For those interested, this problem appeared in the Baltic Olympiad in Informatics (BOI) 2014 (it's the task "Sequence"). Due to Codeforces user Fdg, here's an O(N log N) solution: Try every possible last digit of the starting value. Partition contiguous array elements into groups that have the digits 0 to 9 at the end (this can be inferred from the starting value's last digit). We know that all values in the same group have the same prefix, after we remove their last digit. Let's eliminate all those values where the digit in the input matches the last digit that it should have according to their position.
Now we have a slightly different subproblem: For every group of 10, we know a set of digits that appear in its prefix. We collapse this into a single array element. This generalized problem only has one tenth the size of the original problem and can be solved using the same algorithm, recursively.
We thus get the recurrence T(N) = 10 * T(N / 10) + O(N), which we solve as T(N) = O(N log N) using the master theorem.
Example:
Let's say the input is [1, 4, 0, 5, 4, 1, 4, 9, 5, 0, 1, 0]. So in the generalized form, we know the following subsets of digits for every position:
{1} {4} {0} {5} {4} {1} {4} {9} {5} {0} {1} {0}
We check the number 2 as the last digit of the starting number (of course we check all the other digits too, but this branch will turn out to contain the minimum solution). We know that the sequence of last digits goes like
2 3 4 5 6 7 8 9 0 1 2 3
So we know the groups that have the same prefix (2-9 and 0-3). We also eliminate those digits from the sets that we already know are at the correct position:
{1} {4} {0} {} {4} {1} {4} {} | {5} {0} {1} {0}
By collecting all the digits of each group, we arrive at the reduced problem
{0,1,4} {0,1,5}
Again we brute-force the second to last digit. Let's say we are checking 4. We get:
4 5
{0,1} {0,1}
Which reduces to
{0,1}
Now that we are down to only one array element, we just have to build the lexicographically smallest number out of those digits that has no leading zeroes, which is 10. So the result is 1042.
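The reduction step used in this example can be sketched as follows. This is only one level of the recursion plus the base case, not the full solution: the brute force over all ten last digits and the edge cases around leading zeros are left out. The names `reduce_step` and `smallest_with_digits` are hypothetical.

```python
def reduce_step(sets, last_digit):
    """Given the digit sets of consecutive numbers and a guessed last digit
    of the first number, drop the digits explained by the last digit and
    merge all numbers that share the same prefix."""
    reduced = []
    for i, digits in enumerate(sets):
        group = (last_digit + i) // 10    # numbers in one group share a prefix
        current = (last_digit + i) % 10   # last digit of the i-th number
        if group == len(reduced):
            reduced.append(set())
        reduced[group] |= digits - {current}
    return reduced

def smallest_with_digits(digits):
    """Smallest positive integer without a leading zero containing all digits."""
    if not digits:
        return 0  # assumption: an empty set means no prefix is required
    ordered = sorted(digits)
    if ordered == [0]:
        return 10
    lead = next(d for d in ordered if d != 0)
    ordered.remove(lead)
    return int(str(lead) + "".join(map(str, ordered)))

sets = [{d} for d in [1, 4, 0, 5, 4, 1, 4, 9, 5, 0, 1, 0]]
level1 = reduce_step(sets, 2)     # [{0, 1, 4}, {0, 1, 5}]
level2 = reduce_step(level1, 4)   # [{0, 1}]
print(smallest_with_digits(level2[0]) * 100 + 42)  # 1042
```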
Old version
I believe one key observation here is that in a progression like this, of length N, only the last ceil(log_10(N)) digits are changed more than once. So we could brute-force the last ceil(log_10(N)) digits, the number of nines at the end of the prefix and the digit before that in O(N * log N).
So we fix the pattern
P..PX9..9S...S
where the suffix S is known, the number of nines is known, X < 9 is known, but the prefix P is not.
We can now remove those numbers from the sequence that already match one of the digits we already know appears at their respective position. We are left with a set of digits which we know comprise the prefix P. We just form the lexicographically smallest string that has no leading zeroes and contains all the digits.
The runtime is O(N^2 log N).
Related
I need to prepare an algorithm, which will display n-th digit (counting from right) of the biggest number divisible by B-1 where B is the base of specified number system. The number can only consist of digits provided in the input.
So for example: The base of the number system is 3, the provided digits are [0, 1, 2], and I'm looking for the 2nd digit. So I need to find the 2nd digit of the biggest number consisting of 0, 1, 2 that is divisible by 2. In this case the result will be 2, because the biggest such number is 20 in base 3.
I've tried to find this algorithm in many ways, but I cannot find any connection between input and output values.
You're asking about an algorithm, so basically I'd do:
Generate the biggest number from the digits (that is, sort them in descending order).
Check whether it is divisible by B-1.
If yes, success: just return the n-th digit. If not, go back to step 1, but generate the next number (by swapping different digits).
Second approach. Not efficient in most cases.
Generate all possible numbers
Sort in desc order
Get first divisible by B - 1 and return n-th digit
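A minimal sketch of the second approach, under two assumptions that the question does not spell out: each provided digit may be used at most once, and digits may be left out (the base-3 example, whose answer does not use the digit 1, suggests this). It relies on the fact that a number written in base B is divisible by B-1 exactly when its digit sum is. The function name is hypothetical.

```python
from itertools import permutations

def nth_digit_of_largest(base, digits, n):
    """Return the n-th digit from the right (1-based) of the largest number
    built from the digits that is divisible by base - 1, or None."""
    best_value, best_digits = -1, None
    for length in range(len(digits), 0, -1):
        for perm in permutations(digits, length):
            if perm[0] == 0 and length > 1:
                continue  # skip leading zeros
            if sum(perm) % (base - 1) != 0:
                continue  # digit-sum test for divisibility by base - 1
            value = 0
            for d in perm:
                value = value * base + d
            if value > best_value:
                best_value, best_digits = value, perm
        if best_digits is not None:
            break  # without leading zeros, longer numbers are always larger
    if best_digits is None or n > len(best_digits):
        return None
    return best_digits[-n]

print(nth_digit_of_largest(3, [0, 1, 2], 2))  # 2
```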
This question has two parts, though since I'm trying to come up with a Prolog implementation, solving one will probably immediately lead to a solution of the other one.
Given a permutation of a list of integers {1,2,...,N}, how can I tell what is the index of that permutation in lexicographic ordering?
Given a number k, how can I calculate k-th permutation of numbers {1,2...,N}?
I'm looking for an algorithm that can do this reasonably better than just iterating a next permutation function k times. Afaik it should be possible to directly compute both of these.
What I came up with so far is that by looking at numbers from the left, I can tell how many permutations were before each number at a particular index, and then somehow combine those, but I'm not really sure if this leads to a correct solution.
Think how many permutations start with the number 1, how many start with the number 2, and so on. Let's say n = 5, then 24 permutations start with 1, 24 start with 2, and so on. If you are looking for permutation say k = 53, there are 48 permutations starting with 1 or 2, so #53 is the fifth of the permutations starting with 3.
Of the permutations starting with 3, 6 each start with 31, 32, 34 or 35. So you are looking for the fifth permutation starting with (3, 1). There are two permutations each starting with 312, 314 and 315. So you are looking for the first of the two permutations starting with 315. Which is 31524.
Should be easy enough to turn this into code.
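One way to turn the counting argument above into code, assuming 1 <= k <= n! and 1-based numbering as in the example (the function name is made up):

```python
from math import factorial

def kth_permutation(n, k):
    """Return the k-th (1-based) permutation of 1..n in lexicographic order."""
    remaining = list(range(1, n + 1))
    k -= 1  # switch to a 0-based index
    result = []
    for i in range(n - 1, -1, -1):
        block = factorial(i)           # permutations sharing the same next element
        index, k = divmod(k, block)    # which block, and the offset inside it
        result.append(remaining.pop(index))
    return result

print(kth_permutation(5, 53))  # [3, 1, 5, 2, 4]
```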
You can also have a look at the factorial number system, especially the part regarding permutations. For a given number k, you are first supposed to find its factorial representation, which then easily gives the required permutation (actually, (k+1)-st permutation).
An example for k=5 and numbers {1,2,3}:
5 = 2*2! + 1*1! + 0*0! = (210)_!
so the factorial representation of 5 is 210. Let's now map that representation into the permutation. We start with the ordered list (1,2,3). The leftmost digit in our factorial representation is 2, so we are looking for the element in the list at the index 2, which is 3 (list is zero-indexed). Now we are left with the list (1,2) and continue the procedure. The leftmost digit in our factorial representation, after removing 2, is 1, so we get the element at the index 1, which is 2. Finally, we are left with 1, so the (k+1)-st (6th) permutation of {1,2,3} is {3,2,1}.
Even though it takes some time to understand, it is quite an efficient algorithm and simple to program. The reverse mapping is similar.
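The k = 5 example can be reproduced with a short sketch. Both helper names are hypothetical, and the `pop`-based lookup is the simple quadratic version rather than an optimized one.

```python
def to_factoradic(k, n):
    """Factorial-base digits of k, most significant first, padded to n digits."""
    digits = []
    for radix in range(1, n + 1):
        k, d = divmod(k, radix)
        digits.append(d)
    return digits[::-1]

def permutation_from_factoradic(digits, items):
    """Each digit selects (and removes) an element of the remaining list."""
    pool = list(items)
    return [pool.pop(d) for d in digits]

print(to_factoradic(5, 3))                                # [2, 1, 0]
print(permutation_from_factoradic([2, 1, 0], [1, 2, 3]))  # [3, 2, 1]
```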
I'll just give the outline of a solution for each:
Given a permutation of a list of integers {1,2,...,N}, how can I tell what is the index of that permutation in lexicographic ordering?
To do this, ask yourself how many permutations start with 1? There are (N - 1)!. Now, let's do an example:
3 1 2
How many permutations of 1 2 3 start with 1 or 2? 2*2!. This one has to be after those, so its index is at least 2*2! = 4. Now check the next element. How many permutations of 1 2 start with an element smaller than 1? None. You're done, the index is 4. You can add 1 if you want to use 1-based indexing.
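As a sketch of this counting idea (0-based index, distinct values assumed, name made up): for each position, count how many later elements are smaller and weight that count by the number of permutations of the remaining positions. The inner count makes this O(N^2).

```python
from math import factorial

def permutation_rank(perm):
    """0-based lexicographic index of a permutation of distinct values."""
    n = len(perm)
    rank = 0
    for i, value in enumerate(perm):
        smaller_later = sum(1 for other in perm[i + 1:] if other < value)
        rank += smaller_later * factorial(n - 1 - i)
    return rank

print(permutation_rank([3, 1, 2]))  # 4
```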
Given a number k, how can I calculate k-th permutation of numbers {1,2...,N}?
Given 4, how can we get 3 1 2? We have to find each element.
What can we have on the first position? If we have 1, the maximum index can be 2! - 1 = 1 (I'm using zero-based indexing). If we have 2, the maximum can be 2*2! - 1 = 3. If we have 3, the maximum can be 5. So we must have 3:
3
Now, we have reduced the problem to finding the 4 - 2*2! = 0-th permutation of 1 2, which is 1 2 (you can reason about it recursively as above).
I have an exercise that needs to be done with O(n) time complexity, however, I can only solve it with an O(n^2) solution.
You have an array and you need to find the length of the longest contiguous sequence whose sum is divisible by 3 without any remainder. For example, for the array {1,2,3,-4,-1}, the function will return 4 because the longest sequence whose sum (0) is divisible by 3 is {2,3,-4,-1}.
My solution O(n^2) is based on arithmetic progression. Is there any way to do it with O(n) complexity?
Please, I only want a clue or a theoretical explanation. Please don't write the full solution :)
Let's take a look at prefix sums. The sum of a subarray [L, R] is divisible by 3 if and only if prefixSum[L - 1] mod 3 = prefixSum[R] mod 3. This observation gives a very simple linear solution (because there are only 3 possible values of a prefix sum mod 3, we can simply find the first and the last occurrence of each).
For example, if the input array is {1, 2, 3, -4, -1}, the prefix sums are {0, 1, 0, 0, 2, 1}. (there are n + 1 prefix sums because of an empty prefix). Now you can just take a look at the first and last occurrence of 0, 1 and 2.
As a non-CS person, this is interesting. First approach of mine was simply to calc the running sum mod 3. You'll get a sequence of {0,1,2}. Now look for the first and the last 0, the first and the last 1 and the first and the last 2, and compare their respective distances...
Iterate through the array, summing the total as you go. Record the first position where the running sum modulo 3 is 0. Also record the first position where it is 1 and, finally, the first position where it is 2.
Do the same thing backwards also, recording the last position where the modulo sum is 0, 1, and 2. That gives three possibilities for the longest sequence - you just check which pair are farthest apart.
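The prefix-sum idea described in these answers could look like the sketch below. It folds the forward and backward scans into one pass by remembering only the first occurrence of each residue; the function name is made up.

```python
def longest_divisible_by_3(arr):
    """Length of the longest contiguous run whose sum is divisible by 3."""
    first_seen = {0: 0}   # residue of a prefix sum -> shortest prefix length
    best = 0
    prefix = 0
    for length, x in enumerate(arr, start=1):
        prefix = (prefix + x) % 3   # Python's % is non-negative for modulus 3
        if prefix in first_seen:
            best = max(best, length - first_seen[prefix])
        else:
            first_seen[prefix] = length
    return best

print(longest_divisible_by_3([1, 2, 3, -4, -1]))  # 4
```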
You apply dynamic programming.
For every position you compute 3 values:
The longest sequence ending in that position which has sum s = 0 mod 3
The longest sequence ending in that position which has sum s = 1 mod 3
The longest sequence ending in that position which has sum s = 2 mod 3
So given this value for position i you can easily compute the new ones for position i+1.
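A sketch of this dynamic program, keeping only the three values for the current position. `None` marks a residue that no sequence ending at the current position has; that marker and the function name are choices made here, not part of the answer.

```python
def longest_divisible_by_3_dp(arr):
    # best_len[r]: length of the longest run ending at the current position
    # whose sum is congruent to r modulo 3 (None if there is none).
    best_len = [None, None, None]
    answer = 0
    for x in arr:
        new_len = [None, None, None]
        for r in range(3):
            if best_len[r] is not None:
                new_len[(r + x) % 3] = best_len[r] + 1  # extend by x
        single = x % 3
        if new_len[single] is None:
            new_len[single] = 1                          # x on its own
        best_len = new_len
        if best_len[0] is not None:
            answer = max(answer, best_len[0])
    return answer

print(longest_divisible_by_3_dp([1, 2, 3, -4, -1]))  # 4
```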
How to find, in a binary string, the longest substring where the balance, i.e. the difference between the number of ones and zeros, is >= 0?
Example:
01110000010 -> 6: 011100
1110000011110000111 -> 19: entire string
While this problem looks very similar to the Maximum Value Contiguous Subsequence (Maximum Contiguous Sum) problem, a dynamic programming solution doesn't seem to be obvious. In a divide-and-conquer approach, how to do the merging? Is an "efficient" algorithm possible after all? (A trivial O(n^2) algorithm will just iterate over all substrings for all possible starting points.)
This is a modified variant of Finding a substring, with some additional conditions. The difference is that in the linked question, only such substrings are allowed where balance never falls below zero (looking at the string in either forward or backward direction). In the given problem, balance is allowed to fall below zero, provided it recovers at some later stage.
I have a solution that requires O(n) additional memory and O(n) time.
Let's denote the 'height' of an index h(i) as
h(i) = <number of 1s in the substring 1..i> - <number of 0s in the same substring>
The problem can now be reformulated as: find i and j such that h(i) <= h(j) and j-i -> max.
Obviously, h(0) = 0, and if h(n) >= 0, then the solution is the entire string.
Now let's compute the array B so that B[x] = min{i: h(i) = -x}. In other words, let B[x] be the leftmost index i at which h(i)= -x.
The array B[x] has a length of at most n, and is computed in one linear pass.
Now we can iterate over the original string and for each index i compute the length of the longest sequence with non-negative balance that ends on i as follows:
Lmax(i) = i - B[MAX{0, -h(i)}]
The largest Lmax(i) across all i will give you the desired length.
I leave the proof as an exercise :) Contact me if you can't figure it out.
Also, my algorithm needs 2 passes of the original string, but you can collapse them into one.
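A one-pass sketch of this idea. Instead of the array B it uses a dictionary keyed by the negative height itself, which is an implementation choice made here; the function name is also made up. It works because h changes by 1 per step, so the first index reaching a negative height is also the leftmost index with a height that low.

```python
def longest_nonnegative_balance(s):
    """Longest substring of a 0/1 string with (#ones - #zeros) >= 0."""
    first_index = {}   # negative height -> leftmost index reaching it
    height = 0
    best = 0
    for i, ch in enumerate(s, start=1):
        height += 1 if ch == "1" else -1
        if height >= 0:
            best = max(best, i)             # h(0) = 0, so start at the beginning
        elif height not in first_index:
            first_index[height] = i         # new low: only the empty string ends here
        else:
            best = max(best, i - first_index[height])
    return best

print(longest_nonnegative_balance("01110000010"))          # 6
print(longest_nonnegative_balance("1110000011110000111"))  # 19
```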
This can be answered quite easily in O(n) using "height array", representing the number of 1's relative to the number of 0's. Like my answer in the linked question.
Now, instead of focusing on the original array, we focus on two arrays indexed by height: one contains the smallest index at which each height is found, and the other contains the largest index at which it is found. Since we don't want a negative index, we can shift everything up so that the minimum height is 0.
So for the sample cases (I added two more 1's at the end to show my point):
1110000011010000011111
Array height visualization
  /\
 /  \
/    \
      \  /\/\        /
       \/    \      /
              \    /
               \  /
                \/
(lowest height = -5)
Shifted height array:
[5, 6, 7, 8, 7, 6, 5, 4, 3, 4, 5, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5]
Height:        0  1  2  3  4  5  6  7  8
first_view = [17,16,15, 8, 7, 0, 1, 2, 3]
last_view  = [17,18,19,20,21,22, 5, 4, 3]
Note that we have 22 characters and 23 distinct indices, 0-22, representing the 23 positions between and around the characters.
We can build the first_view and last_view array in O(n).
Now, for each height in first_view, we only need to check all heights in last_view that are at least as large, and take the index with the maximum difference from the first_view index. For example, from height 0, the maximum index among the heights that are at least as large is 22. So the longest substring starting at index 17+1 will end at index 22.
To find the maximum index in the last_view array, you can convert it to a suffix maximum (the maximum over the current position and everything to its right) in O(n):
last_view_max = [22,22,22,22,22,22, 5, 4, 3]
And so finding answer is simply subtracting first_view from last_view_max,
first_view = [17,16,15, 8, 7, 0, 1, 2, 3]
last_view_max = [22,22,22,22,22,22, 5, 4, 3]
result = [ 5, 6, 7,14,15,22, 4, 2, 0]
and taking the maximum (again in O(n)), which is 22, achieved from starting index 0 to ending index 22, i.e., the whole string. =D
Proof of correctness:
Suppose that the maximum substring starts at index i, ends at index j.
If the height at index i is the same as the height at index k<i, then k..j would be a longer substring still satisfying the requirement. Therefore it suffices to consider the first index of each height. Analogously for the last index.
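The two views and the suffix maximum can be built as in the following sketch. The function name and the returned triple are choices made here for illustration; the arrays printed for the sample string match the ones listed above.

```python
def longest_by_height_views(s):
    """Return (first_view, last_view, best_length) for a 0/1 string."""
    heights = [0]
    for ch in s:
        heights.append(heights[-1] + (1 if ch == "1" else -1))
    low = min(heights)
    shifted = [h - low for h in heights]      # minimum height becomes 0
    size = max(shifted) + 1
    first_view = [None] * size
    last_view = [None] * size
    for index, h in enumerate(shifted):
        if first_view[h] is None:
            first_view[h] = index
        last_view[h] = index
    # Suffix maximum: largest index over this height and all larger heights.
    last_view_max = last_view[:]
    for h in range(size - 2, -1, -1):
        last_view_max[h] = max(last_view_max[h], last_view_max[h + 1])
    best = max(last_view_max[h] - first_view[h] for h in range(size))
    return first_view, last_view, best

first, last, best = longest_by_height_views("1110000011010000011111")
print(first)  # [17, 16, 15, 8, 7, 0, 1, 2, 3]
print(last)   # [17, 18, 19, 20, 21, 22, 5, 4, 3]
print(best)   # 22
```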
Compressed quadratic runtime
We will be looking for (locally) longest substrings with balance zero, starting at the beginning. We will ignore strings of zeros. (Corner cases: All zeros -> empty string, balance never reaches zero again -> entire string.) Of these substrings with balance zero, all trailing zeros will be removed.
Denote by B a substring with balance > 0 and by Z a substring with only zeros. Each input string can be decomposed as follows (pseudo-regex notation):
B? (Z B)* Z?
Each of the Bs is a maximum feasible solution, meaning that it cannot be extended in either direction without reducing balance. However, it might be possible to collapse sequences of BZB or ZBZ if the balance is still larger than zero after collapsing.
Note that it is always possible to collapse sequences of BZBZB to a single B if the ZBZ part has balance >= 0. (Can be done in one pass in linear time.) Once all such sequences have been collapsed, the balance of each ZBZ part is below zero. Still, it is possible that there exist BZB parts with balance above zero -- even that in a BZBZB sequence with balance below zero both the leading and trailing BZB parts have balance over zero. At this point, it seems to be difficult to decide which BZB to collapse.
Still quadratic...
Anyway, with this simplified data structure one can try all Bs as starting points (possibly extending to the left if there's still balance left). Run time is still quadratic, but (in practice) with a much smaller n.
Divide and conquer
Another classic. Should run in O(n log n), but rather difficult to implement.
Idea
The longest feasible substring is either in the left half, in the right half, or it passes over the boundary. Call the algorithm for both halves. For the boundary:
Assume problem size n. For the longest feasible substring that crosses the boundary, we are going to compute the balance of the left-half part of the substring.
Determine, for each possible balance between -n/2 and n/2, in the left half, the length of the longest string that ends at the boundary and has this (or a larger) balance. (Linear time!) Do the same for the right half and the longest string that starts at the boundary. The result is two arrays of size n + 1; we reverse one of them, add them element-wise and find the maximum. (Again, linear.)
Why does it work?
A substring with balance >= 0 that crosses the boundary can have balance < 0 in either the left or the right part, if the other part compensates this. ("Borrowing" balance.) The crucial question is how much to borrow; we iterate over all potential "balance credits" and find the best trade-off.
Why is this O(n log n)?
Because merging (looking at boundary-crossing string) takes only linear time.
Why is merging O(n)?
Exercise left to the reader.
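A possible sketch of the divide-and-conquer scheme. It deviates from the description in one detail: for simplicity the merge tables have size 2n + 1 (one slot per balance from -n to n) rather than n + 1, which keeps the merge linear. All names are hypothetical.

```python
def longest_nonnegative_dc(s):
    values = [1 if ch == "1" else -1 for ch in s]

    def solve(lo, hi):
        """Longest substring of s[lo:hi] with balance >= 0."""
        n = hi - lo
        if n == 0:
            return 0
        if n == 1:
            return 1 if values[lo] == 1 else 0
        mid = (lo + hi) // 2
        best = max(solve(lo, mid), solve(mid, hi))

        # left[b + n]: longest piece ending at the boundary with balance b,
        # right[b + n]: longest piece starting at the boundary with balance b.
        # -1 marks "no such piece"; the empty piece has balance 0.
        left = [-1] * (2 * n + 1)
        right = [-1] * (2 * n + 1)
        left[n] = right[n] = 0
        balance = 0
        for k in range(1, mid - lo + 1):
            balance += values[mid - k]
            left[balance + n] = k
        balance = 0
        for k in range(1, hi - mid + 1):
            balance += values[mid + k - 1]
            right[balance + n] = k

        # Turn "exactly b" into "b or larger" with a suffix maximum.
        for table in (left, right):
            for idx in range(2 * n - 1, -1, -1):
                table[idx] = max(table[idx], table[idx + 1])

        # The left part has balance >= b, the right part >= -b.
        for b in range(-n, n + 1):
            l, r = left[b + n], right[-b + n]
            if l >= 0 and r >= 0:
                best = max(best, l + r)
        return best

    return solve(0, len(s))

print(longest_nonnegative_dc("01110000010"))  # 6
```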
Dynamic programming -- linear run time (finally!)
Inspired by this blog post. A simple and efficient one-pass online algorithm, but it takes some time to explain.
Idea
The link above shows a different problem: Maximum subsequence sum. It cannot be mapped 1:1 to the given problem, here a "state" of O(n) is needed, in contrast to O(1) for the original problem. Still, the state can be updated in O(1).
Let's rephrase the problem. We are looking for the longest substring in the input where the balance, i.e. the number of 1's minus the number of 0's, is at least zero.
The state is similar to my other divide-and-conquer solution: We compute, for each position i and for each possible balance b the starting position s(i, b) of the longest string with balance b or greater that ends at position i. That is, the string that starts at index s(i, b) + 1 and ends at i has balance b or greater, and there is no longer such string that ends at i.
We find the result by maximizing i - s(i, 0).
Algorithm
Of course, we do not keep all s(i, b) in memory, just those for the current i (which we iterate over the input). We start with s(0, b) := 0 for b <= 0 and := undefined for b > 0. For each i, we update with the following rule:
If 1 is read: s(i, b) := s(i - 1, b - 1).
If 0 is read: s(i, b) := s(i - 1, b + 1) if defined, s(i, 0) := i if s(i - 1, 1) undefined.
The function s (for current i) can be implemented as a pointer into an array of length 2n + 1; this pointer is moved forward or backward depending on the input. At each iteration, we note the value of s(i, 0).
How does it work?
The state function s becomes effective especially if the balance from the start to i is negative. It records the earliest start point where zero balance is reached, for all possible numbers of 1s that have not been read yet.
Why does it work?
Because the recursive definition of the state function is equivalent to its direct definition -- the starting position of the longest string with balance b or greater that ends at position i.
Why is the recursive definition correct?
Proof by induction.
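The pointer-into-an-array implementation could be sketched like this. Here `table[pointer + b]` stands for s(i, b) and `None` for "undefined"; the names and the choice to know the input length up front (which sizes the array) are assumptions made for the sketch.

```python
def longest_nonnegative_online(s):
    n = len(s)
    # Initially s(0, b) = 0 for b <= 0 and undefined for b > 0.
    table = [0] * (n + 1) + [None] * n
    pointer = n
    best = 0
    for i, ch in enumerate(s, start=1):
        if ch == "1":
            pointer -= 1                 # s(i, b) := s(i - 1, b - 1)
        else:
            pointer += 1                 # s(i, b) := s(i - 1, b + 1)
            if table[pointer] is None:
                table[pointer] = i       # s(i, 0) := i, the empty string
        best = max(best, i - table[pointer])   # i - s(i, 0)
    return best

print(longest_nonnegative_online("01110000010"))  # 6
```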
E.g.: Array: 4,3,0,1,5 {Assume all digits are >= 0. Also, each element of the array corresponds to a digit, i.e. each element of the array is between 0 and 9.}
In the above array, the largest number is: 5430 {using digits 5, 4, 3 and 0 from the array}
My Approach:
For divisibility by 3, we need the sum of digits to be divisible by 3.
So,
Step-1: Remove all the zeroes from the array.
Step-2: These zeroes will come at the end. {Since they don't affect the sum and we have to find the largest number}
Step-3: Find the subset of the elements of array (excluding zeroes) such that the number of digits is MAXIMUM and also that the sum of digits is MAXIMUM and the sum is divisible by 3.
STEP-4: The required digit consists of the digits in the above found set in decreasing order.
So, the main step is STEP-3 i.e. How to find the subset such that it contains MAXIMUM possible number of elements such that their sum is MAX and is divisible by 3 .
I was thinking, maybe Step-3 could be done by GREEDY CHOICE of taking all the elements and keep on removing the smallest element in the set till the sum is divisible by 3.
But i am not convinced that this GREEDY choice will work.
Please tell if my approach is correct.
If it is, then please suggest as to how to do Step-3 ?
Also, please suggest any other possible/efficient algorithm.
Observation: If you can get a number that is divisible by 3, you need to remove at most 2 numbers, to maintain optimal solution.
A simple O(n^2) solution will be to check all possibilities to remove 1 number, and if none is valid, check all pairs (There are O(n^2) of those).
EDIT:
O(n) solution: Create 3 buckets: bucket0, bucket1, bucket2. Each holds the numbers with that value modulo 3. Ignore bucket0 in the following algorithm.
Let the sum of the array be sum.
If sum % 3 == 0: we are done.
else if sum % 3 == 1:
    if there is a number in bucket1: choose the minimal one and remove it
    else: remove the 2 minimals from bucket2
else if sum % 3 == 2:
    if there is a number in bucket2: choose the minimal one and remove it
    else: remove the 2 minimals from bucket1
Note: You don't actually need the buckets. To achieve O(1) space you only need the 2 minimal values from bucket1 and bucket2, since they are the only numbers we actually use from these buckets.
Example:
arr = { 3, 4, 0, 1, 5 }
bucket0 = {3,0} ; bucket1 = {4,1} bucket2 = { 5 }
sum = 13 ; sum %3 = 1
bucket1 is not empty - choose the minimal from it (1), and remove it from the array.
result array = { 3, 4, 0, 5 }
proceed to STEP 4 "as planned"
Greedy choice definitely doesn't work: consider the set {5, 2, 1}. You'd remove the 1 first, but you should remove the 2.
I think you should work out the sum of the array modulo 3, which is either 0 (you're finished), or 1, or 2. Then you're looking to remove the minimal subset whose sum modulo 3 is 1 or 2.
I think that's fairly straightforward, so no real need for dynamic programming. Do it by removing one number with that modulus if possible, otherwise do it by removing two numbers with the other modulus. Once you know how many to remove, choose the smallest possible. You'll never need to remove three numbers.
You don't need to treat 0 specially, although if you're going to do that then you can further reduce the set under consideration in step 3 if you temporarily remove all 0, 3, 6, 9 from it.
Putting it all together, I would probably:
Sort the digits, descending.
Calculate the modulus. If 0, we're finished.
Try to remove a digit with that modulus, starting from the end. If successful, we're finished.
Remove two digits with negative-that-modulus, starting from the end. This always succeeds, so we're finished.
We might be left with an empty array (e.g. if the input is 1, 1), in which case the problem was impossible. Otherwise, the array contains the digits of our result.
Time complexity is O(n) provided that you do a counting sort in step 1. Which you certainly can since the values are digits.
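A sketch of the four steps above. It uses Python's built-in sort for brevity, so it is O(n log n) rather than O(n); a counting sort would restore the linear bound. Returning `None` for the impossible case and printing an all-zero result as a single "0" are assumptions, as are the function names.

```python
def remove_smallest(ordered, residue, count):
    """Remove the `count` smallest digits with the given residue mod 3 from
    a descending list; return False (list untouched) if too few exist."""
    positions = [i for i in range(len(ordered) - 1, -1, -1)
                 if ordered[i] % 3 == residue][:count]
    if len(positions) < count:
        return False
    for i in positions:   # indices are decreasing, so deletions do not shift
        del ordered[i]
    return True

def largest_multiple_of_3(digits):
    ordered = sorted(digits, reverse=True)
    residue = sum(ordered) % 3
    if residue and not remove_smallest(ordered, residue, 1):
        remove_smallest(ordered, 3 - residue, 2)   # always possible here
    if not ordered:
        return None       # assumption: None signals "impossible"
    if ordered[0] == 0:
        return "0"        # assumption: collapse a run of zeros
    return "".join(map(str, ordered))

print(largest_multiple_of_3([4, 3, 0, 1, 5]))  # 5430
print(largest_multiple_of_3([5, 2, 1]))        # 51
```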
What do you think about this:
first sort the array elements by value
sum up all numbers
- if the sum's remainder after division by 3 is equal to 0, just return the sorted array
- otherwise
    - if the sum of the remainders after division by 3 of all the numbers is smaller than the remainder of their sum, there is no solution
    - otherwise
        - if it's equal to 1, try to remove the smallest number with remainder equal to 1, or if there is none, try the two smallest with remainder equal to 2; if there are no such two (I suppose it can happen), there's no solution
        - if it's equal to 2, try to remove the smallest number with remainder equal to 2, or if there is none, try the two smallest with remainder equal to 1; if there are no such two, there's no solution
first sort the array elements by remainder of division by 3, ascending
then sort each subset of equal remainder by value, descending
First, this problem reduces to maximizing the number of elements selected such that their sum is divisible by 3.
Trivial: Select all numbers divisible by 3 (0,3,6,9).
Let a be the elements that leave 1 as remainder, and b be the elements that leave 2 as remainder. If (|a|-|b|)%3 is 0, then select all elements from both a and b. If (|a|-|b|)%3 is 1, select all elements from b, and the |a|-1 highest numbers from a. If the remainder is 2, then select all numbers from a, and the |b|-1 highest numbers from b.
Once you have all the numbers, sort them in reverse order and concatenate. That is your answer.
Ultimately, if n is the number of elements, this algorithm returns a number that is at least n-1 digits long (except for corner cases, see below).
NOTE: Take care of corner cases (i.e. what if |a|=0 or |b|=0, etc.). (-1)%3 = 2 and (-2)%3 = 1.
If m is the size of the alphabet and n is the number of elements, this algorithm is O(m+n).
Sorting the data is unnecessary, since there are only ten different values.
Just count the number of zeroes, ones, twos etc. in O (n) if n digits are given.
Calculate the sum of all digits, check whether the remainder modulo 3 is 0, 1 or 2.
If the remainder is 1: Remove the first of the following which is possible (one of these is guaranteed to be possible): 1, 4, 7, 2+2, 2+5, 5+5, 2+8, 5+8, 8+8.
If the remainder is 2: Remove the first of the following which is possible (one of these is guaranteed to be possible): 2, 5, 8, 1+1, 1+4, 4+4, 1+7, 4+7, 7+7.
If there are no digits left then the problem cannot be solved. Otherwise, the solution is created by concatenating 9's, 8's, 7's, and so on as many as are remaining.
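The counting variant could be sketched as below. The candidate lists are copied from the two rules above; the function name and the use of `None` for the unsolvable case are assumptions. Note that, as written, an input of only zeros yields a string of zeros.

```python
def largest_multiple_of_3_counting(digits):
    counts = [0] * 10
    for d in digits:
        counts[d] += 1
    remainder = sum(d * c for d, c in enumerate(counts)) % 3
    candidates = {
        1: [(1,), (4,), (7,), (2, 2), (2, 5), (5, 5), (2, 8), (5, 8), (8, 8)],
        2: [(2,), (5,), (8,), (1, 1), (1, 4), (4, 4), (1, 7), (4, 7), (7, 7)],
    }
    for removal in candidates.get(remainder, []):
        trial = counts[:]
        for d in removal:
            trial[d] -= 1
        if min(trial) >= 0:      # every digit to remove was available
            counts = trial
            break
    if sum(counts) == 0:
        return None              # assumption: None signals "cannot be solved"
    # Concatenate 9's, 8's, 7's, ... as many as remain.
    return "".join(str(d) * counts[d] for d in range(9, -1, -1))

print(largest_multiple_of_3_counting([4, 3, 0, 1, 5]))  # 5430
```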
(Sorting n digits would take O (n log n). Unless of course you sort by counting how often each digit occurs and generating the sorted result according to these numbers).
Amit's answer has a tiny thing missing.
Suppose bucket1 is not empty but holds only large values, say 79 and 97, while bucket2 is not empty either and its 2 minimals are, say, 2 and 5. In this case, when the sum modulo 3 is 1, we should remove 2 and 5 from bucket2 instead of the minimal in bucket1 to get the largest concatenated number.
Test case: 8 2 3 5 78 79
If we follow Amit's and Steve's suggested method, the largest number would be 878532, whereas the largest number divisible by 3 that can be formed from this array is 879783.
The solution would be to compare the appropriate bucket's minimal with the concatenation of both minimals of the other bucket and eliminate the smaller one.