Given a base b and a length n, I'd like to find all integers in base b, padded with leading zeroes to length n, that satisfy:
For digits i and j, if i < j then the count of i's is >= the count of j's.
E.g., base 3, length 4:
0000, 0001, 0010, 0011, 0012, 0021, 0100,
0101, 0102, 0110, 0120, 0201, 0210, 1000,
1001, 1002, 1010, 1020, 1100, 1200, 2001,
2010, 2100
My current approach is to increment through all integers in the range in base 10, convert to base b, count digits, and reject if the digit counts fail our criterion. This is slow.
I think the language I'm using is irrelevant but if it matters, it's Rust.
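For reference, the brute-force filter described above can be sketched in Python rather than Rust. This is only a minimal sketch: the names is_valid and brute_force are made up for illustration, and joining digits into a string assumes base <= 10. It is mainly useful as a correctness baseline for faster approaches.

```python
from collections import Counter
from itertools import product


def is_valid(digits, base):
    """True if every digit occurs at least as often as the next larger digit."""
    counts = Counter(digits)
    # Checking adjacent digits is enough: count(0) >= count(1) >= ... is transitive.
    return all(counts[d] >= counts[d + 1] for d in range(base - 1))


def brute_force(base, length):
    """Enumerate all base**length digit strings and keep the valid ones."""
    return [
        "".join(map(str, digits))
        for digits in product(range(base), repeat=length)
        if is_valid(digits, base)
    ]


print(len(brute_force(3, 4)))  # 23, the size of the example list above
```

The work grows as base**length, which is why this is slow for anything but small inputs.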
This problem is equivalent to generating integer partitions of n into b parts (zero parts allowed), then using the elements of each partition as digit counts and generating the permutations. The last stage is similar to Shridhar R Kulkarni's approach, but a different combinatorial object is used.
For n=7 and b=4, one intermediate partition of 7 into 4 parts is [3, 2, 2, 0], which denotes the digit combination [0, 0, 0, 1, 1, 2, 2]; we then permute that combination in lexicographic order. The partitions function emits parts in non-increasing order, so the condition "if i < j then the count of i's is >= the count of j's" is fulfilled.
Ideone code to play with.
def next_permutation(arr):
    # https://www.nayuki.io/page/next-lexicographical-permutation-algorithm
    i = len(arr) - 1
    while i > 0 and arr[i - 1] >= arr[i]:
        i -= 1
    if i <= 0:
        return False
    j = len(arr) - 1
    while arr[j] <= arr[i - 1]:
        j -= 1
    arr[i - 1], arr[j] = arr[j], arr[i - 1]
    arr[i : ] = arr[len(arr) - 1 : i - 1 : -1]
    return True

def partitions(Sum, K, lst, Minn = 0):
    if K == 0:
        if Sum == 0:
            # lst such as [3, 1, 0] denotes three zeros and one 1
            arr = []
            for i in range(len(lst)):
                if lst[i]:
                    arr.extend([i]*lst[i])
            # [3, 1, 0] has now been transformed to [0, 0, 0, 1]
            print(arr)
            while next_permutation(arr):
                print(arr)
        return
    # parts are chosen in non-decreasing order and prepended,
    # so lst ends up non-increasing
    for i in range(Minn, Sum + 1):
        partitions(Sum - i, K - 1, [i] + lst, i)

b = 3
n = 4
partitions(n, b, [])
result
[0, 0, 0, 0] [0, 0, 0, 1] [0, 0, 1, 0] [0, 1, 0, 0]
[1, 0, 0, 0] [0, 0, 1, 1] [0, 1, 0, 1] [0, 1, 1, 0]
[1, 0, 0, 1] [1, 0, 1, 0] [1, 1, 0, 0] [0, 0, 1, 2]
[0, 0, 2, 1] [0, 1, 0, 2] [0, 1, 2, 0] [0, 2, 0, 1]
[0, 2, 1, 0] [1, 0, 0, 2] [1, 0, 2, 0] [1, 2, 0, 0]
[2, 0, 0, 1] [2, 0, 1, 0] [2, 1, 0, 0]
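As a cross-check on the listing above, each partition (c0, c1, ...) of n contributes n! / (c0! * c1! * ...) distinct strings, so the total can be counted without enumerating any permutation. The sketch below is not part of the original answer; partitions_desc and count_strings are hypothetical helper names.

```python
from math import factorial


def partitions_desc(total, parts, largest=None):
    """Yield non-increasing tuples of `parts` non-negative ints summing to `total`."""
    if largest is None:
        largest = total
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(min(total, largest), -1, -1):
        for rest in partitions_desc(total - first, parts - 1, first):
            yield (first,) + rest


def count_strings(base, length):
    """Sum the multinomial coefficient of every digit-count partition."""
    total = 0
    for counts in partitions_desc(length, base):
        ways = factorial(length)
        for c in counts:
            ways //= factorial(c)  # exact at every step for a multinomial
        total += ways
    return total


print(count_strings(3, 4))  # 1 + 4 + 6 + 12 = 23
```

For base 3 and length 4 the partitions are (4,0,0), (3,1,0), (2,2,0) and (2,1,1), which contribute 1, 4, 6 and 12 strings respectively.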
This problem can be solved with dynamic programming. Here is one approach (using Python):
from functools import lru_cache
from collections import Counter
from itertools import product

def naive(base, length):
    result = 0
    for tup in product(range(base), repeat=length):
        ctr = Counter(tup)
        is_valid = all(ctr[i] >= ctr[i+1] for i in range(base))
        if is_valid:
            result += 1
    return result

@lru_cache(None)
def binom(n, k):
    # compute binomial coefficient
    if n == 0:
        if k == 0:
            return 1
        else:
            return 0
    return binom(n - 1, k) + binom(n - 1, k - 1)

def count_seq(base, length):
    @lru_cache(None)
    def count_helper(base, length, max_repeats):
        if base < 0 or length < 0:
            return 0
        elif length == 0:
            return 1
        return sum(binom(length, k) * count_helper(base - 1, length - k, k)
                   for k in range(max_repeats+1))
    return count_helper(base, length, length)

assert all(count_seq(base, length) == naive(base, length)
           for base in range(7) for length in range(7))
print(count_seq(100, 60))
# 21047749425803338154212116084613212619618570995864645505458031212645031666717071397
The key function is count_helper(base, length, max_repeats) that counts the number of valid sequences s.t. the most common digit does not repeat more than max_repeats times. Ignoring the base case, this function satisfies a recurrence relation:
count_helper(base, length, max_repeats) = sum(
binom(length, k) * count_helper(base - 1, length - k, k)
for k in range(max_repeats+1))
At this point, we are deciding how many copies of the current digit to insert into the sequence. We can choose any number k between 0 and max_repeats inclusive. For a given value of k, there are length choose k ways to place the digit we are adding. Each choice of k leads to a recursive call to a subproblem where base is reduced by 1, length is reduced by k, and max_repeats is set to k.
When base = 3 and length = 4, the answer would be
['0000', '0001', '0010', '0011', '0012', '0021', '0100', '0101', '0102', '0110', '0120', '0201', '0210', '1000', '1001', '1002', '1010', '1020', '1100', '1111', '1112', '1121', '1122', '1200', '1211', '1212', '1221', '2001', '2010', '2100', '2111', '2112', '2121', '2211', '2222']
We can observe that all the numbers in the answer would be permutations of ['0000', '0001', '0011', '0012', '1111', '1112', '1122', '2222']. Let us call them unique_numbers.
So, our solution is easy and simple. Generate all the unique_numbers and add their permutations to the result.
from itertools import permutations

base = 3
length = 4
unique_numbers = []

def getUniqueNumbers(curr_digit, curr_count, max_count, curr_num):
    # Add the curr_num to unique_numbers
    if len(curr_num) == length:
        unique_numbers.append(curr_num)
        return
    # Try to include the curr_digit again
    if curr_count + 1 <= max_count:
        getUniqueNumbers(curr_digit, curr_count + 1, max_count, curr_num + str(curr_digit))
    # Try to include the next digit
    if curr_digit + 1 < base:
        getUniqueNumbers(curr_digit+1, 1, curr_count, curr_num + str(curr_digit+1))

# Try generating unique numbers starting with every digit
for i in range(base):
    getUniqueNumbers(i, 0, length, "")

result = set()
for num in unique_numbers:
    permList = permutations(num)
    for perm in list(permList):
        result.add(''.join(perm))
print(result)
This question was asked during a Microsoft interview for an intern position, and I have no idea how to even approach it.
An array has n positive integers; the sum of all elements in the array is at most max_sum, and the absolute difference between any two consecutive elements is at most 1.
Return the maximum value of the integer at index k in the array.
Input : n = 3, max_sum = 7, k = 1
Output: 3
In this case let's say array is [2,3,2]
Input: n = 4, max_sum = 6, k = 2
Output: 2
In this case let's say array is [1,1,2,1]
This is a brute-force approach. A better solution would be to calculate the values, but I'll leave it to you to figure that out. This is your challenge to get the job, right?
Input: n = 7, max_sum = 34, k = 4
Set all values to 0.
// ↓ k
array = { 0, 0, 0, 0, 0, 0, 0 }, sum = 0
Since we want maximum value at k, with lowest sum, just increment the value to 1.
// ↓ k
array = { 0, 0, 0, 0, 1, 0, 0 }, sum = 1 (+1)
Since consecutive elements must be at most 1 apart, when we increment value at k again, we need to increment the neighboring values too.
// ↓ k
array = { 0, 0, 0, 1, 2, 1, 0 }, sum = 4 (+3)
Repeat, repeatedly.
// ↓ k
array = { 0, 0, 1, 2, 3, 2, 1 }, sum = 9 (+5)
array = { 0, 1, 2, 3, 4, 3, 2 }, sum = 15 (+6)
array = { 1, 2, 3, 4, 5, 4, 3 }, sum = 22 (+7)
array = { 2, 3, 4, 5, 6, 5, 4 }, sum = 29 (+7)
We can't repeat again, because we'd get sum = 29 + 7 = 36 if we did, and that would exceed max_sum = 34.
Result: Max value at k is 6.
There are many ways to distribute the remaining 5 points, to get the exact sum, but showing a solution with the exact sum isn't the goal, so we don't need to do anything about the 5 extra points.
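The walk-through above can be sketched in Python as follows. One deliberate difference, which is my assumption rather than part of the walk-through: since the question asks for positive integers, the sketch never lets a value drop below 1 instead of starting from 0, which can matter when max_sum is tight. The name max_value_at is made up for illustration.

```python
def max_value_at(n, max_sum, k):
    """Largest value index k can hold (hypothetical helper, brute force)."""
    if max_sum < n:
        raise ValueError("n positive integers need a sum of at least n")

    def min_sum(peak):
        # Cheapest array with arr[k] == peak: drop by 1 per step away from k,
        # but never below 1 because every element must stay positive.
        return sum(max(peak - abs(i - k), 1) for i in range(n))

    peak = 1
    # Keep raising the peak while the cheapest matching array still fits.
    while min_sum(peak + 1) <= max_sum:
        peak += 1
    return peak


print(max_value_at(3, 7, 1))   # 3, e.g. [2, 3, 2]
print(max_value_at(4, 6, 2))   # 2, e.g. [1, 1, 2, 1]
print(max_value_at(7, 34, 4))  # 6, e.g. [2, 3, 4, 5, 6, 5, 4]
```

Each call to min_sum is O(n), so this is still brute force; replacing the loop with a binary search over peak, or a closed-form sum, would be the "calculate the values" improvement.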
Let's define a as average array value:
a = max_sum / n
Let's find the maximum for k = 0:
max(0) = a + n/2
In this case, all other values of array would decrease, so the last value will be
a - n/2
for k = 1 we can see that maximum will not exceed max(0)-1, so
max(1) = a + n/2 - 1
and so on until k = n/2. For k > n/2 the max value will increase again, up to a + n/2 at k = n-1, so we have a "V"-like curve with its minimum at k = n/2, equal to a.
The only thing left is to properly handle the border conditions, odd or even n, and so on. I hope you get the idea.
For example, here is a matrix:
[1, 0, 0, 0],
[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 1],
I want to find some rows, whose sum is equal to [4, 3, 2, 1].
The expected answer is rows: {0,1,3,4}.
Because:
[1, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 0] + [1, 1, 1, 1] = [4, 3, 2, 1]
Are there any well-known or related algorithms to solve this problem?
Thanks to @sascha and @N. Wouda for the comments.
To clarify, here are some more details.
In my problem, the matrix will have about 50 rows and 25 columns. But each row will have fewer than 4 nonzero elements (the others are zero). And every solution has 8 rows.
If I try all combinations, C(50, 8) is about 0.54 billion attempts. That is too expensive, so I want to find a more efficient algorithm.
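For scale, the exhaustive search being ruled out here can be sketched like this (a minimal sketch; find_rows is a hypothetical name, and it assumes the number of rows in a solution is known in advance, as stated above).

```python
from itertools import combinations
from math import comb


def find_rows(matrix, target, size):
    """Return the first tuple of `size` row indices whose column sums equal target."""
    for rows in combinations(range(len(matrix)), size):
        sums = [sum(matrix[r][c] for r in rows) for c in range(len(target))]
        if sums == list(target):
            return rows
    return None


matrix = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
print(find_rows(matrix, [4, 3, 2, 1], 4))  # (0, 1, 3, 4)
print(comb(50, 8))                         # 536878650 candidate selections
```

With 50 rows and 8 picks this loop would have to visit up to 536,878,650 combinations, which is what motivates a solver or a pruned search.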
If you want to make the jump to using a solver, I'd recommend it. This is a pretty straightforward Integer Program. The solutions below use Python, the pyomo math programming package to formulate the problem, and COIN-OR's cbc solver for Integer Programs and Mixed Integer Programs, which needs to be installed separately (it is free): https://www.coin-or.org/downloading/
Here is an example with your data, followed by an example with 100,000 rows. The first example solves instantly; the 100,000-row example takes about 2 seconds on my machine.
# row selection Integer Program
import pyomo.environ as pyo
data1 = [ [1, 0, 0, 0],
[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 1],]
data_dict = {(i, j): data1[i][j] for i in range(len(data1)) for j in range(len(data1[0]))}
model = pyo.ConcreteModel()
# sets
model.I = pyo.Set(initialize=range(len(data1))) # a simple row index
model.J = pyo.Set(initialize=range(len(data1[0]))) # a simple column index
# parameters
model.matrix = pyo.Param(model.I , model.J, initialize=data_dict) # hold the sparse matrix of values
magic_sum = [4, 3, 2, 1 ]
# variables
model.row_select = pyo.Var(model.I, domain=pyo.Boolean) # row selection variable
# constraints
# ensure the columnar sum is at least the magic sum for all j
def min_sum(model, j):
    return sum(model.row_select[i] * model.matrix[(i, j)] for i in model.I) >= magic_sum[j]
model.c1 = pyo.Constraint(model.J, rule=min_sum)
# objective function
# minimize the overage
def objective(model):
    delta = 0
    for j in model.J:
        delta += sum(model.row_select[i] * model.matrix[i, j] for i in model.I) - magic_sum[j]
    return delta
model.OBJ = pyo.Objective(rule=objective)
model.pprint() # verify everything
solver = pyo.SolverFactory('cbc') # need to have cbc solver installed
result = solver.solve(model)
result.write() # solver details
model.row_select.display() # output
Output:
# ----------------------------------------------------------
# Solver Information
# ----------------------------------------------------------
Solver:
- Status: ok
User time: -1.0
System time: 0.0
Wallclock time: 0.0
Termination condition: optimal
Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
Statistics:
Branch and bound:
Number of bounded subproblems: 0
Number of created subproblems: 0
Black box:
Number of iterations: 0
Error rc: 0
Time: 0.01792597770690918
# ----------------------------------------------------------
# Solution Information
# ----------------------------------------------------------
Solution:
- number of solutions: 0
number of solutions displayed: 0
row_select : Size=5, Index=I
Key : Lower : Value : Upper : Fixed : Stale : Domain
0 : 0 : 1.0 : 1 : False : False : Boolean
1 : 0 : 1.0 : 1 : False : False : Boolean
2 : 0 : 0.0 : 1 : False : False : Boolean
3 : 0 : 1.0 : 1 : False : False : Boolean
4 : 0 : 1.0 : 1 : False : False : Boolean
A more stressful rendition with 100,000 rows:
# row selection Integer Program stress test
import pyomo.environ as pyo
import numpy as np
# make a large matrix 100,000 x 8
data1 = np.random.randint(0, 1000, size=(100_000, 8))
# inject "the right answer into 3 rows"
data1[42602] = [8, 0, 0, 0, 0, 0, 0, 0 ]
data1[3] = [0, 0, 0, 0, 4, 3, 2, 1 ]
data1[10986] = [0, 7, 6, 5, 0, 0, 0, 0 ]
data_dict = {(i, j): data1[i][j] for i in range(len(data1)) for j in range(len(data1[0]))}
model = pyo.ConcreteModel()
# sets
model.I = pyo.Set(initialize=range(len(data1))) # a simple row index
model.J = pyo.Set(initialize=range(len(data1[0]))) # a simple column index
# parameters
model.matrix = pyo.Param(model.I , model.J, initialize=data_dict) # hold the sparse matrix of values
magic_sum = [8, 7, 6, 5, 4, 3, 2, 1 ]
# variables
model.row_select = pyo.Var(model.I, domain=pyo.Boolean) # row selection variable
# constraints
# ensure the columnar sum is at least the magic sum for all j
def min_sum(model, j):
    return sum(model.row_select[i] * model.matrix[(i, j)] for i in model.I) >= magic_sum[j]
model.c1 = pyo.Constraint(model.J, rule=min_sum)
# objective function
# minimize the overage
def objective(model):
    delta = 0
    for j in model.J:
        delta += sum(model.row_select[i] * model.matrix[i, j] for i in model.I) - magic_sum[j]
    return delta
model.OBJ = pyo.Objective(rule=objective)
solver = pyo.SolverFactory('cbc')
result = solver.solve(model)
result.write()
print('\n\n======== row selections =======')
for i in model.I:
    if model.row_select[i].value > 0:
        print(f'row {i} selected')
Output:
# ----------------------------------------------------------
# Solver Information
# ----------------------------------------------------------
Solver:
- Status: ok
User time: -1.0
System time: 2.18
Wallclock time: 2.61
Termination condition: optimal
Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
Statistics:
Branch and bound:
Number of bounded subproblems: 0
Number of created subproblems: 0
Black box:
Number of iterations: 0
Error rc: 0
Time: 2.800779104232788
# ----------------------------------------------------------
# Solution Information
# ----------------------------------------------------------
Solution:
- number of solutions: 0
number of solutions displayed: 0
======== row selections =======
row 3 selected
row 10986 selected
row 42602 selected
This one recursively either picks or skips each element. As soon as a branch becomes impossible to solve (no elements left or any target value negative), it returns false. When the sum of the target reaches 0, a solution has been found and is returned as the indices of the picked elements.
Feel free to add time and memory complexity in the comments. The worst case should be about 2^(n+1) calls.
Please let me know how it performs on your 8/50 data.
const elements = [
  [1, 0, 0, 0],
  [1, 1, 0, 0],
  [1, 0, 1, 0],
  [1, 1, 1, 0],
  [1, 1, 1, 1]
];
const target = [4, 3, 2, 1];
let iterations = 0;
console.log(iter(elements, target, [], 0));
console.log(`Iterations: ${iterations}`);

// Returns the indices of the picked rows, or false if no selection works.
function iter(elements, target, picked, index) {
  iterations++;
  const sum = target.reduce(function(sum, element) {
    return sum + element;
  });
  if (sum === 0) return picked;
  if (elements.length === 0) return false;
  // First try skipping the current element.
  const result = iter(
    removeElement(elements, 0),
    target,
    picked,
    index + 1
  );
  if (result !== false) return result;
  // Then try picking it, unless that overshoots the target.
  const newTarget = matrixSubtract(target, elements[0]);
  const hasNegatives = newTarget.some(function(element) {
    return element < 0;
  });
  if (hasNegatives) return false;
  return iter(
    removeElement(elements, 0),
    newTarget,
    picked.concat(index),
    index + 1
  );
}

function removeElement(target, i) {
  return target.slice(0, i).concat(target.slice(i + 1));
}

function matrixSubtract(minuend, subtrahend) {
  return minuend.map(function(element, i) {
    return element - subtrahend[i];
  });
}
I'm given a string which looks like this:
1011010100
And my task is to find the length of a substring which number of nulls is always <= number of ones. And this should always happen while 'scanning' substring from right to left and from left to right. So in this example, the answer would be:
10110101 => 8
I know that the complexity should be either O(n) or O(n log n), because length can reach up to 10^6
Any ideas?
The O(n) solution is quite simple actually: build the "height array", representing the number of 1's relative to the number of 0's. So a height of 2 means there are 2 more 1's than 0's. Then we iterate over the height array once, performing some maximality checks under some conditions.
Crucial Observation
Note that a subarray fulfilling the conditions must have its height minimum at the beginning, and maximum at the end (as relative to the subarray, not the whole array).
The height array for the sample in the question can be plotted like this, with marked answer:
       v
   /\/\/\
/\/      \
^
Proof:
Suppose the height is not minimum at the beginning, that means there is a point inside the subarray where the height is lower than the beginning. At this point, the number of 0 should be larger than the number of 1. Contradiction.
Suppose the height is not maximum at the end, that means there is a point in the subarray where the height is larger than the end, say at index j. Then at index j to the end there are more 0 than 1 (since the height decreases), and so when we "scan" the subarray from right to left we will find more 0 than 1 at index j. Contradiction.
Algorithm
Now the problem can be interpreted as finding the longest subarray which ends with the highest height in the subarray, while keeping the minimum to not exceed the height at the beginning. This is very similar to maximum subarray problem like mentioned by klrmlr ("contiguous subsequence of an array" is better said as "subarray"). And the idea is not keeping an O(n) state, but rather keeping the "maximum so far" and "maximum at this point".
Following that algorithm, below is the pseudocode, traversing the array once:
Procedure Balance_Left_Right
Record the lowest and highest point so far
If the height at this point is lower than the lowest point so far, then change the starting point to the index after this point
If the height at this point is higher or equal to the highest point so far, then this is a valid subarray, record the length (and start and end indices, if you like)
However we will soon see a problem (as pointed by Adam Jackson through personal communication) for this test case: 1100101, visualized as follows:
 /\
/  \/\/
The correct answer is 3 (the last 101), but the above algorithm will get 2 (the first 11). This is because our answer is apparently hidden behind a "high mountain" (i.e., the lowest point in the answer is not lower than the mountain, and the highest point in the answer is not higher than the mountain).
And so we need to ensure that when we run the Procedure Balance_Left_Right (above), there is no "high mountain" hiding the answer. And so the solution is to traverse the array once from the right, try to partition the array into multiple sections where in each section, this property holds: "the number of 1's is always >= the number of 0's, as traversed from the right", and also for each section, it can't be extended to the left anymore.
Then each section, when traversed from the left, will have its maximum height at the end of the section. It can be proven that with this property, the method balance_left_right will find the correct answer for this section. So we just call balance_left_right on each section and take the maximum answer among them.
Now, you may ask why it's sufficient to run Balance_Left_Right on each section. This is because the answer requires the property to hold from the left and from the right, so it must lie inside one of the sections, since each section satisfies half of the property.
The algorithm is still O(n) because we only visit each element twice, once from the right, and once from the left.
The last test case will be partitioned as follows:
 /|\ |
/ | \|/\/
**    ***
where only the sections marked with asterisk (*) are taken.
So the new algorithm is as follows:
Procedure Max_Balance_Left_Right
Partition the input where the number of 1 >= number of 0 from the right (Using Balance_Left from the right, or can call it Balance_right)
Run Balance_Left_Right on each partition
Take the maximum
Here's the code in Python:
def balance_left_right(arr):
    lower = 0
    upper = -2**32
    lower_idx = 0 # Optional
    upper_idx = -1 # Optional
    result = (0,0,0)
    height = 0
    length = 0
    for idx, num in enumerate(arr):
        length += 1
        height += 1 if num==1 else -1
        if height<lower:
            lower = height # Reset the lowest
            upper = height # Reset the highest
            lower_idx = idx+1 # Optional, record the starting point
            length = 0 # Reset the answer
        if height>=upper:
            upper = height
            upper_idx = idx # Optional, record the end point
            if length > result[0]: # Take maximum length
                result = (length, lower_idx, upper_idx)
    return result

def max_balance_left_right(arr):
    all_partitions = []
    start = 0
    end = len(arr)
    right_partitions = balance_left(reversed(arr[start:end]))
    for right_start, right_end in right_partitions:
        all_partitions.append((end-right_end, end-right_start))
    result = (0,0,0)
    for start, end in all_partitions:
        candidate = balance_left_right(arr[start:end])
        if result[0] < candidate[0]:
            result = (candidate[0], candidate[1]+start, candidate[2]+start)
    return result

def balance_left(arr):
    lower = 0
    start_idx = 0
    end_idx = -1
    height = 0
    result = []
    for idx, num in enumerate(arr):
        height += 1 if num==1 else -1
        if height < lower:
            if end_idx != -1:
                result.append((start_idx,end_idx))
            lower = height
            start_idx = idx+1
            end_idx = -1
        else:
            end_idx = idx+1
    if end_idx != -1:
        result.append((start_idx, end_idx))
    return result

test_cases = [
    [1,0,1,1,0,1,0,1,0,0],
    [0,0,1,0,1,0,0,1,0,1,0,0,1,1,0,1,0,1,0,0,1],
    [1,1,1,0,0,0,1,0,0,1,1,0,1,1,0,1,1,0],
    [1,1,0,0,1,0,1,0,1,1,0,0,1,0,0],
    [1,1,0,0,1,0,1],
    [1,1,1,1,1,0,0,0,1,0,1,0,1,1,0,0,1,0,0,1,0,1,1]
]
for test_case in test_cases:
    print('Balance left right:')
    print(test_case)
    print(balance_left_right(test_case))
    print('Max balance left right:')
    print(test_case)
    print(max_balance_left_right(test_case))
    print()
which will print:
Balance left right:
[1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
(8, 0, 7)
Max balance left right:
[1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
(8, 0, 7)
Balance left right:
[0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
(6, 12, 17)
Max balance left right:
[0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
(6, 12, 17)
Balance left right:
[1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
(8, 9, 16)
Max balance left right:
[1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
(8, 9, 16)
Balance left right:
[1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
(10, 0, 9)
Max balance left right:
[1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
(10, 0, 9)
Balance left right:
[1, 1, 0, 0, 1, 0, 1]
(2, 0, 1)
Max balance left right:
[1, 1, 0, 0, 1, 0, 1]
(3, 4, 6)
Balance left right:
[1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
(5, 0, 4)
Max balance left right:
[1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
(6, 8, 13)
For your eyes enjoyment, the height array for the test cases:
First:
       v
   /\/\/\
/\/      \
^
Second:
\
 \/\/\           v
      \/\/\  /\/\/\
           \/      \/
            ^
Third:
                v
  /\            /\
 /  \        /\/
/    \/\  /\/
        \/
         ^
Fourth:
         v
 /\      /\
/  \/\/\/  \/\
^             \
Fifth:
 /\   v
/  \/\/
    ^
Sixth:
    /\       v
   /  \      /\
  /    \/\/\/  \/\    /
 /      ^         \/\/
/
Clarification Regarding the Question
As some of the readers are confused on what exactly OP wants, although it's already stated clearly in the question, let me explain the question by some examples.
First, the task from the question:
And my task is to find the length of a substring which number of nulls is always <= number of ones. And this should always happen while 'scanning' substring from right to left and from left to right
This refers to something like "Catalan Number Ballot Problem" or "Available Change Problem". In the Wiki you can check the "monotonic path" problem, where you can map "move right" as "1" and "move up" as "0".
The problem is to find a subarray of the original array, such that, when the subarray is traversed from left-to-right and right-to-left, this property holds:
The number of 0's seen so far should not exceed the number of 1's seen so far.
For example, the string 1010 holds the property from left-to-right, because if we scan the array from left-to-right, there are always at least as many 1's as 0's. But the property doesn't hold from right-to-left, since the first character encountered from the right is 0, and so at the beginning we have more 0's (there is one) than 1's (there are none).
For the example given by OP, we see that the answer for the string 1011010100 is the first eight characters, namely: 10110101. Why?
OK, so when we traverse the subarray from left to right, we see that there are always at least as many 1's as 0's. Let's check the number of 1's and 0's as we traverse the array from left-to-right:
1: num(0) = 0, num(1) = 1
0: num(0) = 1, num(1) = 1
1: num(0) = 1, num(1) = 2
1: num(0) = 1, num(1) = 3
0: num(0) = 2, num(1) = 3
1: num(0) = 2, num(1) = 4
0: num(0) = 3, num(1) = 4
1: num(0) = 3, num(1) = 5
We can see that at any point of time the number of 0's is always less than or equal to the number of 1's. That's why the property holds from left-to-right. And the same check can be done from right-to-left.
So why isn't 1011010100 an answer?
Let's see when we traverse the string right-to-left:
0: num(0) = 1, num(1) = 0
0: num(0) = 2, num(1) = 0
1: num(0) = 2, num(1) = 1
...
I didn't put the full traversal because the property has already been violated since the first step, since we have num(0) > num(1). That's why the string 1011010100 doesn't satisfy the constraints of the problem.
You can also see that my "height array" is actually the difference between the number of 1's and the number of 0's, namely: num(1) - num(0). So in order to have the property, the [relative] height must never be negative. That can be visualized as the height never dropping below the initial height.
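To make the property concrete, here is a slow reference checker (a sketch of my own, not the O(n) algorithm; the names holds_left_to_right and longest_balanced are made up). It tests every substring in both directions, so it is only meant for validating faster solutions on short inputs.

```python
def holds_left_to_right(bits):
    """True if no prefix of `bits` contains more 0's than 1's."""
    height = 0
    for bit in bits:
        height += 1 if bit == "1" else -1
        if height < 0:
            return False
    return True


def longest_balanced(s):
    """Longest substring satisfying the property in both scan directions (cubic time)."""
    best = ""
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            sub = s[start:end]
            if (len(sub) > len(best)
                    and holds_left_to_right(sub)
                    and holds_left_to_right(sub[::-1])):
                best = sub
    return best


print(longest_balanced("1011010100"))  # 10110101
print(longest_balanced("1100101"))     # 101
```

The second call is the "high mountain" test case discussed earlier, where the correct length is 3.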
Here goes my algorithm:
Start from right side:
1. if you find 0 increment the value of count
2. if you find 1 decrement the count
Store these values in an array i.e. v[].
e.g.
a[] = {1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1}
v[] = {0, 1, 0,-1, 0, 1, 0, 1, 2, 1, 2, 1, 0, -1}
Now the problem reduces to finding indexes i, j in v such that v[i] < v[j] and i < j.
Proof:
Here, i=0 and j=11 is a possible answer, with values v[i]=0 and v[j]=1.
This means that up to j we have one extra 0 in the string, and since v[i]=0, the extra 0 is cancelled by an extra 1 within the window from i to j. Hence the answer.
Hope it helps. Please let me know if you have any doubts. Thanks.
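The first step above (building v[] by scanning from the right) can be sketched as below. This only covers the construction of v, not the search for i and j; the function name right_to_left_counts is hypothetical.

```python
def right_to_left_counts(a):
    """v[i] = (number of 0's) - (number of 1's) in a[i:], built from the right."""
    v = [0] * len(a)
    count = 0
    for i in range(len(a) - 1, -1, -1):
        count += 1 if a[i] == 0 else -1
        v[i] = count
    return v


a = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1]
print(right_to_left_counts(a))
# [0, 1, 0, -1, 0, 1, 0, 1, 2, 1, 2, 1, 0, -1]
```

The printed list matches the v[] given in the example.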
(Almost correct, i.e. subtly wrong) linear time solution
with two recodings of the problem (one removed later...), and a sliding window.
Encoding A
You can compress the input to yield the number of subsequent zeros or ones:
+1 -1 +2 -1 +1 -1 +1 -2
This yields encoding A and needs O(n) time.
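Encoding A is a signed run-length encoding, which might be sketched with itertools.groupby as follows (encode_a is a made-up name; the input is assumed to be a string of '0' and '1' characters).

```python
from itertools import groupby


def encode_a(bits):
    """Signed run lengths: +k for a run of k ones, -k for a run of k zeros."""
    return [
        len(list(group)) * (1 if bit == "1" else -1)
        for bit, group in groupby(bits)
    ]


print(encode_a("1011010100"))  # [1, -1, 2, -1, 1, -1, 1, -2]
```

groupby walks the input once, so the encoding is O(n) as stated.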
Encoding B
Now, in encoding A, whenever you encounter two consecutive numbers that sum up to > 0, you compress further. In encoding B, the number in parentheses denotes the length of the substring:
+2(4) -1 +1 -1 +1 -2 ==> +2(6) -1 +1 -2 ==> +2(8) -2
This requires O(n), too. Here, we have the solution immediately: A string of length 8 with two more 1's than 0's. Let's try a more complicated instance (given in encoding A):
+5 -8 +4
Here, the transformation to encoding B doesn't help:
+5(5) -8 +4(4)
Consider also the following instance (encoding B):
+5(9) -6 +4(4) -6 +5(7) -6 +4(6) -6 +5(9)
This sequence will be used to demonstrate the...
Sliding window
First, determine the best solution that starts at the left:
+5 -6 +4 -6 +5 > 0 ==> 9+6+4+6+7=32
Now, extend this to find the best solution that starts at the third position (+4(4)):
+4 -6 +5 -6 +4 > 0 ==> 4+6+7+6+6=29
This solution is not better than the first we have found. Move on:
+5 -6 +4 -6 +5 > 0 ==> 7+6+6+6+9=34
This is the best solution. The algorithm can be implemented in O(n), since head and tail move only forward.
The brief description above doesn't cover all subtleties (negative number at the left in encoding B, head and tail meet, ...). Also, perhaps the recodings are unnecessary and the sliding window can be implemented directly on the 0-1 representation. However, I was able to fully understand the problem only after recoding it.
Getting rid of encoding B
Actually as kindly noted by Millie Smith, "encoding B" might be lossy, meaning that it might lead to inferior solutions in certain (yet to be identified) corner cases. But the sliding window algorithm works just as well on encoding A, so it might be even necessary to skip the conversion to encoding B. (Too lazy to rewrite the explanation of the algorithm...)
I know that a run is a sequence of adjacent repeated values. How would you write pseudocode for computing the length of the longest run in an array? For example,
the run of 5s would be the longest run in this array of integers.
1 2 4 4 3 1 2 4 3 5 5 5 5 3 6 5 5 6 3 1
Any idea would be helpful.
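def longest_run(array):
    # Returns the value that forms the longest run (None for an empty array).
    result = None
    prev = None
    size = 0
    max_size = 0
    for element in array:
        if size > 0 and element == prev:
            size += 1
        else:
            size = 1  # a new run starts with this element
        if size > max_size:
            result = element
            max_size = size
        prev = element
    return result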
EDIT
Wow. Just wow! This pseudocode is actually working:
>>> longest_run([1,2,4,4,3,1,2,4,3,5,5,5,5,3,6,5,5,6,3,1])
5
max_run_length = 0;
current_run_length = 0;
loop through the array, tracking the current value and the previous index's value
    if this is not the first element and the value is the same as the previous one, current_run_length++;
    otherwise current_run_length = 1;
    if current_run_length > max_run_length : max_run_length = current_run_length
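The pseudocode above maps closely onto itertools.groupby, which splits a sequence into runs of equal adjacent values. A minimal sketch (longest_run_length is a hypothetical name) that returns the length rather than the value:

```python
from itertools import groupby


def longest_run_length(values):
    """Length of the longest run of equal adjacent values (0 for empty input)."""
    return max((len(list(group)) for _, group in groupby(values)), default=0)


data = [1, 2, 4, 4, 3, 1, 2, 4, 3, 5, 5, 5, 5, 3, 6, 5, 5, 6, 3, 1]
print(longest_run_length(data))  # 4, from the run 5 5 5 5
```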
Here is a different, functional approach in Python (Python looks like pseudocode). The generator finishes with a bare return; don't replace it with "raise StopIteration", because since Python 3.7 (PEP 479) a StopIteration raised inside a generator is turned into a RuntimeError.
I'm using a generator to yield a tuple with quantity of the element and the element itself. It's more universal. You can use this also for infinite sequences. If you want to get the longest repeated element from the sequence, it must be a finite sequence.
def group_same(iterable):
    iterator = iter(iterable)
    try:
        last = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    counter = 1
    while True:
        try:
            element = next(iterator)
            if element == last:  # compare by value, not identity
                counter += 1
                continue
            else:
                yield (counter, last)
                counter = 1
                last = element
        except StopIteration:
            yield (counter, last)
            return
If you have a list like this:
li = [0, 0, 2, 1, 1, 1, 1, 1, 5, 5, 6, 7, 7, 7, 12, 'Text', 'Text', 'Text2']
Then you can make a new list of it:
list(group_same(li))
Then you'll get a new list:
[(2, 0),
(1, 2),
(5, 1),
(2, 5),
(1, 6),
(3, 7),
(1, 12),
(2, 'Text'),
(1, 'Text2')]
To get longest repeated element, you can use the max function.
gen = group_same(li) # Generator, does nothing until iterating over it
grouped_elements = list(gen) # iterate over the generator until it's exhausted
longest = max(grouped_elements, key=lambda x: x[0])
Or as a one liner:
max(list(group_same(li)), key=lambda x: x[0])
The function max gives us the biggest element in a list. In this case, the list has more than one element. The argument key is just used to get the first element of the tuple as max value, but you'll still get back the tuple.
In : max(list(group_same(li)), key=lambda x: x[0])
Out: (5, 1)
The element 1 occurred 5 times repeatedly.
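#include <iostream>
using namespace std;

int main()
{
    int a[20] = {1, 2, 4, 4, 3, 1, 2, 4, 3, 5, 5, 5, 5, 3, 6, 5, 5, 6, 3, 1};
    int best = 1;    // longest run seen so far
    int current = 1; // length of the run ending at a[i]
    for (int i = 1; i < 20; i++)
    {
        // extend the run if the value repeats, otherwise start a new one
        current = (a[i] == a[i - 1]) ? current + 1 : 1;
        if (current > best)
        {
            best = current;
        }
    }
    cout << best; // prints 4 (the run 5 5 5 5)
    return 0;
}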