Longest Common Prefix Array - algorithm

The following is the suffix array and LCP array for the string MISSISSIPPI. I know that the LCP array gives the length of the longest common prefix between the sorted suffixes str[i - 1] and str[i]. How do I get the longest common prefix length between any two arbitrary suffixes of this string? For example, I want the longest common prefix between MISSISSIPPI and ISSIPPI.
SA LCP
12 0 $
11 0 I$
8 1 IPPI$
5 1 ISSIPPI$
2 4 ISSISSIPPI$
1 0 MISSISSIPPI$
10 0 PI$
9 1 PPI$
7 0 SIPPI$
4 2 SISSIPPI$
6 1 SSIPPI$
3 3 SSISSIPPI$

From http://en.wikipedia.org/wiki/Suffix_array, we have that "The fact that the minimum lcp value belonging to a consecutive set of sorted suffixes gives the longest common prefix among all of those suffixes can also be useful." So in your case, the LCP between MISSISSIPPI and ISSIPPI is the minimum of the LCP values on the rows after ISSIPPI$ up to and including MISSISSIPPI$: min(4, 0) = 0.
After preprocessing, you can find the minimum in a range in O(1) time via http://en.wikipedia.org/wiki/Range_Minimum_Query, and there is a lot of info on alternative approaches if you look at the TopCoder link there.
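The range-minimum idea above can be sketched in Python. This is only an illustration under stated assumptions: the helper names (build_suffix_array, build_lcp, lcp_of_suffixes) are made up, positions are 0-based (the table above is 1-based), the suffix array is built naively by sorting, and the minimum is taken with a plain min() over a slice rather than an O(1) RMQ structure.

```python
def build_suffix_array(s):
    """Naive construction: sort suffix start positions by the suffix text."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def build_lcp(s, sa):
    """lcp[r] = common prefix length of the suffixes at ranks r - 1 and r."""
    def common(a, b):
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        return n
    return [0] + [common(sa[r - 1], sa[r]) for r in range(1, len(sa))]

def lcp_of_suffixes(sa, lcp, i, j):
    """LCP of the suffixes starting at 0-based positions i and j."""
    if i == j:
        return len(sa) - i  # a suffix shares its whole length with itself
    rank = {pos: r for r, pos in enumerate(sa)}
    lo, hi = sorted((rank[i], rank[j]))
    # Minimum of the LCP entries strictly after rank lo, up to rank hi.
    return min(lcp[lo + 1:hi + 1])

text = "MISSISSIPPI$"
sa = build_suffix_array(text)
lcp = build_lcp(text, sa)
print(lcp_of_suffixes(sa, lcp, 0, 4))  # MISSISSIPPI$ vs ISSIPPI$
```

Replacing the slice minimum with a sparse table or another RMQ structure would give constant-time queries after preprocessing.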

Longest Common Prefix on leetcode solved in dart language
class Solution {
  String longestCommonPrefix(List<String> strs) {
    if (strs.isEmpty) {
      return '';
    }
    // Compare character i of the first string against every other string.
    for (int i = 0; i < strs[0].length; i++) {
      String c = strs[0][i];
      for (int j = 1; j < strs.length; j++) {
        if (i == strs[j].length || strs[j][i] != c) {
          return strs[0].substring(0, i);
        }
      }
    }
    return strs[0];
  }
}

Javascript Solution to the Longest Common Prefix Problem
const longestPrefix = arr => {
if (arr.length === 0) {
return "";
}
if (arr.length === 1) {
return arr[0];
}
let end = 0;
let check = false
for (let j = 0; j < arr[0].length; j++){
for (let i = 1; i < arr.length; i++) {
if (arr[0][j] !== arr[i][j]) {
check = true;
break;
}
}
if (check) {
break;
}
end++;
}
return (arr[0].slice(0, end))
}
Test Input
console.log(longestPrefix(["Jabine", "Jabinder", "Jabbong"]))
Output
Jab

Related

Codility PermMissingElem

My solution scored only 40% correctness on Codility.
What am I doing wrong?
Here is the test result (https://codility.com/demo/results/trainingU7KSSG-YNX/)
Problem:
A zero-indexed array A consisting of N different integers is given. The array contains integers in the range [1..(N + 1)], which means that exactly one element is missing.
Your goal is to find that missing element.
Solution:
function solution(A) {
var output = 1;
var arrayLength = A.length;
if(!arrayLength){
return output;
}
if(arrayLength == 1) {
return A[0] + 1;
}
var sorted = A.sort(sortingFn);
for(var i = 0; i < A.length - 1; i++) {
if(A[i+1] - A[i] > 1) {
output = A[i] + 1;
break;
}
}
return output;
}
function sortingFn(a, b) {
return a - b;
}
Result
Your algorithm finds the missing element by comparing neighboring elements in the array. This means it cannot handle cases where the first or last element is missing, as these have only a single neighbor.
Consider as an example [1, 2, 3]. The missing element would be 4. But since 4 has precisely one neighbor (3), it can't be found by the algorithm, and 1 will be returned.
In addition, your algorithm is rather inefficient, as sorting takes O(n lg n), while the problem is solvable in O(n):
find_missing(arr):
    s = sum(arr)
    s' = (len(arr) + 1) * (len(arr) + 2) / 2
    return s' - s
This code works by summing up all elements in the array and comparing it to expected sum, if all elements were present. The advantage of this approach is that it only requires linear operations and will find the missing element with relative simplicity.
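As a runnable illustration of the sum-based idea, here is a minimal Python sketch. It assumes the input really is a permutation of 1..N+1 with exactly one value missing, as the problem statement guarantees; integer division is used so the result stays an int.

```python
def find_missing(arr):
    """Return the one value from 1..len(arr)+1 that is absent from arr."""
    n = len(arr)
    expected = (n + 1) * (n + 2) // 2  # sum of 1..n+1
    return expected - sum(arr)

# Edge cases that the neighbor-comparison approach gets wrong:
print(find_missing([1, 2, 3]))  # last element missing
print(find_missing([2, 3]))     # first element missing
```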
Try this in c#:
using System;
using System.Linq;
private static int PermMissingElem(int[] A)
{
if (!A.Any() || !A.Any(x => x == 1)) { return 1; }
var size = A.Length;
var numberTwoList = Enumerable.Range(1, size);
var failNumber = numberTwoList.Except(A);
if (!failNumber.Any()) { return A.Max() + 1; }
return failNumber.FirstOrDefault();
}
Well, when the last element is missing, you obviously return 1, since your if statement's condition is always false. Same for first element.
Take as example this input:
1 2 3 4 5
the difference will be always 1, but element 6 is missing.
The reason for this incapability of your algorithm to catch these cases, is that it examines neighboring elements (A[i + 1] and A[i]).
JS solution #1
function solution(A) {
if (A.length === 1) {
return A[0] > 1 ? 1 : 2;
}
const map = {};
let max = 0;
for (let i = 0, len = A.length; i < len; i++) {
map[A[i]] = A[i];
if (A[i] > max) {
max = A[i]
}
}
for (let i = 0, len = A.length; i < len; i++) {
if (!map[i + 1]) {
return i + 1;
}
}
return max + 1
}
JS solution #2
function solution(A) {
const s = A.reduce((a, b) => {return a + b}, 0);
const s2 = (A.length + 1) * (A.length + 2) / 2
return s2 - s;
}
Try these arrays (like the final tests):
[1,2,3] -> must return 4;
[1] -> must return 2;
[2] -> must return 1;
[2,3] -> must return 1;
[1, 3] -> 2
But solution #2 returns -1 for [4] and -120 for [123]. The test still shows 100 points, but in my opinion it doesn't work as expected for such inputs.
Both solutions have the same performance.
Try this javascript function:
function solution(A) {
let givenSum = 0;
let expectedSum = 0;
let size = A.length;
for(let i = 1; i <= size +1; i++){
expectedSum = expectedSum + i;
}
for(let i = 0; i < size; i++){
givenSum += A[i];
}
return expectedSum - givenSum;
}
Here is my solution:
https://app.codility.com/demo/results/trainingMZWVVT-55Y/
function solution(A) {
  A.sort((a, b) => a - b);
  if (A[0] !== 1) return 1;
  for (let i = 0; i < A.length - 1; i++) {
    if (A[i + 1] - A[i] !== 1) return A[i] + 1;
  }
  // No gap found, so the missing element follows the last one.
  return A[A.length - 1] + 1;
}
Tested on Codility with a 100% score.
The solution I implemented uses set difference, since the question guarantees that exactly one element is missing.
def solution(A):
    # write your code in Python 3.6
    N = len(A)
    difference = set(range(1, N+2)) - set(A)
    return difference.pop()

Find the length of the longest valid parenthesis sequence in a string, in O(n) time

My friend ran into a question in an interview and he was told that there is an O(n) solution. However, neither of us can think it up. Here is the question:
There is a string which contains just ( and ), find the length of the longest valid parentheses substring, which should be well formed.
For example ")()())", the longest valid parentheses is ()() and the length is 4.
I figured it out with dynamic programming, but it is not O(n). Any ideas?
public int getLongestLen(String s) {
if (s == null || s.length() == 0)
return 0;
int len = s.length(), maxLen = 0;
boolean[][] isValid = new boolean[len][len];
for (int l = 2; l < len; l *= 2)
for (int start = 0; start <= len - l; start++) {
if ((s.charAt(start) == '(' && s.charAt(start + l - 1) == ')') &&
(l == 2 || isValid[start+1][start+l-2])) {
isValid[start][start+l-1] = true;
maxLen = Math.max(maxLen, l);
}
}
return maxLen;
}
I did this question before, and it is not easy to come up with an O(n) solution under pressure. Here it is, solved with a stack.
private int getLongestLenByStack(String s) {
//use last to store the last matched index
int len = s.length(), maxLen = 0, last = -1;
if (len == 0 || len == 1)
return 0;
//use this stack to store the index of '('
Stack<Integer> stack = new Stack<Integer>();
for (int i = 0; i < len; i++) {
if (s.charAt(i) == '(')
stack.push(i);
else {
//if stack is empty, it means that we already found a complete valid combo
//update the last index.
if (stack.isEmpty()) {
last = i;
} else {
stack.pop();
//found a complete valid combo and calculate max length
if (stack.isEmpty())
maxLen = Math.max(maxLen, i - last);
else
//calculate current max length
maxLen = Math.max(maxLen, i - stack.peek());
}
}
}
return maxLen;
}
We store the indexes of previously seen opening brackets in a stack.
First we push a special element such as -1 (or any other number that cannot occur as an index) onto the stack.
Then we traverse the string. When we encounter "(" we push its index; when we encounter ")" we first pop the top of the stack, and then:
If the stack is not empty, we update the result (initialised to zero) to the maximum of the result and the difference between the current index and the index at the top of the stack.
Else, if the stack is empty, we push the current index.
int result=0;
stack<int> s1;
s1.push(-1);
for(int i=0;i<s.size();++i)
{
if(s[i]=='(')
s1.push(i);
else if(s[i]==')')
{
s1.pop();
if(!s1.empty())
result=max(result,i-s1.top());
else
s1.push(i);
}
}
cout<<result<<endl;
Here 's' is the string and 's1' is the stack.
You can increment/decrement an int variable for each open-parenthesis/close-parenthesis respectively. Keep track of the number of such valid operations (where the variable doesn't go below 0) as the current length, and keep track of the longest-such as the max.
public int getLongestLen(String s) {
if (s == null || s.length() == 0) {
return 0;
}
int stack = 0;
int counter = 0;
int max = 0;
for (Character c: s.toCharArray()) {
if (c == '(') {
stack++;
}
if (c == ')') {
stack--;
}
if (stack >= 0) {
counter++;
}
if (stack < 0) {
counter = 0;
stack = 0;
}
if (counter > max && stack == 0) {
max = counter;
}
}
return max;
}
ALGORITHM:
1. Add to stack
1.1 initialize with -1,handle )) without ((
2. When you see ) pop from stack
2.a if stack size == 0 (no match), push current index values
2.b if stack size > 0 (match), get max length by subtracting index of value at top from current index (totally wicked!)
def longestMatchingParenthesis(a):
    pstack = []  # index position of left parenthesis
    pstack.append(-1)  # default value; handles ) without ( and when match adds up to 2!
    stack_size = 1
    result = 0
    for i in range(0, len(a)):
        if a[i] == '(':
            pstack.append(i)  # Append current index
            stack_size += 1
        else:  # handle )
            pstack.pop()
            stack_size -= 1
            # determine length of longest match!
            if stack_size > 0:
                # difference of current index - index at top of the stack (yet to be matched)
                result = max(result, i - pstack[-1])
            else:
                # stack size == 0, append current index
                pstack.append(i)
                stack_size += 1
    return result

a = ["()()()", "", "((((", "(((()", "(((())(", "()(()", "()(())"]
for x in a:
    print("%s = %s" % (x, longestMatchingParenthesis(x)))
#output
()()() = 6
= 0
(((( = 0
(((() = 2
(((())( = 4
()(() = 2
()(()) = 6
O(n) can be achieved without the conventional use of stacks if you are open to a dynamic approach of finding a valid element and then trying to increase its size by checking the adjoining elements .
Firstly we find a single '()'
Then we try to find a longer string including this :
The possibilities are:
('()') where we check an index before and an index after
'()'() where we check the next valid unit so that we don't repeat it in the search.
Next we update the start and end indices of the current check in each loop
At the end of the valid string ,check the current counter with the maximum length till now and update if necessary.
just came up with the solution, do comment if there is anything wrong
count = 0 //stores the number of longest valid paranthesis
empty stack s
arr[]; //contains the string which has the input, something like ())(()(
while(i<sizeof(arr))
{
if(a[i] == '(' )
{
if(top == ')' ) //top of a stack,
{
count = 0;
push a[i] in stack;
}
}
else
{
if(top == '(' )
{
count+=2;
pop from stack;
}
else
{
push a[i] in stack;
}
}
}
print count
The solution below has O(n) time complexity, and O(1) space complexity.
It is very intuitive. We first traverse the string from left to right, looking for the longest valid substring of parens, using the 'count' method that is normally used to check the validity of parens. While doing this, we also record the maximum length of such a substring, if found. Then, we do the same while going from right to left.
The algorithm would be as follows:
// Initialize variables
1. count = 0, len = 0, max_len_so_far = 0
// Check for longest valid string of parens while traversing from left to right
2. iterate over input string from left to right:
- len += 1
- if next character is '(',
count += 1
- if next character is ')',
count -= 1
- if (count == 0 and len > max_len_so_far),
max_len_so_far = len
- if (count < 0),
len = 0, count = 0
// Set count and len to zero again, but leave max_len_so_far untouched
3. count = 0, len = 0
// Now do a very similar thing while traversing from right to left
// (Though do observe the switched '(' and ')' in the code below)
4. iterate over input string from right to left:
- len += 1
- if next character is ')',
count += 1
- if next character is '(',
count -= 1
- if (count == 0 and len > max_len_so_far),
max_len_so_far = len
- if (count < 0),
len = 0, count = 0
// max_len_so_far is now our required answer
5. Finally,
return max_len_so_far
As an example, consider the string
"((())"
Suppose this string is zero-indexed.
We first go left to right.
So, at index 0, count would be 1, then 2 at index 1, 3 at index 2, 2 at index 3, and 1 at index 4. In this step, max_len_so_far wouldn't even change, because count is never 0 again.
Then we go right to left.
At index 4, count is 1, then 2 at index 3, then 1 at index 2, then 0 at index 1. At this point, len is 4 and max_len_so_far = 0, so we set max_len_so_far = 4.
Then, at index 0, count drops to -1, so len and count are both reset to 0.
At this point, we stop and return 4, which is indeed the correct answer.
A proof of correctness is left as an exercise to the reader.
NOTE: This algorithm could also be very simply tweaked to return the longest valid substring of parentheses itself, rather than just its length.
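The two passes described above can be sketched in Python as follows. This is a minimal sketch that assumes the input contains only '(' and ')'; the names longest_valid_parens and scan are made up for illustration.

```python
def longest_valid_parens(s):
    """Length of the longest well-formed parentheses substring, O(n) time, O(1) space."""
    def scan(chars, opener):
        best = length = count = 0
        for c in chars:
            length += 1
            count += 1 if c == opener else -1
            if count == 0:
                best = max(best, length)   # balanced stretch ends here
            elif count < 0:
                length = count = 0         # unmatched closer: start over
        return best

    # Left-to-right treats '(' as the opener; right-to-left treats ')' as the opener.
    return max(scan(s, '('), scan(reversed(s), ')'))

print(longest_valid_parens("((())"))
```

The second pass is what catches inputs such as "(()", where the left-to-right count never returns to zero.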
public static void main(String[] args) {
    String s = "))((())";
    String finalString = "";
    // Stop one short of the end so that charAt(i + 1) stays in bounds.
    for (int i = 0; i < s.length() - 1; i++) {
        if (s.charAt(i) == '(' && s.charAt(i + 1) == ')') {
            String ss = s.substring(i, i + 2);
            finalString = finalString + ss;
            // System.out.println(ss);
        }
    }
    System.out.println(finalString.length());
}
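Using dynamic programming to store and re-use already computed results
def longest_valid_paranthesis(s):
    # dp[i] = length of the longest valid substring ending at index i
    dp = [0] * len(s)
    best = 0
    for i in range(1, len(s)):
        if s[i] == ')':
            if s[i-1] == '(':
                dp[i] = (dp[i-2] if i >= 2 else 0) + 2
            else:
                j = i - dp[i-1] - 1  # index that must hold the matching '('
                if j >= 0 and s[j] == '(':
                    dp[i] = dp[i-1] + 2 + (dp[j-1] if j >= 1 else 0)
            best = max(best, dp[i])
    return best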
static int LongestvalidParentheses()
{
int cnt = 0;
string str = "()()))()";
char f = '(';
char s = ')';
var chararr = str.ToCharArray();
for (int i = 0; i < str.Length - 1; i++)
{
if (chararr[i] == f)
{
if (chararr[i + 1] == s)
{
cnt++;
}
}
}
return cnt;
}

Solve number of substrings having two unique characters in O(n)

I'm working on a series of substring problems:
Given a string:
Find the substring containing only two unique characters that has maximum length.
Find the number of all substrings containing AT MOST two unique characters.
Find the number of all substrings containing two unique characters.
It seems that problems 1 and 2 have O(n) solutions. However, I cannot think of an O(n) solution for problem 3. (Here is the solution for problem 2 and here is the one for problem 1.)
So I would like to know: does an O(n) solution for problem 3 exist or not?
Adding sample input/output for problem 3:
Given: abbac
Return: 6
Because there are 6 substring containing two unique chars:
ab,abb,abba,bba,ba,ac
Find the number of all substrings containing two unique characters.
Edit: I misread the question. This solution finds substrings with at least 2 unique characters.
1. The number of substrings of a word of length len is len * (len + 1) / 2:
sum = len * (len + 1) / 2
2. We are looking for substrings whose length is greater than 1. The above formula includes the substrings of length 1, so we need to subtract them. The total number of substrings with 2 or more letters is therefore len * (len + 1) / 2 - len:
sum = len * (len + 1) / 2 - len
3. Find each consecutive run of identical characters and apply steps 1 and 2 to it.
4. Subtract that run's sum from the sum obtained in step 2.
Sample implementation follows.
public static int allUniq2Substrings(char s[]) {
int sum = s.length * (s.length + 1) / 2 - s.length;
int sameRun = 0;
for (int i = 0, prev = -1; i < s.length; prev = s[i++]) {
if (s[i] != prev) {
sum -= sameRun * (sameRun + 1) / 2 - sameRun;
sameRun = 1;
} else {
sameRun++;
}
}
return sum - (sameRun * (sameRun + 1) / 2 - sameRun);
}
allUniq2Substrings("aaac".toCharArray());
3
allUniq2Substrings("aabc".toCharArray());
5
allUniq2Substrings("aaa".toCharArray());
0
allUniq2Substrings("abcd".toCharArray());
6
Edit
Let me try this again. I use the 3 observations above.
This is a subproblem of finding all substrings which contain at least 2 unique characters.
The method posted above gives me the number of such substrings for a string of any length. I will use it on stretches of the input that contain only 2 unique characters.
We only need to keep track of the longest consecutive stretches of characters drawn from a set of size 2, i.e. any arrangement of 2 unique characters. The sum over such stretches gives the total number of desired substrings.
public static int allUniq2Substrings(char s[]) {
int sum = s.length * (s.length + 1) / 2 - s.length;
int sameRun = 0;
for (int i = 0, prev = -1; i < s.length; prev = s[i++]) {
if (s[i] != prev) {
sum -= sameRun * (sameRun + 1) / 2 - sameRun;
sameRun = 1;
} else {
sameRun++;
}
}
return sum - (sameRun * (sameRun + 1) / 2 - sameRun);
}
public static int uniq2substring(char s[]) {
int last = 0, secondLast = 0;
int sum = 0;
for (int i = 1; i < s.length; i++) {
if (s[i] != s[i - 1]) {
last = i;
break;
}
}
boolean OneTwo = false;
int oneTwoIdx = -1; //alternating pattern
for (int i = last + 1; i < s.length; ++i) {
if (s[secondLast] != s[i] && s[last] != s[i]) { //detected more than 2 uniq chars
sum += allUniq2Substrings(Arrays.copyOfRange(s, secondLast, i));
secondLast = last;
last = i;
if (OneTwo) {
secondLast = oneTwoIdx;
}
OneTwo = false;
} else if (s[i] != last) { //alternating pattern detected a*b*a
OneTwo = true;
oneTwoIdx = i;
}
}
return sum + allUniq2Substrings(Arrays.copyOfRange(s, secondLast, s.length));
}
uniq2substring("abaac".toCharArray())
6
uniq2substring("aab".toCharArray())
2
uniq2substring("aabb".toCharArray())
4
uniq2substring("ab".toCharArray())
1
I think the solution for problem 2 that you linked
http://coders-stop.blogspot.in/2012/09/directi-online-test-number-of.html
can very easily be adapted to solve the third problem as well.
Just modify the driver program as follows:
int numberOfSubstrings ( string A ) {
int len = A.length();
int res = 0, j = 1, c = 1, a[2][2];
a[0][0] = A[0]; a[0][1] = 1;
for(int i=0;i<len;i++) {
>>int start = -1;
for (;j<len; j++) {
c = isInArray(a, c, A[j]);
>> if (c == 2 && start != - 1) start = j;
if(c == -1) break;
}
>>c = removeFromArray(a,A[i]);
res = (res + j - start);
}
return res;
}
The complete explanation on the derivation can be found in the link itself :)
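One further way to reach O(n) for problem 3, not taken from the answers above, is to count substrings with at most two unique characters and subtract those with at most one. The sketch below assumes substrings are counted by position (as in the abbac example from the question); the function names are made up for illustration.

```python
from collections import defaultdict

def count_at_most(s, k):
    """Count substrings (by position) that contain at most k distinct characters."""
    counts = defaultdict(int)
    left = total = 0
    for right, ch in enumerate(s):
        counts[ch] += 1
        # Shrink the window from the left until it has at most k distinct characters.
        while len(counts) > k:
            counts[s[left]] -= 1
            if counts[s[left]] == 0:
                del counts[s[left]]
            left += 1
        # Every substring ending at `right` and starting in [left, right] qualifies.
        total += right - left + 1
    return total

def count_exactly_two(s):
    return count_at_most(s, 2) - count_at_most(s, 1)

print(count_exactly_two("abbac"))
```

Each index enters and leaves the window at most once, so both sliding-window passes are linear in the length of the string.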

Word wrap to X lines instead of maximum width (Least raggedness)

Does anyone know a good algorithm to word wrap an input string to a specified number of lines rather than a set width. Basically to achieve the minimum width for X lines.
e.g. "I would like to be wrapped into two lines"
goes to
"I would like to be
wrapped into two lines"
"I would like to be wrapped into three lines"
goes to
"I would like to
be wrapped into
three lines"
Inserting new lines as required. I can find other word wrap questions but they all have a known width and want to insert as many lines as needed to fit that width. I am after the opposite.
Answers preferable in a .NET language but any language would be helpful. Obviously if there is a framework way to do this I am not aware of let me know.
Edit: I have since found this question, whose accepted answer I think solves my problem, but I am having difficulty understanding it: Algorithm to divide text into 3 evenly-sized groups. Is there any chance someone could convert it to C# or VB.NET?
One way of solving this problem is dynamic programming, cf. the Minimum raggedness algorithm.
I used some of the information you added when you edited your post with:
Algorithm to divide text into 3 evenly-sized groups
Notations:
Let name your text document="word1 word2 .... wordp"
n= number of line required
LineWidth=len(document)/n
Cost function:
First you need to define a cost function for having word[i] to word[j] on the same line. You can take the same one as on Wikipedia, with p=2 for example:
cost(i, j) = (LineWidth - width of the line holding word[i] to word[j])^p
It represents the distance between the target length of a line and its actual length.
The total cost function for the optimal solution can be defined with the following recursion relation:
f_k(j) = min over i <= j of [ f_(k-1)(i) + cost(i, j) ]
Solving the problem:
You can solve this problem using dynamic programming.
I took the code from the link you gave and changed it a bit so you can see what the program is doing.
At stage k you add words to line k. Then you look at the optimal cost of having words i to j on line k. Once you've gone from line 1 to n, you take the smallest cost in the last step and you have your optimal result:
Here is the result from the code:
D=minragged('Just testing to see how this works.')
number of words: 7
------------------------------------
stage : 0
------------------------------------
word i to j in line 0 TotalCost (f(j))
------------------------------------
i= 0 j= 0 121.0
i= 0 j= 1 49.0
i= 0 j= 2 1.0
i= 0 j= 3 16.0
i= 0 j= 4 64.0
i= 0 j= 5 144.0
i= 0 j= 6 289.0
i= 0 j= 7 576.0
------------------------------------
stage : 1
------------------------------------
word i to j in line 1 TotalCost (f(j))
------------------------------------
i= 0 j= 0 242.0
i= 0 j= 1 170.0
i= 0 j= 2 122.0
i= 0 j= 3 137.0
i= 0 j= 4 185.0
i= 0 j= 5 265.0
i= 0 j= 6 410.0
i= 0 j= 7 697.0
i= 1 j= 2 65.0
i= 1 j= 3 50.0
i= 1 j= 4 58.0
i= 1 j= 5 98.0
i= 1 j= 6 193.0
i= 1 j= 7 410.0
i= 2 j= 4 26.0
i= 2 j= 5 2.0
i= 2 j= 6 17.0
i= 2 j= 7 122.0
i= 3 j= 7 80.0
------------------------------------
stage : 2
------------------------------------
word i to j in line 2 TotalCost (f(j))
------------------------------------
i= 0 j= 7 818.0
i= 1 j= 7 531.0
i= 2 j= 7 186.0
i= 3 j= 7 114.0
i= 4 j= 7 42.0
i= 5 j= 7 2.0
reversing list
------------------------------------
Just testing 12
to see how 10
this works. 11
Therefore the best choice is to have words 5 to 7 in the last line (cf. stage 2), then words 2 to 5 in the second line (cf. stage 1), then words 0 to 2 in the first line (cf. stage 0).
Reverse this and you get:
Just testing 12
to see how 10
this works. 11
Here is the code that prints the reasoning (in Python, sorry, I don't use C#, but someone actually translated the code to C#):
def minragged(text, n=3):
    P = 2
    words = text.split()
    cumwordwidth = [0]
    # cumwordwidth[-1] is the last element
    for word in words:
        cumwordwidth.append(cumwordwidth[-1] + len(word))
    totalwidth = cumwordwidth[-1] + len(words) - 1  # len(words) - 1 spaces
    linewidth = float(totalwidth - (n - 1)) / float(n)  # n - 1 line breaks
    print("number of words:", len(words))

    def cost(i, j):
        """
        cost of a line words[i], ..., words[j - 1] (words[i:j])
        """
        actuallinewidth = max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i])
        return (linewidth - float(actuallinewidth)) ** P

    # printing the reasoning and reversing the return list
    F = {}  # Total cost function
    for stage in range(n):
        print("------------------------------------")
        print("stage :", stage)
        print("------------------------------------")
        print("word i to j in line", stage, "\t\tTotalCost (f(j))")
        print("------------------------------------")
        if stage == 0:
            F[stage] = []
            i = 0
            for j in range(i, len(words) + 1):
                print("i=", i, "j=", j, "\t\t\t", cost(i, j))
                F[stage].append([cost(i, j), 0])
        elif stage == (n - 1):
            F[stage] = [[float('inf'), 0] for i in range(len(words) + 1)]
            for i in range(len(words) + 1):
                j = len(words)
                if F[stage-1][i][0] + cost(i, j) < F[stage][j][0]:  # calculating min cost (cf f formula)
                    F[stage][j][0] = F[stage-1][i][0] + cost(i, j)
                    F[stage][j][1] = i
                    print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])
        else:
            F[stage] = [[float('inf'), 0] for i in range(len(words) + 1)]
            for i in range(len(words) + 1):
                for j in range(i, len(words) + 1):
                    if F[stage-1][i][0] + cost(i, j) < F[stage][j][0]:
                        F[stage][j][0] = F[stage-1][i][0] + cost(i, j)
                        F[stage][j][1] = i
                        print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])
    print('reversing list')
    print("------------------------------------")
    listWords = []
    a = len(words)
    for k in range(n - 1, 0, -1):  # reverse loop from n-1 to 1
        listWords.append(' '.join(words[F[k][a][1]:a]))
        a = F[k][a][1]
    listWords.append(' '.join(words[0:a]))
    listWords.reverse()
    for line in listWords:
        print(line, '\t\t', len(line))
    return listWords
Here is the accepted solution from Algorithm to divide text into 3 evenly-sized groups converted to C#:
static List<string> Minragged(string text, int n = 3)
{
var words = text.Split();
var cumwordwidth = new List<int>();
cumwordwidth.Add(0);
foreach (var word in words)
cumwordwidth.Add(cumwordwidth[cumwordwidth.Count - 1] + word.Length);
var totalwidth = cumwordwidth[cumwordwidth.Count - 1] + words.Length - 1;
var linewidth = (double)(totalwidth - (n - 1)) / n;
var cost = new Func<int, int, double>((i, j) =>
{
var actuallinewidth = Math.Max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i]);
return (linewidth - actuallinewidth) * (linewidth - actuallinewidth);
});
var best = new List<List<Tuple<double, int>>>();
var tmp = new List<Tuple<double, int>>();
best.Add(tmp);
tmp.Add(new Tuple<double, int>(0.0f, -1));
foreach (var word in words)
tmp.Add(new Tuple<double, int>(double.MaxValue, -1));
for (int l = 1; l < n + 1; ++l)
{
tmp = new List<Tuple<double, int>>();
best.Add(tmp);
for (int j = 0; j < words.Length + 1; ++j)
{
var min = new Tuple<double, int>(best[l - 1][0].Item1 + cost(0, j), 0);
for (int k = 0; k < j + 1; ++k)
{
var loc = best[l - 1][k].Item1 + cost(k, j);
if (loc < min.Item1 || (loc == min.Item1 && k < min.Item2))
min = new Tuple<double, int>(loc, k);
}
tmp.Add(min);
}
}
var lines = new List<string>();
var b = words.Length;
for (int l = n; l > 0; --l)
{
var a = best[l][b].Item2;
lines.Add(string.Join(" ", words, a, b - a));
b = a;
}
lines.Reverse();
return lines;
}
There was a discussion about this exact problem (though it was phrased in a different way) at http://www.perlmonks.org/?node_id=180276.
In the end the best solution was to do a binary search through all possible widths to find the smallest width that wound up with no more than the desired number of columns. If there are n items and the average width is m, then you'll need O(log(n) + log(m)) passes to find the right width, each of which takes O(n) time, for O(n * (log(n) + log(m))). This is probably fast enough with no more need to be clever.
If you wish to be clever, you can create an array of word counts, and cumulative lengths of the words. Then use binary searches on this data structure to figure out where the line breaks are. Creating this data structure is O(n), and it makes all of the passes to figure out the right width be O(log(n) * (log(n) + log(m))) which for reasonable lengths of words is dominated by your first O(n) pass.
If the widths of words can be floating point, you'll need to do something more clever with the binary searches, but you are unlikely to need that particular optimization.
btilly has the right answer here, but just for fun I decided to code up a solution in python:
def wrap_min_width(words, n):
    r, l = [], ""
    for w in words:
        if len(w) + len(l) > n:
            r, l = r + [l], ""
        l += (" " if len(l) > 0 else "") + w
    return r + [l]

def min_lines(phrase, lines):
    words = phrase.split(" ")
    hi, lo = sum([len(w) for w in words]), min([len(w) for w in words])
    while lo < hi:
        mid = lo + (hi - lo) // 2  # integer midpoint
        v = wrap_min_width(words, mid)
        if len(v) > lines:
            lo = mid + 1
        elif len(v) <= lines:
            hi = mid
    return lo, "\n".join(wrap_min_width(words, lo))
Now this still may not be exactly what you want, since if it is possible to wrap the words in fewer than n lines using the same line width, it instead returns the smallest number of lines encoding. (Of course you can always add extra empty lines, but it is a bit silly.) If I run it on your test case, here is what I get:
Case: "I would like to be wrapped into three lines", 3 lines
Result: 14 chars/line
I would like to
be wrapped into
three lines
I just thought of an approach:
You can write a function accepting two parameters: 1. the string, 2. the number of lines.
Get the length of the string (String.Length if using C#).
Divide the length by the number of lines (let's say the result is n).
Now start a loop and access each character of the string (using string[i]).
Insert a '\r\n' after every nth character in the array of characters.
In the loop, maintain a temp string holding the current word; it is reset to null whenever a blank character is reached.
If the nth character is reached and the temp string is not null, insert the '\r\n' after that temp string instead.
I'll assume you're trying to minimize the maximum width of a string with n breaks. This can be done in O(words(str)*n) time and space using dynamic programming or recursion with memoization.
The recurrence would look like this, where the string has been split into words:
INFINITY = float('inf')

def wordwrap(remaining_words, n):
    if n > 0 and len(remaining_words) == 0:
        return INFINITY  # we haven't chopped enough lines
    if n == 0:
        return len(' '.join(remaining_words))  # rest of the string
    best = INFINITY
    for i in range(1, len(remaining_words)):
        # split after the first i words
        first_line = len(' '.join(remaining_words[:i]))
        best = min(max(wordwrap(remaining_words[i:], n - 1), first_line), best)
    return best
I converted the C# accepted answer to JavaScript for something I was working on. Posting it here might save someone a few minutes of doing it themselves.
function WrapTextWithLimit(text, n) {
var words = text.toString().split(' ');
var cumwordwidth = [0];
words.forEach(function(word) {
cumwordwidth.push(cumwordwidth[cumwordwidth.length - 1] + word.length);
});
var totalwidth = cumwordwidth[cumwordwidth.length - 1] + words.length - 1;
var linewidth = (totalwidth - (n - 1.0)) / n;
var cost = function(i, j) {
var actuallinewidth = Math.max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i]);
return (linewidth - actuallinewidth) * (linewidth - actuallinewidth);
};
var best = [];
var tmp = [];
best.push(tmp);
tmp.push([0.0, -1]);
words.forEach(function(word) {
tmp.push([Number.MAX_VALUE, -1]);
});
for (var l = 1; l < n + 1; ++l)
{
tmp = [];
best.push(tmp);
for (var j = 0; j < words.length + 1; ++j)
{
var min = [best[l - 1][0][0] + cost(0, j), 0];
for (var k = 0; k < j + 1; ++k)
{
var loc = best[l - 1][k][0] + cost(k, j);
if (loc < min[0] || (loc === min[0] && k < min[1])) {
min = [loc, k];
}
}
tmp.push(min);
}
}
var lines = [];
var b = words.length;
for (var p = n; p > 0; --p) {
var a = best[p][b][1];
lines.push(words.slice(a, b).join(' '));
b = a;
}
lines.reverse();
return lines;
}
This solution improves on Mikola's.
It's better because
It doesn't use strings. You don't need to use strings and concatenate them. You just need an array of their lengths. So, because of this it's faster, also you can use this method with any kind of "element" - you just need the widths.
There was some unnecessary processing in the wrap_min_width function. It just kept going even when it went beyond the point of failure. Also, it just builds the string unnecessarily.
Added the "separator width" as an adjustable parameter.
It calculates the min width - which is really what you want.
Fixed some bugs.
This is written in Javascript:
// For testing calcMinWidth
var formatString = function (str, nLines) {
var words = str.split(" ");
var elWidths = words.map(function (s, i) {
return s.length;
});
var width = calcMinWidth(elWidths, 1, nLines, 0.1);
var format = function (width)
{
var lines = [];
var curLine = null;
var curLineLength = 0;
for (var i = 0; i < words.length; ++i) {
var word = words[i];
var elWidth = elWidths[i];
if (curLineLength + elWidth > width)
{
lines.push(curLine.join(" "));
curLine = [word];
curLineLength = elWidth;
continue;
}
if (i === 0)
curLine = [word];
else
{
curLineLength += 1;
curLine.push(word);
}
curLineLength += elWidth;
}
if (curLine !== null)
lines.push(curLine.join(" "));
return lines.join("\n");
};
return format(width);
};
var calcMinWidth = function (elWidths, separatorWidth, lines, tolerance)
{
var testFit = function (width)
{
var nCurLine = 1;
var curLineLength = 0;
for (var i = 0; i < elWidths.length; ++i) {
var elWidth = elWidths[i];
if (curLineLength + elWidth > width)
{
if (elWidth > width)
return false;
if (++nCurLine > lines)
return false;
curLineLength = elWidth;
continue;
}
if (i > 0)
curLineLength += separatorWidth;
curLineLength += elWidth;
}
return true;
};
var hi = 0;
var lo = null;
for (var i = 0; i < elWidths.length; ++i) {
var elWidth = elWidths[i];
if (i > 0)
hi += separatorWidth;
hi += elWidth;
if (lo === null || elWidth > lo)
lo = elWidth;
}
if (lo === null)
lo = 0;
while (hi - lo > tolerance)
{
var guess = (hi + lo) / 2;
if (testFit(guess))
hi = guess;
else
lo = guess;
}
return hi;
};

finding longest sequence of a particular value

I want to find the longest sequence of a particular number i.e. 1 appearing in an array. Suppose the array is {1,0,0,0,1,1,1,1,0,0,1,1}; the answer should be 4 as one appears at most four times consecutively.
Use run length encoding.
In R, it's just
r <- rle(x)
max(r$lengths[r$values == 1])
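The same run-length idea can be sketched in Python with itertools.groupby, which groups consecutive equal values. The function name longest_run and its target parameter are assumptions made for this example.

```python
from itertools import groupby

def longest_run(values, target):
    """Length of the longest consecutive run of `target` in `values` (0 if absent)."""
    return max(
        (sum(1 for _ in group) for key, group in groupby(values) if key == target),
        default=0,
    )

print(longest_run([1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1], 1))
```

Filtering on the group key keeps runs of other values (such as the three zeros in the example) from being counted.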
Start with an array of numbers, A, find the longest
contiguous run of some number N in A.
Pseudo C...
MaxRun = 0 /* Longest run so far */
for (i = 0; i < length(A);) {
if A[i] = N {
/* Potential run of N's... */
/* Scan backward for first N in run */
for (j = i; j > 0 & A[j-1] = N; j--);
/* Scan forward to last N in run */
for (k = i; k < length(A)-1 & A[k+1] = N; k++);
/* Check to see if longer run found... */
if (k-j+1 > MaxRun) then MaxRun = k-j+1;
i = k /* jump i to last N found */
}
i = i + MaxRun + 1 /* Jump by longest run plus 1 */
}
MaxRun is the answer
The idea is that once you find a contiguous run of N's you can
jump ahead at least that far in the array before checking for
another candidate.
This algorithm has a possible sublinear run time because of the jump factor. Worst case is that every A[i] will be examined.
There will be more efficient methods, but this is what I got for now (C#):
int count = 0;
int maxCount = 0;
for (int i = 0; i < someArray.Count(); i++)
{
    if (someArray[i] == 1)
    {
        count++;
    }
    else
    {
        if (count > maxCount)
        {
            maxCount = count;
        }
        count = 0;
    }
}
// Account for a run that reaches the end of the array.
if (count > maxCount)
{
    maxCount = count;
}
A = array, L = its length
cnt = 0
max = 0
for i = 0 .. L - 1
if A[i] == 0
if (cnt > max) max = cnt
cnt = 0
else
cnt = cnt + 1
if (cnt > max) max = cnt
Here is another linear solution; the idea is to maintain two runners. At the start of a run of 1s the first runner waits until the second runner has reached the end of the run (i.e. a 0 or the end of the array).
int i = 0, j = 0, max = 0, n = A.length;
while (j < n) {
    if (A[j] == 1) {
        j++;                        // second runner extends the current run
    } else {
        max = Math.max(max, j - i); // run [i, j) ended at a non-1
        j++;
        i = j;                      // both runners restart after it
    }
}
max = Math.max(max, j - i);         // run that reaches the end of the array
max is the answer.

Resources